GSoC Week 6

 

To address the long runtime of the annotation pipeline, I introduced two key optimizations:

  • Early termination: as soon as a precise SBO term is obtained for the current reaction, stop querying further databases to avoid unnecessary requests.

  • EC number truncation: truncate from the first non-digit character to enforce consistent EC formatting and reduce matching noise.
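The EC truncation step could look roughly like the sketch below. It is not the pipeline's actual code: the function name `truncate_ec` is hypothetical, and it assumes that the dots separating EC levels are kept, so the cut happens at the first character that is neither a digit nor a dot.

```python
import re

# Assumption: dots are level separators and are kept; truncation happens at
# the first character that is neither a digit nor a dot.
_EC_PREFIX = re.compile(r"[0-9.]*")


def truncate_ec(ec: str) -> str:
    """Return the leading digits-and-dots part of an EC string (hypothetical helper)."""
    prefix = _EC_PREFIX.match(ec.strip()).group(0)
    # Drop a dangling dot left behind when a whole level was cut off.
    return prefix.rstrip(".")


if __name__ == "__main__":
    for raw in ["2.7.1.1", "1.1.1.-", "3.4.21.B12", "1.2.3.4 (transferred)"]:
        print(raw, "->", truncate_ec(raw))
```

With this rule, partially specified entries such as `1.1.1.-` collapse to the prefix `1.1.1`, so variants of the same EC class compare equal during matching.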

I then re-ran the full evaluation on 108 models.

Core Optimizations

Early Termination

  • Goal: within the adapter chain, once a non-generic SBO (i.e., not SBO:0000176) is found, immediately stop querying other data sources to reduce I/O and network wait.

  • Effect: significantly lowers total query volume and shortens overall wall time.
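As a minimal sketch of the early-termination idea, assume each adapter is a callable that takes a reaction identifier and returns an SBO term or `None`. The names `annotate`, `Adapter`, and `make_stub` are hypothetical, and the stub adapters stand in for the real database clients.

```python
from typing import Callable, Iterable, List, Optional

GENERIC_SBO = "SBO:0000176"  # the generic term the post tries to replace

# Hypothetical adapter type: reaction identifier in, SBO term (or None) out.
Adapter = Callable[[str], Optional[str]]


def annotate(reaction_id: str, adapters: Iterable[Adapter]) -> str:
    """Query adapters in order and stop at the first non-generic SBO term."""
    for query in adapters:
        term = query(reaction_id)
        if term is not None and term != GENERIC_SBO:
            return term  # early termination: remaining adapters are never called
    return GENERIC_SBO  # nothing more specific was found


# Stub adapters that record which "databases" were actually queried.
calls: List[str] = []


def make_stub(name: str, result: Optional[str]) -> Adapter:
    def stub(reaction_id: str) -> Optional[str]:
        calls.append(name)
        return result
    return stub


chain = [
    make_stub("db_a", None),           # no hit
    make_stub("db_b", GENERIC_SBO),    # only the generic term
    make_stub("db_c", "SBO:0000185"),  # an arbitrary non-generic term for the demo
    make_stub("db_d", "SBO:0000182"),  # should never be reached
]

if __name__ == "__main__":
    print(annotate("R1", chain), calls)
```

In this demo the fourth adapter is never called, which is where the saved I/O and network wait come from. The saving depends on adapter order: putting the source most likely to return a specific term first ends the chain sooner.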

Results

  • Across the 108 models, 3,317 reactions were converted from the generic SBO:0000176 to more specific SBO categories.

  • Per-model average processing time: 432.99 seconds/model (≈ 7.22 minutes/model).

  • Compared with the previous sequential four-database approach (~14 hours/model, i.e. about 50,400 seconds), the new setup is roughly 116× faster (50,400 ÷ 432.99 ≈ 116.4).

  • Total wall time for all 108 models: 46,762.92 seconds (≈ 12.99 hours).
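The timing figures above can be re-derived with a few lines of arithmetic. This is only a sanity check on the reported numbers; the variable names are illustrative, and the 14 hours/model baseline is the approximate figure quoted above.

```python
SECONDS_PER_HOUR = 3600

n_models = 108
avg_seconds = 432.99                      # reported per-model average
previous_seconds = 14 * SECONDS_PER_HOUR  # ~14 hours/model baseline = 50,400 s

avg_minutes = avg_seconds / 60
total_seconds = n_models * avg_seconds
total_hours = total_seconds / SECONDS_PER_HOUR
speedup = previous_seconds / avg_seconds

print(f"per model : {avg_minutes:.2f} min")
print(f"total     : {total_seconds:.2f} s = {total_hours:.2f} h")
print(f"speedup   : {speedup:.1f}x")
```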
