GSoC Week 6
To address the long annotation runtime, I introduced two key optimizations into the annotation pipeline:
- Early termination: as soon as a precise SBO term is obtained for the current reaction, stop querying further databases to avoid unnecessary requests.
- EC number truncation: truncate each EC number at its first non-digit character to enforce consistent EC formatting and reduce matching noise.
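A minimal sketch of the EC truncation step follows. The helper name `truncate_ec` is hypothetical, and the sketch assumes that the dots separating EC fields count as part of the valid format (otherwise "1.1.1.1" would be cut down to "1"), so it keeps the leading run of digits and dots, cuts at the first other character, and drops a dangling trailing dot.

```python
import re

# Hypothetical helper (not from the post): leading run of digits and dots.
_EC_VALID_PREFIX = re.compile(r"[0-9.]*")


def truncate_ec(ec: str) -> str:
    """Cut an EC string at its first character that is neither a digit nor a dot."""
    # re.match with a '*' pattern always succeeds (possibly with an empty match).
    head = _EC_VALID_PREFIX.match(ec.strip()).group(0)
    # Drop a trailing dot left behind by a cut such as "2.7.1.-" -> "2.7.1.".
    return head.rstrip(".")


print(truncate_ec("2.7.1.-"))    # → 2.7.1
print(truncate_ec("3.4.21.B1"))  # → 3.4.21
```

Normalizing both sides of a lookup this way means partially specified or decorated EC strings collapse to the same comparable prefix.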
I then re-ran the full evaluation on 108 models.
Core Optimizations
Early Termination
- Goal: within the adapter chain, stop querying the remaining data sources as soon as a non-generic SBO term (i.e., anything other than SBO:0000176) is found, reducing I/O and network wait.
- Effect: significantly reduces the total number of queries and shortens overall wall time.
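The early-termination loop can be sketched as below. The adapter interface (a callable that takes a reaction ID and returns an SBO term or `None`), the function name `annotate_reaction`, and the fake sources in the usage are all hypothetical stand-ins for the real database adapters; only the stopping rule (stop at the first term other than SBO:0000176) comes from the post.

```python
from typing import Callable, Optional, Sequence, Tuple

GENERIC_SBO = "SBO:0000176"  # generic term that does not stop the chain

# Hypothetical adapter interface: reaction ID -> SBO term, or None if no answer.
Adapter = Callable[[str], Optional[str]]


def annotate_reaction(reaction_id: str, adapters: Sequence[Adapter]) -> Tuple[str, int]:
    """Return (sbo_term, number_of_sources_queried)."""
    queried = 0
    for adapter in adapters:
        queried += 1
        term = adapter(reaction_id)
        if term is not None and term != GENERIC_SBO:
            # Early termination: a specific term was found, skip remaining sources.
            return term, queried
    # No source returned anything more specific than the generic term.
    return GENERIC_SBO, queried


# Usage with fake sources standing in for real databases.
sources = [
    lambda rid: None,           # source with no mapping for this reaction
    lambda rid: GENERIC_SBO,    # source that only knows the generic term
    lambda rid: "SBO:0000185",  # first specific answer: the chain stops here
    lambda rid: "SBO:0000200",  # never queried
]
print(annotate_reaction("R_example", sources))  # → ('SBO:0000185', 3)
```

Ordering the adapters so the most likely (or cheapest) source comes first maximizes how often the chain stops early.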
Results
- Across the 108 models, 3,317 reactions were converted from the generic SBO:0000176 to more specific SBO categories.
- Per-model average processing time: 432.99 seconds/model (≈ 7.22 minutes/model).
- Compared with the previous sequential four-database approach (~14 hours/model, i.e. 50,400 s), the new setup achieves about a 116× speedup (≈ 50,400 s ÷ 432.99 s).
- Total wall time for all 108 models: 46,762.92 seconds (≈ 12.99 hours).
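The figures above are related by simple unit conversions; this snippet recomputes them from the per-model average and the ~14 hours/model baseline reported in the post, as a quick consistency check.

```python
SECONDS_PER_HOUR = 3600

previous_per_model_s = 14 * SECONDS_PER_HOUR  # ~14 h/model, previous sequential approach
new_per_model_s = 432.99                      # measured average per model
n_models = 108

speedup = previous_per_model_s / new_per_model_s
total_s = new_per_model_s * n_models

print(f"per model: {new_per_model_s / 60:.2f} min")                 # → per model: 7.22 min
print(f"speedup: {speedup:.0f}x")                                   # → speedup: 116x
print(f"total: {total_s:,.2f} s = {total_s / SECONDS_PER_HOUR:.2f} h")  # → total: 46,762.92 s = 12.99 h
```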