GSoC Week 5
This week, I mainly focused on diagnosing why adding KEGG still failed to yield finer-grained SBO classifications across 108 models, extending the EC lookup pipeline, and experimenting with additional data sources. Here is a summary of the week's work.
Root-Cause Investigation
- Added an adviaecnumber() helper to attempt refinement via EC-number-driven mapping.
- Re-ran the enhanced pipeline on 108 models: the reactions that remained in generic SBO classes were still missing EC numbers.
- Result: no additional reactions could be refined to more specific SBO classes.
- Conclusion: the main blocker is not the mapping rule itself but the absence of EC numbers for those reactions across the currently consulted sources.
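To illustrate why a missing EC number blocks refinement regardless of the mapping rule, here is a minimal sketch of an EC-driven refinement step. The function names refine_sbo and count_unrefinable, the reaction dictionary layout, and the small mapping table are assumptions made for this example; they are not the project's actual helper or rule table, and the SBO identifiers should be checked against the ontology before reuse.

```python
from typing import Optional

# Hypothetical mapping from top-level EC class to a more specific SBO term.
# Illustrative only; not the project's actual rule table.
EC_CLASS_TO_SBO = {
    "1": "SBO:0000200",  # oxidoreductases -> redox reaction
    "2": "SBO:0000402",  # transferases -> transfer of a chemical group
    "3": "SBO:0000376",  # hydrolases -> hydrolysis
    "5": "SBO:0000377",  # isomerases -> isomerisation
}

GENERIC_SBO = "SBO:0000176"  # biochemical reaction (generic class)


def refine_sbo(current_sbo: str, ec_number: Optional[str]) -> str:
    """Return a more specific SBO term when an EC number makes that possible."""
    # Without an EC number there is nothing to map from, so the term is unchanged.
    if current_sbo != GENERIC_SBO or not ec_number:
        return current_sbo
    top_level = ec_number.split(".")[0].strip()
    return EC_CLASS_TO_SBO.get(top_level, current_sbo)


def count_unrefinable(reactions) -> int:
    """Count reactions stuck in the generic class because no EC number is known."""
    return sum(
        1 for r in reactions
        if r["sbo"] == GENERIC_SBO and not r.get("ec")
    )


reactions = [
    {"id": "R1", "sbo": GENERIC_SBO, "ec": "1.1.1.1"},
    {"id": "R2", "sbo": GENERIC_SBO, "ec": None},
]
refined = [refine_sbo(r["sbo"], r["ec"]) for r in reactions]
```

In this sketch R1 moves to a more specific class while R2 stays generic, which mirrors the observation above: improving the mapping table does nothing for reactions that have no EC number to begin with.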
New Data Sources Added
- SEED adapter: uses the Solr query interface with strict null filtering to avoid spurious entries.
- Reactome adapter: combines web parsing with the QuickGO API to perform a multi-step conversion/mapping chain for EC retrieval.
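The null filtering mentioned for the SEED adapter could look roughly like the sketch below. It works on an already-parsed list of values rather than calling the Solr interface, and the set of placeholder tokens and the function name clean_ec_numbers are assumptions for illustration, not the adapter's actual code.

```python
# Placeholder strings treated as "no value"; this set is an assumption.
NULL_TOKENS = {"", "null", "none", "nan", "n/a", "-"}


def clean_ec_numbers(raw_values):
    """Drop null-like entries and duplicates, keeping the original order."""
    cleaned = []
    for value in raw_values or []:
        if value is None:
            continue
        text = str(value).strip()
        if text.lower() in NULL_TOKENS:
            continue
        if text not in cleaned:
            cleaned.append(text)
    return cleaned


# Example: a response field mixing real EC numbers with placeholders.
raw = ["1.1.1.1", "null", None, " ", "1.1.1.1", "2.7.1.1"]
ec_numbers = clean_ec_numbers(raw)
```

Filtering at the adapter boundary means that downstream code only ever sees either a real EC number or an empty list, so a placeholder string is never mistaken for an annotation.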
Performance Observation
- Querying four databases (KEGG, BiGG, SEED, Reactome) to obtain EC numbers is currently very time-consuming.
- End-to-end processing time was observed at approximately 14 hours per model under the present implementation and network conditions.
Current Status
- Enhanced multi-source EC lookup is wired into the unified adapter layer (adapter pattern).
- With the broader coverage attempts (KEGG/BiGG/SEED/Reactome), the lack of EC numbers is being addressed, but the lookup is time-consuming.
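For readers unfamiliar with the adapter pattern, the unified layer can be pictured as in the following minimal sketch. The class names ECSourceAdapter and StaticAdapter, the function lookup_ec, and the in-memory tables are all hypothetical stand-ins; the real adapters query KEGG, BiGG, SEED, and Reactome over the network.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class ECSourceAdapter(ABC):
    """Common interface every data source is wrapped in (hypothetical)."""

    name: str

    @abstractmethod
    def lookup(self, reaction_id: str) -> List[str]:
        """Return the EC numbers known for a reaction, or an empty list."""


class StaticAdapter(ECSourceAdapter):
    """Stand-in for a real source; reads from an in-memory table, no network."""

    def __init__(self, name: str, table: Dict[str, List[str]]):
        self.name = name
        self._table = table

    def lookup(self, reaction_id: str) -> List[str]:
        return self._table.get(reaction_id, [])


def lookup_ec(
    reaction_id: str, adapters: List[ECSourceAdapter]
) -> Tuple[Optional[str], List[str]]:
    """Query the adapters in order and stop at the first non-empty answer."""
    for adapter in adapters:
        ec_numbers = adapter.lookup(reaction_id)
        if ec_numbers:
            return adapter.name, ec_numbers
    return None, []


adapters = [
    StaticAdapter("KEGG", {}),
    StaticAdapter("BiGG", {"R1": ["2.7.1.1"]}),
]
source, ecs = lookup_ec("R1", adapters)
```

Because every source exposes the same lookup method, adding another database only requires a new adapter class. Stopping at the first non-empty result is a design choice of this sketch, not necessarily what the pipeline does; it would skip later sources whenever an earlier one already has an answer.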
Regards!