GSoC Week 5

This week I mainly focused on diagnosing why adding KEGG still failed to yield finer-grained SBO classifications across 108 models, extending the EC lookup pipeline, and experimenting with additional data sources. Here is a summary of the week's work.

Root-Cause Investigation

  • Added adviaecnumber() helper to attempt refinement via EC number–driven mapping.

  • Re-ran the enhanced pipeline on 108 models; the reactions that remained in generic SBO classes were still missing EC numbers.

  • Result: no additional reactions could be refined to more specific SBO classes.

  • Conclusion: the main blocker is not the mapping rule itself but the absence of EC numbers for those reactions across the currently consulted sources.
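To illustrate why a missing EC number blocks refinement, here is a minimal sketch of an EC-driven mapping step. It is not the actual adviaecnumber() helper: the function name refine_sbo, the regular expression, and the small EC-class-to-SBO table are assumptions made for illustration, and the table is deliberately incomplete.

```python
import re
from typing import Optional

# Generic fallback class (SBO:0000176, biochemical reaction).
GENERIC_SBO = "SBO:0000176"

# Illustrative table keyed by the EC top-level class.
# ASSUMPTION: this is not the project's real mapping, only an example.
EC_CLASS_TO_SBO = {
    "1": "SBO:0000200",  # redox reaction
    "2": "SBO:0000402",  # transfer of a chemical group
    "3": "SBO:0000376",  # hydrolysis
    "5": "SBO:0000377",  # isomerisation
}

# Accepts full ("1.1.1.1") and partial ("3.4.-.-") EC numbers.
EC_PATTERN = re.compile(r"^\d+(\.(\d+|-)){3}$")


def refine_sbo(ec_number: Optional[str], current_sbo: str = GENERIC_SBO) -> str:
    """Return a more specific SBO term if the EC number allows it.

    Without a usable EC number the current (generic) term is returned
    unchanged, which is the situation described in the bullets above.
    """
    if not ec_number:
        return current_sbo
    ec_number = ec_number.strip()
    if not EC_PATTERN.match(ec_number):
        return current_sbo
    top_class = ec_number.split(".")[0]
    return EC_CLASS_TO_SBO.get(top_class, current_sbo)
```

With this shape, a reaction with no EC number passes through untouched however good the mapping table is, so the rule cannot help until a source supplies the EC number.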

New Data Sources Added

  • SEED adapter: uses the Solr query interface with strict null filtering to avoid spurious entries.

  • Reactome adapter: combines web parsing with the QuickGO API to perform a multi-step conversion/mapping chain for EC retrieval.
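The null filtering mentioned for the SEED adapter can be sketched as a small cleaning step applied to whatever a source returns. This is only an assumption about how such filtering might look: the function name clean_ec_values, the set of placeholder tokens, and the "|" and ";" separators are hypothetical and not taken from the actual adapter.

```python
from typing import Any, List

# Placeholder strings treated as "no value".
# ASSUMPTION: the real adapter may use a different set.
NULL_TOKENS = {"", "null", "none", "nan", "n/a", "-"}


def clean_ec_values(raw: Any) -> List[str]:
    """Normalise a raw EC field into a list of non-empty, de-duplicated strings."""
    if raw is None:
        return []
    # A field may arrive as a single value or as a list of values.
    values = raw if isinstance(raw, (list, tuple)) else [raw]
    cleaned: List[str] = []
    for value in values:
        if value is None:
            continue
        # Some sources pack several EC numbers into one delimited string.
        for part in str(value).replace(";", "|").split("|"):
            token = part.strip()
            if token.lower() in NULL_TOKENS:
                continue
            if token not in cleaned:
                cleaned.append(token)
    return cleaned
```

Filtering at the adapter boundary keeps placeholder strings such as "null" from being treated as real EC numbers further down the pipeline.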

Performance Observation

  • Querying four databases (KEGG, BiGG, SEED, Reactome) to obtain EC numbers is currently very time-consuming.

  • End-to-end processing time observed at approximately 14 hours per model under the present implementation and network conditions.

Current Status

  • Enhanced multi-source EC lookup is wired into the unified adapter layer (adapter pattern).

  • The broader coverage (KEGG/BiGG/SEED/Reactome) is meant to address the lack of EC numbers, but the lookup is time-consuming.
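A minimal sketch of what a unified adapter layer of this kind could look like, under stated assumptions: the class names (ECSourceAdapter, StaticAdapter, UnifiedECLookup) are hypothetical, the real adapters call remote services rather than in-memory tables, and stopping at the first source that returns a result is an assumed policy, not necessarily the one used in the project.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence


class ECSourceAdapter(ABC):
    """Common interface every data source is wrapped in."""

    name: str

    @abstractmethod
    def lookup(self, reaction_id: str) -> List[str]:
        """Return the EC numbers known for a reaction, or an empty list."""


class StaticAdapter(ECSourceAdapter):
    """Stand-in for a real database client; answers from an in-memory dict."""

    def __init__(self, name: str, table: Dict[str, List[str]]):
        self.name = name
        self.table = table

    def lookup(self, reaction_id: str) -> List[str]:
        return list(self.table.get(reaction_id, []))


class UnifiedECLookup:
    """Queries adapters in order and returns the first non-empty answer."""

    def __init__(self, adapters: Sequence[ECSourceAdapter]):
        self.adapters = list(adapters)

    def lookup(self, reaction_id: str) -> Dict[str, object]:
        for adapter in self.adapters:
            try:
                ec_numbers = adapter.lookup(reaction_id)
            except Exception:
                # A failing source should not abort the whole lookup.
                continue
            if ec_numbers:
                return {"source": adapter.name, "ec_numbers": ec_numbers}
        source: Optional[str] = None
        return {"source": source, "ec_numbers": []}


# Placeholder data for demonstration only; the IDs are made up.
demo = UnifiedECLookup([
    StaticAdapter("KEGG", {}),
    StaticAdapter("BiGG", {"rxn_a": ["1.1.1.1"]}),
    StaticAdapter("SEED", {"rxn_a": ["1.1.1.2"], "rxn_b": ["2.7.1.1"]}),
])
```

Here demo.lookup("rxn_a") is answered by the BiGG stand-in and never reaches SEED, while a reaction unknown to every source comes back with an empty list. Since each miss still costs a round trip to every source in the chain, reactions with no EC number anywhere are the most expensive case.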

Regards!
