GSoC Week 5

September 07, 2025

This week, I mainly focused on diagnosing why adding KEGG still failed to yield finer-grained SBO classifications across 108 models, extending the EC lookup pipeline, and experimenting with additional data sources. Here's a summary of the week's work.

Root-Cause Investigation

Added an adviaecnumber() helper to attempt refinement via EC number–driven mapping.
Re-ran the enhanced pipeline on 108 models; the reactions that remained in generic SBO classes were still missing EC numbers.
Result: no additional reactions could be refined to more specific SBO classes.
Conclusion: the main blocker is not the mapping rule itself but the absence of EC numbers for those reactions across the currently consulted sources.
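To make this conclusion concrete, here is a minimal sketch of how an EC-number-driven refinement rule can behave. It is not the actual adviaecnumber() code: the function name `refine_by_ec`, the generic fallback term, and the EC-class-to-SBO table are illustrative assumptions that should be checked against the SBO ontology and SBOannotator's own mapping. What it shows is the blocker itself: with no EC number, even a correct rule has nothing to work with, so the reaction stays in the generic class.

```python
import re

# Assumed generic fallback term ("biochemical reaction").
GENERIC_SBO = "SBO:0000176"

# Illustrative mapping from top-level EC class to a more specific SBO term.
# Assumption: verify each term against the SBO ontology before reuse.
EC_CLASS_TO_SBO = {
    "1": "SBO:0000200",  # oxidoreductases -> redox reaction
    "2": "SBO:0000402",  # transferases -> transfer of a chemical group
    "3": "SBO:0000376",  # hydrolases -> hydrolysis
    "5": "SBO:0000377",  # isomerases -> isomerisation
}

# Full or partial EC number, e.g. "1.1.1.1" or "3.1.-.-".
EC_PATTERN = re.compile(r"^\d+\.(\d+|-)\.(\d+|-)\.(n?\d+|-)$")


def refine_by_ec(ec_numbers: list[str]) -> str:
    """Hypothetical rule: return a specific SBO term if all valid EC numbers
    share one mapped top-level class, otherwise the generic term."""
    valid = [ec.strip() for ec in ec_numbers if EC_PATTERN.match(ec.strip())]
    if not valid:
        # The situation observed this week: no EC number, no refinement.
        return GENERIC_SBO
    classes = {ec.split(".", 1)[0] for ec in valid}
    if len(classes) != 1:
        # EC numbers from different classes: stay conservative.
        return GENERIC_SBO
    return EC_CLASS_TO_SBO.get(classes.pop(), GENERIC_SBO)


print(refine_by_ec([]))                      # SBO:0000176
print(refine_by_ec(["1.1.1.1"]))             # SBO:0000200
print(refine_by_ec(["1.1.1.1", "3.1.1.1"]))  # SBO:0000176
```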

New Data Sources Added

SEED adapter: uses the Solr query interface with strict null filtering to avoid spurious entries.
Reactome adapter: combines web parsing with the QuickGO API to perform a multi-step conversion/mapping chain for EC retrieval.
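The two adapters rely on two generic ideas: dropping null-like values before they can be mistaken for EC numbers, and chaining several ID-mapping steps where any step may come back empty. The sketch below shows both offline. It assumes a Solr-style JSON response (`response` → `docs`) and a hypothetical `ec_numbers` field; the real SEED field names, the Reactome page parsing, and the QuickGO calls are not reproduced, and all IDs in the toy data are placeholders.

```python
from typing import Callable, Iterable, Optional

# Strings treated as "no value" (assumption: extend to match the real data).
NULL_LIKE = {"", "null", "none", "nan", "-"}


def clean_values(values: Iterable[Optional[str]]) -> list[str]:
    """Strict null filtering: drop None and null-like strings, dedupe, keep order."""
    kept: list[str] = []
    for value in values:
        if value is None:
            continue
        text = str(value).strip()
        if text.lower() in NULL_LIKE or text in kept:
            continue
        kept.append(text)
    return kept


def ec_from_solr_response(payload: dict, field: str = "ec_numbers") -> list[str]:
    """Collect EC values from a Solr-style JSON payload (field name is hypothetical)."""
    values: list[Optional[str]] = []
    for doc in payload.get("response", {}).get("docs", []):
        raw = doc.get(field)
        # Multi-valued Solr fields come back as lists, single values as scalars.
        values.extend(raw if isinstance(raw, list) else [raw])
    return clean_values(values)


def chain(*steps: Callable[[str], list[str]]) -> Callable[[str], list[str]]:
    """Compose mapping steps; each step maps one ID to zero or more IDs."""
    def run(start: str) -> list[str]:
        current = [start]
        for step in steps:
            current = clean_values(out for item in current for out in step(item))
            if not current:
                break  # an empty step ends the chain with no result
        return current
    return run


solr_payload = {"response": {"docs": [
    {"id": "rxn_a", "ec_numbers": ["1.1.1.1", "null", None]},
    {"id": "rxn_b", "ec_numbers": None},
]}}
print(ec_from_solr_response(solr_payload))  # ['1.1.1.1']

# Toy two-step chain: reaction ID -> GO term -> EC number (placeholder data).
to_go = {"R-EXAMPLE-1": ["GO:EXAMPLE"]}
go_to_ec = {"GO:EXAMPLE": ["1.1.1.1"]}
lookup = chain(lambda r: to_go.get(r, []), lambda g: go_to_ec.get(g, []))
print(lookup("R-EXAMPLE-1"))  # ['1.1.1.1']
print(lookup("R-EXAMPLE-2"))  # []
```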

Performance Observation

Querying four databases (KEGG, BiGG, SEED, Reactome) to obtain EC numbers is currently very time-consuming.
End-to-end processing took approximately 14 hours per model under the present implementation and network conditions.
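For scale: if the 108 models were processed one after another at roughly 14 hours each, a full run would take about 14 × 108 = 1,512 hours, or roughly 63 days. Since the lookups are network-bound, two common remedies are querying the sources concurrently and caching results for reaction IDs shared across models. The sketch below illustrates both with stand-in sources that only sleep; the source functions, delays, and data are assumptions, and whether this helps in practice depends on each service's rate limits, which were not measured here.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Callable


def make_fake_source(delay: float, table: dict[str, list[str]]) -> Callable[[str], list[str]]:
    """Stand-in for a network lookup (assumption: real adapters call the APIs)."""
    def lookup(reaction_id: str) -> list[str]:
        time.sleep(delay)  # simulated network latency
        return table.get(reaction_id, [])
    return lookup


SOURCES: dict[str, Callable[[str], list[str]]] = {
    "KEGG": make_fake_source(0.05, {"rxn1": ["1.1.1.1"]}),
    "BiGG": make_fake_source(0.05, {"rxn1": ["1.1.1.1"], "rxn2": ["2.7.1.1"]}),
    "SEED": make_fake_source(0.05, {}),
    "Reactome": make_fake_source(0.05, {"rxn2": ["2.7.1.1"]}),
}

# Shared pool so the sources are queried in parallel, not one after another.
_POOL = ThreadPoolExecutor(max_workers=len(SOURCES))


@lru_cache(maxsize=None)
def ec_lookup(reaction_id: str) -> tuple[str, ...]:
    """Query all sources concurrently; cache per reaction ID so a reaction
    that appears in many models is fetched only once."""
    futures = [_POOL.submit(source, reaction_id) for source in SOURCES.values()]
    merged: set[str] = set()
    for future in futures:
        try:
            merged.update(future.result(timeout=30))
        except Exception:
            continue  # a slow or failing source should not block the others
    # Immutable return value so the cached entry cannot be modified by callers.
    return tuple(sorted(merged))


start = time.perf_counter()
print(ec_lookup("rxn1"))  # ('1.1.1.1',)
print(ec_lookup("rxn2"))  # ('2.7.1.1',)
ec_lookup("rxn1")         # served from the cache, no new queries
print(f"elapsed: {time.perf_counter() - start:.2f}s")
```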

Current Status

Enhanced multi-source EC lookup is wired into the unified adapter layer (adapter pattern).
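As a rough picture of what an adapter layer means here, the sketch below gives every database a small adapter with the same `fetch_ec(reaction_id)` method, so the pipeline only talks to that common interface and records which source supplied each EC number. Class and method names are hypothetical rather than the project's actual API, and the adapters return canned data instead of calling the real services.

```python
from abc import ABC, abstractmethod


class ECSourceAdapter(ABC):
    """Common interface every database adapter implements (hypothetical)."""

    name: str = "base"

    @abstractmethod
    def fetch_ec(self, reaction_id: str) -> set[str]:
        """Return EC numbers for a reaction ID, or an empty set."""


class StaticAdapter(ECSourceAdapter):
    """Stand-in for a real KEGG/BiGG/SEED/Reactome adapter (canned data)."""

    def __init__(self, name: str, table: dict[str, set[str]]):
        self.name = name
        self._table = table

    def fetch_ec(self, reaction_id: str) -> set[str]:
        return set(self._table.get(reaction_id, set()))


class FailingAdapter(ECSourceAdapter):
    """Simulates an unreachable source to show that failures stay isolated."""

    name = "down"

    def fetch_ec(self, reaction_id: str) -> set[str]:
        raise ConnectionError("service unavailable")


class MultiSourceLookup:
    """Queries each adapter and maps every EC number to the sources that gave it."""

    def __init__(self, adapters: list[ECSourceAdapter]):
        self.adapters = adapters

    def lookup(self, reaction_id: str) -> dict[str, set[str]]:
        provenance: dict[str, set[str]] = {}
        for adapter in self.adapters:
            try:
                ecs = adapter.fetch_ec(reaction_id)
            except Exception:
                continue  # a broken source should not abort the whole lookup
            for ec in ecs:
                provenance.setdefault(ec, set()).add(adapter.name)
        return provenance


lookup = MultiSourceLookup([
    StaticAdapter("KEGG", {"rxn1": {"1.1.1.1"}}),
    FailingAdapter(),
    StaticAdapter("BiGG", {"rxn1": {"1.1.1.1"}, "rxn2": {"2.7.1.1"}}),
])
print(lookup.lookup("rxn1"))
```

With this shape, adding a source means writing one more adapter class while the pipeline code stays unchanged; keeping provenance also makes it easier to spot sources that disagree.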
Broader coverage (KEGG/BiGG/SEED/Reactome) is aimed at addressing the lack of EC numbers, but the multi-source lookup is currently very time-consuming.

Regards!

Gsoc2025: Enhancing SBOannotator with LLM Integration & Dynamic Term Retrieval
