GSoC Week 4
Overview
This week focused on two tracks:
- Enhancing SBOannotator's EC annotation coverage by introducing a unified database adapter (adapter pattern).
- Community outreach: posting a brief project intro with code/blog links to SBML discuss, COBRA, COMBINE, SysMod, X (Twitter), and LinkedIn.
Code & Features
1) New adapter.py (Unified Database Adapter Layer)
Purpose: Aggregate multiple biology databases to retrieve EC numbers via the adapter pattern.
Key Components:
- EnzymeDataAdapter: abstract base class defining a unified interface.
- KEGGAdapter: queries the KEGG REST API (reaction ID → EC).
- BiGGAdapter: queries the BiGG universal reactions database.
- UnifiedEnzymeDataProvider: coordinates adapters and performs deduplication.
- callForECAnnotRxnUnified(reaction): entry point that integrates with the original system.
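To make the layering concrete, here is a minimal sketch of how these components could fit together. It is not the code in adapter.py: the method name get_ec_numbers and the StaticAdapter class are hypothetical, and the in-memory tables stand in for the real KEGG and BiGG lookups so the example runs offline. The EC values are illustrative only.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class EnzymeDataAdapter(ABC):
    """Unified interface: each data source maps a reaction ID to EC numbers."""

    @abstractmethod
    def get_ec_numbers(self, reaction_id: str) -> list[str]:
        ...


class StaticAdapter(EnzymeDataAdapter):
    """Hypothetical stand-in for KEGGAdapter/BiGGAdapter, backed by a dict."""

    def __init__(self, table: dict[str, list[str]]):
        self._table = table

    def get_ec_numbers(self, reaction_id: str) -> list[str]:
        return self._table.get(reaction_id, [])


class UnifiedEnzymeDataProvider:
    """Queries each adapter in turn and merges the results without duplicates."""

    def __init__(self, adapters: list[EnzymeDataAdapter]):
        self._adapters = adapters

    def get_ec_numbers(self, reaction_id: str) -> list[str]:
        merged: dict[str, None] = {}  # a dict keeps first-seen order
        for adapter in self._adapters:
            try:
                for ec in adapter.get_ec_numbers(reaction_id):
                    merged.setdefault(ec.strip(), None)
            except Exception:
                # One failing source should not block the others.
                continue
        return list(merged)


# Illustrative data only; real adapters would query the remote databases.
kegg_like = StaticAdapter({"PYK": ["2.7.1.40"]})
bigg_like = StaticAdapter({"PYK": ["2.7.1.40", "2.7.1.-"]})
provider = UnifiedEnzymeDataProvider([kegg_like, bigg_like])
ec_numbers = provider.get_ec_numbers("PYK")
```

The point of the pattern is that the provider only depends on the abstract interface, so adding another database means writing one more adapter class and nothing else.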
2) SBOannotatorEnhancedClass.py vs SBOannotator.py
Key change: at line 94, replace with:
This supplements the original logic by fetching EC numbers via BiGG/KEGG APIs, then applying more appropriate SBO terms.
3) main.py modifications
- Dual-run comparison: run both the original and enhanced workflows on the same model.
- Result diff/metrics: show the annotation differences between the two.
- Timing/profiling: record wall time for both paths.
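The three modifications above can be sketched roughly as follows. This is an illustration rather than the actual main.py: timed_run, diff_annotations, and the two workflow functions are hypothetical names, and the workflows simply return a dict of reaction ID to SBO term instead of annotating a real SBML model.

```python
import time


def timed_run(annotate, reactions):
    """Run one annotation workflow; return (result, wall time in seconds)."""
    start = time.perf_counter()
    result = annotate(reactions)
    return result, time.perf_counter() - start


def diff_annotations(original, enhanced):
    """Map reaction ID -> (original term, enhanced term) where they differ."""
    return {
        rxn: (original.get(rxn), enhanced.get(rxn))
        for rxn in sorted(original.keys() | enhanced.keys())
        if original.get(rxn) != enhanced.get(rxn)
    }


# Hypothetical stand-ins for the two workflows.
def original_workflow(reactions):
    return {rxn: "SBO:0000176" for rxn in reactions}


def enhanced_workflow(reactions):
    terms = original_workflow(reactions)
    terms["R_PYK"] = "SBO:0000216"  # illustrative, more specific term
    return terms


reactions = ["R_PYK", "R_PGK", "R_ENO"]
original_terms, original_seconds = timed_run(original_workflow, reactions)
enhanced_terms, enhanced_seconds = timed_run(enhanced_workflow, reactions)
changes = diff_annotations(original_terms, enhanced_terms)

print(f"original: {original_seconds:.4f}s, enhanced: {enhanced_seconds:.4f}s")
print(f"changed reactions: {changes}")
```

Running both paths on the same input keeps the comparison fair: any difference in the output comes from the annotation logic, not from the model.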
Runs & Outputs
Expected per-model flow
- Round 1 (original): pre/post SBO term stats, unannotated reactions, execution time.
- Round 2 (enhanced): API-based EC lookup with final SBO term stats and execution time.
- Output file: RECON1_SBOannotated.xml → ../../models/Annotated_Models/.
- Performance comparison: original vs enhanced wall time.
Batch Evaluation Result
- 108 models were processed.
- Only 2 reactions were converted from the generic SBO:0000176 (biochemical reaction, used as the placeholder term) to more specific categories.
Current data shows a real but modest improvement. Subsequent work will aim to increase hit rate and stability via additional data sources, refined matching heuristics, and caching/rate limiting.
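As a rough idea of what the caching and rate limiting could look like, here is a small sketch. Nothing here is implemented yet: cached_ec_lookup, _remote_lookup, and the MIN_INTERVAL value are assumptions made for illustration, and the remote call is replaced by a local placeholder so the example runs without network access.

```python
import time
from functools import lru_cache

MIN_INTERVAL = 0.01  # assumed minimum gap between remote requests, in seconds
_last_call = float("-inf")


def _remote_lookup(reaction_id):
    """Placeholder for an HTTP request to a database such as KEGG or BiGG."""
    return ("2.7.1.40",) if reaction_id == "PYK" else ()


@lru_cache(maxsize=None)
def cached_ec_lookup(reaction_id):
    """Return EC numbers for a reaction; only cache misses reach the remote."""
    global _last_call
    wait = MIN_INTERVAL - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)  # throttle so consecutive requests are spaced out
    _last_call = time.monotonic()
    return _remote_lookup(reaction_id)


first = cached_ec_lookup("PYK")   # cache miss: goes to the (placeholder) remote
second = cached_ec_lookup("PYK")  # cache hit: no request, no waiting
info = cached_ec_lookup.cache_info()
```

Since many models share the same reaction IDs, a cache like this would likely remove most repeated requests in a batch run over all 108 models, though that still needs to be measured.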
Community Interactions
Sent a brief intro email with blog and source code links to SBML discuss / COBRA / COMBINE / SysMod; also posted updates on X (Twitter) and LinkedIn to gather feedback on database coverage and edge cases.