GSoC Week 4


Overview

This week focused on two tracks:

  • Improving SBOannotator’s EC annotation coverage by introducing a unified database adapter layer built on the adapter pattern.

  • Community outreach: posting a brief project intro with code and blog links to SBML discuss, COBRA, COMBINE, SysMod, X (Twitter), and LinkedIn.

Code & Features

1) New adapter.py (Unified Database Adapter Layer)

Purpose: query multiple biological databases through one common interface (the adapter pattern) to retrieve EC numbers for reactions.
Key Components:

  • EnzymeDataAdapter: abstract base class defining a unified interface.

  • KEGGAdapter: queries the KEGG REST API (reaction ID → EC).

  • BiGGAdapter: queries the BiGG universal reactions database.

  • UnifiedEnzymeDataProvider: coordinates adapters and performs deduplication.

  • callForECAnnotRxnUnified(reaction): entry point that integrates with the original system.
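To make the component list above more concrete, here is a minimal Python sketch of how such an adapter layer could be structured. It is not the actual adapter.py: the xrefs dictionary keyed by identifiers.org-style prefixes (kegg.reaction, bigg.reaction), the injectable fetch parameter, and the http_get helper are my own assumptions, and the KEGG "link" URL and the BiGG response shape ("database_links" → "EC Number") should be checked against the official API documentation before relying on them.

```python
from __future__ import annotations

import json
import urllib.request
from abc import ABC, abstractmethod
from typing import Callable, Iterable

# A fetch function takes a URL and returns the response body as text.
Fetch = Callable[[str], str]


def http_get(url: str, timeout: float = 10.0) -> str:
    """Default fetcher: plain HTTP GET via the standard library (needs network)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


class EnzymeDataAdapter(ABC):
    """Common interface every database adapter implements."""

    @abstractmethod
    def get_ec_numbers(self, xrefs: dict[str, str]) -> set[str]:
        """Return EC numbers for a reaction, given its database cross-references."""


class KEGGAdapter(EnzymeDataAdapter):
    def __init__(self, fetch: Fetch = http_get) -> None:
        self.fetch = fetch

    def get_ec_numbers(self, xrefs: dict[str, str]) -> set[str]:
        rxn = xrefs.get("kegg.reaction")
        if not rxn:
            return set()
        # Assumed KEGG REST "link" response: lines like "rn:R00200\tec:2.7.1.40".
        body = self.fetch(f"https://rest.kegg.jp/link/enzyme/rn:{rxn}")
        ecs = set()
        for line in body.splitlines():
            parts = line.split("\t")
            if len(parts) == 2 and parts[1].startswith("ec:"):
                ecs.add(parts[1][len("ec:"):])
        return ecs


class BiGGAdapter(EnzymeDataAdapter):
    def __init__(self, fetch: Fetch = http_get) -> None:
        self.fetch = fetch

    def get_ec_numbers(self, xrefs: dict[str, str]) -> set[str]:
        rxn = xrefs.get("bigg.reaction")
        if not rxn:
            return set()
        # Assumed response shape: {"database_links": {"EC Number": [{"id": ...}, ...]}}
        data = json.loads(self.fetch(f"http://bigg.ucsd.edu/api/v2/universal/reactions/{rxn}"))
        links = data.get("database_links", {}).get("EC Number", [])
        return {entry["id"] for entry in links if "id" in entry}


class UnifiedEnzymeDataProvider:
    """Queries every adapter and merges (deduplicates) the EC numbers they return."""

    def __init__(self, adapters: Iterable[EnzymeDataAdapter]) -> None:
        self.adapters = list(adapters)

    def get_ec_numbers(self, xrefs: dict[str, str]) -> list[str]:
        merged: set[str] = set()
        for adapter in self.adapters:
            try:
                merged |= adapter.get_ec_numbers(xrefs)
            except (OSError, ValueError, KeyError):
                # Network errors, malformed JSON, etc.: skip this source, keep the others.
                continue
        return sorted(merged)
```

Passing fetch in as a parameter keeps each adapter testable offline with a fake function, and the provider skips a source that fails on network or parsing errors instead of aborting the whole lookup.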

2) SBOannotatorEnhancedClass.py vs SBOannotator.py

Key change: the call at line 94 is replaced with:

callForECAnnotRxnUnified(reaction) # Enhanced: query multiple databases

This supplements the original logic: EC numbers are additionally fetched from the BiGG and KEGG APIs, so that more specific SBO terms can be assigned.
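A rough sketch of how fetched EC numbers could feed into SBO term selection. The EC-class-to-SBO table below is an illustrative subset (to my knowledge SBO:0000200 is redox reaction, SBO:0000402 is transfer of a chemical group, and SBO:0000376 is hydrolysis), not SBOannotator's real mapping. choose_sbo_term is a hypothetical helper, and it assumes the APIs are queried only when the model carries no EC annotation of its own.

```python
from __future__ import annotations

from typing import Callable, Iterable

GENERIC_REACTION = "SBO:0000176"  # biochemical reaction

# Illustrative subset only (top-level EC class -> SBO term); not SBOannotator's full table.
EC_CLASS_TO_SBO = {
    "1": "SBO:0000200",  # redox reaction (oxidoreductases)
    "2": "SBO:0000402",  # transfer of a chemical group (transferases)
    "3": "SBO:0000376",  # hydrolysis (hydrolases)
}

# Hypothetical lookup signature: cross-references -> EC numbers.
ECLookup = Callable[[dict], Iterable[str]]


def choose_sbo_term(existing_ecs: Iterable[str], lookup: ECLookup, xrefs: dict) -> str:
    """Pick an SBO term, falling back to the API lookup when the model has no EC numbers."""
    ecs = {ec for ec in existing_ecs if ec}
    if not ecs:
        # Only hit the external databases when the model itself has no EC annotation.
        ecs = {ec for ec in lookup(xrefs) if ec}
    # Top-level EC class, e.g. "2" for "2.7.1.40".
    classes = {ec.split(".")[0] for ec in ecs}
    if len(classes) == 1:
        return EC_CLASS_TO_SBO.get(classes.pop(), GENERIC_REACTION)
    # No EC numbers, or EC numbers from different classes: keep the generic term.
    return GENERIC_REACTION
```

Keeping the generic term when EC numbers disagree on their top-level class is a conservative choice; it avoids assigning a specific term that only some of the enzymes support.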

3) main.py modifications

  • Dual-run comparison: run both the original and enhanced workflows on the same model.

  • Result diff/metrics: show the annotation differences between the two.

  • Timing/profiling: record wall time for both paths.
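The dual-run comparison in main.py could look roughly like the sketch below. Here the model is a plain dict and each workflow is a hypothetical function returning {reaction_id: sbo_term}; the real scripts operate on libSBML objects. Each run gets its own deep copy so the enhanced pass never sees the original pass's annotations, and wall time is measured with time.perf_counter.

```python
from __future__ import annotations

import copy
import time
from collections import Counter
from typing import Callable

# Hypothetical annotator signature: takes a model, returns {reaction_id: sbo_term}.
Annotator = Callable[[dict], dict]


def timed_run(annotate: Annotator, model: dict) -> tuple[dict, float]:
    """Run one workflow on a private copy of the model and measure its wall time."""
    working_copy = copy.deepcopy(model)  # neither run sees the other's edits
    start = time.perf_counter()
    terms = annotate(working_copy)
    return terms, time.perf_counter() - start


def compare_annotators(original: Annotator, enhanced: Annotator, model: dict) -> dict:
    orig_terms, orig_s = timed_run(original, model)
    enh_terms, enh_s = timed_run(enhanced, model)
    # Reactions whose SBO term differs between the two workflows.
    changed = {
        rid: (orig_terms.get(rid), enh_terms.get(rid))
        for rid in sorted(set(orig_terms) | set(enh_terms))
        if orig_terms.get(rid) != enh_terms.get(rid)
    }
    return {
        "original_counts": Counter(orig_terms.values()),  # SBO term statistics
        "enhanced_counts": Counter(enh_terms.values()),
        "changed": changed,  # reaction_id -> (original term, enhanced term)
        "original_seconds": orig_s,
        "enhanced_seconds": enh_s,
    }
```

With libSBML, the equivalent of the deep copy would be reading the SBML file separately for each run.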

Runs & Outputs

Expected per-model flow

  • Round 1 (original): pre/post SBO term stats, unannotated reactions, execution time.

  • Round 2 (enhanced): API-based EC lookup with final SBO term stats and execution time.

  • Output file: RECON1_SBOannotated.xml, written to the ../../models/Annotated_Models/ directory.

  • Performance comparison: original vs enhanced wall time.

Batch Evaluation Results

  • 108 models were processed.

  • Only 2 reactions were reassigned from the generic SBO:0000176 (biochemical reaction) to more specific terms.
    This is a real but modest improvement. Next steps aim to raise the hit rate and stability by adding more data sources, refining the matching heuristics, and introducing caching and rate limiting.
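For the caching and rate limiting planned next, one low-effort option is to wrap whatever function performs the HTTP request. This is a minimal sketch: CachedRateLimitedFetch is a hypothetical name, and the 0.5 s default spacing is a placeholder, not a documented KEGG or BiGG limit. The clock and sleep functions are injectable so the behavior can be tested without real waiting.

```python
from __future__ import annotations

import time
from typing import Callable


class CachedRateLimitedFetch:
    """Wrap a fetch(url) -> str function with an in-memory cache and minimum call spacing."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        min_interval: float = 0.5,  # placeholder spacing in seconds, not a documented limit
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: dict[str, str] = {}
        self._last_call: float | None = None
        self.network_calls = 0

    def __call__(self, url: str) -> str:
        if url in self._cache:
            return self._cache[url]  # cache hit: no request, no waiting
        if self._last_call is not None:
            # Sleep just long enough to keep requests min_interval apart.
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        body = self._fetch(url)  # exceptions propagate, so failures are never cached
        self.network_calls += 1
        self._cache[url] = body
        return body
```

Only successful responses are cached, so a failed request is retried on the next call; persisting the cache to disk would additionally avoid repeating the same queries across the 108-model batch.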

Community Interactions

Sent a brief intro email with blog and source code links to SBML discuss / COBRA / COMBINE / SysMod; also posted updates on X (Twitter) and LinkedIn to gather feedback on database coverage and edge cases.
