GSoC Week 9
Three production upgrades were shipped around data-source query strategy and annotation quality control:
- Config-driven query strategy management
- Target SBO term tracking
- Smart reaction filtering to reduce LLM ambiguity
1) Config-Driven Query Strategy Management
Presets
- Fast: BiGG only for the fastest response
- Balanced: BiGG + KEGG for a balance of recall and latency
- Comprehensive: all sources (BiGG/KEGG/SEED/Reactome) for maximum coverage
Custom configuration: users can select any combination of the four databases.
Interactive UI: supports live edit, input validation, and exception handling to prevent task failures due to misconfiguration.
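As a rough illustration of how the presets and custom selection could be resolved and validated, here is a minimal Python sketch. The database names and preset contents come from this post; `PRESETS`, `KNOWN_SOURCES`, and `resolve_sources` are hypothetical names chosen for the example, not the project's actual API.

```python
# Database names are taken from the post; all identifiers here are illustrative.
KNOWN_SOURCES = ("BiGG", "KEGG", "SEED", "Reactome")

PRESETS = {
    "fast": ("BiGG",),
    "balanced": ("BiGG", "KEGG"),
    "comprehensive": KNOWN_SOURCES,
}


def resolve_sources(preset=None, custom=None):
    """Return the ordered tuple of databases to query.

    Pass either a preset name or a custom list of database names.
    Raises ValueError on misconfiguration so the problem surfaces
    before any annotation task starts.
    """
    if (preset is None) == (custom is None):
        raise ValueError("specify exactly one of preset or custom")

    if preset is not None:
        key = preset.strip().lower()
        if key not in PRESETS:
            raise ValueError(f"unknown preset: {preset!r}")
        return PRESETS[key]

    # Custom selection: match names case-insensitively,
    # drop duplicates, and keep the user's order.
    canonical = {name.lower(): name for name in KNOWN_SOURCES}
    selected = []
    for name in custom:
        key = name.strip().lower()
        if key not in canonical:
            raise ValueError(f"unknown data source: {name!r}")
        if canonical[key] not in selected:
            selected.append(canonical[key])
    if not selected:
        raise ValueError("custom configuration must name at least one source")
    return tuple(selected)
```

A UI layer could call `resolve_sources` on every edit and show the `ValueError` message inline instead of letting a bad configuration reach the pipeline.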
2) Target SBO Term Tracking
- Maintain a set of parent SBO terms to identify reactions that require further LLM processing.
- Monitor reactions in-pipeline during annotation; when a reaction matches a target term, immediately log the SBO term, the list of EC numbers, and the annotation source/adapter to enable later aggregation and manual review.
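The tracking step can be sketched roughly as follows. This is only an illustration under assumptions: `SboTracker`, `TargetHit`, and the `observe` method are hypothetical names, and the SBO ID in `TARGET_SBO_TERMS` is a placeholder rather than the project's real target set.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("sbo_tracking")

# Placeholder ID; the real set of parent SBO terms is project-specific.
TARGET_SBO_TERMS = {"SBO:0000176"}


@dataclass
class TargetHit:
    reaction_id: str
    sbo_term: str
    ec_numbers: list
    source: str  # annotation source/adapter that produced the match


@dataclass
class SboTracker:
    targets: set = field(default_factory=lambda: set(TARGET_SBO_TERMS))
    hits: list = field(default_factory=list)

    def observe(self, reaction_id, sbo_term, ec_numbers, source):
        """Record and log the reaction if its SBO term is a target term."""
        if sbo_term not in self.targets:
            return False
        hit = TargetHit(reaction_id, sbo_term, list(ec_numbers), source)
        self.hits.append(hit)
        logger.info(
            "target SBO hit: reaction=%s sbo=%s ec=%s source=%s",
            reaction_id, sbo_term, hit.ec_numbers, source,
        )
        return True

    def by_term(self):
        """Group recorded hits by SBO term for aggregation and review."""
        grouped = {}
        for hit in self.hits:
            grouped.setdefault(hit.sbo_term, []).append(hit)
        return grouped
```

Calling `observe` once per annotated reaction keeps the monitoring inside the pipeline, while `by_term` gives a simple grouped view for manual review afterwards.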
3) Smart Reaction Filtering
- Single EC number: send directly to the LLM for fine-grained classification.
- Multiple EC numbers: check whether the first digit (the top-level enzyme class) is the same across all of them:
  - If consistent → keep the reaction and generate a unified EC prefix identifier.
  - If inconsistent → treat as an EC number conflict and filter the reaction out to avoid ambiguity for the LLM.
- Goal: resolve as much ambiguity as possible before a reaction reaches the LLM, improving final annotation precision and interpretability.
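The filtering rules above can be sketched as a small Python function. The function names and the `"send"`/`"keep"`/`"drop"` labels are hypothetical. The exact form of the unified prefix is also an assumption: here it is the longest shared leading part of the EC numbers, padded with `-`. Treating duplicate EC numbers as one and rejecting an empty list are choices made for this example only.

```python
def unified_prefix(ec_numbers):
    """Longest shared leading EC fields, padded with '-' to four levels."""
    parts = [ec.split(".") for ec in ec_numbers]
    shared = []
    for level in zip(*parts):
        if len(set(level)) != 1 or level[0] == "-":
            break
        shared.append(level[0])
    return ".".join(shared + ["-"] * (4 - len(shared)))


def classify_ec_numbers(ec_numbers):
    """Return (decision, ec) for a reaction's EC numbers.

    "send": single EC number, goes straight to the LLM.
    "keep": several EC numbers sharing the first digit; ec is the prefix.
    "drop": first digits disagree, so the reaction is filtered out.
    """
    # Assumption: exact duplicates count as a single EC number.
    unique = list(dict.fromkeys(ec.strip() for ec in ec_numbers if ec.strip()))
    if not unique:
        # The post does not cover reactions without EC numbers.
        raise ValueError("no EC numbers supplied")
    if len(unique) == 1:
        return "send", unique[0]
    first_digits = {ec.split(".")[0] for ec in unique}
    if len(first_digits) == 1:
        return "keep", unified_prefix(unique)
    return "drop", None


print(classify_ec_numbers(["2.7.1.1", "2.7.1.2"]))  # ('keep', '2.7.1.-')
print(classify_ec_numbers(["1.1.1.1", "3.1.3.2"]))  # ('drop', None)
```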