GSoC Week 9

September 07, 2025

This week I shipped three production upgrades covering data-source query strategy and annotation quality control:

Config-driven query strategy management
Target SBO term tracking
Smart reaction filtering to reduce LLM ambiguity

1) Config-Driven Query Strategy Management

Presets

Fast: BiGG only for the fastest response
Balanced: BiGG + KEGG for a balance of recall and latency
Comprehensive: all sources (BiGG/KEGG/SEED/Reactome) for maximum coverage

Custom configuration: users can combine any subset of the four databases (BiGG, KEGG, SEED, Reactome).
Interactive UI: supports live editing, input validation, and exception handling, so that a misconfiguration is caught before it can fail a task.
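
The preset and custom options above can be sketched as a small lookup plus a validation step. This is a minimal illustration, not the project's actual code: the names KNOWN_SOURCES, PRESETS, ConfigError, and resolve_sources are hypothetical, and the choice to normalize a custom selection into a fixed order is an assumption made for the example.

```python
# Database names come from the post; every identifier below is hypothetical.
KNOWN_SOURCES = ("BiGG", "KEGG", "SEED", "Reactome")

PRESETS = {
    "fast": ("BiGG",),
    "balanced": ("BiGG", "KEGG"),
    "comprehensive": KNOWN_SOURCES,
}


class ConfigError(ValueError):
    """Raised when a query strategy is misconfigured."""


def resolve_sources(preset=None, custom=None):
    """Return the tuple of data sources to query.

    A custom selection takes priority over a preset (an assumption).
    """
    if custom is not None:
        if not custom:
            raise ConfigError("custom configuration must list at least one source")
        unknown = [name for name in custom if name not in KNOWN_SOURCES]
        if unknown:
            raise ConfigError(f"unknown source(s): {unknown}")
        # Drop duplicates and normalize to the order of KNOWN_SOURCES.
        return tuple(name for name in KNOWN_SOURCES if name in custom)

    if preset is None:
        raise ConfigError("either a preset or a custom selection is required")
    key = preset.lower()
    if key not in PRESETS:
        raise ConfigError(f"unknown preset: {preset!r}")
    return PRESETS[key]
```

Raising a dedicated exception type lets a UI layer catch configuration mistakes and show a message instead of letting the annotation task fail midway.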

2) Target SBO Term Tracking

Maintain a set of parent SBO terms that identify reactions requiring further LLM processing.
Monitor the pipeline during annotation: as soon as a reaction matches a target term, log its details (the SBO term, the list of EC numbers, and the annotation source/adapter) so that matches can be aggregated and manually reviewed later.
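
A rough sketch of this tracking step, assuming the pipeline can report each reaction's SBO term, EC numbers, and source as it annotates. The class and field names (TargetSBOTracker, TrackedHit, observe, by_term) are hypothetical, and the SBO identifiers in the usage example are example values only; the post does not list the actual target set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedHit:
    """One reaction that matched a target SBO term (hypothetical record)."""
    reaction_id: str
    sbo_term: str
    ec_numbers: tuple
    source: str


class TargetSBOTracker:
    """Collects reactions whose SBO term is in the target set."""

    def __init__(self, target_terms):
        self.target_terms = frozenset(target_terms)
        self.hits = []

    def observe(self, reaction_id, sbo_term, ec_numbers, source):
        """Record the reaction if it matches; return True when it was recorded."""
        if sbo_term not in self.target_terms:
            return False
        self.hits.append(
            TrackedHit(reaction_id, sbo_term, tuple(ec_numbers), source)
        )
        return True

    def by_term(self):
        """Group recorded hits by SBO term for later aggregation and review."""
        grouped = defaultdict(list)
        for hit in self.hits:
            grouped[hit.sbo_term].append(hit)
        return dict(grouped)


# Usage with example values only:
tracker = TargetSBOTracker({"SBO:0000176"})
tracker.observe("R_HEX1", "SBO:0000176", ["2.7.1.1", "2.7.1.2"], "KEGG")
tracker.observe("R_EX_glc", "SBO:0000627", [], "BiGG")  # not a target, ignored
```

Keeping the hits as structured records rather than free-text log lines makes the later aggregation step a simple grouping operation.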

3) Smart Reaction Filtering

Single EC number: send directly to the LLM for fine-grained classification.
Multiple EC numbers: check whether all of them share the same first digit (the top-level enzyme class):
- If consistent → keep and generate a unified EC prefix identifier.
- If inconsistent → treat as an EC number conflict and filter it out to avoid ambiguity for the LLM.

Goal: resolve as much ambiguity as possible before a reaction reaches the LLM, improving the precision and interpretability of the final annotation.
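
The filtering rules above can be sketched as a single function. This is an illustration under stated assumptions: the function name classify_ec_numbers and the decision labels are hypothetical, the "N.-.-.-" form of the unified prefix is an assumed format (the post does not specify one), and the handling of reactions with no EC number is also an assumption.

```python
def classify_ec_numbers(ec_numbers):
    """Return (decision, ec_value) for a reaction's EC numbers.

    decision is one of "no_ec", "send_to_llm", "keep", "filtered_conflict".
    """
    cleaned = [ec.strip() for ec in ec_numbers if ec and ec.strip()]

    if not cleaned:
        # Assumption: the post does not say what happens without EC numbers.
        return ("no_ec", None)

    if len(cleaned) == 1:
        # Single EC number: pass it straight to the LLM.
        return ("send_to_llm", cleaned[0])

    # Multiple EC numbers: compare the first digit (top-level enzyme class).
    first_digits = {ec.split(".")[0] for ec in cleaned}
    if len(first_digits) == 1:
        # Consistent: build a unified prefix (assumed "N.-.-.-" format).
        return ("keep", f"{first_digits.pop()}.-.-.-")

    # Inconsistent: EC number conflict, filtered out before the LLM.
    return ("filtered_conflict", None)
```

For example, a reaction annotated with 1.1.1.1 and 1.2.3.4 is kept with the prefix 1.-.-.-, while one annotated with 1.1.1.1 and 2.7.1.1 is filtered out as a conflict.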

Gsoc2025: Enhancing SBOannotator with LLM Integration & Dynamic Term Retrieval