GSoC Week 2

Hi,

As discussed in the last blog post, I started working on my first milestone: fetching the SBO table from OLS into the SBO annotator. This post summarizes this week's meeting, along with the new features I added and the issues I worked on and fixed to add SBML L3V2 support. So, let's dive into the summary:

1. Change Detection Mechanism for SBO Terms

  • Issue: The OLS API's updated field changed, but the content shows no observable differences.

  • Observation:

    • Compared OLS (API) and GitHub .owl versions – content is identical.

    • Latest update timestamp on OLS is 2025, but GitHub file was last modified in 2023.

    • New terms are already included in the GitHub version.

  • Decision: We can rely on GitHub .owl file for consistency and reduced request time.
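
Since the updated timestamp can change while the content stays the same, one way to detect real changes is to compare a hash of the term data instead of the timestamp. The following is only a minimal sketch of that idea, not the annotator's actual code: the function names and the sample dictionaries are hypothetical.

```python
import hashlib
import json


def content_fingerprint(terms):
    """Return a SHA-256 digest of the term data that ignores key order."""
    # sort_keys makes the serialization stable regardless of dict ordering.
    canonical = json.dumps(terms, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def content_changed(old_terms, new_terms):
    """True only if the term data differs, whatever the timestamps say."""
    return content_fingerprint(old_terms) != content_fingerprint(new_terms)


# Hypothetical snapshots: identical terms listed in a different order.
github_terms = {
    "SBO:0000001": {"label": "example term A"},
    "SBO:0000002": {"label": "example term B"},
}
ols_terms = {
    "SBO:0000002": {"label": "example term B"},
    "SBO:0000001": {"label": "example term A"},
}
# Hypothetical snapshot with one edited label.
edited_terms = {
    "SBO:0000001": {"label": "example term A (renamed)"},
    "SBO:0000002": {"label": "example term B"},
}

print(content_changed(github_terms, ols_terms))
print(content_changed(github_terms, edited_terms))
```

With a check like this, a bumped timestamp alone would not trigger an update.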


2. Fetching ‘is_a’ / Parent Info Performance Issue

  • ⏱️ Fetching subclass_of (parent info) via OLS takes ~10 minutes.

  • ⚡ Other fields can be fetched in <1 minute.

  • Optimization: Parent info should be pulled from GitHub .owl, not OLS API, to avoid delay.
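
To illustrate why reading parents from the .owl file is fast, here is a rough sketch that extracts subclass relations in a single pass with Python's standard XML parser. It assumes the file is RDF/XML with owl:Class elements that carry rdfs:subClassOf children pointing to the parent via rdf:resource. The sample document, its example.org IRIs, and the function name are made up for illustration, and the real file may need extra handling (for example, restrictions nested inside subClassOf).

```python
import xml.etree.ElementTree as ET

# Standard OWL/RDF/RDFS namespace URIs.
NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical miniature ontology used only for this example.
SAMPLE_OWL = """<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:about="http://example.org/SBO_0000001">
    <rdfs:subClassOf rdf:resource="http://example.org/SBO_0000064"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/SBO_0000064"/>
</rdf:RDF>
"""


def parse_parents(owl_text):
    """Map each class IRI to the list of its direct parent IRIs."""
    root = ET.fromstring(owl_text)
    parents = {}
    for cls in root.findall("owl:Class", NS):
        term = cls.get(RDF_ABOUT)
        if term is None:
            continue  # skip anonymous classes
        parents[term] = [
            sub.get(RDF_RESOURCE)
            for sub in cls.findall("rdfs:subClassOf", NS)
            if sub.get(RDF_RESOURCE) is not None
        ]
    return parents


parent_map = parse_parents(SAMPLE_OWL)
print(parent_map)
```

Everything is read from one local file, so no per-term API request is needed.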


3. Difference Detection Strategy

  • How to compare current vs previous SBO graph?

  • ✅ Two options discussed:

    • Hardcoded comparison

    • Use git diff or existing libraries in Python

  • ✅ Preferred approach: Use whatever is time-efficient. No need to overengineer.
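
As an example of a simple, time-efficient comparison, the two SBO snapshots can be treated as dictionaries keyed by term ID and compared with set operations. This is a sketch under that assumption; the function name, the snapshot layout, and the sample data are hypothetical.

```python
def diff_terms(previous, current):
    """Compare two {term_id: term_data} snapshots.

    Returns the sorted IDs of added, removed, and updated terms.
    """
    prev_ids, curr_ids = set(previous), set(current)
    return {
        "added": sorted(curr_ids - prev_ids),
        "removed": sorted(prev_ids - curr_ids),
        # A term counts as updated if it exists in both and its data differs.
        "updated": sorted(
            term_id
            for term_id in prev_ids & curr_ids
            if previous[term_id] != current[term_id]
        ),
    }


# Hypothetical snapshots for illustration.
previous = {
    "SBO:0000001": {"label": "old label"},
    "SBO:0000002": {"label": "unchanged"},
    "SBO:0000003": {"label": "to be removed"},
}
current = {
    "SBO:0000001": {"label": "new label"},
    "SBO:0000002": {"label": "unchanged"},
    "SBO:0000004": {"label": "brand new"},
}

changes = diff_terms(previous, current)
print(changes)
```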


4. Change Logging Mechanism

  • Any changes (added, removed, updated terms) should be logged to a .log or JSON diff report.

  • Fields to include in logs:

    • description, obo id, label, parent id, parent label

  • For CLI: show summary like:

    • +5 terms added, -2 terms removed, ~3 terms updated
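
A possible shape for the JSON diff report and the CLI summary is sketched below. It assumes the changes are already grouped into added, removed, and updated lists whose entries carry the fields listed above; the key names (such as obo_id and parent_id) and the function names are my own placeholders, not a fixed format.

```python
import json


def format_summary(report):
    """Build the one-line CLI summary from a change report."""
    return "+{} terms added, -{} terms removed, ~{} terms updated".format(
        len(report["added"]), len(report["removed"]), len(report["updated"])
    )


def render_report(report):
    """Serialize the change report as readable JSON text."""
    return json.dumps(report, indent=2, sort_keys=True)


# Hypothetical report with the fields that should be logged per term.
sample_report = {
    "added": [
        {
            "obo_id": "SBO:0000004",
            "label": "example term",
            "description": "example description",
            "parent_id": "SBO:0000001",
            "parent_label": "example parent",
        }
    ],
    "removed": [],
    "updated": [],
}

summary_line = format_summary(sample_report)
report_text = render_report(sample_report)
print(summary_line)
print(report_text)
```

The rendered text can then be written to a .log or .json file by the caller.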


5. User Control over Updates

  • Draeger: Users should be able to skip updates (for reproducibility).

  • Nantia: Updates should be enforced (for consistency).

  • Compromise Strategy:

    • At app launch, check for updates and prompt user:

      • (A) Use latest

      • (B) Use previously downloaded version

      • (C) Load local .json file manually
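
The launch prompt could look roughly like the sketch below. The function name, the returned labels, and the prompt text are hypothetical. Falling back to the previously downloaded version when no update is found is my own assumption, not something decided in the meeting. The input function is passed in as a parameter so the logic can be tested without a terminal.

```python
# Maps the user's answer to a hypothetical internal source label.
CHOICES = {"A": "latest", "B": "previous", "C": "local_file"}

PROMPT = (
    "A newer SBO version is available.\n"
    "(A) Use latest  (B) Use previously downloaded version  "
    "(C) Load local .json file manually\n"
    "Choice: "
)


def choose_sbo_source(update_available, ask=input):
    """Return which SBO source to use, asking the user if an update exists."""
    if not update_available:
        # Assumption: with no update, keep using the stored version.
        return "previous"
    while True:
        answer = ask(PROMPT).strip().upper()
        if answer in CHOICES:
            return CHOICES[answer]
        print("Please enter A, B, or C.")


# At app launch this would be called as: choose_sbo_source(update_available=True)
```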


6. Version Management Strategy

6.1. System Version Storage

  • Option 1: Only keep latest (space-saving)

  • Option 2: Keep latest + 1 previous version (balance)

  • Final: Keep 2 versions per run:

    • The one at app launch

    • The newly updated one (if changes detected)
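
One simple way to approximate the two-version rule is to keep only the two most recent snapshot files and delete the rest. The sketch below assumes that file names embed a sortable timestamp after a sbo_data_updated_ prefix, so sorting by name also sorts by age; the function names and the sample file names are hypothetical.

```python
from pathlib import Path


def versions_to_delete(filenames, keep=2):
    """Return the file names to remove so that only the newest `keep` remain."""
    ordered = sorted(filenames)  # oldest first, assuming sortable timestamps
    if keep <= 0:
        return ordered
    return ordered[:-keep]


def prune_versions(directory, keep=2):
    """Delete old snapshot files in `directory`, keeping the newest `keep`."""
    folder = Path(directory)
    names = [p.name for p in folder.glob("sbo_data_updated_*.json")]
    for name in versions_to_delete(names, keep):
        (folder / name).unlink()


# Hypothetical file names for illustration.
stored = [
    "sbo_data_updated_2025-06-17T17-50-27.json",
    "sbo_data_updated_2025-06-10T09-00-00.json",
    "sbo_data_updated_2025-06-03T12-30-00.json",
]
print(versions_to_delete(stored))
```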

6.2. User-imported Version Support

  • Support for manually importing .json or .owl files is recommended (per Draeger)


7. File Reference

  • Current active file in system:
    sbo_data_updated_2025-06-17T17-50-27-631081774.json
    → This is converted from OLS

  • For performance: will switch to GitHub version to reduce API latency



The complete code is available in the feature/ols/add_dynamic_sbo_fetching branch of my forked repository, linked here.
