GSoC Week 2
Hi,
As discussed in the last post, I started working on my first milestone: fetching the SBO table from OLS into the SBO annotator. This post summarizes this week's meeting, along with the new features I added and the issues I worked on and fixed to add SBML L3V2 support. So, let's dive into the summary:
1. Change Detection Mechanism for SBO Terms
- ❓ Issue: The OLS API's `updated` field changed, but there are no observable differences in content.
- ✅ Observation:
  - Compared the OLS (API) and GitHub `.owl` versions: the content is identical.
  - The latest update timestamp on OLS is 2025, but the GitHub file was last modified in 2023.
  - New terms are already included in the GitHub version.
- ✅ Decision: We can rely on the GitHub `.owl` file for consistency and reduced request time.
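Since the `updated` timestamp turned out to be an unreliable signal, one way to detect real changes is to compare the content itself. Below is a minimal sketch of that idea, assuming the terms are available as a plain dictionary. The function name `content_fingerprint` and the snapshot data are made up for illustration and are not part of the annotator's code.

```python
import hashlib
import json


def content_fingerprint(terms: dict) -> str:
    """Return a SHA-256 digest of the term data, independent of key order."""
    canonical = json.dumps(terms, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative snapshots (made-up data): same terms, different "updated" value.
ols_snapshot = {
    "updated": "2025",
    "terms": {"SBO:0000001": {"label": "example term"}},
}
github_snapshot = {
    "updated": "2023",
    "terms": {"SBO:0000001": {"label": "example term"}},
}

# Only the term content is hashed, so the differing timestamps are ignored.
same_content = content_fingerprint(ols_snapshot["terms"]) == content_fingerprint(
    github_snapshot["terms"]
)
print("content changed:", not same_content)
```

Hashing a canonical JSON dump keeps the check cheap, and the stored digest can be compared on the next launch without keeping the full previous file in memory.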
2. Fetching ‘is_a’ / Parent Info Performance Issue
- ⏱️ Fetching `subclass_of` (parent info) via OLS takes ~10 minutes.
- ⚡ Other fields can be fetched in <1 minute.
- ✅ Optimization: Parent info should be pulled from the GitHub `.owl` file, not the OLS API, to avoid the delay.
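To illustrate why reading parents from the `.owl` file can be fast, here is a rough sketch that extracts `rdfs:subClassOf` links in a single local pass with Python's standard library. It assumes the file is plain RDF/XML with `owl:Class` elements directly under the root. The sample document, the `example.org` URIs, and the function name `parent_map` are placeholders, not the real SBO file or the annotator's code.

```python
import xml.etree.ElementTree as ET

# Standard OWL/RDF/RDFS namespaces used in RDF/XML serializations.
NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Tiny made-up ontology standing in for the downloaded .owl file.
SAMPLE_OWL = """<?xml version="1.0"?>
<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:about="http://example.org/SBO_0000001"/>
  <owl:Class rdf:about="http://example.org/SBO_0000002">
    <rdfs:subClassOf rdf:resource="http://example.org/SBO_0000001"/>
  </owl:Class>
</rdf:RDF>
"""


def parent_map(owl_xml: str) -> dict:
    """Map each class URI to the list of its direct parent URIs."""
    root = ET.fromstring(owl_xml)
    parents = {}
    for cls in root.findall("owl:Class", NS):
        term = cls.get(RDF + "about")
        if term is None:
            continue  # skip anonymous classes
        parents[term] = [
            sub.get(RDF + "resource")
            for sub in cls.findall("rdfs:subClassOf", NS)
            if sub.get(RDF + "resource") is not None
        ]
    return parents


for term, term_parents in parent_map(SAMPLE_OWL).items():
    print(term, "->", term_parents)
```

Because everything is resolved from one file already on disk, there is no per-term network request, which is where the OLS route appears to lose its time.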
3. Difference Detection Strategy
- ❓ How to compare the current vs. previous SBO graph?
- ✅ Two options discussed:
  - Hardcoded comparison
  - Use `git diff` or existing Python libraries
- ✅ Preferred approach: Use whatever is time-efficient. No need to overengineer.
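In the spirit of not overengineering, the comparison can be sketched with plain set operations on two dictionaries keyed by term ID. This is only one possible shape for it, assuming each version is loaded as `{term_id: fields}`. The function name `diff_terms` and the IDs below are hypothetical.

```python
def diff_terms(previous: dict, current: dict) -> dict:
    """Classify term IDs as added, removed, or updated between two versions."""
    prev_ids, curr_ids = set(previous), set(current)
    return {
        "added": sorted(curr_ids - prev_ids),
        "removed": sorted(prev_ids - curr_ids),
        # A term counts as updated when it exists in both but its fields differ.
        "updated": sorted(
            tid for tid in prev_ids & curr_ids if previous[tid] != current[tid]
        ),
    }


# Made-up example data with placeholder IDs.
previous = {
    "SBO:9000001": {"label": "old label"},
    "SBO:9000002": {"label": "unchanged"},
    "SBO:9000003": {"label": "to be removed"},
}
current = {
    "SBO:9000001": {"label": "new label"},
    "SBO:9000002": {"label": "unchanged"},
    "SBO:9000004": {"label": "brand new"},
}

print(diff_terms(previous, current))
```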
4. Change Logging Mechanism
- Any changes (added, removed, updated terms) should be logged to a `.log` file or a JSON diff report.
- Fields to include in logs:
  - `description`, `obo id`, `label`, `parent id`, `parent label`
- For CLI: show a summary like:
  - `+5 terms added`, `-2 terms removed`, `~3 terms updated`
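The logging idea above could look roughly like the sketch below, which writes a JSON diff report and builds the one-line CLI summary. The underscore key names (`obo_id`, `parent_id`, `parent_label`), the report layout, and the file name `sbo_diff_report.json` are my assumptions for illustration. The final format is not decided yet.

```python
import json
import tempfile
from pathlib import Path

# Fields agreed on for the log; the underscore spelling is an assumption.
LOG_FIELDS = ("description", "obo_id", "label", "parent_id", "parent_label")


def log_record(term: dict) -> dict:
    """Keep only the agreed log fields; missing ones become None."""
    return {field: term.get(field) for field in LOG_FIELDS}


def cli_summary(diff: dict) -> str:
    """Build the short summary line shown on the command line."""
    return (
        f"+{len(diff['added'])} terms added, "
        f"-{len(diff['removed'])} terms removed, "
        f"~{len(diff['updated'])} terms updated"
    )


def write_report(diff: dict, path: Path) -> None:
    """Write the diff as a JSON report with one list per change type."""
    report = {kind: [log_record(t) for t in terms] for kind, terms in diff.items()}
    path.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")


# Made-up diff with a single added term.
example_diff = {
    "added": [{"obo_id": "SBO:9000004", "label": "brand new", "internal": "dropped"}],
    "removed": [],
    "updated": [],
}

with tempfile.TemporaryDirectory() as tmp:
    report_path = Path(tmp) / "sbo_diff_report.json"
    write_report(example_diff, report_path)
    saved = json.loads(report_path.read_text(encoding="utf-8"))

print(cli_summary(example_diff))
```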
5. User Control over Updates
- Draeger: Users should be able to skip updates (for reproducibility).
- Nantia: Updates should be enforced (for consistency).
- ✅ Compromise Strategy:
  - At app launch, check for updates and prompt the user:
    - (A) Use latest
    - (B) Use previously downloaded version
    - (C) Load local `.json` file manually
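The launch-time prompt from the compromise strategy might be sketched as follows. The enum `UpdateChoice`, the function `prompt_update_choice`, and the prompt wording are hypothetical. The real annotator may well present this through its own UI rather than a text prompt.

```python
from enum import Enum


class UpdateChoice(Enum):
    USE_LATEST = "A"
    USE_PREVIOUS = "B"
    LOAD_LOCAL_JSON = "C"


def prompt_update_choice(ask=input) -> UpdateChoice:
    """Ask until the user picks A, B, or C; `ask` is injectable for testing."""
    prompt = (
        "An SBO update is available.\n"
        "(A) Use latest\n"
        "(B) Use previously downloaded version\n"
        "(C) Load local .json file manually\n"
        "Choice: "
    )
    while True:
        answer = ask(prompt).strip().upper()
        try:
            return UpdateChoice(answer)
        except ValueError:
            print(f"Unrecognized option {answer!r}; please enter A, B, or C.")


# Scripted answers stand in for a real user; call with no argument to use input().
scripted = iter(["x", "b"])
choice = prompt_update_choice(lambda _prompt: next(scripted))
print(choice)
```

Keeping option (B) available is what preserves reproducibility: a user can pin an analysis to the version they already have even when a newer one exists.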
6. Version Management Strategy
6.1. System Version Storage
- Option 1: Only keep latest (space-saving)
- Option 2: Keep latest + 1 previous version (balance)
- ✅ Final: Keep 2 versions per run:
  - The one at app launch
  - The newly updated one (if changes detected)
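One simple way to approximate the two-version rule is to prune the data directory down to the two newest snapshots after each update. The sketch below assumes snapshot files follow a `sbo_data_updated_<timestamp>.json` naming pattern whose timestamps sort correctly as text. The function name `prune_snapshots` and the example file names are made up.

```python
import tempfile
from pathlib import Path


def prune_snapshots(directory: Path, keep: int = 2) -> list:
    """Delete all but the `keep` newest snapshots; return the removed names."""
    # Assumption: timestamps in the file names sort chronologically as strings.
    snapshots = sorted(directory.glob("sbo_data_updated_*.json"))
    stale = snapshots[:-keep] if keep > 0 else snapshots
    for path in stale:
        path.unlink()
    return [p.name for p in stale]


# Demonstration in a throwaway directory with placeholder timestamps.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    for stamp in ("2025-01-01T00-00-00", "2025-02-01T00-00-00", "2025-03-01T00-00-00"):
        (data_dir / f"sbo_data_updated_{stamp}.json").write_text("{}", encoding="utf-8")

    removed = prune_snapshots(data_dir)
    remaining = sorted(p.name for p in data_dir.iterdir())

print("removed:", removed)
print("remaining:", remaining)
```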
6.2. User-imported Version Support
- ✅ Support for manually importing `.json` or `.owl` files recommended (per Draeger)
7. File Reference
- Current active file in system: `sbo_data_updated_2025-06-17T17-50-27-631081774.json` → This is converted from OLS
- For performance: will switch to the GitHub version to reduce API latency