GSoC Week 1
Hi,
This week focused on the design and implementation of a dynamic update mechanism for SBO (Systems Biology Ontology) data.
Key Decisions and Progress
1. Data Update Strategy
- Decision: On application startup → Check for updates → Download new data → Application becomes available
- Reason: Prioritize data consistency
Status: ✅ Implemented and Pull Request submitted
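
The startup flow described above could be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `ensure_latest_data` and the three callables it takes are hypothetical names standing in for the real remote lookup, local lookup, and download steps.

```python
from typing import Callable, Optional


def ensure_latest_data(
    get_remote_time: Callable[[], str],
    get_local_time: Callable[[], Optional[str]],
    download: Callable[[], None],
) -> bool:
    """Run the startup check; return True if new data was downloaded.

    All three callables are hypothetical stand-ins, not names from the PR.
    """
    remote = get_remote_time()
    local = get_local_time()
    # No local copy yet, or the remote timestamp differs: refresh first.
    if local is None or local != remote:
        download()
        return True
    return False
```

In this sketch the application would continue its normal startup only after the function returns, which is what puts data consistency ahead of startup speed.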
2. User Control Over Updates
- Draeger’s Suggestion: Allow users to skip updates to support reproducibility of specific versions
- Nantia’s Suggestion: Enforce updates to ensure data consistency
Status: To be discussed and decided in Wednesday's meeting
3. Data Version Management Strategy
- System Version History Management
  - Strategy 1: Retain only the latest version to save storage
  - Strategy 2: Retain up to 3 versions (2 historical + 1 latest)
  Status: To be discussed and decided in Wednesday's meeting
- User Import History Management
  - Strategy: Whether to support manual loading of local OLS files (suggested by Draeger)
  Status: To be discussed and decided in Wednesday's meeting
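
If the "retain up to 3 versions" strategy were adopted, selecting the files to prune could look something like the sketch below. It assumes each version is stored as one file whose name embeds a sortable timestamp (for example `sbo_20240601T120000.json`). That naming pattern, the function name `versions_to_delete`, and the `keep` parameter are illustrative assumptions, not decisions from the meeting.

```python
from typing import Iterable, List


def versions_to_delete(filenames: Iterable[str], keep: int = 3) -> List[str]:
    """Return the file names that fall outside the `keep` newest versions.

    Assumes names sort chronologically (hypothetical naming scheme).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(filenames)
    # Everything except the last `keep` entries is considered stale.
    return ordered[:-keep]
```

The caller would then delete the returned files. With `keep=1` the same function covers the "latest version only" strategy.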
Technical Implementation Progress
Core Feature Development
- Pull Request: Add dynamic SBO fetching from OLS API
Key Functions Implemented
- Version Checking & Update Detection
  - get_remote_updated_time(): Fetch the latest "updated" timestamp from the OLS API
  - get_local_updated_time(): Retrieve the local SBO "updated" timestamp
  - compare_timestamps(): Determine whether an update is required
- Data Download & Processing
  - Full SBO ontology download
  - Paginated term retrieval (100 per page)
  - Fetch parent-child hierarchy relations
  - API Rate Control: 0.05-second delay between requests
- Structured Data Storage
  - Metadata: Update time, version, total term count
  - Term Details: Description, label, IRI, OBO ID, parent link, parent label, parent OBO ID
  - Statistical Summary
- Intelligent File Management
  - Auto-naming based on the "updated" timestamp
  - Supports co-existence of multiple versions
  - Incremental update support
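
The version check and the timestamp-based file naming from the list above can be illustrated with a small sketch. The name `compare_timestamps` comes from the list, but its signature and body here are guesses, and `versioned_filename` is a hypothetical helper. The sketch also assumes the "updated" value is an ISO 8601 string and that both timestamps use the same format, which should be verified against the actual OLS response.

```python
from datetime import datetime
from typing import Optional


def _parse(ts: str) -> datetime:
    # Older Python versions reject a trailing "Z", so normalize it first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def compare_timestamps(remote: str, local: Optional[str]) -> bool:
    """Return True when an update is required (assumed semantics)."""
    if local is None:
        # No local data yet, so a download is always needed.
        return True
    return _parse(remote) > _parse(local)


def versioned_filename(updated: str, prefix: str = "sbo") -> str:
    """Build a file name from the "updated" timestamp (hypothetical scheme)."""
    stamp = _parse(updated).strftime("%Y%m%dT%H%M%S")
    return f"{prefix}_{stamp}.json"
```

Because the timestamp is part of the file name, two downloads with different "updated" values never overwrite each other, which is what allows multiple versions to coexist.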
Technical Highlights
- Incremental Updates: Download only when an update is detected
- Fault Tolerance: Detailed error handling and logging
- API Friendliness: Requests are throttled to prevent OLS server overload
- Data Integrity: Full hierarchical structure information is retrieved
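
Paginated retrieval with a fixed delay between requests might be structured like this. The page fetcher is passed in as a callable so the sketch makes no claims about the OLS response format. `fetch_all_terms`, `fetch_page`, and the `(terms, total_pages)` return shape are assumptions for illustration. The page size of 100 and the 0.05-second delay are the values mentioned above.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Term = Dict[str, Any]


def fetch_all_terms(
    fetch_page: Callable[[int, int], Tuple[List[Term], int]],
    page_size: int = 100,
    delay: float = 0.05,
) -> List[Term]:
    """Collect terms from every page, pausing between requests.

    `fetch_page(page, size)` is a hypothetical callable that returns
    (terms_on_page, total_pages).
    """
    terms: List[Term] = []
    page = 0
    while True:
        batch, total_pages = fetch_page(page, page_size)
        terms.extend(batch)
        page += 1
        if page >= total_pages:
            break
        # Short pause so the server is not hit with back-to-back requests.
        time.sleep(delay)
    return terms
```

Injecting the fetcher also makes the loop easy to test offline with a fake that slices an in-memory list.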