GSoC Week 1

Hi,
This week focused on the design and implementation of a dynamic update mechanism for SBO (Systems Biology Ontology) data.


Key Decisions and Progress

1. Data Update Strategy

  • Decision: On application startup → Check for updates → Download new data → Application becomes available

  • Reason: Prioritize data consistency

  • Status: ✅ Implemented and Pull Request submitted

2. User Control Over Updates

  • Draeger’s Suggestion: Allow users to skip updates to support reproducibility of specific versions

  • Nantia’s Suggestion: Enforce updates to ensure data consistency

  • Status: To be discussed and decided in Wednesday's meeting

3. Data Version Management Strategy

  • System Version History Management

    • Strategy 1: Retain only the latest version to save storage

    • Strategy 2: Retain up to 3 versions (2 historical + 1 latest)

    • Status: To be discussed and decided in Wednesday's meeting

  • User Import History Management

    • Strategy: Whether to support manual loading of local OLS files (suggested by Draeger)

    • Status: To be discussed and decided in Wednesday's meeting
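
To make the two retention strategies concrete, here is a minimal sketch of how old versions could be pruned. It is not the code from the pull request: the function name `prune_old_versions`, the `sbo_*.json` file pattern, and the assumption that file names sort chronologically (because they embed the "updated" timestamp) are all hypothetical.

```python
from pathlib import Path
from typing import List

MAX_VERSIONS = 3  # Strategy 2: 2 historical + 1 latest; use 1 for Strategy 1


def prune_old_versions(data_dir: Path, keep: int = MAX_VERSIONS) -> List[Path]:
    """Delete all but the `keep` newest SBO data files and return the removed paths.

    Assumption: files are named sbo_<timestamp>.json, so sorting by name
    also sorts by age (hypothetical naming scheme).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Newest first, because later timestamps sort after earlier ones.
    files = sorted(data_dir.glob("sbo_*.json"), key=lambda p: p.name, reverse=True)
    removed = files[keep:]
    for path in removed:
        path.unlink()
    return removed
```

Calling `prune_old_versions(data_dir, keep=1)` would correspond to Strategy 1, and the default `keep=3` to Strategy 2, so the decision from the meeting would only change one constant.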


Technical Implementation Progress

Core Feature Development

Key Functions Implemented

  • Version Checking & Update Detection

    • get_remote_updated_time(): Fetch the latest "updated" timestamp from the OLS API

    • get_local_updated_time(): Retrieve the local SBO "updated" timestamp

    • compare_timestamps(): Determine whether an update is required

  • Data Download & Processing

    • Full SBO ontology download

    • Paginated term retrieval (100 per page)

    • Fetch parent-child hierarchy relations

    • API Rate Control: 0.05-second delay between requests

  • Structured Data Storage

    • Metadata: Update time, version, total term count

    • Term Details: Description, label, IRI, OBO ID, parent link, parent label, parent OBO ID

    • Statistical Summary

  • Intelligent File Management

    • Auto-naming based on the "updated" timestamp

    • Supports co-existence of multiple versions

    • Incremental update support
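
The functions listed above can be sketched roughly as follows. This is a simplified illustration, not the submitted implementation: only `compare_timestamps()` is a name from the list, while `fetch_all_terms`, `versioned_filename`, the `fetch_page` callable, and the `sbo_<timestamp>.json` naming are hypothetical. It also assumes that timestamps are ISO 8601 strings in one consistent format, and that each page of the response carries its terms under `_embedded.terms` and the page count under `page.totalPages`.

```python
import time
from datetime import datetime
from typing import Callable, Dict, List, Optional

PAGE_SIZE = 100       # terms per page, as described above
REQUEST_DELAY = 0.05  # seconds between requests, as described above


def compare_timestamps(local: Optional[str], remote: str) -> bool:
    """Return True when an update is required."""
    if local is None:  # no local copy yet, so a download is needed
        return True
    # Assumption: both values are ISO 8601 strings in the same format.
    return datetime.fromisoformat(remote) > datetime.fromisoformat(local)


def versioned_filename(updated: str) -> str:
    """Build a file name from the "updated" timestamp (hypothetical scheme)."""
    # Colons are not allowed in file names on some platforms.
    return "sbo_" + updated.replace(":", "-") + ".json"


def fetch_all_terms(fetch_page: Callable[[int, int], Dict],
                    delay: float = REQUEST_DELAY) -> List[Dict]:
    """Collect all terms page by page, pausing between requests.

    `fetch_page(page, size)` is a hypothetical callable that returns the
    decoded JSON of one page; the key layout below is an assumption.
    """
    terms: List[Dict] = []
    page = 0
    while True:
        payload = fetch_page(page, PAGE_SIZE)
        terms.extend(payload.get("_embedded", {}).get("terms", []))
        total_pages = payload.get("page", {}).get("totalPages", 1)
        page += 1
        if page >= total_pages:
            break
        time.sleep(delay)  # rate control between consecutive requests
    return terms
```

Passing the page fetcher in as a parameter keeps the pagination and rate-control logic testable without network access; in the application, it would wrap the actual HTTP request to OLS.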


Technical Highlights

  • Incremental Updates: Download only when a newer "updated" timestamp is detected

  • Fault Tolerance: Detailed error handling and logging

  • API Friendliness: Prevent OLS server overload

  • Data Integrity: Full hierarchical structure information retrieved
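
As an illustration of how incremental updates and fault tolerance could fit together at startup, here is a minimal sketch. The function `update_if_needed` and its three callable parameters are hypothetical, and falling back to the existing local data when OLS is unreachable is an assumption of this sketch, not a decision reported above.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("sbo_update")


def update_if_needed(get_local: Callable[[], Optional[str]],
                     get_remote: Callable[[], str],
                     download: Callable[[str], None]) -> bool:
    """Download new data only when the remote timestamp differs from the local one.

    Returns True if new data was downloaded, False otherwise.
    All three callables are hypothetical stand-ins for the real functions.
    """
    try:
        remote = get_remote()
    except Exception as exc:  # broad on purpose: any failure keeps local data
        logger.warning("Could not check for updates, keeping local data: %s", exc)
        return False

    local = get_local()
    if local is not None and local == remote:
        logger.info("SBO data is up to date (%s)", local)
        return False

    try:
        download(remote)
    except Exception as exc:  # assumption: a failed download is not fatal
        logger.error("Download failed, keeping local data: %s", exc)
        return False

    logger.info("SBO data updated from %s to %s", local, remote)
    return True
```

The application would call this once during startup and become available afterwards, whether or not a download took place.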
