Free keywords:
ISMB2022, Bio-ontologies COSI, SPARQL, RDF, Linked Data, Semantic Web, Wikidata, UniProtKB, OMERO, openBIS, SEEK, Tripal
Abstract:
We have recently published an updated genome assembly and annotation of our model organism Pseudomonas fluorescens SBW25. We now face the challenge of keeping the annotation up to date with novel results from experimental and computational studies of gene function, fitness assays, and regulatory and metabolic networks in a continuous, transparent, and accessible manner. In this contribution we present how we combine various open-source software tools and open data and metadata standards into a public knowledge base for our model organism. The central part is our genome database and genome browser, which is based on the open-source framework Tripal. It allows internal and external colleagues to contribute their data and results in a curated fashion.
To further integrate our data, we are working on a Linked Data architecture that connects our genome database to various *omics databanks such as UniProt, KEGG or Rfam, as well as to internal data sources such as our microscopy image database, strain database, sequence data repository, and data sharing platform, to form an organism-specific knowledge graph. By exposing a public SPARQL endpoint, our data ultimately becomes part of the worldwide Semantic Web, which incorporates other domain-specific knowledge graphs as well as generic data sources such as Wikipedia (via Wikidata) and social media hubs. In this way, our system facilitates the growth of the Pseudomonas fluorescens SBW25 knowledge graph through both manual exploration and automated procedures.
All components of our system are open-source products. We benefit heavily from open data and metadata standards, and we strive to "pay back" to the open-source community by contributing customizations to the various software projects and by making our genome annotation ontology part of the public domain.