Context Navigation

Version 17 (modified by markjschreiber, 17 years ago)
--

The Problem

One of the goals of BioSQL was to provide an interchange platform for the Bio* objects. This has not yet succeeded due to differences in the way the Bio* projects interpret an individual sequence record and how they persist it to the database.

Common sequence semantics and object/ format handling would probably be of great benefit to many other WS providers and consumers. If Bio* can agree on semantics it would be a good reference for many other projects.

Possible Tasks

Choose some 'reference' sequences to see how the Bio* projects 'round-trip' them.
- Where are the differences and why are there differences?
Find out where each Bio* project persists it's data into BioSQL during ORM.
- Why are there differences?
Establish guidelines for where things should go in BioSQL, eg given a Genbank file, what bits should go where.
- UML diagrams?
Define an interchange format for the Bio* projects. Probably XML, probably borrow something already existing (XEMBL etc).
Decide on a restricted vocab for annotations and feature types. Probably use SO.
Define a middleware API for uniform I/O access to sequence database.
- Intially backed by BioSQL.
- Could be backed by any DB.
Derby version of the BioSQL schema (Derby is the Java reference database).
A BioSQL release.

Participants

Mark Schreiber
Jan Aerts
Richard Holland
Hilmar Lapp
Heikki Lehvaslaiho
Richard Bruskiewich
Jan Byrne
Raoul J.P. Bonnal

Random ideas from Jan Aerts

Mind that this is *very* incomplete. Just to help my really bad memory. As the issue is interoperability of the Bio* toolkits, we don't have to synchronize the toolkits at the object level, but rather at the interface level.

First thing to check: what types of objects do we want to synchronize? Of course sequence objects; but what else? The results of a BLAST parsing?

For sequences

Check if each toolkit reads and writes a GenBank/Fasta/... serialization in the same way. Input can either be an original GenBank/Fasta/... file or a dbfetch from any database.

What should be conserved:
- Tags
- for a sequence: lower/uppercase
  - Withing a project it is desirable to mask an alphabet, for transfer between bio* projects this is not a good idea.

What not necessarily should be conserved:
- for a sequence over multiple lines: length of each line

Task achieved

Tuesday

Initial planning.
Approved a BioSQL logo.
Hilmar initiated BioSQL release discussion
Selected sequence files to roundtrip
Began roundtrips

Back to ListOfTopics

Attachments

dag1.fa (5.6 KB) - added by jan.aerts 17 years ago. DAG1 gene: FASTA formatted sequence
dag1.gb (17.9 KB) - added by jan.aerts 17 years ago. DAG1 gene: genbank formatted file exported from NCBI
dag1.insd (38.6 KB) - added by jan.aerts 17 years ago. DAG1 gene: INSD XML formatted file exported from NCBI
dag1.asn1 (58.0 KB) - added by jan.aerts 17 years ago. DAG1 gene: ASN.1 formatted file exported from NCBI
dag1.gb_xml (227.9 KB) - added by jan.aerts 17 years ago. DAG1 gene: genbank XML formatted file exported from NCBI (probably?)
aj224122.asn1 (17.9 KB) - added by jan.aerts 17 years ago. AJ224122 in ASN1 format as downloaded from NCBI
aj224122.fa (3.9 KB) - added by jan.aerts 17 years ago. AJ224122 in FASTA format as downloaded from NCBI
aj224122.gb (8.9 KB) - added by jan.aerts 17 years ago. AJ224122 in genbank format as downloaded from NCBI
aj224122.gb_xml (68.3 KB) - added by jan.aerts 17 years ago. AJ224122 in genbank XML format as downloaded from NCBI
aj224122.insd (19.5 KB) - added by jan.aerts 17 years ago. AJ224122 in INSD format as downloaded from NCBI
aj224122.swiss (7.8 KB) - added by heikki 17 years ago. AJ224122 translation (DOF37_ARATH/Q43385) in swiss-prot format as downloaded from UniProt?
dag1-biojava.fa (5.7 KB) - added by markjschreiber 17 years ago. biojava roundtrip of fasta
aj224122-biojava.fa (3.9 KB) - added by markjschreiber 17 years ago. biojava roundtrip of fasta
aj224122-biojava.gb (9.1 KB) - added by markjschreiber 17 years ago. biojava roundtrip of genbank
aj224122-biojava.insd (22.6 KB) - added by markjschreiber 17 years ago. biojava roundtrip of ISNDseq
aj224122.embl (9.1 KB) - added by markjschreiber 17 years ago. Correct version of EMBL file
aj224122.EMBL.xml (14.9 KB) - added by markjschreiber 17 years ago. EMBLxml format
aj224122.uniprot.xml (17.2 KB) - added by markjschreiber 17 years ago. uniprot xml
aj224122-biojava.insd.xml (23.6 KB) - added by markjschreiber 17 years ago. biojava round trip of INSDseq
Main.java (4.7 KB) - added by markjschreiber 17 years ago. Roundtrip program for biojava
derby-biosql.sql (27.7 KB) - added by markjschreiber 17 years ago. DERBY schema for BioSQL
aj224122-biojava.embl (9.2 KB) - added by holland 17 years ago. biojava embl round-trip
aj224122-biojava.swiss (7.8 KB) - added by holland 17 years ago. biojava uniprot round-trip
aj224122-biojava.uniprot.xml (18.0 KB) - added by holland 17 years ago. biojava uniprotxml round trip
aj224122-bioruby.embl (8.9 KB) - added by jan.aerts 17 years ago. bioruby roundtrip for EMBL format
roundtrip_bioruby.rb (161 bytes) - added by jan.aerts 17 years ago. Ruby script for roundtrip (i.c. EMBL)
bioruby_aav50056_fasta_annotated.png (108.3 KB) - added by jan.aerts 17 years ago. Annotated picture of a FASTA entry
bioruby_ab09071_embl_annotated.png (92.0 KB) - added by jan.aerts 17 years ago. Annotated picture of an EMBL entry
bioruby_ab09071_gb_annotated.png (124.9 KB) - added by jan.aerts 17 years ago. Annotated picture of a GenBank? entry
bioperl_convert.pl (1.6 KB) - added by heikki 17 years ago. Bioperl script fr roundtriping EMBL, GenBank? and Swiss-Prot files
aj224122-bioperl.fa (3.9 KB) - added by heikki 17 years ago. bioperl processed fasta file
aj224122-bioperl.gb (8.9 KB) - added by heikki 17 years ago. bioperl processed Genbank file
aj224122-bioperl.swiss (7.8 KB) - added by heikki 17 years ago. bioperl processed swissprot file
aj224122-bioperl.embl (8.9 KB) - added by heikki 17 years ago. bioperl processed EMBL file

Download in other formats:

Plain Text