= The Problem =

One of the goals of BioSQL was to provide an interchange platform for the Bio* objects. This has not yet succeeded because the Bio* projects differ in how they interpret an individual sequence record and in how they persist it to the database. Common sequence semantics and object/format handling would probably be of great benefit to many other WS providers and consumers. If the Bio* projects can agree on semantics, the result would be a good reference for many other projects.

= Possible Tasks =

 * Choose some 'reference' sequences to see how the Bio* projects 'round-trip' them.
   * Where are the differences and why are there differences?
 * Find out where each Bio* project persists its data into BioSQL during ORM.
   * Why are there differences?
 * Establish guidelines for where things should go in BioSQL, e.g. given a !GenBank file, which bits should go where.
   * UML diagrams?
 * Define an interchange format for the Bio* projects. Probably XML, probably borrowing something that already exists (XEMBL etc.).
 * Decide on a restricted vocabulary for annotations and feature types. Probably use SO.
 * Define a middleware API for uniform I/O access to sequence databases.
   * Initially backed by BioSQL.
   * Could be backed by any DB.
 * Derby version of the BioSQL schema (Derby is the Java reference database).
 * A BioSQL release.

= Participants =

 * Mark Schreiber
 * Jan Aerts
 * Richard Holland
 * Hilmar Lapp
 * Heikki Lehvaslaiho
 * Richard Bruskiewich
 * Jan Byrne
 * Raoul J.P. Bonnal

= Random ideas from Jan Aerts =

Mind that this is *very* incomplete. Just to help my really bad memory.

As the issue is interoperability of the Bio* toolkits, we don't have to synchronize the toolkits at the object level, but rather at the interface level.

First thing to check: what types of objects do we want to synchronize? Sequence objects, of course; but what else? The results of parsing a BLAST report?

== For sequences ==

Check if each toolkit reads and writes a !GenBank/Fasta/... serialization in the same way.
Input can be either an original !GenBank/Fasta/... file or a dbfetch from any database.

 * What should be conserved:
   * Tags
   * For a sequence: lower/uppercase
     * Within a project it is desirable to mask an alphabet; for transfer between Bio* projects this is not a good idea.
 * What need not be conserved:
   * For a sequence over multiple lines: the length of each line
     * Proposal for a default value: 60 bp?

= Task achieved =

== Tuesday ==

 * Initial planning.
 * Approved a BioSQL logo.
 * Hilmar initiated BioSQL release discussion.
 * Selected sequence files to round-trip.
 * Began round-trips:
   * BioPerlRoundTripFirstPass
   * BioJavaRoundTripFirstPass
 * Started UML diagram to describe object model with Richard Bruskiewich.

Back to [wiki:ListOfTopics]
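
As a rough illustration of the round-trip criteria in the "For sequences" notes (keep tags and letter case, ignore line length, default to 60 characters per line), here is a minimal sketch in plain Python. It is not taken from any Bio* toolkit: the function names `parse_fasta`, `write_fasta` and `same_content` are hypothetical, only FASTA is handled, and the header line stands in for "tags".

```python
from typing import List, Tuple

# Hypothetical record type: (header without the leading '>', sequence).
Record = Tuple[str, str]


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text, joining wrapped sequence lines and preserving case."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(records: List[Record], width: int = 60) -> str:
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def same_content(a: str, b: str) -> bool:
    """Compare two FASTA texts: header and case must match, line length may differ."""
    return parse_fasta(a) == parse_fasta(b)


if __name__ == "__main__":
    original = ">seq1 test\nACGTacgtNN\nacgt\n"
    rewritten = write_fasta(parse_fasta(original))
    print(same_content(original, rewritten))
```

The comparison is deliberately case-sensitive, so a toolkit that upper-cases a soft-masked sequence on output would fail the check, while one that merely re-wraps the lines would pass.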