Each Day's progress is described below.
Day 2 Progress
The long-term goal of the Glycomics standards and interoperability group is to integrate the emerging bioinformatics tools for glycobiology into the larger bioinformatics world. Bioinformatics for glycobiology is in its infancy, and the tools for identifying glycan structures, their biosynthetic mechanisms, and their biological function are just being developed. The three participants in this group have taken active roles in developing these tools.
A major obstacle in this endeavor is the difficulty in the nonambiguous digital represention of complex glycans. This is due, in part, to the branched nature of glycans and the fact that individual glycosyl residues can be linked to each other in several different ways. Several different successful representation protocols have been developed, including LINUCS (developed by DKFZ in Heidelberg, where Rene Ranzinger works), and KCF (developed by KEGG, where Kiyoko Aoki-Kinoshita has worked and remains a valued collaborator). Recently, it has become clear that interoperability of the various databases and web services for glycobiology depend on an robust data exchange standard, leading to the development of GLYDE-II as a collaborative effort by York (at the Complex Carbohydrate Research Center[CCRC] at the University of Georgia) Ranzinger, Aoki-Kinoshita, and others. GLYDE-II is now almost completely functional, providing a key element for interoperability of informatics for glycobiology.
Advancement in this area demands robust protocols for web service discovery and composition of web processes. The BioHackathon? is a unique opportunity to get developers of glycoinformatics together to explore possibilities for this purpose. Thus, the immediate goal of the group is to develop a prototype workflow that integrates web services provided by the groups of the three glycoinformatics participants. This will serve as a test-bed and model for future integration efforts.
On Tuesday (Day two of the Hackathon) the following progress was made:
1. The overall goal was agreed upon and a very simple prototype workflow was designed. The workflow involves three WSs, linked together as follows.
[microarray data] --> RINGS (Aoki-Kinoshita) --> [KEGG Glycan IDs] --> GlycomeDB (Ranzinger) --> [GLYDE-II ] --> GLYDE2SVG (York)
2. BioMOBY was chosen as the development platform for workflow development, with transfer to Taverna upon generation of a functional workflow.
3. BioMOBY Dashboard was successfully installed on two laptops to faclitate exploration and definition of data type objects for inclusion into the BioMOBY framework. This was much more difficult than anticipated, as it took some time to satisfy all of the dependencies of the software, which were not clearly stated in the installation instructions.
4. The BioMOBY object ontology (Object-OWLDL2.owl) was downloaded and explored using the Protege ontology editor.
5. The preliminary design of data objects for glycoinformatics, including different structural representation data objects, was completed. This involved looking at the BioMOBY object ontology and BioMOBY Dashboard (#2 and #3, above).
The group members were enthusiastic about the simple and elegant way that BioMOBY can locate and evaluate WSs acting as a WS client to generate functional Web Processes. However, all were concerned with the ad hoc nature of the data object definition process, and the apparent lack of error checking for this process. By design, the BioMOBY object ontology contains little else other than syntactic information. While this is a strength in terms of ease of use and flexibility, it may lead to major difficulties in scaling the BioMOBY platform. For example, a web service provider could have a service (call it SQRT) that takes as input an integer and returns a float, which is the square root of the integer. Without any problem the developer could then define a BioMOBY data string-encoded object called "myNumber" and register the web service, specifying that input and output are both instances of "myNumber". While this would go against the spirit of BioMOBY, there is nothing to prevent this from happening. Then, clients could find the service if they somehow know what kind of object "myNumber" is. Clients attempting to find WSs that take integers as input would fail to find the WS. That is, the process of defining data objects is so flexible that it is possible to create all sorts of different data objects that may be identical to other data objects that have already been defined, or to specify that a WS takes a data object that it really cannot handle. It is totally up to the WS to check for errors of this type. This lack of rigorous vocabulary control may become a major problem as BioMOBY is scaled up.
Despite these concerns of the group members, it was decided to continue with BioMOBY to generate a prototype web process. It is expected that the BioMOBY developers are keenly aware of these issues and are actively attempting to resolve them, and that failure of BioMOBY to scale well will be averted.
Day 3 Progress
Toy MOBY web services were implemented. Due to the complexity of the dependencies in java and our inexperience with MOBY, this took virtually all day. The taxonomy of MOBY data objects for our domain (glycomics) was discussed, and the issue of semantic vs syntactic subsumption reared its ugly head.
Day 4 Progress
We began to develop actually useful MOBY WSs in our domain. This took most of the day, due to our inexperience and the complexity of implementing the WSs on our web services on our servers (remotely). Long discussions about sntax vs. semantics in the MOBY data object taxonomy.
Day 5 Progress
Decided on the following taxonomy for glycomics data objects.
Glyco_SVG (has_a String) is_a Glyco_Image is_a Glycomics_Object is_a Object Glyde_II (has_a String) is_a Glyco_Sequence is_a Glycomics_Object is_a Object
(underscores added to prevent WIKI formatting of strings)
The(semantic) organization of this taxonomy (based on advice from several different hackathon attendees) may contravene the conclusion stated at the end of the Hackathon, that "syntax" rules in the MOBY data type taxonomy.
Continued development of MOBY WS, and registered the following three.
Only the first was completely implemented. However, the MOBY registrations allowed us to create the following workflow in Taverna.
Glycomics Object (ns=LINUCS) --> getSimilarGlycans (Aoki-Kinoshita) --> Glycomics Object (ns=KEGG) --> getGlydeByID (Ranzinger) --> GLYDE-II --> GLYDE2SVG (York) --> glycoSVG
This prototype workflow will be used as a basis for further development at our home institutions, with the goal of making easily usable workflows for glycoscientists who use our data and software.