Version 2 (modified by will, 16 years ago)

--

Each day's progress is described below.

Day 2 Progress.

The long-term goal of the Glycomics standards and interoperability group is to integrate the emerging bioinformatics tools for glycobiology into the larger bioinformatics world. Bioinformatics for glycobiology is in its infancy, and the tools for identifying glycan structures, their biosynthetic mechanisms, and their biological function are just being developed. The three participants in this group have taken active roles in developing these tools.

A major obstacle in this endeavor is the difficulty of representing complex glycans digitally without ambiguity. This is due, in part, to the branched nature of glycans and the fact that individual glycosyl residues can be linked to each other in several different ways. Several successful representation protocols have been developed, including LINUCS (developed by DKFZ in Heidelberg, where Rene Ranzinger works) and KCF (developed by KEGG, where Kiyoko Aoki-Kinoshita has worked and remains a valued collaborator). Recently, it has become clear that interoperability of the various databases and web services for glycobiology depends on a robust data exchange standard, leading to the development of GLYDE-II as a collaborative effort by York (at the Complex Carbohydrate Research Center [CCRC] at the University of Georgia), Ranzinger, Aoki-Kinoshita, and others. GLYDE-II is now almost completely functional, providing a key element for interoperability of informatics for glycobiology.

Advancement in this area demands robust protocols for web service discovery and composition of web processes. The BioHackathon is a unique opportunity to bring glycoinformatics developers together to explore possibilities for this purpose. Thus, the immediate goal of the group is to develop a prototype workflow that integrates web services provided by the groups of the three glycoinformatics participants. This will serve as a test-bed and model for future integration efforts.

On Tuesday (Day two of the Hackathon) the following progress was made:

1. The overall goal was agreed upon and a very simple prototype workflow was designed. The workflow involves three web services (WSs), linked together as follows.

[microarray data] --> RINGS (Aoki-Kinoshita) --> [KEGG Glycan IDs] --> GlycomeDB (Ranzinger) --> [GLYDE-II] --> GLYDE2SVG (York)

2. BioMOBY was chosen as the platform for workflow development, with transfer to Taverna once a functional workflow has been generated.

3. BioMOBY Dashboard was successfully installed on two laptops to facilitate exploration and definition of data type objects for inclusion into the BioMOBY framework. This was much more difficult than anticipated, as it took some time to satisfy all of the software's dependencies, which were not clearly stated in the installation instructions.

4. The BioMOBY object ontology (Object-OWLDL2.owl) was downloaded and explored using the Protégé ontology editor.

5. The preliminary design of data objects for glycoinformatics, including different structural representation data objects, was completed. This involved looking at the BioMOBY Dashboard and the BioMOBY object ontology (#3 and #4, above).
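The workflow in item 1 can be pictured as a chain of stages in which each stage's declared output type must match the next stage's declared input type. The following is only a minimal sketch of that idea in Python. It does not call RINGS, GlycomeDB, or GLYDE2SVG: the stub functions, the type names, the placeholder strings, and the assumption that GLYDE2SVG emits "SVG" are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Stage:
    """One workflow step: a named service with declared input/output types."""
    name: str
    input_type: str
    output_type: str
    run: Callable[[list], list]


def check_chain(stages: List[Stage]) -> None:
    """Raise ValueError if a stage's output type does not feed the next stage."""
    for upstream, downstream in zip(stages, stages[1:]):
        if upstream.output_type != downstream.input_type:
            raise ValueError(
                f"{upstream.name} emits {upstream.output_type}, "
                f"but {downstream.name} expects {downstream.input_type}"
            )


def run_chain(stages: List[Stage], payload: list) -> list:
    """Validate the chain, then pass the payload through each stage in order."""
    check_chain(stages)
    for stage in stages:
        payload = stage.run(payload)
    return payload


# Hypothetical stubs: they only wrap strings and stand in for the real services.
def rings_stub(probes: list) -> list:
    return [f"id-for-{p}" for p in probes]


def glycomedb_stub(ids: list) -> list:
    return [f"glyde({i})" for i in ids]


def glyde2svg_stub(docs: list) -> list:
    return [f"svg({d})" for d in docs]


# Type names below are illustrative labels, not registered BioMOBY objects.
WORKFLOW = [
    Stage("RINGS", "MicroarrayData", "KEGGGlycanIDs", rings_stub),
    Stage("GlycomeDB", "KEGGGlycanIDs", "GLYDE-II", glycomedb_stub),
    Stage("GLYDE2SVG", "GLYDE-II", "SVG", glyde2svg_stub),
]

result = run_chain(WORKFLOW, ["probe-1"])
print(result)  # ['svg(glyde(id-for-probe-1))']
```

Dropping the middle stage makes `check_chain` raise an error, because the KEGG Glycan IDs produced by the first stage are not the GLYDE-II input that the last stage declares.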

The group members were enthusiastic about the simple and elegant way that BioMOBY, acting as a WS client, can locate and evaluate WSs to generate functional Web Processes. However, all were concerned about the ad hoc nature of the data object definition process and the apparent lack of error checking in it. By design, the BioMOBY object ontology contains little other than syntactic information. While this is a strength in terms of ease of use and flexibility, it may lead to major difficulties in scaling the BioMOBY platform. For example, a web service provider could have a service (call it SQRT) that takes an integer as input and returns a float, which is the square root of the integer. The developer could then, without any problem, define a BioMOBY data object called "myNumber" and register the web service, specifying that input and output are both instances of "myNumber". While this would go against the spirit of BioMOBY, there is nothing to prevent it. Clients could then find the service only if they somehow knew what kind of object "myNumber" is, and clients searching for WSs that take integers as input would fail to find it. That is, the process of defining data objects is so flexible that it is possible to create data objects that duplicate others already defined, or to specify that a WS takes a data object that it cannot really handle. It is entirely up to the WS to check for errors of this type. This lack of rigorous vocabulary control may become a major problem as BioMOBY is scaled up.
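The SQRT example can be made concrete with a toy registry. The sketch below is not BioMOBY's API: the `ToyRegistry` class and its methods are hypothetical, and it assumes only what the paragraph above describes, namely that data objects are identified by name and that registration performs no semantic checks.

```python
from typing import List, Set, Tuple


class ToyRegistry:
    """Toy registry in which data types are nothing more than names."""

    def __init__(self) -> None:
        self.types: Set[str] = set()
        self.services: List[Tuple[str, str, str]] = []

    def define_type(self, name: str) -> None:
        # No check that an equivalent type has already been defined.
        self.types.add(name)

    def register_service(self, name: str, input_type: str, output_type: str) -> None:
        # Only checks that the type names exist; nothing verifies that the
        # service can actually handle, or actually returns, these types.
        for type_name in (input_type, output_type):
            if type_name not in self.types:
                raise KeyError(f"unknown type: {type_name}")
        self.services.append((name, input_type, output_type))

    def find_by_input(self, type_name: str) -> List[str]:
        # Discovery is a plain name match on the declared input type.
        return [svc for svc, declared_in, _ in self.services if declared_in == type_name]


registry = ToyRegistry()
registry.define_type("Integer")
registry.define_type("Float")

# The provider invents a new type instead of reusing Integer and Float.
registry.define_type("myNumber")
registry.register_service("SQRT", "myNumber", "myNumber")

found_for_integer = registry.find_by_input("Integer")
found_for_mynumber = registry.find_by_input("myNumber")
print(found_for_integer)   # []
print(found_for_mynumber)  # ['SQRT']
```

A client searching by "Integer" gets an empty result even though SQRT really consumes integers, while the search succeeds only for a client that already knows the private name "myNumber".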

Despite these concerns, the group decided to continue with BioMOBY to generate a prototype web process. The BioMOBY developers are expected to be keenly aware of these issues and to be actively working to resolve them, so that a failure of BioMOBY to scale well can be averted.