Context Navigation

Workflow Workgroup

Here I hope to learn from Taverna, Remora, MOWServ etc. other existing workflow frameworks which utilize various webservices.

Existing Workflow Engines

Desktop/Rich Client Applications:

Web Applications:

Discussion Items

What is the current bottleneck to create bioinformatics workflows?

What kind of workflows have been created? (see examples below)
What kind of services are missing yet? (see wishlist below)
Reusing existing workflows as virtual services.
The naïve end-user problem
- How are you giving a end-user GUI for your workflows?
ID mapping problem
- Some usufl services are already there:
  - BioMart
  - PICR
- Need standard API!
- One-to-many/many-to-one mapping problem
Data format conversion
Job management
Remote and local execution
Grid integration
- One possible solution to provide grid access from simple web service API: NBCR Opal Toolkit

Example Workflows and Current Isuues

Network/Pathway Analysis (suppose client is Cytoscape)
1. Load Networks from various kinds of data sources:
  - local files
  - web services ( IntAct, etc.)
  - manually download from web applications ( STRING, BioGRID, etc.)
  - NLP(Natual Language Processiong)-based data miner ( Agilent Litelature Search)
2. Load annotations:
  - Local files (XML docs/text file/Excel worksheet)
  - Web services (BioMart?, NCBI Gene, PICR, KEGG)
  - Gene Ontology
  - Gene Expression data
3. Analyze the networks by plugins
  - MCODE
  - jActiveModules
4. Visualize the result
5. Generate publishable quality images

Questions:
- How can we (at least partially) automate this by connecting to other workflow engins?
- Scripting vs Visual Programming

(Please add your typical workflows here...)
- Sample workflows in Web API for Bioinformatics (WABI)
- http://www.xml.nig.ac.jp/workflow/index.html
- http://www.myexperiment.org
- http://ubio.bioinfo.cnio.es/biotools/IWWEM/workflowmanager.html

Workflow Wishlist

Obtaining a sequence family and/or profile associated with a PDB entry
- This would involve:
  1. Get FASTA file for PDB chain (from RCSB-PDB, MSD-EBI, or PDBj)
  2. Get family and/or profile (from NCBI, UNIPROT or DDBJ)

Build phylogenetic tree from set of sequence and structure alignments
- This is trickier then the above example, but one approach would be:
  1. Cluster sequences using clustalw (from NCBI,EBI, DDBJ)
  2. Collect the PDB IDs for each sequence that has a structure (from RCSB-PDB, MSD-EBI, or PDBj:Sequence Navigator)
  3. Compute all-on-all structure alignments for those sequences with structures (from MSD-EBI:SSM or PDBj:ASH)
  4. Now, combine all the sequence scores and structure scores (if available) into a single distance matrix
  5. Compute tree from distance matrix

Download in other formats:

Plain Text