Version 11 (modified by jmfernandez, 17 years ago)
Workflow Workgroup
Here I hope to learn from Taverna, Remora, MOWServ, and other existing workflow frameworks that make use of various web services.
Existing Workflow Engines
Desktop/Rich Client Applications:
Web Applications:
Discussion Items
- What is the current bottleneck in creating bioinformatics workflows?
- What kind of workflows have been created? (see examples below)
- What kinds of services are still missing? (see wishlist below)
- Reusing existing workflows as virtual services.
- The naïve end-user problem
- How do you provide an end-user GUI for your workflows?
- ID mapping problem
- Data format conversion
- Job management
- Remote and local execution
- Grid integration
- One possible way to provide grid access through a simple web service API: NBCR Opal Toolkit
Example Workflows and Current Issues
- Network/Pathway Analysis (assuming the client is Cytoscape)
- Load Networks from various kinds of data sources:
- local files
- web services (IntAct, etc.)
- manual download from web applications (STRING, BioGRID, etc.)
- NLP (Natural Language Processing)-based data miner (Agilent Literature Search)
- Load annotations:
- Local files (XML docs/text file/Excel worksheet)
- Web services (BioMart, NCBI Gene, PICR, KEGG)
- Gene Ontology
- Gene Expression data
- Analyze the networks with plugins
- MCODE
- jActiveModules
- Visualize the result
- Generate publication-quality images
- Questions:
- How can we (at least partially) automate this by connecting to other workflow engines?
- Scripting vs Visual Programming
- (Please add your typical workflows here...)
- Sample workflows in Web API for Bioinformatics (WABI)
- http://www.xml.nig.ac.jp/workflow/index.html
- http://www.myexperiment.org
- http://ubio.bioinfo.cnio.es/biotools/IWWEM/workflowmanager.html
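As one illustration of the scripting option for the Network/Pathway Analysis workflow above, here is a minimal Python sketch of the "load networks from several sources" and "load annotations" steps. It assumes each data source has already been read into a plain list of node pairs. The function names (`merge_networks`, `annotate_nodes`, `node_degrees`) and the source labels are hypothetical and are not part of Cytoscape or of any service listed here. The degree count is only a placeholder for a real analysis plugin such as MCODE.

```python
from collections import defaultdict


def merge_networks(sources):
    """Merge edge lists from several sources into one undirected network.

    sources: dict mapping a source label to an iterable of (node_a, node_b).
    Returns a dict mapping frozenset({node_a, node_b}) to the set of source
    labels that reported that edge, so provenance is kept.
    """
    merged = defaultdict(set)
    for source, edges in sources.items():
        for a, b in edges:
            if a != b:  # ignore self-loops in this sketch
                merged[frozenset((a, b))].add(source)
    return dict(merged)


def annotate_nodes(network, annotations):
    """Attach annotation attributes to every node that appears in the network.

    annotations: dict mapping a node ID to a dict of attributes.
    Nodes without annotations get an empty dict.
    """
    nodes = set().union(*network)
    return {node: dict(annotations.get(node, {})) for node in sorted(nodes)}


def node_degrees(network):
    """Count edges per node (a stand-in for a real analysis step)."""
    degrees = defaultdict(int)
    for edge in network:
        for node in edge:
            degrees[node] += 1
    return dict(degrees)
```

A script would call `merge_networks` on the edge lists it has loaded, pass the result to `annotate_nodes` together with an annotation table, and hand both to the analysis and visualization steps. Keeping the source labels on each edge makes it possible to see which interactions are supported by more than one data source.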
Workflow Wishlist
- Obtaining a sequence family and/or profile associated with a PDB entry
- This would involve:
- Get FASTA file for PDB chain (from RCSB-PDB, MSD-EBI, or PDBj)
- Get family and/or profile (from NCBI, UniProt, or DDBJ)
- Build a phylogenetic tree from a set of sequence and structure alignments
- This is trickier than the above example, but one approach would be:
- Cluster sequences using ClustalW (from NCBI, EBI, or DDBJ)
- Collect the PDB IDs for each sequence that has a structure (from RCSB-PDB, MSD-EBI, or PDBj:Sequence Navigator)
- Compute all-on-all structure alignments for those sequences with structures (from MSD-EBI:SSM or PDBj:ASH)
- Now, combine all the sequence scores and structure scores (if available) into a single distance matrix
- Compute tree from distance matrix
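The last two steps of the phylogenetic tree approach above (combine the scores into a single distance matrix, then compute a tree) can be sketched as follows. This is only a sketch under stated assumptions: the wishlist does not say how the scores should be combined or which tree method to use, so the weighted average controlled by `struct_weight` and the choice of UPGMA (average linkage) are assumptions made here for illustration. It also assumes the sequence and structure scores have already been converted to distances on a comparable scale. The function names are hypothetical.

```python
from itertools import combinations


def combine_distances(names, seq_dist, struct_dist, struct_weight=0.5):
    """Blend sequence and structure distances into one distance table.

    seq_dist and struct_dist map frozenset({a, b}) to a distance.
    Pairs with no structure distance keep their sequence distance unchanged.
    The weighted average is an assumption, not a method given in the wishlist.
    """
    combined = {}
    for a, b in combinations(names, 2):
        key = frozenset((a, b))
        d = seq_dist[key]
        if key in struct_dist:
            d = (1.0 - struct_weight) * d + struct_weight * struct_dist[key]
        combined[key] = d
    return combined


def upgma(names, dist):
    """Build a tree with UPGMA and return its topology as a Newick string.

    Branch lengths are omitted to keep the sketch short.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    dist = dict(dist)  # work on a copy so the caller's table is untouched
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to the new cluster is the size-weighted average.
        for other in sizes:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"
```

With four sequences `A` to `D`, where only the pair `A`/`B` has a structure alignment, `combine_distances` adjusts that one entry and `upgma` then groups the closest pairs first. A neighbor-joining implementation could be swapped in for `upgma` without changing the combination step.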