Cloud Workflows for Proteomics Data Analysis

Copyright Notice

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License (CC BY-SA).
Copyright© 2012 Yassene Mohammed

Cloud X!Tandem Workflow

This workflow runs X!Tandem on the cloud. X!Tandem is an open source database search engine for peptide identification, i.e. the mapping of each spectrum to one or more peptides. Here, the cloud means any Linux machine that Taverna can access over SSH. The workflow takes 5 fixed and 3 variable inputs.
The fixed inputs should normally be set once for multiple runs. These inputs are:
- mzxmlDecomposerExe: a string giving the full path to the mzxmlDecomposer executable (mzxmlDecomposer_vXXX.jar).
- pepxmlComposerExe: a string giving the full path to the pepxmlComposer executable (pepxmlComposer_vXXX.jar).
- runTandemExe: a string giving the full path to the runTandem executable (runTandem_vXXX.jar).
- tandem2xmlExe: a string giving the full path to the tandem2xml executable. This executable is part of the Trans-Proteomic Pipeline (TPP) package and should be compiled for the target cloud machine on which the workflow will run X!Tandem.
- tandemExe: a string giving the full path to the tandem (X!Tandem) executable. This executable is part of the Trans-Proteomic Pipeline (TPP) package and should be compiled for the target cloud machine on which the workflow will run X!Tandem.

The 3 variable inputs are normally modified for each run. These are:
- nrOfDaughters: the number of daughter files to produce. To keep all available cloud machines/workers busy, nrOfDaughters is ideally an integer multiple of the number of available cloud workers.
- fastaFileZipped: a string giving the full path to the zipped search database in FASTA format.
- mzxmlFile: a string, or a list of strings, giving the full path(s) to the mzXML file(s).
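
As a quick way to keep the eight inputs together before pasting them into Taverna, the sketch below collects them in one record and runs two basic checks. It is only an illustration: the class name TandemWorkflowInputs and the problems() helper are hypothetical, and the check for POSIX-style absolute paths is an assumption about how "full path" should be read, not something the workflow enforces.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Union


@dataclass
class TandemWorkflowInputs:
    """Hypothetical container for the inputs of the Cloud X!Tandem workflow."""

    # Fixed inputs: normally set once and reused across runs.
    mzxmlDecomposerExe: str
    pepxmlComposerExe: str
    runTandemExe: str
    tandem2xmlExe: str
    tandemExe: str
    # Variable inputs: normally changed for each run.
    nrOfDaughters: int
    fastaFileZipped: str
    mzxmlFile: Union[str, List[str]]

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means none were found."""
        found = []
        paths = {
            "mzxmlDecomposerExe": self.mzxmlDecomposerExe,
            "pepxmlComposerExe": self.pepxmlComposerExe,
            "runTandemExe": self.runTandemExe,
            "tandem2xmlExe": self.tandem2xmlExe,
            "tandemExe": self.tandemExe,
            "fastaFileZipped": self.fastaFileZipped,
        }
        # mzxmlFile may be a single string or a list of strings.
        mzxml = [self.mzxmlFile] if isinstance(self.mzxmlFile, str) else list(self.mzxmlFile)
        for index, value in enumerate(mzxml):
            paths[f"mzxmlFile[{index}]"] = value
        # Assumption: "full path" means a POSIX-style absolute path.
        for name, value in paths.items():
            if not PurePosixPath(value).is_absolute():
                found.append(f"{name} is not a full path: {value}")
        if not isinstance(self.nrOfDaughters, int) or self.nrOfDaughters < 1:
            found.append("nrOfDaughters must be a positive integer")
        return found
```

Calling problems() on a filled-in record returns an empty list when every path is absolute and nrOfDaughters is a positive integer; anything else is reported by input name.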

Access to the cloud workers is configured in Taverna's Tool Invocation settings. This includes adding the IP address of each worker. More information can be found online in the Taverna documentation.
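
Before entering the workers in Taverna, it can help to confirm that each one accepts an SSH login from the machine running the workflow. The following is a minimal sketch, not part of the workflow: the function names, the user name and the IP addresses are placeholders, and it assumes an OpenSSH client on the local machine and key-based (non-interactive) login on the workers. Workers that only allow password login will be reported as unreachable.

```python
import subprocess
from typing import Dict, List


def ssh_probe_command(host: str, user: str, timeout_s: int = 10) -> List[str]:
    """Build an ssh command that logs in, runs `true`, and exits."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt for a password
        "-o", f"ConnectTimeout={timeout_s}",  # give up on unreachable hosts
        f"{user}@{host}",
        "true",
    ]


def probe_workers(hosts: List[str], user: str) -> Dict[str, bool]:
    """Return, for each host, whether the SSH probe succeeded."""
    results = {}
    for host in hosts:
        try:
            completed = subprocess.run(
                ssh_probe_command(host, user),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            results[host] = completed.returncode == 0
        except OSError:
            # The ssh client itself could not be started.
            results[host] = False
    return results


if __name__ == "__main__":
    # Placeholder addresses from the documentation range 192.0.2.0/24.
    for host, ok in probe_workers(["192.0.2.10", "192.0.2.11"], "ubuntu").items():
        print(host, "reachable" if ok else "NOT reachable")
```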

Download the workflow from here 

Needed software

- Taverna workflow engine
- TPP X!Tandem
- mzxmlDecomposer
- pepxmlComposer
- runTandem

Cloud SpectraST Workflow


This workflow runs SpectraST on the cloud. SpectraST is an open source library search engine for peptide identification, i.e. the mapping of each spectrum to one or more peptides. Here, the cloud means any Linux machine that Taverna can access over SSH. The workflow takes 5 fixed and 3 variable inputs.
The fixed inputs should normally be set once for multiple runs. These inputs are:
- spectrastParamerters: a string containing the command line parameters for the SpectraST search.
- mzxmlDecomposerExe: a string giving the full path to the mzxmlDecomposer executable (mzxmlDecomposer_vXXX.jar).
- pepxmlComposerExe: a string giving the full path to the pepxmlComposer executable (pepxmlComposer_vXXX.jar).
- indexmzxmlExe: a string giving the full path to the indexmzxml executable. This executable is part of the Trans-Proteomic Pipeline (TPP) package and should be compiled for the target cloud machine on which the workflow will run SpectraST.
- spectrastExe: a string giving the full path to the SpectraST executable. This executable is part of the Trans-Proteomic Pipeline (TPP) package and should be compiled for the target cloud machine on which the workflow will run the search.

The 3 variable inputs are normally modified for each run. These are:
- nrOfDaughters: the number of daughter files to produce. To keep all available cloud machines/workers busy, nrOfDaughters is ideally an integer multiple of the number of available cloud workers.
- NISTLibraryZipped: a string giving the full path to the zipped search library files (including the .splib, .spidx and .pepidx files).
- mzxmlFile: a string, or a list of strings, giving the full path(s) to the mzXML file(s).
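
The NISTLibraryZipped input expects one archive holding the library and its index files. The sketch below shows one way to build such an archive with Python's standard zipfile module. The function name is hypothetical. Two assumptions are made: the three files share a base name and differ only in extension (SpectraST writes the binary library as .splib), and the workflow accepts them stored flat at the top level of the zip.

```python
import zipfile
from pathlib import Path

# Files that make up a searchable SpectraST library.
LIBRARY_SUFFIXES = (".splib", ".spidx", ".pepidx")


def zip_spectrast_library(library_base: Path, zip_path: Path) -> Path:
    """Zip <library_base>.splib/.spidx/.pepidx into zip_path and return it."""
    # Append the suffix to the name so dots inside the base name are preserved.
    members = [library_base.parent / (library_base.name + suffix)
               for suffix in LIBRARY_SUFFIXES]
    missing = [str(member) for member in members if not member.is_file()]
    if missing:
        raise FileNotFoundError("missing library files: " + ", ".join(missing))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for member in members:
            # Store each file without its directory (assumed layout).
            archive.write(member, arcname=member.name)
    return zip_path
```

For a library stored as /data/lib/human.splib, /data/lib/human.spidx and /data/lib/human.pepidx (placeholder paths), zip_spectrast_library(Path("/data/lib/human"), Path("/data/lib/human.zip")) would produce the archive whose full path is then passed as NISTLibraryZipped.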

Access to the cloud workers is configured in Taverna's Tool Invocation settings. This includes adding the IP address of each worker. More information can be found online in the Taverna documentation.

Download the workflow from here 

Needed software

- Taverna workflow engine
- TPP (SpectraST and indexmzxml)
- mzxmlDecomposer
- pepxmlComposer

Advanced Cloud SpectraST Workflow

This workflow runs SpectraST on the cloud. SpectraST is an open source library search engine for peptide identification, i.e. the mapping of each spectrum to one or more peptides. Here, the cloud means any Linux machine that Taverna can access over SSH. The workflow takes 6 fixed and 4 variable inputs.
The fixed inputs should normally be set once for multiple runs. These inputs are:
- spectrastParamerters: a string containing the command line parameters for the SpectraST search.
- mzxmlDecomposerExe: a string giving the full path to the mzxmlDecomposer executable (mzxmlDecomposer_vXXX.jar).
- pepxmlComposerExe: a string giving the full path to the pepxmlComposer executable (pepxmlComposer_vXXX.jar).
- indexmzxmlExe: a string giving the full path to the indexmzxml executable. This executable is part of the Trans-Proteomic Pipeline (TPP) package and should be compiled for the target cloud machine on which the workflow will run SpectraST.
- spectrastExe: a string giving the full path to the SpectraST executable. This executable is part of the Trans-Proteomic Pipeline (TPP) package and should be compiled for the target cloud machine on which the workflow will run the search.
- cloudWorkingDirectory: a directory on the cloud machines, normally used to keep the executables and library files and to share them between the different calls on the same machine during a run.

The 4 variable inputs are normally modified for each run. These are:
- nrOfDaughters: the number of daughter files to produce. To keep all available cloud machines/workers busy, nrOfDaughters is ideally an integer multiple of the number of available cloud workers.
- NISTLibraryZipped: a string giving the full path to the zipped search library files (including the .splib, .spidx and .pepidx files).
- mzxmlFile: a string, or a list of strings, giving the full path(s) to the mzXML file(s).
- nrOfWorkers: the exact number of cloud workers (machines) that the Taverna Tool services can access to run the search.
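
The relation between nrOfDaughters and nrOfWorkers can be illustrated with a little arithmetic. The sketch below assumes that the guideline means nrOfDaughters should be a whole multiple of nrOfWorkers, and that daughter files are spread as evenly as possible over the workers. Both function names are hypothetical, and the even split is a simplification, not a description of how Taverna actually schedules the calls.

```python
from typing import List


def suggest_nr_of_daughters(nr_of_workers: int, minimum: int = 1) -> int:
    """Return the smallest whole multiple of nr_of_workers that is >= minimum."""
    if nr_of_workers < 1 or minimum < 1:
        raise ValueError("nr_of_workers and minimum must be positive")
    # Integer ceiling division avoids floating-point rounding.
    rounds = -(-minimum // nr_of_workers)
    return rounds * nr_of_workers


def daughters_per_worker(nr_of_daughters: int, nr_of_workers: int) -> List[int]:
    """Count of daughter files per worker under an even split (assumed)."""
    base, extra = divmod(nr_of_daughters, nr_of_workers)
    # The first `extra` workers receive one additional file.
    return [base + 1 if index < extra else base for index in range(nr_of_workers)]
```

With 4 workers, 10 daughter files would split as 3, 3, 2, 2, leaving two workers idle during the last round, whereas 12 daughter files give every worker exactly 3.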

Access to the cloud workers is configured in Taverna's Tool Invocation settings. This includes adding the IP address of each worker. More information can be found online in the Taverna documentation.

Download the workflow from here 

Needed software

- Taverna workflow engine
- TPP (SpectraST and indexmzxml)
- mzxmlDecomposer
- pepxmlComposer

Feedback

Please send your feedback, questions, comments and suggestions for improvement to y.mohammed@lumc.nl


Last update 25 June 2012