Cloud Workflows for Proteomics Data Analysis
Copyright Notice
This work is licensed under a Creative Commons Attribution-ShareAlike
3.0 Unported License (CC BY-SA).
Copyright© 2012 Yassene Mohammed
Cloud X!Tandem Workflow
This workflow runs X!Tandem on the cloud. X!Tandem is an open-source
database search engine for peptide identification, i.e. the mapping of
each spectrum to one or more peptides. The cloud here is any Linux
machine to which Taverna has SSH access. The workflow takes 5 fixed and
3 variable inputs.
The fixed inputs should normally be set once for multiple runs. These
inputs are:
- mzxmlDecomposerExe: a string indicating the full path to the
mzxmlDecomposer executable (mzxmlDecomposer_vXXX.jar).
- pepxmlComposerExe: a string indicating the full path to the
pepxmlComposer executable (pepxmlComposer_vXXX.jar).
- runTandemExe: a string indicating the full path to the runTandem
executable (runTandem_vXXX.jar).
- tandem2xmlExe: a string indicating the full path to the tandem2xml
executable. This executable is part of the Trans Proteomics Pipeline
(TPP) package and should be compiled for the target cloud machine on
which the workflow will run X!Tandem.
- tandemExe: a string indicating the full path to the tandem (X!Tandem)
executable. This executable is part of the Trans Proteomics Pipeline
(TPP) package and should be compiled for the target cloud machine on
which the workflow will run X!Tandem.
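As an illustration only, the five fixed inputs can be kept in one place
and checked before a run. The sketch below is plain Python and not part
of the workflow; the helper name and all paths are hypothetical
placeholders, and the vXXX version suffix is left as written above.

```python
# Hypothetical helper, not part of the workflow: keep the five fixed
# inputs together and report any that are missing or blank.
FIXED_INPUT_NAMES = (
    "mzxmlDecomposerExe",
    "pepxmlComposerExe",
    "runTandemExe",
    "tandem2xmlExe",
    "tandemExe",
)


def missing_fixed_inputs(values):
    """Return the names of fixed inputs that are absent or blank."""
    return [name for name in FIXED_INPUT_NAMES
            if not str(values.get(name, "")).strip()]


# Placeholder paths (assumptions); replace with the real locations.
example = {
    "mzxmlDecomposerExe": "/opt/tools/mzxmlDecomposer_vXXX.jar",
    "pepxmlComposerExe": "/opt/tools/pepxmlComposer_vXXX.jar",
    "runTandemExe": "/opt/tools/runTandem_vXXX.jar",
    "tandem2xmlExe": "/opt/tpp/bin/tandem2xml",
}
print(missing_fixed_inputs(example))  # ['tandemExe']
```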
The three variable inputs are normally changed for each run. These
are:
- nrOfDaughters: the number of daughter files to create. To use the
available cloud machines/workers fully, nrOfDaughters is ideally an
integer multiple of the number of available cloud workers.
- fastaFileZipped: a string indicating the full path to the zipped
search database in FASTA format.
- mzxmlFile: a string or a list of strings indicating the full path(s)
to the mzXML file(s).
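The advice on nrOfDaughters can be checked with a little arithmetic. The
sketch below reads the recommendation as "the daughter files should
divide evenly across the workers", which is an assumption about the
intent; the function names and the example path are hypothetical. It
also shows one way to accept mzxmlFile as either a single path or a
list of paths.

```python
def idle_slots(nr_of_daughters, workers):
    """Worker slots left idle in the final round of processing."""
    if nr_of_daughters < 1 or workers < 1:
        raise ValueError("both counts must be positive integers")
    # Number of daughters missing to fill the last round completely.
    return (-nr_of_daughters) % workers


def as_file_list(mzxml_file):
    """Normalize mzxmlFile (one path or a list of paths) to a list."""
    return [mzxml_file] if isinstance(mzxml_file, str) else list(mzxml_file)


print(idle_slots(8, 4))   # 0: two full rounds on four workers
print(idle_slots(10, 4))  # 2: the third round uses only two workers
print(as_file_list("/data/run1.mzXML"))  # ['/data/run1.mzXML']
```

A result of 0 means every worker is busy in every round.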
Access to the cloud workers should be configured in Taverna's Tool
Invocation settings. This includes adding the IP addresses of the
workers. More information can be found online in the Taverna
documentation.
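Before adding a worker to the Tool Invocation settings, it can help to
confirm that the machine running Taverna can actually log in over SSH.
The sketch below is not a Taverna feature; it assumes an OpenSSH client
on the local machine and key-based login, and the user name and IP
address are placeholders.

```python
import subprocess


def ssh_check_argv(host, user, timeout_s=5):
    """Build an OpenSSH command that logs in and runs `true`."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                 # fail instead of prompting
        "-o", f"ConnectTimeout={timeout_s}",   # give up after timeout_s seconds
        f"{user}@{host}",
        "true",
    ]


def can_reach(host, user):
    """Return True if the SSH login succeeds, False otherwise."""
    try:
        result = subprocess.run(ssh_check_argv(host, user),
                                capture_output=True)
    except FileNotFoundError:  # no ssh client installed locally
        return False
    return result.returncode == 0


# Placeholder worker address and user (assumptions).
print(" ".join(ssh_check_argv("192.0.2.10", "taverna")))
```

Calling can_reach("192.0.2.10", "taverna") with a real worker address
and user then reports whether that worker is reachable.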
Download the workflow from here
Needed software
- Taverna workflow engine
- TPP
- X!Tandem
- mzxmlDecomposer
- pepxmlComposer
- runTandem
Cloud SpectraST Workflow
This workflow runs SpectraST on the cloud. SpectraST is an open-source
library search engine for peptide identification, i.e. the mapping of
each spectrum to one or more peptides. The cloud here is any Linux
machine to which Taverna has SSH access. The workflow takes 5 fixed and
3 variable inputs.
The fixed inputs should normally be set once for multiple runs. These
inputs are:
- spectrastParamerters: a string containing the command line parameters
for the SpectraST search.
- mzxmlDecomposerExe: a string indicating the full path to the
mzxmlDecomposer executable (mzxmlDecomposer_vXXX.jar).
- pepxmlComposerExe: a string indicating the full path to the
pepxmlComposer executable (pepxmlComposer_vXXX.jar).
- indexmzxmlExe: a string indicating the full path to the indexmzxml
executable. This executable is part of the Trans Proteomics Pipeline
(TPP) package and should be compiled for the target cloud machine on
which the workflow will run SpectraST.
- spectrastExe: a string indicating the full path to the SpectraST
executable. This executable is part of the Trans Proteomics Pipeline
(TPP) package and should be compiled for the target cloud machine on
which the workflow will run the search.
The three variable inputs are normally changed for each run. These
are:
- nrOfDaughters: the number of daughter files to create. To use the
available cloud machines/workers fully, nrOfDaughters is ideally an
integer multiple of the number of available cloud workers.
- NISTLibraryZipped: a string indicating the full path to the zipped
search library files (including the .splib, .spidx and .pepidx files).
- mzxmlFile: a string or a list of strings indicating the full path(s)
to the mzXML file(s).
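Since the search needs the library file together with both of its index
files, it can be worth checking the zipped library before starting a
run. The sketch below is a hypothetical pre-flight check, not part of
the workflow; it assumes the library file uses SpectraST's .splib
extension and that the archive is an ordinary zip file.

```python
import io
import zipfile

# File types the zipped library is expected to contain.
REQUIRED_SUFFIXES = (".splib", ".spidx", ".pepidx")


def missing_library_parts(zip_path, required=REQUIRED_SUFFIXES):
    """Return the required suffixes not found in the zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        names = [name.lower() for name in archive.namelist()]
    return [suffix for suffix in required
            if not any(name.endswith(suffix) for name in names)]


# Demonstration with an in-memory archive that lacks the .pepidx file.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("library.splib", b"")
    archive.writestr("library.spidx", b"")
demo.seek(0)
print(missing_library_parts(demo))  # ['.pepidx']
```

In practice the function would be called with the path given as
NISTLibraryZipped; an empty list means all three file types are present.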
Access to the cloud workers should be configured in Taverna's Tool
Invocation settings. This includes adding the IP addresses of the
workers. More information can be found online in the Taverna
documentation.
Download the workflow from here
Needed software
- Taverna workflow engine
- TPP
- X!Tandem
- mzxmlDecomposer
- pepxmlComposer
Advanced Cloud SpectraST Workflow
This workflow runs SpectraST on the cloud. SpectraST is an open-source
library search engine for peptide identification, i.e. the mapping of
each spectrum to one or more peptides. The cloud here is any Linux
machine to which Taverna has SSH access. The workflow takes 6 fixed and
4 variable inputs.
The fixed inputs should normally be set once for multiple runs. These
inputs are:
- spectrastParamerters: a string containing the command line parameters
for the SpectraST search.
- mzxmlDecomposerExe: a string indicating the full path to the
mzxmlDecomposer executable (mzxmlDecomposer_vXXX.jar).
- pepxmlComposerExe: a string indicating the full path to the
pepxmlComposer executable (pepxmlComposer_vXXX.jar).
- indexmzxmlExe: a string indicating the full path to the indexmzxml
executable. This executable is part of the Trans Proteomics Pipeline
(TPP) package and should be compiled for the target cloud machine on
which the workflow will run SpectraST.
- spectrastExe: a string indicating the full path to the SpectraST
executable. This executable is part of the Trans Proteomics Pipeline
(TPP) package and should be compiled for the target cloud machine on
which the workflow will run the search.
- cloudWorkingDirectory: the directory normally used to keep and share
the executables and library files between the different calls on the
same cloud machines during the same run.
The 4 variable inputs are normally changed for each run. These are:
- nrOfDaughters: the number of daughter files to create. To use the
available cloud machines/workers fully, nrOfDaughters is ideally an
integer multiple of the number of available cloud workers.
- NISTLibraryZipped: a string indicating the full path to the zipped
search library files (including the .splib, .spidx and .pepidx files).
- mzxmlFile: a string or a list of strings indicating the full path(s)
to the mzXML file(s).
- nrOfWorkers: the exact number of cloud workers or machines to which
the Taverna Tool services will have access in order to run the search.
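To see how nrOfDaughters and nrOfWorkers interact, the sketch below
spreads daughter files over workers in round-robin order. How the
workflow itself distributes the files is not described here, so this is
only one plausible scheme under that assumption; the function name is
hypothetical.

```python
def assign_round_robin(nr_of_daughters, nr_of_workers):
    """Return one list of daughter indices per worker."""
    if nr_of_daughters < 1 or nr_of_workers < 1:
        raise ValueError("both counts must be positive integers")
    plan = [[] for _ in range(nr_of_workers)]
    for daughter in range(nr_of_daughters):
        # Daughter 0 goes to worker 0, daughter 1 to worker 1, and so on,
        # wrapping around after the last worker.
        plan[daughter % nr_of_workers].append(daughter)
    return plan


print(assign_round_robin(6, 3))  # [[0, 3], [1, 4], [2, 5]]
print(assign_round_robin(7, 3))  # [[0, 3, 6], [1, 4], [2, 5]]
```

With 6 daughters on 3 workers every worker gets the same load; with 7,
one worker processes an extra file while the others wait.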
Access to the cloud workers should be configured in Taverna's Tool
Invocation settings. This includes adding the IP addresses of the
workers. More information can be found online in the Taverna
documentation.
Download the workflow from here
Needed software
- Taverna workflow engine
- TPP
- X!Tandem
- mzxmlDecomposer
- pepxmlComposer
Feedback
Please send your feedback, questions, comments and suggestions for
improvement to y.mohammed@lumc.nl
Last update 25 June 2012