Mass spectrometry software

From Wikipedia, the free encyclopedia

Mass spectrometry software is any software for data acquisition, analysis or data representation in mass spectrometry.

Most of the following tools work on the mass spectrometry data formats mzData and mzXML.

The choice of mass spectrometry software depends on the application. For organisms with sequenced genomes, a standard database search engine is usually sufficient. Where unsequenced organisms may be present, a de novo sequencing algorithm is needed. Database search engines differ in their strengths and weaknesses, so many researchers look for packages that combine several tools, such as de novo sequencing, database protein identification, and possibly quantification.

1 PEAKS
2 PROTRAWLER
3 REGATTA
4 SPIDER
5 SEQUEST
6 Mascot
7 VIPER and Decon2LS
8 Phenyx
9 OpenMS / TOPP
10 Xtandem
11 Mass Frontier
12 massXpert
13 External links
14 References
15 Published Resources

PEAKS

PEAKS performs de novo peptide sequencing (deriving sequences for peptides from proteins that are not in any database) and also provides a database search engine for protein identification. It was one of the earlier packages to support de novo sequencing (both automated and manual) and sequence-tag-based searching (SPIDER). De novo sequencing is peptide sequencing performed without prior knowledge of the amino acid sequence. The stated throughput is approximately one spectrum per second, or a run of 1000 spectra in about 20 minutes.
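The idea behind de novo sequencing can be sketched in a few lines. The example below is an idealized illustration, not the PEAKS algorithm: it assumes a complete, noise-free ladder of singly charged b-ions, whereas real spectra have missing peaks, noise and mixed ion series. The function names `b_ion_ladder` and `read_ladder` are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values, small subset).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "F": 147.06841,
}
PROTON = 1.007276  # mass of the charge-carrying proton


def b_ion_ladder(peptide):
    """Return singly charged b-ion m/z values for every prefix of the peptide."""
    total, ladder = PROTON, []
    for residue in peptide:
        total += RESIDUE_MASS[residue]
        ladder.append(total)
    return ladder


def read_ladder(peaks, tol=0.01):
    """Read residues off the differences between consecutive peaks.

    A gap that matches no residue, or more than one, is reported as '?'.
    """
    previous = PROTON  # the empty prefix would sit at the proton mass
    sequence = []
    for mz in sorted(peaks):
        gap = mz - previous
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - gap) <= tol]
        sequence.append(hits[0] if len(hits) == 1 else "?")
        previous = mz
    return "".join(sequence)


# No database is consulted: the sequence comes from the peak spacing alone.
print(read_ladder(b_ion_ladder("PEAKS")))
```

Because the sequence is read from mass differences only, the same procedure works for peptides that appear in no database, which is the property the reasons below rely on.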

Reasons for de novo:

Good quality spectra are often left unexplained after database searching; de novo sequencing is then the only recourse.
Since a database search cannot reliably find peptides that are unexpectedly modified, de novo sequence information is a valuable first step in finding these post-translational modifications (PTMs).
De novo sequencing can be used to validate database search results and so reduce the number of false positives.
Sequence tag searching requires good quality de novo sequences to be effective. Coverage can be increased by using sequence tag based searches and MS/MS ion searches together.
An incomplete or corrupted database is a common but invisible problem in protein identification and characterization research. Including de novo sequencing as a regular part of the workflow helps catch peptides that would otherwise be missed.
When studying species that are not in the public databases, de novo sequencing is the only way to obtain peptide sequence information.

PEAKS provides a complete sequence for each peptide, confidence scores on individual amino acid assignments, and simple reporting for high-throughput analysis, along with more detailed output for in-depth investigations.

inChorus: PEAKS can automatically cross-check its results against other protein identification search engines, such as Sequest, OMSSA, X! Tandem and Mascot. This approach guards against false positive peptide assignments.

PEAKS reads the standard vendor data formats: ABI, Agilent, Bruker, Thermo, Waters, etc.

The package combines de novo sequencing, a database search engine, the inChorus meta server (for comparing multiple methods), a sequence homology tool and quantitation.

PROTRAWLER

ProTrawler is an LCMS data reduction application that reads raw mass spectrometry vendor data (from a variety of instrument companies) and creates lists of {mass, retention time, integrated signal intensity} triplets summarizing the LCMS chromatogram. The measurements are reported with errors, which are essential for performing dynamic binning for comparisons between data sets. ProTrawler operates in two modes: a highly visual hands-on (expert) mode for developing the parameters used in data reduction, and a fully automated mode for processing many chromatograms. ProTrawler's data reduction workflow includes background elimination, noise estimation, peak shape estimation, shape deconvolution, and isotopic and charge-state list deconvolution (factoring in errors and signal noise) to give a list of features. Typically, ProTrawler reduces 1 GB of raw data to 10 kB of processed results, with a detection sensitivity of three orders of magnitude, in 25% of the data acquisition time. No formal Bayesian methods are used, but statistical inference is employed throughout. ProTrawler has been used for bacterial protein biomarker discovery efforts as well as for IPEx-related applications.

REGATTA

Regatta is an LCMS list comparison application that works hand-in-hand with ProTrawler, although it accepts any input in Excel/CSV form. It provides an environment for filtering and normalizing LCMS results lists of {mass, retention time, integrated intensity} triplets. To do so, Regatta addresses the transitivity problem that arises when comparing analytical list data: if Peak A in Sample A overlaps Peak B in Sample B, and Peak B overlaps Peak C in Sample C, but Peak A does not overlap Peak C, has the same analyte been measured in all three samples or not? Regatta also implements multivariate analysis (e.g. hierarchical cluster analysis and principal component analysis) and statistical measures (e.g. coefficients of variation). Regatta has been used for biomarker discovery.
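The transitivity problem described above is easy to reproduce. The following sketch is illustrative only and makes no claim about how Regatta resolves it: peaks are assumed to be (mass, error) pairs, two peaks "overlap" when their error intervals intersect, and one possible answer (single-linkage grouping, where any chain of overlaps joins peaks into one group) is shown. The names `overlaps` and `group_by_overlap` are hypothetical.

```python
def overlaps(a, b):
    """Two (mass, error) measurements overlap if their error intervals intersect."""
    return abs(a[0] - b[0]) <= a[1] + b[1]


def group_by_overlap(peaks):
    """Single-linkage grouping: peaks joined by any chain of overlaps share a group.

    Returns a sorted list of groups, each a list of indices into `peaks`.
    """
    parent = list(range(len(peaks)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            if overlaps(peaks[i], peaks[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(peaks)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Peak A overlaps B, B overlaps C, but A does not overlap C.
peak_a = (1000.00, 0.02)
peak_b = (1000.03, 0.02)
peak_c = (1000.06, 0.02)

print(overlaps(peak_a, peak_b), overlaps(peak_b, peak_c), overlaps(peak_a, peak_c))
print(group_by_overlap([peak_a, peak_b, peak_c]))
```

Single linkage treats all three peaks as one analyte. A stricter rule, such as requiring every pair in a group to overlap, would split them, which is why the choice of rule matters when comparing lists.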

SPIDER

For the identification of proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then matched against the sequences in a protein database, allowing for amino acid mutations. If the de novo sequencing gives correct tags, homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. The most common de novo sequencing error is that a segment of amino acids is replaced by another segment with approximately the same mass. The SPIDER algorithm matches sequence tags containing such errors to database sequences for the purpose of protein and peptide identification.^[1]

BLAST (and similar) homology approaches can fail when confronted with common sequence substitutions such as I/L, N/GG, SAT/TAS. SPIDER is designed to avoid these problems.
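The substitutions listed above are hard to detect because the segments involved have the same, or nearly the same, mass. A minimal check using standard monoisotopic residue masses makes this concrete; the function names and the 0.01 Da tolerance are illustrative assumptions, and this is not the SPIDER algorithm itself.

```python
# Monoisotopic residue masses in daltons (standard values, small subset).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "T": 101.04768,
    "L": 113.08406, "I": 113.08406, "N": 114.04293,
}


def segment_mass(segment):
    """Sum of residue masses for a short amino acid segment."""
    return sum(RESIDUE_MASS[aa] for aa in segment)


def indistinguishable(seg_a, seg_b, tol=0.01):
    """True if two segments differ in mass by no more than `tol` daltons."""
    return abs(segment_mass(seg_a) - segment_mass(seg_b)) <= tol


for a, b in [("I", "L"), ("N", "GG"), ("SAT", "TAS"), ("N", "A")]:
    print(a, b, indistinguishable(a, b))
```

I and L are isomers, N and GG differ by about 0.00001 Da, and SAT and TAS contain the same residues in a different order, so a mass-based sequencer cannot tell the members of each pair apart without additional fragment evidence.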

SPIDER can be used in conjunction with PEAKS mass spectrometry data analysis software.

SEQUEST

SEQUEST is a tandem mass spectrometry data analysis program.^[2] Sequest matches collections of tandem mass spectra to peptide sequences that have been generated from databases of protein sequences.

This tool is most useful in the context of shotgun proteomics. Starting with a complex mixture of proteins, this strategy typically employs trypsin to digest the proteins into peptides. These peptides are separated by liquid chromatography en route to a tandem mass spectrometer. The mass spectrometer then isolates ions of a particular peptide, subjects them to collision-induced dissociation, and records the resulting fragments in a tandem mass spectrum. This process, repeated for several hours, produces thousands of tandem mass spectra. Identifying such a data collection requires automation, and Sequest was the first software to fill that need.

Sequest, like many engines, identifies each tandem mass spectrum individually. The software evaluates protein sequences from a database to compute the list of peptides that could result from each. The peptide's intact mass is known from the mass spectrum, and Sequest uses this information to select the set of candidate peptide sequences that can meaningfully be compared to the spectrum, keeping only those whose mass is near that of the observed peptide ion. For each candidate peptide, Sequest projects a theoretical tandem mass spectrum and compares it to the observed tandem mass spectrum by cross correlation. The candidate sequence with the best matching theoretical tandem mass spectrum is reported as the best identification for this spectrum.
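The steps in the preceding paragraph (filter candidates by intact mass, project a theoretical spectrum, score by cross correlation) can be sketched as follows. This is a simplified illustration and not Sequest's implementation: it assumes singly charged b- and y-ions only, unit peak intensities, 1 Da bins, and a score defined as the zero-shift dot product minus the mean dot product over shifts of up to 75 bins. All function names are hypothetical.

```python
PROTON = 1.007276
WATER = 18.010565
# Monoisotopic residue masses in daltons (standard values, small subset).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "K": 128.09496, "E": 129.04259,
}


def peptide_mass(seq):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def b_y_ions(seq):
    """Theoretical singly charged b- and y-ion m/z values."""
    fragments = []
    for i in range(1, len(seq)):
        b = sum(RESIDUE_MASS[aa] for aa in seq[:i]) + PROTON
        y = sum(RESIDUE_MASS[aa] for aa in seq[i:]) + WATER + PROTON
        fragments += [b, y]
    return fragments


def binned(mzs, n_bins=2000):
    """Unit-intensity vector with one bin per dalton."""
    vec = [0.0] * n_bins
    for mz in mzs:
        idx = round(mz)
        if 0 <= idx < n_bins:
            vec[idx] = 1.0
    return vec


def xcorr_like(observed, theoretical, max_shift=75):
    """Zero-shift dot product minus the mean dot product over shifted alignments."""
    n = len(observed)

    def dot(shift):
        lo, hi = max(0, -shift), min(n, n - shift)
        return sum(observed[i] * theoretical[i + shift] for i in range(lo, hi))

    background = sum(dot(s) for s in range(-max_shift, max_shift + 1) if s != 0)
    return dot(0) - background / (2 * max_shift)


def best_match(observed_mzs, precursor_mass, candidates, mass_tol=1.0):
    """Score only candidates near the precursor mass; return (score, peptide)."""
    obs = binned(observed_mzs)
    near = [p for p in candidates if abs(peptide_mass(p) - precursor_mass) <= mass_tol]
    scored = [(xcorr_like(obs, binned(b_y_ions(p))), p) for p in near]
    return max(scored) if scored else None


candidates = ["PEAKS", "PEASK", "SKAEP", "GGGGG"]
spectrum = b_y_ions("PEAKS")  # stand-in for an observed spectrum
print(best_match(spectrum, peptide_mass("PEAKS"), candidates))
```

In this toy example GGGGG is discarded by the mass filter before any scoring, while PEASK and SKAEP, which have the same composition as PEAKS and therefore the same intact mass, are separated only by the fragment-level comparison.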

While very successful in terms of sensitivity, Sequest is quite slow to process data, and there are concerns about its specificity (especially if multiple PTMs are present).

Mascot

Matrix Science produces a search engine called Mascot that performs mass spectrometry data analysis through a statistical evaluation of matches between observed and projected peptide fragments rather than cross correlation. As of version 2.2, support for peptide quantitation methods is provided in addition to the identification features.

Mascot was formerly the dominant database search engine and has been used by competitors for benchmarking. Competing engines and newer technologies have since matched or exceeded its performance. Although very sensitive, the software is significantly slower when searching for multiple PTMs.

VIPER and Decon2LS

The "Proteomics Research Resource for Integrative Biology" distributes software tools (VIPER,^[3] Decon2LS, and others) that can be used to analyze LC-MS features by accurate mass and chromatographic retention time. This is sometimes referred to as the Accurate Mass and Time tag (AMT tag) approach. These tools are generally used for proteomics.
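The accurate mass and time idea can be illustrated with a short matching routine. This is a sketch under stated assumptions, not the VIPER implementation: features and tags are assumed to be (mass, retention time) pairs with retention time already normalized to a 0 to 1 scale, and the tolerances (5 ppm, 0.02) and names are hypothetical.

```python
def ppm_error(observed_mass, reference_mass):
    """Mass difference in parts per million relative to the reference."""
    return (observed_mass - reference_mass) / reference_mass * 1e6


def match_features(features, tags, ppm_tol=5.0, time_tol=0.02):
    """Assign each LC-MS feature to every tag that agrees in both mass and time.

    features: list of (mass, normalized_time)
    tags:     dict mapping a peptide name to (mass, normalized_time)
    """
    matches = []
    for mass, time in features:
        for name, (tag_mass, tag_time) in tags.items():
            mass_ok = abs(ppm_error(mass, tag_mass)) <= ppm_tol
            time_ok = abs(time - tag_time) <= time_tol
            if mass_ok and time_ok:
                matches.append(((mass, time), name))
    return matches


tags = {"pepA": (1000.5000, 0.30), "pepB": (1500.7500, 0.55)}
features = [
    (1000.5030, 0.31),  # about 3 ppm and 0.01 away from pepA
    (1500.7500, 0.80),  # right mass for pepB, wrong elution time
    (2000.0000, 0.50),  # matches nothing
]
print(match_features(features, tags))
```

Requiring agreement in both dimensions is what lets a feature be identified without a new MS/MS spectrum: mass alone would have accepted the second feature as pepB.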

Phenyx

Phenyx is developed by Geneva Bioinformatics (GeneBio) in collaboration with the Swiss Institute of Bioinformatics (SIB). Phenyx incorporates OLAV, a family of statistical scoring models, to generate and optimize scoring schemes that can be tailored to all kinds of instruments, instrumental set-ups and general sample treatments.^[4] It does not, however, accept raw, unprocessed data. Phenyx computes a score to evaluate the quality of a match between a theoretical and an experimental peak list (i.e. mass spectrum). A match is thus a collection of observations deduced from this comparison. A basic peptide score is the sum of raw scores for up to twelve physico-chemical properties. The basic peptide score is ultimately transformed into a normalized z-score and a p-value.
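The final normalization step can be illustrated generically. The sketch below does not reproduce the OLAV scoring models: it assumes a basic score that is a plain sum of per-property raw scores, a set of scores from random (null) matches, and a normal null distribution for converting the z-score to a p-value. All names and numbers are made up for illustration.

```python
import math
from statistics import mean, stdev


def basic_score(raw_scores):
    """Basic peptide score as the sum of per-property raw scores."""
    return sum(raw_scores)


def z_score(score, null_scores):
    """Standardize a score against scores obtained from random matches."""
    return (score - mean(null_scores)) / stdev(null_scores)


def p_value(z):
    """Upper-tail probability of a standard normal variable exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical raw scores for one peptide match and for random matches.
match = basic_score([2.1, 1.4, 0.9, 3.0])
null = [1.0, 2.0, 3.0, 4.0, 5.0]

z = z_score(match, null)
print(round(z, 3), p_value(z))
```

A large positive z-score means the match scores well above random matches, and the corresponding p-value is the chance of a random match scoring at least as high under the assumed null distribution.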

In addition to regular peptide and protein identification features, Phenyx offers a number of additional functions, such as a result comparison interface for viewing multiple results side by side, an import function for incorporating results from other search engines, and a manual validation feature for accepting or rejecting identifications, after which protein scores are dynamically recalculated.

OpenMS / TOPP

OpenMS is a C++ software library for LC/MS data management and analysis. It offers an infrastructure for the development of mass spectrometry related software. OpenMS is free software available under the LGPL.

TOPP - The OpenMS Proteomics Pipeline - is a set of small applications that can be chained to create analysis pipelines tailored to a specific problem. TOPP is developed using the data structures and algorithms provided by OpenMS. TOPP is free software available under the LGPL.

OpenMS and TOPP are a joint project of the Algorithmic Bioinformatics group at the Free University of Berlin, the Department for Simulation of Biological Systems of Tübingen University and the Junior Research Group for Protein-Protein Interactions and Computational Proteomics at Saarland University.

Xtandem

X! Tandem is open source software that can match tandem mass spectra with peptide sequences, in a process that has come to be known as protein identification.

This software has a simple application programming interface (API): it takes an XML file of instructions on its command line and writes the results to an XML file whose location is specified in the input XML file.^[5] This format is used for all of the X! series search engines, as well as the GPM and GPMDB.
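A small script can generate such an instruction file. The sketch below assumes the input takes the form of a `bioml` root element holding `note` elements with `type="input"` and a `label` attribute naming each parameter; the labels shown are written from recollection of X! Tandem's parameter naming and should be verified against the GPM documentation before use. The file names and the `build_input` helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_input(spectrum_path, output_path, taxon,
                defaults="default_input.xml", taxonomy="taxonomy.xml"):
    """Build an instruction file as a string (labels are assumptions to verify)."""
    params = {
        "list path, default parameters": defaults,
        "list path, taxonomy information": taxonomy,
        "protein, taxon": taxon,
        "spectrum, path": spectrum_path,
        "output, path": output_path,  # where the search results are written
    }
    root = ET.Element("bioml")
    for label, value in params.items():
        note = ET.SubElement(root, "note", {"type": "input", "label": label})
        note.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_input("spectra.mgf", "results.xml", "yeast")
print(xml_text)
```

The resulting text would be saved to a file and passed to the search engine as its single command-line argument. Note that the output location is itself one of the input parameters, as the paragraph above describes.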

Unlike some earlier generation search engines, all of the X! series search engines calculate statistical confidence (expectation values) for all of the individual spectrum-to-sequence assignments. They also reassemble all of the peptide assignments in a data set onto the known protein sequences and assign the statistical confidence that this assembly and alignment is non-random.^[6] Therefore, separate assembly and statistical analysis software (e.g. PeptideProphet and ProteinProphet) is not needed.

This approach is fast, but it is less sensitive and produces more false negatives.

Mass Frontier

Mass Frontier is a software tool for the interpretation and management of mass spectra of small molecules. Computer methods for interpreting mass spectral data in Mass Frontier centre on three fundamental methodologies: library search techniques, expert system procedures and classification methods. Mass Frontier automatically generates possible fragments, including complete fragmentation and rearrangement mechanisms, starting from a user-supplied chemical structure. The software contains an expert system that automatically extracts a decomposition mechanism for each fragmentation reaction in the fragmentation library and determines the range of compound classes to which the mechanism can be applied. The expert system applies database mechanisms to a user-provided structure and automatically predicts the fragmentation reactions for a given compound. The knowledge base uses around 30,000 fragmentation schemes that contain around 100,000 reactions collected from the mass spectrometry literature.

Mass Frontier also incorporates an automated system for detecting chromatographic components in complex GC/MS, LC/MS or MSⁿ runs and extracting mass spectral signals from closely coeluting components (deconvolution).

Classification methods include principal component analysis, neural networks and fuzzy clustering.

massXpert

massXpert is graphical user interface (GUI) software for simulating and analyzing mass spectrometric data obtained on known biopolymer sequences. The software runs identically on MS-Windows, Mac OS X and GNU/Linux/Unix platforms. massXpert is not intended for identifying proteins, but is useful for characterizing biopolymer sequences (post-translational modifications, intra-molecular cross-links, and so on). It comprises four modules, all available in the same program interface:

XpertDef lets the user define any aspect of the polymer chemistry at hand (atoms/isotopes, monomers, modifications, cleavage agents, fragmentation patterns, cross-links, default ionization, and so on).
XpertCalc is a desktop mass calculator. The calculation is polymer chemistry definition-aware and fully programmable; m/z ratios can be computed with automatic replacement of the ionization agent; isotopic patterns can be computed from an elemental composition, with the option of specifying the resolution of the mass spectrometer.
XpertEdit is the central part of the software suite. It holds all the simulation and analysis functions, such as polymer sequence editing, sequence/monomer chemical modifications, cleavages, fragmentations, elemental/monomeric composition determinations, pI/net charge calculations, and arbitrary mass searches in the polymer sequence.
XpertMiner is a recently developed module (still experimental) into which lists of (m/z, z) pairs can be imported for further calculation. Typically this module is used to apply a formula to all the pairs in one step, or to match two lists, one from a simulation and the other from data actually acquired on the mass spectrometer.

All simulation results can be exported as text, either to the clipboard or to text files. The massXpert software is fully documented as a PDF file or as an HTML file hierarchy.
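The kind of calculation applied to (m/z, z) pairs can be written down directly. The sketch below is not massXpert code: it assumes positive-mode ionization by adduct formation, uses standard masses for the proton and the sodium cation, and the function names `mz` and `reionize` are hypothetical.

```python
PROTON = 1.007276       # mass of H+ in daltons
SODIUM_ION = 22.989221  # mass of Na+ in daltons


def mz(neutral_mass, charge, agent_mass=PROTON):
    """m/z of an ion carrying `charge` copies of the ionization agent."""
    return (neutral_mass + charge * agent_mass) / charge


def reionize(mz_value, charge, new_charge, old_agent=PROTON, new_agent=PROTON):
    """Convert an (m/z, z) pair to another charge state or ionization agent."""
    neutral_mass = mz_value * charge - charge * old_agent
    return mz(neutral_mass, new_charge, new_agent)


# Apply the same formula to every pair in a list in one step.
pairs = [(501.007276, 2), (334.340609, 3)]
singly_charged = [reionize(value, z, 1) for value, z in pairs]
sodiated = [reionize(value, z, 1, new_agent=SODIUM_ION) for value, z in pairs]
print(singly_charged)
print(sodiated)
```

Both pairs in the example correspond to a neutral mass of about 1000 Da, so they collapse to the same singly protonated m/z, which is the sort of consistency check that matching a simulated list against a measured one depends on.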

External links

References

^ PEAKS: SPIDER (Sequence Homology Search Tool)
^ Eng JK et al (1994). "An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database". Journal of the American Society for Mass Spectrometry 5 (11): 976–989.
^ Monroe ME et al (2007). "VIPER: an advanced software package to support high-throughput LC-MS peptide identification". Bioinformatics 23: 2021. doi:10.1093/bioinformatics/btm281. PMID 17545182.
^ Colinge J, Masselot A, Giron M, Dessingy T, Magnin J (2003). "OLAV: towards high-throughput tandem mass spectrometry data identification". Proteomics 3 (8): 1454–63. doi:10.1002/pmic.200300485. PMID 12923771.
^ http://www.thegpm.org/docs/X_series_output_form.pdf
^ http://www.thegpm.org/docs/peptide_protein_expect.pdf

Published Resources

Bin Ma, et al. Search for the Undiscovered Peptide; Using de novo sequencing and sequence tag homology search to improve protein characterization. BioTechniques, Vol. 42, No. 5, 2007.
Changjiang Xu, Bin Ma. Software for Computational Peptide Identification from MS-MS data. Drug Discovery Today, Volume 11, Numbers 13/14, July 2006, p 595-600.
Yonghua Han, Bin Ma, Kaizhong Zhang. SPIDER: Software for Protein Identification from Sequence Tags Containing De Novo Sequencing Error. Journal of Bioinformatics and Computational Biology 3(3):697-716. 2005. (Also appeared in CSB'04, 206-215; received the Best Paper Award.)
Bin Ma, Kaizhong Zhang, Christopher Hendrie, Chengzhi Liang, Ming Li, Amanda Doherty-Kirby, Gilles Lajoie. PEAKS: Powerful Software for Peptide De Novo Sequencing by MS/MS. Rapid Communications in Mass Spectrometry, 17(20):2337-2342. 2003. Early version appeared in 50th ASMS Conference 2002.
