Mass spectrometry software is software used for data acquisition, analysis, or representation in mass spectrometry.
Within the field of protein mass spectrometry, tandem mass spectrometry (also known as MS/MS or MS2) experiments are used for protein/peptide identification.
In these experiments, sample proteins are broken up into short peptides using an enzyme like trypsin and separated in time using liquid chromatography. They are then sent through one mass spectrometer to separate them by mass. Peptides having a specific mass are then typically fragmented using collision-induced dissociation and sent through a second mass spectrometer, which generates a set of fragment peaks from which the amino acid sequence of the peptide may often be inferred. Peptide identification software is used to try to reliably make these inferences.[1]
A typical experiment involves several hours of mass spectrometer time, and recent instruments may produce hundreds of thousands of MS/MS spectra, which must then be interpreted.
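The enzymatic digestion step described above can be imitated in software. The following minimal sketch assumes a simplified cleavage rule that is commonly used for trypsin (cut after lysine, K, or arginine, R, unless the next residue is proline, P). The function name `tryptic_digest` and the `missed_cleavages` parameter are illustrative and are not taken from any of the programs described in this article.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Return peptides from a simplified in-silico trypsin digest.

    Assumed rule: cleave after K or R, except when the next residue is P.
    """
    pieces, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal remainder

    # Join neighbouring pieces to model up to `missed_cleavages` skipped sites.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("AKCRD", missed_cleavages=1))
```

Real digests are less clean than this rule suggests, which is why search engines usually let the user choose how many missed cleavages to allow.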
Peptide identification algorithms fall into two broad classes: database search and de novo search. The former search takes place against a database containing all amino acid sequences assumed to be present in the analyzed sample, whereas the latter infers peptide sequences without knowledge of genomic data. At present, database search is more popular and considered to produce higher quality results for most uses. With increasing instrument precision, however, de novo search may become increasingly attractive.
SEQUEST is a proprietary tandem mass spectrometry data analysis program developed by John Yates and Jimmy Eng in 1994.[2] The algorithm used by this program is covered by several US and European software patents.
SEQUEST matches collections of tandem mass spectra to peptide sequences that have been generated from databases of protein sequences. It was one of the first, if not the first, database search programs.
SEQUEST, like many engines, identifies each tandem mass spectrum individually. The software evaluates protein sequences from a database to compute the list of peptides that could result from each. The peptide's intact mass is known from the mass spectrum, and SEQUEST uses this information to determine the set of candidate peptide sequences that could meaningfully be compared to the spectrum by including only those which are near the mass of the observed peptide ion. For each candidate peptide, SEQUEST projects a theoretical tandem mass spectrum and compares it to the observed tandem mass spectrum by cross correlation. The candidate sequence with the best matching theoretical tandem mass spectrum is reported as the best identification for this spectrum.
While very successful in terms of sensitivity, SEQUEST is quite slow to process data, and there are concerns about its specificity, especially if multiple posttranslational modifications (PTMs) are present.
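The candidate selection and scoring steps described for SEQUEST can be illustrated with a much simplified sketch. This is not SEQUEST's implementation: it assumes singly charged b- and y-ions only, unit-width m/z bins, no intensity preprocessing, and a score of the commonly described form (correlation at zero offset minus the mean correlation over offsets from -75 to +75 bins). All function names and the mass tolerance are hypothetical.

```python
# Monoisotopic residue masses in daltons.
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
        "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
        "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
        "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER, PROTON = 18.01056, 1.00728


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def candidates(peptides, observed_mass, tol=0.5):
    """Keep only peptides whose mass lies near the observed intact mass."""
    return [p for p in peptides if abs(peptide_mass(p) - observed_mass) <= tol]


def theoretical_peaks(seq):
    """m/z values of singly charged b- and y-ions (a simplifying assumption)."""
    peaks = []
    for i in range(1, len(seq)):
        peaks.append(sum(MONO[aa] for aa in seq[:i]) + PROTON)          # b-ion
        peaks.append(sum(MONO[aa] for aa in seq[i:]) + WATER + PROTON)  # y-ion
    return peaks


def binned(mzs, n_bins=2000):
    """Presence/absence vector with one bin per integer m/z."""
    vec = [0.0] * n_bins
    for mz in mzs:
        k = int(round(mz))
        if 0 <= k < n_bins:
            vec[k] = 1.0
    return vec


def xcorr(observed, theoretical, max_shift=75):
    """Zero-offset correlation minus the mean correlation over all offsets."""
    n = len(observed)

    def dot(shift):
        return sum(observed[i] * theoretical[i + shift]
                   for i in range(n) if 0 <= i + shift < n)

    background = sum(dot(s) for s in range(-max_shift, max_shift + 1))
    return dot(0) - background / (2 * max_shift + 1)


def best_match(peptides, observed_mass, observed_mzs):
    """Return the mass-compatible candidate with the highest score, or None."""
    obs = binned(observed_mzs)
    scored = [(xcorr(obs, binned(theoretical_peaks(p))), p)
              for p in candidates(peptides, observed_mass)]
    return max(scored)[1] if scored else None
```

In this sketch the mass filter does most of the work of shrinking the search space, and the correlation score only has to rank the few sequences that survive it.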
Mascot[3] is a proprietary identification program available from Matrix Science. It performs mass spectrometry data analysis through a statistical evaluation of matches between observed and projected peptide fragments rather than cross correlation. As of version 2.2, support for peptide quantitation methods is provided in addition to the identification features.
PEAKS DB is a proprietary proteomic mass spectrometry database search engine, developed by Bioinformatics Solutions Inc. In addition to providing an independent database search, results can be incorporated as part of the software’s multi-engine (Sequest, Mascot, X!Tandem, OMSSA, PEAKS DB) consensus reporting tool, inChorus.[4] In addition to reporting database sequences, it also provides a list of sequences identified exclusively by de novo sequencing. Combining de novo sequencing results with those of database searching is intended to make the search more efficient while maintaining speed and a low false discovery rate (FDR).[5]
X!Tandem[6] is open source software that can match tandem mass spectra with peptide sequences, in a process that has come to be known as protein identification.
This software has a simple, XML-based input file format.[7] This format is used for all of the X! series search engines, as well as the GPM and GPMDB.
Unlike some earlier generation search engines, all of the X! series search engines calculate statistical confidence (expectation values) for all of the individual spectrum-to-sequence assignments. They also reassemble all of the peptide assignments in a data set onto the known protein sequences and assign the statistical confidence that this assembly and alignment is non-random (i.e., did not occur by chance).[8] Therefore, separate assembly and statistical analysis software (e.g. PeptideProphet and ProteinProphet) is not needed.
This approach is good in terms of speed but poor with regard to false negatives and sensitivity.
X!!Tandem [9] is a parallel, high performance version of X!Tandem that has been parallelized via MPI to run on clusters or other non-shared memory multiprocessors running Linux. In X!!Tandem the search is parallelized by splitting the input spectra into as many subsets as there are processors, and processing each subset independently. Both compute-intensive stages of the processing (initial and refinement) are parallelized, and overall speedups in excess of 20-fold have been observed on real datasets.
With the exception of the details related to MPI launch, it is run exactly as X!Tandem, and produces exactly the same results using the same input and configuration files. It differs from Parallel Tandem[10] in that the parallelism is handled internally, rather than as an external driver/wrapper.
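The partitioning strategy described for X!!Tandem (split the spectra into as many subsets as there are processors, search each subset independently, then combine the results) can be sketched as follows. This is an illustration under stated assumptions rather than X!!Tandem's code: threads stand in for MPI ranks, the round-robin assignment of spectra to subsets is an assumption, and `search_subset` is a hypothetical placeholder for the real per-spectrum search.

```python
from concurrent.futures import ThreadPoolExecutor


def split_spectra(spectra, n_workers):
    """Deal the spectra round-robin into n_workers subsets."""
    return [spectra[rank::n_workers] for rank in range(n_workers)]


def search_subset(subset):
    """Placeholder search: report each spectrum id with its most intense m/z."""
    return [(spec_id, max(peaks, key=lambda peak: peak[1])[0])
            for spec_id, peaks in subset]


def parallel_search(spectra, n_workers):
    """Search every subset independently, then merge the partial results."""
    subsets = split_spectra(spectra, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(search_subset, subsets))
    merged = [hit for part in partial_results for hit in part]
    return sorted(merged)  # restore a deterministic order


if __name__ == "__main__":
    # Each spectrum is (id, [(m/z, intensity), ...]).
    demo = [(i, [(100.0 + i, 1.0), (200.0 + i, 5.0)]) for i in range(7)]
    print(parallel_search(demo, n_workers=3))
```

Because each spectrum is scored independently of the others, the merged output does not depend on how the spectra were divided, which is consistent with the statement that the parallel version produces the same results as the serial one.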
Phenyx is developed by Geneva Bioinformatics (GeneBio) in collaboration with the Swiss Institute of Bioinformatics (SIB). Phenyx incorporates OLAV, a family of statistical scoring models, to generate and optimize scoring schemes that can be tailored to all kinds of instruments, instrumental set-ups and general sample treatments. It does not, however, work on raw, unprocessed data.[11] Phenyx computes a score to evaluate the quality of a match between a theoretical and an experimental peak list (i.e. mass spectrum). A match is thus a collection of observations deduced from this comparison. The basic peptide score is the sum of raw scores for up to twelve physico-chemical properties, and it is ultimately transformed into a normalized z-score and a p-value.
In addition to regular peptide and protein identification features, Phenyx offers a number of additional functionalities, such as: a result comparison interface to visualise multiple results side by side; an import functionality to incorporate results from other search engines; and a manual validation feature to accept or reject identifications manually and dynamically recalculate protein scores.
OMSSA[12] is an open source database search program developed at NCBI.[13]
MyriMatch[14] is an open source database search program developed at the Vanderbilt Medical Center.[15]
Greylag is an open source database search program developed at the Stowers Institute for Medical Research.[16] Its scoring algorithm is based on that of MyriMatch, but it includes a novel FDR (false discovery rate) validation algorithm as well. It is designed to perform large searches on computational clusters having hundreds of nodes. Notably, it is largely implemented in an interpreted language, Python, with only the CPU-intensive routines written in a compiled language (C++).
ByOnic is a database search program with a public web interface,[17] developed at PARC.[18] ByOnic works together with ComByne,[19] which combines peptide identifications to produce a protein score.
An MS-alignment search engine is available from the Center for Computational Mass Spectrometry at the University of California, San Diego.[20]
SIMS (Sequential Interval Motif Search)[21] is a software tool designed to perform unrestrictive PTM searches over tandem mass spectra. In other words, users do not have to characterize the potential PTMs in advance. Instead, users only need to specify the range of modification mass for each individual amino acid.[22]
MassWiz[23] is a free, open source search algorithm developed at the Institute of Genomics and Integrative Biology. It is available as a Windows command-line tool[24] and also as a web server.[25]
De novo peptide sequencing algorithms are, in general, based on the approach proposed in reference [26].
DeNovoX performs de novo sequencing on CID spectra acquired with ion trap mass spectrometers. The software, launched in 2002 by Thermo Fisher Scientific, was the first commercially available software for low-resolution data.[27] DeNovoX delivers complete and/or partial peptide sequences (sequence tags). Each output sequence comes with a probability indicating how likely it is for the sequence to have been obtained by chance. The software implements an algorithm based on probabilistic inference.
DeNoS is part of the software tool Proteinmatching Analysis Software (PAS) which in turn is part of the software package Medicwave Bioinformatics Suite (MBS).[28][29]
Task: DeNoS performs complete or almost complete sequencing of peptides with high reliability (>95%). DeNoS uses all information from CAD and ECD spectra. It is a hierarchical algorithm. In the first step, fragments that are confirmed in both CAD and ECD (so-called Golden Complementary Pairs), along with fragments that are found only in CAD (so-called Complementary Pairs), are used. After that, fragments with progressively lower reliability are used step by step. In the last step, if the peptide is still not fully sequenced, the software uses a simple application of graph theory to sequence the remaining peptide parts with "unreliable" fragments.
Advantage: DeNoS is reported to be the first algorithm able to sequence peptides with >95% reliability. About 13% of all MS/MS spectra are almost completely sequenced; for comparison, a search engine typically identifies only about 10% of all MS/MS spectra in an experiment.
Input: DTA files, where each file contains data from a mass spectrum, either ECD or CAD.
Output: Complete or almost complete peptide sequences.
PEAKS de novo automatically provides a complete sequence for each peptide, confidence scores on individual amino acid assignments, simple reporting for high-throughput analysis, and greater knowledge for scientifically sensitive, in-depth investigations.[30] A manually assisted de novo mode is available for users who wish to tweak or optimize their results further. According to published reports, PEAKS is currently the fastest, most accurate auto de novo algorithm available. Automated de novo sequencing of an entire LC run processed data faster than 1 spectrum per second.[31] The results were unmatched in accuracy; PEAKS determined at least 3 times as many completely correct sequences as the next best de novo software.[32] With accurate mass capabilities, de novo sequencing at 97% accuracy is possible.[33]
Lutefisk is software for the de novo interpretation of peptide CID spectra.[34]
For the identification of proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum.[35][36] Those tags are then matched, allowing for amino acid mutations, against the sequences in a protein database. If the de novo sequencing gives correct tags, the homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. The SPIDER algorithm matches sequence tags with errors to database sequences for the purpose of protein and peptide identification.[37] BLAST (and similar) homology approaches can fail when confronted with common sequence substitutions such as I/L, N/GG, SAT/TAS. SPIDER is designed to avoid these problems. SPIDER can be used in conjunction with PEAKS mass spectrometry data analysis software.
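The substitutions listed above are hard to avoid because the exchanged segments have the same, or nearly the same, mass. The short check below makes the arithmetic explicit using standard monoisotopic residue masses; the function names and tolerance values are illustrative and are not part of SPIDER or BLAST.

```python
# Monoisotopic residue masses in daltons for the residues used below.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "T": 101.04768,
                "L": 113.08406, "I": 113.08406, "N": 114.04293,
                "Q": 128.05858, "K": 128.09496}


def segment_mass(segment):
    """Sum of residue masses for a run of amino acids."""
    return sum(RESIDUE_MASS[aa] for aa in segment)


def indistinguishable(a, b, tolerance=0.01):
    """True if the two segments differ in mass by no more than `tolerance` Da."""
    return abs(segment_mass(a) - segment_mass(b)) <= tolerance


if __name__ == "__main__":
    for a, b in [("I", "L"), ("N", "GG"), ("SAT", "TAS"), ("K", "Q")]:
        diff = abs(segment_mass(a) - segment_mass(b))
        print(a, b, round(diff, 5), indistinguishable(a, b))
```

Isoleucine and leucine are isomers, asparagine has the same mass as two glycines to within rounding, and SAT/TAS is a reordering of the same residues. The K/Q pair differs by about 0.036 Da, so it is separable only when the mass tolerance is tighter than that.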
AnalyzerPro is proprietary software by SpectralWorks Limited. It is a vendor-independent software application for processing mass spectrometry data. Using proprietary algorithms, AnalyzerPro can analyze both GC-MS and LC-MS data using both qualitative and quantitative data processing. It is widely used for metabolomics data processing, using MatrixAnalyzer for the comparison of multiple data sets.
Analyst is a proprietary software by AB Sciex, a division of The Danaher Corporation.
RemoteAnalyzer is a proprietary software by SpectralWorks Limited. It is a vendor independent 'Open Access' client/server based solution to provide a walk-up and use LC-MS and GC-MS data system. Instrument control and data processing support for multiple vendors' hardware is provided.
Electrospray ionization (ESI) mass spectrometry (MS) devices with relatively low resolution are widely used for proteomics and metabolomics. Ion trap devices like the Agilent MSD/XCT ultra or the Bruker HCT ultra are typical representatives. However, even if ESI-MS data of most of the naturally occurring proteins can be measured, the availability of data evaluation software for such ESI protein spectra with low resolution is quite limited.
ESIprot 1.0 enables the charge state determination and molecular weight calculation for low resolution electrospray ionization (ESI) mass spectrometry (MS) data of proteins.[38][39]
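Charge state determination from a low-resolution ESI spectrum is usually explained with two adjacent peaks of the same protein that differ by one proton. The sketch below applies that textbook relation; it is not taken from ESIprot itself, and the function name and the use of the proton mass as the charge carrier are assumptions made for the illustration.

```python
PROTON = 1.00728  # mass of the charge carrier in daltons (assumed: a proton)


def charge_and_mass(mz_high, mz_low):
    """Estimate charge and neutral mass from two adjacent charge-state peaks.

    mz_high carries charge z and mz_low carries charge z + 1, so that
        mz_high = (M + z * PROTON) / z
        mz_low  = (M + (z + 1) * PROTON) / (z + 1)
    which gives z = (mz_low - PROTON) / (mz_high - mz_low).
    """
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON)
    return z, mass


def mz_of(mass, z):
    """Forward calculation, used here to build a synthetic example."""
    return (mass + z * PROTON) / z


if __name__ == "__main__":
    # Synthetic peaks for a hypothetical 16951.5 Da protein at z = 10 and 11.
    print(charge_and_mass(mz_of(16951.5, 10), mz_of(16951.5, 11)))
```

With real data the estimate is normally repeated over every adjacent pair in the charge envelope and the resulting masses are averaged, which also yields a standard deviation for the molecular weight.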
Spectrolyzer is a flexible Microsoft Windows based software package that provides bioinformatics data analysis tools for different mass spectrometers.
Spectrolyzer focuses on finding protein biomarkers and detecting protein deviations. It is compatible with most mass spectrometers: the package contains several tools, each of which analyzes data from a particular mass spectrometry technology, namely tandem MS (MS/MS), MALDI-TOF MS and SELDI-TOF MS. The software aims to ensure high-quality analysis while allowing flexibility for special requirements and reducing the time needed for each analysis.[40]
ProTrawler is an LC/MS data reduction application that reads raw mass spectrometry vendor data (from a variety of well-known instrument companies) and creates lists of {mass, retention time, integrated signal intensity} triplets summarizing the LC/MS chromatogram. The measurements are reported with errors, which are essential for performing dynamic binning for comparisons between data sets. ProTrawler operates in two modes: a highly visual hands-on (expert) mode for developing the parameters used in data reduction, and a fully automated mode for processing many chromatograms. ProTrawler's data reduction workflow includes background elimination, noise estimation, peak shape estimation, shape deconvolution, and isotopic and charge-state list deconvolution (factoring in errors and signal noise) to give a list of features. Typically, ProTrawler reduces 1 GB of raw data to 10 Kb of processed results with a detection sensitivity of three orders of magnitude in 25% of the data acquisition time. No formal Bayesian methods are used, but sophisticated statistical inference is employed throughout. ProTrawler has been used for bacterial protein biomarker discovery efforts as well as for IPEx-related applications.
Regatta is an LC/MS list comparison application that works hand-in-hand with ProTrawler (but also accepts input in Excel/CSV form) to provide an environment for filtering and normalizing LC/MS results lists of {mass, retention time, integrated intensity} triplets. To accomplish this, Regatta addresses the "Transitive Property of Equality" problem that arises in the comparison of analytical list data: if Peak A in Sample A overlaps Peak B in Sample B, and Peak B overlaps Peak C in Sample C, but Peak A does not overlap Peak C, have the three samples measured the same analyte or not? Regatta also implements multivariate analysis (e.g., hierarchical cluster analysis and principal component analysis) as well as statistical tests (e.g., coefficients of variation). Regatta has been used successfully for biomarker discovery.
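The transitivity problem described for list comparison can be reproduced with three peaks whose error bars overlap only pairwise. The sketch below is a generic illustration, not Regatta's algorithm: peaks are reduced to hypothetical (mass, error) pairs, and the grouping step uses single linkage (any chain of overlaps joins peaks into one group), which is only one of several possible ways to resolve the ambiguity.

```python
def overlaps(p, q):
    """Two (mass, error) measurements overlap if their error bars touch."""
    (m1, e1), (m2, e2) = p, q
    return abs(m1 - m2) <= e1 + e2


def link_groups(peaks):
    """Group peak names so that any chain of overlaps ends up in one group."""
    groups = []
    for name, peak in peaks.items():
        touching = [g for g in groups
                    if any(overlaps(peak, peaks[member]) for member in g)]
        merged = {name}.union(*touching)
        groups = [g for g in groups if g not in touching] + [merged]
    return groups


if __name__ == "__main__":
    peaks = {"A": (500.00, 0.02), "B": (500.03, 0.02), "C": (500.06, 0.02)}
    print(overlaps(peaks["A"], peaks["B"]))  # A overlaps B
    print(overlaps(peaks["B"], peaks["C"]))  # B overlaps C
    print(overlaps(peaks["A"], peaks["C"]))  # but A does not overlap C
    print(link_groups(peaks))
```

Single linkage answers the question with "yes, one analyte", at the cost of possibly chaining together peaks that are far apart; a stricter rule that requires every pair in a group to overlap would split the same three peaks.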
OmicsHub Proteomics combines a LIMS for mass spec information management with data analysis functionalities on one platform. The software allows the user to import data files from multiple instruments, and conduct protein peak detection, filtering, protein identification, annotation and exportation of formatted reports. It is a single server platform with a web interface for multiuser access and is proprietary software of Integromics.
The "Proteomics Research Resource for Integrative Biology" distributes software tools (VIPER,[41] Decon2LS, and others) that can be used to analyze the accurate mass and chromatographic retention time of LC-MS features. This is sometimes referred to as the Accurate Mass and Time tag approach (AMT tag approach); these tools are generally used for proteomics.
OpenMS is a C++ software library for LC/MS data management and analysis.[42] It offers an infrastructure for the development of mass spectrometry related software. OpenMS is free software available under the LGPL.
TOPP (The OpenMS Proteomics Pipeline) is a set of small applications that can be chained to create analysis pipelines tailored to a specific problem. TOPP is developed using the data structures and algorithms provided by OpenMS. TOPP is free software available under the LGPL. TOPP provides ready-to-use applications for peak picking, the finding of peptide features and their quantitation, as well as interfaces to most of the database search engines.
OpenMS and TOPP are a joint project of the Algorithmic Bioinformatics group at the Free University of Berlin, the Applied Bioinformatics group at Tübingen University and the Junior Research Group for Protein-Protein Interactions and Computational Proteomics at Saarland University.
Mass Frontier is a software tool for interpretation and management of mass spectra of small molecules. Computer methods for interpretation of mass spectral data in Mass Frontier centre on three fundamental methodologies: library search techniques, expert system procedures and classification methods. Mass Frontier uses automated generation of possible fragments at an expert level, including complete fragmentation and rearrangement mechanisms, starting from a user-supplied chemical structure. This software contains an expert system that automatically extracts a decomposition mechanism for each fragmentation reaction in the fragmentation library and determines the compound class range that the mechanism can be applied to. The expert system applies database mechanisms to a user-provided structure and automatically predicts the fragmentation reactions for a given compound. The knowledge base uses around 30,000 fragmentation schemes that contain around 100,000 reactions collected from mass spectrometry literature.
Mass Frontier also incorporates an automated system for detecting chromatographic components in complex GC/MS, LC/MS or MSn runs and extracting mass spectral signals from closely coeluting components (deconvolution).
Classification methods include principal component analysis, neural networks and fuzzy clustering.
The program massXpert[43] is graphical user interface (GUI) software for simulating and analyzing mass spectrometric data obtained on known biopolymer sequences.[44] The software runs in an identical manner on MS-Windows, Mac OS X and GNU/Linux/Unix platforms. massXpert is not intended for identifying proteins, but is useful for characterizing biopolymer sequences (post-translational modifications, intra-molecular cross-links, etc.). It comprises four modules, all available in the same program interface. XpertDef lets the user define any aspect of the polymer chemistry at hand (atoms/isotopes, monomers, modifications, cleavage agents, fragmentation patterns, cross-links, default ionization, etc.). XpertCalc is a desktop calculator for mass computations: the calculation is aware of the polymer chemistry definition and is fully programmable; m/z ratios can be computed with automatic replacement of the ionization agent; and isotopic patterns can be computed from an elemental composition, with the possibility to specify the resolution of the mass spectrometer. XpertEdit is the central part of the software suite and holds all the simulation and analysis functionality, such as polymer sequence editing, sequence/monomer chemical modifications, cleavages, fragmentations, elemental/monomeric composition determinations, pI/net charge calculations and arbitrary mass searches in the polymer sequence. XpertMiner is a more recent, still experimental module into which lists of (m/z, z) pairs can be imported and submitted to any kind of calculation. Typically this module is used to apply a formula to all the pairs in a single step, or to match two lists against each other, one from a simulation and another from data actually acquired on the mass spectrometer. All simulation results can be exported as text, either to the clipboard or to text files.
mMass[45] is an open source, multi-platform package of tools for precise mass spectrometric data analysis and interpretation. It is written in Python, so it is portable to different computer platforms, and it is released under the GNU General Public License, so it can be modified or extended with modules for specific needs.
ProteoIQ is commercial software for the post-analysis of Mascot, SEQUEST, or X!Tandem database search results. The software provides the means to combine tandem mass spectrometry database search results derived from different instruments/platforms. Since the primary goal of many proteomics projects is to determine thresholds which identify as many real proteins as possible while encountering a minimal number of false positive protein identifications, ProteoIQ incorporates the two most common methods for statistical validation of large proteome datasets: the false discovery rate and protein probability approaches.[46][47][48] For false discovery rate calculations, ProteoIQ incorporates proprietary Protein Validation Technology (ProValT) algorithms licensed from the University of Georgia Research Foundation. Protein and peptide probabilities are generated by independent implementations of the PeptideProphet and ProteinProphet algorithms. In ProteoIQ, protein relative quantitation is performed via spectral counting, standard deviations are automatically calculated across replicates, and spectral count abundances are normalized between samples. Integrated comparison functions allow the user to quickly compare proteomic results across biological samples.
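Relative quantitation by spectral counting, as mentioned for ProteoIQ, can be illustrated with a small sketch. The normalization used by ProteoIQ is not described here, so the example assumes one simple and common choice: scaling every sample so that its total spectral count equals the mean total across samples. The function names and the sample data are hypothetical.

```python
from statistics import mean, stdev


def normalize(samples):
    """Scale each sample so its total spectral count equals the mean total.

    `samples` maps sample name -> {protein: spectral count}.
    """
    totals = {name: sum(counts.values()) for name, counts in samples.items()}
    target = mean(totals.values())
    return {name: {protein: count * target / totals[name]
                   for protein, count in counts.items()}
            for name, counts in samples.items()}


def replicate_stdev(normalized, protein):
    """Standard deviation of one protein's normalized count across replicates."""
    return stdev([counts[protein] for counts in normalized.values()])


if __name__ == "__main__":
    replicates = {"rep1": {"P1": 10, "P2": 30},
                  "rep2": {"P1": 20, "P2": 60}}
    scaled = normalize(replicates)
    print(scaled)
    print(replicate_stdev(scaled, "P1"))
```

In this toy example the second replicate simply contains twice as many spectra, so after normalization both replicates agree and the standard deviation is zero.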
PatternLab is free software for post-analysis of SEQUEST or ProLuCID database search results filtered by DTASelect or Census. It offers several tools that combine false discovery rates with statistical tests and protein fold changes to pinpoint differentially expressed proteins, find trends among proteins having similar expression profiles in time-course experiments, generate area-proportional Venn diagrams, and even deconvolute mass spectra to enable analysis of top-down / middle-down proteomic data (YADA module). Results can also be analyzed using its Gene Ontology Explorer module.[49]
MolAna was developed by Phenomenome Discoveries Inc. (PDI) for use in the IONICS Mass Spectrometry Group's 3Q Molecular Analyzer, a triple quadrupole mass spectrometer.
Xcalibur is a proprietary software by Thermo Fisher Scientific used with mass spectrometry instruments.
MassCenter is proprietary software by JEOL used with mass spectrometry instruments such as the JMS AccuTOF T100LC.
MassLynx is a proprietary software by Waters Corporation.
TurboMass is proprietary GC/MS software by PerkinElmer.
MSight is a free software for mass spectrometry imaging developed by the Swiss Institute of Bioinformatics.[50]
Spectromania is a commercial software for analysis and visualization of mass spectrometric data.[51]
Peacock is an open source Mac OS X application developed by Johan Kool that can be used to interpret gas-chromatography/mass-spectrometry (GC/MS) data files.[52]
MSGraph is an open source mass spectrometry software working in MS-DOS.[53]
OpenChrom is open source chromatography and mass spectrometry software. It can be extended using plug-ins and is available for several operating systems (Microsoft Windows, Linux, Unix, Mac OS X) and processor architectures (x86, x86_64, ppc).[54] A free-of-charge, read-only converter for Agilent's ChemStation (*.D) files is also available.[55]
ms2mz is a free, simple utility for converting between mass spectrometer file formats. The most common use of ms2mz is to convert proprietary binary files to MGF peak list files. This is a handy way to prepare files for upload to Proteome Cluster.[56]