De novo peptide sequencing

In mass spectrometry, de novo peptide sequencing is the method in which a peptide amino acid sequence is determined from tandem mass spectrometry.

Knowing the amino acid sequence of peptides from a protein digest is essential for studying the biological function of the protein. Historically, this was accomplished by the Edman degradation procedure.^[1] Today, analysis by tandem mass spectrometry is the more common way to sequence peptides. Generally, there are two approaches: database search and de novo sequencing. Database search is the simpler approach: the mass spectrum of the unknown peptide is submitted and compared against known peptide sequences, and the peptide with the highest matching score is selected.^[2] This approach fails to recognize novel peptides, since it can only match sequences that already exist in the database. De novo sequencing instead derives the sequence directly by assigning fragment ions in the mass spectrum. Different algorithms^[3] are used for interpretation, and most instruments come with de novo sequencing programs.

Peptide fragmentation

Peptides are protonated in positive-ion mode. The proton initially resides at the N-terminus or on a basic residue side chain, but because of internal solvation it can move along the backbone, causing cleavage at different sites and therefore different fragments. The fragmentation rules are well explained in several publications.^[4]^[5]^[6]^[7]^[8]^[9]

Three different types of backbone bonds can be broken to form peptide fragments:

1 the alkyl carbonyl bond (CHR-CO);

2 the peptide amide bond (CO-NH);

3 the amino alkyl bond (NH-CHR).

Different types of fragment ions

Sequence ions

Fig. 1. Six types of sequence ions in peptide fragmentation^[10]

When the backbone bonds cleave, six different types of sequence ions are formed, as shown in Fig. 1. The N-terminal charged fragment ions are classed as a, b or c, while the C-terminal charged ones are classed as x, y or z. The subscript n is the number of amino acid residues. The nomenclature was first proposed by Roepstorff and Fohlman; Biemann then modified it, and his version became the most widely accepted.^[11]^[12]

Among these sequence ions, a-, b- and y-ions are the most common types, especially in low-energy collision-induced dissociation (CID) mass spectrometers, since the peptide amide bond (CO-NH) is the most vulnerable and a-ions arise from the loss of CO from b-ions.

Mass of b-ions = ∑ (residue masses) + 1 (H⁺)

Mass of y-ions = ∑ (residue masses) + 19 (H₂O+H⁺)

Mass of a-ions = mass of b-ions – 28 (CO)
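
The three formulas above can be turned into a short calculation. The following is a minimal sketch, assuming singly charged fragments and nominal (integer) residue masses; the table NOMINAL_RESIDUE_MASS and the function fragment_ions are illustrative names, not part of any sequencing package.

```python
# Nominal (integer) residue masses in Da for the 20 standard amino acids.
NOMINAL_RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99,
    "T": 101, "C": 103, "L": 113, "I": 113, "N": 114,
    "D": 115, "Q": 128, "K": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def fragment_ions(peptide):
    """Return nominal masses of singly charged a-, b- and y-ions."""
    residues = [NOMINAL_RESIDUE_MASS[aa] for aa in peptide]
    n = len(residues)
    # b_i: first i residues plus one proton (+1).
    b = [sum(residues[:i]) + 1 for i in range(1, n)]
    # y_i: last i residues plus water and one proton (+19).
    y = [sum(residues[n - i:]) + 19 for i in range(1, n)]
    # a_i: the corresponding b-ion minus CO (-28).
    a = [mass - 28 for mass in b]
    return {"a": a, "b": b, "y": y}


print(fragment_ions("GAK"))
# {'a': [30, 101], 'b': [58, 129], 'y': [147, 218]}
```

For the tripeptide Gly-Ala-Lys this gives b₁ = 58 and b₂ = 129, and y₁ = 147 and y₂ = 218. Monoisotopic masses would replace the integer table in any realistic setting.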

Internal ions

Double backbone cleavage produces internal ions, either acylium-type such as H₂N-CHR²-CO-NH-CHR³-CO⁺ or immonium-type such as H₂N-CHR²-CO-NH⁺=CHR³. These ions usually interfere with interpretation of the spectra.

Satellite ions

Satellite ions in peptide fragmentation^[8]

Further cleavage happens under high-energy CID at the side chain of C-terminal residues, forming d_n, v_n, w_n-ions.^[8]

Fragmentation rules summary^[9]

1 Most fragment ions are b- or y-ions. a-ions, formed by the loss of CO from b-ions, are also frequently seen.

2 Satellite ions (w_n, v_n, d_n-ions) are formed by high-energy CID.

3 Ser-, Thr-, Asp- and Glu-containing ions show a neutral loss of water (-18 Da).

4 Asn-, Gln-, Lys- and Arg-containing ions show a neutral loss of ammonia (-17 Da).

5 Neutral loss of ammonia from Arg leads to (y-17) or (b-17) fragment ions with higher abundance than their corresponding intact ions.

6 When the C-terminus has a basic residue, the peptide generates a (b_n-1+18) ion.

7 A complementary b-y ion pair can be observed in spectra of multiply charged ions. For this b-y ion pair, the sum of their subscripts is equal to the total number of amino acid residues in the unknown peptide.

8 If the C-terminus is Arg or Lys, the y₁-ion can be found in the spectrum to confirm it.
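
Rules 3 and 4 above can be expressed as a small lookup. This is a sketch only, assuming nominal masses and a fragment whose residue composition is already known; the function name expected_neutral_losses and its return format are hypothetical.

```python
# Residues that rules 3 and 4 associate with each neutral loss.
WATER_LOSS_RESIDUES = set("STDE")    # Ser, Thr, Asp, Glu: -18 (H2O)
AMMONIA_LOSS_RESIDUES = set("NQKR")  # Asn, Gln, Lys, Arg: -17 (NH3)


def expected_neutral_losses(fragment_sequence, fragment_mass):
    """Return the satellite peak masses that rules 3 and 4 predict."""
    residues = set(fragment_sequence)
    losses = {}
    if residues & WATER_LOSS_RESIDUES:
        losses["-H2O"] = fragment_mass - 18
    if residues & AMMONIA_LOSS_RESIDUES:
        losses["-NH3"] = fragment_mass - 17
    return losses


print(expected_neutral_losses("AK", 218))  # {'-NH3': 201}
print(expected_neutral_losses("GS", 145))  # {'-H2O': 127}
```

A peak found 17 or 18 below an assigned b- or y-ion then supports the assignment rather than counting as an unexplained signal.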

Methods for peptide fragmentation

Collision induced dissociation (CID)

In low-energy CID, b- and y-ions are the main product ions. In addition, loss of ammonia (-17 Da) is observed from fragments containing Arg, Lys, Asn or Gln, and loss of water (-18 Da) from fragments containing Ser, Thr, Glu or Asp. No satellite ions appear in the spectra.

In high-energy CID, all types of fragment ions can be observed, but losses of ammonia or water are not.

Electron transfer dissociation (ETD) and electron capture dissociation (ECD)

Predominant ions are c, y, z+1, z+2 and sometimes w ions.

Post source decay (PSD) in MALDI

a-, b- and y-ions are the most common product ions in MALDI-TOF PSD.

Factors affecting fragmentation

Charge state: the higher the charge state, the less energy is needed for fragmentation.
Mass of the peptide: the larger the mass, the more energy is required.
Induced energy: higher energy leads to more fragmentation.
Primary amino acid sequence
Mode of dissociation
Collision gas

Guidelines for interpretation

Table 1. Mass of amino acid fragment ions^[4]^[13]

Interpretation guidelines:^[14]

1 First, look for single amino acid immonium ions (H₂N⁺=CHR²). Corresponding immonium ions for amino acids are listed in Table 1.

2 Ignore a few peaks at the high-mass end of the spectrum. They are ions formed by neutral losses (H₂O, NH₃, CO₂, HCOOH) from [M+H]⁺ ions.

3 Look for pairs of peaks separated by 28 Da, since b-ions can form a-ions by loss of CO.

4 Look for b₂-ions at the low-mass end of the spectrum, which also helps to identify y_n-2-ions. Masses of b₂-ions are listed in Table 2, as well as single amino acids that have the same mass as b₂-ions.^[15]

Mass of b₂-ion = mass of two amino acid residues + 1.

5 Identify a sequence ion series by the same mass difference, which matches one of the amino acid residue masses (see Table 1). For example, mass differences between a_n and a_n-1, b_n and b_n-1, c_n and c_n-1 are the same.

6 Identify the y_n-1-ion at the high-mass end of the spectrum. Then continue to identify the y_n-2, y_n-3… ions by matching mass differences with the amino acid residue masses (see Table 1).

7 Look for the corresponding b-ions of the identified y-ions.

Mass of b+y ions = mass of the peptide+2

8 After knowing the y-ion series and b-ion series, assign the amino acid sequence and check the mass.

9 The other method is to identify b-ions first and then find the corresponding y-ions.
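
Steps 5 to 7 of the guidelines can be illustrated with a few lines of code. This is a minimal sketch, assuming peaks already rounded to nominal mass and belonging to a single singly charged ion series; MASS_TO_RESIDUE, read_ladder and complementary_mass are hypothetical names, and isobaric residues are reported together because nominal mass cannot separate them.

```python
# Nominal residue mass -> residue label. Leu/Ile are isobaric and
# Lys/Gln share the same nominal mass, so they are reported together.
MASS_TO_RESIDUE = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T",
    103: "C", 113: "L/I", 114: "N", 115: "D", 128: "K/Q",
    129: "E", 131: "M", 137: "H", 147: "F", 156: "R",
    163: "Y", 186: "W",
}


def read_ladder(series):
    """Translate consecutive mass differences of one ion series into residues."""
    ordered = sorted(series)
    return [MASS_TO_RESIDUE.get(high - low, "?")
            for low, high in zip(ordered, ordered[1:])]


def complementary_mass(fragment_mass, peptide_mass):
    """Mass of the partner ion, from: mass of b + y ions = peptide mass + 2."""
    return peptide_mass + 2 - fragment_mass


# y1, y2 and [M+H]+ of a peptide with neutral mass 274.
print(read_ladder([147, 218, 275]))  # ['A', 'G']
print(complementary_mass(218, 274))  # 58
```

Here the y-series differences read Ala then Gly from low to high mass, and the y₂-ion at 218 predicts a complementary b₁-ion at 58. A "?" marks a gap that does not match a single residue mass.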

Algorithms and software

Manual de novo sequencing is labor-intensive and time-consuming. Usually, algorithms or programs that come with the mass spectrometer are used to interpret the spectra.

Early development of de novo sequencing algorithms

An early method is to list all possible peptides for the precursor ion in the mass spectrum and match the theoretical spectrum of each candidate to the experimental spectrum. The candidate with the most similar spectrum has the highest chance of being the right sequence. However, the number of possible peptides may be large. For example, a precursor peptide with a molecular weight of 774 has 21,909,046 possible peptides. Even on a computer, this takes a long time.^[17]^[18]
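
The growth in the number of candidates can be explored with a short dynamic program. This is a sketch under stated assumptions: nominal integer residue masses, all 20 standard residues counted separately (so Leu and Ile, and Lys and Gln, each count twice), and no mass tolerance. The function name count_candidate_sequences is hypothetical, and the result need not reproduce the figure quoted above, which depends on the conventions of the cited work.

```python
# Nominal residue masses of the 20 standard amino acids
# (113 appears twice for Leu/Ile, 128 twice for Lys/Gln).
RESIDUE_MASSES = [57, 71, 87, 97, 99, 101, 103, 113, 113, 114,
                  115, 128, 128, 129, 131, 137, 147, 156, 163, 186]


def count_candidate_sequences(residue_mass_total):
    """Count ordered residue sequences whose masses sum to the given total."""
    ways = [0] * (residue_mass_total + 1)
    ways[0] = 1  # the empty sequence
    for total in range(1, residue_mass_total + 1):
        # The last residue can be any residue that still fits.
        ways[total] = sum(ways[total - r] for r in RESIDUE_MASSES if r <= total)
    return ways[residue_mass_total]


print(count_candidate_sequences(128))  # 4: K, Q, GA, AG
# For a peptide of molecular weight 774, subtract water (18) first:
print(count_candidate_sequences(774 - 18))
```

Counting is cheap, but generating and scoring a theoretical spectrum for every one of those candidates is what makes exhaustive enumeration slow.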

Another method, called “subsequencing”, does not list whole candidate sequences but instead matches short sequences that represent only part of the complete peptide. When sequences that closely match the fragment ions in the experimental spectrum are found, they are extended one residue at a time to find the best match.^[19]^[20]^[21]^[22]

In the third method, a graphical display of the data is used, in which fragment ions whose mass difference equals one amino acid residue are connected by lines. This makes it easier to get a clear picture of ion series of the same type. The method can be helpful for manual de novo peptide sequencing but is not suited to high-throughput work.^[23]

The fourth method, which is considered to be successful, is based on graph theory. Applying graph theory to de novo peptide sequencing was first mentioned by Bartels.^[24] Peaks in the spectrum are transformed into vertices in a graph called a “spectrum graph”. If the mass difference between two vertices equals the mass of one or several amino acids, a directed edge connects them. The SeqMS algorithm,^[25] the Lutefisk algorithm^[26] and the Sherenga algorithm^[27] are some examples of this type.
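
The spectrum graph idea can be sketched in a few lines. The example below is a simplification and not the implementation of SeqMS, Lutefisk or Sherenga: it assumes nominal masses, a single ion series, edges for single residues only, and it picks the longest path instead of applying a scoring function. The names RESIDUE_BY_MASS, build_spectrum_graph and longest_path are hypothetical, and Leu/Ile and Lys/Gln are merged under one letter because nominal mass cannot separate them.

```python
# Nominal residue mass -> one-letter label (I reported as L, Q as K).
RESIDUE_BY_MASS = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T",
    103: "C", 113: "L", 114: "N", 115: "D", 128: "K", 129: "E",
    131: "M", 137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}


def build_spectrum_graph(peaks, tol=0.5):
    """Vertices are peaks; a directed edge joins two peaks whose
    mass difference matches one residue mass within tol."""
    nodes = sorted(set(peaks))
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            diff = nodes[j] - nodes[i]
            for mass, residue in RESIDUE_BY_MASS.items():
                if abs(diff - mass) <= tol:
                    edges[i].append((j, residue))
    return nodes, edges


def longest_path(nodes, edges):
    """Return the residue string along the longest path in the graph."""
    best = [""] * len(nodes)
    # Edges always point to higher masses, so visit vertices high to low.
    for i in reversed(range(len(nodes))):
        for j, residue in edges[i]:
            if len(best[j]) + 1 > len(best[i]):
                best[i] = residue + best[j]
    return max(best, key=len) if best else ""


# b-type ladder of Gly-Ala-Lys (1, 58, 129, 257) plus a noise peak at 90.
nodes, edges = build_spectrum_graph([1, 58, 90, 129, 257])
print(longest_path(nodes, edges))  # GAK
```

Note that the vertex at 1 also has an edge straight to 129, because Gly + Ala has the same nominal mass as Lys. Real tools resolve such ambiguity with scoring functions rather than path length alone.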

Software packages

Antilope

Free in OpenMS

As described by Andreotti et al^[28] in 2012, Antilope combines Lagrangian relaxation with an adaptation of Yen's k shortest paths algorithm. It is based on the 'spectrum graph' method, offers different scoring functions, and is comparable in running time and accuracy to "the popular state-of-the-art programs" PepNovo and NovoHMM.

AUDENS

Open source tool

Grossmann et al^[29] presented AUDENS in 2005 as an automated de novo peptide sequencing tool containing a preprocessing module that can distinguish signal peaks from noise peaks.

Lutefisk

Free download

Lutefisk performs de novo sequencing from CID mass spectra. The algorithm first finds significant ions, then determines the N- and C-terminal evidence list. Based on the sequence list, it generates complete candidate sequences and scores them against the experimental spectrum. However, the result may include several candidates that differ only slightly, so it is hard to find the right peptide sequence. A second program, CIDentify, which is a version of Bill Pearson's FASTA algorithm modified by Alex Taylor, can be applied to distinguish between those similar candidates.

MSNovo

Free download

Mo et al^[30] presented this algorithm in 2007 and reported that it performed "better than existing de novo tools on multiple data sets". The algorithm can interpret spectra from LCQ and LTQ mass spectrometers and from singly, doubly and triply charged ions. Unlike other algorithms, it applies a novel scoring function and uses a mass array instead of a spectrum graph.

NovoHMM

Free download

Fischer et al^[31] proposed this method of de novo sequencing. A hidden Markov model (HMM) is applied as a new way to solve de novo sequencing in a Bayesian framework. Instead of scoring single symbols of the sequence, this method considers posterior probabilities for amino acids. In the paper, the method is reported to perform better than other popular de novo peptide sequencing methods such as PepNovo on a large number of example spectra.

PEAKS

Commercial tool

PEAKS is a complete software package for the interpretation of peptide mass spectra. It provides de novo sequencing, database search, PTM identification, homology search and quantification for data analysis. Ma et al described a new model and algorithm for de novo sequencing in PEAKS and compared its performance with that of Lutefisk on several tryptic peptides of standard proteins measured on a quadrupole time-of-flight (Q-TOF) mass spectrometer.^[32]

PepNovo

Free download

PepNovo is a high-throughput de novo peptide sequencing tool that uses a probabilistic network as its scoring method. It usually takes less than 0.2 seconds to interpret one spectrum. As described by Frank et al, PepNovo performs better than several popular algorithms such as Sherenga, PEAKS and Lutefisk.^[33] A newer version, PepNovo+, is also available.

pNovo+

Free download

Chi et al presented pNovo+ in 2013 as a new de novo peptide sequencing tool that uses complementary HCD and ETD tandem mass spectra.^[34] In this method, a component algorithm, pDAG, reduces the time needed for peptide sequencing to 0.018 s on average, which is three times as fast as other popular de novo sequencing software.

UniNovo

Free download

As described by Jeong et al, other de novo peptide sequencing tools work well on only certain types of spectra, whereas UniNovo is a more universal tool that performs well on various types of spectra or spectral pairs such as CID, ETD, HCD and CID/ETD. It has better accuracy than PepNovo+ or PEAKS. Moreover, it estimates the error rate of the reported peptide sequences.^[35]

Comparison of five software packages

Pevtsov et al. compared the performance of five de novo sequencing algorithms: AUDENS (v.1), Lutefisk (XP v.1.0.5), NovoHMM (version unclear), PepNovo (version 1.01) and PEAKS (online version 1.1). QSTAR and LCQ mass spectrometer data were used in the analysis, and results were evaluated by the relative sequence distance (RSD), a measure of similarity between the de novo sequence and the true peptide sequence calculated by a dynamic programming method. All algorithms performed better on QSTAR data than on LCQ data. PEAKS was the best on QSTAR data with a success rate of 49.7%, and NovoHMM was the best on LCQ data with a success rate of 18.3%. The performance order on QSTAR data was PEAKS > Lutefisk, PepNovo > AUDENS, NovoHMM, and on LCQ data it was NovoHMM > PepNovo, PEAKS > Lutefisk > AUDENS. Across a range of spectrum quality, PEAKS and NovoHMM also showed the best performance on both data sets among the five algorithms, and they had the best sensitivity on both QSTAR and LCQ data as well. However, none of the evaluated algorithms exceeded 50% exact identification on either data set.^[36]

References

↑ Edman, P.; Begg, G. (March 1967). "A Protein Sequenator". European Journal of Biochemistry 1 (1): 80–91. doi:10.1111/j.1432-1033.1967.tb00047.x.
↑ Webb-Robertson, B.-J. M.; Cannon, W. R. (20 June 2007). "Current trends in computational inference from mass spectrometry-based proteomics". Briefings in Bioinformatics 8 (5): 304–317. doi:10.1093/bib/bbm023.
↑ Lu, Bingwen; Chen, Ting (March 2004). "Algorithms for de novo peptide sequencing using tandem mass spectrometry". Drug Discovery Today: BIOSILICO 2 (2): 85–90. doi:10.1016/S1741-8364(04)02387-X.
↑ Papayannopoulos, Ioannis A. (January 1995). "The interpretation of collision-induced dissociation tandem mass spectra of peptides". Mass Spectrometry Reviews 14 (1): 49–73. doi:10.1002/mas.1280140104.
↑ Dass, Chhabil; Desiderio, Dominic M. (May 1987). "Fast atom bombardment mass spectrometry analysis of opioid peptides". Analytical Biochemistry 163 (1): 52–66. doi:10.1016/0003-2697(87)90092-3.
↑ Yalcin, Talat; Csizmadia, Imre G.; Peterson, Michael R.; Harrison, Alex G. (March 1996). "The structure and fragmentation of B_n (n≥3) ions in peptide spectra". Journal of the American Society for Mass Spectrometry 7 (3): 233–242. doi:10.1016/1044-0305(95)00677-X.
↑ Tang, Xue-Jun; Boyd, Robert K.; Bertrand, M. J. (November 1992). "An investigation of fragmentation mechanisms of doubly protonated tryptic peptides". Rapid Communications in Mass Spectrometry 6 (11): 651–657. doi:10.1002/rcm.1290061105.
↑ Johnson, Richard S.; Martin, Stephen A.; Biemann, Klaus (December 1988). "Collision-induced fragmentation of (M + H)+ ions of peptides. Side chain specific sequence ions". International Journal of Mass Spectrometry and Ion Processes 86: 137–154. doi:10.1016/0168-1176(88)80060-0.
↑ Dass, Chhabil (2007). Fundamentals of contemporary mass spectrometry ([Online-Ausg.]. ed.). Hoboken, N.J.: Wiley-Interscience. pp. 317–322. ISBN 9780470118498. doi:10.1002/0470118490.
↑ Dass, Chhabil (2001). Principles and practice of biological mass spectrometry. New York, NY [u.a.]: Wiley. ISBN 978-0-471-33053-0.
↑ Roepstorff, P; Fohlman, J (November 1984). "Proposal for a common nomenclature for sequence ions in mass spectra of peptides.". Biomedical mass spectrometry 11 (11): 601. PMID 6525415.
↑ McCloskey, edited by James A. (1990). Mass spectrometry. San Diego: Academic Press. pp. 886–887. ISBN 978-0121820947.
↑ Falick, A. M.; Hines, W. M.; Medzihradszky, K. F.; Baldwin, M. A.; Gibson, B. W. (November 1993). "Low-mass ions produced from peptides by high-energy collision-induced dissociation in tandem mass spectrometry". Journal of the American Society for Mass Spectrometry 4 (11): 882–893. doi:10.1016/1044-0305(93)87006-X.
↑ Dass, Chhabil (2007). Fundamentals of contemporary mass spectrometry ([Online-Ausg.]. ed.). Hoboken, N.J.: Wiley-Interscience. pp. 327–330. ISBN 9780470118498. doi:10.1002/0470118490.
↑ Harrison, Alex G.; Csizmadia, Imre G.; Tang, Ting-Hua (May 2000). "Structure and fragmentation of b₂ ions in peptide mass spectra". Journal of the American Society for Mass Spectrometry 11 (5): 427–436. doi:10.1016/S1044-0305(00)00104-5.
↑ Dass, Chhabil (2007). Fundamentals of contemporary mass spectrometry ([Online-Ausg.]. ed.). Hoboken, N.J.: Wiley-Interscience. p. 329. ISBN 9780470118498. doi:10.1002/0470118490.
↑ Sakurai, T.; Matsuo, T.; Matsuda, H.; Katakuse, I. (August 1984). "PAAS 3: A computer program to determine probable sequence of peptides from mass spectrometric data". Biological Mass Spectrometry 11 (8): 396–399. doi:10.1002/bms.1200110806.
↑ Hamm, C. W.; Wilson, W. E.; Harvan, D. J. (1986). "Peptide sequencing program". Bioinformatics 2 (2): 115–118. doi:10.1093/bioinformatics/2.2.115.
↑ Biemann, K; Cone, C; Webster, BR; Arsenault, GP (5 December 1966). "Determination of the amino acid sequence in oligopeptides by computer interpretation of their high-resolution mass spectra.". Journal of the American Chemical Society 88 (23): 5598–606. PMID 5980176.
↑ Ishikawa, K.; Niwa, Y. (July 1986). "Computer-aided peptide sequencing by fast atom bombardment mass spectrometry". Biological Mass Spectrometry 13 (7): 373–380. doi:10.1002/bms.1200130709.
↑ Siegel, MM; Bauman, N (15 March 1988). "An efficient algorithm for sequencing peptides using fast atom bombardment mass spectral data.". Biomedical & environmental mass spectrometry 15 (6): 333–43. PMID 2967723.
↑ Johnson, RS; Biemann, K (November 1989). "Computer program (SEQPEP) to aid in the interpretation of high-energy collision tandem mass spectra of peptides.". Biomedical & environmental mass spectrometry 18 (11): 945–57. PMID 2620156.
↑ Scoble, Hubert A.; Biller, James E.; Biemann, Klaus (1987). "A graphics display-oriented strategy for the amino acid sequencing of peptides by tandem mass spectrometry". Fresenius' Zeitschrift für Analytische Chemie 327 (2): 239–245. doi:10.1007/BF00469824.
↑ Bartels, Christian (June 1990). "Fast algorithm for peptide sequencing by mass spectroscopy". Biological Mass Spectrometry 19 (6): 363–368. doi:10.1002/bms.1200190607.
↑ Fernández-de-Cossío, J; Gonzalez, J; Besada, V (August 1995). "A computer program to aid the sequencing of peptides in collision-activated decomposition experiments.". Computer applications in the biosciences : CABIOS 11 (4): 427–34. PMID 8521052.
↑ Taylor, JA; Johnson, RS (1997). "Sequence database searches via de novo peptide sequencing by tandem mass spectrometry.". Rapid communications in mass spectrometry : RCM 11 (9): 1067–75. PMID 9204580.
↑ Dančík, Vlado; Addona, Theresa A.; Clauser, Karl R.; Vath, James E.; Pevzner, Pavel A. (October 1999). "Peptide Sequencing via Tandem Mass Spectrometry". Journal of Computational Biology 6 (3-4): 327–342. doi:10.1089/106652799318300.
↑ Andreotti, S; Klau, GW; Reinert, K (2012). "Antilope--a Lagrangian relaxation approach to the de novo peptide sequencing problem.". IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 9 (2): 385–94. PMID 21464512.
↑ Grossmann, J; Roos, FF; Cieliebak, M; Lipták, Z; Mathis, LK; Müller, M; Gruissem, W; Baginsky, S (2005). "AUDENS: a tool for automated peptide de novo sequencing.". Journal of proteome research 4 (5): 1768–74. PMID 16212431.
↑ Mo, L; Dutta, D; Wan, Y; Chen, T (1 July 2007). "MSNovo: a dynamic programming algorithm for de novo peptide sequencing via tandem mass spectrometry.". Analytical chemistry 79 (13): 4870–8. PMID 17550227.
↑ Fischer, B; Roth, V; Roos, F; Grossmann, J; Baginsky, S; Widmayer, P; Gruissem, W; Buhmann, JM (15 November 2005). "NovoHMM: a hidden Markov model for de novo peptide sequencing.". Analytical chemistry 77 (22): 7265–73. PMID 16285674.
↑ Ma, Bin; Zhang, Kaizhong; Hendrie, Christopher; Liang, Chengzhi; Li, Ming; Doherty-Kirby, Amanda; Lajoie, Gilles (30 October 2003). "PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry". Rapid Communications in Mass Spectrometry 17 (20): 2337–2342. doi:10.1002/rcm.1196. PMID 14558135.
↑ Frank, A; Pevzner, P (15 February 2005). "PepNovo: de novo peptide sequencing via probabilistic network modeling.". Analytical chemistry 77 (4): 964–73. PMID 15858974.
↑ Chi, H; Chen, H; He, K; Wu, L; Yang, B; Sun, RX; Liu, J; Zeng, WF; Song, CQ; He, SM; Dong, MQ (1 February 2013). "pNovo+: de novo peptide sequencing using complementary HCD and ETD tandem mass spectra.". Journal of proteome research 12 (2): 615–25. PMID 23272783.
↑ Jeong, K; Kim, S; Pevzner, PA (15 August 2013). "UniNovo: a universal tool for de novo peptide sequencing.". Bioinformatics (Oxford, England) 29 (16): 1953–62. PMID 23766417.
↑ Pevtsov, S.; Fedulova, I.; Mirzaei, H.; Buck, C.; Zhang, X. (2006). "Performance Evaluation of Existing De Novo Sequencing Algorithms". Journal of Proteome Research 5 (11): 3018. doi:10.1021/pr060222h.
