HMMER

Developer(s): Sean Eddy
Stable release: 3.0 / 28 March 2010
Preview release: 3.1b1 / May 2013
Written in: C
Available in: English
Type: Bioinformatics tool
License: GPL
Website: hmmer.janelia.org

HMMER is a free and commonly used software package for sequence analysis written by Sean Eddy.[1] It is generally used to identify homologous protein or nucleotide sequences. It does this by comparing a profile hidden Markov model (profile-HMM) to either a single sequence or a database of sequences. Sequences that score significantly better against the profile-HMM than against a null model are considered homologous to the sequences that were used to construct the profile-HMM. Profile-HMMs are constructed from a multiple sequence alignment using the hmmbuild program of the HMMER package. The profile-HMM implementation used in HMMER is based on the work of Krogh and colleagues.[2] HMMER is a console utility that has been ported to the major operating systems, including various versions of Linux, Windows, and Mac OS.
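The idea of scoring a sequence against a profile relative to a null model can be illustrated with a minimal sketch. The example below is not HMMER code: it assumes a hypothetical three-column, ungapped profile over a four-letter nucleotide alphabet with made-up emission probabilities, and a uniform null model. A real profile-HMM also models insertions and deletions.

```python
import math

# Hypothetical toy profile: one emission distribution per match column.
# The probabilities are illustrative and do not come from HMMER.
PROFILE = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

# Assumed null model: every residue equally likely at every position.
NULL = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_bits(seq, profile=PROFILE, null=NULL):
    """Return log2 of P(seq | profile) / P(seq | null) for an ungapped match."""
    if len(seq) != len(profile):
        raise ValueError("toy model only scores sequences of profile length")
    return sum(math.log2(col[res] / null[res]) for col, res in zip(profile, seq))


matching = log_odds_bits("ACG")   # agrees with the profile consensus
unrelated = log_odds_bits("TTT")  # disagrees at every column
```

A positive score means the sequence is better explained by the profile than by the null model; a negative score means the opposite. Deciding whether a positive score is significant requires a statistical model of scores, which this sketch does not cover.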

HMMER is the core utility that protein family databases such as Pfam and InterPro are based upon. Some other bioinformatics tools such as UGENE also use HMMER.

HMMER3 is a complete rewrite of the earlier HMMER2 package, with the aim of improving the speed of profile-HMM searches. The main performance gain comes from a heuristic filter that finds high-scoring ungapped matches between a query profile and database sequences. This heuristic results in a computation time comparable to that of BLAST, with little impact on accuracy. Further gains in performance come from a log-likelihood model that requires no calibration for estimating E-values and that allows the more accurate forward scores to be used for computing the significance of a homologous sequence.[3]
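A heuristic filter of this kind can be sketched as a search for the best-scoring ungapped segment between a profile and a sequence. The code below is a simplified illustration under stated assumptions, not HMMER3's implementation: the function names, the `+2`/`-1` score table, and the threshold are all hypothetical, and only a single segment per sequence is considered.

```python
def best_ungapped_score(scores, seq):
    """Highest-scoring ungapped local match of a profile to seq.

    scores[k][res] is the (assumed) log-odds score of residue res
    at profile column k.
    """
    m, n = len(scores), len(seq)
    best = 0.0
    # Diagonal d pairs profile column k with sequence position k + d.
    for d in range(-(m - 1), n):
        running = 0.0
        for k in range(m):
            i = k + d
            if 0 <= i < n:
                # Restart the segment whenever its score drops below zero.
                running = max(0.0, running + scores[k][seq[i]])
                best = max(best, running)
    return best


def passes_filter(scores, seq, threshold):
    """Keep only sequences worth passing to a slower, gapped comparison."""
    return best_ungapped_score(scores, seq) >= threshold


# Hypothetical score table for the consensus "ACG": +2 match, -1 mismatch.
SCORES = [
    {r: (2 if r == c else -1) for r in "ACGT"} for c in "ACG"
]

hit = best_ungapped_score(SCORES, "TTACGTT")
miss = best_ungapped_score(SCORES, "TTTTTT")
```

Sequences that fail the cheap ungapped test are discarded early, so the more expensive gapped comparison runs only on a small fraction of the database.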

HMMER3 also makes extensive use of vector instructions to increase computational speed. This work is based upon an earlier publication that showed a significant acceleration of the Smith-Waterman algorithm for aligning two sequences.[4]
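For reference, the Smith-Waterman recurrence that such vectorized implementations accelerate can be written in scalar form as follows. This is a plain, unvectorized sketch that returns only the best local alignment score; the match, mismatch, and linear gap scores are assumed values chosen for illustration, and nothing here reproduces the SIMD layout of the cited work.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A cell never drops below zero, which makes the alignment local.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


identical = smith_waterman("ACGT", "ACGT")
embedded = smith_waterman("TTACGTT", "ACG")
unrelated = smith_waterman("AAAA", "TTTT")
```

Each cell depends only on its three neighbours, so the work in the inner loop is uniform and repetitive; vector instructions exploit this by computing many cells in parallel.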

See also

Several implementations of profile HMM methods and related position-specific scoring matrix methods are available.

References

  1. Durbin R, Eddy SR, Krogh A, Mitchison G (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press. ISBN 0-521-62971-3.
  2. Krogh A, Brown M, Mian IS, Sjölander K, Haussler D (February 1994). "Hidden Markov models in computational biology. Applications to protein modeling". J. Mol. Biol. 235 (5): 1501–31. doi:10.1006/jmbi.1994.1104. PMID 8107089.
  3. Eddy SR (2008). Rost, Burkhard, ed. "A probabilistic model of local sequence alignment that simplifies statistical significance estimation". PLoS Comput Biol 4 (5): e1000069. doi:10.1371/journal.pcbi.1000069. PMC 2396288. PMID 18516236.
  4. Farrar M (January 2007). "Striped Smith-Waterman speeds database searches six times over other SIMD implementations". Bioinformatics 23 (2): 156–61. doi:10.1093/bioinformatics/btl582. PMID 17110365.
