BAli-Phy
Developer(s) | Benjamin Redelings and Marc Suchard |
---|---|
Stable release | 2.3.7 / 15 June 2015 |
Written in | C++ |
Operating system | Windows NT, macOS, UNIX, Linux |
Type | Bioinformatics tool |
Licence | GPLv2 |
Website | bali-phy |
BAli-Phy is a free software program for simultaneously estimating a multiple sequence alignment and its phylogenetic tree. It achieves high accuracy in alignment estimation by using information from the co-estimated phylogeny, and it takes alignment uncertainty into account while estimating the phylogeny by averaging over possible alignments. Unlike most phylogeny inference software, it does not require input sequences to be aligned beforehand. This joint approach differs from traditional approaches to alignment and phylogeny estimation, which first estimate the alignment without a high-quality tree estimate, and then estimate the tree given that alignment.
BAli-Phy produces a Bayesian posterior distribution on both the alignment and the tree, so its output expresses uncertainty in both rather than a single point estimate. It uses Markov chain Monte Carlo methods for estimation, and a run can take several days.
Alignment uncertainty
Alignment uncertainty stems from two main sources: near-optimal alignments and evolutionary parameter uncertainty. Evolutionary parameters include branch lengths, substitution rates, insertion/deletion rates, and the phylogeny itself. If the exact values of these parameters are unknown, and the alignment estimate is sensitive to them, then the alignment cannot be known with confidence.
Even when evolutionary parameters are fully known, many different alignments may be optimal, or nearly optimal. In this case, the researcher cannot have confidence in any single alignment, but must average over the cloud of near-optimal alignments.
BAli-Phy can handle both near-optimal alignment uncertainty and evolutionary parameter uncertainty by integrating over possible alignments and parameter values.
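One way to picture averaging over alignments is to ask how often a given pair of residues lands in the same column across a set of sampled alignments. The following is a minimal Python sketch of that idea, not BAli-Phy's own code: the function names `aligned_pairs` and `pair_support`, and the representation of an alignment as a dictionary of gapped strings, are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def aligned_pairs(alignment):
    """Return the homology statements implied by one alignment.

    `alignment` maps a sequence name to a gapped string; all strings have
    the same length and "-" marks a gap. Each statement has the form
    ((name_a, index_a), (name_b, index_b)), where the indices count
    residues in the ungapped sequences, starting at 0.
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    next_index = {name: 0 for name in names}
    pairs = set()
    for col in range(length):
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, next_index[name]))
                next_index[name] += 1
        # Every two residues sharing a column are asserted to be homologous.
        pairs.update(combinations(present, 2))
    return pairs


def pair_support(samples):
    """Fraction of sampled alignments in which each residue pair is aligned."""
    counts = Counter()
    for alignment in samples:
        counts.update(aligned_pairs(alignment))
    return {pair: n / len(samples) for pair, n in counts.items()}


# Two hypothetical samples that disagree on where the gap in A belongs.
samples = [
    {"A": "AC-G", "B": "ACTG"},
    {"A": "A-CG", "B": "ACTG"},
]
support = pair_support(samples)
```

In this toy input the first and last residues of A are aligned the same way in both samples, while the middle residue of A is split evenly between two positions in B, which is the kind of ambiguity a single alignment would hide.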
Input and output
BAli-Phy accepts nucleotide, amino acid, and codon sequences in FASTA format. Input sequences need not be aligned. Ambiguous nucleotides such as R and Y are supported, as are the ambiguous amino acids B, Z, and J.
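As an illustration of the input described above, the following Python sketch reads unaligned FASTA text and flags symbols that fall outside an expected alphabet. It is a pre-check a user might write, not part of BAli-Phy. The names `read_fasta` and `unsupported_symbols` are hypothetical, and the nucleotide set assumes the full list of IUPAC ambiguity codes, whereas the text above only names R and Y as examples.

```python
# Alphabets used for the check. The nucleotide ambiguity set is the full
# IUPAC list (an assumption); the amino acid set adds B, Z and J as
# mentioned in the text.
DNA = set("ACGT") | set("RYSWKMBDHVN")
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY") | set("BZJ")


def read_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a sequence name")
            name = fields[0]  # keep only the identifier, drop the description
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    return {n: "".join(parts).upper() for n, parts in records.items()}


def unsupported_symbols(sequence, alphabet):
    """Return the sorted symbols in `sequence` that are not in `alphabet`.

    Gap characters are ignored so the check also works on aligned input.
    """
    return sorted(set(sequence) - alphabet - {"-"})


example = ">s1 first sequence\nACGR\nYT\n>s2\nACGT\n"
sequences = read_fasta(example)
problems = {n: unsupported_symbols(s, DNA) for n, s in sequences.items()}
```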
Trees are output in Newick format. Alignments are output in FASTA format. Output alignments include homology information for sequences at internal nodes of the tree.
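Because output alignments can contain sequences for internal nodes as well as for the input sequences, it can be useful to know which names in a Newick tree belong to leaves. The sketch below is a simplified illustration that assumes unquoted labels and relies only on Newick syntax, in which a leaf label directly follows "(" or "," and an internal-node label follows ")". The function name `newick_leaf_names` is hypothetical.

```python
import re


def newick_leaf_names(newick):
    """Return leaf labels from a Newick string, in order of appearance.

    Simplification: quoted labels and bracketed comments are not handled.
    """
    # A label that starts right after "(" or "," names a leaf; labels that
    # follow ")" name internal nodes and are skipped by this pattern.
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)


tree = "((A:0.1,B:0.2)AB:0.05,C:0.3)root;"
leaves = newick_leaf_names(tree)
```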