Newick format
From Wikipedia, the free encyclopedia
Newick tree format (or Newick notation) is a way to represent graph-theoretical trees with edge lengths using parentheses and commas. It was created by James Archie, William H. E. Day, Joseph Felsenstein, Wayne Maddison, Christopher Meacham, F. James Rohlf, and David Swofford at two meetings in 1986, the second of which was at Newick's restaurant in Dover, New Hampshire, USA.[1]
Examples of Newick tree format
A tree whose root F has three children, the leaves A and B and an internal node E with leaf children C and D, can be represented in several ways:
(,,(,));                               no nodes are named
(A,B,(C,D));                           leaf nodes are named
(A,B,(C,D)E)F;                         all nodes are named
(:0.1,:0.2,(:0.3,:0.4):0.5);           all but root node have a distance to parent
(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);       distances and leaf names (popular)
(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;     distances and all names
((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;    a tree rooted on a leaf node (rare)
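The variants above differ only in which names and distances are written out. As a minimal sketch of that idea, the Python below serializes one in-memory tree in several of those styles. The `Node` class, the `to_newick` function, and its `names` and `lengths` parameters are illustrative assumptions, not part of any Newick specification or library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical in-memory tree node used only for this illustration.
    name: str = ""
    length: Optional[float] = None  # distance to parent; None for the root
    children: List["Node"] = field(default_factory=list)


def _format(node: Node, names: str, lengths: bool) -> str:
    text = ""
    if node.children:
        # Internal node: comma-separated children wrapped in parentheses.
        inner = ",".join(_format(c, names, lengths) for c in node.children)
        text = "(" + inner + ")"
    # names is "none", "leaves", or "all" (an assumed convention).
    if names == "all" or (names == "leaves" and not node.children):
        text += node.name
    if lengths and node.length is not None:
        text += ":" + str(node.length)
    return text


def to_newick(root: Node, names: str = "all", lengths: bool = True) -> str:
    # Every Newick tree ends with a semicolon.
    return _format(root, names, lengths) + ";"


tree = Node("F", None, [
    Node("A", 0.1),
    Node("B", 0.2),
    Node("E", 0.5, [Node("C", 0.3), Node("D", 0.4)]),
])

print(to_newick(tree, names="none", lengths=False))   # (,,(,));
print(to_newick(tree, names="leaves", lengths=True))  # (A:0.1,B:0.2,(C:0.3,D:0.4):0.5);
print(to_newick(tree, names="all", lengths=True))     # (A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;
```

The leaf-rooted variant is not produced by this sketch, because it requires re-rooting the tree rather than choosing what to print.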
Newick format is typically used by tools such as PHYLIP and provides a minimal definition of a phylogenetic tree.
Rooted and Unrooted Binary Trees
Trees are generally rooted on an internal node, and it is rare (but legal) to root a tree on a leaf node. When a tree is unrooted, an arbitrary internal node is chosen as its root.
In a rooted binary tree that is rooted on an internal node, the root has exactly two immediate descendants (the top-level nodes), as does every other internal node. In an unrooted binary tree that is rooted on an arbitrary internal node, the root has exactly three immediate descendants, and every other internal node has exactly two. In a binary tree rooted on a leaf, the root has at most one immediate descendant, and each internal node has exactly two.
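The number of top-level nodes can be read directly off a Newick string by counting the commas at parenthesis depth one. The sketch below does this in Python; the function name `top_level_branch_count` is hypothetical, and the code assumes that node names contain no parentheses or commas.

```python
def top_level_branch_count(newick: str) -> int:
    """Count the branches directly below the root of a Newick string.

    Assumes names contain no parentheses or commas (an assumption of
    this sketch, not a rule stated by the format).
    """
    text = newick.strip()
    if not text.startswith("("):
        return 0  # a single node with no descendants
    depth = 0
    count = 1  # an opening parenthesis implies at least one branch
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                break  # closed the root's branch list
        elif ch == "," and depth == 1:
            count += 1  # a comma at depth 1 separates top-level branches
    return count


print(top_level_branch_count("((A,B),(C,D));"))  # 2: rooted on an internal node
print(top_level_branch_count("(A,B,(C,D));"))    # 3: unrooted convention
print(top_level_branch_count("((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;"))  # 1: rooted on a leaf
```

For binary trees, a result of two, three, or one corresponds to the three cases described above. The count alone does not verify that the rest of the tree is binary.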
Grammar
The following is a grammar for parsing the Newick format.
The Grammar Nodes
Tree: the full input Newick format for a single tree
Subtree: an internal node and its descendant (sub)subtree or a leaf node
Leaf: a leaf node
Internal: an internal node and its descendants
BranchList: a set of one or more Branches
Branch: a tree edge and its descendant subtree
Name: the name of a node
Length: the length of a tree edge
The Grammar Rules
Note, "|" separates alternatives.
Tree --> Subtree ";"
Subtree --> Leaf | Internal
Leaf --> Name
Internal --> "(" BranchList ")" Name
BranchList --> Branch | Branch "," BranchList
Branch --> Subtree Length
Name --> empty | string
Length --> empty | ":" number
Whitespace (spaces, tabs, carriage returns, and linefeeds) within number is prohibited. Whitespace within string is often prohibited. Whitespace elsewhere is ignored.
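The grammar maps naturally onto a recursive-descent parser, with one method per rule. The Python below is a minimal sketch under stated assumptions: the `Node` class, the `parse_newick` function, and the private `_Parser` class are hypothetical names; a "string" is taken to be any run of characters other than parentheses, commas, colons, semicolons, and whitespace; and a "number" is taken to be a decimal with an optional sign and exponent. Quoted names and comments, which some tools accept, are not handled.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

NAME = re.compile(r"[^():,;\s]*")  # assumed definition of "string"
NUMBER = re.compile(r"[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")  # assumed "number"


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None  # edge length to parent, if given
    children: List["Node"] = field(default_factory=list)


class _Parser:
    def __init__(self, text: str):
        self.text, self.pos = text, 0

    def peek(self) -> str:
        # Whitespace between tokens is ignored.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def tree(self) -> Node:  # Tree --> Subtree ";"
        node = self.subtree()
        self.expect(";")
        if self.peek():
            raise ValueError(f"unexpected text at position {self.pos}")
        return node

    def subtree(self) -> Node:  # Subtree --> Leaf | Internal
        children: List[Node] = []
        if self.peek() == "(":  # Internal --> "(" BranchList ")" Name
            self.pos += 1
            children = self.branch_list()
            self.expect(")")
        return Node(name=self.name(), children=children)

    def branch_list(self) -> List[Node]:  # Branch | Branch "," BranchList
        branches = [self.branch()]
        while self.peek() == ",":
            self.pos += 1
            branches.append(self.branch())
        return branches

    def branch(self) -> Node:  # Branch --> Subtree Length
        node = self.subtree()
        node.length = self.length()
        return node

    def name(self) -> str:  # Name --> empty | string
        self.peek()  # skip leading whitespace
        match = NAME.match(self.text, self.pos)
        self.pos = match.end()
        return match.group()

    def length(self) -> Optional[float]:  # Length --> empty | ":" number
        if self.peek() != ":":
            return None
        self.pos += 1
        self.peek()  # skip whitespace between ":" and the number
        match = NUMBER.match(self.text, self.pos)
        if not match:
            raise ValueError(f"expected a number at position {self.pos}")
        self.pos = match.end()
        return float(match.group())


def parse_newick(text: str) -> Node:
    return _Parser(text).tree()


root = parse_newick("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;")
print(root.name, [child.name for child in root.children])  # F ['A', 'B', 'E']
```

Because whitespace is skipped only between tokens, an input such as `(A:0. 1,B);` raises `ValueError` in this sketch, which matches the rule that whitespace within a number is prohibited. As the grammar is written, the root carries no Length, so a length after the final closing parenthesis is also rejected.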