Minimum spanning tree

From Wikipedia, the free encyclopedia

The minimum spanning tree of a planar graph. Each edge is labelled with its weight, which here is roughly equal to its length.

Given a connected, undirected graph, a spanning tree of that graph is a subgraph that is a tree and connects all the vertices together. A single graph can have many different spanning trees. We can also assign a weight to each edge, which is a number representing how unfavorable it is. We then assign a weight to a spanning tree by summing the weights of the edges in that spanning tree. A minimum spanning tree (MST), or minimum weight spanning tree, is a spanning tree whose weight is less than or equal to the weight of every other spanning tree. More generally, any undirected graph, connected or not, has a minimum spanning forest: the union of minimum spanning trees of its connected components.

One example would be a cable TV company laying cable to a new neighborhood. If it is constrained to bury the cable only along certain paths, then there would be a graph representing which points are connected by those paths. Some of those paths might be more expensive, because they are longer, or require the cable to be buried deeper; these paths would be represented by edges with larger weights. A spanning tree for that graph would be a subset of those paths that has no cycles but still connects to every house. There might be several spanning trees possible. A minimum spanning tree would be one with the lowest total cost.

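In case of a tie, there could be several minimum spanning trees; in particular, if all weights are the same, every spanning tree is minimum. However, if each edge has a distinct weight, the minimum spanning tree is unique. To see why, suppose A and B were two different minimum spanning trees, and let e be the lightest edge that belongs to exactly one of them, say A. Adding e to B creates a cycle, and that cycle must contain an edge f that is not in A, because A contains no cycles. Since f is in B but not in A and all weights are distinct, f is heavier than e. Swapping f for e therefore yields a spanning tree lighter than B, which is a contradiction. Distinct weights are common in realistic situations such as the one above, where it is unlikely that any two paths have exactly the same cost. The uniqueness result generalizes to spanning forests as well.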
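
The definition can be turned directly into a deliberately slow brute-force check. The Python sketch below assumes that vertices are numbered 0..n-1 and that edges are (u, v, weight) triples. It returns every spanning tree of least weight, so ties are visible: with equal weights on a triangle all three spanning trees are minimum, while distinct weights leave exactly one. The function names are hypothetical. The enumeration takes exponential time, so it is only suitable for tiny graphs.

```python
from itertools import combinations


def is_spanning_tree(n, tree_edges):
    """True if tree_edges (u, v, w) form a spanning tree on vertices 0..n-1."""
    if len(tree_edges) != n - 1:
        return False
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in tree_edges:
        ru, rv = find(u), find(v)
        if ru == rv:  # this edge would close a cycle
            return False
        parent[ru] = rv
    return True  # n - 1 acyclic edges on n vertices form one connected tree


def all_minimum_spanning_trees(n, edges):
    """Enumerate all (n-1)-edge subsets and keep the lightest spanning trees.

    Returns (weight, trees); for a disconnected graph this is (None, []).
    """
    best_weight, best_trees = None, []
    for subset in combinations(edges, n - 1):
        if not is_spanning_tree(n, subset):
            continue
        weight = sum(w for _, _, w in subset)
        if best_weight is None or weight < best_weight:
            best_weight, best_trees = weight, [subset]
        elif weight == best_weight:
            best_trees.append(subset)
    return best_weight, best_trees


# Equal weights: every spanning tree of the triangle is minimum.
weight, trees = all_minimum_spanning_trees(3, [(0, 1, 1), (1, 2, 1), (0, 2, 1)])
print(weight, len(trees))  # 2 3

# Distinct weights: the minimum spanning tree is unique.
weight, trees = all_minimum_spanning_trees(3, [(0, 1, 1), (1, 2, 2), (0, 2, 3)])
print(weight, len(trees))  # 3 1
```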

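If the weights are non-negative, then a minimum spanning tree is in fact a minimum-cost subgraph connecting all vertices, since every connected spanning subgraph contains a spanning tree of no greater weight. If the weights are strictly positive, the converse also holds. A subgraph containing a cycle stays connected when one cycle edge is removed, and removing that edge strictly lowers the subgraph's weight. So every minimum-cost connecting subgraph is a spanning tree.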


Algorithms

The first algorithm for finding a minimum spanning tree was developed by the Czech scientist Otakar Borůvka in 1926 (see Borůvka's algorithm). Its purpose was the efficient electrification of Moravia. Two other algorithms are now commonly used: Prim's algorithm and Kruskal's algorithm. All three are greedy algorithms that run in polynomial time, so the problem of finding minimum spanning trees is in P.
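
Kruskal's algorithm is the simplest of the three to sketch. It scans the edges in order of increasing weight and keeps an edge whenever the edge joins two different components. The Python below assumes vertices numbered 0..n-1 and edges given as (u, v, weight) triples. The function name and the union-find helper are illustrative choices, not a reference implementation. Sorting dominates the cost, so the sketch runs in O(e log e) time.

```python
def kruskal(n, edges):
    """Kruskal's algorithm on vertices 0..n-1 with edges as (u, v, w) triples.

    Returns (total_weight, tree_edges). On a disconnected graph the result
    is a minimum spanning forest.
    """
    parent = list(range(n))
    rank = [0] * n

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # same component: the edge would form a cycle
        if rank[ra] < rank[rb]:  # union by rank
            ra, rb = rb, ra
        parent[rb] = ra
        if rank[ra] == rank[rb]:
            rank[ra] += 1
        return True

    total, tree = 0, []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if union(u, v):  # greedy choice: cheapest edge joining two components
            total += w
            tree.append((u, v, w))
    return total, tree


edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8), (3, 4, 3)]
print(kruskal(5, edges))  # (11, [(0, 2, 1), (1, 2, 2), (3, 4, 3), (1, 3, 5)])
```

Applied to a disconnected graph, the same loop returns a minimum spanning forest. Such a forest is the generalization mentioned at the start of the article.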

The fastest known deterministic comparison-based minimum spanning tree algorithm was developed by Bernard Chazelle and is based on Borůvka's. Its running time is O(e α(e,v)), where e is the number of edges, v is the number of vertices, and α is the classical functional inverse of the Ackermann function. The function α grows extremely slowly, so for all practical purposes it may be considered a constant no greater than 4. Chazelle's algorithm therefore takes very close to O(e) time.

What is the fastest possible algorithm for this problem? That is one of the oldest open questions in computer science. There is clearly a linear lower bound, since we must at least examine all the weights. If the edge weights are integers with a bounded bit length, then deterministic algorithms are known with linear running time, O(e). For general weights, randomized algorithms are known that run in linear expected time.

Whether a deterministic algorithm with linear running time exists for general weights is still an open question. However, Seth Pettie and Vijaya Ramachandran have found a deterministic minimum spanning tree algorithm that is provably optimal, although its exact computational complexity is unknown.[1]

More recently, research has focused on solving the minimum spanning tree problem in a highly parallelized manner. For example, the pragmatic 2003 paper "Fast Shared-Memory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs" by David A. Bader and Guojing Cong demonstrates an algorithm that computes MSTs 5 times faster on 8 processors than an optimized sequential algorithm.[2] Parallel algorithms are typically based on Borůvka's algorithm. Prim's algorithm, and especially Kruskal's algorithm, do not scale as well to additional processors.
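
Borůvka's algorithm, the basis of these parallel algorithms, is sketched below in sequential Python. Each phase picks the cheapest edge leaving every current component and then adds all of those edges at once. The per-component searches do not depend on one another, which is why the approach parallelizes well. Each phase at least halves the number of components, so there are O(log v) phases. This is an illustrative sketch, not the Bader and Cong algorithm. It assumes integer vertex labels 0..n-1 and a list of (u, v, weight) triples, and its function name is hypothetical.

```python
def boruvka(n, edges):
    """Borůvka's algorithm on vertices 0..n-1; edges is a list of (u, v, w).

    Ties are broken by edge index, giving a strict total order on edges so
    that the cheapest outgoing edges chosen in one phase never form a cycle.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, tree = 0, []
    components = n
    while components > 1:
        # cheapest[r] = index of the cheapest edge leaving the component with
        # root r. These searches are independent of one another, so a
        # parallel version can split them across processors.
        cheapest = [None] * n
        for i, (u, v, w) in enumerate(edges):
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge lies inside one component
            for r in (ru, rv):
                j = cheapest[r]
                if j is None or (w, i) < (edges[j][2], j):
                    cheapest[r] = i
        merged = False
        for i in cheapest:
            if i is None:
                continue
            u, v, w = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:  # both endpoints may have picked the same edge
                parent[ru] = rv
                total += w
                tree.append((u, v, w))
                components -= 1
                merged = True
        if not merged:  # remaining components are disconnected from each other
            break
    return total, tree


edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8), (3, 4, 3)]
print(boruvka(5, edges))  # (11, [(0, 2, 1), (1, 2, 2), (3, 4, 3), (1, 3, 5)])
```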

Other specialized algorithms have been designed for computing minimum spanning trees of graphs so large that most of the graph must be stored on disk at all times. These external storage algorithms can run only 2 to 5 times slower than a traditional in-memory algorithm. One example is described in "Engineering an External Memory Minimum Spanning Tree Algorithm" by Roman Dementiev et al.[3] Its authors claim that "massive minimum spanning tree problems filling several hard disks can be solved overnight on a PC." Such algorithms rely on efficient external storage sorting algorithms and on graph contraction techniques for reducing the graph's size efficiently.

MST on complete graphs

J. Michael Steele, building on work by A. M. Frieze, showed a limit for the MST of a complete graph on n vertices. The edge weights are drawn independently from a continuous distribution whose cumulative distribution function f satisfies f'(0) > 0. As n approaches infinity, the total weight of the MST approaches ζ(3) / f'(0), where ζ is the Riemann zeta function and ζ(3) ≈ 1.202.
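
The limit can be observed with a Monte Carlo experiment on weights drawn uniformly from [0,1]. For this distribution f'(0) = 1, so the limit is ζ(3) ≈ 1.202. The sketch below uses the O(n²) array version of Prim's algorithm, a natural choice for complete graphs. The graph sizes, trial counts and function names are arbitrary illustrative assumptions. Convergence is slow and each printed average is a noisy estimate, so the values should only drift toward 1.202 as n grows.

```python
import random


def prim_dense(weights):
    """Prim's algorithm on a complete graph given as an n x n weight matrix.

    O(n^2) time, which suits dense graphs. Returns the MST weight.
    """
    n = len(weights)
    in_tree = [False] * n
    dist = [float("inf")] * n  # cheapest known edge from the tree to each vertex
    dist[0] = 0.0
    total = 0.0
    for _ in range(n):
        # Add the non-tree vertex with the cheapest connecting edge.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: dist[v])
        in_tree[u] = True
        total += dist[u]
        for v in range(n):
            if not in_tree[v] and weights[u][v] < dist[v]:
                dist[v] = weights[u][v]
    return total


def random_complete_mst_weight(n, rng):
    """MST weight of the complete graph K_n with i.i.d. Uniform[0,1] weights."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.random()
    return prim_dense(w)


def estimate(n, trials, seed=0):
    """Average MST weight over independent random trials (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(random_complete_mst_weight(n, rng) for _ in range(trials)) / trials


for n in (10, 50, 100):
    print(n, estimate(n, trials=20))
```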

For uniform random weights in [0,1], the exact expected total weight of the minimum spanning tree has been computed for small complete graphs.

Vertices Expected total weight
2 1 / 2
3 3 / 4
4 31 / 35
5 893 / 924
6 278 / 273
7 30739 / 29172
8 199462271 / 184848378
9 126510063932 / 115228853025
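
The first two rows of the table can be checked by hand. For n = 2 the tree is the single edge. For n = 3 any two edges of the triangle form a spanning tree, so the MST consists of the two cheapest of the three edges. The check uses the standard fact that the k-th smallest of m independent Uniform[0,1] variables has expectation k/(m+1). Here W_n denotes the MST weight on the complete graph with n vertices, and U_{(k)} denotes the k-th smallest of the three edge weights. For n ≥ 4 the MST is no longer determined by order statistics alone, which is why the larger entries require more work.

```latex
\mathbb{E}[W_2] = \mathbb{E}[U] = \tfrac{1}{2},
\qquad
\mathbb{E}[W_3] = \mathbb{E}[U_{(1)}] + \mathbb{E}[U_{(2)}]
               = \tfrac{1}{4} + \tfrac{2}{4} = \tfrac{3}{4}.
```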

Related problems

A related problem is the k-minimum spanning tree (k-MST): the minimum-weight tree that spans some subset of exactly k vertices of the graph.
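
The k-MST problem is NP-hard in general, so exact methods take exponential time. A brute-force sketch still clarifies the definition. A tree whose vertex set is exactly S can only use edges inside S, so the cheapest such tree is the minimum spanning tree of the subgraph induced by S. The sketch therefore minimizes that weight over all k-vertex subsets. The function names and the (u, v, weight) edge format are assumptions for illustration.

```python
from itertools import combinations


def mst_weight_of_subset(vertices, edges):
    """Kruskal restricted to the induced subgraph on `vertices`.

    Returns the MST weight, or None if the induced subgraph is disconnected.
    """
    vs = set(vertices)
    parent = {v: v for v in vs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, used = 0, 0
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if u in vs and v in vs:  # only edges inside the subset may be used
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                total += w
                used += 1
    return total if used == len(vs) - 1 else None


def k_mst_brute_force(n, edges, k):
    """Return (weight, vertex_subset) of a minimum tree spanning exactly k vertices."""
    best = None
    for subset in combinations(range(n), k):
        w = mst_weight_of_subset(subset, edges)
        if w is not None and (best is None or w < best[0]):
            best = (w, subset)
    return best


edges = [(0, 1, 5), (1, 2, 1), (2, 3, 1), (0, 3, 10)]
print(k_mst_brute_force(4, edges, 3))  # (2, (1, 2, 3))
```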

A set of k-smallest spanning trees is a subset of k spanning trees (out of all possible spanning trees) such that no spanning tree outside the subset has smaller weight. [4] (Note that this problem is unrelated to the k-minimum spanning tree.)
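
For small graphs the definition can be checked by brute force: enumerate all spanning trees and keep the k lightest. Practical algorithms avoid full enumeration, since the number of spanning trees can grow exponentially with the size of the graph. The sketch below is illustrative only. Its function names and (u, v, weight) edge format are assumptions. Ties are resolved in enumeration order.

```python
import heapq
from itertools import combinations


def is_spanning_tree(n, tree_edges):
    """n - 1 edges with no cycle on vertices 0..n-1 form a spanning tree."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v, _ in tree_edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # cycle
        parent[ru] = rv
    return len(tree_edges) == n - 1


def k_smallest_spanning_trees(n, edges, k):
    """Return the k lightest spanning trees as tuples of (u, v, w) edges."""
    trees = (t for t in combinations(edges, n - 1) if is_spanning_tree(n, t))
    # heapq.nsmallest keeps only k candidates while scanning the generator.
    return heapq.nsmallest(k, trees, key=lambda t: sum(w for _, _, w in t))


triangle = [(0, 1, 1), (1, 2, 2), (0, 2, 3)]
print([sum(w for _, _, w in t) for t in k_smallest_spanning_trees(3, triangle, 2)])  # [3, 4]
```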

A geometrically related problem is the Euclidean minimum spanning tree.

In the distributed model, each node is a computer that initially knows only its own incident links. In this model one can consider the distributed minimum spanning tree problem. Its mathematical definition is the same, but the approaches to solving it differ.

See also

References

External links