Minimum spanning tree
Given a connected, undirected graph, a spanning tree of that graph is a subgraph that is a tree and connects all the vertices together. A single graph can have many different spanning trees. We can also assign a weight to each edge, which is a number representing how unfavorable it is, and use this to assign a weight to a spanning tree by computing the sum of the weights of the edges in that spanning tree. A minimum spanning tree (MST) or minimum weight spanning tree is then a spanning tree with weight less than or equal to the weight of every other spanning tree. More generally, any undirected graph (not necessarily connected) has a minimum spanning forest, which is a union of minimum spanning trees for its connected components.
One example would be a telecommunications company laying cable to a new neighborhood. If it is constrained to bury the cable only along certain paths, then there would be a graph representing which points are connected by those paths. Some of those paths might be more expensive, because they are longer, or require the cable to be buried deeper; these paths would be represented by edges with larger weights. A spanning tree for that graph would be a subset of those paths that has no cycles but still connects to every house. There might be several spanning trees possible. A minimum spanning tree would be one with the lowest total cost.
Properties
Possible multiplicity
There may be several minimum spanning trees of the same weight; in particular, if all the edge weights of a given graph are the same, then every spanning tree of that graph is minimum. If there are n vertices in the graph, then each spanning tree has n-1 edges.
Uniqueness
If each edge has a distinct weight then there will be only one, unique minimum spanning tree. This is true in many realistic situations, such as the telecommunications company example above, where it is unlikely that any two paths have exactly the same cost. This generalizes to spanning forests as well. If the edge weights are not all distinct, the minimum spanning tree need not be unique, but the multiset of edge weights is the same for every minimum spanning tree.[1]
A proof of uniqueness by contradiction is as follows.
- Suppose there are two different MSTs A and B.
- Let e1 be the edge of least weight that is in one of the MSTs and not the other. Without loss of generality, assume e1 is in A but not in B.
- As B is a spanning tree, B ∪ {e1} must contain a cycle C, and C contains e1.
- Since A is a tree and contains no cycle, C has an edge e2 that is not in A. Because e2 is in B but not in A, the choice of e1 and the distinctness of the weights imply that the weight of e2 is greater than the weight of e1.
- Replacing e2 with e1 in B yields a spanning tree with a smaller weight.
- This contradicts the assumption that B is an MST.
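The uniqueness claim can be checked by brute force on a small graph. The sketch below is only an illustration, not an efficient algorithm: it enumerates every spanning tree and keeps those of minimum weight. The graph, the `(u, v, weight)` edge format and the function names are assumptions made for this example.

```python
from itertools import combinations

def is_spanning_tree(n, tree_edges):
    """Return True if the given n-1 edges contain no cycle (and so span all n vertices)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in tree_edges:
        ru, rv = find(u), find(v)
        if ru == rv:  # this edge would close a cycle
            return False
        parent[ru] = rv
    return True  # n-1 acyclic edges on n vertices form a spanning tree

def minimum_spanning_trees(n, edges):
    """Enumerate all spanning trees and return those of minimum total weight."""
    trees = [t for t in combinations(edges, n - 1) if is_spanning_tree(n, t)]
    best = min(sum(w for _, _, w in t) for t in trees)
    return [t for t in trees if sum(w for _, _, w in t) == best]

# Hypothetical graph: the 4-cycle 0-1-2-3-0 plus the chord 0-2.
distinct = [(0, 1, 1), (1, 2, 2), (2, 3, 3), (3, 0, 4), (0, 2, 5)]
equal = [(u, v, 1) for u, v, _ in distinct]  # same graph, all weights equal

unique_msts = minimum_spanning_trees(4, distinct)
all_msts = minimum_spanning_trees(4, equal)
print(len(unique_msts), len(all_msts))
```

With distinct weights exactly one tree is returned, while with equal weights every spanning tree of the graph is minimum, in line with the multiplicity property above.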
Minimum-cost subgraph
If the weights are positive, then a minimum spanning tree is in fact a minimum-cost subgraph connecting all vertices, since subgraphs containing cycles necessarily have more total weight.
Cycle property
For any cycle C in the graph, if the weight of an edge e of C is larger than the weights of all other edges of C, then this edge cannot belong to an MST. To see this, assume the contrary, i.e. that e belongs to an MST T1. Deleting e breaks T1 into two subtrees with the two ends of e in different subtrees. The remainder of C reconnects the subtrees, hence there is an edge f of C with ends in different subtrees. Adding f reconnects the subtrees into a tree T2 with weight less than that of T1, because the weight of f is less than the weight of e. This contradicts the minimality of T1.
Cut property
For any cut C in the graph, if the weight of an edge e of C is strictly smaller than the weights of all other edges of C, then this edge belongs to all MSTs of the graph. To prove this, assume the contrary: in the figure at right, make edge BC (weight 6) part of the MST T instead of edge e (weight 4). Adding e to T will produce a cycle, while replacing BC with e would produce a spanning tree of smaller weight. Thus a tree containing BC instead of e is not an MST, which contradicts the assumption. By a similar argument, if more than one edge is of minimum weight across a cut, then each such edge is contained in some minimum spanning tree.
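The cut property justifies growing a tree greedily: at each step, the lightest edge crossing the cut between the vertices already reached and the rest can safely be added. The following minimal sketch illustrates that idea, which is essentially how Prim's algorithm proceeds. The adjacency-list format, the example graph and the function name are assumptions for illustration.

```python
import heapq

def grow_tree_by_cuts(adj, start):
    """Grow a spanning tree from `start`, always taking the lightest edge
    that crosses the cut between visited and unvisited vertices."""
    visited = {start}
    frontier = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(visited) < len(adj):
        w, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # both ends are visited: the edge no longer crosses the cut
        visited.add(v)
        tree.append((u, v, w))
        for x, wx in adj[v]:
            if x not in visited:
                heapq.heappush(frontier, (wx, v, x))
    return tree

# Hypothetical graph: vertex -> list of (neighbour, weight).
adj = {
    0: [(1, 4), (2, 1)],
    1: [(0, 4), (2, 2), (3, 5)],
    2: [(0, 1), (1, 2), (3, 8)],
    3: [(1, 5), (2, 8)],
}
tree = grow_tree_by_cuts(adj, 0)
print(tree, sum(w for _, _, w in tree))
```

A binary heap is used here only for brevity; it may hold stale edges, which are skipped when popped.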
Minimum-cost edge
If the minimum-cost edge e of a graph is unique, then this edge is included in every MST. Indeed, if e were not included in an MST, adding e to it would form a cycle, and removing any of the other (larger-cost) edges of that cycle would yield a spanning tree of smaller weight.
Algorithms
The first algorithm for finding a minimum spanning tree was developed by Czech scientist Otakar Borůvka in 1926 (see Borůvka's algorithm). Its purpose was an efficient electrical coverage of Moravia. There are now two algorithms commonly used, Prim's algorithm and Kruskal's algorithm. All three are greedy algorithms that run in polynomial time, so the problem of finding such trees is in FP, and related decision problems such as determining whether a particular edge is in the MST or determining if the minimum total weight exceeds a certain value are in P. Another greedy algorithm not as commonly used is the reverse-delete algorithm, which is the reverse of Kruskal's algorithm.
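As an illustration of the greedy approach, here is a minimal sketch of Kruskal's algorithm: edges are considered in order of increasing weight, and an edge is kept whenever it joins two different components. The `(weight, u, v)` edge format, the example graph and the simple union-find structure are choices made for this sketch.

```python
def kruskal(n, edges):
    """Return the edges of a minimum spanning forest of a graph on vertices 0..n-1.

    `edges` is an iterable of (weight, u, v) tuples.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    chosen = []
    for w, u, v in sorted(edges):  # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:               # edge joins two components: no cycle is created
            parent[ru] = rv
            chosen.append((w, u, v))
    return chosen

# Hypothetical example graph.
edges = [(4, 0, 1), (1, 0, 2), (2, 1, 2), (5, 1, 3), (8, 2, 3)]
mst = kruskal(4, edges)
print(mst, sum(w for w, _, _ in mst))
```

On a disconnected graph the same loop yields a minimum spanning forest, one tree per connected component.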
If the edge weights are integers, then deterministic algorithms are known that solve the problem in O(m + n) integer operations, where m is the number of edges, n is the number of vertices.[2] In a comparison model, in which the only allowed operations on edge weights are pairwise comparisons, Karger, Klein & Tarjan (1995) found a linear time randomized algorithm based on a combination of Borůvka's algorithm and the reverse-delete algorithm.[3][4] Whether the problem can be solved deterministically in linear time by a comparison-based algorithm remains an open question, however. The fastest non-randomized comparison-based algorithm with known complexity, by Bernard Chazelle, is based on the soft heap, an approximate priority queue.[5][6] Its running time is O(m α(m,n)), where α is the classical functional inverse of the Ackermann function. The function α grows extremely slowly, so that for all practical purposes it may be considered a constant no greater than 4; thus Chazelle's algorithm takes very close to linear time. Seth Pettie and Vijaya Ramachandran have found a provably optimal deterministic comparison-based minimum spanning tree algorithm, the computational complexity of which is unknown.[7]
Research has also considered parallel algorithms for the minimum spanning tree problem. With a linear number of processors it is possible to solve the problem in O(log n) time.[8][9] Bader & Cong (2006) demonstrate an algorithm that can compute MSTs 5 times faster on 8 processors than an optimized sequential algorithm.[10]
Other specialized algorithms have been designed for computing minimum spanning trees of a graph so large that most of it must be stored on disk at all times. These external storage algorithms, for example as described in "Engineering an External Memory Minimum Spanning Tree Algorithm" by Roman Dementiev et al.,[11] can operate, by the authors' claims, as little as 2 to 5 times slower than a traditional in-memory algorithm. They rely on efficient external storage sorting algorithms and on graph contraction techniques for reducing the graph's size efficiently.
The problem can also be approached in a distributed manner. If each node is considered a computer and no node knows anything except its own connected links, one can still calculate the distributed minimum spanning tree.
MST on complete graphs
Alan M. Frieze showed that given a complete graph on n vertices, with edge weights that are independent identically distributed random variables with distribution function F satisfying F'(0) > 0, then as n approaches +∞ the expected weight of the MST approaches ζ(3)/F'(0), where ζ is the Riemann zeta function. Frieze and Steele also proved convergence in probability. Svante Janson proved a central limit theorem for the weight of the MST.
For uniform random weights in [0,1], the exact expected size of the minimum spanning tree has been computed for small complete graphs.[12]
Vertices | Expected size | Approximate expected size |
---|---|---|
2 | 1 / 2 | 0.5 |
3 | 3 / 4 | 0.75 |
4 | 31 / 35 | 0.8857143 |
5 | 893 / 924 | 0.9664502 |
6 | 278 / 273 | 1.0183151 |
7 | 30739 / 29172 | 1.053716 |
8 | 199462271 / 184848378 | 1.0790588 |
9 | 126510063932 / 115228853025 | 1.0979027 |
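The values in the table can be checked approximately by simulation. The sketch below is a rough Monte Carlo estimate, not the exact computation from the cited source: it runs an O(n^2) version of Prim's algorithm on a complete graph and draws each uniform edge weight the first time that edge is examined. The function names, trial count and seed are arbitrary choices for this example.

```python
import random

def mst_weight_complete(n, rng):
    """MST weight of the complete graph on n vertices with i.i.d. uniform [0, 1) weights."""
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge from the tree to each vertex
    best[0] = 0.0              # start the tree at vertex 0 at no cost
    total = 0.0
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                w = rng.random()  # weight of edge (u, v); each edge is drawn exactly once
                if w < best[v]:
                    best[v] = w
    return total

def estimate_expected_mst_weight(n, trials, seed=0):
    """Average MST weight over `trials` independent random complete graphs."""
    rng = random.Random(seed)
    return sum(mst_weight_complete(n, rng) for _ in range(trials)) / trials

for n in (2, 3, 4):
    print(n, round(estimate_expected_mst_weight(n, 2000), 3))
```

With enough trials the estimates should land close to the tabulated values, and for large n they should drift toward ζ(3) ≈ 1.202, since F'(0) = 1 for the uniform distribution.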
Applications
Minimum spanning trees have direct applications in the design of networks, including computer networks, telecommunications networks, transportation networks, water supply networks, and electrical grids (which they were first invented for, as mentioned above).[13] They are invoked as subroutines in algorithms for other problems, including the Christofides algorithm for approximating the traveling salesman problem,[14] approximating the multi-terminal minimum cut problem (which is equivalent in the single-terminal case to the maximum flow problem),[15] and approximating the minimum-cost weighted perfect matching.[16]
Other practical applications based on minimum spanning trees include:
- Taxonomy, one of the earliest motivating applications.[17]
- Cluster analysis: clustering points in the plane,[18] single-linkage clustering (a method of hierarchical clustering),[19] graph-theoretic clustering,[20] and clustering gene expression data.[21]
- Constructing trees for broadcasting in computer networks.[22] On Ethernet networks this is accomplished by means of the Spanning Tree Protocol.
- Image registration[23] and segmentation[24] — see minimum spanning tree-based segmentation.
- Curvilinear feature extraction in computer vision.[25]
- Handwriting recognition of mathematical expressions.[26]
- Circuit design: implementing efficient multiple constant multiplications, as used in finite impulse response filters.[27]
- Regionalisation of socio-geographic areas, the grouping of areas into homogeneous, contiguous regions.[28]
- Comparing ecotoxicology data.[29]
- Topological observability in power systems.[30]
- Measuring homogeneity of two-dimensional materials.[31]
- Minimax process control.[32]
In pedagogical contexts, minimum spanning tree algorithms serve as a common introductory example of both graph algorithms and greedy algorithms due to their simplicity.
Related problems
The problem of finding the Steiner tree of a subset of the vertices, that is, the minimum-weight tree that spans the given subset, is known to be NP-complete.[33]
A related problem is the k-minimum spanning tree (k-MST), which is the tree that spans some subset of k vertices in the graph with minimum weight.
A set of k-smallest spanning trees is a subset of k spanning trees (out of all possible spanning trees) such that no spanning tree outside the subset has smaller weight.[34][35][36] (Note that this problem is unrelated to the k-minimum spanning tree.)
The Euclidean minimum spanning tree is a spanning tree of a graph with edge weights corresponding to the Euclidean distance between vertices which are points in the plane (or space).
The rectilinear minimum spanning tree is a spanning tree of a graph with edge weights corresponding to the rectilinear distance between vertices which are points in the plane (or space).
In the distributed model, where each node is considered a computer and no node knows anything except its own connected links, one can consider the distributed minimum spanning tree problem. The mathematical definition of the problem is the same, but solving it requires different approaches.
The capacitated minimum spanning tree is a tree that has a marked node (origin, or root) such that each of the subtrees attached to the node contains no more than c nodes. The value c is called the tree capacity. The CMST problem is NP-hard, so no polynomial-time exact algorithm is known, but good heuristics such as Esau-Williams and Sharma produce solutions close to optimal in polynomial time.
The degree constrained minimum spanning tree is a minimum spanning tree in which each vertex is connected to no more than d other vertices, for some given number d. The case d = 2 is a special case of the traveling salesman problem, so the degree constrained minimum spanning tree is NP-hard in general.
For directed graphs, the minimum spanning tree problem is called the arborescence problem and can be solved in quadratic time using the Chu–Liu/Edmonds algorithm.
A maximum spanning tree is a spanning tree with weight greater than or equal to the weight of every other spanning tree. Such a tree can be found with algorithms such as Prim's or Kruskal's after multiplying the edge weights by -1 and solving the MST problem on the new graph. A path in the maximum spanning tree is the widest path in the graph between its two endpoints: among all possible paths, it maximizes the weight of the minimum-weight edge.[37] Maximum spanning trees find applications in parsing algorithms for natural languages[38] and in training algorithms for conditional random fields.
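The negation trick can be sketched in a few lines. The example below assumes a plain Kruskal implementation over `(weight, u, v)` tuples; the helper names and the graph are hypothetical.

```python
def kruskal(n, edges):
    """Minimum spanning forest of a graph on vertices 0..n-1; edges are (weight, u, v)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    chosen = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            chosen.append((w, u, v))
    return chosen

def maximum_spanning_tree(n, edges):
    """Negate the weights, solve the MST problem, then restore the original weights."""
    negated = [(-w, u, v) for w, u, v in edges]
    return [(-w, u, v) for w, u, v in kruskal(n, negated)]

# Hypothetical example graph.
edges = [(4, 0, 1), (1, 0, 2), (2, 1, 2), (5, 1, 3), (8, 2, 3)]
max_tree = maximum_spanning_tree(4, edges)
print(max_tree, sum(w for w, _, _ in max_tree))
```

In this graph the tree path from 0 to 2 runs through vertices 1 and 3 and has minimum edge weight 4, which is larger than on the direct edge (weight 1) or the path through vertex 1 alone (minimum weight 2), illustrating the widest-path property.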
The dynamic MST problem concerns the update of a previously computed MST after an edge weight change in the original graph or the insertion/deletion of a vertex.[39][40][41]
The minimum labeling spanning tree problem is to find a spanning tree that uses the fewest distinct labels, where each edge of the graph is associated with a label from a finite label set instead of a weight.[42]
Minimum bottleneck spanning tree
A bottleneck edge is the highest weighted edge in a spanning tree.
A spanning tree is a minimum bottleneck spanning tree (or MBST) if the graph does not contain a spanning tree with a smaller bottleneck edge weight.
An MST is necessarily an MBST (provable by the cut property), but an MBST is not necessarily an MST.
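A small example makes the difference concrete. The sketch below enumerates all spanning trees of a hypothetical graph by brute force and compares the minimum-weight trees with the minimum-bottleneck trees; the graph and the helper names are assumptions chosen for illustration.

```python
from itertools import combinations

def spanning_trees(n, edges):
    """Yield every set of n-1 edges that forms a spanning tree of vertices 0..n-1."""
    for subset in combinations(edges, n - 1):
        parent = list(range(n))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        acyclic = True
        for u, v, _ in subset:
            ru, rv = find(u), find(v)
            if ru == rv:  # edge closes a cycle, so this subset is not a tree
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:
            yield subset

def weight(tree):
    return sum(w for _, _, w in tree)

def bottleneck(tree):
    return max(w for _, _, w in tree)

# Hypothetical graph: a bridge 0-1 of weight 5 attached to a triangle 1-2-3.
edges = [(0, 1, 5), (1, 2, 1), (2, 3, 2), (1, 3, 3)]
trees = list(spanning_trees(4, edges))

min_weight = min(weight(t) for t in trees)
min_bottleneck = min(bottleneck(t) for t in trees)
msts = [t for t in trees if weight(t) == min_weight]
mbsts = [t for t in trees if bottleneck(t) == min_bottleneck]
print(len(msts), len(mbsts))
```

Every spanning tree of this graph must use the bridge of weight 5, so all three spanning trees are minimum bottleneck spanning trees, yet only one of them has minimum total weight.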
See also
- Reverse-Delete algorithm
- Dijkstra's algorithm
- Spanning tree protocol, used in switched networks
- Edmonds's algorithm
- Distributed minimum spanning tree
- Prim's algorithm
- Kruskal's algorithm
- Steiner tree
- Borůvka's algorithm
References
- Do the minimum spanning trees of a weighted graph have the same number of edges with a given weight?
- Fredman, M. L.; Willard, D. E. (1994), "Trans-dichotomous algorithms for minimum spanning trees and shortest paths", Journal of Computer and System Sciences 48 (3): 533–551, doi:10.1016/S0022-0000(05)80064-9, MR 1279413.
- Karger, David R.; Klein, Philip N.; Tarjan, Robert E. (1995), "A randomized linear-time algorithm to find minimum spanning trees", Journal of the Association for Computing Machinery 42 (2): 321–328, doi:10.1145/201019.201022, MR 1409738.
- Pettie, Seth; Ramachandran, Vijaya (2002), "Minimizing randomness in minimum spanning tree, parallel connectivity, and set maxima algorithms", Proc. 13th ACM-SIAM Symposium on Discrete Algorithms (SODA '02), San Francisco, California, pp. 713–722.
- Chazelle, Bernard (2000), "A minimum spanning tree algorithm with inverse-Ackermann type complexity", Journal of the Association for Computing Machinery 47 (6): 1028–1047, doi:10.1145/355541.355562, MR 1866456.
- Chazelle, Bernard (2000), "The soft heap: an approximate priority queue with optimal error rate", Journal of the Association for Computing Machinery 47 (6): 1012–1027, doi:10.1145/355541.355554, MR 1866455.
- Pettie, Seth; Ramachandran, Vijaya (2002), "An optimal minimum spanning tree algorithm", Journal of the Association for Computing Machinery 49 (1): 16–34, doi:10.1145/505241.505243, MR 2148431.
- Chong, Ka Wong; Han, Yijie; Lam, Tak Wah (2001), "Concurrent threads and optimal parallel minimum spanning trees algorithm", Journal of the Association for Computing Machinery 48 (2): 297–323, doi:10.1145/375827.375847, MR 1868718.
- Pettie, Seth; Ramachandran, Vijaya (2002), "A randomized time-work optimal parallel algorithm for finding a minimum spanning forest", SIAM Journal on Computing 31 (6): 1879–1895, doi:10.1137/S0097539700371065, MR 1954882.
- Bader, David A.; Cong, Guojing (2006), "Fast shared-memory algorithms for computing the minimum spanning forest of sparse graphs", Journal of Parallel and Distributed Computing 66 (11): 1366–1378, doi:10.1016/j.jpdc.2006.06.001.
- Dementiev, Roman; Sanders, Peter; Schultes, Dominik; Sibeyn, Jop F. (2004), "Engineering an external memory minimum spanning tree algorithm", Proc. IFIP 18th World Computer Congress, TC1 3rd International Conference on Theoretical Computer Science (TCS2004), pp. 195–208.
- Steele, J. Michael (2002), "Minimal spanning trees for graphs with random edge lengths", Mathematics and computer science, II (Versailles, 2002), Trends Math., Basel: Birkhäuser, pp. 223–245, MR 1940139.
- Graham, R. L.; Hell, Pavol (1985), "On the history of the minimum spanning tree problem", Annals of the History of Computing 7 (1): 43–57, doi:10.1109/MAHC.1985.10011, MR 783327.
- Nicos Christofides, Worst-case analysis of a new heuristic for the travelling salesman problem, Report 388, Graduate School of Industrial Administration, CMU, 1976.
- Dahlhaus, E.; Johnson, D. S.; Papadimitriou, C. H.; Seymour, P. D.; Yannakakis, M. (August 1994). "The complexity of multiterminal cuts". SIAM Journal on Computing 23 (4): 864–894. doi:10.1137/S0097539792225297. Retrieved 17 December 2012.
- Supowit, Kenneth J.; Plaisted, David A.; Reingold, Edward M. (1980). "Heuristics for weighted perfect matching". 12th Annual ACM Symposium on Theory of Computing (STOC '80). New York, NY, USA: ACM. pp. 398–419. doi:10.1145/800141.804689.
- Sneath, P. H. A. (1 August 1957). "The Application of Computers to Taxonomy". Journal of General Microbiology 17 (1): 201–226. doi:10.1099/00221287-17-1-201.
- Asano, T.; Bhattacharya, B.; Keil, M.; Yao, F. (1988). "Clustering algorithms based on minimum and maximum spanning trees". Fourth Annual Symposium on Computational Geometry (SCG '88) 1. pp. 252–257. doi:10.1145/73393.73419.
- Gower, J. C.; Ross, G. J. S. (1969). "Minimum Spanning Trees and Single Linkage Cluster Analysis". Journal of the Royal Statistical Society. C (Applied Statistics) 18 (1): 54–64.
- Päivinen, Niina (1 May 2005). "Clustering with a minimum spanning tree of scale-free-like structure". Pattern Recognition Letters 26 (7): 921–930. doi:10.1016/j.patrec.2004.09.039.
- Xu, Y.; Olman, V.; Xu, D. (1 April 2002). "Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees". Bioinformatics 18 (4): 536–545. doi:10.1093/bioinformatics/18.4.536.
- Dalal, Yogen K.; Metcalfe, Robert M. (1 December 1978). "Reverse path forwarding of broadcast packets". Communications of the ACM 21 (12): 1040–1048. doi:10.1145/359657.359665.
- Ma; Hero, A.; Gorman, J.; Michel, O. (2000). "Image registration with minimum spanning tree algorithm". International Conference on Image Processing 1. pp. 481–484. doi:10.1109/ICIP.2000.901000.
- P. Felzenszwalb, D. Huttenlocher: Efficient Graph-Based Image Segmentation. IJCV 59(2) (September 2004).
- Suk, Minsoo; Song, Ohyoung (1 June 1984). "Curvilinear feature extraction using minimum spanning trees". Computer Vision, Graphics, and Image Processing 26 (3): 400–411. doi:10.1016/0734-189X(84)90221-4.
- Tapia, Ernesto; Rojas, Raúl (2004). "Recognition of On-line Handwritten Mathematical Expressions Using a Minimum Spanning Tree Construction and Symbol Dominance". Graphics Recognition. Recent Advances and Perspectives. Lecture Notes in Computer Science 3088. Berlin Heidelberg: Springer-Verlag. pp. 329–340. ISBN 3540224785.
- Ohlsson (2004). "Implementation of low complexity FIR filters using a minimum spanning tree". 12th IEEE Mediterranean Electrotechnical Conference (MELECON 2004) 1. pp. 261–264. doi:10.1109/MELCON.2004.1346826.
- Assunção, R. M.; M. C. Neves, G. Câmara, C. Da Costa Freitas (2006). "Efficient regionalization techniques for socio-economic geographical units using minimum spanning trees". International Journal of Geographical Information Science 20 (7): 797–811. doi:10.1080/13658810600665111.
- Devillers, J.; Dore, J.C. (1 April 1989). "Heuristic potency of the minimum spanning tree (MST) method in toxicology". Ecotoxicology and Environmental Safety 17 (2): 227–235. doi:10.1016/0147-6513(89)90042-0.
- Mori, H.; Tsuzuki, S. (1 May 1991). "A fast method for topological observability analysis using a minimum spanning tree technique". IEEE Transactions on Power Systems 6 (2): 491–500. doi:10.1109/59.76691.
- "Testing for homogeneity of two-dimensional surfaces". Mathematical Modelling 4 (2): 167–189. 1 January 1983. doi:10.1016/0270-0255(83)90026-X.
- Kalaba, Robert E. (1963), Graph Theory and Automatic Control.
- Garey, Michael R.; Johnson, David S. (1979), Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman, ISBN 0-7167-1045-5. ND12.
- Gabow, Harold N. (1977), "Two algorithms for generating weighted spanning trees in order", SIAM Journal on Computing 6 (1): 139–150, MR 0441784.
- Eppstein, David (1992), "Finding the k smallest spanning trees", BIT 32 (2): 237–248, doi:10.1007/BF01994879, MR 1172188.
- Frederickson, Greg N. (1997), "Ambivalent data structures for dynamic 2-edge-connectivity and k smallest spanning trees", SIAM Journal on Computing 26 (2): 484–538, doi:10.1137/S0097539792226825, MR 1438526.
- Hu, T. C. (1961), "The maximum capacity route problem", Operations Research 9 (6): 898–900, doi:10.1287/opre.9.6.898, JSTOR 167055.
- McDonald, Ryan; Pereira, Fernando; Ribarov, Kiril; Hajič, Jan (2005). "Non-projective dependency parsing using spanning tree algorithms". Proc. HLT/EMNLP.
- Spira, P. M.; Pan, A. (1975), "On finding and updating spanning trees and shortest paths", SIAM Journal on Computing 4 (3): 375–380, MR 0378466.
- Holm, Jacob; de Lichtenberg, Kristian; Thorup, Mikkel (2001), "Poly-logarithmic deterministic fully dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity", Journal of the Association for Computing Machinery 48 (4): 723–760, doi:10.1145/502090.502095, MR 2144928.
- Chin, F.; Houck, D. (1978), "Algorithms for updating minimal spanning trees", Journal of Computer and System Sciences 16: 333–344.
- Chang, R.S.; Leu, S.J. (1997), "The minimum labeling spanning trees", Information Processing Letters 63: 277–282.
Additional reading
- Otakar Boruvka on Minimum Spanning Tree Problem (translation of both 1926 papers, comments, history) (2000) Jaroslav Nesetril, Eva Milková, Helena Nesetrilová. (Section 7 gives his algorithm, which looks like a cross between Prim's and Kruskal's.)
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Chapter 23: Minimum Spanning Trees, pp. 561–579.
- Eisner, Jason (1997). State-of-the-art algorithms for minimum spanning trees: A tutorial discussion. Manuscript, University of Pennsylvania, April. 78 pp.
- Kromkowski, John David. "Still Unmelted after All These Years", in Annual Editions, Race and Ethnic Relations, 17/e (2009 McGraw Hill) (Using minimum spanning tree as method of demographic analysis of ethnic diversity across the United States).
External links
- Jeff Erickson's MST lecture notes
- Implemented in BGL, the Boost Graph Library
- The Stony Brook Algorithm Repository - Minimum Spanning Tree codes
- Implemented in QuickGraph for .Net