The travelling salesman problem (TSP) in operations research is a problem in discrete or combinatorial optimization. It is a prominent illustration of a class of problems in computational complexity theory which are classified as NP-hard.
The problem is: given a number of cities and the costs of travelling from any city to any other city, what is the least-cost round-trip route that visits each city exactly once and then returns to the starting city?
Contents |
Given a number of cities and the costs of travelling from any city to any other city, what is the least-cost round-trip route that visits each city exactly once and then returns to the starting city?
The size of the solution space is (n - 1)!/2 for n > 2, where n is the number of cities. This is the number of Hamiltonian cycles in a complete graph of n nodes, that is, closed paths that visit all nodes exactly once.
Mathematical problems related to the travelling salesman problem were treated in the 1800s by the Irish mathematician Sir William Rowan Hamilton and by the British mathematician Thomas Kirkman. A discussion of the early work of Hamilton and Kirkman can be found in Graph Theory 1736-1936.[2] The general form of the TSP appears to have been first studied by mathematicians during the 1930s in Vienna and at Harvard, notably by Karl Menger.
The problem was later undertaken by Hassler Whitney and Merrill M. Flood at Princeton University. A detailed treatment of the connection between Menger and Whitney as well as the growth in the study of TSP can be found in Alexander Schrijver's 2005 paper "On the history of combinatorial optimization (till 1960)".[3]
The problem has been shown to be NP-hard (more precisely, it is complete for the complexity class FPNP; see function problem), and the decision problem version ("given the costs and a number x, decide whether there is a round-trip route cheaper than x") is NP-complete. The bottleneck travelling salesman problem is also NP-hard. The problem remains NP-hard even for the case when the cities are in the plane with Euclidean distances, as well as in a number of other restrictive cases. Removing the condition of visiting each city "only once" does not remove the NP-hardness, since it is easily seen that in the planar case an optimal tour visits cities only once (otherwise, by the triangle inequality, a shortcut that skips a repeated visit would decrease the tour length).
The traditional lines of attacking for the NP-hard problems are the following:
For benchmarking of TSP algorithms, TSPLIB is a library of sample instances of the TSP and related problems is maintained, see the TSPLIB external reference. Many of them are lists of actual cities and layouts of actual printed circuits.
The most direct solution would be to try all the permutations (ordered combinations) and see which one is cheapest (using brute force search), but given that the number of permutations is n! (the factorial of the number of cities, n), this solution rapidly becomes impractical. Using the techniques of dynamic programming, one can solve the problem in time [4]. Although this is exponential, it is still much better than .
Other approaches include:
An exact solution for 15,112 German towns from TSPLIB was found in 2001 using the cutting-plane method proposed by George Dantzig, Ray Fulkerson, and Selmer Johnson in 1954, based on linear programming. The computations were performed on a network of 110 processors located at Rice University and Princeton University (see the Princeton external link). The total computation time was equivalent to 22.6 years on a single 500 MHz Alpha processor. In May 2004, the travelling salesman problem of visiting all 24,978 towns in Sweden was solved: a tour of length approximately 72,500 kilometers was found and it was proven that no shorter tour exists.[6]
In March 2005, the travelling salesman problem of visiting all 33,810 points in a circuit board was solved using CONCORDE: a tour of length 66,048,945 units was found and it was proven that no shorter tour exists. The computation took approximately 15.7 CPU years (Cook et al. 2006). In April 2006 an instance with 85,900 points was solved using CONCORDE, taking over 136 CPU years, see the book by Applegate et al [2006] [7].
Various approximation algorithms, which quickly yield good solutions with high probability, have been devised. Modern methods can find solutions for extremely large problems (millions of cities) within a reasonable time which are with a high probability just 2-3% away from the optimal solution.
Several categories of heuristics are recognized.
The nearest neighbour (NN) algorithm (the so-called greedy algorithm is similar, but slightly different) lets the salesman start from any one city and choose the nearest city not visited yet to be his next visit. This algorithm quickly yields an effectively short route.
Rosenkrantz et al. [1977] showed that the NN algorithm has the approximation factor for instances satisfying the triangle inequality. And the result is always of length <= 0.5*(log(n)+1), where n is the number of cities (Levitin, 2003).
For each n>1, there exist infinitely many examples for which the NN (greedy algorithm) gives the longest possible route (Gutin, Yeo, and Zverovich, 2002). This is true for both asymmetric and symmetric TSPs (Gutin and Yeo, 2007).
Recently a new constructive heuristic, Match Twice and Stitch (MTS) (Kahng, Reda 2004 [8]), is proposed. MTS has been shown to empirically outperform all existing tour construction heuristics. MTS performs two sequential matchings, where the second matching is executed after deleting all the edges of the first matching, to yield a set of cycles. The cycles are then stitched to produce the final tour.
TSP is a touchstone for many general heuristics devised for combinatorial optimization such as genetic algorithms, simulated annealing, Tabu search, ant colony optimization, and the cross entropy method.
Suppose that the number of towns is sixty. For a random search process, this is like having a deck of cards numbered 1, 2, 3, ... 59, 60 where the number of permutations is of the same order of magnitude as the total number of atoms in the known universe. If the hometown is not counted the number of possible tours becomes 60*59*58*...*4*3 (about 1080, i. e. a 1 followed by 80 zeros).
Suppose that the salesman does not have a map showing the location of the towns, but only a deck of numbered cards, which he may permute, put in a card reader - as in early computers - and let the computer calculate the length of the tour. The probability to find the shortest tour by random permutation is about one in 1080 so, it will never happen. So, should he give up?
No, by no means, evolution may be of great help to him; at least if it could be simulated on his computer. The natural evolution uses an inversion operator, which - in principle - is extremely well suited for finding good solutions to the problem. A part of the card deck (DNA) - chosen at random - is taken out, turned in opposite direction and put back in the deck again like in the figure below with 6 towns. The hometown (nr 1) is not counted.
If this inversion takes place where the tour happens to have a loop, then the loop is opened and the salesman is guaranteed a shorter tour. The probability that this will happen is greater than 1/(60*60) for any loop if we have 60 towns, so, in a population with one million card decks it might happen 1000000/3600 = 277 times that a loop will disappear.
This has been simulated with a population of 180 card decks, from which 60 decks are selected in every generation. The figure below shows a random tour at start
After about 1500 generations all loops have been removed and the length of the random tour at start has been reduced to 1/5 of the original tour. The human eye can see that some improvements can be made, but probably the random search has found a tour, which is not much longer than the shortest possible. See figure below.
In a special case when all towns are equidistantly placed along a circle, the optimal solution is found when all loops have been removed. This means that this simple random search is able to find one optimal tour out of as many as 1080. See also Goldberg, 1989.
Artificial intelligence researcher Marco Dorigo described in 1997 a method of heuristically generating "good solutions" to the TSP using a simulation of an ant colony called ACS.[9] It uses some of the same ideas used by real ants to find short paths between food sources and their nest, an emergent behavior resulting from each ant's preference to follow trail pheromones deposited by other ants.
ACS sends out a large number of virtual ant agents to explore many possible routes on the map. Each ant probabilistically chooses the next city to visit based on a heuristic combining the distance to the city and the amount of virtual pheromone deposited on the edge to the city. The ants explore, depositing pheromone on each edge that they cross, until they have all completed a tour. At this point the ant which completed the shortest tour deposits virtual pheromone along its complete tour route (global trail updating). The amount of pheromone deposited is inversely proportional to the tour length; the shorter the tour, the more it deposits.
A very natural restriction of the TSP is to require that the distances between cities form a metric, i.e., they satisfy the triangle inequality. That is, for any 3 cities A, B and C, the distance between A and C must be at most the distance from A to B plus the distance from B to C. Most natural instances of TSP satisfy this constraint.
In this case, there is a constant-factor approximation algorithm due to Christofides[10] that always finds a tour of length at most 1.5 times the shortest tour. In the next paragraphs, we explain a weaker (but simpler) algorithm which finds a tour of length at most twice the shortest tour.
The length of the minimum spanning tree of the network is a natural lower bound for the length of the optimal route. In the TSP with triangle inequality case it is possible to prove upper bounds in terms of the minimum spanning tree and design an algorithm that has a provable upper bound on the length of the route. The first published (and the simplest) example follows.
It is easy to prove that the last step works. Moreover, thanks to the triangle inequality, each skipping at Step 4 is in fact a shortcut, i.e., the length of the cycle does not increase. Hence it gives us a TSP tour no more than twice as long as the optimal one.
The Christofides algorithm follows a similar outline but combines the minimum spanning tree with a solution of another problem, minimum-weight perfect matching. This gives a TSP tour which is at most 1.5 times the optimal. The Christofides algorithm was one of the first approximation algorithms, and was in part responsible for drawing attention to approximation algorithms as a practical approach to intractable problems. As a matter of fact, the term "algorithm" was not commonly extended to approximation algorithms until later; the Christofides algorithm was initially referred to as the Christofides heuristic.
In the special case that distances between cities are all either one or two (and thus the triangle inequality is necessarily satisfied), there is a polynomial-time approximation algorithm that finds a tour of length at most 8/7 times the optimal tour length[11]. However, it is a long-standing (since 1975) open problem to improve the Christofides approximation factor of 1.5 for general metric TSP to a smaller constant. It is known that, unless P = NP, there is no polynomial-time algorithm that finds a tour of length at most 220/219=1.00456… times the optimal tour's length[12]. In the case of bounded metrics it is known that there is no polynomial time algorithm that constructs a tour of length at most 1/388 more than optimal, unless P = NP[13].
Euclidean TSP, or planar TSP, is the TSP with the distance being the ordinary Euclidean distance.
Euclidean TSP is a particular case of TSP with triangle inequality, since distances in plane obey triangle inequality. However, it seems to be easier than general TSP with triangle inequality. For example, the minimum spanning tree of the graph associated with an instance of Euclidean TSP is a Euclidean minimum spanning tree, and so can be computed in expected O(n log n) time for n points (considerably less than the number of edges). This enables the simple 2-approximation algorithm for TSP with triangle inequality above to operate more quickly.
In general, for any c > 0, there is a polynomial-time algorithm that finds a tour of length at most (1 + 1/c) times the optimal for geometric instances of TSP in O(n (log n)^O(c)) time; this is called a polynomial-time approximation scheme[14] In practice, heuristics with weaker guarantees continue to be used.
In most cases, the distance between two nodes in the TSP network is the same in both directions. The case where the distance from A to B is not equal to the distance from B to A is called asymmetric TSP. A practical application of an asymmetric TSP is route optimisation using street-level routing (asymmetric due to one-way streets, slip-roads and motorways).
Solving an asymmetric TSP graph can be somewhat complex. The following is a 3x3 matrix containing all possible path weights between the nodes A, B and C. One option is to turn an asymmetric matrix of size N into a symmetric matrix of size 2N, doubling the complexity.
A | B | C | |
---|---|---|---|
A | 1 | 2 | |
B | 6 | 3 | |
C | 5 | 4 |
To double the size, each of the nodes in the graph is duplicated, creating a second ghost node. Using duplicate points with very low weights, such as -∞, provides a cheap route "linking" back to the real node and allowing symmetric evaluation to continue. The original 3x3 matrix shown above is visible in the bottom left and the inverse of the original in the top-right. Both copies of the matrix have had their diagonals replaced by the low-cost hop paths, represented by -∞.
A | B | C | A' | B' | C' | |
---|---|---|---|---|---|---|
A | -∞ | 6 | 5 | |||
B | 1 | -∞ | 4 | |||
C | 2 | 3 | -∞ | |||
A' | -∞ | 1 | 2 | |||
B' | 6 | -∞ | 3 | |||
C' | 5 | 4 | -∞ |
The original 3x3 matrix would produce two Hamiltonian cycles (a path that visits every node once), namely A-B-C-A [score 9] and A-C-B-A [score 12]. Evaluating the 6x6 symmetric version of the same problem now produces many paths, including A-A'-B-B'-C-C'-A, A-B'-C-A'-A, A-A'-B-C'-A [all score 9-∞].
The important thing about each new sequence is that there will be an alternation between dashed (A',B',C') and un-dashed nodes (A,B,C) and that the link to "jump" between any related pair (A-A') is effectively free. A version of the algorithm could use any weight for the A-A' path, as long as that weight is lower than all other path weights present in the graph. As the path weight to "jump" must effectively be "free", the value zero (0) could be used to represent this cost — if zero is not being used for another purpose already (such as designating invalid paths). In the two examples above, non-existent paths between nodes are shown as a blank square.
The TSP, in particular the Euclidean variant of the problem, has attracted the attention of researchers in cognitive psychology. It is observed that humans are able to produce good quality solutions quickly. The first issue of the Journal of Problem Solving is devoted to the topic of human performance on TSP.
Many quick algorithms yield approximate TSP solutions for a large city number. To have an idea of the precision of an approximation, one should measure the resulted path length and compare it to the exact path length. To find out the exact path length, there are 3 approaches:
Consider N points randomly distributed in one unit square, with N>>1. A simple lower bound of the shortest path length is , obtained by considering each point connected to its nearest neighbor which is distance away on average.
Another lower bound is , obtained by considering each point j connected to j's nearest neighbor, and j's second nearest neighbor connected to j. Since j's nearest neighbor is distance away; j's second nearest neighbor is distance away on average.