A* search algorithm

From Wikipedia, the free encyclopedia

Graph search algorithms
Search

In computer science, A* (pronounced "A star") is a graph search algorithm that finds a path from a given initial node to a given goal node (or one passing a given goal test). It employs a "heuristic estimate" h(x) that ranks each node x by an estimate of the best route that goes through that node. It visits the nodes in order of this heuristic estimate. The A* algorithm is therefore an example of best-first search.

The algorithm was first described in 1968 by Peter Hart, Nils Nilsson, and Bertram Raphael. In their paper, it was called algorithm A; using this algorithm with an appropriate heuristic yields optimal behavior, hence A*.

Dijkstra's algorithm is the special case of A* where h(x) = 0 for all x.

Contents

[edit] Intuition

Consider the problem of route finding, for which A* is commonly used. A* incrementally builds all routes leading from the starting point until it finds one that reaches the goal. But, like all informed search algorithms, it only builds routes that appear to lead towards the goal.

To know which routes will likely lead to the goal, A* employs a heuristic estimation of the distance from any given point to the goal. In the case of route finding, this may be the straight-line distance, which is usually an approximation of road distance.

What sets A* apart from best-first search is that it also takes the distance already travelled into account. This makes A* complete and optimal, i.e., A* will always find the shortest route if any exists. It is not guaranteed to perform better than simpler search algorithms. In a maze-like environment, the only way to reach the goal might be to first travel one way (away from the goal) and eventually turn around. In this case trying nodes closer to your destination first may cost you time.

[edit] Algorithm description

A single-step simulation in a Visual Basic GUI (link see below)
Enlarge
A single-step simulation in a Visual Basic GUI (link see below)

A* maintains a set of partial solutions, i.e. paths through the graph starting at the start node, stored in a priority queue. The priority assigned to a path x is determined by the function f(x) = g(x) + h(x)

Here, g(x) is the cost of the path so far, i.e. the weight of the edges followed so far. h(x) is the heuristic estimate of the minimal cost to reach the goal from x. For example, if "cost" is taken to mean distance travelled, the straight line distance between two points on a map is a heuristic estimate of the distance to be travelled.

The lower f(x), the higher the priority (so a min-heap could be used to implement the queue).

function A*(start,goal)
    var closed := the empty set
    var q := make_queue(path(start))
    while q is not empty
        var p := remove_first(q)
        var x := the last node of p
        if x in closed
            continue
        if x = goal
            return p
        add x to closed
        foreach y in successors(p)
            enqueue(q, y)
    return failure

Here, successors(p) returns the set of paths created by extending p with one neighbor node. It is assumed that the queue maintains an ordering by f-value automatically.

In the closed set (closed), all the last node of p (nodes with paths found) are recorded, so as to avoid repetition and cycles (making this a graph search). The queue is sometimes analogously called the open set. The closed set can be omitted (yielding a tree search algorithm) if either a solution is guaranteed to exist, or if the successors function is adapted to reject cycles.

[edit] Properties

Like breadth-first search, A* is complete in the sense that it will always find a solution if there is one.

If the heuristic function h is admissible, meaning that it never overestimates the actual minimal cost of reaching the goal, then A* is itself admissible (or optimal) if a closed set is used. If no closed set is used, then h must also be monotonic (or consistent) for A* to be optimal. This means that it never overestimates the cost of getting from a node to its neighbor. Formally, for all nodes x,y where y is a successor of x:

h(x) \le g(y) - g(x) + h(y)

A* is also optimally efficient for any heuristic h, meaning that no algorithm employing the same heuristic will expand fewer nodes than A*, except when there are several partial solutions where h exactly predicts the cost of the optimal path.

[edit] Why A* is admissible and computationally optimal

A* is both admissible and considers fewer nodes than any other admissible search algorithm, because A*'s works from an "optimistic" estimate of the cost of a path through every node that it considers -- optimistic in that the true cost of a path through that node to the goal will be at least as great as the estimate. But, critically, as far as A* "knows", that optimistic estimate might be achievable.

When A* terminates its search, it by definition has found a path whose actual cost is lower than the estimated cost of any path through any open node. But since those estimates are optimistic, A* can safely ignore those nodes. In other words, A* will never overlook the possibility of a lower-cost path and so is admissible.

Suppose now that some other search algorithm A terminates its search with a path whose actual cost is not less than the estimated cost of a path through some open node. Algorithm A cannot rule out the possibility, based on the heuristic information it has, that a path through that node might have a lower cost. So while A might consider fewer nodes than A*, it cannot be admissible. Accordingly, A* considers the fewest nodes of any admissible search algorithm that uses a no more accurate heuristic estimate.

[edit] Complexity

The time complexity of A* depends on the heuristic. In the worst case, the number of nodes expanded is exponential in the length of the solution (the shortest path), but it is polynomial when the heuristic function h meets the following condition:

|h(x) - h^*(x)| \le O(\log h^*(x))

where h * is the optimal heuristic, i.e. the exact cost to get from x to the goal. In other words, the error of h should not grow faster than the logarithm of the "perfect heuristic" h * that returns the true distance from x to the goal (Russell and Norvig 2003, p. 101).

More problematic than its time complexity is A*'s memory usage. In the worst case, it must also remember an exponential number of nodes. Several variants of A* have been developed to cope with this, including iterative deepening A*, memory-bounded A* (MA*) and simplified memory bounded A*.

Another informed search algorithm that is optimal and complete if its heuristic is admissible is recursive best-first search (RBFS).

[edit] References

[edit] External links