Single-Pair Shortest Paths

These are the most simple type of shortest-path queries, as they involves finding a unique set of edges from a node to another. This result can be stored as an array of edges, as a set of nodes, or a sequence of both nodes and edges -- it all depends which representation is the most convenient for your particular problem.

A search is usually used to build up this solution. Starting from the root node, the set of edges in the solution path is built up incrementally and modified as the graph is traversed. As in standard computer science algorithmic terminology, this can be both depth first and breadth first. The depth first will generally take up less memory, while the breadth first approach will be very close to solving the full single source problem.

So, you may not be surprised to hear that in theory, single pair problems are just as expensive as their single-source counter-parts! This is due to the fact that in the worse theoretical case, both types of search (depth first and breadth first) are just as complex. Indeed, a single pair problem will at most have to determine the distance of all the nodes to the root. However, in practice, this is rarely the case. Paths are often direct or very close to being straight, so very few redundant nodes are traversed in the graph.

This is the exact assumption the A* algorithm relies on to perform efficiently.

A Star (A*)

Though the size of the search-space is not generally one of the biggest around, naive implementations can still suffer. By combining the efficiency of heuristics (functions that are based on, and that return approximate solutions), performance can be greatly improved. This is exactly what is needed in real-time systems, such as computer games.

The process can be guided by estimating which is the best node to search next. At each stage, we know the distance of the current vertex to the start node (s) -- since we've traversed the paths to get here, and summed up their lengths. So, if we knew the distance to the target (t) of the all the next possible nodes (u) via the shortest paths, we could pick the next move that minimises the length of the total path (sum of path (s,u) and (u,t). In this ideal world, our algorithm would only traverse the nodes that are in the final solution! This is great, no computation is wasted.

But, the problem is we do not know the length of the path (u,t) since we haven't traversed it yet. In fact, we dont even knwo if it is possible to get from u to t! And this is where the heuristics come into play. If you use a heuristic that always under-estimates the distance, then the algorithm will search a smaller amount of nodes. Obviously, the closer this gets to the perfect value, the better. But this is not always possible. In the worst case, returning 0 would cause a standard breath-first search to be performed. However, if you use a heuristic that can over-estimate the distance, then you're hampering the search by potentially sending it down wrong branches. This can lead to worst case performance everytime, which is definitely what we want to avoid.

So in practice, we estimate the distance from (u,t) not by scanning the edges -- which would cost us traversal time -- but by using the distance "as birds fly". This is a Euclidian distance, and can be computed very easily using standard vector math. At best, this will match the length of the path, but never over-estimate it.

Pseudo-Code C++

That's about all there is to the theory really! All you need is to implement the search, include the heuristic, and youre in business ;) It looks something like this:

/**
 * Initialise the cost of each node to the maximum possible cost.
 */
extern Node* source;
extern Node* target;

source->cost = 0.0f;

// could be almost any data-structure
vector<Node*> open;

open.insert( source );

while (!open.empty())
{
 // pick the next node as the one with the best estimate
    Node* current = FindBest( &open );

 // check if we've reached the target, and if so, return the result!
    if (current == target)
        return true;

 // scan all the neighbours of the current node
    for (Edge* e = current->edges.begin(); e != current->edges.end(); e++)
    {
     // determine the cost of this node for taking the current edge
        Node* neighbour = e->GetEndNode();
        const float cost = current->cost + e->length;

     // if it's worse, then we just skip it
        if (cost > neigbour->cost)
            continue;
                
     // remove this neighbour from the list
        open.remove( neigbour );

     // remember that the path via the current node is better
        neighbour->cost = cost;
        neighbour->parent = current;

     // and compute the estimated path length if this neighbour is used next
        neighbour->estimate = neighbour->cost 
                 + VectorLength( neighbour->pos - target->pos );

     // and allow the algorithm to scan this node when it becomes the best option
        open.insert( neighbour );
    }
}

Some store the list of scanned nodes explicitly, but I dont believe it's necessary. Also note that instead of removing and inserting neighbour from the open list, it could just be percolated to the new position corresponding to the updated estimate value.

Optimisations

There are few theoretical optimisations that can be made. Estimating the distance better than the euclidian is a tough task, since we need to make sure the heuristic is admissible. Any machine learning approach would struggle to do so.

That implies you're left with only practical implementation issues (or engineering problems, as some of you may like to call them). As mentioned numerous times in the previous pages, it's all about your data-structures!! Using a format that can perform both efficient insertions and retrivals, you can get great performance. I think a good compromise of simplicity and efficiency would be a priority queue implemented as a binary heap (though I have not tested this in practice).

Other memory optimsations can be employed for huge problems, by discarding the n worst options for example. But this may cause problems in returning valid solutions at all, since you've been trimming the search space! Also, by making it a depth-first search rather than breadth-first, you can reduce the need for complex data-structures, but at the cost of having to back-track a fair bit. It's a trade-off, as ever

Remember you can visit the Message Store to discuss this tutorial. Comments are always welcome!. There are already replies in the thread, why not join in?