Solving Travelling Salesman Problems Using Genetic Algorithms

Introduction

A majority of texts on the Internet will tell you exactly what a Genetic Algorithm is. This article won't. It will show you how to solve an actual problem using Genetic Algorithms. The problem that we will discuss throughout this article is the Travelling Salesman Problem. In summary, this problem can be expressed as optimizing a path to link a group of two dimensions positions.

What's required to understand this article?

The reader should be familiar with Genetic Algorithms; the principles of fitness, random wheel selection, crossover, mutation, etc. The article will also require some very basic understanding of 2D vector mathematics (the distance between two points in two dimensions).

What's required to implement this article?

First, if you want to compile the code that came with this article, you must use a C++ standard compliant compiler. The code will be as "standard" as possible, but since it can't be tested on every compiler, some corrections might need to be done to the source code for some compilers. If you don't have a C++ compiler but would still want to test the code, you can try gcc or djgpp.

In order to understand what the code really does, you need some object-oriented programming experience so C++ and Java programmers will have no difficulty reading it.

If anyone wishes to port the code to other languages, go ahead, it can help others understand what's going on under the hood.

Implementation

Here we will try to describe the main functions used by the algorithm to find the shortest closed path through a group of two dimensions positions.

Since this article is not a programming class, we will try to stay as straight forward as possible, staying away from bitshifts and other complicated techniques. For this reason, you will notice a few "magic" numbers. Almost every variable will be stored on a single byte, which means 8 bits. Since a bit can hold the values 0 or 1, most of our variables will have values between 0 and 28-1 (255) which means we will have 28 (256) different values possible per variable.

The search domain

Our search domain will be defined as a rectangular area of 256 x 256 entries, ranging from 0 to 255 in both directions (x,y). Each pair (x,y) possible in this domain represents the position of a point.

We then pick 256 of these pairs at random to populate our group of two dimentionnal positions. We index, or label, these pairs from 0 to 255.

The population

Our population will be formed of 100 strings of 256 genes. Each of our genes within a string will represent a unique index corresponding to a pair picked in the previous step.

Having unique genes will force us to write custom genes inheriting, crossing and mutating functions.

The operators

The inheriting and crossing functions will both use a pool of free indices from which used indices will be removed to keep the uniqueness of the transferred indices.

The roulette wheel selection function

The parent strings of a string that will exist on the next generation are picked by a roulette wheel selection algorithm. This means that higher fitness strings represent bigger parts of the wheel so they have more chances to be picked.

The fitness function

The fitness of a string will represent "how short" the line connecting the points formed by the pairs indexed by the value of the genes from 0 to 255 then back to 0, which will close the line. To get higher values for smaller line length values, we will set to fitness to


n(i)
=
value of the (i modulo 256)th gene of a string

Vn(i)
 
=
vector representing the n(i)th initial position
l
=
255

i=0 
|
(
Vn(i)
 
-
Vn(i + 1)
 
)
 
|
fitness
=
1 / l

The crossover function

Our crossover function will be a bit uncommon. Since our genes are unique, we can't simply pick a random crossing point then swap the values between two randomly picked strings.

First, we'll have to identify what makes a high fitness string. To get better strings after crossing, we'll have to imagine a clever way to cross strings to keep this information and achieve to better strings after crossing.

In our problem, geographically nearer points will give better fitness. Since crossover cuts the string in half, it is very unlikely that looping information, the length calculated between the pairs indexed at genes 255 and 0, will be kept. Since we have to reconstruct those strings, lets choose to reconstruct them from the point the furthest from these two genes, the crossing point. What we will do is add genes in Child A from Parent A, starting from the crossing site, until we reach the head of the string. We then fill the right side of the string by querying the free indices pool with Parent B's genes; if a gene is still free, we set it, else we skip the gene and leave it empty until we reach the tail of the string. We then fill the empty genes randomly with the indices remaining in the pool. We do the same thing with Child B but we use the Parent B first, then fill the left part of the string with Parent A's genes.

Here's a quick example of how the algorithm should work, if you don't understand it, please refer to the source code.

Lets pick two random strings, Parent A and Parent B.


Parent A

141, 133, 21, ... , 14, 87, 172

Parent B
2, 127, 221, ... , 47, 122, 251

We only show 6 of the 256 unique terms to simplify the example. We pick a random number from 0 to 253 and add 1 to it to be our crossover point. This means that a crossover point can only be between two genes.

For example, lets say the random number ended up to be 86 (87 with the plus 1) . This means that Child A should be constituted from exactly 87 genes from Parent A and about 169 genes from Parent B. At the opposite, Child B will have exactly 169 genes from Parent B and about 87 from Parent A. It is very important to note that since the genes are unique, there are chances that for example Child A receives first a gene from Parent A that should also be transferred from Parent B. In this case, as stated earlier, we should leave the gene empty until the transfer of all free genes is completed. Next thing to do is to fill these empty genes with all the remaining pairs in the pool.

To illustrate this method, here are a few genes around our crossover point of Parent A and Parent B.


Parent A
..., 110, 34, 192 | 74, 1, 65, ...
Parent B
..., 55, 67, 204 | 34, 147, 104, ...

From this, we will construct Child A. First we will transfer all the first 87 genes of Parent A into Child A, then, we fill it with genes from Parent B. When we come to the point of transferring the 88th gene from Parent B, we notice that Parent A already transferred the value 34, so we leave the 88th gene of Child A empty. We continue until we reach the tail of the string.

In a second pass, all we need to do is fill the gaps with the remaining indices that haven't been used yet.

The mutation function

The mutation function is quite simple here. The only thing it does is swap two genes within the same string.

The main loop

Prior to the main loop, we must first initialize our population. As stated earlier, we decided to use a population of 100 strings, so we must create 100 strings composed of a randomized list of indices from 0 to 255. Then, each iteration of our main loop represents a new generation. Every generation creates a new population of strings and destroys the old one. On every generation, we first calculate the fitness of every string. Using the random wheel selection, we pick two parents. We then check if crossover occurs (it should occur 70 percent of the time). If it occurs, we cross the parents to create two new child, if not, we simply copy the parents into their child without modifications. With a very low probability, say 1 percent, we apply mutation. You can play with these probabilities to see the algorithm evolve from a stiffer algorithm to a more random algorithm.

This way, the global fitness of the population should increase after each iteration, towards a unique shortest path. Since we don't know at the start what this shortest path will be, we should loop until the best fitness stalls for n iterations.

Conclusion

In this article, we showed how easy it could be to solve real problems with Genetic Algorithms. Just for fun, you could try to compare the performances you're getting with this algorithm with a brute force method. I hope the results will be convincing enough.

Send me any question, comment or suggestion to eric.martel@tricktronic.com

Remember you can visit the Message Store to discuss this tutorial. Comments are always welcome!. There are already replies in the thread, why not join in?