Home • Essays • Lost Articles • Loose Ends • Collections • Computing • Projects • Widdershins • Quotations • Links • Us

 

Genetic Algorithms

A TSP Application of the "Wolfpack" Genetic Algorithm

It Ain't Always So Easy

Some types of "mathematically" framed problems are not amenable to solution via direct calculation. In such cases, an algorithmic or "heuristic" approach may be needed to find the solution – or, at least, a solution arbitrarily close to the "best" one. This is particularly true for the class of problems for which the result of an iterative step or an interim solution can be readily evaluated for "how good it is". The procedural result may then be plowed back into the procedure, leading to an even better result. The optimum solution for these kind of problems can often be considered as the "lowest energy state" of all possible solutions.

One well-known example of the above is the "Traveling Saleman’s Problem" (TSP). How might a salesman go about finding the best travel route to visit all his customers, who do business in different cities? In this case, the ideal route -- that is, the most fuel-efficient or "lowest energy" one -- is the shortest circuit that lets the salesman visit each unique customer (city) once, while ending up back at his starting point or "home". Upon reflection, you will find that the problem, while easy to state, is not solvable by strength of pure calculation alone.

Mathematicians refer to these sorts of problems as combinatorial optimization problems. (In formal terms, they call the TSP problem one of the class of combinatorial optimization problems known as "NP-complete".) The challenge has been to find an efficient algorithm -- i.e., an algorithm that will be guaranteed to find the optimal solution in a defined number of steps -- for the problem. To date, such an algorithm has not been found.

While by no means the only way solve a TSP, the "genetic algorithm" (GA) approach offers one particularly good way to tackle it. In its operation, the GA mimics biological genetics and incorporates the Darwinian "survival of the fittest" principle. It is particularly gratifying to use it in conjunction with some simple PC screen graphics, in order to appreciate the physical dynamism of the procedure as it unfolds.

String ‘Em Up

In using a GA for the TSP problem, a "chromosome" is first established to encode the set of cities comprising the salesman’s territory, represented by a set of integers from 1 to the Nth city. Looking deeper, the chromosome is made up of a string containing N genes, each gene within the chromosome tagged with a unique city number. The genes are randomly disbursed along the chromosome string, in no particular order. It is only necessary that there are exactly N genes (equaling the number of cities in the circuit), and that each city number is represented by one and only one gene in the string. So for example, a typical set of genes for a circuit of, say, 8 cities in size, might be encoded as:

g1=4, g2=1, g3=7, g4=5, g5=3, g6=8 g7=2 g8=6

In the above case, the city trip circuit would be: city 4 to city 1 to city 7 to city 5 to city 3 to city 8 to city 2 to city 6, with the last "home" leg looping back to the starting city in the circuit: city 6 to city 4.

Are We There Yet? Are We There Yet?

We must establish the absolute locations of the cities in order to determine the resultant circuit length. For simplicity, each city is given a Cartesian x and y map coordinate, and the distance between one city and the next may be calculated as the (Euclidian) distance between the points. This is the most foreboding -- and, in fact, just about the only -- math involved in this article. So if you can get past it, you’ve got it made. Otherwise, take my word for it:

City-to-City Distance = SQR((x2-x1)2+(y2-y1)2)

The total circuit distance is the sum of the distances between the city represented by the first gene number (city1 ID) and the next gene number (city2 ID) along the string, and so on in sequence through the length of the chromosome. The final leg represents the distance between the cities encoded by the last gene in the chromosome and the first one. In this manner, the length of a complete circuit many be evaluated for its "fitness" – i.e., the shorter the distance, the fitter the solution.

No Wolf Is An Island

Since genetics works through reproduction – and since solitary asexual reproduction is close to being the most boring thing ever invented -- we will establish a population of sexually active individuals to enrich our world. In the Wolfpack GA, we use a fairly small population of 50 individuals. Each individual holds a chromosome containing the set of unique city numbers encoded in its genes. The order of the gene (city number) sequence is different in each individual when they are initially created – the sequences are randomly established, just as in our first individual. The total pack population will stay constant at 50 individuals at all times.

The Alpha

In a real wolfpack, there is normally only one male who breeds: the "Alpha" male, who is recognized as the strongest and fittest individual within the pack. So it is in the Wolfpack GA, and whence its name. At the start of each GA "run cycle", an evaluation is made to determine who the Alpha breeder will be, based on the individual who presently carries the shortest sequential gene city-circuit length.

Sexual Congress (Note: This Section May Not Be Suitable For Squeamish Children -- or Adults)

There are two ways of viewing the reproductive act in the Wolfpack GA.

First, we could be delicate and say that, unlike real biological reproduction, we do not produce actual offspring. Rather, the Alpha breeder inserts part -- a sequential gene string "snippet" -- of his own genetic material into the chromosomes of each of the other pack members, thus "transforming" the city circuits encoded by their gene sequences.

A second (and less wholesome) way to view this is that offspring do indeed occur in each breeding cycle -- but that their mothers always die at childbirth. Remember, the wolfpack population remains constant at all times! Being a sensitive person who gets a bit teary at funerals, I rather prefer the "transformation" explanation for this protocol.

The sequential gene snippet contributed by the Alpha breeder may be anywhere from 1 gene long to 20% of his entire chromosome’s length long. It may be inserted anywhere along the chromosome of the receiving individual – a randomly selected "crossover" point. In doing the snippet insertion, suitable care is taken to ensure that the resulting transformed chromosome still contains a set of N-unique city numbers. Therefore, it is necessary to purge any redundant genes that may have been produced during the snippet insertion (i.e., ones with duplicate city numbers), and to move any displaced unique ones to a different spot in the chromosome. If the gene snippet reaches the end of the receiving individual’s chromosome before it is fully inserted, it loops around and finishes inserting itself at the beginning of the receiver’s chromosome.

There is no reciprocation during gene transfer. The Alpha’s chromosome remains unaffected and pure at all times during the breeding. Note that there is no gender distinction among the members of the pack. All are transformed by the Alpha during the breeding cycle.

Crank Her Up!

OK, we have the population established, each individual with its own trip circuit encoded in its genes. And we have the protocol for sexual congress established. Now it becomes a simple matter of letting these critters have at it! Prior to the first breeding cycle, all individuals are evaluated for the fitness of their genetically encoded trip circuit, and the best – the Alpha – is selected. The Alpha then does his thing, transforming all pack members’ chromosomes using portions of his own genetic material. The resultant population is evaluated for (hopefully) a new and better Alpha, whereupon the cycle repeats -- and repeats and repeats.

Come Together, Right Now, Over Me

After a sufficient number of breeding ("transforming") cycles by the Alpha individual, it is easy to see how the chromosomes of all pack members will become more and more similar to the Alpha’s. Given enough time, genetic stagnation will surely occur – with no further improvements to the circuit lengths represented by any of the population’s gene strings. Everybody will get more fit early on, but not necessarily "fittest". Eventually, everybody will look the same, genetically. We have to figure out ways to keep this "Lake Woebegone" effect from happening.

The Red-Headed Stepchild

As in a real wolfpack, sometimes a "less-than-Alpha" individual may sneakily connive to achieve sex with the other members of the pack. We use this concept in our Wolfpack GA. This happens only extremely infrequently. Of course, this "wildcard" breeder will never attempt to transform the Alpha’s gene string – after all, his illicit breeding must be done carefully and secretly!

Wildcard breeding is one way that the Wolfpack GA ensures that things are "mixed up" occasionally (and quite dramatically), and that the genetic makeup of the pack remains diverse.

Gotta Love Those Transcription Errors

But to guarantee that we maintain genetic diversity among such a small population, it is not enough to have a wildcard breeder running loose occasionally. After all, suppose the wildcard guy turns out to be just a transformed clone of the Alpha? There must be another source of random and somewhat unpredictable change always occurring in the population’s gene sequences, at a low level. Otherwise -- even though the pack may take pride that its Alpha has a very good overall trip circuit length -- there may well be better trip circuits out there that will never be discovered, owing to the pack’s genetic stagnation.

We introduce essential variability by means of occasional chromosome transcription errors – or, if you like, "mutations" -- within the gene string. These always occur immediately following breeding transformations. A transcription error, when it occurs, may be any one of several types:

· Randomly interchange 2 adjacent genes in the gene string

· Randomly interchange 2 genes from anywhere in the gene string

· Randomly interchange 2 adjacent gene pairs from anywhere in the gene string

· Randomly interchange 2 adjacent pairs of genes

· Shift a random number of adjacent genes along the length of the chromosome

· Invert a random-length section of genes

The chromosome of the reigning Alpha individual is naturally immune to any transcription errors or mutations – since, you will recall, it is never transformed or altered in any way during the breeding cycle. An Alpha can never get worse than it was – unless it gets bumped off its throne by a new Alpha.

Say -- You Look Just Like Me!

Maintaining control over genetic diversity is immensely important to achieving fittest GA solutions. When there is ample chromosomal diversity exhibited in the population, transcription errors are minimized to operate at a very low level. Said another way, the threshold for a mutation to occur is high (RND>0.999) for every bred individual. However, as more and more of the most fittest individuals in the pack become exactly as good as the Alpha, the threshold level for transcription error occurrence is reduced -- making it much more likely that random genetic changes occur. With a total of 50 individuals in the Wolfpack GA, this "stagnation" point is set at the top 6 individuals. That is, whenever the top 6 members of the pack are seen to have identical genes, the transcription error threshold drops suddenly and rapidly – and thus transcription errors occur much more frequently. It then begins to climb back up to 0.999 -- but at a much slower rate than it had when it fell.

Is This Really Better Than Just Rolling The Dice?

A touch of random intervention (expressed as transcription errors or mutation) is an essential component of any effective GA. But that begs the question as to whether simple randomization might do the whole job as well or better, leaving out the genetic manipulation piece entirely. Be assured that there is absolutely no doubt that the GA does a better and quicker job at achieving a solution to the TSP than you would get by simply randomly mixing up all the gene sequences a bunch of times in the hopes of getting the best one. In point of fact, I think that the total number of possible trip circuits in the TSP problem can be figured as the factorial of N (N! or, for a 50-city circuit, 50 x 49 x 48 x 47 x 46 x … etc.). So a "bunch of times" can easily equate to more dice rolls (even computer-aided) than you could achieve within your lifetime.

For yucks, let’s figure the possible trips involved in a 50-city circuit to see what I mean. I’ll go ahead and figure it as 50!. By my old rusty Canon F-44 scientific calculator, that amounts to more than 3 X 1064 possible unique trip routes. That is about equal to the estimated number of atoms in the universe. Say you have one of the best supercomputers in the world and can do 100 trillion (1014) rolls of the dice and evaluate them every second – a torrid, 100 teraflop pace. Even assuming every random run produces a unique trip, you will need about 1043 years before you will have run through and tested them all. Since the age of the universe today is only about 1010 years – well, I’ll keep the light on for you.

Of course, you could get lucky and hit it on the first run.

As Far As Eye Can See

I like to envision the "playing field" for the TSP as a huge (really huge!) checkerboard grid, with each square being one of the possible trip circuits. Now since we are talking about low-energy solutions, I make our playing grid dimpled, so that squares representing better solutions are depressed into valleys. The depth of any local valley represents the fitness of the solution. Somewhere in all these myriad checkerboard squares is one and only one that is deeper than all the rest. I see the Wolfpack GA Alpha represented as a steel ball bearing, rolling around the surface, seeking deep dens. As each generation produces a better Alpha, its ball is popped out of a local valley and finds a better, deeper one.

Meanwhile, the "Alpha-transformed" individuals are also out exploring the neighborhood. I reckon them as littler ball bearings pinging and ponging about, almost always landing on a flatter square but quickly getting pulled back toward the deep "nest" dimple by the genetic actions of the Alpha male. They normally can’t stray very far from home. After all, it is a big, big world-grid we are talking about here.

The downside of the GA method is that you can’t ever be sure that you’ve got the best solution. Your Alpha may exhibit a TSP trip route that looks mighty fit, mighty fine. But is there a better one out there, lurking undiscovered? Despite all your efforts at maintaining good genetic diversity (i.e., introducing drivers for genetic change and thus potential improvements), your Alpha may in fact be stuck in a deep local hole – with a slightly better, deeper one just over the horizon. And the deeper the hole it’s in, the harder it is for the GA to "pop" the ball out into a better one.

Green Grass and High Times – Forever?

We have to find a way to search the nearby fitness grid neighborhood just beyond our view, there to find potentially better solutions. Consequently, we will add a twist and some new information. The Wolfpack GA actually starts off with two distinctly separate populations, each generated randomly and uniquely at the start of the procedure. One of the populations is designated as the "primary" one, and the other as "secondary". If, during the course of their separate genetic transformations, the secondary population produces a better Alpha than the reigning primary population’s Alpha, the secondary population immediately takes over the primary one and displaces all its original members. The secondary population’s Alpha is then reconstituted with 2/3 of the new primary Alpha’s gene sequence intact, and 1/3 of it scrambled randomly. The balance of the secondary starting population is given 100% randomly constituted chromosomes. In this way, we continue to have two distinct, competing populations, living and breeding in neighboring TSP solution spaces, both searching and striving for ultimate perfection.

Moreover, if the secondary population does not produce a better Alpha within a goodly long period of opportunity (say, within several thousands of breeding cycles), the primary population will spinoff its own new secondary population to replace the one that is not being successful. This will be done in the same manner as noted above, with the new secondary Alpha having 2/3 of the primary Alpha’s gene sequence intact, but 1/3 of it scrambled. Likewise, the remaining secondary population’s members will be 100% randomly constituted. Think of this as sort of like a milkweed, casting out a seed to land and take root at some distance – but not too awfully far -- from the mother plant.

Hi, I’m From The Pack That Lives Next Door…

In yet another embodiment of the Wolfpack GA, there is some continual communication going on between the two competing populations. In this version, the secondary population’s Alpha is cloned into the primary population at every cycle. It replaces the 10th-best member of the primary population. (This placement is necessary so as not to interfere with the program routine that evaluates genetic stagnation among the top 6 members.) While the primary population does not often "use" this contribution, it does come in handy when both population Alphas are very close in their fitness scores, and where a subtle improvement can possibly occur via the normal transformation protocol of the primary population’s Alpha on the "visiting" member. When successful, its effect is similar to the secondary population "taking over" the primary one – but in this case, the new primary Alpha remains a little bit better than the secondary one.

Here’s Looking At Me

So, what do I see (and imagine) when I run this GA? The answer: Lots of things that I see when I reflect on my own life.

  • I see the pace of increasing fitness being quite rapid initially, when the TSP "solution field" – where the Alpha "ball bearing" is rolling around -- is fairly flat. When there are only bad solutions extant, there are many more opportunities for improvement. And those opportunities are grabbed up easily and quickly. The "solution ball" is propelled and kicked around easily from the shallow dimples to slightly deeper ones. Moral: You do your best thinking when you’re down and out…

  • Eventually, when the tentative solution becomes very good, it can take a long, long time for the random transcription errors used here to pop the solution ball out of its deep dimple. The Wolfpack population will cling tightly to a good -- but not optimal -- fitness solution. Moral: It’s so easy to rest on your laurels…

  • Optimal solutions to the TSP are always described by "open" figures – that is, there are never any criss-crossing city-to-city route lines whatsoever. It is interesting to watch the progressive GA solutions "unwrap" themselves into these open figures. I do not know intuitively why open-perimeter figures are best. Thinking in terms of "low-energy" models, it is as if you stuck a pin in each of the cities on the map, then took a loop of shrink-wrap plastic and shrunk it down so that the plastic wrapped tightly around all the pins. Or, alternately, say you took a small rubber band placed somewhere on the interior of the pin-studded map and stretched it outwards against all the pins in a way that produced the lowest net tensile force on the band. If you had a 3-dimensional set of points, I suppose the corresponding model would be an inflated balloon, constrained by all the city-vertices in such a way that the balloon had the lowest internal volume possible. Of course, it is not as simple as I make it, since there are many different open-figure perimeters possible in any given set of city points. I am also thinking of fluid-spreading analogies. Moral: Let go and try not to complicate your life. The best path is the simplest one…if you can find it!

  • An improvement to a fitness solution – the replacement of one Alpha by another, better individual – is often is followed by yet another improvement in rapid succession. Evolution seems to happen in spurts. This is true even when improvements are few and far between, and when there is a long period where no improvements have occurred. Moral: We ought to quickly recognize, champion and improve upon the novel, successful ideas of others.

  • All of the transcription error modes defined above are gainfully employed by the GA. Some are used more often than others, depending on the state and "orderliness" of the suboptimal solutions. But all are employed at one time or another to improve the population’s fitness. I expect that if additional change modalities were defined, they would be utilized as well. Moral: Be open to all different kinds of change.

  • How can such a crude representation of Nature embodied in these kinds of GAs work so well at solving intractable problems? Does this simple algorithm actually reflect in some fashion the unseen and, as yet, unknown mechanisms of Nature? I have read that the fact of life on this planet is a product of states and balances that are improbable to the extreme. Yet, a billion years have passed since the inception of life here. The fossil record attests that life has been nearly extinguished on this planet on several occasions in the past. Even so, we continue to thrive and remain diverse. There must be a very strong life-protecting principle at work within Gaia, our Mother World -- more so than we can ever contemplate. Moral: I shall not fear, for Thou art with me.

Below is an image of the runtime screen for the Wolfpack TSP algorithm. It is honestly a complete coincidence that the shortest route for this "standard" 60-city set, that I obtained from an external site, looks kinda like -- a wolf!

 

  Back to Recreational Computing...  

 

 

 

First-time visitors -- including you!

Free Web Counter

Free Hit Counter The Foggiest Notion The Foggiest Notion The Foggiest Notion The Foggiest Notion The Foggiest Notion

 

Luck Favors the Prepared Mind...

Essays • Lost Articles • Loose Ends • Collections • Computing • Projects • Widdershins • Quotations • Links • Us

Site contents Copyright 2004-2008 by Gary Cuba       Email: webmeister at thefoggiestnotion dot com