A TSP Application of the "Wolfpack"
Genetic Algorithm
It Ain't Always So Easy
Some types of "mathematically" framed problems
are not amenable to solution via direct calculation. In such cases, an
algorithmic or "heuristic" approach may be needed to find the solution – or,
at least, a solution arbitrarily close to the "best" one. This is particularly
true for the class of problems for which the result of an iterative step or an
interim solution can be readily evaluated for "how good it is". The procedural
result may then be plowed back into the procedure, leading to an even better
result. The optimum solution for these kind of problems can often be
considered as the "lowest energy state" of all possible solutions.
One well-known example of the above is the
"Traveling Saleman’s Problem" (TSP). How might a salesman go about finding the
best travel route to visit all his customers, who do business in different
cities? In this case, the ideal route -- that is, the most fuel-efficient or
"lowest energy" one -- is the shortest circuit that lets the salesman visit
each unique customer (city) once, while ending up back at his starting point
or "home". Upon reflection, you will find that the problem, while easy to
state, is not solvable by strength of pure calculation alone.
Mathematicians refer to these sorts of
problems as combinatorial optimization problems. (In formal terms, they
call the TSP problem one of the class of combinatorial optimization problems
known as "NP-complete".) The challenge has been to find an efficient algorithm
-- i.e., an algorithm that will be guaranteed to find the optimal solution in
a defined number of steps -- for the problem. To date, such an algorithm has
not been found.
While by no means the only way solve a TSP,
the "genetic algorithm" (GA) approach offers one particularly good way to
tackle it. In its operation, the GA mimics biological genetics and
incorporates the Darwinian "survival of the fittest" principle. It is
particularly gratifying to use it in conjunction with some simple PC screen
graphics, in order to appreciate the physical dynamism of the procedure as it
unfolds.
String ‘Em Up
In using a GA for the TSP problem, a "chromosome"
is first established to encode the set of cities comprising the salesman’s
territory, represented by a set of integers from 1 to the Nth city. Looking
deeper, the chromosome is made up of a string containing N genes, each gene
within the chromosome tagged with a unique city number. The genes are randomly
disbursed along the chromosome string, in no particular order. It is only
necessary that there are exactly N genes (equaling the number of cities in the
circuit), and that each city number is represented by one and only one gene in
the string. So for example, a typical set of genes for a circuit of, say, 8
cities in size, might be encoded as:
g1=4, g2=1, g3=7,
g4=5, g5=3, g6=8 g7=2 g8=6
In the above case, the city trip circuit would
be: city 4 to city 1 to city 7 to city 5 to city 3 to city 8 to city 2 to city 6, with the last
"home" leg looping back to the starting city in the circuit: city 6
to city 4.
Are We There Yet? Are We There Yet?
We must establish the absolute locations of
the cities in order to determine the resultant circuit length. For simplicity,
each city is given a Cartesian x and y map coordinate, and the distance
between one city and the next may be calculated as the (Euclidian) distance
between the points. This is the most foreboding -- and, in fact, just about the
only -- math involved in this article. So if you can get past it, you’ve got
it made. Otherwise, take my word for it:
City-to-City Distance = SQR((x2-x1)2+(y2-y1)2)
The total circuit distance is the sum of the
distances between the city represented by the first gene number (city1
ID) and the next gene number (city2 ID) along the string, and so on
in sequence through the length of the chromosome. The final leg represents the
distance between the cities encoded by the last gene in the chromosome and the
first one. In this manner, the length of a complete circuit many be evaluated
for its "fitness" – i.e., the shorter the distance, the fitter the solution.
No Wolf Is An Island
Since genetics works through reproduction – and since solitary asexual
reproduction is close to being the most boring thing ever invented -- we will
establish a population of sexually active individuals to enrich our world. In
the Wolfpack GA, we use a fairly small population of 50 individuals. Each
individual holds a chromosome containing the set of unique city numbers
encoded in its genes. The order of the gene (city number) sequence is
different in each individual when they are initially created – the sequences
are randomly established, just as in our first individual. The total pack
population will stay constant at 50 individuals at all times.
The Alpha
In a real wolfpack, there is normally only one male who breeds: the "Alpha"
male, who is recognized as the strongest and fittest individual within the
pack. So it is in the Wolfpack GA, and whence its name. At the start of each
GA "run cycle", an evaluation is made to determine who the Alpha breeder will
be, based on the individual who presently carries the shortest sequential gene
city-circuit length.
Sexual Congress (Note: This Section
May Not Be Suitable For Squeamish Children -- or Adults)
There are two ways of viewing the reproductive
act in the Wolfpack GA.
First, we could be delicate and say that,
unlike real biological reproduction, we do not produce actual offspring.
Rather, the Alpha breeder inserts part -- a sequential gene string "snippet"
-- of his own genetic material into the chromosomes of each of the other pack
members, thus "transforming" the city circuits encoded by their gene
sequences.
A second (and less wholesome) way to view this
is that offspring do indeed occur in each breeding cycle -- but that their
mothers always die at childbirth. Remember, the wolfpack population remains
constant at all times! Being a sensitive person who gets a bit teary at
funerals, I rather prefer the "transformation" explanation for this protocol.
The sequential gene snippet contributed by the
Alpha breeder may be anywhere from 1 gene long to 20% of his entire
chromosome’s length long. It may be inserted anywhere along the chromosome of
the receiving individual – a randomly selected "crossover" point. In doing the
snippet insertion, suitable care is taken to ensure that the resulting
transformed chromosome still contains a set of N-unique city numbers.
Therefore, it is necessary to purge any redundant genes that may have been
produced during the snippet insertion (i.e., ones with duplicate city
numbers), and to move any displaced unique ones to a different spot in the
chromosome. If the gene snippet reaches the end of the receiving individual’s
chromosome before it is fully inserted, it loops around and finishes inserting
itself at the beginning of the receiver’s chromosome.
There is no reciprocation during gene
transfer. The Alpha’s chromosome remains unaffected and pure at all times
during the breeding. Note that there is no gender distinction among the
members of the pack. All are transformed by the Alpha during the
breeding cycle.
Crank Her Up!
OK, we have the population established, each individual with its own trip
circuit encoded in its genes. And we have the protocol for sexual congress
established. Now it becomes a simple matter of letting these critters have at
it! Prior to the first breeding cycle, all individuals are evaluated for the
fitness of their genetically encoded trip circuit, and the best – the Alpha –
is selected. The Alpha then does his thing, transforming all pack members’
chromosomes using portions of his own genetic material. The resultant
population is evaluated for (hopefully) a new and better Alpha, whereupon the
cycle repeats -- and repeats and repeats.
Come Together, Right Now, Over Me
After a sufficient number of breeding ("transforming") cycles by the Alpha
individual, it is easy to see how the chromosomes of all pack members will
become more and more similar to the Alpha’s. Given enough time, genetic
stagnation will surely occur – with no further improvements to the circuit
lengths represented by any of the population’s gene strings. Everybody will
get more fit early on, but not necessarily "fittest". Eventually, everybody
will look the same, genetically. We have to figure out ways to keep this "Lake
Woebegone" effect from happening.
The Red-Headed Stepchild
As in a real wolfpack, sometimes a
"less-than-Alpha" individual may sneakily connive to achieve sex with the
other members of the pack. We use this concept in our Wolfpack GA. This
happens only extremely infrequently. Of course, this "wildcard" breeder will
never attempt to transform the Alpha’s gene string – after all, his illicit
breeding must be done carefully and secretly!
Wildcard breeding is one way that the Wolfpack
GA ensures that things are "mixed up" occasionally (and quite dramatically),
and that the genetic makeup of the pack remains diverse.
Gotta Love Those Transcription
Errors
But to guarantee that we maintain genetic
diversity among such a small population, it is not enough to have a wildcard
breeder running loose occasionally. After all, suppose the wildcard guy turns
out to be just a transformed clone of the Alpha? There must be another source
of random and somewhat unpredictable change always occurring in the
population’s gene sequences, at a low level. Otherwise -- even though the pack
may take pride that its Alpha has a very good overall trip circuit length --
there may well be better trip circuits out there that will never be
discovered, owing to the pack’s genetic stagnation.
We introduce essential variability by means of
occasional chromosome transcription errors – or, if you like, "mutations" --
within the gene string. These always occur immediately following
breeding transformations. A transcription error, when it occurs, may be any
one of several types:
·
Randomly interchange 2 adjacent genes in the gene string
·
Randomly interchange 2 genes from anywhere in the gene string
·
Randomly interchange 2 adjacent gene pairs from anywhere in the gene
string
·
Randomly interchange 2 adjacent pairs of genes
·
Shift a random number of adjacent genes along the length of the chromosome
·
Invert a random-length section of genes
The chromosome of the reigning Alpha individual
is naturally immune to any transcription errors or mutations – since, you will
recall, it is never transformed or altered in any way during the breeding
cycle. An Alpha can never get worse than it was – unless it gets bumped off
its throne by a new Alpha.
Say -- You Look Just Like Me!
Maintaining control over genetic diversity is immensely important to
achieving fittest GA solutions. When there is ample chromosomal diversity
exhibited in the population, transcription errors are minimized to operate at
a very low level. Said another way, the threshold for a mutation to occur is
high (RND>0.999) for every bred individual. However, as more and more of the
most fittest individuals in the pack become exactly as good as the Alpha, the
threshold level for transcription error occurrence is reduced -- making it
much more likely that random genetic changes occur. With a total of 50
individuals in the Wolfpack GA, this "stagnation" point is set at the top 6
individuals. That is, whenever the top 6 members of the pack are seen to have
identical genes, the transcription error threshold drops suddenly and rapidly
– and thus transcription errors occur much more frequently. It then begins to
climb back up to 0.999 -- but at a much slower rate than it had when it fell.
Is This Really Better Than Just Rolling The
Dice?
A touch of random intervention (expressed as
transcription errors or mutation) is an essential component of any effective
GA. But that begs the question as to whether simple randomization might do the
whole job as well or better, leaving out the genetic manipulation piece
entirely. Be assured that there is absolutely no doubt that the GA does
a better and quicker job at achieving a solution to the TSP than you would get
by simply randomly mixing up all the gene sequences a bunch of times in the
hopes of getting the best one. In point of fact, I think that the total number
of possible trip circuits in the TSP problem can be figured as the factorial
of N (N! or, for a 50-city circuit, 50 x 49 x 48 x 47 x 46 x … etc.). So a
"bunch of times" can easily equate to more dice rolls (even computer-aided)
than you could achieve within your lifetime.
For yucks, let’s figure the possible trips
involved in a 50-city circuit to see what I mean. I’ll go ahead and figure it
as 50!. By my old rusty Canon F-44 scientific calculator, that amounts to more
than 3 X 1064 possible unique trip routes. That is about equal to
the estimated number of atoms in the universe. Say you have one of the best
supercomputers in the world and can do 100 trillion (1014) rolls of
the dice and evaluate them every second – a torrid, 100 teraflop pace. Even
assuming every random run produces a unique trip, you will need about 1043
years before you will have run through and tested them all. Since the age of
the universe today is only about 1010 years – well, I’ll keep the
light on for you.
Of course, you could get lucky and hit it on
the first run.
As Far As Eye Can See
I like to envision the "playing field" for the
TSP as a huge (really huge!) checkerboard grid, with each square being one of
the possible trip circuits. Now since we are talking about low-energy
solutions, I make our playing grid dimpled, so that squares representing
better solutions are depressed into valleys. The depth of any local valley
represents the fitness of the solution. Somewhere in all these myriad
checkerboard squares is one and only one that is deeper than all the rest. I see
the Wolfpack GA Alpha represented as a steel ball bearing, rolling around the
surface, seeking deep dens. As each generation produces a better Alpha, its
ball is popped out of a local valley and finds a better, deeper one.
Meanwhile, the "Alpha-transformed" individuals
are also out exploring the neighborhood. I reckon them as littler ball
bearings pinging and ponging about, almost always landing on a flatter square
but quickly getting pulled back toward the deep "nest" dimple by the genetic
actions of the Alpha male. They normally can’t stray very far from home. After
all, it is a big, big world-grid we are talking about here.
The downside of the GA method is that you
can’t ever be sure that you’ve got the best solution. Your Alpha may
exhibit a TSP trip route that looks mighty fit, mighty fine. But is there a
better one out there, lurking undiscovered? Despite all your efforts at
maintaining good genetic diversity (i.e., introducing drivers for genetic
change and thus potential improvements), your Alpha may in fact be stuck in a
deep local hole – with a slightly better, deeper one just over the horizon.
And the deeper the hole it’s in, the harder it is for the GA to "pop" the ball
out into a better one.
Green Grass and High Times –
Forever?
We have to find a way to search the nearby
fitness grid neighborhood just beyond our view, there to find potentially
better solutions. Consequently, we will add a twist and some new information.
The Wolfpack GA actually starts off with two distinctly separate
populations, each generated randomly and uniquely at the start of the
procedure. One of the populations is designated as the "primary" one, and the
other as "secondary". If, during the course of their separate genetic
transformations, the secondary population produces a better Alpha than the
reigning primary population’s Alpha, the secondary population immediately
takes over the primary one and displaces all its original members. The
secondary population’s Alpha is then reconstituted with 2/3 of the new primary
Alpha’s gene sequence intact, and 1/3 of it scrambled randomly. The balance of
the secondary starting population is given 100% randomly constituted
chromosomes. In this way, we continue to have two distinct, competing
populations, living and breeding in neighboring TSP solution spaces, both
searching and striving for ultimate perfection.
Moreover, if the secondary population does not
produce a better Alpha within a goodly long period of opportunity (say, within
several thousands of breeding cycles), the primary population will spinoff its
own new secondary population to replace the one that is not being successful.
This will be done in the same manner as noted above, with the new secondary
Alpha having 2/3 of the primary Alpha’s gene sequence intact, but 1/3 of it
scrambled. Likewise, the remaining secondary population’s members will be 100%
randomly constituted. Think of this as sort of like a milkweed, casting out a
seed to land and take root at some distance – but not too awfully far
-- from the mother plant.
Hi, I’m From The Pack That Lives
Next Door…
In yet another embodiment of the Wolfpack GA,
there is some continual communication going on between the two competing
populations. In this version, the secondary population’s Alpha is cloned into
the primary population at every cycle. It replaces the 10th-best
member of the primary population. (This placement is necessary so as not to
interfere with the program routine that evaluates genetic stagnation among the
top 6 members.) While the primary population does not often "use" this
contribution, it does come in handy when both population Alphas are very close
in their fitness scores, and where a subtle improvement can possibly occur via
the normal transformation protocol of the primary population’s Alpha on the
"visiting" member. When successful, its effect is similar to the secondary
population "taking over" the primary one – but in this case, the new primary
Alpha remains a little bit better than the secondary one.
Here’s Looking At Me
So, what do I see (and imagine) when I run
this GA? The answer: Lots of things that I see when I reflect on my own life.
-
I
see the pace of increasing fitness being quite rapid initially, when the TSP
"solution field" – where the Alpha "ball bearing" is rolling around -- is
fairly flat. When there are only bad solutions extant, there are many more
opportunities for improvement. And those opportunities are grabbed up easily
and quickly. The "solution ball" is propelled and kicked around easily from
the shallow dimples to slightly deeper ones. Moral:
You do your best thinking when you’re down and
out…
-
Eventually, when the tentative solution becomes very good, it can take a
long, long time for the random transcription errors used here to pop
the solution ball out of its deep dimple. The Wolfpack population will cling
tightly to a good -- but not optimal -- fitness solution. Moral:
It’s so easy to rest on your laurels…
-
Optimal solutions to the TSP are always described by "open" figures – that
is, there are never any criss-crossing city-to-city route lines whatsoever.
It is interesting to watch the progressive GA solutions "unwrap" themselves
into these open figures. I do not know intuitively why open-perimeter
figures are best. Thinking in terms of "low-energy" models, it is as if you
stuck a pin in each of the cities on the map, then took a loop of
shrink-wrap plastic and shrunk it down so that the plastic wrapped tightly
around all the pins. Or, alternately, say you took a small rubber band
placed somewhere on the interior of the pin-studded map and stretched it
outwards against all the pins in a way that produced the lowest net tensile
force on the band. If you had a 3-dimensional set of points, I suppose the
corresponding model would be an inflated balloon, constrained by all the
city-vertices in such a way that the balloon had the lowest internal volume
possible. Of course, it is not as simple as I make it, since there are many
different open-figure perimeters possible in any given set of city points. I
am also thinking of fluid-spreading analogies. Moral: Let go and
try not to complicate your life.
The best path is the simplest one…if you can find it!
-
An
improvement to a fitness solution – the replacement of one Alpha by another,
better individual – is often is followed by yet another improvement in rapid
succession. Evolution seems to happen in spurts. This is true even when
improvements are few and far between, and when there is a long period where
no improvements have occurred. Moral:
We ought to quickly recognize, champion and
improve upon the novel, successful ideas of others.
-
All of the transcription error modes defined above are gainfully employed by
the GA. Some are used more often than others, depending on the state and
"orderliness" of the suboptimal solutions. But all are employed at one time
or another to improve the population’s fitness. I expect that if additional
change modalities were defined, they would be utilized as well. Moral:
Be open to all different kinds of
change.
-
How can such a crude representation of Nature embodied in these kinds of GAs
work so well at solving intractable problems? Does this simple algorithm
actually reflect in some fashion the unseen and, as yet, unknown mechanisms
of Nature? I have read that the fact of life on this planet is a product of
states and balances that are improbable to the extreme. Yet, a billion years
have passed since the inception of life here. The fossil record attests that
life has been nearly extinguished on this planet on several occasions in the
past. Even so, we continue to thrive and remain diverse. There must be a
very strong life-protecting principle at work within Gaia, our Mother World
-- more so than we can ever contemplate. Moral:
I shall not fear, for Thou art with me.
Below is an image of the runtime screen for
the Wolfpack TSP algorithm. It is honestly a complete coincidence that the
shortest route for this "standard" 60-city set, that I obtained from an
external site, looks kinda like -- a wolf!