More on moving blocks

So yes­ter­day I spent a while try­ing to imple­ment an intu­itive error heuris­tic for Cargo-Bot configurations.

It’s an inter­est­ing exer­cise in test-driven design, and I don’t have the sense I got it done very well the first time, so I’m going to spend a lit­tle while explor­ing in more detail, in the spirit of a cod­ing kata. I can def­i­nitely use some more help with my test-driven dev chops.

I had a bit of trou­ble think­ing it through myself, when I got more for­mal. This might be sim­pler in your lan­guage, in your test­ing frame­work. But I was able to work it all the way through with Ruby/Cucumber yes­ter­day, though I con­fess I pushed the notion of “accep­tance test” pretty close towards unit test­ing in doing so.

If you need some hints, or you want to see how a behavior-driven approach ends up, then take a look at my sce­nar­ios for this one fea­ture (scor­ing dif­fer­ent arrange­ments).

Stacks of Crates

So the prob­lem, briefly, is that you have two dif­fer­ent arrange­ments of (the same) col­ored blocks in a fixed num­ber of stacks. We’ll call one arrange­ment target, since it’s your goal in solv­ing the puzzle-game; we’ll call the other the observed one.

There are many dis­tance met­rics one could con­sider using, but those all seem to per­mit unre­al­is­tic phys­i­cal manip­u­la­tions. For exam­ple, Lev­en­shtein dis­tance mea­sures and their deriv­a­tives, which are used to com­pare strings, per­mit “swaps” between blocks as if that were a context-independent oper­a­tion. You can weight swaps to “cost” more in those met­rics, but those costs have no sense of the dif­fer­ence of swap­ping two blocks sit­ting on top of the same stack, as opposed to two blocks each on the bot­tom of a stack of 50 other blocks.

How I decided to approach the prob­lem is to treat each block’s con­tri­bu­tion to the total score as inde­pen­dent. Yes, this “sim­pli­fi­ca­tion” itself intro­duces unre­al­is­tic con­se­quences almost as heinous as the ones in the Lev­en­shtein exam­ple I just gave, since when you swap two adja­cent blocks in my scheme you’re count­ing the num­ber of moves you use to replace the top block with the bot­tom block, and then also the num­ber of moves you make to replace the bot­tom block with the top block (ignor­ing the fact you could take them both on and put them both back at once).

At this point? I don’t care. I’m the decider.

Now tech­ni­cally what we’re about to do here is imple­ment a heuris­tic or a qua­si­met­ric, not a met­ric. It makes rea­son­able phys­i­cal sense; con­sider two dif­fer­ent arrange­ments of the same blocks:

A: [[:r, :r, :r, :b, :r]]
B: [[:b, :r, :r, :r, :r]]

To review, this nota­tion means that A and B are con­fig­u­ra­tion with a sin­gle stack, con­tain­ing five blocks. The :b block is higher on stack 1 of arrange­ment A, and sits on the bot­tom of stack 1 of arrange­ment B.

If A is the target and B is what you have on hand, then to trans­form B into A you will need to make these changes to fix each block (top to bottom):

  1. do noth­ing to fix the cor­rect top block
  2. unstack all 5 blocks to get the :b, then stack 3 sup­port blocks, then replace the :b
  3. do noth­ing to fix the cor­rect third block up
  4. do noth­ing to fix the cor­rect sec­ond block up
  5. unstack all 5 blocks, then add an :r to replace the bot­tom block

Notice some­thing else that hap­pens here, which I may not have said out loud before but which is almost cer­tainly impor­tant: I’ve added an extra “space” to the sys­tem, a “pool” where I can just shove things and retrieve them. In my men­tal model of blocks stacked on a table, the “pool” is the table top itself. Within the game Cargo-Bot there’s no such thing, and blocks can only live on stacks or be moved between them. I’m not play­ing Cargo-Bot here, I’m eval­u­at­ing how close one arrange­ment is to another in a phys­i­cally intu­itive way.

On the other hand, if B is the target and A is what you observe:

  1. do noth­ing to fix the top block
  2. take 2 blocks off the remove the incor­rect :b, then add back 1 :r to fix the sec­ond block down
  3. mid­dle is right
  4. sec­ond one up is right
  5. take five blocks off to remove the wrong :r, but you’ve already set the :b aside in the “pool” by doing that, so you can just add the :b with one move

As I count them, the result we’ll get say­ing A is target is 15 moves, and if we say B is target it’s 9 moves. And that’s OK, as far as I’m con­cerned. Grav­ity is a thing, order mat­ters, and not every­thing in the world has the sort of sym­me­try math­e­mati­cians presume.

What’s wrong with this block?

Notice what we just did, though: we eval­u­ated each block in turn, then added it all up. I don’t care if in “fix­ing” one block you acci­den­tally “fix” another. If we counted all those, we’d be in inte­ger pro­gram­ming and constraint-solving land, and I don’t need that headache now.

We’re count­ing each block as though every­thing else were incon­se­quen­tial, and adding the inde­pen­dent cleanup moves for that one box.

I hope you can tell from the exam­ple we just walked through, each crate or block or box (I find myself slip­ping with the word, so for­give me) falls into one of sev­eral sim­ple classes. Let’s treat them in turn.

I’ll make the notional “pool” vis­i­ble, even though the algo­rithm I ended up with didn’t bother imple­ment­ing it. Turns out you can make do with count­ing, once you’ve explored the possibilities.

The block is OK

If the block in the target is the same color as the block in the same posi­tion in the same stack of the observed arrange­ment, no cleanup is necessary.

  target: [[:r]]
observed: [[:r]]
    pool: {}

Score = 0 cleanup for that block.

The block is com­pletely missing

I know I said the block arrange­ments were the “same”, but I get para­noid when I’m test­ing, and as it hap­pens there is a strange sort of edge-case in Cargo-Bot where a block can appear to dis­ap­pear: when it’s left in the claw.

So in these cases, I just want to note, “Hey, something’s not right here!” so I slapped a penalty of 100 points for any block in the target which is com­pletely absent from the observed state.

  target: [[:r]]
observed: [[  ]]
    pool: {}

Score = 100 for that block.

The block is in another stack, maybe with blocks on it

Here’s an exam­ple, with two stacks:

  target: [[:r, :b], [:g]]
observed: [[:r], [:g, :b]]
    pool: {}

Let’s con­sider the :b block only. There is no block in the posi­tion 2 of stack 1 in the observed setup. So we have to go find it. Aha! there it is! How many moves does it take to get it in the right position?

It’s on top of stack 2, so we can just un-stack it, where I like to think of it end­ing up in a sort of notional “pool”:

  target: [[:r, :b], [:g]]
observed: [[:r], [:g]]
    pool: {:b}

It’s avail­able for use, so we can plop on stack 1 and be done.

What hap­pens if the :b is deeper, as in

  target: [[:r, :b], [:g]]
observed: [[:r], [:b, :g]]
    pool: {}

Well, we need that :b. So we unstack it, along with every­thing on top of it, tak­ing 2 moves…

  target: [[:r, :b], [:g]]
observed: [[:r], []]
    pool: {:b, :g}

and then we can replace it. Do we care about the extra :g? No, it doesn’t sig­nify for this par­tic­u­lar case. It had to be moved (so we count the removal), but it can sit there.

The block is wrong

  target: [[:r, :b], [:g]]
observed: [[:r, :g], [:b]]
    pool: {}

So, again think­ing only of the :b in our exam­ple, what do we do? First we can free up the :b and place it in the “pool”.

  target: [[:r, :b], [:g]]
observed: [[:r, :g], []]
    pool: {:b}

Then we need to remove the wrong block that’s in the way (also to the pool):

  target: [[:r, :b], [:g]]
observed: [[:r], []]
    pool: {:b, :g}

… and then we can move the cor­rect :b onto the first stack.

  target: [[:r, :b], [:g]]
observed: [[:r, :b], []]
    pool: {:g}

Stuff is in the way

What about things that are “in the way” of the block we’re gath­er­ing or the one we’re replac­ing? We move them too. Let’s fix the :b block here:

  target: [[:r, :b, :y], [:g, :y, :y, :y]]
observed: [[:r, :g, :y], [:b, :y, :y, :y]]
    pool: {}

... gather correct :b (4 moves)
  target: [[:r, :b, :y], [:g, :y, :y, :y]]
observed: [[:r, :g, :y], []]
    pool: {:b, :y, :y, :y}

... clear out incorrect :g (2 moves)
  target: [[:r, :b, :y], [:g, :y, :y, :y]]
observed: [[:r], []]
    pool: {:b, :y, :y, :y, :g, :y}

... place correct :b (1 move from pool)
  target: [[:r, :b, :y], [:g, :y, :y, :y]]
observed: [[:r, :b], []]
    pool: {:y, :y, :y, :g, :y}

So by now (espe­cially if you’re imple­ment­ing these as we go), you should be notic­ing some patterns:

  • to swap out a block from a stack, first shift it and all blocks above it into the pool
  • to obtain a replace­ment block, first shift it and all blocks above it into the pool

Which choice?

Sup­pose there’s more than one place to obtain a replace­ment block. Let’s look at the:b in stack 1

  target: [[:r, :y, :b], [:g, :y], [:b, :g]]
observed: [[:r, :y, :y], [:g, :b], [:b, :g]]
    pool: {}

There are two places we could get our replace­ment :b in the observed arrange­ment: from the top of stack 2, or the bot­tom of stack 3. In this case, the sim­pli­fy­ing rule is: Choose the one that requires the min­i­mum num­ber of steps.

What does that mean in prac­tice? Well, here’s how I see it: look at every stack in the observed arrange­ment and find the depth of the top­most (or in com­put­erese, “right­most”) replace­ment. You don’t actu­ally need to move any­thing, since the score you’re adding in is just the min­i­mum depth to retrieve the replacement.

In this sit­u­a­tion, the num­ber of blocks moved to retrieve the right­most :b in each of the three observed stacks would be (using Ruby nota­tion here) [nil, 1, 2]: there is no :b in stack 1, there’s one requir­ing 1 move to retrieve from stack 2, and one requir­ing 2 moves to retrieve from stack 3.

Float­ing blocks

What about the case where our target block is not only miss­ing, but lacks a place to stand? Let’s score good old :b again:

  target: [[:r, :r, :r, :b], []]
observed: [[:r], [:r, :b, :r]]
    pool: {}

Well, clearly he’ll take more moves than just the ones to retrieve the cor­rect :b: we’ll need to shore him up somehow.

... free up missing :b to pool (2 moves)
  target: [[:r, :r, :r, :b], []]
observed: [[:r], [:r]]
    pool: {:b, :r}

... free up enough support blocks (1 more move)
  target: [[:r, :r, :r, :b], []]
observed: [[:r], []]
    pool: {:b, :r, :r}

... fill in supports (2 moves)
  target: [[:r, :r, :r, :b], []]
observed: [[:r, :r, :r], []]
    pool: {:r}

... replace :b (1 move)
  target: [[:r, :r, :r, :b], []]
observed: [[:r, :r, :r, :b], []]
    pool: {}

This can be a bit tricky, but it’s not algo­rith­mi­cally too nasty: You know how many blocks you moved to the pool to free up the cor­rect block, and you know how many blocks you have to have in the pool to build up to the place where the cor­rect block needs to stand.

If you don’t have enough blocks in the pool to make that stack, even after you’ve retrieved the cor­rect color, you’ll need to get them some­where. They can be any color, so imag­ine them pop­ping off any other stacks in the arrangement.

Pre­sum­ing there is no dif­fer­ence in the num­ber of blocks in the two arrange­ments. I hon­estly don’t know what we’d do if there was.…

The one tricky case: same stack

OK, so there’s one class of sit­u­a­tions I’ve been avoid­ing. It’s the one that trips me up, frankly.

What about when the “replace­ment” block is in the same stack as the “wrong” one?

  target: [[:r, :r, :r, :r, :b, :r]]
observed: [[:b, :r, :r, :r, :r, :r]]
    pool: {}

OK, well let’s see how it goes when we try to score the :b:

... free up missing :b to pool (6 moves)
  target: [[:r, :r, :r, :r, :b, :r]]
observed: [[]]
    pool: {:b, :r, :r, :r, :r, :r}

... replace supports (4 moves)
  target: [[:r, :r, :r, :r, :b, :r]]
observed: [[:r, :r, :r, :r]]
    pool: {:b, :r}

... add :b (1 move)
  target: [[:r, :r, :r, :r, :b, :r]]
observed: [[:r, :r, :r, :r, :b]]
    pool: {:r}

Not too bad. The trick of course is that your algo­rithm may be assum­ing that the stacks are dif­fer­ent. Mine was, and that’s when this head-scratcher comes along:

... remove incorrect :r in position 5 (2 moves)
  target: [[:r, :r, :r, :r, :b, :r]]
observed: [[:b, :r, :r, :r]]
    pool: {:r, :r}

... free up :b (? moves)
  target: [[:r, :r, :r, :r, :b, :r]]
observed: [[:r, :r, :r, :r]]
    pool: {:b, :r}

Notice what hap­pened there? Depend­ing on whether we “free up cor­rect one” or “remove incor­rect one” first, the path-dependence of manip­u­lat­ing the same stack sud­denly matters.

Now I was try­ing, as I imple­mented this all, to avoid actu­ally sim­u­lat­ing the manip­u­la­tion of blocks. The lit­tle dia­grams I’ve been draw­ing for you have been help­ful ped­a­gogic sketches, but what I wanted (and still want) to do was just add up the points for maneu­vers. Up into this case, we’ve had no inter­ac­tion between the events on one stack and the events on another: To dig out a wrong one, move it and all the blocks above it; to dig out a right one, move it and all the blocks above it; to find sup­ports, grab more from some­where else as needed.

So my algo­rithm (which I tried ever so dili­gently to use TDD to design) takes the form of some con­di­tion­als and adding stuff up. Until we get here. Maybe your design was more sim­u­la­tory (if that’s a word?), and you actu­ally did more like the lit­tle dia­grams and shifted blocks around. I don’t know if that’s bet­ter or not.

So what to do?

Me, I came up with some­thing that works, but it’s unsat­is­fy­ing still.

How about you?

Sewing it together

Now I have one more story (assum­ing I can read my own code) that’s important.

I think we both see that the score for a stack should be the sum of the scores for each block in the target (com­pared to the observed), and that the score for an arrange­ment is the sum of the scores of each stack.

In work­ing through these sto­ries, though, I came across one more sit­u­a­tion of inter­est: What about a stack that’s empty in the target, but has junk in it in the observed arrange­ment? Given the rules I just sketched, since there’s noth­ing in the target stack we won’t actu­ally count any points for it, even though it feels wrong.

So one more rule for scor­ing a stack: If there are more boxes on the observed stack, we need to charge to remove each one.

  target: [[:r, :r, :r, :r, :r], []]
observed: [[], [:r, :r, :r, :r, :r]]
    pool: {}

Stack 1 gets scored by the rules we’ve already explored. But I also think stack 2 is both­er­some. How many moves would it take to remove each extra crate on stack 2, con­sid­ered indi­vid­u­ally? 1 move for the top one; 2 for the sec­ond; 3 for the third and so on.

So here’s another exam­ple, where I’m just look­ing at the two first stacks:

  target: [[:r, :r], …]
observed: [[:r, :r, :r, :r, :r], …]

There are three extra boxes on the observed stack, com­pared with the target height. So charge that 6 points (1+2+3).

Measuring the “error” in stacked colored crates

In rewrit­ing the first sec­tion of the book, I’m work­ing through the code required to evolve Cargo-Bot puz­zle solu­tions with genetic pro­gram­ming. The trick isn’t rep­re­sent­ing these puz­zle solu­tions, since they’re already rep­re­sented in a domain-specific language.

The tricky pro­gram­ming part is eval­u­at­ing how close the final arrange­ment of col­ored crates is to the target.

Now I’ve already poked around the var­i­ous for­mal dis­tance mea­sures you’d expect to find in a string-rewriting set­ting, and they’re all too gen­eral. We’re actu­ally talk­ing about a robot that picks crates up, and sets them down again on one another.

The rep­re­sen­ta­tion of a set of stacked crates, in my Ruby imple­men­ta­tion, is an Array of Arrays, with each crate rep­re­sented by a Sym­bol indi­cat­ing its color. So for instance [[:r, :b], [:b, :r], []] is a set of three loca­tions on which crates can be stacked, with the left­most posi­tion hold­ing a blue crate on a red one, and the mid­dle posi­tion hold­ing a red one on a blue one, and the right posi­tion empty.

To deter­mine whether (for exam­ple) [[:r, :b], [:b, :r], []] is “closer” to [[:b, :r], [:b, :r], []] or [[:r, :b], [:b], [:r]], I think we need to estab­lish a met­ric that takes into account the actual one-at-a-time move­ment of crates sit­ting on stacks. So “replac­ing” a crate from some other stack must, phys­i­cally, involve unstack­ing the wrong crate, and dig­ging out the cor­rect crate, and then replac­ing the bad one with the new good one.

So my sim­pli­fy­ing notion here is that we can deter­mine the cleanup_error for every crate in turn, and add them all up as if each were “wrong” or “right” inde­pen­dently; that is, as if we only had that par­tic­u­lar crate to “fix”.

For each crate in the tar­get arrange­ment (assum­ing all the crates are the same in both setups), if the observed arrangement:

  • … has the cor­rect crate in that posi­tion: score 0 error
  • … has the wrong color in that posi­tion: score the MINIMUM num­ber of crates needed to dig out the right col­ored crate from any stack, PLUS the num­ber of crates needed to remove the WRONG crate
  • … has no crate in that posi­tion: score the MINIMUM num­ber of crates needed to dig out the right col­ored crate from any stack, plus the num­ber of crates needed (if any) to sup­port the miss­ing crate

So for exam­ple, the cleanup_error for the :r crate when the tar­get is [[:r],[]] and the observed state is [[],[:r]] is 2: one point for shift­ing off the :r from the sec­ond stack, and one point for stick­ing it where it should be.

Or if the tar­get is [[:r], [:b, :b]] and the observed arrange­ment [[], [:r, :b, :b]], then the cleanup_error will be 4: three points for shift­ing out the :r from the bot­tom of the sec­ond stack, and another one to place it correctly.

Or if the tar­get is [[:y, :y, :y, :r], [:b, :b]] and the observed arrange­ment is [[], [:y, :y, :y, :r, :b, :b]], (to score the :r only) we’ve got to move three crates to dig it out, then stack three crates under it, then place it on top, for 7 points. To fur­ther sim­plify, let’s just assume we have a “pool” of extra crates to use for fill­ing in gaps under­neath float­ing crates.

And here’s the min­i­mum thing in action: If the tar­get is [[:r], [:r, :b, :b], [:g, :r, :g]], and the observed is [[], [:r, :r, :b, :b], [:g, :r, :g]], then the cleanup_error of the :r in the first stack is 3 (not 4), since it’s two steps to dig an :r out of the third stack, but three steps from the second.

Not too bad, huh?

How about this one: If the tar­get is [[:r, :r, :r, :b, :r]] and we have [[:b, :r, :r, :r, :r]], what is the cleanup_error of the :b crate?

Well, we need to remove 5 crates to free it up from the bot­tom of the stack, and place three under it, and then place it on top. So 9.

Agile folks: can you write this code in a sim­ple test-driven way?

What is GP?

The fol­low­ing is a draft of the intro­duc­tion from the book.

What is Genetic Programming?

I’ve noticed that when you look up “genetic pro­gram­ming” at Google and read the top hits, it often sounds as though the writer imag­ines you already know what he means by the phrase. After twenty years, here’s what I think: Nei­ther you nor they know what they mean by the phrase.

But then I’m not even sure I know.

I use the phrase, of course. “Genetic Pro­gram­ming.” “GP.” And I act as though I know what I mean. It’s what I do.

Let’s try some more research. It seems like maybe you have an Inter­net where you are, and your copy of Wikipedia isn’t bro­ken. Go see what they say about genetic pro­gram­ming there.

Come back when you’re done. I’ll be here.

OK, so as I read it — at least as of this writ­ing, and Wikipedia being what it is — “Genetic Pro­gram­ming” is some kind of computer-sciencey thing that does arti­fi­cial intel­li­gence with genes that con­nect ‘+’ signs and stuff in lit­tle trees. If you read closely, there’s some­thing about com­puter pro­grams that write them­selves auto­mat­i­cally. Plus there’s a lot of dif­fer­ent alter­na­tive approaches to it… what­ever it is. And based on the word­ing and the edit his­tory of the Wikipedia page some ways of doing it are clearly bet­ter than oth­ers… at some­thing… even I don’t quite under­stand what.

Also there’s muta­tion and crossover.

Yeah, that sounds tech­ni­cal enough, right? Can we agree to move ahead with that?

Ah, yes.… I thought not. Let me look around for a bit.

How about this? Here is a very good book I rec­om­mend to all my stu­dents: The Field Guide to Genetic Pro­gram­ming by Ric­cardo Poli, William B. Lang­don and Nic McPhee. It’s avail­able elec­tron­i­cally! You can read it now.

No? Not quite done?

All right, let’s bring out the big guns. How about Sean Luke’s Essen­tials of Meta­heuris­tics. I vouch for it whole­heart­edly: it’s full of inspir­ing machine learn­ing things, all explained sim­ply. And also avail­able elec­tron­i­cally. Read that!

Before we go any far­ther, let me tell you how this is going to end:

The stuff we call “genetic pro­gram­ming” is an inco­her­ent suite of tech­ni­cal habits — design pat­terns, mod­els, idioms — most often used to accel­er­ate human inno­va­tion.

All that; not that

The sen­ti­ment isn’t new. It just doesn’t get repeated often enough.

It’s a cliché when the author of a a tech­ni­cal work starts off by say­ing he’s a “bit of a heretic”, imply­ing that what he’s about to impart will prob­a­bly get the reader in trou­ble if repeated in the wrong com­pany.

For one thing it helps pro­mote a sense that the for­mal dis­ci­pline as “dynamic” and “lively”. You know, with beardy codgers and plucky upstarts con­ven­ing in lux­u­ri­ous Vic­to­rian audi­to­ria to threaten one another with walking-sticks before rac­ing to the Pole to show those fools what a real dinosaur looks like.

Also a nasty back-handed recruit­ing trick, if you ask me. I’ve been to way too many meet­ings, and they would have all been much bet­ter if we’d had walking-sticks, let alone dinosaurs.

The prover­bial “bit of heresy” can also be help­ful when the author is feel­ing self-conscious about play­ing fast and loose with details, or wants to puff up his own author­ity, or might even be fail­ing to give credit to col­leagues who deserve it. I write these words on the anniver­sary of one par­tic­u­larly noto­ri­ous exam­ple of the lat­ter, so don’t think it doesn’t hap­pen: being an “out­sider” sug­gests to the inno­cent reader that you might have thought all this stuff up on your own.

Telegraphed “heresy” can also be ped­a­gog­i­cally use­ful. If only they keep read­ing, the read­ers might be let in on a juicy bit of gos­sip about you know… that whole Leib­niz – New­ton thing, or… have you heard about how Alexan­der the Great really com­pared as a ruler to his dad? Keeps them from falling asleep or skip­ping to the answers in the back of the book.

But then — and you can’t tell me you didn’t see this com­ing: some­times it’s true.

So this is my hereti­cal ver­sion of What Genetic Pro­gram­ming Actu­ally Is:

I have no damned idea.

It’s all over the place. No, seri­ously — you have no notion what a bur­den it can be, try­ing to write one one of these intro­duc­tory overviews.

First we would have to review some his­tory. I’d point out that seven or eight (or a dozen) inde­pen­dent thinkers invented Genetic Pro­gram­ming through the last fifty years. They each called their vari­a­tion some dif­fer­ent thing1, and the details of imple­men­ta­tion were all dif­fer­ent, and some of the vari­a­tions are little-known while oth­ers are huge stars. None is everything.

Then to be fair I would have to say not only what all those inven­tors did back then, but also sum up all the impor­tant things the ten thou­sand sub­se­quent peo­ple work­ing with GP did in their papers and books and arti­cles and con­fer­ence posters on the sub­ject. Plus there’s all the domain-specific appli­ca­tion work. Plus the com­mer­cial and pro­pri­etary meth­ods, each one vying for authen­tic­ity and authority.

But that’s just a raw fact-dump. So next I’d need to cover the trends and cul­tural norms, themes and motifs, note­wor­thy genealo­gies and regionally-distinct Schools of Thought.

And then I’d need to fix some of your mis­con­cep­tions because “Genetic Pro­gram­ming” may be the most mis­lead­ing tech­ni­cal name in the whole world. I’d point out that it’s not genetic algo­rithms even though it sounds the same. It’s not really any­thing like math­e­mat­i­cal pro­gram­ming or con­straint pro­gram­ming. It’s not philo­soph­i­cally any­thing like bio­log­i­cal evo­lu­tion, even if you squint. It’s not quite the same as machine learn­ing (or it is, depend­ing on who you ask), not least because say­ing so pisses off the Sta­tis­ti­cians (who know bet­ter). It’s not just evolv­ing LISP trees, it’s evolv­ing all kinds of struc­tures and plans and algo­rithms and ideas and art. It’s not just sym­bolic regres­sion. It’s not a lot of things, apparently.

So what is it?

No, really

What­ever it isn’t, I can say that Genetic Pro­gram­ming is the cumu­la­tive work of a huge num­ber of very smart peo­ple. Thou­sands of researchers and prac­ti­tion­ers around the world. They have almost all been pas­sion­ate vision­ar­ies, and have all done amaz­ing things to… well, to achieve what­ever Genetic Pro­gram­ming turns out to be for in their diverse indi­vid­ual cases.

I am reminded that the soci­ol­o­gist Andrew Abbott pub­lished a very inter­est­ing and read­able book in 1988, which has helped me quite a bit to under­stand what GP actu­ally is. Abbott’s book is called The Sys­tem of Pro­fes­sions: An Essay on the Divi­sion of Expert Labor.

What? Why shouldn’t I define it with a soci­ol­ogy book? How is it you have paid so lit­tle atten­tion to the rant thus far?!

Any­way, in Sys­tem of Pro­fes­sions, Abbott describes the dynam­ics of pro­fes­sion­al­iza­tion. That is, how tech­ni­cally astute peo­ple with over­lap­ping tech­ni­cal roles come to self-identify and pro­mote their shared inter­ests by cre­at­ing (and even­tu­ally polic­ing) a pro­fes­sion. In Abbott’s model, pre-professional “fields” arise when­ever diverse peo­ple find them­selves explor­ing and exploit­ing par­tic­u­lar new oppor­tu­ni­ties — espe­cially new tech­ni­cal inventions.

His story of the stages of pro­fes­sion­al­iza­tion includes the devel­op­ment of regional and social com­mu­ni­ties of shared inter­est, then com­mu­ni­ties of prac­tice… then at some point they name them­selves. Then the boundary-setting starts, and the self-definition, and the author­i­ta­tive self-regulated train­ing and cre­den­tial­ing sys­tems, and finally — as a pat­tern, not a rule — we find them build­ing legal infra­struc­ture, rang­ing from Asso­ci­a­tions to Unions to state-licensed reg­u­la­tory bod­ies.2

No, this isn’t a digres­sion. You asked. Well, OK, I asked rhetor­i­cally for you: What is Genetic Programming?

And I answer, not at all rhetor­i­cally: Genetic Pro­gram­ming is a “field” emerg­ing from the inter­ests of diverse peo­ple, who find them­selves explor­ing and exploit­ing a par­tic­u­lar new oppor­tu­nity. It is their shared prac­tices and norms, their habits and their goals.

I could define GP as “the search for for­mal algo­rith­mic struc­tures by using meta­heuris­tics inspired by bio­log­i­cal evo­lu­tion”, but it can­not merely be that. Because (as you’ll learn first-hand) you don’t have to use evolution-like things to search.

I could try to uniquely iden­tify GP as “meta­heuris­tic opti­miza­tion of struc­tures, as opposed to tra­di­tional para­met­ric search or analytically-derived opti­miza­tion algo­rithms”. But (as you’ll learn first-hand) we some­times use those other things too. GP can’t just be evolv­ing pro­grams, because some peo­ple evolve anten­nas and bridges and molecules.

GP can’t just be for data min­ing, because some peo­ple evolve com­pletely abstract proofs. It isn’t about the tools or techniques.

It is, in fact and not just metaphor­i­cally, a com­mu­nity of self-identified peo­ple who share a way of try­ing to solve problems.

Ask­ing what GP “is” at this point in its pro­fes­sional his­tory is like ask­ing what “pro­gram­ming” is: Pro­gram­mers use com­put­ers to solve prob­lems for peo­ple. They don’t do it in any par­tic­u­lar way, except that most of them type on a key­board.

But “typ­ing” is not pro­gram­ming. Just as “evolv­ing code” is not GP.

Look at pro­fes­sional com­put­ing. You can eas­ily see pro­fes­sional bound­aries between the many peo­ple who write pro­grams. There are Soft­ware Engi­neers, and Com­puter Sci­en­tists, Pro­gram­mers and Ana­lysts. And of course there are those who pre­fer the label Soft­ware Devel­op­ers, so they can self-differentiate them­selves as the ones who actu­ally know how to col­lab­o­rate and make pro­grams that peo­ple can actu­ally to use to do stuff.3

I’m quite seri­ous: “Genetic Pro­gram­ming” lives some­where a bit ear­lier in the same pro­fes­sion­al­iza­tion story. As Rick Riolo has said many times: “It’s an art try­ing to become a craft.”

If you ask them, most will say they are doing auto­mated search for abstract struc­tures that solve prob­lems. But the details vary wildly, and every real or the­o­ret­i­cal prob­lem is still a spe­cial case.

So for the time being Genetic Pro­gram­ming is what peo­ple do, who self-identify as “using Genetic Programming.”

Tozier, that isn’t really very helpful

Yeah, trust me: I am totally on your side.4

But I have writ­ten this book, and you are read­ing it. Rather than think­ing you and I are both crazy, explain it this way:

If we play our cards right, we can our­selves define Genetic Programming.

I don’t mean to imply “GP is what you think it is”. I mean the field is so young and mal­leable, that you can learn to do amaz­ing things with­out ever being told you’re doing it wrong.

In these last twenty years I’ve seen for­tunes made, dis­ease treat­ments invented, patentable inven­tions piled thou­sand deep, philo­soph­i­cal and the­o­ret­i­cal prob­lems set­tled, space probes launched, robots that learn to walk in their dreams.…

Peo­ple can use GP to cre­ate things they could oth­er­wise only imag­ine. Here’s my lit­tle True Heresy, stated another way: Those peo­ple are not using GP to “auto­mat­i­cally invent” things. It isn’t a magic inven­tion machine.

It’s an accelerator.

I’ve hung out with a num­ber of these folks, through the years. They’re not smug geniuses… as a rule. Rather, they walk around in a sort of daze, telling one another how sur­prised they were by what they were shown when they started using GP.

A human being invents when she uses GP to con­sider a mil­lion out­ra­geous struc­tures and lay­outs no sane design engi­neer could incre­men­tally develop. The “inven­tion” hap­pens when she — a standard-issue human being—notices that some of those mil­lion designs is interesting.

That’s the same thing a tra­di­tional design engi­neer does, but faster. The effort is in a dif­fer­ent place.

A human being explains some­thing the world when he uses GP to con­sider a thou­sand novel mod­els of data, in less time than a tra­di­tional sta­tis­ti­cian can eval­u­ate two. The “expla­na­tion” hap­pens when he — a standard-issue human being—notices that some of the best mod­els invoke rela­tion­ships between vari­ables that nobody else had never mentioned.

That’s the same thing a tra­di­tional sta­tis­ti­cian work­ing with a domain expert does to explain the world, but faster. The effort is in a dif­fer­ent place.

And so on: an artist explores a thou­sand com­po­si­tions; a bio­med­ical researcher exam­ines a dozen or a hun­dred genomes and a mil­lion gene expres­sion pro­files; a trader mon­i­tors a mil­lion port­fo­lio man­age­ment rules.

The same thing they would nor­mally do. But more. The effort is in a dif­fer­ent place: on the think­ing.

Genetic Pro­gram­ming doesn’t auto­mate think­ing or cre­ativ­ity or any of those things. It helps peo­ple notice things.

GP is a prosthesis

Think about writ­ing — you know, with a pen, on paper. Writ­ing isn’t “auto­mated mem­ory”. Or think about pro­gram­ming com­put­ers. It isn’t “auto­mated arithmetic”.

Writ­ing and pro­gram­ming extend your mind. Writ­ing is a pros­the­sis in the sense that it offloads mem­o­ries to a long-term exter­nal stor­age medium. Pro­gram­ming is a pros­the­sis in the sense that it cal­cu­lates stuff really really fast.

But nei­ther one is “auto­matic”. Harry Pot­ter notwith­stand­ing, there are no self-writing pens, and no self-programming computers.

See what I did right there? There are no self-programming com­put­ers. That includes Genetic Pro­gram­ming, regard­less of what you may have heard from the nerds down the street.

I can’t tell you how many peo­ple I’ve seen come to GP, hav­ing read the hype about auto­mated inven­tion and stuff. Like a per­son who wants to write bet­ter, so she gets a really pow­er­ful pen. The per­son who wants to learn to pro­gram games, so he gets a really pow­er­ful com­puter.

How do you learn to write? How do you learn to pro­gram? Same with GP. Through guided prac­tice. We think a bit, we try some­thing, we learn if we’re lucky, and maybe we solve some problems.

And if we’re very good problem-solvers, we can use GP to help our­selves become use­fully surprised.


  1. Evo­lu­tion­ary pro­gram­ming, genetic pro­gram­ming, some Ger­man ones I can’t recall the names of at the moment… no many doubt oth­ers. 

  2. Ellen Mazur Thom­son pro­vides a lovely exam­ple of this same pro­fes­sion­al­iza­tion dynamic in her well-written his­tor­i­cal case study of the print­ing and graphic design trades: The Ori­gins of Graphic Design in Amer­ica, 1870 – 1920

  3. Though even they are frag­ment­ing on the basis of method­ol­ogy and domain.… 

  4. If only we’d had walk­ing sticks at the con­fer­ences these last twenty years, it would have all been so much more effi­cient.… 

Best Seen plots are best not seen…

The Best Seen plot

For decades (lit­er­ally), evo­lu­tion­ary algo­rithms (espe­cially genetic pro­gram­ming) have been visu­al­ized with “Best Seen plots”, or occa­sion­ally “Best-and-average plots”. Sim­ple enough to do, these are just a line chart of the first order sta­tis­tic of the sam­ple of fit­nesses seen so far. Because some­body with some sta­tis­ti­cal back­ground said some­thing back in the 90s, many Best Seen plots are actu­ally plots of the aver­age of sev­eral runs of the evo­lu­tion­ary algo­rithm, “so as bet­ter to com­pare per­for­mance between methods.”

In my expe­ri­ence, very few peo­ple have an intu­ition of how order sta­tis­tics work. I know I don’t.

First of all, it seems to me that the implicit premise of a Best Seen plot is that the under­ly­ing process you’re plot­ting is iid, and of course we all know that evo­lu­tion­ary algo­rithms of all sorts con­verge, and that the dis­tri­b­u­tion of vari­a­tions often shifts from “explo­ration” to “exploita­tion” (and back again, in some cases). So – it seems to me – the thing one wants to see is where that shift arises, the “elbow” of the curve, at least the change in rate of advancement.

If you want to know any­thing about the “best so far” at all.

More impor­tant, at least to me, is the infor­ma­tion these Best Seen plots hide. I can’t tell you how many papers I’ve seen where there’s a Aver­aged Best Seen pile of over­lap­ping hockey-sticks for dif­fer­ent test con­di­tions, and the authors spend two or three columns fret­ting about try­ing to “explain” what’s hap­pen­ing under var­i­ous con­di­tions. This just came up at the GPTP work­shop last week: Why is the lit­tle red hockey-stick curve taller than the blue one in the end, but they cross in the mid­dle? What makes the qual­i­ta­tive behav­ior of the two treat­ments differ?

Why don’t you plot the data you’re talk­ing about?

The Piano Roll plot

I’m a big fan of the use of sum­mary sta­tis­tics (includ­ing order sta­tis­tics, aver­ages, medi­ans, all that junk) for sto­ry­telling about the data. For explana­tory power they pro­vide, and the the­o­ret­i­cal under­pin­nings that can be use­ful when the accom­pa­ny­ing assump­tions are met.

But when­ever I am actu­ally try­ing to explain what’s hap­pen­ing in the course of an evo­lu­tion­ary algorithm’s run, I use a Piano Roll plot. Here’s one I dug up:

10100.jpg

  • A piano roll, as you know, is a long sheet of rolled paper, into which are cut a series of holes which trig­ger the keys of an auto­mated player piano or music box. A Piano Roll plot is sim­ply a plot of every eval­u­ated fit­ness in the order of evaluation.

    No sum­mary sta­tis­tics applied.

    I mean look at the infor­ma­tion you have there. Dur­ing the entire course of the run, you can see the “themes” of indi­vid­u­als with log error about 42, and about 14, and about 8. You can see all the actual work being done in the first 2500 eval­u­a­tions or so. You can see when minor vari­a­tions make incre­men­tal improvements.

    What you’re see­ing is the real impact your algo­rithm has on the mea­sured fitnesses.

    Do those weird lines at 42, 14 and 8 arise because of the inher­ent prob­lem struc­ture, or because there is an inher­ent ten­dency in the problem’s for­mu­la­tion to ran­domly pro­duce solu­tions with that quality?

    I can tell you imme­di­ately, even though these are results from years ago. You see that green line? That indi­cates the point where ran­dom guess­ing (dur­ing ini­tial­iza­tion) tran­si­tions to active search. We see there are def­i­nitely peaks at 42, 14 and 8, and so yes: those lines are per­fectly rea­son­able out­comes of my search algo­rithm “break­ing” bet­ter solu­tions back into their com­po­nent approximations.

    Why does that lit­tle stub of a line around log error 3 appear and then disappear?

    I can tell you imme­di­ately: Actual search is work­ing there. That fam­ily of solu­tions is the real “best seen” in those first 2500 steps or so, but it clearly gives rise to a sit­u­a­tion where selec­tion can pre­vent rever­sion to that form. Com­pare it to the long lines that cover the entire time-span, and you can see that in this case the appear­ance and sub­se­quent dis­ap­pear­ance of that vari­a­tion around 3 is qual­i­ta­tively dif­fer­ent from the one around 12.

    Why does the “best seen” curve (the lowest-error solu­tion, in this ori­en­ta­tion) have such a tight “elbow” around 2500 steps?

    In a Best Seen plot, you would lose all infor­ma­tion you might use to explain the dynam­ics of the “elbow” we see. Here we can see exploita­tion has stopped by about 2500 steps, and what we’re get­ting is mainly ran­dom search for the remain­ing 17000 steps or so. How can I say that? Because the rolling his­togram of fit­nesses, say over 200 steps, would look a lot like the ini­tial ran­dom search plus the best solu­tions found.

    Assum­ing this search is not “finished” — that is, that it hasn’t found the “right” solu­tion — I can tell you imme­di­ately what’s hap­pened: we’ve con­verged to a sub­op­ti­mal solution.

    You couldn’t tell me that with a Best Seen plot. You could guess. But as we begin to use more real­is­ti­cally com­pli­cated algo­rithms your guess is less likely to be useful.

    And draw­ing five or six guesses — five or six Best Seen plots — all on one plot?

    Do you want me to inter­ro­gate you in pub­lic and irri­tate you and make your speak­ing engage­ment a liv­ing hell, only to have you look at the data you threw away to answer a rea­son­able ques­tion like mine?

    No, you do not. Stop it.

    Another exam­ple

    Here’s another plot from a project I’m work­ing on right now:

    pianoRoll2.jpg

    Yes, there are a lot of points. I’m not look­ing at the points, I’m look­ing at the behav­ior of my algo­rithm in the con­text of the prob­lem it’s being applied to solve.

    I see a lot of things here, mainly in com­par­i­son to other prob­lems and other vari­a­tions of the algo­rithm. But there’s some­thing I want to point out (I’ve anno­tated the PDF): around 20000 steps, there is a tiny improve­ment in the error (you can see the small drop at the lower edge).

    But the inter­est­ing thing to me — the guy writ­ing the algo­rithm and hop­ing to make it work for a given task — is all that activ­ity hap­pen­ing sud­denly in “worse” solu­tions, start­ing around 18000 steps. A whole slew of new error rates start crop­ping up in the 130 – 140 range, which have rarely if ever been seen before in the run.

    Where do those come from? I can only infer they’re related to the slightly-better best that appears at the same time. Crossover and muta­tion prod­ucts, some favor­able mat­ings with what we had on hand.

    This is infor­ma­tion lost in the Best Seen plot. Explana­tory, use­ful information.

    Think about my response to the two forms:

    1. If I were watch­ing merely the Best Seen plot as this run was unfold­ing, I’d see an elbow around 2000 eval­u­a­tions, and a long plateau. Noth­ing seems to be chang­ing, even though within the actual pop­u­la­tion there’s plenty going on under the sur­face. If I were patient enough, and could have sat and stared at noth­ing for 17000 eval­u­a­tions, finally a lit­tle “blip” but then noth­ing bet­ter again for a long time. I’d get bored.
    2. Since I am watch­ing the Piano Roll plot, I see there was poten­tial for improve­ment. While the search is run­ning, I am spend­ing time work­ing on a diversity-preserving vari­a­tion that I hope can intro­duce “new blood” more often after about 3000 eval­u­a­tions. About the time I’m done writ­ing that, the Piano Roll shows me not a lit­tle “blip” but a qual­i­ta­tive change in the pop­u­la­tion dynam­ics around 17000 steps. And I can tell a use­ful story about that: the long wait was specif­i­cally the period when the pop­u­la­tion was wrestling with com­ing up with a novel approach to the prob­lem using muta­tion, not crossover. Fur­ther sup­port­ing evi­dence for my idea that more raw mate­ri­als are needed.

    And again, imag­ine if I had not only looked at the Best Seen plot for one run, but I had run the thing five or twenty or a hun­dred times and aver­aged those record curves.

    I won’t say the Best Seen plot is utterly use­less, because clearly peo­ple find enough value in them that they fill the pages of pro­ceed­ings vol­umes. But I can­not use one in my work, and nor can any­body else who is try­ing to solve the prob­lem on which they are searching.

    Best Seen plots are, appar­ently, for telling a very par­tic­u­lar kind of story about evo­lu­tion­ary algo­rithms, in a way that makes the world seem smooth and clean and unen­cum­bered by inter­est­ing variety.

    The world I work in is con­tin­gent. It’s the same one you work in, as it hap­pens. So when you tell me one of those nice smooth sto­ries about the imag­i­nary world, you leave your­self open for me to tell, quite sim­ply, five other sto­ries just as good.

    Do not show me your Best Seen plot, unless you have your story down pat.