So yesterday I spent a while trying to implement an intuitive error heuristic for Cargo-Bot configurations.
It’s an interesting exercise in test-driven design, and I don’t have the sense I got it done very well the first time, so I’m going to spend a little while exploring in more detail, in the spirit of a coding kata. I can definitely use some more help with my test-driven dev chops.
I had a bit of trouble thinking it through myself, when I got more formal. This might be simpler in your language, in your testing framework. But I was able to work it all the way through with Ruby/Cucumber yesterday, though I confess I pushed the notion of “acceptance test” pretty close towards unit testing in doing so.
If you need some hints, or you want to see how a behavior-driven approach ends up, then take a look at my scenarios for this one feature (scoring different arrangements).
Stacks of Crates
So the problem, briefly, is that you have two different arrangements of (the same) colored blocks in a fixed number of stacks. We’ll call one arrangement target, since it’s your goal in solving the puzzle-game; we’ll call the other the observed one.
There are many distance metrics one could consider using, but those all seem to permit unrealistic physical manipulations. For example, Levenshtein distance measures and their derivatives, which are used to compare strings, permit “swaps” between blocks as if that were a context-independent operation. You can weight swaps to “cost” more in those metrics, but those costs have no sense of the difference of swapping two blocks sitting on top of the same stack, as opposed to two blocks each on the bottom of a stack of 50 other blocks.
How I decided to approach the problem is to treat each block’s contribution to the total score as independent. Yes, this “simplification” itself introduces unrealistic consequences almost as heinous as the ones in the Levenshtein example I just gave, since when you swap two adjacent blocks in my scheme you’re counting the number of moves you use to replace the top block with the bottom block, and then also the number of moves you make to replace the bottom block with the top block (ignoring the fact you could take them both on and put them both back at once).
At this point? I don’t care. I’m the decider.
Now technically what we’re about to do here is implement a heuristic or a quasimetric, not a metric. It makes reasonable physical sense; consider two different arrangements of the same blocks:
A: [[:r, :r, :r, :b, :r]]
B: [[:b, :r, :r, :r, :r]]
To review, this notation means that A and B are configuration with a single stack, containing five blocks. The :b block is higher on stack 1 of arrangement A, and sits on the bottom of stack 1 of arrangement B.
If A is the target and B is what you have on hand, then to transform B into A you will need to make these changes to fix each block (top to bottom):
- do nothing to fix the correct top block
- unstack all 5 blocks to get the
:b, then stack 3 support blocks, then replace the:b - do nothing to fix the correct third block up
- do nothing to fix the correct second block up
- unstack all 5 blocks, then add an
:rto replace the bottom block
Notice something else that happens here, which I may not have said out loud before but which is almost certainly important: I’ve added an extra “space” to the system, a “pool” where I can just shove things and retrieve them. In my mental model of blocks stacked on a table, the “pool” is the table top itself. Within the game Cargo-Bot there’s no such thing, and blocks can only live on stacks or be moved between them. I’m not playing Cargo-Bot here, I’m evaluating how close one arrangement is to another in a physically intuitive way.
On the other hand, if B is the target and A is what you observe:
- do nothing to fix the top block
- take 2 blocks off the remove the incorrect
:b, then add back 1:rto fix the second block down - middle is right
- second one up is right
- take five blocks off to remove the wrong
:r, but you’ve already set the:baside in the “pool” by doing that, so you can just add the:bwith one move
As I count them, the result we’ll get saying A is target is 15 moves, and if we say B is target it’s 9 moves. And that’s OK, as far as I’m concerned. Gravity is a thing, order matters, and not everything in the world has the sort of symmetry mathematicians presume.
What’s wrong with this block?
Notice what we just did, though: we evaluated each block in turn, then added it all up. I don’t care if in “fixing” one block you accidentally “fix” another. If we counted all those, we’d be in integer programming and constraint-solving land, and I don’t need that headache now.
We’re counting each block as though everything else were inconsequential, and adding the independent cleanup moves for that one box.
I hope you can tell from the example we just walked through, each crate or block or box (I find myself slipping with the word, so forgive me) falls into one of several simple classes. Let’s treat them in turn.
I’ll make the notional “pool” visible, even though the algorithm I ended up with didn’t bother implementing it. Turns out you can make do with counting, once you’ve explored the possibilities.
The block is OK
If the block in the target is the same color as the block in the same position in the same stack of the observed arrangement, no cleanup is necessary.
target: [[:r]]
observed: [[:r]]
pool: {}
Score = 0 cleanup for that block.
The block is completely missing
I know I said the block arrangements were the “same”, but I get paranoid when I’m testing, and as it happens there is a strange sort of edge-case in Cargo-Bot where a block can appear to disappear: when it’s left in the claw.
So in these cases, I just want to note, “Hey, something’s not right here!” so I slapped a penalty of 100 points for any block in the target which is completely absent from the observed state.
target: [[:r]]
observed: [[ ]]
pool: {}
Score = 100 for that block.
The block is in another stack, maybe with blocks on it
Here’s an example, with two stacks:
target: [[:r, :b], [:g]]
observed: [[:r], [:g, :b]]
pool: {}
Let’s consider the :b block only. There is no block in the position 2 of stack 1 in the observed setup. So we have to go find it. Aha! there it is! How many moves does it take to get it in the right position?
It’s on top of stack 2, so we can just un-stack it, where I like to think of it ending up in a sort of notional “pool”:
target: [[:r, :b], [:g]]
observed: [[:r], [:g]]
pool: {:b}
It’s available for use, so we can plop on stack 1 and be done.
What happens if the :b is deeper, as in
target: [[:r, :b], [:g]]
observed: [[:r], [:b, :g]]
pool: {}
Well, we need that :b. So we unstack it, along with everything on top of it, taking 2 moves…
target: [[:r, :b], [:g]]
observed: [[:r], []]
pool: {:b, :g}
and then we can replace it. Do we care about the extra :g? No, it doesn’t signify for this particular case. It had to be moved (so we count the removal), but it can sit there.
The block is wrong
target: [[:r, :b], [:g]]
observed: [[:r, :g], [:b]]
pool: {}
So, again thinking only of the :b in our example, what do we do? First we can free up the :b and place it in the “pool”.
target: [[:r, :b], [:g]]
observed: [[:r, :g], []]
pool: {:b}
Then we need to remove the wrong block that’s in the way (also to the pool):
target: [[:r, :b], [:g]]
observed: [[:r], []]
pool: {:b, :g}
… and then we can move the correct :b onto the first stack.
target: [[:r, :b], [:g]]
observed: [[:r, :b], []]
pool: {:g}
Stuff is in the way
What about things that are “in the way” of the block we’re gathering or the one we’re replacing? We move them too. Let’s fix the :b block here:
target: [[:r, :b, :y], [:g, :y, :y, :y]]
observed: [[:r, :g, :y], [:b, :y, :y, :y]]
pool: {}
... gather correct :b (4 moves)
target: [[:r, :b, :y], [:g, :y, :y, :y]]
observed: [[:r, :g, :y], []]
pool: {:b, :y, :y, :y}
... clear out incorrect :g (2 moves)
target: [[:r, :b, :y], [:g, :y, :y, :y]]
observed: [[:r], []]
pool: {:b, :y, :y, :y, :g, :y}
... place correct :b (1 move from pool)
target: [[:r, :b, :y], [:g, :y, :y, :y]]
observed: [[:r, :b], []]
pool: {:y, :y, :y, :g, :y}
So by now (especially if you’re implementing these as we go), you should be noticing some patterns:
- to swap out a block from a stack, first shift it and all blocks above it into the pool
- to obtain a replacement block, first shift it and all blocks above it into the pool
Which choice?
Suppose there’s more than one place to obtain a replacement block. Let’s look at the:b in stack 1
target: [[:r, :y, :b], [:g, :y], [:b, :g]]
observed: [[:r, :y, :y], [:g, :b], [:b, :g]]
pool: {}
There are two places we could get our replacement :b in the observed arrangement: from the top of stack 2, or the bottom of stack 3. In this case, the simplifying rule is: Choose the one that requires the minimum number of steps.
What does that mean in practice? Well, here’s how I see it: look at every stack in the observed arrangement and find the depth of the topmost (or in computerese, “rightmost”) replacement. You don’t actually need to move anything, since the score you’re adding in is just the minimum depth to retrieve the replacement.
In this situation, the number of blocks moved to retrieve the rightmost :b in each of the three observed stacks would be (using Ruby notation here) [nil, 1, 2]: there is no :b in stack 1, there’s one requiring 1 move to retrieve from stack 2, and one requiring 2 moves to retrieve from stack 3.
Floating blocks
What about the case where our target block is not only missing, but lacks a place to stand? Let’s score good old :b again:
target: [[:r, :r, :r, :b], []]
observed: [[:r], [:r, :b, :r]]
pool: {}
Well, clearly he’ll take more moves than just the ones to retrieve the correct :b: we’ll need to shore him up somehow.
... free up missing :b to pool (2 moves)
target: [[:r, :r, :r, :b], []]
observed: [[:r], [:r]]
pool: {:b, :r}
... free up enough support blocks (1 more move)
target: [[:r, :r, :r, :b], []]
observed: [[:r], []]
pool: {:b, :r, :r}
... fill in supports (2 moves)
target: [[:r, :r, :r, :b], []]
observed: [[:r, :r, :r], []]
pool: {:r}
... replace :b (1 move)
target: [[:r, :r, :r, :b], []]
observed: [[:r, :r, :r, :b], []]
pool: {}
This can be a bit tricky, but it’s not algorithmically too nasty: You know how many blocks you moved to the pool to free up the correct block, and you know how many blocks you have to have in the pool to build up to the place where the correct block needs to stand.
If you don’t have enough blocks in the pool to make that stack, even after you’ve retrieved the correct color, you’ll need to get them somewhere. They can be any color, so imagine them popping off any other stacks in the arrangement.
Presuming there is no difference in the number of blocks in the two arrangements. I honestly don’t know what we’d do if there was.…
The one tricky case: same stack
OK, so there’s one class of situations I’ve been avoiding. It’s the one that trips me up, frankly.
What about when the “replacement” block is in the same stack as the “wrong” one?
target: [[:r, :r, :r, :r, :b, :r]]
observed: [[:b, :r, :r, :r, :r, :r]]
pool: {}
OK, well let’s see how it goes when we try to score the :b:
... free up missing :b to pool (6 moves)
target: [[:r, :r, :r, :r, :b, :r]]
observed: [[]]
pool: {:b, :r, :r, :r, :r, :r}
... replace supports (4 moves)
target: [[:r, :r, :r, :r, :b, :r]]
observed: [[:r, :r, :r, :r]]
pool: {:b, :r}
... add :b (1 move)
target: [[:r, :r, :r, :r, :b, :r]]
observed: [[:r, :r, :r, :r, :b]]
pool: {:r}
Not too bad. The trick of course is that your algorithm may be assuming that the stacks are different. Mine was, and that’s when this head-scratcher comes along:
... remove incorrect :r in position 5 (2 moves)
target: [[:r, :r, :r, :r, :b, :r]]
observed: [[:b, :r, :r, :r]]
pool: {:r, :r}
... free up :b (? moves)
target: [[:r, :r, :r, :r, :b, :r]]
observed: [[:r, :r, :r, :r]]
pool: {:b, :r}
Notice what happened there? Depending on whether we “free up correct one” or “remove incorrect one” first, the path-dependence of manipulating the same stack suddenly matters.
Now I was trying, as I implemented this all, to avoid actually simulating the manipulation of blocks. The little diagrams I’ve been drawing for you have been helpful pedagogic sketches, but what I wanted (and still want) to do was just add up the points for maneuvers. Up into this case, we’ve had no interaction between the events on one stack and the events on another: To dig out a wrong one, move it and all the blocks above it; to dig out a right one, move it and all the blocks above it; to find supports, grab more from somewhere else as needed.
So my algorithm (which I tried ever so diligently to use TDD to design) takes the form of some conditionals and adding stuff up. Until we get here. Maybe your design was more simulatory (if that’s a word?), and you actually did more like the little diagrams and shifted blocks around. I don’t know if that’s better or not.
So what to do?
Me, I came up with something that works, but it’s unsatisfying still.
How about you?
Sewing it together
Now I have one more story (assuming I can read my own code) that’s important.
I think we both see that the score for a stack should be the sum of the scores for each block in the target (compared to the observed), and that the score for an arrangement is the sum of the scores of each stack.
In working through these stories, though, I came across one more situation of interest: What about a stack that’s empty in the target, but has junk in it in the observed arrangement? Given the rules I just sketched, since there’s nothing in the target stack we won’t actually count any points for it, even though it feels wrong.
So one more rule for scoring a stack: If there are more boxes on the observed stack, we need to charge to remove each one.
target: [[:r, :r, :r, :r, :r], []]
observed: [[], [:r, :r, :r, :r, :r]]
pool: {}
Stack 1 gets scored by the rules we’ve already explored. But I also think stack 2 is bothersome. How many moves would it take to remove each extra crate on stack 2, considered individually? 1 move for the top one; 2 for the second; 3 for the third and so on.
So here’s another example, where I’m just looking at the two first stacks:
target: [[:r, :r], …]
observed: [[:r, :r, :r, :r, :r], …]
There are three extra boxes on the observed stack, compared with the target height. So charge that 6 points (1+2+3).
I’m still not sure I understand your metric. Consider this test of yours: target [[:g, :r, :r, :r]], observed [[:r, :r, :r, :g]]. Shouldn’t it be 1 to find the g + 4 to dig out the wrong r + 1 to insert the g = 6? (It’s 5 according to https://github.com/Vaguery/CargoBot-ruby/blob/54902aeb90/features/cleanup_distance.feature#L38
Or http://github.com/Vaguery/CargoBot-ruby/blob/54902aeb90/features/cleanup_distance.feature#L53: target [[:y, :y, :y, :r], [:b, :b]], observed [[], [:y, :y, :y, :r, :b, :b]], shouldn’t the error for the :r be 3 to dig it out + 3 to build up the first stack + 1 to place it = 7?
—
Here’s the solution I came up with. It’s in a private lisp dialect, but I’ve tried to make it straightforward to run.
https://gist.github.com/2731697
I have the beginnings of a writeup but I want to wait until I’m on the same page as you to show it.
target: [[:g, :r, :r, :r]]
observed: [[:r, :r, :r, :g]]
pool: {}
… remove incorrect :r (bottom of stack) (4 moved blocks)
target: [[:g, :r, :r, :r]]
observed: [[]]
pool: {:r, :r, :r, :g}
… place :g, taking it from the pool (1 move)
target: [[:g, :r, :r, :r]]
observed: [[:g]]
pool: {:r, :r, :r, :g}
The “trick”, if any, is that you need to take into account the reasonable path-dependent route for exchanging items within the same stack. Consider your version:
target: [[:g, :r, :r, :r]]
observed: [[:r, :r, :r, :g]]
pool: {}
… fetch :g from top of stack (1 moved block)
target: [[:g, :r, :r, :r]]
observed: [[:r, :r, :r]]
pool: {:g}
… clear out incorrect :r (3 moved blocks)
target: [[:g, :r, :r, :r]]
observed: [[]]
pool: {:r, :r, :r, :g}
… replace correct :g (1 move from pool)
target: [[:g, :r, :r, :r]]
observed: [[:g]]
pool: {:r, :r, :r}
Make sense? Like I said above, the path-dependence of manipulations of a single stack makes the arithmetic approach I took in my code a bit more complicated.
In the case of the score contribution for the :r in
target: [[:y, :y, :y, :r], [:b, :b]]
observed: [[], [:y, :y, :y, :r, :b, :b]]
pool: {}
… dig out correct :r (3 moved blocks)
target: [[:y, :y, :y, :r], [:b, :b]]
observed: [[], [:y, :y, :y]]
pool: {:r, :b, :b}
… need one more block for support (1 moved block)
target: [[:y, :y, :y, :r], [:b, :b]]
observed: [[], [:y, :y]]
pool: {:r, :b, :b, :y}
… place supports (3 moved blocks)
target: [[:y, :y, :y, :r], [:b, :b]]
observed: [[:?, :?, :?], [:y, :y]]
pool: {:r}
… place :r (1 moved block)
target: [[:y, :y, :y, :r], [:b, :b]]
observed: [[:?, :?, :?, :r], [:y, :y]]
pool: {}
Ah, yes I wasn’t taking the path-dependence into account. So the rule is, make room for the target before finding a candidate?
The introduction of the pool makes things more ‘simulate-y’. If the pool isn’t an actual place, just an infinite sink and infinite source of blank crates, the analysis gets simpler and you can get by without needing to simulate boxes. The metric also has a nice symmetry to it:
crate error = abs(index of incorrect crate — observed length) + min(index of correct crate in each stack)
(where index is from the top)
Is there a reason this is a bad metric?
My metric is also not path-dependent!
I’m not sure any metric is a “bad” metric. The one I’m using is based on the metaphor of counting moved blocks, and assuming there is an (empty) tabletop where you can un-stack them and grab them again when you need them. Yours might be more like a situation where the table has a bunch of “clear” blocks on it already.
Since I actually have a tabletop where I tested it, and a limited number of blocks, I think I’ll stick with mine.
Why does symmetry seem so important? Certainly many real-world physical systems exhibit path-dependence like that I’m capturing.
Not a criticism, just a real question. I think the answer is one of habit (in a Deweyan sense of the word); what folks might call “cultural”. Mathematicians and computer scientists like to think things are pretty :)
Just seems easier to code and reason about correctness, so I can focus on the rest of the GP system, that’s all.
A metric can be bad if it relies on a poor representation, if it gets easily stuck on local optima, and if it’s too simple to distinguish very different solutions (any others?).
Very useful conversation. The point of the exercise for me (and the book, of course) is the “tension” between the habits we’ve all accumulated through our years of being Smart Technicians, and the desire to be usefully surprised.
So for example, the “realistic” stance I started from was really informed by the other mathematical assumption you haven’t mentioned: additivity. I started thinking about it as an “independent” contribution from each block, which when summed gives something a bit better than a string-rewriting distance. Your symmetry version is nice in the ways you mention — not least the symmetry and simpler coding — but feels less immediately “realistic”.
Of course one could also throw both aside, and think of a “simulation-y” version that counts all the crap you need to rearrange to fix it, all at once: just tear every stack down to the point where it’s correct, and then re-stack all the stuff you took down. Ought to be simple to calculate, and also easy to consider “realistic”.
Now the interesting question is not which is a priori “better”, but rather what symptoms and conditions you should watch for that would make you want to change from one measure to another: as you say, local trapping, poor discernment, possibly something too neutral, weird asymmetries.
Which, as I said, is the lesson: contingency should trump assumption.
Let’s try all of them.
I’ve added two additional error [quasi]metrics to the repository, and at some point I promise I’ll also implement Kartik’s. But today, here are teardown_distance and rebuild distance.
Empirically, rebuild_distance seems to be the best for both hillclimbing and evolutionary search. No, I have no idea why.