The following is a draft of the introduction from the book.
What is Genetic Programming?
I’ve noticed that when you look up “genetic programming” at Google and read the top hits, it often sounds as though the writer imagines you already know what he means by the phrase. After twenty years, here’s what I think: Neither you nor they know what they mean by the phrase.
But then I’m not even sure I know.
I use the phrase, of course. “Genetic Programming.” “GP.” And I act as though I know what I mean. It’s what I do.
Let’s try some more research. It seems like maybe you have an Internet where you are, and your copy of Wikipedia isn’t broken. Go see what they say about genetic programming there.
Come back when you’re done. I’ll be here.
❧
OK, so as I read it — at least as of this writing, and Wikipedia being what it is — “Genetic Programming” is some kind of computer-sciencey thing that does artificial intelligence with genes that connect ‘+’ signs and stuff in little trees. If you read closely, there’s something about computer programs that write themselves automatically. Plus there’s a lot of different alternative approaches to it… whatever it is. And based on the wording and the edit history of the Wikipedia page some ways of doing it are clearly better than others… at something… even I don’t quite understand what.
Also there’s mutation and crossover.
Yeah, that sounds technical enough, right? Can we agree to move ahead with that?
Ah, yes.… I thought not. Let me look around for a bit.
❧
How about this? Here is a very good book I recommend to all my students: The Field Guide to Genetic Programming by Riccardo Poli, William B. Langdon and Nic McPhee. It’s available electronically! You can read it now.
❧
No? Not quite done?
All right, let’s bring out the big guns. How about Sean Luke’s Essentials of Metaheuristics? I vouch for it wholeheartedly: it’s full of inspiring machine learning things, all explained simply. And also available electronically. Read that!
❧
Before we go any farther, let me tell you how this is going to end:
The stuff we call “genetic programming” is an incoherent suite of technical habits — design patterns, models, idioms — most often used to accelerate human innovation.
All that; not that
The sentiment isn’t new. It just doesn’t get repeated often enough.
It’s a cliché when the author of a technical work starts off by saying he’s a “bit of a heretic”, implying that what he’s about to impart will probably get the reader in trouble if repeated in the wrong company.
For one thing it helps promote a sense that the formal discipline is “dynamic” and “lively”. You know, with beardy codgers and plucky upstarts convening in luxurious Victorian auditoria to threaten one another with walking-sticks before racing to the Pole to show those fools what a real dinosaur looks like.
Also a nasty back-handed recruiting trick, if you ask me. I’ve been to way too many meetings, and they would have all been much better if we’d had walking-sticks, let alone dinosaurs.
The proverbial “bit of heresy” can also be helpful when the author is feeling self-conscious about playing fast and loose with details, or wants to puff up his own authority, or might even be failing to give credit to colleagues who deserve it. I write these words on the anniversary of one particularly notorious example of the last, so don’t think it doesn’t happen: being an “outsider” suggests to the innocent reader that you might have thought all this stuff up on your own.
Telegraphed “heresy” can also be pedagogically useful. If only they keep reading, the readers might be let in on a juicy bit of gossip about you know… that whole Leibniz – Newton thing, or… have you heard about how Alexander the Great really compared as a ruler to his dad? Keeps them from falling asleep or skipping to the answers in the back of the book.
But then — and you can’t tell me you didn’t see this coming: sometimes it’s true.
So this is my heretical version of What Genetic Programming Actually Is:
I have no damned idea.
It’s all over the place. No, seriously: you have no notion what a burden it can be, trying to write one of these introductory overviews.
First we would have to review some history. I’d point out that seven or eight (or a dozen) independent thinkers invented Genetic Programming through the last fifty years. They each called their variation some different thing[1], and the details of implementation were all different, and some of the variations are little-known while others are huge stars. None is everything.
Then to be fair I would have to say not only what all those inventors did back then, but also sum up all the important things the ten thousand subsequent people working with GP did in their papers and books and articles and conference posters on the subject. Plus there’s all the domain-specific application work. Plus the commercial and proprietary methods, each one vying for authenticity and authority.
But that’s just a raw fact-dump. So next I’d need to cover the trends and cultural norms, themes and motifs, noteworthy genealogies and regionally-distinct Schools of Thought.
And then I’d need to fix some of your misconceptions because “Genetic Programming” may be the most misleading technical name in the whole world. I’d point out that it’s not genetic algorithms even though it sounds the same. It’s not really anything like mathematical programming or constraint programming. It’s not philosophically anything like biological evolution, even if you squint. It’s not quite the same as machine learning (or it is, depending on who you ask), not least because saying so pisses off the Statisticians (who know better). It’s not just evolving LISP trees, it’s evolving all kinds of structures and plans and algorithms and ideas and art. It’s not just symbolic regression. It’s not a lot of things, apparently.
So what is it?
No, really
Whatever it isn’t, I can say that Genetic Programming is the cumulative work of a huge number of very smart people. Thousands of researchers and practitioners around the world. They have almost all been passionate visionaries, and have all done amazing things to… well, to achieve whatever Genetic Programming turns out to be for in their diverse individual cases.
I am reminded that the sociologist Andrew Abbott published a very interesting and readable book in 1988, which has helped me quite a bit to understand what GP actually is. Abbott’s book is called The System of Professions: An Essay on the Division of Expert Labor.
What? Why shouldn’t I define it with a sociology book? How is it you have paid so little attention to the rant thus far?!
Anyway, in System of Professions, Abbott describes the dynamics of professionalization. That is, how technically astute people with overlapping technical roles come to self-identify and promote their shared interests by creating (and eventually policing) a profession. In Abbott’s model, pre-professional “fields” arise whenever diverse people find themselves exploring and exploiting particular new opportunities — especially new technical inventions.
His story of the stages of professionalization includes the development of regional and social communities of shared interest, then communities of practice… then at some point they name themselves. Then the boundary-setting starts, and the self-definition, and the authoritative self-regulated training and credentialing systems, and finally (as a pattern, not a rule) we find them building legal infrastructure, ranging from Associations to Unions to state-licensed regulatory bodies.[2]
No, this isn’t a digression. You asked. Well, OK, I asked rhetorically for you: What is Genetic Programming?
And I answer, not at all rhetorically: Genetic Programming is a “field” emerging from the interests of diverse people, who find themselves exploring and exploiting a particular new opportunity. It is their shared practices and norms, their habits and their goals.
I could define GP as “the search for formal algorithmic structures by using metaheuristics inspired by biological evolution”, but it cannot merely be that. Because (as you’ll learn first-hand) you don’t have to use evolution-like things to search.
I could try to uniquely identify GP as “metaheuristic optimization of structures, as opposed to traditional parametric search or analytically-derived optimization algorithms”. But (as you’ll learn first-hand) we sometimes use those other things too. GP can’t just be evolving programs, because some people evolve antennas and bridges and molecules.
GP can’t just be for data mining, because some people evolve completely abstract proofs. It isn’t about the tools or techniques.
It is, in fact and not just metaphorically, a community of self-identified people who share a way of trying to solve problems.
Asking what GP “is” at this point in its professional history is like asking what “programming” is: Programmers use computers to solve problems for people. They don’t do it in any particular way, except that most of them type on a keyboard.
But “typing” is not programming. Just as “evolving code” is not GP.
Look at professional computing. You can easily see professional boundaries between the many people who write programs. There are Software Engineers, and Computer Scientists, Programmers and Analysts. And of course there are those who prefer the label Software Developers, so they can differentiate themselves as the ones who actually know how to collaborate and make programs that people can actually use to do stuff.[3]
I’m quite serious: “Genetic Programming” lives somewhere a bit earlier in the same professionalization story. As Rick Riolo has said many times: “It’s an art trying to become a craft.”
If you ask them, most will say they are doing automated search for abstract structures that solve problems. But the details vary wildly, and every real or theoretical problem is still a special case.
So for the time being Genetic Programming is what people do, who self-identify as “using Genetic Programming.”
Tozier, that isn’t really very helpful
Yeah, trust me: I am totally on your side.[4]
But I have written this book, and you are reading it. Rather than thinking you and I are both crazy, explain it this way:
If we play our cards right, we can ourselves define Genetic Programming.
I don’t mean to imply “GP is what you think it is”. I mean the field is so young and malleable, that you can learn to do amazing things without ever being told you’re doing it wrong.
In these last twenty years I’ve seen fortunes made, disease treatments invented, patentable inventions piled thousand deep, philosophical and theoretical problems settled, space probes launched, robots that learn to walk in their dreams.…
People can use GP to create things they could otherwise only imagine. Here’s my little True Heresy, stated another way: Those people are not using GP to “automatically invent” things. It isn’t a magic invention machine.
It’s an accelerator.
I’ve hung out with a number of these folks, through the years. They’re not smug geniuses… as a rule. Rather, they walk around in a sort of daze, telling one another how surprised they were by what they were shown when they started using GP.
A human being invents when she uses GP to consider a million outrageous structures and layouts no sane design engineer could incrementally develop. The “invention” happens when she, a standard-issue human being, notices that some of those million designs are interesting.
That’s the same thing a traditional design engineer does, but faster. The effort is in a different place.
A human being explains something about the world when he uses GP to consider a thousand novel models of data, in less time than a traditional statistician can evaluate two. The “explanation” happens when he, a standard-issue human being, notices that some of the best models invoke relationships between variables that nobody else had ever mentioned.
That’s the same thing a traditional statistician working with a domain expert does to explain the world, but faster. The effort is in a different place.
And so on: an artist explores a thousand compositions; a biomedical researcher examines a dozen or a hundred genomes and a million gene expression profiles; a trader monitors a million portfolio management rules.
The same thing they would normally do. But more. The effort is in a different place: on the thinking.
Genetic Programming doesn’t automate thinking or creativity or any of those things. It helps people notice things.
GP is a prosthesis
Think about writing — you know, with a pen, on paper. Writing isn’t “automated memory”. Or think about programming computers. It isn’t “automated arithmetic”.
Writing and programming extend your mind. Writing is a prosthesis in the sense that it offloads memories to a long-term external storage medium. Programming is a prosthesis in the sense that it calculates stuff really really fast.
But neither one is “automatic”. Harry Potter notwithstanding, there are no self-writing pens, and no self-programming computers.
See what I did right there? There are no self-programming computers. That includes Genetic Programming, regardless of what you may have heard from the nerds down the street.
I can’t tell you how many people I’ve seen come to GP, having read the hype about automated invention and stuff. Like a person who wants to write better, so she gets a really powerful pen. The person who wants to learn to program games, so he gets a really powerful computer.
How do you learn to write? How do you learn to program? Same with GP. Through guided practice. We think a bit, we try something, we learn if we’re lucky, and maybe we solve some problems.
And if we’re very good problem-solvers, we can use GP to help ourselves become usefully surprised.
[1] Evolutionary programming, genetic programming, some German ones I can’t recall the names of at the moment… no doubt many others.
[2] Ellen Mazur Thomson provides a lovely example of this same professionalization dynamic in her well-written historical case study of the printing and graphic design trades: The Origins of Graphic Design in America, 1870 – 1920.
[3] Though even they are fragmenting on the basis of methodology and domain.…
[4] If only we’d had walking-sticks at the conferences these last twenty years, it would have all been so much more efficient.…