Regexes and microarrays, oh my!

Anna explaining supermarket secrets to the public (by jmerelo)

This Friday's paper seminar was about Evolving Regular Expressions for GeneChip Probe Performance Prediction, by Langdon and Harrison, which is a mean hacking feat.
Here's what I gathered from it: microarrays don't work as they should. They contain short DNA segments (probes) that stick (hybridize) to complementary segments from the genes being expressed; the more of those segments there are, the more they stick, and the more that gene is taken to be expressed. In theory, all probes matching a given segment should stick in the same amount. Only they don't. And why they don't seems to depend on the probes themselves, that is, on particular features of their sequence.
Langdon published a previous paper that studied this; in this one, he and Harrison have tried to evolve a regular expression that matches those badly behaving DNA strings. The implementation is the hackerish part: instead of sticking to a general-purpose language, they followed the purest Unix tradition and used awk to check the evolved regexes against a grammar, and egrep to match the DNA strings against each regex.
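To make the idea concrete, here is a minimal sketch in Python of grammar-constrained regex evolution. Everything in it is an assumption for illustration, not the authors' setup: the tiny grammar (bases, `.`, bracketed base sets, and one optional `*`, `+` or `?` quantifier per term), the toy labelled probes in `TOY_PROBES`, and the hypothetical helpers `is_valid`, `fitness`, `random_term`, `mutate` and `evolve`. Python's `re` stands in for both awk (grammar check) and egrep (matching); for a grammar this small, `re.search` should behave like egrep finding a match on a line.

```python
import random
import re

# Hypothetical restricted grammar (NOT the paper's grammar):
#   regex := term+ ; term := atom quant? ; atom := A|C|G|T|.|[base+] ; quant := *|+|?
GRAMMAR = re.compile(r"(?:(?:[ACGT.]|\[[ACGT]+\])[*+?]?)+")
BASES = "ACGT"

# Made-up labelled probes: (sequence, is_bad_behaving). Illustrative data only.
TOY_PROBES = [
    ("ACGGGGTA", True), ("TTGGGGCA", True), ("GGGGACTA", True),
    ("ACGTACGT", False), ("TTACGACA", False), ("CATGCATG", False),
]


def is_valid(candidate: str) -> bool:
    """Check a candidate regex against the grammar (awk's job in the post)."""
    return GRAMMAR.fullmatch(candidate) is not None


def fitness(candidate: str, probes: list[tuple[str, bool]]) -> float:
    """Fraction of probes classified correctly; a match predicts 'bad-behaving'."""
    if not is_valid(candidate):
        return 0.0
    pattern = re.compile(candidate)
    hits = sum((pattern.search(seq) is not None) == bad for seq, bad in probes)
    return hits / len(probes)


def random_term(rng: random.Random) -> str:
    """Generate one grammar term: an atom plus an optional quantifier."""
    r = rng.random()
    if r < 0.7:
        atom = rng.choice(BASES)
    elif r < 0.85:
        atom = "."
    else:
        atom = "[" + "".join(sorted(rng.sample(BASES, 2))) + "]"
    return atom + rng.choice(["", "", "", "+", "?", "*"])


def mutate(terms: list[str], rng: random.Random) -> list[str]:
    """Replace, insert or delete one term; the result stays inside the grammar."""
    terms = list(terms)
    op = rng.choice(["replace", "insert", "delete"])
    i = rng.randrange(len(terms))
    if op == "replace":
        terms[i] = random_term(rng)
    elif op == "insert":
        terms.insert(i, random_term(rng))
    elif len(terms) > 1:
        del terms[i]
    return terms


def evolve(probes, generations=50, pop_size=30, seed=0) -> str:
    """Elitist truncation selection: keep the best half, refill with mutants."""
    rng = random.Random(seed)
    pop = [[random_term(rng) for _ in range(rng.randint(1, 4))] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness("".join(t), probes), reverse=True)
        parents = pop[: pop_size // 2]
        pop = parents + [mutate(rng.choice(parents), rng) for _ in range(pop_size - len(parents))]
    return max(("".join(t) for t in pop), key=lambda r: fitness(r, probes))


if __name__ == "__main__":
    best = evolve(TOY_PROBES)
    print(best, fitness(best, TOY_PROBES))
```

Building individuals as lists of grammar terms means every candidate is valid by construction; the explicit `is_valid` check is kept only to mirror the separate grammar-checking step described above. A real system would use far more probes, a richer grammar, and a less naive fitness than raw accuracy.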
Next time I expect them to write the whole genetic programming system in bash. All in all, as I said, an interesting paper, presented at PPSN 2008.