Regexes and microarrays, oh my!

This Friday's paper seminar was about the paper Evolving Regular Expressions for GeneChip Probe Performance Prediction, by Langdon and Harrison, which, as the post title says, is a mean hacking feat.
Let’s see what I gathered from it: microarrays don’t work as they should. They contain DNA segments (probes) that stick to other segments, the ones that express proteins; the more of those there are, the more they stick, and the more that protein is taken to be expressed. In theory, probes matching the same segment should stick in the same amount. Only they don’t. And why they don’t seems to have to do with their characteristics, that is, particular features of their sequence.
Langdon published a previous paper which studied this; but in this one, they have tried to evolve a regular expression that matches those badly behaved DNA strings. And the implementation is the hackerish part: instead of sticking to a general-purpose language, they have followed the purest Unix tradition, using awk to check the regexes against a grammar and egrep to match the DNA strings against the regex.
Next time I expect them to write a bash genetic program. All in all, as I said, an interesting paper, presented at PPSN 2008.
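
To make the regex-matching idea a bit more concrete, here is a minimal sketch in Python of how a candidate regex could be scored against labelled probes. This is not the authors' awk and egrep pipeline: the function name regex_fitness, the toy sequences, their good/bad labels, and the candidate patterns are all invented for illustration, and plain accuracy is just one simple scoring choice among many.

```python
import re


def regex_fitness(pattern, probes):
    """Fraction of probes that the pattern classifies correctly.

    A probe counts as correct when the regex matches a probe labelled
    as badly behaved, or fails to match a well-behaved one.
    (Hypothetical scoring rule, chosen only for this sketch.)
    """
    compiled = re.compile(pattern)
    correct = 0
    for sequence, is_bad in probes:
        matched = compiled.search(sequence) is not None
        if matched == is_bad:
            correct += 1
    return correct / len(probes)


# Toy data: (sequence, is_badly_behaved). Invented, not real probe data.
probes = [
    ("ACGTGGGGTACA", True),
    ("TTGGGGACCATG", True),
    ("ACGTACGTACGT", False),
    ("CATGCATTGACA", False),
]

# A few hand-written candidates; an evolutionary run would generate
# and mutate these instead of listing them by hand.
candidates = ["GGGG", "^AC", "T"]

scores = {pattern: regex_fitness(pattern, probes) for pattern in candidates}
best = max(scores, key=scores.get)
print(best, scores[best])
```

An evolutionary algorithm would call something like regex_fitness on every individual in the population and keep the best scorers for the next generation; the grammar check that the paper delegates to awk is left out here.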
