E. Coli and Open-Source Software

(…) It’s a bizarre coincidence that just as scientists were discovering the evolutionary importance of viruses, computer engineers were creating a good metaphor for their effect. In the late 1990s, group of American engineers became frustrated by the slow pace of software development. Corporations would develop new programs to make it impossible for anyone on the outside to look at the code. Improvements could come only from within – and they came slowly, if at all. In 1998, these breakaway engineers issued a manifesto for a different way of developing programs, which they called open-source software. They began to write programs with fully acessible code. Other programmers could tinker with the program, or merge parts of different programs to create new ones. The open-source software movement predicted that this uncontrolled code swapping would make better programs faster. Studies have also shown that software can be debugged faster if it is opwn source than if it is private. Open-source software has now gone from manifesto to reality. Even big corporations such as Microsoft are beginning to open some of their programs to the world’s inspection.

In 2005, Anne O. Summers, a microbiologist at the University of Georgia, and her colleagues coined a new term for evolution driven by horizontal gene transfer: open-source evolution. Vertical gene transfer and natural selection act like an in-house team of software developers, hiding the details of their innovations from the community. Horizontal gene transfer allows E. Coli. to grab chunks opf software and test them in its own operating system. In some cases, the combination is a disaster. Its software crashes, and it dies. But in other cases, the fin-tuning of natural selection allows the combination to work well. The improved patch may later end up in the genome of other organism, where it can be improved even more. If E. Coli is any guide, the open-source movement has a bright future.


Carl Zimmer, in Microcosm – E. Coli and the New Science of Life

Estimation of Distribution Algorithms

In the week before the discussion on microarrays, the seminar’s paper was From Recombination of Genes to the Estimation of Parameters I, Binary Parameters, by H. Mulhenbein and G. Paaβ. The seminar aimed at discussing the simple Univariate Marginal Distribution Algorithm (UMDA), its origins, advantages and disadvantages when compared to standard Genetic Algorithms, and the algorithms that were born out of this simple evolutionary method. There is a lot of work done on UMDA and Estimation of Distribution Algorithms (EDAs), so there is not much latitude for speculation.

One of the most interesting features of UMDA and following EDAs is their simplicity and the way research has been done by carefully analyzing and improving existing EDAs, from the “old” PBIL to the sophisticated and rather effective BOA and hBOA algorithms. Theoretical analysis of Evolutionary Computation, namely that concerned with scalability and convergence issues, experienced a consisted improvement since the burst on EDAs research. Moreover, it is when scalability is carefully analyzed and/or measured that some EDAs reveal all its full power when compared to traditional Genetic Algorithms. Some EDAs, by “learning” a problems’ structure (think of BOA, for instance) are able to deal with very large instances of that problem in practicable computational time. In addition, recent investigations have concluded that these kind of algorithms and Ant Colony Optimization share a sufficient number of traits to be seen as methods belonging to the same class of heuristics, a fact that may spoil all the “magic” behind Ant Algorithms (and it will sure discredit much of the jargon involved in some discussions on Ant Algorithms, self-organization, etc). On the other hand, this “unification” will shed some light on both algorithms’ characteristics.

(In this line of investigation, our group has recently published the paper UMDAs for Dynamic Optimization Problems, but further research is on way, on both static and dynamic environments.)

Regexes and microarrays, oh my!

Anna explaning supermarket secrets to the public (by jmerelo)This Friday paper seminar has been about the paper Evolving Regular Expressions for GeneChip Probe Performance Prediction, by Langdon and Harrison, which, as the post title says, is a mean hacking feat.
Let’s see what I gathered from it: microarrays don’t work as they should. They contain ADN segments (probes) which stick to other segments which express proteins; the more they are, the more they stick, and the more that protein is expressed. Theoretically, probes matching a certain segment should stick in the same amount. Only they don’t. And why they do seems to have to do with their characteristics, that is, particular features of their sequence.
Langond published a previous paper which studied this; but in this one, they have tried to evolve a regular expression that matches those bad-behaving DNA strings. And the implementation is the hackerish part: instead of sticking to a general purpose-language, in the purest Unix tradition, they have used awk for checking the regexes against a grammar, and egrep to match the DNA strings against the regex.
Next time I expect them to write a bash genetic program. All in all, as I said, an interesting paper that was presented in PPSN 2008.