Diversity-enhanced Genetic Algorithms for Dynamic Optimization

December 18, 2009

Yesterday, in Lisbon, I defended my PhD thesis on Evolutionary Algorithms for dynamic optimization.  The pdf of the document is here, and this is the abstract:

Many industrial applications have dynamic components that lead to variations of the fitness function and Genetic Algorithms (GAs) adaptiveness is an appropriate tool to solve this type of problems. The thesis proposes two new evolutionary methods to tackle dynamic problems. The first acts upon mating and avoids crossover between similar individuals, via a self-regulated mechanism, thus preserving genetic diversity. The second is a new mutation operator able to evolve self-regulated mutation rates with a particular distribution that is suited for dynamic optimization. Finally, an efficient hybrid method that combines both strategies is proposed. The objective and main claim is the possibility of designing nature-inspired protocols for GAs that are efficient when evolving on dynamic environments while preserving algorithms’ complexity and not requiring a priori information about the problem.

The proposals are tested on a wide range of problems and are able to outperform frequently other GAs, namely when the frequency of change is lower. The hybrid scheme proves to be particularly effective since it broadened the range of dynamics in which each method by itself excels. As projected, the proposed techniques are robust and do not increase parameters’ set, thus fulfilling necessary conditions for real-world applications.


New operator for the Traveling Salesman Problem

December 11, 2009

As announced, Darrell Whitley gave a talk on the new operator his team has designed for the classic TSP.
L. Darrell Whitley mostrando el recorrido del viajante de comercio (by jmerelo)
After doing a review of all the classical techniques and operators used to solve this problem, he introduces his new crossover operator that tries not to introduce new edges in the offspring, acting thus as a pure recombination operator, instead of a crossover+mutation. This new operator is called perfect crossover, and is first respectful, and then it transmits alleles. It’s not always possible to apply it, but in samples studied it works 95% of the time. When used in combination with other mutation operators in an evolutionary algorithm, it yields very good results; not completely competitive with the best results known, but almost there.
It’s been great to have such a great talker in our weekly seminars. You can also check out his paper Tunneling between optima: partition crossover for the traveling salesman problem online


[IJHPSA] Resilience to Churn of a Peer-to-Peer Evolutionary Algorithm

December 3, 2009

In this paper we analyse the resilience of a Peer-to-Peer (P2P) Evolutionary Algorithm (EA) subject to the following dynamics: computing nodes acting as peers leave the system independently from each other causing a collective effect known as churn. Since the P2P EA has been designed to tackle large instances of computationally expensive problems, we will assess its behaviour under these conditions, by performing a scalability analysis in five different scenarios using the Massively Multimodal Deceptive Problem as a benchmark.In all cases, the P2P EA reaches the success criterion without a penalty on the runtime. We show that the key to the algorithm resilience is to ensure enough peers at the beginning of the experiment; even if some of them leave, those that remain contain enough information to guarantee a reliable convergence.



Tunneling Between Optima: a new operator for the Traveling Salesman Problem

November 28, 2009

El lunes que viene día 30 de noviembre, el profesor Darrell Whitley dará una charla con el título superior y el siguiente contenido

The first part of this talk will present a tutorial on local search
and evolutionary algorithms for the Traveling Salesman
Problem (TSP). The second part of the talk will introduce a
new recombination operator for the TSP. The operator
can recombine parents in such a way that offspring inherit
all shared edges. Offspring also only inherit edges from parents.
When applied to solutions that are already local optima,
the operator automatically generates samples of new
local optima without applying any additional local search.

La charla tendrá lugar a las 12 de la mañana en el salón de Grados de la ETSIIT.


CHAC, the Military Ants in the press around the world

November 9, 2009

Last Friday, 6th of November some news related to the mini-simulator which we have developed for the implementation and test of the algorithms based in the ants’ behaviour to search for the best path (attending to the speed and safety) in a military battlefield, were published. :)

This new appeared firstly (in May some days after my PhD Tesis lecture) at the Press  site of the University of Granada.

But last Friday, the new was published by some other webs, as AlphaGalileo, EurekAlert and RedOrbit. Then it appeared in many other scientific news webs (AECC, SoftPedia, ScienceDaily, Physorg, LabSpaces, TMCnet, First Science or Science Centric).

In addition it has appeared in some general news online sites (mainly in Asia): Sindh Today, New Kerala, TopNews.in, South Asia News, CBNews and Breaking News among others.

Finally, the new has been also introduced in some blogs, forums and social nets (Blog.Taragana, BizFace, Defense Forum India, Twitter).

Thanks a lot for the interest in our work and let us know if you need some documentation or other stuff concerning to it. ;)

The application can be downloaded from our group’s project site in forja.rediris (is the project hCHAC), and it has been developed under a GPL license, so it is free to download, use and even modify, but following this license restrictions. :)

Best regards.


[IDC'09] Studying the Cache Size in a gossip-based evolutionary algorithm

October 19, 2009

Last week we were presenting the work Studying the Cache Size in a Gossip-Based Evolutionary Algorithm [BibTex] in the 3rd International Symposium on Intelligent Distributed Computing hold in Ayia Napa (Cyprus).

Gossiping is a self-organized and decentralized approach to distribute algorithms through Peer-to-Peer (P2P) networks.
Based on such an approach, the Evolvable Agent model is a P2P Evolutionary Algorithm (EA) whose
population structure is defined by the gossiping protocol newscast, a protocol that behaves asymptotically as a small-world graph. This paper explores the impact of different cache sizes on the algorithm performance given that cache size is the only tunable parameter in newscast. To this aim, the problem generator wP-PEAKS and the multimodal deceptive problem MMDP have been used as benchmarks.
Results show that the quality of the solutions and the run-time of the algorithm are not altered when changing the settings of the cache size. This fact points out that newscast is a robust gossiping protocol for tackling distributed evolutionary computation.


Ants and Estimation of Distribution Algorithms (ECAL 2009)

September 21, 2009

Last week I went to Budapest to present the paper “An Ant-Based Rule for UMDA’s Update Strategy” in the 10th European Conference on Artificial Life (ECAL 2009). ECAL is one of the leading congresses in the area and some of the most relevant work in the Artificial Life research field is presented there in first hand. It is held every two years and this time the capital of Hungary was chosen to host the event. The Academy of Sciences, in Roosevelt tér (square), on the banks of the Danube and with a perfect view on the Castle and the hills of Buda was ECAL’s headquarters for 4 days.

Only 30% of the accepted papers were selected for oral presentation. The remaining was scheduled for poster sessions (although all the accepted papers were published in full-length in two LNCS volumes) that lasted…the whole day! I cannot understand why not all the congresses follow a line similar to PPSN (a poster-only congress, with 90 minutes sessions) when it comes to poster sessions, but ECAL’s strategy is, my opinion, particularly ineffective and exhausting.

talksroom
ECAL 2009, Budapest, Academy of Sciences

As for our paper, it presents a study on an alternative update strategy for the Univariate Marginal Distribution Algorithm based on the ACO computational paradigm and first presented here. The aim is to control the balance between exploration and exploitation in order to avoid diversity loss, reduce the optimal population size and improve the scalability of the algorithm on hard problems. The results confirmed the hypothesis. This is the abstract:

This paper investigates an update strategy for the Univariate Marginal Distribution Algorithm (UMDA) probabilistic model inspired by the equations of the Ant Colony Optimization (ACO) computational paradigm. By adapting ACO’s transition probability equations to the univariate probabilistic model, it is possible to control the balance between exploration and exploitation by tuning a single parameter. It is expected that a proper balance can improve the scalability of the algorithm on hard problems with bounded difficulties and experiments conducted on such problems with increasing difficulty and size confirmed these assumptions. These are important results because the performance is improved without increasing the complexity of the model, which is known to have a considerable computational effort.


Autonomic computing at the Friday paper seminar

September 11, 2009

Autonomic computing is one of those things that involves a certain amount of hand-waving but that corresponds to a really simple idea: tell a computing system what’s expected of it, instead of a step-by-step account of what we want it to do. We can tell a network we expect 99% uptime, or a database system to improve its performance by 5%, or a webserver to try and serve ten thousand simultaneous connections. Most systems will look back at you slack-jawed, but, well, that’s why the paper we’re dealing with today is called Research challenges of autonomic computing, which is written by Jeffrey O. Kephart, an IBM researcher which has produced lots of fine papers on the topic.
What are, then, the challenges and how does all that relates to our own research? The idea is to achieve what are collectively called self-star properties: self configuration, self-management, self-protection and self-healing. That is, let the system itself handle as a body, with each component being like an organ that performs its own function but is also aware that must work for the collective good and obtain an emergent property like body temperature or oxygen supply. That double function leads to architectural challenges, but also to algorithmic problems like how to be aware of what the system at large is doing and how it is behaving, how to learn new responses to changes in environment, and, eventually, how to optimize the system configuration and behaviour to attain desired targets set by the user.
And that’s where our own research comes in: the evolvable-agent architecture, to a point, configures an overlay network to take advantage of all its resources but it’s not there yet: it would be necessary to include more self-star properties such as self-management and self-configuration, so that the user should have to include just performance targets (obtain solution in a certain amount of time), and the system would set its parameters itself to attain it. A step in the right direction, anyways.


[GECCO'09] Improving Genetic Algorithms Performance via Deterministic Population Shrinkage

July 20, 2009

This year the Genetic and Evolutionary Computation Conference (GECCO) took place in Montréal (Québec-Canada) where we were presenting our last work in collaboration with the Laboratoire de Vision et Systèmes Numériques de l’Université Laval in Quebec City:

Despite the intuition that the same population size is not needed throughout the run of an Evolutionary Algorithm (EA), most EAs use a fixed population size.
This paper presents an empirical study on the possible benefits of a Simple Variable Population Sizing (SVPS) scheme on the performance of Genetic Algorithms (GAs). It consists in decreasing the population for a GA run following a predetermined schedule, configured by a speed and a severity parameter. The method uses as initial population size an estimation of the minimum size needed to supply enough building blocks, using a fixed-size selectorecombinative GA converging within some confidence interval toward good solutions for a particular problem. Following this methodology, a scalability analysis is conducted on deceptive, quasi-deceptive, and non-deceptive trap functions in order to assess whether SVPS-GA improves performances compared to a fixed-size GA under different problem instances and difficulty levels. Results show several combinations of speed-severity where SVPS-GA preserves the solution quality while improving performances, by reducing the number of evaluations needed for success.


Conferencias en el Máster en Tecnologías Informáticas Avanzadas de la UHU

May 25, 2009

El pasado viernes asistí a tres charlas que impartó el profesor Francisco Herrera del departamento de CCIA de la Universidad de Granada. En la primera de ellas, “Data Mining: From the Top 10 Algorithms to the New Challenges”, el profesor Herrera habló de los 10 mejores algorimos que se aplican a la minería de datos según (ICDM ‘06), así como de los 10 mayores retos de la minería de datos actual. La siguente charla, titulada “Data Complexity” estaba dedicada a los intentos de dar fundamento teórico a la minería de datos. Se habló de los modos de caracterizar los conjuntos de datos con el fin de recomendar que algoritmo funciona mejor con un determinado conjunto de datos. La última de las tres conferencias “How must I do my Experimental Study?” trataba los test estadísticos que deben aplicarse a los experimentos en un trabajo de investigación con el fin de justificar, por ejemplo, la calidad de un nuevo algoritmo. Se trató el uso de métodos estadísticos no-paramétricos que funcionan bien en conjuntos de datos que no siguen una distribución normal, cosa que sucede en la mayoría de los problemas reales. Este tipo de test, en palabras del profesor Herrera, empiezan a ser indispensables cuando se quiere publicar en revistas científicas de primer nivel. A continuación aportaré alguna información adicional sobre estas conferencias.

“Data Mining: From the Top 10 Algorithms to the New Challenges”

Lo candidatos al Top 10 debían cumplir algunos requisitos, como por ejemplo que tuviesen como mínimo 50 citas recientes.  De los inicialmente propuestos quedaron sólo 18 algoritmos finalistas. Los resultados del estudio fueron publicados en el artículo “Top 10 algorithm in data mining” y posteriormente en un libro con el mismo título. La siguiente tabla muestra a los finalistas clasificados en categorías.

Categoría Finalistas
Classification 1. C4.5, 2. CART, 3. K Nearest Neighbours (kNN), 4. Naive Bayes
Statistical Learning 5. SVM,  6. EM
Association Analysis 7. Apriori8. FP-Tree
Link Mining 9. PageRank, 10. HITS
Clustering 11. K-Means,  12. BIRCH
Bagging and Boosting 13.  AdaBoost
Sequential Patterns 14. GSP,  15. PrefixSpan
Integrated Mining 16. CBA
Rough Sets 17. Finding reduct
Graph Mining 18. gSpan

En la segunda parte de esta primera charla, se presentaron los 10 grandes retos en minería de datos según (ICDM 2005). No he conseguido más información sobre la fuente del estudio. Ampliaré esta información cuando consiga las transparencias utilizadas en la charla. Los principales desafíos se engloban en las siguientes áreas:

  1. Teoría. Definición de una teoría unificada ¿por qué funcionan los algoritmos? ¿cuales son los fundamentos del Data Mining?
  2. Escalabilidad. Algoritmos que manejen gran cantidad de datos a gran velocidad.
  3. Secuencialidad y series temporales. Técnicas para predecir acontecimientos a partir de datos históricos.
  4. Datos complejos. Obtención conocimiento complejo a partir de datos complejos.
  5. Minería de datos en redes sociales.
  6. Minería de datos en bases de datos distribuidas.
  7. Minería de datos para problemas biológicos y ambientales
  8. Procesos de la minería de datos. Automatización, caracterización de los problemas, etc.
  9. Seguridad, privacidad e integridad.
  10. Minería de datos sobre datos no estáticos y datos cuya observación sea costosa.

Buscando más información sobre estos desafíos futuros he encontrado un artículo reciente sobre este tema titulado “Future tends in data mining“.

“Data Complexity”

Pretende dar respuesta a la pregunta “dado un problema de clasificación, ¿qué clasificador es mejor para el?”.  El libro de referencia es Data Complexity in Pattern Recognition. Para poder responder a esa pregunta se necesitan medidas de complejidad de los conjuntos de datos. Podemos encontrar diferentes medida de la complejidad de los datos en el trabajo de Tin Kam Ho, titulado “Measures of Geometrical Complexity in Classification Problems“.  Os dejo un enlace a las transparencias de la charla.

“How must I do my Experimental Study?”

Cuando nuestro trabajo de investigación consiste en el desarrollo de un nuevo algoritmo, es muy importante compararlo con con algoritmos de calidad. Muchos artículos son rechazados en revistas de prestigio, cuando se compara un nuevo algoritmo con versiones antiguas de nuestros competidores y no se incluyen comparativas con las últimas versiones más eficientes y precisas.  Justificar convenientemente la bondad del nuevo algoritmo utilizando  conjuntos de datos conocidos y métodos estadísticos es igualmente importante. En la dirección del grupo de investigación “Soft Computing and Intelligent Information Systems” que dirige el profesor Herrera, podemos encontrar información sobre estos métodos estadísticos y un software desarrollado por el profesor Salvador García, que genera los gráficos para incluir en documentos Latex utilizando a la entrada un fichero en formato CVS.

Data Mining: From the Top 10 Algorithms to the New Challenges