P2P technology for computing tasks does not always mean P2P computing

What would you think of a Bugatti Veyron limited to a maximum speed of 20 km/h? Mmmm… YES!! Give me back the money!!

… well, that’s roughly my feeling when I read a paper on P2P computing that restricts its scalability analysis to a handful of nodes. Of course, the analogy is not completely fair, since performing a real massively distributed (and decentralized) experiment presents challenges that, in most cases so far, remain beyond the scope of state-of-the-art research. This happens, for instance, with P2P environments applied to optimization, and more precisely to Evolutionary Computation.

Usually, you can find two different approaches for testing P2P optimization: either using simulations or using a “grid-like style” in real environments. Each approach presents its own advantages and drawbacks:

  • Using a real P2P environment (we performed a study like that using an 8×2 cluster). The advantage here is that the design has to face real restrictions, such as communication or evaluation times, message-passing constraints, or fault tolerance. Nevertheless, it is extremely difficult to set up a realistic and proper environment to test a model.

When you face a real environment, you find that:

  • Large numbers of resources are hard to gather.
  • Scalability is hard to study. With only a few nodes, no real P2P computing is going on, since no conclusions about large-scale behaviour can be drawn. On the other hand, once there are a good few dozen peers, other issues such as fault tolerance arise.

In last Friday’s paper seminar our team discussed the following paper, which uses the “grid-like style” for testing:

in which the authors propose a hybrid model combining islands with cellular EAs: every peer holds an island, and every island runs a cellular EA. As previously noted for grid-like testing, the scalability analysis is quite limited (up to 10 peers); additionally, the algorithm yields its best performance in the five-peer scenario, pointing to really poor scalability of the model. Nevertheless, the fact is that P-CAGE outperforms canonical EAs, which makes it a valid solution based on P2P technology; it is just that such a solution is not really scalable and therefore cannot really be understood as P2P computing.
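To make the hybrid structure more concrete, here is a minimal Python sketch of the idea as I read it: every peer hosts one island, and each island’s population is arranged as a cellular (grid-based) EA where individuals only interact with their von Neumann neighbours, while an island-model layer occasionally migrates individuals between peers. All names, parameters and policies below are my own illustrative assumptions, not the authors’ implementation.

```python
import random

GRID = 4          # assumed island grid side (4x4 = 16 individuals per island)
GENES = 16        # assumed genome length; toy OneMax-style fitness

def fitness(ind):
    # toy fitness: number of ones in the bitstring
    return sum(ind)

def neighbours(x, y):
    # von Neumann neighbourhood on a toroidal grid, typical of cellular EAs
    return [((x - 1) % GRID, y), ((x + 1) % GRID, y),
            (x, (y - 1) % GRID), (x, (y + 1) % GRID)]

class Island:
    """One peer holds one island; the island runs a cellular EA on a grid."""
    def __init__(self):
        self.grid = [[[random.randint(0, 1) for _ in range(GENES)]
                      for _ in range(GRID)] for _ in range(GRID)]

    def step(self):
        # synchronous cellular step: each cell mates with its best neighbour
        new = [[None] * GRID for _ in range(GRID)]
        for x in range(GRID):
            for y in range(GRID):
                mate = max((self.grid[nx][ny] for nx, ny in neighbours(x, y)),
                           key=fitness)
                child = [a if random.random() < 0.5 else b
                         for a, b in zip(self.grid[x][y], mate)]
                if random.random() < 0.1:            # mutation
                    i = random.randrange(GENES)
                    child[i] = 1 - child[i]
                new[x][y] = max(self.grid[x][y], child, key=fitness)
        self.grid = new

    def best(self):
        return max((ind for row in self.grid for ind in row), key=fitness)

def migrate(islands):
    # island-model layer: each island sends its best individual to the next
    # island in a ring, replacing a random cell there (illustrative policy)
    for i, isl in enumerate(islands):
        target = islands[(i + 1) % len(islands)]
        x, y = random.randrange(GRID), random.randrange(GRID)
        target.grid[x][y] = list(isl.best())

if __name__ == "__main__":
    peers = [Island() for _ in range(5)]   # five peers, one island each
    for gen in range(50):
        for isl in peers:
            isl.step()
        if gen % 10 == 0:
            migrate(peers)
    print("best fitness:", max(fitness(isl.best()) for isl in peers))
```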

To conclude, I do think that simulations can benefit the understanding of complex interactions in P2P EAs at this stage of research, preventing situations like the one shown above; afterwards, there will be time to validate the models in real environments, letting theory meet practice.

[PACT’08][PABA Workshop I] Addressing Churn in P2P EA

This week the first Workshop on Parallel Architectures and Bioinspired Algorithms is being held in Toronto (Canada), in conjunction with the prestigious conference on Parallel Architectures and Compilation Techniques (PACT).

Extending our line of work on P2P EAs, we have presented the work:

In this paper we analyse the robustness of a Peer-to-Peer (P2P) Evolutionary Algorithm (EA) subject to the following dynamics: peers leave the system independently of each other, causing a collective effect known as churn. The algorithm has been designed to tackle large instances of computationally expensive problems and, in this paper, we assess its behaviour under churn. To that end, we have performed a scalability analysis in five different scenarios using the Massively Multimodal Deceptive Problem as a benchmark. In all cases, the P2P EA reaches the success criterion without a penalty on the response time. The key to the algorithm’s robustness is ensuring enough peers at the beginning of the experiment. Some of them leave, but those that remain are enough to guarantee reliable convergence.
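The churn model described in the abstract, peers leaving the system independently of each other, is easy to simulate. The sketch below uses an entirely made-up per-step departure probability just to estimate how many of the initial peers survive a run, which is the intuition behind starting with “enough peers”; it is not the paper’s actual experimental setup.

```python
import random

def surviving_peers(initial_peers, leave_prob, steps, trials=1000):
    """Monte Carlo estimate of how many peers remain after `steps` time
    steps when each peer leaves independently with probability `leave_prob`
    per step (illustrative churn model only)."""
    totals = []
    for _ in range(trials):
        alive = initial_peers
        for _ in range(steps):
            alive -= sum(random.random() < leave_prob for _ in range(alive))
        totals.append(alive)
    return sum(totals) / trials

if __name__ == "__main__":
    # e.g. start with 400 peers, 1% chance of leaving per step, 100 steps
    print(surviving_peers(400, 0.01, 100))   # roughly 400 * 0.99**100 ≈ 146
```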

Do evolutionary algorithms dream of interconnected volunteers?

Against the 360 TFlops of Blue Gene, which currently ranks first on the top500 list of proprietary supercomputers, the BOINC computing system achieves on average around 521 TFlops. This computing capacity comes from a total of approximately 2 million nodes, of which around 400,000 are connected to the system through the Internet at any given time. The resources of these nodes are donated voluntarily by users who decide to process work packages on their own machines; this is what is known as volunteer computing.

Within volunteer computing, the P2P paradigm emerges as an alternative for interconnecting huge amounts of distributed resources in a decentralized way, that is, without any central control system (although this is the strictest view). It is fair to say that P2P computing is still in its early stages and, as such, many of the works trying to apply this paradigm to classic computing problems represent timid advances.

This is the case of the work by Wei-Po Lee:

where he presents a parallel evolutionary computation system that tries to harness the resources of a computer network through an agent-based approach, with the agents developed using the JADE framework. The idea of P2P computing underlies the article, although there are numerous issues it does not address (e.g. fault tolerance, massively parallel capability, or the fact that the evolution phases are synchronized), which the author himself points out in the conclusions section.

Schematically, Wei-Po Lee proposes three types of agents (a rough sketch follows the list):

  • Status agents, which monitor on each node whether its resources are available or not.
  • Mobile agents, in charge of carrying out the evolution phases on a population of individuals. Each of these agents would correspond to an island within the island model.
  • Synchronization agent: a single agent running on the main node (so there is a certain degree of centralization). This agent checks the state of the resources by communicating with the status agents and assigns mobile agents to free resources.
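The actual system is built on JADE (Java agents); just to fix ideas, the following Python sketch mimics the division of labour among the three agent types. All class and method names are my own illustrative inventions, not the paper’s API.

```python
import random

class StatusAgent:
    """Runs on every node and reports whether its resources are free."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.busy = False

    def is_available(self):
        return not self.busy

class MobileAgent:
    """Carries one island's population and runs the evolution phases on
    whatever node it gets assigned to."""
    def __init__(self, island_id, population_size=20):
        self.island_id = island_id
        self.population = [random.random() for _ in range(population_size)]

    def evolve_one_generation(self):
        # placeholder for selection/crossover/mutation on this island
        self.population = sorted(self.population, reverse=True)[:10]
        self.population += [random.random() for _ in range(10)]

class SynchronizationAgent:
    """Single agent on the main node (the centralised part): polls the
    status agents and assigns mobile agents to free nodes."""
    def __init__(self, status_agents, mobile_agents):
        self.status_agents = status_agents
        self.mobile_agents = mobile_agents

    def schedule(self):
        assignments = {}
        free_nodes = [s for s in self.status_agents if s.is_available()]
        for agent, node in zip(self.mobile_agents, free_nodes):
            node.busy = True
            assignments[agent.island_id] = node.node_id
        return assignments

if __name__ == "__main__":
    nodes = [StatusAgent(i) for i in range(8)]     # the 8-node network
    islands = [MobileAgent(i) for i in range(8)]
    sync = SynchronizationAgent(nodes, islands)
    print(sync.schedule())
```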

Several tests of the architecture have been carried out on an 8-node network, solving the problem of evolving robot controllers by means of Genetic Programming. The results show that the solution time improves linearly with the number of nodes.

From a practical point of view it is a good piece of work, although, since you cannot have everything, it is still far from a formal methodology for massively parallelizing evolutionary algorithms on P2P volunteer-computing environments. So let us not make any noise and let the evolutionary algorithms keep dreaming of interconnected volunteers… for the time being.

Paper on distributed evolutionary computation uploaded to arXiv

A couple of days ago I uploaded our paper Self-adaptive Gossip Policies for Distributed Population-based Algorithms, a collaboration among a bunch of researchers from 4 different institutions trying to come up with the perfect algorithm, one able to extract all the juice from heterogeneous, peer-to-peer, dynamic networks. Here’s the abstract:

Gossipping has been shown to be an efficient mechanism for spreading information in P2P networks. Within the context of P2P computing, we propose the so-called Evolvable Agent Model for distributed population-based algorithms, which uses gossipping as its communication policy and represents every individual as a self-scheduled single thread. The model avoids obsolete nodes in the population by defining a self-adaptive refresh rate that depends on the latency and bandwidth of the network. Such a mechanism balances the migration rate against the congestion of the links, pursuing global population coherence. We perform an experimental evaluation of this model on a real parallel system and observe how solution quality and algorithm speed scale with the number of processors under this seamless approach.
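The abstract does not spell out the actual formula for the self-adaptive refresh rate, so the sketch below only illustrates the general idea of throttling gossip to the measured state of a link; the scaling rule, constants and example figures are my assumptions, not the paper’s.

```python
def refresh_interval(latency_s, bandwidth_bps, message_bytes,
                     base_interval_s=1.0):
    """Illustrative self-adaptive refresh rule: the slower the link
    (higher latency, lower bandwidth for the message size), the less
    often a node gossips, so migration does not congest the network."""
    transfer_time = (message_bytes * 8) / bandwidth_bps
    # never gossip faster than the link can actually deliver a message
    return max(base_interval_s, latency_s + transfer_time)

# example: 50 ms latency, 1 Mbit/s link, 10 kB migration message
print(refresh_interval(0.05, 1_000_000, 10_000))  # ~0.13 s -> 1.0 s floor
```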

It has also been submitted to EuroPar 2007. Let’s cross our fingers… (update: it wasn’t accepted).

(Via BloJJ)