Self-organized criticality in software repositories, poster presented at ECAL 2017


The European Conference on Artificial Life, or ECAL, is not one of our usual suspects. Although we have attended from time to time, and even organized it back in 95 (yep, that is a real web page from 1995, minus the slate gray background), it is a conference I quite enjoy, together with other artificial-life-related conferences. Artificial life was quite the buzzword in the 90s, but nowadays, with all the deep learning and AI stuff, it has gone out of fashion; last time I attended, ten years ago, it seemed more crowded. Be that as it may, I presented a tutorial and a poster about our work on looking for a critical state in software repositories. This is the poster itself, and there is a link to the open access proceedings, although, as you know, all our papers are online and you can obtain that one (and a slew of others) from our repository.
This is a line of research we have been working on for a year now, since this initial paper, where we examined a single repository, that of the Moose Perl module. We are looking for patterns that allow us to say whether repositories are in a critical state or not. Since they are completely artificial systems, engineered artefacts, looking for self-organized criticality might seem like a lost cause. On the other hand, it really clicks with our own experience when writing a paper, or anything, really: you write in long stretches, and then you do small sessions where you change a line or two.
This paper, which looks at all kinds of open source projects, from Docker to vue.js, checks three different things: long-distance correlations, scale-free behavior of the changes, and pink noise in the spectral density of the time series of changes. And we do find it, almost everywhere: most big repos, with more than a few hundred commits, exhibit it, independently of their language or origin (hobbyist or company).
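As an illustration, this is roughly how the raw material for those measures, the per-commit change sizes, can be pulled from a local clone. This is a minimal sketch, not necessarily the exact pipeline we used in the paper; the repository path is a placeholder:

```python
# Minimal sketch: total lines changed (added + deleted) per commit,
# collected from `git log --numstat`. Assumes a local clone and `git`
# on the PATH; merge commits simply contribute a size of zero.
import subprocess

def commit_sizes(repo_path):
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--format=@%H"],
        capture_output=True, text=True, check=True,
    ).stdout
    sizes, current, in_commit = [], 0, False
    for line in log.splitlines():
        if line.startswith("@"):            # a new commit begins
            if in_commit:
                sizes.append(current)
            in_commit, current = True, 0
        elif "\t" in line:
            added, deleted = line.split("\t")[:2]
            if added.isdigit() and deleted.isdigit():   # "-" marks binary files
                current += int(added) + int(deleted)
    if in_commit:
        sizes.append(current)
    return sizes                            # newest commit first
```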
There is still a lot of work ahead. What are the main mechanisms for this self-organization? Are there any exceptions? That will have to wait until the next conference.


Asynchronous, heterogeneous, pool-based evolutionary algorithms at GECCO 2017

Fresh back from GECCO 2017, which is probably the main event for evolutionary algorithms and other metaheuristics. Together with the conference proper, there are workshops and tutorials. Last year we achieved a full score, with papers, posters and tutorials; unfortunately, not this year.
We’re happy, though, with the two papers that were accepted in the EvoSoft workshop, which we usually attend, and in the BBOB benchmarking workshop. Both used the same thing: EvospaceJS, Mario’s framework for working with tuple-space, pool-based evolutionary algorithms. The idea of this pool is to decouple the algorithms from the population, and as soon as you do that, a world of possibilities opens up, like using different clients on the same pool. In the EvoSoft paper, evospace-js: asynchronous pool-based execution of heterogeneous metaheuristics, we presented the general framework and a proof of concept which combined PSO and evolutionary algorithms, with very interesting results. Here’s the rather laconic presentation, which is one more reason to check out the paper.
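To give an idea of the pattern (this is a hypothetical sketch; the names and signatures are made up and do not mirror the actual EvospaceJS API): the pool stores the population, and any worker, whatever its metaheuristic, just takes a sample, improves it, and puts it back.

```python
# Hypothetical sketch of the pool pattern: a population store decoupled
# from any particular algorithm, shared by heterogeneous workers.
import random

class Pool:
    def __init__(self, individuals):
        self.individuals = list(individuals)

    def take_sample(self, n):
        """Remove and return up to n random individuals."""
        random.shuffle(self.individuals)
        sample, self.individuals = self.individuals[:n], self.individuals[n:]
        return sample

    def put_back(self, individuals):
        self.individuals.extend(individuals)

def worker(pool, improve, sample_size=16, iterations=100):
    """Generic client: `improve` can be one GA generation, one PSO sweep,
    or anything else. Several such workers can run against the same pool."""
    for _ in range(iterations):
        sample = pool.take_sample(sample_size)
        pool.put_back(improve(sample))
```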
In the second paper, Benchmarking a pool-based execution with GA and PSO workers on the BBOB noiseless testbed, we did what the title says: benchmark a pool with heterogeneous GA and PSO workers on the standard BBOB noiseless test suite.
All in all, EvospaceJS and NodIO, the two frameworks we work with, offer a nice platform for experimenting with different kinds of algorithms that can be easily transported to the cloud and adapted to volunteer computing environments. Whatever the case, the pool also has interesting dynamics of its own, which influence the workings of the evolutionary algorithms. We will surely continue tapping this source of interesting insights on evolutionary models.

Finding self-organized criticality in collaborative work via repository mining (IWANN’2017)


I would like to draw your attention to three points:

  • Development teams eventually become complex systems, mainly in collaborative work environments.
  • Relations and collaborations take place through the environment.
  • Pattern mining and analysing social-based information is a complex problem.

Thus, our main objective was to study new methodologies for analysing patterns of collaboration in collaborative work environments, since this is a complex problem that needs new tools to explore and analyse relations-based information.

Also, we wanted to explore and analyse relations-based data, e.g. to answer the question “Do developers self-organize?”, and finally, to contribute to open science tools and methodologies.

In Statistical Physics, criticality is defined as a type of behaviour observed when a system undergoes a phase transition. A state on the edge between two different types of behaviour is called the critical state, and in this state the system is at criticality.

A clear example is the sandpile model: if we add one grain to the pile, on average the steepness of the slopes increases. However, the slopes might evolve to a critical state where a single new grain of sand is as likely to settle on the pile as to trigger an avalanche.

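These dynamics are easy to reproduce. Here is a minimal Bak-Tang-Wiesenfeld sandpile sketch (grid size, threshold and number of grains are arbitrary choices for the illustration): most grains settle quietly, a few trigger avalanches of all sizes.

```python
# Minimal Bak-Tang-Wiesenfeld sandpile: drop grains one at a time and
# record the avalanche size (number of topplings) each drop causes.
import random

SIZE, THRESHOLD = 20, 4
grid = [[0] * SIZE for _ in range(SIZE)]

def drop_grain():
    """Drop one grain at a random site and topple until stable."""
    x, y = random.randrange(SIZE), random.randrange(SIZE)
    grid[x][y] += 1
    toppled, unstable = 0, [(x, y)]
    while unstable:
        i, j = unstable.pop()
        if grid[i][j] < THRESHOLD:
            continue
        grid[i][j] -= THRESHOLD
        toppled += 1
        if grid[i][j] >= THRESHOLD:         # may need to topple again
            unstable.append((i, j))
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < SIZE and 0 <= nj < SIZE:   # edge grains fall off
                grid[ni][nj] += 1
                unstable.append((ni, nj))
    return toppled

# After a transient, avalanche sizes follow a power law.
avalanches = [drop_grain() for _ in range(50_000)]
```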

In this report we examined four repositories where the collaborative writing of scientific papers takes place using GitHub. Repositories of a certain “length”, with more than 50 commits (changes), were chosen. Thus, we could analyse the changes in files, looking for the existence of:

  • a scale-free structure
  • long-distance correlations
  • pink noise

Several macro measures were extracted from the size of changes to the files in the repository (two of them are sketched in code after this list):
1. Sequence of changes
2. Timeline of commit sizes
3. Change sizes ranked in descending order
4. Long-distance correlations
5. Presence of pink noise (1/f)
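As an illustration, this is how measures 3 and 4 can be computed, assuming `sizes` is the per-commit change-size series (extracted, for instance, as in the sketch in the first post above):

```python
def ranked(sizes):
    """Measure 3: change sizes in descending order. A roughly straight
    line on log-log axes hints at a scale-free distribution."""
    return sorted(sizes, reverse=True)

def autocorrelation(sizes, lag):
    """Measure 4: autocorrelation of the series at a given lag; values
    staying significant at large lags indicate long-distance correlations."""
    n = len(sizes)
    mean = sum(sizes) / n
    var = sum((s - mean) ** 2 for s in sizes)
    cov = sum((sizes[t] - mean) * (sizes[t + lag] - mean)
              for t in range(n - lag))
    return cov / var
```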

Paying attention to the sequence of changes and the timeline of commit sizes (1, 2), no particular “rhythm” can be seen, neither daily nor in the changes themselves. Repositories can be static for a long time and then experience a sudden burst of changes (an avalanche), which is a symptom of the underlying self-organized critical state.

After plotting change sizes ranked in descending order (3), it can be seen that some authors send atomic changes while others write big paragraphs or sections before committing them. There is also a tail corresponding to the big changes made at the end, just before submitting the paper.

The long-distance correlation plots show that long-distance autocorrelations appear in different places depending on the repository, but they are present in most cases.

Finally, pink noise refers to any noise with a power spectral density of the form 1/f. To see the presence of pink noise clearly, the spectrum should show a slope equal to -1 on a log-log scale. However, there is no clear downward trend. Maybe this could appear later on in development, or maybe we could see that trend in repositories with a longer history. In any case, the fact that this third characteristic is not present does not obscure the other two, which appear clearly.
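For reference, this is a compact way to estimate that slope with numpy (a sketch, not necessarily the exact procedure used in the paper):

```python
# Periodogram of the commit-size series, then a linear fit of
# log(power) vs. log(frequency) to estimate the spectral slope.
import numpy as np

def psd_slope(sizes):
    series = np.asarray(sizes, dtype=float)
    series -= series.mean()
    power = np.abs(np.fft.rfft(series)) ** 2
    freqs = np.fft.rfftfreq(len(series))
    mask = (freqs > 0) & (power > 0)        # drop the DC bin and empty bins
    slope, _ = np.polyfit(np.log(freqs[mask]), np.log(power[mask]), 1)
    return slope        # ~0: white noise, ~-1: pink (1/f), ~-2: brown noise
```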

In conclusion, we have shown that, after analysing several repositories for scientific paper writing, they are in a critical state, as (1) changes have a scale-free form, (2) there are long-distance correlations, and (3) pink noise has been detected (although only in some cases).

For the sake of reproducibility, and since we support open science, both the programs and the data related to this report are available online at the repository “Measuring progress in literature and in other creative endeavours, like programming”:
http://github.com/JJ/literaturame

The slides used to present this work at the IWANN’2017 congress are available at:
https://es.slideshare.net/pacvslideshare/finding-selforganized-criticality-in-collaborative-work-via-repository-mining

I Reunión Internacional de Metabolómica y Cáncer

On May 26, Víctor Rivas, one of the members of the GeNeura group, will give a talk titled “Interpretación de resultados mediante herramientas de minería de datos” (interpreting results using data mining tools) as part of the I Reunión Internacional de Metabolómica y Cáncer (First International Meeting on Metabolomics and Cancer). The event will take place on May 26, 2017, and has been organized by the Fundación MEDINA and the Complejo Hospitalario de Jaén.

The meeting will be held at the Parador de Jaén, and attendance is completely free after registering at http://www.esmeeting.es/i-reunion-internacional-de-metabolomica-y-cancer/

Our TORCS driving controller presented at EvoGAMES 2017

Last week, @jjmerelo presented our work titled “Driving in TORCS using modular fuzzy controllers” at EvoGAMES 2017 (inside Evo* 2017).

This paper presents a novel car racing controller for TORCS (The Open Racing Car Simulator), based on the combination of two fuzzy subcontrollers: one for setting the speed and one for controlling the steering angle. The obtained results are quite promising, as the controller is competitive even against very tough TORCS teams.

The abstract of the paper is:

When driving a car it is essential to take into account all possible factors; even more so when, as in the TORCS simulated race game, the objective is not only to avoid collisions, but also to win the race within a limited budget. In this paper, we present the design of an autonomous driver for a racing car in a simulated race. Unlike previous controllers, which only used fuzzy logic approaches for either acceleration or steering, the proposed driver simultaneously uses two fuzzy controllers, for steering and for computing the target speed of the car at every moment of the race. They use the track border sensors as inputs and, for enhanced safety, also take into account the relative position of the other competitors. The proposed fuzzy driver is evaluated in practice and timed races, giving good results across a wide variety of racing tracks, mainly those that have many turning points.
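Just to illustrate the modular idea, here is a toy sketch: the membership functions, rules and sensor names below are invented and do not reproduce the paper’s actual rule base. Each subcontroller runs independently at every simulation tick.

```python
# Toy modular fuzzy controller: one subcontroller for target speed,
# one for the steering angle. All parameters are made up for the sketch.

def triangular(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_speed(front):
    """Subcontroller 1: map the frontal border sensor (m) to a speed (km/h)."""
    near = triangular(front, 0, 10, 70)     # a bend or wall is close
    far = triangular(front, 40, 180, 320)   # open track ahead
    total = near + far
    # Weighted mean of the rule consequents: slow when near, fast when far.
    return (near * 60 + far * 250) / total if total else 250

def fuzzy_steer(left, right):
    """Subcontroller 2: steer away from the closer border (normalized)."""
    close_left = triangular(left, -1, 0, 25)     # wall looming on the left
    close_right = triangular(right, -1, 0, 25)   # wall looming on the right
    total = close_left + close_right
    # Consequents: +1 steers right (left wall close), -1 steers left.
    return (close_left - close_right) / total if total else 0.0

# Each tick: target_speed = fuzzy_speed(front_sensor)
#            steering     = fuzzy_steer(left_sensor, right_sensor)
```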

There was an interactive presentation at the conference, together with a poster.

The paper is available online from: https://link.springer.com/chapter/10.1007/978-3-319-55849-3_24

Enjoy (and cite) it! :D

 

Entropy is the best predictor of volunteer computing system performance

In volunteer computing systems, the users get to decide when, and how much, their own computers are going to work on a particular problem. We have been working for some time on using volunteer computing for evolutionary algorithms, and our efforts have focused on having a scalable back end and on understanding how users behave. A priori, one would think that the more users, the better. However, the fact that these systems are asynchronous and have heterogeneous capabilities means that new users might not really contribute anything to the overall effort.
In this paper, presented at the EvoStar conference this week, we took a different approach to analyzing performance, using compression entropy computed over the number of contributions per minute. The higher the compression, the more uniform the contributions; the lower the compression, the more the contributions change over time. After some preliminary reports published in FigShare, we found a clear trend: increasing entropy makes the algorithm end much faster. This contradicts our initial guess, and also opens new avenues for the design of volunteer evolutionary computing systems, and probably other systems whose performance depends on diversity, such as evolutionary algorithms.
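The measure itself is simple to compute. A sketch with zlib, assuming `contributions` holds the number of contributions received per minute (the exact serialization we used may differ):

```python
# Compression ratio as an entropy proxy: a uniform series compresses
# well (low entropy), a constantly changing one does not (high entropy).
import random
import zlib

def compression_ratio(contributions):
    raw = ",".join(str(c) for c in contributions).encode()
    return len(zlib.compress(raw)) / len(raw)

# A constant series compresses very well (low entropy)...
print(compression_ratio([5] * 600))
# ...while a wildly varying one barely compresses (high entropy).
print(compression_ratio([random.randrange(100) for _ in range(600)]))
```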
Check out the poster and also the presentation done at the conference. You will miss, however, the tulip origami we gave out to the visitors of the poster.
In our research group we support open science; that is why you can find everything, from the data to the processing scripts to the sources of the paper itself, in the GitHub repository.