Early prediction of the outcome of Starcraft Games

As a result of Antonio Álvarez Caballero's master's thesis, tomorrow we'll be presenting a poster on the early prediction of Starcraft games at the IJCCI 2017 conference.
The basic idea behind this line of research is to try and find a model of the game so that we can do fast fitness evaluation of strategies without playing the whole game, which can take up to 60 minutes. That way, we can optimize those strategies in an evolutionary algorithm and find the best ones.
In our usual open science style, paper and data are available in a repository.
Our conclusions say that we might be able to pull that off using a k-nearest neighbor algorithm. But we might have to investigate a bit further if we really want to find a model that gives us some insight into what makes a strategy a winner.
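To give a flavour of what a k-nearest neighbor predictor does, here is a minimal sketch in plain Python. It is not the code from the thesis: the two features (minerals and army size early in the game) and all the numbers are invented for illustration, and `knn_predict` is a hypothetical helper name.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points, using Euclidean distance."""
    # Sort training examples by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical early-game snapshots: (minerals, army size) -> final result.
# These values are made up; real features would come from game replays.
train = [
    ((50.0, 2.0), "loss"),
    ((60.0, 3.0), "loss"),
    ((55.0, 1.0), "loss"),
    ((200.0, 12.0), "win"),
    ((220.0, 15.0), "win"),
    ((210.0, 10.0), "win"),
]

print(knn_predict(train, (205.0, 11.0), k=3))
```

A predictor like this can be fast, but it is also a good illustration of why more research is needed: it says which past games a new one resembles, not why a strategy wins.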

Dark clouds allow early prediction of heavy rain in Funchal, near where IJCCI is taking place

StarCraft stars in the weekly meeting of 30 Oct 2017

At the last meeting of the Geneura group, Victor presented to the rest of the attendees the paper Predicting the Winner in Two Player StarCraft Games, published at the CoSECiVi’15 conference by Professor Antonio A. Sanchez-Ruiz.

The presentation is available at https://vrivas.github.io/explicando-sanchez-ruiz-2015/output/index.html

 

Self-organized criticality in software repositories, poster presented at ECAL 2017

The European Conference on Artificial Life or ECAL is not one of our usual suspects. Although we have attended from time to time, and even organized it back in 95 (yep, that is a real web page from 1995, minus the slate gray background), it is a conference I quite enjoy, together with other artificial life related conferences. Artificial life was quite the buzzword in the 90s, but nowadays with all the deep learning and AI stuff it has gone out of fashion. Last time I attended, ten years ago, it seemed more crowded. Be that as it may, I have presented a tutorial and a poster about our work on looking for a critical state in software repositories. This is the poster itself, and there is a link to the open access proceedings, although, as you know, all our papers are online and you can obtain that one (and a slew of other ones) from our repository.
This is a line of research we have been working on for a year now, starting from this initial paper where we examined a single repository for the Moose Perl module. We are looking for patterns that allow us to say whether repositories are in a critical state or not. Being as they are completely artificial systems, engineering artefacts, looking for self-organized criticality might seem like a lost cause. On the other hand, it really clicks with our own experience when writing a paper or anything, really. You write in long stretches, and then you do small sessions where you change a line or two.
This paper, which covers all kinds of open source projects, from Docker to vue.js, looks at three different things: long-distance correlations, scale-free behavior of changes, and pink noise in the spectral density of the time series of changes. And we do find them, almost everywhere. Most big repos, with more than a few hundred commits, possess them, independently of their language or origin (hobbyist or company).
There is still a lot of work ahead. What are the main mechanisms for this self-organization? Are there any exceptions? That will have to wait until the next conference.

Asynchronous, heterogeneous, pool based evolutionary algorithms in GECCO 2017

Fresh back from GECCO 2017, which is probably the main event for evolutionary algorithms and other metaheuristics. Together with the conference proper, there are workshops and tutorials. Last year we achieved full score, with papers, posters and tutorials. Unfortunately, not this year.
We’re happy though with the two papers that were accepted in the EvoSoft workshop, which we usually attend, and the BBOB benchmarking workshop. Both used the same thing, EvospaceJS, Mario’s framework for working with tuple-space pool-based evolutionary algorithms. The idea of this pool is decoupling algorithms from population. And as soon as you do that, a world of possibilities opens up, like using different clients on the same pool. In the EvoSoft paper, evospace-js: asynchronous pool-based execution of heterogeneous metaheuristics, we presented the general framework and a proof of concept which combined PSO and evolutionary algorithms, with very interesting results. Here’s the rather laconic presentation, which is one more reason to check out the paper.
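The decoupling idea can be sketched in a few lines of Python. This is not EvospaceJS code and does not use its API: `Pool`, `mutation_worker` and `crossover_worker` are hypothetical names, the two simple workers stand in for the PSO and evolutionary clients of the paper, the fitness is a toy sphere function, and the loop alternates workers sequentially instead of running them asynchronously.

```python
import random

def sphere(x):
    """Toy fitness to minimize: sum of squares."""
    return sum(v * v for v in x)

class Pool:
    """Shared population; workers only talk to the pool, never to each other."""
    def __init__(self, individuals):
        self.items = list(individuals)

    def take(self, n, rng):
        """Remove and return n random individuals."""
        rng.shuffle(self.items)
        sample, self.items = self.items[:n], self.items[n:]
        return sample

    def put(self, individuals):
        """Return (possibly improved) individuals to the pool."""
        self.items.extend(individuals)

def mutation_worker(sample, rng):
    """Gaussian mutation; keep the child only if it is not worse."""
    out = []
    for ind in sample:
        child = [v + rng.gauss(0, 0.1) for v in ind]
        out.append(min(ind, child, key=sphere))
    return out

def crossover_worker(sample, rng):
    """Uniform crossover with a random mate; keep the better of parent and child."""
    out = []
    for ind in sample:
        mate = rng.choice(sample)
        child = [rng.choice(pair) for pair in zip(ind, mate)]
        out.append(min(ind, child, key=sphere))
    return out

rng = random.Random(42)
pool = Pool([[rng.uniform(-5, 5) for _ in range(3)] for _ in range(20)])
start_best = min(map(sphere, pool.items))

# Two different kinds of clients take turns working on the same pool.
for step in range(200):
    worker = mutation_worker if step % 2 == 0 else crossover_worker
    sample = pool.take(5, rng)
    pool.put(worker(sample, rng))

end_best = min(map(sphere, pool.items))
print(start_best, end_best)
```

Since the workers share nothing but the pool, swapping one of them for a completely different algorithm does not require touching the other.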
The second paper is Benchmarking a pool-based execution with GA and PSO workers on the BBOB noiseless testbed.
All in all, EvospaceJS and NodIO, the two frameworks we work with, offer a nice platform for experimentation with different kinds of algorithms that can be easily transported to the cloud and adapted to volunteer computing environments. Whatever the case, they also have interesting dynamics that influence the working of the evolutionary algorithms. For sure, we will continue tapping this source of interesting insights on evolutionary models.

Finding self-organized criticality in collaborative work via repository mining (IWANN’2017)

I would like to draw your attention to three points:

  • Development teams eventually become complex systems, mainly in collaborative work environments.
  • Relations and collaborations take place through the environment.
  • Pattern mining and analysing social-based information is a complex problem.

Thus, our main objective was to study new methodologies for analysing patterns of collaboration in collaborative work environments, a complex problem that needs new tools to explore and analyse relations-based data.

Also, we wanted to explore and analyse relations-based data, e.g. to answer the question “Do developers self-organize?”, and finally, to contribute to open science tools and methodologies.

In statistical physics, criticality is defined as a type of behaviour observed when a system undergoes a phase transition. A state on the edge between two different types of behaviour is called the critical state, and in this state the system is at criticality.

A clear example is the sandpile model, in which, if we add one grain to the pile, on average the steepness of the slopes increases. However, the slopes might evolve to a critical state where a single grain of sand is likely either to settle on the pile or to trigger an avalanche.
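The sandpile model is easy to simulate. The following is a minimal sketch of the classic grid version, assuming the usual rule that a cell holding 4 or more grains topples and sends one grain to each of its four neighbours, with grains falling off the edges. The grid size, the number of grains and the name `add_grain` are arbitrary choices for illustration.

```python
import random

def add_grain(grid, row, col):
    """Drop one grain at (row, col) and relax the pile.
    Returns the avalanche size, measured as the number of topplings."""
    size = len(grid)
    grid[row][col] += 1
    topplings = 0
    unstable = [(row, col)]
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < 4:
            continue  # already stable, nothing to do
        grid[r][c] -= 4
        topplings += 1
        if grid[r][c] >= 4:
            unstable.append((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:  # grains off the edge are lost
                grid[nr][nc] += 1
                if grid[nr][nc] >= 4:
                    unstable.append((nr, nc))
    return topplings

rng = random.Random(1)
size = 10
grid = [[0] * size for _ in range(size)]
avalanches = [add_grain(grid, rng.randrange(size), rng.randrange(size))
              for _ in range(5000)]

# Most grains just settle (size 0); a few trigger large avalanches.
print(max(avalanches), sum(1 for a in avalanches if a == 0))
```

Ranking the recorded avalanche sizes in descending order is the same kind of analysis we apply to change sizes in the repositories.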

In this report we work on a repository for several papers. There, we examined 4 repositories where the collaborative writing of scientific papers takes place using GitHub. We chose repositories with a certain “length”, namely more than 50 commits (changes). Thus, we could analyse changes in files, looking for the existence of:

  • a scale free structure
  • long-distance correlations
  • pink noise

Several macro measures extracted from the size of changes to the files in the repository were obtained:
1. Sequence of changes
2. Timeline of commit sizes
3. Change sizes ranked in descending order
4. Long-distance correlations
5. Presence of pink noise (1/f)
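As a rough illustration of measures 3 and 4, the sketch below ranks a series of change sizes and computes its autocorrelation at a given lag. The series is invented, not taken from any of the repositories, and `ranked_sizes` and `autocorrelation` are hypothetical helper names rather than the functions used in our own scripts.

```python
def ranked_sizes(sizes):
    """Change sizes in descending order, ready for a rank/size plot."""
    return sorted(sizes, reverse=True)

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

# Hypothetical commit sizes (lines changed): small edits with occasional bursts.
sizes = [1, 2, 1, 40, 3, 1, 2, 55, 1, 2, 1, 38]

print(ranked_sizes(sizes)[:3])
for lag in (1, 4):
    print(lag, round(autocorrelation(sizes, lag), 3))
```

In a rank/size plot on log-log axes, a roughly straight line would hint at scale-free behaviour, while autocorrelation values that stay significant at large lags are what we call long-distance correlations.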

Looking at the sequence of changes and the timeline of commit sizes (1, 2), no particular “rhythm” can be seen, neither a daily one nor one in the changes themselves. Repositories can be static for a long time and then experience a sudden burst of changes (an avalanche), which is a symptom of the underlying self-organized criticality state.

After plotting change sizes ranked in descending order (3), it can be seen that some authors send atomic changes while others write big paragraphs or sections before committing those big changes. We can also see a tail corresponding to big changes made at the end (just before sending the paper).

Long-distance correlation plots show that long-distance autocorrelations appear in different places depending on the repository, but they are present in most cases anyway.

Finally, pink noise refers to any noise with a power spectral density of the form 1/f. For pink noise to be clearly present, the spectrum plotted on log-log axes should show a slope equal to -1. However, there is no clear downward trend. Maybe it would appear later on in development, or maybe we could see that trend using repositories with a longer history. In any case, the fact that this third characteristic is not present does not obscure the other two, which appear clearly.
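One way to check for that slope is to compute the power spectrum of the series of change sizes and fit a straight line in log-log space. The sketch below does this with a plain discrete Fourier transform and ordinary least squares; it is an illustration under those assumptions, not the script used for the paper, and the commit sizes are invented.

```python
import cmath
import math

def power_spectrum(series):
    """Return (frequency, power) pairs for the positive frequencies of the
    mean-centered series, using a direct discrete Fourier transform."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((k / n, abs(coeff) ** 2))
    return spectrum

def loglog_slope(points):
    """Least-squares slope of log(power) against log(frequency)."""
    xs = [math.log(f) for f, p in points if p > 0]
    ys = [math.log(p) for f, p in points if p > 0]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Sanity check on synthetic data: a spectrum that is exactly 1/f has slope -1.
ideal = [(f, 1.0 / f) for f in (0.1, 0.2, 0.4, 0.8)]
print(round(loglog_slope(ideal), 6))  # -1.0

# Hypothetical commit sizes; real series would be much longer.
sizes = [1, 2, 1, 40, 3, 1, 2, 55, 1, 2, 1, 38]
print(loglog_slope(power_spectrum(sizes)))
```

With short series like the ones in small repositories the fitted slope is very noisy, which is consistent with the trend being hard to see.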

In conclusion, after analysing several repositories for scientific paper writing, we have shown that they are in a critical state, as (1) changes have a scale-free form, (2) there are long-distance correlations, and (3) pink noise has been detected (only in some cases).

For the sake of reproducibility, and as we support open science, both the programs and data related to this report are available online at the repository “Measuring progress in literature and in other creative endeavours, like programming”
http://github.com/JJ/literaturame

The slides used to present this work at the IWANN’2017 conference are available at:
https://es.slideshare.net/pacvslideshare/finding-selforganized-criticality-in-collaborative-work-via-repository-mining

I Reunión Internacional de Metabolómica y Cáncer (1st International Meeting on Metabolomics and Cancer)

On 26 May 2017, Víctor Rivas, a member of the GeNeura group, will give a talk entitled “Interpretación de resultados mediante herramientas de minería de datos” (Interpreting results with data mining tools) as part of the I Reunión Internacional de Metabolómica y Cáncer. The event has been organized by the Fundación MEDINA and the Complejo Hospitalario de Jaén.

The meeting will be held at the Parador de Jaén, and attendance is completely free, subject to prior registration at http://www.esmeeting.es/i-reunion-internacional-de-metabolomica-y-cancer/