I would like to draw your attention to three points:
- Development teams eventually become complex systems, especially in collaborative work environments.
- Relations and collaborations take place through the environment.
- Mining and analysing patterns in social information is a complex problem.
Thus, our main objective was to study new methodologies for analysing patterns of collaboration in collaborative work environments, since this is a complex problem that needs new tools to explore and analyse relation-based data.
We also wanted to use such data to answer questions like “Do developers self-organize?”, and, finally, to contribute to open science tools and methodologies.
In Statistical Physics, criticality is defined as a type of behaviour observed when a system undergoes a phase transition. A state on the edge between two different types of behaviour is called the critical state, and in this state the system is at criticality.
A clear example is the sandpile model: each grain we add to the pile increases the steepness of its slopes on average. However, the slopes may evolve to a critical state in which a single added grain may either settle on the pile or trigger an avalanche.
In this report we studied the repositories of several papers. Specifically, we examined four repositories in which scientific papers are written collaboratively using GitHub. We chose repositories of a certain “length”, i.e. with more than 50 commits (changes), so that we could analyse the changes to their files and look for the existence of:
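To make the sandpile picture concrete, the sketch below implements the classic Bak-Tang-Wiesenfeld sandpile on a small square grid. The grid size, the toppling threshold of 4 grains, the open boundaries (grains pushed off the edge are lost) and the helper names `topple` and `simulate` are the usual textbook setup chosen for illustration, not code from this report. Each call to `topple` returns the avalanche size, counted as the number of topplings triggered by one added grain.

```python
import random

THRESHOLD = 4  # a cell holding 4 or more grains topples, sending one grain to each neighbour


def topple(grid):
    """Relax the grid in place until every cell is stable; return the avalanche size."""
    n = len(grid)
    size = 0
    unstable = [(i, j) for i in range(n) for j in range(n) if grid[i][j] >= THRESHOLD]
    while unstable:
        i, j = unstable.pop()
        if grid[i][j] < THRESHOLD:
            continue  # already relaxed by an earlier toppling
        grid[i][j] -= 4
        size += 1
        if grid[i][j] >= THRESHOLD:
            unstable.append((i, j))
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < n and 0 <= b < n:  # grains pushed past the edge are lost
                grid[a][b] += 1
                if grid[a][b] >= THRESHOLD:
                    unstable.append((a, b))
    return size


def simulate(n=20, grains=5000, seed=0):
    """Drop grains one at a time on random cells; return the final grid and avalanche sizes."""
    rng = random.Random(seed)
    grid = [[0] * n for _ in range(n)]
    sizes = []
    for _ in range(grains):
        i, j = rng.randrange(n), rng.randrange(n)
        grid[i][j] += 1
        sizes.append(topple(grid))
    return grid, sizes


if __name__ == "__main__":
    _, sizes = simulate()
    quiet = sum(1 for s in sizes if s == 0)
    print(f"{quiet} of {len(sizes)} grains caused no avalanche; largest: {max(sizes)} topplings")
```

Plotting a histogram of `sizes` on log-log axes is the usual way to see that most grains cause no avalanche at all, while a few trigger very large ones.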
- a scale free structure
- long-distance correlations
- pink noise
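The report does not spell out how the size of a change is measured. As a minimal sketch, assuming that the size of a commit is the number of lines added plus the number of lines deleted across all its files, per-commit sizes could be read from `git log --numstat` on a local clone. The functions `parse_numstat` and `commit_sizes`, the `@` header marker and the `MIN_COMMITS` constant are hypothetical names for this example; only the 50-commit threshold comes from the text.

```python
import subprocess

MIN_COMMITS = 50  # the report keeps repositories with more than 50 commits


def parse_numstat(log_text):
    """Turn `git log --numstat --format=@%H %ct` output into (timestamp, size) pairs.

    Size is assumed to be lines added + lines deleted, summed over all files in the commit.
    """
    commits = []
    for line in log_text.splitlines():
        if line.startswith("@"):
            _sha, timestamp = line[1:].split()
            commits.append([int(timestamp), 0])
        elif line.strip() and commits:
            added, deleted, _path = line.split("\t", 2)
            if added != "-":  # binary files report "-" instead of line counts
                commits[-1][1] += int(added) + int(deleted)
    return [tuple(c) for c in commits]


def commit_sizes(repo_path):
    """Run git on a local clone and return its commits, oldest first."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--reverse", "--numstat", "--format=@%H %ct"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_numstat(result.stdout)
```

For example, `commits = commit_sizes("path/to/paper-repo")` (a placeholder path) followed by `len(commits) > MIN_COMMITS` mirrors the selection criterion, and `[size for _, size in commits]` yields a sequence of change sizes.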
We obtained several macro measures from the sizes of the changes to the files in each repository:
1. Sequence of changes
2. Timeline of commit sizes
3. Change sizes ranked in descending order
4. Long-distance correlations
5. Presence of pink noise (1/f)
Looking at the sequence of changes and the timeline of commit sizes (1, 2), no particular “rhythm” can be seen, neither daily nor in the changes themselves. A repository can stay static for a long time and then suddenly experience a burst of changes (an avalanche), which is a symptom of an underlying self-organized critical state.
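The absence of a rhythm can also be expressed as a single number. One option, which is not used in the report and is shown here only as an illustration, is the burstiness parameter of Goh and Barabási computed over the gaps between consecutive commit timestamps. The function name `burstiness` is hypothetical.

```python
import statistics


def burstiness(timestamps):
    """Burstiness B = (sigma - mu) / (sigma + mu) of the gaps between consecutive events.

    B is -1 for a perfectly regular rhythm, about 0 for a Poisson process, and tends
    towards 1 when long quiet periods alternate with bursts of activity.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)
    if sigma + mu == 0:
        raise ValueError("all events share the same timestamp")
    return (sigma - mu) / (sigma + mu)
```

A repository that stays static and then receives a burst of commits gives a positive value, while commits made at a fixed interval give exactly -1.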
Plotting the change sizes ranked in descending order (3) shows that some authors commit atomic changes, while others write whole paragraphs or sections before committing them as one big change. The plot also shows a tail corresponding to the big changes made at the end of the process, just before the paper is submitted.
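A rank-size plot of this kind can be summarised by the slope of the ranked sizes on log-log axes: if the sizes follow a power law, the points fall on a straight line. The sketch below, with the hypothetical helpers `rank_size` and `loglog_slope`, fits that slope by ordinary least squares. It is only a quick diagnostic; a least-squares fit on log-log axes is not a rigorous test for a power law.

```python
import math


def rank_size(sizes):
    """Return (rank, size) pairs with sizes in descending order; zero-size commits are dropped."""
    ranked = sorted((s for s in sizes if s > 0), reverse=True)
    return list(enumerate(ranked, start=1))


def loglog_slope(points):
    """Ordinary least-squares slope of log(y) against log(x)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

Zero-size commits are dropped because their logarithm is undefined.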
The long-distance correlation plots show that long-distance autocorrelations appear at different lags depending on the repository, but they are present in most cases.
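As a minimal sketch of how such a plot could be produced from a sequence of change sizes, the code below computes the sample autocorrelation for a range of lags and the approximate 95% band expected for uncorrelated data. The function names are hypothetical and this is not necessarily the estimator used in the report; long-distance correlations show up as values that stay outside the band at large lags.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation r(k) for lags k = 0..max_lag, normalised so that r(0) = 1."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / var for k in range(max_lag + 1)]


def white_noise_band(n):
    """Approximate 95% band: for uncorrelated data, |r(k)| mostly stays below this value."""
    return 1.96 / math.sqrt(n)
```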
Finally, pink noise refers to any noise with a power spectral density of the form 1/f. For pink noise to be clearly present, the spectrum plotted on log-log axes should show a slope of -1. However, there is no clear downward trend. This trend might appear later in development, or it might become visible in repositories with a longer history. In any case, the absence of this third characteristic does not undermine the other two, which appear clearly.
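The slope check can be sketched as follows: compute a periodogram of the (mean-removed) series of change sizes and fit a straight line to log power against log frequency. A slope near -1 indicates pink noise, near 0 white noise, and near -2 a random walk. The helpers `periodogram` and `spectral_slope` are hypothetical names, and the plain O(n²) discrete Fourier transform is used only to keep the example dependency-free; with NumPy one would typically use `numpy.fft.rfft` instead.

```python
import cmath
import math


def periodogram(series):
    """Power at frequencies k/n for k = 1..n//2 (mean removed), using a plain O(n^2) DFT."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(d * cmath.exp(-2j * math.pi * k * t / n) for t, d in enumerate(dev))
        freqs.append(k / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def spectral_slope(series):
    """Least-squares slope of log(power) against log(frequency); about -1 for 1/f noise."""
    freqs, power = periodogram(series)
    pts = [(math.log(f), math.log(p)) for f, p in zip(freqs, power) if p > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den
```

Periodograms of short series are very noisy, which is one reason why a -1 slope can be hard to see with a few hundred commits.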
In conclusion, after analysing several repositories used to write scientific papers, we have shown that they are in a critical state, as (1) changes have a scale-free form, (2) there are long-distance correlations, and (3) pink noise has been detected, although only in some cases.
For the sake of reproducibility, and because we support open science, both the programs and the data related to this report are available online in the repository “Measuring progress in literature and in other creative endeavours, like programming”.
The slides used to present this work at the IWANN’2017 congress are available at: