A data-mining based process to early identify breast cancer from metabolomic data

Abstract of our work presented at EURO 2018, the largest and most important conference for Operational Research, co-authored by Víctor M. Rivas Santos jointly with researchers from Complejo Hospitalario de Jaén and Fundación Medina.

This paper was presented on 9 July 2018 in Valencia, as part of the Data Mining and Statistics stream.



We present the results obtained by our multidisciplinary group in the task of discriminating between blood samples from breast cancer patients and those from healthy people. The models used to classify samples have been built using data mining techniques; the data have been collected by means of liquid chromatography-mass spectrometry, a technique that detects and quantifies the metabolites present in blood samples.

Different algorithms have been tested under two scenarios: 10-fold cross-validation (10-CV) and a 75/25 split. Our experiments showed that IBk, J48, and Logistic Model Trees yielded rates greater than 90% only for healthy people. Naive Bayes and Random Forest improved on these results in the 10-CV approach, but they did not yield more than 85% true positives for patients in the 75/25 one. Finally, the Bayesian network turned out to be the best algorithm, as it yielded rates greater than 90% for both patients and healthy people.
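As a rough illustration of the two evaluation scenarios, here is a minimal Python sketch that only generates the sample indices for a 75/25 hold-out split and for 10 folds. The function names, the fixed seed, and the sample count are assumptions made for this example; the splits are not stratified by class, and nothing here reproduces the tooling actually used in the study.

```python
import random


def holdout_split(n_samples, train_fraction=0.75, seed=0):
    """Return (train_idx, test_idx) for a single shuffled hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical sample count, chosen only for the example.
train_idx, test_idx = holdout_split(100)
cv_splits = list(k_fold_splits(100, k=10))
```

In the 10-CV scenario each model is trained ten times, once per fold, and the predictions on the held-out folds are pooled; in the 75/25 scenario a single model is trained on the larger part and evaluated on the rest.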

Many statistics, as well as confusion matrices, have been computed, showing that the model built by the Bayesian network can effectively be used to solve this problem. Currently, the metabolites used to build the model are being identified by biochemists. This last step will be decisive in determining whether they can be considered a valid biomarker for breast cancer.
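To make the per-class rates concrete, the following Python sketch builds a confusion matrix and derives, for each class, the fraction of its samples that were classified correctly. The labels and predictions are toy values invented for the example; they are not results from the study, and the helper names are assumptions.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_rates(matrix, labels):
    """Fraction of each true class that was predicted correctly."""
    rates = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        rates[label] = matrix[i][i] / total if total else 0.0
    return rates


# Toy data, invented for illustration only.
labels = ["patient", "healthy"]
y_true = ["patient"] * 4 + ["healthy"] * 6
y_pred = (["patient", "patient", "patient", "healthy"]
          + ["healthy"] * 5 + ["patient"])

matrix = confusion_matrix(y_true, y_pred, labels)
rates = per_class_rates(matrix, labels)
```

Reporting the rate for patients and for healthy people separately matters here because a model can reach a high overall accuracy while still missing many patients.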


GeNeura at European Project MUSES Final Review

GeNeura’s members have been working on the three-year-long FP7 European project MUSES, which faced its final review last week at the European Commission Beaulieu Quarter Buildings in Brussels.

UGR was one of the partners participating in this project. More specifically, GeNeura’s members have contributed by leading WP2 – MUSES framework definition and integration, in particular during the tasks that defined the MUSES System Architecture. In addition, GeNeura’s research has been applied to the project in WP5 – Self-adaptive event correlation, led by the Spanish security company S2 Grupo. The main purpose of this WP was to develop a system which, on the one hand, uses event correlation to detect Security Policy violations and, on the other hand, analyses all the data in the system and creates new Security Policies or enhances the existing ones. Different types of classification, rule association, and clustering algorithms, as well as Data Mining techniques, have been applied with satisfactory results. These results were especially welcomed by the Commission, which pointed out that such a system will be very helpful to enhance security. Also, MUSES is an Open Software project, and you can contribute at https://github.com/MusesProject

The results were presented by S2 Grupo and GeNeura together. The slides are now published on Slideshare:

It has been a pleasure for GeNeura to work on MUSES.


[Paper] Going a Step Beyond the Black and White Lists for URL Accesses in the Enterprise by means of Categorical Classifiers

Our work titled Going a Step Beyond the Black and White Lists for URL Accesses in the Enterprise by means of Categorical Classifiers, part of the research under the MUSES project, has been presented today at the ECTA 2014 conference.


Corporate systems can be secured using a wide range of methods, and the implementation of Black or White lists is among them.
With these lists it is possible to prevent (or to allow) users from executing applications or accessing certain URLs, among other things. This paper focuses on the latter option. It describes the whole processing of a data set composed of URL sessions performed by the employees of a company, from the preprocessing stage, including labelling and data balancing processes, to the application of several classification algorithms. The aim is to define a method for automatically deciding whether to allow or deny future URL requests, considering a set of corporate security policies.
Thus, this work goes a step beyond the usual black and white lists, since these can only control the URLs that are explicitly included in them, and cannot make decisions based on similarity (through classification techniques) or on other variables of the session, as is proposed here.
The results show a set of classification methods which achieve very good classification percentages (95-97%), and which infer some useful rules based on additional features (rather than just the URL string) related to the user’s access. This led us to conclude that this kind of tool would be very useful for an enterprise.
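The idea of deciding from session features rather than from an exact URL match can be sketched with a very small categorical rule learner. The Python example below uses a OneR-style approach (pick the single feature whose value-to-label rule makes the fewest training errors) purely as a stand-in; it is not one of the classifiers evaluated in the paper, and the feature names, values, and labels are hypothetical.

```python
from collections import Counter, defaultdict


def train_one_r(rows, labels):
    """Choose the feature whose value -> majority-label rule errs least."""
    best = None
    for feature in rows[0]:
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[feature]][label] += 1
        rule = {value: counts.most_common(1)[0][0]
                for value, counts in by_value.items()}
        errors = sum(rule[row[feature]] != label
                     for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, feature, rule)
    _, feature, rule = best
    # Fall back to the overall majority label for unseen feature values.
    default = Counter(labels).most_common(1)[0][0]
    return feature, rule, default


def decide(model, row):
    """Return the decision for one session described by categorical features."""
    feature, rule, default = model
    return rule.get(row[feature], default)


# Hypothetical labelled sessions; all values are invented for illustration.
rows = [
    {"content_type": "text/html", "domain_category": "news"},
    {"content_type": "application/octet-stream", "domain_category": "file_sharing"},
    {"content_type": "text/html", "domain_category": "file_sharing"},
    {"content_type": "image/png", "domain_category": "news"},
    {"content_type": "text/html", "domain_category": "webmail"},
    {"content_type": "application/octet-stream", "domain_category": "file_sharing"},
    {"content_type": "text/html", "domain_category": "news"},
]
labels = ["allow", "deny", "deny", "allow", "allow", "deny", "allow"]

model = train_one_r(rows, labels)
new_session = {"content_type": "text/html", "domain_category": "file_sharing"}
decision = decide(model, new_session)
```

Unlike a list lookup, the learned rule applies to a request whose URL has never been seen before, as long as its session features resemble those of previously labelled sessions.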

You can check the presentation at: .