Thursday, 26 September 2013


Machine learning, data mining and statistical data analysis are clearly popular areas now, judging by the number of attendees at this year's European conference, ECMLPKDD 2013.

It's been a long time since I last attended (in 2001, for multi-label classification via a modification to C4.5). I think the field has grown and matured a lot. There are far fewer papers now showing the results of a new algorithm on 10 different UCI datasets. There is far more presence from people in industry. And industry is varied: search engines, internet shopping and finance. Yahoo, Amazon, Zalando, Deloitte and many others sponsored the conference and sent people to speak or attend. There was an "industry track", and that room was full.

Themes that I picked up on were: regression (still popular!), lots of tensors and matrices, numerical analysis methods for large data sets, network mining, sequence mining, and generally using ML/DM to influence people (buying, voting, doing good, giving your system feedback).

The organisers this year have really done a good job: working wifi, lots of food and coffee, sessions running on time, plenty of mingling time, and choosing a venue in a beautiful city, with accommodation in a wide range of hotels within easy walking distance booked as an easy part of the registration process. It is appreciated!

Diversity is something that the ECMLPKDD community have started to work on improving. It has the usual male/female imbalance of a technical conference. Perhaps slightly more women than I expected, or maybe I'm just getting used to this. I'd hazard a guess at about 20% or a bit less. But next year's organising committee are more gender-balanced, and there will also be a Diversity Chair to keep an eye on the issue.

Openness of code and data is something else the community are working on improving. This year for the first time they had an award for "Open Science", and encouraged paper submissions to include a link to code/data. In order to award this, the organisers had to download, compile, run and test lots of submitted code. I don't know which of the organisers did this onerous task, but I'm very pleased they did.

If I had to point out one thing that could still be improved, my number 1 would be that the proceedings are owned by Springer, and are not open. For reasons known only to Springer, I can't make an account with them or reactivate an existing account. Maybe Springer will reply to my email eventually. But if the proceedings were open access (papers deposited at arXiv for example) then this would really benefit the ML/DM community and others, and more widely promote the work of everyone who presented.

Next year, 2014, the conference moves to Nancy, France, a city with Art Nouveau architecture, and with many fine wines.