Showing posts with label open science. Show all posts

Tuesday, 29 September 2015

An executable language for change in biological sequences

A discussion on Twitter about whether there was a language for representing sequence edits prompted me to post my draft proposal for such a language. http://figshare.com/articles/Draft_proposal/1559009

Comments, criticism, collaboration and competition welcome. Hopefully I'll submit it shortly.

Thursday, 26 September 2013

ECML PKDD 2013

Machine learning, data mining and statistical data analysis are clearly popular areas now, judging by the number of attendees at this year's European conference, ECMLPKDD 2013.

It's been a long time since I last attended (2001 for multi-label classification by a modification to C4.5). I think the field has grown and matured a lot. There are far fewer papers now showing the results of a new algorithm on 10 different UCI datasets. There is far more presence from people in industry. And industry is varied: search engines, internet shopping and finance. Yahoo, Amazon, Zalando, Deloitte and many others sponsored the conference and sent people to speak or attend. There was an "industry track", and that room was full.

Themes that I picked up on were: regression (still popular!), lots of tensors and matrices, numerical analysis methods for large data sets, network mining, sequence mining, and generally using ML/DM to influence people (buying, voting, doing good, giving your system feedback).

The organisers this year have really done a good job: working wifi, lots of food and coffee, sessions running on time, plenty of mingling time, and choosing a venue in a beautiful city, with accommodation in a wide range of hotels within easy walking distance booked as an easy part of the registration process. It is appreciated!


Diversity is something that the ECMLPKDD community have started to work on improving. It has the usual male/female imbalance of a technical conference. Perhaps slightly more women than I expected, or maybe I'm just getting used to this. I'd hazard a guess at about 20% or a bit less. But next year's organising committee are more gender-balanced, and there will also be a Diversity Chair to keep an eye on the issue.

Openness of code and data is something else the community are working on improving. This year for the first time they had an award for "Open Science", and encouraged paper submissions to include a link to code/data. In order to award this, the organisers had to download, compile, run and test lots of submitted code. I don't know which of the organisers did this onerous task, but I'm very pleased they did.

If I had to point out one thing that could still be improved, my number 1 would be that the proceedings are owned by Springer, and are not open. For reasons known only to Springer, I can't make an account with them or reactivate an existing account. Maybe Springer will reply to my email eventually. But if the proceedings were open access (papers deposited at arXiv for example) then this would really benefit the ML/DM community and others, and more widely promote the work of everyone who presented.

Next year, 2014, the conference moves to Nancy, France, a city with Art Nouveau architecture, and with many fine wines. www.ecmlpkdd2014.org

Tuesday, 15 November 2011

Openness: parliament and academia

One thing I was particularly struck by was how open most of the workings of Parliament were. Members of the public can sit in the galleries to watch the debates in the Houses of Commons and Lords, and in Westminster Hall. The debates are televised and fully minuted in Hansard, with minutes online by 6am the next day. Wouldn't it be amazing if we did science in this way, with all of our operations televised and fully minuted for inspection! That really would be the ideal of Open Science.

Not only that, but members of the public can sit in on most select committee meetings, and watch the proceedings. The science equivalent would be to have a cordon at one end of your lab and to allow the public free access to come and go at any time as long as they sit behind the cordon. And the committee reports are all made publicly available on their websites. Not publicly available in some journal that you might not have access to, or that charges $30 per report, but freely available. Science still has a long way to go to become this freely available.