Monday, 10 April 2017

Visiting JG Mainz University

I've just returned from a two-week visit to Johannes Gutenberg University Mainz, where I was hosted by Andreas Karwath in the Data Mining group of the Dept of Informatik. I was there to pick up new research ideas, to get away from admin/teaching duties for a while and to share what we've been working on lately.

The University at Mainz is a large campus-based university on the outskirts of the town, on the beautiful river Rhine. It's named after the inventor of the printing press in the west, Johannes Gutenberg. The Gutenberg museum is excellent, tracing books from wax tablets through handwritten parchments, to moveable type print, then the industrial revolution, typewriters and high-volume printing. This is a museum about the value of information and its transmission, and the technology invented to do this. As an added bonus, there was even a special exhibition about the Futura font. This is the geometric circles-and-lines font used in posters for "2001: A Space Odyssey", on the Apollo 11 moon plaque ("We came in peace for all mankind"), and in so much future-looking advertising and propaganda in the Art Deco-inspired 1930s era, both in Germany and beyond.

It was interesting to be embedded in a data mining group, rather than bioinformatics for a change. They have a broad range of application areas, but also happily switch technology (neural nets, relational learning, topic models, matrix decomposition, graphs, rs-trees and more) as the application area needs. It was also very interesting to see how another country's research culture differs. It's not REF-dominated like the UK, so they're more free to focus on quick-turnaround peer-reviewed compsci conference publications, and it's perhaps more hierarchically structured, as only the few professors have permanent positions. And yet it's still the same. University departments are international places and share much in common, whichever country you're in: same grant applications, student supervisions, seminar talks, dept silos, etc.

It was great fun to be there, and they were excellent hosts. At the same time, it was strange and sad to be a British person on exchange in Germany during the week that the UK sent its Article 50 notification to the EU. International collaborations are so important to research that leaving the EU is bound to be hugely detrimental to us UK academics. We need more exchange, not less.

Tuesday, 4 April 2017

The metahaplome

A sample of water, soil or gut contents will contain a whole community of microbes, cooperating and competing. We now have the sequencing technology to begin to explore these communities, to find out the variation that they possess. This can be useful in the search for new anti-microbials, or in the search for better enzymes for biofuels. However, the sequencing technology is not quite there yet. The very short length of the reads, together with the errors introduced, combine to make the problem of reassembling the underlying genomes much more complex.

We introduce the concept of the 'metahaplome': the exact sequence of DNA bases (or "haplotype") that constitutes the genes and genomes of every individual present. We also present a data structure and algorithm that will recover the haplotypes in the metahaplome, and rank them according to likelihood.

Our preprint about the metahaplome is now available at bioRxiv: Probabilistic Recovery Of Cryptic Haplotypes From Metagenomic Data.

Monday, 20 March 2017

Is academic impact useful as a proxy measure for real world impact?

In academia we have various measures of "success". Some of the more common measures are "how many papers did we publish in reputable journals" and "how many other academics went on to use my work or refer to my work". We count citations, check out our h-index, and perhaps even check the number of other academics talking about our work on social media. We'd all like our work to be useful to others. Of course, any measure of success can be gamed. Academics may cite themselves to boost citation counts, use clickbait paper titles to attract attention, and select journals for prestige rather than availability to the community.
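For readers who haven't met it, the h-index mentioned above is the largest number h such that h of your papers have each been cited at least h times. A minimal sketch of that calculation follows; the function name and the citation counts are made up for illustration and are not taken from any real publication record.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have at least `rank` citations
        else:
            break         # counts only get smaller from here on
    return h


# Hypothetical citation counts for seven papers.
example_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_counts))  # → 3
```

Note how the single paper with 25 citations contributes no more to the score than a paper with 3 would, which is one reason the h-index and raw citation counts can tell different stories.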

However, to be useful to other academics is not the same as to be useful to the rest of the world. Are we also having impact outside of academia? Some blue sky basic research is unlikely to do this (but perhaps likely to be cited by academics). Some applied research can have immediate impact on medical outcomes, law and policy, societal attitudes, civil rights, environmental strategies and business practices.

The UK tries to measure this kind of research impact by asking universities to submit REF Impact case studies. These are summary documents, written by academics, describing what impact they've had. They're not easy to write, or to evidence, and yet they are used as part of the REF exercise measuring research quality, whereby Higher Education funding bodies distribute research money to the universities.

How can we make it easier for academics to find and present their impact? Before we answer that, is the whole exercise worth doing anyway? We need to know if we could just cheaply count citations and use this as a proxy for real world impact. After all, if a paper is popular with academics, surely it's also going to be useful to the rest of the world too?

We've done the analysis in Measuring scientific impact beyond academia: An assessment of existing impact metrics and proposed improvement. TL;DR: the answer is 'no'. Academic impact is not correlated with more comprehensive impact in the real world. We're not surprised, but we needed to prove it. But don't take our word for it: we've made all the data available. Try it yourself.
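If you do want to try it yourself, one common way to ask whether two measures move together is a rank correlation. The sketch below is a plain-Python Spearman correlation, offered only as a starting point: it is not necessarily the statistic used in the paper, and the function names and the two short lists of numbers are invented for illustration rather than taken from our data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either list is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical numbers: citations per paper vs. mentions outside academia.
citations = [120, 45, 300, 10, 75]
outside_mentions = [2, 9, 1, 4, 0]
print(spearman(citations, outside_mentions))
```

A value near +1 would mean the highly cited papers are also the ones with most outside impact; a value near 0 would mean citation counts tell you little about it.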

On with the next steps now... James will now be data mining the real world impact of scientific research, from the collection of unstructured documents out there in news archives, parliamentary proceedings, etc. We expect NLP, data mining and machine learning challenges ahead as we trace the movement of academic ideas and results out from academia and into the rest of the world.

Wednesday, 3 August 2016

Holding an internal research workshop

We have just held the 4th Aberystwyth Bioinformatics Workshop. It's a one-day workshop, held with no budget, and intended to be a mostly internal informal research networking event.

We call for 5 minute lightning talks, 20 minute longer talks, demos of software, and posters. We end up with a good mixture of all of these. We especially encourage new PhD students to present, and for all attendees to be friendly and supportive rather than combative in their questions. Registration is done by a very simple Google form (name, email, what kind of talk, title of talk/poster, any other comments). Registration closes one week before the workshop. Tea and coffee are acquired somehow, a room is booked, talks are arranged into a programme, and then away we go.
Aber Bioinformatics Workshop attendees July 28th 2016. Photo by Sandy Spence.

Each time we've done this we have ended up with a full day of talks. People use it to let others know what they're working on, to practise a talk they're preparing for an external conference, to ask for advice on their work, to describe the state of the compute cluster facilities and to just introduce new people. Bioinformatics at Aberystwyth is mostly done within the biology departments of IBERS, but this meeting allows Computer Science and Maths people to join in, and make interdisciplinary links. Finally we go down to the pub, and continue the discussions there.

It's a very low-cost, minimal-preparation way to bring together a group of otherwise independent researchers. Many bioinformaticians feel that they are either the only one in their group, or else that they're not really a bioinformatician at all and are somehow masquerading as one. I've learned a great deal from each workshop that we've had, and it's just great to find that we do have a surprisingly strong local support network in such a specialist field.

Yr Eisteddfod

Dw i'n dysgu Cymraeg. I'm a Welsh learner (still making many mistakes). Wythnos diwetha es i i'r Eisteddfod i helpu yn yr Pafiliwn Gwyddoniaeth a Thechnoleg (dydd Gwener i dydd Sul). Last week I went to the Eisteddfod to help in the Science and Technology Pavilion (Friday to Sunday). Mae hi'n fy Eisteddfod gyntaf. It was my first Eisteddfod. Bendigedig! Bydda yn mynd eto blwyddyn nesaf. Fantastic! I'll go again next year.

Thursday, 16 June 2016

Aros yn Ewrop : the metagenome of our countries

My thoughts on the parallels between our work in metagenomics and the referendum on June 23rd 2016.

A community of species
sharing cross-genomic pieces,
co-existing in the rumen
work to chew the grass around.

A community of nations
with historical relations
can assemble a consensus
to agree on common ground.

The variation that we're seeing
gives an accent to each being
and to the metagenome union,
and the medley swings in sound.

Vote to stay, aros yn ewrop!
Exchange people, plans and workshops.
In a globe of sequence differences
we can learn from each one found.

Wednesday, 20 April 2016

Gregynog Statistical Conference 2016

The Gregynog Statistical Conference is a long-running conference: it began in 1965 and is now in its 52nd year. Its history is so long that its origins predate box-and-whisker plots, bootstrapping and the R language. But statistics has clearly been relevant and important for the last 52 years and will no doubt remain so for the next 52 years.

Gregynog Hall, where the conference is held every year, is in the heart of mid Wales. It is a beautiful old mansion bequeathed to the University of Wales by the Davies sisters, and now used for conferences, music festivals and educational activities, such as our computer science undergraduate weekends away.

This year the conference's main themes seemed to be the modelling of epidemics using variants of S-I-R models, MCMC, and Markov models in general. Another topic for discussion was p-values, following the Friday evening after-dinner talk on this subject by David Colquhoun. The statistical power of experiments, and the use of meta-studies to combine data from smaller studies, was also a recurring theme. Some of the talks I enjoyed were by Ruth King, who described how to include time spent in each state (dwell time) in a Markov model, and Simon Spencer, who explained S-I-R epidemic models and went on to use MCMC and importance sampling to estimate his model parameters. Also Chris Jewell, who described the challenges of modelling vector-borne disease outbreaks in cattle in New Zealand, while at the same time providing real-time advice to government on how to manage the course of the disease.
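For anyone who hasn't met an S-I-R model before, the idea is that each individual is Susceptible, Infected or Recovered, with people moving from S to I at a rate that depends on contact with the infected, and from I to R at a recovery rate. The sketch below is the simplest deterministic textbook version, stepped forward with small time increments. It is not the stochastic, MCMC-fitted modelling presented in the talks, and the function name and all parameter values are assumptions chosen purely for illustration.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Deterministic S-I-R model integrated with simple Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    beta is the transmission rate and gamma the recovery rate (both per day).
    Returns a list of (time, S, I, R) tuples.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n * dt   # flow from S to I in this step
        new_recoveries = gamma * i * dt          # flow from I to R in this step
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


# Hypothetical outbreak: 1000 individuals, 10 infected at the start.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=160)
peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
print(f"Infections peak around day {peak_time:.0f} "
      f"with about {peak_infected:.0f} infected")
```

In a real analysis beta and gamma are unknown and have to be estimated from outbreak data, which is where methods such as MCMC and importance sampling come in.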

The poster session was a little haphazard. Somehow the posterboards hadn't arrived so posters were bluetacked to the cupboards, blackboards and walls. But the range of topics was good, from Sam Nicholls' work on modelling the metahaplome in metagenomics, to students from Warwick working on the approximation of integration and partial derivatives using Gaussian functions, and a meta-analysis of studies on delayed rewards and delayed penalties (receiving £10 today instead of £20 next week, vs minus £10 today instead of minus £20 next week).

Hopefully another new statistics lecturer will be joining our maths department shortly, as we're recruiting at the moment. Statistics underlies almost every area of research now, particularly in the sciences. We do need to make sure that we keep talking to the expert statisticians regularly.