Wednesday 3 August 2016

Holding an internal research workshop

We have just held the 4th Aberystwyth Bioinformatics Workshop. It's a one-day workshop, held with no budget, and intended to be a mostly internal informal research networking event.

We call for 5-minute lightning talks, longer 20-minute talks, demos of software, and posters. We end up with a good mixture of all of these. We especially encourage new PhD students to present, and for all attendees to be friendly and supportive rather than combative in their questions. Registration is done by a very simple Google form (name, email, what kind of talk, title of talk/poster, any other comments). Registration closes one week before the workshop. Tea and coffee are acquired somehow, a room is booked, talks are arranged into a programme, and then away we go.
Aber Bioinformatics Workshop attendees July 28th 2016. Photo by Sandy Spence.

Each time we've done this we have ended up with a full day of talks. People use it to let others know what they're working on, to practise a talk they're preparing for an external conference, to ask for advice on their work, to describe the state of the compute cluster facilities and to just introduce new people. Bioinformatics at Aberystwyth is mostly done within the biology departments of IBERS, but this meeting allows Computer Science and Maths people to join in, and make interdisciplinary links. Finally we go down to the pub, and continue the discussions there.

It's a very low-cost, minimal-preparation way to bring together a group of otherwise independent researchers. Many bioinformaticians feel that they are either the only one in their group, or else that they're not really a bioinformatician at all and are somehow masquerading as one. I've learned a great deal from each workshop that we've had, and it's just great to find that we do have a surprisingly strong local support network in such a specialist field.

Yr Eisteddfod

Dw i'n dysgu Cymraeg. I'm a Welsh learner (still making many mistakes). Wythnos diwetha es i i'r Eisteddfod i helpu yn yr Pafiliwn Gwyddoniaeth a Thechnoleg (dydd Gwener i dydd Sul). Last week I went to the Eisteddfod to help in the Science and Technology Pavilion (Friday to Sunday). Mae hi'n fy Eisteddfod gyntaf. It was my first Eisteddfod. Bendigedig! Bydda yn mynd eto blwyddyn nesaf. Fantastic! I'll go again next year.

Thursday 16 June 2016

Aros yn Ewrop : the metagenome of our countries

My thoughts on the parallels between our work in metagenomics and the referendum on June 23rd 2016.

A community of species
sharing cross-genomic pieces,
co-existing in the rumen
work to chew the grass around.

A community of nations
with historical relations
can assemble a consensus
to agree on common ground.

The variation that we're seeing
gives an accent to each being
and to the metagenome union,
and the medley swings in sound.

Vote to stay, aros yn ewrop!
Exchange people, plans and workshops.
In a globe of sequence differences
we can learn from each one found.

Wednesday 20 April 2016

Gregynog Statistical Conference 2016

The Gregynog Statistical Conference has been running since 1965 and is now in its 52nd year. Its history is so long that its origins predate box-and-whisker plots, bootstrapping and the R language. But statistics has clearly been relevant and important for the last 52 years and will no doubt remain so for the next 52.

Gregynog Hall, where the conference is held every year, is in the heart of mid Wales. It is a beautiful old mansion bequeathed to the University of Wales by the Davies sisters, and now used for conferences, music festivals and educational activities, such as our computer science undergraduate weekends away.

This year the conference's main themes seemed to be modelling of epidemics, using variants of S-I-R models, MCMC and Markov models in general. Another topic for discussion was p-values, following the Friday evening after-dinner talk on this subject by David Colquhoun. The statistical power of experiments, and meta-studies that combine data from smaller studies, were also recurring themes. Some of the talks I enjoyed were by Ruth King, who described how to include time spent in each state (dwell time) in a Markov model, and Simon Spencer, who explained S-I-R epidemic models and went on to use MCMC and importance sampling to estimate his model parameters. I also enjoyed the talk by Chris Jewell, who described the challenges of modelling vector-borne disease outbreaks in cattle in New Zealand, while at the same time providing real-time advice to government on how to manage the course of the disease.

The poster session was a little haphazard. Somehow the posterboards hadn't arrived so posters were bluetacked to the cupboards, blackboards and walls. But the range of topics was good, from Sam Nicholls' work on modelling the metahaplome in metagenomics, to students from Warwick working on the approximation of integration and partial derivatives using Gaussian functions, and a meta-analysis of studies on delayed rewards and delayed penalties (receiving £10 today instead of £20 next week, vs minus £10 today instead of minus £20 next week).

Hopefully another new statistics lecturer will be joining our maths department shortly, as we're recruiting at the moment. Statistics underlies almost every area of research now, particularly in the sciences. We do need to make sure that we keep talking to the expert statisticians regularly.

Tuesday 5 April 2016

Lovelace Colloquium 2016

This year's Lovelace Colloquium was held at Sheffield Hallam University, last week (March 31st). Sheffield Hallam proved to be a great venue. It's convenient for most people in the UK to get to, with a smart building right by the train station, providing a large poster-exhibiting hall, a modern lecture theatre and a cafeteria, all next to each other.

Every year the Lovelace is an inspiring event. I've now been (and blogged) in 2012, 2013, 2014 and 2015 and it gets bigger and better each year. I'm particularly impressed by the first year undergraduates who are up there presenting posters alongside everyone else, talking to employers and thinking about their future careers.

I didn't get to attend many of the talks this year because I spent more time on the desk and doing organisational jobs (and fixing my poster-numbering error, oops!). But there were some really strong posters covering a wide range of computer science topics, including several with live Arduino demos. The end-of-day panel session featured questions and advice on where the field is going in the future, the pros and cons of a career in industry vs academia, the challenges of running your own business and questions about recruitment.

This year I was also impressed to chat to many interesting people during the evening social, for example Claire and Emily from Relish Learning. After graduating from uni they worked for others until deciding one day that they could do it themselves. They set up their own business in Sheffield, and now provide digital e-learning, for a wide range of topics. They described how they've been recently training people in the Army on how to change the wheel on a tank (imagine animations of the components required, and the order in which to remove parts, etc). They are now keen to help others to succeed, to encourage them to believe that they can and to talk to others about how they did it.

If you are not recommending this event to your women undergrads in computer science, then they are missing out. Poster presenters get expenses refunded and may come away with a prize, thanks to all the sponsors. Employers are keen to meet them, so they will also come away with contacts to help them apply for a job or placement. The photos of the event give a great impression of what it's really like if anyone needs any further reassurance.


Monday 28 March 2016

Goldilocks: census your genomes

Goldilocks is a new tool written by Sam Nicholls for counting interesting properties of genomes. It's very easy to install ("pip install goldilocks") and has a detailed user manual.

So, let's have a look at GC count across each of the chromosomes of Sorghum. Sorghum is a plant that is a reasonably close relative to Miscanthus, which is extensively studied here in Aberystwyth. I downloaded the chromosome assembly of sorghum from RefSeq. Here's the plot, showing amount of GC on the y-axis and position along the chromosome on the x-axis. The 10 Sorghum chromosomes are all shown stacked up in one plot panel.
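The Python code to produce this plot with Goldilocks is as simple as:

from goldilocks import Goldilocks
from goldilocks.strategies import GCRatioStrategy

# Index (.fai) of the sorghum chromosome assembly downloaded from RefSeq
sequence_data = {
    "sorghum": {"file": "./sorghum.fna.fai"},
}
g = Goldilocks(GCRatioStrategy(), sequence_data, length="500K",
               stride="1000K", is_faidx=True)
g.plot("sorghum", title="GC content of sorghum chromosomes")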

The dip in GC for the centromere of each chromosome is obvious, except for chromosomes 2 and 6.
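For anyone wondering what a windowed GC census actually computes, here is a minimal pure-Python sketch. It is not how Goldilocks itself is implemented: the function name gc_ratio_windows and the toy sequence are made up for illustration, and it simply divides the count of G and C by the window length, with no special handling of Ns.

```python
def gc_ratio_windows(seq, length, stride):
    """Return (start, gc_ratio) for every full window of `length` bases.

    Windows begin at 0, stride, 2*stride, ... and a trailing partial
    window is ignored. Illustrative helper only, not part of Goldilocks.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - length + 1, stride):
        window = seq[start:start + length]
        gc = window.count("G") + window.count("C")
        results.append((start, gc / length))
    return results


# Toy "chromosome": GC-rich arms either side of an AT-rich middle
toy = "GGCCGGCCATATATATGGCCGGCC"
for start, ratio in gc_ratio_windows(toy, length=8, stride=8):
    print(start, ratio)
```

On the toy sequence the middle window has a much lower ratio than the two flanking windows, which is the same kind of dip seen at the centromeres in the real plot.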

A similar but inverted pattern can be seen if we look at the number of Ns along the genome:

So, what's different about the centromeres in chromosomes 2 and 6? Why are they not so visible? Another way to spot them would be to look for a motif known to be in the centromeres. Centromeres have many repeats, and a repeat region known to be found in sorghum centromeres is CEN38. Let's choose a short motif from the sequence for CEN38, say "CCTAATG", and census that.
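A motif census is the same idea with a different thing being counted in each window. The sketch below is a plain-Python illustration, not Goldilocks code: count_motif_windows is a hypothetical helper, the toy sequence is invented, and it counts overlapping occurrences of the motif on the forward strand only, missing any occurrence that straddles a window boundary.

```python
def count_motif_windows(seq, motif, length, stride):
    """Return (start, count) of `motif` occurrences in each full window.

    Overlapping occurrences are counted. Illustrative helper only.
    """
    seq = seq.upper()
    motif = motif.upper()
    counts = []
    for start in range(0, len(seq) - length + 1, stride):
        window = seq[start:start + length]
        n = 0
        pos = window.find(motif)
        while pos != -1:
            n += 1
            # Step forward one base so overlapping hits are also found
            pos = window.find(motif, pos + 1)
        counts.append((start, n))
    return counts


# Toy sequence: two copies of the motif, a gap of As, then one more copy
toy = "CCTAATG" * 2 + "A" * 14 + "CCTAATG" + "A" * 7
print(count_motif_windows(toy, "CCTAATG", length=14, stride=14))
```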

There's clearly plenty of this motif found in chromosomes 2 and 6, and found where we might expect a centromere to be (there is also lots of this motif in the centromeres of chr 3 and chr 5). But it's not found in all chromosomes. Could it be that CEN38 varies its sequence in the other chromosomes, and so doesn't have precisely that motif? Or that too many Ns in the other chromosomes stop CEN38 being characterised?

This is just a simple demonstration of how Goldilocks can be used to explore questions. And questions lead to more questions, and then many a happy hour can be spent browsing your genomes. Goldilocks can also be used to export details about which regions are the most interesting (hence the name: it finds regions that are "just right", for whatever your "just right" criterion might be).
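To give a flavour of the "just right" idea, here is a small sketch that picks out windows whose census value is close to a chosen target. It is only an illustration under my own assumptions: the function just_right, its target and tolerance parameters, and the example counts are all invented here and are not the Goldilocks query interface.

```python
def just_right(windows, target, tolerance):
    """Return the (start, value) windows within `tolerance` of `target`,
    ordered with the closest to the target first. Illustrative helper only.
    """
    hits = [(start, value) for start, value in windows
            if abs(value - target) <= tolerance]
    return sorted(hits, key=lambda w: abs(w[1] - target))


# Invented per-window motif counts as (window start, count)
windows = [(0, 2), (14, 0), (28, 1), (42, 5)]
print(just_right(windows, target=2, tolerance=1))
```

Changing the target and tolerance changes what counts as interesting, which is the point: the criterion is yours to choose.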

Enjoy browsing your genomes! Goldilocks paper, Goldilocks docs, Goldilocks source code.

Saturday 30 January 2016

Playful coding: computing activities for schools

In many schools, computing is a topic that needs more encouragement. The Playful Coding project wants to make practical activities that can be run in schools to explore ideas in computer science. I've just been along to one of their meetings and seen it in action. It was extremely inspiring to be in a room full of people who didn't see running computing engagement activities as a chore, but as fun. They had all put a lot of thought into making fun activities and all wanted to run their activities with the groups of children.

It's an EU project involving teachers and university researchers from Spain, Romania, Italy, France and us in Aberystwyth, Wales. Each project partner had developed several activities and the purpose of the meeting was to test out many of these activities on children and their teachers, and to start to develop a guide for teachers to explain how to use them. Until that guide is produced, you can still browse the activities and have a go with them. Try out for example:
There are lots more activities to choose from. Some just take an hour, some take a day, and some span a term. Some use robots, some use no particular equipment. They can be embedded into other lessons (languages, maths, science, art), or just standalone. And they are adaptable for different age ranges.

To follow the project see the Playful Coding website, follow #playfulcoding on Twitter or find Playful Coding on Facebook.