Showing posts with label inter-disciplinary.

Wednesday, 3 August 2016

Holding an internal research workshop

We have just held the 4th Aberystwyth Bioinformatics Workshop. It's a one-day workshop, held with no budget, and intended to be a mostly internal informal research networking event.

We call for 5 min lightning talks, 20 minute longer talks, demos of software, and posters, and we end up with a good mixture of all of these. We especially encourage new PhD students to present, and all attendees to be friendly and supportive rather than combative in their questions. Registration is done by a very simple Google form (name, email, what kind of talk, title of talk/poster, any other comments), and closes one week before the workshop. Tea and coffee are acquired somehow, a room is booked, talks are arranged into a programme, and then away we go.
Aber Bioinformatics Workshop attendees July 28th 2016. Photo by Sandy Spence.

Each time we've done this we have ended up with a full day of talks. People use it to let others know what they're working on, to practise a talk they're preparing for an external conference, to ask for advice on their work, to describe the state of the compute cluster facilities and to just introduce new people. Bioinformatics at Aberystwyth is mostly done within the biology departments of IBERS, but this meeting allows Computer Science and Maths people to join in, and make interdisciplinary links. Finally we go down to the pub, and continue the discussions there.

It's a very low cost, minimal preparation way to bring together a group of otherwise independent researchers. Many bioinformaticians feel that they are either the only one in their group, or that they're not really a bioinformatician at all and are somehow masquerading as one. I've learned a great deal from each workshop we've had, and it's just great to find that we have a surprisingly strong local support network in such a specialist field.

Monday, 8 June 2015

Aber Bioinformatics Workshop

Last week we had the 2nd Aber Bioinformatics Workshop. It's an internal workshop for work-in-progress talks, posters and networking and the aim is for us all to keep up with what's going on in Aberystwyth in bioinformatics across departments and institutes. We had a wide range of talks on genomics and sequence analysis, metabolomics, optimising proteins, population and community modelling, data infrastructure and other topics. Here's the programme for the day.

Photo of all the attendees, taken by Sandy Spence
It was great to see that we now have so many people interested and working in bioinformatics, despite the difficulties in trying to understand all sides of the story (the biology, the computing, the statistics, etc). We talked about the range of modules and courses that were available to help people get up to speed with this, and how we should do more to let new PhD students know what is available. Also, now that we've had the workshop, hopefully we're more aware of the expertise and facilities available here in Aber, so we now know who to approach with questions and ideas.

At the end of the day we moved down to the pub, and continued to discuss more random topics: beetles, plant senescence, hens, temperature sensing wires for computer clusters, and concordance in Shakespeare texts. I'm sure this all helps in the long run.

Wednesday, 22 April 2015

Computer Science and Lindy Hop

It would seem that Lindy Hop is the dance of computing people, physicists and engineers. If you go to any swing dance camp, an unreasonable proportion of the people in the room will be somehow involved in IT. We have Lindy hoppers who have used Android phones with sensors and Fourier transforms to look at the pulse of the dance, who use Lindy to illustrate quantum computing, and there is even a specific Lindy dance class for engineers. Sam Carroll described how digital-media savvy the community was and is, in her Step Stealing work.

Okay, so people need money to go to dance camps, and computing professions generally pay well. And dancing gets us away from our desks to have fun with other people and music. However, these can't be the only reasons.

I think that I enjoy Lindy for lots of the same reasons that I enjoy computing. They're both about creating complex structures that are somehow beautiful. By complex structures I mean structures that are complicated enough that they make me feel pleased when I finally successfully make them work. By beautiful I mean code/dance/ideas that become elegant because of their appropriateness in that particular situation. And in both computing and Lindy I enjoy the reusable patterns. Reusable patterns in rhythm are like reusable patterns in computing: once you've understood them, they stay with you and can often tell you something more abstract about what you're trying to do.

So I think that computing and Lindy have more in common than just having fun. They also share reusable beautiful complexity.


Added note: If you want to try it out, come and join our Vintage Swinging in the Rain party on Friday 24th April, 8pm, Marine Hotel, Aberystwyth. There's a short dance class for beginners at about 8:30, and live music from The Paper Moon Band.

Tuesday, 10 March 2015

Python for Scientists

This year, 2014/2015, we started a new MSc course: Statistics for Computational Biology. We can see that there's a huge demand for bioinformaticians, for statisticians who can read biology, and for programmers who know about statistics and can apply stats to biological problems. So this new MSc encompasses programming, statistics and loads of the current hot topics in biology. It's the kind of MSc I would have loved to have done when I was younger.

As part of this degree, I'm teaching a brand new module called Programming for Scientists, which uses the Python programming language. This is aimed at students who have no prior programming knowledge, but have some science background. And in one semester we teach them the following:
  • The basics of programming: variables, loops, conditionals, functions
  • File handling (including CSV)
  • Plotting graphs using matplotlib
  • Exceptions
  • Version control using Git/Github
  • SQL databases (basic design, queries, and using SQLite from Python)
  • XML processing
  • Accessing data from online APIs 
We taught it as a hands-on module, lectures held in a room full of computers, programming as we go through the slides, with exercises interspersed and demonstrators on hand to help.


We had students sign up for this module from a surprisingly diverse set of backgrounds, from biology, from maths, from geography and even from international politics. We also had a large number of staff and PhD students from our Biology department (IBERS) who wanted to sit in on the module. This was a wonderful group of students to teach. They're people who wanted to learn, and mostly just seemed to absorb ideas that first year undergraduates struggle with. They raised their game to the challenge. 

Python's a great language for getting things done, so it makes a good hands-on language. However, the module did highlight several of Python's limitations as a first teaching language. The objects/functions issue: I chose not to introduce the idea of objects at all. It's hard enough to fit this much material comfortably into the time we had, and objects, classes and subclasses were something I chose to leave out. So we have two ways to call functions: len(somelist) and somelist.reverse(). That's unfortunate. Variable scoping caught me out on occasion, and I'll have to fix that for next year. The Python 2 vs Python 3 issue was also annoying to work around. Hopefully next year we can just move to Python 3.
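The function-versus-method inconsistency is easy to show in miniature: `len` is a built-in function applied to a list, while `reverse` is a method called on the list, and it mutates in place rather than returning a value.

```python
nums = [3, 1, 2]

print(len(nums))   # built-in function style → 3

nums.reverse()     # method style: reverses in place and returns None
print(nums)        # → [2, 1, 3]
```

For a beginner, the fact that `nums.reverse()` returns `None` (so `nums = nums.reverse()` silently loses the list) is a classic stumbling block.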


What impressed me most was the quality of the final assignment work. We asked the students to analyse a large amount of data about house sales, taken from http://data.gov.uk/ and population counts for counties in England and Wales taken from the Guardian/ONS. They had to access the data as XML over a REST-ful API, and it would take them approximately 4 days to download all the data they'd need. We didn't tell them in advance how large the data was and how slow it would be to pull it from an API. Undergrads would have complained. These postgrads just got on with it and recognised that the real world will be like this. If your data is large and slow to acquire then you'll need to test on a small subset, check and log any errors and start the assignment early. The students produced some clean, structured and well commented code and many creative summary graphs showing off their data processing and data visualisation skills.
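For flavour, here is a minimal sketch of the XML-parsing side of an assignment like this, using only the standard library. The element and field names below are made up for illustration; they are not the actual schema of the data.gov.uk feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the general style of house-sales data;
# real field names in the data.gov.uk API will differ.
xml_text = """
<sales>
  <sale><county>Ceredigion</county><price>180000</price></sale>
  <sale><county>Ceredigion</county><price>220000</price></sale>
</sales>
"""

root = ET.fromstring(xml_text)
prices = [int(sale.findtext("price")) for sale in root.iter("sale")]
print(sum(prices) / len(prices))  # → 200000.0
```

In the real assignment this parsing sat behind a slow paged API, which is exactly why testing on a small local subset first, and logging errors, mattered so much.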

I hope they're having just as much fun on their other modules for this course. I'm really looking forward to running this one again next year.

Friday, 18 July 2014

Microscope webcam microtitre plate reading using image analysis

An A-level student has just spent two weeks with us for his work experience, and his project has been to investigate the use of a cheap microscope webcam as an alternative to an expensive plate reader for the measurement of the growth of yeast in microtitre plates. The longer term aim would be to mount this webcam on the deck of our Tecan Genesis liquid handler robot, and to have the robot arm move the plate under the webcam.

The webcam is a Veho VMS-004, used at 20x magnification, and it costs just £40. It was recognised automatically by Linux as a webcam and worked really well with the OpenCV library.
Robert Buchan-Terrey did an excellent job in interdisciplinary science in just two weeks, including the following:
  • Preparing media and growing yeast in our lab
  • Pipetting the yeast to make dilutions
  • Using the microscope webcam to take images of the wells in the plate at intervals throughout the day, and taking corresponding readings with a real plate reader
  • Coding using Python and OpenCV to process the images (find the circular well, work out the average pixel intensity in the well)
  • Data analysis and stats to understand the results
He also produced a poster to demonstrate the findings and to take back to his school.
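The measurement step can be sketched very simply. The snippet below uses a synthetic image and a circular mask to compute the average intensity inside a well; in the real pipeline the circle would be located automatically (e.g. with OpenCV's HoughCircles), so the given centre and radius here are an assumption for illustration.

```python
import numpy as np

def mean_well_intensity(image, cx, cy, r):
    """Average pixel intensity inside a circular well centred at
    (cx, cy) with radius r, on a single-channel image."""
    h, w = image.shape
    ys, xs = np.ogrid[:h, :w]
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= r ** 2
    return image[mask].mean()

# Synthetic test image: a dark circular "well" on a bright background
img = np.full((100, 100), 200.0)
ys, xs = np.ogrid[:100, :100]
img[(xs - 50) ** 2 + (ys - 50) ** 2 <= 20 ** 2] = 50.0

print(mean_well_intensity(img, 50, 50, 20))  # → 50.0
```

Darker wells (more yeast scattering the light, depending on illumination) would then be compared against the optical-density readings from the plate reader.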

And the answer is: although he's just analysed the data from one time point so far, and we took no care to make sure the lighting conditions were stable when taking the images, or to shake the plates to evenly disperse the yeast, it really does look very plausible that we could use this in future. Averaging over 8 replicate wells gives a remarkable correspondence between image-analysis results and plate reader results. Individual wells are more variable, but still show promise. We've yet to test all the data, and to test the full range of the scale of optical density, but this looks extremely exciting.

Thanks very much to Wayne Aubrey and Hannah Dee for their help and expertise with the yeast biology and the image processing respectively.

Monday, 22 July 2013

The Welsh Crucible

This year I applied for a place on the Welsh Crucible, and my application was accepted. The Welsh Crucible is a yearly scheme for researchers in Wales. It takes 30 researchers from different institutions and different disciplines and puts us all together for 3 workshops ("labs") over the summer, to see what interesting ideas and collaborations can be forged.

So I'm now talking about grant applications to a plant biologist in Cardiff, an environmental chemist in Bangor and a physical geographer in Aberystwyth, and I have a new network of future collaborators spanning a huge range of subjects.

Here are a few other thoughts about the experience, in no particular order:

It's a fantastic networking opportunity, a great way to make contacts for subjects outside your immediate area.

Over the course of the 3 labs I became better at introducing my research in a short and presentable manner. At the start I found this very difficult. At the end it's still hard, but I'm making progress. As a researcher I often feel that I can't really describe my research, full of its day-to-day technical details, in a way that anyone else will understand. One of the exercises they asked us to do was to write 100 words about our research for a lay person. Another was speed-networking (just 3 minutes per person and then move on). My description of my work still varies, depending on who I'm talking to, but I'm much happier to attempt it now. Doing these exercises not only helps us communicate with other researchers, but also makes us introspect and see if we're actually doing what we want to be doing.

Speed networking
Speed networking. Photo by Keith Morris.

The best interdisciplinary work is likely to come from a real working friendship where we already trust each other. So build these, and the rest will follow. The Crucible has actually made me take more of an interest in other disciplines and it allows me to feel that it is okay to do so. It's good to be interested in other disciplines, even if the REF assessment says otherwise. In fact it's generally made me think about the longer term, about putting good foundations in place and not worrying about short term measurements, and individual wins and losses. From Crucible people I've learned about gallium teaspoons, arthritis inflammation and bone chomping, the placebo effect, dating rocks by luminescence under big black sheets, Jews in Scotland, and the problems of conducting health studies across populations of people. I'll definitely go to more seminars from other departments now. And maybe I'll wander into some other departments' coffee rooms too...

The wants and offers wall
The wants and offers wall. Photo by Keith Morris.
We were asked to start off by introducing ourselves with a 9-slide pecha kucha presentation. Far more of the male participants than the female participants included a picture of their children in this introduction (about 8/19 male vs about 0/11 female).

Several of the Crucible participants were promoted or took on new positions of responsibility between lab two and lab three. One participant described it as a useful benchmarking experience. We look at others around us and see what can be achieved. 

We have some very talented and enthusiastic researchers across Wales. As a group, we cover a lot of research expertise, and we have the whole range of skills (people management, media engagement, conference organising, book writing, schools outreach, grant writing, presenting, teaching, etc). I look forward to meeting everyone again at the reunion.

All the 2013 Crucible attendees
All the Crucible 2013 attendees. Photo by Keith Morris.

Wednesday, 6 June 2012

Alan Turing, the first bioinformatician

This is the Alan Turing centenary year, and Alan Turing would have been 100 years old this month (on 23rd June) if he had lived this long. As well as inventing computers, theories of decidability, computability, computational cryptography and artificial intelligence, just before his death he also studied the relationship between mathematics and the shapes and structures found in biology. How do patterns in plants, such as the spiral packing of the seeds found in the head of a sunflower, come about? This year, in a big experiment devised by Prof Jonathan Swinton to celebrate Turing's centenary, sunflowers are being grown across the country. The seed heads will be collected, their patterns counted, then hopefully the results will demonstrate the relationship between Fibonacci numbers and biological growth that Turing was investigating. We're growing two sunflowers here in the Dept of Computer Science at Aberystwyth University as part of this experiment. Their names were voted on by the Department, and "one" and "zero" were chosen.



Turing used an early computer at Manchester (the Ferranti Mark 1, the first commercially available general purpose electronic computer) to model the chemical processes of reaction and diffusion, which could give rise to patterns such as spots and stripes. You can play with a Turing reaction-diffusion applet online, which shows how changes to the diffusion equation parameters produce different patterns. Turing wrote, near the end of his 1952 paper The Chemical Basis of Morphogenesis that:
"Most of an organism, most of the time, is developing from one pattern into another, rather than from homogeneity into a pattern. One would like to be able to follow this more general process mathematically also. The difficulties are, however, such that one cannot hope to have any embracing theory of such processes, beyond the statement of the equations. It might be possible, however, to treat a few particular cases in detail with the aid of a digital computer."

He goes on to elaborate on how computers had already been extremely useful to him in helping him to understand his models (no need to make so many simplifying assumptions). The fact that he actually used computers to investigate the models underlying biology makes him the first bioinformatician / computational biologist. The fact that he could see the future, and could see how computers would enable us to model and explore the natural sciences, makes him an amazingly visionary scientist.
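Reaction-diffusion models of the kind Turing studied are easy to experiment with today. The sketch below simulates the Gray-Scott system, a standard modern demonstration of reaction-diffusion patterning, not Turing's original 1952 equations; the grid size, parameters and step count are illustrative choices.

```python
import numpy as np

def gray_scott(n=64, steps=2000, Du=0.16, Dv=0.08, f=0.035, k=0.065):
    """Simulate the Gray-Scott reaction-diffusion system on an n x n
    periodic grid. U and V are two chemical concentrations; V is
    seeded in a small central patch, and spots/stripes emerge as the
    reaction and diffusion terms interact."""
    U = np.ones((n, n))
    V = np.zeros((n, n))
    mid = n // 2
    V[mid - 3:mid + 3, mid - 3:mid + 3] = 0.5  # initial perturbation

    def lap(Z):
        # 5-point Laplacian with periodic (wrap-around) boundaries
        return (np.roll(Z, 1, 0) + np.roll(Z, -1, 0) +
                np.roll(Z, 1, 1) + np.roll(Z, -1, 1) - 4 * Z)

    for _ in range(steps):
        Lu, Lv = lap(U), lap(V)
        uvv = U * V * V           # the reaction term U + 2V -> 3V
        U += Du * Lu - uvv + f * (1 - U)
        V += Dv * Lv + uvv - (f + k) * V
    return U, V

U, V = gray_scott()
```

Plotting `V` (e.g. with matplotlib's `imshow`) after a few thousand steps shows the kind of spot patterns Turing's paper predicted; varying `f` and `k` moves between spots, stripes and homogeneity.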


Saturday, 3 March 2012

De Bruijn graphs

De Bruijn graphs are currently a cornerstone of several genome sequence assembly algorithms. Nicolaas de Bruijn was a Dutch mathematician who died in February this year, 2012. In 1946, he created the idea of the graph that is now named after him. The picture of him here is copyright MFO, and others can be seen in the Oberwolfach photo collection. I like to think he might be drawing graphs while resting on this bench. It looks like there's a football beside him, though he's hardly dressed for a game of football, in a tweed suit and tie.

The nodes of de Bruijn graphs are short sequences. There is an edge between two nodes if you could convert one sequence into another by shifting it along by one character and adding a new character at the end. So the nodes "AACAA" and "ACAAB" are allowed to be joined by an edge. In Haskell, you might say that tail node1 == init node2. In Python you might say node1[1:] == node2[:-1]. A very readable article in Nature Biotechnology describes how these graphs are used in sequence assembly. Basically, a subsequence size is chosen (for example, 32 characters), and all subsequences of this size that are found in your read fragment data then become nodes in a de Bruijn graph. We search for a good path through this graph after cleaning up some of the noise and errors and deciding what to do about loops. The path through the graph will tell you the final sequence.
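Building the graph itself is straightforward. The sketch below follows the description above (nodes are k-mers, with an edge whenever one node's tail equals the next node's init); it is illustrative only, since real assemblers add error correction and far more compact data structures.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from sequence reads.
    Nodes are the k-character subsequences of the reads; there is a
    directed edge node1 -> node2 whenever node1[1:] == node2[:-1],
    i.e. node2 is node1 shifted along by one character."""
    edges = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            left = read[i:i + k]
            right = read[i + 1:i + k + 1]
            edges[left].add(right)
    return edges

graph = de_bruijn_graph(["AACAAB"], 5)
# "AACAA" and "ACAAB" overlap by four characters, so the graph
# contains the edge AACAA -> ACAAB
```

Assembly then amounts to finding a good path through this graph, as described above, after cleaning up noise, errors and loops.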

At the moment, sequence assembly is still a big problem. A new genome is sequenced as a collection of short overlapping sequence pieces (short is usually anywhere between 32 and 500 bases at the moment), which then have to be painstakingly pieced together. There are errors in the sequencing, and repetitive regions, and this complicates the problem. The size of the data is also a problem. Plant genomes can have around 20 billion bases, so working with files and having enough memory to store data structures adds to the problem.

So much must have changed in de Bruijn's lifetime. Back in 1946 the structure of DNA was not clear. Watson and Crick's paper in 1953 was yet to come. Computers were yet to come. He wouldn't have known that more than 60 years later we'd be using his graphs to assemble whole genomes that were sequenced using the detection of fluorescence. What will we be doing 60 years from now that links maths, computer science, physics, chemistry and biology?

Friday, 13 January 2012

Artificial Intelligence and Microscopes

Artificial Intelligence has always been a branch of Computer Science that really catches the imagination of both scientists and the public. Trying to understand and replicate intelligence in all its different forms (reasoning, creativity, decision making, planning, language, etc) is exciting because it helps us to understand ourselves. Computer scientists such as Alan Turing have been pondering the implications and possibilities of AI since the 1940s and 50s. In 1951, Marvin Minsky built the first randomly wired neural network learning machine. He had studied mathematics and biology and was trying to understand brains. He's now famous for his work in AI, but, back in the 1950s, he wasn't just a mathematician, or just a computer scientist, but also studied optics, psychology, neurophysiology, mechanics and other subjects. Perhaps we pigeonholed people less into disciplines back then? Or maybe he was just amazing. Armed with all this knowledge, and a desire to learn about the brain and to look at neurons, he invented a new type of microscope, the confocal microscope. This gets rid of unwanted scattered light so that he could really focus in detail on a very specific part of the item he was looking at. Now he could see things that had never been seen before. He built the first one, and then patented this microscope in 1961. It would be another 20 years before the idea caught on (what would the research impact monitoring committees of today make of that?). Confocal microscopes are now in every biological lab and are taken for granted.

C. elegans is a 1mm long worm which lives in the soil. It is a very simple creature, easy to grow in the lab and it has a brain. Sydney Brenner (who is 85 years old today, 13th Jan 2012) has a Nobel Prize for introducing C. elegans to biologists as a "model organism": an ideal organism for studying the principles of life. In 1986, John White and his colleagues Southgate, Thomson and Brenner published a paper on the structure of the brain of C. elegans. Each worm has just 302 neurons and this number is the same for any C. elegans worm. They worked out where all the neurons were and what their connections to other neurons were, using a confocal microscope. John White had to make substantial improvements to Minsky's microscope design in order to do this. They took 8000 pictures ("prints", because it wouldn't have been digital back then) with the microscope and annotated them all by hand.

So we now have a complete picture of a simple brain. Other scientists have taken the data from White et al.'s work and created models of the brain. We understand a lot about the behaviour of the worm and which of its 302 neurons are responsible for which behaviours. We have the entire C. elegans genome, so we know how many genes it has (approx 20,000), how many cells it has (approximately 1000), and we have a technique (RNA interference) for suppressing the behaviour of any gene we want to investigate. Are we nearly there yet? Are we at that tipping point where we've inspected all there is to inspect and found nothing except complexity? Have we already understood intelligence?
