Monday, 16 April 2012

Ada Lovelace Colloquium 2012


On 12th April, I went for the first time to the BCS Women Ada Lovelace Colloquium, a yearly national event for women undergraduate and masters students in computing. This year's event was held at the University of Bath. Before I went, I chatted to one of our male members of staff. When I described where I was going, he said "Isn't that a bit sexist?" If only! If only it were a level playing field for women in computer science. Most undergrad compsci courses in the UK run at about 10% female, 90% male. If I go to a compsci conference, my gender will be hugely in the minority, at or below 10%. At this year's Turing Centenary Conference, only 1 of the 19 invited speakers will be a woman. The Lovelace Colloquium reverses that gender ratio: there were men in attendance, and they were welcomed, but they were the ones in the minority. And all the speakers were female.


But apart from the gender ratio, it looked and felt just like any other academic conference, and the standard of work was very high. There was a mix of motivational talks and technical talks, and an excellent poster session. I was one of the judges for the masters poster competition and it was a really difficult decision. Each of the poster presenters was able to tell us in enthusiastic detail about their work and ideas, and several of them had brought along demos. We eventually awarded the prize to a poster by Wuraola Jinadu from Robert Gordon Uni in Aberdeen, who was creating an iPad app for designing UML diagrams.

Saturday, 3 March 2012

De Bruijn graphs

De Bruijn graphs are currently a cornerstone of several genome sequence assembly algorithms. Nicolaas de Bruijn was a Dutch mathematician who died in February this year. In 1946, he came up with the idea of the graph that is now named after him. The picture of him here is Copyright: MFO, and others can be seen at the Oberwolfach photo collection. I like to think he might be drawing graphs while resting on this bench. It looks like there's a football beside him, though he's hardly dressed for a game, in a tweed suit and tie.

The nodes of de Bruijn graphs are short sequences, all of the same length. There is a directed edge from one node to another if you can convert the first sequence into the second by shifting it along by one character (dropping its first character) and adding a new character at the end. So there can be an edge from "AACAA" to "ACAAB". In Haskell, you might say that tail node1 == init node2. In Python you might say node1[1:] == node2[:-1]. A very readable article in Nature Biotechnology describes how these graphs are used in sequence assembly. Basically, a subsequence size is chosen (for example, 32 characters), and all subsequences of this size that are found in your read fragment data become nodes in a de Bruijn graph. After cleaning up some of the noise and errors and deciding what to do about loops, we search for a good path through this graph. The path through the graph spells out the final sequence.
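To make the idea concrete, here is a minimal toy sketch in Python, following the description above: nodes are the distinct subsequences of length k (k-mers) found in the reads, and there is an edge wherever one k-mer overlaps the next by k-1 characters. The helper names (kmers, is_edge, build_graph, greedy_path, spell) and the tiny example reads are hypothetical, for illustration only. It does none of the error cleaning or loop handling mentioned above, and the simple walk is an assumption for demonstration, not how a real assembler chooses its path.

```python
from collections import defaultdict


def kmers(read, k):
    """All substrings of length k in one read (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def is_edge(node1, node2):
    """True if node2 is node1 shifted along by one character plus a new last character."""
    return len(node1) == len(node2) and node1[1:] == node2[:-1]


def build_graph(reads, k):
    """Nodes are the distinct k-mers in the reads; each maps to the nodes it has edges to."""
    nodes = {km for read in reads for km in kmers(read, k)}
    # Index nodes by their first k-1 characters, so each node's successors
    # are found with one lookup instead of comparing every pair of nodes.
    by_prefix = defaultdict(set)
    for node in nodes:
        by_prefix[node[:-1]].add(node)
    graph = {node: sorted(by_prefix.get(node[1:], ())) for node in nodes}
    return nodes, graph


def greedy_path(nodes, graph):
    """Walk from a node with no incoming edges, stopping at a branch, dead end or loop."""
    indegree = {node: 0 for node in nodes}
    for node in nodes:
        for succ in graph[node]:
            indegree[succ] += 1
    starts = sorted(n for n in nodes if indegree[n] == 0)
    node = starts[0] if starts else min(nodes)
    path, seen = [node], {node}
    while len(graph[node]) == 1 and graph[node][0] not in seen:
        node = graph[node][0]
        path.append(node)
        seen.add(node)
    return path


def spell(path):
    """Read the sequence off a path: the first node, then the last character of each later node."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Two overlapping toy reads, using k = 4 instead of a realistic size like 32.
nodes, graph = build_graph(["ACGTTG", "GTTGCA"], k=4)
print(spell(greedy_path(nodes, graph)))  # ACGTTGCA
```

Because the walk stops at the first branch, this only recovers simple, error-free, non-repetitive stretches; handling branches, errors and loops is where the real work of an assembler lies.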

At the moment, sequence assembly is still a big problem. A new genome is sequenced as a collection of short overlapping pieces (short is usually anywhere between 32 and 500 bases at the moment), which then have to be painstakingly pieced together. Sequencing errors and repetitive regions complicate the problem. So does the sheer size of the data: plant genomes can have around 20 billion bases, so just working with the files and having enough memory to store the data structures is a challenge in itself.
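As a rough back-of-the-envelope estimate, assuming the most compact simple encoding of 2 bits per base (one of the four letters A, C, G, T) and counting only one finished copy of the genome, a 20-billion-base plant genome takes:

$$20 \times 10^{9}\ \text{bases} \times 2\ \tfrac{\text{bits}}{\text{base}} = 4 \times 10^{10}\ \text{bits} = 5 \times 10^{9}\ \text{bytes} \approx 5\ \text{GB}$$

The reads, which cover the genome many times over, and a graph with a node for every distinct subsequence will need considerably more than this.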

So much must have changed in de Bruijn's lifetime. Back in 1946 the structure of DNA was not known: Watson and Crick's paper in 1953 was yet to come. Stored-program computers were yet to come too. He wouldn't have known that more than 60 years later we'd be using his graphs to assemble whole genomes that were sequenced by detecting fluorescence. What will we be doing 60 years from now that links maths, computer science, physics, chemistry and biology?

Wednesday, 25 January 2012

We need more than one programming language

I teach Haskell as a programming language to our undergraduates. I'm sure it's a continual subject of debate in computer science department coffee rooms up and down the country: "Which programming languages should we teach in our CS degrees?" The module that I teach is called "Concepts in Programming", and the idea is that there are indeed concepts in programming that are fundamental to many languages, that you can articulate the differences between programming languages, and that these differences give each language different characteristics: differences such as the type system, the order of evaluation, the options for abstraction, and the separation of data and functions.

"We need more than one" is the title of a paper by Kathleen Fisher, a Professor in Computer Science at Tufts University. Her short, eloquent paper describes why we will never have just one programming language ("because a single language cannot be well-suited to all programming tasks"). She has had a career spanning industry and academia, has been the chair of the top programming language conferences (OOPSLA, ICFP), and has been the chair of the ACM Special Interest Group on Programming Languages. Her paper On the Relationship between Classes, Objects and Data Abstraction tells you everything you ever needed to know about objects. She knows about programming.

Her recent work has been looking at that problem of how to read in data files when the data is in some ad-hoc non-standard representation (i.e. not XML with a schema, or CSV or anything obvious). So we all have to write parsers/data structures to fit these data files. We write a program in a language such as Perl. She says "The program itself is often unreadable by anyone other than the original authors (and usually not even them in a month or two)". I've been there and done that.

And when we've written another new parser like this for the umpteenth time (especially in the field of bioinformatics), we begin to wonder "How many programs like this will I have to write?" and "Are they all the same in the end?" Kathleen's paper The Next 700 Data Description Languages looks at just that question. What is the family of languages for processing data, and what properties do they have? I love the title of this paper because it instantly intrigues by its homage to the classic 1966 paper The Next 700 Programming Languages by Peter Landin. (To see why Peter Landin's work on programming languages was so important, the in memoriam speech for Peter Landin given at ICFP 2009 is well worth a listen, and the last 3 mins of Rod Burstall's speech discuss this paper in particular.)

So perhaps, in computer science, we're doomed to keep inventing new programming languages, which wax and wane in popularity over time: Lisp, Fortran, C, C++, Java, Ruby, Haskell, F#, JavaScript and so on. But as computer scientists we should be able to understand why this happens. They're all necessary, all useful and all part of a bigger theoretical picture. We need more than one.


This is my entry for the BCSWomen Blog Carnival.

Friday, 13 January 2012

Artificial Intelligence and Microscopes

Artificial Intelligence has always been a branch of Computer Science that really catches the imagination of both scientists and the public. Trying to understand and replicate intelligence in all its different forms (reasoning, creativity, decision making, planning, language, etc) is exciting because it helps us to understand ourselves. Computer scientists such as Alan Turing have been pondering the implications and possibilities of AI since the 1940s and 50s. In 1951, Marvin Minsky built the first randomly wired neural network learning machine. He had studied mathematics and biology and was trying to understand brains. He's now famous for his work in AI, but, back in the 1950s, he wasn't just a mathematician, or just a computer scientist, but also studied optics, psychology, neurophysiology, mechanics and other subjects. Perhaps we pigeonholed people less into disciplines back then? Or maybe he was just amazing. Armed with all this knowledge, and a desire to learn about the brain and to look at neurons, he invented a new type of microscope, the confocal microscope. This gets rid of unwanted scattered light so that he could really focus in detail on a very specific part of the item he was looking at. Now he could see things that had never been seen before. He built the first one, and then patented this microscope in 1961. It would be another 20 years before the idea caught on (what would the research impact monitoring committees of today make of that?). Confocal microscopes are now in every biological lab and are taken for granted.

C. elegans is a 1mm long worm which lives in the soil. It is a very simple creature, easy to grow in the lab, and it has a brain. Sydney Brenner (who is 85 years old today, 13th Jan 2012) has a Nobel Prize for introducing C. elegans to biologists as a "model organism": an ideal organism for studying the principles of life. In 1986, John White and his colleagues Southgate, Thomson and Brenner published a paper on the structure of the brain of C. elegans. Each worm has just 302 neurons, and this number is the same in every C. elegans worm. They worked out where all the neurons were and what their connections to other neurons were, using serial-section electron microscopy. They took 8000 pictures ("prints", because it wouldn't have been digital back then) and annotated them all by hand. John White also went on to make substantial improvements to Minsky's confocal microscope design.

So we now have a complete picture of a simple brain. Other scientists have taken the data from White et al.'s work and created models of the brain. We understand a lot about the behaviour of the worm and which of its 302 neurons are responsible for which behaviours. We know how many cells it has (approximately 1000). We have the entire C. elegans genome, so we know how many genes it has (approx 20,000), and we have a technique (RNA interference) for suppressing the expression of any gene we want to investigate. Are we nearly there yet? Are we at that tipping point where we've inspected all there is to inspect and found nothing except complexity? Have we already understood intelligence?


Wednesday, 16 November 2011

Frederick Soddy


Last week I went to a Royal Society of Chemistry lecture about Frederick Soddy, who had been at Aberystwyth in 1894 and won the Nobel Prize for Chemistry in 1921 for his discovery of isotopes. The speaker was Dr Alun Price. I was fascinated by this diagram that Soddy showed to the British Association meeting in Birmingham in 1913, depicting what he knew about the radioactive decay of 3 elements (actinium, uranium, thorium). At first glance, what an untidy-looking diagram! But it does show what he knew at the time in an organised way, and it tells the story far better than a paragraph of words. He now has a uranium compound named after him: Soddyite.

Apart from his fantastic work in chemistry, he also wrote poetry and books about economics. One of the quotes given in the talk was: "The man who said that it was not possible to fool all the public all of the time was fortunately quite ignorant of the methods of modern banking" (Frederick Soddy, 1924).

Tuesday, 15 November 2011

Shadowing my MP: constantly switching topics

For two days during the Royal Society MP/Scientist Pairing Scheme I shadowed my MP. He's Mark Williams, the MP for my constituency, Ceredigion. His interests are mostly in Welsh issues, human rights issues, education issues, and general matters raised by his constituents, which are not usually science issues. However, the shadowing experience was certainly an eye-opener for me.

Parliament sits from Monday to Thursday, and then on Friday the MPs travel back to their constituencies to hold surgeries, meet the public, attend events, open buildings, etc. Sometimes they do this at weekends too. Then it's back to Westminster during the week. While at Westminster, it seems to be a hectic run of meetings and debates, constantly switching from one topic to the next. How do they do that?

On Thursday 3rd November we started with a press conference about Camp Ashraf in Iraq (a refugee camp for Iranians that the mass media seems to be ignoring for some reason, but that would be another post). After this was a session in the Commons Chamber for Urgent Questions, where MPs raise anything and everything they think pressingly needs a debate. Questions ranged from traffic problems in Bradford to student visas, metal theft, solar panel feed-in tariffs and the VAT threshold for micro businesses. After this, it was quickly back to the office to finish preparing a speech for the afternoon's debate on the Silk Commission, which will be set up to look at Welsh devolution issues. I watched part of this debate (which was surprisingly interesting) and then mid-afternoon moved to Westminster Hall to catch part of a debate about a report on shale gas extraction and 'fracking'.

As an MP, it's not possible to be an expert in all of these areas, or even to find the time to become one. It looked as if it was hardly possible to find time to grab a sandwich at lunch! So they have to rely on their office researchers, on the information packs provided by POST and other parliamentary bodies, and on the information they receive from their constituents and the public. From this information they have to raise topics for discussion, join in informed debate on a huge range of issues and then vote on detailed legislation, which provides the laws by which we all live. I found this idea somewhat scary.

Openness: parliament and academia

One thing I was particularly struck by was how open most of the workings of Parliament were. Members of the public can sit in the galleries to watch the debates in the Houses of Commons and Lords, and in Westminster Hall. The debates are televised and fully minuted in Hansard, with minutes online by 6am the next day. Wouldn't it be amazing if we did science in this way, with all of our operations televised and fully minuted for inspection! That really would be the ideal of Open Science.

Not only that, but members of the public can sit in on most select committee meetings, and watch the proceedings. The science equivalent would be to have a cordon at one end of your lab and to allow the public free access to come and go at any time as long as they sit behind the cordon. And the committee reports are all made publicly available on their websites. Not publicly available in some journal that you might not have access to or have to pay $30 per report for, but freely available. Science still has a long way to go to become this freely available.