Amanda Clare: programming languages

Showing posts with label programming languages. Show all posts

Wednesday, 18 March 2015

Laboratory automation in a functional programming language

Whenever I write code in Haskell instead of other programming languages, it feels cleaner. Not just more elegant, but also more obviously correct. And that's not just about the lack of side effects and mutable variables. Haskell has stronger typing, which gives the programmer many guarantees and allows you to express more information about the code. It also has tools such as QuickCheck, in which you can state and test further properties that you believe to be true.

We wanted to bring these ideas to the area of laboratory automation. We've had some fairly large and complex lab automation systems in our lab over the years, with multiple robot arms, and dozens of devices to be serviced. These robot arms pass plastic plates containing yeast around incubators, washers, liquid handlers, centrifuge devises and so on. If the plates get deadlocked or left out of the incubator for too long because scheduling operations went wrong, then the experiment is ruined. However, this can happen if the scheduler needs to be able to make decisions on the fly during the experiments. It may need to decide what to do next based on the current instrument readings and current system capacity. So either you make a scheduler that's so simple that you know exactly what it will do in advance (but it can't do the workflow you really need), or you make a scheduler that's complex and flexible, but it's very difficult to analyse its properties. Hmm, I think Tony Hoare already suggested that choice.

So we've written a paper to demonstrate the benefits of programming a lab automation scheduler in Haskell, and in particular to demonstrate the kinds of properties that can be expressed and checked. We illustrate the paper with a fairly simple system and a fairly simple scheduler, but it's immediately obvious that more complex systems and schedulers can be explored by tweaking the code.

This paper was written by the three of us, coming together with three very different perspectives. Colin is a functional programming researcher at the Uni of York who enjoys opportunities to demonstrate the benefits of FP in real world problems. Rob works for PAA, an excellent lab automation company, who build complex bespoke systems (and software) for their clients. They built one of our lab automation systems. I'm both a user of such lab automation systems, and also a user of Haskell, without ever actually being an FP researcher.

The code is available as a literate Haskell file. The entire code is in this file, along with a complete description of what's going on and how it all works, and this file can easily be turned into a readable PDF document (which also includes all the code). https://github.com/amandaclare/lab-auto-in-fp

If you've been inspired by the ideas in this work, do please cite the paper:
C. Runciman, A. Clare and R. Harkness. Laboratory automation in a functional programming language. Journal of Laboratory Automation 2014 Dec; 19(6):569-76. doi: 10.1177/2211068214543373.
http://jla.sagepub.com/content/19/6/569.abstract

Abstract:
After some years of use in academic and research settings, functional languages are starting to enter the mainstream as an alternative to more conventional programming languages. This article explores one way to use Haskell, a functional programming language, in the development of control programs for laboratory automation systems. We give code for an example system, discuss some programming concepts that we need for this example, and demonstrate how the use of functional programming allows us to express and verify properties of the resulting code.

Tuesday, 10 March 2015

Python for Scientists

This year, 2014/2015, we started a new MSc course: Statistics for Computational Biology. We can see that there's a huge demand for bioinformaticians, for statisticians who can read biology, and for programmers who know about statistics and can apply stats to biological problems. So this new MSc encompasses programming, statistics and loads of the current hot topics in biology. It's the kind of MSc I would have loved to have done when I was younger.

As part of this degree, I'm teaching a brand new module called Programming for Scientists, which uses the Python programming language. This is aimed at students who have no prior programming knowledge, but have some science background. And in one semester we teach them the following:

The basics of programming: variables, loops, conditionals, functions
File handling (including CSV)
Plotting graphs using matplotlib
Exceptions
Version control using Git/Github
SQL database (basic design, queries, and using from SQLite from Python)
XML processing
Accessing data from online APIs

We taught it as a hands-on module, lectures held in a room full of computers, programming as we go through the slides, with exercises interspersed and demonstrators on hand to help.

We had students sign up for this module from a surprisingly diverse set of backgrounds, from biology, from maths, from geography and even from international politics. We also had a large number of staff and PhD students from our Biology department (IBERS) who wanted to sit in on the module. This was a wonderful group of students to teach. They're people who wanted to learn, and mostly just seemed to absorb ideas that first year undergraduates struggle with. They raised their game to the challenge.

Python's a great language for getting things done. So it makes a good hands-on language. However, it did highlight many of Python's limitations as a first teaching language. The objects/functions issue: I chose not to introduce the idea of objects at all. It's hard enough getting this much material comfortably into the time we had, and objects, classes and subclasses was something that I chose to leave out. So we have two ways to call functions: len(somelist) and somelist.reverse(). That's unfortunate. Variable scoping caught me out on occasion, and I'll have to fix that for next year. The Python 2 vs Python 3 issue was also annoying to work around. Hopefully next year we can just move to Python 3.

What impressed me most was the quality of the final assignment work. We asked the students to analyse a large amount of data about house sales, taken from http://data.gov.uk/ and population counts for counties in England and Wales taken from the Guardian/ONS. They had to access the data as XML over a REST-ful API, and it would take them approximately 4 days to download all the data they'd need. We didn't tell them in advance how large the data was and how slow it would be to pull it from an API. Undergrads would have complained. These postgrads just got on with it and recognised that the real world will be like this. If your data is large and slow to acquire then you'll need to test on a small subset, check and log any errors and start the assignment early. The students produced some clean, structured and well commented code and many creative summary graphs showing off their data processing and data visualisation skills.

I hope they're having just as much fun on their other modules for this course. I'm really looking forward to running this one again next year.

Wednesday, 25 January 2012

We need more than one programming language

I teach Haskell as a programming language to our undergraduates. I'm sure it's a continual subject of debate in computer science department coffee rooms up and down the country: "Which programming languages should we teach in our CS degrees?" The module that I teach is called "Concepts in Programming", and the idea is that there are indeed concepts in programming that are fundamental to many languages, that you can articulate the differences between programming languages and that these differences give each language different characteristics. Differences such as the type system, the order of evaluation, the options for abstraction, the separation of data and functions.

"We need more than one" is the title of a paper by Kathleen Fisher, a Professor in Computer Science at Tufts University. Her short, eloquent paper describes why we will never have just one programming language ("because a single language cannot be well-suited to all programming tasks"). She has had a career spanning industry and academia, has been the chair of the top programming language conferences (OOPSLA, ICFP), and has been the chair of the ACM Special Interest Group on Programming Languages. Her paper On the Relationship between Classes, Objects and Data Abstraction tells you everything you ever needed to know about objects. She knows about programming.

Her recent work has been looking at that problem of how to read in data files when the data is in some ad-hoc non-standard representation (i.e. not XML with a schema, or CSV or anything obvious). So we all have to write parsers/data structures to fit these data files. We write a program in a language such as Perl. She says "The program itself is often unreadable by anyone other than the original authors (and usually not even them in a month or two)". I've been there and done that.

And when we've written another new parser like this for the umpteenth time (especially in the field of bioinformatics) we begin to wonder "How many programs like this will I have to write?" and "Are they all the same in the end?" and Kathleen's paper The Next 700 Data Description Languages looks at just that question. What is the family of languages for processing data and what properties do they have? I love the title of this paper because it instantly intrigues by its homage to the classic 1966 paper The Next 700 Programming Languages by Peter Landin. (To see why Peter Landin's work on programming languages was so important, the in memorium speech for Peter Landin given at ICFP 2009 is well worth a listen, and also the last 3 mins of Rod Burstall's speech discusses this paper in particular).

So perhaps, in computer science, we're doomed to keep inventing new programming languages, which wax and wane in popularity over time: Lisp, Fortran, C, C++, Java, Ruby, Haskell, F#, Javascript and so on. But as computer scientists we should be able to understand why this happens. They're all necessary, all useful and all part of a bigger theoretical picture. We need more than one.

By the way, Kathleen Fisher would use Python as a teaching language.

This is my entry for the BCSWomen Blog Carnival.