Amanda Clare: science

Thursday, 25 May 2017

Releasing research software

How should researchers be releasing their software?

The BBSRC together with the Software Sustainability Institute and Elixir-UK recently held a workshop to find out. This was the "Developing Software Licensing Guidance for BBSRC Workshop" (April 2017). The BBSRC wanted to ask the community what guidance it should be providing for grant applicants, grant reviewers and the developers and users of research software.

Here are some of the points that were raised during the day:

Ownership

Who owns the software? University academics often have contracts that say that everything we do belongs to the uni, even if it's in the evenings or weekends. Students, on the other hand, tend to own their own IP. On collaborative projects, especially those collaborating with businesses, the ownership of software becomes more complex again. If the software is truly open source, then hopefully multiple people will contribute ('random strangers on the internet'). Who owns it then?

Licenses

There are three main options: very permissive, copyleft or commercial. If you need to choose a software license, some universities are well supported by technology transfer staff who understand all the implications and some are not. There are websites to help people choose, but the issues are complex and so far they haven't helped me choose. The consensus for permissive licenses seemed to be that while MIT was good (simple, easy to read and understand), the Apache 2.0 license actually deals with accepting contributions from other developers in future ("Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions."). If you don't use this license then you may have to use a contributor licensing agreement and that can scare off the casual patch submitter. The "for academic use only" kind of license was widely seen to be awful, restricting future collaboration, restricting project expansion and being unclear about who is and isn't an academic.

Software is produced at different scales

There are quick scripts, solid code, and whole platforms. The BBSRC will be funding work that produces all three types of software. It wants all grant-holders to think about how they will release their software and support the creation of software. We know that documenting, testing and packaging are time-consuming. How do we ensure that projects allow time for this? How can software be cited properly (see Software citation principles)? Is there any such thing as a 'throwaway script'? Will the project management team include someone who knows about software?

Existing advice

There is a collection of existing advice from various places about how to choose licenses and release software.

Licenses: SSI Licensing (and Choosing an Open Source License), Choose a License, tldrLegal and Open Source Initiative
Journals that have given guidance: PLOS Comp Biol Submissions
SSI software management plan

The release of software as an output of research is clearly an issue that is being raised by multiple organisations now and it's great to see software being taken seriously. 2017 will see the second Research Software Engineers Conference (RSE2017) and a June 2016 Dagstuhl Workshop produced an Engineering Academic Software Manifesto, containing pledges such as

I will make explicit how to cite my software.
I will cite the software I used to produce my research results.
When reviewing, I will encourage others to cite the software they have used.

and more. The BBSRC now has the task of pulling together all the discussion from the workshop and other places and creating a guidance document to help grant applicants, reviewers and panellists, and also the software developers. This will then become part of the grant proposal process along with other docs such as the data management plan, pathways to impact, justification of resources, etc.

Tuesday, 4 April 2017

The metahaplome

A sample of water, soil or gut contents will contain a whole community of microbes, cooperating and competing. We now have the sequencing technology to begin to explore these communities, to find out the variation that they possess. This can be useful in the search for new anti-microbials, or in the search for better enzymes for biofuels. However, the sequencing technology is not quite there yet. The very short length of the reads, together with the errors introduced, combine to make the problem of reassembling the underlying genomes much more complex.

We introduce the concept of the 'metahaplome': the exact sequence of DNA bases (or "haplotype") that constitutes the genes and genomes of every individual present. We also present a data structure and algorithm that will recover the haplotypes in the metahaplome, and rank them according to likelihood.

Our preprint about the metahaplome is now available at bioRxiv: Probabilistic Recovery Of Cryptic Haplotypes From Metagenomic Data.

Monday, 20 March 2017

Is academic impact useful as a proxy measure for real world impact?

In academia we have various measures of "success". Some of the more common measures are "how many papers did we publish in reputable journals" and "how many other academics went on to use my work or refer to my work". We count citations, check out our h-index, and perhaps even check the number of other academics talking about our work on social media. We'd all like our work to be useful to others. Of course, any measure of success can be gamed. Academics may cite themselves to boost citation counts, use clickbait paper titles to attract attention, and select journals for prestige rather than availability to the community.

However, to be useful to other academics is not the same as to be useful to the rest of the world. Are we also having impact outside of academia? Some blue sky basic research is unlikely to do this (but perhaps likely to be cited by academics). Some applied research can have immediate impact on medical outcomes, law and policy, societal attitudes, civil rights, environmental strategies and business practices.

The UK tries to measure this kind of research impact by asking universities to submit REF Impact case studies. These are summary documents, written by academics, describing what impact they've had. They're not easy to write, or to evidence, and yet they are used as part of the REF exercise measuring research quality, whereby Higher Education funding bodies distribute research money to the universities.

How can we make it easier for academics to find and present their impact? Before we answer that, is the whole exercise worth doing anyway? We need to know if we could just cheaply count citations and use this as a proxy for real world impact. After all, if a paper is popular with academics, surely it's also going to be useful to the rest of the world too?

We've done the analysis in Measuring scientific impact beyond academia: An assessment of existing impact metrics and proposed improvement. TL;DR: the answer is 'no'. Academic impact is not correlated with more comprehensive impact in the real world. We're not surprised, but we needed to prove it. But don't take our word for it: we've made all the data available. Try it yourself.

On with the next steps now... James will now be data mining the real world impact of scientific research, from the collection of unstructured documents out there in news archives, parliamentary proceedings, etc. We expect NLP, data mining and machine learning challenges ahead as we trace the movement of academic ideas and results out from academia and into the rest of the world.

Wednesday, 3 August 2016

Yr Eisteddfod

Dw i'n dysgu Cymraeg. I'm a Welsh learner (still making many mistakes). Wythnos diwetha es i i'r Eisteddfod i helpu yn yr Pafiliwn Gwyddoniaeth a Thechnoleg (dydd Gwener i dydd Sul). Last week I went to the Eisteddfod to help in the Science and Technology Pavilion (Friday to Sunday). Mae hi'n fy Eisteddfod gyntaf. It was my first Eisteddfod. Bendigedig! Bydda yn mynd eto blwyddyn nesaf. Fantastic! I'll go again next year.

Wednesday, 23 December 2015

Seamless gene deletion

2015 is the year that genome editing really became big news. A new technique, "CRISPR/CAS", was named as Science magazine's breakthrough of the year as voted by the public from a shortlist chosen by staff.

However, people have been manipulating DNA through many useful methods long before CRISPR/CAS made headlines. Gene deletion is an important tool when trying to understand the function of genes. Take out a gene and see what effect it causes. Genes can be disrupted (by removing a portion of the DNA or inserting some extra DNA) or can be interfered with, for example via their RNA production, or they can be entirely deleted. It's common practice when removing a gene to insert a marker, so that we can easily select for the cells where this procedure has been successful. For example, we can insert an antibiotic resistance gene as a marker and then grow the cells on a plate with an antibiotic. Then only those that have lost our gene of interest and gained antibiotic resistance will grow. The trouble with this is that many gene deletions have no visible effect by themselves. If we also want to delete a second gene and a third, then we need more markers, or we need to be able to remove and reuse the marker we inserted. We also don't want the process to leave any scars behind that could destabilise the genome. We've just published a paper to help solve this problem.

This process of 'swap a gene of interest for a marker gene' can be achieved in many organisms by homologous recombination. This is a process used by many cells to repair broken strands of DNA. If we provide a piece of DNA that has a good region of similarity to the region just downstream of our gene of interest, and also a good region of similarity to the region just upstream of the gene of interest, but instead of the gene of interest, has the marker gene between these regions, then the normal cellular processes of homologous recombination will exchange the two. Some organisms perform homologous recombination very readily (S. cerevisiae for example). Others may need a little more encouragement, such as creating a double stranded break.

Our new paper A tool for Multiple Targeted Genome Deletions that Is Precise, Scar-Free and Suitable for Automation with Wayne Aubrey as first author uses a 3-stage PCR process to synthesise a stretch of DNA (a 'cassette') that will do everything. It will have good regions of similarity to the regions upstream and downstream of the gene of interest. It will contain a marker gene. And (here's the good bit), it will contain a specially designed region ('R') before the marker gene that is identical to the region that occurs just after the gene of interest. In this way, after homologous recombination has done its thing and inserted the DNA cassette instead of the gene of interest, there will be two identical R regions, one before the marker gene, and one after the marker gene. Sometimes the DNA will loop round on itself, the two R regions will match up and homologous recombination will snip out the loop, including the marker gene.
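The two recombination events described above can be sketched as a toy string model in Python. This is only an illustration, not the method from the paper: the sequences (UP, GENE, R, MARKER, REST) are invented, the downstream homology arm is assumed to be the R region itself, and the helper functions integrate and loop_out are hypothetical names.

```python
# Toy model: DNA as plain strings. All sequences are made up for illustration.
UP = "AAAACCCC"          # region just upstream of the gene of interest
GENE = "ATGGGGTTTTAA"    # the gene we want to delete
R = "GATTACAG"           # region that occurs just after the gene
MARKER = "CTCTCTCTAGAG"  # stand-in for the marker gene
REST = "TTGGCCAA"        # the rest of the chromosome downstream

genome = "GGGG" + UP + GENE + R + REST

# The cassette carries an extra copy of R in front of the marker.
# Assumption: the downstream homology arm is the R region itself.
cassette = UP + R + MARKER + R


def integrate(genome, cassette, up, down):
    """First event: swap everything from the up arm to the down arm for the cassette."""
    start = genome.index(up)
    end = genome.index(down, start + len(up)) + len(down)
    return genome[:start] + cassette + genome[end:]


def loop_out(genome, repeat):
    """Second event: two direct repeats recombine, removing one copy and what lies between."""
    first = genome.index(repeat)
    second = genome.index(repeat, first + len(repeat))
    return genome[:first] + genome[second:]


integrated = integrate(genome, cassette, UP, R)  # gene replaced by R + marker
clean = loop_out(integrated, R)                  # marker and one R copy removed

print(genome)
print(integrated)
print(clean)
```

In this toy model the final string is the original genome with only the gene missing, which is the 'scar-free' property: no marker and no extra bases remain.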

We can encourage this to happen and select for the cells that have had this happen if our marker is also 'counter-selectable'. That is, we'd like a marker for which we can add something to the growth medium so that only cells without the marker will grow. In other words, we'd like to use a marker or marker combination for which we can first select for its presence and then counter-select for its absence. When we have this we can select for cells that have had the marker replace the gene, and then counter-select for cells that have now lost the marker too. So we have a clean gene deletion.

Of course we're always standing on the shoulders of giants when we do science. Our method is an improvement on a method by Akada 2006, so that no extra bases are lost or gained and the method requires no gel purification steps. Just throw in your primers and products and away you go. It's not fussy about quantity. No purification steps means that it could be automated on lab robots. And it could be used to delete any genetic component, not just genes. Give it a try!

Tuesday, 10 March 2015

Python for Scientists

This year, 2014/2015, we started a new MSc course: Statistics for Computational Biology. We can see that there's a huge demand for bioinformaticians, for statisticians who can read biology, and for programmers who know about statistics and can apply stats to biological problems. So this new MSc encompasses programming, statistics and loads of the current hot topics in biology. It's the kind of MSc I would have loved to have done when I was younger.

As part of this degree, I'm teaching a brand new module called Programming for Scientists, which uses the Python programming language. This is aimed at students who have no prior programming knowledge, but have some science background. And in one semester we teach them the following:

The basics of programming: variables, loops, conditionals, functions
File handling (including CSV)
Plotting graphs using matplotlib
Exceptions
Version control using Git/Github
SQL database (basic design, queries, and using SQLite from Python)
XML processing
Accessing data from online APIs

We taught it as a hands-on module, lectures held in a room full of computers, programming as we go through the slides, with exercises interspersed and demonstrators on hand to help.

We had students sign up for this module from a surprisingly diverse set of backgrounds, from biology, from maths, from geography and even from international politics. We also had a large number of staff and PhD students from our Biology department (IBERS) who wanted to sit in on the module. This was a wonderful group of students to teach. They're people who wanted to learn, and mostly just seemed to absorb ideas that first year undergraduates struggle with. They raised their game to the challenge.

Python's a great language for getting things done. So it makes a good hands-on language. However, it did highlight many of Python's limitations as a first teaching language. The objects/functions issue: I chose not to introduce the idea of objects at all. It's hard enough getting this much material comfortably into the time we had, and objects, classes and subclasses was something that I chose to leave out. So we have two ways to call functions: len(somelist) and somelist.reverse(). That's unfortunate. Variable scoping caught me out on occasion, and I'll have to fix that for next year. The Python 2 vs Python 3 issue was also annoying to work around. Hopefully next year we can just move to Python 3.
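To make the two calling styles concrete, here is a small sketch (the variable names are just for illustration). Without talking about objects, a beginner simply has to remember which operations are written as functions and which use the dot.

```python
somelist = [3, 1, 2]

# Function style: the list is passed in as an argument.
n = len(somelist)

# Method style: called "on" the list with a dot.
# reverse() changes the list in place and returns None.
somelist.reverse()

# There is also a function-style way to get a reversed copy,
# which leaves the list itself unchanged.
backwards = list(reversed(somelist))

print(n, somelist, backwards)
```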

What impressed me most was the quality of the final assignment work. We asked the students to analyse a large amount of data about house sales, taken from http://data.gov.uk/ and population counts for counties in England and Wales taken from the Guardian/ONS. They had to access the data as XML over a REST-ful API, and it would take them approximately 4 days to download all the data they'd need. We didn't tell them in advance how large the data was and how slow it would be to pull it from an API. Undergrads would have complained. These postgrads just got on with it and recognised that the real world will be like this. If your data is large and slow to acquire then you'll need to test on a small subset, check and log any errors and start the assignment early. The students produced some clean, structured and well commented code and many creative summary graphs showing off their data processing and data visualisation skills.
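The 'test on a small subset, check and log any errors' advice might look something like the sketch below. It is not the assignment solution: download_all and fake_fetch are hypothetical names, and fake_fetch stands in for a real (slow) REST call so that the example runs without a network.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("download")


def download_all(keys, fetch, limit=None):
    """Call fetch(key) for each key, logging failures and carrying on.

    limit: if given, only the first `limit` keys are tried (a quick test run).
    Returns (results, failed): a dict of successes and a list of failed keys.
    """
    results, failed = {}, []
    for key in list(keys)[:limit]:
        try:
            results[key] = fetch(key)
        except Exception as err:
            # Log and remember the failure so it can be retried later.
            log.warning("failed to fetch %r: %s", key, err)
            failed.append(key)
    return results, failed


def fake_fetch(key):
    """Stand-in for a slow API request (hypothetical); fails for one key."""
    if key == "bad":
        raise IOError("timeout")
    return "<data>%s</data>" % key


# Try a small subset first, before committing to the multi-day download.
results, failed = download_all(["a", "bad", "b", "c"], fake_fetch, limit=3)
print(sorted(results), failed)
```

Keeping the list of failed keys means a long download can be resumed or retried rather than restarted from scratch.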

I hope they're having just as much fun on their other modules for this course. I'm really looking forward to running this one again next year.

Monday, 9 March 2015

International Women's Day pub quiz

On Sunday 8th March 2015, Hannah Dee and I organised a pub quiz for International Women's Day. We wanted to highlight some famous women in science, but we don't expect people to know much about famous women in science. So how to do a quiz? We themed 5 rounds around the women:

1) The Mary Anning fossil hunting round
A huge word search with many words related to Mary Anning's work and fossils to find (including "ichthyosaur" and "she sells sea shells", "on the sea shore").

2) The Amelia Earhart aviation round
Create paper aeroplanes that will travel from Europe (over here) to America (over there) and land within an area marked by a hula hoop. We should have had planes crossing the Atlantic in the other direction, but oh well, we're in west Wales.

3) The Caroline Herschel stargazing round
Early astronomy was often about spotting small differences in maps of the heavens. Thanks to heavens-above.com we had a copy of the sky map for the evening, and another copy that had been modified with gimp. Spot the difference! Three Gemini twins?

4) The Barbara McClintock genome round
Here we used C. Titus Brown's shotgunator to make a set of short reads from a few sentences about the work of Barbara McClintock. The teams had to assemble the genome to decipher the sentences. It must have seemed as if transposons were at work, because with a few repeated words the sentences they were constructing did get rather jumbled.

5) The Florence Nightingale data visualisation round
Finally the teams got to use a box of stuff (pipe cleaners, stickers, fluorescent paper, googly eyes, coloured pens) to make the most creative version of this year's HESA stats on women employed in higher education.

The scales of employment in HE

No trivia or celebrities in the quiz at all!
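For the genome round, the idea of cutting sentences into short reads can be sketched in a few lines of Python. This is not the shotgunator itself, just a minimal illustration of the same idea; the function name shotgun, the read length and the example sentence are all assumptions.

```python
import random


def shotgun(text, read_len=12, n_reads=30, seed=0):
    """Return n_reads substrings of length read_len taken from random positions in text."""
    rng = random.Random(seed)  # fixed seed so the same reads come out each run
    last_start = len(text) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [text[s:s + read_len] for s in starts]


sentence = "Barbara McClintock discovered transposons in maize"
reads = shotgun(sentence)

for read in reads[:5]:
    print(repr(read))
```

Repeated words in the source text produce reads that overlap equally well in more than one place, which is why the teams' reassembled sentences got jumbled.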

Friday, 18 July 2014

Microscope webcam microtitre plate reading using image analysis

An A-level student has just spent two weeks with us for his work experience, and his project has been to investigate the use of a cheap microscope webcam as an alternative to an expensive plate reader for the measurement of the growth of yeast in microtitre plates. The longer term aim would be to mount this webcam on the deck of our Tecan Genesis liquid handler robot, and to have the robot arm move the plate under the webcam.

The webcam is a Veho VMS-004, used at 20x magnification, and it costs just £40. It was recognised automatically by Linux as a webcam and worked really well with the OpenCV library.

Robert Buchan-Terrey did an excellent job in interdisciplinary science in just two weeks, including the following:

Preparing media and growing yeast in our lab
Pipetting the yeast to make dilutions
Using the microscope webcam, taking images of the wells in the plate at intervals throughout the day, and corresponding plate readings with a real plate reader
Coding using Python and OpenCV to process the images (find the circular well, work out the average pixel intensity in the well)
Data analysis and stats to understand the results
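The 'average pixel intensity in the well' step can be sketched in plain Python as below. This is not Robert's code: it assumes the well's centre and radius are already known (in OpenCV they might come from something like cv2.HoughCircles, but that step is not shown), and it uses a tiny synthetic greyscale image as a list of lists rather than a real webcam frame.

```python
def mean_intensity_in_circle(image, cx, cy, radius):
    """Average the pixel values whose coordinates lie inside the given circle."""
    total, count = 0, 0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                total += value
                count += 1
    return total / count if count else 0.0


# Synthetic 5x5 greyscale image: a bright 3x3 "well" (200) on a dark background (10).
image = [[10] * 5 for _ in range(5)]
for y in range(1, 4):
    for x in range(1, 4):
        image[y][x] = 200

well_mean = mean_intensity_in_circle(image, cx=2, cy=2, radius=1)
print(well_mean)
```

Restricting the average to the circle keeps the plate's plastic and the well walls out of the measurement; the per-well means could then be compared with the plate reader's values.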

He also produced a poster to demonstrate the findings and to take back to his school.

And the answer is: although he's just analysed the data from one time point so far, and we took no care to make sure the lighting conditions were stable when taking the images, or to shake the plates to evenly disperse the yeast, it really does look very plausible that we could use this in future. Averaging over 8 replicate wells gives a remarkable correspondence between image-analysis results and plate reader results. Individual wells are more variable, but still show promise. We've yet to test all the data, and to test the full range of the scale of optical density, but this looks extremely exciting.

Thanks very much to Wayne Aubrey and Hannah Dee for their help and expertise with the yeast biology and the image processing respectively.

Monday, 12 November 2012

Alfred Henry Allen, an extraordinary chemist

My parents have just written a fantastic paper about A. H. Allen, Sheffield's first Public Analyst. Alfred Henry Allen lived from 1846 to 1904, in a time of gas lamps, horse drawn carriages and dubious Victorian era water quality. His chemical investigations and new methods of analysis shone a light on the practices of careless or unscrupulous food and drink manufacturers.

The paper describes lots of his achievements. For example he investigated why the drinking water in some areas of Sheffield contained harmful lead, which was poisoning the population, while other areas had lead-free water (it turns out that the leaded water came from reservoirs which were found to be acidic, and the acid dissolved some of the lead from pipework, so he proposed that the water be treated with lime and limestone to remove the acidity).

Allen investigated the proportions and effects of the different alcohols found in whisky, and even did some testing on himself, drinking a wine glass full of whisky every evening for 3 weeks in order to show that amyl alcohols did no harm. You'll also want to read about his concerns about the "slovenly and ignorant" production of cider, where the cider makers reused manure carts to carry apples.

He didn't just do the science, but also communicated it to a wider audience. He gave public talks and "entertainments", such as Alchemy and the Alchemist, Chemistry of Explosives, Visible Sound, and Artificial Light. The paper points out that "People in Victorian times had a thirst for entertainment of a scientific and paranormal nature", and I imagine his lectures would have been very popular. Would his talk on Chemistry of Explosives have featured exciting demonstrations that would be impossible today for health and safety reasons? What magic would the lecture on Alchemy have shown?

Allen published over 150 papers and many books on methods of chemical analysis. He wrote 13 volumes of a book called "Commercial Organic Analysis", which needed continual updating as science progressed. He was a founder member of the Society of Public Analysts. He died of diabetes, a disease that had no cure or effective treatment at the time (though Allen had published papers, a book and chemical analysis methods for the determination of sugars in urine, so he would have been well able to measure his condition). During his life, he developed a business of consulting chemists (A.H. Allen & Partners), at which, after many years, my parents worked, and they have now honoured him by documenting his place in history.

Friday, 17 August 2012

Science galleries

There is a growing awareness that scientists and engineers should be better at communicating their work to the public (and to policy makers). It's not enough just to do good science if no one knows about it.

The excellent Sixty Symbols project has scientists at Nottingham Uni making videos about interesting aspects of physics and astronomy. There are Science Cafes (or Cafe Scientifiques) around the country to chat about science in a mixed audience. As part of our current research project we're going to make an online bioinformatics activity that can be used in schools to teach children about binary numbers, rules and about how genome translates into phenotype. BCS Mid-Wales have recently had two successful show-and-tell events where robots, games, and technology were proudly shown and enjoyed. And we've just had a student develop a fantastic HTML5/Javascript game to demonstrate evolution in partially selfing fish populations (you have to breed a bigger population than your competitors and not get infected).

In every reasonably-sized town there will be an art gallery (where the public can go to see art), a museum (where the public can learn about history), and a library (where the public can go to find literature). Is there an equivalent for science? There are science museums/education centres in certain towns, but they aren't as widespread as art galleries/museums/libraries by a long way. And they're often aimed at educating children and teenagers rather than aiming at inspiring adults. Art galleries are rarely aimed at children and teenagers.

Here in Aberystwyth we have an Arts Centre with multiple galleries where I can see stunning photography, paintings and prints, Ceredigion museum which shows me what amazing outfits people used to wear when bathing, and how huge telephone cables used to be, and we have more libraries than most towns, including the university libraries, a town library and the National Library of Wales. But we don't have a place I can go on a rainy Saturday afternoon to browse science. I can read about science in the library, but that's not the same thing. I'd like to see Science Galleries in every town.

Saturday, 12 November 2011

So how can scientists get involved in the use of science in the UK Parliament?

During the MP/Scientist pairing scheme we learned about several of the bodies involved in making sure science is used in Parliament.

One of those was the Science and Technology Select Committee which has the job of scrutinising institutions or policies. It produces reports with recommendations, to which Government has to respond. For example, a report on peer review in scientific publications or a report on practical experiments in school science lessons.

Another is the Parliamentary Office of Science and Technology (POST), which prepares information notes and packs for the members of parliament about the topics that they will be discussing and debating within the current term of Parliament. These reports can be browsed online, such as this report on women in science, engineering and technology. The reports aim to be party-neutral and fact-based (how they can do this, I don't know, when, as the saying goes, there are "lies, damned lies and statistics"). MPs will then use facts and figures from these reports in their debates.

And a third is the Government Office for Science (GO-Science), which seems to do everything else, from providing emergency response information (e.g. about flu pandemics or volcanic ash clouds), to providing views about the long term (what will life be like 20 years from now? what will we need to legislate about?). There are still more organisations (e.g. research councils, and the Parliamentary and Scientific Committee) and these are listed in this guide (which has a strangely unofficial-looking URL).

So there are lots of civil servants involved, and lots of committees. When these committees need to make reports, they invite expert opinion. They ask the learned societies (e.g. Royal Society of Chemistry, Royal Academy of Engineering, etc.) to suggest experts. They ask the research councils to suggest experts. They will also consider evidence submitted by scientists-at-large: if you have an expertise in an area they are investigating then you can make yourself known to them. After gathering written and oral evidence, they compile a report.

How do we, as scientists, know what's going to be discussed in Parliament in order to make ourselves known at the right time, to provide such evidence? That's not so easy. We can try to keep track of all the websites or feeds of the above committees. But really, we could do with having subject-specific bodies such as the BCS, RAEng, RSC, etc. aggregate that sort of information and push it out to us.

Wednesday, 2 November 2011

Day 3 in Westminster

So, today I've been to a Science and Technology Select Committee meeting asking questions about the Met Office. The public are welcome to sit in on these meetings, and there were 4 or 5 rows of chairs at the back of the room for this purpose. MPs and civil servants sit around a horseshoe-shaped table, interviewing expert witnesses who sit in a line across the end of the horseshoe. In this case they were interviewing the chiefs of the Met Office. The questions were mostly read from a script, and the aim was to obtain more information on whether the Met Office is doing what it should be doing as a public service, what could be improved, what (supercomputer) investments were needed and where the money for that might come from. They asked about their collaborations - who do they collaborate with, how are the relationships with private sector and academia and how much more could be done. They asked about how they were planning to make their data more accessible: to the academics, to businesses and to the public (and the problem of how to communicate uncertainty to the public). They asked a whole range of diverse questions, which the Met Office seemed to answer easily, and it felt like they had very little real opposition from the panel. That might be because the Met Office does do an excellent job, it might be because the panel didn't know how to ask probing questions about the supercomputing resources necessary (or understand the answers they might get), or simply because that wasn't in their remit, I don't know.

How do MPs learn enough about a subject to be able to conduct Select Committee interviews, to debate an issue in the Commons, or to make an informed choice whether to vote aye or no to a Bill? This is what I'm curious to find out. And then, how can the scientific community best help them to make such informed choices when they need that advice?

Tuesday, 1 November 2011

Day 2 in Westminster

At the end of the second day of the Week in Westminster for the Royal Society MP-Scientist Pairing Scheme, we have had an introduction to all the committees and structures involved in science and technology that operate to help the government make policy. I have been amazed to find out how little I actually knew about how government worked. We've discussed issues such as how Select Committees scrutinise government policies (and how the Select Committees from the House of Lords and the House of Commons differ in what they investigate), how scientific advice is provided in emergency situations, the Foresight team that has to report on issues that might arise far ahead in the future, and the role of John Beddington, the chief scientific advisor.

One question we were asked today, which I found very interesting, was to imagine being the people responsible for reviewing the use of science and engineering in government departments. What processes, structures and resources would you want a department to have in place to ensure its science and analytical activities are robust and effective? If you were reviewing a department (for example, the Department of Education), what would you want them to demonstrate in order to convince you that they were effectively using science to guide their policy-making?

Tomorrow I'll be attending a meeting of the House of Commons Science and Technology Select Committee (they'll be discussing the Met Office) and then shadowing my constituency MP, Mark Williams, to find out what a day in his life in Westminster is like.

Sunday, 30 October 2011

A week in Westminster

I've been paired with my MP as part of the Royal Society MP-Scientist pairing scheme and today I'm travelling to London in order to start a week in Westminster. The aim of the pairing is for the scientists involved to learn more about the work of policy making, and then for the MPs to pay reciprocal visits to find out more about the work of the scientists. I'm paired with Mark Williams, Lib Dem MP for Ceredigion.

The Royal Society have organised a packed timetable for the week, of talks and tours of the different aspects of Westminster. Then in any spare time, we shadow our MPs or civil servants (it's possible to be paired with either). It should prove to be quite a different week to my normal week in computer science!