February 12, 2010 | Serendipity

Climate Science and Software Quality

12. February 2010 · 22 comments · Categories: climate science

Here’s a letter I’ve sent to the Guardian newspaper. I wonder if they’ll print it? [Update – I’ve marked a few corrections since sending it. Darn]

Professor Darrel Ince, writing in the Guardian on February 5th, reflects on lessons from the emails and documents stolen from the Climatic Research Unit at the University of East Anglia. Prof Ince uses an example from the stolen emails to argue that there are serious concerns about software quality and openness in climate science, and goes on to suggest that this perceived alleged lack of openness is unscientific. Unfortunately, Prof Ince makes a serious error of science himself – he bases his entire argument on a single data point, without asking whether the example is in any way representative.

The email and files from the CRU that were released to the public are quite clearly a carefully chosen selection, where the selection criteria appears to be those that might cause maximum embarrassment to the climate scientists. I’m quite sure that I could find equally embarrassing examples of poor software on the computers of Prof Ince and his colleagues. The Guardian has been conducting a careful study of claims that have been made about these emails, and has shown that the allegations that have been made about defects in the climate science are unfounded. However, these investigations haven’t covered the issues that Prof Ince raises, so it is worth examining them in more detail.

The Harry README file does appear to be a long struggle by a junior scientist to get some poor quality software to work. Does this indicate that there is a systemic problem of software quality in climate science? To answer that question, we would need more data. Let me offer one more data point, representing the other end of the spectrum. Two years ago I carried out a careful study of the software development methods used for main climate simulation models developed at the UK Met Office. I was expecting to see many of the problems Prof Ince describes, because such problems are common across the entire software industry. However, I was extremely impressed with the care and rigor by which the climate models are constructed, and the extensive testing they are subjected to. In many ways, this process achieves a higher quality code than the vast majority of commercial software that I have studied, which includes the spacecraft flight control code developed by NASA’s contractors. [My results were published here: http://dx.doi.org/10.1109/MCSE.2009.193].

The climate models are developed over many years, by a large team of scientists, through a process of scientific experimentation. The scientists understand that their models are approximations of complex physical processes in the Earth’s atmosphere and oceans. They build their models through a process of iterative refinement. They run the models, and compare them with observational data, to look for the places where the models perform poorly. They then create hypotheses for how to improve the model, and then run experiments: using the previous version of the model as a control, and the new version as the experimental case, they compare both runs with the observational data to determine whether the hypothesis was correct. By a continual process of making small changes, and experimenting with the results, they end up testing their models far more effectively than most commercial software developers. And through careful use of tools to keep track of this process, they can reproduce past experiments on old versions of the model whenever necessary. The main climate models are also subjected to extensive model intercomparison tests, as part of the IPCC assessment process. Models from different labs are run on the same scenarios, and the results compared in detail, to explore the strengths and weaknesses of each model.

Like many software industries, different types of climate software are verified to different extents, representing choices of where to apply limited resources. The main climate models are tested extensively, as I described above. But often scientists need to develop other programs for occasional data analysis tasks. Sometimes, they do this rather haphazardly (which appears to be the case with the Harry file). Many of these tasks are experimental tentative in nature, and correspond to the way software engineers regularly throw a piece of code together to try out an idea. What matters is that, if the idea matures, and leads to results that are published or shared with other scientists, the results are checked out carefully by other scientists. Getting hold of the code and re-running it is usually a poor way of doing this (I’ve found over the years that replicating someone else’s experiment is fraught with difficulties, and not primarily exclusively because of problems with code quality). A much better approach is for other scientists to write their own code, and check independently whether the results are confirmed. This avoids the problem of everyone relying on one particular piece of software, as we can never be sure any software is entirely error-free.

The claim that many climate scientists have refused to publish their computer programs is also specious. I compiled a list last summer of how to access the code for the 23 main models used in the IPCC report. Although only a handful are fully open source, most are available free under fairly light licensing arrangements. For our own research we have asked for and obtained the the full code, version histories, and bug databases from several centres, with no difficulties (other than the need for a little patience as the appropriate licensing agreements were sorted out). Climate and weather forecasting code has a number of potential commercial applications, so the modeling centres use a license agreement that permits academic research, but prohibits commercial use. This is no different from what would be expected when we obtain code from any commercial organization.

Professor Ince mentions Hatton’s work, which is indeed an impressive study, and one of the few that that has been carried out on scientific code. And it is quite correct that there is a lot of shoddy scientific software out there. We’ve applied some of Hatton’s research methods to climate model software, and have found that, by standard software quality metrics, the climate models are consistently good quality code. Unfortunately, is it is not clear that standard software engineering quality metrics apply well to this code. Climate models aren’t built to satisfy a specification, but to address a scientific problem where the answer is not known in advance, and where only approximate solutions are possible. Many standard software testing techniques don’t work in this domain, and it is a shame that the software engineering research community has almost completely ignored this problem – we desperately need more research into this.

Prof Ince also echoes a belief that seems to be common across the academic software community that releasing the code will solve the quality problems seen in the specific case of the Harry file. This is a rather dubious claim. There is no evidence that, in general, open source software is any less buggy than closed source software. Dr Xu at the University of Notre Dame studied thousands of open source software projects, and found that the majority had nobody other than the original developer using them, while a very small number of projects had attracted a big community of developers. This pattern would be true of scientific software: the problem isn’t lack of openness, it’s lack of time – most of the code thrown together to test out an idea by a particular scientist is only of interest to that one scientist. If a result is published and other scientists think it’s interesting and novel, they attempt to replicate the result themselves. Sometimes they ask for the original code (and in my experience, are nearly always given it). But in general, they write their own versions, because what matters isn’t independent verification of the code, but independent verification of the scientific results.

I am encouraged that my colleagues in the software engineering research community are starting to take an interest in studying the methods by which climate science software is developed. I fully agree that this is an important topic, and have been urging my colleagues to address it for a number of years. I do hope that they take the time to study the problem more carefully though, before drawing conclusions about overall software quality of climate code.

Prof Steve Easterbrook, University of Toronto

Update: The Guardian never published my letter, but I did find a few other rebuttals to Ince’s article in various blogs. Davec’s is my favourite!