Well, this is what it comes down to. Code reviews on national TV. Who would have thought it? And, by the standards of a Newsnight code review, the code in question doesn’t look so good. Well, it’s not surprising it doesn’t. It’s the work of one untrained programmer, working in an academic environment, trying to reconstruct someone else’s data analysis. And given the way in which the CRU files were stolen, we can be pretty sure this is not a random sample of code from the CRU; it’s been handpicked to be one of the worst examples.
Watch the clip from about 2:00. They compare the code with some NASA code, although we’re not told what exactly. Well, duh. If you compare the experimental code written by one scientist on his own, code which has clearly not been through any code review, with code produced by NASA’s engineering processes, of course it looks messy. For any programmers reading this: How many of you can honestly say that you’d come out looking good if I trawled through your files, picked the worst piece of code lying around in there, and reviewed it on national TV? And the “software engineer” on the program says it’s “below the standards you would expect in any commercial software”. Well, I’ve seen a lot of commercial software. It’s a mix of good, bad, and ugly. If you’re deliberate enough with your sampling technique, you can find a lot worse out there.
Does any of this matter? Well, a number of things bug me about how this is being presented in the media and blogosphere:
- The first, obviously, is the ridiculous conclusion many people seem to be drawing: that poor code quality in one deliberately selected program file somehow invalidates all of climate science. As cdavid points out towards the end of this discussion, if you’re going to do that, then you pretty much have to throw out most results in every field of science over the past few decades for the same reason. Bad code is endemic in science.
- The slightly more nuanced, but equally specious, conclusion that bugs in this code mean that research results at the CRU must be wrong. Eric Raymond picks out an example he calls blatant data-cooking, but he is quite clearly fishing for a result: he ignores the fact that the correction he picks on is never actually used, except in parts of the code that are commented out. He’s quote mining for effect, and given Raymond’s political views, that’s not surprising. Just for fun, someone quote mined Raymond’s own code, and was horrified at what he found. Clearly we have to avoid all open source code immediately because of this…? The problem, of course, is that none of these quote miners have gone to the trouble to establish what this particular code is, why it was written, and what it was used for.
- The widely repeated assertion that this just proves that scientific software must be made open source, so that a broader community of people can review it and improve it.
It’s this last point that bothers me most, because at first sight, it seems very reasonable. But actually, it’s a red herring. To understand why, we need to pick apart two different arguments:
- An argument that when a paper is published, all of the code and data on which it is based should be released so that other scientists (who have the appropriate background) can re-run it and validate the results. In fields with complex, messy datasets, this is exceedingly hard, but might be achievable with good tools. The complete toolset needed to do this does not exist today, so simply calling for the code to be made open source misses the point. Much climate code is already open source, but that doesn’t mean anyone in another lab can repeat a run and check the results. The problems of reproducibility have very little to do with whether the code is open: the key problem is to capture the entire scientific workflow and all of the data provenance (see the sketch after this list for one small piece of what that might involve). This is very much an active line of research, and we have a long way to go. In the absence of this, we rely on other scientists testing the results with other methods, rather than repeating the same tests. Which is the way it’s done in most branches of science.
- An argument that there is a big community of open source programmers out there who could help. This is based on a fundamental misconception about why open source software development works. It matters how the community is organised, and how contributions to the code are controlled by a small group of experts. It matters that it works as a meritocracy, where programmers need to prove their ability before they are accepted into the inner developer group. And most of all, it matters that the developers are the domain experts. For example, the developers who built the Linux kernel are world-class experts on operating systems and computer architecture. Quite often they don’t realize just how high their level of expertise is, because they hang out with others who also have the same level of expertise. Likewise, it takes years of training to understand the dynamics of atmospheric physics in order to be able to contribute to the development of a climate simulation model. There is not a big pool of people with the appropriate expertise to contribute to open source climate model development, and nor is there ever likely to be, unless we expand our PhD programs in climatology dramatically (I’m sure the nay-sayers would like that!).
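As an aside on the reproducibility argument above: “capturing the workflow and data provenance” sounds abstract, so here is a minimal sketch of one small piece of it, recording exactly which code version, parameters, and input data went into a run. The file names and parameters are invented for illustration; this is not code from any climate model or data analysis at the CRU.

```python
# Hypothetical sketch: record enough provenance alongside a run that
# someone else could, in principle, check they are reproducing the same
# thing. All names and parameters here are made up for illustration.
import hashlib
import json
import platform
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash an input file so later readers can verify they have the same data."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def record_provenance(inputs: list[Path], params: dict, out: Path) -> None:
    """Write a manifest describing exactly what went into this run."""
    manifest = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "code_version": subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True
        ).stdout.strip(),
        "platform": platform.platform(),
        "parameters": params,
        "inputs": {str(p): sha256_of(p) for p in inputs},
    }
    out.write_text(json.dumps(manifest, indent=2))


# Example: log the inputs to a (hypothetical) gridding step.
# record_provenance([Path("station_obs.csv")], {"grid": "5x5deg"}, Path("run_manifest.json"))
```

Even something this simple is rarely done systematically, and it only scratches the surface: a full solution also has to capture the whole chain of processing steps between raw observations and published figures.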
We do know that most of the heavy-duty climate models are built at large government research centres, rather than at universities. Dave Randall explains why this is: the operational overhead of developing, testing and maintaining a Global Climate Model is far too high for university-based researchers. The universities use (parts of) the models, and do further data analysis on both observational data and outputs from the big models. Much of this is the work of individual PhD students or postdocs. Which means that the argument that all code written at all stages of climate research must meet some gold standard of code quality is about as sensible as saying no programmer should ever be allowed to throw together a script to test whether some idea works. Of course bad code will get written in a hurry. What matters is that as a particular line of research matures, the coding practices associated with it should mature too. And we have plenty of evidence that this is true of climate science: the software practices used at the Hadley Centre for their climate models are better than those of most commercial software shops. Furthermore, they manage to produce code that appears to be less buggy than just about any other code anywhere (although we’re still trying to validate this result, and understand what it means).
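To make “maturing coding practices” a little more concrete: one habit that separates a production model from a throwaway analysis script is an automated regression test that checks new output against a trusted reference before a change is accepted. Here is a minimal sketch of that idea, with invented function and file names; it illustrates the practice, not any actual code from the Hadley Centre or anywhere else.

```python
# Hypothetical sketch of a regression test: re-run an analysis step and
# compare its output against a stored, previously vetted reference.
# Function and file names are invented for illustration.
import numpy as np


def anomaly_from_baseline(temps: np.ndarray, baseline: slice) -> np.ndarray:
    """Toy analysis step: temperature anomalies relative to a baseline period."""
    return temps - temps[baseline].mean()


def test_anomaly_matches_reference():
    temps = np.loadtxt("sample_series.txt")           # small test input, kept under version control
    reference = np.loadtxt("expected_anomalies.txt")  # output checked by hand once, then frozen
    result = anomaly_from_baseline(temps, slice(0, 30))
    # A tolerance keeps the test robust to harmless floating-point noise
    # while still catching real changes in behaviour.
    np.testing.assert_allclose(result, reference, rtol=1e-10)
```

The point is not that every exploratory script needs this, but that code which feeds into published results eventually should.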
None of this excuses bad code written by scientists. But the sensible response to this problem is to figure out how to train scientists to be better programmers, rather than to argue that some community of programmers other than scientists can take on the job instead. The idea of open source climate software is great, but it won’t magically make the code better.