I wasn’t going to post anything about the CRU emails story (apart from my attempt at humour), because I think it’s a non-story. I’ve read a few of the emails, and it looks no different to the studies I’ve done of how climate science works. It’s messy. It involves lots of observational data sets, many of which contain errors that have to be corrected. Luckily, we have a process for that – it’s called science. It’s carried out by a very large number of people, any of whom might make mistakes. But it’s self-correcting, because they all review each other’s work, and one of the best ways to get noticed as a scientist is to identify and correct a problem in someone else’s work. There is, of course, a weakness in the way such corrections are usually done to a previously published paper. The corrections appear as letters and other papers in various journals, and don’t connect up well to the original paper. Which means you have to know an area well to understand which papers have stood the test of time and which have been discredited. Outsiders to a particular subfield won’t be able to tell the difference. They’ll also have a tendency to seize on what they think are errors, but actually have already been addressed in the literature.
If you want all the gory details about the CRU emails, read the RealClimate posts (here and here) and the lengthy discussion threads that follow from them. Many of the scientists involved in the CRU emails comment in these threads, and the resulting picture of the complexity of the data and analysis that they have to deal with is very interesting. But of course, none of this will change anyone’s minds. If you’re convinced global warming is a huge conspiracy, you’ll just see scientists trying to circle the wagons and spin the story their way. If you’re convinced that AGW is now accepted as scientific fact, then you’ll see scientists tearing their hair out at the ignorance of their detractors. If you’re not sure about this global warming stuff, you’ll see a lively debate about the details of the science, none of which makes much sense on its own. Don’t venture in without a good map and compass. (Update: I’ve no idea who’s running this site, but it’s a brilliant deconstruction of the allegations about the CRU emails).
But one issue has arisen in many discussions about the CRU emails that touches strongly on the research we’re doing in our group. Many people have looked at the emails and concluded that if the CRU had been fully open with its data and code right from the start, there would be no issue now. This is of course, a central question in Alicia‘s research on open science. While in principle, open science is a great idea, in practice, there are many hurdles, including the fear of being “scooped”, the need to give appropriate credit, the problems of metadata definition and of data provenance, the cost of curation, and the fact that software has a very short shelf-life, and so on. For the CRU dataset at the centre of the current kerfuffle, someone would have to go back to all the data sources, and re-negotiate agreements about how the data can be used. Of course, the anti-science crowd just think that’s an excuse.
However, for climate scientists there is another problem, which is the workload involved in being open. Gavin, at RealClimate, raises this issue in response to a comment that just putting the model code online is not sufficient. For example, if someone wanted to reproduce a graph that appears in a published paper, they’ll need much more: the script files, the data sets, the parameter settings, the spinup files, and so on. Michael Tobis argues that much of this can be solved with good tools and lots of automated scripts. Which would be fine if we were talking about how to help other scientists replicate the results.
Unfortunately, in the special case of climate science, that’s not what we’re talking about. A significant factor in the reluctance of climate scientists to release code and data is to protect themselves from denial-of-service attacks. There is a very well-funded and PR-savvy campaign to discredit climate science. Most scientists just don’t understand how to respond to this. Firing off hundreds of requests to CRU to release data under the freedom of information act, despite each such request being denied for good legal reasons, is the equivalent of frivolous lawsuits. But even worse, once datasets and codes are released, it is very easy for an anti-science campaign to tie the scientists up in knots trying to respond to their attempts to poke holes in the data. If the denialists were engaged in an honest attempt to push the science ahead, this would be fine (although many scientists would still get frustrated – they are human too).
But in reality, the denialists don’t care about the science at all; their aim is a PR campaign to sow doubt in the minds of the general public. In the process, they effect a denial-of-service attack on the scientists – the scientists can’t get on with doing their science because their time is taken up responding to frivolous queries (and criticisms) about specific features of the data. And their failure to respond to each and every such query will be trumpeted as an admission that an alleged error is indeed an error. In such an environment, is it perfectly rational not to release data and code – it’s better to pull up the drawbridge and get on with the drudgery of real science in private. That way the only attacks are complaints about lack of openness. Such complaints are bothersome, but much better than the alternative.
In this case, because the science is vitally important for all of us, it’s actually in the public interest that climate scientists be allowed to withhold their data. Which is really a tragic state of affairs. The forces of anti-science have a lot to answer for.
Update: Robert Grumbine has a superb post this morning on why openness and reproducibility is intrinsically hard in this field