This week I’m visiting the Max Planck Institute for Meteorology (MPI-M) in Hamburg. I gave my talk yesterday on the Hadley study, and it led to some fascinating discussions about software practices used for model building. One of the topics that came up in the discussion afterwards was how this kind of software development compares with agile software practices, and in particular the reliance on face-to-face communication, rather than documentation. Like many software projects, climate modellers struggle to keep good, up-to-date documentation, but generally feel they should be doing better. The problem of course, is that traditional forms of documentation (e.g. large, stand-alone descriptions of design and implementation details) are expensive to maintain, and of questionable value – the typical experience is that you wade through the documentation and discover that despite all the details, it never quite answers your question. Such documents are often produced in a huge burst of enthusiasm for the first release of the software, but then never touched again through subsequent releases. And as the code in the climate models evolves steadily over decades, the chances of any stand-alone documentation keeping up are remote.
An obvious response is that the code itself should be self-documenting. I’ve looked at a lot of climate model code, and readability is somewhat variable (to put it politely). This could be partially addressed with more attention to coding standards, although it’s not clear how familiar you would have to be with the model already to be able to read the code, even with very good coding standards. Initiatives like Clear Climate Code intend to address this problem, by re-implementing climate tools as open source projects in Python, with a strong focus on making the code as understandable as possible. Michael Tobis and I have speculated recently about how we’d scale up this kind of initiative to the development of coupled GCMs.
But readable code won’t fill the need for a higher level explanation of the physical equations and their numerical approximations used in the model, along with rationale for algorithm choices. These are often written up in various forms of (short) white papers when the numerical routines are first developed, and as these core routines rarely change, this form of documentation tends to remain useful. The problem is that these white papers tend to have no official status (or perhaps at best, they appear as technical reports), and are not linked in any usable way to distributions of the source code. The idea of literate programming was meant to solve this problem, but it never took off, probably because it demands that programmers must tear themselves away from using programming languages as their main form of expression, and start thinking about how to express themselves to other human beings. Given that most programmers define themselves in terms of the programming languages they are fluent in, the tyranny of the source code is unlikely to disappear anytime soon. In this respect, climate modelers have a very different culture from most other kinds of software development teams, so perhaps this is an area where the ideas of literate programming could take root.
Lack of access to these white papers could also be solved by publishing them as journal papers (thus instantly making them citeable objects). However, scientific journals tend not to publish descriptions of the designs of climate models, unless they are accompanied with new scientific results from the models. There are occasional exceptions (e.g. see the special issue of the Journal of Climate devoted to the MPI-M models). But things are changing, with the recent appearance of two new journals:
- Geoscientific Model Development, an open access journal that accepts technical descriptions of the development and evaluation of the models;
- Earth Science Informatics, a Springer Journal with a broader remit than GMD, but which does cover descriptions of the development of computational tools for climate science.
The problem is related to another dilemma in climate modeling groups: acknowledgement for the contributions of those who devote themselves more to model development rather than doing “publishable science”. Most of the code development is done by scientists whose performance is assessed by their publication record. Some modeling centres have created job positions such as “programmers” or “systems staff”, although most people hired into these roles have a very strong geosciences background. A growing recognition of the importance of their contributions represents a major culture change in the climate modeling community over the last decade.