This week I’m visiting the Max Planck Institute for Meteorology (MPI-M) in Hamburg. I gave my talk yesterday on the Hadley study, and it led to some fascinating discussions about the software practices used for model building. One of the topics that came up afterwards was how this kind of software development compares with agile software practices, and in particular their reliance on face-to-face communication rather than documentation. As in many software projects, climate modellers struggle to keep good, up-to-date documentation, but generally feel they should be doing better. The problem, of course, is that traditional forms of documentation (e.g. large, stand-alone descriptions of design and implementation details) are expensive to maintain and of questionable value: the typical experience is that you wade through the documentation and discover that, despite all the details, it never quite answers your question. Such documents are often produced in a huge burst of enthusiasm for the first release of the software, but then never touched again through subsequent releases. And as the code in the climate models evolves steadily over decades, the chances of any stand-alone documentation keeping up are remote.

An obvious response is that the code itself should be self-documenting. I’ve looked at a lot of climate model code, and readability is somewhat variable (to put it politely). This could be partially addressed with more attention to coding standards, although it’s not clear how familiar you would have to be with the model already to be able to read the code, even with very good coding standards. Initiatives like Clear Climate Code intend to address this problem, by re-implementing climate tools as open source projects in Python, with a strong focus on making the code as understandable as possible. Michael Tobis and I have speculated recently about how we’d scale up this kind of initiative to the development of coupled GCMs.

But readable code won’t fill the need for a higher-level explanation of the physical equations and their numerical approximations used in the model, along with the rationale for algorithm choices. These are often written up in various forms of (short) white papers when the numerical routines are first developed, and as these core routines rarely change, this form of documentation tends to remain useful. The problem is that these white papers tend to have no official status (or perhaps at best, they appear as technical reports), and are not linked in any usable way to distributions of the source code. The idea of literate programming was meant to solve this problem, but it never took off, probably because it demands that programmers tear themselves away from using programming languages as their main form of expression, and start thinking about how to express themselves to other human beings. Given that most programmers define themselves in terms of the programming languages they are fluent in, the tyranny of the source code is unlikely to disappear anytime soon. In this respect, climate modelers have a very different culture from most other kinds of software development teams, so perhaps this is an area where the ideas of literate programming could take root.
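To make the idea concrete, here is a minimal sketch of what a literate-style routine might look like, with the physical equation, its numerical approximation, and the rationale kept right next to the code. It is written in Python only because that is the language Clear Climate Code uses. The function name, the relaxation equation, and the reference placeholder are all invented for illustration and are not taken from any real climate model.

```python
def relax_temperature(temp, temp_eq, tau, dt):
    """Advance a temperature by one time step under Newtonian relaxation.

    Physical equation (illustrative only, not from any particular model):
        dT/dt = -(T - T_eq) / tau

    Numerical approximation (forward Euler):
        T_new = T + dt * (-(T - T_eq) / tau)

    Rationale: forward Euler is the simplest explicit scheme. For this
    equation it is stable only when 0 < dt < 2 * tau, so the routine
    checks that condition rather than silently producing garbage.

    Reference: a real routine would cite its white paper or technical
    report here (hypothetical placeholder; none exists for this sketch).
    """
    if tau <= 0 or dt <= 0:
        raise ValueError("tau and dt must be positive")
    if dt >= 2 * tau:
        raise ValueError("time step too large: forward Euler needs dt < 2 * tau")
    # One forward Euler step of the relaxation equation above.
    return temp + dt * (-(temp - temp_eq) / tau)
```

Calling `relax_temperature(300.0, 280.0, tau=10.0, dt=1.0)` moves the temperature one tenth of the way towards equilibrium. The point is not the numerics, which are trivial here, but that the explanation travels with the code and is versioned alongside it.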

Lack of access to these white papers could also be solved by publishing them as journal papers (thus instantly making them citable objects). However, scientific journals tend not to publish descriptions of the designs of climate models unless they are accompanied by new scientific results from the models. There are occasional exceptions (e.g. see the special issue of the Journal of Climate devoted to the MPI-M models). But things are changing, with the recent appearance of two new journals:

  • Geoscientific Model Development, an open access journal that accepts technical descriptions of the development and evaluation of the models;
  • Earth Science Informatics, a Springer Journal with a broader remit than GMD, but which does cover descriptions of the development of computational tools for climate science.

The problem is related to another dilemma in climate modeling groups: acknowledgement for the contributions of those who devote themselves more to model development than to doing “publishable science”. Most of the code development is done by scientists whose performance is assessed by their publication record. Some modeling centres have created job positions such as “programmers” or “systems staff”, although most people hired into these roles have a very strong geosciences background. A growing recognition of the importance of their contributions represents a major culture change in the climate modeling community over the last decade.

7 Comments

  1. Hi Steve,

    IMHO, the culture change you mention is to better appreciate there are two quite distinct uses for the climate models. 1) Science tools. 2) Engineering tools. And to better appreciate that the optimal methods to create and maintain these tools are also quite distinct.

    Both types of tools contain a lot of science, but the purpose of the former is to help scientists do better science while the purpose of the latter is to help address more practical issues.

    But what scientist’s goal is to do her job so well that she becomes an engineer? :-) There is a lot of firmly established science already in the models so it’s about time that software engineers are given the resources to fork the models. And I would be very surprised if you did not have some thoughts about how to do just that.

    George

    [George – If I understand the distinction you are making, then there are currently no climate models anywhere used as engineering tools; they’re all built and used for basic science – Steve]

  2. If your focus is on readability, I think this would be a really interesting test case for Sun’s Fortress language. The syntax resembles mathematical notation, and they have a Fortress-to-LaTeX translator for pretty rendering. For example, check out their implementation of the Buffon needle problem.

    [Thanks, Lorin – that’s very interesting. It’ll be very hard to persuade them to move away from Fortran for the big models, but it might be possible for smaller side projects. – Steve]

  3. Check this simple tool for modernized Literate Programming:
    http://github.com/unixtechie/Literate-Molly

    It is extremely easy to use, it can be used with “noweb” tools etc. etc.

  4. Pingback: Experimental climatology for kids | Serendipity

  5. George, Steve, there might be some examples already existing of climate models more in the engineering tools category. The engineering end of weather/climate is forecasting. The UK Met. Office and the US NWS might have something close enough to be interesting for this view.

    The UKMO uses their ‘unified model’ for both daily weather forecast modelling and for climate modelling. It is, I’m told, not quite as unified as some advertising would have it, but more so than other climate models.

    The US NWS has their Climate Forecast System model (CFS), which is a coupled atmosphere-ocean-ice system. The atmosphere component is largely, though not exactly, the one used for short-range (1-16 day) global weather forecasting. The ocean side traces its lineage to the Princeton Modular Ocean Model, though there has, again, been some modification for the climate purpose.

  6. Metafor (google it) is aimed at a piece of this problem: a common information model to describe models, both in scientific terms, and computational terms. In the long run, if it takes off, we expect both human GUIs and self-documenting code co-existing (both are necessary to solve this problem). It’s quite early days for metafor, but we’re expecting a massive documentation effort as part of CMIP5 … and publication kudos for those who make the effort to do it …

    [yes of course! I had Metafor categorized as more about curation of the datasets, but I’d forgotten about the long term goal of unifying the data descriptions with descriptions of the models from which they came. – Steve]

  7. Robert – yes, the UKMO builds weather forecasting and climate models from the same unified code base, so that the two types of model share a large amount of code; for example they use the same dynamical code, and much of the same physics, but differ when the big difference in spatial or temporal resolution demands a different way of computing stuff. This imposes some engineering rigor on the model development processes (and indeed, as a result, they are much more mature in their software development processes than other climate modeling centres, as I’ve been reminded several times this week).

    But I still stand by my comment that nobody is running *climate models* (i.e. decadal or longer forecasts) in an “engineering” mode – they are all experimental scientific instruments. I think George is suggesting that forking the science models and improving the engineering (presumably for commercial forecasting use) would be a possible route to getting better software engineering practices put in place. But I don’t think that’s going to happen any time soon. There is no commercial market for *using* climate models (as opposed to a market for the *outputs* of the scientific models), so no sign of any pressure for such engineering to be done. The most obvious potential customer is the insurance industry, but it’s hard to imagine why they would acquire the expertise to run the models themselves, when all they really need are the outputs, which are freely available.

  8. One issue is that climate models are designed for scientific research, not as examples of good software engineering. The perception is that the time and effort taken to make the codes better from a s/w point of view is time and effort taken away from improving the science contained within them. I don’t agree that the perception is 100% correct, but it’s there. Besides, scientists run the show, and they don’t get rewarded for good code; they get rewarded for exploring the science. And the centers and labs doing climate modeling are research centers, not code shops.
