I’ve pointed out a number of times that the software processes used to build the Earth System Models used in climate science don’t look anything like conventional software engineering practices. One very noticeable difference is the absence of detailed project plans, estimates, development phases, etc. While scientific steering committees do discuss long term strategy and set high level goals for the development of the model, the vast majority of model development work occurs bottom-up, through a series of open-ended, exploratory changes to the code. The scientists who work most closely with the models get together and decide what needs doing, typically on a week-to-week basis. Which is a little like agile planning, but without any of the agile planning techniques. Is this the best approach? Well, if the goal was to deliver working software to some external customer by a certain target date, then probably not. But that’s not the goal at all – the goal is to do good science. Which means that much of the work is exploratory and opportunistic.  It’s difficult to plan model development in any detail, because it’s never clear what will work, nor how long it will take to try out some new idea. Nearly everything that’s worth doing to improve the model hasn’t been done before.

This approach also favours a kind of scientific bricolage. Imagine we have sketched out a conceptual architecture for an earth system model. The conventional software development approach would be to draw up a plan to build each of the components on a given timeline, such that they would all be ready by some target date for integration. And it would fail spectacularly, because it would be impossible to estimate timelines for each component – each part involves significant new research. The best we can do is to get groups of scientists to go off and work on each subsystem, and wait to see what emerges. And to be willing to try incorporating new pieces of code whenever they seem to be mature enough, no matter where they came from.

So we might end up with a coupled earth system model where each of the major components was built at a different lab, each was incorporated into the model at a different stage in its development, and none of this was planned long in advance. And, as a consequence, each component has its own community of developers and users who have goals that often diverge from the goals of the overall earth system model. Typically, each community wants to run its component model in stand-alone model, to pursue scientific questions specific to that subfield. For example, ocean models are built by oceanographers to study oceanography. Plant growth models are built by biologists to study the carbon cycle. And so on.

One problem is that if you take components from each of these communities to incorporate into a coupled model, you don’t want to fork the code. A fork would give you the freedom to modify the component to make it work in the coupled scheme. But, as with forking in open source projects, is nearly always a mistake. It fragments the community, and means the forked copy no longer gets the ongoing improvements to the original software (or more precisely, it quickly becomes too costly to transplant such improvements into the forked code). Access to the relevant community of expertise and their ongoing model improvements are at least as important as any specific snapshot of their code, otherwise the coupled model will fail to keep up with the latest science. Which means a series of compromises must be made – some changes might be necessary to make the component work in a coupled scheme, but these must not detract from the ability of the community to continue working with the component as a stand-alone model.

So, building an earth system model means assembling a set of components that weren’t really designed to work together, and a continual process of negotiation between the requirements for the entire coupled model and the requirements of the individual modeling communities. The alternative, re-building each component from scratch, doesn’t make sense financially or scientifically. It would be expensive and time consuming, and you’d end up with untested software, that scientifically, is several years behind the state-of-the-art. [Actually, this might be true of any software: see this story of the netscape rebuild].

Over the long term, a set of conventions have emerged that help to make it easier to couple together components built by different communities. These include the basic data formating and message passing standards, as well as standard couplers. And more recently, modeling frameworks, metadata standards and data sharing infrastructure. But as with all standardization efforts, it takes a long time (decades?) for these to be accepted across the various modeling communities, and there is always resistance, in part because meeting the standard incurs a cost and usually detracts from the immediate goals of each particular modeling community (with the benefits accruing elsewhere – specifically to those interested in working with coupled models). Remember: these models are expensive scientific instruments. Changes that limit the use of the component as a standalone model, or which tie it to a particular coupling scheme, can diminish its value to the community that built it.

So, we’re stuck with the problem of incorporating a set of independently developed component models, without the ability to impose a set of interface standards on the teams that build the components. The interface definitions have to be continually re-negotiated. Bryan Lawrence has some nice slides on the choices, which he characterizes as the “coupler approach” and the “framework approach” (I shamelessly stole his diagrams…)

The coupler approach leaves the models almost unchanged, with a communication library doing any necessary transformation on the data fields.

The framework approach splits the original code into smaller units, adapting their data structures and calling interfaces, allowing them to be recombined in a more appropriate calling hierarchy

The advantage of the coupler approach is that it requires very little change to the original code, and allows the coupler itself to be treated as just another stand-alone component that can be re-used by other labs. However, it’s inefficient, and seriously limits the opportunities to optimize the run configuration: while the components can run in parallel, the coupler must still wait on each component to do its stuff.

The advantage of the framework approach is that it produces a much more flexible and efficient coupled model, with more opportunities to lay out the subcomponents across a parallel machine architecture, and a greater ability to plug other subcomponents in as desired. The disadvantage is that component models might need substantial re-factoring to work in the framework. The trick here is to get the framework accepted as a standard across a variety of different modeling communities. This is, of course, a bit of a chicken-and-egg problem, because its advantages have to be clearly demonstrated with some success stories before such acceptance can happen.

There is a third approach, adopted by some of the bigger climate modeling labs: build everything (or as much as possible) in house, and build ad hoc interfaces between various components as necessary. However, as earth system models become more complex, and incorporate more and more different physical, chemical and biological processes, the ability to do it all in-house is getting harder and harder. This is not a viable long term strategy.

3 Comments

  1. Pingback: What makes software engineering for climate models different? | Serendipity

  2. > I’ve pointed out a number of times that the software processes used to build the Earth System Models used in climate science don’t look anything like conventional software engineering practices.

    I should hope not! “Conventional” software engineering practices have much more to do with vendors and employees conspiring as rent-seekers, to the detriment of customers, internal and external.

    If one has the ability, the management structure should be as well documented and openly published as the actual source code, along with making the revision control system available for read access (meaning that the management documents should be under revision control too). Ah, the disinfecting power of sunlight! ;-) But it is hard to find coders and managers with the backbone to pull this off.

    The Python language and the CPython implementation are particularly well managed, by this high standard.

    The particulars of the development system are secondary, because the published results for efficacy for different development methodologies are so slight. Also, everyone agrees the efficacy is highly dependent on individuals and situation and type of application. Better to dip a toe into any particular technique of management, test & measure & document & admit mistakes of judgement quickly and publicly.

  3. Just bumbed into this old but related article in Science Daily:

    Open Source Software Toolkit Plays Key Role In New Climate Simulations
    http://www.sciencedaily.com/releases/2007/02/070215111454.htm

    The Model Coupling Toolkit
    http://www.mcs.anl.gov/research/projects/mct/

  4. Just for credit where credit’s due: Sophie Valcke (my co-author) on the presentation produced the coupler/framework diagrams!

    What to do about it … indeed? Spending more and more of my time thinking about it lately (as you can tell). No easy answers, since most of the problems are not technical, but social …

  5. Pingback: I never said that! | Serendipity

  6. Pingback: Talks and Workshops I can’t attend :-( | Serendipity

  7. Pingback: You can’t delegate ill-defined problems to software engineers | Serendipity

Join the discussion: