Here’s the first of a series of posts from the American Geophysical Union (AGU) Fall Meeting, which is happening this week in San Francisco. The meeting is huge – they’re expecting 19,000 scientists to attend, making it the largest such meeting in the physical sciences.

The most interesting session today was a new session for the AGU: IN14B “Software Engineering for Climate Modeling”. And I’m not just saying that because it included my talk – all the talks were fascinating. (I’ve posted the slides for my talk, “Do Over or Make Do: Climate Models as a Software Development Challenge”).

After my talk, the next speaker was Cecelia DeLuca of NOAA, with a talk entitled “Emergence of a Common Modeling Architecture for Earth System Science”. Cecelia gave a great overview of the Earth System Modeling Framework (ESMF). She began by pointing out that climate models don’t just contain science code – they consist of a number of different kinds of software. Much of the code is infrastructure code, which doesn’t necessarily need to be written by scientists. Around ten years ago, a number of projects started up with the aim of building shared, standards-based infrastructure code. These projects needed to develop the technical and mathematical expertise to build infrastructure code, but the advantages of separating this development from the science code were clear: the teams building infrastructure code could prioritize best practices, run nightly testing processes, and so on, whereas the scientists typically would not.

ESMF provides a common modelling architecture. Native model data structures (modules, fields, grids, timekeeping) are wrapped into ESMF standard data structures, which conform to relevant standards (e.g. ISO standards, the CF conventions, and the Metafor Common Information Model). The framework also offers runtime compliance checking (e.g. to check that timekeeping behaviour is correct) and automated documentation (e.g. the ability to write out model metadata in a standard XML format).
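The wrapping idea is easy to picture with a toy example. Here’s a minimal sketch in Python (with entirely hypothetical class and method names – the real ESMF API is Fortran-based and far richer) of a native model field wrapped in a standard data structure that can describe itself in a standard metadata format:

```python
# Illustrative only: hypothetical names, not the real ESMF API.
from dataclasses import dataclass, field

@dataclass
class StandardField:
    """A native model array wrapped with standards-based metadata."""
    name: str                     # CF-style standard name, e.g. "air_temperature"
    units: str                    # e.g. "K"
    grid: str                     # identifier for the grid the data lives on
    data: list = field(default_factory=list)  # the native model array

    def metadata_xml(self) -> str:
        """Write out the field's metadata in a simple XML form,
        analogous to the automated documentation ESMF can produce."""
        return (f'<field name="{self.name}" units="{self.units}" '
                f'grid="{self.grid}"/>')

# Wrap a native temperature array so other components (and generic tools)
# can discover what it is without reading the science code.
t_surf = StandardField(name="air_temperature", units="K",
                       grid="atmos_grid", data=[271.3, 280.1, 295.6])
print(t_surf.metadata_xml())
```

The point is just that once the native data carries standard metadata, generic tools can read, check and document it without knowing anything about the science code itself.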

Because of these efforts, earth system models in the US are converging on a common architecture. It’s built on standardized component interfaces, and it creates a layer of structured information within Earth system codes. The lesson here is that if you can take legacy code and express it in a standard way, you gain tremendous power.

The next speaker was Amy Langenhorst from GFDL, with “Making sense of complexity with the FRE climate modelling workflow system”. Amy explained the organisational setup at GFDL: there are approximately 300 people, organized into six science-based groups, plus a technical services group and a modelling services group. The latter consists of 15 people, one of whom acts as a liaison for each of the science groups. This group provides the software engineering support for the science teams.

The Flexible Modeling System (FMS) is a software framework that provides a coupler and infrastructure support. FMS releases happen about once per year, and it provides an extensive testing framework that currently includes 209 different model configurations.
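To give a flavour of what a configuration-sweep test harness does, here’s a minimal sketch (my own illustration, with a placeholder executable and directory name – not GFDL’s actual tooling) of running every stored configuration and recording which ones complete:

```python
# Illustrative sketch only: "./model.exe" and "test_configs" are placeholders.
import subprocess
from pathlib import Path

def run_regression_suite(config_dir: str) -> dict:
    """Run every model configuration found in config_dir and record
    whether it completed successfully -- a stand-in for the kind of
    configuration sweep a testing framework like FMS's performs."""
    results = {}
    for cfg in sorted(Path(config_dir).glob("*.nml")):
        # Each configuration is launched as a separate model run.
        proc = subprocess.run(["./model.exe", "--config", str(cfg)],
                              capture_output=True)
        results[cfg.name] = (proc.returncode == 0)
    return results

if __name__ == "__main__":
    outcomes = run_regression_suite("test_configs")
    failed = [name for name, ok in outcomes.items() if not ok]
    print(f"{len(outcomes) - len(failed)} passed, {len(failed)} failed")
```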

One of the biggest challenges for modelling groups like GFDL is the IPCC cycle. Providing the model runs for each IPCC assessment involves massive, complex data processing, for which a good workflow manager is needed. FRE is the workflow manager for FMS. Development of FRE was started in 2002 by Amy, at a time when the modelling services group didn’t yet exist.

FRE includes version control, configuration management, tools for building executables, control of execution, etc. It also provides facilities for creating XML model description files, model configuration (using a component-based approach), and integrated model testing (e.g. basic tests, restarts, scaling). It also allows for experiment inheritance, so that it’s possible to set up new model configurations based on variants of previous runs, which is useful for perturbation studies.
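The experiment inheritance idea is worth a small illustration. Here’s a minimal sketch (my own made-up experiment names and parameters, not FRE’s actual file format or code) of how a perturbation experiment can inherit a baseline configuration and override only the parameters that change:

```python
# Illustrative sketch of experiment inheritance; experiment names and
# parameters are invented, and this is not FRE's actual implementation.

EXPERIMENTS = {
    "control_run": {
        "inherits": None,
        "settings": {"co2_ppm": 280, "years": 500, "ocean_grid": "1deg"},
    },
    "co2_doubling": {
        # Reuse everything from the control run, overriding only CO2.
        "inherits": "control_run",
        "settings": {"co2_ppm": 560},
    },
}

def resolve(name: str) -> dict:
    """Flatten an experiment's settings by walking up its inheritance chain."""
    exp = EXPERIMENTS[name]
    base = resolve(exp["inherits"]) if exp["inherits"] else {}
    return {**base, **exp["settings"]}

print(resolve("co2_doubling"))
# -> {'co2_ppm': 560, 'years': 500, 'ocean_grid': '1deg'}
```

The attraction is that a perturbation study only has to state what differs from the baseline, which keeps configurations short and makes the intent of each experiment obvious.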

Next up was Rob Burns from NASA GSFC, talking about “Software Engineering Practices in the Development of NASA Unified Weather Research and Forecasting (NU-WRF) Model”. WRF is a weather forecasting model originally developed at NCAR, but widely used across the NWP community. NU-WRF is an attempt to unify variants of NCAR’s WRF and to facilitate better use of WRF. NU-WRF is built from versions of NCAR’s WRF, with a separate process for folding in enhancements.

As is common with many modelling efforts, there were challenges arising from multiple science teams with individual goals, interests and expertise, and from the fact that scientists don’t consider software engineering their first priority. At NASA, the Software Integration and Visualization Office (SIVO) provides software engineering support for the scientific modelling teams. SIVO helps to drive, but not to lead, the scientific modelling efforts. They help with full software lifecycle management, assisting with all software processes from requirements to release, but with domain experts still making the scientific decisions. The code is under full version control, using Subversion, and the software engineering team coordinates the effort to get the codes into version control.

The experience with NU-WRF shows that this kind of partnership between science teams and a software support team can work well. Leadership and active engagement with the science teams are needed. However, involving the entire science team in every decision proved too slow, so a core team was formed for decision-making.

The next speaker was Thomas Clune from NASA GISS, with a talk entitled “Constraints and Opportunities in GCM Model Development”. Thomas began with the question: how did we end up with the software we have today? From a software quality perspective, we wrote the wrong software. Over the years, improvements in model fidelity have driven a disproportionate growth in the complexity of the implementations.

One important constraint is that model codes change relatively slowly, partly because of the model validation process: it’s important to be able to validate each code change individually, so changes can’t be bundled together. But it’s also because code familiarity matters: the scientists have to understand their code, and if it changes too fast, they lose this familiarity.

However, the problem now is that software quality is incommensurate with the growing socioeconomic role for our models in understanding climate change. There’s a great quote from Ward Cunningham: “Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite… The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise…” Examples of this debt in climate models include long procedures, kludges, cut-and-paste duplication, short/ambiguous names, and inconsistent style.

The opportunities then are to exploit advances in software engineering from elsewhere to systematically and incrementally improve the software quality of climate models. For example:

  • Coding standards – these improve productivity through familiarity, reduce some types of bugs, and help newcomers. But they must be adopted from within the community, by negotiation.
  • Abandon CVS. It has too many liabilities for managing legacy code – for example, it makes directory structures effectively permanent. The community needs version control systems that handle branching and merging well. NASA GISS is planning to switch to Git in the new year, as soon as the IPCC runs are out of the way.
  • Unit testing. There’s a great quote from Michael Feathers: “The main thing that distinguishes legacy code from non-legacy code is tests. Or rather lack of tests”. Lack of tests leads to fear of introducing subtle bugs. Elsewhere, unit testing frameworks have caused a major shift in how commercial software development works, particularly in enabling test-driven development. Tom has been experimenting with pFUnit, a testing framework with support for parallel Fortran and MPI (see the sketch after this list). The existence of such frameworks removes some of the excuses for not using unit testing for climate models (in most cases, the modeling community relies on regression testing in preference to unit testing). Some of the reasons commonly given for not doing unit testing suggest confusion about what unit testing is for: e.g. that some constraints are unknown, that tests would just duplicate the implementation, or that it’s impossible to test emergent behaviour. These kinds of excuses indicate that modelers tend to conflate scientific validation with the verification offered by unit testing.
  • Clone Detection. Tools now exist to detect code clones (places where code has been copied, sometimes with minor modifications across different parts of the software). Tom has experimented with some of these with the NASA modelE, with promising results.
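To make the unit testing point concrete, here’s a minimal sketch of the kind of test a framework like pFUnit enables – written here in Python with the standard unittest module rather than parallel Fortran, and using a made-up stand-in for a small piece of model physics (a Magnus-type saturation vapour pressure formula):

```python
# Illustrative only: a tiny, made-up physics routine and a unit test for it.
# The point is that the test pins down one routine's behaviour in isolation,
# quite unlike a whole-model regression test.
import math
import unittest

def saturation_vapour_pressure(temp_k: float) -> float:
    """Approximate saturation vapour pressure (Pa) over water,
    using a common Magnus-type formula (input temperature in kelvin)."""
    temp_c = temp_k - 273.15
    return 610.94 * math.exp(17.625 * temp_c / (temp_c + 243.04))

class TestSaturationVapourPressure(unittest.TestCase):
    def test_reference_value_at_freezing(self):
        # At 0 degrees C this formula gives 610.94 Pa by construction.
        self.assertAlmostEqual(saturation_vapour_pressure(273.15), 610.94, places=2)

    def test_monotonically_increasing(self):
        # Warmer air holds more moisture: pressure must rise with temperature.
        temps = [260.0, 273.15, 290.0, 310.0]
        values = [saturation_vapour_pressure(t) for t in temps]
        self.assertEqual(values, sorted(values))

if __name__ == "__main__":
    unittest.main()
```

A test like this pins down the behaviour of one small routine in isolation, which is quite different from (and complementary to) whole-model regression testing or scientific validation.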

The next talk was by John Krasting from GFDL, on “NOAA-GFDL’s Workflow for CMIP5/IPCC AR5 Experiments”. I didn’t take many notes, mainly because the subject was very familiar to me, having visited several modeling labs over the summer, all of which were in the middle of the frantic process of generating their IPCC CMIP5 runs (or in some cases struggling to get started).

John explained that CMIP5 is somewhat different from the earlier CMIP projects, because it is much more comprehensive, with a much larger set of model experiments and a much larger set of requested model variables. CMIP1 focussed on pre-industrial control runs, while CMIP2 added some idealized climate change scenario experiments. For CMIP3, the entire archive (from all modeling centres) was 36 terabytes. For CMIP5, this is expected to be at least two orders of magnitude bigger. Because of the larger number of experiments, CMIP5 has a tiered structure, so that some kinds of experiments are prioritized (e.g. see the diagram from Taylor et al).

GFDL is expecting to generate around 15,000 model years of simulation, yielding around 10 petabytes of data, of which around 10%-15% will be released to the public, distributed via the ESG Gateway. The remainder of the data represents some redundancy, and some diagnostic data that’s intended for internal analysis.
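A quick back-of-envelope check on those figures (my own arithmetic, not numbers from the talk):

```python
# Back-of-envelope arithmetic on the figures quoted above.
model_years = 15_000
total_pb = 10.0                      # total output, petabytes
public_fraction = (0.10, 0.15)       # share released via the ESG gateway

tb_per_model_year = total_pb * 1000 / model_years
public_pb = tuple(round(total_pb * f, 1) for f in public_fraction)

print(f"~{tb_per_model_year:.2f} TB of output per simulated year")
print(f"~{public_pb[0]}-{public_pb[1]} PB released publicly")
```

That works out to roughly two thirds of a terabyte of output per simulated year, and about 1 to 1.5 petabytes of publicly released data.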

The final speaker in the session was Archer Batcheller, from the University of Michigan, with a talk entitled “Programming Makes Software; Support Makes Users”. Archer was reporting on the results of a study he has been conducting of several software infrastructure projects in the earth system modeling community. His main observation is that e-Science is about growing socio-technical systems, and that people are a key part of these systems. Effort is needed to nurture communities of users, and that effort is crucial for building scientific cyberinfrastructure.

From his studies, Archer found that most people developing modeling infrastructure software divide their time about 50:50 between coding and other activities, including:

  • “selling” – explaining/promoting the software in publications, at conferences, and at community meetings (even though the software is free, it still has to be “marketed”)
  • support – helping users, which in turn helps with identifying new requirements
  • training – including 1-on-1 sessions, workshops, online tutorials, etc.