Of all the global climate models, the Community Earth System Model, CESM, seems to come closest to the way an open source community works. The annual CESM workshop, this week in Breckenridge, Colorado, provides an example of how the community works. There are about 350 people attending, and much of the meeting is devoted to detailed discussion of the science and modeling issues across a set of working groups: Atmosphere model, Paleoclimate, Polar Climate, Ocean model, Chemistry-climate, Land model, Biogeochemistry, Climate Variability, Land Ice, Climate Change, Software Engineering, and Whole Atmosphere.
In the opening plenary on Monday, Mariana Vertenstein (who is hosting my visit to NCAR this month) was awarded the 2010 CESM distinguished achievement award for her role in overseeing the software engineering of the CESM. This is interesting for a number of reasons, not least because it demonstrates how much the CESM community values the role of the software engineering team, and the advances that the software engineering working group has made in improving the software infrastructure over the last few years.
Earth system models are generally developed in a manner that’s very much like agile development. Getting the science working in the model is prioritized, with issues such as code structure, maintainability and portability worked in later, as needed. To some extent, this is appropriate – getting the science right is the most important thing, and it’s not clear how much a big upfront design effort would pay off, especially in the early stages of model development, when it’s not clear whether the model will become anything more than an interesting research idea. The downside of this strategy is that as the model grows in sophistication, the software architecture ends up being a mess. As Mariana explained in her talk, coupled models like the CESM have reached a point in their development where this approach no longer works. In effect, a massive refactoring effort is needed to clean up the software infrastructure to permit future maintainability.
Mariana’s talk was entitled “Better science through better software”. She identified a number of major challenges facing the current generation of earth system models, and described some of the changes in the software infrastructure that have been put in place for the CESM to address them.
The challenges are:
1) New system complexity, as new physics and new grids are incorporated into the models. For example, the CESM now has a new land ice model, which along with the atmosphere, ocean, land surface, and sea ice components brings the total to five distinct geophysical component models, each operating on different grids, and each with its own community of users. These component models exchange boundary information via the coupler, and the entire coupled model now runs to about 1.2 million lines of code (compare with the previous generation model, CCSM3, now six years old, which had about 330KLoC).
The increasing number of component models increases the complexity of the coupler. It now has to handle regridding (where data such as energy and mass are exchanged between component models with different grids), data merging, atmosphere-ocean fluxes, and conservation diagnostics (e.g. to ensure the entire model conserves energy and mass). Note: older versions of the model were more restricted; for example, the atmosphere, ocean and land surface schemes were all required to use the same grid.
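To make the regridding and conservation idea concrete, here is a minimal sketch in Python. It is a one-dimensional toy under my own assumptions, not the CESM coupler's algorithm: the function names, grids and flux values are all hypothetical. Each destination cell takes the overlap-weighted mean of the source cells it covers, and a simple diagnostic checks that the integrated quantity is the same on both grids.

```python
def overlap(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def conservative_regrid(src_edges, src_values, dst_edges):
    """Remap cell-mean values from a source grid to a destination grid.

    Each destination value is the overlap-weighted mean of the source
    cells it covers, so the integral over the domain is preserved.
    """
    dst_values = []
    for j in range(len(dst_edges) - 1):
        d0, d1 = dst_edges[j], dst_edges[j + 1]
        total = 0.0
        for i in range(len(src_edges) - 1):
            weight = overlap(src_edges[i], src_edges[i + 1], d0, d1)
            total += weight * src_values[i]
        dst_values.append(total / (d1 - d0))
    return dst_values


def integral(edges, values):
    """Conservation diagnostic: sum of (cell mean * cell width)."""
    return sum(v * (edges[k + 1] - edges[k]) for k, v in enumerate(values))


# Hypothetical grids and fluxes, purely for illustration.
src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]   # four equal source cells
src_flux = [10.0, 20.0, 30.0, 40.0]     # cell-mean flux on the source grid
dst_edges = [0.0, 1.5, 4.0]             # two unequal destination cells

dst_flux = conservative_regrid(src_edges, src_flux, dst_edges)
src_total = integral(src_edges, src_flux)
dst_total = integral(dst_edges, dst_flux)
```

The two totals agree to rounding error even though the destination cells do not line up with the source cells, which is the property a conservation diagnostic is meant to verify. Real couplers work on two-dimensional grids on the sphere and use precomputed weight files, but the bookkeeping follows the same pattern.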
Users also want to be able to swap in different versions of each major component. For example, a particular run might demand a fully prognostic atmosphere model, coupled with a prescribed ocean parameterization (taken from observational data, for example). Then, within each major component, users might want different configurations: multiple dynamical cores, multiple chemistry modes, etc.
Another source of complexity comes from resolutions. Model components now run over a much wider range of resolutions, and the re-gridding challenges are substantial. And finally, whereas the old model used rectangular latitude-longitude grids, now people want to accommodate many different types of grid.
2) Ultra-high resolution. The trend towards higher resolution grids poses serious challenges for scalability, especially given the massive increase in volume of data being handled. All components (and the coupler) need to be scalable in terms of both memory and performance.
Higher resolution increases the need for more parallelism, and there has been tremendous progress on this in the last few years. A few years back, as part of the DOE/LLNL grand challenge, CCSM3 managed 0.5 simulation years per day, running on 4,000 cores, and this was considered a great achievement. This year, the new version of CESM has successfully run on 80,000 cores, to give 3 simyears per day in a very high resolution model: 0.125° grid for the atmosphere, 0.25° for the land and 0.1° for the ocean.
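The arithmetic behind those two figures is worth spelling out. The short Python sketch below uses only the numbers quoted above; the function name is my own. It converts each run into core-hours per simulated year, which shows how much more compute the high resolution configuration consumes even as its throughput rises.

```python
def core_hours_per_sim_year(cores, sim_years_per_day):
    """Core-hours of compute spent for each simulated year."""
    return cores * 24.0 / sim_years_per_day


# Figures quoted in the talk.
ccsm3_cost = core_hours_per_sim_year(4000, 0.5)    # 192000.0
cesm_cost = core_hours_per_sim_year(80000, 3.0)    # 640000.0

throughput_ratio = 3.0 / 0.5      # 6.0 times more simulated years per day
core_ratio = 80000 / 4000         # 20.0 times more cores
```

Throughput went up sixfold on twenty times as many cores. This should not be read as a parallel efficiency figure, because the two runs are different models at different resolutions; it simply quantifies the cost of a simulated year in each case.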
Interestingly, in these highly parallel configurations, the ocean model, POP, is no longer dominant for processing time; the sea ice and atmosphere models start to dominate because the two of them are coupled sequentially. Hence the ocean model scales more readily.
3) Data assimilation. For weather forecasting models, this has long been standard analysis practice. Briefly, the model state and the observational data are combined at each timestep to give a detailed analysis of the current state of the system, which helps to overcome limitations in both the model and the data, and to better understand the physical processes underlying the observational data. It’s also useful in forecasting, as it allows you to arrive at a more accurate initial state for a forecast run.
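As a rough illustration of how a model state and an observation are combined, here is the textbook scalar update used in optimal interpolation and Kalman filtering. This is an assumption on my part about the general technique, not a description of the assimilation scheme used in CESM, and the numbers are invented.

```python
def assimilate(forecast, forecast_var, observation, observation_var):
    """Blend a model forecast with an observation, weighting by uncertainty.

    Returns the analysis value and its (reduced) error variance.
    """
    # The gain is close to 1 when the model is uncertain relative to the
    # observation, and close to 0 when the observation is the noisier one.
    gain = forecast_var / (forecast_var + observation_var)
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var


# Hypothetical values: a model temperature of 15.0 with error variance 4.0,
# and an observed temperature of 17.0 with error variance 1.0.
analysis, analysis_var = assimilate(15.0, 4.0, 17.0, 1.0)
```

With these values the analysis lands at about 16.6, closer to the observation because it is the more certain of the two, and the analysis variance (about 0.8) is smaller than either input variance. Operational systems apply the same idea to very large state vectors with full error covariance matrices.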
In climate modeling, data assimilation is a relatively new capability. The current version of the CESM can do data assimilation in both the atmosphere and ocean. The new framework also supports experiments where multiple versions of the same component are used within a run. For example, the model might have multiple atmosphere components in a single simulation, each coupled with its own instance of the ocean, where one is an assimilation module and the other a prognostic model.
4) The needs of the user community. Supporting a broad community of model users adds complexity, especially as the community becomes more diverse. The community needs more frequent releases of the model (e.g. more often than every six years!), and people need to be able to merge new releases more easily into their own sandboxes.
These challenges have inspired a number of software infrastructure improvements in the CESM. Mariana described a number of advances.
The old model, CCSM3, was run as multiple executables, one for each major component, exchanging data with a coupler via MPI. And each component used to have its own way of doing coupling. But this kills efficiency – processors end up idling when a component has to wait on data from the others. It’s also very hard in this scheme to understand the time evolution as the model runs, which then also makes it very hard to debug. And the old approach was notoriously hard to port to different platforms.
The new framework has a top level driver that controls time evolution, with all coupling done at the top level. Then the component models can be laid out across the available processors, either all in parallel, or in a hybrid parallel-sequential mode. For example, atmosphere, land scheme and sea ice modules might be called in sequence, with the ocean model running in parallel with the whole set. The chosen architecture is specified in a single XML file. This brings a number of benefits:
- Better flexibility for very different platforms;
- Facilitates model configurations with huge amounts of parallelism across a very large number of processors;
- Allows the coupler & components to be ESMF compliant, so the model can couple with other ESMF compliant models;
- Integrated release cycle – it’s now all one model, whereas in the past each component model had its own separate releases;
- Much easier to debug, as it’s easier to follow the time evolution.
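The hybrid parallel-sequential layout described above can be sketched as a small timing model. Everything here is hypothetical: the per-step costs are invented, the function name is mine, and the real driver reads its layout from an XML file rather than a Python list. The point is only to show why running the ocean alongside a sequential atmosphere, land and sea ice group shortens each coupling step.

```python
def step_wall_time(layout, cost):
    """Wall-clock time for one coupling step.

    `layout` is a list of processor groups. Components inside a group run
    one after another; the groups themselves run concurrently, so the step
    takes as long as the slowest group.
    """
    return max(sum(cost[name] for name in group) for group in layout)


# Hypothetical seconds per coupling step for each component. A real model's
# costs also change with the number of processors assigned, which this
# sketch ignores.
cost = {"atm": 5.0, "lnd": 1.0, "ice": 2.0, "ocn": 6.0}

hybrid = [["atm", "lnd", "ice"], ["ocn"]]       # ocean alongside the rest
sequential = [["atm", "lnd", "ice", "ocn"]]     # everything in one sequence

hybrid_time = step_wall_time(hybrid, cost)           # 8.0
sequential_time = step_wall_time(sequential, cost)   # 14.0
```

In this toy case the hybrid layout finishes a step in 8 seconds rather than 14, and the sequential group rather than the ocean sets the pace. Because the top-level driver owns the loop over coupling steps, changing the layout is a configuration change rather than a code change in each component.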
The new infrastructure also includes scripting tools that support the process of setting up an experiment, and making sure it runs with optimal performance on a particular platform. For example, the current release includes scripts to create a wide variety of out-of-the-box experiments. It also includes a load balancing tool, to check how much time each component is idle during a run, and new scripts with hints for porting to new platforms, based on a set of generic machine templates.
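The kind of check a load balancing tool performs can be sketched in a few lines of Python. This is not the CESM tool itself; the function, the group names, the core counts and the timings are all assumptions made for illustration. Given how long each processor group was busy during a coupling step, it reports the idle fraction and the core-seconds wasted by each group.

```python
def idle_report(groups, wall_seconds):
    """Idle fraction and wasted core-seconds for each processor group.

    `groups` maps a group name to (cores, busy_seconds) for one step.
    """
    report = {}
    for name, (cores, busy_seconds) in groups.items():
        idle_seconds = wall_seconds - busy_seconds
        report[name] = {
            "idle_fraction": idle_seconds / wall_seconds,
            "idle_core_seconds": idle_seconds * cores,
        }
    return report


# Hypothetical layout: 96 cores shared by atmosphere, land and sea ice,
# and 32 cores for the ocean, with invented busy times per step.
groups = {"atm+lnd+ice": (96, 8.0), "ocn": (32, 6.0)}
wall = max(busy for _, busy in groups.values())   # slowest group sets the pace
report = idle_report(groups, wall)
```

Here the ocean group sits idle for a quarter of each step, wasting 64 core-seconds. That suggests moving some ocean cores over to the other group and measuring again, which is the iterative tuning such a tool is meant to support.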
The model also has a new parallel I/O library (PIO), which adds a layer of abstraction between the data structures used in each model component and the arrangement of the data when written to disk.
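A toy version of that abstraction layer might look like the following. It does not use PIO's actual API; the function and the decomposition are hypothetical. Each process holds some local values together with a map giving the position of each value in the global array, and the I/O layer uses those maps to produce the on-disk ordering regardless of how the component chose to decompose its data.

```python
def assemble_global(global_size, pieces):
    """Rearrange per-process data into global (on-disk) order.

    `pieces` is a list of (global_indices, local_values) pairs, one per
    process. Each index says where the matching value lives in the file.
    """
    ordered = [None] * global_size
    for global_indices, local_values in pieces:
        for index, value in zip(global_indices, local_values):
            ordered[index] = value
    return ordered


# Hypothetical decomposition: two processes own interleaved grid points.
pieces = [
    ([0, 2, 4], [10.0, 12.0, 14.0]),   # process 0
    ([1, 3, 5], [11.0, 13.0, 15.0]),   # process 1
]
on_disk = assemble_global(6, pieces)
```

The component never needs to know the file layout, and the file layout does not change when the component is run on a different number of processors; only the index maps differ.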
New versions of the model are now being released via the Subversion repository (rather than a .tar file, as used in the past). Hence, users can use an svn merge to get the latest release. There have been three model releases since January:
- CCSM Alpha, released in January 2010;
- CCSM 4.0 full release, in April 2010;
- CESM 1.0 released June 2010.
Mariana ended her talk with a summary of the future work – complete the CMIP5 runs for the next round of the IPCC assessment process; regional refinement with scalable grids; extend the data assimilation capability; handle super-parameterization (e.g. include cloud resolving models); add hooks for human dimensions within the models (e.g. to support the DOE program on integrated assessment); and improved validation metrics.
Note: the CESM is the successor to CCSM – the Community Climate System Model. The name change recognises the wider set of earth systems now incorporated into the model.