One of the exercises I set myself while visiting NCAR this month is to try porting the climate model CESM1 to my laptop (a MacBook Pro). Partly because I wanted to understand what makes it tick, and partly because I thought it would be neat to be able to run it anywhere. At first I thought the idea was crazy – these things are designed to be run on big supercomputers. But CESM is also intended to be portable, as part of a mission to support a broader community of model users. So, porting it to my laptop is a simple proof of concept test – if it ports easily, that’s a good sign that the code is robust.
It took me several days of effort to complete the port, but most of that time was spent on two things that have very little to do with the model itself. The first was a compiler bug that I tripped over (yeah, yeah, blame the compiler, right?) and the second was the issue of getting all the necessary third party packages installed. But in the end I was successful. I’ve just completed two very basic test runs of the model. The first is what’s known as an ‘X’ component set, in which all the major components (atmosphere, ocean, land, ice, etc) don’t actually do anything – this just tests that the framework code builds and runs. The second is an “A” compset at a low resolution, in which all the components are static data models (this ran for five days of simulation time in about 1.5 minutes). If I was going on to test the port correctly, there’s a whole sequence of port validation tests that I ought to perform, for example to check that my runs are consistent with the benchmark runs, that I can stop and restart the model from the data files, that I get the same results in different model configurations, etc. And then eventually there’s the scientific validation tests – checks that the simulated climate in my ported model is realistic.
But for now, I just want to reflect on the process of getting the model to build and run on a new (unsupported) platform. I’ll describe some of the issues I encountered, and then reflect on what I’ve learned about the model. First, some stats. The latest version of the model, CESM1.0 was released on June 25, 2010. It contains just under 1 million lines of code. Three quarters of this is Fortran (mainly Fortran 90), the rest is a mix of shell scripts (of several varieties), XML and HTML:
In addition to the model itself, there are another 12,000 lines of perl and shell script that handle the installing, configuring, building and running the model.
The main issues that tripped me up were
- The compiler. I decided to use the gnu compiler package (gfortran, included in the gcc package), because it’s free. But it’s not one of the compilers that’s supported for CESM, because in general CESM is used with commercial compilers (e.g. IBM’s) on the supercomputers. I grabbed the newest version of gcc that I could find a pre-built Mac binary for (v4.3.0), but it turned out not to be new enough – I spent a few hours diagnosing what turned out to be a (previously undiscovered?) bug in gfortran v4.3.0 that’s fixed in newer versions (I switched to v4.4.1). And then there’s a whole bunch of compiler flags (mainly to do with compatibility for certain architectures and language versions) that are not compatible with the commercial compilers, which I needed to track down.
- Third party packages such as MPI (the message passing interface used for exchanging data between model components) and NetCDF (the data formating standard used for geoscience data). It turns out that the Mac already has MPI installed, but without Fortran and Parallel IO support, so I had to rebuild it. And it took me a few rebuilds to get both these packages installed with all the right options.
Once I’d got these sorted, and figured out which compiler flags I needed, the build went pretty smoothly, and I’ve had no problems so far running it. Which leads me to draw a number of (tentative) conclusions about portability. First, CESM is a little unusual compared to most climate models, because it is intended as a community effort, and hence portability is a high priority. It has already been ported to around 30 different platforms, including a variety IBM and Cray supercomputers, and various Linux clusters. Just the process of running the code through many different compilers shakes out not just portability issues, but good coding practices too, as different compilers tend to be picky about different language constructs.
Second, in the process of building the model, it’s quite easy to see that it consists of a number of distinct components, written by different communities, to different coding standards. Most obviously, CESM itself is built from five different component models (atmosphere, ocean, sea ice, land ice, land surface), along with a coupler that allows them to interact. There’s a tension between the needs of scientists who develop code just for a particular component model (run as a standalone model) versus scientists who want to use a fully coupled model. These communities overlap, but not completely, and coordinating the different needs takes considerable effort. Sometimes code that makes sense in a standalone module will break the coupling scheme.
But there’s another distinction that wasn’t obvious to me previously:
- Scientific code – the bulk of the Fortran code in the component modules. This includes the core numerical routines, radiation schemes, physics parameterizations, and so on. This code is largely written by domain experts (scientists), for whom scientific validity is the over-riding concern (and hence they tend to under-estimate the importance of portability, readability, maintainability, etc).
- Infrastructure code – including the coupler that allows the components to interact, the shared data handling routines, and a number of shared libraries. Most of this I could characterize as a modeling framework – it provides an overall architecture for a coupled model, and calls the scientific code as and when needed. This code is written jointly by the software engineering team and the various scientific groups.
- Installation code – including configuration and build scripts. These are distinct from the model itself, but intended to provide flexibility to the community to handle a huge variety of target architectures and model configurations. These are written exclusively by the software engineering team (I think!), and tend to suffer from a serious time crunch: making this code clean and maintainable is difficult, given the need to get a complex and ever-changing model working in reasonable time.
In an earlier post, I described the rapid growth of complexity in earth system models as a major headache. This growth of complexity can be seen in all three types of software, but the complexity growth is compounded in the latter two: modeling frameworks need to support a growing diversity of earth system component models, which then leads to exponential growth in the number of possible model configurations that the build scripts have to deal with. Handling the growing complexity of the installation code is likely to be one of the biggest software challenges for the earth system modeling community in the next few years.