One of the exercises I set myself while visiting NCAR this month is to try porting the climate model CESM1 to my laptop (a MacBook Pro). Partly because I wanted to understand what makes it tick, and partly because I thought it would be neat to be able to run it anywhere. At first I thought the idea was crazy – these things are designed to be run on big supercomputers. But CESM is also intended to be portable, as part of a mission to support a broader community of model users. So, porting it to my laptop is a simple proof of concept test – if it ports easily, that’s a good sign that the code is robust.

It took me several days of effort to complete the port, but most of that time was spent on two things that have very little to do with the model itself. The first was a compiler bug that I tripped over (yeah, yeah, blame the compiler, right?) and the second was the issue of getting all the necessary third party packages installed. But in the end I was successful. I’ve just completed two very basic test runs of the model. The first is what’s known as an “X” component set, in which all the major components (atmosphere, ocean, land, ice, etc) don’t actually do anything – this just tests that the framework code builds and runs. The second is an “A” compset at a low resolution, in which all the components are static data models (this ran for five days of simulation time in about 1.5 minutes). If I were going to validate the port properly, there’s a whole sequence of port validation tests that I ought to perform, for example to check that my runs are consistent with the benchmark runs, that I can stop and restart the model from the data files, that I get the same results in different model configurations, etc. And then eventually there are the scientific validation tests – checks that the simulated climate in my ported model is realistic.
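As an aside, here’s the flavour of one of those port validation checks, sketched in Python. This is my own toy, not the CESM test suite: the idea is simply that a run stopped and restarted partway through should reproduce the continuous run bit-for-bit, which you can check by hashing the output fields.

```python
import hashlib
import numpy as np

def field_digest(field):
    """Hash the raw bytes of a model output field, for bit-for-bit comparison."""
    return hashlib.sha256(np.ascontiguousarray(field).tobytes()).hexdigest()

def exact_restart_ok(continuous_run_output, restarted_run_output):
    """True only if the two runs produced identical output."""
    return field_digest(continuous_run_output) == field_digest(restarted_run_output)

# Stand-in output fields; in a real test these would be read from history files.
field = np.linspace(0.0, 1.0, 1000).reshape(10, 100)
print(exact_restart_ok(field, field.copy()))    # True: bit-for-bit identical
print(exact_restart_ok(field, field + 1e-12))   # False: even tiny differences fail
```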

But for now, I just want to reflect on the process of getting the model to build and run on a new (unsupported) platform. I’ll describe some of the issues I encountered, and then reflect on what I’ve learned about the model. First, some stats. The latest version of the model, CESM1.0, was released on June 25, 2010. It contains just under 1 million lines of code. Three quarters of this is Fortran (mainly Fortran 90), the rest is a mix of shell scripts (of several varieties), XML and HTML:

Lines of Code count for CESM v1.0 (not including build scripts), as calculated by cloc.sourceforge.net v1.51

In addition to the model itself, there are another 12,000 lines of perl and shell script that handle installing, configuring, building and running the model.

The main issues that tripped me up were:

  • The compiler. I decided to use the gnu compiler package (gfortran, included in the gcc package), because it’s free. But it’s not one of the compilers that’s supported for CESM, because in general CESM is used with commercial compilers (e.g. IBM’s) on the supercomputers. I grabbed the newest version of gcc that I could find a pre-built Mac binary for (v4.3.0), but it turned out not to be new enough – I spent a few hours diagnosing what turned out to be a (previously undiscovered?) bug in gfortran v4.3.0 that’s fixed in newer versions (I switched to v4.4.1). And then there’s a whole bunch of compiler flags (mainly to do with compatibility for certain architectures and language versions) that differ from those used with the commercial compilers, which I needed to track down.
  • Third party packages such as MPI (the message passing interface used for exchanging data between model components) and NetCDF (the data formatting standard used for geoscience data). It turns out that the Mac already has MPI installed, but without Fortran and Parallel IO support, so I had to rebuild it. And it took me a few rebuilds to get both these packages installed with all the right options.

Once I’d got these sorted, and figured out which compiler flags I needed, the build went pretty smoothly, and I’ve had no problems so far running it. Which leads me to draw a number of (tentative) conclusions about portability. First, CESM is a little unusual compared to most climate models, because it is intended as a community effort, and hence portability is a high priority. It has already been ported to around 30 different platforms, including a variety of IBM and Cray supercomputers, and various Linux clusters. Just the process of running the code through many different compilers shakes out not just portability issues, but good coding practices too, as different compilers tend to be picky about different language constructs.

Second, in the process of building the model, it’s quite easy to see that it consists of a number of distinct components, written by different communities, to different coding standards. Most obviously, CESM itself is built from five different component models (atmosphere, ocean, sea ice, land ice, land surface), along with a coupler that allows them to interact. There’s a tension between the needs of scientists who develop code just for a particular component model (run as a standalone model) versus scientists who want to use a fully coupled model. These communities overlap, but not completely, and coordinating the different needs takes considerable effort. Sometimes code that makes sense in a standalone module will break the coupling scheme.

But there’s another distinction that wasn’t obvious to me previously:

  • Scientific code – the bulk of the Fortran code in the component modules. This includes the core numerical routines, radiation schemes, physics parameterizations, and so on. This code is largely written by domain experts (scientists), for whom scientific validity is the over-riding concern (and hence they tend to under-estimate the importance of portability, readability, maintainability, etc).
  • Infrastructure code – including the coupler that allows the components to interact, the shared data handling routines, and a number of shared libraries. Most of this I could characterize as a modeling framework – it provides an overall architecture for a coupled model, and calls the scientific code as and when needed. This code is written jointly by the software engineering team and the various scientific groups.
  • Installation code – including configuration and build scripts. These are distinct from the model itself, but intended to provide flexibility to the community to handle a huge variety of target architectures and model configurations. These are written exclusively by the software engineering team (I think!), and tend to suffer from a serious time crunch: making this code clean and maintainable is difficult, given the need to get a complex and ever-changing model working in reasonable time.

In an earlier post, I described the rapid growth of complexity in earth system models as a major headache. This growth of complexity can be seen in all three types of software, but the complexity growth is compounded in the latter two: modeling frameworks need to support a growing diversity of earth system component models, which then leads to exponential growth in the number of possible model configurations that the build scripts have to deal with. Handling the growing complexity of the installation code is likely to be one of the biggest software challenges for the earth system modeling community in the next few years.

William Connolly has written a detailed critique of our paper “Engineering the Software for Understanding Climate Change”, which follows on from a very interesting discussion about “Amateurish Supercomputing Codes?” in his previous post. One of the issues raised in that discussion is the reward structure in scientific labs for software engineers versus scientists. The funding in such labs is pretty much all devoted to “doing science” which invariably means publishable climate science research. People who devote time and effort to improving the engineering of the model code might get a pat on the back, but inevitably it’s under-rewarded because it doesn’t lead directly to publishable science. The net result is that all the labs I’ve visited so far (UK Met Office, NCAR, MPI-M) have too few software engineers working on the model code.

Which brings up another point. Even if these labs decided to devote more budget to the software engineering effort (and it’s not clear how easy it would be to do this, without re-educating funding agencies), where will they recruit the necessary talent? They could try bringing in software professionals who don’t yet have the domain expertise in climate science, and see what happens. I can’t see this working out well on a large scale. The more I work with climate scientists, the more I appreciate how much domain expertise it takes to understand the science requirements, and to develop climate code. The potential culture clash is huge: software professionals (especially seasoned ones) tend to be very opinionated about “the right way to build software”, and insensitive to contextual factors that might make their previous experiences inapplicable. I envision lots of the requirements that scientists care about most (e.g. the scientific validity of the models) getting trampled on in the process of “fixing” the engineering processes. Right now the trade-off between getting the science right versus having beautifully engineered models is tipped firmly in favour of the former. Tipping it the other way might be a huge mistake for scientific progress, and very few people seem to understand how to get both right simultaneously.

The only realistic alternative is to invest in training scientists to become good software developers. Greg Wilson is pretty much the only person around who is covering this need, but his software carpentry course is desperately underfunded. We’re going to need a lot more like this to fix things…

The Muir Russell report came out today, and I just finished reading the thing. It should be no surprise to anyone paying attention that it completely demolishes the allegations that have been made about the supposed bad behaviour of the CRU research team. But overall, I’m extremely disappointed, because the report completely misses the wood for the trees. It devotes over 100 pages to a painstaking walk through every single allegation made against the CRU, assessing the evidence for each, and demolishing them one after another. The worst it can find to say about the CRU is that it hasn’t been out there in the lead over the last decade in responding to the new FoI laws, adapting to the rise of the blogosphere, and adapting to the changing culture of openness for scientific data. The report makes a number of recommendations for improvements in processes and practices at the CRU, and so can be taken as mildly critical, especially of CRU governance. But in so doing, it never really acknowledges the problems a small research unit (varying between 3.5 and 5 FTE staff over the last decade) would have in finding the resources and funding to be an early adopter in open data and public communication, while somehow managing to do cutting edge research in its area of expertise too. Sheesh!

But my biggest beef with the report is that nowhere, in 100 pages of report plus 60 pages of appendices, does it ever piece together the pattern represented by the set of allegations it investigates. Which means it achieves nothing more than being one more exoneration in a very long list of exonerations of climate scientists. It will do nothing to stop the flood of hostile attacks on science, because it never once considers the nature of those attacks. Let’s survey some of the missed opportunities…

I’m pleased to see the report cite some of the research literature on the nature of electronic communication (e.g. the early work of Sara Kiesler et al), but it’s a real pity they didn’t read much of this literature. One problem recognized even in early studies of email communication is the requesters/informers imbalance. Electronic communication makes it much easier for large numbers of people to offload information retrieval tasks onto others, and receivers of such requests find it hard to figure out which requests they are obliged to respond to. They end up being swamped. Which is exactly what happened with that (tiny) research unit in the UK, when a bunch of self-styled auditors went after them.

And similar imbalances pervade everything. For example on p42, we have:

“There continues to be a scientific debate about the reality, causes and uncertainties of climate change that is conducted through the conventional mechanisms of peer-reviewed publication of results, but this has been paralleled by a more vociferous, more polarised debate in the blogosphere and in popular books. In this the protagonists tend to be divided between those who believe that climate is changing and that human activities are contributing strongly to it, and those that are sceptical of this view. This strand of debate has been more passionate, more rhetorical, highly political and one in which each side frequently doubts the motives and impugns the honesty of the other, a conflict that has fuelled many of the views expressed in the released CRU emails, and one that has also been dramatically fuelled by them.” (page 42, para 26)

But the imbalance is clear. This highly rhetorical debate in the blogosphere occurs between, on the one hand, a group of climate scientists with many years’ training, and whose expertise is considerable (and the report does a good job of defending their expertise), and on the other hand, a bunch of amateurs, most of whom have no understanding of how science works, and who are unable to distinguish scientific arguments from ideology. And the failure to recognise this imbalance leads the report to conclude that a suitable remedy is to:

“…urge all scientists to learn to communicate their work in ways that the public can access and understand; and to be open in providing the information that will enable the debate, wherever it occurs, to be conducted objectively.” (page 42, para 28)

No, no, no. As I said very strongly earlier this year, this is naive and irresponsible. No scientist can be an effective communicator in a world where people with vested interests will do everything they can to destroy his or her reputation.

Chapter 6 of the report, on the land station temperature record, ought to shut Steve McIntyre up forever. But of course it won’t, because he’s not interested in truth, only in the dogged determination to find fault with climate scientists’ work no matter what. Here are some beautiful quotes:

“To carry out the analysis we obtained raw primary instrumental temperature station data. This can be obtained either directly from the appropriate National Meteorological Office (NMO) or by consulting the World Weather Records (WWR) …[web links elided] … Anyone working in this area would have knowledge of the availability of data from these sources.” (Page 46, paras 13-14)

“Any independent researcher may freely obtain the primary station data. It is impossible for a third party to withhold access to the data.” (Page 48, para 20).

…well, anyone, that is, except McIntyre and his followers, who continue to insist, despite all evidence to the contrary, that climate scientists are withholding station data.

And on sharing the code, the report is equally dismissive of the allegations:

“The computer code required to read and analyse the instrumental temperature data is straightforward to write based upon the published literature.  It amounts a few hundred lines of executable code (i.e. ignoring spaces and comments). Such code could be written by any research unit which is competent to reproduce or test the CRUTEM analysis.  For the trial analysis of the Review Team, the code was written in less than two days and produced results similar to other independent analyses. No information was required from CRU to do this.” (Page 51, para 33)

I like the “any research unit which is competent to reproduce or test the CRUTEM analysis” bit. A lovely British way of saying that the people making allegations about lack of openness are incompetent. And here’s another wonderful British understatement, referring to ongoing criticism of Briffa’s 1992 work:

“We find it unreasonable that this issue, pertaining to a publication in 1992, should continue to be misrepresented widely to imply some sort of wrongdoing or sloppy science.” (page 62, para 32)

Unreasonable? Unreasonable? It’s an outrage, an outrage I tell you!! (translation provided for those who don’t speak British English).

And there’s that failure to address the imbalance again. In examining the allegations from Boehmer-Christiansen, editor of the notoriously low-quality journal Energy and Environment, that the CRU researchers tried to interfere with the peer-review process, we get the following bits of evidence: an email sent by Boehmer-Christiansen to a variety of people with the subject line “Please take note of potetially [sic] serious scientific fraud by CRU and Met Office”, and Jones’ eventual reply to her head of department: “I don’t think there is anything more you can do. I have vented my frustration and have had a considered reply from you”, which leads to the finding:

“We see nothing in these exchanges or in Boehmer-Christiansen’s evidence that supports any allegation that CRU has directly and improperly attempted to influence the journal that she edits. Jones’ response to her accusation of scientific fraud was appropriate, measured and restrained.” (page 66, para 14).

Again, a missed opportunity to comment on the imbalance here. Boehmer-Christiansen is able to make wild and completely unfounded accusations of fraud, and nobody investigates her, while Jones’ reactions to the allegations are endlessly dissected, and in the end everything’s okay, because his response was “appropriate, measured and restrained”. No, that doesn’t make it okay. It means someone failed to ask some serious questions about how and why people like Boehmer-Christiansen can be allowed to get away with continual smearing of respected climate scientists.

So, an entire 160 pages, in which the imbalance is never once questioned – the imbalance between the behaviour that’s expected of climate scientists, and the crap that the denialists are allowed to get away with. Someone has to put a stop to their nonsense, but unfortunately, Muir Russell ducked the responsibility.

Postscript: my interest in software engineering issues makes me unable to let this one pass without comment. The final few pages of the report criticize the CRU for poor software development standards:

“We found that, in common with many other small units across a range of universities and disciplines, CRU saw software development as a necessary part of a researcher’s role, but not resourced in any professional sense.  Small pieces of software were written as required, with whatever level of skill the specific researcher happened to possess.  No formal standards were in place for: Software specification and implementation; Code reviews; and Software testing” (page 103, para 30).

I don’t dispute this – it is common across small units, and it ought to be fixed. However, it’s a real shame the report doesn’t address the lack of resources and funding for this. But wait. Scroll back a few pages…

“The computer code required to read and analyse the instrumental temperature data is straightforward to write […] It amounts a few hundred lines of executable code […]  For the trial analysis of the Review Team, the code was written in less than two days and produced results similar to other independent analyses.” (page 51, para 33)

Er, several hundred lines of code written in less than 2 days? What, with full software specification, code review, and good quality testing standards? I don’t think so. Ironic that the review team can criticize the CRU software practices, while taking the same approach themselves. Surely they must have spotted the irony?? But, apparently not. The hypocrisy that’s endemic across the software industry strikes again: everyone has strong opinions about what other groups ought to be doing, but nobody practices what they preach.

Gavin beat me to posting the best quote from the CCSM workshop last week – the Uncertainty Prayer. Uncertainty cropped up as a theme throughout the workshop. In discussions about the IPCC process, one issue came up several times: the likelihood that the spread of model projections in the next IPCC assessment will be larger than in AR4. The models are significantly more complex than they were five years ago, incorporating a broader set of earth system phenomena and resolving finer grain processes. The uncertainties in a more complex earth system model have a tendency to multiply, leading to a broader spread.

There is a big concern here about how to communicate this. Does this mean the science is going backwards – that we know less now than we did five years ago (imagine the sort of hay that some of the crazier parts of the blogosphere will make of that)? Well, there has been all sorts of progress in the past five years, much of it to do with understanding the uncertainties. And one result is the realization that the previous generations of models have under-represented uncertainty in the physical climate system – i.e. the previous projections for future climate change were more precise than they should have been. The implications are very serious for policymaking, not because there is any weaker case now for action, but precisely the opposite – the case for urgent action is stronger because the risks are worse, and good policy must be based on sound risk assessment. A bigger model spread means there’s now a bigger risk of more extreme climate responses to anthropogenic emissions. This problem was discussed at a fascinating session at the AGU meeting last year on validating model uncertainty (See: “How good are predictions from climate models?“).

At the CCSM meeting last week, Julia Slingo, chief scientist at the UK Met Office, put the problem of dealing with uncertainty into context, by reviewing the current state of the art in short and long term forecasting, in a fascinating talk “Uncertainty in Weather and Climate Prediction”.

She began with the work of Ed Lorenz. The Lorenz attractor is the prototype chaotic model. A chaotic system is not random, and the non-linear equations of a chaotic system demonstrate some very interesting behaviours. If it’s not random, then it must be predictable, but this predictability is flow dependent – where you are in the attractor will determine where you will go, but some starting points lead to a much more tightly constrained set of behaviours than others. Hence, the spread of possible outcomes depends on the initial state, and some states have more predictable outcomes than others.
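To make that concrete, here’s a toy sketch (mine, not from Julia’s talk) that integrates the Lorenz-63 equations for a small ensemble of slightly perturbed initial states. The growth of the ensemble spread depends strongly on where on the attractor you start – which is exactly the flow-dependent predictability described above.

```python
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One crude forward-Euler step of the Lorenz-63 equations."""
    x, y, z = state
    return state + dt * np.array([sigma * (y - x),
                                  x * (rho - z) - y,
                                  x * y - beta * z])

def final_spread(initial_state, n_members=20, n_steps=500, noise=1e-3, seed=0):
    """Perturb the initial state, integrate each member, return the final spread."""
    rng = np.random.default_rng(seed)
    members = initial_state + noise * rng.standard_normal((n_members, 3))
    for _ in range(n_steps):
        members = np.array([lorenz_step(m) for m in members])
    return members.std(axis=0).mean()   # average std-dev across x, y, z

# Two starting points on the attractor: the growth of the ensemble spread is
# very different in each case, i.e. predictability is flow dependent.
print(final_spread(np.array([1.0, 1.0, 20.0])))
print(final_spread(np.array([-8.0, 8.0, 27.0])))
```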

Why stochastic forecasting is better than deterministic forecasting

Much of the challenge in weather forecasting is to sample the initial condition uncertainty. Rather than using a single (deterministic) forecast run, modern weather forecasting makes use of ensemble forecasts, which probe the space of possible outcomes from a given (uncertain) initial state. This then allows the forecasters to assess possible outcomes, estimate risks and possibilities, and communicate risks to the users. Note the phrase “to allow the forecasters to…” – the role of experts in interpreting the forecasts and explaining the risks is vital.

As an example, Julia showed two temperature forecasts for London, using initial conditions for 26 June on two consecutive years, 1994 and 1995. The red curves show the individual members of an ensemble forecast. The ensemble spread is very different in each case, demonstrating that some initial conditions are more predictable than others: one has very high spread of model forecasts, and the other doesn’t (although note that in both cases the actual observations lie within the forecast spread):

Ensemble forecasts for two different initial states

The problem is that in ensemble forecasting, the root mean squared (rms) error of the ensemble mean often grows faster than the spread, which indicates that the forecast is under-dispersive; in other words, the models don’t capture enough of the internal variability in the system. In such cases, improving the models (by eliminating modeling errors) will lead to increased internal variability, and hence larger ensemble spread.
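Here’s a minimal sketch of that diagnostic, with synthetic numbers of my own invention: compute the RMS error of the ensemble mean and compare it with the mean ensemble spread; if the error is much larger than the spread, the ensemble is under-dispersive.

```python
import numpy as np

def dispersion_check(forecasts, observations):
    """forecasts: (n_cases, n_members); observations: (n_cases,).
    Returns the RMS error of the ensemble mean and the mean ensemble spread."""
    ens_mean = forecasts.mean(axis=1)
    rmse = np.sqrt(np.mean((ens_mean - observations) ** 2))
    spread = forecasts.std(axis=1, ddof=1).mean()
    return rmse, spread

# Synthetic illustration: every member shares a large common error (2.0) but
# members only differ from each other a little (0.5), so the ensemble is
# under-dispersive: the rmse comes out much larger than the spread.
rng = np.random.default_rng(1)
truth = rng.normal(15.0, 3.0, size=200)                 # verifying observations
forecasts = (truth[:, None]
             + rng.normal(0.0, 2.0, (200, 1))           # shared error
             + rng.normal(0.0, 0.5, (200, 12)))         # member-to-member spread
rmse, spread = dispersion_check(forecasts, truth)
print(f"rmse of ensemble mean: {rmse:.2f}, mean spread: {spread:.2f}")
```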

One response to this problem is the work on stochastic parameterizations. Essentially, this introduces noise into the model to simulate variability in the sub-grid processes. This can then reduce the systematic model error if it better captures the chaotic behaviour of the system. Julia mentioned three schemes that have been explored for doing this:

  • Random Parameters (RP), in which some of the tunable model parameters are varied randomly. This approach is not very convincing as it indicates we don’t really know what’s going on in the model.
  • Stochastic Convective Vorticity (SCV)
  • Stochastic Kinetic Energy Backscatter (SKEB)

The latter two approaches tackle known weaknesses in the models, at the boundaries between resolved physical processes and sub-scale parameterizations. There is plenty of evidence in recent years that there are upscale energy cascades from unresolved scales, and that parametrizations don’t capture this. For example, in the backscatter scheme, some fraction of dissipated energy is scattered upscale and acts as a forcing for the resolved-scale flow. By including this in the ensemble prediction system, the forecast is no longer under-dispersive.
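To give a feel for the general idea (this is my own toy, nothing like the real Met Office schemes), here’s a sketch loosely in the spirit of the Random Parameters approach: redraw an uncertain parameter within an expert-suggested range for each ensemble member, and inject a little noise to stand in for unresolved sub-grid variability, so that both contribute to the ensemble spread.

```python
import numpy as np

def toy_member(relaxation_rate, rng, n_steps=200, dt=0.1):
    """A trivially simple stand-in 'model': relax a temperature toward 20.0 at an
    uncertain rate, with injected noise standing in for sub-grid variability."""
    t = 10.0
    for _ in range(n_steps):
        t += dt * relaxation_rate * (20.0 - t) + 0.1 * rng.standard_normal()
    return t

rng = np.random.default_rng(2)
# Redraw the uncertain parameter within a plausible range for each member.
rates = rng.uniform(0.2, 0.8, size=30)
ensemble = np.array([toy_member(r, rng) for r in rates])
print(f"ensemble mean {ensemble.mean():.2f}, spread {ensemble.std():.2f}")
```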

The other major approach is to increase the resolution of the model. Higher resolution models will explicitly resolve more of the moist processes at sub-kilometer scales, and (presumably) remove this source of model error, although it’s not yet clear how successful this will be.

But what about seasonal forecasting – surely this growth of uncertainty prevents any kind of forecasting? People frequently ask “If we can’t predict weather beyond the next week, why is it possible to make seasonal forecasts?” The reason is that for longer term forecasts, the boundary forcings start to matter more. For example, if you add a boundary forcing to the Lorenz attractor, it changes the time in which the system stays in some part of the attractor, without changing the overall behaviour of the chaotic system. For a weak forcing, the frequency of occurrence of different regimes is changed, but the number and spatial patterns are unchanged. Under strong forcing, even the patterns of regimes are modified as the system goes through bifurcation points. So if we know something about the forcing, we can forecast the general statistics of weather, even if it’s not possible to say what the weather will be at a particular location at a particular time.
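Here’s a toy illustration of that point about weak forcing (again mine, not from the talk): add a small constant forcing to the x-equation of the Lorenz-63 system, and the trajectory spends a different fraction of its time in each lobe of the attractor, even though the attractor itself survives.

```python
def regime_occupancy(forcing=0.0, n_steps=50000, dt=0.005):
    """Fraction of time a Lorenz-63 trajectory spends in the x > 0 lobe,
    with an optional constant forcing added to the x-equation."""
    sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0
    x, y, z = 1.0, 1.0, 20.0
    count = 0
    for _ in range(n_steps):
        dx = sigma * (y - x) + forcing
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        count += x > 0
    return count / n_steps

print("unforced:", regime_occupancy(0.0))   # roughly half the time in each lobe
print("forced:  ", regime_occupancy(2.5))   # the forcing shifts the occupancy
```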

Of course, there’s still a communication problem: people feel weather, not the statistics of climate.

Building on the early work of Charney and Shukla (e.g. see their 1981 paper on monsoon predictability), seasonal to decadal prediction using coupled atmosphere-ocean systems does work, whereas 20 years ago, we would never have believed it. But again, we get the problem that some parts of the behaviour space are easier to predict than others. For example, the onset of El Niño is much harder to predict than the decay.

In a fully coupled system, systematic and model-specific errors grow much more strongly. Because the errors can grow quickly, and bias the probability distribution of outcomes, seasonal and decadal forecasts may not be reliable. So we assess the reliability of a given model using hindcasts. Every time you change the model, you have to redo the hindcasts to check reliability. This gives a reasonable sanity check for seasonal forecasting, but for decadal prediction it is challenging, as we have a very limited observational base.

And now, we have another problem: climate change is reducing the suitability of observations from the recent past to validate the models, even for seasonal prediction:

Climate Change shifts the climatology, so that models tuned to 20th century climate might no longer give good forecasts

Hence, a 40-year hindcast set might no longer be useful for validating future forecasts. As an example, the UK Met Office got into trouble for failing to predict the cold winter in the UK for 2009-2010. Re-analysis of the forecasts indicates why: Models that are calibrated on a 40-year hindcast gave only 20% probability of cold winter (and this was what was used for the seasonal forecast last year). However, models that are calibrated on just the past 20-years gave a 45% probability. Which indicates that the past 40 years might no longer be a good indicator of future seasonal weather. Climate change makes seasonal forecasting harder!

Today, the state-of-the-art for longer term forecasts is multi-model ensembles, but it’s not clear this is really the best approach – it just happens to be where we are today. Multi-model ensembles have a number of strengths: each model is extensively tested by its own community, and a large pool of alternative components provides some sampling across structural assumptions. But they are still an ensemble of opportunity – they do not systematically sample uncertainties. Also the set is rather small – e.g. 21 different models. So the sample is too small for determining the distribution of possible changes, and the ensembles are especially weak for predicting extreme events.

There has been a major effort on quantifying uncertainty over the last few years at the Hadley Centre, using a perturbed physics ensemble. This allows for a larger sample: 100s (or even 10,000s in climateprediction.net) of variants of the same model. The poorly constrained model parameters are systematically perturbed, within expert-suggested ranges. But this still doesn’t sample the structural uncertainty in the models, because all the variants are from a single base model. As an example of this work, the UKCP09 project was an attempt to move from uncertainty ranges (as in AR4) to a probability density function (pdf) for likely change. UKCP uses over 400 model projections to compute the pdf. There are many problems with UKCP09 (see the AGU discussion for a critique), but it was a step forward in understanding how to quantify uncertainty. [Note: Julia acknowledged weaknesses in both CP.net and the UKCP projects, but pointed out that they are mainly interesting as examples of how forecasting methodology is changing]
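For flavour, here’s a toy sketch of the perturbed physics idea – mine, with made-up parameters, and nothing to do with the actual UKCP09 methodology: sample a couple of poorly constrained parameters within expert-suggested ranges, run each variant of a (trivially simple) stand-in model, and summarize the responses as a distribution rather than a single number.

```python
import numpy as np

rng = np.random.default_rng(4)
n_variants = 400

# Two hypothetical poorly-constrained parameters, sampled within expert ranges.
entrainment = rng.uniform(0.5, 2.0, n_variants)
ice_albedo = rng.uniform(0.55, 0.75, n_variants)

def toy_response(entrainment, ice_albedo):
    """Stand-in for a model run: a 'warming response' that depends nonlinearly
    on the perturbed parameters, plus a little internal variability."""
    return 2.0 + 0.8 / entrainment - 3.0 * (ice_albedo - 0.65) + rng.normal(0.0, 0.1)

responses = np.array([toy_response(e, a) for e, a in zip(entrainment, ice_albedo)])
lo, med, hi = np.percentile(responses, [5, 50, 95])
print(f"median {med:.2f} K, 5-95% range {lo:.2f} to {hi:.2f} K")
```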

Another approach is to show which factors tend to dominate the uncertainty. For example, a pie chart showing the impact of different sources of uncertainty (model weaknesses, carbon cycle, natural variability, downscaling uncertainty) on the forecast for rainfall in the 2020s vs 2080s is interesting – for the 2020s, the uncertainty about the carbon cycle is a relatively small factor, whereas for the 2080s it’s a much bigger factor.
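The arithmetic behind that kind of breakdown is easy to sketch if you can sample each source of uncertainty separately. Here’s a toy version (mine, with invented numbers) that attributes fractions of the total variance to each source for a near-term and a long-term forecast:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000

def variance_fractions(model_sd, carbon_sd, natural_sd):
    """Sample each source of uncertainty independently and return the fraction
    of the total variance attributable to each."""
    samples = {
        "model": rng.normal(0, model_sd, n),
        "carbon cycle": rng.normal(0, carbon_sd, n),
        "natural variability": rng.normal(0, natural_sd, n),
    }
    total = sum(s.var() for s in samples.values())
    return {name: round(s.var() / total, 2) for name, s in samples.items()}

# Made-up spreads: carbon-cycle uncertainty grows with lead time.
print("2020s:", variance_fractions(model_sd=1.0, carbon_sd=0.3, natural_sd=0.8))
print("2080s:", variance_fractions(model_sd=1.5, carbon_sd=1.2, natural_sd=0.8))
```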

Julia suggests it’s time for a coordinated study of the effects of model resolution on uncertainty. Every modeling group is looking at this, but they are not doing standardized experiments, so comparisons are hard.

Here is an example from Tim Palmer. In AR4, WG1 chapter 11 gave an assessment of regional patterns of change in precipitation. For some regions, it was impossible to give a prediction (the white areas), whereas for others, the models appear to give highly confident predictions. But the confidence might be misplaced because many of the models have known weaknesses that are relevant to future precipitation. For example, the models don’t simulate persistent blocking anticyclones very well. Which means that it’s wrong to assume that if most models agree, we can be confident in the prediction. For example, the Athena experiments with very high resolution models (T1259) showed much better blocking behaviour against the observational dataset ERA40. This implies we need to be more careful about selecting models for a multi-model ensemble for certain types of forecast.

The real butterfly effect raises some fundamental unanswered questions about convergence of climate simulations with increasing resolution. Maybe there is an irreducible level of uncertainty in climate change. And if so, what is it? How much will increased resolution reduce the uncertainty? Will things be much better when we can resolve processes at 20km, 2km, or even 0.2km, compared to say 200km? Once we reach a certain resolution (e.g. 20km) is it just as good to represent small scale motions using stochastic equations? And what’s the most effective way to use the available computing resources as we increase the resolution? [There’s an obvious trade-off between increasing the size of the ensemble, and increasing the resolution of individual ensemble members]

Julia’s main conclusion is that Lorenz’ theory of chaotic systems now pervades all aspects of weather and climate prediction. Estimating and reducing uncertainty requires better multi-scale physics, higher resolution models, and more complete observations.

Some of the questions after the talk probed these issues a little more. For example, Julia was asked how to handle policymakers demanding better decadal prediction, when we’re not ready to deliver it. Her response was that she believes higher resolution modeling will help, but that we haven’t proved this yet, so we have to manage expectations very carefully. She was also asked about the criteria to use for including different models in an ensemble – e.g. should we exclude models that don’t conserve physical quantities, that don’t do blocking, etc? For UKCP09, the criteria were global in nature, but this isn’t sufficient – we need criteria that test for skill with specific phenomena such as El Nino. Because the inclusion criteria aren’t clear enough yet, the UKCP project couldn’t give advice on wind in the projections. In the long run, the focus should be on building the best model we can, rather than putting effort into exploring perturbed physics, but we have to balance the needs of users for better probabilistic predictions against the need to get on and develop better physics in the models.

Finally, on the question of interpretation, Julia was asked what if users (of the forecasts) can’t understand or process probabilistic forecasts? Julia pointed out that some users can process probabilistic forecasts, and indeed that’s exactly what they need. For example, the insurance industry. Others use it as input for risk assessment – e.g. water utilities. So we do have to distinguish the needs of different types of users.

The IPCC schedule impacts nearly all aspects of climate science. At the start of this week’s CCSM workshop, Thomas Stocker of the University of Bern, co-chair of working group 1 of the IPCC, gave an overview of the road toward the fifth assessment report (AR5), due to be released in 2013.

First, Thomas reminded us that the IPCC does not perform science (its job is to assess the current state of the science), but increasingly it stimulates science. This causes some tension though, as curiosity-driven research must remain the priority for the scientific community.

The highly politicized environment also poses a huge risk. There are some groups actively seeking to discredit climate science and damage the IPCC, which means that the rigor of the IPCC procedures is now particularly important. One important lesson from the last year is that there is no procedure for correcting serious errors in the assessment reports. Minor errors are routine, and are handled by releasing errata. But this process broke down for bigger issues such as the Himalayan glacier error.

Despite the critics, climate science is about as transparent as a scientific field can be. Anyone can download a climate model and see what’s in there. The IPCC process is founded on four key values (thanks to the advocacy of Susan Solomon): Rigor, Robustness, Transparency, and Comprehensiveness. However, there are clearly practical limits to transparency. For example, it’s not possible to open up lead author meetings, because the scientists need to be able to work together in a constructive atmosphere, rather than “having miscellaneous bloggers in the room”!

The structure of the IPCC remains the same: three working groups – WG1 on the physical science basis, WG2 on impacts and adaptation, and WG3 on mitigation – along with a task force on GHG inventories.

The most important principles for the IPCC are in articles 2 and 3:

2. “The role of the IPCC is to assess on a comprehensive, objective, open and transparent basis the scientific, technical and socio-economic information relevant to understanding the scientific basis of risk of human-induced climate change, its potential impacts and options for adaptation and mitigation. IPCC reports should be neutral with respect to policy, although they may need to deal objectively with scientific, technical and socio-economic factors relevant to the application of particular policies.”

3. “Review is an essential part of the IPCC process. Since the IPCC is an intergovernmental body, review of IPCC documents should involve both peer review by experts and review by governments.”

A series of meetings have already occurred in preparation for AR5:

  • Mar 2009: An expert meeting on the science of alternative greenhouse gas metrics, which produced a report.
  • Sept 2009: An expert meeting on detection and attribution, which produced a report and a good practice guidance paper [which itself is a great introduction to how attribution studies are done].
  • Jan 2010: An expert meeting at NCAR on assessing and combining multi-model projections. The report from this meeting is due in a few weeks, and will also include a good practice guide.
  • Jun 2010: A workshop on sea level rise and ice sheet instability, which was needed because of the widespread recognition that AR4 was weak on this issue, perhaps too cautious.
  • And in a couple of weeks, in July 2010, a workshop on consistent treatment of uncertainties and risks. This is a cross-Working Group meeting, at which they hope to make progress on getting all three working groups to use the same approach. In AR4, WG1 developed a standardized language for describing uncertainty, but the other working groups have not yet adopted it.

Thomas then identified some important emerging questions leading up to AR5.

  1. Trends and rates of observed climate change, and in particular, the question of whether climate change has accelerated. Many recent papers and reports indicate that it has; the IPCC needs to figure out how to assess this, especially as there are mixed signals. For example, the decadal trend is accelerating in Arctic sea ice extent, but the global temperature anomaly has not accelerated over this time period.
  2. Stability of the Western and Eastern Antarctic ice sheets (WAIS and EAIS). There has been much more dynamic change at the margins of these ice sheets, accelerating mass loss, as observed by GRACE. The assessment needs to look into whether these really are accelerating trends, or if it’s just an artefact of the limited duration of measurements.
  3. Irreversibilities and abrupt change: how robust and accurate is our understanding? For example, what long-term commitments have already been made in sea level rise? And what about commitments in the hydrological cycle, where some regions (Africa, Europe) might go beyond the range of observed drought within the next couple of decades, and this may be unavoidable.
  4. Clouds and Aerosols, which will have their own entire chapter in AR5. There are still big uncertainties here. For example, low level clouds are a positive feedback in the north-east Pacific, yet all but one model fail to simulate this.
  5. Carbon and other biogeochemical cycles. New ice core reconstructions were published just after AR4, and give us more insights into regional carbon cycle footprints caused by abrupt climate change in the past. For example, the ice cores show clear changes in soil moisture and total carbon stored in the Amazon region.
  6. Near-term and long-term projections, for example the question of how reliable the decadal projections are. This is a difficult area. Some people say we already have seamless prediction (from decades to centuries), but Thomas is not yet convinced. For example, there are alarming new results on the number of extreme hot days across southern Europe that need to be assessed – these appear to challenge assumptions about the decadal trends.
  7. Regional issues – e.g. frequency and severity of impacts. Traditionally, the IPCC reports have taken an encyclopedic approach: take each region, and list the impacts in each. Instead, for AR5, the plan is to start with the physical processes, and then say something about the sensitivity within each region to these processes.

Here’s an overview of the planned structure of the AR5 WG1 report:

  • Intro
  • 4 chps on observations and paleoclimate
  • 2 chps on process understanding (biogeochemistry and clouds/aerosols)
  • 3 chps from forcing to attributions
  • 2 chps on future climate change and predictability (near term and long term)
  • 2 integration chapters (one on sea level rise, and one on regional issues)

Some changes are evident from AR4. Observations have become more important: they grew to three chapters in AR4, and will stay at three in AR5. There will be another crack at paleoclimate, and new chapters on: sea level rise (a serious omission in AR4); clouds and aerosols; the carbon cycle; and regional change. There is also a proposal to produce an atlas which will include a series of maps summarizing the regional issues.

The final draft of the WG1 report is due in May 2013, with a final plenary in Sept 2013. WG2 will finish in March 2014, and WG3 in April 2014. Finally, the IPCC Synthesis Report is to be done no later than 12 months after the WG1 report, i.e. by September 2014. There has been pressure to create a process that incorporates new science throughout 2014 into the synthesis report; however, Thomas has successfully opposed this, on the basis that it will cause far more controversy if the synthesis report is not consistent with the WG reports.

The deadlines for published research to be included in the assessment are as follows: papers need to be submitted for publication by 31 July 2012, and must be in press by 15 March 2013. The IPCC has to be very strict about this, because there are people out there who have nothing better to do than to wade through all the references in AR4 and check that all of them appeared before the cutoff date.

Of course, these dates are very relevant to the CCSM workshop audience. Thomas urged everyone not to leave this to the last minute; journal editors and reviewers will be swamped if everyone tries to get their papers published just prior to the deadline [although I suspect this is inevitable?].

Finally, there is a significant communication challenge coming up. For AR5 we’re expecting to see a much broader model diversity than in previous assessments, partly because there are more models (and more variants), and partly because the models now include a broader range of earth system processes. This will almost certainly mean a bigger model spread, and hence a likely increase in uncertainty. It will be a significant challenge to communicate the reasons for this to policymakers and a lay audience. Thomas argues that we must not be ashamed to present how science works – that in some cases the uncertainties multiply and the spread of projections grows, and then, as the models become better constrained by observations, they converge again. But this also poses problems in how we do model elimination and model weighting in ensemble projections. For example, if a particular model shows no sea ice in the year 2000, it probably should be excluded as this is clearly wrong. But how do we set clear criteria for this?

I’ve speculated before about the factors that determine the length of the release cycle for climate models. The IPCC assessment process, which operates on a 5-year cycle, tends to dominate everything. But there are clearly other rhythms that matter too. I had speculated that the 6-year gap between the release of CCSM3 and CCSM4 could largely be explained by the demands of the IPCC cycle; however the NCAR folks might have blown holes in that idea by making three new releases in the last six months; clearly other temporal cycles are at play.

In discussion over lunch yesterday, Archer pointed me to the paper “Exploring Collaborative Rhythm: Temporal Flow and Alignment in Collaborative Scientific Work” by Steven Jackson and co, who point out that while the role of space and proximity has been widely studied in collaborative work, the role of time and patterns of temporal constraints has not. They set out four different kinds of temporal rhythm that are relevant to scientific work:

  • phenomenal rhythms, arising from the objects of study – e.g. annual and seasonal cycles strongly affect when fieldwork can be done in biology/ecology; the development of a disease in an individual patient affects the flow of medical research;
  • institutional rhythms, such as the academic calendar, funding deadlines, the timing of conferences and paper deadlines, etc.
  • biographical rhythms, arising from individual needs – family time, career development milestones, illnesses and vacations, etc.
  • infrastructural rhythms, arising from the development of the buildings and equipment that scientific research depends on. Examples include the launch, operation and expected life of a scientific instrument on a satellite, the timing of software releases, and the development of classification systems and standards.

The paper gives two interesting examples of problems in aligning these rhythms. First, the study of long-term phenomena such as river flow on short-term research grants led to mistakes: data collected during an unusually wet period in the early 20th century led to serious deficiencies in water management plans for the Colorado river. Second, for NASA’s Mars mission MER, the decision was taken to put the support team on “Mars time”, as the Martian day is 2.7% longer than the earth day. But as the team’s daily work cycle drifted from the normal earth day, serious tensions arose between the family and social needs of the project team and the demands of the project rhythm.

Here’s another example that fascinated me when I was at the NASA software verification lab in the 90s. The Cassini spacecraft took about six years to get to Saturn. Rather than develop all the mission software prior to launch, NASA took the decision to develop only the minimal software needed for launch and navigation, and delayed the start of development of the mission software until just prior to arrival at Saturn. The rationale was that they didn’t want a six year gap between development and use of this software, during which time the software teams might disperse – they needed the teams in place, with recent familiarity with the code, at the point the main science missions started.

For climate science, the IPCC process is clearly a major institutional rhythm, but the infrastructural rhythms that arise in model development interact with this in complex ways. I need to spend time looking at the other rhythms as well.

Of all the global climate models, the Community Earth System Model, CESM, seems to come closest to the way an open source community works. The annual CESM workshop, this week in Breckenridge, Colorado, provides an example of how the community works. There are about 350 people attending, and much of the meeting is devoted to detailed discussion of the science and modeling issues across a set of working groups: Atmosphere model, Paleoclimate, Polar Climate, Ocean model, Chemistry-climate, Land model, Biogeochemistry, Climate Variability, Land Ice, Climate Change, Software Engineering, and Whole Atmosphere.

In the opening plenary on Monday, Mariana Vertenstein (who is hosting my visit to NCAR this month) was awarded the 2010 CESM distinguished achievement award for her role in overseeing the software engineering of the CESM. This is interesting for a number of reasons, not least because it demonstrates how much the CESM community values the role of the software engineering team, and the advances that the software engineering working group has made in improving the software infrastructure over the last few years.

Earth system models are generally developed in a manner that’s very much like agile development. Getting the science working in the model is prioritized, with issues such as code structure, maintainability and portability worked in later, as needed. To some extent, this is appropriate – getting the science right is the most important thing, and it’s not clear how much a big upfront design effort would pay off, especially in the early stages of model development, when it’s not clear whether the model will become anything more than an interesting research idea. The downside of this strategy is that as the model grows in sophistication, the software architecture ends up being a mess. As Mariana explained in her talk, coupled models like the CESM have reached a point in their development where this approach no longer works. In effect, a massive refactoring effort is needed to clean up the software infrastructure to permit future maintainability.

Mariana’s talk was entitled “Better science through better software”. She identified a number of major challenges facing the current generation of earth system models, and described some of the changes in the software infrastructure that have been put in place for the CESM to address them.

The challenges are:

1) New system complexity, as new physics, and new grids are incorporated into the models. For example, the CESM now has a new land ice model, which along with the atmosphere, ocean, land surface, and sea ice components brings the total to five distinct geophysical component models, each operating on different grids, and each with its own community of users. These component models exchange boundary information via the coupler, and the entire coupled model now runs to about 1.2 million lines of code (compare with the previous generation model, CCSM3, now six years old, which had about 330KLoC).

The increasing number of component models increases the complexity of the coupler. It now has to handle regridding (where data such as energy and mass is exchanged between component models with different grids), data merging, atmosphere-ocean fluxes, and conservation diagnostics (e.g. to ensure the entire model conserves energy and mass). Note: Older versions of the model were restricted, for example with the atmosphere, ocean and land surface schemes all required to use the same grid.
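To give a flavour of what a conservation diagnostic involves, here’s a toy sketch – mine, not the CESM coupler code: after mapping a flux field from one grid to another, the area-weighted global integral should be essentially unchanged.

```python
import numpy as np

def global_integral(field, cell_areas):
    """Area-weighted integral of a 2-D field (e.g. W/m^2 times m^2 gives W)."""
    return np.sum(field * cell_areas)

def check_conservation(src_field, src_areas, dst_field, dst_areas, tol=1e-10):
    src_total = global_integral(src_field, src_areas)
    dst_total = global_integral(dst_field, dst_areas)
    rel_err = abs(dst_total - src_total) / abs(src_total)
    return rel_err < tol, rel_err

# Toy example: a 'remap' that just averages 2x2 blocks onto a coarser grid.
rng = np.random.default_rng(5)
fine = rng.uniform(100, 400, (180, 360))          # toy flux field
fine_area = np.ones_like(fine)                    # pretend equal-area cells
coarse = fine.reshape(90, 2, 180, 2).mean(axis=(1, 3))
coarse_area = np.full(coarse.shape, 4.0)          # each coarse cell = 4 fine cells
print(check_conservation(fine, fine_area, coarse, coarse_area))
```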

Users also want to be able to swap in different versions of each major component. For example, a particular run might demand a fully prognostic atmosphere model, coupled with a prescribed ocean parameterization (taken from observational data, for example). Then, within each major component, users might want different configurations: multiple dynamic cores, multiple chemistry modes, etc.

Another source of complexity comes from resolutions. Model components now run over a much wider range of resolutions, and the re-gridding challenges are substantial. And finally, whereas the old model used rectangular latitude-longitude grids, now people want to accommodate many different types of grid.

2) Ultra-high resolution. The trend towards higher resolution grids poses serious challenges for scalability, especially given the massive increase in volume of data being handled. All components (and the coupler) need to be scalable in terms of both memory and performance.

Higher resolution increases the need for more parallelism, and there has been tremendous progress on this in the last few years. A few years back, as part of the DOE/LLNL grand challenge, CCSM3 managed 0.5 simulation years per day, running on 4,000 cores, and this was considered a great achievement. This year, the new version of CESM has successfully run on 80,000 cores, to give 3 simyears per day in a very high resolution model: 0.125° grid for the atmosphere, 0.25° for the land and 0.1° for the ocean.

Interestingly, in these highly parallel configurations, the ocean model, POP, is no longer dominant for processing time; the sea ice and atmosphere models start to dominate because the two of them are coupled sequentially. Hence the ocean model scales more readily.

3) Data assimilation. For weather forecasting models, this has long been standard analysis practice. Briefly, the model state and the observational data are combined at each timestep to give a detailed analysis of the current state of the system, which helps to overcome limitations in both the model and the data, and to better understand the physical processes underlying the observational data. It’s also useful in forecasting, as it allows you to arrive at a more accurate initial state for a forecast run.
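The core idea is easiest to see for a single scalar variable. Here’s a minimal sketch (mine, a long way from a real assimilation system): blend the model’s background estimate with an observation, weighting each by its error variance.

```python
def assimilate(background, obs, var_background, var_obs):
    """Optimal-interpolation style update for a single scalar state variable."""
    gain = var_background / (var_background + var_obs)      # Kalman-style gain
    analysis = background + gain * (obs - background)
    var_analysis = (1.0 - gain) * var_background
    return analysis, var_analysis

# The model says 14.0 +/- 2.0; the observation says 15.5 +/- 1.0 (std devs).
analysis, var_a = assimilate(14.0, 15.5, 2.0 ** 2, 1.0 ** 2)
print(f"analysis = {analysis:.2f}, std dev = {var_a ** 0.5:.2f}")
```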

In climate modeling, data assimilation is a relatively new capability. The current version of the CESM can do data assimilation in both the atmosphere and ocean. The new framework also supports experiments where multiple versions of the same component are used within a run. For example, the model might have multiple atmosphere components in a single simulation, each coupled with its own instance of the ocean, where one is an assimilation module and the other a prognostic model.

4) The needs of the user community. Supporting a broad community of model users adds complexity, especially as the community becomes more diverse. The community needs more frequent releases of the model (e.g. more often than every six years!), and people need to be able to merge new releases more easily into their own sandboxes.

These challenges have inspired a number of software infrastructure improvements in the CESM. Mariana described a number of advances.

The old model, CCSM3, was run as multiple executables, one for each major component, exchanging data with a coupler via MPI. And each component used to have its own way of doing coupling. But this kills efficiency – processors end up idling when a component has to wait on data from the others. It’s also very hard in this scheme to understand the time evolution as the model runs, which then also makes it very hard to debug. And the old approach was notoriously hard to port to different platforms.

The new framework has a top level driver that controls time evolution, with all coupling done at the top level. Then the component models can be laid out across the available processors, either all in parallel, or in a hybrid parallel-sequential mode. For example, atmosphere, land scheme and sea ice modules might be called in sequence, with the ocean model running in parallel with the whole set. The chosen architecture is specified in a single XML file. This brings a number of benefits:

  • Better flexibility for very different platforms;
  • Facilitates model configurations with huge amounts of parallelism across a very large number of processors;
  • Allows the coupler & components to be ESMF compliant, so the model can couple with other ESMF compliant models;
  • Integrated release cycle – it’s now all one model, whereas in the past each component model had its own separate releases.
  • Much easier to debug, as it’s easier to follow the time evolution.

The new infrastructure also includes scripting tools that support the process of setting up an experiment, and making sure it runs with optimal performance on a particular platform. For example, the current release includes scripts to create a wide variety of out-of-the-box experiments. It also includes a load balancing tool, to check how much time each component is idle during a run, and new scripts with hints for porting to new platforms, based on a set of generic machine templates.
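The arithmetic behind that load balancing is easy to sketch. This is my own toy, not the actual CESM tool: with the atmosphere, land and sea ice running in sequence on one group of processors and the ocean running concurrently on another, the coupled step takes as long as the slower group, and the difference is idle time.

```python
def coupled_step_cost(seq_costs, ocn_cost):
    """seq_costs: per-step wall-clock seconds for the sequentially-run components;
    ocn_cost: per-step cost of the ocean running concurrently on its own processors."""
    seq_total = sum(seq_costs.values())
    step = max(seq_total, ocn_cost)
    idle = {"ocean group": step - ocn_cost, "atm/lnd/ice group": step - seq_total}
    return step, idle

# Hypothetical per-step costs in seconds.
step, idle = coupled_step_cost({"atm": 6.0, "lnd": 1.0, "ice": 2.0}, ocn_cost=7.5)
print(f"coupled step: {step}s, idle time per group: {idle}")
```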

The model also has a new parallel I/O library (PIO), which adds a layer of abstraction between the data structures used in each model component and the arrangement of the data when written to disk.

The new versions of the model are now being released via the subversion repository (rather than a .tar file, as used in the past). Hence, users can use an svn merge to get the latest release. There have been three model releases since January:

  • CCSM Alpha, released in January 2010;
  • CCSM 4.0 full release, in April 2010;
  • CESM 1.0 released June 2010.

Mariana ended her talk with a summary of the future work – complete the CMIP5 runs for the next round of the IPCC assessment process; regional refinement with scalable grids; extend the data assimilation capability; handle super-parameterization (e.g. include cloud resolving models); add hooks for human dimensions within the models (e.g. to support the DOE program on integrated assessment); and improved validation metrics.

Note: the CESM is the successor to CCSM – the community climate system model. The name change recognises the wider set of earth systems now incorporated into the model.