The meteorologist Norman Phillips died last week, at the grand old age of 95. As I’ve written about his work in my forthcoming book, Computing the Climate, I’ve extracted this piece from the manuscript to honour his contribution to climate modelling. Not only did he create the first ever global circulation model, but the ideas in his model sparked off a revolution in how we use computers to model the climate system. We join the story shortly after the success of the first numerical forecast model, developed by Jule Charney and his team of meteorologists at Princeton in the late 1940s. Among the team was a young Norman Phillips…

By 1950, a team of meteorologists led by Jule Charney at Princeton’s Institute for Advanced Study (IAS) had turned the equations of motion into a program that could compute the weather. Flushed with the success of a trial run of their forecast model on ENIAC in March 1950, they were keen to figure out how to extend the range of their forecasts. Within a couple of years, they had produced some reasonably good forecasts for 24 hours, and sometimes even 36 hours, although in the early 1950s, they couldn’t yet do this consistently. For better forecasts, they would need better models and better data.

Because of limited computing power, and limited observational data, their early models were designed to cover only a part of the globe: the region over North America. This meant they were simulating an “open” system. In the real world, the part of the atmosphere included in the model interacts with parts outside the model, exchanging mass and energy freely. If a storm system from elsewhere moved into the region, the model could not simulate this, as it had no information on what was happening beyond its boundaries.

In his initial models, Charney had ignored this problem, and treated the boundary conditions as fixed. He added an extra strip of grid points at each edge of the model’s main grid, where conditions were treated as constant. When the simulation calculated the next state of the atmosphere for each point within the grid, these edge points just kept their initial values. This simplification imposed a major limitation on the accuracy of the weather forecasts. As the simulation proceeded, the values at these edge points would become less and less like the real conditions, and these errors would propagate inwards, across the grid. To get longer forecasts—say for weeks, instead of days—a better solution was needed. For long-range forecasting, the computer would need to think outside the box.

The obvious way to do this was to extend the grid to cover the entire globe, making it a “closed” system. This would leave only two, simpler boundaries. At the top of the atmosphere, energy arrives from the sun, and is lost back to space. But no air mass crosses this boundary, which means there are no significant boundary disturbances. At the bottom, where the atmosphere meets the surface of the planet, things are more complicated, as both heat and moisture cross the boundary, with water evaporating from the land and oceans, and eventually being returned as rain and snow. But this effect is small compared to movements within the atmosphere, so it could be ignored, at least for the coarse-grained models of the 1950s. Later models would incorporate this exchange between surface and atmosphere directly in the simulation.

Among the group at Princeton, Norman Phillips was the first to create a working global model. Because the available computer power was still relatively tiny, extending the grid for an existing forecast model wasn’t feasible. Instead, Phillips took a different approach. He removed so many of the features of the real planet that the model barely resembled the earth at all.

To simplify things, he treated the surface of the earth as smooth and featureless. He used a 17×16 grid, not unlike the original ENIAC model, but connected the cells on the eastern edge with the cells on the western edge, so that instead of having fixed boundaries to the east and the west, the grid wrapped around, as though it were a cylindrical planet [1]. At the north and south edges of the grid, the model behaved as if there were solid walls: movement of the atmosphere against the wall would be reflected back again. This overall shape simplified things: by connecting the east and west edges, the model could simulate airflows that circulate all the way around the planet, but Phillips didn’t have to figure out the complex geometry where grid cells converge at the poles.
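To make the wrap-around idea concrete, here is a minimal sketch in Python of how neighbouring cells might be looked up on such a grid. It is not Phillips’ code: the function name, the choice of which dimension runs north-south, and the way the walls are handled (a cell on the wall simply has itself as its outward neighbour) are all assumptions made for illustration.

```python
# Neighbour lookup on a grid that wraps east-west and has walls north-south.
# The 17 x 16 size comes from the text; which dimension runs north-south
# is an assumption made for this sketch.
N_ROWS = 17   # south (row 0) to north (row 16), assumed
N_COLS = 16   # west (col 0) to east (col 15), assumed


def neighbours(row, col):
    """Return the four neighbours of a cell as (row, col) pairs."""
    # East-west: modulo arithmetic joins the eastern edge to the western edge.
    east = (row, (col + 1) % N_COLS)
    west = (row, (col - 1) % N_COLS)
    # North-south: nothing lies beyond the wall, so clamp to the edge row.
    north = (min(row + 1, N_ROWS - 1), col)
    south = (max(row - 1, 0), col)
    return {"east": east, "west": west, "north": north, "south": south}


# A cell on the eastern edge has a cell on the western edge as its neighbour.
print(neighbours(0, 15)["east"])   # (0, 0)
```

The modulo operator is what turns a flat rectangle into a cylinder: there is no special case for the eastern or western edge, which is exactly why this shape was cheaper than handling the poles.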

The dimensions of this simulated cylindrical planet were similar to those of Charney’s original weather model, as it used the same equations. Phillips’ grid points were 375km apart in the east-west direction and 625km apart in the north-south. This gave a virtual planet whose circumference was less than 1/6th of the circumference of the earth, but whose height was almost the same as the height of the earth from pole to pole. A tall, thin, cylindrical earth.

To simplify things even more, Phillips’ cylindrical model represented only one hemisphere of earth. He included a heating effect at the southern end of the grid, to represent the equator receiving the most energy from the sun, and a cooling effect at the northern end of the model, to represent the arctic cooling as it loses heat to space [2]. The atmosphere was represented as two layers of air, and each layer was a version of Charney’s original one-layer model. The grid therefore had 17×16×2 cells in total, and it ran on a machine with 5Kbytes of RAM and 10Kbytes of magnetic drum memory. The choice of this grid was not an accident: the internal memory of the IAS machine could store 1,024 numbers (it had 1,024 words, each 40 bits long). Phillips’ choice of grid meant a single state of the global atmosphere could be represented with about 500 variables [3], thus taking up just under half of the machine’s memory, leaving the other half available for calculating the next state.
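The memory arithmetic is easy to check. The short Python sketch below uses only the figures quoted in the text and in note [3]; the one assumption is that each variable occupies exactly one 40-bit word.

```python
# Memory arithmetic for the IAS machine, using figures quoted in the text.
WORDS = 1024          # words of internal memory
BITS_PER_WORD = 40    # each word is 40 bits long

total_bytes = WORDS * BITS_PER_WORD // 8       # 5120 bytes, i.e. 5 Kbytes

# Note [3]: 17 x 15 distinct grid points per level, and two levels.
points_per_level = 17 * 15                     # 255
state_variables = points_per_level * 2         # 510, "about 500"

# Assumption: one variable is stored per word.
fraction_of_memory = state_variables / WORDS   # just under one half

print(total_bytes, points_per_level, state_variables, round(fraction_of_memory, 3))
# 5120 255 510 0.498
```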

To initialize the model, Phillips decided not to bother with observational data at all. That would have been hard anyway, as the geometry of the model didn’t resemble planet earth. Instead, he started with a uniform atmosphere at rest. In other words, every grid point started with the same values, as though there was no wind anywhere. Starting a simulation model with the atmosphere at rest and hoping the equations would start to generate realistic weather patterns was a bold, and perhaps crazy idea.

It was also the ultimate test of the equations in the model: if they could get the virtual atmosphere moving in a realistic way, it would mean nothing important had been left out. Today, we call this a spin-up run. The ocean and atmosphere components of today’s global climate models are regularly started in this way. Spin-up runs for today’s models are expensive though, because they require a lot of time on the supercomputer, and until the model settles into a stable pattern the simulation results are unusable. Oceans in particular have tremendous inertia, so modern ocean models can take hundreds of years of simulation time to produce stable and realistic ocean currents, which typically requires many weeks to run on a supercomputer. Therefore, the spin-up is typically run just once, and the state at the end of this spin-up is used as a start state for all the science experiments to be run on the model.
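The logic of a spin-up can be shown with a toy example. The Python sketch below is not a climate model: it relaxes a single number from rest towards a fixed target and stops once the change per step falls below a tolerance. The response times (0.1 years for a fast, atmosphere-like component and 100 years for a slow, ocean-like one), the tolerance, and the function name are all hypothetical values chosen for illustration.

```python
def spin_up(tau_years, target=1.0, dt=0.01, tol=1e-6, max_steps=10_000_000):
    """Relax a state from rest towards `target`.

    Stops when the change in one step drops below `tol`, and returns
    (simulated_years, final_state). `tau_years` is the response time.
    """
    state = 0.0                      # start "at rest"
    for step in range(1, max_steps + 1):
        change = dt * (target - state) / tau_years
        state += change
        if abs(change) < tol:        # the drift is now negligible
            return step * dt, state
    raise RuntimeError("state did not settle within max_steps")


# Hypothetical response times, in years.
fast_years, fast_state = spin_up(tau_years=0.1)
slow_years, slow_state = spin_up(tau_years=100.0)

print(f"fast component settled after {fast_years:.1f} simulated years")
print(f"slow component settled after {slow_years:.1f} simulated years")

# The settled state is saved once and reused as the start of every experiment.
start_state_for_experiments = {"fast": fast_state, "slow": slow_state}
```

Even in this toy version, the slow component needs hundreds of simulated years before its drift becomes negligible, which is why saving the end state and reusing it is so much cheaper than spinning up afresh for each experiment.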

By 1955, Phillips had his global simulation model running successfully. Once the run started, the simulated atmosphere didn’t stay at rest. The basic equations of the model included terms for forces that would move the atmosphere: gravity, the Coriolis force, expansion and contraction when air warms and cools, and the movement of air from high pressure areas to lower pressure areas. As heat entered the atmosphere towards the southern edge, the equations in the model made this air expand, rise and move northwards, just as it does in real life. Under the effect of the Coriolis force, this moving air mass slowly curled towards the east. The model developed its own stable jet stream.

In his early tests, Phillips was able to run the model for a month of simulation time, during which the model developed a realistic jet stream and gave good results for monthly and seasonal weather statistics. Unfortunately, getting the model to run longer than a month proved to be difficult, as numerical errors in the algorithms would accumulate. In later work, Phillips was able to fix these problems, but by then a whole generation of more realistic global climate models was emerging.

Phillips’ model wasn’t a predictive model, as it didn’t attempt to match any real conditions of the earth’s atmosphere. But the fact that it could simulate realistic patterns made it an exciting scientific model. It opened the door to the use of computer models to improve our understanding of the climate system. As the model could generate typical weather patterns from first principles, models like this could start to answer questions about the factors that shape the climate and drive regional differences. Clearly, long range simulation models were possible, for scientists who are interested in the general patterns—the climate—rather than the actual weather on any specific day.

Despite its huge over-simplifications, Phillips’ model was regarded as a major step forward, and is now credited as the first General Circulation Model. John von Neumann, who headed the computer project at IAS, was so excited that within a few months he persuaded the US Weather Bureau, Air Force, and Army to jointly fund a major new research program to develop the work further, at what in today’s money would be $2 million per year. The new research program, initially known as the General Circulation Research Section [4], and housed at the Weather Bureau’s computing facility in Maryland, eventually grew to become today’s Geophysical Fluid Dynamics Laboratory (GFDL), one of the world’s leading research labs for climate modelling. Von Neumann then convened a conference in Princeton, in October 1955, to discuss prospects for General Circulation Modelling. Phillips’ model was the highlight of the conference, but the topics also included stability of the numerical algorithms, how to improve forecasting of precipitation (rain and snow), and the need to include in the models the role of greenhouse gases.

In his opening speech to the conference, von Neumann divided weather prediction into three distinct problems. Short term weather prediction, over the span of a few days, he argued, was completely dominated by the initial values. Better data would soon provide better forecasts. In contrast, long-term prediction, like Phillips’ model, is largely unaffected by initial conditions. Von Neumann argued that by modelling the general circulation patterns for the entire globe, an “infinite forecast” would be possible—a model that could reproduce the large scale patterns of the climate system indefinitely. But the hardest prediction problem, he suggested, lay in between these two: intermediate range forecasts, which are shaped by both initial conditions and general circulation patterns. His assessment was correct: short term weather forecasting and global circulation modelling both developed rapidly in the ensuing decades, whereas intermediate forecasting (on the scale of months) is still a major challenge today.

Unfortunately, von Neumann didn’t live long enough to see his prediction play out. That same year he was diagnosed with cancer, and died two years later in February 1957, at the age of 53. The meteorology team no longer had a champion on the faculty at Princeton. Charney and Phillips left to take up positions at MIT, where Phillips would soon be head of the Department of Meteorology. The IAS meteorology project that had done so much to kick-start computerized weather forecasting was soon closed. However, its influence lived on, as a whole generation of young meteorologists established new research labs around the world to develop the techniques.


Notes:

[1] Although the geometry of the grid could be considered a cylinder, Phillips used a variable Coriolis factor suitable for a spherical planet, which means his artificial planet didn’t spin like a cylinder – the Coriolis force would get stronger, the further north you moved. This is essential for the formation of a jet stream. Strictly speaking, a cylindrical planet, if it could exist at all, wouldn’t have a Coriolis force, as the effect comes from the curvature towards the poles. Phillips included it in the equations anyway, to see if it would still produce a jet stream. For details see: Lewis, J. M. (1998). Clarifying the Dynamics of the General Circulation: Phillips’s 1956 Experiment. Bulletin of the American Meteorological Society, 79(1), 39–60.

[2] This was implemented in the model using a heating parameter as a linear function of latitude, with maximum heating at the southern edge, and maximum cooling at the northern edge, with points in between scaled accordingly. As Phillips points out, this is not quite like the real planet, but it was sufficient to generate stable circulation patterns similar to those in the real atmosphere. See Phillips, N. A. (1956). The general circulation of the atmosphere: A numerical experiment. Quarterly Journal of the Royal Meteorological Society, 82(352), 123–164.

[3] Actually, the grid was only 17×15, because it wrapped around, with the westernmost grid points being the same as the easternmost ones. So each of the two atmospheric levels could be represented as a geopotential array of 255 elements. (See Lewis, 1998)

[4] Joseph Smagorinsky, another member of the team that had run the ENIAC forecasts, was appointed head of this project. See Aspray, W. (1990). John von Neumann and the Origins of Modern Computing. MIT Press. Note that von Neumann’s original proposal is reproduced in full in Smagorinsky, J. (1983). The beginnings of numerical weather prediction and general circulation modelling: Early recollections. Advances in Geophysics, 25, 3–38.

This is an excerpt from the draft manuscript of my forthcoming book, Computing the Climate.

While models are used throughout the sciences, the word ‘model’ can mean something very different to scientists from different fields. This can cause great confusion. I often encounter scientists from outside of climate science who think climate models are statistical models of observed data, and that future projections from these models must be just extrapolations of past trends. And just to confuse things further, some of the models used in climate policy analysis are like this. But the physical climate models that underpin our knowledge of why climate change occurs are fundamentally different from statistical models.

A useful distinction made by philosophers of science is between models of phenomena, and models of data. The former include models developed by physicists and engineers to capture cause-and-effect relationships. Such models are derived from theory and experimentation, and have explanatory power: the model captures the reasons why things happen. Models of data, on the other hand, describe patterns in observed data, such as correlations and trends over time, without reference to why they occur. Statistical models, for example, describe common patterns (distributions) in data, without saying anything about what caused them. This simplifies the job of describing and analyzing patterns: if you can find a statistical model that matches your data, you can reduce the data to a few parameters (sometimes just two: a mean and a standard deviation). For example, the heights of any large group of people tend to follow a normal distribution (the bell-shaped curve), but this model doesn’t explain why heights vary in that way, nor whether they always will in the future. New techniques from machine learning have extended the power of these kinds of models in recent years, allowing more complex kinds of pattern to be discovered by “training” an algorithm on the data.
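As a small illustration of reducing data to two parameters, the Python sketch below summarises ten thousand heights with a mean and a standard deviation. The heights are synthetic, drawn from a normal distribution whose centre (170 cm) and spread (10 cm) I have simply made up for the example; they are not real measurements.

```python
import random
import statistics

# Synthetic "heights" in centimetres. The 170 cm centre and 10 cm spread
# are invented for this sketch, not taken from any real dataset.
random.seed(42)
heights_cm = [random.gauss(170.0, 10.0) for _ in range(10_000)]

# The statistical model: two numbers stand in for ten thousand.
mean_height = statistics.mean(heights_cm)
spread = statistics.stdev(heights_cm)

print(f"{len(heights_cm)} values summarised as "
      f"mean={mean_height:.1f} cm, standard deviation={spread:.1f} cm")
```

Notice what the summary leaves out: nothing in those two numbers says why heights vary, which is exactly the limitation of a model of data.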

Statistical techniques and machine learning algorithms are good at discovering patterns in data (e.g. “A and B always seem to change together”), but hopeless at explaining why those patterns occur. To get over this, many branches of science use statistical methods together with controlled experiments, so that if we find a pattern in the data after we’ve carefully manipulated the conditions, we can argue that the changes we introduced in the experiment caused that pattern. The ability to identify a causal relationship in a controlled experiment has nothing to do with the statistical model used—it comes from the logic of the experimental design. Only if the experiment is designed properly will statistical analysis of the results provide any insights into cause and effect.

Unfortunately, for some scientific questions, experimentation is hard, or even impossible. Climate change is a good example. Even though it’s possible to manipulate the climate (as indeed we are currently doing, by adding more greenhouse gases), we can’t set up a carefully controlled experiment, because we only have one planet to work with. Instead, we use numerical models, which simulate the causal factors—a kind of virtual experiment. An experiment conducted in a causal model won’t necessarily tell us what will happen in the real world, but it often gives a very useful clue. If we run the virtual experiment many times in our causal model, under slightly varied conditions, we can then turn back to a statistical model to help analyze the results. But without the causal model to set up the experiment, a statistical analysis won’t tell us much.

Both traditional statistical models and modern machine learning techniques are brittle, in the sense that they struggle when confronted with new situations not captured in the data from which the models were derived. An observed statistical trend projected into the future is only useful as a predictor if the future is like the past; it will be a very poor predictor if the conditions that cause the trend change. Climate change in particular is likely to make a mess of all of our statistical models, because the future will be very unlike the past. In contrast, a causal model based on the laws of physics will continue to give good predictions, as long as the laws of physics still hold.
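Here is a toy illustration of that brittleness, in Python. Everything in it is hypothetical: the “causal model” is just a response proportional to a forcing, the forcing is assumed to grow three times faster after year 50, and the “observations” are generated from the causal model itself rather than measured. The point is only to show what happens to a fitted trend when the conditions behind it change.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of a straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


SENSITIVITY = 0.5   # hypothetical response per unit of forcing


def forcing(year):
    """Hypothetical forcing: slow growth to year 50, three times faster after."""
    if year <= 50:
        return 0.02 * year
    return 0.02 * 50 + 0.06 * (year - 50)


def causal_model(year):
    """The response follows from the forcing, whatever the forcing does."""
    return SENSITIVITY * forcing(year)


# Fit a trend to the "past" (years 0 to 50), when the forcing grew slowly.
past_years = list(range(0, 51))
observations = [causal_model(y) for y in past_years]   # synthetic data
slope, intercept = fit_line(past_years, observations)

# Project both models to year 100, after conditions have changed.
trend_prediction = slope * 100 + intercept
causal_prediction = causal_model(100)

print(f"trend extrapolation: {trend_prediction:.2f}")
print(f"causal model:        {causal_prediction:.2f}")
```

The fitted line matches the past perfectly and still misses the future by a factor of two. The causal model does better only because it is told how the forcing changes, which is the same position a physical climate model is in: it needs the future conditions as input, but it does not need the future to resemble the past.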

Modern climate models contain elements of both types of model. The core elements of a climate model capture cause-and-effect relationships from basic physics, such as the thermodynamics and radiative properties of the atmosphere. But these elements are supplemented by statistical models of phenomena such as clouds, which are less well understood. To a large degree, our confidence in future predictions from climate models comes from the parts that are causal models based on physical laws, and the uncertainties in these predictions derive from the parts that are statistical summaries of less well-understood phenomena. Over the years, many of the improvements in climate models have come from removing a component that was based on a statistical model, and replacing it with a causal model. And our confidence in the causal components in these models comes from our knowledge of the laws of physics, and from running a very large number of virtual experiments in the model to check whether we’ve captured these laws correctly in the model, and whether they really do explain climate patterns that have been observed in the past.

This week I’m reading my way through three biographies, which neatly capture the work of three key scientists who laid the foundation for modern climate modeling: Arrhenius, Bjerknes and Callendar.

Crawford, E. (1996). Arrhenius: From Ionic Theory to the Greenhouse Effect. Science History Publications.
A biography of Svante Arrhenius, the Swedish scientist who, in 1895, created the first computational climate model, and spent almost a full year calculating by hand the likely temperature changes across the planet for increased and decreased levels of carbon dioxide. The term “greenhouse effect” hadn’t been coined back then, and Arrhenius was more interested in the question of whether the ice ages might have been caused by reduced levels of CO2. But nevertheless, his model was a remarkably good first attempt, and produced the first quantitative estimate of the warming expected from humanity’s ongoing use of fossil fuels.

Friedman, R. M. (1993). Appropriating the Weather: Vilhelm Bjerknes and the Construction of a Modern Meteorology. Cornell University Press.
A biography of Vilhelm Bjerknes, the Norwegian scientist, who, in 1904, identified the primitive equations, a set of differential equations that form the basis of modern computational weather forecasting and climate models. The equations are, in essence, an adaptation of the equations of fluid flow and thermodynamics to represent the atmosphere as a fluid on a rotating sphere in a gravitational field. At the time, the equations were little more than a theoretical exercise, and we had to wait half a century for the early digital computers, before it became possible to use them for quantitative weather forecasting.

Fleming, J. R. (2009). The Callendar Effect: The Life and Work of Guy Stewart Callendar (1898-1964). University of Chicago Press.
A biography of Guy S. Callendar, the British scientist, who, in 1938, first compared long term observations of temperatures with measurements of rising carbon dioxide in the atmosphere, to demonstrate a warming trend as predicted by Arrhenius’ theory. It was several decades before his work was taken seriously by the scientific community. Some now argue that we should use the term “Callendar Effect” to describe the warming from increased emissions of carbon dioxide, because the term “greenhouse effect” is too confusing – greenhouse gases were keeping the planet warm long before we started adding more, and anyway, the analogy with the way that glass traps heat in a greenhouse is a little inaccurate.

Not only do the three form a neat ABC, they also represent the three crucial elements you need for modern climate modelling: a theoretical framework to determine which physical processes are likely to matter, a set of detailed equations that allow you to quantify the effects, and comparison with observations as a first step in validating the calculations.

It’s been a while since I’ve written about the question of climate model validation, but I regularly get asked about it when I talk about the work I’ve been doing studying how climate models are developed. There’s an upcoming conference organized by the Rotman Institute of Philosophy, in London, Ontario, on Knowledge and Models in Climate Science, at which many of my favourite thinkers on this topic will be speaking. So I thought it was a good time to get philosophical about this again, and define some terms that I think help frame the discussion (at least in the way I see it!).

Here’s my abstract for the conference:

Constructive and External Validity for Climate Modeling

Discussions of the validity of scientific computational models tend to treat “the model” as a unitary artifact, and ask questions about its fidelity with respect to observational data, and its predictive power with respect to future situations. For climate modeling, both of these questions are problematic, because of long timescales and inhomogeneities in the available data. Our ethnographic studies of the day-to-day practices of climate modelers suggest an alternative framework for model validity, focusing on a modeling system rather than any individual model. Any given climate model can be configured for a huge variety of different simulation runs, and only ever represents a single instance of a continually evolving body of program code. Furthermore, its execution is always embedded in a broader social system of scientific collaboration which selects suitable model configurations for specific experiments, and interprets the results of the simulations within the broader context of the current body of theory about earth system processes.

We propose that the validity of a climate modeling system should be assessed with respect to two criteria: Constructive Validity, which refers to the extent to which the day-to-day practices of climate model construction involve the continual testing of hypotheses about the ways in which earth system processes are coded into the models, and External Validity, which refers to the appropriateness of claims about how well model outputs ought to correspond to past or future states of the observed climate system. For example, a typical feature of the day-to-day practice of climate model construction is the incremental improvement of the representation of specific earth system processes in the program code, via a series of hypothesis-testing experiments. Each experiment begins with a hypothesis (drawn from current or emerging theories about the earth system) that a particular change to the model code ought to result in a predictable change to the climatology produced by various runs of the model. Such a hypothesis is then tested empirically, using the current version of the model as a control, and the modified version of the model as the experimental case. Such experiments are then replicated for various configurations of the model, and results are evaluated in a peer review process via the scientific working groups who are responsible for steering the ongoing model development effort.

Assessment of constructive validity for a modeling system would take account of how well the day-to-day practices in a climate modeling laboratory adhere to rigorous standards for such experiments, and how well they routinely test the assumptions that are built into the model in this way. Similarly, assessment of the external validity of the modeling system would take account of how well knowledge of the strengths and weaknesses of particular instances of the model is taken into account when making claims about the scope of applicability of model results. We argue that such a framework offers a more coherent approach to questions of model validity, as it corresponds more directly with the way in which climate models are developed and used.

Imagine for a moment if Microsoft had 24 competitors around the world, each building their own version of Microsoft Word. Imagine further that every few years, they all agreed to run their software through the same set of very demanding tests of what a word processor ought to be able to do in a large variety of different conditions. And imagine that all these competing companies agreed that all the results from these tests would be freely available on the web, for anyone to see. Then, people who want to use a word processor can explore the data and decide for themselves which one best serves their purpose. People who have concerns about the reliability of word processors can analyze the strengths and weaknesses of each company’s software. Then think about what such a process would do to the reliability of word processors. Wouldn’t that be a great world to live in?

Well, that’s what climate modellers do, through a series of model inter-comparison projects. There are around 25 major climate modelling labs around the world developing fully integrated global climate models, and hundreds of smaller labs building specialized models of specific components of the earth system. The fully integrated models are compared in detail every few years through the Coupled Model Intercomparison Projects. And there are many other model inter-comparison projects for various specialist communities within climate science.

Have a look at how this process works, via this short paper on the planning process for CMIP6.

What’s the difference between forecasting the weather and predicting future climate change? A few years ago, I wrote a long post explaining that weather forecasting is an initial value problem, while climate is a boundary value problem. This is a much shorter explanation:

Imagine I were to throw a water balloon at you. If you could measure precisely how I threw it, and you understand the laws of physics correctly, you could predict precisely where it will go. If you could calculate it fast enough, you would know whether you’re going to get wet, or whether I’ll miss. That’s an initial value problem. The less precise your measurements of the initial value (how I throw it), the less accurate your prediction will be. Also, the longer the throw, the more the errors grow. This is how weather forecasting works – you measure the current conditions (temperature, humidity, wind speed, and so on) as accurately as possible, put them into a model that simulates the physics of the atmosphere, and run it to see how the weather will evolve. But the further into the future that you want to peer, the less accurate your forecast, because the errors on the initial value get bigger. It’s really hard to predict the weather more than about a week into the future:

Weather as an initial value problem
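To see how quickly a tiny measurement error can swamp a forecast, here is a minimal Python sketch. It uses the logistic map, a standard textbook example of a chaotic system, as a stand-in for the atmosphere; it is not a weather model, and the starting value and the size of the “measurement error” are arbitrary choices for illustration.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x), returning every state."""
    states = [x0]
    for _ in range(steps):
        states.append(r * states[-1] * (1.0 - states[-1]))
    return states


# Two "forecasts" whose initial values differ by one part in ten billion.
truth = logistic_trajectory(0.2, steps=100)
forecast = logistic_trajectory(0.2 + 1e-10, steps=100)

errors = [abs(a - b) for a, b in zip(truth, forecast)]

print(f"error after 5 steps:          {errors[5]:.2e}")
print(f"largest error over 100 steps: {max(errors):.2f}")
```

The forecast is excellent for the first few steps and useless later on, even though both runs use exactly the same “physics”. The only difference is the initial value.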

Now imagine I release a helium balloon into the air flow from a desk fan, and the balloon is on a string that’s tied to the fan casing. The balloon will reach the end of its string, and bob around in the stream of air. It doesn’t matter how exactly I throw the balloon into the airstream – it will keep on bobbing about in the same small area. I could leave it there for hours and it will do the same thing. This is a boundary value problem. I won’t be able to predict exactly where the balloon will be at any moment, but I will be able to tell you fairly precisely the boundaries of the space in which it will be bobbing. If anything affects these boundaries (e.g. because I move the fan a little), I should also be able to predict how this will shift the area in which the balloon will bob. This is how climate prediction works. You start off with any (reasonable) starting state, and run your model for as long as you like. If your model gets the physics right, it will simulate a stable climate indefinitely, no matter how you initialize it:

Climate as a boundary value problem

But if the boundary conditions change, because, for example, we alter the radiative balance of the planet, the model should also be able to predict fairly accurately how this will shift the boundaries on the climate:

Climate change as a change in boundary conditions
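The same idea can be sketched with the simplest possible climate model: a zero-dimensional energy balance, in which the planet warms when it absorbs more sunlight than it radiates to space, and cools when it radiates more. This Python sketch is nothing like a real climate model. The heat capacity, the time step, and the two “effective emissivity” values (a crude single number standing in for the greenhouse effect) are assumptions chosen for illustration, and the function name is my own.

```python
SIGMA = 5.670374e-8     # Stefan-Boltzmann constant, W m^-2 K^-4
SOLAR = 1361.0          # incoming solar radiation, W m^-2 (approximate)
ALBEDO = 0.3            # fraction of sunlight reflected (approximate)
HEAT_CAPACITY = 4.0e8   # J m^-2 K^-1, hypothetical
STEP = 10 * 86400.0     # time step of ten days, in seconds


def run(initial_temp, emissivity, years=100):
    """Step the energy balance forward and return the final temperature in K."""
    temp = initial_temp
    for _ in range(int(years * 365 * 86400 / STEP)):
        absorbed = SOLAR * (1 - ALBEDO) / 4           # averaged over the sphere
        emitted = emissivity * SIGMA * temp ** 4      # radiated to space
        temp += STEP * (absorbed - emitted) / HEAT_CAPACITY
    return temp


# Same boundary conditions, very different starting temperatures.
cold_start = run(initial_temp=250.0, emissivity=0.61)
warm_start = run(initial_temp=320.0, emissivity=0.61)

# Change a boundary condition: less heat escapes to space.
more_greenhouse = run(initial_temp=250.0, emissivity=0.60)

print(f"cold start:      {cold_start:.2f} K")
print(f"warm start:      {warm_start:.2f} K")
print(f"more greenhouse: {more_greenhouse:.2f} K")
```

The two runs with the same boundary conditions end up at the same temperature, however they were initialized. Changing the boundary condition moves that resting point, and the model tells you by how much.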

We cannot predict what the weather will do on any given day far into the future. But if we understand the boundary conditions and how they are altered, we can predict fairly accurately how the range of possible weather patterns will be affected. Climate change is a change in the boundary conditions on our weather systems.

A few weeks ago, Mark Higgins, from EUMETSAT, posted this wonderful video of satellite imagery of planet earth for the whole of the year 2013. The video superimposes the aggregated satellite data from multiple satellites on the top of NASA’s ‘Blue Marble Next Generation’ ground maps, to give a consistent picture of large scale weather patterns (Original video here – be sure to listen to Mark’s commentary):

When I saw the video, it reminded me of something. Here’s the output from CAM3, the atmospheric component of the global climate model CESM, run at very high resolution (Original video here):

I find it fascinating to play these two videos at the same time, and observe how the model captures the large scale weather patterns of the planet. The comparison isn’t perfect, because the satellite data measures the cloud temperature (the colder the clouds, the whiter they are shown), while the climate model output shows total water vapour & rain (i.e. warmer clouds are a lot more visible, and precipitation is shown in orange). This means the tropical regions look much drier in the satellite imagery than they do in the model output.

But even so, there are some remarkable similarities. For example, both videos clearly show the westerlies, the winds that flow from west to east at the top and bottom of the map (e.g. pushing rain across the North Atlantic to the UK), and they both show the trade winds, which flow from east to west, closer to the equator. Both videos also show how cyclones form in the regions between these wind patterns. For example, in both videos, you can see the typhoon season ramp up in the Western Pacific in August and September – the model has two hitting Japan in August, and the satellite data shows several hitting China in September. The curved tracks of these storms are similar in both videos. If you look closely, you can also see the daily cycle of evaporation and rain over South America and Central Africa in both videos – watch how these regions appear to pulse each day.

I find these similarities remarkable, because none of these patterns are coded into the climate model – they all emerge as a consequence of getting the basic thermodynamic properties of the atmosphere right. Remember also that a climate model is not intended to forecast the particular weather of any given year (that would be impossible, due to chaos theory). However, the model simulates a “typical” year on planet earth. So the specifics of where and when each storm forms do not correspond to anything that actually happened in any given year. But when the model gets the overall patterns about right, that’s a pretty impressive achievement.

We now have a fourth paper added to our special issue of the journal Geoscientific Model Development, on Community software to support the delivery of CMIP5. All papers are open access:

  • M. Stockhause, H. Höck, F. Toussaint, and M. Lautenschlager, Quality assessment concept of the World Data Center for Climate and its application to CMIP5 data, Geosci. Model Dev., 5, 1023-1032, 2012.
    Describes the distributed quality control concept that was developed for handling the terabytes of data generated from CMIP5, and the challenges in ensuring data integrity (also includes a useful glossary in an appendix).
  • B. N. Lawrence, V. Balaji, P. Bentley, S. Callaghan, C. DeLuca, S. Denvil, G. Devine, M. Elkington, R. W. Ford, E. Guilyardi, M. Lautenschlager, M. Morgan, M.-P. Moine, S. Murphy, C. Pascoe, H. Ramthun, P. Slavin, L. Steenman-Clark, F. Toussaint, A. Treshansky, and S. Valcke, Describing Earth system simulations with the Metafor CIM, Geosci. Model Dev., 5, 1493-1500, 2012.
    Explains the Common Information Model, which was developed to describe climate model experiments in a uniform way, including the model used, the experimental setup and the resulting simulation.
  • S. Valcke, V. Balaji, A. Craig, C. DeLuca, R. Dunlap, R. W. Ford, R. Jacob, J. Larson, R. O’Kuinghttons, G. D. Riley, and M. Vertenstein, Coupling technologies for Earth System Modelling, Geosci. Model Dev., 5, 1589-1596, 2012.
    An overview paper that compares different approaches to model coupling used by different earth system models in the CMIP5 ensemble.
  • S. Valcke, The OASIS3 coupler: a European climate modelling community software, Geosci. Model Dev., 6, 373-388, 2013 (See also the Supplement)
    A detailed description of the OASIS3 coupler, which is used in all the European models contributing to CMIP5. The OASIS User Guide is included as a supplement to this paper.

(Note: technically speaking, the call for papers for this issue is still open – if there are more software aspects of CMIP5 that you want to write about, feel free to submit them!)

This week, I start teaching a new grad course on computational models of climate change, aimed at computer science grad students with no prior background in climate science or meteorology. Here’s my brief blurb:

Detailed projections of future climate change are created using sophisticated computational models that simulate the physical dynamics of the atmosphere and oceans and their interaction with chemical and biological processes around the globe. These models have evolved over the last 60 years, along with scientists’ understanding of the climate system. This course provides an introduction to the computational techniques used in constructing global climate models, the engineering challenges in coupling and testing models of disparate earth system processes, and the scaling challenges involved in exploiting peta-scale computing architectures. The course will also provide a historical perspective on climate modelling, from the early ENIAC weather simulations created by von Neumann and Charney, through to today’s Earth System Models, and the role that these models play in the scientific assessments of the UN’s Intergovernmental Panel on Climate Change (IPCC). The course will also address the philosophical issues raised by the role of computational modelling in the discovery of scientific knowledge, the measurement of uncertainty, and a variety of techniques for model validation. Additional topics, based on interest, may include the use of multi-model ensembles for probabilistic forecasting, data assimilation techniques, and the use of models for re-analysis.

I’ve come up with a draft outline for the course, and some possible readings for each topic. Comments are very welcome:

  1. History of climate and weather modelling. Early climate science. Quick tour of range of current models. Overview of what we knew about climate change before computational modeling was possible.
  2. Calculating the weather. Bjerknes’ equations. ENIAC runs. What does a modern dynamical core do? [Includes basic introduction to thermodynamics of atmosphere and ocean]
  3. Chaos and complexity science. Key ideas: forcings, feedbacks, dynamic equilibrium, tipping points, regime shifts, systems thinking. Planetary boundaries. Potential for runaway feedbacks. Resilience & sustainability. (way too many readings this week. Have to think about how to address this – maybe this is two weeks worth of material?)
    • Liepert, B. G. (2010). The physical concept of climate forcing. Wiley Interdisciplinary Reviews: Climate Change, 1(6), 786-802.
    • Manson, S. M. (2001). Simplifying complexity: a review of complexity theory. Geoforum, 32(3), 405-414.
    • Rind, D. (1999). Complexity and Climate. Science, 284(5411), 105-107.
    • Randall, D. A. (2011). The Evolution of Complexity In General Circulation Models. In L. Donner, W. Schubert, & R. Somerville (Eds.), The Development of Atmospheric General Circulation Models: Complexity, Synthesis, and Computation. Cambridge University Press.
    • Meadows, D. H. (2008). Chapter One: The Basics. Thinking In Systems: A Primer (pp. 11-34). Chelsea Green Publishing.
    • Randers, J. (2012). The Real Message of Limits to Growth: A Plea for Forward-Looking Global Policy, 2, 102-105.
    • Rockström, J., Steffen, W., Noone, K., Persson, Å., Chapin, F. S., Lambin, E., Lenton, T. M., et al. (2009). Planetary boundaries: exploring the safe operating space for humanity. Ecology and Society, 14(2), 32.
    • Lenton, T. M., Held, H., Kriegler, E., Hall, J. W., Lucht, W., Rahmstorf, S., & Schellnhuber, H. J. (2008). Tipping elements in the Earth’s climate system. Proceedings of the National Academy of Sciences of the United States of America, 105(6), 1786-93.
  4. Typology of climate Models. Basic energy balance models. Adding a layered atmosphere. 3-D models. Coupling in other earth systems. Exploring dynamics of the socio-economic system. Other types of model: EMICs; IAMs.
  5. Earth System Modeling. Using models to study interactions in the earth system. Overview of key systems (carbon cycle, hydrology, ice dynamics, biogeochemistry).
  6. Overcoming computational limits. Choice of grid resolution; grid geometry, online versus offline; regional models; ensembles of simpler models; perturbed ensembles. The challenge of very long simulations (e.g. for studying paleoclimate).
  7. Epistemic status of climate models. E.g. what does a future forecast actually mean? How are model runs interpreted? Relationship between model and theory. Reproducibility and open science.
    • Shackley, S. (2001). Epistemic Lifestyles in Climate Change Modeling. In P. N. Edwards (Ed.), Changing the Atmosphere: Expert Knowledge and Environmental Government (pp. 107-133). MIT Press.
    • Sterman, J. D., Jr, E. R., & Oreskes, N. (1994). The Meaning of Models. Science, 264(5157), 329-331.
    • Randall, D. A., & Wielicki, B. A. (1997). Measurement, Models, and Hypotheses in the Atmospheric Sciences. Bulletin of the American Meteorological Society, 78(3), 399-406.
    • Smith, L. A. (2002). What might we learn from climate forecasts? Proceedings of the National Academy of Sciences of the United States of America, 99 Suppl 1, 2487-92.
  8. Assessing model skill – comparing models against observations, forecast validation, hindcasting. Validation of the entire modelling system. Problems of uncertainty in the data. Re-analysis, data assimilation. Model intercomparison projects.
  9. Uncertainty. Three different types: initial state uncertainty, scenario uncertainty and structural uncertainty. How well are we doing? Assessing structural uncertainty in the models. How different are the models anyway?
  10. Current Research Challenges. E.g.: Non-standard grids – e.g. non-rectangular, adaptive, etc; Probabilistic modelling – both fine grain (e.g. ECMWF work) and use of ensembles; Petascale datasets; Reusable couplers and software frameworks. (need some more readings on different research challenges for this topic)
  11. The future. Projecting future climates. Role of modelling in the IPCC assessments. What policymakers want versus what they get. Demands for actionable science and regional, decadal forecasting. The idea of climate services.
  12. Knowledge and wisdom. What the models tell us. Climate ethics. The politics of doubt. The understanding gap. Disconnect between our understanding of climate and our policy choices.

For a talk earlier this year, I put together a timeline of the history of climate modelling. I just updated it for my course, and now it’s up on Prezi, as a presentation you can watch and play with. Click the play button to follow the story, or just drag and zoom within the viewing pane to explore your own path.

Consider this a first draft though – if there are key milestones I’ve missed out (or misrepresented!) let me know!

In the talk I gave this week at the workshop on the CMIP5 experiments, I argued that we should do a better job of explaining how climate science works, especially the day-to-day business of working with models and data. I think we have a widespread problem that people outside of climate science have the wrong mental models about what a climate scientist does. As with any science, the day-to-day work might appear to be chaotic, with scientists dealing with the daily frustrations of working with large, messy datasets, having instruments and models not work the way they’re supposed to, and of course, the occasional mistake that you only discover after months of work. This doesn’t map onto the mental model that many non-scientists have of “how science should be done”, because the view presented in school, and in the media, is that science is about nicely packaged facts. In reality, it’s a messy process of frustrations, dead-end paths, and incremental progress exploring the available evidence.

Some climate scientists I’ve chatted to are nervous about exposing more of this messy day-to-day work. They already feel under constant attack, and they feel that allowing the public to peer under the lid (or if you prefer, to see inside the sausage factory) will only diminish people’s respect for the science. I take the opposite view – the more we present the science as a set of nicely polished results, the more potential there is for the credibility of the science to be undermined when people do manage to peek under the lid (e.g. by publishing internal emails). I think it’s vitally important that we work to clear away some of the incorrect mental models people have of how science is (or should be) done, and give people a better appreciation for how our confidence in scientific results slowly emerges from a slow, messy, collaborative process.

Giving people a better appreciation of how science is done would also help to overcome some of the games of ping pong you get in the media, where each new result in a published paper is presented as a startling new discovery, overturning previous research, and (if you’re in the business of selling newspapers, preferably) overturning an entire field. In fact, it’s normal for new published results to turn out to be wrong, and most of the interesting work in science is in reconciling apparently contradictory findings.

The problem is that these incorrect mental models of how science is done are often well entrenched, and the best that we can do is to try to chip away at them, by explaining at every opportunity what scientists actually do. For example, here’s a mental model I’ve encountered from time to time about how climate scientists build models to address the kinds of questions policymakers ask about the need for different kinds of climate policy:

This view suggests that scientists respond to a specific policy question by designing and building software models (preferably testing that the model satisfies its specification), and then running the model to answer the question. This is not the only (or even the most common?) layperson’s view of climate modelling, but the point is that there are many incorrect mental models of how climate models are developed and used, and one of the things we should strive to do is to work towards dislodging some of these by doing a better job of explaining the process.

With respect to climate model development, I’ve written before about how models slowly advance based on a process that roughly mimics the traditional view of “the scientific method” (I should acknowledge, for all the philosophy of science buffs, that there really isn’t a single, “correct” scientific method, but let’s keep that discussion for another day). So here’s how I characterize the day to day work of developing a model:

Most of the effort is spent identifying and diagnosing where the weaknesses in the current model are, and looking for ways to improve them. Each possible improvement then becomes an experiment, in which the experimental hypothesis might look like:

“if I change <piece of code> in <routine>, I expect it to have <specific impact on model error> in <output variable> by <expected margin> because of <tentative theory about climatic processes and how they’re represented in the model>”

The previous version of the model acts as a control, and the modified model is the experimental condition.
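To make the control-versus-experiment idea concrete, here is a minimal sketch of how such a hypothesis might be checked against observations. Everything in it is an assumption made for illustration: the helper names (`rmse`, `evaluate_change`), the choice of root-mean-square error as the error measure, and the sample numbers, none of which come from any modelling centre's actual test suite.

```python
import math


def rmse(simulated, observed):
    """Root-mean-square error between a model output series and observations."""
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


def evaluate_change(control, experiment, observed, expected_reduction):
    """Compare the modified model (experiment) against the previous version (control).

    The hypothesis is counted as supported only if the error in the chosen
    output variable drops by at least the margin predicted in advance.
    """
    control_error = rmse(control, observed)
    experiment_error = rmse(experiment, observed)
    reduction = control_error - experiment_error
    return {
        "control_rmse": control_error,
        "experiment_rmse": experiment_error,
        "reduction": reduction,
        "hypothesis_supported": reduction >= expected_reduction,
    }


if __name__ == "__main__":
    # Made-up values for one output variable at four locations.
    observed = [10.0, 12.0, 14.0, 16.0]
    control = [12.0, 14.0, 16.0, 18.0]      # previous model version
    experiment = [11.0, 13.0, 15.0, 17.0]   # model with the code change
    print(evaluate_change(control, experiment, observed, expected_reduction=0.5))
```

Stating the expected margin before the run is what turns a code change into an experiment rather than a tweak.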

But of course, this process isn’t just a random walk – it’s guided at the next level up by a number of influences, because the broader climate science community (and to some extent the meteorological community) are doing all sorts of related research, which then influences model development. In the paper we wrote about the software development processes at the UK Met Office, we portrayed it like this:

But I could go even broader and place this within a context in which a number of longer term observational campaigns (“process studies”) are collecting new types of observational data to investigate climate processes that are still poorly understood. This then involves the interaction of several distinct communities. Christian Jakob portrays it like this:

The point of Jakob’s paper, though, is to argue that the modelling and process studies communities don’t currently do enough of this kind of interaction, so there’s room for improvement in how the modelling influences the kinds of process studies needed, and how the results from process studies feed back into model development.

So, how else should we be explaining the day-to-day work of climate scientists?

I’m attending a workshop this week in which some of the initial results from the Fifth Coupled Model Intercomparison Project (CMIP5) will be presented. CMIP5 will form a key part of the next IPCC assessment report – it’s a coordinated set of experiments on the global climate models built by labs around the world. The experiments include hindcasts to compare model skill on pre-industrial and 20th Century climate, projections into the future for 100 and 300 years, shorter term decadal projections, paleoclimate studies, plus lots of other experiments that probe specific processes in the models. (For more explanation, see the post I wrote on the design of the experiments for CMIP5 back in September).

I’ve been looking at some of the data for the past CMIP exercises. CMIP1 originally consisted of one experiment – a control run with fixed forcings. The idea was to compare how each of the models simulates a stable climate. CMIP2 included two experiments, a control run like CMIP1, and a climate change scenario in which CO2 levels were increased by 1% per year. CMIP3 then built on these projects with a much broader set of experiments, and formed a key input to the IPCC Fourth Assessment Report.
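As a quick piece of arithmetic on the CMIP2 scenario: compounding at 1% per year means CO2 levels double after roughly 70 years. The helper below is just a sketch of that calculation (the function name `years_to_multiply` is made up for this example).

```python
import math


def years_to_multiply(factor, annual_growth=0.01):
    """Years of compound growth needed to multiply a quantity by `factor`."""
    return math.log(factor) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    print(f"Doubling:    {years_to_multiply(2):.1f} years")
    print(f"Quadrupling: {years_to_multiply(4):.1f} years")
```

So a run of about 70 simulated years reaches doubled CO2, and about 140 years reaches quadrupled CO2.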

There was no CMIP4, as the numbers were resynchronised to match the IPCC report numbers (also there was a thing called the Coupled Carbon Cycle Climate Model Intercomparison Project, which was nicknamed C4MIP, so it’s probably just as well!), so CMIP5 will feed into the fifth assessment report.

So here’s what I have found so far on the vital statistics of each project. Feel free to correct my numbers and help me to fill in the gaps!

| | CMIP (1996 onwards) | CMIP2 (1997 onwards) | CMIP3 (2005-2006) | CMIP5 (2010-2011) |
|---|---|---|---|---|
| Number of Experiments | 1 | 2 | 12 | 110 |
| Centres Participating | 16 | 18 | 15 | 24 |
| # of Distinct Models | 19 | 24 | 21 | 45 |
| # of Runs (Models X Expts) | 19 | 48 | 211 | 841 |
| Total Dataset Size | 1 Gigabyte | 500 Gigabyte | 36 TeraByte | 3.3 PetaByte |

Total Downloads from archive ?? ?? 1.2 PetaByte
Number of Papers Published 47 595
Users ?? ?? 6700

[Update:] I’ve added a row for number of runs, i.e. the sum of the number of experiments run on each model (in CMIP3 and CMIP5, centres were able to pick a subset of the experiments to run, so you can’t just multiply models and experiments to get the number of runs). Also, I ought to calculate the total number of simulated years that represents (If a centre did all the CMIP5 experiments, I figure it would result in at least 12,000 simulated years).
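To spell out the difference: for CMIP5, multiplying 45 models by 110 experiments would give 4,950 runs, far more than the 841 in the table. The count has to be taken model by model. A small sketch, using invented model names and experiment lists purely for illustration:

```python
def count_runs(experiments_by_model):
    """Total runs = sum over models of the experiments each model actually ran."""
    return sum(len(experiments) for experiments in experiments_by_model.values())


if __name__ == "__main__":
    # Hypothetical example: three models, each running a different subset.
    experiments_by_model = {
        "model_a": ["control", "historical", "future_scenario"],
        "model_b": ["control", "historical"],
        "model_c": ["control"],
    }
    all_experiments = {e for exps in experiments_by_model.values() for e in exps}
    print("models x experiments:", len(experiments_by_model) * len(all_experiments))
    print("actual runs:", count_runs(experiments_by_model))
```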

Oh, one more datapoint from this week. We came up with an estimate that by 2020, each individual experiment will generate an Exabyte of data. I’ll explain how we got this number once we’ve given the calculations a bit more of a thorough checking over.

Our paper on defect density analysis of climate models is now out for review at the journal Geoscientific Model Development (GMD). GMD is an open review / open access journal, which means the review process is publicly available (anyone can see the submitted paper, the reviews it receives during the process, and the authors’ response). If the paper is eventually accepted, the final version will also be freely available.

The way this works at GMD is that the paper is first published to Geoscientific Model Development Discussions (GMDD) as an un-reviewed manuscript. The interactive discussion is then open for a fixed period (in this case, 2 months). At that point the editors will make a final accept/reject decision, and, if accepted, the paper is then published to GMD itself. During the interactive discussion period, anyone can post comments on the paper, although in practice, discussion papers often only get comments from the expert reviewers commissioned by the editors.

One of the things I enjoy about the peer-review process is that a good, careful review can help improve the final paper immensely. As I’ve never submitted before to a journal that uses an open review process, I’m curious to see how the open reviewing will help – I suspect (and hope!) it will tend to make reviewers more constructive.

Anyway, here’s the paper. As it’s open review, anyone can read it and make comments (click the title to get to the review site):

Assessing climate model software quality: a defect density analysis of three models

J. Pipitone and S. Easterbrook
Department of Computer Science, University of Toronto, Canada

Abstract. A climate model is an executable theory of the climate; the model encapsulates climatological theories in software so that they can be simulated and their implications investigated. Thus, in order to trust a climate model one must trust that the software it is built from is built correctly. Our study explores the nature of software quality in the context of climate modelling. We performed an analysis of defect reports and defect fixes in several versions of leading global climate models by collecting defect data from bug tracking systems and version control repository comments. We found that the climate models all have very low defect densities compared to well-known, similarly sized open-source projects. We discuss the implications of our findings for the assessment of climate model software trustworthiness.
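For readers unfamiliar with the metric: defect density is conventionally the number of reported defects divided by the size of the code, usually expressed per thousand lines of code (KLOC). The sketch below assumes that convention; the function name and the numbers are invented for illustration and are not results from the paper.

```python
def defect_density(defects, lines_of_code):
    """Defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000.0)


if __name__ == "__main__":
    # Hypothetical project: 25 recorded defects in 500,000 lines of code.
    print(f"{defect_density(25, 500_000):.3f} defects per KLOC")
```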

On Thursday, Kaitlin presented her poster at the AGU meeting, which shows the results of the study she did with us in the summer. Her poster generated a lot of interest, especially the visualizations she has of the different model architectures. Click on the thumbnail to see the full poster at the AGU site:

A few things to note when looking at the diagrams:

  • Each diagram shows the components of a model, scaled to their relative size by lines of code. However, the models are not to scale with one another, as the smallest, UVic’s, is only a tenth of the size of the biggest, CESM. Someone asked what accounts for that size. Well, the UVic model is an EMIC rather than a GCM. It has a very simplified atmosphere model that does not include atmospheric dynamics, which makes it easier to run for very long simulations (e.g. to study paleoclimate). On the other hand, CESM is a community model, with a large number of contributors across the scientific community. (See Randall and Held’s point/counterpoint article in last month’s IEEE Software for a discussion of how these fit into different model development strategies).
  • The diagrams show the couplers (in grey), again sized according to number of lines of code. A coupler handles data re-gridding (when the scientific components use different grids), temporal aggregation (when the scientific components run on different time steps) along with other data handling. These are often invisible in diagrams the scientists create of their models, because they are part of the infrastructure code; however Kaitlin’s diagrams show how substantial they are in comparison with the scientific modules. The European models all use the same coupler, following a decade-long effort to develop this as a shared code resource.
  • Note that there are many different choices associated with the use of a coupler, as sometimes it’s easier to connect components directly rather than through the coupler, and the choice may be driven by performance impact, flexibility (e.g. ‘plug-and-play’ compatibility) and legacy code issues. Sea ice presents an interesting example, because its extent varies over the course of a model run. So somewhere there must be code that keeps track of which grid cells have ice, and then routes the fluxes from ocean and atmosphere to the sea ice component for these grid cells. This could be done in the coupler, or in any of the three scientific modules. In the GFDL model, sea ice is treated as an interface to the ocean, so all atmosphere-ocean fluxes pass through it, whether there’s ice in a particular cell or not.
  • The relative size of the scientific components is a reasonable proxy for functionality (or, if you like, scientific complexity/maturity). Hence, the diagrams give clues about where each lab has placed its emphasis in terms of scientific development, whether by deliberate choice, or because of availability (or unavailability) of different areas of expertise. The differences between the models from different labs show some strikingly different choices here, for example between models that are clearly atmosphere-centric, versus models that have a more balanced set of earth system components.
  • One comment we received in discussions around the poster was about the places where we have shown sub-components in some of the models. Some modeling groups are more explicit about naming the sub-components, and indicating them in the code. Hence, our ability to identify these might depend more on naming practices than on any fundamental architectural differences.
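To give a feel for the two coupler jobs mentioned above, re-gridding and temporal aggregation, here is a deliberately tiny sketch. It assumes a one-dimensional grid of equal-sized cells, a coarse grid whose cells each cover a whole number of fine cells, and simple averaging; real couplers deal with two-dimensional grids, unequal cell areas, masks and parallel data movement. All function names are made up for this example.

```python
def aggregate_in_time(fields_per_fast_step):
    """Average a flux field over the fast component's steps within one slow step."""
    n_steps = len(fields_per_fast_step)
    n_cells = len(fields_per_fast_step[0])
    return [
        sum(field[cell] for field in fields_per_fast_step) / n_steps
        for cell in range(n_cells)
    ]


def regrid_to_coarse(fine_values, ratio):
    """Average each block of `ratio` equal-sized fine cells into one coarse cell.

    With equal cell sizes, this keeps the area-integrated flux unchanged.
    """
    if len(fine_values) % ratio != 0:
        raise ValueError("fine grid size must be a multiple of ratio")
    return [
        sum(fine_values[i:i + ratio]) / ratio
        for i in range(0, len(fine_values), ratio)
    ]


def couple(fields_per_fast_step, ratio):
    """Pass fluxes from a fast, fine-grid component to a slow, coarse-grid one."""
    return regrid_to_coarse(aggregate_in_time(fields_per_fast_step), ratio)


if __name__ == "__main__":
    # Two fast time steps on a four-cell grid, sent to a two-cell grid.
    atmosphere_fluxes = [[1.0, 3.0, 5.0, 7.0], [3.0, 5.0, 7.0, 9.0]]
    print(couple(atmosphere_fluxes, ratio=2))
```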

I’m sure Kaitlin will blog more of her reflections on the poster (and AGU in general) once she’s back home.

I’m at the AGU meeting in San Francisco this week. The internet connections in the meeting rooms suck, so I won’t be twittering much, but will try and blog any interesting talks. But first things first! I presented my poster in the session on “Methodologies of Climate Model Evaluation, Confirmation, and Interpretation” yesterday morning. Nice to get my presentation out of the way early, so I can enjoy the rest of the conference.

Here’s my poster, and the abstract is below (click for the full sized version at the AGU ePoster site):

A Hierarchical Systems Approach to Model Validation

Introduction

Discussions of how climate models should be evaluated tend to rely on either philosophical arguments about the status of models as scientific tools, or on empirical arguments about how well runs from a given model match observational data. These lead to quantitative measures expressed in terms of model bias or forecast skill, and ensemble approaches where models are assessed according to the extent to which the ensemble brackets the observational data.

Such approaches focus the evaluation on models per se (or more specifically, on the simulation runs they produce), as if the models can be isolated from their context. Such approaches may overlook a number of important aspects of the use of climate models:

  • the process by which models are selected and configured for a given scientific question.
  • the process by which model outputs are selected, aggregated and interpreted by a community of expertise in climatology.
  • the software fidelity of the models (i.e. whether the running code is actually doing what the modellers think it’s doing).
  • the (often convoluted) history that begat a given model, along with the modelling choices long embedded in the code.
  • variability in the scientific maturity of different components within a coupled earth system model.

These omissions mean that quantitative approaches cannot assess whether a model produces the right results for the wrong reasons, or conversely, the wrong results for the right reasons (where, say, the observational data is problematic, or the model is configured to be unlike the earth system for a specific reason).

Furthermore, quantitative skill scores only assess specific versions of models, configured for specific ensembles of runs; they cannot reliably make any statements about other configurations built from the same code.
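For concreteness, the kinds of quantitative measure referred to above can be sketched in a few lines. The two functions below are illustrative assumptions (a mean bias, and the fraction of time points at which an ensemble's spread brackets the observation); they are not the specific metrics used by any particular evaluation project, and the data are invented.

```python
def mean_bias(simulated, observed):
    """Average of (simulated - observed); positive means the model runs high."""
    return sum(s - o for s, o in zip(simulated, observed)) / len(observed)


def bracketing_fraction(ensemble, observed):
    """Fraction of time points where the observation lies within the ensemble range.

    `ensemble` is a list of runs, each a list of values aligned with `observed`.
    """
    inside = 0
    for t, obs in enumerate(observed):
        values = [run[t] for run in ensemble]
        if min(values) <= obs <= max(values):
            inside += 1
    return inside / len(observed)


if __name__ == "__main__":
    observed = [2.0, 5.0, 4.0]
    ensemble = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
    print("bias of run 2:", mean_bias(ensemble[1], observed))
    print("bracketing fraction:", bracketing_fraction(ensemble, observed))
```

Note that both numbers describe one particular set of runs, which is exactly the limitation discussed above.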

Quality as Fitness for Purpose

The problem is that there is no such thing as “the model”. The body of code that constitutes a modern climate model actually represents an enormous number of possible models, each corresponding to a different way of configuring that code for a particular run. Furthermore, this body of code isn’t a static thing. The code is changed on a daily basis, through a continual process of experimentation and model improvement. This applies even to any specific “official release”, which again is just a body of code that can be configured to run as any of a huge number of different models, and again, is not unchanging – as with all software, there will be occasional bugfix releases applied to it, along with improvements to the ancillary datasets.

Evaluation of climate models should not be about “the model”, but about the relationship between a modelling system and the purposes to which it is put. More precisely, it’s about the relationship between particular ways of building and configuring models and the ways in which the runs produced by those models are used.

What are the uses of a climate model? They vary tremendously:

  • To provide inputs to assessments of the current state of climate science;
  • To explore the consequences of a current theory;
  • To test a hypothesis about the observational system (e.g. forward modeling);
  • To test a hypothesis about the calculational system (e.g. to explore known weaknesses);
  • To provide homogenized datasets (e.g. re-analysis);
  • To conduct thought experiments about different climates;
  • To act as a comparator when debugging another model;

In general, we can distinguish three separate systems: the calculational system (the model code); the theoretical system (current understandings of climate processes) and the observational system. In the most general sense, climate models are developed to explore how well our current understanding (i.e. our theories) of climate explains the available observations. And of course the inverse: what additional observations might we make to help test our theories?

We’re dealing with relationships between three different systems

Validation of the Entire Modeling System

When we ask questions about likely future climate change, we don’t ask the question of the calculational system, we ask it of the theoretical system; the models are just a convenient way of probing the theory to provide answers.
When society asks climate scientists for future projections, the question is directed at climate scientists, not their models. Modellers apply their judgment to select appropriate versions & configurations of the models to use, set up the runs, and interpret the results in the light of what is known about the models’ strengths and weaknesses and about any gaps between the computational models and the current theoretical understanding. And they add all sorts of caveats to the conclusions they draw from the model runs when they present their results.

Validation is not a post-hoc process to be applied to an individual “finished” model, to ensure it meets some criteria for fidelity to the real world. In reality, there is no such thing as a finished model, just many different snapshots of a large set of model configurations, steadily evolving as the science progresses. Knowing something about the fidelity of a given model configuration to the real world is useful, but not sufficient to address fitness for purpose. For this, we have to assess the extent to which climate models match our current theories, and the extent to which the process of improving the models keeps up with theoretical advances.

Summary

Our approach to model validation extends current approaches:

  • down into the detailed codebase to explore the processes by which the code is built and tested. Thus, we build up a picture of the day-to-day practices by which modellers make small changes to the model and test the effect of such changes (both in isolated sections of code, and on the climatology of a full model). The extent to which these practices improve the confidence and understanding of the model depends on how systematically this testing process is applied, and how many of the broad range of possible types of testing are applied. We also look beyond testing to other software practices that improve trust in the code, including automated checking for conservation of mass across the coupled system, and various approaches to spin-up and restart testing.
  • up into the broader scientific context in which models are selected and used to explore theories and test hypotheses. Thus, we examine how features of the entire scientific enterprise improve (or impede) model validity, from the collection of observational data, creation of theories, use of these theories to develop models, choices for which model and which model configuration to use, choices for how to set up the runs, and interpretation of the results. We also look at how model inter-comparison projects provide a de facto benchmarking process, leading in turn to exchanges of ideas between modelling labs, and hence advances in the scientific maturity of the models.
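Two of the software practices mentioned above, restart testing and conservation checking, are easy to illustrate on a toy model. The sketch below assumes a made-up two-box "model" that exchanges mass between an atmosphere and an ocean reservoir; the function names and the use of JSON as a checkpoint format are inventions for this example, not how any real climate model does it.

```python
import json


def step(state, exchange_rate=0.1):
    """Move mass between the two boxes in proportion to their difference."""
    flux = exchange_rate * (state["atmosphere"] - state["ocean"])
    return {
        "atmosphere": state["atmosphere"] - flux,
        "ocean": state["ocean"] + flux,
    }


def run(state, n_steps):
    """Advance the toy model by n_steps."""
    for _ in range(n_steps):
        state = step(state)
    return state


def restart_test(initial, n_steps, checkpoint_at):
    """Check that stopping, saving and restarting reproduces a continuous run exactly."""
    continuous = run(initial, n_steps)
    checkpoint = json.dumps(run(initial, checkpoint_at))   # write restart file
    restarted = run(json.loads(checkpoint), n_steps - checkpoint_at)
    return continuous == restarted


def conserves_mass(initial, n_steps, tolerance=1e-9):
    """Check that total mass across the coupled boxes stays constant."""
    final = run(initial, n_steps)
    return abs(sum(final.values()) - sum(initial.values())) <= tolerance


if __name__ == "__main__":
    initial = {"atmosphere": 100.0, "ocean": 50.0}
    print("restart test passed:", restart_test(initial, n_steps=10, checkpoint_at=5))
    print("mass conserved:", conserves_mass(initial, n_steps=10))
```

A failed restart test usually points to state that never made it into the checkpoint, which is why it is a useful check on the infrastructure code as well as on the science code.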

This layered approach does not attempt to quantify model validity, but it can provide a systematic account of how the detailed practices involved in the development and use of climate models contribute to the quality of modelling systems and the scientific enterprise that they support. By making the relationships between these practices and model quality more explicit, we expect to identify specific strengths and weaknesses of the modelling systems, particularly with respect to structural uncertainty in the models, and better characterize the “unknown unknowns”.