On Thursday, Tim Palmer of the University of Oxford and the European Centre for Medium-Range Weather Forecasts (ECMWF) gave the Bjerknes lecture, with a talk entitled “Towards a Community-Wide Prototype Probabilistic Earth-System Model”. For me, it was definitely the best talk of this year’s AGU meeting. [Update: the video of the talk is now up at the AGU webcasts page]

I should note of course, that this year’s Bjerknes lecture was originally supposed to have been given by Stephen Schneider, who sadly died this summer. Stephen’s ghost seems to hover over the entire conference, with many sessions beginning and ending with tributes to him. His photo was on the screens as we filed into the room, and the session began with a moment of silence for him. I’m disappointed that I never had a chance to see one of Steve’s talks, but I’m delighted they chose Tim Palmer as a replacement. And of course, he’s eminently qualified. As the introduction said: “Tim is a fellow of pretty much everything worth being a fellow of”, and one of the few people to have won both the Rossby and the Charney awards.

Tim’s main theme was the development of climate and weather forecasting models, especially the issue of probability and uncertainty. He began by reminding us that the name Bjerknes is iconic for this. Vilhelm Bjerknes set weather prediction on its current scientific course, by posing it as a problem in mathematical physics. His son, Jacob Bjerknes, pioneered our understanding of the mechanisms that underpin our ability to do seasonal forecasting, particularly air-sea coupling.

If there’s one fly in the ointment though, it’s the issue of determinism. Lorenz put a stake into the heart of determinism, through his description of the butterfly effect. As an example, Tim showed the weather forecast for the UK for 13 Oct 1987, shortly before the “great storm” that turned the town of Sevenoaks [where I used to live!] into “No-oaks”. The forecast models pointed to a ridge moving in, whereas what developed was really a very strong vortex causing a serious storm.

Nowadays the forecast models are run many hundreds of times per day, to capture the inherent uncertainty in the initial conditions. A (retrospective) ensemble forecast for 13 Oct 1987 shows this was an inherently unpredictable set of circumstances. The approach now taken is to convert a large number of runs into a probabilistic forecast. This gives a tool for decision-making across a range of sectors that takes into account the uncertainty. And then, if you know your cost function, you can use the probabilities from the weather forecast to decide what to do. For example, if you were setting out to sail in the English Channel on the 15th October 1987, you’d need both the probabilistic forecast *and* some measure of the cost/benefit of your voyage.
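To make the sailing example concrete, here is a minimal sketch of how an ensemble might be turned into a probability and then into a decision. Everything in it is an assumption for illustration: the ten wind-speed values are invented, the 34-knot gale threshold is just a convenient cut-off, and the simple cost-loss rule (act when the probability exceeds cost divided by loss) is a textbook choice rather than anything Tim showed.

```python
import numpy as np

def exceedance_probability(ensemble, threshold):
    """Fraction of ensemble members at or above the threshold."""
    members = np.asarray(ensemble, dtype=float)
    return float(np.mean(members >= threshold))

def should_protect(probability, cost, loss):
    """Cost-loss rule: protecting costs `cost` for certain, not protecting
    risks `loss` with the forecast probability, so act when p > cost / loss."""
    return probability > cost / loss

# Hypothetical 10-member ensemble of peak wind speed in knots (invented values).
winds = [18, 22, 25, 31, 35, 38, 41, 47, 52, 60]
p_gale = exceedance_probability(winds, threshold=34)

# Hypothetical costs: cancelling the trip costs 200, being caught out costs 1000.
stay_in_port = should_protect(p_gale, cost=200, loss=1000)
print(f"P(gale) = {p_gale:.1f}, stay in port: {stay_in_port}")
```

The point of separating the two functions is that the forecast provider only owns the first one; the cost and loss belong to the user, which is why the same forecast can lead different people to different decisions.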

The same probabilistic approach is used in seasonal forecasting, for example for the current forecasts of the progress of El Niño.

Moving on to the climate arena, what are the key uncertainties in climate predictions? The three key sources are: initial uncertainty, future emissions, and model uncertainty. As we go for longer and longer timescales, model uncertainty dominates – it becomes the paramount issue in assessing reliability of predictions.

Back in the 1970s, life was simple. Since then, the models have grown dramatically in complexity as new earth system processes have been added. But at the heart of the models, the essential paradigm hasn’t changed. We believe we know the basic equations of fluid motion, expressed as differential equations. It’s quite amazing that 23 mathematical symbols are sufficient to express virtually all aspects of motion in air and oceans. But the problem comes in how to solve them. The traditional approach is to project them (e.g. onto a grid), to convert them into a large number of ordinary differential equations. And then the other physical processes have to be represented in a computationally tractable way. Some of this is empirical, based on observations, along with plausible assumptions on how these processes work.

These deterministic, bulk-parameter parameterizations are based on the presumption of a large ensemble of subgrid processes (e.g. deep convective cloud systems) within each grid box, which then means we can represent them by their overall statistics. Deterministic closures have a venerable history in fluid dynamics, and we can incorporate these subgrid closures into the climate models.

But there’s a problem. Observations indicate a shallow power law for atmospheric energy wavenumber spectra. In other words, there’s no scale separation between the resolved and unresolved scales in weather and climate. The power law is consistent with what one would deduce from the scaling symmetries of the Navier-Stokes equations, but it’s violated by conventional deterministic parameterizations.

But does it matter? Surely if we can do a half-decent job on the subgrid scales, it will be okay? Tim showed a lovely cartoon from Schertzer and Lovejoy, 1993:

As pointed out in the IPCC WG1 Chp8:

“Nevertheless, models still show significant errors. Although these are generally greater at smaller scales, important large-scale problems also remain. For example, deficiencies remain in the simulation of tropical precipitation, the El Niño-Southern Oscillation and the Madden-Julian Oscillation (an observed variation in tropical winds and rainfall with a time scale of 30 to 90 days). The ultimate source of most such errors is that many important small-scale processes cannot be represented explicitly in models, and so must be included in approximate form as they interact with larger-scale features.”

The figures from the IPCC report show the models doing a good job over the 20thC. But what’s not made clear is that each model has had its bias subtracted out before this was plotted, so you’re looking at anomalies relative to the model’s own climatology. In fact, there is an enormous spread of the models against reality.
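A toy illustration of why anomaly plots can look so good: if two models share the same trend but sit at quite different absolute temperatures, subtracting each model's own baseline makes them indistinguishable. The numbers below are made up, and the 1961 to 1990 base period is my assumption, not something taken from the IPCC figures.

```python
import numpy as np

years = np.arange(1900, 2000)
trend = 0.007 * (years - 1900)       # invented shared warming signal, deg C

# Hypothetical absolute global mean temperatures (deg C).
observed = 14.0 + trend
model_a = 12.5 + trend               # cold bias of 1.5 deg C
model_b = 15.2 + trend               # warm bias of 1.2 deg C

def anomaly(series, years, base=(1961, 1990)):
    """Subtract the series' own mean over the base period."""
    mask = (years >= base[0]) & (years <= base[1])
    return series - series[mask].mean()

for name, model in (("A", model_a), ("B", model_b)):
    bias = (model - observed).mean()
    gap = np.abs(anomaly(model, years) - anomaly(observed, years)).max()
    print(f"model {name}: absolute bias {bias:+.2f}, largest anomaly mismatch {gap:.2e}")
```

In absolute terms the two models are 2.7 degrees apart, yet their anomaly curves overlay the observations exactly.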

At present, we don’t know how to close these equations, and a major part of the uncertainty is in these equations. So, a missing box on the diagram of the processes in Earth System Models is “UNCERTAINTY”.

What does the community do to estimate model uncertainty? The state of the art is the multi-model ensemble (e.g. CMIP5). The idea is to poll across the models to assess how broad the distribution is. But as everyone involved in the process understands, there are problems that are common to all of the models, because they are all based on the same basic approach to the underlying equations. And they also typically have similar resolutions.

Another pragmatic approach, to overcome the limitation of the number of available models, is to use perturbed physics ensembles – take a single model and perturb the parameters systematically. But this approach is blind to structural errors, because there is only the one model used as the basis.

A third approach is to use stochastic closure schemes for climate models. You replace the deterministic formulae with stochastic formulae. Potentially, we have a range of scales at which we can try this. For example, Tim has experimented with cellular automata to capture missing processes, which is attractive because it can also capture how the subgrid processes move from one grid box to another. These ideas have been implemented in the ECMWF models (and are described in the book Stochastic Physics and Climate Modelling).
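To give a flavour of what "replace the deterministic formulae with stochastic formulae" can mean, here is a deliberately tiny sketch. It is not the cellular automaton scheme or anything from the ECMWF code: it is a generic multiplicative-noise perturbation of a toy closure, with a one-variable model, an AR(1) noise process, and parameter values that are all my own assumptions.

```python
import numpy as np

def deterministic_tendency(x, a=-0.5):
    """Toy bulk closure: subgrid tendency is a fixed function of the resolved state."""
    return a * x

def ar1_noise(n_steps, phi, sigma, rng):
    """Red noise with lag-1 correlation phi and stationary standard deviation sigma."""
    r = np.zeros(n_steps)
    innov_std = sigma * np.sqrt(1.0 - phi**2)
    for t in range(1, n_steps):
        r[t] = phi * r[t - 1] + innov_std * rng.standard_normal()
    return r

def integrate(x0, n_steps, dt, forcing, noise=None):
    """Forward Euler for dx/dt = forcing + (1 + r) * P(x); r = 0 gives the deterministic model."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        r = 0.0 if noise is None else noise[t]
        x[t + 1] = x[t] + dt * (forcing + (1.0 + r) * deterministic_tendency(x[t]))
    return x

rng = np.random.default_rng(0)
n_steps, dt = 200, 0.1

control = integrate(1.0, n_steps, dt, forcing=1.0)
ensemble = np.array([
    integrate(1.0, n_steps, dt, forcing=1.0, noise=ar1_noise(n_steps, 0.9, 0.3, rng))
    for _ in range(20)
])

print("deterministic end state:", control[-1])
print("stochastic ensemble mean:", ensemble[:, -1].mean())
print("stochastic ensemble spread:", ensemble[:, -1].std())
```

The deterministic run gives one answer; the stochastic runs give a distribution, and that spread is a direct (if crude) representation of the uncertainty in the closure itself.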

So where do we go from here? Tim identified a number of reasons he’s convinced stochastic-dynamic parameterizations make sense:

1) More accurate accounts of uncertainty. For example, there have been attempts to assess the skill of seasonal forecasts with various different types of ensemble: Weisheimer et al 2009 scored the ensembles according to how well they captured the uncertainty – stochastic physics ensembles did slightly better than other types of ensemble.

2) Stochastic closures could be more accurate. For example, Berner et al 2009 experimented with adding stochastic backscatter up the spectrum, imposed on the resolved scales. To evaluate it, they looked at model bias. Using the ECMWF model, they increased resolution by a factor of 5, which is computationally very expensive, but fills out the bias in the model. They showed the backscatter scheme reduces the bias of the model, in a way that’s not dissimilar to the increased resolution model. It’s like adding symmetric noise, but means that the model on average does the right thing.

3) Taking advantage of exascale computing. Tim recently attended a talk by Don Grice, IBM Chief engineer, talking about getting ready for exascale computing. He said “There will be a tension between energy efficiency and error detection”. What he meant was that if you insist on bit-reproducibility you will pay an enormous premium in energy use. So the end of bit-reproducibility might be in sight for High Performance Computing.

To Tim, this is music to his ears, as he thinks stochastic approaches will be the solution to this. He gave the example of Lyric semiconductors, who are launching a new type of computer, with 1000 times the performance, but at the cost of some accuracy – in other words, probabilistic computing.

4) More efficient use of human resources. The additional complexity in earth system models comes at a price – huge demands on human resources. For many climate modelling labs, the demands are too great. So perhaps we should pool our development teams, so that we’re not all busy trying to replicate each other’s codes.

Could we move to a more community-wide approach? It happened to the aerospace industry in Europe, when the various countries got together to form Airbus. Is it a good idea for climate modelling? Institutional directors take a dogmatic view that it’s a bad idea. The argument is that we need model diversity to have good estimates of uncertainty. Tim doesn’t want to argue against this, but points out that once we have a probabilistic modelling capability, we can test this statement objectively – in other words, we can test whether in different modes, the multi-model ensemble does better than a stochastic approach.
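Testing that statement objectively needs a score for probabilistic forecasts. My notes don't record which metric Tim had in mind, so as an assumption here is a sketch using the Brier score, one standard choice: the mean squared difference between the forecast probability and what actually happened (1 or 0). The two sets of forecast probabilities are invented stand-ins for two competing ensemble systems.

```python
import numpy as np

def brier_score(forecast_probs, outcomes):
    """Mean squared error of probability forecasts against binary outcomes (lower is better)."""
    p = np.asarray(forecast_probs, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    return float(np.mean((p - o) ** 2))

# Did the event happen on each of five occasions? (invented)
outcomes = [1, 0, 0, 1, 0]

# Two hypothetical forecasting systems issuing probabilities for the same events.
overconfident = [1.0, 0.0, 1.0, 0.0, 0.0]   # always certain, wrong twice
hedged = [0.7, 0.2, 0.4, 0.6, 0.1]          # never certain, leans the right way

print("overconfident system:", brier_score(overconfident, outcomes))
print("hedged system:       ", brier_score(hedged, outcomes))
```

A system that honestly represents its uncertainty is rewarded here, which is the sort of comparison that could be run between a multi-model ensemble and a stochastic-physics ensemble over a set of past forecasts.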

When we talk about modelling, it covers a large spectrum, from idealized mathematically tractable models through to comprehensive mathematical models. But this has led to a separation of the communities. The academic community develops the idealized models, while the software engineering groups in the met offices build the brute-force models.

Which brings Tim to the grand challenge: the academic community should help develop prototype probabilistic Earth System Models, based on innovative and physically robust stochastic-dynamics models. The effort has started already, at the Isaac Newton Institute. They are engaging mathematicians and climate modellers, looking at stochastic approaches to climate modelling. They have already set up a network, and Tim encouraged people who are interested to subscribe.

Finally, Tim commented on the issue of how to communicate the science in this post-Cancun, post-climategate world. He went to a talk about how climate scientists should become much more emotional about communicating climate [Presumably the authors session the previous day]. Tim wanted to give his own read on this. There is a wide body of opinion that the cost of major emissions cuts is not justified given current levels of uncertainty in climate predictions (and this body of opinion has strong political traction). Repeatedly appealing to the precautionary principle, and our grandchildren, is not an effective approach. Opponents can bring out pictures of their own grandchildren, saying they don’t want them to grow up in a country bankrupted by bad climate policies.

We might not be able to move forward from the current stalemate without improving the accuracy of climate predictions. And are we (as scientists and government) doing all we possibly can to assess whether climate change will be disastrous, or something we can adapt to? Tim gives us 7/10 at present.

One thing we could do is to integrate NWP and seasonal to interannual prediction into this idea of seamless prediction. NWP and climate diverged in the 1960s, and need to come together again. If he had more time, he would talk about how data assimilation can be used as a powerful tool to test and improve the models. NWP models run at much finer resolution than climate models, but are enormously computationally expensive. So are governments giving the scientists all the tools they need? In Europe, they’re not getting enough computing resources to put onto this problem. So why aren’t we doing all we possibly can to reduce these uncertainties?

Update: John Baez has a great in-depth interview with Tim over at Azimuth.

To follow on from the authors session on Wednesday morning, Michael Oppenheimer, from Princeton, gave the inaugural Stephen Schneider Global Change Lecture, with a talk entitled “Scientists, Expert Judgment, and Public Policy: What is Our Proper Role?” (see the webcast here)

Michael’s theme was about how (and when) scientists should engage in broader communication in the public arena. His aim was to address three issues: the doubts that scientists often have about engaging in public communication, strategies for people who aren’t Carl Sagans (or Stephen Schneiders), and some cautionary tales about the difficulties.

First some context. There is a substantial literature on the relationship between scientists and broader communities, going at least back to CP Snow, and through Naomi Oreskes. CP Snow provides a good starting point. In the Two Cultures talk, Snow launched a diatribe against Britain’s educated elite. Strip away the critique of class structures, and you get an analysis of the difficulty most political leaders have in comprehending the science that sheds light on how the world is. There have been some changes since then – in particular, the culture of political leaders is no longer as highbrow as it used to be. Snow argued that the industrial revolution was a mixed bag, that brought huge inequalities. He saw scientists as wiser, more ethical, and more likely to act in the interest of society than others. But he also saw that they were poor at explaining their own work, making their role in public education problematic. One cannot prove that the world has taken a better path because of the intervention of scientists, but one can clearly show that scientists have raised the level of public discourse.

But science communication is hard to do. Messages are easily misunderstood, and it’s not clear who is listening to us, and when, or even whether anyone is listening at all. So why get involved? Michael began by answering the standard objections:

It takes time. Are we as a community obligated to do this? Can’t we stay in our labs while the policymakers get on with it? Answer: If we don’t engage, we leave Congress (for example) with the option of seeking advice from people who are less competent to provide it.

Can we minimize involvement just by issuing reports and leaving it at that? Answer: reports need to be interpreted. For example, the statement “warming of the climate system is unequivocal” was used in the IPCC AR4. But what does this mean? A reasonably intelligent person could ask all sorts of questions about what it means. In this case, the IPCC did not intend to say that both the fact of warming, and the fact of human attribution of that warming, are unequivocal. But last week at COP Cancun, that double meaning was widely used.

Well someone has to do the dirty work, but I’m not so good at it, so I’ll let others do it. Answer: we may no longer have this choice. Ask the people who were at the centre of climategate, many of whom were swept up in the story whether they liked it or not (and some of the people swept up in it were no more than recipients of some of the emails). We’re now at the centre of a contentious public debate, and it’s not up to the institutions, but to the people who make up those institutions to participate.

Do we have an obligation? Answer: Public money funds much of our work, including our salaries. Because of this, we have an obligation not just to publish, but to think about how others might use our research. We don’t spend enough time thinking about the scientific context in which our findings will be understood and used.

So like it or not, we cannot avoid the responsibility to communicate our science to broader audiences. But in doing this, our organisations need to be able to distinguish fair criticism from outside, where responding to it will strengthen our institutions, from unsupported attacks, which are usually met with silence.

What are our options?

Take a partisan position (for a candidate or a policy) that is tied to your judgement on the science. Probably this is not to everyone’s taste. People worry that being seen as partisan will damage science. But visible participation by scientists in the political process does not damage in any way the collective reputation of science and the scientific community. Problems occur when scientific credentials are used to support a political position that has nothing to do with the science. (Michael cited the example of Bill Frist using his credentials to make medical pronouncements for political reasons, on a case he wasn’t qualified to comment on). Make sure you are comfortable in your scientific skin if you go the political route.

Take sides publicly about the policy implications of your research (e.g. blog about it, write letters, talk to your congressperson, etc). This is a political act, and is based both on science and on other considerations. The further from your own expertise you go in making pronouncements, the shakier ground you are on. For example, it is far outside the expertise of most people in the room to judge the viability of different policy options on climate change. But if we’re clear what kind of value judgements we’re making, and how they relate to our expertise, then it’s okay.

Can we stop speaking as experts when we talk about value issues? The problem is that people wander over the line all the time without worrying about it, and the media can be lazy about doing due diligence on finding people with appropriate expertise. That doesn’t mean we shouldn’t take the opportunity to fix this. If you become concerned about an issue and want to speak out on it, do take the time to understand the relevant literature. Half-truths taken out of context can be the most damaging thing of all. And it’s intoxicating being asked to give expert opinions. So we need to keep our heads about us and be careful. For example make use of scripts provided by assessment reports.

We should not be reticent about expressing value judgements that border on our areas of expertise, but we should be clear that those value judgements don’t necessarily carry more weight than other people’s value judgements.

Participate in community activities such as the IPCC, NAS, panels, AGU outreach, etc. The emphasis we place on these implies some judgement. As more of us speak in public, there will be more open disagreement about the details (for example, different scientific opinions about likelihood of ice sheets melting this century). The IPCC doesn’t disparage divergent views, but it doesn’t tolerate people who don’t accept evidence-based assessments.

Avoid all of it (but is this even possible?). Even if you avoid sitting on panels where implications of the research will be discussed, or refuse to discuss applied aspects of your work, you’re still not safe, as the CRU email issue showed.

Above all, we all have a citizen’s right to express an opinion, and some citizens might think our opinions carry special weight because of our expertise. But we have no right to throw a temper tantrum because the policy choices don’t go the way we would like. Scientists are held in higher regard than most professional communities (although there isn’t much competition here). But we also need to be psychologically ready for when there are media stories about areas we are expert in, and nobody comes to seek our opinion.

We’re not a priesthood, we are fallible. We can contribute to the public debate, but we don’t automatically get a privileged role.

So some advice:

  • Don’t be rushed. Our first answer might not be our best answer – we often need to reflect. Michael pointed out the smartest response he ever gave a reporter was “I’ll call you back”.
  • Think about your audience in advance, and be prepared for people who don’t want to listen to or hear you. People tend to pick surrogates for expertise, usually ones who reflect their own worldview. E.g. Al Gore was well received among progressives, but on the right, people were attuned to other surrogates. Discordant threads often aren’t accommodated, they tend to be ignored. You could try putting aside moral principles while serving up the science, if your audience has a different ideological view to you. For example, if you disagree with the Wall Street Journal editorial stance on science, then adjust your message when speaking to people who read it – cocktail parties might be more important than universities for education.
  • Expect to be vilified, but don’t return the favour (Michael read out some of the hate mails he has received at this point). You might even be subjected to legal moves and complaints of misconduct. E.g. Inhofe’s list of scientists who he claimed were implicated in the CRU emails, and for whom he recommended investigation. His criterion seems to have been anyone involved in IPCC processes who ever received any of the CRU emails (even if they never replied). Some people on the list have never spoken out publicly about climate change.
  • Don’t hide your biases, think them over and lay them out in advance. For example, Michael once asked a senior colleague why he believed climate sensitivity was around 1.5°C rather than being in the 2 to 4.5°C range assessed by the national academies. He replied that he just didn’t think that humans could have that much impact on the climate. This is a belief though, rather than an evidence-based thing, and this should be clear up front, not hidden in the weeds.
  • Keep it civil. Michael has broken this rule in the past (e.g. getting into food fights on TV). But the worst outcome would be to let this divide us, whereas we’re all bound together by the same principles and ethics that underpin science.

And finally, to repeat Stephen Schneider’s standard advice: The truth is bad enough; Our integrity should never be compromised; Don’t be afraid of metaphors; and distinguish when speaking about values and when speaking as an expert.

I was particularly looking forward to two AGU keynote talks on Monday – John Holdren (Science and technology advisor to the President) and Julia Slingo (Chief Scientist at the UK Met Office). Holdren’s talk was a waste of time, while Slingo’s was fabulous. I might post later about what I disliked about Holdren’s talk (James Annan has some hints), and you can see both talks online:

Here are my notes from Julia’s talk, for those who want a shorter version than the video.

Julia started with the observation that 2010 was an unprecedented year of geophysical hazards, which presents some serious challenges for how we discuss and communicate about these, and especially how to communicate about risks in a way that’s meaningful. And as most geophysical hazards either start with the weather or are mediated through impact on the weather, forecasting services like the UK Met Office have to struggle with this on a daily basis.

Julia was asked originally to come and talk about Eyjafjallajökull, as she was in the thick of the response to this emergency at the Met Office. But in putting together the talk, she decided to broaden things to draw lessons from several other major events this year:

  • Eyjafjallajökull’s eruptions and their impact on European Air Traffic.
  • Pakistan experienced the worst flooding since 1929, with huge loss of life and loss of crops, devastating an area the size of England.
  • The Russian heatwave and the forest fires, which were part of the worst drought in Russia since records began.
  • The Chinese summer floods and landslides, which were probably tied up with the same weather pattern, and caused the Three Gorges Dam, only just completed, to reach near capacity.
  • The first significant space weather storm of the new solar cycle as we head into a solar maximum (and looking forward, the likelihood that major solar storms will have an impact on global telecommunications, electricity supply and global trading systems).
  • And now, in the past week, another dose of severe winter weather in the UK, along with the traffic chaos it always brings.

The big picture is that we are increasingly vulnerable to these geophysical events in an inter-dependent environment: Hydro-meteorological events and their impact on Marine and Coastal Infrastructures; Space Weather events and their impact on satellite communications, aviation, and electricity supply; Geological hazards such as earthquakes and volcanoes; and Climate Disruption and its impact on food and water security, health, and infrastructure resilience.

What people really want to know is “what does it mean to me?” and “what action should I take?”. Which means we need to be able to quantify exposure and vulnerability, and to assess socio-economic impact, so that we can then quantify and reduce the risk. But it’s a complex landscape, with different physical scales (local, regional, global), temporal scales (today, next year, next decade, next century), and responses (preparedness, resilience, adaptation). And it all exists within the bigger picture on climate change (mitigation, policy, economics).

Part of the issue is the shifting context, with changing exposure (for example, more people live on the coast, and along rivers) and changing vulnerability (for example our growing dependency on communication infrastructure, power grids, etc).

And forecasting is hard. Lorenz’s work on chaotic systems has become deeply embedded in meteorological science, with ensemble prediction systems now the main weapon for handling the various sources of uncertainty: initial condition uncertainty, model uncertainty (arising from stochastic unresolved processes and parameter uncertainty), and forecast uncertainty. And we can’t use past forecast assessments to validate future forecasts under conditions of changing climate. The only way to build confidence in a forecast system is to do the best possible underpinning science, and go back to the fundamentals, which means we need to collect the best observational data we can, and think about the theoretical principles.
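For anyone who hasn't seen why initial condition uncertainty forces you into ensembles, here is a small sketch using the classic three-variable Lorenz (1963) system with its standard parameters. The starting state, the size of the perturbations (one part in a million), the ensemble size of ten and the fourth-order Runge-Kutta integrator are all my own choices for illustration; nothing here comes from an operational system.

```python
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) equations."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = lorenz63(state)
    k2 = lorenz63(state + 0.5 * dt * k1)
    k3 = lorenz63(state + 0.5 * dt * k2)
    k4 = lorenz63(state + dt * k3)
    return state + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def run(state, n_steps, dt=0.01):
    """Integrate forward n_steps from the given state and return the final state."""
    for _ in range(n_steps):
        state = rk4_step(state, dt)
    return state

rng = np.random.default_rng(42)
analysis = np.array([1.0, 1.0, 1.0])          # hypothetical "best guess" initial state
members = [analysis + 1e-6 * rng.standard_normal(3) for _ in range(10)]

# Ensemble spread (largest standard deviation over x, y, z) at three lead times.
spreads = {}
for n_steps in (100, 1000, 3000):
    finals = np.array([run(m, n_steps) for m in members])
    spreads[n_steps] = float(finals.std(axis=0).max())
    print(f"after {n_steps} steps: spread = {spreads[n_steps]:.3g}")
```

The members start out indistinguishable and end up scattered across the attractor, so a single deterministic run at long lead time tells you very little on its own.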


Eyjafjallajökull

This shouldn’t have been unusual – there are 30 active volcanoes in Iceland, but they’ve been unusually quiet during the period in which aviation travel has developed. Eyjafjallajökull began to erupt in March. But in April it erupted through the glacier, causing a rapid transfer of heat from magma to water. A small volume of water produces a large volume of steam and very fine ash. The eruption then interacted with unfortunate meteorological conditions, which circulated the ash around a high pressure system over the North Atlantic. The North Atlantic Oscillation (NAO) was in strong negative phase, which causes the Jet stream to make a detour north, and then back down over UK and Western Europe. This pattern caused more frequent negative blocking NAO patterns from February through March, and then again from April through June.

Normally, ash from volcanoes is just blown away, and normally it’s not as fine. The Volcanic Ash Advisory Centres (VAACs) are responsible for managing the risks. London handles a small region (which includes the UK and Iceland), but if ash originates in your area, it’s considered to be yours to manage, no matter where it then goes. So, as the ash spread over other regions, the UK couldn’t get rid of responsibility!

To assess the risk, you take what you know and feed it into a dispersion model, which then is used to generate a VAAC advisory. These advisories usually don’t say anything about how much ash there is, they just define a boundary of the affected area, and advise not to fly through it. As this eruption unfolded, it became clear there were no-fly zones all over the place. Then, the question came about how much ash there was – people needed to know how much ash and at what level, to make finer grained decisions about flying risk. The UK VAAC had to do more science very rapidly (within a five day period) to generate more detailed data for planning.

And there are many sources of uncertainty:

  • Data on ash clouds is hard to collect, because you cannot fly the normal meteorological aircraft into the zone, as they have jet engines.
  • Dispersion patterns. While the dispersion model gave very accurate descriptions of ash gradients, it did poorly on the longer term dispersion. Normally, ash drops out of the air after a couple of days. In this case, ash as old as five days was still relevant, and needed to be captured in the model. Also, the ash became very stratified vertically, making it particularly challenging for advising the aviation industry.
  • Emissions characteristics. This rapidly became a multidisciplinary science operation (lots of different experts brought together in a few days). The current models represent the release as a vertical column with no vertical variation. But the plume changed shape dramatically over the course of the eruption. It was important to figure out what was exiting the area downwind, as well as the nature of the plume. Understanding dynamics of plumes is central to the problem, and it’s a hard computational fluid dynamics problem.
  • Particle size, as dispersion patterns depend on this.
  • Engineering tolerances. For risk-based assessment, we need to work with aircraft engine manufacturers to figure out what kinds of ash concentration are dangerous. A detailed risk assessment for exceeding engine safety thresholds was needed.

Some parts of the process are more uncertain than others. For example the formation of the suspended ash plume was a major source of uncertainty, and the ash cloud properties led to some uncertainty. The meteorology, dispersion forecasts, and engineering data on aircraft engines are smaller sources of uncertainty.

The Pakistan Floods

This is more a story of changing vulnerability rather than changing exposure. It wasn’t unprecedented, but it was very serious. There’s now a much larger population in Pakistan, and particularly more people living along river banks. So it had a very different impact to the last similar flooding in the 1920s.

The floods were caused by a conjunction of two weather systems – the active phase of the summer monsoon, in conjunction with large amplitude waves in mid-latitudes. The position of the sub-tropical jet, which is usually well to the north of the Tibetan Plateau, made a huge turn south, down over Pakistan. It caused exceptional cloudbursts over the mountains of western Pakistan.

Could these storms have been predicted? Days ahead, the weather forecast models showed unusually large accumulations – for example 9 days ahead, the ECMWF showed a probability of exceeding 100mm over four days. These figures could have been fed into hydrological models to assess impact on river systems (but weren’t).

The Russian heatwave

Whereas Eyjafjallajökull was a story of changing exposure, and Pakistan was a story of changing vulnerability, it’s likely that the Russian heatwave was a story of changing climate.

There were seasonal forecasts, and the heatwaves were within the range of the ensemble runs, but nowhere near the ensemble mean. For example, the May 2010 seasonal forecast for July showed a strong warm signal over Russia in the ensemble mean. The two warmest forecasts in the ensemble captured very well the observed warm pattern and intensity. It’s possible that the story here is that use of past data to validate seasonal forecasts is increasingly problematic under conditions of changing climate, as it gives a probability density function that is too conservative.

More importantly, seasonal forecasts of extreme heat are associated with blocking and downstream trough. But we don’t have enough resolution in the models to do this well yet – the capability is just emerging.

We could also have taken these seasonal forecasts and pushed them through to analyze impact on air quality (but didn’t).

And the attribution? It was a blocking event (Martin Hoerling at NOAA has a more detailed analysis). It has the same cause as the European heatwaves in 2003. It’s part of a normal blocking pattern, but amplified by global warming.

Cumbrian floods

From 17-20 November 2009, there was unprecedented flooding (at least going back 2 centuries) in Cumbria, in the north of England. The UK Met Office was able to put out a red alert warning two days in advance for severe flooding in the region. It was quite a bold forecast, and they couldn’t have done this a couple of years ago. The forecast was possible from the high resolution 1.5km UK Model, which was quasi-operational in May 2009. Now these forecasts are on a scale that is meaningful and useful to hydrologists.


We have made considerable progress on our ability to predict weather and climate extremes, and geophysical hazards. We have made some progress on assessing vulnerability, exposure and socio-economic impact, but these are a major limiting factor in our ability to provide useful advice. And there is still major uncertainty in quantifying and reducing risk.

The modelling and forecasting needs to be done in a probabilistic framework. Geophysical hazards cross many disciplines and many scales in space and time. We’re moving towards a seamless forecasting system, that attempts to bridge the gap between weather and climate forecasting, but there are still problems in bridging the gaps and bridging the scales. Progress depends on observation and monitoring, analysis and modelling, prediction and impacts assessment, and handling and communicating uncertainty. And dialogue with end users is essential – it’s very stimulating, as they challenge the science, and they bring fresh thinking.

And finally, a major barrier is access to supercomputing power – we could do so much more if we had more computing capability.

I spent most of Wednesday attending a series of sessions featuring bestselling authors from the AGU Global Environmental Change division. The presenters were authors of books published in the last couple of years, all on various aspects of climate change, and all aimed at a more general audience. As the chairs of the track pointed out, it’s not news when an AGU member publishes a book, but it is news when so many publish books aimed at a general audience in a short space of time  – you don’t normally walk into a bookstore and see a whole table of books authored by AGU members.

As the session unfolded, and the authors talked about their books, and their reasons for writing them, it became clear that there’s a groundswell here, of scientists who have realised that the traditional mode by which science gets communicated with the broader society just isn’t working with respect to climate change, and a different approach is needed, along with a few from outside the climate science community who have stepped in to help overcome the communication barrier.

The first two books were on geoengineering. Unfortunately, I missed the first, Eli Kintisch’s “Hack the Planet: What we Talk About When we Talk About Geoengineering”, and the second speaker, Jeff Goodell, author of “How to Cool the Planet”, didn’t make it. So instead, I’ll point to the review of both books that appeared in Nature Reports back in April. As the review makes clear, both books are very timely, given how little public discussion there has been on geoengineering, and how important it is that we think much more carefully about this because we’re likely to be approaching a point where people will attempt geoengineering in desperation.

One interesting point made in the Nature Reports review is the contrast in styles, between Eli’s book, which is much more of a science book suitable for a scientifically literate audience, and which digs deeper into how various geoengineering proposals might work, versus Jeff’s book, which is more lively in style, illustrating each chapter through the work of a particular scientist.

This theme of how to get the ideas across, and especially how to humanize them, came out throughout the session as the other authors presented their experiences.

In place of Jeff’s talk, Brian Fagan, author of “The Great Warming: Climate Change and the Rise and Fall of Civilizations”, filled in. Brian is an anthropologist by training, but has focussed much of his career on how to explain research in his field to a broader audience. As snippets from his book, Brian gave a number of examples of how human civilization in the past has been affected by changing climate. He talked about how a warmer European climate in medieval times allowed the Vikings to explore widely across the North Atlantic (in open boats!), and how the Mayan civilization, which lasted from 200BC to 900AD, was eventually brought down by a series of droughts. The Mayans took water very seriously, and many of their rituals focussed on water (or lack of it), while the Mayan pyramids also acted as water towers. In the late 19th Century, the Indian Monsoons failed, and millions died, at a time when the British Raj was exporting rice from India to bring down food prices in Europe.

The interesting thing about all these examples is that it’s not the case that climate change causes civilization to fall. It’s more like the ripples spreading out from a stone dropped into a calm pool – the spreading ripples are the social and economic consequences of climate changes, which in some cases make adaptation possible, and in other cases lead to the end of a civilization.

But most of what Brian wanted to talk about was why he wrote the book in the first place, or rather why he got involved in communicating issues such as climate change to broader audiences. He taught as a professor for 36 long(!) years. But he was strongly affected by experiences at the beginning of his career, in his early 20s, when he spent a year in the Zambezi valley. Here, rainfall is unpredictable, and when the rains don’t come people starve. He’s thought a lot since then about the experience. More recently, seeing the results of the Hadley Centre models that forecast increasing droughts through the next few decades, he realised that the story of drought in human history needed to be told.

But there’s a challenge. As an academic, from a research culture, you have to deal with the “publish or perish” culture. If we want to reach the public, something has to change. The NSF doesn’t provide research funds to explain to the public what we do. So he had to raise money by other means to fund his work, mostly from the private sector. Brian made much of this contrast – studies of (faintly disgusting) ancient artefacts for their own sake are fundable, but attempts to put this work in context and tell the larger stories are not. Brian was accused by one University administrator of doing “inappropriate research”. And yet, archeology is about human diversity – about people, so telling these stories about human diversity ought to be central to the field.

Having written the book, he found himself on the bestseller lists, and got onto the Daily Show. This was quite an experience – Jon Stewart reads everything in the book, and he sits right up close to you and is in your face. Brian’s comment was “Thank god I had taught graduate seminars”, which had given him experience in dealing with probing questions.

His other advice was that if you want to reach out, you have to know why. People will ask, and “because I love it” isn’t enough – you have to have a really good reason. Always think about how your work relates to others and to wider society. Use your research to tell stories, write clearly, and draw on personal experience, which is very important. But above all, you must have passion – there is no point writing for a wider audience without it.

The next talk was by Claire L. Parkinson, author of “Coming Climate Crisis? Consider the Past, Beware the Big Fix”. Claire’s motivation for writing the book was her concerns about geoengineering, and the need to explain the risks. She mentioned that if she’d realised Eli and Jeff were writing their books, she probably wouldn’t have.

She also felt she needed to deal with the question about how polarized and confused the community has become about climate change. Her goal was to lessen the confusion and to encourage caution about geoengineering. A central message of the book is that the earth’s climate has been changing for 4.6 billion years, but humans were not around for most of this. Climate change can happen much more abruptly than what humans have experienced. And in the face of abrupt climate change, people tend to assume geoengineering can get us out of the problem. But geoengineering can have serious unintended consequences, because we are not all-knowing, no matter how good our models and analysis are.

Claire gave a quick, chapter-by-chapter overview of the book: Chapter 2 gives an overview of 4.6 billion years of global changes, including tectonics, extra-terrestrial events, changes in orbit, etc; Chapter 3 covers abrupt climate changes, putting the last 20 years in comparison with the historical record from ice cores, with the key point being that the earth’s system can and does change abruptly, with the beginning and end of the Younger Dryas period as the most obvious examples. Chapter 4 is a short history of human impacts on climate. The big impacts began with human agriculture, and with the industrial revolution.

Chapter 5 looks at the future, and the consensus view that the future looks bleak if business as usual continues. The IPCC scenarios show consequences of warming over the coming century. In this chapter, Claire also included a section at the end about scientists who disagree with the IPCC assessment. Her feeling is that we shouldn’t be disrespectful to the skeptics, because we might not be right. However, she has been criticized for this [see for example, Alan Robock’s review, which explains exactly what he thinks is wrong about this approach].

The next few chapters then explore geoengineering. Chapter 6 looks at things that were done in the past with good intentions, but went wrong. An example is the introduction of prickly pear cactus into Australia. Within decades it had grown so profusely that areas were destroyed by it and homesteads had to be abandoned. Chapter 7 explains the commonly touted geoengineering schemes, including space mirrors, carbon capture and sequestration, white roofs (which actually make sense), stratospheric sulfates, artificial trees, and ocean fertilization. Chapter 8 covers examples of attempts at a smaller scale to change the weather, such as cloud seeding, lessening hailstorms, and attempts to tame hurricanes (Jim Fleming, the next speaker had many more examples). These examples demonstrate lots of interest and ingenuity, but none were really successful, and therefore they provide a cautionary tale.

The last three chapters are also cautionary: just because we have a scientific consensus doesn’t mean we’re right. It’s unfortunate that people express things with 100% certainty, because it gives the impression that we’re not open minded scientists. Chapter 10 is on climate models – no matter how wonderful they are, and no matter how wonderful the data records are, neither is perfect. So the models might provide misleading results, for example arctic sea ice has declined far faster than the models predicted. Chapter 11 is on the social pressures, and was the toughest chapter to write. There is both peer pressure and media pressure to conform to the consensus. Most people who got into the earth sciences in Claire’s generation never expected their work to have strong public interest. Scientists are now expected to provide soundbites to the media, which then get distorted and cause problems. Finally, chapter 12 looks at the alternatives – if geoengineering is too risky, what else can we do?

The next speaker was Jim Fleming, author of “Fixing the Sky: Why the History of Climate Engineering Matters”. Jim is a historian, and points out that most history of science books are heroic stories, whereas this book was his first tragicomedy. Throughout the book, hubris (on the part of the scientists involved) is a strong theme.

As an aside, Jim gave a simple reason why you should be nice to historians, best captured in the Samuel Johnson quote “God can’t alter the past, but historians can”. He also pointed out that we should take heed of Brundtland’s point that current environmental crises require that we move beyond scientific compartmentalization, to draw the very best of our intellectual reserves from every field of endeavour.

Jim was the only historian invited to a NASA meeting at Ames in 2007, on managing solar radiation. He was rather amused when someone got on the mic to apologise for the problems they were having managing the temperature in the meeting room (and here they were, talking about managing the planet’s climate!). There were clearly some serious delusions among the scientists in the room about the prospect. As a result, he wrote an essay, “The Climate Engineers”, which was published in Wilson Quarterly, but was clearly a bit too short to do justice to the topic.

So the book set out to bring these issues to the public, and in particular the tragic history of public policy in weather and climate engineering. For climate change and geoengineering, people have been claiming we don’t have a history to draw on, that we are the first generation to think about these things, and that we don’t have time to ponder the lessons of history because the problem is too urgent. Jim says otherwise – there is a history to draw on, and we have to understand this history and learn from it. If you don’t study history, everything is unprecedented!!

Geoengineering will alter relationships, not just between humans and climate, but among humans. If you think someone else is modifying your climate, you’re going to have a fundamentally altered relationship with them! He gave some fascinating anecdotes to illustrate this point. For example, one of the NCAR gliders was attacked with a Molotov cocktail – it turns out people thought they were “stealing the sky-water”, while in fact, the reason they were using a glider was to minimize the impact on clouds.

An early example of attempts to manage the weather is the work of James Espy, who, having studied volcanoes, realized there’s always more rain after an eruption. In 1839, he proposed we should burn large fires across the Appalachians to make more rain and to purify the air (because the extra rain would wash out the “miasmas”).

About the same time, Eliza Leslie wrote a short story “The Rain King”, which captures many of the social dynamics of geoengineering very well. It’s the story of the opening of a new Rain Office, which has the machinery to control the weekend weather, and sets up a democratic process for people to vote on what weather they want for the weekend. The story is brilliant in its depiction of the different petitioners, and the cases they make, along with the biases of the rain office staff themselves (they want to go for rain to test the machinery), the eventual trumping of them all by a high society lady, and the eventual disappointment of everyone concerned at the outcome of the process.

Another example focusses on Wexler (von Neumann’s right hand man) and the story of numerical computing in the 1940s and 1950s. At the time, one could imagine decommissioned WW2 flight squadrons going out to bomb a developing hurricane to stop it. Wexler and von Neumann both endorsed this idea. von Neumann’s 1955 essay “Can we survive technology?” warned that climate control could lead to serious social consequences. Meanwhile, Wexler was concerned with other ways of fighting the Russians, opening up access to space, etc. While studying the Weather Watch program, he explored how rocket trails affect the ozone layer, and explored the idea of an ozone bomb that could take out the ozone layer, as well as weapons that could warm or cool the planet.

James Van Allen, discoverer of the Van Allen belt, was also a geoengineer. He explored ways to change the earth’s magnetic field using A-bombs. His work was mainly focussed on “bell ringing” to test the impact of these bombs on the magnetic field. But there were also attempts to weaponize this, e.g. to cause a magnetic storm over Moscow.

Jim wrapped up with a crucial point about tipping points: if we attempt to tip the earth, where will it roll? If we do end up trying geoengineering, we will have to be interdisciplinary, international, and intergenerational about it.

The next speaker was Edward Parson, co-author with Andy Dessler of “The Science and Politics of Global Climate Change: A Guide to the Debate”. The book is a broad overview of climate science, intended as a teaching resource. The collaboration in writing the book was interesting – Andy is an atmospheric scientist, Edward is an expert in climate policy. But neither knew much about the other discipline, so they had to collaborate and learn, rather than just dividing up the chapters. This meant they ended up looking in much more detail at the interactions between the science and the politics.

It was hard to navigate a path through the treacherous waters of communicating the scientific knowledge as a basis for action: not just what we know, but how we know it and why we know it. In particular, they didn’t want to over-reach, to say scientific knowledge by itself is sufficient to know what to do in policymaking. Rather, it requires a second step, to specify something you wish to do, or something you wish to avoid, in order to understand policy choices. With climate change it has become much easier to demonstrate to anyone with a rational approach (as opposed to those who do magical thinking) that there are very clear arguments for urgent policy action, but you have to make this second step clear.

So why does everyone try to frame their policy disagreements as scientific disagreements? Edward pointed out that in fact most people are just doing “evidence shopping”, on one side or another. He’s been to many congressional hearings, where intelligent, thoughtful legislators, who are quite ignorant about the science, pound the table saying “the science says this, the science says that”. Scientific assessment processes are an important weapon in curtailing this evidence shopping. They restrain the ability of legislators to misuse the science to bolster their preferred policy response. A scientific assessment process is not the same as collective authorship of a scientific paper. Its purpose is to assemble and survey the science.

Many of the fights over climate policy can actually be understood as different positions on how to manage risks under uncertainty. Many of these positions take an extreme stance on management of risk. Some of this can be traced back to the 1970s, when it was common for advocates to conflate environmental issues with criminal law. For example, a manufacturer of CFCs, arguing against action to protect the ozone layer, asked “what happened to the presumption of innocence?”, while ignoring the fact that chemicals aren’t humans.

In criminal proceedings, there are two ways to be wrong – you can convict the innocent, or release the guilty. We have a very strong bias in favour of the defendant, because one of these errors is regarded as much more serious than the other – we always try to err on the side of not convicting innocent people. This rhetoric of “the burden of proof” and “presumption of innocence” has faded in environmental issues, but its legacy lives on. Now we hear lots of rhetoric about “science-based” policy, for example the claim that the Kyoto Protocol isn’t based on the science. In effect, this is the same rhetorical game, with people demanding to delay policy responses until there is ever more scientific evidence.

But science is conservative in this, in the same way that criminal law is. As a scientist, it is much worse to be promiscuous in accepting new scientific claims that turn out to be wrong, than it is to reject new claims that turn out to be right, largely because of the cost of getting it wrong, and directing research funds to a line of research that doesn’t bear fruit.

When there are high stakes for managing public risk, this perception about the relative magnitude of the cost of the two types of error no longer applies. So attacks on Kyoto as not being exclusively based on the science are technically correct, but they are based on an approach to decision making that is dangerously unbalanced. For example, some people say to assessment bodies, “don’t even tell me about a risk until you have evidence that allows you to be absolutely certain about it”. Which is nuts – it’s the role of these bodies to lay out the risks, lay out the evidence and the uncertainties, so that policymaking can take them into account.

Much of the book ended up being a guide for how to use the science in policy making, without making biasing mistakes, such as these recklessly risky demands for scientists to be absolutely certain, or demands for scientists to suppress dissent. But in hindsight, perhaps they punted a little on how to solve these problems. Also, the book does attempt to address some of the claims of climate change deniers, but it’s not always possible to keep up with the silly things people are saying.

Edward finished by saying he has long wished for a book you could give to your irritating uncle, who is a smart guy with forceful opinions, but who gets his knowledge on climate change from Fox News and climate denialist blogs. The feedback is that the book does a good job on this. It’s a shame that the denialist movement has appropriated and sullied the term “skeptic”, which is really what science is all about.

The next speaker was Naomi Oreskes, co-author (with Erik Conway) of “Merchants of Doubt”. Naomi titled her talk “Are debatable scientific questions debatable?”, a title taken from a 2000 paper by John Ziman, who points out there is a big contrast between debate in politics and debate in science, and this difference disadvantages scientists.

In political debates, debate is adversarial and polarized, aimed typically at deciding simple yes/no decisions. In science, we seek out intermediate positions, multivalent arguments, and consider many different hypotheses. And there is no simple voting process to declare a “winner”.

More importantly, “scientific debates” generally aren’t about the science (evidence, findings) at all, they are about trans-scientific issues, and cannot be resolved by doing more science, nor won by people with more facts. Naomi argues that climate change is a trans-science issue.

When they wrote Merchants of Doubt, they were interested in why there is such a big gap between the scientific consensus and the policy discussions. For example, 18 years after the UN Framework Convention on Climate Change, the world still has not acted on it in any significant way. In 2007, the IPCC said the warming is unequivocal. But opinion polls showed a vast majority of the [American] population didn’t believe it. At the same time as scientific consensus was developing on climate change, a politically motivated consensus to attack the science was also developing. It focussed on credible, distinguished scientists who rejected the work of their own colleagues, and made common cause with the tobacco and fossil fuel industry.

Central to the story is the Marshall Institute, which has been denying the science since the 1980s. It was founded by three physicists, Seitz, Jastrow, and Nierenberg. All three had built their careers in cold war weaponry. They founded the Marshall Institute to defend the Strategic Defense Initiative (SDI), which was extremely controversial at the time in the scientific community. 6500 scientists and engineers signed a boycott of the program funds, a move that was historically unprecedented in the cold war era. In anger at this boycott, Jastrow wrote an article in 1987 entitled “America has Five Years Left”, warning about Soviet technical supremacy (and there’s a prediction that didn’t come true!). Jastrow was also working for the Reynolds corporation, whose principal strategy to fight increasing tobacco regulation was to cast doubt on the science that linked tobacco smoke to cancer. An infamous tobacco industry memo boasted that “Doubt is our product”.

You might have thought that after the collapse of the Soviet Union, these old cold warriors would have retired, happy that America had won. But they found a new enemy: environmental extremism. They applied the tobacco strategy, but they needed credible scientists to promote doubt. In every case, they argued that the scientific evidence was not strong enough to lead to government action.

Why did they do it? It wasn’t for money, nor for scientific concerns. They did it because they shared the political ideology that Soros calls “free market economy”. This brand of neo-liberalism was first widely promoted by Thatcher and Reagan, but also lives on even in the policies of left-leaning politicians such as Tony Blair. The ideology is based on the work of Milton Friedman. The problem, of course, is that environmentalists generally argue for regulation, but to the neo-liberal, regulation is one step to governmental control of everything.

This ideological motivation is clear in Singer’s work on the EPA ruling that second-hand smoke is a carcinogen. Independent expert reviews had concluded that second-hand smoke was responsible for 150,000 to 300,000 deaths. So why would a rocket scientist defend the tobacco industry? Singer lays it out clearly in his report: “If we do not carefully limit government control…”

These people tend to refer to environmentalists as “watermelons” – green on the outside, red on the inside. And yet the history of American environmentalism traces back to the work of Roosevelt and Rockefeller. For example, the 1964 Wilderness Act was clearly bi-partisan – it passed Congress with a vote of 373-1. Things began to change in the 1980s, when scientific evidence revealed problems such as acid rain and the ozone hole that seemed to require much greater government regulation, just as Reagan was promoting the idea of less government.

Some environmentalists might be socialists, but this doesn’t mean the science is wrong. But it does mean that there is a problem with our economic system as we know it. It’s due to “negative externalities” – costs of economic activity that are not borne by those reaping the profits. Stern described climate change as “the greatest market failure ever”. In fact, acid rain, the ozone hole and climate change are all market failures, and it’s science that revealed this.

It seems pretty clear that all Americans believe in liberty, and prefer less intrusion by government. But at the same time, all societies accept there are limits to their freedoms. The debate, then, is on where these limits should lie, which is clearly not a scientific question.

If this analysis is correct, then we should focus not on more evidence that the science is unequivocal, nor on collecting more evidence that there is a consensus among scientists. What we need is more vivid portrayals of what will happen.

The next talk was by Wally Broecker, about his latest book, The Great Ocean Conveyor. He said he wrote the book partly because he loves to write books, and partly because he’s been encouraged to speak out more on global warming, especially to young people. He wrote it in 3 months, but it took about a year to get published. Which is a shame, because in a fast moving science, things go out of date very quickly.

Students have a tendency to think everything in their textbooks is gospel. But of course this is not true – the science moves on. In the book, Wally shows that many of the things he originally thought about the ocean conveyor turned out not to be correct.

The first diagram showing the conveyor was produced from a sketch for a magazine article. Wally never met the artist, and the diagram is wrong in many ways, but it does get across the idea that the ocean is an interconnected system.

The ocean conveyor idea was discovered by serendipity. A series of meetings were held to examine the new data coming from the Greenland ice cores. On seeing graphs showing the CO2 record against ice depth, Wally wondered how the wide variations in the CO2 record could be explained. He focussed on the North Atlantic, exploring whether the CO2 could have got in and out of the atmosphere through changes to the ocean overturning. Eventually he stumbled on the idea of the ocean conveyor.

I was particularly struck by the map Wally showed of world river drainage, showing that the vast majority of the world’s landmasses drain into the Atlantic. This drainage pattern, together with condensation from warm tropical seas, causes large changes in salt concentration, which in turn drive ocean movements because saltier water is heavier and sinks, while less salty water rises.

There are still a number of mysteries to be solved. For example, what caused the Younger Dryas event? Wally was a proponent of the theory that a break in ocean overturning occurred when Lake Agassiz broke through to drain into the Atlantic, dramatically changing the salinity. But no evidence of this flood has been found, so he’s had to abandon this idea. Some argue that the flood might have gone in a different direction (e.g. to the Gulf of Mexico). Or it could all have been due to a meteorite. It’s a big remaining problem – what caused it?

The next talk was by Dorothy Kenny, “Seeing Through Smoke: Sorting through the Science and Politics in the Making of the 1956 British Clean Air Act”. She hasn’t published this study yet, but is hoping to find a publisher soon. The story starts on December 5th, 1952. A “pea soup” fog covers London. White shirts turn grey. Streetcars are abandoned in the street. The smog lasts until the 9th, and newspapers start to tot up the growing death count. Within a week, 4,000 people were dead. Three months later, the death toll had risen to 12,000 people. The smogs had become killers.

By July 1953, the UK government had formed a Committee on Air Pollution. In December 1953, it presented an interim report on the cause and effect of the smogs (but with no policy prescriptions). A year later, it produced a final report with plans for action, and in 1956 the Clean Air Act was finalized and passed by parliament.

What was needed for this act to pass? Dorothy laid out three factors:

  1. Responsibility had to be established. Who was responsible for acting? Three different Ministries (Health; Housing and Local Govt; and Fuel and Power) all punted, each pointing at the Department of Scientific and Industrial Research (DSIR). But DSIR hadn’t looked into it, citing a lack of funding, and a lack of people. The formation of the Beaver Committee fixed this – the committee could become the central body for public discontent. They were anxious to get something published by the first anniversary of the smog, in part responding to the need for a ritual response to show that government is doing something.
  2. The problem needed to be defined and described. The interim report identified sulphur dioxide and visible smoke as the main culprits, both from coal. The media criticized the report, because it didn’t propose a solution, and just told people to stay indoors on smog days. There was widespread fear of another killer smog and the public wanted a plan of action.
  3. Possible solutions to the problem needed to be discussed and weighed up. A cost-benefit analysis was used in the final report to include and exclude policy solutions. In the end, the Clean Air Act focussed on particulate matter, and left out any action on sulphur dioxide. It promoted smokeless fuel, which was a huge cultural change, taking away the traditional British coal fire, and replacing it with a new, strange fuel. Even the public pamphlets at the time hid the role of SO2, eliding it from graphs showing the impacts of the smogs. Why was SO2 excluded? Largely because of technical limitations. The available approaches for removing SO2 from coal were deemed impractical: flue gas washing, which involves flushing river water through the flues and dumping it back into the rivers, was highly polluting; while coal washing was ineffective, as there was no method at the time to get rid of the sulphur. The committee argued that solutions deemed not practical could not be included in the legislation.

What lessons can be drawn from the Clean Air Act? First, that environmental policy is exceedingly complex. Second, policy doesn’t necessarily have a short term outcome. Third, even with loopholes and exclusions, the act was effective in setting the framework for dark smoke prevention. And finally, a change in public perception was crucial.

Next up was Jim Hansen, talking about his book “Storms of My Grandchildren: The Truth about the Coming Climate Catastrophe and Our Last Chance to Save Humanity”. (Lots of extra people flowed into the room for this talk!)

Jim gave a thoughtful account of his motivations, in particular the point that climate change is much more than a scientific matter. He has been doing science all his life, but it is only in the last few years that his grandchildren have dragged him into other aspects. Most especially, he’s motivated by the thought that he doesn’t want his grandchildren to look back and say “Grandpa understood the problem but didn’t do enough to make it clear”.

One thing he keeps forgetting to mention when talking about the book: all the royalties go to 350.org, which Jim believes is probably the most effective organization right now pushing for action.

Jim argues that dealing with climate change is not only possible, but makes sense for all sorts of reasons. But lots of people are busy making money from business as usual, and in particular, all governments are heavily invested in the fossil fuel industry.

Jim had testified to Congress in the 1980s, and got lots of attention after this, but he decided he didn’t want to get involved in this public aspect. So he referred requests from the media to other scientists who he thought would enjoy the public visibility more. Then, in 1990, after a newspaper report called him the grandfather of global warming, he used a photo in one of his talks of his first grandchild, Sophie, at age 2, to demonstrate that he was at least a grandfather, if not of global warming.

Later, he was invited to give a talk in Washington which for various reasons never happened, so he gave it instead as a distinguished lecture at the University of Iowa. In the talk, he used another photo of his grandchildren, to make a point about public understanding. It shows Sophie explaining greenhouse gas warming to her baby brother, with the caption “It’s 2W/m² forcing”. But baby Connor only counts to 1.

Just before the talk, he got a memo from NASA saying not to give the talk, as it could violate policy. He ignored the message, and gave the talk anyway, as he had paid his own way to get there for a vacation. A year later in 2005, Keeling invited him to give another talk, and for this he decided to connect the dots between special interests seeking to maximize profits and the long term economic wellbeing of the country. This talk gave rise to the “shitstorm at NASA HQ”, and the decision to prevent him from talking to the media. He managed to get the ban lifted by talking about it to the NY Times. But even that story was presented wrongly in the press – it wasn’t a 24 year-old appointee at NASA public relations, but a decision from very high up in NASA headquarters.

Then, in 2007, Bill McKibben started asking what is a safe level for carbon dioxide concentrations in the atmosphere. Bill was going to start an organisation called 450.org, based on Hansen’s work. But by 2007, it was becoming clear that even 450ppm might still be disastrous. Jim told him to wait until the AGU2007 fall meeting, when he would present a new paper with a new number. The analysis showed that if we want to keep a planet similar to the one in which civilization developed, we need to get back below 350ppm. This is feasible if we phase out coal over the next two decades and leave the oil sands untouched. But the US has just signed an agreement for a pipeline from the Alberta tar sands to Texas refineries. The problem is that there’s a huge gap between the rhetoric of politicians and their policies, which are just small perturbations from business as usual.

Now he has two more grandchildren. Jim showed a photo of Jake at 2.5 years, showing he thinks he can protect his baby sister. But of course, Jake doesn’t understand there is more warming in the pipeline. The issue is really about inter-generational justice, but the public doesn’t understand this. It’s also about international justice – the developed countries have become rich by burning fossil fuels, but are unwilling to admit this. Fossil fuels are the cheapest source of energy, but only because nobody is obligated to pay for the damage caused.

Jim’s suggested solution is a fee at the point of energy generation, to be distributed to all people in the country (sometimes known as fee and dividend). It would stimulate the economy by putting money into people’s hands. He believes cap-and-trade won’t work, because industry, and China and India, won’t accept a cap. Cap-and-trade also keeps the issue very close to (and under control of) the fossil fuel industry.

So what are young people supposed to do? Recently, the young people in Britain who blocked a coal plant were convicted, and are likely to serve a jail term. Jim’s first grandchild, Sophie, now 12, wrote a letter to Obama, which includes phrases like “why don’t you listen to my grandfather?”. It’s rather a good letter. Young people need positive examples of things like this that they can do.

Jim ended his talk on a couple of notes of optimism:

  • China is behaving rationally. There is good chance they will put a price on carbon, and they are making enormous investments in carbon-free energy.
  • The legal approach is promising. The judicial branch of the US government is less influenced by fossil fuel money. We can sue the government for not doing its job!

The next speaker was Heidi M. Cullen, talking about her book “The Weather of the Future: Heat Waves, Extreme Storms, and Other Scenes from a Climate-Changed Planet”. Heidi set out to walk through the process of writing a book. She works for a non-profit group, Climate Central, aimed at communicating the science to the general public.

Heidi worked for many years as a climatologist for the Weather Channel, where she found it very hard to explain climate change to people who don’t understand the difference between climate and weather. When Hurricane Katrina hit, she felt like a loser. It was the biggest story of the year, and as a climatologist, there was very little she could say about this tragic, terrible event. It was too hard amongst all the human tragedy to connect the dots and provide the context. But the experience planted the seed for the book, because it was a big climate change story – scientists had been saying for 20 years how vulnerable New Orleans was, and the disaster could have been prevented. And this story needed to be told.

So the book was designed to tell the history – showing it goes all the way back to Arrhenius, not just something that started in the 1980s with Hansen’s testimony to Congress. And to tell the story of the science as a heroic endeavour, looking at the research that scientists are doing now, and how it fits into the story.

A recent poll showed that less than 18% of Americans know a scientist personally. So an important premise for the book was an attempt to connect the public more with scientists and their work. Heidi began by emailing all the climate scientists she knew, asking them if they had to pick the hotspots in the science, what would they pick.

It was a lot of work with the publisher to pitch the book, and to convince them they should publish “another book on climate change”. Heidi’s editor was brilliant. He was also working on Pat Benatar’s biography, and another book on rock and roll, which made for an interesting juxtaposition. His advice was not to start the book at the beginning, but to start at the easiest place. But as an engineer, being anal, Heidi wanted to start at the beginning. Her editor turned out to be right.

It was very hard to manage the time needed to write the book. Each chapter, on a specific scientist, was effectively peer reviewed by the scientists. There were lots of interviews, all recorded and transcribed, which takes ages. She tried to tell it as a story that people could relate to. The story had no pre-ordained outcome, but different aspects scared the scientists in different ways.

The book came out in August, coincidentally, at the same time as the Russian heatwaves, so it got lots of interest from the press. Which brings Heidi to her final point: when you’ve finished the book and it gets published, that’s really only the start of the process!

The final talk of the session was by Greg Craven, author of “What’s the Worst that Could Happen”. Greg’s talk was completely different from everything that had come before. He gave an impassioned speech, more like the great speeches of the civil rights era – a call to arms – than a scientific talk. Which made both a great contrast to the previous speakers, and a challenge to them.

Greg challenged the audience, the scientists of the AGU, by pointing out we’re insane, at least according to the definition that insanity is doing the same thing over and over again expecting a different outcome. His point is that we’ve been using the same communication strategy, giving people straightforward scientific information, and that strategy isn’t working. Therefore it’s time for a radical change in approach. It’s time for scientists to come way outside of their comfort zones, and to inject some emotion, some passion into the message.

It became clear during the talk that Greg was on at least his third different version of the talk, having lost one version when his hard drive crashed in the early hours, and having been inspired by the previous night’s dinner conversation with several seasoned climate scientists.

Greg’s advice was to stop communicating as scientists, and start speaking as human beings. Talk about our hopes and fears, and tell them frankly about the terrors you are ignoring when you get your head down doing the science, hoping that someone else will solve the problem. Scientists are civilization’s last chance – the cavalry who must come charging down the hill.

If you don’t believe now is the time, then come up with an operational definition, a test, for when it is the appropriate time to take extreme action. And if you can demonstrate rationally that it’s not the right time, then you can be absolved from the fight.

Anyway, I couldn’t possibly do justice to Greg’s passionate speech – you had to be there! Luckily, he’s promised to post the text of the speech to gregcraven.org by the weekend. Go read it, and figure out how you would respond to his challenge.

Here’s the first of a series of posts from the American Geophysical Union (AGU) Fall meeting, which is happening this week in San Francisco. The meeting is huge – they’re expecting 19,000 scientists to attend, making it the largest such meeting in the physical sciences.

The most interesting session today was a new session for the AGU: IN14B “Software Engineering for Climate Modeling”. And I’m not just saying that because it included my talk – all the talks were fascinating. (I’ve posted the slides for my talk, “Do Over or Make Do: Climate Models as a Software Development Challenge”).

After my talk, the next speaker was Cecelia DeLuca of NOAA, with a talk entitled “Emergence of a Common Modeling Architecture for Earth System Science”. Cecelia gave a great overview of the Earth System Modelling Framework. She began by pointing out that climate models don’t just contain science code – they consist of a number of different kinds of software. Lots of the code is infrastructure code, which doesn’t necessarily need to be written by scientists. Around ten years ago, a number of projects started up that had the aim of building shared, standards-based infrastructure code. The projects needed to develop the technical and mathematical expertise to build infrastructure code. But the advantages of separating this code development from the science code were clear: the teams building infrastructure code could prioritize best practices, run the nightly testing process, etc, whereas typically the scientists would not do this.

ESMF provides a common modelling architecture. Native model data structures (modules, fields, grids, timekeeping) are wrapped into ESMF standard data structures, which conform to relevant standards (e.g. ISO standards, CF standards, and the Metafor common information model). The framework also offers runtime compliance checking (e.g. to check timekeeping behaviour is correct), and automated documentation (e.g. the ability to write out model metadata in an XML standard format).
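To make the wrapping idea concrete, here is a minimal sketch in Python. It is not the ESMF API: every class and function name below is hypothetical, and the only thing it tries to show is a native model placed behind a standard interface, with a harness that checks at run time that the component advanced its clock by exactly the requested interval.

```python
from dataclasses import dataclass


@dataclass
class Clock:
    """Simulation time in whole seconds (hypothetical, not an ESMF type)."""
    seconds: int = 0


class NativeOcean:
    """Stand-in for legacy model code that keeps its own time."""

    def __init__(self):
        self.elapsed = 0

    def step(self, dt):
        self.elapsed += dt


class WrappedComponent:
    """Exposes a native model through a standard run(clock, interval) call."""

    def __init__(self, native, native_dt):
        self.native = native
        self.native_dt = native_dt

    def run(self, clock, interval):
        # Take as many whole native steps as fit in the coupling interval.
        for _ in range(interval // self.native_dt):
            self.native.step(self.native_dt)
        clock.seconds = self.native.elapsed


def run_with_compliance_check(component, clock, interval):
    """Run one coupling interval and verify the clock moved by exactly that much."""
    start = clock.seconds
    component.run(clock, interval)
    advanced = clock.seconds - start
    if advanced != interval:
        raise RuntimeError(
            f"timekeeping violation: asked for {interval}s, "
            f"component advanced {advanced}s"
        )


clock = Clock()
ocean = WrappedComponent(NativeOcean(), native_dt=600)
run_with_compliance_check(ocean, clock, interval=3600)  # six native steps
```

A component whose native timestep does not divide the coupling interval (say 700 seconds into 3600) would fall short and trip the check, which is the kind of error that is easy to miss when each model does its own timekeeping.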

Because of these efforts, in the US, earth system models are converging on a common architecture. It’s built on standardized component interfaces, and creates a layer of structured information within Earth system codes. The lesson here is that if you can take the legacy code, and express it in a standard way, you get tremendous power.

The next speaker was Amy Langenhorst from GFDL, “Making sense of complexity with the FRE climate modelling workflow system”. Amy explained the organisational setup at GFDL: there are approximately 300 people organized into groups: 6 science-based groups, plus a technical services group, and a modelling services group. The latter consists of 15 people, with one of them acting as a liaison for each of the science groups. This group provides the software engineering support for the science teams.

The Flexible Modeling System (FMS) is a software framework that provides a coupler and infrastructure support. FMS releases happen about once per year; it provides an extensive testing framework that currently includes 209 different model configurations.

One of the biggest challenges for modelling groups like GFDL is the IPCC cycle. Providing the model runs for each IPCC assessment involves massive, complex data processing, for which a good workflow manager is needed. FRE is the workflow manager for FMS. Development of FRE was started in 2002 by Amy, at a time when the modelling services group didn’t yet exist.

FRE includes version control, configuration management, tools for building executables, control of execution, etc. It also provides facilities for creating XML model description files, model configuration (using a component-based approach), and integrated model testing (e.g. basic tests, restarts, scaling). It also allows for experiment inheritance, so that it’s possible to set up new model configurations based on variants of previous runs, which is useful for perturbation studies.
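Experiment inheritance is easy to picture with a small sketch. The Python below is not FRE (which, as noted, uses XML model description files); the experiment names, settings, and the `resolve` helper are all invented for illustration. Each experiment records only what differs from its parent, and the full configuration is built by walking up the chain.

```python
import copy

# Hypothetical experiment registry; names and settings are illustrative only.
EXPERIMENTS = {
    "control": {
        "settings": {
            "co2_ppm": 280.0,
            "years": 100,
            "ocean": {"resolution_deg": 1.0},
        },
    },
    "doubled_co2": {
        "inherits": "control",
        "settings": {"co2_ppm": 560.0},
    },
    "doubled_co2_fine_ocean": {
        "inherits": "doubled_co2",
        "settings": {"ocean": {"resolution_deg": 0.25}},
    },
}


def merge(base, overrides):
    """Return a copy of base with overrides applied, recursing into nested dicts."""
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def resolve(name, experiments=EXPERIMENTS):
    """Build the full settings for an experiment by walking up its parents."""
    spec = experiments[name]
    parent = spec.get("inherits")
    base = resolve(parent, experiments) if parent else {}
    return merge(base, spec.get("settings", {}))


config = resolve("doubled_co2_fine_ocean")
```

Here `config` picks up the run length from the control experiment, the CO2 level from its immediate parent, and overrides only the ocean resolution, so a perturbation study needs to state just the one thing it perturbs.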

Next up was Rob Burns from NASA GSFC, talking about “Software Engineering Practices in the Development of NASA Unified Weather Research and Forecasting (NU-WRF) Model”. WRF is a weather forecasting model originally developed at NCAR, but widely used across the NWP community. NU-WRF is an attempt to unify variants of NCAR WRF and to facilitate better use of WRF. NU-WRF is built from versions of NCAR’s WRF, with a separate process of folding in enhancements.

As is common with many modelling efforts, there were challenges arising from multiple science teams, with individual goals, interests and expertise, and scientists don’t consider software engineering as their first priority. At NASA, the Software Integration and Visualization Office (SIVO) provides Software Engineering support for the scientific modelling teams. SIVO helps to drive, but not to lead the scientific modelling efforts. They help with full software lifecycle management, assisting with all software processes from requirements to release, but with domain experts still making the scientific decisions. The code is under full version control, using Subversion, and the software engineering team coordinates the effort to get the codes into version control.

The experience with NU-WRF shows that this kind of partnership between science teams and a software support team can work well. Leadership and active engagement with the science teams is needed. However, involvement of the entire science team for decisions is too slow, so a core team was formed to do this.

The next speaker was Thomas Clune from NASA GISS, with a talk “Constraints and Opportunities in GCM Model Development”. Thomas began with the question: How did we end up with the software we have today? From a software quality perspective, we wrote the wrong software. Over the years, improvements in fidelity in the models have driven a disproportionate growth in complexity of implementations.

One important constraint is that model codes change relatively slowly, in part because of the model validation processes – it’s important to be able to validate each code change individually – they can’t be bundled together. But also because code familiarity is important – the scientists have to understand their code, and if it changes too fast, they lose this familiarity.

However, the problem now is that software quality is incommensurate with the growing socioeconomic role for our models in understanding climate change. There’s a great quote from Ward Cunningham: “Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite… The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise…” Examples of this debt in climate models include long procedures, kludges, cut-and-paste duplication, short/ambiguous names, and inconsistent style.

The opportunities then are to exploit advances in software engineering from elsewhere to systematically and incrementally improve the software quality of climate models. For example:

  • Coding standards – these improve productivity through familiarity, reduce some types of bugs, and help newcomers. But they must be adopted from within the community by negotiation.
  • Abandon CVS. It has too many liabilities for managing legacy code, e.g. a permanence to the directory structures. The community needs version control systems that handle branching and merging. NASA GISS is planning to switch to Git in the new year, as soon as the IPCC runs are out of the way.
  • Unit testing. There’s a great quote from Michael Feathers: “The main thing that distinguishes legacy code from non-legacy code is tests. Or rather lack of tests”. Lack of tests leads to fear of introducing subtle bugs. Elsewhere, unit testing frameworks have caused a major shift in how commercial software development works, particularly in enabling test-driven development. Tom has been experimenting with pFUnit, a testing framework with support for parallel Fortran and MPI. The existence of such testing frameworks removes some of the excuses for not using unit testing for climate models (in most cases, the modeling community relies on regression testing in preference to unit testing). Some of the reasons commonly given for not doing unit testing seem to represent some confusion about what unit testing is for: e.g. that some constraints are unknown, that tests would just duplicate implementation, or that it’s impossible to test emergent behaviour. These kinds of excuse indicate that modelers tend to conflate scientific validation with the verification offered by unit testing.
  • Clone Detection. Tools now exist to detect code clones (places where code has been copied, sometimes with minor modifications across different parts of the software). Tom has experimented with some of these with the NASA modelE, with promising results.
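To illustrate the distinction between verification by unit test and scientific validation, here is a small Python sketch (pFUnit itself targets Fortran; the function and the test cases below are invented for illustration). The tests check properties that must hold for a small piece of code regardless of whether the science built on top of it is right, and they use hand-computed cases rather than duplicating the implementation.

```python
import math


def area_weighted_mean(values, areas):
    """Mean of per-cell values weighted by cell area."""
    if len(values) != len(areas):
        raise ValueError("values and areas must have the same length")
    total_area = sum(areas)
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return sum(v * a for v, a in zip(values, areas)) / total_area


def test_uniform_field_is_unchanged():
    # A property that must hold whatever the grid looks like.
    result = area_weighted_mean([288.0, 288.0, 288.0], [1.0, 2.5, 0.5])
    assert math.isclose(result, 288.0)


def test_larger_cells_count_for_more():
    # Hand-computed case: (10*3 + 20*1) / 4 = 12.5
    assert math.isclose(area_weighted_mean([10.0, 20.0], [3.0, 1.0]), 12.5)


def test_mismatched_lengths_are_rejected():
    try:
        area_weighted_mean([1.0, 2.0], [1.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for mismatched inputs")
```

None of these tests says anything about emergent behaviour of a full model run; that remains a validation question. What they do is pin down the behaviour of one routine so that it can be changed without fear of introducing subtle bugs.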

The next talk was by John Krasting from GFDL, on “NOAA-GFDL’s Workflow for CMIP5/IPCC AR5 Experiments”. I didn’t take many notes, mainly because the subject was very familiar to me, having visited several modeling labs over the summer, all of whom were in the middle of the frantic process of generating their IPCC CMIP5 runs (or in some cases struggling to get started).

John explained that CMIP5 is somewhat different from the earlier CMIP projects, because it is much more comprehensive, with a much larger set of model experiments, and a much larger set of model variables requested. CMIP1 focussed on pre-industrial control runs, while CMIP2 added some idealized climate change scenario experiments. For CMIP3, the entire archive (from all modeling centres) was 36 terabytes. For CMIP5, this is expected to be at least two orders of magnitude bigger. Because of the larger number of experiments, CMIP5 has a tiered structure, so that some kinds of experiments are prioritized (e.g. see the diagram from Taylor et al).

GFDL is expecting to generate around 15,000 model years of simulation, yielding around 10 petabytes of data, of which around 10%-15% will be released to the public, distributed via the ESG Gateway. The remainder of the data represents some redundancy, and some diagnostic data that’s intended for internal analysis.
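A quick back-of-the-envelope calculation puts these figures side by side. The Python below uses only the numbers quoted above; the one assumption is decimal units (1 PB = 1000 TB), which the talk did not specify.

```python
TB_PER_PB = 1000  # assumption: decimal units

# CMIP3 archive across all modeling centres, and the "two orders of
# magnitude" lower bound quoted for CMIP5.
cmip3_archive_tb = 36
cmip5_archive_pb = cmip3_archive_tb * 100 / TB_PER_PB

# GFDL's own expected output for CMIP5.
gfdl_total_pb = 10
gfdl_model_years = 15_000
public_low_pb = gfdl_total_pb * 0.10
public_high_pb = gfdl_total_pb * 0.15
tb_per_model_year = gfdl_total_pb * TB_PER_PB / gfdl_model_years

print(f"CMIP5 archive, lower bound: {cmip5_archive_pb:.1f} PB")
print(f"GFDL public release: {public_low_pb:.1f} to {public_high_pb:.1f} PB")
print(f"GFDL output per model year: {tb_per_model_year:.2f} TB")
```

So the expected CMIP5 archive is at least 3.6 petabytes, GFDL's public release would be roughly 1 to 1.5 petabytes, and each simulated year averages about two thirds of a terabyte of output.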

The final speaker in the session was Archer Batcheller, from University of Michigan, with a talk entitled “Programming Makes Software; Support Makes Users”. Archer was reporting on the results of a study he has been conducting of several software infrastructure projects in the earth system modeling community. His main observation is that e-Science is about growing socio-technical systems, and that people are a key part of these systems. Effort is needed to nurture communities of users, but such effort is crucial for building the scientific cyberinfrastructure.

From his studies, Archer found that most people developing modeling infrastructure software divide their time about 50:50 between coding and other activities, including:

  • “selling” – explaining/promoting the software in publications, at conferences, and at community meetings (even though the software is free, it still has to be “marketed”)
  • support – helping users, which in turn helps with identifying new requirements
  • training – including 1-on-1, workshops, online tutorials, etc.

Call for Papers:
IEEE Software Special Issue on Climate Change: Software, Science and Society

Submission Deadline: 8 April 2011
Publication (tentative): Nov/Dec 2011

A vast software infrastructure underpins our ability to understand climate change, assess the implications, and form suitable policy responses. This software infrastructure allows large teams of scientists to construct very complex models out of many interlocking parts, and further allows scientists, activists and policymakers to share data, explore scenarios, and validate assumptions. The extent of this infrastructure is often invisible (as infrastructure often is, until it breaks down), both to those who rely on it, and to interested observers, such as politicians, journalists, and the general public. Yet weaknesses in this software (whether real or imaginary) will impede our ability to make progress on what may be the biggest challenge faced by humanity in the 21st Century.

This special issue of IEEE Software will explore the challenges in developing the software infrastructure for understanding and responding to climate change. Our aim is to help bridge the gap between the software community and the climate science community, by soliciting a collection of articles that explain the nature and extent of this software infrastructure, the technical challenges it poses, and the current state-of-the-art.

We invite papers covering any of the software challenges involved in creating this technical infrastructure, but please note that we are not soliciting papers that discuss the validity of the science itself, or which take sides in the policy debate on climate change.

We especially welcome review papers, which explain the current state-of-the-art in some specific aspect of climate software in an accessible way, and roadmap papers, which describe the challenges in the construction and validation of this software. Suitable topics for the special issue include (but are not restricted to):

  • Construction, verification and validation of computational models and data analysis tools used in climate science;
  • Frameworks, coupling strategies and software integration issues for earth system modeling;
  • Challenges of scale and complexity in climate software, including high data volumes and throughputs, massive parallelization and performance issues, numerical complexity, and coupling complexity;
  • Challenges of longevity and evolution of climate model codes, including legacy code, backwards compatibility, and computational reproducibility;
  • Experiences with model ensembles and model inter-comparison projects, particularly as these relate to software verification and validation;
  • Meta-data standards and data management for earth system data, including the challenge of making models and data self-describing;
  • Coordination of cross-disciplinary teams in the development of integrated assessment and decision support systems;
  • The role of open science and usable simulation tools in increasing public accessibility of climate science and public participation in climate policy discussions;
  • Case studies and lessons learned from application of software engineering techniques within climate science.

Manuscripts must not exceed 4,700 words including figures and tables, which count for 200 words each. Submissions in excess of these limits may be rejected without refereeing. The articles we deem within the theme’s scope will be peer-reviewed and are subject to editing for magazine style, clarity, organization, and space. Be sure to include the name of the theme you are submitting for.

Articles should have a practical orientation, and be written in a style accessible to software practitioners. Overly complex, purely research-oriented or theoretical treatments are not appropriate. Articles should be novel. IEEE Software does not republish material published previously in other venues, including other periodicals and formal conference/workshop proceedings, whether previous publication was in print or in electronic form.


For more information about the special issue, contact the Guest Editors:

  • Steve Easterbrook, University of Toronto, Canada (sme@cs.toronto.edu)
  • Reinhard Budich, Max Planck Institute for Meteorology, Germany (reinhard.budich@zmaw.de)
  • Paul N. Edwards, University of Michigan, USA (pne@umich.edu)
  • V. Balaji, NOAA Geophysical Fluid Dynamics Laboratory, USA. (balaji@princeton.edu)

For general author guidelines: www.computer.org/software/author.htm

For submission details: software@computer.org

To submit an article: https://mc.manuscriptcentral.com/sw-cs

I mentioned this in the comment thread on my earlier post on model IV&V, but I’m elevating it to a full post here because it’s an interesting point of discussion.

I had a very interesting lunch with David Randall at CSU yesterday, in which we talked about many of the challenges facing climate modelling groups as they deal with increasing complexity in the models. One topic that came up was the question of whether it’s time for the climate modeling labs to establish separate divisions for the science models (experimental tools for trying out new ideas) and production models (which would be used for assessments and policy support). This separation hasn’t happened in climate modelling, but may well be inevitable, if the anticipated market for climate services ever materializes.

There are many benefits of such a separation. It would clarify the goals and roles within the modeling labs, and allow for a more explicit decision process that decides which ideas from the bleeding edge science models are mature enough for inclusion in the operational models. The latter would presumably only contain the more well-established science, would change less rapidly, and could be better engineered for robustness and usability. And better documented.

But there’s a huge downside: the separation would effectively mean two separate models need to be developed and maintained (thus potentially doubling the effort), and the separation would make it harder to get the latest science transferred into the production model. Which in turn would mean a risk that assessments such as the IPCC’s become even more dated than they are now: there’s already a several year delay because of the time it takes to complete model runs, share the data, analyze it, peer-review and publish results, and then compile the assessment reports. Divorcing science models from production models would make this delay worse.

But there’s an even bigger problem: the community is too small. There aren’t enough people who understand how to put together a climate model as it is; bifurcating the effort will make this shortfall even worse. David points out that part of the problem is that climate models are now so complex that nobody really understands the entire model; the other problem is that our grad schools aren’t producing many people who have both the aptitude and enthusiasm for climate modeling. There’s a risk that the best modellers will choose to stay in the science shops (because getting leading edge science into the models is much more motivating), leaving insufficient expertise to maintain quality in the production models.

So really, it comes down to some difficult questions about priorities: given the serious shortage of good modellers, do we push ahead with the current approach in which progress at the leading edge of the science is prioritized, or do we split the effort to create these production shops? It seems to me that what matters for the IPCC at the moment is a good assessment of the current science, not some separate climate forecasting service. If a commercial market develops for the latter (which is possible, once people really start to get serious about climate change), then someone will have to figure out how to channel the revenues into training a new generation of modellers.

When I mentioned this discussion on the earlier thread, Josh was surprised at the point that universities aren’t producing enough people with the aptitude and motivation:

“seems like this is a hot / growth area (maybe that impression is just due to the press coverage).”

Michael replied:

“Funding is weak and sporadic; the political visibility of these issues often causes revenge-taking at the top of the funding hierarchy. Recent news, for instance, seems to be of drastic cuts in Canadian funding for climate science. …

The limited budgets lead to attachment to awkward legacy codes, which drives away the most ambitious programmers. The nature of the problem stymies the most mathematically adept who are inclined to look for more purity. Software engineers take a back seat to physical scientists with little regard for software design as a profession. All in all, the work is drastically ill-rewarded in proportion to its importance, and it’s fair to say that while it attracts good people, it’s not hard to imagine a larger group of much higher productivity and greater computational science sophistication working on this problem.”

And that is the nub of the problem. There’s plenty of scope for improving the quality of the models and the quality of the software in them. But if we can’t grow the pool of engaged talent, it won’t happen.

My post on validating climate models suggested that the key validation criterion is the extent to which the model captures (some aspect of) the current scientific theory, and is useful in exploring the theory. In effect, I’m saying that climate models are scientific tools, and should be validated as scientific tools. This makes them very different from, say, numerical weather prediction (NWP) software, which is used in an operational setting to provide a service (predicting the weather).

What’s confusing is that both communities (climate modeling and weather modeling) use many of the same techniques both for the design of the models, and for comparing the models with observational data.

For NWP, forecast accuracy is the overriding objective, and the community has developed an extensive methodology for doing forecast verification. I pondered for a while whether this use of the term ‘verification’ is consistent with my definitions, because surely we should be “validating” a forecast rather than “verifying” it. In the end I concluded that the terminology is consistent, because forecast verification is like checking a program against its specification. In this case the specification states precisely what is being predicted, with what accuracy, and what would constitute a successful forecast (Bob Grumbine gives a recent example in verifying accuracy of seasonal sea ice forecasts). The verification procedure checks that the actual forecast was accurate, within the criteria set by this specification. Whether or not the forecast was useful is another question: that’s the validation question (and it’s a subjective question that requires some investigation of why people want forecasts in the first place).

An important point here is that forecast verification is not software verification: it doesn’t verify a particular piece of software. It’s also not simulation verification: it doesn’t verify a given run produced by that software. It’s verification of an entire forecasting system. A forecasting system makes use of computational models (often more than one), as well as a bunch of experts who interpret the model results. It also includes an extensive data collection system that gathers information about the current state of the world to use as input to the model. (And of course, some forecasting systems don’t use computational models at all). So:

  • If the forecast is inaccurate (according to the forecast criteria), it doesn’t necessarily mean there’s a flaw in the models – it might just as well be a flaw in the interpretation of the model outputs, or in the data collection process that provided its inputs. Oh, and of course, the verification might also fail because the specification is wrong, e.g. because there are flaws in the observational system used in the verification procedure too.
  • If the forecasting system persistently produces accurate forecasts (according to the forecast criteria), that doesn’t necessarily tell us anything about the quality of the software itself, it just means that the entire forecast system worked. It may well be that the model is very poor, but the meteorologists who interpret model outputs are brilliant at overcoming the weaknesses in the model (perhaps in the way they configure the runs, or perhaps in the way they filter model outputs), to produce accurate forecasts for their customers.
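
To make the "checking against a specification" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `ForecastSpec` fields, the tolerance-based success rule, and the temperature numbers are hypothetical, and real forecast verification uses a far richer set of skill scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForecastSpec:
    """Hypothetical specification: what is predicted and what counts as success."""
    quantity: str
    tolerance: float          # largest absolute error that still counts as a hit
    required_hit_rate: float  # fraction of forecasts that must be hits


def verify_forecasts(spec, forecasts, observations):
    """Compare paired forecasts and observations against the specification.

    Returns (hit_rate, passed). The check covers the whole forecasting
    system; it cannot say which part of that system caused a miss.
    """
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need equal-length, non-empty series")
    hits = sum(abs(f - o) <= spec.tolerance
               for f, o in zip(forecasts, observations))
    hit_rate = hits / len(forecasts)
    return hit_rate, hit_rate >= spec.required_hit_rate


# Made-up daily maximum temperatures, purely for illustration.
spec = ForecastSpec("max temperature (°C)", tolerance=2.0, required_hit_rate=0.75)
hit_rate, passed = verify_forecasts(spec,
                                    [14.0, 16.5, 12.0, 18.0],
                                    [15.0, 16.0, 15.5, 17.0])
print(hit_rate, passed)
```

Note that the function takes only forecasts and observations as input. Nothing about the model code, the data collection, or the human interpretation appears in it, which is exactly why a pass or a fail tells us about the system as a whole rather than about the software.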

However, one effect of using this forecast verification approach day-in-day-out for weather forecasting systems over several decades (with an overall demand from customers for steady improvements in forecast accuracy) is that all parts of the forecasting system have improved dramatically over the last few decades, including the software. And climate modelling has benefited from this, as improvements in the modelling of processes needed for NWP can often also be used to improve the climate models (Senior et al have an excellent chapter on this in a forthcoming book, which I will review nearer to the publication date).

The question is, can we apply a similar forecast verification methodology to the “climate forecasting system”, despite the differences between weather and climate?

Note that the question isn’t about whether we can verify the accuracy of climate models this way, because the methodology doesn’t separate the models from the broader system in which they are used. So, if we take this route at all, we’re attempting to verify the forecast accuracy of the whole system: collection of observational data, creation of theories, use of these theories to develop models, choices for which model and which model configuration to use, choices for how to set up the runs, and interpretation of the results.

Climate models are not designed as forecasting tools; they are designed as tools to explore current theories about the climate system, and to investigate sources of uncertainty in these theories. However, the fact that they can be used to project potential future climate change (under various scenarios) is very handy. Of course, this is not the only way to produce quantified estimates of future climate change – you can do it using paper and pencil. It’s also a little unfortunate, because the IPCC process (or at least the end-users of IPCC reports) tends to over-emphasize the model projections at the expense of the science that went into them, and increasingly the funding for the science is tied to the production of such projections.

But some people (both within the climate modeling community and within the denialist community) would prefer that they not be used to project future climate change at all. (The argument from within the modelling community is that the results get over-interpreted or mis-interpreted by lay audiences; the argument from the denialist community is that models aren’t perfect. I think these two arguments are connected…). However, both arguments ignore reality: society demands of climate science that it provides its best estimates of the rate and size of future climate change, and (to the extent that they embody what we currently know about climate) the models are the best tool for this job. Not using them in the IPCC assessments would be like marching into the jungle with one eye closed.

So, back to the question: can we use NWP forecast verification for climate projections? I think the answer is ‘no’, because of the timescales involved. Projections of climate change really only make sense on the scale of decades to centuries. Waiting for decades to do the verification is pointless – by then the science will have moved on, and it will be way too late for policymaking purposes anyway.

If we can’t verify the forecasts on a timescale that’s actually useful, does this mean the models are invalid? Again the answer is ‘no’, for three reasons. First, we have plenty of other V&V techniques to apply to climate models. Second, the argument that climate models are a valid tool for creating future projections of climate change is based not on our ability to do forecast verification, but on how well the models capture the current state of the science. And third, because forecast verification wouldn’t necessarily say anything about the models themselves anyway, as it assesses the entire forecast system.

It would certainly be really, really useful to be able to verify the “climate forecast” system. But the fact that we can’t does not mean we cannot validate climate models.

30. November 2010 · 2 comments · Categories: humour

Prem sent me a picture this morning which beautifully illustrates the way the media portrays science. On the left, the Wall Street Journal (page D1). On the right, the New York Times (page A1).  Both today:

Reminds me of one of my favourite cartoons:

In my last two posts, I demolished the idea that climate models need Independent Verification and Validation (IV&V), and I described the idea of a toolbox approach to V&V. Both posts were attacking myths: in the first case, the myth that an independent agent should be engaged to perform IV&V on the models, and in the second, the myth that you can critique the V&V of climate models without knowing anything about how they are currently built and tested.

I now want to expand on the latter point, and explain how the day-to-day practices of climate modellers taken together constitute a robust validation process, and that the only way to improve this validation process is just to do more of it (i.e. give the modeling labs more funds to expand their current activities, rather than to do something very different).

The most common mistake made by people discussing validation of climate models is to assume that a climate model is a thing-in-itself, and that the goal of validation is to demonstrate that some property holds of this thing. And whatever that property is, the assumption is that such measurement of it can be made without reference to its scientific milieu, and in particular without reference to its history and the processes by which it was constructed.

This mistake leads people to talk of validation in terms of how well “the model” matches observations, or how well “the model” matches the processes in some real world system. This approach to validation is, as Oreskes et al pointed out, quite impossible. The models are numerical approximations of complex physical phenomena. You can verify that the underlying equations are coded correctly in a given version of the model, but you can never validate that a given model accurately captures real physical processes, because it never will accurately capture them. Or as George Box summed it up: “All models are wrong…” (we’ll come back to the second half of the quote later).

The problem is that there is no such thing as “the model”. The body of code that constitutes a modern climate model actually represents an enormous number of possible models, each corresponding to a different way of configuring that code for a particular run. Furthermore, this body of code isn’t a static thing. The code is changed on a daily basis, through a continual process of experimentation and model improvement. Often these changes are done in parallel, so that there are multiple versions at any given moment, being developed along multiple lines of investigation. Sometimes these lines of evolution are merged, to bring a number of useful enhancements together into a single version. Occasionally, the lines diverge enough to cause a fork: a point at which they are different enough that it just becomes too hard to reconcile them (see, for example, this visualization of the evolution of ocean models). A forked model might at some point be given a new name, but the process by which a model gets a new name is rather arbitrary.

Occasionally, a modeling lab will label a particular snapshot of this evolving body of code as an “official release”. An official release has typically been tested much more extensively, in a number of standard configurations for a variety of different platforms. It’s likely to be more reliable, and therefore easier for users to work with. By more reliable here, I mean relatively free from coding defects. In other words, it is better verified than other versions, but not necessarily better validated (I’ll explain why shortly). In many cases, official releases also contain some significant new science (e.g. new parameterizations), and these scientific enhancements will be described in a set of published papers.

However, an official release isn’t a single model either. Again it’s just a body of code that can be configured to run as any of a huge number of different models, and it’s not unchanging either – as with all software, there will be occasional bugfix releases applied to it. Oh, and did I mention that to run a model, you have to make use of a huge number of ancillary datafiles, which define everything from the shape of the coastlines and land surfaces, to the specific carbon emissions scenario to be used. Any change to these effectively gives a different model too.

So, if you’re hoping to validate “the model”, you have to say which one you mean: which configuration of which code version of which line of evolution, and with which ancillary files. I suppose those clamouring for something different in the way of model validation would say “well, the one used for the IPCC projections, of course”. Which is a little tricky, because each lab produces a large number of different runs for the CMIP process that provides input to the IPCC, and each of these is likely to involve a different model configuration.
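
One way to picture this is to treat "a model" as the full tuple of choices needed to reproduce a run. The sketch below is only an illustration: the `ModelIdentity` class, its fields, and all the example names and values are hypothetical, not a description of how any modeling lab actually tracks its runs.

```python
from dataclasses import dataclass
import hashlib
import json


@dataclass(frozen=True)
class ModelIdentity:
    """Hypothetical record of everything that pins down one runnable model."""
    lineage: str            # which line of evolution (branch or fork)
    code_version: str       # which snapshot of the code
    configuration: tuple    # sorted (option, value) pairs
    ancillary_files: tuple  # sorted (file role, checksum) pairs

    def fingerprint(self) -> str:
        """Short hash that changes whenever any ingredient changes."""
        payload = json.dumps([self.lineage, self.code_version,
                              self.configuration, self.ancillary_files])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def make_identity(lineage, code_version, configuration, ancillary_files):
    """Build an identity from plain dicts, sorting so key order is irrelevant."""
    return ModelIdentity(lineage, code_version,
                         tuple(sorted(configuration.items())),
                         tuple(sorted(ancillary_files.items())))


# Illustrative values only: two runs that differ in a single ancillary file.
base_config = {"resolution": "coarse", "ocean": "coupled"}
run_a = make_identity("main", "v1.2", base_config,
                      {"coastlines": "abc123", "emissions": "scenario-low"})
run_b = make_identity("main", "v1.2", base_config,
                      {"coastlines": "abc123", "emissions": "scenario-high"})

print(run_a.fingerprint() == run_b.fingerprint())
```

Same code, same configuration, different emissions file: by this accounting they are two different models, which is the sense in which "the model used for the IPCC projections" does not pick out a single thing.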

But let’s say for the sake of argument that we could agree on a specific model configuration that ought to be “validated”. What will we do to validate it? What does validation actually mean? The Oreskes paper I mentioned earlier already demonstrated that comparison with real world observations, while interesting, does not constitute “validation”. The model will never match the observations exactly, so the best we’ll ever get along these lines is an argument that, on balance, given the sum total of the places where there’s a good match and the places where there’s a poor match, the model does better or worse than some other model. This isn’t validation, and furthermore it isn’t even a sensible way of thinking about validation.

At this point many commentators stop, and argue that if validation of a model isn’t possible, then the models can’t be used to support the science (or more usually, they mean they can’t be used for IPCC projections). But this is a strawman argument, based on a fundamental misconception of what validation is all about. Validation isn’t about checking that a given instance of a model satisfies some given criteria. Validation is about fitness for purpose, which means it’s not about the model at all, but about the relationship between a model and the purposes to which it is put. Or more precisely, it’s about the relationship between particular ways of building and configuring models and the ways in which runs produced by those models are used.

Furthermore, the purposes to which models are put and the processes by which they are developed co-evolve. The models evolve continually, and our ideas about what kinds of runs we might use them for evolve continually, which means validation must take this ongoing evolution into account. To summarize, validation isn’t about a property of some particular model instance; it’s about the whole process of developing and using models, and how this process evolves over time.

Let’s take a step back a moment, and ask what is the purpose of a climate model. The second half of the George Box quote is “…but some models are useful”. Climate models are tools that allow scientists to explore their current understanding of climate processes, to build and test theories, and to explore the consequences of those theories. In other words we’re dealing with three distinct systems:

We're dealing with relationships between three different systems

There does not need to be any clear relationship between the calculational system and the observational system – I didn’t include such a relationship in my diagram. For example, climate models can be run in configurations that don’t match the real world at all: e.g. a waterworld with no landmasses, or a world in which interesting things are varied: the tilt of the pole, the composition of the atmosphere, etc. These models are useful, and the experiments performed with them may be perfectly valid, even though they differ deliberately from the observational system.

What really matters is the relationship between the theoretical system and the observational system: in other words, how well does our current understanding (i.e. our theories) of climate explain the available observations (and of course the inverse: what additional observations might we make to help test our theories). When we ask questions about likely future climate changes, we’re not asking this question of the calculational system, we’re asking it of the theoretical system; the models are just a convenient way of probing the theory to provide answers.

By the way, when I use the term theory, I mean it in exactly the way it’s used throughout the sciences: a theory is the best current explanation of a given set of phenomena. The word “theory” doesn’t mean knowledge that is somehow more tentative than other forms of knowledge; a theory is actually the kind of knowledge that has the strongest epistemological basis of any kind of knowledge, because it is supported by the available evidence, and best explains that evidence. A theory might not be capable of providing quantitative predictions (but it’s good when it does), but it must have explanatory power.

In this context, the calculational system is valid as long as it can offer insights that help to understand the relationship between the theoretical system and the observational system. A model is useful as long as it helps to improve our understanding of climate, and to further the development of new (or better) theories. So a model that might have been useful (and hence valid) thirty years ago might not be useful today. If the old approach to modelling no longer matches current theory, then it has lost some or all of its validity. The model’s correspondence (or lack of) to the observations hasn’t changed (*), nor has its predictive power. But its utility as a scientific tool has changed, and hence its validity has changed.

[(*) except that the accuracy of the observations may have changed in the meantime, due to the ongoing process of discovering and resolving anomalies in the historical record.]

The key questions for validation then, are to do with how well the current generation of models (plural) support the discovery of new theoretical knowledge, and whether the ongoing process of improving those models continues to enhance their utility as scientific tools. We could focus this down to specific things we could measure by asking whether each individual change to the model is theoretically justified, and whether each such change makes the model more useful as a scientific tool.

To do this requires a detailed study of day-to-day model development practices, and of the extent to which these are closely tied with the rest of climate science (e.g. field campaigns, process studies, etc). It also takes in questions such as how modeling centres decide on their priorities (e.g. which new bits of science to get into the models sooner), and how each individual change is evaluated. In this approach, validation proceeds by checking whether the individual steps taken to construct and test changes to the code add up to a sound scientific process, and how good this process is at incorporating the latest theoretical ideas. And we ought to be able to demonstrate a steady improvement in the theoretical basis for the model. An interesting quirk here is that sometimes an improvement to the model from a theoretical point of view reduces its skill at matching observations; this happens particularly when we’re replacing bits of the model that were based on empirical parameters with an implementation that has a stronger theoretical basis, because the empirical parameters were tuned to give a better climate simulation, without necessarily being well understood. In the approach I’m describing, this would be an indicator of an improvement in validity, even while it reduces the correspondence with observations. If on the other hand we based our validation on some measure of correspondence with observations, such a step would reduce the validity of the model!

But what does all of this tell us about whether it’s “valid” to use the models to produce projections of climate change into the future? Well, recall that when we ask for projections of future climate change, we’re not asking the question of the calculational system, because all that would result in is a number, or range of numbers, that are impossible to interpret, and therefore meaningless. Instead we’re asking the question of the theoretical system: given the sum total of our current theoretical understanding of climate, what is likely to happen in the future, under various scenarios for expected emissions and/or concentrations of greenhouse gases? If the models capture our current theoretical understanding well, then running the scenario on the model is a valid thing to do. If the models do a poor job of capturing our theoretical understanding, then running the models on these scenarios won’t be very useful.

Note what is happening here: when we ask climate scientists for future projections, we’re asking the question of the scientists, not of their models. The scientists will apply their judgement to select appropriate versions/configurations of the models to use, they will set up the runs, and they will interpret the results in the light of what is known about the models’ strengths and weaknesses and about any gaps between the computational models and the current theoretical understanding. And they will add all sorts of caveats to the conclusions they draw from the model runs when they present their results.

And how do we know whether the models capture our current theoretical understanding? By studying the processes by which the models are developed (i.e. continually evolved) by the various modeling centres, and examining how good each centre is at getting the latest science into the models. And by checking that whenever there are gaps between the models and the theory, these are adequately described by the caveats in the papers published about experiments with the models.

Summary: It is a mistake to think that validation is a post-hoc process to be applied to an individual “finished” model to ensure it meets some criteria for fidelity to the real world. In reality, there is no such thing as a finished model, just many different snapshots of a large set of model configurations, steadily evolving as the science progresses. And fidelity of a model to the real world is impossible to establish, because the models are approximations. In reality, climate models are tools to probe our current theories about how climate processes work. Validity is the extent to which climate models match our current theories, and the extent to which the process of improving the models keeps up with theoretical advances.

Sometime in the 1990’s, I drafted a frequently asked question list for NASA’s IV&V facility. Here’s what I wrote on the meaning of the terms “validation” and “verification”:

The terms Verification and Validation are commonly used in software engineering to mean two different types of analysis. The usual definitions are:

  • Validation: Are we building the right system?
  • Verification: Are we building the system right?

In other words, validation is concerned with checking that the system will meet the customer’s actual needs, while verification is concerned with whether the system is well-engineered, error-free, and so on. Verification will help to determine whether the software is of high quality, but it will not ensure that the system is useful.

The distinction between the two terms is largely to do with the role of specifications. Validation is the process of checking whether the specification captures the customer’s needs, while verification is the process of checking that the software meets the specification.

Verification includes all the activities associated with producing high quality software: testing, inspection, design analysis, specification analysis, and so on. It is a relatively objective process, in that if the various products and documents are expressed precisely enough, no subjective judgements should be needed in order to verify software.

In contrast, validation is an extremely subjective process. It involves making subjective assessments of how well the (proposed) system addresses a real-world need. Validation includes activities such as requirements modelling, prototyping and user evaluation.

In a traditional phased software lifecycle, verification is often taken to mean checking that the products of each phase satisfy the requirements of the previous phase. Validation is relegated to just the beginning and ending of the project: requirements analysis and acceptance testing. This view is common in many software engineering textbooks, and is misguided. It assumes that the customer’s requirements can be captured completely at the start of a project, and that those requirements will not change while the software is being developed. In practice, the requirements change throughout a project, partly in reaction to the project itself: the development of new software makes new things possible. Therefore both validation and verification are needed throughout the lifecycle.

Finally, V&V is now regarded as a coherent discipline: “Software V&V is a systems engineering discipline which evaluates the software in a systems context, relative to all system elements of hardware, users, and other software”. (from Software Verification and Validation: Its Role in Computer Assurance and Its Relationship with Software Project Management Standards, by Dolores R. Wallace and Roger U. Fujii, NIST Special Publication 500-165)

Having thus carefully distinguished the two terms, my advice to V&V practitioners was then to forget about the distinction, and think instead about V&V as a toolbox, which provides a wide range of tools for asking different kinds of questions about software. And to master the use of each tool and figure out when and how to use it. Here’s one of my attempts to visualize the space of tools in the toolbox:

A range of V&V techniques. Note that "modeling" and "model checking" refer to building and analyzing abstracted models of software behaviour, a very different kind of beast from scientific models used in the computational sciences

For climate models, the definitions that focus on specifications don’t make much sense, because there are no detailed specifications of climate models (nor can there be – they’re built by iterative refinement like agile software development). But no matter – the toolbox approach still works; it just means some of the tools are applied a little differently. An appropriate toolbox for climate modeling looks a little different from my picture above, because some of these tools are more appropriate for real-time control systems, applications software, etc, and there are some missing from the above picture that are particular to simulation software. I’ll draw a better picture when I’ve finished analyzing the data from my field studies of practices used at climate labs.

Many different V&V tools are already in use at most climate modelling labs, but there is room for adding more tools to the toolbox, and for sharpening the existing tools (what and how are the subjects of my current research). But the question of how best to do this must proceed from a detailed analysis of current practices and how effective they are. There seem to be plenty of people wandering into this space, claiming that the models are insufficiently verified, validated, or both. And such people like to pontificate about what climate modelers ought to do differently. But anyone who pontificates in this way, but is unable to give a detailed account of which V&V techniques climate modellers currently use, is just blowing smoke. If you don’t know what’s in the toolbox already, then you can’t really make constructive comments about what’s missing.

A common cry from climate contrarians is that climate models need better verification and validation (V&V), and in particular, that they need Independent V&V (aka IV&V). George Crews has been arguing this for a while, and now Judith Curry has taken up the cry. Having spent part of the 1990’s as lead scientist at NASA’s IV&V facility, and the last few years studying climate model development processes, I think I can offer some good insights into this question.

The short answer is “no, they don’t”. The slightly longer answer is “if you have more money to spend to enhance the quality of climate models, spending it on IV&V is probably the least effective thing you could do”.

The full answer involves deconstructing the question, to show that it is based on three incorrect assumptions about climate models: (1) that there’s some significant risk to society associated with the use of climate models; (2) that the existing models are inadequately tested / verified / validated / whatevered; and (3) that trust in the models can be improved by using an IV&V process. I will demonstrate what’s wrong with each of these assumptions, but first I need to explain what IV&V is.

Independent Verification and Validation (IV&V) is a methodology developed primarily in the aerospace industry for reducing the risk of software failures, by engaging a separate team (separate from the software development team, that is) to perform various kinds of testing and analysis on the software as it is produced. NASA adopted IV&V for development of the flight software for the space shuttle in the 1970’s. Because IV&V is expensive (it typically adds 10%-20% to the cost of a software development contract), NASA tried to cancel the IV&V on the shuttle in the early 1980’s, once the shuttle was declared operational. Then, of course, the Challenger disaster occurred. Although software wasn’t implicated, a consequence of the investigation was the creation of the Leveson committee, to review the software risk. Leveson’s committee concluded that far from cancelling IV&V, NASA needed to adopt the practice across all of its space flight programs. As a result of the Leveson report, the NASA IV&V facility was established in the early 1990’s, as a centre of expertise for all of NASA’s IV&V contracts. In 1995, I was recruited as lead scientist at the facility, and while I was there, our team investigated the operational effectiveness of the IV&V contracts on the Space Shuttle, International Space Station, Earth Observing System, and Cassini, as well as a few other smaller programs. (I also reviewed the software failures on NASA’s Mars missions in the 1990’s, and have a talk about the lessons learned.)

The key idea for IV&V is that when NASA puts out a contract to develop flight control software, it also creates a separate contract with a different company, to provide an ongoing assessment of software quality and risk as the development proceeds. One difficulty with IV&V contracts in the US aerospace industry is that it’s hard to achieve real independence, because industry consolidation has left very few aerospace companies available to take on such contracts, and they’re not sufficiently independent from one another.

NASA’s approach demands independence along three dimensions:

  • managerial independence (the IV&V contractor is free to determine how to proceed, and where to devote effort, independently of both the software development contractor and the customer);
  • financial independence (the funding for the IV&V contract is separate from the development contract, and cannot be raided if more resources are needed for development); and
  • technical independence (the IV&V contractor is free to develop its own criteria, and apply whatever V&V methods and tools it deems appropriate).

This has led to the development of a number of small companies who specialize only in IV&V (thus avoiding any contractual relationship with other aerospace companies), and who tend to recruit ex-NASA staff to provide them with the necessary domain expertise.

For the aerospace industry, IV&V has been demonstrated to be a cost effective strategy to improve software quality and reduce risk. The problem is that the risks are extreme: software errors in the control software for a spacecraft or an aircraft are highly likely to cause loss of life, loss of the vehicle, and/or loss of the mission. There is a sharp distinction between the development phase and the operation phase for such software: it had better be correct when it’s launched. Which means the risk mitigation has to be done during development, rather than during operation. In other words, iterative/agile approaches don’t work – you can’t launch with a beta version of the software. The goal is to detect and remove software defects before the software is ever used in an operational setting. An extreme example of this was the construction of the space station, where the only full end-to-end construction of the system was done in orbit; it wasn’t possible to put the hardware together on the ground in order to do a full systems test on the software.

IV&V is essential for such projects, because it overcomes the natural confirmation bias of software development teams. Even the NASA program managers overseeing the contracts suffer from this – we discovered one case where IV&V reports on serious risks were being systematically ignored by the NASA program office, because the program managers preferred to believe the project was going well. We fixed this by changing the reporting structure, and routing the IV&V reports directly to the Office of Safety and Mission Assurance at NASA headquarters. The IV&V teams developed their own emergency strategy too – if they encountered a risk that they considered mission-critical, and couldn’t get the attention of the program office to address it, they would go and have a quiet word with the astronauts, who would then ensure the problem got seen to!

But IV&V is very hard to do right, because much of it is a sociological problem rather than a technical problem. The two companies (developer and IV&V contractor) are naturally set up in an adversarial relationship, but if they act as adversaries, they cannot be effective: the developer will have a tendency to hide things, and the IV&V contractor will have a tendency to exaggerate the risks. Hence, we observed that the relationship is most effective where there is a good horizontal communication channel between the technical staff in each company, and where they come to respect one another’s expertise. The IV&V contractor has to be careful not to swamp the communication channels with spurious low-level worries, and the development contractor must be willing to respond positively to criticism. One way this works very well is for the IV&V team to give the developers advance warning of any issues they plan to report up the hierarchy to NASA, so that the development contractor can have a solution in place even before NASA asks for it. For a more detailed account of these coordination and communication issues, see:

Okay, let’s look at whether IV&V is applicable to climate modeling. Earlier, I identified three assumptions made by people advocating it. Let’s take them one at a time:

1) The assumption that there’s some significant risk to society associated with the use of climate models.

A large part of the mistake here is to misconstrue the role of climate models in policymaking. Contrarians tend to start from an assumption that proposed climate change mitigation policies (especially any attempt to regulate emissions) will wreck the economies of the developed nations (or specifically the US economy, if it’s an American contrarian). I prefer to think that a massive investment in carbon-neutral technologies will be a huge boon to the world’s economy, but let’s set aside that debate, and assume for the sake of argument that whatever policy path the world takes, it’s incredibly risky, with a non-negligible probability of global catastrophe if the policies are either too aggressive or not aggressive enough, i.e. if the scientific assessments are wrong.

The key observation is that software does not play the same role in this system that flight software does for a spacecraft. For a spacecraft, the software represents a single point of failure. An error in the control software can immediately cause a disaster. But climate models are not control systems, and they do not determine climate policy. They don’t even control it indirectly – policy is set by a laborious process of political manoeuvring and international negotiation, in which the impact of any particular climate model is negligible.

Here’s what happens: the IPCC committees propose a whole series of experiments for the climate modelling labs around the world to perform, as part of a Coupled Model Intercomparison Project. Each participating lab chooses those runs they are most able to do, given their resources. When they have completed their runs, they submit the data to a public data repository. Scientists around the world then have about a year to analyze this data, interpret the results, compare the performance of the models, discuss findings at conferences and workshops, and publish papers. This results in thousands of publications from across a number of different scientific disciplines. The publications that make use of model outputs take their place alongside other forms of evidence, including observational studies, studies of paleoclimate data, and so on. The IPCC reports are an assessment of the sum total of the evidence; the model results from many runs of many different models are just one part of that evidence. Jim Hansen rates models as the third most important source of evidence for understanding climate change, after (1) paleoclimate studies and (2) observed global changes.

The consequences of software errors in a model, in the worst case, are likely to extend to no more than a few published papers being retracted. This is a crucial point: climate scientists don’t blindly publish model outputs as truth; they use model outputs to explore assumptions and test theories, and then publish papers describing the balance of evidence. Further papers then come along that add more evidence, or contradict the earlier findings. The assessment reports then weigh up all these sources of evidence.

I’ve been asking around for a couple of years for examples of published papers that were subsequently invalidated by software errors in the models. I’ve found several cases where a version of the model used in the experiments reported in a published paper was later found to contain an important software bug. But in none of those cases did the bug actually invalidate the conclusions of the paper. So even this risk is probably overstated.

The other point to make is that around twenty different labs around the world participate in the Model Intercomparison Projects that provide data for the IPCC assessments. That’s a level of software redundancy that is simply impossible in the aerospace industry. It’s likely that these 20+ models are not quite as independent as they might be (e.g. see Knutti’s analysis of this), but even so, the ability to run many different models on the same set of experiments, and to compare and discuss their differences is really quite remarkable, and the Model Intercomparison Projects have been a major factor in driving the science forward in the last decade or so. It’s effectively a huge benchmarking effort for climate models, with all the benefits normally associated with software benchmarking (and worthy of a separate post – stay tuned).

So in summary, while there are huge risks to society of getting climate policy wrong, those risks are not software risks. A single error in the flight software for a spacecraft could kill the crew. A single error in a climate model can, at most, only affect a handful of the thousands of published papers on which the IPCC assessments are based. The actual results of a particular model run are far less important than the understanding the scientists gain about what the model is doing and why, and the nature of the uncertainties involved. The modellers know that the models are imperfect approximations of very complex physical, chemical and biological processes. Conclusions about key issues such as climate sensitivity are based not on particular model runs, but on many different experiments with many different models over many years, and the extent to which these experiments agree or disagree with other sources of evidence.

2) The assumption that the current models are inadequately tested / verified / validated / whatevered.

This is a common talking point among contrarians. Part of the problem is that while the modeling labs have evolved sophisticated processes for developing and testing their models, they rarely bother to describe these processes to outsiders – nearly all published reports focus on the science done with the models, rather than the modeling process itself. I’ve been working to correct this, with, first, my study of the model development processes at the UK Met Office, and more recently my comparative studies of other labs, and my accounts of the existing V&V processes. Some people have interpreted the latter as a proposal for what should be done, but it is not; it is an account of the practices currently in place across all of the labs I have studied.

A key point is that for climate models, unlike spacecraft flight controllers, there is no enforced separation between software development and software operation. A climate model is always an evolving, experimental tool, it’s never a finished product – even the prognostic runs done as input to the IPCC process are just experiments, requiring careful interpretation before any conclusions can be drawn. If the model crashes, or gives crazy results, the only damage is wasted time.

This means that an iterative development approach is the norm, which is far superior to the waterfall process used in the aerospace industry. Climate modeling labs have elevated the iterative development process to a new height: each change to the model is treated as a scientific experiment, where the change represents a hypothesis for how to improve the model, and a series of experiments is used to test whether the hypothesis was correct. This means that software development proceeds far more slowly than commercial software practices (at least in terms of lines of code per day), but that the models are continually tested and challenged by the people who know them inside out, and comparison with observational data is a daily activity.

The result is that climate models have very few bugs, compared to commercial software, when measured using industry standard defect density measures. However, although defect density is a standard IV&V metric, it’s probably a poor measure for this type of software – it’s handy for assessing risk of failure in a control system, but a poor way of assessing the validity and utility of a climate model. The real risk is that there may be latent errors in the model that mean it isn’t doing what the modellers designed it to do. The good news is that such errors are extremely rare: nearly all coding defects cause problems that are immediately obvious: the model crashes, or the simulation becomes unstable. Coding defects can only remain hidden if they have an effect that is small enough that it doesn’t cause significant perturbations in any of the diagnostic variables collected during a model run; in this case they are indistinguishable from the acceptable imperfections that arise as a result of using approximate techniques. The testing processes for the climate models (which in most labs include a daily build and automated test across all reference configurations) are sufficient that such problems are nearly always identified relatively early.
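To make the idea of an automated regression check concrete, here is a minimal sketch in Python. It assumes each model run produces a set of named diagnostic values and that a stored reference run exists for comparison. The diagnostic names, the values and the tolerance are all hypothetical, and this is not any lab’s actual test harness.

```python
import math


def regression_failures(reference, candidate, rel_tol=1e-6):
    """Return the names of diagnostics that are missing from the candidate run
    or differ from the reference run by more than the relative tolerance."""
    failures = []
    for name, ref_value in reference.items():
        new_value = candidate.get(name)
        if new_value is None or not math.isclose(new_value, ref_value, rel_tol=rel_tol):
            failures.append(name)
    return failures


# Hypothetical global-mean diagnostics from a stored reference run.
reference = {"surface_temp_K": 287.5, "toa_net_flux_Wm2": 0.8}

# Hypothetical run after a code change that perturbs one diagnostic.
candidate = {"surface_temp_K": 287.5, "toa_net_flux_Wm2": 0.9}

print(regression_failures(reference, candidate))
```

A defect whose effect stays within the tolerance on every diagnostic would pass a check like this, which is the case described above: it cannot be told apart from the acceptable error of the approximations themselves.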

This means that there are really only two serious error types that can lead to misleading scientific results: (1) misunderstanding of what the model is actually doing by the scientists who conduct the model experiments, and (2) structural errors, where specific earth system processes are omitted or poorly captured in the model. In flight control software, these would correspond to requirements errors, and would be probed by an IV&V team through specification analysis. Catching these in control software is vital because you only get one chance to get it right. But in climate science, these are science errors, and are handled very well by the scientific process: making such mistakes, learning from them, and correcting them are all crucial parts of doing science. The normal scientific peer review process handles these kinds of errors very well. Model developers publish the details of their numerical algorithms and parameterization schemes, and these are reviewed and discussed in the community. In many cases, different labs will attempt to build their own implementations from these descriptions, and in the process subject them to critical scrutiny. In other words, there is already an independent expert review process for the most critical parts of the models, using the normal scientific route of replicating one another’s techniques. Similarly, experimental results are published, and the data is made available for other scientists to explore.

As a measure of how well this process works for building scientifically valid models, one senior modeller recently pointed out to me that it’s increasingly the case now that when the models diverge from the observations, it’s often the observational data that turns out to be wrong. The observational data is itself error prone, and software models turn out to be an important weapon in identifying and eliminating such errors.

However, there is another risk here that needs to be dealt with. Outside of the labs where the models are developed, there is a tendency for scientists who want to make use of the models to treat them as black box oracles. Proper use of the models depends on a detailed understanding of their strengths and weaknesses, and the ways in which uncertainties are handled. If we have some funding available to improve the quality of climate models, it would be far better spent on improving the user interfaces, and better training of the broader community of model users.

The bottom line is that climate models are subjected to very intensive system testing, and the incremental development process incorporates a sophisticated regression test process that’s superior to most industrial software practices. The biggest threat to validity of climate models is errors in the scientific theories on which they are based, but such errors are best investigated through the scientific process, rather than through an IV&V process. Which brings us to:

3) The assumption that our ability to trust in the models can be improved by an IV&V process.

IV&V is essentially a risk management strategy for safety-critical software for which an iterative development strategy is not possible – where the software has to work correctly the first (and every) time it is used in an operational setting. Climate models aren’t like this at all. They aren’t safety critical; they can be used even while they are being developed (and hence are built by iterative refinement); and they solve complex, wicked problems, for which there are no clear correctness criteria. In fact, as a species of software development process, I’ve come to the conclusion they are dramatically different from any of the commercial software development paradigms that have been described in the literature.

A common mistake in the software engineering community is to think that software processes can be successfully transplanted from one organisation to another. Our comparative studies of different software organizations show that this is simply not true, even for organisations developing similar types of software. There are few, if any, documented cases of a software development organisation successfully adopting a process model developed elsewhere, without very substantial tailoring. What usually happens is that ideas from elsewhere are gradually infused and re-fashioned to work in the local context. And the evidence shows that every software organisation evolves its own development processes that are highly dependent on local context, and on the constraints they operate under. Far more important than a prescribed process is the development of a shared understanding within the software team. The idea of taking a process model that was developed in the aerospace industry, and transplanting it wholesale into a vastly different kind of software development process (climate modeling) is quite simply ludicrous.

For example, one consequence of applying IV&V is that it reduces flexibility for the development team, as they have to set clearer milestones and deliver workpackages on schedule (otherwise the IV&V team cannot plan their efforts). Because the development of scientific codes is inherently unpredictable, it would be almost impossible to plan and resource an IV&V effort. The flexibility to explore new model improvements opportunistically, and to adjust schedules to match varying scientific rhythms, is crucial to the scientific mission – locking the development into more rigid schedules to permit IV&V would be a disaster.

If you wanted to set up an IV&V process for climate models, it would have to be done by domain experts; domain expertise is the single most important factor in successful use of IV&V in the aerospace industry. This means it would have to be done by other climate scientists. But other climate scientists already do this routinely – it’s built into the Model Intercomparison Projects, the peer review process, and attempts to replicate one another’s results. In fact the Model Intercomparison Projects already achieve far more than an IV&V process would, because they are done in the open and involve a much broader community.

In other words, the available pool of talent for performing IV&V is already busy using a process that’s far more effective than IV&V ever can be: it’s called doing science. Actually, I suspect that those people calling for IV&V of climate models are really trying to say that climate scientists can’t be trusted to check each other’s work, and that some other (unspecified) group ought to do the IV&V for them. However, this argument can only be used by people who don’t understand what IV&V is. IV&V works in the aerospace industry not because of any particular process, but because it brings in the experts – the people with grey hair who understand the flight systems inside out, and understand all the risks.

And remember that IV&V is expensive. NASA’s rule of thumb was an additional 10%-20% of the development cost. This cannot be taken from the development budget – it’s strictly an additional cost. Given my estimate of the development cost of a climate model as somewhere in the ballpark of $350 million, we’ll need to find at least another $35 million for each climate modeling centre to fund their IV&V contract. And if we had such funds to add to their budgets, I would argue that IV&V is one of the least sensible ways of spending this money. Instead, I would:

  • Hire more permanent software support staff to work alongside the scientists;
  • Provide more training courses to give the scientists better software skills;
  • Do more research into modeling frameworks;
  • Experiment with incremental improvements to existing practices, such as greater use of testing tools and frameworks, pair programming and code sprints;
  • Provide more support to grow the user communities (e.g. user workshops and training courses), and more community building and beta testing;
  • Document the existing software development and V&V best practices so that different labs can share ideas and experiences, and the process of model building becomes more transparent to outsiders.
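As a quick check on the size of the budget in question, the rule of thumb above can be worked through directly. This is only arithmetic on the two figures already quoted (the $350 million estimate and the 10%-20% range); the function name is made up for illustration.

```python
# Figures quoted in the text: estimated development cost of one climate model
# and the 10%-20% rule of thumb for the additional cost of IV&V.
DEVELOPMENT_COST_MILLIONS = 350
IVV_PERCENT_LOW, IVV_PERCENT_HIGH = 10, 20


def ivv_cost_range(development_cost_millions, low_percent, high_percent):
    """Return the (low, high) additional IV&V cost in millions of dollars."""
    low = development_cost_millions * low_percent / 100
    high = development_cost_millions * high_percent / 100
    return low, high


low, high = ivv_cost_range(DEVELOPMENT_COST_MILLIONS, IVV_PERCENT_LOW, IVV_PERCENT_HIGH)
print(f"Additional IV&V cost per centre: ${low:.0f}M to ${high:.0f}M")
```

So the $35 million figure is the bottom of the range; at 20% the additional cost per centre would be twice that.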

To summarize, IV&V would be an expensive mistake for climate modeling. It would divert precious resources (experts) away from existing modeling teams, and reduce their flexibility to respond to the science. IV&V isn’t appropriate because this isn’t mission- or safety-critical software, it doesn’t have distinct development and operational phases, and the risks of software error are minor. There’s no single point of failure, because many labs around the world build their own models, and the normal scientific processes of experimentation, peer-review, replication, and model inter-comparison already provide a sophisticated process to examine the scientific validity of the models. Virtually all coding errors are detected in routine testing, and science errors are best handled through the usual scientific process, rather than through an IV&V process. Furthermore, there is only a small pool of experts available to perform IV&V on climate models (namely, other climate modelers) and they are already hard at work improving their own models. Re-deploying them to do IV&V of each other’s models would reduce the overall quality of the science rather than improving it.

(BTW I shouldn’t have had to write this article at all…)

Eugenia Kalnay has an interesting talk on a core problem that most people avoid when talking about climate: the growth in human population. It’s a difficult subject politically, because any analysis of the link between emissions growth and population growth invites the simple-minded response that de-population is the solution, which then quickly sinks into accusations that environmentalists are misanthropes.

In her talk, “Population and Climate Change: A Proposal“, Kalnay makes some excellent observations, for example that per dollar spent, family planning reduces four times as much carbon over the next 40 years as adoption of low-carbon technologies, and yet family planning is still not discussed at the COP meetings, because it is taboo. The cause and effect is a little complicated too. While it’s clear that more people means more fossil fuel emissions, it’s also the case that fossil fuels enabled the massive population growth – without fossil fuels the human population would be much smaller.

Kalnay then points out that, rather than thinking about coercive approaches to population control, there’s a fundamental human rights issue here: most women would prefer not to have lots of kids (especially not the averages of 6 or more in the developing world), but they simply have no choice. Kalnay cites a UN poll that shows “in many countries more than 80% of married women of reproductive age with 2 children, do not want to have more children”, and estimates showing that 40% of pregnancies worldwide are unwanted. And the most effective strategies to address this are education, access to birth control, and equal (economic) opportunities for women.

There’s also the risk of population collapse. Kalnay discussed the Club of Rome analysis that first alerted the world to the possibility of overshoot and collapse, and which was roundly dismissed by economists as absurd. But despite a whole lot of denialism, the models are still valid, and correspond well with what actually happened: rather than approaching the carrying capacity of the earth asymptotically, we have overshot. These dynamics models now show population collapse in most scenarios, rather than a slight overshoot and oscillation.

Kalnay concludes with a strong argument that we need to start including population dynamics into climate modelling, to help understand how different population growth scenarios impact emissions, and also to explore, from a scientific point of view, what the limits to growth really look like when we include earth system dynamics and resource depletion. And, importantly, she points out that you can’t do this by just modeling human population at the global level; we will need regional models to capture the different dynamics in different regions of the globe, as both the growth/decline rates, and the per capita emissions rates vary widely in different countries/regions.

Following my post last week about Fortran coding standards for climate models, Tim reminded me of a much older paper that was very influential in the creation (and sharing) of coding standards across climate modeling centers:

The paper is the result of a series of discussions in the mid-1980s across many different modeling centres (the paper lists 11 labs) about how to facilitate sharing of code modules. To simplify things, the paper assumes what is being shared are parameterization modules that operate in a single column of the model. Of course, this was back in the 1980s, which means the models were primarily atmospheric models, rather than the more comprehensive earth system models of today. The dynamical core of the model handles most of the horizontal processes (e.g. wind), which means that most of the remaining physical processes (the subject of these parameterizations) affect what happens vertically within a single column, e.g. by affecting radiative or convective transfer of heat between the layers. Plugging in new parameterization modules becomes much easier if this assumption holds, because the new module needs to be called once per time step per column, and if it doesn’t interact with other columns, it doesn’t mess up the vectorization. The paper describes a number of coding conventions, effectively providing an interface specification for single-column parameterizations.
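The single-column assumption can be sketched as an interface. The Python below is an illustration only: the `Column` type, the toy `relax_to_mean` scheme and the `step` driver are hypothetical names, not the conventions defined in the paper, and a real model would do this in Fortran over arrays of columns. The point it shows is that each parameterization sees exactly one column, and is called once per time step per column.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Column:
    """State of one vertical column: one value per model layer (hypothetical)."""
    temperature: List[float]  # kelvin, ordered from the surface upwards


# A plug-compatible parameterization takes one column and the time step, and
# returns a temperature tendency (K/s) per layer. It never sees other columns.
Parameterization = Callable[[Column, float], List[float]]


def relax_to_mean(column: Column, dt: float) -> List[float]:
    """Toy scheme: nudge every layer toward the column-mean temperature."""
    mean = sum(column.temperature) / len(column.temperature)
    timescale = 10 * dt  # hypothetical relaxation timescale
    return [(mean - t) / timescale for t in column.temperature]


def step(columns: List[Column], schemes: List[Parameterization], dt: float) -> None:
    """Call each scheme once per time step per column, then apply the summed tendencies."""
    for column in columns:
        total = [0.0] * len(column.temperature)
        for scheme in schemes:
            tendencies = scheme(column, dt)
            total = [a + b for a, b in zip(total, tendencies)]
        column.temperature = [t + dt * d for t, d in zip(column.temperature, total)]


columns = [Column([300.0, 280.0, 260.0]), Column([250.0, 250.0])]
step(columns, [relax_to_mean], dt=60.0)
print(columns)
```

Because `step` hands each scheme a single column, swapping in a different scheme only requires matching the call signature. What a signature like this cannot capture is the interaction between schemes, which is where the difficulty discussed below comes from.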

An interesting point about this paper is that it popularized the term “plug compatibility” amongst the modeling community, along with the (implicit) broader goal of designing all models to be plug-compatible (although it cites Pielke & Arritt for the origin of the term). Unfortunately, the goal still seems to be very elusive. While most modelers will accept that plug-compatibility is desirable, a few people I’ve spoken to are very skeptical that it’s actually possible. Perhaps the strongest statement on this is from:

  • Randall DA. A University Perspective on Global Climate Modeling. Bulletin of the American Meteorological Society. 1996;77(11):2685-2690.
    p2687: “It is sometimes suggested that it is possible to make a plug-compatible global model so that an “outside” scientist can “easily make changes”. With a few exceptions (e.g. radiation codes), however, this is a fantasy, and I am surprised that such claims are not greeted with more skepticism.”

He goes on to describe instances where parameterizations have been transplanted from one model to another, but likens the process to a major organ transplant, only more painful. The problem is that the various processes of the earth system interact in complex ways, and these complex interactions have to be handled properly in the code. As Randall puts it: “…the reality is that a global model must have a certain architectural unity or it will fail”. In my interviews with climate modellers, I’ve heard many tales of it taking months, and sometimes years, of effort to take a code module contributed by someone outside the main modeling group, and to make it work properly in the model.

So plug compatibility and code sharing sound great in principle. In practice, no amount of interface specification and coding standards can reduce the essential complexity of earth system processes.

Note: most of the above is about plug compatibility of parameterization modules (i.e. code packages that live within the green boxes on the Bretherton diagram). More progress has been made (especially in the last decade) in standardizing the interfaces between major earth system components (i.e. the arrows on the Bretherton diagram). That’s where standardized couplers come in – see my post on the high level architecture of earth system models for an introduction. The IS-ENES workshop on coupling technologies in December will be an interesting overview of the state of the art here, although I won’t be able to attend, as it clashes with the AGU meeting.

Here’s an interesting article entitled “Decoding the Value of Computer Science” in the Chronicle of Higher Education. The article purports to be about the importance of computer science degrees, and the risks of not enough people enrolling for such degrees these days. But it seems to me it does a much better job of demonstrating the idea of computational thinking, i.e. that people who have been trained to program approach problems differently from those who have not.

It’s this approach to problem solving that I think we need more of in tackling the challenge of climate change.