Applying systems thinking to computing, climate and sustainability

Verifying Forecasting Systems

02. December 2010 · 1 comment · Categories: climate modeling

My post on validating climate models suggested that the key validation criterion is the extent to which the model captures (some aspect of) the current scientific theory, and is useful in exploring the theory. In effect, I’m saying that climate models are scientific tools, and should be validated as scientific tools. This makes them very different from, say, numerical weather prediction (NWP) software, which is used in an operational setting to provide a service (predicting the weather).

What’s confusing is that both communities (climate modeling and weather modeling) use many of the same techniques both for the design of the models, and for comparing the models with observational data.

For NWP, forecast accuracy is the overriding objective, and the community has developed an extensive methodology for doing forecast verification. I pondered for a while whether this use of the term ‘verification’ is consistent with my definitions, because surely we should be “validating” a forecast rather than “verifying” it. After thinking about it for a while, I concluded that the terminology is consistent, because forecast verification is like checking a program against its specification. In this case the specification states precisely what is being predicted, with what accuracy, and what would constitute a successful forecast (Bob Grumbine gives a recent example in verifying accuracy of seasonal sea ice forecasts). The verification procedure checks that the actual forecast was accurate, within the criteria set by this specification. Whether or not the forecast was useful is another question: that’s the validation question (and it’s a subjective question that requires some investigation of why people want forecasts in the first place).
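To make the “checking against a specification” idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `ForecastSpec` fields, the hit-rate criterion, and the numbers are made up, and real NWP verification uses a much richer set of scores. The point is only that the specification fixes what is predicted, how close counts as accurate, and what counts as success, so the check itself needs no subjective judgement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForecastSpec:
    """Hypothetical forecast specification (names are illustrative only)."""
    quantity: str             # what is being predicted
    tolerance: float          # a forecast is a "hit" if |forecast - observed| <= tolerance
    required_hit_rate: float  # fraction of hits needed for the system to pass


def verify_forecasts(spec, forecasts, observations):
    """Return (hit_rate, passed) for paired forecasts and observations."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need equal-length, non-empty forecast and observation lists")
    hits = sum(abs(f - o) <= spec.tolerance for f, o in zip(forecasts, observations))
    hit_rate = hits / len(forecasts)
    return hit_rate, hit_rate >= spec.required_hit_rate


# Made-up numbers, purely for illustration.
spec = ForecastSpec("daily max temperature (deg C)", tolerance=2.0, required_hit_rate=0.75)
forecasts = [21.0, 18.5, 25.0, 30.0]
observations = [20.0, 19.0, 28.5, 29.0]

hit_rate, passed = verify_forecasts(spec, forecasts, observations)
print(f"hit rate = {hit_rate:.2f}, passed = {passed}")
```

Note that the check only sees forecasts and observations. It says nothing about whether the forecasts came from a model, a human forecaster, or both, and nothing about whether anyone found them useful.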

An important point here is that forecast verification is not software verification: it doesn’t verify a particular piece of software. It’s also not simulation verification: it doesn’t verify a given run produced by that software. It’s verification of an entire forecasting system. A forecasting system makes use of computational models (often more than one), as well as a bunch of experts who interpret the model results. It also includes an extensive data collection system that gathers information about the current state of the world to use as input to the model. (And of course, some forecasting systems don’t use computational models at all). So:

If the forecast is inaccurate (according to the forecast criteria), it doesn’t necessarily mean there’s a flaw in the models – it might just as well be a flaw in the interpretation of the model outputs, or in the data collection process that provided its inputs. Oh, and of course, the verification might also fail because the specification is wrong, e.g. because there are flaws in the observational system used in the verification procedure too.
If the forecasting system persistently produces accurate forecasts (according to the forecast criteria), that doesn’t necessarily tell us anything about the quality of the software itself; it just means that the entire forecast system worked. It may well be that the model is very poor, but the meteorologists who interpret model outputs are brilliant at overcoming the weaknesses in the model (perhaps in the way they configure the runs, or perhaps in the way they filter model outputs), to produce accurate forecasts for their customers.

However, one effect of using this forecast verification approach day-in-day-out for weather forecasting systems over several decades (with an overall demand from customers for steady improvements in forecast accuracy) is that all parts of the forecasting system have improved dramatically over the last few decades, including the software. And climate modelling has benefited from this, as improvements in the modelling of processes needed for NWP can often also be used to improve the climate models (Senior et al have an excellent chapter on this in a forthcoming book, which I will review nearer to the publication date).

The question is, can we apply a similar forecast verification methodology to the “climate forecasting system”, despite the differences between weather and climate?

Note that the question isn’t about whether we can verify the accuracy of climate models this way, because the methodology doesn’t separate the models from the broader system in which they are used. So, if we take this route at all, we’re attempting to verify the forecast accuracy of the whole system: collection of observational data, creation of theories, use of these theories to develop models, choices for which model and which model configuration to use, choices for how to set up the runs, and interpretation of the results.

Climate models are not designed as forecasting tools; they are designed as tools to explore current theories about the climate system, and to investigate sources of uncertainty in these theories. However, the fact that they can be used to project potential future climate change (under various scenarios) is very handy. Of course, this is not the only way to produce quantified estimates of future climate change – you can do it using paper and pencil. It’s also a little unfortunate, because the IPCC process (or at least the end-users of IPCC reports) tends to over-emphasize the model projections at the expense of the science that went into them, and increasingly the funding for the science is tied to the production of such projections.

But some people (both within the climate modeling community and within the denialist community) would prefer that they not be used to project future climate change at all. (The argument from within the modelling community is that the results get over-interpreted or mis-interpreted by lay audiences; the argument from the denialist community is that models aren’t perfect. I think these two arguments are connected…). However, both arguments ignore reality: society demands of climate science that it provides its best estimates of the rate and size of future climate change, and (to the extent that they embody what we currently know about climate) the models are the best tool for this job. Not using them in the IPCC assessments would be like marching into the jungle with one eye closed.

So, back to the question: can we use NWP forecast verification for climate projections? I think the answer is ‘no’, because of the timescales involved. Projections of climate change really only make sense on the scale of decades to centuries. Waiting for decades to do the verification is pointless – by then the science will have moved on, and it will be way too late for policymaking purposes anyway.

If we can’t verify the forecasts on a timescale that’s actually useful, does this mean the models are invalid? Again the answer is ‘no’, for three reasons. First, we have plenty of other V&V techniques to apply to climate models. Second, the argument that climate models are a valid tool for creating future projections of climate change is based not on our ability to do forecast verification, but on how well the models capture the current state of the science. And third, because forecast verification wouldn’t necessarily say anything about the models themselves anyway, as it assesses the entire forecast system.

It would certainly be really, really useful to be able to verify the “climate forecast” system. But the fact that we can’t does not mean we cannot validate climate models.

Headline Spin

30. November 2010 · 2 comments · Categories: humour

Prem sent me a picture this morning which beautifully illustrates the way the media portrays science. On the left, the Wall Street Journal (page D1). On the right, the New York Times (page A1). Both today:

Reminds me of one of my favourite cartoons:

Validating Climate Models

30. November 2010 · 16 comments · Categories: climate modeling

In my last two posts, I demolished the idea that climate models need Independent Verification and Validation (IV&V), and I described the idea of a toolbox approach to V&V. Both posts were attacking myths: in the first case, the myth that an independent agent should be engaged to perform IV&V on the models, and in the second, the myth that you can critique the V&V of climate models without knowing anything about how they are currently built and tested.

I now want to expand on the latter point, and explain how the day-to-day practices of climate modellers taken together constitute a robust validation process, and that the only way to improve this validation process is just to do more of it (i.e. give the modeling labs more funds to expand their current activities, rather than to do something very different).

The most common mistake made by people discussing validation of climate models is to assume that a climate model is a thing-in-itself, and that the goal of validation is to demonstrate that some property holds of this thing. And whatever that property is, the assumption is that such measurement of it can be made without reference to its scientific milieu, and in particular without reference to its history and the processes by which it was constructed.

This mistake leads people to talk of validation in terms of how well “the model” matches observations, or how well “the model” matches the processes in some real world system. This approach to validation is, as Oreskes et al pointed out, quite impossible. The models are numerical approximations of complex physical phenomena. You can verify that the underlying equations are coded correctly in a given version of the model, but you can never validate that a given model accurately captures real physical processes, because it never will accurately capture them. Or as George Box summed it up: “All models are wrong…” (we’ll come back to the second half of the quote later).

The problem is that there is no such thing as “the model”. The body of code that constitutes a modern climate model actually represents an enormous number of possible models, each corresponding to a different way of configuring that code for a particular run. Furthermore, this body of code isn’t a static thing. The code is changed on a daily basis, through a continual process of experimentation and model improvement. Often these changes are done in parallel, so that there are multiple versions at any given moment, being developed along multiple lines of investigation. Sometimes these lines of evolution are merged, to bring a number of useful enhancements together into a single version. Occasionally, the lines diverge enough to cause a fork: a point at which they are different enough that it just becomes too hard to reconcile them (see, for example, this visualization of the evolution of ocean models). A forked model might at some point be given a new name, but the process by which a model gets a new name is rather arbitrary.

Occasionally, a modeling lab will label a particular snapshot of this evolving body of code as an “official release”. An official release has typically been tested much more extensively, in a number of standard configurations for a variety of different platforms. It’s likely to be more reliable, and therefore easier for users to work with. By more reliable here, I mean relatively free from coding defects. In other words, it is better verified than other versions, but not necessarily better validated (I’ll explain why shortly). In many cases, official releases also contain some significant new science (e.g. new parameterizations), and these scientific enhancements will be described in a set of published papers.

However, an official release isn’t a single model either. Again it’s just a body of code that can be configured to run as any of a huge number of different models, and it’s not unchanging either – as with all software, there will be occasional bugfix releases applied to it. Oh, and did I mention that to run a model, you have to make use of a huge number of ancillary datafiles, which define everything from the shape of the coastlines and land surfaces, to the specific carbon emissions scenario to be used. Any change to these effectively gives a different model too.

So, if you’re hoping to validate “the model”, you have to say which one you mean: which configuration of which code version of which line of evolution, and with which ancillary files. I suppose the response from those clamouring for something different in the way of model validation would be “well, the one used for the IPCC projections, of course”. Which is a little tricky, because each lab produces a large number of different runs for the CMIP process that provides input to the IPCC, and each of these is likely to involve a different model configuration.
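One way to picture what “which one you mean” involves is to write the identity of a model down as a record. The sketch below is purely illustrative: the `ModelIdentity` class, its field names, and the example values are hypothetical, and are not taken from any modeling lab’s actual tooling. It just shows that two runs only use “the same model” if every one of these parts matches.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelIdentity:
    """Hypothetical record of everything needed to say which model was run."""
    line_of_evolution: str      # which branch/fork of the code base
    code_version: str           # which snapshot on that line
    configuration: str          # how the code was configured for this run
    ancillary_files: frozenset  # coastlines, land surface, emissions scenario, ...


# Made-up example values.
base = ModelIdentity(
    line_of_evolution="example-coupled-model",
    code_version="r1234",
    configuration="standard-resolution",
    ancillary_files=frozenset({"coastlines_v1.dat", "emissions_scenario_a.dat"}),
)

# Swap only the emissions scenario file; the code is untouched.
variant = replace(
    base,
    ancillary_files=frozenset({"coastlines_v1.dat", "emissions_scenario_b.dat"}),
)

print(base == variant)  # False: effectively a different model
```

Even in this toy form, a single code release maps to as many distinct identities as there are combinations of configuration and ancillary files.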

But let’s say for sake of argument that we could agree on a specific model configuration that ought to be “validated”. What will we do to validate it? What does validation actually mean? The Oreskes paper I mentioned earlier already demonstrated that comparison with real world observations, while interesting, does not constitute “validation”. The model will never match the observations exactly, so the best we’ll ever get along these lines is an argument that, on balance, given the sum total of the places where there’s a good match and the places where there’s a poor match, the model does better or worse than some other model. This isn’t validation, and furthermore it isn’t even a sensible way of thinking about validation.

At this point many commentators stop, and argue that if validation of a model isn’t possible, then the models can’t be used to support the science (or more usually, they mean they can’t be used for IPCC projections). But this is a strawman argument, based on a fundamental misconception of what validation is all about. Validation isn’t about checking that a given instance of a model satisfies some given criteria. Validation is about fitness for purpose, which means it’s not about the model at all, but about the relationship between a model and the purposes to which it is put. Or more precisely, it’s about the relationship between particular ways of building and configuring models and the ways in which runs produced by those models are used.

Furthermore, the purposes to which models are put and the processes by which they are developed co-evolve. The models evolve continually, and our ideas about what kinds of runs we might use them for evolve continually, which means validation must take this ongoing evolution into account. To summarize, validation isn’t about a property of some particular model instance; it’s about the whole process of developing and using models, and how this process evolves over time.

Let’s take a step back a moment, and ask what is the purpose of a climate model. The second half of the George Box quote is “…but some models are useful”. Climate models are tools that allow scientists to explore their current understanding of climate processes, to build and test theories, and to explore the consequences of those theories. In other words we’re dealing with three distinct systems:

We're dealing with relationships between three different systems

There does not need to be any clear relationship between the calculational system and the observational system – I didn’t include such a relationship in my diagram. For example, climate models can be run in configurations that don’t match the real world at all: e.g. a waterworld with no landmasses, or a world in which interesting things are varied: the tilt of the pole, the composition of the atmosphere, etc. These models are useful, and the experiments performed with them may be perfectly valid, even though they differ deliberately from the observational system.

What really matters is the relationship between the theoretical system and the observational system: in other words, how well does our current understanding (i.e. our theories) of climate explain the available observations (and of course the inverse: what additional observations might we make to help test our theories). When we ask questions about likely future climate changes, we’re not asking this question of the calculational system, we’re asking it of the theoretical system; the models are just a convenient way of probing the theory to provide answers.

By the way, when I use the term theory, I mean it in exactly the way it’s used throughout all sciences: a theory is the best current explanation of a given set of phenomena. The word “theory” doesn’t mean knowledge that is somehow more tentative than other forms of knowledge; a theory is actually the kind of knowledge that has the strongest epistemological basis of any kind of knowledge, because it is supported by the available evidence, and best explains that evidence. A theory might not be capable of providing quantitative predictions (but it’s good when it does), but it must have explanatory power.

In this context, the calculational system is valid as long as it can offer insights that help to understand the relationship between the theoretical system and the observational system. A model is useful as long as it helps to improve our understanding of climate, and to further the development of new (or better) theories. So a model that might have been useful (and hence valid) thirty years ago might not be useful today. If the old approach to modelling no longer matches current theory, then it has lost some or all of its validity. The model’s correspondence (or lack of) to the observations hasn’t changed (*), nor has its predictive power. But its utility as a scientific tool has changed, and hence its validity has changed.

[(*) except that the accuracy of the observations may have changed in the meantime, due to the ongoing process of discovering and resolving anomalies in the historical record.]

The key questions for validation then, are to do with how well the current generation of models (plural) support the discovery of new theoretical knowledge, and whether the ongoing process of improving those models continues to enhance their utility as scientific tools. We could focus this down to specific things we could measure by asking whether each individual change to the model is theoretically justified, and whether each such change makes the model more useful as a scientific tool.

To do this requires a detailed study of day-to-day model development practices, and of the extent to which these are closely tied with the rest of climate science (e.g. field campaigns, process studies, etc). It also takes in questions such as how modeling centres decide on their priorities (e.g. which new bits of science to get into the models sooner), and how each individual change is evaluated. In this approach, validation proceeds by checking whether the individual steps taken to construct and test changes to the code add up to a sound scientific process, and how good this process is at incorporating the latest theoretical ideas. And we ought to be able to demonstrate a steady improvement in the theoretical basis for the model. An interesting quirk here is that sometimes an improvement to the model from a theoretical point of view reduces its skill at matching observations; this happens particularly when we’re replacing bits of the model that were based on empirical parameters with an implementation that has a stronger theoretical basis, because the empirical parameters were tuned to give a better climate simulation, without necessarily being well understood. In the approach I’m describing, this would be an indicator of an improvement in validity, even while it reduces the correspondence with observations. If on the other hand we based our validation on some measure of correspondence with observations, such a step would reduce the validity of the model!

But what does all of this tell us about whether it’s “valid” to use the models to produce projections of climate change into the future? Well, recall that when we ask for projections of future climate change, we’re not asking the question of the calculational system, because all that would result in is a number, or range of numbers, that are impossible to interpret, and therefore meaningless. Instead we’re asking the question of the theoretical system: given the sum total of our current theoretical understanding of climate, what is likely to happen in the future, under various scenarios for expected emissions and/or concentrations of greenhouse gases? If the models capture our current theoretical understanding well, then running the scenario on the model is a valid thing to do. If the models do a poor job of capturing our theoretical understanding, then running the models on these scenarios won’t be very useful.

Note what is happening here: when we ask climate scientists for future projections, we’re asking the question of the scientists, not of their models. The scientists will apply their judgement to select appropriate versions/configurations of the models to use, they will set up the runs, and they will interpret the results in the light of what is known about the models’ strengths and weaknesses and about any gaps between the computational models and the current theoretical understanding. And they will add all sorts of caveats to the conclusions they draw from the model runs when they present their results.

And how do we know whether the models capture our current theoretical understanding? By studying the processes by which the models are developed (i.e. continually evolved) by the various modeling centres, and examining how good each centre is at getting the latest science into the models. And by checking that whenever there are gaps between the models and the theory, these are adequately described by the caveats in the papers published about experiments with the models.

Summary: It is a mistake to think that validation is a post-hoc process to be applied to an individual “finished” model to ensure it meets some criteria for fidelity to the real world. In reality, there is no such thing as a finished model, just many different snapshots of a large set of model configurations, steadily evolving as the science progresses. And fidelity of a model to the real world is impossible to establish, because the models are approximations. In reality, climate models are tools to probe our current theories about how climate processes work. Validity is the extent to which climate models match our current theories, and the extent to which the process of improving the models keeps up with theoretical advances.

The difference between Verification and Validation

29. November 2010 · 20 comments · Categories: climate modeling

Sometime in the 1990’s, I drafted a frequently asked question list for NASA’s IV&V facility. Here’s what I wrote on the meaning of the terms “validation” and “verification”:

The terms Verification and Validation are commonly used in software engineering to mean two different types of analysis. The usual definitions are:

Validation: Are we building the right system?

Verification: Are we building the system right?

In other words, validation is concerned with checking that the system will meet the customer’s actual needs, while verification is concerned with whether the system is well-engineered, error-free, and so on. Verification will help to determine whether the software is of high quality, but it will not ensure that the system is useful.

The distinction between the two terms is largely to do with the role of specifications. Validation is the process of checking whether the specification captures the customer’s needs, while verification is the process of checking that the software meets the specification.

Verification includes all the activities associated with producing high quality software: testing, inspection, design analysis, specification analysis, and so on. It is a relatively objective process, in that if the various products and documents are expressed precisely enough, no subjective judgements should be needed in order to verify software.

In contrast, validation is an extremely subjective process. It involves making subjective assessments of how well the (proposed) system addresses a real-world need. Validation includes activities such as requirements modelling, prototyping and user evaluation.

In a traditional phased software lifecycle, verification is often taken to mean checking that the products of each phase satisfy the requirements of the previous phase. Validation is relegated to just the beginning and ending of the project: requirements analysis and acceptance testing. This view is common in many software engineering textbooks, and is misguided. It assumes that the customer’s requirements can be captured completely at the start of a project, and that those requirements will not change while the software is being developed. In practice, the requirements change throughout a project, partly in reaction to the project itself: the development of new software makes new things possible. Therefore both validation and verification are needed throughout the lifecycle.

Finally, V&V is now regarded as a coherent discipline: “Software V&V is a systems engineering discipline which evaluates the software in a systems context, relative to all system elements of hardware, users, and other software”. (from Software Verification and Validation: Its Role in Computer Assurance and Its Relationship with Software Project Management Standards, by Dolores R. Wallace and Roger U. Fujii, NIST Special Publication 500-165)

Having thus carefully distinguished the two terms, my advice to V&V practitioners was then to forget about the distinction, and think instead about V&V as a toolbox, which provides a wide range of tools for asking different kinds of questions about software. And to master the use of each tool and figure out when and how to use it. Here’s one of my attempts to visualize the space of tools in the toolbox:

A range of V&V techniques. Note that "modeling" and "model checking" refer to building and analyzing abstracted models of software behaviour, a very different kind of beast from scientific models used in the computational sciences

For climate models, the definitions that focus on specifications don’t make much sense, because there are no detailed specifications of climate models (nor can there be – they’re built by iterative refinement, like agile software development). But no matter – the toolbox approach still works; it just means some of the tools are applied a little differently. An appropriate toolbox for climate modeling looks a little different from my picture above, because some of these tools are more appropriate for real-time control systems, applications software, etc, and there are some missing from the above picture that are particular to simulation software. I’ll draw a better picture when I’ve finished analyzing the data from my field studies of practices used at climate labs.

Many different V&V tools are already in use at most climate modelling labs, but there is room for adding more tools to the toolbox, and for sharpening the existing tools (what and how are the subjects of my current research). But the question of how best to do this must proceed from a detailed analysis of current practices and how effective they are. There seem to be plenty of people wandering into this space, claiming that the models are insufficiently verified, validated, or both. And such people like to pontificate about what climate modelers ought to do differently. But anyone who pontificates in this way, but is unable to give a detailed account of which V&V techniques climate modellers currently use, is just blowing smoke. If you don’t know what’s in the toolbox already, then you can’t really make constructive comments about what’s missing.

Do Climate Models need Independent Verification and Validation?

27. November 2010 · 29 comments · Categories: climate modeling

A common cry from climate contrarians is that climate models need better verification and validation (V&V), and in particular, that they need Independent V&V (aka IV&V). George Crews has been arguing this for a while, and now Judith Curry has taken up the cry. Having spent part of the 1990’s as lead scientist at NASA’s IV&V facility, and the last few years studying climate model development processes, I think I can offer some good insights into this question.

The short answer is “no, they don’t”. The slightly longer answer is “if you have more money to spend to enhance the quality of climate models, spending it on IV&V is probably the least effective thing you could do”.

The full answer involves deconstructing the question, to show that it is based on three incorrect assumptions about climate models: (1) that there’s some significant risk to society associated with the use of climate models; (2) that the existing models are inadequately tested / verified / validated / whatevered; and (3) that trust in the models can be improved by using an IV&V process. I will demonstrate what’s wrong with each of these assumptions, but first I need to explain what IV&V is.

Independent Verification and Validation (IV&V) is a methodology developed primarily in the aerospace industry for reducing the risk of software failures, by engaging a separate team (separate from the software development team, that is) to perform various kinds of testing and analysis on the software as it is produced. NASA adopted IV&V for development of the flight software for the space shuttle in the 1970’s. Because IV&V is expensive (it typically adds 10%-20% to the cost of a software development contract), NASA tried to cancel the IV&V on the shuttle in the early 1980’s, once the shuttle was declared operational. Then, of course the Challenger disaster occurred. Although software wasn’t implicated, a consequence of the investigation was the creation of the Leveson committee, to review the software risk. Leveson’s committee concluded that far from cancelling IV&V, NASA needed to adopt the practice across all of its space flight programs. As a result of the Leveson report, the NASA IV&V facility was established in the early 1990’s, as a centre of expertise for all of NASA’s IV&V contracts. In 1995, I was recruited as lead scientist at the facility, and while I was there, our team investigated the operational effectiveness of the IV&V contracts on the Space Shuttle, International Space Station, Earth Observation System, Cassini, as well as a few other smaller programs. (I also reviewed the software failures on NASA’s Mars missions in the 1990’s, and have a talk about the lessons learned)

The key idea for IV&V is that when NASA puts out a contract to develop flight control software, it also creates a separate contract with a different company, to provide an ongoing assessment of software quality and risk as the development proceeds. One difficulty with IV&V contracts in the US aerospace industry is that it’s hard to achieve real independence, because industry consolidation has left very few aerospace companies available to take on such contracts, and they’re not sufficiently independent from one another.

NASA’s approach demands independence along three dimensions:

managerial independence (the IV&V contractor is free to determine how to proceed, and where to devote effort, independently of both the software development contractor and the customer);
financial independence (the funding for the IV&V contract is separate from the development contract, and cannot be raided if more resources are needed for development); and
technical independence (the IV&V contractor is free to develop its own criteria, and apply whatever V&V methods and tools it deems appropriate).

This has led to the development of a number of small companies who specialize only in IV&V (thus avoiding any contractual relationship with other aerospace companies), and who tend to recruit ex-NASA staff to provide them with the necessary domain expertise.

For the aerospace industry, IV&V has been demonstrated to be a cost effective strategy to improve software quality and reduce risk. The problem is that the risks are extreme: software errors in the control software for a spacecraft or an aircraft are highly likely to cause loss of life, loss of the vehicle, and/or loss of the mission. There is a sharp distinction between the development phase and the operation phase for such software: it had better be correct when it’s launched. Which means the risk mitigation has to be done during development, rather than during operation. In other words, iterative/agile approaches don’t work – you can’t launch with a beta version of the software. The goal is to detect and remove software defects before the software is ever used in an operational setting. An extreme example of this was the construction of the space station, where the only full end-to-end construction of the system was done in orbit; it wasn’t possible to put the hardware together on the ground in order to do a full systems test on the software.

IV&V is essential for such projects, because it overcomes the natural confirmation bias of software development teams. Even the NASA program managers overseeing the contracts suffer from this – we discovered one case where IV&V reports on serious risks were being systematically ignored by the NASA program office, because the program managers preferred to believe the project was going well. We fixed this by changing the reporting structure, and routing the IV&V reports directly to the Office of Safety and Mission Assurance at NASA headquarters. The IV&V teams developed their own emergency strategy too – if they encountered a risk that they considered mission-critical, and couldn’t get the attention of the program office to address it, they would go and have a quiet word with the astronauts, who would then ensure the problem got seen to!

But IV&V is very hard to do right, because much of it is a sociological problem rather than a technical problem. The two companies (developer and IV&V contractor) are naturally set up in an adversarial relationship, but if they act as adversaries, they cannot be effective: the developer will have a tendency to hide things, and the IV&V contractor will have a tendency to exaggerate the risks. Hence, we observed that the relationship is most effective where there is a good horizontal communication channel between the technical staff in each company, and where they come to respect one another’s expertise. The IV&V contractor has to be careful not to swamp the communication channels with spurious low-level worries, and the development contractor must be willing to respond positively to criticism. One approach that worked very well was for the IV&V team to give the developers advance warning of any issues they planned to report up the hierarchy to NASA, so that the development contractor could have a solution in place even before NASA asked for it. For a more detailed account of these coordination and communication issues, see:

Easterbrook S. The Role of Independent V&V in Upstream Software Development Processes. In: 2nd World Conference on Integrated Design and Process Technology (IDPT). Austin, Texas; 1996.

Okay, let’s look at whether IV&V is applicable to climate modeling. Earlier, I identified three assumptions made by people advocating it. Let’s take them one at a time:

1) The assumption that there’s some significant risk to society associated with the use of climate models.

A large part of the mistake here is to misconstrue the role of climate models in policymaking. Contrarians tend to start from an assumption that proposed climate change mitigation policies (especially any attempt to regulate emissions) will wreck the economies of the developed nations (or specifically the US economy, if it’s an American contrarian). I prefer to think that a massive investment in carbon-neutral technologies will be a huge boon to the world’s economy, but let’s set aside that debate, and assume for the sake of argument that whatever policy path the world takes, it’s incredibly risky, with a non-negligible probability of global catastrophe if the policies are either too aggressive or not aggressive enough, i.e. if the scientific assessments are wrong.

The key observation is that software does not play the same role in this system that flight software does for a spacecraft. For a spacecraft, the software represents a single point of failure. An error in the control software can immediately cause a disaster. But climate models are not control systems, and they do not determine climate policy. They don’t even control it indirectly – policy is set by a laborious process of political manoeuvring and international negotiation, in which the impact of any particular climate model is negligible.

Here’s what happens: the IPCC committees propose a whole series of experiments for the climate modelling labs around the world to perform, as part of a Coupled Model Intercomparison Project. Each participating lab chooses those runs they are most able to do, given their resources. When they have completed their runs, they submit the data to a public data repository. Scientists around the world then have about a year to analyze this data, interpret the results, compare the performance of the models, discuss findings at conferences and workshops, and publish papers. This results in thousands of publications from across a number of different scientific disciplines. The publications that make use of model outputs take their place alongside other forms of evidence, including observational studies, studies of paleoclimate data, and so on. The IPCC reports are an assessment of the sum total of the evidence; the model results from many runs of many different models are just one part of that evidence. Jim Hansen rates models as the third most important source of evidence for understanding climate change, after (1) paleoclimate studies and (2) observed global changes.

The consequences of software errors in a model, in the worst case, are likely to extend to no more than a few published papers being retracted. This is a crucial point: climate scientists don’t blindly publish model outputs as truth; they use model outputs to explore assumptions and test theories, and then publish papers describing the balance of evidence. Further papers then come along that add more evidence, or contradict the earlier findings. The assessment reports then weigh up all these sources of evidence.

I’ve been asking around for a couple of years for examples of published papers that were subsequently invalidated by software errors in the models. I’ve found several cases where a version of the model used in the experiments reported in a published paper was later found to contain an important software bug. But in none of those cases did the bug actually invalidate the conclusions of the paper. So even this risk is probably overstated.

The other point to make is that around twenty different labs around the world participate in the Model Intercomparison Projects that provide data for the IPCC assessments. That’s a level of software redundancy that is simply impossible in the aerospace industry. It’s likely that these 20+ models are not quite as independent as they might be (e.g. see Knutti’s analysis of this), but even so, the ability to run many different models on the same set of experiments, and to compare and discuss their differences is really quite remarkable, and the Model Intercomparison Projects have been a major factor in driving the science forward in the last decade or so. It’s effectively a huge benchmarking effort for climate models, with all the benefits normally associated with software benchmarking (and worthy of a separate post – stay tuned).

So in summary, while there are huge risks to society of getting climate policy wrong, those risks are not software risks. A single error in the flight software for a spacecraft could kill the crew. A single error in a climate model can, at most, only affect a handful of the thousands of published papers on which the IPCC assessments are based. The actual results of a particular model run are far less important than the understanding the scientists gain about what the model is doing and why, and the nature of the uncertainties involved. The modellers know that the models are imperfect approximations of very complex physical, chemical and biological processes. Conclusions about key issues such as climate sensitivity are based not on particular model runs, but on many different experiments with many different models over many years, and the extent to which these experiments agree or disagree with other sources of evidence.

2) the assumption that the current models are inadequately tested / verified / validated / whatevered;

This is a common talking point among contrarians. Part of the problem is that while the modeling labs have evolved sophisticated processes for developing and testing their models, they rarely bother to describe these processes to outsiders – nearly all published reports focus on the science done with the models, rather than the modeling process itself. I’ve been working to correct this, with, first, my study of the model development processes at the UK Met Office, and more recently my comparative studies of other labs, and my accounts of the existing V&V processes. Some people have interpreted the latter as a proposal for what should be done, but it is not; it is an account of the practices currently in place across all of the labs I have studied.

A key point is that for climate models, unlike spacecraft flight controllers, there is no enforced separation between software development and software operation. A climate model is always an evolving, experimental tool, it’s never a finished product – even the prognostic runs done as input to the IPCC process are just experiments, requiring careful interpretation before any conclusions can be drawn. If the model crashes, or gives crazy results, the only damage is wasted time.

This means that an iterative development approach is the norm, which is far superior to the waterfall process used in the aerospace industry. Climate modeling labs have elevated the iterative development process to a new height: each change to the model is treated as a scientific experiment, where the change represents a hypothesis for how to improve the model, and a series of experiments is used to test whether the hypothesis was correct. This means that software development proceeds far more slowly than commercial software practices (at least in terms of lines of code per day), but that the models are continually tested and challenged by the people who know them inside out, and comparison with observational data is a daily activity.

The result is that climate models have very few bugs, compared to commercial software, when measured using industry standard defect density measures. However, although defect density is a standard IV&V metric, it’s probably a poor measure for this type of software – it’s handy for assessing risk of failure in a control system, but a poor way of assessing the validity and utility of a climate model. The real risk is that there may be latent errors in the model that mean it isn’t doing what the modellers designed it to do. The good news is that such errors are extremely rare: nearly all coding defects cause problems that are immediately obvious: the model crashes, or the simulation becomes unstable. Coding defects can only remain hidden if they have an effect that is small enough that it doesn’t cause significant perturbations in any of the diagnostic variables collected during a model run; in this case they are indistinguishable from the acceptable imperfections that arise as a result of using approximate techniques. The testing processes for the climate models (which in most labs include a daily build and automated test across all reference configurations) are sufficient that such problems are nearly always identified relatively early.
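For readers unfamiliar with the metric, defect density is conventionally reported as defects per thousand lines of code (KLOC). Here is a minimal sketch of the arithmetic. The project names and figures are made up purely to show the calculation; they are not measurements from any climate model or commercial product.

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Return the number of defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


# Hypothetical figures, chosen only to illustrate the calculation.
hypothetical_projects = {
    "project_a": {"defects": 120, "lines_of_code": 400_000},
    "project_b": {"defects": 900, "lines_of_code": 300_000},
}

densities = {
    name: defect_density(**figures)
    for name, figures in hypothetical_projects.items()
}

for name, density in densities.items():
    print(f"{name}: {density:.2f} defects/KLOC")
```

Note that the metric simply counts reported defects against code size. It says nothing about whether a given defect affects the scientific conclusions drawn from a run, which is one reason it is a poor proxy for validity.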

This means that there are really only two serious error types that can lead to misleading scientific results: (1) misunderstanding of what the model is actually doing by the scientists who conduct the model experiments, and (2) structural errors, where specific earth system processes are omitted or poorly captured in the model. In flight control software, these would correspond to requirements errors, and would be probed by an IV&V team through specification analysis. Catching these in control software is vital because you only get one chance to get it right. But in climate science, these are science errors, and are handled very well by the scientific process: making such mistakes, learning from them, and correcting them are all crucial parts of doing science. The normal scientific peer review process handles these kinds of errors very well. Model developers publish the details of their numerical algorithms and parameterization schemes, and these are reviewed and discussed in the community. In many cases, different labs will attempt to build their own implementations from these descriptions, and in the process subject them to critical scrutiny. In other words, there is already an independent expert review process for the most critical parts of the models, using the normal scientific route of replicating one another’s techniques. Similarly, experimental results are published, and the data is made available for other scientists to explore.

As a measure of how well this process works for building scientifically valid models, one senior modeller recently pointed out to me that it’s increasingly the case now that when the models diverge from the observations, it’s often the observational data that turns out to be wrong. The observational data is itself error prone, and software models turn out to be an important weapon in identifying and eliminating such errors.

However, there is another risk here that needs to be dealt with. Outside of the labs where the models are developed, there is a tendency for scientists who want to make use of the models to treat them as black box oracles. Proper use of the models depends on a detailed understanding of their strengths and weaknesses, and the ways in which uncertainties are handled. If we have some funding available to improve the quality of climate models, it would be far better spent on improving the user interfaces, and better training of the broader community of model users.

The bottom line is that climate models are subjected to very intensive system testing, and the incremental development process incorporates a sophisticated regression test process that’s superior to most industrial software practices. The biggest threat to validity of climate models is errors in the scientific theories on which they are based, but such errors are best investigated through the scientific process, rather than through an IV&V process. Which brings us to:

3) the assumption that our ability to trust in the models can be improved by an IV&V process;

IV&V is essentially a risk management strategy for safety-critical software for which an iterative development strategy is not possible – where the software has to work correctly the first (and every) time it is used in an operational setting. Climate models aren’t like this at all. They aren’t safety critical; they can be used even while they are being developed (and hence are built by iterative refinement); and they solve complex, wicked problems, for which there are no clear correctness criteria. In fact, as a species of software development process, I’ve come to the conclusion they are dramatically different from any of the commercial software development paradigms that have been described in the literature.

A common mistake in the software engineering community is to think that software processes can be successfully transplanted from one organisation to another. Our comparative studies of different software organisations show that this is simply not true, even for organisations developing similar types of software. There are few, if any, documented cases of a software development organisation successfully adopting a process model developed elsewhere, without very substantial tailoring. What usually happens is that ideas from elsewhere are gradually infused and re-fashioned to work in the local context. And the evidence shows that every software organisation evolves its own development processes that are highly dependent on local context, and on the constraints they operate under. Far more important than a prescribed process is the development of a shared understanding within the software team. The idea of taking a process model that was developed in the aerospace industry, and transplanting it wholesale into a vastly different kind of software development process (climate modeling) is quite simply ludicrous.

For example, one consequence of applying IV&V is that it reduces flexibility for the development team, as they have to set clearer milestones and deliver workpackages on schedule (otherwise the IV&V team cannot plan their efforts). Because the development of scientific codes is inherently unpredictable, it would be almost impossible to plan and resource an IV&V effort. The flexibility to explore new model improvements opportunistically, and to adjust schedules to match varying scientific rhythms, is crucial to the scientific mission – locking the development into more rigid schedules to permit IV&V would be a disaster.

If you wanted to set up an IV&V process for climate models, it would have to be done by domain experts; domain expertise is the single most important factor in successful use of IV&V in the aerospace industry. This means it would have to be done by other climate scientists. But other climate scientists already do this routinely – it’s built into the Model Intercomparison Projects, as well as the peer review process and through attempts to replicate one another’s results. In fact the Model Intercomparison Projects already achieve far more than an IV&V process would, because they are done in the open and involve a much broader community.

In other words, the available pool of talent for performing IV&V is already busy using a process that’s far more effective than IV&V ever can be: it’s called doing science. Actually, I suspect that those people calling for IV&V of climate models are really trying to say that climate scientists can’t be trusted to check each other’s work, and that some other (unspecified) group ought to do the IV&V for them. However, this argument can only be used by people who don’t understand what IV&V is. IV&V works in the aerospace industry not because of any particular process, but because it brings in the experts – the people with grey hair who understand the flight systems inside out, and understand all the risks.

And remember that IV&V is expensive. NASA’s rule of thumb was an additional 10%-20% of the development cost. This cannot be taken from the development budget – it’s strictly an additional cost. Given my estimate of the development cost of a climate model as somewhere in the ballpark of $350 million, then we’ll need to find another $35 million for each climate modeling centre to fund their IV&V contract. And if we had such funds to add to their budgets, I would argue that IV&V is one of the least sensible ways of spending this money. Instead, I would:

Hire more permanent software support staff to work alongside the scientists;
Provide more training courses to give the scientists better software skills;
Do more research into modeling frameworks;
Experiment with incremental improvements to existing practices, such as greater use of testing tools and frameworks, pair programming and code sprints;
More support to grow the user communities (e.g. user workshops and training courses), and more community building and beta testing;
Documenting the existing software development and V&V best practices so that different labs can share ideas and experiences, and the process of model building becomes more transparent to outsiders.
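As a quick check on the cost estimate above, here is a minimal sketch that applies the 10%-20% rule of thumb to the $350 million ballpark. The function name is hypothetical, and integer arithmetic is used so the results are exact.

```python
def ivv_cost_range(development_cost: int, low_percent: int = 10, high_percent: int = 20):
    """Return the (low, high) additional IV&V cost for a given development cost.

    The 10%-20% range is the rule of thumb quoted in the post; the
    development cost passed in below is the post's ballpark estimate.
    """
    low = development_cost * low_percent // 100
    high = development_cost * high_percent // 100
    return low, high


development_cost = 350_000_000  # dollars, ballpark estimate from the post
low, high = ivv_cost_range(development_cost)
print(f"Additional IV&V cost: ${low // 1_000_000}M to ${high // 1_000_000}M per modeling centre")
```

The $35 million figure is the low end of the rule of thumb; at the 20% end, the additional cost would be $70 million per centre.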

To summarize, IV&V would be an expensive mistake for climate modeling. It would divert precious resources (experts) away from existing modeling teams, and reduce their flexibility to respond to the science. IV&V isn’t appropriate because this isn’t safety-critical software, it doesn’t have distinct development and operational phases, and the risks of software error are minor. There’s no single point of failure, because many labs around the world build their own models, and the normal scientific processes of experimentation, peer-review, replication, and model inter-comparison already provide a sophisticated process to examine the scientific validity of the models. Virtually all coding errors are detected in routine testing, and science errors are best handled through the usual scientific process, rather than through an IV&V process. Furthermore, there is only a small pool of experts available to perform IV&V on climate models (namely, other climate modelers) and they are already hard at work improving their own models. Re-deploying them to do IV&V of each other’s models would reduce the overall quality of the science rather than improving it.

(BTW I shouldn’t have had to write this article at all…)

Getting population dynamics into the models

22. November 2010 · 2 comments · Categories: reducing emissions

Eugenia Kalnay has an interesting talk on a core problem that most people avoid when talking about climate: the growth in human population. It’s a difficult subject politically, because any analysis of the link between emissions growth and population growth invites the simple-minded response that de-population is the solution, which then quickly sinks into accusations that environmentalists are misanthropes.

In her talk, “Population and Climate Change: A Proposal”, Kalnay makes some excellent observations, for example that per dollar spent, family planning reduces four times as much carbon over the next 40 years as adoption of low-carbon technologies, and yet family planning is still not discussed at the COP meetings, because it is taboo. The cause and effect is a little complicated too. While it’s clear that more people means more fossil fuel emissions, it’s also the case that fossil fuels enabled the massive population growth – without fossil fuels the human population would be much smaller.

Kalnay then points out that, rather than thinking about coercive approaches to population control, there’s a fundamental human rights issue here: most women would prefer not to have lots of kids (especially not the averages of 6 or more in the developing world), but they simply have no choice. Kalnay cites a UN poll that shows “in many countries more than 80% of married women of reproductive age with 2 children, do not want to have more children”, and estimates showing that 40% of pregnancies worldwide are unwanted. The most effective strategies to address this are education, access to birth control, and equal (economic) opportunities for women.

There’s also the risk of population collapse. Kalnay discussed the Club of Rome analysis that first alerted the world to the possibility of overshoot and collapse, and which was roundly dismissed by economists as absurd. But despite a whole lot of denialism, the models are still valid, and correspond well with what actually happened: rather than approaching the carrying capacity of the earth asymptotically, we have overshot. These dynamics models now show population collapse in most scenarios, rather than a slight overshoot and oscillation.

Kalnay concludes with a strong argument that we need to start including population dynamics in climate modelling, to help understand how different population growth scenarios impact emissions, and also to explore, from a scientific point of view, what the limits to growth really look like when we include earth system dynamics and resource depletion. And, importantly, she points out that you can’t do this by just modeling human population at the global level; we will need regional models to capture the different dynamics in different regions of the globe, as both the growth/decline rates and the per capita emissions rates vary widely in different countries/regions.

Plug-compatibility and climate models

18. November 2010 · 7 comments · Categories: climate modeling

Following my post last week about Fortran coding standards for climate models, Tim reminded me of a much older paper that was very influential in the creation (and sharing) of coding standards across climate modeling centers:

Kalnay E, Kanamitsu M, Pfaendtner J. Rules for Interchange of Physical Parameterizations. Bulletin of the American Meteorological Society. 1989;70(6):620-622.

The paper is the result of a series of discussions in the mid-1980s across many different modeling centres (the paper lists 11 labs) about how to facilitate sharing of code modules. To simplify things, the paper assumes what is being shared are parameterization modules that operate in a single column of the model. Of course, this was back in the 1980s, which means the models were primarily atmospheric models, rather than the more comprehensive earth system models of today. The dynamical core of the model handles most of the horizontal processes (e.g. wind), which means that most of the remaining physical processes (the subject of these parameterizations) affect what happens vertically within a single column, e.g. by affecting radiative or convective transfer of heat between the layers. Plugging in new parameterization modules becomes much easier if this assumption holds, because the new module needs to be called once per time step per column, and if it doesn’t interact with other columns, it doesn’t mess up the vectorization. The paper describes a number of coding conventions, effectively providing an interface specification for single-column parameterizations.
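To make the single-column assumption concrete, here is a minimal sketch in Python. It does not reproduce the paper’s actual rules, which are Fortran coding conventions; the `Column` type, the `Parameterization` signature, and the toy `relax_to_mean` scheme are all hypothetical names invented for illustration. The point is only that a scheme which sees one column at a time can be swapped for another with the same signature, and that the driver calls it once per column per time step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Column:
    """State of one vertical column: one temperature (K) per layer. Hypothetical."""
    temperature: List[float]


# In this sketch, a "plug-compatible" parameterization is any function that
# takes a single column and a time step, and returns per-layer temperature
# tendencies (K/s). It never sees neighbouring columns.
Parameterization = Callable[[Column, float], List[float]]


def relax_to_mean(column: Column, dt: float) -> List[float]:
    """Toy scheme: nudge each layer toward the column-mean temperature.

    dt is part of the interface but is not needed by this particular scheme.
    """
    mean = sum(column.temperature) / len(column.temperature)
    timescale = 3600.0  # seconds; arbitrary illustrative value
    return [(mean - t) / timescale for t in column.temperature]


def step(columns: List[Column], scheme: Parameterization, dt: float) -> None:
    """Apply the scheme once per column for one time step, updating in place."""
    for column in columns:
        tendencies = scheme(column, dt)
        column.temperature = [
            t + dt * tendency
            for t, tendency in zip(column.temperature, tendencies)
        ]


columns = [Column([300.0, 280.0, 260.0]), Column([290.0, 290.0, 290.0])]
step(columns, relax_to_mean, dt=1800.0)
for column in columns:
    print(column.temperature)
```

Because each call touches only its own column, the loop in `step` could in principle be vectorized or run in parallel without changing the scheme. A real interface would also have to pin down units, array layout, and which variables a scheme may read or modify, and that is where most of the difficulty lies.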

An interesting point about this paper is that it popularized the term “plug compatibility” amongst the modeling community, along with the (implicit) broader goal of designing all models to be plug-compatible (although it cites Pielke & Arritt for the origin of the term). Unfortunately, the goal still seems to be very elusive. While most modelers will accept that plug-compatibility is desirable, a few people I’ve spoken to are very skeptical that it’s actually possible. Perhaps the strongest statement on this is from:

Randall DA. A University Perspective on Global Climate Modeling. Bulletin of the American Meteorological Society. 1996;77(11):2685-2690.
p2687: “It is sometimes suggested that it is possible to make a plug-compatible global model so that an “outside” scientist can “easily make changes”. With a few exceptions (e.g. radiation codes), however, this is a fantasy, and I am surprised that such claims are not greeted with more skepticism.”

He goes on to describe instances where parameterizations have been transplanted from one model to another, but likens the process to a major organ transplant, only more painful. The problem is that the various processes of the earth system interact in complex ways, and these complex interactions have to be handled properly in the code. As Randall puts it: “…the reality is that a global model must have a certain architectural unity or it will fail”. In my interviews with climate modellers, I’ve heard many tales of it taking months, and sometimes years of effort to take a code module contributed by someone outside the main modeling group, and to make it work properly in the model.

So plug compatibility and code sharing sound great in principle. In practice, no amount of interface specification and coding standards can reduce the essential complexity of earth system processes.

Note: most of the above is about plug compatibility of parameterization modules (i.e. code packages that live within the green boxes on the Bretherton diagram). More progress has been made (especially in the last decade) in standardizing the interfaces between major earth system components (i.e. the arrows on the Bretherton diagram). That’s where standardized couplers come in – see my post on the high level architecture of earth system models for an introduction. The IS-ENES workshop on coupling technologies in December will be an interesting overview of the state of the art here, although I won’t be able to attend, as it clashes with the AGU meeting.

Computer scientists think differently

17. November 2010 · 1 comment · Categories: systems thinking

Here’s an interesting article entitled “Decoding the Value of Computer Science” in the Chronicle of Higher Education. The article purports to be about the importance of computer science degrees, and the risks of not enough people enrolling for such degrees these days. But it seems to me it does a much better job of demonstrating the idea of computational thinking, i.e. that people who have been trained to program approach problems differently from those who have not.

It’s this approach to problem solving that I think we need more of in tackling the challenge of climate change.

You can’t delegate ill-defined problems to software engineers

16. November 2010 · 13 comments · Categories: climate modeling, collaborative science

I had lunch last week with Gerhard Fischer at the University of Colorado. Gerhard is director of the center for lifelong learning and design, and his work focusses on technologies that help people to learn and design solutions to suit their own needs. We talked a lot about meta-design, especially how you create tools that help domain experts (who are not necessarily software experts) to design their own software solutions.

I was describing some of my observations about why climate scientists prefer to write their own code rather than delegating it to software professionals, when Gerhard put it into words brilliantly. He said “You can’t delegate ill-defined problems to software engineers”. And that’s the nub of it. Much (but not all) of the work of building a global climate model is an ill-defined problem. We don’t know at the outset what should go into the model, which processes are important, how to simulate complex physical, chemical and biological processes and their interactions. We don’t know what’s computationally feasible (until we try it). We don’t know what will be scientifically useful. So we can’t write a specification, nor explain the requirements to someone who doesn’t have a high level of domain expertise. The only way forward is to actively engage in the process of building a little, experimenting with it, reflecting on the lessons learnt, and then modifying and iterating.

So the process of building a climate model is a loop of build-explore-learn-build. If you put people into that loop who don’t have the necessary understanding of the science being done with the models, then you slow things down. And as the climate scientists (mostly) have the necessary technical skills, it’s quicker and easier to write their own code than to explain to a software engineer what is needed. But there’s a trade-off: the exploratory loop can be traversed quickly, but the resulting code might not be very robust or modifiable. Just as in agile software practices, the aim is to build something that works first, and worry about elegant design later. And that ‘later’ might never come, as the next scientific question is nearly always more alluring than a re-design. Which means the main role for software engineers in the process is to do cleanup operations. Several of the software people I’ve interviewed in the last few months at climate modeling labs described their role as mopping up after the parade (and some of them used more colourful terms than that).

The term meta-design is helpful here, because it specifically addresses the question of how to put better design tools directly into the hands of the climate scientists. Modeling frameworks fit into this space, as do domain-specific languages. But I’m convinced that there’s a lot more scope for tools that raise the level of abstraction, so that modelers can work directly with meaningful building blocks rather than lines of Fortran. And there’s another problem. Meta-design is hard. Too often it produces tools that just don’t do what the target users want. If we’re really going to put better tools into the hands of climate modelers, then we need a new kind of expertise to build such tools: a community of meta-designers who have both the software expertise and the domain expertise in earth sciences.

Which brings me to another issue that came up in the discussion. Gerhard provided me with a picture that helps me explain the issue better (I hope he doesn’t mind me reproducing it here; it comes from his talk “Meta-Design and Social Creativity” given at IEMC 2007):

To create reflective design communities, the software professionals need to acquire some domain expertise, and the domain experts need to acquire some software expertise (diagram by Gerhard Fischer)

Clearly, collaboration between software experts and climate scientists is likely to work much better if each acquires a little of the other’s expertise, if only to enable them to share some vocabulary to talk about the problems. It reduces the distance between them.

At climate modeling labs, I’ve met a number of both kinds of people – i.e. climate scientists who have acquired good software knowledge, and software professionals who have acquired good climate science knowledge. But it seems to me that for climate modeling, one of these transitions is much easier than the other. It seems to be easier for climate scientists to acquire good software skills than it is for software professionals (with no prior background in the earth sciences) to acquire good climate science domain knowledge. That’s not to say it’s impossible, as I have met a few people who have followed this path (but they are rare). It seems to require many years of dedicated work. And there appears to be a big disincentive for many software professionals, as it turns them from generalists into specialists. If you dedicate several years to developing the necessary domain expertise in climate modeling, it probably means you’re committing the rest of your career to working in this space. But the pay is lousy, the programming language of choice is uncool, and mostly you’ll be expected to clean up after the parade rather than star in it.

The future of Software Engineering?

11. November 2010 · 6 comments · Categories: systems thinking

I went to a workshop earlier this week on “the Future of Software Engineering Research” in Santa Fe. My main excuse to attend was to see how much interest I could raise in getting more software engineering researchers to engage in the problem of climate change – I presented my paper “Climate Change: A Software Grand Challenge”. But I came away from the workshop with very mixed feelings. I met some fascinating people, and had very interesting discussions about research challenges, but overall, the tone of the workshop (especially the closing plenary discussion) seemed to be far more about navel-gazing and doing “more of the same”, rather than rising to new challenges.

The break-out group I participated in focussed on the role of software in addressing societal grand challenges. We came up with a brief list of such challenges: Climate Change; Energy; Safety & Security; Transportation; Health and Healthcare; Livable Mega-Cities. In all cases, we’re dealing with complex systems-of-systems, with all the properties laid out in the SEI report on Ultra-Large Scale Systems – decentralized systems with no clear ownership; systems that undergo continuous evolution while they are being used (you can’t take the system down for maintenance and upgrades); systems built from heterogeneous elements that are constructed at different times by different communities for different purposes; systems where traditional distinctions between developers and users disappear, as the human activity and technical functionality intertwine. And systems where the “requirements” are fundamentally unknowable – these systems simultaneously serve multiple purposes for multiple communities.

I’ve argued in the past that really all software is like this, but that we pretend otherwise by drawing boundaries around small pieces of functionality so that we can ignore the uncertainties in the broader social system in which it will be used. Traditional approaches to software engineering work when we can get away with this game – on those occasions when it’s possible to get local agreement about a specific set of software functions that will help solve a local problem. The fact that software engineers tend to insist on writing a specification is a symptom that they are playing this game. But such agreements/specifications are always local and temporary, which means that software built in this way is frequently disappointing or frustrating to use.

So, for societal grand challenge problems, what is the role of software engineering research, and what kinds of software engineering might be effective? In our break-out group, we talked a lot about examples of emergent successful systems such as Facebook and Wikipedia (and even the web itself), which were built not by any recognizable software development process, but by small groups of people incrementally adding to an evolving infrastructure, each nudging it a little further down an interesting road. And by frequently getting it wrong, and seeking continual improvement when things do go wrong. Software innovation is then an emergent feature in these endeavours, but it is the people and the way they collaborate that matters, rather than any particular approach to software development.

Obviously, software alone cannot solve these societal grand challenges, but software does have a vital role to play: good software infrastructure can catalyze the engagement of multiple communities, who together can tackle the challenges. In our break-out group, we talked specifically about healthcare and climate change – in both cases there are lots of individuals and communities with ideas and enthusiasm, but who are hampered by socio-technical barriers: lack of data exchange standards, lack of appropriate organizational structures, lack of institutional support, lack of a suitable framework for exploratory software development, tools that ignore key domain concepts. It seems increasingly clear that typical governmental approaches to information systems will not solve these problems. You can’t just put out a call for tender and commission construction of an ultra-large scale system; you have to evolve it from multiple existing systems. Witness repeated failures of efforts around shared health records, carbon accounting systems, etc. But governments do need to create the technical infrastructure and nurture the coming together of inter-disciplinary communities to address these challenges, and strategic funding of trans-disciplinary research projects is a key element.

But what was the response at the workshop to these issues? The breakout groups presented their ideas back to the workshop plenary on the final afternoon, and the resulting discussion was seriously underwhelming. Several people (I could characterize them as the “old guard” in the software engineering research community) stood up to speak out against making the field more inter-disciplinary. They don’t want to see the “core” of the field diluted in any way. There were some (unconvincing) arguments that software engineering research has had a stronger impact than most people acknowledge. And a long discussion that the future of software engineering research lies in stronger ties between academic and industrial software engineering. Never mind that increasingly, software is developed outside the “software industry”: e.g. open source projects, scientific software, end-user programmers, community engagement, and of course college students building web tools that go on to take the internet world by storm. All this is irrelevant to the old guard – they want to keep on believing that the only software engineering that matters is that which can be built to a specification by a large software company.

I came away from the workshop with the feeling that this community is in the process of dooming itself to irrelevancy. But then, as was pointed out to me over lunch today, the people who have done the best under the existing system are unlikely to want to change it. Innovation in software research won’t come from the distinguished senior people in the field…

Climate Model Coding Standards

07. November 2010 · 7 comments · Categories: climate modeling

Here are some climate model coding standards that I’ve collected over the last few months:

NASA GISS’s ModelE_Coding_Standards (dated June 2010)
NCAR’s CESM (previously known as CCSM) Coding Conventions (dated June 2001)
IPSL Ocean Model NEMO coding conventions (version 2, dated 2010)
The European Program for Integrated Earth System Modelling PRISM Coding Rules (dated 2002)
The UK Met Office Unified Model Software Standards (link removed – see update below)
GFDL FMS coding conventions (dated 2002)
The Max-Planck-Institute’s Programming Guide for ICON (dated March 2006)

It’s encouraging that most modelling centres have developed detailed coding standards, but it’s a shame that most of them had to roll their own. The PRISM project is an exception – as many of the modelling labs across Europe were members of the PRISM project, some of these labs now use the PRISM coding rules.

Two followup tasks I hope to get to soon – (1) analyze how much these different standards overlap/differ, and (2) measure how much the model codes adhere to the standards.

16/11/2010 Update: The UK Met Office standard was an old version that was never publicly released, so I’ve removed the link, at the request of the UKMO. I’ll post a newer version if I can sort out the permissions. I’ve added MPI-M’s ICON standards to the list.

AGU session on Climate Change Adaptation

05. November 2010 · 1 comment · Categories: AGU fall meeting 2010

Reading through the schedule for the AGU fall meeting this December, I came across the following session, scheduled for the final day of the conference (Dec 17). What a great line-up of speakers (I’ve pasted in the abstracts, as they’re hard to link to on the AGU’s meeting schedule):

U52A Climate Change Adaptation:

10:20AM Jim Hansen (NASA) “State of Climate Change Science: Need for Adaptation and Mitigation” (Invited)
Observations of on-going climate change, paleoclimate data, and climate simulations all concur: human-made greenhouse gases have set Earth on a path to climate change with dangerous consequences for humanity. We show that the matter is urgent and a moral issue that pits the rich and powerful against the young and unborn, against the defenseless, and against nature. Adaptation can only partially ameliorate the effects, as governments are failing to protect the public interest and failing in their duty to provide young people equal protection of the laws. We quantify the reduction pathway for fossil fuel emissions that is required to restore Earth’s energy balance and stabilize climate. We show that rapid changes in emission pathways are essential to avoid morally unacceptable adaptation requirements.
10:50AM Richard Alley (Penn State U) “Ice in the Hot Box—What Adaptation Challenges Might We Face?” (Invited)
Warming is projected to reduce ice, despite the tendency for increased precipitation. The many projected impacts include amplification of warming, sea-ice shrinkage opening seaways, and loss of water storage in snowpacks. However, sea-level rise may combine the largest effects with the greatest uncertainties. Rapid progress in understanding ice sheets has not yet produced projections with appropriately narrow uncertainties and high confidence to allow detailed planning. The range of recently published scaling arguments and back-of-the-envelope calculations is wide but often includes 1 m of rise this century. Steve Schneider’s many contributions on dangerous anthropogenic influence and on decision-making in the face of uncertainty help provide context for interpreting these preliminary and rapidly evolving results.
11:10AM Ken Caldeira (Stanford) Adaptation to Impacts of Greenhouse Gases on the Ocean (Invited)
Greenhouse gases are producing changes in ocean temperature and circulation, and these changes are already adversely affecting marine biota. Furthermore, carbon dioxide is absorbed by the oceans from the atmosphere, and this too is already adversely affecting some marine ecosystems. And, of course, sea-level rise affects both what is above and below the waterline.
Clearly, the most effective approach to limit the negative impacts of climate change and acidification on the marine environment is to greatly diminish the rate of greenhouse gas emissions. However, there are other measures that can be taken to limit some of the negative effects of these stresses in the marine environment.
Marine ecosystems are subject to multiple stresses, including overfishing, pollution, and loss of coastal wetlands that often serve as nurseries for the open ocean. The adaptive capacity of marine environments can be improved by limiting these other stresses.
If current carbon dioxide emission trends continue, for some cases (e.g., coral reefs), it is possible that no amount of reduction in other stresses can offset the increase in stresses posed by warming and acidification. For other cases (e.g., blue-water top-predator fisheries), better fisheries management might yield improved population health despite continued warming and acidification.
In addition to reducing stresses so as to improve the adaptive capacity of marine ecosystems, there is also the issue of adaptation in human communities that depend on this changing marine environment. For example, communities that depend on services provided by coral reefs may need to locate alternative foundations for their economies. The fishery industry will need to adapt to changes in fish abundance, timing and location.
Most of the things we would like to do to increase the adaptive capacity of marine ecosystems (e.g., reduce fishing pressure, reduce coastal pollution, preserve coastal wetlands) are things that would make sense to do even in the absence of threats from climate change and ocean acidification. Therefore, these measures represent “no regrets” policy options for the marine environment.
Nevertheless, even with adaptive policies in place, continued greenhouse gas emissions increasingly risk damaging marine ecosystems and the human communities that depend on them.
11:30AM Alan Robock (Rutgers) Geoengineering and adaptation
Geoengineering by carbon capture and storage (CCS) or solar radiation management (SRM) has been suggested as a possible solution to global warming. However, it is clear that mitigation should be the main response of society, quickly reducing emissions of greenhouse gases. While there is no concerted mitigation effort yet, even if the world moves quickly to reduce emissions, the gases that are already in the atmosphere will continue to warm the planet. CCS, if a system that is efficacious, safe, and not costly could be developed, would slowly remove CO2 from the atmosphere, but this will have a gradual effect on concentrations. SRM, if a system could be developed to produce stratospheric aerosols or brighten marine stratocumulus clouds, could be quickly effective in cooling, but could also have so many negative side effects that it would be better not do it at all. This means that, in spite of a concerted effort at mitigation and to develop CCS, there will be a certain amount of global warming in our future. Because CCS geoengineering will be too slow and SRM geoengineering is not a practical or safe solution to geoengineering, adaptation will be needed. Our current understanding of geoengineering makes it even more important to focus on adaptation responses to global warming.
11:50AM Olga Wilhelmi (NCAR) Adaptation to heat health risk among vulnerable urban residents: a multi-city approach
Recent studies on climate impacts demonstrate that climate change will have differential consequences in the U.S. at the regional and local scales. Changing climate is predicted to increase the frequency, intensity and impacts of extreme heat events prompting the need to develop preparedness and adaptation strategies that reduce societal vulnerability. Central to understanding societal vulnerability, is population’s adaptive capacity, which, in turn, influences adaptation, the actual adjustments made to cope with the impacts from current and future hazardous heat events. To-date, few studies have considered the complexity of vulnerability and its relationship to capacity to cope with or adapt to extreme heat. In this presentation we will discuss a pilot project conducted in 2009 in Phoenix, AZ, which explored urban societal vulnerability and adaptive capacity to extreme heat in several neighborhoods. Household-level surveys revealed differential adaptive capacity among the neighborhoods and social groups. In response to this pilot project, and in order to develop a methodological framework that could be used across locales, we also present an expansion of this project into Houston, TX and Toronto, Canada with the goal of furthering our understanding of adaptive capacity to extreme heat in very different urban settings. This presentation will communicate the results of the extreme heat vulnerability survey in Phoenix as well as the multidisciplinary, multi-model framework that will be used to explore urban vulnerability and adaptation strategies to heat in Houston and Toronto. We will outline challenges and opportunities in furthering our understanding of adaptive capacity and the need to approach these problems from a macro to a micro level.
12:05PM Anthony Socci (US EPA) An Accelerated Path to Assisting At-Risk Communities Adapt to Climate Change
Merely throwing money at adaptation is not development. Nor can the focus of adaptation assistance be development alone. Rather, adaptation assistance is arguably best served when it is country- or community-driven, and the overarching process is informed and guided by a set of underlying principles or a philosophy of action that primarily aims at improving the lives and livelihoods of affected communities.
In the instance of adaptation assistance, I offer the following three guiding principles: 1. adaptation is at its core, about people; 2. adaptation is not merely an investment opportunity or suite of projects but a process, a lifestyle; and 3. adaptation cannot take place by proxy; nor can it be imposed on others by outside entities.
With principles in hand, a suggested first step toward action is to assess what resources, capacity and skills one is capable of bringing to the table and whether these align with community needs. Clearly issues of scale demand a strategic approach in the interest of avoiding overselling and worse, creating false expectations. And because adaptation is a process, consider how best to ensure that adaptation activities remain sustainable by virtue of enhancing community capacity, resiliency and expertise should assistance and/or resources dwindle or come to an end.
While not necessarily a first step, community engagement is undoubtedly the most critical element in any assistance process, requiring sorting out and agreeing upon terms of cooperation and respective roles and responsibilities, aspects of which should include discussions on how to assess the efficacy of resource use, how to assess progress, success or outcomes, what constitutes same, and who decides. It is virtually certain that adaptation activities are unlikely to take hold or maintain if they are not community led, community driven or community owned. There is no adaptation by proxy or fiat.
It’s fair to ask at this point, how might one know what communities and countries need, what and where the opportunities are to assist countries and communities in adapting to climate change, and how might one get started? One of the most effective and efficient ways of identifying community/country needs, assistance opportunities and community/country entry points is to search the online archive of National Adaptation Programmes of Action (NAPAs) that many of the least developed countries have already assembled in conformance with the UNFCCC process. Better still perhaps, consider focusing on community-scale assessments and adaptation action plans that have already been compiled by various communities seeking assistance as national plans are unlikely to capture the nuances and variability of community needs. Unlike NAPAs, such plans are not archived in a central location. Yet clearly, community-scale plans in particular, not only represent an assessment of community needs and plans, presumptively crafted by affected communities, but also represent opportunities to align assistance resources and capacity with community needs, providing the basis for engaging affected communities in an accelerated process. Simply stated, take full advantage of the multitude of assessment and planning efforts that communities have already engaged in on their own behalf.

New undergrad course on climate models

04. November 2010 · 1 comment · Categories: climate modeling, courses

After an exciting sabbatical year spent visiting a number of climate modeling centres, I’ll be back to teaching in January. I’ll be introducing two brand new courses, both related to climate modeling. I already blogged about my new grad course on “Climate Change Informatics”, which will cover many current research issues to do with software and data in climate science.

But I haven’t yet mentioned my new undergrad course. I’ll be teaching a 199 course in January, which I’ve never done before. 199 courses are first-year seminar courses, open to all new students across the faculty of arts and science, intended to encourage critical thinking, communication and research skills. They are run as small group seminar courses (enrolment is capped at 24 students). I’ve no idea what to expect – I’m hoping for an interesting mix of students with different backgrounds, so we can spend some time attacking the theme of the course from different perspectives. Here’s my course description:

“Climate Change: Software, Science and Society”

This course will examine the role of computers and software in understanding climate change. We will explore the use of computer models to build simulations of the global climate, including a historical view of the use of computer models to understand weather and climate, and a detailed look at the current state of computer modelling, especially how global climate models are tested, what kinds of experiments are performed with them, how scientists know they can trust the models, and how they deal with uncertainty. The course will also explore the role of computer models in helping to shape society’s responses to climate change, in particular, what they can (and can’t) tell us about how to make effective decisions about government policy, international treaties, community action and the choices we make as individuals. The course will take a cross-disciplinary approach to these questions, looking at the role of computer models in the physical sciences, environmental science, politics, philosophy, sociology and economics of climate change. However, students are not expected to have any specialist knowledge in any of these fields prior to the course.

If all goes well, I plan to include some hands-on experimentation with climate models, perhaps using EdGCM (or even CESM if I can simplify the process of installing it and running it for them). We’ll also look at how climate models are perceived in the media and blogosphere (that will be interesting!) and compare these perceptions to what really goes on in climate modelling labs. Of course, the nice thing about a small seminar course is that I can be flexible about responding to the students’ own interests. I’m really looking forward to this…

What is a climate model?

03. November 2010 · 1 comment · Categories: climate modeling

Here’s a very nice video explaining the basics of how climate models work, produced by the folks at IPSL in Paris. This version is French with English subtitles – for the francophones out there, you’ll notice the narration is a little more detailed than the subtitles. I particularly like the bit where the earth grid is unpeeled and fed into the supercomputers:

Video: http://www.cs.toronto.edu/~sme/movies/ipsl-modeling-med.mov

The original (without the English subtitles) is here: http://www.youtube.com/user/CEADSMCOM

After the fire

02. November 2010 · 1 comment · Categories: impacts and adaptation

In my last post, I described our firsthand experience of flooding in Venice, and pondered the likely impact of climate change on Venice in the future. But that wasn’t our only firsthand experience of the impacts of climate change on our travels this summer. Having visited NCAR in July this year, we decided to come back to Boulder for the rest of the fall, to give me a chance to do more followup interviews with the NCAR folks, while I write up the findings from my studies of the software development processes for climate models.

Back in August I found a great house for us to rent, up in the mountains at Gold Hill. Shortly after I paid the deposit, I discovered the house was right in the middle of one of the most devastating forest fires in Colorado’s history. The fire, now known as the Fourmile Canyon fire, started on September 6, 2010, burned for over a week, affecting 6,181 acres, and destroying 169 homes. In terms of acreage, it wasn’t the biggest fire ever in Colorado, but in terms of destruction of property and damage costs, it was the worst ever.

I first heard about the fire while I was attending the Surface Temperature Record workshop in Exeter in September, and only then because of a conversation at dinner with some of the NCAR folks whose homes were in the evacuation zone. We spent the next few days wondering whether we’d have somewhere to live this fall after all, and trying to trace the path of the fire on various collaborative maps created by those on the scene. Not that we were affected anywhere near as much as the people who were evacuated, many of whom lost their homes and everything in them. But it gave us a taste of the impact of these massive forest fires on the communities who are affected.

Amazingly, the house we’re renting survived, even though several of the neighbouring houses burned down. Indeed, it seems amazing just how random the fire was – several patches of ground a few hundred yards from our house have been burned, but almost everything we can see from the house is untouched. The satellite images seemed to show huge areas completely devastated, but in reality, the affected area is now a real patchwork of healthy trees and burned sections.

Burned trees on Sunshine Canyon Drive

But this patchwork effect is actually easy to understand once you build a good computational model. I particularly like this NCAR simulation of the spread of forest fire. Notice how the prevailing winds (shown by the arrows) push the fire forward, but also how the updraft from the fire affects the wind pattern to the sides and in the path of the fire, effectively funnelling it into a narrower and narrower path. This certainly corresponds to the stripes of fire damage now visible in the area of the Fourmile Canyon fire, and explains why the fire damage seems so patchy.

Patches of burned and unburned trees, from Sunshine Canyon Drive
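To make the patchiness a bit more concrete, here is a minimal sketch in Python of a toy fire-spread model. To be clear, this is not the NCAR simulation, which couples the fire to the atmosphere; it is just a probabilistic cellular automaton that I am assuming for illustration, and it ignores the updraft feedback entirely. The function name simulate_fire and its parameters (base_prob, wind_bonus) are hypothetical. Each burning cell gets one chance to ignite each of its four neighbours, and the chance is higher in the downwind direction (here, eastward).

```python
import random


def simulate_fire(width=40, height=20, base_prob=0.3, wind_bonus=0.4, seed=1):
    """Toy probabilistic cellular automaton for fire spread (illustrative only).

    The wind is assumed to blow from west to east, so spread to the eastern
    neighbour is more likely than spread in the other three directions.
    Returns a grid of characters: 'T' is an unburned tree, 'x' is burned.
    """
    rng = random.Random(seed)
    grid = [["T"] * width for _ in range(height)]

    # Ignite a single cell in the middle of the western edge.
    start = (height // 2, 0)
    grid[start[0]][start[1]] = "x"
    frontier = [start]

    while frontier:
        next_frontier = []
        for r, c in frontier:
            # Try to ignite each of the four neighbours exactly once.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < height and 0 <= nc < width and grid[nr][nc] == "T":
                    # Downwind (eastward) spread gets a higher probability.
                    p = base_prob + (wind_bonus if dc == 1 else 0.0)
                    if rng.random() < p:
                        grid[nr][nc] = "x"
                        next_frontier.append((nr, nc))
        frontier = next_frontier

    return grid


if __name__ == "__main__":
    for row in simulate_fire():
        print("".join(row))
```

Running it with different seeds and probabilities gives a feel for how a wind-driven fire can leave unburned cells right next to burned ones, although a toy like this says nothing about the real physics of the Fourmile Canyon fire.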

As this fire was unusually large by Colorado standards, I wondered about the impact of climate change. In particular, I thought the damage caused by Mountain Pine Beetles might be to blame. When we drove up to Breckenridge in July, the kids noticed that many of the trees were dead, and we googled a little back then to discover it was because the hotter, drier summers were encouraging the spread of the pine beetles, and weakening the trees’ defences. And from a major study published in September this year, it’s clear that climate change is a major factor, and the destruction of pine forests across the North American Rockies will only get worse as climate change progresses.

So I figured all those dead trees would just encourage bigger wildfires. But, as usual with climate change, it’s not that simple. In particular, the areas damaged by fire this year don’t correlate with the areas most damaged by the beetles. It looks like the trees killed by beetles are actually less susceptible to fire, because the needles drop to the forest floor and decompose fairly quickly, while the trees lose the oils that encourage fire in the tree canopy. But although the beetle damage doesn’t cause the fires, climate change affects both, because the hotter, drier summers increase both the spread of the beetles and the likelihood of fires.