In my last two posts, I demolished the idea that climate models need Independent Verification and Validation (IV&V), and I described the idea of a toolbox approach to V&V. Both posts were attacking myths: in the first case, the myth that an independent agent should be engaged to perform IV&V on the models, and in the second, the myth that you can critique the V&V of climate models without knowing anything about how they are currently built and tested.
I now want to expand on the latter point, and explain how the day-to-day practices of climate modellers, taken together, constitute a robust validation process, and why the only way to improve this validation process is just to do more of it (i.e. give the modeling labs more funds to expand their current activities, rather than to do something very different).
The most common mistake made by people discussing validation of climate models is to assume that a climate model is a thing-in-itself, and that the goal of validation is to demonstrate that some property holds of this thing. And whatever that property is, the assumption is that it can be measured without reference to the model's scientific milieu, and in particular without reference to its history and the processes by which it was constructed.
This mistake leads people to talk of validation in terms of how well “the model” matches observations, or how well “the model” matches the processes in some real world system. This approach to validation is, as Oreskes et al. pointed out, quite impossible. The models are numerical approximations of complex physical phenomena. You can verify that the underlying equations are coded correctly in a given version of the model, but you can never validate that a given model accurately captures real physical processes, because it never will accurately capture them. Or as George Box summed it up: “All models are wrong…” (we’ll come back to the second half of the quote later).
The problem is that there is no such thing as “the model”. The body of code that constitutes a modern climate model actually represents an enormous number of possible models, each corresponding to a different way of configuring that code for a particular run. Furthermore, this body of code isn’t a static thing. The code is changed on a daily basis, through a continual process of experimentation and model improvement. Often these changes are done in parallel, so that there are multiple versions at any given moment, being developed along multiple lines of investigation. Sometimes these lines of evolution are merged, to bring a number of useful enhancements together into a single version. Occasionally, the lines diverge enough to cause a fork: a point at which they are different enough that it just becomes too hard to reconcile them (see, for example, this visualization of the evolution of ocean models). A forked model might at some point be given a new name, but the process by which a model gets a new name is rather arbitrary.
Occasionally, a modeling lab will label a particular snapshot of this evolving body of code as an “official release”. An official release has typically been tested much more extensively, in a number of standard configurations for a variety of different platforms. It’s likely to be more reliable, and therefore easier for users to work with. By more reliable here, I mean relatively free from coding defects. In other words, it is better verified than other versions, but not necessarily better validated (I’ll explain why shortly). In many cases, official releases also contain some significant new science (e.g. new parameterizations), and these scientific enhancements will be described in a set of published papers.
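To make that verified-versus-validated distinction concrete, here’s a minimal sketch of the kind of regression test that might sit behind an official release. Everything in it is hypothetical (the harness, the file names); real modeling centres have their own far more elaborate test suites, but the shape of the check is similar: run a standard configuration, then compare the output against a stored reference.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical release-testing sketch: compare a test run's output
# files against recorded reference digests. This is verification --
# it can detect coding defects, but it says nothing about whether
# the science encoded in the model is any good.

def checksum(path: Path) -> str:
    """Return a SHA-256 digest of a model output file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def regression_test(output_dir: Path, reference_file: Path) -> bool:
    """Check every output file against its recorded reference digest.

    A mismatch means the code's behaviour changed: possibly a defect,
    possibly an intended science change that needs a new reference.
    """
    reference = json.loads(reference_file.read_text())  # {filename: digest}
    ok = True
    for name, expected in reference.items():
        actual = checksum(output_dir / name)
        if actual != expected:
            print(f"MISMATCH in {name}")
            ok = False
    return ok
```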
However, an official release isn’t a single model either. Again, it’s just a body of code that can be configured to run as any of a huge number of different models, and it’s not unchanging either – as with all software, there will be occasional bugfix releases applied to it. Oh, and did I mention that to run a model, you have to make use of a huge number of ancillary data files, which define everything from the shape of the coastlines and land surfaces to the specific carbon emissions scenario to be used? Any change to these effectively gives a different model too.
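To see how much has to be pinned down before “the model” denotes anything specific, here’s a toy sketch in Python. All the names and values are made up for illustration; real configurations run to thousands of settings, but the point stands: change any field, and for validation purposes you have a different model.

```python
from dataclasses import dataclass, field

# Toy illustration (all names and values are hypothetical): the
# ingredients that together identify one specific "model".

@dataclass
class ModelConfig:
    code_version: str        # which snapshot of which line of evolution
    resolution: str          # grid spacing and vertical levels
    components: tuple        # which sub-models are switched on
    parameter_overrides: dict = field(default_factory=dict)
    ancillary_files: dict = field(default_factory=dict)
    emissions_scenario: str = "none"

one_specific_model = ModelConfig(
    code_version="release-7.3+bugfix2",
    resolution="N96L85",
    components=("atmosphere", "ocean", "sea-ice", "land-surface"),
    parameter_overrides={"entrainment_coeff": 0.9},
    ancillary_files={"land_mask": "ancil/mask_n96.nc",
                     "orography": "ancil/orog_n96.nc"},
    emissions_scenario="RCP4.5",
)
# Change any field above -- a bugfix to the code, a different land
# mask, another scenario -- and you are validating something else.
```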
So, if you’re hoping to validate “the model”, you have to say which one you mean: which configuration of which code version of which line of evolution, and with which ancillary files. I suppose those clamouring for something different in the way of model validation would say “well, the one used for the IPCC projections, of course”. Which is a little tricky, because each lab produces a large number of different runs for the CMIP process that provides input to the IPCC, and each of these is likely to involve a different model configuration.
But let’s say for the sake of argument that we could agree on a specific model configuration that ought to be “validated”. What will we do to validate it? What does validation actually mean? The Oreskes paper I mentioned earlier already demonstrated that comparison with real world observations, while interesting, does not constitute “validation”. The model will never match the observations exactly, so the best we’ll ever get along these lines is an argument that, on balance, given the sum total of the places where there’s a good match and the places where there’s a poor match, the model does better or worse than some other model. This isn’t validation, and furthermore it isn’t even a sensible way of thinking about validation.
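To see what such a comparison actually yields, here’s a minimal sketch with synthetic data. A real comparison would use gridded fields, area weighting and dozens of variables, but the logic is the same: you get relative skill scores, never a validation verdict.

```python
import numpy as np

# Synthetic illustration: two "models" compared against the same
# (made-up) observational record. Neither matches exactly, and each
# errs in a different way.

rng = np.random.default_rng(42)
obs = rng.normal(14.0, 1.0, size=1000)        # stand-in for observations
model_a = obs + rng.normal(0.3, 0.5, 1000)    # small bias, less noise
model_b = obs + rng.normal(0.0, 0.8, 1000)    # no bias, more noise

def rmse(model, observations):
    """Root-mean-square error of model values against observations."""
    return np.sqrt(np.mean((model - observations) ** 2))

print(f"model A: RMSE = {rmse(model_a, obs):.3f}")
print(f"model B: RMSE = {rmse(model_b, obs):.3f}")
# The scores are never zero, and which model "wins" depends on the
# metric chosen -- a relative judgement, not a validation verdict.
```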
At this point many commentators stop, and argue that if validation of a model isn’t possible, then the models can’t be used to support the science (or more usually, they mean they can’t be used for IPCC projections). But this is a strawman argument, based on a fundamental misconception of what validation is all about. Validation isn’t about checking that a given instance of a model satisfies some given criteria. Validation is about fitness for purpose, which means it’s not about the model at all, but about the relationship between a model and the purposes to which it is put. Or more precisely, it’s about the relationship between particular ways of building and configuring models and the ways in which runs produced by those models are used.
Furthermore, the purposes to which models are put and the processes by which they are developed co-evolve. The models evolve continually, and our ideas about what kinds of runs we might use them for evolve continually, which means validation must take this ongoing evolution into account. To summarize, validation isn’t about a property of some particular model instance; it’s about the whole process of developing and using models, and how this process evolves over time.
Let’s take a step back a moment, and ask what is the purpose of a climate model. The second half of the George Box quote is “…but some models are useful”. Climate models are tools that allow scientists to explore their current understanding of climate processes, to build and test theories, and to explore the consequences of those theories. In other words, we’re dealing with three distinct systems:

- the theoretical system: our current theories of how climate processes work;
- the observational system: the earth’s climate itself, as captured in the available observations;
- the calculational system: the body of model code, and the runs we perform with it.
There does not need to be any clear relationship between the calculational system and the observational system – I didn’t include such a relationship in my diagram. For example, climate models can be run in configurations that don’t match the real world at all: e.g. a waterworld with no landmasses, or a world in which interesting things are varied: the tilt of the pole, the composition of the atmosphere, etc. These models are useful, and the experiments performed with them may be perfectly valid, even though they differ deliberately from the observational system.
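For illustration, here’s a hypothetical description of such an idealized experiment. Nothing about it resembles the real Earth, and that’s exactly the point:

```python
# Hypothetical description of an idealized "aquaplanet" experiment:
# deliberately nothing like the real Earth, yet a perfectly valid
# tool for probing, say, how convection organizes in the tropics.
aquaplanet_experiment = {
    "components": ("atmosphere", "slab-ocean"),
    "land_mask": None,          # waterworld: no landmasses at all
    "obliquity_degrees": 0.0,   # vary the tilt of the pole
    "co2_ppm": 280.0,           # or quadruple it, and so on
}
```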
What really matters is the relationship between the theoretical system and the observational system: in other words, how well does our current understanding (i.e. our theories) of climate explain the available observations (and of course the inverse: what additional observations might we make to help test our theories). When we ask questions about likely future climate changes, we’re not asking this question of the calculational system, we’re asking it of the theoretical system; the models are just a convenient way of probing the theory to provide answers.
By the way, when I use the term theory, I mean it in exactly the way it’s used throughout the sciences: a theory is the best current explanation of a given set of phenomena. The word “theory” doesn’t mean knowledge that is somehow more tentative than other forms of knowledge; a theory is actually the kind of knowledge that has the strongest epistemological basis, because it is supported by the available evidence, and best explains that evidence. A theory might not be capable of providing quantitative predictions (but it’s good when it does), but it must have explanatory power.
In this context, the calculational system is valid as long as it can offer insights that help us understand the relationship between the theoretical system and the observational system. A model is useful as long as it helps to improve our understanding of climate, and to further the development of new (or better) theories. So a model that might have been useful (and hence valid) thirty years ago might not be useful today. If the old approach to modelling no longer matches current theory, then it has lost some or all of its validity. The model’s correspondence (or lack thereof) to the observations hasn’t changed (*), nor has its predictive power. But its utility as a scientific tool has changed, and hence its validity has changed.
[(*) except that the accuracy of the observations may have changed in the meantime, due to the ongoing process of discovering and resolving anomalies in the historical record.]
The key questions for validation then, are to do with how well the current generation of models (plural) support the discovery of new theoretical knowledge, and whether the ongoing process of improving those models continues to enhance their utility as scientific tools. We could focus this down to specific things we could measure by asking whether each individual change to the model is theoretically justified, and whether each such change makes the model more useful as a scientific tool.
To do this requires a detailed study of day-to-day model development practices, and the extent to which these are closely tied to the rest of climate science (e.g. field campaigns, process studies, etc). It also takes in questions such as how modeling centres decide on their priorities (e.g. which new bits of science to get into the models sooner), and how each individual change is evaluated. In this approach, validation proceeds by checking whether the individual steps taken to construct and test changes to the code add up to a sound scientific process, and how good this process is at incorporating the latest theoretical ideas. And we ought to be able to demonstrate a steady improvement in the theoretical basis for the model. An interesting quirk here is that sometimes an improvement to the model from a theoretical point of view reduces its skill at matching observations; this happens particularly when we’re replacing bits of the model that were based on empirical parameters with an implementation that has a stronger theoretical basis, because the empirical parameters were tuned to give a better climate simulation, without necessarily being well understood. In the approach I’m describing, this would be an indicator of an improvement in validity, even while it reduces the correspondence with observations. If on the other hand we based our validation on some measure of correspondence with observations, such a step would reduce the validity of the model!
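As a rough sketch of what evaluating one such change might involve (all names are hypothetical, standing in for a modeling centre’s real test infrastructure), note how the verification question and the scientific-evaluation question are kept separate:

```python
# Hypothetical per-change evaluation: compare a candidate run against
# a control run. The two checks answer different questions.

def evaluate_change(control: dict, candidate: dict) -> dict:
    """Assess one proposed model change against a control run."""
    report = {}

    # Verification: fields the change shouldn't touch must be
    # unchanged (often checked bit-for-bit in practice).
    report["untouched_field_identical"] = (
        control["unrelated_field"] == candidate["unrelated_field"]
    )

    # Scientific evaluation: compare summary climate metrics. A
    # theoretically better scheme may score *worse* against
    # observations, if the old scheme's empirical parameters had
    # been tuned to compensate for poorly understood processes.
    report["global_mean_temp_drift"] = (
        candidate["global_mean_temp"] - control["global_mean_temp"]
    )
    return report

# Usage, with made-up numbers:
control = {"unrelated_field": 1.234, "global_mean_temp": 287.1}
candidate = {"unrelated_field": 1.234, "global_mean_temp": 287.4}
print(evaluate_change(control, candidate))
```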
But what does all of this tell us about whether it’s “valid” to use the models to produce projections of climate change into the future? Well, recall that when we ask for projections of future climate change, we’re not asking the question of the calculational system, because all that would result in is a number, or a range of numbers, impossible to interpret, and therefore meaningless. Instead we’re asking the question of the theoretical system: given the sum total of our current theoretical understanding of climate, what is likely to happen in the future, under various scenarios for expected emissions and/or concentrations of greenhouse gases? If the models capture our current theoretical understanding well, then running the scenario on the model is a valid thing to do. If the models do a poor job of capturing our theoretical understanding, then running the models on these scenarios won’t be very useful.
Note what is happening here: when we ask climate scientists for future projections, we’re asking the question of the scientists, not of their models. The scientists will apply their judgement to select appropriate versions/configurations of the models to use, they will set up the runs, and they will interpret the results in the light of what is known about the models’ strengths and weaknesses and about any gaps between the computational models and the current theoretical understanding. And they will add all sorts of caveats to the conclusions they draw from the model runs when they present their results.
And how do we know whether the models capture our current theoretical understanding? By studying the processes by which the models are developed (i.e. continually evolved) by the various modeling centres, and examining how good each centre is at getting the latest science into the models. And by checking that whenever there are gaps between the models and the theory, these are adequately described by the caveats in the papers published about experiments with the models.
Summary: It is a mistake to think that validation is a post-hoc process to be applied to an individual “finished” model to ensure it meets some criteria for fidelity to the real world. In reality, there is no such thing as a finished model, just many different snapshots of a large set of model configurations, steadily evolving as the science progresses. And fidelity of a model to the real world is impossible to establish, because the models are approximations. In reality, climate models are tools to probe our current theories about how climate processes work. Validity is the extent to which climate models match our current theories, and the extent to which the process of improving the models keeps up with theoretical advances.