A reader writes to me from New Zealand, arguing that climate science isn’t a science at all because it is not possible to conduct experiments. This misconception appears to be common, even among some distinguished scientists, who presumably have never taken the time to read many published papers in climatology. It arises because people assume that climate science is all about predicting future climate change. Because such predictions reach decades or centuries into the future, and we only have one planet to work with, we can’t check whether they are correct until it’s too late to be useful.

In fact, predictions of future climate are really only a by-product of climate science. The science itself concentrates on improving our understanding of the processes that shape climate, by analyzing observations of past and present climate, and testing how well we understand them. For example, detection/attribution studies focus on the detection of changes in climate that are outside the bounds of natural variability (using statistical techniques), and determining how much of the change can be attributed to each of a number of possible forcings (e.g. changes in greenhouse gases, land use, aerosols, or solar variation). As in any science, the attribution is done by creating hypotheses about possible effects of each forcing, and then testing those hypotheses. Such hypotheses can be tested by looking for contradictory evidence (e.g. other episodes in the past where the forcing was present or absent, to test how well the hypothesis explains these too). They can also be tested by encoding each hypothesis in a climate model, and checking how well it simulates the observed data.

I’m not a climate modeler, but I have conducted anthropological studies of how climate modelers work. Climate models are developed slowly and carefully over many years, as scientific instruments. One of the most striking aspects of climate model development is that it is an experimental science in the strongest sense. What do I mean?

Well, a climate model is a detailed theory of some subset of the earth’s physical processes. Like all theories, it is a simplification that focusses on those processes that are salient to a particular set of scientific questions, and approximates or ignores those processes that are less salient. Climate modelers use their models as experimental instruments. They compare the model run with the observational record for some relevant historical period. They then come up with a hypothesis to explain any divergences between the run and the observational record, and make a small improvement to the model that the hypothesis predicts will reduce the divergence. They then run an experiment in which the old version of the model acts as a control, and the new version is the experimental case. By comparing the two runs with the observational record, they determine whether the predicted improvement was achieved (and whether the change messed anything else up in the process). After a series of such experiments, the modelers will eventually either accept the change to the model as an improvement to be permanently incorporated into the model code, or they discard it because the experiments failed (i.e. they failed to give the expected improvement). By doing this day after day, year after year, the models get steadily more sophisticated, and steadily better at simulating real climatic processes.
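
The control-versus-experiment cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any modeling center actually scores its runs: the root-mean-square error metric, the function names (`rmse`, `evaluate_change`), the variable names, and all the numbers are assumptions made up for the example.

```python
import math


def rmse(simulated, observed):
    """Root-mean-square error between a model run and the observational record."""
    if len(simulated) != len(observed):
        raise ValueError("series must have the same length")
    squared = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return math.sqrt(squared / len(observed))


def evaluate_change(control, experiment, observations, target, tolerance=0.0):
    """Compare control and experimental runs against observations.

    Each argument is a dict mapping a variable name to a list of values.
    Returns (improved, regressions): whether the targeted variable moved
    closer to the observations, and which other variables got worse by
    more than `tolerance`.
    """
    errors = {
        name: (rmse(control[name], obs), rmse(experiment[name], obs))
        for name, obs in observations.items()
    }
    control_err, experiment_err = errors[target]
    improved = experiment_err < control_err
    regressions = sorted(
        name
        for name, (c_err, e_err) in errors.items()
        if name != target and e_err > c_err + tolerance
    )
    return improved, regressions


# Made-up numbers, purely for illustration (hypothetical variables).
observations = {"surface_temp": [14.0, 14.2, 14.5], "precip": [2.9, 3.1, 3.0]}
control = {"surface_temp": [13.0, 13.5, 13.9], "precip": [3.0, 3.1, 3.1]}
experiment = {"surface_temp": [13.8, 14.1, 14.4], "precip": [3.4, 3.6, 3.5]}

improved, regressions = evaluate_change(
    control, experiment, observations, target="surface_temp"
)
print(improved, regressions)
```

With these made-up numbers, the change brings surface temperature closer to the observations but makes precipitation worse, which is the kind of side effect the modelers look for before accepting a change.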

This experimental approach has another interesting effect: the software appears to be tested much more thoroughly than most commercial software. Whether this actually delivers higher quality code is an interesting question; however, it is clear that the approach is much more thorough than most industry practices for software regression testing.
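
One form of regression check that comes up in the discussion below is the bit-for-bit comparison: rerun the model with identical inputs and confirm that the output is exactly the same. The following is a minimal sketch under stated assumptions, not any modeling group's actual test harness. The `toy_model` function (a logistic map) is a hypothetical stand-in for a real model run, and the helper names are invented for the example.

```python
import hashlib
import struct


def output_digest(values):
    """Hash the exact binary representation of a sequence of floats."""
    digest = hashlib.sha256()
    for v in values:
        # Pack each value as an 8-byte little-endian IEEE 754 double.
        digest.update(struct.pack("<d", v))
    return digest.hexdigest()


def is_bit_identical(run_a, run_b):
    """True only if both runs match in every bit of every value."""
    return len(run_a) == len(run_b) and output_digest(run_a) == output_digest(run_b)


def toy_model(steps, x0=0.3, r=3.7):
    """Hypothetical stand-in for a model run: iterate the logistic map."""
    x = x0
    trajectory = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        trajectory.append(x)
    return trajectory


baseline = toy_model(100)
rerun = toy_model(100)                       # same code, same inputs
perturbed = toy_model(100, x0=0.3 + 1e-15)   # tiny change to the initial condition

print(is_bit_identical(baseline, rerun))
print(is_bit_identical(baseline, perturbed))
```

A check like this only says whether two runs are exactly the same. It says nothing about which of two differing runs is closer to the observations, so it complements the comparison against the observational record and does not replace it.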


  1. An alternative way of responding would be to ask back whether astronomy should be considered to be a science?

  2. I’ll piggy-back off Ed’s comment. Actually, it looks like a good point for a full length post of its own, so I’ll probably take it up on my own blog shortly.

    The thing is, many people have a view of science they learned in jr. high — that science is something where you do ‘controlled’ ‘experiments’ in your lab, in a few minutes. Things much larger than your lab (like clouds, or planets, or stars), or that change much more slowly than a few minutes (like continental drift, mountain range erosion, climate change, …) get called ‘not science’ by that view.

    That is, of course, wrong. Science is a method, not limited to particular, rather modest, time and space scales.

  3. Hi Steve,

    It is a logical fallacy to analogize the embodiment of climate theory and processes (i.e., climate software) as scientific instruments. The fallacy is what E. T. Jaynes called the mind projection fallacy.

    A climate model (program) is a necessarily simplified representation of the climate. The program’s output, predictions/projections/scenarios, are expressions of our beliefs. Beliefs about future weather statistics — climate. Scientific instruments, on the other hand, measure and thus define what we *know* about the weather and its statistics.

    “Physicist and Bayesian philosopher E.T. Jaynes coined the term mind projection fallacy to refer to this kind of failure to distinguish between epistemological claims (statements about belief, about your map, about what we can say about reality) and ontological claims (statements about reality, about the territory, about how things are).” – from above reference.

    So although I heartily agree with your conclusion: climate science is science (all science is experiment based), I fault the tightness of your reasoning. I only bother to bring it up because, IMHO, an appreciation of this fallacy is necessary to appreciate the real differences between technical software validation and its verification.


  4. George – I don’t see how this is an instance of the mind projection fallacy. That would be true only if a climate model was nothing more than a set of untested conjectures.

    Climate models are scientific instruments in the same sense that a petri dish is a scientific instrument. You can set up all sorts of artificial situations in a petri dish, to experiment with different hypotheses. You still have to demonstrate that the conditions in your petri dishes correspond to some real world situations where you think your theories should apply. Ditto for climate models. You can set up all sorts of conditions in the climate models, and explore what happens.

    In both cases you create theories to explain the observed effects, and test those theories both by designing more tests in the lab, and by checking that the lab conditions are relevant to some part of the real world. When the experiments and the real world data agree, that increases confidence that the theories might be correct. You check out rival theories in the same way, to see if they offer a better explanation. And after many many years of probing a theory in this way, and still finding no refutation, you accept the theory as the best available explanation of the data.

    The key point is that you have to do both: experiment in the lab with your petri dish / climate model, and carefully check that the dish/model corresponds in a strong way to the part of the real world you are trying to understand.

  5. @steve
    Hi Steve,

    Do people really think about things like Petri dishes when they think about scientific instruments? And does science typically include untested conjectures? Does debugging software by “experimenting” make debugging a science? My comment was not related to any of these questions.

    Rather, there is an expression: “the map is not the territory.” I would add that neither are instruments like compasses and meter sticks.

    Just as climate software is not a “territory,” it is also neither a “compass” nor a “meter stick”. Climate software is a “map” of some/all of the climate. Our belief in the climate models is an expression of our belief in our climate “maps,” not of our belief in our climate “compasses.”

    The things we do to ensure the accuracy and precision of compasses and meter sticks are very much different than the things we do to ensure the accuracy and precision of maps.

    Likewise, the things we do in order to ensure the accuracy and precision of climate software are more like the things we do to ensure the accuracy and precision of maps and less like those things we do to ensure the accuracy and precision of compasses and meter sticks.

    BTW, I would be interested in your thoughts (or anyone’s) about the claim that climate science is an emerging technoscience.


    [George – you’re tilting at windmills. Show me a single climate scientist who “believes in the climate models”. That’s the whole point – they are treated as tools to test our understanding, not as things to believe in. – Steve]

  6. Hi Steve,

    If nobody “believes” in the climate models, then why validate or verify them? Why bother even trying to understand them? I’m certain you are not saying this — so I am misunderstanding you.

    Do you not agree that the dominant paradigm in climate science is of an ensemble of climate models sampled from a distribution centered on the truth?

    Justification? By way of this blog posting (BTW, IMHO, an excellent blog for anyone interested in software V&V), consider its link to Reliability of the CMIP3 ensemble, by J.D. Annan and J.C. Hargreaves, and its reference to a recent posting on James’ Blog here.


    [“Validation” here means a continual process of comparing the models with the observational data. It’s done not because people “believe” in the model, but because the process of comparison leads to interesting insights. As in “here’s a bit we didn’t understand well enough, let’s figure out where we screwed up”. I’m familiar with the posts you link (all good stuff), and will write a more detailed post on model validation. Oh, and the assumption that the ensemble is “centered on the truth” is problematic for many reasons. I certainly wouldn’t describe that as a dominant paradigm. – Steve]

  7. In fact, predictions of future climate are really only a by-product of climate science. The science itself concentrates on improving our understanding of the processes that shape climate, by analyzing observations of past and present climate, and testing how well we understand them.

    As far as that goes, I don’t think you’ll find many folks to argue with you. Unfortunately, it doesn’t really go far enough.

    George – you’re tilting at windmills. Show me a single climate scientist who “believes in the climate models”. That’s the whole point – they are treated as tools to test our understanding, not as things to believe in.

    I don’t think George is tilting at windmills (and not just because he linked my site 😉 ). If the results of climate science / modeling are not eventually translated into some sort of subjective probability or ‘state of belief’ in future outcomes, then they are not useful for informing policy. Maybe that is ok, but based on your motivation statement, I don’t think this is what you mean to say.

    “Validation” here means a continual process of comparing the models with the observational data.

    I think that is a slightly different definition than many folks in the computational physics community are used to. I think Karniadakis puts it well:

    Validation is not always feasible (e.g., in astronomy or in certain nanotechnology applications [or in climate science]), and it is, in general, very costly because it requires data from many carefully conducted experiments.

    Just because you can’t do validation experiments doesn’t mean what you are doing is not science. What it does mean (and what makes climate policy different than most of our computational physics-supported decision making) is that when you act on the output of the unvalidated model, you take on an unquantifiable risk. This is why the attempts of climate policy advocates to borrow credibility from, for instance, the aeronautical engineering community, are unconscionable. The state of validate-ability in these two model-driven fields is entirely different.

    Perhaps this risk is worth taking in the climate policy case, but that is properly a political question rather than a scientific one.

    Looking forward to your model validation post.

  8. Acting without full-up validation experiments is not without precedent. In the case of the US nuclear weapons stockpile stewardship program, the political decision was made to accept the risk, and then US DOE had to figure out how to provide rational / science-based decision support without being able to blow up any complete weapons. That group of physicists and engineers has already answered many of the questions that the climate science and modeling community is still struggling with (echoes of Wegman’s citation analysis, too insular). Fortunately for the nuke boys, they had access to historical test results of full, representative systems and they can still do sub-critical testing on components and assemblies; their problem is a bit easier in that regard than the climate one.

    I think the distinction I didn’t make clear in my previous comment was that the purpose of validation experiments is to know when you have achieved an acceptable level of uncertainty in predicting the things you care about. You know it’s safe enough to stop the ‘continuous cycle’ that you mention and apply the model in anger to support decisions. Without that, you never really know when the model is useful enough, and what level of risk you are accepting by acting on its output.

    (those two comments turned out a little longer than I intended, thanks for letting me write a mini-post of my own on your site)

  9. Verification is required to be successfully completed prior to Validation.

    [I disagree. In fact, from a requirements perspective, there’s no point spending time verifying the product, if you can easily show the requirements are wrong in the first place. – Steve]

  10. …if you can easily show the requirements are wrong in the first place…

    In the climate modeling context would that be policy makers or the public asking for things that don’t make sense / aren’t predictable? “How will cap-n-trade affect the irrigation requirements for the golf courses in my congressional district?”

    Also, I think maybe there is some confusion about terms. Software folks seem to use the V-words a little differently from the accepted (admittedly arbitrary) definitions settled on by the computational physics community (in fact Roache mentions, sorry I can’t find the link, that he used the terms to mean different things in his early papers than what ended up as accepted by DMSO/AIAA/ASME). If you are solving PDEs then verification basically means grid convergence studies; if you are worried about doing software engineering then verification probably encompasses a broader set of activities.


    [The terms are also used inconsistently throughout the software literature – Steve]

  11. I read your paper. If you take requests, I’d be interested in hearing more details on validation notes. That seems like a useful concept.

    One of the things I’ve been struggling with is where to draw the line between short, automated unit / regression tests where there’s a clear pass-fail result, and verification / validation tests that can require a combination of quantitative and qualitative assessment. Thanks for any insight you can share on how those guys automate or organize that process.

    The use of bit checks as a regression test surprised me a bit. I see the value in doing it to make sure that the code is deterministic (given the exact same ICs / libraries / hardware / number of nodes you should get the same trajectory every time, if not, there’s a bug), but as you point out in the paper, going to new hardware / math libraries will change things, so I’m a little unsure of why they think it is valuable to use them in that way.

    Also, it seemed like there was no mention of grid convergence tests, are those generally only done when the ‘dynamic core’ is changed / updated? Do they do them as a part of calculation verification? Is something like the grid convergence index part of an automated report like the validation notes?

    You might find this paper interesting, from the abstract: A set of examples which use a blind-test protocol demonstrates the kinds of coding mistakes that can (and cannot) be exposed via the MMS code Verification procedure.

    Sorry if that’s too many questions, if the answer is ‘no time to field all those’, that’s fine, any that are convenient works too.

    [More good questions: I can’t keep up! I’ll tackle them soon, but remind me in a few weeks if you don’t see me come back to this – Steve]

  12. Pingback: What makes software engineering for climate models different? | Serendipity

  13. You may have already seen this, but if not, thought you’d be interested, abstract:
    Climate modeling is closely tied, through its institutions and practices, to observations from satellites and to the field sciences. The validity, quality and scientific credibility of models are based on interaction between models and observation data. In the case of numerical modeling of climate and climate change, validation is not solely a scientific interest: the legitimacy of computer modeling, as a tool of knowledge, has been called into question in order to deny the reality of any anthropogenic climate change; model validations thereby bring political issues into play as well. There is no systematic protocol of validation: one never validates a model in general, but the capacity of a model to account for a defined climatic phenomenon or characteristic. From practices observed in the two research centers developing and using a climate model in France, this paper reviews different ways in which the researchers establish links between models and empirical data (which are not reduced to the latter validating the former) and convince themselves that their models are valid. The analysis of validation practices—relating to parametrization, modes of variability, climatic phenomena, etc.—allows us to highlight some elements of the epistemology of modeling.

  14. Pingback: Do Climate Models need Independent Verification and Validation? | Serendipity
