Valdivino, who is working on a PhD in Brazil, on formal software verification techniques, is inspired by my suggestion to find ways to apply our current software research skills to climate science. But he asks some hard questions:

1.) If I want to Validate and Verify climate models should I forget all the things that I have learned so far in the V&V discipline? (e.g. Model-Based Testing (Finite State Machine, Statecharts, Z, B), structural testing, code inspection, static analysis, model checking)
2.) Among all V&V techniques, what can really be reused / adapted for climate models?

Well, I wish I had some good answers. When I started looking at the software development processes for climate models, I expected to be able to apply many of the formal techniques I’ve worked on in the past in Verification and Validation (V&V) and Requirements Engineering (RE). It turns out almost none of it seems to apply, at least in any obvious way.

Climate models are built through a long, slow process of trial and error, continually seeking to improve the quality of the simulations (See here for an overview of how they’re tested). As this is scientific research, it’s unknown, a priori, what will work, what’s computationally feasible, etc. Worse still, the complexity of the earth systems being studied means it’s often hard to know which processes in the model most need work, because the relationship between particular earth system processes and the overall behaviour of the climate system is exactly what the researchers are working to understand.

Which means that model development looks most like an agile software development process, where both the requirements and the set of techniques needed to implement them are unknown (and unknowable) up-front. So they build a little, and then explore how well it works. The closest they come to a formal specification is a set of hypotheses along the lines of:

“if I change <piece of code> in <routine>, I expect it to have <specific impact on model error> in <output variable> by <expected margin> because of <tentative theory about climatic processes and how they’re represented in the model>”

This hypothesis can then be tested by a formal experiment in which runs of the model with and without the altered code become two treatments, assessed against the observational data for some relevant period in the past. The expected improvement might be a reduction in the root mean squared error for some variable of interest, or just as importantly, an improvement in the variability (e.g. the seasonal or diurnal spread).
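The shape of such an experiment can be sketched in a few lines of code. This is a minimal sketch only: the variable, the synthetic data, and the error margins are all invented for illustration, standing in for real model output and observations.

```python
import numpy as np

def rmse(model_output, observations):
    """Root mean squared error between a model run and observations."""
    return np.sqrt(np.mean((model_output - observations) ** 2))

# Hypothetical data: the same output variable (say, monthly mean surface
# temperature) from a control run, a run with the altered code, and the
# observational record for the same period.
rng = np.random.default_rng(42)
observations = rng.normal(15.0, 1.0, size=120)  # 10 years of monthly values
control_run = observations + rng.normal(0.3, 0.5, size=120)
altered_run = observations + rng.normal(0.1, 0.4, size=120)

# The hypothesis predicts a reduction in RMSE for this variable:
improvement = rmse(control_run, observations) - rmse(altered_run, observations)
print(f"RMSE reduction from code change: {improvement:.3f}")
```

A real assessment would also ask whether the improvement exceeds the model’s internal variability, for example by comparing ensembles of runs rather than single runs of each treatment.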

The whole process looks a bit like this (although, see Jakob’s 2010 paper for a more sophisticated view of the process):

And of course, the central V&V technique here is full integration testing. The scientists build and run the full model to conduct the end-to-end tests that constitute the experiments.

So the closest thing they have to a specification would be a chart such as the following (courtesy of Tim Johns at the UK Met Office):

This chart shows how well the model is doing on 34 selected output variables (click the graph to see a bigger version, to get a sense of what the variables are). The scores for the previous model version have been normalized to 1.0, so you can quickly see whether the new model version did better or worse for each output variable – the previous model version is the line at “1.0” and the new model version is shown as the coloured dots above and below the line. The whiskers show the target skill level for each variable. If the coloured dots are within the whisker for a given variable, then the model is considered to be within the variability range for the observational data for that variable. Colour-coded dots then show how well the current version did: green dots mean it’s within the target skill range, yellow mean it’s outside the target range, but did better than the previous model version, and red means it’s outside the target and did worse than the previous model version.
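The colour-coding convention in the chart can be captured in a small function. This is a simplified sketch: the function name and the example error values are hypothetical, and the “within the whisker” test is reduced to a single target threshold (the real chart compares against the variability range of the observational data).

```python
def classify(prev_error, new_error, target_error):
    """Colour-code one output variable, following the chart's convention.
    Errors are normalized so the previous model version scores 1.0;
    lower scores are better."""
    new_score = new_error / prev_error
    target = target_error / prev_error  # simplified stand-in for the whisker
    if new_score <= target:
        return "green"    # within the target skill range
    elif new_score < 1.0:
        return "yellow"   # outside the target, but better than before
    else:
        return "red"      # outside the target, and worse than before

# A variable where the new version improved enough to hit the target:
print(classify(prev_error=2.0, new_error=1.2, target_error=1.4))
```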

Now, as we know, agile software practices aren’t really amenable to any kind of formal verification technique. If you don’t know what’s possible before you write the code, then you can’t write down a formal specification (the ‘target skill levels’ in the chart above don’t count – these are aspirational goals rather than specifications). And if you can’t write down a formal specification for the expected software behaviour, then you can’t apply formal reasoning techniques to determine whether the specification was met.

So does this really mean, as Valdivino suggests, that we can’t apply any of our toolbox of formal verification methods? I think attempting to answer this would make a great research project. I have some ideas for places to look where such techniques might be applicable. For example:

  • One important built-in check in a climate model is ‘conservation of mass’. Some fluxes move mass between the different components of the model. Water is an obvious one – it’s evaporated from the oceans, to become part of the atmosphere, and is then passed to the land component as rain, thence to the rivers module, and finally back to the ocean. All the while, the total mass of water across all components must not change. Similar checks apply to salt, carbon (although this does change, due to emissions), and various trace elements. At present, such checks are built into the models as code assertions. In some cases, flux corrections were necessary because of imperfections in the numerical routines or the geometry of the grid, although in most cases the models have improved enough that most flux corrections have been removed. But I think you could automatically extract from the code an abstracted model capturing just the ways in which these quantities change, and then use a model checker to track down and reason about such problems.
  • A more general version of the previous idea: In some sense, a climate model is a giant state-machine, but the scientists don’t ever build abstracted versions of it – they only work at the code level. If we build more abstracted models of the major state changes in each component of the model, and then do compositional verification over a combination of these models, it *might* offer useful insights into how the model works and how to improve it. At the very least, it would be an interesting teaching tool for people who want to learn about how a climate model works.
  • Climate modellers generally don’t use unit testing. The challenge is that they find it hard to write down correctness properties for individual code units. I’m not entirely clear how formal methods could help, but someone with experience of patterns for temporal logic properties might be able to contribute here. Clune and Rood have a forthcoming paper on this in November’s IEEE Software. I suspect this is one of the easiest places to get started for software people new to climate models.
  • There’s one other kind of verification test that is currently done by inspection, but might be amenable to some kind of formalization: the check that the code correctly implements a given mathematical formula. I don’t think this will be a high-value tool, as the Fortran code is close enough to the mathematics that simple inspection is already very effective. But occasionally a subtle bug slips through – for example, I came across a case where the modellers discovered they had used the wrong logarithm (the natural log in place of log10), although this was due more to a lack of clarity in the original published paper than to a coding error.
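As a rough illustration of the conservation check described in the first bullet, here is a minimal sketch. The component names, masses, and tolerance are invented for illustration; a real model enforces the budget inside its actual flux routines rather than in a toy class like this.

```python
class WaterBudget:
    """Toy water budget across hypothetical model components."""

    def __init__(self, masses):
        self.masses = dict(masses)  # water mass per component (kg)
        self.initial_total = sum(self.masses.values())

    def flux(self, source, sink, amount):
        """Move water between components, e.g. evaporation or rainfall.
        A flux only redistributes mass; it never creates or destroys it."""
        self.masses[source] -= amount
        self.masses[sink] += amount

    def check(self, rel_tol=1e-12):
        """Assert total water mass is unchanged, up to numerical round-off."""
        drift = abs(sum(self.masses.values()) - self.initial_total)
        assert drift <= rel_tol * self.initial_total, \
            f"water mass not conserved: drift={drift}"

budget = WaterBudget({"ocean": 1.4e21, "atmosphere": 1.3e16, "land": 2.3e16})
budget.flux("ocean", "atmosphere", 5.0e14)  # evaporation
budget.flux("atmosphere", "land", 3.0e14)   # rainfall
budget.check()  # passes: fluxes only moved mass between components
```

The abstraction a model checker would work on is essentially the flux graph here: which routines move which quantity between which components, stripped of all the numerics.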
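The logarithm bug in the last bullet is easy to reproduce in miniature. The function names here are hypothetical and the actual formula from the published paper is not reproduced; the point is just how quietly the two logarithms diverge.

```python
import math

def absorption_wrong(x):
    return math.log(x)    # ln(x): the bug

def absorption_right(x):
    return math.log10(x)  # log10(x): what the paper intended

# The two differ by the constant factor ln(10), so every value computed
# with the wrong logarithm is inflated by roughly 130% -- a systematic
# error that simple inspection of the code against the formula can miss
# if the paper just writes "log".
ratio = absorption_wrong(50.0) / absorption_right(50.0)
print(f"ln/log10 ratio: {ratio:.3f}")
```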

Feel free to suggest more ideas in the comments!

21 Comments

  1. You don’t say how old the model used in the image is, or whether it was an experiment with reduced snowfall and aerosols, but if I understood it right, this could be why those parameters are somewhat off the target (observed) values (snowfall generally reduces windspeed, and aerosols are essential for precipitation). And thank you for these descriptions of models; they clear things up for me at least.

  2. Is an implication of your diagram that it illustrates the scientific method as applied to the climate models? If so, then where are the predictions about reality? It is one thing to explain the past, another to predict the future.

    I fully support your efforts at trying to address the outstanding V&V issues of the climate models — a difficult task to be sure. IMHO, it is important (perhaps an understatement) that a consensus IV&V be reached so everybody can use the models in the engineering sense.

  3. In contrast to formal methods, other procedures and processes for verification and validation have been developed and successfully applied to a variety of scientific and engineering software. Pat Roache, a pioneer and leading authority, has written and revised a book on the general concept of V&V for scientific and engineering software. The First Edition, Verification and Validation in Computational Science and Engineering, is available for about half the cost of the Second Edition, Fundamentals of Verification and Validation. And recently Oberkampf and Roy, also among the pioneers, have written Verification and Validation in Scientific Computing.

    On several occasions and at several places, I have posted on procedures and processes with which I have been personally associated. These have been successfully applied to independent verification and validation for several engineering software projects. The reviewers of the software were not in any way associated with development of the software.

    Definitions and Characterization of the Software

    Requirements for Release of Production Software

    Verification Procedures

    These procedures and processes are based on the assumption that documentation sufficient to define the specifications for all aspects of the software is available. Verification of the specifications can’t begin, of course, until the specifications have been identified.

  4. An interesting summary of some history, along with useful references, is given here.

  5. @George: “a consensus IV&V be reached so everybody can use the models in the engineering sense”

    I think this phrase is meaningless. Climate models are scientific tools, built to explore current theories about how the climate system works. It’s unlikely we’ll ever get to the point where “everybody can use them” because to do anything interesting with them, you need years of training in the domain. And “in the engineering sense” makes no sense – this is science, not engineering.

  6. @Dan: I see you’re still playing “broken record”.

    Once again: the processes you prescribe are suited to control systems, where there is very little room for error, and the software has to work correctly once the system goes operational. What you describe doesn’t work for many kinds of software – for example you wouldn’t use it for web design, because users don’t know what they want ahead of time. That’s why agile development was invented. I’ve seen the methods you prescribe applied to the kinds of software for which they are inappropriate. The results ain’t pretty.

    Writing a detailed specification for a climate model is pointless, for the same reasons that writing a detailed specification for a user-centred web design is useless. In both cases, it’s only by writing code and trying things out that you figure out what’s needed.

  7. I have never applied the procedures and processes outlined in the linked material to real-time software systems of any kind. In stark contrast to your unsupported assertions, I would not recommend applying the methodology to “control systems, where there is very little room for error, and the software has to work correctly once the system goes operational.” I do not have any experience in that domain. None. I do not have any experience with any software that “has to work correctly once the system goes operational,” whether or not it’s real-time control software. All my experience has been in the engineering domain of multi-phase, multi-scale thermal sciences, in both engineered equipment and open systems.

    [...snip (invective removed...]

  8. Pingback: Censored at Serendipity « Models Methods Software

  9. @Dan Hughes

    @Steve: I see you’re still playing “broken record”.

    Can you point to a single instance in any of the literature cited in #3 and #4 above in which the methodology is said to be useful for real-time control-system software? And by way of contrast, can you point to any instance at all which indicates that the methodologies are not appropriate for application to the wide range of software encountered in Climate Science?

    Thanks

  10. @Dan: once again, you’ve missed the point. First, to remind you, the original post was mainly about formal verification (which you could interpret loosely as a set of techniques that use automated reasoning about software behaviour). Formal verification isn’t widely used in any kind of software practice, largely because it still doesn’t scale outside of laboratory conditions.

    The references you refer to (esp. Oberkampf and Roy, which is much better than Roache) describe many different V&V techniques, most of which are already routine practice in climate modelling. O&R do briefly mention IV&V, largely as a certification technique, but certainly don’t stress it. They also briefly mention the use of specifications, but then go on to describe a large number of techniques that don’t rely on specification.

    Yet you repeatedly assert that if the models are built without IV&V, and without writing specifications, then they can’t be trusted. I think it’s you who needs to re-read the texts you’re citing.

  11. @Steve: once again, you have failed. You cannot point to a statement by me anywhere in which I’ve said models built without IV&V and without writing specifications can’t be trusted. Nowhere. I challenge you. I have constantly said that all software the results from which will influence public policy decisions should be required to undergo Independent V&V and be maintained under an approved SQA process.

    Additionally if you had read the first of the three links to my blog that I gave in my original comment, you would find that the methodology outlined there has been specifically designed to be applied to as-built software that has evolved over decades of time. You will also find that the software documentation is taken to represent the specifications.

    You have yet to cite a single example which indicates that the methodologies in Roache or Oberkampf and Roy or any of the cited references in those books are not appropriate for application to the wide range of software encountered in Climate Science.

    Let me remind you that the first sentence in my first comment said: “In contrast to formal methods, other procedures and processes for verification and validation have been developed and successfully applied to a variety of scientific and engineering software.” I immediately dismissed formal methods. Nobody uses formal methods for real-world problems.

    Why can’t the subject be the focus of any discussions over here? Point me to any publicly available document which describes application of any IV&V methodology to any software used in Climate Science.

    Arguing from a position of authority, especially when the argument lacks connections with reality, is a fail: every time all the time.

  12. @Dan: I’ve edited the original post to make it clearer that I was specifically talking about potential lines of research into *formal* verification techniques (in response to a question from a researcher who has been studying formal verification in grad school). You’re reading far more into what I’m saying than is there.

    As I’ve posted frequently in the past (e.g. here, here and here), a wide variety of V&V techniques are applied to climate model development. But it appears that no matter how much I write about this, and how much is done, if it doesn’t conform to your narrow view of what software processes should look like, then it counts for nothing.

    You’re misrepresenting me, both here and elsewhere, as arguing against the use of verification and validation techniques (such as those advocated by Roache, Oberkampf & Roy, etc). I would like you to stop doing that, although I suspect it’s fruitless to ask.

    My specific point of disagreement with you is with the use of specification-based certification methods common in *independent* V&V. These are inappropriate for reasons I’ve set out before. If you want to continue this discussion, I would suggest you actually think about what I’m saying and critique it thoughtfully, rather than continually misrepresenting my position.

    PS. All further accusations of “argumentation from authority” will be deleted. You don’t appear to understand what this logical fallacy actually means. Writing about my considered opinion as an expert, while drawing on the results of my empirical studies, is not an appeal to authority.

  13. I’m arriving a little late to this discussion, but I think this point worth making. On your chart of variables, a critical one is missing: density.

    The equation governing the movement of mass in a GCM does not actually impose full and correct conservation of mass. In some textbooks the relation in question is referred to, more precisely, as a ‘continuity of mass’ equation. It only governs horizontal movement, ensuring that the mass taken from one box is credited to another.

    Vertical conservation of mass is thought to be maintained by a more drastic method of eliminating vertical motion and thus eliminating mass transfer altogether. Each layer keeps whatever air it started with.

    This has been standard practice in GCMs for more than fifty years, but it is an aspect of climate theory that has never been closely scrutinized.

    [...blah blah blah. I've deleted the rest of your nonsense. I suggest you read some texts on how the models actually work. Start with:
    Arakawa, A., and W.H. Schubert, 1974: Interaction of a cumulus cloud ensemble with the large-scale environment. Part I. J. Atmospheric Sciences, 31, 674–701.
    and work forward from there.
    Are you seriously telling me that people like Judith Curry have read through your BS and didn't correct this nonsense? - Steve]

  14. in #16 below you said:

    You’re misrepresenting me, both here and elsewhere, as arguing against the use of verification and validation techniques (such as those advocated by Roache, Oberkampf & Roy, etc). I would like you to stop doing that, although I suspect it’s fruitless to ask.

    At this, in your first response, you say, firstly:

    Once again: the processes you prescribe are suited to control systems, where there is very little room for error, and the software has to work correctly once the system goes operational.

    and then

    I’ve seen the methods you prescribe applied to the kinds of software for which they are inappropriate. The results ain’t pretty.

    As is exceedingly clear to everyone who has bothered to actually read the material that I have cited, the methodologies have not been developed for and I have never advocated applications to the systems you stated above.

    In the third blockquote here, I take the words to mean that you would not apply the methodologies actually in the material to the very software for which they have been developed. Real-time, safety-critical control systems are not the subjects of the materials. The methodologies have been developed for and applied to scientific and engineering software.

    I challenge you again to point to where I have said what you say I have said. Both here and other places.

    It seems to me that if you make statements about the contents of materials that are at odds with their actual content, and you provide no evidence to support your claims, then you’re arguing from a position of authority in the sense that you expect your words to be accepted at face value.

    [...snip (invective removed. Again.)]

  15. @Dan Hughes:

    all software the results from which will influence public policy decisions should be required to undergo Independent V&V and be maintained under an approved SQA process.

    How do you suggest we identify such software, and how ought we to fund the process? I imagine we should start with the low-hanging fruit such as Mathematica, Matlab, IDL, R, Excel, and all the software used for economic modelling.

  16. @Dan: *sigh*. The stuff you write on your blog is completely inconsistent with the texts you cite. So, on the one hand, virtually everything written in O&R about V&V for numerical codes is standard practice for climate and NWP models. On the other hand, nearly all the stuff you write in your blog about certifying process steps is not, and would be inappropriate. O&R don’t even mention IV&V. Not once.

    So the problem is, either you have absolutely no understanding of the appropriate V&V techniques for numerical simulations, or you’re just trying to score political points. Claiming, as you’ve done at Judith Curry’s blog, that I’ve said there’s no V&V on climate models seems to me to be evidence of the latter. Because you’ve seen and commented on the many posts I’ve written in the past about the extensive V&V used in climate modeling. Somehow or other, you think that if the particular process steps you insist on aren’t applied, then the models aren’t V&Ved. That’s pure BS, and it’s a real pity that the credulous folks over at Curry’s believe any of it.

  17. Dean Brooks :
    I’m arriving a little late to this discussion, but I think this point worth making. On your chart of variables, a critical one is missing: density.
    The equation governing the movement of mass in a GCM does not actually impose full and correct conservation of mass. In some textbooks the relation in question is referred to, more precisely, as a ‘continuity of mass’ equation. It only governs horizontal movement, ensuring that the mass taken from one box is credited to another.
    Vertical conservation of mass is thought to be maintained by a more drastic method of eliminating vertical motion and thus eliminating mass transfer altogether. Each layer keeps whatever air it started with.
    This has been standard practice in GCMs for more than fifty years, but it is an aspect of climate theory that has never been closely scrutinized.
    [...blah blah blah. I've deleted the rest of your nonsense. I suggest you read some texts on how the models actually work. Start with:
    Arakawa, A., and W.H. Schubert, 1974: Interaction of a cumulus cloud ensemble with the large-scale environment. Part I. J. Atmospheric Sciences, 31, 674–701.
    and work forward from there.
    Are you seriously telling me that people like Judith Curry have read through your BS and didn't correct this nonsense? - Steve]

    Thanks for responding.

    I’ve read Arakawa and Schubert. I’ve also read the remarks in model documentation about Arakawa and Schubert and how they are implemented.

    In the CAM3 model, for example, there is an elaborate updraft and downdraft entrainment scheme based on Arakawa and Schubert which requires 15 pages to describe (see the CAM3 description, Collins et al 2004). It lays out a series of mass flux variables which are used to estimate the quantity of energy and moisture that is carried from layer to layer. I have no quarrel with the mechanisms laid out there. I’m not an expert on cloud formation, obviously, but I don’t have to be, because the model documentation then explains, on page 90, that none of these mass flux variables are allowed to move dry air mass from layer to layer. They are employed in the calculation of the other parameterized variables, and then discarded when the model steps to its next iteration.

    “Because there is an explicit relationship between the surface pressure and the air mass within each layer we assume that water mass can change within the layer by physical parameterizations but dry air mass cannot.”

    Got that? The dry air mass in a layer cannot be changed by the physical parameterizations (that is, by anything based on Arakawa-Schubert). The equation for the iterative recalculation of the air mass in the layer is given in that paragraph on page 90, and it does not contain any of the mass flux terms so painstakingly laid out in the previous 15 pages. It is a strict hydrostatic approximation as I described in my paper.

    What we see in the models is a kind of virtual or incomplete Arakawa-Schubert. The updrafts and downdrafts only move moisture, or tracer, or heat energy. They don’t move dry air. If they did move dry air it would lead to stability problems in the dynamics — again, as I explained in my paper.

    I’ve had this issue of Arakawa-Schubert and convection raised a number of times now. Clearly I need to add a section to the paper explaining why it doesn’t answer the challenge I am making.

    Let me ask you this question in hopes of speeding the discussion along. Assume for a moment that I’ve done my homework and the hydrostatic approximation does limit movement of air from layer to layer as I describe. Would you then concede that the problems I attribute to the approximation are plausible?

    I’m just trying to evaluate where the biggest resistance to my argument lies. Thank you again.

  18. Dean Brooks :
    Vertical conservation of mass is thought to be maintained by [snip]

    “Thought to be”? By whom? Why do they think this?

    People think all sorts of things. The sun is thought to be made of iron, by Oliver Manuel. That doesn’t mean that he’s right, or that we should take his thoughts into account.

    Instead of reporting the thoughts of unidentified people, why don’t you find out the truth of the matter?

  19. @Dean Brooks The biggest resistance to your argument is that you clearly have no idea how dynamical cores work. I suggest you enrol in an NWP course, and actually learn this stuff.

    It appears to me that you’ve written an entire paper full of nonsense, based on quote mining a technical document. The conservation of mass equations don’t mean air mass cannot move – they just mean that the relationship between pressure and mass must be preserved. [Did you stop to wonder why the documentation would lay out in detail how the convective processes work, if they're not actually used? Can you really be that stupid?]

    I suggest you try running CAM3 and see what happens. If you’re not competent to do that, then you should ask yourself whether you’re competent to write about this topic at all.

  20. @Steve
    [...snip. Repetition of previous comments deleted, although if you want to read them ad nauseam, I'm sure Dan will post them on his own blog and whine about censorship...]

    You now also say:

    Claiming, as you’ve done at Judith Curry’s blog, that I’ve said there’s no V&V on climate models seems to me to be evidence of the latter.

    I say. Maybe I dropped an I or an Independent. If you will point me to where I said what you claim I have said, I will correct it.

    Thanks

    You also now say:

    O&R don’t even mention IV&V. Not once.

    Well, gee, does that mean the procedures and processes can’t be applied by persons independent of those who developed the software?

  21. @Dan Hughes Ah, Dan. Now you’re starting to get it. The only argument I have with you is your repeated insistence on the “I” part.

    Climate models need, and currently get, extensive V&V. It’s the kind that’s needed for numerical simulations, as set out in great detail in books like Oberkampf and Roy. It has to be carried out by experts, and that means by the same people building the models. We don’t have any other experts available, as I’ve already argued extensively elsewhere.

    By repeatedly and deliberately conflating “V&V” with “IV&V” you’re misrepresenting everything I’ve been saying. In short, lying.

    Until you’re able to recognize that, there’s no point continuing this exchange.

  22. PS. Dan: For an example of how badly wrong attempts at IV&V by people outside the climate and NWP modeling communities can go, see my exchange above with Dean Brooks. He thinks he’s found a major error in climate models that has persisted for 30 years. Unfortunately, he hasn’t got a clue what he’s talking about. Is that the kind of IV&V you’re advocating?

  23. Pingback: How Climate Science is Done | Serendipity
