Valdivino, who is working on a PhD in Brazil, on formal software verification techniques, is inspired by my suggestion to find ways to apply our current software research skills to climate science. But he asks some hard questions:
1.) If I want to Validate and Verify climate models should I forget all the things that I have learned so far in the V&V discipline? (e.g. Model-Based Testing (Finite State Machine, Statecharts, Z, B), structural testing, code inspection, static analysis, model checking)
2.) Among all V&V techniques, what can really be reused / adapted for climate models?
Well, I wish I had some good answers. When I started looking at the software development processes for climate models, I expected to be able to apply many of the  formal techniques I’ve worked on in the past in Verification and Validation (V&V) and Requirements Engineering (RE). It turns out almost none of it seems to apply, at least in any obvious way.
Climate models are built through a long, slow process of trial and error, continually seeking to improve the quality of the simulations (See here for an overview of how they’re tested). As this is scientific research, it’s unknown, a priori, what will work, what’s computationally feasible, etc. Worse still, the complexity of the earth systems being studied means its often hard to know which processes in the model most need work, because the relationship between particular earth system processes and the overall behaviour of the climate system is exactly what the researchers are working to understand.
Which means that model development looks most like an agile software development process, where the both the requirements and the set of techniques needed to implement them are unknown (and unknowable) up-front. So they build a little, and then explore how well it works. The closest they come to a formal specification is a set of hypotheses along the lines of:
“if I change <piece of code> in <routine>, I expect it to have <specific impact on model error> in <output variable> by <expected margin> because of <tentative theory about climactic processes and how they’re represented in the model>”
This hypothesis can then be tested by a formal experiment in which runs of the model with and without the altered code become two treatments, assessed against the observational data for some relevant period in the past. The expected improvement might be a reduction in the root mean squared error for some variable of interest, or just as importantly, an improvement in the variability (e.g. the seasonal or diurnal spread).
The whole process looks a bit like this (although, see Jakob’s 2010 paper for a more sophisticated view of the process):
So the closest thing they have to a specification would be a chart such as the following (courtesy of Tim Johns at the UK Met Office):
This chart shows how well the model is doing on 34 selected output variables (click the graph to see a bigger version, to get a sense of what the variables are). The scores for the previous model version have been normalized to 1.0, so you can quickly see whether the new model version did better or worse for each output variable – the previous model version is the line at “1.0” and the new model version is shown as the coloured dots above and below the line. The whiskers show the target skill level for each variable. If the coloured dots are within the whisker for a given variable, then the model is considered to be within the variability range for the observational data for that variable. Colour-coded dots then show how well the current version did: green dots mean it’s within the target skill range, yellow mean it’s outside the target range, but did better than the previous model version, and red means it’s outside the target and did worse than the previous model version.
Now, as we know, agile software practices aren’t really amenable to any kind of formal verification technique. If you don’t know what’s possible before you write the code, then you can’t write down a formal specification (the ‘target skill levels’ in the chart above don’t count – these aspirational goals rather than specifications). And if you can’t write down a formal specification for the expected software behaviour, then you can’t apply formal reasoning techniques to determine if the specification was met.
So does this really mean, as Valdivino suggests, that we can’t apply any of our toolbox of formal verification methods? I think attempting to answer this would make a great research project. I have some ideas for places to look where such techniques might be applicable. For example:
- One important built-in check in a climate model is ‘conservation of mass’. Some fluxes move mass between the different components of the model. Water is an obvious one – it’s evaporated from the oceans, to become part of the atmosphere, and is then passed to the land component as rain, thence to the rivers module, and finally back to the ocean. All the while, the total mass of water across all components must not change. Similar checks apply to salt, carbon (actually this does change due to emissions), and various trace elements. At present, such checks are this is built in to the models as code assertions. In some cases, flux corrections were necessary because of imperfections in the numerical routines or the geometry of the grid, although in most cases, the models have improved enough that most flux corrections have been removed. But I think you could automatically extract from the code an abstracted model capturing just the ways in which these quantities change, and then use a model checker to track down and reason about such problems.
- A more general version of the previous idea: In some sense, a climate model is a giant state-machine, but the scientists don’t ever build abstracted versions of it – they only work at the code level. If we build more abstracted models of the major state changes in each component of the model, and then do compositional verification over a combination of these models, it *might* offer useful insights into how the model works and how to improve it. At the very least, it would be an interesting teaching tool for people who want to learn about how a climate model works.
- Climate modellers generally don’t use unit testing. The challenge here is that they find it hard to write down correctness properties for individual code units. I’m not entirely clear how formal methods could help here, but it seems like someone with experience of patterns for temporal logic properties might be able to help here. Clune and Rood have a forthcoming paper on this in November’s IEEE Software. I suspect this is one of the easiest places to get started for software people new to climate models.
- There’s one other kind of verification test that is currently done by inspection, but might be amenable to some kind of formalization: the check that the code correctly implements a given mathematical formula. I don’t think this will be a high value tool, as the fortran code is close enough to the mathematics that simple inspection is already very effective. But occasionally a subtle bug slips through – for example, I came across an example where the modellers discovered they had used the wrong logarithm (loge in place of log10), although this was more due to lack of clarity in the original published paper, rather than a coding error.
Feel free to suggest more ideas in the comments!