Previously I posted on the first two sessions of the workshop on “A National Strategy for Advancing Climate Modeling” that was held at NCAR at the end of last month:
The third session focussed on the relationship between models and data.
Kevin Trenberth kicked off with a talk on Observing Systems. Unfortunately, I missed part of his talk, but I’ll attempt a summary anyway – apologies if it’s incomplete. His main points were that we don’t suffer from a lack of observational data, but from problems with quality, consistency, and characterization of errors. Continuity is a major problem, because much of the observational system was designed for weather forecasting, where consistency of measurement over years and decades isn’t required. Hence, there’s a need for reprocessing and reanalysis of past data, to improve calibration and assess accuracy, and we need benchmarks to measure the effectiveness of reprocessing tools.
Kevin points out that it’s important to understand that models are used for much more than prediction. They are used:
- for analysis of observational data, for example to produce global gridded data from the raw observations;
- to diagnose climate & improve understanding of climate processes (and thence to improve the models);
- for attribution studies, through experiments to determine climate forcing;
- for projections and prediction of future climate change;
- for downscaling to provide regional information about climate impacts;
Confronting the models with observations is a core activity in earth system modelling. Obviously, it is essential for model evaluation. But observational data is also used to tune the models, for example to remove known systematic biases. Several people at the workshop pointed out that the community needs to do a better job of keeping the data used to tune the models distinct from the data used to evaluate them. For tuning, a number of fields are used – typically top-of-the-atmosphere data such as net shortwave and longwave radiation flux, cloud and clear sky forcing, and cloud fractions. Also, precipitation and surface wind stress, global mean surface temperature, and the period and amplitude of ENSO. Kevin suggests we need to do a better job of collecting information about model tuning from different modelling groups, and ensure model evaluations don’t use the same fields.
For model evaluation, a number of integrated score metrics have been proposed to summarize correlation, root-mean-squared (rms) error and variance ratios – See for example, Taylor 2001; Boer and Lambert 2001; Murphy et al, 2004; Reichler & Kim 2008.
But model evaluation and tuning aren’t the only ways in which models and data are brought together. Just as important is re-analysis, where multiple observational datasets are processed through a model to provide more comprehensive (model-like) data products. For this, data assimilation is needed, whereby observational data fields are used to nudge the model at each timestep as it runs.
Kevin also talked about forward modelling, a technique in which the model used to reproduce the signal that a particular instrument would record, given certain climate conditions. Forward modelling is used for comparing models with ground observations and satellite data. In much of this work, there is an implicit assumption that the satellite data are correct, but in practice, all satellite data have biases, and need re-processing. For this work, the models need good emulation of instrument properties and thresholds. For examples, see: Chepfer, Bony et al, 2010; Stubenrauch & Kinne 2009.
He also talked about some of the problems with existing data and models:
- nearly all satellite data sets contain large spurious variability associated with changing instruments and satellites, orbital decay/drift, calibration, and changing methods of analysis.
- simulation of the hydrological cycle is poor, especially in the intertropical convergence zone (ITCZ). Tropical transients are too weak, runoff and recycling is not correct, and the diurnal cycle is poor.
- there are large differences between datasets for low cloud (see Marchand at al 2010)
- clouds are not well defined. Partly this is a problem of sensitivity of instruments, compounded by the difficulty of distinguishing between clouds and aerosols.
- Most models have too much incoming solar radiation in the southern oceans, caused by too few clouds. This makes for warmer oceans and diminished poleward transport, which messes up storm tracking and analysis of ocean transports.
What is needed to support modelling over the next twenty years? Kevin made the following recommendations:
- Support observations and development into climate datasets.
- Support reprocessing and reanalysis.
- Unify NWP and climate models to exploit short term predictions and confront the models with data.
- Develop more forward modelling and observation simulators, but with more observational input.
- Targeted process studies such as GEWEX and analysis of climate extremes, for model evaluation.
- Target problem areas such as monsoons and tropical precipitation.
- Carry out a survey of fields used to tune models.
- Design evaluation and model merit scoring based on fields other than those used in tuning.
- Promote assessments of observational datasets so modellers know which to use (and not use).
- Support existing projects, including GSICS, SCOPE-CM, CLARREO, GRUAN,
Overall, there’s a need for a climate observing system. Process studies should not just be left to the observationists – we need the modellers to get involved.
The second talk was by Ben Kirtman, on “Predictability, Credibility, and Uncertainty Quantification“. He began by pointing out that there is ongoing debate over what predictability means. Some treat it as an inherent property of the climate system, while others think of it as a model property. Ben distinguished two kinds of predictability:
- Sensitivity of the climate system to initial conditions (predictability of the first kind);
- Predictability of the boundary forcing (predictability of the second kind).
Predictability is enhanced by ensuring specific processes are included. For example, you need to include the MJO if you want to predict ENSO. But model-based estimates of predictability are model dependent. If we want to do a better job of assessing predictability, we have to characterize model uncertainty, and we don’t know how to do this today.
Good progress has been made on quantifying initial condition uncertainty. We have pretty of good ideas for how to probe this (stochastic optimals, bred vectors, etc.) using ensembles with perturbed initial conditions. But from our understanding of chaos theory (e.g. see the Lorenz attractor), predictability depends on which part of the regime you’re in, so we need to assess the predictability for each particular forecast.
Uncertainty in external forcing include uncertainties in both the natural and anthropogenic forcing; however this is becoming less of an issue in modelling, as these forcings are better understood. Therefore, the biggest challenge is in quantifying uncertainties in model formulation. These arise because of the discrete representation of climate system, the use of parameterization of subgrid processes, and because of missing processes. Current approaches can be characterized as:
- a posteriori techniques, such as the multi-model ensembles of opportunity used in IPCC assessments, and perturbed parameters/parameterizations, as used in climateprediction.net.
- a priori techniques, where we incorporate uncertainty as the model evolves. The idea is that uncertainty is in subscale processes and missing physics. Model this non-locally and stochastically. E.g. backscatter, interactive ensembles to incorporate uncertainty in the coupling.
The term credibility is even less well defined. Ben asked his students what they understood by the term, and they came up with a simple answer: credibility is the extent to which you use the best available science [which corresponds roughly to my suggestion of what model validation ought to mean]. In the literature, there are a number of other way of expressing credibility:
- In terms of model bias. For example, Lenny Smith offers a Temporal (or spatial) credibility ratio, calculated as the ratio of the smallest timestep in the model to the smallest duration over which a variable has to be averaged before it compares favourably with observations. This expresses how much averaging over the temporal (or spatial) scale you have to do to make the model look like the data.
- In terms of whether the ensembles bracket the observations. But the problem here is that you can always pump up an ensemble to do this, and it doesn’t really tell you about probabilistic forecast skill.
- In terms of model skill. In numerical weather prediction, it’s usual to measure forecast quality using some specific skill metrics.
- In terms of process fidelity – how well the processes represented in the model capture what is known about those processes in reality. This is a reductionist approach, and depends on the extent to which specific processes can be isolated (both in the model, and in the world).
- In terms of faith – for example, the modellers’ subjective assessment of how good their model is.
In the literature, credibility is usually used in a qualitative way to talk about model bias. Hence, in the literature, model bias is roughly synonymous with inverse of credibility. However, in these terms, the models currently have a major credibility gap. For example, Ben showed the annual mean rainfall from a long simulation of CESM1, showing bias with respect to GPCP observations. These show the model struggling to capture the spatial distribution of sea surface temperature (SST), especially in equatorial regions.
Every climate model has a problem with equatorial sea surface temperatures (SST). A recent paper, Anagnostopoulos et al 2009 makes a big deal of this, and is clearly very hostile to climate modelling. They look at regional biases in temperature and precipitation, where the models are clearly not bracketing observations. I googled the Anagnostopooulos paper while Ben was talking – The first few pages of google hits are dominated by denialist website proclaiming this as a major new study demonstrating the models are poor. It’s amusing that this is treated as news, given that such weaknesses in the models are well known within the modelling community, and discussed in the IPCC report. Meanwhile the hydrologists at the workshop tell me that it’s a third-rate journal, so none of them would pay any attention to this paper.
Ben argues that these weaknesses need to be removed to increase model credibility. This argument seems a little weak to me. While improving model skill and removing biases are important goals for this community, they don’t necessarily help with model credibility in terms of using the best science (because often replacing an empirically derived parameterization with one that’s more theoretically justified will often reduce model skill). More importantly, those outside the modeling community will have their own definitions of credibility, and they’re unlikely to correspond to those used within the community. Some attention to the ways in which other stakeholders understand model credibility would be useful and interesting.
In summary, Ben identified a number of important tensions for climate modeling. For example, there are tensions between:
- the desire to measure prediction skill vs. the desire to explore the limits of predictability;
- the desire to quantify uncertainty, vs. the push for more resolution and complexity in the models;
- a priori vs. a posteriori methods of assessing model uncertainty.
- operational vs. research activities (Many modellers believe the IPCC effort is getting a little out of control – it’s a good exercise, but too demanding on resources);
- weather vs climate modelling;
- model diversity vs critical mass;
Ben urged the community to develop a baseline for climate modelling, capturing best practices for uncertainty estimation.