Gavin beat me to posting the best quote from the CCSM workshop last week – the Uncertainty Prayer. Uncertainty cropped up as a theme throughout the workshop. In discussions about the IPCC process, one issue came up several times: the likelihood that the spread of model projections in the next IPCC assessment will be larger than in AR4. The models are significantly more complex than they were five years ago, incorporating a broader set of earth system phenomena and resolving finer grain processes. The uncertainties in a more complex earth system model have a tendency to multiply, leading to a broader spread.
There is a big concern here about how to communicate this. Does this mean the science is going backwards – that we know less now than we did five years ago (imagine the sort of hay that some of the crazier parts of the blogosphere will make of that)? Well, there has been all sorts of progress in the past five years, much of it to do with understanding the uncertainties. And one result is the realization that the previous generations of models have under-represented uncertainty in the physical climate system – i.e. the previous projections for future climate change were more precise than they should have been. The implications are very serious for policymaking, not because there is any weaker case now for action, but precisely the opposite – the case for urgent action is stronger because the risks are worse, and good policy must be based on sound risk assessment. A bigger model spread means there’s now a bigger risk of more extreme climate responses to anthropogenic emissions. This problem was discussed at a fascinating session at the AGU meeting last year on validating model uncertainty (See: “How good are predictions from climate models?”).
At the CCSM meeting last week, Julia Slingo, chief scientist at the UK Met Office put the problem of dealing with uncertainty into context, by reviewing the current state of the art in short and long term forecasting, in a fascinating talk “Uncertainty in Weather and Climate Prediction”.
She began with the work of Ed Lorenz. The Lorenz attractor is the prototype chaotic model. A chaotic system is not random, and the non-linear equations of a chaotic system demonstrate some very interesting behaviours. If it’s not random, then it must be predictable, but this predictability is flow dependent – where you are in the attractor will determine where you will go, but some starting points lead to a much more tightly constrained set of behaviours than others. Hence, the spread of possible outcomes depends on the initial state, and some states have more predictable outcomes than others.
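To make the idea of flow-dependent predictability concrete, here is a minimal sketch (not from Julia's talk) that runs a small ensemble of the Lorenz (1963) system from two different points on the attractor and reports how far the members have drifted apart. The classic parameter values are used; the ensemble size, perturbation size, forecast length and the choice of the two starting points are all assumptions made for illustration.

```python
import random

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # classic Lorenz (1963) parameters


def tendency(state):
    """Right-hand side of the Lorenz-63 equations."""
    x, y, z = state
    return (SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = tendency(state)
    k2 = tendency(shifted(k1, dt / 2))
    k3 = tendency(shifted(k2, dt / 2))
    k4 = tendency(shifted(k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(state, steps, dt=0.01):
    """Run the model forward for `steps` time steps."""
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state


def ensemble_spread(start, members=20, noise=1e-3, steps=300, seed=0):
    """Standard deviation of x across members after `steps` time steps.

    Each member starts from `start` plus a small Gaussian perturbation,
    standing in for initial-condition uncertainty (illustrative values).
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(members):
        perturbed = tuple(v + rng.gauss(0.0, noise) for v in start)
        finals.append(integrate(perturbed, steps)[0])
    mean = sum(finals) / members
    return (sum((f - mean) ** 2 for f in finals) / members) ** 0.5


if __name__ == "__main__":
    # Spin up onto the attractor, then pick two states along the trajectory.
    state_a = integrate((1.0, 1.0, 1.0), 2000)
    state_b = integrate(state_a, 137)
    print("spread from state A:", ensemble_spread(state_a))
    print("spread from state B:", ensemble_spread(state_b))
```

Comparing the two printed spreads for a few different starting points gives a feel for how much the predictability can vary around the attractor.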
Why stochastic forecasting is better than deterministic forecasting
Much of the challenge in weather forecasting is to sample the initial condition uncertainty. Rather than using a single (deterministic) forecast run, modern weather forecasting makes use of ensemble forecasts, which probe the space of possible outcomes from a given (uncertain) initial state. This then allows the forecasters to assess possible outcomes, estimate risks and possibilities, and communicate risks to the users. Note the phrase “to allow the forecasters to…” – the role of experts in interpreting the forecasts and explaining the risks is vital.
As an example, Julia showed two temperature forecasts for London, using initial conditions for 26 June on two consecutive years, 1994 and 1995. The red curves show the individual members of an ensemble forecast. The ensemble spread is very different in each case, demonstrating that some initial conditions are more predictable than others: one has very high spread of model forecasts, and the other doesn’t (although note that in both cases the actual observations lie within the forecast spread):
Ensemble forecasts for two different initial states
The problem is that in ensemble forecasting, the root mean squared (rms) error of the ensemble mean often grows faster than the spread, which indicates that the forecast is under-dispersive; in other words, the models don’t capture enough of the internal variability in the system. In such cases, improving the models (by eliminating modeling errors) will lead to increased internal variability, and hence larger ensemble spread.
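As a rough illustration of that diagnostic, the sketch below compares the rms ensemble spread with the rms error of the ensemble mean over a set of forecast cases. The function names and the tiny hand-made dataset are mine, not from the talk; operational verification is considerably more careful (for example about finite ensemble size and observation error).

```python
from math import sqrt


def spread_and_rmse(forecasts, observations):
    """Compare ensemble spread with the error of the ensemble mean.

    forecasts: list of cases, each a list of ensemble member values.
    observations: one verifying value per case.
    Returns (rms spread, rms error of the ensemble mean).
    """
    var_sum = 0.0
    err_sum = 0.0
    for members, obs in zip(forecasts, observations):
        n = len(members)
        mean = sum(members) / n
        # Sample variance of the members about their own mean.
        var_sum += sum((m - mean) ** 2 for m in members) / (n - 1)
        # Squared error of the ensemble mean against the observation.
        err_sum += (mean - obs) ** 2
    cases = len(observations)
    return sqrt(var_sum / cases), sqrt(err_sum / cases)


def is_under_dispersive(forecasts, observations):
    """True when the ensemble-mean error exceeds the ensemble spread."""
    spread, rmse = spread_and_rmse(forecasts, observations)
    return rmse > spread


if __name__ == "__main__":
    # Hypothetical toy data: two forecast cases with three members each.
    forecasts = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    observations = [5.0, 2.0]
    print(spread_and_rmse(forecasts, observations))
    print(is_under_dispersive(forecasts, observations))
```

In a well-calibrated ensemble the two numbers should be roughly equal on average, so a persistent gap between them is the warning sign.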
One response to this problem is the work on stochastic parameterizations. Essentially, this introduces noise into the model to simulate variability in the sub-grid processes. This can then reduce the systematic model error if it better captures the chaotic behaviour of the system. Julia mentioned three schemes that have been explored for doing this:
- Random Parameters (RP), in which some of the tunable model parameters are varied randomly. This approach is not very convincing as it indicates we don’t really know what’s going on in the model.
- Stochastic Convective Vorticity (SCV)
- Stochastic Kinetic Energy Backscatter (SKEB)
The latter two approaches tackle known weaknesses in the models, at the boundaries between resolved physical processes and sub-scale parameterizations. There is plenty of recent evidence that energy cascades upscale from the unresolved scales, and that the parameterizations don’t capture this. For example, in the backscatter scheme, some fraction of dissipated energy is scattered upscale and acts as a forcing for the resolved-scale flow. By including this in the ensemble prediction system, the forecast is no longer under-dispersive.
The other major approach is to increase the resolution of the model. Higher-resolution models will explicitly resolve more of the moist processes at the sub-kilometer scale, and (presumably) remove this source of model error, although it’s not yet clear how successful this will be.
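The simplest of the three schemes, Random Parameters, is easy to sketch in a toy setting. In the example below a one-line chaotic model (the logistic map) stands in for the forecast model, which is purely my assumption for illustration; the parameter range and ensemble size are also made up. Every member starts from exactly the same state, so any spread comes from the stochastic scheme alone.

```python
import random


def run_member(x0, steps, rng=None, r=3.9, r_range=(3.8, 4.0)):
    """Iterate the logistic map x -> r * x * (1 - x).

    With rng=None the parameter r stays fixed (deterministic model).
    With an rng, r is redrawn at every step from r_range, mimicking a
    Random Parameters style scheme (illustrative values only).
    """
    x = x0
    for _ in range(steps):
        r_now = rng.uniform(*r_range) if rng is not None else r
        x = r_now * x * (1.0 - x)
    return x


def spread(values):
    """Standard deviation of the ensemble members."""
    mean = sum(values) / len(values)
    return (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5


def ensemble(x0, members, steps, stochastic, seed=0):
    """Run all members from the same initial state x0."""
    rng = random.Random(seed) if stochastic else None
    return [run_member(x0, steps, rng) for _ in range(members)]


if __name__ == "__main__":
    # Identical initial state for every member: only the scheme adds spread.
    print("deterministic:", spread(ensemble(0.3, 20, 10, stochastic=False)))
    print("random params:", spread(ensemble(0.3, 20, 10, stochastic=True)))
```

The SCV and SKEB schemes are physically motivated and much more involved; this sketch only shows the general mechanism by which injected noise widens an ensemble.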
But what about seasonal forecasting – surely this growth of uncertainty prevents any kind of forecasting? People frequently ask “If we can’t predict weather beyond the next week, why is it possible to make seasonal forecasts?” The reason is that for longer term forecasts, the boundary forcings start to matter more. For example, if you add a boundary forcing to the Lorenz attractor, it changes the time in which the system stays in some part of the attractor, without changing the overall behaviour of the chaotic system. For a weak forcing, the frequency of occurrence of different regimes is changed, but the number and spatial patterns are unchanged. Under strong forcing, even the patterns of regimes are modified as the system goes through bifurcation points. So if we know something about the forcing, we can forecast the general statistics of weather, even if it’s not possible to say what the weather will be at a particular location at a particular time.
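The forced Lorenz experiment is simple enough to try at home. The sketch below adds a constant term to the x and y tendencies of the Lorenz (1963) system and measures the fraction of time the trajectory spends in the x > 0 lobe of the attractor, as a crude proxy for regime frequency. The form of the forcing, its magnitude, the starting state and the run length are assumptions I have chosen for illustration, not values from the talk.

```python
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # classic Lorenz (1963) parameters


def tendency(state, forcing):
    """Lorenz-63 tendencies plus a constant forcing (fx, fy) on x and y."""
    x, y, z = state
    fx, fy = forcing
    return (SIGMA * (y - x) + fx, x * (RHO - z) - y + fy, x * y - BETA * z)


def rk4_step(state, forcing, dt):
    """Advance the forced system by one fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = tendency(state, forcing)
    k2 = tendency(shifted(k1, dt / 2), forcing)
    k3 = tendency(shifted(k2, dt / 2), forcing)
    k4 = tendency(shifted(k3, dt), forcing)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def regime_fraction(forcing=(0.0, 0.0), steps=20000, spinup=1000, dt=0.01):
    """Fraction of time spent in the x > 0 lobe of the attractor."""
    state = (1.0, 1.0, 20.0)
    in_positive_lobe = 0
    for i in range(spinup + steps):
        state = rk4_step(state, forcing, dt)
        # Only count time steps after the spin-up period.
        if i >= spinup and state[0] > 0.0:
            in_positive_lobe += 1
    return in_positive_lobe / steps


if __name__ == "__main__":
    for f in (0.0, 1.0, 2.0):
        print(f"forcing {f}: fraction of time with x > 0 =",
              regime_fraction((f, f)))
```

The individual trajectory remains unpredictable in detail, but the occupancy statistic is exactly the kind of quantity a seasonal forecast is trying to pin down.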
Of course, there’s still a communication problem: people feel weather, not the statistics of climate.
Building on the early work of Charney and Shukla (e.g. see their 1981 paper on monsoon predictability), seasonal to decadal prediction using coupled atmosphere-ocean systems does work, whereas 20 years ago, we would never have believed it. But again, we get the problem that some parts of the behaviour space are easier to predict than others. For example, the onset of El Niño is much harder to predict than the decay.
In a fully coupled system, systematic and model-specific errors grow much more strongly. Because the errors can grow quickly, and bias the probability distribution of outcomes, seasonal and decadal forecasts may not be reliable. So we assess reliability of a given model using hindcasts. Every time you change the model, you have to redo the hindcasts to check reliability. This gives a reasonable sanity check for seasonal forecasting, but for decadal prediction it is challenging, as we have a very limited observational base.
And now, we have another problem: climate change is reducing the suitability of observations from the recent past to validate the models, even for seasonal prediction:
Climate Change shifts the climatology, so that models tuned to 20th century climate might no longer give good forecasts
Hence, a 40-year hindcast set might no longer be useful for validating future forecasts. As an example, the UK Met Office got into trouble for failing to predict the cold winter in the UK for 2009-2010. Re-analysis of the forecasts indicates why: models that are calibrated on a 40-year hindcast gave only a 20% probability of a cold winter (and this was what was used for the seasonal forecast last year). However, models that are calibrated on just the past 20 years gave a 45% probability, which indicates that the past 40 years might no longer be a good indicator of future seasonal weather. Climate change makes seasonal forecasting harder!
Today, the state-of-the-art for longer term forecasts is multi-model ensembles, but it’s not clear this is really the best approach; it just happens to be where we are today. Multi-model ensembles have a number of strengths: each model is extensively tested by its own community, and a large pool of alternative components provides some sampling across structural assumptions. But they are still an ensemble of opportunity – they do not systematically sample uncertainties. Also the set is rather small – e.g. 21 different models. So the sample is too small for determining the distribution of possible changes, and the ensembles are especially weak for predicting extreme events.
There has been a major effort on quantifying uncertainty over the last few years at the Hadley Centre, using a perturbed physics ensemble. This allows for a larger sample: 100s (or even 10,000s in climateprediction.net) of variants of the same model. The poorly constrained model parameters are systematically perturbed, within expert-suggested ranges. But this still doesn’t sample the structural uncertainty in the models, because all the variants are from a single base model. As an example of this work, the UKCP09 project was an attempt to move from uncertainty ranges (as in AR4) to a probability density function (pdf) for likely change. UKCP uses over 400 model projections to compute the pdf. Although there are many problems with UKCP (see the AGU discussion for a critique), it was a step forward in understanding how to quantify uncertainty. [Note: Julia acknowledged weaknesses in both CP.net and the UKCP projects, but pointed out that they are mainly interesting as examples of how forecasting methodology is changing]
Another approach is to show which factors tend to dominate the uncertainty. For example, a pie chart showing the impact of different sources of uncertainty (model weaknesses, carbon cycle, natural variability, downscaling uncertainty) on the forecast for rainfall in the 2020s vs the 2080s is interesting – for the 2020s, the uncertainty about the carbon cycle is a relatively small factor, whereas for the 2080s it’s a much bigger factor.
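The basic recipe of a perturbed physics ensemble can be sketched with a toy model. Below, a zero-dimensional energy balance model (equilibrium warming = forcing / feedback parameter) stands in for the climate model, and the feedback parameter is drawn from a range standing in for the expert-suggested one. The toy model, the parameter range and the uniform sampling are all my assumptions for illustration; none of this is how UKCP09 or climateprediction.net actually work.

```python
import math
import random

F_2XCO2 = 3.7  # W m^-2, approximate radiative forcing for doubled CO2


def toy_warming(feedback):
    """Equilibrium warming (K) of a zero-dimensional energy balance model."""
    return F_2XCO2 / feedback


def perturbed_physics_ensemble(n, feedback_range=(0.8, 2.0), seed=0):
    """Run n variants of the same toy model.

    Each variant draws its feedback parameter (W m^-2 K^-1) uniformly from
    a hypothetical expert range. Returns the sorted warming values.
    """
    rng = random.Random(seed)
    return sorted(toy_warming(rng.uniform(*feedback_range)) for _ in range(n))


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list (0 < p <= 100)."""
    rank = max(1, math.ceil(len(sorted_values) * p / 100))
    return sorted_values[rank - 1]


if __name__ == "__main__":
    ens = perturbed_physics_ensemble(400)
    for p in (10, 50, 90):
        print(f"{p}th percentile of warming: {percentile(ens, p):.2f} K")
```

Reporting percentiles of the ensemble rather than just its minimum and maximum is the step from an uncertainty range towards a distribution, although every variant here still shares the same structural assumptions.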
Julia suggests it’s time for a coordinated study of the effects of model resolution on uncertainty. Every modeling group is looking at this, but they are not doing standardized experiments, so comparisons are hard.
Here is an example from Tim Palmer. In AR4, WG1 chapter 11 gave an assessment of regional patterns of change in precipitation. For some regions, it was impossible to give a prediction (the white areas), whereas for others, the models appear to give highly confident predictions. But the confidence might be misplaced because many of the models have known weaknesses that are relevant to future precipitation. For example, the models don’t simulate persistent blocking anticyclones very well. Which means that it’s wrong to assume that if most models agree, we can be confident in the prediction. For example, the Athena experiments with very high resolution models (T1259) showed much better blocking behaviour against the observational dataset ERA40. This implies we need to be more careful about selecting models for a multi-model ensemble for certain types of forecast.
The real butterfly effect raises some fundamental unanswered questions about convergence of climate simulations with increasing resolution. Maybe there is an irreducible level of uncertainty in climate change. And if so, what is it? How much will increased resolution reduce the uncertainty? Will things be much better when we can resolve processes at 20km, 2km, or even 0.2km, compared to say 200km? Once we reach a certain resolution (e.g. 20km) is it just as good to represent small scale motions using stochastic equations? And what’s the most effective way to use the available computing resources as we increase the resolution? [There's an obvious trade-off between increasing the size of the ensemble, and increasing the resolution of individual ensemble members]
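A back-of-the-envelope calculation shows how sharp that trade-off is. The sketch below assumes that the cost of a run grows with the cube of the refinement factor (two horizontal dimensions plus a proportionally shorter time step); that exponent is a common rule of thumb that I am assuming here, and real models will differ. The budget is measured in units of one 200km run.

```python
def relative_cost(grid_km, reference_km=200.0, exponent=3):
    """Cost of one run relative to a run at reference_km.

    Assumes cost scales as (reference_km / grid_km) ** exponent, which is
    a rule-of-thumb assumption rather than a measured figure.
    """
    return (reference_km / grid_km) ** exponent


def affordable_members(budget, grid_km):
    """Whole ensemble members that fit within `budget`.

    The budget is expressed in units of one 200km run.
    """
    return int(budget // relative_cost(grid_km))


if __name__ == "__main__":
    budget = 1000  # hypothetical: enough compute for 1000 runs at 200km
    for grid_km in (200.0, 100.0, 20.0, 2.0):
        print(f"{grid_km:>6} km: cost x{relative_cost(grid_km):,.0f},"
              f" members affordable: {affordable_members(budget, grid_km)}")
```

Under these assumptions, the compute that buys a thousand-member ensemble at 200km buys a single run at 20km, which is why the choice between more members and finer grids is such a live question.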
Julia’s main conclusion is that Lorenz’ theory of chaotic systems now pervades all aspects of weather and climate prediction. Estimating and reducing uncertainty requires better multi-scale physics, higher resolution models, and more complete observations.
Some of the questions after the talk probed these issues a little more. For example, Julia was asked how to handle policymakers demanding better decadal prediction, when we’re not ready to deliver it. Her response was that she believes higher resolution modeling will help, but that we haven’t proved this yet, so we have to manage expectations very carefully. She was also asked about the criteria to use for including different models in an ensemble – e.g. should we exclude models that don’t conserve physical quantities, that don’t do blocking, etc? For UKCP09, the criteria were global in nature, but this isn’t sufficient – we need criteria that test for skill with specific phenomena such as El Niño. Because the inclusion criteria aren’t clear enough yet, the UKCP project couldn’t give advice on wind in the projections. In the long run, the focus should be on building the best model we can, rather than putting effort into exploring perturbed physics, but we have to balance the needs of users for better probabilistic predictions against the need to get on and develop better physics in the models.
Finally, on the question of interpretation, Julia was asked what if users (of the forecasts) can’t understand or process probabilistic forecasts? Julia pointed out that some users can process probabilistic forecasts, and indeed that’s exactly what they need: the insurance industry, for example. Others use them as input for risk assessment – e.g. water utilities. So we do have to distinguish the needs of different types of users.