One of the things that strikes me about discussions of climate change, especially from those who dismiss it as relatively harmless, is a widespread lack of understanding of how non-linear systems behave. Indeed, this seems to be one of the key characteristics that separate those who are alarmed at the prospect of a warming climate from those who are not.

At the AGU meeting this month, Kerry Emanuel presented a great example of this in his talk on “Hurricanes in a Warming Climate”. I only caught his talk by chance, as I was slipping out of the session in the next room, but I’m glad I did, because he made an important point about how we think about the impacts of climate change, and in particular, showed two graphs that illustrate the point beautifully.

Kerry’s talk was an overview of a new study that estimates changes in damage from tropical cyclones with climate change, using a new integrated assessment model. The results are reported in detail in a working paper at the World Bank. The report points out that the link between hurricanes and climate change remains controversial. So, while Atlantic hurricane power has more than doubled over the last 30 years, and model forecasts show an increase in the average intensity of hurricanes in a warmer world, there is still no clear statistical evidence of a trend in damages caused by these storms, and hence a great deal of uncertainty about future trends.

The analysis is complicated by several factors:

  • Increasing insurance claims from hurricane damage in the US have a lot to do with growing economic activity in vulnerable regions. Indeed, expected economic development in the regions subject to tropical storm damage means that there are certain to be big increases in damage even if there were no warming at all.
  • The damage is determined more by when and where each storm makes landfall than it is by the intensity of the storm.
  • There simply isn’t enough data to detect trends. More than half of the economic damage due to hurricanes in the US since 1870 was caused by just 8 storms.

The new study by Emanuel and colleagues overcomes some of these difficulties by simulating large numbers of storms. They took the outputs of four different Global Climate Models, using the A1B emissions scenario, and fed them into a cyclone generator model to simulate thousands of storms, comparing the characteristics of these storms with those that have caused damage in the US in the last few decades, and then adjusting the damage estimates according to anticipated changes in population and economic activity in the areas impacted (for details, see the report).

The first thing to note is that the models forecast only a small change in hurricanes, typically a slight decrease in medium-strength storms and a slight increase in more intense storms. For example, at first sight, the MIROC model indicates almost no difference:

Probability density for storm damage on the US East Coast, generated from the MIROC model for current vs. year 2100, under the A1B scenario, for which this model forecasts a global average temperature increase of around 4.5°C. Note that the x axis is a logarithmic scale: 8 means $100 million, 9 means $1 billion, 10 means $10 billion, etc. (source: Figure 9 in Mendelsohn et al, 2011)

Note particularly that at the peak of the graph, the model shows a very slight reduction in the number of storms (consistent with a slight decrease in the overall frequency of hurricanes), while on the upper tail, the model shows a very slight increase (consistent with a forecast that there’ll be more of the most intense storms). The other three models show slightly bigger changes by the year 2100, but overall, the graphs seem very comforting. It looks like we don’t have much to worry about (at least as far as hurricane damage from climate change is concerned). Right?

The problem is that the long tail is where all the action is. The good news is that there appears to be a fundamental limit on storm intensity, so the tail doesn’t really get much longer. But the problem is that it only takes a few more of these very intense storms to make a big difference in the amount of damage caused. Here’s what you get if you multiply the probability by the damage in the above graph:

Changing risk of hurricane damage due to climate change. Calculated as probability times impact. (Source: courtesy of K. Emanuel, from his AGU 2011 talk)

That tiny change in the long tail generates a massive change in the risk, because the system is non-linear. If most of the damage is done by a few very intense storms, then you only need a few more of them to greatly increase the damage. Note, in particular, what happens at 12 on the damage scale – these are trillion-dollar storms. [Update: Kerry points out that the total hurricane damage is proportional to the area under the curves of the second graph].
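To see why such a small change in the tail matters so much, here's a minimal sketch of the probability-times-damage calculation. The log-normal shape and the particular means and spreads are my own illustrative assumptions, not numbers from the study:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative only: log10(damage) for each simulated storm, drawn from a
# normal distribution so the density curves look roughly like the figure.
# The "2100" distribution is shifted and widened only very slightly.
log_damage_now = rng.normal(loc=8.0, scale=1.0, size=1_000_000)
log_damage_2100 = rng.normal(loc=8.1, scale=1.05, size=1_000_000)

def expected_damage(log10_damage):
    """Expected damage per storm: the mean of 10**log10(damage)."""
    return np.mean(10.0 ** log10_damage)

ratio = expected_damage(log_damage_2100) / expected_damage(log_damage_now)
print(f"Increase in expected damage per storm: {ratio:.2f}x")
# The two probability density curves are nearly indistinguishable, but the
# expectation (probability x damage) is dominated by the rare storms in the
# upper tail, so even this tiny shift produces a large increase in risk.
```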

The key observation here is that the things that matter most to people (e.g. storm damage) do not change linearly as the climate changes. That’s why people who understand non-linear systems tend to worry much more about climate change than people who do not.

Here’s the call for papers for a workshop we’re organizing at ICSE next May:

The First International Workshop on Green and Sustainable Software (GREENS’2012)

(In conjunction with the 34th International Conference on Software Engineering (ICSE 2012), Zurich, Switzerland, June 2-9, 2012)

Important Dates:

  • 17th February 2012 – paper submission
  • 19th March 2012 – notification of acceptance
  • 29th March 2012 – camera-ready
  • 3rd June 2012 – workshop

Workshop theme and goals: The focus of the GREENS workshop is the engineering of green and sustainable software. Our goal is to bring together academics and practitioners to discuss research initiatives, challenges, ideas, and results in this critically important area of the software industry. To this end, GREENS will both discuss the state of the practice, especially at the industrial level, and define a roadmap, both for academic research and for technology transfer to industry. GREENS seeks contributions addressing, but not limited to, the following list of topics:

Concepts and foundations:

  • Definition of sustainability properties (e.g. energy and power consumption, greenhouse gas emissions, waste and pollutant production), their relationships, their units of measure, their measurement procedures in the context of software-intensive systems, their relationships with other properties (e.g. response time, latency, cost, maintainability);
  • Green architectural knowledge, green IT strategies and design patterns;

Greening domain-specific software systems:

  • Energy-awareness in mobile software development;
  • Mobile software systems scalability in low-power situations;
  • Energy-efficient techniques aimed at optimizing battery consumption;
  • Large and ultra-large scale green information systems design and development (including inter-organizational effects)

Greening of IT systems, data and web centers:

  • Methods and approaches to improve sustainability of existing software systems;
  • Customer co-creation strategies to motivate behavior changes;
  • Virtualization and offloading;
  • Green policies, green labels, green metrics, key indicators for sustainability and energy efficiency;
  • Data center and storage optimization;
  • Analysis, assessment, and refactoring of source code to improve energy efficiency;
  • Workload balancing;
  • Lifecycle Extension

Greening the process:

  • Methods to design and develop greener software systems;
  • Managerial and technical risks for a sustainable modernization;
  • Quality & risk assessments, tradeoff analyses between energy efficiency, sustainability and traditional quality requirements;

Case studies, industry experience reports and empirical studies:

  • Empirical data and analysis about sustainability properties, at various granularity levels: complete infrastructure, or nodes of the infrastructure (PCs, servers, and mobile devices);
  • Studies to define technical and economic models of green aspects;
  • Return on investment of greening projects, reasoning about the triple bottom line of people, planet and profits;
  • Models of energy and power consumption, at various granularity levels;
  • Benchmarking of power consumption in software applications;

Guidelines for Submission: We are soliciting papers in two distinct categories:

  1. Research papers describing innovative and significant original research in the field (maximum 8 pages);
  2. Industrial papers describing industrial experience, case studies, challenges, problems and solutions (maximum 8 pages).

Please submit your paper online through EasyChair (see the GREENS website). Submissions should be original and unpublished work. Each submitted paper will undergo a rigorous review process by three members of the Program Committee. All types of papers must conform to the ICSE submission format and guidelines. All accepted papers will appear in the ACM Digital Library.

Workshop Organizers:

  • Patricia Lago (VU University Amsterdam, The Netherlands)
  • Rick Kazman (University of Hawaii, USA)
  • Niklaus Meyer (Green IT SIG, Swiss Informatics Society, Switzerland)
  • Maurizio Morisio (Politecnico di Torino, Italy)
  • Hausi A. Mueller (University of Victoria, Canada)
  • Frances Paulisch (Siemens Corporate Technology, Germany)
  • Giuseppe Scanniello (Università della Basilicata, Italy)
  • Olaf Zimmermann (IBM Research, Zurich, Switzerland)

Program committee:

  • Marco Aiello, University of Groningen, Netherlands
  • Luca Ardito, Politecnico di Torino, Italy
  • Ioannis Athanasiadis, Democritus Univ. of Thrace, Greece
  • Rami Bahsoon, University College London, UK
  • Ivica Crnkovic, Malardalen University, Sweden
  • Steve Easterbrook, University of Toronto, Canada
  • Hakan Erdogmus, Things Software
  • Anthony Finkelstein, University College London, UK
  • Matthias Galster, University of Groningen, Netherlands
  • Ian Gorton, Pacific Northwest National Laboratory, USA
  • Qing Gu, VU University Amsterdam, Netherlands
  • Wolfgang Lohmann, Informatics and Sustainability Research, Swiss Federal Laboratories for Materials Science and Technology, Switzerland
  • Lin Liu, School of Software, Tsinghua University, China
  • Alessandro Marchetto, Fondazione Bruno Kessler, Italy
  • Henry Muccini, University of L’Aquila, Italy
  • Stefan Naumann, Trier University of Applied Sciences, Environmental Campus, Germany
  • Cesare Pautasso, University of Lugano, Switzerland
  • Barbara Pernici, Politecnico di Milano, Italy
  • Giuseppe Procaccianti, Politecnico di Torino, Italy
  • Filippo Ricca, University of Genova
  • Antony Tang, Swinburne University of Tech., Australia
  • Antonio Vetro’, Fraunhofer IESE, USA
  • Joost Visser, Software Improvement Group and Knowledge Network Green Software, Netherlands
  • Andrea Zisman, City University London, UK

A number of sessions at the AGU meeting this week discussed projects to improve climate literacy among different audiences:

  • The Climate Literacy and Energy Awareness Network (CLEANET) is developing concept maps for use in middle school and high school, along with a large set of pointers to educational resources on climate and energy for use in the classroom.
  • The Climate Literacy Zoo Education Network (CliZEN). Michael Mann (of Hockey Stick Fame) talked about this project, which was a rather nice uplifting change from hearing about his experiences with political attacks on his work. This is a pilot effort, currently involving ten zoos, mainly in the north east US. So far, they have completed a visitor survey across a network of zoos, plus some aquaria, exploring the views of visitors on climate change, using the categories of the Six Americas report. The data they have collected show that zoo visitors tend to be more skewed towards the “alarmed” category compared to the general US population. [Incidentally, I’m impressed with their sample size: 3,558 responses. The original Six Americas study only had 981, and most surveys in my field have much smaller sample sizes]. The next steps in the project are to build on this audience analysis to put together targeted information and education material that links what we know about climate change with its impact on specific animals at the zoos (especially polar animals).
  • The Climate Interpreter Project. Bill Spitzer from the New England Aquarium talked about this project. Bill points out that aquaria (and museums, zoos etc) have an important role to play, because people come for the experience, which must be enjoyable, but they do expect to learn something, and they do trust museums and zoos to provide them with accurate information. This project focusses on the role of interpreters and volunteers, who are important because they tend to be more passionate, more knowledgeable, and come into contact with many people. But many interpreters are not yet comfortable talking about issues around climate change. They need help, and training. Interpretation isn’t just transmission of information. It’s about translating science in a way that’s meaningful and resonates with an audience. It requires a systems perspective. The strategy adopted by this project is to begin with audience research, to understand people’s interests and passions; connect this with the cognitive and social sciences on how people learn, and how they make sense of what they’re hearing; and finally to make use of strategic framing, which gets away from the ‘crisis’ frame that dominates most news reporting (on crime, disasters, fires), but which tends to leave people feeling overwhelmed, and leads them to treat it as someone else’s problem. Thinking explicitly about framing allows you to connect information about climate change with people’s values, with what they’re passionate about, and even with their sense of self-identity. The website climateinterpreter.org describes what they’ve learnt so far.
    (As an aside, Bill points out that it can’t just be about training the interpreters – you need institutional support and leadership, if they are to focus on a controversial issue. Which got me thinking about why science museums tend to avoid talking much about climate change – it’s easy for the boards of directors to avoid the issue, because of worries about whether it’s politically sensitive, and hence might affect fundraising.)
  • The WorldViews Network. Rachel Connolly from Nova/WGBH presented this collaboration between museums, scientists and TV networks. Partners include planetariums and groups interested in data visualization, GIS data, and mapping, many from an astronomy background. Their 3-pronged approach, called TPACK, identifies three types of knowledge: technological, pedagogical, and content knowledge. The aim is to take people from seeing, to knowing, to doing. For example, they might start with a dome presentation, but bring into it live and interactive web resources, and then move on to community dialogues. Storylines use visualizations that move seamlessly across scales: cosmic, global, bioregional. They draw a lot on the ideas from Rockstrom’s planetary boundaries, within which they’re focussing on three: climate, biodiversity loss and ocean acidification. A recent example from Denver, in May, focussed on water. On the cosmic scale, they look at where water comes from as planets are formed. They eventually bring this down to the bioregional scale, looking at the watershed for Denver, and the pressures on the Colorado River. Good visual design is a crucial part of the project (Rachel showed a neat example of a visualization of the amount of water on the planet, comparing all water with fresh water and frozen water). Another fascinating example was a satellite picture of the border of Egypt and Israel, where the different water management strategies on either side of the border produce a starkly visible contrast. (Rachel also recommended Sciencecafes.org and the Buckminster Fuller Challenge).
  • ClimateCommunication.org. There was a lot of talk through the week about this project, led by Susan Hassol and Richard Somerville, especially their recent paper in Physics Today, which explores the use of jargon, and how it can mislead the general public. The paper went viral on the internet shortly after it was published, and they used an open Google doc to collect many more examples. Scientists are often completely unaware that non-specialists have different meanings for jargon terms, which can then become a barrier to communication. My favourite examples from Susan’s list are “aerosol”, which to the public means a spray can (leading to a quip by Glenn Beck, who had heard that aerosols cool the planet); ‘enhanced’, which the public understands as ‘made better’, so the ‘enhanced greenhouse effect’ sounds like a good thing; and ‘positive feedback’, which also sounds like a good thing, as it suggests a reward for doing something good.
  • Finally, slightly off topic, but I was amused by the Union of Concerned Scientists’ periodic table of political interferences in science.

On Thursday, Kaitlin presented her poster at the AGU meeting, which shows the results of the study she did with us in the summer. Her poster generated a lot of interest, especially the visualizations she has of the different model architectures. Click on thumbnail to see the full poster at the AGU site:

A few things to note when looking at the diagrams:

  • Each diagram shows the components of a model, scaled to their relative size by lines of code (a rough line-counting sketch follows this list). However, the models are not to scale with one another, as the smallest, UVic’s, is only a tenth of the size of the biggest, CESM. Someone asked what accounts for that difference in size. Well, the UVic model is an EMIC rather than a GCM. It has a very simplified atmosphere model that does not include atmospheric dynamics, which makes it easier to run for very long simulations (e.g. to study paleoclimate). On the other hand, CESM is a community model, with a large number of contributors across the scientific community. (See Randall and Held’s point/counterpoint article in last month’s IEEE Software for a discussion of how these fit into different model development strategies).
  • The diagrams show the couplers (in grey), again sized according to number of lines of code. A coupler handles data re-gridding (when the scientific components use different grids), temporal aggregation (when the scientific components run on different time steps) along with other data handling. These are often invisible in diagrams the scientists create of their models, because they are part of the infrastructure code; however Kaitlin’s diagrams show how substantial they are in comparison with the scientific modules. The European models all use the same coupler, following a decade-long effort to develop this as a shared code resource.
  • Note that there are many different choices associated with the use of a coupler, as sometimes it’s easier to connect components directly rather than through the coupler, and the choice may be driven by performance impact, flexibility (e.g. ‘plug-and-play’ compatibility) and legacy code issues. Sea ice presents an interesting example, because its extent varies over the course of a model run. So somewhere there must be code that keeps track of which grid cells have ice, and then routes the fluxes from ocean and atmosphere to the sea ice component for these grid cells. This could be done in the coupler, or in any of the three scientific modules. In the GFDL model, sea ice is treated as an interface to the ocean, so all atmosphere-ocean fluxes pass through it, whether there’s ice in a particular cell or not.
  • The relative size of the scientific components is a reasonable proxy for functionality (or, if you like, scientific complexity/maturity). Hence, the diagrams give clues about where each lab has placed its emphasis in terms of scientific development, whether by deliberate choice, or because of availability (or unavailability) of different areas of expertise. The differences between the models from different labs show some strikingly different choices here, for example between models that are clearly atmosphere-centric, versus models that have a more balanced set of earth system components.
  • One comment we received in discussions around the poster was about the places where we have shown sub-components in some of the models. Some modeling groups are more explicit about naming the sub-components, and indicating them in the code. Hence, our ability to identify these might be more dependent on naming practices rather than any fundamental architectural differences.
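For anyone curious how this kind of size comparison is made, here is a rough sketch of counting lines of code per top-level component directory. The directory layout and file extensions are hypothetical; the real analysis handled comments, build scripts and shared utilities more carefully:

```python
import os

def count_lines(path):
    """Count physical lines in all Fortran source files under a directory."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            if name.endswith((".f90", ".F90", ".f", ".F")):
                with open(os.path.join(root, name), errors="ignore") as f:
                    total += sum(1 for _ in f)
    return total

# Hypothetical layout of a coupled model's source tree.
components = ["atmosphere", "ocean", "sea_ice", "land", "coupler"]
sizes = {c: count_lines(os.path.join("model_src", c)) for c in components}
grand_total = sum(sizes.values()) or 1
for name, loc in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {loc:9d} lines ({100.0 * loc / grand_total:.1f}%)")
```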

I’m sure Kaitlin will blog more of her reflections on the poster (and AGU in general) once she’s back home.

I’m at the AGU meeting in San Francisco this week. The internet connections in the meeting rooms suck, so I won’t be twittering much, but will try and blog any interesting talks. But first things first! I presented my poster in the session on “Methodologies of Climate Model Evaluation, Confirmation, and Interpretation” yesterday morning. Nice to get my presentation out of the way early, so I can enjoy the rest of the conference.

Here’s my poster, and the abstract is below (click for the full sized version at the AGU ePoster site):

A Hierarchical Systems Approach to Model Validation

Introduction

Discussions of how climate models should be evaluated tend to rely on either philosophical arguments about the status of models as scientific tools, or on empirical arguments about how well runs from a given model match observational data. These lead to quantitative measures expressed in terms of model bias or forecast skill, and ensemble approaches where models are assessed according to the extent to which the ensemble brackets the observational data.

Such approaches focus the evaluation on models per se (or more specifically, on the simulation runs they produce), as if the models can be isolated from their context. Such approaches may overlook a number of important aspects of the use of climate models:

  • the process by which models are selected and configured for a given scientific question.
  • the process by which model outputs are selected, aggregated and interpreted by a community of expertise in climatology.
  • the software fidelity of the models (i.e. whether the running code is actually doing what the modellers think it’s doing).
  • the (often convoluted) history that begat a given model, along with the modelling choices long embedded in the code.
  • variability in the scientific maturity of different components within a coupled earth system model.

These omissions mean that quantitative approaches cannot assess whether a model produces the right results for the wrong reasons, or conversely, the wrong results for the right reasons (where, say the observational data is problematic, or the model is configured to be unlike the earth system for a specific reason).

Furthermore, quantitative skill scores only assess specific versions of models, configured for specific ensembles of runs; they cannot reliably make any statements about other configurations built from the same code.

Quality as Fitness for Purpose

The problem is that there is no such thing as “the model”. The body of code that constitutes a modern climate model actually represents an enormous number of possible models, each corresponding to a different way of configuring that code for a particular run. Furthermore, this body of code isn’t a static thing. The code is changed on a daily basis, through a continual process of experimentation and model improvement. This applies even to any specific “official release”, which again is just a body of code that can be configured to run as any of a huge number of different models, and again, is not unchanging – as with all software, there will be occasional bugfix releases applied to it, along with improvements to the ancillary datasets.

Evaluation of climate models should not be about “the model”, but about the relationship between a modelling system and the purposes to which it is put. More precisely, it’s about the relationship between particular ways of building and configuring models and the ways in which the runs produced by those models are used.

What are the uses of a climate model? They vary tremendously:

  • To provide inputs to assessments of the current state of climate science;
  • To explore the consequences of a current theory;
  • To test a hypothesis about the observational system (e.g. forward modeling);
  • To test a hypothesis about the calculational system (e.g. to explore known weaknesses);
  • To provide homogenized datasets (e.g. re-analysis);
  • To conduct thought experiments about different climates;
  • To act as a comparator when debugging another model;

In general, we can distinguish three separate systems: the calculational system (the model code); the theoretical system (current understandings of climate processes) and the observational system. In the most general sense, climate models are developed to explore how well our current understanding (i.e. our theories) of climate explain the available observations. And of course the inverse: what additional observations might we make to help test our theories.

We're dealing with relationships between three different systems

Validation of the Entire Modeling System

When we ask questions about likely future climate change, we don’t ask the question of the calculational system, we ask it of the theoretical system; the models are just a convenient way of probing the theory to provide answers.
When society asks climate scientists for future projections, the question is directed at climate scientists, not their models. Modellers apply their judgment to select appropriate versions & configurations of the models to use, set up the runs, and interpret the results in the light of what is known about the models’ strengths and weaknesses and about any gaps between the computational models and the current theoretical understanding. And they add all sorts of caveats to the conclusions they draw from the model runs when they present their results.

Validation is not a post-hoc process to be applied to an individual “finished” model, to ensure it meets some criteria for fidelity to the real world. In reality, there is no such thing as a finished model, just many different snapshots of a large set of model configurations, steadily evolving as the science progresses. Knowing something about the fidelity of a given model configuration to the real world is useful, but not sufficient to address fitness for purpose. For this, we have to assess the extent to which climate models match our current theories, and the extent to which the process of improving the models keeps up with theoretical advances.

Summary

Our approach to model validation extends current approaches:

  • down into the detailed codebase to explore the processes by which the code is built and tested. Thus, we build up a picture of the day-to-day practices by which modellers make small changes to the model and test the effect of such changes (both in isolated sections of code, and on the climatology of a full model). The extent to which these practices improve the confidence and understanding of the model depends on how systematically this testing process is applied, and how many of the broad range of possible types of testing are applied. We also look beyond testing to other software practices that improve trust in the code, including automated checking for conservation of mass across the coupled system (a sketch of such a check follows this list), and various approaches to spin-up and restart testing.
  • up into the broader scientific context in which models are selected and used to explore theories and test hypotheses. Thus, we examine how features of the entire scientific enterprise improve (or impede) model validity, from the collection of observational data, creation of theories, use of these theories to develop models, choices for which model and which model configuration to use, choices for how to set up the runs, and interpretation of the results. We also look at how model inter-comparison projects provide a de facto benchmarking process, leading in turn to exchanges of ideas between modelling labs, and hence advances in the scientific maturity of the models.
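As an illustration of the kind of automated conservation check mentioned in the first bullet above, here is a minimal sketch; the field names, units and tolerance are hypothetical rather than taken from any particular model:

```python
import numpy as np

def global_total(flux, cell_area):
    """Area-weighted global total of a 2-D flux field (e.g. kg m-2 s-1 -> kg s-1)."""
    return float(np.sum(flux * cell_area))

def check_mass_conservation(flux_sent, flux_received, cell_area, rel_tol=1e-9):
    """Check that the mass leaving one component matches the mass arriving in
    another (e.g. across the atmosphere-ocean interface), within a tolerance."""
    sent = global_total(flux_sent, cell_area)
    received = global_total(flux_received, cell_area)
    rel_err = abs(sent - received) / max(abs(sent), 1e-30)
    if rel_err > rel_tol:
        raise AssertionError(
            f"Mass not conserved across coupler: relative error {rel_err:.2e}")
```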

This layered approach does not attempt to quantify model validity, but it can provide a systematic account of how the detailed practices involved in the development and use of climate models contribute to the quality of modelling systems and the scientific enterprise that they support. By making the relationships between these practices and model quality more explicit, we expect to identify specific strengths and weaknesses of the modelling systems, particularly with respect to structural uncertainty in the models, and better characterize the “unknown unknowns”.

I’ve spent much of the last month preparing a major research proposal for the Ontario Research Fund (ORF), entitled “Integrated Decision Support for Sustainable Communities”. We’ve assembled a great research team, with professors from a number of different departments, across the schools of engineering, information, architecture, and arts and science. We’ve held meetings with a number of industrial companies involved in software for data analytics and 3D modeling, consultancy companies involved in urban planning and design, and people from both provincial and city government. We started putting this together in September, and were working to a proposal deadline at the end of January.

And then this week, out of the blue, the province announced that it was cancelling the funding program entirely, “in light of current fiscal challenges”. The best bit in the letter I received was:

The work being done by researchers in this province is recognized and valued. This announcement is not a reflection of the government’s continued commitment through other programs that provides support to the important work being done by researchers.

I’ve searched hard for the “other programs” they mention, but there don’t appear to be any. It’s increasingly hard to get any funding for research, especially trans-disciplinary research. Here’s the abstract from our proposal:

Our goal is to establish Ontario as a world leader in building sustainable communities, through the use of data analytics tools that provide decision-makers with a more complete understanding of how cities work. We will bring together existing expertise in data integration, systems analysis, modeling, and visualization to address the information needs of citizens and policy-makers who must come together to re-invent towns and cities as the basis for a liveable, resilient, carbon-neutral society. The program integrates the work of a team of world-class researchers, and builds on the advantages Ontario enjoys as an early adopter of smart grid technologies and open data initiatives.

The long-term sustainability of Ontario’s quality of life and economic prosperity depends on our ability to adopt new, transformative approaches to urban design and energy management. The transition to clean energy and the renewal of urban infrastructure must go hand-in-hand, to deliver improvements across a wide range of indicators, including design quality, innovation, lifestyle, transportation, energy efficiency and social justice. Design, planning and decision-making must incorporate a systems-of-systems view, to encompass the many processes that shape modern cities, and the complex interactions between them.

Our research program integrates emerging techniques in five theme areas that bridge the gap between decision-making processes for building sustainable cities and the vast sources of data on social demographics, energy, buildings, transport, food, water and waste:

  • Decision-Support and Public Engagement: We begin by analyzing the needs of different participants, and develop strategies for active engagement;
  • Visualization: We will create collaborative and immersive visualizations to enhance participatory decision-making;
  • Modelling and Simulation: We will develop a model integration framework to bring together models of different systems that define the spatio-temporal and socio-economic dynamics of cities, to drive our visualizations;
  • Data Privacy: We will assess the threats to privacy of all citizens that arise when detailed data about everyday activities is mined for patterns and identify appropriate techniques for protecting privacy when such data is used in the modeling and analysis process;
  • Data Integration and Management: We will identify access paths to the data sources needed to drive our simulations and visualizations, and incorporate techniques for managing and combining very large datasets.

These themes combine to provide an integrated approach to intelligent, data-driven planning and decision-making. We will apply the technologies we develop in a series of community-based design case studies, chosen to demonstrate how our approach would apply to increasingly complex problems such as energy efficiency, urban intensification, and transportation. Our goal is to show how an integrated approach can improve the quality and openness of the decision-making process, while taking into account the needs of diverse stakeholders, and the inter-dependencies between policy, governance, finance and sustainability in city planning.

Because urban regions throughout the world face many of the same challenges, this research will allow Ontario to develop a technological advantage in areas such as energy management and urban change, enabling a new set of creative knowledge-based services to address the needs of communities and governments. Ontario is well placed to develop this as a competitive advantage, due to its leadership in the collection and maintenance of large datasets in areas such as energy management, social well-being, and urban infrastructure. We will leverage this investment and create a world-class capability not available in any other jurisdiction.

Incidentally, we spent much of last fall preparing a similar proposal for the previous funding round. That was rejected on the basis that we weren’t clear enough what the project outcomes would be, and what the pathways to commercialization were. For our second crack at it, we were planning to focus much more specifically on the model integration part, by developing a software framework for coupling urban system models, based on a detailed requirements analysis of the stakeholders involved in urban design and planning, with case studies on neighbourhood re-design and building energy retro-fits. Our industrial partners have identified a number of routes to commercial services that would make use of such software. Everything was coming together beautifully. *Sigh*.

Now we have to find some other source of funding for this. Contributions welcome!

I went to a talk yesterday by Mark Pagani (Yale University), on the role of methane hydrates in the Paleocene-Eocene Thermal Maximum (PETM). The talk was focussed on how to explain the dramatic warming seen at the end of the Paleocene, 56 million years ago. During the Paleocene, the world was already much warmer than it is today (by around 5°C), and had been ice free for millions of years. But at the end of the Paleocene, the temperature shot up by at least another 5°C, over the course of a few thousand years, giving us a world with palm trees and crocodiles in the arctic, and this “thermal maximum” lasted around 100,000 years. The era brought a dramatic reduction in animal body size (although note: the dinosaurs had already been wiped out at the beginning of the Paleocene), and saw the emergence of small mammals.

But what explains the dramatic warming? The story is fascinating, involving many different lines of evidence, and I doubt I can do it justice without a lot more background reading. I’ll do a brief summary here, as I want to go on to talk about something that came up in the questions about climate sensitivity.

First, we know that the warming at the PETM coincided with a massive influx of carbon, and the fossil record shows a significant shift in carbon isotopes, so it was a new and different source of carbon. The resulting increase in CO2 warmed the planet in the way we would expect. But where did the carbon come from? The dominant hypothesis has been that it came from a sudden melting of undersea methane hydrates, triggered by tectonic shifts. But Mark explained that this hypothesis doesn’t add up, because there isn’t enough carbon to account for the observed shift in carbon isotopes, and it also requires a very high value for climate sensitivity (in the range 9-11°C), which is inconsistent with the IPCC estimates of 2-4.5°C. Some have argued this is evidence that climate sensitivity really is much higher, or perhaps that our models are missing some significant amplifiers of warming (see for instance, the 2008 paper by Zeebe et al., which caused a ruckus in the media). But, as Mark pointed out, this really misses the key point. If the numbers are inconsistent with all the other evidence about climate sensitivity, then it’s more likely that the methane hydrates hypothesis itself is wrong. Mark’s preferred explanation is a melting of the antarctic permafrost, caused by a shift in orbital cycles, and indeed he demonstrates that the orbital pattern leads to similar spikes (of decreasing amplitude) throughout the Eocene. Prior to the PETM, Antarctica would have been ice free for so long that a substantial permafrost would have built up, and even conservative estimates based on today’s permafrost in the sub-arctic regions would have enough carbon to explain the observed changes. (Mark has a paper on this coming out soon).

That was very interesting, but for me the most interesting part was in the discussion at the end of the talk. Mark had used the term “earth system sensitivity” instead of “climate sensitivity”, and Dick Peltier suggested he should explain the distinction for the benefit of the audience.

Mark began by pointing out that the real scientific debate about climate change (after you discount the crazies) is around the actual value of climate sensitivity, which is shorthand for the relationship between changes in atmospheric concentrations of CO2 and the resulting change in global temperature:

Key relationships in the climate system. Adapted from a flickr image by ClimateSafety (click image for the original)

The term climate sensitivity was popularized in 1979 by the Charney report, and refers to the eventual temperature response to a doubling of CO2 concentrations, taking into account fast feedbacks such as water vapour, but not the slow feedbacks such as geological changes. Charney sensitivity also assumes everything else about the earth system (e.g. ice sheets, vegetation, ocean biogeochemistry, atmospheric chemistry, aerosols, etc) is held constant. The reason the definition refers to warming per doubling of CO2 is because the radiative effect of CO2 is roughly logarithmic, so you get about the same warming each time you double atmospheric concentrations. Charney calculated climate sensitivity to be 3°C (±1.5), a value that was first worked out in the 1950s, and hasn’t really changed, despite decades of research since then. Note: equilibrium climate sensitivity is also not the same as the transient response.
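As a worked example of that logarithmic relationship, here’s a tiny sketch using the canonical 3°C-per-doubling Charney value (the formula is the standard approximation; the concentrations below are just illustrative):

```python
import math

def equilibrium_warming(co2_ppm, baseline_ppm=280.0, sensitivity=3.0):
    """Equilibrium warming (deg C) assuming warming scales with log2(CO2/baseline)."""
    return sensitivity * math.log2(co2_ppm / baseline_ppm)

for ppm in (280, 390, 560, 1120):
    print(f"{ppm:5d} ppm -> {equilibrium_warming(ppm):+.1f} C")
# 560 ppm (one doubling) gives +3.0 C; 1120 ppm (two doublings) gives +6.0 C.
```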

Earth System Sensitivity is then the expected change in global temperature in response to a doubling of CO2 when we do take into account all the other aspects of the earth system. This is much harder to estimate, because there is a lot more uncertainty around different kinds of interactions in the earth system. However, many scientists expect it to be higher than the Charney sensitivity, because, on balance, most of the known earth system feedbacks are positive (i.e. they amplify the basic greenhouse gas warming).

Mark put it this way: Earth System Sensitivity is like an accordion. It stretches out or contracts, depending on the current state of the earth system. For example, if you melt the arctic sea ice, this causes an amplifying feedback because white ice has a higher albedo than the dark sea water that replaces it. So if there’s a lot of ice to melt, it would increase earth system sensitivity. But if you’ve already melted all the sea ice, the effect is gone. Similarly, if the warming leads to a massive drying out and burning of vegetation, that’s another temporary amplification that will cease once you’ve burned off most of the forests. If you start the doubling in a warmer world, in which these feedbacks are no longer available, earth system sensitivity might be lower.

The key point is that, unlike Charney sensitivity, earth system sensitivity depends on where you start from. In the case of the PETM, the starting point for the sudden warming was a world that was already ice free. So we shouldn’t expect the earth system sensitivity to be the same as it is in the 21st century. Which certainly complicates the job of comparing climate changes in the distant past with those of today.

But, more relevantly for current thinking about climate policy, thinking in terms of Charney sensitivity is likely to be misleading. If earth system sensitivity is significantly bigger in today’s earth system, which seems likely, then calculations of expected warming based on Charney sensitivity will underestimate the warming, and hence underestimate the size of the necessary policy responses.

I’ll be giving a talk to the Toronto section of the IEEE Systems Council on December 1st, in which I plan to draw together several of the ideas I’ve been writing about recently on systems thinking and leverage points, and apply them to the problem of planetary boundaries. Come and join in the discussion if you’re around:

Who’s flying this ship? Systems Engineering for Planet Earth

Thurs, Dec 1, 2011, 12:00 p.m. – 1:00 p.m, Ryerson University (details and free registration here)

At the beginning of this month, the human population reached 7 billion people. The impact of humanity on the planet is vast: we use nearly 40% of the earth’s land surface to grow food, we’re driving other species to extinction at a rate not seen since the last ice age, and we’ve altered the planet’s energy balance by changing the atmosphere. In short, we’ve entered a new geological age, the Anthropocene, in which our collective actions will dramatically alter the inhabitability of the planet. We face an urgent task: we have to learn how to manage the earth as a giant system of systems, before we do irreparable damage. In this talk, I will describe some of the key systems that are relevant to this task, including climate change, agriculture, trade, energy production, and the global financial system. I will explore some of the interactions between these systems, and characterize the feedback cycles that alter their dynamics and affect their stability. This will lead us to an initial attempt to identify planetary boundaries for some of these systems, which together define a safe operating space for humanity. I will end the talk by offering a framework for thinking about the leverage points that may allow us to manage these systems to keep them within the safe operating limits.

I had several interesting conversations at WCRP11 last week about how different the various climate models are. The question is important because it gives some insight into how much an ensemble of different models captures the uncertainty in climate projections. Several speakers at WCRP suggested we need an international effort to build a new, best of breed climate model. For example, Christian Jakob argued that we need a “Manhattan project” to build a new, more modern climate model, rather than continuing to evolve our old ones (I’ve argued in the past that this is not a viable approach). There have also been calls for a new international climate modeling centre, with the resources to build much larger supercomputing facilities.

The counter-argument is that the current diversity in models is important, and re-allocating resources to a single centre would remove this benefit. Currently around 20 or so different labs around the world build their own climate models to participate in the model inter-comparison projects that form a key input to the IPCC assessments. Part of the argument for this diversity of models is that when different models give similar results, that boosts our confidence in those results, and when they give different results, the comparisons provide insights into how well we currently understand and can simulate the climate system. For assessment purposes, the spread of the models is often taken as a proxy for uncertainty, in the absence of any other way of calculating error bars for model projections.

But that raises a number of questions. How well do the current set of coupled climate models capture the uncertainty? How different are the models really? Do they all share similar biases? And can we characterize how model intercomparisons feed back into progress in improving the models? I think we’re starting to get interesting answers to the first two of these questions, while the last two are, I think, still unanswered.

First, then, is the question of representing uncertainty. There are, of course, a number of sources of uncertainty. [Note that ‘uncertainty’ here doesn’t mean ‘ignorance’ (a mistake often made by non-scientists); it means, roughly, how big should the error bars be when we make a forecast, or more usefully, what does the probability distribution look like for different climate outcomes?]. In climate projections, sources of uncertainty can be grouped into three types:

  • Internal variability: natural fluctuations in the climate (for example, the year-to-year differences caused by the El Niño Southern Oscillation, ENSO);
  • Scenario uncertainty: the uncertainty over future carbon emissions, land use changes, and other types of anthropogenic forcings. As we really don’t know how these will change year-by-year in the future (irrespective of whether any explicit policy targets are set), it’s hard to say exactly how much climate change we should expect.
  • Model uncertainty: the range of different responses to the same emissions scenario given by different models. Such differences arise, presumably, because we don’t understand all the relevant processes in the climate system perfectly. This is the kind of uncertainty that a large ensemble of different models ought to be able to assess.

Hawkins and Sutton analyzed the impact of these different types of uncertainty on projections of global temperature over the range of a century. Here, Fractional Uncertainty means the ratio of the model spread to the projected temperature change (against a 1971-2000 mean):

This analysis shows that for short term (decadal) projections, the internal variability is significant. Finding ways of reducing this (for example by better model initialization from the current state of the climate) is important for the kind of near-term regional projections needed by, for example, city planners, utilities and insurance companies. Hawkins & Sutton indicate with dashed lines some potential to reduce this uncertainty for decadal projections through better initialization of the models.

For longer term (century) projections, internal variability is dwarfed by scenario uncertainty. However, if we’re clear about the nature of the scenarios used, we can put scenario uncertainty aside and treat model runs as “what-if” explorations – if the emissions follow a particular pathway over the 21st Century, what climate response might we expect?

Model uncertainty remains significant over both short and long term projections. The important question here for predicting climate change is how much of this range of different model responses captures the real uncertainties in the science itself. In the analysis above, the variability due to model differences is about 1/4 of the magnitude of the mean temperature rise projected for the end of the century. For example, if a given emissions scenario leads to a model mean of +4°C, the model spread would be about 1°C, yielding a projection of +4±0.5°C. So is that the right size for an error bar on our end-of-century temperature projections? Or, to turn the question around, what is the probability of a surprise – where the climate change turns out to fall outside the range represented by the current model ensemble?
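Here’s a minimal sketch of that fractional-uncertainty arithmetic, using the illustrative +4°C mean and 1°C spread from the paragraph above (Hawkins & Sutton compute this over time from the ensemble itself, but the idea is the same):

```python
def fractional_uncertainty(model_spread, projected_change):
    """Ratio of the ensemble spread to the projected change."""
    return model_spread / projected_change

mean_warming = 4.0  # deg C: ensemble-mean warming for a given emissions scenario
spread = 1.0        # deg C: range of responses across the model ensemble

frac = fractional_uncertainty(spread, mean_warming)
print(f"Fractional uncertainty: {frac:.2f}")                      # 0.25
print(f"Projection: +{mean_warming:.1f} +/- {spread / 2:.1f} C")  # +4.0 +/- 0.5 C
```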

Just as importantly, is the model ensemble mean the most likely outcome? Or do the models share certain biases so that the truth is somewhere other than the multi-model mean? Last year, James Annan demolished the idea that the models cluster around the truth, and in a paper with Julia Hargreaves, provides some evidence that the model ensembles do a relatively good job of bracketing the observational data, and, if anything, the ensemble spread is too broad. If the latter point is correct, then the model ensembles over-estimate the uncertainty.

This brings me to the question of how different the models really are. Over the summer, Kaitlin Alexander worked with me to explore the software architecture of some of the models that I’ve worked with from Europe and N. America. The first thing that jumped out at me when she showed me her diagrams was how different the models all look from one another. Here are six of them presented side-by-side. The coloured ovals indicate the size (in lines of code) of each major model component (relative to other components in the same model; the different models are not shown to scale), and the coloured arrows indicate data exchanges between the major components (see Kaitlin’s post for more details):

There are clearly differences in how the components are coupled together (for example, whether all data exchanges pass through a coupler, or whether components interact directly). In some cases, major subcomponents are embedded as subroutines within a model component, which makes the architecture harder to understand, but may make sense from a scientific point of view, when earth system processes themselves are tightly coupled. However, such differences in the code might just be superficial, as the choice of call structure should not, in principle, affect the climatology.

The other significant difference is in the relative sizes of the major components. Lines of code isn’t necessarily a reliable measure, but it usually offers a reasonable proxy for the amount of functionality. So a model with an atmosphere model dramatically bigger than the other components indicates a model for which far more work (and hence far more science) has gone into modeling the atmosphere than the other components.

Compare, for example, the relative sizes of the atmosphere and ocean components for HadGEM3 and IPSLCM5A, which, incidentally, both use the same ocean model, NEMO. HadGEM3 has a much bigger atmosphere model, representing more science, or at least many more options for different configurations. In part, this is because the UK Met Office is an operational weather forecasting centre, and the code base is shared between NWP and climate research. Daily use of this model for weather forecasting offers many opportunities to improve the skill of the model (although improvement in skill in short term weather forecasting doesn’t necessarily imply improvements in skill for climate simulations). However, the atmosphere model is the biggest beneficiary of this process, and, in fact, the UK Met Office does not have much expertise in ocean modeling. In contrast, the IPSL model is the result of a collaboration between several similarly sized research groups, representing different earth subsystems.

But do these architectural differences show up as scientific differences? I think they do, but was finding this hard to analyze. Then I had a fascinating conversation at WCRP last week with Reto Knutti, who showed me a recent paper that he published with D. Masson, in which they analyzed model similarity across the CMIP3 dataset. The paper describes a cluster analysis over all the CMIP3 models (plus three re-analysis datasets, to represent observations), based on how well they capture the full spatial field for temperature (on the left) and precipitation (on the right). The cluster diagrams look like this (click for bigger):

In these diagrams, the models from the same lab are coloured the same. Observational data are in pale blue (three observational datasets were included for temperature, and two for precipitation). Some obvious things jump out: the different observational datasets are more similar to each other than they are to any other model, but as a cluster, they don’t look any different from the models. Interestingly, models from the same lab tend to be more similar to one another, even when these span different model generations. For example, for temperature, the UK Met Office models HadCM3 and HadGEM1 are more like each other than they are like any other models, even though they run at very different resolutions, and have different ocean models. For precipitation, all the GISS models cluster together and are quite different from all the other models.

The overall conclusion from this analysis is that using models from just one lab (even in very different configurations, and across model generations) gives you a lot less variability than using models from different labs. Which does suggest that there’s something in the architectural choices made at each lab that leads to a difference in the climatology. In the paper, Masson & Knutti go on to analyze perturbed physics ensembles, and show that the same effect shows up here too. Taking a single model, and systematically varying the parameters used in the model physics still gives you less variability than using models from different labs.
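To make the clustering method concrete, here is a small sketch of hierarchical clustering over flattened spatial fields, in the spirit of Masson & Knutti’s analysis. The synthetic fields, the shared “lab bias”, and the choice of Euclidean distance with average linkage are my assumptions for illustration, not their exact procedure:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_points = 500  # a flattened lat-lon temperature field, stand-in for real data

# Two hypothetical labs; models from the same lab share a common bias pattern.
lab_bias = {"labA": rng.normal(0, 1, n_points), "labB": rng.normal(0, 1, n_points)}
fields = {f"{lab}_{ver}": bias + rng.normal(0, 0.3, n_points)
          for lab, bias in lab_bias.items() for ver in ("v1", "v2")}
fields["obs"] = rng.normal(0, 1, n_points)  # observations as "just another member"

names = list(fields)
data = np.array([fields[n] for n in names])

# Pairwise distances between fields, then average-linkage hierarchical clustering.
tree = linkage(pdist(data, metric="euclidean"), method="average")
labels = fcluster(tree, t=3, criterion="maxclust")
for name, cluster_id in zip(names, labels):
    print(f"{name:8s} -> cluster {cluster_id}")
# Models from the same (synthetic) lab end up in the same cluster, mirroring
# the finding that same-lab models, even across versions, are more alike than
# models from different labs.
```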

There’s another followup question that I would like to analyze: do models that share major components tend to cluster together? There’s a growing tendency for a given component (e.g. an ocean model, an atmosphere model) to show up in more than one lab’s GCM. It’s not yet clear how this affects variability in a multi-model ensemble.

So what are the lessons here? First, there is evidence that the use of multi-model ensembles is valuable and important, and that these ensembles capture the uncertainty much better than multiple runs of a single model (no matter how it is perturbed). The evidence suggests that models from different labs are significantly different from one another both scientifically and structurally, and at least part of the explanation for this is that labs tend to have different clusters of expertise across the full range of earth system processes. Studies that compare model results with observational data (e.g. Hargreaves & Annan; Masson & Knutti) show that the observations look no different from just another member of the multi-model ensemble (or to put it in Annan and Hargreaves’ terms, the truth is statistically indistinguishable from another model in the ensemble).

It would appear that the current arrangement of twenty or so different labs competing to build their own models is a remarkably robust approach to capturing the full range of scientific uncertainty with respect to climate processes. And hence it doesn’t make sense to attempt to consolidate this effort into one international lab.

One of the questions I’ve been chatting to people about at the WCRP Open Science Conference this week is whether climate modelling needs to be reorganized as an operational service, rather than as a scientific activity. The two respond to quite different goals, and hence would be organized very differently:

  • An operational modelling centre would prioritize stability and robustness of the code base, and focus on supporting the needs of (non-scientist) end-users who want models and model results.
  • A scientific modelling centre focusses on supporting scientists themselves as users. The key priority here is to support the scientists’ need to get their latest ideas into the code, to run experiments and get data ready to support publication of new results. (This is what most climate modeling centres do right now).

Both need good software practices, but those practices would look very different in the case when the scientists are building code for their own experiments, versus serving the needs of other communities. There are also very different resource implications: an operational centre that serves the needs of a much more diverse set of stakeholders would need a much larger engineering support team in relation to the scientific team.

The question seems very relevant to the conference this week, as one of the running themes has been the question of what “climate services” might look like. Many of the speakers call for “actionable science”, and there has been a lot of discussion of how scientists should work with various communities who need knowledge about climate to inform their decision-making.

And there's clearly a gap here, with lots of criticism of how it works at the moment. For example, here's a great quote from Bruce Hewitson on the current state of climate information:

“A proliferation of portals and data sets, developed with mixed motivations, with poorly articulated uncertainties and weakly explained assumptions and dependencies, the data implied as information, displayed through confusing materials, hard to find or access, written in opaque language, and communicated by interface organizations only semi‐aware of the nuances, to a user community poorly equipped to understand the information limitations”

I can't argue with any of that. But does solving this problem require a reconceptualization of climate modeling activities, to make them much more like operational weather forecasting centres?

Most of the people I spoke to this week think that’s the wrong paradigm. In weather forecasting, the numerical models play a central role, and become the workhorse for service provision. The models are run every day, to supply all sorts of different types of forecasts to a variety of stakeholders. Sure, a weather forecasting service also needs to provide expertise to interpret model runs (and of course, also needs a vast data collection infrastructure to feed the models with observations). But in all of this, the models are absolutely central.

In contrast, for climate services, the models are unlikely to play such a central role. Take for example, the century-long runs, such as those used in the IPCC assessments. One might think that these model runs represent an “operational service” provided to the IPCC as an external customer. But this is a fundamentally mistaken view of what the IPCC is and what it does. The IPCC is really just the scientific community itself, reviewing and assessing the current state of the science. The CMIP5 model runs currently being done in preparation for the next IPCC assessment report, AR5, are conducted by, and for, the science community itself. Hence, these runs have to come from science labs working at the cutting edge of earth system modelling. An operational centre one step removed from the leading science would not be able to provide what the IPCC needs.

One can criticize the IPCC for not doing enough to translate the scientific knowledge into something that's "actionable" for different communities that need such knowledge. But that criticism isn't really about the modeling effort (e.g. the CMIP5 runs) that contributes to the Working Group 1 reports. It's about how the implications of the Working Group 1 findings translate into useful information in Working Groups 2 and 3.

The stakeholders who need climate services won't be interested in century-long runs. At most, they're interested in decadal forecasts (a task that is itself still in its infancy, and a long way from being ready for operational forecasting). More often, they will want help interpreting observational data and trends, and assessing impacts on health, infrastructure, ecosystems, agriculture, water, etc. While such services might make use of data from climate model runs, they generally won't involve running models regularly in an operational mode. Instead, the needs are more focussed on downscaling the outputs from existing model run datasets. And sitting somewhere between current weather forecasting and long term climate projections is the need for seasonal forecasts and regional analysis of trends, attribution of extreme events, and so on.

So I don't think it makes sense for climate modelling labs to move towards an operational modelling capability. Climate modeling centres will continue to focus primarily on developing models for use within the scientific community itself. Organizations that provide climate services might need to develop their own modelling capability, focussed more on high resolution, short term (decadal or shorter) regional modelling, and of course, on assessment models that explore the interaction of socio-economic factors and policy choices. Such assessment models would make use of basic climate data from global climate models (for example, calculations of climate sensitivity, and spatial distributions of temperature change), but wouldn't connect directly with climate modeling.

This week, I presented our poster on Benchmarking and Assessment of Homogenisation Algorithms for the International Surface Temperature Initiative (ISTI) at the WCRP Open Science Conference (click on the poster for a readable version).

This work is part of the International Surface Temperature Initiative (ISTI) that I blogged about last year. The intent is to create a new open access database for historical surface temperature records at a much higher resolution than has previously been available. In the past, only monthly averages were widely available; daily and sub-daily observations collected by meteorological services around the world are often considered commercially valuable, and hence tend to be hard to obtain. And if you go back far enough, much of the data was never digitized and some is held in deteriorating archives.

The goal of the benchmarking part of the project is to assess the effectiveness of the tools used to remove data errors from the raw temperature records. My interest in this part of the project stems from the work that my student, Susan Sim, did a few years ago on the role of benchmarking to advance research in software engineering. Susan’s PhD thesis described a theory that explains why benchmarking efforts tend to accelerate progress within a research community. The main idea is that creating a benchmark brings the community together to build consensus on what the key research problem is, what sample tasks are appropriate to show progress, and what metrics should be used to measure that progress. The benchmark then embodies this consensus, allowing different research groups to do detailed comparisons of their techniques, and facilitating sharing of approaches that work well.

Of course, it's not all roses. Developing a benchmark in the first place is hard, and requires participation from across the community; a benchmark put forward by a single research group is unlikely to be accepted as unbiased by other groups. This also means that a research community has to be sufficiently mature in terms of its collaborative relationships and consensus on common research problems (in Kuhnian terms, it must be in the normal science phase). Also, note that a benchmark is anchored to a particular stage of the research, as it captures problems that are currently challenging; continued use of a benchmark after a few years can lead to a degeneration of the research, with groups over-fitting to the benchmark, rather than moving on to harder challenges. Hence, it's important to retire a benchmark every few years and replace it with a new one.

The benchmarks we’re exploring for the ISTI project are intended to evaluate homogenization algorithms. These algorithms detect and remove artifacts in the data that are due to things that have nothing to do with climate – for example when instruments designed to collect short-term weather data don’t give consistent results over the long-term record. The technical term for these is inhomogeneities, but I’ll try to avoid the word, not least because I find it hard to say. I’d like to call them anomalies, but that word is already used in this field to mean differences in temperature due to climate change. Which means that anomalies and inhomogeneities are, in some ways, opposites: anomalies are the long term warming signal that we’re trying to assess, and inhomogeneities represent data noise that we have to get rid of first. I think I’ll just call them bad data.

Bad data arise for a number of reasons, usually isolated to changes at individual recording stations: a change of instruments, an instrument drifting out of calibration, a re-siting, or a slow encroachment of urbanization which changes the local micro-climate. Because these problems tend to be localized, they can often be detected by statistical algorithms that compare individual stations with their neighbours. In essence, the algorithms look for step changes and spurious trends in the data.
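To give a feel for how such a check might work, here's a toy sketch (not any of the operational homogenization algorithms): subtract the average of the neighbouring stations from the candidate station, then scan for the split point that maximises the shift in the mean of that difference series. The station data are synthetic, and the test statistic is deliberately simple.

```python
# Toy neighbour-comparison breakpoint detector (illustrative only).
import numpy as np

def find_step_change(candidate, neighbours, margin=12):
    """Return the index of the strongest mean shift in (candidate - neighbour mean)."""
    diff = candidate - neighbours.mean(axis=0)
    best_idx, best_score = None, 0.0
    for k in range(margin, len(diff) - margin):
        left, right = diff[:k], diff[k:]
        pooled_sd = np.sqrt((left.var() + right.var()) / 2.0) + 1e-9
        score = abs(left.mean() - right.mean()) / pooled_sd
        if score > best_score:
            best_idx, best_score = k, score
    return best_idx, best_score

# Synthetic monthly series: a shared regional signal, plus a 1-degree jump
# introduced into the candidate station at month 120.
rng = np.random.default_rng(1)
signal = rng.normal(size=240)
neighbours = signal + rng.normal(scale=0.3, size=(5, 240))
candidate = signal + rng.normal(scale=0.3, size=240)
candidate[120:] += 1.0

print(find_step_change(candidate, neighbours))  # should report a break near index 120
```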

These bad data are a serious problem in climate science – for a recent example, see the post yesterday at RealClimate, which discusses how homogenization algorithms might have gotten in the way of understanding the relationship between climate change and the Russian heatwave of 2010. Unhelpfully, they’re also used by deniers to beat up climate scientists, as some people latched onto the idea of blaming warming trends on bad data rather than, say, actual warming. Of course, this ignores two facts: (1) climate scientists already spend a lot of time assessing and removing such bad data and (2) independent analysis has repeatedly shown that the global warming signal is robust with respect to such data problems.

However, such problems in the data still matter for the detailed regional assessments that we’ll need in the near future for identifying vulnerabilities (e.g. to extreme weather), and, as the example at RealClimate shows, for attribution studies for localized weather events and hence for decision-making on local and regional adaptation to climate change.

The challenge is that it's hard to test how well homogenization algorithms work, because we don't have access to the truth – the actual temperatures that the observational records should have recorded. The ISTI benchmarking project aims to fill this gap by creating a data set that has been seeded with artificial errors. The approach reminds me of the software engineering technique of bug seeding (aka mutation testing), which deliberately introduces errors into software to assess how good the test suite is at detecting them.

The first challenge is where to get a "clean" temperature record to start with, because the assessment is much easier if the only bad data in the sample are the ones we deliberately seeded. The technique we're exploring is to start with the output of a Global Climate Model (GCM), which is probably the closest we can get to a globally consistent temperature record. The GCM output is on a regular grid, and may not always match the observational temperature record in terms of means and variances. So to make it as realistic as possible, we have to downscale the gridded data to yield a set of "station records" that match the locations of real observational stations, and adjust the means and variances to match the real-world station records.
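As a concrete (and heavily simplified) illustration of that last step, here's what a mean-and-variance adjustment might look like, assuming we already have a GCM-derived series interpolated to a station location and some target statistics taken from the real station record; the numbers below are made up.

```python
# Sketch of the mean/variance adjustment step (illustrative values only).
import numpy as np

def match_mean_and_variance(gcm_series, target_mean, target_std):
    """Rescale a GCM-derived series so its mean and standard deviation
    match those estimated from the real station record."""
    standardized = (gcm_series - gcm_series.mean()) / gcm_series.std()
    return standardized * target_std + target_mean

rng = np.random.default_rng(2)
gcm_series = rng.normal(loc=14.0, scale=0.8, size=1200)  # placeholder GCM output
adjusted = match_mean_and_variance(gcm_series, target_mean=11.3, target_std=1.6)
print(round(adjusted.mean(), 2), round(adjusted.std(), 2))  # 11.3 1.6
```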

Then we inject the errors. Of course, the error profile we use is based on what we currently know about typical kinds of bad data in surface temperature records. It’s always possible there are other types of error in the raw data that we don’t yet know about; that’s one of the reasons for planning to retire the benchmark periodically and replace it with a new one – it allows new findings about error profiles to be incorporated.
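Here's a simplified sketch of what the error-injection step could look like: add an abrupt step change (e.g. an undocumented station move) and a slow drift (e.g. an instrument going out of calibration) to a clean "station" series. The error types and magnitudes here are purely illustrative; the real benchmark will use an error profile built from what's known about surface temperature records.

```python
# Sketch of seeding a clean series with two illustrative kinds of bad data.
import numpy as np

def inject_errors(clean, rng):
    corrupted = clean.copy()
    # Abrupt step change, e.g. an undocumented instrument change or re-siting.
    break_idx = rng.integers(60, len(clean) - 60)
    corrupted[break_idx:] += rng.normal(loc=0.0, scale=0.8)
    # Gradual drift, e.g. an instrument slowly going out of calibration.
    drift_start = rng.integers(60, len(clean) - 60)
    drift = np.linspace(0.0, rng.uniform(0.2, 1.0), len(clean) - drift_start)
    corrupted[drift_start:] += drift
    return corrupted, break_idx, drift_start

rng = np.random.default_rng(3)
clean = rng.normal(loc=12.0, scale=1.0, size=600)
corrupted, break_idx, drift_start = inject_errors(clean, rng)
print(break_idx, drift_start)  # positions of the seeded errors
```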

Once the benchmark is created, it will be used within the community to assess different homogenization algorithms. Initially, the actual injected error profile will be kept secret, to ensure the assessment is honest. Towards the end of the 3-year benchmarking cycle, we will release the details about the injected errors, to allow different research groups to measure how well they did. Details of the results will then be included in the ISTI dataset for any data products that use the homogenization algorithms, so that users of these data products have more accurate estimates of uncertainty in the temperature record. Such estimates are important, because use of the processed data without a quantification of uncertainty can lead to misleading or incorrect research.
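Once the error profile is released, scoring could be as simple as comparing each homogenized series against the clean series it was derived from. The root-mean-square metric below is just one obvious candidate, shown for illustration; the actual assessment metrics are for the working group to define.

```python
# Sketch of one possible assessment metric once the seeded errors are revealed.
import numpy as np

def recovery_error(series, clean):
    """Root-mean-square difference between a series and the clean truth."""
    return float(np.sqrt(np.mean((series - clean) ** 2)))

rng = np.random.default_rng(4)
clean = rng.normal(loc=12.0, scale=1.0, size=600)
corrupted = clean.copy()
corrupted[300:] += 0.8        # the seeded step change
homogenised = corrupted.copy()
homogenised[300:] -= 0.7      # an imperfect correction by some algorithm

print(recovery_error(corrupted, clean))    # error before homogenization
print(recovery_error(homogenised, clean))  # smaller error after homogenization
```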

For more details of the project, see the Benchmarking and Assessment Working Group website, and the group blog.

How would you like to help the weather and climate research community digitize historical records before they’re lost forever to a fate such as this:

Watch this video from the International Surface Temperature Initiative's data rescue initiative for more background (skip to around 2:20 for the interesting parts):

…and then get involved with the Data Rescue at Home Projects:

Our special issue of IEEE Software, for Nov/Dec 2011, is out! The title for the issue is Climate Change: Science and Software, and the guest editors were me, Paul Edwards, Balaji, and Reinhard Budich.

There’s a great editorial by Forrest Shull, reflecting on interviews he conducted with Robert Jacob at Argonne National Labs and Gavin Schmidt at NASA GISS. The papers in the issue are:

Unfortunately most of the content is behind a paywall, although you can read our guest editors' introduction in full here. I'm working on making some of the other content more freely available too.

This is really last week’s news, but I practice slow science. A new web magazine has launched: Planet3.org.

The aim is to cover more in-depth analysis of climate change & sustainability, to get away from the usual false dichotomy between “deniers” and “activists”, and more into the question of what kind of future we’d like, and how we get there. Constructive, science-based discussions are welcome, and will be moderated to ensure the discussion threads are worth reading. The site also features a “best of the blogs” feed, and we’re experimenting with models for open peer-review for the more in-depth articles. And as an experimental collaborative project, there’s an ongoing discussion on how to build a community portal.

I’m proud to serve on the scientific review panel, and delighted that my essay on leverage points has been the main featured article this week.

Go check out the site and join in the discussions!