I wasn’t going to post anything about the CRU emails story (apart from my attempt at humour), because I think it’s a non-story. I’ve read a few of the emails, and it looks no different to the studies I’ve done of how climate science works. It’s messy. It involves lots of observational data sets, many of which contain errors that have to be corrected. Luckily, we have a process for that – it’s called science. It’s carried out by a very large number of people, any of whom might make mistakes. But it’s self-correcting, because they all review each other’s work, and one of the best ways to get noticed as a scientist is to identify and correct a problem in someone else’s work. There is, of course, a weakness in the way such corrections to a previously published paper are usually handled. The corrections appear as letters and other papers in various journals, and don’t connect up well to the original paper. Which means you have to know an area well to understand which papers have stood the test of time and which have been discredited. Outsiders to a particular subfield won’t be able to tell the difference. They’ll also have a tendency to seize on what they think are errors, but which have actually already been addressed in the literature.

If you want all the gory details about the CRU emails, read the RealClimate posts (here and here) and the lengthy discussion threads that follow from them. Many of the scientists involved in the CRU emails comment in these threads, and the resulting picture of the complexity of the data and analysis that they have to deal with is very interesting. But of course, none of this will change anyone’s mind. If you’re convinced global warming is a huge conspiracy, you’ll just see scientists trying to circle the wagons and spin the story their way. If you’re convinced that AGW is now accepted as scientific fact, then you’ll see scientists tearing their hair out at the ignorance of their detractors. If you’re not sure about this global warming stuff, you’ll see a lively debate about the details of the science, none of which makes much sense on its own. Don’t venture in without a good map and compass. (Update: I’ve no idea who’s running this site, but it’s a brilliant deconstruction of the allegations about the CRU emails.)

But one issue has arisen in many discussions about the CRU emails that touches strongly on the research we’re doing in our group. Many people have looked at the emails and concluded that if the CRU had been fully open with its data and code right from the start, there would be no issue now. This is, of course, a central question in Alicia’s research on open science. While in principle open science is a great idea, in practice there are many hurdles: the fear of being “scooped”, the need to give appropriate credit, the problems of metadata definition and of data provenance, the cost of curation, the fact that software has a very short shelf-life, and so on. For the CRU dataset at the centre of the current kerfuffle, someone would have to go back to all the data sources, and re-negotiate agreements about how the data can be used. Of course, the anti-science crowd just think that’s an excuse.

However, for climate scientists there is another problem, which is the workload involved in being open. Gavin, at RealClimate, raises this issue in response to a comment that just putting the model code online is not sufficient. For example, if someone wanted to reproduce a graph that appears in a published paper, they’d need much more: the script files, the data sets, the parameter settings, the spinup files, and so on. Michael Tobis argues that much of this can be solved with good tools and lots of automated scripts. Which would be fine if we were talking about how to help other scientists replicate the results.

Unfortunately, in the special case of climate science, that’s not what we’re talking about. A significant factor in the reluctance of climate scientists to release code and data is to protect themselves from denial-of-service attacks. There is a very well-funded and PR-savvy campaign to discredit climate science. Most scientists just don’t understand how to respond to this. Firing off hundreds of requests to CRU to release data under the Freedom of Information Act, despite each such request being denied for good legal reasons, is the equivalent of frivolous lawsuits. But even worse, once datasets and codes are released, it is very easy for an anti-science campaign to tie the scientists up in knots trying to respond to their attempts to poke holes in the data. If the denialists were engaged in an honest attempt to push the science ahead, this would be fine (although many scientists would still get frustrated – they are human too).

But in reality, the denialists don’t care about the science at all; their aim is a PR campaign to sow doubt in the minds of the general public. In the process, they effect a denial-of-service attack on the scientists – the scientists can’t get on with doing their science because their time is taken up responding to frivolous queries (and criticisms) about specific features of the data. And their failure to respond to each and every such query will be trumpeted as an admission that an alleged error is indeed an error. In such an environment, it is perfectly rational not to release data and code – it’s better to pull up the drawbridge and get on with the drudgery of real science in private. That way the only attacks are complaints about lack of openness. Such complaints are bothersome, but much better than the alternative.

In this case, because the science is vitally important for all of us, it’s actually in the public interest that climate scientists be allowed to withhold their data. Which is really a tragic state of affairs. The forces of anti-science have a lot to answer for.

Update: Robert Grumbine has a superb post this morning on why openness and reproducibility are intrinsically hard in this field.

25 November 2009 · Categories: humour

(with apologies to Maurice Sendak)

The night Mike wore his lab coat,
and made scientific discoveries of one kind and another,
the denialists called him a fraudster
and Mike said: “I’ll prove you wrong!”
So they sent his emails to the media without any context.
That very night on his blog,
a jungle of obfuscating comments grew,
and grew,
until the discussions became enflamed,
and spilled out into the internet all around.
And an ocean of journalists tumbled by,
with a high public profile for Mike,
and he argued away through night and day,
and in and out of weeks,
and almost over a year,
everywhere the denialists are.
And whenever he came to a place where the denialists are,
they talked their terrible talking points,
and quoted their terrible quotes,
and showed their terrible cherry picking,
and demonstrated their terrible ignorance.
Until Mike said “Be still!”
and tamed them with his Nature trick
of showing them actual data sets without blinking once.
And they were baffled, and called him the biggest denialist of all,
and declared him king of the denialists.
“And now”, said Mike, “let the data speak for itself.”
And he sent the denialists off to look at the evidence without their talking points.
Then Mike, the king of all denialists said,
“I’m lonely”,
and wanted to be where someone actually appreciated rational discussion.
Then all around, from far away, across the world
He saw evidence of good solid scientific work.
So he said “I’ll give up arguing with the denialists”
But the denialists cried
“Oh please don’t stop – we’ll eat you up, we need you so”.
And Mike said “No!”
And the denialists talked their terrible talking points,
and quoted their terrible quotes,
and showed their terrible cherry picking,
and demonstrated their terrible ignorance.
But Mike stepped back into his research, and waved goodbye.
And worked on, almost over a year and in and out of weeks and through a day,
in the sanity of his very own lab.
Where he found his peer-reviewed papers waiting for him.
And they were all accepted.

(PS, if you’ve no idea what this is referring to, trust me, you’re better off not knowing)

Update: I should have added a link to an even better humorous response! And this one too!

Brad points out that much of my discussion for a research agenda in climate change informatics focusses heavily on strategies for emissions reduction (aka Mitigation) and neglects the equally important topic of ensuring communities can survive the climate changes that are inevitable (aka Adaptation). Which is an important point. When I talk about the goal of keeping temperatures to below a 2°C rise, it’s equally important to acknowledge that we’ve almost certainly already lost any chance of keeping peak temperature rise much below 2°C.

Which means, of course, that we have some serious work to do, in understanding the impact of climate change on existing infrastructure, and to integrate an awareness of the likely climate change issues into new planning and construction projects. This is, of course, what Brad’s Adaptation and Impacts research division focusses on. There are some huge challenges to do with how we take the data we have (e.g. see the datasets in the CCCSN), downscale these to provide more localized forecasts, and then figure out how to incorporate these into decision making.

One existing tool to point out is the World Bank’s ADAPT, which is intended to help analyze projects in the planning stage, and identify risks related to climate change adaptation. This is quite a different decision-making task from the emissions reduction decision tools I’ve been looking at. But just as important.

Yesterday, I posted that the total budget of fossil fuel emissions we can ever emit is 1 trillion tonnes of Carbon. And that we’ve burnt through about half of that since the dawn of industrialization. Today, I read in the Guardian that existing oil reserves may have been deliberately overestimated by the International Energy Agency. George Monbiot explains how frightening this could be, given the likely impact of lower oil supplies on food production. Madeleine Bunting equates the reluctance to discuss this with the head-in-the-sand attitude that preceded last year’s financial crisis. Looks like the more pessimistic of the peak oil folks may have had it right all along.

None of these articles however makes the link to climate change (Monbiot only mentions it in passing in response to comments). So, which problem is bigger, peak oil or climate change? Does one cancel out the other? Should I stop worrying about the trillionth tonne, if the oil simply doesn’t exist to get there?

A back of the envelope calculation tells me that more than half of the world’s estimated remaining reserves of fossil fuels have to stay buried in the ground if we are to stay within a trillion tonnes. Here are the numbers:

  • Oil: The Energy Watch Group estimates there are 854 Gb (gigabarrels) of oil left, while official industry figures put it at well over 1,200 Gb. Let’s split the difference and say 1,000 Gb (1×10^12 barrels). Jim Bliss calculates that each barrel of crude oil releases about 100kg of carbon. That gives us 0.1 trillion tonnes of Carbon from oil.
  • Coal: Wikipedia tells us there are just under 1 trillion tonnes of proved recoverable coal reserves, and that coal has a carbon intensity of about 0.8, so that gives us 0.8 trillion tonnes of Carbon from coal.
  • Natural Gas: The US EIA gives the world’s natural gas reserves as somewhat over 6,000 trillion cubic feet, which converts to about 170 trillion cubic metres. Each cubic metre gives about 0.5kg of Carbon, so we have 85 trillion kg, or about 0.08 trillion tonnes of Carbon from gas.

That all adds up to about 1 trillion tonnes of carbon from estimated fossil fuel reserves, the vast majority of which is coal. If we want a 50:50 chance of staying below 2°C temperature rise, we can only burn half this much over the next few centuries. If we want better odds, say a 1-in-4 chance of exceeding 2°C, we can only burn a quarter of it.
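For anyone who wants to check my arithmetic, here’s a quick script that repeats the back-of-the-envelope calculation. The inputs are just the rough estimates quoted above (each with large uncertainties), so treat the output as a ballpark figure only:

```python
# Back-of-the-envelope carbon content of estimated fossil fuel reserves.
# All inputs are the rough estimates quoted in the post, not authoritative data.

BARRELS_OIL = 1.0e12          # ~1,000 Gb of oil (split-the-difference guess)
KG_CARBON_PER_BARREL = 100    # approx. carbon released per barrel of crude

TONNES_COAL = 1.0e12          # ~1 trillion tonnes proved recoverable coal
CARBON_FRACTION_COAL = 0.8    # approx. carbon intensity of coal

M3_GAS = 170e12               # ~170 trillion cubic metres of natural gas
KG_CARBON_PER_M3 = 0.5        # approx. carbon per cubic metre

oil_tC = BARRELS_OIL * KG_CARBON_PER_BARREL / 1000   # kg -> tonnes
coal_tC = TONNES_COAL * CARBON_FRACTION_COAL
gas_tC = M3_GAS * KG_CARBON_PER_M3 / 1000            # kg -> tonnes

print(f"Oil:   {oil_tC / 1e12:.3f} trillion tonnes C")
print(f"Coal:  {coal_tC / 1e12:.3f} trillion tonnes C")
print(f"Gas:   {gas_tC / 1e12:.3f} trillion tonnes C")
print(f"Total: {(oil_tC + coal_tC + gas_tC) / 1e12:.3f} trillion tonnes C")
```

Nudging any of these inputs within their (large) uncertainty ranges doesn’t change the headline: the total comes out at roughly a trillion tonnes, and it’s dominated by coal.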

Conclusion: More than one half of all remaining fossil fuel reserves must remain unused. So peak oil and peak coal won’t save us. I would even go so far as to say that the peak oil folks are only about half as worried as they should be!

Our paper, Engineering the Software for Understanding Climate Change finally appeared today in IEEE Computing in Science and Engineering. The rest of the issue looks interesting too – a special issue on software engineering in computational science. Kudos to Greg and Andy for pulling it together.

Update: As the final paper is behind a paywall, folks might find this draft version useful. The final published version was edited for journal house style, and shortened to fit page constraints. Needless to say, I prefer my original draft…

I posted a few times already about Allen et al.’s paper on the Trillionth Tonne, ever since I saw Chris Jones present it at the EGU meeting in April. Basically, the work gets to the heart of the global challenge. If we want to hold temperatures below a 2°C rise, the key factor is not how much we burn in fossil fuels each year, but the cumulative emissions over centuries (because once we release carbon that was buried under the ground, it tends to stay in the carbon cycle for centuries).

Allen et al. did a probabilistic analysis, and found that cumulative emissions of about 1 trillion tonnes of carbon give us a most likely peak temperature rise of 2°C (with a 90% confidence interval of 1.3 – 3.9°C). We’ve burnt about half of this total since the beginning of the industrial revolution, so basically, we mustn’t burn more than another 1/2 trillion tonnes. We’ll burn through that in less than 30 years at current emissions growth rates. Clearly, we can’t keep burning fossil fuels at the current rate and then just stop on a dime when we get to a trillion tonnes. We have to follow a reduction curve that gets us reducing emissions steadily over the next 50-60 years, until we get to zero net emissions. (One implication of this analysis is that a large amount of existing oil and coal reserves have to stay buried in the ground, which will be hard to ensure given how much money there is to be made in digging it up and selling it).
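To make the “less than 30 years” point concrete, here’s a toy calculation of how long a remaining budget of half a trillion tonnes lasts if emissions keep growing. The starting rate and growth rate are my own illustrative assumptions, not numbers from Allen et al.:

```python
# Toy calculation: years until cumulative emissions exhaust the remaining budget.
# Starting rate and growth rate are illustrative assumptions, not figures
# from Allen et al.

REMAINING_BUDGET_GTC = 500.0   # roughly half a trillion tonnes of carbon left
emissions_gtc_per_yr = 10.0    # assumed current emissions (fossil fuels + land use)
growth_rate = 0.03             # assumed 3% annual growth in emissions

cumulative = 0.0
years = 0
while cumulative < REMAINING_BUDGET_GTC:
    cumulative += emissions_gtc_per_yr
    emissions_gtc_per_yr *= 1 + growth_rate
    years += 1

print(f"Remaining budget exhausted after about {years} years")
```

With these assumptions the budget is gone in roughly three decades, and faster emissions growth shortens that further, which is why the reduction curve has to start bending downwards now.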

Anyway, there’s now a website with a set of counters to show how well we’re doing: Trillionthtonne.org. Er, not so well right now, actually.

While at Microsoft last week, Gina Venolia introduced me to George. Well, not literally, as he wasn’t there, but I met his proxy. Gina and co have been experimenting with how to make a remote team member feel part of the team, without the frequent travel, in the Embodied Social Proxies project. The current prototype kit comes pretty close to getting that sense of presence (Gina also has a great talk on this project).

The kit cost about $4000 to put together, and includes:

  • a monitor for a life-sized headshot;
  • two cameras – a very wide angle camera to capture the whole room, plus a remote control camera to pan and zoom (e.g. for seeing slides);
  • noise canceling telecom unit for audio;
  • adjustable height rig to allow George to sit or stand;
  • and of course, wheels, so he can be pushed around to different workspaces.

Now, the first question I had was: could this solve our problem of allowing remote participants to join in a hands-on workshop at a conference? At the last workshop on software research and climate change, we had the great idea that remote participants could appear on a laptop via skype, and be carried around between breakout sessions by a local buddy. Of course, skype wasn’t up to the job, and our remote participants ended up having their own mini-workshop. I suspect the wireless internet at most conferences won’t handle this either – the connections tend to get swamped.

But I still think the idea has legs (well, not literally!). See, $4000 is about what it would cost in total travel budget to send someone to Cape Town for the next ICSE. If we can buy much of the kit we need to create a lightweight version of the ESP prototype locally in Cape Town, and use a laptop for the monitor, we could even throw away much of the kit at the end of the conference and still come in under the typical travel budget (not that we would throw it away though!). I think the biggest challenges will be getting a reliable enough internet connection (we’ll probably need to set up our own routers), and figuring out how to mount the kit onto some local furniture for some degree of portability.

Well, if we’re serious about finding solutions to climate change, we have to explore ideas like this.

PS Via this book (thx, Greg) I learned the word “detravelization”. No idea if the chapter on detravelization is any good (because Safari books online doesn’t work at UofT), but I’m clearly going to have a love-hate relationship with a word that’s simultaneously hideous and perfectly apt.

Our group had three posters accepted for presentation at the upcoming AGU Fall Meeting. As the scientific program doesn’t seem to be amenable to linking, here are the abstracts in full:

Poster Session IN11D. Management and Dissemination of Earth and Space Science Models (Monday Dec 14, 2009, 8am – 12:20pm)

Fostering Team Awareness in Earth System Modeling Communities

S. M. Easterbrook; A. Lawson; and S. Strong
Computer Science, University of Toronto, Toronto, ON, Canada.

Existing Global Climate Models are typically managed and controlled at a single site, with varied levels of participation by scientists outside the core lab. As these models evolve to encompass a wider set of earth systems, this central control of the modeling effort becomes a bottleneck. But such models cannot evolve to become fully distributed open source projects unless they address the imbalance in the availability of communication channels: scientists at the core site have access to regular face-to-face communication with one another, while those at remote sites have access to only a subset of these conversations – e.g. formally scheduled teleconferences and user meetings. Because of this imbalance, critical decision making can be hidden from many participants, their code contributions can interact in unanticipated ways, and the community loses awareness of who knows what. We have documented some of these problems in a field study at one climate modeling centre, and started to develop tools to overcome these problems. We report on one such tool, TracSNAP, which analyzes the social network of the scientists contributing code to the model by extracting the data in an existing project code repository. The tool presents the results of this analysis to modelers and model users in a number of ways: recommendation for who has expertise on particular code modules, suggestions for code sections that are related to files being worked on, and visualizations of team communication patterns. The tool is currently available as a plugin for the Trac bug tracking system.
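Aside for the software folks: the core of the repository analysis is conceptually simple: infer ties between developers from the files they have both modified, and build the social network from those ties. Here’s a purely illustrative sketch, with a made-up commit list rather than TracSNAP’s actual Trac integration:

```python
# Illustrative sketch of mining a co-modification network from commit history.
# The commit data below is invented for the example; TracSNAP itself works
# against a Trac-managed repository.
from collections import defaultdict
from itertools import combinations

# Assumed input: (author, file) pairs extracted from the commit log.
commits = [
    ("alice", "ocean/mixing.f90"),
    ("bob",   "ocean/mixing.f90"),
    ("bob",   "atmos/radiation.f90"),
    ("carol", "atmos/radiation.f90"),
]

files_by_author = defaultdict(set)
for author, path in commits:
    files_by_author[author].add(path)

# Tie strength = number of files a pair of developers have both modified.
ties = {}
for a, b in combinations(sorted(files_by_author), 2):
    shared = files_by_author[a] & files_by_author[b]
    if shared:
        ties[(a, b)] = len(shared)

for (a, b), weight in sorted(ties.items()):
    print(f"{a} <-> {b}: {weight} shared file(s)")
```

The real tool runs this kind of analysis over the model’s full change history, and layers the expertise recommendations and communication visualizations described in the abstract on top.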

Poster Session IN31B. Emerging Issues in e-Science: Collaboration, Provenance, and the Ethics of Data (Wednesday Dec 16, 2009, 8am – 12:20pm)

Identifying Communication Barriers to Scientific Collaboration

A. M. Grubb; and S. M. Easterbrook
Computer Science, University of Toronto, Toronto, ON, Canada.

The lack of availability of the majority of scientific artifacts reduces credibility and discourages collaboration. Some scientists have begun to advocate for reproducibility, open science, and computational provenance to address this problem, but there is no consolidated effort within the scientific community. There does not appear to be any consensus yet on the goals of an open science effort, and little understanding of the barriers. Hence we need to understand the views of the key stakeholders – the scientists who create and use these artifacts.

The goal of our research is to establish a baseline and categorize the views of experimental scientists on the topics of reproducibility, credibility, scooping, data sharing, results sharing, and the effectiveness of the peer review process. We gathered the opinions of scientists on these issues through a formal questionnaire and analyzed their responses by topic.

We found that scientists see a provenance problem in their communications with the public. For example, results are published separately from supporting evidence and detailed analysis. Furthermore, although scientists are enthusiastic about collaborating and openly sharing their data, they do not do so out of fear of being scooped. We discuss these serious challenges for the reproducibility, open science, and computational provenance movements.

Poster Session GC41A. Methodologies of Climate Model Confirmation and Interpretation (Thursday Dec 17, 2009, 8am – 12:20pm)

On the software quality of climate models

J. Pipitone; and S. Easterbrook
Computer Science, University of Toronto, Toronto, ON, Canada.

A climate model is an executable theory of the climate; the model encapsulates climatological theories in software so that they can be simulated and their implications investigated directly. Thus, in order to trust a climate model one must trust that the software it is built from is robust. Our study explores the nature of software quality in the context of climate modelling: How do we characterise and assess the quality of climate modelling software? We use two major research strategies: (1) analysis of defect densities of leading global climate models and (2) semi-structured interviews with researchers from several climate modelling centres. Defect density analysis is an established software engineering technique for studying software quality. We collected our defect data from bug tracking systems, version control repository comments, and from static analysis of the source code. As a result of our analysis, we characterise common defect types found in climate model software and we identify the software quality factors that are relevant for climate scientists. We also provide a roadmap to achieve proper benchmarks for climate model software quality, and we discuss the implications of our findings for the assessment of climate model software trustworthiness.
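Aside: “defect density” here just means the number of reported defects normalised by code size, typically quoted as defects per thousand lines of code. A minimal sketch, with made-up numbers rather than results from the study:

```python
# Minimal sketch of the defect density measure: reported defects normalised
# by code size (defects per KLOC). The example numbers are invented, not
# results from the study.

def defect_density(defect_count: int, lines_of_code: int) -> float:
    """Defects per thousand lines of code."""
    return defect_count / (lines_of_code / 1000.0)

# Hypothetical example: 250 defects logged against a 400,000-line model.
print(f"{defect_density(250, 400_000):.2f} defects/KLOC")
```

The interesting part of the study is not the arithmetic, of course, but how the defect counts are gathered (bug trackers, version control comments, static analysis) and what counts as a defect in climate model code.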

Criteria for tools that communicate climate science to a broader audience

I gave my talk last night to TorCHI on Usable Climate Science. I think it went down well, especially considering that I hadn’t finished preparing the slides, and had just gotten off the plane from Seattle. I’ll post the slides soon, once I have a chance to tidy them up. But, judging by the questions and comments, one slide in particular went down well.

I put this together when trying to organize my thoughts about what’s wrong with a number of existing tools/websites in the space of climate science communication. I’ll post the critique of existing tools soon, but I guess I should first explain the criteria:

  • Trustworthy (i.e. the audience must be able to trust the information content):
    • Collective Curation captures the idea that a large community of people is responsible for curating the information content. The extreme example is, of course, wikipedia.
    • Open means that we can get inside and see how it’s all put together. Open source and open data probably need no explanation, but I also want to get across the idea of “open reasoning” – for example, users need access to the calculations and assumptions built into any tool that gives recommendations for energy choices.
    • Provenance means that we know where the information came from, and can trace it back to source. Especially important is the ability to trace back to peer-reviewed scientific literature, or to trusted experts.
    • And the tool should help to build a community by connecting people with one another, through sharing of their knowledge.
  • Appropriate (i.e. the form and content of the information must be appropriate to the intended audience(s)):
    • Accessible for audience – information must build on what people already know, and be provided in a form that allows them to assimilate it (Vygotsky’s Zone of Proximal Development captures this idea well).
    • Contextualized means that the tool provides information that is appropriate to the audience’s specific context, or can be customized for that context. For example, information about energy choices depends on location.
    • Zoomable means that different users can zoom in for more detailed information if they wish. I particularly like the idea of infinite zoom shown off well in this demo. But I don’t just mean visually zoomable – I mean zoomable in terms of information detail, so people who want to dive into the detailed science can if they wish.
  • Effective (i.e. actually works at communicating information and stimulating action):
    • Narrative force is something that seems to be missing from most digital media – the tool must tell a story rather than just provide information.
    • Get the users to form the right mental models so that they understand the science as more than just facts and figures, and understand how to think about the risks.
    • Support exploration to allow users to follow their interests. Most web-based tools are good at this, but often at the expense of narrative force.
    • Give the big picture. For climate change this is crucial – we need to encourage systems thinking if we’re ever going to get good at collective decision making.
  • Compelling (i.e. something that draws people in):
    • Cool, because coolness is how viral marketing works. If it’s cool people will tell others about it.
    • Engaging, so that people want to use it and are drawn in by it.
    • Fun and Entertaining, because we’re often in danger of being too serious. This is especially important for stuff targeted at kids. If it’s not as much fun as the latest video games, then we’re already losing their attention.

During the talk, one of the audience members suggested adding actionable to my list, i.e. it actually leads to appropriate action, changes in behaviour, etc. I’m kicking myself for forgetting this, and can’t now decide whether it belongs under effective, or is an entirely new category. I’ll welcome suggestions.

I’m visiting Microsoft this week, and am fascinated to discover the scope and expertise in climate change at Microsoft Research (MSR), particularly through their Earth, Energy and Environment theme (also known as E3).

Microsoft External Research (MER) is the part of MSR that builds collaborative research relationships with academic and other industrial partners. It is currently headed by Tony Hey, who was previously director of the UK’s e-science initiative (and obviously, as a fellow Brit, he’s a delight to chat to). Tony is particularly passionate about the need to communicate science to the broader public.

The E3 initiative within MER is headed by Dan Fay, who has a fascinating blog, where I found a pointer to a thought-provoking essay by Bill Gail (of the Virtual Earth project) in the Bulletin of the American Meteorological Society on Achieving Climate Sustainability. Bill opens up the broader discussion of what climate sustainability actually means (beyond the narrow focus on physical properties such as emissions of greenhouse gases). The core of his essay is the observation that humanity has now replaced nature as the controller of the entire climate system, despite the fact that we’re hopelessly ill-equipped either philosophically or politically to take on this role right now (this point was also made very effectively at the end of Gwynne Dyer’s book, and in Michael Tobis’ recent talk on the Cybernetics of Climate). More interestingly, Bill argues that we began to assume this role much earlier than most people think: about 7,000 years ago at the dawn of agricultural society, when we first started messing around with existing ecosystems.

The problem I have with Bill’s paper, though, is that he wants to expand the scope of the climate policy framework at a time when even the limited, weak framework we have is under attack from a concerted misinformation campaign. Back to that point about public understanding of the science: we have to teach the public the unavoidable physical facts about greenhouse gases first, to get to at least a broad consensus on the urgent need to move to a zero-carbon economy. We can’t start the broader discussion about longer term climate sustainability unless we first establish a broad public understanding of the physics of greenhouse gases.

I’ve finally managed to post the results of our workshop on Software Research and Climate Change, held at Onward/Oopsla last month. We did lots of brainstorming, and attempted to cluster the ideas, as you can see in the photos of our sticky notes.

After the workshop, I attempted to boil down the ideas even further, and came up with three clusters of research:

  1. Green IT (i.e. optimize power consumption of software and all things controlled by software – also known as “make sure ICT is no longer part of the problem”). Examples of research in this space include:
    • Power aware computing (better management of power in all devices from mobile to massive installations).
    • Green controllers (smart software to optimize and balance power consumption in everything that consumes power).
    • Sustainability as a first class requirement in software system design.
  2. Computer-Supported Collaborative Science (also known as eScience – i.e. software to support and accelerate inter-disciplinary science in climatology and related disciplines). Examples of research in this space include:
    • Software engineering tools/techniques for climate modellers
    • Data management for data-intensive science
    • Open Notebook science (electronic notebooks)
    • Social network tools for knowledge finding and expertise mapping
    • Smart ontologies
  3. Software to improve global collective decision making (which includes everything from tools to improve public understanding of science through to decision support at multiple levels: individual, community, government, inter-governmental,…). Examples of research in this space include:
    • Simulations, games, educational software to support public understanding of the science (usable climate science)
    • massive open collaborative decision support
    • carbon accounting for corporate decision making
    • systems analysis of sustainability in human activity systems (requires multi-level systems thinking)
    • better understanding of the processes of social epistemology

My personal opinion is that (1) is getting to be a crowded field, which is great, but will only yield up to about 15% of the 100% reduction in carbon emissions we’re aiming for. (2) has been mapped out as part of several initiatives in the UK and US on eScience, but there’s still a huge amount to be done. (3) is pretty much a green field (no pun intended) at the moment. It’s this third area that fascinates me the most.