I attended a talk this morning by Holger Hoos, from UBC, and then had a fascinating conversation with him over lunch. He's on an 8-week driving tour across Canada and the US, stopping off at universities along the way to meet with colleagues and give talks. Great idea – more academics should do this (although I can't figure out what I'd do with the kids…)

Anyway, what piqued my interest was the framing Holger used for the talk: we live in interesting times, and are faced with many grand challenges: climate change, peak oil, complex diseases, market turmoil, etc. Many of these challenges are due to complexity of various kinds, and to tame this complexity we need to be able to understand, model and control complex systems. And of course, taming complexity is what much of computer science is about.

The core of his talk was a fascinating look at new heuristic algorithms for solving NP-hard problems, e.g. algorithms that outperform the best TSP algorithms and SAT solvers by using machine learning techniques to tweak the parameters on the heuristics, optimizing them for different kinds of input. Which leads to a whole new approach of empirical algorithm design and algorithm engineering. One theme throughout the talk was a shift in focus for algorithm design, from thinking about worst-case analysis to thinking about handling typical cases, which is something I've long felt is a problem with theoretical computer science, and one of the reasons the field has been largely irrelevant when tackling most real engineering problems.

Anyway, for all that I enjoyed the talk, there seemed to be a gap between the framing (tackling the grand challenges of our time) and the technical content (solutions to computationally intractable problems). Over lunch we talked about this. My observation is that, for climate change in particular, I don't believe there are any aspects of the challenge that require solving computationally complex problems. It would be nice if there were – it would help me complete my map of how the various subfields of computer science can contribute to tackling climate change. There are obvious applications for information systems (aka databases), graphics and visualization, human computer interaction (usable climate models!!), software engineering, ubiquitous computing (e.g. sensor networks), systems (e.g. power aware computing), and so on.

We talked a little about whether climate models themselves count, but here the main challenges are in optimizing continuous mathematics routines for high performance, rather than solving complex discrete mathematics problems. For example, we speculated whether some of Holger's work on applying machine learning techniques to parameter tuning could be applied to the parameterization schemes in climate models, but even here, I'm not convinced, because there is no oracle. The problem is that climate scientists can't write down good correctness criteria for climate models: the goal isn't to develop a "formally correct" model, but rather a scientifically useful one. The model is good if it helps test a scientific hypothesis about how (some aspect of) earth systems work. A model that gets a good fit with observational data because the parameters have been 'over-tuned' will get a poor reception in the climate science community; the challenge is to get a model that matches observational data because we've correctly understood the underlying physical processes, not because we've blindly twiddled the knobs. However, I might be being overly pessimistic about this, and there might be scope for some of these techniques, because model tuning remains a challenging task in climate modeling.
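To make the knob-twiddling worry concrete, here's a rough sketch (in Python, with entirely made-up parameter names and a toy function standing in for the real model) of what naive automated tuning against observations would look like. The only 'correctness criterion' the search can see is goodness of fit, which is exactly the over-tuning trap:

```python
import random

# Hypothetical tunable parameters for a toy model (names are invented,
# not real climate model parameters).
PARAM_RANGES = {
    "cloud_entrainment": (0.1, 2.0),
    "ocean_mixing":      (0.5, 5.0),
    "albedo_feedback":   (0.0, 1.0),
}

def run_toy_model(params):
    """Stand-in for an expensive climate model run: returns a fake time
    series whose shape depends (arbitrarily) on the parameter values."""
    return [params["cloud_entrainment"] * t
            + 0.1 * params["ocean_mixing"]
            - 0.05 * params["albedo_feedback"] * t
            for t in range(20)]

def misfit(model_series, observations):
    """Root-mean-square error against observations. This is the only
    'oracle' available, and a low RMSE says nothing about whether the
    underlying physics is right."""
    return (sum((m - o) ** 2 for m, o in zip(model_series, observations))
            / len(observations)) ** 0.5

def tune_by_random_search(observations, n_trials=500, seed=1):
    """Blind knob-twiddling: sample parameter sets at random and keep
    whichever one fits the observations best."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        candidate = {name: rng.uniform(lo, hi)
                     for name, (lo, hi) in PARAM_RANGES.items()}
        score = misfit(run_toy_model(candidate), observations)
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

if __name__ == "__main__":
    fake_observations = [0.8 * t for t in range(20)]   # invented 'data'
    params, rmse = tune_by_random_search(fake_observations)
    print(f"Best fit: {params} (RMSE {rmse:.3f})")
```

Holger's techniques are far more sophisticated than random search, of course, but they all need an objective function, and that's precisely what's hard to define for a climate model.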

But the more urgent and challenging problems in climate change remain squarely in the realm of how to wean the world off its addiction to fossil fuels as rapidly as possible. This is a problem of information (and overcoming disinformation), of behaviour (individual and social), of economics (although most of modern economic theory is useless in this respect), and of politics. Computer Science has a lot to offer in tackling the information problems, and also some useful abstraction and modeling techniques to understand the other problems. And of course, software is a critical enabling technology in the switch to alternative energy sources. But I still don’t see any computational complexity problems that need solving in all of this. Tell me I’m wrong!

Here’s the intro to a draft proposal I’m working on to set up a new research initiative in climate change informatics at U of T (see also: possible participants and ideas for a research agenda). Comments welcome.

Climate change is likely to be the defining issue of the 21st Century. The impacts of climate change include a dramatic reduction of food production and water supplies, more extreme weather events, the spread of disease, sea level rise, ocean acidification, and mass extinctions. We are faced with the twin challenges of mitigation (avoiding the worst climate change effects by rapidly transitioning the world to a low-carbon economy) and adaptation (re-engineering the infrastructure of modern society so that we can survive and flourish on a hotter planet).

These challenges are global in nature, and pervade all aspects of society. To address them, researchers, engineers, policymakers, and educators from many different disciplines need to come to the table and ask what they can contribute. There are both short-term challenges (such as how to deploy, as rapidly as possible, existing technology to produce renewable energy; how to design government policies and international treaties to bring greenhouse gas emissions under control) and long-term challenges (such as how to complete the transition to a global carbon-neutral society by the latter half of this century).

For Ontario, climate change is both a challenge and an opportunity. The challenge comes in understanding the impacts and adapting to rapid changes in public health, agriculture, management of water and energy resources, transportation, urban planning, and so on. The opportunity is the creation of green jobs through the rapid development of new alternative energy sources and energy conservation measures. Indeed, it is the opportunity to become a world leader in low-carbon technologies.

While many of these challenges and opportunities are already well understood, the role of digital media as both a critical enabling technology and a growing service industry is less well understood. Digital media is critical to effective decision making on climate change issues at all levels. For governmental planning, simulations and visualizations are essential tools for designing and communicating policy choices. For corporations large and small, effective data gathering and business intelligence tools are needed to enable a transition to low-carbon energy solutions. For communities, social networking and web 2.0 technologies are the key tools in bringing people together and enabling coordinated action, and tracking the effectiveness of that action.

Research on climate change has generally clustered around a number of research questions, each studied in isolation. In the physical sciences, the focus is on the physical processes in the atmosphere and biosphere that lead to climate change. In geography and environmental sciences, there is a strong focus on impacts and adaptation. In economics there is a focus on the trade-offs around various policy instruments. In various fields of engineering there is a push for development and deployment of new low-carbon technologies.

Yet climate change is a systemic problem, and effective action requires an inter-disciplinary approach and a clear understanding of how these various spheres of activity interact. We need the appropriate digital infrastructure for these diverse disciplines to share data and results. We need to understand better how social and psychological processes (human behaviour, peer pressure, the media, etc) interact with political processes (policymaking, leadership, voting patterns, etc), and how both are affected by our level of understanding of the physical processes of climate change. And we need to understand how information about all these processes can be factored into effective decision-making.

To address this challenge, we propose the creation of a major new initiative on Climate Change Informatics at the University of Toronto. This will build on existing work across the university on digital media and climate change, and act as a focus for inter-disciplinary research. We will investigate the use of digital media to bridge the gaps between scientific disciplines, policymakers, the media, and public opinion.

Survey studies are hard to do well. I’ve been involved in some myself, and have helped many colleagues to design them, and we nearly always end up with problems when it comes to the data analysis. They are a powerful way of answering base-rate questions (i.e. the frequency or severity of some phenomena) or for exploring subjective opinion (which is, of course, what opinion polls do). But most people who design surveys don’t seem to know what they are doing. My checklist for determining if a survey is the right way to approach a particular research question includes the following:

  • Is it clear exactly what population you are interested in?
  • Is there a way to get a representative sample of that population?
  • Do you have resources to obtain a large enough sample?
  • Is it clear what variables need to be measured?
  • Is it clear how to measure them?

Most research surveys have serious problems getting enough people to respond to ensure the results really are representative, and the people who do respond are likely to be a self-selecting group with particularly strong opinions about the topic. Professional opinion pollsters put a lot of work into adjustments for sampling bias, and still often get it wrong. Researchers rarely have the resources to do this (and almost never repeat a survey, so never have the data to do such adjustments anyway). There are also plenty of ways to screw up on the phrasing of the questions and answer modes, such that you can never be sure people have all understood the questions in the same way, and that the available response modes aren’t biasing their responses. (Kitchenham has a good how-to guide)

ClimateSight recently blogged about a fascinating, unpublished survey of whether climate scientists think the IPCC AR4 is an accurate representation of our current understanding of climate science. The authors themselves blog about their efforts to get the survey published here, here and here. Although they acknowledge some weaknesses to do with sample size and representativeness, they basically think the survey itself is sound. Unfortunately, it's not. As I commented on ClimateSight's post, methodologically, this survey is a disaster. Here's why:

The core problem with the paper is the design of the question and response modes. At the heart of their design is a 7-point Likert scale to measure agreement with the conclusions of the IPCC AR4. But this doesn’t work as a design for many reasons:

1) The IPCC AR4 is a massive document, with a huge number of different observations. Any climate scientist will be able to point to bits that are done better and bits that are done worse. Asking about agreement with it, without spelling out which of its many conclusions you're asking about, is hopeless. When people say they agree or disagree with it, you have no idea which of its many conclusions they are reacting to.

2) The response mode used in the study has a built-in bias. If the intent is to measure the degree to which scientists think the IPCC accurately reflects, say, the scale of the global warming problem (whatever that means), then the central position on the 7-point scale should be "the IPCC got it right". In the study, this is point 5 on the scale, which immediately introduces a bias: there are twice as many response modes to the left of this position ("IPCC overstates the problem") as there are to the right ("IPCC understates the problem"). In other words, the scale itself is biased towards one particular pole.

3) The study authors gave detailed descriptive labels to each position on the scale. Although it's generally regarded as a good idea to give clear labels to each point on a Likert scale, the idea is that this should help respondents understand that the intervals on the scale are to be interpreted as roughly equivalent, so the labels need to be very simple. The set of labels in this study ends up conflating a whole bunch of different ideas, each of which should be tested with a different question and a separate scale. For example, the labels include ideas such as:

  • fabrication of the science,
  • false hypotheses,
  • natural variation,
  • validity of models,
  • politically motivated scares,
  • diversion of attention,
  • uncertainties,
  • scientists who know what they’re doing,
  • urgency of action,
  • damage to the environment,

…and so on. Conflating all of these onto a single scale makes analysis impossible, because you don’t know which of the many ideas associated with each response mode each respondent is agreeing or disagreeing with. A good survey instrument would ask about only one of these issues at once.

4) Point 5 on the scale (the one interpreted as agreeing with the IPCC) includes the phrase "the lead scientists know what they are doing". Yet the survey was sent out to a select group that includes many such lead scientists and their immediate colleagues. This form of wording immediately biases this group towards this response, regardless of what they think about the overall IPCC findings. Again, asking specifically about different findings in the IPCC report is much more likely to find out what they really think; this study is likely to mask the range of opinions.

5) And finally, as other people have pointed out, the sampling method is very suspect. Although the authors acknowledge that they didn’t do random sampling, and that this limits the kinds of analysis they can do, it also means that any quantitative summary of the responses is likely to be invalid. There’s plenty of reason to suspect that significant clusters of opinion chose not to participate because they saw the questionnaire (especially given some of the wording) as suspect. Given the context for this questionnaire, within a public discourse where everything gets distorted sooner or later, many climate scientists would quite rationally refuse to participate in any such study. Which means really we have no idea if the distribution shown in the study represents the general opinion of any particular group of scientists at all.

So, it’s not surprising no-one wants to publish it. Not because of any concerns for the impact of its findings, but simply because it’s not a valid scientific study. The only conclusions that can be drawn from this study are existence ones:

  1. there exist some people who think the IPCC underestimated (some unspecified aspect of) climate change;
  2. there exist some people who think the IPCC overestimated (some unspecified aspect of) climate change; and
  3. there exist some people who think the IPCC scientists know what they are doing.

The results really say nothing about the relative sizes of these three groups, nor even whether the three groups overlap!

Now, the original research question is very interesting, and worth pursuing. Anyone want to work on a proper scientific survey to answer it?

Next Wednesday, we're organising demos of our students' summer projects, prior to the Science 2.0 conference. The demos will be in BA1200 (in the Bahen Centre), Wed July 29, 10am-12pm. All welcome!

Here are the demos to be included (running order hasn't been determined yet – we'll probably pull names out of a hat…):

  • Basie (demo’d by Bill Konrad, Eran Henig and Florian Shkurti)
    Basie is a lightweight, web-based software project forge with an emphasis on inter-component communication.  It integrates revision control, issue tracking, mailing lists, wikis, status dashboards, and other tools that developers need to work effectively in teams.  Our mission is to make Basie simple enough for undergraduate students to master in ten minutes, but powerful enough to support large, distributed teams.
  • BreadCrumbs (demo’d by Brent Mombourquette).
    When researching, the context in which a relevant piece of information is found is often overlooked. However, the journey is as important as the destination. BreadCrumbs is a Firefox extension designed to capture this journey, and therefore the context, by maintaining a well structured and dynamic graph of an Internet browsing session. It keeps track of both the chronological order in which websites are visited and the link-by-link path. In addition, through providing simple tools to leave notes to yourself, an accurate record of your thought process and reasoning for browsing the documents that you did can be preserved with limited overhead. The resulting session can then be saved and revisited at a later date, with little to no time spent trying to recall the relevance or semantic relations of documents in an unordered bookmark folder, for example. It can also be used to provide information to a colleague, by not just pointing them to a series of web pages, but by providing them a trail to follow and embedded personal notes. BreadCrumbs maintains the context so that you can focus on the content.
  • Feature Diagram Tool (demo’d by Ebenezer Hailemariam)
    We present a software tool to help software developers work with legacy code. The tool reverse engineers "dependency diagrams" from Java code, through which developers can perform refactoring actions. The tool is a plug-in for the Eclipse integrated development environment.
  • MarkUs (demo'd by Severin Gehwolf, Nelle Varoquaux and Mike Conley)
    MarkUs is a Web application that recreates the ease and flexibility of grading assignments with pen on paper. Graders fill in a marking scheme and directly annotate students' work.  MarkUs also provides support for other aspects of assignment delivery and management.  For example, it allows students or instructors to form groups for assignment collaboration, and allows students to upload their work for grading. Instructors can also create and manage group or solo assignments, and assign graders to mark and annotate the students' work quickly and easily.
  • MyeLink: drawing connections between OpenScience lab notes (demo’d by Maria Yancheva)
    A MediaWiki extension which facilitates connections between related wiki pages, notes, and authors. Suitable for OpenScience research communities who maintain a wiki collection of experiment pages online. Provides search functionality on the basis of both structure and content of pages, as well as a user interface allowing the customization of options and displaying an embedded preview of results.
  • TracSNAP – Trac Social Network Analysis Plugin (demo’d by Ainsley Lawson and Sarah Strong)
    TracSNAP is a suite of simple tools to help contributors make use of information about the social aspect of their Trac coding project. It tries to help you to: Find out which other developers you should be talking to, by giving contact suggestions based on commonality of file edits; Recognize files that might be related to your current work, by showing you which files are often committed at the same time as your files; Get a feel for who works on similar pieces of functionality based on discussion in bug and feature tickets, and by edits in common; Visualize your project's effective social network with graphs of who talks to whom; Visualize coupling between files based on how often your colleagues edit them together.
  • VizExpress (demo’d by Samar Sabie)
    Graphs are effective visualizations because they present data quickly and easily. vizExpress is a Mediawiki extension that inserts user-customized tables and graphs in wiki pages without having to deal with complicated wiki syntax. When editing a wiki page, the extension adds a special toolbar icon for opening the vizExpress wizard. You can provide data to the wizard by browsing to a local Excel or CSV file, or by typing (or copying/pasting) data. You can choose from eight graph types and eight graph-coloring schemes, and apply further formatting such as titles, dimensions, limits, and legend position. Once a graph is inserted in a page, you can easily edit it by restarting the wizard or modifying a simple vizExpress tag.

[Update: the session was a great success, and some of the audience have blogged about it already: e.g. Cameron Neylon]

Here's a very sketchy second draft for a workshop proposal for the fall. I welcome all comments on this, together with volunteers to be on the organising team. Is this a good title for the workshop? Is the abstract looking good? What should I change?

Update: I've jazzed up and rearranged the list of topics, in response to Steffen's comment, to get a better balance between research likely to impact SE itself and research likely to impact other fields.

The First International Workshop on Software Research and Climate Change (WSRCC-1)

In conjunction with: <http://onward-conference.org/> Onward Conference 2009 and <http://www.oopsla.org/oopsla2009/> OOPSLA 2009

Workshop website: <http://www.cs.toronto.edu/wsrcc>

ABSTRACT

This workshop will explore the contributions that software research can make to the challenge of climate change. Climate change is likely to be the defining issue of the 21st Century. Recent studies indicate that climate change is accelerating, confirming the most pessimistic of the scenarios identified by climate scientists. Our current use of fossil fuels commits the world to around a 2°C average temperature rise during this century, and, unless urgent and drastic cuts are made, further heating is likely to trigger any of a number of climate change tipping points. The results will be a dramatic reduction of food production and water supplies, more extreme weather events, the spread of disease, sea level rise, ocean acidification, and mass extinctions. We are faced with the twin challenges of mitigation (avoiding the worst climate change effects by rapidly transitioning the world to a low-carbon economy) and adaptation (re-engineering the infrastructure of modern society so that we can survive and flourish on a hotter planet).

These challenges are global in nature, and pervade all aspects of society. To address them, we will need researchers, engineers, policymakers, and educators from many different disciplines to come to the table and ask what they can contribute. There are both short term challenges (such as how to deploy, as rapidly as possible, existing technology to produce renewable energy; how to design government policies and international treaties to bring greenhouse gas emissions under control) and long term challenges (such as how to complete the transition to a global carbon-neutral society by the latter half of this century). In nearly all these challenges, software has a major role to play as a critical enabling technology.

So, for the software research community, we can frame the challenge as follows: How can we, as experts in software technology, and as the creators of future software tools and techniques, apply our particular knowledge and experience to the challenge of climate change? How can we understand and exploit the particular intellectual assets of our community — our ability to:

  • think computationally;
  • understand and model complex inter-related systems;
  • build useful abstractions and problem decompositions;
  • manage and evolve large-scale socio-technical design efforts;
  • build the information systems and knowledge management tools that empower effective decision-making;
  • develop and verify complex control systems on which we now depend;
  • create user-friendly and task-appropriate interfaces to complex information and communication infrastructures.

In short, how can we apply our research strengths to make significant contributions to the problems of mitigation and adaptation of climate change?

This workshop will be the first in a series, intended to develop a community of researchers actively engaged in this challenge, and to flesh out a detailed research agenda that leverages existing research ideas and capabilities. Therefore we welcome any kind of response to this challenge statement.

WORKSHOP TOPICS

We welcome the active participation of software researchers and practitioners interested in any aspect of this challenge. The participants will themselves determine the scope and thrusts of this workshop, so this list of suggested topics is intended to act only as a starting point:

  • requirements analysis for complex global change problems;
  • integrating sustainability into software system design;
  • green IT, including power-aware computing and automated energy management;
  • developing control systems to create smart energy grids and improve energy conservation;
  • developing information systems to support urban planning, transport policies, green buildings, etc.;
  • software tools for open collaborative science, especially across scientific disciplines;
  • design patterns for successful emissions reduction strategies;
  • social networking tools to support rapid action and knowledge sharing among communities;
  • educational software for hands-on computational science;
  • knowledge management and decision support tools for designing and implementing climate change policies;
  • tools and techniques to accelerate the development and validation of earth system models by climate scientists;
  • data sharing and data management of large scientific datasets;
  • tools for creating and sharing visualizations of climate change data;
  • (more…?)

SUBMISSIONS AND PARTICIPATION

Our intent is to create a lively, interactive discussion, to foster brainstorming and community building. Registration will be open to all. However, we strongly encourage participants to submit (one or more) brief (1-page) responses to the challenge statement, either as:

  • Descriptions of existing research projects relevant to the challenge statement (preferably with pointers to published papers and/or online resources);
  • Position papers outlining potential research projects.

Be creative and forward-thinking in these proposals: think of the future, and think big!

There will be no formal publication of proceedings. Instead we will circulate all submitted papers to participants in advance of the workshop, via the workshop website, and invite participants to revise/update/embellish their contributions in response to everyone else’s contributions. Our plan is to write a post-workshop report, which will draw on both the submitted papers and the discussions during the workshop. This report will lay out a suggested agenda for both short-term and long-term research in response to the challenge, and act as a roadmap for subsequent workshops and funding proposals.

IMPORTANT DATES

Position paper submission deadline: September 25th, 2009

Workshop on Software Research and Climate Change: October 25th or 26th, 2009

WORKSHOP ORGANIZERS

TBD

I posted some initial ideas for projects for our summer students a while back. I'm pleased to say that the students have been making great progress in the last few weeks (despite, or perhaps because of, the fact that I haven't been around much). Here's what they've been up to:

Sarah Strong and Ainsley Lawson have been exploring how to take the ideas on visualizing the social network of a software development team (as embodied in tools such as Tesseract), and apply them as simple extensions to code browsers / version control tools. The aim is to see if we can add some value in the form of better awareness of who is working on related code, but without asking the scientists to adopt entirely new tools. Our initial target users are the climate scientists at the UK Met Office Hadley Centre, who currently use SVN/Trac as their code management environment.

Brent Mombourquette has been working on a Firefox extension that will capture the browsing history as a graph (pages and traversed links), which can then be visualized, saved, annotated, and shared with others. The main idea is to support the way in which scientists search/browse for resources (e.g. published papers on a particular topic), and to allow them to recall their exploration path to remember the context in which they obtained these resources. I should mention the key idea goes all the way back to Vannevar Bush's memex.
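As a rough illustration of the underlying data structure (the real extension is JavaScript running inside Firefox; this is just a Python sketch, and the URLs are invented), a session graph only needs to record timestamped visits, traversed links, and per-page notes:

```python
from datetime import datetime

class BrowsingSession:
    """Toy model of a BreadCrumbs-style session: pages are nodes, traversed
    links are directed edges, and every visit is timestamped so both the
    link-by-link path and the chronological order are preserved."""

    def __init__(self):
        self.visits = []    # (timestamp, url), in chronological order
        self.links = set()  # (from_url, to_url) pairs actually traversed
        self.notes = {}     # url -> list of notes left by the researcher

    def visit(self, url, came_from=None):
        self.visits.append((datetime.now(), url))
        if came_from is not None:
            self.links.add((came_from, url))

    def annotate(self, url, note):
        self.notes.setdefault(url, []).append(note)

# Example: a short literature-search trail.
session = BrowsingSession()
session.visit("https://scholar.example/search?q=ensemble+forecasting")
session.visit("https://journal.example/paper123",
              came_from="https://scholar.example/search?q=ensemble+forecasting")
session.annotate("https://journal.example/paper123",
                 "Relevant to the monsoon bias question; check section 4.")
```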

Maria Yancheva has been exploring the whole idea of electronic lab notebooks. She has been studying the workflows used by the climate scientists when they configure and run their simulation models, and considering how a more structured form of wiki might help them. She has selected OpenWetWare as a good starting point, and is looking at how to add extensions to MediaWiki to make OWW more suitable for computational science, especially to keep track of model runs.

Samar Sabie has also been looking at MediaWiki extensions, specifically to find a way to add visualizations into wiki pages and blogs as simply as possible. The problem is that currently, adding something as simple as a table of data to a page requires extensive work with the markup language. The long term aim is to support the insertion of dynamic visualizations (such as those at ManyEyes), but the starting point is to make it as ridiculously simple as possible to insert a data table, link it to a graph, and select appropriate parameters to make the graph look good, with the idea that users can subsequently change the appearance in useful ways (which means cut and paste from Excel spreadsheets won't be good enough).

Oh, and they’ve all been regularly blogging their progress, so we’re practicing the whole open notebook science thingy.

As a fan of Edward Tufte’s books on the power of beautiful visualizations of qualitative and quantitative data, I’m keen on the idea of exploring new ways of visualizing the climate change challenge. In part because many key policymakers are not likely to ever read the detailed reports on the science, but a few simple, compelling graphics might capture their attention.

I like the visualizations collected by the UNEP, especially their summary of climate processes and effects, their strategic options curve, the map of political choices, summary of emissions by sector, a guide to emissions assessment, trends in sea level rise, and CO2 emissions per capita. I should also point out that the IPCC reports are full of great graphics too, but there's no easy visual index – you have to read the reports.

Now these are all very nice, and (presumably) the work of professional graphic artists. But they're all static. The scientist in me wants to play with them. I want to play around with different scales on the axes. I want to select from among different data series. And I want to do this in a web browser that's directly linked to the data sources, so that I don't have to mess around with the data directly, nor worry about how the data is formatted.

What I have in mind is something like Gap Minder. This allows you to play with the data, create new views, and share them with others. Many Eyes is similar, but goes one step further in allowing a community to create entirely new kinds of visualization, and enhance each other's, in a social networking style. Now, if I can connect up some of these to the climate data sets collected by the IPCC, all sorts of interesting things might happen. Except that the IPCC data sets don't have enough descriptive metadata for non-experts to make sense of it. But fixing that's another project.

Oh, and the periodic table of visualization methods is pretty neat as a guide to what’s possible.

Update: (via Shelly): Worldmapper is an interesting way of visualizing international comparisons.

So, here's an interesting thought that came up at the Michael Jackson festschrift yesterday. Michael commented in his talk that understanding is not a state, it's a process. David Notkin then asked how we can know how well we're doing in that process. I suggested that one of the ways you know is by discovering where your understanding is incorrect, which can happen if your model surprises you. I noticed this is a basic mode of operation for earth system modelers. They put their current best understanding of the various earth systems (atmosphere, ocean, carbon cycle, atmospheric chemistry, soil hydrology, etc) into a coupled simulation model and run it. Whenever the model surprises them, they know they're probing the limits of their understanding. For example, the current generation of models at the Hadley Centre don't get the Indian Monsoon in the right place at the right time. So they know there's something in that part of the model they don't yet understand sufficiently.

Contrast this with the way we use (and teach) modeling in software engineering. For example, students construct UML models as part of a course in requirements analysis. They hand in their models, and we grade them. But at no point in the process do the models ever surprise their authors. UML models don’t appear to have the capacity for surprise. Which is unfortunate, given what the students did in previous courses. In their programming courses, they were constantly surprised. Their programs didn’t compile. Then they didn’t run. Then they kept crashing. Then they gave the wrong outputs. At every point, the surprise is a learning opportunity, because it means there was something wrong with their understanding, which they have to fix. This contrast explains a lot about the relative value students get from programming courses versus software modeling courses.

Now of course, we do have some software engineering modeling frameworks that have the capacity for surprise. They allow you to create a model and play with it, and sometimes get unexpected results. For example, Alloy. And I guess model checkers have that capacity too. A necessary condition is that you can express some property that your model ought to have, and then automatically check that it does have it. But that's not sufficient, because if the properties you express aren't particularly interesting, or are trivially satisfied, you still won't be surprised. For example, UML syntax checkers fall into this category – when your model fails a syntax check, that's not surprising, it's just annoying. Also, you don't necessarily have to formally state the properties – but you do have to at least have clear expectations. When the model doesn't meet those expectations, you get the surprise. So surprise isn't just about executability, it's really about falsifiability.
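Here's a toy illustration of that last point, in Python rather than Alloy (and with a deliberately silly 'model' invented for the example): the expectations are stated explicitly, so a run that violates them produces a genuine surprise rather than just an error message:

```python
def toy_energy_balance(albedo, co2_forcing, years=100):
    """A deliberately crude zero-dimensional 'climate model'. This is not
    real physics; it exists only to have behaviour worth checking."""
    temp = 14.0                     # starting global mean temperature (deg C)
    temps = []
    for _ in range(years):
        # absorbed sunlight warms, a fixed relaxation term cools, and the
        # (made-up) co2_forcing term adds a steady nudge
        temp += 0.1 * ((1.0 - albedo) + co2_forcing - 0.7)
        temps.append(temp)
    return temps

def check_expectations(temps):
    """Falsifiable expectations about the model's behaviour. A failure here
    is the 'surprise': evidence that either the model, or our understanding
    of it, is wrong."""
    assert all(t > -20.0 for t in temps), "model collapsed to implausible cold"
    assert max(temps) - min(temps) < 15.0, "model drifted far more than expected"

if __name__ == "__main__":
    check_expectations(toy_energy_balance(albedo=0.3, co2_forcing=0.05))
    print("No surprises this time; the expectations survived another run.")
```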

Summer projects: I posted yesterday on social network tools for computational scientists. Greg has posted a whole list of additional suggestions.

Here, I will elaborate another of these ideas: the electronic lab notebook. For computational scientists, wiki pages are an obvious substitute for traditional lab notebooks, because each description of an experiment can then be linked directly with the corresponding datasets, configuration files, visualizations of results, scientific papers, related experiments, etc. (In the most radical version, Open Notebook Science, the lab notebook is completely open for anyone to see. But the toolset would be the same whether it was open to anyone, or just shared with select colleagues.)

In my study of the software practices at the UK Met Office last summer, I noticed that some of the scientists carefully document each experiment via a new wiki page, but the process is laborious in a standard wiki, involving a lot of cut-and-paste to create a suitable page structure. For this reason, many scientists don’t keep good records of their experiments. An obvious improvement would be to generate a basic wiki page automatically each time a model run is configured, and populate it with information about the run, and links to the relevant data files. The scientists could then add further commentary via a standard wiki editor.
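Here's a minimal sketch of the kind of automation I have in mind, assuming a simple key=value run configuration file and plain MediaWiki markup (the field names are invented; the Met Office's actual configuration format is rather more involved):

```python
from datetime import date
from pathlib import Path

def wiki_page_for_run(config_path):
    """Generate a skeleton lab-notebook page for one model run, pre-populated
    from the run's configuration file, ready for the scientist to annotate."""
    config = {}
    for line in Path(config_path).read_text().splitlines():
        if "=" in line and not line.strip().startswith("#"):
            key, value = line.split("=", 1)
            config[key.strip()] = value.strip()

    return "\n".join([
        f"== Model run: {config.get('run_id', 'unnamed')} ==",
        f"* Date: {date.today().isoformat()}",
        f"* Model version: {config.get('model_version', 'unknown')}",
        f"* Resolution: {config.get('resolution', 'unknown')}",
        f"* Output data: [[{config.get('output_dir', 'not recorded')}]]",
        "",
        "=== Purpose of this experiment ===",
        "''(filled in by the scientist)''",
        "",
        "=== Results and commentary ===",
        "''(filled in by the scientist)''",
    ])

# Usage (assuming the run configuration lives in run_0042.cfg):
# print(wiki_page_for_run("run_0042.cfg"))
```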

Of course, an even better solution is to capture all information about a particular run of the model (including subsequent commentary on the results) as meta-data in the configuration file, so that no wiki pages are needed: lab notebook pages are just user-friendly views of the configuration file. I think that’s probably a longer term project, and links in with the observation that existing climate model configuration tools are hard to use anyway and need to be re-invented. Let’s leave that one aside for the moment…

A related problem is better support for navigating and linking existing lab book pages. For example, in the process of writing up a scientific paper, a scientist might need to search for the descriptions of a number of individual experiments, select some of the data, create new visualizations for use in the paper, and so on. Recording this trail would improve reproducibility, by capturing the necessary links to source data in case the visualizations used in the paper need to be altered or recreated. Some of this requires a detailed analysis of the specific workflows used in a particular lab (which reminds me I need to write up what I know of the Met Office's workflows), but I think some of it can be achieved by simple generic tools (e.g. browser plugins) that help capture the trail as it happens, and perhaps edit and annotate it afterwards.

I’m sure some of these tools must exist already, but I don’t know of them. Feel free to send me pointers…

This summer, we have a group of undergrad students working with us, who will try building some of the tools we have identified as potentially useful for climate scientists. We’re just getting started this week, so it’s not clear what we’ll actually build yet, but I think I can guarantee we’ll end up with one of two outcomes: either we build something that is genuinely useful, or we learn a lot about what doesn’t work and why not.

Here's the first project idea. It responds to the observation that large climate models (and indeed any large-scale scientific simulation) undergo continuous evolution, as a variety of scientists contribute code over a long period of time (decades, in some cases). There is no well-defined specification for the system, and nor do the scientists even know ahead of time exactly what the software should do. Coordinating contributions to this code then becomes a problem. If you want to make a change to some particular routine, it can be hard to know who else is working on related code, what potential impacts your change might have, and sometimes it is hard even to know who to go and ask about these things – who's the expert?

A similar problem occurs in many other types of software project, and there is a fascinating line of research that exploits the social network to visualize how the efforts of different people interact. It draws on work in sociology on social network analysis – basically the idea that you can treat a large group of people and their social interactions as a graph, which can then be visualized in interesting ways, and analyzed for its structural properties, to identify things like distance (as in six degrees of separation), and structural cohesion. For software engineering purposes, we can automatically construct two distinct graphs:

  1. A graph of social interactions (e.g. who talks to whom). This can be constructed by extracting records of electronic communication from the project database – email records, bug reports, bulletin boards, etc. Of course, this misses verbal interactions, which makes it more suitable for geographically distributed projects, but there are ways of adding some of this missing information if needed (e.g. if we can mine people’s calendars, meeting agendas, etc).
  2. A graph of code dependencies (which bits of code are related). This can include simply which routines call which other routines. More interestingly, it can include information such as which bits of code were checked into the repository at the same time by the same person, which bits of code are linked to the same bug report, etc.

Comparing these two graphs offers insight into socio-technical congruence – how well the social network (who talks to whom) matches the technical dependencies in the code. Which then leads to all sorts of interesting ideas for tools.

For added difficulty, we have to assume that our target users (climate scientists) are programming in Fortran, and are not using integrated programming environments. Although we can assume they have good version control tools (e.g. Subversion) and good bug tracking tools (e.g. Trac).
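To make the congruence idea concrete, here's a rough sketch of the comparison (using networkx, with invented example data rather than a real Subversion log or Trac database): a crude congruence score is just the fraction of technical dependencies whose owners actually talk to each other.

```python
import networkx as nx

# Graph 1: who talks to whom (mined from mailing lists, Trac tickets, etc.)
social = nx.Graph()
social.add_edges_from([("alice", "bob"), ("bob", "carol")])

# Graph 2: which bits of code are related (co-commits, call dependencies,
# files linked to the same bug report, ...)
technical = nx.Graph()
technical.add_edges_from([("ocean.f90", "coupler.f90"),
                          ("atmos.f90", "coupler.f90")])

# Who most often edits each file -- in practice, mined from the SVN log.
owner = {"ocean.f90": "alice", "coupler.f90": "bob", "atmos.f90": "dave"}

def congruence(social, technical, owner):
    """Fraction of technical dependencies whose owners communicate:
    a very crude version of socio-technical congruence."""
    relevant, matched = 0, 0
    for f1, f2 in technical.edges():
        a, b = owner.get(f1), owner.get(f2)
        if a is None or b is None or a == b:
            continue   # same owner (or unknown): no coordination needed
        relevant += 1
        if social.has_edge(a, b):
            matched += 1
    return matched / relevant if relevant else 1.0

print(f"Socio-technical congruence: {congruence(social, technical, owner):.2f}")
# Here atmos.f90 depends on coupler.f90, but dave and bob never talk -- the
# kind of gap a tool could flag ("you should probably be talking to bob").
```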

Okay, here's a slightly different modeling challenge. It might be more of a visualization challenge. Whatever. In part 1, I suggested we use requirements analysis techniques to identify stakeholders, and stakeholder goals, and link them to the various suggested "wedges".

Here, I want to suggest something different. There are several excellent books that attempt to address the “how will we do it?” challenge. They each set out a set of suggested solutions, add up the contribution of each solution to reducing emissions, assess the feasibility of each solution, add up all the numbers, and attempt to make some strategic recommendations. But each book makes different input assumptions, focusses on slightly different kinds of solutions, and ends up with different recommendations (but they also agree on many things).

Here are the four books:

George Monbiot, Heat: How to Stop the Planet from Burning. This is probably the best book I have ever read on global warming. It's brilliantly researched, passionate, and doesn't pull its punches. Plus it's furiously upbeat – Monbiot takes on the challenge of how we get to 90% emissions reduction, and shows that it is possible (although you kind of have to imagine a world in which politicians are willing to do the right thing).

Joseph Romm, Hell and High Water: Global Warming–the Solution and the Politics–and What We Should Do. While lacking Monbiot's compelling writing style, Romm makes up for it by being an insider – he was an energy policy wonk in the Clinton administration. The other contrast is that Monbiot is British and focusses mainly on British examples, while Romm is American and focusses on US examples. The cultural contrasts are interesting.

David MacKay, Sustainable Energy – Without the Hot Air. Okay, so I haven't read this one yet, but it got a glowing write-up on Boing Boing. Oh, and it's available as a free download.

Lester Brown, Plan B 3.0: Mobilizing to Save Civilization. This one's been on my reading list for a while; I'll read it soon. It has a much broader remit than the others: Brown wants to solve world poverty, cure disease, feed the world, and solve the climate crisis. I'm looking forward to this one. And it's also available as a free download.

Okay, so what’s the challenge? Model the set of solutions in each of these books so that it’s possible to compare and contrast their solutions, compare their assumptions, and easily identify areas of agreement and disagreement. I’ve no idea yet how to do this, but a related challenge would be to come up with compelling visualizations that explain to a much broader audience what these solutions look like, and why it’s perfectly feasible. Something like this (my current favourite graphic):

[Image: graph of cost/benefit of climate mitigation strategies]

I just spent the last two hours chewing the fat with Mark Klein at MIT and Mark Tovey at Carleton, talking about all sorts of ideas, but loosely focussed on how distributed collaborative modeling efforts can help address global change issues (e.g. climate, peak oil, sustainability).

MK has a project, Climate Interactive, [update: Mark tells me I got the wrong project – it should be The Climate Collaboratorium. Climate Interactive is from a different group at MIT] which is exploring how climate simulation tools can be hooked up to discussions around decision making, which is one of the ideas we kicked around in our brainstorming sessions here.

MT has been exploring how you take ideas from distributed cognition and scale them up to much larger teams of people. He has put together a wonderful one-pager that summarizes many interesting ideas on how mass collaboration can be applied in this space.

This conversation is going to keep me going for days on stuff to explore and blog about:

And lots of interesting ideas for new projects…

One of the things that came up in our weekly brainstorming session today was the question of whether climate models can be made more modular, to permit distributed development, and distributed execution. Carolyn has already blogged about some of these ideas. Here’s a little bit of history for this topic.

First, a very old (well, 1989) paper by Kalnay et al., on Data Interchange Formats, in which they float the idea of "plug compatibility" for climate model components. For a long time, this idea seems to have been accepted as the long term goal for the architecture for climate models. But no-one appears to have come close. In 1996, David Randall wrote an interesting introspective on how university teams can (or can't) participate in climate model building, in which he speculates that plug compatibility might not be achievable in practice because of the complexity of the physical processes being simulated, and the complex interactions between them. He also points out that all climate models (up to that point) had each been developed at a single site, and he talks a bit about why this appears to be necessarily so.

Fast forward to a paper by Dickinson et al in 2002, which summarizes the results of a series of workshops on how to develop a better software infrastructure for model sharing, and talks about some prototype software frameworks. Then, a paper by Larson et al in 2004, introducing a common component architecture for earth system models, and a bit about the Earth System Modeling Framework being developed at NCAR. And finally, Drake et al.’s Overview of the Community Climate System Model, which appears to use these frameworks very successfully.
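To illustrate what 'plug compatibility' would mean in code, here's a deliberately naive sketch (in Python; this is an invented interface, not ESMF's actual API) of a common component interface plus a toy coupler. The hard part, as Randall's paper argues, isn't the calling convention but agreeing on grids, field semantics, and coupling frequencies:

```python
from abc import ABC, abstractmethod

class EarthSystemComponent(ABC):
    """A hypothetical 'plug compatible' interface: any component (atmosphere,
    ocean, land surface, ...) implementing it could, in principle, be swapped
    into a coupled model."""

    @abstractmethod
    def initialize(self, grid, start_time): ...

    @abstractmethod
    def step(self, dt):
        """Advance this component's state by dt seconds."""

    @abstractmethod
    def export_fields(self):
        """Return the fields other components need (e.g. sea surface
        temperature), keyed by agreed field names."""

    @abstractmethod
    def import_fields(self, fields):
        """Accept fields produced by other components (e.g. wind stress)."""

def run_coupled(components, dt, n_steps):
    """A toy coupler: exchange fields, then advance every component."""
    for _ in range(n_steps):
        exchanged = {}
        for c in components:
            exchanged.update(c.export_fields())
        for c in components:
            c.import_fields(exchanged)
            c.step(dt)
```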

Now, admittedly I haven't looked closely at the CCSM. But I have looked closely at the Met Office's Unified Model and the Canadian CCCma model, and neither of them gets anywhere close to the ideal of modularity. In both cases, the developers have to invest months of effort to 'naturalize' code contributed from other labs, in the manner described in Randall's paper.

So, here’s the mystery. Has the CCSM really achieved the modularity that others are only dreaming of? And if so how? The key test would be how much effort it takes to ‘plug in’ a module developed elsewhere…

Here’s a challenge for the requirements modelling experts. I’ve phrased it as an exam question for my graduate course on requirements engineering (the course is on hiatus, which is lucky, because it would be a long exam…):

Q: The governments of all the nations on a small blue planet want to fix a problem with the way their reliance on fossil fuels is altering the planet's climate. Draw a goal model (using any appropriate goal modeling notation) showing the key stakeholders, their interdependencies, and their goals. Be sure to show how the set of solutions they are considering contribute to satisfying their goals. The attached documents may be useful in answering this question: (a) An outline of the top level goals; (b) A description of the available solutions, characterized as a set of Stabilization Wedges; (c) A domain expert's view of the feasibility of the solutions.

Update: Someone’s done the initial identification of actors already.

A group of us at the lab, led by Jon Pipitone, has been meeting every Tuesday lunchtime (well almost every Tuesday) for a few months, to brainstorm ideas for how software engineers can contribute to addressing the climate crisis. Jon has been blogging some of our sessions (here, here and here).

This week we attempted to create a matrix, where the rows are “challenge problems” related to the climate crisis, and the columns are the various research areas of software engineering (e.g. requirements analysis, formal methods, testing, etc…). One reason to do this is to figure out how to run a structured brainstorming session with a bigger set of SE researchers (e.g. at ICSE). Having sketched out the matrix, we then attempted to populate one row with ideas for research projects. I thought the exercise went remarkably well. One thing I took away from it was that it was pretty easy to think up research projects to populate many of the cells in the matrix (I had initially thought the matrix might be rather sparse by the time we were done).

We also decided that it would be helpful to characterize each of the rows a little more, so that SE researchers who are unfamiliar with some of the challenges would understand each challenge enough to stimulate some interesting discussions. So, here is an initial list of challenges (I added some links where I could). Note that I've grouped them according to who the immediate audience is for any tools, techniques, practices, etc.

  1. Help the climate scientists to develop a better understanding of climate processes.
  2. Help the educators to teach kids about climate science – how the science is done, and how we know what we know about climate change.
    • Support hands-on computational science (e.g. an online climate lab with building blocks to support construction of simple simulation models)
    • Global warming games
  3. Help the journalists & science writers to raise awareness of the issues around climate change for a broader audience.
    • Better public understanding of climate processes
    • Better public understanding of how climate science works
    • Visualizations of complex earth systems
    • Connect data generators (e.g. scientists) with potential users (e.g. bloggers)
  4. Help the policymakers to design, implement and adjust a comprehensive set of policies for reducing greenhouse gas emissions.
  5. Help the political activists who put pressure on governments to change their policies, or to get better leaders elected when the current ones don’t act.
    • Social networking tools for activists
    • Tools for persuasion (e.g. visualizations) and community building (e.g. Essence)
  6. Help individuals and communities to lower their carbon footprints.
  7. Help the engineers who are developing new technologies for renewable energy and energy efficiency systems.
    • green IT
    • Smart energy grids
    • waste reduction
    • renewable energy
    • town planning
    • green buildings/architecture
    • transportation systems (better public transit, electric cars, etc)
    • etc