I’ve spent much of the last month preparing a major research proposal for the Ontario Research Fund (ORF), entitled “Integrated Decision Support for Sustainable Communities”. We’ve assembled a great research team, with professors from a number of different departments, across the schools of engineering, information, architecture, and arts and science. We’ve held meetings with a number of industrial companies involved in software for data analytics and 3D modeling, consultancy companies involved in urban planning and design, and people from both provincial and city government. We started putting this together in September, and were working to a proposal deadline at the end of January.

And then this week, out of the blue, the province announced that it was cancelling the funding program entirely, “in light of current fiscal challenges”. The best bit in the letter I received was:

The work being done by researchers in this province is recognized and valued. This announcement is not a reflection of the government’s continued commitment through other programs that provides support to the important work being done by researchers.

I’ve searched hard for the “other programs” they mention, but there don’t appear to be any. It’s increasingly hard to get any funding for research, especially trans-disciplinary research. Here’s the abstract from our proposal:

Our goal is to establish Ontario as a world leader in building sustainable communities, through the use of data analytics tools that provide decision-makers with a more complete understanding of how cities work. We will bring together existing expertise in data integration, systems analysis, modeling, and visualization to address the information needs of citizens and policy-makers who must come together to re-invent towns and cities as the basis for a liveable, resilient, carbon-neutral society. The program integrates the work of a team of world-class researchers, and builds on the advantages Ontario enjoys as an early adopter of smart grid technologies and open data initiatives.

The long-term sustainability of Ontario’s quality of life and economic prosperity depends on our ability to adopt new, transformative approaches to urban design and energy management. The transition to clean energy and the renewal of urban infrastructure must go hand-in-hand, to deliver improvements across a wide range of indicators, including design quality, innovation, lifestyle, transportation, energy efficiency and social justice. Design, planning and decision-making must incorporate a systems-of-systems view, to encompass the many processes that shape modern cities, and the complex interactions between them.

Our research program integrates emerging techniques in five theme areas that bridge the gap between decision-making processes for building sustainable cities and the vast sources of data on social demographics, energy, buildings, transport, food, water and waste:

  • Decision-Support and Public Engagement: We begin by analyzing the needs of different participants, and develop strategies for active engagement;
  • Visualization: We will create collaborative and immersive visualizations to enhance participatory decision-making;
  • Modelling and Simulation: We will develop a model integration framework to bring together models of different systems that define the spatio-temporal and socio-economic dynamics of cities, to drive our visualizations;
  • Data Privacy: We will assess the threats to privacy of all citizens that arise when detailed data about everyday activities is mined for patterns and identify appropriate techniques for protecting privacy when such data is used in the modeling and analysis process;
  • Data Integration and Management: We will identify access paths to the data sources needed to drive our simulations and visualizations, and incorporate techniques for managing and combining very large datasets.

These themes combine to provide an integrated approach to intelligent, data-driven planning and decision-making. We will apply the technologies we develop in a series of community-based design case studies, chosen to demonstrate how our approach would apply to increasingly complex problems such as energy efficiency, urban intensification, and transportation. Our goal is to show how an integrated approach can improve the quality and openness of the decision-making process, while taking into account the needs of diverse stakeholders, and the inter-dependencies between policy, governance, finance and sustainability in city planning.

Because urban regions throughout the world face many of the same challenges, this research will allow Ontario to develop a technological advantage in areas such as energy management and urban change, and will enable a new set of creative knowledge-based services that address the needs of communities and governments. Ontario is well placed to develop this as a competitive advantage, due to its leadership in the collection and maintenance of large datasets in areas such as energy management, social well-being, and urban infrastructure. We will leverage this investment and create a world-class capability not available in any other jurisdiction.

Incidentally, we spent much of last fall preparing a similar proposal for the previous funding round. That was rejected on the basis that we weren’t clear enough what the project outcomes would be, and what the pathways to commercialization were. For our second crack at it, we were planning to focus much more specifically on the model integration part, by developing a software framework for coupling urban system models, based on a detailed requirements analysis of the stakeholders involved in urban design and planning, with case studies on neighbourhood re-design and building energy retro-fits. Our industrial partners have identified a number of routes to commercial services that would make use of such software. Everything was coming together beautifully. *Sigh*.

Now we have to find some other source of funding for this. Contributions welcome!

Valdivino, who is working on a PhD in Brazil, on formal software verification techniques, is inspired by my suggestion to find ways to apply our current software research skills to climate science. But he asks some hard questions:

1.) If I want to Validate and Verify climate models should I forget all the things that I have learned so far in the V&V discipline? (e.g. Model-Based Testing (Finite State Machine, Statecharts, Z, B), structural testing, code inspection, static analysis, model checking)
2.) Among all V&V techniques, what can really be reused / adapted for climate models?

Well, I wish I had some good answers. When I started looking at the software development processes for climate models, I expected to be able to apply many of the formal techniques I’ve worked on in the past in Verification and Validation (V&V) and Requirements Engineering (RE). It turns out almost none of it seems to apply, at least not in any obvious way.

Climate models are built through a long, slow process of trial and error, continually seeking to improve the quality of the simulations (see here for an overview of how they’re tested). As this is scientific research, it’s unknown, a priori, what will work, what’s computationally feasible, etc. Worse still, the complexity of the earth systems being studied means it’s often hard to know which processes in the model most need work, because the relationship between particular earth system processes and the overall behaviour of the climate system is exactly what the researchers are working to understand.

Which means that model development looks most like an agile software development process, where both the requirements and the set of techniques needed to implement them are unknown (and unknowable) up-front. So they build a little, and then explore how well it works. The closest they come to a formal specification is a set of hypotheses along the lines of:

“if I change <piece of code> in <routine>, I expect it to have <specific impact on model error> in <output variable> by <expected margin> because of <tentative theory about climatic processes and how they’re represented in the model>”

This hypothesis can then be tested by a formal experiment in which runs of the model with and without the altered code become two treatments, assessed against the observational data for some relevant period in the past. The expected improvement might be a reduction in the root mean squared error for some variable of interest, or just as importantly, an improvement in the variability (e.g. the seasonal or diurnal spread).
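
To make that concrete, here’s a minimal sketch of how such an experiment might be scored (in Python, with invented placeholder numbers rather than real model output – actual experiments use long time series and more sophisticated skill metrics):

```python
import numpy as np

def rmse(model_output, observations):
    """Root mean squared error of a model run against the observational record."""
    model_output = np.asarray(model_output, dtype=float)
    observations = np.asarray(observations, dtype=float)
    return np.sqrt(np.mean((model_output - observations) ** 2))

# Placeholder values purely for illustration -- imagine monthly means of some
# output variable over a hindcast period.
obs        = [14.1, 14.3, 14.0, 14.4, 14.2]   # observations
control    = [13.6, 14.9, 13.5, 15.0, 13.7]   # model run without the code change
experiment = [13.9, 14.5, 13.8, 14.6, 14.0]   # model run with the code change

if rmse(experiment, obs) < rmse(control, obs):
    print("The change reduced the error for this variable -- hypothesis supported")
else:
    print("No improvement -- back to the tentative theory")
```

The same comparison can be made on the spread (for example, the standard deviation of the seasonal or diurnal cycle) rather than the mean error, which is the “improvement in variability” case mentioned above.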

The whole process looks a bit like this (although, see Jakob’s 2010 paper for a more sophisticated view of the process):

And of course, the central V&V technique here is full integration testing. The scientists build and run the full model to conduct the end-to-end tests that constitute the experiments.

So the closest thing they have to a specification would be a chart such as the following (courtesy of Tim Johns at the UK Met Office):

This chart shows how well the model is doing on 34 selected output variables (click the graph to see a bigger version, to get a sense of what the variables are). The scores for the previous model version have been normalized to 1.0, so you can quickly see whether the new model version did better or worse for each output variable – the previous model version is the line at “1.0” and the new model version is shown as the coloured dots above and below the line. The whiskers show the target skill level for each variable. If the coloured dots are within the whisker for a given variable, then the model is considered to be within the variability range for the observational data for that variable. Colour-coded dots then show how well the current version did: green dots mean it’s within the target skill range, yellow mean it’s outside the target range, but did better than the previous model version, and red means it’s outside the target and did worse than the previous model version.
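
The colour-coding logic is simple enough to sketch. This is not the Met Office’s actual scoring code – just my reading of the chart, assuming the scores are error measures where lower is better, normalized so that the previous model version sits at 1.0:

```python
def classify(new_error, previous_error, target_error):
    """Colour-code one output variable, following the scheme described above."""
    normalized = new_error / previous_error       # previous model version == 1.0
    target = target_error / previous_error        # the 'whisker' on the same scale
    if normalized <= target:
        return "green"     # within the target skill range
    elif normalized < 1.0:
        return "yellow"    # outside the target, but better than the previous version
    else:
        return "red"       # outside the target, and worse than the previous version

# Hypothetical example: the error on one variable drops from 2.0 to 1.7,
# against a target of 1.8 -- within the whisker, so it scores green.
print(classify(new_error=1.7, previous_error=2.0, target_error=1.8))
```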

Now, as we know, agile software practices aren’t really amenable to any kind of formal verification technique. If you don’t know what’s possible before you write the code, then you can’t write down a formal specification (the ‘target skill levels’ in the chart above don’t count – these are aspirational goals rather than specifications). And if you can’t write down a formal specification for the expected software behaviour, then you can’t apply formal reasoning techniques to determine if the specification was met.

So does this really mean, as Valdivino suggests, that we can’t apply any of our toolbox of formal verification methods? I think attempting to answer this would make a great research project. I have some ideas for places to look where such techniques might be applicable. For example:

  • One important built-in check in a climate model is ‘conservation of mass’. Some fluxes move mass between the different components of the model. Water is an obvious one – it’s evaporated from the oceans to become part of the atmosphere, is then passed to the land component as rain, thence to the rivers module, and finally back to the ocean. All the while, the total mass of water across all components must not change. Similar checks apply to salt, carbon (actually this does change due to emissions), and various trace elements. At present, such checks are built into the models as code assertions (see the sketch after this list). In some cases, flux corrections were necessary because of imperfections in the numerical routines or the geometry of the grid, although in most cases the models have improved enough that most flux corrections have been removed. But I think you could automatically extract from the code an abstracted model capturing just the ways in which these quantities change, and then use a model checker to track down and reason about such problems.
  • A more general version of the previous idea: In some sense, a climate model is a giant state-machine, but the scientists don’t ever build abstracted versions of it – they only work at the code level. If we build more abstracted models of the major state changes in each component of the model, and then do compositional verification over a combination of these models, it *might* offer useful insights into how the model works and how to improve it. At the very least, it would be an interesting teaching tool for people who want to learn about how a climate model works.
  • Climate modellers generally don’t use unit testing. The challenge is that they find it hard to write down correctness properties for individual code units. I’m not entirely clear how formal methods could help here, but someone with experience of patterns for temporal logic properties might be able to contribute. Clune and Rood have a forthcoming paper on this in November’s IEEE Software. I suspect this is one of the easiest places for software people new to climate models to get started.
  • There’s one other kind of verification test that is currently done by inspection, but might be amenable to some kind of formalization: the check that the code correctly implements a given mathematical formula. I don’t think this will be a high value tool, as the fortran code is close enough to the mathematics that simple inspection is already very effective. But occasionally a subtle bug slips through – for example, I came across an example where the modellers discovered they had used the wrong logarithm (loge in place of log10), although this was more due to lack of clarity in the original published paper, rather than a coding error.
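
As a toy illustration of the first idea, here’s the kind of conservation assertion I have in mind. This is a sketch only, not code from any real model – in a real model the fluxes pass through regridding and numerical schemes at the component interfaces, which is exactly where conservation can quietly break:

```python
def exchange_water(ocean, atmosphere, land, rivers,
                   evaporation, rain, runoff, outflow):
    """Move water between (hypothetical) model components for one coupling step,
    then assert that the total mass of water is conserved to within round-off."""
    total_before = ocean + atmosphere + land + rivers

    ocean      += outflow - evaporation    # river discharge in, evaporation out
    atmosphere += evaporation - rain
    land       += rain - runoff
    rivers     += runoff - outflow

    total_after = ocean + atmosphere + land + rivers
    assert abs(total_after - total_before) <= 1e-12 * max(abs(total_before), 1.0), \
        "water mass not conserved"
    return ocean, atmosphere, land, rivers
```

An abstracted model for a model checker would capture just these increments and decrements, without the numerics, so that a tool could search for any sequence of coupling steps in which the conservation property fails.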

Feel free to suggest more ideas in the comments!

Over the past month, we’ve welcomed a number of new researchers to the lab. It’s about time I introduced them properly:

  • Kaitlin Alexander is with us for the summer on a Centre for Global Change Science (CGCS) undergraduate internship. She’s doing her undergrad at the University of Manitoba in applied math, and is well known to many of us already through her blog, ClimateSight. Kaitlin will be studying the software architecture of climate models, and she’s already written a few blog posts about her progress.
  • Trista Mueller is working in the lab for the summer as a part-time undergrad intern – she’s a computer science major at U of T, and a regular volunteer at Hot Yam. Trista is developing a number of case studies for our Inflo open calculator, and helping to shake out a few bugs in the process.
  • Elizabeth Patitsas joins us from UBC, where she just graduated with a BSc honours degree in Integrated Science – she’s working as a research assistant over the summer and will be enrolling in grad school here in September. Elizabeth has been studying how we teach (or fail to teach) key computer science skills, both to computer science students, and to other groups, such as physical scientists. Over the summer, she’ll be developing a research project to identify which CS skills are most needed by scientists, with the eventual goal of building and evaluating a curriculum based on her findings.
  • Fabio da Silva is a professor of computer science at the Federal University of Pernambuco (UFPE) in Brazil, and is joining us this month for a one year sabbatical. Fabio works in empirical software engineering and will be exploring how software teams coordinate their work, in particular the role of self-organizing teams.

Great news – I’ve had my paper accepted for the 2010 FSE/SDP Workshop on the Future of Software Engineering Research, in Santa Fe, in November! The workshop sounds very interesting – 2 days intensive discussion on where we as a research community should be going. Here’s my contribution:

Climate Change: A Grand Software Challenge

Abstract

Software is a critical enabling technology in nearly all aspects of climate change, from the computational models used by climate scientists to improve our understanding of the impact of human activities on earth systems, through to the information and control systems needed to build an effective carbon-neutral society. Accordingly, we, as software researchers and software practitioners, have a major role to play in responding to the climate crisis. In this paper we map out the space in which our contributions are likely to be needed, and suggest a possible research agenda.

Introduction

Climate change is likely to be the defining issue of the 21st century. The science is unequivocal – concentrations of greenhouse gases are rising faster than in any previous era in the earth’s history, and the impacts are already evident [1]. Future impacts are likely to include a reduction of global food and water supplies, more frequent extreme weather events, sea level rise, ocean acidification, and mass extinctions [10]. In the next few decades, serious impacts are expected on human health from heat stress and vector-borne diseases [2].

Unfortunately, the scale of the systems involved makes the problem hard to understand, and hard to solve. For example, the additional carbon in greenhouse gases tends to remain in atmosphere-ocean circulation for centuries, which means past emissions commit us to further warming throughout this century, even if new emissions are dramatically reduced [12]. The human response is also very slow – it will take decades to complete a worldwide switch to carbon-neutral energy sources, during which time atmospheric concentrations of greenhouse gases will continue to rise. These lags in the system mean that further warming is inevitable, and catastrophic climate disruption is likely under a business-as-usual scenario.

Hence, we face a triple challenge: mitigation to avoid the worst climate change effects by rapidly transitioning the world to a low-carbon economy; adaptation to re-engineer the infrastructure of modern society so that we can survive and flourish on a hotter planet; and education to improve public understanding of the inter-relationships of the planetary climate system and human activity systems, and of the scale and urgency of the problem.

These challenges are global in nature, and pervade all aspects of society. To address them, researchers, engineers, policymakers, and educators from many different disciplines need to come to the table and ask what they can contribute. In the short term, we need to deploy, as rapidly as possible, existing technology to produce renewable energy [8] and design government policies and international treaties to bring greenhouse gas emissions under control. In the longer term, we need to complete the transition to a global carbon-neutral society by the latter half of this century [1]. Meeting these challenges will demand the mobilization of entire communities of expertise.

Software plays a major role, both as part of the problem and as part of the solution. A large part of the massive growth of energy consumption in the past few decades is due to the manufacture and use of computing and communication technologies, and the technological advances they make possible. Energy efficiency has never been a key requirement in the development of software-intensive technologies, and so there is a very large potential for efficiency improvements [16].

But software also provides the critical infrastructure that supports the scientific study of climate change, and the use of that science by society. Software allows us to process vast amounts of geoscientific data, to simulate earth system processes, to assess the implications, and to explore possible policy responses. Software models allow scientists, activists and policymakers to share data, explore scenarios, and validate assumptions. The extent of this infrastructure is often invisible, both to those who rely on it, and to the general public [6]. Yet weaknesses in this software (whether real or imaginary) will impede our ability to make progress in tackling climate change. We need to solve hard problems to improve the way that society finds, assesses, and uses knowledge to support collective decision-making.

In this paper, we explore the role of the software community in addressing these challenges, and the potential for software infrastructure to bridge the gaps between scientific disciplines, policymakers, the media, and public opinion. We also identify critical weaknesses in our ability to develop and validate this software infrastructure, particularly as traditional software engineering methods are poorly adapted to the construction of such a vast, evolving knowledge-intensive software infrastructure.

Now read the full paper here (don’t worry, it’s only four pages, and you’ve now already read the first one!)

Oh, and many thanks to everyone who read drafts of this and sent me comments!

I’ve speculated before about the factors that determine the length of the release cycle for climate models. The IPCC assessment process, which operates on a 5-year cycle, tends to dominate everything. But there are clearly other rhythms that matter too. I had speculated that the 6-year gap between the release of CCSM3 and CCSM4 could largely be explained by the demands of the IPCC cycle; however, the NCAR folks might have blown holes in that idea by making three new releases in the last six months; clearly other temporal cycles are at play.

In discussion over lunch yesterday, Archer pointed me to the paper “Exploring Collaborative Rhythm: Temporal Flow and Alignment in Collaborative Scientific Work” by Steven Jackson and co, who point out that while the role of space and proximity has been widely studied in collaborative work, the role of time and of temporal constraints has not. They set out four different kinds of temporal rhythm that are relevant to scientific work:

  • phenomenal rhythms, arising from the objects of study – e.g. annual and seasonal cycles strongly affect when fieldwork can be done in biology/ecology; the development of a disease in an individual patient affects the flow of medical research;
  • institutional rhythms, such as the academic calendar, funding deadlines, the timing of conferences and paper deadlines, etc.
  • biographical rhythms, arising from individual needs – family time, career development milestones, illnesses and vacations, etc.
  • infrastructural rhythms, arising from the development of the buildings and equipment that scientific research depends on. Examples include the launch, operation and expected life of a scientific instrument on a satellite, the timing of software releases, and the development of classification systems and standards.

The paper gives two interesting examples of problems in aligning these rhythms. First, studying long-term phenomena such as river flow on the timescale of short-term research grants led to mistakes: data collected during an unusually wet period in the early 20th century led to serious deficiencies in water management plans for the Colorado river. Second, for NASA’s Mars mission MER, the decision was taken to put the support team on “Mars time”, as the Martian day is 2.7% longer than the earth day. But as the team’s daily work cycle drifted from the normal earth day, serious tensions arose between the family and social needs of the project team and the demands of the project rhythm.

Here’s another example that fascinated me when I was at the NASA software verification lab in the 90s. The Cassini spacecraft took about six years to get to Saturn. Rather than develop all the mission software prior to launch, NASA took the decision to develop only the minimal software needed for launch and navigation, and delayed the start of development of the mission software until just prior to arrival at Saturn. The rationale was that they didn’t want a six-year gap between development and use of this software, during which time the software teams might disperse – they needed the teams in place, with recent familiarity with the code, at the point the main science missions started.

For climate science, the IPCC process is clearly a major institutional rhythm, but the infrastructural rhythms that arise in model development interact with this in complex ways. I need to spend time looking at the other rhythms as well.

Congratulations to Jorge, who passed the first part of his PhD thesis defense yesterday with flying colours. Jorge’s thesis is based on a whole series of qualitative case studies of different software development teams (links go to ones he’s already published):

  • 7 successful small companies (under 50 employees) in the Toronto region;
  • 9 scientific software development groups, in an academic environment;
  • 2 studies of large companies (IBM and Microsoft);
  • 1 detailed comparative study of a company using Extreme Programming (XP) versus a similar-sized company that uses a more traditional development process (both building similar types of software for similar customers);

We don’t have anywhere near enough detailed case studies in software engineering – most claims for the effectiveness of various approaches to software development are based on little more than marketing claims and anecdotal evidence. There has been a push in the last decade or so for laboratory experiments, which are usually conducted along the lines of experiments in psychology: recruit a set of subjects, assign them a programming task, and measure the difference in variables like productivity or software quality when half of them are given some new tool or technique. While these experiments are sometimes useful for insights into how individual programmers work on small tasks, they really don’t tell us much about software development in the wild, where, as Parnas puts it, the interesting challenges are in multi-person development of multi-version software over long time scales. Jorge cites a particular example in his thesis of a controlled study of pair programming, which purports to show that pair programming lowers productivity. Except that it shows no such thing – any claimed benefits of pair programming are unlikely to emerge with subjects who are put together for a single day, but who otherwise have no connection with one another, and no shared context (like, for example, a project they are both committed to).

Each of Jorge’s case studies is interesting, but to me, the theory he uses them to develop is even more interesting. He starts by identifying three different traditions to the study of software development:

  • The process view, in which software construction is treated like a production line, and the details of the individuals and teams who do the construction are abstracted away, allowing researchers to talk about processes and process models, which, it is assumed, can be applied in any organizational context to achieve a predictable result. This view is predominant in the SE literature. The problem, of course, is that the experience and skills of individuals and teams do matter, and the focus on processes is a poor way to understand how software development works.
  • The information flow view, in which much of software development is seen as a problem in sharing information across software teams. This view has become popular recently, as it enables the study of electronic repositories of team communications as evidence of interaction patterns across the team, and leads to a set of theories about how well patterns of communication acts match the technical dependencies in the software. The view is appealing because it connects well with what we know about interdependencies within the software, where clean interfaces and information hiding are important. Jorge argues that the problem with this view is that it fails to distinguish between successful and unsuccessful acts of communication. It assumes that communication is all about transmitting and receiving information, and it ignores problems in reconstructing the meaning of a message, which is particularly hard when the recipient is in a remote location, or is reading it months or years later.
  • The third view is that software development is largely about the development of a shared understanding within teams. This view is attractive because it takes seriously the intensive cognitive effort of software construction, and emphasizes the role of coordination, and the way that different forms of communication can impact coordination. It should be no surprise that Jorge and I both prefer this view.

Then comes the most interesting part. Jorge points out that software teams need to develop a shared understanding of goals, plans, status and context, and that four factors will strongly impact their success in this: proximity (how close the team members are to each other – being in the same room is much more useful than being in different cities); synchrony (talking to each other in near-realtime is much more useful than writing documents to be read at some later time); symmetry (coordination and information sharing are done best by the people whom they most concern, rather than being imposed by, say, managers); and maturity (it really helps if a team has an established set of working relationships and a shared culture).

This theory leads to a reconceptualization of many aspects of software development, such as the role of tools, the layout of physical space, the value of documentation, and the impact of growth on software teams. But you’ll have to read the thesis to get the scoop on all these…

I’ve been busy the last few weeks setting up the travel details for my sabbatical. My plan is to visit three different climate modeling centers, to do a comparative study of their software practices. The goal is to understand how the software engineering culture and practices vary across different centers, and how the differences affect the quality and flexibility of the models. The three centers I’ll be visiting are:

I’ll spend 4 weeks at each centre, starting in July, running through to October, after which I’ll spend some time analyzing the data and writing up my observations. Here’s my research plan…

Our previous studies at the UK Met Office Hadley Center suggest that there are many features of software development for earth system modeling that make it markedly different from other types of software development, and which therefore affect the applicability of standard software engineering tools and techniques. Tools developed for commercial software tend not to cater for the demands of working with high performance code for parallel architectures, and usually do not fit well with the working practices of scientific teams. Scientific code development has challenges that don’t apply to other forms of software: the need to keep track of exactly which version of the program code was used in a particular experiment, the need to re-run experiments with precisely repeatable results, the need to build alternative versions of the software from a common code base for different kinds of experiments. Checking software “correctness” is hard because frequently the software must calculate approximate solutions to numerical problems for which there is no analytical solution. Because the overall goal is to build code to explore a theory, there is no oracle for what the outputs should be, and therefore conventional approaches to testing (and perhaps code quality in general) don’t apply.
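
To make the reproducibility point concrete, here’s a minimal sketch of the kind of provenance record a run needs if it is to be repeated precisely later – the exact code version, plus a fingerprint of the experiment configuration. This is my own illustration (assuming the model source sits in a git repository), not a tool that any of these centres actually uses:

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone

def record_run(config_path, log_path="experiment_log.jsonl"):
    """Append a provenance record for one model run to a simple JSON-lines log."""
    # Exact code version used for this run (assumes a git working copy).
    code_version = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True).stdout.strip()

    # Fingerprint of the experiment configuration (namelists, parameter settings, ...).
    with open(config_path, "rb") as f:
        config_sha256 = hashlib.sha256(f.read()).hexdigest()

    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "code_version": code_version,
        "config_file": config_path,
        "config_sha256": config_sha256,
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(record) + "\n")
    return record
```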

Despite this potential mismatch, the earth system modeling community has adopted (and sometimes adapted) many tools and practices from mainstream software engineering. These include version control, bug tracking, automated build and test processes, release planning, code reviews, frequent regression testing, and so on. Such tools may offer a number of potential benefits:

  • they may increase productivity by speeding up the development cycle, so that scientists can get their ideas into working code much faster;
  • they may improve verification, for example using code analysis tools to identify and remove (or even prevent) software errors;
  • they may improve the understandability and modifiability of computational models (making it easier to continue to evolve the models);
  • they may improve coordination, allowing a broader community to contribute to and make use of a shared code base for a wider variety of experiments;
  • they may improve scalability and performance, allowing code to be configured and optimized for a wider variety of high performance architectures (including massively parallel machines), and for a wider variety of grid resolutions.

This study will investigate which tools and practices have been adopted at the different centers, identify differences and similarities in how they are applied, and, as far as is possible, assess the effectiveness of these practices. We will also attempt to characterize the remaining challenges, and identify opportunities where additional tools and techniques might be adopted.

Specific questions for the study include:

  1. Verification – What techniques are used to ensure that the code matches the scientists’ understanding of what it should do? In traditional software engineering, this is usually taken to be a question of correctness (does the code do what it is supposed to?); however, for exploratory modeling it is just as often a question of understanding (have we adequately understood what happens when the model runs?). We will investigate the practices used to test the code, to validate it against observational data, and to compare different model runs against one another, and assess how effective these are at eliminating errors of correctness and errors of understanding.
  2. Coordination – How are the contributions from across the modeling community coordinated? In particular, we will examine the challenges of synchronizing the development processes for coupled models with the development processes of their component models, and how the differences in the priorities of different, overlapping communities of users affect this coordination.
  3. Division of responsibility – How are the responsibilities for coding, verification, and coordination distributed between different roles in the organization? In particular, we will examine how these responsibilities are divided across the scientists and other support roles such as ‘systems’ or ‘software engineering’ personnel. We will also explore expectations on the quality of contributed code from end-user scientists, and the potential for testing and review practices to affect the quality of contributed code.
  4. Planning and release processes – How do modelers decide on priorities for model development, how do they decide which changes to tackle in a particular release of the model, and how do they navigate between computational feasibility and scientific priorities? We will also investigate how the change process is organized, and how changes are propagated to different sub-communities.
  5. Debugging – How do scientists currently debug the models, what types of bugs do they currently find in their code, and how do they find them? In particular, we will develop a categorization of model errors, to use as a basis for subsequent studies into new techniques for detecting and/or eliminating such errors.

The study will be conducted through a mix of interviews and observational studies, focusing on particular changes to the model codes developed at each center. The proposed methodology is to identify a number of candidate code changes, including recently completed changes and current work-in-progress, and to build a “life story” for each such change, covering how each change was planned and conducted, what techniques were applied, and what problems were encountered. This will lead to a more detailed description of the current software development practices, which can then be compared and contrasted with studies of practices used for other types of software. The end result will be an identification of opportunities where existing tools and techniques can be readily adapted (with some clear indication of the potential benefits), along with a longer-term research agenda for problem areas where no suitable solutions currently exist.

I posted a while back the introduction to a research proposal in climate change informatics. And I also posted a list of potential research areas, and a set of criteria by which we might judge climate informatics tools. But I didn’t say what kinds of things we might want climate informatics tools to do. Here’s my first attempt, based on a slide I used at the end of my talk on usable climate science:

What do we want the tools to support?

What I was trying to lay out on this slide was a wide range of possible activities for which we could build software tools, combining good visualizations, collaborative support, and compelling user interface design. If we are to improve the quality of the public discourse on climate change, and support the kind of collective decision making that leads to effective action, we need better tools for all four of these areas:

  • Improve the public understanding of the basic science. Much of this is laid out in the IPCC reports, but to most people these are “dead tree science” – lots of thick books that very few people will read. So, how about some dynamic, elegant and cool tools to convey:
    • The difference between emissions and concentrations (a toy sketch of this distinction appears after this list).
    • The various sources of emissions and how we know about them from detection/attribution studies.
    • The impacts of global warming on your part of the world – health, food and water, extreme weather events, etc.
    • The various mitigation strategies we have available, and what we know about the cost and effectiveness of each.
  • Achieve a better understanding of how the science works, to allow people to evaluate the nature of the evidence about climate change:
    • How science works, as a process of discovery, including how scientists develop theories, and how they correct mistakes.
    • What climate models are and how they are used to improve our understanding of climate processes.
    • How the peer-review process works, and why it is important, both as a filter for poor research, and a way of assessing the credentials of scientists.
    • What it means to be expert in a particular field, why expertise matters, and why expertise in one area of science doesn’t necessarily mean expertise in another.
  • Tools to support critical thinking, to allow people to analyze the situation for themselves:
    • The importance of linking claims to sources of evidence, and the use of multiple sources of evidence to test a claim.
    • How to assess the credibility of a particular claim, and the credibility of its source (desperately needed for appropriate filtering of ‘found’ information on the internet).
    • Systems Thinking – because reductionist approaches won’t help. People need to be able to recognize and understand whole systems and the dynamics of systems-of-systems.
    • Understanding risk – because the inability to assess risk factors is a major barrier to effective action.
    • Identifying the operation of vested interests. Because much of the public discourse isn’t about science or politics. It’s about people with vested interests attempting to protect those interests, often at the expense of the rest of society.
  • And finally, none of the above makes any difference if we don’t also provide tools to support effective action:
    • How to prioritize between short-term and long term goals.
    • How to identify which kinds of personal action are important and effective.
    • How to improve the quality of policy-making, so that policy choices are linked to the scientific evidence.
    • How to support consensus building and democratic action for collective decision making, at the level of communities, cities, nations, and globally.
    • Tools to monitor effectiveness of policies and practices once they are implemented.
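
As an example of the very first item on that list (the difference between emissions and concentrations), even a toy accumulation model makes the point vividly: concentrations keep rising as long as emissions stay above zero, even while emissions themselves are falling. The parameter values below are rough, rounded illustrations, not real carbon-cycle numbers:

```python
def concentration_trajectory(initial_ppm, annual_emissions_gtc,
                             airborne_fraction=0.45, gtc_per_ppm=2.12):
    """Toy accumulation model: each year, the fraction of emissions not taken up
    by natural sinks adds to the atmospheric concentration. Rough, rounded
    parameter values -- a real carbon-cycle model is far richer than this."""
    ppm = initial_ppm
    trajectory = [round(ppm, 1)]
    for emitted in annual_emissions_gtc:
        ppm += emitted * airborne_fraction / gtc_per_ppm
        trajectory.append(round(ppm, 1))
    return trajectory

# Emissions fall steadily from 10 GtC/yr to 1 GtC/yr over a decade...
falling_emissions = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
# ...yet the concentration rises every single year, because emissions are still positive.
print(concentration_trajectory(initial_ppm=390, annual_emissions_gtc=falling_emissions))
```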

I’m delighted to announce that my student, Jonathan Lung has started a blog. Jonathan’s PhD is on how we reduce energy consumption in computing. Unlike much work on green IT, he’s decided to focus on the human behavioural aspects of this, rather than hardware optimization. His first two posts are fascinating:

  • How to calculate whether you should print something out or read it on the screen (I’ve sketched the shape of this calculation after this list). Since he first did these calculations, we’ve been discussing how to turn this kind of analysis into an open, shared, visual representation that others can poke and prod, to test the assumptions, customize them to their own context, and discuss. We’ll share more of our design ideas for such a tool in due course.
  • An analysis of whether the iPad is as green as Apple’s marketing claims. Which is, in effect, a special case of the more general calculation of print vs. screen. Oh, and his analysis also makes me feel okay about my desire to own an iPad…
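
I won’t reproduce Jonathan’s numbers here (read his post for those), but the shape of the print-vs-screen calculation is easy to sketch: compare the roughly fixed energy cost of printing against the time-dependent energy cost of keeping a screen on while you read, and find the break-even reading time. Every parameter value below is a placeholder, not a figure from his analysis:

```python
def break_even_minutes(pages, energy_per_page_wh, device_power_w):
    """Reading time beyond which printing uses less energy than reading on screen.

    pages:              length of the document
    energy_per_page_wh: paper + printing energy per page, in watt-hours (placeholder)
    device_power_w:     power draw of the reading device, in watts (placeholder)
    """
    printing_energy_wh = pages * energy_per_page_wh
    return 60.0 * printing_energy_wh / device_power_w

# Placeholder values, purely to show the structure of the comparison: on these
# made-up numbers, reading a 10-page document for more than 100 minutes on a
# 30 W device would make the printout the lower-energy choice.
print(break_even_minutes(pages=10, energy_per_page_wh=5.0, device_power_w=30.0))
```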

As Jorge points out, this almost completes my set of grad student bloggers. We’ve been experimenting with blogging as a way of structuring research – a kind of open notebook science. Personally, I find it extremely helpful as a way of forcing me to write down ideas (rather than just thinking them), and for furthering discussion of ideas through the comments. And, just as importantly, it’s a way of letting other researchers know about what you’re working on – grad students’ future careers depend on them making a name for themselves in their chosen research community.

Of course, there’s a downside: grad students tend to worry about being “scooped”, by having someone else take their ideas, do the studies, and publish them first. My stock response is something along the lines of “research is 99% perspiration and 1% inspiration” – the ideas themselves, while important, are only a tiny part of doing research. It’s the investigation of the background literature and the implementation (designing an empirical study, building a tool, developing a new theory, etc.) that matters. Give the same idea to a bunch of different grad students, and they will all do very different things with it, all of which (if the students are any good) ought to be publishable.

On balance, I think the benefits of blogging your way through grad school vastly outweigh the risks. Now if only my students updated their blogs more regularly… (hint, hint).

When I was visiting MPI-M earlier this month, I blogged about the difficulty of documenting climate models. The problem is particularly pertinent to questions of model validity and reproducibility, because the code itself is the result of a series of methodological choices by the climate scientists, which are entrenched in their design choices, and eventually become inscrutable. And when the code gets old, we lose access to these decisions. I suggested we need a kind of literate programming, which sprinkles the code among the relevant human representations (typically bits of physics, formulas, numerical algorithms, published papers), so that the emphasis is on explaining what the code does, rather than preparing it for a compiler to digest.

The problem with literate programming (at least in the way it was conceived) is that it requires programmers to give up using the program code as their organising principle, and maybe to give up traditional programming languages altogether. But there’s a much simpler way to achieve the same effect: provide an organising structure for existing programming languages and tools that mixes in non-code objects in an intuitive way. Imagine you had an infinitely large sheet of paper, and could zoom in and out, and scroll in any direction. Your chunks of code are laid out on the paper, in a spatial arrangement that means something to you, such that the layout helps you navigate. Bits of documentation, published papers, design notes, data files, parameterization schemes, etc can be placed on the sheet, near to the code that they are relevant to. When you zoom in on a chunk of code, the sheet becomes a code editor; when you zoom in on a set of math formulae, it becomes a LaTeX editor; and when you zoom in on a document it becomes a word processor.

Well, Code Canvas, a tool under development in Rob Deline’s group at Microsoft Research, does most of this already. The code is laid out as though it was one big UML diagram, but as you zoom in you move fluidly into a code editor. The whole thing appeals to me because I’m a spatial thinker. Traditional IDEs drive me crazy, because they separate the navigation views from the code, and force me to jump from one pane to another to navigate. In the process, they hide the inherent structure of a large code base, and constrain me to see only a small chunk at a time. Which means these tools create an artificial separation between higher level views (e.g. UML diagrams) and the code itself, sidelining the diagrammatic representations. I really like the idea of moving seamlessly back and forth between the big picture views and actual chunks of code.

Code Canvas is still an early prototype, and doesn’t yet have the ability to mix in other forms of documentation (e.g. LaTeX) on the sheet (or at least not in any demo Microsoft are willing to show off), but the potential is there. I’d like to explore how we might take an idea like this and customize it for scientific code development, where there is less of a strict separation of code and data than in other forms of programming, and where the link to published papers and draft reports is important. The infinitely zoomable paper could provide an intuitive unifying tool to bring all these different types of object together in one place, to be managed as a set. And the use of spatial memory to help navigate will be helpful when the set of things gets big.

I’m also interested in exploring the idea of using this metaphor for activities that don’t involve coding – for example complex decision-support for sustainability, where you need to move between spreadsheets, graphs & charts, models runs, and so on. I would lay out the basic decision task as a graph on the sheet, with sources of evidence connecting into the decision steps where they are needed. The sources of evidence could be text, graphs, spreadsheet models, live datafeeds, etc. And as you zoom in over each type of object, the sheet turns into the appropriate editor. As you zoom out, you get to see how the sources of evidence contribute to the decision-making task. Hmmm. Need a name for this idea. How about DecisionCanvas?

Update: Greg also pointed me to CodeBubbles and Intentional Software

I picked up Stephen Schneider’s “Science as a Contact Sport” to read on travel this week. I’m not that far into it yet (it’s been a busy trip), but was struck by a comment in chapter 1 about how he got involved in climate modeling. In the late 1960’s, he was working on his PhD thesis in plasma physics, and (in his words) “knew how to calculate magneto-hydro-dynamic shocks at 20,000 times the speed of sound”, with “one-and-a-half dimensional models of ionized gases” (Okay, I admit it, I have no idea what that means, but it sounds impressive)…

…Anyway, along comes Joe Smagorinsky from Princeton, to give a talk on the challenges of modeling the atmosphere as a three-dimensional fluid flow problem on a rotating sphere, and Schneider is immediately fascinated by both the mathematical challenges and the potential of this as important and useful research. He goes on to talk about the early modeling work and the mis-steps made in the early 1970’s on figuring out whether the global cooling from aerosols would be stronger than the global warming from greenhouse gases, and getting the relative magnitudes wrong by running the model without including the stratosphere. And how global warming denialists today like to repeat the line about “first you predicted global cooling, then you predicted global warming…” without understanding that this is exactly how science proceeds, by trying stuff, making mistakes, and learning from them. Or as Ms. Frizzle would say, “Take chances! Make Mistakes! Get Messy!” (No, Schneider doesn’t mention Magic School Bus in the book. He’s too old for that).

Anyway, I didn’t get much further reading the chapter, because my brain decided to have fun with the evocative phrase “modeling the atmosphere as a three-dimensional fluid flow problem on a rotating sphere”, which is perhaps the most succinct description I’ve heard yet of what a climate model is. And what would happen if Ms. Frizzle got hold of this model and encouraged her kids to “get messy” with it. What would they do?

Let’s assume the kids can run the model, and play around with its settings. Let’s assume that they have some wonderfully evocative ways of viewing the outputs too, such as these incredible animations of precipitation from a model (my favourite is “August”) from NCAR, and where greenhouse gases go after we emit them (okay, the latter was real data, rather than a model, but you get the idea).

What experiments might the kids try with the model? How about:

  1. Stop the rotation of the earth. What happens to the storms? Why? (we’ll continue to ask “why?” for each one…)
  2. Remove the land-masses. What happens to the gulf stream?
  3. Remove the ice at the poles. What happens to polar temperatures? Why? (we’ll switch to a different visualization for this one)
  4. Remove all CO2 from the atmosphere. How much colder is the earth? Why? What happens if you leave it running?
  5. Erupt a whole bunch of volcanoes all at once. What happens? Why? How long does the effect last? Does it depend on how many volcanoes you use?
  6. Remove all human activity (i.e. GHG emissions drop to zero instantly). How long does it take for the greenhouse gases to return to the levels they were at before the industrial revolution? Why?
  7. Change the tilt of the earth’s axis a bit. What happens to seasonal variability? Why? Can you induce an ice age? If so, why?
  8. Move the earth a little closer to the sun. What happens to temperatures? How long do they take to stabilize? Why that long?
  9. Burn all the remaining (estimated) reserves of fossil fuels all at once. What happens to temperatures? Sea levels? Polar ice?
  10. Set up the earth as it was in the last ice age. How much colder are global temperatures? How much colder are the poles? Why the difference? How much colder is it where you live?
  11. Melt all the ice at the poles (by whatever means you can). What happens to the coastlines near where you live? Over the rest of your continent? Which country loses the most land area?
  12. Keep CO2 levels constant at the level they were at in 1900, and run a century-long simulation. What happens to temperatures? Now try keeping aerosols constant at 1900 levels instead. What happens? How do these two results compare to what actually happened?

Now compare your answers with what the rest of the class got. And discuss what we’ve learned. [And finally, for the advanced students – look at the model software code, and point to the bits that are responsible for each outcome… Okay, I’m just kidding about that bit. We’d need literate code for that].

Okay, this seems like a worthwhile project. We’d need to wrap a desktop-runnable model in a simple user interface with the appropriate switches and dials. But is there any model out there that would come anywhere close to being useable in a classroom situation for this kind of exercise?
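
I don’t know of anything classroom-ready that exposes a full GCM this way, but even a zero-dimensional energy balance model captures the spirit of a couple of these experiments (numbers 4 and 8, say). Here’s a toy sketch using the standard one-layer greenhouse approximation – emphatically not a climate model, but enough to let kids turn the dials:

```python
SIGMA = 5.670e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

def surface_temperature(solar_constant=1361.0, albedo=0.3, emissivity=0.78):
    """Zero-dimensional energy balance with a one-layer greenhouse atmosphere.

    solar_constant: incoming sunlight in W/m^2 (raise it to 'move the earth closer to the sun')
    albedo:         fraction of sunlight reflected (lower it to mimic melting the polar ice)
    emissivity:     strength of the atmosphere's infrared absorption
                    (push it towards zero to 'remove all CO2 from the atmosphere')
    """
    absorbed = (1.0 - albedo) * solar_constant / 4.0        # global-average W/m^2
    effective_temp = (absorbed / SIGMA) ** 0.25             # ~255 K with the defaults
    return effective_temp * (2.0 / (2.0 - emissivity)) ** 0.25

print(surface_temperature())                      # ~288 K: roughly today's earth
print(surface_temperature(emissivity=0.0))        # experiment 4: no greenhouse effect
print(surface_temperature(solar_constant=1500.0)) # experiment 8: closer to the sun
```

Of course, none of the interesting dynamics (storms, ocean currents, ice ages) exist in a zero-dimensional model, which is exactly why it would be so much more fun to wrap a real GCM this way.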

(feel free to suggest more experiments in the comments…)

Justyna sent me a pointer to another group of people exploring an interesting challenge for computing and software technology: The Crisis Mappers Net. I think I can characterize this as another form of collective intelligence, harnessed to mobile networks and visual analytics, to provide rapid response to humanitarian emergencies. And of course, after listening to George Monbiot in the debate last night, I’m convinced that over the coming decades, the crises to be tackled will increasingly be climate related (forest fires, floods, droughts, extreme weather events, etc).

While at Microsoft last week, Gina Venolia introduced me to George. Well, not literally, as he wasn’t there, but I met his proxy. Gina and co have been experimenting with how to make a remote team member feel part of the team, without the frequent travel, in the Embodied Social Proxies project. The current prototype kit comes pretty close to getting that sense of presence (Gina also has a great talk on this project):

The kit cost about $4000 to put together, and includes:

  • a monitor for a life-sized headshot;
  • two cameras – a very wide angle camera to capture the whole room, plus a remote control camera to pan and zoom (e.g. for seeing slides);
  • noise canceling telecom unit for audio;
  • adjustable height rig to allow George to sit or stand;
  • and of course, wheels, so he can be pushed around to different workspaces.

Now, the first question I had was: could this solve our problem of allowing remote participants to join in a hands-on workshop at a conference? At the last workshop on software research and climate change, we had the great idea that remote participants could appear on a laptop via skype, and be carried around between breakout sessions by a local buddy. Of course, skype wasn’t up to the job, and our remote participants ended up having their own mini-workshop. I suspect the wireless internet at most conferences won’t handle this either – the connections tend to get swamped.

But I still think the idea has legs (well, not literally!). See, $4000 is about what it would cost in total travel budget to send someone to Cape Town for the next ICSE. If we can buy much of the kit we need to create a lightweight version of the ESP prototype locally in Cape Town, and use a laptop for the monitor, we could even throw away much of the kit at the end of the conference and still come in under the typical travel budget (not that we would throw it away though!). I think the biggest challenges will be getting a reliable enough internet connection (we’ll probably need to set up our own routers), and figuring out how to mount the kit onto some local furniture for some degree of portability.

Well, if we’re serious about finding solutions to climate change, we have to explore ideas like this.

PS Via this book (thx, Greg) I learned the word “detravelization”. No idea if the chapter on detravelization is any good (because Safari books online doesn’t work at UofT), but I’m clearly going to have a love-hate relationship with a word that’s simultaneously hideous and perfectly apt.

Criteria for tools that communicate climate science to a broader audience (click for bigger)

I gave my talk last night to TorCHI on Usable Climate Science. I think it went down well, especially considering that I hadn’t finished preparing the slides, and had just gotten off the plane from Seattle. I’ll post the slides soon, once I have a chance to tidy them up. But, judging by the questions and comments, one slide in particular went down well.

I put this together when trying to organize my thoughts about what’s wrong with a number of existing tools/websites in the space of climate science communication. I’ll post the critique of existing tools soon, but I guess I should first explain the criteria:

  • Trustworthy (i.e. the audience must be able to trust the information content):
    • Collective Curation captures the idea that a large community of people is responsible for curating the information content. The extreme example is, of course, wikipedia.
    • Open means that we can get inside and see how it’s all put together. Open source and open data probably need no explanation, but I also want to get across the idea of “open reasoning” – for example, users need access to the calculations and assumptions built into any tool that gives recommendations for energy choices.
    • Provenance means that we know where the information came from, and can trace it back to source. Especially important is the ability to trace back to peer-reviewed scientific literature, or to trusted experts.
    • And the tool should help to build a community by connecting people with one another, through sharing of their knowledge.
  • Appropriate (i.e. the form and content of the information must be appropriate to the intended audience(s)):
    • Accessible for audience – information must build on what people already know, and be provided in a form that allows them to assimilate it (Vygotsky’s Zone of Proximal Development captures this idea well).
    • Contextualized means that the tool provides information that is appropriate to the audience’s specific context, or can be customized for that context. For example, information about energy choices depends on location.
    • Zoomable means that different users can zoom in for more detailed information if they wish. I particularly like the idea of infinite zoom shown off well in this demo. But I don’t just mean visually zoomable – I mean zoomable in terms of information detail, so people who want to dive into the detailed science can if they wish.
  • Effective (i.e. actually works at communicating information and stimulating action):
    • Narrative force is something that seems to be missing from most digital media – the tool must tell a story rather than just provide information.
    • Get the users to form the right mental models so that they understand the science as more than just facts and figures, and understand how to think about the risks.
    • Support exploration to allow users to follow their interests. Most web-based tools are good at this, but often at the expense of narrative force.
    • Give the big picture. For climate change this is crucial – we need to encourage systems thinking if we’re ever going to get good at collective decision making.
  • Compelling (i.e. something that draws people in):
    • Cool, because coolness is how viral marketing works. If it’s cool people will tell others about it.
    • Engaging, so that people want to use it and are drawn in by it.
    • Fun and Entertaining, because we’re often in danger of being too serious. This is especially important for stuff targeted at kids. If it’s not as much fun as the latest video games, then we’re already losing their attention.

During the talk, one of the audience members suggested adding actionable to my list, i.e. it actually leads to appropriate action, changes in behaviour, etc. I’m kicking myself for forgetting this, and can’t now decide whether it belongs under effective, or is an entirely new category. I’ll welcome suggestions.

I’ve finally managed to post the results of our workshop on Software Research and Climate Change, held at Onward/Oopsla last month. We did lots of brainstorming, and attempted to cluster the ideas, as you can see in the photos of our sticky notes.

After the workshop, I attempted to boil down the ideas even further, and came up with three clusters of research:

  1. Green IT (i.e. optimize power consumption of software and all things controlled by software – also known as “make sure ICT is no longer part of the problem”). Examples of research in this space include:
    • Power aware computing (better management of power in all devices from mobile to massive installations).
    • Green controllers (smart software to optimize and balance power consumption in everything that consumes power).
    • Sustainability as a first class requirement in software system design.
  2. Computer-Supported Collaborative Science (also known as eScience – i.e. software to support and accelerate inter-disciplinary science in climatology and related disciplines). Examples of research in this space include:
    • Software engineering tools/techniques for climate modellers
    • Data management for data-intensive science
    • Open Notebook science (electronic notebooks)
    • Social network tools for knowledge finding and expertise mapping
    • Smart ontologies
  3. Software to improve global collective decision making (which includes everything from tools to improve public understanding of science through to decision support at multiple levels: individual, community, government, inter-governmental,…). Examples of research in this space include:
    • Simulations, games, educational software to support public understanding of the science (usable climate science)
    • massive open collaborative decision support
    • carbon accounting for corporate decision making
    • systems analysis of sustainability in human activity systems (requires multi-level systems thinking)
    • better understanding of the processes of social epistemology

My personal opinion is that (1) is getting to be a crowded field, which is great, but will only yield up to about 15% of the 100% reduction in carbon emissions we’re aiming for. (2) has been mapped out as part of several initiatives in the UK and US on eScience, but there’s still a huge amount to be done. (3) is pretty much a green field (no pun intended) at the moment. It’s this third area that fascinates me the most.