Our group had three posters accepted for presentation at the upcoming AGU Fall Meeting. As the scientific program doesn’t seem to be amenable to linking, here are the abstracts in full:

Poster Session IN11D. Management and Dissemination of Earth and Space Science Models (Monday Dec 14, 2009, 8am – 12:20pm)

Fostering Team Awareness in Earth System Modeling Communities

S. M. Easterbrook; A. Lawson; and S. Strong
Computer Science, University of Toronto, Toronto, ON, Canada.

Existing Global Climate Models are typically managed and controlled at a single site, with varied levels of participation by scientists outside the core lab. As these models evolve to encompass a wider set of earth systems, this central control of the modeling effort becomes a bottleneck. But such models cannot evolve to become fully distributed open source projects unless they address the imbalance in the availability of communication channels: scientists at the core site have access to regular face-to-face communication with one another, while those at remote sites have access to only a subset of these conversations – e.g. formally scheduled teleconferences and user meetings. Because of this imbalance, critical decision making can be hidden from many participants, their code contributions can interact in unanticipated ways, and the community loses awareness of who knows what. We have documented some of these problems in a field study at one climate modeling centre, and started to develop tools to overcome these problems. We report on one such tool, TracSNAP, which analyzes the social network of the scientists contributing code to the model by extracting data from an existing project code repository. The tool presents the results of this analysis to modelers and model users in a number of ways: recommendations for who has expertise on particular code modules, suggestions for code sections that are related to files being worked on, and visualizations of team communication patterns. The tool is currently available as a plugin for the Trac bug tracking system.
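The abstract doesn't show TracSNAP's internals, but the core idea — mining expertise and co-change relationships from commit history — can be sketched in a few lines. This is a minimal illustration with hypothetical data structures, not the actual TracSNAP implementation:

```python
from collections import Counter, defaultdict

def build_expertise_map(commits):
    """Count each author's commits touching each file."""
    expertise = defaultdict(Counter)
    for author, files in commits:
        for f in files:
            expertise[f][author] += 1
    return expertise

def recommend_experts(expertise, path, top=3):
    """Rank likely experts for a file by commit count."""
    return [author for author, _ in expertise[path].most_common(top)]

def related_files(commits, path):
    """Files that frequently change in the same commit as `path`."""
    co_change = Counter()
    for _, files in commits:
        if path in files:
            co_change.update(f for f in files if f != path)
    return [f for f, _ in co_change.most_common()]
```

A real tool would pull `commits` from the version control log and probably weight recent activity more heavily, but even crude counting like this recovers two of the recommendations the abstract describes.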

Poster Session IN31B. Emerging Issues in e-Science: Collaboration, Provenance, and the Ethics of Data (Wednesday Dec 16, 2009, 8am – 12:20pm)

Identifying Communication Barriers to Scientific Collaboration

A. M. Grubb; and S. M. Easterbrook
Computer Science, University of Toronto, Toronto, ON, Canada.

The lack of availability of the majority of scientific artifacts reduces credibility and discourages collaboration. Some scientists have begun to advocate for reproducibility, open science, and computational provenance to address this problem, but there is no consolidated effort within the scientific community. There does not appear to be any consensus yet on the goals of an open science effort, and little understanding of the barriers. Hence we need to understand the views of the key stakeholders – the scientists who create and use these artifacts.

The goal of our research is to establish a baseline and categorize the views of experimental scientists on the topics of reproducibility, credibility, scooping, data sharing, results sharing, and the effectiveness of the peer review process. We gathered the opinions of scientists on these issues through a formal questionnaire and analyzed their responses by topic.

We found that scientists see a provenance problem in their communications with the public. For example, results are published separately from supporting evidence and detailed analysis. Furthermore, although scientists are enthusiastic about collaborating and openly sharing their data, they do not do so out of fear of being scooped. We discuss these serious challenges for the reproducibility, open science, and computational provenance movements.

Poster Session GC41A. Methodologies of Climate Model Confirmation and Interpretation (Thursday Dec 17, 2009, 8am – 12:20pm)

On the software quality of climate models

J. Pipitone; and S. Easterbrook
Computer Science, University of Toronto, Toronto, ON, Canada.

A climate model is an executable theory of the climate; the model encapsulates climatological theories in software so that they can be simulated and their implications investigated directly. Thus, in order to trust a climate model one must trust that the software it is built from is robust. Our study explores the nature of software quality in the context of climate modelling: How do we characterise and assess the quality of climate modelling software? We use two major research strategies: (1) analysis of defect densities of leading global climate models and (2) semi-structured interviews with researchers from several climate modelling centres. Defect density analysis is an established software engineering technique for studying software quality. We collected our defect data from bug tracking systems, version control repository comments, and from static analysis of the source code. As a result of our analysis, we characterise common defect types found in climate model software and we identify the software quality factors that are relevant for climate scientists. We also provide a roadmap to achieve proper benchmarks for climate model software quality, and we discuss the implications of our findings for the assessment of climate model software trustworthiness.
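Defect density itself is a simple ratio. Assuming defect counts gathered from the trackers and line counts from the source tree, it might be computed like this (an illustrative sketch, not the study's actual tooling):

```python
def defect_density(defect_count, lines_of_code):
    """Defects per thousand lines of code (defects/KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defect_count / (lines_of_code / 1000.0)
```

For example, 25 recorded defects against a 50,000-line model gives 0.5 defects/KLOC, a figure that can then be compared against published baselines from other software domains.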

This week I’m at OOPSLA, mainly for the workshop on software research and climate change, which went exceedingly well (despite some technical hiccups), and which I will blog once I get my notes together. Now I can relax and enjoy the rest of the conference.

Today, Tom Malone from MIT is giving a keynote talk to kick off the Onward! track. Tom and I chatted over dinner last night about his Climate Collaboratorium project, which is an attempt to meet many of the goals I’ve been discussing about creating tools to foster a constructive public discourse about climate change and its solutions. So I’m keen to hear what he has to say in his keynote.

13:32: Bernd Bruegge is giving an overview of what the Onward! conference (part of Oopsla) is about. This year, Onward! has grown from a track within Oopsla to being a fully fledged co-located conference of its own.

13:36: He’s now introducing Tom Malone. Which reminds me I ought to get his book, The Future of Work. Okay, now Tom’s up, and his talk is entitled “The Future of Collective Intelligence”. His opening question is “who here is happy?” – he got us to raise hands. Looks like the overwhelming majority of the audience are happy. His definition of collective intelligence deliberately dodges the question of what intelligence is: “Groups of individuals doing things collectively that seem intelligent”. Oh, and collective stupidity also happens, and one of the interesting research questions is to figure out why. By this definition, collective intelligence has existed for centuries, but recently new forms have arrived. For example, the way google searches work; and of course, wikipedia. For wikipedia, the key enabler was the organisational design, rather than the technology. More examples: digg, youtube, linux, prediction markets,…

His core research question is: “how can people and computers be connected so that collectively they act more intelligently than any person, group or computer has ever done before?” It’s an aspirational question, but to answer it we need a systematic attempt to understand collective intelligence (rather than just marveling at the various instances). His first attempt was to identify and understand different species of collective intelligence. He then realised that a more productive metaphor was to look for individual genes that are common across several different species. Or put another way, what are the design patterns?

Four questions underlie the activity involved in every design pattern: who is doing it? what are they doing? why are they doing it? and how? Tom challenged us to think about what percentage of the intelligence, energy etc of the people in the organisation you are in right now (e.g. the Oopsla conference) is actually available to the organisation. Most people in the audience had low numbers: 30% or less, and lots said less than 10%. Then he showed us a video of an experiment in which many people in a large room were collectively driving a simulated airplane. Everyone had a two sided reflective wand (red on one side, green on the other). Half the people controlled up and down, the other half controlled left and right. The video was hilarious, but also surprising in how well the audience did.

So, in this example, the “who” is the crowd. The crowd gene is useful when the locations of knowledge needed for a task are distributed over a crowd, and you’re not sure a priori where it is, but only works when attempts to subvert the task can be controlled in some way.

The why boils down to love, glory, or money. Appealing to love and glory, rather than money, can reduce costs (but not always). E.g. make the task fun and people will choose to do it anyway. More interestingly, you can influence the direction of the task by offering money or glory for certain actions. But most people get the motivational factors wrong (or just don’t think about it).

The what often boils down to Create or Decide. Which then gives us four situations depending on whether the crowd does pieces of the task independently or not:

  1. Collection: ‘create’ tasks where the pieces are independent. Examples include Wikipedia, Sourceforge and Youtube. A special subcategory of the collection pattern is the competition pattern, where you only need a few of the pieces. E.g. TopCoder and the Netflix prize. In the latter, the offer of a $1 million prize motivated many people to work on this for two years. Eventually, several competing teams combined their solutions, and collectively they met the goal of 10% improvement in Netflix’s movie recommender. Another example: the Matlab programming contests. In these, the competing algorithms are made available to all teams, so they can take each other’s ideas and incorporate them. This mix of competition and collaboration appears to be strangely addictive to many people. The competition pattern is useful when only one (or a few) good solutions are needed, and the motivation is clear.
  2. Collaboration: ‘create’ tasks where the pieces are dependent on one another. Wikipedia is also an example of this, because different edits to the same article are highly inter-dependent. These dependencies are coordinated in Wikipedia through the talk pages. In Linux the coordination is through the discussion forums. Tom’s Climate Collaboratorium is another example. In this project, plans are proposed, discussed and voted on, and by combining the plans, the aim is to create better plans than would be available without the collaboration. Managing the inter-dependencies turns out to be the hard part of Collaboration projects. Most existing examples rely on manual coordination mechanisms. An interesting question is what automated support can be provided. Suggestions here include better explicit representations of the interdependencies. The Collaboration pattern works when a large scale task needs doing, there is no satisfactory way of dividing up the task into independent pieces, but there is a way to manage interdependencies.
  3. Group Decision: ‘decide’ tasks where the pieces are inter-dependent. Simple mechanisms include:
    • voting. Interesting example is a baseball team where the fans can do internet voting to decide batting order, pitching rotation, starting line-up, etc. They did this for one season and lost most of their games, possibly because fans of other teams sabotaged the voting. Similar attempt by a UK soccer team, but where you have to be an “owner” of the team (35 pounds per year) to vote. This team seems to be doing well. Another example: Kasparov vs. the world. Expected that Kasparov would win easily, in fact he later said it was the hardest game he ever played. One key was that the crowd could discuss their moves over a 24 hour period before voting on them.
    • consensus. This is what is used in wikipedia when there are disagreements.
    • averaging. Useful in some group estimation tasks. The averages of a large number of individual estimates are often more accurate than individual estimates. Another example: NASA clickworkers, used to get crowds of people to identify craters on photos of astronomical bodies. The averaging of many novices did pretty much the same as experts (and much more cheaply). Another example is prediction markets. Great example: Microsoft used prediction markets to assess the likely release date of an internal product. Quickly found that their expected release date was way earlier than the people participating thought, and the product manager was then alerted to problems with the project that were known among some of the team, but had not been communicated.
  4. Individual Decision: ‘decide’ tasks where the pieces are independent. For example:
    • the market pattern, where everyone makes individual decisions about when to buy things, at what price. The Amazon Mechanical Turk is another example. One of Tom’s students has written a toolkit for iteratively launching and then integrating mechanical turk tasks.
    • social network pattern – people make individual decisions, but without any money changing hands. For example the amazon recommendation system.
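The averaging mechanism in the group-decision bullet above is easy to demonstrate: the error of the mean of many independent noisy estimates shrinks with crowd size. A minimal simulation, with made-up numbers and assuming independent Gaussian noise:

```python
import random
import statistics

def crowd_estimate(true_value, n_raters, noise_sd, seed=0):
    """Average many independent noisy guesses at a quantity."""
    rng = random.Random(seed)
    guesses = [rng.gauss(true_value, noise_sd) for _ in range(n_raters)]
    return statistics.mean(guesses)
```

With 10,000 raters and a noise standard deviation of 25, the crowd mean typically lands within a unit or two of the true value, even though any single guess is off by 25 on average. The caveat from the talk applies: the guesses must be independent, which is exactly what vote sabotage or herding destroys.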

Observations from this analysis: Genes (patterns) don’t occur in isolation, but in particular combinations. For example, across the range of tasks involved in deciding which wikipedia articles to keep, and editing those articles, many of the different patterns across all four quadrants are used. There are also families of similar combinations. E.g. Innocentive and Threadless are almost identical in terms of the patterns they use, the only difference being that Threadless also includes a crowd vote.

Tom finished with some speculative comments about seeing us at some point in the future as a single global brain, and closed with a quote from Kevin Kelly’s We are the Web:

There is only one time in the history of each planet when its inhabitants first wire up its innumerable parts to make one large Machine. Later that Machine may run faster, but there is only one time when it is born.

You and I are alive at this moment.

PS: most of the ideas in the talk are in the paper Harnessing crowds.

I’ve just been browsing the sessions for the AGU Fall Meeting, to be held in San Francisco in December. Abstracts are due by September 3. The following sessions caught my attention:

Plus some sessions that sound generally interesting:

This afternoon, I’m at the science 2.0 symposium, or “What every scientist needs to know about how the web is changing the way they work”. The symposium has been organised as part of Greg’s Software Carpentry course. There’s about 120 people here, good internet access, and I got here early enough to snag a power outlet. And a Timmie’s just around the corner for a supply of fresh coffee. All set.

1:05pm. Greg’s up, introducing the challenge: for global challenges (e.g. disease control, climate change) we need two things: Courage and Science. Most of the afternoon will be talking about the latter. Six speakers, 40 minutes each, wine and cheese to follow.

1:08pm. Titus Brown, from Michigan State U. Approaching Open Source Science: Tools and Approaches. He aims to talk about two things: how to suck people into your open source project, and automated testing. Why open source? Ideologically: for reproducibility and open communication. Idealistically: you can’t change the world by keeping what you do secret. Practical reason: other people might help. Oh, and “closed-source science” is an oxymoron. First, the choice of license probably doesn’t matter, because it’s unlikely anyone will ever download your software. Basics: every open source project should have a place to get the latest release, a mailing list, and an openly accessible version control system. Cute point: a wiki and issue tracker are useful if you have time and manpower, but you don’t, so they’re not.

Then he got into a riff about whether or not to use distributed version control (e.g. git). This is interesting because I’ve heard lots of people complain that tools like git can only be used by ubergeeks (“you have to be Linus Torvalds to use it”). Titus has been using it for 6 months, and says it has completely changed his life. Key advantages: it decouples developers from the server, hence the ability to work offline (on airplanes) but still do version control commits. Also, it frees you from “permission” decisions – anyone can take the code and work on it independently (as long as they keep using the same version control system). But there are downsides – it creates ‘effective forks’, which might then lead to code bombs – someone who wants to remerge a fork that has been developed independently for months, and which then affects large parts of the code base.

Open development is different to open source. The key question is do you want to allow others to take the code and do their own things with it, or do you want to keep control of everything (professors like to keep control!). Oh, and you open yourself up to “annoying questions” about design decisions, and frank (insulting) discussion of bugs. But the key idea is that these are the hallmarks of a good science project – a community of scientists thinking and discussing design decisions and looking for potential errors.

So, now for some of the core science issues. Titus has been working on Earthshine – measuring the albedo of the earth by measuring how much radiation from the earth lights up the (dark side of the) moon. He ended up looking through the PVwave source code, trying to figure out what the grad student working on the project was doing. By wading through the code, he discovered the student had been applying the same correction to the data multiple times, to try and get a particular smoothing. But the only people who understood how the code worked were the grad student and Titus. Which means there was no way, in general, to know that the code works. Quite clearly, “code working” should not be judged by whether it does what the PI thinks it should do. In practice the code is almost never right – it’s more likely that the PI has the wrong mental model. Which led to the realization that we don’t teach young scientists how to think about software – including being suspicious of their code. And CS programs don’t really do this well either. And fear of failure doesn’t seem to be enough incentive – there are plenty of examples where software errors have led to scientific results being retracted.

Finally, he finished off with some thoughts about automated testing. E.g. regression testing is probably the most useful thing scientists can do with their code: run the changed code and compare the new results with the old ones. If there are unexpected changes, then you have a problem. Oh, and put assert statements in to check that things that should never occur don’t ever occur. Titus also suggests that code coverage tools can be useful for finding dead code, and continuous integration is handy if you’re building code that will be used on multiple platforms, so an automated process builds the code and tests it on multiple platforms, and reports when something broke. Bottom line: automated testing allows you to ‘lock down’ boring code (code that you understand), and allows you to focus on ‘interesting’ code.
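The regression-testing idea Titus describes can be as simple as diffing new model output against a stored baseline within a tolerance. A minimal sketch, with a toy function standing in for the real simulation:

```python
import math

def run_model(scale):
    """Toy stand-in for a real simulation run."""
    return [math.sin(x * scale) for x in range(10)]

def regression_check(new, baseline, tol=1e-9):
    """Flag any unexpected change relative to the stored baseline."""
    assert len(new) == len(baseline), "output shape changed"
    return all(abs(a - b) <= tol for a, b in zip(new, baseline))
```

Store `baseline = run_model(0.1)` once; after any code change, `regression_check(run_model(0.1), baseline)` should still hold. An unexpected failure is the signal to investigate before trusting new results.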

Questions: I asked whether he has ever encountered problems with the paranoia among some scientific communities, for example, fear of being scooped, or journals who refuse to accept papers if any part has already appeared on the web. Titus pointed out that he has had a paper rejected without review, because when he mentioned that many people were already using the software, the journal editor felt this meant it was not novel. Luckily, he did manage to publish it elsewhere. Journals have to take the lead by, for example, refusing to publish papers unless the software is open, because it’s not really science otherwise.

1:55pm. Next up Cameron Neylon, “A Web Native Research Record: Applying the Best of the Web to the Lab Notebook”. Cameron’s first slide is a permission to copy, share, blog, etc. the contents of the talk (note to self – I need this slide). So the web is great for mixing, mashups, syndicated feeds, etc. Scientists need to publish, subscribe, syndicate (e.g. updates to handbooks), remix (e.g. taking ideas from different disciplines and pull them together to get new advances). So quite clearly, the web is going to solve all our problems, right?

But our publication mechanism is dead, broken, disconnected. A PDF of a scientific paper is a dead end, when really it should be linked to data, sources, citations, etc. It’s the links between things that matter. Science is a set of loosely coupled chunks of knowledge; they need to be tightly wired to each other so that we understand their context and their links. A paper is too big a piece to be thought of as a typical “chunk of science”. A tweet (the example was of the MarsPhoenix team announcing they found ice on Mars) is too small, and too disconnected. A blog post seems about right. It includes embedded links (e.g. to detailed information about the procedures and materials used in an experiment). He then shows how his own research group is using blogs as online lab notebooks. Even better, some blog posts are generated automatically by the machines (when dealing with computational steps in the scientific process). Then if you look at the graph of the ‘web of objects’, you can tell certain things about them. E.g. an experiment that failed occupies a certain position in the graph; a set of related experiments appear as a cluster; a procedure that wasn’t properly written up might appear as a disconnected note; etc.

Now, how do we get all this to work? Social tagging (folksonomies) doesn’t work well because of inconsistent use of tagging, not just across different people, but over time by the same person. Templates help, and the evolution of templates over time tells you a lot about the underlying ontology of the science (both the scientific process and the materials used). Cameron even points out places where the templates they have developed don’t fit well with established taxonomies of materials developed (over many years) within his field, and that these mismatches reveal problems in the taxonomies themselves, where they have ignored how materials are actually used.

So, now everything becomes a digital object: procedures, analyses, materials, data. What we’re left with is the links between them. So doing science becomes a process of creating new relationships, and what you really want to know about someone’s work is the (semantic) feed of relationships created. The big challenge is the semantic part – how do we start to understand the meaning of the links? Finally, a demonstration of how new tools like Google Wave can support this idea – e.g. a Wave plugin that automates the creation of citations within a shared document (Cameron has a compelling screen capture of someone using it).

Finally, how do we measure research impact? Eventually, something like pagerank. Which means scientists have to be wired into the network, which means everything we create has to be open and available. Cameron says he’s doing a lot less of the traditional “write papers and publish” and much more of this new “create open online links”. But how do we persuade research funding bodies to change their culture to acknowledge and encourage these kinds of contribution? Well, 70% of all research is basically unfunded – done on a shoestring.

2:40pm. slight technical hitch getting the next speaker (Michael) set up, so a switch of speakers: Victoria Stodden, How Computational Science is Changing the Scientific Method. Victoria is particularly interested in reproducibility in scientific research, and how it can be facilitated. Massive computation changes what we can do in science, e.g. data mining for subtle patterns in vast databases, and large scale simulations of complex processes. Examples: climate modeling, high energy physics, astrophysics. Even mathematical proof is affected – e.g. use of a simulation to ‘prove’ a mathematical result. But is this really a valid proof? Is it even mathematics?

So, effectively this might be a third branch of science. (1) deductive method for theory development – e.g. mathematics and logic (2) inductive/empirical – the machinery of hypothesis testing. And now (3) large scale extrapolation and prediction. But there’s lots of contention about this third branch. E.g. Anderson “The End of Theory“, Hillis rebuttal – we look for patterns first, and then create hypotheses, just as we always have. Weinstein points out that simulation underlies the other branches – tools to build intuitions, and tools to test hypotheses. Scientific approach is primarily about the ubiquity of error, so that the main effort is to track down and understand sources of error.

Computational techniques are now widely used (in JASA, for example, their use has grown over the last decade to more than half of all papers), but very few authors make their code open, and very little validation is going on, which means that there is increasingly a credibility crisis. Scientists make their papers available, but not their complete body of research. Changes are coming (e.g. Madagascar, Sweave, …), along with the push towards reproducibility pioneered by Jon Claerbout.

Victoria did a study of one particular subfield: Machine Learning. Surveyed academics attending one of the top conferences in the field (NIPS). Why did they not share? Top reason: time it takes to document and clean up the code and data. Then, not receiving attribution, possibility of patents, legal barriers such as copyright, and potential loss of future publications. Motivations to share are primarily communitarian (for the good of science/community), while most of the barriers are personal (worries about attribution, tenure and promotion, etc).

Idea: take the creative commons license model, and create a reproducible research standard. All media components get released under a CC BY license, and code gets released under some form of BSD license. But what about data? Raw facts alone are not generally copyrightable, so this gets a little complicated. But the expression of facts in a particular way is copyrightable.

So, what are the prospects for reproducibility? The simple case: small scripts and open data. The harder case: inscrutable code and organic programming. The really hard case: massive computing platforms and streaming data. But it’s not clear that readability of the code is essential, e.g. Wolfram Alpha – instead of making the code readable (because in practice nobody will read it), make it available for anyone to run it in any way they like.

Finally, there’s a downside to openness, in particular, a worry that science can be contaminated because anyone can come along, without the appropriate expertise, and create unvalidated science and results, and they will get cited and used.

3:40pm. David Rich. Using “Desktop” Languages for Big Problems. David starts off with an analogy of different types of drill – from a hand drill (trivially easy to use, hard to hurt yourself, but slow) up to big industrial drills. He then compares these to different programming languages / frameworks. One particular class of tools, cordless electric drills, is interesting because they provide a balance between power and usability/utility. So what languages and tools do scientific programmers need? David presented the results of a survey of their userbase, to find out what tools they need. Much of the talk was about the need/potential for parallelization via GPUs. David’s company has a tool called Star-P which allows users of Matlab and NumPy to transform their code for parallel architectures.

4:10pm. Michael Nielsen. Doing Science in the Open: How Online Tools are Changing Scientific Discovery. Case study: Terry Tao‘s use of blogs to support community approaches to mathematics. In particular, he deconstructs one particular post: Why global regularity for Navier-Stokes is hard, which sets out a particular problem, identifies the approaches that have been used, and has attracted a large number of comments from some of the top mathematicians in the field, all of which helps to make progress on the problem. (similar examples from other mathematicians, such as the polymath project), and a brand new blog for this: polymathprojects.org.

But these examples couldn’t be published in the conventional sense. They are more like the scaling up of a conversation that might occur in a workshop or conference, but allowing the scientific community to continue the conversation over a long period of time (e.g. several years in some cases), and across geographical distance.

These examples are pushing the boundaries of blog and wiki software. But blogs are just the beginning. Blogs and open notebooks enable filtered access to new information sources and new conversations. Essentially, they are restructuring expert attention – people focus on different things and in a different way than before. And this is important because expert attention is the critical limiting factor in scientific research.

So, here’s a radically different idea. Markets are a good way to efficiently allocate scarce resources. So can we create online markets in expert attention? For example, Innocentive. One particular example: a need in India to get hold of solar powered wireless routers to support a social project (ASSET India) helping women in India escape from exploitation and abuse. So this was set up as a challenge on Innocentive. A 31-yr old software engineer from Texas designed a solution, and it’s now being prototyped.

But, after all, isn’t all this a distraction? Shouldn’t you be writing papers and grant proposals rather than blogging and contributing to wikipedia? When Galileo discovered the rings of Saturn (actually, that Saturn looked like three blobs), he sent an anagram to Kepler, which then allowed him to claim credit. The modern scientific publishing infrastructure was not available to him, and he couldn’t conceive of the idea of open sharing of discoveries. The point being that these technologies (blogs etc) are too new to understand the full impact and use, but we can see ways in which they are already changing the way science is done.

Some very interesting questions followed about attribution of contribution, especially for the massive collaboration examples such as polymath. In answer, Michael pointed to the fact that the record of the collaboration is open and available for inspection, and that letters of recommendation from senior people matter a lot, and junior people who contributed in a strong way to the collaboration will get great letters.

[An aside: I’m now trying to follow this on Friendfeed as well as liveblogging. It’s going to be hard to do both at once]

4:55pm. Last but not least, Jon Udell. Collaborative Curation of Public Events. Jon claims that he can’t talk about science itself, because he’s not qualified, but will talk about other consequences of the technologies we’re discussing. For example, in the discussions we’ve been having with the City of Toronto on its open data initiative, there’s a meme that governments sit on large bodies of data that people would like to get hold of. But in fact, citizens themselves are owners and creators of data, and that’s a more interesting thing to focus on than governments pushing data out to us. For example, posters advertising local community events on lampposts in neighbourhoods around the city. Jon makes the point that this form of community advertising is outperforming the web, which is shocking!

Key idea: syndication hubs. For example, an experiment to collate events in Keene, NH, in the summer of 2009. Takes in datafeeds from various events websites, calendar entries etc. Then aggregates them, and provides feeds out to various other websites. But not many people understand what this is yet – it’s not a destination, but a broker. Or another way of understanding it is as ‘curation’ – the site becomes a curator looking after information about public events, but in a way that distributes responsibility for curation to the individual sources of information, rather than say a person looking after an events diary.

Key principles: syndication is a two-way process (you need to both subscribe to things and publish your feeds), but tagging and data formatting conventions become critical. The available services form an ecosystem, and they co-evolve; we’re now starting to understand the ecosystem around RSS feeds – sites that are publishers, subscribers, and aggregators. A similar ecosystem is growing up around iCalendar feeds, but it is currently missing aggregators. iCalendar is interesting because the standard is 10 years old, but it’s only recently become possible to publish feeds from many tools. And people are still using RSS feeds to do this, when they are the wrong tool – an RSS feed doesn’t expose the data (calendar information) in a usable way.
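As a concrete illustration of what a calendar aggregator does, here’s a minimal sketch. It’s a toy, not any of the tools Jon mentioned: real iCalendar feeds need full RFC 5545 parsing (line folding, time zones, recurrence rules) and would be fetched over HTTP, and the feed strings here are invented.

```python
# Toy iCalendar aggregation: merge VEVENT blocks from several feeds into
# one combined, de-duplicated event list. Only simple DTSTART/SUMMARY
# properties are parsed, to keep the idea visible.

def parse_events(ics_text):
    """Extract (dtstart, summary) pairs from a simple iCalendar string."""
    events, current = [], {}
    for line in ics_text.splitlines():
        line = line.strip()
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            events.append((current.get("DTSTART", ""), current.get("SUMMARY", "")))
        elif ":" in line:
            key, value = line.split(":", 1)
            current[key] = value
    return events

def aggregate(feeds):
    """Merge events from many feeds, de-duplicated and sorted by start time."""
    merged = set()
    for ics in feeds:
        merged.update(parse_events(ics))
    return sorted(merged)

# Two invented feeds, standing in for different community event sources:
feed_a = "BEGIN:VEVENT\nDTSTART:20090715T190000\nSUMMARY:Square dance\nEND:VEVENT"
feed_b = "BEGIN:VEVENT\nDTSTART:20090714T180000\nSUMMARY:Farmers market\nEND:VEVENT"
print(aggregate([feed_a, feed_b]))
```

The point of the design is the broker role: the aggregator owns no events itself, it just merges and republishes what the sources curate.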

So how do we manage the metadata for these feeds, and how do we handle the issue of trust (i.e. how do you know which feeds to trust for accuracy, authority, etc.)? Jon talks a little about uses of tools like Delicious to bookmark feeds with appropriate metadata, and other tools for calendar aggregation. And the idea of guerrilla feed creation – finding implicit information about recurring events and making it explicit. Often the information is hard to scrape automatically – e.g. information about a regular square dance that is embedded in the image of a cartoon. But maybe this task could be farmed out to a service like Mechanical Turk.

And these are great examples of computational thinking. Indirection – instead of passing me your information, pass me a pointer to it, so that I can respect your authority over it. Abstraction – we can use any URL as a rendezvous for social information management, and can even invent imaginary ones just for this purpose.

Updates: The twitter tag is tosci20. Andrew Louis also blogged (part of) it, and has some great photos; Joey DeVilla has detailed blog posts on several of the speakers; Titus reflects on his own participation; and Jon Udell has a more detailed write up of the polymath project. Oh, and Greg has now posted the speakers’ slides.

Here’s a very sketchy second draft for a workshop proposal for the fall. I welcome all comments on this, together with volunteers to be on the organising team. Is this a good title for the workshop? Is the abstract looking good? What should I change?

Update: I’ve jazzed up and rearranged the list of topics, in response to Steffen’s comment to get a better balance between research likely to impact SE itself, vs. research likely to impact other fields.

The First International Workshop on Software Research and Climate Change (WSRCC-1)

In conjunction with: <http://onward-conference.org/> Onward Conference 2009 and <http://www.oopsla.org/oopsla2009/> OOPSLA 2009

Workshop website: <http://www.cs.toronto.edu/wsrcc>

ABSTRACT

This workshop will explore the contributions that software research can make to the challenge of climate change. Climate change is likely to be the defining issue of the 21st century. Recent studies indicate that climate change is accelerating, confirming the most pessimistic of the scenarios identified by climate scientists. Our current use of fossil fuels commits the world to around 2°C of average temperature rise during this century, and, unless urgent and drastic cuts are made, further heating is likely to trigger any of a number of climate change tipping points. The results will be a dramatic reduction in food production and water supplies, more extreme weather events, the spread of disease, sea level rise, ocean acidification, and mass extinctions. We are faced with the twin challenges of mitigation (avoiding the worst climate change effects by rapidly transitioning the world to a low-carbon economy) and adaptation (re-engineering the infrastructure of modern society so that we can survive and flourish on a hotter planet).

These challenges are global in nature, and pervade all aspects of society. To address them, we will need researchers, engineers, policymakers, and educators from many different disciplines to come to the table and ask what they can contribute. There are both short-term challenges (such as how to deploy, as rapidly as possible, existing technology to produce renewable energy; how to design government policies and international treaties to bring greenhouse gas emissions under control) and long-term challenges (such as how to complete the transition to a global carbon-neutral society by the latter half of this century). In nearly all of these challenges, software has a major role to play as a critical enabling technology.

So, for the software research community, we can frame the challenge as follows: How can we, as experts in software technology, and as the creators of future software tools and techniques, apply our particular knowledge and experience to the challenge of climate change? How can we understand and exploit the particular intellectual assets of our community — our ability to:

  • think computationally;
  • understand and model complex inter-related systems;
  • build useful abstractions and problem decompositions;
  • manage and evolve large-scale socio-technical design efforts;
  • build the information systems and knowledge management tools that empower effective decision-making;
  • develop and verify complex control systems on which we now depend;
  • create user-friendly and task-appropriate interfaces to complex information and communication infrastructures.

In short, how can we apply our research strengths to make significant contributions to the challenges of climate change mitigation and adaptation?

This workshop will be the first in a series, intended to develop a community of researchers actively engaged in this challenge, and to flesh out a detailed research agenda that leverages existing research ideas and capabilities. Therefore we welcome any kind of response to this challenge statement.

WORKSHOP TOPICS

We welcome the active participation of software researchers and practitioners interested in any aspect of this challenge. The participants will themselves determine the scope and thrusts of this workshop, so this list of suggested topics is intended to act only as a starting point:

  • requirements analysis for complex global change problems;
  • integrating sustainability into software system design;
  • green IT, including power-aware computing and automated energy management;
  • developing control systems to create smart energy grids and improve energy conservation;
  • developing information systems to support urban planning, transport policies, green buildings, etc.;
  • software tools for open collaborative science, especially across scientific disciplines;
  • design patterns for successful emissions reduction strategies;
  • social networking tools to support rapid action and knowledge sharing among communities;
  • educational software for hands-on computational science;
  • knowledge management and decision support tools for designing and implementing climate change policies;
  • tools and techniques to accelerate the development and validation of earth system models by climate scientists;
  • data sharing and data management of large scientific datasets;
  • tools for creating and sharing visualizations of climate change data;
  • (more…?)

SUBMISSIONS AND PARTICIPATION

Our intent is to create a lively, interactive discussion, to foster brainstorming and community building. Registration will be open to all. However, we strongly encourage participants to submit (one or more) brief (1-page) responses to the challenge statement, either as:

  • Descriptions of existing research projects relevant to the challenge statement (preferably with pointers to published papers and/or online resources);
  • Position papers outlining potential research projects.

Be creative and forward-thinking in these proposals: think of the future, and think big!

There will be no formal publication of proceedings. Instead we will circulate all submitted papers to participants in advance of the workshop, via the workshop website, and invite participants to revise/update/embellish their contributions in response to everyone else’s contributions. Our plan is to write a post-workshop report, which will draw on both the submitted papers and the discussions during the workshop. This report will lay out a suggested agenda for both short-term and long-term research in response to the challenge, and act as a roadmap for subsequent workshops and funding proposals.

IMPORTANT DATES

Position paper submission deadline: September 25th, 2009

Workshop on Software Research and Climate Change: October 25th or 26th, 2009

WORKSHOP ORGANIZERS

TBD

Okay, I’ve had a few days to reflect on the session on Software Engineering for the Planet that we ran at ICSE last week. First, I owe a very big thank you to everyone who helped – to Spencer for co-presenting and lots of follow up work; to my grad students, Jon, Alicia, Carolyn, and Jorge for rehearsing the material with me and suggesting many improvements, and for helping advertise and run the brainstorming session; and of course to everyone who attended and participated in the brainstorming for lots of energy, enthusiasm and positive ideas.

First action as a result of the session was to set up a google group, SE-for-the-planet, as a starting point for coordinating further conversations. I’ve posted the talk slides and brainstorming notes there. Feel free to join the group, and help us build the momentum.

Now, I’m contemplating a whole bunch of immediate action items. I welcome comments on these and any other ideas for immediate next steps:

  • Plan a follow up workshop at a major SE conference in the fall, and another at ICSE next year (waiting a full year was considered by everyone to be too slow).
  • I should give my part of the talk at U of T in the next few weeks, and we should film it and get it up on the web. 
  • Write a short white paper based on the talk, and fire it off to NSF and other funding agencies, to get funding for community building workshops
  • Write a short challenge statement, to which researchers can respond with project ideas to bring to the next workshop.
  • Write up a vision paper based on the talk for CACM and/or IEEE Software
  • Take the talk on the road (a la Al Gore), and offer to give it at any university that has a large software engineering research group (assuming I can come to terms with the increased personal carbon footprint 😉).
  • Broaden the talk to a more general computer science audience and repeat most of the above steps.
  • Write a short book (pamphlet) on this, to be used to introduce the topic in undergraduate CS courses, such as computers and society, project courses, etc.

Phew, that will keep me busy for the rest of the week…

Oh, and I managed to post my ICSE photos at last.

In the last session yesterday, Inez Fung gave the Charney Lecture: Progress in Earth System Modeling since the ENIAC Calculation. But I missed it as I had to go pick up the kids. She has a recent paper that seems to cover some of the same ground. And this morning, Joanie Kleypas gave the Rachel Carson Lecture: Ocean Acidification and Coral Reef Ecosystems: A Simple Concept with Complex Findings. She also has a recent paper covering what I assume was in her talk (again, I missed it!). Both lectures were recorded, so I’m looking forward to watching them once the AGU posts them.

I made it to the latter half of the session on Standards-Based Interoperability. I missed Stefano Nativi‘s talk on the requirements analysis for GIS systems, but there’s lots of interesting stuff on his web page to explore. However, I did catch Olga Wilhelmi presenting the results of a community workshop at NCAR on GIS for Weather, Climate and Impacts. She asked some interesting questions about the gathering of user requirements, and we chatted after the session about how users find the data they need (here’s an interesting set of use cases). I also chatted with Ben Domenico from Unidata/UCAR about open science. We were complaining about how hard it is at a conference like this to get people to put their presentation slides on the web. It turns out that some journals in the geosciences have explicit policies to reject papers if any part of the results has already been presented on the web (including in blogs, powerpoints, etc.). Ben’s feeling is that these print media are effectively dead, and he had some interesting thoughts about moving to electronic publishing, although we both worried that some of these restrictive policies might live on in online peer-review venues. (Ben is part of the THREDDS project, which is attempting to improve the way that scientists find and access datasets.)

Down at the ESSI poster session, I bumped into Peter Fox, whom I’d met at the EGU meeting last month. We both chatted to Benjamin Branch about his poster on spatial thinking and the earth sciences, and especially how educators approach this. Ben’s PhD thesis looks at all the institutional barriers that prevent changes in high school curricula, all of which militate against the nurturing of cross-disciplinary skills (like spatial reasoning) necessary for understanding global climate change. We brainstormed some ideas for overcoming these barriers, including putting cool tools in the students’ hands (e.g. Google Maps mashups of interesting data sets; or an idea Jon had for a Lego-style constructor kit for building simplified climate models). I also speculated that if education policy in the US prevents this kind of initiative, we should do it in another country, build it into a major success, and then import it back into the US as a best-practice model. Oh, well, I can dream.

Next I chatted to Dicky Allison from Woods Hole, and Tom Yoksas from Unidata/UCAR. Dicky’s poster is on the MapServer project, and Tom shared with us the slides from his talk yesterday on the RAMADDA project, which is intended as a publishing platform for geosciences data. We spent some time playing with the RAMADDA data server, and Tom encouraged us to play with it more, and send comments back on our experiences. Again, most of the discussion was about how to facilitate access to these data sets, how to keep the user interface as simple as possible, and the need for instant access – e.g. grabbing datasets from a server while travelling to a conference, without having to have all the tools and data loaded on a large disk first. Oh, and Tom explained the relationship between NCAR and UCAR, but it’s too complicated to repeat here.

Here’s an aside. Browsing the UCAR pages, I just found the Climate Modeller’s Commandments. Nice.

This afternoon, I attended the session “A Meeting of the Models”, on the use of multi-model ensembles for weather and climate prediction. First speaker was Peter Houtekamer, talking about the Canadian Ensemble Prediction Systems (EPS). The key idea of an ensemble is that it samples across the uncertainty in the initial conditions. However, challenges arise from the incomplete understanding of the model error. So the interesting questions are how to sample adequately across the space, to get a better ensemble spread. He also described the NCEP Short-Range Ensemble Forecast System (SREF), claimed to be the first real-time operational regional ensemble prediction system in the world. Even grander is TIGGE, in which the outputs of many operational EPSs are combined into an archive. The volume of the database is large (hundreds of ensemble members), but you really only need something like 20-40 members to get converging scores (he cites Talagrand for this) (aside: Talagrand diagrams are an interesting way of visualizing model spread). NAEFS combines the 20-member American (NCEP) and 20-member Canadian (MSC) operational ensemble forecasts to get a 40-member ensemble. Nice demonstration of how NAEFS outperforms both of the individual ensembles from which it is constructed. Multi-centre ensembles improve the sampling of model error, but impose a big operational cost: data exchange protocols, telecommunications costs, etc. As more centres are added, there are likely to be diminishing returns.
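The Talagrand diagram (rank histogram) mentioned in the aside is simple to compute: for each observation, count how many ensemble members fall below it, and histogram those ranks. A flat histogram suggests the ensemble spread is well calibrated; a U-shape suggests under-dispersion. A sketch with invented toy data, not any operational EPS output:

```python
# Rank histogram: where does each observation fall among the sorted
# ensemble member forecasts? With a 20-member ensemble there are 21
# possible rank bins (below all members, between each pair, above all).
import random

def rank_histogram(forecasts, observations):
    """forecasts: per-case lists of ensemble member values; observations: truth."""
    n_members = len(forecasts[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(forecasts, observations):
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    return counts

random.seed(42)
# 1000 cases, 20-member ensemble drawn from the same distribution as the
# truth, so the histogram should come out roughly flat:
cases = [[random.gauss(0, 1) for _ in range(20)] for _ in range(1000)]
truth = [random.gauss(0, 1) for _ in range(1000)]
print(rank_histogram(cases, truth))  # roughly flat across 21 bins
```

Under-dispersion would show up here as inflated counts in the first and last bins, i.e. observations repeatedly falling outside the ensemble envelope.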

The American Geophysical Union’s Joint Assembly is in Toronto this week. It’s a little slim on climate science content compared to the EGU meeting, but I’m taking in a few sessions as it’s local and convenient. Yesterday I managed to visit some of the climate science posters. I also caught the last talk of the session on connecting space and planetary science, and learned that the solar cycles have a significant temperature impact on the upper atmosphere, but no obvious effect on the lower atmosphere, but more research is needed to understand the impact on climate simulations. (Heather Andres‘ poster has some more detail on this).

This morning, I attended the session on Regional Scale Climate Change. I’m learning that understanding the relationship between temperature change and increased tropical storm activity is complicated, because tropical storms seem to react to complex patterns of temperature change, rather than just the temperature itself. I’m also learning that you can use statistical downscaling from the climate models to get finer-grained regional simulations of the changes in rainfall, e.g. over the US, leading to predictions of increased precipitation over much of the US in the winters and decreased precipitation in the summers. However, you have to be careful, because the models don’t capture seasonal variability well in some parts of the continent. A particular challenge for regional climate predictions is that some places (e.g. Caribbean islands) are just too small to show up in the grids used in General Circulation Models (GCMs), which means we need more work on regional models to get the necessary resolution.
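Statistical downscaling comes in several flavours; the simplest is the change-factor (delta) method, which applies a GCM grid cell’s projected change to a fine-grained local climatology rather than using the GCM’s coarse values directly. A sketch with invented numbers (real studies use regression or quantile-mapping methods, and none of these figures come from the talks):

```python
# Change-factor downscaling sketch: scale observed station precipitation
# by the ratio of the GCM grid cell's future to baseline climatology.
obs_station_precip = [80.0, 64.0, 30.0, 72.0]  # mm/month, local observations
gcm_cell_baseline = 60.0                       # GCM grid-cell mean, baseline period
gcm_cell_future = 51.0                         # GCM grid-cell mean, future period

change_factor = gcm_cell_future / gcm_cell_baseline  # 0.85, i.e. 15% drier
downscaled = [round(p * change_factor, 1) for p in obs_station_precip]
print(downscaled)  # [68.0, 54.4, 25.5, 61.2]
```

The appeal is that local spatial detail comes from observations, not the coarse model; the catch, as noted above, is that the method inherits whatever seasonal biases the GCM has.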

Final talk is Noah Diffenbaugh‘s talk on an ensemble approach to regional climate forecasts. He’s using the IPCC’s A1B scenario (but notes that in the last few years, emissions have exceeded those for this scenario). The model is nested – a high-resolution regional model (25km) is nested within a GCM (CCSM3, at T85 resolution), but the information flows only in one direction, from the GCM to the RCM. As far as I can tell, the reason it’s one way is that the GCM run is pre-computed; specifically, it is taken by averaging 5 existing runs of the CCSM3 model from the IPCC AR4 dataset, generating 6-hourly 3D atmosphere fields to drive the regional model. The runs show that by 2030-2039, we should expect 6-8 heat stress events per decade across the whole of the south-west US (where a heat stress event is the kind of thing that should only hit once per decade). Interestingly, the warming is greater in the south-eastern US, but because the south-western states are already closer to the threshold temperature for heat stress events, they get more heatwaves. Noah also showed some interesting validation images, to demonstrate that the regional model reproduces 20th-century temperatures over the US much better than the GCM does.

Noah also talked a little about the role of the 2°C threshold used in climate negotiations, particularly at the Copenhagen meeting. The politicians don’t like that the climate scientists are expressing uncertainty about the 2°C threshold. But there has to be uncertainty, because the models show that even below 2 degrees, there are some serious regional impacts, in this case on the US. His take-home message is that we need to seriously question greenhouse gas mitigation targets. One of the questioners pointed out that there is also some confusion over the baseline for the 2°C threshold – i.e. whether it is meant to be measured relative to pre-industrial temperatures.

After lunch, I attended the session on Breakthrough Ideas and Technologies for a Planet at Risk II. First talk is by Lewis Gilbert on monitoring and managing a planet at risk. First, he noted that really, the planet itself isn’t at risk – destroying it is still beyond our capacity. Life will survive. Humans will survive (at least for a while). But it’s the quality of that survival that is in question. Some definitions of sustainability (he has quibbles with them all): first, Brundtland’s – future generations should be able to meet their own needs; then Natural Capital – future generations should have a standard of living better than or equal to our own. Gilbert’s own: the existence of a set of possible futures that are acceptable in some satisficing sense. But all of these definitions are based on human values and human life, so the concept of sustainability has human concerns deeply embedded in it. The rest of his talk was a little vague – he described a state space, E, with multiple dimensions (e.g. physical, such as CO2 concentrations; sociological, such as infant mortality in Somalia; biological, such as amphibian counts in the Sierra Nevada), in which we can talk about quality of human life as some function of the vectors. The question then becomes what are the acceptable and unacceptable regions of E. But I’m not sure how this helps any.

Alan Robock talked about geoengineering. He’s conducted studies of the effect of seeding sulphur particles into the atmosphere, using NASA’s climate model – in particular, injecting them over the arctic, where there is the most temperature change, and the least impact on humans. His studies show that the seeding does have a significant impact on temperature, but as soon as you stop the seeding, the global warming quickly rises to where it would have been. So basically, once you start, you can’t stop. You also get other effects: e.g. a reduction of the tropical monsoons, and a reduction in precipitation. Here’s an alternative: could it be done by seeding only in the arctic summer (when the temperature rise matters), and not in the winter – e.g. seed in April, May and June, or just in April, rather than year round? He’s exploring options like these with the model. Interesting aside: Rolling Stone Magazine, Nov 3, 2006: “Dr Evil’s Plan to Stop Global Warming”. There was a meeting convened by NASA, at which Alan started to create a long list of risks associated with geoengineering (and he has a newer paper updating the list currently in submission).

George Shaw talked about biogeologic carbon sequestration. First, he demolished the idea that peak oil / peak coal will save us, by calculating the amount of carbon that can be easily extracted from known fossil fuel reserves. Carbon capture ideas include iron fertilization of the oceans, which stimulates plankton growth, which extracts carbon from the atmosphere. Cyanobacteria also extract carbon – e.g. attach an algae farm to every power station smoke stack. However, to make any difference, the algae farm for one power plant might have to be 40-50 square km. He then described a specific case study: taking the Salton Basin area in southern California and filling it up with an algae farm. This would remove a chunk of agricultural land, but would probably make money under the current carbon trading schemes.

Roel Snieder gave a talk, “Facing the Facts and Living Our Values”. Interesting graph on energy efficiency, which shows that 60% of the energy we use is lost. He also presented a version of the graph showing cost of intervention against emissions reduction, pointing out that sequestration is the most expensive choice of all. Another nice point about understanding the facts: how much CO2 is produced by burning all the coal in one railroad car? The answer is about 3 times the weight of the coal, but most people would say only a few ounces, because gases are very light. He also has a neat public lecture, and encouraged the audience to get out and give similar lectures to the public.
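The coal arithmetic is easy to check from molar masses: burning carbon as C + O2 → CO2 turns 12 g of carbon into 44 g of CO2. The carbon fraction below is an assumed figure (coal composition varies by grade), not a number from the talk:

```python
# Back-of-envelope check of "about 3 times the weight of the coal":
# each atom of carbon picks up two oxygen atoms from the air on burning.
molar_mass_C = 12.0    # g/mol, carbon
molar_mass_CO2 = 44.0  # g/mol, CO2 (12 + 2*16)
carbon_fraction = 0.8  # assumed carbon content of coal by mass

co2_per_kg_coal = carbon_fraction * (molar_mass_CO2 / molar_mass_C)
print(round(co2_per_kg_coal, 2))  # ~2.93 kg of CO2 per kg of coal
```

The counterintuitive part is exactly what Snieder highlighted: most of the CO2’s mass comes from atmospheric oxygen, not from the coal itself.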

Eric Barron: Beyond Climate Science. It’s a mistake for the climate science community to say that “the science is settled”, and we need to move on to mitigation strategies. Still five things we need:

  1. A true climate service – an authoritative, credible, user-centric source of information on climate (models and data). E.g. advice on resettlement of threatened towns, advice on forestry management, etc.
  2. Deliberately expand the family of forecasting elements. Some natural expansion of forecasting is occurring, but the geoscience community needs to push this forward deliberately.
  3. Invest in stage 2 science – social sciences and the human dimension of climate change (the physical science budget dwarfs the social sciences budget).
  4. Deliberately tackle the issue of scale and the demand for an integrated approach.
  5. Evolve from independent research groups to environmental “intelligence” centres. Cohesive regional observation and modeling framework. And must connect vigorously with users and decision-makers.

Key point: we’re not ready. Characterizes the research community as a cottage industry of climate modellers. Interesting analogy: health sciences, which is almost entirely a “point-of-service” community that reacts to people coming in the door, with no coherent forecasting service. Finally, some examples of forecasting spread of west nile disease, lyme disease, etc.

ICSE proper finished on Friday, but a few brave souls stayed around for more workshops on Saturday. There were two workshops in adjacent rooms that had a big topic overlap: SE Foundations for End-user programming (SEE-UP) and Software Engineering for Computational Science and Engineering (SECSE, pronounced “sexy”). I attended the latter, but chatted to some people attending the former during the breaks – seems we could have merged the two workshops for interesting effect. At SECSE, the first talk was by Greg Wilson, talking about the results of his survey of computational scientists. Some interesting comments about the qualitative data he showed, including the strong confidence exhibited in most of the responses (people who believe they are more effective at using computers than their colleagues). This probably indicates a self-selection bias, but it would be interesting to probe the extent of this. Also, many of them take a “toolbox” perspective – they treat the computer as a set of tools, and associate effectiveness with how well people understand the different tools, and how much they take the time to understand them. Oh and many of them mention that using a Mac makes them more effective. Tee Hee.

Next up: Judith Segal, talking about organisational and process issues – particularly the iterative, incremental approach they take to building software. Only cursory requirements analysis and only cursory testing. The model works because the programmers are the users – they build software for themselves, and because the software is developed (initially) only to solve a specific problem, so they can ignore maintainability and usability. Of course, the software often does escape from the lab, and get used by others, which leads to a large risk of using incorrect, poorly designed software leading to incorrect results. For the scientific communities Judith has been working with, there’s a cultural issue too – the scientists don’t value software skills, because they’re focussed on scientific skills and understanding. Also, openness is a problem because they are busy competing for publications and funding. But this is clearly not true of all scientific disciplines, as the climate scientists I’m familiar with are very different: for them computational skills are right at the core of their discipline, and they are much more collaborative than competitive.

Roscoe Bartlett, from Sandia Labs, presenting “Barely Sufficient Software Engineering: 10 Practices to Improve Your CSE Software”. It’s a good list: Agile (incremental) development, Code management, mail lists, checklists, make the source code the primary source of documentation. Most important was the idea of “barely sufficient”. Mindless application of formal software engineering processes to computational science doesn’t make any sense.

Carlton Crabtree described a study design to investigate the role of agile and plan-driven development processes among scientific software development projects. They are particularly interested in exploring the applicability of the Boehm and Turner model as an analytical tool. They’re also planning to use grounded theory to explore the scientists’ own perspectives, although I don’t quite get how they will reconcile the constructivist stance of grounded theory (it’s intended as a way of exploring the participants’ own perspectives) with the use of a pre-existing theoretical framework, such as the Boehm and Turner model.

Jeff Overbey, on refactoring Fortran. First, he started with a few thoughts on the history of Fortran (the language that everyone keeps thinking will die out, but never does. Some reference to zombies in here…). Jeff pointed out that languages only ever accumulate features (because removing features breaks backwards compatibility), so they just get more complex and harder to use with each update to the language standard. So, he’s looking at whether you can remove old language features using refactoring tools. This is especially useful for the older language features that encourage bad software engineering practices. Jeff then demo’d his tool. It’s neat, but is currently only available as an Eclipse plugin. If there was an emacs version, I could get lots of climate scientists to use this. [note: In the discussion, Greg recommended the book Working effectively with legacy code].

Next up: Roscoe again, this time on integration strategies. The software integration issues he describes are very familiar to me, and he outlined an “almost” continuous integration process, which makes a lot of sense. However, some of the things he describes as challenges don’t seem to be problems in the environment I’m familiar with (the climate scientists at the Hadley Centre). I need to follow up on this.

Last talk before the break: Wen Yu, talking about the use of program families for scientific computation, including a specific application for finite element method computations.

After an infusion of coffee, Ritu Arora, talking about the application of generative programming for scientific applications. She used a checkpointing example as a proof-of-concept, and created a domain-specific language for describing checkpointing needs. Checkpointing is interesting because it tends to be a cross-cutting concern; generating code for it and automatically weaving it into the application is likely to be a significant benefit. Initial results are good: the automatically generated code had performance profiles similar to hand-written checkpointing code.
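To see why checkpointing is cross-cutting, here’s a hand-rolled sketch (my own illustration, not Arora’s DSL or its generated code): the save/restore logic is tangled through a loop that otherwise knows nothing about checkpointing, which is exactly the kind of code a generative tool could weave in automatically.

```python
# Hand-written checkpointing tangled into a toy simulation loop. Every
# line touching CHECKPOINT is orthogonal to the numerical work, yet has
# to be written (and maintained) inside it - the cross-cutting concern.
import json, os, tempfile

CHECKPOINT = os.path.join(tempfile.gettempdir(), "sim_checkpoint.json")

def run_simulation(n_steps):
    # Restore state if a checkpoint exists, otherwise start fresh.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "value": 0.0}

    while state["step"] < n_steps:
        state["value"] += 0.5          # stand-in for one timestep of real work
        state["step"] += 1
        if state["step"] % 10 == 0:    # checkpoint every 10 steps
            with open(CHECKPOINT, "w") as f:
                json.dump(state, f)
    return state["value"]

if os.path.exists(CHECKPOINT):
    os.remove(CHECKPOINT)              # clean slate for the demo
print(run_simulation(25))              # 12.5
```

If the process dies mid-run, a rerun resumes from the last saved step instead of step zero; a generative approach would let the scientist state only the "every 10 steps" policy and synthesize the rest.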

Next: Daniel Hook on testing for code trustworthiness. He started with some nice definitions and diagrams that distinguish some of the key terminology e.g. faults (mistakes in the code) versus errors (outcomes that affect the results). Here’s a great story: he walked into a glass storefront window the other day, thinking it was a door. The fault was mistaking a window for a door, and the error was about three feet. Two key problems: the oracle problem (we often have only approximate or limited oracles for what answers we should get) and the tolerance problem (there’s no objective way to say that the results are close enough to the expected results so that we can say they are correct). Standard SE techniques often don’t apply. For example, the use of mutation testing to check the quality of a test set doesn’t work on scientific code because of the tolerance problem – the mutant might be closer to the expected result than the unmutated code. So, he’s exploring a variant and it’s looking promising. The project is called matmute.
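The way the tolerance problem defeats mutation testing can be made concrete with invented numbers: if a seeded mutant’s output lands closer to the approximate oracle than the original code’s output, a tolerance-based test cannot “kill” it.

```python
# Tolerance problem sketch (numbers invented, not from Hook's talk):
# both the original code and a mutant pass the same tolerance check,
# so the mutant survives and tells us nothing about test quality.
def original(x):
    return x**2 + 0.05 * x   # imagine the 0.05 term is a known model bias

def mutant(x):
    return x**2 + 0.04 * x   # seeded fault: coefficient changed

expected = 4.0               # approximate oracle for x = 2
tolerance = 0.2

orig_err = abs(original(2.0) - expected)  # 0.10
mut_err = abs(mutant(2.0) - expected)     # 0.08 - the mutant is *closer*

# Both pass, and the mutant even looks "better" than the original:
print(orig_err <= tolerance, mut_err <= tolerance, mut_err < orig_err)
```

With an exact oracle the mutant would be caught immediately; it’s the combination of an approximate expected value and a tolerance band that makes standard mutation analysis break down.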

David Woollard, from JPL, talking about inserting architectural constraints into legacy (scientific) code. David has been doing some interesting work with assessing the applicability of workflow tools to computational science.

Parmit Chilana from U Washington. She’s working mainly with bioinformatics researchers, comparing the work practices of practitioners with researchers. The biologists understand the scientific relevance, but not the technical implementation; the computer scientists understand the tools and algorithms, but not the biological relevance. She’s clearly demonstrated the need for domain expertise during the design process, and explored several different ways to bring both domain expertise and usability expertise together (especially when the two types of expert are hard to get because they are in great demand).

After lunch, the last talk before we break out for discussion. Val Maxville, preparing scientists for scalable software development. Val gave a great overview of the challenges for software development at iVEC. AuScope looks interesting – an integration of geosciences data across Australia. For each of the different projects, Val assessed how far they have taken up practices from the SWEBOK – how much they have applied them, and how much they value them. And she finished with some thoughts on the challenges for software engineering education for this community, including balancing between generic and niche content, and balancing between ‘on demand’ versus a more planned skills development process.

And because this is a real workshop, we spent the rest of the afternoon in breakout groups having fascinating discussions. This was the best part of the workshop, but of course required me to put away the blogging tools and get involved (so I don’t have any notes…!). I’ll have to keep everyone in suspense.

Friday, the last day of the main conference, kicked off with Pamela Zave’s keynote “Software Engineering for the Next Internet”. Unfortunately I missed the first few minutes of the talk. But I regret that, because this was an excellent keynote. Why do I say that? Because Pamela demonstrated a beautiful example of what I want to call “software systems thinking”. By analyzing them from a software engineering perspective, she demonstrated how some of the basic protocols of the internet (e.g. the Session Initiation Protocol, SIP), and the standardization process by which they are developed, are broken in interesting ways. The reason they are broken is that they ignore software engineering principles. I thought the analysis was compelling: both thorough in terms of the level of detail, and elegant in the simplicity of the analysis.

Here are some interesting tidbits:

  • A corner case is a possible behaviour that emerges from the interaction of unanticipated constraints. It is undesirable, and designers typically declare it to be rare and unimportant, without any evidence. Understanding and dealing with corner cases is important for assessing the robustness of a design.
  • The IETF standards process is an extreme (pathological?) case of bottom-up thinking. It sets up an artificial conflict between generality and simplicity, because any new needs are dealt with by adding more features and more documents to the standard. Generality is always achieved by making the design more complex. Better abstractions, and some more top-down analysis, can provide simple and general designs (and Pamela demonstrated a few).
  • How did the protocols get to be this broken? Most network functions are provided by cramming them into the IP layer. This is believed to be more efficient, and in the IETF design process, efficiency always takes precedence over separation of concerns.
  • We need a successor to the end-to-end principle. Each application should run on a stack of overlays that exactly meets its requirements. Overlays have to be composable. The bottom underlay runs on a virtual network which gets a predictable slice of the real network resources. Of course, there are still some tough technical challenges in designing the overlay hierarchy.

So, my reflections. Why did I like this talk so much? First it had an appealing balance of serious detail (with clear explanations) and new ideas that are based on an understanding of the big picture. Probably it helps that she’s talking about an analysis approach using techniques that I’m very familiar with (some basic software engineering design principles: modularity, separation of concerns, etc), and applies them to a problem that I’m really not familiar with at all (detailed protocol design). So that combination allows me to follow most of the talk (because I understand the way she approaches the problem), but tells me a lot of things that are new and interesting (because the domain is new to me).

She ended with a strong plug for domain-specific research. It’s more fun and more interesting! I agree wholeheartedly with that. Much of software engineering research is ultimately disappointing because in trying to be too generic it ends up being vague and wishy washy. And it misses good pithy examples.

So, having been very disappointed with Steve McConnell’s opening keynote yesterday, I’m pleased to report that the keynotes got steadily better over the week. Thursday’s keynote was by Carlo Ghezzi, entitled Reflections on 40+ years of software engineering research and beyond: An Insider’s View. He started with a little bit of history of the SE research community and the ICSE conference, but the main part of the talk was a trawl through the data from the conference over the years, motivated by questions such as “how international are we as a community?”, “how diverse?” (e.g. academia, industry…), and “how did the research areas included in ICSE evolve?”. For example, there has been a clear trend in the composition of the program committee, from being N. American dominated (80% at the first ICSE), to now approximately equal N. American and European, with some from Asia and elsewhere. However, there is a startling trend in the industry vs. academia mix. The attendees at the first conference were 80% industry and only 20% academics. This has steadily changed: the conference is now 90% academics. The number of accepted papers each year has remained fairly steady (the average is 44), but with a strong growth in submissions over the past 15 years from 150 to 400. That gives us a paper acceptance rate now well below 15%. This is clearly good for the academics – the low acceptance rate keeps the quality of the accepted papers high, and makes the conference the top choice as a publication venue. But a strong academic research program clearly does not attract practitioners to attend.

In Carlo’s analysis of research areas, I was struck by the graph of number of papers on programming languages, which looks like a pair of vampire teeth – a huge spike in this area in the early days of ICSE, then nothing for years, and again a huge spike in the last couple of years. A truly interesting and surprising result.

Towards the end of the talk, Carlo got onto the question of how we could identify our best products. He talked about the strengths and weaknesses of quantitative measures such as citation count (difficult as it’s a moving target, and you have to account for journal/conference versions), number of downloads from the ACM digital library over 12 months, etc. He drew a lot on a report by the Joint Committee on Quantitative Assessment of Research. He also mentioned Meyer’s viewpoint article in CACM April 2009, and of course, Parnas’s somewhat less nuanced “Stop the numbers game”. Why is the problem of quantitative assessment of research becoming so hot today? It’s being increasingly used to rank journals and conferences and individuals. Many stakeholders now need to evaluate research, and peer review is considered to be expensive and subjective, while numeric metrics are considered to be simple and objective. The Joint Committee report says that, to the contrary, numeric metrics are simple and misleading. From the report: Much of modern bibliometrics is flawed. The meaning of a citation can be even more subjective than peer review. Citation counts are only valid if reinforced by other judgements.

Carlo’s final message was that we have to care about the impact of our research: understanding, measuring, and improving it. Because if we don’t, others will (governments, funding agencies, universities, etc). Okay, that’s a good argument. I’ve been skeptical of SIGSOFT’s Impact Project in the past, largely because I think the process by which research ideas filter into industrial practice is much more complex, and takes much longer, than everyone seems to expect. But I guess taking control of the assessment of impact is the obvious way to address this issue.

After the break, Jorge presented his paper on the Secret Life of Bugs. He did a great job presenting the work, to an absolutely packed room, and I had lots of people comment on how much they enjoyed the paper afterwards. I beamed with pride.

But for most of the day, I was busy trying to finish off my talk “Software Engineering for the Planet”, in time for the session at 2pm. Many thanks to Spencer, Jon, Carolyn and Alicia for helping me polish it prior to delivery. I’ll get the slides up on the web soon. I think the session went very well – the questions and discussions afterwards were very encouraging – most people seemed to immediately get the key message (that we should stop focussing our energies on personal green choices, and instead figure out how our professional skills and experience can be used to address the climate crisis). Aran posted a quick summary of the session, and some afterthoughts. Now we’ve got to do the community building, and keep the momentum going. [Aran said he doesn’t think I’ll get much research done in the next few months. He might be right, but I can just declare that this is now my research…]

Okay, the main conference started today, and we kick off with the keynote speaker – Steve McConnell talking about “The 10 most powerful ideas in software engineering”. Here are my thoughts: when magazines are short of ideas for an upcoming issue, they resort to the cheap journalist’s trick of inventing top ten lists. It makes for easy filler reading that never really engages the brain. Unfortunately, this approach also leads to dull talks. The best keynotes have a narrative thread. They tell a story. They build up ideas in interesting new ways. The top ten format kills this kind of narrative stone dead (except perhaps when used in parody). Okay, so I didn’t like the format, but what about the content? Steve walked through ten basic concepts that we’ve been teaching in our introductory software engineering courses for years, so I learned nothing new. Maybe this would be okay as a talk to junior programmers who missed out on software engineering courses in school. For ICSE keynotes, I expect a lot more – I’d have liked at least some sharper insights, or better marshalling of the evidence. I’m afraid I have to add this to my long list of poor ICSE keynotes. Which is okay, because ICSE keynotes always suck – even when the chosen speakers are normally brilliant thinkers and presenters. Maybe I’ll be proved wrong later this week… For what it’s worth, here’s his top ten list (which he said were in no particular order):

  1. Software Development work is performed by human beings. Human factors make a huge difference in the performance of a project.
  2. Incrementalism is essential. The benefits are feedback, feedback, and feedback! (on the software, on the development process, on the developer capability). And making small mistakes that prevent bigger mistakes later.
  3. I’ve no idea what number 3 was. Please excuse my inattention.
  4. Cost to fix a defect increases over time, because of the need to fix all the collateral and downstream consequences of the error.
  5. There’s an important kernel of truth in the waterfall model. Essentially, there are three intellectual phases: discovery, invention, construction. They are sequential, but also overlapping.
  6. Software estimates can be improved over time, by reducing their uncertainty as the project progresses.
  7. The most powerful form of reuse is full reuse – i.e. not just code and design, but all aspects of process.
  8. Risk management is important.
  9. Different kinds of software call for different kinds of software development (the toolbox approach). This was witty: he showed a series of pictures of different kinds of saw, then newsflash: software development is as difficult as sawing.
  10. The software engineering body of knowledge (SWEBOK)

Next up, the first New Ideas and Emerging Results session. This is a new track at this year’s ICSE, and the intent is to have a series of short talks, with a poster session at the end of the day. Although I’m surprised how hard it was to get a paper accepted: of 118 submissions, they selected only 21 for presentation (an 18% acceptance rate). The organisers also encouraged the presenters to use the Pecha Kucha format: 20 slides on an automated timer, with 20 seconds per slide. Just to make it more fun and more dynamic.

I’m disappointed to report that none of the speakers this morning took up this challenge, although Andrew Begel’s talk on social networking for programmers was very interesting (and similar to some of our ideas for summer projects this year). The fourth talk, by Abram Hindle, also didn’t use the Pecha Kucha format, but made up for it with a brilliant and beautiful set of slides that explain how to form interesting time series analysis visualizations of software projects by mining the change logs.

Buried in the middle of the session was an object lesson in misuse of empirical methods. I won’t name the guilty parties, but let me describe the flaw in their study design. Two teams were assigned a problem to analyze; one team was given a systems architecture, and the other wasn’t. To measure the effect of being given this architecture on the requirements analysis, the authors asked experts to rate each of several hundred requirements generated by each team, and then used a statistical test to see whether the requirements from one team differed on this rating from those of the other. Unsurprisingly, they discovered a statistically significant difference. Unfortunately, the analysis is completely invalid, because they made a classic unit-of-analysis error. The unit of analysis for the experimental design is the team, because it was teams that were assigned the different treatments. But the statistical test was applied to individual requirements. There was no randomization of these requirements – all the requirements from a given team have to be taken as a single unit. The analysis that was performed in this study merely shows that the requirements came from two different teams, which we knew already. It shows nothing at all about the experimental hypothesis. I guess the peer review process has to let a few clunkers through.
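The flaw is easy to reproduce in simulation. Here's a hypothetical sketch (my own, not from the paper in question): two teams drawn from the same population, with no treatment effect at all, still produce a "highly significant" result when each requirement is treated as an independent observation.

```python
import random
import statistics as stats

random.seed(42)

def team_requirements(n, team_effect):
    """Simulated quality ratings for one team's requirements. team_effect is
    a team-level idiosyncrasy (style, domain knowledge, rater familiarity),
    NOT a treatment effect -- by construction there is no treatment here."""
    return [random.gauss(5.0 + team_effect, 1.0) for _ in range(n)]

# Two teams from the same population; their random team-level offsets are
# fixed here for reproducibility.
team_a = team_requirements(300, +0.4)
team_b = team_requirements(300, -0.3)

def welch_t(x, y):
    """Naive two-sample t statistic, treating each requirement as an
    independent observation -- the unit-of-analysis error."""
    se = (stats.variance(x) / len(x) + stats.variance(y) / len(y)) ** 0.5
    return (stats.mean(x) - stats.mean(y)) / se

t = welch_t(team_a, team_b)
print(abs(t) > 4)   # -> True: "highly significant", yet no treatment exists
```

All the test detects is that the two team means differ, which any two teams' means will. With the team as the proper unit of analysis there are only two observations — one per treatment — and no statistics are possible at all.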

Well, we reach the end of the session and nobody did the Pecha Kucha thing. Never mind – my talk is first up in the next NIER session this afternoon, and I will take the challenge. Should be hilarious. On the plus side, I was impressed with the quality of all the talks – they all managed to pack in key ideas, make them interesting, and stick to the 6 minute time slot.

So, here’s an interesting thought that came up at the Michael Jackson festschrift yesterday. Michael commented in his talk that understanding is not a state, it’s a process. David Notkin then asked how we can know how well we’re doing in that process. I suggested that one of the ways you know is by discovering where your understanding is incorrect, which can happen if your model surprises you. I noticed this is a basic mode of operation for earth system modelers. They put their current best understanding of the various earth systems (atmosphere, ocean, carbon cycle, atmospheric chemistry, soil hydrology, etc) into a coupled simulation model and run it. Whenever the model surprises them, they know they’re probing the limits of their understanding. For example, the current generation of models at the Hadley Centre don’t get the Indian monsoon in the right place at the right time. So they know there’s something in that part of the model they don’t yet understand sufficiently.

Contrast this with the way we use (and teach) modeling in software engineering. For example, students construct UML models as part of a course in requirements analysis. They hand in their models, and we grade them. But at no point in the process do the models ever surprise their authors. UML models don’t appear to have the capacity for surprise. Which is unfortunate, given what the students did in previous courses. In their programming courses, they were constantly surprised. Their programs didn’t compile. Then they didn’t run. Then they kept crashing. Then they gave the wrong outputs. At every point, the surprise is a learning opportunity, because it means there was something wrong with their understanding, which they have to fix. This contrast explains a lot about the relative value students get from programming courses versus software modeling courses.

Now of course, we do have some software engineering modeling frameworks that have the capacity for surprise. They allow you to create a model and play with it, and sometimes get unexpected results. For example, Alloy. And I guess model checkers have that capacity too. A necessary condition is that you can express some property that your model ought to have, and then automatically check that it does have it. But that’s not sufficient, because if the properties you express aren’t particularly interesting, or are trivially satisfied, you still won’t be surprised. For example, UML syntax checkers fall into this category – when your model fails a syntax check, that’s not surprising, it’s just annoying. Also, you don’t necessarily have to formally state the properties – but you do have to at least have clear expectations. When the model doesn’t meet those expectations, you get the surprise. So surprise isn’t just about executability, it’s really about falsifiability.
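Here's a toy illustration of the kind of surprise I mean (a hand-rolled sketch, not Alloy or any real model checker): state a mutual-exclusion property, exhaustively explore a small model's state space, and discover the property fails.

```python
# A toy model: two processes using a broken "check-then-set" lock.
# Each process is in state 0 = idle, 1 = saw the lock free (about to take
# it), or 2 = in the critical section. Tools like Alloy or a model checker
# automate exactly this pattern: state a property, explore the states,
# and (sometimes) get surprised.

def successors(state):
    procs, lock = state
    for i in range(2):
        p = procs[i]
        if p == 0 and not lock:       # read the lock and see it free
            yield (tuple(1 if j == i else procs[j] for j in range(2)), lock)
        elif p == 1:                  # take the lock, enter critical section
            yield (tuple(2 if j == i else procs[j] for j in range(2)), True)
        elif p == 2:                  # leave critical section, release lock
            yield (tuple(0 if j == i else procs[j] for j in range(2)), False)

def check_mutex():
    """Property: the two processes are never both in the critical section.
    Returns a reachable violating state, or None if the property holds."""
    seen, frontier = set(), [((0, 0), False)]
    while frontier:
        state = frontier.pop()
        if state in seen:
            continue
        seen.add(state)
        if state[0] == (2, 2):        # property violated!
            return state
        frontier.extend(successors(state))
    return None

print(check_mutex())   # -> ((2, 2), True): both processes in the critical section
```

The check-then-set race is obvious once the checker hands you the counterexample, but the author of the model believed the lock worked — that gap between expectation and result is the surprise, and the learning opportunity.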

So, I made it to ICSE at last. I’m way behind on blogging this one: the students from our group have been here for several days, busy blogging their experiences. So far, the internet connection is way too weak for liveblogging, so I’ll have to make do with post-hoc summaries.

I spent the morning at the Socio-Technical Congruence (STC) workshop. The workshop is set up with discussants giving prepared responses to each full paper presentation, and I love the format. The discussants basically riff on ideas that the original paper made them think of, which often ends up being more interesting than the original paper. For example, Peri Tarr clarified how to tell whether something counts as a design pattern. A design pattern is a (1) proven solution to a (2) commonly occurring problem in a (3) particular context. To assess whether an observed “pattern” is actually a design pattern, you need to probe whether all three things are in place. For example, the patterns that Marcelo had identified do express implemented solutions, but he has not yet identified the problems/concerns they solve, and the contexts in which the patterns are applicable.

Andy Begel’s discussion included a tour through learning theory (I’ve no idea why, but I enjoyed the ride!). On a single slide, he took us through the traditional “empty container” model of learning; Piaget‘s constructivism; Vygotsky‘s social learning; Papert‘s constructionism; Van Maanen & Schein‘s newcomer socialization; Hutchins‘ distributed cognition; and Lave & Wenger‘s legitimate peripheral participation. Whew. Luckily, I’m familiar with all of these except the Van Maanen & Schein stuff – I’m looking forward to reading that. Oh, and an interesting book recommendation: Wolfram’s A New Kind of Science (“Anything that’s worth knowing is really complex”). Then Andy posed some interesting questions: how long can software live? How big can it get? How many people can work on it? And he proposed we should design for long-term social structures, rather than modular architecture.

We then spent some time discussing whether designing the software architecture is the same thing as designing the social structure. Audris suggested that while software architecture people tend not to talk about the social dimension, they are in fact secretly designing it. If the two get out of synch, people are very adaptable – they find a way of working around the mismatch. Peri pointed out that technology also adapts to people. They are different things, with feedback loops that affect each other. It’s an emergent, adaptive thing.

And someone mentioned Rob DeLine’s keynote on the weekend at CHASE, in which he pointed out that only about 20% of ICSE papers mention the human dimension, and suggested we should seek to flip the ratio. To make it 80%, we should insist that papers that ignore the people aspects have to prove that people are irrelevant to the problem being addressed. Nice!

After lots of catching up with ICSE regulars over lunch, I headed over to the last session of the Michael Jackson festschrift, to hear Michael’s talk. He kicked off with some quotes that he admitted he can’t take credit for: “description should precede invention”, and Tony Hoare’s: “there are 2 ways to make a system (1) make it so complicated that it has no obvious deficiencies or (2) make it so simple that it obviously has no deficiencies”. And another which may or may not be original: “Understanding is a process, not a state”. And another interesting book recommendation: Personal Knowledge by Michael Polanyi.

So, here’s the core of MJ’s talk: every “contrivance” has an operational principle, which specifies how the characteristic parts fulfill their function. Further, knowledge of physics, chemistry, etc. is not sufficient to understand and recognise the operational principle. E.g. in describing a clock, the description of the mechanism is not a scientific description. While the physical sciences have made great strides, our description of contrivances has not. The operational principle answers questions like “What is it?”, “What is it for?”, and “How do the parts interact to achieve the purpose?”. To supplement this, mathematical and scientific knowledge describes the underlying laws, the context necessary for success (e.g. a pendulum clock only works in the appropriate gravitational field, and must be completely upright – it won’t work on the moon, or on a ship), the part properties necessary for success, possible improvements, specific failures and their causes, and the feasibility of a proposed contrivance.

MJ then goes on to show how problem decomposition works:

(1) Problem decomposition – by breaking out the problem frames: e.g. for an elevator: provide prioritized lift service, brake on danger, provide information display for users.

(2) Instrumental decomposition – building manager specifies priority rules, system uses priority rules to determine operation.

The sources of complexity are the intrinsic complexity of each subproblem, plus the interaction of subproblems. But he calls for the use of free decomposition (meaning free as in unconstrained). For initial description purposes, there are no constraints on how the subproblems will interact; the only driver is that we’re looking for simple operating principles.

Finally, he identified some composition concerns: interleaving (edit priority rules vs lift service); requirements elaboration (e.g. book loans vs member status); requirements conflict (inter-library vs member loan); switching (lift service vs emergency action); domain sharing (e.g. phone display: camera vs gps vs email).

The discussion was fascinating, but I was too busy participating to take notes. Hope someone else did!

Well, I had a fabulous week at the EGU. I tried to take in many different aspects of climate research, but inevitably ended up at lots of sessions on earth systems informatics (to satisfy my techie streak), and sessions looking at current cutting edge research on earth systems models, such as integrating weather forecast and climate models, model ensembles, and probabilistic predictions. Lots of interesting things going on in this space. 

Here’s what I would regard as the major themes of the conference from my perspective:

  • Ocean Acidification. It’s pretty easy to predict because it’s linear in the concentration of CO2 in the atmosphere – i.e. there’s no uncertainty at all. When we kill off life in the seas we also lose a major carbon sink.
  • Feedbacks. I learned at least nine different definitions of the word feedback, and also that there are a huge number of feedbacks that we might want to put into an earth system model, so someone’s got to work out which ones are most likely to be important.
  • Abrupt Climate Change. I learned that the paleontological record tells us that the earth is quite likely to be twitchy, and we still don’t know anywhere near enough about the triggers. Oh, and lots of climate scientists think we’ve already hit some of those triggers.
  • Probabilistic forecasting. I learned a lot about the use of model ensembles (both multi-models, and perturbed physics experiments with single models) to quantify our uncertainties. There’s a strong move in the climate community to replace single predictions of climate change with probabilistic forecasts. The simplest exposition of this idea is MIT’s wheels of fortune.
  • Simpler targets for policy makers. I’m very taken with the analysis from Chris Jones and colleagues showing that if we want to stay below the 2°C temperature rise, we have a total budget of One Trillion Tonnes of Carbon to emit, and since the dawn of industrialization, we have used up more than half of it.
  • Geo-Engineering. Suddenly it’s okay for climate scientists to start talking about geo-engineering. For years, this has been anathema, on the basis that even just talking about this possibility can undermine the efforts to reduce carbon emissions (which is always the most sensible way to tackle the problem). But now it appears that many scientists have concluded that it’s too late anyway to do the right thing, and now we have to start thinking the unthinkable.
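The arithmetic behind that trillion-tonne framing is simple enough to sketch. The total budget and the “more than half used” claim are from the talk; the constant emissions rate (~10 GtC/yr, roughly the late-2000s fossil plus land-use figure) and the 55% split are my assumptions for illustration:

```python
# Back-of-envelope for the trillion-tonne carbon budget. Budget figure from
# the talk; emissions rate and the ~55% "already used" split are assumed.
budget_total = 1.0e12    # tonnes of carbon, to stay below ~2 degrees C
already_used = 0.55e12   # "more than half" -- assumed here as ~55%
rate = 1.0e10            # tonnes of carbon per year (~10 GtC/yr, assumed)

remaining = budget_total - already_used
years_left = remaining / rate
print(round(years_left))   # -> 45: decades, not centuries, at a constant rate
```

Even this crude version makes the point of the framing: at anything like current emission rates, the remaining budget is a matter of decades, and the date we cross it matters more than any particular annual target.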

Plus some things that I missed that I wish I’d seen (based on what others told me afterwards):