The recording of my Software Engineering for the Planet talk is now available online. Having watched it, I’m not terribly happy with it – it’s too slow, too long, and I make a few technical mistakes. But hey, it’s there. For anyone already familiar with the climate science, I would recommend starting around 50:00 (slide 45) when I get to part 2 – what should we do?

[Update: A shorter (7 minute) version of the talk is now available]

The slides are also available as a pdf with my speaking notes (part 1 and part 2), along with the talk that Spencer gave in the original presentation at ICSE. I’d recommend these pdfs rather than the video of me droning on….

Having given the talk three times now, I have some reflections on how I’d do it differently. First, I’d dramatically cut down the first part on the climate science, and spend longer on the second half – what software researchers and software engineers can do to help. I also need to handle skeptics in the audience better. There’s always one or two, and they ask questions based on typical skeptic talking points. I’ve attempted each time to answer these questions patiently and honestly, but it slows me down and takes me off-track. I probably need to just hold such questions to the end.

Mistakes? There are a few obvious ones:

  • On slide 11, I present a synoptic view of the earth’s temperature record going back 500 million years (it’s this graph from wikipedia). I use it to put current climate change into perspective, but also to make the point that small changes in the earth’s temperature can be dramatic – in particular, the graph indicates that the difference between the last ice age and the current inter-glacial is about 2°C average global temperature. I’m no longer sure this is correct. Most textbooks say it was around 8°C colder in the last ice age, but these appear to be based on an assumption that temperature readings taken from ice cores at the poles represent global averages. The temperature change at the poles is always much greater than the global average, but it’s hard to compute a precise estimate of global average temperature from polar records. Hansen’s reconstructions seem to suggest 3°C-4°C. So the 2°C rise shown on the wikipedia chart is almost certainly an underestimate. But I’m still trying to find a good peer-reviewed account of this question.
  • On slide 22, I talk about Arrhenius’s initial calculation of climate sensitivity (to doubling of CO2) back in the 1890s. His figure was 4°C-5°C, whereas the IPCC’s current estimates are 2°C-4.5°C. And I need to pronounce his name correctly.

What’s next? I need to turn the talk into a paper…

This afternoon, I’m at the Science 2.0 symposium, or “What every scientist needs to know about how the web is changing the way they work”. The symposium has been organised as part of Greg’s Software Carpentry course. There’s about 120 people here, good internet access, and I got here early enough to snag a power outlet. And a Timmie’s just around the corner for a supply of fresh coffee. All set.

1:05pm. Greg’s up, introducing the challenge: for global challenges (e.g. disease control, climate change) we need two things: Courage and Science. Most of the afternoon will be talking about the latter. Six speakers, 40 minutes each, wine and cheese to follow.

1:08pm. Titus Brown, from Michigan State U. Approaching Open Source Science: Tools Approaches. Aims to talk about two things: how to suck people into your open source project, and automated testing. Why open source? Ideologically: for reproducibility and open communication. Idealistically: can’t change the world by keeping what you do secret. Practical reason: other people might help. Oh and “Closed-source science” is an oxymoron. First, the choice of license probably doesn’t matter, because it’s unlikely anyone will ever download your software. Basics: every open source project should have a place to get the latest release, a mailing list, and an openly accessible version control system. Cute point: a wiki and issue tracker are useful if you have time and manpower, but you don’t, so they’re not.

Then he got into a riff about whether or not to use distributed version control (e.g. git). This is interesting because I’ve heard lots of people complain that tools like git can only be used by ubergeeks (“you have to be Linus Torvalds to use it”). Titus has been using it for 6 months, and says it has completely changed his life. Key advantages: decouples developers from the server, hence ability to work offline (on airplanes), but still do version control commits. Also, frees you from “permission” decisions – anyone can take the code and work on it independently (as long as they keep using the same version control system). But there are downsides – creates ‘effective forks’, which might then lead to code bombs – someone who wants to remerge a fork that has been developed independently for months, and which then affects large parts of the code base.

Open development is different to open source. The key question is do you want to allow others to take the code and do their own things with it, or do you want to keep control of everything (professors like to keep control!). Oh, and you open yourself up to “annoying questions” about design decisions, and frank (insulting) discussion of bugs. But the key idea is that these are the hallmarks of a good science project – a community of scientists thinking and discussing design decisions and looking for potential errors.

So, now for some of the core science issues. Titus has been working on Earthshine – measuring the albedo of the earth by measuring how much radiation from the earth lights up the (dark side of the) moon. He ended up looking through the PVwave source code, trying to figure out what the grad student working on the project was doing. By wading through the code, he discovered the student had been applying the same correction to the data multiple times, to try and get a particular smoothing. But the only people who understood how the code worked were the grad student and Titus. Which means there was no way, in general, to know that the code works. Quite clearly, “code working” should not be judged by whether it does what the PI thinks it should do. In practice the code is almost never right – more likely that the PI has the wrong mental model. Which led to the realization that we don’t teach young scientists how to think about software – including being suspicious of their code. And CS programs don’t really do this well either. And fear of failure doesn’t seem to be enough incentive – there are plenty of examples where software errors have led to scientific results being retracted.

Finally, he finished off with some thoughts about automated testing. E.g. regression testing is probably the most useful thing scientists can do with their code: run the changed code and compare the new results with the old ones. If there are unexpected changes, then you have a problem. Oh, and put assert statements in to check that things that should never occur don’t ever occur. Titus also suggests that code coverage tools can be useful for finding dead code, and continuous integration is handy if you’re building code that will be used on multiple platforms, so an automated process builds the code and tests it on multiple platforms, and reports when something broke. Bottom line: automated testing allows you to ‘lock down’ boring code (code that you understand), and allows you to focus on ‘interesting’ code.
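To make that concrete, here’s what a minimal regression test might look like in Python (a sketch only: run_model and the stored reference_output.npy file are hypothetical stand-ins for whatever scientific code and “known good” output you actually have):

```python
import numpy as np

def run_model(params):
    """Stand-in for the scientific code under test (hypothetical)."""
    x = np.linspace(0, 1, 100)
    return np.sin(2 * np.pi * params["frequency"] * x)

def test_against_reference():
    """Regression test: compare today's output with a stored 'known good' run."""
    result = run_model({"frequency": 3.0})
    # The reference file was written the last time we trusted the code.
    reference = np.load("reference_output.npy")
    # Any unexpected change shows up as a test failure, prompting a closer look.
    np.testing.assert_allclose(result, reference, rtol=1e-10)

def test_sanity():
    """Assert that things which should never occur, never occur."""
    result = run_model({"frequency": 3.0})
    assert np.all(np.isfinite(result)), "model produced NaNs or infinities"
```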

Questions: I asked whether he has ever encountered problems with the paranoia among some scientific communities, for example, fear of being scooped, or journals who refuse to accept papers if any part has already appeared on the web. Titus pointed out that he has had a paper rejected without review, because when he mentioned that many people were already using the software, the journal editor then felt this meant it was not novel. Luckily, he did manage to publish it elsewhere. Journals have to take the lead by, for example, refusing to publish papers unless the software is open, because it’s not really science otherwise.

1:55pm. Next up Cameron Neylon, “A Web Native Research Record: Applying the Best of the Web to the Lab Notebook”. Cameron’s first slide is a permission to copy, share, blog, etc. the contents of the talk (note to self – I need this slide). So the web is great for mixing, mashups, syndicated feeds, etc. Scientists need to publish, subscribe, syndicate (e.g. updates to handbooks), remix (e.g. taking ideas from different disciplines and pull them together to get new advances). So quite clearly, the web is going to solve all our problems, right?

But our publication mechanisms are dead, broken, disconnected. A PDF of a scientific paper is a dead end, when really it should be linked to data, sources, citations, etc. It’s the links between things that matter. Science is a set of loosely coupled chunks of knowledge; they need to be tightly wired to each other so that we understand their context and their links. A paper is too big a piece to be thought of as a typical “chunk of science”. A tweet (example was of MarsPhoenix team announcing they found ice on Mars) is too small, and too disconnected. A blog post seems about right. It includes embedded links (e.g. to detailed information about the procedures and materials used in an experiment). He then shows how his own research group is using blogs as online lab notebooks. Even better, some blog posts are generated automatically by the machines (when dealing with computational steps in the scientific process). Then if you look at the graph of the ‘web of objects’, you can tell certain things about them. E.g. an experiment that failed occupies a certain position in the graph; a set of related experiments appear as a cluster; a procedure that wasn’t properly written up might appear as a disconnected note; etc.

Now, how do we get all this to work? Social tagging (folksonomies) doesn’t work well because of inconsistent use of tagging, not just across different people, but over time by the same person. Templates help, and the evolution of templates over time tells you a lot about the underlying ontology of the science (both the scientific process and the materials used). Cameron even points out places where the templates they have developed don’t fit well with established taxonomies of materials developed (over many years) within his field, and that these mismatches reveal problems in the taxonomies themselves, where they have ignored how materials are actually used.

So, now everything becomes a digital object: procedures, analyses, materials, data. What we’re left with is the links between them. So doing science becomes a process of creating new relationships, and what you really want to know about someone’s work is the (semantic) feed of relationships created. The big challenge is the semantic part – how do we start to understand the meaning of the links. Finally, a demonstration of how new tools like Google Wave can support this idea – e.g. a Wave plugin that automates the creation of citations within a shared document (Cameron has a compelling screen capture of someone using it).

Finally, how do we measure research impact? Eventually, something like pagerank. Which means scientists have to be wired into the network, which means everything we create has to be open and available. Cameron says he’s doing a lot less of the traditional “write papers and publish” and much more of this new “create open online links”. But how do we persuade research funding bodies to change their culture to acknowledge and encourage these kinds of contribution? Well, 70% of all research is basically unfunded – done on a shoestring.

2:40pm. Slight technical hitch getting the next speaker (Michael) set up, so a switch of speakers: Victoria Stodden, How Computational Science is Changing the Scientific Method. Victoria is particularly interested in reproducibility in scientific research, and how it can be facilitated. Massive computation changes what we can do in science, e.g. data mining for subtle patterns in vast databases, and large scale simulations of complex processes. Examples: climate modeling, high energy physics, astrophysics. Even mathematical proof is affected – e.g. use of a simulation to ‘prove’ a mathematical result. But is this really a valid proof? Is it even mathematics?

So, effectively this might be a third branch of science: (1) the deductive method for theory development – e.g. mathematics and logic; (2) the inductive/empirical method – the machinery of hypothesis testing; and now (3) large scale extrapolation and prediction. But there’s lots of contention about this third branch. E.g. Anderson “The End of Theory”, Hillis rebuttal – we look for patterns first, and then create hypotheses, just as we always have. Weinstein points out that simulation underlies the other branches – tools to build intuitions, and tools to test hypotheses. The scientific approach is primarily about the ubiquity of error, so that the main effort is to track down and understand sources of error.

Computational techniques are now widely used (e.g. in JASA, their use has grown over the last decade to more than half of all papers), but very few authors make their code open, and very little validation goes on, which means there is a growing credibility crisis. Scientists make their papers available, but not their complete body of research. Changes are coming (e.g. Madagascar, Sweave, …), along with the push towards reproducibility pioneered by Jon Claerbout.

Victoria did a study of one particular subfield: Machine Learning. Surveyed academics attending one of the top conferences in the field (NIPS). Why did they not share? Top reason: time it takes to document and clean up the code and data. Then, not receiving attribution, possibility of patents, legal barriers such as copyright, and potential loss of future publications. Motivations to share are primarily communitarian (for the good of science/community), while most of the barriers are personal (worries about attribution, tenure and promotion, etc).

Idea: take the Creative Commons license model, and create a reproducible research standard. All media components get released under a CC BY license, code gets released under some form of BSD license. But what about data? This gets a little complicated: raw facts alone are not generally copyrightable, but the expression of facts in a particular way is.

So, what are the prospects for reproducibility? Simple case: small scripts and open data. But harder case: inscrutable code and organic programming. Really hard case: massive computing platforms and streaming data. But it’s not clear that readability of the code is essential, e.g. Wolfram Alpha – instead of making the code readable (because in practice nobody will read it), make it available for anyone to run it in any way they like.

Finally, there’s a downside to openness, in particular, a worry that science can be contaminated because anyone can come along, without the appropriate expertise, and create unvalidated science and results, and they will get cited and used.

3:40pm. David Rich. Using “Desktop” Languages for Big Problems. David starts off with an analogy of different types of drill – e.g. a hand drill – trivially easy to use, hard to hurt yourself, but slow; up to big industrial drills. He then compares these to different programming languages / frameworks. One particular class of tools, cordless electric drills, is interesting because it provides a balance between power and usability/utility. So what languages and tools do scientific programmers need? David presented the results of a survey of their userbase, to find out what tools they need. Much of the talk was about the need/potential for parallelization via GPUs. David’s company has a tool called Star-P which allows users of Matlab and NumPy to transform their code for parallel architectures.

4:10pm. Michael Nielsen. Doing Science in the Open: How Online Tools are Changing Scientific Discovery. Case study: Terry Tao‘s use of blogs to support community approaches to mathematics. In particular, he deconstructs one particular post: Why global regularity for Navier-Stokes is hard, which sets out a particular problem, identifies the approaches that have been used, and has attracted a large number of comments from some of the top mathematicians in the field, all of which helps to make progress on the problem. (similar examples from other mathematicians, such as the polymath project), and a brand new blog for this: polymathprojects.org.

But these examples couldn’t be published in the conventional sense. They are more like the scaling up of a conversation that might occur in a workshop or conference, but allowing the scientific community to continue the conversation over a long period of time (e.g. several years in some cases), and across geographical distance.

These examples are pushing the boundaries of blog and wiki software. But blogs are just the beginning. Blogs and open notebooks enable filtered access to new information sources and new conversations. Essentially, they are restructuring expert attention – people focus on different things and in a different way than before. And this is important because expert attention is the critical limiting factor in scientific research.

So, here’s a radically different idea. Markets are a good way to efficiently allocate scarce resources. So can we create online markets in expert attention? For example, Innocentive. One particular example: the need in India to get hold of solar powered wireless routers to support a social project (ASSET India) helping women in India escape from exploitation and abuse. So this was set up as a challenge on Innocentive. A 31-year-old software engineer from Texas designed a solution, and it’s now being prototyped.

But, after all, isn’t all this a distraction? Shouldn’t you be writing papers and grant proposals rather than blogging and contributing to wikipedia? When Galileo discovered the rings of Saturn (actually, that Saturn looked like three blobs), he sent an anagram to Kepler, which then allowed him to claim credit. The modern scientific publishing infrastructure was not available to him, and he couldn’t conceive of the idea of open sharing of discoveries. The point being that these technologies (blogs etc) are too new to understand the full impact and use, but we can see ways in which they are already changing the way science is done.

Some very interesting questions followed about attribution of contribution, especially for the massive collaboration examples such as polymath. In answer, Michael pointed to the fact that the record of the collaboration is open and available for inspection, and that letters of recommendation from senior people matter a lot, and junior people who contributed in a strong way to the collaboration will get great letters.

[An aside: I’m now trying to follow this on Friendfeed as well as liveblogging. It’s going to be hard to do both at once]

4:55pm. Last but not least, Jon Udell. Collaborative Curation of Public Events. So, Jon claims that he can’t talk about science itself, because he’s not qualified, but will talk about other consequences of the technologies that we’re talking about. For example, in the discussions we’ve been having with the City of Toronto on its open data initiative, there’s a meme that governments sit on large bodies of data that people would like to get hold of. But in fact, citizens themselves are owners and creators of data, and that’s a more interesting thing to focus on than governments pushing data out to us. For example, posters advertising local community events on lampposts in neighbourhoods around the city. Jon makes the point that this form of community advertising is outperforming the web, which is shocking!

Key idea: syndication hubs. For example, an experiment to collate events in Keene, NH, in the summer of 2009. Takes in datafeeds from various events websites, calendar entries etc. Then aggregates them, and provides feeds out to various other websites. But not many people understand what this is yet – it’s not a destination, but a broker. Or another way of understanding it is as ‘curation’ – the site becomes a curator looking after information about public events, but in a way that distributes responsibility for curation to the individual sources of information, rather than say a person looking after an events diary.

Key principles: syndication is a two-way process (you need to both subscribe to things and publish your feeds). But tagging and data formatting conventions become critical. The available services form an ecosystem, and they co-evolve, and we’re now starting to understand the ecosystem around RSS feeds – sites that are publishers, subscribers, and aggregators. A similar ecosystem is growing up around iCalendar feeds, but it is currently missing aggregators. iCalendar is interesting because the standard is 10 years old, but it’s only recently become possible to publish feeds from many tools. And people are still using RSS feeds to do this, when they are the wrong tool – an RSS feed doesn’t expose the data (calendar information) in a usable way.
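To make the aggregator idea concrete, here’s a minimal sketch in Python using the feedparser library (the feed URLs are placeholders, and a real hub like the Keene experiment obviously does far more than this):

```python
import time
import feedparser  # third-party library: pip install feedparser

# Placeholder feed URLs -- stand-ins for the event sites a hub would pull from.
FEEDS = [
    "http://example.org/events.rss",
    "http://example.com/calendar.rss",
]

def aggregate(urls):
    """Pull entries from several RSS/Atom feeds and merge them, newest first."""
    entries = []
    for url in urls:
        feed = feedparser.parse(url)
        for e in feed.entries:
            entries.append({
                "source": url,
                "title": e.get("title", "(untitled)"),
                "link": e.get("link", ""),
                "published": e.get("published_parsed"),  # struct_time or None
            })
    return sorted(entries,
                  key=lambda e: e["published"] or time.gmtime(0),
                  reverse=True)

if __name__ == "__main__":
    for entry in aggregate(FEEDS):
        print(entry["title"], "-", entry["link"])
```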

So how do we manage the metadata for these feeds, and how do we handle the issue of trust (i.e. how do you know which feeds to trust for accuracy, authority, etc)? Jon talks a little about uses of tools like Delicious to bookmark feeds with appropriate metadata, and other tools for calendar aggregation. And the idea of guerrilla feed creation – finding implicit information about recurring events and making it explicit. Often the information is hard to scrape automatically – e.g. information about a regular square dance that is embedded in the image of a cartoon. But maybe this task could be farmed out to a service like Mechanical Turk.

And these are great examples of computational thinking. Indirection – instead of passing me your information, pass me a pointer to it, so that I can respect your authority over it. Abstraction – we can use any URL as a rendezvous for social information management, and can even invent imaginary ones just for this purpose.

Updates: The twitter tag is tosci20. Andrew Louis also blogged (part of) it, and has some great photos; Joey DeVilla has detailed blog posts on several of the speakers; Titus reflects on his own participation; and Jon Udell has a more detailed write up of the polymath project. Oh, and Greg has now posted the speakers’ slides.


Here’s a simple parable for climate change:

A large group of kids has congregated out on the sidewalk in front of their school. It started with just a few friends, showing off their latest video game. But the crowd grew, and now completely blocks the sidewalk. A guy in a wheelchair wants to pass, but can’t. The kids are so wrapped up in their own interests that they don’t even notice that together they have completely blocked the sidewalk.

Further along the street there is a busy pub. The lunchtime crowd has spilled out on to the sidewalk, and now has become so big that again the sidewalk is blocked. When the guy in the wheelchair wants to pass, quite a few people in the crowd recognize the problem, and they try to squeeze out of the way. But individually, none of them can make much difference to the blockage – there are just too many people there. They shrug their shoulders and apologise to the guy in the wheelchair.

In both cases, the blockages are not caused by individuals, and cannot be solved by individuals. The blockage is an emergent property of the crowd of people as a whole, and only occurs when the crowd grows to a certain size. In the first case, the members of the crowd remain blissfully unaware of the problem. In the second case, many people do recognise the problem, but cannot, on their own, do much about it. It would take concerted, systematic action by everyone in the crowd to clear a suitable passage. Understanding the problem and wanting to do something about it is not sufficient to solve it – the entire crowd has to take coordinated action.

And if some members of the crowd are more like the kids, unable to recognise the problem, no solution is possible.

When I was at the EGU meeting in Vienna in April, I attended a session on geoengineering, run by Jason Blackstock. During the session I blogged the main points of Jason’s talk, the key idea of which is that it’s time to start serious research into the feasibility and consequences of geoengineering, because it’s now highly likely we’ll need a plan B, and we’re going to need a much better understanding of what’s involved before we do it. Jason mentioned a brainstorming workshop, and the full report is now available: Climate Engineering Responses to Climate Emergencies. The report is an excellent primer on what we know currently about geoengineering, particularly the risks. It picks out stratospheric aerosols as the most likely intervention (from the point of view of both cost/feasibility, and current knowledge of effectiveness).

I got the sense from the meeting that we have reached an important threshold in the climate science community – previously geoengineering was unmentionable, for fear that it would get in the way of the serious and urgent job of reducing emissions. Alex Steffen explains this fear very well, and goes over the history of how the mere possibility of geoengineering has been used as an excuse by the denialists for inaction. And of course, from a systems point of view, geoengineering can only ever be a distraction if it tackles temperature (the symptom) rather than carbon concentrations (the real problem).

But the point made by Jason, and in the report, is that we cannot rule out the likelihood of climate emergencies – either very rapid warming triggered by feedback effects, or sudden onset of unanticipated consequences of (gradual) warming. In other words, changes that occur too rapidly for even the most aggressive mitigation strategies (i.e. emissions reduction) to make a difference. Geoengineering then can be seen as “buying us time” to allow the mitigation strategies to work – e.g. slowing the warming by a decade or so, while we get on and decarbonize our energy supplies.

Now, maybe it’s because I’m looking out for them, but I’ve started to see a flurry of research interest in geoengineering. Oliver Morton’s article “Great White Hope” in April’s Nature gives a good summary of several meetings earlier this year, along with a very readable overview of some of the technology choices available. In June, the US National Academies announced a call for input on geoengineering which yielded a treasure trove of information – everything you’ve ever wanted to know about geoengineering. And yesterday, New Scientist reported that geoengineering has gone mainstream, with a lovely infographic illustrating some of the proposals.

Finally, along with technical issues of feasibility and risk, the possibility of geoengineering raises major new challenges for world governance. Who gets to decide which geoengineering projects should go ahead, and when, and what will we do about the fact that, by definition, all such projects will have a profound effect on human society, and those effects will be distributed unequally?

Update: Alan Robock has a brilliant summary in the Bulletin of the Atomic Scientists entitled 20 reasons why geo-engineering might be a bad idea.

Next Wednesday, we’re organising demos of our students’ summer projects, prior to the Science 2.0 conference. The demos will be in BA1200 (in the Bahen Centre), Wed July 29, 10am-12pm. All welcome!

Here are the demos to be included (running order hasn’t been determined yet – we’ll probably pull names out of a hat…):

  • Basie (demo’d by Bill Konrad, Eran Henig and Florian Shkurti)
    Basie is a lightweight, web-based software project forge with an emphasis on inter-component communication.  It integrates revision control, issue tracking, mailing lists, wikis, status dashboards, and other tools that developers need to work effectively in teams.  Our mission is to make Basie simple enough for undergraduate students to master in ten minutes, but powerful enough to support large, distributed teams.
  • BreadCrumbs (demo’d by Brent Mombourquette).
    When researching, the context in which a relevant piece of information is found is often overlooked. However, the journey is as important as the destination. BreadCrumbs is a Firefox extension designed to capture this journey, and therefore the context, by maintaining a well structured and dynamic graph of an Internet browsing session. It keeps track of both the chronological order in which websites are visited and the link-by-link path. In addition, through providing simple tools to leave notes to yourself, an accurate record of your thought process and reasoning for browsing the documents that you did can be preserved with limited overhead. The resulting session can then be saved and revisited at a later date, with little to no time spent trying to recall the relevance or semantic relations of documents in an unordered bookmark folder, for example. It can also be used to provide information to a colleague, by not just pointing them to a series of web pages, but by providing them a trail to follow and embedded personal notes. BreadCrumbs maintains the context so that you can focus on the content.
  • Feature Diagram Tool (demo’d by Ebenezer Hailemariam)
    We present a software tool to assist software developers work with legacy code. The tool reverse engineers “dependency diagrams” from Java code through which developers can perform refactoring actions. The tool is a plug-in for the Eclipse integrated development environment.
  • MarkUs (demo’d by Severin Gehwolf, Nelle Varoquaux and Mike Conley)
    MarkUs is a Web application that recreates the ease and flexibility of grading assignments with pen on paper. Graders fill in a marking scheme and directly annotate students’ work.  MarkUs also provides support for other aspects of assignment delivery and management.  For example, it allows students or instructors to form groups for assignment collaboration, and allows students to upload their work for grading. Instructors can also create and manage group or solo assignments, and assign graders to mark and annotate the students’ work quickly and easily.
  • MyeLink: drawing connections between OpenScience lab notes (demo’d by Maria Yancheva)
    A MediaWiki extension which facilitates connections between related wiki pages, notes, and authors. Suitable for OpenScience research communities who maintain a wiki collection of experiment pages online. Provides search functionality on the basis of both structure and content of pages, as well as a user interface allowing the customization of options and displaying an embedded preview of results.
  • TracSNAP – Trac Social Network Analysis Plugin (demo’d by Ainsley Lawson and Sarah Strong)
    TracSNAP is a suite of simple tools to help contributors make use of information about the social aspect of their Trac coding project. It tries to help you to: Find out which other developers you should be talking to, by giving contact suggestions based on commonality of file edits; Recognize files that might be related to your current work, by showing you which files are often committed at the same time as your files; Get a feel for who works on similar pieces of functionality based on discussion in bug and feature tickets, and by edits in common; Visualize your project’s effective social network with graphs of who talks to who; Visualize coupling between files based on how often your colleagues edit them together.
  • VizExpress (demo’d by Samar Sabie)
    Graphs are effective visualizations because they present data quickly and easily. vizExpress is a Mediawiki extension that inserts user-customized tables and graphs in wiki pages without having to deal with complicated wiki syntax. When editing a wiki page, the extension adds a special toolbar icon for opening the vizExpress wizard. You can provide data to the wizard by browsing to a local Excel or CSV file, or by typing (or copying/pasting) data. You can choose from eight graph types and eight graph-coloring schemes, and apply further formatting such as titles, dimensions, limits, and legend position. Once a graph is inserted in a page, you can easily edit it by restarting the wizard or modifying a simple vizExpress tag.

[Update: the session was a great success, and some of the audience have blogged about it already: e.g. Cameron Neylon]

In our climate brainstorming session last week, we invited two postdocs (Chris and Lawrence) from the atmospheric physics group to come and talk to us about their experiences of using climate models. Most of the discussion focussed on which models they use and why, and what problems they experience. They’ve been using the GFDL models, specifically AM2 and AM3 (atmosphere only models) for most of their research, largely because of legacy: they’re working with Paul Kushner, who is from GFDL, and the group now has many years’ experience working with these models. However, they’re now faced with having to switch to NCAR’s Community Climate System Model (CCSM). Why? Because the university has acquired a new IBM supercomputer, and the GFDL models won’t run on it (without a large effort to port them). The resulting dilemma reveals a lot about the current state of climate model engineering:

  • If they stick with the GFDL models, they can’t make use of the new supercomputer, hence miss out on a great opportunity to accelerate their research (they could do many more model runs).
  • If they switch to CCSM, they lose a large investment in understanding and working with the GFDL models. This includes both their knowledge of the model (some of their research involves making changes to the code to explore how perturbations affect the runs), and the investment in tools and scripts for dealing with model outputs, diagnostics, etc.

Of course, the obvious solution would be to port the GFDL models to the new IBM hardware. But this turns out to be hard because the models were never designed for portability. Right now, the GFDL models won’t even compile on the IBM compiler, because of differences in how picky different compilers are over syntax and style checking – climate models tend to have many coding idiosyncrasies that are never fixed because the usual compiler never complains about them: e.g. see Jon’s analysis of static checking NASA’s modelE. And even if they fix all these and get the compiler to accept the code, they’re still faced with extensive testing to make sure the models’ runtime behaviour is correct on the new hardware.

There’s also a big difference in support available. GFDL doesn’t have the resources to support external users (particularly ambitious attempts to port the code). In contrast, NCAR has extensive support for the CCSM, because they have made community building an explicit goal. Hence, CCSM is much more like an open source project. Which sounds great, but it also comes at a cost. NCAR have to devote significant resources to supporting the community. And making the model open and flexible (for use by a broader community) hampers their ability to get the latest science into the model quickly. Which leads me to hypothesize that it is the diversity of your user-base that most restricts the ongoing rate of evolution of a software system. For a climate modeling center like GFDL, if you don’t have to worry about developing for multiple platforms and diverse users, you can get new ideas into the model much quicker.

Which brings me to a similar discussion over the choice of weather prediction models in the UK. Bryan recently posted an article about the choice between WRF (NCAR’s mesoscale weather model) versus the UM (the UK Met Office’s model). Alan posted a lengthy response which echoes much of what I said above (but with much more detail): basically the WRF is well supported and flexible for a diverse community. The UM has many advantages (particularly speed), but is basically unsupported outside the Met Office. He concludes that someone should re-write the UM to run on other hardware (specifically massively parallel machines), and presumably set up the kind of community support that NCAR has. But funding for this seems unlikely.

Spurred on by Michael Tobis’s thoughts about building a readable/understandable climate model in Python, I dug up an old bookmark to a course offered in UMinn’s geology department on Designing your own “Earth System” Model. Really neat idea for a course. Now, with a bit of tweaking, I could set up a similar course here, but with a twist – we recruit a mix of CS and Physics students onto the course, and put them together in cross-disciplinary teams, each team building an earth system model of some kind (tapping into the domain knowledge of the physicists), using current CS and software engineering tools and techniques. And to keep Michael happy, they code it all in Python. Marks awarded based on understandability of the code.
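For what it’s worth, here’s the sort of minimal starting point I have in mind: a zero-dimensional energy balance model in Python. The constants are rough textbook values, and the greenhouse effect is crudely folded into a single effective emissivity, so treat it as a toy sketch rather than anything authoritative:

```python
# A zero-dimensional energy balance model: one global mean temperature,
# driven by absorbed sunlight and outgoing longwave radiation.
SOLAR_CONSTANT = 1361.0   # W/m^2, incoming solar radiation at top of atmosphere
ALBEDO = 0.3              # planetary albedo (fraction of sunlight reflected)
SIGMA = 5.67e-8           # Stefan-Boltzmann constant, W/m^2/K^4
EMISSIVITY = 0.61         # effective emissivity (crudely stands in for the greenhouse effect)
HEAT_CAPACITY = 4.0e8     # J/m^2/K, effective heat capacity of a mixed-layer ocean

def step(temp, dt):
    """Advance the global mean temperature (K) by one timestep of dt seconds."""
    absorbed = (1 - ALBEDO) * SOLAR_CONSTANT / 4.0   # average over the sphere
    emitted = EMISSIVITY * SIGMA * temp ** 4         # outgoing longwave radiation
    return temp + dt * (absorbed - emitted) / HEAT_CAPACITY

if __name__ == "__main__":
    temp = 255.0                 # start from a cold, bare-rock equilibrium
    dt = 86400.0                 # one day, in seconds
    for day in range(365 * 50):  # integrate for 50 years, long enough to equilibrate
        temp = step(temp, dt)
    print(f"Equilibrium temperature: {temp:.1f} K")  # settles near 288 K with these values
```

Students could then extend it step by step – add an ice-albedo feedback, resolve latitude bands, couple in an ocean – which is where the cross-disciplinary teamwork would really kick in.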

In the longer term, we keep the models produced from each instance of the course, and get the next cohort of students to develop them further – the aim is to build them up to be full scale earth system models.

Update: Michael has posted some more reflections on this.

Big news today: The G8 summit declares that climate change mitigation policies should aim to limit global temperature increases to no more than 2°C above 1900 levels. This is the limit that Europe adopted as a guideline a long time ago, and which many climate scientists generally regard as an important threshold. I’ve asked many climate scientists why this particular threshold, and the answer is generally because above this level many different positive feedback effects start to kick in, which will amplify the warming and take us into really scary scenarios.

Not all scientists agree this threshold is a sensible target for politicians though. For example, David Victor argues that the 2°C goal is a political delusion. The gist of his argument is that targets such as 2°C are neither safe (because nobody really knows what is safe) nor achievable. He suggests that rather than looking at long term targets such as a temperature threshold, or a cumulative emissions target, politicians need to focus on a series of short-term, credible promises, which, when achieved, will encourage greater efforts.

The problem with this argument is that it misses the opportunity to consider the bigger picture, and understand the enormity of the problem. First, although 2°C doesn’t sound like much, in the context of the history of the planet, it’s huge. James Hansen and colleagues put this into context best, by comparing current warming with the geological record. They point out that with the warming we have already experienced over the last century, the earth is now about as warm as the Holocene Maximum (about 5,000-9,000 years ago), and within 1°C of the maximum temperature of the last million years. If you look at Hansen’s figures (e.g. fig 5, shown below), you’ll see the difference in temperature between the ice ages and the interglacials is around 3°C. For example, the last ice age, which ended about 12,000 years ago, shows up as the last big rise on this graph (a rise from 26°C to 29°C):

[Figure 5 from Hansen et al.: temperature over the geological record]

The wikipedia entry on the Geologic temperature record has a nice graph pasting together a number of geological temperature records to get the longer term view (but it’s not peer-reviewed, so it’s not clear how valid the concatenation is). Anyway, Hansen concludes that 1°C above the 2000 temperature is already in the region of dangerous climate change. If a drop of 3°C is enough to cause an ice age, a rise of 2°C is pretty significant for the planet.

Even more worrying is that some scientists think we’re already committed to more than 2°C rise, based on greenhouse gases already emitted in the past, irrespective of what we do from today onwards. For example, Ramanathan and Feng’s paper in PNAS – Formidable challenges ahead – sets up a scenario in which greenhouse gas concentrations are fixed at 2005 levels (i.e. no new emissions after 2005) and discovers that the climate eventually stabilizes at +2.4°C (with a 95% confidence interval of 1.4°C to 4.3°C). In other words, it is more than likely that we’ve already committed to more than 2°C even if we stopped burning all fossil fuels today. [Note: when you read these papers, take care to distinguish between emissions and concentrations]. Now of course, R&F made some assumptions that can be challenged. For example, they ignored the cooling effect of other forms of pollution (e.g. atmospheric aerosols). There’s a very readable editorial comment on this paper by Hans Schellnhuber in which he argues that, while it’s useful to be reminded how much our greenhouse gas warming is being masked by other kinds of air pollution, it is still possible to keep below 2°C if we halve emissions by 2050. Here’s his graph (A), compared with R&F’s (B):

[Figure: Schellnhuber’s projection (A) alongside Ramanathan & Feng’s (B)]

The lower line on each graph is for constant concentrations from 2005 (i.e. no new emissions). The upper curves are projections for reducing emissions by 50% by 2050. The difference is that in (A), other forcings from atmospheric pollution are included (dirty air and smog, as per usual…).
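As an aside, the arithmetic behind this kind of committed-warming estimate is simple enough to sketch in a few lines of Python. The numbers below are my own assumptions (roughly 455 ppm CO2-equivalent in 2005, the standard simplified CO2 forcing formula, and a sensitivity of 3°C per doubling), not necessarily the values R&F used:

```python
import math

# Back-of-envelope committed warming. Assumed values, for illustration only:
C_2005 = 455.0           # ppm CO2-equivalent greenhouse gas concentration in 2005
C_PREINDUSTRIAL = 280.0  # ppm pre-industrial concentration
F_2X = 3.7               # W/m^2 radiative forcing for a doubling of CO2
SENSITIVITY = 3.0        # degC equilibrium warming per doubling

# Simplified CO2 forcing formula (Myhre et al. 1998): F = 5.35 * ln(C/C0)
forcing = 5.35 * math.log(C_2005 / C_PREINDUSTRIAL)

# Equilibrium warming if concentrations stay fixed at 2005 levels,
# ignoring the masking effect of aerosols.
committed = SENSITIVITY * forcing / F_2X
print(f"Committed warming: {committed:.1f} degC")  # roughly 2 degC with these numbers
```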

But that’s not the end of the story. Both of the above graphs are essentially just “back of the envelope” calculations, based on data from previous studies such as those included in the IPCC 2007 assessment. More recent research has investigated these questions directly, using the latest models. The latest analysis is much more like R&F’s graph (B) than Schellnhuber’s graph (A). RealClimate has a nice summary of two such papers in Nature, in April 2009. Basically, if developed countries cut their emissions by 80% by 2050, we still only get a 50% chance of sticking below the 2°C threshold. Another way of putting this is that a rise of 2°C is about the best we can hope for, even with the most aggressive climate policies imaginable. Parry et al argue we should be prepared for rises of around 4°C. And I’ve already blogged what that might be like.

So, nice to hear the G8 leaders embrace the science. But what we really need is for them to talk about how bad it really is. And we need action. And fast.

(Update: Gareth Renowden has a lengthier post with more detail on this, framed by a discussion of what NZ’s targets should be.)

I thought this sounded very relevant: the 4th International Verification Methods Workshop. Of course, it’s not about software verification, but rather about verification of weather forecasts. The slides from the tutorials give a good sense of what verification means to this community (especially the first one, on verification basics). Much of it is statistical analysis of observational data and forecasts, but there are some interesting points on what verification actually means – for example, to do it properly you have to understand the user’s goals – a forecast (e.g. one that puts the rainstorm in the wrong place) might be useless for one purpose (e.g. managing flood defenses) but be very useful for another (e.g. aviation). Which means no verification technique is fully “objective”.
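For a flavour of what this kind of verification involves, here’s a minimal sketch in Python of two standard categorical scores from the verification literature (the forecast and observation values are made up purely for illustration):

```python
import numpy as np

# Did it rain? 1 = yes, 0 = no. Illustrative values only.
forecast = np.array([1, 1, 0, 0, 1, 0, 1, 1, 0, 0])
observed = np.array([1, 0, 0, 1, 1, 0, 0, 1, 0, 0])

# Contingency table counts for a yes/no forecast.
hits = np.sum((forecast == 1) & (observed == 1))
false_alarms = np.sum((forecast == 1) & (observed == 0))
misses = np.sum((forecast == 0) & (observed == 1))

# Two standard categorical verification scores:
pod = hits / (hits + misses)                 # probability of detection
far = false_alarms / (hits + false_alarms)   # false alarm ratio

print(f"POD = {pod:.2f}, FAR = {far:.2f}")
```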

What I find interesting is that this really is about software verification – checking that large complex software systems (i.e. weather forecast models) do what they are supposed to do (i.e. accurately predict weather), but there is no mention anywhere of the software itself; all the discussion is about the problem domain. You don’t get much of that at software verification conferences…

I’m busy revising our paper on the study of the software development processes at the Hadley Centre for publication in CiSE (yay!). And I just looked again at the graph of code growth (click for bigger version):

[Graph: growth in lines of code and number of files in the Hadley UM over fifteen years]

The top line (green) shows lines of code, while the bottom line (blue) shows number of files. When I first produced this figure last summer, I was struck by the almost linear growth in lines of code over the fifteen years (with two obvious hiccups when core modules were replaced). Other studies have shown that lines of code is a good proxy for functionality, so I interpret this as a steady growth in functionality. Which contrasts with Lehman’s observations that for industrial software, an inverse square curve offers the best fit. Lehman offers as an explanation the theory that growing complexity of the software inevitably slows the addition of new functionality. He claims that his pattern is robust in other (commercial) software he has studied (but I haven’t trawled through his papers to see if he gives more case studies).

Subsequently, Godfrey & Tu showed that the Linux kernel did not suffer this limitation, but instead grew slightly faster than linearly (or geometrically if you include the device drivers, which you probably shouldn’t). So, that’s two studies that break Lehman’s pattern: the Linux kernel and the Hadley UM. What do they have in common that’s different from the commercial software systems that Lehman studied? I hypothesize the most likely explanation is that in both cases the code is written by the most knowledgeable domain experts, working in a non-hierarchical meritocracy (by which I mean that no one tells them what to work on, and that they get accepted as members of the development team by demonstrating their ability over a period of time). This isn’t a new hypothesis: Dewayne Perry has been saying for ages that domain expertise by the developers is the single biggest factor in project success.

Anyway, my co-author, Tim, was struck by quite a different observation about the graph: the way the two lines diverged over the fifteen years shown. While lines of code have grown relatively fast (nearly tenfold over the fifteen years shown), the number of files has grown much more slowly (only threefold). Which means the average filesize has steadily grown too. What does this mean? Roughly speaking, new files mean the addition of new modules, while new lines within an existing file mean additional functionality within existing modules (although this is not quite correct, as scientific programmers don’t always use separate files for separate architectural modules). Which means more of the growth comes from adding complexity within the existing routines, rather than from expanding the model’s scope. I’m willing to bet a lot of that intra-file growth comes from adding lots of different options for different model configurations.
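As a footnote, here’s roughly how one might test Lehman’s inverse-square pattern against simple linear growth on a code-size history (a sketch only: the release data below are made-up illustrative numbers, not the UM’s, and the closed-form curve is just one reading of Lehman’s law):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical release history (years since first release, and KLOC at each
# release) -- illustrative values only, NOT the actual UM or Linux data.
years = np.array([0, 2, 4, 6, 8, 10, 12, 14], dtype=float)
kloc = np.array([90, 180, 260, 340, 430, 510, 600, 680], dtype=float)

def linear(t, s0, rate):
    """Steady, linear growth in functionality."""
    return s0 + rate * t

def lehman(t, s0, e):
    """Closed form of ds/dt = E/s^2 (one reading of Lehman's inverse-square
    growth): complexity drag slows growth as the system gets bigger."""
    return np.cbrt(s0 ** 3 + 3 * e * t)

# Fit both models and compare how well each explains the data.
for name, model, p0 in [("linear", linear, (90.0, 40.0)),
                        ("lehman", lehman, (90.0, 7.0e6))]:
    params, _ = curve_fit(model, years, kloc, p0=p0)
    resid = kloc - model(years, *params)
    print(f"{name}: RMS residual = {np.sqrt(np.mean(resid ** 2)):.1f} KLOC")
```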