03. May 2010 · 1 comment · Categories: ICSE 2010

Today is our second workshop on software research and climate change, at ICSE 2010 in Cape Town. We’ve finalized the program, and we’re hoping to support some form of remote participation, but I’m still not sure how this will work out.

We had sixteen position papers and two videos submitted in the end, which I’m delighted about. To get everyone reading and discussing them prior to the workshop, we set up an open reviewing process, which I think went very well. Rather than the usual closed, anonymous reviews, we opened it up so that everyone could add reviews to any paper, and we encouraged everyone to review in their own name, rather than anonymously. The main problem we had was finding a suitable way of supporting this – until we hit upon the idea of creating a workshop blog, so each paper is a blog post, and the comment thread allows us to add reviews, and comment on each other’s reviews. This is nice because it means we can now make all the papers and reviews public, and continue the discussions during and after the workshop.

We’re trying out two different ways of supporting live remote participation – in the morning, the keynote talk (by Stephen Emmott of Microsoft Research) will be delivered via Microsoft’s LiveMeeting. We tested it out last week, and I’m pretty impressed with it (apart from the fact that there’s no client for the Mac). The setup we’ll be using is to have a video feed of Stephen giving the talk, displayed on a laptop screen at the front of the room, with his slides projected to the big screen. The laptop also has a webcam, so (if it works) Stephen will be able to see his audience too. I’ll document how well this works in a subsequent post.

For the last afternoon session, we’ll be trying out a live Skype call. Feel free to send me your Skype details if you’d like to participate. I’ve no idea if this will work (as it didn’t last time we tried), but hey, it’s worth exploring…

Excellent news: Jon Pipitone has finished his MSc project on the software quality of climate models, and it makes fascinating reading. I quote his abstract here:

A climate model is an executable theory of the climate; the model encapsulates climatological theories in software so that they can be simulated and their implications investigated. Thus, in order to trust a climate model one must trust that the software it is built from is built correctly. Our study explores the nature of software quality in the context of climate modelling. We performed an analysis of the reported and statically discoverable defects in several versions of leading global climate models by collecting defect data from bug tracking systems, version control repository comments, and from static analysis of the source code. We found that the climate models all have very low defect densities compared to well-known, similarly sized open-source projects. As well, we present a classification of static code faults and find that many of them appear to be a result of design decisions to allow for flexible configurations of the model. We discuss the implications of our findings for the assessment of climate model software trustworthiness.

The idea for the project came from an initial back-of-the-envelope calculation we did of the Met Office Hadley Centre’s Unified Model, in which we estimated the number of defects per thousand lines of code (a common measure of defect density in software engineering) to be extremely low – of the order of 0.03 defects/KLoC. By comparison, the shuttle flight software, reputedly the most expensive software per line of code ever built, clocked in at 0.1 defects/KLoC; most of the software industry does worse than this.
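For readers who haven’t met the metric before, here’s a minimal Python sketch of how a defect density figure is computed. The function name and the defect and line counts are hypothetical, picked only so that the arithmetic reproduces the two densities quoted above; they are not the actual counts for either system.

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Return defects per thousand lines of code (defects/KLoC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


# Hypothetical counts, chosen only to reproduce the densities quoted in the text.
model_density = defect_density(defects=30, lines_of_code=1_000_000)
shuttle_density = defect_density(defects=100, lines_of_code=1_000_000)

print(f"climate model (illustrative): {model_density:.2f} defects/KLoC")
print(f"shuttle software (illustrative): {shuttle_density:.2f} defects/KLoC")
```

The hard part in practice is not the division but deciding what counts as a defect and which lines to count, which is exactly where the real study gets interesting.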

This initial result was startling, because the climate scientists who build this software don’t follow any of the software processes commonly prescribed in the software literature. Indeed, when you talk to them, many climate modelers are a little apologetic about this, and have a strong sense they ought to be doing a more rigorous job with their software engineering. However, as we documented in our paper, climate modeling centres such as the UK Met Office do have excellent software processes, which they have developed over many years to suit their needs. I’ve come to the conclusion that these processes have to be very different from mainstream software engineering processes because the context is so very different.

Well, obviously we were skeptical (scientists are always skeptical, especially when results seem to contradict established theory). So Jon set about investigating this more thoroughly for his MSc project. He tackled the question in three ways: (1) measuring defect density by using bug repositories, version history and change logs to quantify bug fixes; (2) assessing the software directly using static analysis tools and (3) interviewing climate modelers to understand how they approach software development and bug fixing in particular.

I think there are two key results of Jon’s work:

  1. The initial results on defect density bear up. Although not quite as startlingly low as my back of the envelope calculation, Jon’s assessment of three major GCMs indicates they all fall in the range commonly regarded as good quality software by industry standards.
  2. There are a whole bunch of reasons why result #1 may well be meaningless, because the metrics for measuring software quality don’t really apply well to large scale scientific simulation models.

You’ll have to read Jon’s thesis to get all the details, but it will be well worth it. The conclusion? More research needed. It opens up plenty of questions for a PhD project….

27. April 2010 · 2 comments · Categories: ICSE 2009

It’s nearly a year late, but I finally managed to put together the soundtrack and slides from the short version of the talk on Software Engineering for the Planet that I gave at the International Conference on Software Engineering last year. The full version has been around for a while, but I’m not happy with it because it’s slow and ponderous. To kick off a lunchtime brainstorming session we had at ICSE last year, I did a Pecha Kucha version of it in 6 minutes and 40 seconds (if you listen carefully, you can hear the lunch plates rattling). For anyone who has never done a Pecha Kucha talk, I highly recommend it – putting the slides on an automated timer really keeps you on your toes.

PS If you look carefully, you’ll notice I cheated slightly: rather than 20 slides with 20 seconds each, I packed in more by cutting them down to 10 seconds each for the last half of the talk. It surprises me that this actually seems to work!

After catching the start of yesterday’s Centre for Environment Research Day, I headed around the corner to catch the talk by Ray Pierrehumbert on “Climate Ethics, Climate Justice”. Ray is here all week giving the 2010 Noble lectures, “New Worlds, New Climates”. His theme for the series is the new perspectives we get about Earth’s climate from the discovery of hundreds of new planets orbiting nearby stars, advances in knowledge about solar system planets, and advances in our knowledge of the early evolution of Earth, especially new insights into the snowball earth. I missed the rest of the series, but made it today, and I’m glad I did, because the talk was phenomenal.

Ray began by pointing out that climate ethics might not seem to fit with the theme of the rest of the series, but it does, because future climate change will, in effect, make the earth into a different planet. And the scary thing is we don’t know too much about what that planet will be like. Which then brings us to questions of responsibility, particularly the question of how much we should be spending to avoid this.

Figure 1 from Rockstrom et al, Nature 461, 472-475 (24 Sept 2009). Original caption: The inner green shading represents the proposed safe operating space for nine planetary systems. The red wedges represent an estimate of the current position for each variable. The boundaries in three systems (rate of biodiversity loss, climate change and human interference with the nitrogen cycle), have already been exceeded.

Humans are a form of life, and are altering the climate in a major way. Some people talk about humans now having an impact of “geological proportions” on the planet. But in fact, we’re a force of far greater than geological proportions: we’re releasing around 20 times as much carbon per year as nature does (for example via volcanoes). We may cause a major catastrophe. And we need to consider not just CO2, but many other planetary boundaries – all biogeochemical boundaries.

But this is nothing new – this is what life does – it alters the planet. The mother of all planet-altering lifeforms is blue-green algae. It radically changed atmospheric chemistry, even affecting the composition of rocks. If the IPCC had been around at the end of the Archean Eon (2500 million years ago) to consider how much photosynthesis should be allowed, it would have been a much bigger question than we face today. The biosphere (eventually!) recovers from such catastrophes. There are plenty of examples: oxygenation by cyanobacteria, snowball earth, the Permo-Triassic mass extinction (90% of species died out) and the KT dinosaur killer asteroid (although the latter wasn’t biogeochemically driven). So the earth does just fine in the long run, and such catastrophes often cause interesting things to happen, such as opening up new niches for new species to evolve (e.g. humans!).

But normally these changes take tens of millions of years, and whichever species were at the top of the heap before usually lose out: the new kind of planet favours new kinds of species.

So what is new with the current situation? Most importantly we have foresight and we know about what we’re doing to the planet. This means we have to decide what kind of climate the planet will have, and we can’t avoid that decision, because even deciding to do nothing about it is a decision. We cannot escape the responsibility. For example, we currently have a climate that humans evolved to exist in. The conservative thing is to decide not to rock the boat – to keep the climate we evolved in. On the other hand we could decide a different climate would be preferable, and work towards it – e.g. would things be better (on balance) if the world were a little warmer or a little cooler? So we have to decide how much warming is tolerable. And we must consider irreversible decisions – e.g. preserving irreplaceable treasures (e.g. species that will go extinct). Or we could put the human species at the centre of the issue, and observe that (as far as we know) the human species is unique as the only intelligent life in the universe; the welfare of the human species might be paramount. So the question then becomes: how should we preserve a world worth living in for humanity?

So far, we’re not doing any better than cyanobacteria. We consume resources and reproduce until everything is filled up and used up. Okay, we have a few successes, for example in controlling acid rain and CFCs. But on balance, we don’t do much better than the bacteria.

Consider carbon accounting. You can buy carbon credits, sometimes expressed in terms of tonnes of CO2, sometimes in terms of tonnes of carbon. From a physics point of view, it’s much easier to think in terms of carbon atoms, because it’s the carbon in various forms that matters – e.g. dissolved in the oceans, making them more acidic, in CO2 in the atmosphere, etc. We’re digging up this carbon in various forms (coal, oil, gas) and releasing it into the atmosphere. Most of this came from biological sources in the first place, but has been buried over very long (geological) timescales. So, we can do the accounting in terms of billions of tonnes (Gt) of carbon. The pre-industrial atmosphere contained 600Gt carbon. Burning another 600Gt would be enough to double atmospheric concentrations (except that we have to figure out how much stays in the atmosphere, how much is absorbed by the oceans, etc). World cumulative emissions show an exponential growth over the last century. We are currently at 300Gt cumulative emissions from fossil fuel. 1000Gt of cumulative emissions is an interesting threshold, because that’s about enough to warm the planet by 2°C (which is the EU’s stated upper limit). A straight projection of the current exponential trend takes us to 5000GtC by 2100. It’s not clear there is enough coal to get us there, but it is dangerous to assume that we’ll run out of resources before this. The worst scenario: we get to 5000GtC, wreck the climate, just as we run out of fossil fuels, so civilization collapses, at a time when we no longer have a tolerable climate to live in.
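Because figures get quoted both ways, it helps to be able to convert between tonnes of carbon and tonnes of CO2. Here’s a small sketch of that conversion (my addition, not something from Ray’s slides). It relies only on the ratio of molar masses, roughly 44/12, or about 3.66; the function names are my own.

```python
# Standard molar masses in g/mol (rounded).
M_CARBON = 12.011
M_CO2 = 44.009  # one carbon atom plus two oxygen atoms (2 * 15.999)


def carbon_to_co2(tonnes_c: float) -> float:
    """Convert a mass of carbon into the equivalent mass of CO2."""
    return tonnes_c * M_CO2 / M_CARBON


def co2_to_carbon(tonnes_co2: float) -> float:
    """Convert a mass of CO2 into the mass of carbon it contains."""
    return tonnes_co2 * M_CARBON / M_CO2


# The 1000 GtC threshold mentioned above, expressed as Gt of CO2.
print(f"1000 GtC is about {carbon_to_co2(1000):.0f} Gt CO2")
```

The units cancel, so the same functions work for tonnes, gigatonnes, or anything else, as long as you are consistent.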

Of course, such exponential growth can never continue indefinitely. To demonstrate the point, Ray showed the video of The Impossible Hamster Club. The key question is whether we will voluntarily stop this growth in carbon emissions, and if we don’t, at what point will natural limits kick in and stop the growth for us?

There are four timescales for CO2 drawdown:

  • Uptake by the ocean mixed layer – a few decades
  • Uptake by the deep ocean – a few centuries
  • Carbonate dissolution (laying down new sediments on the ocean bed) – a few millennia
  • Silicate weathering (reaction between rocks and CO2 in the atmosphere that creates limestone) – a few hundred millennia.

Ray then showed the results of some simulations using the Hamburg carbon cycle model. The scenario they used is a ramp up to peak emissions in 2010, followed by a drop to either 4, 2, or 1Gt per year from then on. The graph of atmospheric concentrations out to the year 4000 shows that holding emissions stable at 2Gt/yr still causes concentrations to ramp up to 1000ppm. Even reducing to 1Gt/yr leads to an increase to around 600ppm by the year 4000. The obvious conclusion is that we have to reduce net emissions to approximately zero in order to keep the climate stable over the next few thousand years.

What does a cumulative emissions total of 5000GtC mean for our future? Peak atmospheric concentrations will reach over 2000ppm, and stay there for around 10,000 years, then slowly decline on a longer timescale because of silicate weathering. Global mean temperature rises by around 10°C. Most likely, the Greenland and West Antarctic ice sheets will melt completely (it’s not clear what it would take to melt the East Antarctic). So what we do this century will affect us for tens of thousands of years. Paul Crutzen coined the term anthropocene to label this new era in which humans started altering the climate. In the distant future, the change at the start of the anthropocene will look as dramatic as other geological shifts – certainly bigger than the change at the KT extinction.

This makes geoengineering by changing the earth’s albedo an abomination (Ray mentioned as an example the view put forward in that awful book SuperFreakonomics). It’s morally reprehensible, because it leads to the Damocles world. The sword hanging over us is that for the next 10,000 years, we’re committed to doing the sulphur seeding every two years, and continuing to do so no matter what unfortunate consequences, such as drought, happen as side effects.

But we will need longwave geoengineering – some way of removing CO2 from the atmosphere to deal with the last gigatonne or so of emissions, because these will be hard to get rid of no matter how big the push to renewable energy sources. That suggests we do need a big research program on air capture techniques.

So, the core questions for climate ethics are:

  • What is the right amount to spend to reduce emissions?
  • How should costs be divided up (e.g. US, Europe, Asia, etc)?
  • How to figure the costs of inaction?
  • When should it be spent?

There is often a confusion between fairness and expedience (e.g. Cass Sunstein, an Obama advisor, makes this mistake in his work on climate change justice). The argument goes that a carbon tax that falls primarily on the first world is, in effect, a wealth transfer to the developing world. It’s a form of foreign aid, therefore hard to sell politically to Americans, and therefore unfair. But the real issue is not about what’s expedient, the issue is about the right thing to do.

Not all costs can be measured by money, which makes cost-benefit analysis a poor tool for reasoning about climate change. For example, how can we account for loss of life, loss of civil liberties, etc in a cost/benefit analysis? Take for example the effect of capital punishment on crime reduction versus the injustice of executing the innocent. This cannot be a cost/benefit decision, it’s a question of social values. In economic theory the “contingent valuation” of non-market costs and benefits is hopelessly broken. Does it make sense to trade off polar bear extinction against Arctic oil revenue by assigning a monetary value to polar bears? A democratic process must make these value judgements – we cannot push them off to economic analysis in terms of cost-benefits. The problem is that the costs and benefits of planetary scale processes are not additive. Therefore cost/benefit is not a suitable tool for making value decisions.

The same goes for the use of (growth in) GDP, which economists use as a proxy for a nation’s welfare. Bastiat introduced the idea of the broken window fallacy – the idea that damage to people’s property boosts GDP because it increases the need for work to be done to fix it, and hence increases money circulation. This argument is often used by conservatives to pooh-pooh the idea of green jobs – what’s good for jobs doesn’t necessarily make people better off. But right now the entire economy is made out of broken windows: Hummers, McMansions, video screens in fast-food joints,… all of it is consumption that boosts GDP without actually improving life for anyone. (Perhaps we should try measuring gross national happiness instead, like the Bhutanese.)

And then there’s discounting – how do we compare the future with the present? The usual technique is to exponentially downweight future harms according to how far in the future they are. The rationale is you could equally well put the money in the bank, and collect interest to pay for future harms (i.e. generate a “richer future”, rather than spend the money now on mitigating the problem). But certain things cannot be replaced by money (e.g. human life, species extinction). Therefore they cannot be discounted. And of course, economists make the 9 billion tonne hamster mistake – they assume the economy can keep on growing forever. [Note: Ray has more to say on cost-benefit and discounting in his slides, which he skipped over in the talk through lack of time]
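To see how strongly exponential discounting downweights the distant future, here’s a minimal sketch of the standard present-value formula, PV = cost / (1 + r)^t. The cost, the discount rates, and the 100-year horizon are all illustrative assumptions of mine, not numbers from Ray’s talk.

```python
def present_value(future_cost: float, annual_rate: float, years: float) -> float:
    """Discount a cost incurred `years` from now back to today's money."""
    return future_cost / (1.0 + annual_rate) ** years


# Illustrative only: a harm costing 1,000,000 (any currency) a century from now.
for rate in (0.01, 0.03, 0.05):
    pv = present_value(1_000_000, rate, 100)
    print(f"discount rate {rate:.0%}: present value {pv:,.0f}")
```

Even modest-looking rates shrink a harm a century away to a small fraction of its face value, which is why the choice of rate ends up dominating the analysis, and why harms that can’t be bought back with money don’t fit this framework at all.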

Fairness is a major issue. How do we approach this? For example, retributive justice – punish the bad guys? You broke it, you fix it? Whoever suffers the least from fixing it moves first? Consider everyone to be equal? Well, the Canadian climate policy appears to be: wait to see what Obama does, and do the same, unless we can get away with doing less.

What about China vs. the US, the two biggest emitters of greenhouse gases? The graph of annual CO2 emissions shows that China overtook the US in the last few years (while, for example, France held their emissions constant). But you have to normalize the emissions per capita, and then the picture looks very different. And here’s an interesting observation: China has per capita emissions very close to those of France, but doesn’t have the French standard of living. Therefore there is clearly room for China to improve its standard of living without increasing per capita emissions, which means that emissions controls do not necessarily hold back development.

But because it’s cumulative emissions that really matter, we have to look at each nation’s cumulative per capita emissions. The calculation is tricky because we have to account for population growth. It turns out that the US has a bigger population growth problem than China, which, when added to the cumulative emissions, means the US has a much bigger responsibility to act. If we take the target of 1000GtC as the upper limit on cumulative emissions (to stay within the 2°C temperature rise), and allocate that equally to everyone, based on 2006 population figures, we get about 100 tonnes of carbon per capita as a lifetime allowance. The US has an overdraft on this limit (because the US has used up more than this), while China still has a carbon balance (it’s used up less). In other words, in terms of the thing that matters most, cumulative emissions, the US has used up more than its fair share of a valuable resource (slide 43 from Ray’s talk):

This graph shows the cumulative emissions per (2006) capita for the US and China. If we take 100 tonnes as the lifetime limit for each person (to keep within the global 1000Gt target), then the US has already used more than its fair share, and China has used much less.

This analysis makes it clear what the climate justice position is. The Chinese might argue that just to protect themselves and their climate, China might need to do something more than its fair share. In terms of a negotiation, arguing about everyone taking action together might be expedient. But the right thing to do for the US is not just to reduce emissions to zero immediately, but to pay back that overdraft.

Some interesting questions from the audience:

Q: On geoengineering – why rule out attempts to change the albedo of the earth by sulphate particle seeding when we might need an “all of the above” approach? A: Ray’s argument is largely about what happens if it fails. For example, if the Dutch dykes fail, in the worst case, the Dutch could move elsewhere. If global geoengineering fails, we don’t have an elsewhere to move to. Also, if you stop, you get hit all at once with the accumulated temperature rise. This makes Levitt’s suggestion of “burn it all and geoengineer to balance” morally reprehensible.

Q: Could you say more about the potential for air capture? A: It’s a very intriguing idea. All the schemes being trialed right now capture carbon in the form of CO2 gas, which would then need to be put down into mineral form somehow. A more interesting approach is to capture CO2 directly in mineral form, e.g. limestone. It’s not obviously crazy, and if it works it would help. It’s more like insurance, and investing in research on this is likely to provide a backup plan in a way that albedo alteration does not.

Q: What about other ways of altering the albedo? A: Suggestions such as painting roofs & parking lots white will help reduce urban heat, mitigate the effect of heatwaves, and also reduce the use of air conditioners. Which is good, but it’s essentially a regional effect. The overall effect on the global scale is probably negligible. So it’s a good idea because it only has a regional impact.

Q: About nuclear – will we need it? A: Ray says probably yes. If it comes down to a choice between nuclear vs. coal, the choice has to be nuclear.

Finally, I should mention Ray has a new book coming out: Principles of Planetary Climate, and is a regular contributor to RealClimate.org.

This morning I attended the first part of the Centre for Environment’s Research Day, and I’m glad I did, because I caught the talk by Chris Kennedy from the Sustainable Infrastructure Group in the Dept of Civil Engineering, on “Greenhouse Gases from Global Cities”. He talked about a study he’s just published, on the contribution of ten major cities to GHG emissions. Chris points out that most of the solutions to climate change will have to focus on changing cities. Lots of organisations are putting together greenhouse gas inventories for cities, but everyone is doing it differently, measuring different things. Chris’s study examined how to come up with a consistent approach. For example, the approach taken in Paris is good at capturing lifecycle emissions, London is good at spatial issues, Tokyo is good at analyzing emissions over time. Each perspective is useful, but the differences make comparisons hard. But there’s no obvious right way to do it. For example, how do you account for the timing of emissions release, e.g. for waste disposal? Do you care about current emissions as a snapshot, or future emissions that are committed because of waste generated today?

The IPCC guidelines for measuring emissions take a pure producer perspective. They focus only on emissions that occur within the jurisdiction of each territory. This ignores, for example, consumer emissions when the consumer of a product or service is elsewhere. It also ignores upstream emissions: e.g. electricity generation is generally done outside the city, but used within the city. Then there’s line loss in power transmission to the city; that should also get counted. In Paris, Le Bilan Carbone counts embodied emissions in building materials, maintenance of vehicles, refining of fuels, etc. but it ignores emissions by tourists, which is a substantial part of Paris’ economy.

In the study Chris and colleagues did, they studied ten cities, many iconic: Bangkok, Barcelona, Cape Town, Denver, Geneva, London, Los Angeles, New York, Prague and Toronto. Ideally they would like to have studied metropolitan regions rather than cities, because it then becomes simpler to include transport emissions for commuting, which really should be part of the assessment of each city. The study relied partially on existing assessments for some of these cities and analyzed emissions in terms of electricity, heating/industrial fuels (lumped together, unfortunately), ground transport, aviation and marine fuels, industrial processes, and waste (the methodology is described here).

For example, for electricity, Toronto comes second in consumption (MWh per capita), after Denver, and is about double that of London and Prague. Mostly, this difference is due to the different climate, but also the amount of commerce and industry within the city. However, the picture for carbon intensity is very different, as there is a big mix of renewables (e.g. hydro) in Toronto’s power supply, and Geneva gets its power supply almost entirely from hydro. So you get some interesting combinations: Toronto has high consumption but low intensity, whereas Cape Town has low consumption and high intensity. So multiply the two: Denver is off the map at 9 t eCO2 per capita, because it has high consumption and high intensity, while most others are in the same range, around 2-3 t eCO2 per capita. And Geneva is very low.
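The “multiply the two” step can be sketched in a few lines of Python. The city profiles below are hypothetical values I made up to show how different combinations of consumption and carbon intensity play out; they are not the figures from Chris’s study.

```python
def electricity_emissions_per_capita(consumption_mwh: float,
                                     intensity_t_per_mwh: float) -> float:
    """Per capita emissions (t eCO2) = consumption (MWh) x intensity (t eCO2/MWh)."""
    return consumption_mwh * intensity_t_per_mwh


# Hypothetical profiles: (MWh per capita, t eCO2 per MWh). Not data from the study.
profiles = {
    "high consumption, low intensity": (10.0, 0.2),
    "low consumption, high intensity": (3.0, 0.9),
    "high consumption, high intensity": (11.0, 0.8),
}

for label, (mwh, intensity) in profiles.items():
    emissions = electricity_emissions_per_capita(mwh, intensity)
    print(f"{label}: {emissions:.1f} t eCO2 per capita")
```

With these made-up inputs the first two profiles land in the same 2-3 tonne range despite being opposites, and only the city that is high on both counts stands out.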

Climate has to be taken into account somehow, because there is an obvious relationship between energy used for heating and typical temperatures, which can be assessed by counting heating degree days:
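For anyone unfamiliar with the measure, here’s a rough sketch of how heating degree days are counted: for each day, take how far the mean temperature falls below a base temperature, and add those shortfalls up. The 18°C base is a commonly used convention that I’m assuming here; I don’t know which base the study used, and the temperatures are invented.

```python
def heating_degree_days(daily_mean_temps_c, base_c=18.0):
    """Sum the daily shortfall below the base temperature (in degree days)."""
    return sum(max(0.0, base_c - temp) for temp in daily_mean_temps_c)


# One invented week of daily mean temperatures in degrees Celsius.
week = [-5.0, 0.0, 4.5, 10.0, 18.0, 21.0, 15.0]
print(heating_degree_days(week))  # prints 65.5
```

Days at or above the base contribute nothing, so a city with a mild climate accumulates far fewer degree days over a year than one with long cold winters.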

Aviation is very interesting. Some assessments exclude it, on the basis that local government has no control. But Chris points out that when it comes down to it, local government has very little control over anything, so that argument doesn’t really wash. The UNFCCC says domestic flights should be included, but this then has a small country bias – small countries tend to have very few domestic flights. A better approach is to include international flights as well, so that we count all flights taking off from that city. Chris’ methodology assesses this as jet fuel loaded at each airport. For this then, London is way out in the lead:

In summary, looking at total emissions, Denver is way out in front. In conversations with them, it was clear they had no idea – they think of themselves as a clean green city up in the mountains. No surprises that the North American cities all fare the worst, driven by a big chunk of emissions from ground transportation. The real surprises though are Bangkok and Cape Town, which compare with New York and Toronto for total emissions:

Chris concluded the talk with some data from Asian cities that were not included in the above study. In particular, Shanghai and Beijing are important in part because of their sheer size. For example, if Shanghai on its own were a country, it would come in at about #25 in the world for total emissions.

One thing I found interesting from the paper that Chris didn’t have time to cover in the talk was the obvious relationship between population density and emissions from ground transportation fuels. Clearly, to reduce carbon emissions, cities need to become much denser (and should all be more like Barcelona):

Busy few days coming up in Toronto, around the celebration of Earth day tomorrow:

  • tonight, the Green Party is hosting an evening with Gwynne Dyer and Elizabeth May entitled “Finding Hope: Confronting Climate Wars”. Gwynne is, of course, the author of the excellent book Climate Wars, also available as a series of podcasts from the CBC;
  • tomorrow (April 22) the Centre for Environment has its Research Day, showcasing some of the research of the centre;
  • all week, the Centre for Global Change Science is hosting Ray Pierrehumbert giving a lecture series “New Worlds, New Climates”. Tomorrow’s (April 22) looks particularly interesting: “Climate Ethics, Climate Justice”;
  • next Tuesday (April 27), the Centre for Ethics is running a public issues forum on “Climate Change and the Ethics of Responsibility”.

I won’t make it to all of these, but will blog those that I do make. Seems like ethics of climate change is a theme, which I think is very timely.

09. April 2010 · 9 comments · Categories: politics

My debate with George Monbiot is still going on in this thread. I’m raising this comment to be a separate blog post (with extra linky goodness), because I think it’s important, independently of any discussion of the CRU emails (and to point out that the other thread is still growing – go see!)

Like many other commentators, George Monbiot suggests that “to retain the moral high ground we have to be sure that we’ve got our own house in order. That means demanding the highest standards of scientific openness, transparency and integrity”.

It’s hard to argue with these abstract ideals. But I’ll try, because I think this assertion is not only unhelpful, but also helps to perpetuate several myths about science.

The argument that scientists should somehow be more virtuous (than regular folks) is a huge fallacy. Openness and transparency are great as virtues to strive for. But they cannot ever become a standard by which we judge individual scientists. For a start, no scientific field has ever achieved the levels of openness that are being demanded here. The data is messy, the meta-data standards are not in place, the resources to curate this data are not in place. Which means the “get our own house in order” argument is straight denialist logic – they would have it that we can’t act on the science until every last bit of data is out in the public domain. In truth, climate science has developed a better culture of data sharing, replication, and results checking than almost any other scientific field. Here’s one datapoint to back this up: in no other field of computational science are there 25+ teams around the world building the same simulation models independently, and systematically comparing their results on thousands of different scenarios in order to understand the quality of those simulations.

We should demand from scientists that they do excellent science. But we should not expect them to also somehow be superhuman. The argument that scientists should never exhibit human weaknesses is not just fallacious, it’s dangerous. It promotes the idea that science depends on perfect people to carry it out, when in fact the opposite is the case. Science is a process that compensates for the human failings of the people who engage in it, by continually questioning evidence, re-testing ideas, replicating results, collecting more data, and so on. Mistakes are made all the time. Individual scientists screw up. If they don’t make mistakes, they’re not doing worthwhile science. It’s vitally important that we get across to the public that this is how science works, and that errors are an important part of the process. It’s the process that matters, not any individual scientist’s work. The results of this process are more trustworthy than any other way of producing knowledge, precisely because the process is robust in the face of error.

In the particular case [of the CRU emails], calling for scientists to take the moral high ground, and to be more virtuous, is roughly the equivalent of suggesting that victims of sexual assault should act more virtuous. And if you think this analogy is over the top, you haven’t understood the nature of the attacks on scientists like Mann, Santer, Briffa, and Jones. Look at Jones now: he’s contemplated suicide, he’s on drugs just to help him get through the day, and more drugs to allow him to sleep at night. These bastards have destroyed a brilliant scientist. And somehow the correct response is that scientists should strive to be more virtuous?! Oh yes, blame the victim.

08. April 2010 · 6 comments · Categories: politics

Picked up my copy of the Guardian Weekly today, to see a front page story entitled “Trillion-Dollar question: future of climate talks”. Halfway through the article we get:

“Politicians and negotiators will find the mood of the talks very different from where they were left off in Copenhagen in December. For a start, the climate science that has underpinned them has suffered damaging setbacks. There was the leaking from the University of East Anglia’s climate research unit of email exchanges between some of the world’s top meteorologists as well as the discovery that a UN assessment report on climate change had vastly exaggerated the rate of melting of Himalayan glaciers.

The former revelation suggested some researchers were involved in massaging the truth, sceptics claimed, while the latter exposed deficiencies in the way the UN’s Intergovernmental Panel on Climate Change – authors of the report – go about their business. The overall effect has been to damage the credibility of the large number of scientists who fear our planet faces climatic disaster” (The Guardian Weekly, 2-8 April 2010)

This is such utter nonsense, I’m left wondering whether the Guardian has been taken over by Fox News. Lots of good science happened in the last few months, but all the media seems to care about is a minor error in one paragraph of a 3000-page document, and emails that show how much climate scientists are being harassed by denialist bullies. This isn’t a damaging setback, it’s a pack of lies. How did this denialist rhetoric come to dominate the media? Has the entire media now gone into denial? Will we get some kind of ransom note explaining what we have to pay to get our science back?

07. April 2010 · 53 comments · Categories: politics

My exposé of why academics’ private emails sometimes seem cranky has gotten a lot of attention. Joe Romm posted it at ClimateProgress, where it generated many comments, many expressing thanks for saying what needed to be said. George Monbiot posted a comment, pointing out that for a journalist, FoI laws are sacred: a hard won concession that allows them to fight the secrecy that normally surrounds the political establishment. So there’s clearly some mutual incomprehension between the two cultures, academic and journalistic. For journalists, FoI is a vital weapon to root out corruption in a world where few people can be trusted. For scientists, FoI is a blunt instrument, unneeded in a world where honesty and trust are the community norms, and data is freely shared as much as is practically possible.

George expands this theme on his blog, and I appear to have shifted his perspective on the CRU emails, although perhaps not as far as I might have hoped. His thesis is that scientists and journalists have each formed a closed culture, leading each to be suspicious (and worse) of the other. Well, I think this is not strictly accurate. I don’t think either culture is walled in. In fact, I’m beginning to think I overstated the case in my original post: scientists are certainly not “walled in like an anchorite”. Pretty much every scientist I know will happily talk at length to anyone who shows an interest in their work, and will nearly always share data and code with anyone who is engaged in honest scientific work. For our own research into software quality, we have obtained source code, datasets, software bug histories, and extensive access for interviews in every climate modeling centre we have approached.

Unfortunately, scientists tend to be way too focussed (obsessed?) for most people’s taste, so lay people don’t generally want to talk to them in the first place. But scientists will accept into the community anyone who’s willing to work at it (after all most of us spend a lot of time training students), as long as they show the necessary commitment to the scientific process and the pursuit of truth. Traditional investigative journalism used to share these values too, but this tradition now seems to be another endangered species. The experience when scientists talk to journalists is usually more about the journalist seeking a sensationalist angle to sell a story, rather than a quest for understanding. And a reliance on false balance rather than weighing up the evidence.

So there is a bit of a gulf between the two cultures, but it’s not insurmountable, and there are plenty of examples of good science reporting to show that people regularly do bridge this gulf.

No, the real story is not the relationship between science and the media at all. It’s the story of how the media has been completely taken in by a third group, a third culture, consisting of ideologically-driven, pathological liars, who will say almost anything in order to score political points, and will smear anyone they regard as an opponent. Stern calls climate change the greatest ever failure of the free markets. I think that looking back, we may come to regard the last six months as the greatest ever failure of mass media. Or alternatively, the most successful disinformation campaign ever waged.

At the centre of this story are people like Marc Morano and Jim Inhofe. They haven’t a clue what science is; to them it’s just one more political viewpoint to attack. They live in a world of paranoid fantasies, where some secret cabal is supposedly trying to set up a world government to take away their freedoms. Never mind that every credible scientific body on the planet is warning about the wealth of evidence we now have about the risk of dangerous climate change. Never mind that the IPCC puts together one of the most thorough (and balanced!) state-of-the-art surveys ever undertaken in any scientific field. Never mind that the newest research suggests that these assessments are, if anything, underestimating the risk. No, these people don’t like the message, and so set out to attack the messengers with a smear campaign based on hounding individual scientists for years and years until they snap, and then spreading stories in the media about what happens when the scientists tell them to piss off.

Throughout all this, in underfunded labs, and under a barrage of attacks, scientists have done their job admirably. They chase down the uncertainties, and report honestly and accurately what they know. They doggedly compile assessment reports year after year to present the mass of evidence to anyone who cares to listen. It simply beggars belief that journalists could, in 2010, still be writing opinion pieces arguing that the scientists need to do a better job, that they are poor communicators, that we need more openness and more data sharing. That these themes dominate the reporting is a testament to how effective the disinformation campaign has been. The problem is not in the science, or with scientists at all, nor with a culture gap between science and the media. The problem is with this third group, the disinformers, who have completely dominated the framing of the story, and how honest journalists have been completely taken in by this framing.

How did they do it? Well, one crucial element of their success is their use of FoI laws. By taking the journalists’ most prized weapon, and wielding it against climate scientists, they achieved a whole bunch of successes all at once. They got journalists on their side, because journalists have difficulty believing that FoI laws could be used for anything other than good old-fashioned citizen democracy. They got the public on their side by appearing to be the citizens fighting the establishment. They set up the false impression that scientists have stuff to hide, by ignoring the vast quantities of open data in climate science, and focussing on the few that were tied up with commercial licence agreements. And they effected a denial of service attack by flooding a few target scientists with huge numbers of FoI requests. Add to this the regular hate mail and death threats that climate scientists receive, and you have a recipe for personal meltdowns. And the media lapped up the story about personal meltdowns, picked it up and ran with it, and never once asked whose framing they were buying into.

And the result is that, faced with one of the greatest challenges humanity has ever faced, the media got the story completely backwards. Few journalists and few scientists seem to have any conception of how this misinformation campaign works, how nasty these people are, and how dirty they play. They have completely owned the story for the last few months, with their framing of “scientists making mistakes” and “scientists distorting their data”. They’ve successfully portrayed the scientists as being at fault, when it is the scientists who are the victims of one of the nastiest public bullying campaigns ever conducted. History will have to judge how it compares to other such episodes (McCarthyism would make a fascinating comparator). And the stakes are high: at risk is our ability to make sensible policy choices and international agreements based on good scientific evidence, to ensure that our children and grandchildren can flourish as we do.

We’re fucking this up bigtime, and it’s not the scientists who are at fault.

Susan Leigh Star passed away in her sleep this week, coincidentally on Ada Lovelace Day. As I didn’t get a chance to do a Lovelace post, I’m writing this one belatedly, as a tribute to Leigh.

Leigh Star (sometimes also known as L*) had a huge influence on my work back in the early 90s. I met her when she was in the UK, at a time when there was a growing community of folks at Sussex, Surrey, and Xerox EuroPARC, interested in CSCW. We organised a series of workshops on CSCW in London, at the behest of the UK funding councils. Leigh spoke at the workshop that I chaired, and she subsequently contributed a chapter entitled “Cooperation Without Consensus in Scientific Problem Solving” to our book, CSCW: Cooperation or Conflict? Looks like the book is out of print, and I really want to read Leigh’s chapter again, so I hope I haven’t lost my copy – the only chapter I still have electronically is our introduction.

Anyway, Leigh pioneered a new kind of sociology of scientific work practices, looking at the mechanisms by which coordination and sharing occurs across disciplinary boundaries. Perhaps one of her most famous observations is the concept of boundary objects, which I described in detail last year in response to seeing coordination issues arise between geophysicists trying to consolidate their databases. The story of the geologists realizing they didn’t share a common definition of the term “bedrock” would have amused and fascinated her.

It was Leigh’s work on this that first switched me on to the value of sociological studies as a way of understanding the working practices of scientists, and she taught me a lot about how to use ethnographic techniques to study how people use and develop technical infrastructures. I’ve remained fascinated by her ideas ever since. For those wanting to know more about her work, I could suggest this interview with her from 2008, or better yet, buy her book on how classification schemes work, or perhaps read this shorter paper on the Ethnography of Infrastructure. She had just moved to the i-school at U Pittsburgh last year, so I assumed she still had many years of active research ahead of her. I’m deeply saddened that I didn’t get another chance to meet with her.

Leigh – we’ll miss you!

Note: This started as a comment on a thread at RealClimate about the Guardian’s investigation of the CRU emails fiasco. The Guardian has, until recently, had an outstandingly good record on its climate change reporting. It commissioned Fred Pearce to do a detailed investigation into the emails, and he published his results in a 12-part series. While some parts of it are excellent, other parts demonstrate a complete misunderstanding of how science works, especially the sections dealing with the peer-review process. These were just hopelessly wrong, as demonstrated by Ben Santer’s rebuttal of the specific allegations. In parallel, George Monbiot, who I normally respect as one of the few journalists who really understands the science, has been arguing for Phil Jones to resign as head of the CRU at East Anglia, on the basis that his handling of the FOI requests was unprofessional. Monbiot has repeated this more recently, as can be seen in this BBC clip, where he is hopelessly ineffective in combating Delingpole’s nonsense, because he’s unwilling to defend the CRU scientists adequately.

The problem with both Pearce’s investigation, and Monbiot’s criticisms of Prof Jones is that neither has any idea of what academic research looks like from the inside, nor how scientists normally talk to one another. The following is my attempt to explain this context, and in particular why scientists talking freely among themselves might seem rude or worse. Enough people liked my comment at RC that I decided to edit it a little and post it here (the original has already been reposted at ClimateSight and Prof Mandia’s blog). I should add one disclaimer: I don’t mean to suggest here that scientists are not nice people – the climate scientists I’ve gotten to know over the past few years are some of the nicest people you could ever ask to meet. It’s just that scientists are extremely passionate about the integrity of their work, and don’t take kindly to people pissing them around. Okay, now read on…

Once we’ve gotten past the quote-mining and distortion, the worst that can be said about the CRU emails is that the scientists sometimes come across as rude or dismissive, and say things in the emails that really aren’t very nice. However, the personal email messages between senior academics in any field are frequently not very nice. We tend to be very blunt about what appears to us as ignorance, and intolerant of anything that wastes our time, or distracts us from our work. And when we think (rightly or wrongly) that the peer review process has let another crap paper through, we certainly don’t hold back in expressing our opinions to one another. Which is of course completely different to how we behave when we meet one another. Most scientists distinguish clearly between the intellectual cut and thrust (in which we’re sometimes very rude about one another’s ideas) and our social interactions (in which we all get together over a beer and bitch about the downsides of academic life). Occasionally, there’s someone who is unable to separate the two, and takes the intellectual jabs personally, but such people are rare enough in most scientific fields that the rest of us know exactly who they are, and try to avoid them at conferences.

Part of this is due to the nature of academic research. Most career academics have large egos and very thick skins. I think the tenure process and the peer review process filter out those who don’t. We’re all jostling to get our work published and recognised, often by pointing out how flawed everyone else’s work is. But we also care deeply about intellectual rigor, and preserving the integrity of the published body of knowledge. And we also know that many key career milestones are dependent on being respected (and preferably liked) by others in the field: for example, the more senior people who might get asked to write recommendation letters for us, for tenure and promotion and honors, or the scientists with competing theories who will get asked to peer review our papers.

Which means in public (e.g. in conference talks and published papers) our criticisms of others are usually carefully coded to appear polite and respectful. A published paper might talk about making “an improvement on the methodology of Bloggs et al”. Meanwhile, in private, when talking to our colleagues, we’re more likely to say that Bloggs’ work is complete rubbish, and should never have been published in the first place, and anyway everyone knows Bloggs didn’t do any of the work himself, and the only decent bits are due to his poor, underpaid postdoc, who never gets any credit for her efforts. (Yes, academics like to gossip about one another just as regular people do). This kind of blunt rudeness is common in private emails, especially when we’re discussing other scientists behind their backs with likeminded colleagues. Don’t be fooled by the more measured politeness in public: when we think an idea is wrong, we’ll tear it to shreds.

Now, in climate science, all our conventions are being broken. Private email exchanges are being made public. People who have no scientific training and/or no prior exposure to the scientific culture are attempting to engage in a discourse with scientists, and neither side understands the other. People are misquoting scientists, and trying to trip them up with loaded questions. And, occasionally, resorting to death threats. Outside of the scientific community, most people just don’t understand how science works, and so don’t know how to make sense of what’s going on.

And scientists don’t really know how to engage with these strange outsiders. Scientists normally only interact with other scientists. We live rather sheltered lives; they don’t call it the ivory tower for nothing. When scientists are attacked for political reasons, we mistake it for an intellectual discussion over brandy in the senior common room. Scientists have no training for political battles, and so our responses often look rude or dismissive to outsiders. Which in turn gets interpreted as unprofessional behaviour by those who don’t understand how scientists talk. And unlike commercial organisations and politicians, universities don’t engage professional PR firms to make us look good, and we academics would be horrified if they did: horrified at the expense, and horrified by the idea that our research might need to be communicated on anything other than its scientific merits.

Journalists like Monbiot, despite all his brilliant work in keeping up with the science and trying to explain it to the masses, just haven’t ever experienced academic culture from the inside. Hence his call, which he keeps repeating, for Phil Jones to resign, on the basis that Phil reacted unprofessionally to FOI requests. But if you keep provoking a scientist with nonsense, you’ll get a hostile response. Any fool knows you don’t get data from a scientist by using FOI requests, you do it by stroking their ego a little, or by engaging them with a compelling research idea that you need the data to pursue. And in the rare cases where this doesn’t work, you do some extra work yourself to reconstruct the data you need using other sources, or you test your hypothesis using a different approach (because it’s the research result we care about, not any particular dataset). So to a scientist, anyone stupid enough to try to get scientific data through repeated FOI requests quite clearly deserves our utter contempt. Jones was merely expressing (in private) a sentiment that most scientists would share – an extreme frustration with people who clearly don’t get it.

The same misunderstandings occur when outsiders look at how we talk about the peer-review process. Outsiders tend to think that all published papers are somehow equal in merit, and that peer-review is a magical process that only lets the truth through (hint: we refer to it more often as a crap-shoot). Scientists know that while some papers are accepted because they are brilliant, others are accepted because it’s hard to tell whether they are any good, and publication might provoke other scientists to do the necessary followup work. We know some published papers are worth reading, and some should be ignored. So, we’re natural skeptics – we tend to think that most new published results are likely to be wrong, and we tend to accept them only once they’ve been repeatedly tested and refined.

We’re used to having our own papers rejected from time to time, and we learn how to deal with it – quite clearly the reviewers were stupid, and we’ll show them by getting it published elsewhere (remember, big ego, thick skin). We’re also used to seeing the occasional crap paper get accepted (even into our most prized journals), and again we understand that the reviewers were stupid, and the journal editors incompetent, and we waste no time in expressing that. And if there’s a particularly egregious example, everyone in the community will know about it, everyone will agree it’s bad, and some of us will start complaining loudly about the idiot editor who let it through. Yet at the same time, we’re all reviewers, and some of us are editors, so it’s understood that the people we’re calling stupid and incompetent are our colleagues. And a big part of calling them stupid or incompetent is to get them to be more rigorous next time round, and it works because no honest scientist wants to be seen as lacking rigor. What looks to the outsider like a bunch of scientists trying to subvert some gold standard of scientific truth is really just scientists trying to goad one another into doing a better job in what we all know is a messy, noisy process.

The bottom line is that scientists will always tend to be rude to ignorant and lazy people, because we expect to see in one another a driving desire to master complex ideas and to work damn hard at it. Unfortunately the outside world (and many journalists) interpret that rudeness as unprofessional conduct. And because they don’t see it every day (like we do!) they’re horrified.

Some people have suggested that scientists need to wise up, and learn how to present themselves better on the public stage. Indeed, the Guardian published an editorial calling for the emergence of new leaders from the scientific community who can explain the science. This is naive and irresponsible. It completely ignores the nature of the current wave of attacks on scientists, and what motivates those attacks. No scientist can be an effective communicator in a world where people with vested interests will do everything they can to destroy his or her reputation. The scientific community doesn’t have the resources to defend itself in this situation, and quite frankly it shouldn’t have to. What we really need is for newspaper editors, politicians, and business leaders to start acting responsibly, make the effort to understand what the science is saying, make the effort to understand what is really driving these swiftboat-style attacks on scientists, and then shift the discourse from endless dissection of scientists’ emails onto useful, substantive discussions of the policy choices we’re faced with.

[Update: Joe Romm has reposted this at ClimateProgress, and it’s generated some very interesting discussion, including a response from George Monbiot that’s worth reading]

[Update 2: 31/3/2010 The UK Parliament released its findings last night, and completely exonerates Prof. Jones and the CRU. It does, however, suggest that the UEA should bear responsibility for any mistakes that were made over how the FoI requests were handled, and it makes a very strong call for more openness with data and software from the climate science community]

[Update 3: 7/4/2010 A followup post in which I engaged George Monbiot in a lengthy debate (and correct some possible misimpressions from the above post)]

[Update 4: 27/4/2010 This post was picked up by Physics Today]

This week I attended a Dagstuhl seminar on New Frontiers for Empirical Software Engineering. It was a select gathering, with many great people, which meant lots of fascinating discussions, and not enough time to type up all the ideas we’ve been bouncing around. I was invited to run a working group on the challenges to empirical software engineering posed by climate change. I started off with a quick overview of the three research themes we identified at the Oopsla workshop in the fall:

  • Climate Modeling, which we could characterize as a kind of end-user software development, embedded in a scientific process;
  • Global collective decision-making, which involves creating the software infrastructure for collective curation of sources of evidence in a highly charged political atmosphere;
  • Green Software Engineering, including carbon accounting for the software systems lifecycle (development, operation and disposal), but where we have no existing measurement framework, and a tendency to make unsupported claims (aka greenwashing).

Inevitably, we spent most of our time this week talking about the first topic – software engineering of computational models, as that’s the closest to the existing expertise of the group, and the most obvious place to start.

So, here’s a summary of our discussions. The bright ideas are due to the group (Vic Basili, Lionel Briand, Audris Mockus, Carolyn Seaman and Claes Wohlin), while the mistakes in presenting them here are all mine.

A lot of our discussion was focussed on the observation that climate modeling (and software for computational science in general) is a very different kind of software engineering than most of what’s discussed in the SE literature. It’s like we’ve identified a new species of software engineering, which appears to be an outlier (perhaps an entirely new phylum?). This discovery (and the resulting comparisons) seems to tell us a lot about the other species that we thought we already understood.

The SE research community hasn’t really tackled the question of how the different contexts in which software development occurs might affect software development practices, nor when and how it’s appropriate to attempt to generalize empirical observations across different contexts. In our discussions at the workshop, we came up with many insights for mainstream software engineering, which means this is a two-way street: plenty of opportunity for re-examination of mainstream software engineering, as well as learning how to study SE for climate science. I should also say that many of our comparisons apply to computational science in general, not just climate science, although we used climate modeling for many specific examples.

We ended up discussing three closely related issues:

  1. How do we characterize/distinguish different points in this space (different species of software engineering)? We focussed particularly on how climate modeling is different from other forms of SE, but we also attempted to identify factors that would distinguish other species of SE from one another. We identified lots of contextual factors that seem to matter. We looked for external and internal constraints on the software development project that seem important. External constraints are things like resource limitations, or particular characteristics of customers or the environment where the software must run. Internal constraints are those that are imposed on the software team by itself, for example, choices of working style, project schedule, etc.
  2. Once we’ve identified what we think are important distinguishing traits (or constraints), how do we investigate whether these are indeed salient contextual factors? Do these contextual factors really explain observed differences in SE practices, and if so how? We need to consider how we would determine this empirically. What kinds of study are needed to investigate these contextual factors? How should the contextual factors be taken into account in other empirical studies?
  3. Now imagine we have already characterized this space of species of SE. What measures of software quality attributes (e.g. defect rates, productivity, portability, changeability…) are robust enough to allow us to make valid comparisons between species of SE? Which metrics can be applied in a consistent way across vastly different contexts? And if none of the traditional software engineering metrics (e.g. for quality, productivity, …) can be used for cross-species comparison, how can we do such comparisons?

In my study of the climate modelers at the UK Met Office Hadley centre, I had identified a list of potential success factors that might explain why the climate modelers appear to be successful (i.e. to the extent that we are able to assess it, they appear to build good quality software with low defect rates, without following a standard software engineering process). My list was:

  • Highly tailored software development process – software development is tightly integrated into scientific work;
  • Single Site Development – virtually all coupled climate models are developed at a single site, managed and coordinated at a single site, once they become sufficiently complex [edited – see Bob’s comments below], usually a government lab as universities don’t have the resources;
  • Software developers are domain experts – they do not delegate programming tasks to programmers, which means they avoid the misunderstandings of the requirements common in many software projects;
  • Shared ownership and commitment to quality, which means that the software developers are more likely to make contributions to the project that matter over the long term (in contrast to, say, offshored software development, where developers are only likely to do the tasks they are immediately paid for);
  • Openness – the software is freely shared with a broad community, which means that there are plenty of people examining it and identifying defects;
  • Benchmarking – there are many groups around the world building similar software, with regular, systematic comparisons on the same set of scenarios, through model inter-comparison projects (this trait could be unique – we couldn’t think of any other type of software for which this is done so widely);
  • Unconstrained Release Schedule – as there is no external customer, software releases are unhurried, and occur only when the software is considered stable and tested enough.
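
To make the benchmarking item above a little more concrete, here is a minimal sketch of what a systematic comparison across modeling groups could look like in code. Everything in it is an assumption made for illustration: the function name `intercompare`, the layout of the `results` dictionary, and the choice of ensemble mean and spread as summary statistics are all hypothetical, and real model inter-comparison projects use far richer diagnostics than this.

```python
from statistics import mean, pstdev


def intercompare(results):
    """Summarise how several models compare on a shared set of scenarios.

    `results` is a hypothetical structure: {model_name: {scenario: value}},
    where each value is a single scalar diagnostic produced by that model.
    For each scenario, return the ensemble mean, the spread (population
    standard deviation), and each model's deviation from the ensemble mean.
    """
    # Collect every scenario that at least one model has run.
    scenarios = sorted({s for per_model in results.values() for s in per_model})

    report = {}
    for scenario in scenarios:
        # Only compare the models that actually ran this scenario.
        values = {m: r[scenario] for m, r in results.items() if scenario in r}
        numbers = list(values.values())
        ensemble_mean = mean(numbers)
        report[scenario] = {
            "mean": ensemble_mean,
            "spread": pstdev(numbers),
            "deviation": {m: v - ensemble_mean for m, v in values.items()},
        }
    return report
```

A model whose deviation is consistently large across many scenarios is a candidate for closer inspection, which is the sense in which many groups running the same scenarios act as a check on one another.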

At the workshop we identified many more distinguishing traits, any of which might be important:

  • A stable architecture, defined by physical processes: atmosphere, ocean, sea ice, land scheme,…. All GCMs have the same conceptual architecture, and it is unchanged since modeling began, because it is derived from the natural boundaries in physical processes being simulated [edit: I mean the top level organisation of the code, not the choice of numerical methods, which do vary across models – see Bob’s comments below]. This is used as an organising principle both for the code modules, and also for the teams of scientists who contribute code. However, the modelers don’t necessarily derive some of the usual benefits of stable software architectures, such as information hiding and limiting the impacts of code changes, because the modules have very complex interfaces between them.
  • The modules and integrated system each have independent lives, owned by different communities. For example, a particular ocean model might be used uncoupled by a large community, and also be integrated into several different coupled climate models at different labs. The communities who care about the ocean model on its own will have different needs and priorities than each of the communities who care about the coupled models. Hence, the inter-dependence has to be continually re-negotiated. Some other forms of software have this feature too: Audris mentioned voice response systems in telecoms, which can be used stand-alone, and also in integrated call centre software; Lionel mentioned some types of embedded control systems onboard ships, where the modules are used independently on some ships, and as part of a larger integrated command and control system on others.
  • The software has huge societal importance, but the impact of software errors is very limited. First, a contrast: for automotive software, a software error can immediately lead to death, or huge expense, legal liability, etc., as cars are recalled. What would be the impact of software errors in climate models? An error may affect some of the experiments performed on the model, with perhaps the most serious consequence being the need to withdraw published papers (although I know of no cases where this has happened because of software errors rather than methodological errors). Because there are many other modeling groups, and scientific results are filtered through processes of replication, and systematic assessment of the overall scientific evidence, the impact of software errors on, say, climate policy is effectively nil. I guess it is possible that systematic errors are being made by many different climate modeling groups in the same way, but these wouldn’t be coding errors – they would be errors in the understanding of the physical processes and how best to represent them in a model.
  • The programming language of choice is Fortran, and is unlikely to change for very good reasons. The reasons are simple: there is a huge body of legacy Fortran code, everyone in the community knows and understands Fortran (and for many of them, only Fortran), and Fortran is ideal for much of the work of coding up the mathematical formulae that represent the physics. Oh, and performance matters enough that the overhead of object oriented languages makes them unattractive. Several climate scientists have pointed out to me that it probably doesn’t matter what language they use, the bulk of the code would look pretty much the same – long chunks of sequential code implementing a series of equations. Which means there’s really no push to discard Fortran.
  • Existence and use of shared infrastructure and frameworks. An example used by pretty much every climate model is MPI. However, unlike Fortran, which is generally liked (if not loved), everyone universally hates MPI. If there was something better they would use it. [OpenMP doesn’t seem to have any bigger fanclub]. There are also frameworks for structuring climate models and coupling the different physics components (more on these in a subsequent post). Use of frameworks is an internal constraint that will distinguish some species of software engineering, although I’m really not clear how it will relate to choices of software development process. More research needed.
  • The software developers are very smart people. Typically with PhDs in physics or related geosciences. When we discussed this in the group, we all agreed this is a very significant factor, and that you don’t need much (formal) process with very smart people. But we couldn’t think of any existing empirical evidence to support such a claim. So we speculated that we needed a multi-case case study, with some cases representing software built by very smart people (e.g. climate models, the Linux kernel, Apache, etc), and other cases representing software built by …. stupid people. But we felt we might have some difficulty recruiting subjects for such a study (unless we concealed our intent), and we would probably get into trouble once we tried to publish the results 🙂
  • The software is developed by users for their own use, and this software is mission-critical for them. I mentioned this above, but want to add something here. Most open source projects are built by people who want a tool for their own use, but that others might find useful too. The tools are built on the side (i.e. not part of the developers’ main job performance evaluations) but most such tools aren’t critical to the developers’ regular work. In contrast, climate models are absolutely central to the scientific work on which the climate scientists’ job performance depends. Hence, we described them as mission-critical, but only in a personal kind of way. If that makes sense.
  • The software is used to build a product line, rather than an individual product. All the main climate models have a number of different model configurations, representing different builds from the codebase (rather than say just different settings). In the extreme case, the UK Met Office produces several operational weather forecasting models and several research climate models from the same unified codebase, although this is unusual for a climate modeling group.
  • Testing focuses almost exclusively on integration testing. In climate modeling, there is very little unit testing, because it’s hard to specify an appropriate test for small units in isolation from the full simulation. Instead the focus is on very extensive integration tests, with daily builds, overnight regression testing, and a rigorous process of comparing the output from runs before and after each code change. In contrast, most other types of software engineering focus instead on unit testing, with elaborate test harnesses to test pieces of the software in isolation from the rest of the system. In embedded software, the testing environment usually needs to simulate the operational environment; the most extreme case I’ve seen is the software for the international space station, where the only end-to-end software integration was the final assembly in low earth orbit.
  • Software development activities are completely entangled with a wide set of other activities: doing science. This makes it almost impossible to assess software productivity in the usual way, and even impossible to estimate the total development cost of the software. We tried this as a thought experiment at the Hadley Centre, and quickly gave up: there is no sensible way of drawing a boundary to distinguish some set of activities that could be regarded as contributing to the model development, from other activities that could not. The only reasonable path to assessing productivity that we can think of must focus on time-to-results, or time-to-publication, rather than on software development and delivery.
  • Optimization doesn’t help. This is interesting, because one might expect climate modelers to put a huge amount of effort into optimization, given that century-long climate simulations still take weeks/months on some of the world’s fastest supercomputers. In practice, optimization, where it is done, tends to be an afterthought. The reason is that the model is changed so frequently that hand optimization of any particular model version is not useful. Plus the code has to remain very understandable, so very clever designed-in optimizations tend to be counter-productive.
  • There are very few resources available for software infrastructure. Most of the funding is concentrated on the frontline science (and the costs of buying and operating supercomputers). It’s very hard to divert any of this funding to software engineering support, so development of the software infrastructure is sidelined and sporadic.
  • …and last but not least, A very politically charged atmosphere. A large number of people actively seek to undermine the science, and to discredit individual scientists, for political (ideological) or commercial (revenue protection) reasons. We discussed how much this directly impacts the climate modellers, and I have to admit I don’t really know. My sense is that all of the modelers I’ve interviewed are shielded to a large extent from the political battles (I never asked them about this). Those scientists who have been directly attacked (e.g. Mann, Jones, Santer) tend to be scientists more involved in creation and analysis of datasets, rather than GCM developers. However, I also think the situation is changing rapidly, especially in the last few months, and climate scientists of all types are starting to feel more exposed.
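
To make the testing point above a little more concrete, here is a minimal sketch of the kind of before/after comparison I mean. It assumes each run's output has been reduced to named lists of floating point values, and that the check is a strict bit-for-bit one. The function names (`bit_pattern`, `compare_runs`) and the field names in the usage lines are hypothetical; this is an illustration of the idea, not the tooling any modeling centre actually uses.

```python
import struct


def bit_pattern(x):
    """Return the 64-bit IEEE 754 pattern of a Python float as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def compare_runs(baseline, candidate):
    """List every difference between two runs.

    Both arguments map a field name to a list of floats. Each mismatch is
    reported as (field, index, old_value, new_value); a field that is missing
    from one run, or whose length changed, is reported with index None.
    """
    diffs = []
    for name in sorted(set(baseline) | set(candidate)):
        old = baseline.get(name)
        new = candidate.get(name)
        if old is None or new is None or len(old) != len(new):
            diffs.append((name, None, old, new))
            continue
        for i, (a, b) in enumerate(zip(old, new)):
            # Compare bit patterns, so that even a last-digit change is caught.
            if bit_pattern(a) != bit_pattern(b):
                diffs.append((name, i, a, b))
    return diffs


# Made-up field names and values, purely for illustration.
before = {"sea_surface_temp": [288.15, 289.02], "ice_fraction": [0.10, 0.25]}
after = {"sea_surface_temp": [288.15, 289.02], "ice_fraction": [0.10, 0.26]}
print(compare_runs(before, before))  # no differences
print(compare_runs(before, after))   # one difference, in ice_fraction
```

An empty list means the code change left the output untouched. Anything else gets handed to a scientist to decide whether the difference is expected. A strict bit-for-bit check is only one possible criterion; a change that is meant to alter the science would need a looser, statistical comparison instead.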

We also speculated about some other contextual factors that might distinguish different software engineering species, not necessarily related to our analysis of computational science software. For example:

  • Existence of competitors;
  • Whether software is developed for single-person-use versus intended for broader user base;
  • Need for certification (and different modes by which certification might be done, for example where there are liability issues, and the need to demonstrate due diligence);
  • Whether software is expected to tolerate and/or compensate for hardware errors. For example, for automotive software, much of the complexity comes from building fault-tolerance into the software because correcting hardware problems introduced in design or manufacture is prohibitively expensive. We pondered how often hardware errors occur in supercomputer installations, and whether, if they did, it would affect the software. I’ve no idea of the answer to the first question, but the second is readily handled by the checkpoint and restart features built into all climate models. Audris pointed out that given the volumes of data being handled (terabytes per day), there are almost certainly errors introduced in storage and retrieval (i.e. bits getting flipped), and enough that standard error correction would still miss a few. However, there’s enough noise in the data that in general, such things probably go unnoticed, although we speculated about what would happen when the most significant bit gets flipped in some important variable.
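
The bit-flip speculation is easy to play with. Here is a small sketch, assuming the variable in question is stored as a 64-bit IEEE 754 double (I don't know that this holds for any particular model field). The helper `flip_bit` and the temperature value are made up for illustration.

```python
import struct


def flip_bit(x, k):
    """Return the float obtained by flipping bit k (0 = least significant) of x."""
    (pattern,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", pattern ^ (1 << k)))
    return flipped


temperature = 288.15  # a made-up value, in kelvin
print(flip_bit(temperature, 0))   # lowest mantissa bit: a tiny change
print(flip_bit(temperature, 62))  # top exponent bit: wildly different magnitude
print(flip_bit(temperature, 63))  # sign bit: same magnitude, opposite sign
```

In a double, the most significant bit is the sign bit, so flipping it turns a plausible temperature into a negative one. Flipping the top exponent bit is arguably worse, since the value collapses to almost zero. Either would presumably stand out far more than a flipped low-order mantissa bit, which is the kind of error that disappears into the noise.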

More interestingly, we talked about what happens when these contextual factors change over time. For example, the emergence of a competitor where there was none previously, or the creation of a new regulatory framework where none existed. Or even, in the case of health care, when change in the regulatory framework relaxes a constraint – such as the recent US healthcare bill, under which it (presumably) becomes easier to share health records among medical professionals if knowledge of pre-existing conditions is no longer a critical privacy concern. An example from climate modeling: software that was originally developed as part of a PhD project intended for use by just one person eventually grows into a vast legacy system, because it turns out to be a really useful model for the community to use. And another: the move from single site development (which is how nearly all climate models were developed) to geographically distributed development, now that it’s getting increasingly hard to get all the necessary expertise under one roof, because of the increasing diversity of science included in the models.

We think there are lots of interesting studies to be done of what happens to the software development processes for different species of software when such contextual factors change.

Finally, we talked a bit about the challenge of finding metrics that are valid across the vastly different contexts of the various software engineering species we identified. Experience with trying to measure defect rates in climate models suggests that it is much harder to make valid comparisons than is generally presumed in the software literature. There really has not been any serious consideration of these various contextual factors and their impact on software practices in the literature, and hence we might need to re-think a lot of the ways in which claims for generality are handled in empirical software engineering studies. We spent some time talking about the specific case of defect measurements, but I’ll save that for a future post.

Kate asked the question last week “How do you stay sane” (while fighting the misinformation campaigns and worrying about our prospects for averting dangerous climate change). Kate’s post reminded me of a post I did last year on climate trauma, and specifically the essay by Gillian Caldwell, in which she compares the emotional burnout that many of us feel when dealing with climate change with other types of psychological trauma. I originally read this at a time when I was overdoing it, working late into the evenings, going to bed exhausted, and then finding myself unable to sleep because my head was buzzing with everything I’d just been working on. Gillian’s essay struck a chord.

I took on board many of the climate trauma survival tips, and in particular, I started avoiding climate related work in the evenings. My blogging rate went down and I started sleeping and exercising properly again. But good habits can be hard to maintain, and I realise in the last few months I was overdoing it again. As it was March break last week, we took a snap decision to take some time off, and took the kids skiing in Quebec. We even managed to fit in trips to Ottawa and Montreal en route, as the kids hadn’t been to either city.

The trip was great, but wasn’t 100% effective as a complete break. I was reminded of climate change throughout: I didn’t need a coat in Ottawa (in March!!) and we picnicked outdoors in Montreal (in March!!). There’s no snow left in the Laurentides (except on the ski slopes); and we found ourselves skiing in hot sunshine (which meant by mid-afternoon the slopes were covered in piles of wet slush). The ski operators told us they normally stay open through mid-April, but that looks extremely unlikely this year. And sure enough, I return to the news that Canada has experienced the warmest winter ever recorded, and we’re on course for the hottest year ever. It can’t be good news for the ski industry.

And it’s not good news for me, because I’m now back to blogging late into the evening again…

14. March 2010 · 6 comments · Categories: humour

The Subversion book (it's turtles all the way down)

Here’s the funniest comment from when I visited NCAR the other week. We were talking over dinner about how just about anything the scientists say and do now will be twisted out of context, to try and prove a conspiracy. Never mind “tricks” and “data manipulation”. What happens when the ignoranti find out that the tool used to manage the code for the climate models is called Subversion?

13. March 2010 · 1 comment · Categories: ICSE 2010

We’re gearing up our plans for the second international workshop on software research and climate change (WSRCC-2), to be held in Cape Town on May 3 (in conjunction with ICSE-2010). The workshop follows from a successful WSRCC-1 we held in the fall at Oopsla/Onward! (See also my summary of the brainstorming session).

One of the biggest challenges for the workshop in Cape Town is to accommodate participation by people who can’t be there. After all, there is irony in the size of the carbon footprints for many of us to travel all the way to South Africa, and many of the organizing committee members felt it’s too far to travel. We’ve ruled out the idea of video-conferencing (our experience is that the technology and bandwidth at conference centres just isn’t reliable enough). However, after a little brainstorming, we came up with some interesting ideas:

  • Invite people to submit youtube-style videos, to be posted on the conference website. The best of these will be shown in a session at the workshop;
  • Make full use of twitter and friendfeed to connect with remote participants, perhaps projecting the feeds up on the screen during the workshop. (tags are ready – twitter: #wsrcc-2; friendfeed: wsrcc-2-may-2010);
  • Have one session at the workshop opened up to audio conferencing. The second afternoon session would work best for this, as it will permit participation from most timezones: it’ll be evening in India, afternoon in Europe; and morning in N. & S. America. And I’m led to believe that the Aussies and Japanese are always happy to stay up all night anyway…
  • And I was keen to experiment with embodied social proxies, but I don’t think we’ll be able to get the kit together for this year…

Anyway, I’d be interested in more ideas, and encourage everyone to participate, either physically or remotely. The draft program is up already.

Oh, and I’m really looking forward to the closing keynote at ICSE this year: Sir David King, talking about Planning for Climate Change in the 21st Century.