Here’s a letter I’ve sent to the Guardian newspaper. I wonder if they’ll print it? [Update – I’ve marked a few corrections since sending it. Darn]

Professor Darrel Ince, writing in the Guardian on February 5th, reflects on lessons from the emails and documents stolen from the Climatic Research Unit at the University of East Anglia. Prof Ince uses an example from the stolen emails to argue that there are serious concerns about software quality and openness in climate science, and goes on to suggest that this alleged lack of openness is unscientific. Unfortunately, Prof Ince makes a serious error of science himself – he bases his entire argument on a single data point, without asking whether the example is in any way representative.

The emails and files from the CRU that were released to the public are quite clearly a carefully chosen selection, apparently picked to cause maximum embarrassment to the climate scientists. I’m quite sure that I could find equally embarrassing examples of poor software on the computers of Prof Ince and his colleagues. The Guardian has been conducting a careful study of claims that have been made about these emails, and has shown that the allegations of defects in the climate science are unfounded. However, these investigations haven’t covered the issues that Prof Ince raises, so it is worth examining them in more detail.

The Harry README file does appear to be a long struggle by a junior scientist to get some poor quality software to work. Does this indicate that there is a systemic problem of software quality in climate science? To answer that question, we would need more data. Let me offer one more data point, representing the other end of the spectrum. Two years ago I carried out a careful study of the software development methods used for the main climate simulation models developed at the UK Met Office. I was expecting to see many of the problems Prof Ince describes, because such problems are common across the entire software industry. However, I was extremely impressed with the care and rigor with which the climate models are constructed, and the extensive testing they are subjected to. In many ways, this process achieves higher-quality code than the vast majority of commercial software that I have studied, including the spacecraft flight control code developed by NASA’s contractors. [My results were published here: http://dx.doi.org/10.1109/MCSE.2009.193].

The climate models are developed over many years, by a large team of scientists, through a process of scientific experimentation. The scientists understand that their models are approximations of complex physical processes in the Earth’s atmosphere and oceans. They build their models through a process of iterative refinement. They run the models, and compare them with observational data, to look for the places where the models perform poorly. They then create hypotheses for how to improve the model, and then run experiments: using the previous version of the model as a control, and the new version as the experimental case, they compare both runs with the observational data to determine whether the hypothesis was correct. By a continual process of making small changes, and experimenting with the results, they end up testing their models far more effectively than most commercial software developers. And through careful use of tools to keep track of this process, they can reproduce past experiments on old versions of the model whenever necessary. The main climate models are also subjected to extensive model intercomparison tests, as part of the IPCC assessment process. Models from different labs are run on the same scenarios, and the results compared in detail, to explore the strengths and weaknesses of each model.

As in many software industries, different types of climate software are verified to different extents, reflecting choices about where to apply limited resources. The main climate models are tested extensively, as I described above. But often scientists need to develop other programs for occasional data analysis tasks. Sometimes they do this rather haphazardly (which appears to be the case with the Harry file). Many of these tasks are tentative in nature, and correspond to the way software engineers regularly throw a piece of code together to try out an idea. What matters is that, if the idea matures and leads to results that are published or shared with other scientists, the results are checked carefully by other scientists. Getting hold of the code and re-running it is usually a poor way of doing this (I’ve found over the years that replicating someone else’s experiment is fraught with difficulties, and not exclusively because of problems with code quality). A much better approach is for other scientists to write their own code, and check independently whether the results are confirmed. This avoids the problem of everyone relying on one particular piece of software, as we can never be sure any software is entirely error-free.

The claim that many climate scientists have refused to publish their computer programs is also specious. Last summer I compiled a list of how to access the code for the 23 main models used in the IPCC report. Although only a handful are fully open source, most are available free under fairly light licensing arrangements. For our own research we have asked for and obtained the full code, version histories, and bug databases from several centres, with no difficulties (other than the need for a little patience as the appropriate licensing agreements were sorted out). Climate and weather forecasting code has a number of potential commercial applications, so the modeling centres use a license agreement that permits academic research, but prohibits commercial use. This is no different from what would be expected when we obtain code from any commercial organization.

Professor Ince mentions Hatton’s work, which is indeed an impressive study, and one of the few that has been carried out on scientific code. And it is quite correct that there is a lot of shoddy scientific software out there. We’ve applied some of Hatton’s research methods to climate model software, and have found that, by standard software quality metrics, the climate models are consistently good quality code. Unfortunately, it is not clear that standard software engineering quality metrics apply well to this code. Climate models aren’t built to satisfy a specification, but to address a scientific problem where the answer is not known in advance, and where only approximate solutions are possible. Many standard software testing techniques don’t work in this domain, and it is a shame that the software engineering research community has almost completely ignored this problem – we desperately need more research into this.

Prof Ince also echoes a belief, which seems to be common across the academic software community, that releasing the code will solve the quality problems seen in the specific case of the Harry file. This is a rather dubious claim. There is no evidence that, in general, open source software is any less buggy than closed source software. Dr Xu at the University of Notre Dame studied thousands of open source software projects, and found that the majority had nobody other than the original developer using them, while a very small number of projects had attracted a big community of developers. The same pattern is likely to hold for scientific software: the problem isn’t lack of openness, it’s lack of time – most of the code thrown together to test out an idea by a particular scientist is only of interest to that one scientist. If a result is published and other scientists think it’s interesting and novel, they attempt to replicate the result themselves. Sometimes they ask for the original code (and in my experience, are nearly always given it). But in general, they write their own versions, because what matters isn’t independent verification of the code, but independent verification of the scientific results.

I am encouraged that my colleagues in the software engineering research community are starting to take an interest in studying the methods by which climate science software is developed. I fully agree that this is an important topic, and have been urging my colleagues to address it for a number of years. I do hope that they take the time to study the problem more carefully though, before drawing conclusions about overall software quality of climate code.

Prof Steve Easterbrook, University of Toronto

Update: The Guardian never published my letter, but I did find a few other rebuttals to Ince’s article in various blogs. Davec’s is my favourite!

I guess headlines like “An error found in one paragraph of a 3000 page IPCC report; climate science unaffected” wouldn’t sell many newspapers. And so instead, the papers spin out the story that a few mistakes undermine the whole IPCC process. As if newspapers never ever make mistakes. Well, of course, scientists are supposed to be much more careful than sloppy journalists, so “shock horror, those clever scientists made a mistake. Now we can’t trust them” plays well to certain audiences.

And yet there are bound to be errors; the key question is whether any of them impact any important results in the field. The error with the Himalayan glaciers in the Working Group II report is interesting because Working Group I got it right. And the erroneous paragraph in WGII quite clearly contradicts itself. A stupid mistake, and one that should be pretty obvious to anyone reading that paragraph carefully. There’s obviously room for improvement in the editing and review process. But does this tell us anything useful about the overall quality of the review process?

There are errors in just about every book, newspaper, and blog post I’ve ever read. People make mistakes. Editorial processes catch many of them. Some get through. But few of these things have the kind of systematic review that the IPCC reports went through. Indeed, as large, detailed, technical artifacts, with extensive expert review, the IPCC reports are much less like normal books, and much more like large software systems. So, how many errors get through a typical review process for software? Is the IPCC doing better than this?

Even the best software testing and review practices in the world let errors through. Some examples (expressed in number of faults experienced in operation, per thousand lines of code):

  • Worst military systems: 55 faults/KLoC
  • Best military systems: 5 faults/KLoC
  • Agile software development (XP): 1.4 faults/KLoC
  • The Apache web server (open source): 0.5 faults/KLoC
  • NASA Space Shuttle: 0.1 faults/KLoC

Because of the extensive review processes, the shuttle flight software is purported to be the most expensive in the world, in terms of dollars per line of code. Yet about one error in every ten thousand lines of code still gets through the review and testing process. Thankfully, none of those errors has ever caused a serious accident. When I worked for NASA on the Shuttle software verification in the 1990s, they were still getting reports of software anomalies with every shuttle flight, and releasing a software update every 18 months (this, for an operational vehicle that had been flying for two decades, with only 500,000 lines of flight code!).

The IPCC reports consist of around 3000 pages, with approaching 100 lines of text per page. Let’s assume I can equate a line of text with a line of code (which seems reasonable, when you look at the information density of each line in the IPCC reports) – that would make them as complex as a 300,000-line software system. If the IPCC review process is as thorough as NASA’s, then we should still expect around 30 significant errors to have made it through the review process. We’ve heard of two recently – does this mean we have to endure another 28 stories, spread out over the next few months, as the drone army of denialists toils through trying to find more mistakes? Actually, it’s probably worse than that…

The IPCC writing, editing and review processes are carried out entirely by unpaid volunteers. They don’t have automated testing and static analysis tools to help – human reviewers are the only kind of review available. So they’re bound to do much worse than NASA’s flight software. I would expect there to be hundreds of errors in the reports, even with the best possible review processes in the world. Somebody point me to a technical review process anywhere that can do better than this, and I’ll eat my hat. Now, what was the point of all those newspaper stories again? Oh, yes, sensationalism sells.
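The back-of-envelope arithmetic in this post can be written out as a short Python sketch. It takes the fault densities from the list above and the assumption already made here (one line of report text counts as one line of code, 3000 pages at roughly 100 lines each). The names are purely illustrative, and the estimate inherits every caveat of that line-for-line analogy.

```python
# Back-of-envelope estimate: residual faults = size in KLoC x fault density.
# Assumption (taken from the post): one line of IPCC report text ~ one line of code.

PAGES = 3000
LINES_PER_PAGE = 100  # "approaching 100 lines of text per page"

# Faults experienced in operation per thousand lines of code (KLoC), from the list.
FAULT_DENSITY = {
    "Worst military systems": 55,
    "Best military systems": 5,
    "Agile software development (XP)": 1.4,
    "Apache web server": 0.5,
    "NASA Space Shuttle": 0.1,
}


def expected_faults(lines: int, faults_per_kloc: float) -> float:
    """Expected number of residual faults in `lines` lines at the given density."""
    return lines / 1000 * faults_per_kloc


if __name__ == "__main__":
    total_lines = PAGES * LINES_PER_PAGE  # 300,000 "lines"
    for process, density in FAULT_DENSITY.items():
        print(f"{process:35s} {expected_faults(total_lines, density):8.0f}")
```

Even at the Shuttle's density this gives about 30 residual errors; at the density of the best military systems it is closer to 1500.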

Having posted last night about how frustrating it is to see the same old lies get recycled in every news report, this morning I’m greeted with the news that there’s now an app for that. I’ve posted before about the Skeptical Science site. Well, now it’s available as a free iPhone app. I’ve downloaded it and played with it, and it looks fabulous.

Just perfect for bringing the science to the masses down the pub.

I’ve been distracted over the last few months with all these attacks on climate science. It’s like watching a car crash in slow motion. I know enough about climate science to be skeptical of absolutely everything written on the topic in the mainstream media. And yet I still feel compelled to read about each new revelation trumpeted in the press, and I feel compelled to do the necessary digging to find out what’s really going on. Well, I’m done with it. I’ve seen enough. I’m finally looking away. And I’m taking away some lessons about human behaviour, and most of it isn’t pretty. Many of the people attacking the scientists are truly nasty people.

Take climategate, for example (please!). It really was a non-event – a series of trumped-up claims with no substance. We already knew the contrarians talk nonsense. At worst, some requests for access to data were mishandled, by scientists who were being hounded by an army of attack drones. What did those FOI requests look like? Well, mostly they looked the same, because when Steve McIntyre was told that some of the meteorological data was not available to non-academics because of commercial licensing agreements, he threw a hissy fit and told the lunatics that follow his blog to fire off FOI requests at the CRU. Sixty FOI requests in one weekend! Which makes them all vexatious, and probably counts as harassment. Which is bad enough, but some of McIntyre’s followers did worse, and started firing off death threats. Death threats?!? Sometimes (often) I think I’m on the wrong planet.

Or take the hockey stick controversy. Michael Mann was smeared again as a result of the CRU emails, but on investigation his name was cleared. The previous attempts to smear him, through the Wegman Investigation, turned out to be nothing but a political attack, put together by staffers in Senator Inhofe’s office. While any errors in Mann’s initial attempts at dendrochronology reconstructions have long since been corrected, and the results confirmed by other studies (that’s how science works, remember?), a group of obsessive denialists just won’t let the issue drop.

David Brin calls it a war on expertise. A bunch of untrained armchair climatologists think they know more about the field than geoscientists who have been studying it as a full-time career for decades. Or, more precisely, they think they can do a little poking and find errors, and that those errors will invalidate the science. Because they really really want the science to be wrong. Actually, I really really want the science to be wrong too, but I’m not so stupid as to think I can poke holes in it without first becoming an expert. If the science is wrong, you’ll read about it first in the peer-reviewed literature.

Here’s a fascinating seminar, happening later this week:

Resilience in the Face of Climate Change and Peak Oil: Community-Building Responses for an Equitable Transition to a Low-Carbon Society

Blake Poland, Associate Professor, Dalla Lana School of Public Health, UofT

Thursday, February 11, 4:10 p.m., Room 108, Health Sciences Building, 155 College St. (at McCaul St.), University of Toronto

The world, and North America in particular, is entering a period of unprecedented change. There is mounting evidence of the potential for (and pressure for action to avoid) catastrophic runaway climate change, unprecedented species extinctions and environmental degradation, the persistence (if not growth) of alarming inequities in health, and accelerated resource depletion. By many estimates we currently possess most of the technological know-how to solve the world’s fiscal, economic, environmental, social justice and climatological crises. In other words, the problem is not technical but social. Consensus is emerging that building resilience at three nested levels (psychological/personal, community, and systems) is or must be at the centre of convergent social justice and environmental social change movements. Resilience is widely understood to refer to the ability of communities, persons, or systems to withstand shocks or stress without collapse, and perhaps the ability to accept and embrace (as opposed to resist) change. We are an interdisciplinary team principally from Canada and Brazil and we are working on the development of an arts-enabled transformative learning curriculum on the transition to a low-carbon society for application in educational and community settings, that draws on paradigms and sources of knowledge from the Global South and the Global North. We will describe work in progress.

Blake Poland is an Associate Professor in the Dalla Lana School of Public Health at the University of Toronto, Co-Director of the Environmental Health Justice in the City Research Interest Group (Centre for Urban Health Initiatives), and co-principal investigator in the CUHI-funded Building Community Resilience pilot project. His work draws on complexity science, critical social theory, arts-enabled approaches, environmental justice, community development, and health promotion.

A while back I posted the introduction to a research proposal in climate change informatics. I also posted a list of potential research areas, and a set of criteria by which we might judge climate informatics tools. But I didn’t say what kinds of things we might want climate informatics tools to do. Here’s my first attempt, based on a slide I used at the end of my talk on usable climate science:

What do we want the tools to support?

What I was trying to lay out on this slide was a wide range of possible activities for which we could build software tools, combining good visualizations, collaborative support, and compelling user interface design. If we are to improve the quality of the public discourse on climate change, and support the kind of collective decision making that leads to effective action, we need better tools for all four of these areas:

  • Improve the public understanding of the basic science. Much of this is laid out in the IPCC reports, but to most people these are “dead tree science” – lots of thick books that very few people will read. So, how about some dynamic, elegant and cool tools to convey:
    • The difference between emissions and concentrations.
    • The various sources of emissions and how we know about them from detection/attribution studies.
    • The impacts of global warming on your part of the world – health, food and water, extreme weather events, etc.
    • The various mitigation strategies we have available, and what we know about the cost and effectiveness of each.
  • Achieve a better understanding of how the science works, to allow people to evaluate the nature of the evidence about climate change:
    • How science works, as a process of discovery, including how scientists develop theories, and how they correct mistakes.
    • What climate models are and how they are used to improve our understanding of climate processes.
    • How the peer-review process works, and why it is important, both as a filter for poor research, and a way of assessing the credentials of scientists.
    • What it means to be expert in a particular field, why expertise matters, and why expertise in one area of science doesn’t necessarily mean expertise in another.
  • Tools to support critical thinking, to allow people to analyze the situation for themselves:
    • The importance of linking claims to sources of evidence, and the use of multiple sources of evidence to test a claim.
    • How to assess the credibility of a particular claim, and the credibility of its source (desperately needed for appropriate filtering of ‘found’ information on the internet).
    • Systems Thinking – because reductionist approaches won’t help. People need to be able to recognize and understand whole systems and the dynamics of systems-of-systems.
    • Understanding risk – because the inability to assess risk factors is a major barrier to effective action.
    • Identifying the operation of vested interests. Because much of the public discourse isn’t about science or politics. It’s about people with vested interests attempting to protect those interests, often at the expense of the rest of society.
  • And finally, none of the above makes any difference if we don’t also provide tools to support effective action:
    • How to prioritize between short-term and long-term goals.
    • How to identify which kinds of personal action are important and effective.
    • How to improve the quality of policy-making, so that policy choices are linked to the scientific evidence.
    • How to support consensus building and democratic action for collective decision making, at the level of communities, cities, nations, and globally.
    • Tools to monitor effectiveness of policies and practices once they are implemented.

A reader writes to me from New Zealand, arguing that climate science isn’t a science at all because there is no possibility to conduct experiments. This misconception appears to be common, even among some distinguished scientists, who presumably have never taken the time to read many published papers in climatology. The misconception arises because people assume that climate science is all about predicting future climate change, and because such predictions are for decades/centuries into the future, and we only have one planet to work with, we can’t check to see if these predictions are correct until it’s too late to be useful.

In fact, predictions of future climate are really only a by-product of climate science. The science itself concentrates on improving our understanding of the processes that shape climate, by analyzing observations of past and present climate, and testing how well we understand them. For example, detection/attribution studies focus on the detection of changes in climate that are outside the bounds of natural variability (using statistical techniques), and determining how much of the change can be attributed to each of a number of possible forcings (e.g., changes in greenhouse gases, land use, aerosols, solar variation, etc.). As in any science, attribution is done by creating hypotheses about possible effects of each forcing, and then testing those hypotheses. Such hypotheses can be tested by looking for contradictory evidence (e.g., other episodes in the past where the forcing was present or absent, to test how well the hypothesis explains these too). They can also be tested by encoding each hypothesis in a climate model, and checking how well it simulates the observed data.
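To make the detection step a little more concrete, here is a deliberately simplified Python sketch of one way to ask whether a change lies outside the bounds of natural variability: compare the trend in an observed series against the spread of trends in an ensemble of unforced control runs. Everything in it is an illustrative assumption rather than the method of any particular study: the AR(1) "red noise" stand-in for natural variability and its parameters, the 95th-percentile threshold, and the synthetic "observed" series. Published detection/attribution studies use far more sophisticated statistics, and the attribution step (apportioning the change among forcings) is not attempted here.

```python
import random
import statistics


def ols_slope(ys):
    """Least-squares linear trend (units per time step) of an evenly spaced series."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def red_noise(n, phi, sigma, rng):
    """AR(1) series: a toy stand-in for unforced natural variability."""
    series, x = [], 0.0
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        series.append(x)
    return series


def detect(observed, control_trends):
    """Return (detected, threshold): the observed trend counts as 'detected'
    if it exceeds the 95th percentile of trends from the control ensemble."""
    threshold = statistics.quantiles(control_trends, n=20)[-1]
    return ols_slope(observed) > threshold, threshold


rng = random.Random(42)
n_years = 50
# Hypothetical control ensemble: 1000 unforced runs, each contributing one trend.
control_trends = [ols_slope(red_noise(n_years, 0.6, 0.1, rng)) for _ in range(1000)]
# Hypothetical "observations": the same kind of noise plus an imposed trend of 0.02 per year.
observed = [v + 0.02 * t for t, v in enumerate(red_noise(n_years, 0.6, 0.1, rng))]
detected, threshold = detect(observed, control_trends)
print(f"observed trend {ols_slope(observed):.4f}, threshold {threshold:.4f}, detected: {detected}")
```

The structure mirrors the hypothesis test described above: the control ensemble defines what natural variability alone would plausibly produce, and the observed change either falls outside that range or it does not.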

I’m not a climate modeler, but I have conducted anthropological studies of how climate modelers work. Climate models are developed slowly and carefully over many years, as scientific instruments. One of the most striking aspects of climate model development is that it is an experimental science in the strongest sense. What do I mean?

Well, a climate model is a detailed theory of some subset of the earth’s physical processes. Like all theories, it is a simplification that focuses on those processes that are salient to a particular set of scientific questions, and approximates or ignores those processes that are less salient. Climate modelers use their models as experimental instruments. They compare the model run with the observational record for some relevant historical period. They then come up with a hypothesis to explain any divergences between the run and the observational record, and make a small improvement to the model that the hypothesis predicts will reduce the divergence. They then run an experiment in which the old version of the model acts as a control, and the new version is the experimental case. By comparing the two runs with the observational record, they determine whether the predicted improvement was achieved (and whether the change messed anything else up in the process). After a series of such experiments, the modelers will eventually either accept the change to the model as an improvement to be permanently incorporated into the model code, or they discard it because the experiments failed (i.e. they failed to give the expected improvement). By doing this day after day, year after year, the models get steadily more sophisticated, and steadily better at simulating real climatic processes.
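The control-versus-experiment loop described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it scores each run by root-mean-square error against observations, treats a change as acceptable only if the targeted diagnostic improves and a list of other diagnostics gets no worse, and uses made-up diagnostic names and numbers. In practice the decision involves many runs, many diagnostics, and a good deal of scientific judgement.

```python
import math


def rmse(run, obs):
    """Root-mean-square error between a simulated series and observations."""
    return math.sqrt(sum((r - o) ** 2 for r, o in zip(run, obs)) / len(obs))


def evaluate_change(control, experiment, obs, target, other_checks=()):
    """Decide whether to keep a model change (toy acceptance rule).

    control, experiment: dicts mapping a diagnostic name to a simulated series.
    obs: dict mapping the same names to observed series.
    target: the diagnostic the hypothesis predicts will improve.
    other_checks: diagnostics that must not get worse
      (the "did it mess anything else up?" question).
    """
    improved = rmse(experiment[target], obs[target]) < rmse(control[target], obs[target])
    no_regressions = all(
        rmse(experiment[name], obs[name]) <= rmse(control[name], obs[name])
        for name in other_checks
    )
    return improved and no_regressions


# Hypothetical diagnostics and values, for illustration only.
obs = {"surface_temp": [14.0, 14.1, 14.3], "precip": [2.6, 2.7, 2.7]}
control = {"surface_temp": [13.6, 13.8, 14.0], "precip": [2.5, 2.6, 2.8]}
experiment = {"surface_temp": [13.9, 14.0, 14.2], "precip": [2.5, 2.6, 2.8]}

print(evaluate_change(control, experiment, obs, "surface_temp", ["precip"]))  # True
```

Here the experimental version halves the surface temperature error without changing precipitation, so the toy rule keeps it; swapping the roles of control and experiment would reject it.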

This experimental approach has another interesting effect: the software appears to be tested much more thoroughly than most commercial software. Whether this actually delivers higher quality code is an interesting question; however, it is clear that the approach is much more thorough than most industry practices for software regression testing.

I’m delighted to announce that my student Jonathan Lung has started a blog. Jonathan’s PhD is on how we can reduce energy consumption in computing. Unlike much work on green IT, he’s decided to focus on the human behavioural aspects of this, rather than hardware optimization. His first two posts are fascinating:

  • How to calculate if you should print something out or read it on the screen. Since he first did these calculations, we’ve been discussing how you turn this kind of analysis into an open, shared, visual representation, that others can poke and prod, to test the assumptions, customize them to their own context, and discuss. We’ll share more of our design ideas for such a tool in due course.
  • An analysis of whether the iPad is as green as Apple’s marketing claims. Which is, in effect, a special case of the more general calculation of print vs. screen. Oh, and his analysis also makes me feel okay about my desire to own an iPad…

As Jorge points out, this almost completes my set of grad student bloggers. We’ve been experimenting with blogging as a way of structuring research – a kind of open notebook science. Personally, I find it extremely helpful as a way of forcing me to write down ideas (rather than just thinking them), and for furthering discussion of ideas through the comments. And, just as importantly, it’s a way of letting other researchers know about what you’re working on – grad students’ future careers depend on them making a name for themselves in their chosen research community.

Of course, there’s a downside: grad students tend to worry about being “scooped”, by having someone else take their ideas, do the studies, and publish them first. My stock response is something along the lines of “research is 99% perspiration and 1% inspiration” – the ideas themselves, while important, are only a tiny part of doing research. It’s the investigation of the background literature and the implementation (designing an empirical study, building a tool, developing a new theory, etc.) that matters. Give the same idea to a bunch of different grad students, and they will all do very different things with it, all of which (if the students are any good) ought to be publishable.

On balance, I think the benefits of blogging your way through grad school vastly outweigh the risks. Now if only my students updated their blogs more regularly… (hint, hint).

Interesting article by Andrew Jones entitled “Are we taking supercomputing code seriously?”:

Part of the problem is that in their rush to do science, scientists fail to spot the software for what it is: the analogue of the experimental instrument. Consequently, it needs to be treated with the same respect that a physical experiment would receive.

Any reputable physical experiment would ensure the instruments are appropriate to the job and have been tested. They would be checked for known error behaviour in the parameter regions of study, and chosen for their ability to give a satisfactory result within a useful timeframe and budget. Those same principles should apply to a software model.

In a blog post that was picked up by the Huffington Post, Bill Gates writes about why we need innovation, not insulation. He sets up the piece as a choice of emphasis between two emissions targets: 30% reduction by 2025, and 80% reduction by 2050. He argues that the latter target is much more important, and hence we should focus on big R&D efforts to innovate our way to zero-carbon energy sources for transportation and power generation. In doing so, he pours scorn on energy conservation efforts, arguing, in effect, that they are a waste of time. Which means Bill Gates didn’t do his homework.

What matters is not some arbitrary target for any given year. What matters is the path we choose to get there. This is a prime example of the communications failure over climate change. Non-scientists don’t bother to learn the basic principles of climate science, and scientists completely fail to get the most important ideas across in a way that helps people make good judgements about strategy.

The key problem in climate change is not the actual emissions in any given year. It’s the cumulative emissions over time. The carbon we emit by burning fossil fuels doesn’t magically disappear. About half is absorbed by the oceans (making them more acidic). The rest cycles back and forth between the atmosphere and the biosphere, for centuries. And there is also tremendous lag in the system. The ocean warms up very slowly, so it takes decades for the Earth to reach a new equilibrium temperature once concentrations in the atmosphere stabilize. This means even if we could immediately stop adding CO2 to the atmosphere today, the earth would keep warming for decades, and wouldn’t cool off again for centuries. It’s going to be tough adapting to the warming we’re already committed to. For every additional year that we fail to get emissions under control we compound the problem.

What does this mean for targets? It means that how soon we start reducing emissions matters much more than where we end up in any particular future year. Because any reduction in annual emissions achieved in the next few years means that we save that amount of emissions every year going forward. The longer we take to get the emissions under control, the harder we make the problem.
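Here is a toy Python calculation of why starting early matters so much. The numbers are entirely hypothetical (flat emissions of 10 units a year, cut by a further 0.25 units each year once reductions begin, summed from 2010 to 2050), and the function name is made up for illustration; the point is only that the same rate of reduction, started ten years later, leaves a much larger cumulative total.

```python
def cumulative_emissions(start_cut, rate, base=10.0, first_year=2010, last_year=2050):
    """Total emissions from first_year to last_year inclusive (toy model).

    Annual emissions stay flat at `base` until `start_cut`, then fall by
    `rate` per year, never dropping below zero. Units are hypothetical.
    """
    total = 0.0
    for year in range(first_year, last_year + 1):
        cut = rate * max(0, year - start_cut)
        total += max(0.0, base - cut)
    return total


early = cumulative_emissions(start_cut=2010, rate=0.25)
late = cumulative_emissions(start_cut=2020, rate=0.25)
print(early, late)
```

With these made-up numbers, starting in 2010 gives a cumulative total of 205 units, while waiting until 2020 gives 293.75, over 40% more, even though the annual rate of reduction is identical.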

A picture might help:


Three different emissions pathways to give a 67% chance of limiting global warming to 2°C (From the Copenhagen Diagnosis, Figure 22)

The graph shows three different scenarios, each with the same cumulative emissions (i.e. the area under each curve is the same). If we get emissions to peak next year (the green line), it’s a lot easier to keep cumulative emissions under control. If we delay, and allow emissions to continue to rise until 2020, then we can forget about 80% reductions by 2050. We’ll have set ourselves the much tougher task of 100% emissions reductions by 2040!
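The same idea can be run in reverse: fix the cumulative budget (the area under the curve) and ask how quickly emissions must fall after the peak. The Python sketch below uses an invented budget and an invented pathway shape (a linear rise from 10 units a year in 2010, growing by 0.2 a year, followed by a straight-line decline to zero). It is not a reconstruction of the Copenhagen Diagnosis figure, only an illustration of why a later peak forces a steeper decline.

```python
def zero_emissions_year(budget, peak_year, start_year=2010, start_level=10.0, growth=0.2):
    """Year by which emissions must reach zero to stay within `budget`.

    Toy pathway in continuous time, hypothetical numbers: emissions rise
    linearly by `growth` per year from `start_level` in `start_year` until
    `peak_year`, then fall linearly to zero. The budget is the area under
    the curve. Returns None if the budget is exhausted before the peak.
    """
    peak_level = start_level + growth * (peak_year - start_year)
    # Area of the rising trapezoid up to the peak.
    used_before_peak = (peak_year - start_year) * (start_level + peak_level) / 2
    remaining = budget - used_before_peak
    if remaining < 0:
        return None
    # The falling part is a triangle: area = peak_level * decline_years / 2.
    decline_years = 2 * remaining / peak_level
    return peak_year + decline_years


for peak in (2011, 2020, 2025):
    year = zero_emissions_year(budget=300.0, peak_year=peak)
    print(f"peak in {peak}: zero emissions needed by about {year:.0f}")
```

In this toy setup, a later peak pulls the zero-emissions date earlier and shortens the decline, and a late enough peak (2040 here) exhausts the budget before emissions even start to fall.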

The thing is, there are plenty of good analyses of how to achieve early emissions reductions by deploying existing technology. Anyone who argues we should put our hopes in some grand future R&D effort to invent new technologies clearly does not understand the climate science. Or perhaps can’t do calculus.

Here’s the abstract for a paper (that I haven’t written) on how to write an abstract:

How to Write an Abstract

The first sentence of an abstract should clearly introduce the topic of the paper so that readers can relate it to other work they are familiar with. However, an analysis of abstracts across a range of fields shows that few follow this advice, nor do they take the opportunity to summarize previous work in their second sentence. A central issue is the lack of structure in standard advice on abstract writing, so most authors don’t realize the third sentence should point out the deficiencies of this existing research. To solve this problem, we describe a technique that structures the entire abstract around a set of six sentences, each of which has a specific role, so that by the end of the first four sentences you have introduced the idea fully. This structure then allows you to use the fifth sentence to elaborate a little on the research, explain how it works, and talk about the various ways that you have applied it, for example to teach generations of new graduate students how to write clearly. This technique is helpful because it clarifies your thinking and leads to a final sentence that summarizes why your research matters.

[I’m giving my talk on how to write a thesis to our grad students soon. Can you tell?]

Update 16 Oct 2011: This page gets lots of hits from people googling for “how to write an abstract”. So I should offer a little more constructive help for anyone still puzzling over what the above really means. It comes from my standard advice for planning a PhD thesis (but probably works just as well for scientific papers, essays, etc.).

The key trick is to plan your argument in six sentences, and then use these to structure the entire thesis/paper/essay. The six sentences are:

  1. Introduction. In one sentence, what’s the topic? Phrase it in a way that your reader will understand. If you’re writing a PhD thesis, your readers are the examiners – assume they are familiar with the general field of research, so you need to tell them specifically what topic your thesis addresses. The same advice works for scientific papers – the readers are the peer reviewers, and eventually others in your field interested in your research, so again they know the background work, but want to know specifically what topic your paper covers.
  2. State the problem you tackle. What’s the key research question? Again, in one sentence. (Note: For a more general essay, I’d adjust this slightly to state the central question that you want to address) Remember, your first sentence introduced the overall topic, so now you can build on that, and focus on one key question within that topic. If you can’t summarize your thesis/paper/essay in one key question, then you don’t yet understand what you’re trying to write about. Keep working at this step until you have a single, concise (and understandable) question.
  3. Summarize (in one sentence) why nobody else has adequately answered the research question yet. For a PhD thesis, you’ll have an entire chapter, covering what’s been done previously in the literature. Here you have to boil that down to one sentence. But remember, the trick is not to try and cover all the various ways in which people have tried and failed; the trick is to explain that there’s this one particular approach that nobody else has tried yet (hint: it’s the thing that your research does). But here you’re phrasing it in such a way that it’s clear it’s a gap in the literature. So use a phrase such as “previous work has failed to address…”. (If you’re writing a more general essay, you still need to summarize the source material you’re drawing on, so you can pull the same trick – explain in a few words what the general message in the source material is, but expressed in terms of what’s missing.)
  4. Explain, in one sentence, how you tackled the research question. What’s your big new idea? (Again for a more general essay, you might want to adapt this slightly: what’s the new perspective you have adopted? or: What’s your overall view on the question you introduced in step 2?)
  5. In one sentence, how did you go about doing the research that follows from your big idea? Did you run experiments? Build a piece of software? Carry out case studies? This is likely to be the longest sentence, especially if it’s a PhD thesis – after all you’re probably covering several years’ worth of research. But don’t overdo it – we’re still looking for a sentence that you could read aloud without having to stop for breath. Remember, the word ‘abstract’ means a summary of the main ideas with most of the detail left out. So feel free to omit detail! (For those of you who got this far and are still insisting on writing an essay rather than signing up for a PhD, this sentence is really an elaboration of sentence 4 – explore the consequences of your new perspective.)
  6. As a single sentence, what’s the key impact of your research? Here we’re not looking for the outcome of an experiment. We’re looking for a summary of the implications. What does it all mean? Why should other people care? What can they do with your research? (Essay folks: all the same questions apply: what conclusions did you draw, and why would anyone care about them?)

The abstract I started with summarizes my approach to abstract writing as an abstract. But I suspect I might have been trying to be too clever. So here’s a simpler one:

(1) In widgetology, it’s long been understood that you have to glomp the widgets before you can squiffle them. (2) But there is still no known general method to determine when they’ve been sufficiently glomped. (3) The literature describes several specialist techniques that measure how wizzled or how whomped the widgets have become during glomping, but all of these involve slowing down the glomping, and thus risking a fracturing of the widgets. (4) In this thesis, we introduce a new glomping technique, which we call googa-glomping, that allows direct measurement of whifflization, a superior metric for assessing squiffle-readiness. (5) We describe a series of experiments on each of the five major types of widget, and show that in each case, googa-glomping runs faster than competing techniques, and produces glomped widgets that are perfect for squiffling. (6) We expect this new approach to dramatically reduce the cost of squiffled widgets without any loss of quality, and hence make mass production viable.

When I was visiting MPI-M earlier this month, I blogged about the difficulty of documenting climate models. The problem is particularly pertinent to questions of model validity and reproducibility, because the code itself embodies a series of methodological choices made by the climate scientists; these choices become entrenched in the design of the code, and eventually inscrutable. And when the code gets old, we lose access to these decisions. I suggested we need a kind of literate programming, which sprinkles the code among the relevant human representations (typically bits of physics, formulas, numerical algorithms, published papers), so that the emphasis is on explaining what the code does, rather than preparing it for a compiler to digest.

The problem with literate programming (at least in the way it was conceived) is that it requires programmers to give up using the program code as their organising principle, and maybe to give up traditional programming languages altogether. But there’s a much simpler way to achieve the same effect. It’s to provide an organising structure for existing programming languages and tools, but which mixes in non-code objects in an intuitive way. Imagine you had an infinitely large sheet of paper, and could zoom in and out, and scroll in any direction. Your chunks of code are laid out on the paper, in a spatial arrangement that means something to you, such that the layout helps you navigate. Bits of documentation, published papers, design notes, data files, parameterization schemes, etc. can be placed on the sheet, near to the code that they are relevant to. When you zoom in on a chunk of code, the sheet becomes a code editor; when you zoom in on a set of math formulae, it becomes a LaTeX editor, and when you zoom in on a document it becomes a word processor.

Well, Code Canvas, a tool under development in Rob DeLine’s group at Microsoft Research, does most of this already. The code is laid out as though it was one big UML diagram, but as you zoom in you move fluidly into a code editor. The whole thing appeals to me because I’m a spatial thinker. Traditional IDEs drive me crazy, because they separate the navigation views from the code, and force me to jump from one pane to another to navigate. In the process, they hide the inherent structure of a large code base, and constrain me to see only a small chunk at a time. Which means these tools create an artificial separation between higher level views (e.g. UML diagrams) and the code itself, sidelining the diagrammatic representations. I really like the idea of moving seamlessly back and forth between the big picture views and actual chunks of code.

Code Canvas is still an early prototype, and doesn’t yet have the ability to mix in other forms of documentation (e.g. LaTeX) on the sheet (or at least not in any demo Microsoft are willing to show off), but the potential is there. I’d like to explore how we take an idea like this and customize it for scientific code development, where there is less of a strict separation of code and data than in other forms of programming, and where the link to published papers and draft reports is important. The infinitely zoomable paper could provide an intuitive unifying tool to bring all these different types of object together in one place, to be managed as a set. And the use of spatial memory to help navigate will be valuable when the set of things gets big.

I’m also interested in exploring the idea of using this metaphor for activities that don’t involve coding – for example complex decision-support for sustainability, where you need to move between spreadsheets, graphs & charts, model runs, and so on. I would lay out the basic decision task as a graph on the sheet, with sources of evidence connecting into the decision steps where they are needed. The sources of evidence could be text, graphs, spreadsheet models, live datafeeds, etc. And as you zoom in over each type of object, the sheet turns into the appropriate editor. As you zoom out, you get to see how the sources of evidence contribute to the decision-making task. Hmmm. Need a name for this idea. How about DecisionCanvas?

Update: Greg also pointed me to CodeBubbles and Intentional Software

Many moons ago, I talked about the danger of being distracted by our carbon footprints. I argued that the climate crisis cannot be solved by voluntary action by the (few) people who understand what we’re facing. The problem is systemic, and so adequate responses must be systemic too.

In the years since 9/11, it’s gotten steadily more frustrating to fly, as the lines build up at the security checkpoints, and we have to put more and more of what we’re wearing through the scanners. This doesn’t dissuade people from flying, but it does make them much more grumpy about it. And it doesn’t make them any safer, either. Bruce Schneier calls it “Security Theatre”: countermeasures that make it look like something is being done at the airport, but which make no difference to actual security. Bruce runs a regular competition to think up a movie plot that will create a new type of fear and hence enable the marketing of a new type of security theatre countermeasure.

Now Jon Udell joins the dots and points out that we have an equivalent problem in environmentalism: Carbon Theatre. Except that he doesn’t quite push the concept far enough. In Jon’s version, carbon theatre is competitions and online quizzes and so on, in which we talk about how we’re going to reduce our carbon footprints more than the next guy, rather than actually getting on and doing things that make a difference.

I think carbon theatre is more insidious than that. It’s the very idea that an appropriate response to climate change is to make personal sacrifices. Like giving up flying. And driving. And running the air conditioner. And so on. The problem is, we approach these things like a dieter approaches the goal of losing weight. We make personal sacrifices that are simply not sustainable. For most people, dieting doesn’t work. It doesn’t work because, although the new diet might be healthier, it’s either less convenient or less enjoyable. Which means sooner or later, you fall off the wagon, because it’s simply not possible to maintain the effort and sacrifice indefinitely.

Carbon theatre means focussing on carbon footprint reduction without fixing the broader system that would make such changes sustainable. You can’t build a solution to climate change by asking people to give up the conveniences of modern life. Oh, sure, you can get people to set personal goals, and maybe even achieve them (temporarily). But if it requires a continual effort to sustain, you haven’t achieved anything. If it involves giving up things that you enjoy, and that others around you continue to enjoy, then it’s not a sustainable change.

I’ve struggled for many years to justify the fact that I fly a lot. A few long-haul flights in a year adds enough to my carbon footprint that just about anything else I do around the house is irrelevant. Apparently a lot of scientists worry about this too. When I blogged about the AGU meeting, the first comment worried about the collective carbon footprint of all those scientists flying to the meeting. George Marshall worries that this undermines the credibility of climate scientists (or maybe he’s even arguing that it means climate scientists still don’t really believe their own results). Somehow all these people seem to think it’s more important for climate scientists to give up flying than it is for, say, investment bankers or oil company executives. Surely that’s completely backwards??

This is, of course, the wrong way to think about the problem. If climate scientists unilaterally give up flying, it will make no discernible difference to the global emissions of the airline industry. And it will make the scientists a lot less effective, because it’s almost impossible to do good science without the networking and exchange of ideas that goes on at scientific conferences. And even if we advocate that everyone who really understands the magnitude of the climate crisis also gives up flying, it still doesn’t add up to a useful solution. We end up giving the impression that if you believe that climate change is a serious problem you have to make big personal sacrifices. Which makes it just that much harder for many people to accept that we do have a problem.

For example, I’ve tried giving up short haul flights in favour of taking the train. But often the train is more expensive and more hassle. If there is no direct train service to my destination, it’s difficult to plan a route, buy tickets, and the trains are never timed to connect in the right way. By making the switch, I’m inconveniencing myself, for no tangible outcome. I’d be far more effective getting together with others who understand the problem, and fixing the train system to make it cheaper and easier. Or helping existing political groups who are working towards this goal. If we make the train cheaper and easier than flying, it will be easy to persuade large numbers of people to switch as well.

So, am I arguing that working on our carbon footprints is a waste of time? Well, yes and no. It’s a waste of time if you’re doing it by giving up stuff that you’d rather not give up. However, it is worth it if you find a way to do it that could be copied by millions of other people with very little effort. In other words, if it’s not (massively) repeatable and sustainable, it’s probably a waste of time. We need changes that scale up, and we need to change the economic and policy frameworks to support such changes. That won’t happen if the people who understand what needs doing focus inwards on their own personal footprints. We have to think in terms of whole systems.

There is a caveat: sacrifices such as temporarily giving up flying are worthwhile if done as a way of understanding the role of flying in our lives, and the choices we make about travel; they might also be worthwhile if done as part of a coordinated political campaign to draw attention to a problem. But as a personal contribution to carbon reduction? That’s just carbon theatre.

Weather and climate are different. Weather varies tremendously from day to day, week to week, season to season. Climate, on the other hand, is average weather over a period of years; it can be thought of as the boundary conditions on the variability of weather. We might get an extreme cold snap, or a heatwave at a particular location, but our knowledge of the local climate tells us that these things are unusual, temporary phenomena, and sooner or later things will return to normal. Forecasting the weather is therefore very different from forecasting changes in the climate. One is an initial value problem, and the other is a boundary value problem. Let me explain.

Good weather forecasts depend upon accurate knowledge of the current state of the weather system. You gather as much data as you can about current temperatures, winds, clouds, etc., feed them all into a simulation model and then run it forward to see what happens. This is hard because the weather is an incredibly complex system. The amount of information needed is huge, and both the data and the models are incomplete and error-prone. Despite this, weather forecasting has come a long way over the past few decades. Through a daily process of generating forecasts, comparing them with what happened, and thinking about how to reduce errors, we now have incredibly accurate 1- and 3-day temperature forecasts. Accurate forecasts of rain, snow, and so on for a specific location are a little harder because of the chance that the rainfall will be in a slightly different place (e.g. a few kilometers away) or at a slightly different time than the model forecasts, even if the overall amount of precipitation is right. Hence, daily forecasts give fairly precise temperatures, but put probabilistic values on things like rain (Probability of Precipitation, PoP), based on knowledge of the uncertainty factors in the forecast. The probabilities are known because we have a huge body of previous forecasts to compare with.
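One way to see what “the probabilities are known” means is a simple reliability check: group past forecasts by their stated PoP and compare with how often it actually rained. A minimal sketch, assuming a tiny invented record of twenty days; real verification uses far larger samples and binned probabilities, and the function name `reliability` is illustrative.

```python
from collections import defaultdict


def reliability(forecasts, outcomes):
    """Group past PoP forecasts by stated probability and return the
    observed frequency of rain for each stated value."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    for pop, rained in zip(forecasts, outcomes):
        counts[pop] += 1
        hits[pop] += int(rained)
    return {pop: hits[pop] / counts[pop] for pop in sorted(counts)}


# Hypothetical record: ten past days forecast at 30% PoP, ten at 70%.
pop = [0.3] * 10 + [0.7] * 10
rain = [True] * 3 + [False] * 7 + [True] * 7 + [False] * 3
print(reliability(pop, rain))
```

For a well-calibrated forecaster, each stated probability matches the observed frequency, which is the case for this invented record; a systematic mismatch is exactly what forecast centres look for when reviewing their accuracy.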

The limit on useful weather forecasts seems to be about one week. There are inaccuracies and missing information in the inputs, and the models are only approximations of the real physical processes. Hence, the whole process is error prone. At first these errors tend to be localized, which means the forecast for the short term (a few days) might be wrong in places, but is good enough in most of the region we’re interested in to be useful. But the longer we run the simulation for, the more these errors multiply, until they dominate the computation. At this point, running the simulation for longer is useless. 1-day forecasts are much more accurate than 3-day forecasts, which are better than 5-day forecasts, and beyond that it’s not much better than guessing. However, steady improvements mean that 3-day forecasts are now as accurate as 2-day forecasts were a decade ago. Weather forecasting centres are very serious about reviewing the accuracy of their forecasts, and set themselves annual targets for accuracy improvements.

A number of things help in this process of steadily improving forecasting accuracy. Improvements to the models help, as we get better and better at simulating physical processes in the atmosphere and oceans. Advances in high performance computing help too – faster supercomputers mean we can run the models at a higher resolution, which means we get more detail about where exactly energy (heat) and mass (winds, waves) are moving. But all of these improvements are dwarfed by the improvements we get from better data gathering. If we had more accurate data on current conditions, and could get it into the models faster, we could get big improvements in the forecast quality. In other words, weather forecasting is an “initial value” problem. The biggest uncertainty is knowledge of the initial conditions.
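The “initial value” behaviour can be illustrated with a twin experiment on the Lorenz (1963) system, a three-variable chaotic toy used here as a stand-in for a weather model (it is not how any forecasting centre actually works). Two runs start from states that differ by a hypothetical observation error of 1e-8; the classical parameter values, the time step, and the starting state are assumptions chosen for illustration.

```python
import math


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz (1963) system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt, **params):
    """Advance one step with the classical fourth-order Runge-Kutta method."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = lorenz(state, **params)
    k2 = lorenz(shifted(k1, dt / 2), **params)
    k3 = lorenz(shifted(k2, dt / 2), **params)
    k4 = lorenz(shifted(k3, dt), **params)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def twin_experiment(perturbation=1e-8, dt=0.01, steps=4000):
    """Run a 'truth' and a 'forecast' whose initial states differ by a tiny
    error, and record their separation after every step."""
    truth = (1.0, 1.0, 20.0)
    forecast = (1.0 + perturbation, 1.0, 20.0)
    errors = []
    for _ in range(steps):
        truth = rk4_step(truth, dt)
        forecast = rk4_step(forecast, dt)
        errors.append(math.dist(truth, forecast))
    return errors


errors = twin_experiment()
for t in (1, 10, 20, 30, 40):  # model time units, not days
    print(f"t = {t:>2}: forecast error = {errors[t * 100 - 1]:.1e}")
```

The error grows roughly exponentially until it is as large as the whole attractor, after which the “forecast” is no better than a random state. Shrinking the initial error delays that moment but cannot remove it, which is why better initial data is the main lever for weather forecasting.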

One result of this is that weather forecasting centres (like the UK Met Office) can get an instant boost to forecasting accuracy whenever they upgrade to a faster supercomputer. This is because the weather forecast needs to be delivered to a customer (e.g. a newspaper or TV station) by a fixed deadline. If the models can be made to run faster, the start of the run can be delayed, giving the meteorologists more time to collect newer data on current conditions, and more time to process this data to correct for errors, and so on. For this reason, the national weather forecasting services around the world operate many of the world’s fastest supercomputers.

Hence weather forecasters are strongly biased towards data collection as the most important problem to tackle. They tend to regard computer models as useful, but of secondary importance to data gathering. Of course, I’m generalizing – developing the models is also a part of meteorology, and some meteorologists devote themselves to modeling, coming up with new numerical algorithms, faster implementations, and better ways of capturing the physics. It’s quite a specialized subfield.

Climate science has the opposite problem. Using pretty much the same model as for numerical weather prediction, climate scientists will run the model for years, decades or even centuries of simulation time. After the first few days of simulation, the similarity to any actual weather conditions disappears. But over the long term, day-to-day and season-to-season variability in the weather is constrained by the overall climate. We sometimes describe climate as “average weather over a long period”, but in reality it is the other way round – the climate constrains what kinds of weather we get.

For understanding climate, we no longer need to worry about the initial values; we have to worry about the boundary values. These are the conditions that constrain the climate over the long term: the amount of energy received from the sun, the amount of energy radiated back into space from the earth, the amount of energy absorbed or emitted from oceans and land surfaces, and so on. If we get these boundary conditions right, we can simulate the earth’s climate for centuries, no matter what the initial conditions are. The weather itself is a chaotic system, but it operates within boundaries that keep the long term averages stable. Of course, a particularly weird choice of initial conditions will make the model behave strangely for a while, at the start of a simulation. But if the boundary conditions are right, eventually the simulation will settle down into a stable climate. (This effect is well known in chaos theory: the butterfly effect expresses the idea that the system is very sensitive to initial conditions, and attractors are what cause a chaotic system to exhibit a stable pattern over the long term.)
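The same Lorenz (1963) toy can illustrate the boundary value side. A minimal sketch, assuming that the long-run mean of the variable z stands in for “climate” and that the parameter rho plays a role loosely analogous to a boundary condition; this is an analogy, not a climate model. Two runs at the same rho from very different starting states should give nearly the same long-run mean, while changing rho shifts it.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the Lorenz (1963) system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt, **params):
    """Advance one step with the classical fourth-order Runge-Kutta method."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = lorenz(state, **params)
    k2 = lorenz(shifted(k1, dt / 2), **params)
    k3 = lorenz(shifted(k2, dt / 2), **params)
    k4 = lorenz(shifted(k3, dt), **params)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def long_run_mean_z(initial, rho=28.0, dt=0.01, spin_up=1000, steps=50_000):
    """Discard a spin-up period, then average z over a long run."""
    state = initial
    for _ in range(spin_up):
        state = rk4_step(state, dt, rho=rho)
    total = 0.0
    for _ in range(steps):
        state = rk4_step(state, dt, rho=rho)
        total += state[2]
    return total / steps


run_a = long_run_mean_z((1.0, 1.0, 1.0))
run_b = long_run_mean_z((-5.0, 7.0, 30.0))
forced = long_run_mean_z((1.0, 1.0, 1.0), rho=40.0)
print(f"mean z, rho=28, start A: {run_a:.2f}")
print(f"mean z, rho=28, start B: {run_b:.2f}")
print(f"mean z, rho=40, start A: {forced:.2f}")
```

The individual trajectories in runs A and B bear no resemblance to each other after a short time, yet their long-run statistics agree closely; only the change to rho moves the average.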

To handle this potential for initial instability, climate modellers create “spin-up” runs: pick some starting state, run the model for say 30 years of simulation, until it has settled down to a stable climate, and then use the state at the end of the spin-up run as the starting point for science experiments. In other words, the starting state for a climate model doesn’t have to match real weather conditions at all; it just has to be a plausible state within the bounds of the particular climate conditions we’re simulating.
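A spin-up can be sketched with a zero-dimensional energy balance model, which is a deliberate swap for a real climate model (it has a single global temperature and no weather at all). The parameter values are assumptions: approximate solar constant and albedo, an effective emissivity tuned so the model settles near 288 K, and a heat capacity roughly equal to a 100 m ocean mixed layer. The starting temperatures are deliberately “wrong”.

```python
SIGMA = 5.670374419e-8     # Stefan-Boltzmann constant, W m^-2 K^-4
SOLAR = 1361.0             # solar constant, W m^-2 (approximate)
ALBEDO = 0.30              # planetary albedo (approximate)
EMISSIVITY = 0.61          # effective emissivity; assumed, tuned to settle near 288 K
HEAT_CAPACITY = 4.2e8      # J m^-2 K^-1, roughly a 100 m ocean mixed layer (assumed)
SECONDS_PER_YEAR = 3.156e7


def step(temp):
    """Advance the global mean surface temperature (K) by one year."""
    absorbed = SOLAR * (1.0 - ALBEDO) / 4.0   # incoming sunlight, averaged over the sphere
    emitted = EMISSIVITY * SIGMA * temp ** 4  # outgoing longwave radiation
    imbalance = absorbed - emitted            # W m^-2
    return temp + SECONDS_PER_YEAR * imbalance / HEAT_CAPACITY


def spin_up(temp, years=30):
    """Run from an arbitrary starting state and return the settled state."""
    for _ in range(years):
        temp = step(temp)
    return temp


from_cold = spin_up(250.0)
from_hot = spin_up(320.0)
print(f"spun up from 250 K: {from_cold:.2f} K")
print(f"spun up from 320 K: {from_hot:.2f} K")
print(f"drift in the next year: {step(from_cold) - from_cold:+.4f} K")
```

Both runs end up at essentially the same state, and the remaining year-to-year drift is tiny, so either end state would serve as the starting point for an experiment. A real spin-up is much longer and more subtle, because the deep ocean adjusts far more slowly than this single mixed layer.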

To explore the role of these boundary values on climate, we need to know whether a particular combination of boundary conditions keeps the climate stable, or tends to change it. Conditions that tend to change it are known as forcings. But the impact of these forcings can be complicated to assess because of feedbacks. Feedbacks are responses to the forcings that then tend to amplify or diminish the change. For example, increasing the input of solar energy to the earth would be a forcing. If this then led to more evaporation from the oceans, causing increased cloud cover, this could be a feedback, because clouds have a number of effects: they reflect more sunlight back into space (because they are whiter than the land and ocean surfaces they cover) and they trap more of the surface heat (because water vapour is a strong greenhouse gas). The first of these is a negative feedback (it reduces the surface warming from increased solar input) and the second is a positive feedback (it increases the surface warming by trapping heat). To determine the overall effect, we need to set the boundary conditions to match what we know from observational data (e.g. from detailed measurements of solar input, measurements of greenhouse gases, etc). Then we run the model and see what happens.
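The way positive and negative feedbacks combine can be sketched with the standard linear feedback approximation, in which the equilibrium warming is the forcing divided by the net restoring response. This is a simplification rather than what a climate model computes. The forcing of 3.7 W m⁻² is roughly that from doubling CO2, the Planck response of 3.2 W m⁻² K⁻¹ is an approximate textbook value, and the two cloud feedback values are purely hypothetical.

```python
def equilibrium_warming(forcing, planck=3.2, feedbacks=()):
    """Equilibrium surface warming (K) for a sustained forcing (W m^-2).

    planck: restoring response of a warmer surface radiating more energy
    to space (W m^-2 K^-1). feedbacks: extra responses in W m^-2 K^-1;
    positive values amplify the warming, negative values damp it.
    """
    net = planck - sum(feedbacks)
    if net <= 0:
        raise ValueError("runaway: feedbacks outweigh the Planck response")
    return forcing / net


FORCING = 3.7               # W m^-2, roughly a doubling of CO2
CLOUD_TRAPPING = +0.8       # hypothetical positive feedback (heat trapping)
CLOUD_REFLECTION = -0.5     # hypothetical negative feedback (reflected sunlight)

print(f"no feedbacks:      {equilibrium_warming(FORCING):.2f} K")
print(f"trapping only:     {equilibrium_warming(FORCING, feedbacks=[CLOUD_TRAPPING]):.2f} K")
print(f"reflection only:   {equilibrium_warming(FORCING, feedbacks=[CLOUD_REFLECTION]):.2f} K")
print(f"both cloud effects: "
      f"{equilibrium_warming(FORCING, feedbacks=[CLOUD_TRAPPING, CLOUD_REFLECTION]):.2f} K")
```

The net effect of the clouds depends on which of the two responses is larger, which is why the sign and size of each feedback has to come from the physics in the model and from observations rather than from intuition.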

Observational data is again important, but this time for making sure we get the boundary values right, rather than the initial values. Which means we need different kinds of data too – in particular, longer term trends rather than instantaneous snapshots. But this time, errors in the data are dwarfed by errors in the model. If the algorithms are off even by a tiny amount, the simulation will drift over a long climate run, and it stops resembling the earth’s actual climate. For example, a tiny error in calculating where the mass of air leaving one grid square goes could mean we lose a tiny bit of mass on each time step. For a weather forecast, the error is so small we can ignore it. But over a century long climate run, we might end up with no atmosphere left! So a basic test for climate models is that they conserve mass and energy over each timestep.
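The conservation test can be sketched with a toy one-dimensional transport scheme on a periodic ring of grid cells, a stand-in for the much richer dynamics of a real model. The flux-form update moves mass between cells without creating or destroying any; the `leak` parameter simulates a hypothetical bug that loses a tiny fraction of every transfer. The grid, Courant number, 30-minute time step, and leak size are all illustrative assumptions.

```python
import math


def advect_step(mass, courant, leak=0.0):
    """One upwind, flux-form advection step on a periodic ring of cells.

    Each cell passes `courant` of its mass to its right-hand neighbour.
    `leak` is a hypothetical bug: that fraction of every transfer is lost.
    """
    outflow = [courant * m for m in mass]
    # outflow[i - 1] wraps to the last cell when i == 0 (periodic domain).
    return [m - outflow[i] + (1.0 - leak) * outflow[i - 1]
            for i, m in enumerate(mass)]


def run(mass, steps, courant=0.5, leak=0.0):
    for _ in range(steps):
        mass = advect_step(mass, courant, leak)
    return mass


def conserves_mass(before, after, rel_tol=1e-12):
    """The basic test: total mass is unchanged to within round-off."""
    return math.isclose(sum(before), sum(after), rel_tol=rel_tol)


STEPS_PER_DAY = 48   # hypothetical 30-minute time step
LEAK = 1e-6          # hypothetical per-transfer loss
COURANT = 0.5

initial = [1.0 + 0.5 * math.sin(2 * math.pi * i / 100) for i in range(100)]
correct = run(initial, 3 * STEPS_PER_DAY, COURANT)
buggy = run(initial, 3 * STEPS_PER_DAY, COURANT, LEAK)
print("correct scheme conserves mass:", conserves_mass(initial, correct))
print("buggy scheme conserves mass:  ", conserves_mass(initial, buggy))

# Each buggy step removes LEAK * COURANT of the total, so extrapolate.
per_step_keep = 1.0 - LEAK * COURANT
print(f"mass lost in a 3-day forecast: {1 - per_step_keep ** (3 * STEPS_PER_DAY):.1e}")
print(f"fraction left after a century: {per_step_keep ** (100 * 365 * STEPS_PER_DAY):.2f}")
```

With these assumed numbers the leak costs less than a hundredth of a percent over a 3-day forecast, but well over half of the mass over a century-long run, and a slightly larger leak would drain almost all of it. The conservation check catches the bug immediately, long before the drift would be visible in the output.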

Climate models have also improved in accuracy steadily over the last few decades. We can now use the known forcings over the last century to obtain a simulation that tracks the temperature record amazingly well. These simulations demonstrate the point nicely. They don’t correspond to any actual weather, but show patterns in both small and large scale weather systems that mimic what the planet’s weather systems actually do over the year (look at August – see the daily bursts of rainfall in the Amazon, the Gulf Stream sending rain to the UK all summer long, and the cyclones forming off the coast of Japan by the middle of the month). And these patterns aren’t programmed into the model – it is all driven by sets of equations derived from the basic physics. This isn’t a weather forecast, because on any given day, the actual weather won’t look anything like this. But it is an accurate simulation of typical weather over time (i.e. climate). And, as was the case with weather forecasts, some bits are better than others – for example the Indian monsoons tend to be less well-captured than the North Atlantic Oscillation.

At first sight, numerical weather prediction and climate models look very similar. They model the same phenomena (e.g. how energy moves around the planet via airflows in the atmosphere and currents in the ocean), using the same computational techniques (e.g., three dimensional models of fluid flow on a rotating sphere). And quite often they use the same program code. But the problems are completely different: one is an initial value problem, and one is a boundary value problem.

Which also partly explains why a small minority of (mostly older, mostly male) meteorologists end up being climate change denialists. They fail to understand the difference in the two problems, and think that climate scientists are misusing the models. They know that the initial value problem puts serious limits on our ability to predict the weather, and assume the same limit must prevent the models being used for studying climate. Their experience tells them that weaknesses in our ability to get detailed, accurate, and up-to-date data about current conditions is the limiting factor for weather forecasting, and they assume this limitation must be true of climate simulations too.

Ultimately, such people tend to suffer from “senior scientist” syndrome: a lifetime of immersion in their field gives them tremendous expertise in that field, which in turn causes them to over-estimate how well their expertise transfers to a related field. They can become so heavily invested in a particular scientific paradigm that they fail to understand that a different approach is needed for different problem types. This isn’t the same as the Dunning-Kruger effect, because the people I’m talking about aren’t incompetent. So perhaps we need a new name. I’m going to call it the Dyson-effect, after one of its worst sufferers.

I should clarify that I’m certainly not stating that meteorologists in general suffer from this problem (the vast majority quite clearly don’t), nor am I claiming this is the only reason why a meteorologist might be skeptical of climate research. Nor am I claiming that any specific meteorologists (or physicists such as Dyson) don’t understand the difference between initial value and boundary value problems. However, I do think that some scientists’ ideological beliefs tend to bias them to be dismissive of climate science because they don’t like the societal implications, and the Dyson-effect disinclines them to finding out what climate science actually does.

I am, however, arguing that if more people understood this distinction between the two types of problem, we could get past silly soundbites about “we can’t even forecast the weather…” and “climate models are garbage in garbage out”, and have a serious conversation about how climate science works.

Update: Zeke has a more detailed post on the role of parameterizations in climate models.