Here’s an appalling article by Andy Revkin on dotEarth which epitomizes everything that is wrong with media coverage of climate change. Far from using his position to educate and influence the public by seeking out the truth, journalists like Revkin now seem content to report whatever they read in blogs as though it were the truth, rather than investigating for themselves what scientists actually do.

Revkin kicks off by citing a Harvard cognitive scientist found guilty of academic misconduct, and connecting it with “assertions that climate research suffered far too much from group think, protective tribalism and willingness to spin findings to suit an environmental agenda”. Note the juxtaposition. On the one hand, a story of a lone scientist who turned out to be corrupt (which is rare, but does happen from time to time). On the other hand, a set of insinuations about thousands of climate scientists, with no evidence whatsoever. Groupthink? Tribalism? Spin? Can Revkin substantiate these allegations? Does he even try? Of course not. He just repeats a lot of gossip from a bunch of politically motivated blogs, and demonstrates his own total ignorance of how scientists work.

He does offer two pieces of evidence to back up his assertion of bias. The first is the well-publicized mistake in the IPCC report on the retreat of the Himalayan glaciers. Unfortunately, the quotes from the IPCC authors in the very article Revkin points to show it was the result of an honest mistake, despite an entire cadre of journalists and bloggers trying to spin it into some vast conspiracy theory. The second is about a paper on the connection between vanishing frogs and climate change, cited in the IPCC report. The IPCC report quite correctly cites the paper, and gives a one-sentence summary of it. Somehow or other, Revkin seems to think this is bias or spin. It must have entirely escaped his notice that the IPCC report is supposed to summarize the literature in order to assess our current understanding of the science. Some of that literature is tentative, and some less so. Now, maybe Revkin has evidence that there is absolutely no connection between the vanishing frogs and climate change. If so, he completely fails to mention it. Which means that the IPCC is merely reporting on the best information we have on the subject. Come on Andy, if you want to demonstrate a pattern of bias in the IPCC reports, you’re gonna have to work a lot harder than that. Oh, but I forgot. You’re just repeating a bunch of conspiracy theories to pretend you have something useful to say, rather than actually, say, investigating a story.

From here, Revkin weaves a picture of climate science as “done by very small tribes (sea ice folks, glacier folks, modelers, climate-ecologists, etc)”, and hence suggests they must therefore be guilty of groupthink and confirmation bias. Does he offer any evidence for this tribalism? No he does not, for there is none. He merely repeats the allegations of a bunch of people like Steve McIntyre who, working on the fringes of science, clearly do belong to a minor tribe, one that does not interact in any meaningful way with real climate scientists. So, I guess we’re meant to conclude that because McIntyre and a few others have formed a little insular tribe, mainstream climate scientists must be tribal too? Such reasoning would be laughable if this weren’t such a serious subject.

Revkin claims to have been “following the global warming saga – science and policy – for nearly a quarter century”. Unfortunately, in all that time, he doesn’t appear to have actually educated himself about how the science is done. If he’d spent any time in a climate science research institute, he’d know this allegation of tribalism is about as far from the truth as it’s possible to get. Oh, but of course, actually going and observing scientists in action would require some effort. That seems to be just a little too much to ask.

So, to educate Andy, and to save him the trouble of finding out for himself, let me explain. First, a little bit of history. The modern concern about the potential impacts of climate change probably dates back to the 1957 Revelle and Suess paper, in which they reported that the oceans absorb far less anthropogenic carbon than was previously thought. Revelle was trained in geology and oceanography. Suess was a nuclear physicist, who studied the distribution of carbon-14 in the atmosphere. Their collaboration was inspired by discussions with Libby, a physical chemist famous for the development of radio-carbon dating. As head of the Scripps Institution of Oceanography, Revelle brought together oceanographers with atmospheric physicists (including initiating the Mauna Loa measurements of carbon dioxide concentrations in the atmosphere), atomic physicists studying the dispersal of radioactive particles, and biologists studying the biological impacts of radiation. Tribalism? How about some truly remarkable inter-disciplinary research?

I suppose Revkin might argue that those were the old days, and maybe things have gone downhill since then. But again, the evidence says otherwise. In the 1970s, the idea of earth system science began to emerge, and in the last decade it has become central to the efforts to build climate simulation models that improve our understanding of the connections between the various earth subsystems: atmosphere, ocean, atmospheric chemistry, ocean biogeochemistry, biology, hydrology, glaciology and meteorology. If you visit any of the major climate research labs today, you’ll find a collection of scientists from many of these different disciplines working alongside one another, collaborating on the development of integrated models, and discussing the connections between the different earth subsystems. For example, when I visited the UK Met Office two years ago, I was struck by their use of cross-disciplinary teams to investigate specific problems in the simulation models; they had just formed one such team to investigate how to improve the simulation of the Indian monsoon in their earth system models. This week, I’m just wrapping up a month-long visit to the Max Planck Institute for Meteorology in Hamburg, where I’ve also regularly sat in on meetings between scientists from the various disciplines, sharing ideas about, for example, the relationships between atmospheric radiative transfer and ocean plankton models.

The folks in Hamburg have been kind enough to allow me to sit in on their summer school this week, in which they’re teaching the next generation of earth science PhD students how to work with earth system models. The students are from a wide variety of disciplines: some study glaciers, some clouds, some oceanography, some biology, and so on. The set of experiments we’ve been given to try out the model includes: changing the cloud top mass flux, altering the rate of decomposition in soils, changing the ocean mixing ratio, altering the ocean albedo, and changing the shape of the earth. Oh, and they’ve mixed up the students, so they have to work in pairs with people from another discipline. Tribalism? No, right from the get-go, PhD training includes the encouragement of cross-disciplinary thinking and cross-disciplinary working.

Of course, if Revkin ever did wander into a climate science research institute he would see this for himself. But no, he prefers pontificating from the comfort of his armchair, repeating nonsense allegations he reads on the internet. And this is the standard that journalists hold themselves to? No wonder the general public is confused about climate change. Instead of trying to pick holes in a science they clearly don’t understand, maybe people like Revkin ought to do some soul searching and investigate the gaping holes in journalistic coverage of climate change. Then finally we might find out where the real biases lie.

So, here’s a challenge for Andy Revkin: Do not write another word about climate science until you have spent one whole month as a visitor in a climate research institute. Attend the seminars, talk to the PhD students, sit in on meetings, find out what actually goes on in these places. If you can’t be bothered to do that, then please STFU [about this whole bias, groupthink and tribalism meme].

Update: On reflection, I think I was too generous to Revkin when I accused him of making stuff up, so I deleted that bit. He’s really just parroting other people who make stuff up.

Update #2: Oh, did I mention that I’m a computer scientist? I’ve been welcomed into various climate research labs, invited to sit in on meetings and observe their working practices, and to spend my time hanging out with all sorts of scientists from all sorts of disciplines. Because obviously they’re a bunch of tribalists who are trying to hide what they do. NOT.

Update #3: I’ve added a clarifying rider to my last paragraph  – I don’t mean to suggest Andy should shut up altogether, just specifically about these ridiculous memes about tribalism and so on.

Nearly everything we ever do depends on vast social and technical infrastructures, which, when they work, are largely invisible. Science is no exception – modern science is only possible because we have built the infrastructure to support it: classification systems, international standards, peer review, funding agencies, and, most importantly, systems for the collection and curation of vast quantities of data about the world. Star and Ruhleder point out that the infrastructure that supports scientific work is embedded inside other social and technical systems, and becomes invisible when we come to rely on it. Indeed, the process of learning how to make use of a particular infrastructure is, to a large extent, what defines membership in a particular community of practice. They also observe that our infrastructures are closely intertwined with our conventions and standards. As a simple example, they point to the QWERTY keyboard, which, despite its limitations, shapes much of our interaction with computers (even the design of office furniture!), such that learning to use the keyboard is a crucial part of learning to use a computer. And once you can type, you cease to be aware of the keyboard itself, except when it breaks down. This invisibility-in-use is similar to Heidegger’s notion of tools that are ready-to-hand; the key difference is that tools are local to the user, while infrastructures have vast spatial and/or temporal extent.

A crucial point is that what counts as infrastructure depends on the nature of the work that it supports. What is invisible infrastructure for one community might not be for another. The internet is a good example – most users just accept that it exists and make use of it, without asking how it works. However, to computer scientists, a detailed understanding of its inner workings is vital. A refusal to treat the internet as invisible infrastructure is a condition of entry into certain geek cultures.

In their book Sorting Things Out, Bowker and Star introduced the term infrastructural inversion for the process of focusing explicitly on the infrastructure itself, in order to expose and study its inner workings. It’s a rather cumbersome phrase for a very interesting process, kind of like a switch of figure and ground. In their case, infrastructural inversion is a research strategy that allows them to explore how things like classification systems and standards are embedded in so much of scientific practice, and to understand how these things evolve with the science itself.

Paul Edwards applies infrastructural inversion to climate science in his book A Vast Machine, where he examines the history of attempts by meteorologists to create a system for collecting global weather data, and for sharing that data with the international weather forecasting community. He points out that climate scientists also come to rely on that same infrastructure, but that it doesn’t serve their needs so well, and hence there is a difference between weather data and climate data. As an example, meteorologists tolerate changes in the nature and location of a particular surface temperature station over time, because they are only interested in forecasting over the short term (days or weeks). But to a climate scientist trying to study long-term trends in climate, such changes (known as inhomogeneities) are crucial. In this case, the infrastructure breaks down, as it fails to serve the needs of this particular community of scientists.

Hence, as Edwards points out, climate scientists also perform infrastructural inversion regularly themselves, as they dive into the details of the data collection system, trying to find and correct inhomogeneities. In the process, almost any aspect of how this vast infrastructure works might become important, revealing clues about which parts of the data can be used and which parts must be re-considered. One of the key messages in Paul’s book is that the usual distinction between data and models is now almost completely irrelevant in meteorology and climate science. The data collection depends on a vast array of models to turn raw instrumental readings into useful data, while the models themselves can be thought of as sophisticated data reconstructions. Even GCMs, which now have the ability to do data assimilation and re-analysis, can be thought of as large amounts of data made executable through a set of equations that define spatial and temporal relationships within that data.

As an example, Edwards describes the analysis performed by Christy and Spencer at UAH on the MSU satellite data, from which they extracted measurements of the temperature of the upper atmosphere. In various congressional hearings, Spencer and Christy frequently touted their work, which showed a slight cooling trend in the upper atmosphere, as superior to other work that showed a warming trend, because they were able to “actually measure the temperature of the free atmosphere” whereas other work was merely “estimation” from models (Edwards, p414). However, this completely neglects the fact that the MSU instruments don’t measure temperature in the lower troposphere directly at all: they measure radiance at the top of the atmosphere. Temperature readings for the lower troposphere are constructed from these readings via a complex set of models that take into account the chemical composition of the atmosphere, the trajectory of the satellite, and the position of the sun, among other factors. More importantly, a series of corrections to these models over several years gradually removed the apparent cooling trend, finally revealing a warming trend, as predicted by the theory (see Karl et al. for a more complete account). The key point is that the data needed for meteorology and climate science is so vast and so complex that it’s no longer possible to disentangle models from data. The data depends on models to make it useful, and the models are sophisticated tools for turning one kind of data into another.
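
To see what that means in practice, here’s a toy sketch (all numbers invented) of the basic idea: the instrument sees a brightness temperature that is, in effect, a weighted average over the temperature profile, with the weighting function supplied by a radiative transfer model. This is only an illustration of the principle, not the actual UAH retrieval.

```python
import numpy as np

# Toy illustration only: a satellite channel's brightness temperature is a
# weighted average over the atmospheric temperature profile, with the
# weighting function supplied by a radiative transfer model.
# All numbers below are invented for illustration.

pressure = np.linspace(1000, 100, 50)            # pressure levels (hPa)
temperature = 288 - 0.065 * (1000 - pressure)    # toy temperature profile (K)

# Toy weighting function peaking in the mid-troposphere, normalized to sum to 1
weights = np.exp(-((pressure - 500.0) / 200.0) ** 2)
weights /= weights.sum()

brightness_temp = np.sum(weights * temperature)
print(f"simulated brightness temperature: {brightness_temp:.1f} K")
```

Change the assumed composition of the atmosphere (and hence the weighting function), or the assumed satellite geometry, and the “measured” temperature changes too – which is exactly why the early MSU trend estimates kept being revised.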

The vast infrastructure for collecting and sharing data has become largely invisible to many working meteorologists, but it must be continually inverted by climate scientists in order to use it for analysis of longer-term trends. The project to develop a new global surface temperature record that I described yesterday is one example of such inversion – it will involve a painstaking process of search and rescue on original data records dating back more than a century, because of the need for a more complete, higher-resolution temperature record than is currently available.

So far, I’ve only described constructive uses of infrastructural inversion, performed in the pursuit of science, to improve our understanding of how things work, and to allow us to re-adapt an infrastructure for new purposes. But there’s another use of infrastructural inversion, applied as a rhetorical technique to undermine scientific research. It has been applied increasingly in recent years in an attempt to slow down progress on enacting climate change mitigation policies, by sowing doubt and confusion about the validity of our knowledge about climate change. The technique is to dig down into the vast infrastructure that supports climate science, identify weaknesses in this infrastructure, and tout them as reasons to mistrust scientists’ current understanding of the climate system. And it’s an easy game to play, for two reasons: (1) all infrastructures are constructed through a series of compromises (e.g. standards are never followed exactly), and communities of practice develop workarounds that naturally correct for infrastructural weaknesses; and (2) as described above, the data collection for weather forecasting frequently does fail to serve the needs of climate scientists. Climate scientists are painfully aware of these infrastructural weaknesses and have to deal with them every day, while those playing this rhetorical game ignore this, and pretend instead that there’s a vast conspiracy to lie about the science.

The problem is that, at first sight, many of these attempts at infrastructural inversion look like honest citizen-scientist attempts to increase transparency and improve the quality of the science (e.g. see Edwards, p421-427). For example, Anthony Watts’ SurfaceStations.org project is an attempt to document the site details of a large number of surface weather measuring stations, to understand how problems in their siting (e.g. growth of surrounding buildings) and placement of instruments might create biases in the long-term trends constructed from their data. At face value, this looks like a valuable citizen-science exercise in infrastructural inversion. However, Watts wraps the whole exercise in the rhetoric of conspiracy theory, frequently claiming that climate scientists are dishonest, that they are covering up these problems, and that climate change itself is a myth. This not only ignores the fact that climate scientists themselves routinely examine such weaknesses in the temperature record, but also has the effect of biasing the entire exercise, as Watts’ followers are increasingly motivated to report only those problems that would cause a warming bias, and ignore those that do not. Recent independent studies that have examined the data collected by the SurfaceStations.org project demonstrate that the siting problems it documents make essentially no difference to the long-term temperature trends.

The recent project launched by the UK Met Office might look to many people like a desperate response to “ClimateGate“, a mea culpa, an attempt to claw back some credibility. But, set against the continual infrastructural inversion that climate scientists have performed throughout the history of the field, it is nothing of the sort. It’s just one more in a long series of efforts to build better and more complete datasets to allow climate scientists to answer new research questions. This is what climate scientists do all the time. In this case, it is an attempt to move from monthly to daily temperature records, to improve our ability to understand the regional effects of climate change, and especially to address the growing need to understand the effect of climate change on extreme weather events (which are largely invisible in monthly averages).

So, infrastructural inversion is a fascinating process, used by at least three different groups:

  • Researchers who study scientific work (e.g. Star, Bowker, Edwards) use it to understand the interplay between the infrastructure and the scientific work that it supports;
  • Climate scientists use it all the time to analyze and improve the weather data collection systems that they need to understand longer term climate trends;
  • Climate change denialists use it to sow doubt and confusion about climate science, to further a political agenda of delaying regulation of carbon emissions.

And unfortunately, sorting out constructive uses of infrastructural inversion from its abuses is hard, because in all cases, it looks like legitimate questions are being asked.

Oh, and I can’t recommend Edwards’ book highly enough. As Myles Allen writes in his review: “A Vast Machine […] should be compulsory reading for anyone who now feels empowered to pontificate on how climate science should be done.”

I’ve been invited to a workshop at the UK Met Office in a few weeks’ time, to brainstorm a plan to create (and curate) a new global surface temperature data archive. Probably the best introduction to this is the article by Stott and Thorne in Nature, back in May.

There’s now a series of white papers, to set out some of the challenges, and to solicit input from a broad range of stakeholders prior to the workshop. The white papers are available at http://www.surfacetemperatures.org/ and there’s a moderated blog to collect comments, which is open until Sept 1st (yes, I know that’s real soon now – I’m a little slow blogging this).

I’ll blog some of my reflections on what I think is missing from the white papers over the next few days. For now, here’s a quick summary of the white papers and the issues they cover (yes, the numbering starts at 3 – don’t worry about it!)

Paper #3, on Retrieval of Historical Data is a good place to start, as it sets out the many challenges in reconstructing a fully traceable archive of the surface temperature data. It offers the following definitions of the data products:

  • Level 0: original raw instrumental readings, or digitized images of logs;
  • Level 1: data as originally keyed in, typically converted to some local (native) format;
  • Level 2: data converted to common format;
  • Level 3: data consolidated into a databank;
  • Level 4: quality-controlled derived product (e.g. corrected for station biases, etc.);
  • Level 5: homogenized derived product (e.g. regridded, interpolated, etc.).

The central problem is that most existing temperature records are level 3 data or above, and traceability to lower levels has not been maintained. The original records are patchy, and sometimes only higher-level products have been archived. Also, there are multiple ways of deriving higher-level products, in some cases because of improved techniques that supersede previous approaches, and in other cases because of multiple valid methodologies suited to different analysis purposes.
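
To make the traceability idea concrete, here’s a minimal sketch (all names and values hypothetical) of how each product in the databank might carry a pointer back to the lower-level product it was derived from, together with the method used to derive it:

```python
from dataclasses import dataclass
from typing import Optional

# A minimal sketch (all names and values hypothetical) of level-to-level
# traceability: each product records which lower-level product it came from
# and how it was derived.

@dataclass
class DataProduct:
    level: int                        # 0..5, as defined above
    station_id: str
    payload: list                     # values, or a pointer to scanned images
    derived_from: Optional["DataProduct"] = None
    method: Optional[str] = None      # how this level was produced

level0 = DataProduct(0, "HYPOTHETICAL-001", ["scan_1897_03.png"])
level1 = DataProduct(1, "HYPOTHETICAL-001", [41.2, 43.7, 39.9],
                     derived_from=level0, method="manual keying (degrees F)")
level2 = DataProduct(2, "HYPOTHETICAL-001",
                     [round((t - 32) * 5 / 9, 2) for t in level1.payload],
                     derived_from=level1, method="converted to degrees C")

# Walking the chain recovers the audit trail for any level-2 value
p = level2
while p is not None:
    print(f"level {p.level}: {p.method or 'original record'} -> {p.payload}")
    p = p.derived_from
```

The hard part, of course, is not the data structure but the fact that for most existing records the lower links in this chain were simply never kept.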

The effort to recover the original source data will be expensive, and hence will need some prioritization criteria. It will often be hard to tell whether peripheral information will turn out to be important, e.g. comments in ships’ log books may provide important context to explain anomalies in the data. The paper suggests prioritizing records that add substantially to the existing datasets – e.g. under-represented regions, especially for cases where it’s likely to be easy to get agreement from the bodies (e.g. national centres) that hold the data records.

Scoping decisions will be hard too. The focus is on surface air temperature records, but it might be cost-effective to include related data, such as all parameters from land stations, and, anticipating an interest in extremes, perhaps hydrological data too. And so on. Also, the original, paper-based records are important as historical documents, for purposes beyond meteorology. Hence, scanned images may be important, in addition to the digital data extraction.

Data records exist at various temporal resolutions (hourly, daily, monthly, seasonal, etc), but the availability of each type is variable. By retrieving the original records, it may be possible to backfill the records at these different resolutions, but this won’t necessarily produce consistent records, due to differences in the techniques used to produce aggregates. Furthermore, differences occur anyway between regions, and even between different eras in the same series. Hence, homogenization is tricky. Full traceability between the different data levels and the processing techniques that link them is therefore an important goal, but it will be very hard to achieve given the size and complexity of the data, and the patchiness of the metadata. In many cases the metadata is poor or non-existent. This includes descriptions of the stations themselves, the instruments used, calibration, precision, and even the units and timings of readings.

Then of course there is the problem of ownership. Much of the data was originally collected by national meteorological services, some of which depend on revenues from this data for their very operations, and some are keen to protect their interests in using this data to provide commercial forecasting services. Hence, it won’t always be possible to release all the lower level data publicly.

Suitable policies will be needed to decide what to do when the lower-level data from which level 3 products were derived is no longer available. We probably don’t want to exclude such data, but we do need to clearly flag it. We also need to give end users full flexibility in deciding how to filter the products they want to use.

Finally, the paper takes pains to point out how large an effort it will take to recover, digitize and make traceable all the level 0, 1 and 2 data. Far more paper-based records exist than there is effort available to digitize them. The authors speculate about crowd-sourcing the digitization, but that brings quality control issues. Also, some of the paper records are fragile and deteriorating (which might also imply some urgency).

(The paper also lists a number of current global and national databanks, with some notes on what each contains, along with some recent efforts to recover lower level data for similar datasets.)

Paper #4 on Near Real-Time Updates describes the existing Global Telecommunications System (GTS) used by the international meteorological community, which is probably easiest to describe via a couple of pictures:

Data Collection by the National Meteorological Services (NMS)

National Meteorological Centers (NMC) and Regional Telecommunications Hubs (RTH) in the WMO's Global Telecommunication System

The existing global telecommunications system is good for collecting low time-resolution (e.g. monthly) data, but hasn’t kept pace with the need for rapid transmission of daily and sub-daily data, nor does it do a particularly good job with metadata. The paper mentions a target of 24 hours for transmission of daily and sub-daily data, and within 5 days of the end of the month for monthly data, but points out that the target is rarely met. And it describes some of the weaknesses in the existing system:

  • The system depends on a set of catalogues that define the station metadata and routing tables (list of who publishes and subscribes to each data stream), which allow the data transmission to be very terse. But these catalogues aren’t updated frequently enough, leading to many apparent inconsistencies in the data, which can be hard to track down.
  • Some nations lack the resources to transmit their data in a timely manner (or in some cases, at all)
  • Some nations are slow to correct errors in the data record (e.g. when the wrong month’s data is transmitted)
  • Attempts to fill gaps and correct errors often yield data via email and/or parcel post, which therefore bypasses the GTS, so availability isn’t obvious to all subscribers.
  • The daily and sub-daily data often isn’t shared via the GTS, which means the historical record is incomplete.
  • There is no mechanism for detecting and correcting errors in the daily data.
  • The daily data also contains many errors, due to differences in defining the 24-hour reporting period (it’s supposed to be midnight to midnight UTC time, but often isn’t)
  • The international agreements aren’t in place for use of the daily data (although there is a network of bi-lateral agreements), and it is regarded as commercially valuable by many of the national meteorological services.

Paper #5 on Data Policy describes the current state of surface temperature records (e.g. those held at CRU and NOAA-NCDC), which contain just monthly averages for a subset of the available stations. These archives don’t store any of the lower level data sources, and differ where they’ve used different ways of computing the monthly averages (e.g. mean of the 3-hourly observations, versus mean of the daily minima and maxima). While in theory, the World Meteorological Organization (WMO) is committed to free exchange of the data collected by the national meteorological services, in practice there is a mix of different restrictions on data from different providers. For example, some is restricted to academic use only, while other providers charge fees for the data to enable them to fund their operations. In both cases, handing the data on to third parties is therefore not permitted.
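
Incidentally, the difference between those two averaging conventions isn’t trivial. Here’s a toy illustration (entirely synthetic numbers) showing that the mean of 3-hourly observations and the mean of the daily (min+max)/2 can disagree noticeably for the very same station:

```python
import numpy as np

# Synthetic example: two common conventions for computing a monthly mean
# temperature give different answers when the diurnal cycle is skewed.
rng = np.random.default_rng(42)
hours = np.arange(0, 24, 3)     # 3-hourly observation times
days = 30

# Toy diurnal cycle with a short, sharp afternoon maximum (degrees C)
diurnal = 10 + 8 * np.exp(-((hours - 15) / 4) ** 2)
obs = diurnal + rng.normal(0, 0.5, size=(days, hours.size))

monthly_mean_3hourly = obs.mean()
monthly_mean_minmax = ((obs.min(axis=1) + obs.max(axis=1)) / 2).mean()

print(f"mean of 3-hourly observations: {monthly_mean_3hourly:.2f} C")
print(f"mean of daily (min+max)/2:     {monthly_mean_minmax:.2f} C")
```

For this synthetic station the two conventions differ by more than a degree, which is why archives built with different conventions can’t simply be merged without keeping track of how each monthly value was computed.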

One response to these data restrictions has been to run a series of workshops in various remote parts of the world, in which local datasets are processed to produce high quality derived products, even where the low level data cannot be released. These workshops have the benefit of engaging the local meteorological services in analyzing regional climate change (often for the first time), and raising awareness of the importance of data sharing.

Paper #6 on Data provenance, version control, configuration management is a first attempt at identifying the requirements for curating the proposed data archive (I wish they’d use the term ‘curating’ in the white papers). The paper starts by making a very important point: the aim is not “to assess derived products as to whether they meet higher standards required by specific communities (i.e. scientific, legal, etc.)” but rather it’s “to archive and disseminate derived products as long as the homogenization algorithm is documented by the peer review process”. Which is important, because it means the goal is to support the normal process of doing science, rather than to constrain it.

Some of the identified requirements are:

  • The need for a process (the paper suggests a certification panel) to rate the authenticity of source material and its relationship to primary sources; this process must be dynamic, because of the potential for new information to cast doubt on material previously rated as authentic.
  • The need for version control, and the difficult question of what counts as a configuration unit for versioning. E.g. temporal blocks (decade-by-decade?), individual surface stations, regional datasets, etc?
  • The need for a pre-authentication database to hold potential updates prior to certification
  • The need to limit the frequency of version changes on the basic (level 2 and below) data, due to the vast amount of work that will be invested into science based on these.
  • The need to version control all the software used for producing the data, along with the test cases too.
  • The likelihood that there will be multiple versions of a station record at level 1, with varying confidence ratings (a sketch of how such versions might be tagged follows this list).
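
As a concrete (and purely hypothetical) sketch of what a version identifier for a station record might carry, so that a published analysis could cite exactly which data it used:

```python
from dataclasses import dataclass

# A minimal sketch (all names hypothetical) of a version identifier for a
# station record, combining data level, version numbers, and certification
# status into a single citable tag.

@dataclass(frozen=True)
class StationRecordVersion:
    station_id: str        # e.g. a WMO station identifier
    level: int             # data level (0-5)
    major: int             # bumped when the values themselves change
    minor: int             # bumped for metadata-only changes
    confidence: str        # e.g. "certified", "provisional", "pre-authentication"

    def tag(self) -> str:
        return f"{self.station_id}-L{self.level}-v{self.major}.{self.minor}-{self.confidence}"

v = StationRecordVersion("10381", level=2, major=3, minor=1, confidence="provisional")
print(v.tag())   # 10381-L2-v3.1-provisional
```

Whether the right configuration unit is the individual station, a regional dataset, or a decade-by-decade block is exactly the open question the white paper raises.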

Papers 8 (Creation of quality controlled homogenised datasets from the databank), 9 (Benchmarking homogenisation algorithm performance against test cases) and 10 (Dataset algorithm performance assessment based upon all efforts) go into detail about the processes used by this community for detecting bugs (inhomogeneities) in the data, and for fixing them. Such bugs arise most often because of changes over time in some aspect of the data collection at a particular station, or in the algorithms used to process the data. A particularly famous example is urbanization: a recording station that was originally in a rural setting ends up surrounded by a growing city, and hence may suffer from the urban heat island effect.

I won’t go into detail here on these problems (read the papers!) except to note that the whole problem looks to me very similar to code debugging: there are an unknown number of inhomogeneities in the dataset, we’re unlikely to find them all, and some of them have been latent for so long, with so much subsequent work overlaid on them, that they might end up being treated as features if we can establish that they don’t impact the validity of that work. Also, the process of creating benchmarks to test the skill of homogenisation algorithms looks very much like bug seeding techniques – we insert deliberate errors into a realistic dataset and check how many are detected.
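
To make the bug-seeding analogy concrete, here’s a toy sketch (synthetic data, deliberately naive detection rule) of the benchmark idea: seed known breakpoints into a clean series and count how many the detector recovers. Real benchmarking exercises use far more realistic data and algorithms.

```python
import numpy as np

# Toy "bug seeding" benchmark for homogenisation: insert artificial step
# changes into a synthetic series, then check how many a naive detector finds.
rng = np.random.default_rng(0)
n_months = 1200                                   # a century of monthly anomalies
series = rng.normal(0.0, 0.5, n_months)

# Seed three known inhomogeneities (e.g. station moves, instrument changes)
seeded_breaks = [300, 650, 900]
for b in seeded_breaks:
    series[b:] += rng.choice([-1, 1]) * rng.uniform(0.5, 1.0)

def detect_breaks(x, window=120, threshold=0.4):
    """Flag months where the means of the surrounding 10-year windows differ."""
    hits = []
    for i in range(window, len(x) - window):
        if abs(x[i:i + window].mean() - x[i - window:i].mean()) > threshold:
            hits.append(i)
    return hits

candidates = detect_breaks(series)
found = {b for b in seeded_breaks if any(abs(c - b) <= 24 for c in candidates)}
print(f"seeded {len(seeded_breaks)} breakpoints, detected {len(found)}")
```

The benchmarking papers propose exactly this kind of blind test, only with far more realistic synthetic worlds, so that competing homogenisation algorithms can be scored against breakpoints whose true locations and sizes are known.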

Paper 11 (Spatial and temporal interpolation) covers interpolation techniques used to fill in missing data, and/or to convert the messy real data to a regularly spaced grid. The paper also describes the use of reanalysis techniques, whereby a climate model is used to fill in missing data: the model is run constrained by whatever observations are available over a period of time, the model values are used to fill in the blanks, and the process is iterated until a best fit with the real data is achieved.
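
In the same spirit, here’s a deliberately over-simplified sketch of iterative gap filling. It captures only the “iterate until the filled values are consistent with their neighbours” flavour; real reanalysis constrains a full physical model with observations, not a simple neighbour average.

```python
import numpy as np

# Over-simplified gap filling: repeatedly replace missing grid cells with the
# average of their neighbours until the field settles down. Illustration only.

def fill_gaps(field, iterations=200):
    filled = np.where(np.isnan(field), np.nanmean(field), field)
    missing = np.isnan(field)
    for _ in range(iterations):
        # 4-point neighbour average (periodic edges via np.roll)
        neighbours = (np.roll(filled, 1, 0) + np.roll(filled, -1, 0) +
                      np.roll(filled, 1, 1) + np.roll(filled, -1, 1)) / 4
        filled[missing] = neighbours[missing]    # only update the missing cells
    return filled

field = np.fromfunction(lambda i, j: 15 + 0.5 * i - 0.2 * j, (20, 30))
field[5:8, 10:15] = np.nan                       # a block of missing data
print(np.isnan(fill_gaps(field)).any())          # False: the gaps are filled
```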

Paper 13 (Publication, collation of results, presentation of audit trails) gets into the issue of how the derived products (levels 4 and 5 data) will be described in publications, and how to ensure reproducibility of results. Most importantly, publication of papers describing each derived product is an important part of making the dataset available to the community, and documenting it. Published papers need to give detailed version information for all data that was used, to allow others to retrieve the same source data. Any homogenisation algorithms that are applied ought to have also been described in the peer reviewed literature, and tested against the standard benchmarks (and presumably the version details will be given for these algorithms too). To ensure audit trails are available, all derived products in the databank must include details on the stations and periods used, quality control flags, breakpoint locations and adjustment factors, any ancillary datasets, and any intermediate steps especially for iterative homogenization procedures. Oh, and the databank should provide templates for the acknowledgements sections of published papers.
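
Here’s a rough sketch (all field names and values hypothetical) of the kind of audit-trail record this implies for each derived product:

```python
# A rough sketch (all field names and values hypothetical) of the audit-trail
# metadata a level-4/5 derived product might carry so that others can
# reproduce it from the databank.

audit_trail = {
    "product_id": "EXAMPLE-HOMOG-v1.0",
    "source_databank_version": "level3-v2.1",
    "stations_used": ["10381", "03772"],                # station identifiers
    "period": ("1880-01", "2010-12"),
    "qc_flags_applied": ["gross_range", "spatial_consistency"],
    "breakpoints": {"10381": ["1947-06", "1981-02"]},   # detected inhomogeneities
    "adjustment_factors": {"10381": [-0.3, 0.4]},       # degrees C, per breakpoint
    "homogenisation_algorithm": "doi:10.xxxx/example",  # peer-reviewed description
    "ancillary_datasets": ["station_metadata-v1.2"],
    "intermediate_steps": ["iteration_01.nc", "iteration_02.nc"],
}
```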

As an aside, I can’t help but think this imposes a set of requirements on the scientific community (or at least the publication process) that contradicts the point made in paper 6 about not being in the game of assessing whether higher level products meet certain scientific standards.

Paper 14 (Solicitation of input from the community at large including non-climate fields and discussion of web presence) tackles the difficult question of how to manage communication with broader audiences, including non-specialists and the general public. However, it narrows the scope of the discussion to consider only contributions to data collection, analysis and visualization as useful inputs from this broader audience (although it does acknowledge the role of broader feedback about the project as a whole and the consequences of the work).

Three distinct groups of stakeholders are identified: (i) the scientific community who already work with this type of data, (ii) active users of derived products, but who are unlikely to make contributions directly to the datasets and (iii) the lay audience who may need to understand and trust the work that is done by the other two groups.

The paper discusses the role of various communication channels (email, blogs, wikis, the peer reviewed literature, workshops, etc) for each of these stakeholder groups. There’s some discussion about the risks associated with making the full datasets completely open, for example the potential that users may misunderstand the metadata and data quality fields, leading to confused analyses and time-consuming discussions with users to clarify such issues.

The paper also suggests engaging with schools and with groups of students, for example by proposing small experiments with the data, and hosting networks of schools doing their own data collection and comparison.

Paper 15 (Governance) is a very short discussion, giving some ideas for appropriate steering committees and reporting mechanisms. The project has been endorsed by the various international bodies WMO, WCRP and GCOS, and therefore will be jointly owned by them. Funding will be pursued from the European Framework program, NSF, Google.org, etc. Finally, Paper 16 (Interactions with other activities) describes other related projects, which may partially overlap with this effort, although none of them are directly tackling the needs outlined in this project.

Great news – I’ve had my paper accepted for the 2010 FSE/SDP Workshop on the Future of Software Engineering Research, in Santa Fe, in November! The workshop sounds very interesting – 2 days intensive discussion on where we as a research community should be going. Here’s my contribution:

Climate Change: A Grand Software Challenge

Abstract

Software is a critical enabling technology in nearly all aspects of climate change, from the computational models used by climate scientists to improve our understanding of the impact of human activities on earth systems, through to the information and control systems needed to build an effective carbon-neutral society. Accordingly, we, as software researchers and software practitioners, have a major role to play in responding to the climate crisis. In this paper we map out the space in which our contributions are likely to be needed, and suggest a possible research agenda.

Introduction

Climate change is likely to be the defining issue of the 21st century. The science is unequivocal – concentrations of greenhouse gases are rising faster than in any previous era in the earth’s history, and the impacts are already evident [1]. Future impacts are likely to include a reduction of global food and water supplies, more frequent extreme weather events, sea level rise, ocean acidification, and mass extinctions [10]. In the next few decades, serious impacts are expected on human health from heat stress and vector-borne diseases [2].

Unfortunately, the scale of the systems involved makes the problem hard to understand, and hard to solve. For example, the additional carbon in greenhouse gases tends to remain in atmosphere-ocean circulation for centuries, which means past emissions commit us to further warming throughout this century, even if new emissions are dramatically reduced [12]. The human response is also very slow – it will take decades to complete a worldwide switch to carbon-neutral energy sources, during which time atmospheric concentrations of greenhouse gases will continue to rise. These lags in the system mean that further warming is inevitable, and catastrophic climate disruption is likely on the business-as-usual scenario.

Hence, we face a triple challenge: mitigation to avoid the worst climate change effects by rapidly transitioning the world to a low-carbon economy; adaptation to re-engineer the infrastructure of modern society so that we can survive and flourish on a hotter planet; and education to improve public understanding of the inter-relationships of the planetary climate system and human activity systems, and of the scale and urgency of the problem.

These challenges are global in nature, and pervade all aspects of society. To address them, researchers, engineers, policymakers, and educators from many different disciplines need to come to the table and ask what they can contribute. In the short term, we need to deploy, as rapidly as possible, existing technology to produce renewable energy [8], and to design government policies and international treaties to bring greenhouse gas emissions under control. In the longer term, we need to complete the transition to a global carbon-neutral society by the latter half of this century [1]. Meeting these challenges will demand the mobilization of entire communities of expertise.

Software plays a major role, both as part of the problem and as part of the solution. A large part of the massive growth of energy consumption in the past few decades is due to the manufacture and use of computing and communication technologies, and the technological advances they make possible. Energy efficiency has never been a key requirement in the development of software-intensive technologies, and so there is a very large potential for efficiency improvements [16].

But software also provides the critical infrastructure that supports the scientific study of climate change, and the use of that science by society. Software allows us to process vast amounts of geoscientific data, to simulate earth system processes, to assess the implications, and to explore possible policy responses. Software models allow scientists, activists and policymakers to share data, explore scenarios, and validate assumptions. The extent of this infrastructure is often invisible, both to those who rely on it, and to the general public [6]. Yet weaknesses in this software (whether real or imaginary) will impede our ability to make progress in tackling climate change. We need to solve hard problems to improve the way that society finds, assesses, and uses knowledge to support collective decision-making.

In this paper, we explore the role of the software community in addressing these challenges, and the potential for software infrastructure to bridge the gaps between scientific disciplines, policymakers, the media, and public opinion. We also identify critical weaknesses in our ability to develop and validate this software infrastructure, particularly as traditional software engineering methods are poorly adapted to the construction of such a vast, evolving knowledge-intensive software infrastructure.

Now read the full paper here (don’t worry, it’s only four pages, and you’ve now already read the first one!)

Oh, and many thanks to everyone who read drafts of this and sent me comments!

Over at Only in it for the Gold, Michael Tobis has been joining the dots about recent climate disruption in Russia and Pakistan, and asking some hard questions. I think it’s probably too early to treat this as a symptom that we’ve entered a new climate regime, but it does help to clarify a few things. Like the fact that a few degrees of average temperature rise isn’t really the thing we should worry about – a change in the global average temperature is just a symptom of the real problem. The real problem is the disruption to existing climates, in unpredictable ways at unpredictable times, caused by a massive injection of extra energy into the Earth’s systems. Sure, this leads to a measurable rise in the global average temperature, but it’s all that extra energy slopping around, disrupting existing climate regimes, that should scare us witless.

Look at this pattern of temperature anomalies for July, and consider the locations of both Moscow and the headwaters of the rivers of Pakistan (from NASA). The world’s climate system has developed a new pattern. This specific pattern is probably temporary, but the likelihood of more weird patterns in different parts of the world will only grow:

Global temperature anomalies, July 2010 (NASA)

As I said, the future is already here, it’s just not evenly distributed.

Which means that for much of this year, the North American media has been telling the wrong story. They were obsessed with an oil spill in the gulf, and the environmental damage it caused. Only one brave media outlet realised this wasn’t the real story – the real story is the much bigger environmental disaster that occurs when the oil doesn’t spill but makes it safely to port. Trust the Onion to tell it like it is.

I’ve pointed out a number of times that the software processes used to build the Earth System Models used in climate science don’t look anything like conventional software engineering practices. One very noticeable difference is the absence of detailed project plans, estimates, development phases, etc. While scientific steering committees do discuss long term strategy and set high level goals for the development of the model, the vast majority of model development work occurs bottom-up, through a series of open-ended, exploratory changes to the code. The scientists who work most closely with the models get together and decide what needs doing, typically on a week-to-week basis. Which is a little like agile planning, but without any of the agile planning techniques. Is this the best approach? Well, if the goal was to deliver working software to some external customer by a certain target date, then probably not. But that’s not the goal at all – the goal is to do good science. Which means that much of the work is exploratory and opportunistic.  It’s difficult to plan model development in any detail, because it’s never clear what will work, nor how long it will take to try out some new idea. Nearly everything that’s worth doing to improve the model hasn’t been done before.

This approach also favours a kind of scientific bricolage. Imagine we have sketched out a conceptual architecture for an earth system model. The conventional software development approach would be to draw up a plan to build each of the components on a given timeline, such that they would all be ready by some target date for integration. And it would fail spectacularly, because it would be impossible to estimate timelines for each component – each part involves significant new research. The best we can do is to get groups of scientists to go off and work on each subsystem, and wait to see what emerges. And to be willing to try incorporating new pieces of code whenever they seem to be mature enough, no matter where they came from.

So we might end up with a coupled earth system model where each of the major components was built at a different lab, each was incorporated into the model at a different stage in its development, and none of this was planned long in advance. And, as a consequence, each component has its own community of developers and users, whose goals often diverge from the goals of the overall earth system model. Typically, each community wants to run its component model in stand-alone mode, to pursue scientific questions specific to that subfield. For example, ocean models are built by oceanographers to study oceanography. Plant growth models are built by biologists to study the carbon cycle. And so on.

One problem is that if you take components from each of these communities to incorporate into a coupled model, you don’t want to fork the code. A fork would give you the freedom to modify the component to make it work in the coupled scheme. But, as with forking in open source projects, it is nearly always a mistake. It fragments the community, and means the forked copy no longer gets the ongoing improvements to the original software (or, more precisely, it quickly becomes too costly to transplant such improvements into the forked code). Access to the relevant community of expertise and their ongoing model improvements is at least as important as any specific snapshot of their code, otherwise the coupled model will fail to keep up with the latest science. Which means a series of compromises must be made – some changes might be necessary to make the component work in a coupled scheme, but these must not detract from the ability of the community to continue working with the component as a stand-alone model.

So, building an earth system model means assembling a set of components that weren’t really designed to work together, and a continual process of negotiation between the requirements for the entire coupled model and the requirements of the individual modeling communities. The alternative, re-building each component from scratch, doesn’t make sense financially or scientifically. It would be expensive and time consuming, and you’d end up with untested software that, scientifically, is several years behind the state-of-the-art. [Actually, this might be true of any software: see this story of the Netscape rebuild].

Over the long term, a set of conventions has emerged that helps make it easier to couple together components built by different communities. These include the basic data formatting and message passing standards, as well as standard couplers. And more recently, modeling frameworks, metadata standards and data sharing infrastructure. But as with all standardization efforts, it takes a long time (decades?) for these to be accepted across the various modeling communities, and there is always resistance, in part because meeting the standard incurs a cost and usually detracts from the immediate goals of each particular modeling community (with the benefits accruing elsewhere – specifically to those interested in working with coupled models). Remember: these models are expensive scientific instruments. Changes that limit the use of the component as a standalone model, or which tie it to a particular coupling scheme, can diminish its value to the community that built it.

So, we’re stuck with the problem of incorporating a set of independently developed component models, without the ability to impose a set of interface standards on the teams that build the components. The interface definitions have to be continually re-negotiated. Bryan Lawrence has some nice slides on the choices, which he characterizes as the “coupler approach” and the “framework approach” (I shamelessly stole his diagrams…)

The coupler approach leaves the models almost unchanged, with a communication library doing any necessary transformation on the data fields.

The framework approach splits the original code into smaller units, adapting their data structures and calling interfaces, allowing them to be recombined in a more appropriate calling hierarchy.

The advantage of the coupler approach is that it requires very little change to the original code, and allows the coupler itself to be treated as just another stand-alone component that can be re-used by other labs. However, it’s inefficient, and seriously limits the opportunities to optimize the run configuration: while the components can run in parallel, the coupler must still wait on each component to do its stuff.
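
For illustration, here’s a minimal sketch of the coupler approach (all names hypothetical; real couplers such as OASIS are far more elaborate), which also shows why the coupler ends up waiting on each component in turn:

```python
# A minimal sketch (all names hypothetical) of the "coupler approach": the
# atmosphere and ocean components remain self-contained and only exchange a
# few surface fields through a coupler, which also does any regridding.

class Atmosphere:
    def step(self, sea_surface_temp):
        # ... advance the atmospheric state over one coupling interval ...
        return {"heat_flux": 120.0, "wind_stress": 0.08}   # fields for the ocean

class Ocean:
    def step(self, surface_fluxes):
        # ... advance the ocean state over one coupling interval ...
        return {"sea_surface_temp": 288.5}                 # fields for the atmosphere

class Coupler:
    """Passes fields between components, regridding as needed."""
    def regrid(self, fields):
        return fields      # placeholder: interpolate between the two grids here

    def run(self, atmosphere, ocean, n_steps):
        ocean_fields = {"sea_surface_temp": 288.0}         # initial condition
        for _ in range(n_steps):
            atm_fields = atmosphere.step(self.regrid(ocean_fields))
            ocean_fields = ocean.step(self.regrid(atm_fields))  # waits on the atmosphere

Coupler().run(Atmosphere(), Ocean(), n_steps=4)
```

Note how each component keeps its own internal state and time-stepping untouched; the price is the sequential hand-off through the coupler at every coupling interval.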

The advantage of the framework approach is that it produces a much more flexible and efficient coupled model, with more opportunities to lay out the subcomponents across a parallel machine architecture, and a greater ability to plug other subcomponents in as desired. The disadvantage is that component models might need substantial re-factoring to work in the framework. The trick here is to get the framework accepted as a standard across a variety of different modeling communities. This is, of course, a bit of a chicken-and-egg problem, because its advantages have to be clearly demonstrated with some success stories before such acceptance can happen.

There is a third approach, adopted by some of the bigger climate modeling labs: build everything (or as much as possible) in house, and build ad hoc interfaces between various components as necessary. However, as earth system models become more complex, and incorporate more and more different physical, chemical and biological processes, the ability to do it all in-house is getting harder and harder. This is not a viable long term strategy.

The British Columbia provincial government has set up a Climate Change Data Catalogue, with open access to data such as GHG emissions inventories, records of extreme weather events, and data on energy use by different industrial sectors. They recently held a competition for software developers to create applications that make use of the data, and got some interesting submissions, which were announced this week. Voting for the people’s choice winner is open until Aug 31st.

(h/t to Neil for this)

To get myself familiar with the models at each of the climate centers I’m visiting this summer, I’ve tried to find high level architectural diagrams of the software structure. Unfortunately, there seem to be very few such diagrams around. Climate scientists tend to think of their models in terms of a set of equations, and differentiate between models on the basis of which particular equations each implements. Hence, their documentation doesn’t contain the kinds of views on the software that a software engineer might expect. It presents the equations, often followed with comments about the numerical algorithms that implement them. This also means they don’t find automated documentation tools such as Doxygen very helpful, because they don’t want to describe their models in terms of code structure (the folks at MPI-M here do use Doxygen, but it doesn’t give them the kind of documentation they most want).

But for my benefit, as I’m a visual thinker, and perhaps to better explain to others what is in these huge hunks of code, I need diagrams. There are some schematics like this around (taken from an MPI-M project site):

But it’s not quite what I want. It shows the major components:

  • ECHAM – atmosphere dynamics and physics,
  • HAM – aerosols,
  • MESSy – atmospheric chemistry,
  • MPI-OM – ocean dynamics and physics,
  • HAMOCC – ocean biogeochemistry,
  • JSBACH – land surface processes,
  • HD – hydrology,
  • and the coupler, PRISM,

…but it only shows a few of the connectors, and many of the arrows are unlabeled. I need something that more clearly distinguishes the different kinds of connector, and perhaps shows where various subcomponents fit in (in part because I want to think about why particular compositional choices have been made).

The closest I can find to what I need is the Bretherton diagram, produced back in the mid-1980s to explain what earth system science is all about:

The Bretherton Diagram of earth system processes (click to see bigger, as this is probably not readable!)

It’s not a diagram of an earth system model per se, but rather of the set of systems that such a model might simulate. There’s a lot of detail here, but it does clearly show the major systems (orange rectangles – these roughly correspond to model components) and subsystems (green rectangles), along with data sources and sinks (the brown ovals) and the connectors (pale blue rectangles, representing the data passed between components).

The diagram allows me to make a number of points. First, we can distinguish between two types of model:

  • a Global Climate Model, also known as a General Circulation Model (GCM), or Atmosphere-Ocean coupled model (AO-GCM), which only simulates the physical and dynamic processes in the atmosphere and ocean. Where a GCM does include parts of the other processes, it is typically only to supply appropriate boundary conditions.
  • an Earth System Model (ESM), which also includes the terrestrial and marine biogeochemical processes, snow and ice dynamics, atmospheric chemistry, aerosols, and so on – i.e. it includes simulations of most of the rest of the diagram.

Over the past decade, AO-GCMs have steadily evolved to become ESMs, although there are many intermediate forms around. In the last IPCC assessment, nearly all the models used for the assessment runs were AO-GCMs. For the next assessment, many of them will be ESMs.

Second, perhaps obviously, the diagram doesn’t show any infrastructure code. Some of this is substantial – for example an atmosphere-ocean coupler is a substantial component in its own right, often performing elaborate data transformations, such as re-gridding, interpolation, and synchronization. But this does reflect the way in which scientists often neglect the infrastructure code, because it is not really relevant to the science.
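
To give a flavour of the simplest of these transformations, here’s a toy regridding example (1-D interpolation in latitude only, with invented numbers; real couplers do conservative remapping in two dimensions, and handle masks and time synchronization as well):

```python
import numpy as np

# Toy regridding: interpolate a sea surface temperature field from a
# 1-degree ocean grid onto a coarser 2.5-degree atmosphere grid.
# Latitude-only and purely illustrative.

ocean_lats = np.linspace(-90, 90, 181)              # 1-degree ocean grid
atmos_lats = np.linspace(-90, 90, 73)               # 2.5-degree atmosphere grid

sst_on_ocean_grid = 300 - 30 * np.abs(np.sin(np.deg2rad(ocean_lats)))  # toy SST (K)
sst_on_atmos_grid = np.interp(atmos_lats, ocean_lats, sst_on_ocean_grid)

print(sst_on_atmos_grid.shape)   # (73,)
```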

Third, the diagram treats all the connectors in the same way, because, at some level, they are all just data fields, representing physical quantities (mass, energy) that cross subsystem boundaries. However, there’s a wide range of different ways in which these connectors are implemented – in some cases binding the components tightly together with complex data sharing and control coupling, and in other cases keeping them very loose. The implementation choices are based on a mix of historical accident, expediency, program performance concerns, and the sheer complexity of the physical boundaries between the actual earth subsystems. For example, within an atmosphere model, the dynamical core (which computes the basic thermodynamics of air flow) is distinct from the radiation code (which computes how visible light, along with other parts of the spectrum, is scattered or absorbed by the various layers of air) and the moist processes (i.e. humidity and clouds). But the complexity of the interactions between these processes is sufficiently high that they are tightly bound together – it’s not currently possible to treat any of these parts as swappable components (at least in the current generation of models), although during development some parts can be run in isolation for unit testing, e.g. the dynamical core is tested in isolation, but then most other subcomponents depend on it.

On the other hand, the interface between atmosphere and ocean is relatively simple – it’s the ocean surface – and as this also represents the interface between two distinct scientific disciplines (atmospheric physics and oceanography), atmosphere models and ocean models are always (?) loosely coupled. It’s common now for the two to operate on different grids (different resolution, or even different shape), and the translation of the data passed between them is handled by a coupler. Some schematic diagrams do show how the coupler is connected:

Atmosphere-Ocean coupling via the OASIS coupler (source: Figure 4.2 in the MPI-Met PRISM Earth System Model Adaptation Guide)
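
Here’s a highly simplified sketch of the control flow implied by that diagram (names and structure are invented for illustration; a real coupler such as OASIS also handles parallel data transport, accumulation and averaging over coupling intervals, restarts, and so on):

```python
# Schematic sketch of coupler-mediated exchange between an atmosphere and an
# ocean running on different grids. Interfaces are hypothetical.

class Coupler:
    def __init__(self, regrid_atm_to_ocn, regrid_ocn_to_atm):
        self.a2o = regrid_atm_to_ocn   # remapping function, atm grid -> ocn grid
        self.o2a = regrid_ocn_to_atm   # remapping function, ocn grid -> atm grid

    def exchange(self, atm, ocn):
        # atmosphere sends surface fluxes (heat, momentum, freshwater) to the ocean
        ocn.receive_surface_fluxes(self.a2o(atm.surface_fluxes()))
        # ocean sends sea surface temperature back to the atmosphere
        atm.receive_sst(self.o2a(ocn.sea_surface_temperature()))

def run(atm, ocn, coupler, n_coupling_steps, atm_steps_per_coupling, ocn_steps_per_coupling):
    # Each component takes several of its own (shorter) timesteps between exchanges.
    for _ in range(n_coupling_steps):
        for _ in range(atm_steps_per_coupling):
            atm.step()
        for _ in range(ocn_steps_per_coupling):
            ocn.step()
        coupler.exchange(atm, ocn)
```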

Other interfaces are harder to define than the atmosphere-ocean interface. For example, the atmosphere and the terrestrial processes are harder to decouple: which parts of the water cycle should be handled by the atmosphere model, and which by the land surface model? Which module should handle evaporation from plants and soil? In some models, such as ECHAM, the land surface is embedded within the atmosphere model, and called as a subroutine at each time step. In part this is a historical accident – the original atmosphere model had no vegetation processes, but used a soil heat and moisture parameterization as a boundary condition. The land surface model, JSBACH, was created by pulling as much of this code as possible out of the atmosphere model and developing it into a separate vegetation model, which the land surface community sometimes runs standalone. But it still shares some of the atmosphere’s infrastructure code for data handling, so it’s not as loosely coupled as the ocean is. By contrast, in CESM, the land surface model is a distinct component, interacting with the atmosphere model only via the coupler. This makes it easier to switch different land and/or atmosphere components into the coupled scheme, and also allows the land surface model to have a different grid.
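
The contrast between the two styles can be sketched like this (invented function names; this is not ECHAM, JSBACH, or CESM code):

```python
# Schematic contrast between the two architectural choices described above.

def compute_atmosphere_physics(atm_state, dt): ...
def land_surface_step(atm_state, land_state, dt): ...

# Pattern 1: embedded (ECHAM/JSBACH style) -- the land surface is called as a
# subroutine inside the atmosphere timestep, on the same grid, sharing the
# atmosphere's data-handling infrastructure.
def atmosphere_step_embedded(atm_state, land_state, dt):
    compute_atmosphere_physics(atm_state, dt)
    land_surface_step(atm_state, land_state, dt)

# Pattern 2: separate component (CESM style) -- the land surface only ever
# talks to the atmosphere through the coupler, so it can be swapped out or
# run on a different grid, at the cost of a more complex coupler interface.
def coupled_step(atm, land, coupler, dt):
    atm.step(dt)
    land.step(dt)
    coupler.exchange(atm, land)   # coupler re-grids the exchanged fields
```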

The interface between the ocean model and the sea ice model is also tricky, not least because the area covered by the ice varies with the seasonal cycle. So if you use a coupler to keep the two components separate, the coupler needs information about which grid points contain ice and which do not at each timestep, and it has to alter its behaviour accordingly. For this reason, the sea ice is often treated as a subroutine of the ocean model, which avoids having to expose all this information to the coupler. But again we have the same trade-off: working through the coupler keeps the two as self-contained components that can be swapped for other compatible models as needed, but at the cost of increasing the complexity of the coupler interfaces, reducing information hiding, and making future changes harder.
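
As a toy illustration of the extra information the coupler would need, a masked flux exchange might look something like this (field names are invented):

```python
# Toy sketch: exposing sea ice to the coupler means every exchange has to be
# masked by an ice-fraction field that changes from one timestep to the next.
import numpy as np

def blend_surface_fluxes(open_ocean_flux, ice_covered_flux, ice_fraction):
    """Blend open-ocean and ice-covered fluxes cell by cell.

    ice_fraction has to be re-sent each coupling step, because the ice edge
    moves with the seasonal cycle (and with the weather).
    """
    return ice_fraction * ice_covered_flux + (1.0 - ice_fraction) * open_ocean_flux
```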

Similar challenges occur for:

  • the coupling between the atmosphere and the atmospheric chemistry (which handles chemical processes as gases and various types of pollution are mixed up by atmospheric dynamics).
  • the coupling between the ocean and marine biogeochemistry (which handles the way ocean life absorbs and emits various chemicals while floating around on ocean currents).
  • the coupling between the land surface processes and terrestrial hydrology (which includes rivers, lakes, wetlands and so on). Oh, and between both of these and the atmosphere, as water moves around so freely. Oh, and the ocean as well, because we have to account for how outflows from rivers enter the ocean at coastlines all around the world.
  • …and so on, as we incorporate more and more of the earth system into the models.

Overall, it seems that the complexity of the interactions between the various earth system processes is so high that traditional approaches to software modularity don’t work. Information hiding is hard to do, because these processes are so tightly intertwined. A full object-oriented approach would be a radical departure from how these models are currently built, with the classes built around the data objects (the pale blue boxes in the Bretherton diagram) rather than the processes (the green boxes). But the computational demands of the processes in the green boxes are so high that the only way to make them efficient is to give them full access to the low-level data structures. So any attempt to abstract these processes away from the data objects they operate on would lead to a model that is too inefficient to be useful.
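
To make the tension concrete, here’s an illustrative sketch of the two styles – an encapsulated field class versus the kind of raw whole-array access the process code actually needs (purely schematic, not taken from any model):

```python
# Illustration of the design tension: classes that hide the data layout
# versus process code that operates directly on whole arrays for speed.
import numpy as np

class EncapsulatedField:
    """Information hiding: the layout is private, access goes through methods."""
    def __init__(self, shape):
        self._data = np.zeros(shape)
    def value_at(self, i, j):
        return self._data[i, j]
    def set_value(self, i, j, value):
        self._data[i, j] = value

def apply_heating(temperature, heating_rate, dt):
    # How process code tends to be written: whole-array operations directly
    # on the shared low-level data structures, with no abstraction layer in
    # the inner loop. Going through value_at()/set_value() for every grid
    # point would be far too slow at the scale of a real model.
    temperature += heating_rate * dt
```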

Which brings me back to the question of how to draw pictures of the architecture so that I can compare the coupling and modularity of different models. I’m thinking the best approach might be to start with the Bretherton diagram, and then overlay it to show how various subsystems are grouped into components, and which connectors are handled by a separate coupler.

Postscript: While looking for good diagrams, I came across this incredible collection of visualizations of various aspects of sustainability, some of which are brilliant, while others are just kooky.

I had some interesting chats in the last few days with Christian Jakob, who’s visiting Hamburg at the same time as me. He’s just won a big grant to set up a new Australian Climate Research Centre, so we talked a lot about what models they’ll be using at the new centre, and the broader question of how to manage collaborations between academics and government research labs.

Christian has a paper coming out this month in BAMS on how to accelerate progress in climate model development. He points out that much of the progress now depends on the creation of new parameterizations for physical processes, but doing this more effectively requires better collaboration between the groups of people who run the coupled models and assess overall model skill, and the people who analyze observational data to improve our understanding (and simulation) of particular climate processes. The key point he makes in the paper is that process studies are often undertaken because they are interesting and/or because data is available, but without much idea of whether improving a particular process will have any impact on overall model skill; conversely, model skill is analyzed at modeling centers without much follow-through to identify which processes might be to blame for model weaknesses. Both activities lead to insights, but better coordination between them would help to push model development further and faster. Not that it’s easy, of course: coupled models are now sufficiently complex that it’s notoriously hard to pin down the role of specific physical processes in overall model skill.

So we talked a lot about how the collaboration works. One problem seems to stem from the value of the models themselves. Climate models are like very large, very expensive scientific instruments. Only large labs (typically at government agencies) can now afford to develop and maintain fully fledged earth system models. And even then the full cost is never adequately accounted for in the labs’ funding arrangements. Funding agencies understand the costs of building and operating physical instruments, like large telescopes, or particle accelerators, as shared resources across a scientific community. But because software is invisible and abstract, they don’t think of it in the same way – there’s a tendency to think that it’s just part of the IT infrastructure, and can be developed by institutional IT support teams. But of course, the climate models need huge amounts of specialist expertise to develop and operate, and they really do need to be funded like other large scientific instruments.

The complexity of the models and the lack of adequate funding for model development mean that the institutions that own the models are increasingly conservative in what they do with them. They work on small incremental changes to the models, and don’t undertake big revolutionary changes – they can’t afford to take the risk. There are some past examples of labs taking such risks: in the early 1990’s, ECMWF re-wrote their model from scratch, driven in part by the need to make it more adaptable to new, highly parallel hardware architectures. It took several years and a big team of coders, bringing in the scientific experts as needed. At the end of it, they had a model that was much cleaner, and (presumably) more adaptable. But scientifically, it was no different from the model they had previously. Hence, lots of people felt this was not a good use of their time – they could have made better scientific progress during that period by continuing to evolve the old model. And that was years ago – the likelihood of labs making such radical changes these days is very low.

On the other hand, academics can try the big, revolutionary stuff – if it works, they get lots of good papers about how they’re pushing the frontiers, and if it doesn’t, they can write papers about why some promising new approach didn’t work as expected. But then getting their changes accepted into the models is hard. A key problem here is that there’s no real incentive for them to follow through. Academics are judged on papers, so once the paper is written, they are done. But at that point, the contribution is still a long way from being ready to incorporate into the model for others to use. Christian estimates that it takes at least as long again to get a change ready to incorporate into a model as it does to develop it in the first place (and that’s consistent with what I’ve heard other modelers say). The academic has no incentive to continue to work on it to get it ready, and the institutions have no resources to take it and adopt it.

So again we’re back to the question of effective collaboration, beyond what any one lab or university group can do. And the need to start treating the models as expensive instruments, with much higher operation and maintenance costs than anyone has yet acknowledged. In particular, modeling centers need resources for a much bigger staff to support the efforts by the broader community to extend and improve the models.

Three separate stories on the front page of the BBC news site today:

“Death rate doubles in Moscow as heatwave continues”: Extreme drought in Russia, with heatwaves filling the morgues in Moscow, and the air so thick with smoke you can’t breathe.

“Pakistan floods threaten key barrage in southern Sindh”: Entire villages washed away by flooding in Pakistan – as the Globe and Mail puts it, “Scale of Pakistan floods worse than 2004 tsunami, Haiti and Kashmir quakes combined”.

“China landslide death toll jumps”: “The landslides in Gansu came as China was struggling with its worst flooding in a decade, with more than 1,000 people reported dead and millions more displaced around the country.”

Lots of statistics to measure the human suffering. But nobody (in the mainstream media) pointing out that this is exactly what climate change is expected to do: more frequent and more intense extreme weather events around the globe. When the forecasts from the models are presented in reports as a trend in average temperatures, don’t forget that it’s not the averages that really matter for human well-being – it’s the extremes.
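
A back-of-the-envelope calculation shows why. If we assume (purely for the sake of illustration) that daily summer temperatures are roughly Gaussian, then a modest shift in the mean multiplies the frequency of days beyond a fixed extreme threshold several-fold:

```python
# Toy illustration: shift the mean of a unit-variance Gaussian by one standard
# deviation and see how much more often a fixed "extreme" threshold is exceeded.
from math import erf, sqrt

def prob_exceed(threshold_sigmas, mean_shift_sigmas=0.0):
    """P(T > threshold) for a unit-variance Gaussian with a shifted mean."""
    z = threshold_sigmas - mean_shift_sigmas
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

baseline = prob_exceed(2.0)                          # about 2.3% of days
warmed   = prob_exceed(2.0, mean_shift_sigmas=1.0)   # about 15.9% of days
print(f"Days beyond the old extreme threshold become ~{warmed / baseline:.0f}x more frequent")
```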

And nobody (in the mainstream media) pointing out that we’re committed to more and more of this for decades, because we can’t just turn off carbon emissions, and we can’t just suck the extra carbon out of the air – it stays there for a very long time. The smoke in Moscow will eventually wash out in a good rainstorm. The carbon in the atmosphere that causes the heatwaves will not – it will keep on accumulating until we get to zero net emissions. And given how long it will take to entirely re-tool the whole world to clean energy, the heatwaves and floods of this summer will eventually come to look like small fry. There’s a denialist argument that environmentalists are misanthropes, wanting to deny under-developed countries the benefits of western (fossil-fuel-driven) wealth. But how much proof will we need before people realize that do-nothing strategies on climate change are causing millions of people to suffer?

I was struck by a rather idiotic comment on this CBC story on adaptation to climate change in Northern Canada:  “It’ll be awesome….palm trees, orange trees, right in my backyard!!” Yes. Quite. I’m sure the folks in Moscow will be rushing out to plant palm trees and orange trees to replace the forests that burnt down. Just as soon as they can breathe outdoors again, that is.

Oh look, Moscow is further north than every single major Canadian city. Are we ready for this?

Update: (Aug 10): At last, the BBC links the Moscow heatwave to climate change.

Update2: (Aug 11): Forgot to say that the title of this post is a version of a quote usually attributed to William Gibson.

Update3: (Aug 11): There’s a fascinating workshop in September, in Paris, dedicated to the question of how we can do a better job of forecasting extremes. I’ve missed the registration cut-off, so I probably won’t be able to attend, but the agenda is packed with interesting talks. And of course, the IPCC is in the process of writing a Special Report on Managing the Risks of Extreme Events (the SREX), but it isn’t due out until November next year.

Update4: (Aug 12): Good reporting is picking up. Toronto Star: “Weather-related disasters are here to stay, say scientists”, although I think I like the original AP title better: “Long hot summer of fire and floods fit predictions”.

This session at the AGU fall meeting in December is right up my street:

IN13: Software Engineering for Climate Modeling

As climate models grow in complexity in response to improved fidelity and inclusion of new physical effects, software engineering increasingly plays an important role in scientific productivity. Model results are more and more used in social and economic decisions, leading to increased demand on the traceability, repeatability, and accountability of climate model experiments. Critical questions include: How to reduce cost & risk in the development process? And how to improve software verification processes? Contributions are solicited on topics including, but not limited to: testing and reliability; life-cycle management; productivity and cost metrics; development tools and other technology; other best practices; and cultural challenges.

I’ve been asked to give an invited talk in the session, so now I’m highly motivated to encourage everyone else to submit abstracts, so that we have a packed session. The call for abstract submissions is now open; the deadline is Sept 2, 2010. Go ahead, submit something!

And, as I can never stick to just one thing, here are some other sessions that look interesting:

Aw, heck, all the sessions in the informatics division sound interesting, as do the ones in Global Environmental Change. I’ll be busy for the whole week!

Last but not least, Tim Palmer from ECMWF will be giving the Bjerknes lecture this year. Tim’s doing really interesting work with multi-model ensembles, stochastic predictions, and seamless assessment. Whatever he talks about, it’ll be great!