The British Columbia provincial government has set up a Climate Change Data Catalogue, with open access to data such as GHG emissions inventories, records of extreme weather events, and data on energy use by different industrial sectors. They recently held a competition for software developers to create applications that make use of the data, and got some interesting submissions, which were announced this week. Voting for the people’s choice winner is open until Aug 31st.

(h/t to Neil for this)

To get myself familiar with the models at each of the climate centers I’m visiting this summer, I’ve tried to find high level architectural diagrams of the software structure. Unfortunately, there seem to be very few such diagrams around. Climate scientists tend to think of their models in terms of a set of equations, and differentiate between models on the basis of which particular equations each implements. Hence, their documentation doesn’t contain the kinds of views on the software that a software engineer might expect. It presents the equations, often followed by comments about the numerical algorithms that implement them. This also means they don’t find automated documentation tools such as Doxygen very helpful, because they don’t want to describe their models in terms of code structure (the folks at MPI-M here do use Doxygen, but it doesn’t give them the kind of documentation they most want).

But for my benefit, as I’m a visual thinker, and perhaps to better explain to others what is in these huge hunks of code, I need diagrams. There are some schematics like this around (taken from an MPI-M project site):

[Schematic from an MPI-M project site, showing the major components of the coupled model and some of their connections]

But it’s not quite what I want. It shows the major components:

  • ECHAM – atmosphere dynamics and physics,
  • HAM – aerosols,
  • MESSy – atmospheric chemistry,
  • MPI-OM – ocean dynamics and physics,
  • HAMOCC – ocean biogeochemistry,
  • JSBACH – land surface processes,
  • HD – hydrology,
  • and the coupler, PRISM,

…but it only shows a few of the connectors, and many of the arrows are unlabeled. I need something that more clearly distinguishes the different kinds of connector, and perhaps shows where various subcomponents fit in (in part because I want to think about why particular compositional choices have been made).

The closest I can find to what I need is the Bretherton diagram, produced back in the mid-1980s to explain what earth system science is all about:

The Bretherton Diagram of earth system processes (click to see bigger, as this is probably not readable!)

It’s not a diagram of an earth system model per se, but rather of the set of systems that such a model might simulate. There’s a lot of detail here, but it does clearly show the major systems (orange rectangles – these roughly correspond to model components) and subsystems (green rectangles), along with data sources and sinks (the brown ovals) and the connectors (pale blue rectangles, representing the data passed between components).

The diagram allows me to make a number of points. First, we can distinguish between two types of model:

  • a Global Climate Model, also known as a General Circulation Model (GCM), or Atmosphere-Ocean coupled model (AO-GCM), which only simulates the physical and dynamic processes in the atmosphere and ocean. Where a GCM does include parts of the other processes, it is typically only to supply appropriate boundary conditions.
  • an Earth System Model (ESM), which also includes the terrestrial and marine biogeochemical processes, snow and ice dynamics, atmospheric chemistry, aerosols, and so on – i.e. it includes simulations of most of the rest of the diagram.

Over the past decade, AO-GCMs have steadily evolved to become ESMs, although there are many intermediate forms around. In the last IPCC assessment, nearly all the models used for the assessment runs were AO-GCMs. For the next assessment, many of them will be ESMs.

Second, perhaps obviously, the diagram doesn’t show any infrastructure code. Some of this is substantial – for example an atmosphere-ocean coupler is a substantial component in its own right, often performing elaborate data transformations, such as re-gridding, interpolation, and synchronization. But this does reflect the way in which scientists often neglect the infrastructure code, because it is not really relevant to the science.
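To make those “elaborate data transformations” a little more concrete, here is a minimal sketch of the kind of re-gridding step a coupler performs. It is purely illustrative: the 1-D latitude grids, the made-up SST profile, and the function name are all my own inventions, not code from any real coupler (which works on 2-D or 3-D fields and also handles masks, vector quantities, and time synchronization).

```python
import numpy as np

def conservative_regrid_1d(src_edges, tgt_edges, src_field):
    """Conservatively remap a 1-D field from source cells to target cells.
    Each target cell gets the overlap-weighted average of the source cells
    it intersects, so the integral of the field over the domain is preserved."""
    tgt_field = np.zeros(len(tgt_edges) - 1)
    for j in range(len(tgt_edges) - 1):
        t0, t1 = tgt_edges[j], tgt_edges[j + 1]
        total_weight = 0.0
        for i in range(len(src_edges) - 1):
            s0, s1 = src_edges[i], src_edges[i + 1]
            overlap = max(0.0, min(t1, s1) - max(t0, s0))
            tgt_field[j] += overlap * src_field[i]
            total_weight += overlap
        tgt_field[j] /= total_weight
    return tgt_field

# Hypothetical example: sea surface temperature on a coarse ocean grid,
# remapped onto a finer atmosphere grid (both grids are made up).
ocean_edges = np.linspace(-90, 90, 10)   # 9 coarse latitude cells
atmos_edges = np.linspace(-90, 90, 19)   # 18 finer latitude cells
sst = 300.0 - 30.0 * np.abs(np.linspace(-80, 80, 9)) / 90.0
print(conservative_regrid_1d(ocean_edges, atmos_edges, sst))
```

The point is simply that even the most basic conservative remapping involves computing overlap weights between every pair of source and target cells, which is one reason couplers are substantial components in their own right.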

Third, the diagram treats all the connectors in the same way, because, at some level, they are all just data fields, representing physical quantities (mass, energy) that cross subsystem boundaries. However, there’s a wide range of different ways in which these connectors are implemented – in some cases binding the components tightly together with complex data sharing and control coupling, and in other cases keeping them very loose. The implementation choices are based on a mix of historical accident, expediency, program performance concerns, and the sheer complexity of the physical boundaries between the actual earth subsystems. For example, within an atmosphere model, the dynamical core (which computes the basic thermodynamics of air flow) is distinct from the radiation code (which computes how visible light, along with other parts of the spectrum, is scattered or absorbed by the various layers of air) and the moist processes (i.e. humidity and clouds). But the complexity of the interactions between these processes is sufficiently high that they are tightly bound together – it’s not currently possible to treat any of these parts as swappable components (at least in the current generation of models), although during development, some parts can be run in isolation for unit testing: e.g. the dynamical core is tested in isolation, but then most other subcomponents depend on it.

On the other hand, the interface between atmosphere and ocean is relatively simple — it’s the ocean surface — and as this also represents the interface between two distinct scientific disciplines (atmospheric physics and oceanography), atmosphere models and ocean models are always (?) loosely coupled. It’s common now for the two to operate on different grids (different resolution, or even different shape), and the translation of the various data to be passed between them is handled by a coupler. Some schematic diagrams do show how the coupler is connected:

Atmosphere-Ocean coupling via the OASIS coupler (source: Figure 4.2 in the MPI-Met PRISM Earth System Model Adaptation Guide)

Other interfaces are harder to define than the atmosphere-ocean interface. For example, the atmosphere and the terrestrial processes are harder to decouple: Which parts of the water cycle should be handled by the atmosphere model and which should be handled by the land surface model? Which module should handle evaporation from plants and soil? In some models, such as ECHAM, the land surface is embedded within the atmosphere model, and called as a subroutine at each time step. In part this is historical accident – the original atmosphere model had no vegetation processes, but used soil heat and moisture parameterization as a boundary condition. The land surface model, JSBACH, was developed by pulling out as much of this code as possible, and developing it into a separate vegetation model, and this is sometimes run as a standalone model by the land surface community. But it still shares some of the atmosphere infrastructure code for data handling, so it’s not as loosely coupled as the ocean is. By contrast, in CESM, the land surface model is a distinct component, interacting with the atmosphere model only via the coupler. This facilitates the switching of different land and/or atmosphere components into the coupled scheme, and also allows the land surface model to have a different grid.

The interface between the ocean model and the sea ice model is also tricky, not least because the area covered by the ice varies with the seasonal cycle. So if you use a coupler to keep the two components separate, the coupler needs information about which grid points contain ice and which do not at each timestep, and it has to alter its behaviour accordingly. For this reason, the sea ice is often treated as a subroutine of the ocean model, which then avoids having to expose all this information to the coupler. But again we have the same trade-off. Working through the coupler ensures they are self-contained components and can be swapped for other compatible models as needed; but at the cost of increasing the complexity of the coupler interfaces, reducing information hiding, and making future changes harder.
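As a toy illustration of the book-keeping involved (entirely hypothetical, not based on any real coupler interface), here is what a coupler has to do at every timestep if the sea ice is kept as a separate component: merge the ice and open-ocean fluxes using an ice fraction mask that changes with the seasonal cycle.

```python
import numpy as np

def merge_surface_fluxes(ice_fraction, flux_over_ice, flux_over_ocean):
    """Blend the surface fluxes seen by the atmosphere in each grid cell,
    weighted by the (time-varying) fraction of the cell covered by sea ice."""
    return ice_fraction * flux_over_ice + (1.0 - ice_fraction) * flux_over_ocean

# Hypothetical row of ocean grid cells at one timestep (all values made up).
# The ice fraction changes every timestep, so the coupler has to be told
# about it at every exchange if the ice is a separate component.
ice_fraction    = np.array([1.0, 0.8, 0.3, 0.0, 0.0])
flux_over_ice   = np.array([-5.0, -4.0, -3.0, 0.0, 0.0])        # W/m^2
flux_over_ocean = np.array([-20.0, -18.0, -15.0, -10.0, -8.0])  # W/m^2
print(merge_surface_fluxes(ice_fraction, flux_over_ice, flux_over_ocean))
```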

Similar challenges occur for:

  • the coupling between the atmosphere and the atmospheric chemistry (which handles chemical processes as gases and various types of pollution are mixed up by atmospheric dynamics).
  • the coupling between the ocean and marine biogeochemistry (which handles the way ocean life absorbs and emits various chemicals while floating around on ocean currents).
  • the coupling between the land surface processes and terrestrial hydrology (which includes rivers, lakes, wetlands and so on). Oh, and between both of these and the atmosphere, as water moves around so freely. Oh, and the ocean as well, because we have to account for how outflows from rivers enter the ocean at coastlines all around the world.
  • …and so on, as we incorporate more and more of the earth system into the models.

Overall, it seems that the complexity of the interactions between the various earth system processes is so high that traditional approaches to software modularity don’t work. Information hiding is hard to do, because these processes are so tightly inter-twined. A full object-oriented approach would be a radical departure from how these models are built currently, with the classes built on the data objects (the pale blue boxes in the Bretherton diagram) rather than the processes (the green boxes). But the computational demands of the processes in the green boxes are so high that the only way to make them efficient is to give them full access to the low level data structures. So any attempt to abstract away these processes from the data objects they operate on will lead to a model that is too inefficient to be useful.
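As a deliberately over-simplified sketch of that tension, compare a process-style routine that operates directly on shared arrays (roughly how most model code is organized today) with an object-oriented wrapper built around a data object. All names and numbers here are invented for illustration.

```python
import numpy as np

# Process-style: routines operate directly on shared low-level arrays.
# Fast, but there is no information hiding between processes.
def radiation_step(temperature, humidity, heating_rate):
    heating_rate[:] = 0.01 * temperature - 0.005 * humidity   # made-up "physics"

# Object-style: a class built around a data object (one of the pale blue boxes),
# with processes as methods. Cleaner boundaries, but every access now goes
# through an interface, which is hard to make fast enough for the inner loops.
class AtmosphereColumn:
    def __init__(self, nlevels):
        self._temperature = np.full(nlevels, 250.0)
        self._humidity = np.full(nlevels, 0.001)

    def apply_heating(self, heating_rate):
        self._temperature += heating_rate

# Process-style usage: the arrays are simply shared between routines.
T, q, dT = np.full(40, 250.0), np.full(40, 0.001), np.zeros(40)
radiation_step(T, q, dT)
column = AtmosphereColumn(40)
column.apply_heating(dT)
```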

Which brings me back to the question of how to draw pictures of the architecture so that I can compare the coupling and modularity of different models. I’m thinking the best approach might be to start with the Bretherton diagram, and then overlay it to show how various subsystems are grouped into components, and which connectors are handled by a separate coupler.

Postscript: While looking for good diagrams, I came across this incredible collection of visualizations of various aspects of sustainability, some of which are brilliant, while others are just kooky.

I had some interesting chats in the last few days with Christian Jakob, who’s visiting Hamburg at the same time as me. He’s just won a big grant to set up a new Australian Climate Research Centre, so we talked a lot about what models they’ll be using at the new centre, and the broader question of how to manage collaborations between academics and government research labs.

Christian has a paper coming out this month in BAMS on how to accelerate progress in climate model development. He points out that much of the progress now depends on the creation of new parameterizations for physical processes, but to do this more effectively requires better collaboration between the groups of people who run the coupled models and assess overall model skill, and the people who analyze observational data to improve our understanding (and simulation) of particular climate processes. The key point he makes in the paper is that process studies are often undertaken because they are interesting and/or because data is available, but without much idea of whether improving a particular process will have any impact on overall model skill; conversely, model skill is analyzed at modeling centers without much follow-through to identify which processes might be to blame for model weaknesses. Both activities lead to insights, but better coordination between them would help to push model development further and faster. Not that it’s easy of course: coupled models are now sufficiently complex that it’s notoriously hard to pin down the role of specific physical processes in overall model skill.

So we talked a lot about how the collaboration works. One problem seems to stem from the value of the models themselves. Climate models are like very large, very expensive scientific instruments. Only large labs (typically at government agencies) can now afford to develop and maintain fully fledged earth system models. And even then the full cost is never adequately accounted for in the labs’ funding arrangements. Funding agencies understand the costs of building and operating physical instruments, like large telescopes, or particle accelerators, as shared resources across a scientific community. But because software is invisible and abstract, they don’t think of it in the same way – there’s a tendency to think that it’s just part of the IT infrastructure, and can be developed by institutional IT support teams. But of course, the climate models need huge amounts of specialist expertise to develop and operate, and they really do need to be funded like other large scientific instruments.

The complexity of the models and the lack of adequate funding for model development mean that the institutions that own the models are increasingly conservative in what they do with them. They work on small incremental changes to the models, and don’t undertake big revolutionary changes – they can’t afford to take the risk. There are some examples of labs taking such risks: for example, in the early 1990’s ECMWF re-wrote their model from scratch, driven in part by the need to make it more adaptable to new, highly parallel hardware architectures. It took several years, and a big team of coders, bringing in the scientific experts as needed. At the end of it, they had a model that was much cleaner, and (presumably) more adaptable. But scientifically, it was no different from the model they had previously. Hence, lots of people felt this was not a good use of their time – they could have made better scientific progress during that time by continuing to evolve the old model. And that was years ago – the likelihood of labs making such radical changes these days is very low.

On the other hand, academics can try the big, revolutionary stuff – if it works, they get lots of good papers about how they’re pushing the frontiers, and if it doesn’t, they can write papers about why some promising new approach didn’t work as expected. But then getting their changes accepted into the models is hard. A key problem here is that there’s no real incentive for them to follow through. Academics are judged on papers, so once the paper is written they are done. But at that point, the contribution to the model is still a long way from being ready to incorporate for others to use. Christian estimates that it takes at least as long again to get a change ready to incorporate into a model as it does to develop it in the first place (and that’s consistent with what I’ve heard other modelers say). The academic has no incentive to continue to work on it to get it ready, and the institutions have no resources to take it and adopt it.

So again we’re back to the question of effective collaboration, beyond what any one lab or university group can do. And the need to start treating the models as expensive instruments, with much higher operation and maintenance costs than anyone has yet acknowledged. In particular, modeling centers need resources for a much bigger staff to support the efforts by the broader community to extend and improve the models.

Three separate stories on the front page of the BBC news site today:

“Death rate doubles in Moscow as heatwave continues”: Extreme drought in Russia, with heatwaves filling the morgues in Moscow, and the air so thick with smoke you can’t breathe.

“Pakistan floods threaten key barrage in southern Sindh”: Entire villages washed away by flooding in Pakistan – as the Globe and Mail puts it, “Scale of Pakistan floods worse than 2004 tsunami, Haiti and Kashmir quakes combined”.

“China landslide death toll jumps”: “The landslides in Gansu came as China was struggling with its worst flooding in a decade, with more than 1,000 people reported dead and millions more displaced around the country.”

Lots of statistics to measure the human suffering. But nobody (in the mainstream media) pointing out that this is exactly what climate change is expected to do: more frequent and more intense extreme weather events around the globe. When the forecasts from the models are presented in reports as a trend in average temperatures, don’t forget that it’s not the averages that really matter for human well-being – it’s the extremes.

And nobody (in the mainstream media) pointing out that we’re committed to more and more of this for decades, because we can’t just turn off carbon emissions, and we can’t just suck the extra carbon out of the air – it stays there for a very long time. The smoke in Moscow will eventually wash out in a good rainstorm. The carbon in the atmosphere that causes the heatwaves will not – it will keep on accumulating, until we get to zero net emissions. And given how long it will take to entirely re-tool the whole world for clean energy, the heatwaves and floods of this summer will eventually come to look like small fry. There’s a denialist argument that environmentalists are misanthropes, wanting to deny under-developed countries the benefits of western (fossil-fuel-driven) wealth. But how much proof will we need before people realize that do-nothing strategies on climate change are causing millions of people to suffer?

I was struck by a rather idiotic comment on this CBC story on adaptation to climate change in Northern Canada:  “It’ll be awesome….palm trees, orange trees, right in my backyard!!” Yes. Quite. I’m sure the folks in Moscow will be rushing out to plant palm trees and orange trees to replace the forests that burnt down. Just as soon as they can breathe outdoors again, that is.

Oh look, Moscow is further north than every single major Canadian city. Are we ready for this?

Update: (Aug 10): At last, the BBC links the Moscow heatwave to climate change.

Update2: (Aug 11): Forgot to say that the title of this post is a version of a quote usually attributed to William Gibson.

Update3: (Aug 11): There’s a fascinating workshop in September, in Paris, dedicated to the question of how we can do a better job of forecasting extremes. I’ve missed the registration cut-off, so I probably won’t be able to attend, but the agenda is packed with interesting talks. And of course, the IPCC is in the process of writing a Special Report on Managing the Risks of Extreme Events (the SREX), but it isn’t due out until November next year.

Update4: (Aug 12): Good reporting is picking up. Toronto Star: “Weather-related disasters are here to stay, say scientists”, although I think I like the original AP title better: “Long hot summer of fire and floods fit predictions”.

This session at the AGU fall meeting in December is right up my street:

IN13: Software Engineering for Climate Modeling

As climate models grow in complexity in response to improved fidelity and inclusion of new physical effects, software engineering increasingly plays an important role in scientific productivity. Model results are more and more used in social and economic decisions, leading to increased demand on the traceability, repeatability, and accountability of climate model experiments. Critical questions include: How to reduce cost & risk in the development process? And how to improve software verification processes? Contributions are solicited on topics including, but not limited to: testing and reliability; life-cycle management; productivity and cost metrics; development tools and other technology; other best practices; and cultural challenges.

I’ve been asked to give an invited talk in the session, so now I’m highly motivated to encourage everyone else to submit abstracts, so that we have a packed session. The call for abstract submissions is now open, deadline is Sept 2, 2010. Go ahead, submit something!

And, as I can never stick to just one thing, here’s some other sessions that look interesting:

Aw, heck, all the sessions in the informatics division sound interesting, as do the ones in Global Environmental Change. I’ll be busy for the whole week!

Last but not least, Tim Palmer from ECMWF will be giving the Bjerknes lecture this year. Tim’s doing really interesting work with multi-model ensembles, stochastic predictions, and seamless assessment. Whatever he talks about, it’ll be great!

One of the exercises I set myself while visiting NCAR this month is to try porting the climate model CESM1 to my laptop (a MacBook Pro). Partly because I wanted to understand what makes it tick, and partly because I thought it would be neat to be able to run it anywhere. At first I thought the idea was crazy – these things are designed to be run on big supercomputers. But CESM is also intended to be portable, as part of a mission to support a broader community of model users. So, porting it to my laptop is a simple proof of concept test – if it ports easily, that’s a good sign that the code is robust.

It took me several days of effort to complete the port, but most of that time was spent on two things that have very little to do with the model itself. The first was a compiler bug that I tripped over (yeah, yeah, blame the compiler, right?) and the second was the issue of getting all the necessary third party packages installed. But in the end I was successful. I’ve just completed two very basic test runs of the model. The first is what’s known as an ‘X’ component set, in which all the major components (atmosphere, ocean, land, ice, etc) don’t actually do anything – this just tests that the framework code builds and runs. The second is an “A” compset at a low resolution, in which all the components are static data models (this ran for five days of simulation time in about 1.5 minutes). If I was going on to test the port correctly, there’s a whole sequence of port validation tests that I ought to perform, for example to check that my runs are consistent with the benchmark runs, that I can stop and restart the model from the data files, that I get the same results in different model configurations, etc. And then eventually there’s the scientific validation tests – checks that the simulated climate in my ported model is realistic.

But for now, I just want to reflect on the process of getting the model to build and run on a new (unsupported) platform. I’ll describe some of the issues I encountered, and then reflect on what I’ve learned about the model. First, some stats. The latest version of the model, CESM1.0 was released on June 25, 2010. It contains just under 1 million lines of code. Three quarters of this is Fortran (mainly Fortran 90), the rest is a mix of shell scripts (of several varieties), XML and HTML:

Lines of Code count for CESM v1.0 (not including build scripts), as calculated by cloc.sourceforge.net v1.51

In addition to the model itself, there are another 12,000 lines of Perl and shell script that handle installing, configuring, building and running the model.

The main issues that tripped me up were:

  • The compiler. I decided to use the gnu compiler package (gfortran, included in the gcc package), because it’s free. But it’s not one of the compilers that’s supported for CESM, because in general CESM is used with commercial compilers (e.g. IBM’s) on the supercomputers. I grabbed the newest version of gcc that I could find a pre-built Mac binary for (v4.3.0), but it turned out not to be new enough – I spent a few hours diagnosing what turned out to be a (previously undiscovered?) bug in gfortran v4.3.0 that’s fixed in newer versions (I switched to v4.4.1). And then there’s a whole bunch of compiler flags (mainly to do with compatibility for certain architectures and language versions) that are not compatible with the commercial compilers, which I needed to track down.
  • Third party packages such as MPI (the message passing interface used for exchanging data between model components) and NetCDF (the data formatting standard used for geoscience data). It turns out that the Mac already has MPI installed, but without Fortran and Parallel IO support, so I had to rebuild it. And it took me a few rebuilds to get both these packages installed with all the right options.

Once I’d got these sorted, and figured out which compiler flags I needed, the build went pretty smoothly, and I’ve had no problems so far running it. Which leads me to draw a number of (tentative) conclusions about portability. First, CESM is a little unusual compared to most climate models, because it is intended as a community effort, and hence portability is a high priority. It has already been ported to around 30 different platforms, including a variety of IBM and Cray supercomputers, and various Linux clusters. Just the process of running the code through many different compilers shakes out not just portability issues, but good coding practices too, as different compilers tend to be picky about different language constructs.

Second, in the process of building the model, it’s quite easy to see that it consists of a number of distinct components, written by different communities, to different coding standards. Most obviously, CESM itself is built from five different component models (atmosphere, ocean, sea ice, land ice, land surface), along with a coupler that allows them to interact. There’s a tension between the needs of scientists who develop code just for a particular component model (run as a standalone model) versus scientists who want to use a fully coupled model. These communities overlap, but not completely, and coordinating the different needs takes considerable effort. Sometimes code that makes sense in a standalone module will break the coupling scheme.

But there’s another distinction that wasn’t obvious to me previously:

  • Scientific code – the bulk of the Fortran code in the component modules. This includes the core numerical routines, radiation schemes, physics parameterizations, and so on. This code is largely written by domain experts (scientists), for whom scientific validity is the over-riding concern (and hence they tend to under-estimate the importance of portability, readability, maintainability, etc).
  • Infrastructure code – including the coupler that allows the components to interact, the shared data handling routines, and a number of shared libraries. Most of this I could characterize as a modeling framework – it provides an overall architecture for a coupled model, and calls the scientific code as and when needed. This code is written jointly by the software engineering team and the various scientific groups.
  • Installation code – including configuration and build scripts. These are distinct from the model itself, but intended to provide flexibility to the community to handle a huge variety of target architectures and model configurations. These are written exclusively by the software engineering team (I think!), and tend to suffer from a serious time crunch: making this code clean and maintainable is difficult, given the need to get a complex and ever-changing model working in reasonable time.

In an earlier post, I described the rapid growth of complexity in earth system models as a major headache. This growth of complexity can be seen in all three types of software, but the complexity growth is compounded in the latter two: modeling frameworks need to support a growing diversity of earth system component models, which then leads to exponential growth in the number of possible model configurations that the build scripts have to deal with. Handling the growing complexity of the installation code is likely to be one of the biggest software challenges for the earth system modeling community in the next few years.
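As a back-of-the-envelope illustration of that combinatorial growth (the component counts below are invented, not CESM’s actual numbers), the configuration space the scripts must handle is roughly the product of the choices available for each slot in the coupled system:

```python
from math import prod

# Hypothetical counts of interchangeable options for each slot in a coupled
# model (these numbers are invented, purely to show the multiplication).
options = {
    "atmosphere": 3,      # e.g. full physics, simplified, or data model
    "ocean": 3,
    "land": 2,
    "sea_ice": 2,
    "land_ice": 2,
    "resolution/grid": 4,
}
print(prod(options.values()), "component sets for the build scripts to handle")
```

Every new component alternative multiplies, rather than adds to, the number of configurations that need to build and run correctly.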

William Connolly has written a detailed critique of our paper “Engineering the Software for Understanding Climate Change”, which follows on from a very interesting discussion about “Amateurish Supercomputing Codes?” in his previous post. One of the issues raised in that discussion is the reward structure in scientific labs for software engineers versus scientists. The funding in such labs is pretty much all devoted to “doing science” which invariably means publishable climate science research. People who devote time and effort to improving the engineering of the model code might get a pat on the back, but inevitably it’s under-rewarded because it doesn’t lead directly to publishable science. The net result is that all the labs I’ve visited so far (UK Met Office, NCAR, MPI-M) have too few software engineers working on the model code.

Which brings up another point. Even if these labs decided to devote more budget to the software engineering effort (and it’s not clear how easy it would be to do this, without re-educating funding agencies), where will they recruit the necessary talent? They could try bringing in software professionals who don’t yet have the domain expertise in climate science, and see what happens. I can’t see this working out well on a large scale. The more I work with climate scientists, the more I appreciate how much domain expertise it takes to understand the science requirements, and to develop climate code. The potential culture clash is huge: software professionals (especially seasoned ones) tend to be very opinionated about “the right way to build software”, and insensitive to contextual factors that might make their previous experiences inapplicable. I envision lots of the requirements that scientists care about most (e.g. the scientific validity of the models) getting trampled on in the process of “fixing” the engineering processes. Right now the trade-off between getting the science right versus having beautifully engineered models is tipped firmly in favour of the former. Tipping it the other way might be a huge mistake for scientific progress, and very few people seem to understand how to get both right simultaneously.

The only realistic alternative is to invest in training scientists to become good software developers. Greg Wilson is pretty much the only person around who is covering this need, but his software carpentry course is desperately underfunded. We’re going to need a lot more like this to fix things…

The Muir Russell report came out today, and I just finished reading the thing. It should be no surprise to anyone paying attention that it completely demolishes the allegations that have been made about the supposed bad behaviour of the CRU research team. But overall, I’m extremely disappointed, because the report completely misses the wood for the trees. It devotes over 100 pages to a painstaking walk through every single allegation made against the CRU, assessing the evidence for each, and demolishing them one after another. The worst it can find to say about the CRU is that it hasn’t been out there in the lead over the last decade in responding to the new FoI laws, adapting to the rise of the blogosphere, and adapting to the changing culture of openness for scientific data. The report makes a number of recommendations for improvements in processes and practices at the CRU, and so can be taken as mildly critical, especially of CRU governance. But in so doing, it never really acknowledges the problems a small research unit (varying between 3.5 and 5 FTE staff over the last decade) would have in finding the resources and funding to be an early adopter in open data and public communication, while somehow managing to do cutting edge research in its area of expertise too. Sheesh!

But my biggest beef with the report is that nowhere, in 100 pages of report plus 60 pages of appendices, does it ever piece together the pattern represented by the set of allegations it investigates. Which means it achieves nothing more than being one more exoneration in a very long list of exonerations of climate scientists. It will do nothing to stop the flood of hostile attacks on science, because it never once considers the nature of those attacks. Let’s survey some of the missed opportunities…

I’m pleased to see the report cite some of the research literature on the nature of electronic communication (e.g. the early work of Sara Kiesler et al), but it’s a real pity they didn’t read much of this literature. One problem recognized even in early studies of email communication is the requesters/informers imbalance. Electronic communication makes it much easier for large numbers of people to offload information retrieval tasks onto others, and receivers of such requests find it hard to figure out which requests they are obliged to respond to. They end up being swamped. Which is exactly what happened with that (tiny) research unit in the UK, when a bunch of self-styled auditors went after them.

And similar imbalances pervade everything. For example on p42, we have:

“There continues to be a scientific debate about the reality, causes and uncertainties of climate change that is conducted through the conventional mechanisms of peer-reviewed publication of results, but this has been paralleled by a more vociferous, more polarised debate in the blogosphere and in popular books. In this the protagonists tend to be divided between those who believe that climate is changing and that human activities are contributing strongly to it, and those that are sceptical of this view. This strand of debate has been more passionate, more rhetorical, highly political and one in which each side frequently doubts the motives and impugns the honesty of the other, a conflict that has fuelled many of the views expressed in the released CRU emails, and one that has also been dramatically fuelled by them.” (page 42, para 26)

But the imbalance is clear. This highly rhetorical debate in the blogosphere occurs between, on the one hand, a group of climate scientists with many years training, and whose expertise is considerable (and the report makes a good job of defending their expertise), and on the other hand, a bunch of amateurs, most of whom have no understanding of how science works, and who are unable to distinguish scientific arguments from ideology. And the failure to recognise this imbalance leads the report to conclude that a suitable remedy is to :

“…urge all scientists to learn to communicate their work in ways that the public can access and understand; and to be open in providing the information that will enable the debate, wherever it occurs, to be conducted objectively.” (page 42, para 28)

No, no, no. As I said very strongly earlier this year, this is naive and irresponsible. No scientist can be an effective communicator in a world where people with vested interests will do everything they can to destroy his or her reputation.

Chapter 6 of the report, on the land station temperature record, ought to shut Steve McIntyre up forever. But of course it won’t, because he’s not interested in truth, only in the dogged determination to find fault with climate scientists’ work no matter what. Here are some beautiful quotes:

“To carry out the analysis we obtained raw primary instrumental temperature station data. This can be obtained either directly from the appropriate National Meteorological Office (NMO) or by consulting the World Weather Records (WWR) …[web links elided] … Anyone working in this area would have knowledge of the availability of data from these sources.” (Page 46, paras 13-14)

“Any independent researcher may freely obtain the primary station data. It is impossible for a third party to withhold access to the data.” (Page 48, para 20).

…well, anyone that is, except McIntyre and his followers, who continue to insist, despite all evidence to the contrary, that climate scientists are withholding station data.

And on sharing the code, the report is equally dismissive of the allegations:

“The computer code required to read and analyse the instrumental temperature data is straightforward to write based upon the published literature.  It amounts a few hundred lines of executable code (i.e. ignoring spaces and comments). Such code could be written by any research unit which is competent to reproduce or test the CRUTEM analysis.  For the trial analysis of the Review Team, the code was written in less than two days and produced results similar to other independent analyses. No information was required from CRU to do this.” (Page 51, para 33)

I like the “any research unit which is competent to reproduce or test the CRUTEM analysis” bit. A lovely British way of saying that  the people making allegations about lack of openness are incompetent. And here’s another wonderful British understatement, referring to ongoing criticism of Briffa’s 1992 work:

“We find it unreasonable that this issue, pertaining to a publication in 1992, should continue to be misrepresented widely to imply some sort of wrongdoing or sloppy science.” (page 62, para 32)

Unreasonable? Unreasonable? It’s an outrage, an outrage I tell you!! (translation provided for those who don’t speak British English).

And there’s that failure to address the imbalance again. In examining the allegations from Boehmer-Christiansen, editor of the notoriously low-quality journal Energy and Environment, that the CRU researchers tried to interfere with the peer-review process, we get the following bits of evidence: an email sent by Boehmer-Christiansen to a variety of people with the subject line “Please take note of potetially [sic] serious scientific fraud by CRU and Met Office”, and Jones’ eventual reply to her head of department: “I don’t think there is anything more you can do. I have vented my frustration and have had a considered reply from you”, which leads to the finding:

“We see nothing in these exchanges or in Boehmer-Christiansen’s evidence that supports any allegation that CRU has directly and improperly attempted to influence the journal that she edits. Jones’ response to her accusation of scientific fraud was appropriate, measured and restrained.” (page 66, para 14).

Again, a missed opportunity to comment on the imbalance here. Boehmer-Christiansen is able to make wild and completely unfounded accusations of fraud, and nobody investigates her, while Jones’ reactions to the allegations are endlessly dissected, and in the end everything’s okay, because his response was “appropriate, measured and restrained”. No, that doesn’t make it okay. It means someone failed to ask some serious questions about how and why people like Boehmer-Christiansen can be allowed to get away with continual smearing of respected climate scientists.

So, an entire 160 pages, in which the imbalance is never once questioned – the imbalance between the behaviour that’s expected of climate scientists, and the crap that the denialists are allowed to get away with. Someone has to put a stop to their nonsense, but unfortunately, Muir Russell ducked the responsibility.

Postscript: my interest in software engineering issues makes me unable to let this one pass without comment. The final few pages of the report criticize the CRU for poor software development standards:

“We found that, in common with many other small units across a range of universities and disciplines, CRU saw software development as a necessary part of a researcher’s role, but not resourced in any professional sense. Small pieces of software were written as required, with whatever level of skill the specific researcher happened to possess. No formal standards were in place for: Software specification and implementation; Code reviews; and Software testing” (page 103, para 30).

I don’t dispute this – it is common across small units, and it ought to be fixed. However, it’s a real shame the report doesn’t address the lack of resources and funding for this. But wait. Scroll back a few pages…

“The computer code required to read and analyse the instrumental temperature data is straightforward to write […] It amounts a few hundred lines of executable code […]  For the trial analysis of the Review Team, the code was written in less than two days and produced results similar to other independent analyses.” (page 51, para 33)

Er, several hundred lines of code written in less than 2 days? What, with full software specification, code review, and good quality testing standards? I don’t think so. Ironic that the review team can criticize the CRU software practices, while taking the same approach themselves. Surely they must have spotted the irony?? But, apparently not. The hypocrisy that’s endemic across the software industry strikes again: everyone has strong opinions about what other groups ought to be doing, but nobody practices what they preach.
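For what it’s worth, here is a minimal sketch of the kind of analysis the Review Team describes: compute each station’s anomaly relative to a baseline and average the anomalies into grid boxes. This is my own toy illustration (invented station data, no area weighting, no homogenization or quality control), not the Review Team’s code and certainly not CRUTEM, but it gives a sense of why the core calculation can be short.

```python
import numpy as np

def gridded_anomalies(station_lat, station_lon, station_temp, baseline, box=5.0):
    """Average station temperature anomalies (relative to a per-station
    baseline) into lat/lon grid boxes of size `box` degrees."""
    lat_edges = np.arange(-90, 90 + box, box)
    lon_edges = np.arange(-180, 180 + box, box)
    grid_sum = np.zeros((len(lat_edges) - 1, len(lon_edges) - 1))
    grid_n = np.zeros_like(grid_sum)
    anomalies = station_temp - baseline
    for lat, lon, anom in zip(station_lat, station_lon, anomalies):
        i = int((lat + 90) // box)
        j = int((lon + 180) // box)
        grid_sum[i, j] += anom
        grid_n[i, j] += 1
    with np.errstate(invalid="ignore"):
        return np.where(grid_n > 0, grid_sum / grid_n, np.nan)

# Made-up example: three stations, one month of data.
lats = np.array([51.5, 52.0, -33.9])
lons = np.array([-0.1, 0.5, 151.2])
temps = np.array([15.2, 14.8, 18.1])
baselines = np.array([14.5, 14.2, 17.6])
grid = gridded_anomalies(lats, lons, temps, baselines)
print(np.nanmean(grid))   # crude mean anomaly (no area weighting here)
```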

Gavin beat me to posting the best quote from the CCSM workshop last week – the Uncertainty Prayer. Uncertainty cropped up as a theme throughout the workshop. In discussions about the IPCC process, one issue came up several times: the likelihood that the spread of model projections in the next IPCC assessment will be larger than in AR4. The models are significantly more complex than they were five years ago, incorporating a broader set of earth system phenomena and resolving finer grain processes. The uncertainties in a more complex earth system model have a tendency to multiply, leading to a broader spread.

There is a big concern here about how to communicate this. Does this mean the science is going backwards – that we know less now than we did five years ago (imagine the sort of hay that some of the crazier parts of the blogosphere will make of that)? Well, there has been all sorts of progress in the past five years, much of it to do with understanding the uncertainties. And one result is the realization that the previous generations of models have under-represented uncertainty in the physical climate system – i.e. the previous projections for future climate change were more precise than they should have been. The implications are very serious for policymaking, not because there is any weaker case now for action, but precisely the opposite – the case for urgent action is stronger because the risks are worse, and good policy must be based on sound risk assessment. A bigger model spread means there’s now a bigger risk of more extreme climate responses to anthropogenic emissions. This problem was discussed at a fascinating session at the AGU meeting last year on validating model uncertainty (See: “How good are predictions from climate models?“).

At the CCSM meeting last week, Julia Slingo, chief scientist at the UK Met Office, put the problem of dealing with uncertainty into context, by reviewing the current state of the art in short and long term forecasting, in a fascinating talk “Uncertainty in Weather and Climate Prediction”.

She began with the work of Ed Lorenz. The Lorenz attractor is the prototype chaotic model. A chaotic system is not random, and the non-linear equations of a chaotic system demonstrate some very interesting behaviours. If it’s not random, then it must be predictable, but this predictability is flow dependent – where you are in the attractor will determine where you will go, but some starting points lead to a much more tightly constrained set of behaviours than others. Hence, the spread of possible outcomes depends on the initial state, and some states have more predictable outcomes than others.
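Here is a small sketch of that flow-dependent predictability, using the classic Lorenz (1963) equations (the parameter values are the standard ones; the integration scheme, starting points and perturbation size are my own choices for illustration). An ensemble of slightly perturbed copies is run forward, and the final spread depends on where on the attractor you start:

```python
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz (1963) system; crude, but
    adequate for a short illustrative integration."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

def ensemble_spread(initial_state, n_members=20, n_steps=1000, perturbation=1e-3):
    """Run an ensemble of slightly perturbed copies of the initial state
    and return the standard deviation across members at the end."""
    rng = np.random.default_rng(0)
    members = [initial_state + perturbation * rng.standard_normal(3)
               for _ in range(n_members)]
    for _ in range(n_steps):
        members = [lorenz_step(m) for m in members]
    return np.std(np.array(members), axis=0)

# Two different starting points; the spread of outcomes depends on where
# in the attractor the forecast starts.
print(ensemble_spread(np.array([1.0, 1.0, 20.0])))
print(ensemble_spread(np.array([-10.0, -10.0, 25.0])))
```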

Why stochastic forecasting is better than deterministic forecasting

Much of the challenge in weather forecasting is to sample the initial condition uncertainty. Rather than using a single (deterministic) forecast run, modern weather forecasting makes use of ensemble forecasts, which probe the space of possible outcomes from a given (uncertain) initial state. This then allows the forecasters to assess possible outcomes, estimate risks and possibilities, and communicate risks to the users. Note the phrase “to allow the forecasters to…” – the role of experts in interpreting the forecasts and explaining the risks is vital.

As an example, Julia showed two temperature forecasts for London, using initial conditions for 26 June on two consecutive years, 1994 and 1995. The red curves show the individual members of an ensemble forecast. The ensemble spread is very different in each case, demonstrating that some initial conditions are more predictable than others: one has very high spread of model forecasts, and the other doesn’t (although note that in both cases the actual observations lie within the forecast spread):

Ensemble forecasts for two different initial states (click for bigger)

The problem is that in ensemble forecasting, the root mean squared (rms) error of the ensemble mean often grows faster than the spread, which indicates that the forecast is under-dispersive; in other words, the models don’t capture enough of the internal variability in the system. In such cases, improving the models (by eliminating modeling errors) will lead to increased internal variability, and hence larger ensemble spread.
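Here is a quick sketch of the diagnostic being described. For a single synthetic case, the absolute error of the ensemble mean stands in for the RMSE (which in practice is averaged over many forecasts); if the error grows faster than the ensemble spread, the ensemble is under-dispersive. The data below is made up precisely so that it behaves that way:

```python
import numpy as np

def error_and_spread(ensemble, truth):
    """ensemble: (members, times); truth: (times,).
    Returns the absolute error of the ensemble mean and the ensemble spread
    at each time (for one case; a real diagnostic averages many forecasts)."""
    error = np.abs(ensemble.mean(axis=0) - truth)
    spread = ensemble.std(axis=0, ddof=1)
    return error, spread

# Synthetic under-dispersive ensemble: the members hug each other (small
# spread) while the ensemble mean drifts away from the truth.
rng = np.random.default_rng(1)
truth = np.cumsum(rng.standard_normal(10))
ensemble = 0.5 * truth + 0.2 * np.cumsum(rng.standard_normal((20, 10)), axis=1)
error, spread = error_and_spread(ensemble, truth)
print(np.round(error, 2))   # grows faster than...
print(np.round(spread, 2))  # ...the spread: under-dispersive
```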

One response to this problem is the work on stochastic parameterizations. Essentially, this introduces noise into the model to simulate variability in the sub-grid processes. This can then reduce the systematic model error if it better captures the chaotic behaviour of the system. Julia mentioned three schemes that have been explored for doing this:

  • Random Parameters (RP), in which some of the tunable model parameters are varied randomly. This approach is not very convincing as it indicates we don’t really know what’s going on in the model.
  • Stochastic Convective Vorticity (SCV)
  • Stochastic Kinetic Energy Backscatter (SKEB)

The latter two approaches tackle known weaknesses in the models, at the boundaries between resolved physical processes and sub-scale parameterizations. There is plenty of evidence in recent years that there are upscale energy cascades from unresolved scales, and that parametrizations don’t capture this. For example, in the backscatter scheme, some fraction of dissipated energy is scattered upscale and acts as a forcing for the resolved-scale flow. By including this in the ensemble prediction system, the forecast is no longer under-dispersive.
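As a cartoon of the backscatter idea (not any operational SKEB scheme; the drag term, the backscatter ratio and all the numbers are invented), a fraction of the energy dissipated at unresolved scales is re-injected as a random forcing on the resolved flow, which keeps the ensemble members from collapsing onto each other:

```python
import numpy as np

def step_with_backscatter(u, dt=0.1, drag=0.05, backscatter_ratio=0.2, rng=None):
    """One toy timestep: the resolved flow u loses energy through a drag term
    (standing in for dissipation at unresolved scales); a fraction of that
    dissipated energy is re-injected as a random forcing on the resolved flow."""
    if rng is None:
        rng = np.random.default_rng()
    dissipation = drag * u**2                       # energy lost per unit time
    forcing_amplitude = np.sqrt(backscatter_ratio * dissipation * dt)
    return u - dt * drag * u + forcing_amplitude * rng.standard_normal(u.shape)

# Tiny ensemble of a 1-D "flow": with backscatter the members stay spread out
# instead of all decaying identically towards zero.
rng = np.random.default_rng(2)
ensemble = np.full((10, 50), 5.0)                   # 10 members, 50 grid points
for _ in range(100):
    ensemble = np.array([step_with_backscatter(u, rng=rng) for u in ensemble])
print(ensemble.std(axis=0).mean())                  # non-zero ensemble spread
```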

The other major approach is to increase the resolution of the model. Higher resolution models will explicitly resolve more of the moist processes at sub-kilometer scales, and (presumably) remove this source of model error, although it’s not yet clear how successful this will be.

But what about seasonal forecasting – surely this growth of uncertainty prevents any kind of forecasting? People frequently ask “If we can’t predict weather beyond the next week, why is it possible to make seasonal forecasts?” The reason is that for longer term forecasts, the boundary forcings start to matter more. For example, if you add a boundary forcing to the Lorenz attractor, it changes the time in which the system stays in some part of the attractor, without changing the overall behaviour of the chaotic system. For a weak forcing, the frequency of occurrence of different regimes is changed, but the number and spatial patterns are unchanged. Under strong forcing, even the patterns of regimes are modified as the system goes through bifurcation points. So if we know something about the forcing, we can forecast the general statistics of weather, even if it’s not possible to say what the weather will be at a particular location at a particular time.
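The forced-Lorenz point can be sketched with the same toy system as before: add a constant term to one of the equations as a crude stand-in for a boundary forcing, and count how often the trajectory sits in each “regime” (the positive-x or negative-x wing of the attractor). The forcing values are arbitrary; the point is only that the occupancy statistics shift while the overall structure of the attractor does not:

```python
import numpy as np

def lorenz_forced_step(state, forcing=0.0, dt=0.005,
                       sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Forward-Euler step of the Lorenz system with a constant forcing added
    to the x equation, as a crude stand-in for a boundary forcing."""
    x, y, z = state
    dx = sigma * (y - x) + forcing
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

def regime_occupancy(forcing, n_steps=40000):
    """Fraction of timesteps the trajectory spends in the x > 0 'regime'."""
    state = np.array([1.0, 1.0, 20.0])
    count = 0
    for _ in range(n_steps):
        state = lorenz_forced_step(state, forcing)
        count += state[0] > 0
    return count / n_steps

# Increasing the forcing shifts how often each regime is visited, without
# turning the system into something qualitatively different.
for f in (0.0, 2.0, 5.0):
    print(f, regime_occupancy(f))
```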

Of course, there’s still a communication problem: people feel weather, not the statistics of climate.

Building on the early work of Charney and Shukla (e.g. see their 1981 paper on monsoon predictability), seasonal to decadal prediction using coupled atmosphere-ocean systems does work, whereas 20 years ago, we would never have believed it. But again, we get the problem that some parts of the behaviour space are easier to predict than others. For example, the onset of El Niño is much harder to predict than the decay.

In a fully coupled system, systematic and model-specific errors grow much more strongly. Because the errors can grow quickly, and bias the probability distribution of outcomes, seasonal and decadal forecasts may not be reliable. So we assess the reliability of a given model using hindcasts. Every time you change the model, you have to redo the hindcasts to check reliability. This gives a reasonable sanity check for seasonal forecasting, but for decadal prediction it is challenging, as we have a very limited observational base.

And now, we have another problem: climate change is reducing the suitability of observations from the recent past to validate the models, even for seasonal prediction:

Climate Change shifts the climatology, so that models tuned to 20th century climate might no longer give good forecasts

Hence, a 40-year hindcast set might no longer be useful for validating future forecasts. As an example, the UK Met Office got into trouble for failing to predict the cold winter in the UK for 2009-2010. Re-analysis of the forecasts indicates why: Models that are calibrated on a 40-year hindcast gave only 20% probability of cold winter (and this was what was used for the seasonal forecast last year). However, models that are calibrated on just the past 20-years gave a 45% probability. Which indicates that the past 40 years might no longer be a good indicator of future seasonal weather. Climate change makes seasonal forecasting harder!

Today, the state-of-the-art for longer term forecasts is multi-model ensembles, but it’s not clear this is really the best approach, it just happens to be where we are today. Multi-model ensembles have a number of strengths: Each model is extensively tested by its own community and a large pool of alternative components provides some sampling across structural assumptions. But they are still an ensemble of opportunity – they do not systematically sample uncertainties. Also the set is rather small – e.g. 21 different models. So the sample is too small for determining the distribution of possible changes, and the ensembles are especially weak for predicting extreme events.

There has been a major effort on quantifying uncertainty over the last few years at the Hadley Centre, using a perturbed physics ensemble. This allows for a larger sample: 100s (or even 10,000s in climateprediction.net) of variants of the same model. The poorly constrained model parameters are systematically perturbed, within expert-suggested ranges. But this still doesn’t sample the structural uncertainty in the models, because all the variants are from a single base model. As an example of this work, the UKCP09 project was an attempt to move from uncertainty ranges (as in AR4) to a probability density function (pdf) for likely change. UKCP uses over 400 model projections to compute the pdf. There are many problems with the UKCP (see the AGU discussion for a critique), but it was a step forward in understanding how to quantify uncertainty. [Note: Julia acknowledged weaknesses in both CP.net and the UKCP projects, but pointed out that they are mainly interesting as examples of how forecasting methodology is changing]
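As a sketch of what it means to move from a range to a pdf (the 400 “projections” below are randomly generated, not UKCP09 output), here is how one might turn a set of ensemble projections into a simple kernel density estimate:

```python
import numpy as np

def kernel_density(samples, grid, bandwidth=0.3):
    """Simple Gaussian kernel density estimate of the samples over the grid."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * diffs**2).sum(axis=1) / norm

# Made-up ensemble of projected warming for some region by the 2080s (deg C);
# deliberately skewed so that the range alone hides where the mass is.
rng = np.random.default_rng(4)
projections = rng.normal(3.0, 0.8, size=400) + rng.gamma(1.5, 0.3, size=400)

grid = np.linspace(0, 8, 200)
pdf = kernel_density(projections, grid)
print("range:", projections.min().round(2), "to", projections.max().round(2))
print("mode of the pdf:", grid[np.argmax(pdf)].round(2))
```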

Another approach is to show which factors tend to dominate the uncertainty. For example, a pie chart showing the impact of different sources of uncertainty (model weaknesses, carbon cycle, natural variability, downscaling uncertainty) on the forecast for rainfall in the 2020s vs the 2080s is interesting – for the 2020s, the uncertainty about the carbon cycle is a relatively small factor, whereas for the 2080s it’s a much bigger factor.

Julia suggests it’s time for a coordinated study of the effects of model resolution on uncertainty. Every modeling group is looking at this, but they are not doing standardized experiments, so comparisons are hard.

Here is an example from Tim Palmer. In AR4, WG1 chapter 11 gave an assessment of regional patterns of change in precipitation. For some regions, it was impossible to give a prediction (the white areas), whereas for others, the models appear to give highly confident predictions. But the confidence might be misplaced because many of the models have known weaknesses that are relevant to future precipitation. For example, the models don’t simulate persistent blocking anticyclones very well. Which means that it’s wrong to assume that if most models agree, we can be confident in the prediction. For example, the Athena experiments with very high resolution models (T1259) showed much better blocking behaviour against the observational dataset ERA40. This implies we need to be more careful about selecting models for a multi-model ensemble for certain types of forecast.

The real butterfly effect raises some fundamental unanswered questions about convergence of climate simulations with increasing resolution. Maybe there is an irreducible level of uncertainty in climate change. And if so, what is it? How much will increased resolution reduce the uncertainty? Will things be much better when we can resolve processes at 20km, 2km, or even 0.2km, compared to say 200km? Once we reach a certain resolution (e.g. 20km), is it just as good to represent small scale motions using stochastic equations? And what’s the most effective way to use the available computing resources as we increase the resolution? [There’s an obvious trade-off between increasing the size of the ensemble, and increasing the resolution of individual ensemble members]

Julia’s main conclusion is that Lorenz’ theory of chaotic systems now pervades all aspects of weather and climate prediction. Estimating and reducing uncertainty requires better multi-scale physics, higher resolution models, and more complete observations.

Some of the questions after the talk probed these issues a little more. For example, Julia was asked how to handle policymakers demanding better decadal prediction, when we’re not ready to deliver it. Her response was that she believes higher resolution modeling will help, but that we haven’t proved this yet, so we have to manage expectations very carefully. She was also asked about the criteria to use for including different models in an ensemble – e.g. should we exclude models that don’t conserve physical quantities, that don’t do blocking, etc? For UKCP09, the criteria were global in nature, but this isn’t sufficient – we need criteria that test for skill with specific phenomena such as El Nino. Because the inclusion criteria aren’t clear enough yet, the UKCP project couldn’t give advice on wind in the projections. In the long run, the focus should be on building the best model we can, rather than putting effort into exploring perturbed physics, but we have to balance the needs of users for better probabilistic predictions against the need to get on and develop better physics in the models.

Finally, on the question of interpretation, Julia was asked what if users (of the forecasts) can’t understand or process probabilistic forecasts? Julia pointed out that some users can process probabilistic forecasts, and indeed that’s exactly what they need – for example, the insurance industry. Others use it as input for risk assessment – e.g. water utilities. So we do have to distinguish the needs of different types of users.

The IPCC schedule impacts nearly all aspects of climate science. At the start of this week’s CCSM workshop, Thomas Stocker from the University of Bern, and co-chair of working group 1 of the IPCC, gave an overview of the road toward the fifth assessment report (AR5), due to be released in 2013.

First, Thomas reminded us that the IPCC does not perform science (its job is to assess the current state of the science), but increasingly it stimulates science. This causes some tension though, as curiosity-driven research must remain the priority for the scientific community.

The highly politicized environment also poses a huge risk. There are some groups actively seeking to discredit climate science and damage the IPCC, which means that the rigor of the IPCC procedures is now particularly important. One important lesson from the last year is that there is no procedure for correcting serious errors in the assessment reports. Minor errors are routine, and are handled by releasing errata. But this process broke down for bigger issues such as the Himalayan glacier error.

Despite the critics, climate science is about as transparent as a scientific field can be. Anyone can download a climate model and see what’s in there. The IPCC process is founded on four key values (thanks to the advocacy of Susan Solomon): Rigor, Robustness, Transparency, and Comprehensiveness. However, there are clearly practical limits to transparency. For example, it’s not possible to open up lead author meetings, because the scientists need to be able to work together in a constructive atmosphere, rather than “having miscellaneous bloggers in the room”!

The structure of the IPCC remains the same: three working groups (WG1 on the physical science basis, WG2 on impacts and adaptation, and WG3 on mitigation), along with a task force on GHG inventories.

The most important principles for the IPCC are in articles 2 and 3:

2. “The role of the IPCC is to assess on a comprehensive, objective, open and transparent basis the scientific, technical and socio-economic information relevant to understanding the scientific basis of risk of human-induced climate change, its potential impacts and options for adaptation and mitigation. IPCC reports should be neutral with respect to policy, although they may need to deal objectively with scientific, technical and socio-economic factors relevant to the application of particular policies.”

3. Review is an essential part of the IPCC process. Since the IPCC is an intergovernmental body, review of IPCC documents should involve both peer review by experts and review by governments.

A series of meetings have already occurred in preparation for AR5:

  • Mar 2009: An expert meeting on the science of alternative greenhouse gas metrics, which produced a report.
  • Sept 2009: An expert meeting on detection and attribution, which produced a report and a good practice guidance paper [which itself is a great introduction to how attribution studies are done].
  • Jan 2010: An expert meeting at NCAR on assessing and combining multi-model projections. The report from this meeting is due in a few weeks, and will also include a good practice guide.
  • Jun 2010: A workshop on sea level rise and ice sheet instability, which was needed because of the widespread recognition that AR4 was weak on this issue, perhaps too cautious.
  • And in a couple of weeks, in July 2010, a workshop on consistent treatment of uncertainties and risks. This is a cross-working-group meeting, at which they hope to make progress on getting all three working groups to use the same approach. In AR4, WG1 developed a standardized language for describing uncertainty, but the other working groups have not yet adopted it.

Thomas then identified some important emerging questions leading up to AR5.

  1. Trends and rates of observed climate change, and in particular, the question of whether climate change has accelerated. Many recent papers and reports indicate that it has; the IPCC needs to figure out how to assess this, especially as there are mixed signals. For example, the decadal trend in Arctic sea ice extent is accelerating, but the global temperature anomaly has not accelerated over the same period.
  2. Stability of the West and East Antarctic ice sheets (WAIS and EAIS). There has been much more dynamic change at the margins of these ice sheets, accelerating mass loss, as observed by GRACE. The assessment needs to look into whether these really are accelerating trends, or whether it’s just an artefact of the limited duration of the measurements.
  3. Irreversibilities and abrupt change: how robust and accurate is our understanding? For example, what long-term commitments have already been made in sea level rise? And what about commitments in the hydrological cycle, where some regions (Africa, Europe) might go beyond the range of observed drought within the next couple of decades, and this may be unavoidable.
  4. Clouds and Aerosols, which will have their own entire chapter in AR5. There are still big uncertainties here. For example, low-level clouds are a positive feedback in the north-east Pacific, yet all but one model are unable to simulate this.
  5. Carbon and other biogeochemical cycles. New ice core reconstructions were published just after AR4, and give us more insights into regional carbon cycle footprints caused by abrupt climate change in the past. For example, the ice cores show clear changes in soil moisture and total carbon stored in the Amazon region.
  6. Near-term and long-term projections, for example the question of how reliable the decadal projections are. This is a difficult area. Some people say we already have seamless prediction (from decades to centuries), but Thomas is not yet convinced. For example, there are alarming new results on the number of extreme hot days across southern Europe that need to be assessed – these appear to challenge assumptions about the decadal trends.
  7. Regional issues – e.g. frequency and severity of impacts. Traditionally, the IPCC reports have taken an encyclopedic approach: take each region, and list the impacts in each. Instead, for AR5, the plan is to start with the physical processes, and then say something about the sensitivity within each region to these processes.

Here’s an overview of the planned structure of the AR5 WG1 report:

  • Intro
  • 4 chapters on observations and paleoclimate
  • 2 chapters on process understanding (biogeochemistry and clouds/aerosols)
  • 3 chapters running from forcing to attribution
  • 2 chapters on future climate change and predictability (near term and long term)
  • 2 integration chapters (one on sea level rise, and one on regional issues)

Some changes are evident from AR4. Observations have become more important: they grew to 3 chapters in AR4, and will stay at that in AR5. There will be another crack at paleoclimate, and new chapters on: sea level rise (a serious omission in AR4); clouds and aerosols; the carbon cycle; and regional change. There is also a proposal to produce an atlas, which will include a series of maps summarizing the regional issues.

The final draft of the WG1 report is due in May 2013, with a final plenary in Sept 2013. WG2 will finish in March 2014, and WG3 in April 2014. Finally, the IPCC synthesis report is to be done no later than 12 months after the WG1 report, i.e. by September 2014. There has been pressure to create a process that incorporates new science throughout 2014 into the synthesis report; however, Thomas has successfully opposed this, on the basis that it would cause far more controversy if the synthesis report were not consistent with the WG reports.

The deadlines for published research to be included in the assessment are as follows: papers need to be submitted for publication by 31 July 2012, and must be in press by 15 March 2013. The IPCC has to be very strict about this, because there are people out there who have nothing better to do than wade through all the references in AR4 and check that all of them appeared before the cutoff date.

Of course, these dates are very relevant to the CCSM workshop audience. Thomas urged everyone not to leave this to the last minute; journal editors and reviewers will be swamped if everyone tries to get their papers published just prior to the deadline [although I suspect this is inevitable?].

Finally, there is a significant challenge in communication coming up. For AR5 we’re expecting to see much broader model diversity than in previous assessments, partly because there are more models (and more variants), and partly because the models now include a broader range of earth system processes. This will almost certainly mean a bigger model spread, and hence a likely increase in uncertainty. It will be a significant challenge to communicate the reasons for this to policymakers and a lay audience. Thomas argues that we must not be ashamed to present how science works – that in some cases the uncertainties multiply, so the spread of projections grows, and then, once the models become better constrained by observations, they converge again. But this also poses problems in how we do model elimination and model weighting in ensemble projections. For example, if a particular model shows no sea ice in the year 2000, it probably should be excluded, as this is clearly wrong. But how do we set clear criteria for this?
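
To make the elimination-and-weighting idea concrete, here’s a minimal sketch, entirely my own and not any IPCC or CMIP procedure: screen out ensemble members that fail a basic plausibility check (such as simulating essentially no present-day sea ice), then combine the survivors with an optional set of weights. The thresholds, names and interface are all illustrative.

```python
import numpy as np

def screened_projection(projections, seaice_extent_2000, min_extent=1.0, weights=None):
    """
    projections:         dict of model name -> projected warming (deg C)
    seaice_extent_2000:  dict of model name -> simulated year-2000 sea ice extent (million km^2)
    Models whose simulated extent falls below min_extent are excluded;
    the rest are combined with a (weighted) mean.
    """
    keep = [m for m in projections if seaice_extent_2000[m] >= min_extent]
    values = np.array([projections[m] for m in keep])
    w = np.ones(len(keep)) if weights is None else np.array([weights[m] for m in keep])
    return keep, float(np.average(values, weights=w))
```

The hard part, of course, is not the arithmetic but justifying the choice of `min_extent` and the weights, which is exactly the point Thomas was making.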

I’ve speculated before about the factors that determine the length of the release cycle for climate models. The IPCC assessment process, which operates on a 5-year cycle, tends to dominate everything. But there are clearly other rhythms that matter too. I had speculated that the 6-year gap between the release of CCSM3 and CCSM4 could largely be explained by the demands of the IPCC cycle; however, the NCAR folks might have blown holes in that idea by making three new releases in the last six months; clearly other temporal cycles are at play.

In discussion over lunch yesterday, Archer pointed me to the paper “Exploring Collaborative Rhythm: Temporal Flow and Alignment in Collaborative Scientific Work” by Steven Jackson and co, who point out that while the role of space and proximity has been widely studied in collaborative work, the role of time and patterns of temporal constraints has not. They set out four different kinds of temporal rhythm that are relevant to scientific work:

  • phenomenal rhythms, arising from the objects of study – e.g. annual and seasonal cycles strongly affect when fieldwork can be done in biology/ecology; the development of a disease in an individual patient affects the flow of medical research;
  • institutional rhythms, such as the academic calendar, funding deadlines, the timing of conferences and paper deadlines, etc.
  • biographical rhythms, arising from individual needs – family time, career development milestones, illnesses and vacations, etc.
  • infrastructural rhythms, arising from the development of the buildings and equipment that scientific research depends on. Examples include the launch, operation and expected life of a scientific instrument on a satellite, the timing of software releases, and the development of classification systems and standards.

The paper gives two interesting examples of problems in aligning these rhythms. First, studying long-term phenomena such as river flow on short-term research grants led to mistakes: data collected during an unusually wet period in the early 20th century led to serious deficiencies in water management plans for the Colorado river. Second, for NASA’s Mars Exploration Rover (MER) mission, the decision was taken to put the support team on “Mars time”, as the Martian day is 2.7% longer than the Earth day. But as the team’s daily work cycle drifted away from the normal Earth day, serious tensions arose between the family and social needs of the project team and the demands of the project rhythm.
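
A quick bit of arithmetic (using only the 2.7% figure above) shows why that drift matters: the team’s schedule slides about 39 minutes later every day, and cycles all the way around the Earth clock in a little over a month.

```python
earth_day_minutes = 24 * 60
sol_minutes = earth_day_minutes * 1.027        # Martian sol is ~2.7% longer
drift_per_day = sol_minutes - earth_day_minutes
days_to_wrap = earth_day_minutes / drift_per_day
print(f"~{drift_per_day:.0f} minutes later each day; "
      f"a full lap of the clock every ~{days_to_wrap:.0f} days")
```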

Here’s another example that fascinated me when I was at the NASA software verification lab in the 90s. The Cassini spacecraft took about six years to get to Saturn. Rather than develop all the mission software prior to launch, NASA took the decision to develop only the minimal software needed for launch and navigation, and delayed the start of development of the mission software until just prior to arrival at Saturn. The rationale was that they didn’t want a six-year gap between development and use of this software, during which time the software teams might disperse – they needed the teams in place, with recent familiarity with the code, at the point when the main science missions started.

For climate science, the IPCC process is clearly a major institutional rhythm, but the infrastructural rhythms that arise in model development interact with this in complex ways. I need to spend time looking at the other rhythms as well.

Of all the global climate models, the Community Earth System Model, CESM, seems to come closest to the way an open source community works. The annual CESM workshop, this week in Breckenridge, Colorado, provides an example of how the community works. There are about 350 people attending, and much of the meeting is devoted to detailed discussion of the science and modeling issues across a set of working groups: Atmosphere model, Paleoclimate, Polar Climate, Ocean model, Chemistry-climate, Land model, Biogeochemistry, Climate Variability, Land Ice, Climate Change, Software Engineering, and Whole Atmosphere.

In the opening plenary on Monday, Mariana Vertenstein (who is hosting my visit to NCAR this month) was awarded the 2010 CESM distinguished achievement award for her role in overseeing the software engineering of the CESM. This is interesting for a number of reasons, not least because it demonstrates how much the CESM community values the role of the software engineering team, and the advances that the software engineering working group has made in improving the software infrastructure over the last few years.

Earth system models are generally developed in a manner that’s very much like agile development. Getting the science working in the model is prioritized, with issues such as code structure, maintainability and portability worked in later, as needed. To some extent, this is appropriate – getting the science right is the most important thing, and it’s not clear how much a big upfront design effort would pay off, especially in the early stages of model development, when it’s not clear whether the model will become anything more than an interesting research idea. The downside of this strategy is that as the model grows in sophistication, the software architecture ends up being a mess. As Mariana explained in her talk, coupled models like the CESM have reached a point in their development where this approach no longer works. In effect, a massive refactoring effort is needed to clean up the software infrastructure to permit future maintainability.

Mariana’s talk was entitled “Better science through better software”. She identified a number of major challenges facing the current generation of earth system models, and described some of the changes in the software infrastructure that have been put in place for the CESM to address them.

The challenges are:

1) New system complexity, as new physics and new grids are incorporated into the models. For example, the CESM now has a new land ice model, which, along with the atmosphere, ocean, land surface, and sea ice components, brings the total to five distinct geophysical component models, each operating on different grids, and each with its own community of users. These component models exchange boundary information via the coupler, and the entire coupled model now runs to about 1.2 million lines of code (compare with the previous generation model, CCSM3, now six years old, which had about 330KLoC).

The increasing number of component models increases the complexity of the coupler. It now has to handle regridding (where data such as energy and mass is exchanged between component models with different grids), data merging, atmosphere-ocean fluxes, and conservation diagnostics (e.g. to ensure the entire model conserves energy and mass). Note: older versions of the model were more restricted; for example, the atmosphere, ocean and land surface schemes were all required to use the same grid.
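
As a concrete, if highly simplified, illustration of what a conservation diagnostic does, here’s a sketch that compares the area-weighted global integral of a flux field before and after regridding. The function names and tolerance are mine, not the coupler’s actual API.

```python
import numpy as np

def global_integral(field, cell_areas):
    """Area-weighted global integral of a 2-D field (e.g. an energy flux in W/m^2)."""
    return np.sum(field * cell_areas)

def conservation_check(src_field, src_areas, dst_field, dst_areas, tol=1e-10):
    """Relative mismatch between the global integrals on the source grid and
    on the destination grid after regridding; a conservative remapping should
    keep this within roundoff."""
    src_total = global_integral(src_field, src_areas)
    dst_total = global_integral(dst_field, dst_areas)
    rel_err = abs(dst_total - src_total) / max(abs(src_total), 1e-30)
    return rel_err, rel_err < tol
```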

Users also want to be able to swap in different versions of each major component. For example, a particular run might demand a fully prognostic atmosphere model, coupled with a prescribed ocean parameterization (taken from observational data, for example). Then, within each major component, users might want different configurations: multiple dynamical cores, multiple chemistry modes, etc.

Another source of complexity comes from resolution. Model components now run over a much wider range of resolutions, and the regridding challenges are substantial. And finally, whereas the old model used rectangular latitude-longitude grids, people now want to accommodate many different types of grid.

2) Ultra-high resolution. The trend towards higher resolution grids poses serious challenges for scalability, especially given the massive increase in volume of data being handled. All components (and the coupler) need to be scalable in terms of both memory and performance.

Higher resolution increases the need for more parallelism, and there has been tremendous progress on this in the last few years. A few years back, as part of the DOE/LLNL grand challenge, CCSM3 managed 0.5 simulation years per day, running on 4,000 cores, and this was considered a great achievement. This year, the new version of CESM has successfully run on 80,000 cores, to give 3 simyears per day in a very high resolution model: 0.125° grid for the atmosphere, 0.25° for the land and 0.1° for the ocean.
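
It’s worth turning those figures into a common currency. A quick calculation, using only the numbers quoted above, shows that the new configuration delivers about six times the wall-clock throughput but costs roughly three times as many core-hours per simulated year (and, of course, at far higher resolution, so this says nothing about cost per grid cell):

```python
def core_hours_per_sim_year(cores, sim_years_per_day):
    """Core-hours consumed to advance the model by one simulated year."""
    return cores * 24.0 / sim_years_per_day

ccsm3 = core_hours_per_sim_year(4_000, 0.5)         # ~192,000 core-hours per simulated year
cesm_hires = core_hours_per_sim_year(80_000, 3.0)   # ~640,000 core-hours per simulated year
print(ccsm3, cesm_hires, cesm_hires / ccsm3)        # cost ratio ~3.3
```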

Interestingly, in these highly parallel configurations, the ocean model, POP, is no longer dominant for processing time; because the two of them are coupled sequentially, the sea ice and atmosphere models start to dominate, whereas the ocean model scales more readily.

3) Data assimilation. For weather forecasting models, this has long been standard analysis practice. Briefly, the model state and the observational data are combined at each timestep to give a detailed analysis of the current state of the system, which helps to overcome limitations in both the model and the data, and to better understand the physical processes underlying the observational data. It’s also useful in forecasting, as it allows you to arrive at a more accurate initial state for a forecast run.

In climate modeling, data assimilation is a relatively new capability. The current version of the CESM can do data assimilation in both the atmosphere and ocean. The new framework also supports experiments where multiple versions of the same component are used within a run. For example, the model might have multiple atmosphere components in a single simulation, each coupled with its own instance of the ocean, where one is an assimilation module and the other a prognostic model.
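
For readers unfamiliar with the idea, here’s a toy sketch of the simplest possible assimilation step: nudging the model state toward observations wherever they exist. Real assimilation systems (ensemble Kalman filters, variational methods) compute the weighting from model and observation error statistics; the fixed gain here is just a placeholder, and none of this reflects the actual CESM assimilation interface.

```python
import numpy as np

def nudge(model_state, observations, obs_mask, gain=0.1):
    """
    Toy assimilation step: relax the model state toward observations at the
    points where observations exist (obs_mask is a boolean array). The result
    is an 'analysis' that blends model and data.
    """
    analysis = model_state.copy()
    analysis[obs_mask] += gain * (observations[obs_mask] - model_state[obs_mask])
    return analysis
```

In a forecasting context, the analysis produced by a step like this becomes the initial state for the next forecast run, which is why assimilation improves forecasts as well as diagnoses.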

4) The needs of the user community. Supporting a broad community of model users adds complexity, especially as the community becomes more diverse. The community needs more frequent releases of the model (e.g. more often than every six years!), and people need to be able to merge new releases more easily into their own sandboxes.

These challenges have inspired a number of software infrastructure improvements in the CESM. Mariana described a number of advances.

The old model, CCSM3, was run as multiple executables, one for each major component, exchanging data with a coupler via MPI. And each component used to have its own way of doing coupling. But this kills efficiency – processors end up idling when a component has to wait on data from the others. It’s also very hard in this scheme to understand the time evolution as the model runs, which in turn makes it very hard to debug. And the old approach was notoriously hard to port to different platforms.

The new framework has a top level driver that controls time evolution, with all coupling done at the top level. Then the component models can be laid out across the available processors, either all in parallel, or in a hybrid parallel-sequential mode. For example, atmosphere, land scheme and sea ice modules might be called in sequence, with the ocean model running in parallel with the whole set. The chosen architecture is specified in a single XML file. This brings a number of benefits:

  • Better flexibility for very different platforms;
  • Facilitates model configurations with huge amounts of parallelism across a very large number of processors;
  • Allows the coupler & components to be ESMF compliant, so the model can couple with other ESMF-compliant models;
  • Integrated release cycle – it’s now all one model, whereas in the past each component model had its own separate releases;
  • Much easier to debug, as it’s easier to follow the time evolution.
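
To make the hybrid sequential-parallel layout concrete, here’s a toy sketch of what a single-driver design looks like. Everything in it is illustrative: the component interfaces, the coupler methods and the use of a thread pool are my own stand-ins for the real driver, which lays components out across MPI tasks according to the XML configuration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_coupled(sequential_components, ocean, coupler, n_steps):
    """Toy top-level driver: it owns the clock and does all coupling at the top level."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        for step in range(n_steps):
            # Kick off the ocean for this coupling interval...
            ocean_future = pool.submit(ocean.advance, step)
            # ...while atmosphere, land and sea ice advance in sequence.
            for comp in sequential_components:      # e.g. [atm, lnd, ice]
                inputs = coupler.fields_for(comp, step)
                coupler.collect(comp.advance(step, inputs))
            # Merge (and regrid) the ocean's fluxes back into the coupler state.
            coupler.merge(ocean_future.result())
```

Because the driver alone owns the clock and the coupling, following the time evolution (and hence debugging) is much simpler than in the multi-executable design it replaces.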

The new infrastructure also includes scripting tools that support the process of setting up an experiment, and making sure it runs with optimal performance on a particular platform. For example, the current release includes scripts to create a wide variety of out-of-the-box experiments. It also includes a load balancing tool, to check how much time each component is idle during a run, and new scripts with hints for porting to new platforms, based on a set of generic machine templates.
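
As a rough illustration of the kind of thing such a load-balancing check reports (the numbers and interface here are made up): given each component’s busy time over a coupling interval and the wall-clock time for that interval, the idle fraction of each component’s processors falls out directly.

```python
def idle_fractions(busy_seconds, wall_seconds):
    """Fraction of the interval each component's processors spent waiting."""
    return {name: round(1.0 - busy / wall_seconds, 3)
            for name, busy in busy_seconds.items()}

print(idle_fractions({"atm": 80.0, "ocn": 95.0, "ice": 30.0}, wall_seconds=100.0))
# {'atm': 0.2, 'ocn': 0.05, 'ice': 0.7}  -> the sea ice processors are mostly idle,
# suggesting the processor layout should be rebalanced
```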

The model also has a new parallel I/O library (PIO), which adds a layer of abstraction between the data structures used in each model component and the arrangement of the data when written to disk.

New versions of the model are now being released via the Subversion repository (rather than as a .tar file, as in the past). Hence, users can do an svn merge to pick up the latest release. There have been three model releases since January:

  • CCSM Alpha, released in January 2010;
  • CCSM 4.0 full release, in April 2010;
  • CESM 1.0 released June 2010.

Mariana ended her talk with a summary of the future work – complete the CMIP5 runs for the next round of the IPCC assessment process; regional refinement with scalable grids; extend the data assimilation capability; handle super-parameterization (e.g. include cloud resolving models); add hooks for human dimensions within the models (e.g. to support the DOE program on integrated assessment); and improved validation metrics.

Note: the CESM is the successor to CCSM – the community climate system model. The name change recognises the wider set of earth systems now incorporated into the model.

I had a bit of a gap in blogging over the last few weeks, as we scrambled to pack up our house (we’re renting it out while we’re away), and then of course, the roadtrip to Colorado to start the first of my three studies of software development processes at climate modeling centres. This week, I’m at the CCSM workshop, and will post some notes about the workshop in the next few days. But first, a chance for some reflection.

Ten years ago, when I quit NASA, I was offered a faculty position in Toronto with immediate tenure. The offer was too good to turn down: it’s a great department, with a bunch of people I really wanted to work with. I was fed up with the NASA bureaucracy, the short-termism of the annual budget cycle, and (most importantly) a new boss I couldn’t work with. A tenured academic post was the perfect antidote – I could focus on the long-term research problems that interested me most, without anyone telling me what to study.

(Note: Lest any non-academics think this is an easy life, think again. I spend far more time chasing research funding than actually doing research, and I’m in constant competition with an entire community of workaholics with brilliant minds. It’s bloody hard work.)

Tenure is an interesting beast. It’s designed to protect a professor’s independence and ability to pursue long-term research objectives. It also preserves the integrity of academic researchers: if university administrators, politicians, funders, etc. find a particular set of research results inconvenient, they cannot fire, or threaten to fire, the professors responsible. But it’s also limited. While it ought to protect curiosity-driven research from the whims of political fashion, it only protects the professor’s position (and salary), not the research funding needed for equipment, travel, students, etc. But the important thing is that tenure gives the professor the freedom to direct her own research programme and to decide what research questions to tackle.

Achieving tenure is often a trial by fire, especially in the top universities. After demonstrating your research potential by getting a PhD, you then compete with other PhDs to get a tenure-track position. You have to maintain a sustained research program over six to seven years as a junior professor, publishing regularly in the top journals in your field, and gaining the attention of the top people in your field who might be asked to write letters of support for your tenure case. In judging tenure cases, the trajectory and sustainability of the research programme are taken into account – a publication record that appears to be slowing down over the pre-tenure period is a big problem; if you have several papers in a row rejected, especially towards the end of the pre-tenure period, it might be hard to put together a strong tenure case. The least risky route is to stick with the same topic you studied in your PhD, where you already have the necessary background and where you presumably have also ‘found’ your community.

The ‘finding your community’ part is crucial. Scientific research is very much a community endeavor; the myth of the lone scientist in the lab is dead wrong. You have to figure out early in your research career which subfield you belong in, and get to know the other researchers in that subfield, in order to have your own research achievements recognized. Moving around between communities, or having research results scattered across different communities might mean there is no-one who is familiar enough with your entire body of research to write you a strong letter of support for tenure.

The problem is, of course, that this system trains professors to pick a subfield and stick with it. It tends to stifle innovation, and means that many professors then just continue to work on the same problems throughout the rest of their careers. There’s a positive side to this: some hard scientific problems really do need decades of study to master. On the other hand, most of the good ideas come from new researchers – especially grad students and postdocs; many distinguished scientists did their best work when they were in their twenties, when they were new to the field, and were willing to try out new approaches and question conventions.

To get the most value out of tenure, professors should really use it to take risks: to change fields, to tackle new problems, and especially to do research they couldn’t do when they were chasing tenure. A good example is inter-disciplinary research. It’s hard to do work that spans several recognizable disciplines when you’re chasing tenure – you have to get tenure in a single university department, which usually means you have to be well established in a single discipline. Junior researchers interested in inter-disciplinary research are always at a disadvantage compared to their mono-disciplinary colleagues. But once you make tenure, this shouldn’t matter any more.

The problem is that changing your research direction once you’re an established professor is incredibly hard. This was my experience when I decided a few years ago to switch my research from traditional software engineering questions to the issue of climate change. It meant walking away from an established set of research funding sources, and an established research community, and most especially from an established set of collaborative relationships. The latter I think was particularly hard – colleagues with whom I’ve worked closely for many years still assume I’m interested in the same problems that we’ve always worked on (and, in many ways I still am – I’m trained to be interested in them!). I’m continually invited to co-author papers, to review papers and research proposals, to participate in grant proposals, and to join conference committees in my old field. But to give myself the space to do something very different, I’ve had to be hardheaded and say no to nearly all such invitations. It’s hard to do this without also offending people (“what do you mean you’re no longer interested in this work we’ve devoted our careers to?”). And it’s hard to start over, especially as I need to find new sources of funding, and new collaborators.

One of the things I’ve had to think carefully about is how to change research areas without entirely cutting off my previous work. After many years working on the same set of problems, I believe I know a lot about them, and that knowledge and experience ought to be useful. So I’ve tried to carve out a new research area that allows me to apply ideas that I’ve studied before to an entirely new challenge problem – a change of direction if you like, rather than a complete jump. But it’s enough of a change that I’ve had to find a new community to collaborate with. And different venues to publish in.

Personally, I think this is what the tenure system is made for. Tenured professors should make use of the protection that tenure offers to take risks, and to change their research direction from time to time. And most importantly, to take the opportunity to tackle societal grand challenge problems – the big issues where inter-disciplinary research is needed.

And unfortunately, just about everything about the tenure system and the way university departments and scientific communities operate discourages such moves. I’ve been trying to get many of my old colleagues to apply themselves to climate change, as I believe we need many more brains devoted to the problem. But very few of my colleagues are interested in switching direction like this. Tenure should facilitate it, but in practice, the tenure system actively discourages it.

Congratulations to Jorge, who passed the first part of his PhD thesis defense yesterday with flying colours. Jorge’s thesis is based on a whole series of qualitative case studies of different software development teams (links go to the ones he’s already published):

  • 7 successful small companies (under 50 employees) in the Toronto region;
  • 9 scientific software development groups, in an academic environment;
  • 2 studies of large companies (IBM and Microsoft);
  • 1 detailed comparative study of a company using Extreme Programming (XP) versus a similar-sized company that uses a more traditional development process (both building similar types of software for similar customers).

We don’t have anywhere near enough detailed case studies in software engineering – most claims for the effectiveness of various approaches to software development are based on little more than marketing claims and anecdotal evidence. There has been a push in the last decade or so for laboratory experiments, which are usually conducted along the lines of experiments in psychology: recruit a set of subjects, assign them a programming task, and measure the difference in variables like productivity or software quality when half of them are given some new tool or technique. While these experiments are sometimes useful for insights into how individual programmers work on small tasks, they really don’t tell us much about software development in the wild, where, as Parnas puts it, the interesting challenges are in multi-person development of multi-version software over long time scales. Jorge cites a particular example in his thesis of a controlled study of pair programming, which purports to show that pair programming lowers productivity. Except that it shows no such thing – any claimed benefits of pair programming are unlikely to emerge with subjects who are put together for a single day, but who otherwise have no connection with one another, and no shared context (like, for example, a project they are both committed to).

Each of Jorge’s case studies is interesting, but to me, the theory he uses them to develop is even more interesting. He starts by identifying three different traditions to the study of software development:

  • The process view, in which software construction is treated like a production line, and the details of the individuals and teams who do the construction are abstracted away, allowing researchers to talk about processes and process models, which, it is assumed, can be applied in any organizational context to achieve a predictable result. This view is predominant in the SE literature. The problem, of course, is that the experience and skills of individuals and teams do matter, and the focus on processes is a poor way to understand how software development works.
  • The information flow view, in which much of software development is seen as a problem of sharing information across software teams. This view has become popular recently, as it enables the study of electronic repositories of team communications as evidence of interaction patterns across the team, and leads to a set of theories about how well patterns of communication acts match the technical dependencies in the software. The view is appealing because it connects well with what we know about interdependencies within the software, where clean interfaces and information hiding are important. Jorge argues that the problem with this view is that it fails to distinguish between successful and unsuccessful acts of communication. It assumes that communication is all about transmitting and receiving information, and it ignores problems in reconstructing the meaning of a message, which is particularly hard when the recipient is in a remote location, or is reading it months or years later.
  • The third view is that software development is largely about the development of a shared understanding within teams. This view is attractive because it takes seriously the intensive cognitive effort of software construction, and emphasizes the role of coordination, and the way that different forms of communication can impact coordination. It should be no surprise that Jorge and I both prefer this view.

Then comes the most interesting part. Jorge points out that software teams need to develop a shared understanding of goals, plans, status and context, and that four factors will strongly impact their success in this: proximity (how close the team members are to each other – being in the same room is much more useful than being in different cities); synchrony (talking to each other in (near) realtime is much more useful than writing documents to be read at some later time); symmetry (coordination and information sharing is done best by the people whom it most concerns, rather than imposed by, say, managers); and maturity (it really helps if a team has an established set of working relationships and a shared culture).

This theory leads to a reconceptualization of many aspects of software development, such as the role of tools, the layout of physical space, the value of documentation, and the impact of growth on software teams. But you’ll have to read the thesis to get the scoop on all these…

A wonderful little news story spread quickly around a number of contrarian climate blogs earlier this week, and of course was then picked up by several major news aggregators: a 4th grader in Beeville, Texas had won the National Science Fair competition with a project entitled “Disproving Global Warming”. Denialists rubbed their hands in glee. Even more deliciously, the panel of judges included Al Gore.

Wait, what? Surely that can’t be right? Now, anyone who considers herself a skeptic would have been immediately, well, skeptical. But apparently that word no longer means what it used to mean. It took a real scientist to ask the critical questions and investigate the source of the story: Michael Tobis took the time to drive to Beeville to investigate, as the story made no sense. And sure enough, there’s a letter that’s clearly on fake National Science Foundation letterhead, with no signature, and the NSF has no knowledge of it. Oh, and of course, a quick google search shows that there is no such thing as a national science fair. Someone faked the whole thing (and the good folks at Reddit then dug up plenty of evidence about who).

So, huge kudos to MT for doing what journalists are supposed to do. And kudos to Sarah Taylor, the journalist who wrote the original story, for doing a full followup once she found out it was a hoax. But this story raises the question: now that we live in such an information-rich age, how come so few people can be bothered to check out the evidence about anything any more? Traditional investigative journalism is almost completely dead. The steady erosion of revenue from print journalism means most newspapers do little more than reprint press releases – most of them no longer retain science correspondents at all. And if traditional journalism isn’t doing investigative reporting any more, who will? Bloggers? Many bloggers like to think of themselves as “citizen journalists”. But few bloggers do anything more than repeat stuff they found on the internet, along with strident opinion on it. As Balbulican puts it: Are You A “Citizen Journalist”, or Just An Asshole?

Oh, and paging all climate denialists. Go take some science courses and learn what skepticism really means.