I had lunch last week with Gerhard Fischer at the University of Colorado. Gerhard is director of the Center for Lifelong Learning and Design, and his work focusses on technologies that help people to learn and design solutions to suit their own needs. We talked a lot about meta-design, especially how you create tools that help domain experts (who are not necessarily software experts) to design their own software solutions.

I was describing some of my observations about why climate scientists prefer to write their own code rather than delegating it to software professionals, when Gerhard put it into words brilliantly. He said “You can’t delegate ill-defined problems to software engineers”. And that’s the nub of it. Much (but not all) of the work of building a global climate model is an ill-defined problem. We don’t know at the outset what should go into the model, which processes are important, how to simulate complex physical, chemical and biological processes and their interactions. We don’t know what’s computationally feasible (until we try it). We don’t know what will be scientifically useful. So we can’t write a specification, nor explain the requirements to someone who doesn’t have a high level of domain expertise. The only way forward is to actively engage in the process of building a little, experimenting with it, reflecting on the lessons learnt, and then modifying and iterating.

So the process of building a climate model is a loop of build-explore-learn-build. If you put people into that loop who don’t have the necessary understanding of the science being done with the models, then you slow things down. And as the climate scientists (mostly) have the necessary technical skills, it’s quicker and easier for them to write their own code than to explain to a software engineer what is needed. But there’s a trade-off: the exploratory loop can be traversed quickly, but the resulting code might not be very robust or modifiable. Just as in agile software practices, the aim is to build something that works first, and worry about elegant design later. And that ‘later’ might never come, as the next scientific question is nearly always more alluring than a re-design. Which means the main role for software engineers in the process is to do cleanup operations. Several of the software people I’ve interviewed in the last few months at climate modeling labs described their role as mopping up after the parade (and some of them used more colourful terms than that).

The term meta-design is helpful here, because it specifically addresses the question of how to put better design tools directly into the hands of the climate scientists. Modeling frameworks fit into this space, as do domain-specific languages. But I’m convinced that there’s a lot more scope for tools that raise the level of abstraction, so that modelers can work directly with meaningful building blocks rather than lines of Fortran. And there’s another problem. Meta-design is hard. Too often it produces tools that just don’t do what the target users want. If we’re really going to put better tools into the hands of climate modelers, then we need a new kind of expertise to build such tools: a community of meta-designers who have both the software expertise and the domain expertise in earth sciences.

Which brings me to another issue that came up in the discussion. Gerhard provided me with a picture that helps me explain the issue better (I hope he doesn’t mind me reproducing it here; it comes from his talk “Meta-Design and Social Creativity” given at IEMC 2007):

To create reflective design communities, the software professionals need to acquire some domain expertise, and the domain experts need to acquire some software expertise (diagram by Gerhard Fischer)

Clearly, collaboration between software experts and climate scientists is likely to work much better if each acquires a little of the other’s expertise, if only to enable them to share some vocabulary to talk about the problems. It reduces the distance between them.

At climate modeling labs, I’ve met a number of both kinds of people – i.e. climate scientists who have acquired good software knowledge, and software professionals who have acquired good climate science knowledge. But it seems to me that for climate modeling, one of these transitions is much easier than the other. It seems to be easier for climate scientists to acquire good software skills than it is for software professionals (with no prior background in the earth sciences) to acquire good climate science domain knowledge. That’s not to say it’s impossible, as I have met a few people who have followed this path (but they are rare). It seems to require many years of dedicated work. And there appears to be a big disincentive for many software professionals, as it turns them from generalists into specialists. If you dedicate several years to developing the necessary domain expertise in climate modeling, it probably means you’re committing the rest of your career to working in this space. But the pay is lousy, the programming language of choice is uncool, and mostly you’ll be expected to clean up after the parade rather than star in it.

I went to a workshop earlier this week on “the Future of Software Engineering Research” in Santa Fe. My main excuse to attend was to see how much interest I could raise in getting more software engineering researchers to engage in the problem of climate change – I presented my paper “Climate Change: A Software Grand Challenge”. But I came away from the workshop with very mixed feelings. I met some fascinating people, and had very interesting discussions about research challenges, but overall, the tone of the workshop (especially the closing plenary discussion) seemed to be far more about navel-gazing and doing “more of the same”, rather than rising to new challenges.

The break-out group I participated in focussed on the role of software in addressing societal grand challenges. We came up with a brief list of such challenges: Climate Change; Energy; Safety & Security; Transportation; Health and Healthcare; Livable Mega-Cities. In all cases, we’re dealing with complex systems-of-systems, with all the properties laid out in the SEI report on Ultra-Large Scale Systems – decentralized systems with no clear ownership; systems that undergo continuous evolution while they are being used (you can’t take the system down for maintenance and upgrades); systems built from heterogeneous elements that are constructed at different times by different communities for different purposes; systems where traditional distinctions between developers and users disappear, as the human activity and technical functionality intertwine. And systems where the “requirements” are fundamentally unknowable – these systems simultaneously serve multiple purposes for multiple communities.

I’ve argued in the past that really all software is like this, but that we pretend otherwise by drawing boundaries around small pieces of functionality so that we can ignore the uncertainties in the broader social system in which it will be used. Traditional approaches to software engineering work when we can get away with this game – on those occasions when it’s possible to get local agreement about a specific set of software functions that will help solve a local problem. The fact that software engineers tend to insist on writing a specification is a symptom that they are playing this game. But such agreements/specifications are always local and temporary, which means that software built in this way is frequently disappointing or frustrating to use.

So, for societal grand challenge problems, what is the role of software engineering research, and what kinds of software engineering might be effective? In our break-out group, we talked a lot about examples of emergent successful systems such as Facebook and Wikipedia (and even the web itself), which were built not by any recognizable software development process, but by small groups of people incrementally adding to an evolving infrastructure, each nudging it a little further down an interesting road. And by frequently getting it wrong, and seeking continual improvement when things do go wrong. Software innovation is then an emergent feature in these endeavours, but it is the people and the way they collaborate that matters, rather than any particular approach to software development.

Obviously, software alone cannot solve these societal grand challenges, but software does have a vital role to play: good software infrastructure can catalyze the engagement of multiple communities, who together can tackle the challenges. In our break-out group, we talked specifically about healthcare and climate change – in both cases there are lots of individuals and communities with ideas and enthusiasm, but who are hampered by socio-technical barriers: lack of data exchange standards, lack of appropriate organizational structures, lack of institutional support, lack of a suitable framework for exploratory software development, tools that ignore key domain concepts. It seems increasingly clear that typical governmental approaches to information systems will not solve these problems. You can’t just put out a call for tender and commission construction of an ultra-large scale system; you have to evolve it from multiple existing systems. Witness repeated failures of efforts around shared health records, carbon accounting systems, etc. But governments do need to create the technical infrastructure and nurture the coming together of inter-disciplinary communities to address these challenges, and strategic funding of trans-disciplinary research projects is a key element.

But what was the response at the workshop to these issues? The breakout groups presented their ideas back to the workshop plenary on the final afternoon, and the resulting discussion was seriously underwhelming. Several people (I could characterize them as the “old guard” in the software engineering research community) stood up to speak out against making the field more inter-disciplinary. They don’t want to see the “core” of the field diluted in any way. There were some (unconvincing) arguments that software engineering research has had a stronger impact than most people acknowledge. And a long discussion that the future of software engineering research lies in stronger ties between academic and industrial software engineering. Never mind that increasingly, software is developed outside the “software industry”: e.g. open source projects, scientific software, end-user programmers, community engagement, and of course college students building web tools that go on to take the internet world by storm. All this is irrelevant to the old guard – they want to keep on believing that the only software engineering that matters is that which can be built to a specification by a large software company.

I came away from the workshop with the feeling that this community is in the process of dooming itself to irrelevancy. But then, as was pointed out to me over lunch today, the people who have done the best under the existing system are unlikely to want to change it. Innovation in software research won’t come from the distinguished senior people in the field…

Here are some climate model coding standards that I’ve collected over the last few months:

It’s encouraging that most modelling centres have developed detailed coding standards, but it’s a shame that most of them had to roll their own. The PRISM project is an exception – as many of the modelling labs across Europe were members of the PRISM project, some of these labs now use the PRISM coding rules.

Two followup tasks I hope to get to soon – (1) analyze how much these different standards overlap/differ, and (2) measure how much the model codes adhere to the standards.

16/11/2010 Update: The UK Met Office standard was an old version that was never publicly released, so I’ve removed the link, at the request of the UKMO. I’ll post a newer version if I can sort out the permissions. I’ve added MPI-M’s ICON standards to the list.

Reading through the schedule for the AGU fall meeting this December, I came across the following session, scheduled for the final day of the conference (Dec 17). What a great line-up of speakers (I’ve pasted in the abstracts, as they’re hard to link to on the AGU’s meeting schedule):

U52A Climate Change Adaptation:

  • 10:20AM Jim Hansen (NASA) “State of Climate Change Science: Need for Adaptation and Mitigation” (Invited)
    Observations of on-going climate change, paleoclimate data, and climate simulations all concur: human-made greenhouse gases have set Earth on a path to climate change with dangerous consequences for humanity. We show that the matter is urgent and a moral issue that pits the rich and powerful against the young and unborn, against the defenseless, and against nature. Adaptation can only partially ameliorate the effects, as governments are failing to protect the public interest and failing in their duty to provide young people equal protection of the laws. We quantify the reduction pathway for fossil fuel emissions that is required to restore Earth’s energy balance and stabilize climate. We show that rapid changes in emission pathways are essential to avoid morally unacceptable adaptation requirements.
  • 10:50AM Richard Alley (Penn State U) “Ice in the Hot Box—What Adaptation Challenges Might We Face?” (Invited)
    Warming is projected to reduce ice, despite the tendency for increased precipitation. The many projected impacts include amplification of warming, sea-ice shrinkage opening seaways, and loss of water storage in snowpacks. However, sea-level rise may combine the largest effects with the greatest uncertainties. Rapid progress in understanding ice sheets has not yet produced projections with appropriately narrow uncertainties and high confidence to allow detailed planning. The range of recently published scaling arguments and back-of-the-envelope calculations is wide but often includes 1 m of rise this century. Steve Schneider’s many contributions on dangerous anthropogenic influence and on decision-making in the face of uncertainty help provide context for interpreting these preliminary and rapidly evolving results.
  • 11:10AM Ken Caldeira (Stanford) “Adaptation to Impacts of Greenhouse Gases on the Ocean” (Invited)
    Greenhouse gases are producing changes in ocean temperature and circulation, and these changes are already adversely affecting marine biota. Furthermore, carbon dioxide is absorbed by the oceans from the atmosphere, and this too is already adversely affecting some marine ecosystems. And, of course, sea-level rise affects both what is above and below the waterline.
    Clearly, the most effective approach to limit the negative impacts of climate change and acidification on the marine environment is to greatly diminish the rate of greenhouse gas emissions. However, there are other measures that can be taken to limit some of the negative effects of these stresses in the marine environment.
    Marine ecosystems are subject to multiple stresses, including overfishing, pollution, and loss of coastal wetlands that often serve as nurseries for the open ocean. The adaptive capacity of marine environments can be improved by limiting these other stresses.
    If current carbon dioxide emission trends continue, for some cases (e.g., coral reefs), it is possible that no amount of reduction in other stresses can offset the increase in stresses posed by warming and acidification. For other cases (e.g., blue-water top-predator fisheries), better fisheries management might yield improved population health despite continued warming and acidification.
    In addition to reducing stresses so as to improve the adaptive capacity of marine ecosystems, there is also the issue of adaptation in human communities that depend on this changing marine environment. For example, communities that depend on services provided by coral reefs may need to locate alternative foundations for their economies. The fishery industry will need to adapt to changes in fish abundance, timing and location.
    Most of the things we would like to do to increase the adaptive capacity of marine ecosystems (e.g., reduce fishing pressure, reduce coastal pollution, preserve coastal wetlands) are things that would make sense to do even in the absence of threats from climate change and ocean acidification. Therefore, these measures represent “no regrets” policy options for the marine environment.
    Nevertheless, even with adaptive policies in place, continued greenhouse gas emissions increasingly risk damaging marine ecosystems and the human communities that depend on them.
  • 11:30AM Alan Robock (Rutgers) “Geoengineering and adaptation”
    Geoengineering by carbon capture and storage (CCS) or solar radiation management (SRM) has been suggested as a possible solution to global warming. However, it is clear that mitigation should be the main response of society, quickly reducing emissions of greenhouse gases. While there is no concerted mitigation effort yet, even if the world moves quickly to reduce emissions, the gases that are already in the atmosphere will continue to warm the planet. CCS, if a system that is efficacious, safe, and not costly could be developed, would slowly remove CO2 from the atmosphere, but this will have a gradual effect on concentrations. SRM, if a system could be developed to produce stratospheric aerosols or brighten marine stratocumulus clouds, could be quickly effective in cooling, but could also have so many negative side effects that it would be better not do it at all. This means that, in spite of a concerted effort at mitigation and to develop CCS, there will be a certain amount of global warming in our future. Because CCS geoengineering will be too slow and SRM geoengineering is not a practical or safe solution to geoengineering, adaptation will be needed. Our current understanding of geoengineering makes it even more important to focus on adaptation responses to global warming.
  • 11:50AM Olga Wilhelmi (NCAR) “Adaptation to heat health risk among vulnerable urban residents: a multi-city approach”
    Recent studies on climate impacts demonstrate that climate change will have differential consequences in the U.S. at the regional and local scales. Changing climate is predicted to increase the frequency, intensity and impacts of extreme heat events prompting the need to develop preparedness and adaptation strategies that reduce societal vulnerability. Central to understanding societal vulnerability, is population’s adaptive capacity, which, in turn, influences adaptation, the actual adjustments made to cope with the impacts from current and future hazardous heat events. To-date, few studies have considered the complexity of vulnerability and its relationship to capacity to cope with or adapt to extreme heat. In this presentation we will discuss a pilot project conducted in 2009 in Phoenix, AZ, which explored urban societal vulnerability and adaptive capacity to extreme heat in several neighborhoods. Household-level surveys revealed differential adaptive capacity among the neighborhoods and social groups. In response to this pilot project, and in order to develop a methodological framework that could be used across locales, we also present an expansion of this project into Houston, TX and Toronto, Canada with the goal of furthering our understanding of adaptive capacity to extreme heat in very different urban settings. This presentation will communicate the results of the extreme heat vulnerability survey in Phoenix as well as the multidisciplinary, multi-model framework that will be used to explore urban vulnerability and adaptation strategies to heat in Houston and Toronto. We will outline challenges and opportunities in furthering our understanding of adaptive capacity and the need to approach these problems from a macro to a micro level.
  • 12:05PM Anthony Socci (US EPA) “An Accelerated Path to Assisting At-Risk Communities Adapt to Climate Change”
    Merely throwing money at adaptation is not development. Nor can the focus of adaptation assistance be development alone. Rather, adaptation assistance is arguably best served when it is country- or community-driven, and the overarching process is informed and guided by a set of underlying principles or a philosophy of action that primarily aims at improving the lives and livelihoods of affected communities.
    In the instance of adaptation assistance, I offer the following three guiding principles: 1. adaptation is at its core, about people; 2. adaptation is not merely an investment opportunity or suite of projects but a process, a lifestyle; and 3. adaptation cannot take place by proxy; nor can it be imposed on others by outside entities.
    With principles in hand, a suggested first step toward action is to assess what resources, capacity and skills one is capable of bringing to the table and whether these align with community needs. Clearly issues of scale demand a strategic approach in the interest of avoiding overselling and worse, creating false expectations. And because adaptation is a process, consider how best to ensure that adaptation activities remain sustainable by virtue of enhancing community capacity, resiliency and expertise should assistance and/or resources dwindle or come to an end.
    While not necessarily a first step, community engagement is undoubtedly the most critical element in any assistance process, requiring sorting out and agreeing upon terms of cooperation and respective roles and responsibilities, aspects of which should include discussions on how to assess the efficacy of resource use, how to assess progress, success or outcomes, what constitutes same, and who decides. It is virtually certain that adaptation activities are unlikely to take hold or maintain if they are not community led, community driven or community owned. There is no adaptation by proxy or fiat.
    It’s fair to ask at this point, how might one know what communities and countries need, what and where the opportunities are to assist countries and communities in adapting to climate change, and how might one get started? One of the most effective and efficient ways of identifying community/country needs, assistance opportunities and community/country entry points is to search the online archive of National Adaptation Programmes of Action (NAPAs) that many of the least developed countries have already assembled in conformance with the UNFCCC process. Better still perhaps, consider focusing on community-scale assessments and adaptation action plans that have already been compiled by various communities seeking assistance as national plans are unlikely to capture the nuances and variability of community needs. Unlike NAPAs, such plans are not archived in a central location. Yet clearly, community-scale plans in particular, not only represent an assessment of community needs and plans, presumptively crafted by affected communities, but also represent opportunities to align assistance resources and capacity with community needs, providing the basis for engaging affected communities in an accelerated process. Simply stated, take full advantage of the multitude of assessment and planning efforts that communities have already engaged in on their own behalf.

After an exciting sabbatical year spent visiting a number of climate modeling centres, I’ll be back to teaching in January. I’ll be introducing two brand new courses, both related to climate modeling. I already blogged about my new grad course on “Climate Change Informatics”, which will cover many current research issues to do with software and data in climate science.

But I haven’t yet mentioned my new undergrad course. I’ll be teaching a 199 course in January, which I’ve never done before. 199 courses are first-year seminar courses, open to all new students across the faculty of arts and science, intended to encourage critical thinking, communication and research skills. They are run as small group seminar courses (enrolment is capped at 24 students). As this is my first one, I’ve no idea what to expect – I’m hoping for an interesting mix of students with different backgrounds, so we can spend some time attacking the theme of the course from different perspectives. Here’s my course description:

“Climate Change: Software, Science and Society”

This course will examine the role of computers and software in understanding climate change. We will explore the use of computer models to build simulations of the global climate, including a historical view of the use of computer models to understand weather and climate, and a detailed look at the current state of computer modelling, especially how global climate models are tested, what kinds of experiments are performed with them, how scientists know they can trust the models, and how they deal with uncertainty. The course will also explore the role of computer models in helping to shape society’s responses to climate change, in particular, what they can (and can’t) tell us about how to make effective decisions about government policy, international treaties, community action and the choices we make as individuals. The course will take a cross-disciplinary approach to these questions, looking at the role of computer models in the physical sciences, environmental science, politics, philosophy, sociology and economics of climate change. However, students are not expected to have any specialist knowledge in any of these fields prior to the course.

If all goes well, I plan to include some hands-on experimentation with climate models, perhaps using EdGCM (or even CESM if I can simplify the process of installing it and running it for them). We’ll also look at how climate models are perceived in the media and blogosphere (that will be interesting!) and compare these perceptions to what really goes on in climate modelling labs. Of course, the nice thing about a small seminar course is that I can be flexible about responding to the students’ own interests. I’m really looking forward to this…

Here’s a very nice video explaining the basics of how climate models work, produced by the folks at IPSL in Paris. This version is French with English subtitles – for the francophones out there, you’ll notice the narration is a little more detailed than the subtitles. I particularly like the bit where the earth grid is unpeeled and fed into the supercomputers:

[Qt:http://www.cs.toronto.edu/~sme/movies/ipsl-modeling-med.mov 480 360]

The original (without the English subtitles) is here: http://www.youtube.com/user/CEADSMCOM

In my last post, I described our firsthand experience of flooding in Venice, and pondered the likely impact of climate change on Venice in the future. But that wasn’t our only firsthand experience of the impacts of climate change on our travels this summer. Having visited NCAR in July this year, we decided to come back to Boulder for the rest of the fall, to give me a chance to do more followup interviews with the NCAR folks, while I write up the findings from my studies of the software development processes for climate models.

Back in August I found a great house for us to rent, up in the mountains at Gold Hill. Shortly after I paid the deposit, I discovered the house was right in the middle of one of the most devastating forest fires in Colorado’s history. The fire, now known as the Fourmile Canyon fire, started on September 6, 2010, burned for over a week, affecting 6,181 acres, and destroying 169 homes. In terms of acreage, it wasn’t the biggest fire ever in Colorado, but in terms of destruction of property and damage costs, it was the worst ever.

I first heard about the fire while I was attending the Surface Temperature Record workshop in Exeter in September, and only then because of a conversation at dinner with some of the NCAR folks whose homes were in the evacuation zone. We spent the next few days wondering whether we’d have somewhere to live this fall after all, and trying to trace the path of the fire on various collaborative maps created by those on the scene. Not that we were affected anywhere near as much as the people who were evacuated, many of whom lost their homes and everything in them. But it gave us a taste of the impact of these massive forest fires on the communities who are affected.

Amazingly, the house we’re renting survived, even though several of the neighbouring houses burned down. Indeed, it seems amazing just how random the fire was – several patches of ground a few hundred yards from our house have been burned, but almost everything we can see from the house is untouched. The satellite images seemed to show huge areas completely devastated, but in reality, the affected area is now a real patchwork of healthy trees and burned sections.

Burned trees on Sunshine Canyon Drive

But this patchwork effect is actually easy to understand once you build a good computational model. I particularly like this NCAR simulation of the spread of forest fire. Notice how the prevailing winds (shown by the arrows) push the fire forward, but also how the updraft from the fire affects the wind pattern to the sides and in the path of the fire, effectively funnelling it into a narrower and narrower path. This certainly corresponds to the stripes of fire damage now visible in the area of the Fourmile Canyon fire, and explains why the fire damage seems so patchy.

Patches of burned and unburned trees, from Sunshine Canyon Drive


As this fire was unusually large by Colorado standards, I wondered about the impact of climate change. In particular, I thought the damage caused by Mountain Pine Beetles might be to blame. When we drove up to Breckenridge in July, the kids noticed that many of the trees were dead, and we googled a little back then to discover it was because the hotter, drier summers were encouraging the spread of the pine beetles, and weakening the trees’ defences. And from a major study published in September this year, it’s clear that climate change is a major factor, and the destruction to pine forests across the North American Rockies will only get worse as climate change progresses.

So I figured all those dead trees would just encourage bigger wildfires. But, as usual with climate change, it’s not that simple. In particular, the areas damaged by fire this year don’t correlate with the areas most damaged by the beetles. It looks like the trees killed by beetles are actually less susceptible to fire, because the needles drop to the forest floor and decompose fairly quickly, while the trees lose the oils that encourage fire in the tree canopy. But although the beetle damage doesn’t cause the fires, climate change affects both, because the hotter, drier summers increase both the spread of the beetles and the likelihood of fires.

I took a break from blogging for the last few weeks to take a vacation with the family in Europe. We fell in love with Venice, a city full of charming alleyways and canals, with no wheeled transport of any kind. Part of the charm is the dilapidated, medieval feel to the place – the buildings are subsiding, their facias are crumbling, and most of the city’s infrastructure doesn’t work very well. In fact, given what a sorry state the whole city is in, I’m surprised how much I fell in love with the place.

But one thing we didn’t expect was that Venice flooded while we were there. It turns out that several times a year, particularly during the high spring and autumn tides, the meteorological conditions are such that more water than usual is driven into the lagoon, and the high tide washes over the canal sides, across the sidewalks, and into the houses and shops:

The locals all take this in their stride, don their long boots, and get to work pumping it all out of the buildings again. The tourists stand there looking bewildered. But the kids loved it:

High tide in Venice, Oct 5th, 2010

In Venice, it’s just another Acqua Alta. I’d heard about Venice sinking, given that the buildings sit on wooden rafts, which in turn are supported by wooden pillars driven deep into the soft mud on the lagoon bottom. And of course, I know that sea level rise due to global warming threatens many of the world’s coastal cities. But I didn’t realise just how low Venice really is, and the flooding we saw got me thinking again about whether the future is already here.

The last IPCC report forecasts up to 59cm of sea level rise by the end of this century, due to thermal expansion and melting glaciers. And as we know, the IPCC numbers exclude the contribution of the Greenland and Antarctic ice sheets, which together could be considerably more, and the report also fudges the point that sea level rise won’t magically stop in 2100. Which means that Venice, a city that’s around 1500 years old, is very unlikely to survive into the twenty-second century.

But like all attempts to pin down the impacts of climate change, it gets complicated. It turns out that Acqua Alta isn’t a recent thing – it has occurred throughout Venice’s history. Technically, Acqua Alta occurs when the high tide is more than 90cm above the average sea level (actually, the average as was measured in the year 1897, according to Wikipedia). In the foreign media, floods in Venice are typically portrayed in breathless terms as a disaster (see HuffPost for some dramatic photos from last winter). The locals don’t see it that way at all, and get furious at these media reports as they damage the tourist trade on which Venice depends almost entirely.

One problem is that the media reports confuse the measures. As I said, the floods are measured in terms of height above an 1897 sea level average. These days, even low tides are often above this baseline too. Here’s the forecast for the next 48 hours:

Venice tides for Oct 29-31, 2010, from http://www.comune.venezia.it/flex/cm/pages/ServeBLOB.php/L/IT/IDPagina/1748

As you can see, the sea level is expected to vary between low tides around 0cm and high tides in the range 50-75cm, which is classified as a normal high tide for Venice. A high tide of up to 90cm causes almost no flooding, while one of +150cm floods about 2/3 of the city – this happens once every few years. The confusion in the media is that +150cm is about 5 feet; so the papers duly report Venice as being under 5 feet of water. But really the water is rarely more than ankle deep, as the flooding is only the difference between the canal sides and the high water. On the day we took these photos (5th Oct 2010), the high tide reached about 107cm, enough to flood about 14% of the city, but as you can see, the actual flood is only a few centimeters deep.

But a sea level rise of +50cm due to climate change shifts things so that every high tide will flood a significant proportion of the city. Flooding twice a day throughout the year is a very different proposition from a little light flooding a few times in the spring and fall.
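To make that arithmetic concrete, here is a minimal sketch in Python. It uses only the figures quoted above (the 90cm threshold, the 50-75cm range of normal high tides, and a +50cm rise); the three sample tide heights and the function names are my own illustrative choices, not part of the official forecast. Note that it computes the excess over the 90cm threshold, not the depth of water at any particular spot, since that depends on the local height of the canal sides.

```python
# Figures quoted in the post, in cm above the 1897 datum.
FLOOD_THRESHOLD_CM = 90                  # tides up to this level cause almost no flooding
SEA_LEVEL_RISE_CM = 50                   # the scenario discussed above
# Illustrative sample values within the 50-75cm range of normal high tides.
FORECAST_HIGH_TIDES_CM = [50, 60, 75]


def excess_over_threshold(tide_cm, threshold_cm=FLOOD_THRESHOLD_CM):
    """Return how far a tide exceeds the flooding threshold (0 if it does not)."""
    return max(0, tide_cm - threshold_cm)


def count_flooding_tides(tides_cm, rise_cm=0):
    """Count the tides that exceed the threshold once sea level rise is added."""
    return sum(1 for tide in tides_cm if excess_over_threshold(tide + rise_cm) > 0)


today = count_flooding_tides(FORECAST_HIGH_TIDES_CM)
future = count_flooding_tides(FORECAST_HIGH_TIDES_CM, SEA_LEVEL_RISE_CM)
print(f"high tides over the threshold today: {today} of {len(FORECAST_HIGH_TIDES_CM)}")
print(f"after +{SEA_LEVEL_RISE_CM}cm of sea level rise: {future} of {len(FORECAST_HIGH_TIDES_CM)}")
```

With these sample values, none of today’s normal high tides crosses the threshold, while all of them do once 50cm is added.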

Can Venice be saved? MOSE, a large and controversial flood barrier project, has been under construction for the last few years, and is anticipated to be ready by 2012. It aims to protect Venice with automatic flood barriers around the entrance to the lagoon. The project has been severely criticized for its high cost, for its impact on the lagoon ecosystems, and because it doesn’t provide an incremental solution – if sea levels continue to rise, they will overwhelm the barriers, and there’s no obvious way to extend them. The design for the barriers is based on the IPCC projections of up to 60cm sea level rise (although I haven’t been able to find any detailed specifications of exactly what height of tide they will work for). The problem is, if the IPCC reports underestimate sea level rise (and increasingly it looks like they do), then a vast multi-billion dollar project will only buy Venice a few more decades. The techno-optimism of the engineers who designed MOSE seems to be symptomatic of a broader mindset when it comes to climate change, which says we can just invent our way out of the problem. It would be nice if it’s correct, but based on the science, I wouldn’t bet on it.

For many decades, computational speed has been the main limit on the sophistication of climate models. Climate modelers have become one of the most demanding groups of users for high performance computing, and access to faster and faster machines drives much of the progress, permitting higher resolution models and more earth system processes being explicitly resolved in the models. But from my visits to NCAR, MPI-M and IPSL this summer, I’m learning that growth in volumes of data handled is increasingly a dominant factor. The volume of data generated from today’s models has grown so much that supercomputer facilities find it hard to handle.

Currently, the labs are busy with the CMIP5 runs that will form one of the major inputs to the next IPCC assessment report. See here for a list of the data outputs required from the models (and note that the requirements were last changed on Sept 17, 2010 – well after most centers have started their runs; after all, it will take months to complete the runs, and the target date for submitting the data is the end of this year).

Climate modelers have requirements that are somewhat different from most other users of supercomputing facilities anyway:

  • very long runs – e.g. runs that take weeks or even months to complete;
  • frequent stop and restart of runs – e.g. the runs might be configured to stop once per simulated year, at which point they generate a restart file, and then automatically restart, so that intermediate results can be checked and analyzed, and because some experiments make use of multiple model variants, initialized from a restart file produced partway through a baseline run.
  • very high volumes of data generated – e.g. the CMIP5 runs currently underway at IPSL generate 6 terabytes per day, and in postprocessing, this goes up to 30 terabytes per day. Which is a problem, given that the NEC SX-9 being used for these runs has a 4 terabyte work disk and a 35 terabyte scratch disk. It’s getting increasingly hard to move the data to the tape archive fast enough.

Everyone seems to have underestimated the volumes of data generated from these CMIP5 runs. The implication is that data throughput rates are becoming a more important factor than processor speed, which may mean that climate computing centres require a different architecture than most high performance computing centres offer.
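A quick back-of-envelope calculation shows why throughput becomes the bottleneck. The sketch below uses the IPSL figures quoted above (6 and 30 terabytes per day, and a 35 terabyte scratch disk); treating a terabyte as 10^12 bytes is my assumption, and the function names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12   # assumption: decimal terabytes; the post does not specify
BYTES_PER_MB = 10**6


def sustained_rate_mb_per_s(tb_per_day):
    """Average transfer rate (MB/s) needed to move tb_per_day off the machine."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_MB


def days_until_full(disk_tb, tb_per_day):
    """Days before a disk of disk_tb fills if nothing is moved off it."""
    return disk_tb / tb_per_day


raw_rate = sustained_rate_mb_per_s(6)        # model output
post_rate = sustained_rate_mb_per_s(30)      # after postprocessing
scratch_days = days_until_full(35, 30)       # 35 TB scratch disk

print(f"raw output:        {raw_rate:.0f} MB/s sustained")
print(f"postprocessed:     {post_rate:.0f} MB/s sustained")
print(f"scratch disk full: {scratch_days:.1f} days without archiving")
```

In other words, the postprocessed output needs roughly 350 MB/s moved to the archive around the clock, and the scratch disk holds little more than a day of it.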

Anyway, I was going to write more about the infrastructure needed for this data handling problem, but Bryan Lawrence beat me to it, with his presentation to the NSF cyberinfrastructure “data task force”. He makes excellent points about the (lack of) scalability of the current infrastructure, and the social and cultural issues with questions of how people get credit for the work they put into this infrastructure, and the issues of data curation and trust. Which means the danger is we will create a WORN (write-once, read-never) archive with all this data…!

This will keep me occupied with good reads for the next few weeks – this month’s issue of the journal Studies in History and Philosophy of Modern Physics is a special on climate modeling. Here’s the table of contents:

Some very provocative titles there. I’m curious to see how much their observations cohere with my own…

I’ve been meaning to write a summary of the V&V techniques used for Earth System Models (ESMs) for ages, but never quite got round to it. However, I just had to put together a piece for a book chapter, and thought I would post it here to see if folks have anything to add (or argue with).

Verification and Validation for ESMs is hard because running the models is an expensive proposition (a fully coupled simulation run can take weeks to complete), and because there is rarely a “correct” result – expert judgment is needed to assess the model outputs.

However, it is helpful to distinguish between verification and validation, because the former can often be automated, while the latter cannot. Verification tests are objective tests of correctness. These include basic tests (usually applied after each code change) that the model will compile and run without crashing in each of its standard configurations, that a run can be stopped and restarted from the restart files without affecting the results, and that identical results are obtained when the model is run using different processor layouts. Verification would also include the built-in tests for conservation of mass and energy over the global system on very long simulation runs.
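As an illustration of how mechanical these checks are, here is a minimal sketch of a restart test in Python. The “model” is a toy (a chaotic logistic map standing in for real physics), and `step`, `run` and `restart_test` are hypothetical names of my own; a real ESM would write its restart file to disk rather than to an in-memory pickle.

```python
import pickle


def step(state):
    """Advance the toy model by one timestep (a logistic map, not real physics)."""
    return {"t": state["t"] + 1, "x": 3.9 * state["x"] * (1.0 - state["x"])}


def run(state, n_steps):
    """Run the model for n_steps from the given state and return the final state."""
    for _ in range(n_steps):
        state = step(state)
    return state


def restart_test(initial_state, n_steps, stop_at):
    """Check that stopping at stop_at and restarting reproduces an uninterrupted run."""
    straight_through = run(initial_state, n_steps)
    restart_file = pickle.dumps(run(initial_state, stop_at))    # write the restart "file"
    resumed = run(pickle.loads(restart_file), n_steps - stop_at)  # restart and finish
    return straight_through == resumed


passed = restart_test({"t": 0, "x": 0.2}, n_steps=1000, stop_at=365)
print("restart test passed" if passed else "restart test FAILED")
```

Because the toy model is chaotic, any state that failed to survive the round trip through the restart file exactly would show up as a different final answer, which is what makes this kind of test a sharp one.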

In contrast, validation refers to science tests, where subjective judgment is needed. These include tests that the model simulates a realistic, stable climate, given stable forcings, that it matches the trends seen in observational data when subjected to historically accurate forcings, and that the means and variations (e.g. seasonal cycles) are realistic for the main climate variables (e.g. see Phillips et al, 2004).

While there is an extensive literature on the philosophical status of model validation in computational sciences (see for example, Oreskes et al (1994); Sterman (1994); Randall and Wielicki (1997); Stehr (2001)), much of it bears very little relation to practical techniques for ESM validation, and very little has been written on practical testing techniques for ESMs. In practice, testing strategies rely on a hierarchy of standard tests, starting with the simpler ones, and building up to the most sophisticated.

Pope and Davies (2002) give one such sequence for testing atmosphere models:

  • Simplified tests – e.g. reduce 3D equations of motion to 2D horizontal flow (e.g. a shallow water testbed). This is especially useful if the reduction has an analytical solution, or if a reference solution is available. It also permits assessment of relative accuracy and stability over a wide parameter space, and hence is especially useful when developing new numerical routines.
  • Dynamical core tests – test for numerical convergence of the dynamics with physical parameterizations replaced by a simplified physics model (e.g. no topography, no seasonal or diurnal cycle, simplified radiation).
  • Single-column tests – allows testing of individual physical parameterizations separately from the rest of the model. A single column of data is used, with horizontal forcing prescribed from observations or from idealized profiles. This is useful for understanding a new parameterization, and for comparing interaction between several parameterizations, but doesn’t cover interaction with large-scale dynamics, nor interaction with adjacent grid points. This type of test also depends on availability of observational datasets.
  • Idealized aquaplanet – test the fully coupled atmosphere-ocean model, but with idealized sea-surface temperatures at all grid points. This allows for testing of numerical convergence in the absence of complications of orography and coastal effects.
  • Uncoupled model components tested against realistic climate regimes – test each model component in stand-alone mode, with a prescribed set of forcings. For example, test the atmosphere on its own, with prescribed sea surface temperatures, sea-ice boundary conditions, solar forcings, and ozone distribution. Statistical tests are then applied to check for realistic mean climate and variability.
  • Double-call tests. Run the full coupled model, and test a new scheme by calling both the old and new scheme at each timestep, but with the new scheme’s outputs not fed back into the model. This allows assessment of the performance of the new scheme in comparison with older schemes.
  • Spin-up tests. Run the full ESM for just a few days of simulation (typically between 1 and 5 days of simulation), starting from an observed state. Such tests are cheap enough that they can be run many times, sampling across the initial state uncertainty. Then the average of a large number of such tests can be analyzed (Pope and Davies suggest that 60 is enough for statistical significance). This allows the results from different schemes to be compared, to explore differences in short term tendencies.
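The double-call test in the list above is perhaps the easiest of these to sketch in code. The following Python is only an illustration of the idea: both “schemes” are made-up relaxation formulas of my own, not anything from Pope and Davies or from a real model.

```python
def old_scheme(temperature):
    """Trusted scheme: relax temperature towards 280.0 (a toy stand-in)."""
    return 0.1 * (280.0 - temperature)


def new_scheme(temperature):
    """Candidate scheme under test: slightly faster relaxation (a toy stand-in)."""
    return 0.12 * (280.0 - temperature)


def double_call_run(temperature, n_steps):
    """Call both schemes every timestep; only the old scheme's tendency is fed back."""
    differences = []
    for _ in range(n_steps):
        old_tendency = old_scheme(temperature)
        new_tendency = new_scheme(temperature)     # diagnostic only, never fed back
        differences.append(new_tendency - old_tendency)
        temperature += old_tendency                # the model evolves with the old scheme
    return temperature, differences


def reference_run(temperature, n_steps):
    """The same run without the extra call, for comparison."""
    for _ in range(n_steps):
        temperature += old_scheme(temperature)
    return temperature


final, differences = double_call_run(300.0, 50)
print("trajectory unchanged by the extra call:", final == reference_run(300.0, 50))
print("largest difference between schemes:", max(abs(d) for d in differences))
```

The point of the design is that the two schemes always see exactly the same model state, so any difference in their outputs is due to the schemes themselves rather than to the trajectories drifting apart.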

Whenever a code change is made to an ESM, in principle, an extensive set of simulation runs is needed to assess whether the change has a noticeable impact on the climatology of the model. This in turn requires a subjective judgment for whether minor variations constitute acceptable variations, or whether they add up to a significantly different climatology.

Because this testing is so expensive, a standard shortcut is to require exact reproducibility for minor changes, which can then be tested quickly through the use of bit comparison tests. These are automated checks over a short run (e.g. a few days of simulation time) that the outputs or restart files of two different model configurations are identical down to the least significant bits. This is useful for checking that a change didn’t break anything it shouldn’t, but requires that each change be designed so that it can be “turned off” (e.g. via run-time switches) to ensure previous experiments can be reproduced. Bit comparison tests can also check that different configurations give identical results. In effect, bit reproducibility over a short run is a proxy for testing that two different versions of the model will give the same climate over a long run. It’s much faster than testing the full simulations, and it catches most (but not all) errors that would affect the model climatology.
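A bit comparison is simple to express in code. Here is a minimal sketch in Python that compares two sets of output values by their raw IEEE 754 bytes rather than by numerical equality; the function names and the sample values are illustrative assumptions, and a real test would read the values from the model’s output or restart files.

```python
import struct


def to_bits(values):
    """Pack a sequence of floats into raw IEEE 754 double-precision bytes."""
    return struct.pack(f"<{len(values)}d", *values)


def bit_compare(run_a, run_b):
    """Return True only if the two outputs are identical down to the last bit."""
    return len(run_a) == len(run_b) and to_bits(run_a) == to_bits(run_b)


def first_mismatch(run_a, run_b):
    """Return the index of the first value whose bits differ, or None if none do."""
    for index, (a, b) in enumerate(zip(run_a, run_b)):
        if struct.pack("<d", a) != struct.pack("<d", b):
            return index
    return None


# Made-up sample outputs: the last value differs only in its final bits.
baseline = [288.15, 101325.0, 0.1 + 0.2]
modified = [288.15, 101325.0, 0.3]

print("bit identical:", bit_compare(baseline, modified))
print("first mismatch at index:", first_mismatch(baseline, modified))
```

Comparing bytes rather than using `==` is deliberate: it distinguishes 0.0 from -0.0, and it treats two NaNs with the same bit pattern as identical, which is what “identical down to the least significant bits” requires.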

Bit comparison tests do have a number of drawbacks, however, in that they restrict the kinds of change that can be made to the model. Occasionally, bit reproducibility cannot be guaranteed from one version of the model to another, for example when there is a change of compiler, change of hardware, a code refactoring, or almost any kind of code optimization. The decision about whether to insist on bit reproducibility, or whether to allow it to be broken from one version of the model to the next, is a difficult trade-off between flexibility and ease of testing.
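The underlying reason is that floating point addition is not associative, so anything that reorders the arithmetic can change the last few bits of the answer. A minimal Python sketch of the effect (the two summation functions are my own illustration of how a reordering might arise, for instance from an optimizer or a different parallel reduction):

```python
def sum_left_to_right(values):
    """Add the values in order, as a simple serial loop would."""
    total = 0.0
    for value in values:
        total += value
    return total


def sum_pairwise(values):
    """Add the values by recursively splitting the list in half."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    middle = len(values) // 2
    return sum_pairwise(values[:middle]) + sum_pairwise(values[middle:])


values = [0.1, 0.2, 0.3]
serial = sum_left_to_right(values)
pairwise = sum_pairwise(values)

print("serial:  ", repr(serial))
print("pairwise:", repr(pairwise))
print("bit identical:", serial == pairwise)
```

Both answers are correct to within rounding error, and the difference is scientifically meaningless, but a bit comparison test will still fail.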

A number of simple practices can be used to help improve code sustainability and remove coding errors. These include running the code through multiple compilers, which is effective because different compilers give warnings about different language features, and some allow poor or ambiguous code which others will report. It’s better to identify and remove such problems when they are first inserted, rather than discover later on that it will take months of work to port the code to a new compiler.

Building conservation tests directly into the code also helps. These would typically be part of the coupler, and can check the global mass balance for carbon, water, salt, atmospheric aerosols, and so on. For example the coupler needs to check that water flowing from rivers enters the ocean; that the total mass of carbon is conserved as it cycles through atmosphere, oceans, ice, vegetation, and so on. Individual component models sometimes neglect such checks, as the balance isn’t necessarily conserved in a single component. However, for long runs of coupled models, such conservation tests are important.
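As a rough illustration, a conservation check amounts to comparing a global total before and after each exchange between components. In the Python sketch below, the reservoir names and quantities are made up, and `move` and `is_conserved` are hypothetical helpers rather than the interface of any real coupler.

```python
import math


def total_mass(reservoirs):
    """Global total across all components, summed with fsum to limit round-off."""
    return math.fsum(reservoirs.values())


def move(reservoirs, source, sink, amount):
    """Return a new state with `amount` moved from one reservoir to another."""
    updated = dict(reservoirs)
    updated[source] -= amount
    updated[sink] += amount
    return updated


def is_conserved(before, after, rel_tol=1e-12):
    """Check that the global total is unchanged, to within a relative tolerance."""
    return math.isclose(total_mass(before), total_mass(after), rel_tol=rel_tol)


# Made-up water budget in arbitrary units.
water = {"atmosphere": 13.0, "land": 48.0, "rivers": 2.0, "ocean": 1335.0}

# A correct exchange: river outflow arrives in the ocean.
good = move(water, "rivers", "ocean", 0.5)

# A simulated bug: the same outflow leaves the rivers but never reaches the ocean.
leaky = dict(good)
leaky["ocean"] -= 0.5

print("correct exchange conserves water:", is_conserved(water, good))
print("leaky exchange conserves water:  ", is_conserved(water, leaky))
```

Note that the rivers reservoir on its own loses water in both cases, which is why a check of this kind only makes sense at the level of the coupled system.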

Another useful strategy is to develop a verification toolkit for each model component, and for the entire coupled system. These contain a series of standard tests which users of the model can run themselves, on their own platforms, to confirm that the model behaves in the way it should in the local computation environment. They also provide the users with a basic set of tests for local code modifications made for a specific experiment. This practice can help to overcome the tendency of model users to test only the specific physical process they are interested in, while assuming the rest of the model is okay.

During development of model components, informal comparisons with models developed by other research groups can often lead to insights into how to improve the model, and can also serve as a way of identifying and confirming suspected coding errors. But more importantly, over the last two decades, model intercomparisons have come to play a critical role in improving the quality of ESMs through a series of formally organised Model Intercomparison Projects (MIPs).

In the early days, these projects focussed on comparisons of the individual components of ESMs, for example, the Atmosphere Model Intercomparison Project (AMIP), which began in 1990 (Gates, 1992). But by the time of the IPCC second assessment report, there was a widespread recognition that a more systematic comparison of coupled models was needed, which led to the establishment of the Coupled Model Intercomparison Projects (CMIP), which now play a central role in the IPCC assessment process (Meehl et al, 2000).

For example, CMIP3, which was organized for the fourth IPCC assessment, involved a massive effort by 17 modeling groups from 12 countries with 24 models (Meehl et al, 2007). As of September 2010, the list of MIPs maintained by the World Climate Research Program included 44 different model intercomparison projects (Pirani, 2010).

Model Intercomparison Projects bring a number of important benefits to the modeling community. Most obviously, they bring the community together with a common purpose, and hence increase awareness and collaboration between different labs. More importantly, they require the participants to reach a consensus on a standard set of model scenarios, which often entails some deep thinking about what the models ought to be able to do. Likewise, they require the participants to define a set of standard evaluation criteria, which then act as benchmarks for comparing model skill. Finally, they also produce a consistent body of data representing a large ensemble of model runs, which is then available for the broader community to analyze.

The benefits of these MIPs are consistent with reports of software benchmarking efforts in other research areas. For example, Sim et al (2003) report that when a research community that builds software tools come together to create benchmarks, they frequently experience a leap forward in research progress, arising largely from the insights gained from the process of reaching consensus on the scenarios and evaluation criteria to be used in the benchmark. However, the definition of precise evaluation criteria is an important part of the benchmark – without this, the intercomparison project can become unfocussed, with uncertain outcomes and without the huge leap forward in progress (Bueler, 2008).

Another form of model intercomparison is the use of model ensembles (Collins, 2007), which increasingly provide a more robust prediction system than single model runs, but which also play an important role in model validation:

  • Multi-model ensembles – to compare models developed at different labs on a common scenario.
  • Multi-model ensembles using variants of a single model – to compare different schemes for parts of the model, e.g. different radiation schemes.
  • Perturbed physics ensembles – to explore probabilities of different outcomes, in response to systematically varying physical parameters in a single model.
  • Varied initial conditions within a single model – to test the robustness of the model, and to better quantify probabilities for predicted climate change signals.

Last week I attended the workshop in Exeter to lay out the groundwork for building a new surface temperature record. My head is still buzzing with all the ideas we kicked around, and it was a steep learning curve for me because I wasn’t familiar with many of the details (and difficulties) of research in this area. In many ways it epitomizes what Paul Edwards terms “Data Friction” – the sheer complexity of moving data around in the global observing system means there are many points where it needs to be transformed from one form to another, each of which requires people’s energy and time, and, just like real friction, generates waste and slows down the system. (Oh, and some of these data transformations seem to generate a lot of heat too, which rather excites the atoms of the blogosphere).

Which brings us to the reasons the workshop existed in the first place. In many ways, it’s a necessary reaction to the media frenzy over the last year or so around alleged scandals in climate science, in which scientists are supposed to be hiding or fabricating data, which has allowed the ignoranti to pretend that the whole of climate science is discredited. However, while the nature and pace of the surface temperatures initiative has clearly been given a shot in the arm by this media frenzy, the roots of the workshop go back several years, and have a strong scientific foundation. Quite simply, scientists have recognized for years that we need a more complete and consistent surface temperature record with a much higher temporal resolution than currently exists. Current long term climatological records are mainly based on monthly summary data, which is inadequate to meet the needs of current climate assessment, particularly the need for better understanding of the impact of climate change on extreme weather. Most weather extremes don’t show up in the monthly data, because they are shorter term – lasting for a few days or even just a few hours. This is not always true of course; Albert Klein Tank pointed out in his talk that this summer’s heatwave in Moscow occurred mainly in a single calendar month, and hence shows up strongly in the monthly record. But in general, that is unusual, and so the worry is that monthly records tend to mask the occurrence of extremes (and hence may conceal trends in extremes).

The opening talks at the workshop also pointed out that the intense public scrutiny puts us in a whole new world, and one that many of the workshop attendees are clearly still struggling to come to terms with. Now, it’s clear that any new temperature record needs to be entirely open and transparent, so that every piece of research based on it could (in principle) be traced all the way back to basic observational records, and to echo the way John Christy put it at the workshop – every step of the research now has to be available as admissible evidence that could stand up in a court of law, because that’s the kind of scrutiny we’re being subjected to. Of course, the problem is that not only isn’t science ready for this (no field of science is anywhere near that transparent), it’s also not currently feasible, given the huge array of data sources being drawn on, the complexities of ownership and access rights, and the expectation that much of the data will have high commercial value.

I’ll attempt a summary, but it will be rather long, as I don’t have time to make it any shorter. The slides from the workshop are now all available, and the outcomes from the workshop will be posted soon. The main goals were summarized in Peter Thorne’s opening talk: to create a (longish) list of principles, a roadmap for how to proceed, an identification of any overlapping initiatives so that synergies can be exploited, an agreed method to engage with broader audiences (including the general public), and an initial governance model.

Did we achieve that? Well, you can skip to the end and see the summary slides, and judge for yourself. Personally, I thought the results were mixed. One obvious problem is that there is no funding on the table for this initiative, and it’s being launched at a time when everyone is cutting budgets, especially in the UK. Which meant that occasionally it felt like we were putting together a Heath Robinson device (Rube Goldberg to you Americans) – cobbling it together out of whatever we could find lying around. Which is ironic really given that the major international bodies (e.g. WMO) seem to fully appreciate the importance of this. And of course, the fact that it will be a vital part of our ability to assess the impacts of climate change over the next few decades.

Another problem is that the workshop attendees struggled to reach consensus on some of the most important principles. For example, should the databank be entirely open, or does it need a restricted section? The argument for the latter is that large parts of the source data are not currently open, as the various national weather services that collect it charge a fee on a cost recovery basis, and wish to restrict access to non-commercial uses as commercial applications are (in some cases) a significant portion of their operating budgets. The problem is that while the monthly data has been shared freely with international partners for many years, the daily and sub-daily records have not, because these are the basis for commercial weather forecasting services. So an insistence on full openness might mean a very incomplete dataset, which then defeats the purpose, as researchers will continue to use other (private) sources for more complete records.

And what about an appropriate licensing model? Some people argued that the data must be restricted to non-commercial uses, because that’s likely to make negotiations with national weather services easier. But others argued that unrestricted licenses should be used, so that the databank can help to lay the foundation for the development of a climate services industry (which would create jobs, and therefore please governments). [Personally, I felt that if governments really want to foster the creation of such an industry, then they ought to show more willingness to invest in this initiative, and until they do, we shouldn’t pander to them. I’d go for a cc by-nc-sa license myself, but I think I was outvoted]. Again, existing agreements are likely to get in the way: 70% of the European data would not be available if the research-only clause was removed.

There was also some serious disagreement about timelines. Peter outlined a cautious roadmap that focussed on building momentum, and delivering the occasional reports and white papers over the next year or so. The few industrial folks in the audience (most notably, Amy Luers from Google) nearly choked on their cookies – they’d be rolling out a beta version of the software within a couple of weeks if they were running the project. Quite clearly, as Amy urged in her talk, the project needs to plan for software needs right from the start, release early, prepare for iteration and flexibility, and invest in good visualizations.

Oh, and there wasn’t much agreement on open source software either. The more software oriented participants (most notably, Nick Barnes, from the Climate Code Foundation) argued strongly that all software, including every tool used to process the data every step of the way should be available as open source. But for many of the scientists, this represented a huge culture change. There was even some confusion about what open source means (e.g. that ‘open’ and ‘free’ aren’t necessarily the same thing).

On the other hand, some great progress was made in many areas, including identifying many important data services, building on lessons learnt from other large climate and weather data curation efforts, offers of help from many of the international partners (including offers of data from NCDC, NCAR, EURO4M, from across Europe and North America, as well as Russia, China, Indonesia, and Argentina). Agreement was clear that version control and good metadata are vital, and need to be planned for right from the start, but also that providing full provenance for each data item is an important long term goal, but cannot be a rule from the start, as we will have to build on existing data sources that come with little or no provenance information. Oh, and I was very impressed with the deep thinking and planning around benchmarking for homogenization tools (I’ll blog more on this soon, as it fascinates me).

Oh, and on the size of the task. Estimates of the number of undigitized paper records in the basements of various weather services ran to hundreds of millions of pages. But I still didn’t get a sense of the overall size of the planned databank…

Things I learnt:

  • Steve Worley from NCAR, reflecting on lessons from running ICOADS, pointed out that no matter how careful you think you’ve been, people will end up mis-using the data because they ignore or don’t understand the flags in the metadata.
  • Steve also pointed out that a drawback with open datasets is the proliferation of secondary archives, which then tend to get out of date and mislead users (as they rarely direct users back to the authoritative source).
  • Oh, and the scope of the uses of such data is usually surprisingly large and diverse.
  • Jay Lawrimore, reflecting on lessons from NCDC, pointed out that monthly data and daily and sub-daily data are collected and curated along independent routes, which then makes it hard to reconcile them. The station names sometimes don’t match, the lat/long coords don’t match (e.g. because of differences in rounding), and the summarized data are similar but not exact.
  • Another problem is that it’s not always clear exactly which 24-hour period a daily summary refers to (e.g. did they use a local or UTC midnight?). Oh, and this also means that 3- and 6-hour synoptic readings might not match the daily summaries either.
  • Some data doesn’t get transmitted, and so has to be obtained later, even to the point of having to re-key it from emails. Long delays in obtaining some of the data mean the datasets frequently have to be re-released.
  • Personal contacts and workshops in different parts of the world play a surprisingly important role in tracking down some of the harder to obtain data.
  • NCDC runs a service called Datzilla (similar to Bugzilla for software) for recording and tracking reported defects in the dataset.
  • Albert Klein Tank, describing the challenges in regional assessment of climate change and extremes, pointed out that the data requirements for analyzing extreme events are much higher than for assessing global temperature change. For example, we might need to know not just how many days were above 25°C compared to normal, but also how much did it cool off overnight (because heat stress and human health depend much more on overnight relief from the heat).
  • John Christy, introducing the breakout group on data provenance, had some nice examples in his slides of the kinds of paper records they have to deal with, and a fascinating example of a surface station that’s now under a lake, and hence old maps are needed to pinpoint its location.
  • From Michael de Podesta, who insisted on a healthy dose of serious metrology (not to be confused with meteorology): All measurements ought to come with an estimation of uncertainty, and people usually make a mess of this because they confuse accuracy and precision.
  • Uncertainty information isn’t metadata, it’s data. [Oh, and for that matter anything that’s metadata to one community is likely to be data to another. But that’s probably confusing things too much]
  • Oh, and of course, we have to distinguish Type A and Type B uncertainty. Type A is where the uncertainty is describable using statistics, so that collecting bigger samples will reduce it. Type B is where you just don’t know, so that collecting more data cannot reduce the uncertainty.
  • From Matt Menne, reflecting on lessons from the GHCN dataset, explaining the need for homogenization (which is climatology jargon for getting rid of errors in the observational data that arise because of changes over time in the way the data was measured). Some of the inhomogeneities are due to abrupt changes (e.g. because a recording station was moved, or got a new instrument), and others to gradual changes (e.g. because the environment around a recording station slowly changes, such as gradual urbanization of its location).
  • Matt has lots of interesting examples of inhomogeneities in his slides, including some really nasty ones. For example, a station in Reno, Nevada, that was originally in town, and then moved to the airport. There’s a gradual upwards trend in the early part of the record, from an urban heat island effect, and another similar trend in the latter part, after it moved to the airport, as the airport was also eventually encroached by urbanisation. But if you correct for both of these, as well as the step change when the station moved, you’re probably over-correcting….
  • which led Matt to suggest the Climate Scientist’s version of the Hippocratic Oath: First, do not flag good data as bad; Then do not make bias adjustments where none are warranted.
  • While criticism from non-standard sources (that’s polite-speak for crazy denialists) is coming faster than any small group can respond to (that’s code for the CRU), useful allies are beginning to emerge, also from the blogosphere, in the form of serious citizen scientists (such as Zeke Hausfather) who do their own careful reconstructions, and help address some of the crazier accusations from denialists. So there’s an important role in building community with such contributors.
  • John Kennedy, talking about homogenization for Sea Surface Temperatures, pointed out that Sea Surface and Land Surface data are entirely different beasts, requiring totally different approaches to homogenization. Why? because SSTs are collected from buckets on ships, engine intakes on ships, drifting buoys, fixed buoys, and so on. Which means you don’t have long series of observations from a fixed site like you do with land data – every observation might be from a different location!
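
To make the idea of an abrupt inhomogeneity concrete, here is a minimal sketch of correcting a single step change (say, from a station move) at a known date. It is not Matt's method or the procedure used for GHCN: the function name, the break position, and the temperature values are all made up for illustration, and it assumes the whole difference in means is an artefact.

```python
from statistics import mean


def remove_step_change(series, break_index):
    """Shift the segment before break_index so its mean matches the segment after.

    Assumes (for illustration only) that the break date is known and that the
    entire difference in means is an artefact of the station change.
    """
    before, after = series[:break_index], series[break_index:]
    offset = mean(after) - mean(before)
    adjusted = [x + offset for x in before] + list(after)
    return adjusted, offset


# Synthetic annual mean temperatures (made-up values): the station "moves" after year 4.
raw = [10.0, 10.2, 9.8, 10.0, 11.0, 11.2, 10.8, 11.0]
adjusted, offset = remove_step_change(raw, break_index=4)
print(f"estimated step: {offset:.2f}")
print("adjusted series:", [round(x, 2) for x in adjusted])
```

The weakness is easy to see: if some of that jump was a real change in climate, this adjustment erases it, which is exactly the kind of unwarranted bias adjustment the Hippocratic Oath above warns against. Real homogenization methods typically compare a station against its neighbours before deciding that a step is an artefact.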

Things I hope I managed to inject into the discussion:

  • “solicitation of input from the community at large” is entirely the wrong set of terms for white paper #14. It should be about community building and engagement. It’s never a one-way communication process.
  • Part of the community building should be the support for a shared set of open source software tools for analysis and visualization, contributed by the various users of the data. The aim would be for people to share their tools, and help build on what’s in the collection, rather than having everyone re-invent their own software tools. This could be as big a service to the research community as the data itself.
  • We desperately need a clear set of use cases for the planned data service (e.g. who wants access to which data product, and what other information will they be needing and why?). Such use cases should illustrate what kinds of transparency and traceability will be needed by users.
  • Nobody seems to understand just how much user support will need to be supplied (I think it will be easy for whatever resources are put into this to be overwhelmed, given the scrutiny that temperature records are subjected to these days)…
  • The rate of change in this dataset is likely to be much higher than has been seen in past data curation efforts, given the diversity of sources, and the difficulty of recovering complete data records.
  • Nobody (other than Bryan) seemed to understand that version control will need to be done at a much finer level of granularity than whole datasets, and that really every single data item needs to have a unique label so that it can be referred to in bug reports, updates, etc. Oh and that the version management plan should allow for major and minor releases, given how often even the lowest data products will change, as more data and provenance information is gradually recovered.
  • And of course, the change process itself will be subjected to ridiculous levels of public scrutiny, so the rationale for accepting/rejecting changes and scheduling new releases needs to be clear and transparent. Which means far more attention to procedures and formal change control boards than past efforts have used.
  • I had lots of suggestions about how to manage the benchmarking effort, including planning for the full lifecycle: making sure the creation of the benchmark is really a community consensus-building effort, and planning for retirement of each benchmark, to avoid the problems of overfitting. Susan Sim wrote an entire PhD on this.
  • I think the databank will need to come with a regularly updated blog, to provide news about what’s happening with the data releases, highlight examples of how it’s being used, explain interesting anomalies, interpret published papers based on the data, etc. A bit like RealClimate. Oh, and with serious moderation of the comment threads to weed out the crazies. Which implies some serious effort is needed.
  • …and I almost but not quite entirely learned how to pronounce the word ‘inhomogeneities’ without tripping over my tongue. I’m just going to call them ‘bugs’.
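
The point above about fine-grained version control can be sketched in a few lines. This is only an illustration under my own assumptions, not a design anyone at the workshop proposed: the `Release` class, the station/date/variable label scheme, and the values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    """One release of a data product: version number, labelled items, change log."""
    major: int
    minor: int
    items: dict = field(default_factory=dict)      # item_id -> value
    changelog: list = field(default_factory=list)  # (version, item_id, reason)

    @property
    def version(self):
        return f"{self.major}.{self.minor}"

    def revise(self, item_id, new_value, reason, major=False):
        """Return a new release with one item changed and the rationale recorded."""
        if major:
            nxt = Release(self.major + 1, 0)
        else:
            nxt = Release(self.major, self.minor + 1)
        nxt.items = {**self.items, item_id: new_value}
        nxt.changelog = self.changelog + [(nxt.version, item_id, reason)]
        return nxt


# Hypothetical label scheme (station / date / variable) and made-up values.
label = "STN0001/1931-07-14/tmax"
v1_0 = Release(1, 0, items={label: 38.9})
v1_1 = v1_0.revise(label, 33.9, "transcription error in paper record")
print(v1_1.version, v1_1.changelog)
```

Because every item has its own label, a bug report can point at one observation, and the change log records which release altered it and why. Earlier releases are left untouched, so anyone can still reproduce an analysis that was based on version 1.0.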

Update Sept 21, 2010: Some other reports from the workshop.

I’ve mentioned the Clear Climate Code project before, but it’s time to give them an even bigger shout out, as the project is a great example of the kind of thing I’m calling for in my grand challenge paper. The project is building an open source community around the data processing software used in climate science. Their showcase project is an open source Python re-implementation of gistemp, and very impressive it is too.

Now they’ve gone one better, and launched the Climate Code Foundation, a non-profit organisation aimed at “improving the public understanding of climate science through the improvement and publication of climate science software”. The idea is for it to become an umbrella body that will nurture many more open source projects, and promote greater openness of the software tools and data used for the science.

I had a long chat with Nick Barnes, one of the founders of CCF, on the train to Exeter last night, and was very impressed with his enthusiasm and energy. He’s actively seeking more participants, more open source projects for the foundation to support, and of course, funding to keep the work going. I think this could be the start of something beautiful.

Here’s a question I’ve been asking a few people lately, ever since I asserted that climate models are big expensive scientific instruments: How expensive are we talking about? Unfortunately, it’s almost impossible to calculate. The effort of creating a climate model is tangled up with the scientific research, such that you can’t even reliably determine how much of a particular scientist’s time is “model development” and how much is “doing science”. The problem is that you can’t build the model without a lot of that “doing science” part, because the model is the result of a lot of thinking, experimentation, theory building, testing hypotheses, analyzing simulation results, and discussions with other scientists. Many pieces of the model are based on the equations or empirical results in published research papers; even if you’re not doing the research yourself, you still have to keep up with the literature, understand the state-of-the-art, and know which bits of research are mature enough to incorporate into the model.

So, my first cut, which will be an over-estimation, is that *all* of the effort at a climate modeling lab is necessary to build the model. Labs vary in size, but a typical climate modeling lab is of the order of 200 people (including scientists, technicians, and admin support). And most of the models I’ve looked at have been under steady development for twenty years or more. So, that gives us a starting point of 200*20 = 4,000 person-years. Luckily, most scientists care more about science than salary, so they’re much cheaper than software professionals. Given we’ll have a mix of postdocs and senior scientists, let’s say average salary would be around $150,000 per year including benefits and other overheads. That’s $600 million.

Oh, and that doesn’t include the costs of equipping and operating a tier-2 supercomputing facility, as the climate model runs will easily keep such a facility fully loaded full time (and we’ll need to factor in the cost to replace the supercomputer every few years to take advantage of performance increases). In most cases, the supercomputing facilities are shared with other scientific uses of high performance computing. But there is one centre that’s dedicated to climate modeling, the DKRZ in Hamburg, which has an annual budget of around 30 million euro. Let’s pretend euros are dollars, and call that $30 million per year, which for 20 years gives us another $600 million. The latest supercomputer at DKRZ, Blizzard, cost 35 million euro. Let’s say we replace this every five years, and throw some more money in for many terabytes of data storage, that’ll get us to around $200 million for hardware.

Grand total: $1.4 billion.

Now, I said that’s an over-estimate. Over lunch today I quizzed some of the experts here at IPSL in Paris, and they thought that 1,000 person-years (50 people for 20 years) was a better estimate of the actual model development effort. This seems reasonable – it means that only 1/4 of the research at my 200 person research institute directly contributes to model development, the rest is science that uses the model but isn’t essential for developing it. So, that brings the salary figure down to $150 million. I’ve probably got to do the same conversion for the supercomputing facilities – let’s say about 1/4 of the supercomputing capacity is reserved for model development and testing. That also feels about right: 5-10% of the capacity is reserved for test processes (e.g. the ones that run automatically every day to do the automated build-and-test process), and a further 10-20% might be used for validation runs on development versions of the model.

That brings the grand total down to $350 million.
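
For anyone who wants to check the sums, here is the back-of-envelope calculation as a small script. The input figures are the ones quoted in the text; the function and variable names are mine, and the $200 million hardware figure is taken as given rather than derived.

```python
# Back-of-envelope inputs quoted in the text (all treated as US dollars).
SALARY_PER_PERSON_YEAR = 150_000        # average, including benefits and overheads
FACILITY_BUDGET_PER_YEAR = 30_000_000   # DKRZ annual budget, pretending euros are dollars
HARDWARE_TOTAL = 200_000_000            # rounded figure for supercomputers plus storage
YEARS = 20


def model_cost(people, facility_share):
    """Return (salaries, operations, hardware, total) for a given team size and
    the fraction of the computing facility attributed to model development."""
    salaries = people * YEARS * SALARY_PER_PERSON_YEAR
    operations = facility_share * FACILITY_BUDGET_PER_YEAR * YEARS
    hardware = facility_share * HARDWARE_TOTAL
    return salaries, operations, hardware, salaries + operations + hardware


first_cut = model_cost(people=200, facility_share=1.0)   # the whole lab, the whole facility
revised = model_cost(people=50, facility_share=0.25)     # the IPSL-informed estimate

for name, (salaries, operations, hardware, total) in [("first cut", first_cut), ("revised", revised)]:
    print(f"{name}: salaries ${salaries / 1e6:.0f}M + operations ${operations / 1e6:.0f}M "
          f"+ hardware ${hardware / 1e6:.0f}M = ${total / 1e6:.0f}M")
```

The revised total breaks down as $150 million in salaries, $150 million in facility operations, and $50 million in hardware. Changing `people` or `facility_share` shows how sensitive the total is to those two guesses.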

Now, it has been done for less than this. For example, the Canadian Climate Centre, CCCma, has a modeling team one tenth this size, although they do share a lot of code with the Canadian Meteorological Service. And their model isn’t as full-featured as some of the other GCMs (it also has a much smaller user base). As with other software projects, the costs don’t scale linearly with functionality: a team of 5 software developers can achieve much more than 1/10th of what a team of 50 can (cf The Mythical Man Month). Oh, and the computing costs won’t come down much at all – the CCCma model is no more efficient than other models. So we’re still likely to be above the $100 million mark.

Now, there are probably other ways of figuring it – so far we’ve only looked at the total cumulative investment in one of today’s world leading climate models. What about replacement costs? If we had to build a new model from scratch, using what we already know (rather than doing all the research over again), how much would that cost? Well, nobody has ever done this, but there are a few experiences we could draw on. For example, the Max Planck Institute has been developing a new model from scratch, ICON, which uses an icosahedral grid and hence needs a new approach to the dynamics. The project has been going for 8 years. It started with just a couple of people, and has ramped up to about a dozen. But they’re still a long way from being done, and they’re re-using a lot of the physics code from their old model, ECHAM. On the other hand, it’s an entirely new approach to the grid structure, so a lot of the early work was pure research.

Where does that leave us? It’s really a complete guess, but I would suggest a team of 10 people (half of them scientists, half scientific programmers) could re-implement the old model from scratch (including all the testing and validation) in around 5 years. Unfortunately, climate science is a fast moving field. What we’d get at the end of 5 years is a model that, scientifically speaking, is 5 years out of date. Unless of course we also paid for a large research effort to bring the latest science into the model while we were constructing it, but then we’re back where we started. I think this means you can’t replace a state-of-the-art climate model for much less than the original development costs.

What’s the conclusion? The bottom line is that the development cost of a climate model is in the hundreds of millions of dollars.

Here’s a whole set of things I can’t make it to. The great thing about being on sabbatical is the ability to travel, visit different labs, and so on. The downside is that there are far more interesting places and events than I can possibly make it to, and many of them clash. Here’s some I won’t be able to make it to this fall: