On my trip to Queen’s University last week, I participated in a panel session on the role of social media in research. I pointed out that tools like Twitter provide a natural extension to the kinds of conversations we usually only get to have at conferences – the casual interactions with other researchers that sometimes lead to new research questions and collaborations.

So, with a little help from Storify, here’s an example…

In which we see an example of how Twitter can enable interesting science, and understand a little about the role of existing social networks in getting science done.

http://storify.com/SMEasterbrook/science-via-twitter

How would you like to help the weather and climate research community digitize historical records before they’re lost forever to a fate such as this:

Watch this video, from the International Surface Temperature Initiative’s data rescue initiative, for more background (skip to around 2:20 for the interesting parts):

…and then get involved with the Data Rescue at Home Projects:

Our special issue of IEEE Software, for Nov/Dec 2011, is out! The title for the issue is Climate Change: Science and Software, and the guest editors were me, Paul Edwards, Balaji, and Reinhard Budich.

There’s a great editorial by Forrest Shull, reflecting on interviews he conducted with Robert Jacob at Argonne National Laboratory and Gavin Schmidt at NASA GISS. The papers in the issue are:

Unfortunately most of the content is behind a paywall, although you can read our guest editors’ introduction in full here. I’m working on making some of the other content more freely available too.

Excellent news: Our study of the different meanings scientists ascribe to concepts such as openness and reproducibility is published today in PLoS ONE. It’s an excellent read. And it’s in an open access journal, so everyone can read it (just click the title):

On the Lack of Consensus over the Meaning of Openness: An Empirical Study

Alicia Grubb and Steve M. Easterbrook

Abstract: This study set out to explore the views and motivations of those involved in a number of recent and current advocacy efforts (such as open science, computational provenance, and reproducible research) aimed at making science and scientific artifacts accessible to a wider audience. Using an exploratory approach, the study tested whether a consensus exists among advocates of these initiatives about the key concepts, exploring the meanings that scientists attach to the various mechanisms for sharing their work, and the social context in which this takes place. The study used a purposive sampling strategy to target scientists who have been active participants in these advocacy efforts, and an open-ended questionnaire to collect detailed opinions on the topics of reproducibility, credibility, scooping, data sharing, results sharing, and the effectiveness of the peer review process. We found evidence of a lack of agreement on the meaning of key terminology, and a lack of consensus on some of the broader goals of these advocacy efforts. These results can be explained through a closer examination of the divergent goals and approaches adopted by different advocacy efforts. We suggest that the scientific community could benefit from a broader discussion of what it means to make scientific research more accessible and how this might best be achieved.

Next year’s International Conference on Software Engineering (ICSE), to be held in Zurich, has an interesting conference slogan: Sustainable Software for a Sustainable World

In many ways, ICSE is my community. By that I mean, this is the conference where I have presented my research most often, and it is generally my first choice of venue for new papers. This is an important point: one of the most crucial pieces of advice I give to new PhD students is to “find your community”. To be successful as a researcher (and especially as an academic), you have to build a reputation for solid research within an existing research community. Which means figuring out early which community you belong to: Who will be the audience for your research results? Who will understand your work well enough to review your papers? And eventually, which community will you be looking to for letters of support for job applications, tenure reviews, and so on? And once you’ve figured out which community you belong to, you have to attend the conferences and workshops run by that community, and present your work to them as often as you can, and you have to get to know the senior people in that community. Or rather, they have to get to know you.

The problem is, in recent years, I’ve gone off ICSE. Having spent a lot of time in the last few years mixing with a different research community (climate science, and especially geoscientific model development), I come back to the ICSE community with a different perspective, and what I see now (in general) is a rather insular community, focussed on a narrow, technical set of research questions that seem largely irrelevant to anything that matters, with a huge resistance to inter-disciplinary research. This view crystallized for me last fall, when I attended a two-day workshop on “the Future of Software Engineering” and came away very disappointed (my blog post from the workshop captured this very well).

I should be clear, I don’t mean to write off the entire community – there are some excellent people in the ICSE community, doing fascinating research, many of whom I regard as good friends. But the conference itself seems ever less relevant. The keynote talks always suck. And the technical program tends to be dominated by a large number of dull papers: incremental results on unimaginative research problems.

Perhaps this is a result of the way conference publication works. Thomas Anderson sets out a fascinating analysis of why this might be so for computer systems conferences, in his 2009 paper “Conference Reviewing Considered Harmful”. Basically, the accept/reject process for conferences that use a peer-review system creates a perverse incentive for researchers to write papers that are just good enough to get accepted, but no better. His analysis is consistent with my own observations – people talk about “the least publishable unit” of research. The net result is a conference full of rather dull papers, where nobody takes risks on more exciting research topics.

There’s an interesting contrast with the geosciences community here, where papers are published in journals rather than conferences. For example, at the AGU and EGU conferences, you just submit an abstract, and various track chairs decide whether to let you present it as a talk in their track, or whether it should appear as a poster. Researchers are only allowed to submit one abstract as first author, which means the conference is really a forum for each researcher to present her best work over the past year, with no strong relationship to the peer-reviewed publication process. This makes for big conferences, and very variable quality presentations. Attendees have to do a little more work in advance to figure out which talks might be worth attending. But the perverse incentive identified by Anderson is missing altogether – each presenter is incentivized to present her best work, no matter what stage the research is at.

Which brings me back to ICSE. Next year’s conference chairs have chosen the slogan “Sustainable Software for a Sustainable World” for the conference. An excellent rallying call, but I sincerely hope they can do more with this than most conferences do – such conference slogans are usually irrelevant to the actual conference program, which is invariably business as usual. Of course, the term sustainability has been wildly overused recently, to the point that it’s in danger of becoming meaningless. So, how could ICSE make it something more than a meaningless slogan?

First, one has to acknowledge that an understanding of sustainability requires some systems thinking, and the ability to analyze multiple interacting systems. The classic definition, due to the Brundtland Commission, is that sustainability refers to humanity’s ability to meet its needs without compromising the ability of future generations to meet theirs. As Garvey points out, this is entirely inadequate, as it’s impossible to figure out how to balance our resource needs with those of an unknown number of potential future earthlings. A better approach is to break the concept down into sustainability in different, overlapping systems. Sverdrup and Svensson do this by breaking it down into three inter-related concepts: natural sustainability, social sustainability, and economic sustainability. Furthermore, these are hierarchically related: sustainability of social and economic activity is constrained by physical limits such as thermodynamics and mass conservation (e.g. forget a sustained economy if we screw the planet’s climate), and economic sustainability is constrained by social limits such as a functioning civil society.

How does this apply to ICSE? Well, I would suggest applying the sustainability concept to a number of different systems:

  • sustainability of the ICSE community itself, which would include nurturing new researchers, and fixing the problems of perverse incentives in the paper review processes. But this only makes sense within:
  • sustainability of scientific research as a knowledge discovery process, which would include analysis of the kinds of research questions a research community ought to tackle, and how it should engage with society. Here, I think ICSE has some serious re-assessment to do, especially with respect to its tendency to reject inter-disciplinary work.
  • sustainability of software systems that support human activity, which would suggest a switch in attention by the ICSE community away from the technical processes of de novo software creation, and towards questions of how software systems actually make life better for people, and how software systems and human activity systems co-evolve. An estimate I heard at the CHASE workshop is that only 20% of ICSE papers make any attempt to address human aspects.
  • sustainability of software development as an economic activity, which suggests a critical look at how existing software corporations work currently, but perhaps more importantly, exploration of new economic models (e.g. open source, end-user programming, software startups, mashups, etc.).
  • the role of software in social sustainability, by which I mean a closer look at how software systems help (or hinder) the creation of communities, social norms, social equity and democratic processes.
  • the role of software in natural sustainability, by which I mean green IT topics such as energy-aware computing, as well as the broader role of software in understanding and tackling climate change.

A normal ICSE would barely touch on any of these topics. But I think next year’s chairs could create some interesting incentives to ensure the conference theme becomes more than just a slogan. At the session on SE for the planet that we held at ICSE 2009, someone suggested that in light of the fact that climate change will make everything else unsustainable, ICSE should insist that all submitted papers to future conferences demonstrate some relevance to tackling climate change (which is brilliant, but so radical that we have to shift the Overton window first). A similar suggestion at one of the CHASE meetings was that all ICSE papers must demonstrate relevance to human & social aspects, or else prove that their research problem can be tackled without this. For ICSE 2012, perhaps this should be changed to simply reject all papers that don’t contribute somehow to creating a more sustainable world.

I think such changes might help to kick ICSE into some semblance of relevancy, but I don’t kid myself that they are likely. How about, as a start, a set of incentives that reward papers that address sustainability in one or more of the senses above? Restrict paper awards to such papers, or create a new award structure for this purpose. Give such papers prominence in the program, and relegate other papers to the dead times like right after lunch, or late in the evening. Or something.

But a good start would be to abolish the paper submission process altogether, to decouple the conference from the process of publishing peer-reviewed papers. That’s probably the biggest single contribution to making the conference more sustainable, and more relevant to society.

Over the past month, we’ve welcomed a number of new researchers to the lab. It’s about time I introduced them properly:

  • Kaitlin Alexander is with us for the summer on a Centre for Global Change Science (CGCS) undergraduate internship. She’s doing her undergrad at the University of Manitoba in applied math, and is well known to many of us already through her blog, ClimateSight. Kaitlin will be studying the software architecture of climate models, and she’s already written a few blog posts about her progress.
  • Trista Mueller is working in the lab for the summer as a part-time undergrad intern – she’s a computer science major at U of T, and a regular volunteer at Hot Yam. Trista is developing a number of case studies for our Inflo open calculator, and helping to shake out a few bugs in the process.
  • Elizabeth Patitsas joins us from UBC, where she just graduated with a BSc honours degree in Integrated Science – she’s working as a research assistant over the summer and will be enrolling in grad school here in September. Elizabeth has been studying how we teach (or fail to teach) key computer science skills, both to computer science students, and to other groups, such as physical scientists. Over the summer, she’ll be developing a research project to identify which CS skills are most needed by scientists, with the eventual goal of building and evaluating a curriculum based on her findings.
  • Fabio da Silva is a professor of computer science at the Federal University of Pernambuco (UFPE) in Brazil, and is joining us this month for a one year sabbatical. Fabio works in empirical software engineering and will be exploring how software teams coordinate their work, in particular the role of self-organizing teams.

Occasionally I come across blog posts that I wish I’d written myself, because they capture so well some of the ideas I’ve been thinking about. Such is the case with Ricky Rood’s series on open climate models, over at Weather Underground (which itself is an excellent resource – particularly Jeff Masters’ Wunderblog):

  1. Greening of the Desert: Open Climate Models
  2. Stickiness and Climate Models
  3. Open Source Communities, What are the problems?

I’ve nothing really to add, other than to note that the points Ricky makes in the third post, on the need for governance, are crucial. Wikipedia is a huge success, but not because the technology is right (quite frankly, wikis rather suck from a usability point of view), nor because people are inherently good at massive collaborative projects. Wikipedia is a success because it got right the social processes that govern editing and quality control. Open source communities do the same. They’re not really as open as most people think – an inner core of people imposes tight control over the vision for the project and the quality control of the code. And they sometimes struggle to keep the clueless newbies out, to stop them messing things up.

I had lunch last week with Gerhard Fischer at the University of Colorado. Gerhard is director of the Center for Lifelong Learning and Design, and his work focusses on technologies that help people to learn and design solutions to suit their own needs. We talked a lot about meta-design, especially how you create tools that help domain experts (who are not necessarily software experts) to design their own software solutions.

I was describing some of my observations about why climate scientists prefer to write their own code rather than delegating it to software professionals, when Gerhard put it into words brilliantly. He said “You can’t delegate ill-defined problems to software engineers”. And that’s the nub of it. Much (but not all) of the work of building a global climate model is an ill-defined problem. We don’t know at the outset what should go into the model, which processes are important, how to simulate complex physical, chemical and biological processes and their interactions. We don’t know what’s computationally feasible (until we try it). We don’t know what will be scientifically useful. So we can’t write a specification, nor explain the requirements to someone who doesn’t have a high level of domain expertise. The only way forward is to actively engage in the process of building a little, experimenting with it, reflecting on the lessons learnt, and then modifying and iterating.

So the process of building a climate model is a loop of build-explore-learn-build. If you put people into that loop who don’t have the necessary understanding of the science being done with the models, then you slow things down. And as the climate scientists (mostly) have the necessary  technical skills, it’s quicker and easier to write their own code than to explain to a software engineer what is needed. But there’s a trade-off: the exploratory loop can be traversed quickly, but the resulting code might not be very robust or modifiable. Just as in agile software practices, the aim is to build something that works first, and worry about elegant design later. And that ‘later’ might never come, as the next scientific question is nearly always more alluring than a re-design. Which means the main role for software engineers in the process is to do cleanup operations. Several of the software people I’ve interviewed in the last few months at climate modeling labs described their role as mopping up after the parade (and some of them used more colourful terms than that).

The term meta-design is helpful here, because it specifically addresses the question of how to put better design tools directly into the hands of the climate scientists. Modeling frameworks fit into this space, as do domain-specific languages. But I’m convinced that there’s a lot more scope for tools that raise the level of abstraction, so that modelers can work directly with meaningful building blocks rather than lines of Fortran. And there’s another problem. Meta-design is hard. Too often it produces tools that just don’t do what the target users want. If we’re really going to put better tools into the hands of climate modelers, then we need a new kind of expertise to build such tools: a community of meta-designers who have both the software expertise and the domain expertise in earth sciences.
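
To make that concrete, here is a purely hypothetical sketch, in Python, of the kind of building-block interface a meta-designed modeling tool might offer. The component names and the couple() check are invented for illustration; this is not drawn from any real framework:

```python
# Hypothetical sketch only: the component names and the couple() check are
# invented for illustration; this is not the API of any real modeling framework.

class Component:
    """A model component described by the fields it consumes and produces."""
    def __init__(self, name, consumes, produces):
        self.name = name
        self.consumes = set(consumes)
        self.produces = set(produces)

def couple(*components):
    """Check that every field some component consumes is produced by another."""
    produced = set().union(*(c.produces for c in components))
    for c in components:
        missing = c.consumes - produced
        if missing:
            raise ValueError(f"{c.name} has no source for: {sorted(missing)}")
    return components

# The modeler thinks in terms of processes and exchanged fields,
# not array indices, halo exchanges, or MPI calls.
atmosphere = Component("atmosphere", consumes=["sst"], produces=["wind_stress", "precip"])
ocean      = Component("ocean",      consumes=["wind_stress"], produces=["sst"])
land       = Component("land",       consumes=["precip"], produces=["runoff"])

model = couple(atmosphere, ocean, land)   # fails fast if the coupling is inconsistent
```

The point is not this particular interface, but that the abstraction level matches the way modelers already talk about their models: components, fields, and couplings.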

Which brings me to another issue that came up in the discussion. Gerhard provided me with a picture that helps me explain the issue better (I hope he doesn’t mind me reproducing it here; it comes from his talk “Meta-Design and Social Creativity”, given at IEMC 2007):

To create reflective design communities, the software professionals need to acquire some domain expertise, and the domain experts need to acquire some software expertise (diagram by Gerhard Fischer)

Clearly, collaboration between software experts and climate scientists is likely to work much better if each acquires a little of the other’s expertise, if only to enable them to share some vocabulary to talk about the problems. It reduces the distance between them.

At climate modeling labs, I’ve met a number of both kinds of people – i.e. climate scientists who have acquired good software knowledge, and software professionals who have acquired good climate science knowledge. But it seems to me that for climate modeling, one of these transitions is much easier than the other. It seems to be easier for climate scientists to acquire good software skills than it is for software professionals (with no prior background in the earth sciences) to acquire good climate science domain knowledge. That’s not to say it’s impossible, as I have met a few people who have followed this path (but they are rare). It seems to require many years of dedicated work. And there appears to be a big disincentive for many software professionals, as it turns them from generalists into specialists. If you dedicate several years to developing the necessary domain expertise in climate modeling, it probably means you’re committing the rest of your career to working in this space. But the pay is lousy, the programming language of choice is uncool, and mostly you’ll be expected to clean up after the parade rather than star in it.

I’ve speculated before about the factors that determine the length of the release cycle for climate models. The IPCC assessment process, which operates on a 5-year cycle, tends to dominate everything. But there are clearly other rhythms that matter too. I had speculated that the 6-year gap between the release of CCSM3 and CCSM4 could largely be explained by the demands of the IPCC cycle; however the NCAR folks might have blown holes in that idea by making three new releases in the last six months; clearly other temporal cycles are at play.

In discussion over lunch yesterday, Archer pointed me to the paper “Exploring Collaborative Rhythm: Temporal Flow and Alignment in Collaborative Scientific Work” by Steven Jackson and colleagues, who point out that while the roles of space and proximity have been widely studied in collaborative work, the role of time and patterns of temporal constraints has not. They set out four different kinds of temporal rhythm that are relevant to scientific work:

  • phenomenal rhythms, arising from the objects of study – e.g. annual and seasonal cycles strongly affect when fieldwork can be done in biology/ecology; the development of a disease in an individual patient affects the flow of medical research;
  • institutional rhythms, such as the academic calendar, funding deadlines, the timing of conferences and paper deadlines, etc.
  • biographical rhythms, arising from individual needs – family time, career development milestones, illnesses and vacations, etc.
  • infrastructural rhythms, arising from the development of the buildings and equipment that scientific research depends on. Examples include the launch, operation and expected life of a scientific instrument on a satellite, the timing of software releases, and the development of classification systems and standards.

The paper gives two interesting examples of problems in aligning these rhythms. First, studying long-term phenomena such as river flow on short-term research grants led to mistakes: data collected during an unusually wet period in the early 20th century resulted in serious deficiencies in water management plans for the Colorado river. Second, for NASA’s Mars Exploration Rover (MER) mission, the decision was taken to put the support team on “Mars time”, as the Martian day is 2.7% longer than the earth day. But as the team’s daily work cycle drifted from the normal earth day, serious tensions arose between the family and social needs of the project team and the demands of the project rhythm.
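
To put a number on that drift: a Martian sol is roughly 24 hours 39 minutes, so each Earth day the team’s shift started about 39 minutes later than the day before, rotating all the way around the Earth clock roughly every 37 days (1440 ÷ 39 ≈ 37); within two or three weeks, the “working day” fell in the middle of the Earth night.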

Here’s another example that fascinated me when I was at the NASA software verification lab in the 90s. The Cassini spacecraft took about six years to get to Saturn. Rather than develop all the mission software prior to launch, NASA took the decision to develop only the minimal software needed for launch and navigation, and delayed the start of development of the mission software until just prior to arrival at Saturn. The rationale was that they didn’t want a six year gap between development and use of this software, during which time the software teams might disperse – they needed the teams in place, with recent familiarity with the code, at the point the main science missions started.

For climate science, the IPCC process is clearly a major institutional rhythm, but the infrastructural rhythms that arise in model development interact with this in complex ways. I need to spend time looking at the other rhythms as well.

I had a bit of a gap in blogging over the last few weeks, as we scrambled to pack up our house (we’re renting it out while we’re away), and then of course, the roadtrip to Colorado to start the first of my three studies of software development processes at climate modeling centres. This week, I’m at the CCSM workshop, and will post some notes about it in the next few days. But first, a chance for some reflection.

Ten years ago, when I quit NASA, I was offered a faculty position in Toronto with immediate tenure. The offer was too good to turn down: it’s a great department, with a bunch of people I really wanted to work with. I was fed up with the NASA bureaucracy, the short-termism of the annual budget cycle, and (most importantly) a new boss I couldn’t work with. A tenured academic post was the perfect antidote – I could focus on long-term research problems that interested me most, without anyone telling me what to study.

(Note: Lest any non-academics think this is an easy life, think again. I spend far more time chasing research funding than actually doing research, and I’m in constant competition with an entire community of workaholics with brilliant minds. It’s bloody hard work.)

Tenure is an interesting beast. It’s designed to protect a professor’s independence and ability to pursue long-term research objectives. It also preserves the integrity of academic researchers: if university administrators, politicians, funders, etc. find a particular set of research results to be inconvenient, they cannot fire, or threaten to fire, the professors responsible. But it’s also limited. While it ought to protect curiosity-driven research from the whims of political fashions, it only protects the professor’s position (and salary), not the research funding needed for equipment, travel, students, etc. But the important thing is that tenure gives the professor the freedom to direct her own research programme and the freedom to decide what research questions to tackle.

Achieving tenure is often a trial by fire, especially in the top universities. After demonstrating your research potential by getting a PhD, you then compete with other PhDs to get a tenure-track position. You have to sustain a research program over six to seven years as a junior professor, publishing regularly in the top journals in your field, and gaining the attention of the top people in your field who might be asked to write letters of support for your tenure case. In judging tenure cases, the trajectory and sustainability of the research programme are taken into account – a publication record that appears to be slowing down over the pre-tenure period is a big problem; if you have several papers in a row rejected, especially towards the end of the pre-tenure period, it might be hard to put together a strong tenure case. The least risky route is to stick with the same topic you studied in your PhD, where you already have the necessary background and where you presumably have also ‘found’ your community.

The ‘finding your community’ part is crucial. Scientific research is very much a community endeavor; the myth of the lone scientist in the lab is dead wrong. You have to figure out early in your research career which subfield you belong in, and get to know the other researchers in that subfield, in order to have your own research achievements recognized. Moving around between communities, or having research results scattered across different communities might mean there is no-one who is familiar enough with your entire body of research to write you a strong letter of support for tenure.

The problem is, of course, that this system trains professors to pick a subfield and stick with it. It tends to stifle innovation, and means that many professors then just continue to work on the same problems throughout the rest of their careers. There’s a positive side to this: some hard scientific problems really do need decades of study to master. On the other hand, most of the good ideas come from new researchers – especially grad students and postdocs; many distinguished scientists did their best work when they were in their twenties, when they were new to the field, and were willing to try out new approaches and question conventions.

To get the most value out of tenure, professors should really use it to take risks: to change fields, to tackle new problems, and especially to do research that they couldn’t do when they were chasing tenure. A good example is inter-disciplinary research. It’s hard to do work that spans several recognizable disciplines when you’re chasing tenure – you have to get tenure in a single university department, which usually means you have to be well established in a single discipline. Junior researchers interested in inter-disciplinary research are always at a disadvantage compared to their mono-disciplinary colleagues. But once you make tenure, this shouldn’t matter any more.

The problem is that changing your research direction once you’re an established professor is incredibly hard. This was my experience when I decided a few years ago to switch my research from traditional software engineering questions to the issue of climate change. It meant walking away from an established set of research funding sources, and an established research community, and most especially from an established set of collaborative relationships. The latter I think was particularly hard – colleagues with whom I’ve worked closely for many years still assume I’m interested in the same problems that we’ve always worked on (and, in many ways I still am – I’m trained to be interested in them!). I’m continually invited to co-author papers, to review papers and research proposals, to participate in grant proposals, and to join conference committees in my old field. But to give myself the space to do something very different, I’ve had to be hardheaded and say no to nearly all such invitations. It’s hard to do this without also offending people (“what do you mean you’re no longer interested in this work we’ve devoted our careers to?”). And it’s hard to start over, especially as I need to find new sources of funding, and new collaborators.

One of the things I’ve had to think carefully about is how to change research areas without entirely cutting off my previous work. After many years working on the same set of problems, I believe I know a lot about them, and that knowledge and experience ought to be useful. So I’ve tried to carve out a new research area that allows me to apply ideas that I’ve studied before to an entirely new challenge problem – a change of direction if you like, rather than a complete jump. But it’s enough of a change that I’ve had to find a new community to collaborate with. And different venues to publish in.

Personally, I think this is what the tenure system is made for. Tenured professors should make use of the protection that tenure offers to take risks, and to change their research direction from time to time. And most importantly, to take the opportunity to tackle societal grand challenge problems – the big issues where inter-disciplinary research is needed.

And unfortunately, just about everything about the tenure system and the way university departments and scientific communities operate discourages such moves. I’ve been trying to get many of my old colleagues to apply themselves to climate change, as I believe we need many more brains devoted to the problem. But very few of my colleagues are interested in switching direction like this. Tenure should facilitate it, but in practice, the tenure system actively discourages it.

Susan Leigh Star passed away in her sleep this week, coincidentally on Ada Lovelace Day. As I didn’t get a chance to do a Lovelace post, I’m writing this one belatedly, as a tribute to Leigh.

Leigh Star (sometimes also known as L*) had a huge influence on my work back in the early 90’s. I met her when she was in the UK, at a time when there was a growing community of folks at Sussex, Surrey, and Xerox EuroPARC, interested in CSCW. We organised a series of workshops on CSCW in London, at the behest of the UK funding councils. Leigh spoke at the workshop that I chaired, and she subsequently contributed a chapter entitled “Cooperation Without Consensus in Scientific Problem Solving” to our book, CSCW: Cooperation or Conflict?. Looks like the book is out of print, and I really want to read Leigh’s chapter again, so I hope I haven’t lost my copy – the only chapter I still have electronically is our introduction.

Anyway, Leigh pioneered a new kind of sociology of scientific work practices, looking at the mechanisms by which coordination and sharing occurs across disciplinary boundaries. Perhaps one of her most famous observations is the concept of boundary objects, which I described in detail last year in response to seeing coordination issues arise between geophysicists trying to consolidate their databases. The story of the geologists realizing they didn’t share a common definition of the term “bedrock” would have amused and fascinated her.

It was Leigh’s work on this that first switched me on to the value of sociological studies as a way of understanding the working practices of scientists, and she taught me a lot about how to use ethnographic techniques to study how people use and develop technical infrastructures. I’ve remained fascinated by her ideas ever since. For those wanting to know more about her work, I could suggest this interview with her from 2008, or better yet, buy her book on how classification schemes work, or perhaps read this shorter paper on the Ethnography of Infrastructure. She had just moved to the iSchool at the University of Pittsburgh last year, so I assumed she still had many years of active research ahead of her. I’m deeply saddened that I didn’t get another chance to meet with her.

Leigh – we’ll miss you!

I’m proposing a new graduate course for our department, to be offered next January (after I return from sabbatical). For the course calendar, I’m required to describe it in fewer than 150 words. Here’s what I have so far:

Climate Change Informatics

This introductory course will explore the contribution of computer science to the challenge of climate change, including: the role of computational models in understanding earth systems, the numerical methods at the heart of these models, and the software engineering techniques by which they are built, tested and validated; challenges in management of earth system data, such as curation, provenance, meta-data description, openness and reproducibility; tools for communication of climate science to broader audiences, such as simulations, games, educational software, collective intelligence tools, and the challenges of establishing reputation and trustworthiness for web-based information sources; decision-support tools for policymaking and carbon accounting, including the challenges of data collection, visualization, and trade-off analysis; the design of green IT, such as power-aware computing, smart controllers and the development of the smart grid.

Here’s the rationale:

This is an elective course. The aim is to bring a broad range of computer science graduate students together, to explore how their skills and knowledge in various areas of computer science can be applied to a societal grand challenge problem. The course will equip the students with a basic understanding of the challenges in tackling climate change, and will draw a strong link between the students’ disciplinary background and a series of inter-disciplinary research questions. The course crosscuts most areas of computer science.

And my suggested assessment modes:

  • Class participation: 10%
  • Term Paper 1 (essay/literature review): 40%
  • Term Paper 2 (software design or implementation): 40%
  • Oral Presentation or demo: 10%

Comments are most welcome – the proposal has to get through various committees before the final approval by the school of graduate studies. There’s plenty of room to tweak it in that time.

Brad points out that much of my discussion of a research agenda in climate change informatics focusses heavily on strategies for emissions reduction (aka Mitigation) and neglects the equally important topic of ensuring communities can survive the climate changes that are inevitable (aka Adaptation). Which is an important point. When I talk about the goal of keeping temperatures to below a 2°C rise, it’s equally important to acknowledge that we’ve almost certainly already lost any chance of keeping peak temperature rise much below 2°C.

Which means, of course, that we have some serious work to do, in understanding the impact of climate change on existing infrastructure, and in integrating an awareness of the likely climate change issues into new planning and construction projects. This is what Brad’s Adaptation and Impacts research division focusses on. There are some huge challenges in how we take the data we have (e.g. see the datasets in the CCCSN), downscale these to provide more localized forecasts, and then figure out how to incorporate them into decision making.
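
To give a flavour of what the simplest kind of downscaling involves, here is a sketch of the widely used ‘delta method’ (change-factor) approach, in which the GCM is trusted for the projected change and the local observational record is trusted for the baseline. The numbers and variable names below are invented placeholders, not CCCSN data:

```python
# Sketch of delta-method (change factor) downscaling; illustrative only.
import numpy as np

def delta_downscale(obs_baseline, gcm_baseline, gcm_future):
    """Localized future climatology = observed baseline + GCM-projected change.

    All arguments are length-12 arrays of monthly mean temperature (deg C).
    """
    change = np.asarray(gcm_future) - np.asarray(gcm_baseline)
    return np.asarray(obs_baseline) + change

# Invented example: a station's observed monthly means, and the GCM grid
# cell's means for the same baseline period and for a future period.
obs_baseline = np.array([-6.5, -5.8, -1.0, 6.3, 12.8, 18.0,
                         21.1, 20.3, 15.7, 9.2, 3.1, -3.2])
gcm_baseline = obs_baseline + 1.5                         # GCM biased warm here
gcm_future   = gcm_baseline + np.linspace(2.0, 3.5, 12)   # projected warming

local_future = delta_downscale(obs_baseline, gcm_baseline, gcm_future)
```

Real downscaling, statistical or dynamical, is far more elaborate than this, but the sketch captures why the local observational record matters as much as the model output when informing adaptation decisions.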

One existing tool to point out is the World Bank’s ADAPT, which is intended to help analyze projects in the planning stage, and identify risks related to climate change adaptation. This is quite a different decision-making task from the emissions reduction decision tools I’ve been looking at. But just as important.

Our paper, Engineering the Software for Understanding Climate Change finally appeared today in IEEE Computing in Science and Engineering. The rest of the issue looks interesting too – a special issue on software engineering in computational science. Kudos to Greg and Andy for pulling it together.

Update: As the final paper is behind a paywall, folks might find this draft version useful. The final published version was edited for journal house style, and shortened to fit page constraints. Needless to say, I prefer my original draft…

Our group had three posters accepted for presentation at the upcoming AGU Fall Meeting. As the scientific program doesn’t seem to be amenable to linking, here are the abstracts in full:

Poster Session IN11D. Management and Dissemination of Earth and Space Science Models (Monday Dec 14, 2009, 8am – 12:20pm)

Fostering Team Awareness in Earth System Modeling Communities

S. M. Easterbrook; A. Lawson; and S. Strong
Computer Science, University of Toronto, Toronto, ON, Canada.

Existing Global Climate Models are typically managed and controlled at a single site, with varied levels of participation by scientists outside the core lab. As these models evolve to encompass a wider set of earth systems, this central control of the modeling effort becomes a bottleneck. But such models cannot evolve to become fully distributed open source projects unless they address the imbalance in the availability of communication channels: scientists at the core site have access to regular face-to-face communication with one another, while those at remote sites have access to only a subset of these conversations – e.g. formally scheduled teleconferences and user meetings. Because of this imbalance, critical decision making can be hidden from many participants, their code contributions can interact in unanticipated ways, and the community loses awareness of who knows what. We have documented some of these problems in a field study at one climate modeling centre, and started to develop tools to overcome them. We report on one such tool, TracSNAP, which analyzes the social network of the scientists contributing code to the model by extracting data from an existing project code repository. The tool presents the results of this analysis to modelers and model users in a number of ways: recommendations of who has expertise on particular code modules, suggestions for code sections that are related to files being worked on, and visualizations of team communication patterns. The tool is currently available as a plugin for the Trac bug tracking system.
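
As a rough illustration of the kind of repository mining this involves (and emphatically not TracSNAP’s actual code), the sketch below derives a crude expertise signal and a file-relatedness signal from a list of commits; the commit data and helper names are invented:

```python
# Illustrative sketch only, not TracSNAP's implementation. From a list of
# (author, files_changed) commits, derive (a) who has touched each file most
# (a crude expertise signal) and (b) which files tend to change together.
from collections import Counter
from itertools import combinations

commits = [  # invented example data
    ("alice", ["ocean/mixing.f90", "ocean/tracers.f90"]),
    ("bob",   ["atm/radiation.f90"]),
    ("alice", ["ocean/mixing.f90", "coupler/flux.f90"]),
    ("carol", ["atm/radiation.f90", "coupler/flux.f90"]),
]

touches = Counter()      # (file, author) -> number of commits
co_changed = Counter()   # frozenset({file1, file2}) -> times changed together
for author, files in commits:
    for f in set(files):
        touches[(f, author)] += 1
    for f1, f2 in combinations(sorted(set(files)), 2):
        co_changed[frozenset((f1, f2))] += 1

def likely_experts(filename, n=3):
    """Authors who have committed to this file most often."""
    scores = Counter({a: c for (f, a), c in touches.items() if f == filename})
    return scores.most_common(n)

print(likely_experts("ocean/mixing.f90"))   # -> [('alice', 2)]
```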

Poster Session IN31B. Emerging Issues in e-Science: Collaboration, Provenance, and the Ethics of Data (Wednesday Dec 16, 2009, 8am – 12:20pm)

Identifying Communication Barriers to Scientific Collaboration

A. M. Grubb; and S. M. Easterbrook
Computer Science, University of Toronto, Toronto, ON, Canada.

The lack of availability of the majority of scientific artifacts reduces credibility and discourages collaboration. Some scientists have begun to advocate for reproducibility, open science, and computational provenance to address this problem, but there is no consolidated effort within the scientific community. There does not appear to be any consensus yet on the goals of an open science effort, and little understanding of the barriers. Hence we need to understand the views of the key stakeholders – the scientists who create and use these artifacts.

The goal of our research is to establish a baseline and categorize the views of experimental scientists on the topics of reproducibility, credibility, scooping, data sharing, results sharing, and the effectiveness of the peer review process. We gathered the opinions of scientists on these issues through a formal questionnaire and analyzed their responses by topic.

We found that scientists see a provenance problem in their communications with the public. For example, results are published separately from supporting evidence and detailed analysis. Furthermore, although scientists are enthusiastic about collaborating and openly sharing their data, they do not do so out of fear of being scooped. We discuss these serious challenges for the reproducibility, open science, and computational provenance movements.

Poster Session GC41A. Methodologies of Climate Model Confirmation and Interpretation (Thursday Dec 17, 2009, 8am – 12:20pm)

On the software quality of climate models

J. Pipitone; and S. Easterbrook
Computer Science, University of Toronto, Toronto, ON, Canada.

A climate model is an executable theory of the climate; the model encapsulates climatological theories in software so that they can be simulated and their implications investigated directly. Thus, in order to trust a climate model one must trust that the software it is built from is robust. Our study explores the nature of software quality in the context of climate modelling: How do we characterise and assess the quality of climate modelling software? We use two major research strategies: (1) analysis of defect densities of leading global climate models and (2) semi-structured interviews with researchers from several climate modelling centres. Defect density analysis is an established software engineering technique for studying software quality. We collected our defect data from bug tracking systems, version control repository comments, and from static analysis of the source code. As a result of our analysis, we characterise common defect types found in climate model software and we identify the software quality factors that are relevant for climate scientists. We also provide a roadmap to achieve proper benchmarks for climate model software quality, and we discuss the implications of our findings for the assessment of climate model software trustworthiness.
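
For readers unfamiliar with the technique, defect density is simply the number of confirmed defects normalized by code size, usually per thousand source lines (KSLOC), which makes codebases of very different sizes roughly comparable. A minimal sketch, with invented figures rather than results from our study:

```python
# Defect density = confirmed defects per thousand source lines of code (KSLOC).
# The numbers below are invented placeholders, not findings from this study.

def defect_density(defects, sloc):
    """Defects per KSLOC."""
    return defects / (sloc / 1000.0)

models = {                 # hypothetical model -> (defects found, source lines)
    "model_A": (120, 400_000),
    "model_B": (310, 850_000),
}

for name, (defects, sloc) in models.items():
    print(f"{name}: {defect_density(defects, sloc):.2f} defects/KSLOC")
```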