Okay, I’ve had a few days to reflect on the session on Software Engineering for the Planet that we ran at ICSE last week. First, I owe a very big thank you to everyone who helped – to Spencer for co-presenting and lots of follow up work; to my grad students, Jon, Alicia, Carolyn, and Jorge for rehearsing the material with me and suggesting many improvements, and for helping advertise and run the brainstorming session; and of course to everyone who attended and participated in the brainstorming for lots of energy, enthusiasm and positive ideas.

The first action as a result of the session was to set up a Google group, SE-for-the-planet, as a starting point for coordinating further conversations. I’ve posted the talk slides and brainstorming notes there. Feel free to join the group, and help us build the momentum.

Now, I’m contemplating a whole bunch of immediate action items. I welcome comments on these and any other ideas for immediate next steps:

  • Plan a follow-up workshop at a major SE conference in the fall, and another at ICSE next year (waiting a full year was considered by everyone to be too slow).
  • I should give my part of the talk at U of T in the next few weeks, and we should film it and get it up on the web.
  • Write a short white paper based on the talk, and fire it off to NSF and other funding agencies, to get funding for community building workshops
  • Write a short challenge statement, to which researchers can respond with project ideas to bring to the next workshop.
  • Write up a vision paper based on the talk for CACM and/or IEEE Software
  • Take the talk on the road (à la Al Gore), and offer to give it at any university that has a large software engineering research group (assuming I can come to terms with the increased personal carbon footprint 😉)
  • Broaden the talk to a more general computer science audience and repeat most of the above steps.
  • Write a short book (pamphlet) on this, to be used to introduce the topic in undergraduate CS courses, such as computers and society, project courses, etc.

Phew, that will keep me busy for the rest of the week…

Oh, and I managed to post my ICSE photos at last.

In the last session yesterday, Inez Fung gave the Charney Lecture: Progress in Earth System Modeling since the ENIAC Calculation. I missed it, as I had to go pick up the kids, but she has a recent paper that seems to cover some of the same ground. And this morning, Joanie Kleypas gave the Rachel Carson Lecture: Ocean Acidification and Coral Reef Ecosystems: A Simple Concept with Complex Findings. She also has a recent paper covering what I assume was in her talk (again, I missed it!). Both lectures were recorded, so I’m looking forward to watching them once the AGU posts them.

I made it to the latter half of the session on Standards-Based Interoperability. I missed Stefano Nativi’s talk on the requirements analysis for GIS systems, but there’s lots of interesting stuff on his web page to explore. However, I did catch Olga Wilhelmi presenting the results of a community workshop at NCAR on GIS for Weather, Climate and Impacts. She asked some interesting questions about the gathering of user requirements, and we chatted after the session about how users find the data they need (here’s an interesting set of use cases). I also chatted with Ben Domenico from Unidata/UCAR about open science. We were complaining about how hard it is at a conference like this to get people to put their presentation slides on the web. It turns out that some journals in the geosciences have explicit policies to reject papers if any part of the results has already been presented on the web (including in blogs, powerpoints, etc). Ben’s feeling is that these print media are effectively dead, and he had some interesting thoughts about moving to electronic publishing, although we both worried that some of these restrictive policies might live on in online peer-review venues. (Ben is part of the THREDDS project, which is attempting to improve the way that scientists find and access datasets.)

Down at the ESSI poster session, I bumped into Peter Fox, whom I’d met at the EGU meeting last month. We both chatted to Benjamin Branch about his poster on spatial thinking and earth sciences, and especially how educators approach this. Ben’s PhD thesis looks at all the institutional barriers that prevent changes in high school curricula, all of which militate against the nurturing of cross-disciplinary skills (like spatial reasoning) necessary for understanding global climate change. We brainstormed some ideas for overcoming these barriers, including putting cool tools in the students’ hands (e.g. Google Maps mashups of interesting data sets, or an idea Jon had for a Lego-style constructor kit for building simplified climate models). I also speculated that if education policy in the US prevents this kind of initiative, we should do it in another country, build it into a major success, and then import it back into the US as a best-practice model. Oh, well, I can dream.

Next I chatted to Dicky Allison from Woods Hole, and Tom Yoksas from Unidata/UCAR. Dicky’s poster is on the MapServer project, and Tom shared with us the slides from his talk yesterday on the RAMADDA project, which is intended as a publishing platform for geosciences data. We spent some time playing with the RAMADDA data server, and Tom encouraged us to play with it more, and send comments back on our experiences. Again, most of the discussion was about how to facilitate access to these data sets, how to keep the user interface as simple as possible, and the need for instant access – e.g. grabbing datasets from a server while travelling to a conference, without having to have all the tools and data loaded on a large disk first. Oh, and Tom explained the relationship between NCAR and UCAR, but it’s too complicated to repeat here.

Here’s an aside. Browsing the UCAR pages, I just found the Climate Modeller’s Commandments. Nice.

This afternoon, I attended the session “A Meeting of the Models”, on the use of multi-model ensembles for weather and climate prediction. The first speaker was Peter Houtekamer, talking about the Canadian Ensemble Prediction Systems (EPS). The key idea of an ensemble is that it samples across the uncertainty in the initial conditions. However, challenges arise from incomplete understanding of the model error, so the interesting question is how to sample adequately across the space, to get a better ensemble spread. The NCEP Short-Range Ensemble Forecast System (SREF) is claimed to be the first real-time operational regional ensemble prediction system in the world. Even grander is TIGGE, in which the outputs of lots of operational EPSs are combined into an archive. The volume of the database is large (100s of ensemble members), but you really only need something like 20-40 members to get converging scores (he cites Talagrand for this) (aside: Talagrand diagrams are an interesting way of visualizing model spread). NAEFS combines the 20-member American (NCEP) and 20-member Canadian (MSC) operational ensemble forecasts, to get a 40-member ensemble. Nice demonstration of how NAEFS outperforms both of the individual ensembles from which it is constructed. Multi-centre ensembles improve the sampling of model error, but impose a big operational cost: data exchange protocols, telecommunications costs, etc. As more centres are added, there are likely to be diminishing returns.
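The Talagrand diagram idea is simple enough to sketch in a few lines. Here’s a toy rank-histogram computation (my own illustration, not anything shown in the talk): for each observation, count how many ensemble members fall below it. A flat histogram suggests a well-calibrated spread; a U-shape suggests the ensemble is under-dispersive.

```python
import random

def rank_histogram(observations, ensembles):
    """Talagrand (rank) histogram: for each observation, count how many
    ensemble members fall below it. With N members there are N+1 possible
    ranks, so the histogram has N+1 bins."""
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for obs, members in zip(observations, ensembles):
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    return counts

# Toy check: when the observations are statistically indistinguishable
# from the ensemble members, the ranks should be roughly uniform.
random.seed(42)
obs = [random.gauss(0, 1) for _ in range(5000)]
ens = [[random.gauss(0, 1) for _ in range(9)] for _ in range(5000)]
hist = rank_histogram(obs, ens)
print(hist)  # ten bins, each somewhere around 500
```

An under-dispersive ensemble would instead pile observations into the outermost bins, which is exactly what the diagram is designed to reveal.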

The American Geophysical Union’s Joint Assembly is in Toronto this week. It’s a little slim on climate science content compared to the EGU meeting, but I’m taking in a few sessions as it’s local and convenient. Yesterday I managed to visit some of the climate science posters. I also caught the last talk of the session on connecting space and planetary science, and learned that the solar cycles have a significant temperature impact on the upper atmosphere but no obvious effect on the lower atmosphere, though more research is needed to understand the impact on climate simulations. (Heather Andres’ poster has some more detail on this.)

This morning, I attended the session on Regional Scale Climate Change. I’m learning that understanding the relationship between temperature change and increased tropical storm activity is complicated, because tropical storms seem to react to complex patterns of temperature change, rather than just the temperature itself. I’m also learning that you can use statistical downscaling from the climate models to get finer-grained regional simulations of the changes in rainfall, e.g. over the US, leading to predictions of increased precipitation over much of the US in the winters and decreased in the summers. However, you have to be careful, because the models don’t capture seasonal variability well in some parts of the continent. A particular challenge for regional climate predictions is that some places (e.g. Caribbean islands) are just too small to show up in the grids used in General Circulation Models (GCMs), which means we need more work on regional models to get the necessary resolution.
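For anyone unfamiliar with statistical downscaling: in its simplest form it is just a regression fitted between a coarse model field and local observations, which is then applied to projected model output. A toy sketch (all numbers invented for illustration, much simpler than anything used in practice):

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b, the simplest possible
    statistical-downscaling transfer function."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b

# Invented training data: coarse grid-cell winter precipitation (mm/day)
# vs. a local station that sees roughly 1.4x the grid-cell average.
grid = [1.0, 2.0, 3.0, 4.0, 5.0]
station = [1.5, 2.9, 4.3, 5.7, 7.1]
a, b = fit_linear(grid, station)

# Apply the fitted transfer function to a projected future grid-cell value.
future_local = a * 6.0 + b
print(round(a, 2), round(b, 2), round(future_local, 2))  # 1.4 0.1 8.5
```

The caveat in the paragraph above applies directly here: the regression is only as good as the model’s ability to reproduce the local statistics it was trained against.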

The final talk was Noah Diffenbaugh’s, on an ensemble approach to regional climate forecasts. He’s using the IPCC’s A1B scenario (but notes that in the last few years, emissions have exceeded those for this scenario). The model is nested – a high-resolution regional model (25km) is nested within a GCM (CCSM3, at T85 resolution), but the information flows only in one direction, from the GCM to the RCM. As far as I can tell, the reason it’s one-way is that the GCM run is pre-computed; specifically, it is taken by averaging 5 existing runs of the CCSM3 model from the IPCC AR4 dataset, generating 6-hourly 3D atmosphere fields to drive the regional model. The runs show that by 2030-2039, we should expect 6-8 heat stress events per decade across the whole of the south-western US (where a heat stress event is the kind of thing that should only hit once per decade). Interestingly, the warming is greater in the south-eastern US, but because the south-western states are already closer to the threshold temperature for heat stress events, they get more heatwaves. Noah also showed some interesting validation images, to demonstrate that the regional model reproduces 20th Century temperatures over the US much better than the GCM does.

Noah also talked a little about the role of the 2°C threshold used in climate negotiations, particularly at the Copenhagen meeting. The politicians don’t like that the climate scientists are expressing uncertainty about the 2°C threshold. But there has to be uncertainty, because the models show that even below 2 degrees, there are some serious regional impacts, in this case on the US. His take-home message is that we need to seriously question greenhouse gas mitigation targets. One of the questioners pointed out that there is also some confusion about whether the 2°C is supposed to be measured above pre-industrial temperatures.

After lunch, I attended the session on Breakthrough Ideas and Technologies for a Planet at Risk II. The first talk was by Lewis Gilbert, on monitoring and managing a planet at risk. First, he noted that the planet itself isn’t really at risk – destroying it is still outside our capacity. Life will survive. Humans will survive (at least for a while). But it’s the quality of that survival that is in question. He gave some definitions of sustainability (he has quibbles with them all). First, Brundtland’s – future generations should be able to meet their own needs. Then natural capital – future generations should have a standard of living better than or equal to our own. And Gilbert’s own: the existence of a set of possible futures that are acceptable in some satisficing sense. But all of these definitions are based on human values and human life, so the concept of sustainability has human concerns deeply embedded in it. The rest of his talk was a little vague – he described a state space, E, with multiple dimensions (e.g. physical, such as CO2 concentrations; sociological, such as infant mortality in Somalia; biological, such as amphibian counts in the Sierra Nevada), in which we can talk about quality of human life as some function of the vectors. The question then becomes what are the acceptable and unacceptable regions of E. But I’m not sure how this helps any.

Alan Robock talked about geoengineering. He’s conducted studies of the effect of seeding sulphur particles into the atmosphere, using NASA’s climate model – in particular, injecting them over the Arctic, where there is the most temperature change and the least impact on humans. His studies show that the seeding does have a significant impact on temperature, but as soon as you stop the seeding, the global warming quickly rises to where it would have been. So basically, once you start, you can’t stop. Also, you get other effects: e.g. a reduction of the tropical monsoons, and a reduction of precipitation. Here’s an alternative: could it be done by seeding only in the Arctic summer (when the temperature rise matters), and not in the winter – e.g. seed in April, May and June, or just in April, rather than year round? He’s exploring options like these with the model. Interesting aside: Rolling Stone Magazine, Nov 3, 2006, “Dr Evil’s Plan to Stop Global Warming”. There was a meeting convened by NASA, at which Alan started to create a long list of risks associated with geoengineering (he has a newer paper updating the list currently in submission).

George Shaw talked about biogeologic carbon sequestration. First, he demolished the idea that peak oil / peak coal etc. will save us, by calculating the amount of carbon that can easily be extracted from known fossil fuel reserves. Carbon capture ideas include iron fertilization of the oceans, which stimulates plankton growth, which extracts carbon from the atmosphere. Cyanobacteria also extract carbon – e.g. attach an algae farm to every power station smoke stack. However, to make any difference, the algae farm for one power plant might have to be 40-50 square km. He then described a specific case study: taking the Salton Basin area in southern California and filling it up with an algae farm. This would remove a chunk of agricultural land, but would probably make money under the current carbon trading schemes.

Roel Snieder gave a talk, “Facing the Facts and Living Our Values”. He showed an interesting graph on energy efficiency, which shows that 60% of the energy we use is lost. He also presented a version of the graph showing cost of intervention against emissions reduction, pointing out that sequestration is the most expensive choice of all. Another nice point on understanding the facts: how much CO2 is produced by burning all the coal in one railroad car? The answer is about 3 times the weight of the coal, but most people would say only a few ounces, because gases are very light. He also has a neat public lecture, and encouraged the audience to get out and give similar lectures to the public.
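The stoichiometry behind that answer is easy to check: each carbon atom picks up two oxygen atoms from the air, so the CO2 weighs 44/12 ≈ 3.7 times the carbon burned. A quick back-of-envelope calculation (the 70% carbon content and 100-tonne railroad car are my own illustrative assumptions, not figures from the talk):

```python
# Mass ratio of CO2 produced to carbon burned: C + O2 -> CO2.
M_C = 12.011   # g/mol, carbon
M_O = 15.999   # g/mol, oxygen
ratio_per_carbon = (M_C + 2 * M_O) / M_C   # about 3.66

# Coal is not pure carbon; assume ~70% carbon by mass (illustrative).
carbon_fraction = 0.70
tonnes_coal = 100.0   # roughly one railroad car (illustrative)
tonnes_co2 = tonnes_coal * carbon_fraction * ratio_per_carbon
print(round(ratio_per_carbon, 2), round(tonnes_co2, 1))  # 3.66 256.5
```

So pure carbon gives a factor of about 3.7, and typical coal somewhat less, consistent with the “about 3 times” answer.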

Eric Barron: Beyond Climate Science. It’s a mistake for the climate science community to say that “the science is settled” and we need to move on to mitigation strategies. There are still five things we need:

  1. A true climate service – an authoritative, credible, user-centric source of information on climate (models and data). E.g. advice on resettlement of threatened towns, advice on forestry management, etc.
  2. Deliberately expand the family of forecasting elements. Some natural expansion of forecasting is occurring, but the geoscience community needs to push this forward deliberately.
  3. Invest in stage 2 science – social sciences and the human dimension of climate change (the physical science budget dwarfs the social sciences budget).
  4. Deliberately tackle the issue of scale and the demand for an integrated approach.
  5. Evolve from independent research groups to environmental “intelligence” centres. Cohesive regional observation and modeling framework. And must connect vigorously with users and decision-makers.

Key point: we’re not ready. He characterizes the research community as a cottage industry of climate modellers. Interesting analogy: health sciences, which is almost entirely a “point-of-service” community that reacts to people coming in the door, with no coherent forecasting service. Finally, some examples of forecasting the spread of West Nile virus, Lyme disease, etc.

ICSE proper finished on Friday, but a few brave souls stayed around for more workshops on Saturday. There were two workshops in adjacent rooms that had a big topic overlap: SE Foundations for End-user programming (SEE-UP) and Software Engineering for Computational Science and Engineering (SECSE, pronounced “sexy”). I attended the latter, but chatted to some people attending the former during the breaks – seems we could have merged the two workshops for interesting effect. At SECSE, the first talk was by Greg Wilson, talking about the results of his survey of computational scientists. Some interesting comments about the qualitative data he showed, including the strong confidence exhibited in most of the responses (people who believe they are more effective at using computers than their colleagues). This probably indicates a self-selection bias, but it would be interesting to probe the extent of this. Also, many of them take a “toolbox” perspective – they treat the computer as a set of tools, and associate effectiveness with how well people understand the different tools, and how much they take the time to understand them. Oh and many of them mention that using a Mac makes them more effective. Tee Hee.

Next up: Judith Segal, talking about organisational and process issues – particularly the iterative, incremental approach scientists take to building software, with only cursory requirements analysis and only cursory testing. The model works because the programmers are the users – they build software for themselves, and because the software is (initially) developed only to solve a specific problem, they can ignore maintainability and usability. Of course, the software often does escape from the lab and get used by others, which brings a large risk of incorrect, poorly designed software leading to incorrect results. For the scientific communities Judith has been working with, there’s a cultural issue too – the scientists don’t value software skills, because they’re focussed on scientific skills and understanding. Also, openness is a problem, because they are busy competing for publications and funding. But this is clearly not true of all scientific disciplines, as the climate scientists I’m familiar with are very different: for them, computational skills are right at the core of their discipline, and they are much more collaborative than competitive.

Roscoe Bartlett, from Sandia Labs, presented “Barely Sufficient Software Engineering: 10 Practices to Improve Your CSE Software”. It’s a good list: agile (incremental) development, code management, mailing lists, checklists, making the source code the primary source of documentation. Most important was the idea of “barely sufficient”: mindless application of formal software engineering processes to computational science doesn’t make any sense.

Carlton Crabtree described a study design to investigate the role of agile and plan-driven development processes in scientific software development projects. They are particularly interested in exploring the applicability of the Boehm and Turner model as an analytical tool. They’re also planning to use grounded theory to explore the scientists’ own perspectives, although I don’t quite get how they will reconcile the constructivist stance of grounded theory (it’s intended as a way of exploring the participants’ own perspectives) with the use of a pre-existing theoretical framework such as the Boehm and Turner model.

Jeff Overbey, on refactoring Fortran. First, he started with a few thoughts on the history of Fortran (the language that everyone keeps thinking will die out, but never does – some reference to zombies in here…). Jeff pointed out that languages only ever accumulate features (because removing features breaks backwards compatibility), so they just get more complex and harder to use with each update to the language standard. So he’s looking at whether you can remove old language features using refactoring tools. This is especially useful for the older language features that encourage bad software engineering practices. Jeff then demoed his tool. It’s neat, but is currently only available as an Eclipse plugin. If there were an Emacs version, I could get lots of climate scientists to use it. [Note: in the discussion, Greg recommended the book Working Effectively with Legacy Code.]

Next up: Roscoe again, this time on integration strategies. The software integration issues he describes are very familiar to me, and he outlined an “almost” continuous integration process, which makes a lot of sense. However, some of the things he describes as challenges don’t seem to be problems in the environment I’m familiar with (the climate scientists at the Hadley Centre). I need to follow up on this.

Last talk before the break: Wen Yu, talking about the use of program families for scientific computation, including a specific application for finite element method computations.

After an infusion of coffee, Ritu Arora talked about the application of generative programming for scientific applications. She used checkpointing as a proof-of-concept, and created a domain-specific language for describing checkpointing needs. Checkpointing is interesting because it tends to be a cross-cutting concern; generating code for it and automatically weaving that code into the application is likely to be a significant benefit. Initial results are good: the automatically generated code had performance profiles similar to hand-written checkpointing code.
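I don’t know the details of Ritu’s DSL or weaving machinery, but the idea of treating checkpointing as a cross-cutting concern can be illustrated with a simple decorator that wraps checkpoint/restart behaviour around a stepwise computation without touching its body (everything here is invented for illustration):

```python
import json
import os
import tempfile

def checkpointed(path):
    """Weave checkpoint/restart behaviour around a stepwise computation.
    The wrapped function must accept (state, step) and return a new state."""
    def wrap(step_fn):
        def run(state, n_steps, every=10):
            start = 0
            if os.path.exists(path):  # restart from the last checkpoint
                with open(path) as f:
                    saved = json.load(f)
                start, state = saved["step"], saved["state"]
            for step in range(start, n_steps):
                state = step_fn(state, step)
                if (step + 1) % every == 0:  # periodic checkpoint
                    with open(path, "w") as f:
                        json.dump({"step": step + 1, "state": state}, f)
            return state
        return run
    return wrap

ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")

@checkpointed(ckpt)
def accumulate(state, step):
    # The "science" stays oblivious to checkpointing.
    return state + step

result = accumulate(0, 100)
print(result)  # sum of 0..99 = 4950
```

The point of generating this kind of wrapper from a DSL rather than writing it by hand is that the checkpointing policy can change without anyone editing the scientific code itself.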

Next: Daniel Hook on testing for code trustworthiness. He started with some nice definitions and diagrams that distinguish some of the key terminology, e.g. faults (mistakes in the code) versus errors (outcomes that affect the results). Here’s a great story: he walked into a glass storefront window the other day, thinking it was a door. The fault was mistaking a window for a door, and the error was about three feet. There are two key problems: the oracle problem (we often have only approximate or limited oracles for what answers we should get) and the tolerance problem (there’s no objective way to say that the results are close enough to the expected results for us to call them correct). Standard SE techniques often don’t apply. For example, using mutation testing to check the quality of a test set doesn’t work on scientific code because of the tolerance problem – the mutant might be closer to the expected result than the unmutated code. So he’s exploring a variant, and it’s looking promising. The project is called matmute.
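The tolerance problem is easy to demonstrate. Here is a contrived example of my own (not Daniel’s actual approach): a mutant perturbs the trapezoid rule slightly, but both versions land within the oracle’s tolerance, so the mutant survives.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule -- the original code."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

def trapezoid_mutant(f, a, b, n):
    """Mutant: endpoint weight perturbed from 0.5 to 0.499."""
    h = (b - a) / n
    total = 0.499 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

exact = math.e - 1.0  # integral of exp(x) over [0, 1]
tol = 1e-3            # the kind of tolerance a scientific test suite might use

err_orig = abs(trapezoid(math.exp, 0.0, 1.0, 100) - exact)
err_mutant = abs(trapezoid_mutant(math.exp, 0.0, 1.0, 100) - exact)

# Both versions pass the tolerance-based oracle, so the mutant is not
# "killed" -- this test set cannot tell them apart.
print(err_orig < tol, err_mutant < tol)  # True True
```

With a sharper function or a tighter tolerance the mutant would be caught, which is exactly why choosing tolerances is so hard for scientific test oracles.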

David Woollard, from JPL, talking about inserting architectural constraints into legacy (scientific) code. David has been doing some interesting work with assessing the applicability of workflow tools to computational science.

Parmit Chilana from U Washington. She’s working mainly with bioinformatics researchers, comparing the work practices of practitioners with researchers. The biologists understand the scientific relevance, but not the technical implementation; the computer scientists understand the tools and algorithms, but not the biological relevance. She’s clearly demonstrated the need for domain expertise during the design process, and explored several different ways to bring both domain expertise and usability expertise together (especially when the two types of expert are hard to get because they are in great demand).

After lunch, the last talk before we break out for discussion: Val Maxville, on preparing scientists for scalable software development. Val gave a great overview of the challenges for software development at iVEC. AuScope looks interesting – an integration of geosciences data across Australia. For each of the different projects, Val assessed how much they have taken up practices from the SWEBOK – how much they have applied them, and how much they value them. She finished with some thoughts on the challenges of software engineering education for this community, including balancing between generic and niche content, and between ‘on demand’ and a more planned skills development process.

And because this is a real workshop, we spent the rest of the afternoon in breakout groups having fascinating discussions. This was the best part of the workshop, but of course required me to put away the blogging tools and get involved (so I don’t have any notes…!). I’ll have to keep everyone in suspense.

Friday, the last day of the main conference, kicked off with Pamela Zave’s keynote, “Software Engineering for the Next Internet”. Unfortunately I missed the first few minutes of the talk, and I regret that, because this was an excellent keynote. Why do I say that? Because Pamela demonstrated a beautiful example of what I want to call “software systems thinking”. By analyzing them from a software engineering perspective, she demonstrated how some of the basic protocols of the internet (e.g. the Session Initiation Protocol, SIP), and the standardization process by which they are developed, are broken in interesting ways. The reason they are broken is that they ignore software engineering principles. I thought the analysis was compelling: both thorough in terms of the level of detail, and elegant in the simplicity of the analysis.

Here are some interesting tidbits:

  • A corner case is a possible behaviour that emerges from the interaction of unanticipated constraints. It is undesirable, and designers typically declare it to be rare and unimportant, without any evidence. Understanding and dealing with corner cases is important for assessing the robustness of a design.
  • The IETF standards process is an extreme (pathological?) case of bottom-up thinking. It sets up an artificial conflict between generality and simplicity, because any new needs are dealt with by adding more features and more documents to the standard. Generality is always achieved by making the design more complex. Better abstractions and some more top-down analysis can provide simple and general designs (and Pamela demonstrated a few).
  • How did the protocols get to be this broken? Most network functions are provided by cramming them into the IP layer. This is believed to be more efficient, and in the IETF design process, efficiency always takes precedence over separation of concerns.
  • We need a successor to the end-to-end principle. Each application should run on a stack of overlays that exactly meets its requirements. Overlays have to be composable. The bottom underlay runs on a virtual network which gets a predictable slice of the real network resources. Of course, there are still some tough technical challenges in designing the overlay hierarchy.

So, my reflections. Why did I like this talk so much? First it had an appealing balance of serious detail (with clear explanations) and new ideas that are based on an understanding of the big picture. Probably it helps that she’s talking about an analysis approach using techniques that I’m very familiar with (some basic software engineering design principles: modularity, separation of concerns, etc), and applies them to a problem that I’m really not familiar with at all (detailed protocol design). So that combination allows me to follow most of the talk (because I understand the way she approaches the problem), but tells me a lot of things that are new and interesting (because the domain is new to me).

She ended with a strong plug for domain-specific research. It’s more fun and more interesting! I agree wholeheartedly with that. Much of software engineering research is ultimately disappointing because, in trying to be too generic, it ends up being vague and wishy-washy. And it misses good pithy examples.

So, having been very disappointed with Steve McConnell’s opening keynote yesterday, I’m pleased to report that the keynotes got steadily better over the week. Thursday’s keynote was by Carlo Ghezzi, entitled Reflections on 40+ Years of Software Engineering Research and Beyond: An Insider’s View. He started with a little bit of history of the SE research community and the ICSE conference, but the main part of the talk was a trawl through the data from the conference over the years, motivated by questions such as “how international are we as a community?”, “how diverse?” (e.g. academia, industry…), and “how did the research areas included in ICSE evolve?”. For example, there has been a clear trend in the composition of the program committee, from being N. American dominated (80% at the first ICSE) to now approximately equal N. American and European, with some from Asia and elsewhere. However, there is a startling trend in the industry vs. academia mix. The attendees at the first conference were 80% industry and only 20% academics; this has steadily changed, and the conference is now 90% academics. The number of accepted papers each year has remained fairly steady (the average is 44), but with strong growth in submissions over the past 15 years, from 150 to 400, which gives us a paper acceptance rate now well below 15%. This is clearly good for the academics – the low acceptance rate keeps the quality of the accepted papers high, and makes the conference the top choice as a publication venue. But a strong academic research program clearly does not attract practitioners to attend.

In Carlo’s analysis of research areas, I was struck by the graph of number of papers on programming languages, which looks like a pair of vampire teeth – a huge spike in this area in the early days of ICSE, then nothing for years, and again a huge spike in the last couple of years. A truly interesting and surprising result.

Towards the end of the talk, Carlo got onto the question of how we could identify our best products. He talked about the strengths and weaknesses of quantitative measures such as citation count (difficult as it’s a moving target, and you have to account for journal/conference versions), number of downloads from the ACM digital library over 12 months, etc. He drew a lot on a report by the Joint Committee on Quantitative Assessment of Research. He also mentioned Meyer’s viewpoint article in CACM April 2009, and of course, Parnas’s somewhat less nuanced “Stop the numbers game”. Why is the problem of quantitative assessment of research becoming so hot today? It’s being increasingly used to rank journals, conferences and individuals. Many stakeholders now need to evaluate research, and peer review is considered to be expensive and subjective, while numeric metrics are considered to be simple and objective. The Joint Committee report says that, on the contrary, numeric metrics are simple and misleading. From the report: much of modern bibliometrics is flawed; the meaning of a citation can be even more subjective than peer review; citation counts are only valid if reinforced by other judgements.

Carlo’s final message was that we have to care about the impact of our research: understanding, measuring, and improving it. Because if we don’t, others will (governments, funding agencies, universities, etc). Okay, that’s a good argument. I’ve been skeptical of SIGSOFT’s Impact Project in the past, largely because I think the process by which research ideas filter into industrial practice is much more complex, and takes much longer, than everyone seems to expect. But I guess taking control of the assessment of impact is the obvious way to address this issue.

After the break, Jorge presented his paper on the Secret Life of Bugs. He did a great job of presenting the work, to an absolutely packed room, and lots of people commented afterwards on how much they enjoyed the paper. I beamed with pride.

But for most of the day, I was busy trying to finish off my talk “Software Engineering for the Planet”, in time for the session at 2pm. Many thanks to Spencer, Jon, Carolyn and Alicia for helping me polish it prior to delivery. I’ll get the slides up on the web soon. I think the session went very well – the questions and discussions afterwards were very encouraging – most people seemed to immediately get the key message (that we should stop focussing our energies on personal green choices, and instead figure out how our professional skills and experience can be used to address the climate crisis). Aran posted a quick summary of the session, and some afterthoughts. Now we’ve got to do the community building, and keep the momentum going. [Aran said he doesn’t think I’ll get much research done in the next few months. He might be right, but I can just declare that this is now my research…]

As a fan of Edward Tufte’s books on the power of beautiful visualizations of qualitative and quantitative data, I’m keen on the idea of exploring new ways of visualizing the climate change challenge. In part because many key policymakers are not likely to ever read the detailed reports on the science, but a few simple, compelling graphics might capture their attention.

I like the visualizations collected by the UNEP, especially their summary of climate processes and effects, their strategic options curve, the map of political choices, summary of emissions by sector, a guide to emissions assessment, trends in sea level rise, and CO2 emissions per capita. I should also point out that the IPCC reports are full of great graphics too, but there’s no easy visual index – you have to read the reports.

Now these are all very nice, and (presumably) the work of professional graphic artists. But they’re all static. The scientist in me wants to play with them. I want to play around with different scales on the axes. I want to select from among different data series. And I want to do this in a web browser that’s directly linked to the data sources, so that I don’t have to mess around with the data directly, nor worry about how the data is formatted.

What I have in mind is something like Gap Minder. This allows you to play with the data, create new views, and share them with others. Many Eyes is similar, but goes one step further in allowing a community to create entirely new kinds of visualization, and enhance each other’s, in a social networking style. Now, if I can connect up some of these to the climate data sets collected by the IPCC, all sorts of interesting things might happen. Except that the IPCC data sets don’t have enough descriptive metadata for non-experts to make sense of them. But fixing that’s another project.

Oh, and the periodic table of visualization methods is pretty neat as a guide to what’s possible.

Update: (via Shelly): Worldmapper is an interesting way of visualizing international comparisons.

Okay, the main conference started today, and we kicked off with the keynote speaker – Steve McConnell talking about “The 10 most powerful ideas in software engineering”. Here are my thoughts: when magazines are short of ideas for an upcoming issue, they resort to the cheap journalist’s trick of inventing top ten lists. It makes for easy filler that never really engages the brain. Unfortunately, this approach also leads to dull talks. The best keynotes have a narrative thread. They tell a story. They build up ideas in interesting new ways. The top ten format kills this kind of narrative stone dead (except perhaps when used in parody). Okay, so I didn’t like the format, but what about the content? Steve walked through ten basic concepts that we’ve been teaching in our introductory software engineering courses for years, so I learned nothing new. Maybe this would be okay as a talk to junior programmers who missed out on software engineering courses in school. For ICSE keynotes, I expect a lot more – I’d have liked at least some sharper insights, or better marshalling of the evidence. I’m afraid I have to add this to my long list of poor ICSE keynotes. Which is okay, because ICSE keynotes always suck – even when the chosen speakers are normally brilliant thinkers and presenters. Maybe I’ll be proved wrong later this week… For what it’s worth, here’s his top ten list (which he said was in no particular order):

  1. Software Development work is performed by human beings. Human factors make a huge difference in the performance of a project.
  2. Incrementalism is essential. The benefits are feedback, feedback, and feedback! (on the software, on the development process, on the developer capability). And making small mistakes that prevent bigger mistakes later.
  3. I’ve no idea what number 3 was. Please excuse my inattention.
  4. Cost to fix a defect increases over time, because of the need to fix all the collateral and downstream consequences of the error.
  5. There’s an important kernel of truth in the waterfall model. Essentially, there are three intellectual phases: discovery, invention, construction. They are sequential, but also overlapping.
  6. Software estimates can be improved over time, by reducing their uncertainty as the project progresses.
  7. The most powerful form of reuse is full reuse – i.e. not just code and design, but all aspects of process.
  8. Risk management is important.
  9. Different kinds of software call for different kinds of software development (the toolbox approach). This was witty: he showed a series of pictures of different kinds of saw, then newsflash: software development is as difficult as sawing.
  10. The software engineering body of knowledge (SWEBOK)

Next up, the first New Ideas and Emerging Results session. This is a new track at this year’s ICSE, and the intent is to have a series of short talks, with a poster session at the end of the day. Although I’m surprised how hard it was to get a paper accepted: of 118 submissions, they selected only 21 for presentation (an 18% acceptance rate). The organisers also encouraged the presenters to use the Pecha Kucha format: 20 slides on an automated timer, with 20 seconds per slide. Just to make it more fun and more dynamic.

I’m disappointed to report that none of the speakers this morning took up this challenge, although Andrew Begel’s talk on social networking for programmers was very interesting (and similar to some of our ideas for summer projects this year). The fourth talk, by Abram Hindle, also didn’t use the Pecha Kucha format, but made up for it with a brilliant and beautiful set of slides that explain how to form interesting time series analysis visualizations of software projects by mining the change logs.

Buried in the middle of the session was an object lesson in the misuse of empirical methods. I won’t name the guilty parties, but let me describe the flaw in their study design. Two teams were assigned a problem to analyze; one team was given a systems architecture, and the other wasn’t. To measure the effect of being given this architecture on the requirements analysis, the authors asked experts to rate each of several hundred requirements generated by each of the teams, and then used a statistical test to see whether the requirements from one team differed on this rating from the other’s. Unsurprisingly, they discovered a statistically significant difference. Unfortunately, the analysis is completely invalid, because they made a classic unit of analysis error. The unit of analysis for the experimental design is the team, because it was teams that were assigned the different treatments. But the statistical test was applied to individual requirements, with no randomization of these requirements – all the requirements from a given team have to be taken as a single unit. The analysis that was performed in this study merely shows that the requirements came from two different teams, which we knew already. It shows nothing at all about the experimental hypothesis. I guess the peer review process has to let a few klunkers through.
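To see how badly this kind of unit-of-analysis error can mislead, here’s a small simulation sketch. All the numbers (team-level variance, item-level variance, items per team) are invented for illustration: two teams receive no treatment at all, yet an item-level test “finds” a significant difference most of the time, purely because requirements from the same team share a team-level baseline.

```python
import math
import random

def item_level_false_positive_rate(n_sims=500, items_per_team=100,
                                   team_sd=1.0, item_sd=0.5, seed=42):
    """Simulate the flawed design: two teams, and NO real treatment effect.
    Each team has its own baseline quality (team_sd); expert scores for its
    individual requirements scatter around that baseline (item_sd). We then
    (wrongly) run a two-sample z-test on the individual scores."""
    rng = random.Random(seed)

    def var(xs, m):
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    false_positives = 0
    for _ in range(n_sims):
        teams = []
        for _team in range(2):
            baseline = rng.gauss(0, team_sd)  # shared team-level effect
            teams.append([baseline + rng.gauss(0, item_sd)
                          for _ in range(items_per_team)])
        a, b = teams
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        se = math.sqrt(var(a, mean_a) / len(a) + var(b, mean_b) / len(b))
        if abs(mean_a - mean_b) / se > 1.96:  # "significant" at the 5% level
            false_positives += 1
    return false_positives / n_sims

rate = item_level_false_positive_rate()
# rate lands far above the nominal 5%: the test mostly detects
# "these scores came from different teams", not any treatment effect.
```

The cure is to make the team the unit of analysis, which with only one team per treatment leaves nothing to test statistically.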

Well, we reach the end of the session and nobody did the Pecha Kucha thing. Never mind – my talk is first up in the next NIER session this afternoon, and I will take the challenge. Should be hilarious. On the plus side, I was impressed with the quality of all the talks – they all managed to pack in key ideas, make them interesting, and stick to the 6 minute time slot.

So, here’s an interesting thought that came up at the Michael Jackson festschrift yesterday. Michael commented in his talk that understanding is not a state, it’s a process. David Notkin then asked how we can know how well we’re doing in that process. I suggested that one of the ways you know is by discovering where your understanding is incorrect, which can happen if your model surprises you. I noticed this is a basic mode of operation for earth system modelers. They put their current best understanding of the various earth systems (atmosphere, ocean, carbon cycle, atmospheric chemistry, soil hydrology, etc.) into a coupled simulation model and run it. Whenever the model surprises them, they know they’re probing the limits of their understanding. For example, the current generation of models at the Hadley Centre don’t get the Indian Monsoon in the right place at the right time. So they know there’s something in that part of the model they don’t yet understand sufficiently.

Contrast this with the way we use (and teach) modeling in software engineering. For example, students construct UML models as part of a course in requirements analysis. They hand in their models, and we grade them. But at no point in the process do the models ever surprise their authors. UML models don’t appear to have the capacity for surprise. Which is unfortunate, given what the students did in previous courses. In their programming courses, they were constantly surprised. Their programs didn’t compile. Then they didn’t run. Then they kept crashing. Then they gave the wrong outputs. At every point, the surprise is a learning opportunity, because it means there was something wrong with their understanding, which they have to fix. This contrast explains a lot about the relative value students get from programming courses versus software modeling courses.

Now of course, we do have some software engineering modeling frameworks that have the capacity for surprise. They allow you to create a model and play with it, and sometimes get unexpected results. For example, Alloy. And I guess model checkers have that capacity too. A necessary condition is that you can express some property that your model ought to have, and then automatically check that it does have it. But that’s not sufficient, because if the properties you express aren’t particularly interesting, or are trivially satisfied, you still won’t be surprised. For example, UML syntax checkers fall into this category – when your model fails a syntax check, that’s not surprising, it’s just annoying. Also, you don’t necessarily have to formally state the properties – but you do have to at least have clear expectations. When the model doesn’t meet those expectations, you get the surprise. So surprise isn’t just about executability, it’s really about falsifiability.
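Alloy has its own modeling language, but the underlying move – state a property your model ought to have, then check it mechanically over a bounded scope – can be sketched in plain Python. The toy lift policy below is entirely made up for illustration; the point is that the check can surprise you when the expected property fails:

```python
from itertools import product

# Toy "model" of a lift controller (entirely invented): the policy always
# serves the highest-numbered pending request first.
def next_floor(current, requests):
    return max(requests) if requests else current

# Property we EXPECT to hold: the lift never sails past a pending request
# on its way to the one it chose to serve. Check it exhaustively over a
# small scope, Alloy-style.
def check_no_skipping(max_floor=5):
    counterexamples = []
    floors = range(max_floor)
    for current in floors:
        for r1, r2 in product(floors, repeat=2):
            requests = {r1, r2} - {current}
            if not requests:
                continue
            target = next_floor(current, requests)
            skipped = [r for r in requests
                       if min(current, target) < r < max(current, target)]
            if skipped:
                counterexamples.append((current, requests, target, skipped))
    return counterexamples

counterexamples = check_no_skipping()
```

Running the check turns up counterexamples (e.g. from floor 0 with requests at 1 and 4, the policy heads straight to 4, sailing past 1) – exactly the kind of surprise that a syntax check never delivers.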

So, I made it to ICSE at last. I’m way behind on blogging this one: the students from our group have been here for several days, busy blogging their experiences. So far, the internet connection is way too weak for liveblogging, so I’ll have to make do with post-hoc summaries.

I spent the morning at the Socio-Technical Congruence (STC) workshop. The workshop is set up with discussants giving prepared responses to each full paper presentation, and I love the format. The discussants basically riff on ideas that the original paper made them think of, which ends up being more interesting than the original paper. For example, Peri Tarr clarified how to tell whether something counts as a design pattern. A design pattern is a (1) proven solution to a (2) commonly occurring problem in a (3) particular context. To assess whether an observed “pattern” is actually a design pattern, you need to probe whether all three of these things are in place. For example, the patterns that Marcelo had identified do express implemented solutions, but he has not yet identified the problems/concerns they solve, and the contexts in which the patterns are applicable.

Andy Begel’s discussion included a tour through learning theory (I’ve no idea why, but I enjoyed the ride!). On a single slide, he took us through the traditional “empty container” model of learning, through Piaget’s constructivism, Vygotsky’s social learning, Papert’s constructionism, Van Maanen & Schein’s newcomer socialization, Hutchins’ distributed cognition, and Lave & Wenger’s legitimate peripheral participation. Whew. Luckily, I’m familiar with all of these except the Van Maanen & Schein stuff – I’m looking forward to reading that. Oh, and an interesting book recommendation, “Anything that’s worth knowing is really complex”: Wolfram’s A New Kind of Science. Then Andy posed some interesting questions: how long can software live? How big can it get? How many people can work on it? And he proposed we should design for long-term social structures, rather than modular architecture.

We then spent some time discussing whether designing the software architecture is the same thing as designing the social structure. Audris suggested that while software architecture people tend not to talk about the social dimension, in fact they are secretly designing it. If the two get out of synch, people are very adaptable – they find a way of working around the mismatch. Peri pointed out that technology also adapts to people. They are different things, with feedback loops that affect each other. It’s an emergent, adaptive thing.

And someone mentioned Rob DeLine’s keynote at CHASE over the weekend, in which he pointed out that only about 20% of ICSE papers mention the human dimension, and suggested we should seek to flip the ratio. To make it 80%, we should insist that papers that ignore the people aspects prove that people are irrelevant to the problem being addressed. Nice!

After lots of catching up with ICSE regulars over lunch, I headed over to the last session of the Michael Jackson festschrift, to hear Michael’s talk. He kicked off with some quotes that he admitted he can’t take credit for: “description should precede invention”, and Tony Hoare’s “there are 2 ways to make a system: (1) make it so complicated that it has no obvious deficiencies, or (2) make it so simple that it obviously has no deficiencies”. And another which may or may not be original: “Understanding is a process, not a state”. And another interesting book recommendation: Personal Knowledge by Michael Polanyi.

So, here’s the core of MJ’s talk: every “contrivance” has an operational principle, which specifies how the characteristic parts fulfill their function. Further, knowledge of physics, chemistry, etc. is not sufficient to understand and recognise the operating principle. E.g. in describing a clock, the description of the mechanism is not a scientific description. While the physical sciences have made great strides, our description of contrivances has not. The operational principle answers questions like “What is it?”, “What is it for?”, and “How do the parts interact to achieve the purpose?”. To supplement this, mathematical and scientific knowledge describes the underlying laws, the context necessary for success (e.g. a pendulum clock only works in the appropriate gravitational field, and must be completely upright – it won’t work on the moon, on a ship, etc.), the part properties necessary for success, possible improvements, specific failures and causes, and the feasibility of a proposed contrivance.

MJ then goes on to show how problem decomposition works:

(1) Problem decomposition – by breaking out the problem frames: e.g. for an elevator: provide prioritized lift service, brake on danger, provide information display for users.

(2) Instrumental decomposition – building manager specifies priority rules, system uses priority rules to determine operation.

The sources of complexity are the intrinsic complexity of each subproblem, plus the interaction of subproblems. But he calls for the use of¬†free decomposition (meaning free as in unconstrained). For initial description purposes, there are no constraints on how the subproblems will interact; the only driver is that we’re looking for simple operating principles.

Finally, he identified some composition concerns: interleaving (edit priority rules vs lift service); requirements elaboration (e.g. book loans vs member status); requirements conflict (inter-library loan vs member loan); switching (lift service vs emergency action); domain sharing (e.g. phone display: camera vs gps vs email).

The discussion was fascinating, but I was too busy participating to take notes. Hope someone else did!

One interesting conversation I had at SciBarCamp was on how to get science fiction writers talking more to climate scientists, so they can take the latest science and turn it into compelling stories. The idea would be to tell it like it is: instead of techno-optimism or space opera, stories set in the current century that explain what the climate crisis will really do to us.

Several people talked about the need for some more positive visions, rather than the¬†apocalyptic¬†stuff. So, how about a set of stories from the latter half of the 21st Century, set in the world in which we won the battle. We made it to a completely carbon-neutral world. There were heroic efforts along the way by colourful individuals. There were political battles, and maybe a few bloody revolutions. But we avoided burning the trillionth tonne. The world is a little warmer, and we lost a few coastlines, but we avoided the critical thresholds that trigger runaway warming. I’d like to read stories about how we made it.

Maybe a volume of short stories?

(via Grist) A new report from the World Bank on the effects of storm surges and extreme weather as a result of global warming. (See an overview in the NY Times, and the draft report.)

(via Gillian) A report in the Lancet on the impacts on health, which begins with the sentence “Climate Change is the biggest global health threat of the 21st Century”. (See an overview in New Scientist, and the Editorial and full report in the Lancet.) But to me, this is the most interesting bit: a roadmap for applied research in health and climate change.

And while we’re on the topic of research roadmaps, here’s one on Psychology and Climate Change, from the Australian Psychological Association.

Update: And another one from WWF and ETNOA – a roadmap on how the ICT sector can contribute to emissions reduction.

I like these roadmaps – send more!

Lately I’ve been advocating for smart people to start asking themselves how their special skills and expertise can be adapted to the challenge of climate change. And for them to get involved and do something. And I don’t just mean dabble around with trying to live a greener lifestyle. I mean to jump in completely and devote their careers to this. Because this is a planetary emergency, and we need a massive brain gain to address it. And because we have a moral obligation to act. (Wish me luck: I’ll be pitching this message to software engineers next week.)

But having immersed myself in the climate science for the last couple of years, I’m also aware of a huge cognitive dissonance. It’s like this incredible horrifying secret: the climate scientists have mapped out an apocalyptic future, demonstrating the urgency and the magnitude of the challenge, and have even calculated the probability factors. But most of the rest of the world is blissfully unaware. They carry on living their lives, burning through fossil fuels like there’s no tomorrow. Why is it not in the papers every day? Why do politicians make speeches and conduct election campaigns with barely a mention of it? Why aren’t there protest marches and sit-ins and hunger strikes?

I frequently meet people who don’t want to know. Some of them have convinced themselves it’s not happening. More often they treat it as some vague future threat that they’re too busy to worry about right now (and after all, they have changed their lightbulbs already). And some admit it’s too scary to talk about. Almost none of them are willing to take the time and explore what the climate scientists have to say.

And I have to admit, all of these people probably sleep better than I do. They might even be making good rational choices. Because if you spend too long immersed in the science and politics of climate change, there’s a serious danger of “climate trauma”, which appears to be as serious as other kinds of trauma. Gillian Caldwell discusses this at length, and has a bunch of excellent tips to deal with it. Add that to the tips from the Australian Psychological Association that Jon blogged about a few months ago. Because, if you’ve read this far, and want to get involved, you’ll need to heed this advice.

Summer projects: I posted yesterday on social network tools for computational scientists. Greg has posted a whole list of additional suggestions.

Here, I will elaborate another of these ideas: the electronic lab notebook. For computational scientists, wiki pages are an obvious substitute for traditional lab notebooks, because each description of an experiment can then be linked directly with the corresponding datasets, configuration files, visualizations of results, scientific papers, related experiments, etc. (In the most radical version, Open Notebook Science, the lab notebook is completely open for anyone to see. But the toolset would be the same whether it was open to anyone, or just shared with select colleagues.)

In my study of the software practices at the UK Met Office last summer, I noticed that some of the scientists carefully document each experiment via a new wiki page, but the process is laborious in a standard wiki, involving a lot of cut-and-paste to create a suitable page structure. For this reason, many scientists don’t keep good records of their experiments. An obvious improvement would be to generate a basic wiki page automatically each time a model run is configured, and populate it with information about the run, and links to the relevant data files. The scientists could then add further commentary via a standard wiki editor.
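As a sketch of that obvious improvement, here’s roughly what the page generator might look like. The configuration fields and the wiki markup dialect are assumptions for illustration, not the Met Office’s actual formats:

```python
import datetime

# Hypothetical run configuration (field names are illustrative only,
# not the actual Met Office configuration format).
run_config = {
    "run_id": "exp-042",
    "model_version": "r1234",
    "description": "Sensitivity test: doubled CO2 forcing",
    "data_files": ["output/exp-042/atmos.nc", "output/exp-042/ocean.nc"],
}

def run_to_wiki_page(config):
    """Render a skeleton lab-notebook page (MediaWiki-style markup) from a
    run configuration; the scientist adds commentary in the Notes section."""
    lines = [
        f"= Experiment {config['run_id']} =",
        f"''Created:'' {datetime.date.today().isoformat()}",
        f"''Model version:'' {config['model_version']}",
        "== Description ==",
        config["description"],
        "== Output data ==",
    ]
    lines += [f"* [[{path}]]" for path in config["data_files"]]
    lines += ["== Notes ==", "''(add commentary here)''"]
    return "\n".join(lines)

page = run_to_wiki_page(run_config)
```

The page would be created (or updated) by a hook in the run-submission script, so the record exists even if the scientist never gets around to annotating it.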

Of course, an even better solution is to capture all information about a particular run of the model (including subsequent commentary on the results) as meta-data in the configuration file, so that no wiki pages are needed: lab notebook pages are just user-friendly views of the configuration file. I think that’s probably a longer term project, and links in with the observation that existing climate model configuration tools are hard to use anyway and need to be re-invented. Let’s leave that one aside for the moment…

A related problem is better support for navigating and linking existing lab book pages. For example, in the process of writing up a scientific paper, a scientist might need to search for the descriptions of a number of individual experiments, select some of the data, create new visualizations for use in the paper, and so on. Recording this trail would improve reproducibility, by capturing the necessary links to source data in case the visualizations used in the paper need to be altered or recreated. Some of this requires a detailed analysis of the specific workflows used in a particular lab (which reminds me I need to write up what I know of the Met Office’s workflows), but I think some of it can be achieved by simple generic tools (e.g. browser plugins) that help capture the trail as it happens, and perhaps edit and annotate it afterwards.

I’m sure some of these tools must exist already, but I don’t know of them. Feel free to send me pointers…

This summer, we have a group of undergrad students working with us, who will try building some of the tools we have identified as potentially useful for climate scientists. We’re just getting started this week, so it’s not clear what we’ll actually build yet, but I think I can guarantee we’ll end up with one of two outcomes: either we build something that is genuinely useful, or we learn a lot about what doesn’t work and why not.

Here’s the first project idea. It responds to the observation that large climate models (and indeed any large-scale scientific simulation) undergo continuous evolution, as a variety of scientists contribute code over a long period of time (decades, in some cases). There is no well-defined specification for the system, nor do the scientists even know ahead of time exactly what the software should do. Coordinating contributions to this code then becomes a problem. If you want to make a change to some particular routine, it can be hard to know who else is working on related code, what potential impacts your change might have, and sometimes it is hard even to know who to go and ask about these things – who’s the expert?

A similar problem occurs in many other types of software project, and there is a fascinating line of research that exploits the social network to visualize how the efforts of different people interact. It draws on work in sociology on social network analysis – basically the idea that you can treat a large group of people and their social interactions as a graph, which can then be visualized in interesting ways, and analyzed for its structural properties, to identify things like distance (as in six degrees of separation), and structural cohesion. For software engineering purposes, we can automatically construct two distinct graphs:

  1. A graph of social interactions (e.g. who talks to whom). This can be constructed by extracting records of electronic communication from the project database – email records, bug reports, bulletin boards, etc. Of course, this misses verbal interactions, which makes it more suitable for geographically distributed projects, but there are ways of adding some of this missing information if needed (e.g. if we can mine people’s calendars, meeting agendas, etc).
  2. A graph of code dependencies (which bits of code are related). This can include simply which routines call which other routines. More interestingly, it can include information such as which bits of code were checked into the repository at the same time by the same person, which bits of code are linked to the same bug report, etc.
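As a sketch of how the second graph might be built, here’s a minimal co-change miner. The commit log entries are invented; in practice they would be parsed from the version control history (e.g. the output of `svn log -v`):

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical commit log entries: (author, files changed together).
commits = [
    ("ann", ["radiation.f90", "clouds.f90"]),
    ("bob", ["clouds.f90", "ocean.f90"]),
    ("ann", ["radiation.f90", "clouds.f90"]),
]

def co_change_graph(commits):
    """Build a weighted graph where the edge weight between two files is
    the number of commits in which they were checked in together."""
    weights = defaultdict(int)
    for _author, files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            weights[(a, b)] += 1
    return dict(weights)

graph = co_change_graph(commits)
```

Edges from static call analysis or shared bug reports could be merged into the same graph with different weights.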

Comparing these two graphs offers insight into socio-technical congruence – how well the social network (who talks to whom) matches the technical dependencies in the code. This then leads to all sorts of interesting ideas for tools.
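A crude version of such a congruence measure is easy to sketch. The developers, modules, and edges below are all made up for illustration:

```python
# All names and edges are invented for illustration.
talks_to = {("ann", "bob")}               # graph 1: who talks to whom
works_on = {                              # which developer owns which modules
    "ann": {"radiation", "clouds"},
    "bob": {"clouds", "ocean"},
    "cam": {"dynamics"},
}
code_deps = {                             # graph 2: module dependencies
    ("radiation", "clouds"), ("clouds", "ocean"), ("ocean", "dynamics"),
}

def congruence(talks_to, works_on, code_deps):
    """Fraction of code dependencies whose owners either coincide or
    communicate: a crude socio-technical congruence measure."""
    def communicate(d1, d2):
        return d1 == d2 or (d1, d2) in talks_to or (d2, d1) in talks_to

    matched = 0
    for mod_a, mod_b in code_deps:
        owners_a = {d for d, mods in works_on.items() if mod_a in mods}
        owners_b = {d for d, mods in works_on.items() if mod_b in mods}
        if any(communicate(d1, d2) for d1 in owners_a for d2 in owners_b):
            matched += 1
    return matched / len(code_deps)

score = congruence(talks_to, works_on, code_deps)
```

Here two of the three code dependencies are covered by the social network, giving a congruence of 2/3; a tool could highlight the uncovered pair (ocean/dynamics) as a place where a conversation probably needs to happen.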

For added difficulty, we have to assume that our target users (climate scientists) are programming in Fortran, and are not using integrated programming environments. Although we can assume they have good version control tools (e.g. Subversion) and good bug tracking tools (e.g. Trac).