I posted some initial ideas for projects for our summer students a while back. I’m pleased to say that the students have been making great progress in the last few weeks (despite, or perhaps because of, the fact that I haven’t been around much). Here’s what they’ve been up to:

Sarah Strong and Ainsley Lawson have been exploring how to take the ideas on visualizing the social network of a software development team (as embodied in tools such as Tesseract) and apply them as simple extensions to code browsers / version control tools. The aim is to see if we can add some value in the form of better awareness of who is working on related code, but without asking the scientists to adopt entirely new tools. Our initial target users are the climate scientists at the UK Met Office Hadley Centre, who currently use SVN/Trac as their code management environment.

Brent Mombourquette has been working on a Firefox extension that will capture the browsing history as a graph (pages and traversed links), which can then be visualized, saved, annotated, and shared with others. The main idea is to support the way in which scientists search/browse for resources (e.g. published papers on a particular topic), and to allow them to recall their exploration path to remember the context in which they obtained these resources. I should mention that the key idea goes all the way back to Vannevar Bush’s memex.
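To make Brent’s idea concrete, here’s a rough sketch of the underlying data model. This is illustration only, in Python rather than the extension’s own JavaScript, and all the URLs and helper names are invented: pages become nodes, traversed links become directed edges, and annotations hang off the nodes.

```python
# Illustrative sketch only: the real tool is a Firefox extension, and
# these URLs and helper names are invented for the example.
import networkx as nx

trail = nx.DiGraph()   # pages are nodes, traversed links are directed edges

def visit(from_url, to_url, note=None):
    trail.add_edge(from_url, to_url)         # record the traversed link
    if note:
        trail.nodes[to_url]["note"] = note   # annotate the page we arrived at

visit("scholar.google.com", "example.org/paper1.pdf", note="key survey paper")
visit("example.org/paper1.pdf", "example.org/paper2.pdf")

# Saving the graph makes the exploration path shareable with colleagues.
nx.write_graphml(trail, "trail.graphml")
```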

Maria Yancheva has been exploring the whole idea of electronic lab notebooks. She has been studying the workflows used by the climate scientists when they configure and run their simulation models, and considering how a more structured form of wiki might help them. She has selected OpenWetWare as a good starting point, and is investigating how to add extensions to MediaWiki to make OWW more suitable for computational science, especially for keeping track of model runs.

Samar Sabie has also been looking at MediaWiki extensions, specifically to find a way to add visualizations to wiki pages and blogs as simply as possible. The problem is that currently, adding something as simple as a table of data to a page requires extensive work with the markup language. The long term aim is to enable the insertion of dynamic visualizations (such as those at ManyEyes), but the starting point is to make it ridiculously simple to insert a data table, link it to a graph, and select appropriate parameters so the graph looks good, with the idea that users can subsequently change the appearance in useful ways (which means cut and paste from Excel spreadsheets won’t be good enough).
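The eventual implementation would be a MediaWiki extension (i.e. PHP), but the interaction we’re after is easy to sketch in Python: take a pasted table, pick sensible defaults for the chart, and leave the parameters open for the user to tweak afterwards. The table contents below are invented example data.

```python
# Sketch of the intended interaction, not the wiki extension itself.
# The pasted table and its values are invented example data.
import csv, io
import matplotlib.pyplot as plt

pasted = """year,anomaly
1980,0.26
1990,0.45
2000,0.42
2008,0.54
"""

rows = list(csv.DictReader(io.StringIO(pasted)))
x = [float(r["year"]) for r in rows]
y = [float(r["anomaly"]) for r in rows]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")    # a sensible default chart type, chosen automatically
ax.set_xlabel("year")        # axis labels inferred from the table header
ax.set_ylabel("anomaly")
fig.savefig("chart.png")     # the user can then adjust the appearance
```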

Oh, and they’ve all been regularly blogging their progress, so we’re practicing the whole open notebook science thingy.

Okay, I’ve had a few days to reflect on the session on Software Engineering for the Planet that we ran at ICSE last week. First, I owe a very big thank you to everyone who helped – to Spencer for co-presenting and lots of follow up work; to my grad students, Jon, Alicia, Carolyn, and Jorge for rehearsing the material with me and suggesting many improvements, and for helping advertise and run the brainstorming session; and of course to everyone who attended and participated in the brainstorming for lots of energy, enthusiasm and positive ideas.

First action as a result of the session was to set up a Google group, SE-for-the-planet, as a starting point for coordinating further conversations. I’ve posted the talk slides and brainstorming notes there. Feel free to join the group, and help us build the momentum.

Now, I’m contemplating a whole bunch of immediate action items. I welcome comments on these and any other ideas for immediate next steps:

  • Plan a follow up workshop at a major SE conference in the fall, and another at ICSE next year (waiting a full year was considered by everyone to be too slow).
  • I should give my part of the talk at U of T in the next few weeks, and we should film it and get it up on the web. 
  • Write a short white paper based on the talk, and fire it off to NSF and other funding agencies, to get funding for community building workshops
  • Write a short challenge statement, to which researchers can respond with project ideas to bring to the next workshop.
  • Write up a vision paper based on the talk for CACM and/or IEEE Software
  • Take the talk on the road (a la Al Gore), and offer to give it at any university that has a large software engineering research group (assuming I can come to terms with the increased personal carbon footprint 😉)
  • Broaden the talk to a more general computer science audience and repeat most of the above steps.
  • Write a short book (pamphlet) on this, to be used to introduce the topic in undergraduate CS courses, such as computers and society, project courses, etc.

Phew, that will keep me busy for the rest of the week…

Oh, and I managed to post my ICSE photos at last.

ICSE proper finished on Friday, but a few brave souls stayed around for more workshops on Saturday. There were two workshops in adjacent rooms that had a big topic overlap: SE Foundations for End-user programming (SEE-UP) and Software Engineering for Computational Science and Engineering (SECSE, pronounced “sexy”). I attended the latter, but chatted to some people attending the former during the breaks – it seems we could have merged the two workshops to interesting effect. At SECSE, the first talk was by Greg Wilson, talking about the results of his survey of computational scientists. He made some interesting comments about the qualitative data, including the strong confidence exhibited in most of the responses (most respondents believe they are more effective at using computers than their colleagues). This probably indicates a self-selection bias, but it would be interesting to probe the extent of it. Also, many of them take a “toolbox” perspective – they treat the computer as a set of tools, and associate effectiveness with how well people understand the different tools, and how much time they take to understand them. Oh, and many of them mention that using a Mac makes them more effective. Tee hee.

Next up: Judith Segal, talking about organisational and process issues – particularly the iterative, incremental approach scientists take to building software, with only cursory requirements analysis and only cursory testing. The model works because the programmers are the users – they build software for themselves, and because the software is (initially) developed only to solve a specific problem, they can ignore maintainability and usability. Of course, the software often does escape from the lab and get used by others, which creates a large risk that incorrect, poorly designed software will lead to incorrect results. For the scientific communities Judith has been working with, there’s a cultural issue too – the scientists don’t value software skills, because they’re focussed on scientific skills and understanding. Also, openness is a problem, because they are busy competing for publications and funding. But this is clearly not true of all scientific disciplines; the climate scientists I’m familiar with are very different: for them, computational skills are right at the core of their discipline, and they are much more collaborative than competitive.

Roscoe Bartlett, from Sandia Labs, presenting “Barely Sufficient Software Engineering: 10 Practices to Improve Your CSE Software”. It’s a good list: agile (incremental) development, code management, mailing lists, checklists, and making the source code the primary source of documentation. Most important was the idea of “barely sufficient”: mindless application of formal software engineering processes to computational science doesn’t make any sense.

Carlton Crabtree described a study design to investigate the role of agile and plan-driven development processes in scientific software development projects. They are particularly interested in exploring the applicability of the Boehm and Turner model as an analytical tool. They’re also planning to use grounded theory to explore the scientists’ own perspectives, although I don’t quite get how they will reconcile the constructivist stance of grounded theory (it’s intended as a way of exploring the participants’ own perspectives) with the use of a pre-existing theoretical framework such as the Boehm and Turner model.

Jeff Overbey, on refactoring Fortran. First, he started with a few thoughts on the history of Fortran (the language that everyone keeps thinking will die out, but never does. Some reference to zombies in here…). Jeff pointed out that languages only ever accumulate features (because removing features breaks backwards compatibility), so they just get more complex and harder to use with each update to the language standard. So, he’s looking at whether you can remove old language features using refactoring tools, which is especially useful for the older features that encourage bad software engineering practices. Jeff then demo’d his tool. It’s neat, but it’s currently only available as an Eclipse plugin. If there were an emacs version, I could get lots of climate scientists to use this. [Note: in the discussion, Greg recommended the book Working Effectively with Legacy Code.]
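To give a flavour of what such a refactoring does, here’s the classic example: retiring the arithmetic IF. Jeff’s tool works on a proper parse tree inside Eclipse and checks semantics; the regex hack below is only my toy illustration of the transformation itself.

```python
# Toy illustration only: a real refactoring tool works on a parse tree;
# this regex handles just the simplest form of Fortran's arithmetic IF.
import re

ARITH_IF = re.compile(r"^\s*IF\s*\((.+)\)\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*$",
                      re.IGNORECASE)

def remove_arithmetic_if(line):
    """Rewrite the deprecated arithmetic IF as structured IF/GOTO lines."""
    m = ARITH_IF.match(line)
    if not m:
        return line
    expr, neg, zero, pos = m.groups()
    # Caveat: expr is evaluated twice below, so this rewrite is only safe
    # when the expression has no side effects.
    return (f"      IF ({expr} .LT. 0) GOTO {neg}\n"
            f"      IF ({expr} .EQ. 0) GOTO {zero}\n"
            f"      GOTO {pos}")

print(remove_arithmetic_if("      IF (X - 1.0) 10, 20, 30"))
```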

Next up: Roscoe again, this time on integration strategies. The software integration issues he describes are very familiar to me, and he outlined an “almost” continuous integration process, which makes a lot of sense. However, some of the things he describes as challenges don’t seem to be problems in the environment I’m familiar with (the climate scientists at the Hadley Centre). I need to follow up on this.

Last talk before the break: Wen Yu, talking about the use of program families for scientific computation, including a specific application for finite element method computations.

After an infusion of coffee, Ritu Arora, talking about the application of generative programming for scientific applications. She used a checkpointing example as a proof-of-concept, and created a domain specific language for describing checkpointing needs. Checkpointing is interesting, because it tends to be a cross-cutting concern; generating code for this and automatically weaving it into the code is likely to be a significant benefit. Initial results are good: the automatically generated code had performance profiles similar to hand-written checkpointing code.
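Checkpointing makes a nice example of a cross-cutting concern because, done by hand, the save/restore logic gets scattered through every expensive step of a computation. Ritu’s approach generates and weaves that code from a DSL; the little decorator below is just my own minimal Python stand-in for what the generated code has to do.

```python
# My own minimal stand-in, not Ritu Arora's generated code: checkpointing
# expressed once and woven around an expensive computation step.
import os
import pickle

def checkpointed(path):
    """On restart, reload a saved result instead of recomputing it."""
    def wrap(fn):
        def run(*args, **kwargs):
            if os.path.exists(path):            # resume after a crash
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = fn(*args, **kwargs)
            with open(path, "wb") as f:         # record progress
                pickle.dump(result, f)
            return result
        return run
    return wrap

@checkpointed("step1.ckpt")        # the concern is stated once, not scattered
def expensive_step(n):
    return sum(i * i for i in range(n))

print(expensive_step(10**6))
```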

Next: Daniel Hook on testing for code trustworthiness. He started with some nice definitions and diagrams that distinguish some of the key terminology e.g. faults (mistakes in the code) versus errors (outcomes that affect the results). Here’s a great story: he walked into a glass storefront window the other day, thinking it was a door. The fault was mistaking a window for a door, and the error was about three feet. Two key problems: the oracle problem (we often have only approximate or limited oracles for what answers we should get) and the tolerance problem (there’s no objective way to say that the results are close enough to the expected results so that we can say they are correct). Standard SE techniques often don’t apply. For example, the use of mutation testing to check the quality of a test set doesn’t work on scientific code because of the tolerance problem – the mutant might be closer to the expected result than the unmutated code. So, he’s exploring a variant and it’s looking promising. The project is called matmute.
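The tolerance problem is easy to demonstrate. Here’s a toy example of my own (not Daniel’s code or the matmute approach) in which a typical mutant lands even closer to the approximate oracle than the original code does, so the mutant survives and the mutation score is meaningless:

```python
# Toy example (mine, not Daniel Hook's): with an approximate oracle and a
# tolerance check, a mutant can be indistinguishable from the original.
import math

def original(x):
    return math.sin(x) + 0.5 * x

def mutant(x):                 # a typical mutation: perturb a constant
    return math.sin(x) + 0.4999 * x

x = 1.0
oracle = 1.3414    # expected value, known only to a few decimal places
tol = 1e-3         # "close enough" tolerance

print(abs(original(x) - oracle) < tol)   # True: the original passes
print(abs(mutant(x) - oracle) < tol)     # True: the mutant passes too --
                                         # here it is even closer to the oracle
```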

David Woollard, from JPL, talking about inserting architectural constraints into legacy (scientific) code. David has been doing some interesting work with assessing the applicability of workflow tools to computational science.

Parmit Chilana from U Washington. She’s working mainly with bioinformatics researchers, comparing the work practices of practitioners with researchers. The biologists understand the scientific relevance, but not the technical implementation; the computer scientists understand the tools and algorithms, but not the biological relevance. She’s clearly demonstrated the need for domain expertise during the design process, and explored several different ways to bring both domain expertise and usability expertise together (especially when the two types of expert are hard to get because they are in great demand).

After lunch, the last talk before we break out for discussion: Val Maxville, preparing scientists for scalable software development. Val gave a great overview of the challenges for software development at iVEC. AuScope looks interesting – an integration of geosciences data across Australia. For each of the different projects, Val assessed how much they have taken practices from the SWEBOK – how much they have applied them, and how much they value them. And she finished with some thoughts on the challenges of software engineering education for this community, including the balance between generic and niche content, and the balance between ‘on demand’ and a more planned skills development process.

And because this is a real workshop, we spent the rest of the afternoon in breakout groups having fascinating discussions. This was the best part of the workshop, but of course required me to put away the blogging tools and get involved (so I don’t have any notes…!). I’ll have to keep everyone in suspense.

Friday, the last day of the main conference, kicked off with Pamela Zave’s keynote “Software Engineering for the Next Internet”. Unfortunately I missed the first few minutes of the talk. But I regret that, because this was an excellent keynote. Why do I say that? Because Pamela demonstrated a beautiful example of what I want to call “software systems thinking”. By analyzing some of the basic protocols of the internet (e.g. the Session Initiation Protocol, SIP) from a software engineering perspective, she demonstrated how both the protocols and the standardization process by which they are developed are broken in interesting ways. The reason they are broken is that they ignore software engineering principles. I thought the analysis was compelling: both thorough in terms of the level of detail, and elegant in the simplicity of the analysis.

Here are some interesting tidbits:

  • A corner case is a possible behaviour that emerges from the interaction of unanticipated constraints. It is undesirable, and designers typically declare it to be rare and unimportant, without any evidence. Understanding and dealing with corner cases is important for assessing the robustness of a design.
  • The IETF standards process is an extreme (pathological?) case of bottom up thinking. It sets up an artificial conflict between generality and simplicity, because any new needs are dealt with by adding more features and more documents to the standard. Generality is always achieved by making the design more complex. Better abstractions, and some more top down analysis, can provide simple and general designs (and Pamela demonstrated a few).
  • How did the protocols get to be this broken? Most network functions are provided by cramming them into the IP layer. This is believed to be more efficient, and in the IETF design process, efficiency always takes precedence over separation of concerns.
  • We need a successor to the end-to-end principle. Each application should run on a stack of overlays that exactly meets its requirements. Overlays have to be composable. The bottom underlay runs on a virtual network which gets a predictable slice of the real network resources. Of course, there are still some tough technical challenges in designing the overlay hierarchy.

So, my reflections. Why did I like this talk so much? First it had an appealing balance of serious detail (with clear explanations) and new ideas that are based on an understanding of the big picture. Probably it helps that she’s talking about an analysis approach using techniques that I’m very familiar with (some basic software engineering design principles: modularity, separation of concerns, etc), and applies them to a problem that I’m really not familiar with at all (detailed protocol design). So that combination allows me to follow most of the talk (because I understand the way she approaches the problem), but tells me a lot of things that are new and interesting (because the domain is new to me).

She ended with a strong plug for domain-specific research. It’s more fun and more interesting! I agree wholeheartedly with that. Much of software engineering research is ultimately disappointing because in trying to be too generic it ends up being vague and wishy washy. And it misses good pithy examples.

So, having been very disappointed with Steve McConnell’s opening keynote yesterday, I’m pleased to report that the keynotes got steadily better over the week. Thursday’s keynote was by Carlo Ghezzi, entitled Reflections on 40+ years of software engineering research and beyond: An Insider’s View. He started with a little bit of history of the SE research community and the ICSE conference, but the main part of the talk was a trawl through the data from the conference over the years, motivated by questions such as “how international are we as a community?”, “how diverse?” (e.g. academia, industry…), and “how did the research areas included in ICSE evolve?”. For example, there has been a clear trend in the composition of the program committee, from being N. American dominated (80% at the first ICSE) to now approximately equal N. American and European, with some from Asia and elsewhere. However, there is a startling trend in the industry vs. academia mix. The attendees at the first conference were 80% industry and only 20% academics. This has steadily changed: the conference is now 90% academics. The number of accepted papers each year has remained fairly steady (the average is 44), but with strong growth in submissions over the past 15 years, from 150 to 400. Which gives us a paper acceptance rate now well below 15%. This is clearly good for the academics – the low acceptance rate keeps the quality of the accepted papers high, and makes the conference the top choice as a publication venue. But a strong academic research program clearly does not attract practitioners to attend.

In Carlo’s analysis of research areas, I was struck by the graph of number of papers on programming languages, which looks like a pair of vampire teeth – a huge spike in this area in the early days of ICSE, then nothing for years, and again a huge spike in the last couple of years. A truly interesting and surprising result.

Towards the end of the talk, Carlo got onto the question of how we could identify our best products. He talked about the strengths and weaknesses of quantitative measures such as citation count (difficult as it’s a moving target, and you have to account for journal/conference versions), number of downloads from the ACM digital library over 12 months, etc. He drew a lot on a report by the Joint Committee on Quantitative Assessment of Research. He also mentioned Meyer’s viewpoint article in CACM April 2009, and of course, Parnas’s somewhat less nuanced “Stop the numbers game“. Why is the problem of quantitative assessment of research becoming so hot today? It’s being increasingly used to rank journals, conferences and individuals. Many stakeholders now need to evaluate research, and peer review is considered to be expensive and subjective, while numeric metrics are considered to be simple and objective. The Joint Committee report says that, to the contrary, numeric metrics are simple and misleading. From the report: “Much of modern bibliometrics is flawed. The meaning of a citation can be even more subjective than peer review. Citation counts are only valid if reinforced by other judgements.”

Carlo’s final message was that we have to care about the impact of our research: understanding, measuring, and improving it. Because if we don’t, others will (governments, funding agencies, universities, etc). Okay, that’s a good argument. I’ve been skeptical of SIGSOFT’s Impact Project in the past, largely because I think the process by which research ideas filter into industrial practice is much more complex, and takes much longer, than everyone seems to expect. But I guess taking control of the assessment of impact is the obvious way to address this issue.

After the break, Jorge presented his paper on the Secret Life of Bugs. He did a great job presenting the work, to an absolutely packed room, and I had lots of people comment afterwards on how much they enjoyed the paper. I beamed with pride.

But for most of the day, I was busy trying to finish off my talk “Software Engineering for the Planet”, in time for the session at 2pm. Many thanks to Spencer, Jon, Carolyn and Alicia for helping me polish it prior to delivery. I’ll get the slides up on the web soon. I think the session went very well – the questions and discussions afterwards were very encouraging – most people seemed to immediately get the key message (that we should stop focussing our energies on personal green choices, and instead figure out how our professional skills and experience can be used to address the climate crisis). Aran posted a quick summary of the session, and some afterthoughts. Now we’ve got to do the community building, and keep the momentum going. [Aran said he doesn’t think I’ll get much research done in the next few months. He might be right, but I can just declare that this is now my research…]

Okay, the main conference started today, and we kick off with the keynote speaker – Steve McConnell talking about “The 10 most powerful ideas in software engineering”. Here are my thoughts: when magazines are short of ideas for an upcoming issue, they resort to the cheap journalist’s trick of inventing top ten lists. It makes for easy-reading filler that never really engages the brain. Unfortunately, this approach also leads to dull talks. The best keynotes have a narrative thread. They tell a story. They build up ideas in interesting new ways. The top ten format kills this kind of narrative stone dead (except perhaps when used in parody). Okay, so I didn’t like the format, but what about the content? Steve walked through ten basic concepts that we’ve been teaching in our introductory software engineering courses for years, so I learned nothing new. Maybe this would be okay as a talk to junior programmers who missed out on software engineering courses in school. For ICSE keynotes, I expect a lot more – I’d have liked at least some sharper insights, or better marshalling of the evidence. I’m afraid I have to add this to my long list of poor ICSE keynotes. Which is okay, because ICSE keynotes always suck – even when the chosen speakers are normally brilliant thinkers and presenters. Maybe I’ll be proved wrong later this week… For what it’s worth, here’s his top ten list (which he said were in no particular order):

  1. Software Development work is performed by human beings. Human factors make a huge difference in the performance of a project.
  2. Incrementalism is essential. The benefits are feedback, feedback, and feedback! (on the software, on the development process, on the developer capability). And making small mistakes that prevent bigger mistakes later.
  3. I’ve no idea what number 3 was. Please excuse my inattention.
  4. Cost to fix a defect increases over time, because of the need to fix all the collateral and downstream consequences of the error.
  5. There’s an important kernel of truth in the waterfall model. Essentially, there are three intellectual phases: discovery, invention, construction. They are sequential, but also overlapping.
  6. Software estimates can be improved over time, by reducing their uncertainty as the project progresses.
  7. The most powerful form of reuse is full reuse – i.e. not just code and design, but all aspects of process.
  8. Risk management is important.
  9. Different kinds of software call for different kinds of software development (the toolbox approach). This was witty: he showed a series of pictures of different kinds of saw, then newsflash: software development is as difficult as sawing.
  10. The software engineering body of knowledge (SWEBOK)

Next up, the first New Ideas and Emerging Results session. This is a new track at this year’s ICSE, and the intent is to have a series of short talks, with a poster session at the end of the day. Although I’m surprised how hard it was to get a paper accepted: of 118 submissions, they selected only 21 for presentation (an 18% acceptance rate). The organisers also encouraged the presenters to use the Pecha Kucha format: 20 slides on an automated timer, with 20 seconds per slide. Just to make it more fun and more dynamic.

I’m disappointed to report that none of the speakers this morning took up this challenge, although Andrew Begel’s talk on social networking for programmers was very interesting (and similar to some of our ideas for summer projects this year). The fourth talk, by Abram Hindle, also didn’t use the Pecha Kucha format, but made up for it with a brilliant and beautiful set of slides explaining how to create interesting time series visualizations of software projects by mining the change logs.

Buried in the middle of the session was an object lesson in the misuse of empirical methods. I won’t name the guilty parties, but let me describe the flaw in their study design. Two teams were assigned a problem to analyze, with one team given a systems architecture and the other not. To measure the effect of being given this architecture on the requirements analysis, the authors asked experts to rate each of several hundred requirements generated by each of the teams, and then used a statistical test to see whether the requirements from one team differed on this rating from the other’s. Unsurprisingly, they discovered a statistically significant difference. Unfortunately, the analysis is completely invalid, because they made a classic unit of analysis error. The unit of analysis for the experimental design is the team, because it was teams that were assigned the different treatments. But the statistical test was applied to individual requirements. There was no randomization of these requirements – all the requirements from a given team have to be taken as a single unit. The analysis that was performed in this study merely shows that the requirements came from two different teams, which we knew already. It shows nothing at all about the experimental hypothesis. I guess the peer review process has to let a few klunkers through.
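To see why this matters, here’s a small simulation of my own (all numbers invented): the per-requirement test happily reports a “significant” difference even though, by construction, all it can detect is a team-level offset.

```python
# Simulation with invented numbers. The treatment was assigned to teams,
# so the team (n=1 per treatment!) is the unit of analysis, not the
# several hundred requirements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Each team's ratings share a team-level offset that has nothing to do
# with the treatment (team skill, rater drift, ...).
team_a = 3.0 + rng.normal(0, 1, 300)   # expert ratings of team A's requirements
team_b = 3.4 + rng.normal(0, 1, 300)   # expert ratings of team B's requirements

# Invalid analysis: treats 600 requirements as independent observations.
t, p = stats.ttest_ind(team_a, team_b)
print(f"p = {p:.1e}")   # tiny p-value, but it only shows the teams differ

# Valid analysis: aggregate to the experimental unit, one value per team.
# With a single team per treatment, there is nothing left to test.
print(team_a.mean(), team_b.mean())
```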

Well, we reach the end of the session and nobody did the Pecha Kucha thing. Never mind – my talk is first up in the next NIER session this afternoon, and I will take the challenge. Should be hilarious. On the plus side, I was impressed with the quality of all the talks – they all managed to pack in key ideas, make them interesting, and stick to the 6 minute time slot.

So, I made it to ICSE at last. I’m way behind on blogging this one: the students from our group have been here for several days, busy blogging their experiences. So far, the internet connection is way too weak for liveblogging, so I’ll have to make do with post-hoc summaries.

I spent the morning at the Socio-Technical Congruence (STC) workshop. The workshop is set up with discussants giving prepared responses to each full paper presentation, and I love the format. The discussants basically riff on ideas that the original paper made them think of. Which often ends up being more interesting than the original paper. For example, Peri Tarr clarified how to tell whether something counts as a design pattern. A design pattern is a (1) proven solution to a (2) commonly occurring problem in a (3) particular context. To assess whether an observed “pattern” is actually a design pattern, you need to probe whether all three of these things are in place. For example, the patterns that Marcelo had identified do express implemented solutions, but he has not yet identified the problems/concerns they solve, and the contexts in which the patterns are applicable.

Andy Begel’s discussion included a tour through learning theory (I’ve no idea why, but I enjoyed the ride!). On a single slide, he took us through the traditional “empty container” model of learning, through Piaget’s constructivism, Vygotsky’s social learning, Papert’s constructionism, Van Maanen & Schein’s newcomer socialization, Hutchins’ distributed cognition, and Lave & Wenger’s legitimate peripheral participation. Whew. Luckily, I’m familiar with all of these except the Van Maanen & Schein stuff – I’m looking forward to reading that. Oh, and an interesting book recommendation, with the quote “Anything that’s worth knowing is really complex”, from Wolfram’s A New Kind of Science. Then Andy posed some interesting questions: how long can software live? How big can it get? How many people can work on it? And he proposed we should design for long-term social structures, rather than modular architecture.

We then spent some time discussing whether designing the software architecture is the same thing as designing the social structure. Audris suggested that while software architecture people tend not to talk about the social dimension, they are in fact secretly designing it. If the two get out of synch, people are very adaptable – they find a way of working around the mismatch. Peri pointed out that technology also adapts to people. They are different things, with feedback loops that affect each other. It’s an emergent, adaptive thing.

And someone mentioned Rob DeLine’s keynote on the weekend at CHASE, in which he pointed out that only about 20% of ICSE papers mention the human dimension, and we should seek to flip the ratio. To make it 80%, we should insist that papers that ignore the people aspects have to prove that people are irrelevant to the problem being addressed. Nice!

After lots of catching up with ICSE regulars over lunch, I headed over to the last session of the Michael Jackson festschrift, to hear Michael’s talk. He kicked off with some quotes that he admitted he can’t take credit for: “description should precede invention”, and Tony Hoare’s “there are 2 ways to make a system: (1) make it so complicated that it has no obvious deficiencies, or (2) make it so simple that it obviously has no deficiencies”. And another which may or may not be original: “Understanding is a process, not a state”. And another interesting book recommendation: Personal Knowledge by Michael Polanyi.

So, here’s the core of MJ’s talk: every “contrivance” has an operational principle, which specifies how the characteristic parts fulfill their function. Further, knowledge of physics, chemistry, etc, is not sufficient to understand and recognise the operating principle. E.g. describing a clock – the description of the mechanism is not a scientific description. While the physical science has made great strides, our description of contrivances has not. The operational principle answers questions like “What is it?” “What is it for?”,  and “how do the parts interact to achieve the purpose?”. To supplement this, the mathematical and scientific knowledge describes the underlying laws, context necessary for success (e.g. pendulum clock only works in the appropriate gravitational field, and must be completely upright – won’t work on the moon, on a ship, etc), part properties necessary for success, possible improvements, specific failures and causes, feasibility of a proposed contrivance.

MJ then goes on to show how problem decomposition works:

(1) Problem decomposition – by breaking out the problem frames: e.g. for an elevator: provide prioritized lift service, brake on danger, provide information display for users.

(2) Instrumental decomposition – building manager specifies priority rules, system uses priority rules to determine operation.

The sources of complexity are the intrinsic complexity of each subproblem, plus the interaction of subproblems. But he calls for the use of free decomposition (meaning free as in unconstrained). For initial description purposes, there are no constraints on how the subproblems will interact; the only driver is that we’re looking for simple operating principles.

Finally, he identified some composition concerns: interleaving (edit priority rules vs lift service); requirements elaboration (e.g. book loans vs member status); requirements conflict (inter-library loans vs member loans); switching (lift service vs emergency action); and domain sharing (e.g. phone display: camera vs gps vs email).

The discussion was fascinating, but I was too busy participating to take notes. Hope someone else did!

This summer, we have a group of undergrad students working with us, who will try building some of the tools we have identified as potentially useful for climate scientists. We’re just getting started this week, so it’s not clear what we’ll actually build yet, but I think I can guarantee we’ll end up with one of two outcomes: either we build something that is genuinely useful, or we learn a lot about what doesn’t work and why not.

Here’s the first project idea. It responds to the observation that large climate models (and indeed any large-scale scientific simulations) undergo continuous evolution, as a variety of scientists contribute code over a long period of time (decades, in some cases). There is no well-defined specification for the system, nor do the scientists even know ahead of time exactly what the software should do. Coordinating contributions to this code then becomes a problem. If you want to make a change to some particular routine, it can be hard to know who else is working on related code, what potential impacts your change might have, and sometimes it is hard even to know who to go and ask about these things – who’s the expert?

A similar problem occurs in many other types of software project, and there is a fascinating line of research that exploits the social network to visualize how the efforts of different people interact. It draws on work in sociology on social network analysis – basically the idea that you can treat a large group of people and their social interactions as a graph, which can then be visualized in interesting ways, and analyzed for its structural properties, to identify things like distance (as in six degrees of separation), and structural cohesion. For software engineering purposes, we can automatically construct two distinct graphs:

  1. A graph of social interactions (e.g. who talks to whom). This can be constructed by extracting records of electronic communication from the project database – email records, bug reports, bulletin boards, etc. Of course, this misses verbal interactions, which makes it more suitable for geographically distributed projects, but there are ways of adding some of this missing information if needed (e.g. if we can mine people’s calendars, meeting agendas, etc).
  2. A graph of code dependencies (which bits of code are related). This can include simply which routines call which other routines. More interestingly, it can include information such as which bits of code were checked into the repository at the same time by the same person, which bits of code are linked to the same bug report, etc.

Comparing these two graphs offers insight into socio-technical congruence – how well the social network (who talks to whom) matches the technical dependencies in the code. Which then leads to all sorts of interesting ideas for tools.
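As a concrete starting point, here’s a minimal sketch (all names invented) of the most basic congruence measure: the fraction of code dependencies whose owners actually talk to each other. Every dependency edge with no matching communication edge is exactly the kind of gap an awareness tool should surface.

```python
# Minimal sketch of a socio-technical congruence measure (names invented):
# do the owners of coupled files actually communicate?
import networkx as nx

talks = nx.Graph()       # who talks to whom, mined from email, Trac, etc.
talks.add_edges_from([("ann", "bob"), ("bob", "carol")])

depends = nx.Graph()     # which files are coupled: calls, co-commits, ...
depends.add_edges_from([("ocean.f90", "ice.f90"), ("ice.f90", "atmos.f90")])

# Who owns which file, e.g. the most frequent committer in the svn logs.
owner = {"ocean.f90": "ann", "ice.f90": "carol", "atmos.f90": "bob"}

matched = sum(talks.has_edge(owner[a], owner[b]) for a, b in depends.edges)
print(matched / depends.number_of_edges())   # 0.5: ann and carol never talk,
                                             # yet their files are coupled
```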

For added difficulty, we have to assume that our target users (climate scientists) are programming in Fortran, and are not using integrated programming environments, although we can assume they have good version control tools (e.g. Subversion) and good bug tracking tools (e.g. Trac).

One of the things that came up in our weekly brainstorming session today was the question of whether climate models can be made more modular, to permit distributed development, and distributed execution. Carolyn has already blogged about some of these ideas. Here’s a little bit of history for this topic.

First, a very old (well, 1989) paper by Kalnay et al., on Data Interchange Formats, in which they float the idea of “plug compatibility” for climate model components. For a long time, this idea seems to have been accepted as the long term goal for the architecture of climate models. But no-one appears to have come close. In 1996, David Randall wrote an interesting introspective on how university teams can (or can’t) participate in climate model building, in which he speculates that plug compatibility might not be achievable in practice, because of the complexity of the physical processes being simulated and the complex interactions between them. He also points out that all climate models (up to that point) had each been developed at a single site, and he talks a bit about why this appears to be necessarily so.

Fast forward to a paper by Dickinson et al in 2002, which summarizes the results of a series of workshops on how to develop a better software infrastructure for model sharing, and talks about some prototype software frameworks. Then, a paper by Larson et al in 2004, introducing a common component architecture for earth system models, and a bit about the Earth System Modeling Framework being developed at NCAR. And finally, Drake et al.’s Overview of the Community Climate System Model, which appears to use these frameworks very successfully.

Now, admittedly I haven’t looked closely at the CCSM. But I have looked closely at the Met Office’s Unified Model and the Canadian CCCma, and neither of them gets anywhere close to the ideal of modularity. In both cases, the developers have to invest months of effort to ‘naturalize’ code contributed from other labs, in the manner described in Randall’s paper.

So, here’s the mystery. Has the CCSM really achieved the modularity that others are only dreaming of? And if so how? The key test would be how much effort it takes to ‘plug in’ a module developed elsewhere…

Well, this is a little off topic, but we (Janice, Dana, Peggy and I) have been invited to run this year’s International Advanced School of Empirical Software Engineering, in Florida in October. We’ve planned the day around the content of our book chapter on Selecting Empirical Research Methods for Software Engineering Research, which appeared in the book Guide to Advanced Empirical Software Engineering. It’s going to be a lot of fun!

At many discussions about the climate crisis that I’ve had with professional colleagues, the conversation inevitably turns to how we (as individuals) can make a difference by reducing our personal carbon emissions. So sure, our personal choices matter. And we shouldn’t stop thinking about them. And there is plenty of advice out there on how to green your home, and how to make good shopping decisions, and so on. Actually, there is way too much advice out there on how to live a greener life. It’s overwhelming. And plenty of it is contradictory. Which leads to two unfortunate messages: (1) we’re supposed to fix global warming through our individual personal choices and (2) this is incredibly hard because there is so much information to process to do it right.

The climate crisis is huge, and systemic. It cannot be solved through voluntary personal lifestyle choices; it needs systemic changes throughout society as a whole. As Bill McKibben says:

“the number one thing is to organize politically; number two, do some political organizing; number three, get together with your neighbors and organize; and then if you have energy left over from all of that, change the light bulb.”

Now, part of getting politically organized is getting educated. Another part is connecting with people. We computer scientists are generally not very good at political action, but we are remarkably good at inventing tools that allow people to get connected. And we’re good at inventing tools for managing, searching and visualizing information, which helps with the ‘getting educated’ part and the ‘persuading others’ part.

So, I don’t want to have more conversations about reducing our personal carbon footprints. I want to have conversations about how we can apply our expertise as computer scientists and software engineers in new and creative ways. Instead of thinking about your footprint, think about your delta (okay, I might need a better name for it): what expertise and skills do you have that most others don’t, and how can they be applied to good effect to help?

A group of us at the lab, led by Jon Pipitone, has been meeting every Tuesday lunchtime (well almost every Tuesday) for a few months, to brainstorm ideas for how software engineers can contribute to addressing the climate crisis. Jon has been blogging some of our sessions (here, here and here).

This week we attempted to create a matrix, where the rows are “challenge problems” related to the climate crisis, and the columns are the various research areas of software engineering (e.g. requirements analysis, formal methods, testing, etc…). One reason to do this is to figure out how to run a structured brainstorming session with a bigger set of SE researchers (e.g. at ICSE). Having sketched out the matrix, we then attempted to populate one row with ideas for research projects. I thought the exercise went remarkably well. One thing I took away from it was that it was pretty easy to think up research projects to populate many of the cells in the matrix (I had initially thought the matrix might be rather sparse by the time we were done).

We also decided that it would be helpful to characterize each of the rows a little more, so that SE researchers who are unfamiliar with some of the challenges would understand each one well enough to stimulate some interesting discussions. So, here is an initial list of challenges (I added some links where I could). Note that I’ve grouped them according to who the immediate audience is for any tools, techniques, and practices.

  1. Help the climate scientists to develop a better understanding of climate processes.
  2. Help the educators to teach kids about climate science – how the science is done, and how we know what we know about climate change.
    • Support hands-on computational science (e.g. an online climate lab with building blocks to support construction of simple simulation models)
    • Global warming games
  3. Help the journalists & science writers to raise awareness of the issues around climate change for a broader audience.
    • Better public understanding of climate processes
    • Better public understanding of how climate science works
    • Visualizations of complex earth systems
    • Connect data generators (e.g. scientists) with potential users (e.g. bloggers)
  4. Help the policymakers to design, implement and adjust a comprehensive set of policies for reducing greenhouse gas emissions.
  5. Help the political activists who put pressure on governments to change their policies, or to get better leaders elected when the current ones don’t act.
    • Social networking tools for activists
    • Tools for persuasion (e.g. visualizations) and community building (e.g. Essence)
  6. Help individuals and communities to lower their carbon footprints.
  7. Help the engineers who are developing new technologies for renewable energy and energy efficiency systems.
    • green IT
    • Smart energy grids
    • waste reduction
    • renewable energy
    • town planning
    • green buildings/architecture
    • transportation systems (better public transit, electric cars, etc)
    • etc

Here’s an updated description of the ICSE session I kicked off this blog with. Looks like we’re scheduled for the second afternoon of the conference (Thurs May 21, 2pm), straight after the keynote.

Update: Slides and notes from the session now available.

Software Engineering for the Planet

This session is a call to action. What can we, as software engineers, do to help tackle the challenge of climate change (besides reducing our personal carbon footprints)? The session will review recent results from climate science, showing how big the challenge is. We will then identify ways in which software engineering tools and techniques can help. The goal is to build a research agenda and a community of software engineering researchers willing to pursue it.

The ICSE organisers have worked hard this year to make the conference “greener” – to reduce our impact on the environment. This is partly in response to the growing worldwide awareness that we need to take more care of the natural environment. But it is also driven by a deeper and more urgent concern.

During this century, we will have to face up to a crisis that will make the current economic turmoil look like a walk in the park. Climate change is accelerating, confirming the more pessimistic scenarios identified by climate scientists [1-4]. Its effects will touch everything, including the flooding of low-lying lands and coastal cities, the disruption of fresh water supplies for much of the world, the loss of agricultural lands, more frequent and severe extreme weather events, mass extinctions, and the destruction of entire ecosystems [5].

And there are no easy solutions. We need concerted systematic change in how we live, to reduce emissions so as to stabilize the concentration of greenhouse gases that drive climate change. Not to give up the conveniences of modern life, but to re-engineer them so that we no longer depend on fossil fuels to power our lives. The challenge is massive and urgent – a planetary emergency. The type of emergency that requires all hands on deck. Scientists, engineers, policymakers, professionals, no matter what their discipline, need to ask how their skills and experience can contribute.

We, as software engineering researchers and software practitioners, have many important roles to play. Our information systems help provide the data we need to support intelligent decision making, from individuals trying to reduce their energy consumption, to policymakers trying to design effective governmental policies. Our control systems allow us to make smarter use of the available power, and provide the adaptability and reliability to power our technological infrastructure in the face of a more diverse set of renewable energy sources.

The ICSE community in particular has many other contributions to make. We have developed practices and tools to analyze, build and evolve some of the most complex socio-technical systems ever created, and to coordinate the efforts of large teams of engineers. We have developed abstractions that help us to understand complex systems, to describe their structure and behaviour, and to understand the effects of change on those systems. These tools and practices are likely to be useful in our struggle to address the climate crisis, often in strange and surprising ways. For example, can we apply the principles of information hiding and modularity to our attempts to develop coordinated solutions to climate change? What is the appropriate architectural pattern for an integrated set of climate policies? How can we model the problem requirements so that the stakeholders can understand them? How do we debug the models on which policy decisions are based?

This conference session is intended to kick start a discussion about the contributions that software engineering research can make to tackling the climate crisis. Our aim is to build a community of concerned professionals, and find new ways to apply our skills and experience to the problem. We will attempt to map out a set of ideas for action, and identify potential roadblocks. We will start to build a broad research agenda, to capture the potential contributions of software engineering research, and discuss strategies for researchers to refocus their research towards this agenda. The session will begin with a short summary of the latest lessons from climate science, and a concrete set of examples of existing software engineering research efforts applied to climate change. We will include an open discussion session, to map out an agenda for action. We invite everyone to come to the session, and take up this challenge.

References:

[1] http://www.csmonitor.com/2006/0324/p01s03-sten.html

[2] http://www.newscientist.com/article/dn11083

[3] http://news.bbc.co.uk/2/hi/uk_news/7053903.stm

[4] http://www.pnas.org/content/104/24/10288.abstract

[5] http://www.ipcc.ch/ipccreports/ar4-wg2.htm

I’ve been pondering starting a blog for way too long. Time for action. To explain what I think I’ll be blogging about, I put together the following blurb, for a conference session at the International Conference on Software Engineering. I’ll probably end up revising it for the conference, but it will do for a kickoff to the blog:

This year, the ICSE organisers have worked hard to make the conference “greener” – to reduce our impact on the environment. Partly this is in response to the growing worldwide awareness that we need to take more care of the natural environment. But partly it is driven by a deeper and more urgent concern. During this century, we will have to face up to a crisis that will make the current economic turmoil look like a walk in the park. Climate change is accelerating, outpacing the most pessimistic predictions of climate scientists. Its effects will touch everything, including the flooding of low-lying lands and coastal cities, the disruption of fresh water supplies for most of the world, the loss of agricultural lands, more frequent and severe extreme weather events, mass extinctions, and the destruction of entire ecosystems. And there are no easy solutions. We need concerted systematic change in how we live, to stabilize the concentration of greenhouse gases that drive climate change. Not to give up the conveniences of modern life, but to re-engineer them so that we no longer depend on fossil fuels to power our lives. The challenge is massive and urgent – a planetary emergency. The type of emergency that requires all hands on deck. Scientists, engineers, policymakers, professionals, no matter what their discipline, need to ask how their skills and experience can contribute.

We, as software engineering researchers and software practitioners, have many important roles to play. Software is part of the problem, as every new killer application drives up our demand for more energy. But it is also a major part of the solution. Our information systems help provide the data we need to support intelligent decision making, from individuals trying to reduce their energy consumption, to policymakers trying to design effective governmental policies. Our control systems allow us to make smarter use of the available power, and provide the adaptability and reliability to power our technological infrastructure in the face of a more diverse set of renewable energy sources. Less obviously, the software engineering community has many other contributions to make. We have developed practices and tools to analyze, build and evolve some of the most complex socio-technical systems ever created, and to coordinate the efforts of large teams of engineers. We have developed abstractions that help us to understand complex systems, to describe their structure and behaviour, and to understand the effects of change on those systems. These tools and practices are likely to be useful in our struggle to address the climate crisis, often in strange and surprising ways. For example, can we apply the principles of information hiding and modularity to our attempts to develop coordinated solutions to climate change? What is the appropriate architectural pattern for an integrated set of climate policies? How can we model the problem requirements so that the stakeholders can understand them? How do we debug strategies for emissions reduction when they don’t work out as intended?

This conference session is intended to kick start a discussion about the contributions that software engineering can make to tackling the climate crisis. Our aim is to build a community of concerned professionals, and find new ways to apply our skills and experience to the problem. We will attempt to map out a set of ideas for action, and identify potential roadblocks. We will start to build a broad research agenda, to capture the potential contributions of software engineering research. The session will begin with a short summary of the latest lessons from climate science, and a concrete set of examples of existing software engineering research efforts applied to climate change. We will include an open discussion, and structured brainstorming sessions to map out an agenda for action. We invite everyone to come to the session, and take up this challenge.

Okay, so how does that sound as a call to arms?