This summer, we have a group of undergrad students working with us, who will try building some of the tools we have identified as potentially useful for climate scientists. We’re just getting started this week, so it’s not clear what we’ll actually build yet, but I think I can guarantee we’ll end up with one of two outcomes: either we build something that is genuinely useful, or we learn a lot about what doesn’t work and why not.

Here’s the first project idea. It responds to the observation that large climate models (and indeed any large-scale scientific simulation) undergoes continuous evolution, as a variety of scientists contribute code over a long period of time (decades, in some cases). There is no well-defined specification for the system, and nor do the scientists even know ahead of time exactly what the software should do. Coordinating contributions to this code then becomes a problem. If you want to make a change to some particular routine, it can be hard to know who else is working on related code, what potential impacts your change might have, and sometimes it is hard even to know who to go and ask about these things – who’s the expert?

A similar problem occurs in many other types of software project, and there is a fascinating line of research that exploits the social network to visualize how the efforts of different people interact. It draws on work in sociology on social network analysis – basically the idea that you can treat a large group of people and their social interactions as a graph, which can then be visualized in interesting ways, and analyzed for its structural properties, to identify things like distance (as in six degrees of separation), and structural cohesion. For software engineering purposes, we can automatically construct two distinct graphs:

  1. A graph of social interactions (e.g. who talks to whom). This can be constructed by extracting records of electronic communication from the project database – email records, bug reports, bulletin boards, etc. Of course, this misses verbal interactions, which makes it more suitable for geographically distributed projects, but there are ways of adding some of this missing information if needed (e.g. if we can mine people’s calendars, meeting agendas, etc).
  2. A graph of code dependencies (which bits of code are related). This can include simply which routines call which other routines. More interestingly, it can include information such as which bits of code were checked into the repository at the same time by the same person, which bits of code are linked to the same bug report, etc.

Comparing these two graphs offers insight into socio-technical congruence - how well the social network (who talks to whom) matches the technical dependencies in the code. Which then leads to all sorts of interesting ideas for tools:

For added difficulty, we have to assume that our target users (climate scientists) are programming in Fortran, and are not using integrated programming environments. Although we can assume they have good version control tools (e.g. Subversion) and good bug tracking tools (e.g Trac).

Electronic Lab Notebooks
Modeling challenge (part 2)

1 Comment

  1. Here is a social network map of a world-wide ERP implementation team at a Fortune 10 company…

    http://www.orgnet.com/email.html

  2. Pingback: Links for Summer Interns « Software Carpentry

  3. Pingback: Links for Summer Interns at Software Carpentry

Join the discussion: