This summer, we have a group of undergrad students working with us, who will try building some of the tools we have identified as potentially useful for climate scientists. We’re just getting started this week, so it’s not clear what we’ll actually build yet, but I think I can guarantee we’ll end up with one of two outcomes: either we build something that is genuinely useful, or we learn a lot about what doesn’t work and why not.
Here’s the first project idea. It responds to the observation that large climate models (and indeed any large-scale scientific simulation) undergoes continuous evolution, as a variety of scientists contribute code over a long period of time (decades, in some cases). There is no well-defined specification for the system, and nor do the scientists even know ahead of time exactly what the software should do. Coordinating contributions to this code then becomes a problem. If you want to make a change to some particular routine, it can be hard to know who else is working on related code, what potential impacts your change might have, and sometimes it is hard even to know who to go and ask about these things – who’s the expert?
A similar problem occurs in many other types of software project, and there is a fascinating line of research that exploits the social network to visualize how the efforts of different people interact. It draws on work in sociology on social network analysis – basically the idea that you can treat a large group of people and their social interactions as a graph, which can then be visualized in interesting ways, and analyzed for its structural properties, to identify things like distance (as in six degrees of separation), and structural cohesion. For software engineering purposes, we can automatically construct two distinct graphs:
- A graph of social interactions (e.g. who talks to whom). This can be constructed by extracting records of electronic communication from the project database – email records, bug reports, bulletin boards, etc. Of course, this misses verbal interactions, which makes it more suitable for geographically distributed projects, but there are ways of adding some of this missing information if needed (e.g. if we can mine people’s calendars, meeting agendas, etc).
- A graph of code dependencies (which bits of code are related). This can include simply which routines call which other routines. More interestingly, it can include information such as which bits of code were checked into the repository at the same time by the same person, which bits of code are linked to the same bug report, etc.
Comparing these two graphs offers insight into socio-technical congruence - how well the social network (who talks to whom) matches the technical dependencies in the code. Which then leads to all sorts of interesting ideas for tools:
- Visualizers. These capture and display the project social network, and allowing you to visualize congruence mismatches. E.g. Tesseract, Workspace Activity Viewer, WorldView,
- Awareness Tools. These are intended to give you a stronger sense of what else is going on in the project in realtime, for example by showing you who is currently working on code related to yours. Examples: The work of the SEGAL lab at UVic, the TagSEA project and NavTracks from the CHiSEL group, Palantir, Lighthouse,
- Recommenders. These provide advice on who to talk to, how to identify the experts, or who to assign tasks to. For example, bug triage (the paper “Who should fix this bug?” is a classic), Hipikat, Suade, ConcernDetector,
For added difficulty, we have to assume that our target users (climate scientists) are programming in Fortran, and are not using integrated programming environments. Although we can assume they have good version control tools (e.g. Subversion) and good bug tracking tools (e.g Trac).