Summer projects: yesterday I posted about social networking tools for computational scientists. Greg has posted a whole list of additional suggestions.
Here, I'll elaborate on another of these ideas: the electronic lab notebook. For computational scientists, wiki pages are an obvious substitute for traditional lab notebooks, because each description of an experiment can then be linked directly to the corresponding datasets, configuration files, visualizations of results, scientific papers, related experiments, etc. (In the most radical version, Open Notebook Science, the lab notebook is completely open for anyone to see. But the toolset would be the same whether it's open to everyone or shared only with select colleagues.)
In my study of software practices at the UK Met Office last summer, I noticed that some of the scientists carefully document each experiment on a new wiki page, but the process is laborious in a standard wiki, involving a lot of cut-and-paste to create a suitable page structure. As a result, many scientists don't keep good records of their experiments. An obvious improvement would be to generate a basic wiki page automatically each time a model run is configured, populate it with information about the run and links to the relevant data files, and let the scientists add further commentary via the standard wiki editor.
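To make that concrete, here's a minimal sketch of what such a page generator might look like. Everything in it is an assumption on my part: the flat YAML config format, the field names (`run_id`, `output_files`), and the MediaWiki-style markup. A real version would hook into whatever the run-submission scripts and the lab's wiki actually use.

```python
"""Sketch: auto-generate a lab-notebook wiki page for a model run.

Assumes the run configuration is a flat YAML file and the wiki accepts
pages as plain-text files in its markup. All field names hypothetical.
"""
from datetime import datetime
from pathlib import Path

import yaml  # PyYAML

def make_notebook_page(config_path: str, output_dir: str = "wiki_pages") -> Path:
    config = yaml.safe_load(Path(config_path).read_text()) or {}
    run_id = config.get("run_id", Path(config_path).stem)

    lines = [
        f"= Experiment {run_id} =",
        f"Generated {datetime.now():%Y-%m-%d %H:%M} from {config_path}",
        "",
        "== Run configuration ==",
    ]
    # One bullet per configuration setting, so the page records the run as configured.
    for key, value in sorted(config.items()):
        lines.append(f"* '''{key}''': {value}")

    lines += [
        "",
        "== Data files ==",
        # Link whatever output files the config declares, if any.
        *[f"* [[file:{f}]]" for f in config.get("output_files", [])],
        "",
        "== Notes ==",
        "''(add commentary here)''",
    ]

    page = Path(output_dir) / f"run_{run_id}.wiki"
    page.parent.mkdir(parents=True, exist_ok=True)
    page.write_text("\n".join(lines))
    return page
```

The point is that the generated page is a skeleton, not a finished record: the scientist still writes the commentary, but the tedious structure and the links come for free from the run configuration.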
Of course, an even better solution would be to capture all the information about a particular run of the model (including subsequent commentary on the results) as metadata in the configuration file, so that no separate wiki pages are needed: lab notebook pages become just user-friendly views of the configuration file. I think that's probably a longer-term project, and it ties in with the observation that existing climate model configuration tools are hard to use anyway and need to be re-invented. Let's leave that one aside for the moment…
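Just to pin down what I mean before setting it aside: in this version, commentary gets written back into a metadata section of the config file itself, and a notebook page could then be rendered from the config on demand. A tiny sketch, again with made-up field names and YAML assumed as the config format:

```python
"""Sketch: treat the run configuration file as the notebook itself.

Commentary is appended into a `metadata` section of the config file,
so a notebook page can be rendered from it rather than kept separately.
Field names are hypothetical.
"""
from datetime import datetime
from pathlib import Path

import yaml  # PyYAML

def add_commentary(config_path: str, author: str, note: str) -> None:
    path = Path(config_path)
    config = yaml.safe_load(path.read_text()) or {}
    # Append a timestamped note to the config's own metadata section.
    config.setdefault("metadata", {}).setdefault("commentary", []).append(
        {"when": datetime.now().isoformat(timespec="minutes"),
         "who": author,
         "note": note}
    )
    path.write_text(yaml.safe_dump(config, sort_keys=False))
```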
A related problem is better support for navigating and linking existing lab notebook pages. For example, in the process of writing up a scientific paper, a scientist might need to search for the descriptions of a number of individual experiments, select some of the data, create new visualizations for use in the paper, and so on. Recording this trail would improve reproducibility, by capturing the necessary links to the source data in case the visualizations used in the paper need to be altered or recreated. Some of this requires a detailed analysis of the specific workflows used in a particular lab (which reminds me, I need to write up what I know of the Met Office's workflows), but I think some of it can be achieved with simple generic tools (e.g. browser plugins) that help capture the trail as it happens, and perhaps edit and annotate it afterwards.
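As a rough illustration of what trail capture might amount to, here's a sketch in which each step (notebook page visited, dataset selected, figure created) is appended to a JSON-lines file that can be edited and annotated afterwards. In practice the events would come from a browser plugin or the visualization tool rather than manual calls; the event fields and file paths below are invented.

```python
"""Sketch: record a provenance trail while assembling a paper.

Each event is appended to a JSON-lines file, one JSON object per line,
so the trail can be annotated or edited later with ordinary tools.
Event fields and paths are hypothetical.
"""
import json
from datetime import datetime
from pathlib import Path

TRAIL = Path("paper_trail.jsonl")

def record(kind: str, target: str, note: str = "") -> None:
    event = {
        "when": datetime.now().isoformat(timespec="seconds"),
        "kind": kind,      # e.g. "page", "dataset", "figure"
        "target": target,  # URL or file path
        "note": note,
    }
    with TRAIL.open("a") as f:
        f.write(json.dumps(event) + "\n")

# Example trail for one figure in a paper:
record("page", "wiki/run_1842", "baseline experiment")
record("dataset", "/data/run_1842/tas_monthly.nc")
record("figure", "figures/fig3_tas_anomaly.png",
       "regenerate from tas_monthly.nc if the colour scale changes")
```

Even something this crude would answer the question that usually goes unanswered months later: which experiment and which data file did figure 3 actually come from?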
I’m sure some of these tools must exist already, but I don’t know of them. Feel free to send me pointers…