Developing more descriptive metadata for the huge datasets used by climate scientists is an area where software engineering and information systems research can make a useful contribution. The goal is to develop rich metadata descriptors, based on an ontology that spans the field: the physical processes that climate scientists simulate, the physical quantities used in their models, the configuration settings used in creating model runs, the scientific questions these runs are intended to investigate, and the contents of the datasets used. This ontology can then be used to unify the metadata for model configurations and datasets. In principle, the metadata description would then be rich enough to let a researcher set up a model to reproduce the data in a dataset, and to support rich queries over the datasets (e.g. which model runs isolate the effects of volcanoes on 20th century climate?).
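To make the idea of a "rich query" concrete, here is a minimal sketch in Python of what a metadata record for a model run might look like, and how the volcano question could be answered against it. Everything here is an assumption made for illustration: the `ModelRun` fields, the forcing names, and the sample runs are invented, and none of it reflects the actual schemas used by any real project.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ModelRun:
    """Hypothetical metadata record for one model run (illustrative only)."""
    run_id: str
    model: str
    period: Tuple[int, int]          # (start_year, end_year), inclusive
    active_forcings: FrozenSet[str]  # forcings switched on in this run
    question: str = ""               # scientific question the run addresses


def runs_isolating(runs: List[ModelRun], forcing: str,
                   start: int, end: int) -> List[str]:
    """Return ids of runs where `forcing` is the only active forcing
    and the simulated period covers start..end."""
    return [
        r.run_id
        for r in runs
        if r.active_forcings == frozenset({forcing})
        and r.period[0] <= start
        and r.period[1] >= end
    ]


# Invented sample records, not real model runs.
RUNS = [
    ModelRun("run-001", "model-A", (1860, 2000),
             frozenset({"volcanic", "solar", "greenhouse_gas"}),
             "All-forcings historical simulation"),
    ModelRun("run-002", "model-A", (1860, 2000),
             frozenset({"volcanic"}),
             "Effect of volcanic eruptions alone"),
    ModelRun("run-003", "model-B", (1950, 2000),
             frozenset({"volcanic"}),
             "Volcanic-only run, late century"),
]

# Which runs isolate the effects of volcanoes on 20th century climate?
print(runs_isolating(RUNS, "volcanic", 1900, 1999))
```

The point of the sketch is that the query only works because the forcings and the simulated period are recorded as structured fields with a shared vocabulary, rather than buried in free-text notes. An ontology plays that role across many models and institutions.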
Two big research projects are working on this kind of metadata:
- Earth System Curator, a US project whose partners include NCAR, GFDL, and Georgia Tech. Spencer Rugaber, who is part of the project team, will be coming to ICSE to talk about Curator in the Software Engineering for the Planet session, and will hopefully show off some demos. Here’s a more detailed research paper on Curator.
- Metafor, a big EU project whose partners include the Hadley Centre, IPSL, BADC and others. This project has started to develop some use cases and various UML models. There are no research papers yet, but there is plenty of information about the project on their website.