I’ve been busy the last few weeks setting up the travel details for my sabbatical. My plan is to visit three different climate modeling centers, to do a comparative study of their software practices. The goal is to understand how the software engineering culture and practices vary across different centers, and how the differences affect the quality and flexibility of the models. The three centers I’ll be visiting are:
- The National Center for Atmospheric Research (NCAR) in Boulder, Colorado;
- The Max-Planck Institute for Meteorology (MPI-M) in Hamburg, Germany;
- The Institut Pierre Simon Laplace (IPSL) in Paris, France.
I’ll spend four weeks at each center, starting in July and running through to October, after which I’ll spend some time analyzing the data and writing up my observations. Here’s my research plan…
Our previous studies at the UK Met Office Hadley Centre suggest that there are many features of software development for earth system modeling that make it markedly different from other types of software development, and which therefore affect the applicability of standard software engineering tools and techniques. Tools developed for commercial software tend not to cater for the demands of working with high performance code for parallel architectures, and usually do not fit well with the working practices of scientific teams. Scientific code development has challenges that don’t apply to other forms of software: the need to keep track of exactly which version of the program code was used in a particular experiment, the need to re-run experiments with precisely repeatable results, and the need to build alternative versions of the software from a common code base for different kinds of experiments. Checking software “correctness” is hard because the software must frequently calculate approximate solutions to numerical problems for which there is no analytical solution. Because the overall goal is to build code to explore a theory, there is no oracle for what the outputs should be, and therefore conventional approaches to testing (and perhaps code quality in general) don’t apply.
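To make the first two of those needs concrete, here is a minimal Python sketch of recording which code version and configuration produced a run, and of checking whether a re-run is precisely repeatable. Everything in it is an assumption for illustration: the function names (`run_fingerprint`, `toy_model`, `bit_identical`, `within_tolerance`), the version label `r1234`, and the toy equation are hypothetical and are not taken from any of the modeling centers' code. Note that neither check is an oracle: both only compare one run against another, not against a known correct answer.

```python
import hashlib
import json
import math


def run_fingerprint(code_version, config):
    """Hash the code version and configuration that define an experiment.

    Sorting the keys makes the fingerprint independent of dictionary order.
    """
    payload = json.dumps({"code": code_version, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def toy_model(steps, dt):
    """Hypothetical stand-in for a model run: forward Euler on dx/dt = -x."""
    x = 1.0
    trajectory = []
    for _ in range(steps):
        x += dt * (-x)
        trajectory.append(x)
    return trajectory


def bit_identical(a, b):
    """True only if the two runs agree exactly, value for value."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def within_tolerance(a, b, rel_tol=1e-9):
    """Weaker check: the two runs agree to a relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


config = {"steps": 100, "dt": 0.01}
fingerprint = run_fingerprint("r1234", config)  # "r1234" is a made-up version label

baseline = toy_model(**config)
rerun = toy_model(**config)

print("fingerprint:", fingerprint[:12])
print("bit-identical re-run:", bit_identical(baseline, rerun))
```

The tolerance-based comparison is included because a change of compiler or processor count can alter floating-point rounding, in which case an exact comparison fails even though the science is unchanged; which of the two checks is appropriate depends on the experiment.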
Despite this potential mismatch, the earth system modeling community has adopted (and sometimes adapted) many tools and practices from mainstream software engineering. These include version control, bug tracking, automated build and test processes, release planning, code reviews, frequent regression testing, and so on. Such tools may offer a number of potential benefits:
- they may increase productivity by speeding up the development cycle, so that scientists can get their ideas into working code much faster;
- they may improve verification, for example using code analysis tools to identify and remove (or even prevent) software errors;
- they may improve the understandability and modifiability of computational models (making it easier to continue to evolve the models);
- they may improve coordination, allowing a broader community to contribute to and make use of a shared code base for a wider variety of experiments;
- they may improve scalability and performance, allowing code to be configured and optimized for a wider variety of high performance architectures (including massively parallel machines), and for a wider variety of grid resolutions.
This study will investigate which tools and practices have been adopted at the different centers, identify differences and similarities in how they are applied, and, as far as is possible, assess the effectiveness of these practices. We will also attempt to characterize the remaining challenges, and identify opportunities where additional tools and techniques might be adopted.
Specific questions for the study include:
- Verification – What techniques are used to ensure that the code matches the scientists’ understanding of what it should do? In traditional software engineering, this is usually taken to be a question of correctness (does the code do what it is supposed to?); however, for exploratory modeling it is just as often a question of understanding (have we adequately understood what happens when the model runs?). We will investigate the practices used to test the code, to validate it against observational data, and to compare different model runs against one another, and assess how effective these are at eliminating errors of correctness and errors of understanding.
- Coordination – How are the contributions from across the modeling community coordinated? In particular, we will examine the challenges of synchronizing the development processes for coupled models with the development processes of their component models, and how the differences in the priorities of different, overlapping communities of users affect this coordination.
- Division of responsibility – How are the responsibilities for coding, verification, and coordination distributed between different roles in the organization? In particular, we will examine how these responsibilities are divided across the scientists and other support roles such as ‘systems’ or ‘software engineering’ personnel. We will also explore expectations on the quality of contributed code from end-user scientists, and the potential for testing and review practices to affect the quality of contributed code.
- Planning and release processes – How do modelers decide on priorities for model development, how do they decide which changes to tackle in a particular release of the model, and how do they navigate between computational feasibility and scientific priorities? We will also investigate how the change process is organized, and how changes are propagated to different sub-communities.
- Debugging – How do scientists currently debug the models, what types of bugs do they find in their code, and how do they find them? In particular, we will develop a categorization of model errors, to use as a basis for subsequent studies into new techniques for detecting and/or eliminating such errors.
The study will be conducted through a mix of interviews and observational studies, focusing on particular changes to the model codes developed at each center. The proposed methodology is to identify a number of candidate code changes, including recently completed changes and current work-in-progress, and to build a “life story” for each such change, covering how each change was planned and conducted, what techniques were applied, and what problems were encountered. This will lead to a more detailed description of the current software development practices, which can then be compared and contrasted with studies of practices used for other types of software. The end result will be an identification of opportunities where existing tools and techniques can be readily adapted (with some clear indication of the potential benefits), along with a longer-term research agenda for problem areas where no suitable solutions currently exist.