I’ve been busy the last few weeks setting up the travel details for my sabbatical. My plan is to visit three different climate modeling centers, to do a comparative study of their software practices. The goal is to understand how the software engineering culture and practices vary across different centers, and how the differences affect the quality and flexibility of the models. The three centers I’ll be visiting are:
- The National Center for Atmospheric Research (NCAR) in Boulder, Colorado;
- The Max-Planck Institute for Meteorology (MPI-M) in Hamburg, Germany;
- The Institut Pierre Simon Laplace (IPSL) in Paris, France.
I’ll spend 4 weeks at each center, starting in July and running through to October, after which I’ll spend some time analyzing the data and writing up my observations. Here’s my research plan…
Our previous studies at the UK Met Office Hadley Centre suggest that there are many features of software development for earth system modeling that make it markedly different from other types of software development, and which therefore affect the applicability of standard software engineering tools and techniques. Tools developed for commercial software tend not to cater for the demands of working with high performance code for parallel architectures, and usually do not fit well with the working practices of scientific teams. Scientific code development has challenges that don’t apply to other forms of software: the need to keep track of exactly which version of the program code was used in a particular experiment, the need to re-run experiments with precisely repeatable results, the need to build alternative versions of the software from a common code base for different kinds of experiments. Checking software “correctness” is hard because frequently the software must calculate approximate solutions to numerical problems for which there is no analytical solution. Because the overall goal is to build code to explore a theory, there is no oracle for what the outputs should be, and therefore conventional approaches to testing (and perhaps code quality in general) don’t apply.
Despite this potential mismatch, the earth system modeling community has adopted (and sometimes adapted) many tools and practices from mainstream software engineering. These include version control, bug tracking, automated build and test processes, release planning, code reviews, frequent regression testing, and so on. Such tools may offer a number of potential benefits:
- they may increase productivity by speeding up the development cycle, so that scientists can get their ideas into working code much faster;
- they may improve verification, for example using code analysis tools to identify and remove (or even prevent) software errors;
- they may improve the understandability and modifiability of computational models (making it easier to continue to evolve the models);
- they may improve coordination, allowing a broader community to contribute to and make use of a shared code base for a wider variety of experiments;
- they may improve scalability and performance, allowing code to be configured and optimized for a wider variety of high performance architectures (including massively parallel machines), and for a wider variety of grid resolutions.
This study will investigate which tools and practices have been adopted at the different centers, identify differences and similarities in how they are applied, and, as far as is possible, assess the effectiveness of these practices. We will also attempt to characterize the remaining challenges, and identify opportunities where additional tools and techniques might be adopted.
Specific questions for the study include:
- Verification – What techniques are used to ensure that the code matches the scientists’ understanding of what it should do? In traditional software engineering, this is usually taken to be a question of correctness (does the code do what it is supposed to?); however, for exploratory modeling it is just as often a question of understanding (have we adequately understood what happens when the model runs?). We will investigate the practices used to test the code, to validate it against observational data, and to compare different model runs against one another, and assess how effective these are at eliminating errors of correctness and errors of understanding.
- Coordination – How are the contributions from across the modeling community coordinated? In particular, we will examine the challenges of synchronizing the development processes for coupled models with the development processes of their component models, and how the differences in the priorities of different, overlapping communities of users affect this coordination.
- Division of responsibility – How are the responsibilities for coding, verification, and coordination distributed between different roles in the organization? In particular, we will examine how these responsibilities are divided across the scientists and other support roles such as ‘systems’ or ‘software engineering’ personnel. We will also explore expectations on the quality of contributed code from end-user scientists, and the potential for testing and review practices to affect the quality of contributed code.
- Planning and release processes – How do modelers decide on priorities for model development, how do they decide which changes to tackle in a particular release of the model, and how do they navigate between computational feasibility and scientific priorities? We will also investigate how the change process is organized, and how changes are propagated to different sub-communities.
- Debugging – How do scientists currently debug the models, what types of bugs do they find in their code, and how do they find them? In particular, we will develop a categorization of model errors, to use as a basis for subsequent studies into new techniques for detecting and/or eliminating such errors.
The study will be conducted through a mix of interviews and observational studies, focusing on particular changes to the model codes developed at each center. The proposed methodology is to identify a number of candidate code changes, including recently completed changes and current work-in-progress, and to build a “life story” for each such change, covering how each change was planned and conducted, what techniques were applied, and what problems were encountered. This will lead to a more detailed description of the current software development practices, which can then be compared and contrasted with studies of practices used for other types of software. The end result will be an identification of opportunities where existing tools and techniques can be readily adapted (with some clear indication of the potential benefits), along with a longer-term research agenda for problem areas where no suitable solutions currently exist.
Steve, how did you select these centers? I wonder how different they will be. For example, perhaps the Japanese Modeling Center (if there is such a thing) would be more different from Hadley than the Germans.
Pingback: Upcoming conferences relevant to Climate Change Informatics | Serendipity
In your interviews, will you be asking them how they learned the skills they are applying? I suspect most of their software skills are acquired by the apprenticeship model of learning from more senior people around you in the lab, rather than being taught in a classroom (even one with a lab).
A reader request: on the Verification topic, I’d be interested in hearing how the groups have integrated computer algebra systems into their work-flow.
Pingback: Reconstructing context from email databases | Serendipity
Pingback: Climate models and computing talk with Balaji « Semantic Werks
Scientific code development has challenges that don’t apply to other forms of software: the need to keep track of exactly which version of the program code was used in a particular experiment, the need to re-run experiments with precisely repeatable results, the need to build alternative versions of the software from a common code base for different kinds of experiments.
In more than 20 years as a software professional, I have never worked on any project in which these three requirements were not vital.
[Nick – could you be more specific about what types of software you’ve worked on? For example, I think it’s fair to say that repeatability is important in all software, at the very least for reproducing buggy behaviour; but here I’m talking about a much more stringent requirement: for example, the case where someone wants to re-run a 6-year-old version of the software on a new platform with a new compiler, and guarantee that every floating point computation gives precisely the same result down to the least significant bit. – Steve]
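To make that stringency concrete, here is a toy Python sketch (my own illustration, not code from any climate model): simply summing the same numbers in a different order, which is exactly the kind of change a new compiler or optimization setting can introduce by re-associating operations, will usually change the last few bits of the result.

```python
# Toy illustration (not code from any climate model): floating-point addition
# is not associative, so summing the same values in a different order usually
# changes the trailing bits of the result, even though both answers are
# equally "correct" to within rounding error.
import random

random.seed(42)
values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

forward = sum(values)             # accumulate left to right
backward = sum(reversed(values))  # same numbers, opposite order

print(f"forward  = {forward!r}")
print(f"backward = {backward!r}")
print("bit-for-bit identical?", forward == backward)  # usually False
```

Each sum is perfectly deterministic on its own; the point is only that bit-for-bit agreement requires pinning down not just the source code but the compiler, its flags, and the exact order of every reduction.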
Pingback: This is what tenure is for | Serendipity
What Nick said.
I am not a professional software engineer / developer, but in over 40 years of being tightly integrated with software development, I have never worked on a project for which these requirements were not vital. These plus several additional that directly address critical verification issues. Plus several additional that address portability, maintainability, extension-ability, understanding-ability. Plus others for accurately documenting in great detail what is actually in the code. Plus others relating to validation. And the list goes on.
All are vitally important.
And much engineering software development employs each and every item on every list. It’s not rocket science.
Why can’t climate ‘science’ do it ?
Cranking out a bunch of calculations and making some kind of ensemble-mean of the numbers does not lead to verification. Never has and never will.
[Why can’t climate science do what? If you actually know anything about climate model development, feel free to share. Otherwise, go and do your trolling elsewhere – Steve]
What Steve said. Climate modelling does seem to do exactly these things; a lot of science software does not (because it’s a bunch of little run-once scripts), but that’s far from unique to climate science.
I am a little sceptical that science software has unusual process requirements: in industry that sort of story is often encountered by process improvers, and is usually code for “we like doing things our way and we’re not going to adopt any of your new-fangled process ideas”. In other words, the unusual requirements are dictated by inertia, not by the organisation goals. Maybe science is different, but I don’t see why.
[…and from the other side of the canyon, good software developers understand that most of what the “process improvers” want them to do is management bullshit. I’ve seen it from both sides. Process improvement tends to take an overly mechanistic view of software development, in which the skills and experience of the developers are irrelevant. It’s useful on very large projects with relatively inexperienced developers. On small projects, with highly skilled developers, it’s a productivity killer. That’s what got the whole agile movement started in the first place – because the process improvers were drowning them in red tape. How do we get out of this argument? By demonstrating, with clear empirical evidence, that proposed process improvements really do bring the claimed benefits for a specific kind of software (in this case, climate simulations). Unfortunately, there are no valid empirical studies of process improvement applied to scientific software of any kind. In fact, it’s worse than that – I’m not aware of valid empirical evidence that process improvement is useful for any type of software; however I am willing to take on trust the anecdotal evidence of its value to very large software companies, because these companies are simply unable to attract and retain top talent across their software teams. – Steve]
Well, here’s my usual challenge. Point me to the document that clearly identifies the continuous form of the vertical component of the momentum equation used in the NASA / GISS ModelE model and code. Pointers to the document that clearly identifies the time levels at which each part of the discrete approximation is evaluated earns bonus points.
Thanks in advance for all considered responses.
[Dan, you’ll have to can the attitude if you want to continue to comment here. If you have a specific point to make about weaknesses in the models, please make it. I’m not familiar with ModelE, nor, I suspect, are any of the regular commenters here. So your “challenge” comes across as pure gish gallop. Now, in case you are actually interested in constructive dialogue, here’s the documentation for the atmospheric module of the CCSM – this model probably has the best documentation of any. If you can offer useful improvements to the model or its documentation, I’m sure the CCSM team would welcome your contribution. – Steve]
I did not mean to exhibit an attitude. I attempted to ask a single, straightforward question regarding documentation of one of the several climate science models. I based my question on the content of your comments and posts on this blog and a paper published by you.
This post, Studying team coordination and software verification practices for earth system modeling, contains this statement:
where ‘previous studies’ refers to Engineering the Software for Understanding Climate Change, which has a link to Engineering the Software for Understanding Climate Change
which contains this statement:
And this post, What makes software engineering for climate models different?
My point being that none of these statements indicate that the conclusions are limited to only selected climate science software. On the contrary, the statements refer to all climate science software. An entirely new phylum would very likely cover lots of different software-development projects.
Greatly detailed documentation of what is actually in the coding would seem to be an important aspect of all computer software development; not just selected examples. Especially when the software evolves over significant periods of time ( in many important cases measured in decades ), is modified by different team members, is used for a variety of applications by many users and these are not all in the same physical location, and contains models and methods for several inherently complex and coupled physical phenomena and processes. Additionally, the continuous form of the fundamental equations for the models would seem to be an excellent starting point for the documentation.
[If you’ve read the posts and the paper you quote from, then it should be absolutely clear to you that I’m speculating about possible generalizations from the small number of case studies I’ve conducted so far, and that ModelE is not one of them. Which means you’re not adding anything constructive to this thread – you’re simply taking a hostile interpretation of my work. Last warning – cut the hostility if you want to continue this discussion. On the issue of documentation, the amount and timeliness depend on a number of factors, including the maturity of the software and the nature of the intended audience for such documentation. ModelE is not a community project (in contrast to, say, the CCSM). The project team is small, and if they all know how the DyCore is implemented, or know who to ask, then there’s no point wasting time writing up formal documentation. No project I’ve studied (with the possible exception of the Shuttle Flight Software) has ever managed to maintain complete up-to-date documentation for every version of the software released. The key question then is how to strike a balance between maintaining documentation, and getting on with software development. Some recent studies show that most printed documentation is pretty useless anyway, which suggests the balance should be for less formal documentation, and more readable code. – Steve]
“Agile” is exactly process improvement, and agile advocates are exactly process improvers.
[What, despite the fact that the Agile Manifesto explicitly denigrates ‘process’ and ‘documentation’? – Steve]
@Dan Hughes
I can see that we have differences in ideas and have experiences that are completely different. Take this example from your reply above
My experiences have been that complete up-to-date documentation is a requirement for release of software for production applications. It is a requirement that is independently verified to have been fulfilled prior to release of the software. I think documentation outside the code would be very useful for users who need to be certain that a model or method has the properties and capabilities expected to be important for the application of interest.
I think reading prose is much more productive than attempting to reverse-engineer nitty-gritty details from coding. More productive and very much more certain of correctness.
Many users know nothing about the structure of the code and where to look for specific pieces or how to ensure that the piece of interest isn’t changed somewhere. And many users do not have access to the source; only the executable.
Readable code is absolutely necessary for those who work with the code; also a requirement that can be independently verified prior to release. Many users will never see the source code.
I made no suggestions about the physical realization of the documentation. Printed, stand-alone simple text documents is one option. The degree of complexity and coupling with the coding are open for discussion.
Yes, because that word doesn’t mean what you, or they, pretend to think that it means. All software is developed through some process. Some processes are better than others (e.g. better software, quicker or cheaper development). For instance, some agile processes are pretty good. Choosing a better process is process improvement. End of story.
And there’s a lot of empirical evidence about software process improvement, including agile processes. Maybe none in science software.
Sorry, didn’t see your inline comment there; subscribing to email comment notifications doesn’t echo them (which is why I don’t use them on my blogs).
Well, let’s see: when I was a teenager I wrote some school administration software, then worked for a year in small-business accounting software (on Data General minicomputers, which were a blast). Then after college – in which I wrote a lot of non-commercial code, including bits and pieces of more-or-less throwaway science software – I spent several years working on compilers and runtimes for (and in) Standard ML, both in industry (at Harlequin) and academe (at CMU). This evolved, in a second stint at Harlequin, into work on garbage collection and other memory management for an assortment of systems, including implementations of Dylan, Common Lisp, and PostScript. That was when I got really interested in process improvement, and we did a lot (of formal review, for instance). Then in 1997 I founded Ravenbrook with my colleague Richard Brooksby, and have worked for maybe a couple of dozen clients since then, of which I might mention Geodesic (memory management again), Perforce (software development tools and integrations, in Python), Social Science Automation (data analysis and visualisation tools, in Lisp), Sentec (embedded automation on PIC micro-controllers). As well as software development we do software management consultancy, using ideas of evolutionary delivery to produce software on time, to budget, exceeding requirements. Recently I have been spending some of my copious spare time on the Clear Climate Code project.
Pretty much all of this software cares very much about reproducibility.
Why do they think that is a sensible requirement? I can certainly build software which will do it (the trick being to insist on CPUs which implement IEEE standard floating-point – that part used to be difficult; on compilers which both know how to coax that out of the CPU and which don’t try stupid re-ordering tricks; and on coding styles which are careful about operation order and either avoid conversion between float formats or are very exacting about it), or a software development process which will let other people do it (add to the above, and to a generic high-quality process: make sure, by either hiring or training, that everyone on the team *really* understands floating-point arithmetic), but on the face of it the requirement is somewhat ridiculous. I imagine it’s there because some downstream processing is a chaotic model, and the results of *that* processing need to be reproducible?
One could greatly reduce reproducibility problems for far less effort by (say) rounding the intermediate results at some granularity defined by the amount of information in the system.
[War story: in CCC we have removed a large number of rounding operations which were only present because the original multi-phase Fortran stored intermediate data in plain-text files, rounded (or sometimes truncated) to some number of places. Of course, the change to the final results was insignificant].
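For what it’s worth, here is a rough sketch of the rounding idea suggested above (a toy Python example with an arbitrary choice of seven significant figures, not something taken from any real model or from CCC): once intermediate results are rounded to a granularity coarser than the accumulated floating-point noise, two computations that differ only in operation order will almost always store identical values.

```python
# Toy sketch of rounding intermediate results at a fixed granularity (seven
# significant figures chosen arbitrarily). Not taken from any real model; it
# only illustrates the idea discussed above.
import random
from math import floor, log10

def round_to_sig_figs(x: float, sig: int = 7) -> float:
    """Round x to `sig` significant figures (zero passes through unchanged)."""
    if x == 0.0:
        return 0.0
    return round(x, sig - 1 - floor(log10(abs(x))))

random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

forward = sum(values)             # one operation order
backward = sum(reversed(values))  # the same numbers, reversed

# The raw sums typically differ in their trailing bits; after rounding to a
# granularity much coarser than that noise they almost always compare equal
# (unless the value happens to sit right on a rounding boundary).
print("raw sums identical?    ", forward == backward)
print("rounded sums identical?",
      round_to_sig_figs(forward) == round_to_sig_figs(backward))
```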
Just to add: I want to distance myself from some of what Dan Hughes is writing in his comment. I certainly am not hostile to your work; I absolutely welcomed Easterbrook & Johns 2008, and have hawked it around the office. I’m just a bit sceptical of special pleading, having encountered a great deal of it in my career, much of it from people who want to be “left alone to do their work”, when they are demonstrably *not* doing their work very well.
(and certainly some of those people had “process” in their job titles, but seemed to think that their job was to place obstacles in people’s way, so I have a lot of sympathy for some of your iconoclastic remarks, however much I argue against them: process sometimes needs to be saved from “process”).
[Understood and appreciated. I’m very sensitive to the ‘special pleading’ issue too, and am hoping to pin down a robust answer to what practices climate scientists should (and should not) adopt – Steve]
May I suggest that some of the more recent publications addressing verification and validation concepts, methods, and procedures are being written by William ( Bill ) Oberkampf and colleagues ( Trucano, Pilch, Salari, and others ) at Sandia National Laboratories and P. J. ( Pat ) Roache. Many of the Sandia publications can be found and downloaded at the http://www.osti.gov site, and from the Sandia site. Pat has recently issued the Second Edition of his book, Fundamentals of Verification and Validation, Hermosa Publishers; also available from Amazon and probably other places.
Pat’s books have interesting history stories about the, ultimately successful, struggles to introduce these concepts into professional societies’ publication requirements. The First Edition, Verification and Validation in Computational Science and Engineering, also has an interesting discussion of the paper, N. Oreskes, K. Shrader-Frechette, and K. Belitz. Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences. Science, 263(5147):641, 1994.
A short bibliography, from about three years ago, is here. That post has broken links and I haven’t taken time to fix them.
More recently, these concepts have been investigated and implemented in the Accelerated Strategic Computing Initiative ( ASCI ) at the Los Alamos, Lawrence Livermore, and Sandia National Laboratories in the USA. This program is now named Advanced Simulation & Computing ( ASC ), I think. A Google search will find stuff; http://www.osti.gov will, too. A short summary is here.
The complete Sandia Software Quality Plan for ASC is here. Some Los Alamos concepts are given here.
Several of the papers and reports will discuss other Software Engineering aspects for scientific and engineering software.
[Thanks – that’s a useful resource list – Steve]
Here are a few papers that begin to apply some of the more recent concepts of verification to a limited aspect of climate science.
Robert L. Walko and Roni Avissar, “The Ocean–Land–Atmosphere Model (OLAM). Part I: Shallow-Water Tests”, Monthly Weather Review, Volume 136, November 2008. DOI: 10.1175/2008MWR2522.1
Robert L. Walko and Roni Avissar, “The Ocean–Land–Atmosphere Model (OLAM). Part II: Formulation and Tests of the Nonhydrostatic Dynamic Core”, Monthly Weather Review, Volume 136, November 2008. DOI: 10.1175/2008MWR2523.1
David L. Williamson, Jerry G. Olson, and Christiane Jablonowski, “Two Dynamical Core Formulation Flaws Exposed by a Baroclinic Instability Test Case”, Monthly Weather Review, Volume 137(2), February 2009. DOI: 10.1175/2008MWR2587.1
David L. Williamson, “The Evolution of Dynamical Cores for Global Atmospheric Models”, J. Meteorological Soc. Japan, Volume 85B, 2007.
Google might find others by these authors.
Pingback: Do Climate Models need Independent Verification and Validation? | Serendipity
Pingback: Workshop on Advancing Climate Modeling (1) | Serendipity
Pingback: TEDx talk: Should we trust climate models? | Serendipity