Can we improve the engineering of climate software?

13. July 2010 · 17 comments · Categories: Uncategorized

William Connolley has written a detailed critique of our paper “Engineering the Software for Understanding Climate Change”, which follows on from a very interesting discussion about “Amateurish Supercomputing Codes?” in his previous post. One of the issues raised in that discussion is the reward structure in scientific labs for software engineers versus scientists. The funding in such labs is pretty much all devoted to “doing science” which invariably means publishable climate science research. People who devote time and effort to improving the engineering of the model code might get a pat on the back, but inevitably it’s under-rewarded because it doesn’t lead directly to publishable science. The net result is that all the labs I’ve visited so far (UK Met Office, NCAR, MPI-M) have too few software engineers working on the model code.

Which brings up another point. Even if these labs decided to devote more budget to the software engineering effort (and it’s not clear how easy it would be to do this, without re-educating funding agencies), where will they recruit the necessary talent? They could try bringing in software professionals who don’t yet have the domain expertise in climate science, and see what happens. I can’t see this working out well on a large scale. The more I work with climate scientists, the more I appreciate how much domain expertise it takes to understand the science requirements, and to develop climate code. The potential culture clash is huge: software professionals (especially seasoned ones) tend to be very opinionated about “the right way to build software”, and insensitive to contextual factors that might make their previous experiences inapplicable. I envision lots of the requirements that scientists care about most (e.g. the scientific validity of the models) getting trampled on in the process of “fixing” the engineering processes. Right now the trade-off between getting the science right versus having beautifully engineered models is tipped firmly in favour of the former. Tipping it the other way might be a huge mistake for scientific progress, and very few people seem to understand how to get both right simultaneously.

The only realistic alternative is to invest in training scientists to become good software developers. Greg Wilson is pretty much the only person around who is covering this need, but his Software Carpentry course is desperately underfunded. We’re going to need a lot more like this to fix things…

17 Comments

Jason D.
July 13, 2010 at 8:23 pm

While I think there can definitely be problems in having experienced software professionals working in a new domain (climate science) I think it’s a bit much to say the “only realistic alternative” is to train scientists to become software developers. In many other complex application domains they don’t abandon all hope of hiring software folks and training them in the problem domain. The Software Carpentry efforts are much needed and should be supported, but let’s not make sweeping statements that cut out trained software engineers just based on a “potential culture clash.”
Michael Tobis
July 14, 2010 at 2:02 am

I think I see what Steve is saying. I have run into it myself, in trying to engage software engineers and computer scientists in joint proposals with climate scientists, and seeing others funded to do so and fail or at best attain modest success.

It’s more a criticism of scientists than of software professionals: scientists cannot effectively communicate their requirements to people with other interests, because those requirements have never been formalized.

A conventional requirements gathering approach as if you were being hired by a trucking company or a shoe vendor is not going to work. The whole approach to the computation is different and indeed there are

One reason to love Python (among many) is that it is the only software community on earth (as far as I know) where commercial code and scientific code both carry comparable weight in the development of the language, the infrastructure and the community. So contact is happening there. But even so, the needs of climate are quite idiosyncratic compared to other sciences, and not formulated well enough for outsiders to understand. Climate has so far only engaged with scientific Python at the fringes.

I’d like to consider one alternative, which is to take advantage of the huge interest in climate among technically educated people, to develop a new software culture based largely on an open-source culture that takes an interest in the science from the ground up. Maybe this can form the foundation for a new institution.

But the idea of just doubling the budget and hiring software engineers within existing institutions to refactor existing code bases and processes has been problematic already and is yet to yield as much benefit as software professionals might a priori expect.
steve
July 14, 2010 at 11:57 am

Yes, quite. To clarify a little more, I don’t mean to say that software professionals can’t make a contribution, because they can, and indeed they are. I’ve met several people who have come into climate modeling from a more traditional software professional career. They’re rare, but they exist, and they bring interesting and useful perspectives.
But the learning curve is steep, and climbing it involves not just hard work, but also some humility. I’d really like to see more people climb it…
Michael Saunby
July 14, 2010 at 12:18 pm

Another thing to remember is that supercomputers cost just as much when they are doing nothing, so getting code running quickly, and keeping it running 24/7, is very important.

There are IT specialists at the UK Met Office, but their focus is mostly on keeping everything up and running. I believe top operational centres have much higher throughput than research-only facilities – it would certainly be expected, since that’s where the focus is. The benefit to the climate scientists is that this will give more model years of output, even if the code isn’t always the very best.
Eli Rabett
July 14, 2010 at 3:31 pm

All this may be pushing at an open door. Eli belongs to a mailing list for climate science writ broad (the archives are closed;) and has seen ads (more than one) recently for software engineers to work on code development. Here is one of them
———————————

Research engineer position
Development of OASIS, a numerical code coupler for high-performance computing in climate modelling

Numerical models used since the end of the sixties for climate modelling are powerful multi-physics codes reproducing the interactions between the different components of the Earth climate system, e.g. the atmosphere, the ocean, the sea ice, the surface, etc. Since 1991, the « Climate Modelling and Global Change » group at CERFACS has been playing a key role in the climate modelling community with the development of OASIS, a software ensuring the coupling of these components, i.e. synchronized exchanges and transformation of information at their interface. Today, OASIS is used worldwide by more than 30 climate modelling groups in Europe, USA, Canada, China and Australia.

The long term maintenance and development of OASIS is ensured by CERFACS and CNRS who each provides one full-time employee for these tasks. In the framework of the IS-ENES project (« Infrastructure for ENES», see https://is.enes.org/), the European Commission provides an additional funding of about 8 person-years over the 2009-2013 period, distributed between CERFACS and the Deutsches Klimarechenzentrum GmbH (DKRZ) in Hamburg. The main objective within IS-ENES is to finalise the development of OASIS4, the most recent fully parallel version of the software.

The research engineer hired in the framework of this contract will lead certain aspects in the development of the OASIS4 coupler. His/her main tasks will be to:
· fully validate the global parallel interpolation algorithms for the numerical grids used in climate modelling;
· improve the conservative remapping already implemented in OASIS;
· measure and improve the overall efficiency of the coupler;
· set-up and validate high-resolution coupled models based on OASIS4 (e.g. a coupling between a T359 atmospheric model with 8 million points and an ocean model with a resolution of 1/12 of a degree with 16 million points).

The engineer will be hired on an 18-month non-permanent position and will join the current development team (2 engineers at CERFACS and one engineer at DKRZ); an aptitude to work and interact within a team of developers is therefore essential. The position requires mastering the Linux operating system, the Fortran 90 programming language and the Message Passing Interface specification. Confirmed skills in high performance scientific computing and in software development are highly recommended.

Notions in climate modeling would be an asset.
———————————-
Got a student Steve??
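The ad's "conservative remapping" task is about moving a field such as a heat or freshwater flux from one model's grid to another's without creating or losing any of the transported quantity. The sketch below shows only the one-dimensional idea, in Python. It assumes that cells are given by sorted edge coordinates and that both grids span the same interval. The function name is hypothetical. Real couplers such as OASIS work on 2-D grids on the sphere with their own algorithms, which this does not reproduce.

```python
def conservative_remap_1d(src_edges, src_values, dst_edges):
    """First-order conservative remapping on a 1-D line (illustrative only).

    Each destination cell receives the overlap-weighted mean of the source
    cells it intersects, so the integral (sum of value * cell width) is
    preserved when both grids cover the same interval."""
    dst_values = []
    for k in range(len(dst_edges) - 1):
        a, b = dst_edges[k], dst_edges[k + 1]
        total = 0.0
        for m in range(len(src_edges) - 1):
            # Length of the intersection of [a, b] with source cell m
            overlap = min(b, src_edges[m + 1]) - max(a, src_edges[m])
            if overlap > 0:
                total += src_values[m] * overlap
        dst_values.append(total / (b - a))
    return dst_values


print(conservative_remap_1d([0, 1, 2, 3, 4], [1, 2, 3, 4], [0, 2, 4]))
```

For source edges [0, 1, 2, 3, 4] with values [1, 2, 3, 4] and destination edges [0, 2, 4], the result is [1.5, 3.5]. Both grids integrate to 10.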
steve
July 14, 2010 at 4:32 pm

Eli: I’ll be visiting these folks in Hamburg and Paris in the next couple of months. As an aside, they pronounce the IS-ENES project as “easiness” 🙂

But you see the problem at the end of the ad? No CS grad I know of has any experience of F90 and MPI, and few have any HPC experience. This ad won’t recruit mainstream software professionals, it will recruit from within the existing earth science community.
William M. Connolley
July 14, 2010 at 5:37 pm

First off, OASIS is almost entirely a software engineering project anyway; it is coupling code, not a climate model (disclaimer: I never used it, though I did try once, not very hard). And I think that CERFACS is mostly software engineering anyway, albeit done in Fortran :-). As it so charmingly says, “Notions in climate modeling would be an asset.” So, the OASIS stuff is a bit of an exception.

Secondly, at least based on my own personal experience, you can’t teach people software engineering within a scientific community. The standards are just too different, the approaches, the way things get done, the mindsets (no more could you teach people Science in a software company). You have to bring people in from outside. But as you say, pure outsiders are not a lot of use because they don’t understand the science environment. So the solution is obvious: hire back people who’ve left science and worked in SE for a few years; attract them with big fat paycheques! Ha ha, I knew you wouldn’t fall for that one. I am however serious: about the only way to do this is to encourage flow between the two disciplines. The UKMO has probably fouled that up somewhat by moving to Exeter.

Which brings up the longstanding “who wants to write Fortran” problem. Any decent SE can pick it up quickly, but still won’t want to; at the very least, this translates into having to pay them more. And while you’re there, you’re writing in a language that will be useless in your future elsewhere, which is bad. So really someone needs to start the long, slow, painful process of transferring to a real language. Michael Tobis has had some thoughts on this, I believe. Perhaps this is what the SEs could actually *do* – translate, beautify, purify the code. Or do some real SE and write a model-writing language instead, which would translate PDEs into code.
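One way to picture a "model-writing language" that translates PDEs into code is as a small generator that emits Fortran from a declarative description of a finite-difference stencil. The sketch below is written in Python and is purely hypothetical. The function name, the dictionary format {(di, dj): coefficient}, and the Fortran bound names ilo, ihi, jlo, jhi are all assumptions. It also ignores Fortran's line-length limits. It derives the loop bounds from the stencil's reach, so the bounds cannot drift out of step with the difference scheme.

```python
def fortran_stencil_loop(target: str, source: str, weights: dict) -> str:
    """Emit a Fortran loop nest applying a stencil {(di, dj): coeff} to `source`.

    Hypothetical generator. Assumes `source` is declared source(ilo:ihi, jlo:jhi);
    the loops are shrunk by the stencil's reach so no access falls outside
    that declaration. Long right-hand sides are not split into continuation
    lines, and coefficients are emitted as default-kind real literals (a real
    generator would attach a kind parameter to keep full precision)."""
    reach = max(max(abs(di), abs(dj)) for di, dj in weights)
    terms = []
    for (di, dj), c in sorted(weights.items()):
        i = "i" if di == 0 else f"i{di:+d}"  # e.g. "i+1", "i-1"
        j = "j" if dj == 0 else f"j{dj:+d}"
        terms.append(f"({c!r})*{source}({i},{j})")
    rhs = " + ".join(terms)
    return "\n".join([
        f"do j = jlo+{reach}, jhi-{reach}",  # j outermost: Fortran arrays are column-major
        f"  do i = ilo+{reach}, ihi-{reach}",
        f"    {target}(i,j) = {rhs}",
        "  end do",
        "end do",
    ])


laplacian = {(0, 0): -4.0, (1, 0): 1.0, (-1, 0): 1.0, (0, 1): 1.0, (0, -1): 1.0}
print(fortran_stencil_loop("lap", "f", laplacian))
```

The generated text could be produced offline and pasted into an existing Fortran routine, so the rest of the model would not need to change.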
Alfred Differ
July 15, 2010 at 2:04 am

Don’t forget the cadre of folks who trained in science but work as software engineers. We may have trained in the wrong particulars, but the science isn’t foreign to us. The most experienced should be able to straddle the fence and function as analysts long enough to understand the requirements.
Nick Barnes
July 15, 2010 at 6:05 am

William is right on all these points, I think. Except maybe his suggestion of a new domain-specific language, which is usually a fool’s errand.
William M. Connolley
July 15, 2010 at 8:02 am

Actually, I was going to back off a little, since I decided I might be somewhat over-generalising from my own personal experience.

On the domain-specific language: OK, since I *do* think this is a good idea, but am clearly not being convincing:

The idea is not to write the whole model in the new language, but only to use it for various “plug-ins”, possibly pre-processed automatically, possibly just generated offline and stuffed in by hand.

What I’m thinking of is that an awful lot of GCM code looks like:

for i=1,n
  for j=1,m
    do f(i,j) related stuff
  end
end

Or at least, it used to. In the days of domain-decomposition, it looks more like

for i=something,somethingelse
  for j=morecomplexstuff,yetmorecomplexstuff
    do f(i,j) thingies
  end
end

where the loop boundaries can depend on your difference template and possibly other stuff I’ve forgotten, and which is not too hard to get wrong.
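To see why these bounds are easy to get wrong, here is a minimal Python sketch. It assumes a 1-D block decomposition in which values belonging to neighbouring ranks arrive through a halo exchange handled elsewhere. The function names are hypothetical. A stencil reaching h points either side can only be applied at least h points away from the global boundary, so each rank loops over the intersection of the points it owns with that interior.

```python
def block_range(n_global: int, nprocs: int, rank: int) -> tuple:
    """Half-open range [start, stop) of global indices owned by `rank` in a
    simple 1-D block decomposition; leftover points go to the first ranks."""
    base, extra = divmod(n_global, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def stencil_loop_range(n_global: int, nprocs: int, rank: int, halo: int) -> tuple:
    """Global indices this rank may update with a stencil of half-width `halo`.

    Points closer than `halo` to the global boundary lack a full stencil and
    are excluded (a boundary condition would set them instead). Values owned
    by neighbouring ranks are assumed to be available via halo exchange."""
    start, stop = block_range(n_global, nprocs, rank)
    lo = max(start, halo)
    hi = min(stop, n_global - halo)
    return lo, max(lo, hi)  # empty range if the whole block is boundary


for rank in range(3):
    print(rank, stencil_loop_range(10, 3, rank, halo=1))
```

With 10 points on 3 ranks and a half-width of 1, the ranks loop over [1, 4), [4, 7) and [7, 9). Widening the stencil to a half-width of 2 changes the first and last ranges to [2, 4) and [7, 8), while the middle one is unchanged.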

A language that allowed you to say:

invoke_todays_decomposition_on( f(i,j) )

might well help.

You can go further, because the standard f(i,j) is something like f(i,j)-f(i+1,j)-… – you know, the standard finite difference stuff. If you allow your new language to understand delta(f) then this (a) gets rid of yet more tedious coding and errors and (b) means that the loop-decomposition part of the language knows what order your FD scheme is, so it can adjust the loops accordingly. Great, eh?
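The delta(f) idea can be sketched without a new language. If the difference operator is represented as data, its reach, and hence the loop bounds, can be computed instead of hand-coded. This is a minimal Python sketch that assumes unit grid spacing and plain nested lists. Stencil and apply_stencil are hypothetical names, not part of any existing model or library.

```python
from dataclasses import dataclass


@dataclass
class Stencil:
    """A finite-difference operator stored as {(di, dj): coefficient}."""
    weights: dict

    @property
    def half_width(self) -> int:
        # How far the operator reaches from the centre point in any direction
        return max(max(abs(di), abs(dj)) for di, dj in self.weights)


# 5-point Laplacian with grid spacing taken as 1
LAPLACIAN = Stencil({(0, 0): -4.0, (1, 0): 1.0, (-1, 0): 1.0,
                     (0, 1): 1.0, (0, -1): 1.0})


def apply_stencil(stencil: Stencil, f: list) -> list:
    """Apply `stencil` at every point where it has full support.

    The loop bounds come from stencil.half_width, so a wider operator
    shrinks them automatically. Points without full support stay 0.0."""
    h = stencil.half_width
    ni, nj = len(f), len(f[0])
    out = [[0.0] * nj for _ in range(ni)]
    for i in range(h, ni - h):
        for j in range(h, nj - h):
            out[i][j] = sum(c * f[i + di][j + dj]
                            for (di, dj), c in stencil.weights.items())
    return out


f = [[float(i * i + j * j) for j in range(6)] for i in range(6)]
print(apply_stencil(LAPLACIAN, f)[2][3])
```

For f(i, j) = i² + j², the 5-point operator gives 4.0 at every interior point. Swapping in a fourth-order operator with half-width 2 would shrink the loops without any change to apply_stencil.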
Nick Barnes
July 15, 2010 at 9:12 am

We already have languages which do that sort of thing, of course. Like APL, and J….
Nick Barnes
July 15, 2010 at 9:19 am

Oh, and of course in Python you can write:
todays_decomposition(f, differencing=True)

in Lisp:
(todays-decomposition f :differencing t)

and so on in a lot of languages. Probably even in current Fortran.
Phil Bentley
July 15, 2010 at 4:35 pm

Speaking from an insider’s perspective (5 years at an international climate research centre, PhD in Earth Science, 25+ years of software development experience in private and public sectors) it’s my belief that the difficulties encountered developing robust scientific software systems stem not from an absence of suitable computer languages (there are plenty), nor from a lack of technically proficient people (there are enough, if perhaps not a plenitude), but rather from the constraints arising from time-limited projects, which in turn are a direct consequence of the short-term funding regimes imposed by national science bodies and the inevitable cycle of political administrations.

The difficult software engineering challenges that I see in the climate science arena are, IMHO, going to take 5, 10, 15 or more years to address. But most of the projects that I’ve witnessed or participated in over the past 5 years have only been 1 to 3 years in duration: they’re merely chipping away at the edges (which BTW is in no way intended to denigrate the enormous – and frequently unheralded – efforts of those folks working on such projects).

In fact these small projects often make the situation worse because, instead of solving the fundamental engineering problems, for the most part they just add yet another patina of complexity to the existing edifice!

So sometimes we say we’d really like to stop any further work on System X and start over on a shiny new System Y. But we also know that’s a dangerous game because, among other factors, i) there are too many mission-critical services depending on System X, which sucks up all our resources; and ii) there are some things you should just never do!

We’re still battling with this conundrum. But it’s one that clearly we need to solve. And probably at the international level at that.
Nick Barnes
July 16, 2010 at 5:22 am

“Second System Effect”.
Greg Wilson
July 16, 2010 at 8:51 pm

After putting 300+ students through Software Carpentry and similar courses over 13 years, I’m convinced that teaching a scientist about software engineering is a *lot* easier than teaching a computer scientist enough about fluid dynamics or genomics to understand what the code is actually doing — at a guess, the former takes 1/10th as long. If people trusted money to be there, I could see “science then coding” becoming more of a recognized (and rewarded) career path. Unfortunately, as per http://www.miller-mccune.com/science/the-real-science-gap-16191/, that’s a very big “if”.
Nick Barnes
July 18, 2010 at 10:50 am

I entirely agree with Greg Wilson, if we can take as read that “teaching a scientist about software engineering” in this context does not mean “training a scientist to read, understand, and contribute to TOSEM”, or even “training a scientist to professional levels of expertise in software development”, but simply “training a scientist to produce far more maintainable, flexible, and reusable software, with far lower defect rates”. Either the first or second of these skills takes years for a scientist to develop, just as it does for a non-scientist. But the third can certainly be achieved in weeks or months (again, for either a scientist or for a non-scientist).
Nick Savage
July 31, 2010 at 6:22 am

I agree with Nick Barnes that it would be very helpful to promote “training a scientist to produce far more maintainable, flexible, and reusable software, with far lower defect rates”. However, this is necessary but not sufficient. I often hear the complaint “we don’t have time to rewrite this bit of code, I know it’s bad at the moment, but I have all these papers to finish and we have promised the funding body this new bit of science so we have to add some other bit of code”.

To my mind, a good analogy is with keeping your laboratory clean – no lecturer would accept the excuse of a postdoc that they don’t have time to keep the lab tidy. So why is code different? I think there are 2 main reasons – visibility and time to impact on the individual. A dirty lab is very obvious to the management. Dirty code less so – management rarely look at code, they look at scientific results. The other factor to my mind that makes the difference is who is impacted. If you leave glassware lying around etc. then you will probably damage your chance of getting the right results. Alternatively, you may be sharing the lab with other people who are impacted straight away. With software it is easy to create code that works for what you want to do now and gives you the answers you need, but is horrible to understand or maintain. However, it may be someone working at a different place, or someone in several years’ time, who has to sort out the mess you made (or too often code around it…). You may no longer be working on that code, so there is no comeback on you. Conversely, even a relatively simple fix such as improving how informative an error message is may only help you once a year, but if it saves everyone using the code a couple of hours once a year, that is a major amount of time for doing better science freed up.

So what is the solution? I think it has to involve some aspect of better informed peer review of code but we also need management to explicitly recognise quality of coding as part of the career development of scientists. This is especially hard at universities where the funding tends to be for 2-3 year projects, very tied to specific, often too challenging scientific objectives, and frequently people leave at the end of one or two contracts but should be easier at the larger climate and research centres. It does also need the re-education of funders though to understand that they can’t just pay for science and expect to get the model thrown in for free.
Pingback: I never said that! | Serendipity
Pingback: You can’t delegate ill-defined problems to software engineers | Serendipity
