I had lunch last week with Gerhard Fischer at the University of Colorado. Gerhard is director of the Center for Lifelong Learning and Design, and his work focusses on technologies that help people to learn and design solutions to suit their own needs. We talked a lot about meta-design, especially how you create tools that help domain experts (who are not necessarily software experts) to design their own software solutions.

I was describing some of my observations about why climate scientists prefer to write their own code rather than delegating it to software professionals, when Gerhard put it into words brilliantly. He said “You can’t delegate ill-defined problems to software engineers”. And that’s the nub of it. Much (but not all) of the work of building a global climate model is an ill-defined problem. We don’t know at the outset what should go into the model, which processes are important, how to simulate complex physical, chemical and biological processes and their interactions. We don’t know what’s computationally feasible (until we try it). We don’t know what will be scientifically useful. So we can’t write a specification, nor explain the requirements to someone who doesn’t have a high level of domain expertise. The only way forward is to actively engage in the process of building a little, experimenting with it, reflecting on the lessons learnt, and then modifying and iterating.

So the process of building a climate model is a loop of build-explore-learn-build. If you put people into that loop who don’t have the necessary understanding of the science being done with the models, then you slow things down. And as the climate scientists (mostly) have the necessary technical skills, it’s quicker and easier for them to write their own code than to explain to a software engineer what is needed. But there’s a trade-off: the exploratory loop can be traversed quickly, but the resulting code might not be very robust or modifiable. Just as in agile software practices, the aim is to build something that works first, and worry about elegant design later. And that ‘later’ might never come, as the next scientific question is nearly always more alluring than a re-design. Which means the main role for software engineers in the process is to do cleanup operations. Several of the software people I’ve interviewed in the last few months at climate modeling labs described their role as mopping up after the parade (and some of them used more colourful terms than that).

The term meta-design is helpful here, because it specifically addresses the question of how to put better design tools directly into the hands of the climate scientists. Modeling frameworks fit into this space, as do domain-specific languages. But I’m convinced that there’s a lot more scope for tools that raise the level of abstraction, so that modelers can work directly with meaningful building blocks rather than lines of Fortran. And there’s another problem. Meta-design is hard. Too often it produces tools that just don’t do what the target users want. If we’re really going to put better tools into the hands of climate modelers, then we need a new kind of expertise to build such tools: a community of meta-designers who have both the software expertise and the domain expertise in earth sciences.
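To give a flavour of what those building blocks might look like, here is a minimal sketch in Python. It is purely illustrative – every class and method name below is invented, and real coupling frameworks (such as ESMF) are far richer – but it shows the idea: the modeler declares components and the fields they exchange, and never writes a grid loop.

    class Component:
        """A model component with named input and output fields."""
        def __init__(self, name, inputs, outputs):
            self.name, self.inputs, self.outputs = name, inputs, outputs

    class Model:
        """Composes components and wires their fields together."""
        def __init__(self):
            self.components, self.couplings = [], []
        def add(self, component):
            self.components.append(component)
            return self
        def couple(self, src, field, dst):
            # e.g. hand the sea-surface temperature from ocean to atmosphere
            self.couplings.append((src.name, field, dst.name))
            return self

    # The modeler's "program" is a declaration of the science, not a loop nest:
    atmosphere = Component("atmosphere", inputs=["sst"], outputs=["wind_stress"])
    ocean = Component("ocean", inputs=["wind_stress"], outputs=["sst"])

    model = (Model()
             .add(atmosphere).add(ocean)
             .couple(ocean, "sst", atmosphere)
             .couple(atmosphere, "wind_stress", ocean))

The point is not this particular API, but that the units of composition match the units of scientific thought.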

Which brings me to another issue that came up in the discussion. Gerhard provided me with a picture that helps me explain the issue better (I hope he doesn’t mind me reproducing it here; it comes from his talk “Meta-Design and Social Creativity”, given at IEMC 2007):

To create reflective design communities, the software professionals need to acquire some domain expertise, and the domain experts need to acquire some software expertise (diagram by Gerhard Fischer)

Clearly, collaboration between software experts and climate scientists is likely to work much better if each acquires a little of the other’s expertise, if only to enable them to share some vocabulary to talk about the problems. It reduces the distance between them.

At climate modeling labs, I’ve met a number of both kinds of people – i.e. climate scientists who have acquired good software knowledge, and software professionals who have acquired good climate science knowledge. But it seems to me that for climate modeling, one of these transitions is much easier than the other. It seems to be easier for climate scientists to acquire good software skills than it is for software professionals (with no prior background in the earth sciences) to acquire good climate science domain knowledge. That’s not to say it’s impossible, as I have met a few people who have followed this path (but they are rare). It seems to require many years of dedicated work. And there appears to be a big disincentive for many software professionals, as it turns them from generalists into specialists. If you dedicate several years to developing the necessary domain expertise in climate modeling, it probably means you’re committing the rest of your career to working in this space. But the pay is lousy, the programming language of choice is uncool, and mostly you’ll be expected to clean up after the parade rather than star in it.

13 Comments

  1. You’ve probably read her work, but Bonnie Nardi deals/dealt with similar topics. I particularly liked the idea of a local guru for a team: a person who has a foot in both the software and the domain-specific camps. I’m not sure we need to insist that ALL team members know software engineering principles.

    e.g.
    http://portal.acm.org/citation.cfm?doid=142750.142767

  2. Pingback: Tweets that mention You can’t delegate ill-defined problems to software engineers | Serendipity -- Topsy.com

  3. Hi Steve,

    IMHO, a very good posting. I’m sure many of my friends would agree this is a very clear explanation of why the most appropriate software development methodology for complex problem domains is (and always has been!) an “agile” methodology.

    Your examples of two complex problem domains are perfect. Climate scientists do not definitively understand the climate. Climate modelers do not definitively understand climate modeling. The problem domains are both too complex. Therefore, no definitive set of requirements can be developed for either, much less integrated. We discover/refine requirements as we go. Therefore, there can be no traditional waterfall approach to such software development efforts.

    Of course, this means IV&V of such software must take an agile approach too. I am wondering how this might relate to Dr. Curry’s recent postings on climate uncertainty at Climate Etc.

    Also, having both an advanced degree in general engineering science and extensive software engineering experience, I can add my own anecdotal evidence that it is easier for a scientist or engineer to learn enough programming than it is for a programmer to learn enough science or engineering.

    Don’t get me wrong, both are vitally important. The scientist deals with the complexity of nature, the programmer deals with the nature of complexity.

  4. George: Thanks, good points! (I especially love your last sentence – a very quotable quote).

    I’ve read some of the discussions at Judith Curry’s blog on uncertainty, and I think she’s completely misunderstood how the modeling community handles uncertainty. One day I’ll get time to write a rebuttal. Michael Tobis pulled her logic apart, but I’m more interested in using the software practices of the modellers to demonstrate that the modellers have a far more sophisticated approach to uncertainty than Curry makes out.

  5. Please excuse a comment from this befuddled old BASIC programmer:
    Suppose we had a middle layer made up of pseudo-coders who wrangle the models, comment the code, and represent both the domain experts and the software engineers – couldn’t this act as a single repository that allows more open modeling, and also guide the code experts towards a common ground of work? That way, areas of scientific controversy and change could be well defined and explicitly built into the design. For instance, all climate models could be made into a superset of all climate models.
    Forgive this simplistic view. This is a great site, and you are facing the most important aspect of global warming science. And whatever you do, please hurry.

  6. Hi there,

    Agile methods:
    How many man-years go into a state-of-the-art 3D global climate model? How about code sharing between different climate science groups (modularity)?

    My educated guess is that the answers to both questions are indicators that the agile method may fail, if by “agile method” you mean an iterative approach on the scale of e.g. SCRUM (one developer has to be able to implement a complete feature in two months).

    Scientists and education in software engineering:
    If it should turn out that it is indeed easier for people in earth science to learn enough about software engineering than it is for software engineers to learn enough about earth sciences, shouldn’t there at least be dedicated classes in software engineering for active scientists? (Maybe there already are, but I’ve been out of academia for a while now and wouldn’t know :-)).

  7. Tim: yes, these efforts are bigger (maybe by an order of magnitude) than a typical SCRUM team. Here’s my best guess at the numbers:
    http://www.easterbrook.ca/steve/?p=1906
    I’ve characterized them as the biggest agile projects I’ve ever seen. My challenge (as a research question) is to explain why they succeed, when conventional wisdom about agile practices says you can’t scale them up to such big teams.

    On your last question, there are very few such courses out there. Greg’s Software Carpentry is a notable exception, and Greg has taught it at several major climate modeling labs this year. But we definitely need more.

  8. I should have mentioned something of importance in my last comment. But I overlooked it. Something, for example, Dan Hughs has written about on several occasions. Something he calls the “User Effect”.

    For complex engineering analysis problems (for example, most any practical CFD problem) different users can model the same problem with the same software and get different answers. This happens a lot with more inexperienced engineers. Often the answers are actually inconsistent with each other. This is the User Effect. The explanation for the User Effect is that it takes considerable expertise to use such software for such problems. Different people with different skill levels will tend to get different answers.

    I don’t see any way, offhand, where software engineers can help with this problem. For example, it does not appear entirely appropriate to try and make such software “user friendly,” since this all but extends an invitation to the User Effect. In fact, software engineers themselves would, in this case, be “sophomore” engineers (at best) and thus be the most likely to suffer badly from the User Effect.

    Then again, IMHO, because of the extreme complexity of climate modeling, everybody is a sophomore in understanding how to interpret climate model output. Perhaps the User Effect is just something we have to live with for now?
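    A toy illustration of the User Effect (nothing to do with any real CFD code – the system and the numbers here are just for show): two users run the same program on the same problem, make different but equally plausible choices of time step, and get inconsistent answers, because the underlying dynamics are chaotic.

        import numpy as np

        # Lorenz-63, a classic chaotic system, stepped with forward Euler.
        def step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
            x, y, z = state
            return state + dt * np.array([sigma * (y - x),
                                          x * (rho - z) - y,
                                          x * y - beta * z])

        def run(dt, t_end=20.0):
            state = np.array([1.0, 1.0, 1.0])
            for _ in range(int(t_end / dt)):
                state = step(state, dt)
            return state

        print(run(dt=0.005))  # "user A"
        print(run(dt=0.002))  # "user B" – a noticeably different endpoint

    Neither user is wrong in any obvious way; the difference between their answers is pure User Effect.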

  9. @George Crews Sorry, typo. That’s Dan Hughes.

  10. @Tim van Beek

    The quick answers are: a lot (see Steve’s earlier post about climate models as large efforts), and not so much, but more than you might think. Turbulence schemes, radiation schemes, and sea ice modules are probably the most commonly transported. Model dynamical cores might be the least.

  11. I suspect that you’ll find a lot of former science types among the software engineers. My background is physics, so while I would have to learn the Earth science details, the math wouldn’t scare me off. You might have to be picky in the labor market, but that isn’t unusual.

    It seems to me the job positions being described are those of the two analysts. Some software engineers are better at that type of task than others, so you might be facing another hiring filter.

    Also, I’ve yet to have a customer that could describe what they needed to me. It has ALWAYS come down to building something, putting it in front of them and finding out if that’s what they really needed… and then rebuilding it if it wasn’t. All the engineering work done before helps improve the odds that we get it right. An excellent analyst has domain experience and is a huge asset for the team.

  12. @Alfred Differ
    I’d second that. I have a Diplom (roughly the same as a master’s) in theoretical physics and work as a software developer for custom software at a rather big company. The kind of custom software we work on is usually designed for a very special domain, which at least the lead developers have to understand, sometimes at the same level as the experts. (You know, the fun aspect of this kind of work is that you get to talk to different experts working for your customer, and often find out that there are quite a lot of inconsistencies in their worldview :-)).

  13. Well, I would hardly argue against this, given talks I used to give in the late 1970s, i.e., about agile programming before it got called that, emphasizing tool-building and reusable components to raise the level of the work.
    A later retrospective on that was “Languages, Levels, Libraries and Longevity”.

    But I would disagree slightly in a few ways:

    a) When starting a new project, it’s well worth having someone who really knows the relevant toolsets and their appropriateness, and can get them set up so the scientists and engineers can use them.

    b) It is really good to have someone who understands enough about the interactions of computer architectures, algorithms and data structures to help get the latter structured early, to avoid performance catastrophes. Many good software engineers are *not* performance-analysis specialists, sadly.

    Of course, people often pick these up from outside; I’ve certainly known scientists who were very good at the latter.

    Back at SGI, in the 1990s, we had some terrific systems engineers who worked with scientists on code tuning for cache-based multiprocessors, and they could sometimes get order-of-magnitude performance improvements from fairly small changes.

    Of course, if someone had a key data structure derived from a non-cached machine that:
    a) was really too wired-in to change
    b) caused almost every memory reference to be a 100-clock cache miss

    it’s hard to do much with it. Of course, some benchmarks are written that way on purpose.
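    To make the cache-miss point concrete, here is a minimal sketch in Python/NumPy (purely illustrative, not taken from any real model): the same update, applied along the wrong axis of a row-major array, turns unit-stride memory accesses into strided ones that waste almost every cache line fetched.

        import time
        import numpy as np

        # A 2D field in row-major order (NumPy's default, as in C;
        # Fortran arrays are column-major, so the roles reverse there).
        field = np.zeros((4000, 4000))

        t0 = time.perf_counter()
        for i in range(field.shape[0]):
            field[i, :] += 1.0  # contiguous: walks memory in storage order
        t1 = time.perf_counter()
        for j in range(field.shape[1]):
            field[:, j] += 1.0  # strided: one useful element per line fetched
        t2 = time.perf_counter()

        print(f"row-wise:    {t1 - t0:.2f}s")
        print(f"column-wise: {t2 - t1:.2f}s")  # typically several times slower

    And when the layout is wired in, as in case (a) above, no amount of compiler cleverness recovers the lost locality – the fix has to happen in the data structure.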

  14. What happens is that the scientists become experts in the software tools they use – just as good, if not better, than the software specialists themselves. They build a library, and they are the experts in that API. This is just one part of the technical practicality of the experimentation work. Time would probably be a more crucial factor than funding.

    In fact this is done: you have software engineers whose role is applied comp sci, not a specialist coder for hire. I was one of them. I worked for a building research organisation for 8 years, where we developed (outsourced) software tools (new languages) for fast coding of code-based building regulations and other energy-efficiency codes (e.g. ALF).

  15. Pingback: Do Climate Models need Independent Verification and Validation? | Serendipity
