This post contains lots of questions and few answers. Feel free to use the comment thread to help me answer them!
Just about every scientist I know would agree that being more open is a good thing. But in practice, being fully open is fraught with difficulty, and most scientists fall a long way short of the ideal. We’ve discussed some of the challenges for openness in computational sciences before: the problem of hostile people deliberately misinterpreting or misusing anything you release, the problem of ontological drift, so that even honest collaborators won’t necessarily interpret what you release in the way you intended. And for software, all the extra effort it takes to make code ready for release, and the fact that there is no reward system in place for those who put in such effort.
Community building is a crucial success factor for open source software (and presumably, by extension, for open science). The vast majority of open source projects never build a community, so, while we often think of the major successes of open source (after all, that’s how the internet was built), these successes are vastly outnumbered by the detritus of open source projects that never took off.
Meanwhile, any lack of openness (whether real or perceived) is a stick by which to beat climate scientists, and those wielding this stick remain clueless about the technical and institutional challenges of achieving openness.
Where am I going with this? Well, I mentioned the Climate Code Foundation a while back, and I’m delighted to be serving as a member of the advisory board. We had the first meeting of the advisory board meeting in the fall (in the open), and talked at length about organisational and funding issues, and how to get the foundation off the ground. But we didn’t get much time to brainstorm ideas for new modes of operation – for what else the foundation can do.
The foundation does have a long list of existing initiatives, and a couple of major successes, most notably, a re-implementation of GISTEMP as open source Python, which helped to validate the original GISTEMP work, and provide an open platform for new research. Moving forward, things to be done include:
- outreach, lobbying, etc. to spread the message about the benefits of open source climate code;
- more open source re-implementations of existing code (building on the success of ccc-GISTEMP);
- directories / repositories of open source codes;
- advice – e.g. white papers offering guidance to scientists on how to release their code, benefits, risks, licensing models, pitfalls to avoid, tools and resources;
- training – e.g. workshops, tutorials etc at scientific conferences;
- support – e.g. code sprints, code reviews, process critiques, etc.
All of which are good ideas. But I feel there’s a bit of a chicken-and-egg problem here. Once the foundation is well-known and respected, people will want all of the above, and will seek out the foundation for these things. But the climate science community is relatively conservative, and doesn’t talk much about software, coding practices, sharing code, etc in any systematic way.
To convince people, we need some high profile demonstration projects. Each such project should showcase a particular type of climate software, or a particular route to making it open source, and offer lessons learnt, especially on how to overcome some of the challenges I described above. I think such demonstration projects are likely to be relatively easy (!?) to find among the smaller data analysis tools (ccc-GISTEMP is only a few thousand lines of code).
But I can’t help but feel the biggest impact is likely to come with the GCMs. Here, it’s not clear yet what CCF can offer. Some of the GCMs are already open source, in the sense that the code is available free on the web, at least to those willing to sign a basic license agreement. But simply being available isn’t the same as being a fully fledged open source project. Contributions to the code are tightly controlled by the modeling centres, because they have to be – the models are so complex, and run in so many different configurations, that deep expertise is needed to successfully contribute to the code and test the results. So although some centres have developed broad communities of users of their models, there is very little in the way of broader communities of code contributors. And one of the key benefits of open source code is definitely missing: the code is not designed for understandability.
So where do we start? What can an organisation like the Climate Code Foundation offer to the GCM community? Are there pieces of the code that are ripe for re-implementation as clear code? Even better, are there pieces that an open source re-implementation could be useful to many different modeling centres (rather than just one)? And would such re-implementations have to come from within the existing GCM community (as is the case with all the code at the moment), or could outsiders accomplish this? Is re-implementation even the right approach for tackling the GCMs?
I should mention that there are already several examples of shared, open source projects in the GCM community, typically concerned with infrastructure code: couplers (e.g. OASIS) and frameworks (e.g. the ESMF). Such projects arose when people from different modelling labs got together and realized they could benefit from a joint software development project. Is this the right approach for opening up more of the GCMs? And if so, how can we replicate these kinds of projects more widely? And, again, how can the Climate Code Foundation help?