As a follow-on from yesterday’s post on making climate software open source, I’d like to pick up on the oft-repeated slogan “Many eyeballs make all bugs shallow”. This is sometimes referred to as Linus’ Law (after Linus Torvalds, creator of Linux), although this phrase is actually attributed to Eric Raymond (Torvalds would prefer “Linus’s Law” to be something completely different). Judging from the number of times this slogan is repeated in the blogosphere, there must be lots of very credulous people out there. (Where are the real skeptics when you need them?)

Robert Glass tears this one apart as a myth in his book “Facts and Fallacies of Software Engineering”, on the basis of three points: it’s self-evidently not true (the depth of a bug has nothing to do with how many people are looking for it); there’s plenty of empirical evidence that the utility of adding additional reviewers to a review team tails off very quickly after around 3-4 reviewers; and finally there is no empirical evidence that open source software is less buggy than its alternatives.

More interestingly, companies like Coverity, who specialize in static analysis tools, love to run their tools over open source software and boast about the number of bugs they find (it shows off what their tools can do). For example, their 2009 study found 38,453 bugs in 60 million lines of source code (a bug density of about 0.64 defects/KLOC). Quite clearly, there are many types of bugs that you need automated tools to find, no matter how many eyeballs have looked at the code.
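For anyone who wants to check the arithmetic behind that density figure, here is a minimal sketch. The function name `defect_density_per_kloc` is my own invention for illustration; the only inputs are the two numbers quoted above.

```python
def defect_density_per_kloc(defects: int, lines_of_code: int) -> float:
    """Return the number of defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


# Figures quoted in the post for the 2009 scan.
density = defect_density_per_kloc(38_453, 60_000_000)
print(f"{density:.2f} defects/KLOC")
```

Note that this is a density of defects the tool flagged, so it says nothing about bugs that static analysis cannot detect.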

Part of the problem is that the “many eyeballs” part isn’t actually true anyway. In a 2005 study of the SourceForge community, Xu et al. found that participation in projects follows the power law well known in social network theory: a few open source projects have a very large number of participants, and a very large number have very few participants. Similarly, a very small number of open source developers participate in lots of projects; the majority participate in just one or two:

SourceForge Project and Developer Community Scale Free Degree Distributions (Figure 7d from Xu et al 2005)

For example, the data shown in these graphs include all developers and active users for about 160,000 SourceForge projects. Of these projects, 25% had only a single person involved (as either developer or user!), and a further 10% had only 2-3 people involved. Clearly, a significant number of open source projects never manage to build a community of any size.
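To get a feel for what a power-law distribution of project sizes implies, here is a rough sketch. Everything in it is an assumption for illustration: the exponent `ALPHA`, the size cap `K_MAX`, and the function `power_law_pmf` are hypothetical and are not fitted to the Xu et al. data, so the shares it produces will not match the percentages quoted above.

```python
def power_law_pmf(alpha: float, k_max: int) -> list[float]:
    """P(k) proportional to k**-alpha for k = 1..k_max, normalised to sum to 1."""
    weights = [k ** -alpha for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


ALPHA = 2.0   # hypothetical exponent, NOT fitted to the SourceForge data
K_MAX = 1000  # hypothetical cap on the number of participants per project

pmf = power_law_pmf(ALPHA, K_MAX)     # pmf[k - 1] is the share of projects of size k
share_single = pmf[0]                 # projects with exactly one participant
share_small = sum(pmf[:3])            # projects with three or fewer participants
share_large = sum(pmf[99:])           # projects with 100 or more participants

print(f"single-person: {share_single:.1%}")
print(f"three or fewer: {share_small:.1%}")
print(f"100 or more: {share_large:.1%}")
```

Whatever exponent you pick, the shape is the same: the bulk of the probability sits on tiny projects, and large communities are a thin tail.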

This is relevant to the climate science community because many of the tens of thousands of scientists actively pursuing research relevant to our understanding of climate change build software. If all of them release their software as open source, there’s no reason to expect a different distribution from the graphs above. So most of this software will never attract any participants outside the handful of scientists who wrote it, because there simply aren’t enough eyeballs or interest available. The kind of software described in the famous “Harry” files at the CRU is exactly of this nature: if it hadn’t been picked out in the stolen CRU emails, nobody other than “Harry” would ever have taken the time to look at it. And even if lots of people’s attention was drawn to this particular software (as it has been), there are still thousands of other scraps of similar software out there which would also remain single person projects like those on SourceForge. In contrast, a very small number of projects will attract hundreds of developers/users.

The thing is, this is exactly how the climate science community operates already. A small number of projects (like the big GCMs, listed here) already have a large number of developers and users; for example, CCSM and Hadley’s UM have hundreds of active developers, and a very mature review process. Meanwhile a very large number of custom data analysis tools are built by a single person for his/her own use. Declaring all of these projects to be open source will not magically bring “many eyeballs” to bear on them. And indeed, as Cameron Neylon argues, those that do attract attention will immediately have to protect themselves from a large number of clueless newbies by doing exactly what many successful open source projects do: the inner clique closes ranks and refuses to deal with outsiders, ignores questions on the mailing lists, etc. Isn’t that supposed to be the problem we were trying to solve?

The argument that making climate software open source will somehow magically make it higher quality is therefore specious. The big climate models already have many eyeballs, and the small data handling tools will never attract large numbers of eyeballs. So, if any of the people screaming about openness are truly interested in improving software quality, they’ll argue for something that is actually likely to make a difference.


9 Comments

  1. Pingback: Why Opening Up (Probably) Wouldn’t Help « Software Carpentry

  2. Hi Steve,

    To theorize the uniform ineffectiveness of open sourcing climate science data/software based on an empirical power law seems dismissive.

    The value of “open source” software is not entirely captured by the software development meme: “Many eyeballs make all bugs shallow” (MEMABS).

    But to address MEMABS. You give a Wikipedia reference to the more formal definition of MEMABS: “Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix will be obvious to someone.”

    Some people are hoping that: “Given a large enough “alarmist”, skeptical, and “denier” base looking at the climate input/software/output, almost every problem will be characterized quickly and the fix will be obvious to someone.”

    To label such people’s hopes “credulous,” their thinking “magical” or their argument “specious” (which has the connotation of “deceptive”) seems to me harsh.

    Should the Steve McIntyres out there be given a chance? Correct me if I am wrong. Has he found “bugs”? Yes. Is this a positive outcome for climate science? Yes.

    Will such type of efforts help resolve the issues revealed in the CRU emails? Open source philosophy predicts — yes.

    That his efforts so far may not have been “worthwhile” is a value judgment. Values and science are very important — but different. It is futile to argue one using the other.

    George

    [McIntyre is what an open source team would characterize as a persistent clueless newbie. He'd get frozen out of any open source community with the kinds of nonsense he peddles. It's just a pity that people who want to call themselves "skeptics" don't show a bit more skepticism towards McIntyre's antics. Now that's really enough about him - it's not really relevant to the thread. - Steve]

  3. “Many eyeballs make all bugs shallow” is a classic argument of the Open Source movement, but even if you can successfully dispute the veracity of this claim, this does not construct an argument for proprietary software or proprietary computational models.

    Indeed, the Free Software movement does not attempt to make claims about Free Software being “better” according to one’s choice of quality metric, but it does have something to say about giving everyone connected with a piece of software the opportunity to study, improve and pass that software on. The transparency promoted by such a philosophy – based on ethical rather than technical arguments – should also assuage criticism from those who claim to be motivated to review and study such tools and techniques but who also claim to be excluded from doing so, potentially on a political basis or through some form of discrimination.

    With an emphasis on Free Software values, although technical hurdles may remain, the political and social obstacles can be effectively removed. And since these obstacles are the ones which cause most ill-feeling and injustice, feed notions of secrecy and conspiracy, and yet do not contribute to the technical excellence of a piece of work, there is clearly much to be said for their removal, all without seeking some kind of technical justification for doing so at all.

    [I agree there are many potential benefits of open source; I was just restricting myself to the argument that it improves quality. However, I disagree completely that the "political and social obstacles can be effectively removed". No open source community is completely free of politicking and social misbehaviour. In the charged political atmosphere around climate software, I don't see how a truly open source community could possibly work; there's no mutual respect on which to build. - Steve]

  4. Love your rationale, Steve. Training scientists and engineers to write better software is a great idea. I don’t think it will help a lot. Most folks have all they can do to keep up with their field of interest. Quick and dirty programming is the rule for most folks who aren’t professional programmers. Don’t get me wrong, I think formal training in programming is mandatory for most science professionals. On the other hand, programmers should be developing well tested, easily used tools… tools that make documentation easy, produce accurate results, and that cost little or nothing.

    Last, open source software, a lot of extra eyes, training, quality tools can’t help where arrogance is involved. Folks have a right to be angry when confronted with the arrogance exhibited by these guys. It just plays into the hands of all the luddites wanting to ignore obvious climate problems. It won’t make ordinary folk who are afraid of science and scientists feel any better either.

  5. For those commenters who still think McIntyre is engaged in an honest critique, take a look at FactCheck.org’s analysis: http://factcheck.org/2009/12/climategate/

  6. Pingback: Climate Science and Software Quality | Serendipity

  7. Steve,
    I am disappointed to read you seem to think it matters what his motivations are; does his argument have technical merit or no? That should be the only question a professional asks, nothing else is at issue. He (McI) found a mistake in some PCA analysis, the folks involved should thank him for his interest in the work, correct the problem and move on (doing anything else is just fodder for continued buffoonery). Here’s a similar case of ‘meddling amateur treated poorly’:

    22. The concerted efforts by a group of eminent climate scientists to prevent the publication of the Keenan paper had been unsuccessful. However, this was mainly due to the fact that I was prepared to resist peer pressure and to be open-minded regarding Keenan’s evidence and argumentation. I doubt that mainstream science editors would have dared to reject the opposition by leading climate scientists who had targeted an amateur researcher. As Phil Jones fittingly put it to me in an e-mail: ‘How would any journal ever contemplate publishing such a paper?’
    EE Editor Submission to UK Parliament

    The proper response is: Thank the interested amateur for spotting the error, correct it (generally it’s a minor correction that changes very little), and move on with the work. Appealing to another’s bad behavior to excuse one’s own is unseemly. Is it hard to do when you think the other guy is a jerk and a scoundrel? Sure, but right action isn’t determined by difficulty. Even if the particular paper mentioned above turns out to be complete rubbish:

    As for thinking that it is “Better that nothing appear, than something unacceptable to us” …..as though we are the gatekeepers of all that is acceptable in the world of paleoclimatology seems amazingly arrogant. Science moves forward whether we agree with individual articles or not….
    — Raymond S. Bradley, 0924532891.txt

    If all you are saying is that the CRU email thing has very little impact on our understanding of the world, then I think most folks would agree (there might even be some folks in climate science who would argue paleo itself is rather unimportant). If you are saying the affair is unimportant in regards to upholding norms of scientific integrity, then I think plenty of people would (rightly) disagree, and whether any malfeasance has actually occurred is still up for determination.

    (Sorry, this comment turned out a little longer than I intended, but the scientific integrity thing is something I feel pretty strongly about, so in the spirit of open notebooks, I posted. We should take serious things seriously, possibly Dr Jones and the others involved will be exonerated, but we shouldn’t flinch from looking because we dislike the source of correction.)

  8. Josh: the motivation of the attacker matters because scientists often fail to understand the nature of political attacks. Scientists are trained in an environment where everyone they interact with places the highest value on scientific integrity and the quest for truth. When their work is attacked for political reasons, they don’t know what to do. Responding in the same way they would with other scientific colleagues doesn’t work, because a political attacker isn’t interested in truth, they are interested in scoring points in front of an audience. McI plays to his audience and scores points repeatedly on a topic that the scientists understand is long settled. If we can’t separate out the honest quests for the truth from the point-scoring political attacks, and handle each in an appropriate way, then the larger goal of true public understanding of science will forever be undermined.

    I still see nothing but political opportunism in the links you provide – scientists will always work to protect the integrity of the scientific process by working to keep crap out of the journals. I know many many people who think they have been wronged because journals turned down their papers; in every case they deserved it, and in most cases they are too incompetent to understand what’s wrong with their attempts to do research.

    The political point-scoring aims to distort the way the peer-review process works, and present an artificial equality between world class scientists doing honest work and a bunch of political hacks who don’t understand the scientific process, and then cry foul when the scientists treat their efforts with contempt. But if you don’t have the tools to sort out who are the experts here, then the complaints of attempts to distort the peer-review process might sound genuine. Which shows that this line of political attack works – even smart people are taken in by it.

  9. @steve

    IMHO, Josh’s suggestion is just to keep a sharp distinction between scientific theory and the scientific method. We agree that climate science has been attacked for political reasons. But it is climate science theory that is being attacked. For the scientific method, “attacking” theory is an ordinary, ho-hum event.

    But worrisome is when the integrity of the scientific method is compromised for “the larger goal of true public understanding of science.” There is no public understanding of any scientific theory that is worth endangering the confidence the public has in the scientific method.

    But IMHO, for AGW science, it is too late to avoid that. Public confidence in AGW method has already been eroded. The questions now are how to get confidence in the method back (theory will take care of itself) and how to keep confidence in the rest of science.

    One thing for sure, an appeal to the authority of AGW experts will not do the job. (Rely on a logical fallacy?) Instead, we need a very public re-emphasis on the scientific method (including extensive auditing, whether we like the auditors or not).

    I know it’s easy for me to say this. My life’s work has not been personally attacked. And many experts feel the stakes are higher than I realize. I sympathize.

  10. In business, an auditor is someone fully versed in business and accounting — the area they are auditing.

    Why is it that auditors of science should not be (certainly the current self-proclaimed auditors are not) fully versed in the science they claim to be auditing?

    As to climate science, it is the people doing the science who are being attacked. The theory and method are only collateral. People, of course, are easier targets. And, as it is an attack on people, there is nothing that the people themselves can do to affect the attack. Glenn Beck, for example, will be calling for the death of all climate scientists regardless of any ‘audit’ process or anything else you name.

  11. I think it’s a little hysterical to suggest that either scientific theory or the scientific method have been damaged in any way by any of this. Some people are claiming so, but then these are the people who have always been anti-science; their opinion has not shifted one jot. The only substantial thing that has been damaged is the prospects for early action on climate change, in the next year or so. The majority of the general public are somewhat disengaged, and will swing against action on climate change while these campaigns of doubt are effective, and will swing in favour of action when they get exposed to more bits of science, or whenever we get more record temperatures, or reports of growing impacts. For this disengaged majority, their opinion of science in general (and of climate science more specifically) is pretty much unaffected by any of this.

    Once again, the people who are getting incensed that “the integrity of the scientific method is compromised” are exactly those who want to use this meme as a political weapon. To everyone else this is a storm in a teacup.

  12. Pingback: Climate modeling in an open, transparent world | Serendipity
