I’m busy revising our paper on the software development processes at the Hadley Centre for publication in CiSE (yay!). And I just looked again at the graph of code growth (click for bigger version):
The top line (green) shows lines of code, while the bottom line (blue) shows number of files. When I first produced this figure last summer, I was struck by the almost linear growth in lines of code over the fifteen years (with two obvious hiccups when core modules were replaced). Other studies have shown that lines of code is a good proxy for functionality, so I interpret this as steady growth in functionality. That contrasts with Lehman’s observation that for industrial software, an inverse-square curve offers the best fit; his explanation is that the growing complexity of the software inevitably slows the addition of new functionality. He claims this pattern is robust across the other (commercial) systems he has studied (though I haven’t trawled through his papers to see whether he gives more case studies).
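If you want to play with this kind of comparison yourself, here’s a rough sketch in Python. The numbers are made up, standing in for the real data, and the model form is my paraphrase of the usual statement of Lehman’s inverse-square growth (each release adds an increment proportional to 1/size²), so treat it as an illustration rather than anything from our paper:

```python
import numpy as np

# Toy stand-in data: code size (KLOC) at successive releases.
# These numbers are illustrative only -- replace with real measurements.
releases = np.arange(1, 16)
kloc = np.array([ 90, 145, 210, 270, 330, 385, 450, 520,
                 570, 640, 700, 760, 820, 880, 950], dtype=float)

def linear_fit(x, y):
    """Least-squares straight line y ~ a*x + b; returns predictions."""
    a, b = np.polyfit(x, y, 1)
    return a * x + b

def inverse_square_fit(y):
    """Lehman-style model: each release grows by roughly E / size^2.
    Crudely estimate E as the mean of (s_i - s_{i-1}) * s_{i-1}^2,
    then replay the recurrence from the first observed size."""
    E = (np.diff(y) * y[:-1] ** 2).mean()
    pred = [y[0]]
    for _ in range(len(y) - 1):
        pred.append(pred[-1] + E / pred[-1] ** 2)
    return np.array(pred)

def rmse(y, pred):
    return np.sqrt(np.mean((y - pred) ** 2))

print("linear RMSE:        ", rmse(kloc, linear_fit(releases, kloc)))
print("inverse-square RMSE:", rmse(kloc, inverse_square_fit(kloc)))
```

On a roughly linear series like the toy one above (or the one in the graph), the straight line wins comfortably; on the commercial systems Lehman describes, the inverse-square recurrence should come out ahead.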
Subsequently, Godfrey & Tu showed that the Linux kernel did not suffer this limitation, but instead grew slightly faster than linearly (or geometrically if you include the device drivers, which you probably shouldn’t). So that’s two studies that break Lehman’s pattern: the Linux kernel and the Hadley UM. What do they have in common that’s different from the commercial systems Lehman studied? My hypothesis is that in both cases the code is written by the most knowledgeable domain experts, working in a non-hierarchical meritocracy (by which I mean that no one tells them what to work on, and that they get accepted as members of the development team by demonstrating their ability over a period of time). This isn’t a new hypothesis: Dewayne Perry has been saying for ages that the developers’ domain expertise is the single biggest factor in project success.
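For what it’s worth, telling “slightly faster than linear” apart from properly geometric growth is also easy to sketch: fit a power law in log-log space (an exponent near 1 means linear, clearly above 1 means superlinear), and an exponential in semi-log space (that’s what geometric growth looks like). Again, the numbers below are invented, not actual Linux or UM measurements:

```python
import numpy as np

# Hypothetical yearly code sizes (KLOC) -- not real Linux or UM numbers.
years = np.arange(1, 11, dtype=float)
kloc  = np.array([100, 230, 380, 560, 770, 1010, 1280, 1580, 1910, 2270],
                 dtype=float)

# Power-law fit in log-log space: kloc ~ a * years**k.
# k near 1 means linear growth; k well above 1 means superlinear.
k, log_a = np.polyfit(np.log(years), np.log(kloc), 1)

# Exponential fit in semi-log space: kloc ~ c * exp(r * years).
# A good fit here (with r > 0) is what geometric growth would look like.
r, log_c = np.polyfit(years, np.log(kloc), 1)

print(f"power-law exponent k = {k:.2f}")
print(f"exponential rate   r = {r:.2f} per year")
```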
Anyway, my co-author, Tim, was struck by quite a different observation about the graph: the way the two lines diverge over the fifteen years shown. While lines of code have grown relatively fast (nearly tenfold), the number of files has grown much more slowly (only threefold), which means the average file size has steadily grown too. What does this mean? Roughly speaking, new files mean the addition of new modules, while new lines within an existing file mean additional functionality within existing modules (although this is not quite right, as scientific programmers don’t always use separate files for separate architectural modules). So more of the growth comes from adding complexity within existing routines than from expanding the model’s scope. I’m willing to bet a lot of that intra-file growth comes from adding lots of different options for different model configurations.
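To make Tim’s observation concrete: if lines of code grow roughly tenfold while the number of files only triples, the mean file size has to grow by a factor of about three as well (10/3 ≈ 3.3). The little script below is the sort of thing you could run over two snapshots of a source tree to see that divergence for yourself; the directory names and file extensions are placeholders, and it counts raw lines rather than anything cleverer:

```python
import os

def count_loc_and_files(root, exts=(".f90", ".f", ".c")):
    """Walk a source tree, returning (total_lines, file_count).
    Counts raw lines in files with the given extensions."""
    total_lines, file_count = 0, 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, errors="replace") as f:
                    total_lines += sum(1 for _ in f)
                file_count += 1
    return total_lines, file_count

# Placeholder paths for two snapshots of the model source.
for snapshot in ("um_1990_snapshot", "um_2005_snapshot"):
    loc, files = count_loc_and_files(snapshot)
    if files:
        print(f"{snapshot}: {loc} lines in {files} files "
              f"(mean {loc / files:.0f} lines/file)")
```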
I’d like to see software evolution research move away from the static approach of metricizing everything, and toward a focus on value delivered per change. What we should be asking is how well the system meets its requirements, because I think there is a big disconnect between counting LOC and evaluating how well objectives are achieved (granted, measuring requirements satisfaction is complex). There’s a good article by Tom DeMarco on this topic in IEEE Software: http://dx.doi.org/10.1109/MS.2009.101
It would be interesting to see how the complexity of the ‘input decks’ changed over that time too.