I had a bit of a gap in blogging over the last few weeks, as we scrambled to pack up our house (we’re renting it out while we’re away), and then of course, the road trip to Colorado to start the first of my three studies of software development processes at climate modeling centres. This week, I’m at the CCSM workshop, and will post some notes about the workshop in the next few days. But first, a chance for some reflection.

Ten years ago, when I quit NASA, I was offered a faculty position in Toronto with immediate tenure. The offer was too good to turn down: it’s a great department, with a bunch of people I really wanted to work with. I was fed up with the NASA bureaucracy, the short-termism of the annual budget cycle, and (most importantly) a new boss I couldn’t work with. A tenured academic post was the perfect antidote – I could focus on the long-term research problems that interested me most, without anyone telling me what to study.

(Note: Lest any non-academics think this is an easy life, think again. I spend far more time chasing research funding than actually doing research, and I’m in constant competition with an entire community of workaholics with brilliant minds. It’s bloody hard work.)

Tenure is an interesting beast. It’s designed to protect a professor’s independence and ability to pursue long-term research objectives. It also preserves the integrity of academic researchers: if university administrators, politicians, funders, etc. find a particular set of research results inconvenient, they cannot fire, or threaten to fire, the professors responsible. But it’s also limited. While it ought to protect curiosity-driven research from the whims of political fashion, it only protects the professor’s position (and salary), not the research funding needed for equipment, travel, students, etc. But the important thing is that tenure gives the professor the freedom to direct her own research programme, and to decide what research questions to tackle.

Achieving tenure is often a trial by fire, especially at the top universities. After demonstrating your research potential by getting a PhD, you then compete with other PhDs to get a tenure-track position. You have to maintain a sustained research programme over six to seven years as a junior professor, publishing regularly in the top journals in your field, and gaining the attention of the top people in your field, who might be asked to write letters of support for your tenure case. In judging tenure cases, the trajectory and sustainability of the research programme are taken into account – a publication record that appears to be slowing down over the pre-tenure period is a big problem, and if you have several papers in a row rejected, especially towards the end of the pre-tenure period, it might be hard to put together a strong tenure case. The least risky route is to stick with the same topic you studied in your PhD, where you already have the necessary background and where you presumably have also ‘found’ your community.

The ‘finding your community’ part is crucial. Scientific research is very much a community endeavor; the myth of the lone scientist in the lab is dead wrong. You have to figure out early in your research career which subfield you belong in, and get to know the other researchers in that subfield, in order to have your own research achievements recognized. Moving around between communities, or having research results scattered across different communities might mean there is no-one who is familiar enough with your entire body of research to write you a strong letter of support for tenure.

The problem is, of course, that this system trains professors to pick a subfield and stick with it. It tends to stifle innovation, and means that many professors then just continue to work on the same problems throughout the rest of their careers. There’s a positive side to this: some hard scientific problems really do need decades of study to master. On the other hand, most of the good ideas come from new researchers – especially grad students and postdocs; many distinguished scientists did their best work when they were in their twenties, when they were new to the field, and were willing to try out new approaches and question conventions.

To get the most value out of tenure, professors should really use it to take risks: to change fields, to tackle new problems, and especially to do research that they couldn’t do when they were chasing tenure. A good example is inter-disciplinary research. It’s hard to do work that spans several recognizable disciplines when you’re chasing tenure – you have to get tenure in a single university department, which usually means you have to be well established in a single discipline. Junior researchers interested in inter-disciplinary research are always at a disadvantage compared to their mono-disciplinary colleagues. But once you make tenure, this shouldn’t matter any more.

The problem is that changing your research direction once you’re an established professor is incredibly hard. This was my experience when I decided a few years ago to switch my research from traditional software engineering questions to the issue of climate change. It meant walking away from an established set of research funding sources, and an established research community, and most especially from an established set of collaborative relationships. The latter I think was particularly hard – colleagues with whom I’ve worked closely for many years still assume I’m interested in the same problems that we’ve always worked on (and, in many ways I still am – I’m trained to be interested in them!). I’m continually invited to co-author papers, to review papers and research proposals, to participate in grant proposals, and to join conference committees in my old field. But to give myself the space to do something very different, I’ve had to be hardheaded and say no to nearly all such invitations. It’s hard to do this without also offending people (“what do you mean you’re no longer interested in this work we’ve devoted our careers to?”). And it’s hard to start over, especially as I need to find new sources of funding, and new collaborators.

One of the things I’ve had to think carefully about is how to change research areas without entirely cutting off my previous work. After many years working on the same set of problems, I believe I know a lot about them, and that knowledge and experience ought to be useful. So I’ve tried to carve out a new research area that allows me to apply ideas that I’ve studied before to an entirely new challenge problem – a change of direction if you like, rather than a complete jump. But it’s enough of a change that I’ve had to find a new community to collaborate with. And different venues to publish in.

Personally, I think this is what the tenure system is made for. Tenured professors should make use of the protection that tenure offers to take risks, and to change their research direction from time to time. And most importantly, to take the opportunity to tackle societal grand challenge problems – the big issues where inter-disciplinary research is needed.

And unfortunately, just about everything about the tenure system and the way university departments and scientific communities operate discourages such moves. I’ve been trying to get many of my old colleagues to apply themselves to climate change, as I believe we need many more brains devoted to the problem. But very few of my colleagues are interested in switching direction like this. Tenure should facilitate it, but in practice, the tenure system actively discourages it.

Congratulations to Jorge, who passed the first part of his PhD thesis defense yesterday with flying colours. Jorge’s thesis is based on a whole series of qualitative case studies of different software development teams (links go to the ones he’s already published):

  • 7 successful small companies (under 50 employees) in the Toronto region;
  • 9 scientific software development groups, in an academic environment;
  • 2 studies of large companies (IBM and Microsoft);
  • 1 detailed comparative study of a company using Extreme Programming (XP) versus a similar-sized company that uses a more traditional development process (both building similar types of software for similar customers).

We don’t have anywhere near enough detailed case studies in software engineering – most claims for the effectiveness of various approaches to software development are based on little more than marketing claims and anecdotal evidence. There has been a push in the last decade or so for laboratory experiments, which are usually conducted along the lines of experiments in psychology: recruit a set of subjects, assign them a programming task, and measure the difference in variables like productivity or software quality when half of them are given some new tool or technique. While these experiments are sometimes useful for insights into how individual programmers work on small tasks, they really don’t tell us much about software development in the wild, where, as Parnas puts it, the interesting challenges are in multi-person development of multi-version software over long time scales. Jorge cites a particular example in his thesis of a controlled study of pair programming, which purports to show that pair programming lowers productivity. Except that it shows no such thing – any claimed benefits of pair programming are unlikely to emerge with subjects who are put together for a single day, but who otherwise have no connection with one another, and no shared context (like, for example, a project they are both committed to).

Each of Jorge’s case studies is interesting, but to me, the theory he uses them to develop is even more interesting. He starts by identifying three different traditions in the study of software development:

  • The process view, in which software construction is treated like a production line, and the details of the individuals and teams who do the construction are abstracted away, allowing researchers to talk about processes and process models, which, it is assumed, can be applied in any organizational context to achieve a predictable result. This view is predominant in the SE literature. The problem, of course, is that the experience and skills of individuals and teams do matter, and the focus on processes is a poor way to understand how software development works.
  • The information flow view, in which much of software development is seen as a problem in sharing information across software teams. This view has become popular recently, as it enables the study of electronic repositories of team communications as evidence of interaction patterns across the team, and leads to a set of theories about how well patterns of communication acts match the technical dependencies in the software. The view is appealing because it connects well with what we know about interdependencies within the software, where clean interfaces and information hiding are important. Jorge argues that the problem with this view is that it fails to distinguish between successful and unsuccessful acts of communication. It assumes that communication is all about transmitting and receiving information, and it ignores problems in reconstructing the meaning of a message, which is particularly hard when the recipient is in a remote location, or is reading it months or years later.
  • The third view is that software development is largely about the development of a shared understanding within teams. This view is attractive because it takes seriously the intensive cognitive effort of software construction, and emphasizes the role of coordination, and the way that different forms of communication can impact coordination. It should be no surprise that Jorge and I both prefer this view.

Then comes the most interesting part. Jorge points out that software teams need to develop a shared understanding of goals, plans, status and context, and that four factors strongly impact their success in this:

  • proximity: how close the team members are to each other – being in the same room is much more useful than being in different cities;
  • synchrony: talking to each other in (near) realtime is much more useful than writing documents to be read at some later time;
  • symmetry: the coordination and information sharing is done best by the people whom it most concerns, rather than imposed by, say, managers;
  • maturity: it really helps if a team has an established set of working relationships and a shared culture.

This theory leads to a reconceptualization of many aspects of software development, such as the role of tools, the layout of physical space, the value of documentation, and the impact of growth on software teams. But you’ll have to read the thesis to get the scoop on all these…

A wonderful little news story spread quickly around a number of contrarian climate blogs earlier this week, and of course was then picked up by several major news aggregators: a 4th grader in Beeville, Texas had won the National Science Fair competition with a project entitled “Disproving Global Warming”. Denialists rubbed their hands in glee. Even more deliciously, the panel of judges included Al Gore.

Wait, what? Surely that can’t be right? Now, anyone who considers herself a skeptic would have been immediately, well, skeptical. But apparently that word no longer means what it used to mean. It took a real scientist to ask the critical questions and track down the source of the story: Michael Tobis took the time to drive to Beeville to investigate, as the story made no sense. And sure enough, there’s a letter that’s clearly on fake National Science Foundation letterhead, with no signature, and the NSF has no knowledge of it. Oh, and of course, a quick Google search shows that there is no such thing as a national science fair. Someone faked the whole thing (and the good folks at Reddit then dug up plenty of evidence about who).

So, huge kudos to MT for doing what journalists are supposed to do. And kudos to Sarah Taylor, the journalist who wrote the original story, for doing a full followup once she found out it was a hoax. But this story raises the question: how come, now that we live in such an information-rich age, so few people can be bothered to check the evidence about anything any more? Traditional investigative journalism is almost completely dead. The steady erosion of revenue from print journalism means most newspapers do little more than reprint press releases – most of them no longer retain science correspondents at all. And if traditional journalism isn’t doing investigative reporting any more, who will? Bloggers? Many bloggers like to think of themselves as “citizen journalists”. But few bloggers do anything more than repeat stuff they found on the internet, along with strident opinion on it. As Balbulican puts it: Are You A “Citizen Journalist”, or Just An Asshole?

Oh, and paging all climate denialists. Go take some science courses and learn what skepticism really means.

Short notice, but an interesting talk tomorrow by Balaji of Princeton University and NOAA/GFDL. Balaji is head of the Modeling Systems Group at NOAA/GFDL. The talk is scheduled for 4 p.m., in the Physics building, room MP408.

Climate Computing: Computational, Data, and Scientific Scalability

V. Balaji
Princeton University

Climate modeling, in particular the tantalizing possibility of making projections of climate risks that have predictive skill on timescales of many years, is a principal science driver for high-end computing. It will stretch the boundaries of computing along various axes:

  • resolution, where computing costs scale with the 4th power of problem size along each dimension
  • complexity, as new subsystems are added to comprehensive earth system models with feedbacks
  • capacity, as we build ensembles of simulations to sample uncertainty, both in our knowledge and representation, and of that inherent in the chaotic system. In particular, we are interested in characterizing the “tail” of the pdf (extreme weather) where a lot of climate risk resides.

The challenge probes the limits of current computing in many ways. First, there is the problem of computational scalability, where the community is adapting to an era where computational power increases are dependent on concurrency of computing and no longer on raw clock speed. Second, we increasingly depend on experiments coordinated across many modeling centres which result in petabyte-scale distributed archives. The analysis of results from distributed archives poses the problem of data scalability.

Finally, while climate research is still performed by dedicated research teams, its potential customers are many: energy policy, insurance and re-insurance, and most importantly the study of climate change impacts — on agriculture, migration, international security, public health, air quality, water resources, travel and trade — are all domains where climate models are increasingly seen as tools that could be routinely applied in various contexts. The results of climate research have engendered entire fields of “downstream” science as societies try to grapple with the consequences of climate change. This poses the problem of scientific scalability: how to enable the legions of non-climate scientists, vastly outnumbering the climate research community, to benefit from climate data.

The talk surveys some aspects of current computational climate research as it rises to meet the simultaneous challenges of computational, data and scientific scalability.
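
A quick aside on the resolution bullet in the abstract (this is my own gloss, not part of Balaji’s text): one common way to read the “4th power” is three spatial dimensions plus the timestep, which has to shrink along with the grid spacing to keep the numerics stable. Here’s a minimal back-of-the-envelope sketch of what that scaling implies for compute costs:

```python
# A rough illustration only (my own sketch, not from the talk): assume cost is
# proportional to the number of grid cells times the number of timesteps.
def relative_cost(k: float) -> float:
    """Approximate relative compute cost of refining model resolution by a factor k."""
    grid_cells = k ** 3   # k times more cells in each of three spatial dimensions
    timesteps = k         # finer grids need proportionally shorter timesteps (CFL)
    return grid_cells * timesteps

for k in (2, 4, 10):
    print(f"{k}x finer resolution -> roughly {relative_cost(k):,.0f}x the compute")
# 2x finer -> ~16x, 4x finer -> ~256x, 10x finer -> ~10,000x
```

So a ten-fold increase in resolution means roughly ten thousand times the computing, before you add any new model complexity or start running ensembles.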

Update: Neil blogged a summary of Balaji’s talk.

I thought I wouldn’t blog any more about the CRU emails story, but this one is very close to my heart, so I can’t pass it up. Brian Angliss, over at Scholars and Rogues, has written an excellent piece on the lack of context in the stolen emails, and the reliability of any conclusions that might be based on them. To support his analysis, he quotes extensively from the paper “The Secret Life of Bugs” by Jorge Aranda and Gina Venolia from last year’s ICSE, in which they convincingly demonstrated that electronic records of discussions about software bugs are frequently unreliable, and that there is a big difference between the recorded discussions and what you find when you actually track down the participants and ask them directly.

BTW Jorge will be defending his PhD thesis in a couple of weeks, and it’s full of interesting ideas about how software teams develop a shared understanding of the software they develop, and the implications this has for team organisation. I’ll be mining it for ideas to explore in my own studies of climate modellers later this year…

Take a look at this recent poll from Nanos on priorities for the upcoming G8/G20 meetings. Canadians ranked Global Warming and Economic Recovery as the top two priorities for the meetings, but note that global warming beats economic recovery for the top response across nearly all categories of Canadians (with the exception of the old fogeys, in the 50+ age group, and westerners, who I guess are busy getting rich from the oil sands). Overall, 33.7% of Canadians ranked Global Warming as the top priority, while 27.2% named Economic Recovery.

There are some other interesting results in the poll. In the breakdown by party voting preferences, the Bloc Québécois and the NDP seem much more worried about Global Warming than Green Party supporters: 59.3% of BQ voters and 41.5% of NDP voters ranked it first, while only 33.8% of Green Party voters did. So much for the myth that the Green Party is a single-issue party, eh?

Oh, and if you look at the results of the later questions, Global Warming is clearly the issue on which Canada is perceived to be doing worst in terms of its place in the world.