I spent a little time this afternoon trying to get hold of data. I guess I have high expectations that the web should deliver what I want instantly; in the old days it would have taken a few days in the library to track down the data sets I needed, and then a few weeks waiting for them on inter-library loan. In some respects, things haven’t changed much, although now it just means you hit the paywall faster. Here’s today’s tale…

It began with a post by George Monbiot on how we’ll have to make cities much more dense if we are to cut down their energy needs. George then tweeted about a fabulous graph from the UNEP which illustrates the point nicely:

In which Toronto holds an interesting position compared to other North American cities. Anyway, someone then pointed out that this data is a little old – it’s based on a classic study by Newman and Kenworthy from the 1980s. So now the hunt begins: is there an updated version of this anywhere, and if not, can I get hold of the data to create it?

Luke Devlin tweeted out a newer version, published in 2009, based on data from the UITP Mobility in Cities Database, which has data from around the year 2001:

However, this graph is pretty ugly, and has none of the cities labelled. So, methinks that would be easy to fix – all I need is the data. Unfortunately the database (on CD-ROM – how quaint!) costs €1,200. And I’d have to wait for it to arrive. Surely someone has this online for free? No? After all, I only want to use one indicator…

Okay, so the data hunt is on. Population density data is easy to get hold of – Wikipedia has plenty of it. In exploring this a little, I find some wikified concerns expressed about the original graph, and a whole can of worms about how exactly you compute population density for a city (tl;dr: it depends where you think the city boundaries are).

A little more googling turns up a fascinating 2003 paper “Transport Energy Use and Greenhouse Gases in Urban Passenger Transport Systems: A Study of 84 Global Cities” (by the same Kenworthy), which has a graph of exactly the data I need:

But of course, it points me back at the same UITP dataset for the actual numbers. Darn.

Then there’s a UNEP report dated March 2011, “Technologies for Climate Change Mitigation – Transport Sector”, which uses the same data, but actually does plot the graph I’m after:

It’s a little better than the previous version, but still doesn’t label the individual cities (which one is Toronto??). And of course, although the report is dated 2011, it’s still the same 2001 dataset from UITP.

So where else might I get data like this? A little more googling and I hit what looks like the jackpot: An extensive list of resources on transportation statistics. Unfortunately, the only one that seems to have the transport data by city is the UITP dataset. Back to that paywall again.

In the meantime, I seem to have launched George Monbiot off into an investigation of the academic publishing racket, exploring why the results of publicly funded research are invariably behind a paywall:

I look forward to reading his blog post on that topic. Meanwhile, I’m off to track down someone on campus who might already have the UITP CD-ROM…

Update 4-Jul-2011: Chris Kennedy sent me his 2009 paper in which he did a detailed analysis for 10 cities, with an update of the density vs transport energy consumption curve. He tells me he has the energy data for more cities, but not the density data, as this is very hard to do consistently. Oh, and silly me – I’d already blogged this, together with Chris’s graph last year. Here’s Chris’s graph. He says “The logarithm of urbanized density has a statistically significant fit (t stat = -10.26) against the logarithm of GHG emissions from transportation fuels with an R² of 0.93 (Table 2). The logarithm of average personal income is statistically insignificant (t stat = -0.35).” (p7299)
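For anyone who wants to try this kind of fit once they have numbers of their own, here is a minimal sketch of a log-log regression in plain Python. It is not Chris’s actual method or data: the function name `loglog_fit` is my own, and the density and emissions values in the example are made-up placeholders, there only to show the mechanics of getting a slope, an R² and a t statistic.

```python
import math

def loglog_fit(density, emissions):
    """Ordinary least squares fit of log(emissions) against log(density).

    Returns (slope, intercept, r_squared, t_stat_of_slope).
    """
    xs = [math.log(d) for d in density]
    ys = [math.log(e) for e in emissions]
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 cities to estimate a t statistic")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("density and emissions must each vary across cities")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Sum of squared residuals around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - sse / syy

    # Standard error of the slope, with n - 2 degrees of freedom.
    std_err = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.copysign(math.inf, slope)
    return slope, intercept, r_squared, t_stat

# Placeholder values, NOT real city data: persons per hectare and
# tonnes of CO2e per capita, invented purely for illustration.
example_density = [12.0, 25.0, 40.0, 75.0, 150.0]
example_emissions = [5.1, 3.2, 2.9, 1.8, 1.4]

slope, intercept, r2, t = loglog_fit(example_density, example_emissions)
print(f"slope={slope:.2f}  R^2={r2:.2f}  t={t:.2f}")
```

A negative slope with a large negative t statistic is what the quoted result describes; with only ten or so cities, the t statistic matters at least as much as the R².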

Chris also tells me the IEA report on the world’s energy, due out later this year, will include a chapter on cities, with an update of the graph.


  1. I would never have guessed that Toronto is more dense than New York. What are they using as city boundaries?

  2. I’ll second that. There are, I understand, several definitions of ‘Toronto’, which give populations from ca. 500,000 to ca. 8-9 million. That will certainly affect density computations.

    I’ll also suggest that the conclusion of ‘need to get denser’ is not supported by this data set, regardless of that issue. Australian cities shown using half or less of the energy that US cities do — at lower population density. Toronto being twice the density of Australian cities, yet using more energy. Moscow at maybe 35% the density of Hong Kong but using ~30% of the already very low energy levels of Hong Kong.

    The original curve also bothers me. Almost all the points are above the curve. That suggests the curve was drawn by hand rather than objective data fit. And my own experience working with data says that the line drawn over the data can conflict pretty badly with the data before the eye will reject it.

    Also missing are error bars. While census figures can be fairly good, interpolations are less good and the energy figures are probably not all from the same time as the census. The energy figures themselves almost certainly have some pretty substantial error bars; it’s a lot harder to tell how much energy people use than to count the people in the first place.

    That takes me to the axes. The error bars on the current graph would be vertical (i.e., mainly in the energy usage), and most (by number of cities) of the fit line is also nearly vertical. This makes it almost impossible to reasonably assess goodness of fit by eye. We then have a long flat zone with few cities — more than 2/3rds of the horizontal axis is for densities over 100, yet contains only 3 cities. One thing to do is swap x and y axes. If the relationship is good, it doesn’t matter which you use where, but it can be easier to see what is going on by eye one way than the other. And/or plot against the log of population density. In that case Tokyo to Hong Kong would have the same fraction of the axis as Hamburg to Tokyo. And you wouldn’t have 50% of the cities squished against the left wall in the first 8% of the axis.

    But, what’s already obvious from the curve is that culture is more important than population density. North American cities, regardless of population density, use more energy than any other cities. With only 3 Asian cities represented, and there being … er … lots of asian cities with large populations and densities (and some not as high densities) … again, I’d be hesitant to draw conclusions. (Plus zero cities from South America, Africa, and only Moscow for eastern Europe, none from India, …)

    Back to the data on density — definitions of what they’re talking about will be important. There’s no plausible (to me) definition that is going to make LA more densely populated than Chicago. Chicago being less dense than NYC works. Chicago being markedly less dense than Toronto only works if only paleo-Toronto is used (the ca. 500,000 population core). If the Toronto with population ca. 3.5 million is used, it’s comparable to Chicago city proper (ca. 3 million), as it is if the ca. 8-9 million metro area is used for both.

  3. I would be mostly concerned with confounding variables; Robert mentioned culture. I’d add to that policies, such as building standards, or simply energy costs. From my experience, Europe seems to be stricter when it comes to building standards for energy efficiency. Couple this with the price: a quick check shows the price in BC, Canada is 8.78 cents (CAD) at its peak, while where my parents live in Germany, close to Frankfurt, it is 21.36 cents (EUR). You have similar price differences in natural gas, gasoline, and heating oil. Bottom line: the population is more motivated to invest in energy saving.

  4. The interesting points are the outliers below the curve. What makes Copenhagen and Amsterdam special, given that it isn’t density?

  5. The World Bank may have data you can use. e.g.

    As to Nick’s question “What makes Copenhagen and Amsterdam special, given that it isn’t density?” Maybe it’s climate :-)

  6. I’m willing to bet that none of the original datasets (if I ever get hold of them) contain any information about errors from which I could plot error bars. If I’m lucky there might be some methodology information from which I can infer something about errors…

  7. You could use G3Data to extract the point values:

  8. Steve, no surprise about the error bars. But how much confidence can we place in conclusions based on data whose quality we know nothing about? Plus, of course, the quality is probably variable between different areas.

    “You have to increase population density.” is a non-starting argument in the US. It’s also based, as far as I’ve seen, on assumptions of cultural equivalence that are erroneous. I live in a suburban area of approximately 1000 per square mile. Not sure what that works out to per hectare. Anyhow, well below the 10-15,000 per square mile of Chicago and New York. As an engineering matter, our density could be increased 10-fold fairly easily.

    If the problem with mass transit for the area were not enough customers close enough to terminals (whether that means train or bus stops), then that’s no problem. It is the main assumption, from what I’ve seen, that drives the ‘more density’ argument.

    But that’s wrong. Implicitly assumed, and incorrect for much of the US (most exceptions being older parts — certainly pre-WW II — of older cities) is that you can take care of much of your life by walking to places for routine shopping and transactions, say moderate groceries, ATM, post office, and so forth, for many ordinary needs. The mass transit is, then, just for getting to work and more specialized shopping — where you won’t be making so many trips per day, and they’re more schedulable.

    US post-war norm is that a) there’s nothing within walking distance and b) even if there is, you can’t walk there. A is zoning, B is lack of sidewalks. Neither of these is affected by mass transit being available. I’m in a more or less typical, of the US, area. The nearest place I could buy some milk is about 1 km away. About 400 m of that would involve walking in the midst of traffic — no sidewalk and no shoulder. Nearest post box is 3 km away, and a different 400 m stretch with no sidewalk and no shoulder is between us. Nearest transit stop is past the same impassable stretch. The only thing other than my neighbors’ homes that I could walk safely to is an elementary school (and my kids wouldn’t go there if they lived here). Ditto for bikes.

    Taking walking distance to be 1.5 miles (the distance I walked to high school), adding 800 meters of sidewalk would make available 2 high schools, 3 elementary schools, one or two middle schools, a hospital, two modest grocery stores, a clothes store, 3 restaurants, a nail salon, multiple churches, a bible college, and a major site of employment (ca. 15,000). Post office would still be out of range (3 km) but probably there’s a box somewhere inside that radius.

    Alternately, rezoning so that it was legal to put a small general service store in the vacant lot down the street from me would mean it was within 800 m of all of us in the development.

    Nick: For Amsterdam, I’ll guess bikes. Not sure if Copenhagen is as into bikes.

  9. Re hectares. As we all know, there are 8 furlongs to a mile, and 10 chains to a furlong, and an acre is one chain by one furlong, so there are 640 acres to the square mile. Handy rule of thumb: there are almost exactly 2.5 acres to the hectare: 4 hectares is 10 acres. Thus there are 256 hectares to the square mile (you can also do this by saying 100 x 1.6 x 1.6, but going via acres is more fun). So 1000 per square mile is about 4 per hectare.

    Imperial units are Britain’s revenge for American independence.

    The Danes love bikes, maybe not as much as the Dutch, but still.

    I grew up here: . Density probably something like 8 per hectare, and eminently cycle-able. It’s not density which makes the US so cycle-hostile.

  10. Pingback: Misleading use of Information Visualization | Serendipity

  11. I’m no fan of artificial scarcity imposed on things with very low replication costs but it does look as though Monbiot has rarely if ever looked at a complete copy of a journal specialized much more narrowly than the “Nature” or “Science” breadth, say “JGR” for example. Advertising should pay publication costs? No, never going to happen and it’s not down to incompetence on the publisher’s part.
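The acre-based conversion worked through in comment 9 is easy to check in a few lines of Python. This is only a sketch: the function name and the `rule_of_thumb` flag are mine, and the only facts assumed beyond the comment are that a mile is defined as exactly 1609.344 m and a hectare as 10,000 m².

```python
METRES_PER_MILE = 1609.344        # exact, by definition of the international mile
SQ_METRES_PER_HECTARE = 10_000.0  # exact, by definition

# The comment's route: 8 furlongs x 80 chains = 640 acres per square mile,
# then roughly 2.5 acres to the hectare.
ACRES_PER_SQ_MILE = 8 * 80
ACRES_PER_HECTARE_ROUGH = 2.5

def per_sq_mile_to_per_hectare(people_per_sq_mile, rule_of_thumb=False):
    """Convert a population density from per square mile to per hectare."""
    if rule_of_thumb:
        hectares_per_sq_mile = ACRES_PER_SQ_MILE / ACRES_PER_HECTARE_ROUGH  # 256
    else:
        hectares_per_sq_mile = METRES_PER_MILE ** 2 / SQ_METRES_PER_HECTARE
    return people_per_sq_mile / hectares_per_sq_mile

# Densities mentioned in the thread: a 1000 per square mile suburb, and the
# 10-15,000 per square mile quoted for Chicago and New York.
for d in (1000, 10_000, 15_000):
    exact = per_sq_mile_to_per_hectare(d)
    rough = per_sq_mile_to_per_hectare(d, rule_of_thumb=True)
    print(f"{d:>6} per sq mile = {exact:.1f} per ha (rule of thumb: {rough:.1f})")
```

The rule of thumb comes out about 1% high compared with the exact conversion, which is much smaller than the uncertainty over where a city’s boundaries are drawn.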
