Friday, November 29, 2013

Marginal cost of congestion in Sydney is around $0.7/km

Post synopsis: For every kilometre you drive in Sydney during peak hour, you cost other motorists about $0.66... so if you drive 20km to work, other people are paying for your trip to the tune of about $13. If that sounds OK to you, remember that when you are stuck in traffic, you are paying (in time, extra fuel, etc.) for someone else's decision to drive during peak hour. The best way out of this mess is congestion charging.

Congestion sucks

We all know that congestion is a drag in Sydney, and in most other large developed and developing cities (for fun, check out this congestion in Sao Paulo). A couple of studies have had a stab at estimating the total cost of congestion in Sydney. They found the following:
  • A CIE report estimated congestion costs of ~$0.29 per vehicle-kilometre in 2005. These are total costs (not avoidable/dead-weight costs).
  • BITRE estimates congestion costs of ~$0.10 per vehicle-kilometre in a 2007 working paper (a really good paper, btw). These are avoidable/dead-weight costs[*] -- total costs are around double that, so not so far from the CIE study.
I've reported these results (and others, below) as costs per vehicle-km, because that's the most tangible measure, but if you want the total annual cost, you can multiply the above by the number of vehicle-km travelled in Sydney (42.4 billion in 2005, say) to get total costs of roughly $8-12 billion per year (depending on whether you use the BITRE or CIE total-cost figure).
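Just to make that arithmetic explicit, here it is as a couple of lines of R (my own back-of-envelope, using the figures above and doubling the BITRE avoidable figure to approximate total costs):

#number of vehicle-km travelled in Sydney per year (2005)
vkt = 42.4e9

#costs per vehicle-km
cost_cie = 0.29        #CIE, total costs
cost_bitre = 2 * 0.10  #BITRE avoidable costs, doubled to approximate totals

#total annual costs, in $billion/year
vkt * cost_bitre / 1e9  #~ 8.5
vkt * cost_cie / 1e9    #~ 12.3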


Why average congestion costs are not so useful

Having just told you about a couple of prior studies that give average congestion costs, I have to break it to you that average costs aren't really that useful. Why? Two reasons:

  • Firstly, there is that old problem with averages: your average temperature can be fine if you have your head in the oven and your feet in the freezer....  If you are driving during a relatively quiet period, when congestion is low or non-existent, then you aren't imposing costs on others. If, on the other hand, you are driving during peak hour, then you are going to be imposing very high costs. 
  • Secondly, congestion is 'caused' by those last few thousand cars that are on the network. People refer to this effect as the 'school holiday effect', because you only need to remove a small amount of traffic for congestion to improve. This is telling us that the marginal cost of each additional vehicle is much higher than the average cost across all vehicles.
So what really matters, from a policy point of view, is the marginal cost of congestion: what cost does each additional driver impose on everyone else? Since congestion costs are highest during the peak, in this blog post I'm going to work out a rough estimate of the marginal congestion cost of each additional vehicle during peak hour in Sydney.
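To get a feel for how much higher the marginal cost can be than the average, here is a toy example in R (my own illustration, not from either study above), using the standard BPR volume-delay curve. With that particular curve, the extra delay one additional vehicle imposes on everyone else works out to four times the average excess delay per vehicle, whatever the traffic volume:

#BPR volume-delay curve: t = t0 * (1 + 0.15 * (v/cap)^4)
t0 = 1       #free-flow travel time (minutes per km) -- illustrative only
cap = 2000   #link capacity (vehicles/hour)          -- illustrative only
v = 1900     #current volume (vehicles/hour)

t = t0 * (1 + 0.15 * (v/cap)^4)                 #travel time at volume v
avg_excess = t - t0                             #average excess delay per vehicle
marg_ext = v * t0 * 0.15 * 4 * (v/cap)^3 / cap  #delay imposed on everyone else by
                                                #one more vehicle (v * dt/dv)
marg_ext / avg_excess                           #= 4, for any v and cap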


So what are the marginal costs?

We are interested in the costs imposed on other motorists by each additional driver at peak time. We can work this out, to a reasonable approximation, as follows:

The average traffic speed during the inter-peak period is 42.9 km/h [**], while the average traffic speed during the peak period is 30.0 km/h [***]. This means traffic is 43% faster in the inter-peak period than during the (congested) peak period.

What causes this? Extra traffic on the roads, of course. Looking at the figure below, we can see that the volume of traffic on Sydney roads is ~270,000 vehicles during peak times, but only ~160,000 during the inter-peak.


[Figure: number of vehicles on Sydney roads by time of day. Source: BTS HTS 2011/12 summary report.]

So, an extra 110,000 vehicles on the road network increases trip times by 43%. Spreading that 43% across the 110,000 extra vehicles multiplicatively (1.43^(1/110,000)), each additional vehicle slows down all other traffic by about 0.000325%. So if the vehicle is on the road for a minute, it slows down all other traffic by 0.000325% for that minute. Since there are 270,000 vehicles on the road at peak time, this costs those other motorists about 0.88 minutes of delay. Valuing that at $14/hour (a pretty standard choice for peak travel), that's $0.205 of delay cost per minute of peak travel. If we add in other costs (extra running costs, pollution, etc.), as in the BITRE report, this comes to $0.33/minute. Average peak travel speed is 30 km/h, so converting this to a per-km cost, we get $0.66/km.
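For anyone who wants to check the numbers, here is the whole calculation as a short R script (my own reconstruction of the steps above; the $0.33/minute all-in figure is taken from the BITRE cost components rather than derived here):

#inputs
speed_interpeak = 42.9   #km/h
speed_peak = 30.0        #km/h
extra_vehicles = 110e3   #extra vehicles on the network in the peak
peak_vehicles = 270e3    #vehicles on the network in the peak
vot = 14                 #value of travel time, $/hour

#peak trips take 43% longer than inter-peak trips
time_ratio = speed_interpeak / speed_peak             #1.43

#slowdown caused by each additional vehicle, spread multiplicatively
per_veh_slowdown = time_ratio^(1/extra_vehicles) - 1  #~0.00000325 (0.000325%)

#delay imposed on everyone else per minute of peak driving
delay_per_min = per_veh_slowdown * peak_vehicles      #~0.88 vehicle-minutes

#delay cost per minute of peak driving
delay_cost = delay_per_min * vot / 60                 #~$0.205

#all-in cost per minute (delay + running costs, pollution, etc, from BITRE)
all_in_cost = 0.33

#convert to cost per km at the average peak speed of 30 km/h
all_in_cost * 60 / speed_peak                         #~$0.66/km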

Note that this figure (of $0.66/km) is an average across Sydney for the peak period, and would be much higher in specific (congested) parts of the road network (and lower elsewhere). I haven't done the numbers, but my guess is that it would be a few times higher on congested links, which implies the marginal costs on congested roads might be in the vicinity of $2/km in the peak period.






[*] In the economics of congestion, there is an 'optimal' level of congestion: each additional traveller derives a benefit from making a trip, but there is a cost to other motorists due to congestion, and the optimal level of traffic is when the benefit to the additional traveller is equal to the costs imposed on existing travellers. Avoidable/dead-weight-loss congestion costs are the net costs taking into account the benefits of additional travel as well as the costs. Total costs are usually just the costs of delay relative to free-flow speeds.

[**] I've calculated the average speed across all trips in the ABS's journey-to-work data-set, using inter-peak travel times and distances from the NSW BTS's Strategic Travel Model.

[***] Calculated as in [**], but using AM-peak travel times/distances.


Sunday, February 17, 2013

Computing polygon centroids for non-contiguous polygons in R


I don't like doing spatial analysis in a graphical GIS program like Arc or QGIS. I find them fine for looking at data, but not for analysis. For that, I'd rather use a scripting language, and I am NOT going to use Arc's Python scripting support (my experience with arcgisscripting has not been good).

So, I'm quite excited about the growing support for spatial data in Python and R (Yes, I probably need to get out more). I would love to get to the point where I could do all my analysis in Python or R and dump out the results for viewing in a GIS program.

So, in order to learn about the new R packages for spatial data, I set myself the task of calculating the centroids of non-contiguous polygons. More concretely:


  • I have a layer of polygons, and each polygon has a name/identifier (a postcode, zone number, etc)
  • I would like to calculate the centroid of each name/identifier. Note that there may be more than one polygon with that identifier, and in that case I want the centroid of all the polygons with that name/identifier. (Or, if you prefer, the centre of mass of all the polygons with the same identifier).
Note that the 'centroid' I am calculating here for each name/identifier may not be inside a polygon with that name/identifier. 

And here is the R code to do it (NB: In this data-set, the 'id' field for each polygon is 'tz06'):


#import everything you need in 1 package
library(GISTools)

#read in your shapefile
p = readShapePoly("GMA_polygons_wgs84")

#calculate the centroids of each polygon
trueCentroids = gCentroid(p, byid=TRUE)


#I like to whack the centroids in as extra columns
p$lat = coordinates(trueCentroids)[,2]
p$lon = coordinates(trueCentroids)[,1]

#The 'average' lat and lon is what I want. It is the centre 
#of mass of all polygons with that id. 
#Of course, if there is only a single polygon with that id,
#the average lat/lon is just the regular centroid, so I 
#initialize my centroid to that first
p$avglat = p$lat
p$avglon = p$lon


#Now go through and calculate the 'average' centroid
#for all the tricky areas that are made up of more than
#1 polygon
for(i in 1:length(p$tz06)) 
{
    #find all the polygons with the same id
    matches = p$tz06 == p$tz06[i]
        
    #if there is more than one such polygon
    if(sum(matches) > 1)
    {
        #we weight all of the centroids by area,
        #because we want the centre of mass
        areas = matches*p$Area_sqm
        weights = areas / sum(areas)

        p$avglat[i] = sum(p$lat*weights)
        p$avglon[i] = sum(p$lon*weights)
    }
}


I'm sure this is not the nicest way of doing this, so if you know a cleaner way (hell, there may even be a built-in function for this already!), please let me know. Also, I'm hopeless with regard to functional programming (my brain thinks procedurally), so I've done the above with a loop, but if you can point out a nice functional way of achieving the same thing with map/filter/fold/etc, I'd be keen to hear about that too.
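For what it's worth, here is one possible loop-free version (just a sketch, assuming the same p, tz06 and Area_sqm fields as above -- I haven't run it against the actual shapefile). It sums the area-weighted coordinates within each tz06 using aggregate(), then matches the results back onto the polygons:

#area-weighted centroid per tz06, without the explicit loop
w = p$Area_sqm
df = data.frame(tz06 = p$tz06,
                latw = p$lat * w,
                lonw = p$lon * w,
                w = w)

#sum the weighted coordinates and the weights within each id
sums = aggregate(cbind(latw, lonw, w) ~ tz06, data = df, FUN = sum)

#divide through to get the centre of mass for each id
sums$avglat = sums$latw / sums$w
sums$avglon = sums$lonw / sums$w

#match() puts the results back in the original polygon order
idx = match(p$tz06, sums$tz06)
p$avglat = sums$avglat[idx]
p$avglon = sums$avglon[idx]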



Saturday, February 9, 2013

The disaster that is Google Drive

My wife is a smart lady. Not particularly interested in technology, but no luddite either. She recently signed up with Google Drive so that pictures and videos of our kids could be backed up online.

Big mistake.

Files mysteriously went missing from our home computer's LOCAL folders (i.e. the folders of images and videos she wanted to sync). In fact, many of the local folders were empty! Some had strange duplicates (for example, a "Pictures" and a "Pictures(1)" folder). This was quite a shock when she first discovered it -- "Google Drive is deleting the precious files I want to back up!!!". Looking into it, we discovered that other people have had the same problem.

We were able to find some of our 'missing' files in the "Trash" on our local PC. That let us recover some of them, but it caused us even more alarm, because a lot of really precious stuff was still gone from our local computer. My wife also assured me she wasn't doing anything strange like moving or deleting files while trying to back them up.

This seems like really bizarre behaviour for a cloud storage solution. It should never delete files from your local computer for you. Or perhaps there are a few cases where it should (say, if you have manually deleted those items on another computer that syncs with your Drive account), but if it does, it should be really careful about it. My wife did nothing to suggest to Google Drive that it should delete those items.


So, in a panic, we shut off Google Drive on our home computer and went to the web interface to see what was there. Alarmingly, all the folders we looked at were empty! Real panic stations now... but looking at our usage, we found that we were using ~20GB of Drive storage. Digging a little further, I discovered that I could click on the (hard-to-find) "All Items" link in the web interface and see everything that had been uploaded (I think this means it had been moved to the online Google Drive Trash, which fortunately has to be emptied manually).

I manually downloaded all of it (the interface for doing this is terrible and, as an added bonus, you can only download in 2GB chunks). The end result is that I think I've recovered our stuff. But the first thing I will do once I've confirmed that is delete our Google Drive account and sign up with Box or Dropbox or someone (anyone!) else.


Note that I am a big user of Google products, and I'm generally pretty disposed to liking Google: my work uses Gmail for corporate email, I use it for my personal email, I share videos via YouTube, and so on. Heck, this blog is run by Google. But Drive has been a massive headache for us.

Thinking of using Google Drive? Just don't do it.