Sunday, March 16, 2014

Public accountability gone in motorway funding


On the weekend the state government announced the "NorthConnex": another tolled motorway for Sydney. This one will cost $3 billion. That's small relative to the $11 billion WestConnex, but hardly small change. Some people think motorways are a bad idea because they induce more traffic; others think it is folly to invest billions of dollars in roadways when car travel is plateauing or falling in developed countries. Personally, I'm not opposed to motorways, provided that:

  1. They have a benefit/cost ratio significantly above one
  2. The assumptions underlying the cost/benefit study are reasonable (there are many ways to game these things)

And this brings me to my main issue with these motorway proposals: there is no publicly available cost/benefit study for either of them. In a sane world, before we agreed to invest billions of dollars in infrastructure, the case for doing so would be publicly laid out. But it isn't, and by failing to do this the politicians are doing society two great disservices: firstly, they may be wasting our money on ill-conceived projects; and secondly, they are entrenching a culture of opaqueness in the handling of public funds. This second point is especially important in NSW given our terrible record on corruption: if politicians can spend billions of dollars without the accountability of a public cost/benefit study, there will always be the potential for politicians to approve dud projects for a kickback, or for a position on the board of a road construction or motorway operating company on retirement. Just to be clear, I am not claiming that the WestConnex and NorthConnex motorways are going ahead due to corruption; I am simply pointing out that there are no publicly available cost/benefit studies, and we are being asked to trust the politicians and motorway companies without proof. This is no way to run major infrastructure investment.

My second peeve about these projects is that both of them rely, to a significant degree, on tolls for their funding. This is a bad idea because funding a motorway this way lowers the benefit/cost ratio. This is easy to show mathematically, but can be explained intuitively as follows:

  • The motorway operator has an incentive to maximize cost-recovery/revenue, and the toll that does this is not the same as the toll that maximizes public benefit.
  • Tolling individual links in a road network causes perverse outcomes: witness the Cross-City tunnel, which is tolled and under-utilised, while the equivalent untolled surface route is still congested.
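The first point can be illustrated with a toy model. This is a sketch under loudly stated assumptions, not a real motorway appraisal: every number is hypothetical, demand for trips on the link is assumed to fall linearly with the toll, each trip is assumed to have a constant marginal social cost, and the operator's own cost per trip is assumed to equal that social cost. It does not model the second point (traffic diverting onto untolled roads).

```python
# Toy model with hypothetical numbers: daily trips on a new tolled link
# fall linearly as the toll rises, q(t) = A - B*t.
A = 100_000   # trips/day if the link were free (assumed)
B = 10_000    # trips/day lost per extra $1 of toll (assumed)
C = 1.0       # marginal social cost of one trip on the link, $ (assumed)


def trips(toll):
    """Daily trips at a given toll (never negative)."""
    return max(A - B * toll, 0.0)


def net_benefit(toll):
    """Users' total willingness to pay for the trips made, minus their social cost.

    Inverse demand is p(q) = (A - q) / B, so the area under it from 0 to q
    is (A*q - q**2 / 2) / B. Tolls are a transfer, so they do not appear here.
    """
    q = trips(toll)
    return (A * q - q ** 2 / 2) / B - C * q


def operator_profit(toll):
    """Toll revenue minus the operator's cost (assumed equal to C per trip)."""
    return (toll - C) * trips(toll)


# Setting d/dt[(t - C) * (A - B*t)] = 0 gives the profit-maximizing toll.
profit_toll = (A / B + C) / 2
# Net benefit peaks where the toll equals the marginal social cost of a trip.
efficient_toll = C

for name, t in [("efficient", efficient_toll), ("profit-max", profit_toll)]:
    print(f"{name:>10}: toll ${t:.2f}, {trips(t):,.0f} trips/day, "
          f"net benefit ${net_benefit(t):,.0f}/day")
```

With these made-up numbers the operator's preferred toll is $5.50 against an efficient toll of $1.00, which halves traffic (90,000 to 45,000 trips/day) and cuts net benefit from $405,000 to $303,750 a day. The specific figures mean nothing; the gap between the two tolls is the point.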





Friday, November 29, 2013

Marginal cost of congestion in Sydney is around $0.7/km

Post Synopsis: Every kilometre you drive in Sydney during peak hour costs other motorists about $0.66. So if you drive 20km to work, other people are paying for your trip to the tune of about $13. If that sounds OK to you, remember that when you are stuck in traffic, you are paying (in time, extra fuel, etc.) for someone else's decision to drive during peak hour. The best way out of this mess is congestion charging.

Congestion sucks

We all know that congestion is a drag in Sydney, and in most other large developed and developing cities (for fun, check out this congestion in Sao Paulo). A couple of studies have had a stab at estimating the total cost of congestion in Sydney. They found the following:
  • A CIE report estimated congestion costs of ~$0.29 per vehicle kilometre in 2005. These are total costs (not avoidable/dead-weight).
  • BITRE estimates congestion costs of ~$0.10 per vehicle kilometre in a 2007 working paper (a really good paper, btw). These are avoidable/dead-weight costs[*] -- total costs are around double that, so not so far from the CIE study.
I've reported these results (and others, below) in costs per vehicle-km, because that's the most common tangible measure, but if you want the total annual cost, you can multiply the above by the number of vehicle-km in Sydney (42.4 billion in 2005, say), and get total annual costs of $8-12 billion/year (depending on whether you use BITRE or CIE total costs).
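The annual totals are just the per-km costs multiplied by vehicle-km travelled. A quick check in Python, using the post's 2005 figure and assuming BITRE total costs of roughly double its $0.10 dead-weight estimate:

```python
VKT_2005 = 42.4e9  # vehicle-km travelled in Sydney in 2005 (figure from the post)

# Total congestion cost per vehicle-km, $
COST_PER_VKM = {
    "CIE": 0.29,
    "BITRE (about 2 x $0.10 dead-weight)": 0.20,
}


def annual_cost_billions(cost_per_vkm, vkt=VKT_2005):
    """Annual congestion cost in $ billion: cost per vehicle-km times vehicle-km."""
    return cost_per_vkm * vkt / 1e9


for source, cost in COST_PER_VKM.items():
    print(f"{source}: ${annual_cost_billions(cost):.1f} billion/year")
```

This gives about $12.3 billion/year from the CIE figure and about $8.5 billion/year from the BITRE-based one, which is where the $8-12 billion range comes from.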


Why average congestion costs are not so useful

Having just told you about a couple of prior studies that give average congestion costs, I have to break it to you that average costs aren't really that useful. Why? Two reasons:

  • Firstly, there is that old problem with averages: your average temperature can be fine if you have your head in the oven and your feet in the freezer....  If you are driving during a relatively quiet period, when congestion is low or non-existent, then you aren't imposing costs on others. If, on the other hand, you are driving during peak hour, then you are going to be imposing very high costs. 
  • Secondly, congestion is 'caused' by those last few thousand cars that are on the network. People refer to this effect as the 'school holiday effect', because you only need to remove a small amount of traffic for congestion to improve. This is telling us that the marginal cost of each additional vehicle is much higher than the average cost across all vehicles.
So what really matters, from a policy point of view, is the marginal cost of congestion. That is, what is the cost imposed by each additional driver? Since congestion costs are high during the peak, in this blog post I'm going to work out a rough estimate of the marginal congestion cost of each additional vehicle during peak hour in Sydney.


So what are the marginal costs?

We are interested in the costs imposed on other motorists by each additional driver at peak time. We can work this out, to a reasonable approximation, as follows:

The average traffic speed during the inter-peak period is 42.9 km/h [**]. The average traffic speed during the peak period is 30.0 km/h [***], so traffic is 43% faster in the inter-peak period than during the (congested) peak period.

What causes this? Extra traffic on the roads, of course. Looking at the Figure below, we can see that the volume of traffic on Sydney roads is ~270,000 vehicles during peak times, but only ~160,000 during the inter-peak.


Source: BTS HTS 2011/12 summary report.

So, an extra 110,000 vehicles on the road network increases trip times by 43%. On a per-vehicle basis, treating each extra vehicle's effect as compounding (1.0000325 raised to the power 110,000 is about 1.43), this means each additional vehicle slows down all other traffic by 0.000325%. So if the vehicle is on the road for a minute, it slows down all other traffic by 0.000325% for that minute. Since there are 270,000 vehicles on the road at peak time, this costs those other motorists a combined 0.88 minutes of delay (270,000 × 0.00000325 ≈ 0.88). Valuing that at $14/hour (a pretty standard choice for peak travel), that's $0.205 of delay costs per minute of peak travel. If we add in other costs (extra running costs, pollution, etc.), as in the BITRE report, this comes to $0.33/minute. Average peak travel speed is 30 km/h, or 0.5 km per minute, so converting this to a per-km cost, we get $0.66/km.
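The back-of-envelope chain above can be reproduced in a few lines of Python. The inputs are the post's figures; the $0.33/minute all-in cost is taken as given rather than rebuilt from the BITRE mark-ups, and the per-vehicle slowdown is solved as a compounding effect, which is what reproduces the 0.000325% figure (a simple linear split, 43% / 110,000, would give about 0.00039% instead).

```python
import math

# Inputs: all figures are from the post
INTERPEAK_SPEED = 42.9       # km/h
PEAK_SPEED = 30.0            # km/h
PEAK_VEHICLES = 270_000
INTERPEAK_VEHICLES = 160_000
VALUE_OF_TIME = 14.0         # $ per hour of delay
ALL_IN_COST_PER_MIN = 0.33   # $ per minute incl. running costs, pollution etc. (taken as given)

extra_vehicles = PEAK_VEHICLES - INTERPEAK_VEHICLES      # 110,000
slowdown = INTERPEAK_SPEED / PEAK_SPEED                  # ~1.43: peak trips take ~43% longer

# Per-vehicle slowdown x, solving (1 + x) ** extra_vehicles == slowdown
per_vehicle = math.expm1(math.log(slowdown) / extra_vehicles)   # ~3.25e-6, i.e. ~0.000325%

# One extra vehicle on the road for a minute slows every peak vehicle by
# that fraction for that minute: total delay in vehicle-minutes
delay_min = per_vehicle * PEAK_VEHICLES                  # ~0.88
delay_cost_per_min = delay_min * VALUE_OF_TIME / 60      # ~$0.205

# $ per minute divided by km per minute gives $ per km
cost_per_km = ALL_IN_COST_PER_MIN / (PEAK_SPEED / 60)    # $0.66
commute_cost = 20 * cost_per_km                          # 20km peak commute, ~$13

print(f"per-vehicle slowdown: {per_vehicle * 100:.6f}%")
print(f"delay imposed per minute driven: {delay_min:.2f} vehicle-minutes")
print(f"delay cost per minute: ${delay_cost_per_min:.3f}")
print(f"marginal cost: ${cost_per_km:.2f}/km, 20km commute: ${commute_cost:.2f}")
```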

Note that this figure (of $0.66/km) is an average across Sydney for the peak period, and would be much higher in specific (congested) parts of the road network (and lower elsewhere). I haven't done the numbers, but my guess is that it would be a few times higher on congested links, which implies the marginal costs on congested roads might be in the vicinity of $2/km in the peak period.






[*] In the economics of congestion, there is an 'optimal' level of congestion: each additional traveller derives a benefit from making a trip, but there is a cost to other motorists due to congestion, and the optimal level of traffic is when the benefit to the additional traveller is equal to the costs imposed on existing travellers. Avoidable/dead-weight-loss congestion costs are the net costs taking into account the benefits of additional travel as well as the costs. Total costs are usually just the costs of delay relative to free-flow speeds.

[**] I've calculated average speed by using inter-peak travel times and distances from the NSW BTS's Strategic Travel Model. I've calculated the average speed across all trips in the ABS's journey-to-work data-set, using these travel times/distances.

[***] Calculated as in [**], but using AM-peak travel times/distances.


Sunday, February 17, 2013

Computing polygon centroids for non-contiguous polygons in R


I don't like doing spatial analysis in a graphical GIS program like Arc or QGIS. I find them fine for looking at data, but not for analysis. For that, I'd rather use a scripting language, and I am NOT going to use Arc's python scripting support (my experience with arcgisscripting is not good).

So, I'm quite excited about the growing support for spatial data in Python and R (Yes, I probably need to get out more). I would love to get to the point where I could do all my analysis in Python or R and dump out the results for viewing in a GIS program.

So, in order to learn about the new R packages for spatial data, I set myself the task of calculating the centroids of non-contiguous polygons. More concretely, I mean:


  • I have a layer of polygons, and each polygon has a name/identifier (a postcode, zone number, etc)
  • I would like to calculate the centroid of each name/identifier. Note that there may be more than one polygon with that identifier, and in that case I want the centroid of all the polygons with that name/identifier. (Or, if you prefer, the centre of mass of all the polygons with the same identifier).
Note that the 'centroid' I am calculating here for each name/identifier may not be inside a polygon with that name/identifier. 

And here is the R code to do it (NB: In this data-set, the 'id' field for each polygon is 'tz06'):


#import everything you need in 1 package
library(GISTools)

#read in your shapefile
p = readShapePoly("GMA_polygons_wgs84")

#calculate the centroids of each polygon
trueCentroids = gCentroid(p, byid=TRUE)


#I like to whack the centroids in as extra columns
p$lat = coordinates(trueCentroids)[,2]
p$lon = coordinates(trueCentroids)[,1]

#The 'average' lat and lon is what I want. It is the centre 
#of mass of all polygons with that id. 
#Of course, if there is only a single polygon with that id,
#the average lat/lon is just the regular centroid, so I 
#initialize my centroid to that first
p$avglat = p$lat
p$avglon = p$lon


#Now go through and calculate the 'average' centroid
#for all the tricky areas that are made up of more than
#1 polygon
for(i in seq_along(p$tz06))
{
    #find all the polygons with the same id
    matches = p$tz06 == p$tz06[i]
        
    #if there is more than one such polygon
    if(sum(matches) > 1)
    {
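        #we weight all of the centroids by area,
        #because we want the centre of mass
        #(Area_sqm is an area attribute already in this shapefile;
        #polygons with a different id get a weight of zero)
        areas = matches*p$Area_sqm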
        weights = areas / sum(areas)

        p$avglat[i] = sum(p$lat*weights)
        p$avglon[i] = sum(p$lon*weights)
    }
}


I'm sure this is not the nicest way of doing this, so if you know a cleaner way (hell, there may even be a built-in function for this already!), please let me know. Also, I'm hopeless at functional programming (my brain thinks procedurally), so I've done the above with a loop, but if you can point me to a nice functional way of achieving the above with map/filter/fold/etc, I'd be keen to hear about that too.
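For comparison, here is a sketch of the same area-weighted averaging in Python (the other language used on this blog) rather than R, with no GIS library at all. It makes one pass over the polygons, accumulating total area and area-weighted coordinates per id (effectively a fold), instead of re-scanning every polygon for each one. Assumptions: planar coordinates (averaging raw lat/lon, as the R version does, is only approximate), simple non-degenerate polygons without holes, and polygons supplied as `(id, ring)` pairs; the function names are hypothetical.

```python
from collections import defaultdict


def area_and_centroid(ring):
    """Signed area and centroid of a simple polygon (shoelace formula).

    `ring` is a list of (x, y) vertices; repeating the first vertex at the
    end is optional. The sign of the area depends on vertex order, but the
    centroid does not.
    """
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a /= 2.0
    return a, (cx / (6.0 * a), cy / (6.0 * a))


def grouped_centroids(polygons):
    """Centre of mass of all polygons sharing each id.

    `polygons` is an iterable of (id, ring) pairs; returns {id: (x, y)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])   # [area, sum(area*x), sum(area*y)]
    for pid, ring in polygons:
        area, (x, y) = area_and_centroid(ring)
        area = abs(area)                          # weights must be positive
        acc = sums[pid]
        acc[0] += area
        acc[1] += area * x
        acc[2] += area * y
    return {pid: (sx / area, sy / area) for pid, (area, sx, sy) in sums.items()}


# Hypothetical example: zone "A" is two separate unit squares
zones = [
    ("A", [(0, 0), (1, 0), (1, 1), (0, 1)]),
    ("A", [(2, 0), (3, 0), (3, 1), (2, 1)]),
    ("B", [(0, 2), (2, 2), (2, 3), (0, 3)]),
]
print(grouped_centroids(zones))   # A's centroid lies between its squares, at (1.5, 0.5)
```

As in the R version, the result for "A" falls in the gap between its two squares, i.e. outside both polygons.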



Saturday, February 9, 2013

The disaster that is Google Drive

My wife is a smart lady. Not particularly interested in technology, but no luddite either. She recently signed up with Google Drive so that pictures and videos of our kids could be backed up online.

Big mistake.

Files mysteriously go missing from our home computer's LOCAL folders (i.e. the folder of images and videos she wants to sync). In fact many of the local folders were empty! Some have strange duplicates (for example, a "Pictures" and a "Pictures(1)" folder). This was quite a shock when she first discovered it -- "Google Drive is deleting the precious files I want to backup!!!". Looking into it, we discovered that other people have had the same problem.

We were able to find some of our 'missing' files in the "Trash" on our local PC. That was little comfort: while we were able to go to Trash and recover some of the files, a lot of really precious stuff was still gone from our local computer. My wife also assured me she wasn't doing anything strange like moving stuff around or deleting stuff while trying to back it up.

This seems really bizarre behaviour for a cloud storage solution. It should never delete stuff from your local computer for you. Or perhaps there are a few cases where it should (if you have manually deleted those items on another computer that syncs with your drive account), but if it does, it should be really careful about it. My wife didn't do anything to suggest to Google Drive that it should delete those items.


So, in a panic, we shut off Google Drive on our home computer and went to the web interface to see what was there. Alarmingly, all the folders we looked at were empty! Real panic stations now... but looking at our usage, we found that we were using ~20GB of Drive storage. Digging a little further, I discovered that I could click on the (hard to find) "All Items" link on the web interface and see everything that was uploaded (actually I think this means that it was moved to the online Google Drive Trash, but fortunately the online Trash has to be emptied manually). I manually downloaded these (as an added bonus, the interface to do this is terrible, and you can only download in 2GB chunks). The end result is that I think I've recovered our stuff. But the first thing I will do when I've confirmed that I've recovered everything is delete our Google Drive account and sign up with Box or DropBox or someone (anyone!) else.


Note that I am a big user of Google products, and I'm generally pretty well disposed towards Google: my work uses Gmail for corporate email, I use it for my personal email, I share videos via YouTube, and so on. Heck, this blog is run by Google. But Drive has been a massive headache for us.

Thinking of using Google Drive? Just don't do it.

Sunday, August 7, 2011

How to make high-speed rail stack-up

On the face of it, the latest of a long line of studies into fast rail appears to show, yet again, that it is going to be hard to justify the huge up-front capital costs. The estimated cost is $80 billion (give or take $20-odd billion, but who cares about loose change?), and it is not going to be recouped in any significant way from fares. So let's not bother, right?

But hang on, aren't we missing something here? Well, only the main cost-recovery avenue: value (re)capture. The idea is hardly new, but for some reason I can't fathom, it is rarely discussed in detail in these fast rail feasibility studies. For those unfamiliar with the idea, it's quite simple. The government is going to shell out huge buckets of money building a high-speed rail line. Anyone owning property near the rail stations will immediately see the value of that property increase. Since these property owners haven't done anything themselves, and are just benefiting from the government's spending, it's fair for the government to recoup the increase. At this point people often start quibbling about difficulties in estimating the benefits, or object to trying to claw back money from old ladies who just happen to own a house near one of the stations. None of that is relevant except in political terms, and for the moment let's stay away from the politics: we can come back to that later.

So how important is value capture, relative to fare-box revenue? Very. Let's do our own back-of-the-envelope calculations and see. First up, let's simplify things to make the case for value recapture clear. Let's just take the Sydney-Newcastle-Canberra section of the track. The best-guess cost for this in the most recent study is $29.3 billion; let's just call it $30 billion.

Total cost: $30 billion

A further simplification will help bring out my point, and let us avoid complicated calculations that would only obscure it. Let's say that instead of linking Sydney, Canberra and Newcastle, we just build south-west from Sydney out to empty sheep paddocks near Goulburn, and north to some easily developable land near the Central Coast or Newcastle. So, we have two lines, with 4 stops along each line (as in the most recent study), and a train commute of 40 minutes or under for each station.

Now, suppose we develop a small 2km-radius 'town' around each station. Since we've built out to sheep paddocks, this should be no problem. A 2km-radius town covers roughly 1,257 hectares, so at 8 lots per hectare it gives us about 10,000 lots in each town. How much would each lot be worth? With a sub-40-minute commute to Sydney, I'd say at least three or four hundred thousand dollars. But let's say that, after servicing and other costs, the net increase is $200k for each lot. So 8 stations, 10,000 lots per station and $200k per lot gives us $16 billion:
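The lot count and value-capture arithmetic can be checked in a few lines, so the assumptions are easy to fiddle with. The inputs are the post's own assumptions; the unrounded lot count is shown alongside the rounded 10,000 used in the text.

```python
import math

STATIONS = 8
TOWN_RADIUS_KM = 2.0
LOTS_PER_HECTARE = 8
NET_GAIN_PER_LOT = 200_000   # $ after servicing and other costs (the post's assumption)
CAPITAL_COST = 30e9          # Sydney-Newcastle-Canberra section, rounded


def lots_per_town(radius_km, lots_per_ha):
    """Lots in a circular town: area in hectares (1 km^2 = 100 ha) times density."""
    return math.pi * radius_km ** 2 * 100 * lots_per_ha


def value_capture(lots, stations=STATIONS, gain=NET_GAIN_PER_LOT):
    """Total uplift captured across all station towns, in $."""
    return stations * lots * gain


exact_lots = lots_per_town(TOWN_RADIUS_KM, LOTS_PER_HECTARE)   # ~10,053
capture = value_capture(10_000)                                 # rounded lots: $16 billion

print(f"lots per town: {exact_lots:,.0f}")
print(f"value capture: ${capture / 1e9:.1f} billion "
      f"({capture / CAPITAL_COST:.0%} of capital cost)")
```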

Land-capture capital recovery: conservatively, $16 billion

So with some pretty conservative assumptions we have recouped half the capital cost. Getting a little more creative should allow even greater cost recovery. Developing a 3km-radius town at each station, or developing at higher densities, are easy adjustments that would increase revenue and get you to near full cost-recovery without any fare-box revenue.

So what about fare-box revenue? Suppose each of our 8 towns, with 10,000 dwellings, has 10,000 workers, 30% of whom commute using the rail line, with the other 70% commuting by car or some other mode. So, 3,000 workers in each town commuting each way 200 days a year gives us 9.6 million trips across the 8 towns. Call it 10 million. Add another 10 million non-work trips. Say each trip costs $10. So, that's $200 million a year, or roughly a 0.7% p.a. gross return on the $30 billion invested, before we take into account operating costs. Fiddle with the above numbers any way you like; it won't change the fact that fare-box revenue is almost small enough to forget about altogether, and certainly much less important than value capture as a funding mechanism.
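The same fare-box arithmetic as a short script, using the post's assumptions (including its rounding of 9.6 million work trips up to 10 million):

```python
TOWNS = 8
WORKERS_PER_TOWN = 10_000
RAIL_SHARE = 0.30          # share of workers commuting by rail
WORKING_DAYS = 200
FARE = 10.0                # $ per trip
CAPITAL_COST = 30e9

# Two trips (there and back) per rail commuter per working day
work_trips = TOWNS * WORKERS_PER_TOWN * RAIL_SHARE * 2 * WORKING_DAYS   # 9.6 million
rounded_work_trips = 10e6        # "call it 10 million"
non_work_trips = 10e6            # the post's assumption
revenue = (rounded_work_trips + non_work_trips) * FARE                    # $200 million/yr
gross_return = revenue / CAPITAL_COST                                     # ~0.67% p.a.

print(f"work trips: {work_trips / 1e6:.1f} million/yr")
print(f"fare revenue: ${revenue / 1e6:.0f} million/yr, "
      f"gross return {gross_return:.2%} before operating costs")
```

Even doubling every trip assumption only gets the gross return to around 1.3% a year, before a dollar of operating cost is paid.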

Net fare-box revenue: approximately nil.

Bottom line: if you want to fund high-speed rail (heck, rail of any kind), you need to put value recapture front and centre of your approach to cost recovery. Why isn't it already? Because property owners would prefer to pocket the benefits from the rail line themselves, and many will fight tooth and nail to do so. Building the line out to some empty sheep paddock mostly avoids this, because there are only a handful of large land-owners likely to be involved, and that's why I took that approach in the above example. In reality, though, there would be a lot of opposition to such a move ("It's crazy to build a rail line out to nowhere! We need to strengthen the centres we already have!"). And if you do build to existing centres, it is politically difficult to claw back the increased land values (not that any Australian government has really tried very hard...). In fact, building the line out to empty paddocks is probably not the best plan economically anyway, provided that you have a mechanism to recoup the increased value from existing land-holders.

And the reason land-value capture works so well (if you can do it politically) is that our dysfunctional planning systems have made supplying new dwellings at low cost so difficult. So there is a lot of suppressed demand for accessible locations. A rail line is, essentially, a way of 'manufacturing' new accessible locations. Releasing land on the urban fringe is not the same thing, a point missed by many who claim that the fix to our housing supply problems is just to release more land. This would help a bit, but the real demand is not for land, but for dwellings with good accessibility (to jobs, and other things), and accessibility can only really be supplied by transport infrastructure.

As a final aside, some economists have pointed out that because the benefits of rail systems get capitalized into the land of a relatively small number of land-holders, rail is distributively unfair: costs are shared, but benefits are concentrated. The benefits of a motorway, on the other hand, are much more widely spread. This is a valid point, and can't be addressed without re-thinking value capture. Just to be clear: I'm not arguing for motorways here, just pointing out that making rail work requires us to have a discussion about the fairness of recouping private gains from public infrastructure investment.

Wednesday, May 4, 2011

Connecting to mysql from python the odbc way from mac

This post just describes how to set up python to talk to a mysql database on a mac. Written for reference, and for co-workers who will shortly need to do something like this.

I wanted to be able, in a python script, to talk to a remote MySQL database from my mac. As I went about this I found the documentation a little scattered -- I couldn't find a single page giving the instructions, and so I've just written up the steps here.

I assume that you have python installed already.

Step 1: Install pyodbc

First step is to install pyodbc if you haven't got it installed already. I downloaded pyodbc from http://code.google.com/p/pyodbc/downloads/list (I downloaded the source distribution because there were no binaries for mac. The source distribution is the one that ends in .zip.)

After unzipping this, going into the directory that is created by the unzip and typing python setup.py install at the command prompt should install everything without you needing to do anything else.

Step 2: Check that you have an odbc manager installed

pyodbc won't do all the work for you -- it relies on an ODBC manager to ferry things back and forth to databases. On a Mac you should have an ODBC manager installed already by default. My machine has one called iodbc: if I type iodbctest at the command prompt* I get the following:

iODBC Demonstration program
This program shows an interactive SQL processor
Driver Manager: 03.52.0607.1008

Enter ODBC connect string (? shows list):

If I then type '?' as suggested, I get an empty list... which shows that there are no drivers installed. What is a driver? It's a database-specific connector that lets the ODBC manager talk to a particular database (MySQL, Postgresql, SQLServer, etc). So the last link required in the communication chain is a mysql driver.

* Apparently you can get to iodbc through Applications->Utilities as well if you want to look for it in the Mac GUI.

Step 3: Install your odbc drivers

OK, so we now need to install specific odbc drivers for each database we want to connect to. In this case, I'm just going to install MySQL. I grab the odbc driver from http://dev.mysql.com/downloads/connector/odbc (picking the Mac .dmg file just to make the installation easy). A few clicks on the .dmg file and the driver is installed.

I can check that it is installed by running iodbctest again from the command prompt. This time when I type '?' to see what drivers are installed I get:

Enter ODBC connect string (? shows list): ?

DSN | Driver
------------------------------------------------------------------------------
myodbc | MySQL ODBC 5.1 Driver

OK, so that completes the pipeline. Now in python I can use pyodbc, which talks to iodbc (or whatever ODBC manager you have installed), which uses the mysql odbc driver to talk to a MySQL database.

So let's do that and check that it works.

Step 4: connect to your database from python

OK, so now we should be able to connect to your database in python.

But one last thing you need to do is to work out what connection string you need to connect to your database. A helpful resource with example connection strings can be found at http://www.connectionstrings.com/. Here is what I did in python to connect to my database:

>>> import pyodbc
>>> connstr="Driver={MySQL ODBC 5.1 Driver};Server=your.server.;Port=3306;Database=databasename;User=username; Password=nottelling;Option=3;"
>>> conn=pyodbc.connect(connstr)
>>>

and from here you can just play with the conn object as you like in python! So, for example, to look at a table:

>>> res = conn.execute("select * from sometable")
>>> for r in res:
...     print(r)
...
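A slightly fuller sketch of the same workflow, written as reusable functions rather than an interactive session. It assumes pyodbc and the MySQL ODBC driver are installed as described above; the helper names (`mysql_connstr`, `fetch_rows`) are hypothetical, and the driver name must match whatever iodbctest lists on your machine. Passing values through `?` parameter markers lets pyodbc bind them for you instead of pasting them into the SQL string.

```python
def mysql_connstr(server, database, user, password, port=3306,
                  driver="MySQL ODBC 5.1 Driver"):
    """Build a MySQL ODBC connection string in the same format as above.

    `driver` must match a driver name listed by iodbctest on your machine.
    """
    parts = [
        ("Driver", "{%s}" % driver),
        ("Server", server),
        ("Port", str(port)),
        ("Database", database),
        ("User", user),
        ("Password", password),
        ("Option", "3"),
    ]
    return "".join(f"{key}={value};" for key, value in parts)


def fetch_rows(connstr, sql, *params):
    """Run one query and return all rows, always closing the connection."""
    import pyodbc  # imported here so mysql_connstr works even without pyodbc

    conn = pyodbc.connect(connstr)
    try:
        cursor = conn.cursor()
        cursor.execute(sql, *params)   # values bind to '?' markers in `sql`
        return cursor.fetchall()
    finally:
        conn.close()
```

Usage would look something like `fetch_rows(mysql_connstr("your.server", "databasename", "username", "nottelling"), "select * from sometable where id = ?", 42)`, where the `id` column is a hypothetical example.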

Saturday, April 9, 2011

Rental yields will increase, but not the way they'd like....

In response to a growing chorus of analysts pointing out that Australian property seems to have both limited prospects for capital growth and low rental yields, many property market boosters respond that yields will increase (even the boosters have given up spruiking capital growth as a reason to buy property -- that boat has sailed).

Now, for once I am in agreement with the property market boosters -- yields must increase. But there are two ways for yields on residential property to increase. Increasing rents is one (and clearly the one the boosters have in mind). But falling prices will do the job just nicely also. Just saying.
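To make that concrete with hypothetical numbers: gross yield is annual rent divided by price, so a property bought for $500,000 and rented for $20,000 a year yields 4%, and there are two ways to get it to 5%:

```python
def gross_yield(annual_rent, price):
    """Gross rental yield: annual rent as a fraction of the purchase price."""
    return annual_rent / price


price, rent = 500_000, 20_000   # hypothetical property: 4% gross yield
target = 0.05                   # hypothetical target yield

rent_needed = target * price    # route 1: rents rise, price unchanged
price_needed = rent / target    # route 2: rents unchanged, price falls

print(f"current yield: {gross_yield(rent, price):.1%}")
print(f"rent rise needed: {rent_needed / rent - 1:.0%} (to ${rent_needed:,.0f}/yr)")
print(f"price fall needed: {1 - price_needed / price:.0%} (to ${price_needed:,.0f})")
```

Either a 25% rise in rents or a 20% fall in price delivers the same 5% yield.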