Tuesday, March 30, 2010

Schumacher's foresight and the folly of complex forecasting models

I've been reading Schumacher's "Small is Beautiful" recently -- quite an old book now, but still worth a read. He writes very clearly, the subject matter is mostly still relevant, and the key ideas are still thought-provoking. He shows uncommon foresight on a few topics. For example, he talks about the near certainty of a future where oil is scarce & expensive, and he was doing so even before the 1970s oil shocks. He also talks about the folly of eroding natural capital long before it was common to do so. But perhaps most interestingly, he identifies the process by which a void in values (created by the decline of religion) was being filled by economics, and despite being an economist, he thought this dangerous. He thought it far safer to rely on traditional cultural & religious value systems than to subject everything to economic cost-benefit analysis. I think this last point is especially interesting because, while most economists will probably baulk at it, recent events lend weight to Schumacher's view. For example, look at all the economic debate and analysis that has gone into climate change, and look where it has got us. Wouldn't we have done better developing a value system where caring for our planet was more than just an economic consideration, but something sacred?

Schumacher wasn't a hippy or anything, either, nor was he a fuzzy-minded economist who hated the reduction of everything to numbers because he couldn't do any maths himself. In fact, he was a competent mathematician. This makes some of what he has to say about model-building in the later chapters of his book particularly interesting. Being a bit of a data-cruncher and model builder myself, I particularly liked the following quotes, which are right on the money, and which I think should be compulsory reading for any model builder:

"Crude methods of forecasting ... are not likely to lead into the errors of spurious verisimilitude and spurious detailing -- the two greatest vices of the statistician. Once you have a formula and an electronic computer, there is an awful temptation to squeeze the lemon until it is dry and present a picture of the future which through its very precision and verisimilitude carries conviction. Yet a man who uses an imaginary map, thinking it a true one, is likely to be worse off than someone with no map at all; for he will fail to inquire wherever he can, to observe every detail on his way, and to search continuously with all his senses and all his intelligence for indications of where he should go.

The person who makes the forecasts may still have a precise appreciation of the assumptions on which they are based. But the person who uses the forecasts may have no idea at all that the whole edifice, as is often the case, stands and falls with one single, unverifiable assumption. He is impressed by the thoroughness of the job done, by the fact that everything seems to 'add up', and so forth. If the forecasts were presented quite artlessly, as it were, on the back of an envelope, he would have a much better chance of appreciating their tenuous character and the fact that, forecasts or no forecasts, someone has to take an entrepreneurial decision about the unknown future."

I wish I could say I had never fallen for the temptation to go for a complex model with 'rich' outputs, when it would have been essentially as accurate, and less misleading, to present a few back-of-the-envelope projections...

Monday, March 8, 2010

More arcgisscripting hacking

As I mentioned in an earlier post, I've been fumbling around trying to work out how to get a Python program talking to ArcGIS. After getting past some initial irritations, I'm actually making some headway. I found this series of posts particularly helpful. As a reminder to myself, and as help for anyone else grappling with this, I'll post the first non-trivial script I've come up with here. The script is still pretty simple -- it opens a shapefile and reads all the information associated with it. Here it is:



import arcgisscripting
import sys

#This is my first attempt at some Python
#scripting with the ArcGIS geoprocessing object

#first we create the gis scripting object
gp = arcgisscripting.create(9.3)

gp.Workspace = "C:/tmp"
#allow old files to be overwritten
gp.OverWriteOutput = True

#let's print out some information about a layer
shapefile = "C:/tmp/test.shp"
desc = gp.Describe(shapefile)

#From the Describe object
print desc.DataType
print desc.CatalogPath

#From the FeatureClass object. The Describe object
#returned by the call to Describe above is also a
#'FeatureClass' object. I haven't quite got my head
#around the data model used by the geoprocessing
#object, so I don't quite understand how we know
#which fields are available and which aren't after
#a call to Describe... but these work for the
#shapefile I'm using.
print desc.ShapeType
print desc.FeatureType
print desc.ShapeFieldName

#list the fields. This is the same list as
#returned by ListFields()
for f in desc.Fields:
    print " " + f.Name


#now go and read each row in the attribute
#table associated with the shapefile.

#first we get a 'read' cursor
readCursor = gp.SearchCursor(shapefile)

#make sure we are at the start
readCursor.Reset()

#now go through and read (and print out) each row
row = readCursor.Next()
while (row != None):
    #print the contents of the row
    resstr = "ROW: "
    for f in desc.Fields:
        resstr = resstr + str(row.GetValue(f.Name)) + " , "
    print resstr

    #advance to the next row (otherwise we loop forever)
    row = readCursor.Next()


#done!

Sunday, March 7, 2010

pure eye-candy

As part of the UrbanIT research project I've been co-opted into at UNSW, I've been writing some code to 'visualize' the contents of various databases holding information about the built environment. To be brief, the aim of the project is to integrate the existing, disparate datasets that are out there into a single unified view of the urban environment. So, you may have data on, say, the energy efficiency of different buildings, the cost of maintaining those buildings, their price, and so on. All this data is tucked away in different databases, and you need (essentially) an integration layer sitting on top to, well, integrate it all. Once you do this, you can run relatively complex spatial queries across multiple data sources. Convert the output to KML and you get some pretty pictures like this:



This is a mashed-up visualization of two separate queries -- one asking for the buildings within 60m of a particular lat/long, coloured by their main use (commercial: blue, industrial: red, residential: green, other: grey). The second is a query on a particular residential building, with the individual apartments coloured by their estimated value (warmer colours = more expensive apartments).

Sorry, I can't provide the KML itself, as the data associated with the buildings shown in the picture is pretty rich, and is confidential.
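
I can sketch the general idea, though. The snippet below is not the UrbanIT code -- it's a rough, stripped-down illustration of how query results might be turned into coloured KML polygons using nothing but the Python standard library. The buildings list, its fields, and the coordinates are all invented for the example; the real data comes out of the integration layer.

import xml.etree.ElementTree as ET

#KML colours are aabbggrr hex strings
USE_COLOURS = {
    "commercial": "ffff0000",   #blue
    "industrial": "ff0000ff",   #red
    "residential": "ff00ff00",  #green
    "other": "ff888888",        #grey
}

#hypothetical query output: (name, main use, footprint as (lon, lat) pairs)
buildings = [
    ("Building A", "commercial",
     [(151.2000, -33.8700), (151.2005, -33.8700),
      (151.2005, -33.8705), (151.2000, -33.8705)]),
]

kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
doc = ET.SubElement(kml, "Document")

for name, use, footprint in buildings:
    pm = ET.SubElement(doc, "Placemark")
    ET.SubElement(pm, "name").text = name
    style = ET.SubElement(pm, "Style")
    polystyle = ET.SubElement(style, "PolyStyle")
    ET.SubElement(polystyle, "color").text = USE_COLOURS.get(use, USE_COLOURS["other"])
    poly = ET.SubElement(pm, "Polygon")
    outer = ET.SubElement(poly, "outerBoundaryIs")
    ring = ET.SubElement(outer, "LinearRing")
    #close the ring by repeating the first vertex
    coords = footprint + [footprint[0]]
    ET.SubElement(ring, "coordinates").text = " ".join(
        "%f,%f,0" % (lon, lat) for lon, lat in coords)

ET.ElementTree(kml).write("buildings.kml")

Point Google Earth at the resulting file and (with real footprints and a real query behind it) you get the kind of coloured-building view shown above.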