I don't like doing spatial analysis in a graphical GIS program like Arc or QGis. I find them fine for looking at data, but not for analysis. For that, I'd rather use a scripting language, and I am NOT going to use Arc's python scripting support (my experience with arcgisscripting is not good).
So, I'm quite excited about the growing support for spatial data in Python and R (Yes, I probably need to get out more). I would love to get to the point where I could do all my analysis in Python or R and dump out the results for viewing in a GIS program.
So, in order to learn about the new R packages for spatial data, I set myself the task of calculating the centroids of non-contiguous polygons. So, more concretely, I mean:
- I have a layer of polygons, and each polygon has a name/identifier (a postcode, zone number, etc)
- I would like to calculate the centroid of each name/identifier. Note that there may be more than one polygon with that identifier, and in that case I want the centroid of all the polygons with that name/identifier. (Or, if you prefer, the centre of mass of all the polygons with the same identifier).
Note that the 'centroid' I am calculating here for each name/identifier may not be inside a polygon with that name/identifier.
And here is the R code to do it (NB: In this data-set, the 'id' field for each polygon is 'tz06'):
#import everything you need in 1 package
library(GISTools)
#read in your shapefile
p = readShapePoly("GMA_polygons_wgs84")
#calculate the centroids of each polygon
trueCentroids = gCentroid(p, byid=TRUE)
#I like to whack the centroids in as extra columns
p$lat = coordinates(trueCentroids)[,2]
p$lon = coordinates(trueCentroids)[,1]
#The 'average' lat and lon is what I want. It is the centre
#of mass of all polygons with that id.
#Of course, if there is only a single polygon with that id,
#the average lat/lon is just the regular centroid, so I
#initialize my centroid to that first
p$avglat = p$lat
p$avglon = p$lon
#Now go through and calculate the 'average' centroid
#for all the tricky areas that are made up of more than
#1 polygon
for(i in 1:length(p$tz06))
{
#find all the polygons with the same id
matches = p$tz06 == p$tz06[i]
#if there is more than one such polygon
if(sum(matches) > 1)
{
#we weight all of the centroids by area,
#because we want the centre of mass
areas = matches*p$Area_sqm
weights = areas / sum(areas)
p$avglat[i] = sum(p$lat*weights)
p$avglon[i] = sum(p$lon*weights)
}
}
I'm sure this is not the nicest way of doing this, so if you know a cleaner way (hell, there may even be a built-in function for this already!), please let me know. Also, I'm hopeless with regards to functional programming (my brain thinks procedurally), so I've dont the above with a loop, but if you can point out to me a nice functional way of achieving the above with map/filter/fold/etc, I'd be keen to hear about that too.