Sunday, February 17, 2013

Computing polygon centroids for non-contiguous polygons in R


I don't like doing spatial analysis in a graphical GIS program like Arc or QGIS. I find them fine for looking at data, but not for analysis. For that, I'd rather use a scripting language, and I am NOT going to use Arc's Python scripting support (my experience with arcgisscripting is not good).

So, I'm quite excited about the growing support for spatial data in Python and R (Yes, I probably need to get out more). I would love to get to the point where I could do all my analysis in Python or R and dump out the results for viewing in a GIS program.

To learn about the new R packages for spatial data, I set myself the task of calculating the centroids of non-contiguous polygons. More concretely, I mean:


  • I have a layer of polygons, and each polygon has a name/identifier (a postcode, zone number, etc)
  • I would like to calculate the centroid of each name/identifier. Note that there may be more than one polygon with that identifier, and in that case I want the centroid of all the polygons with that name/identifier. (Or, if you prefer, the centre of mass of all the polygons with the same identifier).
Note that the 'centroid' I am calculating here for each name/identifier may not be inside a polygon with that name/identifier. 
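
In symbols, this is roughly what I'm after. The notation is just for illustration: S_k is the set of polygons sharing identifier k, A_i is the area of polygon i, and (lat_i, lon_i) is its own centroid. Averaging lat/lon like this treats the coordinates as planar, which I'm assuming is fine for areas that are close together.

```latex
\overline{\mathrm{lat}}_k = \frac{\sum_{i \in S_k} A_i \,\mathrm{lat}_i}{\sum_{i \in S_k} A_i},
\qquad
\overline{\mathrm{lon}}_k = \frac{\sum_{i \in S_k} A_i \,\mathrm{lon}_i}{\sum_{i \in S_k} A_i}
```

When S_k contains a single polygon, both expressions reduce to that polygon's own centroid.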

And here is the R code to do it (NB: In this data-set, the 'id' field for each polygon is 'tz06'):


#import everything you need in 1 package
library(GISTools)

#read in your shapefile
p = readShapePoly("GMA_polygons_wgs84")

#calculate the centroids of each polygon
trueCentroids = gCentroid(p, byid=TRUE)


#I like to whack the centroids in as extra columns
p$lat = coordinates(trueCentroids)[,2]
p$lon = coordinates(trueCentroids)[,1]

#The 'average' lat and lon is what I want. It is the centre 
#of mass of all polygons with that id. 
#Of course, if there is only a single polygon with that id,
#the average lat/lon is just the regular centroid, so I 
#initialize my centroid to that first
p$avglat = p$lat
p$avglon = p$lon


#Now go through and calculate the 'average' centroid
#for all the tricky areas that are made up of more than
#1 polygon
for(i in seq_along(p$tz06)) 
{
    #find all the polygons with the same id
    matches = p$tz06 == p$tz06[i]
        
    #if there is more than one such polygon
    if(sum(matches) > 1)
    {
        #we weight all of the centroids by area,
        #because we want the centre of mass
        areas = matches*p$Area_sqm
        weights = areas / sum(areas)

        p$avglat[i] = sum(p$lat*weights)
        p$avglon[i] = sum(p$lon*weights)
    }
}


I'm sure this is not the nicest way of doing this, so if you know a cleaner way (hell, there may even be a built-in function for this already!), please let me know. Also, I'm hopeless with regard to functional programming (my brain thinks procedurally), so I've done the above with a loop, but if you can point out to me a nice functional way of achieving the above with map/filter/fold/etc, I'd be keen to hear about that too.
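
For what it's worth, here is one loop-free sketch using base R's ave(), which applies a function within groups and hands back a vector the same length as its input. It runs on a small made-up data frame rather than the real shapefile, and weighted_by_group is a helper name I've invented for illustration. The column names (tz06, lat, lon, Area_sqm) are the same ones used above.

```r
# Toy stand-in for the polygon attribute table; the values are made up.
d <- data.frame(
  tz06     = c(1, 1, 2),
  lat      = c(0, 10, 5),
  lon      = c(0, 20, 7),
  Area_sqm = c(1, 3, 2)
)

# Weighted mean of x within each group g, using weights w.
# ave() computes the per-group sums and repeats each group's
# result for every member of that group.
weighted_by_group <- function(x, w, g) {
  ave(x * w, g, FUN = sum) / ave(w, g, FUN = sum)
}

# Area-weighted 'centre of mass' for every id, no explicit loop
d$avglat <- weighted_by_group(d$lat, d$Area_sqm, d$tz06)
d$avglon <- weighted_by_group(d$lon, d$Area_sqm, d$tz06)
```

On the real data the same helper should slot in as p$avglat = weighted_by_group(p$lat, p$Area_sqm, p$tz06), and likewise for p$avglon. An id with only one polygon just gets its own centroid back, so the special case in the loop isn't needed.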



Saturday, February 9, 2013

The disaster that is Google Drive

My wife is a smart lady. Not particularly interested in technology, but no luddite either. She recently signed up with Google Drive so that pictures and videos of our kids could be backed up online.

Big mistake.

Files mysteriously went missing from our home computer's LOCAL folders (i.e. the folders of images and videos she wanted to sync). In fact many of the local folders were empty! Some had strange duplicates (for example, a "Pictures" and a "Pictures(1)" folder). This was quite a shock when she first discovered it -- "Google Drive is deleting the precious files I want to back up!!!". Looking into it, we discovered that other people have had the same problem.

We were able to find some of our 'missing' files in the "Trash" on our local PC, but many were not there. That alarmed us even more: a lot of really precious stuff was simply gone from our local computer. My wife also assured me she wasn't doing anything strange like moving stuff around and deleting stuff while trying to back it up.

This seems really bizarre behaviour for a cloud storage solution. It should never delete stuff from your local computer for you. Or perhaps there are a few cases where it should (if you have manually deleted those items on another computer that syncs with your drive account), but if it does, it should be really careful about it. My wife didn't do anything to suggest to Google Drive that it should delete those items.


So, in a panic we shut off Google Drive on our home computer and went to the web interface to see what was there. Alarmingly, all the folders we looked at were empty! Real panic stations now... but looking at our usage, we found that we were using ~20GB of Drive storage. Digging a little further, I discovered that I could click on the (hard to find) "All Items" link on the web interface and we could see everything that was uploaded (actually I think this means that it was moved to the online Google Drive Trash, but online Trash fortunately has to be emptied manually). I manually downloaded these (the interface to do this is terrible, as an added bonus, and you can only download in 2GB chunks). End result is that I think I've recovered our stuff. But the first thing I will do when I've confirmed that I've recovered everything is delete our Google Drive account and sign up with Box or Dropbox or someone (anyone!) else.


Note that I am a big user of Google products, and I'm generally pretty disposed to liking Google: my work uses Gmail for corporate email, I use it for my personal email, share videos via YouTube, and so on. Heck, this blog is run by Google. But Drive has been a massive headache for us.

Thinking of using Google Drive? Just don't do it.