As I wrote in Rants and Revelations » C Paul Program, C Paul Rant, I’m trying to “recreate” a C program that I wrote back in 2001. I’m using the same shapelib library for reading ESRI shape files, I’m using (what I think is) the same algorithm for doing point in polygon tests, and I’m using the same shape file, one that was published on the ESRI web site back when Nunavut first became a territory. And it seems to be up and running.
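For anybody following along at home, the guts of the new program look roughly like this. The shapelib calls (SHPOpen, SHPGetInfo, SHPReadObject and friends) are the real API; find_province and point_in_shape are just names I’m using for this sketch, and the actual program also looks up the province name in the matching .dbf record, which I’m skipping here:

```c
#include <shapefil.h>

/* Even-odd point-in-polygon test over all the rings of a shape;
 * sketched further down. */
int point_in_shape(const SHPObject *shape, double x, double y);

/* Rough shape of the lookup: open the shape file, read each province
 * polygon, and return the index of the first one containing the
 * point (that index lines up with the .dbf record that names the
 * province).  Returns -1 if no polygon contains the point. */
int find_province(const char *shpfile, double lon, double lat)
{
    SHPHandle hSHP = SHPOpen(shpfile, "rb");
    int nEntities, nShapeType, found = -1;

    if (hSHP == NULL)
        return -1;

    SHPGetInfo(hSHP, &nEntities, &nShapeType, NULL, NULL);

    for (int i = 0; i < nEntities && found < 0; i++) {
        SHPObject *psShape = SHPReadObject(hSHP, i);
        if (psShape == NULL)
            continue;
        if (point_in_shape(psShape, lon, lat))
            found = i;
        SHPDestroyObject(psShape);
    }

    SHPClose(hSHP);
    return found;
}
```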
So I wrote a little perl script that goes through my database, picks out the 3400+ waypoints that are in Canada and come from DAFIF, and runs them through both the old and new programs. Both programs are looking at the same shape file, so you’d expect the exact same results, right? Wrong. There are 309 points that the old program assigns provinces to but that the new program says are outside of any provincial boundary. Some of those are ones I’d already noticed were assigned to provinces even though they’re way out at sea and probably shouldn’t be, but some are airports near the coast or on islands. That’s a problem.
My first suspicion was that the algorithm as published used floats, but I’d converted it to use doubles because, face it, these days computers are so damn fast. But I switched back to floats, and now there are 314 differences. Some of the original 309 are now back to what they were, but some points have jumped provinces, like COMPR, which moved from Alberta to Saskatchewan (which Google Maps just barely agrees with, by the way), and KITAR, which moved from British Columbia to Yukon Territory (which Google Maps says is a toss-up). But most importantly, it didn’t “fix” any of the coastal or island airports that ended up with no province. I’m not convinced that’s a net positive, so I’ve gone back to using doubles.
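I won’t swear this is character-for-character what I started from, but the test in question is the standard even-odd (crossing number) ray test, something like this, with double where the published code had float. The place precision can bite is that slope calculation when the test point sits almost exactly on an edge, which is presumably where near-the-coast points get touchy:

```c
/* Even-odd point-in-polygon test: cast a ray to the right of (x,y)
 * and count how many polygon edges it crosses.  Odd = inside.
 * Returns non-zero if (x,y) falls inside the ring described by the
 * nvert (vx[i], vy[i]) vertices. */
int point_in_ring(int nvert, const double *vx, const double *vy,
                  double x, double y)
{
    int i, j, inside = 0;

    for (i = 0, j = nvert - 1; i < nvert; j = i++) {
        /* Does edge (j -> i) straddle the horizontal line through y,
         * and does it cross that line to the right of the test point? */
        if (((vy[i] > y) != (vy[j] > y)) &&
            (x < (vx[j] - vx[i]) * (y - vy[i]) / (vy[j] - vy[i]) + vx[i]))
            inside = !inside;
    }
    return inside;
}
```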
OK, one other difference: the notes for the algorithm mention a way to compensate for shapes with holes in them by inserting (0.0,0.0) points between the rings. I don’t think I did that before, so I tried the new program without it. No dice – still 309 differences.
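For what it’s worth, there’s another way to cope with the rings that doesn’t need the (0.0,0.0) trick, and it’s what the point_in_shape sketch above assumes: run the single-ring test on each part and XOR the answers. Under the even-odd rule a point sitting in a hole tests “inside” both the outer ring and the hole ring, so the two cancel out and it correctly comes up “outside”. A sketch, using the panPartStart layout shapelib gives you:

```c
#include <shapefil.h>

/* Single-ring even-odd test from the earlier sketch. */
int point_in_ring(int nvert, const double *vx, const double *vy,
                  double x, double y);

/* Test each ring (part) of the shape separately and XOR the results,
 * which gives the even-odd answer for the shape as a whole, holes
 * included. */
int point_in_shape(const SHPObject *shape, double x, double y)
{
    int inside = 0;

    if (shape->nParts == 0)     /* single-ring shape: no part table */
        return point_in_ring(shape->nVertices, shape->padfX,
                             shape->padfY, x, y);

    for (int part = 0; part < shape->nParts; part++) {
        int start = shape->panPartStart[part];
        int end = (part + 1 < shape->nParts)
                      ? shape->panPartStart[part + 1]
                      : shape->nVertices;

        if (point_in_ring(end - start, shape->padfX + start,
                          shape->padfY + start, x, y))
            inside = !inside;
    }
    return inside;
}
```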
And then I remembered one other difference: I call SHPRewindObject on each shape as I read it. According to the docs, that’s supposed to fix shapes whose rings wind the wrong way round. But no, that didn’t change anything either.
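In hindsight that probably shouldn’t surprise me: SHPRewindObject only re-orders the vertices within each ring (outer rings clockwise, holes counter-clockwise, per the shapefile spec), and an even-odd crossing test doesn’t care which way round a ring goes. For the record, the call looks like this; the helper function and the warning message are just for illustration:

```c
#include <stdio.h>
#include <shapefil.h>

/* Read one shape and fix its ring winding.  SHPRewindObject()
 * reverses any ring that winds the wrong way for the shapefile spec
 * and returns non-zero if it actually changed anything. */
SHPObject *read_and_rewind(SHPHandle hSHP, int i)
{
    SHPObject *psShape = SHPReadObject(hSHP, i);

    if (psShape != NULL && SHPRewindObject(hSHP, psShape))
        fprintf(stderr, "shape %d had mis-wound rings\n", i);

    return psShape;
}
```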
So I’m left with a mystery about why there is a difference between the old program and the new one. Since I don’t have a GIS program that can read and manipulate shape files, I think my next step will have to be to turn these shape files into Google Maps API polygons so I can plot these wayward points and see if the problem is in the shapes or in the algorithm.
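If it comes to that, a quick dump along these lines would probably do the job: walk every ring and print it as a JavaScript array of [lat, lng] pairs that I can wrap in whatever polygon constructor the Maps API wants. The file name and the output format here are made up, not anything the Maps API requires:

```c
#include <stdio.h>
#include <shapefil.h>

/* Dump every ring of every shape as a JavaScript array of
 * [lat, lng] pairs, one variable per ring. */
static void dump_rings(const char *shpfile)
{
    SHPHandle hSHP = SHPOpen(shpfile, "rb");
    int nEntities, nShapeType;

    if (hSHP == NULL)
        return;

    SHPGetInfo(hSHP, &nEntities, &nShapeType, NULL, NULL);

    for (int i = 0; i < nEntities; i++) {
        SHPObject *psShape = SHPReadObject(hSHP, i);
        if (psShape == NULL)
            continue;

        int nParts = psShape->nParts > 0 ? psShape->nParts : 1;
        for (int part = 0; part < nParts; part++) {
            int start = psShape->nParts > 0 ? psShape->panPartStart[part] : 0;
            int end = (psShape->nParts > 0 && part + 1 < psShape->nParts)
                          ? psShape->panPartStart[part + 1]
                          : psShape->nVertices;

            printf("var ring_%d_%d = [", i, part);
            for (int v = start; v < end; v++)
                printf("%s[%.6f, %.6f]", v == start ? "" : ", ",
                       psShape->padfY[v], psShape->padfX[v]);
            printf("];\n");
        }
        SHPDestroyObject(psShape);
    }
    SHPClose(hSHP);
}

int main(int argc, char **argv)
{
    if (argc > 1)
        dump_rings(argv[1]);
    return 0;
}
```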