More geocoding nonsense

I had to go back to using geonames.org because of the problems I’d already told you about with Google’s geocoder. But geonames.org has a very strange bug. I’d experimented, and found that sometimes it didn’t return anything, especially for something like a point just offshore of a small island nation. You’re supposed to be able to feed it a “radius” so it can apply some slop, and sure enough, a radius of 25 or so was enough to get a result for those points. But that’s when I discovered that it was returning the wrong result for places like Pelee Island, which as I’m sure you’re all aware is a tiny little island in Lake Erie that’s part of Ontario, but is actually closer to Ohio. If you asked geonames for the country and subdivision with no radius, it would return Ontario, CA. But if you gave it a radius of 25, it would return Ohio, US. So I’ve got a dilemma: choose too small a radius, and it won’t find anything for some points, but choose too big a radius, and for some points it will return entirely the wrong thing.

So this is what I’m stuck with: my code asks geonames for the country and subdivision with a radius of 1. If that doesn’t find anything, it multiplies the radius by 5, sleeps for 250 milliseconds (to be nice to the geonames server) and tries again. So far, 749 points got a result at a radius of 1, 9 points at a radius of 5, and 3 points at a radius of 25. It’s not ideal; obviously it would be better if geonames returned the right thing the first time. But I’ve done a number of spot checks and it seems to be working.
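
For anyone who wants to try the same trick, here is a minimal Python sketch of that retry loop. It is not my actual code: `find_subdivision` and the `lookup` callable are made-up names, `lookup` stands in for whatever function actually calls the geonames service, and the cap of 25 on the radius is an assumption based on the largest radius that has been needed so far.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical signature for the function that queries the geocoder:
# (lat, lon, radius) -> (country, subdivision), or None if nothing was found.
Lookup = Callable[[float, float, int], Optional[Tuple[str, str]]]


def find_subdivision(
    lat: float,
    lon: float,
    lookup: Lookup,
    start_radius: int = 1,
    factor: int = 5,
    max_radius: int = 25,  # assumed cap; bigger radii risk wrong answers
    pause: float = 0.25,   # seconds to wait between attempts
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Tuple[Tuple[str, str], int]]:
    """Try the smallest radius first and widen it only when nothing is found.

    Returns ((country, subdivision), radius_used), or None if every
    radius up to max_radius came back empty.
    """
    radius = start_radius
    while True:
        result = lookup(lat, lon, radius)
        if result is not None:
            return result, radius
        if radius * factor > max_radius:
            return None
        radius *= factor
        sleep(pause)  # be nice to the server before retrying
```

Returning the radius along with the result makes it easy to tally how often each radius was needed, and to spot check the points that only resolved at the larger radii, since those are the ones most likely to be wrong.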

Geocoding is hard…

One of the problems I’m having with this data load is that instead of telling you what country each waypoint is in, they tell you the “responsible authority”. Ok, normally that’s not too hard to map to a country, although sometimes there are multiple authorities for a country (and the Czech Republic is super annoying because they designate every little flying club or airport owner as a “responsible authority”). That I can take care of with a simple lookup table – 305 entries, 90 of them in the Czech Republic. The problem occurs because sometimes the “responsible authority” covers multiple countries: “Serbia/Montenegro” in the Balkans, “Comoros/Madagascar/Reunion” in the Indian Ocean, “Aruba/Netherlands Antilles” in the Caribbean, and “Kiribati/Tuvalu”, “Kiribati/Line Islands” and “American Samoa/Western Samoa” in the Pacific. (Although didn’t I read somewhere that the Netherlands Antilles recently split up into a bunch of separate countries?) Anyway, I want to disambiguate these and determine which country each point in these merged authorities is in.
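
To make the lookup table idea concrete, here is a rough Python sketch. The table name, the function name and the “Canada” row are invented for illustration; only the merged authority names come from the data, and the codes are ordinary ISO 3166-1 two-letter codes. An authority that maps to exactly one country is resolved directly, and anything that maps to more than one is flagged as needing a per-point answer.

```python
from typing import Dict, Optional, Tuple

# Illustrative rows only; the real table has 305 entries.
# Each authority maps to the country codes it could cover.
AUTHORITY_COUNTRIES: Dict[str, Tuple[str, ...]] = {
    "Canada": ("CA",),  # hypothetical single-country authority
    "Serbia/Montenegro": ("RS", "ME"),
    "Comoros/Madagascar/Reunion": ("KM", "MG", "RE"),
    "American Samoa/Western Samoa": ("AS", "WS"),
}


def country_for(authority: str) -> Optional[str]:
    """Return the country code if the authority is unambiguous.

    Returns None for a merged authority, meaning the point itself has
    to be looked at. Raises KeyError for an authority missing from the
    table, so gaps in the table are noticed rather than skipped.
    """
    candidates = AUTHORITY_COUNTRIES[authority]
    if len(candidates) == 1:
        return candidates[0]
    return None
```

Keeping the candidate list for the merged authorities also gives a cheap sanity check later: whatever answer comes back for a point ought to be one of the candidates for its authority.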

First I thought I’d look for the closest point in my existing database. Turns out, some of the new points are near borders so I end up getting the wrong country. Aha, I thought, I’ll use “Reverse Geocoding”. A while back I used a service at geonames.org to reverse geocode some points to determine which Canadian province they were in. I tried it, and the service is really slow to respond. So I thought I’d try Google’s new reverse geocoding. That’s when I discovered a couple of flies in my oatmeal:

  1. There are locations in the world where Google returns no results. In one case I saw, that’s because the point is slightly offshore according to Google Maps (although if you switch to satellite view you can see the point is actually on land). In another case, the result is puzzling – yes, it’s in Kosovo so maybe it’s disputed territory, but it’s not too far from the village of Lluge, which Google does recognize.
  2. Addresses in Kosovo show up in the “formatted_address” field as “Lluge, Kosovo”, but the country code that is returned is Serbia. The data I’ve used before comes from the US government, and since the US government officially recognizes Kosovo, it would be inconsistent to label the new stuff as from Serbia instead of Kosovo.

Oh, and geonames.org? It eventually seems to do the right thing for both of the above cases, although the country code it returns for Kosovo is “XK” (it appears that there isn’t an official ISO country code for Kosovo, and I’d previously seen “KS”). I guess I’ll have to experiment more.