Geocoding is hard…

One of the problems I’m having with this data load is that instead of telling you what country each waypoint is in, they tell you the “responsible authority”. Ok, normally that’s not too hard to map to a country, even though sometimes there are multiple authorities for a country (and the Czech Republic is super annoying because they designate every little flying club or airport owner as a “responsible authority”). That I can take care of with a simple lookup table – 305 entries, 90 of them in the Czech Republic. The problem occurs because sometimes the “responsible authority” covers multiple countries: “Serbia/Montenegro” in the Balkans, “Comoros/Madagascar/Reunion” in the Indian Ocean, “Aruba/Netherlands Antilles” in the Caribbean, and “Kiribati/Tuvala”, “Kiribati/Line Islands” and “American Samoa/Western Samoa” in the Pacific. (Although didn’t I read somewhere that the Netherlands Antilles recently split up into a bunch of separate countries?) Anyway, I want to disambiguate these and determine which country each point in these merged authorities is in.
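Here is a rough sketch of what such a lookup table could look like in Python. Everything in it is illustrative: the names `AUTHORITY_TO_COUNTRIES`, `candidate_countries` and `resolve` are made up, the “Canada” row is an invented example of an unambiguous authority, and the ISO codes for the merged authorities are a guess at the candidate countries rather than values from the data.

```python
# Hypothetical lookup table: authority name -> set of candidate
# ISO 3166-1 alpha-2 country codes. Only a few illustrative rows are
# shown; the real table would have 305 entries.
AUTHORITY_TO_COUNTRIES = {
    "Canada": {"CA"},  # invented example of a one-country authority
    "Serbia/Montenegro": {"RS", "ME"},
    "Comoros/Madagascar/Reunion": {"KM", "MG", "RE"},
    "American Samoa/Western Samoa": {"AS", "WS"},
}


def candidate_countries(authority):
    """Return the set of countries an authority may cover (empty if unknown)."""
    return AUTHORITY_TO_COUNTRIES.get(authority.strip(), set())


def resolve(authority):
    """Return the country code if the authority is unambiguous, else None."""
    candidates = candidate_countries(authority)
    if len(candidates) == 1:
        return next(iter(candidates))
    return None  # unknown or merged authority: needs a geographic lookup
```

Storing a set of candidates instead of a single code means the merged authorities stay in the same table, and whatever geographic lookup comes next only has to choose among two or three countries.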

First I thought I’d look for the closest point in my existing database. Turns out, some of the new points are near borders so I end up getting the wrong country. Aha, I thought, I’ll use “Reverse Geocoding”. A while back I used a service at geonames.org to reverse geocode some points to determine which Canadian province they were in. I tried it, and the service is really slow to respond. So I thought I’d try Google’s new reverse geocoding. That’s when I discovered a couple of flies in my oatmeal:

  1. There are locations in the world where Google returns no results. In one case, that seems to be because the point is slightly offshore according to Google Maps (although if you switch to satellite view you can see the point is actually on land). Another case is more puzzling: yes, it’s in Kosovo, so maybe it’s disputed territory, but it’s not too far from the village of Lluge, which Google does recognize.
  2. Addresses in Kosovo show up in the “formatted_address” field as “Lluge, Kosovo”, but the country code that is returned is Serbia’s. The data I’ve used before comes from the US government, and since the US government officially recognizes Kosovo, it would be inconsistent to label the new stuff as being from Serbia instead of Kosovo.
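Both quirks can be handled when parsing the response. The sketch below assumes the JSON has already been fetched and decoded into a dict with the usual `status`, `results`, `formatted_address` and `address_components` fields. The function name, the override table and the choice of “XK” for Kosovo are assumptions, and `sample` is a hand-written stand-in for the Lluge case, not a captured response.

```python
# Assumption: treat anything whose formatted address ends in "Kosovo"
# as Kosovo, using the user-assigned code "XK".
DEFAULT_OVERRIDES = {"Kosovo": "XK"}


def country_from_google(response, overrides=None):
    """Pick a country code out of a decoded Google-style geocoding response.

    Returns None when the service found nothing (item 1 above).
    `overrides` maps the last part of formatted_address to the code
    we want to use instead of the one returned (item 2 above).
    """
    if overrides is None:
        overrides = DEFAULT_OVERRIDES
    if response.get("status") != "OK" or not response.get("results"):
        return None
    result = response["results"][0]
    # Last comma-separated piece of the formatted address, e.g. "Kosovo".
    last_part = result.get("formatted_address", "").split(",")[-1].strip()
    if last_part in overrides:
        return overrides[last_part]
    # Otherwise trust the component tagged as the country.
    for component in result.get("address_components", []):
        if "country" in component.get("types", []):
            return component["short_name"]
    return None


# Hand-written sample mirroring the Lluge case described above.
sample = {
    "status": "OK",
    "results": [{
        "formatted_address": "Lluge, Kosovo",
        "address_components": [
            {"long_name": "Lluge", "short_name": "Lluge",
             "types": ["locality", "political"]},
            {"long_name": "Serbia", "short_name": "RS",
             "types": ["country", "political"]},
        ],
    }],
}
```

With the default overrides, `country_from_google(sample)` picks Kosovo from the formatted address instead of the Serbian country component, and an empty result set comes back as `None` so the caller can fall back to something else.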

Oh, and geonames.org? It eventually seems to do the right thing for both of the above cases, although the country code it returns for Kosovo is “XK” (it appears that there isn’t an official ISO country code for Kosovo; I’d previously seen “KS”). I guess I’ll have to experiment more.

One thought on “Geocoding is hard…”

  1. I saw something that hinted at KV being assigned to Kosovo, but further reading revealed it hasn’t been assigned. XK is a “user assigned” code in ISO 3166, so you could always use XK and correct it later.

    As for falling outside the polygons, for the unassigned points can you use nearest neighbour or even flag them for manual intervention? Perhaps you could use something like Amazon’s Mechanical Turk service (https://www.mturk.com/mturk/welcome) and get people to manually geocode the problem points for you.
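The nearest-neighbour-or-flag idea could be sketched roughly like this. It is only an illustration: the function names, the tuple layout of the known points and the 25 km cut-off are all assumptions, and restricting the search to the authority’s candidate countries does not by itself cure the near-a-border problem mentioned in the post.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_or_flag(point, known, candidates, max_km=25.0):
    """Return (country, needs_review) for a point the geocoder could not place.

    point      -- (lat, lon) of the unresolved waypoint
    known      -- iterable of (lat, lon, country_code) already in the database
    candidates -- countries the responsible authority could cover
    max_km     -- hypothetical cut-off beyond which the guess is not trusted
    """
    best = None  # (distance_km, country) of the closest acceptable point
    for lat, lon, country in known:
        if country not in candidates:
            continue  # ignore neighbours outside the authority's countries
        d = haversine_km(point[0], point[1], lat, lon)
        if best is None or d < best[0]:
            best = (d, country)
    if best is None or best[0] > max_km:
        return None, True  # nothing close enough: flag for manual review
    return best[1], False
```

Anything that comes back with `needs_review` set to `True` would go on the list for a human to look at, whether that is the author or a crowd of Mechanical Turk workers.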
