That was worrisome

For reasons I don’t remember, I did a “mdadm --detail /dev/md0” on my home Linux server and noticed that the RAID was busy quietly rebuilding itself. That prompted me to try the same command on my dom0 on my colo box, and what I discovered there was even worse – the second disk on my RAID-1 (mirror) was marked as a “spare” and some other status that indicated that it wasn’t rebuilding, and the mirror disk was marked as missing.

I removed the second disk from the RAID and re-added it, and it went to the status “spare, rebuilding” and the RAID status was “active, degraded, rebuilding”, and some hours later it was back up and happy.
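A check like this is easy to script so that it doesn't depend on running the command by chance. What follows is only a minimal Python sketch, assuming the array status appears on a line of the form "State : flag, flag, ..." in the output of “mdadm --detail”. The function names and the sample text are made up for illustration, and the sample reuses the status wording quoted above rather than a captured log.

```python
def array_state(detail_text):
    """Return the flags found on the 'State :' line of the given text."""
    for line in detail_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "State":
            return [flag.strip() for flag in value.split(",") if flag.strip()]
    return []


def needs_attention(detail_text):
    """True if the array reports 'degraded' or no state line was found."""
    flags = array_state(detail_text)
    return not flags or "degraded" in flags


# Hypothetical excerpt, trimmed to the lines this sketch cares about.
sample = """/dev/md0:
     Raid Level : raid1
          State : active, degraded, rebuilding
"""

if __name__ == "__main__":
    print(array_state(sample), needs_attention(sample))
```

Feeding it the real command output (for example from a cron job) and mailing yourself when needs_attention is true only helps, of course, if the mail actually gets delivered.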

During that time, I discovered that there had been a few emails about SMARTD problems and RAID problems, but because I had set up exim wrong, they weren’t getting delivered. I tried a few things to get exim set up, and when they didn’t work I decided that, since I know how to set up postfix just fine, I’d switch: I uninstalled exim, installed postfix, and got it configured in less time than it took for the RAID to rebuild.

The fact that the RAID degraded in the first place gives me pause, but the fact that I was able to recover it without any downtime makes me happy that I chose to do a RAID in the first place. I’ll keep an eye on it and maybe order a replacement disk or two so I’m ready if something fails again.

More geo coding

I got the airport data nailed down, at least all the stuff I need for iPhone CoPilot (which unlike the other databases I provide doesn’t care about communications frequencies or runways). And now I’m looking at “waypoints”, the points in space, sometimes defined by the intersection of a specific radial or bearing from this navigation aid and a specific radial or bearing from that navigation aid, sometimes a distance and radial from one navigation aid, or in the case of GPS instrument approaches and air routes, just points in space.

The difficulty with waypoints is that their definition in the file doesn’t have any sort of location information other than latitude and longitude, which means I have to hit the geonames server for every one (and so far I’ve gone over my hourly limit with them multiple times while testing this code), and that sometimes they, unlike airports, can be out in the middle of the ocean somewhere. So the geonames “countrySubdivision” service just says “I have no idea what country this is in”.

Unfortunately, my code doesn’t like it when a point isn’t in a country. I need to assign every point a two-letter country code. (I use the FIPS 10.4 code instead of ISO-3166-1 because my first world data came from DAFIF, which used FIPS 10.4, and I stuck with it. I’d probably switch to ISO-3166-1 except I have no idea how to do it painlessly.)

In my program to load FAA data, I do some messing around trying to map the country names they use to FIPS 10.4, and sometimes I’ve done some things I’m not proud of, like mapping “French West Indies” to “GP” (the code for Guadeloupe, which is just one of the four territories that make up the French West Indies) or “Trust Territories” to “JQ” (the code for Johnston Atoll) – that one is really dodgy because the “Trust Territories” were broken down into the Republic of the Marshall Islands (“RM”), the Federated States of Micronesia (“FM”), The Commonwealth of the Northern Mariana Islands (“CQ”) and the Republic of Palau (“PS”). Actually if I looked through the FAA data these days, I’d probably find they never use the name “Trust Territories” any more. Another one that comes up is the United States Minor Outlying Islands, which has an ISO-3166-1 code “UM”, but which consists of 9 separate “insular areas” that have their own FIPS 10.4 codes.

So my thought was to ask the geonames “ocean” service what body of water these points are in, and then make up a phoney country code for each ocean. Unfortunately there aren’t just a few oceans, there are dozens of them – everything from the Arabian Sea to the South Pacific Ocean. So many that I can’t come up with semi-mnemonic identifiers for them. So using the fact that FIPS 10.4 codes never start with O or X, I just went through and assigned anything with “Ocean” in the name a code starting with “O” and anything else a code starting with “X”. It sucks, but it will work. Sort of. I hope.
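To make the O/X scheme concrete, here is a rough Python sketch. The prefix rule is the one described above. Everything else is an assumption made for illustration: the function name, and especially the way the second letter is handed out (alphabetically, in sorted name order), since nothing above says how the second letter gets picked.

```python
import string


def assign_water_codes(names):
    """Map each body-of-water name to a made-up two-letter code.

    Names containing "Ocean" get a code starting with "O"; everything else
    gets a code starting with "X". The second letter is handed out in
    order (A, B, C, ...), which is an assumption of this sketch.
    """
    next_letter = {
        "O": iter(string.ascii_uppercase),
        "X": iter(string.ascii_uppercase),
    }
    codes = {}
    for name in sorted(set(names)):
        prefix = "O" if "Ocean" in name else "X"
        try:
            codes[name] = prefix + next(next_letter[prefix])
        except StopIteration:
            # Only 26 second letters are available per prefix in this sketch.
            raise ValueError(f"more than 26 names need prefix {prefix!r}") from None
    return codes


if __name__ == "__main__":
    print(assign_water_codes(["Arabian Sea", "South Pacific Ocean"]))
```

With only letters in the second position, each prefix runs out after 26 names, so a real list with dozens of seas would need digits as well or some other way of stretching the code space.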

The long term solution is to separate the code I use for iPhone CoPilot further from the other navaid.com code, and not require a non-null country code in iPhone CoPilot. I’d also like to migrate to ISO-3166-1 country codes.

More geocoding nonsense

I had to go back to using geonames.org because of the problems I’d already told you about with Google’s geocoder. But geonames.org has a very strange bug. I’d experimented, and found that sometimes it didn’t return anything, especially for something like a point just off-shore of a small island nation. You’re supposed to be able to feed it a “radius” so it can apply some slop, and sure enough, applying a radius of 25 or so made sure that those points were getting a result. But that’s when I discovered that it was returning the wrong result for places like Pelee Island, which as I’m sure you’re all aware is a tiny little island in Lake Erie that’s part of Ontario, but is actually closer to Ohio. If you asked geonames for the country and subdivision with no radius, it would return Ontario, CA. But if you gave it a radius of 25, it would return Ohio, US. So I’ve got a dilemma – choose too small a radius, and it won’t find anything for some points, but choose too big a radius, and for some points it will return entirely the wrong thing.

So this is what I’m stuck with – I ask geonames for the country and subdivision with a radius of 1. If it doesn’t find anything, my code multiplies the radius by 5, sleeps for 250 milliseconds (to be nice to the geonames server) and tries again. So far that has found a result 749 times with a radius of 1, 9 times with a radius of 5, and 3 times with a radius of 25. It’s not a good thing – obviously it would be better if geonames returned the right thing the first time, but I’ve done a number of spot checks and it seems to be working.
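The retry loop is simple enough to sketch in a few lines of Python. This is an illustration of the idea, not the actual loader code: the function name is invented, the lookup is a caller-supplied function standing in for the geonames request (no real API call is shown), and the upper limit of 25 on the radius is an assumption based on the largest radius that has been needed so far.

```python
import time


def lookup_with_growing_radius(lookup, lat, lon, start=1, factor=5,
                               max_radius=25, pause=0.25, sleep=time.sleep):
    """Call lookup(lat, lon, radius) with a growing radius until it answers.

    `lookup` must return a result, or None when the service finds nothing.
    Returns (result, radius_used), or (None, None) if every radius up to
    max_radius came back empty.
    """
    radius = start
    while radius <= max_radius:
        result = lookup(lat, lon, radius)
        if result is not None:
            return result, radius
        radius *= factor
        if radius <= max_radius:
            sleep(pause)  # be nice to the server before trying again
    return None, None


if __name__ == "__main__":
    # Stand-in lookup that only "finds" something once the radius reaches 5.
    def fake_lookup(lat, lon, radius):
        return ("CA", "ON") if radius >= 5 else None

    print(lookup_with_growing_radius(fake_lookup, 41.8, -82.7, pause=0))
```

Starting small and only widening on a miss is what keeps the Pelee Island kind of error rare: a wide radius is only used for points where the narrow one found nothing at all.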

Geocoding is hard…

One of the problems I’m having with this data load is that instead of telling you what country each waypoint is in, they tell you the “responsible authority”. Ok, normally that’s not too hard to map to a country, and sometimes there are multiple authorities for a country (and the Czech Republic is super annoying because they designate every little flying club or airport owner as a “responsible authority”). That I can take care of with a simple lookup table – 305 entries, 90 of them in the Czech Republic. The problem occurs because sometimes the “responsible authority” covers multiple countries. “Serbia/Montenegro” in the Balkans, “Comoros/Madagascar/Reunion” in the Indian Ocean, “Aruba/Netherlands Antilles” in the Caribbean, “Kiribati/Tuvala”, “Kiribati/Line Islands”, “American Samoa/Western Samoa” in the Pacific. (Although didn’t I read somewhere that the Netherlands Antilles recently split up into a bunch of separate countries?) Anyway, I want to disambiguate these and determine which country points in these merged authorities are in.
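In code, the split between the easy cases and the merged authorities might look something like the Python sketch below. It is an illustration only: the table is a tiny excerpt using country names rather than codes, the “Canada” entry is a hypothetical stand-in for the single-country case, and the names country_for and reverse_geocode are made up. The reverse geocoder is passed in by the caller, so no particular service is assumed.

```python
# Excerpt of an authority -> candidate countries table. The merged entries
# come from the examples above; "Canada" is a hypothetical single-country row.
AUTHORITY_COUNTRIES = {
    "Canada": ["Canada"],
    "Serbia/Montenegro": ["Serbia", "Montenegro"],
    "Comoros/Madagascar/Reunion": ["Comoros", "Madagascar", "Reunion"],
    "American Samoa/Western Samoa": ["American Samoa", "Western Samoa"],
}


def country_for(authority, lat, lon, reverse_geocode):
    """Pick the country for a point, given its responsible authority.

    Single-country authorities are answered straight from the table. For
    merged authorities, reverse_geocode(lat, lon) is asked, and its answer
    is accepted only if it is one of the candidates. Returns None when the
    point still can't be placed.
    """
    candidates = AUTHORITY_COUNTRIES[authority]
    if len(candidates) == 1:
        return candidates[0]
    found = reverse_geocode(lat, lon)
    return found if found in candidates else None


if __name__ == "__main__":
    print(country_for("Serbia/Montenegro", 42.4, 19.3,
                      lambda lat, lon: "Montenegro"))
```

Checking the geocoder's answer against the candidate list is a design choice of this sketch: it turns a surprising answer into a None that can be reviewed by hand instead of silently loading the wrong country.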

First I thought I’d look for the closest point in my existing database. Turns out, some of the new points are near borders so I end up getting the wrong country. Aha, I thought, I’ll use “Reverse Geocoding”. A while back I used a service at geonames.org to reverse geocode some points to determine which Canadian province they were in. I tried it, and the service is really slow to respond. So I thought I’d try Google’s new reverse geocoding. That’s when I discovered a couple of flies in my oatmeal:

  1. There are locations in the world where Google returns no results, in one case I saw because the point is slightly off shore according to Google Maps (although if you switch to satellite view you can see the point is actually on land). In another case, the result is puzzling – yes, it’s in Kosovo so maybe it’s disputed territory, but it’s not too far from the village of Lluge which Google does recognize.
  2. Addresses in Kosovo show up in the “formatted_address” field as “Lluge, Kosovo”, but the country code that is returned is Serbia. The data I’ve used before comes from the US government, and since the US government officially recognizes Kosovo, it would be inconsistent to label the new stuff as from Serbia instead of Kosovo.

Oh, and geonames.org? It eventually seems to do the right thing for both of the above cases, although the country code it returns for Kosovo is “XK” (it appears that there isn’t an official ISO country code for Kosovo, and I’d previously seen “KS”). I guess I’ll have to experiment more.
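For comparison, the first idea above (borrow the country of the closest point already in the database) can be sketched in Python as follows. This is a generic nearest-neighbour search using the haversine great-circle distance, not the loader's actual code. The function names, the made-up coordinates, and the spherical Earth radius of 6371 km are all assumptions of the sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_country(lat, lon, known_points):
    """Return the country of the closest (lat, lon, country) entry."""
    best = min(known_points,
               key=lambda p: haversine_km(lat, lon, p[0], p[1]))
    return best[2]


if __name__ == "__main__":
    # Made-up points: two known locations in countries "A" and "B".
    known = [(0.0, 0.0, "A"), (0.0, 2.0, "B")]
    print(nearest_country(0.0, 0.9, known))
```

The weakness is visible in the example: the answer depends only on which known point is closer, not on where the border runs, so a new point just inside one country gets the other country's label whenever the nearest known point sits across the line.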

More on this data loader program

Well, I profiled a smaller data set and found a place where I was wasting a significant amount of time while processing nodes that I don’t care about. I’ve modified the code and I stopped the perl program (after 6098 minutes elapsed, 3491 minutes user, 2584 minutes system) and I’ve re-run it, and it finished in 16 minutes 30 seconds elapsed, 16 minutes 10 seconds user, 10 seconds system. Meanwhile, I’ve written a Java program that does the same stuff that the perl program does (like I said in my previous post, the perl program doesn’t actually do any loading or anything useful, it just parses one of the types of nodes that I’m interested in and prints out what it’s found) and it ran the whole file in 17 minutes 38 seconds elapsed, 6 minutes 31 seconds user and 10 minutes 9 seconds system.

So the upshot of this is that I guess I’m going to stick to perl.