Optimize this?

The main loop query of my waypoint generator app is kind of hairy. And trying to do an “explain” on a typical query shows why it’s so slow.

explain SELECT a.id, c.pdb_id, internalid, a.type, name,
address, state, country, latitude, longitude,
declination, main_frequency, elevation,
b.category, chart_map, tpa, ispublic
FROM waypoint a, type_categories b,
id_mapping c
WHERE a.type = b.type AND
a.id = c.id AND
country in (‘US’, ‘CA’) AND
(a.type in (‘AIRPORT’, ‘VOR’, ‘NDB’) or
(category = 3 and (chart_map & 7) != 0)) AND
deletedon is null;
QUERY PLAN
——————————————————————————————————————————————————————–
Hash Join (cost=4442.82..19955.46 rows=34752 width=111)
Hash Cond: ((a.id)::text = (c.id)::text)
-> Hash Join (cost=4.83..12938.32 rows=34752 width=107)
Hash Cond: ((a.”type”)::text = (b.”type”)::text)
Join Filter: (((a.”type”)::text = ANY ((‘{AIRPORT,VOR,NDB}’::character varying[])::text[])) OR ((b.category = 3) AND (((a.chart_map)::integer & 7) <> 0)))
-> Seq Scan on waypoint a (cost=0.00..10759.48 rows=72467 width=103)
Filter: ((country = ANY (‘{US,CA}’::bpchar[])) AND (deletedon IS NULL))
-> Hash (cost=4.37..4.37 rows=37 width=15)
-> Seq Scan on type_categories b (cost=0.00..4.37 rows=37 width=15)
-> Hash (cost=2091.77..2091.77 rows=127777 width=12)
-> Seq Scan on id_mapping c (cost=0.00..2091.77 rows=127777 width=12)
(11 rows)

Adding indexes on country and type doesn’t help. There is still that nasty looking “Seq Scan on waypoint a” line. And also, another “Seq Scan on id_mapping c”, which I don’t understand at all because the joining column, c.id, is a primary key, so shouldn’t there be an index involved?

I’ve got a few ideas on how to use the spatial capability of PostGIS to improve that query, so I’m going to have to run a few tests. The first few ideas I’ve had aren’t showing major improvements in “explain”. It looks like the whole “type in … or ((chart_map & NN) != 0)” is going to force a sequential scan on waypoint no matter what I do. Hmmm.

Back to the drawing board

In order to support a new product development I mentioned in an earlier blog post, I re-did my existing waypoint database as a PostGIS geographical database. I also added some foreign keys and some other cleanup that I’ve been meaning to do for a while. But obviously, I don’t want to support two databases, so today I’ve been converting one of my existing waypoint generator perl scripts to use the new PostGIS database instead of the MySQL database it was on before, but without any actual GIS functionality. And Houston? We have a problem. Doing a full US + Canada data load on the MySQL version takes about a minute and a half. Doing the same load on the PostGIS version takes twenty five minutes. Something tells me that I need to make some adjustments here.

This could work

I probably shouldn’t give too many details, but I’ve been in talks with a certain freeware developer over developing a flight planning application for a web connected hand-held device. (Anybody who knows anything about me can probably guess the developer and the device.)

My part would be a server app that would respond to requests for data from the device and send new data or updates. Nothing too different than what I’ve been doing before, but one of the things we’ve been talking about is managing “areas”. His concept was that if the user entered an id that wasn’t on the device already, my server would send the device a whole “area”, and the device would keep track of what areas it had in memory already, when they were last updated, and would occasionally request updates of the areas it knew. He thought that each area could be a whole country. The first thing that struck me about that is that if the point you asked for was in the US, you could be asking for thousands of waypoints (70,584 in the current database). That could take a long, long time on an Edge network. Then we discussed maybe breaking it down by state or province in the US and Canada.

But the thing is, I used to be a GIS (Geographic Information Systems) programmer. I know there are better ways. At first I started looking around for the HHCode algorithm since I worked with Herman Varma and the Oracle guys implementing the original Oracle “Spatial Data Option”, until that scumbag Jim Rawlings screwed me out of three months pay. But I can’t find the source code anywhere.

So my next idea was a modified quad tree. Basically, when populating the database, I made a “rectangle” that incorporates the whole world and start adding points. When I hit a threshold, I subdivide that “rectangle” into 4 equal sub-rectangles, and move the points into whichever rectangle they belong to. This means that where points are sparse, the rectangles are large, and where they are dense, the rectangles are small. That way I’ve got some consistency in the size of the file to be sent to the device, and I’m not wasting people’s time sending the 19 waypoints in Wake Island, say, as an individual file.

I’ve been experimenting today with PostGIS, which is an extension to Postgresql which adds some very efficient geographic query tools. The program I wrote to take the data from my old MySQL database and put it into the PostGIS database while building these quad cells runs pretty fast. Surprisingly fast, even. PostGIS is pretty capable. Too bad the manual for it sucks rocks.

One thing that I keep forgetting is how much faster computers are now than when I was doing GIS for a living. I keep expecting things to take hours when they end up taking minutes, because the last time I did this sort of thing I was using a 40MHz SPARC and now I’m using a dual core 1.86GHz Intel Core2 Duo, and I’ve got more RAM at my disposal now than I had hard drive space back then.

Anyway, mostly I’m writing this because I’m really enjoying working with GIS-type stuff again. I wish I could do it full time again.