Optimize this?

The main loop query of my waypoint generator app is kind of hairy. And trying to do an “explain” on a typical query shows why it’s so slow.

explain SELECT a.id, c.pdb_id, internalid, a.type, name,
address, state, country, latitude, longitude,
declination, main_frequency, elevation,
b.category, chart_map, tpa, ispublic
FROM waypoint a, type_categories b,
id_mapping c
WHERE a.type = b.type AND
a.id = c.id AND
country in (‘US’, ‘CA’) AND
(a.type in (‘AIRPORT’, ‘VOR’, ‘NDB’) or
(category = 3 and (chart_map & 7) != 0)) AND
deletedon is null;
QUERY PLAN
——————————————————————————————————————————————————————–
Hash Join (cost=4442.82..19955.46 rows=34752 width=111)
Hash Cond: ((a.id)::text = (c.id)::text)
-> Hash Join (cost=4.83..12938.32 rows=34752 width=107)
Hash Cond: ((a.”type”)::text = (b.”type”)::text)
Join Filter: (((a.”type”)::text = ANY ((‘{AIRPORT,VOR,NDB}’::character varying[])::text[])) OR ((b.category = 3) AND (((a.chart_map)::integer & 7) <> 0)))
-> Seq Scan on waypoint a (cost=0.00..10759.48 rows=72467 width=103)
Filter: ((country = ANY (‘{US,CA}’::bpchar[])) AND (deletedon IS NULL))
-> Hash (cost=4.37..4.37 rows=37 width=15)
-> Seq Scan on type_categories b (cost=0.00..4.37 rows=37 width=15)
-> Hash (cost=2091.77..2091.77 rows=127777 width=12)
-> Seq Scan on id_mapping c (cost=0.00..2091.77 rows=127777 width=12)
(11 rows)

Adding indexes on country and type doesn’t help. There is still that nasty looking “Seq Scan on waypoint a” line. And also, another “Seq Scan on id_mapping c”, which I don’t understand at all because the joining column, c.id, is a primary key, so shouldn’t there be an index involved?

I’ve got a few ideas on how to use the spatial capability of PostGIS to improve that query, so I’m going to have to run a few tests. The first few ideas I’ve had aren’t showing major improvements in “explain”. It looks like the whole “type in … or ((chart_map & NN) != 0)” is going to force a sequential scan on waypoint no matter what I do. Hmmm.

2 thoughts on “Optimize this?”

  1. yes, as soon as you apply a function to a column in (most) databases you will force a tablescan. The inequality may also force a tablescan. I think the way around this is to precompute the column (i.e. have a column whose value is 1 if ((chart_map & 7) != 0) ). (Oracle, at least, allows function indexes). I don’t know which parts of the query are constants, and which parts are inputs, so that’s my best suggestion for now. If this column really is a bitmap, think about exploding it into separate columns of type char(1).

    The other thing to be aware of is that if you’re going to hit multiple indexes with your comparators then the optimizer may also decide that it’s faster to simply do a tablescan because hitting multiple indexes, and merging the results is slower than accessing the base data.

    On a point of style you should always qualify which table the columns come from; you might know that category is in b and chart_map is in a, but the programmer that comes after you might not (and by creating a column of the same name in one of the other tables will break your code without even touching it).

    final thought… if some of the comparators radically restrict the data set, then you might try using subselects.

  2. No idea if this will have any effect, but try changing that “a.type in (blah)” to “b.type in (blah)”.

Comments are closed.