PostgreSQL woes

I was up to 2:30am last night, and up again at 8:30, working on a problem with PostgreSQL. I spent a week and a half consolidating the data that comes from OurAirports.com and my existing data, and trying to figure out who was right when they disagree. I finally got that finished up at around 8pm last night, but didn’t load it on the production machine until after I got back from a party around midnight.

That’s when I discovered the horrible truth about PostgreSQL: in spite of the fact that the database was created with “ENCODING=’UNICODE'”, if you actually store Unicode data in a field and then do a “UPPER” on the field, it fails horribly. Everything else seems to work fine with the Unicode data, you can store, retrieve, update, but trying to do “UPPER” (and probably “LOWER”) fails horribly. And there is no work-around, like a “UPPER”-equivalent that works with Unicode. The only way to fix it is to wipe the entire PostgreSQL system, reinitialize it with your “locale” set to a Unicode-compliant locale like “en_US.UTF8”, and then restore everything from backup.

Restoring from backup wouldn’t be so hard if you had any clue where the PostgreSQL data is actually stored. Last time I had to do anything like this, when you logged in as the postgres user, your environment variable “PGDATA” had that information, but that’s not true in Debian or Ubuntu. I think it’s some weird thing they do so they can run multiple versions of PostgreSQL at the same time. Just for the record, I found it in /var/lib/postgresql/8.1/main.

While I was at it, I removed the “Donate using the Amazon Honor System” button. Amazon discontinued it, and it’s no great loss because I only got about $10 every two or three months (of which Amazon took nearly a buck) through it. Hopefully pointing out that it’s gone will remind people to donate.

Anyway, everything is up, my scripts are running, I have a lot more data, (some of it of questionable validity), and I’m getting ready to go work out.