Weird one

A week or two ago, somebody from customer support came to me because certain customer sites were having some weird problems: some sites weren’t seeing new content even though it had been delivered, and couldn’t update their schedules. She reported that restarting our services fixed it on most sites, but on one she’d had to reboot. Unfortunately there wasn’t anything obvious in the logs. The weird thing is that she said the problem started when our content provider moved to new servers, but the content provider says that they didn’t change what they were sending, just which servers they were sending it from.

After a few days, she managed to find one site that was still having the problem. I poked around and still couldn’t find much, except that if I went into the postgres command line, psql, it would let me do anything on the database except query one particular table. If I tried to do anything on that table, it would freeze up. Hmmm. Lacking any other ideas, I shut down the database server and restarted it, and that cleared up the problem. But shutting down the database server also kills off any processes that might be using the database. I was starting to think that two processes were in a deadlock over this one particular table. I filed away the information and asked her to call me if it happened on any other sites.

This morning, she came and said it had happened on several sites at once. She logged me into one of the sites, and sure enough psql would block if I tried to select from that same damn table. Time to dig a little deeper. “select * from pg_locks” showed a couple of exclusive locks. Hmmm. Doing a “ps auwwfx” showed that there was a vacuumdb going on. Uh oh. It’s the nightly backup scripts. A couple of years ago (21Sep2005), I threw a call to “vacuumdb --analyze --full” in there. A bit of googling showed that, duh, you’re not supposed to do a “--full” when anything else is going on, because it takes an exclusive lock on each whole table while it works on it. And JDBC has to use some sort of weird work-around for the fact that Postgres doesn’t give them an easy way to turn autocommit on, so they do the equivalent of a “commit;begin;” after every command. That causes some locking of its own, which is clashing with the vacuumdb locking.
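For anyone chasing the same kind of hang, here is a rough Python sketch of reading pg_locks to see who holds a lock on a table and who is queued behind it. It assumes you have already fetched rows of (pid, table name, mode, granted) using the query in LOCK_QUERY, through psql or any database driver. The LockRow class, the find_contention helper, and the table names and PIDs in the sample are all hypothetical, made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Query to run in psql (or through any driver) to get the rows this sketch expects.
# pid, relation, mode and granted are real pg_locks columns; casting relation to
# regclass turns the table OID into a readable table name.
LOCK_QUERY = """
SELECT pid, relation::regclass::text AS rel, mode, granted
FROM pg_locks
WHERE relation IS NOT NULL
"""


@dataclass
class LockRow:
    """One row of LOCK_QUERY output (hypothetical helper type)."""
    pid: int
    rel: str
    mode: str
    granted: bool


def find_contention(rows):
    """Map each table that has a waiting lock request to (holders, waiters).

    holders and waiters are sorted lists of (pid, mode) tuples; tables where
    every lock is granted are left out.
    """
    by_rel = defaultdict(lambda: ([], []))
    for row in rows:
        holders, waiters = by_rel[row.rel]
        (holders if row.granted else waiters).append((row.pid, row.mode))
    return {rel: (sorted(h), sorted(w)) for rel, (h, w) in by_rel.items() if w}


if __name__ == "__main__":
    # Hypothetical snapshot: a vacuumdb --full holds "content", a psql SELECT waits.
    sample = [
        LockRow(4242, "content", "AccessExclusiveLock", True),
        LockRow(4300, "content", "AccessShareLock", False),
        LockRow(4300, "schedule", "AccessShareLock", True),
    ]
    for rel, (holders, waiters) in find_contention(sample).items():
        print(f"{rel}: held by {holders}, waiting: {waiters}")
```

A plain SELECT takes AccessShareLock, which conflicts only with AccessExclusiveLock, the mode VACUUM FULL holds on the table it is processing. That would be consistent with psql hanging on just the one table while everything else worked.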

In all this time it’s never caused any problems, or at least none with any regularity. But evidently the server relocation has caused some content ingestion to happen at the exact time this vacuumdb is running, and that is causing these particular sites to have semi-regular problems.

I told them to go around to all the customer sites, edit /etc/init.d/backup_cos_files, and remove the “--full” from the vacuumdb command line, and all should be well. I can’t believe I made such a dumb mistake, and that it didn’t cause any problems until now. Actually, I’m sort of hoping it will also solve my other mysterious database lockup, the one that was only happening about once a year per site. But that’s probably too much to hope for.
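If the fix has to be rolled out to a lot of sites, the edit can be scripted. This is a minimal Python sketch that assumes the vacuumdb invocation in /etc/init.d/backup_cos_files sits on a single line and spells the option as a separate `--full` token. It does not handle the short `-f` form or bundled short options. The names drop_vacuum_full and patch_file are illustrative, not existing tools.

```python
import re
from pathlib import Path

# A standalone "--full" token together with the whitespace in front of it.
FULL_FLAG = re.compile(r"\s--full(?=\s|$)")


def drop_vacuum_full(script_text):
    """Return script_text with --full removed from every line that runs vacuumdb.

    Lines that don't mention vacuumdb are passed through untouched.
    """
    out = []
    for line in script_text.splitlines(keepends=True):
        if "vacuumdb" in line:
            line = FULL_FLAG.sub("", line)
        out.append(line)
    return "".join(out)


def patch_file(path="/etc/init.d/backup_cos_files"):
    """Rewrite the backup script in place; return True if anything changed."""
    p = Path(path)
    old = p.read_text()
    new = drop_vacuum_full(old)
    if new != old:
        p.write_text(new)
    return new != old


if __name__ == "__main__":
    # Show the transformation on a sample line without touching any file.
    print(drop_vacuum_full("vacuumdb --analyze --full\n"), end="")
```

Running patch_file twice is harmless. On the second run there is nothing left to remove, so the file is left alone.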

I can’t believe IT departments allow Lotus Notes on their networks

I’m having problems installing new software in my CrossOver Office Windows (non-)emulator, so I’m trying to get a VMware Windows virtual machine working. A coworker gave me an image that’s working for him and suggested that I just use that.

The first thing I did was delete his personal account and create a new one for me. Then I copied my .id file from ./.cxoffice/dotwine/fake_windows/data/notes/data/[foo].id to the appropriate place on the virtual machine. When I fire up Notes, it has my ID in the drop-down, and I can log in with my current password. But when I click the “Mail” item, it shows me my cow orker’s mailbox. Just in case you missed that, let me spell it out for you: I used my password and accessed his email.

I mentioned that to our sysadmin guy, who takes care of the local Unix servers and helps us work around the stupidity of corporate IT (they are responsible for the Windows boxes). He said yes, you can put your Notes ID file on a thumb drive, take it to any Windows box in the company, log in with your password, and read the email of the person who owns that box. Is it just me, or is that just about the stupidest thing you’ve ever heard? Now, I don’t know if that’s a deficiency of Notes or a deficiency of corporate IT, and I don’t particularly care. I’m just boggled.

But accepting that bogglement for the moment, does anybody know how to make Notes forget the person who used to read Notes on this box (who no longer even has an account on it) and let me read mine? The sysadmin says the only way is to remove Notes and reinstall it. But when I try that, I end up with a Notes that doesn’t ask for a password and that complains the mail file wasn’t found when I start it.