Java Thread Locking

Ok, maybe I was a little succinct in my previous post.

You see, we’ve got an architecture where there are 3 or 4 layers of code, each one of which calls the one below it and then gets information back in the form of callbacks. Oh, and one of the very lowest layers is accessed through an RMI interface. Also, the very lowest layer deals with content, which can be created/modified/deleted through the program, or through other programs or just by doing file system stuff, which that lowest layer finds out about through dnotify.

The front end GUI has a dialog where you can delete content. The problem was that evidently one of our customers has the fastest fingers in the world: they complained that they delete content and then go to ingest (slurp in) new content, but the content they just deleted is still there (the deletion process takes a good 10-15 seconds), so the ingest fails due to lack of disk space. So they wanted the deletion to actually wait until it was done. The lower level library actually provided a method called “deleteContentWaitTilDone”, so I thought it would be a simple matter to call it: once the method returned, the content would be really gone.

That’s when my problems started. I spent a week on this damn thing. The sad thing is that if Martin were still around, I could have used his Eclipse debugging skills and gotten this done in half the time. But when I attempted to install Eclipse on my machine, the whole machine locked up every time I fired it up.

The problem seemed to be that the deletion process called callbacks in the higher levels, and ultimately some of them would do GUI stuff, and they’d also call down to the library. I had a hell of a time working out what was the actual problem. I ended up putting System.out.println debugging statements all over the damn place.

What I found first was a bunch of extraneous “synchronized” methods. The problem was that the methods were synchronized to prevent different things: of the 6 synchronized methods in one class, 2 were synchronized to prevent simultaneous accesses to a variable named “childThread”, 2 to prevent simultaneous accesses to the library, and 2 for some other reason. Since synchronized methods all lock on the same object, they were blocking each other for no good reason. So I removed the “synchronized” from the method declarations and protected the important parts with different synchronization objects, one called “childThreadSyncObject” and one called “librarySyncObject”; it turned out the other ones didn’t have to be synchronized at all. Further digging revealed that the code one level above, which called this class, also had a synchronization object, which was redundant, so I removed it.
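To make the lock-splitting idea concrete, here is a minimal sketch. Only the names “childThread”, “childThreadSyncObject” and “librarySyncObject” come from the code I was working on; the class, the methods, and the list standing in for the library are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: one lock per thing being protected, instead of
// "synchronized" methods that all share the lock on "this".
class ContentManager {
    private final Object childThreadSyncObject = new Object();
    private final Object librarySyncObject = new Object();

    // Guarded by childThreadSyncObject.
    private Thread childThread;

    // Stand-in for the real library (an assumption); guarded by librarySyncObject.
    private final List<String> libraryCalls = new ArrayList<String>();

    // Starts the child thread unless one is already running.
    void startChild(Runnable work) {
        synchronized (childThreadSyncObject) {
            if (childThread == null || !childThread.isAlive()) {
                childThread = new Thread(work, "child");
                childThread.start();
            }
        }
    }

    boolean isChildRunning() {
        synchronized (childThreadSyncObject) {
            return childThread != null && childThread.isAlive();
        }
    }

    // Library access uses its own lock, so it never waits on
    // code that only touches childThread.
    void deleteContent(String name) {
        synchronized (librarySyncObject) {
            libraryCalls.add("delete " + name);
        }
    }

    int libraryCallCount() {
        synchronized (librarySyncObject) {
            return libraryCalls.size();
        }
    }
}
```

With this layout, a thread stuck in a slow library call holds only librarySyncObject, so anything that just needs to check on the child thread can still get through.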

The next thing I found was that one of the GUI level callbacks called “fireIntervalChanged” and it never returned. Ever. That’s when I had another epiphany: the callbacks don’t run in the GUI event thread, and the event thread is blocked because it’s waiting on that “deleteContentWaitTilDone”. So I went through all the GUI level code and made all the callbacks do the bulk of their processing in the event thread using SwingUtilities.invokeLater. The standard way to do that is

SwingUtilities.invokeLater(new Runnable()
{
    public void run()
    {
        //..do stuff..
    }
});

but unfortunately run() takes no arguments, so I ended up creating a metric buttload of tiny private classes that implement Runnable and take their arguments in the constructor.
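For the curious, one of those tiny classes looks roughly like the sketch below. The class names, the list standing in for a GUI model, and the interval arguments are all invented for illustration; only SwingUtilities.invokeLater and the Runnable-with-constructor idea are from the real code. The second callback shows an alternative I could probably have used in many places: an anonymous Runnable can read locals and parameters that are declared final.

```java
import java.util.List;
import javax.swing.SwingUtilities;

// Hypothetical Runnable that carries its arguments in fields.
class IntervalChangedTask implements Runnable {
    private final List<String> model; // stand-in for a real GUI model (assumption)
    private final int first;
    private final int last;

    IntervalChangedTask(List<String> model, int first, int last) {
        this.model = model;
        this.first = first;
        this.last = last;
    }

    // Runs on the event thread when queued through invokeLater.
    public void run() {
        model.add("interval changed " + first + ".." + last);
    }
}

// Hypothetical callback receiver; the library calls these on its own thread.
class ContentCallbacks {
    private final List<String> model;

    ContentCallbacks(List<String> model) {
        this.model = model;
    }

    // Variant 1: hand the arguments to a small Runnable class.
    void contentDeleted(int first, int last) {
        SwingUtilities.invokeLater(new IntervalChangedTask(model, first, last));
    }

    // Variant 2: anonymous Runnable capturing final parameters.
    void contentDeletedInline(final int first, final int last) {
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                model.add("interval changed " + first + ".." + last);
            }
        });
    }
}
```

Either way the callback returns immediately and the GUI work happens later on the event thread, which is what keeps the callback from hanging while the event thread is busy.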

After all that work, I finally had stuff working. But unfortunately I neglected something that’s probably important – I didn’t give any sort of dialog or busy cursor or anything while that processing is going on. Oh well, maybe next time.

Ok, that was getting ridiculous

After getting that error on the pt_comments table again, I went into mysql and tried a

check table pt_comments;

and it found an error, so then I did a

repair table pt_comments;

and it said it repaired it, but another

check table pt_comments;

found the same damn error again. So I shut off the web server (so I wasn’t getting hit with comment spams) and did a backup and restore of the entire mysql database, and the problem seems to be gone. For now. Maybe. We’ll see.

I don’t think MySQL liked the upgrade very much

Every time I go into my WordPress SpamKarma page, I get the following error:

Failed to purge comment spam entries.
Query: DELETE `pt_comments`, `pt_sk2_spams` FROM `pt_comments` LEFT JOIN `pt_sk2_spams` ON `pt_sk2_spams`.`comment_ID` = `pt_comments`.`comment_ID` WHERE (`pt_comments`.`comment_approved` = '0' OR `pt_comments`.`comment_approved` = 'spam') AND `pt_comments`.`comment_date_gmt` < DATE_SUB('2006-08-14 11:36:26', INTERVAL 2 HOUR)
SQL error: Incorrect key file for table './wordpress/pt_comments.MYI'; try to repair it

So then I go into mysql, do a “repair table pt_comments”, and repeat the purge operation and it’s fine. But some hours later, when I go back to the page, I get the same error. How do I repair this damn table so it stays repaired? Would doing a full mysqldump and restore fix it?

Server upgrade

I upgraded my server from Fedora Core 4 to Fedora Core 5 using yum. After going through all the .rpmnew and .rpmsave files and fixing configuration files, most things are working. A couple of annoyances:

  • It no longer puts items in /etc/fstab for usb storage devices, so I have to find what device the disk has been assigned and mount it manually. I’m hoping I can find a solution to that.
  • Can somebody please tell me why the people who wrap the PostgreSQL binaries in an RPM can’t figure out how to do a pg_dumpall in the %pre at the beginning of the process, and a restore of the backup in the %post at the end? The init.d script refuses to start up if you have an 8.0.x database on an 8.1.x PostgreSQL, so after the upgrade you have to go “oh, shit, the last backup I made was …” and restore it. If you’re not the sort of person who dumps the database every night to a file on your USB disk, you’re screwed.
  • WordPress was refusing to clear out comment spam because of some index issue, and then claimed that the table had “crashed”. I had to fumble around with mysql_upgrade, mysqlcheck, and myisamchk to get that straightened out.

So far, knock wood, those are the only issues. The PostgreSQL one is, to me, indicative that the Fedora Core team don’t really care about preserving data. I haven’t tried, but I bet you anything the Debian people don’t just say “oh well, if you didn’t back up you’re screwed.”