OK, maybe I was a little too succinct in my previous post.
You see, we’ve got an architecture where there are 3 or 4 layers of code, each one of which calls the one below it and then gets information back in the form of callbacks. Oh, and one of the very lowest layers is accessed through an RMI interface. Also, the very lowest layer deals with content, which can be created/modified/deleted through the program, or through other programs or just by doing file system stuff, which that lowest layer finds out about through dnotify.
The front end GUI has a dialog where you can delete content. The problem was that evidently one of our customers has the fastest fingers in the world: they complained that they delete the content and then go to ingest (slurp in) new content, but the content they just deleted is still there (the deletion process takes a good 10-15 seconds), so the ingest fails due to lack of disk space. So they wanted the deletion to actually wait until it was done. And the lower level library actually provided a method called “deleteContentWaitTilDone”. So I thought it would be a simple matter to call it: once the method returned, the content would be really gone.
That’s when my problems started. I spent a week on this damn thing. The sad thing is that if Martin were still around, I could have used his Eclipse debugging skills and got this done in half the time. But when I attempted to install Eclipse on my machine, the whole machine locked up every time I fired it up.
The problem seemed to be that the deletion process called callbacks in the higher levels, and ultimately some of them would do GUI stuff, and they’d also call back down to the library. I had a hell of a time working out what the actual problem was. I ended up putting System.out.println debugging statements all over the damn place.
What I found first was a bunch of extraneous “synchronized” methods. The problem was that the methods were synchronized to prevent different things: of the 6 synchronized methods in one class, 2 were synchronized to prevent simultaneous accesses to a variable named “childThread”, 2 to prevent simultaneous accesses to the library, and 2 for some other reason. Since a synchronized method locks on the whole object, all 6 were blocking each other. So I removed the “synchronized” from the method declarations and protected the important parts with separate synchronization Objects, one called “childThreadSyncObject” and one called “librarySyncObject”; it turned out the other ones didn’t have to be synchronized at all. Further digging revealed that the code one level above, which called this class, also had its own synchronization object, which was redundant, so I removed it.
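Roughly the shape of that change, as a minimal sketch rather than the real code: only the two lock object names come from the actual class, while ContentManagerSketch, its methods and the libraryCalls counter (standing in for the real library) are made up for illustration.

```java
// Hypothetical stand-in for the class described above. Only the names
// childThreadSyncObject and librarySyncObject come from the real code.
final class ContentManagerSketch {
    private final Object childThreadSyncObject = new Object();
    private final Object librarySyncObject = new Object();

    private Thread childThread; // guarded by childThreadSyncObject
    private int libraryCalls;   // guarded by librarySyncObject; stands in for the library

    // Was a "synchronized" method; now it only takes the childThread lock.
    void startChild(Runnable work) {
        synchronized (childThreadSyncObject) {
            if (childThread == null || !childThread.isAlive()) {
                childThread = new Thread(work, "child");
                childThread.start();
            }
        }
    }

    boolean isChildRunning() {
        synchronized (childThreadSyncObject) {
            return childThread != null && childThread.isAlive();
        }
    }

    // Library access takes a different lock, so it no longer waits on
    // anything that is merely touching childThread.
    void callLibrary() {
        synchronized (librarySyncObject) {
            libraryCalls++;
        }
    }

    int libraryCallCount() {
        synchronized (librarySyncObject) {
            return libraryCalls;
        }
    }
}
```

With one lock per thing being protected, a thread stuck in a library call can no longer hold up a thread that only wants to look at childThread.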
The next thing I found was that one of the GUI level callbacks called “fireIntervalChanged”, and that call never returned. Ever. That’s when I had another epiphany: the callbacks aren’t called on the GUI event thread, and the event thread is blocked at that point because it’s waiting on that “deleteContentWaitTilDone”. So I went through all the GUI level code and made all the callbacks do the bulk of their processing in the event thread using SwingUtilities.invokeLater. The standard way to do that is
    SwingUtilities.invokeLater(new Runnable() { public void run() { /* GUI work here */ } });
but unfortunately run() takes no arguments, so I ended up creating a metric buttload of tiny private classes that implement Runnable and take their arguments in the constructor.
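One of those tiny classes looks something like this. It's a sketch under assumptions, not the actual code: CallbackSketch, RemoveContentTask, contentDeleted and the use of a DefaultListModel are all hypothetical; only Runnable and SwingUtilities.invokeLater are the real thing.

```java
import javax.swing.DefaultListModel;
import javax.swing.SwingUtilities;

// Hypothetical GUI-level callback holder; all names here are illustrative.
final class CallbackSketch {

    // Tiny private Runnable that carries its arguments in fields,
    // since run() itself cannot take any.
    private static final class RemoveContentTask implements Runnable {
        private final DefaultListModel<String> model;
        private final String contentName;

        RemoveContentTask(DefaultListModel<String> model, String contentName) {
            this.model = model;
            this.contentName = contentName;
        }

        @Override
        public void run() {
            // Runs on the event thread, so touching the model is safe here.
            model.removeElement(contentName);
        }
    }

    // Called by the lower layer on its own (non-GUI) thread. It only queues
    // the work and returns straight away instead of touching the GUI itself.
    static void contentDeleted(DefaultListModel<String> model, String contentName) {
        SwingUtilities.invokeLater(new RemoveContentTask(model, contentName));
    }
}
```

Because invokeLater only queues the task, the callback returns to the library immediately even while the event thread is busy, and the queued tasks run once the event thread is free again. An anonymous Runnable that captures final local variables would do the same job with fewer classes.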
After all that work, I finally had stuff working. But unfortunately I neglected something that’s probably important: I didn’t put up any sort of dialog or busy cursor or anything while the deletion is going on. Oh well, maybe next time.