The same problem I mentioned in Rants and Revelations » Java Thread Locking cropped up again. This time it was quite random, but repeatable. I dreaded going through the crap I went through last time to find where the lockup was happening, until I discovered a nifty new trick – if you do a “kill -3” on the Java process ID, the JVM dumps a stack trace of every thread, including which locks each one is holding, to its stdout, and then carries on running.
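For cases where sending a signal isn't convenient, a roughly similar dump can be produced from inside the program. The sketch below is not what I actually used (I just used “kill -3”); it assumes a JVM that supports monitor tracking through the standard `java.lang.management` API, and the class name `ThreadDumper` is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: builds a text dump of all live threads from inside the JVM.
class ThreadDumper {

    /** Returns one entry per live thread, including the monitors each thread holds. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // (true, true) asks for locked monitors and locked ownable synchronizers.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            // ThreadInfo.toString() shortens deep stacks, so this is less
            // complete than the dump the JVM prints on "kill -3".
            out.append(info);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Each entry shows the thread's name and state, and lists any monitor it has locked next to the stack frame that took it, which is the same information that made the lockup visible in the “kill -3” output.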
Going through the stack trace, I could see that one thread on the client was holding three locks while calling an RMI method on the server, and that call was blocked waiting for the delete thread to finish. Meanwhile the delete thread was calling a callback on the client that needed one of those three locks, so the delete thread was blocked as well. A classic deadlock: each side was waiting on the other. Not good. I removed most of the locks and things started working. Maybe eventually I will put some of the locks back.
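If I do put some locks back, the usual way to avoid this kind of cycle is to never hold a lock across a remote call: copy what you need while holding the lock, release it, and only then call the server. Here is a minimal sketch of that idea. It is not my actual client code; `ContentClient`, `Server`, `flush` and `onDeleted` are all made-up names, and the "server" is a plain in-process stand-in for the RMI interface that fires the callback from a different thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client that guards its state with a lock but never holds
// that lock while calling out to the server.
class ContentClient {

    /** Stand-in for the remote (RMI) server interface; an assumption for this sketch. */
    interface Server {
        void delete(List<String> ids);
    }

    private final Object lock = new Object();
    private final List<String> pending = new ArrayList<>();
    private Server server;

    void setServer(Server server) { this.server = server; }

    void queue(String id) {
        synchronized (lock) { pending.add(id); }
    }

    /** Callback invoked by the server's delete thread. Needs the same lock. */
    void onDeleted(String id) {
        synchronized (lock) { pending.remove(id); }
    }

    /** Sends the pending deletes. The lock is released before the remote call. */
    void flush() {
        List<String> snapshot;
        synchronized (lock) {
            snapshot = new ArrayList<>(pending); // copy under the lock
        }
        server.delete(snapshot); // lock NOT held here, so callbacks can get in
    }

    int pendingCount() {
        synchronized (lock) { return pending.size(); }
    }

    public static void main(String[] args) {
        ContentClient client = new ContentClient();
        // Fake server: runs the callbacks on another thread and waits for them,
        // the way the real server waited for its delete thread.
        client.setServer(ids -> {
            Thread deleter = new Thread(() -> ids.forEach(client::onDeleted));
            deleter.start();
            try {
                deleter.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        client.queue("a");
        client.queue("b");
        client.flush();
        System.out.println("pending after flush: " + client.pendingCount());
    }
}
```

If `flush` wrapped the `server.delete(...)` call in the `synchronized` block instead, the fake server's delete thread would block in `onDeleted` and `flush` would never return, which is the same shape as the lockup in the stack trace.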
Rohan suggests that I might have to rewrite parts of the server to take care of the next bug report on my list – the complaint is that deleting content takes too long. Unfortunately the bits he wants me to rewrite are his code, and it will take me 2 weeks just to understand it well enough to start to make the changes, and I’ve only got 10 days to clear all the bug reports off my list.