This time I think it was the cache…

As I wrote in a couple of articles back in 2007, in 2004 I wrote a cache for part of the product I was working on at Kodak. In the first release to QA, I made sure that area of the code got tested thoroughly. They found a bug, and fortunately I got it fixed before it went out to the customers. But to my chagrin, my boss and other people on the project got it into their heads that any problem anywhere near that part of the product must be the fault of my cache, even though it was proven time and time again that there were no further bugs in that code over the following three-plus years.

Now flash forward to the product I’m working on now. We have a “go live to the very important customer” happening in just a few days, and we’re supposed to be in a code semi-freeze. But the “Performance Project” just put their performance cache into the product, evidently without giving our local QA much chance to test it before it went to the customer’s QA. That seems just a little bit dangerous to me. But no matter, they assure me they’ve written tons of unit tests. So what could possibly go wrong?

Today the customer called up saying that they’re setting up a new client on the admin site, but every time they go to the “branding setup” for that new client, they see some other client’s branding setup. This branding consists of things like the client logo, some “terms and conditions” text, and the like. Since they’ve got literally hundreds of QA people hitting this site, I naturally wondered whether they were seeing some interaction between multiple people messing with the setup. But after hours of poking around on their site, one of my peers and I (neither of us members of the “Performance Project”, I might add) are convinced it’s the performance cache. Evidently if you use one browser to look at one client’s branding, and then use a different browser to look at the branding of a client who hasn’t been set up yet, you see the branding from the client you’d looked at in the first browser. Somehow the cache is reacting to the absence of information in the database for a client by pulling some other client’s information out of the cache. That’s not good.

Hopefully that will get fixed, and hopefully somebody will set up a test plan that checks what the cache does not just on a cache miss, but on a database miss as well. And hopefully the important customer won’t think we’re all a bunch of idiots for not testing this properly.
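For what it’s worth, here’s roughly the kind of test I have in mind, as a minimal Python sketch. The ReadThroughCache class, its loader callback, and the client names are all hypothetical; I don’t know how the Performance Project’s cache is actually built. The point is the order of operations that bit us: look up one client (so something is sitting in the cache), then ask for a client the database doesn’t have yet, and make sure the answer is “nothing” rather than somebody else’s branding.

```python
from typing import Callable, Dict, Hashable, List, Optional


class ReadThroughCache:
    """Hypothetical read-through cache keyed by client id (illustration only)."""

    def __init__(self, loader: Callable[[Hashable], Optional[dict]]):
        self._loader = loader  # stands in for the database query
        self._entries: Dict[Hashable, dict] = {}

    def get(self, client_id: Hashable) -> Optional[dict]:
        if client_id in self._entries:
            return self._entries[client_id]  # cache hit for THIS client only
        value = self._loader(client_id)  # cache miss: go to the "database"
        if value is not None:
            # Only cache real rows; a database miss is not remembered,
            # so a client set up later will be picked up on the next lookup.
            self._entries[client_id] = value
        return value


def test_database_miss_does_not_leak_other_client() -> None:
    db = {"acme": {"logo": "acme.png", "terms": "Acme terms"}}
    calls: List[Hashable] = []

    def loader(client_id: Hashable) -> Optional[dict]:
        calls.append(client_id)  # record every trip to the "database"
        return db.get(client_id)

    cache = ReadThroughCache(loader)
    assert cache.get("acme") == db["acme"]  # cache miss, database hit
    assert cache.get("acme") == db["acme"]  # cache hit, no database call
    assert cache.get("newclient") is None  # cache miss AND database miss
    db["newclient"] = {"logo": "new.png", "terms": "New terms"}
    assert cache.get("newclient") == db["newclient"]  # set up later, now found
    assert calls == ["acme", "newclient", "newclient"]


if __name__ == "__main__":
    test_database_miss_does_not_leak_other_client()
    print("ok")
```

Whether to cache database misses at all is a design choice; this sketch deliberately doesn’t, so a client that gets set up after the first lookup isn’t stuck behind a cached “not found”.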