Well, I profiled a smaller data set and found a place where I was wasting a significant amount of time processing nodes that I don’t care about. I’ve modified the code, stopped the Perl program (after 6098 minutes elapsed, 3491 minutes user, 2584 minutes system) and re-run it, and it finished in 16 minutes 30 seconds elapsed, 16 minutes 10 seconds user, 10 seconds system. Meanwhile, I’ve written a Java program that does the same thing as the Perl program (like I said in my previous post, the Perl program doesn’t actually do any loading or anything useful; it just parses one of the types of nodes that I’m interested in and prints out what it’s found). It ran the whole file in 17 minutes 38 seconds elapsed, 6 minutes 31 seconds user and 10 minutes 9 seconds system.
So the upshot of this is that I guess I’m going to stick with Perl.
I really like the JUnit part of Java. Is there an automated way to test Perl code?
Just recently I found old code that does not return the right values in every case (it depends on sorting or grouping). My input data was very small, so it was easy to find. I’m trying not to think about running that code with 50 million lines of input. But I guess nobody will pay me for that line of work in the next year or two.
I’ve been on a couple of projects where we used JUnit. As far as I can tell, the main use is that the original developer writes a few unit tests to check a few assumptions he made, or to confirm that the parts of his code he already knew might be a problem are working. Then those tests are run forevermore, in spite of the fact that once the original developer got that part right it’s never going to be wrong again. I have not been impressed with the ability of JUnit to find problems or even to prevent you from introducing new bugs. There just isn’t any substitute for good QA people.
99% of the code I write interacts with other web services or data in the database. So to unit test it you’ve got to mock up the database or the web service, and then instead of having one debugging problem (your own code) you have twice as much debugging, because you have to get your mock service working right too.
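For anyone who hasn’t done this, here is a minimal sketch of what "mocking up the database" tends to look like in Java. Every name in it (`CustomerStore`, `Notifier`, `InMemoryCustomerStore`) is made up for illustration and not from any real project, and it uses a hand-written fake rather than a mocking library. Note that the fake is extra code that has to be correct in its own right, which is exactly the cost described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class MockSketch {
    // Hypothetical seam between the business code and the real database.
    interface CustomerStore {
        Optional<String> findEmail(int customerId);
    }

    // The code under test: it only knows about the interface,
    // so a test can hand it a fake instead of a live connection.
    static class Notifier {
        private final CustomerStore store;

        Notifier(CustomerStore store) {
            this.store = store;
        }

        String greetingFor(int customerId) {
            return store.findEmail(customerId)
                    .map(email -> "Sending greeting to " + email)
                    .orElse("No email on file for customer " + customerId);
        }
    }

    // Hand-written fake standing in for the database (an assumption of
    // this sketch; a real project might use a mocking library instead).
    static class InMemoryCustomerStore implements CustomerStore {
        private final Map<Integer, String> emails = new HashMap<>();

        void put(int id, String email) {
            emails.put(id, email);
        }

        @Override
        public Optional<String> findEmail(int customerId) {
            return Optional.ofNullable(emails.get(customerId));
        }
    }

    public static void main(String[] args) {
        InMemoryCustomerStore fake = new InMemoryCustomerStore();
        fake.put(1, "alice@example.com");

        Notifier notifier = new Notifier(fake);
        System.out.println(notifier.greetingFor(1)); // known customer
        System.out.println(notifier.greetingFor(2)); // unknown customer
    }
}
```

The test only proves that `Notifier` behaves correctly against the fake. Whether the fake behaves like the real database is a separate question that nothing here checks.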