More on this data loader program

Well, I profiled the program on a smaller data set and found a place where I was wasting a significant amount of time processing nodes that I don’t care about. I modified the code, stopped the original Perl run (by then at 6098 minutes elapsed, 3491 minutes user, 2584 minutes system) and re-ran it. The new version finished in 16 minutes 30 seconds elapsed, 16 minutes 10 seconds user, 10 seconds system.

Meanwhile, I’ve written a Java program that does the same thing as the Perl program (like I said in my previous post, the Perl program doesn’t actually do any loading or anything useful; it just parses one of the types of nodes that I’m interested in and prints out what it found). It ran the whole file in 17 minutes 38 seconds elapsed, 6 minutes 31 seconds user and 10 minutes 9 seconds system.
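To make those timings a bit easier to compare, here is a rough sketch that converts them to seconds and works out total CPU time and the share spent in system code. The class and method names (`TimingSummary`, `secs`, `cpuSeconds`, `systemShare`) are made up for this illustration and are not part of either loader program. The only inputs are the figures quoted above. Note that the original Perl run was stopped before it finished, so the ratio against it is only a lower bound.

```java
public class TimingSummary {
    // Convert a "minutes, seconds" reading into total seconds.
    static int secs(int minutes, int seconds) {
        return minutes * 60 + seconds;
    }

    // CPU time is user time plus system time.
    static int cpuSeconds(int userSecs, int systemSecs) {
        return userSecs + systemSecs;
    }

    // Percentage of CPU time spent in system (kernel) code.
    static double systemShare(int userSecs, int systemSecs) {
        return 100.0 * systemSecs / cpuSeconds(userSecs, systemSecs);
    }

    public static void main(String[] args) {
        // Figures copied from the runs reported in this post.
        int perlElapsed = secs(16, 30), perlUser = secs(16, 10), perlSys = secs(0, 10);
        int javaElapsed = secs(17, 38), javaUser = secs(6, 31), javaSys = secs(10, 9);
        int oldPerlElapsed = secs(6098, 0); // stopped by hand, never finished

        System.out.println("Perl: elapsed " + perlElapsed + "s, cpu "
                + cpuSeconds(perlUser, perlSys) + "s, system share "
                + systemShare(perlUser, perlSys) + "%");
        System.out.println("Java: elapsed " + javaElapsed + "s, cpu "
                + cpuSeconds(javaUser, javaSys) + "s, system share "
                + systemShare(javaUser, javaSys) + "%");
        // Integer division: a floor on how much longer the old run had already taken.
        System.out.println("Old Perl run had taken at least "
                + (oldPerlElapsed / perlElapsed) + "x as long as the new one");
    }
}
```

The thing that jumps out is the split: the two programs use about the same total CPU time, but Perl spends nearly all of it in user code, while the Java version spends more than half of it in system code.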

So the upshot of this is that, since the Java version came out no faster than the fixed Perl one, I guess I’m going to stick with Perl.