This is worrisome.

Update: Somebody on the Nutch mailing list pointed me towards the config option “fetcher.threads.per.host”. Increasing that to 10 dropped the time from 45 minutes to 15 minutes on the first crawl and 2 minutes for a re-crawl. Since I fixed Nutch to properly respect the Last-Modified header and If-Modified-Since, I don’t think I’m going to be blocked from crawling sites with multiple threads. Much less worrisome.

Time spent to copy all the files on three small web sites to a directory on my machine using wget: 1 minute 1.114 seconds.

Time spent for Nutch to re-crawl those same web sites: 45 minutes.

It doesn’t seem to matter what I put in the “number of threads” parameter to Nutch, either – it takes 45 minutes if I give it 10 threads or 125 threads.

Even worse for Nutch, out of the box it refetches documents even if they haven’t changed – I had to find and fix a bug to make that part work – but wget does the right thing.
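For anyone wondering what "not refetching unchanged documents" looks like on the wire, here is a minimal sketch of an HTTP conditional GET using the If-Modified-Since header mentioned in the update above. It is not the Nutch fix itself, nor a claim about how wget does it internally; the function names `build_conditional_request` and `fetch_if_changed` are made up for illustration, and it assumes you kept the time of your last successful fetch as a Unix timestamp.

```python
import email.utils
import urllib.error
import urllib.request


def build_conditional_request(url, last_fetch_epoch):
    """Build a GET request that asks the server to skip the body if unchanged."""
    request = urllib.request.Request(url)
    # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    http_date = email.utils.formatdate(last_fetch_epoch, usegmt=True)
    request.add_header("If-Modified-Since", http_date)
    return request


def fetch_if_changed(url, last_fetch_epoch):
    """Return the new body as bytes, or None if the server answers 304 Not Modified."""
    request = build_conditional_request(url, last_fetch_epoch)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError; here it just means "nothing new".
        if err.code == 304:
            return None
        raise
```

A re-crawl would call `fetch_if_changed(url, time_of_last_crawl)` for each known URL and only re-process the pages that come back with a body.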

Considering that all I’m doing with the Nutch crawl is going through the returned files one by one, doing some analysis, and putting those results in a Solr index, I wonder if I should toss Nutch entirely and just work up something using wget? All I’m really getting out of Nutch is pre-parsing the HTML to extract some metadata.
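To get a feel for how little that pre-parsing step amounts to, here is a rough sketch of pulling the title and `<meta name=...>` tags out of a page with nothing but the Python standard library. This is an assumption about what "some metadata" means, not what Nutch's parser actually does; `MetaExtractor` and `extract_metadata` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            # Lower-case the name so "Description" and "description" match.
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html_text):
    """Return a dict with the page title plus any named meta tags."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return {"title": parser.title.strip(), **parser.meta}
```

Run over each file in a wget mirror directory, the resulting dicts could then be fed to whatever does the analysis and Solr indexing.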

Too bad I’ve already spent 3 weeks on this contract going down the Nutch road. At this point, it would be too time-consuming to throw away everything I have and start afresh.

Long paddle today

[youtube YE17iVskad4 Team Practice]
Today instead of just me and Mike doing a long grind, it was five of us – Mike, Paul D, Bill, me, and coach Dan. We met at the lake, and the lake was flatter than a pancake. Even I, the big wuss, paddled with the PFD lashed to the rear deck instead of wearing it, although Paul D wore his, but I think that was more due to his lack of experience and comfort in the ski rather than waves or wind.

We started off doing a moderate pace, riding each other’s wash, and every mile doing a “pickup” or a faster piece, not a sprint, but faster than our “grind” pace. At other times, instead of doing our “pickup” at a given time, we sprinted across a river channel, or turned to ride a large wake coming in. We did two-level pickups where we increased pace to something like 6.5 mph, and then after 45 seconds increased to 6.7 or 6.8 for another 45 seconds. It was a good workout, lots of variation, and I’m quite wiped right now.

It’s an awesome sight seeing those four gleaming white surf skis skimming along the water, and my boat is also pretty gleaming itself, although it looks a little out of place. Based on my brief experience with the V10 Sport at Baycreek, I figure I’m half a mile an hour slower in my boat, so I think I’m doing pretty damn well to keep up with these guys for 2 hours.

Six hours with a Palm Pre

Vicki and I have been discussing smart phones for a while now. I wanted an iPhone, for a number of reasons regarding the phone itself and the App Store, and also because I have severe reservations about Sprint’s ability to provide signal, based on my experience about eight years ago when I was a Sprint customer. But Vicki utterly hated the idea of talking into a flat panel for reasons I don’t entirely understand, and she seemed to feel much more strongly about it than I did. So we decided to go with the Palm Pre. We picked ours up today. Here are a few preliminary impressions, in no particular order:

  • I find the keyboard very cramped. The Treo keyboard was better.
  • The screen is small compared to the iPhone/Touch but just as bright and readable.
  • There are almost no apps in the App Store.
  • As I feared from my previous experience as a Sprint customer, signal strength inside the house sucks.
  • The OS is very slick in many ways. I’m hoping there is a faster way to dismiss a page than to swipe up to go into the multi card view, then swipe it up to throw it away, but otherwise it’s really nice. Very much the equal of the iPhone OS, or better.
  • Even though the web browser is supposedly based on WebKit, same as the iPhone, it doesn’t do GMail right – you press the “Archive” button and it doesn’t take you back to the Inbox screen, although refreshing the screen shows that the message was archived – and sometimes it cuts off the bottom of mail and you can’t scroll down.
  • The built in mailer is better but it doesn’t thread or group by subject (much like SnapperMail or Apple Mail) and when you hit the delete button it somehow really deletes the message instead of archiving it like SnapperMail does.
  • The battery life seems pretty poor compared to the Treo, but of course I’m using it more right now, and I haven’t charged it overnight yet. But an hour or so of constant web browsing seems to use about 50% of the battery.
  • The Sprint GPS app seems extremely good – as good as or better than my Garmin nuvi, although I wish it were louder.
  • The bastards used yet another incompatible connector instead of a standard mini-USB so you have to use their cable to charge it.
  • The iTunes integration seems to be working fine, although I can’t tell if it synced contacts.
  • The ability to merge contacts is great, although I kind of wish it hadn’t dragged in every person I’ve ever sent email to on Google.
  • Same with the calendar integration – it brought in every calendar I share, even the ones I normally turn off. You can either view only one calendar, or all of them. There is no way to turn off “Vicki’s Work Calendar” and “Ubuntu Local Community” and keep all the rest on.
  • Tasks seem to have no ability to make repeating entries. Funny how Palm OS used to do that so well, but WebOS can’t. But then again, neither can Google Calendar tasks.

All in all, I think the Pre is going to be a good phone, but I wish it got better reception in the house.

Surf Skis again

Dan and I went for a paddle today, and because it was such a hot day and the water was so warm, afterwards he suggested I give his Epic V10 surf ski a try. I wrote before that I’d tried out a V10 Sport a few times and liked it a lot. A V10 is longer and narrower than a V10 Sport – it’s basically got the same lack of initial stability, but it doesn’t flare out as much so it has a lot less secondary stability. Dan said that it would be a better boat for me because of my size, and I have to say that paddling in the shallow water just off shore at the beach it sure seemed less prone to “suck water” in the shallows.

The lack of stability meant that I dumped about four times before I got to take my first paddle stroke. After a while (and a lot more swimming), I got comfortable enough to take ten or twenty paddle strokes before dumping, but I was still going from gunwale to gunwale (assuming you can call those things on surf skis “gunwales”). But that was just my first half hour in the boat – I’m sure in another 10 hours I’ll be much better – and have a lot more beach sand in my ears.

I’m not denying that the V10 would be a faster boat once I got used to it, but on the other hand, the learning curve would be a lot steeper – I wonder if I’d be able to race it in the first season I owned it. Then again, most of the rest of the team is in V10 Sports now, and I can just imagine what some of the more competitive (but who claim not to be competitive) types would think if I were to leapfrog them and get a faster boat than theirs.

On the other hand, Mike thinks he’s got this all worked out. He owns a “blue stripe” V10 Sport. On Epic surf skis, there are four different layups, each one more expensive and lighter than the last, each denoted by a different colour stripe on the cheat line. The “blue stripe” (Value) ones are the cheapest and heaviest. The “black stripe” (Performance) ones are about $500 more, and about 4 pounds lighter. The “red stripe” (Ultra) are $1000 more than Performance, and 8 pounds lighter than them. (Note the constant $125/pound ratio there.) And the mythical “black boat” (Elite) ones are another $1000 more than Ultra, and only 4 pounds lighter – so due to the drop off in price/performance, I’ve never seen one of these, and nobody is likely to see one except in the hands of pros. But anyway, Mike’s theory is this: next year, I should buy his “blue stripe” V10 Sport, and he’ll buy Steve’s “red stripe” V10 Sport. Then Steve will buy something even better, probably a V12 or V10 L. It’s a great theory, provided Steve cooperates by buying another boat and selling his V10 Sport for a price that Mike can afford. And that I actually want his “blue stripe” V10 Sport. But that might work – a year in a V10 Sport might be a good way to develop some core strength and balance before moving to a V10.
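For the curious, the dollars-per-pound arithmetic behind the layup comparison works out as below. This is just a quick check using the approximate price and weight differences quoted above, not Epic’s actual price list; the `dollars_per_pound` helper is only there for illustration.

```python
# (upgrade step, extra cost in dollars, weight saved in pounds),
# using the approximate figures quoted in the post.
LAYUP_STEPS = [
    ("Value -> Performance", 500, 4),
    ("Performance -> Ultra", 1000, 8),
    ("Ultra -> Elite", 1000, 4),
]


def dollars_per_pound(extra_cost, pounds_saved):
    """Cost of each pound shaved off by moving up one layup."""
    return extra_cost / pounds_saved


for step, extra_cost, pounds_saved in LAYUP_STEPS:
    rate = dollars_per_pound(extra_cost, pounds_saved)
    print(f"{step}: ${rate:.0f} per pound saved")
```

The first two steps both come out at $125 per pound, while the jump to Elite costs $250 per pound, which is the drop off in price/performance.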