Remind me again why we switched to Sprint?

I just got off the phone with Sprint. Needless to say, I had to phone them from the house line because once again I’m getting no signal in the house for the cell phone. I’m currently having two problems with the phones –

  1. We get signal in the house about 50% of the time; the rest of the time we get dropped calls and missed calls.
  2. Currently (as in ever since I was using it plugged into the car’s audio system on Saturday), I cannot hear anybody who calls me unless I remember to switch to speaker phone. It’s as if it still thinks there is something plugged into the headphone jack.

Sprint’s answer to the first was “we’ll send a network engineer to drive through your neighborhood to see what the signal strength is”, which means he’ll probably see it during the few minutes per hour where the strength registers as 3 or 4 bars, and declare it fine. I asked about one of those pico-cells, and they want you to pay for the device, then pay a monthly fee for the privilege of using your own cable modem network bandwidth to fix their network limitations. Their answer to the second problem is that I need to bring it in to a Sprint store, so somebody can try all the troubleshooting steps that I’ve already tried and say “yup, it’s dead all right”.

Right now the only thing that’s preventing me from driving to the Sprint store and saying “give us our money back, we’re going back to AT&T” and buying two iPhones is that Vicki isn’t home yet so I won’t be able to slam both of them down on the counter.

It’s a real shame, because I love WebOS, I kind of like the Pre itself (although the battery life sucks and when I’m using the GPS and music it sucks down power faster than the car charger can replenish it unless I turn off the screen), but I hate, hate, hate, hate the Sprint Notwork.

So to everybody within the sound of my voice, hear my cry: “DON’T SWITCH TO SPRINT – THEY’RE CHEAPER FOR A REASON!”

Palm Pre as car entertainment/navigation system

Tonight, I was driving down to a person’s house near Moravia, NY to pick up a used kayak. I’d never been there, so I decided to borrow Vicki’s car charger and see how the Pre’s “Sprint Navigation” works on a real test. And because my car has an “Aux in”, I decided to use the Pre to play music at the same time.

If I’d stopped to write this review about 1 mile before I got to my destination, it would have been pretty glowing. On the way down there, I loved the fact that it would fade out the music when it had to give me a direction – I could listen to my music as loud as I wanted and not miss a turn. It took a different route than Google Maps had given me, but it avoided some messing around in downtown Auburn, and I got a nice view of Skaneateles Lake. But it was slightly annoying that it preferred the local road names over the highway number, so while cruising along SR 5/20, it kept telling me “In 1.9 miles, continue along Clark Street Road” and the like as the road changed name every few minutes. On the other hand, the voice prompts were so clear and frequent that I could just go by the sound and not look at the map. On the gripping hand, the phone got uncomfortably hot.

But I was cruising down State Route 38A when it started counting down to a turn. Now, I’m pretty sure it was telling me it was Pine Hill Road, although when I try it now it says Decker Hill Rd. I couldn’t for the life of me see this road when it said to turn, although looking at Google Maps there is a Decker Hill Road around where I was at that time, although I think I might have been at the driveway south of Decker Hill Road. Whatever, I couldn’t see anything I’d want to turn onto, but the GPS stopped showing the road I was on, only the one it thought was there and that I should have turned on. And so the GPS said that it was recalculating. And it said it again. And again. At this point I figured it couldn’t recalculate because of the lack of cell phone coverage down there, so I quickly punched the address into my car’s Garmin Nuvi and it got me to my destination.

On the way home, after I stopped for gas, I decided to try the Pre again, but I was mostly using the Nuvi. When I first fired up the GPS on the Pre, it told me that my ETA was 9:20, but the Nuvi was saying 9:13. As I got closer to home, the Pre kept adjusting my ETA downwards until it eventually agreed with the Nuvi at 9:13, and I got home pretty close to that. I don’t know why the difference – maybe the Pre thinks people drive the speed limit or something crazy like that. Another slight annoyance was that unlike the Nuvi, the Pre’s GPS doesn’t have a “night mode” map with a darker colour scheme to preserve night vision. So I left the screen off 90% of the time. As a side benefit, the phone didn’t get as hot with the screen off.

As I got close to home, I had my second major disappointment of the night. Every GPS in the world (and Google Maps) thinks I should exit from I-590 north of home and come back south, but I prefer to exit to the south and continue north on our neighbourhood streets. And so when I leave I-590, the Nuvi says “Recalculating” once while I’m on the off-ramp as it tries to convince me to take the on-ramp back onto I-590, then “Recalculating” again after I turn onto the road and shows me the neighbourhood route that I normally take. The Pre didn’t handle it quite as well. As I left I-590, it said “Recalculating”, but didn’t actually manage to recalculate a route. It didn’t even show me the street map – all it showed me was a very thin red line pointing in an exact straight line back to the nearest segment of the original route, which for most of the route would have involved smashing through somebody’s house, then their back fence, and then hopping over an embankment onto I-590. And this time I couldn’t even blame poor cell phone coverage, as it was showing a strong signal and EVDO data coverage.

My final verdict on the Pre as a GPS navigator? I’d say about 8/10 when you’re on the proper route, but 0/10 if you accidentally get off the route it originally calculated for you. I suspect based on my experience with other GPSes that it might be better when you get off route to tell it to stop navigating, and then tell it to calculate a new route from where you are to your destination. I don’t know why, but I’ve gotten better routes that way from Garmin GPSes than by allowing them to recalculate and it might be the same for the Pre. Or maybe Sprint/TeleNav will just fix the damn software.

This is worrisome.

Update: Somebody on the Nutch mailing list pointed me towards the config option “fetcher.threads.per.host”. Increasing that to 10 dropped the time from 45 minutes to 15 minutes on the first crawl and 2 minutes for a re-crawl. Since I fixed Nutch to properly respect the Last-Modified header and If-Modified-Since, I don’t think I’m going to be blocked from crawling sites with multiple threads. Much less worrisome.

Time spent to copy all the files on three small web sites to a directory on my machine using wget: 1 minute 1.114 seconds.

Time spent for Nutch to re-crawl those same web sites: 45 minutes.

It doesn’t seem to matter what I put in the “number of threads” parameter to Nutch, either – it takes 45 minutes if I give it 10 threads or 125 threads.
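One possible explanation, consistent with the “fetcher.threads.per.host” option mentioned in the update above: if a crawler caps how many threads may talk to one host at a time, then with only three hosts the total thread count stops mattering almost immediately. Here is a toy model of that idea in Python. It is not how Nutch actually schedules fetches, and the page counts and per-fetch cost are made-up numbers purely for illustration.

```python
import math


def estimated_crawl_seconds(pages_per_host, total_threads, threads_per_host, seconds_per_fetch):
    """Rough crawl-time estimate under a per-host concurrency cap.

    Assumed toy model (not Nutch's real scheduler): each host is served by
    at most `threads_per_host` threads, every fetch costs `seconds_per_fetch`,
    and the crawl finishes when the slowest host finishes.
    """
    effective = max(1, min(total_threads, threads_per_host))
    return max(
        math.ceil(pages / effective) * seconds_per_fetch
        for pages in pages_per_host
    )


# Hypothetical sites: three hosts with invented page counts.
sites = [200, 400, 600]

for total in (10, 125):
    # With a per-host cap of 1, both totals give the same estimate.
    print(total, estimated_crawl_seconds(sites, total, threads_per_host=1, seconds_per_fetch=2.0))
```

In this model the only knob that changes the answer is the per-host cap, which is at least in line with what I saw when I changed the total thread count.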

Even worse for Nutch, out of the box it refetches documents even if they haven’t changed – I had to find and fix a bug to make that part work – but wget does the right thing.

Considering that all I’m doing with the Nutch crawl is going through the returned files one by one and doing some analysis and putting those results in a Solr index, I wonder if I should toss Nutch entirely and just work up something using wget? All I’m really getting out of Nutch is pre-parsing the HTML to extract some metadata.
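To give a feel for how small the “pre-parsing the HTML” part could be, here is a minimal sketch in Python that walks a directory of mirrored files and pulls out the title and the meta tags. The names `MetaExtractor`, `extract_metadata` and `walk_mirror` are my own inventions for illustration, the file extensions are an assumption, and this does not do the analysis or the Solr step.

```python
from html.parser import HTMLParser
from pathlib import Path


class MetaExtractor(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html_text):
    """Return {'title': ..., 'meta': {...}} for one HTML document."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return {"title": parser.title.strip(), "meta": parser.meta}


def walk_mirror(root, suffixes=(".html", ".htm", ".shtml")):
    """Yield (path, metadata) for each HTML-ish file under a mirror directory."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            yield path, extract_metadata(text)
```

Each yielded dictionary would then be handed to whatever does the analysis and indexing.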

Too bad I’ve already spent 3 weeks on this contract going down the Nutch road. At this point, it would be too time-consuming to throw away everything I have and start afresh.

Oh yeah, /tmp is *temporary*

I was storing some files that were semi-important to the project I’m working on in /tmp. I knew that there is a process on some Unix computers that cleans out the stuff in /tmp either on boot or on a schedule, but I didn’t know if it did that on my Mac. So while I’d sort of had a flag in the back of my head to move that to somewhere less fragile, I never got around to it. And I got to working on another part of the project for a few days and forgot about them. And in the meantime, the files haven’t been touched, and I’ve installed an OS update and rebooted. And now I go back, and they’re gone. “Oh yeah”, I think, “/tmp is temporary”. So then I look to see if Time Machine has a backup, and of course Time Machine excludes /tmp because, oh yeah, /tmp is *temporary*.

I can recreate the files, but it’s a waste of a few hours. This time I’m going to recreate them in ~/data/.

This week’s interesting discoveries about Nutch

I’ve been working a lot with Nutch, the open source web crawler and indexer, and the first thing I found was that it was downloading web pages every day, instead of sending the “If-Modified-Since” header and only downloading ones that changed. Ok, I thought, I’ll fix that – since the information I want isn’t in the “datum.getModificationDate()”, I’ll use “datum.getFetchDate()”.
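For anybody who hasn’t played with conditional requests, this is roughly what a crawler is supposed to do, sketched in Python with the standard library rather than in Nutch’s own Java. The function names are mine, and the timestamp you pass in is whatever you stored from the previous fetch (which timestamp that should be is exactly the problem discussed in this post). A server that honours the header answers 304 Not Modified with no body.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, last_modified_epoch=None):
    """Build a GET request, adding If-Modified-Since when a timestamp is known."""
    req = urllib.request.Request(url)
    if last_modified_epoch is not None:
        # HTTP dates are always expressed in GMT.
        req.add_header("If-Modified-Since", formatdate(last_modified_epoch, usegmt=True))
    return req


def fetch_if_modified(url, last_modified_epoch=None, timeout=30):
    """Return (status, body, last_modified_header); body is None on a 304."""
    req = build_conditional_request(url, last_modified_epoch)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read(), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: nothing to download, keep the old copy.
            return 304, None, None
        raise
```

Passing `None` for the timestamp gives a plain unconditional fetch, which is what you want for a page that has never been crawled.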

Second interesting discovery: Nutch then doesn’t index pages that returned 304 (Not Modified), and since the index merging code doesn’t seem to work, I can’t keep these pages that I cleverly managed to avoid downloading in the index. Ok, I’ll fix IndexMapReduce and delete the code with the comment that says “// don’t index unmodified (empty) pages”, and resist the urge to send a cock-punch-over-ip to whoever wrote that comment for not realizing that “unmodified” does not mean “empty” by any stretch of the imagination.

Third interesting discovery: It turns out that some bright spark decided that when you’re crawling a page that’s never been loaded before, “datum.getFetchDate()” gets the current time, instead of any useful indication that it’s never been fetched before. So scratch my first fix, and go looking for why datum.getModifiedDate() isn’t set. And discover that it appears that datum.setModifiedDate() is never called except by code trying to force things to be recrawled. Yes, instead of forcing a new crawl by modifying the locally generated “fetch date”, they fuck around with the “modified date”, which is supposed to come originally from the server. My opinion of the quality of this crawler code is rapidly going downhill. But my patch to set the modification date according to the page’s metadata appears to be working. Sort of.
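The separation I’d expect between the two timestamps looks something like this sketch. It is in Python, and `PageRecord` is a hypothetical stand-in I made up for illustration, not Nutch’s datum class or my actual patch. The fetch time is local bookkeeping, the modified time only ever comes from the server, and forcing a re-crawl touches the former and leaves the latter alone.

```python
from dataclasses import dataclass
from email.utils import parsedate_to_datetime
from typing import Optional


@dataclass
class PageRecord:
    """Hypothetical per-URL crawl record (not Nutch's real class)."""
    url: str
    fetch_time: Optional[float] = None     # local clock; None means never fetched
    modified_time: Optional[float] = None  # only ever set from the server's Last-Modified


def record_fetch(record, now, last_modified_header=None):
    """Update the record after a fetch, keeping the two timestamps separate."""
    record.fetch_time = now
    if last_modified_header:
        try:
            record.modified_time = parsedate_to_datetime(last_modified_header).timestamp()
        except (TypeError, ValueError):
            pass  # unparseable header: keep whatever we had before
    return record


def if_modified_since_for(record):
    """Timestamp to send in If-Modified-Since, or None for an unconditional fetch."""
    return record.modified_time


def force_refetch(record):
    """Force a re-crawl by resetting local bookkeeping, not the server's date."""
    record.fetch_time = None
    return record
```

A page that has never been fetched has `None` in both fields, so there is no way to mistake “now” for “last fetched”.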

Fourth discovery, and one I can’t blame on Nutch: My Rochester Flying Club pages use shtml (Server Parsed HTML) so that I could include a standard header and navigation bar in each page. I could have used a Perl script to automatically insert the header into the pages and regenerate them whenever anything changed, but this seemed a lot easier at the time. But one consequence that I’d never noticed before – the server doesn’t send a “Last-Modified” header in the response, so evidently these pages are never cached by any browser or crawler. Ooops.
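A quick way to spot pages like this is to look at which validators a response actually carries. This is a small Python sketch with a helper name of my own choosing. It takes any mapping of header names to values and returns the ones a client could use for a conditional request.

```python
def revalidation_headers(headers):
    """Return the validators (Last-Modified, ETag) present in a response.

    `headers` is any mapping of header name to value; names are matched
    case-insensitively. An empty result means a client has nothing to
    put in If-Modified-Since or If-None-Match next time.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        name: lowered[name]
        for name in ("last-modified", "etag")
        if lowered.get(name)
    }
```

For example, the `headers` attribute of a `urllib.request.urlopen` response can be passed in directly, since it supports `items()`.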