Help me lazyweb!

Update: See my next post for what I think is a solution.

(I’ve noticed a bunch of people using the term “lazyweb” to mean throwing your question out to a web-based audience who might know the answer off the top of their heads, either instead of or as well as researching the answer yourself. Works for me.)

I bought two new external drives – one to use as a Time Machine drive for our laptops, and one to act as a Linux backup disk. I’ve had terrible luck with external USB drives – I’d say fewer than 50% have actually worked right for what I want them for, which is sitting idle 20 hours a day and then doing a nightly backup using rsync. And it’s always the fault of the enclosure, not the drive – ripping the drive out and using it as an internal drive, and/or putting a new drive in the enclosure, has proven that beyond a doubt. So this time I bought Seagate “FreeAgent”s – I figured if Seagate made both the drive and the enclosure, there could be no doubt whose fault it was if it didn’t work.

The first thing I did with the Linux one was rewrite the partition table (to make sure there wasn’t some damn U3 thing on it) and run mkfs.ext3 on it. I mounted it and then started copying the old backups over from the other drive, hitting the new drive pretty hard with three “tar-untar” pipes running at once. No problems. But I woke up in the middle of the night and decided to check that the tars had finished without problems. I typed “ls /mnt/backup2/all” expecting tab completion of the file name, but instead I got these errors, both on the screen and in /var/log/messages:

Nov 1 02:54:22 allhats2 kernel: [1081871.238199] sd 9:0:0:0: [sdd] Device not ready: <6>: Sense Key : Not Ready [current]
Nov 1 02:54:22 allhats2 kernel: [1081871.238208] : Add. Sense: Logical unit not ready, initializing command required
Nov 1 02:54:22 allhats2 kernel: [1081871.238215] end_request: I/O error, dev sdd, sector 12375
Nov 1 02:54:22 allhats2 kernel: [1081871.242695] sd 9:0:0:0: [sdd] Device not ready: <6>: Sense Key : Not Ready [current]
Nov 1 02:54:22 allhats2 kernel: [1081871.242701] : Add. Sense: Logical unit not ready, initializing command required
Nov 1 02:54:22 allhats2 kernel: [1081871.242707] end_request: I/O error, dev sdd, sector 8279
Nov 1 02:54:22 allhats2 kernel: [1081871.253317] sd 9:0:0:0: [sdd] Device not ready: <6>: Sense Key : Not Ready [current]
Nov 1 02:54:22 allhats2 kernel: [1081871.253322] : Add. Sense: Logical unit not ready, initializing command required
Nov 1 02:54:22 allhats2 kernel: [1081871.253328] end_request: I/O error, dev sdd, sector 63
Nov 1 02:54:22 allhats2 kernel: [1081871.253382] lost page write due to I/O error on sdd1

Now this is a bit odd – no errors while it was being hammered, but errors after it had been idle for some time, possibly hours. Then this morning I checked whether the nightly backup had worked, did the same thing, and got the same result:

Nov 1 06:38:03 allhats2 kernel: [1095289.455999] sd 9:0:0:0: [sdd] Device not ready: <6>: Sense Key : Not Ready [current]
Nov 1 06:38:03 allhats2 kernel: [1095289.456006] : Add. Sense: Logical unit not ready, initializing command required
Nov 1 06:38:03 allhats2 kernel: [1095289.456013] end_request: I/O error, dev sdd, sector 12375
Nov 1 06:38:03 allhats2 kernel: [1095289.460494] sd 9:0:0:0: [sdd] Device not ready: <6>: Sense Key : Not Ready [current]
Nov 1 06:38:03 allhats2 kernel: [1095289.460499] : Add. Sense: Logical unit not ready, initializing command required
Nov 1 06:38:03 allhats2 kernel: [1095289.460506] end_request: I/O error, dev sdd, sector 8279
Nov 1 06:38:03 allhats2 kernel: [1095289.465243] sd 9:0:0:0: [sdd] Device not ready: <6>: Sense Key : Not Ready [current]
Nov 1 06:38:03 allhats2 kernel: [1095289.465249] : Add. Sense: Logical unit not ready, initializing command required
Nov 1 06:38:03 allhats2 kernel: [1095289.465255] end_request: I/O error, dev sdd, sector 63
Nov 1 06:38:03 allhats2 kernel: [1095289.465308] lost page write due to I/O error on sdd1

It also turned out that there were similar errors at 3:30am and 4:19am, which I think is when the nightly backup was running. And when I tried to look at one of the backups (those of a machine called “xen1”), I got an I/O error on the directory. Trying to remove the directory got me an error saying it was a read-only file system. Unmounting and remounting the drive got rid of that error, so I removed the xen1 directory and repopulated it with another tar.
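
To spot bursts like the 3:30am and 4:19am ones without reading the whole log, something like the following Python sketch could pull the end_request errors for one device out of /var/log/messages and count them per minute. It assumes the classic syslog line format shown in the log excerpts above; the function names are hypothetical, and the regular expression would need adjusting if your syslog writes timestamps differently.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Assumes classic syslog lines such as:
# Nov 1 02:54:22 allhats2 kernel: [1081871.238215] end_request: I/O error, dev sdd, sector 12375
END_REQUEST = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<time>\d\d:\d\d:\d\d)\s+\S+\s+kernel:"
    r".*end_request: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def io_errors(lines: Iterable[str], device: str = "sdd") -> Iterator[Tuple[str, int]]:
    """Yield (timestamp, sector) for every end_request I/O error on `device`."""
    for line in lines:
        m = END_REQUEST.search(line)
        if m and m.group("dev") == device:
            stamp = f"{m.group('month')} {m.group('day')} {m.group('time')}"
            yield stamp, int(m.group("sector"))


def error_bursts(lines: Iterable[str], device: str = "sdd") -> Counter:
    """Count the I/O errors per minute so separate bursts stand out."""
    # stamp[:-3] drops the ":SS" seconds field, e.g. "Nov 1 02:54:22" -> "Nov 1 02:54"
    return Counter(stamp[:-3] for stamp, _ in io_errors(lines, device))
```

Passing an open /var/log/messages file to error_bursts would return a Counter keyed by minute, so each burst of errors shows up as one entry.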

I’m not 100% sure this is the only problem, but it looks to me like the drive is going to sleep and then not waking up fast enough to keep ext3 happy. So what I’m asking the lazyweb for is a way either to stop the drive from going to sleep or to increase whatever timeouts there are in ext3. Is there an hdparm command, or something you can write somewhere in /proc, that will do this?
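
One crude workaround for the first option is a keepalive loop that writes and fsyncs a tiny file on the drive every few minutes, so the drive never sits idle long enough to go to sleep. The Python sketch below assumes that any real write resets the enclosure's idle timer, which is not confirmed for the FreeAgent. The five-minute interval and the .keepalive file name are made-up values, and the interval would have to be shorter than whatever the drive's actual sleep timeout is.

```python
import os
import time

# Hypothetical settings: the mount point is the one from the post, but the interval
# is a guess and must be shorter than the enclosure's (unknown) idle timeout.
MOUNT_POINT = "/mnt/backup2"
INTERVAL_SECONDS = 5 * 60


def touch_disk(mount_point: str) -> str:
    """Write a timestamp to a small file on the drive and fsync it to the device."""
    path = os.path.join(mount_point, ".keepalive")  # hypothetical file name
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, f"{int(time.time())}\n".encode())
        # fsync forces the write out of the page cache, so the drive itself sees I/O;
        # a plain read could be served from cache and never reach the disk.
        os.fsync(fd)
    finally:
        os.close(fd)
    return path


def keep_awake(mount_point: str = MOUNT_POINT, interval: float = INTERVAL_SECONDS) -> None:
    """Touch the drive forever, assuming any real write resets its sleep timer."""
    while True:
        touch_disk(mount_point)
        time.sleep(interval)
```

Calling keep_awake() from a background job would keep the drive busy. The trade-off is that the drive then never gets to sleep at all.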

3 thoughts on “Help me lazyweb!”

  1. I just bought this thing and hit the same issue. Looking at the kernel code, I found the SCSI section that produces that error, and the SPC spec says:

    ASCP = 0x2h
    “NOT READY: Indicates that the logical unit is not accessible. Operator intervention may be required to correct this condition.”

    That is probably the wrong thing for it to be sending, unless the Windows version expects the userland tool to kick it.

    I’ll look into the other parts to see if I can get it to work. I might just hack it and let 0x2 kick off the request again. Maybe it needs another command to happen first.

    Will keep you posted.
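
The “Sense Key : Not Ready” and “Add. Sense” parts of the kernel messages in the post are decoded from the SCSI sense data the drive returns, which is what the comment above is looking up in SPC. As a rough sketch (the function and table names are hypothetical, and the tables hold only a handful of SPC codes), fixed-format sense data can be decoded in Python like this: the sense key is the low four bits of byte 2, and the additional sense code (ASC) and its qualifier (ASCQ) are bytes 12 and 13.

```python
from typing import Tuple

# A subset of SPC sense keys (fixed-format sense data, byte 2, low nibble).
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}

# A subset of (ASC, ASCQ) pairs; only the ones relevant to this post.
ADDITIONAL_SENSE = {
    (0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
    (0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED",
}


def decode_fixed_sense(sense: bytes) -> Tuple[str, str]:
    """Return (sense key name, additional sense text) for fixed-format sense data."""
    if len(sense) < 14:
        raise ValueError("fixed-format sense data needs at least 14 bytes")
    response_code = sense[0] & 0x7F  # bit 7 is the VALID bit, not part of the code
    if response_code not in (0x70, 0x71):  # 0x70 = current, 0x71 = deferred
        raise ValueError(f"not fixed-format sense data (response code {response_code:#04x})")
    key = sense[2] & 0x0F
    asc, ascq = sense[12], sense[13]
    key_name = SENSE_KEYS.get(key, f"sense key {key:#x}")
    text = ADDITIONAL_SENSE.get((asc, ascq), f"ASC {asc:#04x} ASCQ {ascq:#04x}")
    return key_name, text
```

The pair seen in the logs, sense key 0x2 with ASC 0x04 and ASCQ 0x02, decodes to NOT READY / LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED, matching the text the kernel printed.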
