The Smell of Hardware Errors in the Morning

Well, I woke up this morning as usual, checked my email, and found that Big Brother had notified me of a RAID error on UFies.org. It seems that sometime around 4am one of the partitions in the /var RAID5 array blew up.

hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=41083896, sector=53880
end_request: I/O error, dev 03:04 (hda), sector 53880
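The hex in those messages isn't magic. The status and error values are bit masks from the drive's ATA registers, and "dev 03:04" is a major:minor device pair. Below is a minimal Python sketch that decodes them. It assumes the flag names printed by a 2.4-era Linux IDE driver (the kernel version is my guess from the message format) and only handles major 3, the first IDE channel (hda/hdb). The helper names decode and ide_partition are hypothetical, not part of any tool.

```python
# Status-register bits, named the way the 2.4-era Linux IDE driver prints them.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# A subset of the error-register bits (assumed names; 0x80 is left out because
# the driver reports it as BadSector or BadCRC depending on other bits).
ERROR_BITS = {
    0x40: "UncorrectableError",
    0x10: "SectorIdNotFound",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, table):
    """List the names of every bit set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]


def ide_partition(dev):
    """Map a hex 'major:minor' pair on major 3 (first IDE channel) to a name like hda4."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major != 3 or not 0 <= minor < 128:
        raise ValueError(f"only major 3, minors 0-127 (hda/hdb) are handled: {dev}")
    disk = "hda" if minor < 64 else "hdb"  # 64 minors per IDE disk
    partition = minor % 64                 # 0 means the whole disk
    return f"{disk}{partition}" if partition else disk


if __name__ == "__main__":
    print(decode(0x51, STATUS_BITS))  # from "status=0x51"
    print(decode(0x40, ERROR_BITS))   # from "error=0x40"
    print(ide_partition("03:04"))     # from "dev 03:04 (hda)"
```

Run as-is, 0x51 decodes to exactly the three flags the kernel printed (DriveReady, SeekComplete, Error), and 03:04 maps to hda4, which suggests hda4 is the partition that dropped out of the array.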

Luckily, the error was detected and the partition was taken out of the array. It didn’t fail over to the spare disk that was in there and set up in /etc/raidtab, though. Hmmm… Anyway, I’m glad the spare drive is there. I just ran raidhotadd /dev/md1 /dev/hdb4, and after a few minutes of syncing, everything is back up. I hope this doesn’t signal the beginning of the end for that drive, as that would suck. Not hugely, since everything is either on RAID5 or mirrored nightly, but it could still be a PITA. Later tonight I’ll probably try to recover the device, in the hope that re-fdisking and re-formatting it will fix everything.
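A member that has been kicked out of an md array shows up in /proc/mdstat with "(F)" after its name, and a degraded array reports fewer working disks than it wants in its [wanted/working] counter. Here is a rough Python sketch of a check along those lines. It is not necessarily how Big Brother's RAID check works, and the SAMPLE text (including the hdc4 member and the block counts) is made up for illustration.

```python
import re

# Array header, e.g. "md1 : active raid5 hdb4[3] hdc4[1] hda4[0](F)"
ARRAY_LINE = re.compile(r"^(md\d+) : (\S+) (\S+) (.*)$")
# One member device: name, slot number, optional "(F)" failure flag
MEMBER = re.compile(r"(\w+)\[(\d+)\](\(F\))?")
# "[wanted/working]" counter on the line after the header
COUNTS = re.compile(r"\[(\d+)/(\d+)\]")


def check_mdstat(text):
    """Return {array: {"state", "level", "failed", "degraded"}} parsed from /proc/mdstat text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header[1]
            members = MEMBER.findall(header[4])
            arrays[current] = {
                "state": header[2],
                "level": header[3],
                "failed": [dev for dev, _slot, flag in members if flag],
                "degraded": False,
            }
            continue
        counts = COUNTS.search(line) if current else None
        if counts:
            wanted, working = int(counts[1]), int(counts[2])
            arrays[current]["degraded"] = working < wanted
            current = None  # only the first counter after a header is used
    return arrays


# Hypothetical /proc/mdstat contents for a RAID5 array with one failed member.
SAMPLE = """\
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md1 : active raid5 hdb4[3] hdc4[1] hda4[0](F)
      1024000 blocks level 5, 32k chunk, algorithm 0 [3/2] [_UU]
unused devices: <none>
"""

if __name__ == "__main__":
    for name, info in check_mdstat(SAMPLE).items():
        if info["failed"] or info["degraded"]:
            print(f"{name}: degraded={info['degraded']} failed={info['failed']}")
```

Running the same check again after the resync finishes is a cheap way to confirm the counter is back to full strength.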