Rebooted to apply the fix for the kernel exploit that Silverstr blogged about, but the box never came back up. Something to do with the SCSI drive not being detected. Sigh. I really hate this hardware. A lot. Anyway, Fred is going to go in and see if he can convince things to work again. At least there's a local mirror of /home now, or there should be, anyway. Guess the downtime scheduled for tomorrow has moved to today.
Update: Fixed, all back up and fine.
Update #2: The moral of the story is: if it ain't broke, don't fuck with it. I got both RAID arrays up and running just dandy, with 3/3 drives operational. However, because I had moved things around at one point, the arrays weren't put together right (and no spare drives were showing up). So I figured, no problem, I'd just pull the partition that belongs in array 1 out of array 2 and re-add the one that's supposed to be there.
Kernel oops when I tried that:
md: updating md1 RAID superblock on device
md: hde4 [events: 00000051]<6>(write) hde4’s sb offset: 18580864
md: <1>Unable to handle kernel NULL pointer dereference at virtual address 00000f90
[snip]
Stack: f88cd8ee […]
Call Trace: [
[
[snip]
md: recovery thread got woken up ...
md1: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
So it looks like the removal itself went through, but something in there dereferenced a NULL pointer. So now I'm waiting out the twenty minutes or so it takes the data-fortress dude to go in and kick the box. I don't anticipate any problems with it coming back up, but it's just a pain in the ass. I wonder if /dev/hdb is the real culprit here.
Why can’t I have a system where the hardware is stable?
Update #3: OK, rebooted, and things are fine. Added append=30 to the kernel options as Wim suggested, which will hopefully eliminate this sort of thing in the future. (Is anything needed besides adding it to lilo.conf and re-running lilo?)