Went into the co-lo again today to fix the Ufies box (again). Same problems as before Christmas, which I thought I had fixed on Christmas Eve, only to find out on Christmas Day that they weren’t completely fixed.
What was happening was that there were CRC errors on one partition, which caused the spare disk to be brought in (as it is supposed to be). However, a little way into rebuilding onto the spare disk, it would spew out the error “ServerWorks OSB4 in impossible state” and kernel panic.
The drives are set up as follows: 3x40G + 1 CDROM on the motherboard IDE controllers, and 1x40G on a Maxtor Promise IDE controller card. The 40G drives are all in a RAID5 array (with the one on the Promise card as the spare).
The drive that was reported as faulty (causing the spare to be brought
in) was the secondary master, /dev/hdc. /dev/hde was being brought in.
I swapped the drives between the controllers, and tried to bring each one
into the RAID array (via raidhotadd). It got a couple of percent into
reconstruction of the array and then the kernel panicked with the message I
mentioned above.
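
Since the box dies a few percent into the rebuild, it might help to log how far each attempt gets before the panic. Here is a rough sketch in Python that pulls the rebuild percentage out of the text of /proc/mdstat. The function name is made up, and the progress-line format in the comment is how I understand the md driver prints it, so treat that as an assumption and check it against the real output on the box.

```python
import re

# Assumed shape of the md progress line during a rebuild, e.g.
#   [>....................]  recovery =  2.1% (851968/40146560) finish=...
# Some kernels print "resync" instead of "recovery", so accept both.
PROGRESS = re.compile(r"(recovery|resync)\s*=\s*([0-9.]+)%")


def rebuild_progress(mdstat_text):
    """Return (kind, percent) for the first rebuild found, or None."""
    match = PROGRESS.search(mdstat_text)
    if match is None:
        return None  # no rebuild in progress (or the format differs)
    return match.group(1), float(match.group(2))


# Typical use (on the affected box, in a loop that logs to a file):
#   with open("/proc/mdstat") as f:
#       print(rebuild_progress(f.read()))
```

Writing the result to a file (and syncing it) every few seconds would leave a record of the last percentage reached before the panic, which could show whether it always dies at the same spot.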
I’m almost wondering if it is the motherboard IDE controller (not the
individual IDE channel) that is causing the problems, simply due to
the fact that it happened in so many circumstances. I’m going to be
doing a bit of stress testing on the drives and controllers I have
brought back home to see if there is anything wrong with them though.
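
For the stress testing, one simple starting point is a full sequential read of each drive on each controller, noting where (if anywhere) the read fails. A minimal sketch in Python follows. The function name and parameters are my own invention, it only reads (never writes), and it needs to run as root against something like /dev/hdc. It is not a substitute for a proper tool like badblocks, just a quick way to see if a drive/controller pairing falls over under sustained reads.

```python
def read_scan(path, chunk_size=1024 * 1024, max_bytes=None):
    """Read `path` sequentially; return (bytes_read, error_or_None)."""
    total = 0
    try:
        # buffering=0 skips Python's own buffering; the OS cache still applies.
        with open(path, "rb", buffering=0) as dev:
            while max_bytes is None or total < max_bytes:
                want = chunk_size
                if max_bytes is not None:
                    want = min(chunk_size, max_bytes - total)
                data = dev.read(want)
                if not data:
                    break  # end of the device or file
                total += len(data)
    except OSError as exc:
        # e.g. an I/O error from a bad sector or a flaky controller
        return total, exc
    return total, None


# Typical use (as root):
#   done, err = read_scan("/dev/hdc")
#   print(done, "bytes read;", "error: %s" % err if err else "no errors")
```

If the same drive reads cleanly on the Promise card but errors out on the motherboard controller (or the other way around), that would point at the controller rather than the disk.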
Anyone with a good clue as to wtf is going on, how to fix it or good
troubleshooting ideas is welcome to email me back at alan@ufies.org.
Of course, when I went to dinner tonight at the in-laws’ there were computer problems there that I deduced to be heat-related. Sadly, my method of deduction was grabbing the heat sink and going “Youch! That’s hot”. Luckily it wasn’t my box, and I didn’t do anything to it to break it, so that didn’t cause me (much) more (mental) pain.
I hate computers.