UFies.org Status Report

The UFies server sitting on my desk at home
You have no idea the depth of hatred I feel for computers right now. To make a long and painful story short: Box down for a couple of days. Hardware bad. To make it a bit longer than that and see a picture of the 10 drives stuffed in the huge case (3xSCSI, 4xIDE, 1xIDE (backup drive), 1 floppy and 1 CD), read on.

  • 8:17 – Acquire Tim Hortons Mocha. This is the end of the good things in this story.
  • 9:40 – Arrive at the Data Fortress hosting center
  • 9:40am-2:00pm – Back up all data to a spare 80G hard drive that Fred brought. Then we decided to boot up on a CD and make sure we could still access the drive and that the data was safe (after which the drive was going to be disconnected so that the backup was safe and I was free to do whatever I wanted partitioning and formatting wise). I/we ended up fighting with what appeared to be a bad cable or a bad IDE controller on the motherboard. The symptoms were that when the drive booted up it got about a 5MB/s transfer rate (something I didn’t notice when I was backing data up, other than it was slower than it normally is), and caused “drive not ready” and DMA errors when DMA was turned on. In the end I basically said “the data is there, I can deal with a slow drive, guess the IDE channel on the MB is going”, and headed off to lunch.
  • 2:00-3:00pm – Lunch at the food court. Mmm….. sushi…..

  • 3:00-6:00pm – Install Gentoo, originally with the idea that the SCSI drives can now be turned into system drives, and the IDE drives can make up the /home partition. However, the SCSI drives would intermittently freeze the system up solid when mounting a partition on the SCSI drives or when formatting the newly created RAID5 array on them. In the end I said (paraphrased) “oh darn” and decided to go back to IDE for the system, figuring it was something odd in the Gentoo kernel.

    I created the IDE RAID arrays and started the install. Gentoo is a slow install to begin with, but it was slowed even more (I realized this only after) due to the CDROM probably getting the same horribly slow 5MB/s transfer rate that the spare hard drive was getting on the IDE channel on the motherboard (where the CDROM was connected). Slow to copy files, then the slow process of setting up various files, slow to sync the portage tree, slow everything. I’m sure part of it was due to it getting towards the end of the day and me feeling things come down to the wire once again. Compile the kernel, run through its options, make sure things are all good, things are looking up.
  • 6:00-6:30pm – The install is almost done, all that is left is to configure GRUB, compile SSH so I can access the system remotely to continue configuring, and then I can go home! All the time the RAID5 has been resyncing the 70G partition I created. This is because it’s the first time that it’s been set up so that’s what it does. The syncing has slowed down my install due to the file writing in the background and the install has slowed down the resyncing.

    Right at the end (I assume) of the resync the same fucking error message is spit to the screen. DMA error, kicking /dev/hdf out of the RAID array, starting to resync with the hot spare.


    At this point I considered committing hara-kiri but decided to give the hardware the benefit of the doubt. I had had problems with the drive before, even though it was a new drive, and maybe, just maybe, it was actually a bad drive. I can let it resync in the background and just continue on.

    Last step, configure the boot loader. I run grub and tell it to boot off of /dev/hde1 for me. Grub comes back and tells me “I’m sorry Dave, I can’t do that.” Why not? “Can’t mount the partition.” But the partition is right there? “No it’s not!” I can see it in your command completion! “No you can’t! You’ve had 8 hours of sleep over the last couple of days and just spent 8 hours with your neck craned up at a monitor that’s horribly placed for anyone sitting down, your eyes are starting to go and your hands are shaking, you don’t know what’s going on!” Ok, maybe you’re right.

    So I decide to run fdisk to see if maybe the partition was tagged as something it wasn’t supposed to be. Fdisk came up with an empty partition table and a message about some sort of error that will be corrected by a write.

    WTF? WTFFF???

    Check all the IDE drives (/dev/hd[e-h]), all had an empty partition table and the same error message. All the drives were on a secondary PCI IDE controller, so they wouldn’t be affected by the (theoretical) bad IDE channel on the motherboard.

    Technically I could recreate the partitions (the scheme was pretty easy) and things would most likely be fine. But at that point we realized that something beyond a bad drive, a bad cable, or some oddness with the kernel was going on, so I packed up and threw the server in my trunk. It’d be easier to deal with at home, not under the watchful eye of an employee who wanted to go home only an hour late.
  • 6:30pm – After 8 hours of completely wasted time, I drove home in the pouring rain, stopping only for $10 in gas and to forget to buy windshield washer fluid.
  • 7:30pm – Arrive home to tea and turkey Caesar salad. Yummy. Guess there was another good thing in the story.

If you read this far I’m impressed at your perseverance. If you skipped to the end to see what the result was after all the bitching was through I’m impressed by your efficiency.

I hope to have the box back up by Monday January 26th in one form or another, be it with the old OS and on borrowed hardware, or on a new motherboard, or something. My sincere apologies to those who rely on the box for mail and websites; I’m working as quickly as I can.
