Thread: Gawd o Gawd
09-03-2001, 01:06 PM   #4
Undertoad
Radical Centrist
 
Join Date: Jan 2001
Location: Cottage of Prussia
Posts: 31,423
OK, here goes. About a week ago I got a bad block reported on the drive. I stepped up my schedule of backups but noticed that, through the backup process, I could figure out which file contained the bad block. It was a log file; I figured this was good, since it was probably eating up space that hadn't been touched in a while. I segregated the file and marked it as having no permissions at all.
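For anyone who wants to do the same trick, here is a rough Python sketch of "segregate the file and strip its permissions". It is not what I ran at the time; the function name and the quarantine directory are made up for illustration. The idea it assumes is that renaming within the same filesystem leaves the file's data where it is on disk, so the bad block stays tied to a file nothing will read or write.

```python
import os
from pathlib import Path


def quarantine(path, quarantine_dir):
    """Move a file aside and remove every permission bit from it.

    Hypothetical helper. quarantine_dir is assumed to be on the same
    filesystem as path: os.rename raises OSError otherwise, which is
    what we want, because a copy would read the bad block.
    """
    path = Path(path)
    quarantine_dir = Path(quarantine_dir)
    quarantine_dir.mkdir(parents=True, exist_ok=True)

    dest = quarantine_dir / path.name
    os.rename(path, dest)  # metadata-only move; the file's data is not read
    os.chmod(dest, 0)      # mode 000: no read, write, or execute for anyone
    return dest
```

Note that root can still open a mode-000 file, so this only keeps ordinary users and their programs away from it.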

I ordered another drive but wasn't sure when I'd have a chance to install it. But on Saturday morning, more bad blocks started popping up. The slowly dying drive was now leaping off the cliff.

I made a final backup from one SCSI drive to the other - thinking I'd yank the bad drive, install the new one, install a fresh copy of Linux to that drive, and copy the old stuff back from the old drive. During that process, I was forced at one point to reboot. On reboot the bad drive failed a little harder. Now it failed the automatic fsck (that's roughly ScanDisk for you non-Linux weenies). I manually fsck'd and was able to recover and boot past about 100 file system errors. Now, clearly, bad spots were causing file system corruption.

Still, I recovered everything I could, then swapped out the bad drive, then began a fresh install from CD.

Then I made a fatal error. In the Red Hat install, you are given the option of allowing Red Hat to partition for you. In my haste, I decided that the default partitioning scheme wasn't bad and that I should just go for it.

What I didn't realize is that Red Hat assumes control of ALL the drives on the system in such a case, not just the first/booting drive; and it went ahead and politely reformatted and repartitioned the second drive as well.

Sys admin lesson: never make any assumptions about a vendor's defaults.

Since I had previously given my wife a window of between 1 hour and 7 hours to complete the whole process, I let her know that it would be closer to 7 hours. Then I repartitioned again, this time manually, setting things up precisely how I wanted them, and reinstalled.

I had a secondary backup FTP'd to another system, in addition to my main backups, which were aging. Putting both of them together on top of a fresh install, without overwriting any system files, I got almost everything back.
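The merge step, roughly, as a Python sketch. This is an illustration rather than the actual commands used; the function name and the directory layout (each backup unpacked into its own directory that mirrors the target tree) are assumptions. The rule is simple: never replace a file that already exists, so files from the fresh install stay put, and the backup listed first wins when both backups have the same file.

```python
import shutil
from pathlib import Path


def restore_without_overwrite(backup_dirs, target):
    """Copy files from each backup into target, skipping existing paths.

    Hypothetical helper. backup_dirs should be ordered newest first,
    so the newer copy is restored when both backups contain a file.
    """
    target = Path(target)
    restored = []
    for backup in map(Path, backup_dirs):
        for src in sorted(backup.rglob("*")):
            if not src.is_file():
                continue  # directories are created on demand below
            dest = target / src.relative_to(backup)
            if dest.exists():
                continue  # fresh-install file or already restored: leave it
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also keeps timestamps and mode
            restored.append(dest)
    return restored
```

Called as restore_without_overwrite([newer_backup, older_backup], "/"), it returns the list of files it actually put back, which is handy for checking what came from where.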

Total time spent was about 6.5 hours and it was on the middle day of a three-day weekend. But that's OK. It was probably the best day for it to happen as many USians are on vacation.