The Cellar

Undertoad 09-23-2004 09:37 PM

Latest Cellar crash better
 
Holy crap, people.

This one was a killer, but I think the machine is now much better, thankyouverymuch. What follows are the crappy details that I had up here instead of a real Cellar for a few hours:
---


The power supply fan seized. OK, how many times have I mentioned how
important the power supply is? Now there's a very expensive Enermax
running the Cellar machine.

It first appeared that the system was suffering from the same trouble
that hosed it last time - a failing drive. But once I realized how hot
the machine was, and that the power supply was particularly hot, I
figured it might be something more serious. I replaced the power
supply but the system was still failing. And then the motherboard
suddenly died.

The process of rescuing a Linux system that won't boot involves
booting from a "rescue" CD, fixing what's wrong with your system, and
then booting from the system again. This means hitting that DEL key,
going into the BIOS, and telling it to boot from the other device.
After doing that 20 times, suddenly the system just wouldn't boot at
all. I may have blown another BIOS setting, but... thing wouldn't
even beep.

That was last night, and at that point I had no recourse because the
stores were closed. This morning I bought a new motherboard (and a
new processor just in case). Installed those, they run great...
But what THEN turned up was a slowly failing hard drive or further
filesystem confusion.

So! I added another drive, and moved one filesystem to it, leaving
the root system on the questionable drive. I'll upgrade and move the
whole thing to another new drive over the weekend.

This corrected almost everything wrong, but the Cellar still wouldn't
fly. In fact, starting it would cause the system to crash again!
I found that the database tables that run the Cellar had been really
messed up, requiring very complete rebuilds. The system lived well
through database reconstruction work, which was like a stress test on
everything except the web and database. But that added another couple
hours of downtime for the Cellar. That's where it is now, the tables
aren't working right for some reason...

This stuff is getting expensive and so I'm going to add the tip jar back.
Future tip jar donations will go to devoting an entire system solely to
the Cellar, a move that's well overdue. It used to have its own system
when it was dial-up.

And why did all this happen? One reason I can think of. I washed the
fan filter, and I might have put it back WET. It didn't feel wet but
you know how those things are. Well, in a dusty environment (your
basic house), dust + water = glue. It might have just been too humid.
The thing died a week after replacing that filter so this may be a
reach, but my next house will include a clean room.

Undertoad 09-23-2004 09:42 PM

In the end, it was the long hours of trying to get everything back up and running that led to a few additional hours of downtime. I knew I had a good set of data, but was going to kinda heroic efforts to save about 6 hours' worth of messages -- which is how many were lost this time. A handful.

I was so frustrated working all this time, additionally with no Cellar -- that I started making dumb mistakes. Jacquelita told me to step away from it for a while and she was right.

Now, on the good side, this system is almost entirely new. New mainboard, new power supply, new CPU and cooler, one out of two new disks. My goal is to upgrade it, maybe some time off hours this weekend or something, and then get rid of the last disk still in use. At that point the system will be almost entirely new hardware and entirely new software, and at that point it should only be another few months until the next crash.

lookout123 09-23-2004 10:09 PM

Tony - you are an absolute stud! thanks for all the energy (and money) you expend for the community.

Jacquelita 09-23-2004 10:10 PM

Hurray!
 
Congratulations UT

I knew you'd get this %$#@ figured out! Now you can rest easy tonight... Tomorrow night - Martinis - on me! :browhappy

Happy Monkey 09-23-2004 10:10 PM

http://www.cellar.org/images/smilies/eek.gif http://www.cellar.org/images/newsmilies/pain3.gif
Yikes.

Good work on the recovery.

lookout123 09-23-2004 10:13 PM

oh and just in case you have some spare time on your hands... i still don't see a tip jar. even if we can't contribute millions, every little bit has to help, right?

xoxoxoBruce 09-23-2004 10:32 PM

The house looked intact so I figured it was something you could handle, given enough time and money. Thanks, UT. :band: Thanks Jax.

glatt 09-23-2004 10:37 PM

I can only imagine what a frustrating pain in the ass this must have been for you.

Glad to see it up and running again.
Thanks for the effort you put into this. :)

Undertoad 09-23-2004 11:02 PM

And it's still a frustrating pain in the ass -- kernel just panicked a minute ago, and the system had to be rebooted.

Looks like this'll continue until I can get the root disk changed out... dammit

Undertoad 09-24-2004 07:52 AM

It stayed up overnight so things look good. Still expecting to upgrade/move this weekend, if I can manage an after-hours timeframe.

The PayPal donate button is back on the front page. Do consider donating. I know there's a small chance I could become a multi-multi billionaire next year, but it really is a small chance. Plus if that did happen I would give the money back to you, if I could. I have to pay for all this new hardware next month and I think I will be selling lint on eBay to do so.

Elspode 09-24-2004 09:00 AM

Genuine Cellar lint?! Man, I'll start the bidding off at $5.00!

Thanks, as always, UT. I don't know why you are so obsessed with keeping The Cellar a living and breathing entity, but I'm mighty glad that you are.

SteveDallas 09-24-2004 09:36 AM

You're going to sell lint on ebay???

Be careful, man, I think AT&T's Unix patents still cover that. You don't want to be mixed up in this SCO mess...

Seriously, that bites. Once again thanks for all you do to keep this place running.

Oh, and,
Quote:

Originally Posted by Undertoad
Still expecting to upgrade/move this weekend, if I can manage an after-hours timeframe.

Fuck that. This isn't some high-availability corporate environment. Do the move when you feel like it. Don't feel obligated to do it in the middle of the night just to avoid inconveniencing the rest of us.

BrianR 09-24-2004 10:35 AM

I'll bid $6.00! :D

lookout123 09-24-2004 10:50 AM

$6.25!

lumberjim 09-24-2004 12:55 PM

well.

jinx and i just made a small donation, and going by the number in parentheses (which i believe to be the total number of transactions to a payee) there had only been 17 donations to the tip jar previously. i sure hope that that is not the TOTAL number ever. you can spare $25 or $50. get off of it. it's important. Tony and bruce can't pay for this all by themselves.


All times are GMT -5.