The Cellar (http://cellar.org/index.php)
-   Technology (http://cellar.org/forumdisplay.php?f=7)
-   -   Anatomy of a hang (http://cellar.org/showthread.php?t=10297)

Undertoad 03-21-2006 11:25 AM

Anatomy of a hang
 
It's another one of those thread titles that sounds dirty but isn't.

Like every PC ever known, my current desktop PC is going through a painful period. It's experiencing intermittent hangs. Freezes.

I'm an experienced problem-solver; I did it for a living for a while. But I have no idea why mine is having this problem, so I figured I'd list what I did, and what I do, to try to figure this one out.

The symptom: well, usually it happens in Firefox. I'm opening up tabs, or scrolling, or doing something, when the system stops being responsive. The cursor may go to an hourglass in the Firefox window. For about 10 seconds the system may be able to do other things; I may be able to hit ctrl-alt-del to get to the windows task manager, or I may be able to switch windows, or I may be able to highlight a desktop icon. Or I may be able to do none of this.

I think I may usually be doing memory-intensive operations when it happens. I don't think it will happen right now, for example; all I'm doing is typing into the edit box, so I'm thinking it'll stay stable during this period.

Just in case, I'll save this post :) It's the "symptoms" part, and next I'll say what I think so far.

Undertoad 03-21-2006 11:36 AM

The hang, as far as I know, has only happened when Firefox has been active. But it has happened when Firefox was open and Thunderbird was executed - T-bird being a hugely memory-intensive thing, more so than Firefox.

As a problem solver, my first thought is: what has *changed*? Is there anything I can point to in the last few days that might make a difference?

Hardware: a big yes. In fact, I moved my entire room around last week. At that point, a day later, my sound card failed. Coincidence? Who can say, but a failing speaker system might have led to a shorted output, blowing a channel, so I had to replace the whole thing. These days it is hard to get a mid-priced sound card in stores, so I wound up getting an external: a Creative USB solution.

But the system had been running correctly with this in place for a week, and was rebooted several times during that period.

Software: yes. Two days previously, I had run a "startup cleaner" and turned several things off during startup. This was partly because I was annoyed that Real wanted to remind me to upgrade, those fuckers. Partly because I had been cleaning Jacquelita's system of malware, and I was getting paranoid.

I was able to reboot cleanly and run well after that. But shortly thereafter I started to get errors with a hardware monitoring program that watches my motherboard's temperature, fan, and voltages. It would not run at startup, although when I ran it after startup it seemed to run fine.

Undertoad 03-21-2006 11:47 AM

OK, so does the motherboard monitoring software say anything interesting? It says my CPU is so cool I could touch it, my fans are running fine, and the power supply is producing good voltages -- while the monitor is watching it -- except for maybe a few little hitches on the 12V rail. I write that off to disk usage.

Although I should say, when the system hangs, the disk activity light seems to be at 100%.

Is this problem hardware or software? Of course, it only seems to happen when software is running; the system has remained stable overnight, when it's relatively inactive. So my first thought was that it must be Firefox. The problem started to really get aggravating yesterday, and only appeared in Firefox; Unreal Tournament 2004 worked for an hour, and that'll tax your memory and your whole system harder than Firefox.

But was that just chance?

I disabled smooth scrolling, since it seemed to happen during scrolling. No effect.

I disabled a few extensions that I didn't need, and uninstalled a few that I never used. Often Firefox extensions will make it seem like Firefox itself is unstable. But this had no effect on the problem.

Undertoad 03-21-2006 11:55 AM

OK, seems like a memory problem then. So I've taken the 1GB out of my other system, and replaced the 768MB currently in this one. A complete memory upgrade: more memory, faster memory, and identical sticks.

The problem still happens.

And that's where it stands now. Next, I'll do the IotD, which will take a few memory-intensive, system-intensive things (maybe even Photoshop), and if this morning has been any example, the whole thing will just freeze up at some point in the middle of all that.

It just seems more hardware-ish. The motherboard is an Asus, whatever the most common Socket A AMD-supporting board was a year and a half ago. The capacitors are not bulging.

Your thoughts, anyone?

lumberjim 03-21-2006 12:02 PM

yeah right

dar512 03-21-2006 12:17 PM

Have you done a complete disk check?

smoothmoniker 03-21-2006 01:12 PM

Is the problem that you're using a PC?

Undertoad 03-21-2006 01:22 PM

Disk check = OK

PC = yes, but if this problem is figured out I can fix it myself for as little as $50, or reuse components to build another for $200, so the point is moot.

Beestie 03-21-2006 01:24 PM

I don't know much about hardware or software but I have some guesses:

1. Firefox has a memory leak; and/or
2. Firefox inadvertently creates multiple references to the same memory address; and/or
3. Firefox creates partially overlapping references to the same memory block ; and/or
4. Firefox has some unmapped cases in its algorithms.

I've installed and uninstalled Firefox quite a number of times on several computers and have decided that while I like the program a lot, it needs a few more minutes in the oven.

Clarification on point #1. I don't necessarily think it's a true memory leak because the task list does not reflect excessive memory allocation to Firefox. But the way it starts behaving makes me wonder if somehow, Firefox thinks all the memory is tied up and therefore unavailable, leading it to freeze, which somehow cascades into a general PC lockup. I dunno but I'm past tired of trying to get that program to act right.

dar512 03-21-2006 01:28 PM

If you're not having problems in any other program, then I'd have to suspect Firefox also.

Undertoad 03-21-2006 01:44 PM

I've used Firefox since 0.7. And, somehow someway, I've been using it for the last two hours without a problem. I've opened and closed 50 tabs; my IotD search method involves opening 25 tabs simultaneously.

I think it shows up in Firefox because Firefox is 90% of what I do. When I can get to the task manager during the hang, it reports Firefox as taking about 100,000 k when the problem seems to happen.

There was one odd thing, though... during that disk check, the D: drive seemed to have a moment where it was louder and grindy-er. It got through the check OK, but I'm gonna back that up now and check it harder. There is space allocated on that drive for swap... or whatever Windows calls it...

barefoot serpent 03-21-2006 03:25 PM

Quote:

Originally Posted by Undertoad
or whatever Windows calls it...

Yeah, you might want to fiddle around with the virtual memory settings.

Undertoad 03-21-2006 03:43 PM

Yes, I've moved all virtual memory off that drive, and I'm now moving all the data off that drive. Because of various incorrect business decisions, I happen to have about 6 80GB drives, so upgrading the drive anyway is not a bad idea.

(If anyone wants one of these Seagate Barracuda 80GB drives, let me know. Unused. $50 plus shipping. PM me.)

Kitsune 03-21-2006 03:48 PM

I had the same problem, but could never tell if it was memory, disk, or motherboard. Does your HDD access light go solid when you freeze up as well?

Because it was so infrequent and never resulted in an OS or App dump, I could never diagnose it. Troubleshooting it by swapping hardware would have proven too expensive.

Undertoad 03-21-2006 03:53 PM

Yes, it goes solid... where I have the system, under the desk, I don't notice the light so much. But every time I've checked, during the hang, that light is 100% on.

Kitsune 03-21-2006 04:05 PM

Quote:

Originally Posted by Undertoad
Yes, it goes solid... where I have the system, under the desk, I don't notice the light so much. But every time I've checked, during the hang, that light is 100% on.

I had the same problem on my Shuttle. I feared that I had done some damage to the box since the case had awful air flow and I tended to really cook the thing, but the problem never made itself known while I was playing memory and processor intensive games. It only showed up when I was simply using MS explorer to browse directories and files. The HDD light would go solid, the system would "pause" and it seemed as if the disk couldn't find what it was looking for. Yet, I couldn't hear the disk seeking/clicking/grinding and after a few moments, the HDD light would go out and everything would resume. The box never froze for more than 7 seconds, but it was really annoying because it was always more than 3s. An OS reload didn't fix it.

I'm tempted to blame the on-board IDE controller, just because of the HDD access light, but who knows?

Let me know if you want any of the hardware manufacturer information or model numbers. Maybe there is something in common.

Undertoad 03-21-2006 04:27 PM

Moving files from D: to C:, the system got really slow during one set of files, and hung during one particular file.

When I returned to explore that folder, the system hung again while I was just browsing the suspect directory. I didn't even open any of the files.

Luckily I don't need any of those particular files. I've been able to move just about everything else off of that partition.

Sadly there is another 20GB partition to move before the entire disk can be swapped out. But we have a key suspect in the hangs. Sadly we still don't even know whether it's hardware or software at its root, right? It's either bad sectors on the drive, the handling of which is causing Windows internals to completely barf, which really shouldn't happen, or an NTFS filesystem problem which Windows' own disk check failed to find.

Not looking good for Mr. Gates. Updates to follow.
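
Editor's sketch: the "hung during one particular file" observation can be turned into a repeatable check by reading every file under a folder and timing each read. This is only a rough illustration in Python, not what Undertoad ran; the function name `scan_tree` and the 2-second threshold are made up for the example. It prints each path before reading it, so if the whole machine freezes, the last line on screen names the file being read at the time.

```python
import os
import time


def scan_tree(root, slow_seconds=2.0, chunk_size=1 << 20, verbose=True):
    """Read every file under root; return (path, seconds, error) for suspects.

    A file is a suspect if reading it raises an OSError or takes longer
    than slow_seconds. The threshold is an arbitrary assumption.
    """
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if verbose:
                # Flush so the path is visible even if the read never returns.
                print("reading", path, flush=True)
            start = time.monotonic()
            error = None
            try:
                with open(path, "rb") as handle:
                    # Read in chunks and discard the data; only timing matters.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                error = str(exc)
            elapsed = time.monotonic() - start
            if error is not None or elapsed > slow_seconds:
                suspects.append((path, elapsed, error))
    return suspects
```

Calling `scan_tree("D:\\")` (or any folder) would list the slow or unreadable files. A clean result does not prove the disk is healthy, since it only touches sectors that currently hold files.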

Kitsune 03-21-2006 05:01 PM

Quote:

Originally Posted by Undertoad
It's either bad sectors on the drive, the handling of which is causing Windows internals to completely barf, which really shouldn't happen, or an NTFS filesystem problem which Windows' own disk check failed to find.

Try giving XP a burned CD with some bad sectors on it. The result, when the sectors are accessed, is a panic and unexpected reboot. There is no doubt that Windows doesn't know how to handle bad data, but I'd expect some kind of error during the disk checks you did.

This reminds me of an issue that took me months to diagnose on some servers at work. Check out this IBM system hang from hell:

Quote:

The RSAII (Remote Supervisor AdapterII) has an internal timer which rolls over every 76.5 days.

Just prior to this event, the code gets into a mode whereby it sends multiple SMI (System Management Interrupt) signals to the main processors, causing it to hang, or freeze.
No errors, no dumps, just a sudden, unexpected black screen. We had to let it happen twice before we figured out that it happened exactly every 76.5 days.

tw 03-21-2006 05:17 PM

Quote:

Originally Posted by Undertoad
OK, seems like a memory problem then. So I've taken the 1GB out of my other system, and replaced the 768MB currently in this one. A complete memory upgrade: more memory, faster memory, and identical sticks.

The problem still happens.

Your symptoms are consistent with voltage problems. BTW, intermittents created by electrolytic failure can occur before those capacitors start bulging. Motherboard monitor is not sufficient for monitoring voltages - only for monitoring for voltage changes. IOW motherboard monitor must be calibrated with a multimeter.

Meanwhile, what numbers are you using for 'good voltage'? What are you using (what program) for testing NTFS filesystem? Did you download the hardware diagnostic for that disk drive and execute only 'read only' tests? Hardware test is independent of the NTFS filesystem test - and sometimes provides useful information.

One final point. Confirm that the BIOS settings still agree with what the drive actually is. I have seen where BIOS refused to see a drive properly - slowly destroyed disk data structures as NTFS kept fixing them.

What are you using for file transfers? Most copy programs have an option to ignore errors - to complete the file transfer.

Few hardware items can hang a pre-emptive MT system. They include memory, CPU, only some functions in the peripheral interface, and the video controller. A disk drive with internal problems should not hang an MT system. It should only hang the task. That is the list of usual suspects.

Start with the volt meter. Don't use the motherboard monitor until after calibrated with that meter.

Undertoad 03-21-2006 05:18 PM

Well I hadn't done a sector-level check yet; I wanted to just get the data off ASAP, which is now done (after yet another hang). It's doing the sector-level check right now.

Still don't even know 100% that it's the disk -- that's what's frustrating about diagnosis like this. But that's part of why I wrote it here - I figured, hey other people might enjoy the drama of watching someone else's guessing session.

tw 03-21-2006 05:27 PM

Quote:

Originally Posted by Undertoad
I figured, hey other people might enjoy the drama of watching someone else's guessing session.

A power supply that appears to be good without using the meter can cause numerous other components to appear defective. It is why such hardware analysis is best started by first confirming hardware starting with the one function that can subvert everything else - power supply. Again, what numbers are you using for voltages? I ask because specification numbers and measured numbers may not be the same. What does that 12 volts dip to?

Undertoad 03-21-2006 05:55 PM

I've now measured it with the multimeter.

Motherboard monitoring program says 12v+ is 12.41
Multimeter says 12.26

Motherboard monitoring program says 5v+ is 5.06
Multimeter says 5.11

The monitoring program says the 12v+ dips just slightly, to 12.35.

(Microsoft's) sector-level check now complete: checks out OK.
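
Editor's sketch: the two sets of readings above are enough to work out how far the monitoring program is off on each rail, which is the "calibrate with a multimeter" step tw asked for. The Python below is only an illustration; it assumes a single fixed offset per rail (one measurement cannot show whether the error changes with voltage), and the function names are made up.

```python
def calibration_offsets(monitor, meter):
    """Offset to add to a monitor reading to match the multimeter, per rail."""
    return {rail: round(meter[rail] - monitor[rail], 2) for rail in monitor}


def corrected(reading, offset):
    """Apply a per-rail offset to a later monitor reading."""
    return round(reading + offset, 2)


# Readings quoted in the post above.
monitor = {"12V": 12.41, "5V": 5.06}
meter = {"12V": 12.26, "5V": 5.11}

offsets = calibration_offsets(monitor, meter)
print(offsets)  # {'12V': -0.15, '5V': 0.05}

# The monitor reported a dip to 12.35 on the 12V rail.
print(corrected(12.35, offsets["12V"]))  # 12.2
```

So, under that fixed-offset assumption, the monitor reads about 0.15 V high on the 12V rail, and the reported dip to 12.35 corresponds to roughly 12.20 V at the meter.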

tw 03-21-2006 06:21 PM

Quote:

Originally Posted by Undertoad
I've now measured it with the multimeter.

Motherboard monitoring program says 12v+ is 12.41
Multimeter says 12.26

Motherboard monitoring program says 5v+ is 5.06
Multimeter says 5.11

The monitoring program says the 12v+ dips just slightly, to 12.35.

Those two voltages are high - and ok. But what is the 3.3 volt wire? That orange wire should measure above 3.23 volts. (Red is 5 volts; yellow is 12 volts). I assume red wire was measured. But voltage on purple wire is also significant (should be greater than 4.87). Also gray wire should remain well above 2.5 volts (gray wire typically would not be consistent with your symptoms but check it anyway). If all four (red, orange, yellow, and purple) measure OK, then move on to those other suspects.

Based upon those numbers, set alarm points for the voltage monitor to 11.86 or 11.9 (for 12 volt) and to 4.9 volts (for 5 volt).
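
Editor's sketch: the thresholds in this post can be written down as a small lookup so that each multimeter reading is checked the same way. The Python below is a rough illustration using only the numbers tw gives (3.23, 4.87 and 2.5 as stated minimums, 11.86 and 4.9 as the suggested alarm points); the names `MINIMUM_VOLTS` and `check_rails` are made up, and these figures are tw's, not quoted from the ATX specification.

```python
# Lowest acceptable reading per wire color, taken from the post above.
MINIMUM_VOLTS = {
    "orange": 3.23,   # 3.3 volt wire
    "red": 4.9,       # 5 volt wire (suggested alarm point)
    "yellow": 11.86,  # 12 volt wire (suggested alarm point)
    "purple": 4.87,
    "gray": 2.5,
}


def check_rails(readings):
    """Return (low, missing): wires at or below minimum, and wires not measured."""
    low = [
        wire
        for wire, floor in MINIMUM_VOLTS.items()
        if wire in readings and readings[wire] <= floor
    ]
    missing = [wire for wire in MINIMUM_VOLTS if wire not in readings]
    return low, missing


# Only the red and yellow wires have been measured so far in this thread.
measured = {"red": 5.11, "yellow": 12.26}
low, missing = check_rails(measured)
print(low)      # []
print(missing)  # ['orange', 'purple', 'gray']
```

With the numbers posted so far, nothing is low, but three of the five wires have not been measured yet.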

Doubt it will provide any further information. But disk drive manufacturer's test program for that disk also could be used for seek tests, various multisector access tests, and other things that Microsoft program does not accomplish. Do this if short on ideas.

busterb 03-21-2006 09:39 PM

My 2 cents. A multimeter might, but my son and I had a bad disagreement about a power supply. The meter showed it good. But a PS tester said no go.
About 15 bucks. POWMAX atx power tester.

Elspode 03-22-2006 12:49 AM

I agree with Buster, here. I've had some weird PS shit happen, and a new cheap one installed fixed my problems.

tw 03-22-2006 07:34 AM

Quote:

Originally Posted by busterb
My 2 cents. A multimeter might, but my son and I had a bad disagreement about a power supply. The meter showed it good. But a PS tester said no go.
About 15 bucks. POWMAX atx power tester.

Tester cannot test for all that a meter does. A best power supply test is when fully under load - in the system. A power supply disconnected cannot be properly tested. Also note voltages - the numbers. Numbers are not the published ATX limits. I asked for numbers - not the subjective "power supply is good" - for this reason.

A best test of a power supply is to take numbers while multitasking is accessing every peripheral - disks, floppy, CD-Rom, network, sound card - simultaneously. Anything done by a power supply tester can be performed by the meter. There are also power supply defects that a tester cannot detect, but a meter can. The power supply tester cannot test a power supply under full load - when many defects become apparent.

Then there is the rest of a power supply 'system'. It’s not just the power supply that must be tested. This is also accomplished without disconnecting anything.

The down side of a meter is that these tricks must be understood. For example, what voltage would you have called 'good'?

Best way to test a power supply is when connected to system. Never start by disconnecting things until long after relevant facts have been collected. Power supply tester cannot do that. Just another reason why a meter finds problems or confirms power supply integrity so much faster. Unfortunately, too many declared a 'subject' good rather than provide those numbers. Those numbers - such as UT's numbers - tell us more about the system that has not been discussed. This is why those other voltage numbers (not yet provided) might be informative.

Pie 03-22-2006 07:44 AM

Also, check out your northbridge fan. I had similar intermittent failure on my fileserver -- turns out the nb fan was choking, and data was getting corrupted on the way to the drive... I lost two years' worth of email. :(

Oh, yeah -- now I back it all up three ways. :)

tw 03-22-2006 07:48 AM

Did it? For example, in a GM car, they kept replacing the computer. There was nothing wrong with the computer. But computer was replaced rather than first learn WHY failure was happening. Swapping was only temporarily cleaning a defective connector. Car would fail again later.

Same lessons are from Challenger. Management insisted that it was safe to launch because a shuttle safely launched one year previously. They ignored the near burn through of O rings in that one year previous flight. They did not want to know why. A perfect example of fixing things without first learning the whys. In that case, we should have called Challenger murder. Instead we destroyed the career of the engineer who told the truth to the Rogers Commission. Instead too many insist they need not know why - if it appears to work.

In a third case, a GM shop foreman finally got tired of same GM model (Buick) with similar problems. So he broke open the computer. In each failure, the PC board was cracked in a corner. Regional rep then told him this is a known problem even though it was not in any service bulletin. Since the test facility was not informed of this problem, then vehicle computers tested OK and were shipped as repair parts. At GM, because reasons why were not important, then failure was acceptable.

Numerous examples that also explain why I see this so often with clone computer users. They get used to having failure as a norm. It is the difference between just swapping parts to fix something - curing symptoms - versus fixing something right the first time - learning why.

tw 03-22-2006 07:54 AM

Quote:

Originally Posted by Pie
Also, check out your northbridge fan. I had similar intermittent failure on my fileserver -- turns out the nb fan was choking, and data was getting corrupted on the way to the drive... I lost two years' worth of email.

If Northbridge needs a fan, then the Northbridge is defective. A computer must work just fine in a 100 degree room or have parts heated by a hairdryer on high. Heat is a diagnostic tool. Unfortunately, too many fix a defect by curing symptoms. They don't learn the whys. They simply install more fans.

That is the Home Improvement joke. Fix things with "more power". If a fan is required on that Northbridge, then the Northbridge IC is defective. One chassis fan is more than sufficient cooling for most every computer. That one chassis fan will provide sufficient airflow over any Northbridge.

But again, that Northbridge must work just fine when so warm as to be uncomfortable to touch. Learned this from the old timers who used to say in the 1960s - if it does not leave skin, then it is not too hot. Today, our ICs must run normally at even higher temperatures.

A Northbridge fan suggests Northbridge IC is defective. Someone cured the symptom rather than fix the problem. Heat is a diagnostic tool.

Undertoad 03-22-2006 08:01 AM

So far so good - the new drive is in, partitioned, formatted and is now getting all the data from the old drive, and -- no hangs in hours.

The outgoing drive is an IBM Deskstar 60GB - not considered the most reliable of drives. It has a manufacture date of Jun 2001 so it has seen enough duty and can be retired.

That N-bridge fan, Pie, is a pet peeve of mine in motherboard designs. I can't believe manufacturers decided the best way to handle that problem was to put a dinky, weak fan right where all the dust in the system will flow and clog it up. This MB has a heat sink there, much better idea.

Kitsune 03-22-2006 08:44 AM

Keep us posted, UT. I'd be interested in how the new disk fares, as it might be the solution to my shuttle issue.

Pie 03-22-2006 10:02 AM

Quote:

Originally Posted by Undertoad
That N-bridge fan, Pie, is a pet peeve of mine in motherboard designs. I can't believe manufacturers decided the best way to handle that problem was to put a dinky, weak fan right where all the dust in the system will flow and clog it up. This MB has a heat sink there, much better idea.

Yep. That was a sucky motherboard, alright. Maybe one of these days we'll get around to replacing it. Right after we finish the MythTV box. :rolleyes:

tw 03-22-2006 12:04 PM

Quote:

Originally Posted by Undertoad
So far so good - the new drive is in, partitioned, formatted and is now getting all the data from the old drive, and -- no hangs in hours.

Were you able to recover the 'corrupted' files? If so, how?

Undertoad 03-22-2006 12:22 PM

I didn't need those particular files, so I just deleted them by deleting their parent directory from an explorer window. It could have just been coincidence that the hang happened when "revisiting" those particular files.

Even now, if this fixes the problem, it's hard to figure out what really happened (and not worth the time to diagnose more completely). It was probably the drive failing, but still, Windows should fail more gracefully when faced with a resource that's having trouble. It surprised me when the system hung even when I took virtual memory duties away from that drive. I could see a failing drive causing an OS a headache when it's swapping to it, but when doing more "routine" I/O, just reading files or folders, it shouldn't just lose its place that badly.

Maybe the drive was failing harder and drawing too much power in spikes, and thus causing other hardware problems?

busterb 03-22-2006 12:43 PM

Quote:

Tester cannot test for all that a meter does. A best power supply test is when fully under load - in the system. A power supply disconnected cannot be properly tested. Also note voltages - the numbers. Numbers are not the published ATX limits. I asked for numbers - not the subjective "power supply is good" - for this reason.

Tw. Believe it or not but I went to electronic school years ago. And have maybe more meters laying around than most folks. Meters lie. How many people do you think even know how to calibrate their meter? Are you telling me that the tester doesn't simulate a load? Well maybe not, but my son said it was bad. I said no because I've taken readings w/voa meter. Anyhow a new PS fixed it.

tw 03-22-2006 01:01 PM

Quote:

Originally Posted by busterb
Tw. Believe it or not but I went to electronic school years ago. And have maybe more meters laying around than most folks. Meters lie. How many people do you think even know how to calibrate their meter? Are you telling me that the tester doesn't simulate a load? Well maybe not, but my son said it was bad. I said no because I've taken readings w/voa meter. Anyhow a new PS fixed it.

If a tester provided a significant load, then tester would be too hot to handle with comfort. It would be a 300 watt hot plate. Testers apply minimal load to meet startup requirements for power supplies (some - not all - supplies need a minimal load to operate).

Yes, meters do not necessarily report RMS voltage: they lie. But that is what makes many meters so good at identifying bad power supplies. Again, note numbers provided because of how meters typically work.

I am more than just a tech. We designed power supplies even in the 1970s. Have even demonstrated on a system that was intermittent - the supply was not providing power as claimed. System would boot and mostly work. And then we put a meter on it. Quite obvious that a clone power supply could not service the load - even though the owner insisted supply was replaced and now working. Meter demonstrated otherwise. Been doing this stuff for too many decades. I prefer an oscilloscope because it says faster what I want to learn. But the meter is how field problems are identified or eliminated quickly as a suspect.

A $15 tester, among other things, does not provide a sufficient load for testing. It can declare a power supply bad but it cannot declare a power supply as good.

BTW, one final point. Notice that tester did not get hot and did not contain fans. Fans would be required if tester sufficiently loaded a power supply. Just another reason why power supply is best tested (and tested faster) still inside the computer. Just another example of why 'learning why' makes those meters such a superior solution.

BTW, do you still have a VTVM? I have a wee bit of knowledge and experience.

Undertoad 03-22-2006 01:17 PM

Spoke too soon, it just hung again.

tw 03-22-2006 01:18 PM

Quote:

Originally Posted by Undertoad
Maybe the drive was failing harder and drawing too much power in spikes, and thus causing other hardware problems?

Disk drives draw so little power as to be totally irrelevant. To cause power problems, drive would draw on the order of 100 watts - become so hot as to burn parts.

Disk drive computer talks to motherboard computer using a fixed set of commands - similar to how networking works. There is nothing electrical in a disk drive that would hang a computer. Except when a computer is not so resilient - booting. Have never seen a disk drive hang any NT system except during boot. During simplistic boot programs, the software may sit waiting for a response forever - a hang. Have seen tasks hang due to a disk drive problem. Have seen NT slow to a crawl due to a bad disk. But never had an NT system lock up so that Task Manager would not operate - except when Task Manager could not load from that drive.

Marginal conditions can occur on disk hardware causing a drive's computer to not respond or reply to commands. It is why software designed to test hardware (ie from IBM) is so much better at testing disk hardware; rather than software designed to test Windows interface to hardware (Microsoft).

This being only background information - when that next drive fails. Meanwhile a drive failure should have been recorded in Microsoft's event (system) log. Find it using HELP. Also the drive hardware (an IBM creation) would have data to indicate ongoing failures. Forgot what they call that function - smart something. Just another reason why IBM hardware test software could have been more useful - I believe it is now a Toshiba product.

Undertoad 03-22-2006 01:29 PM

Whaddya know, the event log.

I should have known about that! I take it back about Mr. Gates.

The event log has numerous bad block errors listed for drive D, even after the drive has been replaced. Therefore these errors are probably not actual errors, but a failing controller thinking they ARE errors.
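
Editor's sketch: the reasoning here depends on whether bad-block entries kept arriving after the drive swap, and that is easy to tally once the System log is exported to text. The Python below is only an illustration. The tab-separated layout, the sample lines and the swap time are all made up, and it assumes bad-block entries are logged with source "Disk" and event ID 7, which is common on Windows of this era but should be checked against the real log.

```python
from collections import Counter
from datetime import datetime


def count_bad_blocks(lines, swap_time):
    """Count Disk / event ID 7 entries before and after the drive swap.

    Each line is assumed to be: timestamp<TAB>source<TAB>event id<TAB>message.
    """
    counts = Counter()
    for line in lines:
        stamp, source, event_id, _message = line.rstrip("\n").split("\t", 3)
        if source != "Disk" or event_id != "7":
            continue  # not a bad-block entry
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        counts["after swap" if when >= swap_time else "before swap"] += 1
    return counts


# Hypothetical sample data, not taken from Undertoad's actual log.
SAMPLE_LINES = [
    "2006-03-21 16:27\tDisk\t7\tbad block (sample text)",
    "2006-03-21 17:10\tDisk\t7\tbad block (sample text)",
    "2006-03-22 09:00\tService Control Manager\t7036\tservice started (sample text)",
    "2006-03-22 13:17\tDisk\t7\tbad block (sample text)",
]
SWAP_TIME = datetime(2006, 3, 22, 8, 0)

print(count_bad_blocks(SAMPLE_LINES, SWAP_TIME))
```

Any non-zero "after swap" count would support the idea that the old drive was not the only problem, although it would not by itself say whether the controller, cable or something else is at fault.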

tw 03-22-2006 02:55 PM

Quote:

Originally Posted by Undertoad
The event log has numerous bad block errors listed for drive D, even after the drive has been replaced. Therefore these errors are probably not actual errors, but a failing controller thinking they ARE errors.

The controller is actually nothing more than some drivers and receivers. However the chip that contains that 'controller' can be tested. First do something that constantly accesses the drive. This is what hardware diagnostics are for. Once you have established a pattern, selectively heat different sections with a hairdryer on high. Yes it must be that hot to a human and yet that cool to a computer. The offending part (i.e. semiconductor, cable connector, etc) identifies itself via increased errors with temperature.

As noted earlier, heat is a diagnostic tool.

Drive D is the original offending drive? Well, it may have bad drivers/receivers on its computer board. It might cause motherboard computer to not communicate with a C: drive computer. IDE bus is a network cable where each computer - drive computer from each disk and the motherboard computer all share time talking on that cable. Therefore problem could be slave drive computer, master drive computer, south bridge IC on motherboard, etc. This is what the hairdryer does. To make intermittents more frequent by applying heat. Find failures by running parts hotter - then do not fix those parts with more fans.

Hairdryer that causes any computer part to fail - that part is 100% defective. And that part will get worse with age.

busterb 03-22-2006 07:40 PM

Cold spray works the other way. If ya think it's hot give it a shot of freeze ass.

Quote:

BTW, do you still have a VTVM? I have a wee bit of knowledge and experience.

Naw I had a clean up one day and found a sucker to give it to. Damn tubes. Once I was working, "playing" with an old tube set and didn't see my hickory stick. Stuck my hand in to shake a tube and burned holes in 3 fingers. Never worked on another tube set.

Kitsune 03-23-2006 09:07 AM

Quote:

Originally Posted by busterb
cold spray works the other way. If ya think it's hot give it a shot of freeze ass.

It should also be noted giving someone a shot of freeze on the ass makes for good office prankage. Freez-It was more often used to give someone a case of frostbite as a joke than to test components.

So, UT, any verdict?

Undertoad 03-23-2006 09:29 AM

Yes, Newegg overnight shipping rocks!

It doesn't make sense to isolate which particular chip is having trouble because A) they're all on the same board, and the fix is the same: replace the whole board; and B) the parts are so close together now, that heating one particular part without heating any other is nearly impossible. At the least it requires that the board be completely de-cased and set up completely differently.

So I've ordered a new motherboard from Newegg. It's only $102, so what the hell.

The problem is that my old board is too old and they don't sell it any longer. So I had to get a new board. But my processor is pretty old too and I sure would like to get something that supports SATA since I have a big old SATA drive just sitting here.

So I decided to get a much newer board with more capabilities. Of course that meant changing out the video card too, because AGP is now out in favor of PCI Express. There's another $150.

And of course the processor. Doesn't make sense not to buy a 64-bit processor today; and if you get one with 1MB of cache you get another speed increase, so that makes sense. $215.

And well, it turns out that modern motherboards have a new 24-pin power connector. And there's a new feature of video cards called SLI where you can tie two video cards together, if you have the power capability for it; SO I got a new $80 power supply as well.

And with the beauty of Newegg overnight and rush processing, it's on a FedEx truck right now, headed my way.

By tonight the old problem should be completely gone, unless it's something *really* funky in software. And then, I'll have an entirely new set of problems: making sure all the drivers are in line and updated so the thing runs right with the new hardware.

I think, at one point in this whole mess, I complained about people buying a whole new computer to fix their computer issues. This will be almost what I will have done. Of course it's mostly out of the urge/need to upgrade anyway, I rationalize.

busterb 03-23-2006 10:33 AM

Show us the shopping list, in case some of us should get an upgrade attack. Who ever forbid.

Undertoad 03-23-2006 11:05 AM

MSI K8N SLI-F Socket 939 NVIDIA nForce4 SLI ATX AMD Motherboard - Retail

MSI NX6800GS-TD256E Lite Geforce 6800GS 256MB GDDR3 PCI Express x16 Video Card - Retail

Antec TRUEPOWERII TPII-480 ATX12V 480W Power Supply - Retail

AMD Athlon 64 3700+ San Diego 1GHz HT Socket 939 Processor Model ADA3700CFBOX - Retail

WabUfvot5 03-23-2006 08:31 PM

I'd bet my bippy it is power related. Since the sound card failed around the same time I suspect you got a surge when moving things. Maybe it just juiced the mobo a bit too much or the power supply took a knock. Power supply failures are a total bitch to solve. I had one on an old machine... the only time it froze was when I loaded a page with flash. I have no idea why flash did it, but it did. New computer, same general setup = no problems. Power issues manifest in really really strange ways.

Undertoad 03-23-2006 08:57 PM

All bets are off. I'm running now with all that new gear, but what do I find in the Event Viewer? You guessed it,

Quote:

The device, \Device\Harddisk0\D, has a bad block.

The same error that it had before.

No HANGS yet, and in the past these errors have logged more often.

But with a new controller, new drive, new everything, it is infuriating to see those errors in the log.

dar512 03-23-2006 09:04 PM

Wow, UT. That sux pond water.

Is it possible the damage was done before the switchover? You might try fixing the disk and have it do a full surface scan.

Undertoad 03-24-2006 08:40 AM

Quote:

The device, \Device\Harddisk0\D, has a bad block.

Oooookaaaayyy....

Maybe this message doesn't refer to the D: drive at all?

I think it's referring to the C: drive, which, in theory, is "Hard Disk 0".

:smack:

This morning I have done a complete chkdsk of the C: drive. It found some problems, though its reporting leaves a lot to be desired. In the end it listed 4KB in bad sectors and did make changes to the filesystem.

Next I'll do a complete chkdsk of the E: drive (which is the other partition on that same disk).

dar512 03-24-2006 08:49 AM

Actually, I think harddisk 0 refers to the physical drive, not the virtual (dos/windows) drive.
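To illustrate the point, here is a minimal Python sketch that pulls the physical disk number out of a device path like the one in the Event Viewer message. The helper name `physical_disk_index` is made up for this example, and the sketch deliberately does not interpret the trailing component as a drive letter.

```python
import re

# Hypothetical helper, for illustration only: extract the physical disk
# number from a device path as it appears in the Event Viewer message.
_DEVICE_RE = re.compile(r"\\Device\\Harddisk(\d+)(?:\\|$)", re.IGNORECASE)


def physical_disk_index(device_path):
    """Return the N in \\Device\\HarddiskN\\..., or None if it is absent."""
    match = _DEVICE_RE.search(device_path)
    return int(match.group(1)) if match else None


# The path quoted from the event log earlier in the thread.
path = r"\Device\Harddisk0\D"
print(physical_disk_index(path))
```

That number lines up with the "Disk 0", "Disk 1" labels in Disk Management, where a single physical disk can carry several lettered partitions.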

Undertoad 03-24-2006 09:47 AM

In order to diagnose the problems in my "A" system, I replaced its old memory with newer memory stolen out of my "B" system.

B memory into A -> A works well

A memory into B -> B doesn't boot.

So. Add $80 of new matched 1GB Kingston memory to A, and take the 1GB out of A and put it back in B.

B memory into B -> B doesn't boot.

Fuuuuuuuck

Now system B is dual-boot; the IDE drive boots into Windows, the SATA drive boots into Linux (Fedora). Set it to boot into Linux and it does that fine. The BIOS recognizes both the IDE and the SATA. So, maybe the Windows boot record got messed up somehow during all the various rebooting and such.

Undertoad 03-24-2006 09:50 AM

Oh yeah, so to finish this whole scenario, I put one of the sticks of A memory into the B system, to boot Linux with 1.5GB instead of 1GB, and it won't boot into Linux! So the whole process has also killed a stick of memory. Will this madness never end?

tw 03-24-2006 11:48 AM

Quote:

Originally Posted by Undertoad
... So the whole process has also killed a stick of memory. Will this madness never end?

It is called shotgunning. Trying to fix a problem rather than first learn what is wrong. Provided were some ideas of how to find a problem long before fixing anything. Break the problem down into parts - test individual system parts - and don't change anything.

Memory test - or what burn-in really is. Don't swap memory. Run a comprehensive memory diagnostic - either one provided by a responsible computer manufacturer or a third party diagnostic such as Memtest86 or Docmem. Execute diagnostic one or two passes. Even bad memory sometimes passes that test. And then we use burn-in - a concept completely misunderstood by those who used English interpretation to assume burn-in means running overnight.

Heat memory with a hairdryer on highest setting. A tropical paradise to good memory and hell to bad memory. Bad memory heated above 100 degree F often will expose itself as the pervert it really is. Otherwise move computer outside to 30 degree weather and leave it run the same memory diagnostic for maybe an hour. Accomplishes same thing that busterb discussed with coolant spray.

And yes, once I heated the oven to just over 100 degrees, put the clone computer in that oven, and found a defective cache Ram.

If memory passes both heat and cold test, then memory is fine. Move on to other suspects. If a memory stick has been damaged, well that is but another reason to not shotgun.
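For readers who have not seen what a memory diagnostic does on each pass, the idea can be sketched in a few lines of Python. This is only an illustration of the write-then-read-back technique: it exercises an ordinary `bytearray`, not physical RAM, so it would not find a real fault. Actual diagnostics run outside the operating system and address memory directly. The function name and the pattern set are assumptions made for this example.

```python
def pattern_test(buf, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill buf with each pattern, read it back, and return any mismatches.

    Each mismatch is recorded as (index, expected_value, value_read).
    """
    errors = []
    for pattern in patterns:
        # Write phase: store the pattern in every cell.
        for i in range(len(buf)):
            buf[i] = pattern
        # Read phase: every cell should still hold the pattern.
        for i in range(len(buf)):
            if buf[i] != pattern:
                errors.append((i, pattern, buf[i]))
    return errors


# Two passes over a small buffer; an empty list means no mismatches.
buffer = bytearray(4096)
for pass_number in range(2):
    print(pass_number, pattern_test(buffer))
```

The alternating patterns (0x55 and 0xAA) are there so that each bit is tested holding both a 0 and a 1 next to neighbours of the opposite value. Repeating the passes while the temperature changes is what the heat and cold tests add.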

I have watched others swap memory because the new memory was defective. They did not use anti-static protection which is especially critical if room humidity is below 40%. Therefore memory that worked just fine on a memory diagnostic at 70 degrees was really defective - maybe static damaged. Just another problem with shotgunning. A problem created by another flawed assumption that parts (once thought to be good) will always be good.

Don't swap things. First collect facts. As a result of shotgun diagnostic techniques, you are now spending vast sums of money.

And yes, I am also concerned with that Harddisk0 being drive D. Something is wrong - just another fact that should be collected before changing anything. Does your drive have multiple partitions or were you running a master / slave combination as I originally asked? Answer is found in Disk Management program - among other places.

Meanwhile, get the comprehensive diagnostic from the hard drive manufacturer. Why? Among other things, because a diagnostic eliminates many unknown variables - ie Windows which is a massive variable. Every test is about stripping a problem down into parts - and then testing those parts - all without physically changing hardware. Don't even look at Windows until hardware diagnostics declare hardware good. And yes, that also means temperature cycling - ie the hair dryer - also called burn-in.

tw 03-24-2006 11:58 AM

Quote:

Originally Posted by Jebediah
I'd bet my bippy it is power related.

Actually it shows little indication of being power related. This is made more obvious when first learning what the many functions are inside a power supply AND how computers (such as the one on a disk drive) are both designed and damaged. Actually it looks more like classic static electric damage. But that is only one of a long list of possibilities that requires better details from UT. Original problem did not sound like a disk drive problem. But again, sufficient details were not provided.

For example, what could be causing all problems? An improperly crimped wire in the disk drive cable. But again, only wild speculation from a long list of possible reasons.

The reason why I am answering this is that those who don't know why failures happen then jump to the myths promoted by power strip protector vendors. The lights dimmed - therefore it must be a surge. How does a voltage drop become a massive voltage increase? But again, this is how myths are promoted - technical details never learned before declaring a conclusion - or why George Jr could preach that Iraq was a threat to the US - the mythical WMDs. It is why military academies graduate engineers - people who learn why underlying facts and details must first be learned.

Claims of 'power related' damage are just too often a myth for too many reasons. Often found where people shotgun rather than first learn facts. Those claims of WMDs - classic example of shotgun reasoning.

Undertoad 03-24-2006 12:15 PM

Modern memory can't be heated with a hair dryer, because the DIMMs have aluminum "heat spreaders".

tw 03-24-2006 12:25 PM

Quote:

Originally Posted by Undertoad
Modern memory can't be heated with a hair dryer, because the DIMMs have aluminum "heat spreaders".

Modern memory - including the heat spreader - is and can be heated with hair dryer. And at those temperatures, it is called paradise to the semiconductor. Heat the sucker. If in doubt, look at the first page of that memory's data sheet (enter one of the semiconductor's part numbers into Google) to literally read memory temperatures that are called good, desirable, and acceptable. A hairdryer on high does not get anywhere near to unacceptable memory temperatures. The heat spreader only means it takes longer to get the memory to a proper test temperature.

Undertoad 03-24-2006 02:09 PM

OK, now I'm stuck.

I want to copy bad drive C to good drive D. I want D to be bootable so that after copying, I can just remove C, make D a master, the new C.

In this case C and D are pretty much identical drives. In the past, I've done this with Partition Magic. But now, PM refuses to do it because it finds bad sectors on C.

I guess I could copy C to an external, put D in place as the new C, install XP to C, and then copy everything from the external to C. But isn't there a better way?

Kitsune 03-24-2006 02:17 PM

Might not be a bad idea to do that, anyways. At least if you move everything to an external disk you'll have the ability to do a good format and even use some serious disk sector checking before you move it all back.

Undertoad 03-24-2006 02:53 PM

Urgh. There are some files in /windows that you can't copy while windows is booted, so I can't just drag and drop /windows to D: and I can't copy them to the external.

And you can't recursively copy directories in windows recovery console mode.

If I reinstall XP, those files will then be open when the new copy of XP boots, and won't stand for being overwritten by the old copies.

Can I copy those files in safe mode?
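One way to at least find out which files refuse to copy is a recursive copy that records failures instead of stopping at the first one. The following is a minimal Python sketch under that assumption. The function name `copy_tree_skipping_errors` is invented for this example, and the sketch does not get around files that a running Windows holds open; it only lists them.

```python
import os
import shutil


def copy_tree_skipping_errors(src, dst):
    """Copy the tree under src into dst.

    Returns a list of (source_path, error_message) for files that could
    not be copied, for example because they were locked or unreadable.
    """
    failed = []
    for root, dirs, files in os.walk(src):
        # Recreate the same relative directory under the destination.
        rel = os.path.relpath(root, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            try:
                shutil.copy2(source, os.path.join(target_dir, name))
            except OSError as exc:
                # Note the failure and keep going with the next file.
                failed.append((source, str(exc)))
    return failed
```

Calling it with the source directory and a folder on the external drive returns the list of files that failed, which is the set that would still need another approach.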

Kitsune 03-24-2006 03:05 PM

Quote:

Originally Posted by Undertoad
Can I copy those files in safe mode?

What do you want to copy out of /windows and overwrite on the re-install? Just browsing through mine leads me to understand there isn't much, if anything, in there that you'd want to overwrite manually and not let Windows handle, instead, especially if it is a file that remains open while the OS is running. Manually changing critical system files without having windows "know" about it is asking for potentially big problems later...


All times are GMT -5.