The Cellar  

Old 03-21-2006, 04:05 PM   #16
Kitsune
still eats dirt
 
Join Date: Sep 2003
Location: Tampa, FL
Posts: 3,031
Quote:
Originally Posted by Undertoad
Yes, it goes solid... where I have the system, under the desk, I don't notice the light so much. But every time I've checked, during the hang, that light is 100% on.
I had the same problem on my Shuttle. I feared that I had done some damage to the box since the case had awful air flow and I tended to really cook the thing, but the problem never made itself known while I was playing memory- and processor-intensive games. It only showed up when I was simply using Windows Explorer to browse directories and files. The HDD light would go solid, the system would "pause", and it seemed as if the disk couldn't find what it was looking for. Yet I couldn't hear the disk seeking/clicking/grinding, and after a few moments the HDD light would go out and everything would resume. The box never froze for more than 7 seconds, but it was really annoying because it was always more than 3 seconds. An OS reload didn't fix it.

I'm tempted to blame the on-board IDE controller, just because of the HDD access light, but who knows?

Let me know if you want any of the hardware manufacturer information or model numbers. Maybe there is something in common.
Old 03-21-2006, 04:27 PM   #17
Undertoad
Radical Centrist
 
Join Date: Jan 2001
Location: Cottage of Prussia
Posts: 31,423
Moving files from D: to C:, the system got really slow during one set of files, and hung during one particular file.

When I returned to explore that folder, the system hung again while I was just browsing the suspect directory. I didn't even open any of the files.

Luckily I don't need any of those particular files. I've been able to move just about everything else off of that partition.

Sadly there is another 20GB partition to move before the entire disk can be swapped out. But we have a key suspect in the hangs. We still don't even know whether it's hardware or software at its root, right? It's either bad sectors on the drive, the handling of which is causing Windows internals to completely barf, which really shouldn't happen, or an NTFS filesystem problem which Windows' own disk check failed to find.

Not looking good for Mr. Gates. Updates to follow.
Old 03-21-2006, 05:01 PM   #18
Kitsune
still eats dirt
 
Join Date: Sep 2003
Location: Tampa, FL
Posts: 3,031
Quote:
Originally Posted by Undertoad
It's either bad sectors on the drive, the handling of which is causing Windows internals to completely barf, which really shouldn't happen, or an NTFS filesystem problem which Windows' own disk check failed to find.
Try giving XP a burned CD with some bad sectors on it. The result, when the sectors are accessed, is a panic and unexpected reboot. There is no doubt that Windows doesn't know how to handle bad data, but I'd expect some kind of error during the disk checks you did.

This reminds me of an issue that took me months to diagnose on some servers at work. Check out this IBM system hang from hell:

Quote:
The RSA II (Remote Supervisor Adapter II) has an internal timer which rolls over every 76.5 days.

Just prior to this event, the code gets into a mode whereby it sends multiple SMI (System Management Interrupt) signals to the main processors, causing them to hang, or freeze.
No errors, no dumps, just a sudden, unexpected black screen. We had to let it happen twice before we figured out that it happened exactly every 76.5 days.
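For anyone chasing a similar "every N days" hang, a minimal sketch of the check we ended up doing by hand: compare the uptime at each hang and see whether it is the same every time. The function names and the dates below are made up for illustration; only the 76.5 day figure comes from the IBM note.

```python
from datetime import datetime, timedelta

def uptimes_at_hang(events):
    """events: list of (boot_time, hang_time) pairs. Returns uptime in days for each."""
    return [(hang - boot).total_seconds() / 86400.0 for boot, hang in events]

def same_uptime(events, tolerance_days=0.25):
    """True if at least two hangs happened at (nearly) the same uptime."""
    days = uptimes_at_hang(events)
    return len(days) >= 2 and max(days) - min(days) <= tolerance_days

# Hypothetical boot/hang times, not real log data.
events = [
    (datetime(2005, 1, 1, 0, 0),
     datetime(2005, 1, 1, 0, 0) + timedelta(days=76.5)),
    (datetime(2005, 3, 20, 9, 0),
     datetime(2005, 3, 20, 9, 0) + timedelta(days=76.5, minutes=20)),
]

for days in uptimes_at_hang(events):
    print(f"hung after {days:.2f} days of uptime")
print("same uptime each time:", same_uptime(events))
```

Two matching uptimes is weak evidence on its own, but it is a cheap hint that a timer or counter is involved rather than load or heat.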
Old 03-21-2006, 05:17 PM   #19
tw
Read? I only know how to write.
 
Join Date: Jan 2001
Posts: 11,933
Quote:
Originally Posted by Undertoad
OK, seems like a memory problem then. So I've taken the 1GB out of my other system, and replaced the 768MB currently in this one. A complete memory upgrade: more memory, faster memory, and identical sticks.

The problem still happens.
Your symptoms are consistent with voltage problems. BTW, intermittents created by electrolytic capacitor failure can occur before those capacitors start bulging. A motherboard monitor is not sufficient for measuring voltages - only for watching for voltage changes. IOW the motherboard monitor must be calibrated with a multimeter.

Meanwhile, what numbers are you using for 'good voltage'? What program are you using to test the NTFS filesystem? Did you download the hardware diagnostic for that disk drive and execute only the 'read only' tests? The hardware test is independent of the NTFS filesystem test and sometimes provides useful information.

One final point. Confirm that the BIOS settings still agree with what the drive actually is. I have seen a BIOS that refused to see a drive properly; it slowly destroyed disk data structures as NTFS kept fixing them.

What are you using for file transfers? Most copy programs have an option to ignore errors - to complete the file transfer.

Few hardware items can hang a pre-emptive MT system. They include memory, CPU, only some functions in the peripheral interface, and the video controller. A disk drive with internal problems should not hang an MT system. It should only hang the task. That is the list of usual suspects.

Start with the volt meter. Don't use the motherboard monitor until after it has been calibrated with that meter.
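On the "ignore errors" copy option mentioned above: if no copy program is handy, the idea can be sketched in a few lines of Python. This is only an illustration; `copy_tree_skip_errors` is a made-up name, and it only helps when the OS reports a read error. It does nothing if the whole system hangs on the read.

```python
import shutil
from pathlib import Path

def copy_tree_skip_errors(src, dst):
    """Copy every file under src into dst, recording failures instead of stopping."""
    src, dst = Path(src), Path(dst)
    failed = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue  # directories are created as needed below
        target = dst / path.relative_to(src)
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copies data plus timestamps
        except OSError as exc:
            # An unreadable file is logged and skipped; the copy carries on.
            failed.append((path, str(exc)))
    return failed
```

The returned list doubles as a record of which files sit on the suspect part of the disk.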
Old 03-21-2006, 05:18 PM   #20
Undertoad
Radical Centrist
 
Join Date: Jan 2001
Location: Cottage of Prussia
Posts: 31,423
Well I hadn't done a sector-level check yet; I wanted to just get the data off ASAP, which is now done (after yet another hang). It's doing the sector-level check right now.

Still don't even know 100% that it's the disk -- that's what's frustrating about diagnosis like this. But that's part of why I wrote it here - I figured, hey other people might enjoy the drama of watching someone else's guessing session.
Old 03-21-2006, 05:27 PM   #21
tw
Read? I only know how to write.
 
Join Date: Jan 2001
Posts: 11,933
Quote:
Originally Posted by Undertoad
I figured, hey other people might enjoy the drama of watching someone else's guessing session.
A power supply that appears to be good without using the meter can cause numerous other components to appear defective. That is why such hardware analysis is best started by first confirming the one function that can subvert everything else - the power supply. Again, what numbers are you using for voltages? I ask because specification numbers and measured numbers may not be the same. What does that 12 volts dip to?
Old 03-21-2006, 05:55 PM   #22
Undertoad
Radical Centrist
 
Join Date: Jan 2001
Location: Cottage of Prussia
Posts: 31,423
I've now measured it with the multimeter.

Motherboard monitoring program says 12v+ is 12.41
Multimeter says 12.26

Motherboard monitoring program says 5v+ is 5.06
Multimeter says 5.11

The monitoring program says the 12v+ dips just slightly, to 12.35.

(Microsoft's) sector-level check now complete: checks out OK.
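To make the calibration idea concrete, here is a rough sketch using the four readings above. It assumes the monitor is off by a constant offset on each rail, which may not hold in practice; the function names are invented for the example.

```python
def calibration_offset(monitor_reading, meter_reading):
    """Offset to add to a monitor reading so it matches the multimeter."""
    return meter_reading - monitor_reading

def corrected(monitor_reading, offset):
    """Apply a previously measured offset to a later monitor reading."""
    return monitor_reading + offset

# Readings from this post: monitor vs. multimeter.
offset_12v = calibration_offset(12.41, 12.26)  # about -0.15 V
offset_5v = calibration_offset(5.06, 5.11)     # about +0.05 V

# The monitor reported a dip to 12.35 on the 12 V rail.
dip_12v = corrected(12.35, offset_12v)

print(f"12 V offset: {offset_12v:+.2f} V")
print(f"5 V offset:  {offset_5v:+.2f} V")
print(f"12 V dip, corrected: {dip_12v:.2f} V")
```

Under that constant-offset assumption the dip works out to roughly 12.20 V on the meter's scale.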
Old 03-21-2006, 06:21 PM   #23
tw
Read? I only know how to write.
 
Join Date: Jan 2001
Posts: 11,933
Quote:
Originally Posted by Undertoad
I've now measured it with the multimeter.

Motherboard monitoring program says 12v+ is 12.41
Multimeter says 12.26

Motherboard monitoring program says 5v+ is 5.06
Multimeter says 5.11

The monitoring program says the 12v+ dips just slightly, to 12.35.
Those two voltages are high - and OK. But what does the 3.3 volt wire measure? That orange wire should measure above 3.23 volts. (Red is 5 volts; yellow is 12 volts.) I assume the red wire was measured. But the voltage on the purple wire is also significant (it should be greater than 4.87). Also, the gray wire should remain well above 2.5 volts (a gray wire problem typically would not be consistent with your symptoms, but check it anyway). If all four (red, orange, yellow, and purple) measure OK, then move on to those other suspects.

Based upon those numbers, set alarm points for the voltage monitor to 11.86 or 11.9 (for 12 volt) and to 4.9 volts (for 5 volt).

I doubt it will provide any further information, but the disk drive manufacturer's test program for that disk could also be used for seek tests, various multisector access tests, and other things that the Microsoft program does not do. Do this if short on ideas.
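The limits above can be written down as a small lookup, sketched here in Python. The minimums and alarm points are the ones quoted in this post, not the published ATX limits; the function name and the sample readings are invented for illustration.

```python
# Minimum meter readings quoted in this post, keyed by wire color.
# Gray should stay "well above" 2.5, so 2.5 is only a floor here.
WIRE_MINIMUMS = {"orange": 3.23, "purple": 4.87, "gray": 2.5}

# Alarm points suggested for the motherboard monitor (uncalibrated scale).
MONITOR_ALARMS = {"12v": 11.86, "5v": 4.9}

def below_minimum(readings, minimums):
    """Return the names whose reading falls below the given minimum."""
    return sorted(
        name for name, volts in readings.items()
        if name in minimums and volts < minimums[name]
    )

# Hypothetical meter readings, NOT measurements from this thread.
sample = {"orange": 3.30, "purple": 4.80, "gray": 4.9}
print("wires below minimum:", below_minimum(sample, WIRE_MINIMUMS))

# Monitor readings reported earlier in the thread (the 12 V dip and the 5 V rail).
print("monitor alarms:", below_minimum({"12v": 12.35, "5v": 5.06}, MONITOR_ALARMS))
```

With the monitor numbers posted so far, neither alarm point would trip; the orange, purple, and gray readings are still the missing data.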
Old 03-21-2006, 09:39 PM   #24
busterb
NSABFD
 
Join Date: Jul 2004
Location: MS. usa
Posts: 3,908
My 2 cents. A multimeter might, but my son and I had a bad disagreement about a power supply. The meter showed it good. But a PS tester said no go.
About 15 bucks. POWMAX atx power tester.
__________________
I haven't left very deep footprints in the sands of time. But, boy I've left a bunch.
Old 03-22-2006, 12:49 AM   #25
Elspode
When Do I Get Virtual Unreality?
 
Join Date: Dec 2002
Location: Raytown, Missouri
Posts: 12,719
I agree with Buster, here. I've had some weird PS shit happen, and a new cheap one installed fixed my problems.
__________________
"To those of you who are wearing ties, I think my dad would appreciate it if you took them off." - Robert Moog
Old 03-22-2006, 07:34 AM   #26
tw
Read? I only know how to write.
 
Join Date: Jan 2001
Posts: 11,933
Quote:
Originally Posted by busterb
My 2 cents. A multimeter might, but my son and I had a bad disagreement about a power supply. The meter showed it good. But a PS tester said no go.
About 15 bucks. POWMAX atx power tester.
A tester cannot test for all that a meter does. The best power supply test is when fully under load - in the system. A disconnected power supply cannot be properly tested. Also note the voltages - the numbers. The numbers to compare against are not the published ATX limits. I asked for numbers - not the subjective "power supply is good" - for this reason.

The best test of a power supply is to take numbers while multitasking is accessing every peripheral - disks, floppy, CD-ROM, network, sound card - simultaneously. Anything done by a power supply tester can be performed by the meter. There are also power supply defects that a tester cannot detect but a meter can. The power supply tester cannot test a power supply under full load - when many defects become apparent.

Then there is the rest of the power supply 'system'. It's not just the power supply that must be tested. This is also accomplished without disconnecting anything.

The down side of a meter is that these tricks must be understood. For example, what voltage would you have called 'good'?

The best way to test a power supply is while it is connected to the system. Never start by disconnecting things until long after the relevant facts have been collected. A power supply tester cannot do that. That is just another reason why a meter finds problems or confirms power supply integrity so much faster. Unfortunately, too many declare a 'subject' good rather than provide those numbers. Those numbers - such as UT's numbers - tell us more about the system than has been discussed so far. This is why those other voltage numbers (not yet provided) might be informative.
Old 03-22-2006, 07:44 AM   #27
Pie
Gone and done
 
Join Date: Sep 2001
Posts: 4,808
Also, check out your northbridge fan. I had similar intermittent failure on my fileserver -- turns out the nb fan was choking, and data was getting corrupted on the way to the drive... I lost two years' worth of email.

Oh, yeah -- now I back it all up three ways.
__________________
per·son \ˈpər-sən\ (noun) - an ephemeral collection of small, irrational decisions
The fun thing about evolution (and science in general) is that it happens whether you believe in it or not.
Old 03-22-2006, 07:48 AM   #28
tw
Read? I only know how to write.
 
Join Date: Jan 2001
Posts: 11,933
Did it? For example, in a GM car, they kept replacing the computer. There was nothing wrong with the computer. But the computer was replaced rather than first learning WHY the failure was happening. Swapping was only temporarily cleaning a defective connector. The car would fail again later.

The same lessons come from Challenger. Management insisted that it was safe to launch because a shuttle had launched safely one year previously. They ignored the near burn-through of the O-rings in that flight one year earlier. They did not want to know why. A perfect example of fixing things without first learning the whys. In that case, we should have called Challenger murder. Instead we destroyed the career of the engineer who told the truth to the Rogers Commission. Instead too many insist they need not know why - if it appears to work.

In a third case, a GM shop foreman finally got tired of the same GM model (Buick) coming in with similar problems. So he broke open the computer. In each failure, the PC board was cracked in a corner. The regional rep then told him this was a known problem even though it was not in any service bulletin. Since the test facility was not informed of this problem, vehicle computers tested OK and were shipped as repair parts. At GM, because the reasons why were not important, failure was acceptable.

These are a few of numerous examples, and they explain why I see this so often with clone computer users. They get used to having failure as a norm. It is the difference between just swapping parts to fix something - curing symptoms - versus fixing something right the first time - learning why.
Old 03-22-2006, 07:54 AM   #29
tw
Read? I only know how to write.
 
Join Date: Jan 2001
Posts: 11,933
Quote:
Originally Posted by Pie
Also, check out your northbridge fan. I had similar intermittent failure on my fileserver -- turns out the nb fan was choking, and data was getting corrupted on the way to the drive... I lost two years' worth of email.
If the Northbridge needs a fan, then the Northbridge is defective. A computer must work just fine in a 100 degree room or with its parts heated by a hairdryer on high. Heat is a diagnostic tool. Unfortunately, too many fix a defect by curing symptoms. They don't learn the whys. They simply install more fans.

That is the Home Improvement joke. Fix things with "more power". If a fan is required on that Northbridge, then the Northbridge IC is defective. One chassis fan is more than sufficient cooling for most every computer. That one chassis fan will provide sufficient airflow over any Northbridge.

But again, that Northbridge must work just fine when so warm as to be uncomfortable to touch. I learned this from the old-timers who used to say in the 1960s: if it does not leave skin, then it is not too hot. Today, our ICs must run normally at even higher temperatures.

A Northbridge fan suggests Northbridge IC is defective. Someone cured the symptom rather than fix the problem. Heat is a diagnostic tool.
Old 03-22-2006, 08:01 AM   #30
Undertoad
Radical Centrist
 
Join Date: Jan 2001
Location: Cottage of Prussia
Posts: 31,423
So far so good - the new drive is in, partitioned, formatted and is now getting all the data from the old drive, and -- no hangs in hours.

The outgoing drive is an IBM Deskstar 60GB - not considered the most reliable of drives. It has a manufacture date of Jun 2001 so it has seen enough duty and can be retired.

That N-bridge fan, Pie, is a pet peeve of mine in motherboard designs. I can't believe manufacturers decided the best way to handle that problem was to put a dinky, weak fan right where all the dust in the system will flow and clog it up. This MB has a heat sink there, much better idea.
All times are GMT -5.