Aces High Bulletin Board

General Forums => Hardware and Software => Topic started by: artik on November 20, 2020, 04:14:57 PM

Title: Debugging silent reboot HW Issues
Post by: artik on November 20, 2020, 04:14:57 PM
I have a PC with the following configuration:

- Power Supply Corsair 650W
- Intel i5-6600
- 16GB DDR-3
- GPU: Radeon rx 560 / 16CU or GTX 960 (for a while I had both installed)

The PC crashes once in a while. It happens much more frequently when I load a GPU with some computations, but it can also just reboot without any notice.

It happens both with the NVidia GTX 960 (much more frequently - it has PCIe power connected) and with the MSI Radeon rx 560, which gets all its power from the MB.

When I removed both cards and used the internal Intel GPU for two weeks, I had no issues. When I put the GTX 960 back in, it crashed frequently. When I replaced it with the rx 560, it could work for many days and get through games, but once in a while it crashed as well.

How can I debug the problem? What I know so far:

1. It isn't consistently reproducible, so I can't just buy another power supply for testing and then return it.
2. I don't think it is the GPU, since two different GPUs from different vendors show exactly the same issue.
3. Memory checks out OK.

What can it be and how can I figure out the faulty part?
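
One low-tech way to pin down when a silent reboot happens, and what the machine was doing at the time, is a heartbeat log: a small script appends a timestamp to a file every few seconds and forces it to disk, so after a reboot the last line shows roughly when the machine died. The following is only a minimal sketch; the function names, the log path, and the `note_fn` hook (where something like a GPU load or temperature reading could be plugged in) are assumptions made for illustration, not something from this thread.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path, note=""):
    """Append one timestamped line and force it to disk before returning."""
    stamp = datetime.now().isoformat(timespec="seconds")
    line = f"{stamp} {note}".rstrip() + "\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # make sure the line survives a sudden power loss
    return stamp


def last_heartbeat(path):
    """Return the last non-empty line of the log, or None if there is none."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]
    return lines[-1] if lines else None


def run(path, interval_s=5.0, beats=None, note_fn=lambda: ""):
    """Write a heartbeat every interval_s seconds; beats=None means run until killed."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path, note_fn())
        count += 1
        time.sleep(interval_s)
```

Calling `run("heartbeat.log")` in the background before starting a GPU job, and then `last_heartbeat("heartbeat.log")` after an unexpected reboot, gives the time of the last sign of life to within the chosen interval.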
Title: Re: Debugging silent reboot HW Issues
Post by: TyFoo on November 20, 2020, 10:39:34 PM
How old is your PC?

Are all of the fans operating? GPU, Chipset Fans if any, Power Supply?

Can you see what temperature the Processor is/ has been running at?



Ty
Title: Re: Debugging silent reboot HW Issues
Post by: artik on November 20, 2020, 10:46:49 PM
The PC is 4.5 years old. All fans are operating, there is no chipset fan on the MB, and the PSU fan is operational as well. No overheating - I watched this closely, and temperatures are low for both CPU and GPU when it happens.
I checked the voltages in the BIOS and they are within spec.
Title: Re: Debugging silent reboot HW Issues
Post by: TyFoo on November 21, 2020, 12:15:08 AM
You more or less eliminated my troubleshooting list. . . . lol

If you can run onboard Graphics without issue, I would think that you have an issue with either the PCI socket, where the socket attaches to the MB or somewhere downstream on the MB itself.

I don't think I am saying anything you don't already know, but if you are creating demand on the GPU when it crashes, it sounds like something is heating up and disrupting flow. If it isn't the PCI socket, then it has to be the MB. The outlier would be two bad GPUs - while possible, it's unlikely.
Title: Re: Debugging silent reboot HW Issues
Post by: Bizman on November 21, 2020, 01:03:21 AM
I'd say you've pretty much nailed it to the Power Supply as it happens most often with the most power hungry GPU. The PSU may not be faulty, it may just be underpowered for the task. Without knowing the brand and model it's hard to tell whether it's a known poor firecracker or a quality unit.
Title: Re: Debugging silent reboot HW Issues
Post by: zack1234 on November 21, 2020, 03:31:47 AM
My Corsair PSU blew and they sent me a new one
Bizman is awesome as well
Title: Re: Debugging silent reboot HW Issues
Post by: artik on November 22, 2020, 02:15:37 AM
Actually, the GPU that is inside now, an MSI Radeon rx 560/16CU/4GB, draws under 75W and takes all its power from the MB. The other GPU that I tested, a Gigabyte GTX 960 4GB OC, crashes even more easily; it has 6+8 pin connectors (120W TDP). My PSU is 650W, and half a year ago it handled them both (960 + 560) - I had a dual GPU setup for development.

So a single GPU isn't that hungry.
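
To put rough numbers on that, here is a small sketch that adds the TDP figures quoted in this thread (65W CPU, under 75W for the rx 560, 120W for the GTX 960) and compares them with the 650W rating. It treats TDP as a stand-in for real power draw and ignores drives, fans and the motherboard, so it is an approximation only; the names are made up for illustration.

```python
# Rough worst-case draw per configuration, using TDP figures from this thread.
# Assumption: TDP approximates real draw; the rest of the system is ignored.
CPU_TDP_W = 65       # Intel i5-6600
PSU_RATED_W = 650    # rated capacity of the installed PSU

GPU_TDP_W = {
    "Intel integrated": 0,          # already covered by the CPU TDP
    "Radeon rx 560": 75,            # slot powered, under 75W
    "GTX 960": 120,
    "GTX 960 + rx 560": 120 + 75,   # the old dual GPU setup
}


def load_fraction(gpu_watts, cpu_watts=CPU_TDP_W, psu_watts=PSU_RATED_W):
    """Fraction of the PSU rating used by CPU plus GPU at full TDP."""
    return (gpu_watts + cpu_watts) / psu_watts


for name, watts in GPU_TDP_W.items():
    total = watts + CPU_TDP_W
    print(f"{name}: ~{total} W, {load_fraction(watts):.0%} of rated capacity")
```

Even the dual GPU case stays well under half of the rated capacity by this estimate, so a healthy 650W unit should not be struggling with a single card.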
Title: Re: Debugging silent reboot HW Issues
Post by: Bizman on November 22, 2020, 03:11:24 AM
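That's exactly why I believe the PSU is the culprit. Correct me if I'm wrong in the following:
  • The crashes happen more easily with the externally powered GTX 960 - 12 V, TDP 120W + 65W for PSU
  • Fewer crashes happen with the motherboard powered Radeon - 12 V, TDP <75W + 65W for PSU
  • NO crashes happened with the CPU integrated Intel graphics - 0.55 V-1.52 V, TDP 65W

To me it seems that the 12V line has fried. No matter how new your PSU is, it can be a lemon. Corsairs have been made at least by Channel Well, Chicony, Flextronics and Seasonic, and the quality can vary even within the same series.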

A few years ago a friend had a 1½-year-old computer with similar issues to yours - can't remember if the PSU was a Corsair or maybe a Chieftec, but it started with a C. Anyway, as I studied the reviews to find out if the PSU was a known poor choice, I learned that in that series the lowest (450) and highest (850) wattage versions were built by a higher tier maker than the two middle versions (550 and 650). The reviewers wondered why the mid powered versions had cheaper (both price and quality) capacitors than the other two. When we tried to get a new one through warranty, that very model was no longer available; it had been replaced by a model with a letter added to the name...
Title: Re: Debugging silent reboot HW Issues
Post by: artik on November 22, 2020, 05:13:44 AM
That's exactly why I believe the PSU is the culprit. Correct me if I'm wrong in the following:
  • The crashes happen more easily with the externally powered GTX 960 - 12 V, TDP 120W + 65W for PSU
  • Fewer crashes happen with the motherboard powered Radeon - 12 V, TDP <75W + 65W for PSU
  • NO crashes happened with the CPU integrated Intel graphics - 0.55 V-1.52 V, TDP 65W

when you say it this way... it is very-very logical... :-)


Title: Re: Debugging silent reboot HW Issues
Post by: Denniss on November 22, 2020, 10:20:48 AM
What's the exact model number of your power supply and how old is it?
If it's a modular one, check the connectors (cable and PSU side) for anything that looks abnormal, like brown/melted spots
Title: Re: Debugging silent reboot HW Issues
Post by: Shuffler on November 22, 2020, 11:48:03 PM
It does point to a bad rail in the psu... if the cables are good.
Title: Re: Debugging silent reboot HW Issues
Post by: artik on November 23, 2020, 01:54:38 AM
This PSU: https://www.corsair.com/us/en/Categories/Products/Power-Supply-Units/Power-Supply-Units-General-Purpose/CV-Series%E2%84%A2/p/CP-9020211-NA

Corsair VS650 non-modular

Title: Re: Debugging silent reboot HW Issues
Post by: Bizman on November 23, 2020, 04:05:12 AM
There aren't too many reviews available other than those on the online shops. I found a couple in Spanish and some threads about it. None of those said that it's a firecracker but, as Corsair say on their website, it's "ideal for powering your new home or office PC".

I found out a few things: It's made by HEC, who are a decent manufacturer building what the branded customer wants. The capacitors are made by Teapo, which rhymes with cheapo for a reason. There were some comments about the rails in the Spanish reviews, but Google Translate wasn't too exact. All in all, it's a cheap product seemingly intended for light use. The Bronze certification is a sign of that as well.

 
Title: Re: Debugging silent reboot HW Issues
Post by: artik on November 23, 2020, 04:11:40 AM
What are the recommended brands and price range? I thought Corsair was supposed to be a good brand.
Title: Re: Debugging silent reboot HW Issues
Post by: Bizman on November 23, 2020, 04:30:46 AM
Seasonic is a safe bet. They both design and build their products and give them a 10 year warranty. As I said in a previous post, they've also built PSUs for Corsair. Unfortunately the Who's Who list hasn't been updated since 2013, so reviews and PSU/tech forums are the only source for reliable information.

As a rule of thumb, independent of the brand, Gold, Platinum and Titanium rated models should be of better quality, as the higher efficiency rating requires a more carefully thought-out design. Corsair is a good brand but they have several series, some of which are intended for low budget markets: https://www.tomshardware.com/reviews/best-psus,4229.html
Title: Re: Debugging silent reboot HW Issues
Post by: artik on November 23, 2020, 06:43:15 AM
Thanks a lot...

Last question. I ran a power supply calculator:

https://outervision.com/b/K7rQtn I got 465 W
And another version for a future 960 replacement: https://outervision.com/b/V0owN5 and got 478 W

So I'm going to get a 550W PSU... is it enough?
https://rog.asus.com/power-supply-units/rog-strix/rog-strix-550g-model/
80+ Gold, 10 years warranty

Or is it better to aim higher, for 650W? Like this one:
https://rog.asus.com/Power-Supply-Units/ROG-Strix/ROG-STRIX-650G-Model/
80+ Gold, 10 years warranty

Also I saw: Seasonic CORE GC 80+ Gold 550W
7 years warranty seems OK to me, and it costs 2/3 of the Asus ones - how is this one?

Title: Re: Debugging silent reboot HW Issues
Post by: Bizman on November 23, 2020, 08:44:41 AM
Am I reading the PSU calculator reports right, do you really have both AMD and Nvidia video cards in the same computer? If so, why?

I'd recommend some 20% headroom for a) aging and b) future needs. That said, 550W seems a bit on the low side. Sufficient right now, but what if you get a 970 instead of a 960? As 600W can be hard to find, a 650W version would be the best choice.
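
As a quick worked example of that rule of thumb, the sketch below takes the two calculator results from the previous post (465 W and 478 W), adds 20% on top, and picks the next size up from a list of typical retail wattages. Both the reading of "20% headroom" as load × 1.2 and the list of sizes are assumptions made for illustration.

```python
# Assumed list of commonly sold PSU sizes in watts (illustrative, not exhaustive).
COMMON_SIZES_W = (450, 500, 550, 600, 650, 750, 850)


def recommend_psu(load_w, headroom=0.20, sizes=COMMON_SIZES_W):
    """Return the smallest listed size that covers the load plus headroom."""
    target = load_w * (1 + headroom)
    for size in sorted(sizes):
        if size >= target:
            return size
    raise ValueError(f"no listed PSU size covers {target:.0f} W")


for load in (465, 478):
    print(f"{load} W load -> {recommend_psu(load)} W PSU")

# If 600W units are hard to find, drop them from the list:
print(recommend_psu(478, sizes=(450, 500, 550, 650, 750, 850)))
```

Both loads land above 550W once the headroom is added, which is why the next step up is suggested.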

Did you know that the Asus ROG PSUs are made by Seasonic, based on their Focus platform? A Seasonic Focus Gold can be significantly cheaper, at least here. There's a Seasonic Core 650W as well.
Title: Re: Debugging silent reboot HW Issues
Post by: artik on November 23, 2020, 09:34:31 AM
Development. I work on some GPU-related open source projects based on OpenCL, CUDA etc. (machine learning, AI, neural networks), so I need GPUs from multiple vendors for testing/development.

So my rig has Intel, AMD and NVidia GPUs.

I may replace the NVidia 960 with a lower-end RDNA card in the future, like the rx 5500xt, or another one once lower-end RDNA2 cards are released. I have other rigs with NVidia GPUs so the 960 is less critical, but it would be much nicer to run them all together.

In general I don't fire them all up at once, although in the past I did load all 3 GPUs at once.

My gaming GPU is actually AMD's rx 560, although it is marginally weaker than the GTX 960, since for this specific card I need direct PCI-E lanes to the CPU, while the NVidia card works via the chipset over PCI-E x4.
Title: Re: Debugging silent reboot HW Issues
Post by: artik on December 06, 2020, 01:40:49 AM
Thanks to all. It was indeed PSU.

I borrowed a 500W PSU and tested it. No issues while loading 100% CPU, 100% Intel GPU, 100% AMD rx560 and 100% gtx 960 all together for a while.

I ordered a Corsair TX750M 80+ Gold Semi-Modular 750W PSU and am waiting for it to be delivered. I went with some extra power in case I want to replace my rx560 with a more powerful GPU.
Title: Re: Debugging silent reboot HW Issues
Post by: Bizman on December 06, 2020, 02:17:56 AM
Glad you got it sorted!

Your new PSU also seems like a very good one, based on the few reviews I quickly glanced at. Although it has been on the market for a couple of years already, there were no 'dead after a year' threads at the top of the search, which is also a good sign. "the Corsair TX-M series incorporate 100% Japanese 105c Capacitors" sounds better than "a variety of the cheapest 80c caps". For more, read https://www.jonnyguru.com/blog/2017/08/07/corsair-tx750m-2017-750w-power-supply/ - hoping that's the same one you got.