The game crash telemetry was interesting: a lot of systems appeared in the crash database, but the crash rate per hour of play is not straightforward to estimate from the way game crashes are typically logged.
[...]
A Better Approach – Datacenter Usage
Unhappy with what we found analyzing game crash databases, I decided we needed a new approach.
It would be better to have tighter control over the system population and over the configuration of the machines experiencing issues.
These CPUs can also be leased inside a datacenter for game servers and other tasks that run well with high single-core clock speeds. This typically means that you get error-correcting memory and a motherboard built on a different chipset, W680. This is the ideal data source for further analysis.
W680 is potentially a huge help in isolating a voltage and clock problem here because W680 is much more conservative in terms of clocks and watts.
Do we still see issues with W680?
Yes. In a test population of more than 210 W680-based systems, 47.1% of these systems experienced at least one incident of instability over a 168-hour test window. This rate is the same to within 0.4% between Asus W680 boards and Supermicro W680 boards.
One datacenter technician told us they no longer offer 13th gen CPUs for sale, and that they had replaced 13th gen with 14th gen CPUs for customers experiencing issues.
If this were just an eTVB issue, one would think that W680 would be immune, or at least have a lower rate of crashing.
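To put the 47.1% figure in context, here is a rough Python sketch of two back-of-the-envelope calculations: a 95% Wilson score interval on the proportion of unstable systems, and an implied per-hour rate of a first incident. Several things here are assumptions rather than facts from our data: the population is taken as exactly 210 (we only said "more than 210"), the systems are treated as independent, and the per-hour rate assumes a constant hazard (exponential time to first incident), which real hardware may not follow. The function names are made up for illustration.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def constant_hazard_rate(p_fail, hours):
    """Per-hour rate of a first incident, ASSUMING an exponential model:
    p_fail = 1 - exp(-rate * hours)."""
    return -math.log(1 - p_fail) / hours


P_UNSTABLE = 0.471   # share of systems with at least one incident (from the text)
N_SYSTEMS = 210      # ASSUMPTION: the text only says "more than 210"
WINDOW_HOURS = 168   # test window length (from the text)

low, high = wilson_interval(P_UNSTABLE, N_SYSTEMS)
rate = constant_hazard_rate(P_UNSTABLE, WINDOW_HOURS)

if __name__ == "__main__":
    print(f"95% interval on unstable share: {low:.3f} to {high:.3f}")
    print(f"implied first-incident rate: {rate:.5f} per system-hour")
```

Even at the low end of that interval, a large share of the W680 population sees an incident within a week, which is hard to square with a problem limited to aggressive consumer-board settings. The per-hour number is only as good as the constant-hazard assumption and should be read as an order-of-magnitude estimate.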