OC Stability testing in easy steps (UPDATE 08/08/2012)
This basic guide was originally written with the older 65/45nm generation of dual and quad cores in mind. My recent upgrade to the Z68 FTW (and now Z77 FTW) platform allowed me to add some remarks for modern intel 5 and 6 series boards, Sandy/Ivy Bridge chips and 600 series video cards (linked an excellent OCN article on 670 overclocking).

Step 0. CPU temperature monitoring
When stress testing your system you should closely monitor temps. For this purpose I use my favorite, RealTemp, with the appropriate TJunctionMax setting for the CPU, to watch max temps on all cores during the test. The best thing about it is that it shows the current temps on every core as well as the recorded maximum and minimum temps (with the time when they occurred).
C2Q Q9xxx use TJunctionMax=100
i7-2600K uses TJunctionMax=98
i7-3770K uses TJunctionmax=105
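The digital thermal sensor (DTS) in these chips reports how many degrees a core is below TJMax, and monitoring tools subtract that reading from TJMax to show an absolute temperature. That is why a wrong TJunctionMax setting shifts every reported temp by the same amount. A minimal sketch of that arithmetic, using only the three values listed above (the function name is made up for illustration):

```python
# TJMax values (degrees C) taken from the list above.
TJMAX_C = {
    "C2Q Q9xxx": 100,
    "i7-2600K": 98,
    "i7-3770K": 105,
}


def core_temp_c(cpu, distance_to_tjmax):
    """Convert a DTS reading (degrees below TJMax) to an absolute temperature."""
    if distance_to_tjmax < 0:
        raise ValueError("distance to TJMax cannot be negative")
    return TJMAX_C[cpu] - distance_to_tjmax


if __name__ == "__main__":
    # Same DTS reading, different TJMax, different reported temperature.
    for cpu in TJMAX_C:
        print(cpu, core_temp_c(cpu, 30), "C at 30 degrees below TJMax")
```

Setting TJMax 5C too high in the monitoring tool makes every core look 5C hotter than it is, and the other way around.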
More of TJMax values for legacy chips here: http://www.evga.com/forum...p?m=645260&mpage=1 You can get the latest RealTemp always from here:
45nm quad core chips are the worst in terms of reporting proper temperatures via DTS: sticking sensors, varying slope between cores, etc.
Core 2 Dual and Quad core chips (both 45 or 65nm) in general have a tendency of misreporting (or sticking) thermal sensors - DTS were designed to work correctly close to shutdown temps ~100C and not to indicate correct temps at idle. Check this link for complete RealTemp documentation and testing methodology: http://www.techpowerup.com/realtemp/docs.php
Please refer to CPU Cool Down test (RealTemp feature) to see how sensors are reacting in different temps generated by Prime95 load.
The above link also has a section called Calibration of sensors. It is based on running the CPU @ 1.1V with multi 6x at standard FSB speed (266MHz for 65nm chips and 333MHz for 45nm) and comparing the reported temps with the ambient room temperature, adjusted by an offset that depends on the thermal solution used. This procedure will give you good insight into how accurate your CPU thermal sensors are.
For stability testing you need to use software designed for this purpose!
Step 1 Stability of CPU only
P67/Z68 users please refer to JacobF's thread on initial CPU OC'ing of those i5/i7 setups, will be much easier instead of reading below information which was initially written for older FSB setups: http://www.evga.com/forums/tm.aspx?m=1033200
Z77 users please refer to JacobF's thread on initial CPU OC'ing based on 3770K: http://www.evga.com/forums/tm.aspx?m=1611217
and refer to this thread to fix the multiplier downclocking under load (not the thermal throttling related!): http://www.evga.com/forum...?m=1645066&mpage=1

P67/Z68/Z77 + 1155 SB/IB maximum "considered safe" overvoltages: *
CPU/vcore - 1.35v-1.4v (the lower, the better), 1.520V absolute max!!!
CPU PLL - 1.9v
VCCIO - 1.25v (it is still unknown what this limit is on Ivy Bridge) - this voltage is considered same as memory controller (old SPP)
VCCSA - 1.3v (default should be between 0.9v-1v, leave it alone, unless you're going for a BCLK overclock) - this voltage is considered old FSB/vtt regarding all else except memory controller
DRAM/vdimm - aim for factory spec (1.5v or 1.65v), if not, +10% should be okay.
*source: cannot find it now ...
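If you keep your BIOS settings written down somewhere, the limits quoted above are easy to turn into a quick sanity check. This is only a sketch: the rail names and both helper functions are made up for illustration, and the numbers are just the "considered safe" values from the list above, not an official spec.

```python
# "Considered safe" maximums quoted above for P67/Z68/Z77 + 1155 SB/IB, in volts.
SAFE_MAX_V = {
    "vcore": 1.40,    # 1.35-1.4 V; the lower, the better
    "cpu_pll": 1.90,
    "vccio": 1.25,    # limit on Ivy Bridge is still unknown
    "vccsa": 1.30,
}
VCORE_ABSOLUTE_MAX_V = 1.520


def dram_max_v(factory_spec_v):
    """Factory spec (1.5 V or 1.65 V) plus the +10% the guide tolerates."""
    return round(factory_spec_v * 1.10, 3)


def check_voltages(settings, dram_spec_v=1.5):
    """Return one warning string per rail that exceeds its limit."""
    warnings = []
    for rail, volts in settings.items():
        if rail == "dram":
            limit = dram_max_v(dram_spec_v)
        elif rail in SAFE_MAX_V:
            limit = SAFE_MAX_V[rail]
        else:
            continue  # rail not covered by the list above
        if rail == "vcore" and volts > VCORE_ABSOLUTE_MAX_V:
            warnings.append(f"vcore {volts:.3f} V is above the absolute max")
        elif volts > limit:
            warnings.append(f"{rail} {volts:.3f} V exceeds {limit:.3f} V")
    return warnings


if __name__ == "__main__":
    for line in check_voltages({"vcore": 1.45, "vccio": 1.20, "dram": 1.70}):
        print(line)
```

An empty list only means nothing is over the quoted limits; it says nothing about whether the overclock is actually stable.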
When initially overclocking the CPU only, it's wise to leave the memory at factory voltage, timings and speed (unlinked or linked with ratio making it run at or below the rated speed).
This way you can be sure you are actually testing the stability of CPU only and no other stuff plays a role here.
Once CPU is stable, then link/sync the memory speed (and adjust timings and voltage if needed) to match the CPU FSB speed or leave it linked with a ratio (e.g. 5:4) making it run lower than CPU.
In order to get some headroom for initial CPU overclocking, it's advised to up the FSB voltage from stock 1.1V to the last very safe (green in 790 BIOS) value of 1.30V. Don't worry, you can optimize it later. It won't hurt anything, and this way you can be sure that FSB voltage (VTT) is not limiting the stability of your CPU/MEM overclock.
P67/Z68/Z77 users: when initially OC'ing the CPU only, please leave the memory on Automatic settings of 1333MHz with default (loose) timings, a sort of fail-safe mode (based on the JEDEC profile from SPD), and bump up the vccio to 1.25V. That is all it takes to rule out potential sources of instability while working on the CPU OC.
Also make sure the Memory Controller residing in the North Bridge (called SPP in nvidia BIOS) gets enough volts to let the CPU run flawlessly during overclocking. Do not be afraid to keep it at max green level during initial CPU overclocking. This way you won't have to worry about it initially, and you can lower it (optimize) later if needed ... or up it higher when needed for extra SLI stability (e.g. watching full screen HD movies, playing games, etc.) and clocks higher than 1800MHz (450MHz FSB).

NOTE for nvidia 700 chipset series users: NEVER allow memory to run at a faster FSB speed than the CPU does, even though the board allows it!!! This puts extra strain on the memory controller, and the system will eventually lose stability and crash, likely also causing massive data corruption on the system partition!

NOTE for intel's 5 and 6 series board users: Newer Intel chipset boards (P67/Z68/Z77) can handle memory running at faster FSB/BCLK speeds thanks to multipliers and ratios that get set automatically to avoid problems, so there is no real need to worry about memory running faster than the CPU on such setups. However, to rule out unstable memory at this point, please set the Memory Profile to Default 1066/1333MHz with automatic timings and voltage (it sets the failsafe JEDEC profile running loose timings at the lowest possible RAM voltage and slowest base speed). Also, all i5/i7 CPUs have the memory controller integrated on the CPU die, and that voltage is now called VCCIO (vs SPP aka North Bridge on former boards). Do NOT overvolt it too high, because instead of RMA'ing the board you will eventually be shopping for a new CPU. x58/P55 boards require a special max voltage "distance" of 0.5v between the vdimm and vccio (and affecting also vccsa if I remember right) which must not be exceeded.
Mersenne Prime95 Website (no need to join GIMPS project): http://www.mersenne.org/
Download the latest available Prime95 x64 Windows version (links pointing to 27.7):
Windows XP/Vista/7 64-bit --> ftp://mersenne.org/gimps/p95v277.win64.zip
... and run the test called Small FFT for at least 30 mins or 1 hr (the longer the better). If you observe no cores failing then your CPU voltage is OK. Quick runs when increasing the CPU OC should last no less than 30 mins though, if you want any kind of useful stability check.
If cores (related to threads in prime, core0=thread#1, ..., core3=thread#4, etc.) are failing, increase VCORE one or two steps at a time (P67/Z68/Z77: a 0.025v increment is recommended as quick & simple initially, use a 0.005v increment for fine tuning) and re-run this test until you find a sweet spot.
If you do this one step at a time, you will find an optimal sweet spot eventually, but it takes longer time to reach initial stability this way. I usually increase the vcore 3 notches at a time until it is stable and optimize later (by lowering the vcore one step at a time) when I have more spare time.
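The "jump 3 notches until stable, then walk back down one notch at a time" routine can be written out as a small search. This is just a sketch of the bookkeeping: `is_stable` is a stand-in for you manually running Small FFT at that vcore and reporting the result, the 5 mV notch size is an assumption (it depends on the board), and the 1400 mV cap is taken from the "considered safe" vcore range quoted earlier.

```python
def find_vcore_mv(start_mv, is_stable, step_mv=5, coarse_steps=3, max_mv=1400):
    """Coarse-then-fine vcore search, working in millivolts.

    is_stable(vcore_mv) should return True when the stress test passed
    at that voltage (hypothetical callback; in practice this is a manual run).
    """
    vcore = start_mv
    # Coarse phase: add several notches at a time until the test passes.
    while not is_stable(vcore):
        vcore += coarse_steps * step_mv
        if vcore > max_mv:
            raise RuntimeError("no stable vcore found below the safety cap")
    # Fine phase: back off one notch at a time while the test still passes.
    while vcore - step_mv >= start_mv and is_stable(vcore - step_mv):
        vcore -= step_mv
    return vcore


if __name__ == "__main__":
    # Pretend this chip needs at least 1282 mV at the current multiplier.
    print(find_vcore_mv(1250, lambda mv: mv >= 1282))
```

The coarse phase gets you to a working voltage fast; the fine phase is the "optimize later" part and can be done whenever you have spare time.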
If during this (or any other) stress test you experience a BSOD with STOP error 0x00000124 (or something ending in 124 or 101), it is 99% certain that the vcore of your CPU is set too low and needs to be upped by at least 1-3 notches to get fully stable.
Step 2 General system stability
including heavy stress on PSU, CPU, FSB/BCLK and memory as well:
Since modern "K" series of unlocked intel i5/i7 CPUs are mostly overclocked by increasing the multiplier (and vcore), this step is completely irrelevant for P67/Z68/Z77 users unless you are aiming at BCLK overclock (don't expect more than 5-7% overclock on Z68 though). Increase VCCSA voltage as you increase BCLK frequency.
a) same Prime95, but run Large FFT this time. If you experience no hard locks or failing threads for at least 1-2 hours (the longer the better) then it should be fine. If something goes wrong, you may want to play around with other voltages (FSB, SPP, DRAM) or other settings (memory timings, toggle P1/P2, etc.).
Rounding errors in Prime can be related to too low SPP/VCCIO voltage (memory controller errors), up it by 1 notch.
LargeFFT is extremely useful for testing FSB/BCLK stability when the other voltages are known to be stable. If FSB/VCCSA voltage is too low for a given CPU+MEM speed then the system will likely restart. Increase the VTT (CPU FSB voltage) in BIOS by 1 notch if that happens.
NOTE for 45nm Quad Core CPU users: DO NOT use more than 1.35V on FSB/VTT if you do not know what you are doing!!! Running 1.40V VTT is an easy way to kill the 45nm CPU in the long run (could be OK for short runs like extreme benching). Running 1.45V VTT is a FAST way to kill the 45nm CPU or start the degradation chain reaction, even in a short run.
More info on optimizing FSB/VTT voltage by GTLVREF tuning for 790 boards can be found here: GTLVREF tuning and calculations for 790
and here for 780: http://www.evga.com/forum...9&mpage=1&key=
b) OCCT Perestroika (latest version 4.3.1 - now with AVX extensions for IB/SB CPUs) http://www.ocbase.com/index.php/download
Run CPU:OCCT 1hr test, if it says System Stable it's good enough for some, however where OCCT found stability, the Prime95 blend run for 24 hrs or LinX 0.6.4 (5 quick rounds) may prove otherwise.
I consider OCCT a mild stress tester (and use it only for high overclock testing where excessive temps are a problem under LinX, not as a final confirmation). So if you require more solid stability, especially if you want to use such an OC 24/7 (like me), proceed to the Final Stability step.
Step 3 Final Stability confirmation
NOTE for Sandy Bridge CPUs and high clocks over 47 multi: Most motherboards (e.g. my Z68 FTW running without vdroop) tend to slightly undervolt the CPU at idle, so even though your CPU is fully stable under load and takes whatever you throw at it, it may still BSOD (STOP error 124) occasionally after a long idle. To solve this problem you can do 2 simple things:
1) raise vcore, which will increase idle voltage slightly (and overvolt the under-load vcore at the same time, resulting in slightly higher CPU temps under load)
2) enable C1E and EIST in BIOS, so the CPU drops multi to 16x at idle while vcore stays fixed as it was (in manual mode that is)

a) Run Prime95 Blend Test overnight (at least 8 hrs, some make it 12 or even 24 hrs!). If nothing goes wrong you may consider your system stable as a rock. In case you find your system hard-locked or re-booted, you need to enter BIOS and adjust something (hard locks are usually related to high temps inside the case, unstable memory timings/speed or CPU stability problems on FSB/BCLK), then re-run it and find out what happens (BSOD, hardlock, etc.). Some will most likely need to increase voltages like SPP/VCCIO (undervolting this one is often a cause of instability, and do not be afraid to use red values of this setting in BIOS if overclocking high), MCP (only if running tri-SLI, other than that leave it on Auto or 1.50V, or use 1.55V if running tons of external devices like I do), DRAM settings (loosen timings, increase voltage, change speed, etc.) or again play with other settings.
If you can't figure out what caused the lock or reboot then start backing off the clocks/speeds to safe(r) levels one by one and re-testing, eventually you will find where the culprit was. FSB voltage is tricky, not always higher means better, because higher one generates more noise on FSB thus causing CPU and/or memory to "get lost" in such noise during communications with other components (Components talk to each other by signalling each other via FSB bus which interconnects CPU, memory and video cards). In some cases decreasing voltage of FSB/VTT is a key to success, but likely will require GTLVREF tuning to achieve it in a stable manner.
b) Intel Burn Test (latest version 2.54) updated with AVX instructions
Use it with care and wisely. Never leave it running unattended when you are unsure of how high the temps might go. watch it running for 1 or 2 passes.
I usually do 5 quick rounds in Max Stress mode to confirm so-so system stability (e.g. after adjusting one of the voltages in BIOS and looking for a quick re-test).
However, if you need a solid stability confirmation, running it for 10-20 rounds in Max Stress mode is what you want to do. It takes time though, because length of 1 round depends on memory amount used (which changes with grade of test, Maximum Stress mode takes most available memory for testing).
If I remember correctly when running 4GB and Q9450 it was taking about 1 hour to complete 20 rounds in Max stress mode (using around 2.5GB of available RAM).
When using 8GB RAM on the same setup, it was taking around 1 hour to complete 10 rounds in Max Stress mode (using around 6.5GB of available RAM).
Do not pay much attention to detected CPU speed (GFlops) and round calculation time as those values depend on used linpack libraries or tests (from library) used by stress tester, they may change each time you start the test even with same parameters.
c) LinX 0.6.4 with AVX extensions (my personal favorite)
The freshest version download (thanks to Stasio): http://www.mediafire.com/?0vlniujny4i8sjo
Official thread of Dua|ist (outdated download links): http://www.xtremesystems....mple-Linpack-interface
You can use it for quick testing of stability between CPU OC increments, e.g. 3-5 runs at half/quarter the memory available to see how far you can go with multipliers (or BCLK). I usually do 5 quick rounds of LinX at 1GB memory setting to quickly test CPU stability as I keep increasing vcore and multiplier in a search for optimal base for fine tuning.
Very good for testing the ultimate stability of CPU/MEM/chipset in a time much shorter than Prime95/64. Generally I'd say that 24hrs of Prime equal to around 3-4 hours of LinX.
Ultimately I usually use 20 rounds (runs) in ALL MEMORY mode to test final stability of a daily 24/7 overclock (not really needed for benchmarking only clocks) - PLEASE MONITOR YOUR CPU TEMPERATURES for at least 1-2 runs to see how far they get! Sandy Bridge chips are suggested to stay below 85C during this test (or below 75C under Prime95).
NOTE that it takes around 3.5-4 hours to complete 20 runs with ALL memory used when you have 16GB installed on Z68+2600K setup.
d) running folding clients on CPU (SMP for all cores) and all GPUs at the same time for at least 8-12 hours will confirm stability too. Use the easiest folding software, e.g. the excellent Folding@Home V7 Client (beta): https://fah-web.stanford....lient/wiki/BetaRelease

!!!IMPORTANT WHEA NOTICE TO IVY BRIDGE OWNERS!!!
ALWAYS after you (successfully) complete a run of a stress tester, look into "Computer Management snap-in - Event Logs - Custom - Administrative events" and check for WHEA warnings! Those are recoverable errors detected by the chip itself (and corrected, so the stress tester never knew the problem happened) and logged in the Windows Event log. To keep it simple, if you see WHEA warnings then your OC is NOT stable and you most likely need to up the vcore a bit (try +5mV) or PLL or something else. It also means the chip is wasting some of its performance on correcting itself (doing the same job twice).

Step 4 GPU overclocking and stability testing
Recently I used the below linked guide to overclock my 600 series card. It's an excellent write-up from OCN, HIGHLY RECOMMENDED!
Some awesome tools for GPU overclocking and final stability testing:
a) for setting clocks, controlling fans, and much more:
EVGA Precision X http://www.evga.com/precision/
or RIVAtuner: http://www.guru3d.com/index.php?page=rivatuner
or MSI AfterBurner: http://event.msi.com/vga/afterburner/overview.htm
b) for monitoring:
GPU-Z (sensors tab is very useful with sensors reporting MAX values) http://www.techpowerup.com/gpuz/
c) for GPU stability testing:
EVGA OC Scanner (can be used from within the Precision tool) http://www.evga.com/ocscanner/
Furmark - for GPU stability testing http://www.ozone3d.net/benchmarks/fur/
or OCCT (linked earlier, use OCCT:GPU test) - for GPU stability testing, said to be more stressful than Furmark
Always choose Error Checking mode if you want to know if ANY errors happened during the test.
Always choose the highest settings (resolution, AA, etc.) to test it once and right.
Always use Full Screen mode for testing SLI stability.
d) benchmarks (can be used for initial GPU stability testing):
Unigine Heaven 3.0, New Dawn, 3DMark11, 3DMark Vantage

NOTE that modern nvidia cards (400/500/600 series) come with shaders locked at max value (2x related to current core frequency setting), so you only adjust core and/or memory on them. You may need to use EVGA Voltage Tuner to increase gpu voltage to achieve higher GPU clocks, e.g. max 1.1v on 570/580 series gpus should be ok for max safe setting. It also came to my attention that 600 series cards come with a locked voltage control, instead they give the Target Clock/Power adjustment which takes care of automatic voltage and clock control in the background.

Always test single cards first to know their limits before you join them in SLI for a common (for the 2 or 3 cards) overclock! This way you will know exactly what to expect from a joined setup and will be able to troubleshoot easier if some MB voltages are limiting your setup.

How to do it right with old GTX 200 series cards?
It seems that the only proper way of overclocking GPU is to start with:
0) Leaving all GPU card clocks at factory levels, unlinking Core from Shader and then
1) pushing shaders up first and testing GPU stability until you find the highest sweet spot which survives the stress test
2) then push GPU core clocks (maximum value is limited by 1/2 of shader value) and test the GPU stability until you find the highest sweet spot which survives the stress test
3) as a last step push memory per table (just match it with level from table below) or until it hardlocks your system on stress testing or in games and then revert to the last stable value and repeat stress test.
For steps 1-3 use either EVGA OC Scanner or Furmark or OCCT:GPU test
For step 3 you can additionally use a feature of OCCT and test GPU memory stability alone.
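The order of steps 0-3 above (shaders first, then core capped at half the shader clock, then memory) can be sketched as follows. Everything here is illustrative: `survives` stands in for a manual stress test run (OC Scanner, Furmark or OCCT:GPU) at the given clock, the 10 MHz step is an arbitrary choice, and the stock clocks in the demo are made-up numbers rather than the specs of any real card.

```python
def push_clock(start_mhz, step_mhz, survives, limit_mhz=None):
    """Raise one clock step by step; return the last value that survived."""
    clock = start_mhz
    while True:
        candidate = clock + step_mhz
        if limit_mhz is not None and candidate > limit_mhz:
            break  # hit the hard cap (e.g. core limited to shader / 2)
        if not survives(candidate):
            break  # stress test failed, keep the previous value
        clock = candidate
    return clock


def overclock_gtx200(stock, survives, step_mhz=10):
    """Shaders first, then core (capped at shader / 2), then memory."""
    shader = push_clock(stock["shader"], step_mhz,
                        lambda mhz: survives("shader", mhz))
    core = push_clock(stock["core"], step_mhz,
                      lambda mhz: survives("core", mhz),
                      limit_mhz=shader / 2)
    memory = push_clock(stock["memory"], step_mhz,
                        lambda mhz: survives("memory", mhz))
    return {"shader": shader, "core": core, "memory": memory}


if __name__ == "__main__":
    # Made-up limits and stock clocks, for demonstration only.
    limits = {"shader": 1500, "core": 800, "memory": 1200}
    stock = {"shader": 1400, "core": 650, "memory": 1100}
    print(overclock_gtx200(stock, lambda domain, mhz: mhz <= limits[domain]))
```

In the demo the core could survive more on its own, but it stops at half of the final shader clock, which is the limit mentioned in step 2.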
When pushing shaders you have to keep in mind that they actually run speeds based on specific increments, so whatever you set in RivaTuner/Precision may not be what you think it is in real (refer to actual clocks). EVGA Precision or GPU-Z should report correct current clocks, no matter what you set in your GPU overclocking software.
The table below was produced by an author unknown to me (I found it in the 200 series cards forum thanks to Ikeyes) to gather common GTX 260/280 clocks and their real working values. Use it as a reference:
At some point of pushing those clocks you will encounter trouble when stressing your GPU, here are most common explanations of errors (on top of overheating):
If your system hard-locks, and requires a reboot then your memory is too high.
If your system is showing weird artifacts like flashing colors and intermittent blank screens and/or the driver crashes, then your shaders are too high.
If only the nvidia driver crashes without any visual anomalies, then your core is too high.
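For quick reference, the three rules of thumb above fit in a small lookup. The symptom keys are names I made up for this sketch; the mapping itself is only what the three lines above say.

```python
# Symptom -> most likely culprit, per the three rules of thumb above.
LIKELY_CULPRIT = {
    "hard_lock": "memory clock too high",
    "artifacts_or_blank_screens": "shader clock too high",
    "driver_crash_only": "core clock too high",
}


def diagnose(symptom):
    """Return the likely culprit, or a reminder to rule out overheating."""
    return LIKELY_CULPRIT.get(symptom, "unknown: check temperatures first")


if __name__ == "__main__":
    for symptom in ("hard_lock", "driver_crash_only", "fan_noise"):
        print(symptom, "->", diagnose(symptom))
```

Back off the clock named by the lookup first, re-run the stress test, and only then look at the other two.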
and most of all Good luck!
There are other stress testers not covered here, like Orthos (didn't use it personally; Orthos is a remake of an older Prime95 version with a better GUI), SuperPI (mostly for testing memory only under Windows) and MemTest86+ (OS independent memory testing).
CPUID CPU-Z (gathers all info on current FSB/RAM clocks, MB details, actual CPU voltage, model numbers, etc.) You can also use it for validating your CPU overclocks. to get the latest, simply click the latest version number in left top corner of below website, under Download Latest version:
HardWare Monitor (not best for monitoring temps on CPU as it has hardcoded and not necessarily correct TJMax values; should be OK for reporting other system temps, fan speeds, etc.):
GPU-Z (graphic card clocks, model, BIOS information)
AIDA64 Extreme Edition (trial) - great piece of software that is configurable and adjustable, used for monitoring and gathering info of all components of your system, unfortunately not all features are available in trial mode.
Super PI (more of a benchmark than a stress tester nowadays):
No troubleshooting in this thread. Please start a new one and post your problems over there. Suggestions always welcome. Thanks.
<message edited by feniks on Wednesday, August 08, 2012 7:49 AM>