HIS Radeon HD 7970 IceQ X2 Graphics Card Review

Product: HIS Radeon HD 7970 IceQ X2
Company: HIS
Author: James Prior
Editor: Charles Oliver
Date: January 16th, 2013

Stuttering Rage

TL;DR - No more simple FRAPS min/avg/max, now min/perceived average/max fps plus 99%, 99.9%, mode frame rendertimes and a percent time inside smoothness boundary. Read it.

The Problem

The traditional frames per second metric we use shows minimum and average frame rates, but doesn't show smoothness - how much stuttering or hitching the game experiences. Unless the frame rendering times vary wildly for a significant length of time during the benchmark, long enough to reduce the average in comparison to another system, you can't tell if it's buttery smooth or an unplayable mess. Keeping basic FPS but swapping out average FPS for a measure of frame render time variation gives us a bigger picture. [H]ardOCP changed to factoring in smoothness a while ago, noting feel and showing highest playable settings, but an analytical, metric-based approach was pioneered by Scott Wasson at the Tech Report. Where more traditional editorial commentary simply has the reviewer indicate whether stutter or hitching was present and affecting gameplay, there's an argument that his metrics are a natural extension, quantifying what [H]ardOCP have been doing.

The Tool

It's important to note how FRAPS works to give us frame render times. FRAPS records the time elapsed between successive passes through a specific point in the engine loop, the aim being that the time between loops is the frame render time. The loop point is the Present() call: FRAPS hooks into the application with its own DLL and piggybacks off the Present() call to DirectX. Amongst other things it adds its counter to the screen before calling Present() for real, which is why FRAPS screenshots don't capture some post-processing effects like Morphological Anti-Aliasing. Because the same point in the loop is used each time, the FRAPS frame render time might not be the exact time of a frame, but it is consistently measuring the interval - close enough, right?

Well, maybe not, as we now live in a world of command lists and frame buffering where you can batch work on the GPU, leading to a long frame followed by several short ones which don't match how long the GPU took to render the frames. While the previous frame was being worked on, a large workload was queued up, and this inserts a stall in the frame render time that might not be seen by the player - the FRAPS frame interval had finished but the GPU hadn't. The CPU thread queues up the data and instructions for the next few frames, which takes a little longer, but the GPU is still chewing through the last frame. Somewhere in the interval between Present() calls the GPU ejaculates a mind melting spooge of tessellated, shadowed, and shaded glory onto your screen and waits for the next stroke from the CPU. This time there's a quick triple pump of instructions to go with the data, frame buffering is enabled possibly independently of the game engine's own settings, and the GPU goes off and processes these - but not at the same speed as the Present() calls were fed into the device. We've now gone asynchronous and GPU bound.

The net effect is much less variation in GPU output timing than the frametimes might indicate. The trick is identifying when this condition is in play, especially when game engines themselves try to detect this change in throughput and attempt to resynchronize. Our performance results are going to be based on FRAPS recordings of gameplay, giving us several metrics in our results to try and determine what just happened to our screen.
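
To make the "interval between Present() calls" idea concrete, here is a minimal Python sketch of turning a frametimes log into per-frame intervals. It assumes the log is a two-column CSV with a header row and a cumulative timestamp in milliseconds in the second column; the function name and the sample data are ours, for illustration only.

```python
import csv
import io

def frame_times_from_fraps(csv_text):
    """Return per-frame intervals (ms) from cumulative timestamps.

    Assumed layout: a header row, then rows of "frame number, time (ms)".
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    stamps = [float(row[1]) for row in reader if len(row) >= 2]
    # Each interval is the gap between two successive Present() timestamps.
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]

# Made-up sample: two ~16.5ms frames followed by one 25ms hitch.
sample = "Frame, Time (ms)\n1, 0.000\n2, 16.500\n3, 33.250\n4, 58.250\n"
intervals = frame_times_from_fraps(sample)
```

Note these intervals are what the CPU side saw between Present() calls, which, as described above, is not necessarily what the GPU delivered to the screen.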

The Analysis

First are the traditional, basic minimum and maximum frame rates, expressed as frames per second (FPS). Replacing the traditional average, we're going to insert a value known as apparent framerate, taken from a public domain tool catchily named microstutter program. This application uses the FRAPS frametimes file as input and reports two things: average FPS and local framerate variation, which appears to be an average of the absolute difference between each frame time and the global mean. The variation is used as a percentage modifier for average FPS to give an apparent framerate, and it's this value we're using in place of the traditional average FPS. These three calculated values give us our broadest outline of performance and can be used in a simple higher-is-better comparison.
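
As a rough illustration of the apparent framerate idea, the sketch below computes the mean absolute deviation of the frame times and uses it to scale the average FPS down. This is our guess at the shape of the calculation, not the microstutter program's actual code: in particular, applying the variation as a "1 minus fraction" multiplier is an assumption, and the function name is hypothetical.

```python
def apparent_fps(frame_times_ms):
    """Average FPS reduced by the relative frame time variation (assumed formula)."""
    count = len(frame_times_ms)
    mean_ms = sum(frame_times_ms) / count
    average_fps = 1000.0 / mean_ms
    # Mean absolute difference between each frame time and the global mean.
    variation_ms = sum(abs(t - mean_ms) for t in frame_times_ms) / count
    # Assumption: the variation, as a fraction of the mean, is removed from the average.
    variation_fraction = variation_ms / mean_ms
    return average_fps * (1.0 - variation_fraction)

steady = apparent_fps([10.0, 10.0, 10.0, 10.0])   # no variation at all
stutter = apparent_fps([5.0, 15.0, 5.0, 15.0])    # same mean, alternating fast/slow
```

Both runs average 100FPS, but under this assumed formula the alternating run is penalized for its variation while the steady run is not.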

Next we're going to report some raw data: the fastest, mode and slowest frame render times, in milliseconds. Mode is different from a computed average; it is the actual value of the most common frame render time in the data set. These values are all hidden in the traditional minimum, average and maximum FPS values, as those results are all calculated over a second.
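
A minimal sketch of pulling those three raw values out of a list of frame times follows. Rounding to three decimal places before counting is our assumption, chosen to match the precision of the millisecond values, and the function name is illustrative.

```python
from collections import Counter

def render_time_summary(frame_times_ms, decimals=3):
    """Return (fastest, mode, slowest) frame render times in milliseconds."""
    rounded = [round(t, decimals) for t in frame_times_ms]
    # most_common(1) gives the single most frequent value; on a tie,
    # the value seen first in the data wins.
    mode_ms, _ = Counter(rounded).most_common(1)[0]
    return min(rounded), mode_ms, max(rounded)

fastest, mode_ms, slowest = render_time_summary([8.38, 8.38, 6.1, 25.0, 8.38, 9.0])
```

Here the mode is 8.38ms even though the single 25ms frame drags the mean upward, which is the distinction between mode and a computed average.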

Processing the frametimes data to get 99% and 99.9% times is next, which is based on TechReport's methodology and most recently discussed here and here; Rage3D's games editor, Sean Ridgeley, used the FRAFS benchviewer tool and the microstutter tool in his evaluation of the Mechwarrior Online beta. The 99% time is the frame render time that 99% of frames are faster than, and can thus be treated as the effective minimum FPS (when converted to frames per second); the 99.9% time is the frame render time that all but 0.1% of frames are faster than. A large difference between 99% and 99.9% indicates that the lows are isolated, outlier conditions and might be experienced for only a small amount of time during the test. The smaller the variation between the 99% time and the apparent average, the fewer large variations in frame render time were experienced.
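
The percentile times can be sketched as below. We use the simple nearest-rank method here as an assumption; other tools may interpolate between neighbouring values and give slightly different numbers. The sample data is made up.

```python
import math

def percentile_time(frame_times_ms, percent):
    """Frame time (ms) that `percent` of frames were at or below (nearest-rank)."""
    ordered = sorted(frame_times_ms)
    rank = math.ceil(percent * len(ordered) / 100.0)
    # Clamp the rank so very small or very large percentages stay in range.
    return ordered[min(max(rank, 1), len(ordered)) - 1]

# Made-up run of 1000 frames: mostly 10ms, some 20ms, two 50ms spikes.
frame_times = [10.0] * 985 + [20.0] * 13 + [50.0] * 2
p99 = percentile_time(frame_times, 99.0)
p999 = percentile_time(frame_times, 99.9)
effective_min_fps = 1000.0 / p99
```

In this sample the 99% time is much lower than the 99.9% time, the pattern described above for isolated outlier frames.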

Next we're going to do some of our own processing, using the mode time and a boundary of +/-20% to establish a 'smoothness band'. We determine how much time is spent outside that band by counting the number of frames that took longer to render than the mode render time +20%, and adding the number of frames that rendered faster than the mode render time -20%. This gives us the total number of frames beyond mode +/- 20% and thus gives us the % time spent inside the smoothness 'pocket'. +/-20% is a value we've determined is a useful boundary for normal framerate variation inside games, and we may adjust either offset in the future as we continue to investigate. At common mode frame render times the 'smoothness pockets' look like this:

Slow Limit (FPS)    Mode Avg (FPS)    Fast Limit (FPS)
36                  45                54
48                  60                72
72                  90                108
96                  120               144
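
The smoothness band count described above can be sketched in a few lines. The function and parameter names are ours; the 20% default mirrors the boundary in the text, and the limits are computed on millisecond values.

```python
def smoothness_percent(frame_times_ms, mode_ms, band=0.20):
    """Percentage of frames whose render time falls inside mode +/- band."""
    slow_limit_ms = mode_ms * (1.0 + band)  # slower frames take longer than this
    fast_limit_ms = mode_ms * (1.0 - band)  # faster frames take less than this
    outside = sum(
        1 for t in frame_times_ms if t > slow_limit_ms or t < fast_limit_ms
    )
    return 100.0 * (len(frame_times_ms) - outside) / len(frame_times_ms)

# Eight frames at the mode, one slow frame and one fast frame.
inside = smoothness_percent([10.0] * 8 + [13.0, 7.0], mode_ms=10.0)
```

Note that this counts frames rather than summing elapsed time, following the frame counting described in the text.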

We will tabulate the mode average, and the slow and fast limits, expressed as FPS. Note that converting the three-decimal-place millisecond values reported by FRAPS to the nearest FPS will incur rounding, because of the banding inherent in an averaged value; e.g. an 8.38ms frame time is 119.33FPS, which rounds to 119FPS. The slow limit is 8.38 + 1.68 (20%) = 10.06ms, which rounds to 99FPS, and the fast limit is 6.70ms, which rounds to 149FPS. This is in contrast to using the FPS averaged value of 119 as the basis for the slow and fast limits (95 and 143, respectively). All our analysis is done using millisecond values, not converted averages; we're just expressing it in the more familiar FPS unit.
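
The difference between the two ways of deriving the limits is easy to check with a short sketch. The function names are ours, and the millisecond version carries full precision until the final rounding, so its slow limit can differ by 1FPS from figures worked by hand with rounded intermediate values.

```python
def limits_from_ms(mode_ms, band=0.20):
    """Apply the band to the millisecond value, then convert each limit to FPS."""
    slow_fps = round(1000.0 / (mode_ms * (1.0 + band)))
    mode_fps = round(1000.0 / mode_ms)
    fast_fps = round(1000.0 / (mode_ms * (1.0 - band)))
    return slow_fps, mode_fps, fast_fps

def limits_from_fps(mode_fps, band=0.20):
    """Apply the band directly to an already-rounded FPS value."""
    return round(mode_fps * (1.0 - band)), round(mode_fps * (1.0 + band))

ms_based = limits_from_ms(8.38)
fps_based = limits_from_fps(119)
```

Because FPS is the reciprocal of frame time, a +/-20% band in milliseconds is not symmetrical in FPS, which is why the two approaches disagree.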

An example of a frametime plot with these limits superimposed (using mode for average) is shown. Simply plotting a graph of all frametimes is messy and hard to interpret, but you can see the game sitting in the smoothness zone except for isolated but repeated bursts of slow and fast frame times.

Example Frametime plot

A game is a feedback loop with the worst possible variable included in it: a human. Our perception rates vary from sense to sense and from person to person, and familiarity with the process dictates our expectations and reactions. A smoothness boundary of 20% takes out the highs and lows that are inside the 99% time and allows us to find when a game is offering sustained stuttering perceptible during gameplay. In-game events like large explosions that cause a performance drop of more than 20% will be caught; it also filters out any potential driver tricks such as very fast frames, while still allowing normal variation like changes from ducking behind cover or sighting a scope. A lower percentage of the total frames outside the smoothness band indicates a better experience. For presentation purposes we're going to show the percent time inside the boundary, so a higher value is better.

The Testing Parameters

We will bench with frame limiting or vsync options on and off. For testing, our preference is given to in-game frame limiting if available, and where not then an in-game vsync option will be used. If in-game vsync doesn't work, we'll try driver control panel forced vsync, and if that's not working then RadeonPro's vsync options to force vsync or frame limit. We're not going to enable double or triple buffering as it is not consistently available across all titles and introduces potential for reducing responsiveness. Implicit in the use of frame limiting or vertical synchronization is that it will improve the smoothness metric if the game is consistently hitting the synchronization frame rate; there will not be frames surpassing the fast limit figure. This means any 'unsmoothness' found is purely from frame rates dipping below the slow limit figure.

Frame limiting is preferred over vertical synchronization primarily for gaming feel, as it helps smooth out the frame rate by capping the fast rendered frames but still allowing the slower rendered frames to be delivered as soon as possible. However, this leaves non-synchronized frames displaying on the screen, which leads to tearing. Depending on which problem affects immersion for you more - image quality or responsiveness - you'll have a preference for vsync or frame rate limiting. An advantage of frame rate limiting is that the cap doesn't necessarily have to be the refresh rate; it can be faster, to help find the engine responsiveness feel you desire - or minimize the stutter.
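
As a toy model of what a frame limiter does to the recorded frame times, the sketch below stretches any frame faster than the cap out to the cap interval and leaves slower frames alone. This is an idealized assumption on our part; a real limiter's pacing will not be this exact, and the function name is hypothetical.

```python
def apply_frame_cap(frame_times_ms, cap_fps):
    """Idealized frame limiter: no frame completes faster than the cap allows."""
    floor_ms = 1000.0 / cap_fps
    # Fast frames are held back to the cap interval; slow frames pass through.
    return [max(t, floor_ms) for t in frame_times_ms]

capped = apply_frame_cap([5.0, 10.0, 25.0], cap_fps=100)
```

In this model the 5ms frame is held to 10ms and the 25ms frame is untouched, so only the dips below the slow limit remain to count against smoothness.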

Overall, this should give a comparative basis of performance and smoothness for different cards, configurations or systems.