Pixel Shader Performance
The new card’s behavior when executing pixel shaders differs somewhat from that of cards based on last-generation GPUs. The first considerable performance hit occurs when the card goes from the simple version 2.0 pixel shader to the more complex PS 2.0 Longer, but the card then actually runs faster with the 4 Registers PS 2.0 shader. It’s only with the per-pixel lighting shader that we see another big performance hit.
You can note that in the last case the GeForce 8800 GTX is only two times faster than the Radeon X1950 XTX despite its 128 stream processors clocked at 1.35GHz. It seems that the performance of the stream processors is limited by other factors, perhaps by the performance of the TMUs.
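To put the 128 stream processors at 1.35GHz into perspective, here is a rough sketch of the card's theoretical peak arithmetic rate. It assumes that each stream processor issues one scalar multiply-add per clock and that a multiply-add counts as two floating-point operations; both are our assumptions rather than measured figures, and the function name is purely illustrative.

```python
# Rough estimate of peak shader arithmetic throughput.
# Assumption: one scalar multiply-add (MAD) per stream processor per clock,
# with each MAD counted as two floating-point operations.

def peak_gflops(stream_processors: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Return the theoretical peak in GFLOPS."""
    return stream_processors * clock_ghz * flops_per_clock

# 128 stream processors at 1.35GHz, as quoted in the text.
g80_peak = peak_gflops(128, 1.35)
print(f"GeForce 8800 GTX peak MAD rate: {g80_peak:.1f} GFLOPS")
```

A peak figure like this says nothing about how well the stream processors are kept busy, which is why a texture-bound shader can show a much smaller real-world advantage.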
There’s an increase in performance in almost every Xbitmark subtest. The only exception is the Plaid Fabric shader, whose main feature is sampling from a 3D texture. In this case, the three graphics cards running the test deliver similar results, which implies some limitation, perhaps on the software level, i.e. in the support for the GeForce 8800 in the new version of the ForceWare driver.
The G80 incorporates special-purpose branch execution units similar to those in the ATI X1000 architecture, so the new chip easily crunches through shaders with dynamic branching and is even more efficient at that than its opponent.
So, the superb pixel shader processing potential of the new Nvidia card is evident, but it’s too early to draw any conclusions yet. Let’s see what we have in other tests.
The GeForce 8800 GTX is about 30% faster than the Radeon X1950 XTX in the pixel shader test from 3DMark05. But considering the similar behavior of such different graphics cards, we suspect that graphics memory bandwidth is the bottleneck here.
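The bandwidth suspicion can be sanity-checked with a little arithmetic. The sketch below computes theoretical peak memory bandwidth from bus width and effective data rate. The bus widths and memory clocks are assumptions taken from the cards' published specifications, not values measured in this review, and the function name is just for illustration.

```python
# Theoretical peak memory bandwidth from bus width and effective data rate.
# The specifications used below are assumptions from published spec sheets.

def peak_bandwidth_gb_s(bus_width_bits: int, effective_rate_mt_s: float) -> float:
    """Return peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mt_s * 1e6 / 1e9

# GeForce 8800 GTX: 384-bit bus, 900MHz GDDR3 (1800 MT/s effective).
geforce_8800_gtx = peak_bandwidth_gb_s(384, 1800)
# Radeon X1950 XTX: 256-bit bus, 1000MHz GDDR4 (2000 MT/s effective).
radeon_x1950_xtx = peak_bandwidth_gb_s(256, 2000)

ratio = geforce_8800_gtx / radeon_x1950_xtx
print(f"GeForce 8800 GTX: {geforce_8800_gtx:.1f} GB/s")
print(f"Radeon X1950 XTX: {radeon_x1950_xtx:.1f} GB/s")
print(f"Bandwidth ratio:  {ratio:.2f}")
```

Under these assumptions the bandwidth ratio lands in the same range as the roughly 30% gap in the test, which is consistent with a memory-bound workload, although it does not prove it.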
The pixel shader test from 3DMark06 uses a similar shader, but produces different results. The GeForce 8800 GTX enjoys a bigger advantage over the Radeon X1950 XTX in this test, especially at the resolution of 1280x1024 pixels. Still, we think that the main performance-limiting factor here is not computing power but the speed of texture sampling, the memory controllers and/or the caches.
So, the GeForce 8800 GTX shows its very best in the pixel shader tests. When high mathematical performance is needed, the 128 stream processors clocked at 1.35GHz are unrivalled. And if the pixel shader contains a lot of texture lookups, the card still delivers a performance gain thanks to its increased memory bandwidth and the 32 texture address units. These two factors will surely affect the card’s performance in real games.
Now let’s see how well the unified architecture is going to execute vertex shaders.