Fill Rate and Multi-Texturing
The Fill Rate test from 3DMark 2001 SE opens the show. We ran the test in two modes, standard and with full-screen anti-aliasing, to check how well the frame buffer and Z-buffer are compressed.
The results indicate that frame-buffer compression is available in both NVIDIA GeForce FX 5800 Ultra and ATI RADEON 9700 PRO. With full-screen anti-aliasing enabled, these two cards slow down much less than NVIDIA GeForce4 Ti4800 does. For example, 2x anti-aliasing costs RADEON 9700 PRO and GeForce FX 5800 Ultra practically no performance at all.
Surprisingly, NVIDIA GeForce FX loses to R300 in single-texturing even when it runs at its regular clock rates. However, it copes much better with multi-texturing, where it outperforms its rival. This may mean either that NVIDIA GeForce FX has a very poor memory controller (which is hard to believe) or that in this test GeForce FX works as four pixel “pipelines” with two texturing units each.
We can double-check this supposition in the following test. This small program draws a polygon that covers the whole screen and applies from zero to four textures of 512x512 size to it. With zero textures, the pixel color is simply interpolated from the colors of the polygon vertices. Color and Z writes can be turned on or off independently. Here are the results obtained in the normal mode, with color and Z writes enabled:
ATI RADEON 9700 PRO behaves as expected. It has only one texturing unit per pixel pipeline, so it has to spend an extra cycle on every extra texture.
NVIDIA GeForce FX 5800 Ultra behaves strikingly like GeForce4 Ti4800. Going from one texture to two, or from three to four, costs less performance than going from two textures to three. Add to this that even without any textures NVIDIA GeForce FX 5800 Ultra reaches only about half of its theoretical maximum (1914.9Mpixel/s against the theoretical 4000Mpixel/s), and we can state that under normal conditions, with color and Z writes enabled, GeForce FX follows a scheme corresponding to four “classic” pipelines with two texturing units each. That is, when multi-texturing the chip can process only four pixels at a time, but it applies two textures per clock.
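The reasoning above can be checked with a rough model of peak fill rate. The sketch below is our own illustration, not part of the test program: the function name and the 500MHz clock are assumptions (the clock is simply the 4000Mpixel/s theoretical figure divided by eight pipelines), and the model ignores memory bandwidth and every other real-world limit. It only counts how many clocks each pipeline needs per pixel for a given number of textures.

```python
import math

def peak_fill_rate(clock_mhz, pipelines, tmus_per_pipe, textures):
    """Idealized peak fill rate in Mpixel/s (hypothetical helper).

    Each pipeline needs at least one clock per pixel, plus extra clocks
    when there are more textures than texturing units in the pipeline.
    """
    cycles = max(1, math.ceil(textures / tmus_per_pipe))
    return clock_mhz * pipelines / cycles

CLOCK_MHZ = 500  # assumption: 4000 Mpixel/s theoretical peak / 8 pipelines

print("textures   4x2 scheme   8x1 scheme")
for n in range(5):
    four_by_two = peak_fill_rate(CLOCK_MHZ, 4, 2, n)
    eight_by_one = peak_fill_rate(CLOCK_MHZ, 8, 1, n)
    print(f"{n:8d} {four_by_two:12.1f} {eight_by_one:12.1f}")

# Measured zero-texture result from the article against the 4x2 peak.
measured = 1914.9
efficiency = measured / peak_fill_rate(CLOCK_MHZ, 4, 2, 0)
print(f"measured / 4x2 peak = {efficiency:.3f}")
```

In this model the 4x2 scheme loses nothing when going from one texture to two or from three to four, and halves its fill rate between two and three textures, which is the pattern seen in the measurements. The 8x1 scheme would instead peak at 4000Mpixel/s with zero or one texture, about twice what was actually measured.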
Now, let’s try to disable color writes:
We’ve got it! With no textures applied and no results written into the frame buffer, GeForce FX reaches its theoretical maximum. It means GeForce FX does use an eight-pipeline scheme, but texturing is unavailable in it. This curious operating mode may bring GeForce FX some performance boost in upcoming PC games, for example in Doom 3. In this game, the rendering of lighting and shadows in each frame is preceded by one or several passes that initialize the Z-buffer and the stencil buffer. GeForce FX would perform these preliminary passes twice as fast as the final one, calculating eight pixels (Z values, to be more exact) per clock. At the same time, this optimization won’t give GeForce FX any advantage in the majority of existing games.
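To put the “twice as fast” claim into numbers, here is a back-of-the-envelope sketch. Everything in it is an assumption made for illustration: the helper name, the 1024x768 resolution and the 500MHz clock are ours, and the estimate ignores overdraw, memory bandwidth and geometry load, so it shows only the ratio between a Z-only pass and a color pass, not real frame times.

```python
def pass_time_ms(pixels, pixels_per_clock, clock_mhz):
    """Idealized time in milliseconds to fill `pixels` once (hypothetical helper)."""
    clocks = pixels / pixels_per_clock
    return clocks / (clock_mhz * 1e6) * 1e3

WIDTH, HEIGHT = 1024, 768  # assumption: an example resolution
CLOCK_MHZ = 500            # assumption: 4000 Mpixel/s peak / 8 pipelines

pixels = WIDTH * HEIGHT
z_only = pass_time_ms(pixels, 8, CLOCK_MHZ)      # Z/stencil pass: 8 values per clock
color_pass = pass_time_ms(pixels, 4, CLOCK_MHZ)  # color pass: 4 pixels per clock

print(f"Z-only pass: {z_only:.3f} ms")
print(f"Color pass:  {color_pass:.3f} ms")
print(f"Ratio:       {color_pass / z_only:.1f}x")
```

The absolute times here mean little on their own. The point is that the gain applies only to passes that write neither color nor textures, which is why games without such passes see no benefit.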