Notes to the table above:
: The first and foremost feature of the GeForce FX (NV30) architecture is the lack of “pixel pipelines” in their classic form. For example, the eight “classical” pixel pipelines of ATI RADEON 9700 PRO are eight independent functional units working in parallel. Each of them includes a pixel processor, a texturing unit, an input/output unit and so on…
The NV30 architecture doesn’t split the computational resources of the GPU into pixel pipelines. Instead, NV30 has a set of arithmetic logic units (ALUs), a set of texturing units, a set of control and interface elements and so on. In theory, they can organize themselves into any combination depending on the current task. Such a combination can be roughly called a pixel “pipeline”. As GeForce FX has eight texturing units, we suppose the chip can use combinations like 8x1 (eight pixel “pipelines” with one texturing unit in each), 4x2 (four “pipelines” with two texturing units in each) and so on…
From the performance point of view, this extra flexibility may prove either an advantage of GeForce FX or a drawback, depending on the task at hand.
We will check how GeForce FX organizes its resources and behaves under different conditions in the benchmarking section of the review. For now, we will only say that in reality things are not as smooth as they seem.
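To make the “NxM” notation from this note more tangible, here is a small Python sketch that lists the ways eight texturing units could be grouped into pixel “pipelines”, together with a deliberately crude throughput estimate. The function names and the throughput model (one extra pass whenever a pixel needs more textures than its “pipeline” has texturing units) are our own assumptions for illustration. They do not describe how the NV30 actually schedules its units, and we do not know which of the listed groupings the chip really supports.

```python
import math


def pipeline_configs(texture_units):
    """List every (pipelines, units_per_pipeline) split of the texturing units.

    Hypothetical helper: it only enumerates the arithmetic possibilities.
    """
    return [
        (pipes, texture_units // pipes)
        for pipes in range(texture_units, 0, -1)
        if texture_units % pipes == 0
    ]


def pixels_per_clock(pipes, units_per_pipe, textures_per_pixel):
    """Toy estimate: pixels finished per clock for a given configuration.

    Assumption: a "pipeline" applies units_per_pipe textures per pass and
    needs extra passes when a pixel uses more textures than that.
    """
    passes = math.ceil(textures_per_pixel / units_per_pipe)
    return pipes / passes


if __name__ == "__main__":
    for pipes, units in pipeline_configs(8):
        single = pixels_per_clock(pipes, units, 1)
        dual = pixels_per_clock(pipes, units, 2)
        print(f"{pipes}x{units}: 1 texture -> {single}, 2 textures -> {dual}")
```

Under this toy model, an 8x1 grouping wins when each pixel uses a single texture, while 8x1 and 4x2 draw level at two textures per pixel. This is one simple way to see why the benefit of a flexible organization depends on the task.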
: The maximum number of instructions in a shader is characterized by the “instruction slots” parameter. One instruction may take up several “slots”.
R300 supports parallel execution of scalar and vector arithmetic instructions, so the maximum number of instruction slots in a pixel shader equals 160.
GeForce FX features universal instruction slots, so its maximum shader length is not the sum of separate texture and arithmetic instruction slots.
: Constants and temporary registers of pixel shaders in GeForce FX may store either 16-bit or 32-bit four-component values. Since 16-bit numbers take up half as much space, the number of available registers doubles when they are used.
: NVIDIA GeForce FX supports two floating-point data formats: 16 bits per component and 32 bits per component. GeForce FX performs 32-bit floating-point calculations at half the speed of 16-bit ones: its 16-bit ALUs have to work in pairs for 32-bit calculations.
ATI RADEON 9700 PRO supports both 16-bit and 32-bit data precision, but performs all floating-point calculations with 24-bit precision. The result can then be converted to the 16-bit format or expanded to the 32-bit one.
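To give a feeling for what the three precisions mean in practice, the Python sketch below rounds a value to a given number of stored mantissa bits and prints the resulting error. The helper name `round_to_mantissa` is ours. The mantissa widths of 10 and 23 bits are those of the standard 16-bit and 32-bit floating-point formats, while the 16-bit mantissa for the 24-bit format is our assumption about its layout. The sketch ignores exponent range, denormals and special values, so it models only the rounding step, not the hardware.

```python
import math


def round_to_mantissa(x, mantissa_bits):
    """Round x to the nearest value with the given number of stored mantissa bits.

    Simplified model: the exponent range is treated as unlimited.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    # One implicit leading bit plus the explicitly stored mantissa bits.
    scale = 2 ** (mantissa_bits + 1)
    return math.ldexp(round(m * scale) / scale, e)


# Stored mantissa bits per format; the 24-bit entry is an assumption.
FORMATS = {"16-bit": 10, "24-bit": 16, "32-bit": 23}

if __name__ == "__main__":
    value = 1.0 / 3.0
    for name, bits in FORMATS.items():
        rounded = round_to_mantissa(value, bits)
        print(f"{name}: {rounded!r}, error {abs(rounded - value):.3e}")
```

Each step up in precision shrinks the rounding error by a few orders of magnitude, which matters most in long shaders where errors accumulate over many instructions.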
: Hidden Surface Removal (HSR) is a method that makes more efficient use of the available graphics memory bus bandwidth. Before drawing a pixel, GeForce4 and FX GPUs compare its Z value (its distance from the viewer) with the value stored in the Z-buffer. If the pixel’s Z is bigger, the pixel is hidden and there is no need to draw it. Thus, the number of frame-buffer reads and writes is reduced, as well as the amount of transferred texture data.
When full-screen anti-aliasing is on, GeForce FX performs the same operations on subpixel blocks, for example on 2x2 blocks for 4x FSAA. In other words, it is not single pixels but blocks of several subpixels that are compared and then either accepted for drawing or “dismissed”.
ATI RADEON 9700 PRO uses a more sophisticated HSR algorithm, which is based on the concept of the tile Z-buffer. It’s the so-called HyperZ III technology. Theoretically, HyperZ III is more effective as it “dismisses” whole groups of unnecessary pixels. You can read more about HyperZ III in our ATI RADEON 9700 PRO Review.
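The per-pixel Z check described in this note can be sketched in a few lines of Python. This is only an illustration of the principle under our own assumptions: the function name, the list-based Z-buffer and the (x, y, z) fragment tuples are hypothetical, and the real chips work on blocks in dedicated hardware. The tile-based HyperZ III scheme is not modelled here.

```python
def draw_with_early_z(fragments, width, height, far=1.0):
    """Run a per-pixel Z test before "drawing" each fragment.

    fragments: iterable of (x, y, z) tuples, smaller z meaning closer.
    Returns (drawn, rejected) counts.
    """
    # Z-buffer initialised to the far plane.
    zbuf = [[far] * width for _ in range(height)]
    drawn = 0
    rejected = 0
    for x, y, z in fragments:
        if z > zbuf[y][x]:
            # Hidden behind what is already stored: skip texturing and writes.
            rejected += 1
            continue
        zbuf[y][x] = z  # The new fragment is the closest so far.
        drawn += 1
    return drawn, rejected


if __name__ == "__main__":
    frags = [(0, 0, 0.5), (0, 0, 0.8), (0, 0, 0.2), (1, 0, 0.9)]
    print(draw_with_early_z(frags, width=2, height=1))
```

Every rejected fragment is one that never touches the frame-buffer or fetches texture data, which is where the bandwidth saving comes from. Note that the saving depends on drawing order: fragments arriving back to front are never rejected.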
: Frame-buffer compression is used for full-screen anti-aliasing with the method of multisampling (multisampling and supersampling are described in our article called “On the Way to Ideal Picture”). During multisampling, all subpixels of a pixel that doesn’t lie on the edge of a polygon have the same color, so NVIDIA GeForce FX and ATI RADEON 9700 PRO graphics chips write only one color value into the frame-buffer for such a pixel. Unfortunately, this technique cannot be applied to pixels that lie on the edges, as their subpixels may have different colors. Still, the number of such pixels in a scene is usually small, so the amount of transferred data can be reduced quite considerably. For example, at 4x multisampling, the data actually sent to the frame-buffer may drop to one fourth of the initial amount.
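The arithmetic behind the “one fourth” figure can be illustrated with a short Python sketch. The function names and the representation of a pixel as a list of subpixel colors are our own assumptions. The actual on-chip compression formats are not public, so this shows only the counting argument, not the real encoding.

```python
def compress_pixel(samples):
    """Store one color if all subpixels agree, otherwise keep every subpixel."""
    first = samples[0]
    if all(s == first for s in samples):
        return [first]
    return list(samples)


def stored_fraction(pixels):
    """Fraction of color values actually written, relative to uncompressed."""
    total = sum(len(p) for p in pixels)
    stored = sum(len(compress_pixel(p)) for p in pixels)
    return stored / total


if __name__ == "__main__":
    red, blue = (255, 0, 0), (0, 0, 255)
    interior = [red] * 4          # 4x multisampling, pixel inside a polygon
    edge = [red, red, blue, blue]  # pixel crossed by a polygon edge
    print(stored_fraction([interior] * 3))        # interior pixels only
    print(stored_fraction([interior] * 3 + [edge]))  # one edge pixel added
```

With interior pixels only, the stored fraction at 4x multisampling is exactly one fourth. Every edge pixel pushes it back up, so the real saving depends on how many polygon edges the scene contains.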
: Z-buffer compression follows the same principle as frame-buffer compression. ATI RADEON 9700 PRO uses a somewhat more sophisticated algorithm for working with the Z-buffer, which in theory results in higher efficiency.
A little later we will check the effect of these technologies, which are intended to make more efficient use of the available graphics memory bus bandwidth.
There is no doubt that the specifications of GeForce FX exceed the basic hardware requirements set by Microsoft DirectX 9. ATI RADEON 9700 PRO, by contrast, matches the version 2.0 pixel and vertex shader specs very precisely.
The manufacturer emphasizes the enhanced flexibility and programmability of GeForce FX by marking its pixel and vertex shader versions as “2.0+”. This is the new CineFX architecture, the highlight of the new GPU from NVIDIA.
Let’s see where GeForce FX goes beyond the basic DirectX 9 specifications.