Pixel Shaders
Where pixel shaders are concerned, GeForce FX exceeds the base specs only quantitatively. But by what a margin!
First of all, the maximum length of a pixel shader is now 2048 instructions instead of 96! Secondly, when executing a pixel shader, the pixel processor can use 1024 pre-set constants instead of 32, as well as more temporary registers (32 versus 12).
It’s clear that GeForce FX can run much more complex pixel shaders than RADEON 9700 PRO. Complex procedural textures imitating heterogeneous materials like wood or marble, or representing natural surfaces like human skin; physically accurate rendering of various optical effects and of complex reflections and refractions of light: these are just a few of the applications of the “advanced” pixel shaders available in GeForce FX.
Of course, it will be a long time before we see such beauty in real computer games, but we can already watch the winged elf-girl from the famous demo:

Vertex Shaders
As far as vertex shaders are concerned, GeForce FX offers qualitative as well as quantitative innovations. The quantitative part is the increased number of registers, which allows programming more complex shaders. The qualitative part is dynamic jumps and loops in the shader’s body. ATI RADEON 9700 PRO also allows using subroutines, loops and branching in vertex shaders, but RADEON’s branching is static. This means that intermediate results produced during the execution of a shader cannot change its execution sequence; the sequence is determined by the input variables alone. In other words, although jumps and loops imply non-linearity, the shader is still executed linearly, and the R300 hardware is designed to execute shaders this way. For example, the compiler simply unrolls all the loops in a shader when loading it into the vertex processor, so that each loop becomes a chain of identical instruction blocks.
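To make the static case more concrete, here is a minimal Python sketch of how a loader could turn a loop into a linear chain of identical blocks. It is a toy model and not the actual R300 compiler: the instruction tuples, the `unroll` function and the `num_lights` constant are all hypothetical names invented for this illustration.

```python
# Toy model (hypothetical): a "shader" is a list of instruction tuples.
# ("loop", constant_name, body) repeats `body` a number of times given by
# an input constant that is fixed before the shader starts running.

def unroll(program, constants):
    """Expand every loop into a flat chain of repeated instruction blocks."""
    flat = []
    for instr in program:
        if instr[0] == "loop":
            _, count_name, body = instr
            # The trip count depends only on an input constant, never on
            # intermediate results, so it can be expanded at load time.
            for _ in range(constants[count_name]):
                flat.extend(unroll(body, constants))
        else:
            flat.append(instr)
    return flat


program = [
    ("mov", "r0", "v0"),
    ("loop", "num_lights", [("add", "r0", "light")]),
    ("out", "r0"),
]

# With num_lights = 3 the loop body is simply copied three times.
linear = unroll(program, {"num_lights": 3})
```

After unrolling, `linear` contains no loop instruction at all, which is the sense in which a statically branching shader is still executed linearly.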
GeForce FX, by contrast, offers “real” dynamic branching: the variables that determine the branch condition can change during shader execution. This means that at any moment you can use either the input data or the data obtained during shader execution to select which part of the code or which subroutine to run.
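As a rough illustration of what “dynamic” means here, the sketch below uses ordinary Python as a stand-in for shader code. The function `run_dynamic` and its parameters are hypothetical; the only point is that the loop condition tests a value computed inside the loop, so the number of iterations cannot be known before execution starts.

```python
def run_dynamic(x, limit=100.0, max_steps=64):
    """Double x until it exceeds `limit`; the trip count depends on x itself."""
    steps = 0
    # The condition is re-evaluated on an intermediate result every pass,
    # so this loop cannot be unrolled in advance from the inputs' layout alone.
    while x <= limit and steps < max_steps:
        x *= 2.0
        steps += 1
    return x, steps


a = run_dynamic(1.0)   # needs several passes to get above the limit
b = run_dynamic(60.0)  # gets above the limit after a single pass
```

Two inputs fed to the same code take a different number of passes, which is exactly what a statically branching processor cannot express.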
These dynamic jumps make vertex shaders a really powerful and handy tool. For example, there is no longer any need to use different vertex shaders for different situations and waste time reloading them into the vertex processor: you can write one big vertex shader with a branch for each situation. Or, when several vertex shaders share an identical code sequence, you can merge them into one and describe this common sequence as a subroutine.
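The idea of one merged shader can be sketched as follows, again in plain Python rather than real shader code. Everything here is an assumption made for illustration: the `mode` values, the vertex dictionary layout and the function names `transform` and `merged_vertex_shader` do not come from any NVIDIA or ATI API.

```python
def transform(position, offset):
    """Shared subroutine: the code sequence common to every variant."""
    return tuple(p + o for p, o in zip(position, offset))


def merged_vertex_shader(vertex, mode, offset=(0.0, 0.0, 0.0)):
    """One shader with a branch per situation instead of several shaders."""
    pos = transform(vertex["position"], offset)  # common part, written once
    if mode == "plain":
        color = vertex["color"]
    elif mode == "lit":
        # Scale the vertex colour by a per-vertex light intensity.
        color = tuple(c * vertex["light"] for c in vertex["color"])
    else:
        raise ValueError(f"unknown mode: {mode}")
    return pos, color


v = {"position": (1.0, 2.0, 3.0), "color": (1.0, 0.5, 0.0), "light": 0.5}
plain = merged_vertex_shader(v, "plain")
lit = merged_vertex_shader(v, "lit", offset=(1.0, 0.0, 0.0))
```

Switching between the two situations means passing a different `mode` rather than loading a different program, and the position code lives in a single place.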
So, we can see that NVIDIA GeForce FX goes far beyond present-day standards in flexibility and programmability. But that alone is not enough for the chip to enjoy a long and happy life. Above all, it must be fast. Blazingly fast!
Judging by the specifications and numbers alone, we can expect NVIDIA GeForce FX 5800 Ultra to have an advantage in computation (a 500MHz graphics chip clock rate is no small thing!) and a certain disadvantage in modes that put a heavy load on the graphics memory bus.
Still, we know quite well how much the overall performance of a graphics card can be affected by how efficiently it uses the available memory bandwidth, by the different anisotropic filtering methods implemented by ATI and NVIDIA, or by driver optimization. That’s why we won’t speculate here about the possible strengths and weaknesses of the graphics solutions in question, but will turn to actual benchmark results instead.
But first of all, let’s have a look at the new graphics card. Believe us, it deserves a separate section in our review! :)



