NVIDIA GeForce FX 5800 Ultra Review: New Technologies and Performance

NVIDIA GeForceFX aka NV30 has been argued about for a long time now. Is the new architecture well-polished or still raw? Is the performance as high as everybody expected? How efficient is the cooling solution? You will find the answers to all these questions, and more, in our new article!

by Tim Tscheblockov
03/27/2003 | 10:47 PM

By the end of the winter we saw the completion of the most impressive and promising project in the history of graphics card production. NVIDIA’s new creation, the GeForce FX GPU (formerly known as NV30), came into this world.


While the new chip was being concocted deep in the innards of NVIDIA, the outside world seized on every rumor or supposition about it. All but the lazy were savoring any piece of news about NV30 from NVIDIA and R300 from ATI. And everyone agreed NV30 would be more functional, more flexible and faster than its rival.

But it seems NVIDIA had set itself too ambitious a goal to reach within the standard half-year development cycle (or a full year for brand-new architectures). The simultaneous transition to an entirely new architecture and to the more advanced 0.13 micron manufacturing process called for much more effort than had been expected.

While NVIDIA was sorting out its problems, its “good old enemy” ATI released the RADEON 9700 / 9700 PRO chips. Using the mature and proven 0.15 micron process and some of the innovations first implemented back in RADEON 9000 (RV250), ATI got the opportunity to tout the first DirectX 9-compatible graphics chip. We all know what has happened since then: ATI didn’t expect its “trial balloon” to be such a success and hurriedly issued a full series of chips based on the R300 architecture. Moreover, in order to save effort and time, the company didn’t develop any “value” versions of R300, but built the whole series of products on one and the same chip. To reduce the performance and price of the value solutions, it simply disabled some functional units and narrowed the memory bus. This approach proved to be strategically correct and brought surprising results: facing no worthy competition from NVIDIA, graphics cards based on ATI’s new chip grew more and more popular, and the company now faces a severe chip shortage.

And what about NVIDIA? If we take on trust earlier rumors about NV30 and compare them with the final specs, we might suppose that there have been too many problems encountered on the way to the silicon implementation of the concept and the NV30 chip has been changed and simplified a number of times. So the outcome probably includes only a small part of the intended potential. Nonetheless, NVIDIA’s competitors shouldn’t delude themselves: NVIDIA just couldn’t roll out a GPU that would be slower than RADEON 9700 PRO. Otherwise, the company wouldn’t release it at all!

So, what is this monster we have all been waiting for? What is GeForce FX?


GeForce FX GPU and CineFX Architecture

GeForce FX is the first GPU from NVIDIA to comply with the Microsoft DirectX 9 specifications. NVIDIA called the launch of GeForce FX “the dawn of cinematic computing”, and named the NV30 architecture CineFX. The company implies that GeForce FX lets PC users enjoy in real time the special effects we usually see in movies. Let’s find out what is hidden behind the CineFX name that allows NVIDIA to make such bold statements.

NV30 architecture differs from the “classical” graphics chips architecture, so a number of items in the GeForce FX specifications list require our detailed comments. Below are the main characteristics of GeForce FX (NV30) compared with those of NVIDIA GeForce4 Ti4800 (NV28) and ATI RADEON 9700 PRO (R300):

Parameter | NVIDIA GeForce4 Ti 4800 (NV28) | ATI RADEON 9700 PRO (R300) | NVIDIA GeForce FX (NV30)

Manufacturing technology | 0.15 micron | 0.15 micron | 0.13 micron
Number of transistors | 67 million | 110 million | 125 million
Chip frequency | 300MHz | 325MHz | 500MHz
Graphics memory controller | 128-bit DDR SDRAM | 256-bit DDR SDRAM | 128-bit DDR II SDRAM
Graphics memory frequency | 650MHz (325MHz DDR) | 620MHz (310MHz DDR) | 1000MHz (500MHz DDR)
Peak memory bus bandwidth | 9.9GB/s | 18.9GB/s | 15.2GB/s
Max graphics memory size | 128MB | 256MB | 256MB
AGP interface | AGP 3.0 4x/8x | AGP 3.0 4x/8x | AGP 3.0 4x/8x

Pixel pipelines, pixel shaders

Pixel pipelines | 4 | 8 | 8 [1]
Texturing units per pipeline | 2 | 1 | 1 [1]
Max number of textures during multi-texturing | 4 | 8 | 8
Texture filtering types | bi-linear, anisotropic, tri-linear, tri-linear + anisotropic | bi-linear, anisotropic, tri-linear, tri-linear + anisotropic | bi-linear, anisotropic, tri-linear, tri-linear + anisotropic
Max anisotropy level | 8 | 16 | 8
Pixel shaders version | v.1.3 | v.2.0 | v.2.0+
Branching, subroutines and loops | none | none | none
Max number of textures per shader | 4 | 16 | 16
Max number of texture instructions | 4 | 32 | 1024
Max number of arithmetic instructions | 8 | 64 (+64) [2] | 1024
Max number of instructions per shader | 12 | 96 (+64) [2] | 1024 [2]
Registers | 2 color registers, 8 constant registers, 4 texture registers, 2 temporary registers | 2 color registers, 32 constant registers, 8 texture coordinate registers, 16 TMU identification registers, 12 temporary registers; 4 resulting color registers, 1 resulting Z register | 2 color registers, 512 (1024) constant registers [3], 8 texture coordinate registers, 16 TMU identification registers, 16 (32) temporary registers [3]; 4 resulting color registers, 1 resulting Z register
Data representation formats [4] | Fixed point | Fixed point, 16-bit floating point, 32-bit floating point | Fixed point, 16-bit floating point, 32-bit floating point

Vertex pipelines, vertex shaders

Vertex pipelines | 2 | 4 | 3
Vertex shaders version | v.1.1 | v.2.0 | v.2.0+
Branching, subroutines and loops | none | static | dynamic
Max number of instructions per shader | 128 | 256 | 256
Max number of instructions with loops extension | 128 | 65536 | 65536
Registers | 16 input registers, 12 temporary registers, 96 constant floating-point registers, 1 address register; 8 output registers for texture coordinates, 1 fog color output register, 1 vertex position output register, 1 point size output register, 2 output registers for diffuse/specular color components | 16 input registers, 12 temporary registers, 256 constant floating-point registers, 16 constant integer registers, 16 Boolean registers, 1 address register, 1 loop counter register; 8 output registers for texture coordinates, 1 fog color output register, 1 vertex position output register, 1 point size output register, 2 output registers for diffuse/specular color components | 16 input registers, 16 temporary registers, 256 constant floating-point registers, 256 constant integer registers, 256 Boolean registers, 1 address register, 1 loop counter register; 8 output registers for texture coordinates, 1 fog color output register, 1 vertex position output register, 1 point size output register, 2 output registers for diffuse/specular color components
Data representation formats | 32-bit floating point | 32-bit floating point | 32-bit floating point

Full-screen anti-aliasing

FSAA methods | Supersampling, ordered grid multi-sampling (OGSS, OGMS) | Rotated grid multi-sampling (RGMS) | Supersampling, ordered grid multi-sampling (OGSS, OGMS)
Number of samples | 2 (OGSS, OGMS), Quincunx, 4 (OGSS, OGMS, OGSS+OGMS, only in Direct3D) | 2, 4, 6 | 2 (OGSS, OGMS), Quincunx, 4 (OGSS, OGMS, OGSS+OGMS, only in Direct3D), 6 (OGSS+OGMS, only in Direct3D), 8 (OGSS+OGMS, only in Direct3D)

Technologies helping to use the graphics memory bandwidth more efficiently

Hidden Surface Removal (HSR) [5] | yes | yes | yes
Frame-buffer compression [6] | none | yes | yes
Z-buffer compression [7] | yes | yes | yes

Notes to the table above:

[1]: The first and foremost feature of the GeForce FX (NV30) architecture is the lack of “pixel pipelines” in their classic form. For example, the eight “classical” pixel pipelines of ATI RADEON 9700 PRO are eight independent functional units working in parallel. Each of them includes a pixel processor, a texturing unit, an input/output unit and so on…

The NV30 architecture doesn’t split the computational resources of the GPU into pixel pipelines. Instead, NV30 has a set of arithmetic logic units (ALUs), a set of texturing units, a set of control and interface elements and so on. In theory, they can organize themselves into any combination depending on the current task. Such a combination can be roughly called a pixel “pipeline”. As GeForce FX has eight texturing units, we suppose the chip can use combinations like 8x1 (eight pixel “pipelines” with one texturing unit in each), 4x2 (four “pipelines” with two texturing units in each) and so on…

From the performance point of view, this extra flexibility may prove either an advantage of GeForce FX or a drawback, depending on the given task.

We are going to check out how GeForce FX organizes its architecture and behaves under different conditions in the benchmarking section of the review. For now, we will only say that in reality things are not as smooth as they seem.

[2]: The maximum number of instructions in a shader is characterized by the “instruction slots” parameter. One instruction may take up several “slots”.

R300 supports parallel execution of scalar and vector arithmetic instructions, so the maximum number of instruction slots in a pixel shader equals 160.

GeForce FX features universal instruction slots and the maximum shader length doesn’t equal the sum of texture and arithmetic instructions slots.
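The figure of 160 in note [2] is simply the sum of the per-type limits listed in the table. A quick check in Python follows; reading the “(+64)” as scalar instructions that run alongside the 64 vector ones is our interpretation of the note, and the constant names are made up for the illustration:

```python
# R300 pixel shader limits as listed in the table above.
R300_TEXTURE_SLOTS = 32
R300_VECTOR_ALU_SLOTS = 64
R300_SCALAR_ALU_SLOTS = 64  # the "(+64)" entry; pairing it with vector ops is our reading


def r300_max_slots() -> int:
    """Total slots when scalar instructions are issued alongside vector ones."""
    return R300_TEXTURE_SLOTS + R300_VECTOR_ALU_SLOTS + R300_SCALAR_ALU_SLOTS


# GeForce FX: a single pool of universal slots, so the limit is not a sum.
NV30_UNIVERSAL_SLOTS = 1024

print(r300_max_slots())      # 160
print(NV30_UNIVERSAL_SLOTS)  # 1024
```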

[3]: Constant and temporary registers of pixel shaders in GeForce FX may store either 16-bit or 32-bit four-component values. When they store 16-bit numbers, which take up half the space, the number of available registers doubles.
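The doubling in note [3] can be checked with a small sketch. It assumes a register file of fixed size with four components per register, which is our simplification rather than a documented detail of the chip; the function name is hypothetical:

```python
def registers_available(count_at_32bit: int, bits_per_component: int) -> int:
    """Registers that fit in the same storage when the component width changes.

    Assumes a fixed-size register file and four components per register.
    """
    if bits_per_component not in (16, 32):
        raise ValueError("only 16-bit and 32-bit components are considered here")
    storage_bits = count_at_32bit * 4 * 32
    return storage_bits // (4 * bits_per_component)


for name, count in (("constant", 512), ("temporary", 16)):
    print(name, registers_available(count, 32), registers_available(count, 16))
```

With the table’s 32-bit counts as input, the 16-bit counts come out as 1024 and 32, matching the values given in parentheses there.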

[4]: NVIDIA GeForce FX supports two floating-point data formats: 16 bits per component and 32 bits per component. GeForce FX performs 32-bit floating-point calculations at half the speed of 16-bit ones: its 16-bit ALUs have to work in pairs for 32-bit calculations.

ATI RADEON 9700 PRO supports both 16-bit and 32-bit data precision, but performs all floating-point calculations with 24-bit precision. The result can be then translated into the 16-bit format, or expanded to the 32-bit one.

[5]: Hidden Surface Removal (HSR) is a method that makes more efficient use of the available graphics memory bus bandwidth. Before drawing a pixel, GeForce4 and GeForce FX GPUs check its Z value (its distance) and compare it with the value stored in the Z-buffer. If the pixel’s Z is bigger, the pixel is hidden and there is no need to draw it. Thus, the number of frame-buffer reads and writes is reduced, as well as the amount of texture data transferred.

When full-screen anti-aliasing is on, GeForce FX performs the same operations on subpixel blocks, for example with 2x2 blocks for 4x FSAA. It implies that not single pixels, but blocks of several subpixels are compared and then either accepted for drawing or “dismissed”.

ATI RADEON 9700 PRO uses a more sophisticated HSR algorithm, which is based on the concept of the tile Z-buffer. It’s the so-called HyperZ III technology. Theoretically, HyperZ III is more effective as it “dismisses” whole groups of unnecessary pixels. You can read more about HyperZ III in our ATI RADEON 9700 PRO Review.
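The per-pixel check that note [5] describes for the GeForce chips can be sketched in a few lines of Python. This is only an illustration of the idea, not the hardware algorithm; the function name, the fragment format and the sample data are our own:

```python
def draw_with_z_test(fragments, width, height, far=1.0):
    """Process (x, y, z, color) fragments, testing depth before anything else.

    Returns the color buffer and the number of fragments rejected before
    any texturing or color write would have taken place.
    """
    z_buffer = [[far] * width for _ in range(height)]
    color_buffer = [[None] * width for _ in range(height)]
    rejected = 0
    for x, y, z, color in fragments:
        if z >= z_buffer[y][x]:
            rejected += 1          # hidden: skip texturing and the color write
            continue
        z_buffer[y][x] = z
        color_buffer[y][x] = color
    return color_buffer, rejected


fragments = [(0, 0, 0.3, "near wall"), (0, 0, 0.8, "far wall"), (1, 0, 0.5, "floor")]
colors, rejected = draw_with_z_test(fragments, width=2, height=1)
print(colors[0], rejected)
```

Here the “far wall” fragment is dropped because a nearer fragment already occupies the same pixel, so neither its texture data nor its color ever travels over the memory bus.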

[6]: Frame-buffer compression is used for full-screen anti-aliasing with the method of multisampling (multisampling and supersampling are described in our article called On the Way to Ideal Picture). Using the fact that during multisampling all subpixels are of the same color for the pixel that doesn’t lie at the edge of a polygon, NVIDIA GeForce FX and ATI RADEON 9700 PRO graphics chips write only one color value into the frame-buffer. Unfortunately, this technique cannot be applied to pixels that lie on the edges, as their subpixels may have different colors. Anyway, the number of such pixels in a scene is usually small, so we can reduce the amount of transferred data quite considerably. For example, at 4x multisampling, the data actually sent to the frame-buffer may get down to one fourth of the initial amount.
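A rough sketch of the saving described in note [6], assuming the simplest possible scheme (one stored value for a uniform pixel, all values for an edge pixel). The real hardware formats are not public, so the function and the sample scanline are purely illustrative:

```python
def compressed_sample_count(pixels):
    """Count color values written when uniform pixels store a single color.

    `pixels` is a list of pixels, each given as a list of subpixel colors.
    """
    written = 0
    for samples in pixels:
        if all(s == samples[0] for s in samples):
            written += 1             # interior pixel: one value stands for all
        else:
            written += len(samples)  # edge pixel: every subpixel is kept
    return written


# Hypothetical 4x multisampled scanline: nine interior pixels and one edge pixel.
interior = ["red"] * 4
edge = ["red", "red", "blue", "blue"]
scanline = [interior] * 9 + [edge]
raw = sum(len(p) for p in scanline)
print(compressed_sample_count(scanline), "of", raw)
```

For a scanline with no edges at all, the count drops to exactly one fourth of the raw amount, which is the best case mentioned above.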

[7]: Z-buffer compression follows the same principle as frame-buffer compression. ATI RADEON 9700 PRO uses a somewhat more sophisticated algorithm for working with the Z-buffer, resulting in theoretically higher efficiency.

A little later we will check the effect of these technologies, which are intended to make more efficient use of the available graphics memory bus bandwidth.

There is no doubt that the specifications of GeForce FX exceed the basic hardware requirements set by Microsoft DirectX 9. ATI RADEON 9700 PRO, by contrast, meets the version 2.0 pixel and vertex shader specs very precisely.

NVIDIA emphasizes the enhanced flexibility and programmability of GeForce FX by marking the versions of pixel and vertex shaders as “2.0+”. This is the new CineFX architecture, the main highlight of the new GPU from NVIDIA.

Let’s see where GeForce FX goes beyond the basic DirectX 9 specifications.


Pixel Shaders

In its pixel shader related part, GeForce FX exceeds the base specs only quantitatively. But to what extent!

First of all, the maximum length of a pixel shader is now 1024 instructions instead of 96! Secondly, when executing a pixel shader, the pixel processor can use not 32, but up to 1024 pre-set constants and also more temporary registers (32 against 12).

It’s clear that GeForce FX can run much more complex pixel shaders than RADEON 9700 PRO. Complex procedural textures imitating heterogeneous materials like wood or marble or representing natural surfaces like human skin, physically accurate rendering of various optical effects and complex reflections and refractions of light: these are just a few of the applications of the “advanced” pixel shaders available in GeForce FX.

Of course, it will be a long time before we see such beauties in real computer games, but we can already watch the winged elf-girl from the famous demo:

 

Vertex Shaders

As far as vertex shaders are concerned, GeForce FX offers qualitative as well as quantitative innovations. The quantitative one is the increased number of registers, which allows programming more complex shaders. The qualitative one is dynamic jumps and loops in the shader’s body. ATI RADEON 9700 PRO also allows using subroutines, loops and branching in vertex shaders, but RADEON’s branching is static. It means that intermediate results produced during execution of a shader cannot change its execution sequence, which is determined only by input variables. That is, although we have jumps and loops implying non-linearity, the shader is executed linearly. The R300 hardware is designed to execute shaders this way. For example, the compiler simply unrolls all the loops in the shader when loading it into the vertex processor, so they become a chain of identical instruction blocks.

GeForce FX, on the contrary, offers “real” dynamic jumps: variables that determine the jump condition can change during shader execution. It means that at any moment you can use the input data, or data obtained during shader execution, to choose which part of the code or which subroutine to execute.

These dynamic jumps make vertex shaders a really powerful and handy tool. For example, there is no need now to use different vertex shaders for different situations and lose time on their re-loading into the vertex processor: you can write one big vertex shader with branches for each given situation. Or, when several vertex shaders have an identical code sequence, you can unite them into one and describe this common sequence as a subroutine.
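To make the difference between static and dynamic branching more tangible, here is a Python stand-in for a vertex shader. It is not shader code and does not model either chip; the function names, the fog example and the numbers are hypothetical:

```python
def shade_static(vertices, use_fog: bool):
    """Static branching: the flag is a constant set before the draw call,
    so every vertex in the batch follows the same path."""
    return [v * 0.5 if use_fog else v for v in vertices]


def shade_dynamic(vertices, fog_start: float):
    """Dynamic branching: the condition depends on a value computed inside
    the shader, so the path can differ from vertex to vertex."""
    out = []
    for v in vertices:
        distance = abs(v)          # intermediate result
        if distance > fog_start:   # decided per vertex at run time
            out.append(v * 0.5)
        else:
            out.append(v)
    return out


depths = [1.0, 4.0, 9.0]
print(shade_static(depths, use_fog=True))     # [0.5, 2.0, 4.5]
print(shade_dynamic(depths, fog_start=3.0))   # [1.0, 2.0, 4.5]
```

In the static case the choice could just as well be made by loading one of two different shaders; in the dynamic case a single shader handles both situations, which is the convenience described above.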

So, we can see that NVIDIA GeForce FX goes far beyond present-day standards in flexibility and programmability. But it’s not enough for the chip to live a long and happy lifecycle. Above all, it must be fast. Ragingly fast!

Judging by the specifications and numbers only, we can suppose NVIDIA GeForce FX 5800 Ultra is going to have an advantage at calculations (500MHz graphics chip clock-rate is no small thing!) and a certain disadvantage in modes that load the graphics memory bus a lot.

Still, we know quite well how much the overall performance of a graphics card can be affected by efficient use of the available memory bandwidth, by the different anisotropic filtering methods implemented by ATI and NVIDIA, or by driver optimization. That’s why we won’t speculate now about the possible weak and strong points of the graphics solutions under consideration, but will turn to actual benchmarking results.

But first of all, let’s have a look at the new graphics card. Believe us, it deserves a separate section in our review! :)


NVIDIA GeForce FX 5800 Ultra Graphics Card

The new GPU from NVIDIA arrived in our test lab on the NVIDIA GeForce FX 5800 Ultra reference card:

 

Looks impressive, doesn’t it? While the PCB is quite homely, the cooling system is a monster made of copper and plastic:

Because of this cooling system, the card takes up two slots in the system case, occupying two brackets at the back of the chassis. One of the brackets carries the VGA, TV-Out and DVI outputs; the other has the air intake and exhaust vents.

The cooling system of the graphics card consists of two parts: the front side features an active cooling solution for the GPU and some of the graphics memory chips. The backside has a heatsink that covers the rest of the graphics memory chips.

The active cooling system of the GPU and memory is based on heat pipes. They take the heat from the core and memory chips and transfer it to the heatsink:

 

The base of the cooling system is equipped with special pads for tight thermal contact with the memory chips and also a layer of some pliable material that serves as the thermal interface for the graphics core:

 

The heatsink that covers the backside of the card also carries such thermal pads. They are about one millimeter high.

 

A centrifugal blower, similar to the one used by ABIT in its OTES cooling solution (see our article called “Overclocker’s Dream: ABIT OTES Review”), pumps the air through the cooling system heatsink:

This fan is the main noise source of the working GeForce FX 5800 Ultra. The card has already earned nicknames like “hair-dryer” and “vacuum-cleaner” for this loud and irritating noise. It is truly the most unpleasant thing about it. The fact that the fan only starts up in 3D applications doesn’t save the day: people buy these cards precisely for 3D, not for work in office applications. By the way, the noise from the working card is nothing compared to what you hear when the fan slows down. You see, the control circuitry doesn’t slow the fan down smoothly by gradually reducing its rotation speed to a halt. It seems that the card doesn’t lower the voltage sent to the fan little by little, but sends recurrent pulses of full voltage, changing their on-off time ratio step by step. As a result, the rotation frequency of the fan clashes with the frequency of the power pulses, beats arise, and the card sets up a rich sound performance that could have woken 3dfx back to life :).

What is all this fuss about? They just wanted to remove the heat from the 125 million transistors of the graphics core:

The frequency of the GeForce FX GPU is 500MHz, which is more than 1.5 times the frequency of the main competitor: ATI RADEON 9700 PRO. Of course, such a high clock rate is the outcome of the finer manufacturing technology. For example, the frequency limit for the 0.15 micron R300 chip lies somewhere around 400MHz. To push the frequency of its chips higher, ATI will have to move to a finer manufacturing process sooner or later. NVIDIA has already made the transition.

It’s a meaningful fact that NVIDIA didn’t use a 256-bit DDR memory controller in GeForce FX. Instead, there is a less complex 128-bit DDR II controller, which allows reaching higher frequencies. This new memory type has been developed to work at high frequencies and we can see it in GeForce FX 5800 Ultra: the graphics memory of the card is clocked at 1000MHz (500MHz DDR). The resulting peak bandwidth of the 128-bit graphics memory bus of GeForce FX 5800 Ultra is 15.2GB/s, which is just a little lower than the 18.9GB/s of the 256-bit bus of RADEON 9700 PRO.
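The relationship between bus width, memory clock and peak bandwidth is simple arithmetic: bytes per transfer multiplied by transfers per second. The sketch below reproduces it for the three cards from the specifications table; the function name is ours:

```python
def peak_bandwidth_bytes(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak bytes per second: bus width in bytes times transfers per second."""
    return (bus_width_bits / 8) * effective_mhz * 1_000_000


cards = {
    "GeForce4 Ti 4800": (128, 650),
    "RADEON 9700 PRO": (256, 620),
    "GeForce FX 5800 Ultra": (128, 1000),
}
for name, (bits, mhz) in cards.items():
    bps = peak_bandwidth_bytes(bits, mhz)
    print(f"{name}: {bps / 1e9:.2f} billion bytes/s")
```

The results are 10.40, 19.84 and 16.00 billion bytes per second. The figures quoted in this review (9.9, 18.9 and 15.2GB/s) are about 5% lower in each case, which suggests they were computed with a different definition of a gigabyte; that is an inference from the numbers, and the ranking of the cards is the same either way.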

But back to the card. It carries eight DDR II 2ns chips from Samsung:

These chips heat up a lot: after a few minutes of work, the heatsink at the backside of the card is too hot to touch. Considering that the backside heatsink is not cooled by any forced airflow, we might say that a chassis with good airflow is one of the requirements for GeForce FX 5800 Ultra to work stably.

NVIDIA GeForce FX 5800 Ultra, just like RADEON 9700 PRO based cards, requires an additional power connection. But unlike them, GeForce FX 5800 Ultra can work even when no additional power is supplied. In this case, the working frequencies of the card are reduced to 250MHz for the chip and 500MHz for the memory.


Drivers

We used driver version 42.68 for our NVIDIA GeForce FX 5800 Ultra tests. After installation, the graphics card settings can be found on a separate page of the Display Properties window.

Most of the driver panels are standard and thus not very interesting. But some of them contain settings peculiar to NVIDIA GeForce FX.

First of all, we can adjust the frequencies of the card:

You can independently set up the GPU and graphics memory frequencies for 2D and 3D modes. By default, they are 300MHz/600MHz for the 2D mode and 500MHz/1000MHz for 3D applications.

Another interesting page contains options to set the card up for 3D applications:

Here you can choose one of the three modes setting the “performance-to-quality” ratio: Application, Balanced and Aggressive. We will discuss each mode in detail later in this article. Here you can also force full-screen anti-aliasing (2x, Quincunx, 4x and modes available in Direct3D only: 4xS, 6xS and 8xS) and anisotropic filtering (2x, 4x, 8x).

Testbed and Methods

We used the following testbed for our test session:

The software set included:

We used the default driver settings for synthetic benchmarks with one exception: we turned off vertical synchronization (VSync).

NVIDIA GeForce FX 5800 Ultra was tested in two modes: at its regular working frequencies (500MHz/1000MHz) and at 325MHz/620MHz. The latter are regular clock-rates of ATI RADEON 9700 PRO, thus we will be able to compare the efficiency of NV30 functional units with those of R300.

Well, let’s get started.


Fill Rate and Multi-Texturing

The Fill Rate test from 3DMark 2001 SE opens the show. We ran the test in two modes, standard and with full-screen anti-aliasing, in order to check the compression of the frame- and Z-buffers.

The results indicate that frame-buffer compression is available in both NVIDIA GeForce FX 5800 Ultra and ATI RADEON 9700 PRO. These two cards don’t slow down as much with full-screen anti-aliasing enabled as NVIDIA GeForce4 Ti4800 does. For example, 2x anti-aliasing hardly reduces the performance of RADEON 9700 PRO and GeForce FX 5800 Ultra at all.

Surprisingly, NVIDIA GeForce FX loses to R300 in single-texturing even when it works at its regular clock rates. However, it copes much better with multi-texturing, outperforming the rival. This may mean either that NVIDIA GeForce FX has a very poor memory controller (which is hard to believe) or that GeForce FX uses a configuration of four pixel “pipelines” with two texturing units each in this test.

We can double-check our supposition in the following test. This small program draws a polygon that covers the whole screen, applying from zero (no texture, the pixel color is calculated as an interpolation of the colors of the polygon vertices) to four textures of 512x512 size. You can turn color and Z writes on or off. So, here are the results obtained in the normal mode, with color and Z writes enabled:

ATI RADEON 9700 PRO behaves quite expectedly. It has only one texturing unit per pixel pipeline and has to spend an extra cycle to render every extra texture.

NVIDIA GeForce FX 5800 Ultra behaves strikingly similarly to GeForce4 Ti4800. The transitions from one texture to two, and from three textures to four, result in a smaller performance drop than the transition from two to three textures. Add to this that even without any textures NVIDIA GeForce FX 5800 Ultra performs at half of its theoretical maximum (1914.9Mpixel/s against the theoretical 4000Mpixel/s), and we can state that under normal conditions, with color and Z writes enabled, GeForce FX follows a scheme corresponding to four “classic” pipelines with two texturing units in each. That is, with multi-texturing the chip can process only four pixels at a time, but applies two textures per clock.
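The reasoning behind the four-pipeline conclusion is easy to verify. Assuming one pixel per pipeline per clock (the usual definition of peak fill rate), a short Python check compares the measured no-texture result with both candidate configurations; the function name is ours:

```python
def theoretical_fill_rate(core_mhz: float, pixels_per_clock: int) -> float:
    """Peak fill rate in Mpixel/s: one pixel per pipeline per clock."""
    return core_mhz * pixels_per_clock


measured = 1914.9  # Mpixel/s without textures, as quoted above
for pipelines in (8, 4):
    peak = theoretical_fill_rate(500, pipelines)
    print(f"{pipelines} pipelines: peak {peak:.0f} Mpixel/s, measured is {measured / peak:.1%} of it")
```

The measured figure is about 48% of the eight-pipeline peak but nearly 96% of the four-pipeline peak, which is why the four-pipeline reading fits so well.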

Now, let’s try to disable color writes:

We’ve got it! With no texturing and no writes to the frame-buffer, GeForce FX reaches its maximum theoretical level. It means GeForce FX uses an eight-pipeline scheme here, but texturing is unavailable in it. Such a curious work mode may bring GeForce FX some performance boost in upcoming PC games, for example in Doom 3. In this game, the rendering of lighting and shadows in each frame is preceded by one or several passes to initialize the Z-buffer and stencil-buffer. GeForce FX would perform the preliminary passes twice as fast as the final one, calculating eight pixels (Z values, to be more exact) per clock. At the same time, this optimization won’t give GeForce FX any advantage in the majority of existing games.


Version 1.1 and 1.4 Pixel Shaders

The same 3DMark 2001 SE helps us to measure the speed of DirectX 8.1 pixel shaders execution. As before, GeForce FX 5800 Ultra is benchmarked at regular and reduced working frequencies.

When working at lower frequencies, GeForce FX executes this simple pixel shader a little slower than RADEON 9700 PRO, but the gap between them is getting smaller in higher resolutions.

At its nominal frequencies, the new card from NVIDIA quite naturally outperforms its rival.

This more complex pixel shader brings a disappointing result: NVIDIA GeForce FX 5800 Ultra loses to ATI RADEON 9700 PRO even working at its regular frequencies. And when working at 325MHz/620MHz, GeForce FX even fell behind GeForce4, which doesn’t have ver.1.4 pixel shaders support and has to render the scene in two passes.

Why is it so? This may be the result of “cutting down” the computational power of the chip: it just doesn’t have enough arithmetic processors. Or it may be the price of its great flexibility. Both causes are quite plausible.

Version 2.0 Pixel Shaders

For checking the execution speed of ver.2.0 pixel shaders we took a test from the 3DMark03 suite. This test draws a scene with statuettes of an elephant and rhinoceros on a pedestal. The complex materials the statuettes and pedestal are made of are not textured, but are calculated in real time by means of 2.0 pixel shaders. The shaders use a lot of floating-point calculations.

Most depressing results! NVIDIA GeForce FX 5800 Ultra loses to ATI RADEON 9700 PRO, turning out several times slower. This once again proves that NV30 executes advanced, calculation-heavy pixel shaders less efficiently than R300.


T&L and Vertex Shaders

Let’s see how well the vertex pipelines of NVIDIA’s new chip work. First, we will look at how GeForce FX performs the functions of the classical fixed-function T&L of DirectX 7:

With its three vertex pipelines, NVIDIA GeForce FX runs faster than four-pipelined RADEON 9700 PRO even at the reduced frequencies. The T&L functions are emulated in these chips by means of vertex shaders and it seems as if NV30 has some “shortcuts” or “rudiments” of the T&L unit, which account for its higher performance. Or it is just more efficient at compiling T&L commands into vertex shaders commands, providing the maximum speed of T&L commands execution.

GeForce FX loses ground as soon as we turn to vertex shaders. It has fewer vertex processors than R300 (three against four); moreover, their efficiency turns out to be lower than that of the vertex units in RADEON 9700 PRO.

The picture remains the same with ver.2.0 vertex shaders: NVIDIA GeForce FX working at nominal frequencies is a little behind ATI RADEON 9700 PRO. The lag of GeForce FX is not a nice thing, but at least there are no surprises like those we saw in the pixel shader test.

The last synthetic test for today is Ragtroll from 3DMark03:

This test uses 1.1 vertex shaders to transform the models of falling trolls. It also features software calculation of collision physics. Thus, the engine of the test distributes the workload between the GPU and CPU of the system.

The results suggest that the test is limited by graphics card performance. ATI RADEON 9700 PRO has higher-performing vertex shader units and surpasses GeForce FX 5800 Ultra in about the same proportion as in the vertex shader tests.


“Application”, “Balanced” and “Aggressive” Modes

We have already mentioned earlier in this review that the driver offers three work modes for NVIDIA GeForce FX 5800 Ultra, differing in performance-to-quality ratio: “Application”, “Balanced”, “Aggressive”.

Let’s make GeForce FX draw a picture in different modes. To begin with, we took a small test program drawing a pyramid with the base “lying” on the screen and the apex far away. The pyramid has many sides and looks like a cone. A “tessellated” texture is laid over the sides of the pyramid; MIP-levels are “highlighted”; tri-linear filtering is on:

Now, a fragment of this scene as rendered by GeForce FX in different modes:

Application

Balanced

Aggressive

Take a look at the color transitions between MIP-levels. In the “Application” mode they become “blurred” thanks to tri-linear filtering: with this kind of filtering, a pixel color is created as a combination of two values taken from two neighboring MIP-levels and summed with weights that depend on the distance to the pixel. That is, tri-linear filtering should result in smooth and continuous half-tone transitions in this test scene.

But we only see this smoothness in the “Application” mode. In “Balanced” and “Aggressive” modes, the smooth transitions get broken into pieces: some of the pieces have tone transitions and some are colored in pure tones.

This can mean only one thing: GeForce FX uses a combination of bi-linear and tri-linear filtering in “Balanced” and “Aggressive” modes instead of true tri-linear filtering. You can see with the naked eye that tri-linear filtering is performed on less than half of the image area in the “Balanced” mode, and on less than a quarter in the “Aggressive” mode.
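The difference between true tri-linear filtering and such a mixture can be expressed through the blend weight between two neighboring MIP-levels. The sketch below is only a model of what the screenshots suggest: we do not know the function the driver actually uses, and `reduced_weight` with its `band` parameter is entirely hypothetical:

```python
import math


def trilinear_weight(lod: float) -> float:
    """Blend weight of the coarser MIP-level under true tri-linear filtering."""
    return lod - math.floor(lod)


def reduced_weight(lod: float, band: float) -> float:
    """Hypothetical mixed mode: pure bi-linear except inside a blend band
    of width `band` (0 < band <= 1) centered on the MIP-level transition."""
    frac = lod - math.floor(lod)
    start = 0.5 - band / 2
    t = (frac - start) / band
    return min(1.0, max(0.0, t))


for lod in (2.0, 2.25, 2.5, 2.75):
    print(lod, trilinear_weight(lod), reduced_weight(lod, band=0.25))
```

With `band=1.0` the second function reduces to true tri-linear filtering; with a narrow band most of the surface gets a weight of exactly 0 or 1, that is, plain bi-linear filtering from a single MIP-level, and only a thin strip around each transition is blended.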

Now, let’s add anisotropic filtering of the maximum 8x level to the enabled tri-linear filtering:

Application

Balanced

Aggressive

It is clearly visible that, besides the mixture of bi- and tri-linear filtering, the GPU shifted MIP-levels closer to the observer in “Balanced” and “Aggressive” modes. In other words, it lowered the texture level of detail (LOD).
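Shifting MIP-levels closer to the observer is equivalent to adding a positive bias to the level-of-detail value before a MIP-level is picked. A minimal sketch of the textbook selection rule follows; the bias values are made up for the illustration and are not the ones the driver applies:

```python
import math


def mip_level(texels_per_pixel: float, lod_bias: float = 0.0, max_level: int = 9) -> int:
    """Pick the nearest MIP-level for bi-linear filtering.

    Level 0 is the full-size texture; a 512x512 texture has levels 0..9.
    A positive bias selects smaller, blurrier levels sooner.
    """
    lod = math.log2(max(texels_per_pixel, 1.0)) + lod_bias
    nearest = int(math.floor(lod + 0.5))
    return min(max_level, max(0, nearest))


for bias in (0.0, 1.0, 2.0):
    print(bias, [mip_level(t, bias) for t in (1, 2, 4, 8)])
```

Each step of bias moves every transition one MIP-level toward the viewer, so less texture data is fetched at the cost of a blurrier image.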

Let’s take a real gaming scene from Serious Sam: The Second Encounter. The game ran in OpenGL, with 32-bit color depth and “Quality” graphics quality settings. We used the “GFX: Extreme Quality” add-on. To make the transitions between MIP-levels visible, we enabled their “coloring” and used only bi-linear filtering:

Here is a fragment of the screenshot taken in different modes:

Application

Balanced

Aggressive

Judging by the evident shift of MIP-levels, we conclude that the modes use different texture levels of detail.

But this is not all, yet. Other fragments of the same scene…:

Application

Balanced

Aggressive

…indicate that GeForce FX uses forced texture compression in the “Aggressive” mode. You can see the artifacts of this compression in the sky texture and the heart icon. Texture compression is most annoying when used on “transparent” textures. For example, here is a screenshot of the same scene taken in “Application” (left) and “Aggressive” (right) modes:

Application

Aggressive

So, forced texture compression and reduced level of detail are well-known methods of reducing the amount of texture data requested by the GPU. This lowers graphics memory bus workload and increases performance at the expense of worse image quality.
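Some rough arithmetic shows why these two methods cut the texture traffic so much. The numbers below are assumptions for illustration only: a 256x256 texture, 4 bytes per texel for uncompressed 32-bit color, and a DXT1-style format at 4 bits per texel (we do not know which compression format the driver actually forces).

```python
def mip_level_bytes(width, height, bytes_per_texel):
    """Size of a single MIP-level in bytes."""
    return int(width * height * bytes_per_texel)

# Hypothetical 256x256 texture in uncompressed 32-bit color.
base = mip_level_bytes(256, 256, 4.0)

# LOD shifted by one MIP-level: half the width and half the height.
one_level_down = mip_level_bytes(128, 128, 4.0)

# Assumed DXT1-style compression: 4 bits (0.5 bytes) per texel.
compressed = mip_level_bytes(256, 256, 0.5)

print(base, one_level_down, compressed)
```

Under these assumptions, shifting the LOD by one level quarters the data for a given surface, and the compression reduces it eightfold.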

But what does GeForce FX get by substituting “true” tri-linear filtering with its “quasi” variant in “Balanced” and “Aggressive” modes? Only the reduction of requested texturing data? Or is this somehow connected with the new “fast” anisotropic filtering from NVIDIA? Let’s figure it out.


Anisotropic Filtering

As you know, graphics chips from NVIDIA have been using the same anisotropic filtering algorithm since GeForce3 (NV20) (see our Leadtek WinFast GeForce3 TD Review). Treating the projection of a pixel onto the polygon as an ellipse rather than a circle, NVIDIA GeForce3 and GeForce4 (NV20 and NV25/28) GPUs took not one but several bi-linearly filtered points distributed evenly along the major axis of the ellipse and found the resulting color by averaging their colors. Depending on the anisotropy level, the number of bi-linearly filtered points could vary from 1 to 8 (1, 2, 4, 6, 8). The sketch below represents the variant with eight points:

The texturing units of NV20 and NV25 can take four samples at a time and perform bi-linear filtering on them at two neighboring MIP-levels simultaneously (eight samples and two bi-linear filtering operations in total). In other words, they perform tri-linear filtering in a single clock.

When it comes to anisotropic filtering, GeForce3 / GeForce4 require one clock to process each reference-point. So, the highest level of anisotropy may take eight clocks per pixel. The picture below illustrates such a situation. Every reference-point represents the result of tri-linear filtering on MIP-levels N and N+1. For your convenience, we put GPU clock marks next to the reference-points:

It’s a known fact that the texturing units of GeForce3 / GeForce4 work at half of their full capacity when tri-linear filtering is not used: they just do one bi-linear filtering operation per clock.

But what if we “teach” the texturing units to distribute their computational resources so that they do not idle when tri-linear filtering is off?

In this case, every texturing unit, able to take eight texture samples and perform two bi-linear filtering operations per clock, could process two reference-points at a time when anisotropic filtering is enabled. The next picture shows this situation: we use one MIP-level; GPU clock-cycles are written next to the marked points:

When the GPU follows this scheme, it performs anisotropic filtering twice as fast as the combination of anisotropic and tri-linear filtering. If GeForce FX does use this particular work scheme, then 2x anisotropic filtering without tri-linear filtering or with tri-linear filtering in the “Aggressive” mode should be “cost-free” for the GPU from the performance point of view.
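The clock counts implied by this hypothesis are easy to tabulate. The sketch below models only our supposition, not a confirmed NVIDIA design: it assumes a texturing unit performs two bi-linear operations per clock, that a tri-linear reference-point costs two such operations and a bi-linear one costs one, and that Nx anisotropy uses N reference-points. The name `clocks_per_pixel` is hypothetical.

```python
import math

# Per texturing unit, as described above for NV20 / NV25.
BILINEAR_OPS_PER_CLOCK = 2

def clocks_per_pixel(reference_points, trilinear):
    """Clocks one texturing unit needs to filter one pixel (hypothetical model)."""
    ops_per_point = 2 if trilinear else 1   # tri-linear reads two MIP-levels
    total_ops = reference_points * ops_per_point
    return math.ceil(total_ops / BILINEAR_OPS_PER_CLOCK)

print("points  with tri-linear  without tri-linear")
for points in (1, 2, 4, 6, 8):
    print(points, clocks_per_pixel(points, True), clocks_per_pixel(points, False))
```

In this model two reference-points without tri-linear filtering cost one clock, the same as ordinary tri-linear filtering with no anisotropy, which is where the expectation of “cost-free” 2x anisotropic filtering comes from.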

Of course, all our speculations about anisotropic filtering are no more than suppositions. NVIDIA, naturally, doesn’t disclose the details of its “fast” anisotropic filtering algorithm. Nevertheless, these suppositions fit quite well with what we see in “Balanced” and “Aggressive” modes of GeForce FX. In these modes, the number of “true” tri-linearly filtered pixels is small. That is, replacing “true” tri-linear filtering with a partial one not only reduces the amount of texture data requested by the GPU, but also reduces the computational cost of anisotropic filtering. The performance drop we witness when anisotropic filtering is enabled also confirms our “speculations”.

But let’s get back to the image quality. Another scene from Serious Sam: The Second Encounter is going to help us estimate the quality of texture rendering with enabled anisotropic filtering:

For the tests we used the same settings for both graphics cards: “Quality”, 32-bit color. We ran the “GFX: Extreme Quality” add-on.

NVIDIA GeForce FX was benchmarked in “Application”, “Balanced” and “Aggressive” modes.

It’s hard to pick corresponding settings in the driver for ATI RADEON 9700 PRO: someone would certainly ask why this slide-bar is set this way and not the other way :)…

That’s why we bear all the responsibility for our decision to test this card in two “boundary” modes: “Speed”, when all the driver settings are set to maximum performance:

and “Quality” mode, which means that all settings are set to maximum quality:


Before estimating the quality of anisotropic filtering, let’s have a look at the picture produced by NVIDIA GeForce FX and ATI RADEON 9700 PRO with these settings. We will take a fragment of the scene as an example:

NVIDIA GeForceFX

Application

Balanced

Aggressive

ATI RADEON 9700 PRO

Quality

Speed

Visually, the quality of the image constructed by NVIDIA GeForce FX in “Application” and “Balanced” modes is similar to ATI RADEON 9700 PRO in the “Quality” mode.

In the “Aggressive” mode, GeForce FX shows perceptible texture compression artifacts, but they are nothing compared to what ATI RADEON 9700 PRO shows in the “Speed” mode: besides forced compression, the graphics chip greatly reduces the texture level of detail. All this results in a wild-looking “slush”.

So, now that we have already collected some general impressions, let’s go over to estimating the anisotropic filtering quality. Just like in our ATI RADEON 9700 PRO Review, we took screenshots at different surface angles. The fragments of the screenshots are placed in two rows: the upper row shows anisotropic filtering of the maximum level (8x for GeForce FX and 16x for ATI RADEON 9700 PRO) together with tri-linear filtering. The lower row shows bi-linear filtering with highlighted MIP-levels. For comparison, we also show screenshots of NVIDIA GeForce4 Ti4800 in the “Balanced” mode with 8x anisotropic filtering.

0th angle of inclination (0°):

ATI RADEON 9700 PRO

GeForce4

NVIDIA GeForce FX 5800 Ultra

Quality

Speed

Balanced

Application

Balanced

Aggressive

ATI RADEON 9700 PRO surpasses GeForce FX in quality on horizontal surfaces: even in the “Speed” mode RADEON 9700 PRO produces a picture no worse than that of GeForce FX. In the “Quality” mode, ATI RADEON 9700 PRO wins in sharpness of distant textures: the higher maximum level of anisotropy (16x against 8x in GeForce FX and GeForce4) tells here.

It’s interesting that even when tri-linear filtering is disabled in the “Quality” mode, the picture with highlighted MIP-levels on RADEON 9700 PRO looks as if it were still on. This effect is due to ATI’s “house” anisotropic filtering method in the “Quality” mode: the resulting color is calculated from the colors of two texels taken from two neighboring MIP-levels with weights that vary with the remoteness of the pixel.

It’s rather hard to tell one mode of GeForce FX from another: “Application”, “Aggressive” and “Balanced” look much alike. Only the highlighting of MIP-levels reveals the change in texture level of detail.

1st angle of inclination:

ATI RADEON 9700 PRO

GeForce4

NVIDIA GeForce FX 5800 Ultra

Quality

Speed

Balanced

Application

Balanced

Aggressive

This angle (22.5°) is “inconvenient” for the anisotropic filtering algorithm from ATI: RADEON 9700 PRO shows embarrassingly fuzzy textures. NVIDIA GeForce FX and GeForce4 keep the same quality of texture filtering.

2nd angle of inclination:

ATI RADEON 9700 PRO

GeForce4

NVIDIA GeForce FX 5800 Ultra

Quality

Speed

Balanced

Application

Balanced

Aggressive

At the angle of 45°, ATI RADEON 9700 PRO returns to its normal quality and beats the competitors in image quality.

GeForce FX still provides visually the same texture filtering quality in all its modes. Only a closer examination shows that the highlighted MIP-level stripes in the “Aggressive” mode are wider than at the 22.5° angle of inclination. Of course, this is not the “adaptive filtering” from ATI, but just a peculiarity of the level of detail calculation.

So, we see that GeForce FX doesn’t bring any positive innovations over GeForce4 as far as anisotropic filtering quality is concerned. On the contrary, by introducing a mixture of tri-linear and bi-linear filtering instead of true tri-linear filtering, the new modes worsen the image quality quite tangibly.

And this worsening does show itself, although it is not that noticeable in static screenshots. When the game is running, that is, when the scene is dynamic, it reveals itself on those parts of the picture that do not undergo tri-linear filtering, as “ripples” and clear transitions between MIP-levels on distant objects. Of course, you will hardly notice it in the “Balanced” mode, but it is often evident in the “Aggressive” mode.

Anyway, this is not the anisotropic filtering from ATI, which works perfectly at “convenient” angles and more than acceptably at “inconvenient” angles.


Now, the last thing we have to do is check the performance drop that GeForce FX suffers in its three modes when we enable anisotropic filtering. As usual, we will compare the results with those shown by ATI RADEON 9700 PRO.

Our first benchmark is Unreal Tournament 2003. The game uses Direct3D. Game settings looked as follows: Texture Detail: Highest, World Detail: Highest, Character Detail: Highest, Physics Detail: Normal, Character Shadows: ON, Dynamic Lighting: ON, Detail Textures: ON, Projectors: ON, Decals: ON, Coronas: ON, Decal Stay: Normal, Foliage: ON, Tri-linear Filtering: ON. We ran the Antalus flyby-scene.

We used “Speed” and “Quality” settings for ATI RADEON 9700 PRO in Direct3D as well as in OpenGL. The settings are those shown in the driver screenshots above, with one exception: we moved the anisotropy level slide-bar to different positions.

We were almost right to promise “nearly cost-free” 2x anisotropic filtering in the “Aggressive” mode of GeForce FX, although this “nearly” amounted to 12% in the highest resolution. In the “Application” mode with true tri-linear filtering, NVIDIA GeForce FX has to perform anisotropic filtering the way GeForce4 does, so they have similar performance drops. The performance reduction in the “Balanced” mode falls between those of the two other GeForce FX modes.

Interestingly, turning on 2x anisotropic filtering in the “Speed” mode of ATI RADEON 9700 PRO leads to an increase in speed: the absence of tri-linear filtering allows more efficient use of the texture caches.

In the “Quality” mode, the adaptive anisotropic filtering algorithm implemented in RADEON 9700 PRO allows the solution to sacrifice less performance compared to the highest-quality “Application” mode on GeForce FX 5800 Ultra, which uses the same anisotropic filtering method as NVIDIA GeForce3 / GeForce4 chips.

The higher levels of anisotropy result in a bigger performance drop for all cards in all work modes. But the general picture remains the same: ATI RADEON 9700 PRO in the “Speed” mode is fastest, followed by GeForce FX in “Aggressive” and “Balanced” modes. ATI’s “Quality” comes next; GeForce4 and GeForce FX in the slowest, but best-quality “Application” mode are in the rear.

To check the cards’ behavior in OpenGL, we used Quake3 Arena v.1.30. The settings for all the cards were as follows: 32-bit texture color and frame-buffer depths, maximum amount of textures and objects, enabled tri-linear filtering, disabled texture compression.

Quake3 Arena loads the graphics card less than Unreal Tournament 2003, so we have a lower performance drop caused by anisotropic filtering here. But the ratio of the results remains unchanged, the only difference being that the performance drop of GeForce FX in the “Application” mode is just a little higher than that of ATI RADEON 9700 PRO in the “Quality” mode.

Winding up this section of the review, we should acknowledge the quality and speed of anisotropic filtering as performed by GeForce FX. The evolutionary development of NVIDIA’s anisotropic filtering implementation is surprisingly successful. If you are not satisfied with the speed in the best-quality “Application” mode, you can switch to “Balanced” and increase the GPU speed without losing much of the image quality. And if this speed is still not enough for you, switch to “Aggressive”, but try to disregard possible texture “ripples” and some compression artifacts then.

Full-Screen Anti-Aliasing

Unlike anisotropic filtering, full-screen anti-aliasing on GeForce FX can boast no drastic improvements. The only innovation here is the new 6xS and 8xS modes, available only in Direct3D. These modes, as well as 4xS available since NVIDIA GeForce4, are a combination of multisampling and supersampling. To be more exact, it is supersampling applied to blocks processed by multisampling. Besides making textures look sharper due to supersampling, these methods should smooth polygons’ edges quite well thanks to the increased number of samples.

That’s what we are going to check out. ATI RADEON 9700 PRO is no simple rival: it uses a more interesting variation of multisampling in which subpixels are placed on a rotated rather than an ordered grid. Our detailed examination of the full-screen anti-aliasing implementation in ATI RADEON 9700 PRO is available in our ATI RADEON 9700 PRO Review.

So, we have our traditional scene from Serious Sam: The Second Encounter. We switched the game to Direct3D in its settings to enable 4xS, 6xS and 8xS modes of NVIDIA GeForce FX.

First, we check the quality of smoothing on edges placed at angles of 0° and 90°:

NVIDIA GeForce FX 5800 Ultra

4x

4xS

6xS

8xS

ATI RADEON 9700 PRO

2x

4x

6x

The method of 4x full-screen anti-aliasing from GeForce FX brings no surprises: the picture looks exactly like on any other graphics card from NVIDIA.

The 4xS method uses supersampling based on two blocks of 2x multisampling situated one above the other. That’s why it has a better effect on nearly-horizontal edges, rather than on nearly-vertical ones. In fact, there is no difference between 4x and 4xS on edges close to the vertical.

The 6xS method also uses supersampling based on neighboring multisampled blocks. But this time the blocks are situated horizontally, next to each other. This results in better smoothing of edges angled close to the vertical, but for nearly horizontal edges this method works no better than 4x.

The 8xS method, just like 6xS, processes blocks placed horizontally next to each other. But because it processes more samples (two blocks, four subpixels each), it is even more efficient in smoothing the “jaggies” on nearly vertical edges. As for nearly horizontal edges, this method, just like 6xS, is hardly any better than 4x.

SMOOTHVISION 2.0 from ATI features rotated grid multisampling and provides excellent “jaggies” smoothing at polygons’ edges that are nearly horizontal or vertical. You can see it in the screenshots: in this case, 4x and 6x methods as implemented in ATI RADEON 9700 PRO don’t yield to NVIDIA GeForce FX in 6xS and 8xS modes.

Now, we change the angle:

NVIDIA GeForce FX 5800 Ultra

4x

4xS

6xS

8xS

ATI RADEON 9700 PRO

2x

4x

6x

Here 4x, 4xS, 6xS and 8xS modes of GeForce FX look better than 4x and 6x of ATI RADEON 9700 PRO: GeForce FX draws more half-tones at the edges of the polygons. The peculiarity of SMOOTHVISION 2.0 from ATI, the rotated grid, must have shown its disadvantage at these angles (30° and 60°), which proved “inconvenient” for ATI RADEON 9700 PRO.

Now, the last variant:

NVIDIA GeForce FX 5800 Ultra

4x

4xS

6xS

8xS

ATI RADEON 9700 PRO

2x

4x

6x

We guess the “smoothest” edges are those produced by 6xS and 8xS methods of GeForce FX. All the rest, except ATI’s 2x, show a somewhat worse, but similar quality of “jaggies” smoothing.


Now, let’s estimate how big the performance drop is when NVIDIA GeForce FX performs full-screen anti-aliasing.

Unreal Tournament 2003 comes first. The settings are the same: Texture Detail: Highest, World Detail: Highest, Character Detail: Highest, Physics Detail: Normal, Character Shadows: ON, Dynamic Lighting: ON, Detail Textures: ON, Projectors: ON, Decals: ON, Coronas: ON, Decal Stay: Normal, Foliage: ON, Trilinear Filtering: ON. We ran the Antalus flyby-scene.

We used “Speed” and “Quality” settings for ATI RADEON 9700 PRO in Direct3D as well as in OpenGL. The settings are those shown in the driver screenshots above, with one exception: we turned off anisotropic filtering and activated different full-screen anti-aliasing modes.

NVIDIA GeForce FX and ATI RADEON 9700 PRO with enabled full-screen anti-aliasing do not lose too much of their performance compared to NVIDIA GeForce4. The support of frame-buffer compression comes in handy here.

The lower memory bus bandwidth of GeForce FX accounts for its bigger performance drop in the highest-quality and slowest “Application” mode. In “Balanced” and “Aggressive” modes in 800x600 and 1024x768 resolutions, the CPU becomes the bottleneck of the system and even 2x FSAA doesn’t lead to any considerable performance reduction.

The workload of the graphics cards is higher in 1280x1024, but still NVIDIA GeForce FX does well. It all becomes clear in 1600x1200: the graphics memory bus bandwidth is the bottleneck here and ATI RADEON 9700 PRO shows a lower performance drop.

The graphics memory bus bandwidth is the crucial factor in 4x full-screen anti-aliasing, too. GeForce FX in the “Application” mode still loses to ATI RADEON 9700 PRO. “Balanced” and “Aggressive” modes reduce the amount of texture data transferred through the graphics memory bus, and GeForce FX shows a smaller performance drop than RADEON 9700 PRO in lower resolutions. But in 1280x1024 and 1600x1200 this no longer helps: the lower memory bandwidth drags GeForce FX down.

Only NVIDIA graphics chips participated in this test: ATI RADEON 9700 PRO doesn’t support the combination of supersampling and multisampling.

Thanks to frame-buffer compression support, NVIDIA GeForce FX 5800 Ultra suffers a lower performance decrease compared to GeForce4. But this reduction is still higher than in the case of 4x FSAA, because supersampling is used.

Actually, this comparison is not quite correct, although both cards use six samples for calculating one pixel color. ATI RADEON 9700 PRO uses “pure” multisampling, while NVIDIA GeForce FX 5800 Ultra uses a combination of supersampling and multisampling.

The results speak for themselves: the “pure” multisampling is faster :).

There is nothing to comment on: in the 8xS mode NVIDIA GeForce FX could only run at resolutions up to 1280x1024 and also suffered a colossal performance drop of 70%-80%.

Quake3 Arena uses OpenGL. This means 4xS, 6xS and 8xS modes are not available here for NVIDIA GeForce FX and GeForce4.

We used the following game settings: 32-bit texture color and frame-buffer depths, maximum amount of textures and objects, enabled tri-linear filtering, disabled texture compression.

NVIDIA GeForce FX manages 2x FSAA in Quake3 Arena very well. Thanks to frame-buffer compression, GeForce FX is much better than GeForce4 here. Considering that ATI RADEON 9700 PRO suffers significant performance drops, the graphics memory bus is no longer a bottleneck for GeForce FX in Quake3 Arena.

However, there is one more explanation of good results shown by GeForce FX in this game: without anti-aliasing the performance of GeForce FX is normally greatly limited by the CPU performance. Now that full-screen anti-aliasing is enabled, the system loses less fps due to the CPU’s limiting effect.

The lower graphics memory bus bandwidth of GeForce FX shows itself in the 4x mode. This GPU suffers a bigger performance drop than ATI RADEON 9700 PRO.

Summing it all up, we can put forth a few observations.

First, 6xS and 8xS modes, although they are good at smoothing polygons with nearly vertical edges, are hardly better than 4x on nearly horizontal edges. Considering the high performance tradeoff and their availability in Direct3D only, we don’t think they will be used often or at all. Anyway, the new FSAA modes of GeForce FX don’t bring any advantage over the anti-aliasing implemented by ATI.

The good old 2x and 4x of NVIDIA GeForce FX provide the same quality of “jaggies” smoothing as 2x and 4x of ATI RADEON 9700 PRO, but the advantage of the rotated grid from ATI is clearly seen at edges angled closer to absolute horizontals and verticals.

2x and 4x methods result in a lower performance drop in NVIDIA GeForce FX compared to GeForce4 Ti4800 thanks to its frame-buffer and Z-buffer compression. Moreover, NVIDIA GeForce FX with its 128-bit memory bus holds its own even against ATI RADEON 9700 PRO with its 256-bit memory bus at 2x and 4x FSAA. Only in the highest resolutions in 4x mode does GeForce FX suffer a greater performance loss than RADEON 9700 PRO due to its lower graphics memory bus bandwidth.
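To put the bus widths into perspective, the theoretical peak bandwidth can be estimated as bus width times effective transfer rate. The bus widths come from the text above; the memory clocks used here (500MHz DDR-II for GeForce FX 5800 Ultra, 310MHz DDR for RADEON 9700 PRO) are the reference specifications as we recall them and should be treated as assumptions. The function name is hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock=2 models double data rate memory.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_mhz * 1e6 * transfers_per_clock / 1e9

# Memory clocks are assumed reference values, not measurements.
geforce_fx = peak_bandwidth_gb_s(128, 500)   # 128-bit bus
radeon = peak_bandwidth_gb_s(256, 310)       # 256-bit bus

print(f"GeForce FX 5800 Ultra: {geforce_fx:.2f} GB/s")
print(f"RADEON 9700 PRO:       {radeon:.2f} GB/s")
```

Under these assumptions the wider bus of RADEON 9700 PRO outweighs the higher memory clock of GeForce FX, which is consistent with the bigger performance drops of GeForce FX at 4x FSAA in the highest resolutions.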


Gaming Benchmarks: Quake3 Arena

Settings look as follows: 32-bit texture color and frame-buffer depths, maximum amount of textures and objects, tri-linear filtering is on, texture compression is off.

The results in 800x600 and 1024x768 resolutions are greatly limited by the performance of the CPU and of the system as a whole. But in higher resolutions a difference appears, to the disadvantage of ATI RADEON 9700 PRO.

Quake3 Arena uses multi-texturing and most surfaces have only two texture layers: the base texture and the lighting map. As we learned earlier in this article, the flexible structure of NVIDIA GeForce FX chooses in this case a scheme with four pixel “pipelines”, each of which has two texturing units. Thus, the graphics chip can process four dual-textured pixels per clock.

ATI RADEON 9700 PRO has only one texturing unit per pixel pipeline and thus has to spend two clock-cycles to draw a pixel when rendering two textures. Overall, with its eight pixel pipelines, ATI’s graphics chip draws eight pixels in two clock-cycles.

This way both graphics cards are on equal terms in Quake3 Arena: each draws eight pixels in two clocks. But GeForce FX has a higher chip frequency (500MHz against 325MHz for RADEON 9700 PRO) and this helps it fill the scene faster. And as the fill-rate is the most crucial parameter for graphics card performance in Quake3 Arena, NVIDIA GeForce FX outruns ATI RADEON 9700 PRO quite naturally. But the gap between the two cards is not proportional to the difference between their graphics chip frequencies, because Quake3 Arena doesn’t use two textures everywhere and other factors interfere as well, like the graphics memory bus bandwidth.
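The theoretical dual-textured fill-rate behind this reasoning is simple arithmetic. The sketch below uses only the pipeline counts and core clocks quoted above; the function name is hypothetical and the result is a theoretical ceiling, not a measured figure.

```python
def dual_textured_fill_rate(pixels_per_pass, clocks_per_pass, core_clock_mhz):
    """Theoretical fill-rate in millions of dual-textured pixels per second."""
    return pixels_per_pass / clocks_per_pass * core_clock_mhz

# GeForce FX: 4 pipelines x 2 texturing units, so 4 pixels in 1 clock.
geforce_fx = dual_textured_fill_rate(4, 1, 500)

# RADEON 9700 PRO: 8 pipelines x 1 texturing unit, so 8 pixels in 2 clocks.
radeon = dual_textured_fill_rate(8, 2, 325)

ratio = geforce_fx / radeon
print(geforce_fx, radeon, round(ratio, 2))
```

The ratio equals the ratio of the core clocks, which is the upper bound for the advantage of GeForce FX in a purely fill-rate-limited, dual-textured scene.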

When 4x anti-aliasing is on, it is exactly the graphics memory bandwidth that mostly affects the overall result. So, NVIDIA GeForce FX 5800 Ultra proves slower here than ATI RADEON 9700 PRO.

Only in the “Aggressive” mode, which reduces the computational workload on the graphics chip and the amount of texture data transferred through the memory bus to the minimum, can NVIDIA GeForce FX 5800 Ultra show higher results than ATI RADEON 9700 PRO. But it has to pay a lot for this boost: the image quality degrades considerably.

We have already used up our reserve of compliments to the fast anisotropic filtering method from NVIDIA. We can only repeat once again: at last, NVIDIA offers a choice of “fast” or “quality” anisotropic filtering modes. And NVIDIA GeForce FX performs faster than ATI RADEON 9700 PRO in both cases: when the cards work at the highest quality settings and when they work in the “fast” modes.

At “heaviest” settings, NVIDIA GeForce FX 5800 Ultra fell behind RADEON 9700 PRO. The lower graphics memory bus bandwidth of GeForce FX mattered here.


Gaming Benchmarks: Unreal Tournament 2003

Settings: Texture Detail: Highest, World Detail: Highest, Character Detail: Highest, Physics Detail: Normal, Character Shadows: ON, Dynamic Lighting: ON, Detail Textures: ON, Projectors: ON, Decals: ON, Coronas: ON, Decal Stay: Normal, Foliage: ON, Trilinear Filtering: ON. We ran the Antalus flyby-scene.

Unreal Tournament 2003 loads the cards much more than Quake3 Arena.

In 800x600 and 1024x768 resolutions, the CPU becomes the bottleneck in the system: we can see no difference between the results in different quality modes.

The Direct3D part of the driver for NVIDIA GeForce FX seems to be not yet optimized to the level of ATI’s driver. That’s why ATI RADEON 9700 PRO is ahead in 800x600 and 1024x768. But in higher resolutions the situation changes in favor of GeForce FX.

It looks as if the hardest test for GeForce FX 5800 Ultra in Unreal Tournament was the immense amount of textures. You can see how low the performance drops in the “Application” mode that uses true tri-linear filtering and highest LOD. Still, GeForce FX makes up for the lag behind RADEON 9700 PRO in the “Application” mode by getting ahead in “Balanced” and “Aggressive” modes.

When 4x full-screen anti-aliasing is used, ATI RADEON 9700 PRO reduces the gap to NVIDIA GeForce FX in “Balanced” and “Application” modes and catches up with it in 1600x1200. The lower memory bandwidth of GeForce FX tells on its results here.

The difference between the results GeForce FX shows in different quality modes is startling! That’s a clear demonstration of how optimized anisotropic filtering tells on the performance.

In the “Application” mode, GeForce FX performs anisotropic filtering like NVIDIA GeForce4 does, so we can’t even dream of high speed: GeForce4 loses to RADEON 9700 PRO in the “Quality” mode. But the fastest mode of NVIDIA GeForce FX 5800 Ultra, namely “Aggressive” mode, is faster than the “Speed” mode of ATI RADEON 9700 PRO.

4x full-screen anti-aliasing added to anisotropic filtering makes GeForce FX lose all its advantages and even lose its top position in 1600x1200 resolution.


Gaming Benchmarks: Serious Sam: The Second Encounter

We used 32-bit color and “Quality” graphics settings in Serious Sam: The Second Encounter. We also ran the “GFX: Extreme quality” add-on.

With the above-described settings, Serious Sam: The Second Encounter uses anisotropic filtering of the maximum possible level. So, NVIDIA GeForce FX 5800 Ultra worked in 8x mode, while ATI RADEON 9700 PRO worked in 16x.

Full-screen anti-aliasing somewhat changes the rankings in favor of ATI RADEON 9700 PRO: its higher memory bus bandwidth comes to the fore.

Gaming Benchmarks: Codecult Codecreatures

We ran Codecult Codecreatures with the default settings.

This test is based on an engine that uses pixel and vertex shaders, requires high fill-rate and loads the graphics card with a lot of geometry calculations (up to 700,000 triangles in a single scene).

There are no games using this engine yet, but still GeForce FX 5800 Ultra outperforms ATI RADEON 9700 PRO here.


3DMark 2001 SE Gaming Tests

All the tests from 3DMark 2001 were run in the “Low Details” mode. In the “High Details” mode, the CPU often limits the performance of the graphics cards, making their comparison impossible.

The tests ran at 32-bit texture color and frame-buffer depths; the Z-buffer depth was 24 bits. The settings of the graphics cards were the same as in the previous section.

Game 1:

We can’t state that any of the cards tested is a definite No.1 here. Moreover, the results are greatly limited by the CPU.

But look at the performance of NVIDIA GeForce4 Ti4800! It’s best in 1024x768 resolution, when the workload on the graphics card is low! This can indicate only one thing: good driver optimization. The results of NVIDIA GeForce FX in 1024x768 suggest that the Direct3D part of its driver is less efficiently optimized.

Quite expectedly, 4x full-screen anti-aliasing makes the situation look much better for ATI RADEON 9700 PRO. And while NVIDIA GeForce FX 5800 Ultra wins 1024x768 and 1280x1024 resolutions, RADEON 9700 PRO comes ahead in 1600x1200 when the graphics memory bus bandwidth affects the overall results most.

3DMark 2001 doesn’t use tri-linear filtering in principle. Thus, the task of NVIDIA GeForce FX 5800 Ultra when performing anisotropic filtering is easier and it outperforms ATI RADEON 9700 PRO.

NVIDIA GeForce FX holds its ground in the heaviest mode until 1600x1200 resolution, when ATI RADEON 9700 PRO shows best thanks to its higher memory bus bandwidth.

Game 2:

Just like in the first gaming test of 3DMark 2001, NVIDIA GeForce FX is behind ATI RADEON 9700 PRO in 1024x768 when the graphics card workload is minimal. And once again this indicates insufficient optimization of the Direct3D part of the driver for GeForce FX.

In higher resolutions the graphics cards go neck and neck.

Full-screen anti-aliasing favors ATI RADEON 9700 PRO, but this time the advantage of the latter is even higher than in the previous test. The higher OverDraw value of this test must be allowing ATI RADEON 9700 PRO to make best use of its optimal HSR implementation.

When we enabled anisotropic filtering, ATI RADEON 9700 PRO lost its top position. 3DMark 2001 doesn’t use tri-linear filtering and this passes the advantage over to NVIDIA GeForce FX 5800 Ultra.

But when both full-screen anti-aliasing and anisotropic filtering are enabled, ATI RADEON 9700 PRO regains its first place thanks to its higher memory bus bandwidth.


Game 3:

The results of this test in all resolutions are limited by the CPU performance. Better optimization of ATI RADEON 9700 PRO drivers ensures its top position.

Full-screen anti-aliasing only makes things worse for NVIDIA GeForce FX 5800 Ultra: it falls farther behind.

Anisotropic filtering works under conditions advantageous to NVIDIA GeForce FX. So, this card comes ahead in 1600x1200 resolution, where CPU performance and the quality of driver optimization are of lower importance.

The combination of full-screen anti-aliasing and anisotropic filtering quite expectedly brings ATI RADEON 9700 PRO ahead in higher resolutions.

Game 4:

This is the hardest of all 3DMark 2001 tests. The cards show similar performance, only the “Application” mode of NVIDIA GeForce FX spoils the picture: this highest-quality mode leads to a surprisingly big performance drop.

Traditionally, 4x full-screen anti-aliasing makes the results shift towards ATI RADEON 9700 PRO.

Once again NVIDIA GeForce FX 5800 Ultra shows a surprisingly big performance drop in the “Application” mode and loses to ATI RADEON 9700 PRO in the corresponding “Quality” mode. On the other hand, GeForce FX outruns RADEON 9700 PRO in “Balanced” and “Aggressive” modes.

Full-screen anti-aliasing added to anisotropic filtering allows ATI RADEON 9700 PRO to defeat the competitor in higher resolutions.


3DMark03 Gaming Tests

The 3DMark03 benchmarking set provoked a lot of critical remarks from independent reviewers as well as from NVIDIA. So, at first we didn’t want to use this software in the GeForce FX review. But this graphics card showed such interesting results in 3DMark03, or I would even say illustrative results, that we decided to post them.

The graphics cards were benchmarked in the “Balanced” mode, that is, with the default driver settings.

Game 1:

NVIDIA GeForce FX 5800 Ultra shows a certain advantage over ATI RADEON 9700 PRO. This is the simplest test from 3DMark03. It uses ver.1.1 vertex shaders and no pixel shaders.

Full-screen anti-aliasing lets GeForce FX down: the graphics memory bus bandwidth greatly affects the overall result here and it is not among GeForce FX’s strong points.

Here NVIDIA GeForce FX 5800 Ultra uses its anisotropic filtering in the “Balanced” mode, while RADEON 9700 PRO uses ATI’s higher-quality and slower variant of anisotropic filtering. This explains the higher results of NVIDIA GeForce FX.

With the “heaviest” settings, NVIDIA GeForce FX 5800 Ultra stays on top only at 1024x768.

Game 2:

NVIDIA GeForce FX 5800 Ultra beats ATI RADEON 9700 PRO hands down! The flexible NV30 architecture must have finally shown its worth here.

The Game2 scene has several light sources. Each of them casts shadows processed in real time with the help of the stencil buffer. The test engine is built in such a way that the graphics card has to process the stencil buffer for each light source before the final scene rendering. NVIDIA GeForce FX performs this procedure at double speed: when color is not written into the frame buffer, the chip reconfigures itself into an eight-pipeline structure and processes eight stencil-buffer pixels per clock.

This faster work with the stencil buffer must have helped GeForce FX 5800 Ultra outperform its ATI rival.
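The doubling described above can be put into rough numbers. The sketch below is only a back-of-the-envelope estimate: the 500 MHz core clock and the rate of four pixels per clock with color writes enabled are assumptions not stated in this section (the latter is merely implied by “double speed”), and the function and constant names are ours.

```python
# Rough peak fill-rate estimate for stencil-only passes.
# ASSUMPTIONS: 500 MHz core clock and 4 pixels/clock with color writes;
# only the 8 pixels/clock stencil-only figure comes from the text above.

CORE_CLOCK_HZ = 500_000_000      # assumed core clock
PIXELS_PER_CLOCK_COLOR = 4       # assumed rate when color is written
PIXELS_PER_CLOCK_STENCIL = 8     # rate quoted for stencil-only work


def fill_rate(pixels_per_clock: int, clock_hz: int = CORE_CLOCK_HZ) -> int:
    """Peak pixels per second for a given per-clock rate."""
    return pixels_per_clock * clock_hz


def stencil_time(pixels: int, lights: int, pixels_per_clock: int) -> float:
    """Seconds spent on stencil passes when every light needs its own pass."""
    return lights * pixels / fill_rate(pixels_per_clock)


color_rate = fill_rate(PIXELS_PER_CLOCK_COLOR)
stencil_rate = fill_rate(PIXELS_PER_CLOCK_STENCIL)
speedup = stencil_rate / color_rate

print(f"color passes:   {color_rate / 1e9:.1f} Gpixels/s")
print(f"stencil passes: {stencil_rate / 1e9:.1f} Gpixels/s")
print(f"speedup:        {speedup:.1f}x")
```

These are peak figures only. In a real scene the gain depends on memory bandwidth and on how much of each frame is spent in the stencil passes, which grows with the number of light sources.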


Game 3:

Game3 uses the same dynamic shadow rendering technique as Game2, and NVIDIA GeForce FX 5800 Ultra is again faster than ATI RADEON 9700 PRO.

Game 4:

The results of Game4 are not so easy to explain.

We know that this test uses pixel shaders of versions 1.4 and 2.0 and vertex shaders of versions 1.1 and 2.0. We also know that the synthetic pixel and vertex shader tests of 3DMark 2001 and 3DMark03 don’t show any clear advantage for NVIDIA GeForce FX 5800 Ultra. So why is it in the lead in Game4?

Our guess is that the 42.68 driver is specially optimized for 3DMark03. When the driver meets “familiar” pixel and vertex shader code from 3DMark03, it probably replaces it with different code that produces the same visual effect but runs faster on GeForce FX. Or it may use lower calculation precision in pixel shaders.

Of course, this is only a supposition. But if we take another driver, for example version 42.82, we see a curious thing: GeForce FX performs about the same in all games, but slower in 3DMark03. With that driver, NVIDIA GeForce FX 5800 Ultra would lose all its present advantage over ATI RADEON 9700 PRO.


Conclusion

So, what are our impressions of the new creation of NVIDIA?

The main feature of NVIDIA GeForce FX is the highest level of flexibility and programmability, going beyond the basic requirements of DirectX 9 and surpassing the capabilities of ATI RADEON 9700 PRO. The era of “cinematic graphics” has not yet come with the arrival of GeForce FX, but it looms on the horizon.

We may seem skeptical, but justifiably so: we are not at all sure DirectX 9 is going to reach our computers soon. While we are still looking for beautiful water created with DirectX 8 pixel shaders and waiting to see games that really use the possibilities of DirectX 8 to the full, not just for demo purposes, a new revolution takes place: DirectX 9 comes out. We are astonished at the new graphics level achieved in demo programs. But when will we see such graphics in real games? In half a year? In a year? We can’t tell you now.

The only positive thing is that Microsoft seems to understand how far the developers are from the ordinary user and is actively promoting a high-level programming language intended to simplify and speed up the writing of applications that use shaders. It is called High Level Shader Language (HLSL). NVIDIA and ATI, of course, are supporting this initiative as actively as they can.

But back to NVIDIA and its baby.

Our test session showed that the flexible architecture and programmability of NVIDIA GeForce FX came at the cost of performance. In some cases, the performance dropped quite considerably.

The weakest point of the new GPU is the lack of computational power in the ALUs responsible for pixel shader execution. DirectX 9 pixel shaders that use floating-point calculations are the hardest nut for GeForce FX to crack: in this area ATI RADEON 9700 PRO leaves the newcomer no chance.

But this is the only evident drawback of NVIDIA GeForce FX. And why should we care about slow execution of those pixel shaders on GeForce FX? By the time computer games use them, there will be NVIDIA GeForce FX II or even III already. In any other respect GeForce FX is a worthy product, not yielding to ATI RADEON 9700 PRO in most gaming tests.

I would like to single out the introduction of “fast” anisotropic filtering. NVIDIA’s anisotropic filtering algorithm is practically unchanged, but replacing tri-linear filtering with a mixture of bi- and tri-linear filtering reduces the performance loss. In other words, this covers up the traditional weak spot of NVIDIA’s GPUs. The user can now choose between speed and quality. Moreover, we should acknowledge that the quality tradeoff is small even in the fast mode.
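To illustrate what a mixture of bi- and tri-linear filtering can look like, here is a minimal sketch. It is not NVIDIA’s actual algorithm, which is not documented in this review: the function names and the width of the blending band are hypothetical. The idea is that tri-linear filtering always blends two adjacent mip levels, while the mixed mode blends only in a narrow band around the transition and samples a single level (plain bi-linear) elsewhere.

```python
import math


def trilinear_weight(lod: float) -> float:
    """Blend weight of the next mip level in classic tri-linear filtering."""
    return lod - math.floor(lod)


def mixed_weight(lod: float, band: float = 0.25) -> float:
    """Blend weight for a hypothetical bi-/tri-linear mixture.

    Outside a band of width `band` centered on the mip transition the
    weight snaps to 0 or 1, so one mip level is enough (bi-linear).
    Inside the band the two levels are blended as in tri-linear filtering.
    """
    frac = lod - math.floor(lod)
    low = 0.5 - band / 2
    high = 0.5 + band / 2
    if frac <= low:
        return 0.0
    if frac >= high:
        return 1.0
    return (frac - low) / (high - low)


def mip_levels_sampled(weight: float) -> int:
    """One mip level suffices when the weight is exactly 0 or 1."""
    return 1 if weight in (0.0, 1.0) else 2


for lod in (3.1, 3.5, 3.9):
    w = mixed_weight(lod)
    print(f"lod={lod}: weight={w:.2f}, mip levels sampled={mip_levels_sampled(w)}")
```

The narrower the band, the more pixels get by with a single mip level and the faster the filtering, at the price of more visible transitions between mip levels. This is the speed-versus-quality choice mentioned above.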

The introduction of frame-buffer compression helps to increase performance when working with full-screen anti-aliasing. As a result, NVIDIA GeForce FX suffers a much smaller performance drop from FSAA than GeForce4. But NVIDIA’s new card doesn’t always come out ahead of ATI RADEON 9700 PRO: with FSAA, the graphics memory bandwidth affects the card’s overall performance. GeForce FX 5800 Ultra has a bus half as wide and thus somewhat lower bandwidth, although its memory works at a sky-high frequency.
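The bandwidth comparison above is simple arithmetic: bus width in bytes multiplied by the effective memory clock. The sketch below uses commonly published specifications that are not stated in this section and should be treated as assumptions: a 128-bit bus at 1000 MHz effective for GeForce FX 5800 Ultra and a 256-bit bus at 620 MHz effective for RADEON 9700 PRO.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, effective_clock_mhz: float) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_clock_mhz / 1000


# ASSUMED figures from commonly published specifications, not from this review.
geforce_fx_5800_ultra = memory_bandwidth_gb_s(128, 1000)
radeon_9700_pro = memory_bandwidth_gb_s(256, 620)

print(f"GeForce FX 5800 Ultra: {geforce_fx_5800_ultra:.2f} GB/s")
print(f"RADEON 9700 PRO:       {radeon_9700_pro:.2f} GB/s")
print(f"GeForce FX deficit:    {1 - geforce_fx_5800_ultra / radeon_9700_pro:.0%}")
```

Under these assumptions the high memory clock makes up for most, but not all, of the narrower bus, and frame-buffer compression is left to cover the rest.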

The new FSAA modes, 6xS and 8xS, must be the answer to ATI’s SMOOTHVISION 2.0 that uses up to 6 samples. But these new methods are only available in Direct3D and lead to a big slow-down because they also involve supersampling. So, they have a narrow scope of application: Direct3D games that don’t demand much speed from the graphics card. In all other respects, the anti-aliasing from NVIDIA is the same as we saw in previous chips.

But how did NVIDIA actually profit from the launch of GeForce FX?

First, NVIDIA has successfully moved to the 0.13micron manufacturing technology, which required much time and effort. Now the company can steadily go on developing, while ATI has yet to make the transition.

Second, by releasing NV30 and GeForce FX 5800 Ultra, the company at last offered an answer to ATI’s RADEON 9700 PRO.

Third, the innovations from NV30 were used in NV31 and NV34. These are simpler, better-value, mass-market (read: “money-making”) DirectX 9-compatible chips.

I believe that graphics cards based on NVIDIA GeForce FX 5800 Ultra are unlikely to become true mass-market products. Yes, the solution we tested proved very fast in the existing gaming applications. Besides, it supports all the latest features and from the technological viewpoint looks a little more attractive than ATI RADEON 9700 PRO... But the noise produced by the monstrous cooling system makes all these advantages fade away, as does the price of this product. Graphics cards based on NVIDIA GeForce FX 5800 Ultra turned out to be so expensive that ATI RADEON 9700 PRO still remains the unbeaten leader in the high-performance graphics cards sector from the price-to-performance point of view.

Well, let’s wait and see how things go. Very soon we can expect the new NV35 and ATI R350, and this means that the picture in today’s graphics market may easily change...
