by Tim Tscheblockov, Alexey Stepin, Anton Shilov
05/04/2004 | 06:42 AM
ATI Technologies and NVIDIA Corporation, the two giants that dominate today’s consumer 3D graphics market, are locked in cut-throat competition. This is no secret to any reasonably well-informed computer enthusiast. There are periods of relative calm, when the rivals regroup and exchange only an occasional tactical strike, but every new generation of graphics processors starts a new round of confrontation.
Not so long ago, about a month back, ATI Technologies, which holds a fairly secure position in the market, suffered a serious blow. NVIDIA announced a new graphics chip series, the GeForce 6800/6800 Ultra. It is based on the NV40 architecture, which is considerably improved over the previous NV3x generation and has already proven highly competitive. As a result, ATI’s R360 graphics processor, which topped ATI’s high-performance 3D graphics card lineup, was immediately knocked off the leader’s pedestal.
We didn’t have to wait long for ATI to strike back. On May 4, 2004 the company finally unveiled its well-prepared response. Its new “weapon”, the RADEON X800 series, is the topic of today’s review.
We will tell you more about the peculiarities and features of the new ATI architecture a little later. In the meantime, let’s take a closer look at the actual graphics cards we had in our test lab.
The new ATI graphics processor family, aka R420, received the marketing name “RADEON X800”. You can interpret this name in several ways. First, the Latin letter “X” makes the name sound modern and dynamic: “X” stands for “extreme”, as in extreme performance, extreme quality, etc. Second, “X” can also be read as the Roman numeral 10, which replaces the “9” of the previous generation graphics processor family, RADEON 9xxx. One way or another, we consider this a very successful name for the new generation of graphics processors.
The new ATI graphics card family based on the R420 architecture will consist of three cards: RADEON X800 XT Platinum Edition, RADEON X800 PRO and RADEON X800 SE.
With the RADEON X800 launch, ATI has split the family by performance and price in a somewhat different way than before. If you remember, RADEON 9800, RADEON 9800 PRO and RADEON 9800 XT based graphics cards differed mostly in VPU and graphics memory clock frequencies. Models of the new RADEON X800 family will also differ in the number of pixel pipelines and in memory bus width. The table below summarizes the differences between the three chips of the new RADEON X800 series:
| Model | RADEON X800 XT Platinum Edition | RADEON X800 PRO | RADEON X800 SE |
|---|---|---|---|
| VPU frequency | 520MHz | 475MHz | ? |
| Pixel pipelines | 16 | 12 | 8 |
| Memory type | GDDR3 | GDDR3 | DDR |
| Memory size | 256MB | 256MB | 128MB (?) |
| Memory frequency (effective DDR) | 1120MHz | 900MHz | ? |
| Memory bus width | 256bit | 256bit | 128bit |
| Recommended price | $499 | $399 | ? |
As you can see from the table, R420 works at much higher frequencies than its predecessors and even than NVIDIA NV40 (GeForce 6800/6800 Ultra).
Among the many factors that made such high working frequencies possible, one deserves special mention: RADEON X800 is manufactured with a 0.13-micron process using a low-k dielectric and copper interconnects.
Low-k dielectric technology means using an insulating material with low dielectric permittivity. The material used here is called “Black Diamond”. It reduces the parasitic capacitance between the on-die conductors that connect the graphics processor’s functional units, which allows stable operation at higher working frequencies. Lower dielectric permittivity also reduces power consumption and heat generation. Copper interconnects serve the same purpose: copper has lower electrical resistance than the aluminum used in the older 0.15-micron production process.
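To see why a lower permittivity helps, a back-of-the-envelope sketch can use two textbook relations: parallel-plate capacitance C = k·ε0·A/d and dynamic switching power P = α·C·V²·f. All numeric values below are assumptions for illustration. About 3.9 is a common textbook figure for silicon dioxide. About 3.0 is assumed here for a Black Diamond-class low-k film, since the article gives no value for ATI’s process. The wire area, spacing and voltage are hypothetical. Only the interconnect share of the chip’s capacitance scales this way.

```python
EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m

def plate_capacitance(k, area_m2, gap_m):
    """Parallel-plate approximation of the capacitance between two adjacent wires."""
    return k * EPSILON_0 * area_m2 / gap_m

def dynamic_power(c_farads, volts, freq_hz, activity=1.0):
    """Switching power P = activity * C * V^2 * f."""
    return activity * c_farads * volts ** 2 * freq_hz

# Assumed, illustrative values: same wire geometry, voltage and clock; only k changes.
K_SIO2, K_LOW_K = 3.9, 3.0
AREA, GAP = 1e-12, 0.2e-6      # hypothetical wire overlap area (m^2) and spacing (m)
VOLTS, CLOCK = 1.6, 520e6      # hypothetical core voltage; 520MHz as on X800 XT

c_old = plate_capacitance(K_SIO2, AREA, GAP)
c_new = plate_capacitance(K_LOW_K, AREA, GAP)
p_ratio = dynamic_power(c_new, VOLTS, CLOCK) / dynamic_power(c_old, VOLTS, CLOCK)
print(f"wire capacitance and switching power drop to {p_ratio:.0%} of the SiO2 case")
```

The same ratio also shortens the RC delay of each wire, because delay scales with capacitance. That is the link between low-k dielectrics and higher attainable clocks. Copper attacks the R in RC from the other side.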
According to the company, this advanced production technology is what has kept the power consumption and heat dissipation of the new ATI graphics solutions from rising. ATI even claims that these parameters are lower than those of the previous R360 based graphics cards.
According to ATI Technologies, the RADEON X800 PRO graphics adapter consumes about 50-60W during boot-up, while RADEON 9800 XT required about 65-75W under the same conditions.
As a result, the new RADEON family consumes much less power than GeForce 6800 Ultra, which is definitely good news. There is no need to hunt for a super-powerful and super-expensive power supply unit, because the minimum PSU recommended by ATI is 300W. That is far more modest than the 480W PSU recommended by NVIDIA.
It is also quite probable that at least RADEON X800 PRO will work without problems in high-end mini-systems, such as Shuttle cubes equipped with 200W PSUs. This is very good news for SFF system fans, who are unhappy with NVIDIA’s demand for a 480W power supply for normal operation of GeForce 6800 Ultra.
Nevertheless, even though ATI’s PSU requirements are not that severe, we believe you should make sure your system has a good, reliable power supply unit that delivers stable voltages and sufficient currents. Future owners of RADEON X800 PRO/XT solutions should pay due attention to PSUs rated at 400W and up from such companies as Thermaltake, Levicom, Zalman, Delta and Antec.
Interestingly, ATI originally planned to respond to NVIDIA’s new architecture with an 8-pipeline solution, assuming this would be enough to compete. Later, however, they started talking about 12 pixel pipelines. A while after that they decided on 16 pixel pipelines, fearing that their counterstrike would not be strong enough. Our previous review of NVIDIA GeForce 6800 Ultra showed that it is a truly fast graphics chip, in many respects much faster than the former leader, ATI RADEON 9800 XT. So it is no surprise that ATI doubted whether 8 or even 12 pixel pipelines would be enough to compete with NV40 in a fair duel.
By the way, I would like to draw your attention to the number of pixel pipelines in the different R420 versions, and to a few things you should take note of in this respect. Some overclocking enthusiasts may have already looked at the table above. They are probably hoping to find ways to enable the additional pipelines on the slower card modifications, as was possible with a few older RADEON solutions. Remember our hints on soldering an additional resistor onto the chip, which turned the four-pipeline architecture into an 8-pipeline one? Slow down, guys: your dreams may never come true, because ATI claims there will no longer be any way to enable the disabled pipelines. Time will show whether this is true. So far, however, we haven’t found an evident and simple way to enable additional pipelines on the slower graphics card modifications.
As for the physical implementation of the new graphics processors, we managed to get hold of two graphics cards based on RADEON X800: RADEON X800 XT Platinum Edition and RADEON X800 PRO. These two are currently the most interesting X800-based cards, because RADEON X800 SE is too limited functionally to compete with GeForce 6800 and 6800 Ultra.
So, let’s take a closer look at the cards now.
When we unpacked the cards, we couldn’t believe our eyes at first. The two cards looked so familiar and so similar to the good old RADEON 9800 XT! The compact PCB and the flat single-slot cooling solution reminded us of RADEON 9800 XT in every way. Only the stickers on the PCB and the memory chip markings indicated that we were looking at graphics cards based on the new generation chips.
Take a look for yourselves:

Well, this is a pretty modest design compared with GeForce 6800 Ultra, don’t you think? The seeming simplicity of the PCB design surprised us most of all, as we had expected to see really monstrous graphics cards.
But appearances can be deceiving: these really were RADEON X800 XT Platinum Edition and RADEON X800 PRO, ATI’s 16- and 12-pipeline monsters.
All in all, RADEON X800 XT Platinum Edition and RADEON X800 PRO have a design very similar to that of RADEON 9800 XT, and look perhaps even simpler. Most of the fairly compact PCB is covered with a shield, and very few SMD components are soldered on.
Just like the RADEON 9800 XT PCB, this one carries a standard connector for additional power, because the card cannot run without one. Next to this power connector there is a yellow connector that looks like the internal audio port of contemporary sound cards. According to an ATI Technologies representative, it is none other than a composite video input. With a special cable attached to this connector, the video-in jack can be placed on the front panel of the PC case. This is a rather original and non-standard solution we haven’t seen anywhere before, but it will definitely make sense in certain situations.
The front side of the PCB is not that impressive, and neither is the back side. You will find only the chips responsible for graphics processor and graphics memory voltage regulation, and the usual set of SMD components. The only remarkable thing there is the ATI Rage Theater chip implementing the VIVO function. Interestingly, for some mysterious reason the cards were equipped with the old Rage Theater chips instead of the more up-to-date Rage Theater 200.
You can also notice a metal clip on the back side of the PCB. It is part of the cooler retention mechanism, and it also protects the graphics card from physical damage caused by excessive bending of the PCB.
The VPU cooling system is almost identical to the one we saw on RADEON 9800 XT:

The cooler design is pretty simple: a copper base with copper fins and a 70mm fan, covered with a plastic casing. The casing has an opening so that air can reach the fan freely, and it guides the airflow from the fan along the heatsink fins. The warm air is then blown out upward, on the side opposite the fan.
This is a simple, compact and fairly efficient solution, but will it be enough to cool a chip of more than 150 million transistors?
As we have already mentioned, the power consumption and heat dissipation of the R420 chip are claimed to be no higher than those of R360, so I expected the answer to be positive.
Practical tests proved our point: the cooler did its job very well. The heatsink was far from cold, but its temperature gave us no cause for concern.
Just like the reference RADEON 9800 XT, the cooling system produced barely any noise. After spinning up to maximum speed during boot-up, the fan immediately slowed down to about half speed. Only a few times did the fan react to a rise in VPU temperature by spinning faster and making a little more noise.
So, in terms of noise level, the new ATI Technologies solutions look much more attractive than NVIDIA GeForce 6800 Ultra.
Since we had two graphics cards based on different RADEON X800 versions, we decided to remove the coolers from both to see whether there are any visual differences between the two graphics processors. If there were, we would certainly try to find a way to unlock the 4 additional pipelines on RADEON X800 PRO.
However, when we removed the heatsinks, we discovered no differences between the chips. It looks like the chip type, and hence its pixel pipeline configuration, is stored in the graphics card BIOS or in special read-only chip registers. In other words, it is not determined by resistors on the chip package or on the graphics card PCB. Here are a few close-ups of RADEON X800 XT and RADEON X800 PRO:
The markings suggest that one of the chips was manufactured in week 11 of this year, and the other in week 13. So, they are hardly any newer than the NV40 chip on our GeForce 6800 Ultra sample during that test session.
Both graphics cards carry 256MB of GDDR3 memory in Samsung chips. However, RADEON X800 XT Platinum Edition uses memory with 1.6ns access time, rated for up to 600MHz (1200MHz DDR). RADEON X800 PRO features slower memory with 2ns access time, rated for a maximum of 500MHz (1000MHz DDR).

The nominal graphics memory frequency of RADEON X800 XT Platinum Edition is 560MHz (1120MHz DDR), while RADEON X800 PRO runs its memory at 450MHz (900MHz DDR). As you may have noticed, both graphics cards run their memory slightly below the rated maximum, probably for stability reasons.
It is also remarkable that the memory chips on our graphics cards did not have even the simplest heatsinks. Of course, GDDR3 memory dissipates considerably less heat than GDDR2, but at such high frequencies the memory chips definitely need additional cooling.
Since the memory chips on RADEON X800 XT had no extra cooling, they got scarily hot when the memory bus workload increased, and we couldn’t even touch them without burning our fingertips. Moreover, after a while we noticed a few visual artifacts on the monitor, typical of excessive memory overclocking. We immediately installed a 120mm fan blowing extra air along the graphics card, and this resolved the memory overheating issue completely.
Well, we can only hope that memory overheating is a problem of the samples we got for our tests. We also hope graphics card makers will keep it in mind and equip their mass-production cards with memory cooling.
So, our first impressions of the graphics cards were highly positive. The new RADEON X800 XT Platinum Edition and RADEON X800 PRO look very good against NVIDIA GeForce 6800 Ultra, despite the few drawbacks we discovered and pointed out to you.
Now it’s high time we paid more attention to the architecture of the new graphics processors from ATI.
Along with the new X800 (R420) architecture and the RADEON X800 XT and RADEON X800 PRO graphics processor family, ATI also introduced a new concept of High Definition Gaming.
The concept of High Definition Gaming grew out of the spread of HDTV (high definition television) and HDTV-capable display devices.
HDTV image quality can hardly be compared with ordinary image quality. It is not just a much higher resolution or level of detail, but an entirely new level of image quality. ATI compares the arrival of HDTV with the transition from black-and-white to color images: most people who have once seen HDTV image quality find it hard to go back to regular pictures.
ATI likens the impression made by HD Gaming on personal computers to the shift to HDTV video quality. This means consistently high frame rates at the highest resolutions, including 1600x1200 and 1920x1080. It also means an incredible level of detail, a new level of image quality, realistic effects and truly living worlds.
The X800 architecture that makes HD Gaming possible includes a number of completely new technologies. It also includes a number of technologies inherited from the previous architecture and improved accordingly. The marketing names of the inherited technologies remain the same, but the numeric indexes have been replaced with an HD suffix:
SMARTSHADER HD: X800 vertex and pixel processors;
SMOOTHVISION HD: anisotropic filtering and full-screen anti-aliasing algorithms;
HYPER Z HD: technologies improving the efficiency of the available memory bus bandwidth;
3Dc: new compression method for normal maps.
Before we go into the details hiding behind these marketing names, here is a comparative table of the features of the highest-performance graphics processors of the previous and current generations: ATI RADEON X800 XT Platinum Edition, ATI RADEON 9800 XT, NVIDIA GeForce 6800 Ultra and NVIDIA GeForce FX 5950 Ultra.
| | ATI RADEON X800 XT Platinum Edition | ATI RADEON 9800 XT | NVIDIA GeForce 6800 Ultra | NVIDIA GeForce FX 5950 Ultra |
|---|---|---|---|---|
| Manufacturing technology | 0.13micron low-k | 0.15micron | 0.13micron | 0.13micron |
| Number of transistors | 160 mln | 110 mln | 220 mln | 125 mln |
| Clock frequency | 520MHz | 412MHz | 400MHz | 475MHz |
| Graphics memory controller | 256bit DDR/DDR2/GDDR3 SDRAM | 256bit DDR/DDR2 SDRAM | 256bit DDR/DDR2/GDDR3 SDRAM | 256bit DDR SDRAM |
| Graphics memory clock frequency | 1120 (560 DDR) MHz | 730 (365 DDR) MHz | 1100 (550 DDR) MHz | 950 (475 DDR) MHz |
| Memory bus peak bandwidth | 33.4GB/s | 21.8GB/s | 32.8GB/s | 28.3GB/s |
| Maximum graphics memory size | 512MB | 256MB | 512MB | 256MB |
| Interface | AGP 3.0 4x/8x | AGP 3.0 4x/8x | AGP 3.0 4x/8x | AGP 3.0 4x/8x |
| Pixel processors, pixel shaders | | | | |
| Shader model | 2.x | 2.0 | 3.0 | 2.x |
| Static loops and branching | yes | no | yes | no |
| Dynamic loops and branching | no | no | yes | no |
| Multiple Render Targets | yes | yes | yes | no |
| Floating-Point Render Target | yes | yes | yes | no |
| Maximum number of pixels per clock cycle | 16 | 8 | 16 | 4 |
| Maximum number of Z values per clock cycle | 16 | 8 | 32 | 8 |
| Number of texturing samples | 16 | 8 | 16 | 8 |
| Texture filtering algorithms | Bilinear, trilinear, anisotropic | Bilinear, trilinear, anisotropic | Bilinear, trilinear, anisotropic | Bilinear, trilinear, anisotropic |
| Maximum level of anisotropy | 16x | 16x | 16x | 8x |
| Vertex processors, vertex shaders | | | | |
| Shader model | 2.x | 2.0 | 3.0 | 2.x |
| Number of vertex processors | 6 | 4 | 6 | 3 |
| Static loops and branching | yes | yes | yes | yes |
| Dynamic loops and branching | no | no | yes | yes |
| Reading textures from the vertex shader | no | no | yes | no |
| Tessellation | no | no | no | no |
| Full-Screen Anti-Aliasing | | | | |
| FSAA algorithms | Rotated-grid multi-sampling, … | Rotated-grid multi-sampling | Ordered-grid super-sampling, … | Super-sampling, … |
| Number of samples | 2, 4, 6 | 2, 4, 6 | 2..8 | 2..8 |
| Technologies increasing the efficiency of the memory bus bandwidth | | | | |
| Hidden Surface Removal (HSR) | yes | yes | yes | yes |
| Texture, Z-buffer, frame buffer compression | yes | yes | yes | yes |
| Fast Z-buffer clear | yes | yes | yes | yes |
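The peak bandwidth row follows directly from bus width and effective memory clock. The short sketch below recomputes it. The table’s figures (33.4, 21.8, 32.8 and 28.3) are reproduced when “GB” is read as 2^30 bytes. In decimal gigabytes the same cards give 35.8, 23.4, 35.2 and 30.4 GB/s. The function name is only an illustrative helper.

```python
def peak_bandwidth(bus_bits, effective_mhz):
    """Peak memory bandwidth = bus width in bytes x effective (DDR) transfer rate.
    Returns (decimal GB/s, binary GiB/s)."""
    bytes_per_second = bus_bits / 8 * effective_mhz * 1e6
    return bytes_per_second / 1e9, bytes_per_second / 2 ** 30

cards = {
    "RADEON X800 XT Platinum Edition": (256, 1120),
    "RADEON 9800 XT": (256, 730),
    "GeForce 6800 Ultra": (256, 1100),
    "GeForce FX 5950 Ultra": (256, 950),
}
for name, (bits, mhz) in cards.items():
    gb, gib = peak_bandwidth(bits, mhz)
    print(f"{name}: {gb:.1f} GB/s (decimal), {gib:.1f} GB/s (binary)")
```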
The specifications in this table lead to one conclusion. Thanks to its higher clock frequency, RADEON X800 XT Platinum Edition should not be slower than NVIDIA GeForce 6800 Ultra, and the functional differences should be minimal, even though ATI doesn’t claim support for shader version 3.0. ATI only states that its solution supports shader version 2.x. However, the functional level of the new RADEON X800 series is closer to shader version 3.0 than to the good old shaders of the R300 generation.
But let’s not get ahead of ourselves and start from the very beginning.
The first thing that catches your eye in the specs of the new RADEON X800 is the higher clock frequencies and the larger number of pixel pipelines: RADEON X800 XT Platinum Edition features 16 pipelines, and RADEON X800 PRO features 12.
However, it wouldn’t be quite correct to say that RADEON X800 XT and X800 PRO feature 16 and 12 pixel pipelines respectively, even though the simplified RADEON X800 block diagram does show all 16 of them:

The 16 pixel pipelines of RADEON X800 are split into 4 groups of 4 pipelines each. In other words, RADEON X800 in fact has not 16 pixel pipelines but only 4, and each of these 4 “wide” pipelines processes not individual pixels but groups of four pixels.
As a result, the XT version of RADEON X800 has 4 such pipelines, which corresponds to 16 “regular” pipelines. The PRO version has only three of them and thus outputs 12 pixels per clock.
Remarkably, all versions of the RADEON X800 graphics processor are physically identical, and all of them contain the full set of “wide” pixel pipelines, but only RADEON X800 XT uses all four. RADEON X800 PRO has one pipeline disabled, and RADEON X800 SE will have two “wide” pipelines disabled. This approach works well for ATI. Because all VPU versions use identical dies, chips with defects in one or two “wide” pipelines can be sold as slower VPU modifications by simply disabling the defective quarters. This also increases overall production yields, as fewer chips end up in the waste basket. Moreover, ATI claims that disabling pipelines will not affect the operation of HYPER Z HD. In other words, we should no longer see the RADEON 9800 SE problem. On that card, half the pipelines and HyperZ were both disabled, which resulted in an even more severe performance drop.
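The pipeline arithmetic above translates into a theoretical peak fill rate: enabled “wide” pipelines × 4 pixels × core clock. In the sketch below, the 4-bit enable mask is a hypothetical illustration of “disabling the defective quarter”. The article only suggests that the configuration lives in the BIOS or read-only registers, not how it is encoded. RADEON X800 SE is left out because its clock is not known yet.

```python
PIXELS_PER_QUAD = 4  # each "wide" pipeline processes one 2x2 quad per clock

def enabled_quads(enable_mask):
    """Count enabled "wide" pipelines in a hypothetical 4-bit enable mask."""
    return bin(enable_mask & 0b1111).count("1")

def peak_fill_rate_gpix(enable_mask, core_mhz):
    """Theoretical peak pixel fill rate in gigapixels per second."""
    return enabled_quads(enable_mask) * PIXELS_PER_QUAD * core_mhz / 1000

print(peak_fill_rate_gpix(0b1111, 520))  # X800 XT Platinum Edition: all four quads
print(peak_fill_rate_gpix(0b1011, 475))  # X800 PRO: one quad (bit 2 here) switched off
```

Which bit is cleared does not matter for throughput. This is exactly why a die with any single defective quarter can become a PRO.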
But let’s get back to pixel pipelines of the R420.
So, each of the four “wide” pipelines in RADEON X800 processes 4 pixels arranged as a 2x2 block (a quad). At the same time, many such 2x2 blocks (dozens or hundreds) can be in processing or queued for processing. This organization hides the latency of texture sampling and filtering, which could otherwise amount to dozens or hundreds of clock cycles.
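To make the 2x2 organization concrete, the following sketch groups covered pixels into quads and reports how many lanes of each quad do useful work. The lane numbering and function names are hypothetical. It illustrates a general property of quad-based pipelines rather than a documented R420 detail: a quad only partly covered by a polygon typically still occupies a whole “wide” pipeline slot.

```python
from collections import defaultdict

def group_into_quads(pixels):
    """Map covered (x, y) pixels to 2x2 quads; value = list of covered lanes 0..3."""
    quads = defaultdict(list)
    for x, y in pixels:
        # 0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right
        lane = ((y & 1) << 1) | (x & 1)
        quads[(x >> 1, y >> 1)].append(lane)
    return dict(quads)

def lane_utilization(quads):
    """Fraction of issued lanes that carry covered pixels."""
    covered = sum(len(lanes) for lanes in quads.values())
    return covered / (4 * len(quads))

# A 3x2 pixel span: one full quad plus one half-covered quad.
span = [(x, y) for y in range(2) for x in range(3)]
quads = group_into_quads(span)
print(quads, lane_utilization(quads))
```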

Each “wide” pipeline works independently of the others and has its own resources: texture units, pixel processors and, finally, the pixel processors’ cache memory (Shader State Memory). This cache stores the information for each pixel in flight, such as the state of the shader being executed, temporary registers and constants, interpolated texture coordinates, color, etc.
R420 can use more temporary registers during shader processing than R3x0: their number has grown from 12 to 32. This makes R420 even less sensitive to shader complexity. As you remember, NVIDIA NV3x pixel processors lost much of their efficiency on complex shaders that required a lot of temporary registers.
The internal precision of the floating-point calculations performed by the pixel processors remains the same: RADEON X800 accepts data in 16bit and 32bit floating-point formats, but performs all calculations in 24bit format only.
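A quick way to see what 24bit precision means is to round a value to the number of significand bits each format keeps. The sketch assumes the commonly cited layouts: FP16 with 10 stored mantissa bits, ATI FP24 with 16, and FP32 with 23, each plus an implicit leading 1. It ignores exponent range, denormals and the exact hardware rounding mode, so it models mantissa precision only.

```python
import math

# Significand bits including the implicit leading 1 (assumed standard layouts).
SIGNIFICAND_BITS = {"fp16": 11, "fp24": 17, "fp32": 24}

def round_to_precision(x, bits):
    """Round x to `bits` significant binary digits (Python's round: half to even)."""
    if x == 0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2 ** bits) / 2 ** bits, e)

value = 1 / 3
for name, bits in SIGNIFICAND_BITS.items():
    approx = round_to_precision(value, bits)
    print(f"{name}: {approx!r}  relative error {abs(approx - value) / value:.1e}")
```

Each extra mantissa bit halves the worst-case relative rounding error. FP24 therefore sits roughly 64 times closer to FP32 than FP16 does, in terms of per-operation error.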
The computational power of the RADEON X800 pixel processors is significantly higher than in the previous architecture, because the number of scalar and vector arithmetic-logic units (ALUs) has doubled. Former ATI pixel processors had one vector ALU, one scalar ALU and one texture addressing unit, so they could execute up to three instructions per pixel per clock cycle. ATI RADEON X800 has twice as many scalar and vector ALUs, so its pixel processors can execute up to 5 instructions per pixel per clock cycle.
Besides all the other improvements, RADEON X800 pixel processors can now execute much longer shaders than the previous generation could. The maximum number of vector math instructions and the maximum number of scalar math instructions were each increased from 64 to 512. The number of texturing instructions went from 32 to 512. Altogether, the maximum shader length grew from 64 + 64 + 32 = 160 to 512 + 512 + 512 = 1536 instructions.
The drastic increase in the maximum number of shader instructions lets RADEON X800 execute much more complex and resource-hungry pixel shaders more efficiently. A shader may be so complex that the VPU resources are insufficient to process it in a single pass. In that case RADEON X800 splits the processing into several stages: the shader is divided into fragments, which are processed one after another. The intermediate results of each fragment are temporarily stored in a special buffer called the Fragment Stream FIFO Buffer. This is the same F-Buffer that was officially announced together with RADEON 9800/9800 PRO, along with a number of improvements intended to make multi-pass shader operations more efficient.
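The multi-pass idea can be sketched as follows. This is a hypothetical model, not ATI’s driver logic. It splits a shader by instruction count alone, whereas real splitting also has to respect registers and other resources. It treats each instruction as a Python function of one value, and it uses a FIFO queue to stand in for the F-Buffer that carries per-pixel intermediate results between passes.

```python
from collections import deque

def split_into_passes(instructions, max_per_pass):
    """Cut a long shader into consecutive fragments of at most max_per_pass instructions."""
    return [instructions[i:i + max_per_pass]
            for i in range(0, len(instructions), max_per_pass)]

def run_multipass(pixel_inputs, instructions, max_per_pass):
    """Execute the shader fragment by fragment, parking intermediate results in a FIFO."""
    fifo = deque(pixel_inputs)               # stands in for the F-Buffer
    for fragment in split_into_passes(instructions, max_per_pass):
        next_fifo = deque()
        while fifo:                          # results leave in the order they entered
            value = fifo.popleft()
            for op in fragment:
                value = op(value)
            next_fifo.append(value)
        fifo = next_fifo
    return list(fifo)

shader = [lambda v: v * 2, lambda v: v + 1] * 3           # six "instructions"
single = run_multipass([1.0, 2.0], shader, len(shader))   # fits in one pass
multi = run_multipass([1.0, 2.0], shader, 4)              # forced into two passes
print(single, multi)
```

The point of the FIFO is that splitting is invisible to the result: the two-pass run produces exactly what a single long pass would, at the cost of extra buffer traffic.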
Compared with the previous architecture, ATI RADEON X800 offers wider functionality. However, it still doesn’t support dynamic branching and looping in pixel shaders. Therefore, ATI claims support only for shader version 2.x, not 3.0, because dynamic loops and branching are a requirement of shader version 3.0.
ATI certainly sees the benefits of the 3.0 shader model, but believes that the time of 3.0 shaders hasn’t come yet: on existing architectures, dynamic loops and branching inevitably cause a performance drop. Even NVIDIA warns against careless use of dynamic branching, and this is definitely not what the manufacturers want. At the same time, adding such support would require a significant redesign of the RADEON X800 pixel processors, which were designed from the very beginning for linear, branch-free shader execution. Having weighed all the pros and cons, ATI engineers decided against fully-fledged shader version 3.0 support. Instead, they are most likely to introduce a “cut-down” version of the standard without dynamic loops and branching, which will be called 2.b.
The biggest advantage of the RADEON X800 pixel processor architecture is its stable efficiency and predictable performance. Unlike NVIDIA’s solutions, ATI RADEON X800, just like the previous ATI VPUs, is much less sensitive to the number of temporary registers used or to the mix of math and texturing instructions in a shader. This means developers will be able to create efficient shaders for RADEON X800 with less effort.
I should also mention that with the launch of RADEON X800, ATI has decided to follow in NVIDIA’s footsteps and introduce its own shader compiler optimizations. The higher computational capacity of the RADEON X800 pixel processors has to be used as efficiently as possible, so the primary goal is to minimize situations in which some pixel processor ALUs stay idle. Therefore, the compiler analyzes the original shader code and rearranges the instructions so that they can be executed in parallel.
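The kind of reordering described above can be illustrated with a greedy list scheduler. This is a generic textbook technique, not ATI’s compiler. The sketch rests on several assumptions. There is one vector and one scalar issue slot per cycle (R3x0-style co-issue; RADEON X800’s doubled ALUs would simply mean more slots). The code is SSA-style, so every register is written once. Results become available in the next cycle. The `Instr` type and the opcode names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    unit: str          # "vec" or "sca": which issue slot the instruction needs
    dest: str          # register written (assumed to be written only once)
    srcs: tuple = ()   # registers read

def schedule(instrs):
    """Greedy list scheduling with one vector and one scalar slot per cycle."""
    produced = {i.dest for i in instrs}
    ready_regs = set()          # results available at the start of the current cycle
    pending = list(instrs)      # original program order doubles as priority order
    cycles = []
    while pending:
        issued = {}
        for ins in pending:
            if ins.unit in issued:
                continue
            # An operand is ready if it is an input or was produced in an earlier cycle.
            if all(s in ready_regs or s not in produced for s in ins.srcs):
                issued[ins.unit] = ins
        if not issued:
            raise ValueError("unsatisfiable dependencies")
        cycles.append(tuple(issued.values()))
        for ins in issued.values():
            pending.remove(ins)
            ready_regs.add(ins.dest)
    return cycles

prog = [
    Instr("mul", "vec", "r0", ("v0", "v1")),
    Instr("add", "vec", "r1", ("r0", "c0")),  # depends on mul
    Instr("rcp", "sca", "r2", ("v2",)),
    Instr("rsq", "sca", "r3", ("r2",)),       # depends on rcp
]
for cycle in schedule(prog):
    print([ins.name for ins in cycle])
```

Executed in program order with one instruction per cycle, this fragment takes four cycles. Pairing each vector operation with an independent scalar one brings it down to two. That idle-ALU reduction is what the compiler is after.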
A little later I will also try to estimate in practice how efficiently RADEON X800 copes with pixel shaders. For now, let’s dwell on the RADEON X800 vertex pipelines.
First of all, RADEON X800 has more vertex processors: their number has been increased to 6, whereas the previous generation high-end graphics chips from ATI had only 4. Add the higher clock frequency, and the minimum geometry performance gain from moving to RADEON X800 is easy to estimate. Assuming unchanged per-processor throughput, 6/4 × 520MHz/412MHz ≈ 1.9 times the peak vertex throughput of RADEON 9800 XT.

The geometry part of RADEON X800 consists of 6 vertex processors and a set of non-programmable functional units. These units project the vertices processed by the vertex processors onto the screen plane, set up initial parameters for HyperZ, split triangles into tiles, set up initial parameters for the pixel processors, create queues of 4-pixel blocks, etc.
Each vertex processor of the new RADEON X800 has two ALUs that can execute instructions in parallel: a vector ALU and a scalar ALU. Calculations are done in 32bit floating-point format.
Like the pixel processors, RADEON X800 vertex processors do not support dynamic branching and loops in vertex shaders. Together with the lack of texture sampling from vertex shaders, this dashes all hopes of seeing hardware support for vertex shader version 3.0 in RADEON X800.
So, RADEON X800 also brings a few vertex processing improvements over the previous generation architecture: both the functionality of the vertex processors and the overall geometry performance of RADEON X800 have been improved.
Later in this review we will evaluate the vertex shader processing speed and geometry performance of RADEON X800 XT Platinum Edition and RADEON X800 PRO in synthetic benchmarks. Now let’s say a few words about HyperZ HD, a new technology aimed at using the graphics processor resources more efficiently.
HyperZ HD is a further development of the hierarchical tile Z-buffer idea, which has been implemented in all ATI solutions starting with RADEON 256.
The main goal of HyperZ HD is to make sure that VPU resources are not wasted on polygon pixels that lie behind already drawn ones in the final scene and hence will not be visible at all. The simplest example: imagine you are standing in front of a closed door. Why draw the room interior if you cannot see it anyway?
To discard whole blocks of pixels at once, HyperZ HD uses not only the “regular” Z-buffer but also a low-resolution, or tile, Z-buffer. Each entry of a tile Z-buffer stores the largest (farthest) Z value of an entire block, or tile, of pixels.
RADEON X800 appears to use two tile Z-buffers with different resolutions. Let’s consider the algorithm RADEON X800 uses to work with these hierarchical tile Z-buffers; ATI calls it Hierarchical Z.
When a polygon arrives for processing, HyperZ HD splits it into 8x8 pixel blocks and finds the minimum Z value over the 64 pixels of each block. Since Z varies linearly across a planar polygon, the minimum is always found at one of the corner pixels, so it is enough to check only the 4 Z values at the corners of the block.
Suppose this Z value is higher than the value previously stored for this block in the tile Z-buffer. That stored value was written either while processing previous polygons or during Z-buffer initialization before the frame was rendered. In that case the entire 8x8 pixel block can be skipped, because even the closest pixel of this block lies farther away than the farthest of the previously drawn pixels.
If the smallest Z value of the pixels in the 8x8 block is smaller than the Z value previously stored for this block, then the entire block, or at least part of it, is visible. In this case the 8x8 block is split into four 4x4 pixel blocks, and a similar check is done for each of these smaller 16-pixel groups.
The Z values for 4x4 blocks are stored in the second, finer tile Z-buffer. If the minimum Z value of the pixels in a 4x4 block is higher than the value stored for it, the block is discarded.
Finally, once a 4x4 pixel block is found to be fully or partially visible, HyperZ HD stops the block-by-block checks and gets down to its “classical” work. It checks the Z values of individual pixels against the values stored in the regular Z-buffer (Early Z Test).
For the whole algorithm to work correctly while the scene is being rendered, i.e. while polygons are drawn, the information in the main Z-buffer and in the tile Z-buffers must be constantly updated. The VPU writes the calculated per-pixel Z values into the Z-buffer, and the maximum Z values of the 4x4 and 8x8 pixel blocks into the two tile Z-buffers.
Besides that, the Z-buffers have to be initialized before rendering of a new frame begins. Remarkably, thanks to this organization of the hierarchical Z-buffer, there is no need to write initial values into the main Z-buffer. Instead, it is enough to initialize the “coarse” tile Z-buffer holding the Z values of the 8x8 pixel blocks. The other Z-buffers get their actual values during rendering.
ATI calls this option Fast Z Clear. Really, instead of writing initial values in the main Z-buffer, which is about 8MB in 1600x1200 resolution, we can only save the values in the “rough” tile Z-buffer corresponding to 8x8 pixel blocks and thus transfer 64 times less data.
With 4 “wide” pixel pipelines, RADEON X800 can check the Z values for 4 8x8 pixel blocks per clock cycle. And if they appear invisible, then we can immediately exclude 256 pixels from consideration. Really nice efficiency, don’t you agree?
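The Fast Z Clear and culling figures above come down to simple arithmetic. The sketch below reproduces them assuming 4 bytes per main Z-buffer entry and the same entry size in the coarse tile buffer; both sizes are illustrative assumptions rather than documented hardware details.

```python
def fast_z_clear_bytes(width, height, tile=8, bytes_per_z=4):
    """Bytes written when clearing the main Z-buffer vs. only the coarse tile buffer.

    Assumes 4 bytes per entry in both buffers (illustrative assumption)."""
    full_clear = width * height * bytes_per_z
    coarse_clear = (width // tile) * (height // tile) * bytes_per_z
    return full_clear, coarse_clear


def pixels_rejected_per_clock(pipelines=4, tile=8):
    """Pixels culled per clock if every checked 8x8 tile turns out to be hidden."""
    return pipelines * tile * tile


full, coarse = fast_z_clear_bytes(1600, 1200)
print(full, coarse, full // coarse)  # 7680000 120000 64
print(pixels_rejected_per_clock())   # 256
```

7,680,000 bytes is the “about 8MB” quoted for the main Z-buffer, and each coarse entry stands in for 8 × 8 = 64 pixels, hence the 64-fold reduction.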
Besides increasing the efficient use of the graphics processor resources by omitting invisible pixels, HyperZ HD allows to significantly reduce the amount of info referring to Z, which has to be transferred along the memory bus: hierarchical Z-buffers are stores in cache, unlike the main Z-buffer. Moreover, the cache is big enough to store both: hierarchical Z-buffers as well as the required fragments of the main Z-buffer even in the highest resolutions.
So, when we compare RADEON X800 XT Platinum Edition and RADEON X800 PRO with the newest graphics processors from NVIDIA and with the previous-generation chips in the gaming tests, we will certainly try to find out how efficient HyperZ HD actually is. And now let’s take a break from 3D and look at the way RADEON X800 plays back and processes video streams.
Unlike the NVIDIA developers, who equipped their offspring, the NV40 GPU, with special functions for video stream processing, ATI Technologies’ engineers decided to leave well alone and kept everything as it was. In other words, R420 features the same video processing mechanisms as the R3x0/RV3x0 family. However, this doesn’t mean that R420’s video processing is weak. Since the days of RADEON 9700 PRO, ATI graphics adapters have been able to use the computational power of their pixel processors for this purpose (the VIDEOSHADER technology), so RADEON X800 has no problems with video processing. VIDEOSHADER helps the graphics processor perform de-interlacing, reduce noise, convert color spaces and apply a kind of anti-aliasing (deblocking) that removes the block artifacts typical of MPEG-4/DivX compression.
As with any other new graphics chip, we tested ATI RADEON X800 during playback of videos in a number of different formats: HDTV, MPEG-4/DivX and MPEG-2/DVD. In all three cases we used Windows Media Player 9.
For your convenience we summed up all the test results for R420, NV40 and DeltaChrome S8 in pretty easy to read diagrams:

HDTV playback brought no special surprises: CPU utilization remained quite acceptable despite the high quality of this video stream. The GeForce 6800 Ultra based graphics card failed this test because of driver issues: CPU utilization always rose to 100%.
We played the movie in 1440x1080 resolution with 8Mbit/sec bitrate.

When we played the movie encoded with DivX at 640x480 resolution, the newcomer also performed quite well, although its CPU utilization was somewhat higher than with S3 DeltaChrome S8 Nitro working in nominal mode. With the Chromotion engine fully enabled, the S3 solution took first place.

The DVD playback test also caused no problems. Moreover, RADEON X800 XT performed best of all among the graphics cards that do not use any special enhancements for video decoding. Even with the Chromotion engine of S3 DeltaChrome S8 Nitro enabled, CPU utilization was only a tiny bit lower.
As for image quality during video playback, it was almost the same in all cases, and we have no complaints about it.
When RADEON X800 appeared, ATI Company introduced a new data compression algorithm for normal maps. This algorithm is known as 3Dc.
Normal maps are the next step in bump mapping techniques, and they are getting more and more popular today. The idea behind normal maps is to store information about the object surface as a texture, where each texture element holds the three components of a vector perpendicular to the surface at a given point, i.e. of the normal vector.
Normal maps make it possible to obtain a much more detailed and realistic image without increasing the number of polygons used to build it. In practice, a normal map can be created from the difference between a high-polygon model and a simple model of the object. Later on you can use only the simple model, and the normal map applied to it will make it look almost as good as the reference one.

Normal maps describing the object surface are usually applied together with base textures storing the color information of the object. The higher the level of detail of the textures and normal maps, the more realistic the image looks. However, high-resolution textures and normal maps increase the memory bus workload, which is why alternative solutions are needed to retain a high level of performance. DirectX 9 offers a set of DXTC algorithms that provide efficient texture compression and, despite the information lost during compression, retain acceptable texture quality. However, normal maps cannot be compressed with DXTC algorithms: sharp changes of the normal vector get lost during compression, and we get block artifacts instead.

The 3Dc algorithm, supported by RADEON X800 in hardware, is intended for normal map compression and causes a much smaller loss of image quality.
Let’s consider the 3Dc algorithm in a bit more detail now.
Firstly, the initial normal maps store three components of the normal vector in each element, while 3Dc stores only two. Once the normal vector is normalized to unit length, one of its components becomes redundant: it can later be restored from the two remaining components and the fact that the vector has unit length.
During compression of two-component normal maps, 3Dc splits the map into blocks of 4x4 elements, where each element consists of two normal vector components. The two components of the 16 block elements are compressed separately. For the first components, 3Dc finds the minimal and maximal values among the 16 elements of the block and saves them as they are. Then a linear scale of 8 values is built between the found minimum and maximum. Each of the 16 first components is replaced with the closest value from this scale and stored as a three-bit index, the number of that closest value in the scale.
The second components of the normal map are compressed in the same way. As a result, the 4x4 block, which initially consisted of 16 elements with two 8-bit normal vector components each, i.e. a 256-bit block, turns into a record of two 8-bit minimal values, two 8-bit maximal values and 32 three-bit indexes, which makes a total of 128 bits.
This way, when compressing two-component normal maps, 3Dc provides a 2:1 compression ratio. The overall compression ratio, including the move from the regular three-component vector description to the two-component one, is 4:1 (assuming the uncompressed map stores each element in 32 bits).
When normal maps compressed with 3Dc are used, RADEON X800 stores them in graphics memory and transfers them to the VPU in compressed form, decompressing them on the fly. Restoring the third normal vector component in the pixel shader doesn’t take many resources: only a few additional shader instructions are needed.
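Restoring the dropped component is a one-line computation once the vector is known to have unit length. The sketch below assumes a tangent-space normal map in which the third component is never negative, which resolves the sign ambiguity of the square root; the function name is hypothetical, and a real pixel shader would do the same with a few arithmetic instructions.

```python
import math


def reconstruct_normal(x, y):
    """Rebuild a unit normal from its two stored components (x, y).

    Assumes z >= 0 (tangent-space normal map), so z = +sqrt(1 - x^2 - y^2)."""
    z_sq = 1.0 - x * x - y * y
    # Clamp: quantization can push x^2 + y^2 slightly above 1.
    z = math.sqrt(max(z_sq, 0.0))
    return (x, y, z)


print(reconstruct_normal(0.6, 0.0))  # z is (very nearly) 0.8
```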
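The block scheme described above can be sketched in a few lines. This follows the article’s simplified description (stored minimum and maximum, an 8-step linear scale between them, one 3-bit index per value); the actual 3Dc hardware format defines its own bit layout and index ordering, and the texel values below are hypothetical.

```python
def encode_channel(values):
    """Compress 16 8-bit values into (min, max, sixteen 3-bit indices)."""
    assert len(values) == 16
    lo, hi = min(values), max(values)
    # Linear scale of 8 levels between the stored minimum and maximum.
    levels = [lo + (hi - lo) * i / 7 for i in range(8)]
    # Each value is replaced by the index (0..7) of its closest level.
    indices = [min(range(8), key=lambda i: abs(levels[i] - v)) for v in values]
    return lo, hi, indices


def decode_channel(lo, hi, indices):
    """Rebuild approximate 8-bit values from the min/max pair and indices."""
    return [round(lo + (hi - lo) * i / 7) for i in indices]


def encode_block(xs, ys):
    """Compress a 4x4 block of two-component normals, one channel at a time."""
    return encode_channel(xs), encode_channel(ys)


def encoded_bits(block):
    """8-bit min + 8-bit max + 3 bits per index, summed over both channels."""
    return sum(8 + 8 + 3 * len(indices) for _lo, _hi, indices in block)


xs = list(range(0, 160, 10))  # hypothetical first components of 16 texels
ys = [128] * 16               # hypothetical second components
block = encode_block(xs, ys)
uncompressed_2c = 16 * 2 * 8  # two 8-bit components per texel
uncompressed_32bpp = 16 * 32  # assumption: original map stored at 32 bits per texel
print(encoded_bits(block), uncompressed_2c // encoded_bits(block),
      uncompressed_32bpp // encoded_bits(block))  # 128 2 4
```

Because each channel gets its own minimum and maximum, a block where one component is nearly constant (like `ys` here) loses almost nothing, which is one reason the scheme copes better with normal maps than DXTC does.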

So, the new normal map compression method makes it possible either to reduce the memory bus workload immensely when highly detailed maps are used, or, on the contrary, to increase the level of detail of normal maps while the amount of data to be processed and transferred remains the same.
Updated algorithms of anisotropic filtering and full-screen anti-aliasing implemented in RADEON X800 are all united under ATI’s SMOOTHVISION HD name.
ATI claims that RADEON X800 received an enhanced version of the anisotropic filtering algorithm, which reduces the performance drop without any loss of image quality. Indeed, a pixel-by-pixel subtraction of images obtained on RADEON 9800 XT and RADEON X800 XT Platinum Edition reveals a few insignificant differences, which become more evident on distant MIP levels. However, you will hardly be able to notice these differences by simply looking at the pictures from the two cards.
So, in terms of anisotropic filtering, ATI RADEON X800 hardly offers anything brand new: the anisotropic filtering algorithm has been carried over from R3x0 with a few minor changes. And the image quality differences between RADEON 9800 XT and RADEON X800 XT Platinum Edition mentioned in the previous paragraph are really negligible.
To prove our point, we would like to offer you a few screenshot fragments taken in 3DMark 03 during the texture filtering quality tests.
First, let’s compare the quality of bi-linear filtering. In the upper left corner you see an image from ATI RADEON X800 XT Platinum Edition, and in the upper right corner, the image from ATI RADEON 9800 XT. The image in the lower left corner was taken on NVIDIA GeForce 6800 Ultra, and the last one, in the lower right corner, on NVIDIA GeForce FX 5950 Ultra.
ATI RADEON X800 XT Platinum Edition, RADEON 9800 XT and NVIDIA GeForce FX 5950 Ultra show almost the same picture, while GeForce 6800 Ultra uses a different algorithm for LOD (Level Of Detail) calculations and hence provides somewhat lower texture clarity at the following angles: +/-45 and +/-135 degrees.
Now let’s move on to tri-linear filtering. The driver settings are set for maximum image quality, but the tri-linear filtering optimizations of GeForce 6800 haven’t been disabled; we will talk more about them later today.
So, the graphics cards were tested in the same order: ATI RADEON X800 XT Platinum Edition, RADEON 9800 XT, NVIDIA GeForce 6800 Ultra and NVIDIA GeForce FX 5950 Ultra.
You can see from the screenshots that the location of the MIP-level borders remained unchanged, but with tri-linear filtering and highlighted (colored) MIP levels, the borders themselves turned into smooth color transitions. Graphics cards based on ATI graphics processors show a picture with fully-fledged tri-linear filtering, while the NVIDIA based solutions produce images built with a combination of bi-linear and tri-linear filtering.
Now we will enable anisotropic filtering (not the heaviest mode, of course) in addition to tri-linear filtering: AF 4x. The testing conditions remained unchanged, and the cards go in the same order as before: ATI RADEON X800 XT Platinum Edition, RADEON 9800 XT, NVIDIA GeForce 6800 Ultra and NVIDIA GeForce FX 5950 Ultra:
ATI RADEON X800 XT Platinum Edition, RADEON 9800 XT and NVIDIA GeForce 6800 Ultra show very similar images, where we can clearly see the “inconvenient” angles: at these angles the color stripes of the highlighted MIP levels move closer to the camera. NVIDIA GeForce FX 5950 Ultra has no clearly visible “inconvenient” angles for anisotropic filtering, as you can see from the screenshot.
Now let’s enable the maximum level of anisotropic filtering. For ATI graphics processors and GeForce 6800 Ultra from NVIDIA, this is 16x AF, while GeForce FX 5950 Ultra supports the maximum of 8x AF. The anisotropic filtering quality in the drivers is set to the maximum, tri-linear filtering is on. You can see a screenshot for ATI RADEON X800 XT Platinum Edition in the upper left corner, the one for RADEON 9800 XT in the upper right corner, the screenshot for NVIDIA GeForce 6800 Ultra is on your lower left, and the one for NVIDIA GeForce FX 5950 Ultra – on your lower right:
ATI RADEON X800 XT Platinum Edition, RADEON 9800 XT and NVIDIA GeForce 6800 Ultra again produced very similar images, differing from what GeForce FX 5950 Ultra showed us.
It is interesting that if we compare the image quality in the most favorable conditions, NVIDIA GeForce FX 5950 Ultra loses to all the other graphics cards tested: you can clearly see this from the MIP-level locations, for instance, on the horizontal surfaces in the test scene. However, if we compare texture quality in the most unfavorable conditions, GeForce FX 5950 Ultra outperforms all the other graphics cards, because their algorithms have a few “inconvenient” angles where texture clarity becomes significantly worse than what GeForce FX 5950 Ultra delivers with 8x anisotropic filtering. You can clearly see it from the screenshots: the MIP-level stripes on ATI RADEON X800 XT/9800 XT and NVIDIA GeForce 6800 Ultra move much closer to the camera than on NVIDIA GeForce FX 5950 Ultra.
Actually, this situation is unlikely ever to change, because these “inconvenient” angles, where textures get less sharp, are the price you pay for the high performance of the new anisotropic filtering algorithms.
Any additional performance gain with anisotropic filtering enabled can be obtained only at the expense of further image quality degradation. For instance, NVIDIA graphics processors use a mixture of bi-linear and tri-linear filtering instead of fully-functional tri-linear filtering, and ATI chips use tri-linear filtering only for the first texture even in Quality mode when anisotropic filtering is forced.
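To illustrate what a “mixture of bi-linear and tri-linear filtering” can mean, the sketch below compares the blend weight between two neighboring MIP levels. Full tri-linear filtering blends by the fractional part of the LOD everywhere; the reduced variant stays purely bi-linear except in a narrow band around each MIP transition. This is a generic illustration of the idea: the band width is a hypothetical parameter, and the exact algorithm used by NVIDIA’s drivers is not described here.

```python
import math


def trilinear_weight(lod):
    """Blend factor toward the next (smaller) MIP level: the fractional part of LOD."""
    return lod - math.floor(lod)


def reduced_trilinear_weight(lod, band=0.25):
    """Hypothetical bi-/tri-linear mix: weight 0 (pure bi-linear) or 1 outside a
    narrow band around the middle of each LOD interval, linear ramp inside it.

    `band` is an illustrative assumption, not a documented driver value."""
    f = lod - math.floor(lod)
    start = 0.5 - band / 2
    t = (f - start) / band
    return min(max(t, 0.0), 1.0)


for lod in (2.1, 2.25, 2.5, 2.9):
    print(lod, round(trilinear_weight(lod), 3), round(reduced_trilinear_weight(lod), 3))
```

With full tri-linear filtering every pixel fetches from two MIP levels; with the reduced variant most pixels need only one, which saves texture bandwidth but makes MIP transitions sharper, which is exactly what becomes visible in motion.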
By the way, the control panel of NVIDIA driver allows disabling all anisotropic filtering optimizations for NVIDIA GeForce 6800 Ultra. In this case, all images obtained on GeForce 6800 Ultra get fully-fledged tri-linear filtering.
The screenshots below were taken on NVIDIA GeForce 6800 Ultra working in such a mode: in the upper left corner you see bi-linear filtering, in the upper right – tri-linear filtering, in the lower left – 4x anisotropic filtering combined with tri-linear filtering, and in the lower right – 16x anisotropic filtering combined with tri-linear filtering.
NVIDIA GeForce 6800 Ultra
We decided to estimate how the texture quality changes depending on the image quality settings in the drivers with the help of the good old Serious Sam game.
The image quality settings in Serious Sam were set to the maximum, and only tri-linear filtering was enabled in the game itself; the maximum level of anisotropic filtering was forced from the driver control panel.
At first let’s compare the results obtained from ATI RADEON X800 XT Platinum Edition and RADEON 9800 XT. The screenshots should be read as follows: RADEON X800 XT in Quality mode, RADEON X800 XT in Performance mode, RADEON 9800 XT in Quality mode, RADEON 9800 XT in Performance mode:
RADEON X800 XT
Quality mode | Performance mode |
![]() | ![]() |
RADEON 9800 XT
Quality mode | Performance mode |
![]() | ![]() |
The differences between the images obtained on RADEON X800 XT and RADEON 9800 XT cannot be noticed with the naked eye. In Performance mode, both RADEON X800 XT and RADEON 9800 XT disable tri-linear filtering, which can be seen not only in motion but also on static images.
Now let’s look at the screenshots from NVIDIA GeForce 6800 Ultra. First comes High Quality mode, then Quality mode, then Performance and then High Performance:
GeForce 6800 Ultra
High Quality mode | Quality mode |
![]() | ![]() |
Performance mode | High Performance mode |
![]() | ![]() |
It is really hard to notice any differences in image quality on a static picture. However, in motion, in Performance and High Performance modes you can see how sharply the MIP levels change because of the aggressive tri-linear filtering optimization.
If you look at the images with highlighted MIP levels, you will notice that when moving from the “quality” to the “performance” modes, the MIP levels of the second texturing layer get closer to the camera, which indicates that either the LOD is lowered or the maximum anisotropic filtering level is limited.
Finally, let’s take a look at the results of NVIDIA GeForce FX 5950 Ultra. First comes the Quality mode, then Performance, and then High Performance:
NVIDIA GeForce FX 5950 Ultra
Quality mode | Performance mode | High Performance mode |
![]() | ![]() | ![]() |
Since NVIDIA GeForce FX 5950 Ultra supports only 8x anisotropic filtering, textures on the farthest MIP levels look less sharp than on the previously considered graphics cards. All optimizations are enabled: degraded tri-linear filtering and a reduced maximum anisotropy level in the highest-performance modes.
To estimate the performance losses when anisotropic filtering is enabled, we tested our graphics cards in a few gaming applications. Anisotropic filtering quality was set to the maximum, and we enabled 4x, 8x and 16x anisotropic filtering:





We clearly see that RADEON X800 suffers the smallest performance losses with anisotropic filtering enabled. However, I wouldn’t consider this an immediate victory for ATI’s updated anisotropic filtering algorithm: in Unreal Tournament 2004, the performance of RADEON X800 XT/PRO was limited by the CPU, so enabling AF had hardly any influence on the results.
Call of Duty is the only game where performance dropped significantly once we enabled AF. Here RADEON X800 XT/PRO lost even more than the other testing participants, since the CPU and the rest of the system had less influence on the results of this test.
Interestingly, enabling fully-fledged tri-linear filtering on NVIDIA GeForce 6800 Ultra didn’t affect the results as much as we had expected: the graphics card slowed down by 10% at most.
The anti-aliasing algorithms implemented in R3x0 have been fully inherited by the new RADEON X800. The newcomer supports rotated-grid multi-sampling. We have already told you about the idea of multi-sampling, ATI’s gamma correction of samples and the advantages of rotated-grid multi-sampling in our articles called On the Way to Ideal Picture: Anti-Aliasing by Contemporary Graphics Cards and NVIDIA GeForce 6800 Ultra and GeForce 6800: NV40 Enters the Scene. That is why we will not go deep into details here, but will simply compare the anti-aliasing quality of polygon edges on ATI RADEON X800 XT Platinum Edition, RADEON 9800 XT, NVIDIA GeForce 6800 Ultra and NVIDIA GeForce FX 5950 Ultra.
As an example, just like in our GeForce 6800 Ultra review, we will take a Max Payne 2 scene:
The white squares mark the image fragments we are going to draw your attention to: the polygon edges there have the most interesting angles.
The ATI based graphics cards worked in 2x, 4x and 6x AA modes. NVIDIA based solutions worked at 2x, 4x and 8x AA modes.
Well, this is the first image fragment: the polygon edges are almost horizontal. The screenshots should be read as follows: ATI RADEON X800 XT Platinum Edition, RADEON 9800 XT, NVIDIA GeForce 6800 Ultra and NVIDIA GeForce FX 5950 Ultra.
FSAA 2x
RADEON X800 XT | RADEON 9800 XT | GeForce 6800 | GeForce FX |
FSAA 4x
RADEON X800 XT | RADEON 9800 XT | GeForce 6800 | GeForce FX |
FSAA 6x/8x
RADEON X800 XT | RADEON 9800 XT | GeForce 6800 | GeForce FX |
In 2x mode all graphics cards show almost identical AA quality, but in 4x mode the graphics processors using a rotated grid (all of them except NVIDIA GeForce FX 5950 Ultra) show better anti-aliasing quality.
In 6x/8x modes the anti-aliasing quality of polygon edges is very similar, although ATI based graphics cards use gamma correction when blending the subpixel colors, so the half-tones they produce on polygon edges look subjectively more natural, which makes the overall picture much nicer.
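Why gamma correction matters when blending subpixels can be shown with a single edge pixel that is half covered by a white polygon over a black background. The sketch assumes a plain power-law display gamma of 2.2; ATI’s actual correction curve is not specified here.

```python
GAMMA = 2.2  # assumption: simple power-law display gamma, not ATI's exact curve


def to_linear(c):
    """Gamma-encoded value in [0, 1] -> linear light intensity."""
    return c ** GAMMA


def to_gamma(c):
    """Linear light intensity in [0, 1] -> gamma-encoded value."""
    return c ** (1.0 / GAMMA)


def resolve_naive(samples):
    """Average the gamma-encoded subsample colors directly."""
    return sum(samples) / len(samples)


def resolve_gamma_correct(samples):
    """Convert subsamples to linear light, average, then re-encode."""
    return to_gamma(sum(to_linear(s) for s in samples) / len(samples))


# Edge pixel: 2 of 4 subsamples white (1.0), 2 black (0.0).
edge = [1.0, 1.0, 0.0, 0.0]
print(resolve_naive(edge))                    # 0.5 -> displayed at only ~22% brightness
print(round(resolve_gamma_correct(edge), 3))  # 0.73 -> displayed at 50% brightness
```

The naive average produces an edge shade that the monitor shows noticeably darker than half intensity, while the gamma-corrected blend yields the half-tone the eye expects, which is the “more natural” look described above.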
Now let’s pass over to the second fragment where polygon edges are almost vertical. The screenshots again follow in the same order: ATI RADEON X800 XT Platinum Edition, RADEON 9800 XT, NVIDIA GeForce 6800 Ultra and NVIDIA GeForce FX 5950 Ultra.
FSAA 2x
RADEON X800 XT | RADEON 9800 XT | GeForce 6800 | GeForce FX |
FSAA 4x
RADEON X800 XT | RADEON 9800 XT | GeForce 6800 | GeForce FX |
FSAA 6x/8x
RADEON X800 XT | RADEON 9800 XT | GeForce 6800 | GeForce FX |
Everything we have just said about the first fragment is true for the second fragment, too.
The third fragment features a 45° angle. The screenshots again follow in this order: ATI RADEON X800 XT Platinum Edition, RADEON 9800 XT, NVIDIA GeForce 6800 Ultra and NVIDIA GeForce FX 5950 Ultra.
FSAA 2x
RADEON X800 XT | RADEON 9800 XT | GeForce 6800 | GeForce FX |
FSAA 4x
RADEON X800 XT | RADEON 9800 XT | GeForce 6800 | GeForce FX |
FSAA 6x/8x
RADEON X800 XT | RADEON 9800 XT | GeForce 6800 | GeForce FX |
In 2x mode the NVIDIA based graphics cards provide the worst anti-aliasing quality. Since in 2x mode they place the samples along the same diagonal as the polygon edge in this fragment, there is hardly any anti-aliasing effect at all.
The ATI based graphics cards provide somewhat better image quality, because they place the samples along a different diagonal.
When we shift to 4x mode, the graphics cards start showing much better anti-aliasing quality. The jaggies are least visible on the cards that use the rotated-grid sampling algorithm, i.e. on RADEON X800 XT/9800 XT and GeForce 6800 Ultra.
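The effect of sample placement can be quantified by counting how many distinct coverage values (shades) an edge of a given direction can produce in a pixel. Two samples lying at the same distance from the edge line are always covered together, so they add no intermediate shade. The 2x sample coordinates below are hypothetical stand-ins for “samples on one diagonal” and “samples on the other diagonal”, not the vendors’ actual sample tables.

```python
def shade_levels(samples, edge_dir):
    """Number of distinct coverage values (0%, ..., 100%) an edge with
    direction edge_dir can produce in a pixel as it sweeps across it."""
    nx, ny = -edge_dir[1], edge_dir[0]  # normal to the edge
    # Samples with equal distance to the edge line switch on/off together.
    distances = {round(nx * x + ny * y, 9) for x, y in samples}
    return len(distances) + 1


# Hypothetical 2x patterns (pixel-relative positions).
DIAG_A = [(0.25, 0.25), (0.75, 0.75)]  # samples on one pixel diagonal
DIAG_B = [(0.75, 0.25), (0.25, 0.75)]  # samples on the other diagonal
EDGE_45 = (1.0, 1.0)                   # edge parallel to DIAG_A's diagonal

print(shade_levels(DIAG_A, EDGE_45))  # 2: only 0% or 100%, no smoothing
print(shade_levels(DIAG_B, EDGE_45))  # 3: 0%, 50%, 100%
```

The same function also explains the 60° fragment below: any 2x pattern degenerates when the edge runs close to the line through its two samples.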
In 6x/8x modes ATI based graphics cards show subjectively better image quality as they use rotated-grid multi-sampling with 6 samples, while NVIDIA based graphics cards enable a combination of super-sampling and multi-sampling with the traditional subpixels location.
Finally, let’s take a random angle for the polygon edges: in the fourth fragment it is around 60°. The screenshots follow in the same order: ATI RADEON X800 XT Platinum Edition, RADEON 9800 XT, NVIDIA GeForce 6800 Ultra and NVIDIA GeForce FX 5950 Ultra.
FSAA 2x
RADEON X800 XT | RADEON 9800 XT | GeForce 6800 | GeForce FX |
FSAA 4x
RADEON X800 XT | RADEON 9800 XT | GeForce 6800 | GeForce FX |
FSAA 6x/8x
RADEON X800 XT | RADEON 9800 XT | GeForce 6800 | GeForce FX |
In 2x mode none of the graphics cards provides acceptable anti-aliasing quality on this fragment. The quality is especially poor on ATI based graphics cards: the polygon edge angle is close to the diagonal along which RADEON 9800 XT and RADEON X800 XT place their subpixels.
When we enable 4x mode, the situation is reversed: ATI RADEON 9800 XT/X800 XT and NVIDIA GeForce FX 5950 Ultra provide good image quality, while for NVIDIA GeForce 6800 Ultra in 4x mode this angle becomes “inconvenient”.
In 6x/8x modes all graphics cards cope with polygon anti-aliasing perfectly well.
So, summing up everything we have just discussed about polygon edge anti-aliasing on the new NVIDIA and ATI processors, we can conclude the following. In 2x mode the quality is almost identical. In 4x mode the laurels go to ATI: thanks to gamma correction during multi-sampling, the new RADEON X800 produces a more realistic picture. In 6x/8x modes NVIDIA has an advantage due to the bigger number of samples, while ATI again scores thanks to gamma correction.
Interestingly, in 2x and 4x modes all graphics cards use “pure” multi-sampling, so the performance loss in these modes should be about the same for all of them. But then the differences emerge. In 6x mode ATI still uses multi-sampling, while NVIDIA in 8x mode uses a combination of multi-sampling and super-sampling. Evidently, this hybrid 8x mode will make NVIDIA GeForce FX 5950 Ultra and GeForce 6800 Ultra slower than the ATI solutions in 6x mode, although it will certainly provide better image quality thanks to the better texture quality and the anti-aliasing of alpha textures that super-sampling brings.
However, why should we guess what the performance will look like? Let’s test our cards in real gaming applications to see how big the performance drop is going to be when we enable full-screen anti-aliasing on the new solutions from NVIDIA and ATI.
In order to estimate the performance drop caused by enabled anti-aliasing algorithms, we tested our graphics cards in a few gaming applications.





The results show that graphics cards based on RADEON X800 lose about as much speed as RADEON 9800 XT; however, thanks to their higher initial performance, RADEON X800 XT/PRO end up much farther ahead of their predecessors. Moreover, in a number of benchmarks the results of the new ATI cards were limited by the CPU performance even in 2x and 4x modes.
NVIDIA based graphics cards suffer a comparable performance loss in 2x and 4x modes, but as soon as the hybrid 8x mode is enabled, the results drop by a factor of 3-4. Yes, this is the price you pay for super-sampling in 8x mode.
At the end of the section devoted to ATI’s SMOOTHVISION HD technologies, namely anisotropic filtering and full-screen anti-aliasing, we would like to tell you a bit more about an interesting new anti-aliasing method introduced in the new ATI VPUs, namely in the freshly launched RADEON X800. We are going to talk about Temporal Anti-Aliasing.
The idea of temporal anti-aliasing is based on the fact that ATI graphics processors can control the sample locations when FSAA is enabled. In other words, the sample locations are not fixed in hardware once and for all, but are taken from a programmable table storing the sample positions inside a pixel.
The second fact the Temporal Anti-Aliasing approach relies on is the inertia of human vision. When we look at a common CRT monitor, we do not perceive the sequence of frames as discrete, although the screen doesn’t glow constantly but lights up and fades dozens (or even hundreds) of times each second.
Combining these two facts, ATI suggested an interesting way of improving anti-aliasing quality without any additional computational expense: if the subpixel locations are reprogrammed for each frame drawn, the inertia of human vision produces subjectively better anti-aliasing quality.
In other words, if we use, for instance, 2x mode, but place the two samples on one pixel diagonal in each even frame and on the other diagonal in each odd frame, then at a high enough frame rate the human eye will “average” the even and odd frames and perceive the image as if it were processed in 4x mode, with higher visual quality.
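The even/odd scheme can be modeled directly: reprogram the sample table every frame and let the eye average the two results. The sample positions and the edge below are hypothetical (ATI’s real tables are not given here), and the “eye” is modeled simply as the mean of two consecutive frames.

```python
# Hypothetical 2x sample tables (pixel-relative positions), not ATI's real ones.
EVEN_PATTERN = [(0.25, 0.25), (0.75, 0.75)]
ODD_PATTERN = [(0.75, 0.25), (0.25, 0.75)]


def pattern_for_frame(frame_index):
    """Reprogram the sample table every frame, alternating two patterns."""
    return EVEN_PATTERN if frame_index % 2 == 0 else ODD_PATTERN


def coverage(samples, inside):
    """Fraction of samples covered by the polygon (inside(x, y) -> bool)."""
    return sum(1 for x, y in samples if inside(x, y)) / len(samples)


def perceived_coverage(inside):
    """Eye 'averages' an even and an odd frame: the mean of two 2x results."""
    return (coverage(pattern_for_frame(0), inside)
            + coverage(pattern_for_frame(1), inside)) / 2


def inside(x, y):
    """Hypothetical polygon edge crossing the pixel: covered where x + y < 0.9."""
    return x + y < 0.9


print(coverage(EVEN_PATTERN, inside), coverage(ODD_PATTERN, inside))  # 0.5 0.0
print(perceived_coverage(inside))                                     # 0.25
print(coverage(EVEN_PATTERN + ODD_PATTERN, inside))                   # 0.25
```

Since both patterns have the same number of samples, the average of the two 2x results always equals the coverage of a 4-sample pattern that contains all four positions; the same reasoning turns two alternating 6x patterns into an effective 12x.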
ATI showed the location of samples in even and odd frames and the “seeming” result of the Temporal Anti-Aliasing as follows:

The maximum anti-aliasing quality obtainable with temporal anti-aliasing is achieved when the sample locations are reprogrammed in 6x mode. Here the result we see is equivalent to 12x multi-sampling. Not bad, eh?
Unfortunately, in this article we cannot show you how temporal anti-aliasing actually looks in action.
However, we can show you two screenshots for each temporal anti-aliasing mode, taken with different sample locations.
To illustrate this interesting algorithm once again, we also created a few animated pictures where the frames change slowly, so that you can see the difference.
And finally, to demonstrate the effect of visual inertia, we averaged frame pairs in a graphics editor.
So, let’s start. First come the screenshots with different sample locations, then the animated pictures, and then the averaged image.
2x Temporal Anti-Aliasing
| | | ![]() |
| | | ![]() |
| | | ![]() |
| | | ![]() |
4x Temporal Anti-Aliasing
| | | ![]() |
| | | ![]() |
| | | ![]() |
| | | ![]() |
6x Temporal Anti-Aliasing
| | | ![]() |
| | | ![]() |
| | | ![]() |
| | | ![]() |
Judging by the screenshots, temporal anti-aliasing really does smooth polygon edges better. What should we keep in mind to get the best effect?
First of all, the frame rate should be high enough to make the “trembling” of the polygon edges invisible to the human eye. ATI claims that the standard refresh rates of regular monitors (50-60Hz for LCD and 70-85Hz for CRT monitors) should be more than enough for that. My impression is that it is better to set the maximum refresh rate in this mode, that is 100, 120 or even 150Hz; otherwise you will still notice the “trembling”. All in all, the higher the monitor refresh rate and the more frames the graphics card can render, the better. Remarkably, if the graphics card cannot render more than 60 frames per second, temporal anti-aliasing is disabled automatically, so as not to irritate the user with flickering lines.
Secondly, vertical synchronization should be enabled. Without it, frames rendered with different sample locations may not alternate in the sequence of frames displayed on the monitor. For instance, while the current even frame is being displayed, the graphics card may have enough time to render several more frames in a row, so that the monitor will again display an even frame.
And finally, if you are using an LCD panel, the effect should be much more stable: you can count not only on the inertia of the human eye but also on the slow response of LCD matrices.
Unfortunately, we didn’t manage to check whether temporal anti-aliasing puts additional load on the graphics accelerator. With this mode enabled, our performance measurements varied by a factor of 2-3, which can only be explained by the unfinished driver version. Further evidence for this is that the driver control panel doesn’t even allow enabling temporal anti-aliasing: a special utility is needed for that.
Having installed the driver supplied with the graphics cards, we saw at once that it was not marked as a final driver: the Options page identified it as CATALYST BETA DRIVER, and the Version field stated “6.14.10.6444”. Apart from that, there were no differences from the Catalyst 4.4 driver: all the pages and control panels of the new driver were exactly the same as those of CATALYST 4.4.
However, despite its beta status, this CATALYST version is the one intended for WHQL certification, which is why we don’t think the “unfinished” driver can cause any serious problems.
The only unpleasant thing about using a beta driver for the tests is that its control panel has no options for managing the new full-screen anti-aliasing method, aka “temporal” anti-aliasing. ATI promises to add these functions to the control panel in the next CATALYST driver release; in the meanwhile, temporal AA can be managed with a special utility, ATITemporalAASwitch.exe.
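The vertical synchronization requirement can be checked with a simplified model: frames complete at a steady rate, and each monitor refresh shows one of them. With vsync the card flips exactly one new frame per refresh (assuming it can keep up); without it, each refresh shows the newest completed frame. Tearing is ignored, and the 170 fps / 85Hz figures are hypothetical.

```python
def displayed_parities(render_fps, refresh_hz, refreshes, vsync):
    """Sample-pattern parity (0 = even, 1 = odd) shown at each monitor refresh.

    Simplified model: frame k finishes at time k / render_fps. With vsync, one
    new frame per refresh (assumes the card keeps up); without vsync, each
    refresh shows the newest completed frame. Tearing is ignored."""
    shown = []
    for r in range(refreshes):
        if vsync:
            frame = r
        else:
            # Index of the newest frame finished by refresh r (integer math).
            frame = (r * render_fps) // refresh_hz
        shown.append(frame % 2)
    return shown


print(displayed_parities(170, 85, 6, vsync=True))   # [0, 1, 0, 1, 0, 1]
print(displayed_parities(170, 85, 6, vsync=False))  # [0, 0, 0, 0, 0, 0]
```

Without vsync, a card rendering exactly twice as fast as the refresh rate puts only even frames on screen, so the eye never gets the second sample pattern to average.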
The testbed used for our test session was configured as follows:
Our ATI RADEON X800 XT Platinum Edition and ATI RADEON X800 PRO will compete with the following solutions during this test session:
Since we also included a solution based on NVIDIA’s newest gaming graphics chip, GeForce 6800 Ultra, today’s battle for the 3D graphics leadership promises to be the bloodiest you have witnessed this year or last.
During our discussion of the R420 architecture, we will also offer you the results of some theoretical benchmarks, so that we can see which chip boasts the most advanced graphics architecture today. The battle of the giants from ATI and NVIDIA in a massive gaming duel will then be described in detail in a separate article.
The synthetic benchmarks will, as always, start with fillrate and texturing speed, which we test with McDolenc Fillrate Tester. The same benchmark also allows estimating the performance of the graphics cards when processing simple DirectX 9 pixel shaders of versions 1.1 and 2.0.

The first test round, with color and Z writes enabled, brought extremely disappointing results. The leader in texturing speed was not the RADEON X800 XT but the graphics card based on NVIDIA GeForce 6800 Ultra, although we had expected the opposite, given the newcomer’s ability to process 16 pixels per clock at a higher clock frequency of 520MHz.
The RADEON X800 XT curve, which shows the fillrate dropping as the number of textures increases, is very similar to the one obtained for GeForce 6800 Ultra. However, while the two run neck and neck with no textures, RADEON X800 XT falls behind its rival as the number of textures grows.
If we calculate the efficiency of RADEON X800 XT and GeForce 6800 Ultra as the ratio between the measured and the theoretical result, taking as the reference point the case when each additional texture requires one more clock cycle, we see that GeForce 6800 Ultra approaches its theoretical maximum as the number of textures increases, while RADEON X800 XT moves further away from its own. In the heaviest case, with 4 textures, the efficiency calculated this way is about 89% for NVIDIA GeForce 6800 Ultra and only around 49% for RADEON X800 XT.
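The efficiency figures above compare a measured fillrate with a theoretical one. A minimal Python sketch of that reference model is shown below, assuming that zero or one texture costs one clock per pixel and that each further texture adds one clock. The function names are ours; only the 16 pixels per clock and the 520MHz core clock of RADEON X800 XT come from the text.

```python
def theoretical_fillrate_mpix(pixels_per_clock: int, core_mhz: float, textures: int) -> float:
    """Theoretical fillrate in Mpixels/s under the one-clock-per-texture model.

    Zero or one texture costs one clock; every further texture adds a clock.
    """
    clocks_per_pixel = max(textures, 1)
    return pixels_per_clock * core_mhz / clocks_per_pixel


def efficiency(measured_mpix: float, theoretical_mpix: float) -> float:
    """Measured fillrate as a fraction of the theoretical one (1.0 = 100%)."""
    return measured_mpix / theoretical_mpix


# RADEON X800 XT: 16 pixels per clock at 520 MHz, as stated in the text.
for n in range(5):
    print(n, "textures:", theoretical_fillrate_mpix(16, 520, n), "Mpixels/s")
```

Under this model RADEON X800 XT peaks at 8,320 Mpixels/s with zero or one texture and falls to 2,080 Mpixels/s with four, so the 49% efficiency quoted for four textures corresponds to roughly half of that last figure.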
It looks like the efficiency of these graphics cards during simple texturing is limited by entirely different factors.
With GeForce 6800 Ultra, the reason is probably insufficient memory bus bandwidth: as the number of textures increases, so does the number of clock cycles required to process each pixel group, hence the load on the memory bus decreases and performance approaches the theoretical maximum.
With RADEON X800 XT it is the other way round. As the number of textures increases, the efficiency drops, which indicates that RADEON X800 XT has problems either with texture sampling speed or with the ability of the texturing pipeline to hide texture sampling latencies.
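The bandwidth explanation for GeForce 6800 Ultra can be illustrated with a toy model in which the achievable fillrate is the lower of a clock-bound rate and a memory-bound rate. This is a hypothetical sketch: it assumes texture fetches always hit the texture cache, so only framebuffer traffic uses the bus, and the bandwidth and bytes-per-pixel values in the example are illustrative numbers, not specifications of any card tested here.

```python
def achievable_fillrate_mpix(pixels_per_clock: int, core_mhz: float, textures: int,
                             bandwidth_gb_s: float, bytes_per_pixel: float) -> float:
    """Toy model: min(clock-bound rate, memory-bound rate), in Mpixels/s.

    Assumption: texture fetches hit the cache, so only color/Z traffic
    (bytes_per_pixel per pixel written) consumes memory bandwidth.
    """
    clock_bound = pixels_per_clock * core_mhz / max(textures, 1)
    memory_bound = bandwidth_gb_s * 1000 / bytes_per_pixel  # MB/s divided by bytes
    return min(clock_bound, memory_bound)


# Hypothetical card: 16 pixels/clock, 400 MHz, 35.2 GB/s, and 12 bytes per pixel
# (assumed 32-bit color write plus 32-bit Z read and Z write).
for n in range(5):
    clock_bound = 16 * 400 / max(n, 1)
    rate = achievable_fillrate_mpix(16, 400, n, 35.2, 12)
    print(f"{n} textures: {rate:.0f} Mpixels/s, efficiency {rate / clock_bound:.2f}")
```

Because the memory-bound ceiling does not depend on the texture count, adding textures lowers only the clock-bound rate, and the efficiency climbs toward 100%, which is the trend the article describes for GeForce 6800 Ultra.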
The graph for RADEON 9800 PRO is remarkably similar to the graph for RADEON 9800 XT. RADEON X800 PRO, however, behaves a bit differently from RADEON X800 XT: it is least efficient with no textures or a single texture, i.e. when the memory bus workload is highest.
This is probably because RADEON X800 PRO has fewer pixel pipelines: 12 against 16 on RADEON X800 XT (or, more correctly, 3 “wide” pipelines against 4). As a result, HyperZ HD cannot efficiently process tiles of the same size as before, because the graphics processor cannot check the required number of Z values within a single clock cycle. The tile size has therefore most likely been halved, so that HyperZ HD now behaves just like HyperZ III on RADEON 9800 XT.
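HyperZ internals are not documented in this review, so the following is only a generic hierarchical-Z sketch in Python under our own assumptions: a tile stores the farthest depth it contains, a whole tile is rejected when the nearest incoming fragment lies behind it, and checking a tile at pixel level takes as many clocks as its Z values divided by the number of Z checks the chip can do per clock. The tile sizes and per-clock check counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    max_z: float  # farthest depth stored anywhere in the tile (0 = near, 1 = far)


def coarse_reject(tile: Tile, incoming_min_z: float) -> bool:
    """Hierarchical-Z test for a LESS depth function.

    If even the nearest incoming fragment is no closer than the farthest
    stored pixel, every pixel in the tile would fail, so skip the whole tile.
    """
    return incoming_min_z >= tile.max_z


def clocks_per_tile(tile_w: int, tile_h: int, z_checks_per_clock: int) -> int:
    """Clocks needed to check every Z value of a tile (ceiling division)."""
    return -(-(tile_w * tile_h) // z_checks_per_clock)


# Hypothetical 8x8 tile: 16 checks per clock need 4 clocks, 12 checks need 6;
# halving the tile to 8x4 brings the 12-check case down to 3 clocks.
for w, h, checks in [(8, 8, 16), (8, 8, 12), (8, 4, 12)]:
    print(f"{w}x{h} tile, {checks} Z checks/clock: {clocks_per_tile(w, h, checks)} clocks")
```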

With Z writes disabled, the situation remains almost the same: the numbers are practically identical to those on the previous diagram.

With color writes disabled, the graphics cards show their maximum Z writing speed: NVIDIA GeForce 6800 Ultra and GeForce FX 5950 Ultra double their performance in this mode. The unexpectedly high results of NVIDIA GeForce 6800 Ultra look promising. As you may remember, we saw the same picture in our NV40 review, where the card exceeded its theoretical maximum.
The McDolenc Fillrate Tester mentioned above, which we use in all our reviews, can also measure pixel shader performance. Let’s take a look at the results:

As expected, the graphs for RADEON X800 XT/PRO have exactly the same shape as those of RADEON 9800 XT; the only difference is performance. RADEON X800 PRO is about 1.5 times as fast as its predecessor, and RADEON X800 XT about twice as fast.
Despite its higher clock frequency, RADEON X800 XT is only slightly faster than GeForce 6800 Ultra in pixel shaders 2.0 with 32-bit precision, and falls behind its rival when half precision is used. This is quite natural: ATI processors always perform floating-point calculations with 24-bit precision, so they gain nothing from the lower precision, whereas the NVIDIA chip does. In the simple pixel shaders 1.1 and 2.0, RADEON X800 XT takes the lead.
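The precision argument can be made concrete by comparing the three shader floating-point formats. The sketch below assumes the commonly documented layouts: FP16 with 10 mantissa and 5 exponent bits, ATI’s FP24 with 16 and 7, and IEEE FP32 with 23 and 8. For each it computes the total width, the gap between 1.0 and the next representable value, and the approximate number of significant decimal digits.

```python
import math

# (mantissa bits, exponent bits); layouts assumed from common documentation.
FORMATS = {
    "FP16 (partial precision)": (10, 5),
    "FP24 (ATI R3xx/R420)": (16, 7),
    "FP32 (full precision)": (23, 8),
}


def describe(mant_bits: int, exp_bits: int) -> tuple:
    """Return (total bits, epsilon at 1.0, approx. significant decimal digits)."""
    total = 1 + mant_bits + exp_bits          # sign + mantissa + exponent
    eps = 2.0 ** -mant_bits                   # gap between 1.0 and the next value
    digits = (mant_bits + 1) * math.log10(2)  # implicit leading bit included
    return total, eps, digits


for name, (mant, exp) in FORMATS.items():
    total, eps, digits = describe(mant, exp)
    print(f"{name}: {total} bits, eps = {eps:.2e}, ~{digits:.1f} decimal digits")
```

FP24 sits between the other two formats, which is why requesting partial precision changes nothing for an ATI chip that always computes in FP24.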
The 12-pipeline RADEON X800 PRO performs exactly as it should relative to RADEON X800 XT, given its lower clock frequencies and fewer pipelines. As a result, RADEON X800 PRO manages to outperform GeForce 6800 Ultra only with the simple shader version 1.1.
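The expected gap between the two ATI cards can be estimated from peak per-clock throughput multiplied by core clock. This rough Python sketch counts each “wide” pipeline as a quad of four pixels, takes RADEON X800 XT at 520MHz from the text, and assumes 475MHz for RADEON X800 PRO (its published core clock, which this section does not state); memory bandwidth is ignored.

```python
def peak_rate(quads: int, core_mhz: float) -> float:
    """Peak pixels (or simple shader ops) per second, in millions.

    Each "wide" pipeline is assumed to be a quad handling 4 pixels per clock.
    """
    return quads * 4 * core_mhz


xt = peak_rate(4, 520)   # RADEON X800 XT: 4 quads = 16 pipelines, 520 MHz
pro = peak_rate(3, 475)  # RADEON X800 PRO: 3 quads = 12 pipelines, 475 MHz assumed
print(f"X800 PRO / X800 XT = {pro / xt:.3f}")  # prints 0.685
```

If shader throughput rather than memory bandwidth is the limit, RADEON X800 PRO should deliver roughly 69% of the RADEON X800 XT result under these assumptions.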

Nothing changes when we disable Z writes.

And when we disable color writes, we see exactly the same situation as in simple texturing with color writes disabled: the NVIDIA-based graphics cards speed up. NVIDIA GeForce 6800 Ultra gets far ahead of its rivals, while GeForce FX 5950 Ultra defeats RADEON 9800 XT.
We used an updated version of our Xbitmark test package to test the graphics cards on a somewhat larger set of shaders:

The Xbitmark results are very interesting. On the one hand, RADEON X800 XT outperformed GeForce 6800 Ultra in most shaders; on the other, there are a few shaders where RADEON X800 XT falls quite far behind its rival.
Comparing the results of ATI RADEON X800 XT and NVIDIA GeForce 6800 Ultra across the shaders reveals an interesting tendency: RADEON X800 XT looks worst of all in shaders that use a lot of textures, and best of all in shaders rich in mathematical calculations.
The worst shaders for RADEON X800 XT are “8 textures” and “NPR”, which also uses a lot of textures, while the most favorable ones are “Wood” and “Factored BRDF+HDR”, i.e. shaders rich in calculations. Sounds familiar, doesn’t it? We already ran into the texture sampling problems of RADEON X800 XT in the very first test, when we measured texturing speed with Fillrate Tester.
So, judging by the Xbitmark results, texture sampling in shaders is a weak spot of the new RADEON X800, while mathematical calculations, on the contrary, are a strength of the new ATI architecture.
NVIDIA GeForce 6800 Ultra copes perfectly well with textures in shaders, but is less efficient than RADEON X800 in shaders rich in mathematical calculations.
These tendencies are easy to see in the “Dot Product Bump Mapping”, “Dot Product Bump Mapping+Specular” and “Dot Product Bump Mapping+Specular+Reflection” shaders: the more calculations are involved, the more at home RADEON X800 XT/PRO feels and the slower GeForce 6800 Ultra gets.
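The texture-versus-math tendency can be illustrated with a simple bottleneck model, in which a pixel takes as many cycles as its busier unit needs, assuming texture fetches and arithmetic overlap perfectly. Both designs and both shaders below are invented for illustration; the numbers do not describe RADEON X800 or GeForce 6800 hardware.

```python
def cycles_per_pixel(tex_fetches: float, alu_ops: float,
                     tex_per_clock: float, alu_per_clock: float) -> float:
    """Toy model: with perfect overlap, the busier unit sets the pace."""
    return max(tex_fetches / tex_per_clock, alu_ops / alu_per_clock)


# Two hypothetical pipeline designs (invented rates, not real chip specs).
texture_strong = dict(tex_per_clock=1.0, alu_per_clock=1.5)
math_strong = dict(tex_per_clock=0.6, alu_per_clock=2.0)

# Hypothetical shaders: (texture fetches, arithmetic ops) per pixel.
shaders = {"texture-heavy": (8, 4), "math-heavy": (1, 20)}

for name, (tex, alu) in shaders.items():
    a = cycles_per_pixel(tex, alu, **texture_strong)
    b = cycles_per_pixel(tex, alu, **math_strong)
    print(f"{name}: texture-strong {a:.1f} cycles, math-strong {b:.1f} cycles")
```

With these made-up rates the texture-strong design needs 8.0 cycles for the texture-heavy shader against 13.3 for the math-strong one, while the order reverses for the math-heavy shader (13.3 versus 10.0), mirroring the pattern seen in Xbitmark.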
Let’s take a look at the results obtained in the 3DMark2001 SE and 3DMark03 benchmark suites.

The relatively simple shader 1.1 test shows an indisputable victory for ATI RADEON X800 XT and even RADEON X800 PRO: both outpace GeForce 6800 Ultra here.

Here the situation is somewhat less favorable, but even so RADEON X800 XT does not yield to GeForce 6800 Ultra.

In the 3DMark03 shader 2.0 test, RADEON X800 XT and GeForce 6800 Ultra share the lead, running almost equally fast. The 12-pipeline RADEON X800 PRO is a little behind, while the previous-generation solutions are all at the very bottom of the diagram.
This concludes the first part of our synthetic tests, devoted to texturing speed and pixel shader efficiency.
According to 3DMark, the newest ATI and NVIDIA solutions are equally fast, but a more detailed investigation shows that the ATI VPUs, RADEON X800 XT and RADEON X800 PRO, are beyond any competition in shaders involving a lot of mathematical calculations. However, as the texturing complexity of the shader grows, and in simple multi-texturing, the laurels go to NVIDIA’s GeForce 6800 Ultra.
Now let’s take a closer look at the vertex processor performance of RADEON X800 XT/PRO.

ATI RADEON X800 XT, RADEON X800 PRO and NVIDIA GeForce 6800 Ultra perform almost equally fast at 640x480: in this case they are limited by the CPU and the system as a whole.

The similar test from the 3DMark03 package uses more complex shaders, so the results are no longer limited by the CPU. RADEON X800 XT/PRO is far ahead of NVIDIA GeForce 6800 Ultra.
The T&L tests from the 3DMark2001 SE package can also be considered vertex shader tests, because all contemporary graphics chips emulate T&L functions on their programmable vertex processors by translating T&L commands into shaders.
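To illustrate what an emulated T&L “shader” computes, here is a simplified Python sketch of the fixed-function path for one vertex: a 4x4 transform to clip space plus Lambert diffuse lighting summed over directional lights. Ambient, specular and attenuation terms are deliberately left out, and the function names are ours; real drivers generate GPU vertex shader code rather than anything like this.

```python
def mat_vec4(m, v):
    """Multiply a row-major 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def dot3(a, b):
    return sum(x * y for x, y in zip(a, b))


def fixed_function_vertex(position, normal, mvp, lights, material_diffuse):
    """Clip-space position plus diffuse color for one vertex.

    lights: list of (direction_towards_light, rgb_color) pairs; directions and
    the normal are assumed to be unit vectors in the same coordinate space.
    """
    clip_pos = mat_vec4(mvp, [*position, 1.0])
    color = [0.0, 0.0, 0.0]
    for light_dir, light_color in lights:        # work grows with light count
        n_dot_l = max(dot3(normal, light_dir), 0.0)
        for i in range(3):
            color[i] += light_color[i] * material_diffuse[i] * n_dot_l
    return clip_pos, color
```

The per-light loop is why the work per vertex grows with the number of light sources, which is exactly what the multi-light T&L test stresses.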

The new ATI graphics processors outperform NVIDIA GeForce 6800 Ultra here, not to mention the previous-generation GPUs, which simply get lost against the background of the powerful newcomers.

As the number of light sources increases, the ATI graphics processors retain their leadership. Remarkably, at 640x480 the results of RADEON X800 XT, RADEON X800 PRO and GeForce 6800 Ultra scale with the clock frequencies and the number of vertex units of these processors to within 0.01.
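The scaling rule mentioned above can be written out as relative vertex throughput, i.e. vertex units multiplied by core clock, normalized to one chip. This Python sketch assumes 6 vertex units on each chip and core clocks of 475MHz for RADEON X800 PRO and 400MHz for GeForce 6800 Ultra (published specifications not stated in this section); only the 520MHz of RADEON X800 XT comes from the text.

```python
# (vertex units, core clock in MHz); all but the X800 XT clock are assumptions.
CHIPS = {
    "RADEON X800 XT": (6, 520),
    "RADEON X800 PRO": (6, 475),
    "GeForce 6800 Ultra": (6, 400),
}


def relative_vertex_throughput(chips, reference):
    """Vertex units x clock for each chip, divided by the reference chip's value."""
    ref_units, ref_mhz = chips[reference]
    return {name: (units * mhz) / (ref_units * ref_mhz)
            for name, (units, mhz) in chips.items()}


for name, ratio in relative_vertex_throughput(CHIPS, "GeForce 6800 Ultra").items():
    print(f"{name}: {ratio:.2f}x")
```

The observation in the text amounts to the measured 640x480 score ratios lining up with these numbers, which suggests that the test is bound purely by vertex throughput at that resolution.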
Here is one more proof of R420’s higher geometry performance and fillrate: RADEON X800 XT is twice as fast as RADEON 9800 XT. Unfortunately, we could not compare these results with those of its primary competitor, GeForce 6800 Ultra, because of the ForceWare 60.72 driver issue mentioned earlier: we simply could not get an adequate result for this card.

With the EMBM method we again had problems with the GeForce 6800 Ultra driver, so it makes more sense to discuss only the RADEON X800 XT/PRO results. The newcomers performed very well, which is best seen at 1600x1200. At lower resolutions the results are limited by CPU performance, so the 640x480 results do not reflect the real state of things.
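For readers unfamiliar with EMBM, the core of the technique is a per-pixel offset of the environment-map texture coordinates. The Python sketch below is modeled on the 2x2 bump-matrix formulation used for bump environment mapping in Direct3D: the (du, dv) pair read from the bump map is transformed by the matrix and added to the environment-map coordinates before that map is sampled. The function name and argument layout are our own.

```python
def embm_perturb(u, v, du, dv, m):
    """Offset environment-map coordinates (u, v) by a bump-map sample.

    m is the 2x2 bump matrix [[m00, m01], [m10, m11]]; (du, dv) is the
    signed offset read from the bump map for this pixel.
    """
    u2 = u + du * m[0][0] + dv * m[1][0]
    v2 = v + du * m[0][1] + dv * m[1][1]
    return u2, v2  # the environment map is then sampled at (u2, v2)


# Identity matrix: the bump offset is applied unchanged.
print(embm_perturb(0.5, 0.5, 0.1, -0.1, [[1.0, 0.0], [0.0, 1.0]]))
```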

With the Dot3 method, the top solutions, RADEON X800 XT and GeForce 6800 Ultra, run neck and neck. RADEON X800 PRO outperforms the previous-generation graphics cards, but cannot compete with RADEON X800 XT and GeForce 6800 Ultra.
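Dot3 bump mapping computes diffuse lighting per pixel as the dot product of a normal fetched from a normal map and the light direction. A minimal Python sketch, assuming an 8-bit normal map whose channels map 0..255 to -1..1 and a light vector already expressed in the same (tangent) space; the function names are ours:

```python
import math


def decode_normal(rgb):
    """Map an 8-bit normal-map texel (0..255 per channel) to [-1, 1]."""
    return [c / 127.5 - 1.0 for c in rgb]


def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def dot3_diffuse(normal_texel, light_dir):
    """Per-pixel Dot3 lighting: clamp(N . L, 0, 1)."""
    n = normalize(decode_normal(normal_texel))
    l = normalize(light_dir)
    return max(0.0, min(1.0, sum(a * b for a, b in zip(n, l))))


# A "flat" texel (128, 128, 255) lit head-on is almost fully bright.
print(dot3_diffuse((128, 128, 255), (0.0, 0.0, 1.0)))
```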

This benchmark characterizes how well the GPU, driver and CPU work together: the GPU is loaded with geometry calculations while the CPU handles mathematical ones, so both the geometry performance of the graphics processor and the computational performance of the central processor affect the final score. ATI RADEON X800 XT/PRO performs fairly well here, which indicates that this solution has no problems in this respect: the graphics processors handle vertex shaders well, and the driver does not load the CPU with additional calculations.
The new ATI graphics processor architecture has proved to be a worthy competitor to NVIDIA’s NV40.
When we compared the fastest models based on the new graphics chip architectures, ATI RADEON X800 XT and NVIDIA GeForce 6800 Ultra, the lead in most cases stayed with RADEON X800 XT: it boasts faster vertex processors and copes brilliantly with pixel shaders rich in mathematical calculations. During texture processing, however, RADEON X800 XT proved less efficient than NVIDIA GeForce 6800 Ultra, both in pixel shaders that are complex from the texturing point of view and in trivial multi-texturing.
As for RADEON X800 PRO, its performance is evidently too low to beat GeForce 6800 Ultra: ATI’s concerns that 12 pipelines would be too few to give the new RADEON X800 architecture a victory over the new NVIDIA solution turned out to be fully justified. The direct competitor to ATI RADEON X800 PRO will be the non-Ultra GeForce 6800, which works at lower clock frequencies and processes the same 12 pixels per clock.
Nevertheless, we have not the slightest doubt that RADEON X800 PRO will outpace the previous-generation graphics cards in the gaming tests. At least, the synthetic benchmarks clearly show that it is much more advanced than the previous-generation solutions in almost every respect.
We are about to witness a real clash of strong wills. You will learn everything about the outcome of this battle from our next article, devoted entirely to the performance of the new-generation graphics solutions in gaming applications.