Highly Defined: ATI Radeon HD 2000 Architecture Review

The sixth-generation ATI Radeon series has mirrored the release of the third-generation Nvidia GeForce – having been scheduled for a Q4 2006 release, the highly anticipated R600 is coming out in Q2 2007, half a year after its main opponent. Will the R600 be a new R300 or follow the fate of the NV30? Let’s check it out right now!

by Anton Shilov, Yaroslav Lyssenko, Alexey Stepin
05/15/2007 | 07:34 AM

Microsoft Windows Vista, DirectX 10 and the unified shader architecture were all being discussed as far back as when DirectX 9.0 was just released, but it was only in late 2004 that the important transition for the whole industry was brought up to the public. It is then that ATI Technologies and Nvidia first talked about the unified architecture as applied to their GPUs. The latter was traditionally keeping all the info to itself, dropping occasional comments like “we’ll transition when it’s really necessary”, but the former openly declared its intention to migrate to a unified shader processor architecture.

 

In fact, ATI and Nvidia were both transitioning to the new architecture in steps, smoothly. Particularly, ATI introduced a universal dispatcher of computing resources for pixel processors (called an ultra-threaded dispatch processor) into the Radeon X1000 series and Nvidia began to experiment with different frequencies for different parts of the graphics core back in the GeForce 7. Although Intel, S3 Graphics and Silicon Integrated Systems do not disclose their plans, Intel’s current integrated graphics core is known to have unified shader processors while S3 and SiS are developing their products with such architecture, too.

Thus, the transition to the unified shader architecture was not something unexpected for any of the active GPU developers. It was part of a long-established plan. Talking about ATI Technologies, this plan was realized in the Xenos GPU that was installed into the Microsoft Xbox 360 console.

Nvidia was the first company to introduce a GPU with a unified architecture, though. It was the November 2006 announcement of the GeForce 8800 series. ATI, which has now become the graphics division of Advanced Micro Devices, is lagging behind with their R600 chip by over half a year notwithstanding their earlier experience with the Xenos. Just like it was with the Radeon X1800 series, the company unveils an entire family of DirectX 10-compatible chips: R600, RV630, and RV610.

In this article we’ll talk about the special features of ATI’s new architecture and Radeon HD 2900 XT chip. We will also try to understand what is the cause of ATI being so late with its DirectX 10-compatible GPUs. Have they been preparing something revolutionary or just correcting software/hardware problems and bugs?

Unified Shader Architecture – ATI’s Vision

The goal of the developer of modern GPUs is to create a scalable architecture that may be expanded or cut down to create GPUs for different price segments.

The ATI Radeon HD 2000 family is not an exception. Judging by the flowchart of the R600 chip (Radeon HD 2900), the new series resembles the Radeon X1800/X1900 in configuration. The similarity is further extended by the ring-bus memory controller first introduced in October 2005.

The heart of the new chip is a dynamic ultra-threaded dispatch processor that is capable of dispatching, according to ATI, thousands of tasks. This capability is needed because the new dispatcher has to manage far more computing resources and data types.

Stream processors that execute vertex, geometry and pixel shaders are organized as 4 SIMD units of 16 shader processors each. Every shader processor incorporates 5 scalar ALUs capable of executing one FP MAD instruction per clock cycle (and one ALU out of the five can also execute instructions like SIN, COS, LOG, EXP, etc).

Texture processors are located outside the execution pipeline, like in the Radeon X1000 architecture, and are designed as 4 large blocks, each of which includes 8 texture address processors, 4 texture filter units, and 20 texture samplers. None of the blocks has its own cache. They all use unified L1, L2 and vertex caches.

The R600’s render back-ends are 4 rather complex processors capable of performing typical rasterization operations like blending, antialiasing, etc.
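The unit counts quoted above can be tallied in a few lines. This is only a bookkeeping sketch: the class and field names are our own labels, not ATI terminology, and the numbers are simply the ones given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class R600Layout:
    """Unit counts as quoted in the article; field names are illustrative."""
    simd_units: int = 4
    shader_processors_per_simd: int = 16
    alus_per_shader_processor: int = 5
    texture_blocks: int = 4
    address_units_per_block: int = 8
    filter_units_per_block: int = 4
    samplers_per_block: int = 20
    render_back_ends: int = 4

    def shader_processors(self) -> int:
        return self.simd_units * self.shader_processors_per_simd

    def scalar_alus(self) -> int:
        return self.shader_processors() * self.alus_per_shader_processor

    def texture_totals(self) -> dict:
        return {
            "address": self.texture_blocks * self.address_units_per_block,
            "filter": self.texture_blocks * self.filter_units_per_block,
            "samplers": self.texture_blocks * self.samplers_per_block,
        }


r600 = R600Layout()
print("shader processors:", r600.shader_processors())
print("scalar ALUs:", r600.scalar_alus())
print("texture units:", r600.texture_totals())
```

The totals are 64 shader processors, 320 scalar ALUs, 32 texture address units, 16 texture filter units and 80 texture samplers.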

Thus, we can identify a few specific aspects of ATI’s approach to designing the Radeon HD 2000 chips:

Ultra Threaded Dispatch Processor Version 2.0

An important component of a GPU with unified computing processors, the dispatch processor has to allocate all of the GPU’s resources so effectively that none of the GPU subunits is idle.

The new ultra-threaded dispatch processor is more advanced than the one in the Radeon X1000 series. For comparison, the dispatch processor of the R580/Radeon X1900 chip can manage up to 512 threads, 16 pixels each, simultaneously whereas the Radeon HD 2900’s dispatcher can manage thousands of threads, 5 pixels each.

Another important difference of the new-generation dispatcher is its ability not only to distribute the resources of pixel and texture processors, but also to fully manage the execution of vertex and geometry shaders, dynamically allotting the GPU’s computing resources. We should note, however, that the new dispatcher does not decode the driver’s commands into the chip’s internal commands (this is the command processor’s job) and does not create queues of vertex, geometry and pixel shaders (this is the setup engine’s job).

The ultra-threaded dispatch processor consists of arbiters that assign tasks to computing devices and texture processors and of sequencers for the execution processors. Each SIMD array is equipped with two arbiters, which allows launching two operations simultaneously on each array.

The new dispatcher supports a variety of techniques to mask latencies when some code branch requests data not in the cache. Like with the Radeon X1000 architecture, the execution of such a code branch is halted so that the computing resources could be freed up for other tasks.
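The idea of halting a stalled thread and giving its issue slot to another one can be shown with a toy model. This is a deliberately simplified sketch, not a description of ATI's hardware: the function name, the one-op-per-cycle issue model and the 8-cycle fetch latency are all assumptions made for illustration.

```python
def cycles_to_finish(threads, fetch_latency, switch_on_stall):
    """Count issue cycles needed to issue every op of every thread.

    threads: list of op lists; an op is "alu" or "fetch".
    A "fetch" blocks its own thread for fetch_latency extra cycles.
    """
    pc = [0] * len(threads)    # next op index per thread
    wake = [0] * len(threads)  # cycle at which each thread may issue again
    cycle = 0
    while any(pc[i] < len(t) for i, t in enumerate(threads)):
        pending = [i for i, t in enumerate(threads) if pc[i] < len(t)]
        if switch_on_stall:
            # Any thread whose data has arrived may take the slot.
            runnable = [i for i in pending if wake[i] <= cycle]
        else:
            # Strictly in order: only the oldest unfinished thread may issue.
            first = pending[0]
            runnable = [first] if wake[first] <= cycle else []
        if runnable:
            i = runnable[0]
            op = threads[i][pc[i]]
            pc[i] += 1
            wake[i] = cycle + 1 + (fetch_latency if op == "fetch" else 0)
        cycle += 1  # one issue slot per cycle, used or idle
    return cycle


# Four hypothetical threads: one texture fetch followed by two ALU ops.
work = [["fetch", "alu", "alu"] for _ in range(4)]
serial = cycles_to_finish(work, fetch_latency=8, switch_on_stall=False)
overlapped = cycles_to_finish(work, fetch_latency=8, switch_on_stall=True)
print(serial, overlapped)
```

In this toy setup the in-order run spends most of its cycles idle waiting for data, while the switching run overlaps the four fetches and idles only until the first result arrives.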

When 64 Equals 320: Unified Shader Processor of ATI Radeon HD 2000

ATI’s new shader processor is more complex than those in Radeon X1000 or Nvidia GeForce 8000.

Each pixel processor of the R500 series contained 2 scalar and 2 vector ALUs and a branch execution unit. Thus, it was capable of executing up to 4 instructions per clock cycle plus 1 branch instruction. The new shader processor of the R600 chip incorporates 5 scalar ALUs capable of executing one floating-point MAD (Multiply-ADD) instruction per cycle, and one ALU can also execute transcendental instructions like SIN, COS, LOG, EXP, etc. The sixth unit in the R600 shader processor is a branch execution unit responsible for executing flow control instructions (comparisons, loops, subroutine calls). First introduced in the R520, this unit worked with the dispatch processor to accelerate the processing of shaders with dynamic branching.

Besides that, each subunit is equipped with a dedicated array of general-purpose registers. Theoretically, each ALU has to have access to another shader processor’s registers, but it is not certain how things stand in reality. The integration of general-purpose registers into the shader processor helps make GPUs more scalable because a reduction/increase in the number of shader processors automatically reduces/increases the number of registers.

Interestingly, ATI/AMD prefers to specify the total number of execution units rather than mention the 64 shader processors with 5 ALUs in each. This approach is no worse than others, but you should realize that it’s not quite correct to compare the number of ALUs in the GeForce 8800 and Radeon HD 2000 directly.

We know that each of the Nvidia G80’s 128 shader processors (whose architecture still remains a mystery, by the way) can execute two scalar MAD+MUL instructions per clock cycle. Each of the AMD R600’s processors can perform up to 5 instructions (including one complex one) plus a flow control instruction. Considering the difference in the frequencies of the execution units between the R600 and the G80, we can expect them to deliver similar overall performance. The developers’ data confirm this: Nvidia estimates the computing power of its GeForce 8800 GTX at approx. 520 gigaflops whereas AMD estimates its Radeon HD 2900 XT at 475 gigaflops.
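The vendors' gigaflops figures can be roughly reproduced from the unit counts. The sketch below rests on assumptions that are not stated in the article: a MAD is counted as 2 floating-point operations and a MUL as 1 (the usual marketing convention), and the GeForce 8800 GTX shader domain is taken to run at 1350MHz. The 740MHz R600 clock is the one quoted later in this review.

```python
def peak_gflops(units, flops_per_unit_per_clock, clock_mhz):
    """Theoretical peak: units x flops per clock x clock, in GFLOPS."""
    return units * flops_per_unit_per_clock * clock_mhz / 1000.0


# R600: 64 shader processors x 5 ALUs, one MAD (2 flops) per ALU per clock.
r600_gflops = peak_gflops(units=64 * 5, flops_per_unit_per_clock=2, clock_mhz=740)

# G80: 128 scalar processors, MAD + MUL (3 flops) per clock.
# The 1350 MHz shader clock is an assumption, not a figure from this article.
g80_gflops = peak_gflops(units=128, flops_per_unit_per_clock=3, clock_mhz=1350)

print(f"R600: {r600_gflops:.1f} GFLOPS, G80: {g80_gflops:.1f} GFLOPS")
```

Under these assumptions the results land close to the quoted 475 and 520 gigaflops, which suggests both vendors count peak throughput in much the same way.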

Radeon HD 2000 Texture Processor: Enough Is Enough?

Although there is a noticeable trend for games to use more and more shader-based visual effects, practice suggests that complex texturing is still employed widely and the 16 TMUs of the Radeon X1950 XTX often prove to be its bottleneck. It seems so odd then to see the Radeon HD 2900 still have 16 texture filter units.

The Radeon HD 2000 texture processor is a highly integrated device configured as follows:

The new texture processor supports filtering of FP32 textures as well as the vertex texture fetch feature the Radeon X1000 did not support. It also supports 8192x8192 textures and RGBE 9:9:9:5 texture format to comply with DirectX 10 requirements. Besides everything else, ATI/AMD claims an improved quality of anisotropic filtering.

Regrettably, ATI’s new top-end solution Radeon HD 2900 has only 16 texture filter units, which look like a potential bottleneck. The Radeon HD 2600 has only 8 texture filter units: more than in the Radeon X1600, but just as many as in the Radeon X1650 XT.

For comparison, the Nvidia G80’s texture processors contain a total of 64 texture filter units, a fourfold advantage! There is parity in terms of texture address units, though: Nvidia’s 8 processors with 4 address units each against the ATI R600’s 4 processors with 8 units each give 32 texture address units in each GPU. So, the two solutions should be roughly equal to each other in pure texturing speed, but the R600 may be inferior to its opponent in the speed of texture filtering. We’ll check this out shortly in our theoretical tests.

To mask texture-mapping latencies, ATI enlarged the L2 texture cache to 256KB in the Radeon HD 2900 (R600) and to 128KB in the Radeon HD 2600 (RV630). The Radeon HD 2400 (RV610) uses a unified L1/L2 cache of unknown capacity for textures and vertices.

Besides caching algorithms, performance hits due to not very high texturing speed can be avoided by means of the dynamic dispatcher and features like Fetch4 (it accelerates fourfold the sampling of one-component textures from adjacent addresses), yet we are still to see how well the ATI R600’s texture-mapping units perform in practice.

Ring-Bus Memory Controller: Implementing 512-bit Bus

The memory controller has been considerably improved in the new GPU generation, in an evolutionary way. The first generation of ring-bus controllers used to have two unidirectional 256-bit buses in the fastest version, but the R600’s controller uses four such buses for a combined 1024-bit ring bus and does not use classic-topology elements which were present in the memory controller of the Radeon X1000 series. Without a single command center, this controller is a fully distributed solution.



According to ATI’s diagrams, the ring has four Ring Stops that receive and transmit data from the core and memory. The fifth stop is a joint point where the R600 connects to the PCI Express bus (required for HyperMemory technology). It is logical to suppose there are not 4 but 16 stops because there are 16 BGA memory chips with a 32-bit interface each. AMD’s diagrams show, however, that the R600 uses full-duplex dual-channel memory controllers (termed Ring Stops), which allows addressing 16 memory chips.

Four such channels make up a 512-bit external access bus – for the first time in the consumer 3D graphics world! This provides a very high memory subsystem bandwidth which is so demanded today, in the era of high display resolutions, extreme antialiasing levels and HDR rendering.

Render Back-End: Now with Accelerated Stencil! 

Render back-ends or, in another terminology, raster operators (ROPs) are four rather complex processors capable of performing typical rasterization operations such as blending, antialiasing, processing the alpha channel, depth buffer, stencil buffer, etc. Each processor contains 4 alpha channel subunits, 8 depth/stencil subunits, 4 blending subunits, and 16 programmable subunits for antialiasing.

In ordinary terms, the R600 can be said to have 16 ROPs, but this is just an approximation that ignores the peculiarities of the render back-end architecture.

Just like the texture processors of the R600, its render back-ends are complex devices consisting of separate subunits that perform different operations. Each render back-end contains:

Unfortunately, we cannot compare the R600 and G80 here since we don’t have detailed info about their architectures, but it’s clear that the raster back-ends of the new GPU series from AMD can work with the Z-buffer at a double speed. With four raster back-ends the Radeon HD 2900 can process 16 pixels with color and Z values or as many as 32 pixels if only Z-values are processed. This is not much in comparison with the GeForce 8800 GTX that can process up to 192 Z-values per clock cycle, but is an improvement over the Radeon X1950 XTX that had no means to accelerate the processing of the Z-buffer.

The render back-end architecture of the Radeon HD 2000 is more straightforward than the texture processor architecture. We can assume the Radeon HD 2900 XT to have 16 ROPs. The ability to process only 16 pixels per cycle as opposed to the GeForce 8800 GTX’s 24 or the GeForce 8800 GTS’ 20 should be compensated by the higher clock rate of the ROPs, which is 740MHz for the Radeon HD 2900 XT as compared with the GeForce 8800 GTX’s 576MHz and the GeForce 8800 GTS’ 513MHz. So, there should be no bottleneck at this point of the R600. We’ll check it out in our tests, though.
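A quick way to see how far the higher ROP clock goes is to multiply pixels per clock by the clock rate. The sketch below uses only the figures quoted in the paragraph above and yields theoretical peaks, not measured fill rates; the function and variable names are ours.

```python
def pixel_fill_rate_mpix(pixels_per_clock, rop_clock_mhz):
    """Theoretical peak colour fill rate in megapixels per second."""
    return pixels_per_clock * rop_clock_mhz


# (pixels per clock, ROP clock in MHz) as quoted in the text.
cards = {
    "Radeon HD 2900 XT": (16, 740),
    "GeForce 8800 GTX": (24, 576),
    "GeForce 8800 GTS": (20, 513),
}

fill_rates = {name: pixel_fill_rate_mpix(*spec) for name, spec in cards.items()}
for name, rate in sorted(fill_rates.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rate} Mpix/s")
```

By this simple estimate the Radeon HD 2900 XT's clock advantage puts it ahead of the GeForce 8800 GTS and narrows, but does not fully close, the gap to the GeForce 8800 GTX.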

The antialiasing subunits shouldn’t be a bottleneck, either. Each render back-end of the R600 chip contains 16 such subunits for a total of 64. Being programmable, these subunits ensure high performance and wide capabilities in terms of full-screen antialiasing. We’ll tell you about the new MSAA methods supported by the Radeon HD family below.

Evolution of FSAA: Now Up To 24x!

Even though multisampling antialiasing provides an optimal balance between image quality and performance hit, both Nvidia and ATI decided to implement new full-scene antialiasing technologies in their new DirectX 10-compatible graphics processors, because despite improvements in display resolutions a lot of people demand higher FSAA quality.

The new Radeon HD 2000 chips from ATI/AMD support a new method of antialiasing that the company calls custom-filter antialiasing (CFAA). In fact, CFAA includes three methods of antialiasing that are pretty different from each other:

Both wide-tent and narrow-tent CFAA rely on programmable post-filters that take sub-samples from outside a pixel. ATI claims that the method is fully programmable and that sub-samples have different weights, meaning that driver developers and software developers can control the quality of antialiasing. Even though the method has several advantages, its main disadvantage is that sampling sub-pixels from outside the pixel blurs the whole image, which can negatively affect the final picture.

A more promising FSAA technique implemented in Radeon HD 2000 hardware is the adaptive edge detect filter, which, according to the developers, performs an edge detection pass on the rendered image and then applies antialiasing with an increased number of samples to pixels that contrast with each other more than the rest, e.g., edges or textures with large drawings. Potentially, this may reduce texture shimmering and avoid blurring. However, this needs to be checked in practice because, for example, textures with small details may still be “antialiased” too much and get blurry. Unfortunately, the current official drivers for Radeon HD 2000 do not support the adaptive edge detect filter.
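Why a tent filter that reaches beyond the pixel softens the picture can be seen in a small numeric sketch. This is a generic tent (linear falloff) resolve, not ATI's actual kernel: the weight function, the radii and the sample positions are hypothetical values chosen for illustration.

```python
def tent_weight(distance, radius):
    """Linear falloff: 1 at the pixel centre, 0 at `radius` and beyond."""
    return max(0.0, 1.0 - distance / radius)


def resolve(samples, radius):
    """Weighted average of (distance_from_pixel_centre, value) samples."""
    weighted = [(tent_weight(d, radius), v) for d, v in samples]
    total = sum(w for w, _ in weighted)
    return sum(w * v for w, v in weighted) / total


# A white pixel (value 1.0) next to a black neighbour (value 0.0).
# Distances are in pixel widths; 0.75 lies outside the pixel's own footprint.
samples = [(0.25, 1.0), (0.25, 1.0), (0.75, 0.0), (0.75, 0.0)]

own_pixel_only = resolve(samples, radius=0.5)  # neighbour samples get zero weight
narrow_tent = resolve(samples, radius=1.0)
wide_tent = resolve(samples, radius=1.5)
print(own_pixel_only, narrow_tent, wide_tent)
```

The wider the tent, the more the neighbour's samples pull the white pixel towards gray, which is the blurring effect described above.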

Avivo HD: FullHD Gets to PC

Despite the fact that High-bandwidth Digital Content Protection (HDCP) is currently supported by almost all new graphics cards, this, unfortunately, does not mean that end-users can enjoy 1080p movies from Blu-ray and HD DVD discs on their PCs. Apparently, previous-generation graphics chips do not support HDCP for resolutions requiring a dual-link DVI connection. Moreover, when previous-generation graphics cards were used, users could not get multi-channel audio, as they had to use mini-jack connectors to connect audio cards to their audio systems, which, due to HDCP, would "downgrade" the number of output channels to stereo.

The Radeon HD 2000 has solutions for both issues and brings even more powerful hardware for decoding HD content. The new chips support HDCP for all resolutions and also contain a built-in audio controller, which allows transmitting audio over the HDMI output of graphics cards (for that ATI equips its cards with DVI->HDMI adapters). As a result, end-users can now enjoy both 1080p movies and multi-channel audio on systems equipped with Radeon HD 2000 graphics cards and suitable optical drives.

The new Radeon HD 2000 graphics chips also feature improved Avivo HD technology with a unified video decoder processing unit, which supports such functions as decoding of the Context-Adaptive Variable Length Coding (CAVLC) and Context-Adaptive Binary Arithmetic Coding (CABAC) entropy coding algorithms of H.264 and VC-1 codecs.

It is interesting to note that ATI claims support for full hardware decoding of both H.264 and VC-1 codecs, whereas Nvidia only claims support for bitstream processing of H.264 and not VC-1. Unfortunately, the current ATI Catalyst driver does not support acceleration of VC-1 decoding at all, hence we cannot compare Avivo HD and PureVideo HD just now.


Radeon HD 2900 XT with HDMI adapter.

Full hardware decoding of HD content is tremendously important for mobile computers: while contemporary processors can decode even 1080p movies, mobile chips would consume much more energy doing so than a GPU with dedicated hardware. According to ATI, system power draw during HD DVD playback is about 52W without GPU-assisted decoding and about 35W with it. Given that notebooks with Blu-ray or HD DVD optical drives are becoming more and more popular, AMD may win several designs with its Radeon HD 2000 products thanks to their advanced video capabilities.

It is also interesting to note that Radeon HD 2400 and Radeon HD 2600 have special custom logic for video post processing which improves quality of deinterlacing with edge enhancement, vertical & horizontal scaling as well as color correction.

Back to the Future: Hardware Tessellator

High-order surfaces and hardware tessellation are not new to consumer 3D graphics: back in 2001 both ATI Technologies and Nvidia Corp. introduced technologies aimed at improving computer games’ level of detail without adding complex geometry (which would result in additional load on CPUs, buses, etc).

There are different types of higher order surfaces:

However, none of them has become popular, partly because of imperfect hardware implementations and partly because of the lack of support from software makers. But as end-users and game creators demand higher-quality geometry, ATI believes that hardware tessellation will return, since it lets game developers enhance geometric detail without creating overly complex models.

According to ATI, the next-generation DirectX application programming interface from Microsoft Corp. will actually support programmable hardware tessellation. However, the company does not explain whether the hardware tessellator inside the Radeon HD 2000-series GPUs will be compatible with the DirectX one.

ATI claims that its hardware tessellator can be programmed using vertex shaders, which is a surprise, as DirectX 10 features geometry shaders capable of adding vertices to models. On the other hand, ATI claims that its hardware tessellator is similar to that of the Xbox 360 GPU, which, being a DirectX 9-like chip, does not feature a geometry shader.

Radeon HD 2000: The Family Portrait

Before we proceed to the results of synthetic tests and image quality checks, let us finally list the specifications of the Radeon HD 2000 family graphics cards and compare them with their rivals.

As we see, the Radeon HD 2600 and HD 2400 chips differ from the Radeon HD 2900 pretty significantly, although the specifications of the HD 2600 were not cut down to one fourth of the flagship GPU’s power, which is what Nvidia did with the GeForce 8600 GTS.

ATI Radeon HD 2900 XT: PCB Design

Since the external memory bus of the R600 is 512 bits wide, the developers had to design a new PCB with appropriate specifications. Nevertheless, ATI/AMD managed to keep the Radeon HD 2900 XT within the dimensions of the previous generation family: Radeon X1800/X1900/X1950. The new-generation Radeon doesn’t look like a monster at all, unlike the GeForce 8800 GTX.

Because of the peculiar design of the cooling system, Radeon HD 2900 XT looks somewhat similar to GeForce 8800 GTS, differing, however, by ATI’s traditional red color. After years of experiments in this field, the leading graphics card makers seem to have finally come to the unified design for powerful graphics accelerators cooling. We are going to dwell on the Radeon HD 2900 XT cooling solution later in the corresponding section of the review, and in the meanwhile let’s take a closer look at the PCB layout, components location, etc. To give you a better view of the PCB design we removed the cooler:

  

As you can see, all electronic components on the Radeon HD 2900 XT PCB are very close to one another and there are almost no empty spots anywhere, which is not surprising taking into account the relatively modest size of the board, the powerful voltage regulator circuitry and the 512-bit memory bus.

The Radeon HD 2900 XT uses a so-called digital voltage regulator, like the one we saw on the first Radeon X1950 Pro. However, this one is clearly intended to handle much higher power: even the very first estimates suggest that the R600 graphics core consumes more than 130W and most likely sits around 140-160W. Unlike a classical voltage regulator, the digital one boasts a higher operating frequency (up to a few MHz) and uses almost no electrolytic capacitors. The heart of this circuitry is two high-frequency digital PWM controllers, Volterra VT1115M, operating at up to 1.3MHz. The power elements managed by the controllers are not traditional MOS transistors, but special controllers with a digital interface. Judging by the number of inductors, the voltage regulator is a seven-phase design: one VT1115M can manage several such modules. In this case the upper chip manages three of them and the lower chip manages four. Six phases may be powering the graphics processor directly, while the seventh is one of the power sources for the memory. However, it is also possible that all seven phases power the GPU and there is a separate voltage regulator for VDD and VDDQ.

External power is delivered to the Radeon HD 2900 XT via two connectors. However, unlike the power connectors of the GeForce 8800 GTX, one of the Radeon’s is an 8-pin connector. This is not much of a problem because it is backward compatible with 6-pin plugs, but if you use two six-pin cable plugs, some overclocking and temperature monitoring functions of the Catalyst Control Center will be unavailable. We cannot find an explanation for this design decision by AMD. As we know, although the Nvidia GeForce 8800 GTX is pretty power-hungry overall, it loads both of its external power connectors more or less equally and not too heavily: each receives about 40W. Even the recently announced GeForce 8800 Ultra uses six-pin connectors, although its PCB can accommodate an eight-pin connector instead of one of the six-pin ones, just like the new AMD solution. Meanwhile, the maximum load each six-pin PCI Express connector can take is defined as 75W. Together with the same maximum load the slot itself can bear, we get an impressive 225W, which is more than enough for any graphics adapter. So, it looks like there is no objective need for an eight-pin power connector now, and there shouldn’t be one in the near future. Nevertheless, Radeon HD 2900 XT owners who have PSUs with 6-pin connectors only will have to put up with the absence of the default overclocking and temperature monitoring features or use third-party utilities, which have no such restrictions.
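The 225W figure is just the sum of the slot and connector limits, and the same arithmetic shows what an eight-pin plug adds. In the sketch below the 75W values come from the text, while the 150W rating for an eight-pin PCI Express connector is taken from the PCI Express specification rather than from this article; the names are illustrative.

```python
# Per-source power limits in watts. The 8-pin value is not quoted in the
# article; it is the rating from the PCI Express high-power card specification.
SLOT_W = 75
CONNECTOR_W = {"6-pin": 75, "8-pin": 150}


def board_power_budget(connectors):
    """Maximum board power: slot limit plus every auxiliary connector."""
    return SLOT_W + sum(CONNECTOR_W[c] for c in connectors)


two_six_pin = board_power_budget(["6-pin", "6-pin"])
six_plus_eight = board_power_budget(["6-pin", "8-pin"])
print(two_six_pin, six_plus_eight)
```

With two six-pin plugs the budget is 225W, comfortably above the 140-160W estimated for the R600 core, which is why the eight-pin connector looks like headroom rather than a necessity.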

The left side of the PCB, where the GPU and memory chips are located, is packed. You can see a lot of small components around the memory chips. This is not surprising: the small PCB size and the 512-bit memory bus require chips to be placed on both sides of the PCB, so the accompanying elements cannot all be moved to the reverse side, as is usually done in designs with single-sided placement of GDDR3 memory chips. As one chip has a 32-bit access bus, a 512-bit overall memory bus and a total memory capacity of 512MB call for 16 chips of 256Mbit each.

The Hynix HY5RS573225AFP-1 chips installed on the Radeon HD 2900 XT have exactly that capacity, work at 2.2V and support a 1000MHz (2000MHz) frequency. Their actual frequency on the Radeon HD 2900 XT is much lower: only 825MHz (1650MHz). This relatively low memory frequency ensures low heat dissipation and promises high overclocking potential. But even without any overclocking, the memory subsystem with its 512-bit bus and chips running at 825MHz (1650MHz) delivers an impressive 105.6GB/s of bandwidth, which is much higher than that of the GeForce 8800 GTX (86.4GB/s) or GeForce 8800 GTS (64GB/s). These characteristics, together with the enhanced ring-bus controller, give us some hope that the memory subsystem of the new Radeon HD 2900 XT will not turn into a bottleneck. It will most likely be the texture processors and rasterizers that impose limitations.

As we have already said, memory chips are installed on both sides of the PCB, 8 chips on each side. The chips on the front side are cooled by the primary cooler, while those on the reverse side are cooled by the massive heat-spreader plate.

The AMD R600 graphics processor, unlike the Nvidia G80, uses open-die packaging, and the die itself is placed at a 45-degree angle because of its large size. To protect the die edges from cracking, the packaging is reinforced with a metal frame that prevents the cooler base from shifting.

There are no zones with different clock frequencies in the R600, and all graphics processor elements run at 740MHz. The ability of AMD’s shader processors to perform up to 5 operations per clock cycle, versus only 2 operations per clock for Nvidia’s processors, allows them to make up for the lower working frequency and the smaller number of processors: the R600 has 64, while the G80 has 128.
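The chip count, capacity and bandwidth quoted for the Radeon HD 2900 XT all follow from three inputs: bus width, per-chip interface width and effective data rate. The sketch below simply replays that arithmetic with the figures from the text; the helper names are ours.

```python
def chip_count(bus_width_bits, chip_interface_bits):
    """Number of memory chips needed to fill the external bus."""
    return bus_width_bits // chip_interface_bits


def capacity_mbytes(chips, chip_density_mbit):
    """Total capacity in megabytes (8 bits per byte)."""
    return chips * chip_density_mbit // 8


def bandwidth_gb_s(bus_width_bits, effective_mhz):
    """Peak bandwidth: bytes per transfer x transfers per second, in GB/s."""
    return bus_width_bits / 8 * effective_mhz / 1000


chips = chip_count(bus_width_bits=512, chip_interface_bits=32)
capacity = capacity_mbytes(chips, chip_density_mbit=256)
bandwidth = bandwidth_gb_s(bus_width_bits=512, effective_mhz=1650)
print(chips, "chips,", capacity, "MB,", bandwidth, "GB/s")
```

Running the same formula with the chips' rated 2000MHz effective frequency gives 128GB/s, which illustrates how much headroom the memory has on paper.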

There is an ATI Rage Theater 200 chip in the lower left corner of the PCB that came to replace the outdated Rage Theater that used to be on previous generation Radeon cards. When ATI/AMD was working on the new generation solution they decided not to give up VIVO support, unlike Nvidia. The new Theater 200 features more advanced and better quality 12-bit video ADCs and supports some technologies improving video encoding quality. The built-in Xilleon core inside R600 is responsible for analogue video out implementation, as it supports analogue video output in high resolution.

Radeon HD 2900 XT comes with pretty common connector configuration for high-end ATI/AMD graphics adapters. It features two dual-link DVI-I connectors and universal nine-pin VIVO/YPbPr connector that is incompatible with the standard S-Video cable. The lower DVI-I connector also supports multi-channel AC3 sound output via HDMI interface but requires a special ATI converter for that. Unfortunately, the sound stream can only be 16bit and the maximum sample rate is only 48kHz. At least these are the parameters set for the uncompressed sound stream in PCM format, that is why we cannot say that the new AMD solution is up to the High Definition Audio standards. Since starting with HDMI version 1.0 this standard supports 24-bit/192kHz sound playback, we suspect that the above described limitation is of mostly software nature and may be eliminated in the new driver versions later on.

The R600 features a hardware compositing engine necessary for CrossFire technology, so there are two standard connectors for flexible CrossFire bridges in the upper left corner of the PCB. These connectors allow uniting two Radeon HD 2900 XT cards into a dual-processor tandem that will, of course, deliver higher performance than a single card.

Cooling System Design

Unlike processor coolers, cooling solutions for powerful graphics adapters don’t leave much room for engineering maneuvers. While the shape, size and weight of the former may vary within a very wide range, the latter have these parameters set pretty strictly to ensure that the graphics accelerator fits into a standard ATX system case and does not block more than one expansion slot on the mainboard. However, since the power consumption of contemporary GPUs is sometimes even higher than that of the fastest CPUs, they need extremely efficient cooling solutions. This is a hard task considering the limitations mentioned above. Over the years of graphics card cooling system evolution, the developers have tried multiple designs, and over time both ATI/AMD and Nvidia have arrived at a more or less unified design of their graphics card coolers that ensures maximum efficiency with acceptable size and noise levels.

This design implies a radial fan (the so-called blower or centrifugal), grabbing the air from the middle of the system case, pushing it through the heatsink dissipating heat from the GPU, memory and other components, and ousting the air outside the case through the slits in the dual-slot bracket in the rear panel of the case.

The interesting thing about a radial fan is that it provides higher static airflow pressure and lower turbulence while its performance remains the same as that of an axial-flow fan.

So, a fan like that works better with long heatsinks with relatively small gap between the fins in the array, and these are exactly the type of heatsinks that are used in Nvidia GeForce 8800 GTX/GTS and AMD Radeon HD 2900 XT coolers.

Unlike the Nvidia solution, this cooling system uses a copper heatsink. It definitely has a positive cooling effect, but also increases the weight of the entire system quite noticeably. The heatsink array consists of 30 fins sitting on a copper base that is pressed directly against the graphics processor die. For more even heat distribution there are two heatpipes connecting the heatsink to the copper base of the cooler. This heatsink is not as large as the one on the GeForce 8800 GTS, but the use of copper instead of aluminum may well make up for the smaller heat-dissipating surface area. The cooler base is connected to a steel frame that serves to fasten the cooler to the PCB with four screws. There is a resilient metal plate on the reverse side of the PCB that ensures tight contact between the cooler base and the GPU and protects the PCB against bending.

Traditional dark-gray thermal grease is used between the die and the cooler base. The base of the cooler and the heatsink are not connected mechanically with the red aluminum part that cools the memory chips and power elements on the PCB via pink elastic thermal pads. This part of the cooler is fastened to the PCB with another 8 screws. The same screws press the black heat-spreader plate used for the memory chips to the reverse side of the PCB. The memory chips are covered with pink thermal pads that ensure proper contact between the surfaces. Since the memory works at frequencies much lower than nominal, this cooling is quite sufficient for it. The aluminum part is encased in semi-transparent red plastic with silvery ornaments on it that symbolize flames.

The Radeon HD 2900 XT uses a fan with pulse-width modulation speed control, connected to the board via a four-pin connector. It seems to be more powerful than the one on the GeForce 8800 GTX/GTS and should produce a lot of noise at full speed. Taking into account the smaller heatsink surface area and the high power consumption of the AMD R600, we can expect pretty noisy operation even in everyday routine work, but we will find out later if it is really so.

All in all, the cooling system of the new AMD Radeon HD 2900 XT uses the best design available today, with a radial fan, heatpipes and warm air exhausted outside the system chassis. Hopefully, it will prove as efficient as the Nvidia cooling solution developed for the GeForce 8800 GTX/GTS family.

Faster, Louder, Higher: Noise Level and Power Consumption of Radeon HD 2900 XT

As usual, we measured the level of noise produced by the graphics cards’ coolers with a Velleman DVM1326 digital sound-level meter (0.1dB resolution) using A-weighting. At the time of our tests the level of ambient noise in our lab was 36dBA and the level of noise at a distance of 1 meter from a working testbed with a passively cooled graphics card inside was 43dBA.
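Sound levels combine on a power basis rather than linearly, which is worth keeping in mind when comparing readings against the ambient floor. The minimal sketch below is only an illustration of that arithmetic, not the procedure used in these tests: it estimates a source's own level once the ambient level is subtracted, using the 36dBA and 43dBA figures quoted above. The function name is hypothetical.

```python
import math

def subtract_background(total_dba: float, ambient_dba: float) -> float:
    """Estimate a source's own level by removing the ambient floor.

    Decibel values are converted to relative power, subtracted,
    and converted back. Requires total_dba > ambient_dba.
    """
    if total_dba <= ambient_dba:
        raise ValueError("total level must exceed the ambient level")
    total_power = 10 ** (total_dba / 10)
    ambient_power = 10 ** (ambient_dba / 10)
    return 10 * math.log10(total_power - ambient_power)

# Figures quoted in the text: 43 dBA testbed reading, 36 dBA ambient.
testbed_only = subtract_background(43.0, 36.0)
print(f"Testbed without ambient floor: {testbed_only:.1f} dBA")
```

With a 7dB gap between the two readings the correction is about one decibel, so the ambient floor matters mostly for the quietest cards.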

We got the following results:



As we see, the Radeon HD 2900 XT is not a quiet graphics card, to say the least. Even though it may not be as noisy as some other graphics cards, such as the Radeon X1800 XL with the first version of the cooler, it is definitely much noisier than its main rival, the GeForce 8800 GTX. Moreover, our Radeon HD 2900 XT did not slow its fan down again after speeding it up under high load, which means that the board remains noisy even after high-performance cooling becomes unnecessary (e.g., after exiting 3D applications).

Even though the cooling system seems to be efficient, we are not satisfied with its acoustic levels and fan-control algorithms and hope that AMD will improve them with future driver/BIOS releases.

We also traditionally measured power consumption of the new board using our special testbed with the following configuration:

The mainboard in this testbed was specially modified: we connected measurement shunts into the power lines of the PCI Express x16 slot and equipped them with connectors to attach measuring instruments. We also added such a shunt to a 2xMolex → PCI Express adapter. The measurements were performed with a Velleman DVM850BL multimeter (0.5% accuracy).
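The arithmetic behind a shunt-based measurement can be sketched as follows: the current on each line is the voltage drop across the shunt divided by its resistance, and the power is that current multiplied by the rail voltage. This is a minimal illustration under stated assumptions; the rail labels, shunt resistance and all readings are placeholders and are not values from our testbed.

```python
from dataclasses import dataclass

@dataclass
class RailReading:
    name: str                # label of the power line (hypothetical)
    rail_volts: float        # voltage measured on the rail
    shunt_drop_volts: float  # voltage drop measured across the shunt
    shunt_ohms: float        # known shunt resistance

    def current(self) -> float:
        # Ohm's law: I = V_drop / R_shunt
        return self.shunt_drop_volts / self.shunt_ohms

    def power(self) -> float:
        # P = V_rail * I
        return self.rail_volts * self.current()

def total_power(readings) -> float:
    """Sum the power drawn over all instrumented lines."""
    return sum(r.power() for r in readings)

# Placeholder numbers chosen only to exercise the arithmetic;
# they are NOT measurements from this review.
readings = [
    RailReading("slot 3.3V", 3.3, 0.010, 0.01),
    RailReading("slot 12V", 12.0, 0.030, 0.01),
    RailReading("external 12V", 12.0, 0.050, 0.01),
]
for r in readings:
    print(f"{r.name}: {r.current():.2f} A, {r.power():.1f} W")
print(f"Total: {total_power(readings):.1f} W")
```

Summing per line rather than taking a single reading is what makes it possible to see how the load is split between the slot and the external power connectors.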

We loaded the GPU by launching the first SM3.0/HDR graphics test from 3DMark06 and running it in a loop at 1600x1200 resolution with 16x anisotropic filtering enabled. The Peak 2D load was created by means of the 2D Transparent Windows test from Futuremark’s PCMark05 benchmarking suite. The results follow below:



Even though the GeForce 8800 GTX is not a chip that loves to save energy, the Radeon HD 2900 XT consumes even more: 161W is a new record for our lab. In fact, ATI claims that the Radeon HD 2900 XT can consume 215W, which is even more than we measured, so a noisy cooling system is not a surprise…

Testbed and Methods

For the review of the Radeon HD 2900 XT we decided to use our new Intel Core 2 Duo-based system with the Windows Vista OS. The configuration of our new testbed is as follows:

Since we believe that the use of tri-linear and anisotropic filtering optimizations is not justified in this case, the graphics card drivers were set up in the standard way to provide the highest possible quality of texture filtering.

ATI Catalyst:

Nvidia ForceWare:

Since this is our first review of the Radeon HD 2900 XT, we first carry out a short theoretical investigation before running real-life games on the graphics card, as that provides some clues regarding the GPU’s behavior in actual games. The list of our theoretical benchmarks hasn’t changed.

Anisotropic Filtering and FSAA Quality Investigation

But before running theoretical tests, we decided to perform a short investigation of ATI’s new full-scene antialiasing methods to find out whether they are different or not when compared to ATI’s old techniques as well as Nvidia’s coverage sample antialiasing (CSAA).

It is necessary to note that activating ATI’s new custom-filter antialiasing (CFAA) may seem tricky to a novice: the end-user needs to select the type of filter first and the number of samples second. Given that the driver does not explain that the “Box” filter is traditional multi-sample antialiasing (MSAA), whereas “Narrow tent” and “Wide tent” represent the new CFAA, some customers may have to spend additional time finding a suitable method.

Currently the ATI Catalyst driver supports MSAA 2x, 4x and 8x, whereas the “good-old” MSAA 6x, which was introduced back in 2001 along with the Radeon 8500, is not supported in current drivers. In addition, the drivers support CFAA 4x, 6x, 8x, 12x and 16x, whereas the advertised adaptive edge-detect FSAA is not present.

Since CFAA 4x is actually MSAA 2x with “Narrow tent” filter, it makes sense to compare it to MSAA 2x of ATI Radeon and Nvidia GeForce.

CFAA 4x vs. MSAA 2x

Screenshots compared in Half-Life 2 and Elder Scrolls: Oblivion:

Radeon HD 2900 XT: CFAA 4x
Radeon X1950 XTX: MSAA 2x
GeForce 8800 GTX: MSAA 2x

When we first tested the Nvidia GeForce 8800 graphics card several months ago, we noticed that the launch ForceWare driver had issues with fog in Half-Life 2. Well, today we have an unpleasant surprise: the new ATI Catalyst driver has the same issue.

It is indisputable that CFAA 4x antialiases lines much better than MSAA 2x. However, we have to admit that it also blurs the whole image pretty significantly, so it may not be a really good solution for games that have sharp textures.

All of today’s participants support MSAA 4x, so we can compare the implementation of multi-sampling on all GPUs.

MSAA 4x vs. MSAA 4x

Screenshots compared in Half-Life 2 and Elder Scrolls: Oblivion:

Radeon HD 2900 XT: MSAA 4x
Radeon X1950 XTX: MSAA 4x
GeForce 8800 GTX: MSAA 4x

It is interesting to note that the new Radeon HD 2900 XT renders a sharper image compared to the Radeon X1950 XTX and GeForce 8800 GTX. It is hard to tell, however, whether the new R600 produces better antialiasing quality compared to the G80, as on some lines the MSAA filter of the GeForce provides smoother quality, whereas in other cases the MSAA filter of the Radeon leaves its rival behind.

6x CFAA mode has two types of realization in the current driver: as MSAA 2x with “Wide tent” filter and as MSAA 4x with “Narrow tent” filter. We decided to compare these two methods to MSAA 4x of the Radeon HD 2900 as well as with MSAA 6x of the Radeon X1950 XTX.

CFAA 6x vs. MSAA 6x

Screenshots compared in Half-Life 2 and Elder Scrolls: Oblivion:

Radeon HD 2900 XT: CFAA 6x w. Narrow tent
Radeon HD 2900 XT: CFAA 6x w. Wide tent
Radeon X1950 XTX: MSAA 6x

Unfortunately, 6x CFAA with the “Wide tent” filter leaves aliasing artifacts on the image in addition to blurring textures, which is logical, as the number of colour samples remains the same as in the case of 2x MSAA, whereas the amount of blurring the driver has to apply to the image becomes gargantuan.

6x CFAA with the “Narrow tent” filter looks indisputably better than its incarnation with the “Wide tent” filter and two colour samples, as the mode is an improved version of the usual MSAA 4x. In fact, 6x CFAA with the “Narrow tent” filter can even be compared against 6x MSAA: perhaps it antialiases even better. Unfortunately, sampling from outside the pixel’s boundaries has its negative effect: the whole image is a little blurry.

As we know from our GeForce 8800 GTX review, CSAA 8x is not as good as FSAA 8xQ, which is basically MSAA 8x; still, it is not easy to notice any difference without special tools. But let’s take a look at ATI’s MSAA 8x implementation with the Radeon HD 2900 XT hardware.
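The link between sampling outside the pixel’s boundaries and blur can be illustrated with a toy one-dimensional resolve. This is a minimal sketch under stated assumptions: the tent radius of one pixel, the sample positions and the function names are all hypothetical, and the actual weights used by ATI’s filters are not known to us.

```python
def box_weight(dist: float) -> float:
    """Box filter: only samples inside the pixel (dist <= 0.5) count."""
    return 1.0 if dist <= 0.5 else 0.0

def tent_weight(dist: float, radius: float) -> float:
    """Tent filter: weight falls off linearly to zero at `radius` pixels."""
    return max(0.0, 1.0 - dist / radius)

def resolve(samples, pixel: int, weight_fn) -> float:
    """Weighted average of (position, value) samples around a pixel centre."""
    center = pixel + 0.5
    total = weight_sum = 0.0
    for pos, value in samples:
        w = weight_fn(abs(pos - center))
        total += w * value
        weight_sum += w
    return total / weight_sum

# A 1D row of five pixels, two samples per pixel; only pixel 2 is bright.
samples = [(p + off, 1.0 if p == 2 else 0.0)
           for p in range(5) for off in (0.25, 0.75)]

box = [resolve(samples, p, box_weight) for p in range(5)]
tent = [resolve(samples, p, lambda d: tent_weight(d, 1.0)) for p in range(5)]
print("box :", box)
print("tent:", tent)
```

In this toy case the box resolve keeps the one-pixel-wide detail at full intensity, while the tent resolve dims it and leaks part of it into both neighbours, which is the same trade-off visible on the screenshots: smoother edges at the cost of softer fine detail.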

CFAA 8x vs. MSAA 8x vs. CSAA 8x vs. CSAA 8xQ

Screenshots compared in Half-Life 2 and Elder Scrolls: Oblivion:

Radeon HD 2900 XT: CFAA 8x, MSAA 8x
GeForce 8800 GTX: CSAA 8x, CSAA 8xQ

MSAA 8x on the Radeon HD 2900 XT is a little better than FSAA 8xQ on the Nvidia GeForce 8800 in Elder Scrolls: Oblivion. Unfortunately, we cannot verify this in Half-Life 2 because the R600 has an issue with fog rendering in this game.

Even though 8x CFAA with the “Wide tent” filter looks normal and provides high-quality antialiasing, it also blurs textures, and small details do not look really crisp with this MSAA 4x antialiasing with a special filter applied.

Formally, the remaining CFAA 12x and CFAA 16x are competing against Nvidia’s FSAA 16x. Well, let’s see the difference.

CFAA 12x vs. CFAA 16x vs. CSAA 16x vs. CSAA 16xQ

Screenshots compared in Half-Life 2 and Elder Scrolls: Oblivion:

Radeon HD 2900 XT: CFAA 12x, CFAA 16x
GeForce 8800 GTX: CSAA 16x, CSAA 16xQ

It is undeniable that both CFAA 12x (MSAA 8x with the “Narrow tent” filter) and CFAA 16x (MSAA 8x with the “Wide tent” filter) provide acceptable antialiasing quality. Nevertheless, blurring becomes too significant in the case of CFAA 16x and can also be noticed in the case of 12x. A good thing about ATI’s CFAA is that vegetation looks a little more accurate, but small texture details may nearly disappear due to blurring. Therefore, we have to say that Nvidia’s FSAA 16x and FSAA 16xQ look much better than the competing modes from ATI. Still, we have yet to see ATI’s adaptive edge-detect filters.

So, let’s sum everything up:
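The CFAA modes discussed in this section can be summarized as combinations of a base MSAA sample count and a resolve filter. The lookup table below is only a convenience sketch built from the pairings named in the text; the identifiers are hypothetical and do not correspond to any driver setting or API.

```python
# CFAA modes as described in the text: each combines a base MSAA
# sample count with a resolve filter. Names here are illustrative only.
CFAA_MODES = {
    "CFAA 4x":  [(2, "Narrow tent")],
    "CFAA 6x":  [(2, "Wide tent"), (4, "Narrow tent")],
    "CFAA 8x":  [(4, "Wide tent")],
    "CFAA 12x": [(8, "Narrow tent")],
    "CFAA 16x": [(8, "Wide tent")],
}

def describe(mode: str) -> str:
    """Return the MSAA/filter pairing(s) behind a CFAA mode."""
    variants = CFAA_MODES[mode]
    return " or ".join(f"MSAA {n}x + {f}" for n, f in variants)

for mode in CFAA_MODES:
    print(f"{mode}: {describe(mode)}")
```

Laid out this way, it is easy to see that the advertised CFAA number is never the number of colour samples actually stored per pixel, which tops out at eight.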

Now, let’s have a look at anisotropic filtering quality.

Anisotropic Filtering

Screenshots compared:

Radeon HD 2900 XT: Default Quality
Radeon X1950 XTX: High Quality
GeForce 8800 GTX: High Quality

Unsurprisingly, the ATI Radeon HD 2900 XT has the same good adaptive anisotropic filtering algorithm that we know from the introduction of the ATI Radeon X1000. Nonetheless, the current generation of Nvidia GeForce 8800 graphics cards provides even better angle-independent anisotropic filtering, and it is not completely clear why ATI decided not to improve this part of its new chip.

Performance in FSAA Modes

Besides comparing the quality of the different FSAA modes supported by the Radeon HD 2900 XT, we also compared their influence on the speed of the card in the popular 3D shooter Half-Life 2: Episode One that runs on the advanced Source engine. The results are listed below:





The new Radeon HD 2900 XT is unquestionably faster than the previous-generation Radeon X1950 XTX. However, not only is it slower than the GeForce 8800 GTX, it is not even as fast as its direct competitor, the GeForce 8800 GTS, which is an alarming sign.

With regards to ATI’s new custom-filter antialiasing methods we can state the following:

Performance in Theoretical Tests

Scene Fill Rate

The ATI Radeon HD 2900 XT behaves just according to theory: its scene fillrate is a little higher than that of the Nvidia GeForce 8800 GTS due to its higher clock speed; however, it cannot compare to the 8800 GTX model, as the latter has 24 render back ends, whereas ATI’s new solution sports only 16 of them. Obviously, the R600 cannot match the G80’s capabilities when it comes to Z pixel rate, as it can compute only 32 pixels with Z-value per clock.
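The theoretical figures behind this comparison are simple products of unit count and clock speed. In the sketch below the back-end counts for the R600 and the 8800 GTX come from the text, while the 8800 GTS back-end count and all three core clocks are assumptions taken from published specifications, so treat the output as an estimate rather than a measurement.

```python
def pixel_fill_rate_mpix(render_back_ends: int, core_mhz: float) -> float:
    """Theoretical peak: one pixel per render back end per clock, in Mpixel/s."""
    return render_back_ends * core_mhz

# (render back ends, core clock in MHz)
# R600 and 8800 GTX back-end counts are from the text; the 8800 GTS count
# and all clock speeds are ASSUMED from published specifications.
cards = {
    "Radeon HD 2900 XT": (16, 742.0),
    "GeForce 8800 GTS": (20, 500.0),
    "GeForce 8800 GTX": (24, 575.0),
}

rates = {name: pixel_fill_rate_mpix(rbe, mhz) for name, (rbe, mhz) in cards.items()}
for name, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{name}: {rate:.0f} Mpixel/s")

# Z-only rate for the R600: 32 Z-values per clock, as stated in the text.
r600_z_rate = 32 * cards["Radeon HD 2900 XT"][1]
print(f"Radeon HD 2900 XT Z rate: {r600_z_rate:.0f} Mpixel/s")
```

Under these assumptions the ordering matches the measured one: the R600 edges out the 8800 GTS on clock speed alone but falls short of the 8800 GTX, which has half again as many render back ends.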

Pixel Shader Performance

The Radeon HD 2900 XT behaves here similarly to its predecessor; however, its performance drops only on pixel shader 2.0 per-pixel lighting. It looks like this particular test hardly measures the mathematical performance of the R600, but rather shows that the chip has serious internal bottlenecks, e.g., slow texture units, which may negatively affect its speed in real-world applications. Still, the new board is definitely faster than the GeForce 8800 GTS and, in two cases, even shows higher performance than the 8800 GTX.

The times when ATI’s hardware used to lead in the Xbitmark test are gone: as we see, there are a lot of cases where Nvidia’s GeForce 8800 GTX leaves its rival in the dust. Moreover, there are a number of cases where the new board does not outperform its predecessor significantly for one simple reason: the newcomer has an Achilles heel in its 16 texture mapping units.



As is known, the 3DMark05 and 3DMark06 benchmarks use similar pixel shaders in the corresponding tests of each suite; however, the latter still works a little faster.

The GeForce 8800 GTX is the indisputable leader here, whereas the Radeon HD 2900 XT’s performance is nearly the same as that of the GeForce 8800 GTS. We are not sure that this test actually measures the computing power of modern GPUs, but we have reasons to believe that it primarily compares the speed of texture sampling, memory controllers and/or caches. Nonetheless, the Radeon HD 2900 XT does not outperform its main rival here despite the fact that it has a higher fillrate than Nvidia’s GeForce 8800 GTS.

Vertex Shader Performance

Apparently, vertex shader performance of ATI Radeon HD 2900 XT looks much better than its pixel shader performance, which is logical, as here the R600 has no performance bottlenecks with slow texture fetching.







The same happens when it comes to 3DMark’s vertex shaders: the new Radeon HD 2900 XT is virtually up to three times faster than its predecessor and even manages to leave its much more expensive rival – Nvidia GeForce 8800 GTX – behind.

Other Theoretical Tests

The “Shaders Particles” test helps to measure the efficiency of processing the physical model of a system of particles by means of pixel shaders and of its visualization with vertex texturing. Previous generation hardware from ATI did not support vertex texture fetches, whereas the new one, obviously, supports this feature without problems.

But even though results of the Radeon HD 2900 XT here are pretty high, the chip does not outperform Nvidia’s flagship offering.

In the “Perlin Noise” test the realistic clouds that change in real time are generated by means of a pixel shader that contains 447 mathematical instructions and 48 texture lookups. So, this test shows how well the GPU can work with long shaders (the Shader Model 3.0 specification requires support for 512-instruction-long shaders).

Just as expected, the Radeon HD 2900 XT took first place in this test: with mathematical instructions outnumbering texture lookups by a factor of nearly 10, the R600 architecture feels very much at home here.
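The proportions of this shader are easy to check from the two counts quoted above. The snippet is a trivial worked example; the function name is hypothetical and the only inputs are the figures given in the text.

```python
def alu_tex_ratio(math_instructions: int, texture_lookups: int) -> float:
    """Math instructions issued per texture lookup."""
    return math_instructions / texture_lookups

MATH_INSTRUCTIONS = 447   # from the text
TEXTURE_LOOKUPS = 48      # from the text
SM30_MIN_LENGTH = 512     # shader length SM3.0 hardware must support (from the text)

ratio = alu_tex_ratio(MATH_INSTRUCTIONS, TEXTURE_LOOKUPS)
total = MATH_INSTRUCTIONS + TEXTURE_LOOKUPS
print(f"{ratio:.1f} math instructions per texture lookup")
print(f"{total} instructions in total, within {SM30_MIN_LENGTH}: {total <= SM30_MIN_LENGTH}")
```

A ratio above nine to one means the texture units are rarely the limiting factor, which is why this test favours a chip with many shader processors and comparatively few texture units.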

Conclusion

We still have to find out the performance of the Radeon HD 2900 XT in real-world games and cannot make any final conclusions now. However, we can already state certain things about the newcomer, which was expected to emerge back in Q4 2006, but was delayed by nearly half a year.

The ATI Radeon HD 2900 XT is here and it works; it has a unified shader architecture; it has over 700 million transistors, an absolute record to date; it features a 512-bit memory bus, for the first time in the industry; it supports many interesting features and technologies; but it also has drivers that need to be improved to utilize the promising new features, and it has only 16 texture units, which limits performance even in synthetic 3D benchmarks.

The main advantage of the ATI Radeon HD 2000 is the exceptional computing power of all chips as well as a modular design, which may allow the company to rapidly create new chips from common building blocks: 5D scalar shader processors, texture processors and render back end processors. But if it takes the company another year to boost the performance of its performance-mainstream, mainstream and entry-level chips, as happened with the transition from the Radeon X1600 XT to the Radeon X1950 Pro, AMD risks losing market share to Nvidia, which consistently sticks to its policy of renewing its product lines every two quarters.

The new full-scene antialiasing methods look promising, but not all of them are ideal, to say the least, as in certain cases we see substantial blur in our screenshots, which is unacceptable in the year 2007, when games tend to feature textures with a lot of small details as well as micro-geometry. We hope that ATI’s adaptive edge-detect filter antialiasing will bring new heights of image quality to consumer 3D graphics.

Another potentially good thing about the Radeon HD 2000 is the built-in hardware tessellator, similar to the tessellator inside the Xbox 360 GPU, which may encourage game developers to take advantage of it, so that games will look more detailed on ATI Radeon HD 2000 graphics cards.

The multimedia features of the Radeon HD 2000 family also look promising, but, unfortunately, the AMD/ATI unified video decoder has yet to be fully implemented in the drivers. Even though the chip now officially supports VC-1 decoding and features an audio controller for audio data transmission over HDMI, the lack of certain software features makes such capabilities useless for now.

Finally, despite the fact that the power consumption of a $399 graphics card is not a really important factor to consider and around 160W is not really a lot for modern power supply units, a cooling system that produces rather loud noise is something that should definitely be corrected on commercial Radeon HD 2900 XT graphics cards.