Knowing the Depths: NVIDIA GeForce 6600 GT Architecture

NVIDIA's GeForce 6600-series graphics processing units are almost here: in mid-September 2004, just a few days from now, NVIDIA's add-in-card partners are expected to begin shipping the new mainstream contender. Let's take a look at what is inside this graphics processor and find out its strong and weak points.

by Tim Tscheblockov, Alexey Stepin
09/07/2004 | 05:30 AM

Introduction

Until now, NVIDIA Corp. has had trouble putting together a comprehensive line-up of modern graphics processors. The company had an excellent chip to build top-end and upper-mainstream graphics cards around, but the mainstream sector was represented only by outdated solutions under the unfortunate GeForce FX brand. Meanwhile, ATI Technologies, the archrival, had the quite competitive RADEON 9800 XT series, which had moved down into the mainstream sector after the release of the new RADEON X800 GPU family.


The GeForce FX architecture was inferior to ATI's competing solutions in many respects, so NVIDIA set out to develop a chip that would bring the GeForce 6800's functionality down to the new mainstream. Today, September 7, the new product is officially released, and you are welcome to attend the delivery. As might have been expected, the newcomer is named GeForce 6600, which points to its kinship with the top-end GeForce 6800 family and severs any tie to the obsolete GeForce FX series.

GeForce 6600, Junior in the Family

The GeForce 6600 is architecturally similar to the GeForce 6800 and boasts all of the latter's capabilities, including support for Shader Model 3.0, high-precision dynamic range (HPDR), UltraShadow II and the other technologies we detailed in our review of the NV40 core. Thus, NVIDIA is a pioneer again, now issuing the world's first mass-market GPU to support the new pixel and vertex shader standard (as you remember, the NV40 was the first GPU to offer Shader Model 3.0 support, so the company is maintaining its technological lead). Besides that, the GeForce 6600 is the first of NVIDIA's products to support the PCI Express bus natively. Until now, all PCI Express-compatible products from NVIDIA have connected to this bus via a special bridge chip. The full list of the GeForce 6600's characteristics follows below:

The new GPU has 8 pixel pipelines and 3 vertex processors (against 16 and 6, respectively, in the GeForce 6800 Ultra), which lets it get by with a lower transistor count: a GeForce 6600 die consists of only 143 million transistors. Compare this to the GeForce 6800 Ultra's 220 million. The new GPU is manufactured with the most advanced 0.11-micron tech process, developed and implemented by Taiwan Semiconductor Manufacturing Company (TSMC).

The simpler die and the finer tech process mean better heat dissipation and power consumption figures as well as a higher chip yield. We hope the acute shortage that hit the GeForce 6800 Ultra/GT won't recur with GeForce 6600-based cards.

NVIDIA’s New Mainstream Offer: GeForce 6600 GT, GeForce 6600

Two models make up the new series: GeForce 6600 GT and GeForce 6600. The first is clocked at 500MHz core / 500 (1000DDR) MHz memory and is equipped with 128MB of high-speed GDDR3 memory. The GeForce 6600 GT is also the world's first mainstream PCI Express GPU to support the Scalable Link Interface (SLI) technology. In other words, you can install two GeForce 6600 GT graphics cards into the two PCI Express x16 slots of your mainboard to enjoy a nearly twofold performance gain in 3D applications. This opportunity doesn't look very alluring right now, but with the release of NVIDIA's own new chipsets and the arrival of relatively inexpensive mainboards based on them, SLI technology may catch on.

The recommended price for the GeForce 6600 GT is set by the developer at $199 for the USA and €229 for Europe.

The GeForce 6600, being the humbler model, comes with a slower type of memory, DDR SDRAM. It is identical to the GeForce 6600 GT in everything except the frequencies and SLI: the GeForce 6600 GPU doesn't support SLI at all and works at 300MHz against the senior model's 500MHz. The memory frequency will vary among graphics card manufacturers. The price of this solution has not been announced; it will probably also vary among manufacturers. The following table compares the characteristics of the two GPU models of the GeForce 6600 series:

GeForce 6600 GT, GeForce 6600 Product Specifications

                     GeForce 6600 GT              GeForce 6600
Core Clock           500MHz                       300MHz
Memory Clock         1000MHz                      determined by manufacturer
Pixel Pipelines      8                            8
Vertex Pipelines     3                            3
Memory Type          GDDR3                        DDR
Memory Size          128MB (reference design)     128MB, determined by manufacturer
Memory Bus           128-bit                      128-bit
Recommended Price    $199 / €229                  determined by manufacturer

This is a worthy replacement for the GeForce FX in everything save for the 128-bit memory bus, but NVIDIA has its GeForce 6800 and 6800 GT to compete with the advanced representatives of the last generation, while the GeForce 6600 family is called upon to take the place of outdated solutions like the GeForce FX 5900 XT, 5700 Ultra and 5700. The new tech process and the low transistor count allowed raising the clock rate of the GeForce 6600 GT to 500MHz; combined with its 8 pixel pipelines, this should result in excellent performance, as mainstream solutions go. Having only three vertex processors (against 6 in the GeForce 6800 Ultra/GT and 5 in the GeForce 6800) may hurt performance in geometry-heavy scenes, but let's not worry about that until the testing part of this review.

We received a sample of the GeForce 6600 GT card for testing. Since the GeForce 6600 differs from it only in frequencies, we will be able to check its performance, too, by simply reducing the clock rates.

GeForce 6600 GT: Taking a Closer Look

Fresh out of its antistatic package, the card surprises with its simple and compact design, uncommon for modern semiconductor produce. Eight-pipeline cards of the last generation like the GeForce FX 5950 Ultra or RADEON 9800 XT looked like much more complex devices. Then again, they also had a 256-bit memory bus, while the PCB of the GeForce 6600 GT carries the wiring for a 128-bit bus only.


Even in comparison with the RADEON X600 XT, the card looks empty, as the shield occupies most of the face side of the PCB. You can only see the wiring for the memory and the simple power filtering and regulation circuitry. The card has no additional power connector, although a spot is left for it on the PCB. Thanks to the 0.11-micron tech process and the ability of the PCI Express bus to supply up to 75 watts to the connected device, there's no need to attach additional power to the GeForce 6600 GT. That's good, as it makes installation easier and frees one PSU connector for another gadget in your system. However, NVIDIA recommends using the GeForce 6600 GT in a system with a high-quality 300W PSU. That's a justified requirement, as cheap low-quality PSUs don't provide the necessary voltage stability. Moreover, such units don't always honestly comply with their own specs, and their design is often simplified. Under a high load, such a unit may die alone or take your whole system with it.

The connector for the special adapter that links two graphics cards in SLI mode is located in the top left corner. The PCB also has a landing pad for a second DVI connector, so we may well see a version of the GeForce 6600 GT with two digital interfaces, especially for owners of two LCD panels. The release of a version with two D-Sub outputs is less likely, although conceivable.

There's a scattering of small elements on the back side of the PCB, although the shield occupies most of it, too. The Philips SAA7115HL chip is responsible for TV input; the GPU itself takes care of outputting the TV signal.

The GPU cooler is rather untypical for NVIDIA, which has equipped its products exclusively with so-called blowers since the GeForce FX 5800. For all their efficiency, blowers can't boast very quiet operation. In this case, however, we have a simple axial fan with seven wide transparent blades.

We had a reference card, so there is no LED highlighting or other special effects here. We are sure, though, that many manufacturers will equip their versions of the GeForce 6600 GT with original coolers and exquisite LED patterns.

The simplicity of the cooling system follows directly from the 0.11-micron tech process: a graphics processor manufactured with so fine a process won't heat up much, even at 500MHz. There is still no tachometer, and the fan speed is constant, unlike on GeForce 6800 cards. The cooler is fastened with two "classic" spring clips. Be careful when installing it: there's no frame to protect the GPU die from chipping, and there's a danger of misaligning the cooler, since its fastening lacks stiffness.

Here’s what we have under the heatsink’s sole:

The die of the new graphics processor is rather small, again thanks to the 0.11-micron tech process. Our sample is marked as NV43; it is of the A2 revision and was manufactured in the 29th week of the current year, i.e. in mid-July.

The memory is not cooled at all, a sad oversight for chips that work at 500 (1000DDR) MHz, even though GDDR3 generates less heat than DDR/GDDR2. Theoretically, the memory should be cooled by the stream of air from the GPU cooler, but this stream is too weak to have any effect. As a result, the memory chips get very hot, although not enough to scorch your fingers. Well, the cooling question is still open, since we are dealing with an engineering sample; NVIDIA's partners may come up with their own cooling systems that take better care of the memory.

The memory is FBGA-packaged and marked as "K4J55323QF-GC20". According to Samsung, the manufacturer of the chips, this means GDDR3 of 8M x 32 organization (256Mbit per chip), 2.0V voltage, 2.0ns access time and a rated frequency of 500 (1000DDR) MHz.

The chips are clocked exactly at their rated frequency on this card. Four chips of that organization make up 128MB of graphics memory, accessed across a 128-bit bus with a peak bandwidth of 16GB/s. The memory subsystem of the GeForce 6600 GT thus looks very similar to that of the GeForce FX 5800 Ultra, where NVIDIA employed chips of the same frequency and a bus of the same width. You may remember that the memory subsystem was among the basic disadvantages of that graphics card. However, this time we are dealing with a mainstream, inexpensive and much more powerful product than the wretched GeForce FX 5800 Ultra, which was positioned as a top-end solution. Mainstream graphics cards have always had a 128-bit memory bus, so there's a case of historical justice here. ;)
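
For the curious, here is a back-of-the-envelope sketch of where the 16GB/s figure comes from, using only the bus width and the effective memory clock quoted above:

```python
# Peak bandwidth of the GeForce 6600 GT memory subsystem:
# 128-bit bus, GDDR3 at 500 MHz real / 1000 MHz effective (DDR) clock.
bus_width_bits = 128
effective_clock_hz = 1000e6            # 1000 MHz effective (500 MHz DDR)

bytes_per_clock = bus_width_bits // 8                   # 16 bytes per transfer
peak_bandwidth = bytes_per_clock * effective_clock_hz   # bytes per second

print(f"Peak bandwidth: {peak_bandwidth / 1e9:.0f} GB/s")   # -> 16 GB/s
```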

GeForce 6600 GT: Noise, Overclocking, 2D Quality

The cooling system of the GeForce 6600 GT proved rather quiet: the fan was audible but not irritating, without any disturbing shrieks. Its sound soon merged with the other system noises, from the CPU cooler, the two fans of the 550W PSU, and the HDD's spindle, never rising above them.

The two samples of the GeForce 6600 GT that happened to be in our labs overclocked differently. One was stable at a 580MHz GPU frequency, which is no record for a 0.11-micron chip. The other card, however, made an impressive climb to 625MHz. The memory overclocked to 550 (1100DDR) MHz on both samples, which is good for 2.0ns chips that start out at their rated frequency.

The graphics card output a sharp picture in all the resolutions our monitor could support, i.e. up to and including 1800x1440@75Hz. Well, we have gotten used to that already: only cheap no-name products, whose manufacturers skimp on the PCB design and the output LC filters, produce a bad onscreen image. The reference GeForce 6600 GT has nothing to do with such "products", of course.

Testbed and Methods

We set up the following platform to check out the performance of the GeForce 6600 GT:

For the comparison’s sake we offer the results of RADEON 9800 XT and GeForce FX 5950 Ultra, graphics cards of the older generation. We tested them on another testbed of the following configuration:

To avoid any confusion, we mark the graphics cards tested on the AMD64 platform with an asterisk. As we do with any new graphics processor, we will run a full cycle of theoretical tests that reveal all the performance-related characteristics of the new solution, from fill rate to pixel shader execution speed. So, let's proceed to the tests!

DivX, DVD, HDTV Playback

GeForce 6800 series GPUs are known to carry a video processor intended for more efficient processing of video streams and for reducing the load on the system's CPU. However, our earlier tests showed that these features were probably disabled at the driver level, since the CPU load was indecently high when playing video of various formats, reaching 100% with HDTV. We decided to check whether the new ForceWare (version 65.76) had brought anything new in this area. Playing an HDTV clip with a vertical resolution of 1080 lines, we got the following results:

For comparison, playing the same clip on a system with an Athlon XP 2600+ loaded the CPU to 80-90%. The GeForce 6600 GT is targeted exactly at mainstream systems equipped with PCI Express, and owners of such platforms will be able to enjoy smooth HDTV playback without an extreme CPU load, dropped frames or jerkiness.

We used a 640x480 clip in this test, and the new GPU did well in this case, too.

We used a legal copy of the Starship Troopers movie in the DVD playback test. With the new driver, NVIDIA’s GPUs were excellent again.

The asterisk in the diagrams marks AGP graphics cards we tested in a system with an AMD Athlon 64 3400+. With ForceWare 65.76 we always saw a considerable alleviation of the CPU load, and NVIDIA's GPUs had an advantage over the rest of the participants, save for the S3 DeltaChrome S8 Nitro with its Chromotion engine. Thus, it is now clear that the video processor, highly touted by NVIDIA, does work. Moreover, it works just excellently!

GeForce 6600 GT: Theoretical Tests

Fill Rate Tests

Traditionally, we open the cycle of theoretical tests with MDolenc's Fillrate Tester, a flexible and useful tool for measuring scene fill rate as well as the efficiency of executing pixel shaders in different color/Z write modes.

The picture changes dramatically over the course of the first test. The GeForce 6600 GT starts out behind the RADEON 9800 XT with no textures and at single-texturing, but finds itself ahead of its rivals as soon as two textures are rendered. The GeForce 6800 GT belongs to another class, of course, and its results are given for comparison's sake only. So why is the GeForce 6600 GT so poor in the first two cases, with zero and one texture?

Obviously, the fill rate of the GeForce 6600 is limited there by the memory bus bandwidth. When rendering fewer than two textures, the GPU processes 8 pixels per clock and refreshes the values in the frame and Z buffers every clock. When rendering 2, 3 or 4 textures, however, it theoretically requires 2, 3 or 4 clocks, respectively, to output the same 8 pixels, and the burden on the memory bus eases.
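
To illustrate, here is a rough sketch of the arithmetic. The per-pixel byte counts (a 32-bit color write plus a 32-bit Z read and a 32-bit Z write, with no compression or caching) are our simplifying assumptions, not measured figures:

```python
# Bandwidth demanded by 8 pixels per clock at 500 MHz versus what the
# 128-bit bus can deliver. Bytes per pixel are assumptions: 4 bytes color
# write + 4 bytes Z read + 4 bytes Z write, no compression, no caching.
core_clock_hz = 500e6
pixels_per_clock = 8
bytes_per_pixel = 4 + 4 + 4

demanded = core_clock_hz * pixels_per_clock * bytes_per_pixel   # bytes/s
available = 16e9                                                # 16 GB/s peak

print(f"Demanded: {demanded / 1e9:.0f} GB/s, available: {available / 1e9:.0f} GB/s")
# -> Demanded: 48 GB/s, available: 16 GB/s. The bus, not the pipelines,
#    caps single-texturing fill rate; with 2-4 textures the GPU spends
#    2-4 clocks per batch of pixels and the demand drops accordingly.
```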

The RADEON 9800 XT, unlike the GeForce 6600, has a 256-bit memory bus, so its results at rendering a small number of textures are higher, notwithstanding its lower clock rate and lower theoretical maximum: its results are simply less limited by the memory bandwidth.

With Z-writes disabled, everything remains the same, save for the worse results of the GeForce FX 5950 Ultra with 1, 2 or 3 textures.

This time, with color writes disabled, the GeForce 6600 GT performs better than the GeForce 6800 GT, despite the difference in the number of their pipelines. This fact is hard to account for: theoretically, the GeForce 6600 GT with its UltraShadow II support can perform up to sixteen Z-writes per clock when color writes are disabled, which should still leave it behind its senior cousin.

It is possible that the Z-check and Z-write units do not work with single pixels in this test, but rather with blocks of them that correspond in size to the tiles the HSR unit(s) can process. This helps to overcome the theoretical maximum, and then the higher clock rate of the GeForce 6600 GT allows the new graphics card to be faster despite having fewer pixel pipelines.
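
A quick calculation shows why the result looks anomalous on paper. Note that the figure of 32 Z-operations per clock for the GeForce 6800 GT is our assumption, based on NV40's published UltraShadow II capabilities:

```python
# Theoretical Z-only throughput. The 6600 GT figure (16 Z-ops/clock) is
# quoted above; the 6800 GT figure (32 Z-ops/clock) is our assumption
# based on NV40's UltraShadow II specifications.
cards = {
    "GeForce 6600 GT": (16, 500e6),   # (Z-ops per clock, core clock in Hz)
    "GeForce 6800 GT": (32, 350e6),
}
for name, (z_ops, clock) in cards.items():
    print(f"{name}: {z_ops * clock / 1e9:.1f} G Z-ops/s")
# -> 8.0 vs 11.2 G Z-ops/s: on paper the 6800 GT should lead, so the
#    measured win of the 6600 GT only makes sense if Z is processed in
#    tile-sized blocks, letting the higher-clocked chip pull ahead.
```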

Next go the texturing speed tests from 3DMark03:

The results in 3DMark03 confirm the Fillrate Tester's report: the GeForce 6600 GT is far from its theoretical limit at single-texturing, limited by the memory bandwidth and by the efficiency of the frame-buffer and Z-buffer caching algorithms.

As a result, the GeForce 6600 GT is slower than the RADEON 9800 XT, which also has eight pixel pipelines and a lower core frequency, but a higher memory bandwidth thanks to its 256-bit memory bus.

The load on the memory bus drops dramatically at multi-texturing: the graphics processor, busy rendering multiple textures and spending more clocks per pixel, accesses the frame buffer and the Z buffer less frequently. As a result, we get numbers close to the theoretical texturing speed of the GeForce 6600 GT.

We would like to note that the 128-bit memory bus and the rather low efficiency at rendering a small number of textures are no catastrophe for the GeForce 6600 GT. This "weakness" can only show up in old games, where high speed with few textures is among the main demands on the graphics card. New games like Far Cry or Doom 3 usually employ many more textures and make extensive use of pixel shaders. Engaged in executing pixel shaders, the graphics processor won't notice that it lacks memory bandwidth: the bus stops being a bottleneck. We will check this supposition in our tests of the GeForce 6600 GT in real gaming applications.

Pixel Shader Performance

The above-mentioned MDolenc Fillrate Tester can be employed to benchmark the performance of modern graphics cards at executing pixel shaders:

It's all natural here: the advanced architecture and high clock rate of the GeForce 6600 GT win it the test, leaving the once-unrivalled RADEON 9800 XT behind. The graph of the 6600 GT has a shape similar to that of the GeForce 6800 GT, though the latter scores higher, of course.

Curiously, the new GPU delivers not exactly half the performance of the GeForce 6800 GT, but rather more, thanks to its higher core frequency.

The results also suggest that the GeForce 6600 GT, like all the new GPUs from NVIDIA, executes pixel shaders faster at half the calculation precision (FP16 instead of FP32).

Disabling Z writes brings no surprises: everything said above holds for this diagram as well.

With color writes disabled, we again see "miracles": the GeForce 6600 GT produces a predictable result when calculating per-pixel lighting, but reaches and even exceeds the level of the GeForce 6800 GT at executing simple pixel shaders!

Next we tested the new NVIDIA GPU in our Xbitmark suite, which evaluates GPU performance at executing pixel shaders of various degrees of complexity.

The GeForce 6600 GT is slower than the RADEON 9800 XT in one case out of fifteen: the Factored BRDF + HDR shader, which is "heavy" in terms of arithmetic computations. The newer model's advantage over the older solution was also rather small with complex shaders like Dot Product Bump Mapping + Specular + Reflection, Metal + Phong, Cook-Torrance + Texture + Fresnel and Wood.

Anyway, the new graphics processor from NVIDIA looks very strong in this test. The NV40 architecture, which is the foundation of the NV43 core (i.e. of the GeForce 6600/6600 GT GPUs), has no clearly weak spots like the NV3x architecture had (particularly in pixel shader performance).

Now, let’s proceed to pixel shader tests from 3DMark 2001SE and 3DMark03 suites:

The performance of the GeForce 6600 GT with version 1.1 pixel shaders is worse than that of the RADEON 9800 XT, which is greatly helped by its 256-bit memory bus.

The GeForce 6600 GT is better at executing version 1.4 pixel shaders – it is only slower than the GeForce 6800 GT, a GPU of a higher class.

3DMark also testifies to the exceptional performance of the GeForce 6600 GT, confirming the results of Xbitmark and MDolenc Fillrate Tester.

Vertex Shader Performance

The GeForce 6600 GT is even faster than the GeForce 6800 GT in 640x480 due to the higher clock rate of its vertex processors. In higher resolutions it is the memory bandwidth, rather than the geometry processing speed, that determines the result. Clearly, this test is poorly suited for evaluating the vertex performance of modern graphics cards.

Things are different in the analogous test from the 3DMark03 suite: here, the speed of executing vertex shaders almost single-handedly determines the end result. With 3 vertex pipelines against 6, but a higher core clock rate, the GeForce 6600 GT is slightly faster than half the speed of the GeForce 6800 GT.
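
The ratio is easy to verify with the published unit counts and clock rates; a minimal sketch:

```python
# Relative theoretical vertex throughput: 3 vertex units at 500 MHz
# (6600 GT) against 6 vertex units at 350 MHz (6800 GT).
gf6600gt = 3 * 500
gf6800gt = 6 * 350

print(f"6600 GT / 6800 GT = {gf6600gt / gf6800gt:.2f}")   # -> 0.71
# The theoretical ceiling is about 0.71, so a measured result above 0.5
# is exactly what the clock advantage predicts.
```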

T&L Emulation

The GeForce 6600 GT-based card performs as expected in this test: in 640x480 resolution, i.e. when the results are least limited by the memory bus bandwidth, the GeForce 6600 GT is a little faster than half the speed of the GeForce 6800 GT.

With eight light sources, however, the GeForce 6600 GT suddenly loses ground, even falling behind the RADEON 9800 XT. The problem may lie somewhere in the driver: the commands of classic T&L may be translated into equivalent vertex shaders inefficiently.

The test of processing arrays of sprites loads the vertex processors, but also demands a high memory bandwidth. Once again, the new product from NVIDIA suffers from its narrow 128-bit bus, although clocked at 500 (1000DDR) MHz.

Relief Rendering and Other Theoretical Tests

The GeForce 6600 GT has nice relief-rendering skills, at least with the EMBM method. It even surpasses the GeForce 6800 GT in two resolutions out of three, but the same narrow memory bus becomes the bottleneck in 1600x1200.

Dot3 relief rendering is another matter: the new NVIDIA solution delivers roughly the performance of the old-generation graphics cards.

The Ragtroll test shows the balance of the "CPU-driver-graphics card" chain, with the scene geometry processed by the GPU and the physics by the CPU. According to the diagram, the GeForce 6600 GT has everything in order here, surpassing both the RADEON 9800 XT and the GeForce FX 5950 Ultra.

Synthetic Tests: Conclusion

There are some points we’d like to draw your attention to in the results of the GeForce 6600 GT in synthetic tests.

First of all, the new GPU boasts very high multi-texturing efficiency, which is indicative of a well-designed system of texture caches. The GeForce 6600 GT delivers slightly more than half the performance of the GeForce 6800 GT in this case, which fully corresponds to its having half as many pixel pipelines but a higher core frequency.
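
The same clock-versus-pipelines arithmetic, sketched briefly for the pixel side:

```python
# Theoretical multi-texturing throughput ratio: 8 pipelines at 500 MHz
# (6600 GT) against 16 pipelines at 350 MHz (6800 GT).
ratio = (8 * 500) / (16 * 350)
print(f"6600 GT / 6800 GT = {ratio:.2f}")   # -> 0.71
# Slightly more than half of the senior model's throughput, which is
# just what the multi-texturing tests show.
```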

At single-texturing, or when writing colors only (without rendering textures), the GeForce 6600 GT feels the lack of memory bandwidth, which reduces the GPU's efficiency. We shouldn't regard this as a serious shortcoming of the GeForce 6600 family: in many modern games the load on the pixel processors consists of pixel shaders rather than trivial texturing, and the speed of the memory subsystem ceases to be among the crucial performance factors.

So what about shaders? In short, the GeForce 6600 GT is excellent at executing them. Sharing the same architecture with the GeForce 6800 GT but having half as many vertex and pixel processors, the GeForce 6600 GT leverages its higher core frequency to score more than half the performance of the GeForce 6800 GT. That makes the new GPU faster than the ATI RADEON 9800 XT as well as the GeForce FX 5950 Ultra in the majority of relevant synthetic tests, not to mention the feeble RADEON X600 XT and GeForce FX 5700/5700 Ultra with their 4 pixel pipelines.

Everything said about the GeForce 6600 GT can be applied to the GeForce 6600 with a correction for its lower clock rates.

So our synthetic tests reveal the huge potential of the NVIDIA GeForce 6600/6600 GT GPUs, which mainstream graphics cards of the previous generation cannot match in either functionality or, most attractively, speed. Moreover, the newcomers can even bear comparison with the former top-end solutions, the RADEON 9800 XT and GeForce FX 5950 Ultra.

We're finished with theory now, but that's not all. In our next report we will run a bunch of modern games on the GeForce 6600/6600 GT to check out the real-life performance of these GPUs.