Specifications, Features and Peculiarities of NVIDIA GeForce 6800/6800 Ultra
NVIDIA’s previous-generation graphics processors, built on the NV30/NV35 architecture, were more flexible than ATI’s R300/R350 chips. However, the extra functionality of their pixel processors came at the cost of rather slow shader execution. This remains a sore spot of the CineFX/CineFX 2.0 architecture and of GeForce FX chips in their competition with RADEON 9xxx graphics processors.
When developing its next-generation processor, NVIDIA naturally drew on this experience. The company could hardly accept a situation in which its main rival’s solutions were faster in nearly every price segment despite NVIDIA’s formal technological superiority. That’s why, besides further functionality enhancements and the new support for version 3.0 pixel shaders, NVIDIA tried to boost the NV40’s performance and to shore up the weak spots of the CineFX architecture.
Here are the basic characteristics of the NVIDIA GeForce 6800 Ultra GPU in comparison with top-end graphics processors of the previous generation (NVIDIA GeForce FX 5950 Ultra and ATI RADEON 9800 XT).
| Parameter | NVIDIA GeForce 6800 Ultra | NVIDIA GeForce FX 5950 Ultra | ATI RADEON 9800 XT |
|---|---|---|---|
| Process technology | 0.13 micron | 0.13 micron | 0.15 micron |
| Transistor count | 220 million | 125 million | 110 million |
| GPU clock speed | 400MHz | 475MHz | 412MHz |
| Memory controller | 256-bit DDR/GDDR2/GDDR3 | 256-bit DDR SDRAM | 256-bit DDR SDRAM |
| Memory clock speed, effective (physical) | 1100 (550) MHz | 950 (475 DDR) MHz | 730 (365 DDR) MHz |
| Peak theoretical memory bandwidth | 35.2GB/s | 30.4GB/s | 23.4GB/s |
| Maximum memory size | 512MB? | 256MB | 256MB |
| AGP interface | AGP 3.0 4x/8x | AGP 3.0 4x/8x | AGP 3.0 4x/8x |
| Pixel processors, pixel shaders | |||
| Pixel shader version | 3.0 | 2.x | 2.0 |
| Max number of pixels per clock | 16 | 4 | 8 |
| Max number of Z-values per clock | 32 | 8 | 8 |
| Number of TMUs | 16 | 8 | 8 |
| Static loops and branching | yes | no | no |
| Dynamic loops and branching | yes | no | no |
| Max number of textures per shader | 16 | 16 | 16 |
| Max number of texture instructions | n/a | 1024 | 32 |
| Max number of arithmetic instructions | n/a | 1024 | 64 (+64) |
| Max number of instructions per shader | n/a | 1024 | 96 (+64) |
| Registers | n/a | 2 color registers, 512 (1024) constant registers, 8 texture coordinates registers, 16 TMU identification registers, 16 (32) temporary registers, 4 resulting color registers, 1 resulting Z register | 2 color registers, 32 constant registers, 8 texture coordinates registers, 16 TMU identification registers, 12 temporary registers, 4 resulting color registers, 1 resulting Z register |
| Data representation formats | Fixed point?; 16-bit float; 32-bit float | Fixed point; 16-bit float; 32-bit float | Fixed point; 16-bit float; 32-bit float |
| Internal pixel shader pipeline precision | 128-bit pixel precision; 32-bit float precision; 16-bit float precision | 128-bit pixel precision; 32-bit float precision; 16-bit float precision | 96-bit pixel precision; 24-bit float precision; |
| Multiple Render Targets | yes | no | yes |
| FP Render Target | yes | no | yes |
| Texture filtering | Bilinear, trilinear, anisotropic, trilinear + anisotropic | Bilinear, trilinear, anisotropic, trilinear + anisotropic | Bilinear, trilinear, anisotropic, trilinear + anisotropic |
| Max level of anisotropic filtering | 16x | 8x | 16x |
| Vertex processors, vertex shaders | |||
| Vertex shader version | 3.0 | 2.x | 2.0 |
| Number of vertex processors | 6 | 3 | 4 |
| Static loops and branching | yes | yes | yes |
| Dynamic loops and branching | yes | yes | no |
| Max number of instructions per shader | n/a | 256 | 256 |
| Max number of executed instructions (with loops) | 65536? | 65536 | 65536 |
| Registers | 16 input registers, 16 temporary registers, 256 constant floating-point registers, 256 constant integer registers, 256 Boolean registers, 1 address register, 1 loop counter register, 8 output registers for texture coordinates, 1 fog output register, 1 vertex position output register, 1 point size output register, 2 output registers for diffuse/specular color components | 16 input registers, 12 temporary registers, 256 constant floating-point registers, 16 constant integer registers, 16 Boolean registers, 1 address register, 1 loop counter register, 8 output registers for texture coordinates, 1 fog output register, 1 vertex position output register, 1 point size output register, 2 output registers for diffuse/specular color components | |
| Data representation formats | 32-bit floating point | 32-bit floating point | 32-bit floating point |
| Texture read from vertex shader | yes | no | no |
| Tessellation | no | no | no |
| Full-scene anti-aliasing | |||
| FSAA methods | Supersampling, multisampling, rotated-grid multisampling? | Supersampling, multisampling, ordered-grid supersampling, ordered-grid multisampling | Rotated-grid multisampling |
| Number of samples | 2 to 8 | 2 to 8 | 2, 4, 6 |
| Technologies aimed at higher memory bandwidth efficiency | |||
| Hidden surface removal (HSR) | yes | yes | yes |
| Frame buffer, Z-buffer, texture compression | yes | yes | yes |
| Fast Z-clear | yes | yes | yes |
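Several rows of the table are derived by simple arithmetic. Peak memory bandwidth is the bus width in bytes times the effective (double-data-rate) memory clock. Theoretical fill rates are the per-clock unit counts times the core clock. The Python sketch below reproduces these figures from the table's own inputs. The `Gpu` class and its method names are hypothetical, and the results are theoretical peaks that assume every unit is busy on every clock, which real games never achieve. In decimal gigabytes (10^9 bytes) the three cards give 35.2, 30.4 and 23.4 GB/s. Dividing by 2^30 instead gives about 32.8, 28.3 and 21.8 GiB/s, so 28.3 and 21.8 are simply the binary-gigabyte equivalents of 30.4 and 23.4 GB/s.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Per-card inputs copied from the comparison table (hypothetical structure)."""
    name: str
    core_mhz: int           # GPU clock speed
    mem_bus_bits: int       # memory interface width
    mem_effective_mhz: int  # effective (double-data-rate) memory clock
    pixels_per_clock: int
    z_per_clock: int
    tmus: int

    def bandwidth_bytes_per_s(self) -> int:
        # Bytes moved per transfer x transfers per second.
        return (self.mem_bus_bits // 8) * self.mem_effective_mhz * 1_000_000

    def fill_rates_mps(self) -> dict[str, int]:
        # Theoretical peaks in millions per second: units per clock x clock in MHz.
        return {
            "pixels": self.pixels_per_clock * self.core_mhz,
            "z_values": self.z_per_clock * self.core_mhz,
            "texels": self.tmus * self.core_mhz,
        }


GPUS = [
    Gpu("GeForce 6800 Ultra", 400, 256, 1100, 16, 32, 16),
    Gpu("GeForce FX 5950 Ultra", 475, 256, 950, 4, 8, 8),
    Gpu("RADEON 9800 XT", 412, 256, 730, 8, 8, 8),
]

if __name__ == "__main__":
    for gpu in GPUS:
        bw = gpu.bandwidth_bytes_per_s()
        rates = gpu.fill_rates_mps()
        print(
            f"{gpu.name:22s} {bw / 1e9:5.1f} GB/s ({bw / 2**30:5.1f} GiB/s), "
            f"{rates['pixels']} Mpix/s, {rates['z_values']} Mz/s, {rates['texels']} Mtex/s"
        )
```

For the GeForce 6800 Ultra this works out to 6400 Mpixel/s and 12800 million Z-values per second. By comparison, the GeForce FX 5950 Ultra reaches 1900 Mpixel/s and the RADEON 9800 XT 3296 Mpixel/s.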
The list of parameters and features is impressive: this is a new-generation solution in every respect. The pixel pipeline can output up to 16 pixels per clock cycle, and there are 6 vertex processors, so any competitor should beware of that much power. The CineFX 3.0 architecture supports version 3.0 shaders, a new full-scene anti-aliasing method, an improved anisotropic filtering algorithm and Ultra Shadow II technology for faster shadow processing in next-generation games like Doom III. High-Precision Dynamic-Range (HPDR) technology makes it possible to render scenes with high-dynamic-range lighting.
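High-dynamic-range lighting depends on storing values outside the [0, 1] range that an 8-bit fixed-point color channel can hold. The minimal sketch below assumes a scene stored either in 16-bit float channels (one of the data formats listed in the table) or in conventional 8-bit fixed-point channels. It is an illustration of the number formats only, not NVIDIA's HPDR implementation, and the scene values and function names are hypothetical. It uses the `'e'` format code of Python's `struct` module to round values to IEEE 754 half precision.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to an IEEE 754 half-precision (16-bit) float and back.

    struct refuses values beyond the fp16 range (largest finite value 65504).
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_unorm8(x: float) -> float:
    """Quantize x to an 8-bit fixed-point channel: clamp to [0, 1], then 256 levels."""
    clamped = min(max(x, 0.0), 1.0)
    return round(clamped * 255) / 255


# Hypothetical linear lighting values spanning deep shadow to a bright highlight.
SCENE = {"shadow": 0.002, "midtone": 0.18, "sky": 40.0, "sun_glint": 3000.0}

if __name__ == "__main__":
    for label, value in SCENE.items():
        print(f"{label:10s} linear={value:<8g} fp16={to_fp16(value):<10g} unorm8={to_unorm8(value):.4f}")
```

Bright values such as 40.0 and 3000.0 survive the fp16 round trip exactly, while the 8-bit channel clamps both to 1.0. A dim value like 0.002 keeps about three significant digits in fp16 but collapses to the smallest non-zero 8-bit step (1/255).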
Now let’s take a closer look at the basic characteristics of the NVIDIA NV40.