The graphics business unit of Advanced Micro Devices this week unveiled its new graphics processing unit (GPU), code-named Cayman, which significantly changes the architecture of ATI Radeon graphics processors but brings only a relatively small performance improvement over the previous generation of products.
“AMD Radeon HD 6900 series graphics feature AMD’s second-generation, DirectX 11-capable architecture, new image quality improvements and up to 2GB of graphics frame buffer, making it a great choice for gamers and enthusiasts,” said Matt Skynner, corporate vice president and general manager of GPU division at AMD.
The Radeon HD 6900 "Cayman" GPU sports a number of major architectural changes compared to the previous-generation graphics processors.
- Firstly, the Cayman has two independent graphics engines, each with its own vertex assembly, geometry assembly, tessellation, backface culling, clipping and rasterization/HyperZ units, among others. As a result, the chip has a peak throughput of 2 primitives per clock while maintaining a peak rasterization rate of 32 pixels per clock, a significant improvement over the previous-generation products.
- Secondly, the Radeon HD 6900 changes the stream core (SC) to the so-called VLIW4 (very long instruction word) architecture. Previous-generation graphics processors featured VLIW5 stream cores, each with four simple arithmetic logic units (ALUs, or processing elements, as developers sometimes call them) for simple operations and one so-called transcendental ALU capable of performing one complex instruction per clock. While that architecture survived for over four years, engineers from the company claim that it was almost impossible to utilize all five ALUs at once because of difficulties with register management and the inherent complexity of writing a program that could keep all five units busy. The four new ALUs are neither simple nor complex and can perform up to 4 FMA (or 4 MAD, 4 MUL, 4 ADD, etc.) operations per clock. According to developers, this architecture saves around 10% of die area while maintaining similar performance, and it simplifies scheduling and register management. As a result of the SC change, each SIMD of the Cayman processor has 64 stream processors. AMD says that the maximum number of SIMDs per chip is 24 (1536 ALUs), although earlier rumours claimed that it could be as high as 30.
- Thirdly, the new chip supports so-called asynchronous dispatch, which allows the GPU to perform multiple completely independent tasks from completely different applications in parallel. For example, if a video game today requires the GPU to process both graphics and physics effects, the GPU has to compute physics first and then process graphics. With Cayman, it is possible to assign certain SIMD [single instruction, multiple data] engines to certain tasks. Unfortunately, the DirectX 11 and OpenCL 1.1 application programming interfaces do not support this capability, which is why it will be exposed only in the future.
- Fourthly, AMD introduces an intelligent power tuning technology called PowerTune, which automatically adjusts GPU power draw by dynamically controlling clock speeds.
- Additional improvements include new render back-end units, support for higher-quality antialiasing and improvements aimed at general-purpose computing on GPUs.
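The per-SIMD and per-clock figures in the list above can be cross-checked with simple arithmetic. The Python sketch below is only an illustration: the function names are hypothetical, the figure of 16 stream cores per SIMD is inferred by dividing the quoted 64 stream processors by the four ALUs of a VLIW4 core, and the clock speed is left as a parameter rather than taken from any AMD document.

```python
# Figures taken from the article text.
ALUS_PER_VLIW4_CORE = 4          # four ALUs per VLIW4 stream core
STREAM_PROCESSORS_PER_SIMD = 64  # stream processors (ALUs) per SIMD
PRIMITIVES_PER_CLOCK = 2         # two graphics engines, one primitive each
PIXELS_PER_CLOCK = 32            # peak rasterization rate


def stream_cores_per_simd() -> int:
    """Inferred value: stream processors per SIMD divided by ALUs per core."""
    return STREAM_PROCESSORS_PER_SIMD // ALUS_PER_VLIW4_CORE


def total_stream_processors(simd_count: int) -> int:
    """Total ALUs on a chip with the given number of SIMD engines."""
    return simd_count * STREAM_PROCESSORS_PER_SIMD


def millions_per_second(per_clock: int, clock_mhz: int) -> int:
    """Convert a per-clock peak rate into millions of items per second."""
    return per_clock * clock_mhz


if __name__ == "__main__":
    print("Stream cores per SIMD:", stream_cores_per_simd())
    print("ALUs with 24 SIMDs:", total_stream_processors(24))
    print("ALUs with the rumoured 30 SIMDs:", total_stream_processors(30))
    # Example clock only; pass whichever clock speed applies to a given card.
    print("Mprimitives/s at 880MHz:", millions_per_second(PRIMITIVES_PER_CLOCK, 880))
    print("Mpixels/s at 880MHz:", millions_per_second(PIXELS_PER_CLOCK, 880))
```

With 24 SIMDs the calculation reproduces the 1536 ALUs that AMD quotes, while the rumoured 30-SIMD configuration would have carried 1920.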
“Delivering DirectX 11 performance with intelligent tessellation, image quality improvements with new anti-aliasing modes and AMD PowerTune technology, we believe AMD Radeon HD 6900 series graphics cards will make excellent gifts this holiday season,” added Mr. Skynner.
Initially, AMD plans to release two graphics cards based on the code-named Cayman GPU:
- Radeon HD 6970 – 1536 stream processing units, 96 texture units, 32 render back ends, 880MHz clock speed and 256-bit memory bus. The chip incorporates 2.64 billion transistors, has a die size of 389mm² and has a maximum compute performance of 2.7TFLOP/s SP and 0.675TFLOP/s DP. The card can connect up to five displays at once (using 2x DVI, 2x mDP + HDMI connectors) and carries 2GB of 5.5GHz GDDR5 memory. The recommended e-tail price of the new card is $369.
- Radeon HD 6950 – 1408 stream processing units, 88 texture units, 32 render back ends, 800MHz clock speed and 256-bit memory bus. The chip has a maximum compute performance of 2.25TFLOP/s SP and 0.563TFLOP/s DP. The card can connect up to five displays at once (using 2x DVI, 2x mDP + HDMI connectors) and carries 2GB of 5.0GHz GDDR5 memory. The recommended e-tail price of the new card is $299.
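The compute figures quoted for the two cards follow from their stream processor counts and clock speeds. The Python sketch below shows the arithmetic under stated assumptions: each stream processor is taken to retire one FMA (counted as two floating-point operations) per clock, double precision is assumed to run at a quarter of the single-precision rate (which is consistent with the quoted numbers), and the "5.5GHz" and "5.0GHz" memory figures are treated as effective per-pin data rates in Gbit/s. The function names are hypothetical.

```python
FLOPS_PER_FMA = 2      # one fused multiply-add counts as two operations
DP_TO_SP_RATIO = 0.25  # assumption: DP throughput is a quarter of SP


def peak_sp_tflops(stream_processors: int, clock_mhz: int) -> float:
    """Peak single-precision throughput in TFLOP/s."""
    return stream_processors * FLOPS_PER_FMA * clock_mhz * 1e6 / 1e12


def peak_dp_tflops(stream_processors: int, clock_mhz: int) -> float:
    """Peak double-precision throughput in TFLOP/s (assumed 1/4 of SP)."""
    return peak_sp_tflops(stream_processors, clock_mhz) * DP_TO_SP_RATIO


def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps


if __name__ == "__main__":
    # (stream processors, core clock in MHz, memory data rate in Gbit/s per pin)
    cards = {
        "1536-SP card": (1536, 880, 5.5),
        "1408-SP card": (1408, 800, 5.0),
    }
    for name, (sps, clock, mem_rate) in cards.items():
        print(name)
        print("  SP TFLOP/s:", peak_sp_tflops(sps, clock))
        print("  DP TFLOP/s:", peak_dp_tflops(sps, clock))
        print("  Memory GB/s:", memory_bandwidth_gbs(256, mem_rate))
```

The results come out at roughly 2.70 and 2.25TFLOP/s in single precision and about 0.676 and 0.563TFLOP/s in double precision, which agrees with the quoted figures to within rounding. Under the data-rate assumption, the 256-bit bus would deliver about 176GB/s and 160GB/s respectively.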
Even though the new-generation graphics cards offer fewer stream processing elements than their predecessors from the Radeon HD 5800 series, they offer the same compute performance and even manage to leave the previous-generation products behind in actual games, based on an early look at the graphics boards' performance from X-bit labs.
The Cayman graphics processor was supposed to be made using 32nm process technology at Taiwan Semiconductor Manufacturing Company. Unfortunately for AMD's ATI division, TSMC scrapped the 32nm fabrication process, and the chip designer had to redesign the chip and trim the number of its processing elements so as to ensure high yields and moderate costs. Had it been made using the 32nm process, it would have incorporated more SIMD engines and stream processors, which would have ensured much higher performance.
AMD Radeon HD 6970 2GB and AMD Radeon HD 6950 2GB are available now from various suppliers across the globe, according to AMD.