When 64 Equals 320: Unified Shader Processor of ATI Radeon HD 2000
ATI’s new shader processor is more complex than those of the Radeon X1000 or the Nvidia GeForce 8800.
Each pixel processor of the R500 series contained 2 scalar and 2 vector ALUs plus a branch execution unit, so it could execute up to 4 instructions per clock cycle plus 1 branch instruction. The shader processor of the new R600 chip incorporates 5 scalar ALUs, each capable of executing one floating-point MAD (Multiply-ADD) instruction per cycle; one of them can also execute transcendental instructions such as SIN, COS, LOG and EXP. The sixth unit in the R600 shader processor is a branch execution unit responsible for flow control instructions (comparisons, loops, subroutine calls). First introduced in the R520, this unit works with the dispatch processor to accelerate shaders that use dynamic branching.
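To make the issue rules of the R600 shader processor more concrete, here is a minimal sketch that greedily packs a stream of independent scalar operations into per-clock groups: at most 5 operations per clock, of which at most one may be a transcendental. This is a hypothetical model, not AMD's shader compiler: the function name `pack_bundles` is invented for illustration, and the model assumes all operations are independent, keeps them in program order and ignores every other hardware constraint.

```python
# Hypothetical model of the R600 shader processor's ALU issue rules:
# 5 scalar ALU slots per clock, only one of which can run transcendentals.
# Assumes independent operations issued in order; not AMD's real compiler.
SLOTS_PER_CLOCK = 5
TRANSCENDENTAL = {"SIN", "COS", "LOG", "EXP"}


def pack_bundles(ops):
    """Greedily group scalar opcodes into per-clock bundles."""
    bundles = []
    current = []
    has_transcendental = False
    for op in ops:
        is_trans = op in TRANSCENDENTAL
        # Close the current bundle if all 5 slots are used, or if the single
        # transcendental-capable ALU is already taken in this clock.
        if len(current) == SLOTS_PER_CLOCK or (is_trans and has_transcendental):
            bundles.append(current)
            current, has_transcendental = [], False
        current.append(op)
        has_transcendental = has_transcendental or is_trans
    if current:
        bundles.append(current)
    return bundles


if __name__ == "__main__":
    # 12 MADs fill 5 + 5 + 2 slots: 3 clocks.
    print([len(b) for b in pack_bundles(["MAD"] * 12)])
    # Back-to-back transcendentals serialize on the one capable ALU.
    print(pack_bundles(["SIN", "COS", "EXP"]))
```

The sketch shows why peak throughput depends on the instruction mix: a stream of plain MADs keeps all 5 ALUs busy, while a run of transcendentals can occupy only one slot per clock.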
In addition, each subunit is equipped with a dedicated array of general-purpose registers. In theory, each ALU should also be able to access the registers of other shader processors, but it is not clear whether this is the case in practice. Integrating the general-purpose registers into the shader processor makes the GPU design more scalable: adding or removing shader processors automatically adds or removes registers in proportion.
Interestingly, ATI/AMD prefers to quote the total number of execution units (320) rather than the 64 shader processors with 5 ALUs each. This approach is as valid as any other, but keep in mind that directly comparing ALU counts between the GeForce 8800 and the Radeon HD 2000 is not quite correct, because an ALU in one architecture is not equivalent to an ALU in the other.
We know that each of the Nvidia G80’s 128 shader processors (whose internal architecture is still largely undisclosed, by the way) can execute two scalar instructions, a MAD and a MUL, per clock cycle. Each of the AMD R600’s processors can execute up to 5 instructions (including one transcendental) plus a flow control instruction per clock. However, the G80’s execution units run at a considerably higher frequency than the R600’s, so we can expect the two chips to deliver similar overall performance. The developers’ figures confirm this: Nvidia rates the computing power of its GeForce 8800 GTX at approx. 520 gigaflops, whereas AMD rates its Radeon HD 2900 XT at 475 gigaflops.
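The two vendor figures can be reproduced as units × floating-point operations per unit per clock × clock frequency. The sketch below assumes the usual convention that a MAD counts as 2 operations and a MUL as 1, and it uses the reference clocks of the two cards (a 1350 MHz shader clock for the GeForce 8800 GTX and 742 MHz for the Radeon HD 2900 XT), which are not stated in this section. The helper name `peak_gflops` is hypothetical.

```python
# Peak-throughput arithmetic under stated assumptions:
# MAD = 2 flops, MUL = 1 flop; clocks are the cards' reference specifications.


def peak_gflops(units, flops_per_unit_per_clock, clock_mhz):
    """Theoretical peak in GFLOPS: units * flops/clock * clock (MHz -> GHz)."""
    return units * flops_per_unit_per_clock * clock_mhz / 1000


# G80 (GeForce 8800 GTX): 128 processors, MAD + MUL = 3 flops per clock.
g80 = peak_gflops(units=128, flops_per_unit_per_clock=3, clock_mhz=1350)

# R600 (Radeon HD 2900 XT): 64 processors x 5 ALUs = 320 ALUs, one MAD each.
r600 = peak_gflops(units=64 * 5, flops_per_unit_per_clock=2, clock_mhz=742)

if __name__ == "__main__":
    print(f"GeForce 8800 GTX: {g80:.1f} GFLOPS")
    print(f"Radeon HD 2900 XT: {r600:.1f} GFLOPS")
```

This arithmetic shows how the R600's wider processors and the G80's higher shader clock roughly cancel out, giving about 518 and 475 GFLOPS respectively. Both are theoretical peaks that assume every ALU issues its maximum number of operations on every clock.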