As in the R600, each shader processor of the RV670 consists of six subunits: five ALUs and one flow control subunit (branching, comparison, loops, subroutine calls). It also contains a set of general-purpose registers.
Four of the five ALUs are simple units, each capable of executing one FP MAD instruction per clock cycle, while the fifth ALU can execute complex instructions such as SIN, COS, LOG and EXP. This architecture is highly flexible and scalable, but it depends heavily on software optimizations. Although each Radeon HD core contains a special task dispatcher, its efficiency depends directly on the efficiency of the shader code compiler, which is part of the driver. The superscalar architecture delivers its highest performance when all the ALUs are busy computing independent operations, but that is hard to achieve because in 3D applications many operations depend on the results of previous ones. That’s why Radeon HD GPUs require application-specific optimizations in the driver. Unfortunately, AMD has problems with that because it doesn’t have access to the innards of games participating in Nvidia’s The Way It’s Meant to Be Played program, so it has to optimize the driver after the game’s final release. This is more difficult and sometimes not as successful as we might wish.
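To make the dependency problem concrete, here is a minimal Python sketch of how operations might be packed into groups of five, one group per clock cycle. It is only an illustration under simplifying assumptions, not AMD's actual compiler or dispatcher: the names `pack_bundles` and `utilization` are hypothetical, all five ALUs are treated as interchangeable (the special role of the fifth ALU is ignored), and every operation is assumed to complete in one cycle.

```python
from typing import Dict, List, Set

BUNDLE_WIDTH = 5  # five ALUs per shader processor, as described in the text


def pack_bundles(deps: Dict[str, Set[str]], width: int = BUNDLE_WIDTH) -> List[List[str]]:
    """Greedy packing (hypothetical, simplified): each cycle, issue up to
    `width` operations whose inputs were all produced in earlier cycles."""
    done: Set[str] = set()
    pending = list(deps)  # dict order is treated as program order
    bundles: List[List[str]] = []
    while pending:
        # An operation is ready when everything it depends on is already done.
        ready = [op for op in pending if deps[op] <= done][:width]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependency")
        bundles.append(ready)
        done.update(ready)
        pending = [op for op in pending if op not in ready]
    return bundles


def utilization(bundles: List[List[str]], width: int = BUNDLE_WIDTH) -> float:
    """Fraction of ALU slots that actually did work."""
    issued = sum(len(bundle) for bundle in bundles)
    return issued / (width * len(bundles))


# Five independent operations: all of them fit into a single cycle.
independent = {op: set() for op in "abcde"}

# Five operations where each one needs the result of the previous one.
chain = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"}, "e": {"d"}}

print(utilization(pack_bundles(independent)))  # 1.0
print(utilization(pack_bundles(chain)))        # 0.2
```

In this toy model the independent operations keep all five ALUs busy, while the dependent chain leaves four of them idle in every cycle. Real shader code falls somewhere in between, which is why the quality of the compiler matters so much for this architecture.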
Perhaps ATI’s vision of the future of GPUs is indeed more progressive than Nvidia’s, but game developers always target the most popular architecture rather than the most innovative one, and today that architecture is represented by the scalar cores of the GeForce 8 series.
AMD has to cooperate more closely with game developers to overcome this problem. It also has to deliver new high-speed graphics solutions on time and reduce prices for its products when necessary. Such measures would make Radeon HD cards interesting for end-users and popular among gamers, and would restore the company’s reputation on the consumer 3D market.