Radeon HD 2000 Texture Processor: Enough Is Enough?
Although games clearly tend to use more and more shader-based visual effects, practice shows that complex texturing is still widely employed, and the 16 TMUs of the Radeon X1950 XTX often prove to be its bottleneck. It therefore seems odd that the Radeon HD 2900 still has only 16 texture filter units.
The Radeon HD 2000 texture processor is a highly integrated device configured as follows:
- 8 texture address units to calculate the address to sample
- 20 texture samplers
- 4 texture filter units
The new texture processor supports filtering of FP32 textures as well as vertex texture fetch, a feature the Radeon X1000 did not support. It also supports 8192x8192 textures and the RGBE 9:9:9:5 texture format to comply with DirectX 10 requirements. In addition, ATI/AMD claims improved anisotropic filtering quality.
Regrettably, ATI’s new top-end solution, the Radeon HD 2900, has only 16 texture filter units (four in each of its four texture processors), which looks like a potential bottleneck. The Radeon HD 2600 has only 8 texture filter units: more than the Radeon X1600, but just as many as the Radeon X1650 XT.
For comparison, the Nvidia G80’s texture processors contain a total of 64 texture filter units, a fourfold advantage! There is parity in texture address units, though: Nvidia’s 8 processors with 4 address units each and the ATI R600’s 4 processors with 8 units each both add up to 32 texture address units per GPU. So, the two solutions should be roughly equal in pure texturing speed, but the R600 may be inferior to its opponent in the speed of texture filtering. We’ll check this out shortly in our theoretical tests.
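The unit counts quoted above can be tallied in a few lines. This is only a sketch of the arithmetic: the `TextureConfig` class and its field names are made up for illustration, and the G80's filter units are entered as a total because the per-processor figure is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextureConfig:
    """Hypothetical container for the unit counts quoted in the text."""
    name: str
    processors: int
    address_units_per_processor: int
    filter_units_total: int

    @property
    def address_units_total(self) -> int:
        # Total address units = processors x address units per processor.
        return self.processors * self.address_units_per_processor


# R600: 4 texture processors, each with 8 address units and 4 filter units.
r600 = TextureConfig("ATI R600", processors=4,
                     address_units_per_processor=8,
                     filter_units_total=4 * 4)

# G80: 8 texture processors with 4 address units each, 64 filter units overall.
g80 = TextureConfig("Nvidia G80", processors=8,
                    address_units_per_processor=4,
                    filter_units_total=64)


def filter_advantage(a: TextureConfig, b: TextureConfig) -> float:
    """Ratio of total filter units of configuration a to configuration b."""
    return a.filter_units_total / b.filter_units_total


for gpu in (r600, g80):
    print(gpu.name, gpu.address_units_total, gpu.filter_units_total)
print("G80 filter advantage:", filter_advantage(g80, r600))
```

These are raw unit counts only; clock speeds and per-unit throughput are not modeled, so the ratio says nothing definitive about real-world filtering speed.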
To mask texture-mapping latencies, ATI enlarged the L2 texture cache to 256KB in the Radeon HD 2900 (R600) and to 128KB in the Radeon HD 2600 (RV630). The Radeon HD 2400 (RV610) uses a unified L1/L2 cache of unknown capacity for textures and vertices.
Besides caching algorithms, the performance hit from the modest texturing speed can be reduced by the dynamic dispatcher and by features like Fetch4, which speeds up the sampling of one-component textures from adjacent addresses by a factor of four. Yet we have still to see how well the ATI R600’s texture-mapping units perform in practice.
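To illustrate where the fourfold figure for Fetch4 comes from, here is a toy software model, not the hardware implementation. The `SingleChannelTexture` class, its clamping behavior and the order in which the four values are returned are all assumptions made for the example. It simply counts how many fetch operations are needed to read a 2x2 block of adjacent texels.

```python
class SingleChannelTexture:
    """Toy one-component texture that counts fetch operations (hypothetical)."""

    def __init__(self, rows):
        self.rows = rows
        self.height = len(rows)
        self.width = len(rows[0])
        self.fetch_count = 0

    def _texel(self, x, y):
        # Clamp coordinates to the texture edges (an assumption of this sketch).
        x = min(max(x, 0), self.width - 1)
        y = min(max(y, 0), self.height - 1)
        return self.rows[y][x]

    def fetch(self, x, y):
        """Ordinary fetch: one operation returns one texel."""
        self.fetch_count += 1
        return self._texel(x, y)

    def fetch4(self, x, y):
        """Fetch4-style fetch: one operation returns four adjacent texels."""
        self.fetch_count += 1
        return (self._texel(x, y), self._texel(x + 1, y),
                self._texel(x, y + 1), self._texel(x + 1, y + 1))


tex = SingleChannelTexture([[0.1, 0.2],
                            [0.3, 0.4]])

# Four separate fetches for the 2x2 neighborhood.
plain = (tex.fetch(0, 0), tex.fetch(1, 0), tex.fetch(0, 1), tex.fetch(1, 1))
plain_cost = tex.fetch_count

# The same four values with a single Fetch4-style operation.
tex.fetch_count = 0
packed = tex.fetch4(0, 0)
packed_cost = tex.fetch_count

print(plain == packed, plain_cost, packed_cost)
```

The saving applies only to one-component data such as depth or shadow maps, where four neighboring values can be returned in place of one four-component texel.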