Pixel Pipelines, Pixel Shaders
The first things to catch your eye in the specs of the new RADEON X800 are the higher clock frequencies and the larger number of pixel pipelines in the new ATI VPUs: RADEON X800 XT Platinum Edition features 16 pipelines, and RADEON X800 PRO features 12.
However, it wouldn’t be quite correct to state that RADEON X800 XT and X800 PRO feature 16 and 12 pixel pipelines respectively, even though the simplified flow-chart for RADEON X800 does show all 16 of them.

The 16 pixel pipelines of RADEON X800 are split into 4 groups of 4 pipelines each. In other words, RADEON X800 actually has not 16 pixel pipelines but only 4, and each of these 4 “wide” pipelines works not with individual pixels but with groups of four pixels.
As a result, the XT version of the RADEON X800 features 4 such pipelines, which corresponds to 16 “regular” pipelines, while the PRO version uses only three of them and can thus output 12 pixels per clock.
Remarkably, all versions of the RADEON X800 graphics processor use identical dies, each carrying the full set of four “wide” pixel pipelines, but only RADEON X800 XT uses all four. RADEON X800 PRO has one pipeline disabled, and RADEON X800 SE will have two disabled. This approach suits ATI well: because the die is the same for every VPU version, chips with defects in one or two of the “wide” pipelines can be used for slower VPU models simply by disabling the defective quarter. It also raises overall production yields, as fewer chips end up in the waste basket. Moreover, ATI claims that disabling pipelines will not affect the HYPER Z HD technology. In other words, we should no longer see the problem of RADEON 9800 SE, where half of the pipelines were disabled along with HyperZ, which led to an even more severe performance drop.
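The arithmetic behind these figures can be written out as a small sketch. It assumes only what the text states: four “wide” pipelines per die, four pixels per pipeline per clock, and zero, one or two pipelines disabled depending on the model. The names `PIXELS_PER_QUAD`, `DISABLED_QUADS` and `pixels_per_clock` are made up for illustration and do not come from ATI.

```python
# Illustrative model only: names and structure are assumptions, not ATI's.
PIXELS_PER_QUAD = 4   # each "wide" pipeline handles a 2x2 block of pixels
TOTAL_QUADS = 4       # every X800 die carries four "wide" pipelines

# Number of "wide" pipelines disabled on each model, as described in the text.
DISABLED_QUADS = {
    "RADEON X800 XT": 0,
    "RADEON X800 PRO": 1,
    "RADEON X800 SE": 2,
}

def pixels_per_clock(model: str) -> int:
    """Peak pixels per clock for a model, given its disabled "wide" pipelines."""
    active_quads = TOTAL_QUADS - DISABLED_QUADS[model]
    return active_quads * PIXELS_PER_QUAD

for model in DISABLED_QUADS:
    print(model, pixels_per_clock(model))
```

For the XT and PRO this gives the 16 and 12 pixels per clock quoted above; the SE figure follows from the same rule.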
But let’s get back to the pixel pipelines of the R420.
So, each of the four “wide” pipelines in RADEON X800 processes 4 pixels arranged as a 2x2 block (a quad). At the same time, many more 2x2 blocks (dozens or hundreds) can be in progress or waiting in the queue: this organization makes it possible to hide texture sampling and filtering latency, which could otherwise amount to dozens or even hundreds of clock cycles.
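To make the quad idea more concrete, here is a minimal sketch of how screen pixels fall into 2x2 blocks, and of why many blocks have to be in flight to cover a long texture fetch. The row-major numbering of pixels inside a quad, the function names and the latency figures are all assumptions for illustration; the actual R420 scheduling is not described here.

```python
from typing import Tuple

def quad_of(x: int, y: int) -> Tuple[Tuple[int, int], int]:
    """Return (quad coordinates, index 0..3 of the pixel inside its 2x2 quad)."""
    quad = (x // 2, y // 2)
    lane = (y % 2) * 2 + (x % 2)  # row-major order inside the quad (assumed)
    return quad, lane

def quads_in_flight_needed(latency_cycles: int, cycles_per_quad: int = 1) -> int:
    """Quads that must be queued so the pipeline stays busy during one fetch.

    While one quad waits latency_cycles for its texture data, the pipeline
    can issue a new quad every cycles_per_quad cycles.
    """
    return -(-latency_cycles // cycles_per_quad)  # ceiling division

print(quad_of(5, 3))                   # pixel (5, 3) -> quad (2, 1), lane 3
print(quads_in_flight_needed(100))     # hypothetical 100-cycle fetch
print(quads_in_flight_needed(100, 3))  # same fetch, 3 cycles of work per quad
```

With a hypothetical 100-cycle fetch and one quad issued per clock, about a hundred quads would need to be queued, which is in line with the “dozens or hundreds” mentioned above.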

Each “wide” pipeline works independently of the others and has its own resources: texturing units, pixel processors and, finally, the pixel processors’ cache memory (Shader State Memory), which stores per-pixel information such as the state of the shader currently being executed, temporary registers and constants, interpolated texture coordinates, color, etc.
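As a rough picture of what this means, the sketch below models the Shader State Memory as a private per-pipeline table of per-pixel records. The field list follows the items named in the text; the class names, field types and the `(quad_id, lane)` key are assumptions for illustration, not the real hardware layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec4 = Tuple[float, float, float, float]

@dataclass
class PixelState:
    """Per-pixel record of the kind the text says Shader State Memory holds."""
    shader_pc: int = 0                                    # status of the shader in progress
    temps: List[Vec4] = field(default_factory=list)       # temporary registers
    constants: List[Vec4] = field(default_factory=list)   # shader constants
    tex_coords: List[Vec4] = field(default_factory=list)  # interpolated texture coordinates
    color: Vec4 = (0.0, 0.0, 0.0, 0.0)

@dataclass
class WidePipeline:
    """One "wide" pipeline with its own state memory, shared with no other."""
    state_memory: Dict[Tuple[int, int], PixelState] = field(default_factory=dict)

    def store(self, quad_id: int, lane: int, state: PixelState) -> None:
        # Key is (quad in flight, pixel 0..3 within that quad); hypothetical scheme.
        self.state_memory[(quad_id, lane)] = state

pipelines = [WidePipeline() for _ in range(4)]
pipelines[0].store(quad_id=7, lane=2, state=PixelState(shader_pc=5))
print(len(pipelines[0].state_memory), len(pipelines[1].state_memory))  # prints "1 0"
```

State written by one pipeline is invisible to the others, which is the kind of independence that lets a single “wide” pipeline be switched off without touching the rest.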



