HyperZ HD
HyperZ HD is a further development of the hierarchical tile Z-buffer idea that ATI has implemented in all of its solutions starting with the RADEON 256.
The main goal of HyperZ HD is to keep the VPU from wasting resources on polygon pixels that will end up behind already drawn ones in the final scene and will therefore never be visible. The simplest example: imagine you are standing in front of a closed door. Why should we draw the interior of the room behind it if you cannot see it anyway?
To discard whole blocks of pixels at once, HyperZ HD uses not only the “regular” Z-buffer but also low-resolution, or tile, Z-buffers. Each value in such a buffer stores the largest Z value found in an entire block of pixels, called a tile.
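To make the idea of a tile Z-buffer concrete, here is a minimal sketch that derives one from a full-resolution Z-buffer. It assumes that smaller Z means closer, that Z values are floats, and that the screen size is a multiple of the tile size; the function name `build_tile_zbuffer` is hypothetical and says nothing about how the VPU actually stores its tiles.

```python
# Build a low-resolution "tile" Z-buffer from a full-resolution one.
# Assumptions (illustrative only): smaller Z = closer, the buffer is a list
# of rows, and width/height are multiples of the tile size.

def build_tile_zbuffer(zbuf, tile):
    """Return a buffer holding the largest (farthest) Z of each tile x tile block."""
    height, width = len(zbuf), len(zbuf[0])
    tiles = []
    for ty in range(0, height, tile):
        row = []
        for tx in range(0, width, tile):
            # One entry per block: the farthest depth drawn anywhere in it.
            row.append(max(zbuf[y][x]
                           for y in range(ty, ty + tile)
                           for x in range(tx, tx + tile)))
        tiles.append(row)
    return tiles


if __name__ == "__main__":
    # 16x16 screen cleared to the far plane (1.0), with a near object (0.3)
    # covering only the top-left 8x8 block.
    zbuf = [[1.0] * 16 for _ in range(16)]
    for y in range(8):
        for x in range(8):
            zbuf[y][x] = 0.3
    print(build_tile_zbuffer(zbuf, 8))     # [[0.3, 1.0], [1.0, 1.0]]
    print(build_tile_zbuffer(zbuf, 4)[0])  # [0.3, 0.3, 1.0, 1.0]
```

Storing the maximum rather than the minimum is what makes rejection safe: if a new polygon is farther than the farthest pixel already in a tile, it cannot be visible anywhere in that tile.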
RADEON X800 appears to use two tile Z-buffers with different resolutions. Let’s look at the algorithm RADEON X800 uses to work with these hierarchical tile Z-buffers; ATI calls it Hierarchical Z.
When a polygon arrives for processing, HyperZ HD splits it into 8x8 pixel blocks and finds the minimal Z value among the 64 pixels of each block. Since a polygon is flat, its Z changes linearly across the block, so the minimum always lies at one of the block’s corners; checking just the four corner Z values is therefore enough.
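The corner shortcut works because a flat polygon’s depth is a linear function of screen position, z = a·x + b·y + c, and a linear function over a rectangle reaches its minimum at a corner. The sketch below checks this by comparing the four-corner result with a brute-force scan of all 64 pixels. The plane form and the function names are assumptions made for illustration, not a description of ATI’s hardware.

```python
# Conservative minimum Z of a planar polygon over a square pixel block.
# Assumption: depth across the polygon follows z = a*x + b*y + c in screen
# space; all function names are hypothetical.

def plane_z(a, b, c, x, y):
    """Depth of the polygon's plane at pixel (x, y)."""
    return a * x + b * y + c


def block_min_z_corners(a, b, c, bx, by, size=8):
    """Minimum plane depth over the block, using only its 4 corner pixels."""
    last_x, last_y = bx + size - 1, by + size - 1
    corners = [(bx, by), (last_x, by), (bx, last_y), (last_x, last_y)]
    return min(plane_z(a, b, c, x, y) for x, y in corners)


def block_min_z_brute(a, b, c, bx, by, size=8):
    """Reference: minimum over all size*size pixels of the block."""
    return min(plane_z(a, b, c, x, y)
               for y in range(by, by + size)
               for x in range(bx, bx + size))


if __name__ == "__main__":
    # Coefficients chosen to be exact in binary floating point.
    a, b, c = 0.25, -0.5, 10.0
    print(block_min_z_corners(a, b, c, 8, 16))  # 0.5
    print(block_min_z_brute(a, b, c, 8, 16))    # 0.5
```

A polygon may not actually cover all four corners of a block, but the corner value of its plane is still a lower bound for every pixel it does cover, which is exactly what a safe rejection test needs.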
If this Z value is higher than the value previously stored for this block in the tile Z-buffer (which could have been written while processing earlier polygons or when the Z-buffer was initialized before the frame), the entire 8x8 block can be skipped: even the closest pixel of the block is farther away than the farthest of the previously drawn pixels.
If the smallest Z value for the pixels of the 8x8 block is lower than the value saved for it, then the whole block or at least part of it is visible. In this case the 8x8 block is split into four 4x4 pixel blocks, and the same check is performed for each of these smaller 16-pixel groups.
The Z values for 4x4 blocks are stored in the second tile Z-buffer, whose resolution is lower than that of the main Z-buffer but higher than that of the 8x8 one. If the minimal Z value for the pixels of a 4x4 block is higher than the stored one, that block is discarded as well.
Finally, if a 4x4 pixel block turns out to be fully or partially visible, HyperZ HD stops the block-by-block checking and proceeds to its “classical” work: comparing the Z values of individual pixels against the values stored in the regular Z-buffer (Early Z Test).
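The whole cascade, from the 8x8 test through the 4x4 test to per-pixel Early Z, can be sketched as follows. This is a software model under stated assumptions (smaller Z is closer, each tile buffer holds the farthest Z of its block, and the caller supplies the polygon’s per-block minimum and per-pixel depth); the names `hierarchical_z_test` and `early_z` are hypothetical, and real hardware performs these steps in fixed-function logic.

```python
# Two-level Hierarchical Z test for one 8x8 block, then per-pixel Early Z.
# Assumptions: smaller Z = closer, tile buffers hold the farthest Z of each
# block, and poly_min_z / poly_z are supplied by the caller.

def hierarchical_z_test(poly_min_z, coarse_max, fine_max, bx, by):
    """Return the 4x4 sub-blocks of the 8x8 block at (bx, by) that may be visible."""
    # Level 1: the whole 8x8 block against the coarse tile buffer.
    if poly_min_z(bx, by, 8) > coarse_max[by // 8][bx // 8]:
        return []  # all 64 pixels are hidden
    survivors = []
    # Level 2: each 4x4 quarter against the finer tile buffer.
    for sy in (by, by + 4):
        for sx in (bx, bx + 4):
            if poly_min_z(sx, sy, 4) > fine_max[sy // 4][sx // 4]:
                continue  # these 16 pixels are hidden
            survivors.append((sx, sy))
    return survivors


def early_z(poly_z, zbuf, sx, sy):
    """Per-pixel Early Z test for a surviving 4x4 block: list of visible pixels."""
    return [(x, y)
            for y in range(sy, sy + 4)
            for x in range(sx, sx + 4)
            if poly_z(x, y) < zbuf[y][x]]


def flat_min_z(x, y, size):
    """A flat polygon at depth 0.5 (same minimum for every block)."""
    return 0.5


if __name__ == "__main__":
    # Tile buffers for a 16x8 screen.
    coarse = [[0.3, 1.0]]
    fine = [[0.3, 0.3, 1.0, 0.4],
            [0.3, 0.3, 1.0, 1.0]]
    print(hierarchical_z_test(flat_min_z, coarse, fine, 0, 0))  # []
    print(hierarchical_z_test(flat_min_z, coarse, fine, 8, 0))  # [(8, 0), (8, 4), (12, 4)]
```

In the usage example the left 8x8 block is rejected at the first level, while in the right block only the 4x4 quarter whose stored maximum (0.4) is closer than the polygon is rejected; the other three quarters go on to per-pixel testing.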
For this algorithm to work correctly while the scene is being rendered, i.e. while polygons are being drawn, the information in the main Z-buffer and in the tile Z-buffers has to be updated constantly. Namely, the VPU must write the calculated Z values of the pixels into the Z-buffer and the maximum Z values of the 4x4 and 8x8 pixel blocks into the two tile Z-buffers.
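A minimal way to keep the three buffers consistent is to recompute the affected tile maxima whenever a pixel passes the depth test. The sketch assumes a less-than depth test, so a write can only lower a pixel’s Z; the 4x4 maximum is therefore recomputed from its 16 pixels, and the 8x8 maximum is taken as the largest of its four 4x4 maxima. The function names and this update strategy are assumptions for illustration; the article does not describe how the VPU tracks these values internally.

```python
# Keep the main Z-buffer and both tile buffers in sync after pixel writes.
# Assumptions: less-than depth test (smaller Z = closer), 4x4 and 8x8 tiles,
# buffers stored as lists of rows. All names are hypothetical.

def tile_max(zbuf, tx, ty, size):
    """Largest Z inside the size x size block whose top-left pixel is (tx, ty)."""
    return max(zbuf[y][x]
               for y in range(ty, ty + size)
               for x in range(tx, tx + size))


def write_pixel(zbuf, fine_max, coarse_max, x, y, z):
    """Depth-test one pixel; if it passes, update all three buffers."""
    if z >= zbuf[y][x]:
        return False  # hidden: nothing changes
    zbuf[y][x] = z
    # A write can only lower Z, so the 4x4 maximum must be recomputed.
    fx, fy = x // 4 * 4, y // 4 * 4
    fine_max[y // 4][x // 4] = tile_max(zbuf, fx, fy, 4)
    # The 8x8 maximum is the largest of its four 4x4 maxima.
    cx, cy = x // 8 * 8, y // 8 * 8
    coarse_max[y // 8][x // 8] = max(fine_max[cy // 4 + j][cx // 4 + i]
                                     for j in (0, 1) for i in (0, 1))
    return True


if __name__ == "__main__":
    zbuf = [[1.0] * 8 for _ in range(8)]
    fine = [[1.0] * 2 for _ in range(2)]
    coarse = [[1.0]]
    for y in range(4):
        for x in range(4):
            write_pixel(zbuf, fine, coarse, x, y, 0.2)
    print(fine)    # [[0.2, 1.0], [1.0, 1.0]]
    print(coarse)  # [[1.0]]
```

After only one 4x4 quarter has been covered, the 8x8 tile still reports 1.0, because the other three quarters still contain far pixels; the coarse value drops only once the whole 8x8 block is covered.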
In addition, the Z-buffer has to be initialized before rendering of a new frame begins. Remarkably, thanks to this hierarchical organization there is no need to write the initial values into the main Z-buffer. Instead, it is enough to initialize the “rough” tile Z-buffer, which stores the Z values of 8x8 pixel blocks; the other Z-buffers receive their actual values on their own during scene rendering.
ATI calls this feature Fast Z Clear. Indeed, instead of writing initial values into the main Z-buffer, which takes about 8MB at 1600x1200, we only need to write values into the “rough” tile Z-buffer with one entry per 8x8 pixel block, and thus transfer 64 times less data.
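The figures above can be checked with simple arithmetic. The sketch assumes 32 bits (4 bytes) per entry in both the main Z-buffer and the 8x8 tile buffer; the article does not state the entry sizes, so treat them as assumptions. With equal entry sizes the 64x ratio follows directly from one entry covering 8 × 8 = 64 pixels.

```python
# Back-of-the-envelope check of the Fast Z Clear savings at 1600x1200.
# Assumption: 4 bytes per entry in both the main and the 8x8 tile Z-buffer.

WIDTH, HEIGHT = 1600, 1200
BYTES_PER_ENTRY = 4
TILE = 8

full_clear = WIDTH * HEIGHT * BYTES_PER_ENTRY                       # one entry per pixel
tile_clear = (WIDTH // TILE) * (HEIGHT // TILE) * BYTES_PER_ENTRY   # one entry per 8x8 block

print(full_clear)                # 7680000 bytes, i.e. about 8 MB (7.3 MiB)
print(tile_clear)                # 120000 bytes
print(full_clear // tile_clear)  # 64
```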
With four “wide” pixel pipelines, RADEON X800 can check the Z values of four 8x8 pixel blocks per clock cycle. If all of them turn out to be invisible, 256 pixels (4 × 64) are excluded from further processing at once. Quite an efficient approach, isn’t it?
Besides using the graphics processor’s resources more efficiently by skipping invisible pixels, HyperZ HD significantly reduces the amount of Z-related data that has to travel over the memory bus: unlike the main Z-buffer, the hierarchical Z-buffers are kept in cache. Moreover, this cache is large enough to hold both the hierarchical Z-buffers and the required fragments of the main Z-buffer, even at the highest resolutions.
So when we compare RADEON X800 XT Platinum Edition and RADEON X800 PRO with NVIDIA’s newest graphics processors and with previous-generation chips in the gaming tests, we will definitely try to find out how efficient HyperZ HD actually is. And now let’s take a break from 3D and turn to the way RADEON X800 plays back and processes video streams.



