Pages: [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 ]

The number of temporary registers available during shader processing has been increased in R420 compared with R3x0: from 12 to 32. This makes R420 even less sensitive to shader complexity: as you remember, the pixel processors of NVIDIA NV3x lost much of their efficiency when processing complex shaders that required a lot of temporary registers.

The internal precision of the floating-point calculations performed by the pixel processors remained the same: RADEON X800 supports data in 16-bit and 32-bit floating-point formats, but performs all calculations in 24-bit format only.
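To get a feel for what the difference between these formats means in practice, here is a rough sketch that rounds a value to the mantissa width of each format. The 10-bit and 23-bit mantissas are those of the standard half and single precision formats; the 16-bit mantissa for the 24-bit format is an assumption not stated in this article, and the function `quantize` is a hypothetical helper that ignores exponent range altogether.

```python
import math

# Explicit mantissa bits per format. The fp24 value is an assumption,
# not a figure taken from this article.
MANTISSA_BITS = {"fp16": 10, "fp24": 16, "fp32": 23}


def quantize(x: float, fmt: str) -> float:
    """Round x to the mantissa width of fmt (exponent range is ignored)."""
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    bits = MANTISSA_BITS[fmt]
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    # Keep bits + 1 significant bits: the implicit leading bit plus the mantissa.
    scaled = round(math.ldexp(m, bits + 1))
    return math.ldexp(scaled, e - (bits + 1))


value = 1.0 / 3.0
for fmt in ("fp16", "fp24", "fp32"):
    print(fmt, abs(quantize(value, fmt) - value))
```

Under these assumptions the rounding error of the 24-bit format falls between those of the 16-bit and 32-bit formats, which is all the sketch is meant to show.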

The computational power of the pixel processors of the new RADEON X800 is significantly higher than that of the previous architecture: the number of scalar and vector arithmetic-logic units (ALUs) has doubled. The former ATI pixel processors, with one vector ALU, one scalar ALU and one texture addressing unit, could perform up to three instructions per clock cycle for each pixel. ATI RADEON X800 has twice as many scalar and vector ALUs, which is why its pixel processors can perform up to five instructions per clock cycle for each pixel.
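The arithmetic behind the "three" and "five" figures can be sketched as follows. The class `PixelProcessor` and its field names are purely illustrative, and the peak assumes that every unit issues exactly one instruction in the same clock cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelProcessor:
    """Hypothetical model of one pixel processor's execution units."""
    vector_alus: int
    scalar_alus: int
    texture_units: int

    def peak_instructions_per_clock(self) -> int:
        # Best case: every unit is busy in the same cycle.
        return self.vector_alus + self.scalar_alus + self.texture_units


R3X0 = PixelProcessor(vector_alus=1, scalar_alus=1, texture_units=1)
X800 = PixelProcessor(vector_alus=2, scalar_alus=2, texture_units=1)

print(R3X0.peak_instructions_per_clock())  # 3
print(X800.peak_instructions_per_clock())  # 5
```

Note that only the ALUs were doubled: the single texture addressing unit is why the peak is five rather than six.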

Besides all other improvements, RADEON X800 pixel processors can now process much longer shaders than the previous generation solutions could. The maximum number of scalar and of vector mathematical instructions was increased from 64 to 512 each, and the number of texturing instructions from 32 to 512. Altogether, the maximum number of texturing, scalar and vector instructions grew from 160 to 1536.
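The totals follow directly from the per-type limits, provided the "64 to 512" figure is read as applying to scalar and vector instructions separately. A minimal check (the dictionary layout and the name `total_instructions` are illustrative only):

```python
# Per-type instruction limits as quoted in the text above.
LIMITS = {
    "previous generation": {"vector": 64, "scalar": 64, "texture": 32},
    "RADEON X800": {"vector": 512, "scalar": 512, "texture": 512},
}


def total_instructions(limits: dict) -> int:
    """Sum the per-type limits into the overall shader length limit."""
    return sum(limits.values())


for name, limits in LIMITS.items():
    print(name, total_instructions(limits))
```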

The drastic increase in the maximum number of shader instructions allows RADEON X800 to run much more complex and resource-hungry pixel shaders more efficiently. If a shader turns out so complicated that the VPU resources are insufficient to process it within a single pass, RADEON X800 graphics processors will split its processing into a few stages: the shader will be divided into a few fragments, which will then be processed one by one, and the intermediate results for each fragment will be temporarily stored in a special buffer called the Fragment Stream FIFO Buffer. This is the same F-Buffer that was officially announced together with the RADEON 9800/9800 PRO solutions, now with a number of improvements intended for higher efficiency of multi-pass shader operations.
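The idea of splitting a long shader into fragments with a FIFO of intermediate results can be illustrated with a toy simulation. This is only a sketch of the principle, not a description of how the hardware actually implements it: a "shader" here is simply a list of functions applied to one value per pixel, and `run_multi_pass` and `max_per_pass` are hypothetical names.

```python
from collections import deque
from typing import Callable, List, Sequence

Instruction = Callable[[float], float]


def run_single_pass(program: Sequence[Instruction],
                    pixels: Sequence[float]) -> List[float]:
    """Reference: run the whole program on every pixel in one go."""
    results = []
    for value in pixels:
        for instr in program:
            value = instr(value)
        results.append(value)
    return results


def run_multi_pass(program: Sequence[Instruction],
                   pixels: Sequence[float],
                   max_per_pass: int) -> List[float]:
    """Split the program into fragments and keep intermediates in a FIFO."""
    if max_per_pass < 1:
        raise ValueError("max_per_pass must be at least 1")
    fifo = deque(pixels)  # one intermediate value per pixel, in order
    for start in range(0, len(program), max_per_pass):
        fragment = program[start:start + max_per_pass]
        for _ in range(len(fifo)):
            value = fifo.popleft()    # read the intermediate result
            for instr in fragment:
                value = instr(value)
            fifo.append(value)        # write it back for the next pass
    return list(fifo)


program = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3,
           lambda v: v * v, lambda v: v + 0.5]
pixels = [0.0, 1.0, 2.0]

print(run_single_pass(program, pixels))
print(run_multi_pass(program, pixels, max_per_pass=2))
```

Because the FIFO preserves pixel order, each pass picks up exactly where the previous one left off, so both calls print the same values.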

Compared with the previous architecture, ATI RADEON X800 boasts wider functionality; nevertheless, it doesn’t support dynamic branching and looping in pixel shaders. Since support for dynamic loops and branching is a requirement of shader version 3.0, ATI can claim support for version 2.x only, not for 3.0.

ATI certainly sees the benefits of shader model 3.0, but believes that the time of 3.0 shaders hasn’t come yet: on the existing architectures, the use of dynamic loops and branching inevitably results in a performance drop. Even NVIDIA warns against careless use of dynamic branching, and a performance drop is definitely not what the manufacturers want. At the same time, introducing the corresponding support would require a significant revision of the RADEON X800 pixel processor architecture, which has been intended for non-linear shader processing from the very beginning. As a result, having weighed all the pros and cons, ATI engineers decided to give up full-fledged shader version 3.0 support. Instead, they are most likely to introduce a “cut-down” version of the standard without support for dynamic loops and branching, which will be called 2.b.

The biggest advantage of the RADEON X800 pixel processor architecture is its stable efficiency and predictable performance. Unlike NVIDIA’s reference cards, ATI RADEON X800, just like the previous ATI VPUs, reacts much more calmly to an increase in the number of temporary registers or to a change in the mix of mathematical and texturing instructions during shader processing. This means that developers will be able to create efficient shaders for RADEON X800 with less effort.

Here I should definitely say that with the launch of RADEON X800, ATI decided to follow in NVIDIA’s footsteps and introduce its own shader compiler optimizations. The higher computational capacity of the RADEON X800 pixel processors should be used in the most efficient way. The primary goal is to minimize the number of situations when some ALUs of the pixel processors stay idle; therefore, the compiler will analyze the initial shader code and rearrange the instructions so that they can be processed in parallel.
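How rearranging instructions helps to keep the ALUs busy can be shown with a simple greedy scheduler. This is not ATI's compiler, whose internals are not described here; it is a generic list-scheduling sketch under simplifying assumptions: each unit issues one instruction per cycle, every instruction takes one cycle, and the instruction names and dependencies below are made up for the illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Execution units per pixel processor, as described in the article.
X800_UNITS = {"vector": 2, "scalar": 2, "texture": 1}
R3X0_UNITS = {"vector": 1, "scalar": 1, "texture": 1}


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str                      # "vector", "scalar" or "texture"
    deps: Tuple[str, ...] = ()     # names of instructions it waits for


def schedule(instrs: Sequence[Instr],
             units: Dict[str, int]) -> List[List[str]]:
    """Greedily pack ready instructions into cycles, respecting unit counts."""
    done = set()
    pending = list(instrs)
    cycles: List[List[str]] = []
    while pending:
        free = dict(units)
        issued = []
        for ins in pending:
            ready = all(dep in done for dep in ins.deps)
            if ready and free[ins.unit] > 0:
                free[ins.unit] -= 1
                issued.append(ins)
        if not issued:
            raise ValueError("unsatisfiable dependency in shader")
        cycles.append([ins.name for ins in issued])
        # Results become visible only in the next cycle.
        done.update(ins.name for ins in issued)
        pending = [ins for ins in pending if ins not in issued]
    return cycles


shader = [
    Instr("t0", "texture"),
    Instr("a", "vector"),
    Instr("b", "scalar"),
    Instr("c", "vector", ("a",)),
    Instr("d", "vector", ("t0",)),
    Instr("e", "scalar", ("b",)),
    Instr("f", "vector", ("c", "d", "e")),
]

print(schedule(shader, X800_UNITS))
print(schedule(shader, R3X0_UNITS))
```

With these made-up dependencies, the seven instructions fit into three cycles on the doubled ALU configuration and into four on the older one, because `c` and `d` can share a cycle only when two vector ALUs are available.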

A little later, I will also try to estimate in practice how efficiently RADEON X800 copes with pixel shaders. For now, let’s turn to the vertex pipelines of RADEON X800.
