Synthetic Benchmarks. Performance in Theoretical Benchmarks: Fillrate Investigation
As usual, we will start our theoretical investigation with fillrate and texturing speed. Let’s have a look at Volari Duo V8 Ultra here:
This benchmark displays a number of planes with “semitransparent” textures. In multi-texturing mode, contemporary graphics accelerators can apply up to 8 textures per pass, so the test scene consists of 8 planes with 8 textures on each, for a total of 64 texture layers. In single-texturing mode, where only one texture is laid over each polygon, the scene consists of 64 surfaces with one texture layer on each.
As a rule, graphics cards perform better in this test with multi-texturing than with single-texturing. With multi-texturing, the previous Z value and color have to be read only once for every 8 textures. Without multi-texturing, they have to be read anew for each texture layer. As a result, a graphics card's performance with multi-texturing enabled depends less on the memory bus bandwidth and more on the texture cache structure.
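The difference in frame buffer traffic described above can be put into numbers with a small sketch. This is a simplified model, not a description of how any particular chip works: it assumes exactly one read of the previous Z and color values per pass and per pixel, and it ignores caching and early Z rejection. The function name and constants are illustrative only.

```python
import math


def framebuffer_reads_per_pixel(total_layers, textures_per_pass):
    """Number of passes needed to lay all texture layers over one pixel.

    Assumption: each pass reads the previous Z and color values once.
    """
    return math.ceil(total_layers / textures_per_pass)


TOTAL_LAYERS = 64  # 8 planes x 8 textures, or 64 single-textured surfaces

# Single-texturing: one texture per pass, so one read per layer.
single_texturing_reads = framebuffer_reads_per_pixel(TOTAL_LAYERS, 1)

# Multi-texturing: 8 textures per pass, so one read per 8 layers.
multi_texturing_reads = framebuffer_reads_per_pixel(TOTAL_LAYERS, 8)

print(single_texturing_reads, multi_texturing_reads)
```

Under these assumptions the single-texturing scene needs eight times as many Z and color reads per pixel, which is why it leans so much harder on memory bandwidth.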
What does Volari show here? The maximum result we managed to obtain in this benchmark is 3110Mtexels/s. The theoretical maximum for our graphics solution is (8 pipelines * 1 texturing unit) * (2 chips) * (350MHz) = 5600Mtexels/s. In other words, the obtained result is only 55.5% of the theoretical maximum, which shows that the Volari architecture is not very efficient.
By the way, this result also disproved our supposition that each processor on our card has only 4 pipelines instead of the 8 pipelines claimed by XGI. If that had been true, the theoretical maximum would have been 2800Mtexels/s. The measured result is higher than that, which means that we were wrong.
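The arithmetic behind these figures can be reproduced with a short sketch. The function and variable names are ours and purely illustrative; the inputs are the numbers quoted in the text.

```python
def peak_texel_rate_mtexels(pipelines, tmus_per_pipeline, chips, clock_mhz):
    """Theoretical peak texturing rate in Mtexels/s."""
    return pipelines * tmus_per_pipeline * chips * clock_mhz


MEASURED_MTEXELS = 3110  # best result obtained in the fillrate test

# Configuration claimed by XGI: 8 pipelines per chip.
claimed_peak = peak_texel_rate_mtexels(8, 1, 2, 350)

# Our earlier supposition: only 4 pipelines per chip.
four_pipe_peak = peak_texel_rate_mtexels(4, 1, 2, 350)

# Share of the claimed theoretical maximum actually reached.
efficiency = MEASURED_MTEXELS / claimed_peak

print(claimed_peak, four_pipe_peak, f"{efficiency:.1%}")

# A measured rate above the 4-pipeline peak rules that configuration out.
four_pipes_possible = MEASURED_MTEXELS <= four_pipe_peak
```

Since a chip cannot exceed its own theoretical peak, a measured rate above 2800Mtexels/s is enough to reject the 4-pipeline supposition.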
Now let’s take a closer look at the performance with multi-texturing enabled and at the changes that take place when we increase the color depth of the textures, the frame buffer and the Z-buffer. We can clearly see that the performance of our Volari solution suffers most from the shift from 16bit to 32bit textures, i.e. when the chips have to request twice as much texture data. This is a clear indication of inefficient caching algorithms in XGI Volari.
A good example illustrating this statement is tri-linear filtering. To perform tri-linear filtering, graphics chips require 8 texture samples instead of the 4 samples usually used for bilinear filtering (except DeltaChrome with its unique tri-linear filtering algorithm, see our Review for details). If our supposition about Volari’s bottleneck is true, then enabling tri-linear filtering should cause a significant performance drop.
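To make the 4-versus-8 sample count concrete, here is a minimal software sketch of both filters with a fetch counter. It is an illustration under simplifying assumptions (scalar texels, clamped addressing, coordinates given directly in texel units, no texel-center offset), not a model of the Volari hardware. The class and function names are hypothetical.

```python
import math


class CountingTexture:
    """A single mip level that counts how many texels are fetched."""

    def __init__(self, texels):
        self.texels = texels  # list of rows of scalar values
        self.fetches = 0

    def fetch(self, x, y):
        self.fetches += 1
        height, width = len(self.texels), len(self.texels[0])
        x = min(max(x, 0), width - 1)   # clamp to the texture edge
        y = min(max(y, 0), height - 1)
        return self.texels[y][x]


def bilinear(tex, u, v):
    """Blend the 4 texels surrounding (u, v) within one mip level."""
    x0, y0 = math.floor(u), math.floor(v)
    fx, fy = u - x0, v - y0
    top = tex.fetch(x0, y0) * (1 - fx) + tex.fetch(x0 + 1, y0) * fx
    bottom = tex.fetch(x0, y0 + 1) * (1 - fx) + tex.fetch(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


def trilinear(fine, coarse, u, v, blend):
    """Blend bilinear samples from two adjacent mip levels: 4 + 4 fetches."""
    a = bilinear(fine, u, v)
    b = bilinear(coarse, u / 2, v / 2)  # coarse level has half the resolution
    return a * (1 - blend) + b * blend


fine = CountingTexture([[1.0] * 4 for _ in range(4)])
coarse = CountingTexture([[3.0] * 2 for _ in range(2)])

bilinear_value = bilinear(fine, 1.5, 1.5)
bilinear_fetches = fine.fetches

fine.fetches = 0
trilinear_value = trilinear(fine, coarse, 1.5, 1.5, 0.5)
trilinear_fetches = fine.fetches + coarse.fetches

print(bilinear_fetches, trilinear_fetches)
```

Each tri-linear sample touches two mip levels, so the number of texel requests doubles and the texture cache has to serve two working sets instead of one.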
To check this, we ran the texture filtering quality test from the 3DMark03 test package, first with bilinear and then with tri-linear texture filtering enabled:
The results confirm our supposition: enabling tri-linear filtering slows down Volari’s texturing speed by 37-47%.
As a practical check of the results obtained in this synthetic test, we ran Return To Castle Wolfenstein with bilinear and then with tri-linear texture filtering enabled:
The performance of the card also drops considerably here when we enable tri-linear filtering, although the drop is not as dramatic as in the previous test. The situation in a real game is “less synthetic”.