by Anton Shilov, Yaroslav Lyssenko, Tim Tscheblockov, Alexey Stepin
09/29/2004 | 05:31 AM
Graphics hardware for personal computers continues to develop faster than any other type of computing processor. With escalating competition between the market leaders, ATI Technologies and NVIDIA Corp., new generations of graphics chips deliver an overwhelming advantage over previous-generation products. Faster graphics chips and more demanding games call for new benchmarks that can clearly distinguish the best graphics card from the rest.
In addition to raw computing power, visual processing units nowadays gain extra functionality that could improve the performance of future games, provided game developers decide to support it. Few really believed that NVIDIA's UltraShadow technology could give NVIDIA's hardware an irresistible advantage in Doom III, but it turned out that the technology did allow the GeForce 6 series to beat all its competitors. Logically, to predict such outcomes the industry needed a benchmark that takes advantage of the latest technologies.
Futuremark’s 3DMark03 anticipated the relative performance of graphics cards in the vast majority of current-generation games, including FarCry, Doom III, Tomb Raider: Angel of Darkness and Unreal Tournament 2004, as well as some highly anticipated DirectX 9.0 games, by using complex pixel shaders and stencil shadows that put an extreme load on even the most expensive graphics cards.
About a year and a half after the release of the RADEON 9700 PRO and the GeForce FX 5800 Ultra, the world saw two brand-new graphics chips, the GeForce 6800 Ultra and the RADEON X800 XT, which delivered a giant leap in performance over the previous-generation hardware. Nevertheless, future games are likely to demand everything that the current top-of-the-range offerings can provide, and more. Today we will try to find out which of the contemporary graphics cards is more future-proof according to Futuremark and what we should expect from next-gen games.
As hardware becomes more feature-rich, software makers have to take advantage of the additional functionality to improve image quality and performance.
Microsoft Corp., the developer of the DirectX application programming interface, has shown a great deal of flexibility by allowing the two leading graphics chip developers to offer several DirectX 9 variants (Pixel/Vertex Shaders 2.0, 2.0a, 2.0b and 3.0), all of which were considered standards and were eventually treated as such by developers who use the DirectX 9 API.
Since the next-generation set of pixel and vertex shaders, Shader Model 4.0, is years away, game developers have to work with the existing Shader Models to create new effects. However, even today the Shader Model 2.0 graphics pipeline may not be enough to rapidly execute complex shaders consisting of tens of instructions. The industry therefore needed a standard way to extend SM2.0 capabilities without changing the overall architecture of the API or the hardware. Shader Model 2.0a, Shader Model 2.0b and Shader Model 3.0 were born as a result. Although the new Shader Models appeared well before software is likely to exhaust all SM2.0 capabilities, it is clear that the time when the industry needs something more advanced than SM2.0 is around the corner.
Typically, more flexible and feature-rich hardware can process more data per rendering pass. As a result, even fairly simple graphics effects can be executed faster on such hardware. For example, FarCry processes up to 4 lights per pass on Shader Model 3.0 hardware and up to 3 lights per pass on Shader Model 2.0b hardware, versus one light per pass on traditional Shader Model 2.0 visual processing units.
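To make the arithmetic concrete, here is a minimal Python sketch that turns the FarCry per-pass figures above into a pass count. It assumes (our simplification, not FarCry's actual renderer) that every pass handles the maximum number of lights for its Shader Model and that passes simply add up; the function name lighting_passes is hypothetical.

```python
import math

# "Up to" figures quoted for FarCry: lights handled in a single pass on each
# Shader Model path. Using the maximum makes this a best-case estimate.
LIGHTS_PER_PASS = {"SM2.0": 1, "SM2.0b": 3, "SM3.0": 4}


def lighting_passes(num_lights: int, shader_model: str) -> int:
    """Hypothetical estimate of the lighting passes needed for one surface
    lit by num_lights lights, assuming every pass is filled to capacity."""
    if num_lights < 0:
        raise ValueError("num_lights must be non-negative")
    return math.ceil(num_lights / LIGHTS_PER_PASS[shader_model])


if __name__ == "__main__":
    for sm in LIGHTS_PER_PASS:
        print(sm, lighting_passes(4, sm))
```

For a surface lit by four lights this gives four passes on SM2.0, two on SM2.0b and a single pass on SM3.0, which is where the extra per-pass capacity turns into a speed-up even for simple effects.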
While four different Shader Models undermine the idea of one universal DirectX 9 API and a single approach to graphics programming, they do drive computer graphics forward with new effects and allow certain hardware to show additional performance gains. Game developers will therefore have to move from a fixed Shader Model 2.0 programming approach to more IHV-specific Shader Models. While this does not make development simpler or faster, game developers are unlikely to have much choice here: Shader Model 2.0, introduced in 2002, is likely to become obsolete for next-generation titles a couple of years after its introduction, that is, in 2004. Furthermore, HLSL, a C-like shading language, allows the same code to be compiled for particular hardware and software configurations, i.e., for different Shader Models.
With games shifting to a different programming approach, we have a new paradigm for benchmarking graphics cards: test the fastest standard rendering approach available on each particular piece of hardware. This is the approach X-bit labs has used for FarCry for a couple of months, and this is how the new 3DMark05 tests hardware.
Futuremark, which calls its 3DMark benchmark “The Gamer’s Benchmark”, has consistently improved the functionality of its 3DMark series and added support for new technologies. The software, which first saw the light of day in late 1998, has become the de facto graphics performance measurement tool for everyone from amateur gamers to large OEM decision makers.
The main changes 3DMark05 brings over the previous version are the exclusive use of Shader Models 2.0 and 3.0 across all benchmark scenes and the dynamic selection of rendering paths for particular hardware.
“3DMark05 chooses render paths based on what DirectX (DX) features are reported as supported by the DX Caps. 3DMark05 builds mostly SM2 compatible shaders which are then delivered to the DX shader compiler, with compilation flags about which shader profile to use as output (2_0/2_a/2_b/3_0). The DX compiler then produces ASM level shaders, and it's up to the compiler which instructions end up in the low level shaders that are actually used for rendering. We have a couple specially tuned shaders, like an SM3 “early out” branching shader in Game Test 1, which exits the shader early, if that pixel is on a surface away from the light. Another special case is the DST shadow shader,” says Patric Ojala from Futuremark.
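Pixel shaders compiled for the basic 2_0 profile have no flow control, so they must evaluate the whole lighting calculation for every pixel, whereas an SM3.0 shader can skip it. The Python function below is only a CPU-side analogy of the “early out” idea Ojala describes, not Futuremark's shader: the names shade_pixel and expensive_lighting are hypothetical, and the facing test is assumed to be a simple dot product between the surface normal and the direction to the light.

```python
from typing import Callable, Sequence


def shade_pixel(normal: Sequence[float], to_light: Sequence[float],
                expensive_lighting: Callable[[], float]) -> float:
    """Analogy of an "early out" pixel shader. normal and to_light are unit
    vectors; to_light points from the surface toward the light."""
    n_dot_l = sum(n * l for n, l in zip(normal, to_light))
    if n_dot_l <= 0.0:
        # Surface faces away from the light: skip the costly work entirely.
        return 0.0
    # Only pixels facing the light pay for the full lighting calculation.
    return n_dot_l * expensive_lighting()
```

On SM2.0 the equivalent shader would run expensive_lighting for every pixel and then discard the result, so the saving grows with the share of pixels facing away from the light.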
Futuremark has developed a brand-new engine for 3DMark05. The company says the new engine is more game-like than that of the previous version of 3DMark. However, Futuremark points out that 3DMark05 does not include physics, AI or other CPU tasks, which is why the benchmark is not really dependent on the central processing unit.
The engine in 3DMark05 dynamically builds shaders for each material in HLSL format. These shaders are then compiled at runtime to best fit the installed hardware, or the user may manually set which compilation profile to use. The developer insists that this approach lets all hardware benefit from its capabilities without forcing hardware with a more limited feature set to render every scene in multiple passes. In effect, 3DMark05 compares, for example, NVIDIA’s GeForce 6800 Ultra running the Shader Model 3.0 render path with ATI’s RADEON 9700 PRO running the Shader Model 2.0 path.
“We have mostly just one single HLSL shader per material, and for NV40 it is compiled using the SM3 profile and for R420 it’s using PS 2_b for example. 3DMark05 then shows which is faster. Shaders exceeding SM2 limits would show more dramatic differences, but that would basically rule out all SM 2.0 hardware, and we don’t want to do that. We could, for example, add some feature test with some later update showing such extremely long and complex shader difference. But currently we are happy with the game tests, with all shaders fitting into a single rendering pass even on SM 2.0 hardware,” says Patric Ojala from Futuremark.
So, instead of a fixed render path, the new 3DMark05 takes a more hardware-specific approach to choosing the render path. While the software does allow users to choose different rendering profiles on feature-rich hardware, such as the NVIDIA GeForce 6800 Ultra, which supports everything up to Shader Model 3.0, the default score should be obtained with the “dynamic” profile.
Futuremark first implemented dynamic shadows in 3DMark2001 using projection shadow maps, but then switched in 3DMark03 to stencil shadows, as used in Doom III. With 3DMark05 Futuremark once again changes the way it calculates shadows, this time to a type of depth shadow map called perspective shadow maps (PSM), a technology used by today’s games such as FarCry.
The implementation is said to be a refinement of what is commonly known as PSM, since PSMs in their simplest form have problems with certain light angles. The scene is rendered from the direction of the light, as in projection shadow maps, but the depth of each texel in the shadow map is also stored. The result is a shadow map implementation that needs no object edge vertex selection, does not add vertex load from shadow volume polygons, and does not add the fill load of the invisible but usually large and numerous shadow volume polygons.
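The choice between the 2_0, 2_a, 2_b and 3_0 profiles can be pictured as a small decision function. The sketch below is a simplified, hypothetical model rather than Futuremark's code: real DirectX 9 caps expose the pixel shader version plus detailed PS 2.0 extension fields, which we collapse into two made-up boolean flags (ps2a_features, ps2b_limits), and the optional override stands for the manual profile selection described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShaderCaps:
    """Hypothetical, simplified stand-in for the DirectX 9 caps a benchmark reads."""
    ps_major: int                 # pixel shader version, major part
    ps_minor: int                 # pixel shader version, minor part
    ps2a_features: bool = False   # made-up flag: NV3x-style extended PS 2.0 features
    ps2b_limits: bool = False     # made-up flag: R420-style larger PS 2.0 limits


def choose_profile(caps: ShaderCaps, override: Optional[str] = None) -> str:
    """Pick a compilation profile ("dynamic" selection) unless the user
    has forced one."""
    if override is not None:
        return override  # manual profile selection by the user
    version = (caps.ps_major, caps.ps_minor)
    if version >= (3, 0):
        return "3_0"
    if version >= (2, 0):
        if caps.ps2a_features:
            return "2_a"
        if caps.ps2b_limits:
            return "2_b"
        return "2_0"
    raise ValueError("Shader Model 2.0 or later is required")
```

Under these assumptions an NV40-class card resolves to 3_0 and an R420-class card to 2_b, matching the pairing Ojala describes, while passing override reproduces a user's manual choice.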
PSM in Futuremark’s implementation still offers a global lighting solution that projects shadows correctly, including self shadowing, and is suited for a wide range of different types of 3D scenes and lighting types.
Shadows from directional light sources use a 2048x2048 depth map in the R32F format. If the hardware supports depth stencil textures (DST), a D24X8 depth map of the same size is used instead. The 2K maps are actually used twice: once for rendering the depth of objects closer to the camera and a second time for the rest of the scene. Shadows from point light sources use a 512x512x6 cube map in the R32F format as the depth map. These depth maps sound enormous, and one would think they consume quite a bit of fill rate. They most certainly do, which is why 3DMark05 is less sensitive to changes in screen resolution. Still, in some cases even two 2K maps are not enough; Game Test 3 shows one of the most difficult environments for PSM use.
The depth maps, whether DST or R32F, are sampled using Percentage Closer Filtering (PCF). If the hardware supports DST and hardware-accelerated PCF, a single bilinearly filtered sample is taken. The non-DST rendering path uses four point samples. These two implementations produce slightly different output, which can be seen on close inspection by magnifying parts of frames with shadow artifacts and comparing them side by side. In theory, bilinear filtering is of higher quality than point sampling, but with point sampling the samples are taken from a larger area, so in some cases point sampling can produce a smoother-looking result.
One could argue that the DST path with hardware-accelerated PCF and the non-DST path with point sampling do not produce comparable performance measurements, since the resulting images differ slightly. According to the developer, 3DMark05 was designed with the firm belief that the two are indeed comparable and that this is the right way to reflect future 3D game performance. Futuremark says its research showed that over a dozen of the biggest game developers are using DST and hardware PCF for dynamic shadow rendering in their latest or upcoming titles. So if DST and hardware PCF are supported, they should be used in depth shadow map implementations, because that is also what the latest and future games do. However, if the benchmark user wishes to compare the performance of exactly identical rendering across different architectures, DST can be disabled in the benchmark settings; dynamic shadows are then always rendered using R32F depth maps and four-point-sample PCF.
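A quick back-of-the-envelope calculation shows why these maps are costly. Both R32F and D24X8 use 32 bits per texel, so the sketch below (our own arithmetic, assuming two full-size renderings of the 2K map per frame) computes the storage of each map and compares the depth texels written with the number of screen pixels. The function names are hypothetical.

```python
MIB = 1024 * 1024
BYTES_PER_TEXEL = {"R32F": 4, "D24X8": 4}  # both formats are 32 bits per texel


def depth_map_bytes(width: int, height: int, faces: int = 1,
                    fmt: str = "R32F") -> int:
    """Storage for one depth map (a cube map has six faces)."""
    return width * height * faces * BYTES_PER_TEXEL[fmt]


def shadow_to_screen_ratio(screen_w: int, screen_h: int,
                           passes: int = 2, size: int = 2048) -> float:
    """Depth texels written for the directional-light shadow map per frame,
    relative to screen pixels (assumes each pass fills the whole map)."""
    return passes * size * size / (screen_w * screen_h)


if __name__ == "__main__":
    print(depth_map_bytes(2048, 2048) / MIB)          # 16.0
    print(depth_map_bytes(512, 512, faces=6) / MIB)   # 6.0
    for w, h in [(1024, 768), (1280, 1024), (1600, 1200)]:
        print(f"{w}x{h}: {shadow_to_screen_ratio(w, h):.1f}x")
```

Under that assumption the two 2K depth passes alone write roughly ten times as many texels as there are pixels at 1024x768 and still over four times as many at 1600x1200, so a large part of each frame's fill work does not scale with screen resolution.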
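The key idea of PCF is to filter the results of the depth comparisons rather than the depths themselves. The Python sketch below models the two paths on a tiny depth map: pcf_bilinear compares the four nearest texels and blends the pass/fail results bilinearly, as a single hardware PCF sample does, while pcf_four_point averages four point-sampled comparisons. The diagonal sample offsets and the spread parameter are our assumptions, not 3DMark05's documented values, and all function names are hypothetical.

```python
import math
from typing import List

DepthMap = List[List[float]]  # depth_map[y][x]: nearest occluder depth from the light


def lit(stored_depth: float, frag_depth: float) -> float:
    """Shadow-map depth test: 1.0 if the fragment is not farther from the
    light than the occluder stored in the map, else 0.0."""
    return 1.0 if frag_depth <= stored_depth else 0.0


def texel(m: DepthMap, x: int, y: int) -> float:
    """Fetch one texel with clamp-to-edge addressing."""
    y = min(max(y, 0), len(m) - 1)
    x = min(max(x, 0), len(m[0]) - 1)
    return m[y][x]


def pcf_bilinear(m: DepthMap, u: float, v: float, frag_depth: float) -> float:
    """Compare the four nearest texels, then bilinearly weight the 0/1
    results (texel centres at +0.5): one hardware-PCF-style sample."""
    x, y = u - 0.5, v - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = (lit(texel(m, x0, y0), frag_depth) * (1 - fx)
           + lit(texel(m, x0 + 1, y0), frag_depth) * fx)
    bottom = (lit(texel(m, x0, y0 + 1), frag_depth) * (1 - fx)
              + lit(texel(m, x0 + 1, y0 + 1), frag_depth) * fx)
    return top * (1 - fy) + bottom * fy


def pcf_four_point(m: DepthMap, u: float, v: float, frag_depth: float,
                   spread: float = 1.0) -> float:
    """Average four point-sampled comparisons at diagonal offsets; a larger
    spread covers a wider area of the map."""
    offsets = ((-spread, -spread), (spread, -spread),
               (-spread, spread), (spread, spread))
    total = sum(lit(texel(m, math.floor(u + dx), math.floor(v + dy)), frag_depth)
                for dx, dy in offsets)
    return total / 4.0
```

On a 2x2 map whose left column holds a nearer occluder, both functions return 0.5 for a fragment sitting on the boundary, but they diverge as the sample position or spread changes, which is the kind of subtle difference described above.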
Now that we have discussed the engine and the concept of 3DMark05, it’s time to look at some performance numbers. For our testing we used two computers, one with an AGP 8x graphics interface and one with a PCI Express x16 graphics interface, configured as follows:
We did not disable any texture filtering optimizations in NVIDIA’s or ATI’s drivers. Please keep in mind that these drivers are still in beta testing.
To determine whether the latest hardware from ATI and NVIDIA takes advantage of the additional functionality, we included results for the two top graphics cards running render paths other than the ones selected by Futuremark: we tested the NVIDIA GeForce 6800 Ultra in SM3.0, SM2.0b and SM2.0 modes and the ATI RADEON X800 XT Platinum Edition in SM2.0b and SM2.0 rendering paths.
Note that graphics cards with less than 128MB of onboard memory could not perform full-scene antialiasing at resolutions higher than 1024x768.
NB! We have added 3DMark05 scores with the latest drivers from ATI and NVIDIA to a special feature called "Drivers Still Do It All… Bringing ATI to Victory".
3DMark03 presented an action-packed shooter scene aboard a spaceship, where a defending force bravely fought back against invaders. In 3DMark05 the battle continues: a cargo ship carrying valuable goods is attacked by space pirates, who board it with breach pods and superior firepower. This game test shows only part of the conflict; please watch the 3DMark05 demo for the whole story.
It should be obvious that this test reflects the 3D performance of shooter games, which often take place indoors. In this test the indoor areas are somewhat larger than the narrow corridors typical of first- and third-person shooters. The larger area allows more characters to fight in the same room, which is especially desirable in multiplayer games.
Even though NVIDIA’s UltraShadow technology does not accelerate perspective shadow maps (PSM), NVIDIA’s GeForce 6800-series hardware managed to outperform ATI’s competing RADEON X800 series. While the difference may not be very significant, judging by the numbers NVIDIA produces in Shader Model 2.0 mode, we can attribute its advantage either to more efficient Shader Model 3.0 execution or to some tricks with pixel shader precision.
ATI’s RADEON 9800/9600/9500 series look much better than the GeForce FX series, though NVIDIA’s current mainstream offering, the GeForce 6600 GT, manages to beat ATI’s RADEON 9800 XT at all resolutions.
Even though ATI’s RADEON X800 hardware is known for its highly efficient performance in our “eye candy” mode (4x full-scene antialiasing and 16x anisotropic filtering), NVIDIA’s GeForce 6800 series keeps its lead in Game Test 1 even under this extreme load, thanks to more efficient shader processing.
A forest gets filled with magic fireflies in the night. The moon is nearly full, illuminating the forest with a bluish faint light. The magic fireflies have flickering bright green lights that playfully move around the forest.
This scene is a nice example of a smaller-scale outdoor scene with rich vegetation. Visibility does not extend very far, and a skybox surrounds the whole scene. There are a large number of trees, all swaying in a light breeze with their branches swinging separately, and there is dense vegetation on the ground. The ground vegetation is actually one of the key points of interest in this scene: it is dynamically distributed where needed according to the camera movements, and its level of detail is dynamically altered depending on the distance to the camera. The other key point of interest is the amazing lighting and dynamic shadow system; this scene really is ideal for showing the benefits of perspective shadow maps.
The difference in performance between the competing RADEON X800 and GeForce 6800 series is pretty negligible, especially given frame rates of about 13 fps, but it exists and clearly favours NVIDIA’s hardware across all resolutions.
The RADEON 9xxx series shows an advantage over the GeForce FX series, but NVIDIA’s new mainstream fighter, the GeForce 6600 GT, manages to score in line with the GeForce 6800, to say nothing of the older-generation offerings.
Note that NVIDIA’s GeForce 6800 gets a noticeable boost when running the Shader Model 2.0b or Shader Model 3.0 render paths.
With 4x full-scene antialiasing and 16x anisotropic filtering, ATI’s RADEON X800 shows its capabilities, catching up with and in some cases even outperforming the rival GeForce 6800 series. Still, note the outstanding performance of the GeForce 6600 GT, which runs Shader Model 3.0 and puts itself on par with more expensive graphics cards.
A Jules Verne type airship flies through a canyon guarded by a dangerous sea monster. The airmen defend their ship using heavy cannons, but these seem to have no effect on the huge sea monster. Finally the crew manages a narrow escape using the ‘last resort’ afterburners of the airship. The game test only shows a part of this adventure. Please watch the demo for the whole story.
This test gives an example of a large scale outdoor scene. The scene is fairly complex with large areas of water reflecting the high canyon walls. The water actually is one of the key points of interest in this scene.
The water not only has realistic-looking reflections and refractions but also depth fog, which makes the sea monster swimming under the airship actually look deep under water. The air in this scene also uses volumetric fog, making the distant cliffs of the canyon really look far away.
In the “Canyon Flight” test of the 3DMark05 benchmark, NVIDIA’s GeForce 6800 and ATI’s RADEON X800 series show basically comparable results, even though ATI’s RADEON X800 XT leads at 1024x768, while NVIDIA’s GeForce 6800 Ultra leaves its rival behind at 1600x1200.
Naturally, the RADEON 9800/9600/9500 series deliver higher results than the GeForce FX graphics cards because of much more efficient pixel shader execution.
The GeForce 6600 GT brought us a great surprise by performing in line with the $399 GeForce 6800 GT and leaving another $399 offering, the RADEON X800 PRO, behind.
The situation seen in “pure mode” continues in “eye candy” mode: ATI’s RADEON X800 and NVIDIA’s GeForce 6800 families fight for the top spots, while the much less expensive GeForce 6600 GT manages to bite its higher-end brethren, but only at 1024x768. Even though the RADEON 9xxx-series chips lead the GeForce FX series, the results of both are very low.
Each of the three game tests generates an average frame-rate (frames rendered per second measurement) that is used to calculate the overall 3DMark score. The formula for calculating the overall 3DMark05 score is:
3DMark05 score = (Game Test 1 * Game Test 2 * Game Test 3)^0.33 * 250
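This formula yields (approximately, since the exponent 0.33 stands in for 1/3) the geometric mean of the three frame rates, so the game tests are weighted equally in the total score. Even though one game test may be heavier (run slower) than another, each of them affects the total score equally: multiplying the frame rate of any one test by the same factor changes the score by the same factor.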
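The formula is simple enough to check directly. The sketch below implements it with the published 0.33 exponent alongside an exact cube root for comparison; the function names and the frame rates in the example are made up for illustration.

```python
def score_3dmark05(gt1_fps: float, gt2_fps: float, gt3_fps: float) -> float:
    """Overall score using the published formula (exponent 0.33)."""
    return (gt1_fps * gt2_fps * gt3_fps) ** 0.33 * 250


def score_exact_cube_root(gt1_fps: float, gt2_fps: float, gt3_fps: float) -> float:
    """Same formula with an exact geometric mean, for comparison only."""
    return (gt1_fps * gt2_fps * gt3_fps) ** (1 / 3) * 250


if __name__ == "__main__":
    base = score_3dmark05(10.0, 10.0, 10.0)        # made-up frame rates
    faster_gt1 = score_3dmark05(20.0, 10.0, 10.0)
    faster_gt3 = score_3dmark05(10.0, 10.0, 20.0)
    # Doubling either test scales the score by the same factor, 2 ** 0.33.
    print(faster_gt1 / base, faster_gt3 / base)
```

Doubling the frame rate of any single game test multiplies the score by 2^0.33, about 1.26, regardless of which test it is; that is what equal weighting means in a geometric mean.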
The overall scores of 3DMark05 tell us that NVIDIA has grabbed the lead in terms of performance in future games. Even though the results of graphics processors with 16 pixel pipelines and 6 vertex processors are close, NVIDIA’s GeForce 6800 Ultra pulls a bit ahead in all resolutions.
ATI’s RADEON X800 PRO is about 350 marks behind its direct rival, the GeForce 6800 GT, and 800 marks behind the RADEON X800 XT Platinum Edition in the default benchmark run. Readers can judge for themselves whether this matters, but at least the X800 PRO has surpassed the 4000-mark milestone.
The GeForce 6800 is far behind the more expensive options and finds itself on par with the far more affordable GeForce 6600 GT.
The RADEON 9800 XT is the king of the previous-generation graphics cards, yet it cannot keep up with the current mainstream offerings, which is a sign of unbelievable progress.
Some professional reviewers have noted that the fill rate tests of previous 3DMark versions were somewhat bandwidth limited. Bandwidth is tightly tied to fill rate, since any game that does a lot of fill also has to use a large number of large texture maps, which in turn stresses memory bandwidth.
The fill rate tests of 3DMark05 differ from those of earlier versions: they are more theoretical, minimizing the influence of bandwidth and concentrating on measuring fill rate. The tests do not look as nice as the previous versions, but any nicer effect would require larger texture maps and thereby again shift the bottleneck towards graphics memory bandwidth, Futuremark says.
One of the most complex materials in the game tests is the rock face shader of Game Test 3. It has been separated into a feature test showing a light moving across the rough surface. There are no real-time shadows, only vertex lighting, and there is no water surface, only the rock face.
Filling the screen with the rock face alone is naturally fairly fast compared to the game test, which shows huge amounts of that rock face in addition to the water, the airship and the sea monster. This test will also be bandwidth dependent, since any game-like material with a complex shader will also perform a number of lookups into large textures. The alternative is procedural texturing, but Futuremark already did that in 3DMark03. It seems that most PC games will stick mainly to conventional color maps created during development instead of loading the pixel shader with generating procedural textures.
Vertex shader feature tests have been included in 3DMark since 3DMark2001, which added support for DirectX 8, the API that introduced vertex shaders. The vertex shader tests replaced the earlier polygon throughput tests, called high polygon tests in 3DMark2000. Together with fill rate, polygon throughput is the most important single performance characteristic in 3D rendering. Since 3DMark03, 3DMark has used vertex shaders for all vertex processing in the game tests, so vertex shader tests have been a logical substitute for the earlier fixed-function vertex throughput tests. There are two different vertex shader tests in 3DMark05: a very simple one doing only transformation and single-light lighting, and a more complex one that waves a large number of grass straws.
Vertex Shader – Simple. This test performs simple transformation and single-light lighting on six high-polygon sea monster models. Each sea monster has over one million vertices to transform and illuminate, so the total workload is quite substantial. The vertex shader used here could easily fit into a Shader Model 1 vertex shader, but since 3DMark05 concentrates on SM2 and offers different SM2 (and SM3) profiles to choose from, the shader is written in HLSL and compiled for SM2, like all shaders in the game tests.
Vertex Shader – Complex. This test illuminates and, most importantly, transforms a large number of grass straws. Each straw is skinned and bent separately, more towards its tip, like real grass waving in the wind. The straws are waved according to fractal noise calculated on the CPU, but that calculation is highly optimized to reduce the influence of CPU performance on the measurement. The grass is kept at a distance from the camera, which makes for a less interesting visual effect but is necessary to reduce the influence of fill rate on the measurement.
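The per-vertex work in such a shader is small. The Python sketch below spells out what it computes for one vertex: a 4x4 matrix transform to clip space and a clamped Lambert (N·L) term for a single directional light. For brevity it assumes the normal is already in the same space as the light direction and that light_dir points toward the light; the function names are hypothetical and this is not Futuremark's HLSL.

```python
import math
from typing import List, Sequence, Tuple

Mat4 = Sequence[Sequence[float]]  # row-major 4x4 matrix


def transform(m: Mat4, v: Sequence[float]) -> List[float]:
    """Multiply a 4x4 matrix by a column vector (x, y, z, w)."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def normalize(v: Sequence[float]) -> List[float]:
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def simple_vertex_shader(position: Sequence[float], normal: Sequence[float],
                         world_view_proj: Mat4,
                         light_dir: Sequence[float]) -> Tuple[List[float], float]:
    """Transform one vertex to clip space and compute a single-light,
    clamped Lambert diffuse term."""
    clip_pos = transform(world_view_proj, [*position, 1.0])
    n, l = normalize(normal), normalize(light_dir)
    diffuse = max(0.0, sum(a * b for a, b in zip(n, l)))
    return clip_pos, diffuse
```

Multiplied by more than six million vertices per frame, even this handful of multiply-adds adds up to the substantial workload the test is designed to measure.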
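One simple way to bend a straw more towards its tip is to scale the wind displacement by the square of the vertex's height along the straw. The sketch below is our illustration of that idea, not Futuremark's shader: the quadratic weighting and the name bend_straw are assumptions, the per-straw wind offset stands in for the CPU-side fractal noise, and the bend does not preserve the straw's length.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def bend_straw(vertices: Sequence[Vec3], straw_height: float,
               wind_offset: Tuple[float, float]) -> List[Vec3]:
    """Displace each vertex of one straw in x and z by wind_offset scaled by
    (height fraction)^2, so the root stays put and the tip moves the most.
    Simplified: the straw's length is not preserved."""
    bent = []
    for x, y, z in vertices:
        t = min(max(y / straw_height, 0.0), 1.0)  # 0 at the root, 1 at the tip
        weight = t * t
        bent.append((x + wind_offset[0] * weight, y, z + wind_offset[1] * weight))
    return bent
```

Because the weight is computed per vertex from its own height, every straw can be bent independently in the vertex shader, with the CPU supplying only one wind offset per straw.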
The test renders a very simple scene in a deliberately unoptimized way, targeting a weak spot in most graphics drivers available today. Graphics IHVs have for years taught game developers to render batches as large as possible, but it would be beneficial if the rendering of smaller batches were optimized too.
This test has been requested ever since the development of 3DMark2001, and for this 3DMark version more than one BDP member asked for it. There are six runs of this test, in which 128 meshes of 128x128 quads are drawn with 8, 32, 128, 512, 2048 and 32768 triangles per batch.
The last two batch sizes should be considered optimal for most drivers today; the smaller the batches get, the slower the rendering becomes. Color state changes were added between the rendering batches to make sure DirectX doesn’t collapse the whole rendering into a single batch or very few batches. Early versions of this test without the state changes suffered from this and gave rather obscure results. The test is therefore also somewhat dependent on how fast the driver handles rendering state changes.
Just as we were about to finish benchmarking with ATI CATALYST 4.9 (8-051-040825a-017633c) and NVIDIA ForceWare 66.29, the drivers initially certified by Futuremark for use with 3DMark05, the developer of the benchmark informed us that new drivers had been certified for 3DMark05.
We had heard that the new drivers change the performance of the graphics cards, so we decided to retest some of our boards with ForceWare 66.51 and ATI CATALYST 8-07-018211e drivers and update the article once we had the results.
The results were pretty surprising. Even though NVIDIA’s GeForce 6800 and 6600 series graphics processing units got new drivers, they did not show any substantial speed boosts, except in FSAA + AF mode. ATI’s AGP graphics cards, however, gained a major performance advantage over the previous driver version.
When asked about the reasons, ATI said there had been a bug in the memory management of AGP graphics cards with 256MB of memory, which affected performance in FarCry, Unreal Tournament 2003/2004 and 3DMark05. Once the bug was corrected, the performance of ATI's RADEON X800-series products for the AGP bus skyrocketed, giving the RADEON X800 series a lead over NVIDIA's GeForce 6800.
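The workload arithmetic explains why this test is so hard on drivers. Each mesh of 128x128 quads holds 32,768 triangles, so the six runs draw the same 4,194,304 triangles while the number of batches changes dramatically. The sketch below assumes each mesh is split evenly into batches of the given size and that one draw call is issued per batch; the names are hypothetical.

```python
MESHES = 128
QUADS_PER_MESH = 128 * 128
TRIANGLES_PER_MESH = QUADS_PER_MESH * 2   # each quad is two triangles
BATCH_SIZES = (8, 32, 128, 512, 2048, 32768)


def draw_calls_per_frame(triangles_per_batch: int) -> int:
    """Draw calls needed if every mesh is cut into equal batches
    (assumption: the batch size divides the per-mesh triangle count)."""
    if TRIANGLES_PER_MESH % triangles_per_batch:
        raise ValueError("batch size must divide the per-mesh triangle count")
    return MESHES * (TRIANGLES_PER_MESH // triangles_per_batch)


if __name__ == "__main__":
    for size in BATCH_SIZES:
        print(f"{size:>6} triangles/batch -> {draw_calls_per_frame(size):>7} draw calls")
```

At 32768 triangles per batch the whole scene takes 128 draw calls; at 8 triangles per batch it takes 524,288, each separated by a state change, so per-call driver overhead rather than triangle throughput dominates.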
ATI's and NVIDIA's Driver Comparison, AGP 8x Graphics Cards (charts: Game Test 1, Game Test 2, Game Test 3)
ATI's and NVIDIA's Driver Comparison, PCI Express x16 Graphics Cards (charts: Game Test 1, Game Test 2, Game Test 3)
Drivers that boost speed in the most anticipated benchmark are bound to look suspicious. We cannot accuse ATI of any wrongdoing, as the company has been quite open about the source of the speed boost, but let us hope that another “driver war” will not break out with the release of 3DMark05.
3DMark03 revealed the balance of power between the RADEON 9800, 9700, 9600 and 9500 series and the GeForce FX series: while somewhat more feature-rich than their competitors, NVIDIA’s GeForce FX products fell tangibly behind the RADEON 9xxx in terms of performance.
Today 3DMark05 outlines some other important trends:
In addition, 3DMark05 reshuffles the positions of the graphics market leaders:
3DMark05 itself delivers exceptional eye candy in all of its game tests. High-polygon models along with a massive amount of shader effects definitely raise the graphics bar to the next level. Some would argue that the world has not seen many 3DMark03-like games in reality, but to that there is only one answer: progress cannot be stopped.