FarCry Performance Revisited: ATI Strikes Back with Shader Model 2.0b

The latest 1.2 patch for the currently most-discussed game not only enabled Shader Model 3.0 on NVIDIA’s GeForce 6 hardware, but also brought Shader Model 2.0b for ATI’s latest RADEON visual processing units in order to improve performance and image quality. X-bit labs has run an exhaustive selection of FarCry benchmarks to find out whether long pixel shaders along with geometry instancing helped ATI to bring the performance up.
UPDATE: X-bit labs have added scores in FSAA and anisotropic filtering mode to the article.

by Anton Shilov, Alexey Stepin
07/25/2004 | 08:08 AM

UPDATE: Added section with performance numbers for the "Eye-Candy" mode with full-scene antialiasing and anisotropic filtering activated. More comments in "Conclusion" section.

Introduction

The most demanding first-person shooter currently available has not only made it to the top of gamers’ lists, but has also become the No. 1 benchmark for hardware reviewers. As the only title currently available that uses DirectX 9.0 in all its glory, FarCry leaves its developers and graphics processor makers a lot of headroom to tweak the game’s settings and render paths in order to achieve maximum speed.

 

Early in July 2004, Crytek and NVIDIA released a patch and a driver that enabled Shader Model 3.0 in FarCry for performance acceleration purposes. However, Shader Model 3.0 is far from the only way to improve speed through more advanced rendering techniques: Shader Model 2.0b, which is supported by ATI’s hardware, is also capable of boosting speed, according to ATI, which will shortly release a new driver version that enables the capability.

Today we have a chance to evaluate the performance of ATI’s latest RADEON X800-series visual processing units with the additional speed gained from well-organized programming of pixel shaders and the geometry pipeline.


Shader Model 3.0 vs Shader Model 2.0b: Long Pixel Shaders to Improve Performance

As revealed earlier, Shader Model 3.0 brings no new effects to FarCry, and the same applies to Shader Model 2.0b. Both aim to raise performance by calculating one “long” pixel shader instead of numerous “short” pixel shaders when rendering indoor lighting, which saves precious rendering passes.

Many indoor scenes in FarCry contain several light sources, each requiring a number of pixel shader operations. Traditional pixel shader 2.0 graphics processors have 32 texture and 64 arithmetic instruction slots. Since many other operations have to be performed besides lighting, time-honoured DirectX 9.0 graphics processors such as the RADEON 9700-series cannot calculate the lighting in a single pass: they do not have enough instruction slots. More advanced graphics processors can handle many more instructions. For instance, the RADEON X800 has enough slots for 512 scalar and vector mathematical instructions in addition to 512 texture instructions, which is a minimum requirement for pixel shaders 3.0, while the GeForce 6800-series graphics processors can theoretically handle up to 65536 instructions. A higher number of instruction slots allows graphics processing units to execute more complex pixel shaders, which is crucial in a lot of cases.
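As a rough illustration of why slot counts matter, the sketch below estimates how many lights fit into one merged shader under the slot limits quoted above. The base and per-light instruction costs are hypothetical numbers chosen for illustration, not measurements from FarCry, and the model ignores registers, clock speed and every other constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionCount:
    arithmetic: int
    texture: int


# Slot limits as quoted in the text (the ps_2_b figures are those given
# for the RADEON X800).
SLOT_LIMITS = {
    "ps_2_0": InstructionCount(arithmetic=64, texture=32),
    "ps_2_b": InstructionCount(arithmetic=512, texture=512),
}

# Hypothetical costs, for illustration only.
BASE_COST = InstructionCount(arithmetic=30, texture=6)       # non-lighting work
PER_LIGHT_COST = InstructionCount(arithmetic=25, texture=3)  # one light source


def max_lights_in_one_pass(profile: str) -> int:
    """Return how many lights fit in one merged shader under the slot limits."""
    limits = SLOT_LIMITS[profile]
    spare_arithmetic = limits.arithmetic - BASE_COST.arithmetic
    spare_texture = limits.texture - BASE_COST.texture
    if spare_arithmetic < 0 or spare_texture < 0:
        return 0  # even the base shader does not fit
    # The scarcer of the two slot types decides the result.
    return min(
        spare_arithmetic // PER_LIGHT_COST.arithmetic,
        spare_texture // PER_LIGHT_COST.texture,
    )


for profile in SLOT_LIMITS:
    print(profile, max_lights_in_one_pass(profile))
```

With these made-up costs, the 2.0 profile has room for a single light per pass, while the 2.0b profile could hold many more. Slot counts are only an upper bound, so the number of lights a real render path merges may be far lower.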

It is necessary to point out that there are many more differences between ATI’s and NVIDIA’s pixel processors: for instance, NVIDIA’s pixel pipelines support loops and branches, while ATI’s do not. At the same time, the availability of instruction slots does not mean that a visual processing unit with more slots will work faster than a chip with fewer slots: a lot depends on the core clock as well as on the temporary registers.

Since modern graphics processors can calculate more data per cycle, it makes sense for APIs to exploit this capability on more powerful GPUs. Microsoft’s HLSL compiler allows developers to compile shaders for a target model, e.g., Shader Model 2.0b implemented by ATI or Shader Model 3.0 implemented by NVIDIA. Merging a number of pixel shaders into one saves rendering passes and increases performance, which is what Crytek has done for FarCry. This is an absolutely legitimate and very efficient way to improve speed, as it does not require a lot of work from game developers, but provides a more comfortable gaming experience.

Patch 1.2 has brought two new rendering paths to FarCry to support the latest graphics processors: Shader Model 2.0b for ATI’s RADEON X800, X600 and X300 series, and Shader Model 3.0 for NVIDIA’s GeForce 6800-series and other GeForce 6 products. NVIDIA’s render path seems to be more efficient: it calculates up to 4 lights in a single pass, while ATI’s path calculates up to 3 lights per pass. ATI says that it is possible to calculate 4 lights in a single pass on the X800 hardware; however, this has not been done so far, which may mean that it will either be enabled in future FarCry patches or never be activated for some reason.
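The pass savings can be sketched with simple arithmetic. The lights-per-pass figures for the 2.0b and 3.0 paths are the ones quoted above; treating the plain Shader Model 2.0 path as one light per pass is an assumption made here for comparison, and real scenes involve more than light counts.

```python
import math

# Lights handled per pass by each render path. The 2.0b and 3.0 figures come
# from the text; one light per pass for plain Shader Model 2.0 is an assumption.
LIGHTS_PER_PASS = {"SM 2.0": 1, "SM 2.0b": 3, "SM 3.0": 4}


def lighting_passes(light_count: int, render_path: str) -> int:
    """Return the number of lighting passes needed for the given light count."""
    if light_count <= 0:
        return 0
    return math.ceil(light_count / LIGHTS_PER_PASS[render_path])


for lights in (3, 4, 6):
    passes = {path: lighting_passes(lights, path) for path in LIGHTS_PER_PASS}
    print(lights, "lights:", passes)
```

Under this model the two new paths need the same number of passes for three lights or fewer, and the 3.0 path saves a pass at exactly four lights.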


Geometry Instancing: See the Difference!

Despite its hardcore requirements, FarCry still uses a number of simplifications in order to deliver higher performance to customers. For instance, some distant vegetation is drawn with 2D sprites instead of geometry to boost speed. There is a way to re-enable “full geometry” in the game, but once it is activated, you get “slide-show” speed even on the most powerful graphics cards and microprocessors. To provide astonishing geometry at a low performance penalty, game developers are advised to use geometry instancing for similar objects.

Currently, games face limits on the number of unique objects they can display in a scene, often not because of graphics horsepower but because of the CPU-side overhead of storing or submitting many slightly different variations of the same object. With geometry instancing, the VPU can create multiple objects from a single geometric model. Rather than passing an entire new model for each on-screen item, the application can send one model and then supply parameters that indicate how each instance of that model is to be rendered in the scene. This results in savings on the CPU side.
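The CPU-side saving can be illustrated with a simple counting model. This is a sketch under stated assumptions, not a description of the FarCry engine or of any graphics API: the class names, the per-instance parameters and the plant and vertex counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class InstanceParams:
    """Hypothetical per-instance data: where and how large one copy is."""
    position: Tuple[float, float, float]
    scale: float


@dataclass(frozen=True)
class SubmissionCost:
    draw_calls: int
    vertices_sent: int
    instance_records_sent: int


def cost_without_instancing(vertex_count: int,
                            instances: Sequence[InstanceParams]) -> SubmissionCost:
    # One draw call per object, with the model's vertices submitted every time.
    return SubmissionCost(len(instances), vertex_count * len(instances), 0)


def cost_with_instancing(vertex_count: int,
                         instances: Sequence[InstanceParams]) -> SubmissionCost:
    if not instances:
        return SubmissionCost(0, 0, 0)
    # One model submitted once, plus one small parameter record per instance.
    return SubmissionCost(1, vertex_count, len(instances))


# Hypothetical scene: 500 copies of a 1,200-vertex plant in a row.
plants = [InstanceParams((float(i), 0.0, 0.0), 1.0) for i in range(500)]
print(cost_without_instancing(1200, plants))
print(cost_with_instancing(1200, plants))
```

In this model the number of draw calls and the amount of vertex data the CPU submits no longer grow with the number of plants; only the small per-instance records do.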

FarCry v1.2, sprites vs. geometry on distant objects

Sprites distant ratio 1

Sprites distant ratio 50

Sprites distant ratio 100

Sprites distant ratio 100, shader model 2.0b

While it was reported earlier that geometry instancing is exclusive to Shader Model 3.0, in fact it is not. Shader Model 3.0 requires geometry instancing to be supported, but according to ATI, the RADEON X800-series along with some other RADEON graphics processors support geometry instancing as well.

Geometry instancing alone, as our previous FarCry benchmark session revealed, does not bring any huge performance benefit to FarCry. However, once we enable the “eye-candy” geometry from the console to get somewhat more realistic plants, the game becomes unplayable because of poor performance unless we also enable geometry instancing (via the Shader Model 3.0 or Shader Model 2.0b render paths), which raises the speed dramatically.

FarCry to Get HDR Lighting, 3Dc, etc…

At this point Crytek, the developer of FarCry, is implementing support for the capabilities of modern graphics processors in its engine and title in an attempt to increase the speed of the game. In the future, however, the company will add graphics features that bring more realism to FarCry and impress gamers once again.

Among the features that are going to be implemented shortly are high dynamic range (HDR) lighting as well as 3Dc technology. The former will add some cool lighting effects into the game, while the latter will enable more detailed objects.

Both ATI and NVIDIA promise that the enhancements will see the light of day with the 1.3 patch, but it is not at all clear when this patch will emerge, as the latest official 1.2 patch was withdrawn by Crytek, which cited its unexpected behaviour.


Image Quality Comparison

In order to make sure that performance benefits supposedly brought by more advanced rendering mechanisms are not achieved by sacrificing image quality, a number of screenshots made in different modes with different hardware were compared.

As previously, we used 5 scenes from the “Training”, “Pier”, “Catacombs”, “Archive” and “Volcano” levels, as shown below. Game quality settings were set to the maximum level; anisotropic and trilinear optimizations were enabled for NVIDIA-based graphics cards.

ATI RADEON X800 XT vs. GeForce 6800 Ultra IQ comparison, FarCry 1.2

Screenshots for each of the five scenes: RADEON X800, Shader Model 2.0; RADEON X800, Shader Model 2.0b; GeForce 6800, Shader Model 3.0. Two of the scenes additionally include RADEON X800, Shader Model 2.0b, sprites distance ratio = 100.

A quick look at the images and their comparison reveals no degradation in quality as a result of the render path transition to shaders 2.0b. The only difference between the Shader Model 2.0b and Shader Model 2.0 render paths that can be seen on the screenshots is slightly different lighting on the “Catacombs” level; however, it does not look like a glitch and may be the result of a very slight mouse movement during the capture.

We also see almost no differences between NVIDIA’s and ATI’s image quality, except for the “shadow issue” we discussed previously and some other minor things we are working on at the moment. There is talk that NVIDIA encourages game developers to use 16-bit arithmetic precision instead of 32-bit precision (or 24-bit precision, as on ATI hardware) in order to improve performance at the possible expense of image quality, but we saw no issues with NVIDIA’s quality in our brief investigation.


FarCry 1.2: Testbed and Methods

As in our Shader Model 3.0 investigation, we used six demos recorded on the “Training”, “Research”, “Regulator”, “Pier”, “Catacombs” and “Volcano” levels. Each demo was recorded in an attempt to reflect the actual gaming situation. In addition, we recorded one more demo on the “Pier” level that tries to exaggerate the importance of geometry instancing, which is why that sequence does not necessarily reflect the actual gaming process.

We used our typical system for benchmarking:

For some reason, Shader Model 3.0 could not be enabled on NVIDIA’s latest official 61.76 drivers, which is why we used a beta driver provided earlier by NVIDIA.

We advise you to bear in mind that the ATI and NVIDIA drivers we used, as well as Microsoft DirectX 9.0c, have not been officially released, and that the 1.2 patch for FarCry was withdrawn by Crytek, so certain results may differ from those obtained with the final versions of the software.

To render the game using Shader Model 3.0 and Shader Model 2.0b, we used the “\r_SM30PATH” and “\r_sm2bpath 1” console commands respectively. To make the vegetation use geometry instead of sprites, we used the “\e_vegetation_sprites_distance_ratio 100” console command. We ran the game in the so-called “developer mode”, which requires launching the game with a special key: FarCry.exe -DEVMODE.


Graphics Cards’ Performance, Pure Mode

Training Level

We begin with the “Training” level, which includes loads of vegetation, grass and water, just like many other levels of FarCry.



Note: minimal fps are marked with white numbers on the diagrams, black numbers represent average fps.
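For readers who want to reproduce the two figures from their own frame-time logs, here is a minimal sketch. How the game’s benchmark derives its minimum figure is not stated here, so treating the minimum as the rate implied by the slowest single frame is an assumption, and the sample frame times are made up.

```python
from typing import Sequence, Tuple


def fps_summary(frame_times_ms: Sequence[float]) -> Tuple[float, float]:
    """Return (average_fps, minimum_fps) for a recorded run."""
    if not frame_times_ms:
        raise ValueError("at least one frame time is required")
    if min(frame_times_ms) <= 0:
        raise ValueError("frame times must be positive")
    # Average fps: frames rendered divided by total elapsed time.
    average_fps = 1000.0 * len(frame_times_ms) / sum(frame_times_ms)
    # Minimum fps (assumed definition): rate implied by the slowest frame.
    minimum_fps = 1000.0 / max(frame_times_ms)
    return average_fps, minimum_fps


# Made-up frame times in milliseconds.
average, minimum = fps_summary([10.0, 20.0, 10.0, 40.0])
print(f"average {average:.1f} fps, minimum {minimum:.1f} fps")
```

Note that the average is computed from the total time rather than by averaging per-frame rates, which would overweight the fast frames.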

The level offers no headroom for geometry instancing and has no situations where several 2.0 pixel shaders could be merged into a single pass. Nevertheless, ATI’s hardware, primarily the RADEON X800 XT and RADEON X800 PRO, got a slight performance boost for some reason.

The increase is enough for the RADEON X800 XT to leave the GeForce 6800 Ultra behind, but is not sufficient for the RADEON X800 PRO to outperform its rival, the GeForce 6800 GT, in a mode that does not enable anisotropy and FSAA. The RADEON 9800 XT seriously lags behind NVIDIA’s GeForce 6800 graphics card.


Research Level

Our demo recorded on the “Research” level includes indoor and outdoor action, reflecting the actual gaming process on this particular level. The action in a cavern is typical of indoor scenes in the game: a plethora of light sources, per-pixel lighting and multi-pass rendering amid rather complex geometry.




Both the RADEON X800-series and the GeForce 6800-series got only the slightest speed increase from the “long” pixel shaders they calculate instead of a multitude of “short” pixel shaders.

Neither the ATI RADEON X800 XT nor the RADEON X800 PRO could outperform the competing offerings from NVIDIA, not to mention the RADEON 9800 XT, which remains a galaxy behind the GeForce 6800.


Regulator Level

 

The “Regulator” level also features complex indoor scenes with a pretty difficult lighting model. The RADEON X800- and GeForce 6800-series calculate 3 and 4 light sources in a single pass respectively, saving passes in order to achieve higher performance.

 




 

The speed boost obtained by both series of top graphics processors is not really significant, but it exists and provides certain benefits to gamers. The balance of power did not change with the RADEON X800 performance increase: the GeForce 6800 Ultra, GeForce 6800 GT and GeForce 6800 are faster in “pure” mode than the competing offerings.


Catacombs Level

Our demo recorded on the “Catacombs” level does not use a complex lighting model and has no places where geometry instancing could affect performance. It is just a run through pretty narrow caverns to a place with an ongoing battle between monsters and soldiers. However, it seems that there are still some shaders that can be optimized: the transition from patch 1.1 to patch 1.2 delivered a substantial speed improvement to the RADEON X800 XT.




Traditionally, ATI’s RADEON X800 XT leads on this level in terms of performance, while the RADEON X800 PRO can only keep up with the rival GeForce 6800 GT in 1024x768 and 1280x1024 resolutions.


Volcano Level

“Volcano” is yet another perfect example of what “long” pixel shaders can bring: a multitude of light sources clearly hurts performance on hardware that cannot optimize the rendering process, while the RADEON X800 and GeForce 6800 families of graphics chips deliver a tasty performance advantage thanks to more efficient programming.




With a performance improvement of about 14% over the Shader Model 2.0 rendering path, ATI Technologies’ RADEON X800 XT leads NVIDIA’s GeForce 6800 Ultra on the “Volcano” level. Nevertheless, the same cannot be said about the RADEON X800 PRO, which is still outperformed by the GeForce 6800 GT.


Pier Level

The “Pier” level contains loads of vegetation along with water and pretty thorough physics calculations, which is why the fps in our demo here is not as high as on the “Training” level.





As we noted with the GeForce 6800-series, geometry instancing does not provide a huge speed benefit on this level. As a result, it is no surprise that the GeForce 6800 family of graphics processors continues to outperform the rival RADEON X800 XT and RADEON X800 PRO along with the former champion, the RADEON 9800 XT.


Pier Level, Extreme Geometry

As mentioned above, we decided to find out the impact of geometry instancing on performance. We not only disabled the use of sprites on distant vegetation objects to get an extreme geometry load, but also recorded a demo that intentionally exaggerates the effect of geometry instancing. Here are the results we got:




As you see, the impact of geometry instancing is colossal: a 24% performance boost in a scene with a truly extreme geometry load is terrific. It is important to note that without geometry instancing all contemporary graphics cards seem to be CPU-dependent in 1024x768 and 1280x1024 resolutions, which means that on a processor less powerful than the AMD Athlon 64 3400+ the performance advantage from geometry instancing under a hardcore geometry load will be even higher.

With their raging geometry power, the RADEON X800 XT and RADEON X800 PRO leave the GeForce 6800 Ultra and GeForce 6800 GT graphics processors far behind, even though the latter also take advantage of the geometry instancing capability. Meanwhile, the RADEON 9800 XT demonstrates lower performance than the GeForce 6800.
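The reasoning about slower processors can be sketched with a simple bottleneck model in which each frame takes as long as the slower of the CPU and the graphics card. All millisecond figures below are hypothetical and chosen only to illustrate the argument; they are not taken from our measurements.

```python
def fps(cpu_ms: float, gpu_ms: float) -> float:
    """Frame rate when each frame waits for the slower of CPU and GPU."""
    return 1000.0 / max(cpu_ms, gpu_ms)


def percent_gain(fps_before: float, fps_after: float) -> float:
    """Relative speed-up in percent."""
    return (fps_after / fps_before - 1.0) * 100.0


# Hypothetical per-frame costs in milliseconds.
GPU_MS = 16.0
CPU_MS = {
    # label: (without instancing, with instancing)
    "faster CPU": (20.0, 12.0),
    "slower CPU": (30.0, 18.0),
}


def instancing_gain(cpu_label: str) -> float:
    """Percent gain from instancing for the given hypothetical CPU."""
    without_ms, with_ms = CPU_MS[cpu_label]
    return percent_gain(fps(without_ms, GPU_MS), fps(with_ms, GPU_MS))


for label in CPU_MS:
    print(f"{label}: {instancing_gain(label):.1f}% gain from instancing")
```

In this model the faster CPU stops being the bottleneck once instancing is enabled, so part of the saving is hidden behind the graphics card, while the slower CPU remains the bottleneck and benefits from the whole saving.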


Graphics Cards’ Performance, FSAA + Anisotropic Filtering Mode

As usual, we also present results achieved in the so-called “eye candy” mode, with 16x anisotropic filtering (8x for the GeForce FX 5950 Ultra) and 4x full-scene antialiasing activated for premier image quality. Shader Model 3.0 brought pretty moderate performance benefits to the GeForce 6800-series in “eye candy” mode when we initially tested it. Let us now find out whether ATI’s RADEON X800-series can get any substantial assistance from long pixel shaders and geometry instancing.

Training Level

Being pretty much limited by fillrate and pixel shader performance, this level is unlikely to gain a lot of speed from geometry instancing or long pixel shaders (if there are any here, of course).





Both the RADEON X800 XT and X800 PRO visual processing units gained very slightly in performance from the Shader Model 2.0b render path with full-scene antialiasing and anisotropic filtering enabled.

The RADEON X800 XT continues to lead in terms of performance, and its lead is now even more evident, while the RADEON X800 PRO is still far behind the GeForce 6800 GT. The RADEON 9800 XT is much faster than the GeForce FX 5950 Ultra, but appears to be slightly slower than the GeForce 6800.


Research Level

As noted before, advanced render paths have little effect on the performance of our demo, which reflects the actual gaming process. While it is possible to record a sequence that exaggerates the effect of the Shader Model 2.0b and Shader Model 3.0 render paths, we believe it makes more sense to learn the real effect rather than a kind of synthetic result.





Neither of the RADEON X800-series graphics processors got any substantial performance boost from the more technologically advanced rendering method. As a result, there are no surprises here: the RADEON X800 XT leads all the way, the GeForce 6800 Ultra and GeForce 6800 GT follow the leader, while the RADEON X800 PRO is behind them all. The performance of the GeForce 6800 drops because of slow memory combined with not very effective RAM management; as a consequence, the RADEON 9800 XT is a bit faster.


Regulator Level

 

The “Regulator” level uses pretty complex lighting models and, from a theoretical standpoint, should benefit pretty seriously from more efficient lighting calculations on the RADEON X800- and GeForce 6800-series.

 



 


 

Reality does not reflect the theory. The RADEON X800-series gains little with the addition of Shader Model 2.0b. The RADEON X800 XT cannot be rivalled by anything in high resolutions, while the RADEON X800 PRO is still following, not leading, the GeForce 6800 GT. NVIDIA’s GeForce 6800 performs in line with the RADEON 9800 XT.


Catacombs Level

Even though we recorded a speed increase for the RADEON X800 on the “Catacombs” level in “pure mode”, we could not replicate it in “eye candy” mode.





The RADEON X800 XT is way ahead of the competing GeForce 6800 Ultra, while the RADEON X800 PRO is way behind the rival GeForce 6800 GT. While the RADEON X800 PRO still delivers great performance, it is outperformed by a card that costs the same, which does not bode well for the product. The GeForce 6800 managed to seriously outperform the RADEON 9800 XT.


Volcano Level

“Volcano” is certainly the best example of what “long” pixel shaders can bring in terms of performance to the latest graphics cards.





Both RADEON X800 chips gained a massive amount of additional performance with Shader Model 2.0b on the “Volcano” level. The RADEON X800 PRO is now “almost” in line with the GeForce 6800 GT, whereas the RADEON X800 XT is unrivalled. The ATI RADEON 9800 XT struggles against the GeForce 6800, and the latter seems to be faster.


Pier Level

There are loads of vegetation and water on the “Pier” level. Crytek said that geometry instancing is used on plants, but, as we already know, it has a tiny impact on performance unless there is an extreme geometry load.





As expected, the “Pier” level hardly gains any speed from the more efficient rendering paths. Therefore, it is no news that the RADEON X800 XT continues to lead, while the RADEON X800 PRO and the RADEON 9800 XT find themselves behind the GeForce 6800 GT and GeForce 6800 respectively.


Pier Level, Extreme Geometry

It looks like even under extreme geometry load the main factor limiting the performance of the RADEON X800-series in “eye candy” mode is fillrate, whereas the GeForce 6800-series gains some speed from geometry instancing.




Because of their outrageous geometry power, the RADEON X800 XT and the RADEON X800 PRO outperform NVIDIA’s competing offerings even with FSAA and anisotropic filtering enabled.


Conclusion

With the Shader Model 2.0b render path now available in a commercial product and functioning flawlessly, ATI Technologies sends a clear message to gamers: the excellent performance that the RADEON X800 XT and the RADEON X800 PRO deliver today can be improved in the future. As with Shader Model 3.0, the performance advantage provided today is not really huge; however, future titles that use more per-pixel lighting and more complex geometry are unlikely to turn into something that the RADEON X800-series cannot handle.

Even though ATI considerably improved the performance of the RADEON X800-series graphics cards in FarCry’s “Pure Mode”, NVIDIA’s GeForce 6800 Ultra can still claim performance leadership here, as in quite a lot of cases the speed delivered by the GeForce 6800 Ultra is a bit higher than that of the rival RADEON X800 XT.

The same holds for the slightly less expensive graphics options, the GeForce 6800 GT and the RADEON X800 PRO: 16 pixel pipelines definitely help the former to beat the latter in quite a lot of cases.

Benchmarks in “Eye-Candy” mode revealed nothing new: the overwhelming advantage of the RADEON X800 XT over all the other graphics cards in FarCry became even more indisputable thanks to the moderate performance boost from the more efficient rendering paths. Nevertheless, the boost is not enough to drive the RADEON X800 PRO to the leading position among $399 graphics cards in FarCry.

It is important to point out that ATI’s RADEON X800 XT and PRO graphics cards handle extreme geometry load better than NVIDIA’s GeForce 6800 Ultra and GT, which may mean that, in this respect, ATI’s visual processing units are more future-proof than NVIDIA’s latest graphics processing units.

Additionally, ATI’s RADEON X800 XT traditionally calculates complex pixel shaders faster than everything else available today. Unfortunately, the RADEON X800 PRO still lags behind the competing GeForce 6800 GT product, whereas the RADEON 9800 XT does not seem to be a really strong rival for the GeForce 6800, at least, in FarCry.

Contrary to expectations, the GeForce FX 5950 Ultra did not gain any performance boost from the commercial 1.2 patch, which means that the whole GeForce FX family cannot be recommended for FarCry, as these products are not going to deliver solid performance with premier image quality, the game’s main distinguishing feature.

To sum everything up, it looks like both ATI and NVIDIA have some performance reserves in their latest families of graphics chips. Over the next 6-9 months, until the new breed of graphics processors emerges, we are not only going to see a fierce battle between ATI’s RADEON X800 and NVIDIA’s GeForce 6 architecture, but will probably also witness competition between the Shader Model 2.0b and Shader Model 3.0 render paths in terms of performance and possibly quality.