ASUS A8N32-SLI Deluxe: Love at Second Sight

In our recent review of the Nvidia nForce4 SLI X16 chipset and of the ASUS A8N32-SLI Deluxe mainboard based on it, we ran into a number of problems that resulted in low performance in synthetic as well as gaming tests. We are making amends now by publishing fresh test data obtained with correct HyperTransport bus settings and with the latest version of the nForce4 SLI X16 chipset driver.

by Alexey Stepin
03/01/2006 | 06:53 PM

When we first tested the Nvidia nForce4 SLI X16, the ASUS A8N32-SLI Deluxe mainboard based on that chipset was slower across a number of tests than the ASUS A8N-SLI Premium, a mainboard on the ordinary nForce4 SLI (for details see our review called Dressed to Kill: ASUS A8N32-SLI Deluxe Mainboard Review). The difference wasn’t too great and mostly showed up in synthetic tests. As for games, there was a gap of just 1-2% between the two mainboards.

We rechecked the results repeatedly, but with the same outcome, so we could only conclude that the new SLI-compatible chipset from Nvidia with two complete PCI Express x16 slots didn’t offer any real advantages over the older, “PCI Express x8 + x8” chipset. But we were wrong.

According to the documentation we had from Nvidia at that moment, the North Bridge of the nForce4 SLI X16 was connected to the CPU via two 8-bit HyperTransport channels, while the channels between the North and South Bridges were 16 bits wide. Changing the appropriate BIOS setting had no effect on the performance. Since the two PCI Express x16 slots are served by different Bridges in this chipset, we thought Nvidia had sacrificed the bandwidth of the HyperTransport bus on the CPU – North Bridge line to make the graphics slots complete. This was indirectly confirmed by the results of the tests. Moreover, we set the width of the link between the North and South Bridges to 8 bits by following the incorrect BIOS setup procedure also given in the documentation, and this limited the performance further.

As it turned out eventually, there was an error in the documentation and the lower performance of the ASUS A8N32-SLI was caused by quite another factor, which we will discuss in the next section of the review.


ASUS A8N32-SLI Deluxe: Why Lower Performance?

Almost any fresh-to-market product has some imperfections which are eventually corrected by the manufacturer in new revisions or firmware updates. The A8N32-SLI Deluxe mainboard was no exception. The defect in this mainboard was that the SB to NB Frequency option changed automatically after the BIOS settings were reset or the BIOS was updated. This option adjusts the multiplier of the HyperTransport bus frequency between the chipset’s Bridges and thus directly affects the bus bandwidth.

On all new ASUS A8N32-SLI Deluxe mainboards this option is initially set at the correct value of 5X, which yields a resulting frequency of 1000MHz. It turns out, however, that any BIOS update or clearing of the CMOS chip with the appropriate jumper resets this option to 2X, and the bus frequency is reduced from 1000 to 400MHz as a consequence (the base HyperTransport frequency is 200MHz). The bus bandwidth is thus reduced by a factor of 2.5, which cannot but have a negative effect on the mainboard performance.
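The arithmetic behind these numbers is easy to check. The short Python sketch below is only an illustration: the 200MHz base clock and the 5X and 2X multipliers are taken from the text, while the function name is ours and has nothing to do with the BIOS or with ASUS tools.

```python
# Illustrative sketch: HyperTransport link clock = base clock x multiplier.
# The base clock and multipliers come from the article; the function name
# is hypothetical.

HT_BASE_MHZ = 200

def ht_link_mhz(multiplier, base_mhz=HT_BASE_MHZ):
    """Return the HyperTransport link frequency in MHz."""
    return base_mhz * multiplier

nominal = ht_link_mhz(5)      # the correct SB to NB Frequency setting
after_reset = ht_link_mhz(2)  # the value after a BIOS update or CMOS clear

print(f"nominal: {nominal}MHz, after reset: {after_reset}MHz")
print(f"slowdown factor: {nominal / after_reset:.1f}x")
```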

So we manually set the multiplier at 5X, thus restoring the frequency of the HyperTransport bus between the chipset’s Bridges to its nominal value, and retested the A8N32-SLI Deluxe mainboard in gaming and synthetic tests with a new version of the Nvidia nForce driver. The information that the HyperTransport channels between the CPU and North Bridge are 8 bits wide hasn’t been confirmed, although the appropriate option on our sample of the A8N32-SLI Deluxe was initially set at “8|8”. The performance of the mainboard grew considerably after we set it at “16|16”.

So here’s our recommendation to everyone who is using or is going to use an ASUS A8N32-SLI Deluxe mainboard. To avoid the low performance problem, be careful when you update the BIOS or clear the CMOS chip for overclocking or other purposes. When you start up a system with this mainboard for the first time, make sure the K8 to NB LinkWidth and the SB to NB LinkWidth options are both set at 16|16 and that the SB to NB Frequency option is set at 5X. Repeat this check every time you update the mainboard’s BIOS or reset the CMOS to be sure the performance of your mainboard is at the maximum.
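The checklist above can be written down as a small Python helper. This is a sketch under stated assumptions: the option names and recommended values are the ones quoted in the text, but the helper itself is hypothetical and the current values must be read off the BIOS setup screen and typed in by hand (we know of no tool that reads them automatically).

```python
# Hypothetical checklist helper. Option names and recommended values are
# quoted from the article; the "current" values are entered manually.

RECOMMENDED = {
    "K8 to NB LinkWidth": "16|16",
    "SB to NB LinkWidth": "16|16",
    "SB to NB Frequency": "5X",
}

def find_wrong_settings(current):
    """Return {option: (found, expected)} for every option that is off."""
    return {
        name: (current.get(name), expected)
        for name, expected in RECOMMENDED.items()
        if current.get(name) != expected
    }

# Example values resembling the state described in the text:
# an 8|8 CPU link and a 2X multiplier after a reset.
after_reset = {
    "K8 to NB LinkWidth": "8|8",
    "SB to NB LinkWidth": "16|16",
    "SB to NB Frequency": "2X",
}

for name, (found, expected) in find_wrong_settings(after_reset).items():
    print(f"{name}: found {found}, should be {expected}")
```

An empty result means all three options match the recommended values.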

This problem may be solved in upcoming BIOS updates, but the last available update we tried didn’t help. It is possible ASUS will release a new revision of the mainboard without the mentioned defect.


Testbed Configuration and Methods

Since our goal was still to check if the new nForce4 SLI X16 chipset had any advantage over the ordinary nForce4 SLI in real-life applications, we used the same hardware parts with two different mainboards:

Two NVIDIA GeForce 7800 GTX graphics cards working in SLI mode constituted the graphics subsystem for the gaming tests. This will show if the two PCI Express x16 channels give any performance gain in games as opposed to the ordinary nForce4 SLI configuration in which each x16 slot has only 8 PCI Express lanes. To avoid CPU-imposed limitations, we didn’t test 1024x768 resolution, but tested the popular 1280x1024 mode and the resource-consuming yet standard resolution of 1600x1200. We also didn’t test the systems in the “pure speed” mode, but added two new modes to our traditional “eye candy” settings (4x FSAA + 16x anisotropic filtering): 8x FSAA + 16x AF and 16x FSAA + 16x AF. The ForceWare driver was set up as usual:

We selected the highest graphics quality settings in each game. Where possible, we used the games’ built-in benchmarking tools; otherwise, we measured the frame rate with the FRAPS utility. We measured minimum as well as average fps rates whenever possible. We turned on the 4x FSAA + 16x AF mode from the game’s own menu if it was possible. Otherwise, we forced the necessary mode from the ForceWare driver, as we also did for the higher levels of full-screen antialiasing.
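For readers who want to reproduce average and minimum fps figures from their own runs, here is a minimal Python sketch. It assumes you already have a list of frame timestamps in seconds; how you obtain them (a FRAPS log, a built-in benchmark) is outside its scope, and the choice to define the minimum as the lowest frame count in any full one-second window is our assumption, not a description of what any particular tool does.

```python
# Sketch: average and minimum fps from frame timestamps (in seconds).
# The input format and the one-second-window definition of "minimum fps"
# are assumptions made for illustration.

def fps_stats(timestamps):
    """Return (average_fps, minimum_fps) for one benchmark run."""
    if len(timestamps) < 2:
        raise ValueError("need at least two frame timestamps")
    start = timestamps[0]
    elapsed = timestamps[-1] - start
    # The first timestamp only marks the start, so it is not counted.
    average = (len(timestamps) - 1) / elapsed

    full_seconds = int(elapsed)
    counts = [0] * full_seconds
    for t in timestamps[1:]:
        second = int(t - start)
        if second < full_seconds:  # ignore the trailing partial second
            counts[second] += 1
    minimum = min(counts) if counts else average
    return average, minimum
```

For a run shorter than one second there is no full window, so the function falls back to the average for the minimum as well.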

These games and applications were used as benchmarks:

First-Person 3D Shooters

Third-Person 3D Shooters

Simulators

Strategies

Semi-synthetic benchmarks

Synthetic benchmarks

Besides these gaming tests we also made use of WinRAR 3.51, Futuremark 3DMark2001 SE (build 330), Futuremark PCMark05 (build 110) and our own utility that tests the bandwidth of AGP and PCI Express buses. The testbed included only one GeForce 7800 GTX graphics card for these tests.


Performance in Theoretical Tests

Futuremark PCMark05 build 110

The performance of the two platforms with different mainboards varies by no more than a few percent. Our using the correct BIOS settings and the new version of the nForce4 driver leads to the new ASUS mainboard getting a higher overall score than the older A8N-SLI Premium, although the difference of 61 points isn’t too big for the overall scores of about 4300 points.
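To put that gap into perspective, it helps to express it as a percentage. The Python lines below are a simple illustration using the two figures from the paragraph above (a 61-point difference on a score of about 4300); the function name is ours.

```python
# Illustration: a score difference expressed as a percentage of the baseline.
# 61 points and ~4300 points are the figures quoted in the article.

def relative_gap_percent(points_diff, baseline_score):
    """Return the difference as a percentage of the baseline score."""
    return points_diff / baseline_score * 100

print(f"{relative_gap_percent(61, 4300):.1f}%")  # prints 1.4%
```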

The diagram shows that the CPU performance is absolutely the same on the A8N-SLI Premium and the A8N32-SLI Deluxe mainboards. The difference of 12 points is within the measurement error range. Of course, the lower clock rate of the HyperTransport bus between the chipset’s Bridges doesn’t have any effect on the CPU performance.

All modern processors from AMD come with an integrated memory controller, so the results of this test aren’t surprising at all. The ASUS A8N32-SLI Deluxe does somewhat better than the older mainboard, though, thanks to the new driver and the correct setting of the SB to NB Frequency option.

Futuremark 3DMark2001 SE build 330

The ASUS A8N32-SLI Deluxe was definitely worse than the A8N-SLI Premium in this test at first, scoring 508 points less, but the retest reduced the gap to just 116 points. This difference is small enough that we can consider the A8N32-SLI Deluxe and the A8N-SLI Premium as having the same performance in 3DMark2001 SE.

Futuremark 3DMark05 build 120

The newer version of Futuremark’s benchmark thinks the new mainboard is a little better when using the new chipset driver and the correctly set-up HT bus. When the SB to NB Frequency option is not manually corrected, the two mainboards have nearly the same results. The difference is anyway rather small because the computer speed is limited by the graphics subsystem performance in this test.

The A8N32-SLI Deluxe used to be slower in this test, but its performance improves with the correct HyperTransport frequency multiplier and the new nForce4 driver so it is now as fast as the A8N-SLI Premium and even faster. However, the advantage is not that significant, only 1%.

It seems the frequency of the HyperTransport bus between the chipset’s North and South Bridges doesn’t affect the results of this test at all, while the performance gain comes only from the improved chipset driver.

In the theoretical tests we installed only one graphics card into the PCI Express x16 slot connected to the North Bridge. Nor do these tests involve the active data exchange between the CPU and the disk subsystem that is typical of data compression applications. Let’s take a look at the WinRAR results then.


Performance in WinRAR 3.51

The system bus (its role is played by HyperTransport on the AMD64 platform) is loaded heavily in data compression tasks, so the A8N32-SLI Deluxe quite naturally falls behind the A8N-SLI Premium if the SB to NB Frequency option is set at 2X. The gap isn’t too big, not even amounting to 10%, but it is a fact that the low HyperTransport bandwidth between the CPU and chipset negatively affects the performance of the nForce4 SLI X16. Setting the correct multiplier improves the performance of the A8N32-SLI Deluxe at data archiving, but it still lags behind the A8N-SLI Premium. This seems to be the consequence of the longer path the data has to take in the nForce4 SLI X16 as it goes from the HDD to the CPU and back again.

Performance in PCI Express Test

This test sends a certain amount of data from the system memory to the graphics memory and back again, with the size of the transferred block varying from 64KB to 4MB. The utility loads the memory controller as well as the graphics bus and the bus that connects the CPU and the chipset (in this case, this is the HyperTransport link). Here are the results the test produced on the ASUS A8N32-SLI Deluxe platform with the correct BIOS settings and the version 6.85 chipset driver:
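The general shape of such a measurement can be sketched in Python. This is not our utility: it cannot reach graphics memory, so a plain in-memory copy stands in for the bus transfer, and the doubling step between block sizes is an assumption (the text only gives the 64KB and 4MB endpoints). What it does show is how a block-size sweep is turned into MB/s figures.

```python
# Sketch of a block-size sweep with a throughput calculation.
# Assumptions: block sizes double at each step; an in-memory copy is used
# as a stand-in for the real system-memory-to-graphics-memory transfer.
import time

KB = 1024
MB = 1024 * KB

def block_sizes(smallest=64 * KB, largest=4 * MB):
    """Block sizes from smallest to largest, doubling at each step."""
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= 2
    return sizes

def throughput_mb_s(num_bytes, seconds):
    """Transfer rate in MB/s for num_bytes moved in the given time."""
    return num_bytes / MB / seconds

def time_copy(size, repeats=50):
    """Time repeated copies of one block and return the rate in MB/s."""
    src = bytearray(size)
    start = time.perf_counter()
    for _ in range(repeats):
        dst = bytes(src)  # the stand-in "transfer"
    elapsed = time.perf_counter() - start
    return throughput_mb_s(size * repeats, elapsed)

for size in block_sizes():
    print(f"{size // KB:>5}KB  {time_copy(size):10.0f}MB/s")
```

The numbers this prints describe the memory subsystem of the machine it runs on and are not comparable with the results in this review.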

When the data block size is small, the two mainboards differ by less than 10MB/s, but at a block size of 256KB the A8N32-SLI Deluxe takes the lead and enjoys a 40MB/s advantage over the A8N-SLI Premium. The older mainboard overtakes the newer one on larger blocks thanks to its simpler architecture and, accordingly, lower latency when transferring data between the various components of the system.

The results are quite different when the data are being transferred from the graphics card memory into the system RAM. The mainboards have similar speeds, except on the largest data blocks where the A8N-SLI Premium is on top again.

So, we can’t say the nForce4 SLI X16 has any advantage over the ordinary nForce4 SLI based on the results of the theoretical tests. On the other hand, the architectural features of the nForce4 SLI X16 just can’t show up if you use only one graphics card. The new chipset is meant to ensure the maximum performance of a SLI configuration with two graphics cards installed. We are going to see if it is so in real-life games in the next section of the review.


Performance in Gaming Tests: First-Person 3D Shooters

Battlefield 2 was unstable on the new platform and wouldn’t launch at all in SLI mode. The other games worked well, so the overall stability of the ASUS A8N32-SLI Deluxe mainboard is quite high.

The Chronicles of Riddick

There’s almost no difference between the mainboards on the nForce4 SLI and nForce4 SLI X16 chipsets at low FSAA levels, although the newer mainboard used to be slower previously. It is now faster than the older one in the SLI AA 8x and SLI AA 16x modes, but by no more than 10%.

The new platform theoretically provides more comfort in 1280x1024 resolution in the SLI AA 8x mode, but you can hardly see the difference in practice.

Doom 3

It’s only in the SLI AA 8x mode and in 1600x1200 resolution that the ASUS A8N32-SLI Deluxe is more than 1 fps faster than the older mainboard. The advantage is less than 5% and may be due to measurement errors rather than to any advantages of the new chipset. The mentioned mode is quite playable, but without any speed reserve, so you may have slowdowns in some of the most complex game scenes. To safeguard yourself against that, you can choose the less resource-consuming 4x FSAA mode.

A SLI configuration with two GeForce 7800 GTX 512 cards may make the SLI AA 8x mode playable with comfort in Doom 3, but we can’t check this out until we get a second sample of this graphics card.


Far Cry

Far Cry is the first game where the new chipset from Nvidia brings about a nice performance bonus of up to 10%. As we supposed, you can enjoy it in extreme antialiasing modes only. Here, it is the SLI AA 8x mode. Note that the SLI pair of GeForce 7800 GTX cards delivers a playable frame rate in 1280x1024 or even higher resolutions. The image quality in this mode is noticeably higher than with the traditional 4x FSAA, especially if you also turn on the transparent textures super-sampling.

The performance is limited by other factors in the SLI AA 16x mode, probably by the graphics memory frequency of the GeForce 7800 GTX. That’s why there’s little profit from the two complete PCI Express x16 slots.

The advantage of the nForce4 SLI X16 chipset is even more conspicuous on the Research map. The amount of data transferred across the PCI Express bus is probably larger here than in the previous test. Well, even the 10% performance gain doesn’t make the SLI AA 16x mode playable, although you can try to run the game in 1280x1024 on the ASUS A8N32-SLI Deluxe.


F.E.A.R.

F.E.A.R. is a very difficult application for the graphics subsystem of a computer, but the combined power of the two GeForce 7800 GTX ensures a frame rate of 60-65fps in 1280x1024 with 8x SLI AA enabled.

There is no difference between the old and new nForce4 SLI platforms in this test, probably because the mathematical performance of the GPU is the main performance-limiting factor here.

Half-Life 2

Half-Life 2 is far from the most demanding among today’s gaming applications, but its numerous high-resolution textures allow the nForce4 SLI X16 platform to show its best. There is still a bigger speed gain in the 8x SLI AA mode than in 16x. The latter mode is not playable even in this rather undemanding game. An acceptable frame rate may be achieved on a SLI configuration with two GeForce 7800 GTX 512 cards.


Project Snowblind

We’ve seen this already. The two complete PCI Express x16 slots don’t have the slightest effect on the performance in games that don’t operate with complex textures, particularly in games that have come to the PC from game consoles, as Project: Snowblind did. The results of the A8N-SLI Premium and A8N32-SLI Deluxe coincide to a tenth of an fps.

Quake 4

This recently released title doesn’t reveal any big differences between the two mainboards, but it is in fact characteristic of the new games from id Software to have rather simple textures. The low texture quality is made up for with the wide use of bump mapping and with a complex lighting & shadowing model. The amount of data transferred between the SLI-connected graphics cards is relatively small and keeps within the bandwidth of two PCI Express x8 slots.

The combined power of two GeForce 7800 GTX allows playing Quake 4 even with 16x SLI AA turned on, but you may want to limit yourself to 8x SLI AA to play in 1600x1200 and have a smaller risk of encountering a slowdown.


Serious Sam 2

This test is so difficult that the SLI pair of two GeForce 7800 GTX only yields a frame rate of about 50fps in the 4x FSAA + 16x AF mode. At higher levels of antialiasing, the performance immediately plummets below comfortable.

There is no difference between the nForce4 SLI and the nForce4 SLI X16 platforms, although Serious Sam 2 cannot be counted among games with low textural load. Shaders using 4 textures are everywhere in this game; shaders with 7 or even 8 textures occur, too. That’s why we think it rather strange that the increased number of PCI Express lanes for each graphics card on a multi-GPU platform has no effect on the performance.

It is possible, however, that the system performance is limited by the texture-processing speed of the GPU, considering the relatively low frequency of the GeForce 7800 GTX, which is 430MHz. The rather slow graphics memory of the card may affect the results, too.

Unreal Tournament 2004

To our surprise, Unreal Tournament 2004 proved to be among those games that profit from having more PCI Express lanes on the graphics card slots. There’s no performance gain in the 4x FSAA and 8x SLI AA modes, but it amounts to 10% to 30% in the 16x SLI AA mode! The resolution of 1600x1200 is playable with comfort in the latter mode. This is a great improvement over the previous test session when the ASUS A8N32-SLI Deluxe was slower than the older mainboard due to the reduced HyperTransport bandwidth between the chipset’s Bridges.


Performance in Gaming Tests: Third-Person 3D Shooters

Splinter Cell: Chaos Theory

This graphically advanced game can please the eyes of the player with beautiful visuals if the graphics card supports Shader Model 3.0. It doesn’t however get any advantage from running on a mainboard with 16 PCI Express lanes connected to each of the graphics card slots. The game creates its visuals with pixel shaders and lacks complex textures, like all cross-platform projects do.

Note that the game runs at the same speed in the 8x and 16x SLI AA modes, so we suspect that 256 megabytes of graphics memory is not enough to correctly turn on 16x FSAA here. We can’t check this out as we don’t have two GeForce 7800 GTX 512 graphics cards at our disposal right now.


Performance in Gaming Tests: Simulators

Colin McRae Rally 2005

The two GeForce 7800 GTX graphics cards deliver the same performance on both mainboards despite our using the correct BIOS settings and the new chipset driver. The difference is negligibly small even in the extreme antialiasing modes. The game visuals are simple enough, so you can play with high levels of FSAA enabled, except for high resolutions in the 16x FSAA mode.

Pacific Fighters

As in the previous test, we can see no difference between the two systems in Pacific Fighters. The results of the two platforms nearly coincide.


Performance in Gaming Tests: Strategies

Age of Empires 3

The 8x SLI AA mode, not to mention 16x, is unavailable even for owners of two GeForce 7800 GTX cards. The results are of pure theoretical interest then. You can see that the game doesn’t run any faster on the system with more PCI Express lanes. The A8N32-SLI Deluxe is even a little slower than the A8N-SLI Premium, but the difference is within the measurement error range.

Warhammer 40000: Dawn of War

It’s all like in the previous test, even though Warhammer 40000: Dawn of War makes wide use of high-resolution textures.


Performance in Gaming Tests: Semi-Synthetic Benchmarks

Aquamark 3

The correct BIOS settings and the new nForce4 SLI X16 driver bring about a considerable performance bonus varying from 5% to 10%. Not much, but we can see that there is some profit from having two complete PCI Express x16 slots on a multi-GPU platform.


Performance in Gaming Tests: Synthetic Benchmarks

Futuremark 3DMark03 build 360

The first test is very simple and lacks shaders, so the difference between the nForce4 SLI and nForce4 SLI X16-based platforms is as clear as it was in Unreal Tournament 2004.

And the changes since our previous test session are obvious, too: the A8N32-SLI Deluxe platform used to be slower than the A8N-SLI Premium one, but the situation is reversed now as the new chipset ensures up to 20% more performance in the extreme antialiasing modes!

Using pixel shaders and not using complex textures, the second test has both the platforms showing roughly the same speed, although the nForce4 SLI X16 is still somewhat superior in the most resource-consuming FSAA modes. The advantage varies from 5% to 10% depending on the antialiasing mode and resolution.


More complex geometry is the only difference between the third test and the second one, so it produces the same picture of performance.

In the fourth test, on the contrary, the new platform enjoys a bigger advantage than in the two previous ones because the test is rather difficult and the amount of data pumped through the PCI Express bus in the SLI AA modes is rather large. The ASUS A8N32-SLI Deluxe is about 10% ahead of the older, nForce4 SLI-based mainboard that lacks two complete PCI Express x16 slots.


Futuremark 3DMark05 build 120

There are benefits from the 32 PCI Express lanes servicing a multi-GPU graphics subsystem, but they show up only at very resource-consuming antialiasing levels when the amount of transferred data is so large that 16 lanes just cannot cope with it.
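The difference between 8 and 16 lanes per card can be put into numbers. The Python sketch below assumes first-generation PCI Express signaling, which gives roughly 250MB/s per lane in each direction after encoding overhead; that figure is a general property of PCI Express 1.x and is our addition, not something measured in this review.

```python
# Sketch: theoretical per-card slot bandwidth for x8 and x16 links.
# Assumption: PCI Express 1.x, about 250MB/s per lane per direction.

PCIE1_MB_S_PER_LANE = 250

def slot_bandwidth_gb_s(lanes):
    """Theoretical one-way bandwidth of a slot in GB/s."""
    return lanes * PCIE1_MB_S_PER_LANE / 1000

for name, lanes in (("nForce4 SLI, x8 per card", 8),
                    ("nForce4 SLI X16, x16 per card", 16)):
    print(f"{name}: {slot_bandwidth_gb_s(lanes):.1f}GB/s per direction")
```

These are peak theoretical figures; the real transfer rates seen by the graphics cards are lower.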

There’s almost no advantage visible in the second test because it renders a smaller scene than the first test, so the load on the graphics memory subsystem and the graphics bus is much lower.

The third 3DMark05 test renders a larger scene than the first one, but the new chipset from Nvidia doesn’t offer a big performance gain here because it is the number and efficiency of pixel processors that determine the speed in this test. Anyway, the nForce4 SLI X16 platform looks a little better in the 8x and 16x SLI AA modes.


Conclusion

So, this test session proves that the Nvidia nForce4 SLI X16 chipset is the highest-performance and well-balanced solution for multi-GPU systems with Nvidia GeForce 7 graphics cards.

Using correct BIOS settings and the new version of the nForce4 SLI X16 driver, we retested the new Nvidia SLI platform that features two architecturally complete PCI Express x16 slots. We had earlier tested the same platform to find that the new chipset had no advantage in real-life gaming applications, but this turned out to be the result of some defects and errors in the ASUS A8N32-SLI Deluxe mainboard and Nvidia’s technical documentation.

As we already said at the beginning of the review, the current revision of the A8N32-SLI Deluxe automatically drops the frequency multiplier of the HyperTransport link between the chipset’s Bridges from the default 5X to 2X when you update the BIOS or reset the BIOS settings by clearing the CMOS chip. As a result, the bus frequency goes down to 400MHz (200MHz x 2) from the nominal 1000MHz (200MHz x 5) and the bandwidth drops by a factor of 2.5, from 8GB/s to 3.2GB/s. Of course, this affects the performance of the mainboard, especially where the bus bandwidth is a critical performance-related factor, as in data compression applications or in SLI mode with SLI AA enabled.
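The 8GB/s and 3.2GB/s figures follow from the link clock if you assume a 16-bit HyperTransport link that transfers data on both clock edges and count both directions together. Those three assumptions are ours (they are the usual way such peak figures are quoted); the Python sketch below simply spells out the multiplication.

```python
# Sketch: peak HyperTransport bandwidth from the link clock.
# Assumptions: 16-bit link, double data rate, both directions summed.

def ht_bandwidth_gb_s(link_mhz, width_bits=16):
    """Return peak aggregate bandwidth in GB/s for a given link clock."""
    transfers_per_clock = 2           # double data rate
    directions = 2                    # upstream plus downstream
    bytes_per_transfer = width_bits / 8
    mb_s = link_mhz * transfers_per_clock * bytes_per_transfer * directions
    return mb_s / 1000

print(f"5X (1000MHz): {ht_bandwidth_gb_s(1000):.1f}GB/s")  # prints 8.0GB/s
print(f"2X (400MHz):  {ht_bandwidth_gb_s(400):.1f}GB/s")   # prints 3.2GB/s
```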

Using the correct BIOS settings and the version 6.85 nForce4 SLI X16 driver, we saw the ASUS A8N32-SLI Deluxe perform as fast as the ASUS A8N-SLI Premium and sometimes even produce a nice performance boost in extreme full-screen antialiasing modes. We want to note that it was not only the correct setting of the BIOS option, but also the new version of the chipset driver, which obviously improves interaction with the memory controller, that had such a positive effect on the performance of the mainboard.

It is natural that we’ve observed the biggest performance gain in games that draw large-scale scenes with high-resolution textures. In particular, such games as The Chronicles of Riddick, Far Cry, Half-Life 2, and especially Unreal Tournament 2004 responded readily to the increase in the number of PCI Express lanes. The performance gain relative to the nForce4 SLI varied from 2% to 10%, and reached 30% in Unreal Tournament 2004. In other words, the new chipset from Nvidia has really proved it can be the foundation of the most efficient multi-GPU platform for Nvidia’s GeForce series graphics cards.

As for the news from the opposite camp, Nvidia’s archrival, the Canadian ATI Technologies, is expected to announce its Radeon Xpress 3200 very soon. This chipset may become a serious opponent to the multi-GPU SLI platform and, as usual, you will certainly see it tested on our site!