by Alexey Stepin, Anton Shilov
03/27/2006 | 02:34 PM
Multi-GPU-compatible systems account for a very small share of the consumer 3D graphics market, yet there is always a stable group of enthusiasts willing to pay any price for truly high performance and happy to pay extra for a little more speed. Demand stimulated supply, and the first multi-GPU chipsets arrived on the market: first Nvidia nForce4 SLI (for details see our article called NVIDIA Multi-GPU SLI Technology: New Approach to Old Ideas) and then, much later, ATI Radeon Xpress 200 CrossFire Edition (for details see our article called Swords Crossed: ATI CrossFire Platform Review).
The first generation of system logic that could work with two graphics cards simultaneously consisted of little more than slightly modified single-card chipsets. They only had to be able to reconfigure the PCI Express bus so that the 16 lanes previously connected to the single graphics slot could be split into two groups, one per graphics slot. This reconfiguration was originally done with mechanical switches and later with special logic controlled from the BIOS Setup. One thing remained the same, though: the PCI Express x16 slots on first-generation multi-GPU systems worked as PCI Express x8 rather than x16. So these chipsets didn’t differ much from their single-GPU prototypes, which was a logical and justifiable solution since the market for multi-GPU systems was, and still is, very small.
The performance gain from the second graphics card was many times greater than the loss from cutting each slot’s PCI Express lanes from 16 to 8. Still, the “PCI Express x8 + x8” formula was inherently limiting and kept the first generation of multi-GPU platforms from showing their full potential. The release of new chipsets supporting two full PCI Express x16 slots seemed inevitable, because multi-GPU technologies do one more thing besides satisfying the needs of wealthy PC enthusiasts: they demonstrate the developer’s technological superiority.
Nvidia’s nForce4 SLI X16 was the first such chipset to appear. The 40 and 38 PCI Express lanes offered by its Intel- and AMD-oriented versions, respectively, were enough for two full PCI Express x16 slots. As our tests showed, doubling the number of PCI Express lanes connected to the graphics slots can improve system performance, but only in extreme full-screen antialiasing modes and at high display resolutions, when the load on the bus is highest. So enthusiasts got their extra few percent of speed, and the nForce4 SLI X16 helped to fully utilize the potential of Nvidia SLI technology.
Meanwhile, ATI’s second-generation CrossFire subsystems ran on the Radeon Xpress 200 CrossFire Edition chipset (RD400/RD480), the company’s first generation of system logic with multi-GPU support. For all its advantages, this chipset had fewer PCI Express lanes than the nForce4 SLI X16. ATI Technologies could hardly leave things that way, especially after its Radeon X1900 XT CrossFire subsystem delivered such excellent results in our tests. The new CrossFire Xpress 3200 chipset, also known by its codename RD580, was created to reveal the full potential of CrossFire technology. Let’s discuss its concepts and features at more length now.
Although the CrossFire Xpress 3200 was developed from scratch in only eight months, the first revision of the RD580 chip was fully operable. ATI’s engineers took only 24 hours to make the new chipset absolutely stable.
The new North Bridge came out very small despite TSMC’s 0.11-micron process and its 40 PCI Express lanes. The die consists of 22 million transistors and measures only 39 square millimeters.
This is considerably smaller than the die of the nForce SPP 100 (C51D) North Bridge of the nForce4 SLI X16 chipset, which is manufactured on a 0.09-micron process and supports only 20 PCI Express lanes. The ATI RD580 is in fact the smallest North Bridge on the market! Another advantage of the new chip is its very low power consumption of only 8 watts. This means there will be no bulky, noisy coolers on upcoming CrossFire Xpress 3200-based mainboards, whereas some existing nForce4 SLI X16 mainboards are equipped with sophisticated heat-pipe cooling systems.
The CrossFire Xpress 3200 is omnivorous when it comes to South Bridges: any chip with PCI Express support will do. Four PCI Express lanes, forming the so-called Alink2 interface, are allotted in the RD580 for connection to the South Bridge. The total bandwidth of this interface is 2GB/s (1GB/s in each direction). This should be enough for the standard peripherals integrated into South Bridge chips.
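The 2GB/s figure follows from the PCI Express 1.x signaling rate: each lane runs at 2.5 GT/s with 8b/10b encoding, leaving 250MB/s of payload per lane in each direction. The short Python sketch below (the function name is ours) reproduces that arithmetic for the Alink2 link and, for comparison, for the x8 and x16 graphics slots discussed earlier. It ignores packet and protocol overhead, so real throughput is somewhat lower.

```python
# PCI Express 1.x: every lane signals at 2.5 GT/s and uses 8b/10b line coding,
# so only 8 of every 10 transferred bits carry payload.
LANE_RATE_TRANSFERS_PER_S = 2_500_000_000
PAYLOAD_BITS, LINE_BITS = 8, 10


def pcie1_bandwidth_mb_s(lanes):
    """Return (per-direction, aggregate) payload bandwidth in decimal MB/s."""
    payload_bits_per_s = LANE_RATE_TRANSFERS_PER_S * PAYLOAD_BITS // LINE_BITS * lanes
    per_direction = payload_bits_per_s // 8 // 1_000_000  # bits -> bytes -> MB
    # The link is full duplex, so both directions run at this rate at the same time.
    return per_direction, 2 * per_direction


for name, lanes in [("Alink2 (x4)", 4), ("graphics slot (x8)", 8), ("graphics slot (x16)", 16)]:
    one_way, total = pcie1_bandwidth_mb_s(lanes)
    print(f"{name:20s} {one_way / 1000:.1f} GB/s per direction, {total / 1000:.1f} GB/s total")
```

The same function also shows why the x8 + x8 split of first-generation platforms halves the bandwidth available to each graphics card.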
We should note that ATI’s SB450/460 South Bridges and ULi’s M1573 chip fall short of today’s requirements because they do not support Serial ATA-II. So early off-the-shelf mainboards on the CrossFire Xpress 3200 chipset will most likely use ULi’s M1575 South Bridge. This chip has fewer USB 2.0 ports than the nForce4 SLI (eight against ten) and doesn’t support Gigabit Ethernet, but it can work with HDA (High-Definition Audio) codecs, which Nvidia’s chipset cannot. The lack of Gigabit Ethernet is not a big problem: it can be solved by attaching any suitable single-chip controller with a PCI Express x1 interface to the North Bridge, for example the Marvell Yukon 88E8053. Later on, ATI’s new SB600 South Bridge is to become the RD580’s main companion. This chip is expected to appear in the middle of this year.
To summarize this section, we want to offer you a table that compares the technical characteristics of the modern chipsets that support two complete PCI Express x16 slots:
Feature | Nvidia nForce4 SLI X16 | ATI CrossFire Xpress 3200 |
Architecture | Dual-chip | Dual-chip |
Connection between bridges | HyperTransport (8GB/s) | PCI Express x4 (2GB/s) |
HyperTransport support | 16 bit / 1 GHz | 16 bit / 1 GHz |
PCI Express lanes | 38 | 40 |
Multi-GPU support | Yes, SLI | Yes, CrossFire |
PCI support | Yes, 6 devices | Yes, 7 devices |
USB 2.0 support | 10 ports | 8 ports |
Serial ATA support | Yes | Yes |
NCQ support | Yes | Yes |
Serial ATA ports | 4 | 4 |
Parallel ATA channels | 2 | 2 |
RAID support | Yes, 0, 1, 0+1, 5, JBOD | Yes, 0, 1, 0+1, 5, JBOD |
Ethernet support | Yes, 1 Gbit/s | Yes, 10/100 Mbit/s |
Network security | Yes, Nvidia Active Armor | None |
Audio support | AC’97, 8 channels | High Definition Audio (Azalia) |
So the ATI CrossFire Xpress 3200 is not inferior to the Nvidia nForce4 SLI X16 in functionality (with a few caveats), and it is far superior in its architectural concept, which we discuss in the next section.
Unlike the Nvidia nForce4 SLI X16, the CrossFire Xpress 3200 (RD580) was designed from the start as a high-performance solution for multi-GPU enthusiasts, as its very name indicates.
As a matter of fact, the Nvidia nForce4 SLI X16 is not really a new chipset: it uses an ordinary nForce4 SLI chip as the South Bridge and an nForce SPP 100 (C51D) chip as the North Bridge. The latter is a cut-down version of the GeForce 6100/6150 (C51G) North Bridge without the integrated graphics core. So Nvidia created a chipset with two full PCI Express x16 slots without much effort, but like any compromise, the nForce4 SLI X16 is not a perfect solution. Its main drawback is that each graphics slot is served by a separate controller, one in each bridge.

This is a functional and workable solution, but the path between the memory controller and the graphics card in the slot connected to the South Bridge is much longer than the path to the slot connected to the North Bridge. This slows down data transfers between graphics cards working in a multi-GPU configuration.
The total bandwidth of the HyperTransport bus between the North and South Bridges is 8GB/s, which seems enough for normal operation of two PCI Express x16 slots, but part of this bandwidth is consumed by other system components, such as the hard drives connected to the South Bridge and the network controller integrated into it. Moreover, on the nForce4 SLI X16 platform data undergoes a double conversion (PCI Express → HyperTransport → PCI Express) as it moves from one graphics card to the other. This conversion increases data-transfer latencies and can therefore lower the achievable performance.
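The 8GB/s figure comes from the link parameters listed in the table: a 16-bit HyperTransport link at 1GHz moves data on both clock edges, giving 4GB/s in each direction. The Python sketch below reproduces that arithmetic and, as a deliberately simplified model of our own (not a description of either chipset's internals), counts the protocol conversions a card-to-card transfer goes through on each platform.

```python
def ht_bandwidth_gb_s(width_bits, clock_ghz):
    """Return (per-direction, aggregate) HyperTransport bandwidth in GB/s."""
    # HyperTransport transfers data on both clock edges (double data rate).
    per_direction = width_bits / 8 * clock_ghz * 2
    return per_direction, 2 * per_direction


# Simplified sequence of buses a card-to-card transfer passes through (illustrative only).
PATHS = {
    "nForce4 SLI X16": ["PCIe", "HyperTransport", "PCIe"],
    "CrossFire Xpress 3200": ["PCIe", "PCIe"],
}


def protocol_conversions(path):
    """Count how many times the bus type changes along the path."""
    return sum(1 for here, there in zip(path, path[1:]) if here != there)


print("16-bit HT at 1GHz: %.0f GB/s per direction, %.0f GB/s total" % ht_bandwidth_gb_s(16, 1.0))
for chipset, path in PATHS.items():
    print(f"{chipset}: {' -> '.join(path)} ({protocol_conversions(path)} conversions)")
```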
The ATI CrossFire Xpress 3200 is free from this drawback by design. The RD580 North Bridge incorporates a single controller supporting 40 PCI Express lanes, which is more than enough for two full PCI Express x16 slots.

Besides the 32 PCI Express lanes for the graphical slots, four more lanes are used to connect to the South Bridge, and four more can be wired to additional PCI Express x1/x4 slots or to onboard peripherals like network controllers, Serial ATA RAID controllers, etc. As you see, the single controller integrated into the North Bridge controls all the lanes – this solution reduces data transfer latencies and increases performance.
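The lane budget described above can be checked with simple bookkeeping. The Python sketch below, whose names and labels are our own, sums the allocation against the RD580's 40 lanes and rejects any plan that would need more, such as a hypothetical third x16 slot.

```python
RD580_TOTAL_LANES = 40

# Lane allocation as described in the text; the labels are illustrative.
ALLOCATION = {
    "graphics slot 1 (x16)": 16,
    "graphics slot 2 (x16)": 16,
    "Alink2 link to the South Bridge (x4)": 4,
    "x1/x4 slots or onboard peripherals": 4,
}


def spare_lanes(allocation, total):
    """Return the unused lanes, or raise if the plan needs more lanes than exist."""
    used = sum(allocation.values())
    if used > total:
        raise ValueError(f"over-subscribed: {used} lanes requested, only {total} available")
    return total - used


print(f"{sum(ALLOCATION.values())} of {RD580_TOTAL_LANES} lanes allocated, "
      f"{spare_lanes(ALLOCATION, RD580_TOTAL_LANES)} spare")
```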
Moreover, ATI’s specification mentions a special technology called Xpress Route. Its purpose is to optimize the routing of PCI Express lanes inside the chipset so that data travels from one PCI Express x16 slot to the other with minimal latency and at the highest possible rate.

According to ATI, this is particularly important in multi-GPU mode, when the graphics cards actively exchange data with each other, and most of all for pairs of Radeon X1800 GTO, Radeon X1600 or Radeon X1300 cards, which, unlike the Radeon X1900/X1800 CrossFire Edition, have no hardware image-compositing unit and rely solely on the PCI Express bus.
Such configurations are not really in high demand, since a single high-performance graphics card will offer better performance and compatibility, but low data-transfer latencies and high PCI Express bandwidth also matter for today’s fastest CrossFire configuration with two Radeon X1900 XT cards, especially at high resolutions or in extreme full-screen antialiasing modes.
Theoretically, the new CrossFire Xpress 3200 will allow ATI to enable CrossFire mode for any two identical graphics cards across its entire product range, so for such configurations you won’t need to purchase a CrossFire Edition graphics card. However, without the hardware compositing unit of the CrossFire Edition cards, this approach rules out the fastest SuperAA support, which is very important for high-end solutions.
Efficient data transfers between PCI Express slots are only useful in multi-GPU systems so far, but they may find more applications in the future. For example, low data-transfer latencies would benefit a co-processor that handles a game’s physics model. Beyond games, high PCI Express bandwidth may come in handy in video editing, when one GPU drives the OS interface and outputs the resulting image to the screen while the other performs the actual video processing (encoding, applying special effects, and so on).
So, ATI Technologies has created and implemented the CrossFire Xpress 3200 chipset not only as a multi-GPU-compatible gaming platform with maximum performance, but also as a future-proof solution.
Since the new chipset from ATI is targeted at PC enthusiasts, the company took care to provide some overclocking opportunities which will be discussed below.
Many computer enthusiasts run their gaming systems overclocked to achieve the maximum possible performance at any cost. When developing the CrossFire Xpress 3200 chipset, ATI took account of the needs of this group of users and endowed the new system logic with some overclocking potential.
It’s no secret that overclocking puts PC components under stress, and no manufacturer would vouch for their stable operation under such harsh conditions. But what is harsh for other chipsets is quite normal for the ATI CrossFire Xpress 3200, which turned out to be impressively robust.
The RD580 North Bridge can work normally at a voltage of 1.2V and a temperature of 120°C, which would be almost deadly for any other silicon chip. Consequently, the CrossFire Xpress 3200 should overclock very well under normal conditions, especially as it features low power dissipation and optimized HyperTransport and PCI Express controllers. The chip has no internal frequency dividers, so its clock rate rises in step with the HyperTransport reference clock, which is 200MHz by default.
According to ATI Technologies, the HyperTransport bus of the CrossFire Xpress 3200 can be overclocked by 50% or more, from 1GHz to 1.5GHz and higher, and the PCI Express bus remains stable at a frequency 40% above the default. Be aware, however, that actual overclockability depends less on the chipset than on the particular mainboard and CPU. The PCB layout, the quality of components such as resistors and capacitors, and many other factors will all affect your overclocking attempts, for better or worse.
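Because the RD580 has no internal dividers, everything hangs off the 200MHz reference clock. On Athlon 64 platforms the 1GHz HyperTransport link clock is that reference times a link multiplier of 5, so the quoted headroom can be turned into concrete settings. The Python sketch below assumes the multiplier stays at 5 unless stated; the 250MHz x 4 line only illustrates the usual trade-off and is not a setting we tested.

```python
STOCK_REF_MHZ = 200    # HyperTransport reference clock
STOCK_HT_MULT = 5      # 200MHz x 5 = 1GHz link clock
STOCK_PCIE_MHZ = 100   # default PCI Express clock


def ht_link_mhz(ref_mhz, ht_multiplier=STOCK_HT_MULT):
    """HyperTransport link clock = reference clock x link multiplier."""
    return ref_mhz * ht_multiplier


def overclock_pct(actual, stock):
    """Percentage by which a clock exceeds its stock value."""
    return (actual / stock - 1) * 100


stock_link = ht_link_mhz(STOCK_REF_MHZ)
# ATI's quoted headroom: a 1.5GHz link at the stock x5 multiplier needs a 300MHz reference.
print(f"300MHz x 5 = {ht_link_mhz(300)}MHz link, "
      f"+{overclock_pct(ht_link_mhz(300), stock_link):.0f}% over stock")
# Lowering the multiplier keeps the link near stock while the reference clock rises.
print(f"250MHz x 4 = {ht_link_mhz(250, 4)}MHz link")
# ATI's +40% PCI Express claim, expressed as a clock.
print(f"PCI Express at +40%: {STOCK_PCIE_MHZ * 1.4:.0f}MHz")
```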
So the ATI CrossFire Xpress 3200 promises good overclocking, but it’s up to the mainboard maker to deliver on that promise. Even two samples of the same mainboard model may differ in overclockability, so overclocking is something of a lottery. Besides the user’s knowledge and skill, luck has always been a factor in this enterprise, and not the least important of the three!
ATI has been very successful in the chipset field, with tremendous sales growth over the last year. In Q3 2005 the company earned less than $40 million from chipset sales, whereas in the fourth quarter of the same year it raked in as much as $100 million! This number is expected to grow to $120 million or more in Q1 2006. So ATI’s share of the chipset market has grown threefold in the past 12 months. Right now the company produces about 50% of all chipsets for the AMD platform, the most popular one among gamers. In the market as a whole, ATI now holds a bigger share than Nvidia.
This is the result of successful collaboration with major PC OEMs: ATI won contracts to supply chipsets to such companies as Acer, HP, Fujitsu Siemens Computers, Lenovo, Gateway/Emachines, Sony, NEC, Fujitsu, Medion, Packard Bell and others. ATI Technologies was chosen as the main chipset supplier not only because its products span all sectors of this market, including mobile and integrated chipsets, but also because they meet world-class quality and reliability standards. For comparison, Nvidia’s nForce is used in systems from such large manufacturers as HP, Gateway and Dell. Considering that OEMs account for most of the PC market, ATI’s sales of almost 7 million chipsets in Q4 2005 do not look so surprising.
The ASUS A8R32-MVP Deluxe is one of the first mainboards on the CrossFire Xpress 3200. We’ll use it to learn more about the strong and weak points of ATI’s new chipset, so it deserves a close look.
The mainboard box carries the AiLife logo, the same as we saw in our ASUS A8N32-SLI Deluxe review.
We have already praised this design: it is simple and elegant, and the product makes a positive impression from the very start. The flap on the front of the box opens to reveal basic technical information about the mainboard.
Here’s what we found in the box, besides the mainboard itself:
As you can see, the accessories are far from lavish for a Deluxe product (no Wi-Fi module or the like), yet you get everything necessary to use the mainboard. Since the ULi M1575 South Bridge supports only eight USB 2.0 ports and four of them are already on the mainboard’s I/O panel, a single USB 2.0 bracket is quite enough. The remaining onboard USB header is meant either for the wireless network adapter bundled with the Wireless Edition of the mainboard or for the USB ports on the front panel of your system case.
The ASUS A8R32-MVP and the ASUS A8N32-SLI look alike, as all multi-GPU mainboards do: the two full-size PCI Express x16 slots catch your eye at once.
Both these mainboards use two-piece chipsets, too. They do differ a lot otherwise, but the component layout of the A8R32-MVP Deluxe is none the worse for it.
There are 6 expansion slots here: two PCI Express x16, one PCI Express x1 and three PCI. The position of the PCI slots should be noted: even if there are two graphics cards with dual-slot coolers installed in the system, two out of the three PCI slots will still be available to the user (on the ASUS A8N32-SLI Deluxe, a single PCI slot would be available in that case). The mainboard’s single PCI Express x1 slot will also be blocked, but it’s not a big problem as there are virtually no PCI Express peripherals yet and, moreover, the mainboard belongs to the enhanced-functionality Deluxe series.
Despite the mainboard’s obvious enthusiast orientation, its CPU voltage regulator is a three-channel design, whereas the A8N32-SLI Deluxe uses a more complex eight-channel one. ASUS must have found it sufficient given the relatively low power consumption of AMD processors, even of the senior models Athlon 64 X2 4800+ and Athlon 64 FX-60. The power transistors of the voltage regulator are cooled by a simple aluminum heatsink. The regulator uses high-quality capacitors from United Chemi-Con and Sanyo and encapsulated chokes, so we have no complaints about this section of the mainboard.
The CPU socket on the A8R32-MVP is shifted closer to the top edge of the PCB than on the A8N32-SLI, but you may still have trouble installing a cooler much larger than the retention frame, because the North Bridge sits quite close to the frame and carries a rather tall heatsink. So both mainboards share this problem. The base of the chipset heatsink has a special soft frame that protects the North Bridge from chipping.
The power connectors are placed properly, however, and the PSU cables won’t obstruct the CPU cooler. There is a small gap between the first Parallel ATA connector and the 24-pin ATX 2.0 power connector, and the latch of the latter is on the opposite side, so their proximity is not a problem. Next to them there is room for a 4-pin EZ_Plug connector, which is, however, not installed. ASUS’s engineers must have considered it unnecessary, as most other multi-GPU mainboards do well without additional power for the graphics slots. The FDD connector is rotated by 90 degrees. A contemporary top-end computer is likely to have no floppy drive at all, but if you do have one, this orientation makes it easier to route the interface cable inside the system case.
The DIMM slots are placed properly: they are above the first PCI Express x16 slot, so a long graphics card won’t block their latches. The slots are divided into two groups, one per memory channel; install your DIMMs in same-colored slots to enable dual-channel memory access. The PCI Express x16 slots have simple molded latches, unlike the more complex detachable latches on the ASUS A8N32-SLI Deluxe. Between them sits a low-profile pin-fin heatsink that cools the South Bridge; it is too small to interfere with graphics card installation.
There is one visual diagnostics tool on the mainboard – a standby voltage LED indicator. There is a place for a miniature PC speaker and an ITE IT8712F-A chip nearby. The chip is an all-purpose controller that is responsible for the COM, LPT and Game ports, FDD slot and PS/2 connectors, and also performs system monitoring and fan speed management functions (ASUS Q-Fan technology). The following parameters are monitored by the mainboard:
There are four fan connectors on the PCB. The COM and Game port headers are located in the bottom right corner of the PCB, which is not very convenient because their cables may get in the way when you install expansion cards. On the other hand, these legacy ports are unlikely ever to be used in a modern computer. The Clear CMOS jumper is accessible, but not easily if a graphics card with a dual-slot cooler occupies the bottom PCI Express x16 slot.
Stack Cool 2 technology should be mentioned, too: the A8R32-MVP Deluxe has an additional metallized PCB layer that improves heat dissipation from the mainboard’s components.
The PCB design of the ASUS A8R32-MVP Deluxe is overall much better than that of the A8N32-SLI Deluxe model, mostly because they didn’t install that bulky chipset cooling system here. The CrossFire Xpress 3200 features low power consumption and heat dissipation, so it is quite satisfied with simple passive heatsinks. The second important plus of the A8R32-MVP is that you always have at least two PCI slots available, even if graphics cards with massive coolers are installed, and do not face the choice of using either an add-in audio card or a TV-tuner, for example.
It’s quite a daunting task to develop a multi-GPU-compatible mainboard, but ASUS’s engineers are quite successful at it again. They’ve done even better than the last time!
The disk subsystem of the mainboard features six Serial ATA-II ports and two Parallel ATA-133 channels. Four of the SATA connectors and both Parallel ATA connectors are provided by the ULi M1575 South Bridge. The remaining two Serial ATA-II ports come from a very popular Silicon Image SiI3232 controller. One of them is on the mainboard’s I/O panel as part of the SATA on the Go technology: the Silicon Image chip fully complies with the Serial ATA-II specification, which provides for hot-plugging, so you can connect and disconnect an external hard drive without shutting your computer down. You only need to provide a power source for the drive. This external SATA port explains the unusual location of the additional disk controller, on the left next to the mainboard’s I/O connectors.
The Serial ATA-II controller integrated in the South Bridge can work in two modes: AHCI and Parallel ATA emulation. You don’t need to install additional drivers for the latter mode. The drives attached to the South Bridge via the Serial ATA interface can be united into RAID arrays of level 0, 1, 0+1, 5 or JBOD.
The additional Silicon Image controller supports RAID 0 and 1.
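To put the supported RAID levels in perspective, the Python sketch below computes the usable capacity of an array built from identical drives. It uses the textbook definitions of each level and assumes equal-size disks; real controllers may reserve a little space for metadata, and the 250GB drive size in the example is arbitrary.

```python
def usable_capacity_gb(level, drives, drive_gb):
    """Usable capacity of an array of `drives` identical disks (simplified model)."""
    rules = {
        # level: (minimum drives, usable capacity)
        "0": (2, drives * drive_gb),           # striping, no redundancy
        "1": (2, drive_gb),                    # mirroring: one disk's worth
        "0+1": (4, drives // 2 * drive_gb),    # mirrored stripe sets: half the total
        "5": (3, (drives - 1) * drive_gb),     # one disk's worth goes to parity
        "JBOD": (1, drives * drive_gb),        # plain concatenation
    }
    if level not in rules:
        raise ValueError(f"unsupported RAID level {level!r}")
    minimum, usable = rules[level]
    if drives < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives, got {drives}")
    if level == "0+1" and drives % 2:
        raise ValueError("RAID 0+1 needs an even number of drives")
    return usable


# Example: four 250GB disks on the South Bridge's four Serial ATA ports.
for level in ["0", "0+1", "5", "JBOD"]:
    print(f"RAID {level}: {usable_capacity_gb(level, 4, 250)} GB usable")
# The two-port Silicon Image controller can only build RAID 0 or 1 from two disks.
print(f"RAID 1 (two disks): {usable_capacity_gb('1', 2, 250)} GB usable")
```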
As we said above, the disk connectors are all placed in quite a convenient way, so you shouldn’t have troubles assembling the system.
The mainboard’s networking section consists of two Marvell Yukon chips, the 88E8001 and the 88E8053. Both are single-chip Gigabit Ethernet controllers that support cable diagnostics (AI NET2); they differ in bus type, 32-bit PCI and PCI Express x1, respectively. Hardware security features like Nvidia’s Active Armor firewall are not supported. The networking capabilities of the ULi M1575 South Bridge, which has an integrated media access controller, are not used on this mainboard because it supports a maximum speed of only 100Mbps.
Unlike the nForce4 SLI X16, the CrossFire Xpress 3200 supports the High-Definition Audio standard. The audio section of the ASUS A8R32-MVP Deluxe is built around a Realtek ALC882 HD Audio codec that supports multiple streaming: the codec includes five dual-channel DACs and can play different audio content on a 7.1 speaker set and on a stereo system simultaneously. The maximum playback resolution is 24bit/192kHz for both the analog and S/PDIF outputs; recording resolution is limited to 20 bits. A number of functions are supported to improve microphone recording quality. All the codec’s jacks can be reassigned and support automatic device detection. The AAFP header near the chip is meant for the front audio panel.
The chip specification declares a signal-to-noise ratio of 103dB for all the DACs and 90dB for the ADCs of the codec. Of course, these are theoretical numbers because the real parameters of the codec are going to be worsened by electromagnetic interference from the other mainboard components. Still, this should be enough for many users, while true audiophiles will surely buy a discrete audio card for their PCs.
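One way to put the 103dB and 90dB figures in perspective is the textbook relation for an ideal N-bit quantizer driven by a full-scale sine wave, SNR ≈ 6.02·N + 1.76 dB. Inverting it gives an effective resolution in bits. The Python sketch below applies it to the datasheet numbers; strictly, the formula applies to SINAD (which also counts distortion), so treating the quoted SNR this way is our simplification.

```python
def effective_bits(snr_db):
    """Invert SNR = 6.02 * N + 1.76 dB for an ideal N-bit quantizer."""
    return (snr_db - 1.76) / 6.02


def amplitude_ratio(snr_db):
    """Ratio of full-scale signal amplitude to noise amplitude for a dB figure."""
    return 10 ** (snr_db / 20)


for label, snr_db, nominal_bits in [("DAC (playback)", 103, 24), ("ADC (recording)", 90, 20)]:
    print(f"{label}: {snr_db}dB ~ {effective_bits(snr_db):.1f} effective bits "
          f"(nominal {nominal_bits}), noise ~1/{amplitude_ratio(snr_db):,.0f} of full scale")
```

Even on paper, then, the codec’s noise floor lies well above the least significant bit of its nominal 24-bit playback and 20-bit recording formats.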
The IEEE 1394a (FireWire) interface is provided by a Texas Instruments TSB43AB22A chip. Both the ports the chip supports are available as onboard connectors that are to be connected to the appropriate back-panel bracket. The USB 2.0 interface is implemented using the capabilities of the ULi M1575 South Bridge; eight ports in total are available (seven in the Wireless Edition of the mainboard).
The back panel of the mainboard is the same as the A8N32-SLI Deluxe’s: two PS/2 ports, optical and coaxial S/PDIF connectors, a Serial ATA-II connector (SATA on the Go), an LPT port, 6 audio sockets, two Ethernet ports with LED indicators, and four USB ports.
ASUS still uses AMIBIOS code, so the BIOS Setup interface on ASUS mainboards differs from that of many other manufacturers, which use Phoenix-Award BIOS.
There’s a BIOS screen that tells you detailed info on the processor installed:
Like the A8N32-SLI Deluxe, this mainboard offers you control over all the main parameters:
The CPU voltage range is perhaps not too big and may limit your overclocking attempts, but you are likely to be satisfied with the available options unless you are into extreme overclocking experiments.
You can choose among the following memory frequencies: 100, 133, 166, 183, 200, 216, 233 and 250MHz. All the memory timings typical of AMD processors can be adjusted. Inexperienced users can stay in Auto mode, where all settings are configured automatically, though the system may then not deliver its highest possible performance.
The mainboard’s BIOS supports a number of ASUS-exclusive technologies, some of which are very helpful indeed. For example, CrashFree BIOS 2 lets you restore corrupted BIOS code from a special recovery disc, and the CPU Parameter Recall feature saves you from having to clear the CMOS when you have overclocked the system so far that it cannot boot.
AI NOS and PEG Link Mode technologies are meant for dynamic overclocking: the former controls the CPU clock-generator frequency, and the latter automatically overclocks the PCI Express bus depending on the needs of the graphics subsystem.
AI Quiet adjusts fan speeds depending on temperature, and AI NET2 helps locate faults in network cables; this technology is supported by both network controllers on the A8R32-MVP Deluxe.
We took an AMD Athlon 64 X2 4800+ processor to check the overclocking potential of the mainboard and the new chipset. This sample had been stable at 2.7GHz on an nForce4 SLI platform (with a clock-generator frequency of 225MHz).
With our sample of the A8R32-MVP Deluxe we passed the Super PI test at a CPU clock rate of 2765MHz (set as 230MHz x 12) by raising the core voltage by 0.1V and using BIOS version 0310. We couldn’t get any further: the system was unstable at higher clock-generator frequencies and would hang even with the CPU voltage raised by 0.2V above the default. Raising the HyperTransport and PCI Express bus voltages didn’t help either.
So, we suspect we reached the overclockability peak of our sample of Athlon 64 X2. The ASUS A8R32-MVP Deluxe has a certain advantage over the ASUS A8N32-SLI Deluxe when it comes to overclocking, but it is indeed very small. The mainboard will perhaps do better with some other processor.
Then we tried overclocking with a reduced CPU frequency multiplier and got an impressive result: a clock-generator frequency of 323MHz at a 2T Command Rate. Finally, we attempted to overclock the PCI Express bus and managed to raise its frequency to 130MHz at 1.3V, that is, by 30%. This is somewhat short of the 40% promised by ATI Technologies, yet we have to agree that the CrossFire Xpress 3200 really does overclock well.
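The results above are easier to compare as percentages over stock. The Python sketch below does the arithmetic, assuming the Athlon 64 X2 4800+’s stock 12x multiplier and the 200MHz and 100MHz defaults for the reference and PCI Express clocks. Note that 230MHz x 12 gives a nominal 2760MHz; the 2765MHz quoted above is presumably the measured clock, which can run slightly above the nominal setting.

```python
STOCK_REF_MHZ = 200
CPU_MULTIPLIER = 12      # Athlon 64 X2 4800+: 12 x 200MHz = 2.4GHz
STOCK_PCIE_MHZ = 100


def pct_over(actual, stock):
    """Percentage by which a clock exceeds its stock value."""
    return (actual / stock - 1) * 100


results = {
    "CPU, 225MHz x 12 (nForce4 SLI)": (225 * CPU_MULTIPLIER, STOCK_REF_MHZ * CPU_MULTIPLIER),
    "CPU, 230MHz x 12 (A8R32-MVP)": (230 * CPU_MULTIPLIER, STOCK_REF_MHZ * CPU_MULTIPLIER),
    "Reference clock, reduced multiplier": (323, STOCK_REF_MHZ),
    "PCI Express clock": (130, STOCK_PCIE_MHZ),
}

for label, (actual, stock) in results.items():
    print(f"{label}: {actual}MHz vs {stock}MHz stock, +{pct_over(actual, stock):.1f}%")
```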
Our goal was to check the capabilities of the ATI CrossFire Xpress 3200 chipset, so we decided to compare it with a platform on ATI Radeon Xpress 200 CrossFire Edition as well as with a platform on Nvidia nForce4 SLI X16. Thus, there were three mainboards:
The rest of the components were the same for all the platforms:
We used two Radeon X1900 XT CrossFire and two GeForce 7800 GTX graphics cards on ATI’s and Nvidia’s platforms, respectively, to build the corresponding multi-GPU subsystems.
Before testing the platforms in games, we carried out a small theoretical examination with the following programs:
The three platforms were all equipped with a single Radeon X1900 XTX for these tests.
These games and applications were used as performance benchmarks:
First-Person 3D Shooters
Third-Person 3D Shooters
Simulators
Strategies
Semi-synthetic benchmarks
Synthetic benchmarks
To avoid CPU-imposed limitations, we didn’t test 1024x768 resolution, but tested the popular 1280x1024 mode, which is the native resolution of most 17” and 19” LCD monitors today, and the resource-consuming resolution of 1600x1200, which is often selected by owners of large monitors with a diagonal of over 19”.
We select the highest graphics quality settings in each game, the same for ATI’s and Nvidia’s solutions, except for the flight sim Pacific Fighters, which requires vertex texturing support to enable its Shader Model 3.0 mode. The Radeon X1000 family doesn’t support this feature, so it runs the game in Shader Model 2.0 mode. We do not edit the games’ configuration files. If possible, we use the games’ built-in benchmarking tools; if not, we measure the frame rate with the FRAPS utility. We measure minimum as well as average fps whenever possible.
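Minimum and average frame rates can both be derived from a log of frame timestamps. The Python sketch below assumes one simple convention (average = frames per elapsed second; minimum = the fewest frames rendered in any complete one-second interval from the start of the run). It is not FRAPS’s own code, and other tools define the minimum differently.

```python
def fps_stats(timestamps_ms):
    """Return (minimum, average) fps from per-frame presentation timestamps in ms."""
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two frames")
    start = timestamps_ms[0]
    elapsed_s = (timestamps_ms[-1] - start) / 1000
    if elapsed_s <= 0:
        raise ValueError("timestamps must span a positive interval")
    average = (len(timestamps_ms) - 1) / elapsed_s   # frame intervals per second
    full_seconds = int(elapsed_s)
    counts = [0] * full_seconds
    for t in timestamps_ms:
        second = int((t - start) // 1000)
        if second < full_seconds:                    # ignore the trailing partial second
            counts[second] += 1
    minimum = min(counts) if counts else average
    return minimum, average


# Synthetic run: one second at 60 fps, one second at 30 fps, then a final frame.
frames = [i * 1000 / 60 for i in range(60)] + [1000 + i * 1000 / 30 for i in range(30)] + [2000]
print("min %.0f fps, average %.1f fps" % fps_stats(frames))
```

The average hides the slow second entirely unless the minimum is reported alongside it, which is why both numbers are worth measuring.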
We didn’t test the systems in the “pure speed” mode, but added two new modes to our traditional “eye candy” settings (4x FSAA + 16x anisotropic filtering): Super AA/SLI AA 8x + AF 16x and Super AA 14x/SLI AA 16x + AF 16x. We turned on the 4x FSAA + 16x AF mode from the game’s own menu if it was possible. Otherwise, we forced the necessary mode from the ForceWare driver as we also did for the higher levels of full-screen antialiasing.
The drivers were set up as usual.
ATI Catalyst :
Nvidia ForceWare:

The three platforms all have almost the same result, which is not surprising at all. After all, they differ in the mainboard model alone and are absolutely identical otherwise. The architectural features of the chipsets don’t play a big role here. To be exact, the nForce4 SLI X16 platform is ahead of the CrossFire Xpress 3200, but the gap of 39 points is within the measurement error range.

This test measures CPU performance, so its results are similar on all the platforms, which all used the same AMD Athlon 64 X2 4800+ processor; the ATI Radeon Xpress 200 CrossFire Edition (RD480) has a very small advantage. The integrated memory controller of AMD’s processors has one positive effect here: CPU performance doesn’t depend on the chipset, which in fact serves merely as a HyperTransport tunnel.

The memory subsystem test is, somewhat surprisingly, won by the CrossFire Xpress 3200 platform. The memory worked with the same timings on all the platforms, so we can only guess that a better BIOS, optimized for the most efficient operation of the Athlon 64 memory controller, explains this.

The nForce4 SLI X16 has the lowest result in this old version of 3DMark. The highest result comes from the ATI RD480-based mainboard rather than from the newer RD580-based one. This test can hardly be called representative today, as it doesn’t use any modern technologies and puts only a light load on the graphics subsystem.

It’s the same in 3DMark05, but the gaps between the platforms are narrower. The RD480 wins again, with the RD580 a mere 58 points behind. We didn’t run the CPU test from 3DMark05 because it was unstable with the dual-core CPU and produced unrepeatable results.


Unlike 3DMark 2001 and 3DMark05, the new 3DMark06 paints quite a different picture: both platforms based on ATI’s chipsets are far ahead of the Nvidia nForce4 SLI X16. We used only one graphics card in the theoretical tests, so the lower results of the Nvidia platform make us suspect that ATI’s chipsets have a more efficient PCI Express controller.

The speed of WinRAR greatly depends on the hard disk drive controller. The Nvidia nForce4 SLI X16 proves to have the most efficient one and wins the test. The two mainboards based on ATI Technologies chipsets have similar results, yet the newer chipset is a little slower. So the Serial ATA-II controller in the ULi M1575 South Bridge is probably less efficient than the one in the ATI SB450 chip, although the latter’s controller supports only the first version of the Serial ATA interface and is in fact a version of the popular Silicon Image SiI3112.
As you probably remember, this test sends a certain amount of data from system memory to graphics memory and back again, with the size of the transferred block varying from 64KB to 4MB. The utility thus loads the memory controller, the graphics bus and the CPU-chipset link.
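The structure of such a test, a sweep over block sizes that double from 64KB to 4MB with each timing converted into throughput, can be sketched as follows. This is a minimal illustration under stated assumptions, not the utility’s actual code: `transfer` is a hypothetical stand-in for one copy between system and graphics memory, and the repeat count and MB definition (2^20 bytes) are our choices.

```python
from __future__ import annotations

import time
from typing import Callable

KB = 1024
MB = 1024 * 1024


def block_sizes(min_kb: int = 64, max_kb: int = 4096) -> list[int]:
    """Block sizes in bytes, doubling from min_kb up to max_kb inclusive."""
    sizes = []
    size = min_kb * KB
    while size <= max_kb * KB:
        sizes.append(size)
        size *= 2
    return sizes


def throughput_mb_s(bytes_moved: int, seconds: float) -> float:
    """Convert a timed transfer into MB/s (1 MB = 2**20 bytes here)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return bytes_moved / seconds / MB


def sweep(transfer: Callable[[int], object], repeats: int = 100) -> dict[int, float]:
    """Time `repeats` transfers per block size and return {size: MB/s}.

    `transfer(n)` is a hypothetical stand-in for one n-byte copy between
    system memory and graphics memory.
    """
    results = {}
    for size in block_sizes():
        start = time.perf_counter()
        for _ in range(repeats):
            transfer(size)
        elapsed = time.perf_counter() - start
        results[size] = throughput_mb_s(size * repeats, elapsed)
    return results
```

Calling `sweep(lambda n: bytes(n), repeats=10)` runs end to end, but that dummy callable only allocates host memory and says nothing about the graphics bus; a real measurement needs a transfer routine that actually crosses PCI Express.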


We’ve got some curious results here. The RD480-based platform is quite confidently ahead when transferring data from system memory into the graphics card’s memory, irrespective of the data block size. We repeated the test with a Radeon X1900 XTX down-clocked to 500/1000MHz and with a Radeon X1800 XL and found that the graphics memory frequency was not a limiting factor. The CrossFire Xpress 3200 falls behind for some other reason.
It’s even more interesting with the reverse transfer: the new ATI chipset is much better than the older one, but only with a Radeon X1900, irrespective of the latter’s clock rates. When we installed a Radeon X1800 XL, the results immediately dropped to the level of the Radeon Xpress 200 CrossFire Edition!
Unfortunately, we can’t really explain this, so we verified the results with a similar utility, Serious Magic Texture Download Benchmark.
This test outputs a simple black-and-white image on the screen and can work in two modes: 1) render and display, and 2) render and download. The second mode helps measure the speed of texture transfers into the graphics memory and back.

In “render and display” mode we can see that the GeForce 7800 GTX outputs the picture to the screen faster than the Radeon X1800/X1900 cards do, and that is the only point to be made here.


The “render and download” mode yields much more valuable information. The CrossFire Xpress 3200 is not the best performer with the GeForce 7800 GTX: the single-chip nForce4 SLI proves to be faster. Performance improves when we install the Radeon X1900 XTX, but not to the level of ATI’s older chipset. Curiously, we observe the same performance slump with the Radeon X1800 XL here. Most likely the R580’s larger number of temporary registers compared with the R520 helps it quite noticeably, since the Radeon X1900 XTX does better than the Radeon X1800 XL even at reduced frequencies.
Unfortunately, both tests measure only the peak performance of the PCI Express bus, which is not always representative of the hardware’s true performance in real games and applications. Moreover, real applications may require simultaneous data transfers to and from system memory.
As there is as yet no reliable method of measuring PCI Express bandwidth when transferring data from one graphics card to another, we checked the efficiency of the CrossFire Xpress 3200 with two Radeon X1900 XT CrossFire Edition graphics cards installed in the gaming tests only.



The CrossFire Xpress 3200 shows no advantage over the earlier chipset despite its improved architecture and doubled number of PCI Express lanes. This is not surprising, however, considering the hardware image-compositing mechanism of the Radeon X1900 XT CrossFire configuration.
ATI’s multi-GPU technology as implemented in its senior Radeons seems to depend on PCI Express bandwidth less than Nvidia’s SLI does. Note also that the Radeon X1900 XT CrossFire configuration is fast enough for playing at 1280x1024 with 14x Super AA enabled.



The newer chipset brings a performance gain of 2-3fps in Doom 3, but only in the 4x FSAA mode. Otherwise, the two platforms from ATI Technologies have identical results. Owners of a Radeon X1900 CrossFire system can use 8x Super AA here, but not 14x Super AA, because of its low average frame rate.



The CrossFire-linked Radeon X1900 XT cards ensure a comfortable frame rate at any Super AA level. The advantage of the CrossFire Xpress 3200 over the Radeon Xpress 200 CrossFire Edition is negligible and only shows up at 1600x1200 with Super AA enabled.



It’s quite different on the Research map: the new platform from ATI is considerably slower than the older one at 4x full-screen antialiasing, when each graphics card renders its own frames (AFR mode). There’s nothing like that in the Super AA modes, where the CrossFire Xpress 3200, on the contrary, provides a certain performance bonus, obviously due to the increased bandwidth of the graphics PCI Express slots.


The load on the PCI Express bus is higher in HDR mode, so the two complete PCI Express x16 slots of the CrossFire Xpress 3200 chipset provide a performance gain at 4x FSAA as well as at 8x Super AA. In the latter case 1600x1200 is not quite playable, but at 1280x1024 you can enjoy the superb image quality without any slowdowns.


The same is true for the Research map: 1600x1200 resolution with 8x Super AA is beyond the capabilities even of such a powerful graphics subsystem as Radeon X1900 XT CrossFire.



F.E.A.R. is such a heavy application that the new chipset from ATI can show its advantages here. The performance gain is never bigger than 3fps, though. Moreover, the results for the 8x and 14x Super AA modes are of purely theoretical interest, with the minimum speed below 25fps and the average speed falling short of the required 60fps.



The CrossFire Xpress 3200 shows its very best in Half-Life 2, with its heavy load of high-resolution textures. It is about 30% faster than the Radeon Xpress 200 CrossFire Edition in the 4x FSAA + 16x AF mode. The gap shrinks to 10-15% in the 8x Super AA mode and then vanishes completely at the higher antialiasing level, because factors other than PCI Express bandwidth begin to limit system performance.



It’s only in a few isolated cases that we see an advantage of the new ATI chipset in Half-Life 2: Lost Coast: at 1600x1200 in the “eye candy” mode and at 14x Super AA. Featuring HDR and sophisticated pixel shaders, this tech demo is very resource-consuming. Turning on Super AA pulls performance below a comfortable level.



We again see the CrossFire Xpress 3200 falling a little behind in the simplest test mode. This may be a result of our using the FRAPS utility here, as this method has an unavoidable inaccuracy. In the other test modes the two ATI platforms have the same speed, which never drops below 50fps even with 14x Super AA enabled.



It’s only in the 14x Super AA mode that the CrossFire Xpress 3200 has any advantage over the older chipset, and this advantage amounts to just 5%. This is to be expected, since the game uses rather simple textures. Coupled with the Radeon X1900 XT CrossFire, both chipsets from ATI easily deliver a frame rate of over 90fps even in the hardest mode.



Unlike in our comparison of the nForce4 SLI and nForce4 SLI X16 chipsets, the newer North Bridge from ATI is better than the RD480. Still, the advantage is negligible, even though Serious Sam 2 puts a heavy load on texturing. Perhaps even the combined power of two CrossFire-linked Radeon X1900 XT cards is not enough for the higher PCI Express bandwidth to have a bigger impact on system performance.
The game is playable in the “eye candy” mode only, and even there the speed may drop to 10fps or lower. It is a really demanding application, and it also has some problems with graphics cards based on ATI Technologies GPUs.



As with the nForce4 SLI X16, having the graphics subsystem connected via 32 PCI Express lanes leads to a colossal performance boost, up to 60% in the 14x Super AA mode, even though the CrossFire subsystem has a hardware image-compositing unit. The two Radeon X1900 XT cards on the CrossFire Xpress 3200 beat the SLI pair of GeForce 7800 GTX cards, although they used to be slower in this game on the older ATI chipset. This seems to be the most vivid illustration of how much better the RD580 can be than the RD480, although the game itself is undemanding for modern graphics hardware and doesn’t even use pixel shaders.
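To put “32 PCI Express lanes” in perspective, the sketch below works out theoretical peak bandwidth per direction for first-generation PCI Express, which signals at 2.5 GT/s per lane with 8b/10b encoding, giving 250 MB/s per lane per direction. These are textbook peak figures, not measurements from our testbed; real throughput is lower because of packet and protocol overhead. The slot configurations come from the text above, while the helper names are ours.

```python
from __future__ import annotations

# Theoretical peak bandwidth of first-generation PCI Express links, per direction.
# Only the 8b/10b line code is accounted for; packet/protocol overhead is ignored.

TRANSFERS_PER_S_PER_LANE = 2_500_000_000  # PCIe 1.x: 2.5 GT/s per lane


def lane_mb_s() -> float:
    """Usable per-lane bandwidth in decimal MB/s, one direction."""
    data_bits_per_s = TRANSFERS_PER_S_PER_LANE * 8 // 10  # 8b/10b: 8 data bits per 10 sent
    return data_bits_per_s / 8 / 1_000_000  # bits -> bytes -> MB/s


def total_mb_s(slot_lanes: list[int]) -> float:
    """Aggregate per-direction peak across all graphics slots."""
    return sum(lanes * lane_mb_s() for lanes in slot_lanes)


CONFIGS = {
    "single x16": [16],
    "x8 + x8 (first-generation multi-GPU chipsets)": [8, 8],
    "x16 + x16 (CrossFire Xpress 3200, nForce4 SLI X16)": [16, 16],
}

if __name__ == "__main__":
    for name, lanes in CONFIGS.items():
        print(f"{name}: {total_mb_s(lanes):.0f} MB/s per direction")
```

Running it prints 4000 MB/s for both the single x16 and the x8 + x8 layouts and 8000 MB/s for two full x16 slots: splitting 16 lanes between two cards halves each card’s link, whereas 32 lanes give every card a full 4000 MB/s per direction.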



The game runs somewhat faster on the ATI CrossFire Xpress 3200 platform. The two complete PCI Express x16 slots provide a speed bonus of 1 to 4 fps while the average speed varies from 135fps in the “eye candy” mode to 50fps at 14x Super AA.



This is a rather simple application, so the new chipset has nothing to show you here. The CrossFire configuration with two Radeon X1900 XT cards makes all the antialiasing modes playable, including the highest-quality 14x Super AA.



CrossFire technology, at least in its top-end incarnation, is less dependent on PCI Express bandwidth than Nvidia’s SLI, yet the CrossFire Xpress 3200 makes the Radeon X1900 XT CrossFire run this game faster than the RD480 chipset does. By the way, we saw no difference between the nForce4 SLI and nForce4 SLI X16 chipsets in this test.

The game worked correctly with extreme full-screen antialiasing modes enabled, but the results varied so greatly from one test pass to another that we do not publish the 8x and 14x Super AA results, as they are not representative.
However, the CrossFire Xpress 3200 ensures a certain advantage over the Radeon Xpress 200 CrossFire Edition using the ordinary 4x FSAA. The performance of the ATI solution is lower than that of the SLI configuration with two GeForce 7800 GTX.



The strong points of the new chipset are better seen in Warhammer 40000: Dawn of War. It is clear ATI’s approach to building a chipset with two complete PCI Express x16 slots is more productive than Nvidia’s implementation of the same idea.
The nForce4 SLI X16 cannot show its full potential because its graphics slots are connected to different chips, so data is transferred between the graphics cards less efficiently.



The RD580-based platform is somewhat slower than the RD480-based one in the “eye candy” mode. The newer chipset overtakes the older as we enable higher antialiasing levels, yet the CrossFire Xpress 3200 never has a really big advantage in this test.



It’s only in the Super AA modes that we can observe a positive effect from the additional PCI Express lanes of the new chipset, and even then the effect is very small.



The mentioned effect is somewhat more conspicuous in the second test, even though it renders a smaller-scale scene.



Although the third test differs dramatically from the second one, it produces the same results: the CrossFire Xpress 3200 platform is always ahead of the Radeon Xpress 200 CrossFire Edition by 1-2fps.


3DMark06 agrees with 3DMark05: the CrossFire Xpress 3200 enjoys a slender lead as early as the first SM2.0 graphics test.


The second test is less sensitive to the number of PCI Express lanes per graphics slot: the new platform offers a minor speed bonus in the Super AA modes only.


Larger amounts of data are transferred on the PCI Express bus in the first SM3.0/HDR graphics test than in the SM2.0 tests, so the CrossFire Xpress 3200 looks even better than the older chipset, although the gap is still a mere 1-2fps.


The same can be seen in the second SM3.0/HDR test, except at 1280x1024 in the 4x FSAA + 16x AF mode, where the results of the two platforms coincide to within 0.1fps.
So, the 3DMark suites have shown that it is really useful to have two complete PCI Express x16 slots in the system and that the ATI CrossFire Xpress 3200 is really faster than the Radeon Xpress 200 CrossFire Edition, but its advantage is very small, just like in a majority of real games.
So, is the new CrossFire Xpress 3200 platform from ATI a worthy reply to Nvidia’s nForce4 SLI X16? Having done our tests, we can answer this question in the affirmative. Nvidia’s long-time rival, Canada’s ATI Technologies, has developed a chipset that is not only comparable to the nForce4 SLI X16 in capabilities but also far superior in concept.
Nvidia implemented two complete PCI Express x16 slots by combining two chips, each with its own PCI Express controller. ATI, for its part, equipped its new North Bridge with a controller that can serve both slots at once. As a result, the data path between the two graphics cards is shorter and data transfer latencies are minimized, which is very important for achieving maximum performance in multi-GPU mode when the cards are actively exchanging data.
Unfortunately, we couldn’t find out whether the new platform from ATI is more efficient than the nForce4 SLI X16 in gaming applications. A correct comparison requires the same graphics cards on both platforms, which is currently impossible because the nForce4 SLI X16 supports only Nvidia’s graphics cards in multi-GPU mode. ATI’s driver blocks CrossFire on third-party chipsets (except Intel’s), and Nvidia’s driver doesn’t allow enabling SLI on systems built around anything other than nForce4 SLI chipsets.
On the other hand, our comparison of the CrossFire Xpress 3200 with the Radeon Xpress 200 CrossFire Edition has shown that two complete PCI Express x16 slots can really bring some benefit in games, but not much. The speed gain is highest in games that use high-resolution textures and do not depend on pixel processor performance, just as with the nForce4 SLI X16. Half-Life 2 and Unreal Tournament 2004 are examples of such applications. Pixel shader-heavy games do not run much faster: the performance gain is usually just a few percent.
Besides its efficient architecture, the ATI CrossFire Xpress 3200 has a number of other merits that make it the most advanced multi-GPU platform available today. It is simple, consisting of only 22 million transistors, and reliable, remaining stable at increased voltage and temperature, which also makes it suitable for overclocking. It consumes a mere 8 watts of power and can work with almost any South Bridge that has a PCI Express x4 interface. Considering ATI Technologies’ largely OEM-oriented business model, we can hardly be wrong in predicting a very bright future for the CrossFire Xpress 3200 platform.
As for the particular product on the new chipset, the ASUS A8R32-MVP Deluxe mainboard is superbly designed and is almost free from drawbacks, yet we think a Deluxe mainboard might come with more and better accessories.
Highs:
Lows: