Workstation Processors Duel: AMD Opteron against Intel Xeon

Today we are going to compare the performance of dual-processor workstations built on single-core and dual-core processors from the AMD Opteron and Intel Xeon families.

by Ilya Gavrichenkov
12/21/2005 | 08:44 PM

The CPU market changed greatly over the course of 2005, and Intel’s position is no longer as dominant in every market sector as it used to be. The microprocessor giant still has a global advantage over AMD in the total number of units shipped, but AMD has been pressing its rival in the retail sector, since individual buyers react more quickly to changes in the market. AMD is currently the technological leader and has been for quite a while: it was the first to propose and implement 64-bit extensions to the classic x86 architecture, and in 2005 it became the first x86 CPU manufacturer to start shipping dual-core processors.

We’ve always kept an eye on the competition in the desktop CPU market, and our numerous test sessions have shown that AMD’s dual-core CPUs are more appealing than Intel’s for a number of reasons. Today we turn to another hotly contested area between Intel and AMD: processors for high-performance workstations.

AMD’s share has grown considerably in this market sector, too, as customers have come to appreciate the strengths of the Opteron that we discussed in our earlier reviews. Even so, AMD’s CPUs still account for only about 2% of the performance workstation market, although that share has almost doubled in the past nine months. Intel’s CPUs hold the biggest share, 93%, partly because of market inertia. The remaining 5% belongs to the suppliers of non-x86 CPUs: Sun, HP and IBM. The x86 architecture is steadily displacing its competitors, though.

In this review we are going to pit the main CPU models for dual-processor workstations with the most popular architecture, x86, against each other: AMD Opteron versus Intel Xeon. The competition between these CPUs may become even tougher now that both AMD and Intel offer dual-core versions. The situation is spiced up further by the fact that Intel seems to have lost the previous round, fought by single-core models: the NetBurst-based Xeon proved slower than the Opteron in most applications and was also less economical in terms of power consumption and heat dissipation.

The next round may bring some dramatic changes, because instead of pushing up the clock rates of their processors, the manufacturers have turned to parallelism, giving their products the ability to distribute the load among several execution cores. Unlike the rival AMD Opteron, the dual-core Intel Xeon also provides two virtual cores in each physical one thanks to the well-known Hyper-Threading technology. So while one dual-core Opteron can only work with two execution threads, the Xeon can process four threads simultaneously. Of course, only two of these threads are processed at maximum efficiency, because the two virtual cores of each physical core share its execution resources.

Well, let’s judge Intel’s and AMD’s single- and dual-core processors for performance workstations by their benchmark results. The next section introduces the products taking part in today’s tests.

Processors

AMD Opteron 254

The Opteron 254 is the top model in the single-core CPU series meant for dual-processor systems; AMD started shipping it only recently. The Opteron 254 is based on the 90nm Troy core, which is essentially the counterpart of the San Diego core used in the Athlon 64 FX and Athlon 64. In other words, the Troy is also an E4-stepping core, supports SSE3 and features a 1MB L2 cache.

The Opteron 254 processors we got for our tests were marked OSA254FAA5BL, and this is actually the only existing version of the Opteron 254. With its very high clock rate of 2.8GHz, this processor is not available in a reduced-heat-dissipation version. The TDP of the Opteron 254 is 92.6W, so it can be used in the same mainboards as the other CPUs of the Opteron family.

This relatively low heat dissipation is one of the differences from the Athlon 64 FX-57, which also runs at 2.8GHz and has a 1MB L2 cache but has a TDP of 104W (about 12% higher). This implies that AMD has to select the semiconductor dies for the Opteron 254 more carefully.
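The 12% figure is simply the ratio of the two TDP values quoted above. A trivial check, with a helper name of our own choosing:

```python
def percent_higher(value: float, baseline: float) -> float:
    """Return how much larger `value` is than `baseline`, in percent."""
    return (value / baseline - 1.0) * 100.0

# TDP figures quoted in the text, in watts
FX57_TDP_W = 104.0
OPTERON_254_TDP_W = 92.6

diff = percent_higher(FX57_TDP_W, OPTERON_254_TDP_W)
print(f"Athlon 64 FX-57 TDP is {diff:.1f}% higher than the Opteron 254's")
```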

The other differences between the Opteron 254 and the Athlon 64 FX-57 are in fact the generic differences between the two series. The Opteron 254 is meant for dual-processor systems, installs into Socket 940, has an integrated dual-channel memory controller that requires registered DDR SDRAM modules, and features three HyperTransport buses (rather than one) to connect to the chipset and to the second CPU.

The formal specification of the processor is listed below:

AMD Opteron 254

Marking: OSA254FAA5BL
Clock frequency: 2.8GHz
Packaging type: 940-pin organic micro-PGA
L2 cache: 1024KB
Memory controller: 128-bit, dual-channel
Supported memory types: Registered DDR400/DDR333/DDR266 SDRAM
HyperTransport bus: 1GHz, 3 buses
Core stepping: E4
Manufacturing technology: 90nm, SOI
Typical heat dissipation: 92.6W
Maximum package temperature: 49-67°C
Vcore: 1.35/1.4V
AMD64 technology support: Yes
NX-bit support: Yes
Heat dissipation and power management: PowerNow!

This is the information the diagnostic CPU-Z utility reported about our sample of AMD Opteron 254:

We built a classic dual-processor system from two such CPUs. The Opteron 254 is neither physically nor virtually a dual-core processor, so the OS identified the system with two such CPUs as having two logical processors.

AMD Opteron 275

AMD’s lineup for dual-processor workstations also includes dual-core CPUs. Quite a few such processors are on the market today, with clock rates ranging from 1.6 to 2.4GHz. We were lucky to get the 2.2GHz model, the Opteron 275. It is not the top model in the series, but it should tell us enough about the capabilities of dual-processor systems based on dual-core CPUs.

However, we have to point out right away that the CPUs we actually tested were not Opteron 275s but Opteron 875s from the more heavy-weight 800 series. This makes no difference to today’s experiment: although the Opteron 875 is targeted at multi-processor configurations, in a dual-processor system it performs exactly like the Opteron 275.

Dual-core Opteron processors are based on the Egypt core (800 series) or the Italy core (200 series), depending on the type of system the CPU is intended for.

The Opteron 875 CPUs that we tested were marked OSA875FKA6BS. This means their core is the E1 stepping, similar to the Toledo core of the desktop Athlon 64 X2. The architecture of the dual-core Opteron is the same as that of the Athlon 64 X2: both have two cores with two separate L2 caches that share a single integrated memory controller and a single HyperTransport bus controller.

All dual-core Opterons are manufactured on a 90nm process, have two 1MB L2 caches and support the SSE3 instruction set. The Opteron 275 (and 875) runs at 2.2GHz and thus resembles the Athlon 64 X2 4400+. Unlike the desktop processor, however, the Opteron installs into Socket 940, has three HyperTransport buses, and its integrated memory controller supports registered DDR SDRAM.

The specification of the Opteron 875 processor we used in our tests is listed below:

AMD Opteron 875

Marking: OSA875FKA6BS
Clock frequency: 2.2GHz
Packaging type: 940-pin organic micro-PGA
L2 cache: 2x1024KB
Memory controller: 128-bit, dual-channel
Supported memory types: Registered DDR400/DDR333/DDR266 SDRAM
HyperTransport bus: 1GHz, 3 buses
Core stepping: E1
Manufacturing technology: 90nm, SOI
Typical heat dissipation: 95W
Maximum package temperature: 67°C
Vcore: 1.35V
AMD64 technology support: Yes
NX-bit support: Yes
Heat dissipation and power management: PowerNow!

Like the Opteron 254, the Opteron 875 dissipates less heat than its desktop counterpart, which allows dual-core Opterons to be used in the same mainboards as the single-core CPUs of this class. Older mainboards just need a BIOS update to support the dual-core models.

Here’s what CPU-Z told us about the system with two Opteron 875 processors:

The OS installed on the platform with two dual-core Opteron 275 (875) CPUs identifies four logical processors.

Intel Xeon 3.6GHz

The maximum frequency in the single-core Xeon series has reached 3.8GHz. This is the limit of the NetBurst architecture, which is employed in desktop Pentium 4 CPUs as well as in Xeon CPUs for servers and workstations. However, while the Pentium 4 3.8GHz can easily be found in any shop, the Xeon 3.8GHz is much harder to come by. We are far from accusing Intel of a paper launch on September 26, 2005, yet we could not get a sample of this processor for our tests: Intel’s representatives declined our request, and nearby shops could not supply this Xeon, either. So Xeons clocked at 3.6GHz will be performing for you today instead.

Intel offers two versions of the Xeon 3.6GHz, with 1MB and 2MB of L2 cache, and we got the newer model with the larger cache. These processors are very similar to the Pentium 4 600 series (on the Prescott-2M core) and are based on the 90nm Nocona core.

Besides the 2MB L2 cache, support for the 800MHz FSB is another distinguishing feature of the Nocona core used in CPUs for servers and performance workstations. Unlike the Pentium 4, the Xeon adopted this FSB frequency only recently, and the transition to the faster FSB is a big step forward, because the system bus is commonly considered the bottleneck of Intel’s current multi-processor architecture. In today’s dual-processor Xeon configurations, the CPUs share the system bus bandwidth: they use the system bus both to communicate with one another and to receive the data to be processed (the memory controller in such a system is located in the chipset rather than in the processor).

The Nocona core supports Hyper-Threading, so the OS sees each Nocona-core Xeon as two virtual CPUs.

Being a relatively new product, the Xeon 3.6GHz on the Nocona core boasts all of Intel’s recent innovations, including the EM64T 64-bit extensions to the x86 architecture, the Execute Disable Bit buffer-overflow protection technology, and the Enhanced Intel SpeedStep power management technology.

The formal specification of the Xeon 3.6GHz follows below:

Intel Xeon 3.6 GHz

S-Spec: SL7ZC
Clock frequency: 3.6GHz
Packaging type: 604-pin
L2 cache: 2048KB
Bus frequency: 800MHz
Core stepping: N0
Manufacturing technology: 90nm
Typical heat dissipation: 110W
Maximum package temperature: 72°C
Vcore: 1.25-1.388V
Hyper-Threading support: Yes
EM64T technology support: Yes
Execute Disable Bit support: Yes
Heat dissipation and power management: Enhanced Intel SpeedStep

We’d like to note two things here. First, although the Xeon 3.6GHz is very similar to the Pentium 4 660, it requires a special platform and installs into Socket 604, which only dual-processor mainboards are equipped with. Second, the Xeon dissipates more heat than the competing AMD CPUs and requires more advanced cooling. For example, the Xeon 3.6GHz that we received for our tests came with an all-copper cooler, while Opterons are quite satisfied with a cooler that only has a copper base.

CPU-Z has the following to say about the Xeon 3.6GHz:

For some reason, the utility identifies the processor core incorrectly. The Cranford core is employed in Xeon MP CPUs intended for multi-processor configurations; the core of Xeons for dual-processor platforms is codenamed Nocona.

Thanks to Hyper-Threading, the OS regards our dual-processor Xeon 3.6GHz platform as having four logical processors.

Dual-core Intel Xeon 2.8GHz

And here’s the true hero of this review. The processors described above are hardly a surprise for anyone: they have long been on the market and have been thoroughly explored by users. The dual-core Xeon, on the contrary, appeared only recently and is still scarce. Intel introduced its dual-core Pentium D back in the first half of 2005 but was in no hurry to release a dual-core Xeon. Why? Because, unlike the other Xeons, the dual-core model has no counterpart among Intel’s desktop CPUs, so to give a worthy reply to the highly successful dual-core Opteron, Intel had to do some additional engineering work. Here is the result.

The core of the new Xeon is codenamed Paxville and is essentially two Nocona cores combined on one die. Thus, the Paxville differs from the Smithfield core of the dual-core Pentium D not only in being targeted at servers and workstations but also in having more L2 cache: the Smithfield has two cores with 1MB of L2 cache each, whereas the Paxville has 2MB of L2 cache per core, or 4MB in total. Previously, only Xeon MP CPUs for multi-processor systems could boast such a large cache, but now the cheaper CPU for dual-processor configurations comes with 4MB of cache, too.

Strangely enough, Intel didn’t wait for its 65nm process to come on line before producing the Paxville. By announcing it earlier, the company gained only a couple of months, but had to make processors with a truly gigantic die. No wonder Paxville-core CPUs are so hard to find on the market.


Xeon (Nocona) on the left; Xeon (Paxville) on the right

But the manufacturing cost was not the only thing that suffered from the large die of the dual-core Xeon. To ensure stability and acceptable heat dissipation, Intel had to clock the new processor lower than not only the single-core Nocona Xeons but also the dual-core Pentium D. The only announced model in the dual-core Xeon series runs at 2.8GHz, but even this did not keep the heat dissipation and power consumption of the Paxville within normal limits. The TDP of the dual-core 2.8GHz Xeon is 135W, 23% higher than that of the top single-core Xeon models. As a result, mainboards for the Paxville must have a reinforced voltage regulator, and more powerful power supplies and more efficient coolers may also be necessary. For example, our dual-processor system with two Paxville-core Xeons would not work with a 460W power supply, and we had to replace it with a 600W unit. As for cooling, the all-copper coolers shipped with the top single-core Xeons coped with the dual-core processors as well.
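A quick calculation confirms the 23% figure, and doubling each TDP from the specification tables shows the worst-case heat load the two CPU sockets alone put on a dual-processor system. This is a rough illustration: TDP is a thermal design limit rather than measured power draw, and the totals leave out the chipset, memory, drives, graphics card and power supply losses.

```python
# TDP figures from the specification tables in this review, in watts
TDP_W = {
    "Opteron 254": 92.6,
    "Opteron 275/875": 95.0,
    "Xeon 3.6GHz (Nocona)": 110.0,
    "Dual-core Xeon 2.8GHz (Paxville)": 135.0,
}

# Relative TDP increase of the Paxville over the single-core Nocona Xeon
increase = (TDP_W["Dual-core Xeon 2.8GHz (Paxville)"] / TDP_W["Xeon 3.6GHz (Nocona)"] - 1) * 100
print(f"Paxville vs. Nocona TDP: +{increase:.1f}%")

# Combined TDP of the two CPUs in a dual-processor system (CPUs only)
for name, tdp in TDP_W.items():
    print(f"2 x {name}: {2 * tdp:.1f} W")
```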

Although its high power requirements rule out using the Paxville processor on existing mainboards, it uses the same Socket 604 as the older Xeons. There was no need for a new socket, because the CPUs are fully pin-compatible. Moreover, the dual-core Xeon uses the same 800MHz Quad Pumped Bus and is compatible with Intel’s E7520 and E7525 chipsets. The downside is that the system bus bandwidth is now shared among four cores, so the speed of communication between the CPUs and memory, and between the CPUs themselves, may suffer (Intel has recognized this problem and is now preparing new chipsets with two independent FSBs).
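To put the shared-bus concern in numbers: the "800MHz" Quad Pumped Bus actually runs at 200MHz and performs four 64-bit transfers per clock. The sketch below divides its peak bandwidth evenly among the physical cores. The even split is a simplifying assumption of ours (real traffic is neither uniform nor able to reach the theoretical peak), so the figures are upper bounds for illustration only.

```python
FSB_BASE_CLOCK_MHZ = 200  # the "800MHz" Quad Pumped Bus: 200MHz clock...
TRANSFERS_PER_CLOCK = 4   # ...with four data transfers per clock
BUS_WIDTH_BYTES = 8       # 64-bit data path

def fsb_peak_gb_s() -> float:
    """Peak FSB bandwidth in GB/s (decimal units)."""
    megatransfers = FSB_BASE_CLOCK_MHZ * TRANSFERS_PER_CLOCK  # 800 MT/s
    return megatransfers * BUS_WIDTH_BYTES / 1000

def per_core_share_gb_s(sockets: int, cores_per_socket: int) -> float:
    """Idealized even split of the single shared FSB among all physical cores."""
    return fsb_peak_gb_s() / (sockets * cores_per_socket)

print(f"FSB peak: {fsb_peak_gb_s():.1f} GB/s")
print(f"2 x single-core Xeon: {per_core_share_gb_s(2, 1):.1f} GB/s per core")
print(f"2 x dual-core Xeon:   {per_core_share_gb_s(2, 2):.1f} GB/s per core")
```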

Dual-core Xeons on the Paxville core support Hyper-Threading, so each processor is viewed by the OS as two physical and four logical units.

That is, a dual-processor system based on Paxville CPUs can work with eight execution threads simultaneously.
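The logical processor counts quoted for the four test platforms all follow from one product: sockets × cores per socket × hardware threads per core. A minimal sketch; the `Platform` class is an illustrative helper, not part of any tool used in this review.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Platform:
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int  # 2 with Hyper-Threading, otherwise 1

    @property
    def logical_processors(self) -> int:
        # What the OS enumerates as separate "CPUs"
        return self.sockets * self.cores_per_socket * self.threads_per_core

PLATFORMS = [
    Platform("2 x Opteron 254", 2, 1, 1),
    Platform("2 x Opteron 275 (875)", 2, 2, 1),
    Platform("2 x Xeon 3.6GHz (Nocona)", 2, 1, 2),
    Platform("2 x dual-core Xeon 2.8GHz (Paxville)", 2, 2, 2),
]

for p in PLATFORMS:
    print(f"{p.name}: {p.logical_processors} logical processors")
```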

We’d like to note that even Windows XP Professional SP2, not to mention newer Microsoft products, works without any limitations on a dual-processor system with two dual-core Xeons, despite the large number of logical processors: its two-processor limit counts physical processor packages rather than cores or Hyper-Threading units. Below is the formal specification of the Paxville-core Xeon:

Dual-Core Intel Xeon 2.8 GHz

S-Spec: SL8MA
Clock frequency: 2.8GHz
Packaging type: 604-pin
L2 cache: 2x2048KB
Bus frequency: 800MHz
Core stepping: A0
Manufacturing technology: 90nm
Typical heat dissipation: 135W
Maximum package temperature: 72°C
Vcore: 1.25-1.4V
Hyper-Threading support: Yes
EM64T technology support: Yes
Execute Disable Bit support: Yes
Heat dissipation and power management: Enhanced Intel SpeedStep

The latest version of CPU-Z already recognizes the Paxville and reports the following about the dual-core Xeon:

Before examining the performance of the reviewed platforms in detail, we ran the synthetic SiSoft Sandra 2005 benchmark on the system with two dual-core Intel Xeon 2.8GHz processors to get an idea of what to expect from this platform.

The algorithms used in this benchmark parallelize well, so the results can be regarded as the peak performance of the dual-processor system with two dual-core Paxville CPUs. As you can see, the Paxville-based Xeon platform is far ahead of the platform with single-core Xeons (Nocona), so we can expect high performance from the new dual-core processor in real-life applications, too.

Mainboards

Supermicro X6DA3-G2

Let’s start with the mainboard we used to test the Intel Xeon processors. Finding a dual-processor platform for single- and dual-core AMD Opterons is no problem, but Socket 604 mainboards for the dual-core Paxville Xeons are rather rare. Intel’s release of the Paxville-based processors gave platform manufacturers something of a headache, because the increased power requirements made the new CPUs incompatible with older Socket 604 mainboards. The manufacturers had to quickly update their products with reinforced voltage regulators, which explains why there are not that many mainboards supporting dual-core Xeons in retail shops.

Moreover, we could not use just any dual-processor mainboard that supported the Paxville-based Xeon, because a mainboard for a high-performance workstation must also have a graphics bus for a professional graphics card. This requirement limited our choice to platforms based on the Intel E7525 chipset, as it is Intel’s only chipset for dual-processor systems that supports both the 800MHz FSB and a PCI Express x16 bus.

So we did not have much choice, and the Supermicro X6DA3-G2 was practically the only mainboard we could find that met our requirements. Supermicro is currently the only company that has actively supported the launch of the dual-core Xeon. While other manufacturers announced just one or two mainboards with support for the new processor, Supermicro updated almost its entire product range, adding not only a reinforced CPU voltage regulator but also such an interesting feature as Serial Attached SCSI.

So let’s take a closer look at the Supermicro X6DA3-G2 mainboard. As mentioned above, it is based on the Intel E7525 chipset, whose North Bridge supports two Intel Xeon processors with an 800MHz FSB, dual-channel registered DDR2-400 SDRAM and PCI Express.

The Supermicro X6DA3-G2 implements all of these features. The mainboard carries two Socket 604 processor sockets for Intel Xeon processors of any type: single- and dual-core models with a 533 or 800MHz FSB. The power circuit is the most important component here, and the mainboard has two independent four-channel voltage regulators. The MOSFETs in this circuit have no heatsinks, although we think heatsinks would be a good idea: these components get very hot when the mainboard runs two dual-core Xeon 2.8GHz CPUs, which have the highest power consumption.

The Supermicro X6DA3-G2 offers eight DDR2 DIMM slots for registered DDR2-400 SDRAM modules of up to 2GB each, for a maximum of 16GB of memory. Memory access is dual-channel, and the slots belonging to the two channels alternate on the PCB. The peak memory bandwidth is 6.4GB/s, which equals the peak bandwidth of the 800MHz FSB that connects the chipset with the CPUs.
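The 6.4GB/s figure follows from DDR2-400 running at 400 million transfers per second on each 64-bit channel. The short sketch below (the helper name is ours) checks that two such channels exactly match the 800MHz FSB, and that eight 2GB modules give the 16GB maximum.

```python
def memory_peak_gb_s(mt_per_s: int, channels: int, channel_width_bits: int = 64) -> float:
    """Peak memory bandwidth in GB/s (decimal) for `channels` identical channels."""
    return mt_per_s * channels * (channel_width_bits // 8) / 1000

# DDR2-400: 400 MT/s per channel; the X6DA3-G2 runs two channels
memory_gb_s = memory_peak_gb_s(400, channels=2)
# 800MHz Quad Pumped Bus: 800 MT/s over a 64-bit (8-byte) data path
fsb_gb_s = 800 * 8 / 1000
# Eight DIMM slots, registered modules of up to 2GB each
max_memory_gb = 8 * 2

print(f"Memory: {memory_gb_s:.1f} GB/s, FSB: {fsb_gb_s:.1f} GB/s, max RAM: {max_memory_gb} GB")
```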

Intel’s E7525 chipset gives the mainboard support for ECC and Memory RAS (Reliability, Availability and Serviceability) technologies. This not only ensures reliable operation of the memory subsystem but also helps avoid downtime when a module fails.

We’d like to draw your attention to the way the CPU sockets and the DIMM slots are placed on the PCB. They occupy the right part of the mainboard and are located in such a way that these components could be cooled with the fans on the front and rear panels of a rack-mount system case. The air stream is also expected to cool the CPU voltage regulator and the E7525 North Bridge which is capped with a passive aluminum heatsink.

The Intel E7525 supports 24 PCI Express lanes in total, and 16 of them are routed to the graphics PCI Express x16 slot on the Supermicro X6DA3-G2, a slot that any high-performance workstation must have. The remaining 8 lanes are divided between two interfaces. The first four lanes go to the second PCI Express slot, which is logically x4 but physically designed as PCI Express x16. This means that the Supermicro X6DA3-G2 could theoretically support two graphics cards in SLI or CrossFire mode if the graphics card makers provide driver support for it. But even without such driver support, you can install a second graphics card into the second PCI Express slot and connect more monitors to your system.
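The lane split described above can be checked with a simple budget. The per-lane figure assumes first-generation PCI Express (2.5GT/s with 8b/10b encoding, i.e. 250MB/s per lane in each direction), which is what a chipset of the E7525’s generation implements; the dictionary labels are our own.

```python
E7525_TOTAL_LANES = 24

# Lane allocation on the Supermicro X6DA3-G2, as described in the text
LANE_ALLOCATION = {
    "PCI Express x16 graphics slot": 16,
    "second x16-sized slot (wired as x4)": 4,
    "link to the Intel 6700PXH PCI-X hub": 4,
}

# First-generation PCI Express: 2.5 GT/s per lane, 8b/10b encoding
# -> 250 MB/s per lane in each direction
PCIE1_MB_S_PER_LANE = 250

# Every lane the chipset provides is accounted for
assert sum(LANE_ALLOCATION.values()) == E7525_TOTAL_LANES

for link, lanes in LANE_ALLOCATION.items():
    gb_s = lanes * PCIE1_MB_S_PER_LANE / 1000
    print(f"{link}: x{lanes}, {gb_s:.1f} GB/s per direction")
```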

The other PCI Express x4 bus connects the North Bridge with the onboard controller Intel 6700PXH which is responsible for the three PCI-X slots available on the mainboard: two PCI-X 100MHz and one PCI-X 133MHz. Two onboard controllers (Ethernet and Serial Attached SCSI – SAS) are also connected across the PCI-X interface provided by the Intel 6700PXH.
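For a sense of scale, the sketch below computes the theoretical peak of each PCI-X slot (one 64-bit transfer per clock, half-duplex; the nominal 133MHz slot actually runs at 133.33MHz) and of the PCI Express x4 uplink behind them, again assuming first-generation PCI Express at 250MB/s per lane per direction. This uplink also carries the onboard Ethernet and SAS controllers; all numbers are theoretical maxima, not measurements.

```python
def pcix_peak_mb_s(clock_mhz: float, width_bits: int = 64) -> float:
    """Peak PCI-X bandwidth in MB/s: one transfer per clock, half-duplex."""
    return clock_mhz * width_bits / 8

# Two PCI-X 100MHz slots and one PCI-X 133MHz slot, as listed in the text
PCIX_SLOTS_MHZ = [100, 100, 133.33]
# PCI Express x4 link between the E7525 and the 6700PXH (first generation)
UPLINK_MB_S_PER_DIRECTION = 4 * 250

for clock in PCIX_SLOTS_MHZ:
    print(f"PCI-X {clock:g}MHz slot: {pcix_peak_mb_s(clock):.0f} MB/s peak")
print(f"Shared x4 uplink: {UPLINK_MB_S_PER_DIRECTION} MB/s per direction")
```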

The networking capabilities of the Supermicro X6DA3-G2 are provided by the dual-port Gigabit Ethernet FW82546GB controller from Intel. The onboard SAS controller, Adaptec AIC 9410W, supports eight SAS ports and RAID 0 and 1 arrays.

The rest of the mainboard’s functionality comes from the ICH5R South Bridge which is well known to us from the desktop mainboards on i875/i865 chipsets. The South Bridge supports two Serial ATA-150 ports, eight USB 2.0 ports, one ordinary 32-bit PCI slot (located between the PCI Express slots), and six-channel AC’97 audio.

Although the ICH5R may seem an outdated South Bridge, its capabilities are sufficient for a mainboard of this class. Newer South Bridges offer more advanced audio and more USB 2.0 and SATA ports, but audio and the number of USB ports are not that important for a workstation, while hard drives are expected to be attached primarily to the SAS controller, especially since this standard is compatible with ordinary SATA hard drives.

The Supermicro X6DA3-G2 leaves a very pleasant impression with its functionality, stable operation and convenient design. This Extended ATX mainboard can be installed in rack-mount cases as well as in “towers”. The mainboard is powered via three cables at once (4-pin, 8-pin and 24-pin). The power connectors are all shifted to the right edge of the PCB so that the cables from the power supply wouldn’t clutter the system case.

The mainboard allows you to attach five fans, including the CPU ones, and offers advanced fan rotation speed control as well as overall system monitoring options. Besides the traditional system status monitoring through the BIOS Setup and special utilities, the mainboard supports the IPMI 2.0 interface that allows collecting system status information remotely via Ethernet connection after you install an appropriate daughter card.

The rear connectors panel of the Supermicro X6DA3-G2 mainboard carries PS/2 ports for the mouse and keyboard, four USB 2.0 ports, two COM and one parallel port, two network RJ-45 sockets and three audio jacks.

To conclude this section, the Supermicro mainboard earned our highly positive verdict, as it combines wide functionality, smart PCB design and high reliability. We would like to emphasize once again that this mainboard is compatible with all Xeon processors, including the Paxville, has two physical PCI Express x16 slots and supports the promising SAS interface. The Supermicro X6DA3-G2 seems to be the ideal foundation for a dual-processor workstation with Intel Xeon CPUs.

Tyan K8WE (S2895)

For our AMD Opteron tests we needed another mainboard suitable for a high-performance workstation, that is, one similar to the Supermicro X6DA3-G2 in features but supporting Socket 940 processors. Finding such a mainboard was easy. Quite a few manufacturers offer solutions like that, but we chose the Tyan K8WE (S2895), which has been receiving a lot of positive feedback from users lately.

This mainboard is based on the NVIDIA nForce Professional chipset, which is not surprising at all. One of the major criteria that make a platform a workstation rather than a server solution is a graphics bus that allows using high-performance graphics cards. As of today, there is only one chipset family for dual-processor AMD Opteron systems that supports the PCI Express x16 bus: NVIDIA nForce Professional (unfortunately, we cannot regard the ServerWorks HT-2000 chipset as an alternative here because of its extremely limited availability).

The NVIDIA nForce Professional chipset consists of two chips, which can be used together or separately in end systems. The Tyan K8WE (S2895) mainboard features both. Here I would like to stress that different combinations of nForce Professional chips affect only the peripheral features of the mainboard and do not matter for CPU and memory support. The reason is that in dual-processor AMD Opteron systems, the two processors communicate with each other directly via a HyperTransport bus running at 1GHz with an 8GB/s data transfer rate, and memory is handled by the memory controllers integrated into the CPUs.
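The 8GB/s figure for the inter-processor link can be reproduced under one assumption: a 16-bit link in each direction, the usual width of the coherent HyperTransport links between Opterons. HyperTransport transfers data on both clock edges, so a 1GHz link moves two billion transfers per second each way. The function name is ours.

```python
from typing import Tuple

def hypertransport_gb_s(clock_mhz: int, width_bits: int = 16) -> Tuple[float, float]:
    """Return (per-direction, aggregate) peak bandwidth of a HyperTransport link in GB/s.

    Data moves on both clock edges, and the link has independent send and
    receive paths of equal width (assumed 16 bits here).
    """
    megatransfers = clock_mhz * 2                         # double data rate
    per_direction = megatransfers * (width_bits // 8) / 1000
    return per_direction, per_direction * 2

per_dir, total = hypertransport_gb_s(1000)  # 1GHz link between the two Opterons
print(f"{per_dir:.0f} GB/s each way, {total:.0f} GB/s aggregate")
```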

Therefore, the basic features of the Tyan K8WE (S2895) mainboard are determined primarily by the Opteron processors rather than by the core logic. The mainboard is equipped with two Socket 940 processor sockets that can accommodate any CPUs of this family, both single-core and dual-core. Here I have to stress that the release of dual-core processors did not require any hardware changes to the mainboard design: since dual-core Opterons are pin-compatible with their single-core counterparts and do not require additional power, all existing platforms need to support them is a BIOS update.

To the right of each CPU socket there are four DIMM slots. Opteron processors feature an integrated dual-channel memory controller that requires Registered DDR400 (or slower) SDRAM modules. Note that even though the memory of a dual-processor Opteron system is split into two subsystems, one per CPU, it still works as a single address space thanks to NUMA. So if you install at least four Registered DDR400 SDRAM modules (two per CPU) in your Tyan K8WE (S2895) based system, you can calculate the peak memory bandwidth as that of four-channel memory. The maximum supported memory capacity depends on the largest Registered modules available on the market today: currently 2GB per module, so the Tyan K8WE (S2895) can theoretically accommodate up to 16GB of memory.
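In numbers: each Opteron’s 128-bit controller with DDR400 modules peaks at 6.4GB/s, so two CPUs with two modules each reach 12.8GB/s in total. A sketch with constant names of our own; the aggregate figure assumes each processor mostly works with its own local memory, since accesses to the other CPU’s memory have to cross the HyperTransport link.

```python
DDR400_MT_S = 400             # DDR400: 400 million transfers per second
CONTROLLER_WIDTH_BYTES = 16   # 128-bit dual-channel controller in each Opteron
SOCKETS = 2
DIMMS_PER_SOCKET = 4
MAX_DIMM_GB = 2               # largest Registered DDR module cited in the text

per_node_gb_s = DDR400_MT_S * CONTROLLER_WIDTH_BYTES / 1000  # one CPU's local memory
aggregate_gb_s = per_node_gb_s * SOCKETS                     # both NUMA nodes together
max_memory_gb = SOCKETS * DIMMS_PER_SOCKET * MAX_DIMM_GB

print(f"Per CPU: {per_node_gb_s:.1f} GB/s, system peak: {aggregate_gb_s:.1f} GB/s")
print(f"Maximum memory: {max_memory_gb} GB")
```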

To the left of the processor sockets are the power elements of the CPU voltage regulators. Although Opteron processors are not as power-hungry as Intel Xeons, the Tyan K8WE (S2895) is equipped with a pair of four-channel voltage regulators, one per CPU.

The layout of the right-hand side of the PCB is optimized for rack-mount cases: the CPUs and memory are placed along the same horizontal lines. However, here we have to point out one noticeable design flaw of the Tyan K8WE (S2895): the ATX power connectors located between the CPU sockets. Once the power cables are plugged in, they lie right in the middle of the cooling air stream, obstructing its flow. If the mainboard is installed in a tower case, though, the air flows differently, and this location of the ATX power connectors is just fine.

The core logic chips (MCP) – nForce Professional 2200 and nForce Professional 2050 - are placed in the middle of the mainboard PCB. Each CPU is connected with one of the chips via the HyperTransport bus working at 1GHz frequency. The first chip, nForce Professional 2200, is responsible for the first PCI Express x16 slot and some peripheral controllers. Actually, most mainboard features dealing with peripheral devices are implemented via this MCP.

Firstly, this chip implements a RAID controller supporting four Serial ATA-300 ports and RAID 0, 1 and 0+1 arrays. Secondly, it supports the 8 USB 2.0 ports available on this mainboard. Thirdly, the nForce Professional 2200 drives the first Gigabit Ethernet port, which supports the ActiveArmor hardware network security technology. Fourthly, the same MCP provides the regular 32-bit PCI bus used by the single PCI slot and the additional IEEE1394 controller. And fifthly, the nForce Professional 2200 also delivers the integrated 6-channel AC’97 sound.

The other chip on the Tyan K8WE (S2895), the nForce Professional 2050, has far fewer responsibilities. It provides only the second PCI Express x16 slot and the second Gigabit Ethernet port, which is identical to the first one. Funnily enough, the functionality of the two MCP chips correlates very nicely with the size of their heatsinks :)

So, thanks to the two MCP chips, Tyan K8WE (S2895) mainboard features two fully fledged PCI Express x16 slots. And it means that the mainboard supports the fastest possible SLI configuration. In other words, this mainboard will be an excellent choice for a high-performance graphics workstation.

Besides one PCI slot and two PCI Express x16 slots, the Tyan K8WE (S2895) also features PCI-X slots, a typical feature of any high-performance server or workstation. These slots are implemented by another chip, the so-called PCI-X tunnel, in our case the AMD 8131. Like the MCP chips, it is connected directly to one of the CPUs via a HyperTransport bus, in this case running at 600MHz. Thanks to the PCI-X tunnel, our mainboard has three 64-bit PCI-X slots (two at 100MHz and one at 133MHz). An integrated SCSI controller can also be connected to the AMD 8131 via the PCI-X bus as an option. Our mainboard modification didn’t have such a controller, so we won’t go into details about it today.

The Tyan K8WE (S2895) mainboard is designed in the Extended ATX form factor and has a fairly traditional component layout for solutions of this kind. The inconvenient location of the power supply connectors mentioned above seems to be the only more or less significant flaw of its design.

All chips, including the NVIDIA nForce Professional MCPs, are equipped with passive aluminum heatsinks. To be honest, this decision is debatable, because contemporary NVIDIA chipsets are known for their relatively high heat dissipation. So if you are building a Tyan K8WE (S2895) based platform, make sure to arrange proper airflow inside the case. The five fan connectors on the mainboard could come in handy here.

Among the small pleasant touches, Tyan engineers took good care of system builders and testers. The Reset and Power On pins are duplicated by micro-switches, so the mainboard is easy to work with not only inside a system case but also on an open testbed. The traditional Clear CMOS jumper has also been replaced with such a micro-switch.

Finally, let’s look at the rear panel. It carries PS/2 mouse and keyboard ports, four USB 2.0 ports, one serial port, two RJ-45 network connectors, one FireWire port and three audio jacks.

Summing up, I have to admit that the Tyan K8WE (S2895) is a very successful platform for a dual-processor workstation with AMD Opteron CPUs. It is free from CPU compatibility issues, features two full-fledged PCI Express x16 slots supporting SLI technology, and offers plenty of options for connecting other systems and devices. I would also like to stress that this is the most advanced of all NVIDIA nForce Professional based mainboards recommended by AMD.

Comparative Specifications Chart: Supermicro X6DA3-G2 vs. Tyan K8WE (S2895)

Testbed and Methods

The major goal of this test session was to find out which CPUs would be the best choice for a high-performance dual-processor workstation with x86 architecture. Naturally, the main candidates came from the Intel Xeon and AMD Opteron families. From each family we picked two participants: one single-core and one dual-core processor. All the CPUs and mainboards used in this test session have already been discussed in detail above.

We equipped each workstation with 4GB of RAM and a professional NVIDIA Quadro FX 4500 graphics accelerator, so all four test platforms we assembled in our lab truly qualify as high-performance workstations.

Here is a list of hardware components we used for our systems:

Performance

FutureMark PCMark05

We decided to start our test session not with applications typical of professional workstations, but with some classical benchmarks that we usually use to test regular desktop platforms. This gives us a better idea of the performance contemporary dual-processor platforms deliver not only in specialized programs but also in ordinary desktop applications.

The results are quite logical. PCMark05 generates a multi-threaded workload, so the more logical CPUs a system has, the higher its overall score. This is why the Opteron 275 platform is faster than the Opteron 254 platform. The Xeon 3.6GHz system, like the Opteron 275 system, has four logical CPUs: although the Xeon 3.6GHz is a single-core processor, it supports Intel’s virtual multi-core technology known as Hyper-Threading.

The platform with genuinely dual-core Xeon 2.8GHz processors does not look that impressive in PCMark05. These CPUs run at a fairly low clock frequency, and their ability to process eight threads at a time cannot compensate for it, because PCMark05 creates at most four simultaneous threads. In other words, this benchmark cannot fully load a dual-processor system with dual-core CPUs supporting Hyper-Threading technology.
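The thread arithmetic behind these results is simple: logical CPUs = sockets × cores per socket × threads per core, and a benchmark can keep busy no more logical CPUs than the threads it creates. Below is a minimal Python sketch of that idealized model. It ignores clock speed and the fact that a Hyper-Threading logical CPU is weaker than a physical core; the `Platform` class and `busy_logical_cpus` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    sockets: int
    cores_per_socket: int
    threads_per_core: int  # 2 with Hyper-Threading enabled, 1 otherwise

    @property
    def logical_cpus(self) -> int:
        # Number of logical CPUs the operating system sees
        return self.sockets * self.cores_per_socket * self.threads_per_core


def busy_logical_cpus(platform: Platform, app_threads: int) -> int:
    # An application cannot occupy more logical CPUs than it has threads,
    # nor more than the platform exposes.
    return min(app_threads, platform.logical_cpus)


PLATFORMS = [
    Platform("2x Opteron 254", sockets=2, cores_per_socket=1, threads_per_core=1),
    Platform("2x Opteron 275", sockets=2, cores_per_socket=2, threads_per_core=1),
    Platform("2x Xeon 3.6GHz", sockets=2, cores_per_socket=1, threads_per_core=2),
    Platform("2x Dual-Core Xeon 2.8GHz", sockets=2, cores_per_socket=2, threads_per_core=2),
]

PCMARK05_MAX_THREADS = 4  # from the text: PCMark05 creates at most four threads

for p in PLATFORMS:
    busy = busy_logical_cpus(p, PCMARK05_MAX_THREADS)
    print(f"{p.name}: {p.logical_cpus} logical CPUs, {busy} busy, "
          f"{p.logical_cpus - busy} idle")
```

Under this model only the dual-core Xeon system is left with idle logical CPUs (four of its eight), which is consistent with the explanation above.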

We can see almost the same situation in the PCMark05 subtest that measures pure processor performance. The only thing worth noting on this diagram is the victory of the Xeon 3.6GHz system. However, this victory should be taken with a grain of salt, because PCMark05 traditionally shows better results on Intel platforms.

In the PCMark05 memory subsystem tests, the Opteron and single-core Xeon systems showed very close results. This is quite natural: the built-in memory controller of the Opteron CPUs works with dual-channel DDR400 SDRAM, while the memory controller of the Intel E7525 chipset used in our Xeon systems supports dual-channel DDR2-400 SDRAM with the same bandwidth.
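Why the two memory subsystems tie becomes clear from the peak-bandwidth arithmetic: both DDR400 and DDR2-400 perform 400 million transfers per second over a 64-bit channel, and both platforms use two channels. A minimal Python sketch; `peak_bandwidth_gbs` is a hypothetical helper, and the figures are theoretical peaks, not measured results.

```python
def peak_bandwidth_gbs(transfers_per_sec_m: float, bus_width_bits: int, channels: int) -> float:
    # Theoretical peak = transfer rate x bytes per transfer x channels, in decimal GB/s
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_sec_m * 1e6 * bytes_per_transfer * channels / 1e9


# DDR400 and DDR2-400 both run at 400 million transfers/s on a 64-bit channel
opteron_ddr400 = peak_bandwidth_gbs(400, 64, 2)   # Opteron's built-in controller
e7525_ddr2_400 = peak_bandwidth_gbs(400, 64, 2)   # Intel E7525 chipset

print(f"Opteron, dual-channel DDR400: {opteron_ddr400:.1f} GB/s")
print(f"E7525, dual-channel DDR2-400: {e7525_ddr2_400:.1f} GB/s")
```

Both come out at 6.4 GB/s, so a benchmark limited by peak bandwidth has no reason to separate them.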

However, attentive readers may wonder why the NUMA architecture of the Opteron platforms has no effect here. After all, a dual-Opteron system has four memory channels in total, doesn’t it? The reason is that PCMark05 measures memory bandwidth with a single thread, which cannot reveal the advantages of NUMA.
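A rough way to see why a single-threaded test hides NUMA: under the simplifying assumption that each thread runs on its own node and accesses only local memory, one thread can draw on at most one memory controller, while two threads on two nodes can use both. A minimal Python sketch of that idealized model; `reachable_peak_gbs` is a hypothetical helper, and real NUMA behavior also depends on remote accesses over HyperTransport, which this ignores.

```python
NODE_BANDWIDTH_GBS = 6.4  # peak of one Opteron's dual-channel DDR400 controller


def reachable_peak_gbs(nodes: int, threads: int, node_bw: float = NODE_BANDWIDTH_GBS) -> float:
    # Idealized NUMA model (an assumption): each thread runs on its own node and
    # touches only local memory, so one thread can draw on at most one controller.
    active_controllers = min(nodes, threads)
    return active_controllers * node_bw


for threads in (1, 2):
    print(f"dual-Opteron, {threads} thread(s): {reachable_peak_gbs(2, threads):.1f} GB/s peak")
```

In this model a single-threaded benchmark like PCMark05 sees 6.4 GB/s, the same as the Xeon platform, even though the dual-Opteron system offers 12.8 GB/s in aggregate.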

As for the system built with dual-core Xeon processors, the low clock frequency of its CPUs kept it from keeping up with its rivals in this test.

Quake 4

Besides PCMark05, we decided to add one more “game” test (in both the literal and figurative sense of the word). Namely, we tested the performance of our workstation platforms in one of the latest and most popular games, Quake 4. There were several reasons for including this test in our list of benchmarks. Graphics card manufacturers have finally implemented SMP support in their drivers, and, as we found out during our previous dual-core CPU test session, some OpenGL games really get a performance boost on systems capable of processing several threads simultaneously. Quake 4 is exactly such a game.

However, we shouldn’t expect professional workstations to show any terrific results in Quake 4. As we can see, systems with dual-core processors lose to those with single-core CPUs because of their lower clock frequency. At this point, graphics drivers are optimized only for systems with two logical CPUs, so they do not fully utilize the potential of more powerful platforms for parallel processing.

We could end our Quake 4 discussion right here, if not for one thing: last week id Software released patch 1.0.5 for its shooter, promising SMP support. This is no surprise, since all of this developer’s engines have supported SMP for a long time. However, as there was no real need for it, the support was never finalized and was simply disabled just in case. Now it has finally been polished. So, let’s take a look at the results of the patched Quake 4.

The situation is quite interesting. The frame rates changed noticeably, but not always upward: performance improved only on the systems with four logical CPUs. The Xeon 3.6GHz system gained 27.7%, and the AMD Opteron 275 system gained 25.5%. By contrast, the platform with two single-core Opteron 254 CPUs became 5% slower and fell behind the dual-core Opteron 275 system. As for the system with dual-core Xeon CPUs, the patched Quake 4 refused to run on it at all; apparently, eight logical processors were far beyond its understanding. So Quake 4 with patch 1.0.5 is optimized primarily for systems with four logical CPUs. Dual-processor workstations are not the only platforms with four logical CPUs: desktop systems built on Pentium Extreme Edition processors have them too.

ScienceMark 2.0

ScienceMark 2.0 benchmark shows the system performance in typical scientific algorithms used in mathematical modeling tasks.

We have already pointed out that ScienceMark 2.0 is optimized for platforms with two logical processors. It splits its calculations into two parallel threads and therefore cannot fully load most of the systems we are discussing today. As a result, the system with two Opteron 254 CPUs leads here.

Mathematica 5.2

Another computational application in our test session is Mathematica, a mathematical software package intended for scientific and engineering tasks. It is especially interesting because the new version 5.2 has acquired SMP support.

Workstations built with dual-core processors showed no significant advantage in Mathematica 5.2. Analyzing the results, we found that, just like ScienceMark, this package can load only two logical processors at a time (and not up to 100%). Therefore, platforms capable of processing many parallel threads make little practical sense for this type of software.

I would also like to point out the surprisingly high result of the system built with Xeon 3.6GHz processors: according to our numbers, it outperforms the system with Opteron 254 CPUs working at 2.8GHz by about 60%. It looks like Mathematica 5.2 favors the NetBurst architecture, which has one significant advantage: fast ALUs working at twice the CPU frequency.

Video Encoding

Most contemporary video codecs support multi-threading, so dual-processor workstations may, in theory, be justified for video encoding. But what do the tests show?

The situation with popular codecs closely resembles what we have just seen in ScienceMark, and for much the same reasons. Codec developers only recently optimized their products for systems that run two computational threads simultaneously, and we are already asking for four-processor support… Unfortunately, most codecs still cannot keep four processor cores sufficiently busy.

The title of the fastest system for video encoding goes to the dual-Opteron 254 system. Besides processing two computational threads at the same time, it is built with two of the fastest Opterons, working at 2.8GHz, which are analogs of the desktop AMD Athlon 64 FX-57.

However, codecs are gradually being optimized for multi-processor platforms. For example, last week the new DivX 6.1 came out, which, among other improvements, promises higher performance in multi-threaded environments. Unfortunately, DivX 6.1 turned out to be incompatible with the current version of the AutoGK interface, so we could not run the tests the usual way. To include this codec in today’s test session, we resorted to the VirtualDub utility. We therefore cannot compare the performance of the old and the new codec, but we can still get some idea of the platforms’ relative performance with DivX 6.1.

According to the diagram above, a lot has changed. We can now say with confidence that DivX 6.1 works very efficiently not only in systems with two logical processors, but also in systems with four. In particular, the system built with two dual-core Opterons now runs faster than all Xeon platforms and the single-core Opteron 254 platform. The dual-core Xeon processors, in turn, again show rather poor results: their ability to process eight threads at a time cannot make up for their low clock frequency.

Video Editing

In our earlier dual-core processor tests we noted that the Adobe Premiere non-linear video editing system and the Adobe After Effects package for visual effects and computer graphics are well optimized for multi-processor systems. Let’s see how serious workstations perform in these applications.

Both applications make good use of multi-processor systems. For instance, a pair of dual-core Opterons is a lot faster than a pair of single-core Opterons working at a much higher clock frequency, and as a result, the system with Opteron 275 processors wins in Adobe Premiere 1.5. However, the pair of dual-core Xeon 2.8GHz processors falls behind the pair of single-core Xeon 3.6GHz CPUs. One might blame the lower clock speed, but the relative frequency gap between the single-core and dual-core models is about the same for AMD (2.8 vs. 2.2GHz) and Intel (3.6 vs. 2.8GHz), so there must be another reason. Most likely, it is hard to split the calculations into eight parallel threads, so the dual-processor Xeon 2.8GHz system cannot load Intel’s dual-core CPUs to their full potential.

Image Editing

As we have just seen, Adobe took the optimization of their software applications for multi-processor platforms very seriously. I wonder if this is also true for Adobe Photoshop, which is one of the most popular graphics editing applications among professionals and amateurs. Let’s find out!

Unfortunately, we cannot discuss how this graphics editor runs on a system built with two dual-core Xeon CPUs. The application was thoroughly baffled by a platform with eight logical processors and honestly reported at startup that it could not handle it. Therefore, the result for the Dual-Core Intel Xeon 2.8GHz you see on the diagram was obtained with Photoshop running in single-threaded mode, which it switches to automatically in this case.

As for the other results, the dual-core Opteron workstation won this race. Most filters included with Adobe Photoshop support multi-processor systems and get a good performance boost when the calculations are split into several parallel threads.

3ds max 7

3ds max 7 hardly needs an introduction: it is a very popular package for professional 3D animation and modeling. We used the SPECapc script to measure the performance of our systems in this application. The script reports two results: the first reflects performance in viewports, and the second, performance during final rendering.

As we know from the previous test sessions, 3ds max 7 doesn’t use multi-threading when working in viewports. Therefore, the best result here is demonstrated by the Opteron 254 based platform.

Final rendering is a completely different story. Rendering can be split into parallel tasks very easily, so systems built with dual-core processors perform impressively fast. For example, a system with two dual-core Opteron 275 processors is about 40% faster than a system with two single-core Opteron 254 CPUs during final rendering in 3ds max 7. The advantage of the Xeon (Paxville) based workstation over the Xeon (Nocona) based one is about 15%. Overall, the fastest platform in this test is the dual-processor system with dual-core AMD Opteron CPUs.

Maya 6.5

Maya is another professional package for work with 3D graphics. We tested our systems with two scripts: SPECapc for viewport performance and ZooRender for final rendering speed.

The situation in viewports is exactly the same as in 3ds max 7: multi-processor support brings no real benefit here.

The best result in Maya 6.5 belongs to the workstation based on dual-core Opteron 275 CPUs. The dual-core Xeon processors are again defeated by their single-core fellows. There is a logical explanation, though: Maya 6.5 cannot create more than four computational threads during final rendering, while the workstation with dual-core Intel Xeon 2.8GHz CPUs offers the application eight logical CPUs, which Maya cannot fully load.

Lightwave 8

Lightwave is another professional 3D graphics application. Here we measured the final rendering speed in two test scenes.

We have pointed out more than once that final rendering speed in Lightwave depends heavily on the model structure, and today’s test session confirms it. Nevertheless, both test scenes produce the same outcome: first place goes to the dual-core Opteron CPUs, while the dual-core Xeon processors end up as clear outsiders. The failure of the Intel processors may be due to the structure of the system bus between the CPUs and the chipset, where the memory controller is located. In the system with two dual-core Xeon CPUs, the bus bandwidth of 6.4GB/s is shared among four physical (eight logical) cores. So when they all work simultaneously, each core suffers from a slow memory subsystem, and even two L2 caches with a combined capacity of 4MB per processor cannot help them out.
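The bus-sharing argument can be put into numbers. The sketch below divides peak bandwidth evenly among the cores that compete for it when all of them stream from memory at once. The 6.4 GB/s shared Xeon bus comes from the text; the Opteron figure assumes, as described earlier in this review, that each Opteron socket has its own dual-channel DDR400 controller shared only by its own two cores. This is an idealized peak-only model (caches and real access patterns are ignored), and `per_core_share` is a hypothetical helper.

```python
SHARED_FSB_GBS = 6.4    # from the text: one system bus shared by both Xeon sockets
OPTERON_NODE_GBS = 6.4  # dual-channel DDR400 per Opteron socket, not shared between sockets


def per_core_share(total_gbs: float, cores_sharing: int) -> float:
    # Even split of peak bandwidth when every core streams from memory at once
    return total_gbs / cores_sharing


xeon_physical = per_core_share(SHARED_FSB_GBS, 4)   # 2 sockets x 2 cores on one bus
xeon_logical = per_core_share(SHARED_FSB_GBS, 8)    # counting Hyper-Threading logical CPUs
opteron_core = per_core_share(OPTERON_NODE_GBS, 2)  # 2 cores per socket, own controller each

print(f"Dual-Core Xeon: {xeon_physical:.1f} GB/s per core, {xeon_logical:.1f} per logical CPU")
print(f"Opteron 275:    {opteron_core:.1f} GB/s per core")
```

Each dual-core Opteron core gets twice the peak memory bandwidth of a dual-core Xeon core in this model, which fits the explanation above.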

CINEBENCH 2003

The special CINEBENCH 2003 test shows how efficiently our platforms run Cinema 4D, a 3D application that is extremely popular among Mac users.

Systems with dual-core CPUs render faster than systems with single-core CPUs, so the results are quite similar to what we saw in 3ds max 7. However, while the Xeon 3.6GHz and Opteron 254 systems are only 7% apart, the gap between the dual-core platforms is much wider: the workstation with Opteron 275 processors outperforms the dual-core Xeon 2.8GHz system by over 30%.

The OpenGL mode does not support multi-threading, so its results are of little interest to us.

AutoCAD 2006

AutoCAD 2006 is a popular computer-aided design (CAD) package.

Just like the viewports in 3D modeling applications, AutoCAD uses practically no multi-threading. Therefore, the winner here is the Opteron 254 based system, just as in the similar cases we have already discussed today. This system offers the highest “pure” performance per logical (and physical) CPU.

FLUENT 6.2.16

Finally, I would like to discuss the results of our workstations in FLUENT, a very popular “heavy” CFD (Computational Fluid Dynamics) package from the class of powerful computational applications. CFD is a computational technology that enables you to study the dynamics of things that flow. Using CFD, you build a computational model that represents a system or device you want to study. Then you apply fluid flow physics to this virtual prototype, and the software outputs a prediction of the fluid dynamics. CFD is a sophisticated analysis technique: it predicts not only fluid flow behavior, but also the transfer of heat and mass (such as in perspiration or dissolution), phase change (such as in freezing or boiling), chemical reactions (such as combustion), mechanical movement (such as an impeller turning), and stress or deformation of related solid structures (such as a mast bending in the wind).

First of all, FLUENT is one of the few applications that can efficiently distribute calculations among any number of logical processors. Therefore, workstations built with dual-core processors have an indisputable advantage here over systems with their single-core analogs. What sets FLUENT apart is that the dual-core Xeon processors perform very well in it. In one of the tests, the dual-core Xeon 2.8GHz system even outperformed the dual-core Opterons, which is a very unusual outcome.

Conclusion

Well, we can congratulate AMD. Our detailed testing of high-performance workstations demonstrated a convincing victory of AMD Opteron CPUs in all major tasks typical of such systems. In fact, we did not encounter a single “heavy” task where an Intel Xeon based platform performed significantly better than an AMD Opteron one.

It is harder to say which AMD platform, single-core or dual-core, is the better choice. If an application is optimized for dual-threaded processing, a system with two single-core CPUs will show better results. In applications that can split their tasks into more than two parallel threads, dual-core processors run faster even though their clock frequency is lower than that of the single-core ones.
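This trade-off can be illustrated with Amdahl’s law scaled by clock frequency. The sketch below is an idealized model, not a description of how any tested application behaves: it assumes equal work per clock on every core, no memory or bus limits, and a parallel part that spreads evenly over as many cores as the application has threads. The 2.2GHz clock of the Opteron 275 is its nominal frequency; the function names are hypothetical.

```python
def relative_speed(freq_ghz: float, cores: int, app_threads: int,
                   parallel_fraction: float = 1.0) -> float:
    # Amdahl's law scaled by clock: the serial part runs on one core,
    # the parallel part is spread over the cores the application can use.
    usable = min(cores, app_threads)
    serial = 1.0 - parallel_fraction
    return freq_ghz / (serial + parallel_fraction / usable)


def crossover_fraction(f_fast: float, n_few: int, f_slow: float, n_many: int) -> float:
    # Parallel fraction p at which f_fast/((1-p)+p/n_few) == f_slow/((1-p)+p/n_many),
    # assuming the application has at least n_many threads.
    return (f_fast - f_slow) / (f_fast - f_slow - f_fast / n_many + f_slow / n_few)


OPTERON_254 = (2.8, 2)  # two single-core CPUs at 2.8GHz
OPTERON_275 = (2.2, 4)  # two dual-core CPUs at 2.2GHz

for threads in (2, 4):
    s254 = relative_speed(*OPTERON_254, threads)
    s275 = relative_speed(*OPTERON_275, threads)
    print(f"{threads} threads: Opteron 254 {s254:.1f}, Opteron 275 {s275:.1f}")

p = crossover_fraction(2.8, 2, 2.2, 4)
print(f"With four or more threads, dual-core wins once the parallel fraction exceeds {p:.2f}")
```

With two threads the faster single-core pair wins, with four perfectly parallel threads the dual-core pair wins, and even with enough threads the dual-core system needs roughly 60% or more of the work to be parallel before it pulls ahead in this model.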

It makes sense to split all applications into two groups. The first group includes video encoding, image editing and some computational tasks. These applications are better suited to systems with fewer logical CPUs working at higher speeds, so a dual-processor workstation with single-core AMD Opteron processors would be the right choice.

The second group includes software that processes large amounts of data in multiple parallel threads, such as final 3D rendering, non-linear video editing and post-processing, and again some computational tasks. For these applications you would be better off with a dual-processor system built with dual-core AMD Opteron CPUs, which deliver much better results there.

I would also like to say a few words about the viewports of professional OpenGL applications. As we have seen, most such packages do not use multi-threading at all. Therefore, a single-processor workstation with a single-core CPU working at a high clock frequency could be the best choice from the price-to-performance point of view. However, work in OpenGL applications usually goes hand in hand with other tasks, such as final rendering. So when you are shopping for the ideal platform, weigh all the pros and cons before making a decision.

As for the Intel Xeon processors, unfortunately, I cannot say anything really positive about them. Today’s Intel workstation processors cannot compete with the AMD Opteron. Of course, there are some tasks where they perform at their best, but these are very few. Nor can Intel boast higher-quality platforms with a richer feature set than those designed for AMD. Current AMD Opteron mainboards based on the NVIDIA nForce Professional chipset are an excellent choice for graphics workstations thanks to SLI support. Xeon platforms cannot boast anything like that yet.

Moreover, Intel Xeon CPUs have a few significant drawbacks. First, today’s Xeon platforms share the system bus bandwidth among all the cores in the system, which noticeably reduces memory subsystem performance. Second, Intel Xeon CPUs have extremely high heat dissipation and power consumption. As a result, the whole platform consumes much more power and requires more powerful and expensive cooling systems and PSUs.

The new dual-core Intel Xeon processors on the Paxville core are not free from these drawbacks. On the contrary, they suffer from them even more than their single-core fellows. They are also incompatible with older mainboards, because their power consumption is higher than that of the regular single-core Xeon CPUs.

As for the performance of the dual-core Intel Xeon processors, there is nothing to boast about at this time. These CPUs are truly efficient only in tasks that can easily be split into multiple parallel threads and that are not sensitive to system bus bandwidth. According to our test results, very few applications meet these requirements. So, for now, dual-core Intel Xeon processors play a largely symbolic role, being merely a nominal competitor to the dual-core AMD Opteron.

I would also like to poke a little fun at the developers of professional software. As we have seen, all this software is quite well optimized for two parallel threads, and quite a lot of programs can even use four simultaneous threads. However, when it comes to eight logical CPUs, only a couple of applications prove capable of handling that much power. This is quite natural, as 8-processor systems have never been available in the workstation market. But times have changed: two dual-core Intel Xeon CPUs supporting Hyper-Threading technology can process eight threads at the same time, and in most cases they fail to show what they are capable of because of the lack of appropriate software support.

I sincerely hope that things are going to change in the future, because the Intel Xeon (Paxville) processors we have tested today will soon be followed by faster CPUs on the same architecture. These upcoming dual-core processors will be free from the drawbacks we have encountered in today’s Xeons, which will make them much more suitable for workstations. Known as Dempsey, they are expected to appear as early as Q1 2006. They will be based on 65nm cores and will feature lower power consumption and heat dissipation on the one hand and higher clock frequencies (up to 3.8GHz) on the other. Moreover, they will get a faster front side bus and new supporting chipsets with an individual bus for each CPU in the system.

So, despite today’s evident fiasco, the future looks quite rosy for Intel’s new workstation CPUs.