Performance Monsters from AMD: Dual-Core CPUs in a Dual-Processor Workstation

Today we are looking at the dual-processor workstation platform built on two dual-core AMD Opteron 275 CPUs, ASUS K8N-DL mainboard and NVIDIA Quadro FX 4500 graphics card.

by Ilya Gavrichenkov
09/16/2005 | 03:26 PM

The remarkable thing about 2005 is that it is the year dual-core processors appeared. Both major CPU developers, AMD and Intel, introduced their dual-core architectures, which immediately found their way into server solutions as well as desktop systems. The new architectures drew a strong response from users, because further progress in the CPU field will most likely come from multi-core designs.

 

Of course, we have already paid due attention to dual-core architectures. Several reviews on our site go into all the details of the dual-core processors from AMD and Intel, such as Athlon 64 X2 and Pentium D. So it looked as if there was not much left to talk about until new dual-core processor models are released. However, we still managed to find a very interesting topic related to multi-core CPU architectures: we decided to take a look at the way dual-core CPUs behave in a dual-processor system. In other words, we decided to resort to reconnaissance in force and find out the practical efficiency of a quad-core system today.

During our dual-core processor tests we already came across situations where contemporary software applications could not take good advantage of multi-threading, so that the increased number of execution cores inside a single CPU did not lead to any tangible performance improvement. We would expect contemporary software to be even less successful at loading four cores at a time. Even though Intel has been pushing the “virtual dual-core architecture” into the market for quite a while now through its Hyper-Threading technology, far from all software developers create even two parallel computational threads. Four parallel computational threads in a single program should be an even rarer occurrence.

Nevertheless, it is very easy to assemble a dual-processor system with dual-core CPUs nowadays. Intel currently offers only desktop dual-core processors and intends to introduce the dual-core architecture in the Xeon processor family only by the very end of this year. AMD, however, is already supplying dual-core solutions for servers and workstations. AMD Opteron CPUs with two cores are already a part of the 1XX, 2XX and 8XX processor families. It means that AMD already offers dual-core solutions for single-, dual- and quad-processor server and workstation systems.

For our research on the features and performance of a quad-core platform we decided to assemble a dual-processor system based on two AMD Opteron 2XX CPUs. Of course, if we intend to compare this dual-CPU system against high-end desktop solutions, it should be a workstation, not a server. Luckily, there are quite a few feature-rich dual-processor mainboards in the market today that could be used as a basis for a dual-Opteron workstation. These mainboards are built around professional chipsets for the K8 processor family, first of all NVIDIA nForce Professional. The leading mainboard vendors currently offer high-end products on this core logic with PCI Express bus support. This is exactly the kind of mainboard you might want to use for your dual-processor quad-core workstation.

Of course, it doesn’t make any sense to test a dual-processor system with dual-core CPUs in applications that do not support multi-threading. That is why today we are going to pay special attention to the applications that do support multi-threading and are typical for workstations. Therefore, there will be no gaming tests, where even regular (single-CPU) dual-core systems do not gain any performance. Instead we will specifically dwell on video encoding and processing applications, computer-aided design (CAD) systems and 3D modeling and rendering software. This is exactly the reason why we equipped our system with a professional graphics card, specifically designed for work in applications like that.

This way our article today pursues multiple goals. Firstly, we will find out the potential of contemporary workstations based on AMD Opteron CPUs. Secondly, we will be able to evaluate the prospects of quad-core systems in terms of the performance gain they get from the two additional cores in today’s most complex software packages. And thirdly, we will compare the performance of powerful workstation platforms against the top desktop systems from the high-end price range, which will allow us to conclude how efficient the financial investment into a workstation is today.

Well, now that the objectives have been set, let’s get started.

Building a Workstation

In this section we are going to take a closer look at all hardware components used for a workstation based on two dual-core AMD Opteron processors.

Mainboard

The basis of our dual-processor platform will be the ASUS K8N-DL mainboard. This solution is based on the NVIDIA nForce Professional chipset. It supports not only two Socket 940 processors, but also a PCI Express x16 graphics bus, which makes this platform ideal for a high-performance workstation. At the same time, ASUS K8N-DL doesn’t boast any superior “extras”, whose value would be pretty doubtful in our case, and thus it is a very attractive solution from the price standpoint. Moreover, this mainboard also offers some overclocking-friendly features, which is a very rare thing for solutions of this kind.

The formal specifications of this platform look as follows:

ASUS K8N-DL

CPU: Dual AMD Opteron for Socket 940
Chipset: NVIDIA nForce Professional
HyperTransport bus: 1 GHz
Clock generator frequency: 200-400MHz (with 1MHz increment)
Overclocking friendly functions: independently adjustable PCI Express bus frequency; adjustable Vcore, Vchipset, Vmem; adjustable HyperTransport bus voltage
Memory: 6 DDR DIMM slots for dual-channel Registered DDR SDRAM
PCI Express slots: 1 x PCI Express x16; 1 x PCI Express x1
PCI expansion slots: 2
USB 2.0 ports: 10 (4 on the rear panel)
IEEE1394 ports: 2 (1 on the rear panel; implemented by Texas Instruments TSB43AB22A controller)
ATA-100/133: 2 ATA-133 channels (in the chipset)
Serial ATA: 4 Serial ATA-150 channels (with RAID support, in the chipset); 4 Serial ATA-150 channels (with RAID support, implemented by Silicon Image SiI3114R controller)
ATA RAID support: RAID 0, 1, 0+1 supported in the chipset; RAID 0, 1, 0+1, 5 supported by Silicon Image SiI3114R controller
Integrated sound: 8-channel AC97 Realtek ALC850 codec
Integrated network: Gigabit Ethernet (Broadcom BCM5751 controller)
Additional features: None
BIOS: Award BIOS v6.00PG
Form-factor: ATX, 305mm x 267mm

Before we pass over to the technical details and peculiarities of the ASUS K8N-DL mainboard, I have to say a few words about the NVIDIA nForce Professional chipset, because we haven’t dealt with this core logic before. To tell the truth, there is nothing fundamentally new about it: this professional chipset is designed from the same building blocks as NVIDIA nForce4 Ultra. The thing is that the K8 architecture doesn’t require anything special from the chipset to support multi-processor systems. Opteron CPUs are equipped with three independent HyperTransport busses, which can be used to connect the CPU with the chipset bridges as well as to connect the CPUs with one another. Therefore, chipset bridges do not need any additional processor busses.

Moreover, this dual-processor system architecture allows building a system chipset from two chips with no direct connection between them. These chips may be connected with one another via the CPUs and the same HyperTransport bus.

NVIDIA engineers took this feature into account and implemented it exactly this way when designing their nForce Professional core logic. This chipset consists of two chips: the nForce Professional 2200, similar to the nForce4 Ultra MCP (media and communications processor), and the nForce Professional 2050, which is very similar to the former but has limited functionality. These two chips can be used together or separately. Their combination defines the features of each particular mainboard. To distinguish more clearly between nForce Professional 2200 and nForce Professional 2050, let me offer you a comparative chart listing their features side by side with those of the nForce4 Ultra/SLI MCP:

| | nForce4 SLI/Ultra | nForce Professional 2200 | nForce Professional 2050 |
|---|---|---|---|
| Processor bus | 1GHz HyperTransport | 1GHz HyperTransport | 1GHz HyperTransport |
| PCI Express busses | 20 independently configured lanes | 20 independently configured lanes | 1 x PCI Express x16, 3 x PCI Express x1 |
| SLI support | Yes | Yes | None |
| USB 2.0 | 10 ports | 10 ports | None |
| Serial ATA | 3Gbit/s | 3Gbit/s | 3Gbit/s |
| Serial ATA ports | 4 | 4 | 4 |
| Parallel ATA channels | 2 | 2 | 0 |
| RAID support | 0, 1, 0+1 | 0, 1, 0+1 | 0, 1, 0+1 |
| Gigabit Ethernet | Yes | Yes | Yes |
| Secure Networking Engine | Yes | Yes | Yes |
| NVIDIA Firewall 2.0 | Yes | Yes | Yes |
| PCI (32bit) bus | Yes | Yes | None |
| Integrated sound | 8-channel AC97 | 8-channel AC97 | None |

As we can see, nForce Professional 2200 is similar to nForce4 Ultra/SLI, and nForce Professional 2050 with very limited functionality can only perform as a companion chip.

From here we can derive typical configurations of nForce Professional based systems. A dual-processor workstation based on the nForce Professional 2200 chip can offer pretty much the same functionality as any nForce4 Ultra based mainboard designed for computer enthusiasts. Support for dual-processor configurations is implemented solely through the AMD Opteron CPUs, which boast three HyperTransport busses instead of the single one in Athlon 64.

If the mainboard developer decides to expand the functionality of the product, the nForce Professional 2200 chip may be paired with an nForce Professional 2050 chip. In this case the mainboard will support more Serial ATA ports (8) and two Gigabit Ethernet controllers instead of one. However, the most exciting thing about a system like that is the possibility to lay out two fully-fledged PCI Express x16 busses on the same mainboard PCB. This feature can come in handy for building SLI configurations, which are also supported by NVIDIA’s professional Quadro FX graphics card family. Moreover, in this case the SLI mode will be more efficient than the SLI mode in nForce4 SLI based systems, where the second graphics card of the pair is installed into a PCI Express bus cut down to x8.
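The way the two chips add up can be sketched in a few lines of Python. This is only an illustration: the per-chip numbers come from the comparison chart, while the `Mcp` class, the `combined_features` helper and the "one x16 graphics link per chip" figure are assumptions made for the example and not anything NVIDIA or ASUS publish in this form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mcp:
    """Per-chip resources, taken from the comparison chart."""
    name: str
    sata_ports: int
    gigabit_ethernet: int
    usb_ports: int
    pata_channels: int
    x16_links: int  # assumption: each chip drives at most one x16 graphics link


NFORCE_PRO_2200 = Mcp(
    name="nForce Professional 2200",
    sata_ports=4, gigabit_ethernet=1, usb_ports=10, pata_channels=2, x16_links=1,
)
NFORCE_PRO_2050 = Mcp(
    name="nForce Professional 2050",
    sata_ports=4, gigabit_ethernet=1, usb_ports=0, pata_channels=0, x16_links=1,
)


def combined_features(chips):
    """Sum the resources of every MCP placed on one mainboard."""
    fields = ("sata_ports", "gigabit_ethernet", "usb_ports",
              "pata_channels", "x16_links")
    return {field: sum(getattr(chip, field) for chip in chips) for field in fields}


if __name__ == "__main__":
    print("2200 only  :", combined_features([NFORCE_PRO_2200]))
    print("2200 + 2050:", combined_features([NFORCE_PRO_2200, NFORCE_PRO_2050]))
```

With both chips the sketch reports eight Serial ATA ports, two Gigabit Ethernet controllers and two x16 links, which matches the configuration described above.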

Although we have just discussed the architecture of dual-processor NVIDIA nForce Professional based systems, you can apply the same approach to building a single- or quad-processor platform. NVIDIA nForce Professional is a pretty universal solution from this standpoint.

ASUS K8N-DL mainboard uses the simplest modification of the nForce Professional chipset. So, there is only one MCP on the board and it is none other than nForce Professional 2200. Note that all diagnostic utilities recognize NVIDIA nForce Professional 2200 MCP and NVIDIA nForce4 as one and the same solution. Don’t you think it is a great argument proving the identity of these two chips?

This “universal” chip determines most features of the ASUS K8N-DL mainboard. This chip is responsible for the implementation of the PCI Express x16 slot, PCI Express x1 slot, two 32-bit PCI slots, four Serial ATA-II ports (out of 8), Parallel ATA ports, ten USB 2.0 ports and integrated sound, with an 8-channel Realtek ALC850 codec in the analog part.

There are a few additional controllers on the board, though. For example, ASUS engineers didn’t want to use the network controller integrated into the nForce Professional 2200 chipset. Instead, they integrated a Gigabit network controller from BROADCOM – PCI Express x1 BCM5751. Besides, ASUS K8N-DL is equipped with the additional Serial ATA RAID controller – Silicon Image SiI3114R, which implements the remaining four Serial ATA-I ports supporting RAID 0, 1, 0+1 and 5. The mainboard from ASUS also features two IEEE 1394 ports implemented via the Texas Instruments TSB43AB22A chip.

The most interesting features of ASUS K8N-DL mainboard are connected with the processors and memory support, of course. So, let’s dwell here for a while. The mainboard is equipped with two Socket 940 connectors, which allow installing one or two Opteron processors. The latest BIOS version at the time of this review – version 1004 – allows installing regular Opteron CPUs as well as dual-core ones.

There are memory slots next to both processor sockets: four DIMM slots next to the first one and two more DIMM slots next to the second one. Since Opteron processors feature a memory controller integrated into the CPU core, each of the CPUs in the dual-processor configuration can work with its own memory. However, thanks to NUMA technology (Non-Uniform Memory Architecture) the two CPUs operate within a single address space. In other words, each processor can address the memory of the other CPU in the system via the HyperTransport bus between them, and no special tricks are necessary to make it happen. Moreover, the CPU doesn’t really care where the requested data is stored. You should definitely keep in mind, though, that if the data is located in the other processor’s memory, the latency of the corresponding operation will be somewhat longer than if the data were in the processor’s own memory.
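To get a feeling for what this latency penalty means, here is a minimal Python sketch of the average access latency as a function of how often the data sits in the processor’s own memory. The model is a simple weighted average, and the two latency values are hypothetical placeholders chosen for illustration; they are not measurements of this system.

```python
def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Weighted average latency when a share of accesses hits local memory.

    local_fraction: share of accesses served from the CPU's own memory (0..1)
    local_ns, remote_ns: latency of a local and of a remote access
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Hypothetical placeholder latencies, NOT measured on the test system.
    LOCAL_NS = 100.0
    REMOTE_NS = 150.0
    for fraction in (1.0, 0.5, 0.0):
        latency = average_latency_ns(fraction, LOCAL_NS, REMOTE_NS)
        print(f"{fraction:.0%} local accesses: {latency:.1f} ns on average")
```

The point of the sketch is only the shape of the dependency: the more data a thread finds in the memory of its own CPU, the closer the average latency gets to the local figure.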

The Opteron memory controllers work only with Registered DDR SDRAM. Therefore, large amounts of memory can be supported. ASUS K8N-DL with its 6 DIMM slots supports up to 24GB of memory (if you install 4GB Registered DIMMs, which are just about to appear in the market).

Also, just like in Socket 939 systems, the Opteron memory controller features a dual-channel architecture. Therefore, the memory modules in your system should be installed in pairs: only in this case will you get the maximum performance. In a dual-processor configuration, you’d better use four identical memory modules and allocate the same amount of memory to each of the system CPUs. This will theoretically allow reaching the memory bandwidth of a quad-channel memory subsystem.

Now that we have listed all the major features of our ASUS K8N-DL mainboard, let’s take a closer look at its PCB layout.



To tell the truth, it didn’t strike me as great at first glance. On the other hand, you can hardly expect a dual-processor workstation mainboard to be very conveniently designed for easy system assembly. Firstly, a mainboard like that is packed with many more onboard electronic components than any enthusiast platform, and secondly, workstations are not upgraded or modified as frequently as enthusiast systems.

ASUS K8N-DL mainboard features a slightly bigger PCB than regular ATX mainboards. However, it is at the same time smaller than most dual-processor mainboards out there and should fit easily into system cases designed to support the Extended ATX form-factor.

ASUS engineers tried really hard to move as many connectors as possible to the left-hand side of the PCB: away from the processor sockets and closer to the expansion slots. As a result, there are only the 24-pin and 8-pin ATX power connectors and an FDD connector on the right-hand side of the board. Frankly speaking, the power supply connectors are located very conveniently. The chipset Serial ATA connectors are also reasonably placed: they are moved to the front edge of the PCB right before the expansion slots, together with one of the Parallel ATA connectors. In order to prevent the cables from hanging all over the place, ASUS engineers have even turned some of these connectors parallel to the PCB edge.

The placement of all other connectors left us pretty puzzled. One of the biggest drawbacks is the location of some ATA ports on the very left side of the board and placement of the case front panel connectors in the far left corner of the PCB.

During the system assembly we also encountered the following problems. If the graphics card installed into the PCI Express x16 slot features a massive cooling system, it blocks one of the two PCI slots. Moreover, since the chipset cooler sits really close to the graphics slot lock, you will have a hard time removing the graphics card from its slot if you have to. You can also face some difficulties when you try to plug in the power for the first processor cooler. The corresponding connector is squeezed between the DIMM slots and the processor socket, which makes it pretty hard to reach.

The funny thing about the ASUS K8N-DL PCB design is that it brings back the times when the system was configured by resetting jumpers. All onboard controllers integrated onto this mainboard can be disabled only by jumpers and not in the BIOS Setup. You should keep this in mind, because once the mainboard has been installed into the system case together with the CPUs and all expansion cards, you will have a very hard time reaching the jumpers you might want to reset.

As for the BIOS Setup of our ASUS K8N-DL, it is very similar to the BIOS Setup of many other ASUS mainboards, even though the features list is different in our case. Among these differences I would like to point out the memory subsystem configuration options first of all.

Alongside the common frequency and latency settings of the on-die memory controller, there are quite a few ECC configuration options.

Other than that the BIOS Setup of ASUS K8N-DL features pretty common settings.

Hardware monitoring implemented in ASUS K8N-DL mainboard doesn’t boast anything outstanding. Its only remarkable peculiarity is the ability to monitor and control the temperature and fan rotation speed for each CPU independently. The board also offers the special Q-Fan technology that adjusts both CPU fan rotation speeds depending on the CPU temperature.

The BIOS Setup also offers a few overclocking friendly options. First, the mainboard allows changing the clock generator frequency from the nominal 200MHz up to 400MHz with 1MHz increment.

Also, the mainboard allows adjusting numerous voltages.

The memory voltage can be increased up to 2.9V (with 0.1V increment), the chipset voltage can be raised from the nominal 1.5V to 1.8V with 0.1V increment, the HyperTransport voltage can also be raised above the nominal 1.2V to 1.35V with 0.05V increment. As for the processors core voltage, the choice is quite limited. The BIOS Setup only allows either setting the nominal Vcore value or raising it by 50mV.

However, the mere fact that this dual-processor workstation platform offers any overclocking friendly features at all is already very valuable. Unfortunately, we couldn’t really enjoy all this overclocking fun to the full extent. The thing is that we only had the top dual-core Opteron processor models at our disposal, which work at 2.2GHz nominal frequency. Since ASUS K8N-DL doesn’t allow reducing the CPU clock frequency multiplier, the overclocking is limited by the processor potential in our case. The best we managed to achieve without losing system stability was a 212MHz clock generator frequency. The CPUs worked at 2.33GHz in this case. However, even this relatively modest result indicates that ASUS K8N-DL is suitable for overclocking. So, it can turn into a very attractive platform when used with lower-end AMD Opteron models with relatively low clock rates.
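The arithmetic behind these numbers is easy to reproduce. The short Python sketch below derives the fixed multiplier from the nominal 2.2GHz and 200MHz figures quoted in the text and applies it to the 212MHz clock generator frequency; the function names are made up for this illustration.

```python
NOMINAL_REFERENCE_MHZ = 200   # nominal clock generator frequency
NOMINAL_CPU_MHZ = 2200        # nominal frequency of the tested Opterons
OVERCLOCKED_REFERENCE_MHZ = 212


def cpu_multiplier(cpu_mhz, reference_mhz):
    """Multiplier that turns the clock generator frequency into the CPU clock."""
    return cpu_mhz / reference_mhz


def cpu_clock_mhz(reference_mhz, multiplier):
    """CPU clock for a given clock generator frequency and fixed multiplier."""
    return reference_mhz * multiplier


if __name__ == "__main__":
    multiplier = cpu_multiplier(NOMINAL_CPU_MHZ, NOMINAL_REFERENCE_MHZ)
    overclocked = cpu_clock_mhz(OVERCLOCKED_REFERENCE_MHZ, multiplier)
    gain_percent = (overclocked / NOMINAL_CPU_MHZ - 1) * 100
    print(f"multiplier: {multiplier:g}x")
    print(f"overclocked CPU clock: {overclocked:g} MHz")
    print(f"gain over nominal: {gain_percent:.1f}%")
```

With an 11x multiplier, 212MHz gives 2332MHz, i.e. the 2.33GHz mentioned above, which is a 6% increase over the nominal frequency.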

CPUs

AMD offers Opteron 2XX processors for dual-CPU system configurations. Among them there are three dual-core models today: Opteron 265 working at 1.8GHz, Opteron 270 working at 2.0GHz and Opteron 275 working at 2.2GHz. We tried to get a pair of the latter for our tests today.

Luckily, our efforts were not in vain: we managed to get hold of two dual-core Opteron CPUs working at the actual 2.2GHz frequency. However, these were not the 275 models, but CPUs from a more heavyweight category: Opteron 875. It doesn’t really matter for our investigation today whether we have Opteron 275 or Opteron 875 installed into our system, because the only difference between them is the ability of the “heavyweights” to work in systems with more than two processors. In dual-processor systems these CPUs perform identically.

As you can see from the picture above, the OPN of the Opteron CPUs we got was OSA875FKA6BS. Unfortunately, we didn’t find the specification of this particular model on AMD’s official web-site. However, if we read and decode this marking, we will get the following details:

Dual-Core AMD Opteron™ Processor Details

CPU ID: Dual-Core AMD Opteron™ Processor Model 875
Model: 875
Ordering Parts Number (OPN): OSA875FKA6BS
Stepping: E1
Frequency: 2.2GHz
HyperTransport Technology Speed: 1000MHz
Integrated Memory Controller: 2.2GHz
Core Voltage: 1.35V
Case Temperature: 71°C
Wattage: 95.0W
L2 Cache Size: 2 x 1 MB
L2 Cache Speed: 2.2GHz
Manuf. Technology: .09 micron SOI
Socket: Socket 940
Amperage: 66.1 A
Warranty: 1 year

This Opteron processor has the maximum working frequency for a dual-core solution of this family: 2.2GHz. In other words, the CPUs that we will be testing today are the fastest solutions of the kind. Note that dual-core Athlon 64 X2 CPUs for desktop systems have already reached higher frequencies than that. For instance, Athlon 64 X2 4800+ and Athlon 64 X2 4600+ work at 2.4GHz. AMD evidently doesn’t chase higher clock rates with the Opteron processors, because it prefers to keep these server and workstation solutions within the standard thermal range of 95W. As for the typical heat dissipation of the 2.4GHz CPUs from AMD, it is around 110W today. Server and workstation platforms cannot heat up that much.

The diagnostic CPU-Z utility reports the following details about our Opteron 875 processor:

This utility correctly recognizes the 1MB L2 cache of each processor core. This Opteron CPU is based on the E1 core revision. In other words, this processor belongs to the CPU family with E core stepping and hence supports SSE3 instructions.

All in all, the dual-core Opteron architecture is very similar to that of the dual-core Athlon 64 X2 processor. The difference is in the memory controller and the number of HyperTransport busses.

Even though the Opteron processors have a dual-channel memory controller, they work only with Registered DDR DIMM modules. As a result, they allow more system memory than their desktop counterparts. However, you have to sacrifice speed for the sake of larger memory capacity, because Registered memory modules are usually slower and do not support very aggressive latency settings.

Memory

So, we needed Registered DDR SDRAM modules. Right now there are a lot of Registered DDR400 modules in the market with different capacities. For our tests we used HyperX memory modules from Kingston.

These memory kits with the KRX3200AK2/1G part number consist of a pair of 184-pin Registered DDR400 DIMM modules supporting ECC. Each kit has a capacity of 1GB, i.e. it consists of two 512MB modules. The nominal frequency this memory works at is 400MHz and the voltage is 2.6V. This memory can work with 2.5-3-3-6 timing settings. At least this is what the specification claims. The SPD of these memory modules contains slightly different info: the timing settings listed there are a little less aggressive: 3-3-3-8.

During our practical tests we managed to prove that Kingston HyperX KRX3200AK2/1G memory modules work stably and reliably at 2.5-3-3-6 timing settings. That is why all our dual-Opteron workstation testing will be carried out with these particular timings.

Note that although these timing settings of 2.5-3-3-6 could look very poor for regular unbuffered DDR400 SDRAM, they are pretty typical for today’s registered memory solutions. It is very hard to find Registered memory with more aggressive timing settings today. So, I don’t think anybody will accuse us of having made the wrong choice of memory for our workstation.

During our workstation tests we installed two sets of Kingston HyperX KRX3200AK2/1G memory, i.e. the total of 4 modules, 512MB each. So the overall memory capacity on our platform was 2GB.

The system memory was distributed between the two CPUs evenly: two memory modules were installed into the DIMM slots assigned to the first CPU, and the other two – into the DIMM slots of the second CPU. This configuration allowed us to use the symmetric implementation of the NUMA technology. So, our workstation had four independent DDR400 SDRAM channels, which provided 12.8GB/s theoretical peak bandwidth of the memory subsystem.
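The 12.8GB/s figure follows directly from the channel parameters. As a small worked example in Python: a DDR400 channel performs 400 million transfers per second over a 64-bit data bus (ECC bits are not counted as payload), and our system has two such channels per CPU. The helper name is hypothetical and the result is a theoretical peak, not something any benchmark will reach.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits, channels):
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB)."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * bytes_per_transfer * channels / 1000


if __name__ == "__main__":
    DDR400_MT_S = 400      # DDR400: 400 million transfers per second
    CHANNEL_WIDTH = 64     # data bits per DDR channel
    per_channel = peak_bandwidth_gb_s(DDR400_MT_S, CHANNEL_WIDTH, channels=1)
    per_cpu = peak_bandwidth_gb_s(DDR400_MT_S, CHANNEL_WIDTH, channels=2)
    whole_system = peak_bandwidth_gb_s(DDR400_MT_S, CHANNEL_WIDTH, channels=4)
    print(f"one channel : {per_channel} GB/s")
    print(f"one CPU     : {per_cpu} GB/s")
    print(f"two CPUs    : {whole_system} GB/s")
```

Keep in mind that each CPU only reaches the other half of this bandwidth through the HyperTransport link, so a single thread will not see all four channels at once.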

Graphics Card

Of course, a good workstation requires a good graphics card. If you are planning to use your dual-processor platform for CAD applications and 3D modeling tasks, it would be ideologically wrong to equip it with a regular 3D gaming graphics accelerator, especially since the leading 3D graphics solution developers offer professional products for these specific needs.

For our test session we managed to borrow a graphics card based on the latest and greatest professional GPU from NVIDIA – Quadro FX 4500. It is based on the G70 architecture. This card also became a part of our dual-processor workstation.

 

Since the NVIDIA Quadro FX 4500 graphics processor is based on the same architecture as NVIDIA GeForce 7800 GTX, their features have a lot in common. However, the professional graphics card works at a higher core clock frequency and hence provides better performance. The main part of the Quadro FX 4500 core works at 550MHz, while the GeForce 7800 GTX chip works at 430MHz. The vertex processor unit of both solutions works at the same frequency of 470MHz.

The graphics memory of the professional and the gaming graphics card also works at different frequencies. The GDDR3 memory used on the gaming graphics accelerator works at 1200MHz, while the graphics memory on Quadro FX 4500 works at a lower frequency of 1050MHz. In both cases the memory communicates with the GPU via a 256-bit bus. However, the contemporary GeForce 7800 GTX cards feature only 256MB of memory, while the professional Quadro FX 4500 boasts a 512MB frame buffer.
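To put the two memory frequencies into perspective, the following Python sketch converts them into theoretical peak bandwidth. It assumes that the 1200MHz and 1050MHz figures are effective data rates (one transfer per quoted clock over the 256-bit bus); the helper name is made up for this example.

```python
BUS_WIDTH_BITS = 256  # both cards use a 256-bit memory bus


def graphics_memory_bandwidth_gb_s(effective_mhz, bus_width_bits=BUS_WIDTH_BITS):
    """Theoretical peak bandwidth in GB/s, assuming effective_mhz is the
    effective data rate in millions of transfers per second."""
    return effective_mhz * (bus_width_bits // 8) / 1000


if __name__ == "__main__":
    gaming = graphics_memory_bandwidth_gb_s(1200)        # GeForce 7800 GTX
    professional = graphics_memory_bandwidth_gb_s(1050)  # Quadro FX 4500
    deficit_percent = (1 - 1050 / 1200) * 100
    print(f"GeForce 7800 GTX: {gaming} GB/s")
    print(f"Quadro FX 4500  : {professional} GB/s")
    print(f"Quadro FX 4500 has {deficit_percent:.1f}% less peak memory bandwidth")
```

Under this assumption the gaming card gets 38.4GB/s against 33.6GB/s for the professional one, a 12.5% difference that the Quadro compensates for with twice the frame buffer size.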

Higher GPU frequencies and the desire to make the card as quiet as possible resulted in an absolutely unique cooling system design on Quadro FX 4500. This cooler is relatively big, so it takes up the space allocated for the next expansion slot on the mainboard. It features four heatpipes and an 80mm fan. As a result, the cooling efficiency has improved significantly: it increased by 50% compared with the cooling solutions of the previous-generation graphics cards. Moreover, this cooling system generates 30% less noise because the fan doesn’t have to speed up to its maximum rpm rate.

NVIDIA Quadro FX 4500 features two DVI Outs and supports SLI technology. So extreme professionals (or professional extreme fans?) will have the chance to enjoy even greater performance than a single NVIDIA Quadro FX 4500 can deliver. Moreover, they will have at their disposal such features as 32x full screen antialiasing and quad-monitor support.

First Boot-Up

Well, now that we are done with the description of our workstation components and assembly tips, let’s move towards some practice. Let’s see how all these hardware monsters work together. When we installed the CPUs, processor coolers, system memory and the graphics card into our mainboard, our system looked as follows:

Frankly speaking, we didn’t expect to face any problems with booting this system. Especially since we used very high-quality expensive components to build it up. Moreover, we would expect the hardware manufacturers to pay special attention to the reliability, stability and compatibility of their server and workstation equipment. However, our hopes didn’t come true.

The system refused to complete the POST stage and hung right after the video initialization, so that we couldn’t even access the BIOS Setup.

We also had some difficulties finding out what was causing this problem. Our lab POST-controller card with the PCI interface simply couldn’t fit onto the mainboard. The Quadro FX 4500 cooling system took up the entire space over the first PCI slot and didn’t let us install into the second PCI slot any expansion card with protruding elements on the back side of the PCB. Unfortunately, our POST-controller card had its LED indicator on the back of the PCB, so we had to temporarily replace the Quadro FX 4500 graphics card with a different graphics solution featuring a smaller cooling system.

By the way, the ASUS K8N-DL mainboard we used does display some POST codes on the monitor during system boot-up, however, we couldn’t really take advantage of it, because the system would hang even before the codes started displaying.

Once we faced all that hassle, we first of all supposed that the problem might be in the BIOS. We received the mainboard with the BIOS version 1003, and as far as we could understand this BIOS version didn’t have fully-functional support of dual-core processors. We found a newer BIOS version 1004 on ASUS’ web-site. But how should we reflash the BIOS if the mainboard freezes in the very beginning of the POST stage?

Luckily, this was an easy one to resolve. ASUS K8N-DL mainboard with the BIOS version 1003 misbehaves only when two dual-core CPUs are installed. Four cores turned out to be too much for this BIOS version to handle. If you leave only one dual-core CPU in, the mainboard will boot up successfully. Once we discovered this trick we flashed the new BIOS version 1004, which proved capable of working fine with the two dual-core CPUs of our system.

Having reflashed the BIOS, we put the NVIDIA Quadro FX 4500 graphics card back in its place and installed the operating system, which immediately recognized all four processor cores.
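If you want to verify the same thing on your own system, a few lines of Python are enough. The sketch below is only an illustration: `os.cpu_count()` is a standard library call that reports the number of logical processors the operating system sees, while the `expected_logical_processors` helper is a hypothetical name introduced for this example.

```python
import os


def expected_logical_processors(sockets, cores_per_socket, threads_per_core=1):
    """Number of logical processors the OS should report for a configuration."""
    return sockets * cores_per_socket * threads_per_core


def detected_logical_processors():
    """Logical processors visible to the OS (falls back to 1 if unknown)."""
    count = os.cpu_count()
    return count if count is not None else 1


if __name__ == "__main__":
    # Two sockets with dual-core Opterons, no Hyper-Threading.
    expected = expected_logical_processors(sockets=2, cores_per_socket=2)
    detected = detected_logical_processors()
    print(f"expected for the dual Opteron workstation: {expected}")
    print(f"detected on this machine: {detected}")
```

The same helper explains why a single dual-core CPU with Hyper-Threading also shows up as four logical processors: one socket, two cores, two threads per core.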

To confirm our success and get some idea of how fast a dual-processor system built with two dual-core CPUs could be, we ran the synthetic SiSoft Sandra 2005 SP2 benchmark. The impressive results didn’t keep us waiting for long:

Judging by the numbers, our workstation boasts enormous potential. We were especially pleased with the memory bandwidth tests. Here you go: this is how NUMA really works. As for the arithmetic benchmark results, they will surely give AMD fans a few moments of pride and satisfaction. The algorithms of the SiSoft Sandra 2005 benchmarks parallelize very well. That is why our system built with dual-core processors can be even faster than regular quad-CPU platforms, not to mention dual-CPU platforms based on Intel Xeon processors.

Well, this is all great, but I wouldn’t rely on the SiSoft Sandra 2005 SP2 benchmark set as the primary reference point. The real field testing is just about to begin.

Testbed and Methods

We will compare our dual-processor workstation based on two dual-core Opteron CPUs against today’s latest and most expensive desktop systems, all equipped with 2GB of system memory and the professional NVIDIA Quadro FX 4500 graphics accelerator. So, we compared the performance of the dual-processor platform with that of the systems based on today’s most expensive CPUs from the Athlon 64 X2, Athlon 64 FX, Pentium Extreme Edition and Pentium 4 Extreme Edition processor families. Moreover, we also tested our ASUS K8N-DL based platform with a single dual-core Opteron CPU installed, so that we could draw some conclusions about the performance changes brought by the second dual-core processor working at the same clock rate.

So, here is a list of hardware components that we used in our tests today:

Note that we had to give up the idea of running our tests in 64-bit Windows XP Professional x64 Edition, because many professional applications still boast very limited compatibility with this operating system.

Performance

Before we move on to discussing the results of our tests, I would like to say a few words about the labels we are going to use on our diagrams. Even though we actually tested AMD Opteron 875 processors, we will label the results of our workstation in the “correct terms”. In other words, we will call it “Dual Opteron 275”. The results shown by our workstation platform with a single processor installed will be labeled “Single Opteron 175”.

Futuremark PCMark05

We decided to start our test session with the popular PCMark05 from Futuremark. Although we cannot regard it as a professional application, it will still be very interesting to look at the dual-processor system performance there, especially since this benchmark has some scenarios running four parallel threads at a time. And this is exactly what we need to load our workstation based on two dual-core CPUs to the full extent.

As we expected, the processor test shows the indisputable advantage of the dual-processor system over the single-processor one. The workstation with two Opteron 275 processors outperforms the same platform with only one Opteron 175 processor by about 25%. As a result, it proves to be the fastest of all the tested systems in PCMark05.

In fact, we should also point out the high result demonstrated by the Pentium Extreme Edition 840 based platform in this test. This CPU can also process four computational threads at a time as it contains two processor cores, each supporting virtual multi-core technology – Hyper-Threading.

The memory test from the same PCMark05 suite reveals a rather strange result, I should say. As we can see, our dual-processor workstation based on dual-core Opteron CPUs scores quite low. This wouldn’t be surprising at all if it didn’t support NUMA technology, because Registered memory modules are inherently slower than their unbuffered counterparts. However, in this case the dual-processor system has four memory access channels, and should theoretically be at least as fast as the same single-processor platform.

However, there appears to be a pretty logical explanation for this seemingly strange behavior of our Dual Opteron 275. You can really benefit from NUMA technology only if you address the memory in two independent threads. The PCMark05 memory test is single-threaded, which is why it is not really representative of what the memory subsystem of our dual-processor Opteron workstation is capable of.
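The effect can be illustrated with a rough back-of-the-envelope model. The sketch below assumes registered DDR400 memory (our assumption for illustration), two 64-bit memory channels per CPU, and the simplification that a single thread can keep only one CPU's memory controller busy. The function and constant names are made up for this example, and the figures are theoretical peaks, not measurements.

```python
# Rough peak-bandwidth model for a two-socket NUMA system.
# Assumptions (not measurements): DDR400 memory, two 64-bit channels per CPU,
# and one thread can saturate at most one CPU's memory controller.

TRANSFERS_PER_SEC = 400e6   # DDR400: 400 million transfers per second
BYTES_PER_TRANSFER = 8      # one 64-bit channel moves 8 bytes per transfer
CHANNELS_PER_CPU = 2        # dual-channel controller in each CPU


def peak_per_cpu_gbs():
    """Theoretical peak bandwidth of one CPU's memory controller, in GB/s."""
    return TRANSFERS_PER_SEC * BYTES_PER_TRANSFER * CHANNELS_PER_CPU / 1e9


def usable_peak_gbs(threads, cpus):
    """Upper bound on bandwidth when `threads` independent threads access memory."""
    busy_controllers = min(threads, cpus)
    return busy_controllers * peak_per_cpu_gbs()


for threads in (1, 2):
    print(threads, "thread(s):", usable_peak_gbs(threads, cpus=2), "GB/s peak")
```

Under these assumptions a single-threaded test sees no more theoretical bandwidth on the dual-processor system than on the single-processor one, while two independent threads can in principle reach twice as much.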

ScienceMark 2.0

ScienceMark 2.0 measures performance in typical scientific algorithms involved in mathematical modeling tasks.

The obtained results are quite confusing. Of course, this test runs evidently faster on dual-core processors than on single-core ones. Yet once we add a second dual-core CPU, the results do not get any better. We managed to find a logical explanation in this case, too. ScienceMark is optimized for dual-core systems in such a way that the parallel calculations are processed in two threads. Our dual-processor system, however, features four computational cores and can show its best results only when four computational threads are running simultaneously. So, ScienceMark 2.0 needs deeper optimization before we will see the real advantage of the second dual-core CPU in our system.
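The two-thread cap can be expressed as a very simple model. The sketch below is only an illustration under idealized assumptions (perfectly parallel work, no synchronization or memory overhead); the function name is hypothetical and has nothing to do with ScienceMark's actual code.

```python
def ideal_speedup(worker_threads, cores):
    """Best-case speedup over a single core.

    A thread can occupy at most one core at a time, so any cores beyond
    the number of worker threads simply stay idle.
    """
    return min(worker_threads, cores)


print(ideal_speedup(2, 2))  # two threads on one dual-core CPU   -> 2
print(ideal_speedup(2, 4))  # two threads on two dual-core CPUs  -> 2
print(ideal_speedup(4, 4))  # four threads on two dual-core CPUs -> 4
```

In this idealized picture, a program that spawns exactly two worker threads gains nothing from the second dual-core CPU, which matches the behavior we observe.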

Video Encoding

When we tested the desktop dual-core processors, we pointed out that contemporary video codecs do support multi-threading. Let’s take a look at how a more serious system with two dual-core CPUs could speed up the video encoding process.

The picture we observe with the popular codecs reminds us of the ScienceMark results we have just discussed. And the reasons are pretty much the same. Not so long ago the codec developers optimized their products for systems capable of processing two computational threads at a time, and now we are asking them to support systems with four cores… Unfortunately, most codecs are still unable to load four processor cores with work at a time.

Video Editing

During our dual-core processor tests we have already mentioned that the Adobe Premiere non-linear video editing system and the Adobe After Effects package for visual effects and computer graphics are well-optimized for multi-processor systems. Let’s take a look at the performance of our Dual Opteron 275 workstation in these applications:

If you deal with non-linear video editing and video effects applications, then you should definitely consider getting a dual-processor system built on dual-core Opteron CPUs. A workstation like that, featuring four computational cores, offers significantly better performance in Adobe Premiere Pro and Adobe After Effects. With the second dual-core CPU in your system you will get about a 71% performance increase in Adobe Premiere Pro and 14% in Adobe After Effects. Moreover, the Dual Opteron 275 workstation gets very far ahead of all platforms built around top-of-the-line desktop CPUs.

Image Editing

As we have just seen, Adobe took the optimization of their software applications for multi-processor platforms very seriously. I wonder if this is also true for Adobe Photoshop, which is one of the most popular graphics editing applications among professionals and amateurs. Let’s find out!

According to the obtained results, Photoshop also benefits a lot from additional computational resources: the system runs noticeably faster, just like in video editing applications. Our platform with two dual-core processors proves about 20% faster than the same system with only one dual-core CPU. As a result, our Dual Opteron 275 platform gets far ahead of the fastest platforms using desktop processors.

3ds max 7

The 3ds max 7 software package doesn’t need any introduction, I suppose. It is a very popular package for professional 3D animation and modeling. We used the SPECapc script to measure the performance of our systems in this application. The script reports two results: the first number indicates performance in viewports, and the second indicates performance during final rendering.

As we can see, 3ds max 7 doesn’t use any multi-threading in viewports. That is why the winner here is the single-processor system with a single-core AMD Athlon 64 FX-57 CPU working at the actual clock frequency of 2.8GHz.

However, when it comes to final rendering, the number of processor cores does matter a lot. The Dual Opteron 275 system runs 47% faster than the same system with only one dual-core processor and becomes the unchallenged leader in this race.

Maya 6.5

Maya is another professional 3D graphics package. We tested our system with the help of two scripts: SPECapc in viewports and ZOORender for final rendering speed measurements.

The situation in viewports is exactly the same as in 3ds max 7. The advantages of multi-processor architecture are not used at all.

But during the final rendering the performance difference between a single-processor and a dual-processor system based on dual-core Opteron CPUs is simply tremendous. With the second dual-core CPU in the system, your workstation will perform Maya final rendering 93% faster!

Lightwave 8

Lightwave is one more professional 3D graphics application. Here we measured the final rendering speed in two test scenes.

We have already pointed out more than once that final rendering speed in Lightwave depends a lot on the scene structure. This proves absolutely true in today’s test session. Nevertheless, our workstation performs very well in both scenes and retains the leading position throughout.

CINEBENCH 2003

The special CINEBENCH 2003 test shows how efficient our platforms are in the three-dimensional Cinema 4D software application, which is extremely popular among Mac platform fans.

The dual-processor system performs the rendering tasks very fast. It is 74% faster than the identical single-processor platform.

As for the OpenGL performance, Dual Opteron 275 performs quite modestly here, because the application doesn’t support multi-threading.

AutoCAD 2006

AutoCAD 2006 is a popular computer-aided design tool.

Just like the wireframe mode in 3D modeling applications, AutoCAD doesn’t support multi-threading, so the winner here is the AMD Athlon 64 FX-57.

SolidWorks 2005

SolidWorks 2005 is an integrated environment for component 3D modeling, construction and graphic designing. We resorted to the SPECapc script again to perform the tests here.

Just like AutoCAD, SolidWorks 2005 doesn’t support multi-processor architectures. So, our tests prove that if you specialize in applications like this, you will not benefit from dual-core CPUs and multi-processor systems.

Conclusion

The results obtained on a dual-processor workstation built on dual-core AMD Opteron 275 CPUs suggest a lot of interesting conclusions. But before we start making statements, I would like to express my admiration for AMD, which managed to design and bring into mass production dual-core server and workstation processors ahead of Intel. The introduction of dual-core architectures is truly beneficial for this market. As a result, dual-processor systems built with dual-core AMD Opteron CPUs can boast unprecedentedly high performance among platforms of this kind.

And as for the actual conclusions, we could say the following. Despite the fact that systems similar to the one we have tested today have huge computational potential, users may not be able to take good advantage of it all the time. The situation is much simpler with multi-processor server platforms, however. Since all server applications are multi-threaded by nature, the use of two dual-core CPUs instead of two single-core ones will almost always have a positive effect on the system performance. As for workstations, you should have a good idea of what type of tasks the machine is going to be used for.

As we have just seen during our test session, there are a lot of professional applications that are not optimized for multi-processor systems, and some of them are optimized only for systems with a maximum of two execution cores. As for the applications that can really use a pair of dual-core Opteron processors to their advantage, they are not that widespread yet. Among them I should definitely mention 3D rendering applications as well as professional video and image editing tools. In all other cases you cannot benefit that greatly from a system with four execution cores.
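One way to put the measured gains into perspective is Amdahl's law. This is our own framing here, not something the benchmarks themselves report. Assuming the run time on n cores is proportional to (1 - p) + p/n for a parallel fraction p, and ignoring memory and NUMA effects, the speedup s from two cores to four implies p = (s - 1) / (0.75 s - 0.5). The sketch below applies this to the gains quoted in this review; the function name is hypothetical.

```python
def parallel_fraction(speedup_2_to_4):
    """Parallel fraction p implied by Amdahl's law for a given speedup
    when going from two cores to four.

    Derived by solving s = ((1 - p) + p/2) / ((1 - p) + p/4) for p.
    """
    s = speedup_2_to_4
    return (s - 1.0) / (0.75 * s - 0.5)


# Gains from the second dual-core CPU, in percent, as quoted in this review.
gains_percent = {
    "Maya final rendering": 93,
    "CINEBENCH rendering": 74,
    "Adobe Premiere Pro": 71,
    "3ds max final rendering": 47,
    "PCMark05 CPU test": 25,
    "Adobe Photoshop": 20,
    "Adobe After Effects": 14,
}

for name, gain in gains_percent.items():
    p = parallel_fraction(1 + gain / 100)
    print(f"{name}: about {p:.0%} of the work scales with the core count")
```

Under this simplified model, the 93% gain in Maya rendering corresponds to roughly 98% parallel work, whereas the 20% gain in Photoshop implies that only about half of the work is spread across the cores.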

For example, computer-aided design systems, as well as viewport mode in 3D modeling tasks, use only a single computational thread. That is why you can get the best performance there if you go for one single-core processor and a powerful professional graphics card.

Video encoding applications are also a good example here. Most video codecs cannot take advantage of more than two cores. Once you add another dual-core CPU, nothing really happens and its potential gets wasted. The same is true for some other applications, too. In these cases it doesn’t make much sense to go for a dual-processor platform with Opteron 275 class processors: the best choice you can make here would be a dual-processor system with two single-core CPUs or a single-processor system with one dual-core processor.

When we tested the first dual-core desktop processors, we already shared our frustration that there were not that many applications available that would load both cores evenly and efficiently. The results of today’s tests of the Dual Opteron 275 workstation platform also left us pretty sad. There are very few applications that can really use the power of this platform. So it looks like software developers have a long way to go before quad-core processors go mainstream.