by Ilya Gavrichenkov
05/18/2007 | 06:57 PM
The multi-core paradigm has become a fact of life. Users have had a chance to enjoy the advantages of CPUs with two computational cores, and as a result sales of these processors keep growing day by day. At the same time, it is absolutely clear that core multiplication has not yet reached its apogee. Quad-core CPUs are starting to push their way into the market, and for now they are positioned for higher-priced systems for computer enthusiasts. The ever-growing interest in platforms optimized for multi-threaded performance has inspired CPU manufacturers to introduce high-performance dual-socket systems featuring two multi-core processors.
AMD was the first to probe this path with its Quad FX platform built around two dual-core Athlon 64 FX processors. The launch of this platform was primarily driven by AMD’s desire to respond to its competitor’s quad-core solutions, because technological difficulties would not let AMD release its own processors with four cores in a single package at that time. However, our tests revealed that the current AMD Quad FX platform falls well behind Intel’s quad-core processors, such as Core 2 Quad and Core 2 Extreme, in every respect, including performance. Despite this, AMD managed to impress its fans with the prospects for the future evolution of Quad FX. Company representatives promised that the dual-socket desktop platform would continue evolving after the upcoming quad-core processors on the K10 micro-architecture come out. This means that the Quad FX platform will eventually allow building systems with a total of eight processor cores.
Of course, Intel couldn’t leave this initiative unanswered. The microprocessor giant, which has been strengthening its position day by day since the launch of the Core micro-architecture, couldn’t disregard its competitor’s ability to introduce a desktop platform with two quad-core processors. The “refreshed” Quad FX platform will be known as FASN8 and will feature CPUs codenamed Agena FX. And even though it is scheduled to launch only in Q3, Intel began pushing its alternative solution back in January. The first demonstration of a concept system equipped with two quad-core processors on the Core micro-architecture took place at CES 2007, where it was codenamed V8. By now Intel has mastered mass production of all components for this platform, and of course we couldn’t help getting our hands on them right away.
Today we are going to discuss the first desktop Intel V8 system that the manufacturer positions as an eight-core desktop PC for media content creation.
As you may remember, AMD developed special processors for its Quad FX platform and also had core logic and mainboard developers introduce dedicated solutions for it. Intel took another, much simpler route. Physically, the V8 platform is nothing more than a traditional workstation based on quad-core Xeon (Clovertown) processors installed in a mainboard on the i5000X (Greencreek) chipset. From this standpoint, Intel’s solution is not too innovative. Apple has long been offering systems built on the same principles under the Mac Pro name. By the way, a new configuration with a pair of high-speed quad-core processors on the Core micro-architecture has recently appeared in the Mac Pro product line.
Intel recommends using top-of-the-line Xeon processors as a basis for its enthusiast platform. These CPUs are also known as Clovertown. They are based on the Core micro-architecture and have a quad-core internal structure. In other words, they are very similar to the quad-core Core 2 Quad and Core 2 Extreme processors, aka Kentsfield. However, since Xeon processors are intended for servers and high-performance workstations, their socket design differs slightly from that of desktop CPUs: they use the LGA 771 form-factor. These processors can work in dual-socket systems, which require special chipsets and, as a result, special mainboards.
Xeon was chosen for this advanced platform not only because, unlike Core 2 Extreme, it can work in dual-socket systems. Another reason is that the Xeon family has by now outpaced its “desktop” counterparts in many essential parameters, including clock speed. The top Xeon (Clovertown) model, the X5365, works at 3.0GHz and uses a 1333MHz bus, which is scheduled to arrive in desktop systems only this summer. These are exactly the CPUs used in the Intel V8 eight-core personal computer for media content creation that we are talking about today.
The diagnostic CPU-Z utility reports the following info about these processors:
Just like the Core 2 Quad and Core 2 Extreme processors you are more familiar with, the quad-core Xeon CPUs consist of two dual-core dies put into the same package. Each of these dual-core halves features a shared 4MB L2 cache for both cores. So, quad-core Xeon processor has two L2 caches with the total size of 8MB.
Here I would like to remind you that, unlike its primary competitor, Intel considers this dual-die design of its quad-core processors to be more economical. According to company reports, using component dies of smaller size increases production yields by 20% and hence reduces processor production cost by 12%. However, this approach also has a downside. Ensuring cache coherency in a quad-core processor with two separate L2 caches loads the processor bus very heavily, because the data between them is transferred through system memory.
The complete list of Xeon X5365 specifications looks as follows:
Intel Xeon X5365
8MB (2 x 4MB)
Enhanced Intel SpeedStep Technology
Enhanced Halt State (C1E)
Execute Disable Bit (XD)
Intel 64 Technology
Intel Virtualization Technology
As you can see, the stepping of the quad-core Xeon (Clovertown) processors coincides with the Kentsfield stepping, which is further proof that they are close relatives.
You can also notice that the core voltage is pretty high. It is higher than the Vcore of any other existing processor on the Core micro-architecture. As a result, the typical heat dissipation of the Xeon X5365 has also grown quite high and hit 150W. This reminds us of the NetBurst times, at least in the server segment, all the more so because the TDP of those Xeon processors didn’t exceed 130W. However, the 150W TDP applies only to the 3GHz model of the quad-core Xeon (Clovertown) family. Slower CPUs boast a lower TDP of only 120W.
Intel suggests using specially designed all-copper coolers to cool these pretty powerful Xeon X5365 processors:
Of course, they are quite heavy, which is why they are installed with a special retention backplate.
Note that there is no info on Intel’s official web-site today about Xeon X5365 processors. They are currently being shipped in limited quantities exclusively to very few system builders. These quad-core workstation processors working at 3GHz frequency should become widely available in Q3 2007.
Of course, the use of Xeon processors for V8 platforms requires special mainboards. Intel recommends solutions based on i5000X (Greencreek) chipset designed for dual-socket workstations.
The key distinguishing feature of the i5000X core logic set from its predecessors is the two independent processor busses with point-to-point topology in this dual-processor system. In other words, this chipset eliminates the major bottleneck of all previous generation multi-processor systems built on Intel Xeon CPUs that were sitting on the same shared bus. This innovation increased the total theoretical processor bus bandwidth in i5000X based systems to 17GB/s for CPUs with 1066MHz bus and to 21GB/s for CPUs supporting 1333MHz bus.
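The bandwidth figures above can be reproduced with simple arithmetic. The sketch below assumes a 64-bit (8-byte) data path per processor bus and counts 1GB as 10^9 bytes; the function name is ours and purely illustrative.

```python
# Peak processor bus bandwidth for a chipset with independent busses.
BUS_WIDTH_BYTES = 8  # assumption: 64-bit data path per processor bus


def fsb_bandwidth_gbs(transfer_rate_mhz: float, busses: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s summed over all independent busses."""
    return transfer_rate_mhz * 1e6 * BUS_WIDTH_BYTES * busses / 1e9


for rate in (1066, 1333):
    print(f"{rate}MHz bus x 2: {fsb_bandwidth_gbs(rate):.1f} GB/s")
# prints:
# 1066MHz bus x 2: 17.1 GB/s
# 1333MHz bus x 2: 21.3 GB/s
```

With a single shared bus (`busses=1`) the same formula gives half of these values, which is the bottleneck the chipset removes.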
Moreover, i5000X boasts another powerful tool for improving the efficiency of communication between the CPUs and system memory. It features a special buffer known as the Snoop Filter, which contains information on the location and validity of all data cached by the CPUs. Since the MESI protocol used to ensure cache coherency in multi-processor Xeon based systems requires each CPU to monitor the other processor’s bus, the Snoop Filter significantly reduces parasitic traffic on the processor busses.
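To illustrate the idea, here is a deliberately simplified toy model of a snoop filter. It is not a description of the actual i5000X logic: the class and its counters are hypothetical, and evictions, writes and MESI state transitions are not modeled. It only shows why remembering which bus may hold a cache line lets the chipset skip most snoops.

```python
from collections import defaultdict


class SnoopFilter:
    """Toy model: remembers which processor bus may hold each cache line."""

    def __init__(self, num_busses: int = 2):
        self.num_busses = num_busses
        self.holders = defaultdict(set)  # cache line address -> busses caching it
        self.snoops_filtered = 0         # snoops actually forwarded with a filter
        self.snoops_broadcast = 0        # snoops a filterless chipset would send

    def access(self, bus: int, line: int) -> None:
        """Processor on `bus` requests cache line `line`."""
        others = [b for b in range(self.num_busses) if b != bus]
        # Without a filter, every other bus is snooped on every request.
        self.snoops_broadcast += len(others)
        # With a filter, only busses recorded as holding the line are snooped.
        self.snoops_filtered += sum(1 for b in others if b in self.holders[line])
        self.holders[line].add(bus)


sf = SnoopFilter()
for line in range(0, 8):      # CPU 0 works on its own data
    sf.access(0, line)
for line in range(50, 58):    # CPU 1 works on its own data
    sf.access(1, line)
sf.access(0, 100)             # one line is touched by both CPUs
sf.access(1, 100)
print(sf.snoops_broadcast, sf.snoops_filtered)  # prints: 18 1
```

In this contrived access pattern only the one genuinely shared line causes traffic on the other bus; real workloads share more data, so the savings are smaller.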
Although the i5000X chipset is targeted at workstations, it has server roots, which show in its support for FB-DIMM DDR2 SDRAM memory modules. The chipset has two independent DDR2 SDRAM controllers, each working with dual-channel memory. So, the maximum theoretical bandwidth of the memory subsystem with DDR2-667 FB-DIMM memory reaches 21.3GB/s for reads and 10.7GB/s for writes. Moreover, the i5000X memory controllers boast a number of interesting features that are in demand in the server market. For example, this chipset allows building RAID-like arrays of memory modules, which increases system reliability and makes it possible to replace failed memory modules without shutting the system down.
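These memory bandwidth numbers follow from the channel count. The sketch below assumes four channels in total (two controllers, each dual-channel, as described above), 8 bytes per transfer per channel, and a write path running at half the read rate; the last assumption is chosen simply because it matches the quoted figures.

```python
CHANNELS = 2 * 2            # two controllers, each dual-channel (from the text)
BYTES_PER_TRANSFER = 8      # assumption: 64-bit data path per channel
WRITE_TO_READ_RATIO = 0.5   # assumption: consistent with the quoted figures


def fbdimm_bandwidth_gbs(transfer_rate_mhz: float) -> tuple[float, float]:
    """Return theoretical (read, write) bandwidth in GB/s for all channels."""
    read = transfer_rate_mhz * 1e6 * BYTES_PER_TRANSFER * CHANNELS / 1e9
    return read, read * WRITE_TO_READ_RATIO


read, write = fbdimm_bandwidth_gbs(667)
print(f"read {read:.1f} GB/s, write {write:.1f} GB/s")
# prints: read 21.3 GB/s, write 10.7 GB/s
```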
Since the chipset is targeted for high-performance workstations, its North Bridge supports PCI Express x16 graphics port. Unfortunately i5000X doesn’t allow splitting the PCI Express x16 bus into two PCI Express x8 busses, which could enable SLI and Crossfire support. This is one of the aspects where Intel platform yields to the competitor – AMD Quad FX.
However, this doesn’t mean that you can’t build SLI or Crossfire configurations on an i5000X based platform. The 631xESB/632xESB South Bridge used as part of the i5000X platform saves the day here. It supports two additional PCI Express x4 busses, one of which can be connected to a physical PCI Express x16 slot. Some mainboard manufacturers, TYAN for instance, take advantage of this opportunity. On these mainboards SLI and Crossfire work just fine (as PCI Express x16 + PCI Express x4), because neither ATI nor Nvidia blocks multi-GPU configurations on Intel workstation chipsets in its drivers.
631xESB/632xESB South Bridge also gives the Intel V8 system two ATA-100 channels, six Serial ATA-300 channels, eight USB 2.0 ports, High Definition Audio and two Gigabit network controllers. Moreover, this South Bridge allows implementing two independent 64-bit 133-MHz PCI-X bus segments.
We decided to test Intel V8 platform using an Intel mainboard, which is quite logical. So, we picked Intel Workstation Board S5000XVN mainboard based on the i5000X chipset:
The chipset features determine the board’s ability to work perfectly fine with multi-core Xeon processors supporting 1333MHz bus and to support PCI Express x16 graphics cards. The mainboard also has two PCI Express x8 slots that are physically connected to PCI Express x4 bus, a PCI-X 100/133MHz slot and a PCI-X 100MHz slot. There are eight memory slots on the mainboard that can accommodate up to 32GB of FB-DIMM DDR2 memory working at 667 or 533MHz speeds.
Hard disk and optical drives can be connected to one Parallel ATA-100 port, two Serial ATA-100 ports supporting RAID 0, 1, 0+1 and four Serial Attached SCSI (SAS) ports also supporting RAID 0, 1, 0+1 and 5 arrays. Besides, Intel Workstation Board S5000XVN also carries two latest generation Gigabit network controllers and a dual-channel HD audio codec.
The features of Intel Workstation Board S5000XVN allow using it not only for a powerful workstation, but also for an enthusiast system intended for advanced media content processing. Unfortunately, you will not be able to build a high-end gaming system on this board, because it can only accommodate one graphics card.
In conclusion, I would like to add that positioning the Intel Workstation Board S5000XVN for the workstation market resulted in the absence of a number of features that could be very useful for the enthusiast segment. In particular, this mainboard, like other solutions based on the i5000X core logic set, doesn’t allow any processor overclocking. This is another drawback of the Intel V8 system compared with the AMD Quad FX platform, which gives overclockers quite a bit of freedom.
Since we haven’t yet discussed FB-DIMM (Fully Buffered DIMM) memory modules in any of our previous reviews, we would like to say a few words about this memory type today.
The need for high memory capacities brought FB-DIMM support into server and workstation platforms. The thing is that a memory controller cannot work with many traditional memory modules at once, because each module adds considerable electrical load to the bus, which makes stable operation at high speeds difficult. For example, to support 64GB of DDR2 SDRAM you would need to install 16 memory modules, and the memory controller is very unlikely to cope with all those DIMMs even if they are Registered ones. That is why Intel decided to give up the traditional parallel memory bus design and switch to a serial bus. FB-DIMM modules are part of this concept: on each module the DDR2 chips are connected to the serial bus via an on-board AMB (Advanced Memory Buffer) chip.
Like everything else, this approach also has some disadvantages. Firstly, the memory subsystem built with FB-DIMM features much higher latency. There is an additional AMB controller that stands in the way of data transferred between the DDR2 SDRAM chips and the processor. Besides, AMB is a pretty complex device that consumes quite a bit of power and heats up tangibly during operation. So all FB-DIMM modules consume about 3-6W each and should be equipped with heat-spreaders.
Our system featured four 1GB modules from Samsung, one in each channel.
As you can see from the stickers, these modules are designed to run at 667MHz with 5-5-5-11 timings.
The same info is reported in the modules SPD:
It is actually pretty hard to find worthy competitors for the Intel V8 system. It is the first solution for enthusiasts that features eight processor cores located in two physical processors. Therefore, we think there is little point in investigating the performance of this system in great depth while the successor to the AMD Quad FX platform, the new FASN8 on Phenom FX processors, is not out yet. It is evident that an eight-core system based on the Core micro-architecture will outperform quad-core systems in all applications that can split their work into parallel threads efficiently.
Nevertheless, we still decided to compare the performance of our today’s hero, Intel V8 platform, against that of the existing AMD Quad FX version and at the same time against that of a system based on one quad-core Core 2 Extreme QX6800 processor.
As a result, we built a few test systems using the following equipment:
The tests were performed with the BIOS Setup of the tested mainboards adjusted for maximum performance. Also note that according to the manufacturers’ recommendations for Quad FX system in Vista OS we set Node Interleaving parameter to Disable.
We used 64-bit versions of all applications where it was possible.
The diagrams below will depict results of four different platforms: Intel V8 with two Xeon X5365 processors; Intel V8 with one Intel Xeon X5365 processor; AMD Quad FX with two Athlon 64 FX-74 processors and a system with one Intel Core 2 Extreme QX6800 processor.
First of all, we decided to run a few simple benchmarks from SiSoftware Sandra 2007 suite to get an idea of the performance level provided by an eight-core system. These performance tests can be very well split into parallel threads and the result reported by the utility can be easily compared with the corresponding readings for other similar systems.
In the arithmetic and multimedia tests Intel V8 system on two quad-core Xeon X5365 processors demonstrated unprecedented performance. No wonder: Xeon X5365 are the fastest quad-core Intel processors available today among the solutions on Core micro-architecture and in general among all solutions out there.
Intel V8 system performs very well in the tests that measure the data transfer rate between the CPUs in the system. Of course, contemporary dual-socket systems benefit a lot from two independent 1333MHz busses. Besides, the chipset’s Snoop Filter also contributes to the winning result.
Unfortunately, we failed to get the memory tests from Sandra to run on Intel V8 system. Therefore, we resorted to Everest 4.0 utility to measure the practical bandwidth and latency of the memory subsystem built with FB-DIMMs.
Despite the impressive theoretical figures of the i5000X chipset, the practical situation is far less remarkable. Because of high latencies, FB DDR2 showed relatively low practical results.
However, you should hardly pay that much attention to the results obtained in simple synthetic tests. Let’s move on to more complicated benchmarks and discuss the performance of Intel V8 platform in complex tests and real applications, including the ones dealing with media content creation and processing.
Formally, PCMark05 can create only four computational threads. Nevertheless, the Intel V8 system works faster than the same system with only one processor installed. This can be explained by the fact that the Vista scheduler tries to distribute the threads between the two physical processors first, which in the end allows more efficient use of the L2 caches shared between the core pairs. However, this doesn’t allow the Intel V8 platform to outperform a “regular” system with a single quad-core Core 2 Extreme QX6800 desktop processor. The cause could be the less efficient memory subsystem of the i5000X based system built with FB DDR2. In fact, the memory subsystem test results above support this supposition.
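The cache argument can be put into numbers. This is an idealized sketch: it assumes the scheduler spreads the threads as evenly as possible over all dual-core dies and that threads on the same die split its 4MB L2 equally, neither of which is guaranteed in practice. The function name is ours.

```python
L2_PER_DIE_MB = 4      # shared by the two cores of one die (from the text)
DIES_PER_SOCKET = 2    # each quad-core Xeon is two dual-core dies


def l2_per_thread_mb(threads: int, sockets: int) -> float:
    """Average L2 cache per thread with threads spread evenly over all dies."""
    dies = sockets * DIES_PER_SOCKET
    busy_dies = min(threads, dies)   # a die counts only if a thread runs on it
    return busy_dies * L2_PER_DIE_MB / threads


for sockets in (1, 2):
    share = l2_per_thread_mb(threads=4, sockets=sockets)
    print(f"{sockets} socket(s), 4 threads: {share:.0f}MB of L2 per thread")
# prints:
# 1 socket(s), 4 threads: 2MB of L2 per thread
# 2 socket(s), 4 threads: 4MB of L2 per thread
```

Under these assumptions, four threads on two sockets each get a whole L2 cache to themselves instead of sharing one with a neighbor.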
However, 3DMark06 benchmark let Intel V8 climb all the way to the top of the pedestal, which is especially evident if we look at the results of the CPU test. Nothing surprising, as this test is very well optimized for multi-threaded environments and we have already seen it many times in our previous test sessions.
The majority of contemporary codecs are optimized for systems with multi-core processors. As a result, the Intel V8 platform again shows unprecedented performance. We would like to add, though, that in some cases, such as encoding with the Xvid codec, the advantage of the eight-core system over a quad-core one with a Core 2 Extreme processor is not that tremendous any more. The high latency of the memory subsystem in the i5000X based platform could again be the reason.
Although the latest Adobe Photoshop version is optimized for SMP systems, these optimizations do not cover all functions and filters. Therefore, although Intel V8 has an advantage here, it doesn’t get far ahead of the other test participants.
Although Adobe Premiere Pro supports multi-processor configurations, the eight-core system doesn’t perform impressively here. It falls slightly behind the single-processor quad-core system with the Core 2 Extreme QX6800. If we take a closer look at this test, we will see that CPU utilization in Intel V8 during video rendering doesn’t exceed 50%. This is a clear indication that the computational potential of the processors is not the bottleneck here. The performance may be limited either by the memory subsystem or by the processor bus bandwidth, because these busses not only transfer data to and from memory but also have to ensure the coherency of four L2 caches.
Final 3D rendering tasks as always illustrate beautifully that system performance scales almost linearly depending on the number of computational cores. No wonder that Intel V8 platform is beyond all competition here.
The new Excel 2007 is a good example of an application dealing with parallel calculations. The test measuring the time it takes the system to calculate a spreadsheet using the Monte Carlo method for an economic problem is great proof of that.
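Monte Carlo calculations parallelize well because every batch of random trials is independent of the others. The sketch below is not the Excel workbook used in our test: the "yearly return" model with a 5% mean and 20% spread is a made-up example, and the function names are ours. It only shows how the work splits into one independent chunk per core.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_chunk(seed: int, trials: int) -> float:
    """Sum of simulated yearly returns for one independent batch of trials."""
    rng = random.Random(seed)  # private generator, no state shared with others
    # Hypothetical model: returns are normal with 5% mean and 20% deviation.
    return sum(rng.gauss(0.05, 0.20) for _ in range(trials))


def mean_return(total_trials: int, workers: int) -> float:
    """Split the trials into equal chunks, one per worker, and average them."""
    per_worker = total_trials // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sums = list(pool.map(simulate_chunk, range(workers), [per_worker] * workers))
    return sum(sums) / (per_worker * workers)


print(f"estimated mean return: {mean_return(80_000, workers=8):.4f}")
```

Note that in CPython a thread pool does not actually run pure Python arithmetic on several cores at once; a process pool or a native implementation would be needed for a real speedup. The point here is the structure: no chunk ever waits for data from another one.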
Data compression is another type of task that benefits greatly from additional computational cores in the system. Although archiving utilities are quite demanding of memory subsystem performance, the eight-core Intel V8 platform with relatively slow FB DDR2-667 SDRAM works much faster than a Core 2 Extreme QX6800 platform with high-performance DDR2 SDRAM.
As we have already demonstrated in our previous articles, contemporary games are not the tasks that would benefit greatly from SMP support. Nevertheless, we couldn’t disregard this part of our test session and suggest taking a look at the results in order to resolve all doubts in this respect.
Even the games that are formally optimized for multi-threading, such as Quake 4 and Supreme Commander, run relatively slowly on a dual-processor system with quad-core Xeon X5365 processors. The thing is that despite the optimizations these games can create only a limited number of threads: Quake 4 creates only two, and Supreme Commander four. This doesn’t allow the Intel V8 system to show its real advantage, while the higher-latency memory subsystem makes the result even worse.
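The effect of a fixed thread count is easy to quantify. The following sketch assumes one busy thread per core and ignores everything else running in the system, so it gives an upper bound on how much of the machine a game can use; the function name is ours.

```python
def max_core_utilization(threads: int, cores: int) -> float:
    """Upper bound on the share of cores a fixed number of threads can keep busy."""
    return min(threads, cores) / cores


games = {"Quake 4": 2, "Supreme Commander": 4}  # thread counts from the text
for name, threads in games.items():
    for cores in (4, 8):
        share = max_core_utilization(threads, cores)
        print(f"{name}: {threads} threads on {cores} cores -> at most {share:.0%}")
# prints:
# Quake 4: 2 threads on 4 cores -> at most 50%
# Quake 4: 2 threads on 8 cores -> at most 25%
# Supreme Commander: 4 threads on 4 cores -> at most 100%
# Supreme Commander: 4 threads on 8 cores -> at most 50%
```

So on the eight-core system at least half of the cores stay idle in both games, and only per-core speed and memory latency decide the result.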
A totally different picture can be witnessed in a chess game. The computational algorithm in this ancient strategy can be split into parallel threads easily, which immediately affects the results. The eight-core Intel V8 is far ahead of the quad-core platforms here.
However, upcoming games are more likely to have better implemented multi-threading support, so things will definitely get better. It is definitely so, judging by the results of a benchmark measuring the performance of the upcoming Source engine during environmental physics calculations.
Besides the performance tests in multi-threaded applications, we also wanted to find out how background processes affect the performance of our systems in a resource-hungry primary application. To check this, we used the popular single-threaded SuperPi benchmark to measure the time it takes our test participants to calculate 2M digits of Pi while a few copies of the multi-threaded WinRAR utility are running in the background.
The obtained results turned out very interesting. It appeared that with the increase of background workload Intel eight-core system could yield to AMD Quad FX with only two dual-core processors onboard. The reasons for that lie in the efficiency of work with memory subsystem, because in SMP systems it also serves to exchange data between the processor L2 caches. AMD Quad FX platform that uses fast quad-channel unbuffered DDR2 SDRAM boasts higher data transfer rate than the i5000X or Nvidia nForce 680i SLI based systems. Therefore, as the number of executed tasks increases, the platform built with Intel processors starts falling behind the AMD Quad FX at some point. Of course, intensive work with the memory subsystem as well as the need to transfer data between different CPU cores acts as catalyst to this situation.
We are used to the fact that systems with 65nm processors on Core micro-architecture are known for their relatively low power consumption. However, looks like Intel V8 doesn’t fall under this rule. Xeon X5365 processors boast typical heat dissipation of 150W. It is higher than the TDP of Athlon 64 FX used in the AMD Quad FX platform and even higher than that of older Intel processors with NetBurst micro-architecture. Moreover, FB DIMM memory modules used in Intel V8 platform also consume about 5-6W each.
To estimate the power needs of the eight-core Intel PC for media content creation we decided to compare our data with the power consumption stats for other platforms. We measured the power consumption of the systems described above. First we will look at the results obtained under high workload. We used special OCCT utility to load the CPUs and memory. The chart below shows peak power consumption of our systems:
460W that we obtained on the system with two Intel Xeon X5365 processors is a pretty high result even for a dual-processor system. Nevertheless, AMD Quad FX platform appeared even more power-hungry: its maximum power consumption needs reached 530W.
Now let’s see how much power these systems will require in idle mode with Enhanced SpeedStep and Cool’n’Quiet technologies enabled.
Intel V8 consumes over 200W in idle mode. In this case it loses in terms of economy to AMD Quad FX, whose Athlon 64 FX-74 processors can reduce their working frequency much more substantially than the Intel Xeon X5365.
Our experience with the Intel V8 platform left an unforgettable impression. The benchmark results are so high that we do not hesitate to call it the world’s fastest system for multi-threaded work today. This desktop and workstation platform with two quad-core Xeon processors demonstrates unrivaled performance in all SMP optimized applications.
However, everything we have just said is true mostly because there are no other eight-core systems in the market today that could compete with Intel V8. Therefore, when the new AMD FASN8 with quad-core CPUs on the K10 micro-architecture comes out, the situation may change, especially since Intel V8 has a few disadvantages originating from its server roots.
First of all, the FB DDR2 memory subsystem in the eight-core monster from Intel is not implemented in the most optimal way. Its latency is too high, which affects system performance in some widespread applications. Secondly, the i5000X based workstation mainboards used for Intel V8 systems are not a great fit for enthusiasts. They have a lot of functions that no enthusiast will ever use, but at the same time offer nothing for processor overclocking. And thirdly, the Intel V8 platform limits the user’s ability to build a high-performance video subsystem. There are very few mainboards on the i5000X chipset that offer slots for two graphics cards, and even those do not allow SLI and Crossfire configurations to run at full speed, only as PCI Express x16 + PCI Express x4.
That is why before we make any final conclusions about the future of Intel V8 platform as a great eight-core solution for enthusiasts, we would like to wait a little bit until we get a chance to test the AMD FASN8 platform featuring two next-generation quad-core processors. New Phenom FX CPUs (known under Agena FX codename) are expected to arrive in Q3 2007, so there is not much waiting left. Besides, from what we know already we can say that AMD Quad FX as well as the upcoming FASN8 will be more enthusiast-friendly judging by their features.
Anyway, we will be able to dot all i’s only when we get our hands on the new competitor in this segment, and right now Intel V8 deserves the laurels of an ultimate leader.