by Ilya Gavrichenkov
10/18/2009 | 07:27 PM
Over the past few years multi-core processors have been making their way into our computer systems more and more aggressively. AMD’s and Intel’s marketing efforts were not in vain: they managed to convince consumers and software developers that increasing the number of computational cores is one of the most effective ways of raising the performance of contemporary computer systems. The first half of 2009 turned out to be extremely significant in this respect: dual-core processors were almost completely pushed down into the inexpensive segment, while quad-core processors settled in the upper price segments of the market. However, you shouldn’t think that the spread of dual-core and then quad-core processors is a process with an endpoint. Although not all contemporary tasks can be easily split into parallel threads for processing by several cores, most resource-hungry algorithms, first of all applications for media content creation and processing, scale their performance very well as the number of parallel threads increases. As a result, there remains room for speeding up popular software applications by introducing CPUs with more computational cores, which is a major prerequisite for the early arrival of consumer processors with more than four cores.
In fact, only production limitations prevent the manufacturers from making desktop CPUs with more than four cores: contemporary semiconductor technologies do not allow producing such large processor dies at a price point that would be acceptable for the PC market. However, quad-core processors will very soon be ousted from the high-performance market segment. Solutions with more cores have already appeared in the high-performance server and workstation market, and that is a clear indication of the upcoming changes. At this point six-core server CPUs are available from both manufacturers. AMD offers six-core Opteron processors known as Istanbul that are based on a 45 nm semiconductor die similar in microarchitecture to the dies used in the Phenom II processor series. As for Intel, they have six-core Xeon processors from the Dunnington family, also based on a 45 nm monolithic die, which combines three Core 2 Duo-like dual-core blocks and a large L3 cache. As the production of these semiconductor dies containing up to 1-2 billion transistors becomes cheaper, six-core processors will eventually appear as desktop solutions.
In fact, we have known about Intel’s plans in this respect for a while now. The first six-core desktop processor from this company, aka Gulftown, should come out in H2 2010. It will be an LGA1366 solution with a 32 nm monolithic die on the Westmere microarchitecture (the next generation after Nehalem). Some time ago we learned about similar plans from AMD. Although this company is only planning to introduce the more advanced 32 nm process for their solutions in 2011, they will also launch their new six-core CPU in 2010, around Q2. The corresponding processor is currently known as Thuban and will most likely be the server Istanbul adapted for desktop computers.
However, you don’t have to wait until next year to build a system based on a six-core processor. You just have to find a single-socket mainboard that is compatible with existing six-core server processors and whose functionality meets the requirements for mainboards in common computer systems. In other words, a mainboard like that should have at least a PCI Express x16 slot for a high-performance graphics accelerator. Luckily, our lab managed to find such a mainboard with a single Socket F, which could be used to assemble and test a single-processor system on AMD Istanbul. This way we could take a peek into the future and get an idea of what a six-core processor is capable of in typical desktop tasks.
Moreover, it is also interesting to test Istanbul within a desktop system because it will allow us to estimate the performance of the upcoming Thuban CPUs. These particular processors should become AMD’s primary weapon in the ongoing competition against Intel, especially since, frankly speaking, the existing quad-core Phenom II processors seem to be a pretty weak alternative to Core i7 CPUs. Maybe Istanbul (which will eventually be reincarnated for the desktop as Thuban) will be able to shake the positions of Intel’s flagship solutions thanks to its additional cores? As we found out during our earlier research, Core i7 processors demonstrate unprecedented performance under well-parallelized computational load primarily due to Hyper-Threading technology support, which adds four virtual cores to the four “real” ones. As for Istanbul, it can counter four virtual cores with two additional real ones, which is why it is extremely interesting to see how it compares against Core i7.
Having successfully introduced the 45 nm production process, AMD started manufacturing several CPU cores for mainstream desktop processors. We are already very familiar with them: Deneb used in Phenom II X4, Propus used in Athlon II X4 and Regor used in Athlon II X2. AMD usually builds server processors on cores similar to the desktop ones, differing only slightly in external interface support: they have more HyperTransport busses, and the memory controller is designed to support Registered DIMMs. For example, the Shanghai core used in contemporary Opteron 2300 and 8300 processors is built exactly like that. It can be regarded as the closest relative of the desktop Deneb core.
However, the Istanbul core is an exception to this rule: so far it has no desktop analogue, although we can’t deny that Istanbul, just like Shanghai, is close to Deneb in microarchitecture. In this case the differences go beyond the memory controller and the number of HyperTransport busses. Istanbul has six computational cores on a single processor die, which makes these processors the most expensive solutions in the Opteron lineup: they are priced starting at $450. There is nothing surprising about it: the Istanbul die size is 346 mm², which is about 1/3 larger than the Shanghai die. The six-core processor has 904 million transistors. All this fits onto a monolithic semiconductor die, which holds not only six computational cores, each with 512 KB of L2 cache, but also a shared 6 MB L3 cache.
Istanbul processors have three HyperTransport busses, which allow using them in dual-, quad- and eight-processor systems. As for the memory controller, Istanbul works with Registered dual-channel DDR2-800/667/533 SDRAM with or without ECC support in order to maintain compatibility with existing Socket F platforms. By the way, compatibility is another specifically stressed advantage of AMD server processors: Socket F platform was first introduced three years ago, but even the latest Istanbul CPUs work perfectly fine with any Socket F mainboards after reflashing the BIOS.
Compared with their predecessors, six-core processors boast a significant improvement: HT Assist technology, which reduces CPU idle time in multi-processor systems during intensive memory operations. In systems like that the caches of different processors may contain copies of the same data from system memory that are not equally up to date. That is why, whenever the system needs to access some data, the CPU first sends requests to the other processors to check whether they hold a more current version of that data in their caches. This creates a lot of parasitic traffic on the HyperTransport bus and causes memory subsystem performance to drop dramatically. In order to eliminate this negative effect, Istanbul processors can set aside a special area within the L3 cache that stores a constantly updated directory of data cached by all processors in the system. As a result, when a request for data from memory is sent, each processor knows in advance which CPU’s cache in the multi-processor system it should address. This is the essence of HT Assist technology, which proves most efficient in quad- and eight-processor systems.
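To make the difference between broadcast probing and a cache directory more tangible, here is a minimal toy sketch in Python. It is not AMD’s implementation: the `ProbeFilter` class, its methods and the one-owner-per-line simplification are all assumptions made purely for illustration. It only counts how many probe messages a single read would generate in each scheme.

```python
# Toy model of probe traffic for one memory read in an N-socket system.
# Illustrative only: names and the single-owner simplification are assumptions,
# not a description of the real HT Assist hardware.


def probes_broadcast(num_cpus: int) -> int:
    """Without a directory, the requesting CPU probes every other CPU."""
    return num_cpus - 1


class ProbeFilter:
    """Directory that remembers which CPU caches a given cache line."""

    def __init__(self) -> None:
        self.owner_of_line: dict[int, int] = {}

    def record_fill(self, line_addr: int, cpu_id: int) -> None:
        """Note that cpu_id now holds the line at line_addr in its cache."""
        self.owner_of_line[line_addr] = cpu_id

    def probes_for_read(self, line_addr: int, requester: int) -> int:
        """Number of probes needed when the directory is consulted first."""
        owner = self.owner_of_line.get(line_addr)
        if owner is None or owner == requester:
            return 0  # no other cache holds the line: read memory directly
        return 1      # one directed probe to the CPU that holds the line


if __name__ == "__main__":
    directory = ProbeFilter()
    directory.record_fill(line_addr=0x1000, cpu_id=2)
    for sockets in (2, 4, 8):
        print(
            f"{sockets} CPUs: broadcast={probes_broadcast(sockets)} probes, "
            f"directory={directory.probes_for_read(0x1000, requester=0)} probe(s)"
        )
```

In this simplified model the broadcast cost grows with the number of sockets while the directed lookup stays constant, which matches the observation that the technology pays off most in quad- and eight-processor systems.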
However, if we regard Istanbul as a possible solution for a single-CPU platform, then there is no need for HT Assist or three HyperTransport busses. Even so, Istanbul may offer certain advantages over Deneb and Shanghai besides the extra computational cores. First, Istanbul supports a higher HyperTransport bus frequency, which has been increased to 2.4 GHz (in contemporary desktop AMD processors this frequency doesn’t exceed 2.0 GHz). Second, the built-in L3 cache inside the Istanbul processor is clocked at 2.2 GHz instead of 2.0 GHz.
As for the six-core processor clock frequencies, they are obviously lower than those of quad-core CPUs. The top six-core AMD Opteron models work at 2.8 GHz. Moreover, these solutions belong to the SE class that includes CPUs with higher power consumption and heat dissipation. The general-purpose six-core Opteron processors with an average CPU power of 75 W work at 2.6 GHz maximum, which is roughly a quarter lower than the 3.4 GHz of the top quad-core Phenom II X4 (put differently, the latter is clocked about 30% higher). This must be another reason why AMD is not in a hurry to move their six-core processors into the desktop segment: at this point their clock frequencies won’t let them outperform the top quad-core CPUs even in well-parallelized tasks.
For our today’s test session we picked a six-core Opteron 2435 processor working at 2.6 GHz clock frequency. The table below lists the complete specifications of the new CPU:
Opteron 2435 is the top Istanbul model among those widely available in retail these days. It costs around $1000, but when these CPUs become available in the desktop segment, they will most likely be considerably cheaper. The Opteron 2435 is a server solution, which is why it comes in 1207-pin LGA packaging and differs substantially from the usual Socket AM2/AM3 CPUs even in its looks.
Looking at the specifications mentioned above, we would like to draw your attention to one more peculiarity of six-core Opteron CPUs besides Registered DDR2 memory and the higher frequencies of the HyperTransport bus and the North Bridge built into the processor. The table above shows a relatively low heat dissipation of only 75 W, which seems a little surprising as the fastest desktop processors currently boast 125-140 W TDP. However, this inconsistency can be explained easily. Firstly, the server CPU uses a lower core voltage: this is AMD’s deliberate strategy aimed at lowering the electrical and thermal specifications of solutions targeted at rackmount servers, where it is challenging to use high-performance cooling. Secondly, the physical meaning of the parameter that AMD uses to describe the heat dissipation of their server processors is slightly different from the TDP (Thermal Design Power) used for desktop CPUs. AMD introduced a special parameter for their server processors called ACP (Average CPU Power), which is calculated not as the maximum but as the average processor heat dissipation. In other words, Istanbul heat dissipation is lower than that of the top Phenom II X4, but we can’t say by how much, because it is impossible to directly compare ACP and TDP values.
AMD Company maintains strict differentiation of their server solutions. Socket F with 1207 pins that allows implementing several HyperTransport busses is positioned exclusively as a solution for dual- and multi-processor systems. AMD Opteron processors targeted for single-CPU platforms are all manufactured in Socket AM3 form-factor that implies the existence of only one HyperTransport bus in the CPU. As a result, mainboard makers do not offer single-socket solutions with Socket F, but six-core Istanbul processors are currently available only in Socket F form-factor. This is one of the major challenges one would face trying to build a single-processor system on a six-core Opteron.
Luckily, there are exceptions to almost every rule. We managed to find a mainboard equipped with a single Socket F: the MSI K9NU Speedster.
The manufacturer positions this solution as a platform for single-processor servers and high-performance workstations, which is why it also has functionality suitable for common desktop computers.
To be fair we have to say that this is no new mainboard model. MSI K9NU Speedster was designed back in 2007, but continuity of Opteron processor generations actively promoted by AMD allows using this mainboard even with the newest six-core CPUs without any problems. The only thing you need to ensure that MSI K9NU Speedster will be compatible with the latest CPU is a BIOS update.
MSI K9NU Speedster is based on the Nvidia nForce Professional 3400 MCP, an analogue of the well-known Nvidia nForce 570 SLI chipset. Thanks to this chipset the board features two PCI Express x16 graphics card slots (although they only support version 1.1), which even allow building a dual-card SLI configuration working as x8 + x8.
However, many features of MSI K9NU Speedster still give away its server origin. The first thing that catches your eye is the integrated XGI Z7 chip with 16 MB of video memory, which lets you do without an external graphics card if you are building a rackmount server, for instance. At the same time, the board lacks such a common thing as an integrated sound codec. On the other hand, MSI K9NU Speedster is equipped with two Gigabit network controllers and a special Renesas H8S/2168V controller for remote system management.
One of the non-typical peculiarities of this mainboard is the availability of eight DIMM slots for DDR2 SDRAM. The board supports only Registered DIMMs with or without ECC. As a result, MSI K9NU Speedster allows installing up to 32 GB of system memory.
I have to say that the use of the relatively old Nvidia nForce Professional 3400 MCP chipset launched back in 2006 doesn’t allow MSI K9NU Speedster to really unlock the potential of contemporary Opteron CPUs. For example, if you install a processor that can clock the HyperTransport bus at 2.4 GHz, the actual frequency of this bus will be only 1.0 GHz because of the chipset limitations. However, this issue has very little influence on the resulting performance in single-processor systems.
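To put the two HyperTransport frequencies into perspective, the peak link bandwidth can be estimated with a short calculation. The sketch below assumes a 16-bit link in each direction and double-data-rate signaling; the link width of this particular board is our assumption, and the function name is purely illustrative.

```python
def ht_bandwidth_gb_s(clock_ghz: float, link_width_bits: int = 16) -> float:
    """Peak one-direction bandwidth of a HyperTransport link in GB/s.

    Assumes double-data-rate signaling (two transfers per clock) and a
    16-bit link by default; the width is an assumption, not a measured value.
    """
    transfers_per_ns = clock_ghz * 2  # giga-transfers per second
    return transfers_per_ns * link_width_bits / 8  # bits -> bytes


if __name__ == "__main__":
    for clock in (1.0, 2.4):
        print(f"{clock} GHz link: {ht_bandwidth_gb_s(clock):.1f} GB/s per direction")
```

Under these assumptions the link runs at 4.0 GB/s per direction at 1.0 GHz and would reach 9.6 GB/s at 2.4 GHz. In a single-processor system this bus only carries traffic between the CPU and the chipset, since memory hangs off the controller integrated into the processor.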
Another disappointment that this mainboard may bring the desktop users is really poor BIOS Setup functionality. The board has none of the traditional BIOS options used for CPU overclocking. That is why you can only use processors installed in MSI K9NU Speedster in their nominal mode. Therefore Opteron processors participating in our today’s test session weren’t overclocked.
During our first encounter with a system built around a six-core AMD CPU we would like to focus primarily on whether a desktop platform needs processors with so many computational cores. That is why the first part of our test session will be devoted to comparing an Istanbul based system against one built around a Phenom II X4 working at the same clock frequency.
The second part of our test session will discuss the performance of Opteron 2435 compared against that of mainstream quad-core processors. Here we will involve the top solution from the Phenom II X4 lineup and the junior LGA1366 CPU from Intel supporting Hyper-Threading technology.
As a result, we put together the following platforms:
1. Socket F platform:
2. Socket AM2+ Platform:
3. LGA1366 Platform:
Besides the above listed components, all testbeds also included the following:
Since MSI K9NU Speedster doesn’t have an integrated audio codec, we also equipped our Socket F system with a Creative Audigy SE SB0570 sound card.
We would like to conclude this section by offering you a screenshot taken off the six-core Istanbul based system with the windows of several diagnostic tools showing the system specs:
Systems based on quad-core processors are not gaining popularity very rapidly. Many users believe that dual-core CPUs offer the best price-to-performance ratio, and they are partially correct. A lot of popular contemporary applications, first of all numerous games, can’t use all the resources of quad-core CPUs. That is why most users have very good reasons to consider the ability to split the load into even more, namely six, parallel threads to be obviously excessive. In these conditions the future of desktop six-core processors is pretty debatable, and they may well turn out to be a very niche solution, at least shortly after they appear in the market.
However, there is also a different opinion. I have to remind you that almost all resource-hungry applications dealing with media content creation and processing have been long and very successfully optimized for multi-threaded environments. And in this case another increase in the number of CPU cores may become very handy, especially since more and more computer users start getting involved with video processing.
In order to see all the benefits from the increase in CPU cores from four to six, we decided to find out how great the performance gain from two additional cores is in existing applications before we actually move on to a direct comparison between the six-core Istanbul and top quad-core processors. To answer this question we performed a test session where we compared the performance of two systems with six-core and quad-core processors working at the same clock frequency.
In other words, we compared a system built around six-core Opteron 2435 working at 2.6 GHz against a system using quad-core Phenom II X4 910 that also worked at 2.6 GHz clock speed. To ensure that we perform this comparison in testing conditions as close as possible, we clocked DDR2 memory at 667 MHz in both systems. Our experiments showed that with DDR2 SDRAM working at the same frequency in both cases the Socket F platform can ensure the same memory subsystem performance as the Socket AM2+ platform, even though we use Registered modules in the first system and Unbuffered in the second.
Overall, the CPU with six computational cores can really raise the performance bar. Two additional cores provide a pretty significant advantage during video processing and transcoding as well as during final rendering. We see the results also increase in synthetic Futuremark benchmarks that measure complex systems performance. In other words, adding additional cores to desktop processors indeed makes a lot of sense. But unfortunately, gamers will hardly be excited about the six-core desktop processors. Most games at this point can’t respond to the increase in the number of computational cores with an appropriate fps rate growth. But we can certainly find a few examples of just the opposite.
And although in this test we barely see any negative numbers among the relative performance results of the AMD Istanbul processor, do not be too hopeful yet. Here we compared a six-core and a quad-core processor working at the same clock frequency. In reality, six-core desktop AMD processors manufactured with the existing 45 nm process will have lower frequencies than the top quad-core offerings, and that will inevitably make top six-core AMD CPUs considerably less attractive. In this respect, Intel’s approach seems to be way more effective: this company will manufacture their six-core CPUs with the new 32 nm process, which will allow them to compete against quad-core solutions in clock speeds as well as in performance in applications that do not parallelize too well.
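The trade-off between more cores and a higher clock frequency can be illustrated with a rough estimate based on Amdahl’s law. The sketch below is a simplification and not a model of our actual results: it assumes identical per-clock performance of both CPUs (same microarchitecture), ignores memory and cache effects, and uses function names invented for this example. The clock frequencies are those of the Opteron 2435 (2.6 GHz) and Phenom II X4 965 (3.4 GHz).

```python
def relative_throughput(cores: int, clock_ghz: float, parallel_fraction: float) -> float:
    """Amdahl's-law speedup scaled by clock, relative to one core at 1 GHz."""
    serial_fraction = 1.0 - parallel_fraction
    return clock_ghz / (serial_fraction + parallel_fraction / cores)


def break_even_parallel_fraction(
    cores_a: int, clock_a: float, cores_b: int, clock_b: float
) -> float:
    """Parallel fraction p at which CPU A and CPU B deliver equal throughput.

    Derived from clock_a / (1 - p*ka) = clock_b / (1 - p*kb),
    where k = 1 - 1/cores for each CPU.
    """
    ka = 1.0 - 1.0 / cores_a
    kb = 1.0 - 1.0 / cores_b
    return (clock_a - clock_b) / (clock_a * kb - clock_b * ka)


if __name__ == "__main__":
    p = break_even_parallel_fraction(6, 2.6, 4, 3.4)
    print(f"Break-even parallel fraction: {p:.3f}")
    for fraction in (0.5, 0.9, 0.95, 1.0):
        six = relative_throughput(6, 2.6, fraction)
        four = relative_throughput(4, 3.4, fraction)
        print(f"p={fraction:.2f}: six-core={six:.2f}, quad-core={four:.2f}")
```

Under these assumptions the six-core CPU at 2.6 GHz only pulls ahead of the quad-core CPU at 3.4 GHz when more than roughly 90% of the work runs in parallel, which is why the lower clock frequencies hurt in everything but the best-threaded applications.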
However, let’s not draw any final conclusions just yet and see how Opteron 2435 compares against Phenom II X4 965 in performance.
To estimate the average performance we resorted to synthetic PCMark Vantage test that nevertheless uses algorithms also employed in popular applications.
Unfortunately, six-core Opteron processors yield to both Phenom II X4 and Core i7-920 in PCMark Vantage. But in this case it isn’t surprising, because this benchmark emulates popular algorithms that are currently rarely optimized for multi-threaded environments. In other words, the results obtained in this benchmark show us that six-core processors are not yet a good universal solution for desktop platforms.
3DMark Vantage benchmark shows the systems performance in gaming applications. In this case it is important to keep in mind that the processor part of this test can be well paralleled, which makes the obtained results a little different from the performance in real games.
The results of 3DMark Vantage reveal a slightly different picture than what we have just seen in PCMark Vantage. The CPU test from this suite, which calculates gaming physics and artificial intelligence for numerous computer opponents, is so well optimized that six-core Istanbul processors with 2.4 and 2.6 GHz frequencies run even faster than, for instance, Phenom II X4 965 working at the much higher clock speed of 3.4 GHz. However, six-core Opteron processors still lose to the quad-core Core i7-920, which retains its leadership due to both its more advanced microarchitecture and its support of Hyper-Threading technology, which becomes very useful in this case.
This is the most interesting part of our test session, because work with media content is on the one hand a pretty popular task for home computer systems, but on the other requires significant computational resources.
Istanbul processors perform pretty well in the video encoding speed tests with popular codecs. I would like to specifically stress the fact that six-core AMD processors with relatively low clock frequency can compete with Core i7-920 on equal terms, even though the latter works at a higher clock frequency and the operating system sees it as an eight-core CPU (due to Hyper-Threading technology support).
However, when we transcoded video using Cyberlink Media Show 5 (for our tests we transcoded the HD trailer of the “2012” movie into a format compatible with the youtube.com web-site), the advantages of the six-core Opteron were no longer that evident. As a result, even the quad-core Phenom II X4 copes with this task much faster thanks to its higher clock speed.
However, during non-linear video editing in Adobe Premiere Pro application six-core processors come in very handy. Both Istanbul CPUs participating in our today’s test session outperform Phenom II X4 965. However, even the top Opteron 2435 falls behind Core i7-920 although both of them have very similar clock frequencies. In other words, even with two additional cores AMD CPUs won’t get the necessary advantage over their quad-core competitors: they will most likely return to the high-performance segment only with Bulldozer microarchitecture.
Final rendering is a well-paralleled task that is why six-core AMD processors appear faster than quad-core CPUs from the same maker. But unfortunately, they are still unable to compete successfully against Core i7-920. Therefore, the outcome of the upcoming competition between the future six-core AMD and Intel processors seems to be predetermined.
Image editing in the Adobe Photoshop graphics editor is definitely not one of those tasks where multi-core CPUs come in handy. Many operations performed by the editor in fact use single-threaded algorithms. That is why clock frequency matters more in this application than the number of computational cores.
During data archiving six-core AMD CPUs outpace the quad-core one although it works at a higher frequency. However, they are still very far behind Core i7-920. It must be the relatively slow memory subsystem of AMD platforms that determines this outcome. When desktop analogues of the server Istanbul appear, we may expect the performance to increase a little, because these upcoming CPUs will be compatible with Socket AM3 and hence will support faster DDR3 SDRAM.
The distributed computing client Folding@Home can use the potential of multi-core processors very efficiently, which is why Opteron 2435 looks pretty good here. It proves just as fast as Core i7-920.
We didn’t expect anything good from the gaming tests, which is why we moved this part of our test session to the end of the performance analysis chapter. Most contemporary games can’t boast being optimized for multi-core microarchitectures. Moreover, the Socket F platform can’t demonstrate good performance in applications like that because of its DDR2-667 memory, which is slow by today’s standards.
Nevertheless, the Opteron 2435 processor managed to outperform Phenom II X4 965 in Resident Evil 5, which indicates that six-core CPUs do have some gaming potential. In most cases, however, systems with fewer CPU cores but higher clock frequencies will run faster.
Server Istanbul processors have pretty low ACP values, which suggests that they might be very efficient in terms of performance-per-watt. To check this out we tested the actual power consumption of all participating platforms. The following numbers show the total power consumption of the tested platforms (without the monitor). During our tests we used the 64-bit LinX 0.6.3 utility to load the systems to the utmost extent. Moreover, to ensure that we estimate the power consumption in idle mode correctly we activated all power-saving technologies, such as C1E, Cool’n’Quiet 3.0 and Enhanced Intel SpeedStep. It is also important to keep in mind that the Socket F MSI K9NU Speedster mainboard we used doesn’t support separate power supply for the processor cores and the North Bridge integrated into the CPU, which is why the power consumption of the system built on it appears a few watts higher in idle mode.
In idle mode the Istanbul based system consumes a little more than the Socket AM2+ platform. It can be explained by the reasons described above as well as by the fact that the Nvidia nForce Professional 3400 chipset used in our Socket F platform is not one of the energy-efficient solutions. However, despite this fact the system with a six-core AMD CPU consumes less power in idle mode than the LGA1366 platform with an Intel processor.
During full CPU utilization Istanbul is incredibly energy-efficient. While six-core Opteron processors do not yield in performance to the top Phenom II X4 CPU in a number of applications optimized for multi-threaded architectures, their power consumption is way lower. Six-core processors also look very appealing compared with Core i7-920, but here we should keep in mind that the latter is considerably faster.
To get a better picture of the situation we also tested the power consumption of the processors and mainboards under heavy load without taking into account the rest of the system components. To be more exact, we measured the power consumption along the 12 V power line connected directly to the processor voltage regulator on the mainboard and along the mainboard power lines.
As we see, in reality six-core Opteron processors working at 2.4-2.6 GHz consume about one third less power than quad-core CPUs with the same microarchitecture working at 3.4 GHz clock speed.
It is interesting to look at the mainboards’ power consumption first of all because the low power consumption readings taken off the Nehalem based CPUs are partially explained by their voltage regulator design. The thing is that only the processor cores are connected to the 12 V power line. The Uncore part of the CPU is powered from the mainboard via the 24-pin ATX power connector. As a result, if we sum up the numbers on this and the previous diagram, we can conclude that six-core Opteron processors beat the Core i7-920 in power consumption. So, it looks like the major trump of the upcoming six-core AMD processors targeted at desktop computer systems will be performance-per-watt.
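The summation described above, and the performance-per-watt figure derived from it, can be written down as a small helper. This is only a sketch: the function names are invented for this example, and the numbers in the usage section are placeholders chosen for easy arithmetic, not our measurements.

```python
def platform_power_w(cpu_12v_line_w: float, mainboard_lines_w: float) -> float:
    """Combined draw of the CPU voltage regulator line and the mainboard lines."""
    return cpu_12v_line_w + mainboard_lines_w


def performance_per_watt(score: float, power_w: float) -> float:
    """Benchmark score divided by power draw; higher is better."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return score / power_w


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not measured values.
    total = platform_power_w(cpu_12v_line_w=60.0, mainboard_lines_w=40.0)
    print(f"Combined power: {total:.1f} W")
    print(f"Performance per watt: {performance_per_watt(200.0, total):.2f}")
```

Summing both lines before comparing matters precisely because the platforms split the load between the 12 V processor line and the mainboard lines differently.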
Although AMD and Intel both plan to launch their six-core desktop processors only in Q2 2010, it doesn’t mean that you can’t put together a six-core desktop system today if you want to. AMD has been offering Opteron processors with six cores from Istanbul family for about 6 months already. That is why if you find a CPU like that and a single-CPU Socket F mainboard, you will have a six-core platform that may become a great solution for a desktop. This is exactly what we have done today during our test session.
It is another question if building a system like that really makes sense. The existing six-core Opteron processors work at relatively low clock frequencies, and Socket F mainboards only work with inconvenient and not very fast Registered DDR2 memory. Nevertheless, it seems to make certain sense. An Opteron 2435 processor working at only 2.6 GHz turns out to be more attractive in a number of applications than the top Phenom II X4 CPU. Among applications like that I could mention programs for media content creation and processing, CAD and 3D modeling applications, etc., which are well optimized for multi-threading. And it is especially pleasing that Opteron not just outperforms Phenom II X4 965, but is also a more energy-efficient solution.
However, we shouldn’t overestimate the consumer qualities of the six-core AMD CPUs in regards to desktop use. These CPUs are great for resource-hungry applications, but they can’t reveal their entire potential under conventional everyday load or in games. Therefore, processor manufacturers are not hurrying to start offering six-core solutions in the desktop segment.
Nevertheless, there is not much time left before six-core desktop CPUs officially hit the streets: Intel Gulftown and AMD Thuban should come out at about the same time, in Q2 2010. I don’t think that the situation in the software market will change dramatically by then, which means that six-core processors will still be little more than niche products. Moreover, Intel Gulftown seems to be somewhat more promising, because it will be manufactured with the 32 nm production process, which may allow setting its clock frequencies as high as those of the existing quad-core CPUs. As for AMD, they will manufacture Thuban using the same 45 nm process that they employ for Phenom II CPUs. That is why they may have to resort to aggressive pricing to make their solutions look more competitive.
Moreover, we expect AMD Thuban processors to work at about 3.0 GHz. And as we have seen during our today’s test session, that won’t be enough for them to outperform either the competitor’s six-core solutions or the top quad-core Nehalem based CPUs, which cope perfectly well with heavy multi-threaded load due to Hyper-Threading technology support. Moreover, AMD solutions definitely lack something similar to Turbo Boost, which allows multi-core Intel processors to perform very fast when not all of their cores are busy. In other words, desktop six-core Thuban processors won’t become AMD’s ticket back to the high-performance market segment, but they will definitely offer enough to become an attractive mainstream offering.