by Ilya Gavrichenkov
09/08/2006 | 05:05 PM
Intel's new processors with the Core micro-architecture remain the focus of PC enthusiasts' attention. Numerous tests have shown that Core 2 Duo processors deliver unrivalled performance both at their default frequencies and when overclocked. No wonder that various models of Core 2 Duo and Core 2 Extreme are on the wish lists of many users who upgrade their computers themselves. A mass transition to the new Core 2 platform is about to begin, if it hasn't begun already. Our intent now is to review the Core 2 Duo infrastructure to see what the best environment for that processor is.
In this first article we are going to talk about system memory for the Core 2 Duo. In particular, we will try to find out which memory parameter has a bigger effect on the performance of Core 2 Duo systems: bandwidth or latency. This will lead us to some general conclusions about which of the many kinds of DDR2 SDRAM available today suits the new platform best. We will also give you our own recommendations on purchasing DDR2 SDRAM for systems with the new Intel Core 2 Duo processors.
Before we discuss the results of our tests, which should answer the questions asked in the introduction, we want to say a few words on why Core 2 Duo processors may place specific requirements on the memory subsystem to achieve maximum performance. After all, these CPUs are compatible with the same LGA775 platforms (with minor variations in electrical characteristics) that were used earlier with the long and thoroughly studied processors of the Pentium 4 and Pentium D families. But as a matter of fact, the Core 2 Duo has a dramatically different micro-architecture, which is the main reason why it works with system RAM differently.
First of all, the Core 2's innovative dual-core design with a shared L2 cache comes to mind. As opposed to separate L2 caches, a shared L2 cache frees the front-side bus and the memory bus from data transfers required to maintain cache data coherency. Dual-core Pentium D processors used to utilize the front-side and memory buses to exchange data between the execution cores whereas the Core 2 Duo achieves this by means of its shared L2 cache alone. As a result, the Core 2 Duo can use the CPU-memory link to more effect, freeing it from auxiliary data transfers.
The second thing that turned out to have a positive effect on memory performance in Core 2 Duo systems is the increased frequency of the Quad Pumped Bus which connects the CPU and the chipset's North Bridge. The resulting frequency of this bus is now 1067MHz, which provides a bandwidth of 8.5GB/s. It also means that Core 2 Duo platforms can fully utilize the bandwidth provided by a dual-channel memory subsystem with DDR2-533 SDRAM modules. Installing even higher-frequency modules offers a chance to reduce memory access latencies further.
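The arithmetic behind these two figures can be sketched as follows. This is only an illustration: the 8-byte (64-bit) width of the front-side bus and of each DDR2 channel is an assumption not stated in the text above, and the function name is our own.

```python
def peak_bandwidth_gb_s(transfers_mt_s, width_bytes=8, channels=1):
    """Theoretical peak bandwidth in GB/s (decimal gigabytes).

    width_bytes=8 is an assumption: a 64-bit bus or memory channel.
    """
    return transfers_mt_s * width_bytes * channels / 1000


# Quad Pumped Bus: 266.67 MHz clock, four transfers per clock.
fsb = peak_bandwidth_gb_s(266.67 * 4)

# Dual-channel DDR2-533: 533.33 MT/s on each of two channels.
ddr2_533_dual = peak_bandwidth_gb_s(533.33, channels=2)

print(f"FSB: {fsb:.1f} GB/s, dual-channel DDR2-533: {ddr2_533_dual:.1f} GB/s")
```

Under these assumptions both values come out at about 8.5GB/s, which is why dual-channel DDR2-533 is the point at which memory bandwidth matches the bus.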
We also shouldn't forget that the Core micro-architecture features a number of technologies that improve the CPU's memory-accessing capability, such as the memory disambiguation technique and data pre-fetch algorithms that are much better than those employed in the Pentium 4. You can learn more about these technologies here.
Although Core 2 Duo processors still use an external memory controller located in the chipset's North Bridge, the features of the new micro-architecture mentioned above help them challenge Athlon 64 X2 CPUs, which have a memory controller integrated into their core, in terms of memory performance. The graphs below show our measurements of the effective memory bandwidth and latency in systems based around an Intel Pentium D 960, Intel Core 2 Duo E6700 and Athlon 64 X2 5000+. DDR2-800 SDRAM with 4-4-4-12 timings was used in each case:
As you see, the Core 2 Duo platform features higher memory performance in comparison with the Pentium D system in practice, not only in theory. Although the two processors from Intel access memory through the same memory controller integrated into the chipset's North Bridge (this test was performed on an i975X-based mainboard), the choice of the CPU affects the result greatly. The Core 2 Duo can ensure a 10% higher memory bandwidth and much lower data access latency (from 20% to 40% depending on how efficient the data pre-fetch algorithms are in a particular application). The Core 2 Duo is obviously superior to previous-generation NetBurst processors when it comes to using the memory subsystem efficiently.
It is quite interesting to compare the real-life performance of the Core 2 Duo platform with that of the Athlon 64 X2 platform, especially since they take different approaches to placing the memory controller. Unlike the Core 2 Duo, the Athlon 64 X2 (in Socket AM2 design) has an internal DDR2 SDRAM controller, integrated right into the CPU core. The integrated controller provides a very high memory bandwidth. Its substantial advantage over the Core 2 Duo platform is not surprising considering that in Intel's systems the speed of data transfers between the CPU and memory is limited by the bandwidth of the FSB. As a result, the DDR2-800 SDRAM memory subsystem is about 40% efficient on the platform with the new Intel CPU and 55-60% efficient on the Athlon 64 X2 platform.
As for memory latency, two out of three test utilities show that the Core 2 Duo platform is capable of achieving lower memory latency than the Athlon 64 X2 system. This result is obviously due to the data pre-fetch algorithms employed by the Core micro-architecture. Those algorithms prove to be very helpful in many cases. So, even with an external memory controller, Core 2 Duo processors perform well in applications sensitive to memory speed.
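To put those percentages into absolute terms, here is a small sketch. It assumes that "efficiency" means effective bandwidth divided by the theoretical peak of dual-channel DDR2-800, and that each channel is 8 bytes wide. The resulting figures are implied by the percentages quoted above; they are not our measured values.

```python
def dual_channel_peak_gb_s(ddr_rating_mt_s, width_bytes=8):
    """Theoretical peak of two DDR2 channels, in GB/s (assumes 64-bit channels)."""
    return ddr_rating_mt_s * width_bytes * 2 / 1000


def implied_bandwidth_gb_s(ddr_rating_mt_s, efficiency):
    """Effective bandwidth implied by an efficiency given as a fraction of peak."""
    return dual_channel_peak_gb_s(ddr_rating_mt_s) * efficiency


peak = dual_channel_peak_gb_s(800)
core2 = implied_bandwidth_gb_s(800, 0.40)
athlon_low = implied_bandwidth_gb_s(800, 0.55)
athlon_high = implied_bandwidth_gb_s(800, 0.60)

print(f"Peak: {peak:.1f} GB/s")
print(f"Core 2 Duo (about 40%): {core2:.1f} GB/s")
print(f"Athlon 64 X2 (55-60%): {athlon_low:.1f} to {athlon_high:.1f} GB/s")
```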
The second group of tests we carried out for this review was meant to reveal the influence of memory subsystem parameters, namely timings and frequency, on the performance of Core 2 Duo platforms. The testbed was based around an ASUS P5W DH Deluxe mainboard which features the i975X chipset and is capable of clocking memory not only as DDR2-533, DDR2-667 and DDR2-800 but also as DDR2-1067 at an FSB frequency of 266MHz. Not all i975X-based mainboards allow selecting a 1067MHz memory frequency at a 266MHz FSB because that frequency is not ratified by JEDEC and is ignored by some manufacturers. The ASUS mainboard is free from prejudices and allows using any memory frequency divisor available in the chipset from its BIOS Setup.
The testbed was assembled out of the following components:
Max performance settings were selected in the mainboard's BIOS.
We first measured the bandwidth and latency of the memory subsystem in synthetic benchmarks.
As you can see, memory types with different theoretical bandwidths do not differ much in practice. For example, there is a 100% difference in theoretical bandwidth between DDR2-533 and DDR2-1067 whereas the difference between the practical results obtained with those memory types is 17% at maximum.
This poor performance of fast DDR2 SDRAM is due to the architecture of Core 2 Duo systems in which memory is connected to the CPU via the chipset and two sequential buses. In this design it is not the bandwidth of dual-channel high-frequency memory that becomes the bottleneck, but the Quad Pumped Bus that connects the CPU with the chipset's North Bridge. Its maximum theoretical bandwidth is 8.5GB/s in Core 2 Duo systems, which only equals the bandwidth of dual-channel DDR2-533 SDRAM. That's why we don't see a really big performance growth if we use memory faster than DDR2-533.
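A simplified way to picture this bottleneck is to take the smaller of the two theoretical bandwidths on the path between CPU and memory. The sketch below assumes 8-byte-wide buses and ignores latency, protocol overhead and pre-fetching, so it shows a ceiling rather than a prediction of measured results.

```python
def peak_gb_s(transfers_mt_s, channels=1, width_bytes=8):
    """Theoretical peak bandwidth in GB/s (assumes 64-bit buses)."""
    return transfers_mt_s * width_bytes * channels / 1000


def link_ceiling_gb_s(fsb_mt_s, ddr_rating_mt_s):
    """Upper bound on CPU-memory bandwidth: the slower of the FSB and dual-channel memory."""
    return min(peak_gb_s(fsb_mt_s), peak_gb_s(ddr_rating_mt_s, channels=2))


# Default 1067 MT/s front-side bus with each memory type from the test.
for rating in (533, 667, 800, 1067):
    memory = peak_gb_s(rating, channels=2)
    ceiling = link_ceiling_gb_s(1067, rating)
    print(f"DDR2-{rating}: memory {memory:.1f} GB/s, ceiling {ceiling:.1f} GB/s")
```

In this model the ceiling stays at about 8.5GB/s for every memory type, even though the theoretical memory bandwidth doubles from DDR2-533 to DDR2-1067.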
It seems it doesn't make any sense to use memory faster than DDR2-533 on the Core 2 Duo platform. This is not quite so. Memory access latency decreases along with frequency, which can be seen in practical tests.
Here, the results differ much more. Like the bandwidth, the latency should have a considerable effect on system performance in many applications and may justify the use of high-frequency memory in a computer with a Core 2 Duo processor.
We ran a few complex test applications to verify our point.
The popular SuperPi benchmark shows quite clearly that the speed of calculating pi to 8 million decimal places depends on the memory speed. Both frequency and timings of the memory modules affect performance. On the whole, we can note that there is always a positive effect from faster DDR2 SDRAM unless its timings are the worst (highest) for that frequency.
However, not all applications are so sensitive to the memory subsystem speed. For example, PCMark05 doesn't care much about what DDR2 SDRAM you've got in your system.
The synthetic memory test from that benchmark reveals the best memory subsystem configurations, though. Its verdict differs from SuperPi's: here, we can say without any reservations that the frequency of DDR2 SDRAM has a bigger effect on performance than its latency.
The popular 3DMark06 benchmark seems to agree with the tendency, which is, however, not very sharply outlined here and is barely visible in the graphs. This serves to confirm our point that the limited bandwidth of the front-side bus is an obstacle to accelerating your system by means of high-frequency memory.
3D games have always reacted readily to any increase in the speed of the memory subsystem. We see that again here, but the reaction isn't very enthusiastic. Still, you can see that higher-frequency memory enjoys a certain advantage over lower-frequency memory and allows achieving a higher frame rate, whereas the memory timings affect system performance less. We shouldn't overestimate the role of fast memory in gaming applications, though. For example, the results of DDR2-533 and DDR2-1067 differ by only 5-10%, i.e. installing memory that is twice as fast leads to a small performance increase even in games.
And there are a lot of applications that don't rely that much on memory performance. Take the final rendering task as an example:
The difference is no bigger than 1% here and is hard to see in the graphs.
The speed of encoding media content depends but little on the memory speed, but the diagram shows that the use of extremely fast DDR2 SDRAM can bring you a 5% advantage over the configuration with the slowest memory subsystem.
Fortunately for manufacturers of speedy memory, there are applications where fast DDR2 SDRAM can make a difference. One of them is WinRAR, an application that we always use to see the benefits of a faster memory subsystem or a larger CPU cache. Here, DDR2-1067 SDRAM enjoys a nearly 50% advantage over DDR2-533 SDRAM on the Core 2 Duo platform. Yet you should keep in mind that this is only a special case. In a majority of applications the positive effect from faster memory is negated by the limited bandwidth of the front-side bus that connects the chipset's North Bridge with the CPU.
The previous section may give you an impression that Core 2 Duo systems do not in fact need fast memory. Using higher-frequency memory modules makes the system costlier, yet doesn't lead to any significant performance increases. This is true, in part: memory faster than DDR2-533 can only provide a maximum of 5% performance growth in a majority of widespread applications. The problem is in the front-side bus which is only clocked at 266MHz as yet.
But it doesn't mean fast memory is completely useless for owners of Core 2 Duo systems. Although Intel has limited the frequency and bandwidth of the front-side bus, it's in our power to increase them without Intel's help. So, we'll be talking about overclocking now.
It has been proven in numerous experiments, including in our own labs, that Core 2 Duo processors are very overclocker-friendly. Since the frequency multiplier of such processors is still fixed at a certain value, you can only overclock them by increasing the FSB frequency. Given that the Conroe core has a huge overclocking potential and that the default multiplier of Core 2 Duo processors is usually small, you often have to raise the FSB frequency very high when overclocking. This increases the bandwidth of the CPU-chipset thoroughfare, which makes the use of fast DDR2 SDRAM more reasonable than in a non-overclocked system.
To check this out in practice we benchmarked memory subsystem performance using an overclocked Core 2 Duo and memory modules with different frequencies and latencies. Here, we didn't set out to find the highest frequency Conroe-core CPUs are capable of working at. We only wanted to see what effect the memory subsystem parameters would have on overall system performance if the FSB was overclocked. We set the FSB frequency at a rather typical value of 400MHz (50% above the default). At that frequency the FSB bandwidth grows from 8.5GB/s to 12.8GB/s. In theory, this should make the use of dual-channel DDR2-800 worthwhile.
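The following sketch reproduces that reasoning. It assumes an 8-byte-wide front-side bus and 8-byte-wide memory channels, neither of which is stated in the text; the helper names are our own.

```python
BUS_WIDTH_BYTES = 8  # assumption: 64-bit FSB and 64-bit DDR2 channels


def fsb_bandwidth_gb_s(fsb_clock_mhz):
    """Peak bandwidth of a Quad Pumped Bus: four transfers per clock."""
    return fsb_clock_mhz * 4 * BUS_WIDTH_BYTES / 1000


def matching_dual_channel_rating(fsb_clock_mhz):
    """DDR rating (MT/s) at which two memory channels equal the FSB peak."""
    return fsb_bandwidth_gb_s(fsb_clock_mhz) * 1000 / (BUS_WIDTH_BYTES * 2)


stock = fsb_bandwidth_gb_s(266.67)
overclocked = fsb_bandwidth_gb_s(400)

print(f"Default FSB:     {stock:.1f} GB/s")
print(f"Overclocked FSB: {overclocked:.1f} GB/s")
print(f"Matching memory: DDR2-{matching_dual_channel_rating(400):.0f}")
```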
For our overclocking tests we took a Core 2 Duo E6300 processor with a default frequency of 1.86GHz and 7x multiplier. At a 400MHz FSB, the CPU clock rate is 2.8GHz which is easily conquerable by such processors.
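The relation between the FSB clock, the multiplier and the resulting CPU frequency is simple enough to put into a couple of lines. The function name is ours; the 266.67MHz default FSB clock is the exact value behind the rounded 266MHz figure.

```python
def cpu_clock_ghz(fsb_clock_mhz, multiplier):
    """CPU core clock in GHz for a fixed multiplier and a given FSB clock."""
    return fsb_clock_mhz * multiplier / 1000


default_clock = cpu_clock_ghz(266.67, 7)   # Core 2 Duo E6300 at its default FSB
overclocked_clock = cpu_clock_ghz(400, 7)  # the same CPU at a 400MHz FSB

print(f"Default: {default_clock:.3f} GHz, overclocked: {overclocked_clock:.1f} GHz")
```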
The rest of the testbed components are the same as those used in the previous section of the review.
The ASUS P5W DH Deluxe mainboard has certain peculiarities when it comes to memory support during overclocking. Namely, many memory frequency divisors do not work when the FSB is overclocked. Until recently we could use that mainboard at a 400MHz FSB only to test memory at either 600MHz or 800MHz (FSB:DRAM divisors of 4:3 and 1:1, respectively). Fortunately, ASUS' engineers are still working on the mainboard, and the version 1305 BIOS is stable at a 1000MHz memory frequency, too (with a 4:5 divisor at a 400MHz FSB). So, we can now perform a test that includes memory modes not only with step-down divisors but also with step-up divisors (as at the default CPU clock rate).
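For reference, this is how the FSB:DRAM divisors above translate into memory frequencies. The sketch assumes the usual convention that the divisor relates the FSB clock to the DRAM clock and that the DDR rating is twice the DRAM clock; the function name is our own.

```python
from fractions import Fraction


def ddr_rating(fsb_clock_mhz, fsb_part, dram_part):
    """Resulting DDR frequency for an FSB:DRAM divisor of fsb_part:dram_part."""
    dram_clock = Fraction(fsb_clock_mhz) * dram_part / fsb_part
    return float(dram_clock * 2)  # double data rate: two transfers per clock


# Divisors mentioned above, all at a 400MHz FSB.
for fsb_part, dram_part in ((4, 3), (1, 1), (4, 5)):
    rating = ddr_rating(400, fsb_part, dram_part)
    print(f"{fsb_part}:{dram_part} -> {rating:.0f}MHz")
```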
As in the previous section, we will first run synthetic benchmarks.
The measurements of the CPU-memory bandwidth agree with the results we got in the previous section. When the memory frequency rises above 800MHz, the bandwidth indeed stops growing, because at that point dual-channel memory already matches the maximum bandwidth of the FSB at 400MHz. You can see that the effective bandwidth growth is about 15% on switching from 600MHz to 800MHz memory frequency, but a mere 2% on switching from 800MHz to 1000MHz.
The memory latency tests confirm the point: the latency almost stops decreasing beyond an 800MHz memory frequency. So, it is clear that DDR2-1000 SDRAM cannot bring about a significant performance gain in comparison with DDR2-800 SDRAM on Core 2 Duo platforms even when the FSB is overclocked to 400MHz. However, increasing the memory frequency to 800MHz should be beneficial; at least we've got all the necessary conditions in terms of bandwidth and latency.
The prediction holds: even with loose timings, DDR2-800 works faster than DDR2-600 with the best timings, while DDR2-1000 is only a little faster than DDR2-800 at the same timings. This looks like a convincing illustration of the fact that it's unreasonable to use memory with a resulting (DDR) frequency more than two times that of the FSB clock rate.
Although PCMark05 isn't very sensitive to memory subsystem performance, it still gives us some basic trends. DDR2-800 enjoys a bigger advantage over DDR2-600 SDRAM than in the tests with a non-overclocked CPU. The DDR2-1000 configurations turn out to be slower than the DDR2-800 system that allows using the aggressive timings of 3-3-3-10. And even if you compare the results of DDR2-800 and DDR2-1000 with identical timings, you will see that the faster memory only scores a few extra points in PCMark05.
The same can be said about the results the memory configurations showed in the complex memory subsystem benchmark from PCMark05.
3DMark06 results don't depend much on the memory subsystem speed, so there is not much for us to discuss here. You should be aware that higher memory performance doesn't necessarily lead to a big practical effect. There are quite a lot of real-life applications that do not work with large amounts of data and thus do not actually need fast memory. So, before purchasing fast overclocker-friendly DDR2 SDRAM modules, you should make sure you really need them for the applications you are going to run on your computer.
3D games are certainly the kind of applications that can appreciate fast system memory.
At a 400MHz FSB the system with DDR2-800 enjoys a 2 to 7% advantage over DDR2-600 with the same timings, which is a good result. Switching to DDR2-1000 SDRAM doesn't provide big practical benefits: the high bandwidth of that memory type cannot be fully utilized due to the limited bandwidth of the FSB. Note also the influence of memory timings. A simple modification of the timings in the BIOS Setup can provide a performance gain of up to 5%.
At a 266MHz FSB we couldn't see any influence of the memory subsystem parameters on the final rendering task. Here, with an overclocked FSB, the speed of the test shows some dependence on memory timings and frequency.
What we see in the media content encoding test looks much like the results of the gaming tests. Once again we are shown that increasing the memory frequency only makes sense when it is done within the limits imposed by the FSB bandwidth.
Even WinRAR, which at a 266MHz FSB ran better on systems with higher-frequency memory, doesn't speed up when we switch from DDR2-800 to DDR2-1000.
Summarizing all this, we can say that high-frequency memory on Core 2 Duo platforms will be in demand primarily among overclockers. When the FSB frequency is above its default value, setting a higher memory frequency has a bigger effect on system performance than at the default FSB clock rate. In our tests we overclocked the FSB to 400MHz and enjoyed a considerable performance growth after installing DDR2-800 SDRAM. This growth was much more tangible than what we had at a 266MHz FSB, other conditions being the same. However, the system didn't speed up when we used 1000MHz memory because the speed of CPU-memory data transfers was now limited by the bandwidth of the FSB. Still, it is clear that if you overclock the FSB to frequencies above 400MHz, you will see performance gains from using memory faster than DDR2-800, too.
We've already given our general recommendations about choosing system memory for Core 2 Duo systems in the course of this review. So, you should be aware by now what characteristics of DDR2 SDRAM modules are to be taken into consideration when you are building your computer with a new Intel processor with the Core micro-architecture. In case you've missed something, here are the main points.
First of all we have to acknowledge the high efficiency of data pre-fetch algorithms implemented in Core 2 Duo processors. It is thanks to them that platforms with such processors have data access latency comparable to that of Socket AM2 Athlon 64 systems, which feature an integrated memory controller. However, notwithstanding the impressive achievement of Intel's engineers, the memory subsystem of Core 2 Duo systems with an external memory controller, located in the chipset's North Bridge, cannot match the memory subsystem of Socket AM2 systems in overall efficiency. To be exact, the platforms with new Intel processors cannot provide as high memory bandwidth as the competing platforms do.
The memory bandwidth on Core 2 Duo systems is limited not by the characteristics of DDR2 SDRAM modules but by the bandwidth of the bus that connects the CPU with the chipset's North Bridge. This is why changing the memory frequency or timings is going to have a small effect on the performance of a non-overclocked Core 2 Duo system (although the frequency influences performance somewhat more than the timings do).
More interesting are the results of the overclocked platform. In this case, there is more sense in using fast memory, and the optimal memory frequency divisor is 1:1 (FSB:DRAM), as has been shown in our tests. In other words, you can achieve maximum performance by using memory with the lowest possible timings in synchronous mode. It means that if you overclock the FSB to 400MHz, DDR2-800 SDRAM with low timings is the optimal choice. If the FSB is overclocked further, DDR2-1000 or DDR2-1067 SDRAM is the best option. An additional argument in favor of running the memory and the FSB in synchronous mode when overclocking is that the 1:1 divisor is the most stable one on a majority of mainboards.
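This rule of thumb can be written down as a short helper. It is only a sketch of the recommendation above: the list of grades is limited to the memory types that appear in this review, and the function name and rounding are our own choices.

```python
# Memory grades that appear in this review; other ratings exist on the market.
GRADES = (533, 667, 800, 1000, 1067)


def synchronous_grade(fsb_clock_mhz, grades=GRADES):
    """Slowest listed DDR2 grade able to run 1:1 (FSB:DRAM) with the given FSB clock.

    In synchronous mode the DRAM clock equals the FSB clock, so the
    required DDR rating is twice the FSB clock. Returns None if no
    listed grade is fast enough.
    """
    needed = round(fsb_clock_mhz * 2)
    for grade in sorted(grades):
        if grade >= needed:
            return grade
    return None


for fsb in (266.67, 400, 450, 533.33):
    print(f"FSB {fsb}MHz -> DDR2-{synchronous_grade(fsb)}")
```

Timings are not modeled here; among modules of the selected grade, the ones with the lowest timings remain preferable.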