Clarkdale and High-Speed DDR3 SDRAM: Does It Make Any Sense?

The memory controller of the dual-core Clarkdale processors is seriously inferior to the integrated memory controllers of other Intel processors in terms of pure performance. We decided to check if there is a way to fix this situation by using overclocker DDR3 SDRAM.

by Ilya Gavrichenkov
07/26/2010 | 03:57 PM

During the very first discussion of the new dual-core LGA1156 processors from Intel that belong to the Clarkdale family, we pointed out a few unpleasant peculiarities of its memory controller. These processors work with the memory very slowly, as you can see from comparing the results of memory subsystem bandwidth and latency tests in systems built around dual-core Core i5 and quad-core Core i7. For example, Everest Memory Benchmark launched on a new dual-core Clarkdale and a quad-core Lynnfield working at the same clock frequency produces the following results:

   

                      Clarkdale 2.8 GHz      Lynnfield 2.8 GHz
                      DDR3-1333 9-9-9-27     DDR3-1333 9-9-9-27
Memory Read, MB/s     9341                   12891
Memory Write, MB/s    10014                  10598
Memory Copy, MB/s     10582                  15540
Memory Latency, ns    82.8                   53.7

Although the dual-core processor is manufactured on a more advanced 32 nm process and came out later than its quad-core counterpart, it lags behind the latter quite substantially. However, this lag is no mystery. Unlike quad-core Lynnfield processors, dual-core Clarkdale CPUs are not monolithic: they consist of two semiconductor dies in one package, and the memory controller sits on a different die from the computational cores. That is why the memory subsystem works slower: there is now an additional QPI bus on the path between the cores and the memory, which the dies inside Clarkdale use to communicate with one another.
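To put the lag into numbers, here is a minimal sketch that computes how far Clarkdale deviates from Lynnfield in each Everest test. The figures are the ones quoted in the table above; the dictionary names and the helper function are purely illustrative.

```python
# Everest Memory Benchmark figures quoted in the table above
# (both CPUs at 2.8 GHz with DDR3-1333 9-9-9-27).
clarkdale = {"read": 9341, "write": 10014, "copy": 10582, "latency": 82.8}
lynnfield = {"read": 12891, "write": 10598, "copy": 15540, "latency": 53.7}


def relative_gap(metric):
    """Percentage difference of Clarkdale relative to Lynnfield for one metric."""
    return (clarkdale[metric] / lynnfield[metric] - 1.0) * 100.0


for metric in ("read", "write", "copy", "latency"):
    # Negative values mean Clarkdale is lower; for latency, higher is worse.
    print(f"{metric:8s} {relative_gap(metric):+6.1f}%")
```

By this calculation the read bandwidth is roughly 28% lower, the copy bandwidth about 32% lower, and the latency about 54% higher, while writes differ by only about 6%.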

For this particular reason the peculiarities of memory subsystem functioning in systems built around dual-core Core i3 and Core i5 processors deserve a separate study. It obviously makes no sense to generalize the results obtained earlier in LGA1156 systems equipped with quad-core CPUs. Therefore, we decided to carry out an individual test session that will reveal the influence of memory clock frequency and latencies on the performance of Clarkdale based systems.

This topic appears even more acute due to the fact that we got our hands on an overclocker Core i5-655K processor with an unlocked multiplier (this CPU is also available in retail already). It not only allows adjusting the multiplier to achieve the desired clock speed, but also offers access to a wide range of memory operation modes. While regular Clarkdale processors only support DDR3-1333 SDRAM in nominal mode, the processor model with an unlocked multiplier also allows clocking the memory faster, namely as DDR3-1600, DDR3-1866 and DDR3-2133.

I hope that the ability of Core i5-655K processor to support higher-speed memory modes will at least partially make up for the slow memory controller, because in most cases it is the memory controller that becomes the bottleneck of Clarkdale microarchitecture preventing these processors from unveiling their true performance potential.

Closer Look at GeIL EVO ONE PC3-17000 (GE34GB2133C9DC)

In order to test the processor with a memory controller that allows (theoretically) using DDR3-2133 SDRAM without even increasing the base clock, we meticulously searched for suitable memory that wouldn’t run into any problems at such high frequencies. We decided to go with the DDR3 EVO ONE series from GeIL that has long been popular among overclockers. The GE34GB2133C9DC kit includes a pair of 2 GB modules designed to work at 2133 MHz with 9-9-9-27 timings. I have to say right away that a solution like that is pretty rare in today’s enthusiast memory market, so it is obviously worth a closer look.

Although GeIL GE34GB2133C9DC memory comes in a standard package, the modules inside look pretty extraordinary. No wonder: memory modules working at frequencies this high, even without a significant voltage boost, require advanced chip cooling. Simple stamped aluminum heat-spreaders like the ones installed on most enthusiast memory modules obviously won’t be enough.

The cooling system of the EVO ONE modules consists of two parts. The first part is made of pretty typical aluminum plates that are attached to the chips with adhesive thermal tape. The second part is a straight heatpipe carrying 25 thin fins that are oriented across the module. The ends of this heatpipe are pressed between the heat-spreader plates above the memory chips, so it receives no heat from the chips directly, which gives us some cause for concern about whether this cooling solution is effective at all. However, this cooling configuration does have one indisputable advantage: it is built in such a way that the airflow from the processor cooler blows right through it.

Among the advantages of EVO ONE I have to point out that it is tested with DBT technology (Die-hard Burn-in Technology). This testing procedure means that the modules undergo a 24-hour stress test at significantly increased voltage and with the ambient temperature raised to 100°C. This approach allows the manufacturer to single out potentially weak chips that may fail during the initial period of operation, which is when most returns occur.

But even these significant advantages can’t make up for a pretty serious drawback of GeIL modules from EVO ONE series: their cooling system is extremely large. The heat-spreaders are so tall that the modules cannot fit into every system out there. A relatively large number of high-performance processor coolers hang over the DIMM slots, and in this case EVO ONE modules have absolutely no chance of fitting into these DIMM slots.

The stickers attached to both modules contain the product part number along with the primary specifications. The complete list of GeIL GE34GB2133C9DC specs looks as follows:

In other words, this memory works at 2133 MHz frequency at the usual memory voltage of 1.65 V that has become an unofficial standard for overclocker modules designed for LGA1156 and LGA1366 systems.

GeIL GE34GB2133C9DC modules support XMP profiles. One of these profiles duplicates the official specs:

Moreover, the profiles also indicate that the manufacturer promises fault-free operation of the kit with 8-8-8-25 timings at 1900 MHz and with 7-7-7-22 timings at 1666 MHz. The SPD, as usual, contains settings that guarantee the system operation without any configuration adjustments.

DDR3-2133 memory from GeIL proved in the course of practical tests that its high-end specifications are exactly what they claim to be. When we used this memory in a system built with an Asus P7P55D Premium mainboard on the Intel P55 Express chipset and a Core i7-860 processor that was previously used in our DDR3 SDRAM reviews, we could get our memory to work stably as DDR3-2214 with 9-9-9-27-2T timings. The voltage of our DDR3 memory modules was 1.65 V, exactly as the official specification and Intel recommendations state.

This is a highly positive outcome and it speaks very well of the GeIL memory kit. It is doubly pleasing that GeIL EVO ONE PC3-17000 doesn’t disappoint with aggressive memory timings either. To be more exact, this memory worked quite stably as DDR3-1745 with 7-7-7-20-1T timings.

And as we see from the screenshot above, in this mode the memory subsystem performance is only a little lower than at the maximum frequency and slightly more “lenient” timings settings.
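The closeness of these two modes is easier to see if the CAS latency is converted from clock cycles into nanoseconds. The snippet below is a minimal sketch of that arithmetic; the function name is illustrative, and it relies only on the fact that DDR memory transfers data twice per I/O clock, so the clock in MHz is half the DDR3 rating.

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """Absolute CAS latency in nanoseconds.

    DDR memory transfers data twice per I/O clock, so the clock in MHz is
    half the DDR3-xxxx rating; one clock cycle lasts 1000 / clock_mhz ns.
    """
    clock_mhz = data_rate_mts / 2.0
    return cas_cycles * 1000.0 / clock_mhz


# The two stable modes found on the Core i7-860 system.
print(round(cas_latency_ns(2214, 9), 2))  # DDR3-2214 with CL9
print(round(cas_latency_ns(1745, 7), 2))  # DDR3-1745 with CL7
```

Both modes land at about 8 ns of CAS latency (roughly 8.1 ns and 8.0 ns), so the slower mode gives up raw transfer rate but not access time.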

At this point we could have finished our story of the excellent DDR3-2133 memory modules from GeIL, if it hadn’t been for one thing. You may have noticed that we checked out the overclocking potential of our GeIL GE34GB2133C9DC kit in an LGA1156 system based on a Core i7 processor, which has very little to do with the topic of today’s review. And we had our reasons for that. The memory controller in Clarkdale processors differs from the memory controller in Lynnfield CPUs not only in its location on a separate semiconductor die. As our practical tests showed, DDR3 behaves quite differently with dual-core LGA1156 processors: we couldn’t reach the same high results during memory overclocking with a Clarkdale processor. And it looks like this may be a general issue: any high-speed enthusiast memory tops out at lower frequencies with dual-core LGA1156 processors than it would with a quad-core CPU.

For example, GeIL EVO ONE PC3-17000 kit that is initially designed for work as DDR3-2133, and in fact can remain fully stable at over 2.2 GHz frequencies, dropped down to the following level when installed into a Core i5-655K based system:

DDR3-2000 is the maximum our Clarkdale processor can offer GeIL GE34GB2133C9DC memory. In other words, overclocking the memory controller in dual-core LGA1156 processors does reveal a few hidden obstacles, and the mere fact that the overclocker Core i5-655K processor allows setting DDR3-2133 mode for the memory means nothing. In fact, we found no practical evidence that memory could actually run in this mode with a processor like that.

By the way, as further tests showed, Clarkdale’s memory controller causes problems not only when we try to use high-speed DDR3 memory. You may also encounter difficulties when you try setting aggressive memory timings, although a lot depends on the divider you are using for the memory frequency. Nevertheless, it is an undeniable fact that the memory controller of the dual-core LGA1156 processors is less flexible than the memory controller in Lynnfield CPUs.

Testbed Configuration

For our tests we used the following testbed:

Test 1: Nominal Mode

The first part of our test session was devoted to the work of our system in its nominal mode, when none of the system components were overclocked. Only the multiplier for the memory frequency and memory timings were changed. I have to say that during this test we tried to emulate the most typical operational conditions for our platforms, so we decided not to deactivate any processor technologies. Hyper-Threading, Turbo Mode and Enhanced Intel SpeedStep worked as usual: the system saw our Core i5-655K processor as a quad-core one, and its clock frequency increased to 3.33 GHz or 3.46 GHz under computational load of different intensity.

At first we were going to test Core i5-655K in all possible modes, which in this case are more numerous than with a regular LGA1156 processor. The regular Core i5 CPUs can only clock the memory as DDR3-800, DDR3-1066 or DDR3-1333, but the overclocker Core i5-655K also supports DDR3-1600, DDR3-1866 and DDR3-2133 modes. At least, this is the conclusion we drew after checking out the settings available in the mainboard BIOS with this processor. However, practical experiments showed that not all the memory configurations are operational. In particular, as we have already mentioned above in the description of our DDR3-2133 memory, Core i5-655K failed to work stably with the memory frequency at 2133 MHz. Therefore, we had to eliminate the DDR3-2133 mode from our tests. Another problem emerged in DDR3-1600 mode. In this mode the memory remained stable only with less aggressive timings set to 9-9-9-27 or 8-8-8-24. When we set the timings to 7-7-7-21, the system would freeze on boot-up, although GeIL GE34GB2133C9DC definitely supports this operational mode according to the spec. In other words, the Clarkdale memory controller is not as simple as it seems at first glance, so it is obviously very strange that the regular representatives of the Clarkdale family do not have any multipliers that would allow clocking the memory at frequencies past DDR3-1333.

We used Cachemem benchmark in Lavalys Everest utility to test the memory subsystem bandwidth and latency.

Things are very interesting here. On the one hand, higher memory frequency and lower timings logically reduce the overall practical latency. On the other hand, if we look at the read performance of the memory subsystem (one of the most important practical parameters), its growth is barely noticeable. It seems that with memory working at speeds past DDR3-1333, the bandwidth along the path between the processor computational cores and the memory is artificially limited by some obstacle. It would be quite logical to assume that this obstacle is none other than the internal bus connecting the processor die with the die containing the memory controller itself. In other words, using high-speed memory with Clarkdale processors without overclocking by raising the base clock frequency won’t do much good.
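To see how much headroom the memory itself leaves unused, it helps to compare a measured read speed with the theoretical peak of the memory bus. The sketch below does this for the DDR3-1333 Everest results quoted in the introduction. It assumes a dual-channel configuration with 64-bit (8-byte) channels and treats 1 MB as 10^6 bytes; the function and constant names are illustrative, and none of these assumptions is stated in the article.

```python
BYTES_PER_TRANSFER = 8  # assumed 64-bit wide DDR3 channel
CHANNELS = 2            # assumed dual-channel mode


def peak_bandwidth_mbs(data_rate_mts):
    """Theoretical peak bandwidth in MB/s for a given DDR3 rating (MT/s)."""
    return data_rate_mts * BYTES_PER_TRANSFER * CHANNELS


peak = peak_bandwidth_mbs(1333)

# Memory Read results at DDR3-1333 9-9-9-27 from the introduction.
for name, read_mbs in (("Clarkdale", 9341), ("Lynnfield", 12891)):
    print(f"{name}: {read_mbs / peak:.0%} of the {peak} MB/s peak")
```

Under these assumptions Clarkdale reaches well under half of the theoretical peak even at DDR3-1333, which fits the idea that the limit lies somewhere other than the memory modules.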

Of course, the benchmark results reflect this observation clearly:

Although fast DDR3 SDRAM mostly lowers memory subsystem latency and barely increases the actual bandwidth, we can see that performance improves in a number of tests when we install faster memory modules into our testbed. On average, using DDR3-1866 instead of DDR3-1333 delivers about a 3% performance boost. This is exactly what Intel takes away from us by telling us not to use memory faster than 1333 MHz with their Core i5 and Core i3 processors. In our opinion, this is a pretty ephemeral increase in actual speed, which gives us good reason to doubt the need for high-speed DDR3 SDRAM in non-overclocked LGA1156 systems. Sad as it may seem, the memory controller in Clarkdale based systems is no match for the memory controller in Lynnfield under any circumstances, so increasing the memory frequency does very little in this case.

Test 2: Overclocking

The results above can hardly be considered a motive for using high-speed memory in LGA1156 systems with dual-core processors. Moreover, this way of using high-speed memory, namely without increasing the base clock, is only available to select users who are lucky to have a K-series CPU with an unlocked frequency multiplier. In most Clarkdale based systems, memory faster than 1333 MHz comes into play only when the CPU is overclocked by raising the BCLK frequency. This is exactly why we decided to devote the second part of our test session to studying the effects of the memory subsystem settings on the performance of a system overclocked in this particular manner.

I have to point out right away that when you overclock your processor by raising the base clock, all busses in your system will be working at higher speeds. Besides the processor clock frequency, the speed of the QPI bus connecting the processor dies inside also increases. As a result, we expect to see more obvious connection between the performance and memory subsystem speed.

In order to make the testing conditions as realistic as possible, we overclocked our Core i5-655K CPU to 4.4 GHz. This frequency was obtained with a 22x multiplier and a 200 MHz BCLK. All dual-core Clarkdale processors support this multiplier, so the results of this test can be easily applied to the majority of systems out there.

During the tests we disabled Turbo Mode, which changes the processor multiplier dynamically, because only in this case overclocking could be most fruitful.

The increase in the base clock frequency changed the memory frequencies available through the individual multipliers. Instead of DDR3-800, DDR3-1067 and DDR3-1333 the CPU automatically got support for DDR3-1200, DDR3-1600 and DDR3-2000 respectively. All three modes are available not only in systems using a CPU with an unlocked multiplier: they can be enabled with any Clarkdale processor. This is exactly why we used this particular BCLK frequency of 200 MHz, as this trick lets you have DDR3-2000 mode on any LGA1156 CPU.
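The relationship between the base clock, the memory multipliers and the resulting DDR3 modes is simple multiplication, and can be sketched as follows. The 6x, 8x and 10x memory multipliers, the 22x CPU multiplier and the 200 MHz BCLK come from this article; the nominal base clock of 133.33 MHz is an assumption (it is not stated here), and the function names are illustrative.

```python
NOMINAL_BCLK_MHZ = 400 / 3    # assumed stock base clock, about 133.33 MHz
OVERCLOCKED_BCLK_MHZ = 200    # base clock used in the overclocking test
MEMORY_MULTIPLIERS = (6, 8, 10)


def ddr3_rating(bclk_mhz, multiplier):
    """Effective DDR3 data rate for a base clock and memory multiplier."""
    return round(bclk_mhz * multiplier)


def cpu_clock_mhz(bclk_mhz, cpu_multiplier):
    """CPU core clock for a base clock and CPU multiplier."""
    return bclk_mhz * cpu_multiplier


for m in MEMORY_MULTIPLIERS:
    nominal = ddr3_rating(NOMINAL_BCLK_MHZ, m)
    overclocked = ddr3_rating(OVERCLOCKED_BCLK_MHZ, m)
    print(f"{m}x: DDR3-{nominal} -> DDR3-{overclocked}")

print(cpu_clock_mhz(OVERCLOCKED_BCLK_MHZ, 22), "MHz CPU clock")
```

The same multiplication explains why every bus tied to the base clock speeds up together when the BCLK is raised.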

As usual, synthetic benchmarks come first:

As we expected, the memory frequency has much more influence over the practical bandwidth of the memory subsystem when the CPU is overclocked. While in nominal mode a 40% increase in memory frequency boosted reading from the memory by only 7%, now the 66% higher DDR3 SDRAM frequency produces an over 21% boost in practical memory subsystem bandwidth. In other words, overclocking the internal processor QPI bus does have a positive effect on the memory subsystem performance.
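One way to compare the two cases is to look at how much of each frequency increase turns into read bandwidth. The sketch below assumes that the 40% figure refers to the step from DDR3-1333 to DDR3-1866 and the 66% figure to the step from DDR3-1200 to DDR3-2000; the read gains of 7% and 21% are the ones quoted in the text (the latter is a lower bound), and the function name is illustrative.

```python
def percent_gain(old, new):
    """Relative increase from old to new, in percent."""
    return (new / old - 1.0) * 100.0


# Memory frequency steps (assumed pairing, see the note above).
nominal_freq_gain = percent_gain(1333, 1866)      # nominal mode
overclocked_freq_gain = percent_gain(1200, 2000)  # BCLK raised to 200 MHz

# Read bandwidth gains quoted in the text, in percent.
nominal_read_gain = 7.0
overclocked_read_gain = 21.0

# Fraction of the frequency increase that shows up as read bandwidth.
nominal_scaling = nominal_read_gain / nominal_freq_gain
overclocked_scaling = overclocked_read_gain / overclocked_freq_gain

print(f"nominal:     {nominal_scaling:.2f}")
print(f"overclocked: {overclocked_scaling:.2f}")
```

By this rough measure, somewhat under a fifth of the frequency increase reaches the read results in nominal mode, against roughly a third in the overclocked system.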

There is only one catch: the results in the system with DDR3-1600 SDRAM are very low. However, there is no mistake here: the Clarkdale memory controller simply had another surprise in store for us. The 8x memory frequency multiplier (which in nominal mode clocks the memory as DDR3-1067) has the peculiar effect of lowering the practical memory subsystem bandwidth.

We see no anomalies in terms of latencies. More aggressive timings as well as higher memory frequency always lower the practical latency accordingly.

Now let’s see how this diverse functioning of the memory controller affects the performance in other benchmarks and applications:

DDR3-1600 mode selected with the BCLK increased to 200 MHz is far from the optimal choice. According to the results, DDR3-1200 with lower timings almost always produces a better outcome. However, we have no complaints about DDR3-2000. High-speed memory with this frequency does have a highly positive effect on overall performance during overclocking. On average, using DDR3-2000 in an overclocked system with a dual-core LGA1156 processor can improve performance by about 5%, and in applications that are sensitive to memory subsystem parameters (such as games) the performance may increase by as much as 10%.

Conclusion

The primary conclusion that we can draw from the results of today’s test session is that the performance of dual-core LGA1156 processors doesn’t seriously depend on the memory speed. This is not unique to Clarkdale processors: we have already talked about the minimal effect of the memory speed on performance many times before.

But there is one peculiarity in this case. Although Clarkdale processors formally have an integrated memory controller, in reality the controller is inside an individual semiconductor die that is connected with the processor die via QPI bus. This additional bus becomes a bottleneck that causes the performance to increase minimally when the memory is faster than DDR3-1333. However, DDR3-1600 or DDR3-1866 SDRAM without the overclocked BCLK frequency can only be used with Core i5-655K CPU with an unlocked frequency multiplier, which is not really that widely spread. Regular dual-core Core i5 or Core i3 processors do not allow clocking the memory at anything higher than 1333 MHz in nominal mode.

During overclocking by raising the BCLK frequency, the frequency of the notorious QPI bus also increases, so there are no more problems with high-speed memory in systems with an overclocked Clarkdale processor. Instead, we discover another peculiarity: it is better to stay away from the 8x multiplier for the memory, because the actual memory subsystem bandwidth for some reason drops substantially in this case. Therefore, it is best to use either the smaller multiplier of 6x with the minimal timings, or the maximum multiplier of 10x to achieve maximum performance.

In the best case scenario, you can gain as much as 8-10% in extra performance by playing with the memory subsystem parameters. It is up to you to decide if that is enough to justify the investment into overclocker memory. But it definitely makes no sense to buy anything faster than DDR3-2000 for a system with a dual-core LGA1156 processor. Even if you have a Core i5-655K CPU with an unlocked frequency multiplier (not to mention regular Clarkdale CPUs), you won’t be able to use super-fast memory unless you have extreme cooling solutions.

Here I would only like to add that despite all our tricks, we could bring the memory controller of our Clarkdale CPU just a little bit closer to its counterpart in Lynnfield processors. We had to overclock all busses in our Clarkdale based system by 50% and use high-speed overclocker DDR3-2000 SDRAM in order to get our memory subsystem as fast as that in Lynnfield based configurations with DDR3-1333 SDRAM. So, the more expensive quad-core LGA1156 processors are indisputably faster than their dual-core brothers not only in computational power, but also in working with the memory subsystem.