DDR3 SDRAM for Sandy Bridge: Choosing the Best Memory for the LGA1155 Platform

Sandy Bridge processors pose a number of questions for computer enthusiasts, one of them being the choice of the right memory. LGA1155 systems can work with DDR3-1067, DDR3-1333, DDR3-1600, DDR3-1866 and DDR3-2133, but does it really make sense to use super-fast memory in them?

by Ilya Gavrichenkov
05/16/2011 | 07:59 PM

It is commonly believed that memory frequency and timings have little influence on performance in contemporary systems. So, it only makes sense to invest in high-speed memory modules if the other system components, such as the CPU, graphics card and hard drive, are already running at the top of their ability. This reasoning didn’t emerge out of thin air. Indeed, tests show that improving the memory sub-system settings in Phenom II, Core i7 and Core i5 systems provides only about a 3-7% speed gain, which is a minor improvement.

However, these conclusions were first made a while back, so they are primarily true for previous-generation platforms. We haven’t yet discussed how the memory sub-system speed influences the overall performance of contemporary LGA1155 systems, and it obviously makes little sense to carry the old results over to the new Sandy Bridge based platform. Back in our first discussion of this new microarchitecture, we pointed out that the memory controller in Sandy Bridge is implemented very differently from the one in the older Westmere and Nehalem processors. Namely, the memory controller is now located in a different functional unit than the L3 cache and uses a new ring bus to connect to the computational cores of the processor. All this could affect the contribution of the memory sub-system to overall system performance. So, we decided to set up a special test session and find out which memory would be the optimal choice for LGA1155 processors.

Closer Look at the Memory Controller

On paper, the memory controller in Sandy Bridge processors has the same characteristics as that of their predecessors. According to the manufacturer, it is compatible with up to 32 GB of DDR3-1066/1333 SDRAM. Just as before, you can use one or two unbuffered memory modules per channel. ECC technology is not supported in desktop processors. The single-channel and asymmetric modes also remain unchanged: you do not have to install an identical number of modules with identical specs, but you can only achieve maximum performance if you have an even number of identical DDR3 SDRAM modules in your system.

Although the specifications of the Sandy Bridge memory controller look exactly the same as those of the earlier memory controllers used in LGA1156 processors, its internal structure is totally different. Recall that the first processors on the Nehalem microarchitecture had their memory controller and L3 cache located within the same Uncore functional unit, which worked at its own frequency and voltage settings and used a crossbar bus to connect to the computational cores. Later, when Westmere processors came out, the memory controller was combined with the processor graphics core and placed on a semiconductor die separate from the computational cores. The internal structure changed drastically again in Sandy Bridge CPUs. Intel introduced a ring bus to connect all processor units with one another, and the L3 cache became an individual functional entity. As a result, the memory controller stopped wandering around different parts of the processor, turned into an independent unit and took its place next to the System Agent.

This internal reorganization had an exceptionally positive effect on memory sub-system performance. The memory controller got closer to the computational cores, so to speak. It is now not only on the same semiconductor die as the L3 cache and the CPU cores, but is also no longer logically separated from them, because it is connected to the same ring bus that links all internal processor components with one another. This connection to the ring bus, which works at the processor frequency, ensured a significant increase in bandwidth between the processor cores, L3 cache and memory controller.

Processors from the Sandy Bridge generation clearly work with memory much faster now. To illustrate this, we used popular synthetic benchmarks, AIDA64 Cachemem and MaxxMem2, to measure the practical bandwidth and latency of the memory sub-system in LGA1156 and LGA1155 platforms. We used Lynnfield CPUs (on the Nehalem microarchitecture), Clarkdale CPUs (on the Westmere microarchitecture) and Sandy Bridge CPUs, all working at the same frequency of 3.2 GHz and equipped with the same dual-channel DDR3-1333 SDRAM with 7-7-7-21 timings.

As we see, Sandy Bridge not only fixes the issues of the Westmere microarchitecture, where memory controller performance was arguably the biggest disappointment, but goes even beyond that. In terms of measured results, the Sandy Bridge memory controller is even faster than the one in Lynnfield processors, although the latter was considered the industry’s most efficient memory controller until recently. Its advantage is most significant in memory bandwidth, which is quite logical considering that the memory controller is connected directly to the Sandy Bridge ring bus. So, it looks like the memory sub-system speed in LGA1155 systems may actually have a much more serious effect on performance than in previous-generation platforms, because there are now fewer bottlenecks along the path between the computational cores and the actual memory.

DDR3 Frequencies and Overclocking

Increased efficiency is not the only advantage of the new memory controller. Sandy Bridge processors also boast an extended list of supported memory frequencies. Although the memory controllers in Core processors officially support only DDR3-1066 and DDR3-1333 SDRAM, they can clock the memory at higher frequencies just fine, which is nothing new. For example, Core i7 LGA1156 processors have settings that allow using DDR3-1600 SDRAM, while the overclocker-oriented Core i7-875K and Core i5-655K CPUs support even higher memory frequencies. LGA1155 processors from the Sandy Bridge family go beyond DDR3-1600: any Core i3, Core i5 or Core i7 LGA1155 processor also unofficially supports DDR3-1866, DDR3-2133 and DDR3-2400 SDRAM.

The memory frequency in Core processors is set as the base clock generator frequency (BCLK) times a corresponding multiplier. The multipliers supported by each processor type actually determine the acceptable frequency intervals. Sandy Bridge processors have a longer list of supported multipliers than their predecessors. That is why the BIOS option responsible for the memory frequency in LGA1155 systems looks as follows:

However, do not forget that the new LGA1155 systems do not allow overclocking by raising the base clock generator frequency. The changes made to the clocking scheme in LGA1155 systems allow only a minuscule adjustment of the BCLK frequency: in most cases the system will immediately lose stability if BCLK is increased by more than 5% above the nominal setting of 100 MHz. In other words, while in LGA1156 systems we could increase the BCLK frequency to overclock not only the processor but also the memory, this approach will not work on the new LGA1155 platforms. Overclocking now comes down to multiplier adjustment alone, so it is quite logical that the new Sandy Bridge processors acquired a wider range of settings for DDR3 SDRAM. This way Intel preserves the opportunity for users to take advantage of overclocker memory modules. Previously, we had to overclock the CPU to make the memory run at higher frequencies, but today we can use high-speed DDR3 SDRAM kits without any CPU overclocking. We don’t even need a special overclocker processor model with an unlocked clock frequency multiplier: any LGA1155 CPU allows you to enable a memory mode like DDR3-1866 or DDR3-2133.
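The relationship between BCLK, the memory multiplier and the resulting DDR3 data rate can be sketched in a few lines of Python. This is only an illustration: the function name, the table of modes and the idea of expressing each mode as a whole number of 266.67 MT/s steps are our assumptions, derived from the nominal data rates and the 100 MHz BCLK mentioned above, not a description of how any particular BIOS encodes the setting.

```python
# Sketch: DDR3 data rate as BCLK times a memory multiplier.
# Assumption: each selectable memory mode is a whole number of
# 266.67 MT/s steps at the nominal 100 MHz BCLK (DDR3-1067 = 4 steps).

NOMINAL_BCLK_MHZ = 100.0
MULTIPLIER_PER_STEP = 8.0 / 3.0  # 266.67 MT/s per step at 100 MHz BCLK

# Hypothetical lookup table: mode name -> number of 266.67 MT/s steps.
MEMORY_MODES = {
    "DDR3-1067": 4,
    "DDR3-1333": 5,
    "DDR3-1600": 6,
    "DDR3-1866": 7,
    "DDR3-2133": 8,
}


def memory_data_rate(bclk_mhz: float, steps: int) -> float:
    """Return the DDR3 data rate in MT/s for a given BCLK and mode."""
    return bclk_mhz * MULTIPLIER_PER_STEP * steps


if __name__ == "__main__":
    for name, steps in MEMORY_MODES.items():
        nominal = memory_data_rate(NOMINAL_BCLK_MHZ, steps)
        # 5% is the BCLK headroom quoted above for most systems.
        raised = memory_data_rate(NOMINAL_BCLK_MHZ * 1.05, steps)
        print(f"{name}: {nominal:.0f} MT/s nominal, "
              f"{raised:.0f} MT/s at 105 MHz BCLK")
```

Under these assumptions, a 5% BCLK increase lifts DDR3-1600 to only about 1680 MT/s, well short of the next memory mode, which is why the multiplier is the practical tuning knob on this platform.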

At the same time, you do not need any tricks to make Sandy Bridge work stably with memory running at high frequencies. If the memory modules you are using are rated for a certain frequency and certain timings, then all you need to do is set these values in the mainboard BIOS, and that’s it. The memory controller works perfectly fine with any available settings at its default voltage. The only thing we would recommend for better stability with DDR3-2400 SDRAM is to slightly increase the System Agent voltage (VccSA), but to no more than 1.2 V. As for the memory DIMM voltage, the recommendations remain the same as for Nehalem processors: it is not recommended to increase the memory voltage beyond 1.65 V, because it may cause the processor’s memory controller to fail.

Testbed Configuration and Testing Methodology

Sandy Bridge processors offer us a lot of flexibility when it comes to experimenting with system memory. You can use high-speed DDR3 SDRAM modules even in systems that are not cut out for overclocking and have their CPU working in nominal mode. That is why we decided to investigate the influence of memory speed on system performance in two different modes: in nominal mode and with an overclocked processor. Even though the only difference between an overclocked and a non-overclocked Sandy Bridge processor is the multiplier, the system memory speed may have a different effect on performance in each case. Overclocking increases the CPU’s appetite for data, which is why high memory speed may be of greater importance in high-performance systems. Moreover, a higher CPU frequency leads to higher bandwidth inside the processor ring bus, so the memory controller efficiency may also improve during overclocking.

To check whether our assumptions are true, we put together an LGA1155 system built around a quad-core Core i5-2500K processor from the overclocking-friendly K-series, which features an unlocked clock frequency multiplier. We completed the system with a pair of DDR3-2133 memory modules from GeIL, the GeIL EVO ONE PC3-17000, which support a wide range of frequencies and latencies. Our test system consisted of the following components:

In nominal mode, the technologies responsible for dynamic management of the processor clock frequency, namely Turbo Boost and Intel Enhanced SpeedStep, remained active.

In overclocked mode Turbo Boost technology was disabled, but Intel Enhanced SpeedStep remained up and running. The CPU clock frequency was set at 4.7 GHz.

The memory was tested in the following modes, which represent the settings of today’s most popular DDR3 SDRAM kits:

Performance

Memory Sub-System Synthetic Benchmarks

First of all, we are going to look at the synthetic tests of memory sub-system performance. We will measure the actual bandwidth and latency using the Cachemem benchmark built into the AIDA64 utility.

The obtained results reveal a few interesting things. First of all, there is a significant difference in actual memory sub-system speed between memory modules with different frequencies and timings. By simply increasing the DDR3 SDRAM speed from 1067 MHz to 2133 MHz, we get a huge 60% increase in practical bandwidth. We haven’t seen anything like that in systems based on other processors, which clearly indicates that there are no serious bottlenecks on the path between the processor cores and system memory.
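To put the measured 60% gain into perspective, it helps to compare it with the theoretical peak bandwidth of the memory bus. The sketch below assumes the standard 64-bit DDR3 channel width and a dual-channel configuration; the function name is ours and purely illustrative.

```python
def peak_bandwidth_gbs(data_rate_mts: float, channels: int = 2,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    slow = peak_bandwidth_gbs(3200 / 3)  # DDR3-1067
    fast = peak_bandwidth_gbs(6400 / 3)  # DDR3-2133
    print(f"DDR3-1067: {slow:.1f} GB/s, DDR3-2133: {fast:.1f} GB/s")
    print(f"Theoretical gain: {100 * (fast / slow - 1):.0f}%")
```

Doubling the data rate doubles the theoretical peak, so the 60% gain in practical bandwidth captures a large share of what the faster modules can offer on paper.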

Secondly, it is quite telling that not only the read speed but also the write speed depends on the memory module frequency. In previous-generation systems this dependency was either absent or minimal. This peculiarity of the Sandy Bridge memory controller will also contribute to the system performance gain from increasing the memory frequency.

Thirdly, the frequency of DDR3 memory modules has a greater effect on memory sub-system performance than their timings. In fact, lower timings produce only slightly lower practical latency, while setting the memory frequency just one 266-MHz increment higher easily outdoes the effect of lowering the timings.
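A rough way to see why frequency can outweigh timings is to convert CAS latency from clock cycles into nanoseconds. The sketch below covers only the CAS portion of the access time, not the full latency a benchmark measures, and the specific settings in it are hypothetical examples rather than the modes from our test session.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """CAS latency in nanoseconds.

    The memory clock runs at half the data rate, so one cycle lasts
    2000 / data_rate_mts nanoseconds.
    """
    return cas_cycles * 2000.0 / data_rate_mts


if __name__ == "__main__":
    # Hypothetical settings chosen for illustration only.
    base = cas_latency_ns(4000 / 3, 8)     # DDR3-1333, CL8
    tighter = cas_latency_ns(4000 / 3, 7)  # DDR3-1333, CL7
    faster = cas_latency_ns(1600, 8)       # DDR3-1600, CL8
    print(f"DDR3-1333 CL8: {base:.2f} ns")
    print(f"DDR3-1333 CL7: {tighter:.2f} ns")
    print(f"DDR3-1600 CL8: {faster:.2f} ns")
```

In this example, one frequency step at the same CAS setting shortens the delay more than one step of tighter timings at the same frequency, and the faster mode brings extra bandwidth on top of that.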

As a result, we can conclude that it definitely makes sense to use overclocker memory in LGA1155 systems, and that we should prefer higher frequency over lower timings. So far, however, we have only discussed the results of synthetic benchmarks, which estimate system performance specifically when working with memory.

General Performance: PCMark Vantage, 3DMark 11

To estimate the average platform performance, PCMark Vantage measures the speed of popular real-world algorithms that users work with every day. Here we no longer see any dramatic performance differences between systems featuring memory modules with different specifications. Raising the memory frequency by one 266-MHz increment produces a barely noticeable 1-2% performance gain. The performance difference between the system equipped with the fastest DDR3-2133 and the slowest DDR3-1067 memory is only 5% in nominal mode and 6% in overclocked mode.

According to the popular 3DMark 11 graphics test, graphics sub-system performance doesn’t really depend on memory speed at all.

However, besides the general graphics score, 3DMark 11 also generates another score that is particularly interesting in our case: the Physics score. This number is produced by a dedicated physics test that emulates the behavior of a complex mechanical system with a large number of objects.

It turns out that the mathematical calculations performed within this test are pretty sensitive to memory speed. By simply increasing the memory frequency, you can boost performance by up to 15-20%. Note that the effect of faster memory is most noticeable in an overclocked system. When our test Core i5-2500K works at its nominal frequency, most of the performance growth occurs in the interval between DDR3-1067 and DDR3-1600, and faster memory modules have a less obvious effect on performance in the physics test.

Performance in Applications

To test processor performance during data archiving, we use the WinRAR archiving utility. Using the maximum compression rate, we archive a folder containing multiple files with a total size of 1.1 GB.

Different applications react differently to changes in the memory sub-system parameters. Although the average dependency between performance and memory frequency or timings is usually not very prominent, other situations are also possible. Archiving is one of them: the importance of memory sub-system performance here should not be underestimated. It is remarkable that when our Core i5-2500K overclocked to 4.7 GHz works with the slow DDR3-1067 or DDR3-1333 SDRAM, it is slower than a non-overclocked processor working in tandem with the faster DDR3-1866 or DDR3-2133 SDRAM. This is not surprising, because each 266 MHz increase in memory frequency leads to about a 5-10% acceleration in data compression speed. The memory timings have a much smaller effect: one increment either way causes about a 2-3% change in compression time.

We measured performance in Adobe Photoshop using our own benchmark, a creatively modified version of the Retouch Artists Photoshop Speed Test. It includes typical editing of four 10-megapixel images from a digital camera.

The memory sub-system speed does affect overall performance during image processing, but its influence is not very pronounced. Even if we compare the time it takes to complete the test with the slowest memory against the time it takes with the fastest memory, the difference won’t exceed 3.5% for a non-overclocked system and 5.5% for an overclocked one.

To measure how fast our test configurations can transcode video into the H.264 format, we used the x264 HD benchmark. It works with an original MPEG-2 video recorded in 720p resolution with a 4 Mbps bitrate. The results of this test are of great practical value, because the x264 codec is also part of numerous popular transcoding utilities, such as HandBrake, MeGUI, VirtualDub, etc.

The results are almost the same as those we have just seen in Photoshop. Video transcoding also doesn’t depend much on memory sub-system performance.

We use the Cinebench test to measure the final rendering speed in Maxon Cinema 4D.

It looks like a low dependence of system performance on memory sub-system speed and timings is typical for Sandy Bridge platforms. However, this has less to do with the platform than with the applications themselves: most of them do not work with large data arrays, so the large cache memory of contemporary processors can easily ensure fast access to data.

Performance in 3D Games

At the same time, there are some applications that use system memory very actively and therefore react immediately to any changes in its speed. These applications are 3D games.

As you know, in the majority of contemporary games it is the graphics sub-system that determines the performance of a platform equipped with a reasonably fast processor. Therefore, we do our best to make sure that the graphics card is not loaded too heavily during the test session: we select the most CPU-dependent tests and run them all without antialiasing and at resolutions far from the highest. In other words, the obtained results show not so much the fps rate that can be achieved in systems equipped with contemporary graphics accelerators as how well contemporary processors cope with gaming workloads.

As we can see, gamers should really take memory speed into consideration. Of course, the situation differs from game to game, but all in all, raising the memory frequency by one 266-MHz increment produces about a 2% gain in fps rate in nominal mode and about 3-4% in a system with an overclocked processor. Therefore, choosing the right memory for a gaming computer should be taken seriously. Slow DDR3 SDRAM modules may turn into a system bottleneck that prevents the processor and graphics card from revealing their true potential. This is especially true since there are some games (F1 2010 in our case) where you can gain an fps or two by simply playing around with memory timings, not to mention the significant performance boost that results from increasing the memory frequency.

Conclusion

The memory controller was modified yet again with the launch of the new processors on the Sandy Bridge microarchitecture, and this time the change deserves our most positive feedback. Intel engineers not only managed to fix the issues in the memory controller of the previous-generation Westmere processors, but also created a new controller that turned out to be the highest-performing of all existing implementations. With all major bottlenecks between the computational cores and the memory controller eliminated, Sandy Bridge proved to be more dependent on the specifications of the DDR3 SDRAM modules in the system than its predecessors or competitors.

However, this doesn’t change the situation on a larger scale. Every time we discussed the effects of memory speed on overall performance in certain configurations, we arrived at the conclusion that these effects were quite insignificant. This conclusion, which we made some time ago for Socket AM3 and LGA1156 systems, has proved true one more time. It is also valid for Sandy Bridge based platforms and is backed up by the test results. The results show that a 266 MHz increase in memory frequency produces only 2-4% growth in average performance, and setting all latencies one step lower boosts performance by 1-2% at best.

However, all this doesn’t mean that you should disregard the need to make an educated decision on the best memory for your LGA1155 system. The slight practical effect of faster memory is only the average picture. At the same time, there are applications that work with large amounts of data, and their performance depends much more on the DDR3 SDRAM specifications. Among such applications are, for example, some contemporary games, where you can gain a few extra frames per second by simply upgrading your memory.

This uncertainty, together with the fairly wide range of prices for DDR3 SDRAM modules with different specifications, does not allow us to give specific recommendations regarding the best memory choices for the Sandy Bridge platform. However, in general terms, you should keep two things in mind. Firstly, memory frequency is of greater importance for overall system performance than memory timings. Secondly, the additional investment in faster memory may not pay off. In particular, high-speed DDR3-2133 and DDR3-1866 modules may cost 1.5-2 times more than ordinary DDR3-1333 SDRAM.
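The trade-off can be illustrated with some back-of-the-envelope arithmetic. The sketch below takes the 2-4% average gain per 266-MHz increment and the 1.5-2x price ratio quoted above, and assumes (our simplification, not a measured result) that the per-step gains compound over the three increments between DDR3-1333 and DDR3-2133. The function names are illustrative.

```python
def compounded_gain(gain_per_step: float, steps: int) -> float:
    """Total fractional gain after `steps` equal relative improvements."""
    return (1.0 + gain_per_step) ** steps - 1.0


def performance_per_cost(gain_per_step: float, steps: int,
                         price_ratio: float) -> float:
    """Relative performance per unit of money versus the baseline kit (1.0)."""
    return (1.0 + compounded_gain(gain_per_step, steps)) / price_ratio


if __name__ == "__main__":
    steps = 3  # DDR3-1333 -> DDR3-2133 is three 266-MHz increments
    # Most favorable and least favorable combinations of the quoted ranges.
    for gain, price in ((0.04, 1.5), (0.02, 2.0)):
        total = compounded_gain(gain, steps)
        value = performance_per_cost(gain, steps, price)
        print(f"{gain:.0%} per step at {price}x price: "
              f"+{total:.1%} speed, {value:.2f}x performance per cost")
```

Even in the most favorable combination, the faster kit delivers less performance per unit of money than ordinary DDR3-1333 under these assumptions, which matches the second point above.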

Therefore, we believe that inexpensive DDR3-1600 SDRAM with not very aggressive timings would be the most reasonable choice for contemporary LGA1155 systems: in our opinion, memory like that offers the best price-to-performance ratio today.