What is the best memory for Haswell? From DDR3-1333 to DDR3-2933: performance scaling test

Processor microarchitectures keep evolving, and the frequencies of contemporary DDR3 SDRAM keep growing as well. Is there any sense in using high-speed memory with modern Haswell processors? To answer this question, we have analyzed how DDR3 frequency and timings influence the performance of the LGA 1150 platform.

by Ilya Gavrichenkov
03/28/2014 | 06:16 PM

We don’t often check out top-end computing platforms with different memory types. That’s not a topic that interests the average user. We’ve all come to think that the clock rate and timings of DDR3 SDRAM do not affect performance much, so we don’t pay much attention to choosing system memory. It is usually the last component you think about when building a new computer, even if you’re an enthusiast. In fact, the only memory parameter that is still important is its amount. Everyone knows that a lack of system memory may result in the OS and applications using the swap file and thus making your computer less responsive. We seem to have forgotten that memory modules have parameters other than capacity.

 

Of course, there are reasons for our attitude. For a long time, the clock rate and latencies of DDR3 SDRAM indeed didn’t have much effect on performance. Why? First, some time ago processors got lots of cache memory with data pre-fetch algorithms, which turned out to be very efficient at concealing the actual memory access speed from the application. Second, DDR3 SDRAM modules available on the market didn’t actually vary much in terms of their speed and latency. And third, average users didn’t run applications that needed to process really huge amounts of data. It is for these reasons that fast DDR3 SDRAM came to be seen as a high-status product for perfectionists.

But while this notion was well-grounded just a couple of years ago, it seems to have become obsolete by now. Today’s applications are different and work with much larger amounts of data than before. Ordinary people find themselves editing digital photos of tens of megapixels or being creative with Full-HD or even Ultra-HD video. Today’s 3D games have colossal amounts of texture data, too. So much data just cannot fit into the processor’s cache, especially as the amount of cache memory hasn’t been growing much in recent CPUs.

As for memory products, they have become much more varied, so there’s a roughly twofold difference between them when it comes to clock rate. By choosing one memory kit or another, you can vary your peak system memory bandwidth from 21 to 47 GB/s and even more! The latest Haswell-based CPUs have also become faster than their predecessors and need faster access to data to show their full potential. That’s why we can expect that low-speed memory like DDR3-1333 and DDR3-1600 may indeed be too slow for today’s top-end platforms, and we need to carry out some tests to find the optimal memory type.
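The 21 and 47 GB/s figures can be reproduced with simple arithmetic. The sketch below is ours and rests on one assumption: each DDR3 channel is 64 bits (8 bytes) wide, so the peak bandwidth is the data rate times 8 bytes times the number of channels. The function name is purely illustrative.

```python
# Theoretical peak bandwidth of a DDR3 configuration.
# Assumption: each channel is 64 bits (8 bytes) wide, as on standard DDR3 DIMMs.

BYTES_PER_TRANSFER = 8  # one 64-bit transfer per channel


def peak_bandwidth_gbs(data_rate_mts: float, channels: int = 2) -> float:
    """Return the theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes)."""
    return data_rate_mts * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    # Nominal data rates in megatransfers per second (DDR3-1333 is 1333.33 MT/s).
    for name, rate in [("DDR3-1333", 1333.33), ("DDR3-1600", 1600.0), ("DDR3-2933", 2933.33)]:
        print(f"{name}: {peak_bandwidth_gbs(rate):.1f} GB/s")
```

These are upper bounds only. The bandwidth actually measured in benchmarks is lower and, as the tests later in this review show, does not grow strictly in proportion to the clock rate.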

There’s one more reason for us to take up this topic right now. In fact, it may be the last opportunity for us to do so because, starting from the second half of this year, the faster, more economical and more advanced DDR4 SDRAM is going to arrive on desktop computers. It will be first supported by the Haswell-E series. Later on, in 2015 or 2016, this new memory type will become available with the LGA1151 platform and Skylake processors. So now is just the right time to check out different DDR3 SDRAM products with Haswell-based CPUs. Let's get started!

Haswell’s Memory Controller

At first glance, the memory controller of today’s LGA1150 processors, known under the codename of Haswell, doesn’t seem to differ much from its predecessors in the Sandy Bridge and Ivy Bridge CPUs. Intel’s memory communication algorithms have been evolving for a long time and in many ways, and seem to have reached a point of perfection in the latest CPU designs. What puts Intel’s modern memory controllers head and shoulders above all other solutions is the connection of all CPU subunits with a common Ring Bus, which was introduced in the Sandy Bridge. Thanks to the Ring Bus, all of a CPU's computing and graphics processing resources have the same fast access to both the L3 cache and the memory controller. As a result, the practical memory subsystem bandwidth got higher whereas its latencies were reduced.

This Ring Bus architecture has been adjusted in the Haswell design, though. In the earlier CPU designs, the Ring Bus and the L3 cache worked in sync with the CPU's computing cores, which provoked some problems with switching to power-saving states. The L3 cache and Ring Bus might lower their performance along with the x86 cores even though they were still required by the graphics core. To avoid this problem, the Ring Bus and the L3 cache are implemented as a separate and independently clocked domain in the Haswell.

With the Ring Bus now capable of being clocked asynchronously, there appeared unavoidable latencies in terms of L3 cache and memory controller access, but Intel tried to make up for that by various improvements in the microarchitecture. For example, the L3 cache acquired two parallel queues for processing requests of different types while the memory controller got longer queues and an optimized scheduler.

In fact, the Ring Bus, L3 cache and memory controller do not work asynchronously often. Apart from power-saving states, their clock rate is almost always identical to the clock rate of the x86 cores. It is only when the CPU switches to a Turbo mode or is overclocked that there is a difference. Yet even in the latter case the L3 cache and the Ring Bus are comparable to the x86 cores in their clock rate as the difference amounts to 300-500 MHz only. Practice suggests that it has no great effect on the resulting performance.

If we directly compare the Haswell’s and the Ivy Bridge’s memory controller, we will find that their bandwidth and latency are comparable at the same settings. This can be illustrated by the AIDA64 test, for example:


Ivy Bridge, 4 cores, 4.0 GHz, DDR3-1600 9-9-9-24-1N


Haswell, 4 cores, 4.0 GHz, DDR3-1600 9-9-9-24-1N

Yet we can notice that, despite Intel’s efforts, the Haswell’s memory controller is a little slower than the one we had in the previous-generation LGA1155 configurations with Ivy Bridge CPUs. The practical memory bandwidth is almost the same but the Haswell’s memory latency is 9% higher. That’s the tradeoff of the asynchronous design.

The second important innovation in the memory subsystem of the LGA1150 platform concerns mainboard design. Intel's reference design for DIMM wiring is now based on the T topology which ensures equality for DIMM slots connected to each memory channel. Making the memory controller more stable, this also ensures broader compatibility with memory modules and their configurations. Particularly, the Haswell's memory controller supports high-speed operating modes even if you install four dual-sided modules into all the available DIMM slots. Considering that the maximum capacity of currently selling DDR3 products is 8 GB, the LGA1150 platform can run up to 32 gigabytes of overclocker-friendly system memory at high clock rates and with low timings.

Otherwise, the controller has remained the same. Being dual-channel, it can work in both symmetrical dual-channel and single-channel modes. It supports Flex Memory technology to ensure dual-channel access with asymmetric memory module configurations (when the capacity and specifications of modules on different memory channels differ).

Like the Ivy Bridge series, the Haswell sets its DDR3 SDRAM clock rate with a step of 266 or 200 MHz, offering some flexibility in configuring your memory and expanding the number of clock rates supported by the controller. Although formally compatible with DDR3-1333 and DDR3-1600 SDRAM only, it allows using system memory at much higher frequencies on the LGA1150 platform. With the available frequency multipliers, you can enable even DDR3-2933 mode and you won’t have any stability issues at that.

We can also add that the Haswell's base clock rate can be increased from 100 to 125 MHz, so the top memory frequency is as high as 3666 MHz. And there's a lot of evidence on the internet that select overclocker-friendly memory can indeed be clocked at such a high frequency on the LGA1150 platform.
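The relationship between step, multiplier and base clock described above can be sketched in a few lines. This is our own illustration, not Intel's documented formula: we assume the effective data rate is simply the multiplier times the step (200 or 266.67 MHz), scaled linearly by the base clock relative to 100 MHz. The multiplier values are inferred from the arithmetic, and the function and constant names are hypothetical.

```python
# Effective DDR3 data rate from multiplier, step and base clock (BCLK).
# Assumption: the rate scales linearly with BCLK relative to the 100 MHz default.

STEP_266 = 800.0 / 3  # 266.67 MHz step
STEP_200 = 200.0      # 200 MHz step


def ddr3_data_rate(multiplier: int, step_mhz: float, bclk_mhz: float = 100.0) -> float:
    """Return the effective DDR3 data rate in MHz for the given settings."""
    return multiplier * step_mhz * bclk_mhz / 100.0


if __name__ == "__main__":
    print(f"8 x 200 MHz:              DDR3-{ddr3_data_rate(8, STEP_200):.0f}")
    print(f"11 x 266 MHz:             DDR3-{ddr3_data_rate(11, STEP_266):.0f}")
    print(f"11 x 266 MHz at 125 BCLK: {ddr3_data_rate(11, STEP_266, 125.0):.1f} MHz")
```

Under these assumptions, a multiplier of 11 on the 266 MHz step gives the DDR3-2933 mode, and raising the base clock to 125 MHz brings the same setting to about 3666 MHz, in line with the figure quoted above.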

As you probably know, the Haswell introduced dramatic innovations into the power subsystem design. It has an integrated regulator that produces all the voltages necessary for the CPU. The mainboard now only has to yield two voltages: the processor's input voltage Vccin and the Vddq voltage for memory modules. All of the CPU's internal voltages, including those of the Ring Bus, L3 cache and memory controller, are produced by the CPU’s own integrated regulator. It means the memory voltage is not limited by the CPU, so the Haswell lets you set it above 1.65 volts safely. In other words, the LGA1150 platform allows you to overclock your memory at high voltages without fearing that the CPU's voltage regulator might fail.

All of these innovations make the Haswell’s DDR3 SDRAM controller highly efficient and suitable for overclocker-friendly memory modules, meaning that enthusiasts have got a lot of flexibility in choosing system memory for LGA1150 platforms with the aim of higher performance.

G.Skill [TridentX] F3-2933C12D-8GTXDG

Before we move on to our tests, we’d like to tell you about the memory modules we used to prepare this review. To get a full picture of how performance depended on memory parameters, we needed a DDR3 SDRAM kit with maximum speed. Such products are flexible in the sense that you don’t have to run them at the specified top clock rate, yet their manufacturers choose best-quality chips for them to ensure stability across different settings. Considering that the Haswell’s memory controller supports clock rates up to DDR3-2933, that’s the memory we needed for our tests.

DDR3-2933 SDRAM is currently offered by a limited number of brands including ADATA, Corsair, GeIL and G.Skill. It is the latter company that was kind enough to offer us its flagship product - the G.Skill TridentX F3-2933C12D-8GTXDG memory kit consisting of two 4GB modules. Rated for 2933 MHz with timings of 12-14-14-35-2N, it turned out to be able to work at Command Rate = 1N, too.

Here are the official specs of this product:

The modules in this kit are equipped with exclusive black-and-red aluminum TridentX heatsinks which feature transformable design. G.Skill listened to user complaints that tall heatsinks had poor compatibility with massive CPU coolers, so the top (red) part of the heatsink can be removed by unfastening two screws. As a result, the total height of a TridentX module is decreased from 54 to only 39 mm. In their shortened version, the TridentX modules won't conflict with regular large-size CPU coolers but their heatsinks remain large enough to cool the memory chips.

To facilitate their installation and configuring, the G.Skill TridentX F3-2933C12D-8GTXDG modules support XMP 1.3. The only predefined XMP profile contains the specified frequency and timings. Considering the setup flexibility of the Haswell’s memory controller, it is extremely easy to make this memory run at 2933 MHz. You just plug the modules in and enjoy. It is even unlikely that you’ll have to additionally increase any voltage. The modules’ SPD describes a DDR3-1333 configuration for the sake of maximum compatibility, though.

This high-speed G.Skill product is based on Hynix H5TQ4G83MFR chips, which are very popular among overclockers. The chips are mounted on a specially designed 8-layer PCB, which ensures excellent overclocking potential and low heat dissipation. A practical test on our LGA1150 platform proved that the G.Skill TridentX F3-2933C12D-8GTXDG kit could work easily at 2933 MHz with timings of 12-14-14-35-1N.

It must be noted that this memory kit is designed for Haswell processors and Z87-based mainboards. It is only on such platforms that you can enable the DDR3-2933MHz mode. The kit comes with a long compatibility list, so it doesn’t restrict your mainboard choices. Most midrange and top-end mainboards from the leading brands are going to be compatible with the G.Skill TridentX F3-2933C12D-8GTXDG, which is an important advantage.

Thus, the only real downside of high-speed DDR3 SDRAM, like the one we're discussing, is its high price. The G.Skill TridentX F3-2933C12D-8GTXDG costs several times as much as a dual-channel DDR3-1866 kit, for example. It is hardly the best choice for price-conscious users. Such memory is an exclusive offer for enthusiasts who want maximum performance whatever the price.

Testbed and Methods

In this review we use a modern LGA1150 mainboard with Intel Z87 chipset, installing a Haswell-based Core i5-4670K processor. The focus of our testing is on the high-speed [TridentX] F3-2933C12D-8GTXDG memory kit from G.Skill.

Here is the full list of our testbed components:

We carry out our tests in Microsoft Windows 8.1 Enterprise x64 with the following drivers:

Take note that we overclock our Haswell-based CPU to 4.4 GHz. This ensures higher performance and highlights the correlation between performance and memory subsystem parameters.

Frequency vs. Timings

When it comes to choosing the right type of memory, you often find yourself choosing between higher clock rates and lower timings. This time around, however, we will not carry out detailed tests of DDR3 SDRAM modules with different timings. The fact is that with each new platform memory timings influence overall performance less. So today, the clock rate of DDR3 SDRAM has a much stronger effect than its timings.

There are two reasons for that. First, the minimum latency increases anyway along with memory frequency, so timings adjustments get relatively smaller. Increasing timings by 2 from a minimum of 3 or 4 (as with DDR2 SDRAM) and from a minimum of 9-10 (as with high-speed DDR3 SDRAM) means that the latency increases by 50-70% in the first case and only by 20-22% in the second case. So different combinations of timings do not actually differ much with today's memory. Moreover, the multi-level caching and data pre-fetch algorithms implemented in modern CPUs mask the real memory latency, making its bandwidth more important.
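The percentages in the paragraph above follow directly from dividing the added cycles by the baseline timing. A minimal sketch of that arithmetic, with a function name of our own choosing:

```python
# Relative latency increase when a timing is loosened by a fixed number of cycles.

def relative_increase_pct(base_cycles: int, extra_cycles: int = 2) -> float:
    """Return the percentage increase caused by adding extra_cycles to base_cycles."""
    return extra_cycles / base_cycles * 100.0


if __name__ == "__main__":
    # Baselines of 3-4 cycles are typical of DDR2, 9-10 of high-speed DDR3.
    for base in (3, 4, 9, 10):
        print(f"{base} -> {base + 2} cycles: +{relative_increase_pct(base):.0f}%")
```

The same two extra cycles cost roughly 50-67% at DDR2-style timings but only about 20-22% at DDR3-style timings, which is why timing adjustments matter less on modern memory.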

In fact, the manufacturers of overclocker-friendly memory kits have long realized that there's no need to achieve extremely low timings with DDR3 SDRAM. Products with latencies of 7 or 8 cycles have disappeared so it is hard to find DDR3 SDRAM modules with a CAS Latency of less than 9 or 10. But there are more and more products with very high clock rates and high timings.

To prove our point, we carried out a practical test to compare the real-life performance of identical Haswell-based configurations with DDR3-1600 and DDR3-1866 which had different memory timings.

The charts are most illustrative. Increasing the memory frequency by 266 MHz turns out to be far more effective than lowering all timings by 3 to 4 cycles. Even when it comes to real-life latency, which is heavily influenced by timings, DDR3-1866 with rather high timings of 10-10-10-29 turns out to be better than DDR3-1600 with aggressive timings of 7-7-7-21. As for the effective bandwidth, DDR3-1600 is always inferior to its higher-speed opponent.

Summing it up, we can see that memory timings have become a negligible factor for modern computers, so you should first look at the clock rate of your DDR3 SDRAM whereas a low CAS Latency and other such parameters have but a small effect on actual performance. The same goes for overclocking. You should first try to make your DDR3 SDRAM work at higher clock rates and only then minimize your memory timings.

Memory Frequency Impact on Performance

Now we’ve reached the main part of this review, in which we'll try to estimate the effect of memory subsystem parameters on the LGA1150 platform’s performance in everyday applications. As we proved above, memory timings are a negligible factor even in synthetic benchmarks, so we will only focus on comparing DDR3 SDRAM with different clock rates. There are a lot of DDR3 SDRAM products available now with varying clock rates, so we will benchmark our Haswell-based platform with memory configurations from DDR3-1333 up to DDR3-2933 SDRAM. We use the most popular timings for each clock rate. Here’s the full list of our memory configuration variants (dual-channel DDR3 SDRAM):

Besides the memory subsystem, our testbed with a quad-core Haswell-based CPU overclocked to 4.4 GHz always worked at the same settings in this test session.

Synthetic Benchmarks

We’ll start out by measuring the effective bandwidth and latency using the Cache and Memory benchmark from AIDA64 4.20.2820.

By changing your DDR3 frequency, you can change the effective memory bandwidth twofold. We might have expected this, though, as there’s a twofold difference between DDR3-1333 and DDR3-2933 in terms of clock rate and theoretical bandwidth. Surprisingly, the results do not increase linearly with frequency: the fastest memory modes do not ensure maximum bandwidth for some reason. The best results are achieved with DDR3-2400 and DDR3-2666.

The effective latency changes in a different way:

The latency lowers as the clock rate goes up even with the highest-speed DDR3 SDRAM configurations, so overclocker-friendly DDR3-2666 and DDR3-2933 memory may turn out to be useful in everyday applications. Let's check this out right now.

General Performance

To estimate general performance in typical usage scenarios we use three traces from the popular Futuremark PCMark 8 2.0: Home (typical internet activities + working in text and image editors), Work (office applications + internet), and Creative (serious photo and video content processing, 3D games, heavy internet use).

Fast DDR3 SDRAM configurations do not look superior here. While they were good in the synthetic memory tests, Futuremark PCMark 8 2.0 shows a completely different picture. Judging by the scores, memory subsystem speed hasn’t been improving much in the past decade. The difference between the fast and slow dual-channel DDR3 SDRAM is no larger than 1-2%.

We’ll also carry out some tests in specific applications to get a full picture.

Application Tests

In Autodesk 3ds max 2014 we benchmark the speed of mental ray rendering of a complex 3D scene.

The memory frequency doesn’t affect the speed of final rendering much. The twofold increase in DDR3 SDRAM bandwidth only translates into an extra 1% of speed.

The performance in Adobe Premiere Pro CC is measured as the time it takes to render a Blu-ray project with HDV 1080p25 video into H.264 format and apply special effects to it.

We've got a different picture when we process HD video content. The difference between DDR3-1333 and DDR3-2933 is up to 8%, which is quite large. So, there are applications that really care about memory subsystem speed.

By the way, if we examine the results closely, we can see that DDR3-2400 is the optimal memory type for Premiere Pro. You don’t get much of a performance benefit from higher memory frequencies, although DDR3-2666 and DDR3-2933 kits cost much more than the slower products.

We benchmark performance in Adobe Photoshop CC using our custom test that is based on the Retouch Artists Photoshop Speed Test and consists of typical processing of four 24-megapixel images captured with a digital camera.

Photoshop is highly sensitive to memory subsystem parameters. Equipped with high-speed dual-channel DDR3-2933 SDRAM, our platform is 12% faster than it is with DDR3-1333. The optimal-buy DDR3-2400 is 8% faster than popular DDR3-1600, too.

To test the processors’ performance at data archiving we use WinRAR 5.0. Using maximum compression rate, we archive a 1.7GB folder with multiple files.

File compression has always scaled well with memory frequency, even in the era of LGA1155, LGA1156 and LGA775 processors. And today, each 266MHz increase in DDR3 SDRAM frequency makes WinRAR faster by 3 to 4%. So, DDR3-2933 makes our Haswell processor 23% faster at archiving than DDR3-1333.
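The per-step and total figures for WinRAR can be cross-checked by compounding the per-step gain over the number of 266 MHz steps between DDR3-1333 and DDR3-2933. The sketch below is our own sanity check, assuming every step brings the same relative gain, which is a simplification of the measured results. All names are illustrative.

```python
# Compound a constant per-step gain across 266 MHz frequency steps.
# Assumption: each step yields the same relative speed-up (a simplification).

STEP_MHZ = 800.0 / 3  # 266.67 MHz


def steps_between(low_mhz: float, high_mhz: float) -> int:
    """Return the number of 266 MHz steps separating two DDR3 data rates."""
    return round((high_mhz - low_mhz) / STEP_MHZ)


def compound_gain_pct(per_step_pct: float, steps: int) -> float:
    """Return the total percentage gain after compounding per_step_pct over steps."""
    return ((1.0 + per_step_pct / 100.0) ** steps - 1.0) * 100.0


if __name__ == "__main__":
    n = steps_between(1333.33, 2933.33)
    print(f"Steps from DDR3-1333 to DDR3-2933: {n}")
    for per_step in (3.0, 4.0):
        print(f"{per_step:.0f}% per step -> {compound_gain_pct(per_step, n):.1f}% total")
```

With six steps, a 3% gain per step compounds to about 19% and a 4% gain to about 27%, so the measured 23% total sits comfortably inside that range.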

In order to measure how fast the tested CPUs can transcode video into H.264 format we used x264 FHD Benchmark 1.0.1 (64 bit). It measures the time it takes the x264 coder to convert an MPEG-4/AVC video recorded in 1920x1080@50fps resolution with default settings. The results have high practical value because the x264 codec is part of popular transcoding utilities such as HandBrake, MeGUI, VirtualDub, etc. We regularly update the coder used in this performance test. This time around, we use version r2389, which supports all contemporary instruction sets including AVX2.

HD video transcoding doesn’t depend that much on memory subsystem parameters. DDR3-2400 is a mere 3% faster than DDR3-1600, a difference of 266MHz in memory frequency translating into a 1% difference in performance. The performance benefits get even smaller after the memory clock rate goes above 2400 MHz.

Gaming Performance

Now, gaming performance is the most exciting part of our review because today’s 3D games need fast memory, so we expect that high-speed DDR3 SDRAM can show its best here.

On the other hand, it is the graphics subsystem that determines the performance of the entire platform in the majority of contemporary games. Therefore, we select the most CPU-dependent games and measure the frame rate in two test modes. For the first mode we use lower resolutions and disable full-screen antialiasing, so that we can see whether fast memory is actually required by gaming computers. This provides some insight into how platforms with different DDR3 SDRAM are going to behave in the near future when equipped with faster graphics cards. The second test mode refers to real-life settings: Full HD and maximum FSAA. In our opinion, these results are no less interesting as they demonstrate clearly the level of performance we can expect from contemporary platforms today.

When we use a low resolution, 3D shooters turn out to be highly sensitive to memory subsystem performance. The results suggest that adjusting the memory frequency alone can improve the frame rate by a third, as in the new Thief. The other games are less memory-sensitive, yet the average gap between the slow DDR3-1333 and the overclocker-friendly DDR3-2933 is as large as 20%. In other words, each 266MHz step up in frequency translates into a 2-3% increase in performance.

Well, this impressive scalability is largely due to our deliberately unloading the graphics subsystem. If we use the highest visual quality settings, we have a different picture:

Memory speed doesn’t affect the frame rate much now. The difference between the slow and fast memory configurations amounts to just a few percent. Thief is an exception, though. It shows that the clock rate of your DDR3 SDRAM is a performance-affecting factor even when you run your game at maximum visual quality settings. So if you're into gaming and want maximum performance, you shouldn't disregard your memory subsystem parameters altogether.

Conclusion

According to our tests, today’s Haswell-based platforms perform quite differently with different memory modules. We can’t say anymore that memory subsystem parameters don’t matter. By changing your DDR3 SDRAM parameters, you can actually ensure a performance boost of 20-30%!

Of course, not all applications are that memory-sensitive. Judging by our test results, you may want to choose high-speed DDR3 SDRAM in two cases: when you build a gaming computer or a home PC for image and HD video processing.

Frequency is more important than timings when it comes to top-end LGA1150 configurations. DDR3 SDRAM kits currently available don't differ much in their timings but vary greatly in terms of their specified clock rate. And indeed, it is the clock rate of DDR3 SDRAM that affects performance the most.

Today’s Haswell-based computers are optimized for high-speed DDR3 SDRAM. You will have no problems setting a memory frequency up to 2933 MHz. We'd recommend choosing the fastest kits available, yet they are too expensive, so we think DDR3-2400 SDRAM is the optimal choice right now in terms of price/performance ratio. It ensures much higher performance in comparison with DDR3-1600 while not being much more expensive. Moreover, our tests suggest that memory clock rates above 2400 MHz have a smaller effect on performance, although the kits that support them come at much higher prices.