L3 Cache and Memory Sub-System
As the preceding sections show, the major differences between Sandy Bridge and Sandy Bridge-E lie in the number of cores and the memory sub-system. We therefore dedicate a separate section to the practical side of the quad-channel memory sub-system, and also touch upon the L3 cache. The cache deserves a special mention because the developers departed from their previous design principle in Sandy Bridge-E: the shared L3 cache in this family is sized to provide not 2 MB but 2.5 MB per core.
This change was made with server workloads in mind, where cache size is of utmost importance. Nevertheless, the top model of the two desktop LGA 2011 Core i7 processors also gets the larger cache, which is why Core i7-3960X and Core i7-3930K have different amounts of cache memory. The Extreme Edition processor has an L3 cache that provides 2.5 MB per core, while Core i7-3930K has the more conventional 2 MB per core.
The larger L3 cache in Sandy Bridge-E is not only a matter of size. The change also affected its logical organization, namely its associativity. The L3 cache of a standard Sandy Bridge processor is 16-way set associative; in Sandy Bridge-E with the larger L3 cache this number has increased to 20. The new design should therefore deliver a higher hit rate, but at a lower speed. Interestingly, this is formally true only for Core i7-3960X with its 15 MB L3 cache, while the L3 cache in Core i7-3930K keeps the usual 16-way organization.
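The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the 64-byte cache line is an assumption that the article does not state, and treating each per-core portion of the L3 as its own set-associative array is a simplification. The function names are made up for this example.

```python
# Sketch: sets in a set-associative cache, and the relative growth of
# size and associativity between the two L3 configurations.
# ASSUMPTION: 64-byte cache lines (not stated in the article).
LINE_BYTES = 64


def l3_sets(size_mb: float, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets = capacity / (ways * line size)."""
    size_bytes = int(size_mb * 1024 * 1024)
    return size_bytes // (ways * line_bytes)


def relative_increase(old: float, new: float) -> float:
    """Fractional growth from old to new, e.g. 0.25 means +25%."""
    return (new - old) / old


# Per-core L3 portion: 2 MB at 16 ways versus 2.5 MB at 20 ways.
print("sets, 2 MB / 16-way:  ", l3_sets(2.0, 16))
print("sets, 2.5 MB / 20-way:", l3_sets(2.5, 20))
print("size increase:         ", relative_increase(2.0, 2.5))
print("associativity increase:", relative_increase(16, 20))
```

Under these assumptions both configurations end up with the same number of sets, so the extra 0.5 MB per core can be read as four additional ways rather than additional sets. Size and associativity both grow by the same 25%.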
Given these serious changes in the logical organization of the L3 cache in Sandy Bridge-E, we tested its practical performance with our usual tool, AIDA64 Cache & Memory Benchmark.
The results are very interesting. The L3 cache in LGA 2011 processors is indeed slower than in the previous-generation CPUs for LGA 1155 systems. The difference in read speed and latency is remarkably close to 25%, the same figure as the increase in associativity. Even more interesting, the L3 cache in Core i7-3930K is no faster than in Core i7-3960X. It looks like, despite the smaller size and formally lower associativity, its operating logic remains exactly the same as in the other Sandy Bridge-E processors.
However, this does not mean that the L3 cache in the new processors is worse than in regular Sandy Bridge. Its bandwidth and latency got worse, but the increased size and associativity mean that the requested data is found in the cache more often than before. In other words, the new cache is better at masking the performance of the memory sub-system.
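The trade-off between a slower cache and a higher hit rate can be illustrated with a simple average-access-time model. Every number below is hypothetical and chosen only to make the arithmetic easy to follow; none of them is a measurement from our tests. The model also assumes one fixed memory latency for both caches, which is a simplification.

```python
# Sketch: average cost of a request that reaches L3.
# ALL NUMBERS ARE HYPOTHETICAL, not benchmark results.

def average_access_ns(l3_latency_ns: float, l3_hit_rate: float,
                      memory_latency_ns: float) -> float:
    """Every request pays the L3 latency; misses also pay memory latency."""
    return l3_latency_ns + (1.0 - l3_hit_rate) * memory_latency_ns


def break_even_hit_rate(old_latency_ns: float, old_hit_rate: float,
                        new_latency_ns: float,
                        memory_latency_ns: float) -> float:
    """Hit rate at which the slower cache matches the faster one on average."""
    old_avg = average_access_ns(old_latency_ns, old_hit_rate, memory_latency_ns)
    return 1.0 - (old_avg - new_latency_ns) / memory_latency_ns


OLD_L3_NS = 8.0    # hypothetical latency of the faster, smaller L3
NEW_L3_NS = 10.0   # hypothetical latency of the L3 that is 25% slower
MEMORY_NS = 60.0   # hypothetical memory latency
OLD_HIT = 0.80     # hypothetical hit rate of the smaller L3

needed = break_even_hit_rate(OLD_L3_NS, OLD_HIT, NEW_L3_NS, MEMORY_NS)
print(f"old average access time: "
      f"{average_access_ns(OLD_L3_NS, OLD_HIT, MEMORY_NS):.1f} ns")
print(f"hit rate the slower cache needs to break even: {needed:.3f}")
```

With these made-up inputs, a few extra percentage points of hit rate are enough to offset the slower cache, because every avoided miss saves a trip to memory that costs far more than the added L3 latency.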
This raises the question of how much sense that actually makes, since the LGA 2011 platform uses quad-channel memory, which in theory should ensure higher bandwidth and lower latency. For example, with DDR3-1600 the theoretical peak memory bandwidth reaches 51.2 GB/s, which even exceeds the practical bandwidth of the L3 cache that we have just seen in our tests. So we ran a few more tests. To measure memory performance we used not only AIDA64 Cache & Memory Benchmark, but also a similar test called MaxxMem2. For a more thorough analysis, the Sandy Bridge-E memory controller was tested in dual-, triple- and quad-channel modes.
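The 51.2 GB/s figure follows from the usual peak-bandwidth formula: channels × bus width × transfer rate. The sketch below reproduces it for the three channel modes we tested. It assumes the standard 64-bit (8-byte) DDR3 channel width and takes the DDR3 rating as the transfer rate in MT/s; the function name is made up for this example.

```python
# Sketch: theoretical peak bandwidth of a multi-channel DDR3 configuration.
# ASSUMPTION: each channel is 64 bits (8 bytes) wide, as is standard for DDR3.
BUS_BYTES = 8


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return channels * BUS_BYTES * mega_transfers_per_s / 1000.0


for channels in (2, 3, 4):
    gbs = peak_bandwidth_gbs(channels, 1600)
    print(f"DDR3-1600, {channels} channels: {gbs:.1f} GB/s")
```

These are upper bounds set by the bus alone. They say nothing about how much of that bandwidth a given memory controller can actually deliver.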
We deliberately measured memory speed with different tests, to make sure that the paradoxical results could not be written off as the fault of a single test utility. And to be honest, there were quite a few surprising findings. Namely, the dual-channel memory controller of the LGA 1155 Sandy Bridge processors turned out to be capable of delivering much higher practical speed than the quad-channel memory controller in the new Sandy Bridge-E. This applies not only to latency, but also to memory bandwidth.
The comparative results obtained in the LGA 2011 system with different numbers of memory channels in use also look somewhat strange. It appears that widening the memory bus beyond the dual-channel mode does not really do anything, and sometimes even has a negative effect. Moreover, the results obtained with DDR3-1333 and DDR3-1600 show that faster memory, even if it works in dual-channel mode, can deliver higher practical performance than slower quad-channel DDR3 SDRAM.
However, even such unexpected results have a logical explanation. The main purpose of adding more memory channels to the new Sandy Bridge-E was to increase not the speed, but the maximum supported memory size. Do not forget that LGA 2011 is a server platform as well, and this is where large amounts of system memory are necessary. The Sandy Bridge-E controller cannot work with more than three modules per channel (the desktop version supports only two modules per channel), which is why adding new channels is the most logical and the least expensive way out.
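The capacity argument comes down to counting DIMM slots: channels multiplied by modules per channel. The sketch below does that for the configurations mentioned above. The 8 GB module size is a hypothetical value used only to turn slot counts into capacities, and the dual-channel configuration with two modules per channel is included as an assumed reference point.

```python
# Sketch: how the channel count limits the maximum installable memory.
# MODULE_GB is a HYPOTHETICAL module size chosen for illustration.
MODULE_GB = 8


def dimm_slots(channels: int, dimms_per_channel: int) -> int:
    """Total number of modules the controller can address."""
    return channels * dimms_per_channel


def max_capacity_gb(channels: int, dimms_per_channel: int,
                    module_gb: int = MODULE_GB) -> int:
    """Maximum memory size with identical modules in every slot."""
    return dimm_slots(channels, dimms_per_channel) * module_gb


configs = {
    "dual-channel, 2 modules per channel (assumed reference)": (2, 2),
    "quad-channel, 2 modules per channel (desktop LGA 2011)": (4, 2),
    "quad-channel, 3 modules per channel (server)": (4, 3),
}

for name, (channels, per_channel) in configs.items():
    slots = dimm_slots(channels, per_channel)
    capacity = max_capacity_gb(channels, per_channel)
    print(f"{name}: {slots} slots, {capacity} GB with {MODULE_GB} GB modules")
```

With the number of modules per channel capped, doubling the channel count is the only way to double the slot count, whatever the module size.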
The overall drop in memory performance, even in quad-channel mode, can also be traced back to the server environment. It was very difficult to optimize the performance of the DDR3 memory controller in Sandy Bridge-E, because it has to be universal first and foremost. To meet the needs of server makers, it has to be compatible with buffered RDIMM (Registered) and LRDIMM (Load-Reduced) modules, and in that case allow installing up to three modules per channel. Of course, the desktop LGA 2011 platform supports only unbuffered memory and only two DIMMs per channel, but the Sandy Bridge-E memory controller is inherently designed for this flexibility, which affects the practical bandwidth and latency in systems with the new Core i7 processors.
The results on the diagrams below illustrate how insignificant the performance gain is even though LGA 2011 uses additional memory channels. These are the results obtained in benchmarks that are particularly sensitive to memory sub-system performance. We took them on our Core i7-3930K based system with different numbers of memory channels in use, and tested with DDR3-1333 and DDR3-1600 SDRAM.
The results once again prove that the quad-channel organization of the memory sub-system does not improve performance by much. If you want to build a faster system, you should focus on memory frequency first, as increasing it proves to be much more beneficial. For example, dual-channel DDR3-1600 is almost always going to produce better results than quad-channel DDR3-1333. Therefore, if you already own dual- or triple-channel overclocker DDR3 SDRAM modules, you should replace them in your LGA 2011 system only if you can find a quad-channel kit with at least the same specifications.