Slower than Windsor? Brisbane in Synthetic Benchmarks
Same-frequency CPUs on the Brisbane and Windsor cores shouldn’t differ much in performance: they incorporate the same number of transistors and, as AMD claims, don’t differ at all in their micro-architecture. However, some publications on the Web dispute that point. Let’s check it out.
First, we compared the speed of the computing units of processors on the Brisbane and Windsor cores (the latter had a total of 1MB of L2 cache). To make the comparison fair, we set the clock rate of both CPUs to 2.4GHz. We tested them in the CPU benchmarks from the SiSoftware Sandra XI suite.
Sandra XI, Arithmetic ALU
Sandra XI, Arithmetic SSE3
Sandra XI, Multi-Media Integer MMX/SSE
Sandra XI, Multi-Media Floating-Point SSE2
The main computing units of the two CPUs indeed provide the same performance. The difference fits within the measurement error range.
The CPU benchmarks from SiSoftware Sandra XI do not depend on the speed of the memory subsystem. They are indicative of the “pure” performance of a CPU. In real-life applications, however, the speed at which the CPU receives data from memory affects performance, too. So, we measured the bandwidth and latency of system memory as well as the latency of the L2 cache. We installed dual-channel DDR2-800 memory with timings of 4-4-4-12-1T for this test.
Sandra XI, L2 Cache Latency, clk
Sandra XI, Memory Bandwidth, MB/s
Sandra XI, Memory Latency, ns
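The charts in this section mix units: cache latency is reported in CPU clocks (clk), memory latency in nanoseconds, and bandwidth in MB/s. As a rough aid for reading them side by side, here is a minimal Python sketch of the unit arithmetic. The function names are ours, the 12-clock figure is an arbitrary illustration rather than a measured result, and the bandwidth figure is the theoretical peak of dual-channel DDR2-800 with a 64-bit bus per channel, not something any benchmark will actually reach.

```python
def cycles_to_ns(cycles: float, cpu_mhz: float) -> float:
    """Convert a latency given in CPU clock cycles to nanoseconds."""
    # One cycle lasts 1000 / cpu_mhz nanoseconds.
    return cycles * 1000.0 / cpu_mhz


def peak_bandwidth_mb_s(transfers_mt_s: int, channels: int = 2,
                        bus_bytes: int = 8) -> int:
    """Theoretical peak bandwidth: transfers per second x bus width x channels."""
    return transfers_mt_s * bus_bytes * channels


if __name__ == "__main__":
    # Hypothetical example: a 12-clock latency on a 2.4GHz CPU.
    print(cycles_to_ns(12, 2400), "ns")
    # Dual-channel DDR2-800: 800 MT/s x 8 bytes x 2 channels.
    print(peak_bandwidth_mb_s(800), "MB/s")
```

At a fixed 2.4GHz, every extra clock of latency costs about 0.42ns, which is why even a small increase in L2 latency can show up in the memory latency figures.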
Here’s an unpleasant surprise. The L2 cache of CPUs on the new 65nm Brisbane core has higher latency, which increases the latency of the memory subsystem in general and ultimately reduces its overall bandwidth.
The discouraging results produced by SiSoftware Sandra XI are confirmed by other synthetic benchmarks of the memory subsystem.
CPU-Z, L2 Cache Latency, clk
CPU-Z, Memory Latency, clk
ScienceMark 2.0, L2 Cache Latency, clk
ScienceMark 2.0, Memory Latency, clk
ScienceMark 2.0, Memory Bandwidth, MB/s
EVEREST 2006, Memory Read, MB/s
EVEREST 2006, Memory Write, MB/s
EVEREST 2006, Memory Copy, MB/s
EVEREST 2006, Memory Latency, ns
So, there can’t be any doubt: the L2 cache has become slower in the new core for Athlon 64 X2 CPUs. As a result, the new CPUs work slower with data in memory than their 90nm predecessors, which may show up in real-life applications, too. But the organization of the L2 cache hasn’t changed: 16-way associativity with a line length of 64 bytes.
So, we should look elsewhere for the root of the problem. AMD has commented that the latency of the L2 cache has increased because the engineers left a reserve for enlarging the cache in the future. This doesn’t sound convincing to us, however. First, AMD’s plans don’t mention any enlargement of the L2 cache, even on the transition to the K8L micro-architecture. Second, Windsor-core CPUs with a 2x1MB L2 cache do not differ in cache performance from their counterparts that are equipped with a 2x512KB L2 cache. So, it is not yet clear to us why the speed parameters of the cache have changed.
The increased latency of the cache memory is not the only problem that can have a negative effect on the performance of Brisbane-core CPUs. Another problem concerns fractional CPU frequency multipliers: because the default CPU frequencies now change in steps of 100MHz, the real memory frequency is reduced in some modes. In CPUs with the K8 micro-architecture the memory frequency is derived from the CPU frequency through an integer divider, so it matches the nominal value only when the CPU frequency divides evenly. We’ve met this problem before, but it has grown worse with the new CPUs. To illustrate our point, here is a table that shows the real memory frequency in the different modes of the memory controller integrated into the CPU.
[Table: real memory frequency by processor frequency (MHz) and memory mode]
That’s not a catastrophe, of course. There are very poor memory modes with CPUs that have integer frequency multipliers, too. Yet you should keep in mind that with fractional-multiplier CPUs the real memory frequency is often much lower than the expected one, and this has a negative effect on overall system performance.
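The divider arithmetic behind this effect can be sketched in a few lines of Python. This is a minimal illustration, not AMD's documented algorithm: it assumes the commonly described K8 behavior in which the memory controller picks the smallest integer divider that keeps the memory clock at or below the target. The function name and the list of sample CPU frequencies are ours, chosen for illustration only.

```python
def real_memory_clock(cpu_mhz: int, target_mem_mhz: int) -> tuple[int, float]:
    """Return (divider, real memory clock in MHz) for a K8-style controller.

    Assumption: the divider is the smallest integer for which
    cpu_mhz / divider does not exceed target_mem_mhz.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    divider = -(-cpu_mhz // target_mem_mhz)
    return divider, cpu_mhz / divider


if __name__ == "__main__":
    target = 400  # DDR2-800 runs at a 400MHz clock
    for cpu in (2000, 2100, 2200, 2300, 2400, 2500, 2600):
        divider, mem = real_memory_clock(cpu, target)
        print(f"{cpu} MHz CPU: divider {divider}, "
              f"memory clock {mem:.1f} MHz (DDR2-{2 * mem:.0f})")
```

Under this assumption a 2.1GHz CPU (10.5x multiplier) gets a divider of 6 and clocks DDR2-800 memory at only 350MHz, while a 2.4GHz CPU uses the same divider and reaches the full 400MHz. The 2.2GHz case shows that an odd integer multiplier lands on a reduced memory clock as well.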