Memory Subsystem Performance
First of all, we studied the performance of the memory controller integrated into the Athlon 64 3200+. As we have already mentioned above, low latency is one of the theoretical advantages of this solution, while lower bandwidth than that of the Athlon 64 FX is a theoretical drawback. We therefore tested the Athlon 64 3200+ memory controller against the dual-channel controller of the Athlon 64 FX/Opteron CPUs. To obtain comparable results, we measured memory performance in an Athlon 64 3200+ based system and in an Opteron 146 based system. Both processors work at the same clock frequency and in fact differ only in the memory subsystem implementation. In the Opteron 146 based system we used registered DDR400 in single-channel and dual-channel configurations with the timings set to 2.5-3-3-5; in the Athlon 64 3200+ based system we used single-channel non-registered DDR400 with 2-2-2-5 and 2.5-3-4-5 timings. This way, we can estimate not only how the memory subsystems of the Athlon 64 3200+ and Athlon 64 FX/Opteron compare in practice, but also the performance difference between registered and non-registered memory.
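To put the theoretical bandwidth gap into numbers, here is a minimal sketch of the peak-bandwidth arithmetic for the configurations above. It assumes standard DDR400 figures (400 million transfers per second over a 64-bit channel). The function name and parameters are our own illustration and do not come from any benchmark tool. These are theoretical ceilings, not measured results.

```python
def peak_bandwidth_gbs(mega_transfers_per_sec, bus_width_bits=64, channels=1):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes DDR400-style memory: every transfer moves bus_width_bits
    of data on each populated channel.
    """
    bytes_per_transfer = bus_width_bits // 8
    bytes_per_sec = mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels
    return bytes_per_sec / 1e9


# Single-channel DDR400 (Athlon 64 3200+, or Opteron 146 with one channel)
single = peak_bandwidth_gbs(400, channels=1)
# Dual-channel DDR400 (Athlon 64 FX/Opteron)
dual = peak_bandwidth_gbs(400, channels=2)

print(f"single-channel DDR400: {single:.1f} GB/s")  # 3.2 GB/s
print(f"dual-channel DDR400:   {dual:.1f} GB/s")    # 6.4 GB/s
```

The dual-channel ceiling is exactly twice the single-channel one. The benchmarks show how much of that headroom each controller actually delivers.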
So, let's move on to the practical experiments. We used the already familiar Cache Burst 32 utility, which is a worthy successor to the good old Cachemem:
Well, it was quite predictable that the Athlon 64 FX/Opteron based system with the dual-channel memory controller would win the memory bandwidth test. There is another interesting thing here, though. If we compare the results for the single-channel configurations, we notice that the memory controller of the Athlon 64 delivers higher memory bandwidth even when its memory configuration is absolutely identical to that of the registered memory in the Athlon 64 FX/Opteron system. This is exactly the price you pay for using registered memory in Socket940 systems. Moreover, non-registered memory modules allow much more aggressive timings, and in this case the advantage of the Athlon 64 over the Athlon 64 FX/Opteron with a single memory channel is even greater.
The latency tests once again prove how slow registered memory is. The memory controller of the Athlon 64, working with non-registered memory modules, outperforms the memory controller of the Athlon 64 FX/Opteron even if the timing settings are the same in both cases. If you set the lowest timings possible for the Athlon 64 based system, its memory latency is 25% lower than that of the Athlon 64 FX/Opteron based system.
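To see what the timing settings mean in absolute terms, here is a minimal sketch that converts a timing string into nanoseconds. It assumes that DDR400 runs at a 200 MHz memory clock (a 5 ns cycle) and that the four figures follow the conventional CL-tRCD-tRP-tRAS order, which our text does not spell out. The helper name is our own. These are individual DRAM delays only, not the end-to-end latency the benchmark measures.

```python
def timings_to_ns(timings, clock_mhz=200.0):
    """Convert a timing string such as "2-2-2-5" from clock cycles to ns.

    Assumes DDR400: 200 MHz memory clock, so one cycle lasts 5 ns.
    The order of the values (assumed CL-tRCD-tRP-tRAS) is preserved.
    """
    cycle_ns = 1000.0 / clock_mhz
    return [float(cycles) * cycle_ns for cycles in timings.split("-")]


# The three timing sets used in our test systems
for label in ("2-2-2-5", "2.5-3-3-5", "2.5-3-4-5"):
    print(label, "->", timings_to_ns(label))
```

With these assumptions, the first figure (CAS latency) drops from 12.5 ns at CL 2.5 to 10 ns at CL 2. That is only part of the difference between the aggressive and relaxed settings. The extra delay that registered modules add is not captured by the timing string at all.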
All this indicates that even though the Athlon 64 FX boasts a dual-channel memory controller, its competition with the new Athlon 64 working at the same clock frequency is not settled once and for all. Because the Athlon 64 supports unregistered memory, systems based on it can be even faster in some applications than Socket940 systems with a dual-channel memory subsystem using registered modules. This is probably why the Athlon 64 FX-51 model currently in production works at a higher clock frequency than the top Athlon 64 processor. Otherwise, it would be simply impossible to claim that the Athlon 64 FX is definitely faster than the regular Athlon 64.
In this respect, the upcoming Athlon 64 FX processors designed for Socket939 seem very interesting to me. These processors are due next year, and they are expected to be free from the major drawback of today's Athlon 64 FX: they will support unregistered memory modules in dual-channel DDR SDRAM configurations.