AMD Athlon 64 Processors on E Core: Memory Controller Peculiarities in Detail

The E core revision of the AMD Athlon 64 processors features an enhanced integrated memory controller. In today’s article we discuss what has been improved compared with the previous controller version: how the new Athlon 64 CPUs work with four memory modules, and what benefits DDR500 SDRAM support has brought these processors. Find out all the exciting details now!

by Ilya Gavrichenkov
07/28/2005 | 10:01 PM

Not so long ago the AMD64 processor family gained a new E core revision. This core, manufactured on a 90nm process using SOI (Silicon on Insulator) and DSL (Dual Stress Liner) technologies, has found its way into several product lines. You can see it in Athlon 64 and Athlon 64 FX processors, where it is known under the Venice and San Diego code names, as well as in dual-core Athlon 64 X2 processors, where it is known as Toledo or Manchester. It is also used in AMD Sempron processors, where it is called Palermo.

When AMD develops a new processor core and puts it into mass production, it aims not only at raising the clock frequencies of the processors based on that core, but also at improving their features and specifications. The E core revision is another milestone on this way: it provided Athlon 64 processors and their modifications with a number of completely new features. One of the most significant enhancements is support for the SSE3 instructions, which the competitor’s solutions have offered since the 90nm Prescott core. Moreover, the integrated memory controller also underwent a few noticeable enhancements.

Our tests showed, however, that the new processors do not benefit much from SSE3 support. There are still very few applications that really take advantage of these instructions. Besides, the SSE3 instruction set can hardly be called fully-fledged.

Therefore, we decided to pay special attention to the changes made to the memory controller integrated into the E core revision. Note that in earlier core revisions AMD not only improved the performance of the integrated memory controller, but also expanded its compatibility with various memory modules and their combinations. The D core revision, known mostly as Athlon 64 Winchester, was a certain milestone here. First of all, Winchester processors boasted a slightly faster memory controller than their predecessors. Secondly, Winchester based processors could work with DDR400 SDRAM memory modules installed into all four DIMM slots on the mainboard. It looked like the optimal solution was already there; AMD engineers, however, didn’t think so. AMD processors based on the E core revision feature an even more enhanced memory controller.

What did the engineers focus on this time? Of course, they again made a few optimizations to improve the memory controller performance. Tests of Venice based processors showed that they were faster than their Winchester based counterparts. Besides, the compatibility got even better. AMD processors on the E core revision work normally in systems with memory modules of different capacity and organization, which makes further upgrades a much simpler task for the users. The CPUs based on the new core also work just fine with four double-sided DDR400 SDRAM DIMMs. Another interesting feature of the new E core revision based CPUs is the new memory frequency dividers they support, so the new AMD processors have no problems with DDR SDRAM working at frequencies exceeding 400MHz.

Today we are going to take a closer look at some of the above listed peculiarities of the new memory controller implemented in the E core revision, as they certainly deserve our special attention.


Support of Four Double-Sided DDR400 SDRAM Memory Modules

The integrated memory controller of the Athlon 64 processors is a pretty tricky thing. As soon as the first CPUs supporting two memory channels appeared, a lot of small unpleasant trifles began to pop up. It turned out that the relatively high electrical load that the memory modules put on the controller causes certain issues when an Athlon 64 works in a system with four memory modules installed. With four DIMMs in an Athlon 64 based system, the CPU could drop their working frequency, relax their timing settings, or simply refuse to work at all.

To be fair, I have to say that the server sibling of the Athlon 64, the Opteron, doesn’t have any of the problems described above because it works with more expensive registered memory modules. However, it doesn’t make much sense to use these modules in desktop systems, so users have to put up with some limitations once they decide to install more than two memory modules in their computer.

In fact, all these problems are being solved little by little, so there is no need to panic. If you remember, older Athlon 64 processors based on 130nm cores didn’t support four double-sided DDR400 SDRAM modules working at 400MHz at all and reduced their frequency to 333MHz automatically. Today’s processors based on 90nm cores offer a few better solutions. The D core revision, also known as Winchester, allows using four double-sided DDR400 memory DIMMs if the Command Rate timing is set to 2T.

The Command Rate parameter determines how many clock cycles the memory modules need before they can accept a command after the memory controller has issued it. This timing may have to be increased when the address and command bus is heavily loaded and its signaling frequency is very high. Since the controller, the memory modules and the connecting traces cannot have zero capacitance, it takes some time for the logic level to stabilize before the bus state is sampled. Therefore, raising the Command Rate from 1T to 2T simplifies the work of the memory controller and adds to the system reliability at the expense of performance.

Even before the E core revision was released, rumors circulated that the new core would allow giving up the 2T timing in systems with four double-sided memory modules. Unfortunately, this did not happen. The CPUs based on this core can work with DDR400 SDRAM at the 1T setting in all configurations except the one with four memory modules installed, at least two of which are double-sided. And taking into account that most of the widespread 512MB memory modules are double-sided, you can easily come across this particular situation with four double-sided DIMMs in the system.


This way, K8 CPUs based on E core revision allow using DDR400 SDRAM in the dual-channel mode in the following cases only:

Channel 1                       Channel 2                       Command Rate
Single Sided    -               Single Sided    -               1T
Double Sided    -               Double Sided    -               1T
Single Sided    Single Sided    Single Sided    Single Sided    1T
Single Sided    Double Sided    Single Sided    Double Sided    2T
Double Sided    Double Sided    Double Sided    Double Sided    2T
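
The table above boils down to a simple rule, which the following Python sketch tries to capture. It is only an illustration: the names `command_rate` and `rule_of_thumb` and the "SS"/"DS"/None slot encoding are invented for this example, and the rule itself (2T only when all four slots are filled and at least two modules are double-sided) is our reading of the text and the table, not something taken from AMD documentation. Configurations that the table does not list are rejected rather than guessed.

```python
from typing import Optional, Tuple

# Slot encoding (hypothetical, for this sketch only):
# "SS" = single-sided DIMM, "DS" = double-sided DIMM, None = empty slot.
Slot = Optional[str]
# Two slots of channel 1 followed by two slots of channel 2.
Config = Tuple[Slot, Slot, Slot, Slot]

# The five dual-channel DDR400 configurations from the table above.
DDR400_DUAL_CHANNEL = {
    ("SS", None, "SS", None): "1T",
    ("DS", None, "DS", None): "1T",
    ("SS", "SS", "SS", "SS"): "1T",
    ("SS", "DS", "SS", "DS"): "2T",
    ("DS", "DS", "DS", "DS"): "2T",
}


def command_rate(config: Config) -> str:
    """Return the Command Rate for a configuration listed in the table."""
    try:
        return DDR400_DUAL_CHANNEL[config]
    except KeyError:
        raise ValueError(f"configuration {config!r} is not listed in the table") from None


def rule_of_thumb(config: Config) -> str:
    """2T only when all four slots are filled and at least two DIMMs are double-sided."""
    populated = [slot for slot in config if slot is not None]
    double_sided = populated.count("DS")
    return "2T" if len(populated) == 4 and double_sided >= 2 else "1T"


for config, rate in DDR400_DUAL_CHANNEL.items():
    # The rule stated in the text reproduces every row of the table.
    assert rule_of_thumb(config) == rate
    print(config, "->", command_rate(config))
```

Keeping the lookup table and the rule separate makes it obvious which part is data from the article and which part is interpretation.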

At first glance there is nothing really bad about increasing the timing setting in case of four double-sided memory modules. However, in some cases this can cause certain issues.

First, four double-sided memory modules can appear in the system after your next upgrade. A pair of 512MB modules, which are mostly double-sided, is the most frequent configuration today: over 40% of systems used by computer enthusiasts are equipped with two 512MB modules. The most natural way of increasing the RAM capacity in this case is adding another pair of 512MB DIMMs, which results in a total memory capacity of 2GB. However, it is far from obvious that the performance increases as well in this case, because you will have to raise the Command Rate timing to 2T.

I have to stress that switching to 1GB memory modules will not be a panacea in this case. The thing is that 1GB memory modules are still quite rare and expensive. Moreover, they cannot boast very aggressive timings because of their high capacity.

As a result, it looks like every owner of an Athlon 64 based system can one day get four memory DIMMs installed. In order to better understand what this sort of upgrade can actually mean for the overall system performance, we decided to run a few benchmarks on our Athlon 64 (Venice) based system and compare the results of a configuration with 1GB of RAM (2 double-sided 512MB modules) against those of a configuration with 2GB of RAM (4 double-sided 512MB modules).

For our tests we assembled two testbeds, which differed from one another only by the amount of system memory and the number of DIMMs installed. These systems were built with the following hardware components:

First of all let’s discuss the results of synthetic benchmarks showing the actual memory subsystem bandwidth and latency:

Benchmark                                                 4 x 512MB     2 x 512MB
ScienceMark 2.0, Memory Bandwidth, MB/s                   4498          5628
ScienceMark 2.0, Memory Latency, cycles                   116           104
ScienceMark 2.0, Memory Latency, ns                       48.11         43.13
SiSoft Sandra 2005, RAM Int Buffered Bandwidth, MB/s      4936          6014
SiSoft Sandra 2005, RAM Float Buffered Bandwidth, MB/s    4936          5952

The size of the installed RAM doesn’t affect the results in these synthetic benchmarks at all. So, I would say that what we see is pretty natural: the system with four double-sided memory modules is slower than the system with only two double-sided memory modules. And this is true for both latency and bandwidth measurements.
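
To put a number on the gap, the short Python sketch below computes the relative change of each synthetic result when going from two to four DIMMs. The figures are copied from the table above; the function name `percent_change` and the dictionary layout are just conventions chosen for this example.

```python
# Synthetic results copied from the table above: (4 x 512MB, 2 x 512MB).
RESULTS = {
    "ScienceMark 2.0, Memory Bandwidth, MB/s": (4498, 5628),
    "ScienceMark 2.0, Memory Latency, cycles": (116, 104),
    "ScienceMark 2.0, Memory Latency, ns": (48.11, 43.13),
    "SiSoft Sandra 2005, RAM Int Buffered Bandwidth, MB/s": (4936, 6014),
    "SiSoft Sandra 2005, RAM Float Buffered Bandwidth, MB/s": (4936, 5952),
}


def percent_change(four_dimms: float, two_dimms: float) -> float:
    """Change of the 4-DIMM result relative to the 2-DIMM result, in percent."""
    return (four_dimms - two_dimms) / two_dimms * 100.0


for name, (four, two) in RESULTS.items():
    # Negative values mean the 4-DIMM system scored lower than the 2-DIMM one.
    print(f"{name}: {percent_change(four, two):+.1f}%")
```

With these inputs the bandwidth results come out roughly 17-20% lower and the latency roughly 11.5% higher for the four-DIMM configuration.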


The benchmarks based on real-life algorithms and applications can actually behave differently when the size of the system memory and the number of installed DIMMs increase. Some applications may require more than 1GB of RAM, so their results get much better once more memory is installed. So, we decided to go a little further than just bandwidth and latency measurements.


Well, this is far from promising, I should say. The system with two additional modules installed demonstrated a significant performance drop in games and a few popular applications. In fact, this is not that surprising: most contemporary programs feel quite comfortable with 1GB of system memory at their disposal. By adding two more DIMMs we increased the Command Rate timing and didn’t actually gain anything.

However, to make our performance investigation complete we also tested our Athlon 64 system with 2GB of system memory in the WorldBench 5 test suite, which shows the system performance in popular office and digital content creation applications.

The situation here is a little bit different from what we have seen before. According to the results obtained in WorldBench 5, there are a few applications that actually benefit from more system memory even though the timing settings get worse. A system with four DIMMs and a total memory capacity of 2GB was faster than the same system with only two DIMMs and 1GB of memory in ACDSee PowerPack 5.0, Adobe Premiere 6.5, 3ds max 5.1, Windows Media Encoder 9.0, MusicMatch Jukebox 7.10 and WinZip 8.1. All these programs work with large amounts of data and address the storage subsystem a lot. In other words, the performance improves due to deeper caching of disk operations rather than due to the elimination of swapping. Since the performance improvement wasn’t dramatic in any of these applications, we would still insist that 1GB of RAM is more than enough for contemporary tasks.

So, it looks like the use of four double-sided memory modules in a system with an Athlon 64 processor on the E core revision will affect its performance quite negatively because of the peculiar organization of the processor memory controller. You should definitely keep this fact in mind when planning your next system upgrade.


DDR500 Support: Finally It Is Here!

As we have already shown above, K8 processors on the E core revision haven’t got any better than their predecessors when it comes to working with four memory modules. However, optimizing the memory controller with enhanced workload distribution algorithms was not the only thing AMD engineers had been working on. They also added support for new memory bus frequency dividers, thus expanding the functionality of the integrated memory controller. As a result, CPUs on the E core revision learned to support faster types of memory than the good old DDR400 SDRAM. And even though these memory modules haven’t been certified by JEDEC, which has moved its primary focus to DDR2 SDRAM standards, they have become pretty popular in today’s market.

Memory manufacturers catering to hardware enthusiasts have been offering memory modules rated for over 400MHz for quite a while already. Until recently these solutions were mostly targeted at overclockers. AMD engineers, however, made them of interest to common users, too: the new Athlon 64, Athlon 64 FX, Athlon 64 X2 and Sempron processors based on the E core revision can now support DDR466 and DDR500 SDRAM.

Before we start discussing how the owners of K8 processors on the new core can benefit from this new feature, we should say a few words about the memory controller of these CPUs. This will give us a better idea of the peculiarities of DDR466 and DDR500 SDRAM support in the new Athlon 64 processors.

The memory bus in K8 processors is not clocked the way it is in Athlon XP or Pentium 4 systems. Athlon 64 processors and their modifications do not have an FSB because they are based on the so-called Direct Connect architecture. The memory controller is a part of the processor core, which is why the CPU doesn’t use any busses or bus protocols to address it.

The working frequency of K8 processors is derived from the clock generator frequency (200MHz by default) and not from the frequency of some bus. As a result, the memory frequency in Athlon 64 based systems is not set directly by the clock generator frequency or a bus frequency, but is derived from the CPU clock rate and its clock multiplier.

The basic ratio for the memory frequency in Athlon 64 systems looks as follows:

DRAM_frequency = CPU_frequency / ceil (CPU_multiplier / DRAM_frequency_divider)

Here DRAM_frequency is the frequency in question, CPU_frequency is the processor working frequency, CPU_multiplier is its actual multiplier, and DRAM_frequency_divider sets the memory controller work mode and is selected from the supported set. If you are into coding, you should be familiar with the ceil function: it returns the smallest integer not less than its argument.
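
For readers who prefer code to formulas, here is a minimal Python sketch of the ratio above. The function name `dram_frequency` and its parameter names are our own; the arithmetic follows the formula as given, and `fractions.Fraction` is used only so that the division inside ceil stays exact.

```python
import math
from fractions import Fraction


def dram_frequency(cpu_mhz: float, cpu_multiplier: int, divider: Fraction) -> float:
    """DRAM_frequency = CPU_frequency / ceil(CPU_multiplier / DRAM_frequency_divider)."""
    # Fraction keeps CPU_multiplier / divider exact, so ceil() is never
    # thrown off by floating-point rounding on whole-number results.
    cpu_to_dram_ratio = math.ceil(Fraction(cpu_multiplier) / divider)
    return cpu_mhz / cpu_to_dram_ratio


# Worked example: a 2200MHz CPU (11 x 200MHz) with the 5/6 divider.
# 11 / (5/6) = 13.2, rounded up to 14, so the memory runs at 2200 / 14 MHz.
print(round(dram_frequency(2200, 11, Fraction(5, 6)), 1))
```

Because of the rounding up, the memory never runs faster than the divider nominally allows; in this example it ends up at about 157MHz rather than the nominal 166MHz.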

The memory clock frequency is determined only by the CPU clock rate and the corresponding divider, which is why Athlon 64 processors with different clock frequencies set slightly different working frequencies for the memory modules in the system. In particular, you can find the following table in the documents available on AMD’s official web-site:

You can easily compose this table yourself if you know what DRAM frequency dividers Athlon 64 CPUs support. Luckily this is no big secret. The CPUs on the D core revision support the 1/2, 3/5, 2/3, 7/10, 3/4, 5/6, 9/10 and 1/1 dividers, which let them run memory at frequencies close to 100, 120, 133, 140, 150, 166, 180 and 200MHz. The CPUs on the new E core acquired two additional dividers that are greater than 1: 7/6 and 5/4. As a result, Athlon 64 on the E core works fine with memory rated for something close to 233 and 250MHz.
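
The nominal frequencies quoted above are simply the 200MHz reference clock multiplied by each divider, as the following Python sketch shows. The constant and function names are invented for this illustration, and labeling the first eight dividers "D and E" assumes that the E core keeps all the dividers of the D core, which the text implies but does not state outright.

```python
from fractions import Fraction

REFERENCE_MHZ = 200  # default clock generator frequency named in the article

# Divider sets as listed in the text above.
D_CORE_DIVIDERS = [Fraction(1, 2), Fraction(3, 5), Fraction(2, 3), Fraction(7, 10),
                   Fraction(3, 4), Fraction(5, 6), Fraction(9, 10), Fraction(1, 1)]
E_CORE_EXTRA_DIVIDERS = [Fraction(7, 6), Fraction(5, 4)]


def nominal_memory_mhz(divider: Fraction, reference_mhz: int = REFERENCE_MHZ) -> float:
    """Target memory clock for a divider: reference clock times divider."""
    return float(reference_mhz * divider)


for divider in D_CORE_DIVIDERS + E_CORE_EXTRA_DIVIDERS:
    # Assumption: the E core supports every D core divider plus the two new ones.
    core = "E only" if divider in E_CORE_EXTRA_DIVIDERS else "D and E"
    print(f"{divider.numerator}/{divider.denominator}: "
          f"{nominal_memory_mhz(divider):.1f} MHz ({core})")
```

These are target values only; the frequency a given CPU actually sets also depends on its multiplier.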


Speaking about the exact memory frequency rates that can be set with the new dividers, they are all listed in the table below:

CPU frequency    Memory frequency
                 DDR400 (1/1 divider)    DDR466 (7/6 divider)    DDR500 (5/4 divider)
2000 MHz         200 MHz                 222.2 MHz               250 MHz
2200 MHz         200 MHz                 220 MHz                 244.4 MHz
2400 MHz         200 MHz                 218.2 MHz               240 MHz

To actually use the dividers that raise the memory frequency over 200MHz, they must also be supported by the BIOS. This is true for all dividers, not only for those greater than 1. Therefore, not every mainboard will let you take advantage of the frequencies listed above if you install an Athlon 64 processor with the E core revision. However, if you are lucky to have one of the mainboards that do support these dividers, you will be able to use any memory faster than DDR400 SDRAM without any additional system overclocking. In other words, the CPUs on the new E core allow using overclocker-friendly memory without overclocking the clock generator and thus without raising the HyperTransport bus frequency.

In order to evaluate the advantages of the faster memory types, we carried out a test session where we compared the performance of an Athlon 64 processor working with regular DDR400 SDRAM against that of the same processor working with faster DDR466 SDRAM and DDR500 SDRAM memory modules.

However, this is certainly not enough for an extensive analysis of the advantages of DDR466 and DDR500 support. The thing is that many advanced users have been running the memory in their Athlon 64 systems at overclocked frequencies for quite a while now. This is a simple overclocking trick: you lower the CPU clock multiplier and increase the clock generator frequency. Contemporary chipsets allow clocking the PCI Express and PCI busses independently of the processor clock generator, which is why this trick works fine in most cases. As for the HyperTransport bus frequency, which depends directly on the clock generator frequency, it can still function normally if the corresponding multiplier is set to a lower value.

So, to check out all the details, we will also consider the results obtained for the same processor with the clock generator frequency raised above the nominal value, which also allows higher memory frequencies.

For our tests we used the same testbed as the one described above. The Athlon 64 3800+ on the Venice core worked in four modes:

  1. The clock frequency is set as 12 x 200MHz, memory works at 200MHz (DDR400) with 2-2-2-10 timings;
  2. The clock frequency is set as 12 x 200MHz, memory works at 218MHz (DDR436) with 2-3-2-10 timings;
  3. The clock frequency is set as 12 x 200MHz, memory works at 240MHz (DDR480) with 2-3-3-10 timings;
  4. The clock frequency is set as 10 x 240MHz, memory works at 240MHz (DDR480) with 2-3-3-10 timings.
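
The memory clocks in the four modes follow directly from the formula given earlier in the article. The Python sketch below recomputes them; the function name `memory_mhz` is our own, and the divider assigned to each mode (1/1, 7/6, 5/4 and 1/1) is inferred from the memory frequencies quoted in the list rather than read from the BIOS of the test system.

```python
import math
from fractions import Fraction


def memory_mhz(clock_generator_mhz: float, cpu_multiplier: int, divider: Fraction) -> float:
    """Memory clock from the article's formula; CPU clock = multiplier x clock generator."""
    cpu_mhz = clock_generator_mhz * cpu_multiplier
    return cpu_mhz / math.ceil(Fraction(cpu_multiplier) / divider)


# (clock generator MHz, CPU multiplier, DRAM divider) for the four test modes.
# The dividers are an inference from the memory clocks listed above.
MODES = [
    (200, 12, Fraction(1, 1)),  # mode 1: DDR400
    (200, 12, Fraction(7, 6)),  # mode 2: DDR436
    (200, 12, Fraction(5, 4)),  # mode 3: DDR480 via the new divider
    (240, 10, Fraction(1, 1)),  # mode 4: DDR480 via an overclocked clock generator
]

for number, (clock, multiplier, divider) in enumerate(MODES, start=1):
    mem = memory_mhz(clock, multiplier, divider)
    print(f"mode {number}: CPU {clock * multiplier} MHz, memory {mem:.1f} MHz "
          f"(DDR{round(2 * mem)})")
```

Modes 3 and 4 arrive at the same 2400MHz CPU clock and 240MHz memory clock by different routes, which is exactly what makes them a fair comparison.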

When we compare the performance of the system in these four work modes, we will be able to state what benefits the faster system memory support might grant Athlon 64 based systems. Besides, we will also see if the new work modes really make sense, or if the quality overclocker boards can grant us the same performance advantage without using the new frequency dividers.

Note that the differences in timing settings come from the peculiarities of the Corsair CMX512-3200XLPRO memory modules we used, which are built with the widespread Samsung TCCD chips.


Again, let’s begin with the synthetic benchmarks results showing the memory bus bandwidth and memory subsystem latency:

 

Benchmark                                                 FSB=200MHz    FSB=200MHz    FSB=200MHz    FSB=240MHz
                                                          DDR400        DDR436        DDR480        DDR480
                                                          (2-2-2-10)    (2-3-2-10)    (2-3-3-10)    (2-3-3-10)
ScienceMark 2.0, Memory Bandwidth, MB/s                   5628          6098          6346          6374
ScienceMark 2.0, Memory Latency, cycles                   104           96            88            88
ScienceMark 2.0, Memory Latency, ns                       43.13         39.82         36.5          36.66
SiSoft Sandra 2005, RAM Int Buffered Bandwidth, MB/s      6014          6353          6534          6510
SiSoft Sandra 2005, RAM Float Buffered Bandwidth, MB/s    5952          6268          6464          6440

Synthetic benchmarks show that as the memory frequency increases, so does the bandwidth, while the latency logically goes down. The results obtained for 240MHz memory are about the same in both cases: with the nominal clock generator frequency of 200MHz and the new 5/4 divider, and with the clock generator overclocked to 240MHz and the common 1/1 divider.
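
The scaling can be checked with a few lines of Python. The sketch below takes the ScienceMark 2.0 figures from the table above and expresses each mode as a gain over the DDR400 baseline, next to the increase in memory clock; the dictionary layout and the helper name `gain_percent` are just choices made for this example.

```python
# Memory clock (MHz) and ScienceMark 2.0 results per mode, from the table above.
MODES = {
    "DDR400, 200MHz clock generator": {"clock": 200, "bandwidth": 5628, "latency_ns": 43.13},
    "DDR436, 200MHz clock generator": {"clock": 218, "bandwidth": 6098, "latency_ns": 39.82},
    "DDR480, 200MHz clock generator": {"clock": 240, "bandwidth": 6346, "latency_ns": 36.5},
    "DDR480, 240MHz clock generator": {"clock": 240, "bandwidth": 6374, "latency_ns": 36.66},
}
BASELINE = MODES["DDR400, 200MHz clock generator"]


def gain_percent(value: float, baseline: float) -> float:
    """Relative difference to the baseline in percent (negative means lower)."""
    return (value / baseline - 1.0) * 100.0


for name, mode in MODES.items():
    # Compare how far each metric moved against how far the memory clock moved.
    print(f"{name}: "
          f"clock {gain_percent(mode['clock'], BASELINE['clock']):+.1f}%, "
          f"bandwidth {gain_percent(mode['bandwidth'], BASELINE['bandwidth']):+.1f}%, "
          f"latency {gain_percent(mode['latency_ns'], BASELINE['latency_ns']):+.1f}%")
```

On these numbers, a 20% higher memory clock yields roughly 13% more measured bandwidth and about 15% lower latency, and the two DDR480 modes differ from each other by well under one percent.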

Now let’s take a look at the complex benchmarks:


The results are quite logical. Faster memory leads to faster overall performance. I cannot say that DDR480 SDRAM guarantees significant performance growth; however, the average advantage of 2-3% is quite noticeable. Almost the same results can be obtained without involving the new features of the E core revision: all you need is simple memory overclocking by raising the clock generator frequency.

So, we have to admit that the enhancements of the K8 memory controller, such as support for DDR memory faster than DDR400 SDRAM, are really useful only for those users who intend to run their system in the nominal mode with default settings. Overclockers have been able to run their memory bus at higher speeds for a long time already, and with far more flexible configuration options. The performance in this case remains about the same.