

Pages: [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 ]


Pic.3: AMD Athlon 64 X2. Sequential reading of the data
modified in the cache of the other core.


Pic.4: AMD Athlon 64 X2. Random reading of the data
modified in the cache of the other core.

The results aren’t encouraging. The data transfer latency has become a little higher, but the overall picture has remained the same. The second thread’s data access latency is too high for this thread to be reading directly from the first core’s cache. When the modified data are read randomly (Picture 4), there’s a small growth of data transfer latency for data blocks smaller than 512KB, which may be due to the necessity to copy the modified cache lines into system RAM. The growth is very small, though, and there is no such latency growth when the data are accessed sequentially. This probably means that the memory controller doesn’t re-read the data it has just written to memory, but returns them to the processor from its internal buffers, which is in fact the right thing to do.

To make sure the data are read over the system bus, I’ll carry out a couple more tests.

First I reduce the CPU frequency multiplier from 10 to 6, which results in a CPU clock rate of 1200MHz. Here are the results of reading the modified data:


Pic.5: AMD Athlon 64 X2, 1200MHz frequency. Random reading of the data
modified in the cache of the other core.

The steps in the graph are 6 cycles high now, the same as the new multiplier. This clearly indicates that data transfers into the core are synchronized with the CPU’s memory bus and are most likely performed over this very bus.
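
The arithmetic behind this reading can be sketched as follows. It is only an illustration: the 200MHz reference clock is not measured here but inferred from the figures above (1200MHz at a multiplier of 6), and the function names are made up for this example.

```python
# Sketch: why a 6-cycle step points at a bus clocked at the reference frequency.
# Assumption: the reference clock is simply CPU clock / multiplier.

def reference_clock_mhz(cpu_mhz, multiplier):
    """Reference (bus) clock implied by a CPU clock and its multiplier."""
    return cpu_mhz / multiplier

def step_in_cpu_cycles(cpu_mhz, bus_mhz):
    """How many CPU cycles fit into one bus cycle."""
    return cpu_mhz / bus_mhz

def bus_period_ns(bus_mhz):
    """Duration of one bus cycle in nanoseconds."""
    return 1000.0 / bus_mhz

ref = reference_clock_mhz(1200, 6)
print(ref)                            # 200.0 (MHz)
print(step_in_cpu_cycles(1200, ref))  # 6.0 CPU cycles per bus cycle
print(bus_period_ns(ref))             # 5.0 ns per bus cycle
```

If the transfers were timed by the core clock instead, there would be no reason for the step height to match the multiplier.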

Now I’m going to check my supposition that the data are read from system RAM rather than from somewhere else by measuring the speed of reading data from RAM. To accomplish this, I simply disable data reads in the first thread. Thus, only the second thread works with the data: it reads the data from system RAM and measures the latency, then clears the cache, reads the data again and clears the cache again.
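
The shape of this single-thread loop can be sketched roughly as below. This is not the test program used for the graphs: Python cannot flush CPU caches, so `flush` is a hypothetical hook standing in for the cache clear, and the timings it produces are dominated by the interpreter. The sketch only shows the order of operations and the difference between a sequential and a random walk over a block.

```python
import random
import time

def build_chain(n, randomized, seed=0):
    """Return a table where chain[i] is the slot to visit after slot i.

    The table forms one cycle through all n slots, either in ascending
    order (sequential access) or in a shuffled order (random access).
    """
    order = list(range(n))
    if randomized:
        random.Random(seed).shuffle(order)
    chain = [0] * n
    for i in range(n):
        chain[order[i]] = order[(i + 1) % n]
    return chain

def timed_pass(chain, flush):
    """Walk the whole chain once, time it, then clear the cache via flush()."""
    start = time.perf_counter_ns()
    idx = 0
    for _ in range(len(chain)):
        idx = chain[idx]          # each read depends on the previous one
    elapsed = time.perf_counter_ns() - start
    flush()                       # hypothetical stand-in for a real cache flush
    return elapsed / len(chain), idx

def no_flush():
    """Placeholder: a real test would evict the block from the cache here."""

chain = build_chain(4096, randomized=True)
for _ in range(2):                # read, clear, read again, clear again
    ns_per_read, _ = timed_pass(chain, no_flush)
    print(f"{ns_per_read:.1f} ns per dependent read")
```

Chaining the reads makes every access wait for the previous one, which is what lets the loop measure latency rather than throughput.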


Pic.6: AMD Athlon 64 X2. Sequential reading of the data from system RAM.


Pic.7: AMD Athlon 64 X2. Random reading of the data from system RAM.

The graphs for reading the unmodified data loaded into the first core (Pictures 1 and 2) and for reading the data from system RAM (Pictures 6 and 7) almost coincide. So, the supposition that the data are read from system RAM is correct! The only differences are in the latency graphs for 8-16KB data blocks, but they are due to the hardware data prefetch mechanism, as we can make sure by running the tests multiple times.

So, I have to state that I can’t find any indication of direct data transfers from one execution core to the other in the Athlon 64 X2 processor. According to my tests, the most recent copy of the data is always read from system RAM. This must be a limitation of the MOESI protocol implementation. The following seems to happen when data are accessed: on receiving the read probe that the second core puts on the system bus, the first core writes the modified cache line back to memory. After this write, or simultaneously with it, the requested line is transferred to the second core. If the data in the first core’s cache haven’t been modified, they are read from system RAM. Why is there no direct transfer between the cores via the crossbar switch? Ask AMD’s engineers about that! :)
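
The sequence supposed above can be written down as a toy model. It is a sketch of the behavior inferred from these tests, not of AMD’s actual MOESI implementation: the class, its two-state cache lines ("M" for modified, "S" for an unmodified copy) and the event log are all invented for illustration.

```python
class TwoCoreModel:
    """Toy model of the inferred behavior: every cross-core read goes via RAM."""

    def __init__(self):
        self.ram = {}
        self.caches = [{}, {}]    # per core: address -> [value, state]
        self.log = []             # (event, core, address) tuples

    def write(self, core, addr, value):
        """Core modifies a line; the other core's copy is invalidated."""
        self.caches[1 - core].pop(addr, None)
        self.caches[core][addr] = [value, "M"]
        self.log.append(("write", core, addr))

    def read(self, core, addr):
        """Core reads a line, probing the other core on a miss."""
        line = self.caches[core].get(addr)
        if line is not None:
            self.log.append(("cache-hit", core, addr))
            return line[0]
        other = self.caches[1 - core].get(addr)
        if other is not None and other[1] == "M":
            # The probe hits a modified line: write it back to memory first.
            self.ram[addr] = other[0]
            other[1] = "S"
            self.log.append(("write-back", 1 - core, addr))
        # Modified or not, the requester is served from system RAM.
        value = self.ram.get(addr, 0)
        self.caches[core][addr] = [value, "S"]
        self.log.append(("ram-read", core, addr))
        return value

model = TwoCoreModel()
model.write(0, 0x40, 7)        # first core modifies a line
print(model.read(1, 0x40))     # second core reads it
print(model.log)
```

A model with direct core-to-core transfers would skip the write-back and the RAM read in the modified case, and that is exactly the path the latency graphs show no sign of.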

