Intel Core Duo (Yonah)

Next up is Intel’s Core Duo T2400. This dual-core processor is based on the Yonah core and runs at 1833MHz. Its main difference from the CPUs I’ve tested earlier in this review is that its 2MB of L2 cache is shared between the two execution cores.

With a shared L2 cache, data that the first core reads into the cache should be “visible” to the other core, so we can hope for good results here. Let’s see…
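That expectation can be pictured with a toy model. The sketch below is only an illustration, not a description of how Yonah actually works: the class name `SharedL2Model` is hypothetical, the caches have no capacity limit or eviction, and every fill is assumed to place the line in both the shared L2 and the reading core's L1.

```python
class SharedL2Model:
    """Toy model: each core has a private L1 set, all cores share one L2 set."""

    def __init__(self, cores=2):
        self.l1 = [set() for _ in range(cores)]  # private L1 per core
        self.l2 = set()                          # one L2 shared by all cores

    def read(self, core, line):
        """Return the level that serves the read, then fill the caches."""
        if line in self.l1[core]:
            level = "L1"
        elif line in self.l2:
            level = "L2"
        else:
            level = "RAM"
        # Assumption: a fill installs the line in the shared L2 and in
        # the L1 of the core that issued the read.
        self.l2.add(line)
        self.l1[core].add(line)
        return level


model = SharedL2Model()
first_core = [model.read(0, line) for line in range(4)]   # core 0 loads the data
second_core = [model.read(1, line) for line in range(4)]  # core 1 reads the same lines
```

In this model the second core never goes to RAM for lines the first core has already read, which is the behaviour the unmodified-data test is looking for.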


Pic.20: Intel Core Duo (Yonah). Sequential reading of non-modified data
loaded into the cache of the other core.


Pic.21: Intel Core Duo (Yonah). Random reading of non-modified data
loaded into the cache of the other core.

Reading unmodified data in blocks of 1MB and smaller takes 14 cycles, which is exactly the latency of this processor’s L2 cache. You may ask why the latency suddenly increases on the 2MB data block if the processor has 2MB of L2 cache. The reason is the size of the processor’s TLB, which has 256 entries for 4KB pages. That is, the TLB can cover only 1024KB of memory, and when pages outside that range are accessed, the processor has to perform a time-consuming walk of the page translation tables to translate virtual addresses into physical ones. This has a negative effect on the result, of course. So, we’ve got excellent performance when reading unmodified data. Let’s see how good the processor is at reading modified data.
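The TLB arithmetic can be checked with a few lines of Python. This is a minimal sketch using the figures quoted in the text (256 entries, 4KB pages); the function names are made up for illustration, and the model assumes that each page of the block needs its own TLB entry.

```python
def tlb_reach_kb(entries, page_kb):
    """Amount of memory (KB) the TLB can map without a page table walk."""
    return entries * page_kb


def pages_touched(block_kb, page_kb):
    """Number of pages a contiguous block spans (rounded up)."""
    return -(-block_kb // page_kb)


def fits_in_tlb(block_kb, entries, page_kb):
    """True if every page of the block can hold a TLB entry at once."""
    return pages_touched(block_kb, page_kb) <= entries


TLB_ENTRIES = 256  # figure quoted in the article
PAGE_KB = 4        # 4KB pages

reach = tlb_reach_kb(TLB_ENTRIES, PAGE_KB)
one_mb_ok = fits_in_tlb(1024, TLB_ENTRIES, PAGE_KB)
two_mb_ok = fits_in_tlb(2048, TLB_ENTRIES, PAGE_KB)
```

A 1MB block spans 256 pages and just fits, while a 2MB block spans 512 pages, twice the number of entries, so part of the accesses have to pay for address translation even though the data itself is still in the L2 cache.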


Pic.22: Intel Core Duo (Yonah). Sequential reading of the data
loaded and modified in the other core.


Pic.23: Intel Core Duo (Yonah). Random reading of the data
loaded and modified in the other core.

The results look puzzling at first, but let’s try to understand them. The latency is lowest for the 1MB data block and grows as the blocks get smaller, reaching a maximum on data blocks that fit entirely into the L1 cache (32KB). Is there something wrong with the test? No. Notice that the graphs for small data blocks that fit into the L1 cache closely resemble the latency graphs of the Athlon 64 X2 processor I tested at the beginning of the review. The characteristic step-like shape of the graphs is quite obvious. The steps are 11 cycles high, which is exactly the value of the CPU frequency multiplier.

Hence a surprising conclusion: modified data in the other core’s cache is accessed here in the same way as in the Athlon 64 X2 and Pentium D processors. That is, the most recent copy of the data is first sent back to system RAM via the system bus and is then transferred to the second core.

But why? The L1 caches of Yonah’s cores use a write-back policy, which means that after a core changes the data, the modified line with the valid copy stays in that core’s L1 cache until it is evicted, whereas the L2 cache holds stale data. It seems that when a cache miss occurs in the shared L2 cache, the second core places a read request on the system bus, and the first core (which holds the most recent copy of the data) responds to the probe read from the system bus by sending the data to the bus rather than to the second core. I can’t say why the data is not written directly into the L2 cache for the second core to read it from there. Perhaps it would have taken a considerable redesign of the processor to make it work that way.
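The link between the 11-cycle steps and the multiplier can be sketched as follows. The clock rate and multiplier come from the text; the helper names are hypothetical, and the model rests on one assumption: a transfer that goes over the system bus completes only on a bus clock boundary, so its latency, counted in CPU cycles, is rounded up to a whole number of bus clocks.

```python
CPU_MHZ = 1833   # T2400 clock rate quoted in the article
MULTIPLIER = 11  # CPU frequency multiplier quoted in the article


def bus_clock_mhz(cpu_mhz, multiplier):
    """Bus clock implied by the CPU clock and its multiplier."""
    return cpu_mhz / multiplier


def quantize_to_bus(cpu_cycles, multiplier):
    """Round a latency (in CPU cycles) up to a whole number of bus clocks.

    Assumption: bus transfers finish only on bus clock boundaries, and
    one bus clock lasts `multiplier` CPU cycles.
    """
    bus_clocks = -(-cpu_cycles // multiplier)  # ceiling division
    return bus_clocks * multiplier


# Hypothetical raw latencies, chosen only to show the rounding effect.
raw = [100, 105, 110, 111, 120, 122]
observed = [quantize_to_bus(c, MULTIPLIER) for c in raw]
levels = sorted(set(observed))
step_heights = [b - a for a, b in zip(levels, levels[1:])]
```

Under this assumption, any latency that involves the bus can only take values 11 CPU cycles apart, which would produce the step-like graphs. A read served entirely inside the processor would not be quantized this way.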
