Improved Data Pre-Fetcher. This improvement is meant to reduce the delays that occur when the data needed for further processing is not in the cache, a very unpleasant situation in which the CPU sits idle waiting for the data to arrive from memory. As we have already said, Prescott's L1 data cache and L2 cache are twice as large as Northwood's. In addition, Intel has improved the data prefetch algorithms.
Intel improved both software data prefetching, which is initiated by the running application, and the hardware data prefetch mechanism. As for the former, the CPU now executes software prefetch instructions even when the requested address misses in the TLB. Moreover, these instructions can be cached in the Trace Cache. However, this has limited practical value, because existing compilers generally do not insert software prefetch instructions into the code, so the improved hardware prefetcher matters much more. According to Intel, Prescott's new hardware prefetch algorithm tracks both the data flow and the code flow and improves performance by about 35%.
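Intel does not disclose how Prescott's hardware prefetcher works, so the following is only a sketch of a generic technique: a per-instruction stride prefetcher, which watches the addresses loaded by each instruction and, once the same stride has repeated a few times, requests the next cache line ahead of time. The class and parameter names (`StridePrefetcher`, `threshold`, `degree`) are hypothetical and chosen for illustration; the 64-byte line size matches the L1 data cache described below.

```python
from dataclasses import dataclass

LINE_SIZE = 64  # bytes per cache line, as in Prescott's L1 data cache


@dataclass
class StrideEntry:
    last_addr: int       # last address loaded by this instruction
    stride: int = 0      # most recently observed address delta
    confidence: int = 0  # how many times in a row the stride repeated


class StridePrefetcher:
    """Generic per-instruction stride prefetcher (textbook scheme, not Intel's design)."""

    def __init__(self, threshold: int = 2, degree: int = 1):
        self.table: dict[int, StrideEntry] = {}  # keyed by instruction address (pc)
        self.threshold = threshold  # repeats required before issuing prefetches
        self.degree = degree        # how many strides ahead to prefetch

    def access(self, pc: int, addr: int) -> list[int]:
        """Observe a load of `addr` by instruction `pc`; return line addresses to prefetch."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=addr)
            return []
        stride = addr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confidence += 1
        else:
            entry.stride = stride
            entry.confidence = 0
        entry.last_addr = addr
        if entry.confidence < self.threshold:
            return []
        targets: list[int] = []
        for k in range(1, self.degree + 1):
            line = (addr + k * stride) // LINE_SIZE * LINE_SIZE  # align to line start
            if line not in targets:
                targets.append(line)
        return targets


if __name__ == "__main__":
    pf = StridePrefetcher()
    for a in range(0, 1280, 256):  # one load instruction walking an array
        print(hex(a), [hex(t) for t in pf.access(0x400, a)])
```

With the default threshold of 2, a load instruction walking an array in 256-byte steps triggers its first prefetch (for address 1024) on the fourth load, once the stride has been confirmed twice. Keying the table by instruction address is one simple way to combine data-flow and code-flow information; whether Prescott does something similar is not stated by Intel.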
Besides the changes mentioned above, I would also like to point out that Prescott has more Write Combining buffers, which allows more memory operations, such as data stores and loads, to be in flight simultaneously.
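Why the number of Write Combining buffers matters can be shown with a toy model, built on simplifying assumptions: each buffer gathers stores to one 64-byte line and is written out as a single burst, and when all buffers are busy the oldest one is flushed. The buffer counts used below are arbitrary and are not Intel's figures for Northwood or Prescott; the class and function names are hypothetical.

```python
LINE_SIZE = 64  # bytes per cache line


class WriteCombiningBuffers:
    """Toy WC buffers: each collects stores to one line, flushed as one burst (FIFO eviction)."""

    def __init__(self, num_buffers: int):
        self.num_buffers = num_buffers
        self.buffers: dict[int, set[int]] = {}  # line address -> byte offsets written
        self.order: list[int] = []  # line addresses, oldest allocation first
        self.bursts = 0  # memory write transactions issued

    def store(self, addr: int, size: int = 4) -> None:
        """Record a store; assumes it does not cross a line boundary."""
        line = addr // LINE_SIZE * LINE_SIZE
        if line not in self.buffers:
            if len(self.buffers) == self.num_buffers:
                self._flush(self.order[0])  # no free buffer: evict the oldest
            self.buffers[line] = set()
            self.order.append(line)
        offset = addr - line
        self.buffers[line].update(range(offset, min(offset + size, LINE_SIZE)))

    def _flush(self, line: int) -> None:
        del self.buffers[line]
        self.order.remove(line)
        self.bursts += 1

    def drain(self) -> int:
        """Flush everything and return the total number of bursts."""
        for line in list(self.order):
            self._flush(line)
        return self.bursts


def run(num_buffers: int, streams: int, stores_per_stream: int) -> int:
    """Interleave 4-byte stores from several streams, each filling its own line."""
    wc = WriteCombiningBuffers(num_buffers)
    for i in range(stores_per_stream):
        for s in range(streams):
            wc.store(s * 4096 + i * 4)
    return wc.drain()


if __name__ == "__main__":
    for n in (4, 8):
        print(n, "buffers:", run(n, streams=8, stores_per_stream=16), "bursts")
```

With eight interleaved write streams, eight buffers combine all 128 stores into 8 bursts, while four buffers keep evicting each other and end up issuing one burst per store (128). Real hardware is far more complex; the point is only that once the number of concurrent write streams exceeds the number of buffers, combining breaks down.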
The picture below shows a block diagram of the Prescott processor:
In fact, the diagram shows no structural changes in the new Prescott core compared with the previous NetBurst-based solutions. The most evident differences are the size and structure of the L1 and L2 caches, which we discuss below.
Cache and Memory Subsystem
The cache subsystem appears to have changed the most in the new Prescott core compared with its predecessors; the difference in cache sizes between Prescott and Northwood is obvious at first glance. Prescott's L1 data cache grew from 8KB to 16KB, and the L2 cache grew from 512KB to 1MB. As for the structure, Prescott's L1 data cache is 8-way set associative with 64-byte lines and uses a write-through policy. Its associativity has thus doubled compared with the 4-way L1 data cache of the Northwood core. Prescott's L2 cache is almost the same as Northwood's: it is also 8-way set associative, uses a write-back policy, has 128-byte lines, and is connected through a 256-bit bus, just as in Northwood.
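The cache parameters above determine how many sets each cache has (size divided by ways times line size) and how an address is split into tag, set index and byte offset. The sketch below computes this for the four configurations; the Northwood L1 data cache figures (8KB, 4-way, 64-byte lines) are the standard Pentium 4 values, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    size_bytes: int
    ways: int
    line_bytes: int

    @property
    def sets(self) -> int:
        # Each set holds `ways` lines of `line_bytes` bytes.
        return self.size_bytes // (self.ways * self.line_bytes)

    def split(self, addr: int) -> tuple[int, int, int]:
        """Split an address into (tag, set index, byte offset within the line)."""
        offset = addr % self.line_bytes
        index = (addr // self.line_bytes) % self.sets
        tag = addr // (self.line_bytes * self.sets)
        return tag, index, offset


KB = 1024
northwood_l1d = CacheGeometry(8 * KB, 4, 64)
prescott_l1d = CacheGeometry(16 * KB, 8, 64)
northwood_l2 = CacheGeometry(512 * KB, 8, 128)
prescott_l2 = CacheGeometry(1024 * KB, 8, 128)

if __name__ == "__main__":
    for name, c in [("Northwood L1D", northwood_l1d), ("Prescott L1D", prescott_l1d),
                    ("Northwood L2", northwood_l2), ("Prescott L2", prescott_l2)]:
        print(f"{name}: {c.sets} sets x {c.ways} ways x {c.line_bytes} B")
```

Both L1 data caches come out at 32 sets: because the capacity and the associativity doubled together, the set index uses the same address bits, and each set simply holds twice as many lines. The L2 cache, by contrast, keeps 8 ways and doubles its number of sets from 512 to 1024.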
In principle, a larger cache is another way to combat processor idling caused by the absence of data for further processing. That is why, as core clock frequencies rise and the gap between CPU performance and memory speed widens, efficient caches become more and more important. This makes the enlarged L1 and L2 caches a very significant change in the new Prescott core, especially since this core was designed from the start for higher clock frequencies.
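The reasoning above can be summarized with the standard textbook formula for average memory access time (AMAT) in a two-level cache hierarchy, where the t terms are hit times, the m terms are local miss rates, and the memory latency in cycles is the latency in nanoseconds multiplied by the clock frequency in GHz. The symbols are generic and not taken from Intel documentation:

```latex
\text{AMAT} = t_{L1} + m_{L1}\left(t_{L2} + m_{L2}\, t_{\text{mem}}\right),
\qquad
t_{\text{mem}}\,[\text{cycles}] = t_{\text{mem}}\,[\text{ns}] \times f_{\text{clk}}\,[\text{GHz}]
```

As the clock frequency grows while memory latency in nanoseconds stays roughly constant, the term with memory latency grows in cycles, and reducing the miss rates through larger caches becomes the main lever for keeping the processor busy.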
As for the L1 instruction cache, in the NetBurst architecture it is called the Execution Trace Cache, because it stores instruction sequences in already decoded form. Its size and structure remain unchanged: it can hold up to 12,000 micro-operations, which is equivalent to 8-16KB of a conventional instruction cache.