The micro-architectural differences between the Core Duo and the Dothan-core Pentium M mentioned above are not very significant. The former may even be viewed as an improved version of the latter, but the main advantages of the new CPU come, of course, from its dual-core design. And this design is rather untypical for today, as the two cores share a single 2MB L2 cache. Intel calls this Smart Cache, and it is smart because the same L2 cache space is used intelligently by both execution cores.
But what’s good about this design? First, the cache space used by each core can be adjusted flexibly. In other words, each core of the Core Duo processor can access the full 2 megabytes of L2 cache memory. When one core is idle, the other gets the entire cache to itself.
If both cores are at work, the cache space is divided between them depending on how often each core accesses the system memory. Moreover, if both cores are processing the same data at the same moment, only one copy of the data is stored in the shared L2 cache. Thus, the smart 2MB L2 cache of the Core Duo processor is more efficient and, so to speak, more capacious than two separate 1MB caches, as found in the dual-core desktop processors of Intel’s Pentium D 8XX and AMD’s Athlon 64 X2 families.
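Both effects described above can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not a model of the real Smart Cache: it treats each cache as fully associative with LRU replacement, and the names `LRUCache`, `run` and `LINES_PER_MB` are made up for this example. It compares one shared cache with two private caches of half the size each.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement, sized in lines."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # address -> True, oldest entry first
        self.hits = 0
        self.misses = 0

    def access(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.lines[address] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def run(trace, caches):
    """Replay (core_id, address) pairs; caches maps core_id -> LRUCache."""
    for core, address in trace:
        caches[core].access(address)


LINES_PER_MB = 16  # hypothetical scale: 16 lines stand in for 1 MB

# Scenario 1: core 0 loops over a 1.5 "MB" working set, core 1 is idle.
trace_one_busy = [(0, a) for _ in range(10) for a in range(24)]

shared = LRUCache(2 * LINES_PER_MB)
run(trace_one_busy, {0: shared, 1: shared})  # both cores use the same cache

private = {0: LRUCache(LINES_PER_MB), 1: LRUCache(LINES_PER_MB)}
run(trace_one_busy, private)

shared_hits_one_busy = shared.hits
private_hits_one_busy = private[0].hits

# Scenario 2: both cores read the same 12 lines, taking turns.
trace_same_data = [(core, a) for _ in range(5) for a in range(12)
                   for core in (0, 1)]

shared2 = LRUCache(2 * LINES_PER_MB)
run(trace_same_data, {0: shared2, 1: shared2})

private2 = {0: LRUCache(LINES_PER_MB), 1: LRUCache(LINES_PER_MB)}
run(trace_same_data, private2)

shared_lines_stored = len(shared2.lines)
private_lines_stored = sum(len(c.lines) for c in private2.values())

print("one busy core, hits:", shared_hits_one_busy, "vs", private_hits_one_busy)
print("same data, lines stored:", shared_lines_stored, "vs", private_lines_stored)
```

In the first scenario the working set fits into the shared cache but not into a single private one, so the private cache keeps evicting lines it is about to need. In the second, the private caches hold the same data twice while the shared cache holds it once.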
Another important feature of the shared L2 cache is that it helps reduce the load on the system memory and on the processor bus. The system simply doesn’t have to maintain coherency between two separate L2 caches. In dual-core designs with separate caches, a copy of the same data sits in each cache when both cores need it, and it is necessary to make sure that neither copy has become outdated. Before reading such data from its L2 cache, a processor core must check whether the other core has changed it. If so, the cache contents must be updated via the system memory and the system bus. The shared cache design does away with this inefficient procedure altogether. Moreover, the control logic in the Core Duo allows direct data transfers between the L1 caches of the two cores, for more efficient communication when both cores are working on the same task.
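To get a feel for the bus traffic involved, here is a minimal sketch under loudly stated assumptions: the caches are infinitely large, L1 caches are ignored, and the private-cache case uses a simplified write-invalidate rule in which a modified line reaches the other core only through a write-back to memory followed by a fetch. This is not the actual protocol of any processor named above, and the function names are hypothetical.

```python
def private_cache_transfers(trace):
    """Count bus line transfers for two private L2 caches (simplified model)."""
    valid = {0: set(), 1: set()}  # addresses each core holds a valid copy of
    dirty_owner = {}              # address -> core holding a modified copy
    transfers = 0
    for core, op, address in trace:
        other = 1 - core
        if address not in valid[core]:
            if dirty_owner.get(address) == other:
                transfers += 1    # other core writes its modified line back
                del dirty_owner[address]
            transfers += 1        # this core fetches the line from memory
            valid[core].add(address)
        if op == "write":
            valid[other].discard(address)  # the other copy is now stale
            dirty_owner[address] = core
    return transfers


def shared_cache_transfers(trace):
    """Count bus line transfers when both cores share one L2 cache."""
    cached = set()
    transfers = 0
    for _core, _op, address in trace:
        if address not in cached:
            transfers += 1        # only the first touch goes over the bus
            cached.add(address)
    return transfers


# Core 0 keeps updating one line and core 1 keeps reading it, 100 times over.
ping_pong = []
for _ in range(100):
    ping_pong.append((0, "write", 0x40))
    ping_pong.append((1, "read", 0x40))

private_count = private_cache_transfers(ping_pong)
shared_count = shared_cache_transfers(ping_pong)
print("private caches:", private_count, "transfers; shared cache:", shared_count)
```

The exact numbers mean little, since the model is so crude, but the trend matches the argument above: with separate caches every hand-over of a modified line costs bus transfers, while with a shared cache the line is fetched once and then stays put.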
Of course, no improvement ever comes for free. Here, the processor die had to be made larger to unload the CPU bus and solve the coherency problem: the Smart Cache arbiter on the Core Duo die is about one third the size of a single execution core. As you can see, managing a shared cache is not easy. Because of its more sophisticated operating algorithm, the cache has also become slower, with 40% higher latency: the L2 cache latency of the Dothan-core Pentium M is 10 cycles, while that of the Core Duo is 14 cycles. Intel tried to make up for that by improving the data pre-fetch mechanism for the L2 cache, so there’s little difference between the Pentium M and the Core Duo in single-threaded applications under the same conditions.
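The latency figures above are easy to check. The short calculation below uses the two cycle counts quoted in the text; the 2.0 GHz clock is a hypothetical value chosen only to turn cycles into nanoseconds and does not refer to any particular model.

```python
PENTIUM_M_L2_CYCLES = 10  # Dothan figure quoted in the text
CORE_DUO_L2_CYCLES = 14   # Core Duo figure quoted in the text
CLOCK_GHZ = 2.0           # hypothetical clock speed, for illustration only

extra_cycles = CORE_DUO_L2_CYCLES - PENTIUM_M_L2_CYCLES
relative_increase = extra_cycles / PENTIUM_M_L2_CYCLES

# At f GHz one clock cycle lasts 1/f nanoseconds.
pentium_m_ns = PENTIUM_M_L2_CYCLES / CLOCK_GHZ
core_duo_ns = CORE_DUO_L2_CYCLES / CLOCK_GHZ

print(f"{extra_cycles} extra cycles, {relative_increase:.0%} higher latency")
print(f"{pentium_m_ns:.1f} ns vs {core_duo_ns:.1f} ns at {CLOCK_GHZ} GHz")
```

At that assumed clock speed the four extra cycles amount to 2 ns per L2 access, which is the penalty the improved pre-fetch has to hide.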