As you know, cache memory is high-speed static memory that stores frequently accessed data. Why is this type of memory necessary at all?
That’s because the clock rates of memory and processors, originally similar, diverged long ago. Modern processors run far faster than a cell of ordinary dynamic random-access memory, which is the basis of all mainstream memory types from EDO DRAM to DDR SDRAM, can be clocked, and this has been the case for a long time. Of course, memory manufacturers keep conquering new heights by improving their production technologies, but memory clock rates are still no match for CPU clock rates. The highest effective frequency of mass-produced server memory is 400MHz (200MHz doubled by the Double Data Rate technique, to be exact), while modern processors have already reached 3GHz and are headed higher.
Hence the need for an intermediary: fast memory the processor can access without waiting tens or hundreds of clock cycles, which stores frequently accessed data and the results of calculations.
Today, the cache subsystem of a processor is rather sophisticated and consists of several levels. Server processors typically have two or three levels of cache memory, while systems based on the ProFusion (Intel) or Summit (IBM) chipsets contain four (!) levels. This caching scheme exists so that the processor waits as little as possible for data from memory.
Cache memory proved even more important for multiprocessor systems. Such machines use large caches, which often yield a bigger performance gain than a much higher CPU frequency would. For example, the Xeon MP carries up to 2MB of L3 cache, while a system on the Summit chipset has up to 64MB of L4 cache! All this expensive static memory is amassed for one purpose: to keep the processor fed with work so that it doesn’t sit idle. A large cache also helps unload the system bus during peak loads. In other words, the more cache, the better. There is only one downside: a large cache noticeably increases the system cost.
Besides that, all multiprocessor systems, irrespective of their organization, share one inherent problem: data coherency. When manipulating data, the processors change the memory contents, and for the system to process information correctly, all other processors must know that a particular memory cell has changed. If no other processor needs this cell at the moment, there’s no problem: the processor simply writes the new value. But what if several processors work with the same memory cell? When one CPU alters the cell, the others must learn its new value; otherwise they will use stale data in their further work. So the cache coherency problem must be solved for a multiprocessor system to work correctly, and it is among the most difficult problems the developer of such a system faces.
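To see why this matters, here is a toy simulation (not real hardware, and the class name is mine) of two CPUs with private write-back caches and no coherency mechanism at all. Once CPU1 has cached a value, a later write by CPU0 never reaches it:

```python
class NoCoherencyCPU:
    """Toy model: a CPU with a private cache and NO coherency protocol."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory (a dict: address -> value)
        self.cache = {}        # private cache: address -> value

    def read(self, addr):
        # On a miss, fetch from main memory and keep a private copy.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Write-back policy: only the private copy is updated for now.
        self.cache[addr] = value

memory = {0x100: 0}
cpu0 = NoCoherencyCPU(memory)
cpu1 = NoCoherencyCPU(memory)

cpu1.read(0x100)         # CPU1 caches the old value, 0
cpu0.write(0x100, 42)    # CPU0 updates only its own cache
print(cpu1.read(0x100))  # CPU1 still sees the stale 0, not 42
```

The stale read at the end is exactly the incoherence a real multiprocessor must prevent.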
It is solved by introducing special coherency protocols, which the processor caches use to exchange information about the state of their data. Intel, for example, uses the MESI protocol, which describes four possible states of a cache line, while other manufacturers (AMD and IBM) use the more advanced MOESI protocol, which deals with five states per line. Either way, this exchange consumes part of the system bus bandwidth, but this is unavoidable, at least today.
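The core idea of MESI can be sketched in a few lines. This is a deliberately simplified model, not Intel’s actual hardware logic: each cache line is Modified, Exclusive, Shared, or Invalid, a read brings the line in as Exclusive or Shared depending on whether another cache holds it, and a write invalidates every other copy:

```python
# Simplified MESI sketch: the four line states and two transitions.
M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"

class MESICache:
    """One CPU's view of a single cache line, snooping a shared bus."""

    def __init__(self, bus):
        self.state = I       # the line starts out Invalid
        self.bus = bus       # list of all caches snooping this bus
        bus.append(self)

    def read(self):
        if self.state == I:
            others = [c for c in self.bus if c is not self and c.state != I]
            if others:
                # Another cache holds the line: everyone drops to Shared.
                for c in others:
                    c.state = S
                self.state = S
            else:
                # We are the only holder: Exclusive.
                self.state = E
        return self.state

    def write(self):
        # Broadcast an invalidation, then own the line as Modified.
        for c in self.bus:
            if c is not self:
                c.state = I
        self.state = M

bus = []
c0, c1 = MESICache(bus), MESICache(bus)
c0.read()   # c0 becomes Exclusive
c1.read()   # both become Shared
c0.write()  # c0 becomes Modified, c1 is invalidated
print(c0.state, c1.state)  # Modified Invalid
```

MOESI adds a fifth state, Owned, which lets a cache share a modified line with others without first writing it back to memory; the invalidation traffic seen here is the bus-bandwidth cost the text mentions.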