Curiously, the instruction cache in this architecture is bigger than the data cache, but this is often the case with RISC systems (for example, with the NexGen and the K5 processor, which was internally RISC). The simplicity of the instructions (which the engineers highly approve of) means that the code ends up considerably larger than an x86 code sequence of comparable functionality.
It is also curious that the instruction cache uses the simplest structure, direct-mapped, with each cache line 128 bytes long and made up of 4 sectors of 32 bytes each. One sector (32 bytes) can be either written into or read from the cache per clock cycle. The L1 Data cache is more interesting: two 8-byte segments can be read through two ports while 8 bytes are written through a third. All of this takes one clock cycle, without any stalls.
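To make the direct-mapped layout concrete, here is a rough sketch of how an address could be split into tag, line index, sector and byte offset. The 128-byte line and the 32-byte sectors come from the description above; the total cache capacity (`CACHE_BYTES`) is an assumed value used only for illustration, and `split_address` is a hypothetical helper name.

```python
LINE_BYTES = 128          # from the text: each cache line is 128 bytes
SECTOR_BYTES = 32         # from the text: four 32-byte sectors per line
CACHE_BYTES = 64 * 1024   # ASSUMPTION: total capacity, not stated in the text

NUM_LINES = CACHE_BYTES // LINE_BYTES  # number of line slots in the cache


def split_address(addr):
    """Split a byte address into (tag, line index, sector, byte within sector)."""
    offset_in_line = addr % LINE_BYTES
    sector = offset_in_line // SECTOR_BYTES
    byte_in_sector = offset_in_line % SECTOR_BYTES
    line_number = addr // LINE_BYTES
    index = line_number % NUM_LINES   # direct-mapped: exactly one possible slot
    tag = line_number // NUM_LINES    # distinguishes addresses sharing that slot
    return tag, index, sector, byte_in_sector


# Two addresses exactly one cache size apart land in the same slot
# with different tags, so each one evicts the other.
a = split_address(0x1234)
b = split_address(0x1234 + CACHE_BYTES)
print(a, b)
```

The sketch shows the price of direct mapping: any two pieces of code whose addresses differ by a multiple of the cache size compete for the same line, whatever else is free in the cache.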
The L2 cache also consists of 128-byte lines and is updated using an industry-standard replacement method: 7-bit pseudo-LRU.
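The text gives only "pseudo-LRU, 7 bit", so the following is a minimal sketch under an assumption: that the 7 bits form a binary tree over 8 ways in each set, which is the common tree-based pseudo-LRU arrangement (a binary tree over 8 leaves has exactly 7 internal nodes). The class name `TreePLRU8` and its methods are hypothetical, not taken from any IBM documentation.

```python
class TreePLRU8:
    """Tree pseudo-LRU for one cache set: 7 bits, 8 ways (assumed layout)."""

    def __init__(self):
        # Node i has children 2*i + 1 (left) and 2*i + 2 (right).
        # A bit value of 0 means "next victim is on the left", 1 means right.
        self.bits = [0] * 7

    def touch(self, way):
        """Record an access to `way` (0..7): point every node on its path away."""
        node = 0
        for level in (2, 1, 0):
            went_right = (way >> level) & 1
            self.bits[node] = 1 - went_right   # point to the other subtree
            node = 2 * node + 1 + went_right

    def victim(self):
        """Follow the bits from the root to pick the way to replace."""
        node = 0
        way = 0
        for _ in range(3):
            go_right = self.bits[node]
            way = (way << 1) | go_right
            node = 2 * node + 1 + go_right
        return way


plru = TreePLRU8()
plru.touch(0)
print(plru.victim())  # the victim now lies in the half of the set not just used
```

The appeal of this scheme is cost: true LRU over 8 ways needs to track a full ordering, while the tree needs only 7 bits per set and a three-step walk, at the price of occasionally evicting a line that is not the oldest one.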
As for the direct mapping of the L1 Instruction Cache, I can’t find the reason the engineers chose it. The engineering team at IBM may have thought this structure would provide an acceptable cache hit rate. Or they may simply have been simplifying the circuitry of the die. The predecessor die was not small (it was gigantic, to tell you the truth): it is no secret that a processor from an entirely different price range, the Power4, was the base for developing the PowerPC970. Yes, it was that core that, after some redesign, turned into a mainstream processor. The very idea was sensible, though. The Power4 was one of the performance leaders when it came out, and it remains quite competitive today. So it was much easier to use the available intellectual property, adapting it to the requirements of mass production (the die size of the Power4 is 417 sq. mm; the high manufacturing cost of such a monster would be unacceptable in mass production). They managed to reduce the cost of the die considerably: manufacturing a Power4 processor (which also includes the L3 cache, though) costs about $10,000, while the PowerPC970 costs a few hundred dollars (I couldn’t find the exact number, only the cost of the whole platform). The following illustration is taken from a presentation made by IBM; it shows that these two processors are truly close relatives:
So, they sacrificed the second core for the sake of clocking the first one at a higher frequency (the finer production technology and the longer pipeline also contributed to reaching higher frequencies). They added SIMD instructions to the remaining core, too. We will have a chance to discuss this set of features later; there is really a lot to discuss.
Right now, let’s get back to the microprocessor. What does it have inside?