General Principles of Nehalem Microarchitecture
Before we get acquainted with the promising Nehalem microarchitecture, we would like to say a few words about the reasons for its arrival. Although Intel has been working on it for a long time, the company hardly intends to announce CPUs based on it merely to keep to its own “Tick-Tock” schedule. It seems that even though the Core microarchitecture has been extremely successful, something about it no longer quite satisfies the microprocessor giant. The reasons are not obvious at first glance: Core processors have a lot of advantages, sell very well and are way ahead of the competitor’s solutions.
It turns out that the serious drawback of the Core microarchitecture that makes Intel unhappy is its non-modular design. As a continuation of the mobile Pentium M CPUs, Core 2 microprocessors were initially designed as dual-core semiconductor dies. When Intel later started making multi-core Core 2 and Xeon solutions, several drawbacks of this approach came to light. Quad-core and the recently released 6-core processors on the Core microarchitecture were simply composed of several dual-core dies that had difficulty communicating with one another. Cores on separate dies exchanged data through system memory, which resulted in serious delays caused by limited processor bus bandwidth.
Another bottleneck surfaced in multi-processor systems. Although Intel had already solved the problem of sharing the system bus between processors by launching new chipsets that provide an individual bus to each processor, the performance of these systems was often limited by insufficient memory bus bandwidth. We could see this problem even in the Skulltrail platform aimed at computer enthusiasts, not to mention high-performance workstations and servers.
In other words, increasing system performance by adding more cores to CPUs and more processors to systems would sooner or later have brought Intel to a dead end, despite the fact that the contemporary Core microarchitecture seemed very successful overall. That is why Intel is working hard to switch to the new Nehalem microarchitecture, which was designed first and foremost to solve the structural problems described above. Nehalem’s key features that immediately catch the eye are an integrated memory controller and a new bus with point-to-point topology called QuickPath Interconnect (QPI), which not only connects the processor with the chipset but also directly connects several CPUs with one another.
All these innovations are reminiscent of the structure of AMD processors: a few years ago AMD discovered the advantages of integrating the memory controller into the CPU and connecting the CPUs with one another in multi-processor systems. However, even though Intel is the one catching up here, Intel CPU cores have been offering higher performance since the launch of the Core microarchitecture.
Nehalem’s second important innovation is the modular CPU design. In fact, the actual microarchitecture consists only of a few building blocks that will be used to form a processor at the final design and production stage. This set of building blocks includes a processor core with an L2 cache, L3 cache, QPI bus controller, memory controller, graphics core, etc.
The appropriate blocks will be put together within a single semiconductor die and presented as a solution for a particular market segment. For example, the Bloomfield CPU we are going to discuss fairly soon consists of four cores, an L3 cache, a memory controller and one QPI bus controller.
Server processors with the same microarchitecture that should be announced in early 2009 will have up to eight cores, up to four QPI bus controllers for multi-processor systems, an L3 cache and a memory controller. The upcoming budget Nehalem processors scheduled to come out in H2 2009 will have two cores, a memory controller, a graphics core and a DMI bus controller connecting the processor directly to the South Bridge. These are far from the only possible modifications; we mention them to illustrate how flexible the Nehalem microarchitecture is.
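The building-block idea can be pictured as a small data model. The following Python sketch is only an illustration: the class name `NehalemConfig`, its field names and the labels "server" and "budget" are hypothetical, and the values are simply the ones quoted above (the server figures are upper limits).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NehalemConfig:
    """Hypothetical description of one die assembled from Nehalem blocks."""
    name: str
    cores: int                       # each core block carries its own L2 cache
    qpi_links: int = 0               # number of QPI bus controllers
    l3_cache: bool = False
    memory_controller: bool = True
    graphics_core: bool = False
    dmi_link: bool = False           # DMI bus controller to the South Bridge

    def blocks(self) -> List[str]:
        """Return the building blocks present on this die."""
        parts = [f"{self.cores} x core+L2"]
        if self.l3_cache:
            parts.append("L3 cache")
        if self.memory_controller:
            parts.append("memory controller")
        if self.qpi_links:
            parts.append(f"{self.qpi_links} x QPI controller")
        if self.graphics_core:
            parts.append("graphics core")
        if self.dmi_link:
            parts.append("DMI controller")
        return parts


# Configurations as described in the text.
BLOOMFIELD = NehalemConfig("Bloomfield", cores=4, qpi_links=1, l3_cache=True)
SERVER = NehalemConfig("server (upper limits)", cores=8, qpi_links=4, l3_cache=True)
# The text lists no L3 or QPI block for the budget part, so none is set here.
BUDGET = NehalemConfig("budget", cores=2, graphics_core=True, dmi_link=True)

for cfg in (BLOOMFIELD, SERVER, BUDGET):
    print(cfg.name, "->", ", ".join(cfg.blocks()))
```

The same set of block types describes all three products; only the counts and the optional blocks differ, which is the flexibility the modular design is meant to provide.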
New principles in platform and processor design are certainly a significant innovation, but far from the only one arriving with the new Intel microarchitecture. A lot of changes have been made to the main part of the CPU: its computational core. Although Nehalem processor cores may be regarded as merely enhanced Core microarchitecture cores, they still support a lot of new technologies and boast numerous improvements that give Nehalem processors higher “pure” performance. Among the important innovations we should mention SMT (Simultaneous Multi-Threading), which is similar to Hyper-Threading technology and allows a single core to process two computational threads simultaneously. We should also point out support for the new SSE4.2 instructions, more efficient branch prediction algorithms, larger internal buffers, and faster, more efficient cache memory.
Summing up, let’s once again list the major distinguishing features of the new CPUs from the Nehalem family:
- Two, four or eight cores;
- More advanced computational cores than those of Core-based processors;
- SMT technology that allows processing two computational threads simultaneously in a single core;
- Three levels of cache memory: 64KB L1 cache per core, 256KB L2 cache per core, up to 24MB shared L3 cache;
- Integrated memory controller supporting several DDR3 SDRAM channels;
- Monolithic design: the CPU consists of a single semiconductor die;
- 45nm production process;
- Option to integrate a graphics core into the CPU;
- New QPI bus with point-to-point topology connecting the processor with the chipset or the processors with one another;
- Modular structure.
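As a quick check on the figures in this list, the sketch below computes the number of hardware threads and the total on-die cache for a given core count. The function names are hypothetical, and the eight-core example pairs the maximum core count with the maximum 24MB L3 purely for illustration; the text does not say which model ships with which L3 size.

```python
# Per-core figures taken from the feature list above.
L1_KB_PER_CORE = 64
L2_KB_PER_CORE = 256
THREADS_PER_CORE = 2  # SMT: two threads per core


def hardware_threads(cores: int) -> int:
    """Threads the CPU can process simultaneously with SMT."""
    return cores * THREADS_PER_CORE


def total_cache_kb(cores: int, l3_mb: int) -> int:
    """Sum of all L1 and L2 caches plus the shared L3, in KB."""
    per_core_kb = L1_KB_PER_CORE + L2_KB_PER_CORE
    return cores * per_core_kb + l3_mb * 1024


# Illustrative upper bound: eight cores with a 24MB shared L3 (assumed pairing).
print(hardware_threads(8))    # 16
print(total_cache_kb(8, 24))  # 27136
```

Even in this upper-bound case the per-core L1 and L2 caches add up to only 2560KB, so the shared L3 accounts for the bulk of the on-die cache.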
Now that we have briefly discussed the general concept of the new microarchitecture, let’s take a closer look at the individual parts of the CPU based on it.