All these seemingly minor changes were prompted by the fact that the new processor cores support SMT and can process up to two computational threads simultaneously, which requires sharing resources between them. As a result, Intel used a few simple microarchitectural solutions to increase the efficiency of the CPU execution units, i.e. it raised CPU performance without any serious increase in power consumption.
I have to say that the return of SMT in Nehalem processors is one of the most significant innovations and the one likely to have the biggest positive effect on CPU performance. Pentium 4 processors, where the very same technology was introduced as Hyper-Threading, gained up to 10% in average performance from enabling it. New processors with the Nehalem microarchitecture should benefit even more from SMT. First, they have a memory subsystem with much higher bandwidth, which can keep two computational threads supplied with data far better. Second, Nehalem has a “wider” microarchitecture that can process more instructions simultaneously.
Here I have to say that Intel engineers didn’t have to make their processors significantly more complex in order to implement SMT in Nehalem, just as with Pentium 4 in its day. In fact, they only duplicated the processor registers and the Return Stack Buffer in the core. When SMT is enabled, all other resources are either shared dynamically between the two threads (for example, the Reservation Station or cache memory) or split evenly between them (for example, the Reorder Buffer).
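The three sharing policies described above can be sketched as a toy model. This is only an illustration and not a description of Intel's actual hardware logic: the `Resource` class, the policy names and the entry counts are assumptions introduced for the sketch, not figures taken from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    entries: int   # size of one copy of the structure (assumed value)
    policy: str    # "duplicated", "partitioned" or "dynamic"


def max_entries_per_thread(res: Resource, smt_enabled: bool) -> int:
    """Upper bound on the entries a single thread can occupy."""
    if not smt_enabled:
        return res.entries          # one thread owns everything
    if res.policy == "duplicated":
        return res.entries          # each thread has its own full copy
    if res.policy == "partitioned":
        return res.entries // 2     # fixed even split between two threads
    if res.policy == "dynamic":
        return res.entries          # a thread may grab entries the other leaves free
    raise ValueError(f"unknown policy: {res.policy}")


# Entry counts below are illustrative assumptions only.
RSB = Resource("Return Stack Buffer", 16, "duplicated")
ROB = Resource("Reorder Buffer", 128, "partitioned")
RS = Resource("Reservation Station", 36, "dynamic")

for res in (RSB, ROB, RS):
    print(res.name, max_entries_per_thread(res, smt_enabled=True))
```

The point of the model is the difference between the last two policies: an evenly split structure caps each thread at half its size even when the other thread is idle, while a dynamically shared one lets a single busy thread use all of it.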
By the way, as in Pentium 4 processors, enabling SMT in Nehalem makes the operating system see each physical processor core as a pair of logical cores. For example, software will see a quad-core Nehalem processor as an 8-core CPU.
However, bearing in mind that enabling SMT can sometimes lower performance, Intel engineers made sure that physical and logical cores can be easily distinguished and are not treated as fully equivalent. This way, software developers can decide for themselves how to distribute resources between them most efficiently.
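As a rough illustration of how software can tell logical cores apart, the sketch below groups logical processors by the physical core they belong to. It assumes text in the format of Linux's `/proc/cpuinfo` (the `processor`, `physical id` and `core id` fields); the function name `group_logical_cpus` and the sample text, which mimics a two-core, four-thread CPU, are hypothetical and not taken from any system tested here.

```python
from collections import defaultdict


def group_logical_cpus(cpuinfo_text: str) -> dict:
    """Map (physical id, core id) to the sorted logical processor numbers on that core."""
    cores = defaultdict(list)
    # Entries for individual logical processors are separated by blank lines.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        # Skip entries that lack the topology fields this sketch relies on.
        if not {"processor", "physical id", "core id"} <= fields.keys():
            continue
        core = (int(fields["physical id"]), int(fields["core id"]))
        cores[core].append(int(fields["processor"]))
    return {core: sorted(cpus) for core, cpus in cores.items()}


# Hypothetical sample: one package, two physical cores, two threads per core.
SAMPLE = """\
processor\t: 0
physical id\t: 0
core id\t\t: 0

processor\t: 1
physical id\t: 0
core id\t\t: 1

processor\t: 2
physical id\t: 0
core id\t\t: 0

processor\t: 3
physical id\t: 0
core id\t\t: 1
"""

topology = group_logical_cpus(SAMPLE)
print(len(topology), "physical cores,",
      sum(len(cpus) for cpus in topology.values()), "logical cores")
```

On a Linux machine the same function could be fed the contents of `/proc/cpuinfo`. A scheduler or application that wants to avoid placing two heavy threads on one physical core could then pick a single logical processor from each group.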
To illustrate how the changes described above affect system performance in practice, we compared Nehalem against Penryn in a few simple benchmarks from the SiSoftware Sandra 2009 suite. These results are especially valuable because they depend little on the memory subsystem parameters and hence allow us to draw conclusions about the efficiency of the processor microarchitectures discussed:
Indeed, the advantages of the new microarchitecture are evident with SMT enabled. The Sandra 2009 tests are optimized for multi-threading, so it is no wonder that enabling SMT improves Nehalem's results by 15-60%. However, if we compare the results of Nehalem and Penryn processors without SMT, the new processor is not always better than its predecessor. Everything depends on the type of workload, which indicates that no revolutionary or universal changes have been made to the new core.