Chapter VII: Hyper-Threading Technology
It looks like now is the perfect time to put in a word about the well-known Hyper-Threading technology (HT). The main idea behind this technology is very simple: all of the processor's execution units (or at least most of them) are almost never busy at once while program code is executing. As a rule, only about one third of the available computational resources is occupied (according to Intel's own estimates), which is quite a humbling figure, I should say.
So, it suddenly occurred to them: why don't we use the execution units that are free from the current program's code to execute some other program, or another thread within the same program? In fact, this is a very reasonable idea, and by no means a new one: Seymour Cray's CDC 6600 from the 1960s (although that was certainly not a single-die CPU) already shared one set of execution hardware among the threads of its peripheral processors, and the approach was later refined under the name Simultaneous Multi-Threading (SMT). So, I wouldn't say that Intel was highly original when they came up with HT technology. Nevertheless, we should give them due credit: Intel was the one to bring Hyper-Threading technology to the PC market.
How does this technology actually work? The basic idea seems to be pretty clear, but how did they implement the simultaneous processing of several tasks?
In fact, it was all done in a very simple and logical way. A CPU with Hyper-Threading technology is recognized by the operating system as two logical processors, so the OS can only take advantage of HT if it supports multi-processor configurations in the first place. Each logical processor is assigned its own "architectural status", defined by the values of its general-purpose registers, service flags and status registers. The architectural statuses of the two logical processors differ, and the logic managing them has to track the status of each processor separately. These are the blocks that had to be added to the CPU.
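You can see this "two logical processors" view for yourself: the operating system simply reports every logical processor it recognizes. A minimal Python check (this only shows the OS-visible count, not whether the extra processors come from HT specifically):

```python
import os

# Number of *logical* processors the OS reports. On an HT-enabled CPU
# this is twice the number of physical cores, because each core
# presents two architectural states to the operating system.
logical = os.cpu_count()
print(f"OS-visible logical processors: {logical}")
```

This is exactly why OS-level multi-processor support is a prerequisite: to the scheduler, the two logical processors are simply two targets to dispatch threads to.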
All in all, the CPU can be split into two parts: two sets of "architectural status", one per logical CPU, and a single core shared by both of them, which actually executes the commands. To put it simply, the units implementing Hyper-Threading support belong to the Front End block of the processor.
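The split described above can be sketched in a few lines of code. This is a toy model, not real hardware: the register and flag names are placeholders, and the "core" just applies operations, but it shows how two separate architectural states can share one execution engine:

```python
from dataclasses import dataclass, field

@dataclass
class ArchState:
    """Per-logical-processor 'architectural status' (toy version)."""
    registers: dict = field(default_factory=lambda: {"eax": 0, "ebx": 0})
    flags: dict = field(default_factory=lambda: {"zero": False})

class Core:
    """Single physical core shared by both logical processors."""
    def execute(self, state: ArchState, op: str, reg: str, value: int):
        # The core itself is stateless with respect to the program:
        # all results land in the architectural state it was handed.
        if op == "mov":
            state.registers[reg] = value
        elif op == "add":
            state.registers[reg] += value
        state.flags["zero"] = (state.registers[reg] == 0)

core = Core()
lp0, lp1 = ArchState(), ArchState()   # two logical processors, one core

core.execute(lp0, "mov", "eax", 5)    # instruction from logical CPU 0
core.execute(lp1, "mov", "eax", 7)    # instruction from logical CPU 1

print(lp0.registers["eax"], lp1.registers["eax"])  # → 5 7: states stay separate
```

The design point is that duplicating only the state (plus a few front-end buffers) is cheap, while the expensive execution hardware exists in a single copy.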
So, the following units had to be added to the processor to support Hyper-Threading technology (to be more exact, a number of already existing units had to be duplicated): Instruction Streaming Buffers, Instruction TLB, Trace Cache Next IP, Trace Cache Fill Buffers, Register Alias Tables. Note that all these extra units do not increase the die size that much: it gets less than 5% bigger. Take a look yourself:
This way, most of the units added to the CPU are special buffers that store the architectural status of the logical processors.
Now the working principle is very easy to understand: different programs (or different threads of the same program) are simply sent to different logical processors. The catch is that they are still executed by a single physical processor, so the biggest performance gain can only be obtained when the running programs (or threads) use different execution units.
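The effect of shared execution units is easy to reproduce with a cycle-by-cycle sketch. This is an illustrative model of the contention, not Intel's actual scheduler: each cycle, each logical processor may issue one instruction, but only if the unit it needs (here just "ALU" or "FPU") hasn't been claimed by the other logical processor that cycle:

```python
def run(threads):
    """threads: list of instruction streams, each a list of unit names.
    Returns the number of cycles until all streams finish."""
    queues = [list(t) for t in threads]
    cycles = 0
    while any(queues):
        cycles += 1
        busy = set()  # execution units claimed this cycle
        for q in queues:
            # Issue the next instruction only if its unit is still free.
            if q and q[0] not in busy:
                busy.add(q.pop(0))
    return cycles

# Threads needing *different* units overlap almost perfectly...
mixed = run([["ALU"] * 4, ["FPU"] * 4])
# ...while threads fighting over the *same* unit serialize completely.
same = run([["ALU"] * 4, ["ALU"] * 4])
print(mixed, same)  # → 4 8
```

The mixed workload finishes in 4 cycles, the same-unit workload in 8: exactly the "biggest gain when programs use different execution units" behavior described above.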