The execution units and the memory subsystem are shared between the two logical processors with virtually no restrictions: these units do not care which logical processor a given micro-operation belongs to.
The DTLB (the translation lookaside buffer that maps virtual addresses to physical ones, this time for data only) is also shared; each entry in it is tagged with the ID of the logical processor that owns it.
The caches of all levels are likewise shared by the two logical processors, without any restrictions: the origin of the requests is not tracked.
Of course, resource sharing is not free of problems. As we have already seen, competition between the two logical processors for the shared resources means each thread gets fewer resources than a single thread would have on its own. In particular, the Northwood processor has a relatively small L1 data cache of only 8KB. With Hyper Threading technology, the effective L1 data cache available to each thread is almost halved, to about 4KB. This, by the way, suggests to us that the improved Hyper Threading efficiency of the new Prescott core comes not only from the enhancements made to the technology itself, but also from the doubled L1 data cache, which certainly reduces the performance losses caused by an L1 data cache that is too small.
To tell the truth, all of us consider this idea pretty logical.
Anyway, since two threads are processed simultaneously, each individual thread generally runs slower. In particular, the main thread takes longer to complete if a background thread is executing at the same time. Nevertheless, in most cases both threads still finish sooner when executed simultaneously than they would if executed one after the other. In other words, the presence of a second thread slows the first one down, but overall performance still grows, because the total execution time for both threads is shorter than it would be if they ran one by one.
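The arithmetic here is easy to illustrate with a toy model. Suppose each thread alone takes time T, and Hyper Threading delivers a 25% combined-throughput gain; the 25% figure is our own hypothetical number chosen for the sketch, not a measured value:

```python
# Toy model of Hyper Threading throughput (illustrative numbers only).
T = 1.0                 # time for one thread running alone
ht_gain = 0.25          # assumed combined-throughput improvement (hypothetical)

sequential = 2 * T                    # run the two threads one after another
concurrent = 2 * T / (1 + ht_gain)    # both threads share the core

# Each thread, seen individually, finishes later than it would alone...
per_thread = concurrent
assert per_thread > T

# ...but both threads together finish sooner than back-to-back execution.
assert concurrent < sequential

print(f"sequential: {sequential:.2f}T, concurrent: {concurrent:.2f}T")
```

With these numbers, the pair of threads completes in 1.6T instead of 2T, even though each thread takes 1.6T rather than T, which is exactly the trade-off described above.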
Software optimization is a very important issue for Hyper Threading technology. Indeed, if the active thread requests data from memory but does not release the execution units while it waits, the second thread cannot continue its work, and the first thread has no data to proceed with either. Moreover, even a single thread can easily overload the queue of one scheduler, so that no command reaches any of the other schedulers even if they are completely idle. That is why the quality of code optimization is one of the major criteria of Hyper Threading efficiency.
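The benefit of overlapping a stalled thread with useful work can be mimicked at the OS level with ordinary software threads. In the sketch below, a time.sleep stands in for a long memory stall; this is a loose analogy we introduce purely for illustration, not how Hyper Threading itself operates. While one thread "waits on memory", the other keeps computing, so the total wall time stays close to the longer task rather than the sum of the two:

```python
import threading
import time

STALL = 0.2  # seconds; stands in for a long memory stall (illustrative)

results = {}

def memory_bound():
    # Pretend this thread is stalled, waiting for data from memory.
    time.sleep(STALL)
    results["mem"] = "data arrived"

def compute_bound():
    # Meanwhile this thread keeps the "execution units" busy.
    results["sum"] = sum(range(100_000))

start = time.perf_counter()
t1 = threading.Thread(target=memory_bound)
t2 = threading.Thread(target=compute_bound)
t1.start(); t2.start()
t1.join(); t2.join()
elapsed = time.perf_counter() - start

assert results["mem"] == "data arrived"
assert results["sum"] == sum(range(100_000))
# The compute work overlapped the stall instead of adding to it.
assert elapsed < 2 * STALL
```

The analogy also shows the failure mode described above: if the "stalled" thread refused to yield the shared resources, the second thread could not run and the overlap would be lost.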
Besides that, the operating system should work correctly with Hyper Threading technology. In the Windows family this means Windows XP (and newer OSes). Older operating systems such as Windows 2000 cannot distinguish between physical and logical processors, although they can still run on such configurations.
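A program can at least see how many logical processors the OS exposes. Python's standard library reports this count; note that os.cpu_count() returns logical CPUs, so on a Hyper Threading machine it typically shows twice the number of physical cores, and by itself it cannot tell a physical processor from a logical one:

```python
import os

# Number of logical processors the OS reports (may be None on exotic systems).
logical = os.cpu_count()
print(f"Logical processors visible to the OS: {logical}")

# On a single Hyper Threading CPU of that era this would typically report 2:
# one physical core exposed as two logical processors.
assert logical is None or logical >= 1
```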
Let’s sum up a few things now. Hyper Threading is based on the idea of increasing processor efficiency by sharing the processor’s execution resources between different logical CPUs, thus reducing their idle time. Hyper Threading can provide from 0% to 30% performance improvement, and in some cases can cause slight performance drops. The efficiency of Hyper Threading technology depends a lot on the quality of software optimization. It cannot replace dual-processor configurations, but a CPU supporting Hyper Threading does not cost the user much extra (the die size grows only slightly).
From the technological point of view, Hyper Threading is not too complicated to implement. However, the logic managing the entire processor structure has to work in a much more complex way. Moreover, without Hyper Threading the CPU might have looked completely different.
All in all, Hyper Threading technology should be considered a worthy addition to the NetBurst micro-architecture. In most cases it does improve the processor performance, which is the first criterion of effectiveness for any innovation.
To be continued!