We have known for some time that Hyper-Threading technology in the new processors on the Prescott core would undergo certain improvements. However, there has been a lot of speculation about what exactly these improvements would be. Some people thought that Prescott would be recognized by the operating system as four logical CPUs; others expected Prescott to cope better with situations where one thread blocks the execution of another. However, neither of these actually happened.
The Pentium 4 processor on the Prescott core is recognized by the operating system as two logical CPUs. And as our practical experiments show, blocked threads can slow this CPU down even with Hyper-Threading technology enabled.
Let me tell you a bit more about our experiments. To test the Hyper-Threading technology of the Prescott processor in real conditions, we developed a small program that creates two threads in the system. The first thread simply adds integers and, when it finishes, raises a flag indicating that the task has been completed. The second thread is an empty spin-wait loop, which ends only when the first thread raises the completion flag.
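To make the setup concrete, here is a minimal sketch of what such a test could look like in portable C++. It is not our actual test program: the names `done`, `add_integers`, `spin_wait` and `time_run` are illustrative, and the iteration count is left to the caller.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Completion flag shared by the two threads (illustrative name).
std::atomic<bool> done{false};

// First thread's job: add integers, then raise the completion flag.
std::uint64_t add_integers(std::uint64_t count) {
    // volatile keeps the compiler from collapsing the loop into a formula.
    volatile std::uint64_t sum = 0;
    for (std::uint64_t i = 1; i <= count; ++i) {
        sum = sum + i;
    }
    done.store(true, std::memory_order_release);
    return sum;
}

// Second thread's job: an empty spin-wait loop that ends when the flag is raised.
void spin_wait() {
    while (!done.load(std::memory_order_acquire)) {
        // intentionally empty
    }
}

// Times the calculation in milliseconds, alone or alongside the spinning thread.
double time_run(std::uint64_t count, bool with_spinner) {
    done.store(false, std::memory_order_relaxed);
    const auto start = std::chrono::steady_clock::now();
    std::thread spinner;
    if (with_spinner) {
        spinner = std::thread(spin_wait);
    }
    add_integers(count);
    if (spinner.joinable()) {
        spinner.join();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Comparing `time_run(n, false)` with `time_run(n, true)` gives the "alone" and "with spinner" timings. Note that this sketch leaves thread placement to the operating system; to reproduce the Hyper-Threading effect, both threads would have to land on the two logical CPUs of the same physical core.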
When we ran this fairly simple program on Northwood-based processors, the results turned out to be very poor. The empty loop occupies the execution units of the CPU, which immediately slows down the calculation thread.
Just take a look at the results. First come the results for the Pentium 4 (Northwood) running at 3.2GHz with Hyper-Threading technology enabled and disabled:
Northwood, HT enabled
Northwood, HT disabled
The first number produced by our test is the time it takes to complete the calculations when this thread is running alone. The second number is the time it takes to complete the calculations when a second thread (the empty spin-wait loop I have already mentioned) is running in parallel with the first one. As we see, Hyper-Threading technology harms the performance a lot. The test takes 2.5 times as long to complete when Hyper-Threading is enabled: the empty loop blocks the processor execution units, hindering the processing of the tasks in the first thread. Even though our test is purely synthetic, situations like this do sometimes occur in multi-threaded applications. We have simply modeled a case where Hyper-Threading harms the processor performance.
The third and the fourth numbers produced by our test indicate the time it takes to run both threads when we apply different optimizations that prevent the threads from blocking one another. In one case we used the PAUSE instruction in the wait loop, and in the other we used a special synchronization object of the Windows operating system.
Now let’s see how Prescott copes with our tricky test:
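The two workarounds can be sketched as follows. This is an illustration under assumptions, not the code we ran: the first variant issues PAUSE through the `_mm_pause()` intrinsic on x86 builds, and the second uses a standard C++ mutex and condition variable as a portable stand-in for the Windows synchronization object. The helper names `cpu_relax`, `run_with_pause` and `run_with_blocking_wait` are hypothetical.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
inline void cpu_relax() { _mm_pause(); }  // emits the PAUSE instruction
#else
inline void cpu_relax() { std::this_thread::yield(); }  // fallback for non-x86 builds
#endif

// Stand-in for the calculation thread's work: add integers 1..count.
std::uint64_t add_integers(std::uint64_t count) {
    volatile std::uint64_t sum = 0;  // volatile keeps the loop from being folded away
    for (std::uint64_t i = 1; i <= count; ++i) {
        sum = sum + i;
    }
    return sum;
}

// Variant 1: the waiter still spins, but executes PAUSE on every iteration,
// which hints to the CPU that this is a spin-wait loop.
std::uint64_t run_with_pause(std::uint64_t count) {
    std::atomic<bool> done{false};
    std::thread waiter([&done] {
        while (!done.load(std::memory_order_acquire)) {
            cpu_relax();
        }
    });
    const std::uint64_t sum = add_integers(count);
    done.store(true, std::memory_order_release);
    waiter.join();
    return sum;
}

// Variant 2: the waiter blocks in the OS until it is signalled,
// so it does not compete for execution units while waiting.
std::uint64_t run_with_blocking_wait(std::uint64_t count) {
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return done; });
    });
    const std::uint64_t sum = add_integers(count);
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_one();
    waiter.join();
    return sum;
}
```

In both variants the waiting thread gives the calculation thread more room: PAUSE reduces the pressure of the spin loop on the shared core, while a blocking wait removes the waiter from the CPU altogether until the work is done.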
Prescott, HT enabled
Prescott, HT disabled
As we see, the situation doesn’t actually change. Just like in the previous cases, our test reveals all the bottlenecks of Hyper-Threading technology that we have already seen with Northwood.
However, I have to admit that some improvements to the technology have still been made in the Prescott core. Firstly, the SSE3 instruction set introduced in the new 90nm processor offers software developers some new opportunities for better thread synchronization (we will talk about it in a special section of today’s article). Secondly, Prescott has learned to run in parallel certain operations that Northwood could only perform one at a time. Without going deep into details, I would like to say that this concerns several threads working with the cache memory simultaneously.