Chapter II: Goals and Prospects of Processor Micro-Architecture Development
When we were working on our coverage of the K8 micro-architecture, we concluded that the best approach was not to repeat what the developer says about its micro-architecture, but to explain its peculiarities and features in a simple, easy-to-understand way. Moreover, we found it really appealing to focus on the unclear statements in the whitepapers, and to find and bring up matters that were omitted there entirely. Of course, you may find some things overlapping with the whitepapers you might have already seen, because any detailed micro-architecture study inevitably has to be based on the official documents about it.
I would also like to stress that we deliberately chose to publish this article now, when many details about the Pentium 4 micro-architecture are no longer new to any of us. In fact, they are so far from new that almost every IT editor is ready to explain to you the “disadvantages of a longer pipeline for branch prediction” without even interrupting his lunch. You probably wonder why we delayed this article for so long. Let me explain. Firstly, a few things about the functioning of the Pentium 4 processor turned out to be far from simple. Secondly, despite sharing the same “Pentium 4” marketing name, the Northwood and Prescott cores behave very differently from one another. As a result, we very often had to investigate the Prescott core and the Northwood core separately. Thirdly, we wanted to offer you a thorough and well-constructed analysis of the micro-architecture details, so that those of you who are really into it could take their time and enjoy a discussion at a completely different level of technical detail.
Moreover, once the Tejas project was cancelled, it became evident that Prescott was the last serious modification of the Pentium 4 core. Of course, Intel continued to increase the system bus frequency and the size of the L2 cache, and enabled EM64T support, but these are nothing but external changes: the core will not undergo any further modifications. That is why it makes perfect sense to sum things up right now and finally answer the question: what do we know about this micro-architecture? Can we explain the peculiarities of the Pentium 4 with something more essential than the boring phrases about “a longer pipeline”? Can we finally answer simple questions like “what is the actual length of the Pentium 4 pipeline?”
Well, let’s see what our investigations led to in the long run. And since the search for an answer has very often been more interesting than the answer itself, we would like to tell you a few exciting stories about that search, especially since we will have to introduce quite a few specific terms and notions for a better understanding of the topic.
Before we move on to the actual micro-architecture discussion, let me remind you of a few interesting things. Very important things.
It is no secret that the performance of any microprocessor architecture can be represented as the processor frequency multiplied by the number of operations per clock cycle. So, the larger each of the two factors, the larger the product. Ideally, it would be best to increase the working frequency and the number of operations performed per clock cycle at the same time. In real life, however, the situation is much more complicated, even though frequency growth does not directly contradict growth in per-clock performance. To perform more operations per clock, we need to introduce special design changes, carry out complex analysis of the dependencies between instructions, and cope with the fact that many algorithms cannot be split into parts that work in parallel.
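The relationship above can be put into a few lines of Python. This is only an illustrative sketch: the function name `performance` and all the numbers are our own assumptions for the example, not measurements of any real processor.

```python
def performance(frequency_hz: float, ops_per_clock: float) -> float:
    """Operations per second = clock frequency x operations per clock."""
    return frequency_hz * ops_per_clock


# Two hypothetical designs (illustrative figures, not real CPU data):
high_clock = performance(3.0e9, 1.0)  # higher frequency, less work per clock
wide_core = performance(2.0e9, 1.5)   # lower frequency, more work per clock

# Both designs reach the same product by trading one factor for the other.
print(high_clock, wide_core)
```

The point of the sketch is simply that the two factors are interchangeable in the product: a design can chase frequency, per-clock work, or both, and only the result of the multiplication matters for this simplified model.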