Well, unexpected challenges have never scared us away, I should say. So, our entire team got down to developing and checking various hypotheses. Unfortunately, for all their brilliance, these hypotheses didn’t help, and after a while we gave up our desperate attempts to explain the situation described above in a reasonable way using the whitepaper data we had at our disposal.
It “just worked” this way. But we weren’t going to buy that.
Besides digging through a whole bunch of technical data, we also tried other “traditional” ways of obtaining information: blackmailing, flattering and bribing top Intel executives :) (Just kidding!) No luck. None of these time-tested methods worked. So, we had to start thinking really hard and even write special testing software ourselves.
Besides, we thoroughly searched all the available documentation looking for at least a hint of a possible explanation. And we managed to find this hint (a very brief one, I should say). It was the word “replay”, which we came across several times in the IA-32 Intel® Architecture Optimization Reference Manual. Moreover, this word was also mentioned a few times on some slides in Intel’s early Pentium 4 presentations.
Since the documentation on the Pentium 4 architecture didn’t contain any additional information about this mysterious “replay”, we started looking for its description in Intel’s patents.
Here I would like to omit all the details of our endless search, but I have to admit that the idea of carefully studying Intel’s patents turned out to be very fruitful in the long run. It is there that we found a pretty abstract description of a special system for re-processing micro-operations, a.k.a. replay. This system is intended for the repeated execution of micro-operations that have already left the scheduler but were executed incorrectly for some reason.
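To make the idea more tangible, here is a deliberately simplified toy model of such a loop. It is only a sketch under our own assumptions, not Intel’s actual mechanism: the names `MicroOp`, `ready_at` and `run_with_replay` are hypothetical, and the rule that a micro-operation counts as “executed incorrectly” when it runs before its input data is valid is just one plausible reason chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MicroOp:
    name: str
    ready_at: int        # hypothetical: first cycle at which its input data is valid
    executions: int = 0  # how many times this micro-op has been executed


def run_with_replay(uops):
    """Execute one micro-op per cycle; re-execute those that ran incorrectly.

    Returns a log of (cycle, name, executed_correctly) tuples.
    """
    in_flight = deque(uops)  # micro-ops that have already left the scheduler
    log = []
    cycle = 0
    while in_flight:
        uop = in_flight.popleft()
        uop.executions += 1
        ok = cycle >= uop.ready_at  # assumed correctness check
        log.append((cycle, uop.name, ok))
        if not ok:
            # Executed incorrectly: send it around again for repeated
            # execution instead of returning it to the scheduler.
            in_flight.append(uop)
        cycle += 1
    return log


uops = [MicroOp("load", 0), MicroOp("add", 3), MicroOp("store", 0)]
for event in run_with_replay(uops):
    print(event)
```

In this toy run the `add` micro-operation is executed twice: once too early, and once more after being sent around again. So three micro-operations cost four execution slots, which is the kind of gap between theoretical and actual throughput we were trying to explain.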
This way, we uncovered the details of a mysterious subsystem of the Pentium 4 processor, one that has hardly been described in any Pentium 4 documentation or in any articles and reviews devoted to this processor’s micro-architecture. Well, this was a very unexpected windfall! Especially keeping in mind that we hadn’t planned on going really that deep into details at this point.
Of course, we were very curious to find out how this system affects processor performance. Moreover, it is always a challenge to investigate the features and peculiarities of a subsystem that hasn’t been known to the general public before. And, in any case, we would finally be able to explain the weird behavior of our CPU.
So, we were really excited about getting to grips with this mysterious replay thing. And I have to confess that we managed to achieve some really impressive results. I would like to stress that this article is the world’s first detailed discussion of the replay feature and the peculiarities of its functioning. Of course, our first assumption was that replay would be the key to the deviation we had found between the actual processor performance and the theoretical one.
Later we arrived at the following conclusion: it looks like we found much more than we were actually looking for. Replay turned out to be a very interesting, little-known feature of the Pentium 4 micro-architecture that clarifies some of its mysteries and also affects the processor performance.
Moreover, while investigating replay, we also had to look into another very rarely mentioned subsystem called the internal event counter. However, since this subsystem is not directly connected with the topic of today’s article and may be of interest only to some technical specialists, we decided to provide all the details about it in Appendix 3. If you would like to skip the additional details discussed in Appendix 3, you can go directly to Chapter IX now.