

Replay: Unknown Features of the NetBurst Core (page 13)


Category: CPU

by Victor Kartunov, Yury Malich, Jan Keruchenko aka C@t, and Vadim Levchenko aka VLev

[ 06/06/2005 | 04:20 PM ]



If we consider our chain of commands with “holes”, the solution might seem quite simple: prevent the scheduler queues from sending out new commands for a while. This looks natural, because the issue arises from the fact that the scheduler has no idea what is going on in the RL and keeps sending new instructions there. However, this method is not the best way to resolve livelocks: it is just the opposite of what should be done in that case. All we need is to somehow fit an instruction stuck in the scheduler into the RL, whereas the attempts discussed above result in the complete isolation of this instruction. Besides, if the chain keeps growing, there will be at least 5 additional instructions in the replay system, which is not what we would expect to see.

Well, there is one more option left to check: locking the scheduler input. Since revealing all the details would take another huge article, we will only provide the test results and the conclusions drawn from them.

We managed to find out that there is a moment when only 8 instructions can be sent for execution (with further replay), which exactly matches the depth of the schQ FAST_0 scheduler queue.

The conclusion is evident: we are dealing with the locking of the scheduler input. This is a very important finding, as this mechanism is a logical basis for a unified system that resolves livelocks as well as “long chain” issues. The tests show that once the scheduler input is closed, not only the schQ FAST_0 queue is locked, but all the other schedulers as well. We can see this from the limited number of instructions that get into replay from these schedulers once their input has been locked. The price of this stall can be roughly estimated at 15-35 clock cycles.
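The link between the queue depth and the observed limit of 8 instructions can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not a description of the real hardware: the function name, the one-uop-per-clock fill rate and the rule that nothing is dispatched before the lock are all hypothetical.

```python
from collections import deque

QUEUE_DEPTH = 8  # depth of the schQ FAST_0 queue, as reported in the text


def dispatched_after_lock(incoming, lock_at, depth=QUEUE_DEPTH):
    """Count the uops that can still leave the queue after its input is locked.

    incoming: number of uops the front end would like to insert
    lock_at:  clock at which the scheduler input is closed
    Assumed toy rules: one uop is offered per clock while the input is open,
    and the queued uops are not dispatched before the lock.
    """
    queue = deque()
    offered = 0
    # Phase 1: the input is open, so the queue fills up to its depth.
    for _ in range(lock_at):
        if offered < incoming and len(queue) < depth:
            queue.append(offered)
            offered += 1
    # Phase 2: the input is locked, only the queued uops can be dispatched.
    dispatched = 0
    while queue:
        queue.popleft()
        dispatched += 1
    return dispatched
```

In this model, no matter how many instructions are waiting upstream, the number that can be sent for execution after the lock never exceeds the queue depth, which is the kind of upper bound the tests revealed.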

The system presumably watches, for a while, the share of replayed instructions, new instructions and free positions (let’s call them patterns). However, what we know so far does not allow us to fully reproduce the exact algorithm that decides when to stall. Our experiments show that there is a serializing event that serves as a starting point. That is why, if we take the test code of our example as is but slightly modify the calling procedure, the maximum number of replayed instructions may turn out different.
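To make the notion of a “pattern” more concrete, here is a minimal sketch that computes the shares of replayed instructions, new instructions and free positions over a sliding window of pipeline slots. The window length, the slot labels and the function name are assumptions made for illustration. The actual stall criterion is unknown, so the sketch deliberately stops at computing the shares.

```python
from collections import Counter, deque

SLOT_KINDS = ("replayed", "new", "free")


def slot_shares(slots, window=16):
    """Return the share of each slot kind among the last `window` slots.

    slots: iterable of labels, each one of SLOT_KINDS
    window: hypothetical observation length, in pipeline slots
    """
    recent = deque(slots, maxlen=window)  # keeps only the newest slots
    if not recent:
        return {kind: 0.0 for kind in SLOT_KINDS}
    counts = Counter(recent)
    return {kind: counts[kind] / len(recent) for kind in SLOT_KINDS}
```

A decision rule of some kind would then be applied to these shares, but which thresholds the hardware uses, if any, cannot be derived from our measurements.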

Note that long chains are not stalled globally. For example, a chain of dependent shifts will never be stopped. The same is true for many other instructions. In other words, not stopping is more the rule than the exception. Our model “chains with holes” are always stopped when they get into RL-7, but in some cases, when they get into RL-12, the “holes” keep looping infinitely. The looping of our model chains makes the pipeline state perfectly identical every 14 clocks, which cannot happen in real-life program code. So we get the impression that we are dealing not with a mechanism designed to resolve this particular situation, but with a system initially intended for other tasks, one that gets involved here only under specific favorable conditions.

We believe that this mechanism is intended primarily for resolving livelocks. If you have been reading our article carefully, you might ask: how can a blocked scheduler input help get out of a livelock? We think it works like this: when the scheduler input is closed, the chain of commands circling in the replay loop is sent to some special buffer, and the free positions in the pipeline get occupied by the micro-operations left in the scheduler. These micro-operations may be executed successfully, thus resolving the livelock.
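The hypothesis above can be sketched as a toy simulation. Everything in it is an assumption made for illustration: the replay loop is modeled as a ring of slots, replayed uops always have priority over the scheduler, all of them wait for a single uop "P" stuck in the scheduler, and the ring length of 4 is arbitrary. It is not meant to reproduce the real NetBurst logic.

```python
def simulate(ring_len=4, park_chain=False, max_clocks=100):
    """Return the clock at which every uop has completed, or None if
    the model is still looping after max_clocks (a livelock)."""
    ring = [f"dep{i}" for i in range(ring_len)]  # replaying uops, all wait for "P"
    scheduler = ["P"]   # the uop stuck in the scheduler
    buffer = []         # hypothetical special buffer for the parked chain
    done = set()

    if park_chain:
        # Assumed behavior: the circling chain is moved out of the loop.
        buffer, ring = ring, [None] * ring_len

    for clock in range(max_clocks):
        slot = clock % ring_len          # slot passing the dispatch point
        uop = ring[slot]
        if uop is not None:
            # A replayed uop has priority over the scheduler.
            if "P" in done:
                done.add(uop)            # its source is ready: it completes
                ring[slot] = None
            # Otherwise it replays again and the slot stays occupied.
        elif scheduler:
            done.add(scheduler.pop(0))   # free slot: "P" executes successfully
        elif buffer:
            ring[slot] = buffer.pop(0)   # a parked uop re-enters the loop
        if not scheduler and not buffer and all(s is None for s in ring):
            return clock
    return None
```

With `park_chain=False` the ring never offers a free slot, so "P" is never dispatched and the function returns `None`. With `park_chain=True` the stuck uop gets a free position, executes, and the parked chain then completes on its next pass through the loop.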
