
Replay: Unknown Features of the NetBurst Core (page 12)


Category: CPU

by Victor Kartunov, Yury Malich, Jan Keruchenko aka C@t, and Vadim Levchenko aka VLev

[ 06/06/2005 | 04:20 PM ]



Forced Replay Quit Mechanism

As we have just seen, we need additional logic capable of resolving hopeless conflicts like this one if we want the processor to keep working at least at a minimally acceptable level. To be more exact, we need two mechanisms. The first, a “watching mechanism”, detects problems such as livelocks; the second, an “acting mechanism”, pushes the system out of trouble. If these mechanisms exist, they should evidently operate not only in completely hopeless situations such as livelocks, but also in milder cases. Moving from theory to practice, we decided to test the system with a chain containing a “hole”, gradually increasing the number of commands in it.

      {xor eax,eax}
256 * {and eax,eax}
      {mov eax,[eax+esi]}   // L1 miss, L2 hit
      {and eax,eax}
      {add eax,eax}         // “hole”
  N * {and eax,eax}
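For readers who want to reproduce the experiment, the listing above can be expanded into a flat instruction sequence for any N. The following is only a minimal sketch: the function name `build_chain`, the `warmup` parameter and the NASM-like comment syntax are assumptions made for illustration, not part of the original test harness.

```python
def build_chain(n, warmup=256):
    """Expand the test chain into a list of assembly lines.

    n      -- number of trailing AND commands after the "hole" (N in the listing)
    warmup -- number of leading AND commands (256 in the listing)
    Both parameter names are hypothetical; only the instruction order
    is taken from the listing.
    """
    if n < 0 or warmup < 0:
        raise ValueError("instruction counts must be non-negative")
    lines = ["xor eax, eax"]
    lines += ["and eax, eax"] * warmup          # leading dependency chain
    lines.append("mov eax, [eax+esi]  ; L1 miss, L2 hit")
    lines.append("and eax, eax")
    lines.append("add eax, eax        ; the hole")
    lines += ["and eax, eax"] * n               # trailing chain of length N
    return lines


if __name__ == "__main__":
    # Print the chain for a small N to inspect its shape.
    print("\n".join(build_chain(3, warmup=2)))
```

Sweeping `n` over the range of interest and timing each generated chain would give one data point per chain length, which is how a plot like Pic.9a is built up.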

Let’s take a look at the diagram showing the difference (in clock cycles) between the execution latencies in the “L1 miss, L2 hit” and “L1 hit” cases as a function of the chain length (Pic.9a). The system behaves quite predictably while the chain of commands is relatively short: our rough latency estimates are close to the theoretical values. As we increase the number of instructions in the dependency chain of AND commands, we increase the number of replay loops: after every 14 commands the chain latency jumps up by 7 clocks. But when we reach N=82 we suddenly see an unexpected delay of about 22 clock cycles, which is hard to explain at first sight. To find out what caused this delay, we turned to the Pentium 4 performance counters, which include a special group of counters that track replay events. We analyzed the counters for the number of instructions sent to replay and found that, starting with N=87, the instructions stop going to replay (Pic.9b).


Pic.9a: Relative execution time for the chain of commands
with one “hole” depending on the length of this chain.


Pic.9b: The number of replayed commands from the chain
with a “hole” depending on the length of this chain.
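The regular part of this behaviour, a 7-clock jump after every 14 added commands, can be written down as a simple step function. This is a toy model built only from those two figures in the text; the name `extra_latency`, the `base` offset and the assumption that the first step falls exactly at N=14 are illustrative, and the model deliberately ignores the anomaly near N=82.

```python
REPLAY_STEP_CLOCKS = 7    # latency jump per step, as stated in the text
COMMANDS_PER_STEP = 14    # chain growth that triggers one more step


def extra_latency(n, base=0):
    """Toy estimate of the added latency (in clocks) for a trailing chain of n ANDs.

    Assumes the steps are aligned to multiples of 14, which the text
    does not state; `base` is a hypothetical constant offset.
    """
    if n < 0:
        raise ValueError("chain length must be non-negative")
    return base + REPLAY_STEP_CLOCKS * (n // COMMANDS_PER_STEP)


if __name__ == "__main__":
    for n in (0, 13, 14, 28, 70):
        print(n, extra_latency(n))
```

Averaged over many steps, 7 clocks per 14 commands is 0.5 clocks of added latency per added command. A measured curve that departs from this staircase, as it does around N=82, is the signal that something other than ordinary replay looping is at work.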

The delay must somehow be connected with the forced exit from replay. But what mechanism stops the chain?

Suppose that the process stalls naturally, i.e. without any external influence, for example because the processor’s internal resources become overloaded. The question is which resources these would be, because the main ones (ROB, internal register pool, etc.) have quite a large reserve that lets the CPU survive much heavier stress. We will not rule out this possibility completely for now, but will keep in mind that it cannot be the solution for a livelock: in that case the problem would have to be solved in some other way.

Let’s approach the issue from the other end. The simplest and most radical way to resolve it is to clear the pipeline. In that case we would expect the counters to detect the replayed chain of instructions immediately; however, the experiments showed nothing of the kind. This is hardly surprising: clearing the pipeline is a rather crude measure here.
