
Replay: Unknown Features of the NetBurst Core (page 3)


Category: CPU

by Victor Kartunov, Yury Malich, Jan Keruchenko aka C@t, and Vadim Levchenko aka VLev

[ 06/06/2005 | 04:20 PM ]



First Look

Instructions can be executed incorrectly for several reasons. Besides dependence on the results of previous instructions, the following external conditions can be listed: an L1 cache miss, incorrect store-to-load forwarding, hidden data dependencies, etc.

Let’s find out what the replay system looks like (Pic.3a, Pic.3b). The scheduler output is connected to the Replay Mux. From the Replay Mux the operations are sent into two pipelines. The first is the main pipeline, which delivers the commands to the execution units. The second belongs to the replay system itself and consists of empty stages that do no actual work; the number of these stages up to the Check stage is the same as the number of stages in the main pipeline. The second pipeline receives exact copies of the operations traveling along the first one.


Pic.3a


Pic.3b

The operations travel along both pipelines in parallel until they reach the Check stage. Here the Checker unit verifies whether the operation in the main pipeline has been executed correctly. If everything is all right, the operations retire (Pic.3a). If the result turns out to be incorrect for some reason (for example, an L1 miss signal was received), the chain of operations from the second pipeline is sent back to the Replay Mux through a replay loop (Pic.3b). At the same time (in the case of an L1 cache miss, for instance) a request is sent to the next cache level (the L2 cache). The replay loop may contain additional “fictitious” stages (marked as STG. E and STG. F in the picture). The number of these stages is chosen so that the total delay of one complete loop is just long enough for the data to arrive from the next cache level (for example, the L2 cache latency, i.e. 7 clock cycles).
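The path of a single operation through this scheme can be traced with a toy model. This is only a sketch: the function name trace_replay, the stage counts (4 pipeline stages up to Check, 3 extra loop stages, 7 cycles per loop in total) and the rule "the check passes once the data is available" are assumptions made for illustration, not figures taken from the actual NetBurst design.

```python
def trace_replay(data_ready_cycle, pipe_depth=4, loop_stages=3, max_passes=10):
    """Follow one operation from the Replay Mux until it retires.

    All parameters are hypothetical. data_ready_cycle is the cycle at which
    the data the operation needs becomes available. Returns a list of
    (cycle, event) tuples.
    """
    events = []
    cycle = 0  # cycle at which the operation first leaves the Replay Mux
    for _ in range(max_passes):
        events.append((cycle, "mux"))
        # The main pipeline and the replay pipeline have the same depth, so the
        # executed operation and its copy reach the Check stage together.
        cycle += pipe_depth
        if data_ready_cycle <= cycle:
            events.append((cycle, "retire"))
            return events
        events.append((cycle, "replay"))
        # The copy goes through the extra loop stages back to the Replay Mux.
        cycle += loop_stages
    raise RuntimeError("operation did not retire within max_passes")


print(trace_replay(0))   # data already available: one pass, no replay
print(trace_replay(11))  # data arrives later: one trip around the replay loop
```

In the second call the operation fails the check at cycle 4, re-enters the mux at cycle 7 and retires at cycle 11, i.e. exactly one 7-cycle loop later than in the first call.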

By the time the returned command is expected to arrive at the Replay Mux, the Checker unit sends a special signal (the stop signal) to the scheduler so that the scheduler reserves a free slot in the next clock cycle. The Replay Mux inserts the command returned for repeated execution into this slot. All commands that depend on the incorrectly executed operation are also returned for re-execution. Note that the distance between these commands equals a fixed number of stages.
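The slot reservation described above can be illustrated as follows. The sketch assumes one issue slot per clock cycle and treats the stop signal simply as "a replayed command arrives in this cycle"; the function mux_output and its arguments are hypothetical names, not part of any real interface.

```python
from collections import deque


def mux_output(new_ops, replay_arrivals, n_cycles):
    """Return the command leaving the Replay Mux in each cycle.

    new_ops: commands the scheduler wants to issue, in order.
    replay_arrivals: dict mapping a cycle to the command that comes back
        from the replay loop in that cycle.
    """
    pending = deque(new_ops)
    chosen = []
    for cycle in range(n_cycles):
        if cycle in replay_arrivals:
            # Stop signal: the scheduler leaves this slot free and the
            # returned command takes it.
            chosen.append(replay_arrivals[cycle])
        elif pending:
            chosen.append(pending.popleft())
        else:
            chosen.append(None)  # nothing to issue in this cycle
    return chosen


print(mux_output(["a", "b", "c"], {1: "r1", 2: "r2"}, 5))
# → ['a', 'r1', 'r2', 'b', 'c']
```

The returned commands r1 and r2 re-enter the pipeline one cycle apart, the same spacing they had on their first pass, while the new commands b and c are held back until the slots are free again.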

Here we have to stress right away that commands can be sent to replay multiple times. For instance, the data from the L2 cache can simply arrive too late if many repeated requests to the L2 cache occur. In this case one or two additional loops might be necessary, which increases the L2 read latency. For example, the data may arrive from the L2 cache in 9 clock cycles instead of 7, and an additional loop will add at least 7 clock cycles to that. If the data is not in the L2 cache at all, the chain of commands depending on this data will keep rotating in the replay system until the requested data arrives from main memory.
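The arithmetic behind this can be sketched as follows, under the simplifying assumption that every trip around the replay loop costs exactly loop_len cycles and that the data can only be picked up when the command passes the Check stage. The function name and the default of 7 cycles (the L2 latency quoted above) are illustrative; real hardware may add further delays, so the result should be read as a lower bound.

```python
import math


def effective_latency(arrival_cycles, loop_len=7):
    """Latency seen by a command that can only retire at the end of a loop.

    arrival_cycles: cycles until the requested data actually arrives.
    loop_len: assumed length of one replay loop in cycles.
    """
    # Data arriving in the middle of a loop is only used at the end of it,
    # so the latency is rounded up to a whole number of loops.
    loops = max(1, math.ceil(arrival_cycles / loop_len))
    return loops * loop_len


print(effective_latency(7))  # → 7, data arrives exactly in time
print(effective_latency(9))  # → 14, two cycles late costs a whole extra loop
```

In this model a delay of only two cycles doubles the observed latency, since the command has to complete a whole additional loop.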

Additional replay loops are one of the reasons why the actual L2 cache latency may turn out much higher than the number claimed in the official documents.

