
Replay: Unknown Features of the NetBurst Core (page 6)


Category: CPU

by Victor Kartunov, Yury Malich, Jan Keruchenko aka C@t, and Vadim Levchenko aka VLev

[ 06/06/2005 | 04:20 PM ]



New commands will keep entering the replay loop until the dependency chain ends. Replay hurts the rest of the chain in two ways: it increases operation latency, and it wastes computational resources, because every operation that passes through the replay system has to be executed at least twice, once before the replay and once after it. And as we have already pointed out, a command can be re-executed several times while it waits for its data to become ready, so the load on the execution resources may more than double.
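The cost described above can be put into a rough count. The sketch below is only an illustration under a simplifying assumption of ours, not a measured fact: every operation in the chain is replayed the same number of times. The function names are hypothetical.

```python
def execution_slots(chain_length, replays_per_op):
    """Execution slots used by a dependency chain in which every operation
    is sent to replay `replays_per_op` times (assumed uniform)."""
    if chain_length < 0 or replays_per_op < 0:
        raise ValueError("arguments must be non-negative")
    # One speculative execution plus one re-execution per replay.
    return chain_length * (1 + replays_per_op)


def overhead_factor(replays_per_op):
    """Ratio of slots actually used to slots nominally needed."""
    if replays_per_op < 0:
        raise ValueError("replays_per_op must be non-negative")
    return float(1 + replays_per_op)
```

With one replay per operation the chain occupies twice the execution slots it nominally needs; each additional replay adds one more full pass.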

In real applications the loops of dependent commands may disappear if some event clears the pipeline, or if the “holes” get “patched”. Here I mean that enough commands from independent chains fit into the “holes” between the operations sent back to replay (as in the example in Pic.4b). However, it is extremely hard to predict from the program code when “patches” like that will be needed, because the final order of command execution is decided by the scheduler, which will not necessarily “patch the hole” at the right moment.
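The idea of “patching” can be pictured with a toy model. It assumes, purely for illustration, that a replay loop is a row of slots in which R is a replayed operation and . is a “hole”, and that an idealized scheduler fills each hole with an independent command for as long as any are available. The real scheduler gives no such guarantee; the function name and the slot notation are made up for this sketch.

```python
def patch_holes(slots, independent_ops):
    """Fill holes ('.') with independent commands ('I') while any are left.

    Returns the patched slot row and the number of holes still open,
    i.e. the places where a dependent command could still slip in.
    """
    patched = []
    remaining = independent_ops
    for slot in slots:
        if slot == "." and remaining > 0:
            patched.append("I")  # hole patched by an independent command
            remaining -= 1
        else:
            patched.append(slot)  # replayed op, or a hole nobody could fill
    row = "".join(patched)
    return row, row.count(".")
```

In this model a row such as R.R.R. with only two independent commands on hand still leaves one hole open, and that single hole is enough for a dependent command to enter the loop.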

Moreover, as you already know, the Pentium 4 processor features five schedulers. Each scheduler has its own replay system, so commands of different types circulate inside different replay loops. This increases the number of “holes” in each loop, which in turn raises the probability that dependent commands start looping.

As we managed to find out, the “holes” between commands resent for repeated execution, which give rise to long-lasting replay looping, are the main reason why a dependency chain of successive MOV EAX, [EAX] commands is of no use for measuring L1 cache latency on the Pentium 4 processor. The existence of “holes” also explains the graph we showed at the very beginning of this article. It turns out that when commands from the dependency chain fall into the “holes”, in the worst case they start looping more than is actually necessary, which increases the execution latency tremendously. The number of “loops” they make depends on the combination of “holes”, load commands and the number of commands between the loads.

Our investigation used undocumented counters of the operations sent to replay. We also developed special tests in which the commands between the loads were arranged so that the resulting “patches” would not let commands from the dependency chain get into the “holes” between the replayed commands. The results confirmed the theory: if the “holes” are “patched” in time with other commands, the chain is protected against looping, the program code executes much faster, and the measured cache latency matches the values stated in the official documents.
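A chain of MOV EAX, [EAX] loads is what is commonly called pointer chasing: each load fetches the address for the next one, so the loads cannot overlap. The sketch below reproduces only the structure of such a chain. The function names are hypothetical, and interpreter overhead is far larger than any cache latency, so it is not suitable for real measurements.

```python
import random
from typing import List


def build_chain(n: int, seed: int = 0) -> List[int]:
    """Return `nxt` such that following i -> nxt[i] walks all n slots
    in one cycle, in a shuffled order."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    nxt = [0] * n
    # Link each slot to its successor in the shuffled order, wrapping around.
    for a, b in zip(order, order[1:] + order[:1]):
        nxt[a] = b
    return nxt


def chase(nxt: List[int], start: int, steps: int) -> int:
    """Follow the chain for `steps` dependent loads and return the final index."""
    i = start
    for _ in range(steps):
        i = nxt[i]  # the next address depends on the value just loaded
    return i
```

A native-code benchmark would time a long run of such loads and divide by the number of steps. As the text explains, on the Pentium 4 that quotient can include replay looping on top of the cache latency.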

Thus the usual way of calculating latencies (for example, the latency in case of an L1 cache miss) doesn’t work for the NetBurst architecture. The measured number will include not only the latency itself, but also the additional delay caused by replay looping. The looping may last quite long, so the effective latency may reach hundreds of clocks instead of only 9. The worst thing, however, is that replay does not just delay the execution of a single instruction: it also ties up execution resources that could have been used for other, independent operations in the meantime.
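As a back-of-the-envelope illustration, the effective latency can be modelled as the nominal latency plus one pass through the replay loop for every extra trip the load makes. This linear model and the loop length used below are assumptions made for the sketch; only the nominal value of 9 clocks comes from the text above.

```python
# Hypothetical replay loop length in clocks, chosen only for illustration.
ASSUMED_REPLAY_LOOP_CLOCKS = 7


def effective_latency(nominal_clocks, replay_loop_clocks, extra_loops):
    """Nominal latency plus the delay added by `extra_loops` replay passes
    (simplified linear model, not a description of the real hardware)."""
    if min(nominal_clocks, replay_loop_clocks, extra_loops) < 0:
        raise ValueError("arguments must be non-negative")
    return nominal_clocks + replay_loop_clocks * extra_loops
```

With these assumed numbers, 30 extra passes turn 9 clocks into 219, which is the order of magnitude mentioned above.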
