
Replay: Unknown Features of the NetBurst Core (page 10)


Category: CPU

by Victor Kartunov, Yury Malich, Jan Keruchenko aka C@t, and Vadim Levchenko aka VLev

[ 06/06/2005 | 04:20 PM ]



We paid special attention to them, because they are really hard to foresee in the program code, unlike a change in data size. During our tests for STLF violations, we discovered that they can cause re-execution within RL-7 as well as within RL-12, depending on the type of violation. When STD is not ready, the operation goes to RL-7; when STA is not ready, it goes to RL-12. In fact, this is fairly simple: when we execute the Load command, we search for the data in the Store Buffer; once an entry with a matching address is found, any absence of data is detected immediately. However, if the data is there but the address hasn't arrived yet, the CPU can only hope for the best and assume that this Store will not affect the Load. Later on it will perform a Check, and in case of failure the operation will be sent to RL-12.
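The decision described above can be sketched as a toy model. This is only an illustration of the logic as the text states it, not a description of the actual hardware: the `StoreEntry` class, its fields, and the `classify_load` function are hypothetical names introduced here, and the model considers a single Store Buffer entry.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    """One Store Buffer entry in this toy model (hypothetical structure)."""
    true_address: int    # address the store will eventually write to
    address_ready: bool  # True once STA has delivered the address
    data_ready: bool     # True once STD has delivered the data


def classify_load(load_address: int, store: StoreEntry) -> str:
    """Classify what happens to a Load that meets one earlier Store."""
    if store.address_ready:
        if store.true_address != load_address:
            return "no conflict"
        # Matching address found: missing data is detected immediately.
        return "forwarded" if store.data_ready else "replay via RL-7"
    # Address unknown: the load proceeds speculatively and is checked later.
    if store.true_address == load_address:
        return "replay via RL-12"
    return "no conflict"


print(classify_load(0x100, StoreEntry(0x100, address_ready=True, data_ready=False)))
print(classify_load(0x100, StoreEntry(0x100, address_ready=False, data_ready=True)))
```

The first call models the "STD is not ready" case, the second the "STA is not ready" case.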

To ensure that LD is executed correctly from the very beginning, it should be sent for execution no earlier than STA and STD. Note that it should be sent at least 3 clocks later than STA. The processor will not allow the Load to be executed ahead of STA; however, 3 clock cycles is a pretty big gap, so the Load queue scheduler will try to take advantage of it, because it knows nothing about the hidden address dependence between the Load and the Store.
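As a rough illustration, the timing rule above can be written as a small check. This is a simplified sketch under stated assumptions: the function name `predict_replay` and its parameters are hypothetical, clocks are counted as the moments the operations are sent for execution, and testing the STA condition before the STD condition is a choice made here for the example, not something the text specifies.

```python
from typing import Optional

MIN_STA_TO_LOAD_GAP = 3  # clocks; the figure quoted in the text


def predict_replay(sta_clock: int, std_clock: int, load_clock: int) -> Optional[str]:
    """Return the replay loop a dependent Load would be expected to enter.

    Returns None when the Load is sent late enough to execute correctly.
    """
    # Load sent less than 3 clocks after STA: the address is not ready.
    if load_clock - sta_clock < MIN_STA_TO_LOAD_GAP:
        return "RL-12"
    # Load sent earlier than STD: the data is not ready.
    if load_clock < std_clock:
        return "RL-7"
    return None


print(predict_replay(sta_clock=0, std_clock=1, load_clock=1))  # too close to STA
print(predict_replay(sta_clock=0, std_clock=9, load_clock=4))  # data arrives late
print(predict_replay(sta_clock=0, std_clock=2, load_clock=5))  # safe ordering
```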

The Pentium 4 architecture provides excellent opportunities for preliminary data loading: this is where the address operation queues come in handy. Therefore, you cannot guarantee that STLF violations will be avoided even if you try to adjust the algorithm implementation. Any pair of dependent Store-Load operations within a "window" of a few dozen instructions is a potential hazard.

One of the most illustrative examples here is the function call, which we see in virtually every program. A function call is usually performed by a chain of commands that store parameters on the stack (PUSH), followed by the actual CALL command.

…
PUSH EAX
CALL Func
…

Then the registers are saved and the parameters are read within the function call.

PUSH ESI
MOV EAX, [ESP+8]
…

Notice that the PUSH EAX and MOV EAX, [ESP+8] pair is a potential cause of replay under the STLF restriction on data availability: both when STD is not ready (the procedure receives the result of a long calculation) and when STA is not ready (the scheduler sends MOV EAX, [ESP+8] for execution less than 3 clocks after PUSH EAX).
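To see why these two instructions form a dependent Store-Load pair, it helps to trace the stack pointer through the sequence. The sketch below assumes 32-bit code with 4-byte stack slots and a near CALL that pushes a 4-byte return address; the function name `trace_call` and the starting ESP value are arbitrary choices made for illustration.

```python
WORD = 4  # bytes per stack slot in 32-bit x86 code


def trace_call(esp: int) -> tuple:
    """Return (address stored by PUSH EAX, address loaded by MOV EAX, [ESP+8])."""
    esp -= WORD               # PUSH EAX: ESP moves down, EAX is stored at [ESP]
    store_address = esp
    esp -= WORD               # CALL Func: pushes the return address
    esp -= WORD               # PUSH ESI: saves a register inside the function
    load_address = esp + 8    # MOV EAX, [ESP+8]: reads the parameter
    return store_address, load_address


store_address, load_address = trace_call(0x1000)  # hypothetical initial ESP
print(hex(store_address), hex(load_address))      # both refer to the same slot
```

Both addresses come out equal, so the Load reads exactly the location the PUSH wrote, with only a CALL and one more PUSH between them.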
