
Prescott: The Last of the Mohicans? (Pentium 4: from Willamette to Prescott). Part II (page 7)


Category: CPU

by Victor Kartunov

[ 05/30/2005 | 01:32 PM ]



Suppose the micro-operation is received at time [0]. The data from the L2 cache will then arrive at [0 + L2 cache access latency]. On the Northwood core, the L2 cache access latency is 9 clock cycles in the general case (to be more exact, it is 7 clocks, but the “data load” command first checks whether the data is available in the L1 data cache, which takes 2 additional clocks). So the scheduler will send out the next micro-operation in such a way that it arrives at the execution unit 9 clocks later.

In fact, this option will hardly work for us: it means spending 9 clock cycles on a single micro-operation. We will not accept this scheduler strategy, because it is definitely not the way to high performance.
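The arithmetic behind the 9-clock figure can be written out as a minimal sketch. The constant and function names below are illustrative only and do not come from any Intel documentation; the sketch assumes the scheduler always plans for the L2 latency quoted above.

```python
# Latencies quoted in the text for the Northwood core, in clock cycles.
L1_CHECK_CLOCKS = 2   # the load first checks the L1 data cache
L2_ACCESS_CLOCKS = 7  # the L2 access itself, after the L1 check


def l2_load_latency() -> int:
    """Total load latency when the data has to come from the L2 cache."""
    return L1_CHECK_CLOCKS + L2_ACCESS_CLOCKS


def dependent_arrival(load_start: int) -> int:
    """Clock at which the dependent micro-operation reaches the execution
    unit if the scheduler always assumes the L2 latency (an assumption of
    this sketch)."""
    return load_start + l2_load_latency()


print(l2_load_latency())     # 9
print(dependent_arrival(0))  # 9
```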

Second option (upon agreement). The idea is to hold back all micro-operations that depend on the result of the data load command until the data arrives, and only then send them on for execution. The good thing about this strategy is that it requires no additional effort: just sit and wait for the data. The downside is that it doesn’t always guarantee good performance in the long run.

If we had a micro-operation of the second type, the scheduler could take into account information about its execution status from the execution units. In this case the scheduler would need to receive feedback from the execution units about the estimated execution time for the given instruction. In fact, it is quite possible that this strategy is applied to FPU loads. However, there is one unpleasant issue.

Suppose that we are really lucky and the data is available in the L1 cache. On the Northwood core, the data takes two clock cycles to be delivered from the L1 cache.

Say the execution unit receives a micro-operation at time [0]. At [0+2 clocks] it receives the data from the L1 cache and immediately reports its status to the scheduler, which immediately releases the next micro-operation into the pipeline. This micro-operation will take 6 clock cycles to reach the execution unit.

Everything seems to be correct, but what have we got in the end? Let’s sum up the results: our second micro-operation will reach the execution unit at [0+2+6 clocks], because it still needs to pass all the stages between the scheduler and the execution unit, and the distance between them hasn’t got any smaller. It means we need 8 clock cycles in total. It turns out that the dependent instruction reaches the execution unit not at [0+2 clocks], when the data is already available, but at [0+2+6 clocks], i.e. 6 clock cycles later. In other words, we lost 6 clock cycles!

Well, this is not the best option, I should say.
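The timeline of this feedback-based strategy can be laid out as a small sketch. The names below are hypothetical and chosen only for illustration; the two latencies are the ones quoted in the text, and the scheduler is assumed to react to the status report without any delay of its own.

```python
# Figures quoted in the text for the Northwood core, in clock cycles.
L1_LATENCY_CLOCKS = 2         # L1 data cache hit latency
SCHEDULER_TO_EXEC_CLOCKS = 6  # pipeline distance from scheduler to execution unit


def feedback_timeline(load_start: int = 0) -> dict:
    """Timeline when the scheduler releases the dependent micro-operation
    only after the execution unit reports that the data has arrived."""
    data_ready = load_start + L1_LATENCY_CLOCKS
    # Assumption of this sketch: the scheduler reacts in zero clocks.
    dependent_released = data_ready
    dependent_arrives = dependent_released + SCHEDULER_TO_EXEC_CLOCKS
    return {
        "data_ready": data_ready,
        "dependent_arrives": dependent_arrives,
        "clocks_lost": dependent_arrives - data_ready,
    }


timeline = feedback_timeline()
print(timeline)  # {'data_ready': 2, 'dependent_arrives': 8, 'clocks_lost': 6}
```

With these numbers, losing no clocks would require releasing the dependent micro-operation 6 clocks before the data is ready, that is, before any feedback from the execution unit can exist.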
