Information

X-bit Labs for mobile users! Do not forget that we are running a special version of X-bit Labs web-site for users of mobile and handheld devices: http://pda.xbitlabs.com. Check out our news and articles from smartphones and PDAs to be always updated on the latest computer and technology news.

 

Articles: CPU

Intel Prescott: One More Willamette-like Slow Processor or a Worthy Piece? (page 5)


Category: CPU

by Ilya Gavrichenkov

[ 02/01/2004 | 12:01 PM ]


Pages : 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14

Prescott’s Enhanced Architecture

With the launching of the new Prescott processor Intel made a significant step forward towards successful improvement of their NetBurst architecture. The picture below shows something like a NetBurst genealogical tree with highlighted improvements introduced in the new Prescott processor core.

Let’s discuss these improvements in a bit more detail now.

Improved Branch Predictor. Most processor delays are caused by the necessity to clear and refill anew Prescott’s long pipeline after incorrect branch predictions. Therefore, the best way to eliminate these delays is to avoid incorrect predictions at all. Although the branch prediction algorithm of the NetBurst architecture was very efficient from the very beginning, Intel managed to improve this efficiency even more now.

The work of the Branch Prediction unit in Intel processors with NetBurst architecture is based on the work with Branch target Buffer (BTB). It is a 4KB buffer storing the statistics about the already complete branching. In other words, Intel’s branch prediction is based on a probabilistic model: the CPU evaluates a given branch as preferable or not in each particular case according to the collected statistical data. This algorithm proved very efficient, however, it turns out absolutely useless if there is no statistics about a certain branch. The Northwood based CPUs selected a “backward” branch in this case, considering that quitting cycles is the most widely spread branch.

This statistical algorithm of branch predictions has been significantly improved in the new Prescott core. Now, if there is no statistics about a certain branch, the branch prediction unit doesn’t draw any definite conclusions about the branch direction. Since the backward branches are usually not any longer than a certain empirically calculated branch distance, the branch prediction unit bases its decision on the branch distance for this particular case.

Moreover, the dynamic branch prediction algorithm has also been slightly improved. Prescott processor acquired an indirect branch predictor, which was first used in Pentium M processors and proved highly efficient there.

So, if Northwood based processors boasted the average of 0.86 incorrect predictions for every 100 instructions, then the new Prescott boasts a lower value of 0.75 for every 100 instructions. In other words, we got 12% less incorrect branch predictions, which leads to fewer delays caused by the necessity to empty and refill the execution pipeline.

Faster Instructions Execution.  The new processor core has the same number of integer ALUs: there are two integer ALUs working at the double core frequency for simple instructions and one more ALU one for complex instructions. However, the some instructions are processed much faster now. Prescott owes this performance increase to a few changes introduced in the ALU units.

First of all, I would like to mention that Intel added a shifter/rotator unit into one of the fast ALUs performing all instructions like shifts and rotations. As a result these instructions are now performed much faster, because in the previous Pentium 4 processors they were regarded as complex instructions and hence processed by the slow ALU.

The integer multiplication will also be performed faster by Prescott processors. In the previous versions of Intel’s NetBurst architecture integer multiplication was performed by the FPU, which required operands to be translated into floating-point format and then back to the integer format. In Prescott processor the integer multiplication is performed by the integer ALU, which definitely works considerably faster.

According to the measurements, the shifts and rotations are now performed at least 4 times faster, while integer multiplication got 25% faster. However, we should still keep in mind that longer pipeline and different L1 cache working algorithms have affected the time required for other simple instructions processing. Many instructions, which used to require about half a clock cycle, now take the entire clock that is why it wouldn’t be correct to state the overall ALU performance improvement.

<<< Previous page Next page >>>

Discussion

Comments currently: 18
Discussion started: 02/02/04
View comments

Add your Comment

Name/Nickname
Your Comments
 

Category News

Category: CPU

Wednesday, July 23, 2008

3:35 pm AMD to Discuss Rival for Intel Atom Towards Year End. AMD’s Competitor for Intel Atom in the Works, Says Company

Monday, July 21, 2008

8:46 am AMD Initiates Pilot Production of 45nm Chips. AMD to Bring 45nm Products in Early Q4 2008

Thursday, July 17, 2008

2:36 pm AMD’s Chief Executive Officer Hector Ruiz Steps Down. Dirk Meyer Becomes New Chief Exec of AMD

12:15 pm Intel: Atom Will Not Substitute Celeron Processors. Intel Denies Possibility to Change Celeron for Atom

Wednesday, July 16, 2008

11:55 pm Intel Promises to Ship 100 Million 45nm Microprocessors This Year. Intel Says 45nm Process Technology Ramp Better than Ever

7:06 pm Intel to Launch Another Offence with Nehalem Microprocessors Later This Year. Intel to Aggressively Push Nehalem Micro-Architecture into High-End Desktops

 
News Archive
All Latest News