

Prescott: The Last of the Mohicans? (Pentium 4: from Willamette to Prescott) (page 21)


Category: CPU

by Victor Kartunov

[ 05/25/2005 | 11:45 AM ]



So, at stage 6 (Allocator), three micro-operations are taken from the only available queue, called the Fetch Queue, and processor resources are reserved for them. After that, the micro-operations are placed into two uopQ queues: one for address operations and the other for all remaining operations. As our description of the pipeline architecture showed, the operations are distributed between these two queues at stage 9.
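To make the allocator step more concrete, here is a minimal Python sketch of the routing alone. It assumes, as stated above, that the allocator takes up to three uops per clock, and it sends each one to the address or the main uopQ by type, collapsing the allocation and the later split into a single step. The `Uop` class, the kind strings and the `allocate` function are hypothetical names used only for illustration, and resource reservation is not modeled.

```python
from collections import deque
from dataclasses import dataclass

ALLOC_WIDTH = 3  # three uops per clock at the allocator stage (from the text)
ADDRESS_KINDS = {"load_addr", "store_addr"}  # uops that go to the address uopQ


@dataclass
class Uop:
    kind: str  # hypothetical kind label, e.g. "add", "load_addr", "store_data"
    seq: int   # program-order sequence number


def allocate(fetch_queue, address_q, main_q, width=ALLOC_WIDTH):
    """One allocator clock: take up to `width` uops from the fetch queue, oldest
    first, and append each to the address or main uopQ according to its kind.
    Resource reservation is not modeled. Returns the number of uops moved."""
    moved = 0
    while fetch_queue and moved < width:
        uop = fetch_queue.popleft()
        target = address_q if uop.kind in ADDRESS_KINDS else main_q
        target.append(uop)
        moved += 1
    return moved
```

Feeding it `add`, `load_addr`, `store_addr`, `store_data`, `xor` takes two clocks: the first moves three uops (the two address uops to the address queue, the add to the main queue), and the second moves the remaining two, both to the main queue.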

The main task of the uopQ queues is to route micro-operations of different types to the correct schedulers. That is exactly why the address uopQ accepts only two types of uops: "load [address]" and "store [address]". All other operations, including "store data", are placed in the other, main queue. The address queue holds up to 16 micro-operations, while the main queue is twice as deep and holds up to 32. Micro-operations enter these queues in program order: when one of the queues fills up, the other one also stops accepting micro-operations. This queue organization has two practical advantages: "early" loading, and faster execution of short code fragments that depend on the results of longer operations.
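The in-order behavior with fixed queue depths can be sketched the same way. This simplified model (again with hypothetical names) uses the 16- and 32-entry depths given above, ignores the three-uops-per-clock limit, and stops allocation as soon as the uop at the head of the stream needs a queue that is already full.

```python
from collections import deque

ADDRESS_QUEUE_DEPTH = 16  # address uopQ depth, from the text
MAIN_QUEUE_DEPTH = 32     # main uopQ is twice as deep, from the text
ADDRESS_KINDS = {"load_addr", "store_addr"}


def allocate_in_order(uops, address_q, main_q):
    """Place uop kinds (oldest first) into the two uopQs strictly in order.

    As soon as the next uop needs a queue that is full, allocation stops,
    even if the other queue still has free entries. Returns the count placed.
    """
    placed = 0
    for kind in uops:
        if kind in ADDRESS_KINDS:
            queue, depth = address_q, ADDRESS_QUEUE_DEPTH
        else:
            queue, depth = main_q, MAIN_QUEUE_DEPTH
        if len(queue) >= depth:
            break  # the head uop is blocked, so everything behind it waits
        queue.append(kind)
        placed += 1
    return placed
```

With seventeen address uops followed by an add, only 16 uops are placed: the 17th load has nowhere to go, so the add waits as well, even though the main queue is empty. That is the sense in which the other queue "closes".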

While the schedulers process micro-operations (at stages 10, 11 and 12), the uops are distributed among the next type of queue: the scheduler queues (schQ).

Micro-operations leave each uopQ in FIFO order (first in, first out). However, the two queues work independently of each other: for instance, a micro-operation from the address queue can be selected before all the preceding operations have left the "main" queue. In many cases this lets data loading start in advance, which helps performance. There is another side to the coin, though: this independence significantly increases the probability of incorrect situations, such as an attempt to read data before it has actually been computed by the corresponding micro-operation from the main queue.
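The cost of independent queues can be illustrated with a toy model. Here each uopQ is a FIFO that releases at most one uop per cycle, the main queue's head is assumed to be stuck for a few cycles (say, behind a long operation), and a hypothetical `find_hazards` check flags any load that left before an older store data uop for the same address. Tagging the store data uop with an address is a simplification of this sketch; in the real processor the address travels with the separate store address uop.

```python
from collections import deque


def dispatch(address_q, main_q, main_stalled_cycles):
    """Drain two independent FIFO uopQs, at most one uop from each per cycle.

    Uops are (seq, kind, addr) tuples, where seq is the program-order number.
    The main queue's head is assumed blocked for the first `main_stalled_cycles`
    cycles (e.g. behind a long operation); the address queue is not held back.
    Returns the uops in the order they left the queues.
    """
    order = []
    cycle = 0
    while address_q or main_q:
        if main_q and cycle >= main_stalled_cycles:
            order.append(main_q.popleft())
        if address_q:
            order.append(address_q.popleft())
        cycle += 1
    return order


def find_hazards(order):
    """Return (store_seq, load_seq) pairs where a load left before an older
    store data uop for the same address, i.e. it may read stale memory."""
    hazards = []
    for i, (seq, kind, addr) in enumerate(order):
        if kind != "load_addr":
            continue
        for later_seq, later_kind, later_addr in order[i + 1:]:
            if later_kind == "store_data" and later_seq < seq and later_addr == addr:
                hazards.append((later_seq, seq))
    return hazards
```

For the program `mul`, `store_data X`, `store_addr X`, `load_addr X` (sequence numbers 0 to 3) with a two-cycle stall in the main queue, the uops leave in the order 2, 3, 0, 1 and the pair (1, 3) is reported. With no stall, the order is 0, 2, 1, 3 and nothing is flagged.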

The transfer rate here, however, is quite high: up to two micro-operations per clock cycle for each scheduler. This rate applies not only to the fast schedulers (see below) but to the slow schedulers as well. The uopQ queues respond to half-clock events (so they can also be regarded as units running at twice the clock frequency). If a micro-operation cannot be sent to its scheduler because that scheduler's schQ queue is full, the transfer of all remaining uops from the uopQ is halted. If a micro-operation can be sent to either of two schedulers, the processor can choose between them depending on the status of their schQ queues.
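The transfer rules above (up to two uops per clock per scheduler, a full schQ halting the whole uopQ, and a choice between two schedulers based on queue status) can be modeled for a single clock as follows. The schQ capacities are not given in the text, so they are a parameter here, and "pick the least-occupied queue" is an assumed policy, not a documented one. Half-clock behavior is not modeled.

```python
from collections import deque

PER_SCHEDULER_RATE = 2  # up to two uops per clock for each scheduler (from the text)


def transfer_one_clock(uop_q, sch_queues, capacity, allowed):
    """Move uops from the head of one FIFO uopQ into scheduler queues for one clock.

    uop_q:      deque of uop kinds, oldest first
    sch_queues: dict scheduler name -> list of queued uops (the schQ)
    capacity:   dict scheduler name -> schQ size (hypothetical values)
    allowed:    dict uop kind -> tuple of schedulers that accept it
    Returns a list of (kind, scheduler) pairs in the order they were sent.
    """
    sent = {name: 0 for name in sch_queues}
    moved = []
    while uop_q:
        kind = uop_q[0]
        candidates = [
            name for name in allowed[kind]
            if len(sch_queues[name]) < capacity[name] and sent[name] < PER_SCHEDULER_RATE
        ]
        if not candidates:
            # The head cannot move this clock, so nothing behind it moves either.
            break
        # Assumed policy: prefer the least-occupied schQ; ties go to the first listed.
        target = min(candidates, key=lambda name: len(sch_queues[name]))
        sch_queues[target].append(uop_q.popleft())
        sent[target] += 1
        moved.append((kind, target))
    return moved
```

For example, with `allowed = {"add": ("FAST_0", "FAST_1"), "shl": ("SLOW_1",)}`, an add goes to whichever of FAST_0 and FAST_1 currently holds fewer entries, while a shl waiting at the head of the uopQ in front of a full SLOW_1 queue blocks every add behind it.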

As we already mentioned in the pipeline description, there are five scheduler queues and five schedulers. Let me now list all of the schedulers:

FAST_0 – handles ALU micro-operations: logical operations (and, or, xor, test, not); ALU store data; branches; data transfer operations (mov reg-reg, mov reg-imm, movzx/movsx, simple forms of lea); simple arithmetic operations (add/sub, cmp, inc/dec, neg).

FAST_1 – handles ALU micro-operations, namely subsets of the transfer operations (except movsx) and of the arithmetic operations (except neg). It looks like every operation sent to FAST_1 can also be sent to FAST_0.

SLOW_0 – handles FPU micro-operations dealing with data transfer and conversion (for x87, MMX, SSE and SSE2 instructions), as well as FPU store data.

SLOW_1 – handles both ALU and FPU micro-operations: a number of simple ALU operations (shift/rotate, some of the uops generated by adc/sbb), all complex uops starting with multiplication, and most of the "computational" FPU operations.

MEM – handles AGU operations: load address and store address.

Clearly, all operations from the address uopQ are sent to the MEM scheduler queue, while all other micro-operations go to one of the remaining four scheduler queues.
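The scheduler list can be condensed into a lookup table. In this sketch the uop class names (`add_sub`, `fp_compute` and so on) are hypothetical labels, `mul` stands in for all complex uops, and because the text does not spell out exactly which transfer and arithmetic subsets FAST_1 accepts, the table assumes all of them except movsx and neg. The two assertions at the end restate observations from the text: FAST_1 handles nothing that FAST_0 cannot, and MEM takes exactly the uops from the address uopQ.

```python
# Uop classes accepted by each scheduler, transcribed from the list above.
# Class names are hypothetical labels; "mul" stands in for all complex uops.
# Assumption: FAST_1 takes every transfer/arithmetic class except movsx and neg.
SCHEDULERS = {
    "FAST_0": {"logic", "alu_store_data", "branch", "mov", "movzx", "movsx",
               "lea_simple", "add_sub", "cmp", "inc_dec", "neg"},
    "FAST_1": {"mov", "movzx", "lea_simple", "add_sub", "cmp", "inc_dec"},
    "SLOW_0": {"fp_move", "fp_convert", "fp_store_data"},
    "SLOW_1": {"shift_rotate", "adc_sbb_part", "mul", "fp_compute"},
    "MEM": {"load_addr", "store_addr"},
}

ADDRESS_CLASSES = {"load_addr", "store_addr"}  # the only uops in the address uopQ


def source_uopq(uop_class: str) -> str:
    """Which uopQ a uop of this class waits in before scheduling."""
    return "address" if uop_class in ADDRESS_CLASSES else "main"


def schedulers_for(uop_class: str) -> list[str]:
    """Schedulers whose queues can accept a uop of this class."""
    return sorted(name for name, classes in SCHEDULERS.items() if uop_class in classes)


# Everything FAST_1 accepts can also go to FAST_0.
assert SCHEDULERS["FAST_1"] <= SCHEDULERS["FAST_0"]
# The address uopQ feeds MEM, and MEM accepts nothing else.
assert SCHEDULERS["MEM"] == ADDRESS_CLASSES
```

For example, `schedulers_for("add_sub")` gives `["FAST_0", "FAST_1"]`, so such a uop is one of those that can go to either of two schedulers, while `schedulers_for("neg")` gives only `["FAST_0"]`.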
