Information

X-bit Labs for mobile users! Do not forget that we are running a special version of X-bit Labs web-site for users of mobile and handheld devices: http://pda.xbitlabs.com. Check out our news and articles from smartphones and PDAs to be always updated on the latest computer and technology news.

 

Articles: CPU

Prescott: The Last of the Mohicans? (Pentium 4: from Willamette to Prescott) (page 25)


Category: CPU

by Victor Kartunov

[ 05/25/2005 | 11:45 AM ]


Pages : 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28

It is a completely different story if we have Trace cache: we have some already decoded commands for different program threads (or different programs). So, it would be much easier to select the micro-operations for the appropriate threads: all you need is just to add a service index to the micro-operation, which will identify the thread it belongs to. Especially, sine the execution core is not very closely connected with the decoder and its functioning doesn’t depend on the decoder operation directly (all we need is the sufficient amount of pre-decoded instructions in the Trace cache).

So, both logical processors work on the same physical core. We have already mentioned above that all core resources will be split into large categories: “shared” and “distributed”. The so-called distributed resources include Fetch Queue, Uop Queue and Scheduler Queue. For each logical processor the queue depth is smaller because a part of its capacity is assigned to another logical processor. In other words, these resources are split in two halves: one for each logical processor. Or, for example, there is a position in the queue, which can be occupied only by the first logical processor, and another position can be used only by the second one. In case of shared resources their actual distribution between the logical processors will be arranges individually for each particular case. However, note that there is a special system preventing one logical processor from using up the entire resource capacity. You understand why this system is necessary: if we have one fast and one slow (or stalled) commands thread, then the latter can theoretically occupy all queues thus blocking the execution of the first thread.

Therefore, there should be a certain algorithm which would allow distributing the queue capacity between the two processors in the most efficient way. How can two threads share the positions in the UopQ (or any other queue) with the fixed number of positions? There are two different ways: competitive way (when each of the threads tries to take as much of the resources from the opponent as possible) and fixed way (50:50, for instance). In the first case it is quite possible that one of the threads will slow down or oust the second thread from the queue completely. In the second case, the resources will not be used efficiently, if one of the micro-operations threads requires more than 50% of them: one of the threads will lack resources, while the other one will simply waste them.

In the Pentium 4 processor the queue capacity for each thread is fixed: each of the logical processors has twice as short Fetch Queue, Uop Queue and Schedulers Queue at its disposal than in the example above with the disabled (absent) Hyper Threading technology. This certainly has some negative influence on the performance of each logical processor, but makes it impossible for any of the threads to block the processor. So, here the resources are distributed absolutely equitably. Important notice: the micro-operations are moving along the queue independently for each logical processor.

All other resources of the processor are shared in this interpretation: register file, schedulers, execution units, all caches, loading blocks. Here the resources are used according to the competitive principle: first come first served. Of course, not without arbitration: if both logical processors addressed the same unit, then the arbiter indicates strict processing order.

Let me offer you a schematic representation of a pipeline with the labeled distributed resources:


The blue color indicates the resources of one logical processor,
and the gray color – of another one.

<<< Previous page Next page >>>

Discussion

Comments currently: 17
Discussion started: 05/26/05
View comments

Add your Comment

Name/Nickname
Your Comments
 

Category News

Category: CPU

Thursday, July 17, 2008

2:36 pm AMD’s Chief Executive Officer Hector Ruiz Steps Down. Dirk Meyer Becomes New Chief Exec of AMD

12:15 pm Intel: Atom Will Not Substitute Celeron Processors. Intel Denies Possibility to Change Celeron for Atom

Wednesday, July 16, 2008

11:55 pm Intel Promises to Ship 100 Million 45nm Microprocessors This Year. Intel Says 45nm Process Technology Ramp Better than Ever

7:06 pm Intel to Launch Another Offence with Nehalem Microprocessors Later This Year. Intel to Aggressively Push Nehalem Micro-Architecture into High-End Desktops

Tuesday, July 8, 2008

11:01 pm DreamWorks and Intel Sign Pact: Larrabee, Xeon Set to Be Used. DreamWorks Switches from AMD to Intel

6:07 pm AMD Loses Microprocessor Revenue Share to Intel – iSuppli. AMD, Intel Continue to Gain CPU Revenue Share

 
News Archive
All Latest News