Replay: Unknown Features of the NetBurst Core (page 8)
Category: CPU
by Victor Kartunov, Yury Malich, Jan Keruchenko aka C@t, and Vadim Levchenko aka VLev
[ 06/06/2005 | 04:20 PM ]

Replay at FPU Pipeline

The replay mechanism in the FPU pipeline follows a different algorithm than ALU replay. There appears to be a sort of feedback between the data loading unit and the scheduler: only once the L1 data cache has been checked for data availability and the data has been found there does the scheduler send the dependent instruction further. So, if the data is reported missing in the L1 data cache (the case that sends an ALU load into the RL-7 loop), the FP-load (a class that covers x87, MMX, SSE and SSE2 loads) is replayed, but the dependent instructions do not get issued. For RL-12 there is no difference in this case: FP operations circle in the replay loop in just the same way.

If the data is found in the L1 cache, the latency of FP-load operations is 9 clock cycles. If the data is not there, we add n*7 or n*12 clock cycles, depending on the situation.

In fact, we failed to send any chain of FP operations to RL-7 at all. For example, if there is an Int-chain circling around RL-7, then the dependent FP-chain will end up in RL-12. For instance, the two instructions “MOVD MM0,EAX” followed by “MOVD EAX,MM0” transfer the Int-chain from RL-7 to RL-12 (through the EAX dependency).

Why so, and not the other way around? We assume that most instructions going via FP Move actually pass through something like the “Convert & Classify” unit of the K8, where the result is translated into a certain internal representation (formatting). This hypothesis is supported by the following facts:
Perhaps most FP Move operations are nothing other than more or less fixed pairs of primitive commands such as “load + convert” or “convert + store”, where the “convert” part takes about 6-7 clock cycles.

Returning to replay: in this (hypothetical) case the time required to execute “convert” exceeds the “distance” in clock cycles between the scheduler and the execution unit. So, the scheduler can safely send the dependent operation further according to the result of the first check. In case of failure, only the “load + convert” pair will need to be replayed.
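The FP-load latency rule stated earlier (9 clock cycles on an L1 hit, plus n*7 or n*12 clock cycles otherwise) can be written down as a small model. This is only an illustrative sketch: the function and constant names are hypothetical, and reading n as the number of passes through the replay loop is our assumption, not something measured here.

```python
# Illustrative model of the FP-load latency rule quoted in the article.
# All names are hypothetical; "replays" stands for the article's n, which we
# assume to be the number of passes through the replay loop.

L1_HIT_LATENCY = 9  # clock cycles for an FP-load that hits the L1 data cache

# Length of each replay loop in clock cycles, taken from the loop names.
REPLAY_LOOP_CYCLES = {"RL-7": 7, "RL-12": 12}


def fp_load_latency(replays: int, loop: str = "RL-7") -> int:
    """Estimate FP-load latency after `replays` passes through `loop`."""
    if replays < 0:
        raise ValueError("replays must be non-negative")
    if loop not in REPLAY_LOOP_CYCLES:
        raise ValueError(f"unknown replay loop: {loop}")
    return L1_HIT_LATENCY + replays * REPLAY_LOOP_CYCLES[loop]


if __name__ == "__main__":
    for loop in REPLAY_LOOP_CYCLES:
        for n in range(3):
            print(f"{loop}, n={n}: {fp_load_latency(n, loop)} cycles")
```

With n = 0 the model reduces to the plain L1-hit latency of 9 cycles; each extra pass adds one full loop length.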