Search<%BANNER[mem130]%>
<%BANNER[left_130x300]%>
<%BANNER[left_130x130_2]%>
InformationX-bit Labs for mobile users! Do not forget that we are running a special version of X-bit Labs web-site for users of mobile and handheld devices: http://pda.xbitlabs.com. Check out our news and articles from smartphones and PDAs to be always updated on the latest computer and technology news. <%BANNER[right_130x600]%>
|
<%BANNER[top_768x90]%>
|
|
|
<%BANNER[banner_468x60]%>
Articles: CPU
AMD K10 Micro-Architecture (page 2)Category: CPU by Yury Malich [ 08/17/2007 | 10:06 AM ] Instructions FetchProcessor starts the code processing from fetching instructions from the L1I instruction cache and their decoding. x86 instructions have variable length, which makes it harder to determine their boundaries before decoding starts. To ensure that the identification of the instructions length doesn’t affect the decoding speed, K8/K10 processors decode instructions while the lines are being loaded into L1I cache. Instruction labeling info is stored in special fields of the L1I cache (3bits of predecoding info per each byte of instructions). By performing the predecoding during loading into cache the instructions boundaries can be determined beyond the decoding pipes, which allows maintaining steady decoding rate independent of instructions format and length. Processors load blocks of instructions from the cache and then pick out instructions that need to be sent for decoding. A CPU on K10 micro-architecture fetches instructions from the L1I cache in aligned 32-byte blocks, while K8 and Core 2 processors fetch instructions in 16-byte blocks. At 16 bytes per clock the instructions are fetched fast enough for K8 and Core 2 processors to send three instructions with the average length of 5 bytes for decoding every clock cycle. However, some x86 instructions may be 16 bytes long and in some algorithms the length of a few adjacent instructions may be greater than 5 bytes. As a result, it is impossible to decode three instructions per clock in such cases. (Pic.1).
Namely, SSE2 – a simple instruction with operands of register-register type (for example, movapd xmm0, xmm1 ) – is 4 bytes long. However, if the instruction generates addressed memory requests using the base register and offset (for example, movapd xmm0, [eax+16] ), the instruction increases up to 6-9 bytes, depending on the offset. If additional registers are involved in 64-bit mode, there is one more single-byte REX-prefix added to the instruction code. This way, SSE2 instructions in 64-bit mode may become 7-10 bytes long. SSE1 instructions are 1 byte shorter, if it is a vector instruction (in other words, if it works on four 32-bit values). But if it is a scalar SSE1 instruction (on one operand) it can also be 7-10 bytes long in the same conditions. Fetching maximum 16-byte blocks is not a limitation for K8 processor in this case, because it cannot decode vector instructions faster than 3 per 2 clocks anyway. However, for K10 architecture a 16-byte block could become a bottleneck, so increasing the maximum fetch block size to 32 bytes is an absolutely justified measure. By the way, Core 2 processors fetch 16-byte instruction blocks, just like K8 processors, that is why they can decode efficiently 4 instructions per clock cycle if the average instruction length doesn’t exceed 4 bytes. Otherwise, the decoder will not be able to process 4 or even 3 instructions per clock efficiently enough. However, Core 2 processors feature a special internal 64-byte buffer that stores the last four requested 16-byte blocks. The instructions are fetched from this buffer at the rate 32 bytes per clock speed. This buffer allows caching short cycles, removing their fetching speed limitations and save up to 1 clock cycle each time the prediction to move to the cycle beginning is made. Although the cycles shouldn’t have more than 18 instructions, more than 4 conditional branches and no ret instructions in them. <%BANNER[banner_468x30]%>
|
Category NewsCategory: CPU Thursday, July 17, 20082:36 pm AMD’s Chief Executive Officer Hector Ruiz Steps Down. Dirk Meyer Becomes New Chief Exec of AMD 12:15 pm Intel: Atom Will Not Substitute Celeron Processors. Intel Denies Possibility to Change Celeron for Atom Wednesday, July 16, 200811:55 pm Intel Promises to Ship 100 Million 45nm Microprocessors This Year. Intel Says 45nm Process Technology Ramp Better than Ever 7:06 pm Intel to Launch Another Offence with Nehalem Microprocessors Later This Year. Intel to Aggressively Push Nehalem Micro-Architecture into High-End Desktops Tuesday, July 8, 200811:01 pm DreamWorks and Intel Sign Pact: Larrabee, Xeon Set to Be Used. DreamWorks Switches from AMD to Intel 6:07 pm AMD Loses Microprocessor Revenue Share to Intel – iSuppli. AMD, Intel Continue to Gain CPU Revenue Share All Latest News <%BANNER[right_130x130_1]%>
|
|
<%BANNER[foot_728x90]%> | ||
