
Sideband Stack Optimizer

The decoder of the new K10 processors has acquired a special block called the Sideband Stack Optimizer. Its working principle is similar to that of the new Stack Pointer Tracker unit employed in Core processors. What is it needed for? x86 code uses the CALL, RET, PUSH and POP instructions to call a function, return from a function, pass function parameters and save register contents. All these instructions use, though implicitly, the ESP register, which holds the current stack-pointer value. When a function is called on a K8 processor, you can follow the execution of these instructions by representing their decoding as a sequence of equivalent elementary operations: modifications of the stack register plus load/store instructions.

Instructions            Equivalent operations

// func(x, y, z);
push X                  sub esp, 4; mov [esp], X
push Y                  sub esp, 4; mov [esp], Y
push Z                  sub esp, 4; mov [esp], Z
call func               sub esp, 4; mov [esp], eip; jmp func

push esi                sub esp, 4; mov [esp], esi
push edi                sub esp, 4; mov [esp], edi
mov eax, [esp+16]       mov eax, [esp+16]
........                ..............
pop edi                 mov edi, [esp]; add esp, 4
pop esi                 mov esi, [esp]; add esp, 4
ret                     jmp [esp]; add esp, 4

add esp, 12             add esp, 12
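The K8-style expansion in the table above can be traced with a small simulation. The following is only a sketch of a toy stack machine, not a model of the real hardware: the helper names `expand` and `run`, the starting ESP value of 0x1000 and the use of symbolic values in memory are all assumptions made for illustration. It counts how many times ESP is rewritten, since each rewrite depends on the previous one.

```python
# Sketch only: a toy stack machine with symbolic values, not real K8 behavior.
def expand(instr):
    """Expand one stack instruction into the elementary operations from the table."""
    op, *args = instr.split()
    if op == "push":
        return [("sub_esp", 4), ("store", args[0])]
    if op == "pop":
        return [("load", args[0]), ("add_esp", 4)]
    if op == "call":
        return [("sub_esp", 4), ("store", "eip")]  # the jump itself is not modeled
    if op == "ret":
        return [("load", "eip"), ("add_esp", 4)]   # the jump itself is not modeled
    raise ValueError(f"unsupported instruction: {instr}")


def run(program, esp=0x1000):
    """Execute the expanded operations in order.

    Returns the final esp, the memory contents, the loaded registers and the
    number of ESP updates, each of which depends on the one before it.
    """
    memory, regs, esp_updates = {}, {}, 0
    for instr in program:
        for kind, arg in expand(instr):
            if kind == "sub_esp":
                esp -= arg
                esp_updates += 1
            elif kind == "add_esp":
                esp += arg
                esp_updates += 1
            elif kind == "store":
                memory[esp] = arg        # store a symbolic value at [esp]
            elif kind == "load":
                regs[arg] = memory[esp]  # load [esp] into the named register


    return esp, memory, regs, esp_updates


prologue = ["push X", "push Y", "push Z", "call func", "push esi", "push edi"]
esp, memory, regs, esp_updates = run(prologue)
print(hex(esp), esp_updates, memory[esp + 16])  # prints: 0xfe8 6 Y
```

In this toy model six serial ESP updates have to complete before the address in mov eax, [esp+16] is known.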

As this example shows, when a function is called, the instructions modify the ESP register one after another, so each instruction implicitly depends on the result of the previous one. The instructions in this chain cannot be reordered, which is why execution of the function body, starting with mov eax, [esp+16], cannot begin until the last PUSH instruction has been executed. The Sideband Stack Optimizer unit tracks changes to the stack state and turns this chain into an independent one: it adjusts the stack offset for each instruction and places sync-MOP operations (top-of-stack synchronization) in front of the instructions that work directly with the stack register. This way, the stack instructions (PUSH, POP, CALL, RET) can be reordered without any limitations.

Instructions            Equivalent operations

// func(x, y, z);
push X                  mov [esp-4], X
push Y                  mov [esp-8], Y
push Z                  mov [esp-12], Z
call func               mov [esp-16], eip; jmp func

push esi                mov [esp-20], esi
push edi                mov [esp-24], edi
                        sub esp, 24             <- sync-MOP
mov eax, [esp+16]       mov eax, [esp+16]
........                ..............
pop edi                 mov edi, [esp]
pop esi                 mov esi, [esp+4]
ret                     jmp [esp+8]

                        add esp, 12             <- sync-MOP
add esp, 12             add esp, 12
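The rewriting shown in the table above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not AMD's actual implementation: the function name `optimize`, the `delta` counter and the rule "any other instruction that mentions esp triggers a sync" are hypothetical, and real hardware has limits that are not modeled here.

```python
def optimize(program):
    """Rewrite stack instructions with adjusted offsets and sync operations.

    `delta` is the pending ESP adjustment tracked outside the register itself.
    Implicit stack instructions only update `delta`; an instruction that names
    esp explicitly first gets a sync operation that applies the pending value.
    """
    out, delta = [], 0

    def addr():
        # Address of the current top of stack relative to the unsynced esp.
        return "[esp]" if delta == 0 else f"[esp{delta:+d}]"

    def sync():
        nonlocal delta
        if delta:
            op = "sub" if delta < 0 else "add"
            out.append(f"{op} esp, {abs(delta)}  ; sync-MOP")
            delta = 0

    for instr in program:
        op, _, rest = instr.partition(" ")
        if op == "push":
            delta -= 4
            out.append(f"mov {addr()}, {rest}")
        elif op == "pop":
            out.append(f"mov {rest}, {addr()}")
            delta += 4
        elif op == "call":
            delta -= 4
            out.append(f"mov {addr()}, eip; jmp {rest}")
        elif op == "ret":
            out.append(f"jmp {addr()}")
            delta += 4
        else:
            if "esp" in instr:   # assumption: explicit use of esp needs a sync
                sync()
            out.append(instr)
    return out


program = [
    "push X", "push Y", "push Z", "call func",
    "push esi", "push edi", "mov eax, [esp+16]",
    "pop edi", "pop esi", "ret", "add esp, 12",
]
for line in optimize(program):
    print(line)
```

None of the rewritten stores and loads modifies esp, so in this sketch only the two sync operations and the final add esp, 12 remain on the ESP dependency chain.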

The mov eax, [esp+16] instruction, which starts the calculations in the function body in our example, now depends only on the sync-MOP operation. The stack operations can be performed simultaneously with the instructions preceding them. As a result, parameter passing and register saving happen faster, and the function body can start loading the parameters and working with them before all of them have been passed and before register saving has completed.

So, faster decoding of stack operations, the Sideband Stack Optimizer unit, a deeper return-address stack and successful prediction of alternating indirect branches make K10 much more efficient at processing function-rich code.

The K10 decoder will not be able to decode 4 instructions per clock, as the Core 2 decoder can in favorable conditions, but it will not become a bottleneck during program execution. The average instruction processing rate hardly ever reaches 3 instructions per clock, so the K10 decoder will be efficient enough to keep the queues of the execution units supplied with instructions and prevent the units from idling.

Instruction Control Unit

Decoded MOP triplets arrive at the Instruction Control Unit (ICU), which moves them to the reorder buffer (ROB). The reorder buffer consists of 24 lines of three MOPs each. Each MOP triplet is written to its own line. As a result, the ROB allows the control unit to track the status of up to 72 MOPs until they retire.

From the reorder buffer, MOPs are dispatched to the scheduler queues of the integer and floating-point units in exactly the same order in which they left the decoder. MOP triplets stay in the ROB until all older operations have been executed and retired. On retirement, the final values are written to the architectural registers and memory. Operations retire in the program order in which they were placed into the ROB: their data is deleted from the ROB and the final values are saved. This is necessary so that the results of all later operations completed ahead of time can be canceled in case of an exception or interrupt.
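The in-order retirement described here can be illustrated with a toy model. The sketch below takes only the figures of 24 lines and three MOPs per line from the text; the class name `ReorderBuffer`, its methods and the assumption that a whole triplet retires as one unit are hypothetical simplifications.

```python
from collections import deque

ROB_LINES = 24      # from the text: 24 lines
MOPS_PER_LINE = 3   # from the text: three MOPs per line


class ReorderBuffer:
    """Toy reorder buffer: triplets enter and leave in program order."""

    def __init__(self):
        self.lines = deque()  # the oldest triplet sits at the left end

    def capacity_mops(self):
        return ROB_LINES * MOPS_PER_LINE

    def dispatch(self, triplet):
        """Allocate one line for a decoded triplet; return False if the buffer is full."""
        if len(self.lines) >= ROB_LINES:
            return False
        self.lines.append({"mops": list(triplet), "done": set()})
        return True

    def complete(self, mop):
        """Mark a MOP as executed; completion may happen in any order."""
        for line in self.lines:
            if mop in line["mops"]:
                line["done"].add(mop)
                return
        raise KeyError(mop)

    def retire(self):
        """Retire finished triplets from the oldest end only; return the retired MOPs."""
        retired = []
        while self.lines and len(self.lines[0]["done"]) == len(self.lines[0]["mops"]):
            retired.extend(self.lines.popleft()["mops"])
        return retired


rob = ReorderBuffer()
rob.dispatch(["a1", "a2", "a3"])
rob.dispatch(["b1", "b2", "b3"])
for mop in ["b1", "b2", "b3"]:   # the younger triplet finishes first
    rob.complete(mop)
print(rob.retire())              # prints: []
for mop in ["a1", "a2", "a3"]:
    rob.complete(mop)
print(rob.retire())              # prints: ['a1', 'a2', 'a3', 'b1', 'b2', 'b3']
```

Because the younger triplet cannot retire before the older one, its results can still be discarded if the older triplet raises an exception.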

 