
Pages: [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 ]

Livelocks

Here we would like to dwell on one more not-so-obvious problem caused by replay. When we discussed how replay works, we mentioned several times that the scheduler stalls when a command sent to replay approaches the replay mux. It is quite possible that the entire chain of commands the scheduler sent to the pipeline before it stalled will also be turned to replay. In this case the whole chain of re-executed commands keeps circling in the replay system until the requested data arrives. But what happens if the instruction that must execute before the chain in replay can complete successfully is still in the scheduler and cannot be issued, because the entire pipeline is full? This situation is called a livelock. One example is the case when STD is not ready during store-to-load forwarding. Here is what it looks like in program code:

IMUL EAX
MOV [ESI], EAX
MOV EBX, [ESI]
14*{ AND EBX, EBX } // and ebx, ebx, repeated 14 times

The store-to-memory command is split into two quasi-independent micro-operations, STA (store address) and STD (store data). These micro-operations go to different schedulers: STA goes to the same scheduler as the LD command, and STD goes to the same scheduler as the AND commands. This distribution of micro-operations is a perfect recipe for a livelock. This is how it happens:

  • While EAX is being calculated, the Mem scheduler sends the STA micro-operation of the MOV [ESI], EAX command for execution, since the ESI value is already available.
  • After STA, the Mem scheduler sends for execution the LD micro-operation corresponding to MOV EBX, [ESI], hoping that by the time LD is executed the STA/STD data will already be available.
  • Two clocks after the LD micro-operation has been sent, the Fast_0 scheduler starts speculatively sending a chain of AND EBX, EBX commands to FastALU0 for execution. STD cannot be sent at this point, because the EAX data is not ready yet.
  • Since STD is not ready, LD reaches the result check stage and is turned to replay.
  • Following LD, the entire chain of AND commands goes to replay in its own execution pipeline.
  • Since the FastALU0 scheduler knows nothing about this “dirty trick”, it keeps sending AND commands to its execution pipeline until it has to stop to let in the AND commands returning for re-execution.
  • The FastALU0 pipeline ends up packed with circling AND commands, which cannot execute correctly until LD does. LD, in its turn, is waiting for STD, which could be sent once EAX has been calculated, but never will be, because the pipeline is packed with AND commands. This is a typical livelock.


Pic.8: Livelock
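
The sequence above can be illustrated with a toy model. The sketch below is not a description of the real hardware: the function name simulate, the replay loop depth of 14 slots, the IMUL latency of 20 clocks and the rule that a replayed command always takes priority over a new one are all assumptions made purely for illustration, and the LD replay in the memory pipeline is reduced to a single flag. The model also has no escape mechanism, so unlike the real processor it never recovers once the loop is full.

```python
from collections import deque


def simulate(num_ands, loop_slots=14, imul_latency=20, max_cycles=500):
    """Toy model of the FastALU0 replay loop (all parameters hypothetical).

    Returns the cycle at which the last AND completes, or None if the
    AND chain is still circling after max_cycles (a livelock).
    """
    waiting = deque(["AND"] * num_ands)   # ANDs still held by the scheduler
    loop = deque([None] * loop_slots)     # slots circling through the pipeline
    std_sent = False                      # STD has left the scheduler
    std_done = False                      # store data is available
    ld_done = False                       # LD passed its result check
    completed = 0

    for cycle in range(max_cycles):
        eax_ready = cycle >= imul_latency  # IMUL result available
        if std_done:
            ld_done = True                 # LD succeeds once STD has executed

        uop = loop.popleft()               # slot arriving at the replay mux
        if uop == "AND":
            if ld_done:
                completed += 1             # correct input: AND completes
                uop = None
            # otherwise the AND stays in its slot and is replayed
        elif uop == "STD":
            std_done = True
            uop = None

        if uop is None:                    # free slot: scheduler may issue
            if eax_ready and not std_sent:
                uop, std_sent = "STD", True
            elif waiting:
                uop = waiting.popleft()    # speculative AND

        loop.append(uop)
        if completed == num_ands:
            return cycle
    return None


if __name__ == "__main__":
    for n in (13, 14):
        result = simulate(n)
        state = "livelock" if result is None else f"done at cycle {result}"
        print(f"{n} AND commands: {state}")
```

In this model a chain of 14 AND commands fills every slot before EAX is ready, so STD never finds a free slot, while a chain of 13 leaves one slot open and the sequence completes. The same happens if EAX is ready from the start (imul_latency=0), because STD is then issued before the AND commands.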

Of course, the developers never intended to design a processor that could be locked dead by a particular chain of commands: this problem had to be solved. In fact, our experiments showed that code execution in examples such as the one above resumes after a few dozen clock cycles. We have to confess that we do not know all the details of the pipeline clearing mechanisms, but one possible solution will be discussed in detail in the next chapter.

 
Pages: [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 ]
