Pages: [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 ]

### “Holes”

Because the distance between operations returned to the replay system always stays the same, empty, unoccupied stages sometimes appear between them. The Checker never sends the stop signal to the scheduler for these stages. Let’s call them “holes”. The scheduler can dispatch a command for execution if the clock cycle coincides with a hole. This lets the processor use its computational resources more efficiently, because commands from different dependency chains can be interleaved. Let’s take a look at the following example:

LD R1, [X] // load X into R1
ADD R1, R2 //1 – R1 = R1+R2
ADD R1, R2 //2 – R1 = R1+R2
ADD R1, R2 //3 – R1 = R1+R2
ADD R1, R2 //4 – R1 = R1+R2
ADD R1, R2 //5 – R1 = R1+R2
LD R3, [Y] // load Y into R3
ADD R3, R4 //6 – R3 = R3+R4
ADD R3, R4 //7 – R3 = R3+R4
ADD R3, R4 //8 – R3 = R3+R4
...

We’ve got two chains of dependent commands: one through register R1 and one through register R3. To simplify the example, suppose that commands of all types are sent to the same scheduler one by one, so the LD R3, [Y] command cannot be scheduled for execution before the fifth ADD R1, R2 command. Let’s assume an L1 load latency of 2 clock cycles and an ADD latency of one clock cycle.
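To make the latency arithmetic concrete, here is a minimal sketch of the issue timeline for the L1-hit situation. It is a toy model, not a description of the real scheduler: it assumes strictly in-order, single-issue dispatch, and the names `Op`, `in_order_timeline`, `LD_X` and `LD_Y` are hypothetical labels chosen for this illustration (the load of X into R1 is inferred from the text's mention of X and Y).

```python
from typing import NamedTuple, Optional


class Op(NamedTuple):
    name: str
    dest: str        # register written by the op
    srcs: tuple      # registers read by the op
    latency: int     # cycles until the result is available


def in_order_timeline(ops) -> list[tuple[int, Optional[str]]]:
    """Issue ops strictly in order, one per cycle at most.

    Returns (cycle, name) pairs; name is None for a cycle in which
    nothing could be issued, i.e. a "hole".
    """
    ready_at = {}    # register -> cycle at which its value is available
    timeline = []
    cycle = 0
    for op in ops:
        # The op waits until all of its source registers are ready.
        start = max([cycle] + [ready_at.get(r, 0) for r in op.srcs])
        for idle in range(cycle, start):
            timeline.append((idle, None))
        timeline.append((start, op.name))
        ready_at[op.dest] = start + op.latency
        cycle = start + 1
    return timeline


LOAD_LATENCY = 2   # L1 hit, as assumed in the text
ADD_LATENCY = 1

program = (
    [Op("LD_X", "R1", (), LOAD_LATENCY)]
    + [Op(f"ADD{i}", "R1", ("R1", "R2"), ADD_LATENCY) for i in range(1, 6)]
    + [Op("LD_Y", "R3", (), LOAD_LATENCY)]
    + [Op(f"ADD{i}", "R3", ("R3", "R4"), ADD_LATENCY) for i in range(6, 9)]
)

timeline = in_order_timeline(program)
for cycle, name in timeline:
    print(cycle, name or "(hole)")
```

Under these assumptions, an idle cycle appears right after each load, because the first dependent ADD has to wait for the 2-cycle load result.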

Here are two cases to be considered:

Pic.4a

Pic.4b

1. The X and Y values are in the L1 cache (L1 hit, Pic.4a). No surprises here: no commands go to the replay system, and all commands retire one by one.

This example shows that the scheduler tries to use the computational resources more efficiently by filling the “hole” between the LD and ADD1 commands: it inserts the ADD6 command from the independent chain there. Unfortunately, there is a downside to this hunt for efficiency. Let’s talk more about it now.
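The hole-filling idea can be pictured with a very small sketch. It models the stream of operations coming back from the replay system as a list of slots, where `None` marks a hole, and lets the scheduler dispatch one waiting command into each hole. The function name `fill_holes` and the slot layout are assumptions made for illustration only; the sketch also assumes that every waiting command is ready to execute, which the real hardware cannot take for granted.

```python
def fill_holes(replay_slots, scheduler_queue):
    """Merge replayed ops with new ops from the scheduler.

    replay_slots: one entry per clock cycle; a string is a replayed op
                  (the scheduler is stopped that cycle), None is a hole.
    scheduler_queue: ops waiting in the scheduler, oldest first.
    Returns the op sent to execution in each cycle (None if nothing is sent).
    """
    waiting = list(scheduler_queue)
    issued = []
    for slot in replay_slots:
        if slot is not None:
            issued.append(slot)            # replayed op occupies this cycle
        elif waiting:
            issued.append(waiting.pop(0))  # hole: the scheduler may dispatch
        else:
            issued.append(None)            # hole stays empty
    return issued


print(fill_holes(["LD", None, "ADD1", "ADD2"], ["ADD6"]))
```

In this toy run, the single hole between LD and ADD1 is taken by ADD6, so commands from the two dependency chains end up interleaved.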
