Those of you who have studied the NetBurst micro-architecture carefully will know that a store operation is split into two quasi-independent micro-operations: Store Data (STD) and Store Address (STA). The results of these two micro-operations are combined in the Store Buffer (SB). The data stays in the SB until the store instruction retires. After that it is written to the cache and RAM via the intermediate Write Buffer, where the modified data is put back together. Load instructions can speculatively read data that has not reached the cache yet directly from the SB. This process is called store-to-load forwarding (STLF).
For STLF to succeed, certain conditions must be fulfilled:
- The data requested by the load can be taken either from the SB or from the L1D, but never from a combination of both, so the size and the address of the loaded data must correspond to the SB entry.
- By the time the load executes, the SB must already hold the correct results of STA and STD.
Violating either of these conditions may lead to very unpleasant delays. A load that could not complete because of an STLF violation is re-executed via the replay system.
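The two conditions can be expressed as a small decision function. The sketch below is only a model for illustration, not a description of the actual hardware logic: the `sb_entry` structure, the `load_outcome` names and the `classify_load` function are hypothetical. It also assumes that "corresponding" to the SB entry means the load starts at the same address as the store and is no wider than the store, and it simplifies the case of a missing STA result by always sending the load to replay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one Store Buffer entry; field names are illustrative. */
typedef struct {
    uint32_t addr;      /* start address produced by STA */
    uint32_t size;      /* store width in bytes */
    bool     sta_done;  /* STA result has arrived in the SB */
    bool     std_done;  /* STD result has arrived in the SB */
} sb_entry;

typedef enum {
    LOAD_FROM_CACHE,    /* no overlap with the store: L1D supplies everything */
    LOAD_FORWARDED,     /* STLF succeeds: the SB supplies everything */
    LOAD_WAIT_REPLAY,   /* STA or STD not ready yet: the load is replayed */
    LOAD_BLOCKED        /* data would have to be merged from SB and L1D */
} load_outcome;

load_outcome classify_load(const sb_entry *st, uint32_t ld_addr, uint32_t ld_size)
{
    assert(st->size > 0 && ld_size > 0);

    /* Simplification: without the store address nothing can be compared. */
    if (!st->sta_done)
        return LOAD_WAIT_REPLAY;

    /* 64-bit arithmetic avoids wrap-around at the top of the address space. */
    uint64_t st_end = (uint64_t)st->addr + st->size;
    uint64_t ld_end = (uint64_t)ld_addr + ld_size;
    bool overlap = ld_addr < st_end && st->addr < ld_end;

    if (!overlap)
        return LOAD_FROM_CACHE;

    /* Assumed matching rule: same start address, load not wider than store. */
    if (ld_addr != st->addr || ld_size > st->size)
        return LOAD_BLOCKED;

    /* Address matches, but the data itself may still be on its way. */
    return st->std_done ? LOAD_FORWARDED : LOAD_WAIT_REPLAY;
}
```

With a completed 4-byte store at 0x1000, a 4-byte load from 0x1000 is forwarded, a 4-byte load from 0x1002 is blocked because it would need bytes from both the SB and the L1D, and the same matching load waits in replay if STD has not arrived yet.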
So, for a store to be able to forward its data to a load, the SB must have the STA and STD results ready in advance. Although the “IA-32 Intel Architecture Optimization Reference Manual” classifies this condition under “Store Forwarding Restrictions”, we should realize that it is handled in its own way and has its own consequences. A true STLF violation, when data already located in the SB cannot be forwarded, results in a significant delay: the store and all preceding instructions must retire first, and the store result must be saved in the cache. In our case, i.e. with the STLF restriction on data availability, we only have to wait for the STA/STD result. As you may have already guessed, replay is at work here: the LD and all dependent instructions are sent to the RL and circulate there until the store results arrive.
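The circulation in the RL can be pictured with a toy model. This is a sketch under stated assumptions rather than a measurement: the function name `replay_until_ready`, the `replay_result` structure and the `loop_latency` parameter (the number of cycles one pass through the RL takes) are hypothetical, and no real NetBurst latency values are implied. The point it illustrates is that the load can only complete on one of its passes, so its completion time is quantized by the length of the loop.

```c
#include <assert.h>

/* Hypothetical result of the toy model. */
typedef struct {
    int replays;     /* how many extra passes through the RL were needed */
    int done_cycle;  /* cycle at which the load finally gets its data */
} replay_result;

/* The load first executes at issue_cycle. If the STA/STD results are not
 * in the SB yet (they arrive at store_ready_cycle), the load makes another
 * pass through the RL, which costs loop_latency cycles, and tries again. */
replay_result replay_until_ready(int issue_cycle, int store_ready_cycle,
                                 int loop_latency)
{
    assert(loop_latency > 0);

    replay_result r = { 0, issue_cycle };
    while (r.done_cycle < store_ready_cycle) {
        r.done_cycle += loop_latency;  /* one more trip around the loop */
        r.replays++;
    }
    return r;
}
```

For example, with an assumed loop length of 7 cycles, a load issued at cycle 0 whose store results arrive at cycle 15 makes three passes and completes at cycle 21, six cycles after the data actually became available.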
We have just studied two types of STLF violations: when either the STA or the STD result is not ready by the time the load should execute (the third type, when neither result is ready, is determined by the worse of the consequences of the first two types).