To sum up, replay is not an independent feature of the NetBurst architecture designed to increase the processor's performance. It is better regarded as the flip side of the longer pipeline: an auxiliary mechanism intended to help resolve speculation issues. The performance drop caused by replay is the price we have to pay for a high clock frequency. That is probably why the replay system is so rarely mentioned in Intel's official manuals and documents. Replay very often causes an unjustified waste of resources and significant performance drops.
Since the official documentation does not describe the causes and consequences of replay, many software developers simply have no idea that it exists and thus cannot optimize their programs for it. It is definitely good news that Prescott processors acquired replay queues, which reduce the negative influence of replay on two threads running in parallel when HT technology is enabled. However, even in this case we see a 20% performance drop in the second thread caused by replay in the first one.