Cache Subsystem

Doubling the FLOPs is certainly great, but the challenge is that you need to be able to feed these execution units with data. Intel therefore made a significant effort to improve the cache hierarchy. The internal cache structure and cache sizes remained the same, but the bandwidth to the caches was increased, mainly to keep cache throughput adequate for the high rate at which the core executes AVX2 instructions. The read and write ports of Haswell's L1 cache are 32 bytes (256 bits) wide, so Haswell can perform two 32-byte reads and one 32-byte write in the same clock cycle. Intel also removed the restrictions around banking. Overall, the L1 cache improvements are: doubled bandwidth, eliminated bank conflicts, and significantly improved latency for loads that split across cache lines.

The bus to the L2 cache was widened as well: it can now deliver up to 64 bytes of data per clock cycle, twice as much as in Sandy Bridge and Ivy Bridge.

These cache improvements concern bandwidth only; the latencies remain the same as before. Moreover, Intel hasn't yet revealed any details about the size of the L3 cache.

As far as we know at this point, the size of Haswell's L3 cache will depend on the number of cores. Its internal structure will remain the same as before: a uni-directional ring bus with two stops per core, which we know well from CPUs based on the Sandy Bridge and Ivy Bridge microarchitectures. However, its throughput should increase, because data and non-data requests will be processed separately. Intel also promised to further optimize the memory controller, which will get deeper buffering and should therefore deliver higher write speeds.

Transactional Synchronization Extensions

AVX2 is not the only new instruction set introduced in the Haswell microarchitecture. Intel has also developed Transactional Synchronization Extensions (TSX) that add hardware transactional memory support.

Intel wants to make it easy for developers to write parallel code. TSX provides two software interfaces for designating code regions for transactional execution. Hardware Lock Elision (HLE) is an instruction-prefix-based interface designed to be backward compatible with processors without TSX support. Restricted Transactional Memory (RTM) is a new instruction-set interface that gives programmers greater flexibility. TSX enables optimistic execution of transactional code regions: the hardware monitors multiple threads for conflicting memory accesses, and aborts and rolls back transactions that cannot complete successfully. Mechanisms are provided for software to detect and handle failed transactions.

