Intel Haswell on Track for Release in 2013, Will Sport New Instructions to Boost Wide Range of Apps
Intel Publishes Specs of Haswell New Instructions
by Anton Shilov
06/26/2011 | 03:13 PM
Intel Corp. earlier this month published the first details about its next-generation micro-architecture code-named Haswell. The forthcoming chips will feature new instructions that accelerate a wide range of applications and workloads.
Intel’s Ivy Bridge and Haswell microprocessors are projected to offer increased performance at moderate power consumption. Those chips will enable a new class of mobile computers that Intel calls ultrabooks. With Haswell, Intel promises to change the mainstream laptop thermal design point by reducing microprocessor power to half of today’s design point, which means roughly 15W - 18.5W. The chips are also expected to offer much improved integrated graphics cores.

This new set of instructions builds upon the instructions coming in the Intel micro-architecture code-named Ivy Bridge, including the digital random number generator and half-float (float16) accelerators, and extends the Intel Advanced Vector Extensions (AVX) that launched in 2011.
The instructions fit into several categories:
- AVX2 - Integer data types expanded to 256-bit SIMD. AVX2’s integer support is particularly useful for processing visual data commonly encountered in consumer imaging and video processing workloads. With Haswell, Intel will have AVX for floating point, and AVX2 for integer data types.
- Bit manipulation instructions are useful for compressed databases, hashing, large-number arithmetic, and a variety of general-purpose codes.
- Gather – useful for vectorizing codes with nonadjacent data elements. Haswell gathers are masked for safety (like the conditional loads and stores introduced in Intel AVX), which favors their use in codes with clipping or other conditionals.
- Any-to-Any permutes – useful shuffling operations. Haswell adds support for DWORD and QWORD granularity permutes across an entire 256-bit register.
- Vector-Vector Shifts: Intel added shifts whose shift counts are themselves supplied as a vector, one count per element. These are critical in vectorizing loops with variable shifts.
- Floating Point Multiply Accumulate – Intel’s floating-point multiply accumulate instructions significantly increase peak flops and provide improved precision, which further benefits transcendental mathematics. They are broadly usable in high performance computing, professional quality imaging, and face detection. They operate on scalar, 128-bit packed single and double precision data types, and 256-bit packed single and double-precision data types.
- The vector instructions build upon the expanded (256-bit) register state added in Intel AVX, and as such are supported by any operating system that supports Intel AVX.
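
To make three of the categories above more concrete, here is a minimal scalar sketch in C of what a masked gather, an any-to-any DWORD permute, and a vector-vector shift compute. It is a reference model only, not Intel's implementation and not how production code would be written (that would use the compiler's vector intrinsics instead of loops). The function names, the byte-per-lane mask, and the choice of eight 32-bit lanes to stand in for one 256-bit register are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* assumption: eight 32-bit elements model one 256-bit register */

/* Masked gather (hypothetical helper): each lane whose mask byte is
 * nonzero loads base[index[i]]; a masked-off lane keeps its old dst
 * value and performs no memory access at all. */
void gather_masked_i32(int32_t dst[LANES], const int32_t *base,
                       const int32_t index[LANES],
                       const uint8_t mask[LANES])
{
    assert(base != NULL);
    for (size_t i = 0; i < LANES; ++i) {
        if (mask[i]) {
            dst[i] = base[index[i]];
        }
    }
}

/* Any-to-any DWORD permute (hypothetical helper): every output lane
 * may pick any input lane. Only the low 3 bits of each selector are
 * used, so a selector can never point outside the register. */
void permute_any_u32(uint32_t dst[LANES], const uint32_t src[LANES],
                     const uint32_t select[LANES])
{
    uint32_t tmp[LANES]; /* copy first so dst may alias src */
    for (size_t i = 0; i < LANES; ++i) {
        tmp[i] = src[select[i] & (LANES - 1)];
    }
    for (size_t i = 0; i < LANES; ++i) {
        dst[i] = tmp[i];
    }
}

/* Vector-vector left shift (hypothetical helper): lane i is shifted
 * by count[i]. A count of 32 or more yields 0 in this model; the
 * explicit check also avoids undefined behaviour in C. */
void shift_left_variable_u32(uint32_t dst[LANES],
                             const uint32_t src[LANES],
                             const uint32_t count[LANES])
{
    for (size_t i = 0; i < LANES; ++i) {
        dst[i] = (count[i] < 32u) ? (src[i] << count[i]) : 0u;
    }
}
```

The masked gather shows why masking matters for loops with clipping or other conditionals: a lane whose index would be out of range can simply be masked off, so the load for that lane never happens. Each loop here runs one lane at a time; the point of the new instructions is that all lanes are handled by a single instruction.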