

Intel Corp. this week published details of its new Advanced Vector Extensions 512 (Intel AVX-512) instructions, which will first appear in the Xeon Phi “Knights Landing” co-processors and later in future microprocessors. The new instructions enable processing of twice the number of data elements that AVX/AVX2 can process and represent a significant leap to 512-bit SIMD support.

512-Bit AVX Instructions Incoming

Intel AVX-512 instructions are important because they offer higher performance for the most demanding computational tasks. Intel AVX-512 instructions offer the highest degree of compiler support by including an unprecedented level of richness in the design of the instructions. Programs can pack eight double precision or sixteen single precision floating-point numbers, or eight 64-bit integers, or sixteen 32-bit integers within the 512-bit vectors. This enables processing of twice the number of data elements that AVX/AVX2 can process with a single instruction and four times that of SSE.
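The packing figures above follow directly from dividing the vector width by the element width. A minimal sketch of that arithmetic, where the `lanes` helper and the two tables are illustrative names rather than anything from Intel's documentation:

```python
# Element widths in bits for the data types named in the article.
ELEMENT_BITS = {"double": 64, "float": 32, "int64": 64, "int32": 32}

# Vector register widths in bits for each instruction-set generation.
VECTOR_BITS = {"SSE": 128, "AVX/AVX2": 256, "AVX-512": 512}


def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector."""
    return vector_bits // element_bits


if __name__ == "__main__":
    for isa, vbits in VECTOR_BITS.items():
        counts = {name: lanes(vbits, ebits) for name, ebits in ELEMENT_BITS.items()}
        print(isa, counts)
```

For any element type, the 512-bit count is twice the 256-bit count and four times the 128-bit count, which is where the "twice AVX/AVX2, four times SSE" claim comes from.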

Intel AVX-512 features include 32 vector registers each 512 bits wide, eight dedicated mask registers, 512-bit operations on packed floating-point data or packed integer data, embedded rounding controls (which override global settings), embedded broadcast, embedded floating-point fault suppression, embedded memory fault suppression, new operations, additional gather/scatter support, high-speed math instructions, compact representation of large displacement values, and the ability to have optional capabilities beyond the foundational capabilities. It is interesting to note that the 32 ZMM registers represent 2 KB of register space (32 registers of 64 bytes each).
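Two of the listed features, mask registers and embedded broadcast, are easier to picture with a toy model. The sketch below is plain Python operating on lists, not real vector code. The function names are hypothetical, and the two masking behaviours shown (keeping the old destination lane, or zeroing it) are a simplified reading of how per-lane masking works.

```python
ZMM_COUNT = 32   # vector registers, per the article
ZMM_BITS = 512   # bits per register


def register_file_bytes(count: int = ZMM_COUNT, bits: int = ZMM_BITS) -> int:
    """Total architectural vector register space in bytes."""
    return count * bits // 8


def broadcast(value, lane_count: int) -> list:
    """Model of embedded broadcast: one scalar replicated to every lane."""
    return [value] * lane_count


def masked_add(dst, a, b, mask: int, zeroing: bool = False) -> list:
    """Per-lane add controlled by a bit mask.

    Lane i is computed when bit i of `mask` is set. Otherwise the lane
    keeps the old destination value (merging) or becomes 0 (zeroing).
    """
    assert len(dst) == len(a) == len(b)
    out = []
    for i, (old, x, y) in enumerate(zip(dst, a, b)):
        if (mask >> i) & 1:
            out.append(x + y)
        else:
            out.append(0 if zeroing else old)
    return out
```

With eight 64-bit lanes, `masked_add([9] * 8, list(range(8)), broadcast(10, 8), 0b00001111)` updates only the low four lanes and leaves the rest at 9. `register_file_bytes()` reproduces the 2 KB figure.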

Intel AVX-512 will be first implemented in the future Intel Xeon Phi processor and coprocessor known by the code name Knights Landing, and will also be supported by some future Xeon processors scheduled to be introduced after Knights Landing. Intel AVX-512 brings the capabilities of 512-bit vector operations, first seen in the first Xeon Phi co-processors (previously code named Knights Corner), into the official Intel instruction set in a way that can be utilized in processors as well. Intel AVX-512 offers some improvements and refinement over the 512-bit SIMD found on Knights Corner that I've seen bring smiles to compiler writers and application developers alike. This is done in a way that offers source code compatibility for almost all applications with a simple recompile or relinking to libraries with Knights Landing support.

512-Bit Instructions Detailed

Intel AVX-512 offers a level of compatibility with AVX that is stronger than prior transitions to new widths for SIMD operations. Unlike SSE and AVX, which cannot be mixed without performance penalties, AVX and Intel AVX-512 instructions can be mixed without penalty. AVX registers YMM0–YMM15 map into the Intel AVX-512 registers ZMM0–ZMM15, very much like SSE registers map into AVX registers. Therefore, in processors with Intel AVX-512 support, AVX and AVX2 instructions operate on the lower 128 or 256 bits of the first 16 ZMM registers.
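The register mapping described above can be pictured as one 512-bit value with narrower views onto its low bits. This is a toy model using Python integers. The `VectorRegister` class is hypothetical, and it covers reads only, not what happens to the upper bits on a write.

```python
ZMM_BITS, YMM_BITS, XMM_BITS = 512, 256, 128


def low_bits(value: int, width: int) -> int:
    """Keep only the lowest `width` bits of `value`."""
    return value & ((1 << width) - 1)


class VectorRegister:
    """One of the first 16 registers: a 512-bit value with aliased views."""

    def __init__(self, value: int = 0):
        self.zmm = low_bits(value, ZMM_BITS)

    @property
    def ymm(self) -> int:
        """What an AVX/AVX2 instruction sees: the low 256 bits."""
        return low_bits(self.zmm, YMM_BITS)

    @property
    def xmm(self) -> int:
        """What an SSE-width operand sees: the low 128 bits."""
        return low_bits(self.zmm, XMM_BITS)
```

All three views read the same underlying storage, so there is no separate YMM or XMM register file to keep in sync.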

Intel AVX instructions use the VEX prefix, while Intel AVX-512 instructions use the EVEX prefix, which is one byte longer. The EVEX prefix enables the additional functionality of Intel AVX-512. In general, if the extra capabilities of the EVEX prefix are not needed, the AVX2 instructions can be used instead, coded with the VEX prefix and saving a byte in certain cases. Such optimizations can be done automatically in compiler code generators or assemblers.
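The selection rule a code generator might apply can be sketched as a simple predicate. Everything here is a hypothetical simplification: `InstructionNeeds` and `choose_prefix` are made-up names, and a real encoder weighs more factors, such as instructions that exist in only one encoding. The sketch assumes only that 512-bit width, registers above index 15, masking, broadcast and embedded rounding all require EVEX.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionNeeds:
    """Hypothetical summary of what one instruction requires."""
    vector_bits: int = 256             # operand width in bits
    highest_register: int = 0          # largest vector register index used
    uses_mask: bool = False            # per-lane masking
    uses_broadcast: bool = False       # embedded broadcast
    uses_embedded_rounding: bool = False


def needs_evex(n: InstructionNeeds) -> bool:
    """True when any EVEX-only capability is requested."""
    return (
        n.vector_bits > 256
        or n.highest_register > 15
        or n.uses_mask
        or n.uses_broadcast
        or n.uses_embedded_rounding
    )


def choose_prefix(n: InstructionNeeds) -> str:
    """Prefer the shorter VEX form unless an EVEX feature is required."""
    return "EVEX" if needs_evex(n) else "VEX"
```

A plain 256-bit operation on low registers stays VEX-encoded, while the same operation with a mask or on register 20 is promoted to EVEX.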

Intel AVX-512 foundation instructions will be included in all implementations of Intel AVX-512. Products may also include capabilities that extend Intel AVX-512 and have distinct CPUID bits for detection. Knights Landing will support three sets of capabilities to augment the foundation instructions. These are documented in the programmer’s guide and are known as Intel AVX-512 Conflict Detection Instructions (CDI), Intel AVX-512 Exponential and Reciprocal Instructions (ERI) and Intel AVX-512 Prefetch Instructions (PFI). These capabilities provide, respectively, efficient conflict detection to allow more loops to be vectorized, exponential and reciprocal operations, and new prefetch capabilities.
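To see why conflict detection helps vectorization, consider a loop such as a histogram update, where several lanes of one vector may target the same array index and so cannot all be updated at once. The sketch below models the idea in plain Python: each lane gets a bit mask of the earlier lanes that hold the same index. The function names are hypothetical, and the comparison against earlier lanes only is an assumption about the instruction's behaviour rather than a quote from the programmer's guide.

```python
def conflict_masks(indices: list) -> list:
    """For lane i, set bit j (j < i) when indices[j] == indices[i]."""
    masks = []
    for i, idx in enumerate(indices):
        mask = 0
        for j in range(i):
            if indices[j] == idx:
                mask |= 1 << j
        masks.append(mask)
    return masks


def has_conflict(indices: list) -> bool:
    """True when at least two lanes refer to the same index."""
    return any(conflict_masks(indices))
```

When every mask is zero, all lanes can be scattered in one step. Otherwise the conflicting lanes have to be handled in a later pass.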

The evolution to Intel AVX-512 contributes to Intel’s goal to grow peak FLOP/sec by 8x over 4 generations: 2x with AVX1.0 with the Sandy Bridge architecture over the prior SSE4.2, extended by Ivy Bridge architecture with 16-bit float and random number support, 2x with AVX2.0 and its fused multiply-add (FMA) in the Haswell architecture and then 2x more with Intel AVX-512.
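The 8x figure is the product of the per-generation factors. In the small worked version below, the Ivy Bridge step is given a factor of 1, on the assumption that its additions (16-bit float and random number support) do not change peak FLOP/sec. The table and function names are illustrative only.

```python
from itertools import accumulate
from operator import mul

# (generation, peak FLOP/sec multiplier over the previous step)
STEPS = [
    ("AVX 1.0 (Sandy Bridge)", 2),
    ("Ivy Bridge", 1),  # assumption: new features, no change in peak rate
    ("AVX 2.0 with FMA (Haswell)", 2),
    ("AVX-512", 2),
]


def cumulative_gain(steps) -> list:
    """Running product of the multipliers, relative to SSE4.2."""
    return list(accumulate((factor for _, factor in steps), mul))


if __name__ == "__main__":
    for (name, _), gain in zip(STEPS, cumulative_gain(STEPS)):
        print(f"{name}: {gain}x")
```

Counted this way, the four generations give running totals of 2x, 2x, 4x and 8x.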

Tags: Intel, Core, Xeon Phi, Knights Landing, Skylake


Comments currently: 6
Discussion started: 07/25/13 09:57:16 PM
Latest comment: 11/29/16 07:17:08 AM


That roadmap must be outdated, considering Haswell is already out and has no DDR4 support. Skylake will be the first Intel CPU with DDR4 support when it comes out in 2015. With that said, I'd imagine Broadwell may get the AVX-512 support, as that will probably be the only major difference between Haswell and Broadwell, other than a die shrink, as a selling point.
[Posted by: SteelCity1981  | Date: 07/25/13 09:57:16 PM]

This roadmap is for Xeon.

Haswell-E supports DDR4 and it will be out in 2014.
[Posted by: maroon1  | Date: 07/26/13 05:42:04 AM]

