Closer Look at Intel HD Graphics 4000/2500 Architecture
It is not so easy to increase the performance of integrated graphics cores, and Intel’s ability to make theirs 10 times faster over the past few years is the result of intensive engineering effort. The major obstacle is that integrated graphics accelerators cannot use dedicated high-speed video memory. Instead, they have to share regular system RAM with the computational cores, and that RAM offers fairly low bandwidth by the standards of contemporary 3D applications. Therefore, the first step towards improving integrated graphics performance is to optimize how the graphics core works with the memory subsystem.
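To put the bandwidth problem in numbers: theoretical peak memory bandwidth is simply channels × bytes per transfer × transfer rate. The minimal sketch below assumes a dual-channel DDR3-1600 configuration for the shared system RAM and, purely for comparison, a hypothetical discrete card with a 128-bit GDDR5 bus at 4000 MT/s. Neither figure comes from Intel’s materials, and real sustained bandwidth is lower than these theoretical peaks.

```python
def peak_bandwidth_gbs(channels: int, bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * bytes_per_transfer * transfer_rate_mts * 10**6 / 10**9


# Assumption: dual-channel DDR3-1600 system memory (two 64-bit channels).
# This bandwidth is shared by the CPU cores and the integrated graphics.
shared_ram = peak_bandwidth_gbs(channels=2, bus_width_bits=64, transfer_rate_mts=1600)

# Hypothetical discrete card for comparison: 128-bit GDDR5 at 4000 MT/s,
# dedicated entirely to the graphics processor.
dedicated_vram = peak_bandwidth_gbs(channels=1, bus_width_bits=128, transfer_rate_mts=4000)

print(f"Shared system RAM: {shared_ram:.1f} GB/s")
print(f"Dedicated video RAM: {dedicated_vram:.1f} GB/s")
```

Under these assumptions the integrated core sees 25.6 GB/s that it must also share with the CPU cores, while the hypothetical discrete card has 64 GB/s to itself, which is why Intel focused first on how the graphics core reaches memory.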
Intel accomplished this important step in the previous version of its microarchitecture, Sandy Bridge. The ring bus introduced there connects all CPU components (computational cores, L3 cache, graphics core, and the system agent with the memory controller) and gave the graphics core a short, fast path to system memory via the L3 cache. In other words, the integrated graphics core gained the same access to the L3 cache and memory controller as the computational cores, which significantly reduced the time spent waiting for graphics data to arrive. The ring bus proved such a successful innovation that Intel carried it over unchanged into the new Ivy Bridge microarchitecture.
As for the internal structure of the Ivy Bridge graphics core, it can be considered a further evolution of the previous-generation Intel HD Graphics. Its lineage goes back to the Clarkdale and Arrandale processors introduced in 2010, but each new incarnation has been an improvement rather than a copy of the previous design.
In the transition from Sandy Bridge to Ivy Bridge, graphics performance improves primarily thanks to a larger number of execution units (EUs), especially since the internal structure of HD Graphics makes it easy to add more of them. While the top Sandy Bridge graphics core, HD Graphics 3000, had 12 execution units, the fastest version integrated into Ivy Bridge, HD Graphics 4000, has 16. That is not all, however: the units themselves were also improved. They received a second texture sampler and learned to process up to 3 instructions per clock cycle.
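A quick calculation shows why the per-unit improvements matter. The sketch below assumes performance scales perfectly linearly with EU count (real workloads rarely do) and uses Intel’s claimed gain of up to 2x as the target; the function names are illustrative only.

```python
HD3000_EUS = 12  # top Sandy Bridge graphics core
HD4000_EUS = 16  # top Ivy Bridge graphics core


def scaling_from_eu_count(new_eus: int, old_eus: int) -> float:
    """Speed-up from EU count alone, assuming perfect linear scaling."""
    return new_eus / old_eus


def required_per_eu_gain(target_speedup: float, new_eus: int, old_eus: int) -> float:
    """Per-EU improvement needed on top of EU scaling to reach target_speedup."""
    return target_speedup / scaling_from_eu_count(new_eus, old_eus)


eu_scaling = scaling_from_eu_count(HD4000_EUS, HD3000_EUS)
per_eu = required_per_eu_gain(2.0, HD4000_EUS, HD3000_EUS)
print(f"EU count alone: {eu_scaling:.2f}x")
print(f"Per-EU gain needed for 2x: {per_eu:.2f}x")
```

Going from 12 to 16 EUs yields only about 1.33x, so reaching 2x requires each unit to become roughly 1.5x faster, which is where the extra sampler and higher instruction throughput come in.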
Because the graphics core can now process data faster, the developers also had to speed up data delivery, so the Ivy Bridge graphics core has its own cache memory. Its size has not been disclosed, but it is most likely a small, very fast internal buffer.
Although the improvements in the graphics core microarchitecture may seem minor at first glance, they produce a noticeable gain in 3D performance: up to 2x, according to Intel. Incidentally, the next generation of HD Graphics accelerators, which will be integrated into the upcoming Haswell processors, should also deliver a 2x performance increase. They are expected to have up to 20 execution units and to use an L4 cache to lower latencies when working with the memory subsystem.
Increasing performance was not the only goal for the Ivy Bridge graphics, though. Intel also brought the formal specifications of the new graphics core in line with today’s needs: HD Graphics 4000 finally supports Shader Model 5.0 and hardware tessellation. In other words, Intel’s graphics core is now fully compatible at the hardware level with the DirectX 11 and OpenGL 3.1 programming interfaces. And of course, HD Graphics 4000 will have no problems in the upcoming Windows 8 operating system: the necessary drivers are already available on Intel’s official website.
Intel also enabled the new graphics core to perform general-purpose computations, which is why the new generation of HD Graphics supports DirectCompute 5.0 and OpenCL. Sandy Bridge processors also supported these interfaces, but only at the driver level: the driver redirected the corresponding workload to the processor’s computational cores. Ivy Bridge now allows the GPU itself to be used for full-fledged computations in systems with Intel graphics.
Intel engineers also paid special attention to multi-monitor configurations, which are becoming increasingly popular. HD Graphics 4000 is the first integrated graphics solution from Intel capable of driving three independent displays simultaneously. Keep in mind, however, that this required widening the FDI (Flexible Display Interface) bus that transfers images from the processor to the chipset. As a result, three-monitor configurations can only be built on new mainboards based on 7-series chipsets.
There are also some limitations on resolutions and connection types. In theory, a desktop system with an Ivy Bridge processor can offer three outputs: the first is a universal output (HDMI, DVI, VGA or DisplayPort) with a maximum resolution of 1920x1200; the second is a DisplayPort, HDMI or DVI output, also limited to 1920x1200; and the third is a DisplayPort output supporting higher resolutions, up to 2560x1600. In other words, connecting a WQXGA (2560x1600) monitor to Intel HD Graphics 4000 via Dual-Link DVI, a popular option, is still not possible. On the other hand, the core supports HDMI 1.4a and DisplayPort 1.1a, which means stereoscopic 3D support in the former case and the ability to carry an audio stream in the latter.
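The output limits above can be expressed as a small constraint check. The sketch below is hypothetical: `Output`, `Monitor`, `assign_outputs` and the `out1` to `out3` labels are illustrative names, not Intel driver APIs, and it assumes connector type, maximum resolution and the chipset’s display count are the only constraints (real boards also depend on which physical connectors the manufacturer wired up). It simply tries every mapping of monitors to outputs, which is cheap because there are at most three.

```python
from itertools import permutations
from typing import List, NamedTuple, Optional, Tuple


class Output(NamedTuple):
    name: str
    connectors: frozenset
    max_res: Tuple[int, int]  # (width, height)


class Monitor(NamedTuple):
    connector: str
    width: int
    height: int


# Per-output limits for a desktop Ivy Bridge system, as described in the text.
OUTPUTS = (
    Output("out1", frozenset({"HDMI", "DVI", "VGA", "DisplayPort"}), (1920, 1200)),
    Output("out2", frozenset({"DisplayPort", "HDMI", "DVI"}), (1920, 1200)),
    Output("out3", frozenset({"DisplayPort"}), (2560, 1600)),
)


def fits(monitor: Monitor, output: Output) -> bool:
    """True if the output offers the monitor's connector and resolution."""
    max_w, max_h = output.max_res
    return (monitor.connector in output.connectors
            and monitor.width <= max_w
            and monitor.height <= max_h)


def assign_outputs(monitors: List[Monitor], series_7_chipset: bool = True) -> Optional[List[str]]:
    """Return an output name for each monitor, or None if no valid mapping exists.

    Simplification: without a 7-series chipset only the display count is
    limited to two; which outputs remain usable is not modelled.
    """
    max_displays = 3 if series_7_chipset else 2
    if len(monitors) > max_displays:
        return None
    # At most 3! = 6 candidate mappings, so brute force is fine.
    for outs in permutations(OUTPUTS, len(monitors)):
        if all(fits(m, o) for m, o in zip(monitors, outs)):
            return [o.name for o in outs]
    return None


setup = [Monitor("DVI", 1920, 1200), Monitor("HDMI", 1920, 1080), Monitor("DisplayPort", 2560, 1600)]
print(assign_outputs(setup))                          # ['out1', 'out2', 'out3']
print(assign_outputs(setup, series_7_chipset=False))  # None: too many displays
print(assign_outputs([Monitor("DVI", 2560, 1600)]))   # None: WQXGA over Dual-Link DVI
```

The last call shows why a WQXGA monitor on Dual-Link DVI is rejected: the only output that accepts 2560x1600 is DisplayPort-only.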
Innovations have also reached other parts of the Ivy Bridge graphics core, including its multimedia capabilities. High-quality hardware decoding of AVC/H.264, VC-1 and MPEG-2 was already implemented successfully in the previous generation of HD Graphics, but in Ivy Bridge Intel improved the AVC decoding algorithms even further. A redesigned context-adaptive decoding module raised the overall performance of the hardware decoder, making it possible to play back several streams simultaneously at resolutions of up to 4096x4096.
Quick Sync, the technology for hardware-accelerated transcoding into AVC/H.264, has also been improved. It was hailed as a significant breakthrough when it first appeared in Sandy Bridge processors 1.5 years ago: a dedicated hardware unit within the graphics core put Intel processors at the forefront of HD video transcoding. In HD Graphics 4000, Quick Sync has been enhanced further and received an improved media sampler. As a result, the new Quick Sync engine transcodes H.264 almost twice as fast as the Sandy Bridge version. The image quality of the encoded video has also improved, and the engine now supports HD content with resolutions of up to 4096x4096.
However, Quick Sync still has a few weak spots. At the moment, the technology is used only in commercial video transcoding applications, and there are no free utilities that take advantage of it. Another issue is its tight coupling to the graphics core: if your system has an external graphics accelerator, which normally disables the integrated graphics, you will not be able to use Quick Sync at all. LucidLogix does offer a workaround, though: its virtualization technology called Virtu.
Nevertheless, Quick Sync remains a unique technology in today’s market. Its dedicated hardware codec transcodes much better than the shader processors of contemporary graphics accelerators can. So far, only Nvidia has come up with an alternative hardware solution, and its dedicated NVEnc unit has only just appeared in the latest Kepler-generation graphics accelerators.
To sum up the innovations in Intel’s new-generation integrated graphics core, we would like to show you the following chart highlighting the advantages of the new HD Graphics 4000 over its predecessors: