by Anton Shilov
03/24/2014 | 10:08 PM
Microsoft Corp. has announced the latest version of its DirectX application programming interface. Like its predecessors over the last twenty years, DirectX 12 will bring new capabilities and improve performance. Perhaps more importantly, it will address all platforms that Microsoft serves – PCs, tablets, smartphones and Xbox One – and will eliminate at least some performance bottlenecks.
One of the key features of DX12 is that it provides a lower level of hardware abstraction than ever before (an approach some developers call close-to-metal, or CTM), allowing games to significantly improve multithread scaling and CPU utilization. In addition, games will benefit from reduced GPU overhead via features such as descriptor tables and concise pipeline state objects. Direct3D 12 also introduces a set of new rendering pipeline features that will dramatically improve the efficiency of algorithms such as order-independent transparency, collision detection, and geometry culling.
During the presentation of DirectX 12, Microsoft demonstrated how the 3DMark benchmark recompiled for the new API can improve CPU scaling by up to 50%.
Another key aspect of Direct3D 12 is that it works across all Microsoft platforms, from smartphones and tablets to laptops and desktops and, of course, Xbox One. Mobile devices tend to become more and more powerful, which is why they need desktop-class APIs and increased efficiency to sustain battery life. Thanks to its improved multithread scaling and CPU utilization, as well as its CTM approach to graphics functions, DX12 promises to let mobile devices complete certain tasks more quickly and therefore consume less energy.
Theoretically, developers of video games will be able to create games that will run on all platforms that Microsoft serves, from sub-mobile to PCs to the Xbox One.
Microsoft also showcased how the Forza Motorsport 5 video game, which is only available on Xbox One, can run at 60fps on a PC. That was hardly a difficult trick, since the rig it ran on featured an Nvidia GeForce GTX Titan Black, one of the highest-performing graphics adapters available today.
At first, DirectX 12 will be supported by AMD Radeon graphics processing units powered by the GCN architecture; Intel Core i-series processors “Haswell” and “Broadwell” with Iris graphics cores; Nvidia GeForce GPUs featuring Fermi, Kepler and Maxwell architectures; and Qualcomm Snapdragon mobile system-on-chips with the latest Adreno graphics. To support all the graphics functionality DX12 has to offer, however, GPU designers will have to develop new GPU architectures.
A DirectX 12 preview will be available this year, whereas the official release is scheduled for late 2015. Keep in mind that by late 2015 the software giant will most likely have designed an all-new Windows operating system, and the new DX12 could be just the tip of the iceberg when it comes to the vision for the next Windows.
Direct3D 12 represents a significant departure from the Direct3D 11 programming model, allowing apps to go closer to the metal than ever before. We accomplished this by overhauling numerous areas of the API. We will provide an overview of three key areas: pipeline state representation, work submission, and resource access.
Direct3D 11 allows pipeline state manipulation through a large set of orthogonal objects. For example, input assembler state, pixel shader state, rasterizer state, and output merger state are all independently modifiable. This provides a convenient, relatively high-level representation of the graphics pipeline; however, it doesn’t map very well to modern hardware, primarily because there are often interdependencies between the various states. For example, many GPUs combine pixel shader and output merger state into a single hardware representation, but because the Direct3D 11 API allows these to be set separately, the driver cannot resolve things until it knows the state is finalized, which isn’t until draw time. This delays hardware state setup, which means extra overhead and fewer maximum draw calls per frame.
Direct3D 12 addresses this issue by unifying much of the pipeline state into immutable pipeline state objects (PSOs), which are finalized on creation. This allows hardware and drivers to immediately convert the PSO into whatever hardware native instructions and state are required to execute GPU work. Which PSO is in use can still be changed dynamically, but to do so the hardware only needs to copy the minimal amount of pre-computed state directly to the hardware registers, rather than computing the hardware state on the fly. This means significantly reduced draw call overhead, and many more draw calls per frame.
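The difference between resolving state at draw time and finalizing it at creation can be sketched with a toy model. This is not the Direct3D API: ToyDriver, LateBoundContext, PipelineStateObject and create_pso are hypothetical names, and the translation counter merely stands in for driver work. A real Direct3D 11 driver caches far more aggressively than this naive model, so the numbers only illustrate the principle.

```python
from dataclasses import dataclass


class ToyDriver:
    """Hypothetical driver that counts expensive state translations."""

    def __init__(self):
        self.translations = 0

    def translate(self, pixel_shader, blend):
        # Stand-in for building the combined hardware representation.
        self.translations += 1
        return ("hw", pixel_shader, blend)


class LateBoundContext:
    """Direct3D 11-style model: states are set separately, resolved per draw."""

    def __init__(self, driver):
        self.driver = driver
        self.pixel_shader = None
        self.blend = None

    def draw(self):
        # The state is only known to be final here, so translation happens now.
        return self.driver.translate(self.pixel_shader, self.blend)


@dataclass(frozen=True)
class PipelineStateObject:
    """Direct3D 12-style model: immutable, holds pre-translated state."""
    hw_state: tuple


def create_pso(driver, pixel_shader, blend):
    # Translation happens once, at creation time.
    return PipelineStateObject(driver.translate(pixel_shader, blend))


def draw_with_pso(pso):
    # Drawing only copies the precomputed state.
    return pso.hw_state


late_driver = ToyDriver()
ctx = LateBoundContext(late_driver)
ctx.pixel_shader, ctx.blend = "lit_ps", "alpha"
for _ in range(1000):
    ctx.draw()

pso_driver = ToyDriver()
pso = create_pso(pso_driver, "lit_ps", "alpha")
for _ in range(1000):
    draw_with_pso(pso)

print(late_driver.translations, pso_driver.translations)  # prints "1000 1"
```

In this model the per-draw cost of the PSO path is a lookup, while the late-bound path repeats the translation on every draw.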
In Direct3D 11, all work submission is done via the immediate context, which represents a single stream of commands that go to the GPU. To achieve multithreaded scaling, games also have deferred contexts available to them, but like Direct3D 11 pipeline state, deferred contexts do not map perfectly to hardware, and so relatively little work can be done in them.
Direct3D 12 introduces a new model for work submission based on command lists that contain the entirety of information needed to execute a particular workload on the GPU. Each new command list contains information such as which PSO to use, what texture and buffer resources are needed, and the arguments to all draw calls. Because each command list is self-contained and inherits no state, the driver can pre-compute all necessary GPU commands up-front and in a free-threaded manner. The only serial process necessary is the final submission of command lists to the GPU via the command queue, which is a highly efficient process.
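The shape of this model, parallel recording followed by one serial submission, can be sketched as follows. Everything here is a hypothetical illustration rather than the Direct3D API: CommandList, CommandQueue and record_chunk are invented names, and Python threads only show the structure, since the interpreter does not run this kind of work truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


class CommandList:
    """Toy self-contained command list: carries its own state and resources."""

    def __init__(self, pso, resources):
        self.pso = pso
        self.resources = tuple(resources)
        self.draws = []
        self.closed = False

    def draw(self, vertex_count):
        if self.closed:
            raise RuntimeError("cannot record into a closed command list")
        self.draws.append(vertex_count)

    def close(self):
        self.closed = True
        return self


class CommandQueue:
    """Toy queue: the only serial step is handing over finished lists."""

    def __init__(self):
        self.submitted = []

    def execute(self, command_lists):
        for command_list in command_lists:
            if not command_list.closed:
                raise RuntimeError("command list must be closed before submission")
            self.submitted.append(command_list)


def record_chunk(chunk_id):
    # Each worker builds its list independently; no state is shared or inherited.
    command_list = CommandList(pso=f"pso_{chunk_id}",
                               resources=[f"texture_{chunk_id}"])
    for _ in range(3):
        command_list.draw(vertex_count=36)
    return command_list.close()


# Record four command lists on worker threads (map preserves input order).
with ThreadPoolExecutor(max_workers=4) as pool:
    recorded = list(pool.map(record_chunk, range(4)))

# Submit them in a single serial step.
queue = CommandQueue()
queue.execute(recorded)
print(len(queue.submitted))  # prints "4"
```

Because no list depends on another list's state, the recording step needs no locks; only the final execute call is ordered.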
In addition to command lists, Direct3D 12 also introduces a second level of work pre-computation: bundles. Unlike command lists, which are completely self-contained and typically constructed, submitted once, and discarded, bundles provide a form of state inheritance which permits reuse. For example, to draw two character models with different textures, one approach is to record a command list with two sets of identical draw calls. Another approach is to “record” one bundle that draws a single character model, then “play back” the bundle twice on the command list using different resources. In the latter case, the driver only has to compute the appropriate instructions once, and creating the command list essentially amounts to two low-cost function calls.
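A minimal sketch of the record-once, play-back-twice idea, assuming a toy model in which a bundle inherits whatever texture is bound on the command list at playback time. Bundle, CommandList, set_texture and execute_bundle are hypothetical names chosen for illustration, not the Direct3D API.

```python
class Bundle:
    """Toy bundle: a reusable, pre-recorded sequence of draw commands."""

    def __init__(self):
        self.commands = []

    def draw(self, mesh):
        self.commands.append(("draw", mesh))


class CommandList:
    """Toy command list that can play back bundles."""

    def __init__(self):
        self.bound_texture = None
        self.executed = []

    def set_texture(self, texture):
        self.bound_texture = texture

    def execute_bundle(self, bundle):
        # The bundle inherits the texture currently bound on this list.
        for op, mesh in bundle.commands:
            self.executed.append((op, mesh, self.bound_texture))


# Record the character model once.
character = Bundle()
character.draw("body")
character.draw("head")

# Play it back twice with different resources.
command_list = CommandList()
command_list.set_texture("knight_skin")
command_list.execute_bundle(character)
command_list.set_texture("archer_skin")
command_list.execute_bundle(character)

for entry in command_list.executed:
    print(entry)
```

The bundle itself stays at two recorded commands; only the inherited texture differs between the two playbacks.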
Resource binding in Direct3D 11 is highly abstracted and convenient, but leaves many modern hardware capabilities underutilized. In Direct3D 11, games create “view” objects of resources, then bind those views to several “slots” at various shader stages in the pipeline. Shaders in turn read data from those explicit bind slots which are fixed at draw time. This model means that whenever a game wants to draw using different resources, it must re-bind different views to different slots, and call draw again. This is yet another case of overhead that can be eliminated by fully utilizing modern hardware capabilities.
Direct3D 12 changes the binding model to match modern hardware and significantly improve performance. Instead of requiring standalone resource views and explicit mapping to slots, Direct3D 12 provides a descriptor heap into which games create their various resource views. This provides a mechanism for the GPU to directly write the hardware-native resource description (descriptor) to memory up-front. To declare which resources are to be used by the pipeline for a particular draw call, games specify one or more descriptor tables which represent sub-ranges of the full descriptor heap. As the descriptor heap has already been populated with the appropriate hardware-specific descriptor data, changing descriptor tables is an extremely low-cost operation.
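The relationship between a descriptor heap and its descriptor tables can be illustrated with a small model in which a table is nothing more than a start offset and a count into the heap. DescriptorHeap, DescriptorTable and create_view are hypothetical names for this sketch and do not reproduce the Direct3D API.

```python
class DescriptorHeap:
    """Toy heap: a fixed-size array of pre-built resource descriptors."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.next_free = 0

    def create_view(self, resource):
        if self.next_free >= len(self.slots):
            raise RuntimeError("descriptor heap is full")
        index = self.next_free
        # The descriptor is written once, up-front.
        self.slots[index] = ("descriptor", resource)
        self.next_free += 1
        return index


class DescriptorTable:
    """Toy table: a (start, count) window into a heap; no data is copied."""

    def __init__(self, heap, start, count):
        if start < 0 or count < 0 or start + count > len(heap.slots):
            raise ValueError("table range lies outside the heap")
        self.heap = heap
        self.start = start
        self.count = count

    def __getitem__(self, i):
        if not 0 <= i < self.count:
            raise IndexError("descriptor index outside the table")
        return self.heap.slots[self.start + i]


heap = DescriptorHeap(capacity=8)
for name in ["stone_albedo", "stone_normal", "wood_albedo", "wood_normal"]:
    heap.create_view(name)

# Two tables over the same heap; switching between them touches no descriptors.
stone_table = DescriptorTable(heap, start=0, count=2)
wood_table = DescriptorTable(heap, start=2, count=2)

print(stone_table[0])  # prints "('descriptor', 'stone_albedo')"
print(wood_table[1])   # prints "('descriptor', 'wood_normal')"
```

Changing which table is in use amounts to changing two integers, which is the sense in which the operation is cheap in this model.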
In addition to the improved performance offered by descriptor heaps and tables, Direct3D 12 also allows resources to be dynamically indexed in shaders, providing unprecedented flexibility and unlocking new rendering techniques. As an example, modern deferred rendering engines typically encode a material or object identifier of some kind to the intermediate g-buffer. In Direct3D 11, these engines must be careful to avoid using too many materials, as including too many in one g-buffer can significantly slow down the final render pass. With dynamically indexable resources, a scene with a thousand materials can be finalized just as quickly as one with only ten.
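The g-buffer example can be sketched in a few lines. The per-material loop below is only one assumed way an engine without dynamic indexing might resolve materials; the article does not describe the actual Direct3D 11 technique, and all names and data here are hypothetical.

```python
# Hypothetical material table and g-buffer (one material ID per pixel).
materials = [
    {"albedo": (0.8, 0.1, 0.1)},
    {"albedo": (0.1, 0.8, 0.1)},
    {"albedo": (0.1, 0.1, 0.8)},
]
gbuffer = [0, 2, 1, 2, 0]


def shade_per_material(gbuffer, materials):
    """Assumed fallback: one pass over the whole g-buffer per material."""
    output = [None] * len(gbuffer)
    passes = 0
    for material_id, material in enumerate(materials):
        passes += 1
        for pixel, pixel_material_id in enumerate(gbuffer):
            if pixel_material_id == material_id:
                output[pixel] = material["albedo"]
    return output, passes


def shade_dynamic(gbuffer, materials):
    """Dynamic indexing: a single pass looks each material up by its ID."""
    return [materials[material_id]["albedo"] for material_id in gbuffer]


per_material_output, pass_count = shade_per_material(gbuffer, materials)
dynamic_output = shade_dynamic(gbuffer, materials)

print(pass_count)                             # prints "3"
print(dynamic_output == per_material_output)  # prints "True"
```

In the assumed fallback the number of passes grows with the number of materials, whereas the indexed version always makes one pass regardless of how many materials exist.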