DirectX 12 & WDDM 2.0: Reworking the Windows Graphics Stack
From a low-level technical perspective, it’s perhaps a bit of a generalization, though nonetheless true, that the Windows kernel is relatively stable and feature complete these days. After the massive reworking for Windows Vista (6.0), Windows finally reached a point where the kernel and other low-level components of the OS supported the necessary features and sported the required stability to drive Windows for generations to come. As a result Microsoft never significantly tampered with the Windows kernel through Windows 7 (6.1) and Windows 8 (6.2/6.3), making small feature additions where it made sense to, and even the kernel version number of Windows 10 (10.0) is largely arbitrary, with its roots clearly in 6.x.
Which is not to say that Microsoft hasn’t made low-level changes, only that those changes have been more deliberate and driven by specific needs. Case in point (and getting to the subject matter of this section) is DirectX 12 and its underlying driver structure, the Windows Display Driver Model. Even after the release of Windows Vista and its massive overhaul of the graphics stack, Microsoft has continued modifying the stack over successive generations as GPUs have become more flexible and more capable. After a series of smaller changes in Windows 7 and Windows 8, for Windows 10 Microsoft has gone back to make what are the most fundamental changes to the graphics stack since Windows Vista over 8 years ago.
DirectX 12
Microsoft’s changes ultimately reach out and touch several aspects of the OS, but the bulk of these changes are being put in place to support DirectX 12, the next generation of Microsoft’s game & multimedia API. We have covered DirectX 12 in a great amount of detail over the past year, so for deeper coverage we’ll reference the appropriate articles, but in summary here is what DirectX 12 brings to the table and why it is a big deal.
Excerpt from Microsoft Announces DirectX 12
Why are we seeing so much interest in low level graphics programming on the PC? The short answer is performance, and more specifically what can be gained from returning to it.
Something worth pointing out right away is that low level programming is not new or even all that uncommon. Most high performance console games are written in such a manner, thanks to the fact that consoles are fixed platforms and therefore easily allow this style of programming to be used. By working with hardware at such a low level programmers are able to tease a great deal of performance out of this hardware, which is why console games look and perform as well as they do given the consoles’ underpowered specifications relative to the PC hardware from which they’re derived.
However with PCs the same cannot be said. PCs, being a flexible platform, have long worked off of high level APIs such as Direct3D 11 and OpenGL. Through the powerful abstraction provided by these high level APIs, PCs have been able to support a wide variety of hardware and over a much longer span of time. With low level PC graphics programming having essentially died with DOS and vendor specific APIs, PCs have traded some performance for the convenience and flexibility that abstraction offers.
The nature of that performance tradeoff has shifted over the years though, requiring that it be reevaluated. As we’ve covered in great detail in our look at AMD’s Mantle, these tradeoffs were established at a time when CPUs and GPUs were growing in performance by leaps and bounds year after year. But in the last decade or so that has changed – CPUs are no longer rapidly increasing in performance, especially in the case of single-threaded performance. CPU clockspeeds have reached a point where higher clockspeeds are increasingly power-expensive, and the “low hanging fruit” for improving CPU IPC has long been exhausted. Meanwhile GPUs have roughly continued their incredible pace of growth, owing to the embarrassingly parallel nature of graphics rendering.
The result is that GPU performance growth has greatly outstripped CPU performance growth, especially when looking at single-threaded CPU performance. This in and of itself isn’t necessarily a problem, but it does present a problem when coupled with the high level APIs used for PC graphics. The bulk of the work these APIs do in preparing data for GPUs is single threaded by its very nature, causing the slowdown in CPU performance increases to create a bottleneck. As this gap keeps widening, the potential for bottlenecking has similarly increased; the price of abstraction is the CPU performance required to provide it.
3DMark 2011 CPU Time: Direct3D 11 vs. Direct3D 12
Low level programming in contrast is more resistant to this type of bottlenecking. There is still the need for a “master” thread and hence the possibility of bottlenecking on that master, but low level programming styles have no need for a CPU-intensive API and runtime to prepare data for GPUs. This makes it much easier to farm out work to multiple CPU cores, protecting against this bottlenecking. To use consoles as an example once again, this is why they are capable of so much with such a (relatively) weak CPU, as they’re better able to utilize their multiple CPU cores than a high level programmed PC can.
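To make the idea of farming out work concrete, here is a minimal C++ sketch of the pattern described above: several worker threads each record their own list of draw commands, and a single “master” thread then gathers those lists in order for submission. This is purely illustrative and does not use the Direct3D 12 API; the names CommandList, record_slice, and record_frame are hypothetical stand-ins invented for this example, and the “commands” are plain strings rather than real GPU work.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GPU command list: a plain list of recorded draws.
struct CommandList {
    std::vector<std::string> commands;
};

// Each worker records draws for its own slice of objects into its own list,
// so no locking is needed while recording.
void record_slice(CommandList& list, std::size_t first, std::size_t last) {
    for (std::size_t i = first; i < last; ++i) {
        list.commands.push_back("draw object " + std::to_string(i));
    }
}

// Record object_count draws across worker_count threads, then let the calling
// ("master") thread gather the per-thread lists in order, as a submission step would.
std::vector<std::string> record_frame(std::size_t object_count, std::size_t worker_count) {
    if (worker_count == 0) worker_count = 1;  // avoid dividing by zero below

    std::vector<CommandList> lists(worker_count);  // sized up front, never resized
    std::vector<std::thread> workers;
    const std::size_t per_worker = (object_count + worker_count - 1) / worker_count;

    for (std::size_t w = 0; w < worker_count; ++w) {
        const std::size_t first = std::min(w * per_worker, object_count);
        const std::size_t last = std::min(first + per_worker, object_count);
        workers.emplace_back(record_slice, std::ref(lists[w]), first, last);
    }
    for (auto& worker : workers) worker.join();

    // Only this final gather is serialized on the master thread.
    std::vector<std::string> submitted;
    for (const auto& list : lists) {
        submitted.insert(submitted.end(), list.commands.begin(), list.commands.end());
    }
    return submitted;
}
```

Calling record_frame(10, 4) would split ten draws across four threads and return them in their original order. The point of the sketch is that the expensive part (recording) scales with core count, while the master thread is left with only a cheap, ordered hand-off.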
The end result of this situation is that it has become time to seriously reevaluate the place of low level graphics programming in the PC space. Game developers and GPU vendors alike want better performance. Meanwhile, though it’s a bit cynical, there’s a very real threat posed by the latest crop of consoles, putting PC gaming in a tight spot where it needs to adapt to keep pace with the consoles. PCs still hold a massive lead in single-threaded CPU performance, but given the limits we’ve discussed earlier, too much bottlenecking can lead to the PC being the slower platform despite the significant hardware advantage. A PC platform that can process fewer draw calls than a $400 game console is a poor outcome for the industry as a whole.
DirectX 12 as a result is the next-generation API that will be providing the basis for graphics going forward in Windows 10. Along with enabling critical improvements in CPU efficiency and scalability in multi-threading, the latest version of Windows’ major graphics API also introduces some other features that further the state of computer graphics. This includes a number of disparate but otherwise “neat” graphics tricks like asynchronous shading to better utilize GPU resources by processing certain classes of rendering tasks in parallel, and explicit multi-adapter functionality that allows the integrated GPUs found on most gaming platforms to be utilized in a meaningful way to contribute to the rendering process, rather than sitting idle as is now the case.
Meanwhile DirectX 12 also introduces some new graphics features that are being rolled out under the feature level 12_0 and 12_1 specifications. These include conservative rasterization for better calculation of pixel coverage, raster order views for better control over rendering order, and even freer resource binding to expand the amount of resources devs can use and how they organize them. And due to the nature of feature levels, most of these benefits are also being exposed in one form or another to the existing DirectX 11 API through DirectX 11.3, though certainly the bulk of their use will be under DirectX 12.
The first commercial DirectX 12 games are expected at the end of this year, with more to follow in 2016. Like so many other elements of Windows 10, ideally Microsoft would like to quickly push development towards this new API, using the free upgrade to quickly build up an established base. With DirectX 11 having taken years to really achieve traction due to the stubborn perseverance of Windows XP, there is a good deal of hope that with the free upgrade there will not be a repeat performance with respect to DirectX 12.
WDDM 2.0
Meanwhile below the API layer, quite a bit of work has gone into Windows at the driver level in order to enable the functionality of DirectX 12. While the full list of these changes is beyond the scope of a simple OS review, perhaps the most important point to take away is that due to these changes, Windows 10 is the biggest overhaul of the Windows graphics stack since WDDM 1.0 in Windows Vista. A big part of this is changes to how virtual memory works, which though largely abstracted from both the user and the developer, is crucial to the performance improvements unlocked by DirectX 12.
However because of these changes, there is a clear division in capabilities between Windows 10 and earlier versions of Windows, and for that matter in the drivers for the two. While Windows Vista/7/8 graphics drivers were distributed using a unified WDDM 1.x driver, Windows 10 graphics drivers are being distributed separately as their own WDDM 2.0 build. Much of WDDM 2.0 will be hidden from end users, but this is one area where, though it is minor, users will notice that something is different.
Memory optimizations and drivers aside, WDDM 2.0 also gave Microsoft the chance to fix some niggling issues in how the graphics stack worked. Quite a bit of effort has been put into multi-display cloning, for example (a feature that never worked quite as well as it should have), with the new WDDM 2.0 stack changing how scaling is handled so that it’s more useful, more consistent, and works with multiple GPUs. These enhancements are also being deployed to Miracast support, and further improvements are being unlocked there such as support for dynamic resolutions and framerates.
WDDM 2.0 improvements are also an element in enabling Microsoft’s GameDVR feature, which sees game footage recording become an OS-level feature. And for better or worse, WDDM 2.0 also enables some new DRM functionality, which is being deployed as a condition of getting 4K (and above) protected content licensed for use on Windows.