All 4 One: Quad SLI Rendering Modes Explained
Nvidia currently offers four distinct SLI rendering modes:
In Alternate Frame Rendering (AFR), the driver divides the workload by alternating GPUs every frame. For example, on a system with two SLI-enabled GPUs, frame 0 would be rendered by GPU0, frame 1 by GPU1, frame 2 by GPU0, and so on. Nvidia says that this is typically the preferred SLI rendering mode, as it divides the workload evenly between GPUs and requires little inter-GPU communication, allowing for up to a 1.9x performance increase with two GPUs.
We have not seen any statements from Nvidia that describe the benefits of AFR in a 4-GPU environment, even though the company states that the basic AFR principles apply here as well: GPU0 renders frame 0, GPU1 renders frame 1, GPU2 renders frame 2, and GPU3 renders frame 3.
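To make the alternation concrete, here is a minimal sketch of how frames map to GPUs under AFR. It assumes a plain round-robin, which is what the description above implies; the function names are ours and do not correspond to any Nvidia driver interface.

```python
def afr_gpu_for_frame(frame_index: int, gpu_count: int) -> int:
    """Return the GPU that renders a given frame under round-robin AFR."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    # Frames are handed out in turn: 0, 1, ..., gpu_count - 1, 0, 1, ...
    return frame_index % gpu_count


def afr_schedule(frame_count: int, gpu_count: int) -> list[int]:
    """List the GPU assigned to each of the first frame_count frames."""
    return [afr_gpu_for_frame(f, gpu_count) for f in range(frame_count)]


if __name__ == "__main__":
    print("2 GPUs:", afr_schedule(6, 2))
    print("4 GPUs:", afr_schedule(6, 4))
```

With four GPUs, each chip gets every fourth frame, so any data that must travel from one frame to the next has to cross between GPUs.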
AFR works well in 2-GPU environments, even when an application uses render-to-texture (RTT) operations (necessary for environment mapping, shadow mapping, and many other effects), which require graphics chips in SLI mode to broadcast any changes in render targets (RTs) to each other. Nvidia admits that this typically results in a large data copy, which causes bus traffic and synchronization overhead. To avoid this, application developers have to clear texture RTs each frame by calling a special command (“Clear()”) on the surface that corresponds to the texture.
The problem is that some special cases need the results of the previous frame to render the next one, for example, when the previous frame’s rendering is used to approximate scene luminance for tone mapping. In such cases the RT must not be cleared. Instead, game developers have to allocate a separate RT for each GPU: if an application uses one texture RT and happens to be running on an SLI system with two GPUs, Nvidia advises allocating two RTs and performing all RTT operations on renderTarget0 on even frames and on renderTarget1 on odd frames. However, it is unclear whether all programmers have adopted this technique even for two GPUs, let alone four.
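The per-GPU render target advice can be sketched as follows. This is only an illustration under our own assumptions: the class name and the `create_target` callback are hypothetical, and plain strings stand in for real texture objects. The idea is simply to pick the RT by frame parity (or, more generally, by frame index modulo the GPU count).

```python
class RenderTargetRing:
    """Hypothetical helper that keeps one render target per GPU."""

    def __init__(self, gpu_count, create_target):
        if gpu_count < 1:
            raise ValueError("gpu_count must be at least 1")
        # One RT per GPU, so no GPU has to receive another GPU's changes.
        self._targets = [create_target(i) for i in range(gpu_count)]

    def target_for_frame(self, frame_index):
        # Even frames use target 0 and odd frames use target 1 on two GPUs;
        # with more GPUs the same rule generalizes to a modulo.
        return self._targets[frame_index % len(self._targets)]


if __name__ == "__main__":
    ring = RenderTargetRing(2, lambda i: f"renderTarget{i}")
    for frame in range(4):
        print(frame, ring.target_for_frame(frame))
```

Extending this to four GPUs means four copies of every such RT, which is the extra work we doubt every developer has done.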
In Split Frame Rendering (SFR), the driver clips the scene into multiple regions and assigns the rendering workload for these regions to different GPUs. For example, on a system with two SLI-enabled GPUs, the screen may be split into top and bottom regions, with GPU0 rendering the top region and GPU1 rendering the bottom region. Rendering is also dynamically load balanced, so the scene division changes whenever the driver determines that one GPU is working more than another. According to Nvidia, this SLI rendering mode is typically not as desirable as AFR mode, since some rendering work is duplicated and communication overhead is higher. We suspect that SFR mode for 4 GPUs has even higher overhead and should therefore be somewhat less efficient than 2-GPU SFR.
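Nvidia does not document how the driver rebalances the split, so the following is purely our own illustration of the idea, not the actual algorithm. It assumes two GPUs, a single horizontal split line, and a simple proportional correction; `gain` and `min_share` are made-up tuning parameters.

```python
def rebalance_split(split, top_time_ms, bottom_time_ms, gain=0.5, min_share=0.1):
    """Return a new fraction of screen height for the top region (GPU0).

    split is the current top-region share in (0, 1). If GPU0 took longer
    than GPU1 on the last frame, its region shrinks, and vice versa.
    """
    total = top_time_ms + bottom_time_ms
    if total <= 0:
        return split  # no timing data, keep the current division
    # Relative imbalance in [-1, 1]; positive means GPU0 is overloaded.
    imbalance = (top_time_ms - bottom_time_ms) / total
    new_split = split * (1.0 - gain * imbalance)
    # Never starve either GPU completely.
    return max(min_share, min(1.0 - min_share, new_split))


if __name__ == "__main__":
    split = 0.5
    # Pretend the top half of the scene is consistently more expensive.
    for top_ms, bottom_ms in [(12.0, 8.0), (11.0, 9.0), (10.0, 10.0)]:
        split = rebalance_split(split, top_ms, bottom_ms)
        print(f"top region share: {split:.3f}")
```

Whatever the real heuristic is, it has to run every frame, which is part of the overhead that makes SFR less attractive than AFR.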
Specifically for quad SLI, Nvidia introduced the so-called AFR of SFR: frame 0 is rendered in scissor mode by GPU0 and GPU1, whereas frame 1 is rendered by GPU2 and GPU3.
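The hybrid mode can be pictured as AFR across pairs of GPUs, with SFR inside each pair. The sketch below assumes that the pattern simply repeats after frame 1 (frame 2 goes back to GPU0 and GPU1); Nvidia's description only covers the first two frames, and the function name is ours.

```python
def afr_of_sfr_gpus(frame_index, gpu_count=4, gpus_per_frame=2):
    """Return the GPUs that share a frame under the assumed AFR-of-SFR scheme."""
    if gpu_count % gpus_per_frame != 0:
        raise ValueError("gpu_count must be a multiple of gpus_per_frame")
    group_count = gpu_count // gpus_per_frame
    # AFR step: alternate between GPU groups from frame to frame.
    group = frame_index % group_count
    first_gpu = group * gpus_per_frame
    # SFR step: every GPU in the group renders one region of the same frame.
    return list(range(first_gpu, first_gpu + gpus_per_frame))


if __name__ == "__main__":
    for frame in range(4):
        print(f"frame {frame}: GPUs {afr_of_sfr_gpus(frame)}")
```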
In compatibility mode, only one GPU is active and all other GPUs are idle. This offers no performance benefit, but ensures compatibility.
Finally, there is a mode called SLI antialiasing (SLI AA), which blends the antialiased output of several GPUs into the final frame. In addition to the 8x and 16x modes, 4-way SLI adds a 32xs mode (every GPU renders its frame using the 8xs algorithm with a certain offset, and the main GPU then combines the results into one frame) that promises ultimate quality.
We should note that since there are now four GPUs, the SLI AA patterns that were effective for the 2-way solution no longer work here, so Nvidia has probably employed some new methods.
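As a rough illustration of the combining step, the sketch below averages the frames produced by each GPU pixel by pixel. Equal weighting is our assumption; Nvidia has not detailed how the main GPU blends the results. Frames are represented as small 2D lists of grayscale values, and the function name is hypothetical.

```python
def blend_frames(frames):
    """Average several equally sized frames into one (assumed equal weights)."""
    if not frames:
        raise ValueError("at least one frame is required")
    height = len(frames[0])
    width = len(frames[0][0])
    count = len(frames)
    # Each GPU sampled the scene with a slightly different offset, so the
    # per-pixel average acts like antialiasing with more samples.
    return [
        [sum(frame[y][x] for frame in frames) / count for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Four 1x2 "frames": the left pixel lies on an edge that two GPUs
    # see as covered and two see as empty.
    gpu_frames = [[[0.0, 1.0]], [[0.0, 1.0]], [[1.0, 1.0]], [[1.0, 1.0]]]
    print(blend_frames(gpu_frames))
```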