Ultra Threaded Dispatch Processor Version 2.0
An important component of a GPU with unified computing processors, the dispatch processor has to allocate all of the GPU’s resources efficiently so that none of the GPU’s subunits sits idle.
The new ultra-threaded dispatch processor is more advanced than the one in the Radeon X1000 series. For comparison, the dispatch processor of the R580/Radeon X1900 chip can manage up to 512 threads of 16 pixels each simultaneously, whereas the Radeon HD 2900’s dispatcher can manage thousands of threads of 5 pixels each.
Another important difference of the new-generation dispatcher is that it not only distributes the resources of the pixel and texture processors but also fully manages the execution of vertex and geometry shaders, dynamically allotting the GPU’s computing resources. We should note, however, that the new dispatcher does not decode the driver’s commands into the chip’s internal commands (this is the command processor’s job) and does not create queues of vertex, geometry and pixel shaders (this is the setup engine’s job).
The ultra-threaded dispatch processor consists of arbiters, which assign tasks to the computing units and texture processors, and of sequencers for the execution processors. Each SIMD array is equipped with two arbiters, which allows two operations to be launched simultaneously on each array.
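To make the two-arbiters-per-array arrangement more concrete, here is a minimal sketch in Python. It is a toy model and not a description of the actual hardware: the function name `dispatch`, the simple shared queue of pending operations and the `num_arrays` parameter are all our own assumptions for illustration. The only detail taken from the text is that each SIMD array has two arbiters and can therefore launch up to two operations at once.

```python
from collections import deque

# From the text: each SIMD array is equipped with two arbiters.
ARBITERS_PER_ARRAY = 2


def dispatch(pending, num_arrays):
    """Hypothetical toy dispatcher (not the real hardware logic).

    Hands queued operations to SIMD arrays round by round. In each round
    every array may launch up to ARBITERS_PER_ARRAY operations together.
    Returns a list of rounds; each round maps an array index to the
    operations launched on that array in that round.
    """
    queue = deque(pending)
    rounds = []
    while queue:
        launched = {}
        for array in range(num_arrays):
            ops = []
            # Each arbiter of this array picks at most one operation.
            for _ in range(ARBITERS_PER_ARRAY):
                if queue:
                    ops.append(queue.popleft())
            if ops:
                launched[array] = ops
        rounds.append(launched)
    return rounds
```

With five pending operations and two arrays (an arbitrary choice for the example), the first round launches two operations on each array and the second round launches the remaining one, so no array ever receives more than two operations at a time.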
The new dispatcher supports a variety of techniques to mask latency when a code branch requests data that is not in the cache. As with the Radeon X1000 architecture, the execution of such a code branch is halted so that the computing resources can be freed up for other tasks.
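The effect of halting a stalled thread and switching to another one can be illustrated with a small simulation. This is a sketch under simplifying assumptions of our own, not a model of the real chip: there is a single execution unit, every operation takes one cycle, and every operation except a thread's last is followed by a data fetch that keeps the thread waiting for `miss_latency` cycles. The names `simulate`, `ops_per_thread` and `miss_latency` are hypothetical.

```python
from collections import deque


def simulate(num_threads, ops_per_thread=4, miss_latency=3):
    """Toy latency-masking model (illustrative assumptions only).

    A thread that has to wait for data is parked, and the execution unit
    is given to another ready thread if one exists.
    Returns (busy_cycles, total_cycles).
    """
    ready = deque((tid, ops_per_thread) for tid in range(num_threads))
    parked = []  # entries are (wake_cycle, thread_id, remaining_ops)
    cycle = 0
    busy = 0
    while ready or parked:
        # Move threads whose data has arrived back to the ready queue.
        still_waiting = []
        for wake, tid, remaining in parked:
            if wake <= cycle:
                ready.append((tid, remaining))
            else:
                still_waiting.append((wake, tid, remaining))
        parked = still_waiting

        if ready:
            tid, remaining = ready.popleft()
            busy += 1  # the execution unit does useful work this cycle
            remaining -= 1
            if remaining > 0:
                # The thread now waits for data; park it instead of stalling.
                parked.append((cycle + 1 + miss_latency, tid, remaining))
        cycle += 1
    return busy, cycle


single = simulate(1)  # one thread: the unit idles during every fetch
many = simulate(4)    # four threads: fetch latency is fully overlapped
```

In this toy setup a single thread keeps the execution unit busy for only 4 of 13 cycles, while four threads keep it busy for all 16 cycles, because there is always another thread ready to run while the others wait for their data.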