Despite earlier rumors of fundamental changes to the VLIW architecture in the new Northern Islands family, particularly that the developer had abandoned the existing design with four simple ALUs and one complex ALU per stream processor (AMD calls this unit a stream core) in favor of a simpler design with four identical ALUs per processor, we don’t see such changes in the final product. The Barts is still based on the TeraScale 2 architecture that was introduced with the Radeon HD 5000 series. The same superscalar design still has five ALUs per stream processor: four ALUs execute simple instructions like FP MAD, whereas the fifth one executes complex instructions like SIN, COS, LOG, EXP, etc. Besides the ALUs, each stream processor contains a branch control unit and a set of general-purpose registers.
This approach has its downside: achieving maximum performance requires all five ALUs in each stream processor to have work to do, which in turn calls for shader code optimizations and ideal operation of the ultra-threaded dispatch processor (UTDP). The latter, however, had already been optimized extensively (and, judging by the resulting performance, successfully) for the Radeon HD 5000 series.
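To make the utilization problem concrete, here is a minimal sketch of how instruction dependencies leave ALU slots idle in a five-wide design. It is a toy model and not AMD's shader compiler: the function names, the greedy packing rule and the assumption that simple operations only ever occupy the four simple slots are ours, introduced purely for illustration.

```python
# Toy model of slot utilization in a 4 simple + 1 complex ALU design.
# Assumption: simple ops use only the four simple slots, and complex ops
# use only the fifth slot. Real hardware and compilers are more flexible.
SLOTS = {"simple": 4, "complex": 1}
SLOTS_PER_BUNDLE = sum(SLOTS.values())  # 5 ALUs per stream processor


def pack_bundles(ops):
    """Greedily pack ops into instruction bundles, in program order.

    `ops` is a list of (kind, depends_on_previous) tuples, where kind is
    "simple" or "complex". An op that depends on the previous op cannot
    share a bundle with it, so it opens a new bundle.
    """
    bundles = []
    current = None
    for kind, depends_on_previous in ops:
        if current is None or depends_on_previous or current[kind] >= SLOTS[kind]:
            current = {"simple": 0, "complex": 0}
            bundles.append(current)
        current[kind] += 1
    return bundles


def utilization(bundles):
    """Fraction of ALU slots that actually received an operation."""
    if not bundles:
        return 0.0
    used = sum(b["simple"] + b["complex"] for b in bundles)
    return used / (len(bundles) * SLOTS_PER_BUNDLE)


# Five independent ops (4 simple + 1 complex) fill one bundle completely.
independent = [("simple", False)] * 4 + [("complex", False)]
# Five simple ops that each depend on the previous one need five bundles.
chain = [("simple", True)] * 5

print(utilization(pack_bundles(independent)))  # 1.0
print(utilization(pack_bundles(chain)))        # 0.2
```

In this model, a chain of dependent operations keeps only one ALU out of five busy, which is why the design relies so heavily on shader code optimizations and efficient dispatching.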
Interestingly, there is a second dispatch processor in the Barts block diagram. Considering that the official Cypress diagram shows only one UTDP, we might suspect that the two UTDPs, one for each SIMD array, are meant to further optimize the use of the available computing resources and, coupled with the increased clock rate, make the Barts competitive with the Cypress.
Well, we have managed to clarify the issue. As it turns out, the aforementioned RV870 (Cypress) diagram was simplified: the Cypress has two UTDPs as well, each serviced by a dedicated rasterization unit. The two are connected by a switch for optimal load balancing. This whole arrangement seems to have been carried over to the Barts silicon without any visible changes. The component layout of the new GPU hasn’t changed much in other ways, either.
The basic building block of the Barts processor is a SIMD core that comprises 16 stream processors (for a total of 80 ALUs). Each such core is serviced by dedicated logic and has a local data share (its size seems to have remained the same at 32 KB) and an 8 KB L1 cache. It is connected to four texture-mapping units. The developer hasn’t revised the rather sophisticated cache system, but the total amount of cache is different because the Barts has fewer SIMD cores than its predecessor. Right now, we do not know how many SIMD cores the new processor physically contains. We only know that 14 SIMD cores are active in the Radeon HD 6870 and 12 in the Radeon HD 6850.
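The per-core figures above scale directly with the number of active SIMD cores. The following short sketch does the arithmetic for both cards. The constants are taken from the paragraph above (the 32 KB local data share size is, as noted, only presumed), while the function and key names are our own, chosen for illustration.

```python
# Per-SIMD-core resources as described in the text.
ALUS_PER_STREAM_PROCESSOR = 5    # 4 simple + 1 complex
STREAM_PROCESSORS_PER_SIMD = 16
TMUS_PER_SIMD = 4                # texture-mapping units
LDS_KB_PER_SIMD = 32             # local data share (presumed unchanged)
L1_KB_PER_SIMD = 8               # L1 cache


def gpu_totals(active_simd_cores):
    """Scale the per-core resources by the number of active SIMD cores."""
    alus_per_simd = STREAM_PROCESSORS_PER_SIMD * ALUS_PER_STREAM_PROCESSOR
    return {
        "alus": active_simd_cores * alus_per_simd,
        "tmus": active_simd_cores * TMUS_PER_SIMD,
        "lds_kb": active_simd_cores * LDS_KB_PER_SIMD,
        "l1_kb": active_simd_cores * L1_KB_PER_SIMD,
    }


for name, simd_cores in (("Radeon HD 6870", 14), ("Radeon HD 6850", 12)):
    print(name, gpu_totals(simd_cores))
# Radeon HD 6870 {'alus': 1120, 'tmus': 56, 'lds_kb': 448, 'l1_kb': 112}
# Radeon HD 6850 {'alus': 960, 'tmus': 48, 'lds_kb': 384, 'l1_kb': 96}
```

So 14 active SIMD cores give the Radeon HD 6870 1120 ALUs and 56 texture-mapping units, while the Radeon HD 6850 with 12 cores has 960 ALUs and 48 texture-mapping units.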
In keeping with the simplification strategy, the Barts has been stripped of support for double-precision calculations. By the way, this is yet another indication that the Radeon HD 6800 is a successor to the Radeon HD 5700 series rather than a replacement for the Radeon HD 5800. Such calculations will probably be the prerogative of the more advanced Radeon HD 6900, which will feature a GPU codenamed Cayman. Thus, the Radeon HD 6800 doesn’t look appealing as a GPGPU solution for serious computing. On the other hand, applications for home users rely on the FP32 format rather than FP64, so their performance won’t be affected by the lack of double-precision support.