ATI endowed its new graphics processors with a completely redesigned memory controller. The internal memory bus of the RADEON X1800 now has a ring topology and consists of two 256-bit bidirectional ring buses, while the RADEON X1600 uses two 128-bit bidirectional ring buses.
The ring buses run around the entire die, which simplifies and optimizes its interconnects: the chip components can be connected by the shortest routes. Coupled with the dispatch unit, this solution minimizes latency and signal distortion during memory write operations. Thanks to the Ring Bus technology, the RADEON X1800/X1600 can work with high-frequency memory such as GDDR4, which a traditional architecture wouldn’t support because of interference in the suboptimally routed wiring inside the GPU.
The memory is connected to the buses at the so-called Ring Stops. There are four such stops in total, each with two 32-bit access channels. For comparison, the memory of the RADEON X850 connects to the controller through four 64-bit channels. Each Ring Stop can deliver data to the requesting client as instructed by the memory controller.
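The channel figures above add up to the same external interface width in both designs; what changes is how the channels are grouped. A small arithmetic sketch, assuming the Ring Stop figures describe the RADEON X1800 (the function and variable names are illustrative, not ATI terminology):

```python
def interface_width_bits(groups: int, channels_per_group: int, bits_per_channel: int) -> int:
    """Aggregate width = groups * channels per group * bits per channel."""
    return groups * channels_per_group * bits_per_channel

# Ring Bus design: four Ring Stops, two 32-bit channels each (figures from the text).
x1800_width = interface_width_bits(4, 2, 32)

# RADEON X850: four 64-bit channels attached directly to the controller.
x850_width = interface_width_bits(1, 4, 64)

print(x1800_width, x850_width)
```

Both come out at 256 bits, but the newer layout splits that width into eight narrower channels instead of four wide ones.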
The Ring Bus memory subsystem works in a simple way. A client sends a data request to the memory controller, which is located in the center of the chip. The memory controller uses a special algorithm to determine the priority of each request, giving the highest priority to those that affect performance the most. Then it sends an appropriate request to the memory chips and routes the returned data along the Ring Bus to the Ring Stop nearest to the requesting client. From the Ring Stop, the data goes to the client. A so-called Write Crossbar Switch surrounds the controller proper for optimal memory access; it makes sure the requests are distributed evenly.
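The request flow described above can be sketched as a toy model: pending requests are served in priority order, and each result is delivered to the Ring Stop closest to the client. Everything numeric here is an assumption made for illustration (the ring size, the stop positions, the client names, positions and priority values); ATI's actual arbitration algorithm is not described in the text.

```python
import heapq

RING_SIZE = 12               # hypothetical number of positions around the ring
RING_STOPS = [0, 3, 6, 9]    # four Ring Stops (count from the text; positions assumed)

def ring_distance(a: int, b: int, size: int = RING_SIZE) -> int:
    """Shortest hop count between two positions when data may travel either way."""
    d = (a - b) % size
    return min(d, size - d)

def nearest_stop(client_pos: int) -> int:
    """Ring Stop closest to the client; ties go to the lower-numbered stop."""
    return min(RING_STOPS, key=lambda s: (ring_distance(s, client_pos), s))

def arbitrate(requests):
    """Serve requests highest priority first; return (client, stop) in service order."""
    # heapq is a min-heap, so the priority is negated; seq keeps arrival order on ties.
    heap = [(-prio, seq, client, pos)
            for seq, (client, prio, pos) in enumerate(requests)]
    heapq.heapify(heap)
    served = []
    while heap:
        _, _, client, pos = heapq.heappop(heap)
        served.append((client, nearest_stop(pos)))
    return served

# (client name, priority, position on the ring); all values are made up.
pending = [
    ("texture_unit", 3, 1),
    ("render_backend", 5, 5),
    ("vertex_fetch", 1, 10),
]
order = arbitrate(pending)
for client, stop in order:
    print(f"{client}: data delivered via Ring Stop at position {stop}")
```

In this sketch the highest-priority client is served first regardless of arrival order, and no client is more than one hop from its stop, which is the latency benefit the ring layout is meant to provide.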
The operating algorithm of the new controller can be programmed from the driver, so its behavior can be improved in future driver releases. Moreover, ATI could in theory tune the controller for a specific application and ship a corresponding profile in the Catalyst driver.
The cache has become fully associative, i.e. any cache line can store the contents of any location in the external memory.
At the same frequency, a fully associative cache works more efficiently than a direct-mapped one, since it avoids conflict misses in which two addresses compete for the same cache line. Thus, the new architecture has a large performance reserve in applications that depend heavily on graphics memory bandwidth. In other words, the RADEON X1000 is expected to perform well at high resolutions and/or with full-screen antialiasing and anisotropic filtering enabled.
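The difference between the two cache organizations can be illustrated with a small simulation. This is a generic textbook model, not a description of ATI's hardware: the cache size, the access trace and the LRU replacement policy are all assumptions of the sketch, and addresses stand for whole cache-line-sized blocks.

```python
from collections import OrderedDict

def direct_mapped_hits(addresses, num_lines):
    """Each address can live in exactly one line: index = address % num_lines."""
    lines = [None] * num_lines
    hits = 0
    for addr in addresses:
        idx = addr % num_lines
        if lines[idx] == addr:
            hits += 1
        else:
            lines[idx] = addr          # evict whatever occupied this line
    return hits

def fully_associative_hits(addresses, num_lines):
    """Any line may hold any address; LRU eviction is an assumption of this sketch."""
    cache = OrderedDict()              # keys ordered from least to most recently used
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)    # mark as most recently used
        else:
            if len(cache) >= num_lines:
                cache.popitem(last=False)   # drop the least recently used entry
            cache[addr] = True
    return hits

# Blocks 0 and 8 map to the same line in an 8-line direct-mapped cache.
trace = [0, 8, 0, 8, 0, 8, 0, 8]
dm_hits = direct_mapped_hits(trace, num_lines=8)
fa_hits = fully_associative_hits(trace, num_lines=8)
print(dm_hits, fa_hits)
```

The trace is deliberately chosen to provoke conflicts: the direct-mapped cache keeps evicting one block to load the other and never hits, while the fully associative cache misses only on the first access to each block.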
The HyperZ technology has also been improved: a more sophisticated algorithm is now employed to identify invisible surfaces that are to be removed. ATI says the new algorithm is 50% more efficient than the one in the RADEON X850.
Note that although the RADEON X1300 supports neither the Ring Bus nor the programmable memory request arbiter, it uses the other techniques intended to improve memory bandwidth in the RADEON X1000 family.