Future Hardware Architectures Will Make Development of Highly Parallel Software Easier - AMD

CPUs and GPUs Should Be "Integrated Further", Claims AMD

by Anton Shilov
05/24/2011 | 10:17 PM

Even though many tools today allow developers to create general-purpose software that relies on graphics processing units (GPUs), software designers still need to assign the right hardware (x86 cores or stream processors) to each specific task, which makes the development process rather hard. But experts from Advanced Micro Devices believe that as CPUs and GPUs integrate further, everything will become much easier.

"Although OpenCL can make developing applications that leverage the power of the multi-core CPU or the GPU much easier, both OpenCL and other current programming models continue to maintain a very clear distinction between the GPU and the CPU. This distinction limits the range of algorithms that can be cleanly and easily implemented. Over time, as we continue to integrate the CPU and the GPU, we expect the range of algorithms that can efficiently be mapped to an APU to increase, and for it to become easier to develop applications that leverage both capabilities without the developer needing the detailed architectural knowledge that is currently necessary," said Lee Howes, an MTS programming models engineer at AMD, in a recent interview.
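The explicit split Mr. Howes describes can be sketched in a few lines. The dispatch rule below is a purely illustrative assumption (the function name and thresholds are hypothetical, not from AMD or the OpenCL specification); in OpenCL terms this decision corresponds to the developer choosing CL_DEVICE_TYPE_CPU or CL_DEVICE_TYPE_GPU when creating a context:

```python
# Illustrative sketch of today's explicit device assignment.
# The heuristic and thresholds are assumptions for illustration only.

def choose_device(data_parallel: bool, work_items: int) -> str:
    """The developer, not the runtime, decides which hardware runs a task."""
    if data_parallel and work_items > 1_000_000:
        return "GPU"   # wide, regular work maps well to stream processors
    return "CPU"       # branchy or small work stays on the x86 cores

print(choose_device(data_parallel=True, work_items=10_000_000))  # → GPU
print(choose_device(data_parallel=False, work_items=10))         # → CPU
```

The point of the quote is that this kind of decision, and the architectural knowledge behind it, should eventually move out of application code as CPU and GPU integrate.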

Software that relies on both central processing units (CPUs) and graphics processing units (GPUs) is becoming increasingly widespread, and the trend will accelerate now that both AMD and Intel integrate graphics processors into their microprocessors. Still, it will take several years before all software designers utilize general-purpose computing on graphics processing units (GPGPU) technologies.

Another major problem with offloading work to the GPU is the latency and bandwidth limitations of the PCI Express bus that connects main memory to GPU memory. In particular, these limitations restrict offloading to large, long-running kernels that do not need a tight turnaround between producing data on the GPU and that data being needed back in main CPU memory.
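The trade-off can be sketched with rough arithmetic: offloading pays off only when the GPU speedup outweighs the time spent shuttling data across PCI Express. The bandwidth, latency, and speedup figures below are illustrative assumptions, not measured AMD numbers:

```python
# Rough model of when GPU offload beats running on the CPU.
# All constants are illustrative assumptions, not measured figures.

PCIE_BANDWIDTH = 8e9   # bytes/s (roughly PCIe 2.0 x16, assumed)
PCIE_LATENCY = 10e-6   # seconds per transfer (assumed)
GPU_SPEEDUP = 10.0     # assumed GPU-vs-CPU speedup for the kernel

def offload_wins(data_bytes: float, cpu_time_s: float) -> bool:
    """True if (transfer + GPU compute) is faster than staying on the CPU."""
    transfer = 2 * (PCIE_LATENCY + data_bytes / PCIE_BANDWIDTH)  # there and back
    return transfer + cpu_time_s / GPU_SPEEDUP < cpu_time_s

# Large, long-running kernel: transfer cost is amortized.
print(offload_wins(data_bytes=64e6, cpu_time_s=0.5))    # → True
# Tiny kernel in a tight dependent loop: transfer overhead dominates.
print(offload_wins(data_bytes=4e3, cpu_time_s=20e-6))   # → False
```

Shrinking the transfer term is exactly what a shared-memory APU aims at: as it approaches zero, even the small, tightly coupled kernels become worth offloading.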

"The APU architecture is designed to help reduce this bottleneck by providing efficient access to shared memory. As we move forward during the next few years, we plan to continue to increase this flexibility and further reduce offload overhead, allowing developers to offload smaller units of work in tighter dependent loops without finding themselves limited by the interconnect," said Mr. Howes.

In general, AMD remains very optimistic about the performance gains of APUs going forward: CPU speeds have stagnated in recent years, whereas software clearly needs as much horsepower as it can get.