A Glance into the Future
X-bit labs: What is more progressive and economically feasible for entry-level and mainstream processors, to implement new instructions or certain special purpose accelerators or ALUs like those inside GPUs or a wide vector processing unit (those are similar though) to better serve future needs of computing?
Godfrey Cheng: The big difference between us and Intel is that we are not bound by the x86 ISA. So, we are going to apply the right silicon, the right programming model and the right APIs to what we see as the best way to deliver a solution. Decoding video is handled really well by low-power fixed-function hardware like UVD. So, we have a lot of fixed-function units in our APUs. We have brand-new Bulldozer and Bobcat CPU cores, but at the same time we have world-class ALUs from our GPUs, so we do not really care what silicon we put there so long as we focus on the right solution. We do not care how we innovate as long as we deliver the best solution possible.
X-bit labs: Does it make sense to assume that the future APUs will have multiple parts working at different clock-speeds? For example, an antivirus monitor and a browser do not require high clock-speeds and they can work at 500MHz, while another core will work at 1.50GHz and higher?
Godfrey Cheng: Absolutely. For instance, in low-power settings you only need the GPU partially active to refresh the display and you need one core of the CPU to keep the Windows operating system running. It is fundamental to our strategy to downclock as much as we can so as to preserve power, and the other strategy is to apply power to the core that needs it the most to increase its clock-speed for whatever we need to do. Looking forward, and you might not see an implementation of this with the current generation of APUs, we are going to clock different parts of the CPU, or the APU, at different speeds or even shut down entire parts.
Software: GPGPU Is the Key
X-bit labs: Performance of Zacate (and Ontario) in x86-based applications appears to be pretty low: it barely beats Atom in a lot of cases and it offers lower performance than Pentium dual-core. But obviously it should be much faster in GPGPU-based applications. I wonder whether you have plans to ensure that actual system vendors install those GPGPU apps onto their netbooks or nettops with Fusion chips?
Godfrey Cheng: Benchmarks like Bapco Sysmark are not the things that the consumer does. When we looked at consumers and asked what they wanted their computer to do, we learnt that consumers want better Internet video, better video in general, better Web-surfing. These are the things that our [Ontario and Zacate] APUs are designed for; they are not designed to be benchmark winners. Instead, we are delivering great performance, great functionality and a great experience at very good TDPs. We are not going to chase the benchmark wars with our friends at Intel; that is not our goal. But if you look at products based on those APUs, people will get a great experience out of them.
We are definitely working with third-party software application companies to take advantage of parallel compute and fixed-function capabilities of our APUs. You will see companies shipping software next year that will be optimized for our accelerated processing units.
X-bit labs: When do you expect a wide range of consumer applications to start using GPUs to speed up stream processing?
Godfrey Cheng: We have been working with software developers for a while, and we should expect to see such applications in 2011. [...] We are working with multiple companies and there are multiple applications that benefit from general purpose computing on GPUs and thus from our ALUs in the pipeline as a part of our Fusion Fund initiative.
X-bit labs: At present software determines which (x86 or GPU) cores to use. But would you expect future chips to automatically perform load balancing and decide which hardware to use for particular operations?
Godfrey Cheng: That part is really done on the compiler, API and application levels. We are definitely working on the OpenCL application programming interface and at some point an OpenCL variation will allow you to choose whether you put [your workload on] your GPU or your CPU. Basically, we are agnostic as to where the code runs as long as it runs efficiently; but software companies will have control over where exactly it runs; it is not decided at the chip level.
When we work with software developers, we say, "we have an APU that does this, this and this, why don't you try your code-path in different ways?" Ultimately, they will decide what is best for their software architecture.
What has never happened before is that, by being offered x86 cores, ALUs and fixed-function units together, they can approach patterning their problems differently. If we can offer them 90GFLOPS+ of compute power in an 18-watt package, they might take that up and use capabilities that they never had access to. That is what is different [with APUs].
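To illustrate the point about developers trying a code-path in different ways and deciding at the application level where it runs, here is a minimal sketch in Python. It is not AMD's or OpenCL's mechanism: the function names (`scalar_path`, `parallel_path`, `pick_fastest`) are hypothetical, and both paths are ordinary Python stand-ins so the example runs anywhere. In a real application the second path would be something like an OpenCL kernel dispatched to the GPU ALUs.

```python
import time


def scalar_path(data):
    # Stand-in for a plain x86 code path: one element at a time.
    total = 0.0
    for x in data:
        total += x * x
    return total


def parallel_path(data):
    # Stand-in for a data-parallel path (for example, a kernel on GPU ALUs).
    # Assumption: it is plain Python here so the sketch has no dependencies.
    return float(sum(x * x for x in data))


def pick_fastest(paths, sample):
    """Time every candidate path on a sample; return the fastest name and all timings."""
    timings = {}
    for name, fn in paths.items():
        start = time.perf_counter()
        fn(sample)
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get), timings


# The application, not the chip, owns the decision.
paths = {"cpu": scalar_path, "gpu": parallel_path}
sample = [float(i) for i in range(10_000)]
choice, timings = pick_fastest(paths, sample)
result = paths[choice](sample)
```

Timing a representative sample is only one possible policy; an application could equally hard-code the choice per workload or expose it as a user setting.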