Researchers from North Carolina State University have developed a new technique that improves the performance of AMD Fusion or Intel Sandy Bridge hybrid chips by more than 20% on average. The engineers propose to take advantage of features of x86 microprocessors, such as data pre-fetching and large caches, to speed up the execution of highly parallel tasks on graphics processing units.

“Chip manufacturers are now creating processors that have a ‘fused architecture,’ meaning that they include CPUs and GPUs on a single chip. This approach decreases manufacturing costs and makes computers more energy efficient. However, the CPU cores and GPU cores still work almost exclusively on separate functions. They rarely collaborate to execute any given program, so they aren’t as efficient as they could be. That’s the issue we’re trying to resolve,” said Dr. Huiyang Zhou, an associate professor of electrical and computer engineering who co-authored a paper on the research.

Central processing units (CPUs) have less computational power than graphics processing units (GPUs), but they are better at performing complex tasks and have a number of special-purpose units that are not present on graphics processors.

“Our approach is to allow the GPU cores to execute computational functions, and have CPU cores pre-fetch the data the GPUs will need from off-chip main memory. This is more efficient because it allows CPUs and GPUs to do what they are good at. GPUs are good at performing computations. CPUs are good at making decisions and flexible data retrieval,” said Dr. Zhou.

In other words, CPUs and GPUs fetch data from off-chip main memory at approximately the same speed, but GPUs can execute the functions that use that data more quickly. So, if a CPU determines what data a GPU will need in advance and fetches it from main memory, the GPU can focus on executing the functions themselves, and the overall process takes less time.

In the proposed CPU-assisted GPGPU, after the CPU launches a GPU program, it executes a pre-execution program, which is generated automatically from the GPU kernel using the proposed compiler algorithms and contains the memory access instructions of the GPU kernel for multiple thread blocks. The CPU pre-execution program runs ahead of the GPU threads because (1) the CPU pre-execution thread contains only the memory fetch instructions from the GPU kernels, not the floating-point computations, and (2) the CPU runs at higher frequencies and exploits higher degrees of instruction-level parallelism than GPU scalar cores. The researchers also leverage the prefetcher at the L2 cache on the CPU side to increase the memory traffic from the CPU. As a result, the memory accesses of GPU threads hit in the L3 cache and their latency is drastically reduced. Since the pre-execution is directly controlled by user-level applications, it enjoys both high accuracy and flexibility. The engineers' experiments on a set of benchmarks show that the proposed pre-execution improves performance by up to 113%, and by 21.4% on average.

The paper, “CPU-Assisted GPGPU on Fused CPU-GPU Architectures”, will be presented in late February at the 18th International Symposium on High Performance Computer Architecture, in New Orleans. The paper was co-authored by NC State students Yi Yang and Ping Xiang, and by Mike Mantor of Advanced Micro Devices. The research was funded by the National Science Foundation and AMD.

Tags: AMD, Fusion, Llano, Trinity, Intel, Sandy Bridge, Ivy Bridge


Comments currently: 19
Discussion started: 02/07/12 03:01:20 PM
Latest comment: 02/12/12 04:31:17 AM


This is already on AMD's roadmap for 2014. There are all sorts of advantages and possibilities to APUs which is why AMD plans their future on them. Many people will be using AMD desktop APUs in 2014 because these APUs will do everything you want, be cost effective, consume less power than a discrete CPU/GPU and produce a great PC experience.
9 7 [Posted by: beenthere  | Date: 02/07/12 03:01:20 PM]

No, the engineers show a certain way, or a method, to optimize a stream-based GPU by using a CPU to do the prefetching. AMD can use this in their microcode so it always happens, instead of waiting for developers to include this technique in their software, to begin to compete against Intel. Intel can also use this technique for its processors too.

I doubt people will buy an APU just for what you said. The only reason people would buy one is that they do not need a powerful graphics processor, so having a graphics processor built into the CPU becomes a two-for-one deal. Software developers have to use the technique described in the article to use both the CPU and GPU to gain more performance. Only if it is already included in the microcode does it become a transparent boost in performance.

A roadmap like AMD's can always change. The economy is bad and AMD's business is tipping into the red despite all the interesting announcements AMD has been making, so the roadmap may change. For AMD, it is wait and see, or take it day by day. In Intel's case, the roadmap has stayed pretty consistent, so it is true to Intel's word.
2 4 [Posted by: tecknurd  | Date: 02/07/12 09:35:40 PM]
Yeah that is why AMD has sold 30 MILLION APUs...

Denial doesn't change reality Sparky.

Did I mention AMD has sold 30 MILLION APUs already?

Did I mention that AMD made a Half BILLION IN NET PROFIT in 2011?

I know the search feature on Xbit isn't great, but even you should be able to find the info with a little effort.

You are so BLINDED BY YOUR HATE that you can't even read a story on AMD's APU sales success and profits for 2011 and comprehend what it actually states. If you did you wouldn't make such uninformed posts.
2 3 [Posted by: beenthere  | Date: 02/08/12 11:08:01 AM]

The research paper has nothing to do with Intel or Sandy Bridge. Maybe Shillov had an enlightenment somewhere.
3 3 [Posted by: bereft  | Date: 02/07/12 07:28:55 PM]

Of course not. Sandy Bridge is not an APU.
1 0 [Posted by: zorg  | Date: 02/08/12 02:07:29 AM]

Anyone remember "co-processors"?
0 1 [Posted by: bluvg  | Date: 02/08/12 02:11:12 AM]

Yup. This goes way beyond coprocessors or physics processors.

With AMD opening up their CPUs to other folks IP's and moving to HSA, the options are simply amazing.

Now they need to get GloFo and TSMC to execute on time, every time.
3 3 [Posted by: beenthere  | Date: 02/08/12 11:16:51 AM]
"AMD opening up their CPUs to other folks IP's and moving to HSA"

You must be a joker, right? ATi, ever since DAMN took them under their wing, has had issues executing on time. Even when they had advanced hardware, their drivers just didn't back it up ... just like in the old pre-Radeon days of the ATi Rage chips, which were good but lacked any decent driver support.

AMD simply used to sack people for the sake of a small clique of cream-skimming CEOs, who have been sleeping with politicians (and still are), so they're thoroughly infected with their rhetoric of giving people empty promises.

Hello ... the word you should google is Larrabee, and tell us where the failure was (the drunken kartoffeln researchers, or the silicon itself) ... even backed by a company twenty times fatter than some measly sugar-candy DAMN.
0 0 [Posted by: OmegaHuman  | Date: 02/12/12 02:42:57 AM]

I can't believe what 'teckturd' said, even after reading that rubbish twice; talk about delusional.
2 1 [Posted by: alpha0ne  | Date: 02/10/12 12:10:54 AM]

Hate and denial are what the Intel fanbois live for. It's a very distorted view of the world.
2 3 [Posted by: beenthere  | Date: 02/10/12 06:04:23 AM]

"The engineers propose to take advantage of unique features of x86 microprocessors, such as data pre-fetching or large caches, to speed up execution of highly-parallel tasks on graphics processing units."

The only thing I can say is: is this for real?! It isn't April 1st yet, is it?

"Researchers," as they call themselves? Are you kidding me?

We all know what cache is used for. But hey, we don't want Intel's LRB to be sketched as a proven model, do we, and to scrap the last 15 years of actually producible GPU technology, which has proven itself at least an order of magnitude more power-efficient than any contemporary CPU, just because some *bribed know-it-all lobbyist* says wow, a hUUUUGE cACHE must be our holy grail. The whole thing relied heavily on effective computational mechanisms, not a dull build-up of "storage of unnecessary megabytes". Hey, we already have x86 CPUs, you know.

0 0 [Posted by: OmegaHuman  | Date: 02/12/12 02:28:07 AM]


