Heterogeneous Computing Performance
In promoting its hybrid processors, AMD keeps reminding us that the integrated graphics core can be used to accelerate general-purpose computations. That’s true. The OpenCL and DirectCompute frameworks, which enable parallel computing on both x86 and graphics cores, are supported by both the AMD Trinity and the Intel Ivy Bridge series. And while only a few specialized applications used to rely on them, heterogeneous computing has become much more widespread now. Today AMD already has a pretty impressive list of applications that the APU accelerates using its graphics core.
That’s why we would like to run some performance tests in applications that can make full use of all the versatile resources provided by hybrid processors.
We start with Basemark CL, a dedicated benchmark that replicates typical OpenCL workloads, such as image processing or physics simulation with visualization of the results. Moreover, Basemark CL takes into account the arithmetic capacity of the OpenCL compute device.
The advantage of the AMD APU is obvious and undisputed. At the same time Basemark CL reveals a significant increase in Richland performance compared with Trinity. Looks like the driver optimizations do the trick here.
The second OpenCL benchmark we used was SVPMark 3. It is a specialized performance benchmark for the SmoothVideo Project software which improves video playback smoothness by inserting new intermediary frames into the video stream. This software makes active use of GPU resources via OpenCL.
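To give a rough idea of what "inserting intermediary frames" means, here is a minimal sketch that doubles the frame rate by placing a blended frame between every pair of neighbors. This is a deliberately simplified stand-in: plain linear blending is not the SmoothVideo Project's actual algorithm, and the `Frame` type and function names are hypothetical. It does show why the task suits a GPU: every output pixel is computed independently of the others.

```python
from typing import List

# Hypothetical frame representation: a flat list of grayscale pixel values.
Frame = List[float]


def blend(a: Frame, b: Frame, t: float = 0.5) -> Frame:
    """Linearly mix two frames pixel by pixel (t=0.5 is the midpoint)."""
    return [(1 - t) * pa + t * pb for pa, pb in zip(a, b)]


def double_frame_rate(frames: List[Frame]) -> List[Frame]:
    """Insert one blended frame between each pair of consecutive frames."""
    if not frames:
        return []
    out: List[Frame] = []
    for cur, nxt in zip(frames, frames[1:]):
        out.append(cur)
        out.append(blend(cur, nxt))  # the new intermediary frame
    out.append(frames[-1])
    return out


# Three tiny two-pixel "frames" stand in for a video clip.
clip = [[0.0, 0.0], [10.0, 20.0], [20.0, 40.0]]
smooth = double_frame_rate(clip)
print(len(clip), "->", len(smooth), "frames")
```

A clip of N frames becomes 2N - 1 frames, and the per-pixel loop inside `blend` is the part an OpenCL kernel would spread across the graphics core.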
Here the AMD A10-6800K is again considerably faster than the Intel processors. However, its advantage over its Trinity-based predecessor is truly insignificant, at only 3%.
One of the greatest achievements of the APU concept, and a sign of its broad adoption by the software market, is the introduction of OpenCL support in the popular WinZip archiving utility. We used WinZip 17.5 and measured how fast it compresses an 850 MB folder of files into the zipx format.
OpenCL acceleration helps the AMD processors outperform the Core i3-3225. However, both the Richland and the Trinity products fail to beat the quad-core Core i5-3330. Note that the performance difference between the A10-6800K and the A10-5800K is about 8% in this case.
Another example of a popular OpenCL-compatible application is the professional video editing tool Sony Vegas Pro 12. When rendering video, it can distribute the load among all the computing resources of hybrid processors.
The performance difference between the A10-6800K and the A10-5800K is minimal, but there is another remarkable fact: the Core i3-3225 with its Intel HD Graphics 4000 core outperforms the AMD APUs here.
Another popular video processing task is transcoding. Today, every graphics core developer has realized that specialized transcoders should be integrated into their solutions. We checked out the transcoding capabilities of the tested processors using CyberLink MediaEspresso 6.7, which supports both Intel Quick Sync and AMD VCE. During this test, a 1.5GB 1080p H.264 video clip (a 20-minute episode of a TV series) was transcoded into a lower-resolution format for viewing on an iPhone 4S (H.264, 1280x768 pixels, about 6 Mbps bitrate).
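For a sense of scale, the figures above can be turned into bitrates with simple arithmetic. The sketch below assumes decimal units (1 GB = 1000 MB, 1 MB = 8 megabits) and treats the "about 6 Mbps" target as exactly 6 Mbps, so the results are estimates rather than measured values; the function names are ours.

```python
# Back-of-the-envelope bitrate arithmetic for the transcoding test clip.
# Assumptions: decimal units (1 GB = 1000 MB, 1 MB = 8 megabits) and a
# target bitrate of exactly 6 Mbps, although the article says "about".


def average_bitrate_mbps(size_gb: float, duration_min: float) -> float:
    """Average bitrate of a file, in megabits per second."""
    megabits = size_gb * 1000 * 8
    return megabits / (duration_min * 60)


def stream_size_mb(bitrate_mbps: float, duration_min: float) -> float:
    """Approximate size in megabytes of a stream at a constant bitrate."""
    return bitrate_mbps * duration_min * 60 / 8


source_mbps = average_bitrate_mbps(1.5, 20)  # the 1.5 GB, 20-minute source
target_mb = stream_size_mb(6, 20)            # the same 20 minutes at 6 Mbps

print(f"source: ~{source_mbps:.0f} Mbps, output: ~{target_mb:.0f} MB")
```

Under these assumptions the source clip averages roughly 10 Mbps, and the transcoded file comes to roughly 900 MB, or about 60% of the original size.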
It is very difficult to compete against Intel Quick Sync in video transcoding speed. Intel’s transcoding implementation uses a highly efficient combination of specialized hardware units and has some of the tasks processed using graphics core resources. AMD’s solution merely relays the calculations to parallel stream processors, which creates certain bottlenecks given the specifics of the transcoding algorithm. As a result, even the entry-level Intel HD Graphics 2500 core transcodes 2.5 times faster than the A10-6800K. As for the Richland design, as we can see, it produced almost no gain compared with Trinity.