by Ilya Gavrichenkov
03/31/2013 | 01:41 AM
The CPU realm has changed over the last few years. Today, the majority of desktop CPUs sold are those with an integrated graphics core. It is such products that occupy the entry-level and midrange market segments, making people buy a graphics core along with a CPU when they build a new computer. Many users are not fond of the idea, but there’s no way around it. Meanwhile, modern graphics cores we find integrated into CPUs must be given credit for having rather advanced specifications and delivering performance that’s enough for quite a broad scope of applications, the sharp drop in popularity of cheap standalone graphics cards being the consequence. CPUs with integrated graphics can easily and completely substitute them. Moreover, we can even seriously discuss capabilities of today’s CPU-integrated graphics cores in terms of running modern DirectX 11 games at Full-HD resolutions. Of course, you have to drop some visual quality settings and disable various image enhancement techniques like full-screen antialiasing, but you do get a playable frame rate!
It must be noted, however, that the manufacturers of x86 processors didn’t purposefully target the market of low-end graphics cards in their race to endow their products with the new functionality. AMD and Intel have both come to their integrated solutions by listening to mobile users. Combining a CPU and a GPU inside one semiconductor die makes sense from the standpoint of platform miniaturization, cooling system simplification and energy efficiency. That’s why people who use compact notebooks and tablet PCs are actually very satisfied with the existing CPUs and their integrated graphics. Moreover, they have lower graphics performance requirements since mobile computers generally have a lower display resolution than typical desktop PCs.
Thus, CPUs with integrated graphics have come to desktop computers as the result of product unification AMD and Intel resort to due to the steady decline in the sales of desktop products and the explosive growth of all kinds of mobile gadgets. We can’t expect this trend to change anytime in the near future, so desktop users have to put up with adapted versions of mobile CPUs which have not only general-computing but also graphics cores. Well, we don’t want to sound as if such products have nothing useful at all to offer to desktop users. The integrated graphics core isn’t utterly hopeless and such CPUs have come to be employed in a wide range of applications. We don’t mean just office PCs or entry-level gaming configurations but also a whole new class of home media centers that are connected to large TV-sets for digital entertainment.
Besides, modern integrated CPUs have one important feature that may come in most handy in the desktop PC environment. They can use their graphics core not only to handle graphics proper but to do some computing as well. This heterogeneous computing concept has become a reality thanks to combined effort of CPU and software developers. The OpenCL framework designed specifically for such purposes is fully supported by all integrated graphics cores and its use is becoming a norm for many resource-consuming applications, especially those that deal with image or video processing.
So, a hybrid CPU with an integrated graphics core is an interesting and highly promising product that calls for a change of perspective from desktop users. In this review we will try to carry out a comprehensive test of modern CPUs in order to not just highlight the highs and lows of the CPU and GPU cores individually but also to show the peculiarities of the hybrid CPU design in general as it is implemented by the two CPU developers. Therefore, besides conventional benchmarks of CPU performance, we will have specific tests such as parallel computing with the graphics core’s help, high-definition video encoding and decoding, and gaming. With this approach we will be able to give a fair verdict on today’s so-called accelerated processing units or APUs.
We will consider the newest integrated-graphics CPUs offered by Intel and AMD in the under-$150 category. In other words, we’ll compare Intel’s dual-core Ivy Bridge processors with AMD’s Trinity series.
The AMD Trinity is the second variant of APUs from AMD. The first one, codenamed Llano, was introduced to desktop users as part of the Socket FM1 platform but didn’t really take off. The old microarchitecture and, consequently, low performance of the x86 cores coupled with the short declared lifecycle of the platform itself didn’t help to make it attractive. The new APU is different, with all of the older downsides having been corrected. The CPU part of the Trinity sports the most advanced microarchitecture AMD has at its disposal and the Socket FM2 platform is expected to have a rather lengthy lifecycle.
Like AMD’s first-generation APUs, the Trinity incorporates three constituent parts, each of which has been updated. The conventional CPU part is now based on x86 Piledriver cores which are well known to users from AMD’s new Vishera series. The difference is that the Trinity APUs may only have a maximum of four x86 cores. Thus, they have only one pair of dual-core modules which, according to AMD’s design concept, have a whole set of subunits shared by the two cores: cache memory, instruction fetch unit, instruction decoding unit, and a floating-point unit. In other words, the Trinity is at best only half as powerful as the top-end AMD FX CPUs in terms of computing performance, yet features all the advantages of the second-generation Bulldozer architecture.
Meanwhile, the 32nm semiconductor die of a Trinity CPU is as large as 246 sq. mm, which is but 22% smaller than that of an 8-core Vishera. Why? Because the larger part of the Trinity die is occupied by an integrated graphics core codenamed Devastator. It introduces into AMD’s integrated solutions the VLIW4 architecture which has come to APUs from Radeon HD 6900 series graphics cards. The change of architecture doesn’t affect the total number of shader processors compared to the previous-generation integrated graphics core but helps make a more efficient use of them, improving the overall computing density. In its maximum version the Devastator has six SIMD engines, each of which includes four texture fetch units and 16 VLIW4 stream processors, and also 24 texture-mapping units and 8 raster operators. Thus, it seems to be about one fourth the Radeon HD 6970 GPU, but with lower clock rates.
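The shader arithmetic behind that comparison can be sketched in a few lines. This is only an illustration built on the figures quoted above; the assumption that each VLIW4 stream processor counts as four shader processors, and the Radeon HD 6970 total of 1536 shader processors, come from AMD's public specifications rather than from this article, and the function names are ours.

```python
# Shader-count arithmetic for the Devastator graphics core (illustrative sketch).
VLIW4_UNITS_PER_SIMD = 16      # from the text: 16 VLIW4 stream processors per SIMD engine
ALUS_PER_VLIW4_UNIT = 4        # assumption: one VLIW4 unit is counted as four shader processors
TEXTURE_UNITS_PER_SIMD = 4     # from the text: four texture fetch units per SIMD engine
RADEON_HD_6970_SHADERS = 1536  # assumption: public AMD spec, not stated in this article


def shader_processors(simd_engines: int) -> int:
    """Number of shader processors for a given count of active SIMD engines."""
    return simd_engines * VLIW4_UNITS_PER_SIMD * ALUS_PER_VLIW4_UNIT


def texture_units(simd_engines: int) -> int:
    """Number of texture-mapping units for a given count of active SIMD engines."""
    return simd_engines * TEXTURE_UNITS_PER_SIMD


full_core = shader_processors(6)
print("Shader processors, six SIMD engines:", full_core)
print("Texture units, six SIMD engines:", texture_units(6))
print("Share of a Radeon HD 6970:", full_core / RADEON_HD_6970_SHADERS)
```

The same function gives the shader counts of cut-down versions of the core once the number of active SIMD engines is known.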
The third constituent component of the Trinity is the integrated North Bridge responsible for system memory access. The Socket FM2 platform developed by AMD specifically for new-generation APUs supports dual-channel DDR3 SDRAM at frequencies up to DDR3-1866. Since both the CPU and GPU cores use this memory controller, the memory bandwidth becomes a crucial factor. However, to reduce the transistor budget and lower the manufacturing cost, the Trinity is devoid of L3 cache memory which would be most useful in this design.
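To put the memory bandwidth in numbers, here is a rough estimate of the theoretical peak throughput of a dual-channel DDR3 configuration. It is a sketch under stated assumptions: each DDR3 channel is taken to be 64 bits (8 bytes) wide, which is standard for DDR3 but not stated in this article, and the function name is ours. Real-world throughput is lower than this peak.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_mt_s: nominal transfer rate, e.g. 1866 for DDR3-1866.
    bus_bytes: channel width in bytes (assumption: 64-bit DDR3 channel).
    """
    return transfers_mt_s * 1e6 * bus_bytes * channels / 1e9


for speed in (1333, 1600, 1866):
    print(f"Dual-channel DDR3-{speed}: {peak_bandwidth_gb_s(speed):.1f} GB/s peak")
```

Whatever the exact figure, this budget is shared by the x86 cores and the graphics core, which is why the lack of an L3 cache matters in this design.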
The Trinity series includes several APU modifications varying not only in clock rates of their x86 and graphics parts but also in the number of CPU cores and GPU stream processors. We’ve managed to collect the entire set for this test session as listed in the following table:
Take note that, although the Trinity uses a graphics core with VLIW4 rather than GCN architecture, AMD markets it as Radeon HD 7000 because the Devastator is compatible with DirectX 11, OpenGL 4.1 and OpenCL 1.1 APIs and features AMD’s latest video engine HD Media Accelerator. Thus, the Trinity provides extensive HD video processing capabilities including hardware decoding of popular formats (UVD3) and hardware encoding into H.264 format (VCE).
Now let’s have a closer look at the APUs we’re going to test.
The A10-5800K is the flagship model of the Trinity series. AMD uses it to showcase the benefits of the company’s new design, so it has the maximum number of execution cores and shader processors and runs at highest clock rates. As a result, we have a quad-core APU based on two Piledriver modules with a base clock rate of 3.8 GHz and capable of turbo-boosting to 4.2 GHz. Its integrated Radeon HD 7660D graphics core has 384 shader processors and is clocked at 800 MHz.
With Turbo Core technology enabled, the APU spends most of its time at 4.0 GHz, dropping the clock rate to 1.4 GHz when idle. The clock rate is also reduced at high multithreaded loads: to 3.4 GHz rather than to 3.8 GHz as you might have expected from the specifications. It looks like the manufacturer declares an overstated base clock rate, and the rather high TDP of this model, 100 watts, doesn’t make the A10-5800K any more attractive.
Overall, the A10-5800K seems to be an APU for overclocking and benchmarking experiments rather than for everyday use, especially as it comes with unlocked frequency multipliers. You can easily increase the clock rate of its execution and graphics cores and of system memory above default levels.
The A10-5700 is the senior “normal” version of the Trinity. Its TDP is limited to a modest 65 watts, making it suitable for rather compact and energy-efficient computers. AMD had to cut down the clock rates significantly to achieve that, though. The A10-5700 works at 3.4 GHz by default and the clock rate grows no higher than 4.0 GHz in the turbo mode. The clock rate of the Radeon HD 7660D graphics core is reduced to 760 MHz, too. For all these limitations, the A10-5700 is a full-featured Trinity with all of its x86 cores and shader processors active. The lack of L3 cache is somewhat disappointing, but that’s a characteristic feature of the whole Trinity design. Each dual-core Piledriver module in the senior Trinity APUs has a dedicated 2MB L2 cache, so we have a total of 4 MB of L2 cache for the entire chip. On the other hand, it is only connected to the x86 cores and doesn’t help the graphics core or in processing shared data of heterogeneous applications.
The A10-5700 is normally clocked at 3.7 GHz but, like the A10-5800K, drops the frequency at high loads – to 3.0 GHz.
Take note that the 65-watt version of the senior Trinity lacks the letter K in its model name, meaning that this APU can only be overclocked by increasing the platform’s base clock rate. This refers not only to the x86 cores but also to the integrated graphics core and even to DDR3 memory (its speed is limited to DDR3-1866).
It is clear from its model name that the A8-5600K is a weaker variant of the Trinity APU. The difference between the A8 and the A10 series is that two out of the graphics core’s six SIMD engines are turned off in the A8. As a result, the A8-5600K offers the Radeon HD 7560D graphics core with 256 stream processors clocked at 760 MHz. The x86 cores of the A8 are no different from those of its senior cousins. They are based on two dual-core Piledriver modules with 2 megabytes of L2 cache per each module. The A8-5600K isn’t much different from its cousins in clock rates, either. Its base frequency is 3.6 GHz and can be boosted to 3.9 GHz.
In everyday applications the A8-5600K mostly works at 3.8 GHz but the clock rate is lowered to 3.2 GHz at high loads. Although the A8-5600K is closer to the A10-5700 rather than to the A10-5800K in its clock rates, its TDP is as high as 100 watts. It may have something to do with the APU’s overclocking capabilities. Being a K series model, it can be overclocked by changing frequency multipliers.
For each 100-watt Trinity variant AMD also offers a more energy efficient version with a TDP of 65 watts. The A8-5500 is the more economical counterpart of the A8-5600K. They are identical in every specification except for the clock rate of the execution cores. The 65-watt A8 model clocks them at 3.2 GHz (3.7 GHz in turbo mode). Its Radeon HD 7560D graphics core is identical to the A8-5600K’s and works at a frequency of 760 MHz. So, however odd it may seem, the 35% TDP reduction is achieved by lowering the clock rate of the x86 cores by only 200-400 MHz.
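The disproportion between the TDP cut and the clock rate cut is easy to check. The short sketch below uses only the A8-5600K and A8-5500 figures quoted in this review; the helper name is ours.

```python
def pct_reduction(old: float, new: float) -> float:
    """Relative reduction from old to new, in percent."""
    return (old - new) / old * 100


tdp_cut = pct_reduction(100, 65)     # TDP in watts: A8-5600K vs. A8-5500
base_cut = pct_reduction(3.6, 3.2)   # base clock rate in GHz
turbo_cut = pct_reduction(3.9, 3.7)  # turbo clock rate in GHz

print(f"TDP reduction:         {tdp_cut:.1f}%")
print(f"Base clock reduction:  {base_cut:.1f}%")
print(f"Turbo clock reduction: {turbo_cut:.1f}%")
```

In other words, the specified clock rates drop by roughly 5 to 11 percent while the specified TDP drops by 35 percent.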
The A8-5500 normally works at 3.5 GHz. When the load is high, its speed is lowered to 2.9 GHz, even though AMD doesn’t make this fact obvious in the specifications.
The A8-5500 is not an overclocker-targeted product and its frequency multipliers are all locked. Overclocking its x86 cores, graphics core or memory controller may only be done by changing the clock generator’s frequency.
Starting from the A6 series, the Trinity family includes APUs with a greatly reduced x86 part. For example, this A6-5400K model has only one Piledriver module with two x86 cores for integer operations and one floating-point unit. The capacity of the shared L2 cache is reduced to 1 MB, too. Thus, the total amount of L2 cache in this APU is only one fourth that of the A10 and A8 APUs.
The graphics capabilities of this Trinity modification have been cut down as well. The graphics core integrated into the A6-5400K is called Radeon HD 7540D. Architecturally, it is exactly one half of the Radeon HD 7660D available in the A10 series.
Clock rates are the only area where the A6-5400K doesn’t look deficient. The graphics core works at a standard 760 MHz and the execution cores at 3.6 GHz (3.8 GHz with Turbo Core technology). As opposed to the quad-core Trinity APUs, the A6-5400K easily accelerates to 3.8 GHz and keeps on going at that frequency continuously at high loads. But, like every other Socket FM2 APU, it eventually slows down, to 3.2 GHz, when doing hard computing work.
It’s hard to tell what overclockers might be interested in the A6-5400K but it comes with unlocked frequency multipliers. We can only suppose that this capability is meant to make it more competitive against Intel’s Pentium and Celeron processors which cannot be overclocked at all. By the way, the A6-5400K is the only overclocker-friendly Trinity with a TDP of 65 watts.
The A6-5400K may seem the ultimate simplification of the original Trinity design, yet the A4-5300 is even simpler than that. This APU has much lower specs. It only has one dual-core Piledriver module with a shared 1MB L2 cache and its clock rate is 3.4 GHz. The turbo mode boosts this clock rate by a mere 200 MHz, but the worst thing is that the maximum memory frequency the A4-5300 supports is DDR3-1600. It’s not just a formal line in its specs. If you try to increase the speed of DDR3 SDRAM above 1600 MHz, the APU will not start up. Therefore, the A4-5300 is the only APU we have to benchmark with DDR3-1600 rather than with faster DDR3-1866 SDRAM.
AMD has been even crueler about the graphics core here. The one incorporated into the A4-5300 is called Radeon HD 7480D, works at 723 MHz and has only 128 shader processors. In other words, there are only two SIMD engines active in the A4-5300 out of the six available in the original design. Fortunately, AMD didn’t turn off video encoding technologies in this junior APU as Intel did, so the A4-5300 is quite suitable for multimedia applications, especially as its actual heat dissipation should be very moderate, considering its relatively low clock rates.
The A4-5300 mostly works at 3.6 GHz, but slows down to 3.0 GHz at high loads.
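The gap between the declared base clock rates and the frequencies we actually observed under heavy load can be summarized with a small script. The figures are the ones quoted for each APU above; the variable and function names are ours.

```python
# (model, declared base clock in GHz, clock observed under heavy load in GHz)
TRINITY_CLOCKS = [
    ("A10-5800K", 3.8, 3.4),
    ("A10-5700", 3.4, 3.0),
    ("A8-5600K", 3.6, 3.2),
    ("A8-5500", 3.2, 2.9),
    ("A6-5400K", 3.6, 3.2),
    ("A4-5300", 3.4, 3.0),
]


def shortfall_mhz(base_ghz: float, load_ghz: float) -> int:
    """How far the heavy-load clock falls below the declared base clock, in MHz."""
    return round((base_ghz - load_ghz) * 1000)


SHORTFALLS = {model: shortfall_mhz(base, load) for model, base, load in TRINITY_CLOCKS}

for model, base, load in TRINITY_CLOCKS:
    print(f"{model:<10} base {base:.1f} GHz, heavy load {load:.1f} GHz, "
          f"shortfall {SHORTFALLS[model]} MHz")
```

Every Trinity model in our set falls 300 to 400 MHz short of its declared base frequency when the load is heavy.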
Intel’s Ivy Bridge series has been extensively tested already, but we’ve never approached them as APUs. Meanwhile, the introduction of OpenCL 1.1 support on the graphics core is one of the key advantages of this microarchitecture. In fact, the Ivy Bridge can be viewed as Intel’s first generation of hybrid processors. Even though Intel itself shuns this designation, just because the term APU was coined by its competitor, Intel’s modern CPUs have every right to be discussed as APUs next to AMD’s solutions.
Like the Trinity, the Ivy Bridge chip consists of three main components: x86 cores, a graphics core and North Bridge (also known as System Agent). The microarchitecture of the x86 cores, which may number up to four inside one chip, is a next, yet not very big, step ahead in the evolution of Intel’s exclusive Core design. Compared to the previous-generation x86 cores (Sandy Bridge), there are but very few changes which mostly boil down to the transition to 22nm tech process. On the other hand, Intel’s current products are blameless when it comes to conventional computing as they offer much higher performance than their competitors, mostly because Ivy Bridge execution cores are full-featured and self-sufficient units which do not share any functional subunits with their neighbors and can even execute two instruction threads concurrently thanks to the Hyper-Threading technology.
Considering the higher computing capacity of modern Intel CPUs, we will only take dual-core Ivy Bridge products to compare with AMD’s APUs. Their quad-core cousins stand far above the Trinity in performance as well as price, so they should be viewed as a separate class of products targeted at high-performance computers with discrete graphics cards. By limiting our scope in this way, we do not limit our choice of Intel’s graphics cores: there are all existing Intel HD Graphics modifications available among the dual-core Ivy Bridge CPUs.
That’s most appropriate for our purpose since the new-generation graphics cores implemented in the Ivy Bridge series differ from their predecessors dramatically, making inexpensive LGA1155 CPUs competitive against Trinity APUs. Most importantly, the new HD Graphics is compatible with DirectX 11, OpenGL 3.1 and OpenCL 1.1, so it can work with any modern 3D and computing algorithms. Moreover, Intel has nearly doubled 3D performance in the Ivy Bridge series, incorporating up to 16 execution units with improved bandwidth in the integrated graphics core.
It must be noted, however, that AMD, being the inventor of the APU concept, puts much more emphasis on the 3D capabilities of its processors. In Intel’s design, the graphics core takes about 30% of the total area of the semiconductor chip with four x86 cores whereas in the Trinity design the graphics core takes over 45% of the total area. However, Intel agrees with AMD that modern graphics cores are expected to do something more than just building and outputting images. That’s why the HD Graphics series has been developing in terms of its rendering pipeline as well as in terms of HD video decoding. Intel also offers the Quick Sync technology that supports hardware encoding of video into H.264 format.
The less prominent position of the graphics core in the Ivy Bridge CPU is somewhat compensated by the developer’s increased focus on the CPU-integrated North Bridge. Considering that the performance of an integrated graphics core is often limited by memory bandwidth, it is now connected to the CPU’s common ring bus. Thus, it has the same rights to system memory access as the x86 cores and also makes full use of the fast L3 cache which is an integral part of every high-performance Intel CPU.
For this review we tested four Ivy Bridge modifications that fit into the same price category as the AMD Trinity series. These are junior Core i3 models with different versions of Intel’s integrated graphics (HD Graphics 4000 and HD Graphics 2500) as well as Pentium and Celeron CPUs whose graphics core is called just HD Graphics without any numeral but has the same architecture. Basic specs of these CPUs are listed in the following table:
Now let’s move on from these basics to details.
Although the Core i3-3225 is not a senior model in the Core i3 series, it is the fastest Intel CPU in this review. It is all about market positioning: the Core i3-3225 is the closest to the AMD A10-5800K in its price as well as in the fact that it offers the most advanced modification of the HD Graphics 4000 core in the entire Core i3 series.
Viewed as a conventional CPU, the Core i3-3225 is a dual-core processor with Hyper-Threading support that is based on the Ivy Bridge microarchitecture. Its clock rate is set at 3.4 GHz, which is actually the top speed of this CPU since it doesn’t have Turbo Boost for its x86 cores. At low loads the clock rate can be dropped to 1.6 GHz. The total amount of L3 cache memory shared by both x86 cores is 3 MB in each Core i3 series product, and the Core i3-3225 is no exception. Its Intel HD Graphics 4000 is the most advanced version of Intel’s integrated graphics, featuring 16 execution units. It is clocked at 1.05 GHz, which is a mere 100 MHz lower than the clock rate of the same integrated graphics of quad-core Ivy Bridge processors.
Intel’s CPUs have long switched to 22nm tech process, so the Core i3-3225 has a TDP of only 55 watts, being preferable to senior Trinity models in terms of energy efficiency. Its low power consumption is indicated by its voltage, which is 1.0 volts, whereas Socket FM2 APUs work at up to 50% higher voltages.
As opposed to AMD’s solutions, the Core i3 series does not allow speeding up its x86 cores in any way. On the other hand, you can easily overclock the graphics core and memory with the Core i3-3225.
The single difference of the Core i3-3220 from the above-discussed Core i3-3225 is that it comes with a simpler version of the graphics core. Its Intel HD Graphics 2500 core has six rather than 16 execution units. Otherwise, the Core i3-3220 and the Core i3-3225 are twin brothers that have not only the same design but also the same clock rates and other specs.
Based on the Ivy Bridge microarchitecture, the Pentium series is basically different from the Core i3 in the lack of Hyper-Threading. In other words, the Pentium can only execute two but not four instruction threads. Otherwise, the Pentium G2120 is quite similar to the Core i3. It is a dual-core processor with a 3MB L3 cache but its clock rate is 3.1 GHz. Thus, the x86 performance of the Pentium G2120 is going to be much lower compared to the Core i3 at heavy multithreaded loads when the Core i3 can benefit from Hyper-Threading. There is only one thing we should take note of. The Pentium series does not support the AVX instruction set.
When it comes to graphics, the Core i3 and the Pentium have much in common, too. Although the HD Graphics core of the Pentium lacks a numeric index, it has the same design as the HD Graphics 2500 and has six execution units. The clock rate is the same, too. The Pentium G2120 clocks its integrated graphics core at 1.05 GHz. However, one of the key features – Quick Sync – is disabled in the cheaper CPU.
Like the other Intel CPUs covered in this review, the Pentium doesn't offer much in terms of overclocking. It allows increasing the frequency of the integrated graphics core and supports memory modules faster than DDR3-1600. It is impossible to increase the performance of the x86 cores above the default level, though.
The junior Pentium with the Ivy Bridge architecture has a clock rate of 2.9 GHz only but the more disappointing fact about its specifications is that DDR3-1333 SDRAM is listed as the highest supported memory speed. Fortunately, this is just a formal statement. Our testing showed that the CPU’s memory controller could work with faster DDR3 SDRAM modules, too. Thus, the single real difference between the Pentium G2020 and the Pentium G2120 is that the former has a 7% lower default clock rate.
We can also note that the voltage of the Pentium G2020 is lower than 1.0 volts. It must be indicative of a rather economical CPU but Intel doesn't stress this fact. The specified TDP of every Pentium series product is 55 watts, the same as with their senior cousins.
Dual-core Celeron processors with the Ivy Bridge design have been introduced but recently, yet we already have them in our review. Frankly speaking, there’s nothing exciting about them. Intel has cut down too many capabilities in their design. Besides disabled Hyper-Threading, AVX and Quick Sync, as in the Pentium series, we can now see a reduced amount of cache memory and a considerably lower clock rate. The Celeron G1620 has only 2 megabytes of L3 cache whereas its frequency is 2.7 GHz.
On the other hand, the HD Graphics core in the Celeron G1620 is absolutely the same as in the Pentium, featuring six execution units. Well, the real performance of the Celeron’s graphics is going to be lower due to the smaller L3 cache which is accessible by the graphics core in the Ivy Bridge microarchitecture and contributes a lot to solving the problem of low memory bandwidth. The good news is that, even though the Celeron G1620 is officially compatible with DDR3-1333 only, there is no reason why it can’t be used with faster memory modules.
Compared to senior Core i3 processors, the Celeron G1620 has much worse specifications, yet its TDP is specified to be 55 watts, too. It must be just a formality. We expect the Celeron to be highly economical in real-life applications.
Now that we have discussed the details about all participating processors, it is time to talk about the actual test platforms. The list below shows all software and hardware components that were used in today’s test session:
Since the primary goal of this test session was to compare the performance of the integrated processors as a single heterogeneous system, all tests were performed without the discrete graphics. The graphics cores were fully responsible for displaying the image on the monitor.
And before we move on to the actual results, we would like to remind you of the relative positioning of all above discussed APUs according to their price:
In the first part of our testing we will focus on conventional computing. It is the performance delivered by the processor’s x86 cores.
As usual, we use the Bapco SYSmark 2012 suite to estimate processor performance in general-purpose tasks. It emulates the usage models in popular office and digital content creation and processing applications. The idea behind this test is fairly simple: it produces a single score characterizing the average computer performance. After the launch of Windows 8, SYSmark 2012 was updated to version 1.5, and this is exactly the version we are using in our test session.
We know the highs and lows of the modern Ivy Bridge and Piledriver microarchitectures well enough, so the standings of the products in the diagrams do not surprise us. Intel’s Ivy Bridge CPUs with Hyper-Threading support are generally faster than the dual-core Piledriver modules of AMD APUs, therefore the Core i3 series takes the lead. Intel’s dual-core CPUs without Hyper-Threading are considerably slower, though. They are comparable in performance to the quad-core Trinity A8 and A10 APUs. As for AMD’s junior APUs that have one dual-core Piledriver module inside, the A6 and A4 series are very weak in terms of general-purpose computing, being inferior even to the Intel Celeron series.
There’s one peculiarity about the relative performance of the AMD products. The A8-5600K processor has a higher SYSmark 2012 score than the A10-5700, a higher-class model. Why? Because the A8-5600K with its 100-watt TDP is designed for working at higher clock rates. As for the A10-5700, its positioning in the senior series does not reflect its computing performance but is due to its full-featured graphics core with the maximum number of shader processors. Thus, if you are interested in computing performance only, the A8-5600K is going to be preferable to the A10-5700.
Let’s take a closer look at the performance scores SYSmark 2012 generates in different usage scenarios. The Office Productivity scenario emulates typical office tasks, such as text editing, spreadsheet processing, email and Internet surfing. This scenario uses the following applications: ABBYY FineReader Pro 10.0, Adobe Acrobat Pro 9, Adobe Flash Player 10.1, Microsoft Excel 2010, Microsoft Internet Explorer 9, Microsoft Outlook 2010, Microsoft PowerPoint 2010, Microsoft Word 2010 and WinZip Pro 14.5.
Media Creation scenario emulates the creation of a video clip using previously taken digital images and videos. Here they use popular Adobe suites: Photoshop CS5 Extended, Premiere Pro CS5 and After Effects CS5.
Web Development is a scenario emulating web-site designing. It uses the following applications: Adobe Photoshop CS5 Extended, Adobe Premiere Pro CS5, Adobe Dreamweaver CS5, Mozilla Firefox 3.6.8 and Microsoft Internet Explorer 9.
Data/Financial Analysis scenario is devoted to statistical analysis and prediction of market trends performed in Microsoft Excel 2010.
3D Modeling scenario is fully dedicated to 3D objects and rendering of static and dynamic scenes using Adobe Photoshop CS5 Extended, Autodesk 3ds Max 2011, Autodesk AutoCAD 2011 and Google SketchUp Pro 8.
The last scenario called System Management creates backups and installs software and updates. It involves several different versions of Mozilla Firefox Installer and WinZip Pro 14.5.
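For readers who wonder how six scenario scores can turn into a single overall number, here is a minimal sketch. It assumes that the overall rating is the geometric mean of the scenario ratings, which is our understanding of the benchmark's scoring and is not confirmed anywhere in this review. The scores in the dictionary are hypothetical placeholders, not measured results.

```python
import math


def geometric_mean(scores):
    """Geometric mean of a non-empty list of positive scores."""
    if not scores or any(s <= 0 for s in scores):
        raise ValueError("scores must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Hypothetical placeholder values for illustration only, NOT results from this review.
HYPOTHETICAL_SCORES = {
    "Office Productivity": 110.0,
    "Media Creation": 95.0,
    "Web Development": 105.0,
    "Data/Financial Analysis": 90.0,
    "3D Modeling": 100.0,
    "System Management": 120.0,
}

overall = geometric_mean(list(HYPOTHETICAL_SCORES.values()))
print(f"Overall score (assumed geometric mean): {overall:.0f}")
```

With a geometric mean, a processor cannot make up for one very weak scenario with a single outstanding one as easily as it could with a simple arithmetic average.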
The particular type of load has a considerable effect on the relative performance of the products we’re discussing here. It is only in two scenarios that the results agree exactly with the overall scores: office work and media content processing. Here, the quad-core AMD A10 and A8 APUs are comparable to Intel’s Pentium and Celeron. All of the Intel CPUs are much faster than any Trinity in terms of website development. It is in rendering and data analysis tasks, which can be easily performed in parallel on multiple execution cores, that the Socket FM2 APUs show their best. Even though they can’t match the Core i3, the quad-core A10 and A8 APUs are better than Intel’s Pentium and Celeron there. AMD’s APUs are also good in the system management scenario: the competing products are similar to each other, the A4 and A6 series APUs even being able to beat Intel’s Celeron.
To test the processors’ performance during data archiving we resort to the WinRAR archiving utility. Using the maximum compression rate, we archive a folder of multiple files with a total size of 1.1 GB.
The latest WinRAR runs well on multi-core processors, yet the quad-core Trinity APUs are anyway slower than the dual-core Core i3 CPUs that support Hyper-Threading. On the other hand, AMD's A10 and A8 series are much faster than the Pentium and Celeron CPUs which in their turn are ahead of the dual-core AMD A6 and A4 in terms of data compression speed.
To check out audio transcoding performance, we use Nero AAC Encoder to convert a grabbed CD into the AAC format. This encoder (like the majority of tools for converting audio files) generates single-threaded load only.
The Piledriver microarchitecture employed in modern AMD processors can't offer competitive single-threaded performance. Intel's solutions are much faster at such loads. As you can see, encoding audio goes faster on the dual-core Celeron than on the quad-core A10, which is 50% faster in terms of clock rate and 50% more expensive into the bargain.
Web application performance was benchmarked using RoboHornet, a browser test that makes use of all modern resource-consuming web technologies. This test was run in Google Chrome 24.
Today’s web browsers are known to support multi-core CPUs only formally. For example, even though each Chrome tab is processed as a separate instruction thread, the web application or page in the foreground will only use one CPU core. The Trinity series can’t be expected to deliver high performance under such conditions, so the Ivy Bridge processors crunch through the single-threaded load much faster.
We benchmark CPUs in Adobe Photoshop CS6 using our custom test that is based on the Retouch Artists Photoshop Speed Test and consists of typical processing of four 24-megapixel images captured with a digital camera.
AMD’s hybrid APUs can’t boast high performance in Adobe Photoshop, either, even though this popular image-editing application often generates multi-threaded load. Once again we see that the fastest processor for the Socket FM2 platform is slower than the Celeron G1620, one of the junior products for the LGA1155 platform based on the Ivy Bridge microarchitecture.
In order to measure how fast our testing participants can transcode a video into H.264 format we used x264 FHD Benchmark 1.0.1 (64-bit). It works with an original MPEG-4/AVC video recorded in 1920x1080@50fps resolution with 30 Mbps bitrate. We have to say that the results of this test are of great practical value, because the x264 codec is also part of numerous popular transcoding utilities, such as HandBrake, MeGUI, VirtualDub, etc.
Video transcoding is one of the few scenarios where the Piledriver microarchitecture can show its best. AMD’s quad-core solutions are ahead of the Core i3 series here by up to 15% if we compare the flagship A10-5800K with the similarly priced Core i3-3220. The dual-core Trinity modifications do not share the success of their quad-core cousins, falling behind both the Pentium and the Celeron.
We use the Cinebench 11.5 benchmark to test final rendering speed in the Maxon Cinema 4D suite.
Rendering is yet another kind of multithreaded computing load which is mostly done on the CPU’s integer subunits. It is in such situations that the senior Trinity modifications can challenge the Core i3 series. However, they do not have any substantial advantage even now, under favorable conditions. The senior A10 series APUs are only as fast as the junior members of the Core i3 series. The A8 APUs are in between the Core i3 and the Pentium series in performance whereas the A6 and A4 APUs are much slower than the Pentium and the Celeron, just like in the other tests.
The next diagram shows one of the intermediate results of the Futuremark 3DMark11 benchmark – Physics Score. This parameter shows how fast the testing participants can cope with a special physics test emulating the behavior of a complex system with a large number of objects.
Although this load can be easily distributed among multiple processing cores, the AMD A10 and A8 processors with twice the number of x86 cores available in their Intel opponents cannot show a high result. The weak spot of the Trinity design is that each dual-core Piledriver module contains but one floating-point subunit which is required for physics processing. That's why the Core i3 series is faster than the quad-core processors for the Socket FM2 platform whereas the dual-core A6 and A4 APUs are inferior in speed to the modern models from the Pentium and Celeron series.
So it is clear that sheer computing is not counted among the Trinity's fortes. AMD itself agrees, stressing the power of the integrated Radeon HD graphics core, available in the entire Socket FM2 APU family, as the key selling point of the Socket FM2 platform. Intel’s approach is different: the Core i3-3225 is the only inexpensive Ivy Bridge processor to feature an advanced integrated graphics core whereas the other models do not promise high 3D graphics performance.
For a general overview of the relative 3D graphics performance of the heterogeneous Trinity and Ivy Bridge processors, we will check them out in Futuremark 3DMark11. Graphics cores in modern processors are all DirectX 11 compatible and have no problems running this benchmark.
The picture we see in the diagram is completely different from what we've seen in the computing performance tests. Intel's processors are slow whereas AMD's solutions are many times as fast as their opponents. The HD Graphics 4000, the fastest variant of Intel’s integrated graphics core, can only match the performance of the Radeon HD 7480D available in the junior Trinity model. Intel's CPUs with the HD Graphics 2500 or HD Graphics are slower by 50% and more and thus cannot compete with the Socket FM2 platform in this benchmark at all.
It must be noted, however, that a higher performance of the graphics core doesn’t make a better hybrid processor. It is the balance between the computing and 3D graphics parts that's important. We can check this out by benchmarking the processors in real-life games. There are two test modes: Full-HD resolution (1920x1080 pixels) with low graphics quality settings and 1366x768 pixels with average settings.
Although the results may vary between specific games, it is easy to see the overall picture. AMD's A10 series delivers the highest performance in applications of this kind. The Radeon HD 7660D graphics core can be viewed as equivalent to an entry-level graphics card as it can ensure a playable frame rate at Full-HD resolutions. Of course, you will have to compromise somewhat in terms of visual quality, but you have to do this with inexpensive standalone graphics cards as well.
As for the Core i3-3225 with its Intel HD Graphics 4000, this CPU turns out to be much slower than the flagship solutions for the Socket FM2 platform. Intel's integrated graphics core has much lower performance whereas the high computing performance cannot make up for this deficiency. As a result, the Core i3-3225 is generally inferior not only to the AMD A10 but even to the AMD A8 with Radeon HD 7560D. The Intel HD Graphics 4000 is obviously not as good as the Radeon HD 7540D from the AMD A6-5400K APU, but the dual-core Trinity is too slow from a conventional processor’s standpoint, therefore the Core i3-3225 is often ahead of AMD’s A6 and A4 models in real-life games.
By the way, we were surprised to find that the junior Trinity products have some compatibility issues, notwithstanding the declared support for the latest DirectX. In particular, the sandbox game Sleeping Dogs refused to run on the A6 and A4 series in our tests. Intel’s graphics cores, which often had compatibility problems with some 3D games in the past, have improved in this respect and seem to have no such problems now. It looks like Intel has made a big step forward in adapting its graphics core for the modern software environment with the release of the Ivy Bridge and the latest drivers.
Summing everything up, we can say that among hybrid processors the AMD A10 and AMD A8 series and, with some reservations, the Core i3-3225 can be viewed as suitable for gaming applications. Other CPUs with integrated graphics cores can hardly be used for entry-level desktop gaming systems due to their insufficient performance.
An integrated graphics core is not limited to gaming, though. It can be used for one more type of application, which we will discuss in the next section.
Promoting its hybrid processors, AMD keeps on reminding us that the integrated graphics core can be used to accelerate general-purpose computations. That’s true. The OpenCL and DirectCompute frameworks that enable parallel computing on both x86 and graphics cores are supported by both the AMD Trinity and the Intel Ivy Bridge series. And while only a few specialized applications used to rely on them, the idea of heterogeneous computing has become much more widespread now. That’s why we want to run some performance tests in applications that can make full use of all the resources provided by hybrid processors. There are quite a lot of such applications available, so we can pick a few popular ones. Thus, our testing will have some practical worth, too.
We want to start out with simple tasks of video decoding and transcoding. Modern hybrid processors do this by utilizing their graphics cores, but not their shader processors. There are specialized subunits for that purpose. Intel calls it Quick Sync whereas in AMD APUs these subunits are referred to as UVD3 and VCE.
Today’s processors have no problems playing back HD video in various formats. Hardware video decoding works perfectly even when it comes to playing a 1080p stream at 60 fps and high bitrate. However, as higher resolutions and bitrates get more popular, inexpensive processors may find it difficult to cope. For example, our test uses a widescreen 4096x1744p@24fps clip encoded in H.264 format with a bitrate of about 34 Mbps. Even when it is played via DXVA with hardware decoding enabled, frames are dropped, and the number of dropped frames depends directly on the CPU's capabilities. The diagram below shows the average number of displayed frames when the test video is played in the software player Media Player Classic – Home Cinema version 1.6.5. We enabled subtitles to make the test even more difficult.
We’ve got some unusual results playing our 4K video. The A10-5800K and A8-5600K APUs are the best with a minimum of dropped frames. The two Core i3 processors are somewhat worse, closely followed by the A10-5700 and A8-5500. The A6, A4, Pentium and Celeron processors are on the losing side in this test, dropping about half of all the frames in the test video.
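The link between an average displayed frame rate and the share of dropped frames is simple arithmetic. The sketch below is only an illustration: it assumes the diagram's figure is an average displayed frame rate in frames per second, the function name `dropped_frame_share` is made up for this example, and the 12 fps input is a hypothetical value rather than one of our measurements.

```python
def dropped_frame_share(displayed_fps: float, source_fps: float = 24.0) -> float:
    """Return the fraction of source frames that never reached the screen."""
    if source_fps <= 0:
        raise ValueError("source_fps must be positive")
    # Clamp the input so that a player reporting slightly more than the
    # source rate (or a negative value) cannot produce a nonsensical share.
    shown = min(max(displayed_fps, 0.0), source_fps)
    return 1.0 - shown / source_fps


if __name__ == "__main__":
    # 12 fps is an illustrative value, not a result from this review.
    share = dropped_frame_share(12.0)
    print(f"{share:.0%} of frames dropped")
```

With a 24 fps source, a processor that displays about 12 frames per second on average is dropping roughly half of the clip, which is the situation described for the junior models above.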
Well, there is actually no processor that copes with decoding our 4K video perfectly. None of them can be used in a truly versatile media center. As UHD and 4K formats get more popular, users may find it difficult to play movies and video clips with the best possible quality. Software players may get optimized to improve this situation, yet it would be safer to rely on higher-performance hardware components instead.
The other popular video processing task is transcoding. Today, every graphics core developer has realized that specialized transcoders should be integrated into their solutions. We checked out the transcoding capabilities of the Trinity and Ivy Bridge processors using CyberLink MediaEspresso 6.7 that supports both Intel Quick Sync and AMD VCE. During this test, a 1.5GB 1080p H.264 video clip (a 20-minute episode of a TV series) was transcoded into a lower-resolution format for viewing on an iPad 2 (H.264, 1280x768 pixels, 6 Mbps).
The results of the Celeron and Pentium processors are indicative of how important hardware acceleration is for that task. Intel disables Quick Sync in its junior CPU models and their transcoding speed is comparable to the length of the original video. The Core i3 series has Quick Sync and performs the job 10 times faster. We can also note that the senior version of the graphics core, the HD Graphics 4000, is faster by a third, so it differs from the HD Graphics 2500 in this respect as well, not only in the number of execution units.
Anyway, Quick Sync remains the fastest hardware transcoding solution in every implementation. The Trinity series with its VCE technology is only one third as fast as its opponents in this test. VCE delivers the same transcoding performance in every APU, by the way. The only exception is the A4-5300 model, which is about 20% slower than its cousins.
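To put these ratios into perspective, here is a rough back-of-the-envelope sketch for the 20-minute test clip. Every speed in it is an assumption read off the wording above, not a measured value: software-only transcoding is treated as exactly real time, Quick Sync on the HD Graphics 2500 as 10 times that, the HD Graphics 4000 as a third faster again, and VCE as one third of the HD Graphics 2500 figure (the text does not say which Quick Sync variant the comparison refers to). The names `minutes_needed` and `SPEEDS` are invented for the example.

```python
CLIP_MINUTES = 20.0  # length of the source episode, as stated in the text


def minutes_needed(speed_vs_realtime: float,
                   clip_minutes: float = CLIP_MINUTES) -> float:
    """Transcoding time for a clip, given speed as a multiple of real time."""
    if speed_vs_realtime <= 0:
        raise ValueError("speed must be positive")
    return clip_minutes / speed_vs_realtime


# Relative speeds inferred from the prose; rough assumptions, not measurements.
SPEEDS = {
    "software only (Celeron, Pentium)": 1.0,
    "Quick Sync, HD Graphics 2500": 10.0,
    "Quick Sync, HD Graphics 4000": 10.0 * 4 / 3,  # "faster by a third"
    "VCE (Trinity)": 10.0 / 3,                     # "one third as fast"
}

if __name__ == "__main__":
    for name, speed in SPEEDS.items():
        print(f"{name}: about {minutes_needed(speed):.1f} min")
```

Under these assumptions the same episode takes about 20 minutes without hardware acceleration, about 2 minutes or less with Quick Sync, and about 6 minutes with VCE. The exact figures in the diagram will differ.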
Video transcoding and playback are undoubtedly among the most important tasks for home computers, but we are interested in how modern hybrid processors do in true heterogeneous applications that run both on x86 cores and shader processors. A significant indicator that the hybrid processor concept has been accepted by the software market is the fact that OpenCL support has been added to the popular data compression tool WinZIP. Its 17th version can use GPU resources to compress files, the x86 and graphics cores sharing the load in the following way:
According to the diagram, it is the x86 cores that do the bulk of work, yet the GPU can help a lot. So it is no wonder that the advanced graphics core ensures a substantial performance boost for AMD’s Socket FM2 processors in WinZIP.
The diagram shows that AMD’s effort in promoting the heterogeneous computing concept has not been wasted. The Radeon HD graphics cores implemented in the Trinity APUs really help improve their performance. As a result, the A10 and A8 APUs are as fast as the Core i3 series, the overall picture being different from what we've seen in conventional applications that do not use graphics core resources. The junior dual-core Socket FM2 APUs do not do as well as their senior cousins, though. They are still much slower even than the Celeron G1620.
We should keep in mind that OpenCL can’t make AMD’s APUs superior to their opponents everywhere. GPU-based acceleration of computations can only be achieved in specific algorithms that allow decomposing the original task into a lot of identical subtasks. That’s why the majority of heterogeneous software is concerned with image and video processing.
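What "decomposing the original task into a lot of identical subtasks" means can be shown with a minimal sketch. It is not OpenCL code and does not come from any of the tested applications: a thread pool merely stands in for the GPU's shader processors, and the per-pixel `brighten` operation, the helper names and the chunking scheme are all assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel: int, amount: int = 40) -> int:
    """The identical subtask: adjust one 8-bit pixel, independent of the rest."""
    return min(255, pixel + amount)


def brighten_chunk(chunk: list[int]) -> list[int]:
    """Apply the same operation to every pixel of one chunk."""
    return [brighten(p) for p in chunk]


def split(data: list[int], parts: int) -> list[list[int]]:
    """Cut data into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def brighten_parallel(pixels: list[int], workers: int = 4) -> list[int]:
    """Process the chunks concurrently and stitch the results back in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so the image stays intact.
        results = pool.map(brighten_chunk, split(pixels, workers))
    return [p for chunk in results for p in chunk]


if __name__ == "__main__":
    print(brighten_parallel([0, 100, 250]))
```

Because no pixel depends on any other, the work can be spread over as many execution units as the hardware offers. A task such as audio encoding with strong dependencies between consecutive steps cannot be split up this way, which is why it gains nothing from the graphics core.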
The image editor GIMP is a good example of such an application. In its latest version it features a library of filters with support for OpenCL acceleration. As opposed to WinZIP, these filters are almost exclusively performed on the graphics core whereas the x86 cores only do some auxiliary work.
So it is no wonder that GIMP runs better on high-performance graphics hardware. As an illustration, we can show you the speed of sequential execution of three resource-consuming effects: Gaussian blur, Motion blur and Bilateral.
GPU-based performance acceleration is something you can actually notice. Under favorable conditions, the graphics core’s shader processors can ensure a substantial performance boost. The graphics core architecture of AMD’s Trinity APUs is not only faster than Intel’s HD Graphics but also better optimized for computing. So when an application, like GIMP, is OpenCL-optimized, AMD’s APUs can deliver outstanding performance in comparison with their Intel counterparts. The Core i3-3225 with the most advanced version of Intel’s integrated graphics is only as fast as the junior Socket FM2 processor AMD A4-5300 when it comes to applying these image filters. The other Intel CPUs are much slower.
Another example of a popular OpenCL-compatible application is the professional video editing tool Sony Vegas Pro 12. When rendering video, it can distribute the load among all the computing resources of hybrid processors.
It must be noted that Intel’s graphics cores are not compatible with that application for some reason, although the Ivy Bridge is specified to have no limitations in terms of its OpenCL support. Anyway, owners of LGA1155 systems can only rely on the conventional x86 computing resources here. On the other hand, this fact doesn’t prevent the Intel CPUs from looking better in this test of video rendering in Sony Vegas Pro than in the previous case.
AMD’s quad-core Trinity APUs are about as fast as Intel’s Core i3 in Sony Vegas Pro. The dual-core A6 and A4 series models compete successfully with the Pentium and Celeron CPUs.
Next we tested our processors in SVPMark 3. It is a specialized performance benchmark for the SmoothVideo Project software which improves video playback by inserting new intermediary frames into the video stream. This software makes active use of GPU resources via OpenCL.
Well, the APU load graph shows that it is the x86 cores that do the bulk of work here.
However, we still see AMD’s Socket FM2 A10 and A8 APUs outperform Intel’s Core i3. Judging by the difference between the Core i3-3225 and the Core i3-3220, the graphics core’s performance is important for this benchmark, so it is no wonder that the quad-core Trinity models are in the lead. The dual-core A6 and A4 APUs look good, too.
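The idea of inserting intermediary frames into a video stream can be illustrated with a deliberately naive sketch. It is not how SmoothVideo Project works internally: real frame-rate conversion relies on motion estimation, whereas this example simply averages neighboring frames. Frames are modeled as flat lists of pixel values, and the function names are invented for the illustration.

```python
def blend_frames(prev_frame: list[int], next_frame: list[int],
                 t: float = 0.5) -> list[int]:
    """Build an in-between frame as a weighted average of two neighbors."""
    if len(prev_frame) != len(next_frame):
        raise ValueError("frames must have the same number of pixels")
    return [round((1 - t) * a + t * b) for a, b in zip(prev_frame, next_frame)]


def double_frame_rate(frames: list[list[int]]) -> list[list[int]]:
    """Insert one blended frame between every pair of original frames."""
    if not frames:
        return []
    output = []
    for current, following in zip(frames, frames[1:]):
        output.append(current)
        output.append(blend_frames(current, following))
    output.append(frames[-1])  # the last original frame has no successor
    return output


if __name__ == "__main__":
    clip = [[0, 0], [10, 20], [20, 40]]  # three tiny two-pixel frames
    print(double_frame_rate(clip))
```

Even this simplified version shows why the task suits a graphics core: every pixel of the new frame is computed independently, so the work splits naturally across many shader processors.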
The results suggest that heterogeneous load is what the Socket FM2 platform needs to show its best. Intel’s CPUs, excepting perhaps the Core i3-3225, are not strong under such conditions. So if you plan to use video or image editing applications with OpenCL support, you may want to consider the graphics core’s performance while choosing the optimal platform. This factor may affect your platform’s speed in such applications even more than in 3D games.
We should keep in mind that the integrated graphics core can only be employed for general-purpose computing if there is no discrete graphics card in the system. If a discrete card is installed, the integrated graphics is disabled in the processor, so the whole APU concept is only applicable to integrated platforms. When the system includes a discrete graphics card, the integrated GPU has no effect on 3D or heterogeneous computing performance, which means that the computing performance of x86 cores remains the main factor for choosing CPUs for classic discrete PC configurations.
One of the advantages of integrated systems we’re discussing in this review is that they have low power consumption and heat dissipation compared to PC configurations with discrete graphics cards. Integrated platforms are often bought with the purpose of minimizing running costs and are often installed in compact computer cases. That’s why the developers of processors with integrated graphics cores focus on power-saving features, which is indicated by low TDPs. For example, the TDP of the Core i3, Pentium and Celeron series is limited to 55 watts. AMD’s APUs are somewhat worse in this respect as their TDP is set at 100 or 65 watts, depending on the particular model. However, the TDP parameter itself is just a general requirement for the recommended cooling system. Things can be different in reality, especially as junior processor models should actually be more economical than their senior cousins.
To find out more about the power consumption of all tested processor models from the APU category, we performed a round of special tests. The new digital power supply unit from Corsair, the AX760i, allows monitoring consumed and produced electrical power, which we use actively during our power consumption tests. The graphs below (unless specified otherwise) show the full power draw of the computer (without the monitor) measured after the power supply. It is the total power consumption of all the system components. The PSU's efficiency is not taken into account. The CPUs are loaded by running the 64-bit version of the LinX 0.6.4-AVX utility. We used the FurMark 1.10.4 utility to load the graphics cores. Moreover, we enabled Turbo mode and all power-saving technologies to correctly measure the computer's power draw in idle mode: C1E, C6, Enhanced Intel SpeedStep and AMD Cool’n’Quiet.
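Since the figures are taken after the power supply, the draw at the wall socket is somewhat higher. A minimal sketch of the conversion follows. The 90% efficiency and the 90-watt load are hypothetical values picked for the example, not specifications of the AX760i or results from our tests, and the function name is likewise an assumption.

```python
def wall_power(dc_output_watts: float, psu_efficiency: float) -> float:
    """Estimate the draw at the wall from the power delivered by the PSU."""
    if not 0 < psu_efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    if dc_output_watts < 0:
        raise ValueError("power cannot be negative")
    # Efficiency is output divided by input, so input is output / efficiency.
    return dc_output_watts / psu_efficiency


if __name__ == "__main__":
    # Hypothetical example: a 90 W system load on a PSU that is 90% efficient.
    print(f"{wall_power(90.0, 0.90):.1f} W at the wall")
```

Measuring after the PSU keeps the comparison between platforms independent of the power supply's efficiency curve, which varies with load.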
All the processors and platforms have the same level of power consumption when idle. Each modern processor can switch to a special power-saving mode and consume just a few watts. It is the power requirements of other system components and the efficiency of the mainboard’s voltage regulator that go to the fore, concealing the actual consumption of the processor.
The single-threaded computing load helps rank the processors according to their power consumption. The Core i3, Pentium and Celeron series are rather economical whereas the Socket FM2 solutions need considerably more power. The high power draw of the AMD A10-5800K must be pointed out: this APU was released by AMD in order to achieve maximum performance, so economy was not a consideration.
At peak x86 computing load we can clearly see the differences between solutions from the two processor manufacturers. Intel’s are more economical than AMD’s. Even the slowest dual-core A4-5300 and A6-5400K need more power than the Core i3 models, which are much faster in sheer performance. The senior A10 and A8 series with a specified TDP of 100 watts are downright uneconomical compared to their Intel opponents. Their configurations need almost twice as much power as the LGA1155 platforms although their performance isn’t any better. The 65-watt quad-core Trinity products are not really energy-efficient either, even though they do help you save 20 to 30 watts at high loads compared to their 100-watt cousins.
AMD’s solutions are no better when it comes to 3D loads, but their high power consumption is justified in this case by their higher performance.
There are no changes when both the x86 and graphics cores are in use concurrently. The A10-5800K and A8-5600K processors with a TDP of 100 watts need 30 to 50 watts more than the others in practical applications. Among the Socket FM2 systems, only those with dual-core Trinity APUs need less power than the platform with the Core i3-3225 (Intel HD Graphics 4000). Thus, the quad-core Trinity with a TDP of 65 watts is hardly economical even compared to the Core i3-3225. Intel’s solutions obviously offer better performance per watt. Besides being more energy efficient, they are more versatile in terms of installing them in cramped system cases with low-wattage PSUs and low-profile coolers.
Conventional CPUs shouldn’t be confused with APUs. The recently invented concept of hybrid processors has already gained broad recognition on the market, so we can approach APUs not as a variety of conventional CPUs but as a whole new class of products. Although AMD and Intel took different roads to reaching the APU goal, their current products offer a similar set of capabilities: two or four x86 cores, an integrated DirectX 11-compatible graphics core, OpenCL 1.1 support for heterogeneous computing, and dedicated subunits for HD video decoding and transcoding. On the other hand, since each developer has its own priorities and technical experience, the Trinity and Ivy Bridge series are very different and have specific highs and lows due to their design peculiarities. That’s why either series may be viewed as the better one depending on what capabilities of hybrid processors we focus on.
For example, Intel’s products remain unrivalled when it comes to conventional x86 performance. The senior quad-core Socket FM2 APUs from the A10 and A8 series are in between the Core i3 and Pentium in terms of average performance whereas the junior dual-core A6 and A4 are inferior to the Celeron series. AMD tries to make up for this discrepancy by means of pricing, yet not always successfully. The biggest problem of the Trinity family with x86 Piledriver modules is the low performance of the individual execution cores which shows up in many everyday applications.
AMD, for its part, can offer much higher 3D graphics performance. With Radeon HD 7000D-class graphics cores with VLIW4 architecture, the Trinity series is much faster than any Ivy Bridge when running 3D games. This is true even for the Core i3-3225 which has the most advanced of Intel’s integrated graphics cores, HD Graphics 4000. In games, the Core i3-3225 can only compete with the AMD A6-5400K but not with the faster Trinity variants. Thus, Socket FM2 configurations without a discrete graphics card can be viewed as entry-level gaming platforms whereas similar configurations with a Core i3-3225 can only be characterized so with many reservations. The other processors from Intel’s Core i3, Pentium and Celeron series have slower versions of the integrated graphics core (HD Graphics 2500 or HD Graphics) and cannot guarantee playable frame rates in modern games even at low resolutions and low visual quality settings.
With their high 3D graphics performance and their architecture optimized for streaming algorithms, AMD APUs turn out to be unexpectedly good at heterogeneous computing. If an application can employ graphics core resources for computations, AMD APUs can show their very best and deliver much higher performance compared to Intel’s CPUs. And such applications are not so exotic nowadays. OpenCL-compatible software is on the rise, such functionality being implemented in many popular video and image editing tools. We have no doubts that there will be more and more such applications in the future.
Ivy Bridge CPUs still remain the fastest solution for basic video transcoding, though. The Quick Sync technology has no rivals as yet, the Trinity’s VCE turning out to be much slower. The bad news is that Quick Sync is only available in Core i3 and higher CPUs and is supported by a limited number of applications. This situation can change in the near future, though. The recent release of Intel Media SDK 2013 has paved the way for the developer community to easily use Quick Sync in their applications.
Intel’s solutions have one more indisputable advantage. The cutting-edge 22nm tech process employed for the Ivy Bridge series and numerous microarchitecture optimizations make the LGA1155 platform much more economical compared to same-class Socket FM2 systems. Thus, Core i3, Pentium and Celeron products seem to be preferable for compact computers or when energy efficiency is the top priority.