by Ilya Gavrichenkov
05/13/2014 | 03:53 AM
Over the last few years we’ve been watching AMD’s processor division lose ground in traditional PCs. The company preaches the importance of mobile and embedded solutions while keeping silent on desktop CPU issues. What we see is that AMD first lost the top-end CPU market to its rival and is now talking about offering only low-end desktop processors with integrated graphics. At least, this is implied by the proposed roadmap, which has no updates for the flagship FX series but puts more focus on accelerated processing units (APUs) that combine x86 and graphics cores within a single semiconductor die. Against this background, AMD’s new APU design, codenamed Kaveri, comes out as the company’s key processor product in 2014. Building on the ideas implemented in the Trinity and Richland APUs, the Kaveri is going to be viewed critically in this article since we remain loyal fans of desktop PCs.
There’s nothing particularly bad about AMD’s shifting its focus to processors with integrated graphics. After all, the majority of Intel’s desktop CPUs have the same internal design. The problem is that AMD, unlike Intel, does not strive to push the performance bar higher, preferring to set other priorities. The FX series was about multicore processors handling a lot of computing threads simultaneously. Now, with the APU having fewer x86 cores, the integrated graphics core comes to the fore. The Kaveri is optimized for affordable mobile gadgets, so it is supposed to sport a high performance-per-watt ratio. And this ratio is being improved not by increasing performance but by lowering power consumption and heat dissipation, which is now limited to 35 or even 15 watts.
Desktop users are offered special versions of the Kaveri design with a TDP up to 95 watts, yet AMD doesn’t even claim that they deliver high performance, talking instead about certain new capabilities. Judging by everything we know about it, the Kaveri can’t bring about any changes to the desktop processor market. The new series, just like AMD’s previous APUs, consists of affordable products for home, office and entry-level gaming computers.
It would be wrong to say that the Kaveri is absolutely unexciting for desktop users, though. It features a new version of the Bulldozer microarchitecture, codenamed Steamroller. Its graphics core is now based on the GCN design. And the APU at large complies with the heterogeneous system architecture (HSA) specs. Even though these innovations can’t make the new processors interesting for gamers or enthusiasts, they are quite exciting in their own way. At least, they show us what direction AMD is going in and may even suggest whether AMD will ever return to developing high-performance desktop processors as a top priority.
There are two desktop Kaveri-based products available since the beginning of 2014: A10-7850K and A10-7700K. They are not shipped in mass quantities, yet their availability is not a problem. We will be discussing the senior model which works at higher clock rates and features the most shader processors in its integrated graphics core. In other words, the A10-7850K is the fastest modern APU from AMD. Both models have a TDP of 95 watts.
There exists a third Kaveri-based desktop APU which features a TDP of 65 watts. This energy-efficient A8-7600 model is yet unavailable in retail, so we will review it at some other time.
The new microarchitecture of the Kaveri’s x86 cores is perhaps the most intriguing innovation this APU brings about. When AMD’s previous high-performance designs, Bulldozer and Piledriver, had failed to catch up with Intel’s Core series in performance, the Steamroller came to be regarded as the next attempt for AMD to deliver a truly competitive top-end product. AMD was expected to get rid of the key downside of the earlier designs, namely their low single-threaded performance.
Well, even if the Steamroller is indeed a big step forward compared to its predecessors, it is no breakthrough. AMD will not implement it in fast multicore processors, limiting it to the quad-core Kaveri which is positioned as an affordable integrated solution. AMD claims that the new microarchitecture can improve performance by about 20% over the Piledriver at the same clock rate. The Steamroller’s more complex design and mobile targeting mean that its top clock rate is lower, though, and the practical performance benefits are rather small, too. Even the introduction of the more advanced 28nm tech process doesn’t save the day for AMD.
Thus, the Steamroller should be viewed as an improvement on the previous Bulldozer and Piledriver microarchitectures, judging by performance as well as internal design. AMD keeps on optimizing its basic microarchitecture in small steps, building on the Bulldozer foundation. As in the earlier designs, the Steamroller features dual-core x86 modules with a shared 2MB L2 cache per module. There are no innovations in terms of instruction sets. The Steamroller doesn’t support AVX2.
It is the distribution of resources shared by the cores within a single module that has been revised. In the original Bulldozer concept, quite a lot of functional subunits existed in a single copy within each dual-core module, including the instruction fetch and decode subunits, the floating-point unit, and cache memory. It helped make the semiconductor die less complex, reducing its heat dissipation and enabling multicore processors with a rather high frequency potential. The downside of that concept was that the shared resources would become a bottleneck under multithreaded loads. Practice suggests that the instruction decoding stage was the most performance-limiting bottleneck, so AMD doubled the number of decode units in the Steamroller microarchitecture.
Now each core in the dual-core module has a dedicated decode unit capable of processing up to four x86 instructions per clock cycle. Instructions are still fetched by a shared fetch unit whose efficiency and performance have been improved in other ways. In particular, branch prediction algorithms have become better by using larger buffers. The size of the shared L1 instruction cache has been increased from 64 to 96 kilobytes. The cache itself is now 3- rather than 2-way associative.
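The L1 instruction cache change described above can be checked with a little arithmetic. The sketch below is only an illustration: the 64-byte line size is our assumption (the text gives only the total size and the associativity), and the function name is made up for the example. Under that assumption both caches have the same number of sets, so the extra 32 KB would come entirely from the added third way.

```python
# Back-of-envelope cache geometry: sets = size / (ways * line_size).
# LINE_SIZE_BYTES is an assumption (64 bytes is common for x86 caches);
# the article only states the total size and the associativity.
LINE_SIZE_BYTES = 64


def cache_sets(size_kb: int, ways: int, line_size: int = LINE_SIZE_BYTES) -> int:
    """Return the number of sets in a set-associative cache."""
    total_lines = size_kb * 1024 // line_size
    return total_lines // ways


piledriver_sets = cache_sets(64, 2)   # 64 KB, 2-way (earlier design)
steamroller_sets = cache_sets(96, 3)  # 96 KB, 3-way (Steamroller)

print("Piledriver L1I sets: ", piledriver_sets)
print("Steamroller L1I sets:", steamroller_sets)
```

If the real line size differs, the set counts change but remain equal to each other, since 64/2 and 96/3 both give 32 KB per way.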
It must be noted that the doubling of the decode units and the other optimizations are only meant to get rid of the main bottleneck in the microarchitecture. We can’t expect the Steamroller to deliver double performance. There are still a few bottlenecks in the instruction fetch and execution stages which are only going to be dealt with in the next microarchitecture revision, codenamed Excavator.
Besides the mentioned changes in the front part of the execution pipelines, the Steamroller only features some minor improvements that don’t affect its performance much. For example, the FPU’s execution subunits have been balanced out to optimize their load level. The interface between the L1 and L2 caches now ensures faster data transfers. Some of the Steamroller improvements are only meant to achieve better energy efficiency: the L2 cache is split up into four independent sections each of which can be turned off to save power whereas the decode units now have a micro-op queue which, when full, triggers the shutting-down of those units, too.
The higher performance of the Steamroller microarchitecture goes hand in hand with higher complexity. The transistor count per dual-core module has increased by over 60% in the Piledriver to Steamroller transition due to microarchitecture improvements and new automated methods of semiconductor chip integration. Thus, the Steamroller seems to diverge from the original concept of making processors out of a number of high-frequency and low-complexity cores. In practical terms, it is reflected in AMD’s unwillingness to apply the Steamroller to multicore FX series products.
Anyway, AMD promotes the Steamroller optimistically, emphasizing its advantages and glossing over their tradeoffs. According to the official data, the hit rate for the L1 instruction cache is improved by 30%. The branch prediction rate is improved by 20%. The overall scheduler efficiency is 5 to 10% higher now. All of this helps optimize the load level of the execution devices by about 25%.
Of course, such claims must be checked out in practice. So we are going to compare the actual performance of quad-core Richland and Kaveri-based processors (based on the Piledriver and Steamroller microarchitecture, respectively) clocked at the same frequency of 4.0 GHz in the synthetic benchmarks of the AIDA64 4.30.2907 utility. We also throw in a quad-core Haswell clocked at 4.0 GHz with Hyper-Threading turned off. The Richland’s results serve as the baseline for better readability.
The picture is rather gloomy for AMD. For all their efforts, AMD developers have not been able to ensure a substantial increase in performance. The Steamroller is a mere 10% faster than the Piledriver on average. There are even scenarios where the new microarchitecture is slower, like in the Queen benchmark which focuses on branch predictions and penalties associated with wrong predictions. This raises some doubts about AMD’s claims that the input section of the execution pipeline has been optimized.
It is the hashing benchmark that shows the highest performance benefits from the Steamroller microarchitecture. The benchmark uses the standard SHA1 algorithm with integer vector instructions.
The diagram shows the gap between AMD and Intel solutions, too. There’s a twofold difference between the Kaveri- and Haswell-based processors working at the same clock rate and having the same number of x86 cores. AMD’s new microarchitecture doesn’t even try to compete in terms of sheer speed, so the quad-core Kaveri can only be viewed as a competitor to the dual-core i3 models when it comes to computing performance.
Now let’s check out the floating-point performance.
The Kaveri is an average of 6-7% ahead of the Richland in this test, both processors working at the same clock rate. The AMD processors are very slow in comparison with the Haswell, which might be expected as the quad-core Richland and Kaveri-based products only incorporate two FPUs.
Thus, the Kaveri series is just as bad in terms of x86 performance as its predecessors. Whatever claims AMD may make about their innovations, they cannot compete with Intel’s quad-core solutions. We’ll talk about the practical performance of Kaveri APUs in popular applications shortly. Right now, let’s discuss the integrated graphics core. AMD is much better at it than at developing x86 cores.
Codenamed Spectre, the integrated graphics core of Kaveri APUs features an updated architecture. The Richland’s graphics was based on the VLIW4 architecture whereas the new integrated GPU sports the newest GCN 1.1 architecture and can match up-to-date graphics cards in functionality (it has the same architecture as AMD’s Volcanic Islands solutions). Of course, the Spectre has much fewer shader processors than flagship Hawaii-based cards, yet it is classified as Radeon R7 and supports all modern APIs including DirectX 11.2, OpenGL 4.3 and Mantle.
The GCN architecture hasn’t changed much on its way from discrete graphics cards to APUs, so we see several compute units, each with 64 shader processors (IEEE 754-2008 compliant), 4 vector units and 16 texture-mapping units. In its maximum configuration the Kaveri’s graphics core may contain up to eight compute units combined with a geometry coprocessor and up to eight raster operators capable of processing up to 8 pixels per clock cycle (or up to 32 pixels with no color).
So, the Kaveri may have a total of up to 512 shader processors, being in-between such midrange graphics cards as Radeon R7 250 and Radeon R7 250X. AMD estimates the Spectre’s theoretical performance at 737 gigaflops but we must be aware that the gaming speed of an integrated graphics core is largely limited by the memory bandwidth rather than by the core’s shader processors. That’s why the Spectre is actually slower than the $100 discrete cards.
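The 737-gigaflop figure can be reproduced from the numbers in the text. This is a rough sketch rather than AMD’s own formula: we assume each shader processor retires one fused multiply-add (two floating-point operations) per clock cycle, and we take the 720 MHz GPU clock quoted for the A10-7850K. The function name is ours. The Richland figures (384 shader processors at 844 MHz) are included for comparison.

```python
# Peak single-precision throughput = shaders * FLOPs per clock * clock.
# FLOPS_PER_CLOCK = 2 is an assumption (one fused multiply-add per shader
# processor per cycle); the article does not spell the formula out.
FLOPS_PER_CLOCK = 2


def peak_gflops(shaders: int, clock_mhz: float) -> float:
    """Theoretical peak throughput in gigaflops."""
    return shaders * FLOPS_PER_CLOCK * clock_mhz / 1000.0


COMPUTE_UNITS = 8       # maximum Kaveri configuration
SHADERS_PER_CU = 64     # shader processors per GCN compute unit
shaders = COMPUTE_UNITS * SHADERS_PER_CU

kaveri = peak_gflops(shaders, 720)    # A10-7850K graphics core
richland = peak_gflops(384, 844)      # flagship Richland graphics core

print(f"Kaveri:   {shaders} shaders, {kaveri:.0f} GFLOPS")
print(f"Richland: 384 shaders, {richland:.0f} GFLOPS")
print(f"Advantage: {100 * (kaveri / richland - 1):.1f}%")
```

Keep in mind that this is a ceiling for shader arithmetic only. It says nothing about memory bandwidth, which is what actually limits an integrated graphics core in games.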
Besides the memory interface, the Kaveri’s integrated GPU has no limitations in its architecture compared to its discrete cousins. The Spectre can process and rasterize up to one geometrical primitive per clock cycle, has a large cache for storing primitive data, and features improved geometry and hardware tessellation performance thanks to data buffering optimizations implemented in the GCN architecture.
The Kaveri’s key feature which is emphasized by AMD is that the graphics core resources can be utilized for heterogeneous computing sharing the same system memory with the x86 cores. For that purpose, the graphics core incorporates eight independent asynchronous compute engines, which can work in parallel with the graphics command processor and serve up to eight instruction queues each. Each engine can access the APU’s cache and memory controller, implementing HSA technologies.
The asynchronous compute engines can actually work independently, allowing AMD to promote the Spectre as an additional eight processing cores. In AMD’s terms, a processing core is a programmable hardware unit capable of running at least one process in virtual memory independently of other cores. We just have to keep in mind that the GPU cores require special program code and may only be utilized by special software.
Talking about the functionality of the Kaveri’s graphics core, we must mention its TrueAudio coprocessor for hardware-accelerated dynamic 3D audio effects. As earlier, the processor incorporates dedicated engines, VCE and UVD, for encoding and decoding HD video content. Their capabilities have been enhanced once again. The VCE engine version 2 features improved encoding quality by being able to introduce B-frames in the YUV420 color space and supporting the YUV444 color model. The UVD engine is version 4 now and features higher stability when handling errors in video streams.
AMD’s marketing department used to be criticized for doing a poor job of promoting new products and technologies. Now the situation is just the opposite. AMD’s marketing efforts manage to make users interested even in those features that are not actually available as yet. That’s what we have with HSA: the Kaveri only has hardware support for x86 and graphics cores to have common memory access but AMD is already touting the new technology, demonstrating impressive diagrams and promising huge performance benefits.
The fact is there is no HSA yet. To implement and use HSA capabilities the hardware support must be complemented with a compatible software infrastructure which is currently not present even in some basic form. First off, AMD has not yet released an HSA-compatible driver so there’s no talking about any publicly available software. Of course, HSA-enabled applications will come out sooner or later, but we suspect that the Kaveri APUs will have become obsolete by that time. As for today, the Kaveri’s HSA support may only be interesting for software developers who can polish their future products on this hardware.
All currently available applications with heterogeneous computing support make use of the OpenCL 1.2 API which doesn’t regard processor cores of different types as equivalent to each other. From an ordinary user’s standpoint, the Kaveri is just the same as its predecessor Richland when it comes to hybrid computing, but we still think we have to say a few words about the hardware HSA support. Just keep it in mind that it is all a matter of long-term perspectives.
So the key point of heterogeneous computing is that many tasks can be accomplished on the GPU’s stream processors faster and more efficiently than on scalar x86 cores. By combining both types of hardware resources it is possible to ensure efficient execution of a wide variety of applications. Heterogeneous processors didn’t really come into their own at first, though. Application developers found it hard to create appropriate software. The HSA technologies are meant to facilitate the development of heterogeneous program code and also make it execute faster.
The first HSA component is called Heterogeneous Uniform Memory Access (hUMA). It provides simple access to all system memory irrespective of what ALU subunit issues a memory request. Thus, Kaveri’s x86 and graphics cores all have the same access directly to cache and system memory. The hardware hUMA implementation ensures cache memory coherency and lets the Kaveri’s integrated GPU work with physical and virtual memory in 32-gigabyte address space. To cut it short, hUMA eliminates any limitations and differences between system and graphics memory.
The second important technology which is based on HSA and makes the Kaveri a truly heterogeneous processor is Heterogeneous Queuing (hQ). Currently, all computing load goes through the x86 cores one way or another, even if it is meant for the integrated GPU. It is the x86 cores that are responsible for submitting tasks to the GPU and checking out their completion, which involves certain latencies. With hQ technology, the GPU can interact with the application and x86 cores directly, which eliminates the difference between the CPU and the GPU, reduces latencies and simplifies the parallel processing of data on computing cores of different types. Like the CPU, the GPU acquires the right to create computing threads and submit them for execution.
The whole HSA concept looks highly promising from a theoretical standpoint. AMD suggests it will become widespread in image/video playback and editing applications, in new-generation voice/gesture/facial recognition interfaces, and in games (for physics computations or AI modeling).
We only have to wait for applications with HSA-enabled OpenCL 2.0 to come out. And that will not happen until next year.
Now that we’ve discussed the CPU and GPU components of the hybrid Kaveri processor, let’s take a look at it as a whole. The Kaveri, like its predecessors Trinity and Richland, is based on two dual-core Steamroller processing modules + an integrated GPU. In its maximum configuration, the new-generation hybrid processor only has four x86 cores, the more advanced integrated graphics core Radeon R7 being the single important difference from the previous models. The Radeon R7 features the new GCN 1.1 architecture and may incorporate up to 512 shader processors (a third more than in the maximum APU configurations of the previous generation).
Considering the small improvements in the Steamroller microarchitecture, the Kaveri is more graphics-oriented than its predecessors. The Richland’s x86 part accounted for 58% of the total transistor budget whereas the Kaveri’s x86 part accounts for 53%. The new APU is overall more complex, though. It is made out of 2.41 billion transistors as opposed to the Richland’s 1.3 billion. That’s even more than in Intel’s Haswell with GT3 graphics (1.8 billion). So the Kaveri series proves the fact that high complexity of a semiconductor die doesn’t necessarily translate into high performance. It does provoke serious manufacturing problems, however.
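The percentages above are easier to judge as absolute numbers. The sketch below simply multiplies the quoted shares by the quoted transistor totals; the variable names are ours, and we assume that the "x86 part" share covers the same blocks in both APUs, which the text does not confirm.

```python
# Transistor budget split, using only the figures quoted in the text.
# Assumption: the x86 share is measured the same way for both APUs.
RICHLAND_TOTAL_BN = 1.3     # billions of transistors
KAVERI_TOTAL_BN = 2.41
RICHLAND_X86_SHARE = 0.58
KAVERI_X86_SHARE = 0.53

richland_x86 = RICHLAND_TOTAL_BN * RICHLAND_X86_SHARE
kaveri_x86 = KAVERI_TOTAL_BN * KAVERI_X86_SHARE
richland_rest = RICHLAND_TOTAL_BN - richland_x86
kaveri_rest = KAVERI_TOTAL_BN - kaveri_x86

x86_growth = kaveri_x86 / richland_x86
rest_growth = kaveri_rest / richland_rest

print(f"x86 part:      {richland_x86:.2f} -> {kaveri_x86:.2f} bn (x{x86_growth:.2f})")
print(f"Everything else: {richland_rest:.2f} -> {kaveri_rest:.2f} bn (x{rest_growth:.2f})")
```

So even though the x86 share of the die shrinks, the x86 part still grows by roughly two thirds in absolute terms, which is in line with the 60%-plus growth per module mentioned earlier, while the rest of the chip roughly doubles.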
The Kaveri is manufactured on 28nm tech process by GlobalFoundries. Optimized for high transistor density, the process is referred to as SHP or Super High Performance. The SOI technology is not used. As a result, the Kaveri semiconductor die is 245 sq.mm large, which is about the same size as the 32nm Richland die.
Kaveri APU die
The downside of the high transistor density is low frequency potential. The top clock rate of the Kaveri’s x86 part is 3.7 GHz whereas its integrated graphics core is clocked at up to 720 MHz. Manufactured on 32nm tech process with SOI, the Richland’s CPU and GPU were clocked at up to 4.1 GHz and 844 MHz, respectively, which is about 11% and 17% higher. As a kind of compensation, AMD promises some power dissipation reduction in the new APUs whose desktop modifications have a specified TDP of 95/65/45 watts. The Richland series had a peak TDP of 100/65/45 watts but the 45W models were not widely available. Energy-efficient Kaveri products with a TDP of less than 95 watts are not available yet, either.
Thus, the Kaveri model range for desktop PCs consists of only two products: A10-7850K and A10-7700K. They have four x86 cores each but differ in clock rates. The A10-7850K has a base clock rate of 3.7 GHz whereas the A10-7700K has 3.4 GHz. The Turbo Core technology can boost the clock rate to 4.0 and 3.8 GHz, respectively. The APUs also differ in the number of shader processors. The A10-7850K has the maximum of 512 shader processors while the A10-7700K has 384, just like the Richland. The integrated graphics core is clocked at 720 MHz in both Kaveri models.
The Kaveri processors come out together with the Socket FM2+ platform which introduces a new CPU socket. It was meant to add DDR4 SDRAM support in the first place but something went wrong and the Kaveri’s memory controller only supports two standard DDR3 SDRAM channels. Later on, AMD gave up on the idea of supporting DDR4 in the next APU generation, codenamed Carrizo, which was to be compatible with Socket FM2+. So it turns out now that the new socket has only been introduced to make users upgrade their mainboards.
As expected, Socket FM2+ is very similar to Socket FM2 but has differently positioned keys which make it impossible to insert a new Kaveri processor into an old Socket FM2 mainboard. Socket FM2+ mainboards, on the contrary, are compatible with older processors of the Trinity and Richland generations. Old processor coolers can be used with Socket FM2+ mainboards, too.
Socket FM2 on the left and Socket FM2+ on the right
Socket FM2+ mainboards have been around for quite a while already, so you should have no problems finding one for your Kaveri APU. They are based on the Bolton series chipsets (A88X and A78) which have almost the same specs as their Hudson predecessors (A85X and A75).
The new functions offered by Socket FM2+ mainboards are limited to their support for PCI Express x16 3.0 and high-speed DDR3 SDRAM (up to DDR3-2400). But both features are actually provided by Kaveri APUs which have updated PCIe and memory controllers. So if you install an older APU on your Socket FM2+ mainboard, you won’t get the PCIe x16 3.0 and DDR3-2400 SDRAM support.
The A88X and A78 chipsets only have one new capability of their own. It is the upgraded SATA RAID controller which adds TRIM support for RAID0 arrays built out of solid state drives.
For today’s tests we’ve got a sample of the senior Kaveri-based APU with the model name of A10-7850K. Here are its specs in comparison with the flagship Richland-based model:
As you can see, the senior Kaveri model is more expensive than the A10-6800K but doesn’t offer much for that money. In fact, it is only superior in terms of GPU performance as its GPU features a new architecture and has more shader processors. On the other hand, the graphics performance of the A10-7850K is going to be limited by its memory bandwidth rather than its GPU. For example, the discrete card Radeon R7 250 has only 384 shader processors but comes with 73.6GB/s GDDR5 SDRAM. The A10-7850K will only have a peak memory bandwidth of 34.1 GB/s if equipped with dual-channel DDR3-2133.
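The bandwidth gap quoted above follows from the usual peak-bandwidth formula: transfer rate times bus width. Here is a rough sketch. The 64-bit width of a DDR3 channel is standard, but the Radeon R7 250 parameters (a 128-bit bus at 4600 MT/s) are our assumption, chosen because they reproduce the 73.6 GB/s figure; the function name is ours, too.

```python
def peak_bandwidth_gb_s(transfers_mt_s: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return transfers_mt_s * bus_width_bits / 8 / 1000.0


DDR3_CHANNEL_BITS = 64  # one DDR3 channel is 64 bits wide

# A10-7850K with dual-channel DDR3-2133 (figures from the text)
apu_2133 = peak_bandwidth_gb_s(2133, 2 * DDR3_CHANNEL_BITS)

# Same APU with the fastest supported memory, DDR3-2400
apu_2400 = peak_bandwidth_gb_s(2400, 2 * DDR3_CHANNEL_BITS)

# Radeon R7 250 with GDDR5: 128-bit bus and 4600 MT/s are ASSUMED values
# that happen to match the 73.6 GB/s quoted in the text.
r7_250 = peak_bandwidth_gb_s(4600, 128)

print(f"APU, DDR3-2133 x2: {apu_2133:.1f} GB/s")
print(f"APU, DDR3-2400 x2: {apu_2400:.1f} GB/s")
print(f"R7 250, GDDR5:     {r7_250:.1f} GB/s")
```

Even with DDR3-2400 the APU gets about half of the discrete card’s bandwidth, and it has to share that bandwidth with the x86 cores.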
The GPU is clocked at 720 MHz in 3D applications. In 2D mode the clock rate is dropped to 350 MHz to save power. It must be noted that the Richland’s graphics worked at higher clock rates, so the difference between the A10-7850K and A10-6800K in terms of theoretical performance is about 13% in favor of the newer APU: 737 vs. 648 gigaflops.
The x86 part is not going to ensure such performance benefits. The Steamroller microarchitecture doesn’t bring about a significant increase in the number of instructions executed per clock cycle whereas the clock rate of the A10-7850K is considerably lower compared to its predecessor.
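To a first approximation, single-threaded speed scales as instructions per clock multiplied by clock rate, so the tradeoff described above can be sketched as follows. This is a simplification under stated assumptions: we compare the 3.7 GHz of the A10-7850K with the 4.1 GHz quoted for the Richland, ignore turbo behavior and memory effects, and plug in both AMD’s claimed 20% per-clock gain and the 10% average we measured in the synthetic tests. The names are ours.

```python
def relative_speed(ipc_gain: float, new_clock_ghz: float, old_clock_ghz: float) -> float:
    """Speed of the new chip relative to the old one (1.0 = parity),
    assuming performance scales as IPC * clock rate."""
    return ipc_gain * new_clock_ghz / old_clock_ghz


KAVERI_CLOCK_GHZ = 3.7     # A10-7850K
RICHLAND_CLOCK_GHZ = 4.1   # A10-6800K, as quoted in the text

claimed = relative_speed(1.20, KAVERI_CLOCK_GHZ, RICHLAND_CLOCK_GHZ)
measured = relative_speed(1.10, KAVERI_CLOCK_GHZ, RICHLAND_CLOCK_GHZ)

# Per-clock gain needed just to match the older APU
break_even = RICHLAND_CLOCK_GHZ / KAVERI_CLOCK_GHZ

print(f"With AMD's claimed 20% gain:  {claimed:.3f}x")
print(f"With the measured 10% gain:   {measured:.3f}x")
print(f"Break-even per-clock gain:    {100 * (break_even - 1):.1f}%")
```

In this simple model the measured per-clock gain leaves the new APU at rough parity with the old one or slightly behind it, whereas AMD’s claimed 20% would have put it about 8% ahead.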
AMD markets the new APU at the same price as junior Core i5 models, which seems to be too expensive. Perhaps we just don’t see some hidden benefits?
According to the CPU-Z utility, the fully loaded A10-7850K works at 3.7 GHz and 1.328 volts, which is comparable to the typical voltage of AMD’s earlier APUs. The Turbo Core technology boosts the clock rate to 4.0 GHz when only one or two Steamroller modules are in use. It is good that AMD has implemented the frequency adjustment correctly in the Kaveri, so we didn’t observe any frequency drops below the baseline of 3.7 GHz during our tests as we had seen with the earlier products. When idle, the A10-7850K is clocked at 1.7 GHz using power-saving technologies. The APU-integrated North Bridge is clocked at a lower frequency than the APU itself. This frequency is 1.8 GHz for the A10-7850K.
The A10-7850K is shipped in AMD’s traditional black-and-red packaging. As is mentioned on the box, the APU is Black Edition, so its frequency multipliers are unlocked, letting you easily overclock both its x86 and graphics cores.
The APU is shipped together with a simple cooler that consists of an aluminum heatsink and a PWM-regulated 70mm fan AVC DESC0715B2U.
Unfortunately, this cooler is no good for serious loads. At the maximum speed of 4100 RPM it is rather noisy. And it cannot cope with the A10-7850K when the latter is overclocked.
We will compare the AMD A10-7850K APU with its predecessor as well as with comparably priced products from Intel. So besides the senior Kaveri model, we take the flagship Richland model A10-6800K and two Haswell-based processors: the fastest dual-core model Core i3-4340 and the junior quad-core model Core i5-4430. Take note that the A10-7850K is close to Intel’s quad-core products in price but we only expect it to be comparable to the dual-core Haswell in terms of performance.
When testing the graphics capabilities of the A10-7850K, we had to use discrete graphics cards. These were Radeon R7 240 and Radeon R7 250 cards with DDR3 and GDDR5 memory from ASUS and Gigabyte.
So here is the full list of the hardware and software components we’re using for today’s test session.
Take note that we benchmark traditional x86 performance with a discrete graphics card Nvidia GeForce GTX 780 Ti. The integrated graphics cores are discussed and tested separately.
As usual, we use the Bapco SYSmark 2012 suite to estimate performance in everyday computing tasks. It emulates a user working in popular office and digital content creation and processing applications. The test produces a single score indicative of the computer’s average performance across different applications. After the launch of Windows 8, SYSmark 2012 got updated to version 1.5 and we use the updated version for our tests.
Just as expected, the improvements in the microarchitecture of the Kaveri’s x86 cores translate into only a very small increase in performance over the earlier APUs. And since the A10-7850K works at a lower clock rate than the A10-6800K, the new Socket FM2+ processor turns out to be inferior to the older one in popular applications. There’s no talking about competing with today’s Core i3 and Core i5 models, of course. Intel’s Pentium series can even sport a higher SYSmark 2012 score than the new quad-core A10-7850K from AMD.
Now let’s take a closer look at the performance scores SYSmark 2012 generates in different usage scenarios. The Office Productivity scenario emulates typical office tasks, such as text editing, spreadsheets, email and web-surfing. This scenario uses the following applications: ABBYY FineReader Pro 10.0, Adobe Acrobat Pro 9, Adobe Flash Player 10.1, Microsoft Excel 2010, Microsoft Internet Explorer 10, Microsoft Outlook 2010, Microsoft PowerPoint 2010, Microsoft Word 2010 and WinZip Pro 14.5.
The Media Creation scenario emulates the creation of a video clip out of prepared materials (digital images and videos) using popular tools from Adobe: Photoshop CS5 Extended, Premiere Pro CS5 and After Effects CS5.
Web Development is a scenario that emulates website authoring. It uses the following applications: Adobe Photoshop CS5 Extended, Adobe Premiere Pro CS5, Adobe Dreamweaver CS5, Mozilla Firefox 3.6.8 and Microsoft Internet Explorer 10.
The Data/Financial Analysis scenario is devoted to statistical and market analysis by means of Microsoft Excel 2010.
The 3D Modeling scenario is about creating 3D models and rendering static and dynamic scenes using Adobe Photoshop CS5 Extended, Autodesk 3ds Max 2011, Autodesk AutoCAD 2011 and Google SketchUp Pro 8.
The last scenario, System Management, creates backup copies and installs software and updates. It involves several different versions of Mozilla Firefox Installer and WinZip Pro 14.5.
The senior Kaveri-based model is inferior to the Richland at most types of load. The only exception is 3D modeling, but even there the A10-7850K is a mere 3% ahead of the A10-6800K. So if you don’t care about integrated graphics, the Kaveri is not a better choice than its predecessor. In fact, even the Core i3-4340, which is substantially cheaper than the A10-7850K, can offer higher performance in everyday applications typical of home and office PCs. The Kaveri just does not impress us as a desktop processor.
As you know, it is the graphics subsystem that determines the performance of the entire platform in the majority of contemporary games if the platform has a fast enough processor. However, the Kaveri is so slow that you can see the difference even if you've got a high-performance discrete graphics card and use maximum visual quality settings. That’s why we have only one test mode here: Full-HD resolution and maximum quality. We can see a substantial difference between our processors with our GeForce GTX 780 Ti card.
Just as we wrote above, the A10-7850K offers lower computing performance than the A10-6800K. Although the Richland APU is based on the older Piledriver rather than the new Steamroller microarchitecture, it has a 10% higher clock rate and a more aggressive Turbo Core setup. This is enough to ensure a higher frame rate with the same discrete graphics card.
Well, it hardly even matters since none of AMD’s modern APUs is any good for gaming configurations with discrete graphics. Neither the A10-7850K nor the A10-6800K can compete even with the dual-core i3-4340 in gaming performance. And it shouldn’t come as a surprise to you if you’ve read our earlier reviews. AMD processors with Bulldozer and subsequent microarchitectures are far from brilliant when it comes to gaming.
In Autodesk 3ds Max 2014 we benchmark the speed of mental ray rendering of a complex 3D scene.
There are not so many scenarios where the Kaveri series can show decent computing performance. 3ds Max 2014 is perhaps one of them. Even though the new quad-core APU from AMD can’t match the speed of the junior quad-core Haswell, it is at least no worse than the dual-core i3-4340. By the way, this test shows a positive effect from the improvements in the Steamroller design: the A10-7850K is as much as 18% ahead of the A10-6800K.
We benchmark performance in Adobe Photoshop CC using our custom test which is based on the Retouch Artists Photoshop Speed Test and consists of typical processing of four 24-megapixel images captured with a digital camera.
We see a typical picture in Photoshop: the new A10-7850K falls behind its predecessor A10-6800K by 5% and is downright slow in comparison with Intel's products. Even the dual-core i3-4340 beats the senior quad-core Kaveri model by 42%.
The performance in the video editing suite Adobe Premiere Pro CC is measured as the time it takes to render a Blu-ray project with 1080p25 video into H.264 format and apply special effects to it.
The Steamroller-based A10-7850K is a little faster than its Piledriver-based predecessor but the overall situation remains the same. The four cores from AMD are less effective than Intel's modern dual-core solution with Hyper-Threading technology. Comparing the A10-7850K with the comparably priced Core i5-4430, we’d say that they are products of absolutely different classes.
In our ABBYY FineReader 11.0 test we convert a scanned document with lots of formulas and images into a text format.
We’ve tested the new Kaveri in very different applications but the A10-7850K hasn’t been able to compete with the Core i3, let alone the Core i5. Here, the senior Kaveri model is 17% and 28% slower than the Core i3-4340 and the Core i5-4430, respectively. It is also worse than its Socket FM2 predecessor A10-6800K.
The processors’ performance in cryptographic tasks is measured with the built-in benchmark of the popular TrueCrypt utility that uses triple AES-Twofish-Serpent encryption. Besides optimizations for multi-core CPUs, it supports the AES instructions.
This is the only diagram in the x86 performance section of our review where the Kaveri looks really good. The A10-7850K is 12% faster than the A10-6800K and even outperforms the competing CPUs from Intel!
To test the processors’ performance at data archiving we run WinRAR 5.0. Using maximum compression rate, we archive a 1.7GB folder with multiple files.
The new Steamroller microarchitecture cannot make up for the reduced clock rate of the Kaveri processor which takes more time than the A10-6800K to compress the same files. The new APU is also up to 50% slower than Intel’s processors here.
In order to measure how fast the tested CPUs can transcode video into H.264 format we used x264 FHD Benchmark 1.0.1 (64 bit). It measures the time it takes the x264 coder to convert an MPEG-4/AVC video recorded in 1920x1080@50fps resolution with default settings. The results have high practical value because the x264 codec is part of popular transcoding utilities such as HandBrake, MeGUI, VirtualDub, etc. We regularly update the coder used in this performance test. This time around, we use version r2389, which supports all contemporary instruction sets including AVX2.
Along with final rendering and encryption, video encoding is a kind of application where the A10-7850K manages to beat the A10-6800K. Moreover, the senior Kaveri model catches up with the dual-core i3-4340! That’s indeed an achievement for AMD's new processor design considering the previous tests.
Encoding video with a bare coder is hardly a real-life application, so we want to check out the speed of video transcoding with the popular free tool Freemake Video Converter 4.1.0. It uses the FFmpeg library and is based on the x264 coder too, but features certain optimizations. We disable CUDA for this test to create maximum load on the CPUs’ computing cores but enable DXVA optimizations.
Freemake Video Converter doesn’t yet use the AVX2 instructions, creating more favorable conditions for AMD APUs that don’t support AVX2. The quad-core A10-7850K is ahead of its predecessor as well as of the dual-core Haswell-based i3-4340. The gap is small, though. We still can’t say that the updated microarchitecture makes AMD's quad-core processors superior to Intel's dual-core ones in terms of x86 performance.
We’ve passed the most unpleasant part of this testing for the new APU. We’ve made sure the Kaveri’s x86 performance is low, so now we want to check out its graphics department. The A10-7850K looks promising in this respect as its integrated graphics core features high theoretical performance. AMD claims the Kaveri lets you do without any discrete graphics card if you want to use your Socket FM2+ platform for gaming. According to AMD’s official data, the new APU can ensure a playable frame rate (over 30 fps at Full-HD resolution) not only in the majority of online projects but also in popular single-player games.
Let’s see if this is indeed so. To make our testing complete, we will compare the A10-7850K with other hybrid processors as well as with inexpensive discrete graphics cards (Radeon R7 240 and Radeon R7 250 with DDR3 and GDDR5 SDRAM).
As a tentative test of the 3D performance of the graphics core integrated into the Kaveri processor, we will run Futuremark 3DMark. Its Cloud Gate test is designed to benchmark DirectX 10 performance of typical home PCs whereas the most resource-consuming Fire Strike is targeted at gaming DirectX 11-compatible configurations.
So AMD is right in claiming high performance for the A10-7850K’s integrated graphics. The test results suggest that it is competitive with discrete graphics cards with DDR3 and comfortably ahead of other integrated graphics solutions. The Fire Strike scores are especially illustrative. The A10-7850K is twice as fast as the Haswell’s GT2-class graphics and 50% faster than the Radeon HD 8670D core integrated into the A10-6800K APU. It is even a little better than the discrete card Radeon R7 250 with DDR3 memory. We might expect this as the senior Spectre version has 512 shader processors whereas the Richland and the Radeon R7 250 have only 384.
However, the low bandwidth of dual-channel DDR3 SDRAM which is used on Socket FM2+ platforms doesn’t let the A10-7850K show its full graphics potential. The Radeon R7 250 with GDDR5 memory is considerably faster although its GPU is inferior in specs. So if AMD wants to make its integrated graphics better, it should consider upgrading to memory subsystems with much higher bandwidth or introducing a large and high-speed cache as in Intel’s Iris Pro Graphics.
Anyway, 3DMark is a synthetic benchmark, so it wouldn't be quite correct to form any general conclusions on its basis. Let’s first check out the integrated graphics cores in actual 3D games. There are two test modes: 1) Full-HD resolution (1920x1080) with low or medium visual quality settings and 2) 1280x720 pixels with medium or high visual quality settings. We do not enable full-screen antialiasing.
Battlefield 4 is a highly popular multiplayer shooter with rather high system requirements. The A10-7850K’s integrated graphics is quite capable of delivering a playable frame rate at the Full-HD resolution. You can even try to enable medium visual quality settings. The other integrated graphics solutions can’t offer such a high level of performance.
And if the resolution is dropped to 720p, the A10-7850K even allows using high visual quality settings. On the other hand, the A10-7850K is inferior to the discrete Radeon R7 250 cards in this case, irrespective of the type of onboard memory. So the low clock rate also seems to be a problem for the Spectre graphics.
Developed by Codemasters, F1 2013 is a racing simulator which employs the EGO 3.0 technology we can also find in the DiRT and GRiD game series. Such games do not have high system requirements, so you can enjoy F1 2013 on an integrated graphics core even with high visual quality settings. Although the A10-7850K is inferior to the discrete graphics cards of the Radeon R7 250 class here, it does ensure a playable frame rate. Well, we must admit that Intel’s Haswell processors with GT2 graphics are also quite sufficient for F1 2013 as they are a mere 5% slower than the A10-7850K at the Full-HD resolution. The game is just too CPU-dependent and the Kaveri’s x86 performance isn’t high.
Metro: Last Light is a first-person shooter that is one of the most resource-consuming games in terms of hardware requirements. So it is no wonder that it runs rather slowly on the A10-7850K if you choose the Full-HD resolution. It is only at the 720p resolution that you may try increasing the visual quality settings to medium level. The low Full-HD performance of the A10-7850K in this game must be due to a lack of memory bandwidth. As you can see, the DDR3 version of Radeon R7 250 is even slower whereas the A10-7850K is a mere 6% ahead of the A10-6800K despite the significant difference between their integrated GPUs.
The latest third-person action game in the Tomb Raider series offers the gamer a highly realistic and visually rich world. Despite this, it can run fast enough with minimum settings on integrated graphics cores. The frame rate is high on the AMD APUs even at the Full-HD resolution. The Kaveri should be given credit for allowing you to use medium visual quality settings at 1920x1080 while maintaining a playable frame rate.
Still, the Radeon R7 250 card with only 384 shader processors but GDDR5 memory beats the A10-7850K by 50%. The new APU is a mere 6% ahead of its Richland predecessor, so it looks like the Kaveri’s 512 shader processors can’t be put to good use. AMD should have optimized the memory subsystem instead.
The highly popular MMO simulator World of Tanks is played by lots of gamers on different PC configurations. And the A10-7850K seems to be suitable for it, too. Its integrated graphics core makes the game playable at the Full-HD resolution with medium visual quality settings. However, the Kaveri doesn’t differ much from the senior Richland model here, so the key problem of its integrated graphics core - the low memory bandwidth - shows up once again. We can see that the discrete card Radeon R7 250 is 38% faster although it has lower theoretical performance. It is just equipped with fast GDDR5 memory.
Summing up our graphics tests, we can say that the A10-7850K is indeed faster than any other hybrid processor. The GCN architecture and the increased number of shader processors give the Kaveri’s integrated graphics core an advantage of about 10% over the A10-6800K’s graphics. That’s enough to make most games playable on our Socket FM2+ configuration with A10-7850K processor at Full-HD and medium visual quality settings.
Unfortunately, the graphics core of AMD’s new APU is not good in all applications. Some resource-consuming shooters run too slow on the Kaveri even if you choose minimum visual quality settings and Full-HD resolution. The integrated graphics is fast but its memory bandwidth is low. The Kaveri’s dual-channel DDR3 SDRAM limits the potential of the Spectre GPU.
CPU and GPU tests would have been sufficient for benchmarking hybrid processors in the past, but now there are a lot of applications that can make use of both types of computing cores simultaneously. Such heterogeneous applications utilize the OpenCL 1.1 framework which provides methods for executing computations on the graphics core’s shader pipelines. AMD claims that the majority of multimedia content authoring and processing applications can effectively use all the computing resources offered by today’s APUs. The prospective HSA concept is supposed to make the combined use of CPU and GPU resources easier.
HSA is still far from practical implementation, but there are already applications which can use the GPU via OpenCL 1.1. These include free software as well as commercial products.
Ideally, we wouldn’t want to use special tests to check out OpenCL performance. It would be better if hybrid processors were natively supported by everyday applications including those we use for our standard performance testing. But today heterogeneous computing is not implemented widely. In most cases OpenCL acceleration is only used for specific operations and we need specific tests to see it. That's why we have to dedicate a special section of our review to heterogeneous computing.
Talking about the performance benefits ensured by hybrid processors, AMD uses synthetic benchmarks. Of course, it is easier to develop a special algorithm that can showcase the benefits of heterogeneous computing rather than to redesign an existing application.
So the most popular OpenCL benchmark is Basemark CL. It helps test an APU at three types of tasks: image processing (noise reduction, antialiasing and sharpening), physics modeling (hydrodynamic and wave processes, soft substance modeling), and fractal generation.
It goes without saying that specific tasks can enjoy a huge performance boost via GPU resources. Basemark CL is meant to showcase the computing potential of today's integrated graphics cores. AMD’s APUs have more advanced graphics cores, so their computing potential is higher. With OpenCL optimizations enabled, the A10-7850K is almost twice as fast as Intel’s processors. AMD puts an emphasis on such test results, suggesting that AMD APUs may be superior to their opponents in a world where most resource-consuming applications use both x86 and graphics resources. The question is whether we will ever live in such a world.
Now let’s see what we have in actual applications rather than synthetic benchmarks. As usual, we will start out with WinZIP which has been supporting OpenCL since its last version. As in many other real-life applications, the GPU-based acceleration doesn't always work in WinZIP. The utility only applies it to compressing files larger than 8 megabytes. We don't want to handpick any files, so we just compress a folder with an Adobe Photoshop CC distribution.
WinZIP’s OpenCL acceleration doesn’t produce a big effect and cannot change the overall picture. Intel processors used to be faster in compression tests, so they remain in the lead with OpenCL, too. Moreover, the Haswell processors enjoy a larger performance boost from OpenCL than the Kaveri or Richland do.
The latest versions of the Libre Office suite have introduced experimental OpenCL support. Particularly, the Calc application may apply GPU resources to computing formulas. In our test we measure the time it takes to recalculate a spreadsheet with financial data.
The OpenCL optimization is not yet polished in Libre Office Calc, so we can even see a performance hit when the GPU is used for computing. The Kaveri APUs can’t beat Intel’s Haswell processors irrespective of whether the OpenCL support is turned on or off.
The popular image editing application Adobe Photoshop CC is also declared to support OpenCL. However, this support is limited to only a few filters. AMD recommends benchmarking performance while applying the Smart Sharpen filter. We do so with a 24-megapixel image in our test.
Everything works exactly as it should in this case. The Smart Sharpen filter works faster with GPU-based acceleration on both AMD and Intel processors. The Kaveri-based configuration enjoys a larger performance boost than the other systems, yet the A10-7850K is still inferior to both the Core i5-4430 and the Core i3-4340 even with the OpenCL optimization turned on. It is just important to have fast x86 cores for Photoshop.
Another example of a popular OpenCL-compatible application is the professional video editing tool Sony Vegas Pro 12. When rendering video, it can distribute the load among all the computing resources of hybrid processors.
Just like in the previous test, AMD’s APUs enjoy a substantial performance boost after we enable OpenCL acceleration in Sony Vegas. It amounts to 60% but can't help them beat their Intel opponents. The fact is Intel's Haswell processors support OpenCL, too, and thus accelerate in the same manner. Besides, even with the GPU resources in use, the performance of the x86 cores remains most important. So AMD’s claims that the fast integrated graphics core and software optimizations will make AMD APUs superior to Intel’s hybrid processors do not seem to come true.
HD video transcoding is a special topic when it comes to heterogeneous computing. Intel processors feature a special Quick Sync engine for that purpose which provides hardware transcoding acceleration. AMD offers its VCE engine with the same functionality but it is not used in practice. Instead, the existing applications make use of OpenCL to accelerate video transcoding. To check out the performance benefits, we will run MediaCoder 0.8.28. We transcode an original 1080p@50fps AVC file from the x264 FHD Benchmark 1.0.1 with a bitrate of about 30 Mbps.
The OpenCL acceleration for video transcoding helps ensure certain performance benefits for AMD processors but can't make them competitive with Intel's products with Quick Sync. The high efficiency of the hardware Quick Sync acceleration can’t be achieved by any other means as yet.
Summing it up, we can say that the new Kaveri APUs cannot offer the same performance as comparably priced Intel Haswell processors even in heterogeneous computing applications available today. Theoretically, this may change with the implementation of HSA, but we can’t really be sure about the benefits of HSA and whether it will be implemented at all.
Our tests suggest that the Kaveri isn’t a large improvement over the Richland in terms of performance. But it is expected to be better in terms of heat dissipation and power consumption. AMD specifies a lower TDP for it. Then, the Kaveri is manufactured on a more advanced tech process. And the clock rates of the new A10-class APUs are lower than those of their predecessors. So we hope that the new APUs will be competitive in terms of energy efficiency. Let’s check this out.
The graphs below (unless specified otherwise) show the full power draw of the computer (without the monitor) from the wall socket. It is the total power consumption of all system components. The readings therefore include the losses in the PSU, but our Corsair AX760i is a highly efficient 80 PLUS Platinum product, so their effect on the result is very small. The x86 cores are loaded by running the 64-bit version of the LinX 0.6.5 utility with support for AVX, AVX2 and FMA instructions. The graphics core is loaded by running Furmark 1.13.0. Moreover, we enable Turbo technology and all power-saving technologies to correctly measure the computer's power draw: Intel’s C1E, C6, Enhanced Intel SpeedStep and AMD’s Cool’n’Quiet.
Modern processors don’t consume much power when idle, so the numbers in the diagram are indicative of the power consumption of the whole platform rather than of the APUs. And we don’t see much difference between the LGA1150, Socket FM2 and Socket FM2+ platforms. All of them are economical at zero load.
When the processor has some work to do, we see a typical picture: AMD's APUs need more power than their Intel opponents but deliver lower performance. In other words, the Kaveri's x86 performance per watt is still inferior to the Haswell's. On the other hand, there’s some obvious progress as the A10-7850K consumes 11 watts less than the flagship Richland model.
We have the same picture at graphics load. The A10-7850K consumes more power than Intel’s Haswell-based products but less than its Richland predecessor. Lower power consumption rather than higher performance seems to have been the main goal for the Kaveri developers.
We get the most impressive picture when all of the APU's resources are loaded concurrently.
Here, the A10-7850K is more energy efficient not only than its predecessor but also than the quad-core i5-4430. Moreover, the senior quad-core Kaveri is close to the dual-core Haswell in terms of power draw.
Well, it turns out that the A10-7850K needs about the same amount of power when there’s high load on its x86 cores irrespective of whether the graphics core is loaded or not. How is this possible? The Kaveri's peak power draw is capped, so when all of the APU's resources are in use, the CPU and GPU clock rates are lowered, and quite heavily so.
The CPU section is clocked at 3.0GHz whereas the graphics core frequency is periodically lowered from the default 720 MHz to 650 MHz. That’s why the peak power consumption of the Socket FM2+ platform with A10-7850K is limited to 116 watts in our testing.
The clock rate reduction helps keep the APU's appetite within specified limits but the peak heterogeneous performance suffers. It looks like AMD’s claimed peak combined performance of 856 gigaflops is not attainable in practice because the A10-7850K can’t keep its x86 and graphics cores at their default clock rates simultaneously. The APU’s actual performance is about 760 gigaflops due to the clock rate drop.
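The two gigaflops figures above can be roughly reproduced with a back-of-the-envelope calculation. The sketch below is only an illustration under stated assumptions: the 3.7 GHz default CPU clock, the 8 FLOPs per x86 core per cycle and the 2 FLOPs (one multiply-add) per shader processor per cycle are our assumptions rather than numbers taken from the measurements above, while the 512 shader processors, the 720 MHz and 650 MHz GPU clocks and the 3.0 GHz throttled CPU clock come from the text.

```python
# Rough peak single-precision throughput estimate for an APU.
# Assumptions (not measured in this review): 4 x86 cores, 8 FLOPs per core
# per cycle, a 3.7 GHz default CPU clock, and 2 FLOPs (one multiply-add)
# per shader processor per cycle.

def peak_gflops(cpu_cores, cpu_ghz, cpu_flops_per_cycle,
                shaders, gpu_mhz, gpu_flops_per_cycle=2):
    """Return the combined CPU + GPU peak throughput in gigaflops."""
    cpu = cpu_cores * cpu_ghz * cpu_flops_per_cycle
    gpu = shaders * (gpu_mhz / 1000.0) * gpu_flops_per_cycle
    return cpu + gpu

# Default clock rates (CPU clock is an assumption, GPU clock is from the text).
nominal = peak_gflops(4, 3.7, 8, 512, 720)

# Clock rates observed under combined load: 3.0 GHz CPU, GPU down to 650 MHz.
throttled = peak_gflops(4, 3.0, 8, 512, 650)

loss = 1 - throttled / nominal

print(f"nominal:   {nominal:.1f} GFLOPS")
print(f"throttled: {throttled:.1f} GFLOPS")
print(f"loss:      {loss:.1%}")
```

Since the graphics core clock is only lowered periodically, the throttled figure should be read as a lower bound on the sustained peak rather than as an exact value.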
This effect is going to be persistent because heterogeneous computing is about using all of an APU’s resources simultaneously.
The senior Kaveri model, A10-7850K, is formally an overclocker-friendly APU with unlocked frequency multiplier. This is indicated by the letter K in its model name and by the Black Edition designation on its packaging. However, the 28nm tech process doesn’t endow the Kaveri series with high frequency potential. On the contrary, it is the reason why the A10-7850K works at lower clock rates than the A10-6800K. The new APUs are going to be less overclockable than their predecessors which themselves were far from breaking any overclocking records.
And that’s exactly what we have in practice. The top clock rate at which our A10-7850K remained stable without dropping its frequency due to overheating was 4.4 GHz. We had to increase its voltage to 1.44 volts for that.
Besides the x86 section, the integrated graphics core can be overclocked. After increasing the voltage on the CPU-integrated North Bridge to 1.3 volts, we made the graphics core stable at 900 MHz.
The A10-7850K lets you overclock system memory, too. The top frequency is limited to DDR3-2400, though. It means the high-speed DDR3 SDRAM modes available on the LGA1150 platforms do not work on AMD’s new platform although they might improve the integrated graphics core’s performance which obviously lacks memory bandwidth.
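To put the DDR3-2400 ceiling into perspective, the theoretical peak bandwidth of a dual-channel DDR3 setup can be estimated from the transfer rate and the 64-bit width of each channel. This is a minimal sketch; the function name is ours, and the DDR3-1600 and DDR3-2133 rows are included only for illustration and are not settings quoted in this review.

```python
# Theoretical peak bandwidth of a DDR3 memory subsystem.
# Each DDR3 channel is 64 bits (8 bytes) wide; the "DDR3-xxxx" number is the
# transfer rate in megatransfers per second.

def ddr3_bandwidth_gbs(mega_transfers_per_s, channels=2, bus_bytes=8):
    """Return the peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9

# DDR3-2400 is the platform limit mentioned above; the other two rates are
# illustrative points of comparison only.
for rate in (1600, 2133, 2400):
    print(f"dual-channel DDR3-{rate}: {ddr3_bandwidth_gbs(rate):.1f} GB/s")
```

Real-world throughput is lower than these theoretical figures, and the integrated graphics core has to share it with the x86 cores.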
With the CPU, GPU and DDR3 SDRAM overclocked, we managed to increase our 3DMark Fire Strike score to 1785 points. It means our overclocking ensured a 15% performance boost.
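As a quick sanity check, the numbers above can be compared with the size of the graphics core overclock. The sketch below uses only figures quoted in this section (1785 points, a 15% boost, 720 MHz default and 900 MHz overclocked GPU clock); the baseline score it derives is implied by those figures and is not a separately measured result.

```python
# Compare the 3DMark Fire Strike gain with the GPU clock increase.
score_overclocked = 1785        # points, from the text
score_boost = 0.15              # 15% gain, from the text

# Baseline score implied by the two numbers above (derived, not measured).
implied_baseline = score_overclocked / (1 + score_boost)

gpu_default_mhz = 720           # default graphics core clock
gpu_overclocked_mhz = 900       # stable overclocked graphics core clock
gpu_clock_gain = gpu_overclocked_mhz / gpu_default_mhz - 1

print(f"implied baseline score: {implied_baseline:.0f}")
print(f"GPU clock gain:         {gpu_clock_gain:.0%}")
print(f"score gain:             {score_boost:.0%}")
```

The score grows noticeably less than the graphics core clock does even though the CPU and memory were overclocked as well, which is consistent with the memory bandwidth limitation discussed earlier.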
So, the Kaveri series isn’t good for overclocking. Its frequency potential seems to be smaller even in comparison with the Richland APUs which allowed overclocking their x86 cores to 4.7 or 4.8 GHz and their graphics cores to 1.2 GHz. The Kaveri’s new microarchitecture and 28nm tech process only worsened its overclocking capabilities.
The Kaveri brings about a few innovations and technologies like hardware HSA support, but most of them offer no real benefits right now.
Promoting the Kaveri series to the desktop market, AMD puts an emphasis on several things such as the Steamroller microarchitecture with allegedly higher efficiency, the fast graphics core with GCN architecture, HSA support for heterogeneous computing, and affordable pricing.
However, these advantages are all very questionable. The new Steamroller microarchitecture ensures but a small performance growth which is negated by the reduced clock rates of the new APU models. As a result, senior Richland-based APUs are even faster than the new Kaveri in terms of x86 performance.
The potentially fast graphics core is limited by the low memory bandwidth. The A10-7850K has a third more shader processors than the A10-6800K but its actual gaming performance is only 10% higher. It goes without saying that the Kaveri’s integrated graphics is superior to any other, yet it still cannot ensure a playable frame rate in every game at the Full-HD resolution even with low visual quality settings. It must be granted, though, that the A10-7850K delivers a high enough frame rate at 1920x1080 in a number of popular games including some online ones.
As for HSA, the related hUMA and hQ technologies look highly promising but it'll take a lot of time until they become practically applicable. With the current implementation of heterogeneous computing the Kaveri series isn’t any faster than its Intel competitors. The OpenCL support is but rarely implemented in modern applications. Besides, Intel’s hybrid processors benefit from it just like AMD’s, so this doesn’t change the overall picture.
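The gap between the growth in shader processors and the growth in gaming performance can be expressed as a simple scaling ratio. The sketch below is illustrative only: the function name is ours, and the calculation deliberately ignores the differences in GPU clock rate and architecture between the two APUs, so the result is a rough indicator rather than a precise measure.

```python
# How much of the extra shader hardware turns into extra gaming performance?

def scaling_efficiency(units_new, units_old, perf_gain):
    """Return the performance gain divided by the relative growth in units."""
    unit_gain = units_new / units_old - 1
    return perf_gain / unit_gain

# 512 vs. 384 shader processors and a roughly 10% average gaming advantage,
# as quoted in the text.
unit_gain = 512 / 384 - 1
efficiency = scaling_efficiency(512, 384, 0.10)

print(f"shader count growth: {unit_gain:.0%}")
print(f"scaling efficiency:  {efficiency:.0%}")
```

By this rough measure, less than a third of the additional shader resources shows up as frame rate, which fits the picture of a GPU held back by memory bandwidth.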
For all that, AMD prices the A10-7850K very high, pitting it against junior Core i5 models which are actually much faster except when we use the integrated graphics. That’s why the A10-7850K seems only to be interesting for users of compact entry-level gaming configurations.