True Fusion: AMD A8-3800 APU Review

The desktop Lynx platform built around hybrid Llano processors has finally reached consumers. Let’s take a closer look at it and find out how successful the combination of the old Stars processor cores and a high-performance Radeon GPU actually is.

by Ilya Gavrichenkov
06/29/2011 | 09:00 PM

AMD has never managed to outsell Intel in CPU volume. However, this does not mean that the company plays a secondary role in the x86 processor world. On multiple occasions AMD’s innovations in its own x86 microarchitecture eventually turned into industry-wide trends that Intel also adopted at some point. It was AMD who developed the 64-bit extensions to the x86 architecture and was the first to implement them in actual products; today they are an integral part of the processor microarchitecture. AMD was also the one who pointed out the advantages of integrating the chipset North Bridge into the processor and was the first to move the memory controller into the CPU. Digging further into processor history, we could come up with plenty of episodes like these. All of them indicate that the “second player” in the processor market is not an outsider at all. On the contrary, it possesses enormous engineering and technological potential.


A few years back this potential was reinforced by the acquisition of ATI Technologies, which gave AMD access to top-of-the-line graphics technologies. These technologies helped AMD come up with an even greater innovation known as Fusion. The idea behind Fusion is to combine traditional computational cores with a graphics core that contains many stream processors capable of performing parallel calculations efficiently.

A processor with an integrated graphics core will hardly surprise anyone today: Intel has been shipping products like that for a while. But AMD offers a different approach to the symbiosis of computational and graphics cores. According to the company’s engineers, the graphics core should be responsible not only for displaying the image on the monitor but also for general-purpose processing. The architecture of contemporary graphics cores allows them to process large data arrays in parallel. Therefore, the computational resources of the graphics core can be used effectively in a number of tasks, such as image and video processing, cryptographic algorithms and some scientific problems. Of course, this requires existing software to be properly optimized, but the resulting performance improvement will be a clear indication that the Fusion concept makes perfect sense.

We are already familiar with the very first products engineered within the Fusion concept. The processors codenamed Ontario and Zacate, which belong to the C- and E-series respectively, have proven very effective in inexpensive desktop and mobile systems based on the Brazos platform. However, they are primarily targeted at compact and energy-efficient systems, which is why they have relatively low performance and a rather limited range of applications. Clearly, to bring the Fusion concept to the mass market, AMD needs mainstream platforms and processors that offer sufficient performance for that segment. Therefore, after Brazos AMD is launching two new Fusion platforms: the mobile Sabine and the desktop Lynx. Both are based on the new A-series processors codenamed Llano, which will defend Fusion’s honor in the mainstream processor segment.

Today we are going to talk about the first desktop Llano processors and try to estimate how badly we need the revolutionary hybrid architecture so aggressively promoted by AMD.

Llano: What Is Inside?

Llano Is Fusion

In global terms, the Fusion concept implies that traditional processor cores and graphics cores merge at the hardware and software level. Therefore, any APU (Accelerated Processing Unit) designed according to Fusion principles has a typical structure, which we already discussed when we talked about Ontario and Zacate. Just like these energy-efficient APUs, Llano processors have x86 computational cores, a graphics core and a North Bridge. However, Llano and Zacate are similar only superficially.

The higher status of the new Llano processors, which are positioned for mainstream systems, called for faster APU components than those used in Zacate. Llano’s x86 computational cores are based on the fully functional Stars microarchitecture rather than the simplified Bobcat. An APU may contain two, three or four such cores. The Llano graphics core contains 320 or 400 stream processors, which is 4-5 times more than what Brazos processors have. And the built-in North Bridge supports high-speed dual-channel memory and has a fully-fledged PCI Express interface for external graphics accelerators.

At the same time, although Llano is the second processor in the Fusion family, its overall internal structure is pretty much the only fundamentally new thing about it. In other words, the entire new APU is put together from old components with minimal innovations. If we were to characterize the newcomer without going too deep into details, we would say that it is a combination of an Athlon II X4 processor, a Radeon HD 5570 graphics core and an AMD 870 chipset placed onto a single semiconductor die. Of course, a description like that is only a rough approximation, because Llano does have some minor innovations and unique technical implementations, but there is nothing fundamentally new in this APU.

However, the mere fact that several very complex units were put together on a single semiconductor die is already a serious step forward. It was only possible thanks to the fine 32 nm process, which AMD’s production partner, GlobalFoundries, has finally mastered. So AMD can now offer processors manufactured with the latest production process, which Intel has already been using for about a year.


Llano semiconductor die

The complexity of the Llano semiconductor die has increased to 1.45 billion transistors. In this respect, AMD’s new hybrid processors have slightly surpassed Intel’s Sandy Bridge. However, Llano and Sandy Bridge have very similar die sizes: 228 vs. 216 mm². It means that these processors should have comparable production costs, if we disregard the fact that Intel’s engineering team invested significantly more effort into Sandy Bridge.

However, the similarity in die characteristics between Llano and Sandy Bridge doesn’t really mean much, because these processors distribute their “transistor budgets” very differently. The Intel product is basically a CPU with an integrated graphics core that occupies no more than 20% of the die, while AMD Llano places far greater emphasis on its graphics core: it takes up as much die area as all four processor cores combined.

This is the best illustration of what Llano can offer users. AMD is betting on what it currently does best: graphics. The processor cores have taken a back seat for the moment, which is why the success of the new processor will be closely tied to the overall success of the Fusion concept. If AMD really manages to shift the bulk of the computational load to the GPU stream processors, then Llano will undoubtedly be superior to all existing competitors. However, it is still too early to talk about that: most software still works the old way, relying on traditional x86 processor cores.

Husky CPU Cores

The x86 computational cores in the new Llano processors are codenamed Husky. However, there is actually not much new about them. They use the same K10 “Stars” microarchitecture as all current Socket AM3 processors, and the promising Bulldozer microarchitecture is expected to arrive in AMD’s hybrid processors only next year. So at this point we shouldn’t expect Llano to work any performance wonders, at least in traditional applications.

At the same time, AMD engineers tried to freshen up the old microarchitecture and improve the performance of the Husky cores at least a little over that of the cores inside Athlon II and Phenom II processors. The official data claim a 6% average boost; our test session will show how successful this attempt was.

Much of this gain comes simply from a larger L2 cache: each Llano core has its own 1 MB L2 cache. However, the new processors have no shared L3 cache at all, so their total cache size is not that big by today’s standards.
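A quick arithmetic sketch makes the cache comparison concrete. The Llano figures come from the text above; the Athlon II X4 (512 KB L2 per core, no L3) and Phenom II X4 (512 KB L2 per core plus a 6 MB shared L3) figures are added here only for comparison, and L1 caches are ignored.

```python
# Cache sizes in KB. Llano values are from the article; the Athlon II X4 and
# Phenom II X4 values are added for comparison. L1 caches are left out.
CONFIGS = {
    "Llano A8 (4 Husky cores)": {"cores": 4, "l2_per_core_kb": 1024, "l3_kb": 0},
    "Athlon II X4": {"cores": 4, "l2_per_core_kb": 512, "l3_kb": 0},
    "Phenom II X4": {"cores": 4, "l2_per_core_kb": 512, "l3_kb": 6144},
}


def total_l2_l3_kb(cfg):
    """Total L2 + L3 capacity of one processor configuration, in KB."""
    return cfg["cores"] * cfg["l2_per_core_kb"] + cfg["l3_kb"]


for name, cfg in CONFIGS.items():
    print(f"{name}: {total_l2_l3_kb(cfg) // 1024} MB of L2 + L3")
```

Llano doubles the Athlon II X4’s L2 capacity (4 MB vs. 2 MB), yet ends up with half the combined L2 + L3 capacity of a Phenom II X4 (8 MB).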

Besides the larger cache, the Husky cores have an improved branch prediction unit and optimized internal buffers: the instruction reorder buffer is 20% larger, and the load/store buffers have doubled in size. Husky also has a separate hardware divider, which accelerates division operations. As we see, there are not many changes, and all of them are minor. The foundation of the K10 microarchitecture remains completely untouched, which is why Husky can still process no more than three instructions per clock cycle, just like all preceding cores.

Obviously, the developers didn’t really focus on increasing the performance of the x86 cores in Llano. They had different priorities. First, they had to do their best to improve the energy efficiency of the new Husky cores, because CPUs based on the K10 microarchitecture can’t boast good power readings. Second, they had to seriously rework the connections between system memory, the computational cores and the graphics core. This is where all the significant improvements and innovations actually took place.

Sumo GPU Core

Llano has also gained a high-performance graphics core called AMD Sumo. Just like in the Zacate processors, the architectural principles of this core come from discrete graphics solutions. But considering Llano’s market positioning, AMD decided to integrate much more powerful graphics into the new processor, with the same number of execution units as the Radeon HD 5570.

In fact, Sumo is a close relative of the discrete Redwood GPU based on AMD’s VLIW5 architecture. In other words, the highest-end APUs in the Llano family have a graphics core with 400 stream processors, 20 texture units and 8 raster units. Lower-cost APUs have one of the SIMD units disabled by the manufacturer, which reduces the number of stream processors to 320 and makes the Sumo core more like a Radeon HD 5550.
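The 400 and 320 figures follow directly from the SIMD count. The sketch below assumes the usual Evergreen-family layout, which is not spelled out in the text: each SIMD engine has 16 five-wide VLIW units (80 stream processors) and 4 texture units. The 8 raster units do not scale with the SIMD count, so they are not modeled. The Brazos figure of 80 stream processors is added for comparison with the “4-5 times more” claim made earlier.

```python
# Assumed Evergreen-family (VLIW5) layout: every SIMD engine holds 16 stream
# cores, each 5 ALUs wide, i.e. 80 "stream processors", plus 4 texture units.
SP_PER_SIMD = 16 * 5
TMU_PER_SIMD = 4
BRAZOS_SP = 80  # stream processors in the Brazos APUs (added for comparison)


def sumo_config(active_simds):
    """Shader and texture resources for a given number of enabled SIMD engines."""
    return {
        "stream_processors": active_simds * SP_PER_SIMD,
        "texture_units": active_simds * TMU_PER_SIMD,
    }


full = sumo_config(5)  # all five SIMD engines enabled (A8 series)
cut = sumo_config(4)   # one SIMD engine disabled (A6 series)
print(full)  # {'stream_processors': 400, 'texture_units': 20}
print(cut)   # {'stream_processors': 320, 'texture_units': 16}
print(full["stream_processors"] / BRAZOS_SP, cut["stream_processors"] / BRAZOS_SP)  # 5.0 4.0
```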

Compared to the original Redwood, Sumo underwent only two major changes. First, the memory interface was modified so that the GPU works with dual-channel DDR3 SDRAM through the processor North Bridge rather than directly. Memory access is a traditional bottleneck for integrated graphics cores, so placing a high-speed GPU inside the new APU called for some special optimizations. Unfortunately, Llano doesn’t have anything like Intel’s ring bus, and Sumo can’t use the processor cache for its own needs. However, it boasts a feature that is totally new for a graphics core: it can write directly into system memory, bypassing the processor cores. Moreover, in Llano all memory operations of the graphics core have a higher priority than requests from the computational cores, so the memory controller processes them first.
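The priority rule can be pictured as a simple arbiter. This is a toy sketch, not a description of AMD’s actual controller: the `PriorityArbiter` class and its two priority levels are hypothetical, GPU requests are always served before CPU requests, and requests of equal priority keep their arrival order.

```python
import heapq
from itertools import count

# Hypothetical priority levels: a lower value is served first.
PRIORITY = {"gpu": 0, "cpu": 1}


class PriorityArbiter:
    """Toy memory-request arbiter: GPU requests go ahead of CPU requests,
    and requests with equal priority are served in arrival order."""

    def __init__(self):
        self._queue = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def submit(self, source, address):
        heapq.heappush(self._queue, (PRIORITY[source], next(self._seq), source, address))

    def next_request(self):
        _, _, source, address = heapq.heappop(self._queue)
        return source, address


arb = PriorityArbiter()
arb.submit("cpu", 0x1000)
arb.submit("gpu", 0x8000)
arb.submit("cpu", 0x1040)
arb.submit("gpu", 0x8040)
order = [arb.next_request() for _ in range(4)]
print([source for source, _ in order])  # ['gpu', 'gpu', 'cpu', 'cpu']
```

A strict scheme like this also shows the cost: under heavy graphics load the CPU requests keep waiting, which is consistent with the lower CPU-side memory bandwidth we measure later in this review.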

Second, the UVD (Unified Video Decoder) unit in Sumo has been replaced with a newer version. Version 3 of the video decoder in Llano processors supports all popular HD video formats and is compatible with MVC (Multiview Video Coding), which is used for 3D video. So, unlike Brazos, the Lynx platform can play 3D Blu-ray over HDMI. Moreover, UVD3 is more energy-efficient: it can work independently from the rest of the GPU, which allows the stream processors to be turned off during video playback.

The above described changes allowed AMD to use a 6000-series marketing name for their Sumo graphics core. So, depending on the number of stream processors inside, Llano may have a Radeon HD 6550D or Radeon HD 6530D graphics core.

An integrated analogue of the Radeon HD 5570 graphics card is a pretty tempting offer, but it won’t suit everyone. Therefore, AMD made it possible to boost the APU’s graphics using CrossFire technology. Here the technology is called Dual Graphics: it pairs the integrated GPU with an add-on Radeon graphics card in an asymmetric CrossFire configuration.

The technology is truly impressive, because a Dual Graphics system can be paired with add-on cards of any class. However, this versatility creates a number of problems. First, you can only improve graphics performance if the add-on accelerator is no more than twice as fast as the Sumo core integrated into your APU. In other words, if you have a Radeon HD 6850, it makes more sense to use it as a stand-alone discrete graphics card and disable the integrated graphics completely. The second restriction is even more serious. Asymmetric CrossFire modes work only in DirectX 10 or 11, which is why in DirectX 9 or OpenGL games Dual Graphics won’t improve anything, and performance will drop to the level of the slowest GPU in the system. Nevertheless, Dual Graphics gives less demanding users a great way to save some money, which makes Llano an excellent option for entry-level gaming systems.
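The two restrictions above can be turned into a rough decision rule. The sketch below is hypothetical: the function name and the relative performance scores are illustrative, and the 2x threshold is the approximate limit stated in the text rather than an exact driver rule.

```python
DUAL_GRAPHICS_APIS = {"DirectX 10", "DirectX 11"}
MAX_SPEEDUP_RATIO = 2.0  # add-on card at most ~2x as fast as the integrated GPU


def best_graphics_setup(apu_score, card_score, api):
    """Hypothetical rule of thumb based on the restrictions described above.

    Scores are relative performance figures of the integrated GPU and the
    add-on card on their own (higher is faster).
    """
    if api in DUAL_GRAPHICS_APIS and card_score <= MAX_SPEEDUP_RATIO * apu_score:
        return "Dual Graphics"
    # Outside DirectX 10/11, or with a much faster card, pairing does not help,
    # so use whichever GPU is faster on its own.
    return "add-on card only" if card_score > apu_score else "integrated GPU only"


print(best_graphics_setup(1.0, 1.5, "DirectX 11"))  # Dual Graphics
print(best_graphics_setup(1.0, 4.0, "DirectX 11"))  # add-on card only
print(best_graphics_setup(1.0, 1.5, "DirectX 9"))   # add-on card only
```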

North Bridge

The North Bridge integrated into the processor is responsible not only for working with memory but also for PCI Express support. This means that the Lynx platform has become very similar in structure to Intel systems, and the mainboard core logic has been reduced to a plain South Bridge.

Compared with the previous-generation processors, the Llano memory controller has become much more capable. It still supports dual-channel memory, just like before, but now it officially supports not only DDR3-1333 but also DDR3-1600 and DDR3-1866 SDRAM. However, DDR3-1866 works only with one module installed per channel.
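The DDR3-1866 restriction is easy to express as a configuration check. This is a minimal sketch: the function name is hypothetical, and only the data rates named in the text are modeled, so slower speeds the controller may also accept are simply not covered.

```python
# Only the DDR3 data rates named in the text are modeled here (MT/s).
SUPPORTED_DDR3_RATES = {1333, 1600, 1866}


def memory_config_supported(data_rate, modules_per_channel):
    """Check a DDR3 setup against the restrictions described above."""
    if data_rate not in SUPPORTED_DDR3_RATES or modules_per_channel < 1:
        return False
    # DDR3-1866 works only with a single module in each channel.
    if data_rate == 1866:
        return modules_per_channel == 1
    return True


print(memory_config_supported(1866, 1))  # True
print(memory_config_supported(1866, 2))  # False
print(memory_config_supported(1600, 2))  # True
```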

The graphics core obviously needs higher memory bandwidth. AMD itself points out that DDR3-1333 memory may significantly slow down the integrated graphics. However, high-speed memory support is not the main peculiarity of this North Bridge. Much more interesting are the new data paths meant to optimize APU graphics performance. Besides the traditional link between the CPU and the North Bridge, Llano has two additional buses between the GPU and the North Bridge.

The first one, called Radeon Memory Bus, is used for regular graphics work. Its bandwidth is 29.8 GB/s, which matches the peak data transfer rate of dual-channel DDR3-1866 SDRAM. Memory requests going along this bus have the highest priority, and the memory controller processes them even ahead of the processor requests.
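The bandwidth figure can be checked with standard DDR3 arithmetic: each channel is 64 bits (8 bytes) wide, so peak bandwidth is the data rate times 8 bytes times the number of channels. The sketch below uses decimal gigabytes; the function name is illustrative.

```python
def peak_bandwidth_gbs(data_rate_mt_s, channels=2, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    Each DDR3 channel is 64 bits (8 bytes) wide, and every transfer moves one
    bus width of data, so peak = transfers/s * bytes per transfer * channels.
    """
    bytes_per_transfer = bus_width_bits // 8
    return data_rate_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


for rate in (1333, 1600, 1866):
    print(f"dual-channel DDR3-{rate}: {peak_bandwidth_gbs(rate):.2f} GB/s")
```

This gives about 21.33, 25.60 and 29.86 GB/s respectively. The last value matches the Radeon Memory Bus figure above, and the first shows why AMD warns about DDR3-1333: it offers roughly 29% less peak bandwidth than DDR3-1866.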

The second bus between the graphics core and the memory controller is called AMD Fusion Compute Link; it gives the graphics core direct access to memory while maintaining processor cache coherency. In other words, Fusion Compute Link is used for direct data transfer between the graphics and computational cores when they work as a team on the same task. Today this bus is barely used, but its benefits should start to shine later on, as the Fusion concept becomes more popular and more applications take full advantage of the APU functionality.

As for the PCI Express controller, Llano supports 24 second-generation lanes. Four lanes connect to the chipset South Bridge, and another four are available for peripheral devices. The remaining 16 lanes form a PCIe x16 bus for external graphics accelerators. This bus can in turn be split into two PCIe x8 links, so the Lynx platform supports CrossFire not only in the Dual Graphics implementation but also in a traditional configuration with two add-on cards.
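The lane budget described above can be written down as a small table plus a split rule. The dictionary keys and the helper function are illustrative names, not part of any AMD specification.

```python
# PCIe 2.0 lane budget as described above (24 lanes in total).
LLANO_PCIE_LANES = {
    "FCH (South Bridge) link": 4,
    "general-purpose peripherals": 4,
    "graphics": 16,
}


def graphics_link_widths(num_cards):
    """Split the 16 graphics lanes between one or two add-on cards."""
    graphics_lanes = LLANO_PCIE_LANES["graphics"]
    if num_cards not in (1, 2):
        raise ValueError("the graphics lanes serve one or two add-on cards")
    return [graphics_lanes // num_cards] * num_cards


print(sum(LLANO_PCIE_LANES.values()))  # 24
print(graphics_link_widths(1), graphics_link_widths(2))  # [16] [8, 8]
```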

Power Management and Turbo Core

Lowering power consumption was one of the major goals of Llano development. As we remember, processors based on the K10 microarchitecture were not particularly modest in power consumption, and now, besides the x86 cores, the chip also has a powerful graphics core to feed. So AMD had to find a way to reduce power consumption; otherwise Llano wouldn’t be able to win a decent place in the contemporary processor market, especially in the mobile segment, where APUs with powerful integrated graphics may be in particular demand.

The problem of high power consumption was partially solved by the new manufacturing process: 32 nm technology helped lower Llano’s core voltage to about 1.2-1.25 V, i.e. to the level of the energy-efficient Athlon II modifications.

However, the major innovation is the ability to disconnect idle APU parts from the power rail via power gating. Although Llano processors in the Lynx platform have only two independent power lines (one for the processor cores and one for the graphics core and the North Bridge), the power management system can switch off individual units very flexibly: each x86 core, the graphics core and the UVD unit built into it can be powered off independently.
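A small model helps separate the two ideas in this paragraph: power rails (only two) and gateable units (many). The sketch below is hypothetical. The rail names are made up, the graphics core is simplified into a “shaders” block plus a separate “uvd” block, and the North Bridge is simply left on. It reproduces the video-playback case: three idle cores and the shader array are gated while UVD keeps running.

```python
from dataclasses import dataclass, field


@dataclass
class PowerDomain:
    """One power rail feeding several independently gateable units."""
    rail: str
    units: dict = field(default_factory=dict)  # unit name -> powered on?

    def gate(self, unit):
        self.units[unit] = False

    def ungate(self, unit):
        self.units[unit] = True

    def active_units(self):
        return [unit for unit, on in self.units.items() if on]


# Hypothetical rail names; the text only says there are two rails.
cpu_rail = PowerDomain("VDD_CPU", {f"core{i}": True for i in range(4)})
gfx_rail = PowerDomain("VDD_GFX_NB", {"north_bridge": True, "shaders": True, "uvd": True})

# Video playback on one core: gate three idle cores and the shader array,
# keep the UVD unit and the North Bridge powered.
for core in ("core1", "core2", "core3"):
    cpu_rail.gate(core)
gfx_rail.gate("shaders")
print(cpu_rail.active_units(), gfx_rail.active_units())  # ['core0'] ['north_bridge', 'uvd']
```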

As a result, AMD managed to design desktop processors with 100 W and 65 W TDP and mobile processors with 45 W and 35 W TDP. In other words, AMD has finally come up with processors whose heat dissipation is comparable with that of Intel Sandy Bridge CPUs.

However, these TDP limitations resulted in rather low clock speeds for Llano processors. None of the currently available models reaches the 3 GHz mark, and the clock frequency of the mobile models doesn’t exceed 2.6 GHz.

In this situation AMD decided to use every available means of boosting performance, including Turbo Core dynamic overclocking technology, which raises the clock frequency when some of the computational cores are idle. AMD has already implemented this technology in Phenom II X6 processors but modified it a little for Llano: in the APU, the possibility of increasing the clock frequency is determined by the core utilization levels rather than by temperature. Temperature plays a secondary role here: if it exceeds certain limits, Turbo mode may simply be shut off.

The benefit of the new Turbo Core implementation is that it generally produces repeatable results that depend very little on the cooling system efficiency or the ambient temperature.

Unfortunately, dynamic overclocking in Llano processors applies only to the computational cores. The graphics core can only lower its frequency to reduce power consumption; it cannot speed up automatically. Moreover, the engineers took into account that the graphics core’s power consumption rises when it works at full speed, so if the x86 cores switched to Turbo mode at the same time, the overall power consumption could get out of hand. Therefore, in 3D or Fusion applications that actively use the graphics core’s stream processors, Turbo Core may not kick in even if only one of the four x86 cores is active.
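The decision logic described in the last few paragraphs can be sketched as follows. This is an illustration under stated assumptions, not AMD’s algorithm: the 2.4 GHz base clock is the A8-3800 frequency used in our tests, while the turbo clock, the busy-core threshold, the load threshold and the temperature limit are all assumed values. Utilization drives the decision, temperature only acts as a cut-off, and a busy GPU vetoes Turbo.

```python
# Hypothetical parameters for illustration; the actual thresholds are not documented here.
BASE_GHZ = 2.4                # A8-3800 nominal clock, as used in our tests
TURBO_GHZ = 2.7               # assumed Turbo Core clock
MAX_BUSY_CORES_FOR_TURBO = 2  # assumed: Turbo needs at least two idle cores
TEMP_LIMIT_C = 90.0           # assumed safety limit
BUSY_LOAD = 0.5               # a core counts as busy above 50% utilization


def x86_clock_ghz(core_loads, gpu_busy, temp_c):
    """Choose the x86 core clock from core utilization.

    Temperature acts only as a cut-off, and a busy GPU vetoes Turbo so the
    APU stays within its power budget. The GPU clock itself is never boosted.
    """
    if gpu_busy or temp_c > TEMP_LIMIT_C:
        return BASE_GHZ
    busy_cores = sum(1 for load in core_loads if load > BUSY_LOAD)
    return TURBO_GHZ if busy_cores <= MAX_BUSY_CORES_FOR_TURBO else BASE_GHZ


print(x86_clock_ghz([1.0, 0.1, 0.0, 0.0], gpu_busy=False, temp_c=60))  # 2.7
print(x86_clock_ghz([1.0, 0.1, 0.0, 0.0], gpu_busy=True, temp_c=60))   # 2.4
print(x86_clock_ghz([1.0, 1.0, 1.0, 1.0], gpu_busy=False, temp_c=60))  # 2.4
```

Because the inputs are utilization figures rather than temperatures, the same workload yields the same clock on a hot or a cool system, unless the safety limit is crossed; this mirrors the repeatability benefit mentioned above.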

Chipsets and Mainboards

The Llano processor is not the only component of the Lynx platform; it also requires a corresponding core logic set. Since the APU already contains the memory and PCI Express controllers, the chipsets consist of only one chip: the South Bridge, also known as the FCH (Fusion Controller Hub).

AMD currently offers two FCH models: the junior A55 and the senior A75. The higher-end model has native support for all contemporary interfaces, including USB 3.0 and SATA 6 Gbps. The lower-end model can’t boast the same luxury and supports only the older SATA 3 Gbps and USB 2.0 interfaces.

The flow-chart below illustrates the functionality of the A75 FCH very well:

The following table sums up the formal specifications of all FCH modifications:

Mainboards for desktop Llano processors based on these chipsets use a new Socket FM1, which means they are incompatible with existing Socket AM3 CPUs. Likewise, Llano cannot be installed into any of the older mainboards.


Socket FM1

Socket FM1 is similar in size to the common Socket AM3, although it has a different number of pins: 905. All these pins are necessary to connect the CPU not only with the chipset and the memory subsystem but also with the PCI Express x16 graphics bus and the monitors.

Almost all leading mainboard manufacturers have already prepared mainboards with the new processor socket, so Llano owners won’t have any problems finding the right mainboard.

As for the cooling systems, AMD tried to maintain compatibility with the existing infrastructure, so the old Socket AM3 coolers will work just fine with the new processors.

Desktop Llano Lineup

The first batch of desktop Llano processors currently coming to market includes four APU models, each with four x86 computational cores. Just like the mobile Llano APUs, they don’t have a marketing name and are assigned A8 and A6 series numbers. The A8 series includes the higher-end APU models with a Sumo graphics core featuring 400 stream processors, while the A6 series includes slower modifications with lower clock frequencies and a “lighter” graphics core with only 320 stream processors.

This means that, until the dual-core A4-series Llano processors come out, APUs are classified by their graphics core capability. The table below shows the base characteristics of the Radeon HD 6550D and Radeon HD 6530D graphics cores used in Llano APUs:

Here is the complete list of currently available Llano processors for the desktop Lynx platform:


We can clearly see significant differences in specifications not only between the two APU series but also within each series: the two models in each series are, in fact, two totally different products with dramatically different heat dissipation. The highest-performing models have a 100 W TDP, while the others have a 65 W TDP and Turbo Core support. The clock frequency of the energy-efficient processors has been reduced by about 20-25% relative to their 100 W counterparts, but the features and frequency of their graphics cores remain the same.

The first thing that catches your eye is that Llano’s clock speeds are obviously lower than the frequencies of the latest Athlon II and Phenom II processors. This means that Llano processors are slower, and their main advantage is not the performance of their x86 cores but that of their graphics core. So it makes sense to upgrade from a Socket AM3 to a Socket FM1 platform only for the sake of better energy efficiency, not to boost performance.

In the mobile segment, the new Llano processors are positioned as competitors to Core i3 and Core i5 based Sandy Bridge platforms. In the desktop market things are less rosy: Llano definitely won’t be able to compete against Core i5 because of its low clock frequencies and the K10 microarchitecture developed back in 2007. Therefore, in the desktop segment AMD’s pricing strategy positions the top A8 Llano processors as an alternative to the junior Core i3 models, which, as you know, are dual-core CPUs.

The four Llano APU models described above won’t be the only ones. Closer to the back-to-school season AMD will roll out lower-cost dual- and triple-core A6 and A4 series processors with a 65 W TDP and 320 or 240 stream processors in their graphics cores. In other words, the Llano family will expand aggressively into the lower price segments. As for the high-end segment, AMD has no plans to bring the Lynx platform there.

Testbed Configuration and Methodology

To ensure that reviewers get the best experience with the new Lynx platform, AMD distributed samples of its top-of-the-line A8-3850. However, we thought this processor would not be the most appealing choice because of its very high power consumption. Luckily, we also got our hands on the second model, the A8-3800, which became the star of today’s test session.

The A8-3800 is interesting for several reasons, but most importantly, it can be compared against Intel Core i3 processors not only in price but also in power consumption. In other words, it is the A8-3800, not the A8-3850, that should be regarded as the direct competitor to dual-core Sandy Bridge CPUs. That is why we decided to compare the A8-3800 against two Intel dual-core CPUs with two different HD Graphics core modifications inside: Core i3-2100 (with Intel HD Graphics 2000) and Core i3-2105 (with Intel HD Graphics 3000). We also ran some tests on a cheaper and slower Sandy Bridge CPU, the Pentium G850.


Keeping in mind the low clock frequency of the A8-3800, we decided to take the slowest Athlon II X4 we could find. That is why the Athlon II X4 630 represents the previous-generation AMD CPUs in today’s test session.

As a result, we used the following hardware and software components in our tests:

We studied Llano’s performance in two ways. First, we tested this processor in a system with a discrete AMD Radeon HD 6970 graphics card. The second part of our test session was dedicated to the Lynx platform as an integrated solution: we no longer used the Radeon HD 6970 and instead compared the graphics core integrated into Llano with inexpensive graphics accelerators, including AMD Radeon HD 5570, HD 6450 and HD 6570.

First Look: Llano vs. Propus

Before we get to “serious” testing, let’s take a quick look at Llano and Athlon II X4 processors working at the same clock frequency. This will show how effective the improvements in the new Husky core actually are. AMD claims a 6% performance boost, but that figure also accounts for the larger L2 cache. We decided to focus exclusively on the microarchitectural improvements, which can be tested with short synthetic benchmarks whose working sets fit into the L2 cache. The SiSoftware Sandra suite offers a decent set of tests for that.

To ensure a fair comparison, we locked the clock frequency of both Llano and the Athlon II X4 at 2.4 GHz and disabled Turbo Core technology.

The microarchitectural improvements in the Husky cores have very little practical value; the biggest gain appears when the processor works with integer data. So in most cases when Llano outperforms the previous-generation processors at the same clock frequency, the advantage will come either from the larger L2 cache or from the stream processors of its advanced graphics core.

However, there is one more thing. AMD changed the memory controller in Llano processors: it now supports faster DDR3 memory and works with three independent memory access paths that have different priority levels. We used the Cachemem test from the AIDA64 suite to check its performance. In all tests our DDR3-1600 memory worked with 9-9-9-27-1T timings.

We see no improvement whatsoever. The memory controller needs additional arbitration so that not only the processor cores but also the graphics core can work properly with the memory subsystem. As a result, the practical bandwidth between the processor cores and memory has dropped, because the x86 cores do not have the highest priority. The L2 cache has also become slower because of its larger size, although the extra capacity compensates for the loss in speed.

Llano Performance with Discrete Graphics

If you use a powerful graphics accelerator in a Llano-based system, it makes sense to disable the Radeon HD 6550D graphics core integrated into the APU, because in Dual Graphics mode performance would actually drop. In this chapter we will see how the Lynx system performs in our standard testbed with an AMD Radeon HD 6970 graphics card.

General Performance

As usual, we use the BAPCo SYSmark suite to estimate processor performance in general-purpose tasks. It emulates usage models in popular office and digital content creation applications. The idea behind this test is fairly simple: it produces a single score characterizing average computer performance.

BAPCo has recently updated its testing suite and released SYSmark 2012, which includes new versions of popular applications and even more realistic usage scenarios. Unfortunately, AMD wasn’t happy with the way its processors performed in SYSmark 2012 and publicly disputed the benchmark with its developers. However, after carefully weighing the pros and cons, we found AMD’s arguments unconvincing. AMD’s main complaint was that SYSmark 2012 doesn’t include applications accelerated by the graphics cores of AMD APUs. However, do not forget that there are not many applications like that out there yet, and the existing ones are not very popular at this time. Moreover, the stream processors of the APU graphics core are unavailable anyway when a system is tested with an add-on graphics card.

So we decided not to turn our back on SYSmark 2012 and will continue using it for performance testing for now, especially since, in our opinion, the applications included in this suite are quite current and appropriate for drawing conclusions about system performance. SYSmark 2012 includes ABBYY FineReader Pro 10.0, Adobe Acrobat Pro 9, Adobe After Effects CS5, Adobe Dreamweaver CS5, Adobe Photoshop CS5 Extended, Adobe Premiere Pro CS5, Adobe Flash Player 10.1, Autodesk 3ds Max 2011, Autodesk AutoCAD 2011, Google SketchUp Pro 8, Microsoft Internet Explorer 9, Microsoft Office 2010, Mozilla Firefox Installer, Mozilla Firefox 3.6.8 and WinZip Pro 14.5.

It is actually quite logical that AMD is unhappy about the SYSmark 2012 results. The A8-3800 is 33% behind the Core i3-2100, but this is hardly an unexpected outcome. Yes, the A8-3800 is a quad-core processor, while the Core i3-2100 has only two cores. However, Intel’s CPU supports Hyper-Threading, is based on a more advanced microarchitecture and runs at a 30% higher clock frequency than the AMD A8-3800.

Moreover, its low clock frequency places the A8-3800 behind the Intel Pentium G850 and even behind the previous-generation Athlon II X4 630. Unfortunately, this is a serious bottleneck of the new processor, and even Turbo Core doesn’t fully make up for it. Having equipped Llano with a powerful graphics core and then having to fit the chip’s heat dissipation into an acceptable range, AMD had to sacrifice general x86 performance. Therefore, Llano doesn’t really shine in general-purpose tasks that haven’t been adapted to the AMD Fusion concept.

Let’s take a closer look at the performance scores SYSmark 2012 generates in different usage scenarios.

The Office Productivity scenario emulates typical office tasks, such as text editing, spreadsheet processing, email and web browsing. It uses the following applications: ABBYY FineReader Pro 10.0, Adobe Acrobat Pro 9, Adobe Flash Player 10.1, Microsoft Excel 2010, Microsoft Internet Explorer 9, Microsoft Outlook 2010, Microsoft PowerPoint 2010, Microsoft Word 2010 and WinZip Pro 14.5.

The Media Creation scenario emulates the creation of a video clip from previously taken digital images and videos. It uses popular Adobe applications: Photoshop CS5 Extended, Premiere Pro CS5 and After Effects CS5.

The Web Development scenario emulates website design. It uses the following applications: Adobe Photoshop CS5 Extended, Adobe Premiere Pro CS5, Adobe Dreamweaver CS5, Mozilla Firefox 3.6.8 and Microsoft Internet Explorer 9.

The Data/Financial Analysis scenario is devoted to statistical analysis and market trend prediction performed in Microsoft Excel 2010.

The 3D Modeling scenario is dedicated entirely to modeling 3D objects and rendering static and dynamic scenes using Adobe Photoshop CS5 Extended, Autodesk 3ds Max 2011, Autodesk AutoCAD 2011 and Google SketchUp Pro 8.

The last scenario, System Management, creates backups and installs software and updates. It involves several different versions of the Mozilla Firefox installer and WinZip Pro 14.5.

Summing up the results, A8-3800 performs fairly well only in the 3D Modeling test, where four full-fledged x86 cores are a strong argument against dual-core Sandy Bridge CPUs. However, note that in every scenario the new A8-3800 is behind one of the slowest members of the Athlon II X4 family. So, if we look at Llano as a regular CPU rather than an APU, we can hardly call it a step forward at this point.

Gaming Performance

As you know, in the majority of contemporary games it is the graphics subsystem that determines the performance of a platform equipped with a reasonably fast processor. Therefore, we do our best to make sure that the graphics card is not loaded too heavily during the test session: we select the most CPU-dependent tests, and all tests are performed without antialiasing and at moderate screen resolutions. In other words, the results show not so much the frame rates achievable in systems with contemporary graphics accelerators, but rather how well contemporary processors cope with gaming workloads. This helps us estimate how the tested CPUs will behave in the near future, when faster graphics cards become widely available.

Games benefit from the larger L2 cache of the new A8-3800 processor. Therefore, despite its lower clock frequency, it performs as fast as the 2.8 GHz Athlon II X4 630. However, this is hardly an achievement, because even dual-core Sandy Bridge processors without Hyper-Threading run faster in almost all games.

Archiving and Encryption

To test processor performance during data archiving, we use the WinRAR archiving utility. At the maximum compression rate, we archive a 1.4 GB folder containing multiple files.

Archiving algorithms are very data-intensive, which is why A8-3800 is naturally ahead of the Athlon II X4 630: Llano has a larger L2 cache. However, this doesn’t save the newcomer from defeat by inexpensive LGA1155 processors, which post almost 30% better results in WinRAR.

Processor performance during encryption is measured with the built-in benchmark of a popular cryptographic utility called TrueCrypt. Notably, it not only utilizes any number of processor cores effectively, but also supports the special AES-NI instructions.

Dual-core Sandy Bridge processors are not always ahead of the quad-core AMD APUs, and this time the diagram shows just the opposite situation. A8-3800 encrypts data much faster than Core i3-2100, which Intel has deprived of AES instruction support. However, Athlon II X4 630 still remains out of reach because of Llano’s low clock frequency.

Image Editing

We measured the performance in Adobe Photoshop using our own benchmark, a creatively modified version of the Retouch Artists Photoshop Speed Test. It includes typical editing of four 10-megapixel images from a digital camera.

Adobe Photoshop has never favored AMD CPUs, but now things have become absurdly bad. Core i3-2100, which costs about the same as AMD A8-3800, is almost 70% faster than the new AMD processor. AMD is long overdue for a dramatic microarchitecture upgrade, and hopefully once Bulldozer comes out we won’t see results this poor any more.

We have also run some tests in Adobe Photoshop Lightroom 3. The test scenario includes post-processing a hundred 12-megapixel RAW images and exporting them to JPEG format.

Batch image processing in Adobe Photoshop Lightroom is well optimized for multi-core architectures. Therefore, unlike Photoshop, A8-3800 does pretty well here. It is especially pleasing to see that Llano outperforms Athlon II X4 630, which means that in a number of cases new Husky cores do deliver a performance boost compared with the previous-generation processors.

Audio and Video Transcoding

We use the Apple iTunes utility to test audio transcoding speed: it transcodes the contents of an audio CD into AAC format. Note that a typical peculiarity of this utility is that it can utilize only two processor cores.

The only thing the AMD processor can rely on in this rivalry with Core i3-2100 is its four full-fledged x86 cores. If the application can’t fully utilize them, the outcome is predetermined. So, it is not surprising that A8-3800 was so slow in the iTunes test.

To measure how fast our test participants can transcode video into H.264 format, we used the x264 HD benchmark. It works with an original MPEG-2 video recorded at 720p resolution with a 4 Mbps bitrate. The results of this test are of great practical value, because the x264 codec is part of numerous popular transcoding utilities, such as HandBrake, MeGUI, VirtualDub, etc.

The x264 codec works pretty well on AMD processors and can effectively utilize all processor cores. That is why in this benchmark A8-3800 outperforms both Pentium G850 and Core i3-2100. However, its lower clock frequency leaves it behind the previous-generation Athlon II X4 630.

The performance in Adobe Premiere Pro is determined by the time it takes to render a Blu-ray project with an HDV 1080p25 video into H.264 format and apply various special effects to it.

A8-3800 does pretty well in Premiere Pro CS5. Four slow Husky cores outperform two fast Sandy Bridge cores in video processing tasks and the overall picture is hardly different from what we have just seen on the previous diagram.

We decided to add Cyberlink Media Espresso 6.5 to the list of benchmarks we use for video transcoding speed tests. This utility is particularly interesting because it can use the graphics accelerator’s resources. We measured the time it took to transcode a 10-minute H.264 1080p video clip into an iPhone 4-friendly format (H.264, 1280x720, 4 Mbps). In all tests we enabled ATI Stream technology, supported by our Radeon HD 6970 graphics card, to accelerate the transcoding process.

This is yet another HD video processing application, and the result is the same: A8-3800 falls somewhere between Pentium G850 and Core i3-2100.

Final Rendering

We use the Cinebench test to measure final rendering speed in Maxon Cinema 4D.

Rendering is somewhat similar to video transcoding: both tasks scale beautifully as the number of available processor cores increases. That is why the results are quite predictable. A8-3800 outperforms the dual-core Pentium G850 but loses to the dual-core Core i3-2100, which is reinforced with Hyper-Threading support.

Rendering speed in Autodesk 3ds Max 2011 with both the Scanline and the Mental Ray renderers was measured using the SPECapc test.

The new A8-3800 shows a similar pattern in all applications that support multi-threading. In these cases the quad-core AMD processor delivers acceptable performance for its price range, but keep in mind that it competes against dual-core Sandy Bridge CPUs. As soon as an application is not good at multi-threading, A8-3800 immediately turns into an outsider. A particular disappointment about Llano is its low clock speed. This is the reason why the A8 APU falls behind Athlon II X4: the minor microarchitectural improvements have very little effect after all.

In other words, Llano processors are not the best choice for a system equipped with an external graphics card. Previous generation quad-core Socket AM3 processors, not to mention the competition, can deliver much higher performance.

Power Consumption

Since Llano’s computational cores didn’t create a revolution in performance, let’s see what happened to their power consumption. This parameter is especially interesting to check, because we got a 65 W A8-3800, which should theoretically be comparable in power consumption and heat dissipation to Core i3-2100, whose TDP is set at the same 65 W.

The graphs below show the full power draw of the computer (without the monitor) measured after the power supply. It is the total power consumption of all the system components. The PSU's efficiency is not taken into account. The processors are loaded by running the 64-bit LinX 0.6.4 utility. We enabled all the power-saving technologies for a correct measurement of the computer's power draw in idle mode: C1E, AMD Cool'n'Quiet and Enhanced Intel SpeedStep.

In idle mode A8-3800 does very well. It has the largest transistor count of all today’s test participants, yet it loses in energy efficiency only to Athlon II X4. It looks like AMD’s feature that disconnects idle processor units from the power rail really works.

If only one of the four computational cores is fully utilized, the power consumption readings look even more impressive: here the A8-3800 based system boasts the lowest power draw of all.

When all processor cores are fully loaded, the Lynx platform with the A8-3800 processor consumes more power than its competitors. Nevertheless, its power appetite has become significantly lower compared with the previous-generation AMD CPUs. This is a pretty pleasing fact that opens the door for Llano not only into energy-efficient systems, but also into the mobile segment.

Integrated Graphics Core Performance

Llano’s performance in a system equipped with an add-on graphics card is not very promising, and not only because A8-3800 loses to Core i3-2100 in most tests. The major disappointment was that Llano turned out slower than Athlon II X4.

However, that is not the final diagnosis yet. The truth is that Llano is not a regular processor, but an APU with a graphics core inside in addition to the computational cores. Obviously, disabling this core makes Llano much less appealing, but there is another way to use it, with the integrated graphics front and center. So, the second part of our test session focuses specifically on the graphics performance the new APU can deliver in games and in applications that support the Fusion concept.

To get the most complete picture of Llano’s graphics performance, we decided to compare our A8-3800 based system with the integrated graphics core active against the same system working with entry-level discrete graphics cards, such as Radeon HD 5570, HD 6450 and HD 6570. We also tested Intel systems with integrated graphics built around Core i3-2100 and Core i3-2105. These CPUs have similar features and functionality but different graphics core versions: Intel HD Graphics 2000 and Intel HD Graphics 3000, respectively. In addition, we tested AMD Dual Graphics technology in action by pairing the A8-3800 graphics core with a Radeon HD 6570 in a CrossFire configuration, also known as Radeon HD 6630D2.

The gaming tests were performed in two modes: at 1280x800 resolution with low graphics quality settings and at 1680x1050 resolution with medium graphics quality settings. If A8-3800 did well enough in both these modes, we also tested it at 1920x1080 with high graphics quality settings.

3DMark Vantage

This benchmark is pretty popular among gamers, and it immediately puts things in perspective. While in the previous part of our test session, where A8-3800 worked with discrete graphics, its performance wasn’t that convincing, the integrated Radeon HD 6550D graphics core completely changes the first impression. The integrated system with Llano inside comes extremely close to a system with a 60-dollar Radeon HD 5570 and at the same time is far ahead of systems with Intel’s graphics cores, whose performance received a lot of enthusiastic reviews just recently.

3DMark 11

Another advantage of the new Llano processor over its integrated competitors is the fact that Radeon HD 6550D supports DirectX 11. Therefore, A8-3800 based system didn’t have any problems in 3DMark 11 test, while we can’t say the same about Intel products.

I would also like to point out another peculiarity: the CPU test score of A8-3800 is higher with an add-on graphics card than with the integrated graphics core. This seemingly mysterious phenomenon is explained by Turbo Core technology. When the integrated Radeon HD 6550D core is active, the thermal budget left for the x86 cores is smaller, which is why Turbo Core kicks in less frequently than it does when the integrated GPU is completely disabled.

Alien vs. Predator (2010)

This is another DirectX 11 game that doesn’t run on Intel’s integrated graphics. As for A8-3800, it looks very decent, leaving Radeon HD 6450 far behind. Note that the performance of the integrated Radeon HD 6550D core is very similar to that of its discrete counterpart, Radeon HD 5570. However, the lack of dedicated video memory does take its toll, so there is no real parity between Sumo (the integrated core) and Redwood (the Radeon HD 5570 GPU).

Dual Graphics configuration actually works pretty well here. It improves the performance of the Radeon HD 6570 graphics card by almost 50% at no extra cost.

Dirt 3

This game supports DirectX 11, but on systems with Intel graphics it runs only in DirectX 9 mode. Even so, the top Intel HD Graphics 3000 version cannot avoid a complete defeat: the graphics core integrated into AMD A8-3800 is about twice as fast.

Llano APUs set a new performance standard for integrated graphics that Intel solutions will now have to match. It is quite remarkable that this colorful contemporary game runs on the Lynx platform even at Full HD resolution with the highest graphics quality settings.

Far Cry 2

A8-3800 does pretty well in Far Cry 2. Nevertheless, this game is very sensitive to graphics memory bandwidth, which is why the integrated Radeon HD 6550D core is about 20% behind its discrete analogue here.

Left 4 Dead 2

This fairly old game supports only DirectX 9, so the Dual Graphics configuration doesn’t work here. In fact, its results are even lower than those of a single stand-alone graphics accelerator.

Lost Planet 2

This game supports DirectX 11, but Intel’s graphics runs it via DirectX 9. This is probably why at 1680x1050 resolution the Core i3-2105 based system manages to produce a higher frame rate than the system with an A8-3800 APU inside.

Mafia 2

Mafia 2 is not really an old game, but it uses only DirectX 9, which we immediately notice from the low score of the asymmetrical CrossFire configuration. Other than that, AMD’s integrated system works impeccably. Just as it should, it is more than twice as fast as the system with Core i3-2105 and Intel HD Graphics 3000 and almost catches up with Radeon HD 5570.

Metro 2033

Metro 2033 supports several rendering modes. We used DirectX 10 in our tests, so Intel HD Graphics and Radeon HD 6550D work under equal conditions, and Dual Graphics technology should work fine. However, there was some problem with the AMD driver, and Dual Graphics produced only a minimal advantage at 1680x1050 resolution. Meanwhile, the A8-3800 based system with the APU graphics alone works perfectly, leaving Intel HD Graphics 3000 far behind.

Starcraft 2

The unique thing about Starcraft 2 is that this game uses processor resources very heavily, so the results look unlike anything we have seen in the other games so far. The higher computational performance of Sandy Bridge processors helps them outperform the A8-3800 based integrated platform at the lowest tested resolution. However, as soon as we switch to the medium image quality mode, Llano gets back on track. Unfortunately, Dual Graphics again runs into problems. As we can see, this technology is very selective when it comes to graphics acceleration, and this game didn’t fall into the “good” category.

Tom Clancy’s H.A.W.X.2

The situation here is quite typical. Times when Intel’s integrated graphics core was superior are long gone and now AMD takes over the leadership in the integrated graphics department. Dual Graphics doesn’t work again, and to our great disappointment, we have been seeing this way too often during our test session. So, while this technology may seem promising, it doesn’t work that well at this time.

GPGPU Applications

When testing the new AMD APU, we couldn’t help checking out applications that use the graphics engine not for 3D graphics but for calculations, especially since this is exactly what Fusion is all about: the GPU stream processors should get involved in computational work and contribute to the overall APU performance.

First, we again turned to the synthetic benchmarks from the SiSoftware Sandra 2011 suite, which measure the computational performance of shaders through the OpenCL and DirectCompute programming interfaces. Llano’s integrated graphics supports both of them, so there are no problems here. As for Sandy Bridge processors, Intel doesn’t provide access to the computational resources of its graphics cores, which is why all these calculations on Intel systems are performed exclusively by the x86 cores.

These results show exactly why AMD embraced the Fusion concept. The graphics core handles parallel calculations very well, so involving it in data processing can be an excellent way to improve overall performance.

But that is a synthetic test that targets the stream processors specifically. It is much more interesting to see how the APU performs in real applications. AMD’s marketing people keep stressing that the number of applications optimized for APUs is growing steadily, and a special section of the official AMD website lists them. Frankly speaking, this list is not particularly impressive, but it does include a few interesting titles. We picked them for today’s APU performance analysis to see how much better the APU is than a CPU in an optimized environment.

The first test measures HD video transcoding speed in Cyberlink Media Espresso 6.5. This utility can use the UVD3 engine for decoding and the stream processors for accelerating video encoding. On Intel processors it can take advantage of Quick Sync technology.

The good news is that the Radeon HD 6550D graphics core built into A8-3800 can be of great help during video transcoding: when it is involved together with the x86 cores, transcoding completes in almost half the time. The bad news is that even with the graphics core’s stream processors in use, A8-3800 only reaches the level of Core i3-2100 without Quick Sync. As soon as Intel’s hardware encoder is activated, the AMD APU can no longer compete against Core i3-2100.

Among applications optimized for the APU, AMD mentions Microsoft Internet Explorer 9. It can indeed use graphics core resources to display web pages and execute HTML5 and JavaScript code. But what about actual performance? To quantify it, we used two special tests: Futuremark Peacekeeper, which focuses primarily on JavaScript performance, and a new HTML5 benchmark called Stimulant WebVizBench.

In the HTML5 test, A8-3800 falls between Core i3-2100 and Core i3-2105. This can be considered a good result, especially since in the JavaScript test the AMD solution is completely defeated by every competitor, Core i3-2100 included.

Continuing with APU optimization for Internet applications, AMD also claims that the latest Flash Player versions are Fusion-friendly. We have known for a long time that this player can use the UVD3 engine to accelerate video playback. But what about other kinds of Flash content? We will try to answer this question with the help of a multiplayer online arcade game called Tanki Online, built on one of the most advanced 3D Flash engines, Alternativa3D. The tests were performed using Adobe Flash Player 10.3.181.34.

The graphics processor’s resources are indeed utilized here, even though Flash version 10 doesn’t use the video card to render 3D graphics. However, this doesn’t help the AMD APU much: Core i3-2100 and Core i3-2105 are about 30% faster than A8-3800.

Another application that, according to AMD, uses graphics core resources effectively is Windows Live Movie Maker 2011. Our test task in this application was to create a short HD video from video clips, images and music.

The A8-3800 graphics core is loaded quite heavily here, but Core i3 processors still complete the same task much faster. They have the Quick Sync hardware encoder, which comes in very handy during video processing.

Wrapping up our search for applications where the new hybrid Llano processor can really shine, we ran a few tests in one more utility, ArcSoft Panorama Maker 5 Pro. It stitches a panoramic image together from several photographs, and that was our test task.

The APU is fully supported here as well, and the graphics core is clearly under load, but Core i3 is again faster.

So, we did find multiple applications that take advantage of the APU, but this doesn’t produce the anticipated effect. We see no significant performance increase, and in most cases Intel Core i3 CPUs, which handle similar tasks without any GPGPU acceleration, are way ahead of AMD Llano.

Nevertheless, we can’t deny that Llano has huge potential: synthetic tests clearly showed that the Radeon HD 6550D graphics core has tremendous computational power. Let’s hope that programs truly benefiting from the Fusion concept will arrive in the near future. As of today, the only real task where Llano will undoubtedly outperform Core i3, thanks to the stream processors of its graphics core, is cracking password hashes.

Power Consumption

We have already measured Llano’s power consumption in a system with a discrete graphics card. Now it is time to check out an integrated system based on A8-3800. The testing conditions and methodology remained exactly the same.

In idle mode A8-3800 is extremely energy-efficient. A system built on it consumes significantly less power than a similar system with Intel Core i3 inside.

As soon as an x86 workload kicks in, things change: Core i3 is not only faster but also consumes less power. In defense of our A8-3800 processor, I have to say that the system based on it also looks very good and has a moderate power appetite.

Graphics load makes the difference in power consumption between Core i3 and A8-3800 very noticeable, and it is not in Llano’s favor. But do not forget that Radeon HD 6550D graphics core is about twice as fast as Intel HD Graphics 3000.

At the same time, A8-3800 seems to be a pretty decent choice for a media center PC. During video playback its power consumption is just as good as that of Core i3 processors. It indicates that power-saving technologies implemented in Llano processors are very effective in case of small partial loads.

Overclocking

AMD makes it clear right away that the Lynx platform is not designed for overclocking enthusiasts. There are no Black Edition models with unlocked clock frequency multipliers among the existing Llano APUs, and according to AMD there won’t ever be any. All processors in this family have locked multipliers for both the processor and the graphics core frequencies. Even the AMD OverDrive utility doesn’t work with the new Socket FM1 processors.

However, this doesn’t rule out overclocking completely. Socket FM1 mainboards allow changing the base clock generator frequency, which can be used to raise the processor clock. Keep in mind, though, that this clock generator drives the frequencies of all system components equally. Therefore, if you raise the base clock by, say, 10% above the nominal 100 MHz, you will simultaneously and proportionally overclock the processor, graphics core, memory and external system interfaces. The only multiplier in Llano that can actually be adjusted individually is the memory multiplier, which can be set to 10.66x, 13.33x, 16.0x or 18.66x.
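Because every frequency in the system is derived from the same base clock, the arithmetic comes down to simple multiplication. This is a minimal sketch assuming a nominal base clock of 100 MHz and treating the memory multiplier as yielding the DDR3 data rate directly (so 13.33x at 100 MHz corresponds to DDR3-1333); the function names are hypothetical and only serve the illustration.

```python
NOMINAL_BASE_CLOCK_MHZ = 100.0  # assumed nominal Socket FM1 base clock

# Memory multipliers available on Socket FM1, as listed in the text.
MEMORY_MULTIPLIERS = (10.66, 13.33, 16.0, 18.66)


def ddr3_rating(base_clock_mhz: float, multiplier: float) -> int:
    """Effective DDR3 data rate in MT/s, rounded to a whole number."""
    return round(base_clock_mhz * multiplier)


def scale_factor(base_clock_mhz: float) -> float:
    """Factor by which every base-clock-derived frequency changes."""
    return base_clock_mhz / NOMINAL_BASE_CLOCK_MHZ


if __name__ == "__main__":
    for m in MEMORY_MULTIPLIERS:
        print(f"{m:>5}x -> DDR3-{ddr3_rating(NOMINAL_BASE_CLOCK_MHZ, m)}")
    # A 10% base clock increase affects CPU, GPU, memory and I/O alike.
    print(f"110 MHz base clock: everything runs at {scale_factor(110.0):.2f}x")
```

Under these assumptions, the four multipliers map to the familiar DDR3-1066, 1333, 1600 and 1866 speed grades at the nominal base clock.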

This is where the major problem emerges. Raising the clock generator frequency quickly causes problems with the detection and operation of SATA and USB devices, and it is this, rather than the processor, that is the main factor limiting overclocking. According to available data, the system remains stable only up to a clock generator frequency of about 120 MHz. However, once you pass the 133 MHz threshold, most mainboards automatically adjust the multipliers for the external interface frequencies, so there may be one more operational interval somewhere between 133 and 150 MHz. Also note that the range of clock generator frequencies at which the system remains stable may depend on the onboard controllers of each particular mainboard as well as on the disk subsystem configuration (for example, with some SSDs fewer operational settings may be available).

We tried to overclock our A8-3800 processor on a Gigabyte GA-A75-D3H mainboard. The maximum base clock when our system remained stable was 146 MHz.

So, our APU in fact overclocked to 3.5 GHz, while the graphics core frequency increased proportionally from 600 MHz to 876 MHz. As for the memory, we used 13.33x multiplier and therefore could clock it as DDR3-1946. To ensure ongoing stability, we also increased the processor core voltage by 0.175 V.
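The frequencies quoted above follow directly from the 146 MHz base clock. This is a minimal sketch that assumes a nominal 100 MHz base clock and the A8-3800’s stock 2.4 GHz CPU clock (a 24x multiplier, Turbo Core ignored); the 600 MHz graphics clock and the 13.33x memory multiplier come from the text, and the function name is hypothetical.

```python
NOMINAL_BASE_CLOCK_MHZ = 100.0     # assumed nominal base clock
OVERCLOCKED_BASE_CLOCK_MHZ = 146.0 # maximum stable base clock in our test

CPU_MULTIPLIER = 24          # 24 x 100 MHz = 2.4 GHz stock (assumed spec)
GPU_STOCK_MHZ = 600.0        # stock graphics core clock of A8-3800
MEMORY_MULTIPLIER = 13.33    # memory multiplier used in our test


def overclocked_frequencies(base_clock_mhz: float) -> dict:
    """Derive CPU, GPU and memory frequencies from a single base clock."""
    ratio = base_clock_mhz / NOMINAL_BASE_CLOCK_MHZ
    return {
        # The CPU multiplier is locked, so only the base clock changes.
        "cpu_mhz": base_clock_mhz * CPU_MULTIPLIER,
        # The GPU multiplier is locked too, so its clock scales by the same ratio.
        "gpu_mhz": GPU_STOCK_MHZ * ratio,
        "ddr3_rating": round(base_clock_mhz * MEMORY_MULTIPLIER),
    }


if __name__ == "__main__":
    f = overclocked_frequencies(OVERCLOCKED_BASE_CLOCK_MHZ)
    print(f"CPU: {f['cpu_mhz'] / 1000:.1f} GHz")  # prints 3.5 GHz
    print(f"GPU: {f['gpu_mhz']:.0f} MHz")         # prints 876 MHz
    print(f"Memory: DDR3-{f['ddr3_rating']}")     # prints DDR3-1946
```

The computed values match the 3.5 GHz, 876 MHz and DDR3-1946 figures reported for our Gigabyte GA-A75-D3H test system.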

The overclocking described above improved our system performance quite significantly. According to 3DMark 11, the integrated graphics performance rose to the level of Radeon HD 6570, and the computational performance improved by 40%.

There is one unusual peculiarity about Socket FM1 overclocking: the BIOS of some mainboards offers an option to increase the processor clock frequency multiplier and a separate option for the graphics core frequency. In reality these options do not work, but some diagnostic tools, such as CPU-Z, read the multiplier values directly from the BIOS. As a result, Llano based systems produce all sorts of impressive screenshots seemingly showing Llano processors conquering the most outrageous heights. However, despite the CPU-Z readings, the real Llano multiplier doesn’t change, and the actual processor and graphics core frequencies depend only on the base clock.

Conclusion

First of all, I would like to remind you that Llano is not really a new desktop processor but primarily a solution that will finally let AMD feel confident in the mobile PC market. However, today we did get a desktop modification, and in this respect Llano makes a rather mixed impression.

As a new AMD processor that came to replace Phenom II and Athlon II, it doesn’t strike us as particularly exciting. Of course, the four computational cores of the new A8 processors may help them take the lead over Core i3 in applications that are well optimized for multi-threading, but this is hardly a consolation. The thing is, in systems equipped with discrete graphics, Llano APUs work slower than their predecessors from the Athlon II X4 family. Llano processors still use x86 cores based on the K10 microarchitecture from 2007, and all the minor improvements are purely cosmetic, while a complete redesign has long been overdue. Moreover, trying to achieve acceptable TDP levels, AMD had to lower the clock speeds. And although Turbo Core technology was supposed to compensate for the lower frequencies, it doesn’t do that too well. Overclocking is no remedy either: Llano processors do not overclock well, and the Lynx platform itself doesn’t favor overclocking in general.

Luckily, Llano is not a traditional processor but an APU that contains not only x86 cores but also a fast graphics core, and this peculiarity makes us look at Llano from a completely different angle. The built-in graphics is not just the fastest integrated solution on today’s market: it is about twice as fast as Intel HD Graphics 3000 and delivers the same level of performance as some graphics cards in the $50-$60 price range. At the same time, Llano is pretty energy-efficient (especially the 65 W TDP models), which makes it an excellent choice for integrated entry-level gaming home systems or high-performance HTPCs.

But in the end, choosing Llano for any type of entry-level system will depend solely on your needs and goals. If gaming performance is your number one priority, but you are not ready to invest in a mainstream or high-end graphics card, then there is hardly anything better than an AMD A8 series processor, especially since there are at least two ways of boosting graphics performance at minimal cost. The first is overclocking, and the second is Dual Graphics technology, which combines the APU graphics with an inexpensive discrete graphics card in an asymmetric CrossFire configuration.

However, if you are looking to build an inexpensive platform with high computational potential, then Llano won’t be the way to go. And in this case it doesn’t make sense to pay extra for the high-quality graphics core built into this APU. Although Fusion concept implies that stream processors from the APU graphics core can be used for calculations and acceleration of popular general-purpose applications, it doesn’t yet work well enough.

I have to say that this isn’t a final verdict, and the APU idea may yet make a brilliant comeback one day. The integration of hybrid processors into the existing infrastructure is at an early stage now, and in two or three years things may change dramatically. But by then there will be new generation APUs, and today Llano is merely a locomotive of evolution helping plant the Fusion concept into the minds of computer users and software developers.