AMD Phenom Changes Stepping to B3: TLB Bug – in the Past

We have just completed the tests of a new AMD Phenom X4 9850 Black Edition processor using the new B3 stepping. AMD managed to quickly fix the TLB bug and increased the clock frequencies of its quad-core processors. But will this be enough to make Phenom a more promising choice for consumers?

by Ilya Gavrichenkov
03/26/2008 | 09:00 PM

You could hardly call our recent reviews of the quad-core AMD Phenom processors optimistic. Unfortunately, the micro-architectural improvements didn’t help AMD achieve parity with the competitor’s solutions. That is why the AMD Phenom processors that have been on the market until now lost to Intel Core 2 Quad in performance as well as heat dissipation. Moreover, AMD Phenom didn’t reach competitive clock frequencies either. However, the most frustrating drawback of the new quad-core AMD processors was the so-called “TLB bug”, whose software fix reduced system performance quite noticeably. And even though this bug rarely showed up in desktop platforms, it still hurt the overall Phenom image, especially since it did surface in the server market, forcing AMD to temporarily stop shipping its quad-core Opteron processors, aka Barcelona.

 

That is why all the engineering effort went into fixing the notorious TLB bug at the hardware level as soon as possible. And the fix didn’t keep us waiting for too long. Today AMD officially launched Phenom processors based on the new B3 stepping, which is free from this TLB issue. The core doesn’t have any other improvements as of yet, but nevertheless this announcement allows AMD to make its solutions more attractive to consumers. Besides, the clock speeds get a little higher and the prices drop a little lower. As a result, the refreshed Phenom processors now look noticeably more attractive than before.

The new Phenom processor family on B3 stepping includes four models: 9550, 9650, 9750 and 9850 Black Edition. The two junior models replace Phenom 9500 and Phenom 9600, while the two top models push the clock frequency to 2.4 and 2.5GHz respectively. Note that the last two digits, “50”, in the CPU model number stand for the new processor stepping that doesn’t have the TLB bug anymore. Nevertheless, from a practical standpoint the new Phenom processors shouldn’t be any different from the old ones working at the same clock frequency with the software TLB fix disabled. The main advantage of the new stepping is that there is no need for the patch that hurts performance, especially since enabling or disabling it may require some effort and skill. While most mainboard manufacturers made it possible to activate or deactivate the patch in their BIOS Setup, Windows Vista SP1, which also includes a fix for this bug, doesn’t offer this flexibility and enables the fix automatically, no matter what the user actually wants. In this case the only way for owners of processors on the old B2 stepping to avoid the performance drop, even at the expense of some system stability, is to use special utilities such as AMD OverDrive.
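The naming rule described above can be written down as a tiny lookup. This is only a sketch in Python covering the quad-core models named in this article; the function name and the refusal to handle any other model number are our assumptions, not AMD's official decoding scheme.

```python
# Quad-core models mentioned in this article (assumption: nothing else is covered).
KNOWN_MODELS = {9500, 9550, 9600, 9650, 9750, 9850}

def phenom_x4_stepping(model_number):
    """Return 'B3' when the model number ends in 50, otherwise 'B2'."""
    if model_number not in KNOWN_MODELS:
        raise ValueError(f"model {model_number} is not covered by this sketch")
    return "B3" if model_number % 100 == 50 else "B2"

for model in sorted(KNOWN_MODELS):
    print(f"Phenom {model}: {phenom_x4_stepping(model)} stepping")
```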

Besides, AMD also officially announced its triple-core processors known as Toliman. For now they will only be distributed among AMD’s OEM partners and will not reach retail, so we will have to postpone our review of these CPUs for a little while, especially since at first they will still be using the B2 stepping. However, we had to mention triple-core Phenom processors here, because AMD has once again changed its CPU naming scheme a little. Quad-core Phenom CPUs will now be called Phenom X4, triple-core ones Phenom X3, and dual-core processors will remain Athlon X2.

AMD Phenom X4 9850 Black Edition

AMD offered us their Phenom X4 9850 Black Edition processor to check out the performance and features of the new B3 stepping. It is the new top-of-the-line representative of AMD’s quad-core processor family, designed to work at a 2.5GHz clock frequency, which is 200MHz higher than the top processor on B2 stepping. This way AMD finally managed to push the clock speeds of its quad-core processors to the level of the junior Intel Core 2 Quad CPUs. However, this frequency increase results not from improved frequency potential of the new stepping, but from a more refined production process. Since the stepping itself brings no extra headroom, the new processors with 2.4-2.5GHz clock speeds also carry a higher TDP of 125W.

Besides the increased clock frequency, the Phenom X4 9850 has one more distinguishing feature: the frequency of the built-in North Bridge has also been raised to 2.0GHz. This frequency matters a great deal for the L3 cache and the DDR2 SDRAM memory controller, so we can expect Phenom X4 9850 to work more effectively with the memory subsystem than its predecessors. However, the formal specs of the supported memory haven’t changed. Phenom X4 9850 works with dual-channel DDR2-533/667/800/1066 SDRAM and supports the Ganged and Unganged modes that we have already discussed in our previous articles.

The complete list of Phenom X4 9850 Black Edition specifications is given below:

AMD Phenom 9850 Black Edition

Marking: HD985ZXAJ4BGH
Clock frequency: 2.5GHz
Packaging: Socket AM2+
L2 cache: 4 x 512KB
L3 cache: 2MB
Memory controller: Dual-channel 128-bit
Supported memory types: DDR2-533/667/800/1066 SDRAM
North Bridge frequency: 2.0GHz
HyperTransport bus frequency: 2.0GHz (HyperTransport 3.0)
Processor stepping: B3
Production process: 65nm, SOI
Transistors: 450 million
Die size: 285 sq. mm
TDP: 125W
Max ambient case temperature: 61°C
Vcore: 1.2V - 1.3V
AMD64 technology: Yes
SIMD instructions: MMX(+), 3DNow!(+), SSE, SSE2, SSE3, SSE4A
Cool'n'Quiet technology: Yes

It is important to point out that the integrated North Bridge working at 2.0GHz is an exclusive feature of AMD Phenom X4 9850. All other Phenom processors, including the ones using the new B3 stepping, have their North Bridge working at 1.8GHz. The same is true for the HyperTransport 3.0 bus frequency: in Phenom X4 9850 it works at 2.0GHz, while all other quad-core AMD processors use a 1.8GHz bus.

To make things a little bit clearer in terms of available AMD Phenom X4 processor modifications, we would like to offer you another small table:

Note that since Phenom X4 9850 belongs to the Black Edition series, it boasts another distinguishing feature: an unlocked clock frequency multiplier offering a lot of overclocking flexibility. By the way, AMD has no plans to launch a non-Black Edition version of this CPU yet.

Since the new B3 stepping has no fundamental differences from the B2 stepping other than the TLB bug fix, all mainboards compatible with the old Phenom X4 can also work with the new processors.

TLB-Bug and Its Fix

The description of the notorious TLB bug is available in AMD’s technical documentation, where it is referred to as Erratum 298.

The problem is that under certain unlucky circumstances, page translation table entries located in the L2 cache, which the operating system uses to translate virtual addresses into physical ones, may get duplicated in the L3 cache with wrong flag settings. This not only contradicts the exclusive cache architecture, but may also result in data corruption if the wrong entry from the shared L3 cache is used by another processor core. According to the official documents, this duplication can occur only in one very rare case: while the processor changes the bit flags in L2 for a given page translation table entry, another operation evicts the entry into the L3 cache.

The patch developed immediately after the bug was discovered, which can be activated in the mainboard BIOS Setup, solves this problem in a very radical way: it simply prohibits caching of the page tables altogether. As a result, every time a translation cannot be found in the TLB (Translation Lookaside Buffer), which holds recently used virtual-to-physical address mappings, the processor has to go to main system memory and read the uncached page table. This certainly increases memory subsystem latency, which is why giving up page table caching can hardly be considered a good solution.
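To get a feeling for why uncached page tables are so costly, here is a toy model in Python. It is only a sketch: it assumes the four-level page walk of x86-64 long mode with 4 KB pages, ignores any other translation caches, and every latency and miss rate in it is a hypothetical number chosen for illustration, not a measurement of Phenom.

```python
PAGE_WALK_LEVELS = 4  # x86-64 long mode with 4 KB pages walks four table levels

def average_access_cycles(data_cycles, tlb_miss_rate, table_read_cycles):
    """Average cost of one memory access, including page walks on TLB misses."""
    walk_cycles = PAGE_WALK_LEVELS * table_read_cycles
    return data_cycles + tlb_miss_rate * walk_cycles

# Hypothetical latencies in CPU cycles, for illustration only.
DATA_READ = 200            # the data access itself goes to main memory
CACHED_TABLE_READ = 15     # page table entry served from a cache
UNCACHED_TABLE_READ = 200  # page table entry fetched from main memory

for miss_rate in (0.001, 0.01, 0.05):
    cached = average_access_cycles(DATA_READ, miss_rate, CACHED_TABLE_READ)
    uncached = average_access_cycles(DATA_READ, miss_rate, UNCACHED_TABLE_READ)
    print(f"TLB miss rate {miss_rate:.1%}: {cached:.1f} vs {uncached:.1f} cycles")
```

In this model the penalty grows with the TLB miss rate, which matches the intuition that workloads touching many different memory pages suffer most from the patch.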

Even the simplest synthetic benchmarks measuring memory subsystem performance reveal dramatic drops when the TLB patch is activated. For example, the screenshots below show the memory subsystem performance of a system with an AMD Phenom X4 9600 processor on B2 stepping, with the patch and without it:


TLB-patch disabled.


TLB-patch enabled.

As you can see from the screenshots, enabling the patch increases latency by about 50%. The practical bandwidth also gets worse. As we have already shown in our article called AMD Fan Kit: Phenom 9600 Black Edition CPU + DFI LANParty UT 790FX-M2R Mainboard, it also affects performance in real applications, causing an average drop of about 10% and slowdowns of up to 30% in some individual cases.

Although there are few cases where the TLB bug seriously affects reliability, and only extremely unlucky desktop users working with certain specific applications have a chance of ever running into it, a hardware fix for Erratum 298 turned into one of the most urgent tasks for AMD.

The new B3 stepping does solve the problem at the hardware level without losing any performance or sacrificing page table caching. According to AMD representatives, the performance of the new processors should be the same as that of CPUs on B2 stepping with the TLB patch disabled. Synthetic benchmark results confirm this: a Phenom X4 9850 slowed down to 2.3GHz, with its integrated North Bridge running at 1.8GHz, demonstrates practically the same results as a Phenom 9600 with the patch disabled.

Nevertheless, the results are still a little different. The new stepping shows slightly worse latency when working with the memory subsystem. This is probably connected with the new algorithms for handling page tables in the cache memory, which no longer put the data at risk. However, when we compared the performance of processors on B2 and B3 stepping in real applications, this difference was hardly noticeable.

Unfortunately, AMD engineers didn’t explain to us what exactly was done to fix the TLB bug in the new B3 stepping. However, some indirect data at our disposal gives us reason to believe that now, after the processor core changes the bit flags of page table entries stored in the L2 cache, they are all evicted into the L3 cache. This may be the reason for the slightly higher latency.

Advantages of Faster Memory Controller

As we have already pointed out in our previous articles, Phenom processors feature a built-in North Bridge with a memory controller and L3 cache working at their own frequency and voltage, which do not depend on the frequency and voltage of the main processor core. This is what distinguishes Phenom processors from the previous-generation Athlon 64, whose memory controller worked at the same speed as the processor core. The separate frequency setting for the built-in North Bridge allows clocking the memory independently of the processor core and thus avoids the DDR2 SDRAM frequency variations that may occur in different processor models. The Phenom memory controller always sets the correct frequencies for standard DDR2 types no matter what the nominal CPU clock speed is.
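To illustrate the frequency variations mentioned above, here is a minimal Python sketch of how an Athlon 64 class controller arrives at its memory clock. It assumes the commonly described behaviour, in which the memory clock is the core clock divided by the smallest integer divider that keeps the result at or below the target frequency. The function name and the sample core clocks are illustrative and are not taken from AMD documentation.

```python
import math

def k8_memory_clock_mhz(core_mhz, target_mem_mhz):
    """Memory clock derived from the core clock by an integer divider (assumed model)."""
    divider = math.ceil(core_mhz / target_mem_mhz)
    return core_mhz / divider

# DDR2-800 nominally runs at a 400 MHz memory clock.
for core in (2000, 2200, 2400, 2600):
    mem = k8_memory_clock_mhz(core, 400)
    print(f"core {core} MHz -> memory {mem:.0f} MHz (DDR2-{2 * mem:.0f})")
```

Under this model a 2.4GHz core hits the nominal 400MHz exactly, while a 2.6GHz core needs a divider of 7 and ends up at roughly 371MHz. A memory clock that does not depend on the core clock, as in Phenom, removes this effect.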

All currently existing Phenom processors, except the top 9850 model, have their memory controller and L3 cache working at 1.8GHz. Phenom X4 9850 Black Edition pushes this frequency 200MHz higher: its integrated North Bridge now works at 2.0GHz.

So, the memory subsystem of this processor should have sped up a little as well. We decided to pay special attention to this matter and checked the memory subsystem performance of a Phenom X4 9850 Black Edition with its North Bridge working at 2.0GHz and at 1.8GHz, as in the junior CPU models.


North Bridge frequency: 2.0GHz


North Bridge frequency: 1.8GHz

The results of our practical memory subsystem measurements speak for themselves. The increase in the frequency of the integrated North Bridge did in fact have a positive effect on L3 cache performance as well as on the memory subsystem speed.

Of course, this also shows in the results in real applications, which you can see from the chart of our express test session:

At the same time, you shouldn’t expect the North Bridge frequency to have a significant effect on performance. The performance boost does not exceed 3% at best. On average, the 200MHz increase in the memory controller and L3 cache frequency yields about a 1% improvement in most benchmarks.

Overclocking

Another matter that interests computer enthusiasts, now that the new Phenom stepping has been announced, is the frequency potential that overclocking can reveal. And although AMD keeps stressing that it didn’t really focus on increasing it in the new CPUs, we still had hopes.

Nevertheless, our practical experiments with Phenom X4 9850 Black Edition showed that no miracle happened. Quad-core processors on the new B3 stepping overclock pretty much the same as their predecessors on B2 stepping. By raising the Vcore of our test sample from the nominal 1.3V to 1.4V we could only get to 2.7GHz. The system remained absolutely stable at this frequency with a Zalman CNPS9700 LED CPU air cooler.

As you can see from the screenshot, we overclocked our processor by raising the clock frequency multiplier, because Black Edition processors, and our Phenom X4 9850 in particular, have the multiplier unlocked. However, this feature can hardly become a serious argument in favor of this CPU for overclocking fans, because an 8% frequency increase doesn’t look very exciting, especially if you recall how well Intel Core 2 Quad solutions overclock.
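The 8% figure follows directly from the clock speeds quoted above. Here is a minimal Python sketch of the arithmetic, assuming the usual 200MHz reference clock of AMD platforms. The article does not state the reference clock or the multipliers we used, so the 12.5x and 13.5x values below are inferred from that assumption.

```python
REFERENCE_CLOCK_MHZ = 200  # assumption: standard AMD reference clock, not stated in the article

def core_clock_mhz(multiplier, reference_mhz=REFERENCE_CLOCK_MHZ):
    """Core clock is the reference clock times the CPU multiplier."""
    return multiplier * reference_mhz

def gain_percent(stock_mhz, overclocked_mhz):
    """Relative frequency gain of the overclocked setting over stock, in percent."""
    return (overclocked_mhz - stock_mhz) / stock_mhz * 100

stock = core_clock_mhz(12.5)        # nominal 2.5GHz
overclocked = core_clock_mhz(13.5)  # our stable 2.7GHz
print(f"{stock:.0f} MHz -> {overclocked:.0f} MHz: +{gain_percent(stock, overclocked):.0f}%")
```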

Unfortunately, we couldn’t improve this result even by raising the processor core voltage or the North Bridge voltage further. So, it looks like we can expect significant improvements in the frequency potential of AMD quad-core processors only when they move to the 45nm production process.

Testbed and Methods

Frankly speaking, you may doubt whether it makes sense to test Phenom X4 9850 Black Edition in this review. Our previous articles have already destroyed all illusions about Phenom performance. Besides, we have already found out that AMD Phenom processors need frequencies of at least 2.7-2.8GHz to compete successfully against the junior Intel Core 2 Quad CPUs. At this time Phenom X4 9850 cannot boast anything like that.

Nevertheless, tests are an indispensable part of every processor review, so we didn’t dare drop them. We decided to compare Phenom X4 9850 Black Edition against quad-core Intel processors offered in the same price range. Today Intel has two CPUs that fit the profile: Core 2 Quad Q6600 and the newer Core 2 Quad Q9300 from the Penryn family. Note that Phenom X4 9850 and Core 2 Quad Q9300 work at the same clock frequency of 2.5GHz, which may add some intrigue to today’s test session.

Besides, our results also include the numbers for the more affordable Phenom X4 9750 and for the top CPU on B2 stepping, Phenom X4 9600, which appears on the diagrams twice: with and without the TLB patch.

For our tests we put together the following testbeds:

AMD platform:

Intel Platform:

Performance

3D Games

Media Content Encoding

 

Final Rendering

 

Other Applications

 

All the test applications we used this time indicate that the new Phenom X4 9850 Black Edition is still slower than the junior quad-core processors from Intel. Therefore, we cannot speak of any direct competition between quad-core solutions from Intel and AMD at this time.

Conclusion

I can’t say that the quad-core AMD processors on the new B3 stepping surprised me today. Against the background of quad-core Intel processors they still don’t look very convincing, falling behind the competitors in performance, power consumption and overclocking potential.

Nevertheless, we can’t help stressing that AMD is moving in the right direction and keeps trying to improve its Phenom X4 family. Namely, it has very quickly fixed the notorious TLB bug that did a lot of harm to the image of all processors on the K10 micro-architecture. Moreover, it has also increased the processors’ clock speeds, which is a definite advantage. The top Phenom X4 processors have even managed to catch up with the junior Core 2 Quad models in clock frequency. Unfortunately, there is no performance parity to talk about just yet, but the gap between AMD and Intel has definitely grown smaller.

But the most important thing is that AMD has adjusted its pricing policy in a very smart way. Namely, the official price of the AMD Phenom X4 9850 Black Edition processor is set at $235, which is less than what the cheapest quad-core Intel processor currently sells for. AMD Phenom X4 9750 will be offered for $215, while the junior model, Phenom X4 9550, is priced at $195. This way AMD has finally given up unjustified illusions and is going to offer its Phenom X4 processors at prices that are reasonable and fair for their performance level.

This means that quad-core AMD CPUs will become more popular as a basis for inexpensive multi-threaded systems that may be of interest to certain user groups, for example as inexpensive computers for rendering and media content processing.

In conclusion I would like to say that the triple-core processors that AMD is starting to distribute among its OEM partners these days may boast even better market potential than Phenom X4 as we know it today. The triple-cores are going to be even more affordable despite their relatively high computational power in multi-threaded applications. Phenom X3 8600 working at 2.3GHz will sell for about $175, while Phenom X3 8400 at 2.1GHz will be priced at around $150. However, we are going to discuss Phenom X3 a little later, when these processors move to B3 stepping and become available in retail.