AMD Phenom Changes Stepping to B3: TLB Bug – in the Past (page 3)


Category: CPU

by Ilya Gavrichenkov

[ 03/26/2008 | 09:00 PM ]



TLB-Bug and Its Fix

The notorious TLB bug is described in AMD’s technical documentation, where it is listed as Erratum 298.

In short, under a rare combination of circumstances, a page-table entry held in the L2 cache (page tables are what the operating system uses to translate virtual addresses into physical ones) may be duplicated in the L3 cache with incorrect flag settings. This not only violates the exclusive cache hierarchy but may also lead to data corruption if another processor core picks up the stale entry from the shared L3 cache. According to AMD’s documentation, the duplication can occur in only one very rare case: while the processor is updating the flag bits of a page-table entry in L2, another operation evicts that entry into L3.
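To make the race concrete, here is a deliberately simplified Python toy model, not AMD’s actual cache logic: an exclusive two-level hierarchy in which a line may live in the private L2 or in the shared L3, but never in both. The `PTE` class, `ExclusiveHierarchy`, `racy_flag_update` and `read_from_other_core` are hypothetical constructs; the only point is to show how an eviction that lands in the middle of a flag update leaves a stale duplicate behind in L3.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PTE:
    """Hypothetical simplified page-table entry: a frame number plus two flag bits."""
    frame: int
    accessed: bool = False
    dirty: bool = False


class ExclusiveHierarchy:
    """Toy exclusive cache: a line lives in the private L2 or the shared L3, never both."""

    def __init__(self):
        self.l2 = {}  # address -> PTE, private to the core doing the update
        self.l3 = {}  # address -> PTE, shared by all cores

    def evict_to_l3(self, addr):
        # Exclusive policy: the line is moved, not copied, from L2 to L3.
        self.l3[addr] = self.l2.pop(addr)


def racy_flag_update(h, addr):
    """Set the flags as a read-modify-write that an eviction interrupts."""
    old = h.l2[addr]      # 1. the core reads the entry from L2
    h.evict_to_l3(addr)   # 2. the rare case: the entry is evicted mid-update
    # 3. the update completes and writes the entry back into L2
    h.l2[addr] = replace(old, accessed=True, dirty=True)


def read_from_other_core(h, addr):
    """In this model another core can only see the shared L3 copy."""
    return h.l3[addr]


h = ExclusiveHierarchy()
h.l2[0x1000] = PTE(frame=42)
racy_flag_update(h, 0x1000)

print(h.l2[0x1000])                       # updated entry in L2
print(read_from_other_core(h, 0x1000))    # stale duplicate in L3, flags still clear
print(0x1000 in h.l2 and 0x1000 in h.l3)  # exclusivity has been violated
```

Running it prints `PTE(frame=42, accessed=True, dirty=True)`, then `PTE(frame=42, accessed=False, dirty=False)`, then `True`: the same entry now exists in both caches, and the shared copy carries outdated flags.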

The patch AMD developed immediately after the bug was discovered, which can be enabled in the mainboard BIOS Setup, solves the problem in a very radical way: it simply disables caching of page tables altogether. As a result, every time a translation cannot be found in the TLB (Translation Lookaside Buffer, which holds recently used virtual-to-physical address mappings), the processor has to walk the uncached page tables in main system memory. This inevitably increases memory subsystem latency, which is why giving up page table caching can hardly be considered a good solution.
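The cost argument can be sketched with a small Python model. Every number in it is an illustrative assumption rather than a measured Phenom figure: a hypothetical 48-entry fully associative LRU TLB, a four-level page walk (as in x86-64 long mode), and made-up per-level costs for a page-table entry that hits in cache versus one that has to come from DRAM. The names `LRUTLB` and `avg_walk_cycles` are hypothetical.

```python
import random
from collections import OrderedDict

# Illustrative assumptions only, NOT measured Phenom figures.
TLB_ENTRIES = 48             # hypothetical TLB size
PAGE_WALK_LEVELS = 4         # x86-64 long mode walks four page-table levels
CACHED_LEVEL_CYCLES = 40     # hypothetical cost per level when the entry is cached
UNCACHED_LEVEL_CYCLES = 200  # hypothetical cost per level when it comes from DRAM


class LRUTLB:
    """Toy fully associative TLB with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # virtual page numbers, least recently used first
        self.misses = 0

    def lookup(self, vpage):
        """Return True on a hit; on a miss, install the page and evict the LRU one if full."""
        if vpage in self.pages:
            self.pages.move_to_end(vpage)
            return True
        self.misses += 1
        self.pages[vpage] = None
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)
        return False


def avg_walk_cycles(pages, level_cycles):
    """Average page-walk cycles per memory access for a given per-level cost."""
    tlb = LRUTLB(TLB_ENTRIES)
    walk_cycles = 0
    for vpage in pages:
        if not tlb.lookup(vpage):
            walk_cycles += PAGE_WALK_LEVELS * level_cycles
    return walk_cycles / len(pages), tlb.misses


rng = random.Random(0)
pages = [rng.randrange(512) for _ in range(10_000)]  # working set larger than the TLB

cached, misses = avg_walk_cycles(pages, CACHED_LEVEL_CYCLES)
uncached, _ = avg_walk_cycles(pages, UNCACHED_LEVEL_CYCLES)
print(f"TLB misses: {misses} of {len(pages)} accesses")
print(f"page tables cached:   {cached:.1f} walk cycles per access")
print(f"page tables uncached: {uncached:.1f} walk cycles per access")
```

Because the number of TLB misses is identical in both runs, the walk overhead scales directly with the per-level cost. The model deliberately ignores any other mechanisms a real processor uses to speed up page walks, so it shows only the direction of the effect, not its actual size.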

Even the simplest synthetic memory benchmarks reveal a dramatic performance drop when the TLB patch is enabled. For example, the screenshots below show memory subsystem performance measured on a system with an AMD Phenom X4 9600 processor (B2 stepping), with the patch disabled and enabled:


TLB patch disabled.


TLB patch enabled.

As the screenshots show, enabling the patch increases latency by about 50%, and practical bandwidth drops as well. As we have already shown in our article AMD Fan Kit: Phenom 9600 Black Edition CPU + DFI LANParty UT 790FX-M2R Mainboard, the patch also hurts performance in real applications, causing an average slowdown of about 10% and up to 30% in some individual cases.

Although there are few cases where the TLB bug seriously affects reliability, and only extremely unlucky desktop users running certain specific applications are ever likely to run into it, a hardware fix for Erratum 298 became one of AMD’s most pressing tasks.

The new B3 stepping does solve the problem in hardware, without sacrificing page table caching or performance. According to AMD representatives, the new processors should perform the same as B2-stepping CPUs with the TLB patch disabled. Synthetic benchmarks confirm this: a Phenom X4 9850 running at a reduced 2.3GHz, with its integrated North Bridge at 1.8GHz (the same clocks as the Phenom 9600), delivers practically the same results as the Phenom 9600 with the patch disabled.

Nevertheless, the results are not quite identical: the new stepping shows slightly higher memory latency. This is probably related to the new algorithms for handling page tables in the cache, which no longer pose any risk to data integrity. However, when we compared B2 and B3 processors in real applications, the difference was hardly noticeable.

Unfortunately, AMD engineers did not explain to us exactly what was changed in the B3 stepping to fix the TLB bug. However, some indirect evidence at our disposal leads us to believe that now, whenever a processor core changes the flag bits of page-table entries stored in its L2 cache, those entries are evicted into the L3 cache. This may be the reason for the slightly higher latency.
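The behavior we suspect in B3 can be expressed in the same kind of toy Python model. It is only a sketch of that conjecture, since AMD has not disclosed the actual mechanism: the hypothetical `update_flags_then_evict` lets the flag update finish in L2 and only then moves the fresh entry to L3, so no stale copy can be left behind. The model also hints at the latency cost, because the next lookup by the same core has to go to L3 instead of hitting in L2. `PTE`, `ExclusiveHierarchy` and `lookup` are likewise hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PTE:
    """Hypothetical simplified page-table entry: a frame number plus two flag bits."""
    frame: int
    accessed: bool = False
    dirty: bool = False


class ExclusiveHierarchy:
    """Toy exclusive cache: a line lives in the private L2 or the shared L3, never both."""

    def __init__(self):
        self.l2 = {}  # address -> PTE, private to one core
        self.l3 = {}  # address -> PTE, shared by all cores

    def evict_to_l3(self, addr):
        self.l3[addr] = self.l2.pop(addr)  # move, not copy


def update_flags_then_evict(h, addr):
    """Conjectured B3-style handling: finish the flag update, then push the entry to L3."""
    h.l2[addr] = replace(h.l2[addr], accessed=True, dirty=True)
    h.evict_to_l3(addr)


def lookup(h, addr):
    """Return the single current copy and the cache level it was found in."""
    if addr in h.l2:
        return h.l2[addr], "L2"
    return h.l3[addr], "L3"


h = ExclusiveHierarchy()
h.l2[0x1000] = PTE(frame=42)
update_flags_then_evict(h, 0x1000)

entry, level = lookup(h, 0x1000)
print(entry, level)                       # up-to-date entry, found only in L3
print(0x1000 in h.l2 and 0x1000 in h.l3)  # False: exclusivity is preserved
```

The first line printed is `PTE(frame=42, accessed=True, dirty=True) L3` and the second is `False`: the only copy is current, but it now sits one level further from the core.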
