Advanced Micro Devices may offer gamers a technology that significantly boosts the performance of single-threaded games on its new central processing units (CPUs) in the Socket AM2 form factor, sources claim.
In April 2006, rumors emerged that AMD was working on a technology designed to boost the performance of single-threaded applications on multi-core processors. According to sources familiar with AMD’s plans, the company is going to offer its own technology that works in the opposite way to Intel’s Hyper-Threading: it increases the performance of dual-core chips in single-threaded applications. Whereas Hyper-Threading splits the resources of a single physical processor core, AMD’s new technology would combine the resources of two physical cores to speed up tasks that run best on single-core CPUs, according to sources.
According to sources, this would make it possible to double the number of decoders, so that the “combined” CPU would process 6 instructions per clock cycle. This alone could become a fairly decent response to Intel’s Conroe processors, and the technology is expected to debut close to July 24, the launch day of Intel Core 2 for desktops. However, the feature would require synchronization of the chips’ L1 caches along with some other capabilities. Alternatively, the new technology may simply overclock the processor and disable the second core in certain situations.
The corresponding functionality has already been built into dual-core Athlon 64 X2 processors in the Socket AM2 form factor, the sources claim. To activate it, customers will “only need to update the processor driver and the mainboard BIOS,” they say. Microsoft Corp. will reportedly even release a corresponding operating system patch that allows the two cores of the Athlon 64 X2 to be recognized as a single one.
According to sources and alleged preliminary test results, the CPU will be able to switch into this “combined” mode dynamically, depending on the type of application. It is no secret that many tasks still benefit more from a single-core CPU than from a dual-core processor running at a lower nominal frequency.
It is notable that at the same time AMD will also push its 4x4 platform for enthusiasts who cannot imagine life without multi-tasking and extreme performance. It seems that these two very different initiatives will be positioned in different market niches. At the very least, owners of Socket AM2 Athlon 64 X2 processors will be able to combine two cores into a single virtual one for free, provided they do not consider downloading the new BIOS and OS driver too much trouble.
AMD did not comment on the news story.
Comments currently: 19
Discussion started: 06/22/06 05:13:21 PM
Latest comment: 08/25/06 02:12:35 PM
Sounds like marketing hype or bullshit ...
How is it gonna handle CONDITIONAL BRANCHES and STACK BASED CALLS?
How do you "reverse" thread those?
mov eax, data1
cmp eax, data2
How is it gonna traverse from PointB when it has to wait for eax? Even this simple piece of x86 code will prove difficult for OOO execution engines...
06/23/06 01:58:30 AM
When something makes NO sense, it still makes WAY MORE sense than this reverse thingy...
06/23/06 03:45:47 AM
If you guys read the article carefully, and the similar one at theinquirer.net, you will see that this technology makes sense.
Each CPU core has several FPU units, three in each AMD core I guess. More FPU units mean faster processing because you can do more instructions per cycle. When a thread gets nasty and consumes a lot of power from one core, the system will assign the available FPU units in the other core to the busy core.
Does not sound complicated or hard to understand to me.
If more FPU units in the same core do not mean more performance, then why the hell would AMD and Intel put more than one unit in each core?
For a programmer this technology does not make sense, but for an engineer it makes a lot of sense.
Now go and put that piece of code you wrote in a trash bin.
06/23/06 04:19:50 PM
A core has multiple FPU units because it can process multiple instructions in parallel (and out of order).
The thing is, it happens on one core, meaning the instructions all share the same register file and the same OOO logic. That means one instruction can be assigned to either one of the FPU units with no penalty.
However, doing the same between cores is a radically different thing. At the very minimum, you need to transfer the instructions to the other core, and transfer the resulting register changes back to the first core. Then you need the logic to keep track of which instructions are currently being processed on the other core, as well as when you can expect to see the result. This is already being done inside each individual core, where latency isn't a factor.
The key point here is that it is only an advantage if you can find instructions that are independent, and assign each of them to its own FPU unit. However, independent instructions are hard to come by. Sooner or later, the result *will* be needed, which means you get a dependence. If the result you need is residing on a different core, you have to wait *a long* time for it to get back to you. That means *lower* efficiency, not higher.
If you have multiple threads, this works fine, because the two cores do not attempt to interleave or mix instructions. Each core is assigned one thread exclusively, and so it doesn't matter how long it'd take to get a result from instructions processed on the other core.
Splitting a thread across multiple cores is different, because you constantly need access to the results of previously processed instructions, to act as input for new ones. That's ok on single cores with multiple FPU units, because they're all close to the register file, so their results can be returned quickly. But having to send it across a HyperTransport link from core A to core B would introduce a delay that'd really hurt performance.
Actually you got it wrong. For a programmer this makes sense, but for an engineer it doesn't.
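The latency argument in this comment can be sketched with a toy cycle count. This is only an illustration, not a model of any real CPU: the function names, the 1-cycle execution latency, and the 20-cycle core-to-core transfer latency are all made-up assumptions.

```python
import math

def dependent_chain_cycles(n, exec_latency, transfer_latency, split):
    """Cycles for a chain where every instruction needs the previous result.

    If split is True, consecutive instructions alternate between two cores,
    so each result (except the last) must be sent to the other core first.
    """
    total = 0
    for i in range(n):
        total += exec_latency
        if split and i < n - 1:
            total += transfer_latency  # hypothetical core-to-core delay
    return total

def independent_cycles(n, exec_latency, cores):
    """Cycles for n fully independent instructions spread evenly over cores."""
    return math.ceil(n / cores) * exec_latency

# Hypothetical numbers: 100 instructions, 1 cycle each, 20 cycles per transfer.
print(dependent_chain_cycles(100, 1, 20, split=False))  # 100
print(dependent_chain_cycles(100, 1, 20, split=True))   # 2080
print(independent_cycles(100, 1, cores=2))              # 50
```

Under these assumed numbers, splitting only pays off when the instructions are independent; a dependent chain gets far slower once every result has to cross between cores.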
06/25/06 07:15:27 AM
That is x86 assembly code. Don't you recognize it?
My code demonstrates that you need shared resources such as registers and caches for this "duplexing" thingy to work... Note the stack (shared cache required), conditional jump (shared flag register required), eax operation (shared accumulator register required), etc... Heck it also needs a shared PC/IP and SP (Program Counter/Instruction Pointer and Stack Pointer)...
Currently all AMD dual cores have separate registers and separate caches!
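A minimal sketch of that point, assuming a toy `Core` class (hypothetical, nothing like real hardware) in which each core keeps its own private registers and flags:

```python
class Core:
    """Toy core with a private register file and flag (nothing is shared)."""

    def __init__(self):
        self.regs = {"eax": 0}
        self.zero_flag = False

    def mov(self, reg, value):
        # Writes only this core's copy of the register.
        self.regs[reg] = value

    def cmp(self, reg, value):
        # Sets this core's own zero flag from this core's own register.
        self.zero_flag = (self.regs[reg] == value)

core_a, core_b = Core(), Core()
core_a.mov("eax", 7)      # "mov eax, data1" executes on core A
core_b.cmp("eax", 7)      # "cmp eax, data2" executes on core B
print(core_b.zero_flag)   # False: core B never saw core A's eax

core_a.cmp("eax", 7)      # same compare on the core that holds the value
print(core_a.zero_flag)   # True
```

In this toy setup the compare only gives the right answer on the core that executed the `mov`, which is why the comment insists on shared registers and flags.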
AMD=Always Minds Dumber.
06/26/06 03:10:55 AM