<%BANNER[top_768x90]%>
<%BANNER[banner_468x60_h]%>
<%BANNER[cpu_300]%>

Articles: CPU

Pages: [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 ]

3DNow! and 3 Mistakes in AMD’s Strategy

Now we will dwell upon the instructions extension introduced by AMD. It can be viewed as a competitor to SSE, or, to be more exact, SSE is the competitor to 3DNow! as it appeared later. The 3DNow! set was introduced in AMD K6-2-3D processors, competing with Intel Pentium II. We view 3DNow! and SSE as rivals, because both extensions work with real numbers and are intended for geometry-related applications.

Well, 3DNow! is very similar to SSE, but there are some differences, too. There are the same eight new registers, but of 64bit, not 128bit, size. Thus, they store two numbers, not four. You can perform similar operations as those in case of SSE: sum up / multiply / divide two pairs of operands, derive a (inverse) square root accurately or approximately (the latter is performed faster).

Part 1

Part 0

 

 

 

Register 7

 

 

Register 6

 

 

Register 5

 

 

Register 4

*

*

Register 3

10000.1

6.7

Register 2

-0.5

1.5e7

Register 1

2.0

1.0

Register 0

As you may guess, a 3DNow!-optimized code would run twice faster due to simultaneous processing of two pairs of operands. Seems less promising than SSE, doesn’t it? Yeah, if you sit down to optimizing your program, you would seek for a maximum possible performance growth. This factor, combined with the traditionally dominant position of Intel processors in the market, played a crucial role and prevented 3DNow! from becoming widely-spread among software developers.

By the way, SSE used in Pentium III provided twice as low improvement as in Pentium 4, and almost equaled the effect of 3DNow! in AMD Athlon processors. So, it’s mostly the predominance of SSE-supporting CPUs that negatively affected the fate of 3DNow!.

Nevertheless, 3DNow! offered some appealing options. One of them was the possibility to add up numbers stored in one register. That is, you can perform “horizontal” operations as well as “vertical” ones. This flexibility may come in handy in a lot of popular tasks, for example when calculating a scalar product of two 3D vectors. Try performing that with SSE. The result will be disappointing. You will be unable to add up the elements of the long SSE register without involving extra registers. As a result it will not be any faster than without any SSE, and maybe even slower than that. And scalar product is a very popular thing especially to get a vector norm, a distance between the two points. In this case 3DNow! looks much more preferable, due to higher flexibility.

One more advantage of 3DNow! is the possibility of effective automatic optimization by means of the compiler. SSE is too bulky – it has large registers – for automatic data organization. Such a compiler would make a floating-point-heavy program run 1.5 times faster. But AMD didn’t bother about implementing it, while Intel was actively promoting its SSE-supporting and 3DNow!-oblivious compiler. Things took such a turn that AMD had to use Intel’s compiler to create its Spec benchmarking tests (www.spec.org). They just used the most effective compiler to reach the highest performance of the benchmarking application.

Software developers were not eager to optimize their program once more for Athlon CPUs: they had had enough trouble with Intel processors already. So, as a result a program either had no SIMD-optimization at all and Athlon did well compared to Pentium 4, or there was SSE(2) optimization and Athlon would lose.

Overall, AMD made a mistake with 3DNow!. It all ended at the advertisement level. Among popular 3DNow!-optimized applications we can only name OpenGL drivers, where we could see a significant performance boost. I believe that since AMD couldn’t push its instructions set forward, they should have carefully implemented all Intel’s innovations in their processors. If they had implemented SSE2 in Athlon XP, even with a lower efficiency, instead of 3DNow!, it would be a very strong product, almost free from weak spots.

Running a little ahead, we should say that AMD has to take care of a compiler for its processors. The new x86-64 architecture from AMD requires recompilation of existing software to be able to use the new capabilities. AMD Athlon 64 still features 3DNow! so we will have an opportunity to compare the efficiency of SSE- and 3DNow!-optimizations.
Pages: [ 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 ]

<%BANNER[banner_468x60_f]%>

Discussion

Comments currently: 11
Discussion started: 03/26/03 10:24:56 PM
Latest comment: 09/11/05 10:09:15 PM

View comments

You must log in to add comments.

Forgot password? Registration

remember me