AMD Athlon 64 FX-51 vs. Intel Pentium 4 Extreme Edition 3.2GHz: Clash of Strong Wills

We tested the two newest CPUs from Intel and AMD, which are competing for the right to be called today’s fastest desktop solutions. Both processors are the first models of two fundamentally new product families targeted at hardcore gamers and enthusiasts. Well, let’s find out which of the two is the more successful!

by Ilya Gavrichenkov
09/23/2003 | 12:01 PM

Many people have been looking forward to the 23rd of September for a long time. Most of us have associated this date with the arrival of 64bit CPUs in the desktop PC market. That is not quite correct, since the first 64bit desktop processor is the IBM PowerPC G5, launched at the beginning of this summer and used in PowerMac G5 computer systems, but AMD’s marketing efforts haven’t been in vain. Not only the most impressionable users consider today to be the beginning of the 64bit era for desktop systems: quite experienced users are also very excited about the arrival of the 64bit Athlon 64 processors. And who knows, maybe AMD, a relatively small company compared to Intel, will be able to make another significant dash forward, as it already did once with the announcement of the Athlon processor family, and seize the initiative.

 

Interest in AMD Athlon 64 processors has been building for quite a while. Contrary to the original plans, this CPU came out a year late. If it had been announced a year ago, Athlon 64 would inevitably have caused a revolution. But now the situation is completely different. Intel has managed to develop its 32bit Pentium 4 processors significantly since then. During the past year Pentium 4 CPUs acquired Hyper-Threading technology and an 800MHz bus. Moreover, the clock frequencies of these processors have grown considerably. Will Athlon 64 still look attractive against this competition? This is the main intrigue of today’s announcement, and this is why everybody has been waiting so impatiently for the new processor to finally come out.

Moreover, thanks to the joint efforts of AMD and Intel, there is one more reason for this day to go down in history. If you are reading this article, you probably know that today AMD is going to reveal two processor families based on the 64bit AMD64 architecture: Athlon 64 and Athlon 64 FX. The first family is targeted at the mainstream market, while the second one is positioned for gamers and computer enthusiasts who squeeze the maximum performance out of their systems. This way, AMD now has three independent CPU families: Athlon XP, Athlon 64 and Athlon 64 FX. They are targeted at three different markets: Value, Mainstream and Enthusiast respectively. In other words, AMD decided to segment the processor market even further by offering each user group a separate solution.

Intel didn’t stand aside either: at the recent IDF it introduced a new Pentium 4 Extreme Edition processor family, which we are going to discuss in more detail today. These processors will perform better and cost more than the regular Pentium 4, and will be aimed at the gamer and enthusiast market, just like AMD’s Athlon 64 FX.

So, Intel can now also boast three independent processor families, one for each user group: Celeron for the Value market, Pentium 4 for the mainstream and Pentium 4 Extreme Edition for gamers and hardware enthusiasts. As a result, the CPU market now looks very similar to the VGA market. The manufacturers have split the target audience into several user groups and are going to sell each group a dedicated product that meets its needs and fits its price range.

In today’s review we are going to introduce the new AMD and Intel processors that both companies position for hardware enthusiasts. In other words, we will talk about the fastest and most expensive solutions Intel and AMD are ready to offer today: Pentium 4 Extreme Edition 3.2GHz and Athlon 64 FX-51.

AMD Athlon 64 FX-51

The new Athlon 64 and Athlon 64 FX processors announced today are built on the same AMD64 core as AMD Opteron processors. We have already covered this core and its peculiarities in a few articles on our site, so if you need more details on this architecture, I suggest that you consult one of the following materials available in our CPU section:

I would like to briefly remind you that CPUs with AMD64 architecture are based on the architecture of the classic Athlon XP. The major difference between the new CPUs and their predecessors, which allows calling them 8th generation CPUs, is support for AMD64 technology, which allows executing 64bit code while retaining full hardware compatibility with existing applications. The implementation of 64bit modes broadened the address space, doubled the number of general purpose registers and widened them to 64 bits. However, if you really want to take advantage of the 64bit modes, you need appropriate software supporting the AMD64 architecture. This software has already appeared in the server market, for which the AMD64 Opteron processor is positioned: there are ported Linux versions, web servers, databases, etc. However, the situation with Athlon 64 and Athlon 64 FX is not as rosy. Desktop users prefer Windows operating systems, and Microsoft is in no rush to release a 64bit Windows XP version for CPUs supporting the AMD64 architecture. Yes, there are beta versions of this OS already available, which lets us hope that we might see it soon, but the exact schedule is still unknown.

As a result, for most users Athlon 64 and Athlon 64 FX are currently nothing more than an enhanced version of the Athlon XP. These enhancements are not numerous, to tell the truth, but they will definitely push up the performance. Among them we would like to point out the following:

The major differences of the new processors that speed them up in common applications (and that you can actually feel quite easily) are the larger L2 cache and the high-performance integrated memory controller with low latency. These two innovations are the keys to the increased performance of Athlon 64 and Athlon 64 FX processors.

If we regard the new AMD CPUs as an enhanced version of AMD’s Athlon XP processor, there are a few more things worth pointing out. AMD stopped increasing its CPU clock rates long ago. For almost a year already, the 0.13micron technology used at AMD’s Fab 30 in Dresden hasn’t allowed the company to raise the CPU frequency above 2.2GHz. However, the actual performance of AMD CPUs keeps growing as new processor models come out. The thing is that actual processor performance depends not only on the core clock frequency, but also on the number of instructions the CPU is capable of processing per clock. AMD engineers have been working really hard on increasing the second parameter. During the last year the L2 cache of AMD Athlon XP processors reached 512KB, and the bus frequency almost doubled. Of course, these changes affected the rate at which data and instructions are delivered to the processor core, so the number of instructions executed per clock has grown quite noticeably in a number of applications. Athlon 64 and Athlon 64 FX represent one more step in this direction. The integrated memory controller and the even larger 1MB L2 cache will feed instructions to the core even faster, so the execution units will spend much less time idle. However, this is not all. Fine-tuning of the architecture, including changes in the decoding algorithm, should also increase the efficiency of the processor core, because this way the CPU core will be able to process more commands per clock. However, I have to stress right away that speeding up the command decoder doesn’t have any serious effect on processor performance in real applications. Firstly, it really works only for a rather limited set of instructions, and secondly, the decoder acceleration we are talking about here is not that large at all.
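The relationship described above, with performance as the product of clock frequency and the number of instructions processed per clock, can be sketched in a few lines of Python. This is only an illustration: the function name and both IPC values are hypothetical placeholders, not measurements from this review.

```python
def relative_performance(clock_ghz: float, ipc: float) -> float:
    """Instructions per nanosecond: cycles per ns (GHz) times instructions per cycle."""
    return clock_ghz * ipc


# Hypothetical IPC figures, chosen only to illustrate the idea.
baseline = relative_performance(clock_ghz=2.2, ipc=1.0)
improved = relative_performance(clock_ghz=2.2, ipc=1.2)

# Same clock, higher IPC: the whole gain comes from per-clock efficiency.
gain_percent = (improved / baseline - 1) * 100
print(f"Gain at equal clock: {gain_percent:.0f}%")
```

In this simplified model a CPU stuck at 2.2GHz can still get faster, as long as larger caches or a faster memory path raise the number of instructions it completes per clock.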

As for the processor clock frequencies, things do not look that comforting here. Even though AMD uses a slightly improved 0.13micron SOI process to manufacture its Athlon 64 and Athlon 64 FX processors, their core frequencies haven’t yet managed to exceed the 2.2GHz bar. The Athlon 64 FX-51 processor announced today works at 2.2GHz, which is the same as Athlon XP 3200+, and Athlon 64 3200+ works at an even lower frequency: 2GHz. In the near future AMD is going to announce Athlon 64 3400+ with a 2.2GHz frequency. As for higher frequencies, there is some hope that an Athlon 64 FX model with a 2.4GHz core clock will be made with 0.13micron technology. After that, AMD64 based CPUs should reach higher clock rates only when AMD moves to the 90nm production technology and the new 64bit San Diego processor core comes out.

Now let’s try to figure out the differences between the mainstream Athlon 64 and the elite Athlon 64 FX for enthusiasts. In fact, they are not numerous. AMD Athlon 64 is the CPU that AMD was going to launch a while ago already. Its major feature is the single-channel integrated memory controller supporting DDR400/DDR333 SDRAM. This is exactly the CPU that we managed to test about half a year ago (see our AMD Athlon 64 Performance Preview). Other than that, this processor is just the same as other AMD64 based CPUs, except for its Socket 754, of course.

As for Athlon 64 FX, the idea to introduce a product like that occurred to AMD just recently. AMD decided to release an extremely fast gaming solution because the performance of the regular Athlon XP working at 2GHz was not as high as AMD wanted it to be. Athlon 64 FX is a full analog of the AMD Opteron processor with the dual-channel memory controller. It even uses the same Socket 940 as the Opteron CPU. The only thing that is really different is the die marking. That is why Athlon 64 FX inherited some server features from the Opteron family. In particular, Athlon 64 FX requires Registered memory, which is really weird, to tell the truth. AMD also mentioned support for Registered DDR400 SDRAM, although there is nothing special about it: Opteron can also work with this memory type. The catch is that Registered DDR400 hasn’t yet been approved by JEDEC and there are very few memory modules like that available in the market today: you can count them on the fingers of one hand. However, the launch of Athlon 64 FX will definitely push many memory makers to start making these modules. For example, today, together with the Athlon 64 FX announcement, Kingston launched its new Registered DDR400 memory, which will be sold as part of the HyperX product line. And a few days ago OCZ also announced that it is beginning to ship memory modules like that.

Athlon 64 and Athlon 64 FX have also acquired a few long-awaited changes in package design and burn-out protection. First of all, I would like to mention a special lid that protects the CPU die against physical damage during cooler installation and removal. Moreover, besides the built-in thermal diode, Athlon 64 FX and Athlon 64 have finally got their own overheating protection circuit. It doesn’t bring anything new to the idea, though: if the CPU reaches a certain critical temperature, it simply shuts down. Anyway, it is very pleasing that AMD has finally taken note of the reliability issues that Athlon and Athlon XP CPUs have encountered pretty often.

So, let’s sum up everything we have just said about the new Athlon 64 FX-51 and Athlon 64 3200+ in the table below. For a more illustrative picture, the specifications of Athlon XP 3200+ are included as well:

 

| | Athlon 64 FX-51 | Athlon 64 3200+ | Athlon XP 3200+ |
|---|---|---|---|
| Package | Socket 940 | Socket 754 | Socket 462 |
| Frequency | 2.2GHz | 2.0GHz | 2.2GHz |
| Production process | 0.13micron, SOI | 0.13micron, SOI | 0.13micron |
| Number of transistors | 105.9 million | 105.9 million | 54.3 million |
| Die size | 193sq.mm | 193sq.mm | 101sq.mm |
| Nominal voltage | 1.5V | 1.5V | 1.65V |
| Integrated memory controller | Dual-channel, 128bit | Single-channel, 64bit | None |
| Supported memory types | Registered DDR400/DDR333/DDR266 SDRAM | DDR400/DDR333/DDR266 SDRAM | - |
| ECC support | + | + | - |
| L1 cache | 128KB (64KB for data and 64KB for instructions) | 128KB (64KB for data and 64KB for instructions) | 128KB (64KB for data and 64KB for instructions) |
| L2 cache | 1024KB (exclusive) | 1024KB (exclusive) | 512KB (exclusive) |
| SIMD instructions support | SSE2/SSE/3DNow! | SSE2/SSE/3DNow! | SSE/3DNow! |
| AMD64 technology support | + | + | - |

* Note that the memory of Athlon 64 and Athlon 64 FX is clocked relative to the core frequency, so the actual memory frequencies in this case are 100, 129.4, 157.1 and 200MHz.
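The frequencies in this note are consistent with the memory clock being derived from the 2.2GHz core clock through an integer divisor. The short Python sketch below reproduces them. Keep in mind that the divisors 22, 17, 14 and 11 are an assumption inferred from the numbers themselves, not values taken from AMD documentation.

```python
CORE_CLOCK_MHZ = 2200  # Athlon 64 FX-51 core frequency

# Assumed integer divisors (inferred from the published frequencies,
# not taken from AMD documentation).
ASSUMED_DIVISORS = [22, 17, 14, 11]


def memory_clock_mhz(core_mhz: float, divisor: int) -> float:
    """Memory clock obtained by dividing the core clock by an integer divisor."""
    return core_mhz / divisor


for divisor in ASSUMED_DIVISORS:
    clock = memory_clock_mhz(CORE_CLOCK_MHZ, divisor)
    print(f"core / {divisor} = {clock:.1f} MHz")
```

Under this assumption the memory cannot always hit a nominal frequency exactly, which would explain why 129.4 and 157.1MHz show up in the list.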

It was a little unexpected that Athlon 64 FX and Athlon 64 use the same processor core despite their architectural differences. However, AMD uses the same dies for both processors to unify the production process, just as it does for Opteron processors. Functionality is reduced by disabling certain die features at the hardware level.

In fact, you can easily prove that Athlon 64 FX and Opteron processors are close relatives. Both processors feature the same CPUID. For example, we compared the info provided by the CPUZ utility for Athlon 64 FX-51 and Opteron 246.

And this is what the new Athlon 64 FX-51 (on the left) looks like compared with the Opteron processor (on the right):

Well, it is very pleasing that today’s Athlon 64 FX-51 announcement also marks the beginning of mass sales of these processors. However, the price of Athlon 64 FX-51, intended for extreme gaming fans, will not strike you as low: AMD set the retail price at $733.

Pentium 4 Extreme Edition 3.2GHz

Of course, Intel couldn’t disregard the launch of the new AMD processors. No wonder, actually. Athlon 64 and Athlon 64 FX are positioned in such a way that the high-end market segment covered by Athlon 64 FX was left open on Intel’s side. Look here:

Even though this picture was composed by AMD’s marketing department, it shows the situation pretty accurately. Athlon 64 FX-51, priced crazily high and aimed at the gaming enthusiast market, didn’t have an opponent from Intel to compete with. That is why Intel had to come up with a solution to cover this market sector. I don’t think that Intel made this decision in a rush. More likely, the company worked on it for a while and concealed the preparations really carefully, so that the launch of the Pentium 4 Extreme Edition processor would be a total surprise for Intel’s competitor and the public and would make a proper impression.

So, a few days ago Intel announced this new processor family at the IDF in San Jose. Like Athlon 64 FX, the new family from Intel will include niche products targeted at the most demanding users. The first attempt in this direction is the new Pentium 4 Extreme Edition 3.2GHz, which we managed to get for review.

So, what does this CPU look like? In fact, there is nothing totally unexpected about its architecture. While Intel is still working on the Prescott core, and the current Northwood core has exhausted the clock frequency potential of the 0.13micron production technology, there was only one way left: to repeat the company’s experience with Xeon processors for dual-CPU systems. Namely, today’s fastest Pentium 4 3.2GHz acquired an L3 cache.

This is a 2MB on-die cache that works at the full core frequency. This way, Pentium 4 Extreme Edition turns out to be a sort of Xeon MP on the Gallatin core in a different package and with a different target audience. Of course, Intel’s engineers do not confirm this directly, saying that the Gallatin core had to be modified for the Pentium 4 Extreme Edition family so that it could support the 800MHz bus. However, the L3 cache of the Pentium 4 Extreme Edition has exactly the same structure as the L3 cache of the already mentioned Xeon MP CPU. In other words, it is an exclusive, 8-way associative cache using a 64bit bus.

Here is the info the CPUZ utility provides about this processor:

As you see, this is a real Northwood core with additional L3 cache memory. Note that the M0 core stepping, which we see there, is also used in the new Xeon DP with 1MB L3 cache. So, it is more or less clear where the idea comes from. Note that the launch of the new Pentium 4 Extreme Edition 3.2GHz processor is not a one-time measure aimed at saving Intel’s reputation until Prescott processors come out. Pentium 4 Extreme Edition is a new full-fledged processor family, which will be continued later on, even after the move to the new Prescott core.

The first Pentium 4 Extreme Edition 3.2GHz model uses the 800MHz Quad Pumped Bus and supports Hyper-Threading technology, like all other high-end Pentium 4 processors. Pentium 4 Extreme Edition CPUs working at 3.2GHz are compatible with all contemporary Socket 478 mainboards and do not require any special cooling systems, even though they have a 169-million transistor die and a higher heat dissipation of up to 94W. This way, all owners of Socket 478 systems will easily be able to upgrade to Pentium 4 Extreme Edition, provided they are not discouraged by the price of this CPU, which is expected to be around $740. However, Pentium 4 Extreme Edition processors haven’t started selling yet, and they won’t arrive today or tomorrow. Mass sales are most likely to start within a month or two. The picture below shows Pentium 4 Extreme Edition 3.2GHz (on the left) and a regular Pentium 4 3.2GHz (on the right).

In fact, the exterior of the new processor hardly differs from that of its predecessor. The only visible difference is probably the larger number of components on its reverse side.

Testbed and Methods

Today we are going to see a really exciting rivalry between AMD and Intel processors targeted at the enthusiast market. In other words, we will take a closer look at the performance of AMD Athlon 64 FX-51 and Intel Pentium 4 Extreme Edition. Athlon 64, which belongs to a slightly different price category, will be tested in one of our upcoming reviews, so don’t worry that we are leaving it out today. For a better comparison, we will also include the results obtained on the top-performance processors of the previous generation: Intel Pentium 4 3.2GHz and AMD Athlon XP 3200+. The fact that both these processors work at the same core clock as their successors allows us to draw some very interesting conclusions about the efficiency of the AMD Athlon 64 FX and Pentium 4 Extreme Edition architectures.

Note that so far the only 64bit operating systems supporting CPUs with AMD64 architecture are Linux versions. That is why we had to carry out most tests in 32bit Windows XP.

So, our testbeds were configured as follows:

A few comments on the testbed configurations that you should take into account:

Performance in Synthetic Benchmarks: Cache and Memory Speed

The major participants of today’s test session, AMD Athlon 64 FX and Intel Pentium 4 Extreme Edition, are new solutions from the architectural point of view, especially in the way they work with the memory subsystem. That is why we decided to investigate their bandwidth and memory subsystem latency first. For this purpose we used the Cache Burst 32 utility, which is a worthy improvement on the traditional Cachemem:

The graphs above illustrate very well all the strengths of the new processor architectures that affect memory subsystem performance. You can see the inclusive L3 cache of the Pentium 4 Extreme Edition with its pretty good bandwidth. You can also see the larger 1MB L2 cache of the Athlon 64 FX processor. The results also show very clearly that the integrated dual-channel memory controller of the Athlon 64 FX processor allows it to work with the memory as fast as the memory controller of the i875P chipset used by the regular Pentium 4 CPUs. However, these are pretty logical results, which we expected to see.

However, there are a few really unexpected things that I would like to point out. First, we can see that the L2 cache of the Athlon 64 FX processor works faster than that of Athlon XP. This proves that AMD did broaden the bus between the processor core and the L2 cache to 128 bits. Moreover, according to more in-depth benchmarks, this is a bi-directional bus (64 bits in each direction). Also note that the L1 cache bandwidth of the Athlon 64 FX and Athlon XP processors is higher than that of Pentium 4. However, the L1 cache of the Pentium 4 processor is remarkable for a different feature: very low latency, the effect of which we will see later in this article. As for the L2 caches, the Pentium 4 processor wins in terms of bandwidth here, which is not at all surprising keeping in mind that this cache has a 256bit bus. The bandwidth of the L3 cache in the Pentium 4 Extreme Edition processor is close to that of the L2 cache in Athlon XP, which indicates that a 64bit bus connects the processor core of the Pentium 4 Extreme Edition with the L3 cache. By the way, Prescott dies will have a broader bus here. And finally the most interesting thing: as the benchmark results show, the large L3 cache allows Pentium 4 Extreme Edition to speed up writing into the memory.

For easier comparative analysis, we summed up all the numbers in the table below:

 

| | Pentium 4 3.2 | Pentium 4 3.2 EE (2MB L3) | Athlon XP 3200+ | Athlon 64 FX-51 |
|---|---|---|---|---|
| L1 cache read speed, MB/s | 23108 | 23295 | 26309 | 29260 |
| L2 cache read speed, MB/s | 12915 | 12920 | 6667 | 10145 |
| L3 cache read speed, MB/s | - | 6522 | - | - |
| Memory read speed, MB/s | 3699 | 3686 | 1778 | 3436 |
| L1 cache write speed, MB/s | 10793 | 10789 | 16424 | 16431 |
| L2 cache write speed, MB/s | 10802 | 10799 | 6512 | 8293 |
| L3 cache write speed, MB/s | - | 7236 | - | - |
| Memory write speed, MB/s | 1416 | 2142 | 1226 | 2314 |
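To put these numbers into perspective, the short Python sketch below computes how many times faster Athlon 64 FX-51 is than Athlon XP 3200+ in several of the measurements. The figures are copied from the table above. The dictionary layout and the `speedup` helper are only illustrative and are not part of any benchmark tool.

```python
# Read and write speeds in MB/s, copied from the table above.
results = {
    "L1 cache read":  {"Athlon XP 3200+": 26309, "Athlon 64 FX-51": 29260},
    "L2 cache read":  {"Athlon XP 3200+": 6667,  "Athlon 64 FX-51": 10145},
    "Memory read":    {"Athlon XP 3200+": 1778,  "Athlon 64 FX-51": 3436},
    "L2 cache write": {"Athlon XP 3200+": 6512,  "Athlon 64 FX-51": 8293},
    "Memory write":   {"Athlon XP 3200+": 1226,  "Athlon 64 FX-51": 2314},
}


def speedup(new: float, old: float) -> float:
    """How many times faster the new result is compared to the old one."""
    return new / old


for metric, row in results.items():
    ratio = speedup(row["Athlon 64 FX-51"], row["Athlon XP 3200+"])
    print(f"{metric}: x{ratio:.2f}")
```

The largest ratios are in the memory read and write rows, which matches the emphasis on the integrated memory controller as a key innovation of the new core.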

Now let’s see the results of our latency tests.

The latencies on this graph are measured in processor clocks. Therefore, you can only compare the results of CPUs working at the same clock frequency. However, we can still discover a few very interesting things. For example, the latency of the L3 cache in Pentium 4 Extreme Edition is not bad at all: only about twice that of the L2 cache.

To make the comparison fairer, we converted the latencies from processor clocks into time:

This graph gives us more food for thought. First of all, you notice right away that the memory subsystem latency of the Athlon 64 FX processor is very low even compared with the dual-channel memory controllers used in Intel’s platforms. This is exactly where the integrated dual-channel memory controller shows its very best. Note that this is far from the limit. As is known, Registered memory has a slightly higher latency than non-registered memory, which is why the modifications of the Athlon 64 FX memory controller planned for next year will definitely speed up this processor family quite tangibly.

Here are the same data summed up into a table for your convenience:

 

| | Pentium 4 3.2 | Pentium 4 3.2 EE (2MB L3) | Athlon XP 3200+ | Athlon 64 FX-51 |
|---|---|---|---|---|
| L1 cache latency, cycles | 2 | 2 | 3 | 3 |
| L2 cache latency, cycles | 19 | 19 | 20 | 13 |
| L3 cache latency, cycles | - | 43 | - | - |
| Memory latency, cycles | 204 | 206 | 180 | 125 |
| L1 cache latency, ns | 0.63 | 0.63 | 1.36 | 1.36 |
| L2 cache latency, ns | 5.94 | 5.94 | 9.09 | 5.90 |
| L3 cache latency, ns | - | 13.44 | - | - |
| Memory latency, ns | 63.75 | 64.38 | 81.82 | 56.81 |
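The conversion from processor clocks into nanoseconds is simple: one clock lasts 1/f nanoseconds when the core frequency f is expressed in GHz. The Python sketch below applies this to the memory latencies from the table, using the core clocks given in this article (3.2GHz for both Pentium 4 CPUs and 2.2GHz for both AMD CPUs). The function and variable names are only illustrative.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in core clocks to nanoseconds (one clock = 1/GHz ns)."""
    return cycles / clock_ghz


# (memory latency in cycles, core clock in GHz), taken from this article.
memory_latency = {
    "Pentium 4 3.2": (204, 3.2),
    "Pentium 4 3.2 EE": (206, 3.2),
    "Athlon XP 3200+": (180, 2.2),
    "Athlon 64 FX-51": (125, 2.2),
}

for cpu, (cycles, ghz) in memory_latency.items():
    print(f"{cpu}: {cycles_to_ns(cycles, ghz):.2f} ns")
```

The results agree with the table to within rounding. This is also why latencies given in clocks can only be compared directly between CPUs running at the same frequency.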

I would like to say that the extremely low L1 cache latency is a powerful trump card of the Pentium 4 architecture, although this cache is also pretty small: only 8KB. The L1 cache of Athlon 64 FX is slower but at the same time much bigger. The L2 cache latencies of Intel and AMD processors have become nearly identical now that the L2 cache bus has been improved in Athlon 64 FX. Therefore, the larger cache of the AMD Athlon 64 FX processor may give it a certain performance advantage over the rival.

Performance in Synthetic Benchmarks: SSE2 in Athlon 64 FX

Another innovation introduced in Athlon 64 and Athlon 64 FX is support for the SSE2 instruction set. Just like in Pentium 4, there is a single unit of this kind here, so it would be really interesting to evaluate how efficient it is. We have already asked this question during our Opteron 144 tests (see our article called ASUS SK8N + AMD Opteron 144: Uniprocessor Workstation on AMD Opteron and NVIDIA nForce3 Professional for details). As you remember, we used the BLAS test from the ScienceMark 2.0 test set, which allowed us to conclude that the SSE2 implementation in CPUs based on AMD’s architecture is not as fast as we would like it to be. Now let’s look at this problem from a different angle. This time we will take SiSoft Sandra 2003. This test package includes two benchmarks capable of measuring the performance of the SSE2 unit: the well-known Whetstone C 2.0 test rewritten for SSE2 instructions and a test that measures the time required to calculate Mandelbrot sets. Here are the results for both of them:

As we see, the SSE2 unit of Athlon 64 FX is not that impressive. The performance is quite low here both because of the low clock frequency compared with that of the Pentium 4 processor and because of the not very successful implementation of the SSE2 unit. All in all, we can say that SSE2 support is a formal feature of the new AMD processors. AMD engineers don’t seem to have cared much about the performance of this unit, probably thinking that it was not worth the effort. However, AMD has finally managed to catch up with Intel in terms of supported SIMD extensions (here we do not take 3DNow! into account). This situation will probably last for another 2-3 months until Intel finally announces its Prescott CPU, which will boast a few additional SIMD instructions. They will deal with the conversion of floating point numbers into integers, complex number processing and video decoding.

Performance in Office Applications and Digital Content Creation Applications

Now let’s move on to some real applications. Before we start, I have to stress once again that despite the AMD64 technology support implemented in the AMD Athlon 64 FX-51, we used an ordinary 32bit Windows XP version for our tests. The thing is that the only 64bit operating systems supporting x86-64 instructions that you can get now are Linux versions, which are not so popular among individual end-users for some reason.

First of all, we took a closer look at the performance of our test participants in office tasks and in applications for creating and editing audio and video files. Here we used the popular Winstone benchmark.

In Business Winstone 2002, which shows the performance in typical office apps such as MS Word, Excel or Access, the victory went to Athlon 64 FX-51. This is not at all surprising, because even the previous generation Athlon XP processors performed very well in this test. As for Pentium 4 Extreme Edition, the additional 2MB of L3 cache improved its performance by 5% compared with the regular Pentium 4, although this is still not enough to let it compete with the new AMD processor on equal terms.

In Multimedia Content Creation Winstone 2003, which tests the performance of our rivals during work with digital content, the situation is different. Due to its larger L2 cache, SSE2 support and integrated memory controller, Athlon 64 FX-51 works about 26% faster here than its predecessor. However, it still cannot outpace Pentium 4 Extreme Edition because the large L3 cache of the latter sped up the NetBurst architecture tangibly enough.

Performance in Data Encoding Applications

Before we move on to the actual results, I would like to remind you that AMD processors have never been really fast in this type of task. The NetBurst architecture, which is specifically optimized for streaming data processing, proves much more efficient in applications of this kind.

The performance in WinRAR depends heavily on the memory subsystem performance. As we saw during the synthetic benchmarks discussion, Athlon 64 FX-51 is much faster here than its predecessors and sometimes even than the competitors. Therefore, it is not at all surprising that Athlon 64 FX-51 manages to outpace Pentium 4 3.2GHz. However, Pentium 4 Extreme Edition 3.2GHz with its large L3 cache also boasts improved data compression performance compared to its predecessor. As a result, Athlon 64 FX-51 doesn’t manage to become the winner here: Pentium 4 Extreme Edition 3.2GHz outpaces it by 4.7%.

MP3 encoding primarily loads the computational units of the CPU, including SSE/SSE2. Ever since Intel started implementing Hyper-Threading technology in its Pentium 4 processors, these CPUs have been much faster in MP3 encoding than the competitors. As we see, Athlon 64 FX-51 failed to change this situation in any way. The architecture of its computational units remained almost unchanged compared with Athlon XP, the clock frequency didn’t grow any higher, and the memory subsystem performance doesn’t matter that much for the LAME codec. By the way, forced enabling of SSE/SSE2 instructions for Athlon 64 FX-51 unfortunately doesn’t have any positive effect on its performance. As we have already mentioned, Athlon 64 cannot boast fast processing of SSE2 instructions.

When converting video from AVI into MPEG2 format, the leadership went over to Athlon 64 FX-51. The SSE2 instruction support acquired by the new processor, together with a significant improvement in memory subsystem performance, pushed the results of the newcomer 13.3% higher than those of the Athlon XP 3200+. This performance growth ensures a very stable lead of the new AMD solution over the competitor, whose large L3 cache is of no help here at all.

The NetBurst architecture of the Pentium 4 processors, their fast 800MHz bus, Hyper-Threading technology and SSE2 instruction support have never left any chance for the Athlon XP processors in this benchmark. With the launch of Athlon 64 FX-51 featuring SSE2 instructions and a faster memory subsystem, the situation has changed completely. AMD’s new solution managed to almost catch up with Pentium 4 3.2GHz here. However, the feather in Intel’s cap, the large L3 cache, ruined the vague hopes for a victory of Athlon 64 FX-51. As a result, Pentium 4 Extreme Edition 3.2GHz is about 3.6% faster here.

The situation in Windows Media Encoder 9 is somewhat worse for Athlon 64 FX-51. Although it works about 11% faster than its predecessor, it still fails to catch up with Pentium 4 3.2GHz, not to mention the Extreme Edition processor, which is 2.1% faster than the regular one.

Performance in Gaming Applications

Both new processor families, Athlon 64 FX from AMD and Pentium 4 Extreme Edition from Intel, are positioned by their manufacturers first and foremost as extreme gaming solutions. Therefore, this particular section of our test session is the most interesting for all of us. Let’s see what the newcomers are worth in contemporary games.

In the popular 3DMark2001 SE test AMD’s processor is a little faster than Intel’s gaming CPU. The difference is less than 1%, which doesn’t allow us to call either of them the leader here.

The CPU test from the 3DMark2003 test set, which executes shader algorithms using the CPU’s computing power, names Athlon 64 FX-51 the leader. Its advantage over the Pentium 4 Extreme Edition 3.2GHz reaches 8.7%, and over the regular Pentium 4 3.2GHz it reaches 18%.

The total 3DMark2003 score appears to be in favor of both Pentium 4 processors. It is pretty hard to say what caused such a big difference in results of these benchmarks. However, we tend to believe that it has to do with better optimization of the Detonator driver for NetBurst architecture.

For this test we also used the recently released Aquamark3 test package. This new benchmark is based on a real gaming engine supporting DirectX 9 and is the first of today’s 3D applications that really knows how to use Hyper-Threading technology. Of course, this affects the results: even the regular Pentium 4 3.2GHz processor is a little ahead of Athlon 64 FX-51. To tell the truth, it’s a pity that AMD didn’t implement anything similar to Hyper-Threading in its CPUs.

The CPU test from the Aquamark3 suite puts Athlon 64 FX-51 into a very awkward situation. Now it falls 7.3% behind Pentium 4 3.2GHz, while the gap between it and the new Pentium 4 Extreme Edition grows to 11.2%. I should admit that it is a very alarming sign for the entire Athlon 64 and Athlon 64 FX family. The thing is that Hyper-Threading support will little by little start appearing in new games. In particular, this year’s major hits, Half-Life 2 and Doom III, are claimed to support Hyper-Threading.

The regular Pentium 4, a member of the family that has been the undisputed leader in Quake3 from the very first day, is now defeated by Athlon 64 FX-51. Thanks to its faster memory subsystem, Athlon 64 FX-51 outperforms Pentium 4 3.2GHz by 5.6%. However, the additional 2MB of L3 cache in the new Pentium 4 Extreme Edition processor also has a positive effect, improving its results over the regular Pentium 4 by a good 13.3%. So, in the end, Athlon 64 FX-51 is only the second fastest after the new Intel Pentium 4 Extreme Edition 3.2GHz.
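The two Quake3 percentages above are both measured against the regular Pentium 4 3.2GHz, so the gap between the two new CPUs is a ratio rather than a simple difference. Here is a minimal sketch of that arithmetic. The normalization to 1.0 and the helper name `lead_percent` are our own illustration; only the 5.6% and 13.3% figures come from the text.

```python
# Quake3 results normalized to the regular Pentium 4 3.2GHz = 1.0.
# Only the two percentages are taken from the article; the rest is illustrative.
p4 = 1.0
fx51 = p4 * 1.056    # Athlon 64 FX-51: 5.6% faster than Pentium 4 3.2GHz
p4_ee = p4 * 1.133   # Pentium 4 Extreme Edition: 13.3% faster than Pentium 4 3.2GHz


def lead_percent(a, b):
    """Return how much faster result a is than result b, in percent."""
    return (a / b - 1.0) * 100.0


ee_over_fx = lead_percent(p4_ee, fx51)
print(f"Pentium 4 EE leads Athlon 64 FX-51 by about {ee_over_fx:.1f}%")
```

In other words, the Extreme Edition ends up roughly 7% ahead of Athlon 64 FX-51 with the standard Quake3 code, not 13.3% minus 5.6%.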

However, we have recently found out that the low performance of AMD processors in Quake3 is caused by the poor optimization of this gaming engine for the Athlon XP and Athlon 64 processor families. Not so long ago we managed to download Quake3 dll-files specially optimized for AMD’s architecture (download them here). Of course, we were very interested to find out how much the optimized code would improve the performance of AMD CPUs.

Wow, do you notice these drastic changes? Now it’s Athlon 64 FX-51 that outperforms not only the regular Pentium 4, but also the new Pentium 4 Extreme Edition. This way, all Quake3 lovers using Athlon XP, Athlon 64 or Athlon 64 FX should definitely take advantage of the optimized dll.

In Unreal Tournament 2003 the lead belongs to AMD processors, as always. It looks as if Epic, unlike id Software, really took care to optimize its code for the architecture of AMD processors. As a result, Athlon 64 FX-51 is 31.5% faster than Pentium 4 3.2GHz and 22.9% faster than Pentium 4 Extreme Edition 3.2GHz. By the way, Epic’s attentive attitude to AMD may well result in the upcoming Unreal Tournament 2004 becoming the first game optimized for x86-64 architecture.

The new AMD processor also proves faster than the rival products from Intel in Serious Sam: The Second Encounter.

Although Splinter Cell is based on the same engine as Unreal Tournament 2003, the advantage of Athlon 64 FX-51 is not so evident any more. Nevertheless, the new AMD CPU defeats Pentium 4 Extreme Edition 3.2GHz with 2MB L3 cache.

A similar picture can be observed in the Gun Metal game, which is based on its own engine.

In the benchmark based on a very beautiful space game called X2 – The Threat, AMD Athlon 64 FX-51 is not so successful any more. Even though it is faster than the top regular Pentium 4 3.2GHz processor, the new Pentium 4 Extreme Edition surpasses it.

All in all, I have to point out that AMD’s processor manages to retain the lead in most of today’s gaming applications. However, we shouldn’t forget that the long-awaited shooter games are coming out very soon and the gaming market will change a lot. And we have no idea how well the new CPUs will perform in Half-Life 2 or Doom III.

Performance in Photoshop

Adobe Photoshop 7.0 is a very popular application, which most of you use for 2D graphics editing. That is why we decided to pay special attention to the performance of our test participants in this application. We used the PSBench test with a 50MB image, the 7th version of which has only recently become available for testing.

Although SSE2 support allows Athlon 64 FX-51 to work 14% faster than Athlon XP 3200+ in Photoshop, this improvement is too small to let it outperform a Pentium 4 based system. By the way, I found it very surprising that the L3 cache of the new Pentium 4 Extreme Edition processor doesn’t affect the results in this graphics application in any way.

Here are some more detailed results showing how well the CPUs tested coped with different Photoshop 7.01 filters. The values indicate time in seconds spent on each filter:

 

| Filter | Pentium 4 3.2 | Pentium 4 3.2 EE (2MB L3) | Athlon XP 3200+ | Athlon 64 FX-51 |
|---|---|---|---|---|
| Rotate 90 | 0.1 | 0.1 | 0.2 | 0.1 |
| Rotate 9 | 2.3 | 2.4 | 2.5 | 1.9 |
| Rotate .9 | 2.2 | 2.2 | 2.3 | 1.8 |
| Gaussian Blur 1 | 0.6 | 0.6 | 0.7 | 0.6 |
| Gaussian Blur 3.7 | 1.6 | 1.6 | 2.6 | 1.7 |
| Gaussian Blur 85 | 1.8 | 1.9 | 3 | 1.9 |
| Unsharp 50/1/0 | 0.8 | 0.8 | 0.8 | 0.7 |
| Unsharp 50/3.7/0 | 1.8 | 1.8 | 2.7 | 1.8 |
| Unsharp 50/10/5 | 1.8 | 1.8 | 2.8 | 1.8 |
| Despeckle | 2.1 | 2.1 | 2.5 | 2.4 |
| RGB-CMYK | 6.9 | 6.9 | 7.8 | 6 |
| Reduce Size 60% | 0.8 | 0.8 | 1.1 | 0.9 |
| Lens Flare | 2.2 | 2.3 | 3.7 | 3.5 |
| Color Halftone | 1.9 | 1.9 | 2.6 | 2.1 |
| NTSC Colors | 1.8 | 1.8 | 1.7 | 1.5 |
| Accented Edges | 9.9 | 9.8 | 9.8 | 9.4 |
| Pointillize | 11.3 | 11.4 | 17.3 | 16.9 |
| Water Color | 24 | 24 | 21.2 | 20.1 |
| Polar Coordinates | 5.5 | 5.5 | 7.7 | 6.6 |
| Radial Blur | 30.9 | 30.9 | 41.3 | 32.8 |
| Lighting Effects | 1.6 | 1.6 | 1.9 | 1.7 |
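To get a feel for the per-filter numbers, one can add up the times for each CPU and list the filters where Athlon 64 FX-51 beats both Pentium 4 chips. The sketch below does exactly that with the values from the table above. Note that a plain sum of filter times is our own simplification and need not match the way PSBench computes its overall score; the names `CPUS`, `FILTER_TIMES`, `totals` and `fx_wins` are illustrative.

```python
# Seconds per filter, copied from the table above, in this column order.
CPUS = ("Pentium 4 3.2", "Pentium 4 3.2 EE", "Athlon XP 3200+", "Athlon 64 FX-51")
FILTER_TIMES = {
    "Rotate 90": (0.1, 0.1, 0.2, 0.1),
    "Rotate 9": (2.3, 2.4, 2.5, 1.9),
    "Rotate .9": (2.2, 2.2, 2.3, 1.8),
    "Gaussian Blur 1": (0.6, 0.6, 0.7, 0.6),
    "Gaussian Blur 3.7": (1.6, 1.6, 2.6, 1.7),
    "Gaussian Blur 85": (1.8, 1.9, 3.0, 1.9),
    "Unsharp 50/1/0": (0.8, 0.8, 0.8, 0.7),
    "Unsharp 50/3.7/0": (1.8, 1.8, 2.7, 1.8),
    "Unsharp 50/10/5": (1.8, 1.8, 2.8, 1.8),
    "Despeckle": (2.1, 2.1, 2.5, 2.4),
    "RGB-CMYK": (6.9, 6.9, 7.8, 6.0),
    "Reduce Size 60%": (0.8, 0.8, 1.1, 0.9),
    "Lens Flare": (2.2, 2.3, 3.7, 3.5),
    "Color Halftone": (1.9, 1.9, 2.6, 2.1),
    "NTSC Colors": (1.8, 1.8, 1.7, 1.5),
    "Accented Edges": (9.9, 9.8, 9.8, 9.4),
    "Pointillize": (11.3, 11.4, 17.3, 16.9),
    "Water Color": (24.0, 24.0, 21.2, 20.1),
    "Polar Coordinates": (5.5, 5.5, 7.7, 6.6),
    "Radial Blur": (30.9, 30.9, 41.3, 32.8),
    "Lighting Effects": (1.6, 1.6, 1.9, 1.7),
}

# Total time per CPU over all filters (a simple sum, not the PSBench score).
totals = {
    cpu: round(sum(row[i] for row in FILTER_TIMES.values()), 1)
    for i, cpu in enumerate(CPUS)
}

# Filters where the Athlon 64 FX-51 is strictly faster than both Pentium 4 chips.
fx_wins = [
    name for name, (p4, ee, xp, fx) in FILTER_TIMES.items() if fx < min(p4, ee)
]

for cpu, total in totals.items():
    print(f"{cpu:18s} {total:6.1f} s")
print("FX-51 faster than both Pentium 4 CPUs in:", ", ".join(fx_wins))
```

The long-running Pointillize and Radial Blur filters dominate such a sum, which is one reason a few individual wins for Athlon 64 FX-51 do not turn into an overall lead.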

Performance During 3D Rendering

The improvements introduced in the Athlon 64 FX-51 processor core prove very efficient at speeding up rendering in 3d studio max 5.1. However, this application uses the Hyper-Threading technology of the Pentium 4 processor family and hence doesn’t allow AMD CPUs to show their real best. Athlon 64 FX-51 is even slower than the regular Pentium 4, not to mention Pentium 4 Extreme Edition.

And in Lightwave 7.5 the situation is different. SSE2 support introduced in the new AMD64 based processors has significantly improved their performance in this application. As a result, even Pentium 4 Extreme Edition working at 3.2GHz and supporting 800MHz bus, Hyper-Threading and 2MB L3 cache falls behind AMD Athlon 64 FX-51.

CINEMA4D, like 3d studio max 5.1, also uses Hyper-Threading very actively. And of course, the absence of any counterpart to this technology in AMD Athlon 64 FX doesn’t allow the new processor to perform well: even Pentium 4 3.2GHz is 22% faster than the new AMD processor.

The new Maya 5.0, in contrast, is friendlier to the Athlon 64 architecture. As we see, Athlon 64 FX-51 is the leader here.

Performance in Scientific Tasks

To test the CPU performance during scientific calculations we used the new beta version of the ScienceMark 2.0 test package, which checks how fast the tested platforms can solve different mathematical modeling tasks.

One of the important advantages of the Athlon architecture is a powerful FPU, which is very actively involved in this sort of task. Therefore, it is no wonder that even Athlon XP 3200+ showed very good results in ScienceMark 2.0. The improvements introduced in the new Athlon 64 FX-51 push its results even higher. The top Pentium 4 and Pentium 4 Extreme Edition processors can’t boast anything spectacular here.

Performance in Professional OpenGL Applications

Here we are going to check the performance of our test platforms in CINEMA4D viewports and in AutoCAD 2002:

When the shadowing algorithm built into CINEMA4D is involved, Athlon 64 FX-51 takes the lead. It is probably the high computing power of this processor that helps it show such great results. As for the second test, where the major work is done by OpenGL driver, Pentium 4 and Pentium 4 Extreme Edition are the best.

The testing carried out in AutoCAD 2002 with the help of Cadalyst C2001 shows that Athlon 64 FX-51 performs faster than its rival in this popular computer-aided design package in all cases except those involving 3D visualization. Our earlier supposition about better optimization of NVIDIA drivers for Pentium 4 processors finds another piece of evidence here.

Windows XP 64-Bit Edition: First Look

Although no 64bit Windows XP with AMD64 support has been released yet, it doesn’t mean that this version is never going to come out. Microsoft doesn’t mention any exact time frame for this operating system version, but they do work on it. At present there are pre-beta releases of Windows XP 64-Bit Edition available. We managed to get one of these builds, numbered 3790, during our Athlon 64 FX-51 test session.

Please note the words “Physical Address Extension” in the properties window of this version. This phrase indicates that the operating system can work with addresses longer than 32 bits on the current CPU, which is one of the distinguishing features of this build. It is also quite possible that Windows XP 64-Bit Edition is based on the Windows 2003 core and not on the classical XP version. In favor of this theory, the system version for AMD64 processors, which should also work in dual-processor configurations, is expected to support NUMA (Non-Uniform Memory Access), and this technology first appeared in Windows 2003 Server.

The impressions made by the Windows XP 64-Bit Edition build we have seen are more than nice. At first glance the system works faultlessly. It includes two applications ported to the x86-64 instruction set: Internet Explorer 6.0 and Outlook Express 6. Other applications are currently shipped in their standard 32bit versions.

One of the strengths of the AMD64 architecture is the backward compatibility of CPUs working in 64bit mode with 32bit code. This way, regular 32bit applications run without any problems in Windows XP 64-Bit Edition. However, as far as drivers are concerned, the situation is somewhat different. The system can’t work with the old 32bit drivers, and the manufacturers don’t seem to be in a hurry to develop 64bit driver versions for AMD64. By now we managed to get only NVIDIA’s beta drivers for Windows XP 64-Bit Edition.

However, the whole point of the 64bit Windows version is that you can run not only standard 32bit applications, which also work in the regular Windows XP, but also specially written programs that take real advantage of all AMD64 processor features, such as 64bit addressing and all 16 64bit general purpose registers. By using these resources, programs ported to Windows XP 64-Bit Edition should theoretically run faster than under a standard Windows XP version. Unfortunately, it is still too early to talk about the efficiency of applications for Windows XP 64-Bit Edition. A few software developers have expressed their desire to release 64bit versions of their programs; however, that requires a sizable installed base of AMD64 based processors running 64bit operating systems first. So far there is no official version of the 64bit operating system. Moreover, CPUs with AMD64 architecture are only starting to penetrate the market. As a result, the processing speed of 32bit applications in the new 64bit OS is the second most important thing after the actual arrival of Windows XP 64-Bit Edition.

In this respect, we decided to try figuring out how 64bit and 32bit applications work in Windows XP 64-Bit Edition. Since there were no fully-fledged 64bit applications and benchmarks to test Athlon 64 FX-51 with 64bit Windows code, we had to resort to synthetic tests. We used three small utilities compiled for both 64bit and 32bit Windows versions. The results table below labels them Minigzip, RSA and DivX.

The source code of all three benchmarks was optimized for AMD64 architecture beforehand. Therefore, the results you are going to see represent a kind of ideal case. Nevertheless, they still allow us to evaluate how big the maximum performance boost can be in a system with an Athlon 64 or Athlon 64 FX running programs recompiled for the x86-64 instruction set. Also note that the algorithms behind all these benchmarks do not work with large data sets. That is why you shouldn’t draw any conclusions about the overall performance of the 64bit AMD processors in a 64bit Windows operating system.

So, the table below sums up the results of the three above mentioned benchmarks:

 

| Benchmark | Test | Win32, 32-bit .exe | Win64, 32-bit .exe | Win64, 64-bit .exe |
|---|---|---|---|---|
| Minigzip | Zip, sec | 9.6 | 9.6 | 4.3 |
| Minigzip | Unzip, sec | 0.7 | 0.7 | 0.3 |
| RSA | AES-128 Encrypt, sec | 3.5 | 3.5 | 2 |
| RSA | AES-128 Decrypt, sec | 3.6 | 3.5 | 2.5 |
| RSA | Triple-DES Encrypt, sec | 6.9 | 7 | 6.6 |
| RSA | Triple-DES Decrypt, sec | 6.9 | 7 | 6.6 |
| RSA | RC4 Encrypt, sec | 2.3 | 2.3 | 2 |
| RSA | RC4 Decrypt, sec | 2.3 | 2.4 | 2 |
| RSA | RSA Encrypt (key size = 4096, number of primes = 2), sec | 4.6 | 4.5 | 1.4 |
| RSA | RSA Decrypt (key size = 4096, number of primes = 2), sec | 14.1 | 14 | 3.1 |
| RSA | SHA-1 Digest, sec | 3.8 | 3.8 | 3.5 |
| DivX | DivX, sec | 8.9 | 9.1 | 7.6 |

As we see, the performance may grow quite tangibly as soon as we start using the advantages of the 64bit architecture of the new AMD CPUs. In certain cryptographic tasks a simple recompilation speeds things up not by a few percent but several times over. Moreover, as we see, 32bit code does not run any slower on a 64bit system, which is exactly one of the advantages of x86-64.

However, we stressed for a reason that we are talking about performance in applications with minimal memory addressing operations. The thing is that the conversion of the 32bit addresses used by 32bit applications may reduce performance in a 64bit system. To prove this point we would like to offer you the results obtained in the Stream benchmark compiled for the 32bit as well as for the 64bit Windows XP version:
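To put numbers on this, the sketch below divides the time of the 32bit executable under Win64 by the time of the 64bit executable for a selection of rows from the table above. The dictionary layout and the shortened row labels are our own; the times are the ones measured in the article.

```python
# Times in seconds from the table above:
# (Win64 with 32-bit .exe, Win64 with 64-bit .exe).
results = {
    "Minigzip zip": (9.6, 4.3),
    "Minigzip unzip": (0.7, 0.3),
    "AES-128 encrypt": (3.5, 2.0),
    "AES-128 decrypt": (3.5, 2.5),
    "Triple-DES encrypt": (7.0, 6.6),
    "RC4 encrypt": (2.3, 2.0),
    "RSA encrypt (4096)": (4.5, 1.4),
    "RSA decrypt (4096)": (14.0, 3.1),
    "SHA-1 digest": (3.8, 3.5),
    "DivX": (9.1, 7.6),
}


def speedup(t32, t64):
    """Speedup factor of the 64-bit build over the 32-bit build."""
    return t32 / t64


speedups = {name: speedup(t32, t64) for name, (t32, t64) in results.items()}

# Print the tests from the largest to the smallest speedup.
for name, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {factor:.2f}x")
```

RSA with a 4096-bit key benefits the most, running more than four times faster when decrypting, while Triple-DES gains only a few percent.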

 

| Stream test | Win32, 32-bit .exe | Win64, 32-bit .exe | Win64, 64-bit .exe |
|---|---|---|---|
| Copy (MB/s) | 5453 | 5342 | 5299 |
| Scale (MB/s) | 5439 | 5370 | 5395 |
| Add (MB/s) | 5319 | 5253 | 5304 |
| Triad (MB/s) | 5354 | 5277 | 5298 |

And this is where a really unpleasant surprise is waiting for us: 64bit addressing slows memory operations down a little, and in three of the four tests running 32bit code under 64bit Windows XP slows them down even more. Still, the performance reduction in the synthetic Stream test can hardly be regarded as critical: for 32bit code the memory speed difference between the 32bit OS and the 64bit OS is 2% at most. But this is just the beginning.

Running regular rather than synthetic 32bit applications in Windows XP 64Bit Edition requires some extra work from the operating system to interact properly with the 64bit drivers and the 64bit environment. The examples are already here. Just take a look at the results of our standard 32bit benchmarks run in Windows XP 64Bit Edition:
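The size of the Stream penalty is easy to check. The following sketch expresses each Win64 result as a percentage loss against the 32bit executable in 32bit Windows XP. The bandwidth figures come from the Stream table above; the helper `loss_percent` and the variable names are ours.

```python
# Stream bandwidth in MB/s from the table above:
# (Win32 32-bit .exe, Win64 32-bit .exe, Win64 64-bit .exe).
stream = {
    "Copy": (5453, 5342, 5299),
    "Scale": (5439, 5370, 5395),
    "Add": (5319, 5253, 5304),
    "Triad": (5354, 5277, 5298),
}


def loss_percent(baseline, value):
    """Bandwidth lost relative to the baseline, in percent."""
    return (1.0 - value / baseline) * 100.0


# For each test: (loss of 32-bit code on Win64, loss of 64-bit code on Win64).
losses = {
    test: (loss_percent(w32, w64_32), loss_percent(w32, w64_64))
    for test, (w32, w64_32, w64_64) in stream.items()
}

for test, (loss_32, loss_64) in losses.items():
    print(f"{test:6s} 32-bit on Win64: -{loss_32:.1f}%   64-bit on Win64: -{loss_64:.1f}%")
```

Every loss stays below 3%. Copy is the one test where the 64bit build, rather than the 32bit build under Win64, comes out slowest.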

 

| Benchmark | Win32, 32-bit .exe | Win64, 32-bit .exe |
|---|---|---|
| ScienceMark 2.0, Molecular Dynamics Benchmark, sec | 79.77 | 84.73 |
| ScienceMark 2.0, Primordia, sec | 378 | 385.67 |
| ScienceMark 2.0, Cipher, sec | 12.19 | 12.52 |
| 3ds max 5.1, Final Rendering, Underwater, sec | 267 | 273 |
| 3DMark2001 SE, Default | 18912 | 15346 |
| Quake3 (four), High Quality, 1024x768x32 | 451.5 | 75.1 |
| Unreal Tournament 2003 (dm-antalus), 1024x768x32 | 86.38 | 77.32 |

Well, this is very sad, but 32bit applications work slower in the 64bit Windows XP version. We could blame NVIDIA for the poor 3D performance, because they haven’t yet finalized their graphics driver for the Win64 platform (the performance hit in Quake3 proves this clearly). However, the ScienceMark 2.0 results do not depend on the drivers at all, and we still see a performance reduction of up to about 6% when we shift to Windows XP 64Bit Edition.

If this drawback is not eliminated in the upcoming versions of Windows XP 64Bit Edition, then the 64bit mode of the new AMD CPUs, as well as Windows XP 64Bit Edition itself, will have a really hard time. Users will not shift to this platform without a serious reason, such as faster 64bit versions of applications where the performance difference is really critical, for example 3D games or multimedia file processing software.
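The driver-independent slowdown can be quantified directly from the time-based rows of the table above. This is a minimal sketch with our own variable names; it only covers the tests measured in seconds, where a larger number means a slower run.

```python
# Run times in seconds from the table above:
# (32-bit .exe on Win32, the same 32-bit .exe on Win64).
runs = {
    "ScienceMark 2.0, Molecular Dynamics": (79.77, 84.73),
    "ScienceMark 2.0, Primordia": (378.0, 385.67),
    "ScienceMark 2.0, Cipher": (12.19, 12.52),
    "3ds max 5.1, Underwater": (267.0, 273.0),
}

# Extra time needed on Win64, as a percentage of the Win32 time.
slowdown = {
    name: (t_win64 / t_win32 - 1.0) * 100.0
    for name, (t_win32, t_win64) in runs.items()
}

for name, pct in slowdown.items():
    print(f"{name:38s} +{pct:.1f}%")
```

The Molecular Dynamics test is the worst case at a little over 6%, while the other three tests lose between 2% and 3%.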

Conclusion

Well, it is pretty hard to sum up all our conclusions in a few words. Nevertheless, we will give it a try. Both major processor manufacturers, Intel and AMD, decided to introduce new processor families for enthusiasts, first of all hardcore gamers. Now those of you who belong to this category will have to choose between AMD Athlon 64 FX and Intel Pentium 4 Extreme Edition.

As for today’s newcomer, the freshly announced Athlon 64 FX-51 is the first CPU with AMD64 architecture for desktop systems. Being in fact an analogue of the Opteron processor with 2.2GHz core frequency, AMD Athlon 64 FX-51 performs very fast and in most benchmarks defeats Pentium 4 3.2GHz, which used to be the fastest desktop CPU until today. With a dual-channel memory controller, a total of 1152KB of cache memory and SSE2 support, Athlon 64 FX-51 has become much faster than its predecessor, Athlon XP 3200+.

Moreover, we should also mention the AMD64 technology implemented in the new AMD processor. The use of 64bit applications, which haven’t come to the market yet, should theoretically make AMD platforms much more attractive. In this respect we can only hope that Microsoft and software developers will do a great job here, as the 64bit mode of the new AMD processors has really impressive potential.

Intel responded to the Athlon 64 FX-51 announcement with the launch of the Pentium 4 Extreme Edition CPU working at 3.2GHz and featuring 2MB of L3 cache. This relatively simple move allowed Intel to improve its CPU performance by as much as 15% in some real applications, even though the average performance boost was only 3.5%. The L3 cache proved most efficient in gaming applications, which once again confirms that this processor is positioned primarily for the gaming market. The new Pentium 4 Extreme Edition 3.2GHz CPU, which should actually start selling in mass quantities only within the next month or two, has every chance of giving Athlon 64 FX-51 some cause for concern, especially since Hyper-Threading technology, which is supported by Pentium 4 Extreme Edition and has huge potential for further performance increase, will very soon find its way into games.

So far the situation in the market looks as follows. Intel’s new CPU copes better than its AMD rival with streaming data processing and multimedia file encoding. It also appears quite efficient in multi-threaded tasks, such as 3ds max or Photoshop. The newcomer from AMD, however, proved really fast in scientific tasks and office applications. With some additional allowances, we can also state its lead over Pentium 4 Extreme Edition in contemporary 3D games.

In conclusion I would like to say a few words about the further prospects of the new Athlon 64 FX and Pentium 4 Extreme Edition processor families. Both product lines are currently manufactured with a 0.13micron production process (AMD also uses SOI). These production technologies have already exhausted their frequency potential, which we clearly see from the overclocking results. With Vcore raised by 0.1V on both CPUs, the maximum frequency we managed to squeeze out of Pentium 4 Extreme Edition was 3.6GHz, while Athlon 64 FX-51 stopped just short of 2.4GHz. It means that all upcoming processor models in these product families will have to be made with the new 90nm technology. Therefore, the further success of the new Pentium 4 Extreme Edition and Athlon 64 FX will depend a lot on a successful transition to 90nm technology.