To tell the truth, server platforms have always been hard for us to test. Modeling typical workloads is a real brain-teaser even for a modest-sized server. First of all, server applications differ greatly and serve diverse purposes. Another difficulty is emulating an appropriate network load. That's why our web-site has so few articles devoted to multi-processor server testing: every time we have to summon all our wits and courage to write one. But as you know, there is an exception to every rule. In our case it was the new dual-processor platform from AMD based on AMD 760MP core logic and Athlon MP CPUs. There are certain reasons why we couldn't help taking a closer look at it. Of course, during our investigation we tried not only purely server tasks, but also tried to figure out whether this platform could work as a dual-processor workstation. As it turned out, that was quite enough to grasp the idea of the innovative approach AMD implemented in its new SMP system.
The main reason that encouraged us to test an AMD 760MP based platform was the drastic difference between the SMP systems from AMD and the corresponding solutions from Intel. The EV6 system bus, licensed from Alpha and introduced in Athlon CPUs, allowed AMD to create a dual-processor system with a point-to-point topology between the North Bridge and the CPUs. Systems built on Intel CPUs, by contrast, leave the system bus shared between the processors. This organization forces the processors to contend for the bus, a problem AMD systems are free from. That is why Athlon based dual-processor systems look much more promising and simply more logical (at least from the theoretical point of view). A bit later we'll see how things stand in practice.
At this stage we think it necessary to supply a brief preliminary summary. In this article we'll touch upon such points as:
- The new Athlon MP CPU on Palomino core and its application in SMP systems;
- AMD 760MP core logic and MOESI protocol.
Alongside the theoretical issues, we'll tackle the practical application of Dual Socket A systems.
Athlon MP CPU
Two years after the arrival of the first, highly successful Athlon family, AMD is migrating to a fourth processor core, code-named Palomino. The very first CPU die, K7, was manufactured with the 0.25micron technology. Its follower, K75, migrated to the 0.18micron copper interconnect technology. With the next core, Thunderbird, Athlon got an on-die L2 cache and a new Socket A package. At last, Palomino arrived… AMD designed the new processor die because it had to keep increasing clock frequencies before moving to the finer 0.13micron technology. Since the Thunderbird core tops out at 1.4GHz-1.5GHz, AMD needed a "link core" that would cover the 1.5GHz-2GHz gap. This is how Palomino came into being. In fact, AMD developers simply redesigned the old Thunderbird core, reducing the CPU heat dissipation by 20%. They also gave Palomino a number of new functions, though you shouldn't overestimate them.
At first, Palomino found its place in AMD mobile CPU line, Athlon 4. Then, the core was introduced in the server Athlon MP CPUs, which are the key figures of this review. Let us have a closer look at Athlon MP characteristics:
- The CPU die is code-named Palomino. It is manufactured with the 0.18micron copper interconnect technology at Fab30 factory in Dresden.
- L1 cache is 128KB (64KB for data and 64KB for instructions); exclusive 256KB L2 cache is integrated into the core and works at the full core frequency.
- The existing CPUs are clocked at 1GHz and 1.2GHz; 1.4GHz and 1.53GHz models are due in mid-October.
- EV6 system bus works at 266MHz. The physical interface is Socket A.
- SmartMP technology support (dual-processor configurations).
- 3DNow! Professional instructions support (107 SIMD instructions).
- The core surface covers 128sq.mm. It comprises 37.5mln transistors.

As you've read from the specs, Athlon MP core is slightly bigger than the regular Thunderbird in Athlon CPUs. Where do the extra half a million transistors (or 8sq.mm) come from? Part of the answer is to be found in the specs.
Being built on the Palomino core, Athlon MP CPUs support 3DNow! Professional instructions. The older Thunderbird based CPUs support a somewhat smaller set of 3DNow! instructions. The new, larger set of SIMD instructions with the "Professional" suffix gained 52 new instructions, which make 3DNow! Professional compatible with the SSE instructions supported by Pentium III. It means that Athlon MP works with applications optimized for 3DNow! as well as with those using SSE instructions. Unfortunately, AMD did not give Palomino support for the SSE2 instructions implemented in Intel Pentium 4 CPUs. They'll probably appear in the upcoming cores from AMD.
Besides a number of evident changes, Palomino boasts some deeper-rooted innovations. One of them is a data prefetch mechanism. The idea is pretty simple: the CPU tries to predict which data from the main memory it may need in the immediate future and puts this data into the cache. If the mechanism guesses right, data processing goes faster. This way, Athlon MP CPUs try to use the processor and memory buses more evenly, softening the peaks and putting the buses to work when they would otherwise stay idle. As a result, the data prefetch mechanism may appreciably increase the performance in applications that work with sequential data streams, such as video processing, rendering or database processing.
Furthermore, Palomino features enlarged Translation Look-aside Buffers (TLB). They serve to cache translated physical memory addresses. The CPU needs translation when it addresses any kind of data from the main memory, that's why address caching helps to substantially lessen the time lag between the moment the CPU requests the data and the moment it is received. According to the official information from AMD, the enlarged TLB, which increases the probability of a prompter return of the translated address, is likely to boost the performance in high-end software applications.
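To make the effect of a larger TLB more tangible, here is a minimal sketch of a toy TLB model. It is not a model of Palomino's actual TLB: the 4KB page size, the LRU replacement policy, the entry counts (32 and 64) and the workload are all assumptions chosen purely for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical for this sketch)


def tlb_hit_rate(addresses, entries):
    """Simulate a fully associative LRU TLB and return its hit rate."""
    tlb = OrderedDict()  # virtual page number -> stand-in physical frame
    hits = 0
    for addr in addresses:
        page = addr // PAGE_SIZE
        if page in tlb:
            hits += 1
            tlb.move_to_end(page)        # mark as most recently used
        else:
            if len(tlb) >= entries:
                tlb.popitem(last=False)  # evict the least recently used entry
            tlb[page] = page             # identity mapping stands in for a page-table walk
    return hits / len(addresses)


# Hypothetical workload: sweep over 48 distinct pages, 100 times in a row.
workload = [p * PAGE_SIZE for _ in range(100) for p in range(48)]

small = tlb_hit_rate(workload, entries=32)  # working set does not fit
large = tlb_hit_rate(workload, entries=64)  # working set fits

print(f"32-entry TLB hit rate: {small:.2f}")
print(f"64-entry TLB hit rate: {large:.2f}")
```

In this contrived case the smaller buffer thrashes while the larger one misses only on the first sweep. Real workloads are far less clear-cut, but the sketch shows why an application whose working set slightly exceeds the TLB can gain from a larger one.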
Last but not least among the innovations introduced in Palomino, and also present in Athlon MP CPUs, is a thermal diode integrated into the core. It allows more accurate monitoring of the CPU's physical state, thus helping to prevent overheating (which matters a lot for servers). Unfortunately, the diode doesn't always work correctly, as you may have recently seen in an article by Dr. Thomas Pabst.
Our interim conclusion is that Athlon MP CPUs possess no specific server features. In this respect Athlon MP is similar to the mobile Athlon 4 or the desktop Athlon XP CPU. Accordingly, it's not the CPU that shapes the server orientation of AMD 760MP based platforms, but…
AMD 760MP Chipset
The key technologies that make Dual Socket A platforms real server platforms are concentrated in AMD 760MP core logic. Though the chipset's name is close to that of AMD 760, AMD 760MP has nothing to do with it. There are two important functions of AMD 760MP, which contribute a lot to the performance of dual-processor systems built on this chipset. These are a system bus with point-to-point topology and MOESI protocol support. Both these technologies are in a way unique, and dual-processor systems based on Intel CPUs don't support them. On the one hand, they make AMD 760MP chipset much more complex. On the other hand, they bring about a discernible performance gain when working with data intensively.
Unlike SMP systems built on Intel processors, where both CPUs are connected to the same FSB and have to share its bandwidth, in SMP systems based on the AMD 760MP chipset and Athlon MP processors each CPU has its own bus. With this point-to-point pattern, each CPU is connected to the host (chipset) individually and can transfer up to 2.1GB of data per second (the FSB clock frequency is 266MHz) regardless of what the other CPU is busy with at the moment. Together, the two CPUs thus have an aggregate system bus bandwidth of 4.2GB/sec. Let us remind you that in SMP systems based on Pentium III CPUs the FSB bandwidth shared between the two processors is only 1.06GB/sec. Xeon based systems enjoy 3.2GB/sec, which is still less than AMD's 4.2GB/sec.
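The bandwidth figures above follow from simple arithmetic: peak bandwidth is the number of transfers per second multiplied by the bus width. The sketch below reproduces them under two assumptions not spelled out in the text: that every bus has a 64bit (8-byte) data path, and that the Pentium III and Xeon buses run at effective rates of 133MHz and 400MHz respectively.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, width_bytes: int = 8) -> float:
    """Peak bus bandwidth in GB/s (1GB = 10**9 bytes).

    width_bytes defaults to 8, i.e. an assumed 64bit data path.
    """
    return mega_transfers_per_s * 1e6 * width_bytes / 1e9


athlon_mp_bus = peak_bandwidth_gb_s(266)    # one EV6 bus per CPU
dual_athlon_total = 2 * athlon_mp_bus       # two independent point-to-point buses
pentium3_shared = peak_bandwidth_gb_s(133)  # a single FSB shared by both CPUs (assumed 133MHz)
xeon_shared = peak_bandwidth_gb_s(400)      # a single shared FSB (assumed 400MHz effective)

print(f"Athlon MP, per CPU:    {athlon_mp_bus:.2f} GB/s")
print(f"Dual Athlon MP, total: {dual_athlon_total:.2f} GB/s")
print(f"Dual Pentium III:      {pentium3_shared:.2f} GB/s")
print(f"Dual Xeon:             {xeon_shared:.2f} GB/s")
```

With the nominal 266MHz figure the per-CPU result comes out slightly above 2.1GB/sec; the article quotes it as 2.1GB/sec per bus and 4.2GB/sec for the pair.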
The point-to-point CPU connection pattern naturally has some drawbacks. First of all, it is really hard to implement, because a separate bus has to be laid out for each CPU. However, as we see, AMD managed to solve this problem, and AMD 760MP is perfect evidence of that. Nonetheless, the AMD 762 North Bridge of AMD 760MP is amazingly complicated. Just compare the 949 pins of this chip with the 569 pins of the regular AMD 761 North Bridge from the AMD 760 chipset, and you'll realize what we mean.
It's natural to wonder whether each of the processors in SMP Athlon systems really needs an individual bus. Indeed, the bandwidth of the memory bus, which both CPUs address, is only 2.1GB/sec (if the system is equipped with PC2100 DDR SDRAM). Why then? AMD has a decisive argument. Put simply, CPUs in Dual Socket A systems can exchange data without going through RAM, which is absolutely impossible in SMP systems built on Intel processors.
In order to illustrate the theory above, let us take a look at how the problem of cache coherency is solved in dual-processor systems from Intel and AMD. We guess there is no need to explain what a headache it is for the developers to keep the data in the CPU caches identical when both processors deal with the same data array. In particular, if the same data is stored in the caches of both CPUs and one of the processors modifies it, the other processor's cache should be refreshed before the system permits the second CPU to use this data. To show the strengths of AMD's solution, we'll start by pointing out the shortcomings of Intel's.
In SMP systems based on Pentium III or Pentium 4 CPUs with a shared system bus, each processor monitors the system bus to detect whether the other CPU has requested data it holds. If the other CPU requests data that was modified by the first CPU and is stored only in the first CPU's cache, the first processor promptly writes this data back to main memory, and only after that does the chipset transfer the data to the second CPU. This way, much more time is spent on to-and-fro data transfer and the system bus gets loaded more heavily. Unfortunately, there is no other way out, since the bus is shared between the two CPUs of the system.
To control data validity in the memory and CPU caches of its dual-processor systems, Intel uses the MESI protocol. MESI stands for four words denoting the state of a cache line: Modified, Exclusive, Shared and Invalid. Each line stored in the processor caches is in one of these four states:
- Exclusive
  - Data in the corresponding cache line is the same as in the memory.
  - The other CPU doesn't have this data in its cache.
- Modified
  - Data in the cache line was modified.
  - The other CPU doesn't have this data in its cache.
- Shared
  - Data stored in both caches is identical.
- Invalid
  - Data stored in the cache is invalid.
The MESI protocol relieves the bus somewhat, because the transfer of data from one processor's cache to the memory and back into the other processor's cache can be skipped in every state except Modified.
AMD decided to take a different route and do away with all unnecessary data transfers between the caches via the main memory. This could be implemented without much trouble, since each CPU of the dual-processor system has its own bus and hence can transfer data directly from one processor cache to the other, bypassing the memory. For this purpose Dual Athlon MP systems acquired the smarter MOESI protocol borrowed from Sun and Alpha architectures.
The MOESI protocol features one more state - Owned. It comes into play when a cache line in the first processor's cache is marked as "Modified" and the other CPU requests this line. In this case, the data is transferred directly from the first CPU's cache to the second one via the North Bridge. The line in the first processor's cache is then marked as "Owned" and the copy received by the other CPU as "Shared". Note that the processor holding the Owned line is responsible for updating the data stored in the memory, unless the second CPU modifies the data in its own copy of the line.
This way the MOESI protocol takes a lot of load off the memory bus and removes the apparent imbalance in Dual Socket A systems, where two CPU buses with 2.1GB/sec bandwidth each work with one memory bus of the same bandwidth.
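The difference between the two protocols in this one scenario can be sketched in a few lines. The model below is a deliberately simplified illustration, not a description of how the AMD 762 or any Intel chipset actually works: it tracks a single cache line in two CPUs, ignores the cost of fetching a line before a write, and the class and counter names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TwoCpuLine:
    """One cache line tracked in two CPU caches under MESI or MOESI."""
    protocol: str                    # "MESI" or "MOESI"
    states: list = field(default_factory=lambda: ["Invalid", "Invalid"])
    memory_writes: int = 0           # write-backs of the line to main memory
    cache_to_cache: int = 0          # direct transfers between the caches

    def write(self, cpu: int) -> None:
        """The CPU modifies the line; the other copy, if any, is invalidated."""
        # Simplification: fetching the line before the write is not modeled.
        self.states[1 - cpu] = "Invalid"
        self.states[cpu] = "Modified"

    def read(self, cpu: int) -> None:
        """The CPU reads the line, fetching it if its own copy is Invalid."""
        other = 1 - cpu
        if self.states[cpu] != "Invalid":
            return                               # hit in the local cache
        holder = self.states[other]
        if holder == "Modified":
            if self.protocol == "MOESI":
                self.cache_to_cache += 1         # dirty data goes cache to cache
                self.states[other] = "Owned"     # owner updates memory later
            else:
                self.memory_writes += 1          # dirty data is written back first
                self.states[other] = "Shared"
            self.states[cpu] = "Shared"
        elif holder == "Owned":
            self.cache_to_cache += 1             # owner supplies the data again
            self.states[cpu] = "Shared"
        elif holder in ("Exclusive", "Shared"):
            self.states[other] = "Shared"
            self.states[cpu] = "Shared"
        else:
            self.states[cpu] = "Exclusive"       # nobody else holds the line


def run(protocol: str) -> TwoCpuLine:
    """CPU0 modifies a line, then CPU1 asks for the same line."""
    line = TwoCpuLine(protocol)
    line.write(0)
    line.read(1)
    return line


mesi, moesi = run("MESI"), run("MOESI")
print("MESI: ", mesi.states, "memory writes:", mesi.memory_writes)
print("MOESI:", moesi.states, "memory writes:", moesi.memory_writes,
      "cache-to-cache:", moesi.cache_to_cache)
```

In this toy run the MESI variant pays for one write-back to memory before the second CPU gets its data, while the MOESI variant serves the request with a single cache-to-cache transfer and postpones the memory update.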
Now that we've told you about the architectural tricks implemented in dual-Athlon platforms, let us pass over to describing AMD 760MP core logic features. This chipset is designed in a traditional way, i.e. comprises two chips. It includes AMD 762 North Bridge and AMD 766 South Bridge, which we have already met in AMD 760 chipset. The Bridges are connected via PCI bus.
The North Bridge of AMD 760MP supports two Socket A processors with 200MHz or 266MHz FSB. Though only Athlon MP CPUs are marketed as server solutions, all the Socket A processors from AMD (i.e., both Athlon and Duron families) can work in dual-processor systems. The memory controller integrated into the North Bridge supports 266MHz or 200MHz DDR SDRAM with ECC. The memory frequency in AMD 762 is synchronized with the system bus frequency. AMD 760MP systems require Registered DIMMs, but allow up to 4GB of RAM, which is the maximum for 32bit processors. AMD 762 also supports the AGP 4x bus. It is also worth mentioning that the chipset's North Bridge has an integrated PCI controller, which supports up to 7 PCI Bus Master devices working at 33MHz and up to 2 PCI Bus Master devices working in 66MHz mode. However, with the AMD 766 South Bridge the PCI bus works only in 33MHz mode, which automatically rules out 66MHz PCI devices. The case is not hopeless, however: mainboard makers can still lay out 64bit 33MHz slots on their AMD 760MP based boards.

For this reason AMD is about to launch a new South Bridge for AMD 762 - AMD 768. The chipset comprising both chips will be called AMD 760MPX and will sport more advanced architecture. AMD 768 will be connected to the North Bridge via 64bit 66MHz PCI bus, so the chipset PCI controller will work in 66MHz mode as well. A PCI controller for regular 32bit 33MHz devices will be implemented in AMD 768 South Bridge. The chart below depicts it all pretty clearly:

Note that AMD will be the only core logic manufacturer to offer solutions for Dual Socket A configurations. At first, VIA used to contemplate creating an analog to AMD 760MP based on KT266 chipset, but then gave up this idea. Unfortunately, it turned out too hard to develop a core logic supporting point-to-point topology. So, today only AMD is experienced enough to cope with a task like that.
Mainboards
Until AMD starts shipping AMD 760MPX with the new South Bridge, the AMD 762 North Bridge is nothing more than a pilot product. It means that so far there is just one manufacturer AMD has chosen to produce mainboards for Dual Athlon MP systems. This is Tyan, a well-known name on the server market. In November, when AMD starts mass production of the new AMD 768 chips, AMD 760MPX core logic will become available to everyone. Among the manufacturers who have already announced their intention to launch Dual Socket A mainboards by the end of the year are MSI, Gigabyte, ABIT and ASUS.
As for the mainboards available today, there are only two dual-processor Socket A solutions. These are AMD 760MP based Tyan Thunder K7, which is designed for rack mount servers, and a simpler version called Tyan Tiger MP.
Tyan Thunder K7 (S2462) follows the AMD reference design. It is equipped with two Socket A connectors for AMD CPUs with 200MHz and 266MHz bus, one AGP Pro slot for graphics cards consuming less than 110W of power, and five 64bit 33MHz PCI slots. The DIMM slots for Registered PC2100/PC1600 DDR DIMMs are tilted at 25 degrees. This allows building Tyan Thunder K7 systems that easily fit into the 1U cases often used for web servers. Further proof that the board is well suited for this particular application field is its set of integrated components: two LAN controllers from 3Com, a graphics adapter built on the ATI RageXL chip, and a dual-channel Ultra160 SCSI controller. Thus, Thunder K7 has the complete set of integrated controllers a full-fledged web server needs.
There exists a modification of this mainboard without an integrated SCSI controller.
By the way, Thunder K7 requires a special power supply unit with a non-standard connector layout. At present there are only two companies shipping PSUs like that: Delta and NMB Technologies. The first produces 450W PSUs, the other one - 460W PSUs matching Thunder K7 specifications.
The characteristics of the second Dual Socket A mainboard from Tyan, Tiger MP (S2460), are much more modest. In fact, this mainboard has no additional integrated components, so Tiger MP is a good choice for low-profile servers and workstations. The board has a regular AGP 4x slot, two 32bit and four 64bit 33MHz PCI slots. Four DIMM slots support up to 3GB PC2100/PC1600 DDR memory (again, they're meant only for Registered DIMM). It's also very important that Tiger MP requires no special power supply units, so it enjoys much wider application sphere.
It goes without saying that neither of these mainboards has any overclocking friendly options.
The current average price of Thunder K7 is $450; its SCSI controller-free modification costs $370, and Tiger MP - $230. When AMD 760MPX based mainboards from other manufacturers appear, they should be priced within the $200-$250 range. This should definitely give a tremendous boost to AMD platforms on the mass market.
Unfortunately, it is still impossible to state that Dual Socket A platforms are ready to conquer the server and workstation market. The first obstacle most users may face is the utterly poor choice of mainboards. Besides, there are some other things preventing AMD 760MP from becoming really popular. As you remember, AMD 760MP platforms support only special PSUs and support neither regular DDR DIMMs nor a fully-fledged 64bit PCI bus. However, all these problems should be settled by the end of this year, when Dual Socket A solutions begin their onslaught on the market.
Testbed and Methods
Having paid due attention to the new technologies AMD implemented in its dual-processor platforms, we suggest finding out what they are worth in real conditions. Although we cannot yet test SMP systems in server applications, it's within our power to see if they are good for workstations.
We tested a dual-processor system on AMD 760MP chipset and with two server Athlon MP CPUs as well as with common Athlon processors on Thunderbird core. Our Dual Socket A sample was built on Tyan Thunder K7 with 512MB Registered PC2100 DDR SDRAM.

We compared the performance of this system with that of single-processor systems on AMD Athlon MP/Athlon CPUs and AMD 760 chipset.
We also included the results for systems with Intel processors: a single-processor system built on Intel 850 core logic and Pentium 4 2GHz and a dual-processor system based on Intel Pentium III-S processor with Tualatin core and Server Works ServerSet III LE chipset.
Unfortunately, we didn't get hold of a dual-processor system on Intel Xeon CPUs, because they are too rare. Anyway, these systems are intended for another price sector and will not compete with Dual Socket A platforms so far.
So, the testbeds assembled look as follows:
| | Athlon MP 1.2 (Dual) | Athlon 1.4 (Dual) | Athlon MP 1.2 (Single) | Athlon 1.4 (Single) | Pentium 4 2.0 | Pentium III-S 1.13 (Dual) |
|---|---|---|---|---|---|---|
| CPU | 2 x AMD Athlon MP 1.2 | 2 x AMD Athlon 1.4 | AMD Athlon MP 1.2 | AMD Athlon 1.4 | Intel Pentium 4 2.0 | 2 x Intel Pentium III-S 1.13 |
| Chipset | AMD 760MP | AMD 760MP | AMD 760 | AMD 760 | Intel 850 | ServerWorks ServerSet III LE |
| Mainboard | Tyan Thunder K7 | Tyan Thunder K7 | EPoX EP-8K7A | EPoX EP-8K7A | ABIT TH7-II | Supermicro SUPER P3TDLE |
| Memory | 512MB PC2100 Registered DDR SDRAM | 512MB PC2100 Registered DDR SDRAM | 512MB PC2100 DDR SDRAM | 512MB PC2100 DDR SDRAM | 512MB PC800 RDRAM | 512MB PC133 Registered SDRAM |
| Videocard | Gigabyte GV-GF3000DF (NVIDIA GeForce3) | Gigabyte GV-GF3000DF (NVIDIA GeForce3) | Gigabyte GV-GF3000DF (NVIDIA GeForce3) | Gigabyte GV-GF3000DF (NVIDIA GeForce3) | Gigabyte GV-GF3000DF (NVIDIA GeForce3) | Matrox Millennium II |
| HDD | IBM DTLA-307015 | IBM DTLA-307015 | IBM DTLA-307015 | IBM DTLA-307015 | IBM DTLA-307015 | IBM DTLA-307015 |
We ran all the tests in Windows 2000 SP2.
Performance
We would like to stress in the very beginning that the main aim of this test session was to find out how AMD processors feel in dual-processor configurations. So, if you're interested in the performance of Athlon MP (Palomino) in single-processor systems, you're welcome to check our Pentium III-S (Tualatin) Review, where we paid keen attention to Athlon MP processors, too.
As we have already mentioned several times in our reviews of different dual-processor mainboards, it makes sense to use dual-processor systems either for applications that run several calculation threads, or under operating systems with SMP support when you have several applications working in parallel. That is why it's pointless to test dual-processor systems in programs without dual-processor support, say in most 3D games. You will merely obtain the same results as with single-processor systems. That was the logic we followed in compiling the test suite. But before we get down to the results our "guinea pigs" showed in real applications, let's cast a glance at a popular synthetic benchmark, SiSoft Sandra 2001.

The algorithm SiSoft Sandra 2001 uses to measure CPU performance depends neither on the chipset nor on the memory used. It can create two calculation threads and shows that, theoretically, dual-processor systems working in ideal conditions can be twice as fast as single-processor ones. As for the systems' relative results in this test, you'd better treat them with a certain skepticism. Sandra is a synthetic benchmark, so its results may not correspond to the systems' performance in real applications.
Nevertheless, we have to note that according to this benchmark, the systems with Athlon 1.4GHz CPUs prove faster than those with Athlon MP 1.2GHz CPUs. It means that Thunderbird and Palomino cores do not differ too greatly in performance. Pentium 4 2GHz turns out to be close to the single-processor system on Athlon 1.4GHz, while the dual-processor system on Pentium III-S 1.13GHz almost catches up with the dual Athlon MP one.
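The "twice as fast in ideal conditions" figure, and the reason real applications rarely reach it, can be illustrated with Amdahl's law. This is our own illustration rather than anything the benchmark reports, and the parallel fractions used below are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cpus: int) -> float:
    """Amdahl's law: speedup when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# Hypothetical parallel fractions, from a fully threaded synthetic test
# down to an application with no SMP support at all.
for fraction in (1.0, 0.9, 0.5, 0.0):
    print(f"parallel part {fraction:4.0%}: "
          f"{amdahl_speedup(fraction, cpus=2):.2f}x on two CPUs")
```

Only a task that is parallel from start to finish gets the full 2x; a program that spends half of its time in single-threaded code gains about a third, and one without SMP support gains nothing.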

Another test from SiSoft Sandra 2001 is a lot more informative. It measures the real bandwidth of the memory subsystem. The first thing to stress is the high result of the Pentium 4 system. It is based on the Intel 850 chipset with RDRAM, whose peak bandwidth of 3.2GB/sec secures its victory over all the other participants. We would also like to mention the relatively low performance of the dual Pentium III-S system built on ServerWorks ServerSet III LE core logic. However, taking into account that it's the only system with PC133 SDRAM, which tops out at 1.06GB/sec, this result no longer appears that surprising.
Special attention should be paid to the real bus bandwidths of the single- and dual-processor Socket A systems. As you can see, adding a second CPU to the single-processor system leads to a good 25% increase in the real bandwidth. Since the CPUs and the chipset in AMD 760MP based systems are connected point-to-point, the extra CPU bus helps to load the memory bus more evenly.
Now we'll check the performance of the dual-processor system from AMD in office and content creation applications. As usual, we'll resort to the Winstone testing package. Please note that dual-processor systems are unlikely to enjoy a huge performance increase here: most of the applications in Business Winstone 2001 and Content Creation Winstone 2001 don't have dual-processor support. However, Winstone 2001 emulates real working conditions in these applications, launching some of them in parallel and switching between them from time to time. Therefore, dual-processor systems can still appear a bit faster in some cases.

In this test all the systems run nearly neck and neck.

Content Creation Winstone 2001 indicates a more obvious difference in the performance of the single- and dual-processor systems. No wonder - some applications of this benchmark (for example, Adobe Photoshop 5.5) can create a number of calculation threads working in parallel.
We point out time and again that content creation applications are awfully sensitive to the memory bus bandwidth. That is why the dual Pentium III-S system with PC133 SDRAM cannot keep up with the other racers.
Unfortunately, eTesting Labs Inc. offers no up-to-date benchmarks for dual-processor workstations. They created a test like that in 1999. This was Dual-Processor Inspection Test from Winstone 99 intended to measure the performance in tasks with several calculation threads.

According to the results, dual-CPU systems work far more efficiently than single-CPU systems in applications supporting dual-processor configurations. By the by, this trend may well change in the near future. You see, contemporary CPUs have no special abilities to support multi-threaded applications. However, Intel R&D guys have been working hard and long in this sphere, so the Northwood core will be provided with appropriate hardware features. Although this technology, code-named Jackson, will be implemented only in server Xeon CPUs at first, some time later it may appear in desktop processors too.
Now it's high time we looked at the systems' performance in applications included in Dual-Processor Inspection Test.

Microstation SE is a typical CAD/Design application where performance is basically determined by two factors: the FPU and the graphics subsystem. That's where Athlon CPUs take the advantage of their fastest FPU to break ahead of all the other systems with Intel processors. However, we should also point out that dual Athlon and Athlon MP systems proved slower than similar single-processor systems in some cases. The reason is pretty simple: the AGP vxd-driver shipped together with AMD 760 and AMD 760MP based systems cannot work with the dual-processor AMD 760MP core logic as efficiently as with its single-processor counterpart.

Surprising as it may seem, the dual Pentium III-S system runs ahead of all in the "ancient" fourth version of Adobe Photoshop. As you remember, this application version doesn't support any SIMD instructions whatsoever.

In Visual C++ 6.0 two projects are compiled in parallel. Again, the dual-CPU system on Tualatin heads the race.

As the latest version of FlaskMPEG has SMP support, we measured how long it took the systems to encode a DVD stream into MPEG-4. Dual Pentium III-S is the winner again, though this time its lead is smaller. Anyway, this is further evidence of how important a large L2 cache is for server processors. The 512KB L2 cache of Pentium III-S makes it possible to take some load off the CPU and memory buses by reducing the number of data transfers.

To assess the performance during final rendering in 3ds max 4, we measured the time needed to render the Anisotropic Wheel scene at 800x600 resolution. So, the shortest time stands for the best result. As you remember, in this application rendering depends directly on the FPU performance. Moreover, 3ds max 4 is very good at distributing calculation threads between several processors. Consequently, dual-processor systems outpace their single-processor rivals considerably in this test, whereas Athlon based systems use their more powerful FPUs to break ahead of the Intel Pentium III-S based system.

We have also found out what these CPUs are capable of in 3ds max 4 test in ViewPorts. For this purpose, we selected three most illustrative benchmarks. In our article dealing with the 3ds max test they were numbered as 1 (general stress test), 4 (geometry visualization) and 12 (wireframe visualization). Unfortunately, we had to exclude the dual Pentium III-S system from this test, because the ServerSet III LE based mainboard we used it with had no AGP slot support.
As a whole, we observe the same tendency as in Microstation SE. The ill-optimized AGP port of the AMD 760MP based system lets the single-processor systems on AMD 760 outrun the dual-processor monsters. Even the computing capacity of the second processor is not enough to help them out. The leading horse here is the system with Intel Pentium 4 2GHz CPU.

To test our systems in the latest version of Adobe Photoshop 6.0.1 (with Pentium 4 patch) we used PSBench script. It launches about 20 various filters and works with 50MB images. Our chart depicts the time spent to get through the whole task, so the shortest time corresponds to the highest performance.
A common scene again: dual Socket A systems demonstrate the best performance. For a more insightful description of the systems working with different filters, we would like to offer you PSBench results in detail:
| Filter | Athlon 1.4 | Athlon 1.4 (Dual) | Athlon MP 1.2 | Athlon MP 1.2 (Dual) | Pentium 4 2.0 | Pentium III-S 1.13 (Dual) |
|---|---|---|---|---|---|---|
| Rotate 90 | 5 | 4.9 | 5.1 | 4.8 | 4.8 | 5.1 |
| Rotate 9 | 11.2 | 10.9 | 10.6 | 10.6 | 12.1 | 12.4 |
| Rotate .9 | 11.4 | 10.8 | 10.4 | 10.3 | 11.2 | 10.7 |
| Gaussian Blur 1 pixel | 6 | 5.4 | 5.4 | 5.1 | 5.5 | 9.6 |
| Gaussian Blur 3.7 pixels | 12.3 | 10.3 | 11.7 | 9 | 10.1 | 11.5 |
| Gaussian Blur 85 pixels | 14.2 | 11.3 | 12.4 | 10.8 | 10.9 | 12.3 |
| 50%. 1 pixel. 0 level Unsharp Mask | 5.2 | 4.5 | 5.5 | 4.1 | 4.7 | 6.4 |
| 50%. 3.7 pixel. 0 level Unsharp Mask | 13 | 10.5 | 11.8 | 9.2 | 10.6 | 11.9 |
| 50%. 10 pixel. 5 level Unsharp Mask | 12.7 | 10.5 | 11.9 | 9.4 | 10.4 | 11.7 |
| Despeckle | 6.8 | 5.8 | 7 | 5.7 | 9.6 | 8.6 |
| RGB-CMYK | 19.5 | 18.4 | 21.2 | 19.9 | 18.3 | 19.1 |
| Reduce Size 60% | 2.8 | 2.4 | 2.8 | 2.4 | 2.1 | 2.2 |
| Lens Flare | 14.1 | 10.6 | 14.5 | 10.6 | 13.1 | 15.3 |
| Color Halftone | 15.7 | 16.5 | 16.4 | 18.4 | 17.5 | 26 |
| NTSC Colors | 7.1 | 6.9 | 8.3 | 7.5 | 7 | 8.9 |
| Accented Edges Brush Strokes | 21.4 | 22.9 | 23.2 | 24.6 | 22.4 | 23.6 |
| Pointillize | 35.4 | 22.7 | 39.7 | 25 | 35.3 | 25 |
| Water Color | 42.3 | 44.4 | 45.4 | 46.7 | 46.4 | 45.8 |
| Polar Coordinates | 25.2 | 16 | 20.7 | 13.6 | 23.4 | 18.7 |
| Radial Blur | 96.6 | 67.2 | 103.3 | 62.5 | 87.2 | 59 |
| Lighting Effects | 7.4 | 6.7 | 7.2 | 7 | 6.9 | 9.1 |
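To see how unevenly the second CPU helps from filter to filter, one can divide the single-processor time by the dual-processor time. The sketch below does this for a handful of Athlon MP 1.2 rows copied from the table above; the selection of filters is ours, and the helper name is invented for the example.

```python
# (single, dual) execution times for Athlon MP 1.2, copied from the PSBench table.
athlon_mp_12 = {
    "Radial Blur": (103.3, 62.5),
    "Pointillize": (39.7, 25.0),
    "Polar Coordinates": (20.7, 13.6),
    "Lens Flare": (14.5, 10.6),
    "Water Color": (45.4, 46.7),
}


def speedup(single: float, dual: float) -> float:
    """How many times faster the dual system finished (above 1.0 is a gain)."""
    return single / dual


speedups = {name: speedup(*times) for name, times in athlon_mp_12.items()}

# Print the filters from the biggest to the smallest gain.
for name, value in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {value:.2f}x")
```

Filters such as Radial Blur come reasonably close to the ideal doubling, while Water Color actually runs slightly slower on the dual system, which suggests that this filter is not multi-threaded.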
The last thing we were keen to find out was how dual Socket A systems would cope with Quake3 - a game that supports SMP mode when r_smp 1 is entered in the console.

The results give us another reason to complain about the ill-implemented AGP support in AMD 760MP. Even with SMP mode enabled, the dual-processor AMD 760MP based systems lag behind their single-processor brothers on AMD 760. We hope the new drivers will improve the situation. If not, hardly anyone will suffer a lot: most applications one runs on dual-processor platforms are indifferent to the AGP bus performance. In this respect, Quake3 is not a typical case.
Conclusion
Though dual-processor systems based on AMD 760MP core logic are just a trial balloon AMD is sending up over the server market, they have a good chance of becoming a real success in this sector. Later this year, when Athlon MP 1.5GHz+ CPUs become available and comparatively cheap mainboards based on the new AMD 760MPX chipset arrive, AMD is likely to push Intel back in the value server market, now flooded with Pentium III and Pentium III-S CPUs, as well as in the performance market, where Intel is introducing its Xeon based systems. We don't risk making any further predictions, knowing that Intel is busy preparing some novelties for the server market. Whatever they are, it's certain that Intel won't escape fierce rivalry. Besides, don't forget that AMD has implemented a whole set of technologies that increase the performance of SMP systems impressively.
In the nearest future we'll see whether AMD will repeat its desktop success on the server market as well. No doubt, the company has gigantic plans. Before it gives birth to a new family of 64bit Hammer processors, AMD wants to launch server Duron CPUs for value SMP systems and to start Athlon MP migration to the 0.13micron technology.



