AMD: Per Aspera Ad Astra

In this analytical piece we forecast the further development of AMD’s processor families, namely Athlon 64 and Opteron. How will AMD react to Intel’s new moves? How will the K8 core develop in the future? Are you curious about all these things? Then read our article and you will find the answers!

by Victor Kartunov
03/31/2004 | 06:54 AM

Some time ago we indulged in a little experiment trying to predict the future behavior of Intel Corporation in our “Where Going, Intel?” article. We had just bare facts and our razor-sharp common sense at our disposal and these were enough to forecast with a certain degree of accuracy the way of the semiconductor giant. Well, no corporation, even the biggest one, will ever live long in this cruel market environment without listening to common sense.

 

Of course, this method lacks precision when we don’t have any information or facts at all, but knowing the style of the corporation we can well imagine where its main blows will land. We can surmise the general direction it is going to take in the near future. We can weigh up the company’s current products with all their pros and cons and suggest the most probable development of the situation. Anyway, it is the future that’s the final judge. It tells us what has been wrong or right in our half-analytical and half-intuitive suppositions. That’s the fate of all Cassandras, especially the ones from the IT industry.

The Good of Being Imaginative

So here are the events we found probable to occur with respect to Intel in our previous article:

  1. We expected that the question of “64 bits” wouldn’t receive a clear answer until the last moment. Well, that’s exactly what we saw. Only the top managers of Intel Corp. knew which variant had been chosen, while rank-and-file employees (as well as the press and the company’s partners) had no idea whether Intel would support 64 bits in desktop processors and, if so, with which instruction set. Now the situation has been clarified – the corporation listened to common sense and chose the AMD64 set (although it gave it another name) – but back then, it was the main intrigue. By the way, we pointed at this variant as the most probable in our previous article.
  2. The new 90nm Prescott core. The rumors about its appearing in February were right. The availability of these processors is questionable (especially with regard to the topmost models). The release was a kind of “paper” one. Today, some time after the announcement, we don’t see an abundance of top-end Prescott-core processors, although testers did receive some samples. The performance didn’t go up, but rather dropped (or remained at the same level, if you wish). The new core doesn’t outperform the high-end Pentium 4 Extreme Edition. This is all as we supposed.
  3. We also supposed that the main battle would be fought on an adjacent field. And really, the announcement of new Pentium 4 platforms has been rescheduled from Q3 to the end of Q2, evidently because of some heroic advances in the development process. We’re now waiting for the LGA775 socket and the PCI Express bus. Intel people are talking about the advantages of the new bus (in the PCI Express x16 variant) for graphics, but the graphics bus will first be working in the half-duplex mode and all the goodies will be postponed until the next generation of chipsets. Moreover, I venture a supposition that Intel must promote the transition to PCI Express x16 and oppose the installation of the AGP 8x slot in new systems, although, as practice suggests, we haven’t seen any performance breakthroughs from the AGP 4x – 8x transition either. That’s not all. The heads of Intel have reminded us recently that it is necessary (I wonder – for whom?) to switch to the new memory standard, DDR2, which will bring you the fantastic performance you’re dreaming of. Memory manufacturers have started reporting the certification of their new memory modules with Intel, glossing over the price. Even the most optimistic forecasts say that DDR2 is going to cost twice as much as DDR, without giving any bandwidth advantages and with higher latencies. DDR2 has only one potential advantage over DDR – it is expected to reach higher frequencies in the future. Bingo! That’s exactly what we’ve been waiting for – the new bus and the new memory are nothing much by themselves, but they are strong trumps in the battle of the marketing departments. Memory is even more important for Intel as it is harder for AMD to implement DDR2 support in its Athlon 64 (the memory controller is integrated into this processor, as you know).
  4. We promised that the Prescott wouldn’t be very widespread for the Socket 478. You can now take a look at Intel’s roadmap: the 3.6GHz model will be available for the Socket LGA775 only.

As you see, there are plenty of matches. We are quite pleased with the accuracy of our fortune-telling and are going to take to it with more vigor. Our last article was mostly concerned with Intel’s possible moves and actions. Now we’ll try to see the future of AMD in our magic mirror.

Let me remind you once again, that we don’t have any insider information. This article only contains our own guesses and opinions about the probable course of future events.

Finding the Philosopher's Stone

AMD has long been searching for the ultimate weapon to fight Intel with. The company has found it – the AMD64 technology. It is the first revolutionary extension to the x86 architecture since the 386 processor and transition to 32-bit calculations. By the way, you may recall that the last transition took as long as 10 years, so we shouldn’t wait for the whole industry to be hasty about recompiling software for the 64 bits – many programs just don’t need that. Anyway, no one will be worse off for the change, while certain categories of tasks have long been in need of a 64-bit architecture. Well, software developers now have this option. It is even more important for AMD that the AMD64 technology is purely of their own development – a hefty plus to the corporation image.

We can also recall one evolutionary extension to the x86 architecture, the MMX technology. All later SIMD extensions – 3DNow!, SSE, 3DNow! Enhanced, SSE2, 3DNow Professional, SSE3 – are just sequels to the original idea of SIMD instructions, for other data types.

We can already say that the industry has accepted the AMD64 technology. Right now (i.e., about a year after the official arrival of AMD64), there are over 1000 companies cooperating with AMD on hardware and software projects, including such giants as IBM, HP, Sun and prominent software developers. Moreover, Microsoft has already offered its Windows XP 64 for Athlon 64 and Opteron for users to taste. You don’t have a great choice of programs to run in this OS yet (besides 32-bit applications, of course), but the release of the operating system is a signal for many software developers to start – no one will write programs for a non-existent OS. Thus, again, AMD64 has found its place under the sun. Now that Microsoft has supported it, there’s no question about whether it is going to become a mass technology and the main direction of the evolution of the x86 architecture (by the way, this situation has no precedents in the past). On the other hand, this technology has been invented and promoted by a firm that has about one fifth of the processor market, so there are some “nuances” possible.

Anyway, the 64 bits is a powerful weapon in the hands of the marketing men and AMD will base its advertising and marketing campaigns around it. AMD will probably draw the sword when it starts promoting its Athlon 64 architecture for the mass user – so far, the presence of these processors has only been felt in the sector of top-end systems. This will coincide with the announcement of the Socket 939 platform (scheduled for April), which is going to become the mainstream platform with AMD64 support. This platform will remain a mass one for several years – it’s hard to tell how long as it depends on too many factors.

Intel also realizes the appeal of the “64 bit for everyone” slogan. Against its own will, with words like “it’s not very necessary, but if people demand it – here it is”, the corporation announced that it would ship Xeon processors with CT technology in the second half of the year (the technology will probably be named EM64T – Extended Memory 64 Technology). This technology essentially copies AMD64, minus some additional features like 3DNow! Moreover, it won’t support the so-called NX bit (short for “No eXecute”), which provides hardware protection against such a common plague of 32-bit OSes as the “buffer overrun” error. Intel included something of its own, the SSE3 instruction set, but I think they’d be better off keeping full compatibility with AMD64 rather than trying to put a good face on things. Moreover, SSE3 is not as big a step forward as SSE2 was: SSE2 made it possible to work without the x87 coprocessor, which is difficult to use efficiently, while SSE3 only adds a few previously omitted instructions.

One of the reasons for this incomplete support may lie in the fact that the NX bit is supported by the high-end Itanium platform and Intel wants to keep the two platforms apart.

By the way, the Xeon 3.6GHz with 1MB L2 cache, the first processor to support the EM64T technology, is rumored to come out sooner than anticipated. This is a signal that Intel is worried about its share of the server market, where the Opteron now shows very good speed characteristics.

The Show Must Go On

Let’s try to think of possible steps AMD is likely to make. Of course, the reaction to Intel’s actions should be as simple to realize as possible and should also bring tangible marketing profits (sometimes that’s more important for the sales results than real technical advantages). So:

  1. Dual-channel memory and the Socket 939 platform. The transition to these two things was announced beforehand and looks quite logical. On the other hand, this is a bad reaction as it contains no surprises – this transition is only expected to give some performance gains. We’ll discuss the problem of memory in the next section of the review. For now, let’s say it once again – this is a poor answer to Intel.
  2. DDR2 memory. It cannot be a quick and ready answer since it requires a serious overhaul of the processor and mainboards. The very necessity of DDR2 memory for the Athlon 64 is arguable – we’ll return to this subject shortly.
  3. CPU frequency growth (building a performance reserve). Today, it is difficult to develop Athlon 64 (Opteron) processors faster than 2.4GHz using the 130nm + SOI technology. The current core seems to have its frequency ceiling somewhere around 2.4GHz and it would be very hard to overcome this limit and have an acceptable chip yield. Of course, we’ll see some operational 2.4GHz (and faster) dies, but I doubt they will be a mass product. AMD will probably have prepared a handful of samples for the announcement of the new platform, but they won’t be cheap and widespread. Overall, such processors will come into the mass market only after the company has switched to the 90nm + SOI technology, scheduled for Q3 of this year. Thus, frequency growth is no answer to Intel, as Intel’s marketing machine will be busy trumpeting PCI Express and DDR2 as “the bus and memory of the future”, respectively, just around that time. The answer is needed sooner.
  4. Expanding the L2 cache involves too many problems: the model with 1MB of cache is already large (193 sq. mm – what shall we expect from a larger cache?) and redesigning the cache controller won’t be an easy matter. Moreover, a larger cache takes more time to access. The example of the Pentium 4 XE shows us that a 2MB cache is welcome in the server market, but gives little benefit for the desktop PC. This may be an option for the Athlon 64 FX and Pentium 4 XE, which are “extreme processors for extreme people” by definition, but not a mainstream solution.
  5. Speeding up the HyperTransport bus should bring some benefits, although their scale is hard to predict. This idea seems to be good and easy to implement since the Socket 939 platform is not yet announced. They can just correct the specifications (by either changing the base frequency from 200MHz to 250MHz or increasing the bus multiplier from 4x to 5x) and all new systems will support this clock rate.

At first, the idea of a faster bus seemed improbable to me; I didn’t believe it at all. At first sight, it is really unnecessary: the current implementation of the HyperTransport bus provides enough bandwidth. On closer examination, though, the idea seems viable. Why not?

So what does AMD get from speeding up the HyperTransport bus? First of all, the bandwidth will grow from 6.4GB/s, which Intel’s current bus also provides, to 8GB/s, which the competitor will only match after introducing its 1066MHz bus, and that won’t happen very soon. That’s an advantage, especially from the marketing point of view. Second, and that’s even more important, we reinforce the winning aspect of the Athlon 64 architecture – the low data exchange latencies. By increasing the bus carrier frequency from 800MHz to 1000MHz (25%) we shorten each bus cycle by 20%, and the bus-related part of the latency shrinks accordingly. Third, there’s one more marketing advantage: what sounds better – 1.6GHz (the previous version of the HyperTransport) or 2GHz? And AMD will beat Intel with Intel’s own weapon: earlier, the 800MHz Quad Pumped Bus of the Pentium 4 platform stood against the 400MHz bus of the Athlon XP one. I guess it’s clear who used to win from the comparison. Now these 800MHz of the QPB will stand next to the beautiful and round number, 2GHz! The advantage of AMD’s bus will look simply overwhelming in the eyes of an inexperienced user who’s not versed in technicalities.
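The bandwidth and cycle-time figures can be checked with a little arithmetic. The sketch below assumes a 16-bit link in each direction and double-data-rate signalling (two transfers per clock), which is how the 6.4GB/s figure for the 800MHz bus comes about. The function names are ours and purely illustrative.

```python
def ht_bandwidth_gb_s(clock_mhz, link_width_bits=16, directions=2):
    """Peak HyperTransport bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes double data rate: two transfers per clock cycle.
    """
    transfers_per_s = clock_mhz * 1e6 * 2
    bytes_per_transfer = link_width_bits / 8
    return transfers_per_s * bytes_per_transfer * directions / 1e9


def cycle_time_ns(clock_mhz):
    """Duration of one bus clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def cycle_time_reduction_pct(old_mhz, new_mhz):
    """How much shorter one cycle gets when the clock rises."""
    return (1 - cycle_time_ns(new_mhz) / cycle_time_ns(old_mhz)) * 100


for clock in (800, 1000):
    print(f"{clock}MHz: {ht_bandwidth_gb_s(clock):.1f}GB/s, "
          f"cycle {cycle_time_ns(clock):.2f}ns")
print(f"cycle time reduction: {cycle_time_reduction_pct(800, 1000):.0f}%")
```

Note that a 25% higher clock shortens each cycle by 20%, not 25%, since the cycle time is the reciprocal of the frequency.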

The gains from a faster, “overclocked” bus will be more conspicuous on multi-processor mainboards for Opterons due to the architecture of Opteron-based multiprocessor systems. All inter-processor communication is performed through HyperTransport buses – by making this bus faster we make data transfers faster and reduce latencies. Opteron-based systems would become “flatter” for the OS – it would take about the same time to access the processor’s own and the other processor’s memory. This effect (flattening of the memory space) will be visible when the bandwidth of the HyperTransport bus exceeds the memory bandwidth of each processor. In this case, the Opteron-based architecture will reach the highest efficiency. Moreover, the faster the inter-processor bus is, the better the system scales up with the number of processors. Ideally, the bandwidth of the inter-processor buses should be enough for transferring all requests to memory from all processors. That is, the bus bandwidth should ideally be equal to the total memory bandwidth of all processors. Today, two-processor systems need 12.8GB/s (that’s enough for the processors to read data from the memory belonging to the other processor at full speed). It is harder to make this estimate for 4-processor systems due to the different routes a request may take. An approximation tells us that triple the memory speed should be enough. In other words, the number is somewhere around 20GB/s (we suppose that the memory bandwidth = 6.4GB/s, corresponding to dual-channel DDR400 SDRAM).
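The two estimates in this paragraph are simple multiples of the per-processor memory bandwidth. The sketch below just spells out that arithmetic; the function names are hypothetical, and the factor of three for four-way systems is the article’s own approximation, not a derived result.

```python
MEMORY_BW_GB_S = 6.4  # dual-channel DDR400 per processor, as assumed in the text


def ideal_link_bandwidth(n_processors, memory_bw=MEMORY_BW_GB_S):
    """Ideal case: bus bandwidth equals the total memory bandwidth of all CPUs."""
    return n_processors * memory_bw


def four_way_estimate(memory_bw=MEMORY_BW_GB_S):
    """The article's rougher estimate for 4-way systems: triple the memory speed."""
    return 3 * memory_bw


print(f"2-way, ideal:    {ideal_link_bandwidth(2):.1f}GB/s")
print(f"4-way, estimate: {four_way_estimate():.1f}GB/s")
```

The four-way result, 19.2GB/s, is what the text rounds to “somewhere around 20GB/s”.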

Now, let us recall that AMD has recently announced the new HyperTransport standard, version 2.0. Its distinguishing features are compatibility with PCI Express (that’s reasonable since the industry will surely embrace the new peripheral bus – PCI is not enough for many modern devices) and new speeds. The maximum speed looks most curious: 22.4GB/s! Is this another strange coincidence? Well, let’s not be paranoid. Current systems use a 16-bit-wide bus at half the speed and that’s enough for a two-processor machine – it becomes quite flat for the OS with respect to the access times to the processor’s own and the other processor’s memory.

It’s probable that AMD will build a reference system around the new bus version (this would be most interesting for four- and eight-processor systems) – the performance growth will cover the cost of its development. This will also require that processors support the new version of the HyperTransport bus. I won’t be surprised to find that one of the next steppings of the Opteron (Athlon 64) has undocumented support for this bus. This support may even have been already included in the recently released CG stepping.

Anyway, this is rather a distant future, while for now we’re going to see a trivial but so pleasant acceleration of the HyperTransport to 1GHz. This fact is nice as it is, but the first generation of Opteron-supporting mainboards was surely not designed for any acceleration of the HyperTransport. This problem may have a solution like a BIOS upgrade – the specific situation will depend on the manufacturer of the clock generator and the “reliability reserve” of the bus wiring on the PCB. I think that the platforms were originally designed with some reserve for the bus to be noise-tolerant. However, we can neither confirm nor refute this supposition. By the way, if the base frequency remains the same, it’ll be easier – it shouldn’t be a problem to change the multiplier. For example, one of the first Opteron-supporting mainboards from Arima already supports this bus frequency.

Let’s summarize: the introduction of a faster bus brings huge dividends, both technological and marketing. The lifecycle of the platform will be longer due to its better characteristics. The acceleration of the HyperTransport bus has been rumored by various news agencies recently – there is no smoke without fire.

Memories of the Future

In my opinion, Intel will do its utmost to make DDR2 memory popular. Intel’s efforts usually materialize (the old story with Rambus is an exception that only proves the rule) and AMD must have some plan of action in case this memory does become widespread.

Let’s first examine what DDR2 memory is. As you know, each memory type can be characterized by a couple of parameters: access latency and bandwidth. Everything is fine with bandwidth – DDR2 has the same bandwidth as DDR (per megahertz), while DDR2 will gain frequency more easily. Moreover, DDR2 brings some purely technological advantages like on-chip termination, which helps with the PCB wiring.

The latencies are another matter. DDR2 has high latencies. It seems like this memory type was specifically designed to negate the basic advantage of the Athlon 64 architecture, its low latencies. Well, there’s no conspiracy against AMD, of course. There are simply no other ways left to increase the performance of dynamic memory. It’s only possible to slowly increase its frequency as the constantly improving manufacturing process allows. But as the elementary memory cell hasn’t fundamentally changed, we can’t hope for any breakthroughs on this front. Alas, the laws of physics are inexorable. Until the structure of the elementary memory cell changes, we’ll have no revolution in the PC memory field. Thus, the main way to improve memory performance is to speed up the channel between the memory controller and the memory chip. In other words, the data-transfer rate and the peak bandwidth of a memory module grow, while the relatively low speed of the memory cell itself is compensated for by various techniques like phase shift when accessing different cells.

Let’s view this in numbers. The typical timings combination for DDR400 memory of average quality is now 2.5-3-3 or 3-3-3. In other words, the access latency is 12.5 or 15 nanoseconds. Typical DDR2-400 available at the moment works according to the 4-4-4 scheme (20 nanoseconds for access). That’s the progress (with a minus sign) they are promising to us!
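These latency figures follow from the first timing value (the CAS latency, counted in clock cycles) and the memory clock, which for DDR-type memory is half the data rate. A minimal sketch of that conversion, with an illustrative function name of our own:

```python
def cas_latency_ns(data_rate_mt_s, cas_cycles):
    """Convert CAS latency from clock cycles to nanoseconds.

    DDR and DDR2 transfer data twice per clock, so the clock in MHz
    is half the data rate (DDR400 runs at a 200MHz clock).
    """
    clock_mhz = data_rate_mt_s / 2
    return cas_cycles * 1000.0 / clock_mhz


examples = [
    ("DDR400 2.5-3-3", 400, 2.5),
    ("DDR400 3-3-3", 400, 3),
    ("DDR2-400 4-4-4", 400, 4),
]
for name, rate, cl in examples:
    print(f"{name}: {cas_latency_ns(rate, cl):.1f}ns")
```

At the same 200MHz clock, every extra cycle of CAS latency costs another 5ns, which is why the 4-4-4 DDR2-400 ends up at 20ns.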

Once again, DDR2 has no advantages over DDR in bandwidth if their frequencies are equal. DDR2 has higher latencies than the previous standard. DDR2 has on-chip termination and heats up more than DDR (you may recall DDR2 chips on graphics cards: unlike DDR chips of the same frequency, they have to be covered with heatsinks). It’s no wonder then that Intel got busy with a new system case form-factor, BTX, as we’ll have another heat source in the system. The only advantage of DDR2 over DDR is its being able to reach higher frequencies, although DDR has already notched 550-560MHz (so far, only in overclocker modules). Well, DDR2 will grow to 800MHz in the future, and DDR won’t get that high, that’s sure. On the other hand, we won’t see those 800MHz soon.

As for the price factor, DDR2 memory will be expensive. Right now, the manufacturers promise a price proportion of 2:1. In other words, the user who supports “progressive technologies” will have to pay twice the money he would spend for the same amount of DDR SDRAM. I think there’ll be only one winning side – the memory makers who are welcoming and supporting this transition.

Later, as the technology is perfected, DDR2-400 with 3-3-3 timings will appear, but we’ll never see the 2-2-2 combination that high-quality DDR SDRAM modules can work at. And even no-name manufacturers offer 2.5-3-3 DDR SDRAM modules that have better characteristics than the technological triumph of DDR2! The following table lists DDR2 types in the order of their intended appearance in the market:

| Memory | Timings (preliminary, Q3 2004) | Latency, ns | Bandwidth, GB/s |
|---|---|---|---|
| DDR400* | 2.5-3-3 | 12.5 | 3.2 |
| DDR500** | 3-3-3 | 12.0 | 4.0 |
| DDR2-400 | 4-4-4 | 20 | 3.2 |
| DDR2-400*** | 3-3-3 | 15 | 3.2 |
| DDR2-533 | 5-5-5 | ~19 | 4.3 |
| DDR2-533 | 4-4-4 | ~15 | 4.3 |
| DDR2-533*** | 3-3-3 | ~11.3 | 4.3 |
| DDR2-667 | 5-5-5 | 15 | 5.3 |
| DDR2-667 | 4-4-4 | 12 | 5.3 |

* DDR400 SDRAM data are included for the sake of comparison
** DDR500 SDRAM data are included for the sake of comparison. The DDR500 standard is not ratified by JEDEC
*** A later revision, expected in the second half of 2005
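The bandwidth column can be reproduced from the data rate alone, assuming a standard 64-bit (8-byte) memory module and a single channel. The sketch below uses the rounded marketing data rates (533 and 667) and an illustrative function name of our own.

```python
def module_bandwidth_gb_s(data_rate_mt_s, bus_width_bits=64):
    """Peak bandwidth of one memory channel in GB/s (1 GB = 10**9 bytes)."""
    return data_rate_mt_s * 1e6 * (bus_width_bits / 8) / 1e9


for name, rate in [("DDR400", 400), ("DDR500", 500),
                   ("DDR2-400", 400), ("DDR2-533", 533), ("DDR2-667", 667)]:
    print(f"{name}: {module_bandwidth_gb_s(rate):.1f}GB/s")
```

DDR400 and DDR2-400 come out identical, which is the point the article makes: at equal data rates DDR2 brings no bandwidth advantage.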

The table doesn’t need much commenting upon. It’s clear that only DDR2-533 with 3-3-3 timings is really better than off-the-shelf DDR400, but it will first come out with timings like 4-4-4 or even 5-5-5. Only in the second half of 2005 will we have really fast memory with 3-3-3 timings. It is the first memory type in the table to show the advantages of the DDR2 standard.

AMD reasonably thinks that it should introduce DDR2 into its processors starting from DDR2-667. Only this standard, even in its earliest modification, provides acceptable timings. But this memory is expected no sooner than the second half of 2005. Thus, the users of current Athlon 64 models shouldn’t be very disappointed about the lack of DDR2 support – they don’t miss anything important.

As AMD is not interested in the first versions of DDR2, we should think that this memory type will only be supported in the next core – why should they do extra work?

We must also understand that support of DDR2 will require new mainboards, both for Intel’s and AMD’s platforms. In particular, the Socket 939 platform is not supposed to accommodate DDR2 memory. Thus, if you want to use this memory, you’ll have to wait until next year and for DDR2-533 at least, otherwise you’ll get no performance advantages. It’s logical to think that AMD will offer its new platform next year (probably in the second half). Will it be compatible with Socket 939 processors? I think not. Will the Socket 939 platform continue to evolve? I think yes and I’ll explain my opinion shortly.

By the way, there’s another curious side effect of the acceleration of the HyperTransport bus, omitted above. The carrier frequency of the bus will be 1000MHz, which can be arrived at either as 200x5 or as 250x4. The second variant is interesting because of one bus-unrelated thing. Some memory makers, like Samsung or Hynix, have started producing DDR500 memory (this name is not official, as this standard is not approved by JEDEC; it just means that the memory works at an effective 500MHz). It seems like the two events are not connected – but they do fit together nicely. It’s easier for AMD to implement support of DDR500 than of DDR2. It’s also profitable for the manufacturers, who can sell some memory at a high price not only to a few overclockers, but also to the mass user. There would be only one losing side – Intel. Intel doesn’t have to fear the transition to DDR2 – memory latency increases, but the latencies are high on the Pentium platform anyway.

Now let’s take a look at the table. DDR500 seems very appealing – only the later revision of DDR2-667 with small latencies can surpass it. Thus, AMD’s opinion becomes well-grounded: why should they transition to the new standard when the old one is not yet exhausted? Moreover, DDR500 will be much cheaper than DDR2-667 with reduced latencies (today, there’s a difference of $3 between 256MB modules of DDR400 and DDR500). This price ratio should at least hold in the near future.

A Lesson in Math

The subsequent development of the situation differs for the two variants of speeding up the HyperTransport bus.

Variant A: 1000 = 200x5

This variant is more probable as its implementation calls for no significant platform redesign: we only have to change the multiplier for the base frequency.

Everything else is quite predictable: there will be two platforms, Socket 754 and Socket 939 (and Socket 940, gradually leaving the desktop market). The first is single-channel, the second is dual-channel (Socket 940 is a dual-channel platform too, but its main purpose is servers). The number of memory channels is the key difference between the two platforms, the high-performance one and the budget one. According to the latest rumors (published at Anandtech.com, in particular), the model lineup will look like this:

| CPU | Socket | Frequency | L2 cache | Core, production technology |
|---|---|---|---|---|
| Athlon 64 FX-55 | Socket 939 | 2.6 GHz | 1MB | San Diego, 90nm |
| Athlon 64 FX-53 | Socket 939 | 2.4 GHz | 1MB | SledgeHammer, 130nm |
| Athlon 64 FX-53 | Socket 940 | 2.4 GHz | 1MB | SledgeHammer, 130nm |
| Athlon 64 FX-51 | Socket 940 | 2.2 GHz | 1MB | SledgeHammer, 130nm |
| Athlon 64 3800+ (former Athlon 64 3700+) | Socket 939 | 2.4 GHz | 512KB | Newcastle, 130nm |
| Athlon 64 3700+ | Socket 754 | 2.4 GHz | 1MB | ClawHammer, 130nm |
| Athlon 64 3500+ (former Athlon 64 3400+) | Socket 939 | 2.2 GHz | 512KB | Newcastle, 130nm |
| Athlon 64 3400+ | Socket 754 | 2.2 GHz | 1MB | ClawHammer, 130nm |
| Athlon 64 3200+ | Socket 754 | 2.0 GHz | 1MB | ClawHammer, 130nm |
| Athlon 64 3000+ | Socket 754 | 2.0 GHz | 512KB | Newcastle, 130nm |
| Athlon 64 2800+ | Socket 754 | 1.8 GHz | 512KB | Newcastle, 130nm |

In fact, the lineup of Socket 754 models shouldn’t change, so we won’t dwell on it here. Socket 939 variants are also predictable: the frequencies grow in 200MHz steps and the model numbers grow by 300 points. Note that processors for the Socket 939 have higher numbers than their Socket 754 counterparts (the frequency being equal, but Socket 754 CPUs having a larger cache). Supposing that an increase of 300 points in the model number equals 10% performance growth, we find that Socket 939 systems will be faster than their Socket 754 counterparts by 5%. These numbers are an approximation, of course. Note also that the L2 cache is 512KB for Socket 939 CPUs (the Newcastle core).

Using DDR500 memory, we can easily determine the resulting memory frequencies from the fact that the memory divisor in Athlon 64 CPUs can only be an integer number:

| CPU frequency, GHz | DDR400 | DDR500 |
|---|---|---|
| 1.8 | 200MHz (9) | 225MHz (8) |
| 2.0 | 200MHz (10) | 250MHz (8) |
| 2.2 | 200MHz (11) | 244MHz (9) |
| 2.4 | 200MHz (12) | 240MHz (10) |
| 2.6 | 200MHz (13) | 236MHz (11) |

The memory divisor is given in brackets
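The table can be regenerated with a few lines of code. The sketch below assumes the rule the numbers imply: the memory clock is the CPU clock divided by the smallest integer divisor that keeps the memory at or below its rated clock (200MHz for DDR400, 250MHz for DDR500). The function name is ours, and the rule is our reading of the table, not a documented AMD algorithm.

```python
import math


def memory_clock(cpu_mhz, rated_mhz):
    """Return (divisor, resulting memory clock in MHz).

    Picks the smallest integer divisor that does not push the
    memory above its rated clock.
    """
    divisor = math.ceil(cpu_mhz / rated_mhz)
    return divisor, cpu_mhz / divisor


for cpu_mhz in (1800, 2000, 2200, 2400, 2600):
    d400, f400 = memory_clock(cpu_mhz, 200)
    d500, f500 = memory_clock(cpu_mhz, 250)
    print(f"{cpu_mhz / 1000:.1f}GHz  "
          f"DDR400: {f400:.0f}MHz ({d400})  DDR500: {f500:.0f}MHz ({d500})")
```

With a 200MHz base, every CPU frequency is an exact multiple of the DDR400 clock, so DDR400 always runs at its nominal speed, while DDR500 reaches its full 250MHz only at 2.0GHz.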

You see that DDR500 is most efficient for 2.0GHz and 2.2GHz processors as well as, to a lesser extent, for 2.4GHz CPUs. These are the frequencies the CPUs are going to have in the near future. These processors will experience a nice speed boost from their memory subsystems. By the way, the frequencies specified above will be applicable to both platforms as it’s quite probable that the new revision of the Athlon 64 will come out for the Socket 754, too. Anyway, the difference between DDR400 and DDR500 is quite perceptible and should be conspicuous in benchmarks and real applications.

Variant B: 1000 = 250 x 4

That’s a less probable variant, but more interesting due to its consequences.

Consequence One: the CPU frequency will grow in 250MHz steps, rather than 200MHz ones. That’s a wide step and the K8 core cannot be scaled up infinitely, so it makes sense to use half-integer CPU multipliers. Then Athlon 64 CPUs will be able to add 125MHz at a time and their model numbers can grow in steps of 200 points. This keeps the ratio between frequency growth and model number that is traditional for this platform. The supposed relation between the frequencies and models (for Socket 939) is listed in the following table:

| Frequency | Model number |
|---|---|
| 2.0GHz | 3200+ |
| 2.125GHz | 3400+ |
| 2.25GHz | 3600+ |
| 2.375GHz | 3800+ |
| 2.5GHz | 4000+ |
| 2.625GHz | 4200+ |
| 2.75GHz | 4400+ |
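This series is what you get by stepping the CPU multiplier in half-integer increments on a 250MHz base. The sketch below assumes a starting multiplier of 8.0 and a starting model number of 3200+, both inferred from the first entry of the series; the function and parameter names are illustrative only.

```python
BASE_MHZ = 250  # base frequency in Variant B


def variant_b_lineup(start_multiplier=8.0, steps=7,
                     start_model=3200, model_step=200):
    """List (multiplier, frequency in GHz, model number) for each step."""
    lineup = []
    for i in range(steps):
        multiplier = start_multiplier + 0.5 * i
        freq_ghz = multiplier * BASE_MHZ / 1000
        lineup.append((multiplier, freq_ghz, start_model + model_step * i))
    return lineup


for multiplier, freq_ghz, model in variant_b_lineup():
    print(f"x{multiplier:<4} {freq_ghz:.3f}GHz  {model}+")
```

Each half-step of the multiplier adds 125MHz and 200 model points.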

So we’ve got a logical series. Let’s see what happens to the memory with such “non-integer” frequencies:

| CPU frequency, GHz | DDR400 | DDR500 |
|---|---|---|
| 2.0 | 200MHz (10) | 250MHz (8) |
| 2.125 | 193.2MHz (11) | 236.1MHz (9) |
| 2.25 | 187.5MHz (12) | 250MHz (9) |
| 2.375 | 198MHz (12) | 237.5MHz (10) |
| 2.5 | 192MHz (13) | 250MHz (10) |
| 2.625 | 202MHz (13)* or 187.5MHz (14) | 238.6MHz (11) |
| 2.75 | 196.4MHz (14) | 250MHz (11) |

* Although the memory divisor of 13 gives a memory frequency higher than 200MHz, the difference of 2MHz is too small to affect stability in any way. That’s why we think this divisor should be used, rather than the formally necessary 14.
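The trade-off in the footnote is easy to quantify. The sketch below compares the two candidate divisors for a 2.625GHz CPU against the 200MHz nominal clock of DDR400; the function name is hypothetical and the code is plain arithmetic, not a description of how the memory controller actually chooses.

```python
def divisor_options(cpu_mhz, rated_mhz, divisors):
    """For each divisor, return (divisor, memory clock, deviation from rated in %)."""
    options = []
    for divisor in divisors:
        clock = cpu_mhz / divisor
        deviation_pct = (clock - rated_mhz) / rated_mhz * 100
        options.append((divisor, clock, deviation_pct))
    return options


for divisor, clock, deviation in divisor_options(2625, 200, (13, 14)):
    print(f"divisor {divisor}: {clock:.1f}MHz ({deviation:+.2f}% vs nominal)")
```

Divisor 13 overshoots the nominal clock by less than 1%, while divisor 14 gives up more than 6% of it, which is the reasoning behind preferring 13.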

Well, it’s not all smooth with the memory frequency – it often falls below the nominal, however slightly. DDR400 is not very effective, while DDR500 is, although DDR400 stays closer to its nominal frequency than DDR500 in percentage terms.

Once again, this variant is less probable as DDR500 is still an exotic memory type today. It’s not quite prudent to put your main stake on exotic memory. So we should get prepared for the first variant, in which the bus is clocked from a 200MHz base frequency. DDR500 can show its advantages with this variant, too.

By the way, if the chipset allows, the overclocker may consider the option of transforming Variant A into Variant B. This transformation should bring high dividends. And really, by making the CPU work at 2.25GHz rather than 2.0GHz, the memory at 500MHz rather than 400MHz, and the HyperTransport bus at 1000MHz rather than 800MHz (i.e. clocking it as 250 x 4), we’ll get a nice performance boost. There’s only one nuance: for this overclocking to become possible, the AGP and PCI frequencies must be asynchronous to the HyperTransport clock rate. Well, NVIDIA and VIA, two major manufacturers of chipsets for the AMD platform, are both promising to implement async clocking in their new products.
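The clocks quoted for this overclock all derive from the base frequency. The sketch below assumes that the 2.0GHz CPU has its multiplier lowered from 10 to 9 when the base is raised to 250MHz (10 x 250 would give 2.5GHz, not the 2.25GHz mentioned) and that the memory divisor is 9 in the overclocked case; both values are our assumptions, chosen to match the numbers in the text.

```python
def platform_clocks(base_mhz, cpu_multiplier, ht_multiplier, memory_divisor):
    """Derive CPU, HyperTransport and effective memory clocks from the base clock."""
    cpu_mhz = base_mhz * cpu_multiplier
    return {
        "cpu_mhz": cpu_mhz,
        "ht_mhz": base_mhz * ht_multiplier,
        # DDR memory transfers twice per clock, hence the factor of 2
        "memory_effective_mhz": 2 * cpu_mhz / memory_divisor,
    }


def gains_pct(before, after):
    """Percentage gain of each clock in `after` relative to `before`."""
    return {key: (after[key] / before[key] - 1) * 100 for key in before}


stock = platform_clocks(200, 10, 4, 10)        # Variant A: 200 x 4 bus
overclocked = platform_clocks(250, 9, 4, 9)    # Variant B: 250 x 4 bus

for key, gain in gains_pct(stock, overclocked).items():
    print(f"{key}: {stock[key]:.0f} -> {overclocked[key]:.0f} ({gain:+.1f}%)")
```

Under these assumptions the CPU gains 12.5%, while the bus and the memory each gain 25%.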

The King of the Hill

The processor manufacturers seem to have a liking for mountain climbing. The next height has been conquered by AMD with its Athlon 64 FX-53 (2.4GHz frequency). It is based on the same CG revision I mentioned above. At this moment, this processor allows AMD to take the lead in the race for the highest-performing CPU – until the competitor issues its own new product. Interestingly, all testers note that this processor can be easily overclocked to about 2.6GHz with its standard cooling. In other words, AMD could offer an Athlon 64 FX-55 even today.

Note also that although Opteron 1xx series processors and Athlon 64 FX ones are very close to each other, the corresponding Opteron 150, 250 and 850 models haven’t been announced. AMD may be waiting for the next Xeon 3.6GHz to answer with a volley of new models for the server market. That’s quite an understandable position – why show your trump cards before the right time? The launch of these processors may also be called for if the new Xeon shows improved speed characteristics.

Then AMD will have to create another gap in performance, since the performance advantage and better capabilities of Opteron processors are what makes third parties manufacture Opteron-based servers. Thus, if the new Xeon 3.6GHz with 1MB L2 cache makes a breakthrough in performance, AMD will also release its new CPU. Otherwise, the launch of the new Opteron will be postponed till about Q3. And if the new Xeon turns out to be very fast, Opteron processors of both x50 and x52 series will “suddenly” come out, clocked at 2.6GHz. There won’t be many of them with the current 130nm+SOI tech process, but enough for a start. As far as I know, there’s no shortage of Opteron CPUs in the market: AMD’s production facilities can meet the demand.

AMD will now concentrate its efforts on the transition to the 90nm+SOI technology to reduce the manufacturing cost and increase processor output. Whether this transition goes smoothly depends on AMD’s technologists. By the way, their work commands respect: one and the same Fab30 in Dresden simultaneously produces Athlon XP processors (with the 130nm technology) and Athlon 64/Opteron processors (with the 130nm+SOI technology, which differs considerably from the former), while the 90nm+SOI process is being mastered there at the same time. I can’t think of another example of three manufacturing technologies being mixed together like this. Of course, the technologists have to perform such feats because of the lack of manufacturing facilities, but it is impressive all the same.

The restrained optimism about the 90nm+SOI technology is due to the fact that AMD has already showcased an operational prototype of the Opteron made with it, although the CPU ran at a low frequency, only 800MHz. The company has enough time until Q3 to polish this technology, if no fundamental problems arise. Intel, for example, ran into trouble when its 90nm+“strained silicon” technology produced unexpectedly high leakage currents (higher than theory predicted). As a result, chips made with this tech process heat up much more than the previous core. AMD uses another variation of the technology, which doesn’t have problems with leakage currents, but has other drawbacks. For example, it scales less well with frequency; moreover, it potentially has problems with temperature distribution inside the processor. Anyway, only practice will give us a definite answer. If everything goes right, we’ll see an abundance of new processors for Socket 754 and Socket 939 manufactured with the 90nm+SOI technology.

This transition should reduce the die manufacturing cost significantly (the area of the new die will be 102 sq. mm instead of 193 sq. mm), which is another aid in the competition with Intel. This is all the more important as Intel has already practically finished its transition to 90nm and also has 300mm wafers, which by themselves reduce the cost of the processor.

AMD has also published the maximum heat dissipation for the Socket 939: 105W. Considering that this number refers to the entire lifecycle of this socket, we have a reason to be optimistic. At least, this is much lower than the maximum heat dissipation of the Prescott platform for the LGA775 (up to 120W) and Tejas (around 150W). Thus, this technology will probably allow reaching 2.8-3.0GHz (although not in the first revision of the processor). This would give us a 4000+ or 4300+ model in the AMD nomenclature. Besides that, the new core may support SSE3. In fact, nearly all of its instructions have long had functional analogs in Extended 3DNow!, so supporting them only means “teaching” the decoder to transform them into internal macro-ops. The exceptions are the MWAIT and MONITOR instructions, which relate to Hyper-Threading; the Athlon 64 doesn’t need them. We’ll probably also see support for DDR2 memory. However, the use of such memory will require new mainboards and a new processor form-factor.

By the way, the new core (Newcastle) may come manufactured with the current 130nm+SOI technology, with 512KB of cache. Even this variant is relatively profitable, since it allows reducing the die area to 150 sq. mm, which has a favorable effect on the manufacturing cost of the processor. It would also be good for AMD’s profits, and the company needs money, considering the construction of the new Fab35. This is probably a fallback in case there are problems with the implementation of 90nm+SOI. The transition to the intermediate core will allow increasing the CPU output by one third from the same number of wafers.
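A rough feel for these numbers comes from the common dies-per-wafer approximation, which subtracts an edge-loss term from the simple area ratio. The sketch below assumes 200mm wafers and ignores yield, scribe lines and defects; neither the wafer size nor the formula comes from AMD, so treat the output as an illustration only.

```python
import math


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=200.0):
    """Estimate gross die candidates per wafer.

    Uses the textbook approximation: wafer area / die area, minus an
    edge-loss term proportional to the wafer circumference.
    The 200mm default is an assumption made for this sketch.
    """
    radius = wafer_diameter_mm / 2
    area_term = math.pi * radius ** 2 / die_area_mm2
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return area_term - edge_term


current = dies_per_wafer(193)    # current 130nm+SOI die with 1MB of cache
newcastle = dies_per_wafer(150)  # 130nm+SOI die with 512KB of cache
shrunk = dies_per_wafer(102)     # 90nm+SOI die

gain_newcastle = newcastle / current
gain_shrunk = shrunk / current

print(f"150 sq. mm vs 193 sq. mm: x{gain_newcastle:.2f}")
print(f"102 sq. mm vs 193 sq. mm: x{gain_shrunk:.2f}")
```

Under these assumptions the 150 sq. mm die yields roughly a third more candidates per wafer than the 193 sq. mm one, and the 102 sq. mm die about twice as many.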

Cooling and Duplicating the CPU

Besides launching its new tech process, AMD has other things to work on. For example, they are developing a new HyperTransport to PCI Express 16x bridge (or “tunnel” as AMD terms it). This shouldn’t take much effort, as the two buses are very similar to each other. They will only have to use a newer version of HyperTransport (2.0) so that the bus can carry the 8GB/s of graphics bandwidth. On the other hand, the first systems will only support a PCI Express 16x port working in half-duplex mode, and the transition to a new version of HyperTransport is not vitally important, as an overclocked HyperTransport link fully matches the new graphics port in bandwidth. They will also have to develop a new South Bridge to replace the outdated AMD8111, and it must provide all the must-haves like Gigabit Ethernet, SerialATA and PCI Express 1x. Otherwise, Athlon 64 systems will lose the battle due to the lack of new technologies. It’s possible, though, that AMD relies on third parties (VIA Technologies, NVIDIA and SiS) in this matter. I suppose that they need a fallback of their own, though.
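The bandwidth comparison in this paragraph can be checked with a short calculation. The figures fed into it are not from this article: the sketch assumes a first-generation PCI Express lane at 2.5GT/s with 8b/10b encoding and a 16-bit HyperTransport link in each direction, so read it as a back-of-the-envelope estimate.

```python
def pcie_gb_per_s(lanes, gt_per_s=2.5, encoding=8 / 10):
    """Usable GB/s in ONE direction of a first-generation PCI Express link.

    Assumes 2.5 GT/s per lane and 8b/10b encoding (8 data bits per 10 sent).
    """
    return lanes * gt_per_s * encoding / 8  # divide by 8: bits -> bytes


def hypertransport_gb_per_s(clock_mhz, width_bits=16):
    """GB/s in ONE direction of a HyperTransport link.

    Data moves on both clock edges; the 16-bit width is an assumption.
    """
    return clock_mhz * 2 * (width_bits / 8) / 1000


pcie_x16_one_way = pcie_gb_per_s(16)
pcie_x16_both_ways = 2 * pcie_x16_one_way
ht_stock = hypertransport_gb_per_s(800)
ht_overclocked = hypertransport_gb_per_s(1000)

print(f"PCI Express 16x: {pcie_x16_one_way:.1f} GB/s one way, "
      f"{pcie_x16_both_ways:.1f} GB/s both ways")
print(f"HyperTransport at 800MHz:  {ht_stock:.1f} GB/s one way")
print(f"HyperTransport at 1000MHz: {ht_overclocked:.1f} GB/s one way")
```

With these inputs a 1000MHz link matches one direction of the graphics port, which is all a half-duplex port can use at a time.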

Besides the above-mentioned things, AMD has one more enticing feature: the Cool’n’Quiet technology. Its key point is the reduction of the processor frequency and power consumption when the CPU workload is below 80%. The rotational speed of the CPU cooler is reduced too. If the workload remains low, the frequency can be dropped further. The minimum frequency for this technology is 800MHz (1000MHz for new processor revisions and Socket 939 systems). We should acknowledge that the Athlon 64 provides enough performance even at 1GHz to handle the majority of office and home chores. If the computer lacks performance, Cool’n’Quiet automatically restores the clock rate to the nominal value.
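The behavior described above can be modeled as a tiny frequency governor. This is only a sketch of the idea, not AMD’s actual driver logic: the list of frequency steps is hypothetical, and the rule of stepping down one level per sample is an assumption.

```python
def next_frequency(current_mhz, load, states_mhz, threshold=0.80):
    """Pick the next frequency from states_mhz (sorted from nominal to minimum).

    Below the load threshold, step down one level (stopping at the minimum);
    at or above it, jump straight back to the nominal frequency.
    """
    if load >= threshold:
        return states_mhz[0]
    index = states_mhz.index(current_mhz)
    return states_mhz[min(index + 1, len(states_mhz) - 1)]


def simulate(loads, states_mhz):
    """Replay a series of load samples, starting from the nominal frequency."""
    frequency = states_mhz[0]
    trace = []
    for load in loads:
        frequency = next_frequency(frequency, load, states_mhz)
        trace.append(frequency)
    return trace


# Hypothetical frequency steps for a 2.0GHz part with an 800MHz floor.
STATES_MHZ = [2000, 1800, 800]

trace = simulate([0.95, 0.30, 0.20, 0.10, 0.90], STATES_MHZ)
print(trace)  # [2000, 1800, 800, 800, 2000]
```

The asymmetry is deliberate in this model: it slows down gradually but returns to full speed at once, so a sudden burst of work is not penalized.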

You may notice that Cool’n’Quiet is not very sophisticated technologically (in fact, it is a notebook power-saving technology adapted for desktop computers), but it has a strong appeal to the end-user. The modern top-end computer system has become quite uncomfortable to work with: noise and heat do nothing good for the ergonomics of your workplace. Many manufacturers have started to stress the low noise of their systems. For example, this trend has brought some popularity to the Eden platform from VIA Technologies, which generates little heat and noise, although it can’t boast high performance. The marketing department of AMD sensed this tendency, and the users’ needs were met by the Cool’n’Quiet technology.

As a result, your computer works much more intelligently: it gives you its full computing power only when required. Otherwise, it spares your ears and wallet, producing less noise and consuming less power. That’s all to the good, since this technology doesn’t seem to have any disadvantages.

Modern Socket 754 platforms already support Cool’n’Quiet. Moreover, mainboards without it won’t be accepted by AMD for testing. Thus, the majority of Athlon 64 systems will surely support Cool’n’Quiet.

Now, let’s take a look at the other end of the processor scale where the Opteron resides. The evolution of this processor seems predictable: x50 models will come out (made by the 130nm + SOI technology), followed by higher-frequency models, if necessary.

It’s more interesting to examine the hints AMD gives in response to the question “Why does AMD have no analog of the Hyper-Threading technology?” The processor from Intel wins some tests precisely due to Hyper-Threading. In other words, if AMD implemented a similar thing in its processor, it would reduce the last trump in the opponent’s hand to naught.

There seem to be two reasons. First, the micro-architecture of the Athlon 64 differs greatly from that of the Pentium 4, making it very difficult to implement an analog of Hyper-Threading (the decoder and some other CPU units would require a complete overhaul). The Pentium 4 processor was developed with this technology in mind (for example, the trace cache was invented for that, too) and it does bring a perceptible performance gain. The Athlon 64 would not get the same gains without a total redesign of the core, and the reason lies in its very micro-architecture. One of the main problems with the Pentium 4 is that its execution units are often idle, mostly due to the high clock rate of the processor and the peculiarities of its decoder. Accordingly, the introduction of Hyper-Threading technology allows masking data load latencies and increasing the efficiency of the CPU. This problem is not as acute with the Athlon 64, since its execution units are not idle as often, and the problem of keeping them loaded is solved differently there.

The second reason has been voiced by AMD spokesmen at a press conference: “it’s better to release a truly dual-core processor, rather than make one processor pretend it is two”. Really, two physical processors will be faster than one physical processor that pretends to be two logical CPUs. Really, it’s better to be healthy and rich than ill and poor. AMD shouldn’t forget, though, that Hyper-Threading does provide some advantages to the competitor’s processors in some applications, especially in a multitasking environment. So this technology should receive some kind of answer. Dual-core processors would make a good and effective response.

If we assume that AMD is working in this direction, then every piece fits into the puzzle. First, after the transition to the 90nm+SOI technology, the die area (although with just 512KB of cache) is 102 sq. mm. Thus, we can integrate two cores into one die, and the resulting chip would be about 200 sq. mm. This roughly corresponds to the die size of current Athlon 64 CPUs, and is acceptable for hi-end and server processors.

Second, the very architecture of the Athlon 64 is perfectly suited to making dual-core (or multi-core) processors. The CPU core proper is connected to the internal switch (X-bar), which in turn connects to the memory controller and the HyperTransport bus. Thus, in order to add another core to the die, we only have to add another port to the switch, and the two cores will get along well together. Moreover, the connection between the cores can be high-speed, so that each core could work with the other’s L2 cache, increasing the value of the cache for both of them. This will also somewhat reduce the need to load data from the RAM (that’s the longest process, by the processor’s measures). Of course, such a design will easily scale up in performance as the frequency grows. And again, the natural field of application of dual-core processors is the server market and the market of workstations. It is there that we see numerous applications that make use of multi-processor architectures and are optimized for such systems.

Such processors have potential problems, though. One of them is evident: the heat factor. Two cores on one die are bound to run very hot, so they’ll only become popular with the transition to the 65nm technology, which will appear no sooner than in a year. That’s why it is probable that slower processors, rather than the fastest ones, will be used in dual-core configurations. Anyway, heat dissipation is not a crucial problem for servers, where people have already found ways to keep the hottest server processors cool. Again, dual-core processors are likely to appear in the server market first.

I’ll also risk a supposition that they’ll be installed into the Socket 940. This allows AMD to go on using the existing infrastructure as well as its investments in advertising. It will also help to save the users’ money, making this upgrade option more appealing to the potential Opteron user. Finally, this variant allows setting an appropriate price for this processor. In other words, a price that’s highly profitable for AMD and higher than for the single-core variant of the processor.

The second potential problem lies in the realm of software. Hyper-Threading works fine with the Home version of Windows XP, as there is only one physical processor. With a truly dual-core CPU, you’ll have to buy the more expensive Windows XP Professional, licensed for two CPUs. This is not a big problem if we talk about the expensive top-end CPUs of the series: people who buy hi-end CPUs won’t be put off by having to pay an extra $80 for the OS. However, this may become a problem in the mainstream and low-end market sectors.

The server market, on the contrary, will certainly embrace such processors due to their much higher performance, and the same goes for dual-core CPUs from Intel, which has also dropped a hint about such projects being underway. Moreover, there’s an excellent example of such a processor: the Power4+ from IBM, which is known as one of the fastest solutions for the server market. In fact, our expectations about dual-core CPUs from AMD (and Intel) are justified, because multi-core CPUs are an alternative way of making systems faster when a linear frequency increase is not always possible. Multi-processor systems can also bear high workloads without a sign of stress, which is their big advantage over single-processor computers.

Flown by the Dream

I’m going to plunge into pure fantasy in this section of the article, without any trace of so-called common sense. Let’s be carried away by dreams. Once again, everything I’m going to talk about in this section is pure fantasy that isn’t based on anything solid, save for some general, widely accepted information.

I’m going to start with PC memory. It’s clear that sooner or later, we’ll all have DDR2 in our computers. The numerical magic works on the customer: “DDR2 is newer than DDR and 2 is more than 1”. Yes, AMD will transition to DDR2, starting with DDR2-533 or even later. Anyway, they have to increase the speed, while there’s no evident candidate to replace DDR. It’s really sad that the technologies of Rambus are not in demand in the PC market due to the company’s careless licensing policy. Rambus made a strategic mistake, deciding to rake in all the money at a stroke. And the mistake was in no way connected to the technical characteristics of its products. Regrettably, the company’s style was far from being “fair play”. The very situation when a member of a JEDEC committee secretly patents originally open technologies and specifications (and then demands licensing fees from its partners) smells bad.

That’s why there’s no hope that manufacturers will put their stake on Rambus again (or cooperate with the company in any other way). Why have I brought up the Rambus affair? Because the company has an exciting technology called XDR SDRAM, which is an embodiment of the Yellowstone ideology. Its key point is transferring data eight times per clock cycle (compare that to 2 transfers with DDR technology). That’s really impressive. Regrettably, the licensing fees for this technology are too high for the majority of memory makers. Although some manufacturers like Samsung have the license, they use it mostly in communication equipment where performance is more important than price: XDR SDRAM can provide a bandwidth of 100GB/s today! No other available memory technology (even the Fully Buffered DRAM that Intel has started talking about) can even dream of such speeds. It’s true, this memory subsystem would make our computer systems much faster and would let us forget about memory bandwidth for a long time, since it’s no secret that the overall performance growth of computer systems is badly limited by memory speeds.

Well, we can’t dream about that, at least while Rambus keeps on with its current licensing system. Moreover, the company has spoiled its reputation and its technological superiority is not the only factor that matters.

Now let’s see what we can put together from the existing technologies. Support of DDR2 will require a new processor packaging and a new processor socket. Let’s call it Socket X, as we don’t know the exact number of pins (there may be 1000 or more). Accordingly, AMD will have to release an Athlon 64 model for this memory type and processor socket. It will be wise to offer only top-end processor models for this platform, as the memory and mainboards will be much more expensive than their DDR analogs.

Thus, the most profitable thing to do is to move the Athlon 64 FX processor to this platform first. By the way, this doesn’t mean the end of the Socket 939. I agree with the opinion that these platforms will happily coexist for some time (a year, at least). That’s reasonable, since DDR2 won’t have a big advantage over DDR in terms of performance. Mainboards will also be designed to support DDR2-800 along with DDR2-533. As a result, DDR2-800 will give us 6.4GB/s per module. Supposing that the processor remains dual-channel, we have 12.8GB/s per processor. Well, that’s not bad, although we’re still sad about the missing XDR SDRAM.
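The per-module and per-processor figures follow from simple arithmetic, sketched here under the assumption of a standard 64-bit (8-byte) memory module; the function name is made up for this example.

```python
def peak_bandwidth_gb_per_s(mega_transfers_per_s, channels=1, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s.

    bus_bytes=8 assumes a 64-bit module data bus, which is an assumption
    of this sketch rather than a figure from the article.
    """
    return mega_transfers_per_s * bus_bytes * channels / 1000


ddr400_dual = peak_bandwidth_gb_per_s(400, channels=2)
ddr2_800_single = peak_bandwidth_gb_per_s(800)
ddr2_800_dual = peak_bandwidth_gb_per_s(800, channels=2)

print(f"DDR400, dual-channel:     {ddr400_dual:.1f} GB/s")
print(f"DDR2-800, one module:     {ddr2_800_single:.1f} GB/s")
print(f"DDR2-800, dual-channel:   {ddr2_800_dual:.1f} GB/s")
```

These are theoretical peaks; sustained throughput in real systems is lower because of latencies and refresh cycles.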

Now that we’ve introduced a new platform, it makes sense to equip it with the new HyperTransport version 2.0 bus, especially as it is compatible with the current 1.05 version. I’ve mentioned above that the performance of the new version of HyperTransport grows significantly, to 22.4GB/s in the fastest variant.
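The 22.4GB/s figure can be reproduced if we assume it refers to the widest link at the top clock with both directions added together. The 1.4GHz clock and 32-bit width used below are taken from the HyperTransport 2.0 specification as I understand it, not from this article.

```python
def ht_aggregate_gb_per_s(clock_mhz, width_bits):
    """Aggregate (both directions) HyperTransport bandwidth in GB/s.

    Data is transferred on both clock edges, hence the factor of 2
    inside the per-direction figure.
    """
    per_direction = clock_mhz * 2 * (width_bits / 8) / 1000
    return 2 * per_direction


# Assumed maximum configuration of version 2.0: 1.4GHz clock, 32-bit links.
ht2_peak = ht_aggregate_gb_per_s(1400, 32)

# The same 32-bit link at the 800MHz clock of current implementations.
ht1_peak = ht_aggregate_gb_per_s(800, 32)

print(f"HyperTransport 2.0 peak: {ht2_peak:.1f} GB/s")
print(f"800MHz, 32-bit link:     {ht1_peak:.1f} GB/s")
```

Note that a desktop processor with a narrower link would see only a fraction of this peak.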

Let’s also recall the rumors about AMD’s working on the next-generation processor, codenamed K9. That’s logical as the development of a new processor takes time (up to 4-5 years).

So let’s risk a supposition: the K9 CPU may be compatible with the newly introduced Socket X. Moreover, this would be consistent with AMD’s traditional statements about protecting existing investments. This would also be another reason to purchase this platform. In fact, there can be an even more daring variant with K9 models for the Socket 939. Yes, the performance will be lower on this platform, but why should they spurn some potential buyers of the new processor? So our suppositions make some sense.

I won’t guess what the K9 will be like on the inside. Probably, AMD will perfect the decoder of x86 instructions further. This means that the appearance of another pipeline (and, accordingly, execution of four macro-ops per clock cycle) is unlikely. They’ll surely take measures to increase the operational frequency of the processor and improve its performance in the 64-bit mode; it’s no secret that many users hope for a speed gain from the transition from 32 to 64 bits. Yes, extended address space and the NX bit are good, but the user wants proof of his intuitive feeling that “64 bits is more than 32 bits, so a 64-bit processor should be faster than a 32-bit one.” Meanwhile, that’s not quite so. For example, a 64-bit CPU performs some operations like multiplication slower than its 32-bit analog. It’s not because the processor is bad, but because it is more difficult to multiply two 64-bit numbers than two 32-bit ones. So it’s vitally important for AMD to boost the performance of its processor in the 64-bit mode.
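One way to see why the wider multiplication is harder is to build a 64-bit product out of 32-bit pieces, the way it is done on paper. This sketch only illustrates the growth in work; it makes no claim about how the multiplier in any AMD processor is actually built.

```python
MASK32 = 0xFFFFFFFF


def multiply_64_via_32(a, b):
    """Multiply two unsigned 64-bit values using only 32x32-bit products.

    Returns the full 128-bit product and the number of 32-bit
    multiplications that were needed.
    """
    a_low, a_high = a & MASK32, a >> 32
    b_low, b_high = b & MASK32, b >> 32

    # Schoolbook method: each half of a meets each half of b.
    partial_products = [
        (a_low * b_low, 0),
        (a_low * b_high, 32),
        (a_high * b_low, 32),
        (a_high * b_high, 64),
    ]
    product = sum(value << shift for value, shift in partial_products)
    return product, len(partial_products)


a = 0x123456789ABCDEF0
b = 0x0FEDCBA987654321
product, multiplications = multiply_64_via_32(a, b)

print(hex(product))
print("32-bit multiplications used:", multiplications)
```

Doubling the operand width quadruples the number of partial products in this scheme, and they still have to be added together afterwards.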

This is how I view the future evolution of the K8 core. We’ll have to wait for confirmation or denial of these suppositions, though. The K9 is not likely to arrive sooner than in a year.

So, let’s wait and see if our suppositions are true. In any case, this year is going to be quite interesting as the two processor giants are very close to each other in technology. The number of announced innovations also promises some interesting things and events ahead.