Well, we have every reason to say that L2 cache has finally migrated from the mainboard to the processor core. It seems not so long ago that we were hunting for Socket 7 mainboards with as much L2 cache as possible, and now you won't find a single L2 SRAM chip on a mainboard. The intermediate stage, when these chips sat on the processor board, is also behind us. First the low-cost Intel Celeron processors acquired integrated on-die L2 cache, then so did the Intel Pentium III. And at last AMD has joined them with its Athlon.
So why did the manufacturers start integrating L2 cache into the CPU core? They were guided by both architectural and economic reasons.
The economics are straightforward. Since SRAM chips are no longer needed, the CPU can be made without a processor board and without a cartridge. As a result, CPU assembly gets much easier and cheaper, and the manufacturer no longer depends on third parties supplying PCBs, SRAM and CPU cartridges. In other words, the manufacturers not only save money on components but also speed up processor assembly.
As far as the CPU architecture is concerned, the first and most evident advantage is that L2 cache and the CPU core can run at the same frequency, since both are now located on the same die. In other words, L2 cache works at the full processor frequency, as simple as that. As you remember, when the cache was placed outside the processor core, it was impossible to make it work at the CPU frequency. Take the Athlon, for instance: the L2 cache of a 700MHz model ran at 350MHz, because it worked at half the CPU frequency. And the higher the core frequency of the ordinary Athlon processors got, the worse the situation with L2 cache became. AMD's suppliers couldn't provide SRAM fast enough to keep up, so the cache first had to drop to 2/5 of the processor frequency and, starting with 900MHz models, to only 1/3, which made things even worse. As a result, L2 cache became the major bottleneck for CPU performance and hindered its further improvement.
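To put numbers on the divider problem, here is a minimal sketch in Python. The function names and the exact divider boundaries are our assumptions for illustration: the text only states 1/2 for the 700MHz model and 1/3 from 900MHz up, so the range assigned to 2/5 is inferred rather than taken from an AMD specification.

```python
from fractions import Fraction

def external_l2_mhz(core_mhz: int) -> float:
    """External (cartridge) L2 clock for a given core clock.

    Divider boundaries are assumed for illustration: 1/2 up to 700MHz,
    2/5 up to 850MHz, 1/3 from 900MHz upward.
    """
    if core_mhz <= 700:
        divider = Fraction(1, 2)
    elif core_mhz <= 850:
        divider = Fraction(2, 5)
    else:
        divider = Fraction(1, 3)
    return float(core_mhz * divider)

def on_die_l2_mhz(core_mhz: int) -> float:
    """On-die L2 simply runs at the full core clock."""
    return float(core_mhz)

# Compare the two designs across a few core clocks.
for core in (700, 800, 900, 1000):
    print(f"{core}MHz core: external L2 {external_l2_mhz(core):.0f}MHz, "
          f"on-die L2 {on_die_l2_mhz(core):.0f}MHz")
```

Under these assumptions the external cache never gets past 350MHz no matter how fast the core runs, while the on-die cache scales with the core.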
Moreover, an on-die L2 cache makes it possible, at least in theory, to build a much more efficient bus between the CPU core and the cache. It takes considerably less time and effort to connect two adjacent blocks than to devise complicated signal synchronization schemes and design the PCB layout for external SRAM chips. The Intel Pentium III on the Coppermine core is a good illustration: the shift from external to on-die L2 cache allowed Intel to make the bus between them 4 times wider, increasing it from 64 bits to 256 bits.
However, this raises a logical question: why on earth did the manufacturers stick to external L2 cache for so long if integrated on-die L2 cache is so beneficial? The answer is simple: just look at the drawbacks of integrated L2 cache. What is cache memory? It is additional transistors, so moving them into the CPU core automatically makes the die bigger. A bigger die is much harder to manufacture. Moreover, a bigger die dissipates more heat and hence requires better cooling. This is exactly where another factor comes into play: the manufacturing technology. At first, with the 0.25 micron technology, it was very hard to integrate L2 cache into the processor core for the reasons mentioned above (we leave aside the Celeron on the Mendocino core, whose cache was very small and fit into the core without much trouble). However, as soon as CPU manufacturing switched to 0.18 micron, the developers' dreams came true. A 256KB L2 cache could easily fit into the core, and the manufacturers reported a much higher yield of good chips than they had planned.
So, having weighed all the pros and cons of on-die L2 cache, you have probably concluded that moving L2 cache into the CPU core was a really promising idea. So did AMD, especially since both AMD factories, in Dresden and in Austin, had mastered the 0.18 micron technology (in fact, the higher-clocked models of the old Athlon were already manufactured with it). This is where the new old Athlon, aka Thunderbird, comes from. Its only architectural difference from the old Athlon is the 256KB integrated on-die L2 cache that replaces the external 512KB L2 cache. Well, let's study the Thunderbird specification more carefully:
- Chip manufactured with 0.18 micron technology using aluminum or copper interconnects;
- Thunderbird core based on Athlon architecture with 37 million transistors and a 120 sq. mm die;
- Works in mainboards equipped with the 462-pin Socket A (Slot A versions are available in limited quantities for OEMs only);
- Uses high performance 100MHz DDR EV6 system bus;
- 128KB L1 cache (64KB for instructions and 64KB for data);
- 256KB integrated on-die L2 cache working at the full core frequency;
- Core voltage: 1.7V at 850MHz and 1.75V at higher frequencies;
- 3DNow! SIMD instruction set;
- The following versions are available: 750, 800, 850, 900, 950 and 1000MHz.
Well, as we have already said, the 256KB integrated on-die L2 cache is the only architectural difference between AMD Thunderbird and the ordinary Athlon. Although the cache is half the size of the old Athlon's, performance shouldn't drop, since the new cache works much faster than the old one: at the full CPU core frequency. Besides, the Thunderbird cache has 45% lower latency than the old L2 cache, because it is situated much closer to the CPU core. The other architectural features remain unchanged, so you can read more about them in our AMD Athlon 600 Review. However, keep in mind that the Thunderbird core is manufactured with a more up-to-date technology: 0.18 micron. So, the Thunderbird core with the integrated on-die cache is not much bigger than the K75 core (0.18 micron Athlon) and even smaller than the old K7 core manufactured with the 0.25 micron process.
The second difference between the old and the new Athlon is the form factor. Since there is no need for a processor board any longer, Thunderbird uses Socket A instead of Slot A. And although Slot A Thunderbird processors will still be available for some time, the main form factor for these processors is the 462-pin Socket A. As we have already mentioned in our AMD Duron 650 Review, the new Socket A is the same size as Socket 370, which should simplify the transition and allow old coolers to be used with the new processors.
AMD manufactures Thunderbird processors in two factories, in Austin and in Dresden, with two different interconnect technologies: aluminum and copper. Nevertheless, there seems to be only one visible difference between the two versions: the color. Thunderbirds from Dresden are blue, while those from Austin are green.
If you ask us about the differences between the new and old Slot A Athlons, we will really have to think hard before giving you an answer. They look alike and, even more puzzling, cost the same. But there is still one thing that helps tell them apart: the marking. The old Athlon is marked AMD-K7XXX, while the new one is marked AMD-AXXXX. Moreover, if you look inside the cartridge, you will hardly mix the two up: looking at the edge that goes into the processor slot, you will see no SRAM chips on the new CPUs.
Everything we have said so far about the new Thunderbird and its on-die L2 cache sounds like praise. But now it's high time we criticized this CPU and slightly disappointed most AMD fans, especially since it isn't hard to compare the Thunderbird L2 cache with that of Coppermine.
From this point of view, the only advantage of the Thunderbird cache is that it is exclusive. In other words, data stored in the L1 cache isn't duplicated in the L2 cache. This means that the total effective cache size in Thunderbird processors is 128+256=384KB. In Coppermine, 32KB of the L2 cache always holds a copy of the L1 cache, so the effective cache size is only 256KB altogether.
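The capacity arithmetic can be written down as a small Python sketch. The function name is ours, and the inclusive case assumes that the L2 cache is at least as large as the L1 cache, which holds for both processors discussed here.

```python
def effective_cache_kb(l1_kb: int, l2_kb: int, exclusive: bool) -> int:
    """Amount of unique data the L1+L2 hierarchy can hold, in KB.

    Exclusive: L2 never duplicates L1, so the capacities add up.
    Inclusive: L2 always holds a copy of L1, so only the L2 size counts
    (assumes l2_kb >= l1_kb).
    """
    if exclusive:
        return l1_kb + l2_kb
    return l2_kb

# Thunderbird: 128KB L1, 256KB exclusive L2.
thunderbird_kb = effective_cache_kb(128, 256, exclusive=True)
# Coppermine: 32KB L1, 256KB inclusive L2.
coppermine_kb = effective_cache_kb(32, 256, exclusive=False)

print(f"Thunderbird: {thunderbird_kb}KB, Coppermine: {coppermine_kb}KB")
```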
But why should this be disappointing, you may wonder. Here is why: the Thunderbird L2 cache is simply slower than its competitor's. Firstly, the latency of the Intel Pentium III cache is lower, and secondly, AMD engineers decided not to change the bus between the core and the L2 cache after they moved the latter into the CPU core. As a result, the Thunderbird bus remains 64 bits wide, while Coppermine's is four times as wide.
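To see what the bus width means for throughput, here is a rough Python sketch of the theoretical peak. It assumes one transfer per core clock and ignores latency entirely, so the figures are upper bounds for illustration, not measured values. The function name and the 800MHz clock (the frequency of the CPUs we tested) are our choices.

```python
def peak_l2_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak bandwidth of a core-to-L2 bus in MB/s.

    Assumes one transfer per clock and no wait states (an idealization).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz

# Both caches run at the full core clock; only the bus width differs.
thunderbird_l2 = peak_l2_bandwidth_mb_s(64, 800)
coppermine_l2 = peak_l2_bandwidth_mb_s(256, 800)

print(f"Thunderbird: {thunderbird_l2:.0f}MB/s, Coppermine: {coppermine_l2:.0f}MB/s")
print(f"Ratio: {coppermine_l2 / thunderbird_l2:.0f}x")
```

At equal clocks the ratio is simply the ratio of the bus widths, which is why the unchanged 64bit bus matters so much.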
Another tangible problem for AMD Thunderbird is mainboards. Socket A mainboards based on VIA Apollo KT133 are just starting to come out and are still hard to get. As for the Slot A Thunderbird, AMD has stunned everybody by saying that this processor doesn't work with the VIA Apollo KX133 chipset. To clear this matter up, we tested several mainboards to find out whether they were compatible with the AMD Athlon 700 on the Thunderbird core. Strange as it may seem, most of the boards tested had no problems with this processor. Here is the list:
- ABIT KA7 (VIA KX133)
- ASUS K7M (AMD 750)
- ASUS K7V (VIA KX133)
- Gigabyte GA-7VX (VIA KX133)
- MSI MS-6195 (AMD 750)
- SOYO SY-K7AIA (AMD 750)
- Tyan S2380 (VIA KX133)
Only two boards failed to boot with Thunderbird 700:
- EPoX EP-7KXA (VIA KX133)
- SOYO SY-K7VIA (VIA KX133)
Four of the seven mainboards that worked were based on VIA Apollo KX133 and ran the new Athlon CPU without any problems. This doesn't do AMD's image any good. Unfair marketing aimed at pushing the Socket A form factor into the market at any cost does not show the company in a positive light. We have already touched upon this topic in our VIA Apollo KX133 Based Athlon Mainboard Roundup. So, we won't be surprised at all if one day we see Slot A-to-Socket A converters on the market, which, according to AMD, are impossible to develop.
And now let's finally move on to the benchmarks. The test system was configured as follows:
- "Old" and "new" AMD Athlon 800 CPUs, Intel Pentium III 800EB CPU;
- Chaintech 6ATA4, ASUS K7V and Gigabyte 7ZM mainboards;
- 128MB PC133 SDRAM by Micron;
- Creative 3DBlaster Annihilator Pro graphics card;
- IBM DJNA-372200 HDD;
- Creative SoundBlaster Live! sound card.
As you see, the benchmarks were run on systems built on different mainboards, namely on different VIA chipsets with the same architecture: Apollo Pro133A, KX133 and KT133. We did this to reduce the influence of the chipsets and to give you a better idea of the differences between the processors.
First let's take a look at performance in synthetic benchmarks, and then in real-life applications.
This test shows the performance of the CPU integer unit and data processing speed. Thanks to its faster L2 cache, Thunderbird looks more attractive than Pentium III, surpassing it by over 3%. The results also show that raising the L2 cache frequency to the core frequency has a really positive effect on performance, even though the cache is now half the size.
This benchmark shows the "pure" performance of the floating-point unit, since all the data it needs fits into the CPU's L1 cache. The results confirm that Thunderbird has the same core as the old Athlon CPU: working at the same frequency, both CPUs proved equal.
This benchmark, included in 3DMark2000, shows theoretical CPU performance in typical 3D gaming scenes where SSE and 3DNow! SIMD instructions are used heavily. As you can see, Intel Pentium III is the clear leader here; however, this is more likely due to the CPU's more advanced L2 cache than to a better implementation of SIMD instructions.
And now we move on to real applications. According to Content Creation Winstone 2000, AMD CPUs manage to leave their Intel rival a bit behind. However, the new Athlon proved only 5% faster than its older counterpart.
This benchmark is also based on office applications. But SYSmark 2000 contains applications where Intel CPUs have always been better than AMD's. There are two main reasons for that. Firstly, SSE support is better implemented in Intel processors, and secondly, Intel CPUs work much more efficiently with L2 cache. But today the situation has finally changed: the new Athlon managed to take the lead and surpass the Pentium III working at the same frequency by 3 points.
Now let's take a look at AMD Athlon (Thunderbird) in games. As we have said several times, the fps rate at higher resolutions in Quake3 depends on the bandwidth of the buses connecting the main system components. Since the EV6 processor bus used in Athlon systems can transfer more data between the CPU and other parts, AMD processors show slightly higher results than Intel Pentium III.
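The bus advantage can be estimated with a short Python sketch. The 64bit data width of both processor buses and the single transfer per clock on the Pentium III bus are not stated in this review; they are figures we supply here, and the results are theoretical peaks, not measurements. The function name is ours.

```python
def peak_bus_bandwidth_mb_s(clock_mhz: float, width_bits: int,
                            transfers_per_clock: int) -> float:
    """Theoretical peak bandwidth of a processor bus in MB/s."""
    return clock_mhz * transfers_per_clock * width_bits / 8

# EV6: 100MHz, data on both clock edges (DDR), 64bit wide (assumed width).
ev6 = peak_bus_bandwidth_mb_s(100, 64, transfers_per_clock=2)
# Pentium III 800EB: 133MHz FSB, one transfer per clock, 64bit wide (assumed).
p3_fsb = peak_bus_bandwidth_mb_s(133, 64, transfers_per_clock=1)

print(f"EV6: {ev6:.0f}MB/s, Pentium III FSB: {p3_fsb:.0f}MB/s")
```

Keep in mind that both test systems used PC133 SDRAM, so the memory itself limits how much of the EV6 peak can actually be used.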
At lower resolutions, performance in Quake3 depends on a number of factors, which is why this test is a good indicator of average performance. Actually, this time there is hardly anything to talk about, because all the participants came out very close to each other. Nevertheless, Thunderbird was just a bit faster.
The performance differences are minimal. Thunderbird turned out only 1% faster than Intel Pentium III and only 2% faster than the old Athlon. In fact, we think the results in this test could have been different if we had used a different mainboard in our test system, for instance.
Here the situation repeats the previous case.
Expendable is a game that operates on huge data blocks. That is why the bandwidth of the EV6 bus used in AMD CPUs matters greatly here, and so does the faster L2 cache of the Thunderbird, which helps it take the lead over the old Athlon as well.
Well, the picture is the same, though this time the new Athlon is 2-3% faster than the old one.
Unfortunately, we can't estimate whether the new Athlon processor has good overclocking potential. The EV6 bus of these CPUs doesn't allow the FSB frequency to be raised by much. Data is transferred on both clock edges, which is why even a slight frequency increase causes serious stability problems for the entire system. For example, we couldn't get past 105MHz with our Thunderbird.
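For a sense of scale, here is the arithmetic as a small Python sketch. It assumes the sample in question is the 800MHz model from our test system, which gives a multiplier of 8 on the 100MHz bus. The multiplier is derived here rather than quoted from AMD, and the names are ours.

```python
STOCK_FSB_MHZ = 100
STOCK_CORE_MHZ = 800  # assumed: the Athlon 800 sample from the test system

# With a locked multiplier, the core clock scales only with the FSB.
MULTIPLIER = STOCK_CORE_MHZ // STOCK_FSB_MHZ

def overclocked_core_mhz(fsb_mhz: int) -> int:
    """Core clock reached at a given FSB with the multiplier unchanged."""
    return fsb_mhz * MULTIPLIER

best_core_mhz = overclocked_core_mhz(105)
gain_percent = (best_core_mhz - STOCK_CORE_MHZ) * 100 / STOCK_CORE_MHZ

print(f"{best_core_mhz}MHz, a gain of {gain_percent:.0f}%")
```

Under these assumptions, FSB overclocking alone yields a gain of about 5%.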
As for the mainboards that allow changing the clock multiplier, which have been the talk of the town recently, you will also be disappointed to hear that, according to AMD, this function will be locked in production processors.
Unfortunately, we have to state that Thunderbird is nothing special in terms of core architecture, marketing or performance. Then again, AMD didn't aim at conquering Olympus with its new CPU: it is just a logical continuation of the Athlon family.
Thunderbird doesn't feature any innovations. AMD engineers simply changed the size, the location and the frequency of the L2 cache. That's why we saw only a very small performance gain (or even none) compared to the old Athlon. As we have already said, the processor architecture remained unchanged; even the L2 cache bus wasn't widened, which ultimately hurt performance and prevented the L2 cache from showing its best.
As far as AMD's future plans are concerned, the company isn't going to modify this bus until the launch of Mustang, which will feature a large on-die L2 cache. So the only thing that may help Athlon systems perform better in the near future is DDR SDRAM memory, which doesn't seem far away. The launch of the first Athlon chipset supporting DDR SDRAM, AMD 760, is planned for autumn. Well, can't wait to see it!
In the meantime, the question of Pentium III vs Athlon remains unanswered, and only personal likes and dislikes will influence the users' choice.