by Ilya Gavrichenkov
10/28/2007 | 09:04 PM
If you follow the processor market, you probably know that Intel is planning to launch its server processors from the Penryn family on November 12, 2007. These will be the first processors manufactured with the innovative 45nm process technology. Although these CPUs are yet another solution based on the widespread Core micro-architecture, they promise to be extremely interesting. The thing is that Penryn didn’t emerge as a result of a simple transition of the previous 65nm processor cores to the new manufacturing technology. Intel engineers improved and enhanced a lot of things in them to achieve higher performance without pushing up the clock frequency.
However, this time Intel followed AMD’s example of bringing new architectural solutions to the server segment first. Thus, in two weeks they will announce only Xeon processors known as Harpertown (4 cores; 12MB L2 cache; 50, 80 and 120W TDP) and Wolfdale-DP (2 cores; 6MB L2 cache; 40, 55 and 80W TDP). As for desktop systems, Penryn processors will unfortunately appear there only next year. Intel, however, made one exception for the wealthiest computer enthusiasts and decided to give them the opportunity to meet the new technologies before Christmas. In other words, the refreshed Xeon family is going to hit the stores together with a single 45nm quad-core LGA775 processor: Core 2 Extreme QX9650 (codenamed Yorkfield XE). The most impatient computer users will be able to purchase it for $999. This processor is also scheduled to be announced on November 12, but we have a great opportunity to get a little bit ahead of time and tell you about this new solution today.
When the Core 2 Extreme QX9650 processor took its place on our testbed, we were very excited. Of course, this is it: the long-awaited opportunity to get up close and personal with the fastest, newest and one of the most interesting solutions for computer enthusiasts. But unfortunately, testing this processor turned out to be not very exciting. The thing is that there is absolutely no competitor that we could compare it against at this time. And, of course, the Core micro-architecture, the innovations made to it and the new 45nm production technology ensure that it outperforms the previous solutions from all standpoints.
Intel keeps up a good tempo in replacing manufacturing technologies and processor architectures. As of today, they intend to offer a new micro-architecture every two years, and a year after each introduction the processors should be transferred to a new, finer production process with a few improvements. The Penryn family is actually the result of Merom, Conroe and Woodcrest evolution, and it is coming to the market about a year after the Core micro-architecture was first introduced. Closer to the end of next year we will be greeting absolutely new processors currently known as Nehalem.
Intel’s main competitor, AMD, failed to keep up with the rhythm Intel has set for new announcements and did not launch high-frequency quad-core desktop processors on the new K10 micro-architecture this year, although we had been expecting them. Instead, they are going to announce only new mainstream CPUs in the November-December timeframe, and those will not be able to compete with Intel Core 2 Extreme QX9650. That is why today we are going to discuss how the new Intel processor differs from the previous Intel solutions on the 65nm Kentsfield core.
The first desktop processor from the Penryn family, Yorkfield, looks quite ordinary. Its looks don’t give away the fact that there are two 45nm dual-core dies, aka Wolfdale, beneath the heat-spreader.
Core 2 Extreme QX9650 - on the left, Core 2 Extreme QX6850 - on the right
You shouldn’t consider the noticeable dot in the lower left corner of the CPU to be a distinguishing feature of the new processor. Intel introduced the same (or a similar, located in the upper left corner) marker some time ago for all of their quad-core CPUs. It indicates the features of the IHS (integrated heat-spreader) material and is used by the manufacturer for their internal purposes.
However, the bottom of the new processors has very specific distinguishing looks. We haven’t yet seen the electronic components arranged in such a way.
Core 2 Extreme QX9650 - on the left, Core 2 Extreme QX6850 - on the right
The CPU-Z diagnostic utility already knows 45nm Yorkfield processors and recognizes them correctly.
The screenshot above reveals all the main peculiarities of the new processor. Core 2 Extreme QX9650 is yet another quad-core CPU with the same working frequency as the recently reviewed Core 2 Extreme QX6850 (Kentsfield) manufactured with the previous-generation 65nm process. Its nominal clock frequency is also 3.0GHz, and it works with a 1333MHz bus and a 9x frequency multiplier.
However, unlike the predecessor, the new CPU features larger L2 cache with the total size of 12MB. Since the new Core 2 Extreme QX9650, just like the previous-generation quad-core processors, is physically built of two dual-core dies, its L2 cache consists of two 6MB halves. This is a similarity between Yorkfield and Kentsfield: both of them aren’t monolithic quad-core processors.
New SSE4.1 instructions support is another vivid advantage of the new Core 2 Extreme QX9650 processor, just like all the other CPUs from the Penryn family. Other than that Core 2 Extreme QX9650 seems to have no surprises up its sleeve for us at first glance.
To sum up everything I have just said, I would like to offer you a table with formal specifications of the new CPU. To make the presentation more illustrative, I would like to add the specs of the previous generation Core 2 Extreme QX6850 processor side by side:
Specification: Core 2 Extreme QX9650 / Core 2 Extreme QX6850
L2 cache: 2 x 6MB / 2 x 4MB
Enhanced Intel SpeedStep
Intel Virtualization Technology
SIMD instructions support
Number of transistors: 2 x 410 mln. / 2 x 291 mln.
Die size: 2 x 107 sq.mm / 2 x 143 sq.mm
In the previous part of our review we have briefly pointed out the main differences between the Core 2 Extreme QX9650 and its predecessor. Now it’s time to take a closer look at the innovations that the Penryn processor family will boast.
We will start discussing the peculiarities of the new processors with the new manufacturing process. It was the 45nm process that actually created Penryn CPUs. It allowed not only increasing the complexity of the die thanks to enhancements made to the processor’s functional units, but also allowed reducing the processor Vcore and heat dissipation and guaranteed the ability to increase the core frequency in the future.
The new production technology is also extremely interesting because Intel had to perform a tremendous amount of research for it. As a result, classical materials (such as silicon oxide), which had been used since the 1960s to manufacture transistors, were replaced with absolutely new ones (a hafnium-based compound). The new 45nm transistors use a metal gate instead of polysilicon and a hafnium-based high-k dielectric.
These changes in the design of the semiconductor elements help solve a few acute problems at the same time. The new 45nm process almost doubles the transistor density on the die. Besides, it increases the transistors’ switching speed by about 20% and reduces the power needed for switching by about 30%. The new materials also allow reducing leakage significantly: source-drain leakage power by a factor of 5 and gate oxide leakage power by a factor of 10.
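The density claim can be cross-checked against the transistor counts and die areas quoted in the specification table earlier in this review. The Python sketch below is only a back-of-the-envelope estimate: it assumes the per-die figures of 410 and 291 million transistors and 107 and 143 sq.mm, and the function and variable names are ours, not Intel’s.

```python
def transistor_density(transistors_mln: float, area_sq_mm: float) -> float:
    """Return transistor density in millions of transistors per sq.mm."""
    return transistors_mln / area_sq_mm


# Per-die figures taken from the specification table of this review.
yorkfield_die = transistor_density(410, 107)   # 45nm Wolfdale die
kentsfield_die = transistor_density(291, 143)  # 65nm Conroe die

density_ratio = yorkfield_die / kentsfield_die

print(f"45nm die: {yorkfield_die:.2f} mln transistors per sq.mm")
print(f"65nm die: {kentsfield_die:.2f} mln transistors per sq.mm")
print(f"Density ratio: {density_ratio:.2f}x")
```

The ratio comes out a little below 2, which agrees with the "almost doubled" wording. Keep in mind that the 45nm die also carries a larger share of dense cache cells, so this is a rough comparison rather than a pure process metric.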
I have to say that it would hardly be possible to get to 45nm production process without involving new materials. The thing is that older transistors with gate dielectric of silicon oxide lose their primary features when miniaturized for the sake of higher switch speed. In 65nm semiconductor devices, the dielectric layer became only 5 atoms thin already. Further thinning of this layer for even better performance of the semiconductor devices would have made it unable to keep the electrons outside. The new high-k dielectric solves this problem: it allows increasing the gate dielectric layer thickness without losing any of the transistor speed.
As for replacing the polysilicon gate with a metal one, this was a partially forced measure. It was necessary because the hafnium-based dielectric turned out to be incompatible with the old gate material on the quantum level, which led to lower transistor performance.
Just like before, they manufacture 45nm processors using copper interconnects, 300mm wafers and 193nm lithography. So, the new production process didn’t require Intel to replace most of its manufacturing equipment. This gives us hope that mass production of new Penryn processors shouldn’t hit against any significant obstacles and will deliver high chip yields.
Thanks to the new production technology Intel is planning on increasing the clock speeds of its Core 2 Quad processors up to 3.0GHz, and of Core 2 Duo – up to 3.33GHz within the next year, while their TDP will remain within the common 95W and 65W respectively. As for the frequencies of the top quad-core processor models for enthusiasts, that boast typical heat dissipation of 130W, they will hit 3.2GHz. Intel, however, is going to do one more thing in order to increase their performance even more: they will shift to faster 1600MHz bus.
The advantages of the new 45nm technology allowed Intel engineers to increase the number of processor transistors without making the CPU die any bigger. As always, this possibility was used first of all to increase the L2 cache. While the old 65nm dual-core Conroe processors featured a 4MB shared L2 cache, their 45nm Wolfdale successors got a 6MB cache at their disposal.
The area of the new CPU dies occupied by the L2 cache has finally become larger than any of the other processor functional units, which you can clearly see from the Wolfdale (half of Yorkfield) picture.
As a result, new quad-core Penryn processors will feature 12MB L2 cache: 6MB for each pair of cores. In other words, there will be no changes in the quad-core processor design with the transition to new production process. The core pairs are still located on different dies and will be exchanging data along the system bus and RAM.
Yorkfield processors have 50% larger cache with higher associativity level. They have 24-way set associativity L2 cache compared to 16-way associativity in the previous generation. As a result, Intel hopes to be able to use the L2 cache memory more efficiently and maintain fast data search in it.
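The relation between cache size and associativity is easy to work out. The sketch below assumes a 64-byte cache line, which the article does not state, and uses a helper name of our own. It computes the number of sets as size / (ways x line size) for the 6MB 24-way and the 4MB 16-way caches.

```python
MB = 1024 * 1024


def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache (64-byte lines assumed)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size is not a multiple of ways * line size")
    return sets


wolfdale_sets = cache_sets(6 * MB, ways=24)  # 45nm die, 6MB, 24-way
conroe_sets = cache_sets(4 * MB, ways=16)    # 65nm die, 4MB, 16-way

print("Wolfdale L2 sets:", wolfdale_sets)
print("Conroe L2 sets:  ", conroe_sets)
```

Under this assumption both caches have the same number of sets, so the extra 2MB would come entirely from the eight additional ways in every set. That would explain why the size and the associativity grew by the same 50%.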
Core 2 Extreme QX9650 - on the left, Core 2 Extreme QX6850 - on the right
However, cache latency measurements indicate that the larger L2 cache of the new processors is still a little bit slower.
Core 2 Extreme QX9650
Core 2 Extreme QX6850
L2 latency (CPU-Z)
L2 latency 64 byte stride
L2 latency 256 byte stride
L2 latency 512 byte stride
The cache memory of the new processors not only got larger but also acquired an additional function called enhanced cache line split load. This innovation should speed up the reading of misaligned data from the cache, namely data that should have fit into a single line but for some reason ended up split across two cache lines. The new function attempts to speculatively predict which data this might be and fetch it from the cache as fast as if it sat in one single line. Theoretically, this feature may speed up some applications that deal with data scans.
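To illustrate what a split load is, here is a small Python sketch that checks whether a memory access crosses a cache line boundary. It assumes 64-byte cache lines (the article does not give the line size), and the helper names are hypothetical. It models only the address arithmetic, not the speculative mechanism itself.

```python
LINE_SIZE = 64  # assumed cache line size in bytes


def lines_touched(address: int, size: int, line_size: int = LINE_SIZE) -> int:
    """Count how many cache lines an access of `size` bytes at `address` spans."""
    first_line = address // line_size
    last_line = (address + size - 1) // line_size
    return last_line - first_line + 1


def is_split_load(address: int, size: int, line_size: int = LINE_SIZE) -> bool:
    """True if the access straddles a cache line boundary."""
    return lines_touched(address, size, line_size) > 1


# A 16-byte (128-bit) load at offset 48 fits into bytes 48..63 of one line.
print(is_split_load(48, 16))
# The same load at offset 56 covers bytes 56..71 and spans two lines.
print(is_split_load(56, 16))
```

Loads of the second kind are the ones that the new function is supposed to speed up.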
Penryn processors, such as Yorkfield, may also boast newly expanded SIMD commands list. The new generation of Intel processors acquired support of SSE4.1 set that includes 47 new instructions. Nevertheless, despite a pretty large number of new instructions, they do not form a logical set, but are rather numerous additions to the already existing SIMD command sets. New instructions should as always help improve the new processors performance when working with 3D graphics, streaming video and in a number of scientific computational applications.
However, we will be able to really feel the positive effect of SSE4 support only after a while, when it becomes widely adopted in software. So far, only two applications use the new SSE4 instructions, both of them video encoders: DivX 6.7 and TMPGEnc XPress 4.4.
As for the main peculiarities of the new SSE4 instructions, they include vectored 32-bit integer multiply operations; 8-bit unsigned min/max operations, plus 16-bit and 32-bit signed and unsigned versions; features to improve the compiler’s ability to vectorize integer and single-precision code more efficiently, as well as video encode acceleration functions; floating-point dot product operation and streaming load instructions using the peculiarities of the processor cache-memory structure.
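For readers who want a feel for what some of these operations do, below is a pure Python emulation of three of them working on four-element vectors. It is only a sketch of the lane-wise semantics, not actual SIMD code. The function names are ours, and the article does not name the mnemonics; we believe the first two correspond to PMULLD and to the unmasked case of DPPS in the SSE4.1 set.

```python
MASK32 = 0xFFFFFFFF


def packed_mul32_low(a, b):
    """Lane-wise 32-bit integer multiply keeping the low 32 bits of each product."""
    return [(x * y) & MASK32 for x, y in zip(a, b)]


def packed_min_unsigned(a, b):
    """Lane-wise minimum of unsigned integers."""
    return [min(x, y) for x, y in zip(a, b)]


def dot_product(a, b):
    """Floating-point dot product of two short vectors."""
    return sum(x * y for x, y in zip(a, b))


ints_a = [1, 2, 3, 0x10000]
ints_b = [5, 6, 7, 0x10000]

# The last lane overflows 32 bits, so only the low part (zero) survives.
print(packed_mul32_low(ints_a, ints_b))
print(packed_min_unsigned(ints_a, ints_b))
print(dot_product([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]))
```

A processor with SSE4 support performs each of these operations on a whole 128-bit register with a single instruction, which is where the speed-up in optimized code comes from.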
Intel engineers didn’t just introduce new SIMD instructions support in their new processors, they have also worked on some of their functional units. As a result, they managed to significantly speed up integer and floating-point division and accelerate processing of those SSE instructions that deal with bit shuffling.
Fast division is performed in a special Penryn unit called Fast Radix-16 Divider. While the Radix-4 unit of the 65nm processors on Core micro-architecture could only calculate 2 quotient bits in a single pass of the iteration algorithm, the new unit can handle 4 bits per pass. As a result, Penryn processors can perform integer and floating-point division about twice as fast and work faster on square roots as well.
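The effect of retiring more quotient bits per pass can be shown with a simple model. The Python sketch below is a textbook-style restoring digit-recurrence divider, not Intel’s actual circuit (real dividers select quotient digits with lookup logic rather than repeated subtraction). The function name and the 32-bit operand width are our assumptions.

```python
def digit_recurrence_divide(dividend, divisor, width=32, bits_per_step=2):
    """Divide unsigned integers, producing `bits_per_step` quotient bits per pass.

    Returns (quotient, remainder, passes). bits_per_step=2 models a radix-4
    divider, bits_per_step=4 models a radix-16 divider.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit into the given width")

    digit_mask = (1 << bits_per_step) - 1
    passes = -(-width // bits_per_step)  # ceiling division
    quotient = 0
    remainder = 0
    for step in range(passes):
        # Bring down the next group of dividend bits, most significant first.
        shift = (passes - 1 - step) * bits_per_step
        remainder = (remainder << bits_per_step) | ((dividend >> shift) & digit_mask)
        # Find the quotient digit for this pass (always below the radix).
        digit = 0
        while remainder >= divisor:
            remainder -= divisor
            digit += 1
        quotient = (quotient << bits_per_step) | digit
    return quotient, remainder, passes


radix4 = digit_recurrence_divide(1000003, 97, bits_per_step=2)
radix16 = digit_recurrence_divide(1000003, 97, bits_per_step=4)
print("radix-4: ", radix4)
print("radix-16:", radix16)
```

Both calls return the same quotient and remainder, but the radix-16 variant needs half as many passes for a 32-bit operand, which matches the "about twice as fast" estimate.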
Intel engineers had to modify the SSE shuffle operations algorithms in order to implement new SSE4 instructions support properly. The new single-pass 128-bit shuffle unit called Super Shuffle Engine can shuffle bits of a 128-bit register in a single clock cycle. As a result, new processors can process SSE instructions that require bits shuffling twice as fast. Among these instructions are operands packing, unpacking, wide shifts, align concatenated sources, insertion and extraction.
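Two of the operations from this list are sketched below in plain Python to show what kind of data movement the shuffle unit has to perform. A 128-bit register is modeled as a list of 16 bytes with index 0 being the least significant byte. The function names are hypothetical and the code illustrates the semantics only.

```python
def align_concat(high, low, shift_bytes):
    """Concatenate two 16-byte values and extract 16 bytes starting at `shift_bytes`."""
    if len(high) != 16 or len(low) != 16:
        raise ValueError("both sources must be 16 bytes long")
    if not 0 <= shift_bytes <= 32:
        raise ValueError("shift must be between 0 and 32 bytes")
    # Zero padding covers shifts that run past the end of the 32-byte pair.
    combined = list(low) + list(high) + [0] * 16
    return combined[shift_bytes:shift_bytes + 16]


def unpack_low_bytes(a, b):
    """Interleave the low 8 bytes of two 16-byte values."""
    result = []
    for x, y in zip(a[:8], b[:8]):
        result += [x, y]
    return result


low = list(range(16))       # bytes 0..15
high = list(range(16, 32))  # bytes 16..31

print(align_concat(high, low, 4))
print(unpack_low_bytes(low, high))
```

Every output byte may come from an arbitrary input position, so a unit that can route all 16 bytes in one pass completes such instructions in a single step.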
Besides speeding up some of the instructions processing, they have also made some improvements to virtualization technology and interrupt masking mechanisms. As a result, Penryn processors can now boast two things: first, they can switch between virtual machines about 25%-75% faster and second, they can process CLI/STI instructions much faster, too.
Although our today’s article is devoted only to the first representative of the new desktop processor family, Core 2 Extreme QX9650, we tried to make our review useful not only for the potential buyers of this pretty expensive solution, but also for those of you who will be waiting for more affordable Penryn models. That is why we decided to include some information on Intel’s plans regarding Yorkfield and Wolfdale introduction in all other market segments.
The table below offers some info on the full 45nm lineup of Yorkfield and Wolfdale processors as well as some data on their pricing and anticipated launch schedule.
As you can see from the data above, Core 2 Extreme QX9650 will remain Intel’s top solution for a very short period of time. In Q1 2008 a newer model, Core 2 Extreme QX9770, with 3.2GHz clock speed and 1600MHz bus will take the lead. Intel is preparing a special X48 chipset to support this particular CPU model that is scheduled to be announced at the same time with the processor. This promising processor will not only push the price of the top desktop CPUs to $1,399, but will also boast the highest heat dissipation of 136W.
Non-extreme quad-core 45nm Yorkfield processors will be available starting at the beginning of next year in three modifications with frequencies ranging from 2.5 to 2.83GHz. Unlike contemporary Core 2 Quad on Kentsfield core, all of them will work with 1333MHz system bus. More frequency values will be available because processors manufactured with 45nm process support fractional multipliers. I would like to say that the new processors in the Core 2 Quad family with 45nm cores will not cause the price of the younger quad-core models to go down. The slowest Yorkfield processor, the Q9300, will cost as much as Core 2 Quad Q6600. It is interesting that the cache memory of Core 2 Quad Q9300, due in January 2008, will be reduced to 6MB, which is even less than the contemporary Kentsfield processors have.
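The arithmetic behind the new frequency grades is simple. The sketch below assumes that the 1333MHz bus is quad-pumped, which gives a base clock of about 333MHz, and that the fractional steps are half-multipliers. The specific multiplier values are our assumption, as the article does not list them.

```python
BUS_RATE_MHZ = 1333.33                 # effective bus rate
BASE_CLOCK_MHZ = BUS_RATE_MHZ / 4      # quad-pumped bus assumed


def core_clock_ghz(multiplier: float) -> float:
    """Core clock in GHz for a given multiplier, rounded to two decimals."""
    return round(BASE_CLOCK_MHZ * multiplier / 1000, 2)


# Multipliers from 7.0 to 9.0 in half steps (assumed values).
half_step_multipliers = [m / 2 for m in range(14, 19)]

for multiplier in half_step_multipliers:
    print(f"{multiplier:>4}x -> {core_clock_ghz(multiplier)} GHz")
```

With whole multipliers only, a 1333MHz bus would allow just 2.33, 2.67 and 3.0GHz in this range. The half steps add the 2.5 and 2.83GHz grades that mark the ends of the range mentioned above.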
The dual-core processor lineup will also welcome some new members at the beginning of next year. New Wolfdale processors will little by little oust the old Core 2 Duo CPUs from the market. While the old CPUs will cost the same as Wolfdale, their lower clock speeds and smaller L2 cache will not help them retain their popularity. The launch of the 45nm core will push the clock speed of the top dual-core processor to 3.16GHz. Besides, you will be able to buy a pretty fast CPU with 2.66GHz clock frequency for only $166.
Q2 2008 promises another very interesting addition to the Wolfdale processor lineup. Around that time Intel plans to launch a budget dual-core Penryn processor with 1066MHz bus and smaller 3MB L2 cache.
Note that the quad-core family lineup will not change in any way in Q2 2008. However, Q3 promises to be full of news. Right before the launch of the CPUs with new Nehalem micro-architecture, Intel will be increasing clock speeds of the top Core 2 Quad and Core 2 Duo processors one step higher.
Well, our today’s article is intended to reveal the features of the new Core 2 Extreme QX9650 – the first representative of the Penryn desktop processor family. As we have already said before, it only makes sense to let this processor compete against its predecessor, Core 2 Extreme QX6850, at this time. That is why our test session was planned and performed accordingly.
We used the following hardware components to assemble our test platforms:
I would like to say that we picked ASUS P5E3 Deluxe mainboard based on the new Intel X38 chipset as a basis for our test platform for a reason. Although all mainboards on Intel X38 and P35 chipsets should support new 45nm processors, not all of them feature proper BIOS updates at this time. For example, our traditional ASUS P5K3 Deluxe test platform couldn’t be used this time for this particular reason. However, you should understand that all proper BIOS updates will be available sooner or later, especially since there are still two weeks left before the official launch of the Intel Core 2 Extreme QX9650 CPU.
We very rarely use Sandra benchmarks for processor performance analysis. However, this time we decided to make an exception. Simple synthetic benchmarks from this testing suite, which do not depend dramatically on the size and structural peculiarities of the CPU’s cache-memory, may help us estimate the importance and efficiency of the micro-architectural improvements introduced in Penryn CPUs.
We see the largest 17% advantage in the integer multimedia test. This is a pretty illustrative moment as this test is the only one of all using new SSE4 instructions. In other words, this benchmark demonstrates very clearly that another expansion of the SIMD instructions set in the new Penryn processors can really improve the optimized applications performance if used wisely. We can also see a significant performance boost in the arithmetic floating-point benchmark, that Yorkfield processor owes to the new Fast Radix-16 Divider unit.
The practical memory bandwidth tests also reveal pretty noticeable performance improvement, although it is not as big this time. It should be the larger L2 cache that affects these results.
According to the results of Sandra XII, we can expect the new quad-core 45nm processors to demonstrate a significant advantage even when running at the same clock speed as their predecessors, thanks solely to the micro-architectural improvements discussed above. However, synthetic benchmark results are not enough to draw any far-reaching conclusions. These tests usually stress certain processor features more than others. Only tests in real tasks will give us an idea of the actual state of things. So let’s move on to them now.
SYSmark 2007 testing suite uses typical work scenarios in the most popular applications and tasks.
We do not see the same impressive advantage of the new Penryn processors in the real applications as we have just seen in synthetic tests. Average performance difference between Yorkfield and Kentsfield working at the same clock frequency doesn’t reach even 2%.
The new PCMark Vantage testing suite for general performance analysis uses a slightly different approach. Instead of launching fully-fledged applications it emulates the work of widespread algorithms.
The results of PCMark Vantage are very similar to what we have just seen in SYSmark 2007. The average performance difference between Core 2 Extreme QX9650 and Core 2 Extreme QX6850 is only 1.5%.
Games have always been among those applications that are very sensitive to the size of L2 cache memory. It seems to be the case this time also: 12MB L2 cache of the new Yorkfield processor did its job very well. In some games Core 2 Extreme QX9650 outperforms Core 2 Extreme QX6850 by almost 7%. The overall average advantage of the new processor in 3D games rests around 4%.
Most existing media codec versions do not yet support SSE4 instructions, although there are a few instructions in this set that can speed up codec performance significantly. That is why we expect to see a noticeable performance difference between the 45nm and the old 65nm processors during media content encoding in the future. The results obtained in DivX 6.7 are a perfect illustration of this statement: this codec version already knows how to perform an experimental SSE4 full search. So, the new processor supporting SSE4 turns out about 30% faster.
By the way, we performed some additional tests and found out that the performance boost in DivX 6.7 depends a lot on the type of the encoded movie. In our test fragment featuring a battle scene the Yorkfield processor was only 30% faster. However, the movie suggested by Intel, which consists mostly of water surface ripples, revealed an almost 70% performance boost in Yorkfield’s case.
We decided to include another four widely spread applications in our test session and had to single them out in this part of our article, as they do not belong to any of the previous sections.
The applications we have just shown illustrate perfectly that Yorkfield processors may provide much higher performance than Kentsfield even in those tasks where new SSE4 instructions aren’t supported. Other innovations, such as larger L2 cache and accelerated division seem to be doing the job just fine in this case.
Summing up our performance test results, let’s take another look at the performance improvement Core 2 Extreme QX9650 processor boasts compared with the predecessor, Core 2 Extreme QX6850. I would like to remind you that both CPUs work at identical clock speeds, that is why the performance advantage comes solely from the improvements in the micro-architecture.
The performance advantage of the new processor depends a lot on the type of applications we are looking at. However, despite this fact, it exists almost in every task. It is actually not surprising at all, taking into account that Core 2 Extreme QX9650 features larger L2 cache, new SSE4 instructions support and the whole bunch of other architectural innovations. Moreover, as SSE4 instructions find their way into applications, the advantage of the new 45nm processors over the old 65nm CPUs will keep growing even more. As for today’s state of things, Core 2 Extreme QX9650 proved on average 6.5% faster than Core 2 Extreme QX6850.
Intel positions Core micro-architecture not only as a high-performance solution. Intel also considers it highly efficient from the “performance-per-watt” perspective, which we have already seen before in our tests. However, Core 2 Extreme owners and potential buyers are not very much concerned about how economical these CPUs are, especially since, as solutions for enthusiasts, they boast a pretty high typical heat dissipation of 130W.
Nevertheless, it seems quite logical to expect 45nm cores and Penryn processors based on them to become more economical. On the other hand, Intel didn’t reduce the TDP for the new Core 2 Extreme QX9650 working at the same clock frequency as the Core 2 Extreme QX6850 on the old 65nm core. So, let’s take a closer look at the heat dissipation and power consumption changes that arrived with the new production technology and new processors.
For our tests we decided to compare the consumed power in identical platforms with different processors – Yorkfield and Kentsfield. During our tests we measured the DC power going through the processor voltage regulator circuitry that would allow us to estimate the processor power consumption without taking into account the efficiency of the onboard processor voltage regulator. During our tests the processors were loaded using Prime95 25.3 utility.
Besides, we have also measured the processor die temperatures to complete the picture. We used CoreTemp 0.95.4 utility to get the readings from the digital thermal sensors built into the CPUs. We used a standard boxed cooler throughout our test session.
We activated Enhanced Intel SpeedStep and Enhanced Halt State (C1E) power-saving technologies during our tests. By the way, Penryn processors drop their clock frequency multiplier to 6x in case of low CPU utilization.
The obtained temperatures and power readings are given in the table below:
As we expected, the new 45nm production process let Intel reduce the practical heat dissipation and power consumption of the new processors quite noticeably. As we have already said before, the new high-k dielectric transistors boast much lower leakage current. As a result, Intel could get the processor to function stably at lower voltages and currents.
Although Core 2 Extreme QX9650 and Core 2 Extreme QX6850 feature the same typical heat dissipation, in practice Yorkfield consumes about 30% less under maximum workload. In idle state the advantage is even greater: over 50%.
Of course, the power consumption differences affect the CPU temperature as well. Yorkfield processor doesn’t warm up as much as the predecessor, which gives us some hopes for significant overclocking success. So, let’s check it out now.
We used the same platform for our overclocking experiments as we used for performance tests. However, instead of the standard boxed cooler, we installed a much more efficient Scythe Infinity with a 120-mm fan rotating at 1,800rpm.
The nominal Vcore of our processor was 1.2V and we decided to attempt overclocking it without increasing this voltage setting.
Intel traditionally removes the overspeed protection from expensive processors for computer enthusiasts. Core 2 Extreme QX9650 was no exception. So, we managed to get this CPU to run stably at 3410MHz, obtained as 10 x 341MHz. Unfortunately, at higher frequencies the CPU failed to pass the Prime95 stability test.
Anyway, quad-core processor overclocking to 3.4GHz without raising its Vcore is a very decent result. Although some Kentsfield processors with G0 core stepping could do something like that, we shouldn’t disregard their higher nominal voltage.
The second part of our tests was performed with the CPU Vcore increased to 1.5V. I would like to point out that Yorkfield’s lower heat dissipation does help a lot during overclocking: even with the voltage raised significantly the tested quad-core processor didn’t get overheated and didn’t fall into thermal throttling.
We managed to overclock Core 2 Extreme QX9650 to the maximum frequency of 4068MHz without losing stability. It was obtained as 12 x 339MHz.
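For reference, the overclocking results can be put into perspective with a few lines of Python. The sketch takes the multipliers and bus clocks reported in this section and compares the resulting frequencies with the 3000MHz nominal clock. The function name is ours.

```python
NOMINAL_MHZ = 3000  # nominal clock of Core 2 Extreme QX9650


def overclock(multiplier, base_clock_mhz, nominal_mhz=NOMINAL_MHZ):
    """Return (frequency in MHz, gain over nominal in percent)."""
    frequency = multiplier * base_clock_mhz
    gain_percent = (frequency / nominal_mhz - 1) * 100
    return frequency, gain_percent


for label, multiplier, base_clock in [
    ("nominal Vcore", 10, 341),
    ("1.5V Vcore", 12, 339),
]:
    frequency, gain = overclock(multiplier, base_clock)
    print(f"{label}: {frequency} MHz, +{gain:.1f}% over nominal")
```

So the CPU gained roughly 14% without any voltage increase and about 36% at 1.5V.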
So, the new processors can easily exceed the 4GHz bar with an air cooling system. Isn’t that great evidence of how significant the frequency potential of the 45nm cores is?
By the way, our test CPU running at 4068MHz remained within acceptable thermal range all the time. Its maximum temperature didn’t exceed 85°C, and its minimal temperature in idle mode equaled 43°C.
So, Intel made another strong evolutionary move forward by preparing the launch of the new Penryn processors with 45nm cores. Mastering the new production technology along with introducing new SSE4 instructions, increasing the L2 cache and making other micro-architectural improvements allowed them to raise the processor performance by another few percent without increasing the clock frequency. Moreover, at the same time Intel engineers managed to reduce the new processors’ heat dissipation and power consumption, although even their predecessors were quite economical already. The frequency potential has also grown a lot, which will please numerous overclocking fans.
As a result, Penryn processors make an extremely good impression overall: they are far better than any of the currently existing desktop solutions from all standpoints. In fact, the only drawback of the newcomers is their absence in stores at this time. And the only processor that you will be able to purchase this year will be the fastest and unjustifiably expensive Core 2 Extreme QX9650.
The complete Yorkfield and Wolfdale lineups are expected to reach the market only at the beginning of next year. But at that time they will have to compete not only with their obviously slower predecessors, but also with the new AMD CPUs. We don’t know what the outcome of this rivalry will be; however, we don’t have even the slightest doubt that Intel is armed and ready for the launch of the competing Phenom processor family.
In conclusion I would like to point out one curious fact. Although for almost one and a half years, since the first CPUs on Core micro-architecture appeared, we have been talking about the good frequency potential of Core 2 cores, Intel is in no hurry to raise the clock speeds of its new processors. The Core 2 Extreme X6800 processor launched in July 2006 worked at 2.93GHz, and the new Core 2 Extreme QX9650, launching officially in two weeks, will work at the nominal speed of 3.0GHz. All this time the performance of Intel processors for enthusiasts increased entirely thanks to other things: more processor cores, a faster system bus, micro-architectural enhancements. Therefore, we get the feeling that Intel holds back the clock speeds of its CPUs on purpose, because there is simply no competition in the high-end market these days. Next year will show if this supposition is true, when AMD hopefully releases some new solutions to oppose the fast quad-core Intel CPUs. Anyway, the year 2008 promises to be extremely exciting.