Conroe and EM64T: Is There a Problem?

For the past 10 days there has been a lot of discussion on the Internet claiming that the new Intel processors with Core microarchitecture cannot deliver the same high performance in 64-bit modes. But is that really true? Find out in our detailed review today!

by Ilya Gavrichenkov
07/24/2006 | 12:28 PM

The first wave of excitement that followed the publication of the first benchmark results for the new Intel processors on Core microarchitecture has already calmed down. We have seen that the Core 2 Duo and Core 2 Extreme processors currently offer the highest level of performance and that Intel truly deserves to be called the manufacturer of the fastest x86 processors available today. The launch of the new CPUs, also known as Conroe, pushed the formerly popular Athlon 64 X2 processors into the inexpensive mainstream segment, thus changing the traditional market layout. The new Core 2 Duo turned out to be a true breakthrough compared with its Pentium D predecessors.

We have already discussed the details and peculiarities of the new Intel processor family in our previous materials:

At first glance it looks like the only drawback of the new processor family from Intel is its high price, especially since excellent performance is not the only advantage of the Core 2 Duo processors: they also boast comparatively low heat dissipation and power consumption as well as significant overclocking potential. However, in the choir praising the newcomer there are a few voices trying to pinpoint drawbacks that may, in theory, slightly spoil its triumph in the market. One of the most insistent claims is that the new Intel processors, which have defeated all their competitors in today’s most widespread benchmarks, will not be able to repeat their success in 64-bit mode.

I don’t think I need to remind you that Core microarchitecture also supports the 64-bit extensions known as Extended Memory 64 Technology (EM64T). This makes Core 2 Duo the first CPU supporting x86-64 in the evolutionary chain of Pentium III, Pentium M, Core Duo and Core 2 Duo. In other words, the implementation of EM64T in the new processors is the first attempt by Intel’s Israeli engineering team to introduce 64-bit extensions into traditional 32-bit processors.

You shouldn’t underestimate the importance of the EM64T implementation, though. It did not really matter for the NetBurst based processors, since the 64-bit infrastructure had not yet developed well enough in the desktop segment. However, by the time Core 2 Duo processors reach the mass market, the situation should change dramatically. In Q1 next year we will all see the new Windows Vista operating system. One of its key features will be native support for the 64-bit AMD64 and EM64T extensions. Although Microsoft will also offer a 32-bit version of its promising OS, we expect much higher performance from the 64-bit Vista version, of course. The release of the new operating system is expected to become a powerful catalyst for the transition from 32-bit to 64-bit architecture. Moreover, the need for more than 4GB of RAM will also stimulate this transition.

That said, 64-bit operating systems from the Windows family have already started their invasion. The currently available Windows XP Professional x64 Edition can already work with 64-bit applications and 64-bit processors supporting AMD64 and EM64T. It is still not widespread, however, because there are not many 64-bit applications available yet and, besides, this 64-bit version is nothing more than a slightly modified 32-bit Windows XP Professional, which is so far more than enough to satisfy most user needs. In other words, contemporary users do not feel the need to switch to 64-bit despite all the advantages of AMD64 and EM64T, such as linear addressing of over 4GB of memory and higher performance thanks to more general-purpose registers of greater width. The practical value of these advantages is still quite hard to detect, because there are not many desktop applications on the market that run noticeably faster on CPUs supporting AMD64 or EM64T, or that would use over 2GB of RAM, which is the maximum that 32-bit operating systems allocate to a single process.

Note that from the microarchitectural standpoint, it is not that hard to implement 64-bit extensions of the classical x86 architecture. x86-64 requires more general-purpose registers (16) of greater width (64 bits), more 128-bit SSE registers (16) and linear 64-bit addressing. Of course, CPU developers need to apply some effort to implement x86-64 support properly. However, they do not need to change the architecture radically, which is an indisputable advantage of x86-64 over IA64, for instance, which was introduced in Intel Itanium solutions.

That is why we would expect the relative performance of CPUs with AMD64 and EM64T in 64-bit modes to be not too different from that in 32-bit Windows XP. Therefore, most of the contemporary test sessions of the new Intel Core based processors were conducted in Windows XP Professional SP2. However, the emerging reports about the issues Core 2 Duo and Core 2 Extreme have in 64-bit work modes inspired us to resume our performance analysis and investigation ASAP.

Today we are going to share more benchmark results with you and draw some conclusions about the performance of the new Intel processors in a 64-bit OS with 64-bit applications.


EM64T in Core 2 Duo: What’s the Theory?

All the claims of relatively low Core 2 Duo performance in 64-bit modes are based on two facts. According to information confirmed by Intel representatives, there are two limitations on EM64T support in Core microarchitecture. Firstly, Core 2 Duo processors do not support Macrofusion technology in 64-bit mode. Secondly, instruction decoding may slow down because of the instructions that work with the additional registers available only with EM64T enabled. Let’s try to get to the roots of these two problems.

Thanks to Intel’s marketing people, Macrofusion is known as one of the key features of the new Core microarchitecture. This technology serves to increase the number of instructions processed per clock cycle. Namely, the processor recognizes certain pairs of adjacent x86 instructions and treats them as a single microinstruction. A good example of such a pair is a comparison followed by a conditional branch. The scheduler and the execution units see this microinstruction as a single command and process it accordingly. This way the code is processed faster, allowing the CPU to execute up to 5 instructions per clock cycle at best.

However, the loss of Macrofusion in 64-bit mode can hardly affect CPU performance that dramatically. Ideally, when there is one branch per five x86 instructions and all five instructions fit into the 16-byte block processed within a single clock cycle, the theoretical speed-up is 25%, because five instructions are handled in the time normally needed for four. In reality, this technology delivers a steady performance improvement only if a whole set of conditions is met, not least because the branch frequency described above is not realistic at all. Moreover, Macrofusion is really efficient only if the average instruction length is less than 4 bytes. As a result, the engineers estimate the possible improvement at 3%-5% at most. In other words, the absence of Macrofusion support in EM64T should be no reason for panic, because it doesn’t really affect performance that much.
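The arithmetic behind the 25% figure can be sketched in a few lines of Python. This is only an idealized model under our own assumptions: the function name and the `fusable_every` parameter are hypothetical, and the only thing modeled is that one compare-and-branch pair per group of instructions shares a single slot. The value of 20 used in the loop is an illustrative guess, not a measured branch frequency.

```python
# Idealized upper bound on the Macrofusion gain (illustrative model only).
# Assumption: exactly one fusable compare+branch pair occurs in every group
# of `fusable_every` x86 instructions, and nothing else limits throughput.

def macrofusion_gain(fusable_every: int) -> float:
    """Relative throughput gain when one pair per group shares a slot."""
    slots_without_fusion = fusable_every      # one slot per instruction
    slots_with_fusion = fusable_every - 1     # the fused pair takes one slot
    return slots_without_fusion / slots_with_fusion - 1.0

# 5 is the ideal case from the text; 10 and 20 are hypothetical frequencies.
for every in (5, 10, 20):
    print(f"one fusable pair per {every:2d} instructions: "
          f"{macrofusion_gain(every):.1%} ideal gain")
```

The ideal case of one branch per five instructions gives the 25% mentioned above, and the gain shrinks quickly as fusable pairs become rarer, which is consistent with the much more modest real-world estimate.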

As for the overall slowdown caused by instructions working with the additional registers, it results from the single-byte REX prefix that is added to 64-bit operations. This prefix increases the average length of the instructions processed by the CPU in 64-bit modes. As a result, fewer instructions may fit into the 16-byte code block from the L1 cache that is decoded in a single clock cycle. The average instruction length in x86 code is about 2.5-3.5 bytes, while in 64-bit mode it increases because of the REX prefix. When the average instruction length exceeds 4 bytes, the CPU may lose its ability to process 4 instructions per clock.
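To make the REX prefix more concrete, here is a small Python sketch. It relies on the documented x86-64 encoding, in which a REX prefix is a byte of the form 0100WRXB, and it compares the machine code of a 32-bit register move with its 64-bit counterpart. The helper name `decode_rex` is our own, and the sketch assumes the bytes are interpreted in 64-bit mode.

```python
# REX prefix sketch. Assumes 64-bit mode, where bytes 0x40-0x4F act as REX
# prefixes (in 32-bit mode the same byte values are one-byte INC/DEC opcodes).

def decode_rex(byte):
    """Return the W, R, X, B flag bits if `byte` is a REX prefix, else None."""
    if byte & 0xF0 != 0x40:
        return None
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModRM reg field (reaches r8-r15)
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends the ModRM r/m, SIB base or opcode reg field
    }

MOV_EAX_EBX = bytes([0x89, 0xD8])        # mov eax, ebx
MOV_RAX_RBX = bytes([0x48, 0x89, 0xD8])  # mov rax, rbx: REX.W + same opcode bytes

extra = len(MOV_RAX_RBX) - len(MOV_EAX_EBX)
print("REX flags:", decode_rex(MOV_RAX_RBX[0]))
print("extra bytes from the prefix:", extra)
```

The two instructions differ only in the leading prefix byte, which is exactly the one-byte growth per instruction that the paragraph above refers to.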

To be fair, we should say that the increase in instruction length caused by the REX prefix affects not only Intel CPUs on the new Core microarchitecture, but also the competitor’s K8 processors. The only difference is that K8 needs at most 3 instructions from this 16-byte block to load its execution units to the full extent, while Core 2 Duo can process 4 instructions per clock cycle thanks to Intel Wide Dynamic Execution technology.
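The relationship between average instruction length and decode rate can be illustrated with a deliberately simple Python model. We assume only what the text states: a 16-byte block per clock, a decode width of 4 for Core and 3 for K8, and nothing else acting as a limit. The function name and the sample instruction lengths are hypothetical, and real front ends are far more complicated than this.

```python
# Toy front-end model: instructions per clock are capped both by the decode
# width and by how many average-length instructions fit into one 16-byte block.

FETCH_BYTES = 16  # size of the code block handled per clock, as in the text

def instructions_per_clock(avg_len_bytes: float, decode_width: int) -> float:
    """Upper bound on decoded instructions per clock in this simplified model."""
    return min(decode_width, FETCH_BYTES / avg_len_bytes)

# Hypothetical average instruction lengths, in bytes.
for avg_len in (3.0, 4.0, 4.5, 5.5):
    core = instructions_per_clock(avg_len, decode_width=4)
    k8 = instructions_per_clock(avg_len, decode_width=3)
    print(f"avg length {avg_len:.1f} bytes: Core {core:.2f}/clk, K8 {k8:.2f}/clk")
```

In this model the 4-wide design starts losing throughput once the average length passes 16 / 4 = 4 bytes, while the 3-wide design has headroom up to 16 / 3, or about 5.3 bytes. That is why longer 64-bit instructions could, in theory, hurt the wider core first.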

So, we don’t think that the EM64T implementation issues discussed above are all that serious for Core based Intel processors. Code that closely resembles regular 32-bit code is processed just a little slower on Core 2 Duo processors because Macrofusion does not work. As for the performance drop caused by the longer 64-bit instructions, the ability of the CPU to work with more registers of greater width should make up for the slowdown.

Therefore, we do not feel like dramatizing the drawbacks revealed in the 64-bit support of the new Intel microarchitecture, although they will have some influence on performance, of course. In order to avoid spreading panic, we suggest taking a closer look at the performance of Core 2 Duo and Core 2 Extreme processors in Windows XP Professional x64 Edition with 64-bit applications and comparing the results with what we see in the 32-bit Windows XP Professional environment.


Testbed and Methods

As we have already mentioned, we set two goals for today’s test session. First, we decided to check once again what performance improvement we could get from porting 32-bit applications to 64-bit architecture. Second, we decided to compare the implementation of 64-bit extensions in AMD Athlon 64 X2, Intel Pentium D and the new Intel Core 2 Duo. For our test session we selected the fastest and most expensive CPUs from the corresponding families, the ones positioned for hardware enthusiasts and gamers, although slower models would have served the purpose just as well. As a result, our test platforms featured the following hardware components:

The tests were performed with the mainboard BIOS set up for maximum performance.


Performance: What’s the Practical Result?

Although quite a bit of time has passed since the release of Windows XP Professional x64 Edition, native 64-bit applications are still not widespread. Every time we set out to measure the 64-bit performance of x86-64 processors, we have to find suitable applications that allow us to measure system performance and that exist in both 32-bit and 64-bit versions. So, you shouldn’t be surprised that there are so few applications in this test session.

A very well-known utility that contains both 32-bit and 64-bit code is SiSoftware Sandra 2007. Depending on the OS version, this program uses either its 32-bit or its 64-bit core. As a result, we could use a few small synthetic tests from the Sandra 2007 suite to compare the performance of Core 2 Extreme X6800, Athlon 64 FX-62 and Pentium Extreme Edition 965 in 64-bit modes against their performance in 32-bit modes.

The results we obtained are very diverse, so we cannot really draw any definite conclusions. Namely, the CPU with Core microarchitecture loses a lot of its speed when switching to 64-bit mode in the ALU test, while Pentium Extreme Edition 965 and Athlon 64 FX-62 get slightly faster. In the arithmetic SSE3 test all CPUs work faster in the 32-bit version of the application. The only subtest showing the advantages of the 64-bit mode is Multimedia Floating Point. This is where Core 2 Extreme X6800 gains the most in 64-bit mode, speeding up by well over 40%.

However, we wouldn’t really focus on these results. The thing is that the 32-bit and 64-bit test versions within the SiSoftware Sandra 2007 suite use different algorithms based on different sets of instructions. Therefore, these results cannot serve as a basis for far-reaching conclusions.


Let’s take a look at the results obtained in a few other applications. We will start with ScienceMark 2.0. This test estimates system performance in typical mathematical modeling of physical processes.

The first test, Molecular Dynamics, models the thermodynamic state of a substance using molecular dynamics methodology. As we see, the algorithms used in this case work much faster in 64-bit mode. We see the highest performance gain from the K8 processor, which gets 2.7 times faster. As for Core 2 Extreme X6800, it boasts a slightly lower result, speeding up by “only” 111%.
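Since the two gains above are quoted in different units, a short Python sketch may help to put them on the same scale. It assumes that “2.7 times faster” means a 64-bit to 32-bit speed ratio of 2.7 and that “speeding up by 111%” means a gain of 111% over the 32-bit result. Both readings, as well as the helper names, are our interpretation rather than something the test itself reports.

```python
# Convert between a speed ratio ("N times faster") and a percentage gain.
# Assumption: a ratio of 1.0 means no change, so a gain of G% is a ratio 1 + G/100.

def ratio_to_percent_gain(ratio: float) -> float:
    return (ratio - 1.0) * 100.0

def percent_gain_to_ratio(gain_percent: float) -> float:
    return 1.0 + gain_percent / 100.0

k8_ratio = 2.7                                # Athlon 64 FX-62, from the text
core_ratio = percent_gain_to_ratio(111.0)     # Core 2 Extreme X6800, from the text

print(f"K8:   x{k8_ratio:.2f}  (+{ratio_to_percent_gain(k8_ratio):.0f}%)")
print(f"Core: x{core_ratio:.2f}  (+{ratio_to_percent_gain(core_ratio):.0f}%)")
```

Under these assumptions the K8 gain corresponds to roughly +170% against +111% for Core, so both processors more than double their speed in this test.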

Another scientific test, Primordia, which calculates quantum atomic structures, also reveals the advantages of 64-bit architectures over 32-bit ones. In this case, however, the performance advantage is not as high as in the previous test. The maximum performance boost belongs to the Pentium Extreme Edition 965 processor, which works 57% faster in 64-bit mode. Athlon 64 FX-62 and Core 2 Extreme X6800 receive an almost identical performance boost from switching to 64-bit mode: 14%.

Thanks to the benchmarking tool built into one of the most efficient archiving utilities – 7-zip, we measured the data compression and decompression speeds for different CPU work modes. The results show that there is only one CPU that benefits from switching to 64-bit mode: it is AMD Athlon 64 FX-62. Intel processors with NetBurst as well as Core microarchitecture slow down with EM64T enabled.

During data decompression all the testing participants show some performance improvement from switching to 64-bit mode. However, the performance of the Core 2 Extreme X6800 processor improves by only 2.5%, which is far lower than the performance boost on Intel NetBurst and AMD K8 based solutions.

Video encoding into WMV format accelerates in 64-bit mode only on systems with NetBurst based processors. As for Core 2 Extreme X6800 and Athlon 64 FX-62, they work slower in the 64-bit version of Windows Media Encoder than in the 32-bit one. The difference is about 10%.

PDNBench is a test that reveals system performance in typical image editing tasks in a popular open graphics editor, Paint.NET 2.64. The results obtained in this test show a pretty significant performance boost provided by the AMD64 and EM64T extensions. According to the chart above, the CPUs get 25% to 42% faster, with the maximum performance improvement shown by the Intel Core based CPU.


Strangely enough, the results of the Mathematica computer algebra test show the advantage of EM64T over AMD64. Intel processors appear slightly faster in the 64-bit application version, while Athlon 64 FX-62 slows down by 13% in 64-bit mode.

The Cinebench test shows system performance in a professional 3D modeling application, Cinema 4D, which is available for both standard Windows XP and Windows XP Professional x64 Edition. The final rendering results suggest that the system built on Athlon 64 FX-62 benefits most from the 64-bit mode. In this case it gets 13% faster. The EM64T technology implemented in Core 2 Extreme X6800 and Pentium Extreme Edition 965 cannot boast the same efficiency, so Intel processors speed up by only 5%.

As for OpenGL operations, they work slower in 64-bit mode than in 32-bit mode, and the processor microarchitecture has nothing to do with it. It looks like the graphics card driver is not quite optimized yet for Windows XP Professional x64 Edition. This is one of the factors that prevent 64-bit operating systems from getting popular.

POV-Ray is a well-known 3D modeling system using ray tracing methods. Just like Cinema 4D, it is available for 32-bit and 64-bit platforms, so we couldn’t miss the opportunity to test our platforms in this application. The results once again show the victory of 64-bit processors: 64-bit rendering is faster than rendering in the regular 32-bit version of the program. Athlon 64 FX-62 speeds up by about 14%, while Core 2 Extreme X6800 speeds up by 4%. As for Pentium Extreme Edition 965, the 64-bit POV-Ray works slower on it than the 32-bit one.

Most contemporary 3D games supporting x86-64 use the advantages of 64-bit modes to ensure better image quality, so they are not well suited for today’s test session. Luckily, we managed to find one game that doesn’t improve the image quality in 64-bit mode (unless the appropriate graphics settings are changed). It is Unreal Tournament 2004, which will be our measuring tool for analyzing the performance of our testing participants with the 64-bit EM64T and AMD64 extensions enabled.

As we see, enabling the 64-bit extensions does not affect gaming performance that much. Athlon 64 FX-62 and Core 2 Extreme X6800 get only 1-2% faster, while Pentium Extreme Edition 965 gains more: a good 9%.


Conclusion

As we expected, nothing serious has happened. CPUs with Intel Core microarchitecture and EM64T technology work normally in 64-bit modes. No dramatic performance drop has been detected in most benchmarks.

Of course, there are a few applications where Core 2 Duo works slower in the 64-bit version than in the 32-bit one. Among them are Windows Media Encoder 9 and the 7-zip archiving tool, for instance. However, since the other testing participants have also lost some of their performance in these tasks, the problem most likely lies outside the microarchitecture. The EM64T technology of Core 2 Duo processors has a positive effect on performance in the majority of applications.


The diagram shows the performance increase (in percent)
for CPUs with Intel Core and AMD K8 microarchitecture
when we switch from 32-bit to 64-bit applications.

At the same time, I would like to point out that Athlon 64 processors appear to gain more from switching to 64-bit mode. The average performance improvement we have seen from Athlon 64 FX-62 equaled 16%, while Core 2 Extreme X6800 demonstrated only a 10% average performance boost. So there is a certain difference: AMD K8 gains 6 percentage points more in 64-bit mode than Intel Core. However, this difference cannot compensate for the 20% performance advantage of the Intel Core 2 Duo over the Athlon 64 X2 working at the same clock speed, which we have pointed out in our previous articles. Therefore, we will not change our conclusions about the performance of the new Intel processors, even keeping in mind the upcoming launch of the 64-bit Windows Vista OS family.
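The reasoning in the last paragraph can be checked with a quick back-of-the-envelope calculation in Python. It takes the three figures quoted above (16%, 10% and 20%) and assumes, purely for illustration, that the average 64-bit gains apply multiplicatively on top of the same-clock 32-bit advantage. That assumption and the variable names are ours, and this is not a measured result.

```python
# Back-of-the-envelope check of the conclusion (illustrative assumptions only).

k8_gain_64bit = 0.16      # average 64-bit gain of Athlon 64 FX-62, from the text
core_gain_64bit = 0.10    # average 64-bit gain of Core 2 Extreme X6800, from the text
core_lead_32bit = 0.20    # same-clock 32-bit advantage of Core 2 Duo, from the text

# Difference between the two gains, in percentage points.
gap_points = (k8_gain_64bit - core_gain_64bit) * 100

# Assumed model: 64-bit gains scale each CPU's 32-bit performance.
core_64 = (1 + core_lead_32bit) * (1 + core_gain_64bit)   # relative to K8 32-bit = 1
k8_64 = 1.0 * (1 + k8_gain_64bit)
core_lead_64bit = core_64 / k8_64 - 1

print(f"gap between average 64-bit gains: {gap_points:.0f} percentage points")
print(f"estimated same-clock Core lead in 64-bit mode: {core_lead_64bit:.1%}")
```

Under these assumptions the same-clock lead of Core shrinks from 20% to somewhere near 14% in 64-bit mode, so the larger 64-bit gain of K8 narrows the gap without closing it.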
