Intel Sandy Bridge Microarchitecture Preview

In a few weeks we will start posting reviews of Intel processors based on the new Sandy Bridge microarchitecture. But while we are still bound by the NDA, we decided to sum up all the information about these promising new products that does not fall under the NDA.

by Ilya Gavrichenkov
12/27/2010 | 02:49 PM

Every time a new Intel processor family arrives, the “tick-tock” strategy comes up. Its principle is fairly simple: the company alternates between upgrading the production process and upgrading the microarchitecture, and a full swing of this symbolic pendulum takes about two years. In other words, if Intel introduced the Nehalem microarchitecture in late 2008 and started manufacturing 32 nm Westmere processors in early 2010, then 2011 brings the next cycle: a new microarchitecture. Today we know it as Sandy Bridge, and this name has been on the front pages of all technology media for a long time because it is of enormous interest to all of us. There is nothing surprising about that: each new step on Intel’s evolutionary spiral delivers both significant changes to the PC platform structure and a substantial performance boost. CPUs with the Sandy Bridge microarchitecture will obviously be no exception to this rule: according to Intel CEO Paul Otellini, the launch of these processors will have the same tremendous effect as the transition from 80486 CPUs to the Intel Pentium did back in the day.

 

Taking all of the above into account, we decided to start covering the upcoming launch well in advance. A good way to introduce Sandy Bridge is a description of the upcoming microarchitecture with our own commentary, pointing out the specific features that may matter to PC users. In other words, we will use the available preliminary information to explain why the upcoming Sandy Bridge processors deserve attention and why their official launch is worth looking forward to.

Before we dive deep into the technical details, we would like to share our thoughts on Intel’s determination to replace the existing Nehalem microarchitecture. As things stand, CPUs based on it are very fast and enjoy high demand in the market. Moreover, Intel’s primary competitor, AMD, cannot offer a worthy rival to Nehalem anyway: the existing CPUs on the Stars microarchitecture are considerably slower, and the upcoming Bulldozer products, which will hopefully be more successful than their predecessors, are still a while away. Despite all these advantages, however, Nehalem does have a few hidden drawbacks that push Intel engineers to keep working hard on new products even without obvious external pressure.

The first is the complex manufacturing process of the latest-generation Westmere CPUs. These processors consist of two semiconductor dies produced with two different manufacturing technologies and sealed inside a single processor package. The contemporary 32 nm process has matured enough to allow very complex monolithic processor dies with high production yields. In other words, there is nothing hindering further integration and growing complexity of the semiconductor dies, which should also bring production costs down substantially. The second drawback is that Nehalem processors have already reached their frequency ceiling: it is extremely difficult to push their clock speeds any higher without driving power consumption and heat dissipation beyond acceptable limits. That means it is time to find new ways of increasing performance, namely by improving the microarchitecture.

So the upcoming Sandy Bridge launch is by no means a whim, but a pragmatic move that will make Intel processors not only faster and more feature-rich, but also more profitable to manufacture.

General Information

It was clear more than two years ago that Intel was going to gradually move chipset functions into the CPU. The first CPUs on the Nehalem microarchitecture, Bloomfield, acquired an integrated memory controller. The next generation, Lynnfield, added a PCI Express bus controller on top of that. After that, Clarkdale processors also received an integrated graphics core, which, however, was implemented as a separate semiconductor die inside the processor package. Sandy Bridge dots the last “i” in this integration process: CPUs built on the new microarchitecture have everything integrated into a single chip: processor cores, graphics core, memory controller and PCI Express bus controller.

Sandy Bridge semiconductor dies will measure about 225 mm². Thanks to the fine 32 nm production process, this die will be even smaller than that of the quad-core Bloomfield or Lynnfield, or even the six-core Gulftown CPUs.

Sandy Bridge offers no compromises when it comes to features. The CPUs will have two or four processor cores with Hyper-Threading support and up to 8 MB of L3 cache, include a dual-channel DDR3 memory controller, support 16 PCI Express 2.0 lanes, and feature a modern DirectX 10.1 graphics core. In other words, the new generation of processors has everything it takes to conquer different market segments, including the high-end ones.

The far-reaching integration in the Sandy Bridge microarchitecture inspired other significant improvements. The microarchitecture of the computational cores has been seriously reworked, and these improvements ensure that the new processors run considerably faster than their predecessors at the same clock speeds. Heat dissipation has also been lowered, so Sandy Bridge CPUs will be able to run at higher clock rates. Moreover, the new processors support the new AVX (Advanced Vector Extensions) instructions, which will benefit a number of multimedia, financial and scientific algorithms. AVX differs from the previous SSE vector instruction sets in its wider operands (256 bits instead of 128), so it allows processing larger amounts of data at a lower cost. Therefore, Sandy Bridge can be considered a significant step forward in several directions at once, which gives us every reason to speak very highly of this promising product.

By launching Sandy Bridge early next year, Intel expects these processors to quickly conquer most of the market’s price segments. At the very beginning of the year the company will offer a wide range of Core i3, Core i5 and Core i7 processors on the new microarchitecture priced from $100 to $300, and later in 2011 it will also introduce even less expensive modifications.

The available information suggests that the first group of Sandy Bridge based processors should be announced on January 5, 2011, and should hit the stores by January 9. On that day the company is planning to add the following new quad-core desktop CPU models to its price list:

Note that in addition to the processor models mentioned in the table above, corresponding mobile and energy-efficient modifications based on the Sandy Bridge microarchitecture will also appear. While desktop processors are certainly our primary focus, here are the Sandy Bridge based CPUs with 65 W, 45 W and 35 W TDP that will be coming out on January 5:

So the only price range that will still be occupied by Nehalem processors over the next year is the expensive LGA1366 segment with CPUs such as Bloomfield and Gulftown. They will most likely be replaced no sooner than the end of 2011, when Intel adapts its LGA2011 server platform for desktops. The special “loaded” Sandy Bridge E modifications offered as part of that platform will boast up to eight computational cores, 16 MB of L3 cache, a quad-channel memory controller, 32 PCI Express 2.0 lanes and other goodies that we can currently only dream of. However, that is still some way down the road. The first Sandy Bridge version will become the basis for a more down-to-earth, but still brand-new platform.

Although Sandy Bridge didn’t acquire any completely new units compared to Clarkdale, the new-generation processors will come to market as part of the LGA1155 platform. Unfortunately, it is incompatible with LGA1156, so the new processors will require new mainboards with the new processor socket.

Together with Sandy Bridge, a new chipset family will hit the market. Its leading members will be the Intel P67 chipset and the Intel H67 chipset, the latter enabling the processors’ integrated graphics. Just like the LGA1156 chipsets, the new P67 and H67 are very simple: now that the North Bridge functions have all migrated into the CPU, these chipsets consist of just one chip, a South Bridge with a pretty typical set of functions. Besides being compatible with Sandy Bridge, the new chipsets will support two 6 Gbps SATA ports.

Unfortunately, the new chipsets do not support USB 3.0, but most LGA1155 mainboards will undoubtedly offer the corresponding ports via additional onboard controllers. The same is true for the PCI bus: the absence of the corresponding controller inside the new chipsets doesn’t mean that traditional PCI connectors will disappear from mainboards.

Although there is still a little time left before actual processors with the Sandy Bridge microarchitecture and the LGA1155 platform come out, the already known facts allow us to draw fairly specific conclusions about the performance of the upcoming systems. For example, if we compare Sandy Bridge and Lynnfield processors with the same number of computational cores working at the same clock frequency, the new microarchitecture delivers 5-10% higher real-world performance.


Performance data courtesy of inpai.com.cn.
They compared quad-core processors at 3.4 GHz frequency.

At the same time, the power consumption of Sandy Bridge processors appears to be about 25% lower, i.e. the new CPUs have made substantial progress in performance per watt. And since the nominal clock speeds of Sandy Bridge processors are about 10% higher than those of comparable Lynnfield ones, we can conclude that the new platform as a whole will be at least 25% faster than the previous-generation LGA1156 platform. This number should serve as a reference point when estimating the practical value of the new microarchitecture, leaving aside such additional enhancements as the improved graphics core and the new AES-NI and AVX instruction support.

Where Does the High Performance Come From?

Quite a few unexpected (to say the least) microarchitectural changes allowed Intel engineers to increase the performance of their processors while lowering power consumption and heat dissipation. The thing is, Sandy Bridge is not just a further evolutionary development of the Nehalem microarchitecture: it also borrowed a number of ideas from the seemingly failed Pentium 4 project. Yes, even though Intel long ago gave up the NetBurst microarchitecture because of its energy inefficiency, some functional concepts from the Pentium 4 are now finding their way into the new Core i3, Core i5 and Core i7 CPUs. And it is especially ironic that these borrowings are used in Sandy Bridge not only to raise performance but also to lower power consumption.

We start noticing significant changes in the Sandy Bridge microarchitecture at the very beginning of the pipeline, where x86 instructions are decoded into simpler micro-ops. The decoder itself remained the same as in Nehalem: it processes 4 instructions per clock cycle and supports Micro-Fusion and Macro-Fusion, which make the output micro-op stream more even in terms of execution complexity. However, the decoded micro-operations are not just passed on to the next pipeline stage, but also cached. In other words, in addition to the regular 32 KB L1 instruction cache found in almost any x86 processor, Sandy Bridge also has an additional “L0” cache storing decoding results. This cache is the first flashback to the NetBurst microarchitecture: its general operating principles make it similar to the Execution Trace Cache.

The decoded micro-op cache is about 6 KB in size and can store up to about 1,500 micro-ops, which makes it of great help to the decoder. If the instructions being fetched have already been translated and their micro-ops reside in this cache, they are delivered from there directly, without being decoded again. This takes a big load off the decoder, which is a fairly power-hungry part of the CPU. According to Intel, the additional cache comes into play in about 80% of cases, which makes any suspicions about its inefficiency unjustified. Moreover, when the decoder in Sandy Bridge is idle, it is powered down, which lowers CPU power consumption substantially.
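To give a sense of what this cache targets, here is a conceptual C sketch (our own illustration, not from Intel documentation) of the kind of small, hot loop it is designed for: the loop body decodes into far fewer than ~1,500 micro-ops, so after the first pass it can be fed straight from the micro-op cache while the legacy decoders stay powered down.

```c
#include <stddef.h>

/* A tiny hot loop: its body is only a handful of x86 instructions,
   so the decoded micro-ops easily fit into the micro-op cache.
   On every iteration after the first, Sandy Bridge can replay them
   from that cache instead of decoding the x86 code again. */
float dot_product(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```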

The second important improvement in the early pipeline stages concerns the branch prediction unit. The importance of this unit can hardly be overestimated, because every incorrect branch prediction requires stopping and flushing the pipeline completely. As a result, prediction mistakes not only hurt performance but also waste power on refilling the pipeline. Intel managed to make this unit extremely efficient in the new processors: all the Sandy Bridge buffers used to store branch addresses and prediction history have been reworked to increase their data density. As a result, Intel can store a longer branch history without increasing the size of the data structures used by the branch prediction unit, which greatly improves the predictor’s efficiency, since it is directly tied to the amount of statistics about executed branches it has to work with. According to preliminary estimates, branch prediction accuracy in Sandy Bridge improved by more than 5% compared with the predecessor.
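As an illustrative aside (our own example, not from the article), the cost of mispredictions is easy to feel in ordinary C code: the same loop tends to run noticeably faster when its branch is predictable than when the outcome is essentially random, because every miss forces a full pipeline flush.

```c
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)

/* Sums all elements that are at least 'threshold'. The branch inside
   the loop is what the branch prediction unit has to guess. */
static long sum_over_threshold(const int *data, int n, int threshold)
{
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (data[i] >= threshold)
            sum += data[i];
    }
    return sum;
}

static int cmp_int(const void *p, const void *q)
{
    return *(const int *)p - *(const int *)q;
}

int main(void)
{
    int *data = malloc(N * sizeof(int));
    if (!data)
        return 1;
    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;

    /* Random order: the branch goes either way roughly half the time,
       so the predictor misses often. */
    long s1 = sum_over_threshold(data, N, 128);

    /* Sorted order: the branch outcome flips only once across the
       whole array, so the predictor is almost always right. */
    qsort(data, N, sizeof(int), cmp_int);
    long s2 = sum_over_threshold(data, N, 128);

    printf("sums: %ld %ld (identical results, but the second pass is faster)\n", s1, s2);
    free(data);
    return 0;
}
```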

But it is the key unit of any out-of-order processor, the out-of-order cluster, that underwent the most interesting modifications. This is where Sandy Bridge and NetBurst seem closest: Intel engineers brought the physical register file back into their new processors (you may remember that it was retired in the Core and Nehalem processors in favor of a centralized Retirement Register File). Previously, when micro-ops were reordered, the buffer stored full copies of the registers for every operation; now it holds references to register values kept in the physical register file. This approach not only eliminates excessive data transfers, but also prevents the register contents from being duplicated many times over, saving space in the register file.

As a result, the out-of-order cluster in Sandy Bridge processors can keep up to 168 micro-ops “in flight” at the same time, while Nehalem processors could hold only 128 micro-ops in their reorder buffer (ROB). Some energy is saved as well. However, replacing actual values with references to them also has a downside: the execution pipeline gains extra stages needed for dereferencing those pointers.

Still, the developers didn’t really have much choice with Sandy Bridge. These processors support the new AVX instructions operating on 256-bit registers, so shuttling their values back and forth would inevitably create additional overhead. At the same time, Intel engineers had to make sure the new instructions execute quickly in the Sandy Bridge microarchitecture: only high performance will convince software developers to adopt them, because only then can they really increase the parallelism and throughput of vector calculations.

AVX is essentially a further development of SSE that increases the size of the SIMD vector registers to 256 bits. Moreover, the new instruction set allows non-destructive execution, i.e. the original data in the source registers is not overwritten. As a result, the AVX instruction set, just like the microarchitectural improvements, can be considered an innovation that increases performance and saves power, because it allows simplifying many algorithms and completing tasks with fewer instructions. AVX instructions are well suited for heavy floating-point calculations in multimedia, scientific and financial applications.
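To make the difference tangible, here is a minimal C sketch using compiler intrinsics (our own illustration; compile with AVX support enabled, e.g. gcc -O2 -mavx): a single 256-bit AVX register covers eight single-precision values where an SSE register covers only four, and the three-operand form leaves both source registers intact.

```c
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];

    /* One 256-bit AVX register holds eight single-precision values;
       a 128-bit SSE register would hold only four, so the same work
       would take twice as many instructions. */
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);

    /* Non-destructive three-operand form: va and vb keep their
       original contents, the sum lands in a third register. */
    __m256 vc = _mm256_add_ps(va, vb);

    _mm256_storeu_ps(c, vc);
    for (int i = 0; i < 8; i++)
        printf("%.0f ", c[i]);   /* prints "9 9 9 9 9 9 9 9" */
    printf("\n");
    return 0;
}
```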

The processor execution units have been redesigned specifically to execute 256-bit instructions effectively. The key change was pairing 128-bit execution units so that they can efficiently process 256-bit data. Since each of the three execution ports in Sandy Bridge (just like in Nehalem) contains units for three types of data, 64-bit general-purpose, 128-bit SIMD integer and 128-bit floating-point, it makes perfect sense to join the SIMD units into pairs within the same port. Most importantly, this rearrangement of resources doesn’t reduce the throughput of the execution units at all.

Since Sandy Bridge is designed to work with 256-bit vector instructions, the developers also had to address the performance of the functional units responsible for loading and storing data. The three ports designed in Nehalem for that purpose have migrated to Sandy Bridge, but to increase their efficiency Intel engineers unified the two ports that used to be dedicated to loading data and to calculating store addresses: now they are interchangeable, and each can handle either loads or store addresses. The third port remained unchanged and stores data. Since each port can move up to 16 bytes per clock, the total throughput of the L1 data cache in the new microarchitecture increased by 50%: CPUs with the Sandy Bridge microarchitecture can load up to 32 bytes and store up to 16 bytes of data per clock cycle.

Putting all of the innovations described above together, we can see that the microarchitecture of the computational cores in Sandy Bridge processors has been modified more than significantly. These innovations are serious enough to be regarded as dramatic changes rather than a simple fixing of Nehalem’s bottlenecks.

New Approach towards Integration

With the introduction of the Nehalem microarchitecture Intel started taking real steps towards higher integration in its processors, moving into the CPU functional units that used to be the prerogative of system chipsets: the memory controller, the PCI Express bus controller and the graphics core. The CPUs also received an L3 cache. In other words, the CPU has turned from a mere computing engine into a bundle of numerous different complex units.

Of course, this integration has a lot of benefits and improves performance thanks to shorter wait times during data transfers. However, the more units there are inside a CPU, the more difficult it is to connect them all electrically. The hardest task here is the connection between the shared L3 cache and the processor cores, especially since the number of cores tends to grow over time. In other words, when the developers were working on the Sandy Bridge microarchitecture, they had to give serious thought to a convenient interconnect between all the functional units inside the processor. The crossbar interconnect used previously worked fine in dual-, quad- and six-core Nehalem processors, but it doesn’t suit modular CPUs with a large number of different cores inside.

In fact, this had already been taken into account in the eight-core server Nehalem-EX processors, where a completely new approach was used to connect the computational cores with the L3 cache. This technology is called the Ring Bus, and it has successfully migrated to the new Sandy Bridge microarchitecture. All the computational cores, the cache, the graphics core and the North Bridge elements inside the new processors are connected via a special ring bus running a QPI-like protocol, which allowed Intel to significantly reduce the amount of internal wiring needed for signal routing.

To let the processor’s functional units communicate with the L3 cache over the ring bus, the L3 cache of Sandy Bridge processors is divided into equal banks of 2 MB each. The original design implies that the number of banks equals the number of processor cores; however, for marketing reasons some banks can be disconnected from the bus without any damage to the cache’s integrity, thus reducing the cache size. Each cache bank is managed by its own arbiter, but all of them work closely together: data is never duplicated between them. The use of banks doesn’t fragment the L3 cache; rather, it increases its bandwidth, which scales with the growing number of cores and, accordingly, banks. For example, since the “ring” is 32 bytes wide, the peak L3 cache bandwidth of a quad-core CPU working at 3.4 GHz is 435.2 GB/s.
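The 435.2 GB/s figure is easy to reconstruct from the numbers above if we assume that each of the four ring stops (one per cache bank) can move a full 32-byte packet every clock cycle:

$$ 32\ \text{bytes} \times 3.4\ \text{GHz} \times 4\ \text{ring stops} = 435.2\ \text{GB/s} $$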

Scalability with the number of processor cores is not the only advantage of the ring bus. The latency of the L3 cache has also gone down, since data travels along the “ring” by the shortest route. The L3 cache latency is now 26-31 clock cycles, while the L3 cache in Nehalem processors had a latency of 35-40 clocks. Keep in mind, though, that all cache memory in Sandy Bridge works at the processor frequency, which is another reason why it has become faster.

Another advantage of the ring bus is that it also includes the processor’s integrated graphics core in the common data transfer routes. This means the graphics core in Sandy Bridge doesn’t access memory directly, but goes through the L3 cache just like the processor cores do. This makes it faster and also eliminates the negative effect on overall system performance that occurs when the graphics core tries to take part of the memory bus away from the processor cores for its own needs.

Graphics Core Acquiring New Functions

A graphics core inside a CPU is nothing new: Clarkdale processors with a built-in Intel HD Graphics GPU have been on the market for almost a year now. But it is only in Sandy Bridge that the graphics and computational cores have finally made friends: they sit inside the same semiconductor die and are connected to the same ring bus shared with the other processor resources. This architectural change, which brought the graphics core very close to the memory controller and gave it access to the L3 cache, had a positive effect on performance. However, the graphics core, just like the computational cores, has also been improved in many other ways, so it has effectively become a next-generation component.

Overall, the graphics core architecture hasn’t changed dramatically: it is still based on the same 12 execution (shader) processors. However, the developers managed to make them almost twice as fast in a number of tasks and achieved a higher level of parallelism. Among the notable changes we should definitely mention Shader Model 4.1 and DirectX 10.1 support.

Since the graphics core has now moved into the 32 nm semiconductor die, its clock speed can easily be increased and may go as high as 1.35 GHz. As a result, the performance of the Sandy Bridge graphics core in real applications will be comparable to that of entry-level discrete graphics accelerators. Intel even implemented FSAA in its integrated graphics solution! In other words, Sandy Bridge has every chance of becoming the fastest integrated graphics solution out there, capable of threatening discrete graphics cards in the low-end price segment. I am pretty sure that AMD and Nvidia will object by pointing out the lack of DirectX 11 support, which may come in handy not only in the latest games but also in applications utilizing DirectCompute, such as the upcoming Internet browsers.

However, improving the existing graphics core architecture was not the only thing done in this department. The graphics core of the new Sandy Bridge processors has also been equipped with special new units for video decoding and encoding in popular formats such as MPEG-2, VC-1 and AVC.

Of course, hardware video decoding is nothing revolutionary these days; even the Clarkdale graphics core can do it just fine. However, this operation used to be the responsibility of the shader processors, whereas now there is a dedicated functional unit in charge of it. This rearrangement of responsibilities was required for proper 3D video support: the new graphics core easily copes with hardware decoding of stereoscopic 3D Blu-ray and MVC streams.

Another interesting addition is the hardware codec that can encode video into the AVC format. In practical terms it means that Sandy Bridge has all the resources necessary for fast video transcoding without tying up the traditional CPU resources. And keeping in mind how popular Intel processors are, this capability will most likely be actively employed by software developers, especially since the hardware encoding and decoding units can also be used in Intel P67 based systems, i.e. in systems with an external graphics card.

Examples are already here: the media functionality of the new Sandy Bridge processors will be supported by such popular applications as ArcSoft MediaConverter, Corel DVD Factory, CyberLink MediaEspresso, Movavi Video Converter, Roxio Creator, etc. And by the way, when the multimedia units inside the Sandy Bridge graphics core are used for video transcoding, the shader processors remain free, so they can help with additional video processing or special effects.

Processors on the Sandy Bridge microarchitecture will come with one of two graphics core modifications: Intel HD Graphics 2000 or Intel HD Graphics 3000. They differ in the number of active execution (shader) processors. The top model, Intel HD Graphics 3000, intended for mobile solutions and top desktop CPUs, will have all 12 execution units, while the “lite” version, Intel HD Graphics 2000, will have only six of them and will also work at a slightly lower frequency. However, the most interesting GPU components, namely the hardware encoder and decoder, will be identical in both modifications.

New North Bridge – System Agent

The only functional unit of the new Sandy Bridge processors left to discuss is the so-called System Agent, which contains all the external interface controllers: PCI Express, DMI, memory and display interfaces. In essence, the System Agent is very similar to what we know as the Uncore in Nehalem processors. Nevertheless, it is not fully identical to the Uncore: it doesn’t contain the L3 cache, which in the new microarchitecture is a separate functional unit working at the processor frequency. Another distinguishing feature of the System Agent is that it, too, uses the ring bus to exchange data with the processor cores, the graphics core and the L3 cache.

Speaking of the innovations in the System Agent, we would first of all like to mention the long-anticipated improvement of the memory controller. In Westmere (Clarkdale) processors the memory controller was combined with the graphics core, and that solution didn’t prove successful. This issue has finally been eliminated in Sandy Bridge: the new memory controller works at least as fast as the controller in Lynnfield CPUs. It supports dual-channel DDR3 SDRAM: DDR3-1066 or DDR3-1333 according to the formal specifications, though in reality Sandy Bridge processors offer multipliers that also allow clocking the memory at 1600, 1866 and 2133 MHz.
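For reference, the theoretical peak bandwidth of such a dual-channel configuration follows directly from the transfer rate and the 64-bit (8-byte) width of each channel; DDR3-1333 and DDR3-1600 are used below merely as examples:

$$ 2 \times 8\ \text{bytes} \times 1333\ \text{MT/s} \approx 21.3\ \text{GB/s}, \qquad 2 \times 8\ \text{bytes} \times 1600\ \text{MT/s} = 25.6\ \text{GB/s} $$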

You can get an idea of how fast the memory controller in Sandy Bridge processors actually is from the following Aida64 benchmark results:


The results courtesy of xfastest.com.
They tested Core i7-2400 processor with
dual-channel DDR3-1600 memory (7-7-7-21-1T timings).

The latency of the memory sub-system in a computer with a Sandy Bridge CPU is comparable to that of a similar LGA1156 platform with a Core i7 processor. However, the new processors offer indisputably higher memory bandwidth.

The PCI Express controller in Sandy Bridge is similar to the one in LGA1156 processors: it supports 16 PCI Express 2.0 lanes that can form either a single x16 link or two x8 links. Therefore, the old LGA1366 platform will remain in demand after the launch of the new LGA1155 systems: it will be the only platform supporting full-speed graphics sub-systems with multiple GPUs, each connected via the PCI Express bus at maximum bandwidth.

An important change has been made to the display interfaces as well: the graphics core in the new processors will support HDMI 1.4, whose key new feature is 3D support.

Power Control Unit and Overclocking Functionality

Besides the external interface controllers, another important part of the Sandy Bridge System Agent is the Power Control Unit (PCU). Just like in Nehalem processors, this unit is a programmable micro-controller that collects information about temperatures and currents in different parts of the processor and can adjust its voltages and frequencies on the fly. The PCU implements the power-saving functions as well as Turbo mode, which has been developed even further in the new Sandy Bridge microarchitecture.

All functional units of the Sandy Bridge CPUs are split into three domains, each with its own independent clocking and power algorithms. The first, primary domain contains the processor cores and the L3 cache, which share the same frequency and voltage. The second domain is the graphics core, working at its own frequency. The third domain is the System Agent itself.

This structure allowed Intel engineers to implement Enhanced Intel SpeedStep and Turbo Boost simultaneously and independently for the processor cores and the GPU. The same approach had already been used in the mobile Arrandale processors, but that implementation was simpler: it worked through a driver. Sandy Bridge has a fully hardware solution that controls the frequencies of the computational and graphics cores interdependently, taking their current power consumption into account. This way the processor cores can be overclocked more aggressively in Turbo mode when the graphics core is idle, and vice versa: the graphics core can be overclocked significantly when the processor cores are not fully utilized. You can get an idea of how aggressive Turbo mode is in Sandy Bridge from how far the frequencies may rise above their nominal values: by up to four increments for the CPU and by six or seven increments for the GPU.
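Since each CPU increment corresponds to one step of the 100 MHz base clock (discussed in the overclocking section below), the maximum gain is easy to estimate; the 3.4 GHz starting frequency here is just an illustrative example:

$$ f_{\max} = 3.4\ \text{GHz} + 4 \times 100\ \text{MHz} = 3.8\ \text{GHz} $$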

However, these are far from all the changes made to Turbo Boost. In its new implementation the PCU can control the frequencies more intelligently, taking into account the actual temperatures of the processor units and not just their power consumption. It means that when the CPU works in favorable thermal conditions, its power consumption may temporarily exceed the TDP maximum.

Processor load is usually very uneven during typical everyday work. Most of the time the CPU sits in a power-saving state and needs to work very fast only for short periods. The processor doesn’t overheat during these short bursts thanks to the thermal inertia of the cooling system, so the PCU in Sandy Bridge processors assumes that nothing bad will happen if the CPU is overclocked beyond what the TDP theoretically allows at such moments. As soon as the processor temperature starts approaching critical thresholds, the frequency is dropped back to safe levels.

This automatically suggests that highly efficient cooling systems will be more beneficial in Sandy Bridge based platforms for achieving maximum performance. But do not get overexcited: a hardware limitation restricts the maximum length of “beyond TDP” operation to 25 seconds.

As for regular overclocking performed with traditional methods and tricks, we can expect quite a few serious changes here as well, and they will hardly be welcomed with much enthusiasm. Striving for maximum integration, Intel moved the base clock generator into the chipset on LGA1155 platforms. This measure by itself wouldn’t be fatal for the usual overclocking procedures. The main issue is that this clock generator is the only one in the system and is responsible for generating all the frequencies. And as we know, not all busses and controllers tolerate overclocking well: when the PCI Express frequency or the USB and SATA controller clocks are raised, instability sets in fairly quickly. This factor is going to be a serious obstacle to CPU overclocking via the base clock.

And the facts are as follows. Sandy Bridge processors have their base clock set at 100 MHz, and the generator allows adjusting this frequency in fine 0.1 MHz increments. However, any attempt to raise it quickly leads to system instability or failure. At this point we don’t know of any successful attempts to push the base clock beyond 105 MHz. In other words, the time-tested overclocking method of raising the clock generator frequency barely works on Sandy Bridge processors and yields a 5% increase at best.

So the only way to overclock the upcoming LGA1155 processors is to raise their clock multiplier. Intel is going to offer several Sandy Bridge based CPU modifications with an unlocked multiplier, which should theoretically overclock to 5.7 GHz (57 is the maximum multiplier allowed by this microarchitecture). However, these processors, marked with a “K” in their model number, will only be available in the upper price range and will cost a little more than their regular counterparts.

The owners of ordinary CPUs will be offered artificially limited overclocking: these CPUs will also let you raise the clock multiplier, but only by four increments at most. Note that this applies to manual overclocking only; the multiplier adjustment will not affect Turbo Boost, which will still contribute on top of the manual frequency increase. Moreover, Intel won’t limit the multipliers for the graphics core and the memory, so it will be possible to overclock the memory and graphics in any Sandy Bridge based system, be it a regular or an enthusiast one.

However, I doubt this will be sufficient compensation for overclockers, so they will most likely be primarily interested in the CPUs with an unlocked multiplier, namely the Core i5-2500K and Core i7-2600K, especially since the available information about their frequency potential looks very promising. For example, there is evidence that the Core i7-2600K remains stable at up to 5.0 GHz with nothing more than air cooling.


Screenshot courtesy of windwithme.

The result shown above was obtained with a Prolimatech Mega Shadow Deluxe Edition cooler and the CPU core voltage increased to 1.45 V. Of course, such a high voltage will hardly suit everyday use, but we assume that at about 4.8 GHz Sandy Bridge processors will be able to work 24/7.

Conclusion

Going back to the beginning of our article, we would like to remind you that Intel positions Sandy Bridge as a “tock” within its “tick-tock” strategy, meaning that the company considers this processor the bearer of a new microarchitecture. At the same time, we didn’t find anything mind-blowingly new about its structure and design: a few things have been improved, a few good old ideas have been brought back, and integration has been taken deeper. So does Intel have the right to talk about a new CPU generation, or is Sandy Bridge in reality just a further evolution of Nehalem processors?

 

And here we don’t have the slightest doubt: we completely agree with Intel on this one. Sandy Bridge processors are an excellent example of how a new quality emerges from a number of quantitative changes accumulated over time. Numerous innovations in the microarchitecture of the new cores, support for 256-bit AVX instructions, an enhanced graphics core, hardware units for video encoding and decoding, a new L3 cache, the ring bus, an intelligent System Agent, a more aggressive Turbo Boost and higher clock frequencies: each of these things may seem small taken separately, but together they create a fundamentally better product. And its advantage over the predecessors is quite noticeable: Sandy Bridge processors have become much faster within the same thermal envelope.

Of course, when we say “much” we do not imply that they have become several times faster. Nevertheless, if you replace your LGA1156 Lynnfield or Clarkdale based system with a similarly priced Sandy Bridge CPU and an LGA1155 mainboard, you can expect at least 25% higher performance in all CPU-dependent applications.

However, there are tasks where Sandy Bridge will be up to ten times faster than its predecessors thanks to the new functional units. First of all, we expect a significant performance boost in many video transcoding utilities, which can now use the dedicated hardware encoder and decoder in the CPU. Multimedia, cryptographic, scientific and financial algorithms will also run faster on the new processors once they utilize the new AES-NI and AVX instructions. Of course, you can only enjoy these advantages with specially optimized software, but it shouldn’t take long to appear, because Intel engineers did their best to make these innovations easy for software developers to use.

Users who intend to use the integrated graphics core will also benefit from the new platform. Compared with the previous-generation Intel HD Graphics, the new graphics core has become much faster, which owners of new notebooks based on Sandy Bridge and the Huron River platform will certainly feel. And if the new CPUs are used in home desktop systems or HTPCs, the built-in graphics core will provide the HDMI 1.4 support required for a full 3D experience with external devices.

Overall, as far as we can tell at this time, there is only one serious issue with Sandy Bridge: overclocking. While users in the higher price range will be able to get an overclocking-friendly processor with an unlocked multiplier by paying a little extra, there won’t be anything like that in the sub-$200 range. So the LGA1155 platform will also set another trend: Intel’s intention to prevent overclocking of its inexpensive CPUs. However, I doubt it will have any effect on the popularity of the overclocking phenomenon: AMD will happily open its arms to all users who love to squeeze every last drop out of their systems, as it should also launch a milestone product next year, aka Bulldozer.