AMD’s energy-efficient processors codenamed Ontario and Zacate are based on new cores with the Bobcat microarchitecture. The main goal AMD’s engineers pursued when developing it was to minimize the chip’s power consumption. Bobcat-based processors are meant for applications and devices where a single processor core consumes only 1 to 10 watts.
No wonder the Bobcat has little in common with the well-known Stars microarchitecture. Like the Intel engineers who developed the Atom, AMD’s specialists had to give up a lot of features and cut everything unnecessary to lower the power draw of the resulting chip. The outcome looks more like AMD’s K6 processor than a modern Phenom.
However, the Bobcat has retained a key feature of modern CPU microarchitectures: it supports out-of-order execution. This is the main difference between AMD’s approach and Intel’s. To save power, Intel’s Atom executes instructions strictly in order, and it relies on Hyper-Threading technology, which lets one CPU core run two instruction streams simultaneously, to keep all of the CPU subunits busy.
AMD does not have anything like Hyper-Threading, so its new processor has a classic scheduler that is responsible for reordering incoming instructions. However, the Bobcat scheduler uses a physical register file rather than a centralized retirement register file. The physical register file holds pointers to register contents, so data does not have to be moved within the processor when the order of instructions changes. Intel has implemented a similar solution in the Sandy Bridge series, by the way. A physical register file reduces the amount of data transfers within the processor and saves power, but it also increases the depth of the execution pipeline, which is as long as 15 stages in the Bobcat.
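The idea of committing a result by updating a pointer instead of copying data can be illustrated with a small sketch. This is only a toy model under stated assumptions, not a description of Bobcat’s actual hardware: the class name, method names, register names and the register counts are all hypothetical and chosen for illustration.

```python
class PhysicalRegisterFile:
    """Toy model of register renaming with a physical register file (PRF).

    Hypothetical illustration only; sizes and names are assumptions.
    """

    def __init__(self, arch_regs, num_physical):
        # One storage slot per physical register; a result is written here once.
        self.values = [0] * num_physical
        # Speculative mapping: architectural register name -> physical index.
        self.rename_map = {reg: i for i, reg in enumerate(arch_regs)}
        # Committed (retired) mapping, initially identical.
        self.committed_map = dict(self.rename_map)
        # Physical registers not currently assigned to any architectural one.
        self.free_list = list(range(len(arch_regs), num_physical))

    def rename_dest(self, arch_reg):
        """Allocate a fresh physical register for a new result.

        Raises IndexError if no physical register is free (a real core
        would stall instead).
        """
        phys = self.free_list.pop(0)
        self.rename_map[arch_reg] = phys
        return phys

    def write_result(self, phys, value):
        """Execution writes the value into its physical register, once."""
        self.values[phys] = value

    def retire(self, arch_reg, phys):
        """Commit by repointing the mapping; the value itself is not moved."""
        old = self.committed_map[arch_reg]
        self.committed_map[arch_reg] = phys
        self.free_list.append(old)  # the previous register can be reused

    def read_committed(self, arch_reg):
        """Read the architecturally visible value through the pointer."""
        return self.values[self.committed_map[arch_reg]]
```

For example, after `prf = PhysicalRegisterFile(["rax", "rbx"], 6)`, calling `p = prf.rename_dest("rax")`, `prf.write_result(p, 42)` and `prf.retire("rax", p)` makes `prf.read_committed("rax")` return 42 without the value ever being copied between storage structures. In a design with a retirement register file, the result would instead be held in a temporary buffer and copied into the architectural register at retirement.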
Of course, the new processor from AMD has a branch prediction unit whose operation has a great effect on the energy efficiency of the whole thing. AMD’s engineers have tried to improve this unit once again, so in the Bobcat microarchitecture it can predict up to two branches per clock cycle and handle indirect branches as well as return instructions.
Overall, the Bobcat microarchitecture can execute two x86 instructions, including 64-bit ones, simultaneously. Of course, the processor actually works with its own internal microinstructions, but from a statistical point of view, up to 89% of x86 instructions are transformed by the Bobcat decoder into one symmetric microinstruction. About 10% of x86 instructions are decoded into a pair of microinstructions, and no more than 1% of incoming x86 instructions are translated inside the processor into a sequence of several micro-operations. Thus, it is quite likely that the Bobcat will be processing two x86 instructions per clock cycle most of the time. This can give us some notion about the level of performance we can expect from the new microarchitecture, by the way. The Stars (K10) microarchitecture can handle three instructions per clock cycle, so at the same clock rate the Bobcat is theoretically only two thirds as fast.
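The arithmetic behind these figures can be checked with a short sketch. The decode mix comes from the percentages quoted above; the length of a "sequence of several micro-operations" is not given, so `multi_len` is an assumed placeholder, and the three-instruction width used for Stars (K10) is likewise an assumption made for this comparison. The function names are hypothetical.

```python
# Decode mix quoted in the text, as fractions of incoming x86 instructions.
SINGLE_UOP = 0.89  # decoded into one microinstruction
DOUBLE_UOP = 0.10  # decoded into a pair of microinstructions
MULTI_UOP = 0.01   # decoded into a longer micro-operation sequence


def avg_uops_per_instruction(multi_len=4):
    """Average microinstructions per x86 instruction.

    multi_len is an assumed length for the long sequences; the source
    does not state one.
    """
    return SINGLE_UOP * 1 + DOUBLE_UOP * 2 + MULTI_UOP * multi_len


def relative_peak_throughput(bobcat_width=2, k10_width=3):
    """Ratio of peak instructions per clock at equal clock rates.

    The K10 width of 3 is an assumption used for the comparison.
    """
    return bobcat_width / k10_width


if __name__ == "__main__":
    print(f"average micro-ops per instruction: {avg_uops_per_instruction():.2f}")
    print(f"Bobcat vs. K10 peak throughput: {relative_peak_throughput():.2f}")
```

With the assumed sequence length of four, the average comes out only slightly above one microinstruction per x86 instruction, which is why a two-wide pipeline can be expected to sustain close to two x86 instructions per clock on typical code.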
As it is optimized to perform two instructions per clock cycle, the Bobcat has two integer ALUs, a couple of data ports and two 64-bit FPUs which also perform integer multiplication. Bobcat-based processors support SIMD instruction extensions up to SSSE3 and SSE4a but do not support AMD’s own 3DNow! extensions. Recent instruction sets like AES-NI and AVX are not supported either.
The Bobcat’s cache system consists of a 64-kilobyte L1 cache, split evenly between data and instructions, and a 512-kilobyte L2 cache which is clocked at half the clock rate of the rest of the processor. Each CPU core has dedicated L1 and L2 caches. Multi-core Bobcat implementations do not have a shared cache.
Of course, the Bobcat microarchitecture is also optimized for energy efficiency. The processor supports power-saving states up to C6 and can turn off unused L2 cache sections.