The power efficiency of relatively simple microprocessor cores, such as Intel Atom or ARM-based chips, is well known, and many believe that such central processing units (CPUs) will eventually find a home in data center servers, which are expected to become "green". However, specialists from Intel Corp. and Google warn that such cores may not be a good fit even for massively-parallel applications, due to their inherent limitations.

Microprocessors based on the ARM architecture consume only a fraction of the power of chips like AMD Opteron or Intel Xeon, which are developed specifically for servers. Unfortunately, the latest ARM Cortex-A9 and Cortex-A15 "Eagle" cores are still 32-bit and not x86-compatible. Atom processors consume more power than ARM-based chips, but deliver better performance and support x86. Naturally, many ARM or Atom cores could fit into the power envelope of a single Xeon processor and thus provide greater parallelism at lower power, something that is needed in data centers.

IBM "Green" Datacenter

But there are too many limitations for such servers, according to Kirk Skaugen, general manager of Intel's data center unit, and Urs Hölzle, senior vice president of operations and Google fellow. Firstly, current ARM designs lack 64-bit capability, which allows a processor to address larger amounts of memory and process more complex instructions at once. Secondly, ARM designs are not x86-compatible, so software has to be redeveloped for the new instruction set. Thirdly, slow execution of single-threaded operations will slow the actual workflow more than exceptional parallelism will speed it up. Fourthly, neither ARM-powered SoCs nor Atom chips feature virtualization technologies.

"What is the challenge that ARM has in [servers]? Well, it has an instruction set issue. If you are going to do hosting what application do you host? What is an application porting effort?  We did application porting with Itanium, it took us about 10 years. Hundreds and hundreds of millions of dollars to port about 14 000 applications. ARM has to port for hosting all those applications over. Second challenge is the A9 and the A15 as we know it are 32-bit processors. Microsoft only supports 64-bit operating systems today. [...] So it’s an instruction set issue as well as a 64-bit issue. Everything we do in servers for real servers will be 64-bit," said Kirk Skaugen at Morgan Stanley technology and telecommunications conference, reports ZDnet web-site.

The lack of 64-bit support and the cost of porting software to ARM (or optimizing it for Atom) are both strong arguments against using ultra low-power CPUs in servers for the sake of maximum parallelization. According to the Google fellow, even when such software is developed, real-world performance of such server systems will be limited by the so-called Amdahl's law.

"Even though many Internet services benefit from seemingly unbounded request- and data-level parallelism, such systems are not above the Amdahl's law. As the number of parallel threads increases, reducing serialization and communication overheads can become increasingly difficult. In a limit case, the amount of inherently serial work performed on behalf of a user request by slow single-threaded cores will dominate overall execution time," said Mr. Hölzle in a research paper.

Moreover, software designed for such über-parallel systems will not be very responsive by definition; in fact, it has to be nearly flawless, as a single mistake in task scheduling further reduces performance.

"The more threads handling a parallelized request, the larger the overall response time. Often all parallel tasks must finish before a request is completed, and thus the overall response time becomes the maximum response time of any subtask, and more subtasks will push further into the long tail of subtask response times. With 10 subtasks, a one-in-a-thousand chance of suboptimal process scheduling will affect 1% of requests (recall that the request time is the maximum of all subrequests), but with 1000 subtasks it will affect virtually all requests," the Google fellow warned.

To sum up, even though highly-parallel systems can perform certain tasks much faster than traditional machines while consuming less energy, the difficulty of designing appropriate software and the complexity of the hardware may easily make such systems impractical in real-world scenarios.

"Although we are enthusiastic users of multicore systems, and believe that throughput-oriented designs generally beat peak-performance-oriented designs, smaller is not always better. Once a chip’s single-core performance lags by more than a factor to two or so behind the higher end of current-generation commodity processors, making a business case for switching to the wimpy system becomes increasingly difficult," concluded Mr. Hölzle.



Comments currently: 4
Discussion started: 03/02/11 07:04:26 AM
Latest comment: 03/02/11 02:05:03 PM


Well, they can always use Fusion APUs which have 64-bit as well as GPU processing support. AMD also just announced that they will offer "Headless" versions of the Fusion processors when driving a display is not necessary (embedded and possibly servers?).
[Posted by: mamisano | Date: 03/02/11 07:04:26 AM]

Let's go point by point.

"Firstly, neither current ARM nor current Atom designs support 64-bit capability that allows to address higher amounts of memory and process more complex instructions at once."

ARM's NEON extensions support 64-bit and 128-bit SIMD operations. ARM has announced 40-bit addressing support (1TB address space) and 64-bit has been discussed[1], albeit as a secret unannounced project.

Further, most ARM cores in development have high performance wide stream processors onboard. Everyone has sights set on being able to release the first mobile GPU as capable as the PS3, and they're not so very far off.

"Secondly, ARM designs are not x86 compatible and thus software has to be redeveloped for the new instruction set."

Who cares? Linux just works today. Microsoft has already announced they too will support ARM in the future.

"Thirdly, slow execution of single-thread operations will slow the actual workflow more than exceptional parallelism will speed it up."

A 2.5 GHz out-of-order multiple-dispatch core is nothing to sneeze at (the ARM A15). That said, there are workloads which are extremely time-critical, where single-threaded performance is paramount and ARM may not be the best choice.

Most companies do not have this concern. They're working with highly concurrent workloads, which are task or data parallel. ARM's great potential is here; scaling out.

At present, AMD's Opteron 4164 EE and Intel's Xeon L5630 are the top picks for energy-efficient processing. As for the upstart competition, whether "such cores may not be truly good even for massively-parallel applications due to their natural limitations" is a valid concern today, as the A9 is just reaching market. There are places where real-time performance trumps all. Some workloads indeed will struggle with 32-bit and even 40-bit virtual memory spaces -- ZFS, for example, requires a very large virtual address space to perform well.

[Posted by: rektide | Date: 03/02/11 07:55:19 AM]

Intel PR dissing a competitor... should we be surprised, or even moderately engaged?

Itanium indeed never took off, but it was a brand-new architecture with no software at all and complex optimization constraints. ARM has been around for ages and already has a full ecosystem and toolchain -- several, actually. It can run Linux and pretty much all mainstream server software, notably a full LAMP stack, and much more.

On the processing-power side of things, first, not all servers need much power; second, Moore's law puts ARM servers at the level of today's x86 servers 5-6 years from now.
[Posted by: obarthelemy | Date: 03/02/11 10:04:08 AM]

Some current Atom CPUs do implement 64-bit mode, so the article is inaccurate in that respect.

In reply to rektide and your comments about NEON extensions: vector length is not the same as processor bit-width. By your logic the Pentium-MMX was a "64-bit" processor, the Pentium-II and later were "128-bit", and Sandy Bridge is "256-bit" (due to MMX, SSE, and AVX respectively). When people talk about a processor being "64-bit" they're generally talking about whether it supports 64-bit pointers and 64-bit scalar integer operands.

No ARM has "40-bit virtual memory". They all have 32-bit pointers, and therefore 32-bit per-process virtual address spaces. What the A15 does have is 40-bit physical addressing via large page-table entries, such that multiple processes can collectively access more than 4 GB via different virtual address spaces. They're doing the same thing that Intel implemented with PAE on the Pentium-Pro all the way back in 1995 (and that others had done before that, for example the IBM POWER series had large physical address support early on).

Unfortunately, PAE-like tricks present severe challenges both to OS designers and in implementing solutions for server-class workloads. For example, you can't use mmap() effectively on large datasets, you have to do nasty things with low memory and bounce-buffers for I/O, etc. I'm actually surprised ARM went there - It was at best a stopgap measure for Intel, and I don't think it will buy ARM much breathing room.

[Posted by: patrickjchase | Date: 03/02/11 01:41:42 PM]

