The power efficiency of relatively simple microprocessor cores, such as Intel Atom or ARM-based chips, is well known, and many believe that such central processing units (CPUs) will eventually find a home in data center servers, which are expected to become "green". However, specialists from Intel Corp. and Google warn that such cores may not be truly suitable even for massively parallel applications due to their natural limitations.
Microprocessors based on the ARM architecture consume only a fraction of the power of chips like AMD Opteron or Intel Xeon, which were developed specifically for servers. Unfortunately, the latest ARM Cortex-A9 and Cortex-A15 "Eagle" cores are still 32-bit and are not x86-compatible. Atom processors consume more power than ARM-based chips, but deliver better performance and support x86. Naturally, many ARM or Atom cores could fit into the power envelope of a single Xeon processor and thus provide improved parallelism at lower power, something that is needed in data centers.
IBM "Green" Datacenter
But there are too many limitations for such servers, according to Kirk Skaugen, general manager of Intel's data center unit, and Urs Hölzle, senior vice president of operations and Google fellow at Google. Firstly, current ARM designs do not support 64-bit capability, which allows a processor to address larger amounts of memory and process more complex instructions at once. Secondly, ARM designs are not x86-compatible, so software has to be redeveloped for the new instruction set. Thirdly, slow execution of single-threaded operations will slow the actual workflow more than exceptional parallelism will speed it up. Fourthly, neither ARM-powered SoCs nor Atom chips feature virtualization technologies.
"What is the challenge that ARM has in [servers]? Well, it has an instruction set issue. If you are going to do hosting what application do you host? What is an application porting effort? We did application porting with Itanium, it took us about 10 years. Hundreds and hundreds of millions of dollars to port about 14 000 applications. ARM has to port for hosting all those applications over. Second challenge is the A9 and the A15 as we know it are 32-bit processors. Microsoft only supports 64-bit operating systems today. [...] So it’s an instruction set issue as well as a 64-bit issue. Everything we do in servers for real servers will be 64-bit," said Kirk Skaugen at the Morgan Stanley Technology and Telecommunications conference, reports the ZDNet web-site.
The lack of 64-bit support and the cost of porting software to ARM (or optimizing it for Atom) are strong arguments against using ultra-low-power CPUs in servers for the sake of maximum parallelization. According to the Google fellow, even when such software is developed, in real-world cases the performance of such server systems will be limited by Amdahl's law.
"Even though many Internet services benefit from seemingly unbounded request- and data-level parallelism, such systems are not above the Amdahl's law. As the number of parallel threads increases, reducing serialization and communication overheads can become increasingly difficult. In a limit case, the amount of inherently serial work performed on behalf of a user request by slow single-threaded cores will dominate overall execution time," said Mr. Hölzle in a research paper.
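Hölzle's limit case can be illustrated with a simple Amdahl's-law calculation. The sketch below uses hypothetical numbers (not taken from the paper): a "wimpy" core is modeled as having half the single-thread speed, and 30% of each request is assumed to be inherently serial work.

```python
def amdahl_speedup(serial_fraction, n_cores, core_speed=1.0):
    """Speedup over one full-speed core under Amdahl's law.

    core_speed < 1.0 models a "wimpy" core with lower single-thread
    performance; the serial fraction cannot be spread across cores,
    so it always runs at core_speed.
    """
    time = (serial_fraction + (1.0 - serial_fraction) / n_cores) / core_speed
    return 1.0 / time

# 30% inherently serial work per request (illustrative figure):
brawny = amdahl_speedup(0.3, n_cores=4)                  # ~2.11x
wimpy = amdahl_speedup(0.3, n_cores=64, core_speed=0.5)  # ~1.61x
```

Despite having sixteen times as many cores, the wimpy configuration loses, because the serial portion of each request runs on a slow core — exactly the limit case Hölzle describes.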
Moreover, software designed for such über-parallel systems will not be very responsive by definition; in fact, it has to be nearly ideal, as a single mistake in task scheduling further reduces performance.
"The more threads handling a parallelized request, the larger the overall response time. Often all parallel tasks must finish before a request is completed, and thus the overall response time becomes the maximum response time of any subtask, and more subtasks will push further into the long tail of subtask response times. With 10 subtasks, a one-in-a-thousand chance of suboptimal process scheduling will affect 1% of requests (recall that the request time is the maximum of all subrequests), but with 1000 subtasks it will affect virtually all requests," the Google fellow warned.
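The arithmetic behind that warning is simple: because a request waits for its slowest subtask, a rare per-subtask hiccup compounds with fan-out. A minimal sketch (the one-in-a-thousand figure comes from the quote above; the function itself is ours, assuming independent subtasks):

```python
def affected_fraction(n_subtasks, p_slow=0.001):
    """Fraction of requests hit by at least one slow subtask, assuming
    each of the n parallel subtasks independently has probability
    p_slow of suboptimal scheduling and the request waits for all."""
    return 1.0 - (1.0 - p_slow) ** n_subtasks

small = affected_fraction(10)    # ~0.01 -> about 1% of requests
large = affected_fraction(1000)  # ~0.63 -> most requests
```

With 10 subtasks roughly 1% of requests are affected, matching the quote; with 1000 subtasks, even under this simple independence assumption well over half are, and with any additional sources of variance the tail quickly reaches virtually all requests.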
To sum up, even though highly parallel systems can do certain tasks much faster than traditional machines while consuming less energy, the difficulty of designing appropriate software and the complexity of the hardware may easily make such systems useless in real-world scenarios.
"Although we are enthusiastic users of multicore systems, and believe that throughput-oriented designs generally beat peak-performance-oriented designs, smaller is not always better. Once a chip’s single-core performance lags by more than a factor of two or so behind the higher end of current-generation commodity processors, making a business case for switching to the wimpy system becomes increasingly difficult," concluded Mr. Hölzle.