Many-Core – the Evolution
Although Intel is investing huge amounts of money in new micro-architectures, the recent processors from the chip giant – Atom, Core 2 – are based on the relatively old, but very well refined P5/Pentium and P6/Pentium Pro micro-architectures. Moreover, even the infamous Larrabee graphics processor was supposed to be powered by P54C cores. Logically, it may seem that the world’s largest processor maker believes that having loads of cores is more important than having the best single-threaded core.
But it looks like everything is not that simple: Intel still sees single-thread performance as a target and many-core architecture as a research topic.
X-bit labs: Do you think that future multi-core processors will sport many simple cores rather than a limited number of advanced cores, like today? Or will they contain as many advanced cores as possible?
Sebastian Steibl: Being a research person, I cannot comment on decisions of product groups. I can only comment on the trends we see. If you look at today’s software, even today’s multi-core software, to a large extent a lot of things need [maximum] single-thread performance. There are applications that benefit from single-thread performance, but there are [also] applications [which take advantage of] multi-thread performance.
There are implications for future processors. Single-thread performance will continue to be a requirement for the foreseeable future. The software industry at large still needs better tools to exploit parallelism, and this is one of the reasons why we have decided to give out the SCC to interested academic institutions to advance methods and productivity enhancements for [multi-core] programming research.
For the foreseeable future large cores will play an important role.
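Amdahl's law is the standard way to quantify why large cores and single-thread performance keep mattering as core counts grow: the serial fraction of a program caps the speedup no matter how many cores are added. A quick illustration (the formula is standard; the 90%-parallel workload and the 48-core count, matching the SCC, are chosen here for illustration):

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n),
# where p is the parallelizable fraction and n the number of cores.
# The workload numbers below are illustrative, not from the interview.

def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# With 90% of the work parallelizable, 48 cores (the SCC's count)
# give well under a 10x speedup -- the serial 10% dominates.
print(round(speedup(0.90, 48), 2))    # 8.42
print(round(speedup(0.90, 1000), 2))  # 9.91 -- approaches, never reaches, 10x
```

This is why a chip of many weak cores cannot simply replace one strong core for today's largely serial software.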
There are many ways to boost the performance of microprocessors. One way is to implement new instructions, as in the case of Intel AVX. Another way is to implement special-purpose accelerators, as in the case of AMD’s Fusion program. But which way is better?
X-bit labs: What is more progressive and economically feasible for high-end processors, to implement new instructions or certain special purpose accelerators or ALUs like those inside GPUs or a wide vector processing unit (those are similar though)?
Sebastian Steibl: I think they are similar [approaches]… In high-performance processing we need vector units, which we have been adding [for ages now], and we are getting good results out of them. In the mobile space, accelerators play an important role, as mobile computing becomes more and more dominant. I could see a future that has both of them. We actually have research programs that look into both dimensions.
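The vector units Steibl mentions accelerate exactly one pattern: the same arithmetic applied across many data elements at once. A minimal sketch of that pattern, using NumPy (whose element-wise kernels are typically compiled down to SIMD instructions such as SSE/AVX where the CPU supports them; the arrays here are hypothetical):

```python
import numpy as np

# Element-wise work over arrays is what vector units accelerate:
# with AVX, eight 32-bit float additions fit in one 256-bit instruction.
a = np.arange(8, dtype=np.float32)       # [0.0, 1.0, ..., 7.0]
b = np.full(8, 2.0, dtype=np.float32)

c = a + b  # one vectorized expression instead of a scalar loop

print(c.tolist())  # [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```

A fixed-function accelerator, by contrast, would bake one specific operation (video decode, encryption) into silicon rather than exposing a general-purpose instruction like this.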
X-bit labs: Do you think that x86 with accelerators (VLIW vector units, whatever) will be able to rapidly process graphics?
Jon Peddie: Absolutely! Intel's new Sandy Bridge processor, AMD's new Llano processor and its little brother, Ontario, will do just that. We will have the hardware readily available by 2011, [but] the software to exploit it will probably not be available until 2012 or 2013.
X-bit labs: What is the reason why Intel decided to allow software to determine the number of cores to use? Maybe an ultra-threaded dispatch processor – like the one used in GPUs – would be better in terms of efficiency and software complexity?
Sebastian Steibl: The SCC is a research vehicle; we wanted it to be as experimental a platform as possible. With this architecture we have software-managed data flow and execution; it is much better – for a development platform – to have this kind of capability than a fixed-function unit. Maybe a fixed-function [data scheduler] is more efficient, but this approach [allows us to] give more flexibility to software organizations.
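From an application's point of view, software-managed core allocation of the kind the SCC exposes means the program itself decides how much parallelism to use. A hypothetical sketch using Python's standard thread pool (ordinary host code, not SCC-specific APIs):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Software, not a hardware dispatcher, picks the degree of parallelism:
# here the pool is sized from the CPU count the OS reports.
workers = os.cpu_count() or 1

def task(i):
    return i * i  # stand-in for a unit of real work

with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(task, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

A GPU-style hardware dispatcher would make the `workers` decision itself; the SCC's choice pushes it up to software so researchers can experiment with scheduling policies.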
X-bit labs: Do you believe that many-core CPUs (in a decade or more from now or so) will be able to process both general data and graphics?
Jon Peddie: If graphics operations can be reduced to geometric functions like tessellation and image functions like shaders, without the need for specialized processors (such as a texture processor, video scaler, or colour lookup tables), then the answer is absolutely ‘yes’. Furthermore, I think it will happen in less than five years, and probably as soon as three. The reason for specialized processors is to overcome the processing time of scalar processors. With multi-core CPUs and high clock rates, we can be extravagant with the software load and run general-purpose processors for highly specialized applications.