Types of Multiprocessor Architectures
Before going deeper into the technical details, I would like to make one important caveat. Systems with many processors differ from those with just a few in one respect: memory hierarchy. Imagine a really big system with, say, several hundred CPUs. Let the CPUs be combined into units of four, let four such units be joined into a group by a switch, and let several groups be joined by a higher-level switch. This system behaves in ways we have not seen before: the memory access time now depends on where the accessed memory module is located. For example, it takes less time to access local memory, situated on the processor PCB, than memory that belongs to another four-processor unit, because the switch has to spend some time processing the request and finding the necessary address. And if the necessary address belongs to another group, the access time will be much longer than for local memory.
That’s why computer systems are commonly classified by their memory hierarchy. Let’s go through this classification.
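To make the example more tangible, here is a minimal Python sketch of such a hierarchy. The grouping (four CPUs per unit, four units per group) follows the example above, but the latency figures and the function names are my own assumptions chosen purely for illustration; they are not measurements of any real machine. I also assume that a request to another group crosses its own group's switch, the higher-level switch, and the target group's switch.

```python
# Topology from the example: 4 CPUs per unit, 4 units per group.
CPUS_PER_UNIT = 4
UNITS_PER_GROUP = 4

# Assumed, purely illustrative latencies in nanoseconds (not measurements).
LOCAL_NS = 100          # memory on the CPU's own four-processor unit
GROUP_SWITCH_NS = 150   # cost of passing through a group-level switch
TOP_SWITCH_NS = 400     # cost of passing through the higher-level switch


def locate(cpu):
    """Return (group index, unit index) for a CPU number (hypothetical numbering)."""
    unit = cpu // CPUS_PER_UNIT
    group = unit // UNITS_PER_GROUP
    return group, unit


def access_time_ns(cpu, memory_owner):
    """Estimate how long `cpu` needs to reach memory on the unit of `memory_owner`."""
    cpu_group, cpu_unit = locate(cpu)
    mem_group, mem_unit = locate(memory_owner)
    if cpu_unit == mem_unit:
        # Local memory: no switch involved.
        return LOCAL_NS
    if cpu_group == mem_group:
        # Another unit in the same group: one switch on the way.
        return LOCAL_NS + GROUP_SWITCH_NS
    # Another group: own group switch, higher-level switch, target group switch.
    return LOCAL_NS + 2 * GROUP_SWITCH_NS + TOP_SWITCH_NS


for owner in (3, 4, 16):
    print(f"CPU 0 -> memory of CPU {owner}: {access_time_ns(0, owner)} ns")
```

With these assumed numbers, the same CPU sees three different access times depending on where the memory sits, which is exactly the effect described above.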
The simplest case is a symmetric multiprocessor (SMP) system, where there is no memory hierarchy at all. Figures 1 and 2 show examples of such systems: as far as memory access is concerned, every processor is exactly the same as any other. Memory is accessed through the memory controller, which resides in the chipset, and requests are served in the order in which they arrive. Nearly all 2-way systems follow this principle, as do the majority of 4-way systems and a certain number of 8-way systems. For example, the above-mentioned ProFusion, although it has a switch inside, is an SMP system, since every processor has equal rights with regard to memory access.
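The first-come, first-served principle can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the class name, the fixed service time, and the single shared queue are hypothetical and do not describe how any particular chipset is built.

```python
from collections import deque


class SmpMemoryController:
    """Toy model: one shared queue, every CPU is treated identically."""

    def __init__(self, service_time_ns=100):
        # Assumed constant service time; it does not depend on the requesting CPU.
        self.service_time_ns = service_time_ns
        self.queue = deque()

    def submit(self, cpu_id, address):
        """Add a memory request to the end of the queue."""
        self.queue.append((cpu_id, address))

    def drain(self):
        """Serve all requests in arrival order.

        Returns a list of (cpu_id, address, completion_time_ns) tuples.
        """
        served = []
        clock = 0
        while self.queue:
            cpu_id, address = self.queue.popleft()
            clock += self.service_time_ns
            served.append((cpu_id, address, clock))
        return served


controller = SmpMemoryController()
controller.submit(2, 0x1000)
controller.submit(0, 0x2000)
controller.submit(1, 0x0010)
for cpu_id, address, done in controller.drain():
    print(f"CPU {cpu_id} address {address:#06x} done at {done} ns")
```

Note that neither the CPU number nor the address influences the order or the cost of a request: only the arrival time matters, which is what makes the system symmetric.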
Non-Uniform Memory Access (NUMA) is a more complex case; I described it in the example of the machine with several hundred processors. As I have already stressed, the memory access time differs greatly depending on exactly which address is requested. Accordingly, to reach maximum performance, the operating system must allocate memory as close as possible to the processor that performs the task. To do so, the operating system has to know the architecture of the system and allocate memory resources efficiently. Such operating systems are difficult to develop and very expensive, like the hardware itself (clearly, a system with several hundred processors cannot be cheap). This architecture is mostly used in supercomputers and mainframes (these two types of computers are distinguished mainly by the ratio of memory performance to processor performance).
Again, as I have already said, the majority of multiprocessor servers with 8 or more CPUs are NUMA systems. Of course, the manufacturers support such servers with NUMA-optimized operating systems.
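The idea of allocating the closest memory first can be illustrated with a small Python sketch. It is not how any real operating system implements its allocator: the function name, the page counts, and the distance table are hypothetical, and the greedy nearest-first policy is simply the most obvious strategy one could assume.

```python
def allocate_pages(cpu_node, pages_needed, free_pages, distance):
    """Greedily take pages from the memory nodes closest to `cpu_node`.

    free_pages: dict mapping node -> number of free pages (updated in place)
    distance:   distance[a][b] is the assumed relative cost of node a
                reaching memory on node b (smaller means closer)
    Returns a dict mapping node -> number of pages taken from it.
    """
    if pages_needed > sum(free_pages.values()):
        raise MemoryError("not enough free pages in the whole system")

    placement = {}
    remaining = pages_needed
    # Visit the nodes from nearest to farthest as seen from the requesting CPU.
    for node in sorted(free_pages, key=lambda n: distance[cpu_node][n]):
        if remaining == 0:
            break
        take = min(remaining, free_pages[node])
        if take > 0:
            placement[node] = take
            free_pages[node] -= take
            remaining -= take
    return placement


# Assumed relative distances between three nodes; node 2 sits in another group.
distance = [
    [10, 20, 40],
    [20, 10, 40],
    [40, 40, 10],
]
free_pages = {0: 2, 1: 5, 2: 8}

# A task on node 0 needs 4 pages: 2 come from local memory, 2 from the neighbour.
print(allocate_pages(0, 4, free_pages, distance))
```

An allocator that ignored the distance table could just as well place all four pages on node 2, and the task would then pay the highest access cost on every memory request. That is why the operating system needs to know the topology of the machine.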