There are also multiprocessor platforms that fall in between these two types (SMP and NUMA). I mean, first of all, the Opteron platform from AMD. This processor is remarkable for its integrated memory controller. This is a chart of a 4-way Opteron-based platform:
Evidently, the P0 processor can access its local memory faster than the memory of other processors. It looks like we are dealing with a NUMA architecture, no doubt. Yes, but not exactly. The NUMA architecture is characterized by a big difference between the access times of the different levels of the memory hierarchy, but in the case you see in the picture above, it takes only about 30% more time to access the memory of the diagonally opposite processor. This difference is small enough that we can usually neglect it. Thus, we can write programs just as we do for an SMP platform, which matters because, as I already said, programming for NUMA is much more difficult. And if we do use programs optimized for NUMA systems, we can hope for an additional performance gain. AMD coined a new term for this memory access organization: SUMO, or Sufficiently Uniform Memory Organization.
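To put a number on "small enough", here is a minimal Python sketch that averages the relative access cost over the four memory banks of such a platform. Only the 30% figure for the diagonal processor comes from the text above. The one-hop cost (`ONE_HOP`) and the function name `average_relative_latency` are my own assumptions for illustration, not measured values or anything published by AMD.

```python
import math

def average_relative_latency(fractions, relative_costs):
    """Weighted mean memory access time, relative to a local access (= 1.0)."""
    if len(fractions) != len(relative_costs):
        raise ValueError("one cost per memory location is required")
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError("access fractions must sum to 1")
    return sum(f * c for f, c in zip(fractions, relative_costs))

LOCAL = 1.00      # local memory, the baseline
DIAGONAL = 1.30   # "about 30% more time", the figure quoted in the text
ONE_HOP = 1.15    # ASSUMPTION: hypothetical value, not given in the text

# Memory of P0 itself, its two direct neighbours, and the diagonal processor.
costs = [LOCAL, ONE_HOP, ONE_HOP, DIAGONAL]

# A NUMA-unaware program whose data ends up spread evenly over all four banks.
uniform = average_relative_latency([0.25] * 4, costs)
# The worst case: every single access goes to the diagonal processor's memory.
worst = average_relative_latency([0.0, 0.0, 0.0, 1.0], costs)

print(f"uniform spread: {uniform:.2f}x, all accesses diagonal: {worst:.2f}x")
```

Under these assumptions a program that ignores memory placement pays roughly 15% on average over purely local accesses, and never more than the 30% bound. That is the margin a NUMA-optimized program could hope to win back.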
Another form of multiprocessor system organization is the cluster. A cluster usually consists of many two-processor (sometimes single-processor) platforms joined in some way. There are many ways to connect them, from exotic links like SCSI to the “classical” variants with Myrinet and Ethernet. What sets a cluster apart is that it is in fact a conglomerate of computers, each of which runs its own operating system. These machines exchange data and perform tasks together. An important fact about such systems is that the data exchange rate is much lower between nodes than within one node of the cluster (a node is a single computer in a cluster).
Clusters are mostly used to run easily parallelized algorithms that can be split into tasks with little or no data dependency between them, each running independently on its own node. A typical example is 3D graphics rendering, for example in 3D Studio Max, where each frame is rendered independently. A cluster can perform this task efficiently, and it costs much less than a single computer with the same number of processors. Of course, not all tasks suit a cluster. For example, a hydrodynamic problem is a poor fit, since the value at each point of the model grid depends on the conditions in the neighborhood of that point, i.e. it’s necessary to exchange large amounts of data. As the data exchange speed is much lower between nodes than within one node, the calculation is inefficient: the processors spend most of their time waiting for data from each other.
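The difference between the two kinds of task can be shown with a toy model. The Python sketch below is only an illustration under my own assumptions: a one-dimensional grid with a three-point averaging step stands in for a real hydrodynamic solver, and the function names `step_whole` and `step_partitioned` are hypothetical. It splits the grid among several "nodes" and counts how many values have to cross node boundaries on every step.

```python
def step_whole(grid):
    """One smoothing step on a single machine; the two end cells stay fixed."""
    new = grid[:]
    for i in range(1, len(grid) - 1):
        new[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return new

def step_partitioned(grid, nodes):
    """The same step with the grid split into `nodes` contiguous chunks.

    Returns the new grid and the number of values that had to be
    copied between neighbouring nodes before the step could run.
    """
    n = len(grid)
    new = grid[:]
    exchanged = 0
    for k in range(nodes):
        lo, hi = k * n // nodes, (k + 1) * n // nodes
        # Halo cells: copies of the neighbours' edge values "sent" to this node.
        left = [grid[lo - 1]] if lo > 0 else []
        right = [grid[hi]] if hi < n else []
        exchanged += len(left) + len(right)
        local = left + grid[lo:hi] + right
        # Update only the cells this node owns, skipping the fixed end cells.
        for gi in range(max(lo, 1), min(hi, n - 1)):
            j = gi - lo + len(left)  # position of cell gi in the local array
            new[gi] = (local[j - 1] + local[j] + local[j + 1]) / 3.0
    return new, exchanged

grid = [float(i * i % 7) for i in range(16)]
result, exchanged = step_partitioned(grid, nodes=4)
print("matches single-machine result:", result == step_whole(grid))
print("values exchanged per step:", exchanged)
```

Rendering independent frames would correspond to zero exchanged values. Here every chunk needs fresh data from its neighbours before each step, and in a real two- or three-dimensional grid the halo is a whole edge or face of the chunk rather than a single value, so the slow links between nodes end up dominating the run time.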