Besides the limitations described in the previous section, there are several even more severe ones. The processor can only process data that it has at its disposal, so we need a system bus to deliver data to the CPU. The caching mechanism helps the bus, too. We will discuss caching as well, but for now let’s deal with the bus.
The bandwidth of the bus (and of the memory subsystem) limits system performance growth: as CPU frequency rises, the bus becomes saturated. That is why system bus bandwidth is one of the main characteristics of a server platform. Obviously, the bandwidth of system memory should match the demands of the system bus; otherwise there would be no point in building a fast system bus. In other words, the system bus delivers data to the CPU and carries results away from it. The faster this goes, the less time the CPU spends waiting and the more it has for processing data, which results in higher performance. In many cases, the system bus links the processors to one another and to memory. The latter is not always true: some processors, such as the Opteron from AMD and the UltraSPARC IIi from Sun, have an integrated memory controller, so memory connects directly to the processor rather than through the chipset. Such processors need the system bus only for connecting to each other and to the I/O system.
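To put a rough number on bus bandwidth, here is a minimal sketch of the usual peak-bandwidth arithmetic: bus width times clock rate times transfers per clock. The function names and the figures in the example (a 64-bit bus at 100 MHz with four transfers per clock, and a hypothetical memory subsystem) are illustrative assumptions, not values taken from this article.

```python
def peak_bandwidth(width_bits: int, clock_hz: float, transfers_per_clock: int = 1) -> float:
    """Theoretical peak bandwidth in bytes per second.

    Real-world throughput is lower because of arbitration,
    protocol overhead and idle cycles.
    """
    return width_bits / 8 * clock_hz * transfers_per_clock


def memory_keeps_up(bus_bytes_per_s: float, memory_bytes_per_s: float) -> bool:
    """True if the memory subsystem can feed the bus at its full peak rate."""
    return memory_bytes_per_s >= bus_bytes_per_s


# Assumed example figures, for illustration only.
bus = peak_bandwidth(width_bits=64, clock_hz=100e6, transfers_per_clock=4)
memory = peak_bandwidth(width_bits=64, clock_hz=100e6, transfers_per_clock=2)

print(f"bus:    {bus / 1e9:.1f} GB/s")
print(f"memory: {memory / 1e9:.1f} GB/s")
print("memory keeps up with the bus:", memory_keeps_up(bus, memory))
```

In this assumed configuration the memory delivers only half of what the bus could carry, which is exactly the mismatch described above: the extra bus speed would go unused.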
Now let’s look at the ways to organize a multiprocessor system. There are several basic approaches: a shared bus, “point-to-point” links and a switch architecture (the last is in fact a hybrid of the first two).
A shared bus is often used in 2-way systems for several reasons: simple wiring (the mainboard layout is less complex), fewer contacts and lower development cost. The idea is to link the processors and memory with one and the same bus, as the illustration below shows. The system is simpler and easier to build, but all processors have to share the bus bandwidth. For example, all systems based on Intel CPUs have this architecture (the Xeon and, with certain reservations, the Xeon MP and Itanium).
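The cost of sharing is easy to see with simple division. The sketch below assumes the bus bandwidth is split evenly among the processors and that every processor is active at once, which is a worst-case simplification; the 3.2 GB/s figure and the function name are hypothetical and used only for illustration.

```python
def per_cpu_share(bus_bytes_per_s: float, n_cpus: int) -> float:
    """Even split of a shared bus among n_cpus processors that are all busy."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return bus_bytes_per_s / n_cpus


# Assumed total bus bandwidth, for illustration only.
BUS_BANDWIDTH = 3.2e9  # bytes per second

for n in (1, 2, 4, 8):
    share = per_cpu_share(BUS_BANDWIDTH, n)
    print(f"{n} CPU(s): {share / 1e9:.1f} GB/s each")
```

Under these assumptions each added processor gets a smaller slice of the same bus, even though the wiring stays just as simple.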
In fact, there are no fundamental limitations on building such a system with more than two processors (four-processor Xeon MP systems, for example, are based on the same principle). In theory, we can build an eight-processor system with a shared bus, too. But this approach has one disadvantage, as you may have guessed: the bus bandwidth becomes insufficient for so many processors. In fact, manufacturers have to put large caches into the CPUs even in a 4-way system, so that while one CPU controls the bus, the others can do something useful rather than wait idly. Another problem in multiprocessor systems is arbitration: requests and data usually travel over the same bus in such a system, so the latency between a CPU request and the system response becomes even more important than the bandwidth. Of course, the lower the latency, the better. Since the system also has to keep track of processor status and cache contents, many manufacturers add an auxiliary bus. For example, Intel uses the so-called Snoopy Bus for system monitoring.
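The effect of arbitration on latency can be illustrated with a toy model. The sketch below assumes a simple round-robin arbiter, a fixed number of bus cycles per transfer, and processors that issue a new request the moment the previous one completes (that is, no cache hits at all). None of this describes a real Intel bus protocol; it is only a worst-case illustration with hypothetical names and parameters.

```python
def average_latency(n_cpus: int, transfer_cycles: int, requests_per_cpu: int) -> float:
    """Average cycles from request to completion on a saturated shared bus.

    Assumptions (hypothetical model): round-robin arbitration, every
    transfer occupies the bus for transfer_cycles, and each CPU issues
    its next request as soon as the previous one completes.
    """
    issued_at = [0] * n_cpus  # cycle at which each CPU issued its pending request
    now = 0                   # current bus cycle
    total_latency = 0

    for _ in range(requests_per_cpu):
        for cpu in range(n_cpus):             # arbiter grants the bus in fixed rotation
            now = max(now, issued_at[cpu])    # bus cannot serve a request before it exists
            now += transfer_cycles            # bus is busy for the whole transfer
            total_latency += now - issued_at[cpu]
            issued_at[cpu] = now              # CPU immediately issues its next request

    return total_latency / (n_cpus * requests_per_cpu)


for n in (1, 2, 4, 8):
    latency = average_latency(n, transfer_cycles=4, requests_per_cpu=1000)
    print(f"{n} CPU(s): about {latency:.1f} cycles per request")
```

In this model the average latency grows roughly in proportion to the number of processors, because each request has to wait for everyone else's turn. A large cache helps by removing requests from the bus altogether, so a CPU with a high hit rate seldom joins the queue.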