Advanced Micro Devices' approach to building many-core microprocessors on the Bulldozer micro-architecture is fresh and innovative. In some cases, however, the "dual-core modules" that make up the chips will not operate at maximum possible efficiency. To ensure the highest achievable performance at all times, AMD will work with software vendors, the company said.
A software developer recently asked AMD whether it made sense to partition a multi-threaded algorithm into pairs of closely interacting threads and to schedule each pair on a Bulldozer module, which contains two closely coupled integer cores. Apparently, AMD is already working with Microsoft and other makers of operating systems to ensure that Bulldozer core pairs operate efficiently.
"For the majority of software, the OS will work in concert with the processor to manage the thread to core relationships. We are collaborating with Microsoft and the open source software community to ensure that future [...] operating systems will understand how to enumerate and effectively schedule the Bulldozer core pairs. The OS will understand if your machine is setup for maximum performance or for maximum performance/watt which takes advantage of core performance boost," said John Fruehe, the director of product marketing for server/workstation products at AMD.
In designs based on the Bulldozer micro-architecture, cores will be able to dynamically share fetch and decode blocks, caches and other units. At least in initial designs, multi-core chips will consist of several modules. Each module will have two independent integer cores with dedicated schedulers; the two cores will share fetch, decode and L2 functionality, along with two 128-bit FMAC pipes served by one FP scheduler. This means that each module is, according to AMD, essentially a tightly linked dual-core microprocessor with shared fetch, decode and floating-point units.
At this point AMD claims that the two integer cores in a Bulldozer module would deliver roughly 80% of the throughput of a dual-core processor with a similar architecture. Naturally, some workloads will fall below that figure and others will exceed it. But according to AMD, it is possible to optimize software for Bulldozer's modules.
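To put the 80% figure in concrete terms, the arithmetic can be sketched as below. This is only an illustration: the baseline of 1.0 per core is an arbitrary unit, the function names are hypothetical, and the chip-level figure assumes that throughput scales linearly across modules, which AMD has not claimed.

```python
# AMD's claimed ratio: one module vs. two fully independent cores.
MODULE_SCALING = 0.80
CORES_PER_MODULE = 2


def module_throughput(core_throughput: float, scaling: float = MODULE_SCALING) -> float:
    """Estimated combined throughput of both integer cores in one module."""
    return CORES_PER_MODULE * core_throughput * scaling


def equivalent_full_cores(modules: int, scaling: float = MODULE_SCALING) -> float:
    """Number of fully independent cores a chip would match.

    ASSUMPTION: linear scaling across modules (not stated by AMD).
    """
    return modules * CORES_PER_MODULE * scaling


if __name__ == "__main__":
    # One module, with an arbitrary baseline of 1.0 per independent core.
    print("module throughput:", module_throughput(1.0))
    # A hypothetical four-module (eight integer core) chip.
    print("equivalent full cores:", equivalent_full_cores(4))
```

Under these assumptions a single module lands at about 1.6 times the throughput of one independent core, so each thread gives up roughly a fifth of its standalone throughput when both cores in the module are busy.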
For example, if a workload focuses mainly on querying data and two threads share a data set that fits in Bulldozer's L2 cache, then having them execute in the same module could have some advantages. On the other hand, if a multi-threaded application is not optimized to target the L2 (or possibly the L3) cache, or if distinctly separate applications need to run, then a developer will need to have them scheduled on separate modules to get better performance.
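The two placement strategies can be sketched as a small planning function. This is a rough illustration rather than AMD's or any operating system's method: it assumes that logical CPUs 2m and 2m+1 belong to module m, which depends on how the OS enumerates the cores, and `plan_placement` and `module_of` are hypothetical names.

```python
CORES_PER_MODULE = 2  # two integer cores per Bulldozer module


def module_of(cpu: int) -> int:
    """Module index of a logical CPU, ASSUMING CPUs 2m and 2m+1 share module m."""
    return cpu // CORES_PER_MODULE


def plan_placement(thread_pairs, num_modules, share_l2):
    """Return a dict mapping each thread name to a logical CPU id.

    share_l2=True  -> both threads of a pair go to the same module.
    share_l2=False -> the threads of a pair go to different modules.
    """
    if num_modules < 1:
        raise ValueError("need at least one module")
    plan = {}
    if share_l2:
        for i, (first, second) in enumerate(thread_pairs):
            base = (i % num_modules) * CORES_PER_MODULE
            plan[first] = base       # first core of the module
            plan[second] = base + 1  # its sibling core, same L2
        return plan
    if num_modules < 2:
        raise ValueError("separate placement needs at least two modules")
    # Round-robin over modules, so consecutive threads never share a module.
    threads = [name for pair in thread_pairs for name in pair]
    for k, name in enumerate(threads):
        module = k % num_modules
        core = (k // num_modules) % CORES_PER_MODULE
        plan[name] = module * CORES_PER_MODULE + core
    return plan


if __name__ == "__main__":
    pairs = [("query_a", "query_b"), ("app_c", "app_d")]
    print(plan_placement(pairs, num_modules=2, share_l2=True))
    print(plan_placement(pairs, num_modules=2, share_l2=False))
```

On Linux, the resulting CPU ids could be applied with `os.sched_setaffinity`, although for most software AMD expects the operating system's scheduler to make this decision.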