No matter how secretive AMD wants to be about its future plans, sometimes it has to show its cards to ensure wide industry support for its innovations. Now appears to be such a time: one of the company’s public documents reveals that the chipmaker is working on a huge chip that features sixteen x86 cores on a single piece of silicon.

Kaveri Opens Doors to the Future

AMD has a tremendously conservative public roadmap: it shows the code-names of next-generation microprocessors due within twelve months of publication, but lacks many details. By contrast, the company’s documents for software developers have to be detailed and have to cover the more distant future of AMD’s chips to give programmers a perspective on where AMD is heading. As a result, those documents often reveal important facts about the company’s upcoming products.

Along with its long-awaited accelerated processing unit code-named “Kaveri”, AMD released an all-new “Software Optimization Guide for AMD Family 15h Processors” this month (15h stands for the Bulldozer micro-architecture and its derivatives). The guide contains tips on how to optimize programs for the new APU as well as references to future AMD offerings.

Sixteen-Core Chip

According to the document (see page 197), the Sunnyvale, California-based chip designer is working on a Family 15h processor, models 30h – 3Fh, with eight compute nodes (this is what AMD now calls its dual-core x86 modules) and sixteen cores. The diagram that AMD provides depicts a chip with eight modules interconnected through an SRI [system request interface], with one crossbar (XBAR) that handles communication between the SRI, the MCT [memory controller] and the HT [HyperTransport] links. This kind of topology clearly points to a single-die multi-core microprocessor with up to sixteen (in AMD’s classification) x86 cores. Current-generation twelve- and sixteen-core processors from AMD use two dies (with six or eight cores each) on a single piece of substrate.

“Newer models of Family 15h processors offer five links for connections to I/O and other processors. Of the five links, one link supports PCIe 3.0, two support coherent HyperTransport, and two are capable of either coherent HyperTransport or PCIe 3.0. These processors have 8 compute units (16 cores),” the document, which was discovered by web-site, from AMD reads.

The number of links on the processor clearly points to its target market: high-end 2P and multiprocessor servers. The Family 15h models 30h – 3Fh designation indicates AMD’s Steamroller micro-architecture. Given that AMD plans to sell Piledriver-based server CPUs for a while (chips code-named Warsaw are set to arrive this year), this sixteen-core microprocessor will hardly hit the market until late in 2015, or, more likely, sometime in 2016. Therefore, it is logical to assume that it will be made using 20nm- or 14nm-class process technology.


Keeping in mind that AMD has used multi-chip-module solutions for server-class processors for a number of years now (to get sixteen-core Opterons using two eight-core processors), it is highly likely that the firm will stick to a similar design approach to create 32-core chips in a bid to compete against Intel’s Xeon “Broadwell-EP” and “Broadwell-EX” chips with eighteen and more high-performance cores.

Due to the increased number of processing engines, enhanced performance at the micro-architectural level and a number of other factors, it is likely that commercial products based on the eight-module dies described above will use multi-channel DDR4 memory.

To sum up, the new information that AMD has revealed about its future projects clearly indicates that the company has not forgotten about high-performance central processing units with high core counts. These chips are going to play a big role for quite a while, so AMD wants to be there.

The only question is how competitive a multi-core AMD Steamroller-based chip will be against Intel’s Broadwell-EX/EP in terms of performance.

AMD did not comment on the news-story.

Tags: AMD, Steamroller, Bulldozer, x86, Excavator, Opteron


Comments currently: 31
Discussion started: 01/18/14 12:37:06 AM
Latest comment: 03/01/16 12:25:36 AM


yes troll, but this is exceedingly stupid - "mobile in desktop" - what does it even mean? and servers are dying? it is the dumbest thing i've ever heard from you... this week
9 1 [Posted by: snakefist  | Date: 01/18/14 01:34:51 AM]

Please don't feed the trolls. Just vote him down to help silence him.
14 7 [Posted by: fanboyslayer  | Date: 01/18/14 02:05:30 AM]
Thats good. But do that silently just like me
7 3 [Posted by: tks  | Date: 01/18/14 04:36:59 AM]

AFAIK they were planning this, but it got scrapped due to not being competitive enough, just like they scrapped the 10-core server Piledriver, and K7 chips with >512 KB external L2 cache.
3 0 [Posted by: hkultala  | Date: 01/18/14 02:59:30 AM]

And the APU. Just imagine had Kaveri come out in 2012 like it was supposed to?

AMD have consistently failed to meet deadlines for the past 5 years or so.
1 0 [Posted by: keysplayer  | Date: 01/18/14 04:10:47 PM]
Failing to meet deadlines is different than scrapping a chip totally.

Though sometimes the reason for scrapping may be too long delays that would make the product uncompetitive at launch time
0 0 [Posted by: hkultala  | Date: 01/21/14 08:02:14 AM]
Other cancelled AMD chips were Mustang (Athlon XP/MP with >= 1 MiB of on-die L2 cache), and Krishna/Wichita (28 nm Bobcat-based APU)

Also, Bulldozer modules/cache controllers support 1 MiB of L2 per module (with 2 cycles lower latency), but no chip using such a cache size has been released.
0 0 [Posted by: hkultala  | Date: 01/21/14 08:15:28 AM]

I highly doubt they will be Steamroller based, judging by the timeline.
If they aren't due until 2015-16, then they will be based on a later architecture, perhaps Excavator or the next gen cores.
6 1 [Posted by: caring1  | Date: 01/18/14 04:10:15 AM]

Not a chance. These cpus are to satisfy existing contracts. Do not expect anything different than gpu-less Kevs.
0 1 [Posted by: amdzorz  | Date: 01/21/14 10:25:43 AM]

From the picture and description above, you can clearly see that there are actually 8 real cores, or compute units, as AMD prefers to call them. I don't think it is fair to call this a true 16-core CPU, since the 2 cores on each compute unit share the same cache. In real applications, those 2 cores actually act like a single one, because they share the same resources. This is visible especially in games.
6 1 [Posted by: TAViX  | Date: 01/18/14 05:34:43 AM]

You don't know what you're saying... calling 2 cores that only share cache and the fp pipe is like saying intels ht doubles the core count... it is not clear cut but they are more like 1.8 cores than 1... also scaling anywhere from 40% to 100% means they are effectively 2 cores much more than 1. Plus a module has 2 compute units as per amds nomenclature...
3 3 [Posted by: tcube  | Date: 01/18/14 06:08:58 AM]
The 2 cores of a Core2Duo share the same L2 cache. So what? Are they not 2 real cores for this reason*?
Granted, an AMD compute unit contains 2 integer cores and one FPU shared by them.
And this is the reason why some FPU-intensive programs perform slower when run on 2 cores belonging to the same compute unit than when run on 2 cores belonging to 2 separate compute units.

*) Actually, having a shared (vs. separate) cache of the same level is an advantage, not a problem.
If 2 cores are accessing the same area of memory, they cannot cache it in separate caches, because this would lead to inconsistency (writes done by one core are not visible to the other core). In such cases the data must be cached in the (slower) L3 shared by both cores or, if L3 is not present, in memory (slower still).
The latter is sometimes the case for Core2Quad (separate sets of L2 cache for each pair of cores and no L3) and for pre-Phenoms and Athlons (both without L3 and with separate L2 for each core).

1 0 [Posted by: KonradK  | Date: 01/19/14 07:39:18 AM]
No, they call 2 cores as 1 because of public relation reason. The performance of 2 AMD cores could barely match their competitor single core, so to save face ...
1 0 [Posted by: Tukee44  | Date: 01/19/14 11:26:50 AM]
agree. if it cannot execute on all 16 cores 100% of the time.. and not dual pipe fpu.. not really 16 cores. its better than BD ever was.. but still a disaster aka p4 for amd.
0 2 [Posted by: amdzorz  | Date: 01/21/14 10:28:57 AM]

I find the graphics and the contents of this article confusing: if AMD is now referring to "modules" (1 module = 2 threads with separate ALU/AGU/decode/etc., but with an FPU shared between them) as "nodes", why call them "Compute Units" (CU) in the picture shown above?

AnandTech's article explains that with this generation of products AMD is using the term "Compute Core" for a single thread of a 2-thread module (meaning that the top-of-the-line Kaveri, sporting two modules + 8 GPU compute clusters, is called a "12 Compute Core" product: 4 from the 4 threads of 2 modules, plus 8 GPU clusters).

So does "Compute Unit" (CU) in the picture refer to "Compute Core" (meaning that this is merely a 4-module, 8-thread chip) or does it actually refer to modules, leading up to 8-module 16-thread chip ?

Please elaborate.
5 0 [Posted by: Alecto  | Date: 01/18/14 08:41:09 AM]

This clearly would/may be a server chip [I can only find a reference to a Piledriver-based Warsaw CPU having up to 16 cores]: a new Steamroller-microarchitecture-based chip with 2 CPUs in each of its 8 modules (not on the official roadmap). So why the Kaveri reference? The Steamroller microarchitecture does share fewer computational resources between the CPUs on each module, but the server variant has more power-hungry server-type circuitry (cHT, more PCIe 3.0 lanes, etc.) and is usually clocked higher than mobile CPUs. Mobile Kaveri has fewer cores/modules, but it also has more HSA functionality that this supposed server-based 16-core chip lacks, i.e. no GPU. Kaveri has its tuned-for-mobile 2 CPUs per module (2 modules total) for its 4 "CPU compute cores", plus 8 HSA-aware GPU "compute cores"/asynchronous compute engines (ACE) that also provide GPGPU compute acceleration! Clearly the Steamroller microarchitecture that both server and mobile chips are based around has more non-shared CPU functionality and can produce more IPC performance, but the server variants have no GPUs or HSA-aware circuitry; instead they have extra PCIe lanes and are designed for server motherboards with more than one 16-core chip per motherboard, and for high-bandwidth server clustering (via the HyperTransport links). The Steamroller microarchitecture does improve by using fewer shared CPU resources per 2-CPU module; it still shares an FPU, but a much wider (even more so with some 15h server variants), independent FPU than previous-generation AMD microarchitectures. The linked guide has 392 pages of programmers' information, so better brush up on your machine language skills and computer microarchitecture texts!
Nothing new with server variants being based on tweaked and added CPU execution resources, more than the desktop SKUs, with mobile having the minimum CPU execution resources. What is NEW with Kaveri is having full HSA GPGPU resources (for mobile and desktop)! So even though the FPU is still shared, the FPU itself on the server variants is beefed up to give each core more of the "equivalent performance" of a CPU with completely non-shared resources. Whether this will beat Intel or come close to it will have to wait for the actual post-RTM independent benchmarks to be released. The cost per AMD server chip may more than make up for the lower-than-Intel top-end performance, if the price/performance metrics show the AMD product to be the better buy.

Note: I did find a server APU listed by Wikipedia, but the source listed by Wikipedia is questionable.

"Enterprise and server markets: The Berlin APU will be similar to Kaveri, featuring four Steamroller cores, up to 512 stream processors, and support for ECC memory"

Looks like Excavator, will finally wash Hector Jesus Ruiz and Dirk Meyer out of AMD's hair, But oh the confusing AMD codenames and such!!!

Also, for an in-depth review of Kaveri read this:

1 0 [Posted by: BigChiefRunAmok  | Date: 01/19/14 04:01:48 PM]
If you go to the document it defines "These processors have 8 compute units (16 cores)."

And earlier text defines each module as 1 or 2 independent integer units and a floating point unit and fetch....

1 0 [Posted by: Principle  | Date: 01/20/14 11:14:15 PM]

Made by whom? TSMC on 20nm or Gloflo on 28nm? both have the capability to do so.
1 0 [Posted by: tedstoy  | Date: 01/18/14 08:40:02 PM]

Quote from the article:
this sixteen-core microprocessor will hardly hit the market until late in 2015, or, more likely, sometime in 2016. Therefore, it is logical to assume that it will be made using 20nm- or 14nm-class process technology.

I am also interested in the power consumption for this CPU (APU?). My guess is that it will be in the 150W range, but knowing AMD, it can also be 200-250W quite easily, on full load...
2 0 [Posted by: TAViX  | Date: 01/19/14 05:13:34 AM]

What about the Abu Dhabi which is due now with up to 16 cores No. 6338 - 6370?
2 0 [Posted by: tedstoy  | Date: 01/19/14 06:54:27 AM]

It would be competitive (somewhat) if released today. By the time 2016 rolls around, 16 cores is going to be "meh" when compared against Skylake EX which will have probably 32 real cores.
0 0 [Posted by: AnonymousGuy  | Date: 01/20/14 05:34:28 PM]

You know, that isn't really larger than other Opteron chips that are on that same large socket, or than the Xeons of that size. I expect it will be released in 2015, otherwise it will be too late. This will most likely come out on the 20nm line, otherwise the die size using 28nm would end up being about 450mm^2. I can easily see them producing a 350mm^2 16-core CPU at 20nm with quad-channel DDR3. This chip is to support old customers, a drop-in replacement for all the loyal customers that bought the Piledriver line. They already have mainboards galore and cooling and DDR3 RAM, etc... The main architecture shift will likely be with Excavator, using DDR4 RAM and all APUs. Excavator is expected to take advantage of the GPU compute units as if they were just another set of FPUs to be dispatched to. So it would not require special programming like the current Kaveri APU. So Excavator is essentially, by definition, the completion of FUSION and an APU-only design.

Technically, based on the Kaveri die size, AMD could make a 28nm Steamroller CPU with 8-cores, but they would likely be disappointing on the frequency front without changing the foundry process from bulk to something more optimized.
1 0 [Posted by: Principle  | Date: 01/20/14 08:45:43 PM]

