For the applications this targets, DDR bandwidth is only part of the picture: mesh and L2 bandwidth matter at least as much, since developers often use these cores to implement higher-level structures like pipelines and systolic arrays.
Also, who said anything about an "ARM core"?
Tilera uses their own 3-wide 64-bit VLIW core as opposed to an out-of-order superscalar like A9/A15/proAptive. That's actually a pretty nice tradeoff for their application space. Most of the workloads/algorithms they implement have reasonable static ILP, and they don't care about binary portability between core generations. They can therefore save a LOT of power/gates by pushing dependency analysis and instruction scheduling into the compiler.
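To make "pushing instruction scheduling into the compiler" concrete, here is a toy sketch of what a static scheduler does: pack independent operations into fixed-width bundles ahead of time, so the hardware never has to check dependencies at run time. This is not Tilera's actual compiler. The `schedule` function, the op names, and the unit-latency assumption are all made up for illustration.

```python
def schedule(deps, width=3):
    """Greedy list scheduling into fixed-width bundles.

    deps maps each op name to the set of ops it depends on.
    Assumes (hypothetically) unit latency and no restrictions on
    which slot an op may occupy; real VLIW slots are more constrained.
    """
    remaining = dict(deps)
    done = set()
    bundles = []
    while remaining:
        # An op is ready once everything it depends on has been issued
        # in an earlier bundle.
        ready = [op for op in remaining if remaining[op] <= done]
        if not ready:
            raise ValueError("dependency cycle")
        bundle = ready[:width]
        bundles.append(bundle)
        for op in bundle:
            del remaining[op]
        done.update(bundle)
    return bundles


# Hypothetical dependency graph: c needs a and b, e needs c, f needs d.
example = {
    "a": set(),
    "b": set(),
    "c": {"a", "b"},
    "d": set(),
    "e": {"c"},
    "f": {"d"},
}
bundles = schedule(example)
```

Six ops fit into three bundles here, and only the first bundle is full. Code with good static ILP keeps all three slots busy most of the time; code without it leaves slots empty, which is the price of dropping the out-of-order hardware.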
This makes for an interesting contrast with LSI, who went the "many ARM core" route (16 A15s at 1.6 GHz) for a similar application space and chip size. Tilera achieves 216 Gops/sec peak, whereas LSI peaks at about 77 Gops/sec, roughly a 2.8x gap. That's actually a pretty typical ratio for a simple VLIW vs an OoO core like A15. The trick is to use all of that bandwidth :-).
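For anyone who wants to check the arithmetic, here is a back-of-the-envelope sketch of where the peak numbers can come from. Only the 16 cores, 1.6 GHz, and 3-wide VLIW figures are from the comment above. The Tilera core count (72) and clock (1.0 GHz) are my assumptions, picked because they reproduce the 216 figure, and counting the A15 as 3 ops/cycle is also an assumption that happens to reproduce the 77.

```python
def peak_gops(cores: int, ghz: float, ops_per_cycle: int) -> float:
    """Peak throughput in Gops/sec: cores x clock (GHz) x ops issued per cycle."""
    return cores * ghz * ops_per_cycle


# LSI: 16 A15s at 1.6 GHz (from the comment); 3 ops/cycle is an assumption.
lsi = peak_gops(cores=16, ghz=1.6, ops_per_cycle=3)

# Tilera: 3-wide VLIW (from the comment); 72 cores at 1.0 GHz is an assumption.
tilera = peak_gops(cores=72, ghz=1.0, ops_per_cycle=3)

ratio = tilera / lsi

print(f"LSI    peak: {lsi:.1f} Gops/sec")
print(f"Tilera peak: {tilera:.1f} Gops/sec")
print(f"ratio:       {ratio:.2f}x")
```

Under those assumptions the per-core issue width is the same, so the gap comes from how many simple cores fit in a similar area and power budget rather than from any single core being faster.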
02/25/13 05:44:11 PM]