LEAN is what AMD has been doing since RV770, and what Nvidia has done with exactly one SKU, the 680, which is a good match of bandwidth, ROPs, TMUs, compute/SFU units, and clock yields on the process within 225W. It's a pretty perfect storm. I imagine the refresh will be very similar but use 7GHz RAM (the old 6GHz parts ran at 1.6V; the new 7GHz parts use 1.5V), because the 680 as clocked at stock (1112/6008) already uses every ounce of bandwidth. Hence the higher clocks the process is capable of would be better matched with faster RAM within that TDP. In theory it could scale perfectly to 1300/7000, though I don't think that is realistic on any consistent basis, especially within 225W.
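For anyone who wants to sanity-check that 1300/7000 figure, here is a rough sketch. It assumes (a simplification, not a measurement) that the core clock has to rise in direct proportion to the memory data rate to keep the 680's stock balance. The function name is made up for illustration.

```python
def matched_core_clock(core_mhz, mem_mtps, new_mem_mtps):
    """Core clock that keeps the same core/memory ratio at a new memory speed.

    Assumes simple linear scaling, which is an idealisation.
    """
    return core_mhz * new_mem_mtps / mem_mtps


# Stock figures quoted in the post: 1112 MHz core, 6008 MT/s memory.
GTX680_CORE_MHZ = 1112
GTX680_MEM_MTPS = 6008

target = matched_core_clock(GTX680_CORE_MHZ, GTX680_MEM_MTPS, 7000)
print(f"Core clock matched to 7000 MT/s RAM: {target:.0f} MHz")
```

That lands just under 1300MHz, which is where the "perfect scaling" number comes from under this assumption.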
On the subject of LEAN, AMD realized long ago that the Nvidia logic of optimizing for lower-voltage (and/or best-yielding) clocks while using more and better-matched logic wasn't the best business model for them. I assume they estimate a target performance level and then work backwards to find the least logic that the process, with higher voltage/clocks, can scale consistently within the set TDPs for each market (i.e. 150, 225, 300W) to reach that level, and plan accordingly for salvage parts that use more power-efficient, better-yielding clocks and sit one notch (PCI-E connector) down. That is LEAN. Nvidia's attempt at such a strategy (the 500 series), with stock TDPs right at PCI-E thresholds to limit the connectors and overclocking going out of spec, was a nice value but pretty shady. Square peg, round hole. You can see they gave up on any residual planning with their three 150-225W GK104 parts and the 140W (<150W) part, shoehorned in to beat competing products.
Everything we've seen points to AMD using 2560sp, or 40 CUs, probably clocked in the ~1050MHz/6400 range (and probably overclocking to 1200MHz or so like other high-end products). Why does it make sense? Because in theory 2560sp at 1.175V should be able to hit 1166MHz, which would saturate 7GHz RAM on a 384-bit bus...all reasonable clocks/voltages for 28nm and GDDR5. I.e., lean.
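One way to see where a number like 1166MHz can come from: take the 680's stock compute-to-bandwidth balance as the "saturated" reference and ask what clock 2560sp would need on 7GHz/384-bit to match it. This is only a sketch. The 680's 1536 shaders and 256-bit bus come from its public specs rather than the post, treating shaders x MHz per GB/s as comparable across AMD and Nvidia is a crude assumption, and the helper names are invented.

```python
def bandwidth_gbs(mem_mtps, bus_bits):
    """Peak memory bandwidth in GB/s from data rate (MT/s) and bus width (bits)."""
    return mem_mtps * bus_bits / 8 / 1000


def shader_mhz_per_gbs(shaders, core_mhz, mem_mtps, bus_bits):
    """Crude balance metric: shader-MHz of compute per GB/s of bandwidth."""
    return shaders * core_mhz / bandwidth_gbs(mem_mtps, bus_bits)


# Reference point: GTX 680 at the stock clocks quoted earlier (1112/6008).
ref_ratio = shader_mhz_per_gbs(1536, 1112, 6008, 256)

# Hypothetical AMD part: 2560sp on a 384-bit bus with 7000 MT/s GDDR5.
bw = bandwidth_gbs(7000, 384)
core = ref_ratio * bw / 2560

print(f"Bandwidth: {bw:.0f} GB/s")
print(f"Core clock at the 680's balance: {core:.0f} MHz")
```

Under those assumptions the bandwidth works out to 336 GB/s and the matching core clock to about 1166MHz.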
OTOH, everything points to Nvidia using more logic, probably less well matched to 48 ROPs than GK104 is to 32, to keep power down and/or yields up by using lower clocks; one would assume something similar to the 670 (980MHz). In theory, 14 SMX/2688 shaders could be clocked up to GTX 680 (1112) levels with 7GHz RAM and still have the same compute/bandwidth ratio, which could probably be done at low voltage...and because of the separate SFUs and different cache/bandwidth setup (or at least current optimizations), one would think it performs around 10%+ better per clock than 2560sp from AMD. It probably won't clock very high before running into the TDP wall, but it will probably be enough regardless. We shall see.
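The "same compute/bandwidth ratio" claim is easy to check with rough numbers. The sketch below assumes a 384-bit bus for the 2688-shader part (inferred from the 48 ROPs, not confirmed) and uses the 680's public 1536 shaders/256-bit bus as the baseline. The metric and function name are illustrative only.

```python
def shader_mhz_per_gbs(shaders, core_mhz, mem_mtps, bus_bits):
    """Crude balance metric: shader-MHz of compute per GB/s of peak bandwidth."""
    bandwidth_gbs = mem_mtps * bus_bits / 8 / 1000
    return shaders * core_mhz / bandwidth_gbs


# GK104 (GTX 680) at the 1112/6008 clocks quoted in the post.
gk104 = shader_mhz_per_gbs(1536, 1112, 6008, 256)

# Hypothetical 14 SMX part: 2688 shaders at 1112 MHz, 7000 MT/s, assumed 384-bit.
big_chip = shader_mhz_per_gbs(2688, 1112, 7000, 384)

diff_pct = (big_chip / gk104 - 1) * 100
print(f"GK104: {gk104:.0f}  14 SMX: {big_chip:.0f}  difference: {diff_pct:.2f}%")
```

With these inputs the two ratios differ by well under 1%, so the balance really would carry over if the part could hold 680-level clocks.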
IOW though, it's probably the same old story. Logic versus voltage/clocks. Yields versus die size. $899 versus likely a hell of a lot less. Careful planning vs shoehorning.
01/22/13 03:31:52 AM]