There is a glaring and technically worrying misconception at the end of the article: that the number of stream processors is all that matters (SpursEngine/Cell vs GPUs). That is far from the case; what the stream processors are actually capable of matters much more. And even though programmers complain that Cell is hard to program for, that is nothing compared to porting general purpose code to a GPU. Getting that to run fast is a real nightmare, and general purpose code is a very different thing from running a few shaders... Some algorithms are inherently unsuited to GPU processing, and that set is larger than the set already unsuited to Cell (though, thanks to the PS3 and Cell being used in supercomputers, it happens rather frequently that someone ports something to Cell that was previously said to be impossible!)
I'll give you three examples:
1) People at IBM tell me they can decode two 1920x1080 H.264 streams in realtime on the PS3. I doubt an Nvidia or ATI GPU can handle that, and we have no clue how many stages of the H.264 decoding pipeline they actually hand off to the main CPU, because parts of the task are inherently unsuited to GPU processing (CABAC, aka arithmetic coding, comes to mind!). There are realtime 1080p H.264 encoders for Cell (see: www.dicas.de/news120908 ). There is also software called Badaboom that speeds up H.264 encoding on the GPU and supposedly gets a 2x speedup on a 1440x1080 video. Of course we have no idea what quality parameters the two use, but let's just assume both try to make their solution appear as fast as possible: a G80 would still only be about 1.5x as fast as a Cell, even though it has many times as many stream processors. And again it's questionable how much of the whole task in the GPU solution actually runs on the GPU...
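For anyone wondering where the 1.5x comes from, here is a rough back-of-the-envelope sketch. It rests on assumptions that are not measurements: that the Badaboom figure means twice realtime, that the Cell encoder runs at exactly realtime on 1920x1080, and that encoding cost scales linearly with pixel count. The function name is made up for illustration.

```python
def relative_throughput(speed_a, res_a, speed_b, res_b):
    """Ratio of pixels processed per unit time, encoder A vs encoder B.

    speed_* is a multiple of realtime, res_* is (width, height).
    Assumes cost scales linearly with pixel count, which is a simplification.
    """
    pixels_a = res_a[0] * res_a[1]
    pixels_b = res_b[0] * res_b[1]
    return (speed_a * pixels_a) / (speed_b * pixels_b)


# Assumed inputs: GPU encoder at 2x realtime on 1440x1080,
# Cell encoder at 1x realtime on 1920x1080.
ratio = relative_throughput(2.0, (1440, 1080), 1.0, (1920, 1080))
print(ratio)  # 1.5
```

1440x1080 has 75% of the pixels of 1920x1080, so a 2x result there shrinks to about 1.5x once both are normalised to the same frame size.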
2) Folding@home runs on the PS3's Cell and on Nvidia/ATI GPUs. While GPUs are a lot faster at folding the jobs Stanford assigns them, the nature of these jobs also needs to be taken into account. On their FAQ page the F@h people say that only a very limited set of tasks can be run on the GPU client (very fast though!) and that only normal CPUs run the full breadth of simulations. Cell is somewhere in between the two, meaning it lends itself to more tasks than a GPU, but still fewer than a CPU (presumably the gap is only in tasks where double precision matters!). And it is also a lot faster than a CPU! ;-)
3) Let's look at a general purpose task like raytracing. IBM's realtime Cell raytracer vs the Saarland GPU one (that Intel also loves!), rendering the Stanford bunny:
http://gametomorrow.com/b...p/2007/09/05/cell-vs- 80/
A PS3 with a somewhat limited Cell (6 of 8 SPUs available) is a little over 3x as fast as an Nvidia 8800 GTX on primary rays, but of course in raytracing you at least want secondary rays (what's the point of raytracing if you don't have reflections, right?), and here the PS3 is over 4x as fast as the 8800 GTX! And that is with the 8800 having many times the transistors, stream processors and theoretical GFLOPS of Cell!
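To make "secondary rays" concrete: a reflection ray is spawned at the hit point of a primary ray, and its direction depends on the surface normal there, so neighbouring rays can scatter in very different directions. A minimal sketch of the standard mirror-reflection formula r = d - 2(d·n)n follows. It is plain Python for illustration only, not code from either raytracer, and it assumes n is a unit vector.

```python
def reflect(direction, normal):
    """Mirror-reflect an incoming ray direction about a unit surface normal.

    Implements r = d - 2 (d . n) n, with vectors given as 3-tuples.
    """
    dot = sum(d * n for d, n in zip(direction, normal))
    return tuple(d - 2 * dot * n for d, n in zip(direction, normal))


# A ray travelling down and to the right hits a floor whose normal points up.
print(reflect((1, -1, 0), (0, 1, 0)))  # (1, 1, 0)
```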
Let's just face the facts: while modern GPUs *can* be used for purposes other than 3D graphics, that does not mean they were *built* with anything else in mind! 3D FPS is what sells these things, not GFLOPS or protein folds per second!... Cell, however, was built to cover more breadth by having better programmability and more flexibility. I'm not sure, but I think it is still impossible to daisy-chain GPU stream processors so that they work sequentially on some data like you can on Cell (e.g. in video decoding: IDCT -> motion compensation -> YUV scaling).
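As a rough picture of what I mean by daisy-chaining, here is a toy sketch of the dataflow only, using Python generators as stand-ins for processing units that each hand their output straight to the next stage. The stage bodies are placeholders (not real IDCT, motion compensation or scaling), all names are hypothetical, and nothing here is Cell or GPU API code.

```python
from typing import Iterable, Iterator, List

Block = List[int]  # hypothetical stand-in for a block of samples


def idct_stage(blocks: Iterable[Block]) -> Iterator[Block]:
    # Placeholder for the inverse transform: just halves every value.
    for block in blocks:
        yield [v // 2 for v in block]


def motion_comp_stage(blocks: Iterable[Block], reference: Block) -> Iterator[Block]:
    # Placeholder for motion compensation: adds the residual to a reference block.
    for block in blocks:
        yield [r + v for r, v in zip(reference, block)]


def scale_stage(blocks: Iterable[Block]) -> Iterator[Block]:
    # Placeholder for output scaling: clamps samples to the 8-bit range.
    for block in blocks:
        yield [max(0, min(255, v)) for v in block]


def decode(blocks: Iterable[Block], reference: Block) -> List[Block]:
    # Each stage consumes the previous stage's output one block at a time,
    # so a block flows through the whole chain without a global round trip.
    chained = scale_stage(motion_comp_stage(idct_stage(blocks), reference))
    return list(chained)


print(decode([[10, 600, -20]], [100, 100, 100]))  # [[105, 255, 90]]
```

The point of the chain is that every block moves from one stage directly into the next, instead of each stage finishing the whole frame before the next one starts.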