I just want to reiterate that saying SSE3 will save the P4 is silly. What about all the Northwood users without SSE3? They're just going to have to live with a lame CPU, I guess.
http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=1956&p=9
http://www.thejemreport.com/modules.php?op=modload&name=News&file=article&sid=109
SSE3 has 11 new instructions, or 13 if you count all of the Prescott New Instructions: the other two are there to improve hyperthreading and are unrelated to single-thread math performance.
FISTTP, ADDSUBPS, ADDSUBPD, MOVSLDUP, MOVSHDUP, MOVDDUP, LDDQU, HADDPS, HSUBPS, HADDPD, HSUBPD, MONITOR, MWAIT
The FISTTP instruction is useful for x87 floating point to integer conversion, so it will only be used by applications that are not using SSE for their floating point math. No benefit, because you wouldn't use x87 in the first place if you have SSE2, right?
MOVSLDUP, MOVSHDUP and MOVDDUP are nice for complex math I guess. I can't see much use for them in games.
ADDSUBPS and ADDSUBPD are useful for FFTs (used in DSP-like tasks, not games) according to AnandTech, but I don't see the use for them in games.
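To show what "nice for complex math" means, here is a rough scalar model in plain C of how MOVSLDUP, MOVSHDUP and ADDSUBPS combine to multiply two packed complex numbers at once. The `*_model` helpers and `complex_mul2` are names I made up for illustration; they only mimic the lane behavior of the instructions and are not real intrinsics.

```c
/* Scalar models of three SSE3 instructions, operating on 4-float "registers".
   Complex numbers are stored interleaved as {re0, im0, re1, im1}.
   All function names here are hypothetical, for illustration only. */

/* MOVSLDUP: duplicate the even (low) lanes -> {a0, a0, a2, a2}. */
static void movsldup_model(const float a[4], float out[4])
{
    out[0] = a[0]; out[1] = a[0]; out[2] = a[2]; out[3] = a[2];
}

/* MOVSHDUP: duplicate the odd (high) lanes -> {a1, a1, a3, a3}. */
static void movshdup_model(const float a[4], float out[4])
{
    out[0] = a[1]; out[1] = a[1]; out[2] = a[3]; out[3] = a[3];
}

/* ADDSUBPS: subtract in even lanes, add in odd lanes. */
static void addsubps_model(const float a[4], const float b[4], float out[4])
{
    out[0] = a[0] - b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] - b[2];
    out[3] = a[3] + b[3];
}

/* Multiply two pairs of complex numbers: out[k] = x[k] * y[k]. */
static void complex_mul2(const float x[4], const float y[4], float out[4])
{
    float xr[4], xi[4], yswap[4], t1[4], t2[4];
    int i;

    movsldup_model(x, xr);              /* {xr0, xr0, xr1, xr1} */
    movshdup_model(x, xi);              /* {xi0, xi0, xi1, xi1} */

    /* Swap re/im within each pair; on real hardware this would be a shuffle. */
    yswap[0] = y[1]; yswap[1] = y[0]; yswap[2] = y[3]; yswap[3] = y[2];

    for (i = 0; i < 4; i++) {
        t1[i] = xr[i] * y[i];           /* {xr*yr, xr*yi, ...} */
        t2[i] = xi[i] * yswap[i];       /* {xi*yi, xi*yr, ...} */
    }

    /* re = xr*yr - xi*yi, im = xr*yi + xi*yr */
    addsubps_model(t1, t2, out);
}
```

Calling `complex_mul2` on (1+2i, 3+4i) and (5+6i, 7+8i) gives (-7+16i, -11+52i). Without ADDSUBPS you would need an extra sign-flip step before the final add, which is the saving being advertised.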
LDDQU is for unaligned loads, something you want to avoid. Not useful for games because you can just program them to avoid this.
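By "program them to avoid this" I mean something like the following sketch: force 16-byte alignment on the data you intend to load into SSE registers, so the aligned load instructions always work. The `Vertex` layout and `is_aligned16` helper are hypothetical, and this assumes a C11 compiler for `_Alignas`.

```c
#include <stdint.h>

/* Hypothetical vertex layout. _Alignas(16) on the first member forces the
   whole struct to 16-byte alignment, so every element of a Vertex array
   starts on a 16-byte boundary. */
typedef struct {
    _Alignas(16) float pos[4];  /* offset 0  */
    float normal[4];            /* offset 16 */
} Vertex;

/* Returns 1 if p sits on a 16-byte boundary, 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}
```

Since both members are 16 bytes wide, `pos` and `normal` of every element in a `Vertex` array land on 16-byte boundaries and never need an unaligned load.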
The horizontal instructions. Yes, these are useful for doing a dot product. It'll save a few instructions of loading and adding. This will give you a noticeable benefit if all you did was dot products. Pretty useful for doing software emulation of vertex shaders or doing software skinning (why do I own a card that does vertex shaders again?), but not going to give a noticeable boost even for physics because there is so much else going on that saving a couple instructions on the dot product isn't that helpful.
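For anyone wondering what the horizontal add actually buys you, here is a small scalar model in plain C of a 4-element dot product done with HADDPS. `haddps_model` and `dot4_hadd` are made-up names that only imitate the lane behavior of the instruction; real code would use the compiler's intrinsics or let the compiler pick the instruction.

```c
/* Scalar model of HADDPS on 4-float "registers":
   out = {a0+a1, a2+a3, b0+b1, b2+b3}.
   Results go through temporaries so out may alias a or b. */
static void haddps_model(const float a[4], const float b[4], float out[4])
{
    float r0 = a[0] + a[1];
    float r1 = a[2] + a[3];
    float r2 = b[0] + b[1];
    float r3 = b[2] + b[3];
    out[0] = r0; out[1] = r1; out[2] = r2; out[3] = r3;
}

/* Dot product: one packed multiply, then two horizontal adds. */
static float dot4_hadd(const float x[4], const float y[4])
{
    float m[4], h[4];
    int i;

    for (i = 0; i < 4; i++)
        m[i] = x[i] * y[i];     /* stands in for a single MULPS */

    haddps_model(m, m, h);      /* {m0+m1, m2+m3, m0+m1, m2+m3} */
    haddps_model(h, h, h);      /* every lane now holds the full sum */
    return h[0];
}
```

For x = {1, 2, 3, 4} and y = {5, 6, 7, 8} this returns 70. Before SSE3 the two horizontal adds would each be a shuffle plus an add, so the saving really is only a couple of instructions per dot product.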
Other than stating that SSE3 and Hyperthreading optimizations could help save the Pentium 4 from its embarrassment, which is wrong, this article was great! It was very informative. Thanks.
SSE3 is easier than reengineering your game for multithreading (very difficult) because presumably new Intel compilers will automatically use it if they can. You know what else is just as easy, provided you've written good code? Recompiling your application as x86-64. This will likely provide a bigger boost than SSE3 or hyperthreading could.
Sure, some values will now be 64-bit instead of 32-bit and fill up the cache quicker, but all in all having 16 (14) registers instead of 8 (6) should more than make up for it by not having to access the cache or main memory nearly as much. This would make the Athlon 64s kill the Pentium 4 even worse than we saw.
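To put a rough number on the cache cost, here is a tiny sketch with a hypothetical linked-list node. The sizes in the comments are what you would typically see on 32-bit x86 versus x86-64, not something the C standard guarantees, and the 64-byte cache line is an assumption.

```c
#include <stddef.h>

/* Hypothetical list node: one pointer plus one int.
   Typically 8 bytes on 32-bit x86 (4 + 4) and 16 bytes on x86-64
   (8-byte pointer, 4-byte int, 4 bytes of padding). */
struct Node {
    struct Node *next;
    int value;
};

/* How many whole nodes fit in a cache line of the given size. */
static size_t nodes_per_line(size_t line_bytes)
{
    return line_bytes / sizeof(struct Node);
}
```

With those typical sizes, `nodes_per_line(64)` drops from 8 to 4 when you recompile for 64-bit, which is the "fills up the cache quicker" effect. Data that is mostly floats or 32-bit ints does not grow at all.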
[Posted by: ac | Date: 11/25/04 09:32:18 PM]