MMX
The MMX extension appeared quite a long time ago and is now standard on PCs. MMX is usually read as MultiMedia eXtensions. The extension was designed for processing multimedia data such as images and sound.
Processors supporting MMX have eight MMX registers, each 64 bits (8 bytes) wide. MMX works only with integers; 1-, 2-, 4- and 8-byte data are supported. That is, one MMX register can hold 8, 4, 2 or 1 operands. For example, eight bytes or four 16-bit words can be packed into one register:
Byte 7 | Byte 6 | Byte 5 | Byte 4 | Byte 3 | Byte 2 | Byte 1 | Byte 0
-128 | 127 | 100 | 70 | 60 | -50 | 20 | 10

Word 3 | Word 2 | Word 1 | Word 0
15000 | -30000 | 20000 | 10000
And so on for larger operands. The data stored in the MMX registers can be added, subtracted and multiplied componentwise. There are also instructions for operations that often occur in multimedia applications, such as saturating addition (the result is clamped to the largest or smallest representable value instead of wrapping around on overflow), averaging, and the bitwise logical operations and, or and xor. There is one restriction, however: there is no division instruction yet. Still, a lot of operations can be performed much faster than before.
On the other hand, MMX requires manual optimization; compilers are of little help here. Various audio codecs, for example, are often optimized for MMX, because their algorithms map well onto it. Usually only a small part of the program, the part that performs most of the encoding work, is optimized. This simplifies the entire optimization procedure a lot.
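To make "componentwise" and "add without overflow" concrete, here is a minimal sketch in portable C that models what a packed add does to the eight signed bytes of one 64-bit register. It is a scalar model for illustration only, not MMX code: the function names are hypothetical, and a real MMX instruction would process all eight lanes at once.

```c
#include <stdint.h>

/* Saturating add of two signed bytes: the result is clamped to
 * [-128, 127] instead of wrapping around on overflow. */
static int8_t add_sat_i8(int8_t a, int8_t b) {
    int sum = (int)a + (int)b;          /* computed in int, cannot overflow */
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* Model of a packed saturating add: every lane (Byte 0..Byte 7) is
 * added independently of its neighbours. */
void packed_add_sat_i8x8(const int8_t a[8], const int8_t b[8], int8_t out[8]) {
    for (int lane = 0; lane < 8; ++lane)
        out[lane] = add_sat_i8(a[lane], b[lane]);
}

/* Model of a packed wraparound add, for contrast. The sum is reduced
 * modulo 256; converting it back to int8_t assumes the usual
 * two's-complement behaviour of common compilers. */
void packed_add_wrap_i8x8(const int8_t a[8], const int8_t b[8], int8_t out[8]) {
    for (int lane = 0; lane < 8; ++lane)
        out[lane] = (int8_t)(uint8_t)((uint8_t)a[lane] + (uint8_t)b[lane]);
}
```

With saturation, a lane holding 127 plus 1 stays at 127, while the wraparound version turns it into -128. For sound samples or pixel brightness the clamped result is usually the desired one, which is why such instructions suit multimedia code.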
SSE2 – Integer Instructions
We jump from the “oldie” MMX straight to the newcomer SSE2. This makes sense, as SSE2 consists of two quite different parts: a development of SSE and a development of MMX. The former deals with real numbers, the latter with integers. Compared to MMX, the SSE2 registers are twice as wide (128 bits), so they hold not 8 byte-sized numbers but 16. Since the instruction processing speed remained unchanged, this can mean up to twice the application performance after SSE2 optimization. By the way, a program already optimized for MMX can easily be further optimized for SSE2, because the instruction sets are so similar.
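As an illustration of processing 16 bytes per instruction, the following sketch adds two arrays of signed bytes with saturation using the SSE2 intrinsics from `<emmintrin.h>` (`_mm_loadu_si128`, `_mm_adds_epi8`, `_mm_storeu_si128`). The function name `add_sat_i8_array` is made up for this example, and the code assumes a compiler that defines `__SSE2__` when SSE2 is available (as GCC and Clang do); otherwise it falls back to a plain scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* out[i] = saturate(a[i] + b[i]) for n signed bytes. */
void add_sat_i8_array(const int8_t *a, const int8_t *b, int8_t *out, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    /* Main loop: 16 byte lanes per iteration in one 128-bit register. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vs = _mm_adds_epi8(va, vb);   /* packed saturating add */
        _mm_storeu_si128((__m128i *)(out + i), vs);
    }
#endif
    /* Scalar tail (or the whole array when SSE2 is not available). */
    for (; i < n; ++i) {
        int sum = (int)a[i] + (int)b[i];
        if (sum > INT8_MAX) sum = INT8_MAX;
        if (sum < INT8_MIN) sum = INT8_MIN;
        out[i] = (int8_t)sum;
    }
}
```

The unaligned load and store are used so that the sketch works for any pointers; code that is tuned by hand would typically align its buffers. The MMX counterpart of this loop would look almost the same but handle 8 bytes per iteration, which is the similarity between the two instruction sets mentioned above.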
Athlon XP processors don’t support SSE2. This produced a curious picture: the Pentium 4 would at first lose to the Athlon XP in speed, but once an application was optimized for SSE2, it would run faster on the Pentium 4.
We should acknowledge that developing MMX into SSE2 was a very fortunate idea. Few programs are optimized for MMX, but those that are have been optimized very thoroughly. We should also mention that Intel offered software developers a number of SSE2-optimized libraries with typical encoding functions almost for free, which played a crucial part in “saving” the Pentium 4 performance.





