After the arrival of Matrox Millennium G400, an excellent graphics card in all respects, both in performance and in quality, we all expected Matrox to keep steadily increasing performance in its future chips, such as G600, G800 and so on. However, the company followed up with just a modified version of G400, the G450 chip. G450 based graphics cards earned their popularity as a "workhorse" among users who need high 2D image quality, but in games they did not look that attractive, with competitors shipping much stronger products.
G450 was followed by the G550 chip. It didn't boast high performance, either, but Matrox Millennium G550 based cards provided good image output onto the display and supported digital displays. Being the continuation of G450, G550 again proved to be more "for work" than "for play" and we witnessed no attempts from Matrox to claim the leadership in the gaming graphics card market.
This situation could prove dangerous for the very presence of Matrox in the graphics card market, as there were strong rivals appearing even in the stronghold of Matrox: high display image quality. Fortunately, we won't forget the name of Matrox. 11 months after the announcement of G550, on May 14, 2002, the company announced its new chip, Matrox Parhelia-512.
Closer Look: Parhelia-512 Chip
The flow-chart of the Matrox Parhelia-512 chip:
Here are its general features:
- 0.15micron manufacturing technology;
- 80 million transistors;
- 200/220MHz chip clock-rate;
- 256bit DDR SDRAM memory controller;
- 500MHz (250MHz DDR) / 550MHz (275MHz DDR) graphics memory frequency;
- Up to 256MB of graphics memory;
- The support of PCI 2.2, AGP 2.0 and AGP 3.0 interfaces and AGP 1x, 2x, 4x protocols;
- The support of ACPI, PCI Bus Power Management 1.1;
- Four pixel pipelines with four texturing units per pipeline;
- Rendering of up to four textures per pass;
- Texture blocks dynamic management;
- Bi-linear, tri-linear and anisotropic texture filtering;
- Anisotropic filtering with up to 16 samples (anisotropy level - up to 4);
- S3TC/DXTC texture compression;
- Volumetric textures;
- Projected textures and render-to-texture support;
- Flat, spherical and volumetric environment maps;
- Bump mapping via EMBM, Dot3 and Emboss techniques;
- Hardware support of DirectX 8.1 ver. 1.3 pixel shaders;
- Hardware support of DirectX 8.1 ver. 1.1 and DirectX 9 ver. 2.0 vertex shaders;
- 4-pipeline T&L and vertex shaders unit;
- Simultaneous processing of up to 16 vertexes;
- Depth-Adaptive tessellation;
- Support of Bezier splines, N-patches;
- Displacement maps;
- Bi-linear and tri-linear filtering of displacement maps (Vertex Texturing);
- Full-screen anti-aliasing by means of 4-subpixel (2x2) supersampling;
- Fragment Anti-aliasing with the quality of 16x (4x4) supersampling;
- Textures, vertexes, instructions and Z-buffer caching;
- The support of textures and the frame buffer with enhanced color precision: 10bit per channel (ARGB 2:10:10:10);
- Gaming in TripleHead mode (Surround Gaming);
- Two built-in 400MHz RAMDACs with 10bit per channel color depth;
- Two external 165MHz TMDS transmitters with 10bit per channel color depth;
- Specialized RAMDAC for the TripleHead mode;
- Built-in TV-encoder with 10bit per channel color depth;
- Multi-display configuration support in DualHead, TripleHead, Clone, Zoom and DVDMax modes;
- Hardware support of Windows XP GDI and DirectDraw functions;
- 10bit per channel color depth in graphics modes;
- Hardware font anti-aliasing (Glyph Antialiasing);
- Video output in the 10bit per channel color modes;
- 8-bit alpha-channel overlays;
- Independent overlay gamma correction;
- Scalable video using bi-cubical interpolation and 4x4 filter with programmable coefficients;
- Adaptive per-pixel de-interlacing;
- Max. CRT resolution: 2048x1536;
- Max. DVI resolution: 2560x2048;
- Max. DVI total resolution in the TripleHead mode: 3840x1024;
- Max. TV resolution: 1024x768;
All present-day graphics cards and operating systems split each color into three components (channels): red, green and blue. The stored value of each component takes eight bits at most, so it may range from 0 to 255 (2^8-1). This gives 16777216 (256x256x256) possible combinations of the three color components, i.e. up to 16 million colors that can be displayed with the regular color storage scheme.
The GigaColor technology is one of the "hits" of Matrox Parhelia-512. It's intended to provide more precise color representation. With GigaColor, every component of the pixel color is stored in 10 bits, so every color component (channel) may range from 0 to 1023 (2^10-1) and the total number of colors that can be displayed on the monitor rises to over a billion: 1073741824, to be exact.
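The arithmetic behind these two figures can be checked with a few lines of Python. This is only a worked example; the function name color_count is made up for illustration.

```python
def color_count(bits_per_channel: int, channels: int = 3) -> int:
    """Number of distinct colors for a given per-channel precision."""
    levels = 2 ** bits_per_channel      # each channel ranges over 0 .. levels-1
    return levels ** channels


print(color_count(8))                       # 16777216
print(color_count(10))                      # 1073741824
print(color_count(10) // color_count(8))    # 64
```

In other words, two extra bits per channel give 64 times as many representable colors.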
Matrox Parhelia supports 10-bit color components representation throughout the entire pipeline, including textures with 10bit per channel color precision (ARGB 2:10:10:10), higher-precision internal calculations, frame buffer with 10bit color depth, video and overlays. The same color depth is used in image output units, such as TV-Out, RAMDACs and TMDS transceivers.
The only shortcoming of GigaColor is that increasing the number of bits assigned to each color component did not increase the number of bits for the entire pixel: every pixel is still allotted the same 32 bits in memory. To increase the color precision, Matrox had to sacrifice the remaining bits intended for other pixel information, leaving only 2 of them. The problem is that these bits are usually used to store the alpha channel (transparency) value. So GigaColor cannot be used when more than 2-bit precision of the alpha channel is needed.
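To make the trade-off concrete, here is a minimal sketch of how a 2:10:10:10 pixel might be packed into one 32-bit word. The bit order (alpha in the top two bits, then red, green, blue) and the function names are assumptions made for illustration; the text above does not specify how Parhelia lays the bits out.

```python
def pack_a2r10g10b10(a: int, r: int, g: int, b: int) -> int:
    """Pack one pixel into 32 bits: 2 bits of alpha, 10 bits per color channel.

    Assumed layout (hypothetical): A in bits 31-30, R in 29-20, G in 19-10, B in 9-0.
    """
    if not 0 <= a <= 3:
        raise ValueError("alpha has only 2 bits: 0..3")
    for channel in (r, g, b):
        if not 0 <= channel <= 1023:
            raise ValueError("color channels have 10 bits: 0..1023")
    return (a << 30) | (r << 20) | (g << 10) | b


def unpack_a2r10g10b10(word: int) -> tuple:
    """Inverse of pack_a2r10g10b10: returns (a, r, g, b)."""
    return (word >> 30) & 0x3, (word >> 20) & 0x3FF, (word >> 10) & 0x3FF, word & 0x3FF


pixel = pack_a2r10g10b10(3, 1023, 512, 0)
print(hex(pixel), unpack_a2r10g10b10(pixel))
```

With only four possible alpha values, smooth transparency gradients cannot be stored in such a pixel, which is exactly the limitation described above.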
Multi-Display Configurations, TripleHead, Surround Gaming
Matrox Parhelia-512 boasts all the multi-display features of its predecessor, Matrox Millennium G550. They're DualHead Clone, DualHead Zoom, DualHead Multi-Display and DualHead DVDMax:
Moreover, there's one more multi-display configuration in Matrox Parhelia-512: TripleHead.
In this mode, the graphics card outputs the image onto three displays in desktop extension mode. All the displays have the same screen resolution, and the image is displayed in Stretched mode, i.e. it continues from one display to the next without any offset. This feature could be useful for those who need a really large desktop:
TripleHead has its shortcomings as well. In three-display mode you have to make sure that all displays have the same screen resolution, color depth and refresh rate (for CRT displays). TV-out and the GigaColor technology are also not supported in this mode.
Surround Gaming means using a triple-width frame buffer and outputting the game image onto three displays. This way, in a 3D action game or a driving simulator, for instance, the player's field of view (FOV) becomes wider, but not any taller.
Simulator fans can really enjoy the feature.
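How much wider the view gets can be estimated with simple trigonometry. The sketch below assumes the game renders one flat perspective projection across the triple-width frame buffer and keeps the vertical FOV unchanged; the function name is hypothetical and actual games may handle the wide mode differently.

```python
import math


def surround_hfov_deg(single_hfov_deg: float, displays: int = 3) -> float:
    """Horizontal FOV of a frame buffer that is `displays` times wider.

    Assumes a single planar projection: the half-width of the image plane
    grows linearly with the number of displays, the vertical FOV stays put.
    """
    half = math.radians(single_hfov_deg) / 2
    return math.degrees(2 * math.atan(displays * math.tan(half)))


print(round(surround_hfov_deg(90.0), 1))   # 143.1
```

Under these assumptions a 90-degree view on one display widens to roughly 143 degrees on three, not to 270: the field of view grows more slowly than the screen width.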
Adaptive Polygon Tessellation
Adaptive polygon tessellation is one more key feature of Matrox Parhelia. As hardware adaptive polygon tessellation is going to be one of the requirements for DirectX9-compatible graphics cards, let's have a closer look at the technology.
Tessellation is the partitioning of a model's polygons into multiple smaller triangles, which helps to reach a higher level of detail. One example of tessellation is the TruForm technology from ATI. It performs this partitioning so that the vertexes of the new, smaller polygons are not coplanar with the vertexes of the triangle being partitioned. Instead, these vertexes lie on a curved surface determined by the directions of the normals at the vertexes of the original triangle (for more information, refer to the documents on TruForm and Curved PN Triangles on ATI's web-site). As the level of partitioning grows, the triangle starts looking like a smooth patch. A model consisting of these patches (N-patches) instead of the original triangles looks much more natural. Look at the picture below. In the upper row you can see the original triangle and a few levels of tessellation, while the lower row shows the "patches" this triangle forms:
Tessellation of this kind requires no extra data to be sent to the GPU, as all the necessary information (the coordinates of vertexes and normals) is already included in the description of the triangle's vertexes. This means that the amount of descriptive data for each model doesn't grow as we shift to TruForm-like technologies, and it can even be made smaller by reducing the number of original polygons in the model.
We can go further and let the hardware choose the tessellation level necessary for a given triangle. For example, we can use the maximum tessellation level for closely located objects and no tessellation at all - for distant ones. Thus, we can greatly reduce the number of vertexes to be processed by the geometric pipeline of the chip (T&L and vertex shaders units have to process tessellated vertexes in addition to the original polygon's vertexes) and boost performance almost without losing any quality.
But here we can encounter a problem. Imagine a model that consists of two polygons, and the one closer to the observer is more tessellated than the other:
Tessellation has produced incorrect results here: the polygons share one edge and each polygon has created new vertexes on that edge, but these vertexes do not coincide. As a result, there are "gaps" in the model, marked in red. Evidently, we need a better way to perform adaptive tessellation.
This problem disappears once we resort to continuous polygon tessellation.
The usual triangle partitioning, as in TruForm, implies tessellation at integer levels, i.e. every edge of the original triangle is split into two, three, four and so on segments at the first, second, third and so on partitioning level.
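The cost of integer-level partitioning is easy to tabulate. The sketch below assumes a uniform grid subdivision in which level k splits every edge into k+1 segments, as described above; the vertex and triangle counts follow from that assumption and are not taken from ATI's or Matrox's documentation.

```python
def uniform_tessellation(level: int) -> tuple:
    """Return (segments per edge, vertexes, triangles) for one base triangle.

    Assumes a uniform subdivision where level k means k+1 segments per edge.
    """
    if level < 0:
        raise ValueError("tessellation level must be non-negative")
    segments = level + 1
    vertexes = (segments + 1) * (segments + 2) // 2   # triangular number
    triangles = segments * segments
    return segments, vertexes, triangles


for level in range(4):
    print(level, uniform_tessellation(level))
```

The number of triangles grows with the square of the segment count, which is why choosing the level per object (or per edge) matters so much for the load on the geometry pipeline.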
The algorithm of continuous tessellation should provide smooth change of triangle partitioning types and simultaneous smooth transition from one level of detail (tessellation) to another. Here's what the algorithm should do:
- A smooth change of the level of detail (tessellation level) should produce smooth transitions between different levels of detail.
- There should be no "hanging vertexes" or "T-junctions" as the transformation of the created vertexes may in this case lead to "holes" in the model:
- The number of vertexes cannot be fractional;
- It should be possible to avoid "popping" when the number of vertexes that split the edge of the polygon into segments suddenly changes (i.e. when some vertexes are created or removed). For this purpose, the position of the created or removed vertexes should coincide with the location of vertexes that do not change at the current tessellation level.
- The level of tessellation should be set not for the entire base polygon, but for each of its edges separately. Otherwise, an edge shared by two polygons will be tessellated by each polygon in its own way, according to its own selected level of detail. This, again, would lead to "holes" in models.
Here's an example of a continuous tessellation algorithm that complies with the above mentioned requirements:
Here, on transition from the zero level of tessellation to the first one, we get additional vertexes that coincide with those of the base polygon. As the tessellation level grows, the extra vertexes approach the edge middles and finally get there. The lower row of the picture above shows a model that consists of one triangle like that. Of course, if the tessellation level changes smoothly, the model also changes smoothly, without any "popping" effects.
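The sliding vertex described above can be sketched for a single edge. This is only an illustration of the idea: the function name is hypothetical, and the rule for picking which endpoint the new vertex starts from (the smaller one in sorted order) is an assumption added so that two triangles sharing the edge compute the same point.

```python
def edge_vertex(a: tuple, b: tuple, level: float) -> tuple:
    """Position of the extra vertex on edge (a, b) for a level between 0 and 1.

    At level 0 the vertex coincides with an endpoint of the edge (so creating
    it changes nothing visually); at level 1 it sits at the edge midpoint.
    """
    t = min(max(level, 0.0), 1.0)
    start, end = sorted((a, b))          # same direction for both neighbors
    mid = tuple((s + e) / 2 for s, e in zip(start, end))
    return tuple(s + (m - s) * t for s, m in zip(start, mid))


print(edge_vertex((0.0, 0.0), (2.0, 0.0), 0.0))   # (0.0, 0.0)
print(edge_vertex((0.0, 0.0), (2.0, 0.0), 0.5))   # (0.5, 0.0)
print(edge_vertex((0.0, 0.0), (2.0, 0.0), 1.0))   # (1.0, 0.0)
```

Because the result depends only on the edge and its level, not on the triangle being processed, neighboring triangles cannot produce mismatched vertexes on the shared edge.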
The adaptive polygon tessellation from Matrox, Depth-Adaptive Tessellation, is more sophisticated but complies with the above-listed general requirements.
In Depth-Adaptive Tessellation, the base triangle is already split into 6 triangles at the zero tessellation level. These triangles are formed by the three original vertexes, the three vertexes that lie at the midpoints of the edges, and the vertex at the point where the medians intersect. On transition to the first tessellation level, the number of vertexes is increased by 12, so that the overall number of triangles becomes 22.
Smooth increase of tessellation level results in "moving apart" the vertexes that lie on the edges and inside the original triangle until the vertexes take their places and the resulting triangle mesh becomes uniform.
On further increase of the tessellation level, i.e. on smooth transition from the first to the second level of detail, 6 more vertexes appear and the number of triangles created goes up to 31.
Then, the vertexes "move apart" until the triangle mesh gets even again:
This tessellation algorithm is applied again and again and this way the mesh will become uniform at any integer tessellation rate.
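The zero-level split described above (edge midpoints plus the intersection point of the medians) can be reproduced in a few lines. This sketch covers only that initial six-triangle split in 2D; the function names are hypothetical and the higher levels of the Matrox algorithm are not modeled.

```python
def split_base_triangle(a: tuple, b: tuple, c: tuple) -> list:
    """Split triangle (a, b, c) into 6 triangles around its centroid."""
    def mid(p, q):
        return tuple((x + y) / 2 for x, y in zip(p, q))

    centroid = tuple((x + y + z) / 3 for x, y, z in zip(a, b, c))
    # Corners and edge midpoints, in order around the perimeter.
    ring = [a, mid(a, b), b, mid(b, c), c, mid(c, a)]
    return [(centroid, ring[i], ring[(i + 1) % 6]) for i in range(6)]


def area(tri: tuple) -> float:
    """Area of a 2D triangle via the cross product."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2


parts = split_base_triangle((0.0, 0.0), (6.0, 0.0), (0.0, 6.0))
print(len(parts), [area(t) for t in parts])
```

All six pieces have the same area, since the medians of a triangle divide it into six equal parts.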
The above mentioned example represents an ideal case when tessellation level is the same for every edge of the base triangle. In reality, different edges may have different tessellation levels. This level is determined by the distance between the edge vertices and the observer: the closer the edge, the higher tessellation level is used. That's where the name of the patent-pending algorithm from Matrox comes from: Depth-Adaptive Tessellation.
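A per-edge level that depends only on the distance to the observer might look as follows. Matrox does not publish the actual formula, so everything here is an assumption for illustration: the linear mapping, the use of the mean distance of the two edge vertexes, and the near, far and max_level parameters.

```python
import math


def edge_tess_level(v0: tuple, v1: tuple, eye: tuple,
                    near: float = 1.0, far: float = 100.0,
                    max_level: float = 4.0) -> float:
    """Hypothetical depth-adaptive level for one edge: closer means higher.

    The result depends only on the edge itself, so two triangles sharing
    the edge always get the same level for it.
    """
    distance = (math.dist(v0, eye) + math.dist(v1, eye)) / 2
    t = (far - distance) / (far - near)       # 1 at `near`, 0 at `far`
    return max_level * min(max(t, 0.0), 1.0)


eye = (0.0, 0.0, 0.0)
print(edge_tess_level((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), eye))      # 4.0
print(edge_tess_level((0.0, 0.0, 200.0), (1.0, 0.0, 200.0), eye))  # 0.0
```

The level is a continuous value, which is what lets the tessellation change smoothly as the camera moves.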
You can see the actual results of Matrox's adaptive tessellation below. The picture is taken from the "Mars" demo and is slightly retouched for clarity. The bold lines mark the edges of the original polygons:
Hardware displacement mapping support is a long-time dream of software developers who wanted to correctly display bump mapped models without making them too complex by describing each bump in a separate set of polygons.
The existing technologies just simulate the bumps, and even the most advanced of them, such as EMBM and Dot3, don't really change the model geometry. Instead, they create an illusion of depth and detail on the surface of the object by using "advanced" texture calculations and lighting processing and by applying reflection maps.
The next example from Matrox shows the base model, the effect produced by EMBM and displacement mapping:
It's clear that displacement mapping provides the most correct bump representation, since it changes the very geometry of the model.
But how does it work?
To relocate the vertexes, a so-called displacement map is applied. In fact, it is an ordinary texture, but the values it contains are interpreted not as colors but as displacement distances for vertexes. Every vertex is displaced from its initial position in the base model along the normal to the surface at that location. The displacement distance is taken from the corresponding point of the displacement map. A simplified case of a one-dimensional displacement map is shown in the picture below:
If the displacement map is laid upon a tessellated polygon, the vertexes that have been created during tessellation also undergo displacement according to the values from the map. The following picture illustrates this: the base flat model and the tessellated model are on the left; the displacement map and the displaced model are on the right. The final rendered model is located in the lower right corner:
It is evident that by combining tessellation with displacement mapping we can define complex surfaces with very few parameters: several polygons and the appropriate displacement map. Moreover, as we've already said, adaptive tessellation doesn't noticeably affect image quality and at the same time doesn't demand too much from the GPU.
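The displacement step itself is a one-line formula: the new position equals the old position plus the unit normal multiplied by the value read from the map. A minimal sketch, with a hypothetical function name and a scale parameter added as an assumption:

```python
import math


def displace(position: tuple, normal: tuple, height: float, scale: float = 1.0) -> tuple:
    """Move a vertex along its (normalized) normal by height * scale."""
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0:
        raise ValueError("normal must be non-zero")
    return tuple(p + n / length * height * scale for p, n in zip(position, normal))


# A vertex on a flat surface facing +Z, raised by a map value of 0.5.
print(displace((1.0, 2.0, 3.0), (0.0, 0.0, 2.0), 0.5))   # (1.0, 2.0, 3.5)
```

Applied to every vertex produced by tessellation, this turns a flat patch into real geometry instead of a shading trick.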
But a certain problem may arise: when the vertexes smoothly "move" along the edges, as they do in the adaptive tessellation algorithm from Matrox, the coordinates of these vertexes in the corresponding displacement map also change. This means that the vertexes will "pop" on transition to new coordinates in the displacement map. Of course, that's not acceptable.
Matrox refers to the algorithm used to handle situations like this, as well as to the general procedure of displacement map sampling, as "Vertex Texturing". The "popping" of vertexes is eliminated very easily: just as with regular textures, bi-linear filtering over the four neighboring values is applied to the displacement map.
Besides that, MIP-levels are generated and used for displacement maps, just as for ordinary textures. This is very natural in this case: with depth-adaptive tessellation a distant model loses its high level of detail, so its displacement map doesn't need to be as detailed either.
In order to avoid the "popping" of vertexes during the shift from one MIP-level of the map to another, linear interpolation by two values is used. These values are taken from neighboring MIP-levels. And as a result we receive the displacement distance for the vertex considered. That is, we deal with true tri-linear texture filtering; in our case it is tri-linear displacement map filtering.
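The two filtering steps can be sketched on a tiny displacement map stored as nested lists. This is a simplified illustration rather than the hardware algorithm: the mapping of normalized coordinates to texels, the clamping at the borders and all names are assumptions.

```python
def bilinear(level: list, u: float, v: float) -> float:
    """Bi-linear sample of one MIP-level at normalized coordinates (u, v)."""
    h, w = len(level), len(level[0])
    x = min(max(u, 0.0), 1.0) * (w - 1)
    y = min(max(v, 0.0), 1.0) * (h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = level[y0][x0] * (1 - fx) + level[y0][x1] * fx
    bottom = level[y1][x0] * (1 - fx) + level[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def trilinear(mips: list, u: float, v: float, lod: float) -> float:
    """Blend bi-linear samples from the two MIP-levels nearest to `lod`."""
    lod = min(max(lod, 0.0), len(mips) - 1)
    lo = int(lod)
    hi = min(lo + 1, len(mips) - 1)
    f = lod - lo
    return bilinear(mips[lo], u, v) * (1 - f) + bilinear(mips[hi], u, v) * f


# Level 0 is a 2x2 map, level 1 is its 1x1 average.
mips = [[[0.0, 10.0], [20.0, 30.0]], [[15.0]]]
print(bilinear(mips[0], 0.5, 0.5))       # 15.0
print(trilinear(mips, 0.0, 0.0, 0.5))    # 7.5
```

Because both the texture coordinates and the level of detail enter the result continuously, a vertex sliding along an edge or receding into the distance changes its displacement gradually instead of jumping.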
Tri-linear filtering of displacement maps eliminates all the "popping" of vertexes and other unpleasant effects. The shorter the distance to the object, the higher its level of detail. In other words, the model smoothly "grows" more detailed as we approach it, just as a blurry distant texture becomes more distinct. The next picture illustrates this point: on the left you can see a model and its displacement map MIP-levels, and on the right, the model and its texture MIP-levels:
Displacement mapping is a powerful and flexible tool for creating bumped objects. For example, you can reach absolutely different results using the same model and different displacement maps. Matrox provides the Character Demo program where the human body model is transformed into different "aliens" with different displacement maps:
Displacement maps, like ordinary textures, don't have to be static. They can be changed, refreshed, or even generated anew for each frame. So, for example, it's possible to create an ocean surface with waves on it for a flight simulator, or a battlefield with trenches, antitank ditches and permanent shell-holes for a strategy game.
All in all, the possible application fields for displacement maps are really impressive. Besides, the fact that displacement mapping support has been included in the DirectX 9 specification means that very soon it will become a standard for DirectX9-compatible graphics accelerators. There is just one thing left: game developers need to get excited about this technology as well and start using displacement maps in their products.
Texturing Units Dynamic Management
Matrox Parhelia features 4x4 organization: 4 pixel pipelines with four texturing units each.
It seems like this should be a great advantage over the competitors in polygon texturing speed. For example, ATI RADEON 8500 and NVIDIA GeForce4 Ti4600 have four pixel pipelines with two texturing units each.
But each texturing unit of Matrox Parhelia, unlike those of the competitors, can take only four texture samples per clock, that is, it can perform bi-linear filtering only.
To perform tri-linear filtering, the texturing units of Parhelia are combined in pairs and eight samples are taken from each texture (four from each of the two closest MIP-levels of the texture).
When performing anisotropic filtering, texturing units again get combined in pairs thus reducing the texturing speed.
The implementation of anisotropic filtering in Matrox Parhelia is worth a closer look. During anisotropic filtering, the projection of a pixel onto the model surface is treated not as a point, but as an elongated ellipse:
In order to calculate the pixel's color correctly, the GPU has to sum up the colors of all texture samples (with the corresponding coefficients) that fall within the ellipse. That's too complex a task to be solved in real time, so a certain simplification is required. Several sample points are placed along the longer axis of the ellipse. Bi-linear filtering is then performed at each of them, and the average of these values forms the final color:
Calculating the coordinates of these points in the texture plane is also very simple. The GPU calculates the location of the first point and the increments along the texture axes, then shifts the point the required number of times and performs bi-linear filtering at every new location.
The number of points and the distance between them correspond to the anisotropy level, i.e. the ratio between the long and short axes of the ellipse:
So, we'll have more steps at high anisotropy levels and fewer at low levels. The minimum number of steps is equal to 1, i.e. in some cases anisotropic filtering may "degrade" into bi-linear filtering. This spares the GPU the necessity to do anisotropic filtering where it's not required.
On the whole, the driver just sets the maximum anisotropy level while the GPU defines the necessary number of texture samples in each particular case.
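The scheme described above (a driver-set maximum, a per-pixel step count, and an average of filtered samples along the long axis) can be sketched as follows. The spacing of the sample points, the rounding of the axis ratio and all names are assumptions; the real hardware logic is not documented in this article.

```python
import math


def anisotropic_sample(sample, center: tuple, major_dir: tuple,
                       major_len: float, minor_len: float,
                       max_level: int = 2) -> float:
    """Average several filtered samples along the ellipse's major axis.

    `sample(x, y)` stands in for a bi-linear texture fetch; `major_dir`
    is a unit vector in texture space; `max_level` is the driver limit.
    """
    ratio = major_len / minor_len
    steps = max(1, min(max_level, math.ceil(ratio)))
    total = 0.0
    for i in range(steps):
        # Sample points spread evenly and symmetrically around the center.
        t = ((i + 0.5) / steps - 0.5) * major_len
        total += sample(center[0] + major_dir[0] * t, center[1] + major_dir[1] * t)
    return total / steps


fetches = []


def fake_texture(x, y):
    fetches.append((x, y))
    return x          # a simple gradient texture


print(anisotropic_sample(fake_texture, (5.0, 5.0), (1.0, 0.0), 4.0, 1.0))  # 5.0
print(fetches)        # [(4.0, 5.0), (6.0, 5.0)]
```

With an axis ratio of 1 the loop runs once at the pixel center, which is the "degraded" bi-linear case mentioned above.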
By the way, the NVIDIA GeForce3/GeForce4 chip families perform anisotropic filtering in much the same way. Matrox Parhelia-512, however, has its own peculiarities:
- Firstly, the texturing units of Parhelia-512 cannot take eight texture samples per clock to bi-linearly filter two neighboring MIP-levels at the same time. This means that the combination of tri-linear and anisotropic filtering in Parhelia will make all four texturing units involved in the rendering of one texture. That is, if there is more than one texture to be rendered, Parhelia-512 will spend an extra clock cycle for rendering every extra texture.
This situation differs from what we see in NVIDIA GeForce3 Ti/GeForce4 Ti. These chips need extra clocks not for a growing number of textures but for a growing anisotropy level: one clock for every extra bi-linear or tri-linear filtering operation.
As Parhelia texturing units can be re-configured, 8-sample anisotropic filtering (level 2) for two textures, or tri-linear plus 8-sample anisotropic filtering for one texture, shouldn't slow down the performance, while with NVIDIA GeForce3 Ti/GeForce4 Ti any growth of the anisotropy level requires extra clocks.
- Secondly, our tests showed that Matrox Parhelia's maximum supported anisotropy level is 2, while for NVIDIA GeForce3/GeForce4 Ti it is 8. That is, Matrox Parhelia provides much lower texture precision at very small view angles.
- Thirdly, the same tests showed that forcing anisotropic filtering in Matrox Parhelia automatically turns on tri-linear filtering, too. So, forcing anisotropic filtering pushes Parhelia-512 to work under the most unfavorable conditions when it can render only one texture per clock.
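The texturing cost described in this section can be summarized in a toy model: each pipeline has four texturing units, tri-linear filtering and level-2 anisotropic filtering each double the number of units a texture occupies, and extra clocks are needed once the units run out. This is a simplification built only from the description above, not a measured or documented model of the chip.

```python
import math

UNITS_PER_PIPELINE = 4   # Parhelia-512: four texturing units per pixel pipeline


def units_per_texture(trilinear: bool, anisotropic: bool) -> int:
    """Texturing units one texture occupies under the given filtering."""
    units = 1                # bi-linear: one unit, four samples per clock
    if trilinear:
        units *= 2           # two MIP-levels, eight samples
    if anisotropic:
        units *= 2           # level 2: two bi-linear (or tri-linear) taps
    return units


def clocks_per_pixel(num_textures: int, trilinear: bool = False,
                     anisotropic: bool = False) -> int:
    """Clocks one pipeline needs to apply `num_textures` textures."""
    needed = num_textures * units_per_texture(trilinear, anisotropic)
    return math.ceil(needed / UNITS_PER_PIPELINE)


print(clocks_per_pixel(4))                                       # 1
print(clocks_per_pixel(2, anisotropic=True))                     # 1
print(clocks_per_pixel(2, trilinear=True, anisotropic=True))     # 2
```

The last line corresponds to the forced anisotropic mode described above: with tri-linear filtering switched on automatically, every additional texture costs an additional clock.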
Supersampling and FAA-16x Anti-Aliasing
Matrox Parhelia supports standard 2x2 supersampling, the advantages and shortcomings of which you can learn from our article called "On the Way to Ideal Picture". The card also features a new interesting method of smoothing the "jaggies". It's Fragment Anti-Aliasing, FAA.
What is it? The key concept of this method is to distinguish the spots where Anti-Aliasing is necessary from those where it is not. Evidently, smoothing is needed only on the edges of polygons. FAA-16x, like NVIDIA's multisampling, uses an increased number of subpixels to draw a pixel only on polygon edges. But while NVIDIA's multisampling builds up the image in an enlarged buffer (see the "On the Way to Ideal Picture" article), Parhelia uses a separate buffer to store the pixels that require processing (fragments of the image, which is where the name "Fragment Anti-Aliasing" comes from). A simplified scheme of Fragment Anti-Aliasing is shown in the next picture:
That's the way it works:
- The scene is constructed as usual. If the GPU encounters pixels that only partially belong to the currently drawn polygon, that is they lie on the polygon edges, their colors are not written to the frame buffer as usual. For these pixels 4x4 supersampling starts, i.e. they are calculated with higher precision. The color of the pixel fragment (the average color of its subpixels that belong to the current polygon) and the pixel coordinates are added to the fragments list stored in a special buffer.
If the GPU meets these pixels again when processing other polygons, the average color of the remaining subpixels will be combined with the corresponding value of the pixel from the fragment buffer and then written into the fragment buffer anew. This excludes situations with "lost" fragments or "half-constructed" pixels.
- As soon as all the polygons of the scene have been processed, the image is nearly done, but lacking the pixels that have been selected to be processed at a higher precision level.
The colors of all these pixels have been combined and stored in the fragment buffer. They're extracted from the buffer and put into the corresponding parts of the image. After that, the frame buffer will contain a completely ready-to-go picture.
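The two passes described above can be imitated with a very small model. It is a rough sketch under strong assumptions: colors are single grayscale values, each incoming pixel already carries the number of its 16 subpixels covered by the polygon, and there is no depth test or sorting. The data layout and names are hypothetical and do not reflect the actual fragment buffer format.

```python
SUBPIXELS = 16   # 4x4 subpixel grid used for edge pixels


def render_faa(pixels, width: int, height: int, background: float = 0.0) -> list:
    """pixels: iterable of (x, y, color, covered_subpixels) per polygon pixel."""
    frame = [[background] * width for _ in range(height)]
    fragments = {}                       # (x, y) -> [weighted color sum, coverage]

    # Pass 1: interior pixels go straight to the frame buffer,
    # edge pixels are accumulated in the separate fragment buffer.
    for x, y, color, covered in pixels:
        if covered >= SUBPIXELS:
            frame[y][x] = color
        else:
            entry = fragments.setdefault((x, y), [0.0, 0])
            entry[0] += color * covered
            entry[1] += covered

    # Pass 2: resolve the fragments and put them into the image.
    for (x, y), (weighted, covered) in fragments.items():
        remaining = max(SUBPIXELS - covered, 0)
        frame[y][x] = (weighted + frame[y][x] * remaining) / (covered + remaining)
    return frame


scene = [(0, 0, 1.0, 8), (0, 0, 0.0, 8),   # edge shared by two polygons
         (1, 0, 1.0, 16),                  # interior pixel
         (2, 0, 1.0, 4)]                   # edge against the background
print(render_faa(scene, 3, 1))             # [[0.5, 1.0, 0.25]]
```

Only the edge pixels ever touch the fragment buffer, which is why the extra work stays small when few pixels lie on polygon edges.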
FAA-16x has an evident advantage over supersampling and multisampling: it demands much less memory bandwidth, as it doesn't use an additional enlarged frame buffer for scene construction.
The number of subpixels, 4x4, might be a bit confusing and could make one expect low performance from the method. But in reality, in most scenes only a few percent of pixels require high-precision processing.
So, on the whole, this method provides unrivalled quality of full-screen Anti-Aliasing at the expense of rather low growth of the workload.
As a result, the Matrox Parhelia chip ensures impressive picture quality, as if the edges had been smoothed via 4x4 supersampling. Moreover, the performance losses are much smaller than with 4x multisampling from NVIDIA or with 4x supersampling on ATI RADEON 8500, for example.
Glyph Anti-Aliasing means hardware font Anti-Aliasing. Windows XP/2000 can turn on Anti-Aliasing of screen fonts, making them easier to read. But this is done at the software level and, according to Matrox, slows down window drawing by about 20-30%.
Matrox Parhelia supports forced hardware font Anti-Aliasing without losses in speed:
Moreover, the user can adjust the subjective "boldness" of symbols by setting the gamma of half-tints produced by Anti-Aliasing.
The picture shows two screenshots of the ordinary "notepad" window, made with different gamma settings.
Closer Look: Matrox Parhelia-512 Graphics Card
Matrox Parhelia retails in a stylish white-and-blue box:
The package includes the card, a CD with drivers, demo-programs and utilities, a user's manual and the following adapters: DVI-I-to-D-SUB, DVI-I-to-D-SUB + D-SUB, D-SUB-to-S-Video + RCA:
The graphics card itself looks very unusual as some of the memory chips are placed at an angle of 45deg:
The card is based on the Parhelia-512 chip and carries 128MB of 256bit DDR SDRAM graphics memory in BGA chips from Infineon with 3.3ns access time:
The shape and size of the Parhelia-512 chip are quite impressive. It's not only the largest graphics chip I've ever seen, but it also features a solid metal surface for better heat dissipation:
The frequencies of the retail Matrox Parhelia are 220MHz for the chip and 550MHz (275MHz DDR) for the graphics memory.
The OEM version has lower working frequencies: 200MHz for the chip and 500MHz (250MHz DDR) for the graphics memory.
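These frequencies translate directly into theoretical peak figures. The sketch below derives them from the specifications listed earlier (256bit memory bus, four pipelines with four texturing units each); it counts a gigabyte as 10^9 bytes, and real-world throughput will be lower. The function names are made up for illustration.

```python
def peak_bandwidth_gb_s(bus_bits: int, effective_mhz: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    return bus_bits / 8 * effective_mhz * 1e6 / 1e9


def texel_rate_mtexels_s(pipelines: int, units_per_pipeline: int, core_mhz: float) -> float:
    """Theoretical bi-linear texel fill rate in millions of texels per second."""
    return pipelines * units_per_pipeline * core_mhz


for name, core, memory in (("Retail", 220, 550), ("OEM", 200, 500)):
    print(name,
          peak_bandwidth_gb_s(256, memory), "GB/s,",
          texel_rate_mtexels_s(4, 4, core), "Mtexels/s")
```

For the retail card this gives about 17.6 GB/s and 3520 Mtexels/s, against 16 GB/s and 3200 Mtexels/s for the OEM version.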