

Pages: [ 1 | 2 | 3 | 4 ]

Anisotropic Filtering

GeForce3 supports anisotropic filtering with 8, 16 and 32 samples. Unfortunately, driver 11.01 doesn't let you enable anisotropic filtering from the control panel, but you can do it through the registry. To enable anisotropic filtering in OpenGL, find the following key:
[\HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Class\Display\0000\NVidia\OpenGL]
and set DefaultLogAniso to 1 for 8 samples, 2 for 16 samples or 3 for 32 samples.
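
The mapping between the registry value and the number of samples is easy to mix up, so here is a small lookup sketch. The function names are our own and purely illustrative; only the three value/sample pairs come from the text above.

```python
# Value -> sample count pairs as listed in the article for DefaultLogAniso.
ANISO_SAMPLES = {1: 8, 2: 16, 3: 32}


def samples_for_log_aniso(value: int) -> int:
    """Return the sample count the article gives for a DefaultLogAniso value."""
    if value not in ANISO_SAMPLES:
        raise ValueError(f"the article lists only values 1, 2 and 3, got {value}")
    return ANISO_SAMPLES[value]


def log_aniso_for_samples(samples: int) -> int:
    """Inverse lookup: which DefaultLogAniso value gives the wanted sample count."""
    for value, count in ANISO_SAMPLES.items():
        if count == samples:
            return value
    raise ValueError(f"the article lists only 8, 16 and 32 samples, got {samples}")
```

All three listed pairs happen to satisfy samples = 2^(value + 2), which fits the "Log" in the name, but that is only our observation and not documented driver behavior.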

Once this value is set, anisotropic filtering is automatically enabled in place of bilinear filtering. To give you a better idea of the image quality, we compared pure anisotropic filtering with a combination of anisotropic and trilinear filtering. We used Quake3 for this:



Bilinear Filtering
Trilinear Filtering
Anisotropic Filtering, 8 samples
Anisotropic Filtering, 8 samples + Trilinear Filtering
Anisotropic Filtering, 16 samples
Anisotropic Filtering, 16 samples + Trilinear Filtering
Anisotropic Filtering, 32 samples
Anisotropic Filtering, 32 samples + Trilinear Filtering

Of course, anisotropic filtering requires a lot of computing resources and significantly increases the memory bus workload, which is why you may notice a performance drop when it is enabled. The diagrams below illustrate the performance drop for anisotropic and trilinear filtering in Quake3 Arena:


As you can see, forcing anisotropic filtering results in a serious performance drop in Quake3 Arena, and the drop grows with the number of texture samples used. At lower resolutions, even with 32 samples, the memory bus isn't overloaded because the CPU limits the performance, so the speed doesn't drop too much. At 1280x1024 and 1600x1200, however, the card slows down quite noticeably.

Trilinear filtering also requires resources. However, full trilinear filtering involves only one extra texturing unit, so it doesn't affect the performance nearly as much.

Driver 11.01 also allows enabling anisotropic filtering in Direct3D, though only with 8 samples. To do this, find the registry key:
[\HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Class\Display\0000\NVidia\Direct3D]
and set FORCEANISOTROPICLEVEL to 1. Unreal Tournament illustrates the quality of anisotropic filtering in Direct3D:



Bilinear Filtering
Trilinear Filtering
Anisotropic Filtering, 8 samples
Anisotropic Filtering, 8 samples + Trilinear Filtering

Bump Mapping: EMBM and DotProduct3

EMBM first appeared in Matrox Millennium G400/G450 graphics cards. ATI RADEON then added support for it, and now NVIDIA has finally implemented EMBM in GeForce3. Environment Mapped Bump Mapping produces a really beautiful surface relief that cannot be obtained with any combination of plain textures:

DotProduct3 bump mapping is supported only by the latest generation of gaming graphics cards: the NVIDIA GeForce2 family, GeForce3 and all ATI RADEON models. That is why it hasn't become popular in games yet. Here is an example of the DotProduct3 method in 3DMark 2001:

In our opinion, the relief here looks much more natural than with EMBM.
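
For readers unfamiliar with the idea behind DotProduct3, here is a minimal per-pixel sketch. It assumes the commonly used convention in which a normal map texel stores a surface normal as an RGB triple, with 0..255 mapped to -1..1; this encoding and the function names are our assumptions, not something taken from NVIDIA or 3DMark documentation. We also normalize both vectors for clarity, which real hardware does not necessarily do.

```python
def decode_normal(rgb):
    """Map an 8-bit RGB texel to a vector with components in [-1, 1]."""
    return tuple(channel / 127.5 - 1.0 for channel in rgb)


def normalize(vector):
    """Scale a 3D vector to unit length."""
    length = sum(c * c for c in vector) ** 0.5
    return tuple(c / length for c in vector)


def dot3_intensity(normal_texel, light_dir):
    """Diffuse intensity for one pixel: dot product of normal and light vector.

    Negative results (surface facing away from the light) are clamped to 0.
    """
    n = normalize(decode_normal(normal_texel))
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))
```

A texel of (128, 128, 255) encodes a normal pointing almost straight out of the surface, so a light shining along (0, 0, 1) gives an intensity very close to 1, while a light coming from behind gives 0.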

RTPatches - Hardware Tessellation of Smooth Surfaces

GeForce3 can tessellate smooth surfaces in hardware, i.e. split them into polygons. Briefly, the technology works as follows. To define an object with a curved surface, the GPU receives a set of control vertices. From these, GeForce3 builds a mathematical description of the curved surface and then generates a polygon mesh (the number of polygons is arbitrary, so the mesh can be as fine or as coarse as needed). This way, only a few control vertices travel to the core instead of a huge array of vertex coordinates, which relieves the AGP bus of transferring all that data.

Unfortunately, this method isn't widely used in games yet, partly because its implementation involves some difficulties. For instance, it is not trivial to find the points or curves where two such surfaces intersect. Solving this task may be very important when, for instance, a character in a game is modeled as a set of curved surfaces and the developer has to make sure that he doesn't walk through walls :-)
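
To show what "a few control vertices in, a polygon mesh out" means in practice, here is a minimal software sketch. The article doesn't say which kind of surface GeForce3 uses, so the bicubic Bezier patch with a 4x4 grid of control points is our assumption, and the function names are illustrative only.

```python
def bernstein3(t):
    """Cubic Bernstein basis weights at parameter t in [0, 1]."""
    s = 1.0 - t
    return (s * s * s, 3 * s * s * t, 3 * s * t * t, t * t * t)


def tessellate_patch(control, steps):
    """Tessellate a bicubic Bezier patch.

    control: 4x4 nested list of (x, y, z) control points.
    steps:   number of subdivisions along each edge (higher = finer mesh).
    Returns (vertices, triangles), where triangles index into vertices.
    """
    n = steps + 1
    vertices = []
    for i in range(n):
        bu = bernstein3(i / steps)
        for j in range(n):
            bv = bernstein3(j / steps)
            point = [0.0, 0.0, 0.0]
            # Weighted sum of all 16 control points.
            for a in range(4):
                for b in range(4):
                    weight = bu[a] * bv[b]
                    for k in range(3):
                        point[k] += weight * control[a][b][k]
            vertices.append(tuple(point))

    triangles = []
    for i in range(steps):
        for j in range(steps):
            v0 = i * n + j      # corner of the current grid cell
            v1 = v0 + 1         # next vertex in the same row
            v2 = v0 + n         # same column, next row
            v3 = v2 + 1         # diagonal neighbor
            triangles.append((v0, v1, v2))
            triangles.append((v1, v3, v2))
    return vertices, triangles
```

With steps set to 8, the 16 control points expand into 81 vertices and 128 triangles, and the same 16 points can be re-tessellated finer or coarser without sending anything new over the bus.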

HSR, Z-Buffer Compression

HSR (Hidden Surface Removal) technology saves the accelerator the trouble of drawing objects that will end up covered by other objects located in the foreground, i.e. closer to the viewer.

GeForce3 most likely uses a method that can be considered a variation of the hierarchical Z-buffer. The Z-buffer is split into rectangular blocks, which are used to build a "low-resolution Z-buffer": the "farthest" value of each Z-buffer block is stored in the corresponding cell of the "low-resolution Z-buffer". Before the polygons passing along the pipeline are textured, they are checked against the blocks they belong to. If a polygon fits inside one of the blocks and the "depth" of its nearest vertex is larger than the value stored in the corresponding cell of the "low-resolution Z-buffer", then the polygon is evidently covered by others and needn't be textured. Another optimization is also possible: if an entire block fits inside a polygon (triangle), it is enough to check the polygon's "depth" values at the four corners of the block. If the "closest" of them is larger than the value stored in the "low-resolution Z-buffer", then the part of the polygon belonging to this block is hidden and there is no need to texture it.

If after all these checks it turns out that the entire polygon or a part of it must be displayed, then during texturing the Z values for each pixel are written to the Z-buffer, so that the Z-buffer contains up-to-date data by the time the next polygon is processed. This part of the algorithm also seems to be optimized: it is possible to write to the "low-resolution Z-buffer" only the largest "depth" value among the polygon vertices, or among the block corners if the block lies inside the polygon.
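
The first of the block checks described above can be sketched in a few lines. This is only our software illustration of the principle, not the actual GeForce3 logic, which NVIDIA hasn't disclosed. The function names, the square block size and the convention that a larger depth value means "farther" are all assumptions.

```python
def build_low_res_z(zbuffer, block):
    """Build the low-resolution Z-buffer.

    zbuffer: 2D list of per-pixel depths (larger = farther from the viewer).
    block:   side of a square block in pixels.
    Each cell holds the farthest (largest) depth found in its block.
    """
    height, width = len(zbuffer), len(zbuffer[0])
    low = []
    for by in range(0, height, block):
        row = []
        for bx in range(0, width, block):
            row.append(max(
                zbuffer[y][x]
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            ))
        low.append(row)
    return low


def polygon_hidden(low_res_z, block, vertices):
    """Return True if the polygon can be skipped without texturing.

    vertices: list of (x, y, depth) screen-space vertices of a convex polygon.
    The polygon is rejected only if all its vertices fall into one block and
    its nearest vertex is still farther than the farthest depth stored there.
    """
    block_xs = {int(x) // block for x, _, _ in vertices}
    block_ys = {int(y) // block for _, y, _ in vertices}
    if len(block_xs) != 1 or len(block_ys) != 1:
        return False  # spans several blocks: be conservative and draw it
    nearest = min(depth for _, _, depth in vertices)
    return nearest > low_res_z[block_ys.pop()][block_xs.pop()]
```

Note that a single comparison against one stored value replaces a per-pixel Z test for the whole polygon, which is where the savings in Z-buffer requests come from.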

In any case, this method reduces the number of comparisons and frees up some memory bus bandwidth thanks to fewer Z-buffer requests. However, HSR doesn't guarantee a performance gain. First, overdraw, which is a measure of how many times a pixel is drawn for each rendered frame, varies from scene to scene. Second, if the polygons in the scene are sorted so that each next polygon is closer to the viewpoint than the previous one, i.e. the core receives the farther polygons first and the closer ones afterwards, then each polygon drawn is always the closest so far and is drawn in full. In this case HSR doesn't help. The ideal case is therefore when the polygons are sorted so that the closest ones reach the accelerator core first. Then all subsequent polygons have a chance of being obscured by the previous ones, and the card can skip texturing them.
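
The effect of the polygon order is easy to see on a toy model of a single pixel covered by several polygons. The sketch below is our own simplification: it assumes every polygon is depth-tested before texturing and that a smaller depth value means "closer".

```python
def textured_count(depths):
    """Count how many polygons covering one pixel actually get textured.

    depths: depths of the polygons in the order they reach the core
            (smaller = closer to the viewer).
    A polygon is textured only if it is closer than everything drawn so far.
    """
    nearest_so_far = float("inf")
    textured = 0
    for depth in depths:
        if depth < nearest_so_far:
            textured += 1
            nearest_so_far = depth
    return textured


layers = [0.9, 0.7, 0.5, 0.3, 0.1]
back_to_front = textured_count(sorted(layers, reverse=True))
front_to_back = textured_count(sorted(layers))
print(back_to_front, front_to_back)  # prints "5 1"
```

With five layers sorted back to front, all five get textured and HSR saves nothing. Sorted front to back, only the first one does.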

Together with HSR, there is also so-called lossless Z-buffer compression, which reduces the amount of data transferred to the Z-buffer even further.

To estimate how strongly these technologies affect the memory bus load and the graphics accelerator performance, we turned to the VillageMark benchmark, where the scene overdraw is very high. For a better comparison, we also tested ATI RADEON DDR, Creative 3D Blaster Annihilator 2 Ultra (NVIDIA GeForce2 Ultra) and ASUS V7700 64MB (NVIDIA GeForce2 Pro). On the latter we set the core to 200MHz and overclocked the memory to 230MHz (460MHz DDR), which are the working frequencies of NVIDIA GeForce3:


As you can see, in 16bit mode GeForce2 Ultra takes the lead thanks to its high memory and core frequencies. In 32bit mode, however, GeForce3 becomes the winner. HSR technology allowed it to leave behind not only the GeForce2 Pro overclocked to GeForce3 frequencies, but also GeForce2 Ultra. Moreover, we didn't detect any performance drop when switching from 16bit to 32bit mode. RADEON DDR lags behind all the others in 16bit mode, while in 32bit mode, especially at higher resolutions, its HyperZ technology lets it catch up with GeForce2 Pro and GeForce2 Ultra. RADEON DDR still failed to beat GeForce3, which treats hidden surfaces in a similar way and features similar Z-buffer optimizations, because GeForce3 works at much higher frequencies and has twice as many pipelines. We will return to HSR a bit later when we discuss the performance in Quake3.

Lightspeed Memory Architecture

NVIDIA's official papers say that the 256bit (128bit DDR) memory controller of GeForce3 is "split" into four controllers, each 64bit wide. When triangles are so small that their on-screen image consists of only 1-2 pixels, drawing them requires only 64 bits of data from the local graphics memory, and splitting the memory controller into four smaller ones reduces the idle time: one 64bit controller transfers the data for this particular triangle, while the other three are not tied up by it. In other words, short pieces of data no longer occupy the entire 256bit memory bus. However, we doubt that the memory architecture is really implemented this way. Texture and Z-buffer caching seems much more logical and has most likely been used in GeForce3. Nevertheless, this memory controller architecture has been officially announced, so it is also possible that each of the four GeForce3 pipelines has its own memory controller.
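
The idle-time argument can be illustrated with a deliberately idealized model. It assumes that every request is independent and that the four narrow controllers are always perfectly load-balanced, which real hardware certainly won't achieve. The function names and the notion of a "transfer cycle" are ours, not NVIDIA's.

```python
import math


def cycles_single_controller(requests_bits, width=256):
    """One wide controller: every request occupies whole 256bit transfers."""
    return sum(math.ceil(bits / width) for bits in requests_bits)


def cycles_split_controllers(requests_bits, controllers=4, width=64):
    """Several narrow controllers working in parallel.

    Each request is broken into 64bit transfers, and the transfers are
    assumed to be spread evenly over all controllers (best case).
    """
    narrow_transfers = sum(math.ceil(bits / width) for bits in requests_bits)
    return math.ceil(narrow_transfers / controllers)
```

In this model, eight 64bit requests take 8 cycles on the single wide controller (using only a quarter of the bus each time) but just 2 cycles on four narrow ones. For four full 256bit requests both schemes need 4 cycles, so the split only pays off when requests are short.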

Anyway, that seems to be enough theory for now. Let's move on to real things...
 
