NVIDIA GeForce3 Guide

Our ultimate guide to the newest graphics accelerator from NVIDIA covers every aspect you could think of. In this article you will find not only the details of the GeForce3 architecture and technologies, but also an in-depth study of how it all works, as well as a whole bunch of benchmarks and screenshots.

by FastSite
04/25/2001 | 12:00 AM

Over the last couple of years NVIDIA has become the world's leader in the consumer 3D graphics market. Their six-month cycle strategy, with a new product developed and launched every half a year, always left their competitors playing catch-up.

The graphics chips evolution was based on the increase in chip frequencies and in the amount of pipelines used. In other words, NVIDIA improved the performance of its solutions without introducing any new technologies. That is why by autumn 2000 the company finally understood that this would lead them to a dead end one day.

The best proof of this point is the success of ATI RADEON. This chip, with lower clock frequencies and fewer pipelines, managed to catch up with NVIDIA's contemporary solutions purely by implementing some ideas from tile architecture.

The tile-based KYRO from PowerVR/STM, which couldn't boast high clock frequencies either, turned out unexpectedly fast in games, even though it seemed a pretty weak solution by NVIDIA's standards.

Well, at last the long-awaited GeForce3, of which NVIDIA is so proud, has appeared. When working on this solution, NVIDIA's engineers, among other innovations, finally made a small move towards tile architecture with one single purpose: to get rid of the drawbacks of the current architecture.

NVIDIA GeForce3 Specs

Judging by the specs of the new graphics chip we can see that GeForce3 has a number of improvements and innovations compared to the previous generation products: GeForce2, GeForce2 Pro and GeForce2 Ultra.

And now we would like to focus on the advantages the newcomer can boast.

nFinite FX Technology, Pixel and Vertex Shaders

The mere name of this technology speaks for itself: now the developers have inexhaustible opportunities for creation of various graphics effects and highly realistic scenes.

nFinite FX technology represents a combination of hardware features of the new GeForce3 core and some flexible means for GeForce3 geometry control, i.e. Pixel and Vertex Shaders.

Shaders are simply special programs. Put in words, Pixel Shaders are programs that control the TMUs and deal in particular with texturing and pixel color calculation, while Vertex Shaders are the developer's means of programming the T&L unit for geometric calculations and transformations.

Compared with the GeForce2, the new GeForce3 features a much more flexible core structure. For instance, the GeForce2 implementation of Pixel Shaders (NSR, NVIDIA Shading Rasterizer) isn't compliant with Microsoft DirectX 8:

As you can see from the chart above, GeForce2 allows combining only 2 textures to create a single pixel color. There was also no loopback, so effects that needed more textures combined or required dependent texture reads, such as true reflective bump mapping, were simply not possible. In other words, programmers were limited by the imperfections of the NSR architecture.

GeForce3 Pixel Shaders don't have the described drawback.

This means that its blending unit supports up to 8 texture-blend operations as well as dependent texture reads. Besides, the pipelines are now connected with each other, so the "loopback" is no longer missing. At the same time, a Pixel Shader isn't interpreted by the GeForce3 core but is translated into a set of parameters for the different stages of the texturing units. The number of texturing stages determines the maximum size of a Pixel Shader program. GeForce3 features 8 texturing stages, i.e. it can perform up to 8 operations on textures and intermediate results to get the final pixel color. Unfortunately, there are no games yet that use the Pixel Shader mechanism to the full extent.

Pixel Shaders technology is very closely connected with Vertex Shaders. Each vertex is defined by a minimum of three coordinates, x, y and z, that describe its location. In most cases a vertex also includes data for color, alpha channel, texture and lighting characteristics. Vertex Shaders allow flexible control over the T&L engine: they are special programs interpreted by the GeForce3 geometric coprocessor.

For instance, if you use Vertex Shaders to set the initial and the final data sets for a vertex, you will automatically get the lighting, transparency and transformed coordinates calculated for each intermediate point of the trajectory. Of course, this is not the only application for Vertex Shaders. They can also help you to easily get volumetric fog or lighting, various object deformations, etc. The only limitations here are the imagination of the developer and the maximum allowed size of a Vertex Shader program. GeForce3 can interpret Vertex Shaders of up to 128 commands, which is more than enough for programming the most complex transformations.

The results obtained here are later used as the initial data for Pixel Shaders when the final image is created, which is how Pixel and Vertex Shaders are tied together. It is worth mentioning that Vertex Shaders not only help to create new effects and simplify the programming of existing ones, but also offload the CPU quite considerably. For instance, GeForce3 can take upon itself routine geometric transformations for a whole set of objects, carried out by a single shader.
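To make the "initial and final data sets" idea more tangible, here is a minimal CPU-side sketch in Python of what such a shader computes for every vertex. It is only an illustration: the Vertex class and the blend_vertex function are our own hypothetical names, not part of any GeForce3 or DirectX interface, and we assume plain linear interpolation.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    position: tuple  # (x, y, z)
    color: tuple     # (r, g, b, a), each component in 0..1


def lerp(a, b, t):
    """Component-wise linear interpolation between two tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def blend_vertex(start, end, t):
    """Return the in-between vertex for 0 <= t <= 1 (assumed linear blend)."""
    return Vertex(lerp(start.position, end.position, t),
                  lerp(start.color, end.color, t))


# Hypothetical key frames: the vertex moves and fades out at the same time.
start = Vertex((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 1.0))
end = Vertex((2.0, 4.0, 0.0), (0.0, 0.0, 1.0, 0.0))
mid = blend_vertex(start, end, 0.5)
print(mid.position)  # (1.0, 2.0, 0.0)
print(mid.color)     # (0.5, 0.0, 0.5, 0.5)
```

On the graphics chip the same arithmetic would run once per vertex inside the shader, so the CPU only has to supply the two key frames and the value of t.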

There are no games yet that use GeForce3's features in full, especially Vertex and Pixel Shaders. That is why we can only offer you some screenshots from 3DMark2001 and Aquanox (a recently announced game optimized for GeForce3 and using Pixel Shaders v.1.1 and Vertex Shaders v.1.0):

Here Vertex Shaders are most likely used to get the effect of the brownish water layer just above the seabed, the moving plankton and the shadows. Together with Pixel Shaders, Vertex Shaders also provide the dancing light spots on all the objects of the scene, which are nothing but the effect of sunlight refracted at the water surface.

Pixel Shaders can't be emulated when the graphics card is incapable of supporting them, because ordinary graphics cards can produce only combinations of textures, so to speak, while Pixel Shaders imply the possibility of saving intermediate results and the "loopback" between the pipelines. Vertex Shaders, however, can be emulated, because some of the scene geometry can be calculated by the system CPU. Of course, this inevitably leads to a performance drop, which we will see later in this review when we study the performance of GeForce3 and the previous generation graphics cards in Aquanox.

Anti-Aliasing by Multisampling and Quincunx

The common image we get in most 3D games without full-scene anti-aliasing is usually spoilt by the so-called stair-step effect at the triangle edges:

GeForce2 eliminated this drawback by means of supersampling technology. Here are the key ideas of this method:

In GeForce3 NVIDIA decided not to resort to supersampling to eliminate this stair-step effect. The developers made up their mind not only to improve the anti-aliased image quality but also to reduce the performance drop in case of FSAA.

To cut a long story short, GeForce3 applies anti-aliasing only where necessary, i.e. to the edges of the triangles.

As with GeForce2, the image is created in a larger buffer. And just as before, the buffer is doubled along the horizontal axis for 2-sample anti-aliasing and along both the horizontal and vertical axes for 4-sample anti-aliasing.
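The buffer sizes this implies are easy to work out. The Python sketch below does the arithmetic; the function names are ours, and the memory estimate rests on an assumption that is not stated above: 4 bytes of color plus 4 bytes of depth per sample in 32bit mode.

```python
def sample_buffer_size(width, height, samples):
    """Return (width, height) of the enlarged buffer for 2x or 4x FSAA."""
    if samples == 2:
        return width * 2, height        # doubled horizontally only
    if samples == 4:
        return width * 2, height * 2    # doubled along both axes
    raise ValueError("only 2-sample and 4-sample modes are described")


def buffer_megabytes(width, height, samples, bytes_per_sample=8):
    """Rough buffer size in MB; 8 bytes per sample is an assumption
    (4 bytes of color plus 4 bytes of depth)."""
    w, h = sample_buffer_size(width, height, samples)
    return w * h * bytes_per_sample / (1024 * 1024)


print(sample_buffer_size(1024, 768, 2))            # (2048, 768)
print(sample_buffer_size(1024, 768, 4))            # (2048, 1536)
print(round(buffer_megabytes(1600, 1200, 4), 1))   # 58.6
```

Under this assumption a 4-sample buffer for 1600x1200x32 alone takes almost 59MB, which would leave very little of a 64MB card for the frame buffer and textures.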

However, the difference lies with the way GeForce3 creates the image in this buffer.

Let's consider the four-sample anti-aliasing case.

On GeForce2 the four samples stored in the larger buffer are blended to define the color of the pixel in the frame buffer.

And what about GeForce3? When the triangles are produced by the geometric engine, the rasterizing unit colors them. While doing so, it constantly checks whether each "2 samples x 2 samples" block fits within the boundaries of the triangle currently being textured. If the 2x2 block fits into the triangle completely, GeForce3 calculates only one color, interpolating the textures for the pixel in the middle of this block, and fills the whole block with that single color. If not all the samples of a 2x2 block fit into the triangle, i.e. the block crosses the triangle edge, the samples that belong to the triangle currently being textured are drawn into the larger buffer in the usual way. The remaining samples, which do not belong to this particular triangle, will be textured later, when other triangles are processed. Once the scene is built, the sample colors of the 2x2 blocks in the larger buffer are blended to create the frame buffer.

As a result of this way of rasterizing the larger buffer, the textures lying within the triangle boundaries do not get blurred, and the color of the pixels close to the triangle edges is obtained as a mixture of four sample colors from the larger buffer. In other words, this anti-aliasing method is very much like supersampling, with the only difference that it is applied to the triangle edges alone, which eliminates stair-steps without washing out the textures.

Besides, almost four times fewer color samples have to be calculated, which reduces the number of operations needed to build the larger image. This saves a lot of time and also provides much better image quality, since nothing is blurred or washed out within each triangle.
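The saving can be illustrated with a small Python sketch that counts color calculations for one triangle, following the block rule described above. Everything here is a simplified model of our own: the sample positions inside a pixel (0.25 and 0.75) and the function names are assumptions, not documented GeForce3 behavior.

```python
def inside(px, py, tri):
    """True if point (px, py) lies inside the triangle (any winding order)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    d0 = (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)
    d1 = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
    d2 = (x0 - x2) * (py - y2) - (y0 - y2) * (px - x2)
    return (d0 >= 0 and d1 >= 0 and d2 >= 0) or (d0 <= 0 and d1 <= 0 and d2 <= 0)


def count_color_calcs(tri, width, height):
    """Count color calculations for one triangle over a width x height pixel
    area. Each pixel owns a 2x2 block of samples in the enlarged buffer."""
    supersampled = multisampled = 0
    for py in range(height):
        for px in range(width):
            covered = sum(
                inside(px + sx, py + sy, tri)
                for sx in (0.25, 0.75) for sy in (0.25, 0.75)
            )
            supersampled += covered    # every covered sample is textured
            # fully covered block: one color; edge block: sample by sample
            multisampled += 1 if covered == 4 else covered
    return supersampled, multisampled


# A triangle that covers the whole 4x4 pixel area: 64 calculations vs 16.
print(count_color_calcs(((-10, -10), (30, -10), (-10, 30)), 4, 4))
# A triangle whose diagonal edge crosses the 8x8 pixel area.
print(count_color_calcs(((0, 0), (8, 0), (0, 8)), 8, 8))
```

For interior blocks the ratio is exactly four to one; the blocks crossed by an edge are what keep the overall saving at "almost" four times.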

2x anti-aliasing acts the same way, but in this case GeForce3 uses 2x1 sample blocks made of 2 samples only.

GeForce3 can also anti-alias the scene with the help of Quincunx technology.

According to NVIDIA, the patented Quincunx technology offers quality comparable to FSAA 4x with performance similar to that of FSAA 2x. Five samples are used to get the pixel color. Note that these samples are positioned in a specific manner, which is marked in red on the picture below:
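A five-sample blend of this kind can be sketched in a few lines of Python. Please note that the weights used here (1/2 for the center sample and 1/8 for each of the four corner samples) are the commonly cited values for this filter and are our assumption; NVIDIA's papers quoted above do not specify them, and the function name is ours.

```python
def quincunx_resolve(center, corners):
    """Blend one center sample with four corner samples (RGB tuples)."""
    if len(corners) != 4:
        raise ValueError("expected four corner samples")
    center_w, corner_w = 0.5, 0.125   # assumed weights, they sum to 1.0
    return tuple(
        center_w * c + corner_w * sum(k[i] for k in corners)
        for i, c in enumerate(center)
    )


# A white center sample surrounded by four black corner samples.
print(quincunx_resolve((1.0, 1.0, 1.0), [(0.0, 0.0, 0.0)] * 4))  # (0.5, 0.5, 0.5)
```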

GeForce3 anti-aliasing can be controlled via driver properties:

Well, now let's take a look at the image quality provided by 2-sample and 4-sample anti-aliasing on GeForce2 and compare it with 2-sample, 4-sample and Quincunx anti-aliasing on GeForce3. To illustrate our study we selected Homeworld: Cataclysm:




No AntiAliasing

GeForce2 2x AntiAliasing

GeForce3 2x AntiAliasing

GeForce3 Quincunx AntiAliasing

GeForce2 4x AntiAliasing

GeForce3 4x AntiAliasing

Unfortunately, it was in Homeworld: Cataclysm that we observed very unpleasant artifacts in the game menu whenever we enabled any anti-aliasing mode on GeForce3:

We don't think it is a hardware problem of GeForce3. It looks more like some driver bug, because we didn't come across anything of the kind in Quake3 and Unreal Tournament: anti-aliasing worked perfectly well there.

To figure out how much GeForce3 performance we sacrifice for the sake of FSAA, we took Quake3 Arena. For a better comparison we also tested ATI RADEON DDR, Creative 3D Blaster Annihilator 2 Ultra (NVIDIA GeForce2 Ultra) and ASUS V7700 64MB (NVIDIA GeForce2 Pro); the latter had its core set to 200MHz and its memory overclocked to 230MHz (460MHz DDR), which are the working frequencies of NVIDIA GeForce3:


Due to its optimized multisampling method, the GeForce3 core requires far fewer samples for FSAA, and hence it makes fewer calculations and needs less data from the memory. That is why only GeForce3 can provide acceptable gaming performance and beautiful image quality at 1024x768 and higher resolutions with Full-Scene Anti-Aliasing enabled. GeForce2 Pro and GeForce2 Ultra are limited by the memory bus bandwidth in this case and hence cannot compete.

You may be surprised to see GeForce3 show such high results in 1600x1200x32 for 4x anti-aliasing. Well, this is no wonder: there was not enough graphics memory in this mode and 4x anti-aliasing in Quake3 simply got disabled :-)
 

Anisotropic Filtering

GeForce3 supports anisotropic filtering with 8, 16 and 32 samples. Unfortunately, driver 11.01 doesn't allow enabling anisotropic filtering via the control panel; however, you can do it in the registry. To enable anisotropic filtering in OpenGL, find the following key:
[\HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Class\Display\0000\NVidia\OpenGL]
and set DefaultLogAniso equal to 1 for 8 samples, to 2 for 16 samples and to 3 for 32 samples.
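The three values follow a simple pattern: each step doubles the number of samples. A short Python sketch of the mapping is given below. The helper names are ours, and the sketch only computes the numbers; it does not touch the registry.

```python
# Sample count -> DefaultLogAniso value, as listed above.
LOG_ANISO = {8: 1, 16: 2, 32: 3}


def default_log_aniso(samples):
    """Return the DefaultLogAniso value for 8, 16 or 32 samples."""
    if samples not in LOG_ANISO:
        raise ValueError("only 8, 16 and 32 samples are listed")
    return LOG_ANISO[samples]


def samples_for(value):
    """Inverse mapping: every step doubles the sample count, starting at 8."""
    if value not in (1, 2, 3):
        raise ValueError("only the values 1, 2 and 3 are listed")
    return 4 << value


for n in (8, 16, 32):
    print(n, "samples -> DefaultLogAniso =", default_log_aniso(n))
```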

In this case anisotropic filtering will be automatically enabled instead of bilinear filtering. Now we would like to give you a better idea of the image quality with pure anisotropic filtering and with a combination of anisotropic and trilinear filtering. We selected Quake3 for that:



Bilinear Filtering

Trilinear Filtering

Anisotropic Filtering, 8 samples

Anisotropic Filtering, 8 samples + Trilinear Filtering

Anisotropic Filtering, 16 samples

Anisotropic Filtering, 16 samples + Trilinear Filtering

Anisotropic Filtering, 32 samples

Anisotropic Filtering, 32 samples + Trilinear Filtering

Of course, anisotropic filtering requires a lot of computing resources and increases the memory bus workload significantly, which is why you may notice a certain performance drop when it is enabled. The diagrams below illustrate the performance drop for anisotropic and trilinear filtering in Quake3 Arena:


As you can see, forcing anisotropic filtering results in a serious performance drop in Quake3 Arena, which grows with the number of texture samples used. With 32-sample anisotropic filtering at lower resolutions, for example, the memory bus isn't overloaded because the CPU limits the performance, so the speed doesn't drop too much here. However, at 1280x1024 and 1600x1200 the card slows down quite tangibly.

Trilinear filtering also requires resources. However, fully-fledged trilinear filtering needs only 1 extra texturing unit to be involved in the process, which doesn't affect the performance as much.

The 11.01 driver allows enabling anisotropic filtering in Direct3D as well, though only for 8 samples. For this purpose you should find the registry key:
[\HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Class\Display\0000\NVidia\Direct3D]
and set FORCEANISOTROPICLEVEL equal to 1. The quality of anisotropic filtering in Direct3D can be illustrated by Unreal Tournament:



Bilinear Filtering

Trilinear Filtering

Anisotropic Filtering, 8 samples

Anisotropic Filtering, 8 samples + Trilinear Filtering

Bump Mapping: EMBM and DotProduct3

At first EMBM was a feature of Matrox Millennium G400/G450 graphics cards. Then its support appeared in ATI RADEON. And now NVIDIA has finally implemented EMBM in its GeForce3. Environment Mapped Bump Mapping provides a really beautiful surface relief, which cannot be obtained by any texture combinations:

DotProduct3 bump mapping is supported only by the latest generation of gaming graphics cards: the NVIDIA GeForce2 family, GeForce3 and all ATI RADEON modifications. That is why it hasn't become popular in games yet. Here is an example of the DotProduct3 method in 3DMark2001:

In our opinion, the relief here looks much more natural than in case of EMBM.

RTPatches - Hardware Tessellation of Smooth Surfaces

GeForce3 can do hardware tessellation, i.e. splitting of smooth surfaces into polygons. Briefly, the technology works as follows. To define an object with a curved surface, the GPU gets a set of control vertices. With their help GeForce3 creates a mathematical description of this curved surface. After that it builds a polygon "net" (any number of polygons can be used, so the "net" can be either tight or loose). This way the core receives not a huge array of vertex coordinates but just a few control vertices, which offloads the AGP bus.

Unfortunately, this method isn't widely used in games yet, especially since its implementation involves some difficulties. For instance, it is not trivial to find the points or curves formed by two intersecting surfaces of this kind. And the solution of this task may be very important when, for instance, there is a guy moving in a game (the model of the guy is described as a set of curved surfaces) and the developer should make sure that he doesn't walk through the walls :-)
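To show how a handful of control vertices turns into a polygon "net", here is a minimal Python sketch. It assumes a bicubic Bezier patch with 16 control points, which is one common type of smooth surface; the kind of patch, the function names and the sample data are our assumptions for illustration only.

```python
def bernstein3(t):
    """Cubic Bernstein basis weights at parameter t."""
    s = 1.0 - t
    return (s * s * s, 3 * s * s * t, 3 * s * t * t, t * t * t)


def tessellate_patch(control, steps):
    """Turn a 4x4 grid of control points into a (steps+1) x (steps+1) vertex
    grid. Returns (vertices, triangles); triangles index into vertices."""
    vertices = []
    for j in range(steps + 1):
        bv = bernstein3(j / steps)
        for i in range(steps + 1):
            bu = bernstein3(i / steps)
            vertices.append(tuple(
                sum(bv[r] * bu[c] * control[r][c][axis]
                    for r in range(4) for c in range(4))
                for axis in range(3)
            ))
    triangles = []
    row = steps + 1
    for j in range(steps):
        for i in range(steps):
            a = j * row + i                      # top-left corner of the cell
            triangles.append((a, a + 1, a + row))
            triangles.append((a + 1, a + row + 1, a + row))
    return vertices, triangles


# Hypothetical patch: a flat square with a raised centre.
control = [[(float(c), float(r), 1.0 if r in (1, 2) and c in (1, 2) else 0.0)
            for c in range(4)] for r in range(4)]
verts, tris = tessellate_patch(control, 8)
print("16 control points ->", len(verts), "vertices,", len(tris), "triangles")
```

With steps set to 8, the 16 control points expand into 81 vertices and 128 triangles, and only the 16 points would have to cross the AGP bus. Raising or lowering steps makes the "net" tighter or looser.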

HSR, Z-Buffer Compression

HSR technology (Hidden Surface Removal) is aimed at saving the accelerator the time and trouble of drawing objects that will in the end be covered by other objects located in the foreground, i.e. closer to the viewer.

GeForce3 most likely uses a method that is a variation of the hierarchical Z-buffer principle. The Z-buffer is split into rectangular blocks, which are used to build a "low-resolution Z-buffer": from each Z-buffer block the "farthest" value is taken and saved in the corresponding cell of the "low-resolution Z-buffer". Before the polygons coming down the pipeline are textured, they are checked against the blocks they fall into. If a polygon fits inside one of the blocks and the "depth" of its nearest vertex is larger than the value stored in the corresponding cell of the "low-resolution Z-buffer", then this polygon is evidently covered by other ones and needn't be textured. Another optimization is also possible: if the entire block fits into a polygon (triangle), it is enough to check the "depth" values at the four corners of the block. If the "closest" of them is larger than the value stored in the "low-resolution Z-buffer", then the part of the polygon belonging to this block is hidden and there is no need to texture it.

If all these checks have been completed and it turns out that the entire polygon or a part of it should be displayed, then during texturing the Z values for each pixel are saved to the Z-buffer, which ensures that the Z-buffer contains the latest data by the time a new polygon comes to be processed. This part of the algorithm also seems to be optimized, since it is possible to save to the "low-resolution Z-buffer" only the largest "depth" value of the polygon vertices, or of the block corners in case the block lies inside the polygon.

Anyway, this method reduces the number of comparisons carried out and frees some of the memory bus bandwidth thanks to fewer Z-buffer requests. However, HSR doesn't guarantee a performance gain. First of all, overdraw, which is a measure of how many times a pixel is drawn for each frame rendered, varies between scenes. Secondly, if the polygons in the scene are sorted so that each next polygon is closer to the viewpoint than the previous one, i.e. the core gets the farther polygons first and the closer polygons afterwards, then each polygon drawn will always be the closest and will be drawn as a whole. In this case HSR doesn't work. The ideal case, therefore, is when the polygons are sorted so that the closest ones are the first to be transferred to the accelerator core. Then all the following polygons have a chance to be obscured by the previous ones, and the card can skip texturing them.
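The block test described above can be sketched in Python as follows. This is only our reading of the supposed algorithm, not NVIDIA's implementation: the function names are hypothetical, the 8x4 block size is just an example, and a larger depth value means "farther from the viewer".

```python
def build_low_res_z(zbuffer, block_w, block_h):
    """For each block keep the farthest (largest) depth found in the full
    Z-buffer; the result has one cell per block."""
    rows, cols = len(zbuffer), len(zbuffer[0])
    return [
        [max(zbuffer[y][x]
             for y in range(by, min(by + block_h, rows))
             for x in range(bx, min(bx + block_w, cols)))
         for bx in range(0, cols, block_w)]
        for by in range(0, rows, block_h)
    ]


def is_hidden(low_res_z, block_x, block_y, vertex_depths):
    """A polygon lying inside one block is hidden if even its nearest vertex
    is farther away than the farthest depth already stored for that block."""
    return min(vertex_depths) > low_res_z[block_y][block_x]


# Hypothetical 16x8 Z-buffer: the left half holds a wall at depth 0.3,
# the right half is still empty (depth 1.0).
zbuffer = [[0.3] * 8 + [1.0] * 8 for _ in range(8)]
low = build_low_res_z(zbuffer, 8, 4)
print(low)                                     # [[0.3, 1.0], [0.3, 1.0]]
print(is_hidden(low, 0, 0, [0.5, 0.6, 0.7]))   # True: behind the wall
print(is_hidden(low, 1, 0, [0.5, 0.6, 0.7]))   # False: nothing in front
```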
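The influence of the drawing order is easy to model for a single pixel. The Python sketch below uses hypothetical depth values and a function name of our own; it assumes that a fragment is textured only if it is closer than everything drawn at that pixel before it.

```python
def textured_fragments(depths_in_draw_order):
    """Count how many times one pixel is textured when the polygons covering
    it arrive in the given order and farther fragments are rejected early."""
    nearest = float("inf")
    textured = 0
    for depth in depths_in_draw_order:
        if depth < nearest:      # visible so far: texture it and update Z
            nearest = depth
            textured += 1
    return textured


layers = [0.2, 0.4, 0.6, 0.8]    # hypothetical depths, smaller = closer
print(textured_fragments(sorted(layers)))                 # front to back: 1
print(textured_fragments(sorted(layers, reverse=True)))   # back to front: 4
```

Sorted front to back, only the first layer is textured; sorted back to front, all four layers are drawn and the early rejection never fires.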

Together with HSR, there is also the so-called lossless Z-buffer compression, which allows reducing the amount of data transferred to the Z-buffer even more.

To estimate how much these technologies affect the memory bus workload and the graphics accelerator performance, we resorted to the VillageMark benchmark, where the scene overdraw is very high. For a better comparative analysis we also took a look at ATI RADEON DDR, Creative 3D Blaster Annihilator 2 Ultra (NVIDIA GeForce2 Ultra) and ASUS V7700 64MB (NVIDIA GeForce2 Pro); the latter had its core set to 200MHz and its memory overclocked to 230MHz (460MHz DDR), which are the working frequencies of NVIDIA GeForce3:


As you can see, in 16bit mode GeForce2 Ultra takes the lead due to its high memory and core frequencies. However, in 32bit mode GeForce3 becomes the winner. HSR technology allowed it to leave behind not only GeForce2 Pro overclocked to the level of GeForce3, but also GeForce2 Ultra. Moreover, we didn't detect any performance drop when we shifted from 16bit to 32bit mode. RADEON DDR lags behind everybody in 16bit mode, while in 32bit mode, especially at higher resolutions, its HyperZ technology allows it to catch up with GeForce2 Pro and GeForce2 Ultra. As for GeForce3, which treats hidden surfaces similarly and features similar Z-buffer optimization, RADEON DDR failed to beat it, because GeForce3 works at much higher frequencies and can boast twice as many pipelines. We will return to HSR a bit later when we speak about the performance in Quake3.

Lightspeed Memory Architecture

NVIDIA's official papers say that the 256bit (128bit DDR) memory controller of GeForce3 is "split" into 4 controllers, each 64bit wide. When the triangles are so small that their image on the screen consists of only 1-2 pixels, so that drawing them requires only 64 bits of data from the local graphics memory, splitting the memory controller into four smaller ones reduces the idle time. One 64bit controller transfers the data for rendering this particular triangle, while the other three remain uninvolved. In other words, no short data bursts are sent along the full 256bit memory bus. However, we doubt that the memory architecture is really implemented this way. It seems much more logical to have texture and Z-buffer caching, which has most likely been used in GeForce3. Nevertheless, this memory controller architecture is what was officially announced. So it is also possible that each of the four pipelines of GeForce3 has its own memory controller.
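The idle-time argument comes down to simple arithmetic, which can be sketched in Python. The model is deliberately crude and is our own assumption: every request is rounded up to a whole number of accesses of the controller's width, and nothing else about the real controller is taken into account.

```python
def bus_utilisation(request_bits, controller_bits):
    """Share of the transferred bits that were actually requested, when each
    transfer is rounded up to whole controller-width accesses."""
    accesses = -(-request_bits // controller_bits)   # ceiling division
    return request_bits / (accesses * controller_bits)


print(bus_utilisation(64, 256))   # 0.25: three quarters of the access wasted
print(bus_utilisation(64, 64))    # 1.0: a 64bit controller is fully used
```

In this model a 64bit request served by a single 256bit controller uses only a quarter of the transferred data, while a 64bit controller serves it without waste and leaves the other three controllers available.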

Anyway, it seems to be too much theory now. Let's pass over to real things...
 

Graphics Cards: Closer Look

We had two GeForce3 based graphics cards at our disposal: GV-GF3000DF from Gigabyte and PixelView XX-Player from Prolink:

   
Gigabyte GV-GF3000DF

PixelView XX-Player

Gigabyte was one of the first to announce its GeForce3 based solution. The card carefully follows the reference design and features Gigabyte's trademark blue PCB:

GV-GF3000DF features a DVI-Out and S-Video Out located on the GA-GF2TV daughter card:

GV-GF3000DF is equipped with 64MB of local graphics memory. There are eight 3.8ns chips by EliteMT onboard. This memory works at 230MHz (460MHz DDR):

By the way, NVIDIA's reference design implies that the TV encoder chip may sit either on the main PCB or on the daughter card. In our case the Bt868KRF TV encoder from Conexant is installed on the daughter card.

The reverse side of the graphics card carries the core and memory power supply chips. By the way, the GeForce3 core seems to consume very little power, because there are no powerful transistors on the card. The reverse side of the PCB also has a Sil164CT64 chip from Silicon Image, which is responsible for encoding the video signal and transferring it to the DVI interface.
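These figures can be cross-checked with a bit of arithmetic, sketched below in Python. The function names are ours. We take the 128bit DDR bus width quoted by NVIDIA and the 230MHz memory clock of GeForce3; the result is a theoretical peak, not a measured value.

```python
def rated_clock_mhz(cycle_time_ns):
    """Maximum rated clock of a memory chip from its cycle time."""
    return 1000.0 / cycle_time_ns


def bandwidth_gb_per_s(bus_bits, clock_mhz, transfers_per_clock=2):
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes); DDR memory moves data
    twice per clock."""
    return bus_bits / 8 * clock_mhz * 1e6 * transfers_per_clock / 1e9


print(round(rated_clock_mhz(3.8)))               # 263
print(round(bandwidth_gb_per_s(128, 230), 2))    # 7.36
```

So the 3.8ns chips are rated for about 263MHz, which leaves some headroom above the 230MHz they run at, and the peak memory bandwidth works out to 7.36GB/s.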

The chip is cooled down by an active cooler, but there is so little room between the chip and the fan that its effort seems to be wasted:

That is why we would regard it as passive rather than active cooling. Fortunately, the holes in the PCB provided by the reference design allow installing coolers such as the Blue Orb, which is exactly what we did right away. Nevertheless, the cooler didn't matter that much: the graphics card didn't overheat or crash either with the Blue Orb installed or with its own original cooler.

The Prolink PixelView XX-Player card is also based on NVIDIA's reference design and differs from GV-GF3000DF by the absence of a DVI-Out and by the location of the TV encoder, which is situated on the main PCB and not on the daughter card as on GV-GF3000DF.

The card from Prolink we had at our disposal was equipped with the same memory chips as Gigabyte GV-GF3000DF, although their working frequency was lower: only 200MHz (400MHz DDR):

When we set the memory frequency to 460MHz, the card worked fine throughout all the tests and we didn't discover any problems. We suspect that the default frequency was set lower because this Prolink PixelView XX-Player was a preproduction sample, so the mass-production cards may have a higher memory frequency.

Testbed

To test our cards we used the following system:

Software:

Drivers:

Testing Method

In our review we will compare the results obtained for GeForce3 (represented by Gigabyte GV-GF3000DF) with those obtained for ATI RADEON 64MB DDR VIVO, Creative 3D Blaster Annihilator 2 Ultra (NVIDIA GeForce2 Ultra) and ASUS V7700 64MB (NVIDIA GeForce2 Pro); the latter had its core set to 200MHz and its memory overclocked to 230MHz (460MHz DDR), which are the working frequencies of NVIDIA GeForce3.

When testing in Quake3 Arena we set the textures and the level of detail to the maximum and enabled trilinear filtering. All other settings were left as default. The tests were run in demo127.dem, which is included in the 127g point release patch.

In Unreal Tournament we set the textures and objects to the maximum and enabled trilinear filtering. All other settings were default. The tests were run in utbench.dem.

The tests in 3DMark2001 were run with all default settings. For 16bit modes we enabled 16bit textures and 16bit Z-buffer, while for 32bit modes - 32bit textures and 24bit Z-buffer.

Overclocking and Architecture Bottlenecks

In order to figure out how well-balanced the graphics cards based on NVIDIA GeForce3 chip were, we measured their performance in Quake3 Arena with the core and memory frequencies slightly increased and decreased. On the graphs below the horizontal axis indicates the memory frequency, and the colored curves correspond to different core frequencies:


You can see that the performance of GeForce3 in 16bit mode depends much more on the core clock frequency than on the memory frequency. In 32bit modes the performance grows almost equally with memory and core overclocking.

These results show that very soon we may see the first "Ultra" modifications of GeForce3 graphics cards, since the increase in the core clock frequency stimulates the growth of the graphics card performance.

We managed to achieve the maximum of 225MHz core and 480MHz memory frequency for Gigabyte GV-GF3000DF. Prolink PixelView XX-Player managed to work at 240MHz core and 500MHz memory frequency.

By the way, when we overclocked the core of PixelView XX-Player up to 246MHz, we noticed that some blocks simply fell out of the scenes:


These regularly located blocks, 8x4 pixels in size, allowed us to see the objects located behind them when they fell out. They most likely appear as a result of the core working incorrectly when overclocked, and they support our supposition that the "low-resolution Z-buffer" is cached in the chip core. Another piece of indirect evidence is the fact that caches are usually the first to fail if the overclocked core overheats, because caches are local hot spots. When we set the core frequency to 247MHz, the scene got covered with "missed-block ripples" very quickly and then the system crashed.

Test Results: Quake3 Arena


Thanks to HSR, the unusual memory architecture and probably also texture caching, GeForce3 turned out to be the indisputable leader at high resolutions in 32bit color mode. On the other hand, it is quite interesting that in 16bit color mode GeForce3 based graphics cards fell behind GeForce2 Pro overclocked to the level of GeForce3. How can we explain it? The GeForce3 core may be performing some calculations to determine object visibility, but the overdraw of the tested scene isn't high enough for the gain provided by HSR to make up for the resources spent on these calculations. Besides, in the 16bit modes of this test the memory bus bandwidth isn't a bottleneck at all, and hence HSR shouldn't provide any serious gain here.

Test Results: Unreal Tournament


Since the Unreal Tournament engine is optimized in a very specific way, the gaming performance depends a lot on the overall system speed. GeForce3 proved just excellent in 32bit mode; in 16bit mode, however, it again fell behind GeForce2 Pro overclocked to the frequencies of GeForce3. The lag was 2.6% at 800x600, 3.2% at 1024x768 and 2.2% at 1280x1024. Taking measurement errors into account, the lag may be considered the same for all three resolutions. It is probably the HSR calculations carried out by GeForce3 that affect the performance. In 32bit modes, especially at higher resolutions, the memory bandwidth starts limiting the accelerator's performance, and this is where GeForce3 overtakes everybody.
 

Test Results: 3DMark2001

Game 1 - Car Chase

This test includes a special physics system (Ipion from Havok), which calculates physics in real time. According to the 3DMark2001 developers, this brings the benchmark as close as possible to real gaming conditions. The graphics accelerator is also loaded to the full extent in this test: at the low level of detail the average scene has about 37.7 thousand polygons and about 8.8MB/17.7MB of textures in 16bit/32bit modes respectively. At the high level of detail there are 12.8MB/22.9MB of textures in 16bit/32bit modes. In High Detail the scene includes more objects of higher complexity, and all the moving objects also have shadows, which are calculated with the help of vertex shaders. Besides, the landscape is covered with a third texture layer, and all the walking bots acquire searchlights and two flying assistant bots.


In Low Detail GeForce3 falls a bit behind GeForce2 Ultra only in 16bit mode, because the scene is full of different objects while the textures aren't that numerous. The GeForce2 Ultra core works at a higher clock frequency than that of GeForce3, which is why GeForce2 Ultra processes the polygons somewhat faster than its successor. At the same time the performance gain provided by HSR in 16bit mode isn't big enough to bring GeForce3 the winner's laurels.

In 32bit color depth things change. GeForce3 is now significantly ahead of GeForce2 Ultra, because this mode demands more memory bus bandwidth, and thanks to HSR and memory optimization GeForce3 manages to dash forward even though it processes the polygons somewhat slower. This is especially noticeable at higher resolutions.


In High Detail the texture size increases and the number of polygons almost doubles. As is known, GeForce and GeForce2 chips can store vertices in their local graphics memory; however, they never do so and transfer the data via the AGP bus instead, in order to free the local memory bus for other purposes, such as transferring textures. That is why the cards' performance in this test is additionally limited by the data transfer rate of the AGP bus. Another factor that slightly lowers the results of GeForce2 is the need to lay three textures over the ground surface, which forces GeForce2 based graphics cards to make 2 passes, while RADEON, which can lay 3 textures per clock, and GeForce3, which can lay 4 textures per 2 clocks, manage it in a single pass.
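The number of passes follows directly from how many textures a chip can lay in one pass. The Python sketch below uses the per-pass figures mentioned in this article (2 for GeForce2, 3 for RADEON, 4 for GeForce3); the function name is ours.

```python
import math


def passes_needed(texture_layers, textures_per_pass):
    """Rendering passes required to lay the given number of texture layers."""
    return math.ceil(texture_layers / textures_per_pass)


textures_per_pass = {"GeForce2": 2, "RADEON": 3, "GeForce3": 4}
for chip, limit in textures_per_pass.items():
    print(chip, "needs", passes_needed(3, limit), "pass(es) for 3 textures")
```

For three texture layers this gives 2 passes for GeForce2 and a single pass for both RADEON and GeForce3.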

Game 2 - Dragothic

This test shows a dragon flying over a village, which may be an imitation of future role-playing games.

This test has somewhat more textures and polygons than Car Chase. The many houses in the village make the OverDraw quite high. The dragon and villager animation as well as the shadow transformations are implemented via Vertex Shaders. The scene played in Low Detail mode features about 51.4 thousand polygons and 8.8MB of textures in 16bit or 17.2MB in 32bit color depth. In High Detail mode the number of polygons rises to 99.7 thousand and the texture size to 13.2MB in 16bit and 25.8MB in 32bit color depth.


At the low level of detail GeForce3 leads the race: the OverDraw value is high, so HSR and Z-buffer compression work efficiently.

At the high level of detail the village is larger and 3 textures are laid over each house. The dragon is shown as a whole, and there appear three awfully drawn village guards with bows and arrows, who can't hit the dragon from a distance of 20 meters.


Here GeForce3 and RADEON DDR are the leaders, which is quite understandable since the scene has a huge OverDraw. Besides, the village houses are covered with three textures each, which makes NVIDIA GeForce2 based cards draw them in two passes. Everything we have just said about the AGP bus bandwidth in Car Chase is valid here as well. In general, the graphics cards proved a bit faster here than in Car Chase. This is probably because no physics is calculated in real time here and, as a result, the CPU workload drops.

Game 3 - Lobby

This gaming scene was initially intended for a demo only; however, the developers considered it realistic enough to be used as a test. The floor and the walls of the lobby are covered with two textures and all the characters with one texture. In High Detail mode the action is reflected in the floor, which is achieved by mirroring the scene. All the characters have dynamic shadows created with the "rendering to textures" method. Note that the shadow polygons are transformed with the help of Vertex Shaders, and if the graphics card doesn't support Vertex Shaders, the CPU makes all the necessary calculations. Also in High Detail mode the bullets leave marks on the walls of the lobby. On average there are 21.7 thousand polygons in the low detail scene, with 4.1MB and 8.2MB of textures in 16bit and 32bit modes respectively. The high detail scene includes 41.7 thousand polygons, with 6MB and 10.3MB of textures in 16bit and 32bit modes respectively.




Well, the result is quite logical: GeForce3 leads in both 16bit and 32bit modes due to HSR, Z-buffer compression and graphics memory optimization.

Game 4 - Nature


 

 

This scene shows the image quality that could be obtained if the hardware capabilities of DirectX8-compliant graphics cards were used to the full extent. Here Vertex Shaders are used to create grass, leaves, trees, butterflies and a moving fisherman.

The lake surface is drawn with the help of per-pixel reflection and cube environment mapping in 4 texturing stages, which is equivalent to laying 4 textures. Of course, the cards that don't support Pixel Shaders cannot run this test, which is why we tested only GeForce3 here. The average number of polygons in the scene is 21.7 thousand. As for the textures, there are 4.1MB in 16bit mode and 8.2MB in 32bit mode.

Fillrate

This synthetic benchmark measures the fillrate of your graphics hardware. Fillrate is a measurement of how fast the graphics card is capable of drawing textures onto 3D objects. There are two different test runs included in the fillrate test, and they are:

  1. Single-texturing: There are 64 surfaces with one texture each. This means that the graphics hardware fills each of these objects separately, no matter how many texture layers the card is capable of drawing in a single pass.
  2. Multi-texturing: We draw 64 texture layers as fast as possible. 64 texture layers are distributed so that each surface in use has as many texture layers as that particular card can draw in a single pass. For example, if your card can draw 8 texture layers in a single pass, then there will be 8 objects with 8 texture layers each. If your card is capable of doing 6 texture layers in a pass, there will be 10 objects with 6 layers and an 11th object with the remaining 4 layers.
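
The way the 64 layers are split among objects in the multi-texturing run can be sketched in a few lines of Python. This is only our reading of the description above, not the actual 3DMark2001 code, and the function name is hypothetical.

```python
def distribute_layers(total_layers, layers_per_pass):
    """Split texture layers into objects, each holding at most as many
    layers as the card can draw in a single pass."""
    full_objects, remainder = divmod(total_layers, layers_per_pass)
    objects = [layers_per_pass] * full_objects
    if remainder:
        # One last object takes the leftover layers
        objects.append(remainder)
    return objects


print(distribute_layers(64, 8))  # 8 objects with 8 layers each
print(distribute_layers(64, 6))  # 10 objects with 6 layers plus one with 4
```

With one layer per pass the same function yields 64 single-layer objects, which matches the single-texturing run.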



In 16bit modes with single-texturing the fillrates of GeForce2 and GeForce3 are almost equal. In all other modes GeForce3 shows a higher real fillrate. Interestingly, in multi-texturing RADEON shows higher results than GeForce2 Ultra, which makes us suspect that the benchmark contains a bug, especially since ATI RADEON 64MB DDR VIVO has a theoretical texel fillrate of 183x2x3=1098Mtexels/sec, while the theoretical texel fillrate of GeForce2 Ultra is 250x4x2=2000Mtexels/sec. On the other hand, we can suppose that Z-buffer compression and texture caches of GeForce3 and RADEON allow using the memory bus more efficiently.
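
The theoretical figures quoted above can be reproduced with a short Python sketch. We read the three factors as core clock in MHz, number of pixel pipelines and texture units per pipeline; that reading and the function name are our assumptions.

```python
def texel_fillrate_mtexels(core_mhz, pipelines, textures_per_pipeline):
    """Theoretical texel fillrate in Mtexels/sec."""
    return core_mhz * pipelines * textures_per_pipeline


radeon = texel_fillrate_mtexels(183, 2, 3)         # ATI RADEON 64MB DDR VIVO
geforce2_ultra = texel_fillrate_mtexels(250, 4, 2)  # GeForce2 Ultra

print("RADEON:", radeon, "Mtexels/sec")
print("GeForce2 Ultra:", geforce2_ultra, "Mtexels/sec")
```

These are peak numbers only; the real fillrate measured by the benchmark also depends on how well the memory bus keeps up.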

High Polygon Count

This synthetic test measures the polygon throughput of the graphics card. There are over one million polygons in this scene, and the objects aren't textured. The benchmark consists of two test runs. In the first run the scene is lit by a single directional light; in the second run there are 8 lights in all: one directional and 7 point lights.


As you can see, the polygon throughput of the GeForce3 T&L unit is almost the same as that of GeForce2. Since the polygons are processed just as fast as by GeForce2, this means that GeForce3's T&L unit differs from GeForce2's only by "being smarter" thanks to Vertex Shaders support.

Vertex Shaders

This benchmark shows a large number of identical moving characters. Their animation is driven by a Vertex Shader. If Vertex Shaders aren't supported by the graphics card, they can be efficiently emulated in software by the system CPU.

The most interesting thing is that Vertex Shader emulation on graphics cards that do not support them turned out faster than the hardware implementation on GeForce3. In fact, this is quite understandable. If we assume that the characters' movements are defined by a few key frames, and the intermediate stages are interpolated by the Vertex Shader, then an Athlon 1.2GHz will undoubtedly do this interpolation much faster than the GeForce3 core. Besides, the transformation of the models (once their coordinates are calculated) is carried out by GeForce3 as fast as by GeForce2, because both graphics chips transform the polygons at the same speed.
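
To illustrate the kind of work we are assuming here, below is a minimal Python sketch of linear interpolation between two key frames. It is not the shader used in 3DMark2001, whose code we have not seen; the function name and the sample vertices are hypothetical.

```python
def interpolate_keyframes(frame_a, frame_b, t):
    """Linearly blend two key frames of vertex positions.

    frame_a, frame_b: lists of (x, y, z) tuples of equal length
    t: blend factor, 0.0 gives frame_a and 1.0 gives frame_b
    """
    return [
        tuple(a + (b - a) * t for a, b in zip(vertex_a, vertex_b))
        for vertex_a, vertex_b in zip(frame_a, frame_b)
    ]


# Hypothetical two-vertex model moving between two poses
pose_start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose_end = [(2.0, 4.0, 6.0), (1.0, 2.0, 0.0)]

print(interpolate_keyframes(pose_start, pose_end, 0.5))
```

The per-vertex math is trivial, one multiply and two additions per coordinate, which is the sort of load a fast CPU handles easily when emulating the shader in software.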

Test Results: Aquanox

This is just a test demo version of the game, which is expected to be finalized in Q3 2001. What we have now is just the gaming engine. The action takes place under water in the year 2666, in a gigantic deep-sea city, which is the center of a conflict. The scene is rich in polygons since all the object models are pretty complex: the underwater world and the sea bed relief are impressively detailed. The developers used Pixel Shaders v.1.1 and Vertex Shaders v.1.0 here. Moreover, they claim that this gaming engine is optimized for GeForce3.

The test in Aquamark, the benchmark built into the Aquanox demo, was carried out with all default settings:


Well, GeForce3 is the indisputable leader here. However, when testing we didn't notice any visual differences between the image quality provided by GeForce2 and that provided by GeForce3. But if you take a look at the diagram above, you will see that GeForce2 is far behind. We suppose that the floating fog effect is exactly what slows down its performance, because it is very likely created with the help of a Vertex Shader. Maybe some effects that GeForce3 supports in hardware have to be emulated on GeForce2. Anyway, the Aquanox developers do not reveal any details of their creation, which is why we have to guess what it could be…

Conclusion

Well, GeForce3 turned out to be quite a finished product, intended not only for the games of the near future but also providing a real performance increase in today's games. Since most contemporary games use High-Order pixel and vertex shaders sparingly or not at all, we have to be patient: we will see the advantages of these hardware innovations only in the future.

It is extremely pleasing that we will be able to get more beautiful images while retaining high performance of the graphics accelerator, thanks to the well-balanced architecture, the new fast and high-quality full-scene anti-aliasing methods, and the possibility to enable anisotropic filtering.

And now the most important thing: in order to balance GeForce3's architecture properly, NVIDIA engineers used caching and all sorts of optimizations, which prevent GeForce3 based graphics cards from being slowed down by insufficient memory bus bandwidth.

Highs:

Lows: