
Over the last two years NVIDIA has become the world's undisputed leader in the 3D accelerator market. There is a lot we could talk about when discussing what made the company so successful. Some people believe it is the marketing policy it pursued; others think NVIDIA owes most of its success to well-developed drivers. One more opinion points to NVIDIA's product strategy, i.e. announcing new products every six months and always staying ahead of the competitors. Whatever you consider the major factor, one thing is undoubtedly true: a lot has changed at NVIDIA since the times of Riva 128 and Riva 128 ZX based graphics cards, and even since the era of Riva TNT.

However, life showed that even without introducing any new technologies (with a brute force approach, so to speak) the performance of NVIDIA's graphics solutions was so high that the company left its competitors far behind over the last year and a half. Even so, in the fall of 2000 NVIDIA realized that it was reaching a deadlock. If you take a closer look at the benchmark results obtained for overclocked GeForce2 GTS cards (GeForce2 Pro and Ultra), you will see as well that there is no further room for improvement and new records are just impossible to set, since the current architecture has exhausted its potential.

Well, today we are going to tell you about GeForce3, a new development by NVIDIA, which represents a totally new core with absolutely new features. At the end of our preview we will try to draw some conclusions and to predict the prospects of GeForce3 for the near and distant future. We will also do our best to answer one of the most troubling questions: is GeForce3 worth $600?

So, let's get started!

NVIDIA GeForce3 Features

First, we suggest taking a closer look at the official product specifications:

  • Manufacturing technology process: 0.15 micron;
  • Number of rendering pipelines: 4 with 2 texturing units each;
  • Core frequency: 200MHz;
  • Graphics memory frequency: 460MHz (230MHz DDR SDRAM);
  • Memory interface: 128bit for SDR/DDR SDRAM/SGRAM;
  • RAMDAC: 350MHz;
  • Maximum supported resolution: 2048x1536x32;
  • Number of transistors in the core: 57 million;
  • 800 billion integer operations per second;
  • 76 billion floating point operations per second;
  • nfiniteFX Engine: a programmable unit for calculating scene geometry and lighting, which gives developers the ability to program a virtually infinite number of special effects and custom looks with the help of two patented architectural advancements: Vertex Shaders and Pixel Shaders;
  • LightSpeed Memory Architecture: a set of technologies that save memory bus bandwidth by several means, which we'll discuss later in our preview;
  • Quincunx: a full-scene anti-aliasing technology based on multisampling;
  • Complete hardware support for DirectX 8 and OpenGL 1.2.

We deliberately don't list all the features connected with APIs and other technologies here, because reading them without any comments and explanations would hardly make much sense. Instead, we suggest discussing the new technologies implemented in the NVIDIA GeForce3 chip in a bit more detail.

nfiniteFX engine

As we have already said, the nfiniteFX engine is a programmable geometry and lighting unit, which frees developers from a restricted, fixed set of special effects. With this new NVIDIA technology developers can create their own effects with the help of Pixel and/or Vertex Shaders.

Pixel Shaders

What we see on our displays is pixels. Depending on the resolution, we may see over 2 million pixels at the same time. Of course, each of these 2 million pixels needs to be rendered, lit, shaded and colored. With the help of Pixel Shaders developers can apply lighting and other custom shading effects at the pixel level. Firstly, this gives us an unprecedented level of control over the graphics hardware (DirectX 8 in this case acts as nothing less than a low-level programming language). Secondly, it allows obtaining very high quality of the final image due to increased precision.

If you take a look back at the history of computer games, you will surely notice that they have undergone significant changes recently. Remember how impressed you were with bilinear filtering and low-resolution texture maps applied to relatively large polygons in Tomb Raider, for instance. Some of today's games already support texture compression and T&L effects and consequently require more powerful graphics accelerators supporting all these new features.

Last year NVIDIA introduced a new technology: the NVIDIA Shading Rasterizer (NSR). It gave developers the ability to add per-pixel lighting effects, dot product (dot-3) bump mapping and some other sophisticated techniques. At that time NVIDIA promised that NSR would be compliant with Microsoft DirectX 8. However, it turned out that GeForce2 GTS couldn't be regarded as compliant with Microsoft Pixel and Vertex Shaders at the hardware level because of the architectural peculiarities of the GPU.

GeForce2 GTS Shader Architecture

As you can see from this scheme, GeForce2 allowed combining only 2 textures to create a single pixel. Moreover, there was no loopback, so effects that needed more textures combined or required dependent texture reads, such as true reflective bump mapping, were simply not possible. In other words, programmers were somewhat limited by NSR because of the imperfections of its architecture.

GeForce3 doesn't have this drawback. Its blending unit supports up to 8 texture-blend operations as well as dependent texture reads. Besides, the pipelines are connected with each other and no longer lack the "loopback".

GeForce3 Shader Architecture

You should understand that graphics cards without hardware support for DirectX 8 Pixel Shaders can't fully emulate them with the help of the system CPU alone, because this would automatically result in a complete shift to software rendering. So, from the viewpoint of DirectX 8 pixel shaders, TNT2 M64 and GeForce2 GTS graphics cards are placed on the same level of "stupid" frame buffers.

Moreover, applying multiple textures in a single pass almost always yields better performance than performing multiple passes. Multiple passes translate into multiple geometry transformations and multiple Z-buffer calculations, slowing the overall rendering process.

A good example here is dot-3 bump mapping, which requires applying multiple layers of textures (at least 2). The key textures are derived from the height map and the normal map, which help define the visible surface geometry. The height map is a simple grayscale image where the shades of gray represent the relative height or indentation. The normal map is an RGB color map that defines the effect of light shining on the surface.

Note that we will be able to perceive the real beauty of Pixel Shaders only when they are combined with Vertex Shaders, since in the nfiniteFX Engine the two are used together and both contribute to the final image quality.

So, developers have received a really cool opportunity to program a great many new effects with the means provided by DirectX 8 and special OpenGL extensions. How soon they will make real use of this new opportunity is a totally different question.
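
To make the dot-3 idea more concrete, here is a minimal sketch of the per-pixel arithmetic. It assumes the common convention that the R, G and B channels of the normal map store the x, y and z components of the surface normal; the function names are our own and are not part of DirectX 8 or any NVIDIA API.

```python
import math

def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def decode_normal(rgb):
    """Map an 8-bit RGB texel (0..255 per channel) to a unit normal in [-1, 1]."""
    return normalize(tuple(c / 127.5 - 1.0 for c in rgb))

def dot3_intensity(normal_texel, light_dir):
    """Diffuse intensity for one pixel: clamped dot product of normal and light."""
    n = decode_normal(normal_texel)
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

def shade_pixel(base_rgb, normal_texel, light_dir):
    """Modulate the base texture color by the per-pixel diffuse intensity."""
    k = dot3_intensity(normal_texel, light_dir)
    return tuple(round(c * k) for c in base_rgb)
```

A texel of (128, 128, 255) encodes a normal pointing almost straight out of the surface, so a light shining along the same axis leaves the base color practically unchanged, while a normal tilted away from the light darkens the pixel.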

Vertex Shaders

Vertex Shaders are another giant leap towards the creation of an ideal virtual world. Before moving on to the benefits vertex shaders provide for the end user, we would like to get a bit deeper into the technical peculiarities of this technology and find out the major principles it is based on.

Each vertex is defined by a variety of data variables. At a minimum, each vertex has x, y, and z coordinates associated with it that define its location. In most cases, a vertex also includes data for color, alpha channel, texture, and/or lighting characteristics such as specular color. In fact, a Vertex Shader can be considered a magic box: vertex data is fed into the box and different vertex data comes out. It means that the same effect can be applied to different objects.

Well, let's find out what programmable vertex shaders actually offer.

Complex Character Animation

Vertex Shaders render skin, clothing, i.e. everything that is constantly changing in real life, more realistically. These surfaces stretch and crease properly at joints like elbows and shoulders. Facial animation can now include dimples or wrinkles that appear when the character smiles, and disappear when the smile disappears. All the developer needs to do is set the keyframes, and Vertex Shaders will then interpolate between them in real time. Bearing in mind how powerful the nfiniteFX Engine is, this will allow creating very realistic and high-quality characters.

Vertex Shaders allow up to 32 control matrices. That means up to 32 individual bones and muscles can be used to define individual components of a character's skeleton. Of course, the character may have hundreds or even thousands of such components in total, and the only thing that may prevent developers from squeezing the maximum out of these techniques is their patience and desire to work.
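
The role of the control matrices can be illustrated with a small sketch of weighted vertex blending. This is only our illustration of the general technique: the 3x4 matrix layout, the function names and the sample data are assumptions, and only the limit of 32 matrices comes from the specification above.

```python
MAX_CONTROL_MATRICES = 32  # limit quoted in the article

def transform(matrix, point):
    """Apply a 3x4 affine matrix (rows of [a, b, c, t]) to a 3D point."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix)

def skin_vertex(point, influences, palette):
    """Blend the point as moved by each influencing bone, weighted per bone.

    influences: list of (bone_index, weight) pairs whose weights sum to 1.
    palette: list of bone matrices (the "control matrices").
    """
    if len(palette) > MAX_CONTROL_MATRICES:
        raise ValueError("too many control matrices")
    result = [0.0, 0.0, 0.0]
    for bone_index, weight in influences:
        moved = transform(palette[bone_index], point)
        for axis in range(3):
            result[axis] += weight * moved[axis]
    return tuple(result)

IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
LIFT_2 = [[1, 0, 0, 0], [0, 1, 0, 2], [0, 0, 1, 0]]  # translate +2 along y

# A vertex near the elbow follows both bones equally.
elbow_vertex = skin_vertex((1.0, 0.0, 0.0), [(0, 0.5), (1, 0.5)], [IDENTITY, LIFT_2])
print(elbow_vertex)  # (1.0, 1.0, 0.0)
```

Because the vertex is influenced half by a bone that stays put and half by a bone that moves up by 2 units, it ends up 1 unit up, which is what makes the skin stretch smoothly at the joint instead of tearing.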

Environmental Effects

Vertex Shaders allow modeling the environment and its dynamics. Take, for instance, fog or water, whose color and thickness vary depending on the coordinates. Here are a few examples taken from the demo programs:

   
Water      Fog

Procedural (Static and Dynamic) Deformation

Procedural deformation, calculated by Vertex Shaders, can add movement to otherwise static objects. Examples include the dents formed in a metal object by the impact of high-caliber bullets (static), or an animal's chest expanding and contracting to simulate breathing and a flag waving in the breeze (dynamic).
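
As a rough sketch of dynamic procedural deformation, the waving flag can be modeled by displacing every vertex with a travelling sine wave. The formula, the parameter names and their default values are our own assumptions, chosen only to show that the motion is computed per vertex from its coordinates and the current time.

```python
import math

def wave_flag(vertices, time, amplitude=0.1, wavelength=1.0, speed=2.0):
    """Displace each vertex along z with a travelling sine wave.

    The displacement grows with x so the edge at x = 0 (the flagpole) stays fixed.
    """
    k = 2.0 * math.pi / wavelength
    out = []
    for x, y, z in vertices:
        offset = amplitude * x * math.sin(k * x - speed * time)
        out.append((x, y, z + offset))
    return out
```

Calling `wave_flag` with a growing `time` value on the same flat mesh produces a different shape for every frame, while the stored geometry itself never changes.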

   
 

Morphing

Morphing (please don't mix it up with morphine :-) is another animation technique similar to keyframe animation. Using different versions of an object, the Vertex Shader blends the positions of each vertex in the two versions. For example, with the dolphins below, the Vertex Shader blends the position of each vertex in dolphin #1 with the position of the same vertex in dolphin #2 to create the middle, morphed dolphin.


The middle dolphin exists only as a temporary blend of the other two. The morphed dolphin's geometry is never stored permanently. It is recreated immediately before it is needed. The result is a smooth animation sequence as the dolphin morphs from one tail position to the next.

If you remember, 3dfx used to promote the so-called Motion Blur technology (as a part of T-Buffer), which is quite similar to Morphing. Unlike Morphing, Motion Blur creates additional frames that are blended with each other at the intermediate stage as well, which adds a washed-out effect to the image and creates an impression of super-high speed.
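
The blend itself is plain linear interpolation between matching vertices. Here is a minimal sketch with hypothetical two-vertex "dolphins"; the function name and the data are ours.

```python
def morph(mesh_a, mesh_b, t):
    """Blend two meshes with identical vertex order; t=0 gives mesh_a, t=1 gives mesh_b."""
    if len(mesh_a) != len(mesh_b):
        raise ValueError("both meshes need the same number of vertices")
    return [
        tuple(a + (b - a) * t for a, b in zip(va, vb))
        for va, vb in zip(mesh_a, mesh_b)
    ]

tail_up = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
tail_down = [(0.0, 0.0, 0.0), (1.0, -2.0, 0.0)]
middle = morph(tail_up, tail_down, 0.5)
print(middle)  # [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
```

Only the two source meshes are stored; the middle one is recomputed for whatever value of `t` the current frame needs.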

Motion Blur

Lens Effects

Custom transforms can be programmed into Vertex Shaders to produce the effects associated with optical lenses. Note the effect below: a normal transform on the left, and the fish-eye lens transform on the right.

Although lens effects aren't used that frequently, they are still quite interesting. Whether simulating a view through a virtual security peephole in a front door, or the viewfinder of an international spy scoping out an enemy compound, a fish-eye lens effect can heighten the realism of the 3D scene.

To tell the truth, these were just a few examples of the effects that can be created with Vertex Shaders. In fact, we wouldn't be able to cover all of them within a single article. Anyway, we think that you now have a more or less clear idea of what Vertex Shaders are capable of.

NVIDIA GeForce3 with DirectX 8 offers a whole bunch of lighting effects that have never been supported by consumer 3D accelerators before. Of course, as with Pixel Shaders, it may take game developers quite a long time to create new games that show us the beauty and realism achievable with this technology.

Lightspeed Memory Architecture

Frankly speaking, Lightspeed Memory Architecture (LMA) seems to be the most eagerly awaited technology GeForce3 can boast. This technology reduces the load on the graphics memory bus, which is a major bottleneck of today's graphics accelerators. To make it easier for you to get the idea of NVIDIA's methods and algorithms, we would like to offer you a brief tour of the 3D graphics techniques used in games.

A typical graphics application such as an interactive video game has four main components:

  • Game Logic
    Game logic, physics, artificial intelligence (AI), networking, interactivity, sound, and other non-graphics functions are all elements of the primary game engine code. The key to great content is delivering an engaging interactive experience and offering highly intelligent characters. As is well known, the system CPU is usually responsible for calculating all these elements. By the way, T&L was initially intended to offload the CPU and to free it from calculating coordinate transformations and lighting, so that game developers could work wonders with the logical elements of their games.
  • Scene Management
    A typical gaming scene usually consists of a great many different objects. Each object belongs to a database describing the "3D world". Typically, these databases are very large, sometimes containing hundreds of megabytes, or even gigabytes of data. Rendering and displaying all of this data is simply not practical, even on high-end multiprocessing graphics supercomputers, so the task must be simplified with the help of algorithms that minimize the amount of scene data required to render a frame. These algorithms are called scene management. There are a lot of theoretical software algorithms that could speed up rendering on some graphics processors by cutting out hidden surfaces, etc. However, they cannot always be used in real life because they increase the CPU load. In other words, the gain comes at the expense of the processing power available for the game logic.
  • Geometry Calculations
    Transforming coordinates and calculating lighting.
  • Pixel Rendering

As you can see, the general model is pretty simple. So are the problems that most hardware and software developers face. As we have already mentioned in our preview, today's major problem for all graphics accelerators is insufficient memory bus bandwidth. Now let's dwell on it.

Geometry Bandwidth

As we have already said, each vertex is defined by a variety of data variables. Besides the ordinary x, y, and z coordinates that define its location, there may also be data for color, shading, etc. In most cases each vertex contains 50Bytes of information or more for these things. Thus a typical scene of 100,000 polygons (which is actually not that much for present-day games), each composed of three vertices, amounts to 300,000 vertices per frame altogether. Hence, it is common for each frame to contain 15MB of geometric information, i.e. at 60 frames per second we will have to transfer 900MB/s of data!

NVIDIA has developed several unique solutions to address this geometry bandwidth problem. The first is higher order surfaces (HOS). When building curved surfaces out of triangles, developers had to load the graphics memory bus with plenty of geometric data. HOS make the developers' life much easier: they allow creating objects using curves defined by control points. A curve or surface defined by a set of control points is called a spline. Joining splines together allows a designer to create complex curved surfaces that are difficult to create using triangles alone. Moreover, it reduces the graphics memory bus workload quite tangibly.

Compare the incredible number of vertex coordinates with the number of splines shown in the picture above: it is simply impressive!
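
The arithmetic behind these figures is easy to check. The sketch below follows the same simplifications as the text (no vertices shared between polygons, decimal megabytes); the function and parameter names are ours.

```python
def geometry_bandwidth(polygons, bytes_per_vertex=50, vertices_per_polygon=3, fps=60):
    """Return (vertices per frame, bytes per frame, bytes per second)."""
    vertices = polygons * vertices_per_polygon
    bytes_per_frame = vertices * bytes_per_vertex
    return vertices, bytes_per_frame, bytes_per_frame * fps

vertices, per_frame, per_second = geometry_bandwidth(100_000)
print(vertices, per_frame / 1e6, per_second / 1e6)  # 300000 15.0 900.0
```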

And if the result obtained is just the same, then why should we pay more? :-)

Hardware support for curved surfaces allows creating really high-quality scenes almost free from lighting artifacts, unlike the images built from numerous triangles.
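
To show how a handful of control points can stand in for many vertices, here is a small sketch that evaluates a Bezier curve with de Casteljau's algorithm. We don't know which spline types GeForce3 actually evaluates in hardware, so the choice of a cubic Bezier curve, the function names and the sample data are purely illustrative assumptions.

```python
def lerp(p, q, t):
    """Linear interpolation between two points of equal dimension."""
    return tuple(a + (b - a) * t for a, b in zip(p, q))

def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] with de Casteljau's algorithm."""
    points = list(control_points)
    while len(points) > 1:
        points = [lerp(p, q, t) for p, q in zip(points, points[1:])]
    return points[0]

def tessellate(control_points, segments):
    """Turn the curve into segments + 1 vertices, as the chip would do on the fly."""
    return [bezier_point(control_points, i / segments) for i in range(segments + 1)]

arc = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]  # 4 control points
polyline = tessellate(arc, 16)                           # 17 generated vertices
print(len(arc), len(polyline))  # 4 17
```

Only the 4 control points would have to cross the bus; the 17 vertices (or many more, if a smoother curve is wanted) are generated on the receiving side.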

Pixel and Z-Buffer Bandwidth

Unfortunately, geometry bandwidth is not the only bottleneck of today's graphics accelerators. Rendering turns out to be much more problematic.

What is required to render a single pixel? Rendering a single pixel once requires the graphics processor to read the color buffer to discover the previous value, to read the Z-value to determine the depth of the pixel in the scene, and to read the texture data necessary to texture map that pixel. Once the pixel is generated, the new (potentially blended) color value has to be written to the color buffer, and potentially a new value to the Z-buffer. In the 32-bit rendering case, each of these operations requires 32bits, or 4Bytes of data per access. So a single pixel costs up to five accesses of 4Bytes each, i.e. 20Bytes of memory traffic.

You should bear in mind that these operations are carried out for each pixel, no matter whether it is visible or hidden. The average OverDraw for current applications is about 2.5. In other words, each pixel on the screen is rendered 2.5 times. Assume a resolution of 1024 by 768 pixels: that is 786,432 pixels on the screen, each of them rendered 2.5 times per frame.

The higher the resolution, the more memory bus bandwidth is required (it grows with the number of pixels, i.e. with the product of width and height), especially bearing in mind that the scenes in new games are getting much more complex. As a result, the OverDraw value grows as well. For the resolution of 1600x1200 and OverDraw=3.5, we will need 8.04GB/sec! No doubt it would be quite unreasonable to simply increase the memory bus bandwidth in this case, because it is very expensive on the one hand and sometimes even impossible on the other.

What new technologies does GeForce3 offer us? GeForce3 implements many patent-pending technologies to improve the efficiency with which it renders pixels. Three of these key technologies are: firstly, a crossbar-based memory controller to improve the efficiency of access to the frame buffer and to make more efficient use of the available 128bit memory bus; secondly, lossless Z-buffer compression; and thirdly, Z-Occlusion Culling, which reduces the drawn depth complexity and thus the number of pixels that must actually be read from and written to the frame buffer. Now let's devote a bit more time and attention to each particular feature.
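
The bandwidth estimate can be reproduced with a few lines of code. This sketch assumes the five 4-byte accesses per pixel listed above and a frame rate of 60 fps (the rate used in the geometry example); both the names and the frame rate are our assumptions, so the output only approximates the figure quoted above.

```python
BYTES_PER_ACCESS = 4    # 32-bit color, Z, or texel
ACCESSES_PER_PIXEL = 5  # read color, read Z, read texture, write color, write Z

def pixel_bandwidth(width, height, overdraw, fps=60):
    """Bytes per second of frame buffer and texture traffic for the given mode."""
    bytes_per_pixel = BYTES_PER_ACCESS * ACCESSES_PER_PIXEL
    return width * height * overdraw * bytes_per_pixel * fps

for width, height, overdraw in [(1024, 768, 2.5), (1600, 1200, 3.5)]:
    gb_per_s = pixel_bandwidth(width, height, overdraw) / 1e9
    print(f"{width}x{height}, overdraw {overdraw}: {gb_per_s:.2f} GB/s")
```

Under these assumptions the sketch gives about 2.36GB/sec for 1024x768 and about 8.06GB/sec for 1600x1200, which is in line with the figure of roughly 8GB/sec mentioned in the text.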

Crossbar Memory Controller

In most cases, graphics memory is the most expensive part of a typical graphics system, often accounting for 50% of the cost of the product or more. This has been the case for a long time; however, it was only last year that graphics memory became the major factor determining the cost of a graphics product. Moreover, graphics memory bus bandwidth turned out to be the main determinant of the accelerator's performance. In fact, this is exactly the deadlock NVIDIA found itself in, which we mentioned at the beginning of our preview.

Of course, as a way out they could resort to increasing the working frequency, or widening the bus to 256bit, and… this would lead to another deadlock. Theoretically, this process could go on almost endlessly, with the product cost constantly getting higher and the reliability and stability getting lower.

However, let's find out whether today's graphics accelerators make really efficient use of their graphics memory bus. Consider a very simple case of DDR SDRAM data transfer.

As you know, the level of detail in the latest 3D games can be very high. For instance, the size of the average triangle (again, the fundamental building block of all real-time graphics) can be very small, sometimes only a few pixels. If a triangle is perhaps 2 pixels in size, and is composed of 32bits of color or Z for each pixel, the total amount of data for that triangle would be 32bits x 2 pixels, or 64 bits (see the formula in the previous section). If the memory controller accesses information only in 256bit "chunks", then most of each access is wasted, since the useful "payload" is much smaller than the amount of data actually transferred, and much of the frame buffer's potential bandwidth is lost. In this example, a traditional 128bit DDR memory controller, which moves 256 bits per clock, would be only 25% efficient, "wasting" 75% of the memory bandwidth.

In order to improve the situation, GeForce3 implements a new crossbar memory controller made up of four independent memory controllers, each of which communicates with the others and with the rest of the graphics processor. This way each of the four memory controllers can transfer up to 64bit of data per clock in the case of DDR SDRAM. So, we have several independent memory channels, which can be up to four times as efficient as previous, less intelligent designs.
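
The efficiency figures follow directly from the ratio of useful payload to the size of the chunk that has to be transferred. A minimal sketch (the function name is ours):

```python
import math

def bus_efficiency(payload_bits, chunk_bits):
    """Fraction of the transferred bits that the request actually needed."""
    chunks = math.ceil(payload_bits / chunk_bits)
    return payload_bits / (chunks * chunk_bits)

two_pixel_triangle = 2 * 32  # 64 bits of color or Z

print(bus_efficiency(two_pixel_triangle, 256))  # 0.25, one monolithic 256-bit access
print(bus_efficiency(two_pixel_triangle, 64))   # 1.0, one 64-bit crossbar channel
```

In the second case the other three channels remain free to serve other requests at the same time, which is where the "up to four times" estimate comes from.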

Lossless Z Compression

Z coordinates denote the depth or visibility information of this or that pixel. The main problem is that traditional graphics processors read and potentially write Z data for every pixel they render, regardless of whether it is visible, making Z-buffer traffic one of the largest "consumers" of memory bandwidth in a graphics system. By implementing an advanced form of 4:1 lossless data compression, GeForce3 reduces the memory bandwidth consumed by Z-buffer traffic by a factor of four.

Maybe the company will also try to cut down the overall number of Z-buffer transactions by caching the results. However, we wouldn't insist on this idea so as not to mislead you, since there is no information proving whether it's true or not.

Visibility Subsystem: Z-Occlusion Culling

Well, we have finally come to the feature which has long been the talk of the town: the possibility to avoid rendering hidden pixels.

We would like to repeat once again, for your reference, that a typical 3D scene has an OverDraw coefficient (depth complexity) of at least 2. It means that for every pixel that ends up being visible, two pixels have to be rendered (on average) to come up with that result. In other words, for every visible pixel, the graphics processor is forced to access the frame buffer twice, spending valuable frame buffer bandwidth essentially rendering pixels that the viewer will never see.

GeForce3 implements a sophisticated Z-Occlusion Culling technology, whereby it attempts to determine early whether a pixel is going to end up being visible. If a pixel is going to be occluded and the Z-Occlusion Culling unit determines this, the pixel is not rendered, the frame buffer is not accessed, and the frame buffer bandwidth is saved. The major principle applied here is very simple, just like all great things usually are: if the x and y coordinates of several pixels coincide, the one with the smallest Z-coordinate is taken, i.e. the pixel closest to the viewer, which will overdraw all the others. Taking into account that today's average OverDraw is equal to 2, in the ideal case this could save up to 50% of the pixel rendering work.

An additional technique that developers can employ is an "Occlusion Query". Essentially, the application asks the graphics processor to render a bounding box or region to test it for visibility. If the GPU determines that the region is going to be occluded, then all the geometry and rendering for that region can be skipped, potentially offering an order of magnitude increase in fillrate. Take, for instance, a monster standing behind a brick wall, so that you see only its head. If the developer determines that there is no need to calculate transformation and lighting for the hidden part of the character and to render it, this saves precious memory bandwidth and GPU processing time. Unfortunately, game developers aren't always that eager to optimize their games.

As we have just shown, NVIDIA did its best to combat the problems connected with transferring huge amounts of geometric data. Unfortunately, we won't be able to see the real advantages of the described techniques until games using Higher Order Surfaces (HOS) start appearing en masse. But the enhanced memory controller as well as the methods of optimizing pixel data transfer will allow us to feel the difference even now.
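
The principle can be sketched as a depth test that runs before any shading work is done. This is a software illustration of the general idea only, not a description of how the GeForce3 hardware is built; the function, the fragment format and the counter are our own.

```python
def render(fragments, width, height, early_z=True):
    """Draw fragments (x, y, z, color); smaller z is closer to the viewer.

    Returns the color buffer and the number of fragments that were fully shaded.
    """
    far = float("inf")
    depth = [[far] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    shaded = 0
    for x, y, z, c in fragments:
        visible = z < depth[y][x]
        if early_z and not visible:
            continue  # rejected before texturing: no shading work is spent
        shaded += 1   # texture reads and blending would happen here
        if visible:
            depth[y][x] = z
            color[y][x] = c
    return color, shaded

scene = [(0, 0, 1.0, "near"), (0, 0, 5.0, "far")]
print(render(scene, 1, 1, early_z=True)[1])   # 1
print(render(scene, 1, 1, early_z=False)[1])  # 2
```

Note that the saving depends on the drawing order: if the far fragment arrives first, it passes the test and is shaded anyway, so the early check helps most when the scene is drawn roughly front to back.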

Quincunx HRAA

One of the major aims all hardware and software developers pursue is increasing the image quality and realism of virtual worlds together with the overall system performance. At the very beginning of the 3D acceleration history, when all polygons were huge and all textures were blurry, no one cared about the low resolutions supported by the first 3D cards. And very few people really noticed the stair-step effect on the edges of objects. Most users simply couldn't imagine what lighting effects could be used in the games of that time.

Today's graphics accelerators can support resolutions up to 2048x1536x32, work very fast and boast a lot of features improving the image quality.

One of the key image quality issues is aliasing, or the "jaggies" (stair-step effect). There are several ways of solving this problem. Each of them has its pros and cons.

Higher Resolution

No doubt, this is the best way to get rid of "stair-steps". The size of the "jaggy" or stair-step artifact is never larger than the size of the actual pixel. Hence, reducing the size of the pixel reduces the size of these artifacts. Changing the resolution is not always feasible, however. The user may already be using the maximum resolution supported by the monitor, or the application itself may limit the resolution. Beyond these hard limits the only solution is to increase the effective resolution. The best way to do this is to use more sophisticated techniques for computing the color of each pixel of the display in a way that simulates having more pixels. These techniques are referred to as "antialiasing", and a technique applied to the entire scene is called Full-Scene Anti-Aliasing (FSAA).

Supersampling

This is the simplest type of antialiasing. In fact, this technology is extremely simple and extremely useless. :-) The scene is rendered at a very high virtual resolution, much higher than the current display mode, and is then scaled and filtered down to the final resolution before it is sent to the display. However, the problem is that, as you might guess, supersampling causes a substantial drop in performance: the higher the virtual resolution, the greater the performance drop. We don't consider the outcome of this antialiasing technology acceptable, which is why we won't dwell on it any longer.
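
The "scale and filter" step can be sketched as a simple box filter that averages each block of high-resolution pixels into one displayed pixel. The box filter is our assumption for illustration; real hardware may filter differently.

```python
def downsample(image, factor):
    """Box-filter a grayscale image (list of rows) down by an integer factor."""
    height, width = len(image) // factor, len(image[0]) // factor
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [
                image[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# A hard diagonal edge rendered at 4x4 and displayed at 2x2.
hi_res = [
    [0, 0, 0, 255],
    [0, 0, 255, 255],
    [0, 255, 255, 255],
    [255, 255, 255, 255],
]
print(downsample(hi_res, 2))  # [[0.0, 191.25], [191.25, 255.0]]
```

The pixels along the edge receive intermediate shades instead of a hard step, but the price is clear: the accelerator had to render four times as many pixels as end up on the screen.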

Multisampling

This technology takes several samples to calculate the final pixel color. It is another type of antialiasing, and though it is much more complicated to implement, the resulting image quality is really high. We won't go too deep into the details here, since we aim at giving you the basic idea of the new GeForce3 features.

Quincunx HRAA

GeForce3 introduces a new technology for hardware antialiasing in the form of multisampling with a new sampling pattern. This pattern is called Quincunx, after the pattern of the five dots on the "5" side of a 6-sided die. The Quincunx pattern, implemented in hardware, offers quality comparable to FSAA 4x with performance similar to that of FSAA 2x. How do they manage that? The scene is rendered in the same way as usual; however, two different samples are saved in a special buffer for each pixel. To simplify the explanation, let's assume that the secondary pixel "sample set" is simply our primary pixel offset by half a pixel along the vertical and horizontal axes. After that the two sample sets are superimposed and the color values of 5 neighbouring samples, four of which belong to the secondary "offset" set, get blended.

The advantage of this method is that the accelerator performs the same operations as in the case of 2-sample antialiasing. As you can see, NVIDIA's algorithm is highly efficient, since it doesn't actually require a lot of additional samples as supersampling does. Of course, it saves not only the accelerator's resources but also memory bus bandwidth and frame buffer storage capacity.

To our great disappointment, we won't be able to evaluate the image quality and speed of the new antialiasing technology until we get a GeForce3 based graphics card at our disposal. In the meantime we will have to be content with what NVIDIA tells us.

As we expected, the company isn't willing to unveil all the details of its new Quincunx technology. However, what we have understood for sure is its key aspect: the algorithm for calculating colors while saving memory bus bandwidth. As for the hardware implementation, only the developers seem to be aware of the details.
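
Here is a rough sketch of such a 5-tap blend. NVIDIA has not disclosed the exact filter in the materials we have, so the weights used below (1/2 for the pixel's own sample and 1/8 for each of the four offset samples) are an assumption, as are the buffer layout and the function name.

```python
def quincunx_resolve(primary, secondary):
    """Blend each pixel from 5 taps: its own centre sample plus 4 corner samples.

    primary:   height x width centre samples.
    secondary: (height + 1) x (width + 1) samples offset by half a pixel,
               so secondary[y][x] sits on the top-left corner of pixel (x, y).
    """
    height, width = len(primary), len(primary[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            corners = (
                secondary[y][x] + secondary[y][x + 1]
                + secondary[y + 1][x] + secondary[y + 1][x + 1]
            )
            row.append(0.5 * primary[y][x] + 0.125 * corners)  # assumed weights
        out.append(row)
    return out

primary = [[0, 255]]                     # a vertical edge between two pixels
secondary = [[0, 0, 255], [0, 255, 255]]
print(quincunx_resolve(primary, secondary))  # [[31.875, 223.125]]
```

Each corner sample is shared by up to four neighbouring pixels, which is why only two samples per pixel need to be stored even though five values go into every blend.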

Conclusion

GeForce3 is undoubtedly a great move forward. Full support of Pixel and Vertex Shaders promises the cards based on this new GPU a brilliant future. However, we don't know yet how soon we will see the games making really full use of nfiniteFX Engine.

In fact, the LMA technologies that save graphics memory bus bandwidth will find their application right away. Again, the question is how much today's 3D games will benefit from them. The games that have always been known as fast-running are most likely to remain fast-running no matter what. The major changes will most probably be seen in those games that couldn't show what they were capable of when run on the cards of the previous generation.

The third key feature of GeForce3 is FSAA, which doesn't cause performance to drop as sharply as its implementation in GeForce2 GTS did. It will definitely win the hearts of those who prefer to enjoy high image quality at an acceptable frame rate.

We think that NVIDIA GeForce3 is certainly one of the most advanced graphics cards of all. It is very likely to reveal some other hidden features and performance potential with time, leaving the competitors even further behind.

And as for the price of $600, it's solely up to you to decide whether this card is worth this money or not.

Well, this is what we can say about the new generation GPU for now, based on theoretical information and facts. However, please note that our later conclusions may be somewhat different from what you read today, since we haven't yet taken a closer look at the real performance of GeForce3 based cards. Anyway, stay tuned for more GeForce3 stuff coming!

