
Second Look at Kyro II: Squeezing All Juices...



by FastSite

[ 06/28/2001 | 12:00 AM ]

This time we tried to squeeze the maximum out of our Kyro II based graphics card, the Hercules 3D Prophet 4500 64MB. To do so, we overclocked the card using some extreme methods and forced texture compression. We also managed to figure out some pretty interesting things about the Kyro II architecture.




In our previous article called "STM Kyro II Review: Hercules 3D Prophet 4500 64MB Graphics Card" devoted to Hercules' solution based on PowerVR/STM Kyro II, we investigated the performance of Kyro II on an Athlon platform (VIA KT133A). Besides, we also compared the performance, features and image quality provided by Kyro II with those provided by graphics cards based on NVIDIA GeForce2 MX400 and NVIDIA GeForce2 GTS chips.


Back then we left a couple of really interesting issues untouched, so now we decided to test Kyro II once again and to squeeze the most out of it in Quake3 Arena, Serious Sam and 3DMark2000 with forced texture compression.

In the previous review, we saw that the overclocking potential of Kyro II wasn't that impressive: from its nominal 175MHz, Kyro II could be safely clocked to only 185MHz. Inspired by our positive experience with extreme overclocking of graphics cards on NVIDIA chips, we decided to do the same thing to Kyro II. In this article you will find test results for the nominal frequency as well as for a higher one, along with a detailed analysis of the performance gains, to estimate how fruitful our "extreme" attempts were.

Testbed

This time we used the following testbed to test Kyro II:

  • Intel Pentium III 1000MHz (133MHz FSB) CPU;
  • ASUS CUSL2 (i815 based) mainboard;
  • 256MB NCP PC133 SDRAM;
  • Fujitsu MPE3084AE 8.4GB HDD.

We tested in:

  • Windows 98 SE build 4.10.2222 A;
  • DirectX8.1a;
  • Quake 3 Arena v1.27g;
  • 3Dmark 2000 v1.1;
  • Unreal Tournament + patch 4.36;
  • Serious Sam v1.0c.

For Hercules 3D Prophet 4500 based on Kyro II we selected drivers from Imagination Technologies version 1.2.1 (7-103).

The other two cards based on NVIDIA chips were tested with Detonator 12.41 driver.

Extreme Overclocking

The Kyro II chip on the Hercules 3D Prophet 4500 is powered through a voltage regulator chip marked MIC 29302 BU:

   
MIC 29302 BU Microchip

This microchip is an adjustable voltage regulator from MICREL, which is intended for PCs with low power consumption, cars, devices powered by batteries and the like.

Typical application scheme of the adjustable output voltage configuration:

The entire set of documents for the MIC29xxx series regulators (in PDF format) is available from MICREL.

As the scheme shows, Vout is determined by the R1 and R2 resistors. On the Hercules 3D Prophet 4500 the roles of R1 and R2 are played by the 6.8KOhm and 10KOhm resistors marked R38 and R39 respectively. The default Vout in this case is 2.08V.

To improve the overclockability of Kyro II, we decided to increase the Vcore to about 2.5V. There are two ways to do that. You can either raise the R1 resistance (R38 on the card) or reduce the R2 resistance (R39 on the card). The first alternative doesn't suit us, because it requires R38 to be desoldered and replaced with another resistor, while the second one works out well: all we need to do is calculate the resistance of an additional resistor shunting R39. We believe that, knowing the required Vout, you'll find these calculations easy enough, so we won't go into greater detail here.

So, we took an 18KOhm shunting resistor. With it connected in parallel with R39, the effective R2 became 6.4KOhm and the regulator Vout rose to 2.55V. The card seems to have a special place for this resistor (marked as R37 on the photo of the card), but you don't have to use it. It is easier and quicker to solder the additional resistor to the pins of the MIC29302 chip, though it will slightly spoil the looks of the card :-)
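The divider arithmetic can be checked with a short sketch. It assumes the usual adjustable-regulator relation Vout = Vref x (1 + R1/R2) with a 1.240V reference; the reference value is our assumption (it is not quoted in the article, though it reproduces the 2.08V and 2.55V figures), and the helper names are purely illustrative.

```python
# Sketch of the Vout calculation for the adjustable regulator.
# Assumption: Vout = V_REF * (1 + R1 / R2), with V_REF = 1.240 V.
V_REF = 1.240  # volts, assumed reference voltage of the regulator


def parallel(r_a, r_b):
    """Resistance of two resistors connected in parallel."""
    return r_a * r_b / (r_a + r_b)


def vout(r1, r2, v_ref=V_REF):
    """Output voltage set by the R1/R2 divider."""
    return v_ref * (1.0 + r1 / r2)


def shunt_for_target(v_target, r1, r2, v_ref=V_REF):
    """Shunt resistor across R2 that raises Vout to v_target."""
    r2_needed = r1 / (v_target / v_ref - 1.0)
    if r2_needed >= r2:
        raise ValueError("target voltage is not above the stock voltage")
    return r2_needed * r2 / (r2 - r2_needed)


R38 = 6.8e3   # R1 on the card, ohms
R39 = 10e3    # R2 on the card, ohms
SHUNT = 18e3  # resistor soldered in parallel with R39, ohms

stock_v = vout(R38, R39)                  # about 2.08 V
modded_r2 = parallel(R39, SHUNT)          # about 6.4 kOhm
modded_v = vout(R38, modded_r2)           # about 2.55 V
ideal_shunt = shunt_for_target(2.5, R38, R39)  # about 20 kOhm for exactly 2.5 V

print(f"stock Vout  : {stock_v:.2f} V")
print(f"effective R2: {modded_r2 / 1e3:.2f} kOhm")
print(f"modded Vout : {modded_v:.2f} V")
print(f"shunt for 2.5 V: {ideal_shunt / 1e3:.1f} kOhm")
```

Under these assumptions an exact 2.5V would call for a shunt of roughly 20KOhm, so an 18KOhm part lands slightly above the target, which agrees with the 2.55V reported above.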


Shunting Resistor

After this modification we didn't notice the core running any hotter than usual, so no extreme cooling solutions were applied. In the end, the Hercules 3D Prophet 4500 worked fine at 200MHz (as a reminder, Kyro II has strictly synchronized core and graphics memory frequencies). Frankly speaking, this is a really cool result for a chip that could hardly reach 185MHz before the card was modified.

When we moved on to gaming tests at these frequencies, we noticed that the card didn't always behave properly: now and then the action froze for an instant and then went on as if nothing had happened (this occurred in the Quake3 demo, for example). The higher the screen resolution, the more often the pauses appeared, making it absolutely impossible to play, not to mention measuring performance. Our desperate attempts to save the situation by installing various fans and coolers failed, which showed that the problem had nothing to do with the chip overheating. We had to bring the clock frequency down little by little until the action ran smoothly. The hitches disappeared only at a core frequency of 196MHz. So we tested the card at its nominal frequency and at the extreme 196MHz.

This reaction to overclocking is most probably caused by the chip's architecture, namely by the on-chip Z-buffer, frame buffer and so on. All of this effectively makes up an on-die cache, and on-die cache is known as the major trouble-maker of any chip during overclocking.

It looks as if Kyro II as it is now, i.e. with its current architecture and 0.18-micron manufacturing technology, tops out at 190-200MHz.

Performance

Quake3 Arena

In Quake3 Arena all the graphics cards were tested with the following settings. For 16bit color modes we took 16bit textures. For 32bit modes the texture quality was set to 32bit. Textures and level of detail were set to the maximum. We tested with bi-linear and with tri-linear filtering, and with texture compression both enabled and disabled.


The diagrams clearly show that at nominal frequencies the ranking of Kyro II, GeForce2 MX and GeForce2 GTS is the same on the Pentium III platform as on the Athlon platform (for the results obtained on the Athlon platform see our previous review). So, we dare conclude that the Kyro II drivers are optimized equally well for Athlon and Pentium III (at least on the VIA KT133A and Intel i815 chipsets) and show no serious differences.

In 16bit color modes the performance of Kyro II falls between GeForce2 MX 400 and GeForce2 GTS. In 32bit color mode Kyro II competes with GeForce2 GTS: it is slower at lower resolutions and pulls ahead as the resolution gets higher.

As usual, Kyro II seems not to care at all about the screen color and texture depth: its performance drops by only 10-12% when the color mode is changed from 16bit to 32bit.


Tri-linear filtering changes the situation radically: in 16bit mode Kyro II can't cope with the new task fast enough and falls behind all its rivals. In 32bit mode we see the same tendency, but now it is GeForce2 MX 400 with its traditional architecture that is unable to catch up with the tile-based Kyro II. The latter performs even better when texture compression is enabled. Thanks to this feature the performance of Kyro II doesn't drop that dramatically when tri-linear filtering is enabled. We would even say that it stays at an acceptable level.


The graphs show that in 16bit color mode texture compression helps Kyro II to reduce the losses caused by tri-linear filtering from 25-35% to 10-15%. In 32bit mode the performance drop of Kyro II is also 10-15%. However, that is still more than NVIDIA GeForce2 MX 400 and GeForce2 GTS lose: they suffer an even smaller performance drop of some 3-6%.


In Quake3 extreme overclocking has proven really worth doing for Kyro II. As we increased the core frequency by 12%, the performance gain at high resolutions averaged the same 12%. At any resolution below 1280x1024 the results are mostly determined by the CPU, therefore at low resolutions extreme overclocking doesn't affect the performance of Kyro II so notably.
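The reasoning above can be put into a few lines. The clock figures (175MHz and 196MHz) come from the article; the "minimum of CPU limit and GPU limit" model and the frame rates fed into it are purely illustrative assumptions of ours, not measured data.

```python
# Sketch: how a 175 -> 196 MHz overclock should show up in the frame rate.
NOMINAL_MHZ = 175.0      # nominal Kyro II clock (from the article)
OVERCLOCKED_MHZ = 196.0  # stable extreme clock (from the article)


def clock_gain(nominal, overclocked):
    """Relative clock increase, e.g. 0.12 for +12%."""
    return overclocked / nominal - 1.0


def expected_fps(gpu_fps_nominal, cpu_fps_cap, nominal, overclocked):
    """Assumed model: GPU-limited fps scales with the clock,
    but the result can never exceed what the CPU can feed."""
    scaled = gpu_fps_nominal * overclocked / nominal
    return min(scaled, cpu_fps_cap)


gain = clock_gain(NOMINAL_MHZ, OVERCLOCKED_MHZ)
print(f"clock gain: {gain:.0%}")

# Hypothetical numbers, chosen only to illustrate the two regimes.
high_res = expected_fps(50.0, 120.0, NOMINAL_MHZ, OVERCLOCKED_MHZ)
low_res = expected_fps(110.0, 120.0, NOMINAL_MHZ, OVERCLOCKED_MHZ)
print(f"GPU-limited case: 50.0 -> {high_res:.1f} fps")
print(f"CPU-limited case: 110.0 -> {low_res:.1f} fps")
```

In the GPU-limited case the full clock gain reaches the frame rate, while in the CPU-limited case the gain is clipped by the CPU ceiling, which is the pattern described for resolutions below 1280x1024.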


In 16bit color mode the cards cope with the work pretty well. It looks like Kyro II is the only one troubled by insufficient graphics memory bandwidth when tri-linear filtering is enabled. By the way, the pixel pipeline of Kyro II is designed in such a way that it needs two clock cycles to lay a texture with tri-linear filtering. During the first clock the texturing unit takes 4 texture samples from one MIP level, carries out bi-linear filtering and saves the calculated color. During the second clock it samples and bi-linearly filters a group of 4 samples from another MIP level of the same texture and then blends the newly calculated color with the saved one in the appropriate proportion. This way tri-linear filtering is done in two clock cycles. The calculated color can be either stored in the frame buffer or kept for any further texture operations. As you see, for tri-linear filtering Kyro II needs 2 clock cycles and 8 texture samples (4 samples per clock). Nevertheless, since texture compression makes the loss from tri-linear filtering surprisingly small, we have to admit that in Quake3 the performance of Kyro II is limited by the graphics memory bandwidth and not by the extra clocks of the pixel pipelines. Anyway, we'll return to the role of these extra clocks later, and now let's see how things stand in Serious Sam.
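The two-clock scheme described above is ordinary tri-linear filtering split into two bi-linear steps. The following is a minimal software sketch of that idea on grayscale textures; it only illustrates the arithmetic and is not a description of the actual Kyro II hardware. All function names are our own.

```python
import math


def bilinear(level, u, v):
    """Bi-linear filter of one MIP level (a 2D list of grayscale texels).

    u and v are texture coordinates in [0, 1]; 4 texels are sampled."""
    h, w = len(level), len(level[0])
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = level[y0][x0] * (1 - fx) + level[y0][x1] * fx
    bottom = level[y1][x0] * (1 - fx) + level[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def trilinear(mips, u, v, lod):
    """Tri-linear filter as two bi-linear steps plus a blend.

    mips[0] is the most detailed level; lod selects between levels."""
    lo = max(0, min(int(math.floor(lod)), len(mips) - 1))
    hi = min(lo + 1, len(mips) - 1)
    frac = min(max(lod - lo, 0.0), 1.0)
    saved = bilinear(mips[lo], u, v)    # "first clock": 4 samples, result saved
    second = bilinear(mips[hi], u, v)   # "second clock": 4 samples, next level
    return saved * (1 - frac) + second * frac  # blend in proportion to lod


# Tiny two-level MIP chain: a 2x2 level and its 1x1 reduction.
mip0 = [[0.0, 1.0],
        [0.0, 1.0]]
mip1 = [[0.5]]
print(trilinear([mip0, mip1], 0.25, 0.5, 0.5))
```

Each call to `bilinear` reads 4 texels, so one tri-linear result costs 8 samples in total, matching the "2 clock cycles and 8 texture samples" count given above.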

Serious Sam


The Serious Sam engine does support texture compression, but we decided not to change the game settings. We used two modes only - "Speed Settings" and "Quality Settings" - and enabled "Force texture compression" in the driver. The results turned out similar to those obtained in Quake3. With the "Speed Settings", when tri-linear filtering is not engaged, in 16bit color mode the Hercules 3D Prophet 4500 is just a trifle slower than NVIDIA GeForce2 GTS. In 16bit mode the narrow memory bus of GeForce2 MX 400 and GeForce2 GTS doesn't hurt their performance that much, so Kyro II with its lower fillrate lags behind its rivals in spite of its "native" hardware hidden surface removal.

What do we observe in 32bit color mode? Kyro II suffers a much smaller performance drop than NVIDIA GeForce2 MX 400 and GeForce2 GTS. Now their performance is limited not by the texturing speed, i.e. by the fillrate, but by the graphics memory bandwidth. As a result, NVIDIA GeForce2 MX 400 turns out slower than Kyro II at all resolutions. The other contender, NVIDIA GeForce2 GTS, proves faster than Kyro II at lower resolutions, where the graphics memory bus is not loaded so heavily, and loses at higher resolutions. That's where Kyro II manages to show its best, while the 8 texturing units of GeForce2 GTS stay idle waiting for data to be delivered from the graphics memory.

Accordingly, when we enabled texture compression, the performance of Kyro II didn't rise much (in these modes the chip is not held back by the graphics memory bandwidth anyway). Extreme overclocking had a much greater effect.


Apart from all the other changes, the "Quality" mode involves tri-linear filtering. Therefore, enabling texture compression allowed Kyro II to catch up with GeForce2 GTS in 16bit color mode and to leave it even farther behind in 32bit mode.

As a whole, Kyro II runs a good deal faster in the "Quality" mode (even without texture compression) than it did when we wrote our previous review. To tell the truth, we aren't sure what the grounds of this breakthrough are: is it due to the shift to a Pentium III based system, or the new 1.0c patch for Serious Sam, or maybe the 7-103 driver for Kyro II...


The Serious Sam test takes a bit longer to run than the Quake3 one, so the results vary less and the performance increase obtained by extreme overclocking coincides almost perfectly with what we expected. Namely, at low resolutions, say at 800x600, where the performance is limited by the CPU, overclocking doesn't yield any tangible gain. But when the result is determined primarily by the graphics card, at 1600x1200, the overclocking effect amounted to 12% - exactly what the clock frequency of Kyro II was increased by.

As the resolution grows from 800x600 to 1600x1200, in the "Quality Settings" mode the graphics card is loaded more heavily than in the "Speed Settings" mode, that's why in case of "Quality Settings" overclocking is more efficient bringing about a performance gain even at lower resolutions.

3DMark 2000

In the previous review we tested Kyro II in 3DMark 2001, but this time we turned to 3DMark 2000, which is closer to the up-to-date games in difficulty and which provides adequate workload:




In this benchmark Kyro II is for the most part faster than GeForce2 MX 400. It lived up to our expectations, overtaking GeForce2 GTS at higher resolutions (especially in 32bit color mode) and falling behind at low resolutions.

The diagrams show quite well that in the "High Details" modes of the Game1 and Game2 tests Kyro II runs into some limitations again. The cause can be the CPU performance as well as the need to sort the polygons. The CPU's significant impact on the results is perfectly illustrated by the diagram for Game2 16bit: three absolutely different graphics cards have similar performance in the "High Details" mode.

We have also tried to run 3DMark 2000 with "Force Tri-linear Filtering" and "Force Texture Compression" enabled in the Direct3D section of the driver. The outcome resembled that of Quake3: with forced tri-linear filtering the performance fell sharply, but forced texture compression cut the performance drop down to an acceptable 10-15%.

Forced anisotropic filtering enabled in the Direct3D section of the driver brought about a deep performance plunge, too. Forced texture compression didn't help much this time, so we decided to look at the figures a bit more closely:

It's obvious that texture compression doesn't rescue Kyro II when anisotropic filtering joins the game. It proves that the graphics memory bandwidth is no bottleneck - we tested in 800x600x16 mode and enabled forced texture compression. To make sure that anisotropic filtering and texture compression really work in 3DMark 2000, we made a few nice screenshots:

   
Bi-linear Filtering     Anisotropic Filtering
   

Anisotropic Filtering + Texture Compression (S3TC)

Anisotropic filtering and texture compression have a noticeable effect on the image quality, but texture compression gives no performance gain. So, we assume that when anisotropic filtering is enabled, the graphics memory bus is not to blame. The performance of Kyro II drops because the core, i.e. the texturing units, is loaded much more heavily. Well, now let's take a closer look at a couple of things.

Fillrate Investigation

We meditated for a while on the logic of Kyro II's behavior and drew a plan of how to measure its fillrate in different filtering modes.

Well, no sooner said than done... We took Unreal Tournament, disabled Decals, dynamic lighting, fogging, mirror surfaces, detailed textures and naturally Vsync (you select "preferences", then the points "Rendering" -> "Direct3D support" and "Display"). After that we selected an indoor level without the sky (we took DM-Deck16) and launched Practice Session with robots.

As a result, at this level the accelerator builds the whole scene within a single pass overlaying only two textures on the walls - the base texture and the lightmap. That's exactly what we need! The lightmap can be disabled if you like ("Display" -> "NoLighting" = "True"). It makes the level boring and lifeless, but it doesn't matter for us: we only aimed to have a single texture laid on the walls.

   
Base Texture with Lightmap     Base Texture without Lightmap

That's all for the preparations. Now we should only make the program display the instant number of frames per second (console command: timedemo 1), enter the game without pressing the ready-to-fight button and fly over the level looking for a dead-end to stay in. A nice way to play, isn't it?

After the instant number of frames per second turns stable, we save the figure and multiply it by the screen resolution to get the fillrate. Here is what we obtained for Kyro II in 1600x1200x16 mode testing it in Unreal Tournament with different Direct3D driver settings:

| Texture filtering mode | Bi-linear | Tri-linear | Anisotropic | Tri-linear + Anisotropic |
|---|---|---|---|---|
| 16bit - base texture (fps) | 166.83 | 88.37 | 44.67 | 22.56 |
| 16bit - base texture + lightmap (fps) | 88.37 | 59.46 | 22.56 | 15.09 |
| 32bit - base texture (fps) | 166.83 | 88.37 | 44.65 | 22.55 |
| 32bit - base texture + lightmap (fps) | 88.37 | 59.3 | 22.55 | 15.09 |
| Pixels per frame | 1,920,000 | 1,920,000 | 1,920,000 | 1,920,000 |
| Calculated fillrate for base texture (Mpixels/sec) | 320.3 | 169.7 | 85.8 | 43.3 |
| Calculated fillrate for base texture + lightmap (Mpixels/sec) | 169.7 | 114.2 | 43.3 | 28.97 |
| Clock cycles per pixel (theoretical fillrate / calculated fillrate) for base texture | 1.092 | 2.063 | 4.079 | 8.083 |
| Clock cycles per pixel (theoretical fillrate / calculated fillrate) for base texture + lightmap | 2.063 | 3.064 | 8.083 | 12.08 |

Of course, the figures are not 100% exact - we couldn't bypass the time spent for scene and game logic calculation, but at 1600x1200 when the game has not been actually begun it takes the CPU the shortest time to build a frame.
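The fillrate arithmetic behind the table can be reproduced with the sketch below. The frame rates are the 16bit figures from the table; the theoretical fillrate of 350 Mpixels/sec (2 pixel pipelines at 175MHz) is our assumption, since the article doesn't state the value it divided by, although it is consistent with the clock counts listed.

```python
# Sketch: fillrate and clocks per pixel from measured frame rates.
WIDTH, HEIGHT = 1600, 1200
THEORETICAL_MPIX = 350.0  # assumed: 2 pipelines x 175 MHz, in Mpixels/sec

# 16bit results from the table: mode -> (base texture fps, base + lightmap fps)
FPS_16BIT = {
    "bi-linear": (166.83, 88.37),
    "tri-linear": (88.37, 59.46),
    "anisotropic": (44.67, 22.56),
    "tri-linear + anisotropic": (22.56, 15.09),
}


def fillrate_mpix(fps, width=WIDTH, height=HEIGHT):
    """Measured fillrate in Mpixels/sec: frames per second x pixels per frame."""
    return fps * width * height / 1e6


def clocks_per_pixel(fps, theoretical=THEORETICAL_MPIX):
    """Theoretical fillrate divided by the measured one."""
    return theoretical / fillrate_mpix(fps)


for mode, (base_fps, lightmap_fps) in FPS_16BIT.items():
    print(f"{mode:26s} "
          f"base: {fillrate_mpix(base_fps):6.1f} Mpix/s, "
          f"{clocks_per_pixel(base_fps):5.2f} clk | "
          f"+lightmap: {fillrate_mpix(lightmap_fps):6.1f} Mpix/s, "
          f"{clocks_per_pixel(lightmap_fps):5.2f} clk")
```

The clock counts come out slightly above whole numbers (roughly 1.09 instead of 1, 4.08 instead of 4), which fits the remark that some CPU time per frame could not be excluded from the measurement.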

So, Kyro II carries out tri-linear filtering for one texture in two clock cycles. It needs the same two clocks to lay two textures with bi-linear filtering. When tri-linear filtering is on and lightmaps are enabled, Kyro II processes a pixel in three clocks, which means tri-linear filtering is not applied to the lightmaps: in two clocks the pixel pipeline lays the base texture with tri-linear filtering and in the third clock it lays the lightmap with bi-linear filtering. With anisotropic filtering enabled, 4 clocks are needed to process each pixel when one texture is laid (the base one) and 8 clocks when there are two textures.

When tri-linear and anisotropic filtering are enabled simultaneously, i.e. in case of maximum image quality, the chip needs 8 clocks to lay one texture per pixel: 4 clocks are spent to do the anisotropic filtering on one MIP level and 4 more clocks - on another MIP level. The texturing unit can blend the color data right after it is calculated.
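The measured clock counts fit a very simple cost model, sketched below. The model is our own reading of the table, not a documented property of the chip: anisotropic filtering costs 4 clocks per texture instead of 1, tri-linear filtering doubles the cost of the base texture, and the lightmap is never filtered tri-linearly.

```python
# Sketch: a cost model that reproduces the rounded clock counts in the table.
# Assumptions (ours): anisotropic = 4 clocks per texture, bi-linear = 1,
# tri-linear doubles the cost, and tri-linear is skipped for lightmaps.

def texture_clocks(trilinear, anisotropic, is_lightmap=False):
    """Clocks spent laying one texture on one pixel."""
    clocks = 4 if anisotropic else 1
    if trilinear and not is_lightmap:
        clocks *= 2  # a second pass over another MIP level
    return clocks


def pixel_clocks(trilinear, anisotropic, lightmap):
    """Total clocks per pixel for the base texture plus an optional lightmap."""
    total = texture_clocks(trilinear, anisotropic)
    if lightmap:
        total += texture_clocks(trilinear, anisotropic, is_lightmap=True)
    return total


for tri in (False, True):
    for aniso in (False, True):
        print(f"tri-linear={tri!s:5} anisotropic={aniso!s:5} "
              f"base: {pixel_clocks(tri, aniso, False):2d} clk, "
              f"base + lightmap: {pixel_clocks(tri, aniso, True):2d} clk")
```

The model yields 1, 2, 4 and 8 clocks for the base texture and 2, 3, 8 and 12 clocks with the lightmap, which matches the measured values after rounding.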

It's remarkable that the fillrate changed neither when 32bit color depth was enabled, nor when 32bit textures and texture compression were turned on. This indicates that in this test the performance of Kyro II is limited by the time needed for filtering and not by the data transfer rate of the memory bus. This is all the more likely because the piece of texture we see while standing in the dead-end is small enough to fit into the Kyro II texture cache, so there is no need to wait for data to be transferred from the graphics memory.

Another surprising thing is the large number of clocks needed for anisotropic filtering. Assuming that with anisotropic filtering the texel color is calculated from four bi-linear filtering operations along the line of anisotropy, it makes sense to talk about 16-sample anisotropic filtering on Kyro II and to expect excellent image quality. However, the image quality Kyro II provides with anisotropic filtering is merely comparable to that of NVIDIA GeForce2, while the price paid for it is far higher.

Anyway, these are only guesses. The fact is that for the time being the anisotropic filtering of Kyro II can be admired only in theory and can hardly be used in today's games, because it causes a really bad performance drop.

Conclusion

Extreme Overclocking

Since in theory the performance of Kyro II depends mainly on the speed of the chip, and in practice it is the strictly synchronized core and graphics memory frequencies that matter, extreme overclocking seemed very promising to us. In reality, the low overclockability of the graphics memory as well as Kyro II's technological and architectural limitations are a serious obstacle to a substantial performance gain. It is especially unattractive because overclockers have to modify the graphics card and thus deprive themselves of any warranty service.

But even this modest clock increase nets a significant performance gain, since the performance of the overclocked Kyro II grows in proportion to its clock frequency.

Forced Texture Compression

Forced texture compression is a marvelous way to increase the performance of Kyro II when tri-linear filtering is on. The only negative consequence is slightly lower image quality, which results from the S3TC/DXTC compression algorithms themselves.

Forced texture compression provides a performance gain even without tri-linear filtering, although in this case the effect is less pronounced and is comparable to that seen on the other graphics cards. Notably, Kyro II doesn't suffer from the inherent bug with incorrect S3TC/DXTC decompression of transparent textures found in NVIDIA GeForce256, GeForce2 and GeForce3.

Clocks, Anisotropic Filtering, Drivers

In our previous review, we were pretty much disappointed with the results shown by Kyro II in Serious Sam, but after we got hold of a new patch for Serious Sam and installed a fresh driver set from Imagination, the situation changed for the better. It lets us think that the performance potential of Kyro II is not exhausted yet and there is room for improvement. So far, Kyro II hasn't recovered from the so-called "ATI disease". We mean that the results are outstandingly high in some games while leaving much to be desired in others, and the card simply flies on one platform while lagging badly on another.

With all that, the manufacturers have acknowledged Kyro II and keep announcing graphics cards built on this chip. They are selling quite well, so Kyro II is steadily making its way into the market and there should be no delays with updated driver versions.

The hefty losses caused by anisotropic filtering are a weak spot of Kyro II. We guess that this won't be eliminated even when new driver versions come out. However, anisotropic filtering is not used often enough to be regarded as a fatal flaw of Kyro II.

