Nvidia Unwraps Performance Benchmarks of Tegra 3 “Kal-El” [UPDATED]

Nvidia Tegra 3 “Kal-El” Outperforms All Dual-Core Rivals, Says Company

by Anton Shilov
09/21/2011 | 02:28 PM

UPDATE: Adding comments regarding performance, memory bandwidth.

Nvidia Corp. has issued a whitepaper that reveals the performance of its forthcoming Tegra 3 “Kal-El” system-on-chip for tablets and smartphones and shows off the benefits of a quad-core SoC with a low-power companion core over dual-core offerings from its rivals. Thanks to an increased number of cores, a new graphics core and a refined 40nm manufacturing process, Tegra 3 is clearly faster than current-generation offerings in Nvidia's tests.


Contrary to expectations, a number of mobile applications and benchmarks can already take advantage of four general-purpose cores. Nvidia relied on either synthetic benchmarks or artificially modeled use cases (e.g., it used the Chrome browser, which is not available for ultra-mobile devices). Nonetheless, it is pretty clear that under heavy usage and in demanding applications, such as games, Kal-El will be hard to beat.

Linpack, a widely used CPU benchmark, gives an industry-standard measure of a system’s floating point computing power.

Results from the multithreaded Linpack benchmark show that the quad-core Tegra “Kal-El” delivers almost 60% higher performance than an equivalent dual-core processor. Linpack typically scales very well as the number of cores increases, but in this case performance may be limited by memory bandwidth.
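
As a rough way to read that scaling figure, the sketch below uses Amdahl's law to ask what parallel fraction of a workload would produce a 1.6x gain when going from two cores to four. This is an illustration only: Amdahl's law does not model memory bandwidth, which is the limit suspected here, so the result is best treated as an "effective" parallel fraction. The function names are hypothetical and the 1.6 input is simply the "almost 60%" figure quoted above.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over a single core predicted by Amdahl's law."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def parallel_fraction_from_scaling(ratio, cores_from=2, cores_to=4):
    """Parallel fraction p for which speedup(cores_to) / speedup(cores_from) == ratio.

    Derived by solving (1 - a*p) / (1 - b*p) = ratio for p, where
    a = 1 - 1/cores_from and b = 1 - 1/cores_to.
    """
    if not 1.0 <= ratio <= cores_to / cores_from:
        raise ValueError("ratio must lie between 1 and cores_to / cores_from")
    a = 1.0 - 1.0 / cores_from
    b = 1.0 - 1.0 / cores_to
    return (ratio - 1.0) / (ratio * b - a)


# 1.6 is the "almost 60%" quad-core over dual-core Linpack gain (assumed exact here).
p = parallel_fraction_from_scaling(1.6)
print(f"effective parallel fraction: {p:.3f}")
print(f"check, quad/dual ratio: {amdahl_speedup(p, 4) / amdahl_speedup(p, 2):.2f}")
```

Under these assumptions the effective parallel fraction comes out at about 0.86, whereas a perfectly scaling workload (a ratio of 2.0) would correspond to 1.0.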

Results from Coremark, a popular mobile CPU benchmark, are an indicator of performance in CPU-intensive multimedia applications.

Coremark shows that the quad-core Tegra delivers almost 2x the performance of dual-core mobile SoCs and almost 4x the performance of single-core SoCs.

On a system with a quad-core CPU, the operating system will be able to distribute multiple web scripts across the four CPU cores and deliver much faster execution of JavaScript-heavy pages.

Nvidia's results from Moonbat, a web-based JavaScript benchmark, show that a quad-core SoC delivers almost 50% faster web browsing performance than a mobile system-on-chip with two general-purpose ARM cores.

According to Nvidia, the Google Chrome web browser, which is currently available only for Windows or Mac OS and which may eventually be released for Android, can take advantage of all four cores during tabbed browsing.

While it is clear that all four cores are utilized during tabbed web browsing, Nvidia did not indicate whether it actually used Tegra 3 “Kal-El” or an x86 processor to show the benefits of four general-purpose processing engines.

Photaf 3D Panorama is an Android application that enables users to automatically capture 3D panoramic photos and stitch together the captured images to be viewed immediately.

The heavy image processing involved in detecting edges and stitching together the images benefits greatly from the quad-core processing capabilities of Tegra 3 “Kal-El”: the SoC is approximately two times faster than its predecessor.

Unsurprisingly, a video transcoding app benefits greatly from the increased number of cores.

Handbrake, a popular video transcoding application, delivers 60% faster transcoding on a quad-core Nvidia SoC compared to a dual-core Nvidia Kal-El-based system. What is unknown is how the performance of the quad-core Tegra 3 in this benchmark stacks up against hardware-accelerated transcoding.

Thanks to a next-generation GeForce graphics core with twelve stream processors, along with two additional general-purpose ARM cores, the next-generation SoC will offer very strong performance in gaming applications.

The graph shows the speedup that the Nvidia Tegra 3 “Kal-El” platform provides in the Glowball demo as well as in two video games over an equivalent dual-core Nvidia Tegra 2-based platform.

Unfortunately, Nvidia does not reveal any testbed configurations or methods. Given that Nvidia has previously shown off the Glowball demo on a tablet with a 10.1" screen (which likely had a 1280*800 resolution), it is highly likely that the company used a similar system. At 1280*800, the results are pretty encouraging, as the quad-core SoC does not seem to be limited by memory bandwidth. However, if Nvidia lowered the resolution during testing in order to maximize the difference between dual-core and quad-core chips, real-world games on tablets with high-definition screens will behave differently.

Finally, since the next-generation Nvidia Tegra Kal-El is made using a refined 40nm process technology and sports a number of power consumption optimizations, it manages to consume less energy in certain use cases than current-generation offerings made using less advanced process technology.

Nvidia’s measurements show that the Tegra “Kal-El” SoC consumes 2-3x less power than competing solutions when it is constrained to the same level of performance, with each processor completing roughly 5000 units of Coremark “work”. Even when Kal-El runs at a higher frequency, completing more than 2x the amount of Coremark “work”, it still consumes less power than dual-core solutions. Naturally, in real-world cases applications will attempt to use all the performance the new SoC has to offer and will thus increase power consumption.
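
To put those claims in performance-per-watt terms, here is a minimal sketch that works purely with ratios, since the article gives no absolute power figures. The function name is hypothetical, and the inputs are only the relative numbers quoted above (equal Coremark work at one half to one third of the power).

```python
def relative_efficiency(perf_ratio, power_ratio):
    """Performance per watt of chip A relative to chip B.

    perf_ratio  = performance of A / performance of B
    power_ratio = power of A / power of B
    """
    if perf_ratio <= 0 or power_ratio <= 0:
        raise ValueError("ratios must be positive")
    return perf_ratio / power_ratio


# Same Coremark work, with Kal-El drawing 1/2 to 1/3 of the rival's power.
low = relative_efficiency(1.0, 1.0 / 2.0)
high = relative_efficiency(1.0, 1.0 / 3.0)
print(f"efficiency advantage at equal performance: {low:.1f}x to {high:.1f}x")
```

The second claim, more than 2x the work at lower power, follows the same arithmetic: any power ratio below 1.0 combined with a performance ratio above 2.0 gives an efficiency advantage above 2x.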

Nvidia’s performance benchmarks of Tegra 3 “Kal-El” clearly show that the next-generation system-on-chip will be able to provide a performance gain of 60% or more over current-generation SoCs for smartphones. In a number of cases, Tegra 3 “Kal-El” will also consume less power than Nvidia’s Tegra 2, which means better battery life for devices. What should be noted is that Tegra 3 “Kal-El” will not compete directly against the existing Qualcomm Snapdragon QC8660 or OMAP4, but against next-gen SoCs that will also sport four ARM cores, next-gen graphics and other advantages.

Nvidia has yet to reveal more details about the memory bandwidth of Tegra 3 “Kal-El”. Based on the Linpack score, there may be some constraints. Four ARM Cortex-A9 cores and a ULP GeForce with twelve stream processors will naturally require a lot of bandwidth, clearly more than Tegra 2's memory sub-system can provide (up to 5.3GB/s, LPDDR2 667MHz). In fact, just as with traditional PCs, small form-factor systems may soon feel a critical need for a fast memory sub-system, and Tegra 3 may be one of the first SoCs to demonstrate it.
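
For reference, peak memory bandwidth is simply the transfer rate multiplied by the bus width. The sketch below shows the arithmetic; the function name is hypothetical, and the 64-bit width is an assumption chosen because, at 667 million transfers per second, it reproduces the 5.3GB/s figure quoted above. The article itself does not state the bus width, and a 32-bit interface at twice the transfer rate would give the same result.

```python
def peak_bandwidth_gb_per_s(transfers_per_second, bus_width_bits):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


# Assumption: 667 million transfers/s over a 64-bit interface.
print(f"{peak_bandwidth_gb_per_s(667e6, 64):.2f} GB/s")

# The same transfer rate over a 32-bit interface yields half as much.
print(f"{peak_bandwidth_gb_per_s(667e6, 32):.2f} GB/s")
```

These are theoretical ceilings; sustained bandwidth in real workloads is lower, which is why a benchmark such as Linpack can hit the limit well before the peak figure suggests.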

Nvidia itself promises that the first Tegra 3-based products will hit the market this calendar year.