
UPDATE: Adding comments regarding performance, memory bandwidth.

Nvidia Corp. has issued a whitepaper that details the performance of its forthcoming Tegra 3 “Kal-El” system-on-chip for tablets and smartphones and shows off the benefits of a quad-core SoC with a low-power companion core over dual-core offerings from its rivals. Thanks to an increased number of cores, a new graphics core and a refined 40nm manufacturing process technology, Tegra 3 is clearly faster than current-generation offerings.

Contrary to expectations, a number of mobile applications and benchmarks can already take advantage of four general-purpose cores. Nvidia relied on either synthetic benchmarks or artificially modeled use cases (e.g., it used the Chrome browser, which is not available for ultra-mobile devices). Nonetheless, it is pretty clear that under heavy usage and in demanding applications, such as games, Kal-El may be hard to beat.

Linpack, a widely used CPU benchmark, gives an industry-standard measure of a system’s floating point computing power.

Results from the multithreaded Linpack benchmark show that the quad-core Tegra 3 “Kal-El” delivers almost 60% higher performance than an equivalent dual-core processor. Typically, Linpack scales very well as the number of cores increases, but in this case performance may be limited by memory bandwidth.
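To put the Linpack figure in perspective, here is a rough sketch that treats "almost 60% higher" as a 1.6x speedup from doubling the core count. It computes how far that falls short of ideal linear scaling and, purely as an illustrative model, which parallel fraction Amdahl's law would need to reproduce it. Amdahl's law attributes the shortfall to serial work, while the article suspects memory bandwidth, so the fitted fraction is only a convenient summary number, not a diagnosis. All function names are hypothetical.

```python
def scaling_efficiency(speedup: float, core_ratio: float) -> float:
    """Fraction of ideal linear scaling that was actually achieved."""
    return speedup / core_ratio


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over a single core for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def implied_parallel_fraction(observed: float, lo: int = 2, hi: int = 4) -> float:
    """Bisect for the parallel fraction whose Amdahl speedup from `lo` to `hi`
    cores equals `observed` (the ratio grows monotonically with the fraction)."""
    low, high = 0.0, 1.0
    for _ in range(60):
        mid = (low + high) / 2
        ratio = amdahl_speedup(mid, hi) / amdahl_speedup(mid, lo)
        if ratio < observed:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Assumption: "almost 60% higher" is rounded up to exactly 1.6x.
linpack_gain = 1.6
efficiency = scaling_efficiency(linpack_gain, 4 / 2)
fraction = implied_parallel_fraction(linpack_gain)
print(f"efficiency: {efficiency:.0%}, implied parallel fraction: {fraction:.2f}")
```

Under these assumptions the quad-core result reaches about 80% of ideal scaling, which is what a workload that is roughly 86% parallel would show under Amdahl's model.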

Results from Coremark, a popular mobile CPU benchmark, are an indicator of performance for CPU-intensive multimedia applications.

For instance, Coremark shows that the quad-core Tegra 3 delivers almost 2x the performance of dual-core mobile SoCs and almost 4x the performance of single-core SoCs.

On a quad-core CPU-based system, the operating system will be able to allocate multiple web scripts across the four CPU cores and deliver much faster execution of JavaScript-heavy pages.
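The effect of spreading scripts across cores can be illustrated with a small scheduling simulation. The sketch below assumes the scripts are independent and can run concurrently, and it uses a simple greedy rule (longest task first, onto the least-loaded core) rather than whatever policy a real operating system scheduler applies. The script durations are hypothetical, not taken from Nvidia's data.

```python
import heapq


def makespan(task_ms: list[float], cores: int) -> float:
    """Assign tasks longest-first to the least-loaded core and return
    the time at which the busiest core finishes."""
    loads = [0.0] * cores
    heapq.heapify(loads)
    for duration in sorted(task_ms, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + duration)
    return max(loads)


# Hypothetical script run times in milliseconds (illustrative only).
scripts_ms = [40, 35, 30, 25, 20, 20, 15, 15]

dual = makespan(scripts_ms, 2)
quad = makespan(scripts_ms, 4)
print(f"dual-core: {dual:.0f} ms, quad-core: {quad:.0f} ms, speedup: {dual / quad:.2f}x")
```

With these made-up durations the page finishes in 100 ms on two cores and 55 ms on four, a speedup short of 2x because the work cannot be split perfectly evenly.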

Results from Moonbat, a web-based JavaScript benchmark, obtained by Nvidia show that a quad-core SoC delivers almost 50% faster web browsing performance compared to a mobile system-on-chip with two general-purpose ARM cores. 

According to Nvidia, the Google Chrome web browser, which is currently available only for Windows and Mac OS and which may eventually be released for Android, can take advantage of all four cores during tabbed browsing.

While it is clear that all four cores are utilized during tabbed web browsing, Nvidia did not indicate whether it actually used Tegra 3 “Kal-El” or an x86 processor to show the benefits of four general-purpose processing engines.

Photaf 3D Panorama is an Android application that enables users to automatically capture 3D panoramic photos and stitch together the captured images to be viewed immediately.

The heavy image processing involved in detecting edges and stitching together the images benefits greatly from the quad-core processing capabilities of Tegra 3 “Kal-El”: the SoC is approximately two times faster than its predecessor.

Unsurprisingly, a video transcoding app benefits greatly from the increased number of cores.

Handbrake, a popular video transcoding application, delivers 60% faster transcoding on a quad-core Nvidia SoC compared to a dual-core Nvidia Kal-El-based system. What is unknown is how the performance of the quad-core Tegra 3 in this benchmark stacks up against hardware-accelerated transcoding.

Thanks to a next-generation GeForce graphics core with twelve stream processors along with two additional general-purpose ARM cores, the next-generation SoC will offer very strong performance in gaming applications.

The graph shows the performance speedup that the Nvidia Tegra 3 “Kal-El” platform provides in the Glowball demo as well as in two video games over an equivalent dual-core Nvidia Tegra 2-based platform.

Unfortunately, Nvidia does not reveal any testbed configurations or methods. Given that Nvidia has previously shown off the Glowball demo on a tablet with a 10.1" screen (which likely had a 1280*800 resolution), it is highly likely that the company used a similar system. At 1280*800, the results are pretty encouraging, as the quad-core SoC does not seem to be limited by memory bandwidth. However, if Nvidia lowered the resolution during testing in order to maximize the difference between dual-core and quad-core chips, real-world games on tablets with high-definition screens will behave differently.

Finally, since the next-generation Nvidia Tegra 3 “Kal-El” is made using a refined 40nm process technology and sports a number of power consumption optimizations, it manages to consume less energy in certain use cases than current-generation offerings made using less advanced process technology.

Nvidia’s measurements show that the Tegra 3 “Kal-El” SoC consumes 2-3x less power than competing solutions when it is constrained to the same level of performance, with each processor completing roughly 5000 of Coremark “work”. Even when Kal-El runs at a higher frequency, completing more than 2x the amount of Coremark “work”, it still consumes less power than dual-core solutions. Naturally, in real-world cases applications will attempt to get all the performance the new SoC has to offer and will thus increase power consumption.
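One plausible reason a quad-core chip can use less power at equal throughput is the standard CMOS dynamic power relation, P ≈ C·V²·f per core: four cores at a lower clock can often run at a lower voltage than two cores at a higher clock. The sketch below illustrates that arithmetic only. The voltages, frequencies and capacitance are hypothetical placeholders, perfect scaling across cores is assumed, and nothing here is taken from Nvidia's measurements.

```python
def dynamic_power(cores: int, volts: float, freq_ghz: float, capacitance: float = 1.0) -> float:
    """Total dynamic power in arbitrary units: cores * C * V^2 * f."""
    return cores * capacitance * volts ** 2 * freq_ghz


# Hypothetical operating points delivering the same aggregate cores * GHz.
dual_power = dynamic_power(cores=2, volts=1.1, freq_ghz=1.0)
quad_power = dynamic_power(cores=4, volts=0.8, freq_ghz=0.5)

print(f"dual-core: {dual_power:.2f}, quad-core: {quad_power:.2f}, "
      f"ratio: {dual_power / quad_power:.2f}x")
```

Because voltage enters squared, even a modest voltage reduction at the lower clock outweighs the cost of powering twice as many cores in this simplified model. Leakage power and the low-power companion core are ignored here.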

Nvidia’s performance benchmarks of Tegra 3 “Kal-El” clearly show that the next-generation system-on-chip will be able to provide performance gains of 60% or more when compared to current-generation SoCs for smartphones. In a number of cases, Tegra 3 “Kal-El” will also consume less power than Nvidia’s Tegra 2, which means better battery life for devices. What should be noted is that Tegra 3 "Kal-El" will not compete directly against the existing Qualcomm Snapdragon QC8660 or OMAP4, but against next-gen SoCs that will also sport four ARM cores, next-gen graphics and other advantages.

Nvidia has also yet to reveal more details about the memory bandwidth of Tegra 3 "Kal-El". Based on the Linpack score, there may be some constraints. Four ARM Cortex-A9 cores and a ULP GeForce with twelve stream processors will naturally require a lot of bandwidth, clearly more than Tegra 2's memory sub-system can provide (up to 5.3GB/s, LPDDR2 667MHz). In fact, just as with traditional PCs, small form-factor systems may soon feel a critical need for a fast memory sub-system, and Tegra 3 will be one of the first SoCs that may show this necessity.
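The quoted bandwidth figure follows from simple arithmetic: peak bandwidth is the number of transfers per second multiplied by the bus width in bytes. The sketch below assumes a 32-bit memory interface, which the article does not state, and shows how the result depends on whether "667MHz" is read as the transfer rate or as the clock of a double-data-rate interface (two transfers per clock cycle).

```python
def peak_bandwidth_gbs(transfers_per_second: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_second * (bus_width_bits / 8) / 1e9


BUS_WIDTH_BITS = 32  # assumption, not given in the article

# Reading 1: 667 million transfers per second.
single_rate = peak_bandwidth_gbs(667e6, BUS_WIDTH_BITS)

# Reading 2: 667 MHz clock with two transfers per cycle (double data rate).
double_rate = peak_bandwidth_gbs(2 * 667e6, BUS_WIDTH_BITS)

print(f"667 MT/s: {single_rate:.2f} GB/s, 667 MHz DDR: {double_rate:.2f} GB/s")
```

Under the 32-bit assumption, the article's "up to 5.3GB/s" matches the second reading. Either way, this is a theoretical ceiling shared by the CPU cores and the graphics core, and sustained bandwidth is lower in practice.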

Nvidia itself promises that the first Tegra 3-based products will hit the market this calendar year.

Tags: Nvidia, Kal-El, Kal El, Tegra, 28nm, 40nm

Discussion

Comments currently: 7
Discussion started: 09/21/11 02:56:05 PM
Latest comment: 11/10/11 11:13:11 PM

1. 
Thanks to increased amount of cores, new graphics core and more advanced 28nm manufacturing process technology, Tegra 3 is clearly faster than current-generation offerings.


I think Kal-El is 40 nm
3 0 [Posted by: veleciraptor  | Date: 09/21/11 02:56:05 PM]

 
Thanks, corrected!
0 0 [Posted by: Anton  | Date: 09/22/11 04:26:46 AM]

2. 
Some benchmarks results are interesting. Linpack particularly. Tegra3 provides NEON/VFPv3 while Tegra2 only VFPv3-D16 (half of floating point regs) so I would guess Linpack should go quite higher in case of Tegra3. What it might shows is memory bandwidth limitation on Tegra3...
1 0 [Posted by: kgardas  | Date: 09/22/11 03:14:56 AM]

 
Thanks for pointing that out. It may be a memory bandwidth constraint, or another one. We still don't know a lot about the Tegra 3.
0 0 [Posted by: Anton  | Date: 09/22/11 12:45:36 PM]

3. 
is a caricature 5 cores CPU... i didnt hear by know 5 cores...
0 0 [Posted by: tbaracu  | Date: 09/22/11 07:54:02 AM]

4. 
the fact that Tegra3 now provides NEON SIMD is a very good thing, as x264 ARM code now has some fast cortex NEON SIMD optimization... but as regards above, what version of handbrake did they/you use, and why didn't they/you run a Git pull of the current daily x264 with these latest and greatest NEON ARM SIMD optimization's included and use that faster updated x264 code/app directly instead ?

and come to that, what was the exact video sample and encode setting used so we can compare to other existing FPS throughput.

put simply, core Nvidia and ARM devs would be best served if they helped lend a hand and actually send in patch's to the current x264 encoder GIT to help add even more ARM SIMD speedups in there ASAP, to get FPS parity with other quad CPUs today.
0 0 [Posted by: pip10  | Date: 11/10/11 11:13:11 PM]