Prescott: The Last of the Mohicans? (Pentium 4: from Willamette to Prescott) (page 15)


Category: CPU

by Victor Kartunov

[ 05/25/2005 | 11:45 AM ]



Chapter V: Cache Hierarchy

Of course, you may have noticed that we jumped straight to the execution units, skipping quite a number of other components of the CPU architecture. This is definitely not because there is nothing to say about the preceding stages. Quite the contrary: we found some really interesting details there. However, we need to review a few things we already know before we move on to a detailed discussion of the schedulers and micro-instruction queues. Cache access latencies are among them.

Access to each level of the cache hierarchy is characterized by the time it takes to access the cache and by the width of the bus between the levels. The table below sums up these data for the different caches of the CPU:

 

Bus                        Bus bandwidth at 3GHz frequency
L1 D cache* - registers    48GB/sec***
L2 cache - L1 D cache      96GB/sec
L3 cache** - L2 cache      24GB/sec

* - the numbers are given only for the data cache, because the Trace cache has no defined bus bandwidth. We only know its maximum transfer rate of 6 uops per two clock cycles, but since a uop has no fixed size, it is hard to estimate the actual bandwidth.
** - Only the Gallatin core has an L3 cache right now. It is a modification of the 130nm Northwood core with an integrated on-die L3 cache (with ECC support). This core is used in the following CPUs: Pentium 4 Extreme Edition, Xeon MP, and Xeon DP with L3 cache.
*** - the more exact bus bandwidth figures are 44.8GB/s, 89.5GB/s and 22.4GB/s, respectively.
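
The rounded and the "more exact" figures can be reproduced with a little arithmetic. The sketch below assumes bus widths of 16, 32 and 8 bytes per clock; these widths are not given in the table but are inferred by dividing its rounded figures by the 3GHz clock. It also assumes that the "more exact" values count a gigabyte as 2^30 bytes, which gives results within about 0.1GB/s of the ones in the footnote. The function and variable names are made up for illustration.

```python
# Hypothetical helper: peak bandwidth of a bus that moves a fixed
# number of bytes on every clock cycle.
CLOCK_HZ = 3_000_000_000  # 3GHz, as in the table

# Assumed bus widths in bytes per clock (inferred, not stated in the article).
BUS_WIDTH_BYTES = {
    "L1 D cache - registers": 16,
    "L2 cache - L1 D cache": 32,
    "L3 cache - L2 cache": 8,
}


def peak_bandwidth(bytes_per_clock, clock_hz=CLOCK_HZ):
    """Return (decimal GB/s, binary GB/s) for the given bus width."""
    bytes_per_second = bytes_per_clock * clock_hz
    return bytes_per_second / 10**9, bytes_per_second / 2**30


for bus, width in BUS_WIDTH_BYTES.items():
    decimal_gb, binary_gb = peak_bandwidth(width)
    print(f"{bus}: {decimal_gb:.0f}GB/s decimal, {binary_gb:.1f}GB/s binary")
```

Under these assumptions, the gap between the two sets of figures comes only from the choice of unit, not from a different clock or bus width.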

You can easily notice that different cache levels behave differently. In particular, the read speed from the L1 cache into the registers is much lower than the data transfer rate along the bus between the L1 and L2 caches. How did this happen? Why do we need the L1 cache at all then? Does this arrangement make any sense?

The reason is that the task of the L1 cache differs from that of the L2 cache: the L1 cache must find and deliver the needed data quickly, with minimal latency. Note that the read speed from this cache doesn’t exceed 16 bytes/clock (48GB/s at 3GHz). Moreover, one of the main reasons for the L1 cache to exist is that the CPU cannot access the L2 cache directly. To use data held in the L2 cache, the CPU has to create a request, find the data, transfer it to the L1 data cache, etc.

But if the requested data is not in the L1 cache, then it needs to be transferred there as soon as possible.

The following example illustrates why this is necessary. Imagine that the program needs data that is not laid out as an ordered chain with sequential addresses. The pieces of data are more than one cache line apart from one another, or, even worse, scattered all over a fairly large area of memory. In this case we have to transfer an entire 64-byte cache line in order to read a single byte of information(!). In other words, the amount of data transferred from the L2 cache is 64 times larger than what we actually need. So, the higher the data transfer rate from the L2 cache to the L1 data cache, the better. Moreover, the decoder also fetches its data from the L2 cache.
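
The overhead described above can be estimated with a short sketch. It assumes 64-byte lines as the unit of transfer between the L2 and L1 caches, as in the text. The function name and the sample access patterns are made up for illustration, and the model ignores prefetching and any lines that are already in the L1 cache.

```python
LINE_SIZE = 64  # bytes moved from L2 to L1 per transfer, as in the text


def transfer_overhead(addresses, bytes_per_access=1, line_size=LINE_SIZE):
    """Ratio of bytes moved from L2 to L1 to bytes the program uses.

    Simplified model: every distinct line touched is transferred exactly
    once, and each access lies within a single line.
    """
    lines_touched = {addr // line_size for addr in addresses}
    bytes_transferred = len(lines_touched) * line_size
    bytes_used = len(addresses) * bytes_per_access
    return bytes_transferred / bytes_used


sequential = range(0, 1024)             # 1024 consecutive bytes
scattered = range(0, 1024 * 128, 128)   # 1024 bytes, each 128 bytes apart

print(transfer_overhead(sequential))  # 1.0: every transferred byte is used
print(transfer_overhead(scattered))   # 64.0: a whole line per useful byte
```

The scattered pattern is the worst case from the example: each byte the program reads pulls in a full line, so the L2 to L1 bus carries 64 times more data than the program consumes.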
