What is CPU Cache? – The Hero of Speed

Central processing units, or CPUs, have had a major influence on how modern computing works. Most of the everyday work a personal computer or mobile phone performs is done by the CPU and, depending on the task, a graphics card, commonly referred to as a GPU. When talking about CPUs, speed has often been attributed to the clock rate, measured today in gigahertz (GHz).

This is a common misconception: the AMD Bulldozer series of CPUs, for example, reached 5 GHz in 2013, yet their performance fell short of competing units despite the high clock speed. The failure was in the design, with two cores sharing a single floating-point unit (FPU) as well as cache. The latter matters more, especially for everyday use.

Manufacturers often advertise an improved or larger cache, and the numbers can seem impressive. Without knowing what a cache actually is, though, those numbers mean little to the average user. Examining cache in detail will help in understanding it.

What is a CPU Cache? – The Basics

Latency-sensitive operations, which is what most CPU work in modern computing amounts to, need fast memory in order to actually feel fast. Clock speed plays an important role in how quickly a CPU performs a requested operation, but the CPU cache plays an even bigger one.

A CPU cache is built from static random-access memory (SRAM) and stores frequently requested data, or data the processor predicts it will need, so that it can be accessed with minimal delay. Which bits of data are kept is decided by hardware policies for replacement and prefetching, which exploit the fact that programs tend to reuse recently accessed code and data.

A good example is your favorite browser. Open it a couple of times during one session and it will load faster each time, because much of its data is already cached. Depending on many factors, however, the data for your browser, or your video game for that matter, will be stored in a different tier, or level, of cache.
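The "keep what was used recently, evict what was not" idea can be sketched in a few lines of Python. This is a hypothetical least-recently-used (LRU) cache for illustration only; real CPU caches implement replacement in hardware and the keys and values below are invented:

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: keeps the most recently used entries,
    evicting the least recently used one when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None                   # cache miss
        self.store.move_to_end(key)       # mark as recently used
        return self.store[key]            # cache hit

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("browser", "code pages")
cache.put("game", "textures")
cache.get("browser")            # hit; "browser" is now most recent
cache.put("editor", "buffers")  # full, so "game" is evicted
```

After the last `put`, looking up `"game"` misses while `"browser"` and `"editor"` still hit, mirroring why a recently opened program loads faster than one you have not touched in a while.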

Caches were developed because CPU clock speeds started rocketing in the 1980s while memory speeds lagged behind, creating a need for faster memory access. SRAM is used for caches because it is faster than DRAM and can be made small enough to sit physically close to the CPU, usually on-die. Compared to main memory, which is dynamic RAM, that speed and proximity mean much less latency.

Latency is a big issue in computing: distance, memory speed and other factors determine how fast an operation will be performed. To reduce latency and improve performance, different levels of cache were created; most processors have Level 1 and Level 2, and larger ones Level 3, also called L1, L2 and L3.
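The payoff of a cache hierarchy can be quantified with the standard average memory access time (AMAT) formula: the time for a hit, plus the penalty of going to the next level weighted by how often you miss. The cycle counts below are illustrative assumptions, not measurements from any specific CPU:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in cycles:
    hit_time + miss_rate * miss_penalty."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical latencies: L2 hit 12 cycles, DRAM 200 cycles,
# L1 hit 4 cycles. Miss rates of 20% (L2) and 10% (L1) are assumed.
l2_amat = amat(hit_time=12, miss_rate=0.20, miss_penalty=200)
l1_amat = amat(hit_time=4, miss_rate=0.10, miss_penalty=l2_amat)
```

With these assumed numbers the average access costs about 9 cycles instead of the 200 a DRAM-only system would pay, which is why even small, fast levels near the core matter so much.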

CPU Cache Levels – The Hierarchy of Data

When the first caches were invented and implemented, in the late 1970s and early 1980s, most processors had only an L1 cache. Even today, this tier is only tens of kilobytes per core, and it is split into two separate halves: L1D for data and L1I for instructions.

The L1 cache is present on all modern processors and is the most important cache for operations, reserved for data that needs the lowest latency. Being the smallest, it is also the fastest. Typically, every CPU core has its own L1 cache; sharing L1 between cores is undesirable and would produce latency issues.

When a CPU finds the necessary data in L1, it is called a cache hit. When it does not, it is called a cache miss, and the CPU has to look for the data elsewhere: in this case, the next tier of cache, which on most processors is L2.
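This hit-or-fall-through sequence can be modeled as a simple loop over the levels. The level contents and addresses below are invented purely for illustration:

```python
# Toy model of a multi-level lookup: probe each cache level in order,
# fall through on a miss, and reach main memory (DRAM) as a last resort.
levels = [
    ("L1", {"pc": 0x10}),
    ("L2", {"pc": 0x10, "sp": 0x20}),
    ("L3", {"pc": 0x10, "sp": 0x20, "heap": 0x30}),
]
memory = {"pc": 0x10, "sp": 0x20, "heap": 0x30, "disk_buf": 0x40}

def lookup(key):
    """Return (where the data was found, its value)."""
    for name, cache in levels:
        if key in cache:
            return name, cache[key]   # cache hit at this level
    return "DRAM", memory[key]        # missed every level

lookup("pc")        # hit in L1, the cheapest case
lookup("sp")        # L1 miss, hit in L2
lookup("disk_buf")  # misses everywhere, served from DRAM
```

Each miss costs extra time, which is why the most frequently needed data is kept as high in the hierarchy as possible.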

L2 cache is sometimes shared between two cores, but more often than not, on desktop processors, each core has its own L2 cache. L2 is larger than L1, typically a few hundred kilobytes to a couple of megabytes per core. Whether L2 duplicates the contents of L1 depends on the design: caches can be inclusive, exclusive or non-inclusive of the levels above them. If data cannot be found in L2, the processor moves down the line, to L3.

L3 is often the largest cache found on a processor and, like the others, it is on-die. This tier of cache, however, is most often shared between cores, simply because it is so large. AMD Zen 2 desktop processors, for example, offer up to a combined 64 MB of L3, organized as 16 MB per four-core CCX, or 32 MB per eight-core chiplet. L3 is used for larger chunks of data, such as the working set of a video game.

Some processors have an L4 cache, but not many. L4 is sometimes implemented with DRAM rather than SRAM, and it has been paired with integrated graphics in some designs. One of the reasons integrated graphics are slow is the physical distance between the GPU and system memory, combined with the lower speed of DRAM.

CPU Cache and Overclocking – Does it Matter?

Some overclockers tinker with the CPU cache ratio and cache voltage, essentially overclocking their cache. The problem is that cache overclocking does little beyond raising temperatures and potentially causing more instability, and higher temperatures require better cooling.

Overclocking the cache is not recommended unless one is competing for a world record. Extreme overclocks tend to be unstable and unsustainable, which can cause crashes and, in fringe cases, data loss due to bit flips and write errors. Any professional machine, or even a gaming rig, would suffer from an unstable overclock.

Conclusion and Summary – Cached Data

Cache is, in essence, very fast memory in which the processor looks for the data it needs. It is split into tiers, most often L1, L2 and L3: the most important data sits in the first level, while larger and less critical data sits in the levels below.

A cache hit is when the processor finds the data it needs, while a cache miss moves the search to the next tier, until either a hit is made or system memory is reached. Cache is almost always on-die, very close to the CPU cores; the rare L4 is the exception, as it can sit off-die and be built from DRAM.

Finally, cache overclocking is not recommended unless the overclocker is an expert who is either chasing a record or up to the challenge of stability testing.

About The Author

XbitLabs Staff

We are a team of enthusiasts striving to provide you with helpful advice on buying tech.
