What is a CPU Cache? – The Hero of Speed
Central processing units, or CPUs, have had a major influence on how modern computing works. Most of the everyday work a personal computer or mobile phone does is handled by the CPU and, depending on the task, a graphics card, commonly referred to as a GPU. CPU speed has often been attributed to the clock rate, measured today in gigahertz (GHz).
This is a common misconception. The AMD Bulldozer family of CPUs, for example, reached 5 GHz in 2013, yet its performance lagged behind competing units despite the high clock speed. The failure was in the design: two cores shared a single floating-point unit (FPU) as well as cache. The latter matters more, especially for everyday use.
Manufacturers often advertise an improved or larger cache, and the numbers might seem impressive. Without knowing what cache actually is, however, those numbers are meaningless to the average user. Examining cache in detail will help in understanding it.
What is a CPU Cache? – The Basics
Latency-sensitive operations, such as the ones a CPU performs constantly, need fast memory in order to actually be perceived as fast. While clock speed plays an important role in how quickly a CPU performs a requested operation, the CPU cache plays an even bigger one.
A CPU cache is made from static random-access memory (SRAM) and is used to store frequently requested data, or data the processor expects to need soon, so that it can be accessed faster. Which bits of data are considered important is decided by algorithms that analyze access patterns, such as replacement and prefetching policies.
A good example is your favorite browser. Opening it a couple of times during one session results in faster loading times. Depending on many factors, however, the data for your browser, or your video game for that matter, will be stored in a different tier, or level, of cache.
Caches were developed because CPU clock speeds started rocketing in the 1980s while memory speeds lagged behind, creating a need for faster memory access. SRAM is used for caches because it is faster than DRAM and small enough to be placed physically close to the CPU, usually on-die. Compared to main memory, which is dynamic RAM, it is both faster and closer to the CPU, meaning much less latency.
Latency is a big issue in computing, where distance, speed and other factors determine how fast an operation will be performed. To reduce latency and improve performance, different levels of cache were created; most processors have Level 1 and Level 2, and larger ones Level 3, also called L1, L2 and L3.
CPU Cache Levels – The Hierarchy of Data
When the first caches were invented and implemented, in the late 1970s and early 1980s, most processors had only an L1 cache. Even today, this tier is only a few dozen kilobytes large per core, and it is split into two separate halves: L1D for data and L1I for instructions.
The L1 cache is present on all processors and is the most important cache for operations, reserved for data that needs the lowest latency. Being the smallest, it is also the fastest. Typically, every CPU core has its own L1 cache; sharing L1 between cores is undesirable and would produce latency issues.
When the CPU finds the necessary data in L1, it is called a cache hit. When it does not, it is a cache miss, and the CPU has to look for the data elsewhere, in this case the next tier of cache. On most processors, that next tier is L2.
L2 cache is sometimes shared between two cores, but more often than not, on desktop processors, each core has its own L2. L2 is larger than L1, with a total size of up to a couple of megabytes. In many designs, L2 does not duplicate the data already held in L1 (an exclusive cache), so the combined capacity is not wasted. If the data cannot be found in L2, the processor moves down the line, to L3.
L3 is usually the largest cache found on a processor and, just like the others, it is on-die. This tier of cache, however, is most often shared between cores, simply because it is so large. AMD Zen 2 desktop processors, for example, offer up to 64 MB of combined L3, organized as 32 MB per chiplet of 6 to 8 cores. L3 is used for larger chunks of data, such as the working set of a video game.
Some processors have an L4 cache, but not many. L4 is sometimes built from DRAM, either embedded on the package or carved out of system memory, which is the case with some integrated graphics. One of the reasons integrated graphics are slow is the physical distance between the GPU and system memory, combined with the lower speed of DRAM.
CPU Cache and Overclocking – Does it Matter?
Some overclockers tinker with the CPU cache ratio and cache voltage, essentially overclocking their cache. The problem is that cache overclocking does little besides raising temperatures and potentially causing more instability, and higher temperatures require better cooling.
Overclocking the cache is not recommended unless one is competing for a world record. Extreme overclocks tend to be unstable and unsustainable, which can cause crashes and, in fringe cases, data loss due to bit flips and write errors. Any professional or gaming rig would suffer from an unstable overclock.
Conclusion and Summary – Cached Data
Cache is, in essence, very fast memory which the processor uses to find necessary data or data chunks quickly. Split into tiers, most often L1, L2 and L3, it keeps the most important data in the first level, with larger and less frequently used data in the levels below.
A cache hit is when the processor finds the data it needs, while a cache miss moves the search to the next tier, until either a hit is made or system memory is reached. Cache is almost always on-die, very close to the CPU cores, the rare exception being L4, which can sit off-die or use system memory.
Finally, cache overclocking is not recommended, unless the overclocker is an expert and either is looking to break a record or is up to the challenge of stability testing.