
What is CPU cache memory in computer architecture?

A cache is a small, high-speed memory. It is a temporary storage area that sits between the processor and the main memory (RAM) of a computer so that data can be retrieved faster. It stores copies of frequently used data and instructions that were previously fetched from main memory. A cached copy need not be synchronized with the main memory content on every access; depending on the write policy, a modified value may be written back to main memory later.

Memory Hierarchy

The memory hierarchy is an arrangement of the various types of memory in order of response time. The following diagram describes the memory hierarchy. As we move up the hierarchy, access speed increases, but so does the cost. For example, CPU registers (Level 1) are the fastest storage but also the most expensive, so a processor can have only a small number of them. Secondary memory (Level 4) is the cheapest storage, but it is very slow, so we cannot rely on it alone either.

Hence we have to combine these memories in such a way that we achieve good execution speed at a lower cost.

[Figure: Memory hierarchy]

Levels of Memory in detail –

  1. Registers (Level 1) – Registers are small storage locations inside the CPU itself. They are the fastest storage in the hierarchy (access usually takes about 1 clock cycle). Registers are generally described by the number of bits they can hold, for example an “8-bit register”, a “32-bit register” or a “64-bit register”.
  2. Cache (Level 2) – The cache is the next-fastest storage after registers. Its short access time reduces the average cost of accessing data in main memory. It stores copies of frequently accessed data from main memory. Most modern CPUs have multiple levels of cache.
  3. Main Memory (Level 3) – This is also known as RAM. It is volatile memory, so its contents are lost once the power is gone.
  4. Secondary Memory (Level 4) – This is storage with large capacity (up to terabytes). Data is stored permanently, but access is slow compared to all other levels. Examples: hard drives, flash drives, etc.

CPU Cache Performance

Whenever the processor wants to read from or write to a memory location, it first checks whether the data block is present in the cache, because reading from and writing to the cache is much faster than accessing main memory.

If the processor finds the required data block in the cache, this is known as a cache hit. If the required data block is not present in the cache and the processor has to access main memory to read or write it, which increases the latency, this is known as a cache miss. On a cache miss, the CPU reads from or writes to main memory and also adds an entry for the block to the cache, so that subsequent reads and writes are faster.
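The hit and miss behaviour described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the `SimpleCache` class, its field names, and the use of a dictionary as "RAM" are hypothetical, the cache has no capacity limit, and only reads are modelled.

```python
class SimpleCache:
    """Toy cache with no capacity limit (illustration only, not real hardware)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict: address -> value, stands in for RAM
        self.lines = {}                 # cached copies: address -> value
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:       # cache hit: serve the copy from the cache
            self.hits += 1
            return self.lines[address]
        self.misses += 1                # cache miss: go to main memory
        value = self.main_memory[address]
        self.lines[address] = value     # add the entry so later reads are hits
        return value


ram = {0x10: "a", 0x20: "b"}
cache = SimpleCache(ram)
values = [cache.read(0x10), cache.read(0x10), cache.read(0x20)]
print(values, cache.hits, cache.misses)  # ['a', 'a', 'b'] 1 2
```

The first read of each address is a miss and fills the cache; the repeated read of `0x10` is then served as a hit.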

The cache performance is measured in terms of Hit Ratio –

Hit Ratio = (cache hits) / (cache hits + cache misses)
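As a small worked example of the formula, the sketch below computes the hit ratio from two counters. The function name `hit_ratio` and the sample counts (90 hits, 10 misses) are made up for illustration.

```python
def hit_ratio(hits, misses):
    """Return cache hits / (cache hits + cache misses)."""
    total = hits + misses
    if total == 0:
        raise ValueError("no accesses recorded")
    return hits / total


# Hypothetical counts: 90 of 100 accesses were served from the cache.
ratio = hit_ratio(hits=90, misses=10)
print(ratio)  # 0.9
```

A ratio closer to 1 means that most accesses were served from the cache rather than from main memory.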

Multi-level CPU caches

Cache memories are fast but very costly. To trade off cost against speed (or latency), we use multilevel caches between the main memory and the processor. In a multilevel caching architecture, as we move from the level closest to the processor (L1) to the level farthest from it (L3), the latency (time to read or write) and the storage capacity increase, while the cost decreases.

  1. L1 Cache – The L1 cache, also known as the primary cache, is the fastest cache but also the smallest (typically tens of kilobytes per core, for example 32KB-64KB). Whenever the processor looks for an instruction, it searches the L1 cache first. It is usually embedded in the processor chip.
  2. L2 Cache – The L2 cache is slower than the L1 cache but larger (generally 256KB-8MB) and cheaper. If an instruction is not present in the L1 cache, the processor searches for it in the L2 cache.
  3. L3 Cache – The L3 cache is the slowest of the caches, but it is also the cheapest and has the most capacity (generally 4MB-50MB). In a multi-core processor, each core typically has dedicated L1 and L2 caches, while the L3 cache is shared.

Data flows from RAM first to the L3 cache, then to the L2 cache, and finally to the L1 cache. When the processor looks for data, it searches the L1 cache first, then the L2 cache, and then the L3 cache. If the data is found at any level of the cache, it is a cache hit; if it is not available at any level and the processor has to fetch it from main memory, it is a cache miss.
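The search order described above can be sketched as follows. This is a simplified model, not how any particular processor works: each level is a plain dictionary, the `lookup` function is hypothetical, nothing is ever evicted, and the sketch assumes that a value is copied into every faster level once it has been found.

```python
def lookup(address, l1, l2, l3, ram):
    """Search L1, then L2, then L3; fall back to RAM if every level misses."""
    levels = [("L1", l1), ("L2", l2), ("L3", l3)]
    for i, (name, cache) in enumerate(levels):
        if address in cache:
            value = cache[address]
            # Assumption: copy the value into the faster levels that missed.
            for _, faster in levels[:i]:
                faster[address] = value
            return value, name
    value = ram[address]
    # Missed at every level: fill L3, L2 and L1 with the fetched value.
    for _, cache in levels:
        cache[address] = value
    return value, "RAM"


ram = {100: "x", 200: "y"}
l1, l2, l3 = {}, {}, {200: "y"}
first = lookup(100, l1, l2, l3, ram)   # missed everywhere, served from RAM
second = lookup(100, l1, l2, l3, ram)  # now present in L1
third = lookup(200, l1, l2, l3, ram)   # found in L3, copied into L1 and L2
print(first, second, third)  # ('x', 'RAM') ('x', 'L1') ('y', 'L3')
```

The second element of each result names the level that served the request, which makes it easy to see where each access was satisfied.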

Further Reading

  1. Locality of Reference – Locality of reference helps in deciding which data blocks should be placed in the CPU cache. It is generally of two types –
    • Temporal Locality – If a particular memory location is referenced at one point, it is likely that the same location will be referenced again in the near future.
    • Spatial Locality – If a particular memory location is referenced, it is likely that nearby memory locations will be referenced in the near future.
  2. Web Cache – A web cache (a.k.a. HTTP cache) is temporary storage for frequently accessed static data such as HTML, CSS, images, etc., used to reduce latency and server load.
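To see why the two kinds of locality of reference matter for a CPU cache, the sketch below replays three access patterns through a tiny simulated cache and compares the hit ratios. Everything here is an assumption made for illustration: the `simulate` function, the block size of 8 addresses, the capacity of 4 blocks, and the least-recently-used eviction rule are not taken from any specific processor.

```python
from collections import OrderedDict


def simulate(addresses, block_size=8, num_blocks=4):
    """Return the hit ratio of a small LRU cache that stores whole blocks."""
    cache = OrderedDict()  # block number -> True, oldest use first
    hits = 0
    total = 0
    for address in addresses:
        total += 1
        block = address // block_size      # neighbouring addresses share a block
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= num_blocks:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits / total


sequential = simulate(range(64))                  # neighbouring addresses: spatial locality
repeated = simulate([0, 1, 2, 3] * 16)            # same addresses again: temporal locality
scattered = simulate([i * 8 for i in range(64)])  # a different block on every access
print(sequential, repeated, scattered)  # 0.875 0.984375 0.0
```

In this model the sequential and repeated patterns are served almost entirely from the cache, while the scattered pattern misses on every access because no block is ever reused.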




