Yesterday we discussed how caches work, what the difference is between L1 and L2, and the various design factors that determine how fast (and how effective) a CPU's cache is. Today, we're going to take one step further and explore the difference between L2 and L3 caches.

At its simplest level, an L3 cache is just a larger, slower version of the L2 cache. Back when most chips were single-core processors, this was generally true. The first L3 caches were actually built on the motherboard itself, connected to the CPU via the back-side bus. When AMD launched its K6-III processor family, many existing K6/K6-2 motherboards could accept a K6-III as well. Typically these boards had 512K-2MB of L2 cache; when a K6-III, with its integrated L2 cache, was inserted, those slower, motherboard-based caches became L3 instead.

By the turn of the century, slapping an additional L3 cache on a chip had become an easy way to improve performance. Intel's first consumer-oriented Pentium 4 "Extreme Edition" was a repurposed Gallatin Xeon with 2MB of L3 on-die. Adding that cache was enough to buy the Pentium 4 EE a 10-20 percent performance boost over the standard Northwood line.

Cache and the multi-core curveball

As multicore processors became more common, L3 cache started appearing more frequently on consumer hardware. These chips, like Intel's Nehalem and AMD's K10 (Barcelona), used L3 as more than just a larger, slower backstop for L2. In addition to this function, the L3 cache is often shared between all of the processors on a single piece of silicon. That's in contrast to the L1 and L2 caches, both of which are typically private and dedicated to the needs of each particular core. (AMD's Bulldozer design is an exception to this: Bulldozer, Piledriver, and Steamroller all share a common L1 instruction cache between the two cores in each module.)
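On Linux you can see this split directly in sysfs. The sketch below is a minimal, Linux-only illustration (the sysfs paths and fields are assumptions about a typical modern x86 system): it walks CPU 0's cache entries and prints which CPUs share each level. On most current Intel and AMD parts, the L1 and L2 entries list only one core's hardware threads, while the L3 entry lists every core on the die.

```c
/* Minimal sketch: print each cache level visible to CPU 0 and which CPUs
 * share it, using the Linux sysfs cache topology files. */
#include <stdio.h>

int main(void) {
    char path[256], level[16], type[16], size[16], shared[256];

    for (int idx = 0; ; idx++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/level", idx);
        FILE *f = fopen(path, "r");
        if (!f) break;                    /* no more cache indices */
        fscanf(f, "%15s", level);
        fclose(f);

        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/type", idx);
        f = fopen(path, "r");
        fscanf(f, "%15s", type);
        fclose(f);

        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/size", idx);
        f = fopen(path, "r");
        fscanf(f, "%15s", size);
        fclose(f);

        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu0/cache/index%d/shared_cpu_list", idx);
        f = fopen(path, "r");
        fscanf(f, "%255s", shared);
        fclose(f);

        printf("L%s %-12s %-8s shared by CPUs %s\n", level, type, size, shared);
    }
    return 0;
}
```

On an eight-core part with SMT you would typically see the L1 and L2 entries shared by a single pair of hardware threads, and the L3 entry shared by all sixteen.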

Intel's Haswell-E, for example, has eight separate cores that all back up to a common L3 cache.

Haswell-E

Private L1/L2 caches and a shared L3 is hardly the only way to design a cache hierarchy, but it's a common approach that multiple vendors have adopted. Giving each individual core a dedicated L1 and L2 cuts access latencies and reduces the chance of cache contention, meaning two different cores won't overwrite critical data the other placed in a given location in favor of their own workload. The common L3 cache is slower but much larger, which means it can store data for all of the cores at once. Sophisticated algorithms are used to ensure that Core 0 tends to store information closest to itself, while Core 7 across the die also puts needed data closer to itself.
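That size/speed tradeoff is easy to observe from software. The rough C sketch below is not a rigorous benchmark, and the 64-byte line size and the "typical" 32KB/256KB/8MB level sizes in the comments are assumptions about a common desktop part: it chases pointers through randomly linked buffers of growing size, so the average access time steps up as the working set spills out of the private L1 and L2 into the shared L3, and again when it falls out to DRAM.

```c
/* Rough pointer-chase latency sketch: larger working sets fall out of the
 * private L1/L2 into the shared L3, then into DRAM, and ns/access rises. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define LINE 64                      /* assumed cache-line size in bytes */

typedef struct { size_t next; char pad[LINE - sizeof(size_t)]; } node;

static double chase(size_t bytes, size_t accesses) {
    size_t n = bytes / LINE;
    node *buf = malloc(n * sizeof *buf);

    /* Link the nodes into one random cycle (Sattolo's algorithm) so the
     * hardware prefetcher can't predict the next line. */
    for (size_t i = 0; i < n; i++) buf[i].next = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = rand() % i;
        size_t t = buf[i].next; buf[i].next = buf[j].next; buf[j].next = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t i = 0; i < accesses; i++) p = buf[p].next;   /* dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile size_t sink = p;        /* keep the chase from being optimized away */
    (void)sink;
    free(buf);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / accesses;
}

int main(void) {
    /* Working sets chosen to straddle typical L1 (32K), L2 (256K), L3 (8MB). */
    size_t sizes[] = {16u << 10, 128u << 10, 1u << 20, 4u << 20, 32u << 20};
    for (size_t i = 0; i < sizeof sizes / sizeof *sizes; i++)
        printf("%6zu KB working set: %5.1f ns per access\n",
               sizes[i] >> 10, chase(sizes[i], 5u * 1000 * 1000));
    return 0;
}
```

Compiled with optimizations (e.g. `cc -O2`), the jump between the middle sizes and the largest one is where the shared L3, and then main memory, takes over.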

Unlike the L1 and L2, which are nearly always CPU-focused and private, the L3 can also be shared with other devices or functions. Intel's Sandy Bridge CPUs shared an 8MB L3 cache with the on-die graphics core (Ivy Bridge gave the GPU its own dedicated slice of L3 cache in lieu of sharing the entire 8MB).

In contrast to the L1 and L2 caches, both of which are typically fixed and vary only very slightly (and mostly for budget reasons), both AMD and Intel offer different chips with significantly different amounts of L3. Intel typically sells at least a few Xeons with lower core counts, higher frequencies, and a higher L3-cache-per-CPU ratio. Intel's Core i7 processors have maintained an 8MB L3 since the debut of Nehalem in 2008 (roughly 2MB of L3 for every CPU core), but the highest-end parts are typically pegged at 2.5MB of cache per CPU core.
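If you want to see what a particular SKU actually shipped with, glibc on Linux exposes the detected cache sizes through sysconf(). This small sketch is glibc-specific (the _SC_LEVEL*_CACHE_SIZE constants are a glibc extension, not portable POSIX), so the same binary reports 8MB of L3 on a quad-core Core i7 and a larger figure on a big Xeon.

```c
/* glibc/Linux-specific sketch: print whatever cache sizes the system reports.
 * sysconf() returns 0 or -1 when a level is absent or unreported. */
#include <stdio.h>
#include <unistd.h>

static void report(const char *name, int sc) {
    long bytes = sysconf(sc);
    if (bytes > 0)
        printf("%-4s %6ld KB\n", name, bytes / 1024);
    else
        printf("%-4s not reported\n", name);
}

int main(void) {
    report("L1d", _SC_LEVEL1_DCACHE_SIZE);
    report("L2",  _SC_LEVEL2_CACHE_SIZE);
    report("L3",  _SC_LEVEL3_CACHE_SIZE);
    report("L4",  _SC_LEVEL4_CACHE_SIZE);
    return 0;
}
```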

Today, the L3 is characterized as a pool of fast memory common to all of the CPUs on an SoC. It's often gated independently from the rest of the CPU core and can be dynamically partitioned to balance access speed, power consumption, and storage capacity. While not nearly as fast as L1 or L2, it's often more flexible and plays a vital role in managing inter-core communication. With Intel having already added L4 to its Skylake chips, it's possible we'll see the L3 take on a more simplified role, with some of its functions and capabilities shifting over to the newer, larger pool of cache.
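That inter-core role is visible from software, too. The pthreads sketch below is a simplified illustration (it assumes a 64-byte line size and at least two free cores): two threads hammer counters that either share one cache line or sit a line apart. In the shared case, the line ping-pongs between the cores' private caches through the coherence machinery behind the shared L3, and the run typically takes several times longer.

```c
/* False-sharing sketch: the same work is much slower when two cores keep
 * fighting over one cache line than when their counters sit on separate lines. */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL
#define LINE  64                                  /* assumed cache-line size */

/* Two counters in the same line vs. two counters padded a full line apart. */
static _Alignas(LINE) struct { volatile unsigned long a, b; } packed;
static struct { volatile unsigned long a; char pad[LINE]; volatile unsigned long b; } padded;

static void *bump(void *arg) {
    volatile unsigned long *ctr = arg;
    for (unsigned long i = 0; i < ITERS; i++) (*ctr)++;
    return NULL;
}

static void run(const char *label, volatile unsigned long *x, volatile unsigned long *y) {
    struct timespec t0, t1;
    pthread_t ta, tb;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&ta, NULL, bump, (void *)x);
    pthread_create(&tb, NULL, bump, (void *)y);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("%s: %.2f s\n", label,
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
}

int main(void) {
    run("counters sharing one cache line ", &packed.a, &packed.b);
    run("counters on separate cache lines", &padded.a, &padded.b);
    return 0;
}
```

Build with `cc -O2 -pthread`; the padded run finishes noticeably faster because each core can keep its own line resident instead of trading it back and forth.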