Intel's optimization manual (September 2019 revision) shows a 48 KiB, 8-way associative L1 data cache for the Ice Lake microarchitecture.
Footnote 1: Software-visible latency/bandwidth will vary depending on access patterns and other factors.
This baffled me because:
- There are 96 sets (48 KiB / 64 bytes per line / 8 ways), which is not a power of two.
- The set-index bits and the byte-offset bits add up to more than 12 bits, which makes the cheap PIPT-as-VIPT trick unavailable for 4 KiB pages.
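
The arithmetic behind both points can be checked with a small sketch. This is only an illustration, assuming 64-byte cache lines and 4 KiB pages; `cache_geometry` and its field names are hypothetical helpers invented here, not anything from Intel's documentation. The 32 KiB, 8-way case is included purely for comparison.

```python
def cache_geometry(size_bytes, line_bytes, ways, page_bytes=4096):
    """Derive set count and address-bit usage for a set-associative cache.

    Assumes line_bytes and page_bytes are powers of two.
    """
    sets = size_bytes // (line_bytes * ways)
    # Bits selecting a byte within a line (exact log2 of a power of two).
    offset_bits = line_bytes.bit_length() - 1
    # Bits needed to name every set: ceil(log2(sets)).
    index_bits = (sets - 1).bit_length()
    # Bits of the address that are identical in virtual and physical form.
    page_offset_bits = page_bytes.bit_length() - 1
    return {
        "sets": sets,
        "sets_power_of_two": (sets & (sets - 1)) == 0,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        # True when the set can be selected from untranslated bits alone.
        "fits_in_page_offset": offset_bits + index_bits <= page_offset_bits,
    }


# Geometry as listed in the manual: 48 KiB, 8-way, 64-byte lines (assumed).
listed = cache_geometry(48 * 1024, 64, 8)
# Comparison case: 32 KiB, 8-way, 64-byte lines.
smaller = cache_geometry(32 * 1024, 64, 8)

print(listed)
print(smaller)
```

With the listed geometry this gives 96 sets and 6 + 7 = 13 offset-plus-index bits, one more than the 12 bits of a 4 KiB page offset. The 32 KiB comparison case gives 64 sets and exactly 12 bits.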
All in all, the cache seems more expensive to handle, yet the latency increased only slightly (if it did at all, depending on what exactly Intel means by that number).
With a bit of creativity, I can still imagine a fast way to index 96 sets, but the second point looks like an important breaking change to me.
What am I missing?