c - How to write or read memory without touching cache

Question

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:55:54+0000

The CPU does manage its own caches in hardware, but x86 gives you some ways to influence that management.

To access memory without caching, you could:

Use the x86 non-temporal instructions. They hint to the CPU that you won't be reusing this data soon, so there's no point in keeping it in the cache. In x86 these instructions are usually named movnt* with a suffix for the data type: for example, movnti is a non-temporal store from a general-purpose register to memory, and movntdq stores a full 128-bit XMM register. There is also a streaming load, movntdqa (SSE4.1), which is mainly intended for write-combining memory; on ordinary write-back memory, CPUs may treat it much like a normal load. These streaming accesses pay off most for high-bandwidth streams, where you write full cache lines consecutively. To use them, either write inline assembly or use the intrinsics your compiler provides; most compilers name that family _mm_stream_* (for example _mm_stream_si32, _mm_stream_si128 and _mm_stream_load_si128).
Change the memory type of the specific region to uncacheable. Since you stated you don't want to disable all caching (and rightly so, since that would also affect code, stack, page tables, etc.), you could mark only the region your benchmark's data set resides in as uncacheable, using the MTRRs (memory type range registers). These registers can only be written from kernel mode, so this has to go through the OS or a driver; there are several ways of doing that, and you'll need to read your OS and CPU documentation for the details.
The last option is to load the line normally, which means it does get cached initially, and then force it out of all cache levels using the dedicated clflush instruction (or wbinvd, which writes back and invalidates the entire cache but is a privileged instruction). Make sure to fence these operations properly (for example with mfence) so that you can guarantee they've completed, and of course don't include them in the measured latency.
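A minimal sketch of the first option (non-temporal stores through the _mm_stream_* intrinsics), assuming an x86 compiler with SSE2 intrinsics in <emmintrin.h>. The helper name nt_fill_int is hypothetical, and the destination is assumed to be 16-byte aligned because the vector store requires it. The trailing sfence is there because non-temporal stores are weakly ordered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_set1_epi32, _mm_stream_si128, _mm_stream_si32, _mm_sfence */

/* Hypothetical helper: fill n ints at dst with value using non-temporal
   stores, which hint that the lines should not be kept in the cache.
   dst must be 16-byte aligned because _mm_stream_si128 (movntdq)
   requires an aligned address. */
static void nt_fill_int(int *dst, size_t n, int value)
{
    assert(((uintptr_t)dst & 15u) == 0);

    __m128i v = _mm_set1_epi32(value);
    size_t i = 0;

    /* Main loop: one 16-byte non-temporal store (movntdq) per 4 ints. */
    for (; i + 4 <= n; i += 4)
        _mm_stream_si128((__m128i *)(dst + i), v);

    /* Tail: scalar non-temporal store (movnti) for the remaining ints. */
    for (; i < n; i++)
        _mm_stream_si32(dst + i, value);

    /* Non-temporal stores are weakly ordered; sfence makes them globally
       visible before any store that follows it. */
    _mm_sfence();
}
```

Usage would be something like `_Alignas(16) int buf[10]; nt_fill_int(buf, 10, 7);`. Writing whole 64-byte lines per call is where these stores help most.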
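The clflush option can be sketched as follows, assuming GCC or Clang on x86-64 (__rdtsc comes from <x86intrin.h>; MSVC uses <intrin.h> instead). The function name time_flushed_load is hypothetical. The flush and its fence sit outside the timed region, and lfence is used around rdtsc so the load is neither started early nor left unfinished when the second timestamp is taken. Note that rdtsc counts reference ticks, not core clock cycles.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence, _mm_lfence */
#include <x86intrin.h>  /* __rdtsc (GCC/Clang) */

/* Hypothetical helper: flush the cache line holding *p out of every
   cache level, then time a single load of it. Returns elapsed TSC
   ticks; the loaded value is written to *out. */
static uint64_t time_flushed_load(const volatile int *p, int *out)
{
    assert(p && out);

    _mm_clflush((const void *)p);  /* evict the line (written back if dirty) */
    _mm_mfence();                  /* clflush is ordered by mfence: wait for it */

    _mm_lfence();                  /* earlier work must finish before timing */
    uint64_t t0 = __rdtsc();
    _mm_lfence();                  /* don't start the load before t0 is read */
    int v = *p;                    /* volatile load, expected to miss the caches */
    _mm_lfence();                  /* wait until the load has completed */
    uint64_t t1 = __rdtsc();

    *out = v;
    return t1 - t0;
}
```

A single measurement like this is noisy; in practice you would repeat it many times and look at the distribution.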

Having said that, if you want to do all this just to time your memory reads, you may get misleading results, since most CPUs handle non-temporal and uncacheable accesses differently from (and less efficiently than) normal cacheable ones. If you just want to force reads to come from memory, the best way is to exploit the caches' LRU replacement: sequentially access a data set that's large enough not to fit in any cache level. Most LRU schemes (not all!) will then drop the oldest lines first, so the next time you wrap around, they'll have to come from memory.
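The sweep part of this approach might look like the sketch below. It assumes a 64-byte cache line (typical on x86, but check your CPU) and leaves the buffer size to the caller, who should pick something several times the last-level cache. The name sweep_lines is hypothetical. As the answer says, whether this actually evicts everything depends on the replacement policy, so treat it as best-effort.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache-line size; typical for x86 but not guaranteed. */
#define LINE_BYTES 64u

/* Hypothetical helper: read one byte per cache line across buf[0..bytes).
   With bytes well above the last-level cache size, LRU-like replacement
   should push out lines that were cached before the sweep. The volatile
   qualifier forces every read, and the sum is returned so the result
   is observable. */
static unsigned long sweep_lines(const volatile unsigned char *buf, size_t bytes)
{
    assert(buf || bytes == 0);

    unsigned long sum = 0;
    for (size_t i = 0; i < bytes; i += LINE_BYTES)
        sum += buf[i];
    return sum;
}
```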

Note that for this to work, you need to make sure the hardware prefetcher doesn't help (and accidentally hide the latency you want to measure): either disable it, or make the accesses stride far enough apart that it becomes ineffective.
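One way to combine the large data set with a large stride is sketched below. Two things here are assumptions rather than part of the answer: the stride is left to the caller (a 4 KiB stride is a common choice because many x86 prefetchers typically don't cross page boundaries), and the walk is a pointer chase, where each load's address comes from the previous load, so out-of-order execution can't overlap the misses. The names build_strided_chain and chase_ticks_per_load are hypothetical, and GCC or Clang on x86-64 is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* _mm_lfence */
#include <x86intrin.h>  /* __rdtsc (GCC/Clang) */

/* Hypothetical helper: link every stride-th slot of buf (len slots) into
   a cyclic chain. Slot k*stride holds the index of slot (k+1)*stride,
   and the last element points back to slot 0. */
static void build_strided_chain(size_t *buf, size_t len, size_t stride)
{
    assert(stride > 0);
    size_t n = len / stride;          /* number of elements in the chain */
    assert(n > 0);
    for (size_t k = 0; k < n; k++)
        buf[k * stride] = ((k + 1) % n) * stride;
}

/* Hypothetical helper: follow the chain for `steps` dependent loads and
   return the average TSC ticks per load; the final index goes to *end. */
static double chase_ticks_per_load(const size_t *buf, size_t steps, size_t *end)
{
    const volatile size_t *chain = buf;
    size_t idx = 0;
    assert(steps > 0);

    _mm_lfence();
    uint64_t t0 = __rdtsc();
    for (size_t s = 0; s < steps; s++)
        idx = chain[idx];             /* next address depends on this load */
    _mm_lfence();
    uint64_t t1 = __rdtsc();

    *end = idx;
    return (double)(t1 - t0) / (double)steps;
}
```

For a real measurement, size the buffer to several times the last-level cache and walk the whole chain more than once, so each wrap-around has to come from memory. With a 4 KiB stride every load also touches a new page, so TLB misses are likely included in the figure.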
