volatile only forces your code to re-read the value; it cannot control where the value is read from. If the value was recently read by your code, it will probably be in cache, in which case volatile will force it to be re-read from cache, NOT from memory.
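To illustrate (a minimal sketch; the names ready and spin_wait are made up for this example): without volatile the compiler may hoist the load out of the loop entirely, while with it every iteration performs a fresh load, which on x86 will typically hit in L1 cache.

```c++
// Sketch only: 'ready' and 'spin_wait' are illustrative names.
volatile int ready = 0;   // another thread eventually sets this to 1

void spin_wait()
{
    // Without volatile, the compiler could load 'ready' once and spin on a
    // register copy forever.  With volatile, each iteration re-reads the
    // value -- usually from L1 cache, which the coherency protocol keeps
    // up to date, not from DRAM.
    while (ready == 0) {
    }
}
```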
There are not a lot of cache-coherency instructions in x86. There are prefetch instructions like prefetchnta, but that doesn't affect the memory-ordering semantics. It used to be implemented by bringing the value into L1 cache without polluting L2, but things are more complicated on modern Intel designs with a large shared inclusive L3 cache.
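(For reference, that hint is exposed to C and C++ through the _mm_prefetch intrinsic; a sketch assuming an x86 target and the xmmintrin.h header:)

```c++
#include <xmmintrin.h>   // _mm_prefetch, _MM_HINT_NTA

void prefetch_hint(const int* p)
{
    // Request a non-temporal prefetch (prefetchnta) of the line holding *p.
    // This is purely a performance hint: it does not order memory operations
    // and does not make stores from other cores visible any sooner.
    _mm_prefetch(reinterpret_cast<const char*>(p), _MM_HINT_NTA);
}
```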
x86 CPUs use a variation on the MESI protocol (MESIF for Intel, MOESI for AMD) to keep their caches coherent with each other (including the private L1 caches of different cores). A core that wants to write a cache line has to force other cores to invalidate their copy of it before it can change its own copy from Shared to Modified state.
You don't need any fence instructions (like MFENCE) to produce data in one thread and consume it in another on x86, because x86 loads/stores have acquire/release semantics built in. You do need MFENCE (a full barrier) to get sequential consistency. (A previous version of this answer suggested that clflush was needed, which is incorrect.)
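To make that concrete (a sketch with an illustrative variable x, not something from the original answer): with std::atomic, the default seq_cst store is where compilers emit the full barrier on x86, while a release store is just a plain mov.

```c++
#include <atomic>

std::atomic<int> x{0};

void store_seq_cst()
{
    // Default is memory_order_seq_cst: on x86 compilers typically emit
    // an xchg (or mov + mfence) to get the full barrier.
    x.store(1);
}

void store_release()
{
    // Release ordering: an ordinary mov on x86, no fence instruction.
    x.store(1, std::memory_order_release);
}
```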
You do need to prevent compile-time reordering, because C++'s memory model is weakly ordered. volatile is an old, bad way to do this; C++11 std::atomic is a much better way to write lock-free code.
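For example, a minimal release/acquire producer/consumer sketch (illustrative names, one possible pattern among many): the release store and acquire load compile to plain mov on x86, and the pairing guarantees the consumer sees the payload.

```c++
#include <atomic>

int payload = 0;                  // ordinary non-atomic data
std::atomic<bool> ready{false};   // flag used to publish it

void producer()
{
    payload = 42;                                  // plain store
    ready.store(true, std::memory_order_release);  // publish: plain mov on x86
}

void consumer()
{
    while (!ready.load(std::memory_order_acquire)) // plain mov on x86
        ;                                          // spin until published
    // The acquire load pairs with the release store, so payload == 42
    // is guaranteed to be visible here -- no volatile, no MFENCE needed.
    int v = payload;
    (void)v;
}
```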