Does anyone know of published benchmarks of the overhead of locking compared with relying only on atomic operations/intrinsics (on a multiprocessor system)?
I’m particularly interested in general conclusions, e.g. something like “regardless of the platform, locking is at least a factor X slower than intrinsics.” (That’s why I can’t just benchmark myself.)
I’m interested in direct comparisons, e.g. how much faster is using

    #pragma omp atomic
    ++x;

instead of

    #pragma omp critical
    ++x;

(assuming that every other update of x is also critical).
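To make the comparison concrete, here is a minimal sketch of how the two variants might be timed on a single machine. The function names (`now_seconds`, `count_with_atomic`, `count_with_critical`, `time_variant`) are hypothetical helpers made up for illustration, not part of OpenMP. Any number it produces holds only for that platform, compiler, and thread count, so it cannot by itself give the general factor asked about.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds via OpenMP; falls back to CPU time if built without it. */
double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Increment a shared counter n times; each update is an atomic operation. */
long count_with_atomic(long n) {
    long x = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        #pragma omp atomic
        ++x;
    }
    return x;
}

/* Same loop, but each update is guarded by a critical section (a lock). */
long count_with_critical(long n) {
    long x = 0;
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        #pragma omp critical
        ++x;
    }
    return x;
}

/* Run one variant, store its final count in *result, return elapsed seconds. */
double time_variant(long (*variant)(long), long n, long *result) {
    double start = now_seconds();
    *result = variant(n);
    return now_seconds() - start;
}
```

Calling `time_variant(count_with_atomic, n, &r)` and `time_variant(count_with_critical, n, &r)` with the same `n`, built with OpenMP enabled (e.g. `-fopenmp` on GCC), gives two timings whose ratio is the platform-specific factor. Without OpenMP the pragmas are ignored and both loops run serially, so the ratio is meaningless in that case.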
Basically, I need this to justify a complex lock-free implementation instead of a straightforward locking one where starvation isn’t an issue. Conventional wisdom is that while locking is simpler, non-locking implementations have tons of advantages. But I’m hard pressed to find reliable data.