Compilers often favour if-conversion to cmov when both sides of the branch are short, especially with a ternary where you always assign a C variable. For example, `if(x) y=bar;` sometimes doesn't optimize to CMOV, but `y = x ? bar : y;` does use CMOV more often. This matters especially when `y` is an array entry that otherwise wouldn't be touched: introducing a non-atomic RMW of it could create a data race not present in the source. (Compilers can't invent writes to possibly-shared objects.)
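As a minimal sketch of the two source forms (the function names `store_if` and `store_select` are made up for illustration), the first writes the array entry only on one path, while the second reads and writes it unconditionally in the source. Whether a given compiler version actually emits cmov for either one depends on the target and options.

```c
#include <stddef.h>

/* Conditional store: arr[i] is written only when x is non-zero.
   Turning this into load / select / unconditional store would invent
   a write to arr[i] that the source never performs. */
void store_if(int *arr, size_t i, int x, int bar)
{
    if (x)
        arr[i] = bar;
}

/* Unconditional store: the source itself always reads and always
   writes arr[i], so selecting the stored value with a cmov adds no
   memory access that wasn't already there. */
void store_select(int *arr, size_t i, int x, int bar)
{
    arr[i] = x ? bar : arr[i];
}
```

Both functions leave the array in the same state in single-threaded code; they differ only in which memory accesses the source performs.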
The obvious example of when if-conversion would be legal but clearly not profitable is when there's a lot of work on both sides of an if/else, e.g. some multiplies and divides, a whole loop, and/or table lookups. Even if gcc can prove that it's safe to run both sides and select one result at the end, it would see that doing that much more work isn't worth avoiding a branch.
If-conversion to a data dependency (branchless cmov) is only even possible in limited circumstances. For example, "Why is gcc allowed to speculatively load from a struct?" shows a case where it can and a case where it can't be done. Other cases where it's not possible include a memory access that the C abstract machine doesn't do and which the compiler can't prove won't fault, or a non-inline function call that might have side effects.
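A hedged sketch of the distinction, using a hypothetical `struct node` and made-up function names: in the first function some member of `*p` is read on either path, so the whole object must be valid and a compiler is allowed to load both members and select one. In the second, the load only happens in the abstract machine when `p` is non-null, so the compiler can't simply hoist it above the check.

```c
#include <stddef.h>

struct node {   /* hypothetical type, for illustration only */
    int a;
    int b;
};

/* Either path dereferences p, so p must point to a valid struct node.
   Loading both members and picking one with cmov can't fault. */
int pick_member(const struct node *p, int use_b)
{
    return use_b ? p->b : p->a;
}

/* The abstract machine only reads *p when p is non-null.
   An unconditional load could fault, so the load has to stay
   guarded by the null check. */
int load_or_default(const int *p, int def)
{
    return p ? *p : def;
}
```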
See also these questions about getting gcc to use CMOV.
See also "Disabling predication in gcc/g++": apparently `gcc -fno-if-conversion -fno-if-conversion2` will disable use of cmov.
For a case where cmov hurts performance, see "gcc optimization flag -O3 makes code slower than -O2": GCC `-O3` needs profile-guided optimization to get it right and use a branch for an `if` that turns out to be highly predictable. GCC `-O2` didn't do if-conversion in the first place, even without PGO profiling data.
An example the other way: "Is there a good reason why GCC would generate jump to jump just over one cheap instruction?"
"GCC seemingly misses simple optimization" shows a case where a ternary has side effects in both halves. A ternary isn't like CMOV: only one side is even evaluated for side effects.
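To illustrate that last point, here is a small sketch with made-up helper functions whose only side effect is bumping a counter. The C ternary evaluates exactly one of its second and third operands, so only one counter changes per call. A compiler can't turn this into "call both, then cmov" unless it can prove the unselected side effect is unobservable.

```c
/* Counters record how often each helper actually ran. */
int calls_a;
int calls_b;

int side_a(void) { ++calls_a; return 1; }
int side_b(void) { ++calls_b; return 2; }

/* Only the selected operand is evaluated, so only one of the two
   counters is incremented on each call. */
int choose(int cond)
{
    return cond ? side_a() : side_b();
}
```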
"AVX-512 and Branching" shows a Fortran example where GCC needs help from source changes to be able to use branchless SIMD (the equivalent of scalar CMOV). This is a case of not inventing writes: it can't turn a read/branch into read/maybe-modify/write for elements that the source wouldn't have written. If-conversion is usually necessary for auto-vectorization.
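The same idea in a C loop, as a sketch rather than a translation of the Fortran from that question (function names are invented here). The first loop stores only to some elements. The second stores to every element in the source, so a blend of old and new values followed by a plain vector store doesn't invent any writes. Whether a particular compiler vectorizes either loop depends on its version, the target ISA (e.g. availability of masked stores), and options.

```c
#include <stddef.h>

/* Writes dst[i] only where a[i] is positive. Without masked stores,
   vectorizing this would need a read-modify-write of elements the
   source never writes. */
void copy_positive_branchy(float *dst, const float *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] > 0.0f)
            dst[i] = a[i];
}

/* Writes every dst[i] in the source, keeping the old value where
   a[i] is not positive. Select-then-store maps onto a vector blend. */
void copy_positive_select(float *dst, const float *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (a[i] > 0.0f) ? a[i] : dst[i];
}
```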