It's common for x86 to have more than one valid way of encoding an instruction. For example, most op reg, reg
instructions have a choice of encoding via the op r/m, reg
or the op reg, r/m
opcode.
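As a rough illustration, here is a minimal Python sketch that builds both encodings of add eax, ecx by hand. The helper names (`modrm`, `add_reg_reg`) are made up for this example, and it only covers register-direct 32-bit operands with no prefixes.

```python
# Register numbers as used in the ModRM byte (32-bit general-purpose registers).
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
       "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def modrm(mod, reg, rm):
    """Pack the mod, reg and r/m fields into one ModRM byte."""
    return (mod << 6) | (reg << 3) | rm

def add_reg_reg(dst, src):
    """Return both valid encodings of `add dst, src` (hypothetical helper)."""
    # Opcode 01 is ADD r/m32, r32: the destination sits in the r/m field.
    rm_dst_form = bytes([0x01, modrm(3, REG[src], REG[dst])])
    # Opcode 03 is ADD r32, r/m32: the destination sits in the reg field.
    reg_dst_form = bytes([0x03, modrm(3, REG[dst], REG[src])])
    return rm_dst_form, reg_dst_form

rm_dst_form, reg_dst_form = add_reg_reg("eax", "ecx")
print(rm_dst_form.hex(), reg_dst_form.hex())  # 01c8 03c1
```

Both byte sequences are 2 bytes long and do the same thing, so here the choice is arbitrary; assemblers typically just pick one of them consistently.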
And yes, normally you want an assembler to always pick the shortest encoding for an instruction. NASM even optimizes mov rax, 1
(7 bytes for mov r64, sign_extended_imm32
) into mov eax, 1
(5 bytes) for x86-64, changing the operand-size to use the zero-extension from writing a 32-bit register instead of explicit sign-extension of a 32-bit immediate.
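To make the byte counts concrete, the sketch below lists the candidate encodings for loading a constant into RAX and picks the shortest. `mov_rax_imm_encodings` is a hypothetical helper, not anything NASM exposes, and the 10-byte imm64 form is included only for completeness (it isn't discussed above).

```python
import struct

def mov_rax_imm_encodings(value):
    """Candidate encodings for putting `value` in RAX (hypothetical helper).

    Assumes -2**63 <= value < 2**64.
    """
    candidates = {}
    if 0 <= value <= 0xFFFFFFFF:
        # mov eax, imm32 (B8 id): writing EAX zero-extends into RAX.
        candidates["mov eax, imm32"] = b"\xB8" + struct.pack("<I", value)
    if -2**31 <= value < 2**31:
        # mov rax, sign_extended_imm32 (REX.W C7 /0 id).
        candidates["mov rax, simm32"] = b"\x48\xC7\xC0" + struct.pack("<i", value)
    # mov rax, imm64 (REX.W B8 io) can hold any 64-bit value.
    candidates["mov rax, imm64"] = b"\x48\xB8" + struct.pack(
        "<Q", value & 0xFFFFFFFFFFFFFFFF)
    return candidates

encodings = mov_rax_imm_encodings(1)
shortest = min(encodings, key=lambda name: len(encodings[name]))
print({name: len(code) for name, code in encodings.items()}, shortest)
```

For a negative constant such as -1 the zero-extending form isn't a candidate, so the 7-byte sign-extended form is the shortest of the three.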
Using the sign-extended-imm8 encoding when available is always good
It's equal length for 16-bit, but shorter for 32-bit operand-size, so it simplifies your code to always choose imm8
.
With operand-size of 32-bit, op eax, imm32
is 5 bytes, vs. op r/m32, imm8
still being 3 bytes. (Not counting any prefixes needed to set operand-size or other things; those will be the same for both.)
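The length comparison can be checked with a small sketch, using adc on the accumulator as the op. `adc_accumulator_encodings` is a made-up helper; it ignores prefixes (as in the byte counts above) and takes the immediate as a signed Python integer.

```python
def adc_accumulator_encodings(value, operand_bits):
    """Encodings of `adc ax/eax, value`, not counting prefixes (hypothetical helper)."""
    mask = (1 << operand_bits) - 1
    full_imm = (value & mask).to_bytes(operand_bits // 8, "little")
    # No-ModRM short form: opcode 15 followed by a full-size immediate.
    forms = {"short form": bytes([0x15]) + full_imm}
    if -128 <= value <= 127:
        # imm8 form: opcode 83 /2, ModRM D0 (mod=11, reg=2, rm=accumulator),
        # then a 1-byte immediate that the CPU sign-extends.
        forms["imm8 form"] = bytes([0x83, 0xD0, value & 0xFF])
    return forms

for bits in (16, 32):
    forms = adc_accumulator_encodings(0x33, bits)
    print(bits, {name: len(code) for name, code in forms.items()})
```

With 16-bit operand-size both forms are 3 bytes; with 32-bit operand-size the short form is 5 bytes and the imm8 form is still 3.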
Performance advantages of the imm8 encoding
If an operand-size prefix is required (e.g. in 32-bit mode for adc ax, 0x33
), using the adc ax/eax/rax, imm16/32/32
encoding with an operand-size prefix will create an LCP stall on Intel CPUs. (Length-Changing Prefix means the prefix changes the length of the rest of the instruction.) This doesn't happen for the imm8 encoding because it's still (prefix) + opcode + modrm + imm8 regardless of the operand-size.
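A toy model of what "length-changing" means, assuming 32-bit mode and only the two adc opcodes discussed here (with a register-direct ModRM for opcode 83). The function names are invented for this sketch; it is not a real instruction-length decoder.

```python
def length_after_prefixes(opcode, has_operand_size_prefix):
    """Bytes from the opcode onward, in 32-bit mode (toy model, two opcodes only)."""
    if opcode == 0x15:
        # adc eax, imm32 normally; a 66 prefix turns it into adc ax, imm16.
        imm_bytes = 2 if has_operand_size_prefix else 4
        return 1 + imm_bytes
    if opcode == 0x83:
        # opcode + ModRM + imm8, whatever the operand-size.
        return 1 + 1 + 1
    raise ValueError("opcode not modelled in this sketch")

def is_length_changing(opcode):
    """True if a 66 prefix changes the length of the rest of the instruction."""
    return (length_after_prefixes(opcode, True)
            != length_after_prefixes(opcode, False))

adc_ax_short = bytes([0x66, 0x15, 0x33, 0x00])  # adc ax, 0x33 via opcode 15
adc_ax_imm8 = bytes([0x66, 0x83, 0xD0, 0x33])   # adc ax, 0x33 via opcode 83 /2
print(is_length_changing(0x15), is_length_changing(0x83))  # True False
```

Both encodings of adc ax, 0x33 are 4 bytes in 32-bit mode, so the imm8 form costs nothing in size while avoiding the prefix-dependent length.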
See Agner Fog's microarch.pdf and other performance links in the x86 tag wiki. See also x86 instruction encoding how to choose opcode which is a duplicate of this, except for the fact that adc
is a special case.
In the specific case of adc
/sbb
, there is another advantage to avoiding the ax, imm16
encoding: See Which Intel microarchitecture introduced the ADC reg,0 single-uop special case? On Sandybridge through Haswell, adc ax, 0
is special-cased as a single-uop instruction, instead of the normal 2 uops for an instruction with 3 inputs (ax, flags, immediate).
But this special casing doesn't work for the no-ModRM short form encodings, so the 3-byte adc ax, imm16
still decodes to 2 uops. Only the decoder for the imm8
form checks if the immediate is zero before decoding to a single uop. (And it still doesn't work for adc al, imm8
.)
So always choosing the sign-extended-imm8 whenever possible is optimal for this, too, even in 16-bit mode where no operand-size prefix would be required for adc ax,0
and thus the LCP-stall issue wouldn't happen.
Most assemblers don't provide an override to avoid the no-ModRM short form. When they were designed, there wasn't a performance use-case other than intentionally lengthening instructions to get alignment without adding NOPs before the top of a loop or other branch target: What methods can be used to efficiently extend instruction length on modern x86?
If you're designing a new flavour of asm syntax you might consider allowing more control of the encoding with override keywords. For existing designs, check out NASM's strict
and nosplit
keywords, and GAS's {vex2}
, {vex3}
, {disp32}
and so on "prefixes"