x86 assembly 16 bit vs 8 bit immediate operand encoding


1 Answer

深蓝 · Answer 1 · 2021-10-23T21:28:52+0000

It's common for x86 to have more than one valid way of encoding an instruction. For example, most op reg, reg instructions can be encoded with either the op r/m, reg opcode or the op reg, r/m opcode.
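The following is a minimal Python sketch of the two encodings of op reg, reg, using add eax, ecx as the example. The REG table and the modrm / encode_add_reg_reg helpers are hypothetical names chosen for illustration. The opcodes 01 /r (add r/m32, r32) and 03 /r (add r32, r/m32) are the standard ones.

```python
# Register numbers for the ModRM reg and r/m fields (32-bit register names).
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(reg_field: int, rm_field: int) -> int:
    """ModRM byte for register-direct operands: mod=11, then reg, then r/m."""
    return 0xC0 | (reg_field << 3) | rm_field


def encode_add_reg_reg(dst: str, src: str) -> list[bytes]:
    """Return both valid encodings of `add dst, src` for 32-bit registers."""
    # 01 /r = add r/m32, r32: destination in the r/m field, source in reg.
    rm_form = bytes([0x01, modrm(REG[src], REG[dst])])
    # 03 /r = add r32, r/m32: destination in the reg field, source in r/m.
    reg_form = bytes([0x03, modrm(REG[dst], REG[src])])
    return [rm_form, reg_form]


for enc in encode_add_reg_reg("eax", "ecx"):
    print(enc.hex(" "))  # prints "01 c8", then "03 c1"
```

Both byte sequences decode to the same instruction. An assembler only has to pick one of them consistently.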

And yes, normally you want an assembler to always pick the shortest encoding for an instruction. NASM even optimizes mov rax, 1 (7 bytes for mov r64, sign_extended_imm32) into mov eax, 1 (5 bytes) for x86-64. It changes the operand-size so the instruction relies on the implicit zero-extension from writing a 32-bit register, instead of the explicit sign-extension of a 32-bit immediate.
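You can check the byte counts in the NASM example with a small sketch. It builds the candidate encodings for putting an immediate in RAX in 64-bit mode. The function name is hypothetical, and the sketch only handles RAX as a register-direct destination.

```python
import struct

MASK64 = (1 << 64) - 1


def mov_rax_imm_encodings(imm: int) -> dict[str, bytes]:
    """Candidate encodings of `mov rax, imm` in 64-bit mode (RAX = register 0)."""
    out = {}
    if 0 <= imm <= 0xFFFF_FFFF:
        # B8+rd id = mov eax, imm32: writing EAX zero-extends into RAX.
        out["mov eax, imm32"] = bytes([0xB8]) + struct.pack("<I", imm)
    if -(1 << 31) <= imm < (1 << 31):
        # REX.W C7 /0 id = mov r/m64, imm32, sign-extended to 64 bits.
        out["mov r/m64, sign_extended_imm32"] = bytes([0x48, 0xC7, 0xC0]) + struct.pack("<i", imm)
    # REX.W B8+rd io = mov r64, imm64: always available, always 10 bytes.
    out["mov r64, imm64"] = bytes([0x48, 0xB8]) + struct.pack("<Q", imm & MASK64)
    return out


for name, enc in mov_rax_imm_encodings(1).items():
    print(f"{len(enc):2d} bytes: {enc.hex(' ')}  ({name})")
```

For 1, the three candidates are 5, 7 and 10 bytes long. For a negative value such as -1, the 5-byte mov eax form is not offered, because zero-extension would produce the wrong 64-bit value.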

Using the sign-extended-imm8 encoding when available is always good

With 16-bit operand-size, the two encodings are the same length (3 bytes each). With 32-bit operand-size, the imm8 form is shorter. Always choosing imm8 when the immediate fits therefore simplifies your assembler's code.

With 32-bit operand-size, op eax, imm32 is 5 bytes, while op r/m32, imm8 is still 3 bytes. (This doesn't count any prefixes needed to set operand-size or other things; those are the same for both.)
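Here is a sketch of the "always prefer imm8" rule for adc ax/eax, imm. It assumes no operand-size prefix is emitted, meaning 16-bit operand-size in 16-bit mode and 32-bit operand-size in 32-bit mode, and it assumes the immediate is passed as a signed value. The helper names are hypothetical. The encodings are 83 /2 ib and the no-ModRM short form 15 iw/id.

```python
import struct

ADC_MODRM_AX = 0xC0 | (2 << 3) | 0  # mod=11, reg=/2 (ADC), r/m=0 (AX/EAX) -> 0xD0


def adc_acc_imm8_form(imm: int) -> bytes:
    """83 /2 ib: adc r/m16/32, sign-extended imm8. Same bytes for 16 and 32-bit."""
    return bytes([0x83, ADC_MODRM_AX]) + struct.pack("<b", imm)


def adc_acc_short_form(imm: int, opsize: int) -> bytes:
    """15 iw / 15 id: adc ax/eax, imm16/imm32 with no ModRM byte."""
    fmt = "<H" if opsize == 16 else "<I"
    return bytes([0x15]) + struct.pack(fmt, imm & ((1 << opsize) - 1))


def encode_adc_acc(imm: int, opsize: int) -> bytes:
    """Prefer the sign-extended imm8 form whenever the (signed) immediate fits."""
    if -128 <= imm <= 127:
        return adc_acc_imm8_form(imm)
    # Doesn't fit in imm8: the short form is still shorter than 81 /2 iw/id.
    return adc_acc_short_form(imm, opsize)


for size in (16, 32):
    print(size, encode_adc_acc(0x33, size).hex(" "), "vs short form",
          adc_acc_short_form(0x33, size).hex(" "))
```

For 0x33, the imm8 form is 83 d0 33 at both sizes. The short form is 15 33 00 (3 bytes) at 16-bit and 15 33 00 00 00 (5 bytes) at 32-bit, which matches the sizes given above.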

Performance advantages of the imm8 encoding

If an operand-size prefix is required (e.g. in 32-bit mode for adc ax, 0x33), using the adc ax/eax/rax, imm16/32/32 encoding with an operand-size prefix will create an LCP stall on Intel CPUs. (LCP stands for Length-Changing Prefix: the prefix changes the length of the rest of the instruction.) This doesn't happen for the imm8 encoding, because it's still (prefix) + opcode + ModRM + imm8 regardless of the operand-size.
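The LCP definition can be expressed as a length comparison. A 66h prefix is length-changing if toggling the operand-size changes how many bytes follow the prefix. For brevity, this sketch models only the two adc accumulator encodings discussed here; that limitation is an assumption of the sketch, and real hardware handles other opcodes too. The function names are hypothetical.

```python
def rest_length(opcode: int, opsize: int) -> int:
    """Bytes from the opcode onward (prefixes excluded) for the two adc
    accumulator forms; opsize is the effective operand size, 16 or 32."""
    if opcode == 0x83:  # 83 /2 ib: opcode + ModRM + imm8, for any operand size
        return 1 + 1 + 1
    if opcode == 0x15:  # 15 iw/id: opcode + imm16 or imm32
        return 1 + opsize // 8
    raise ValueError("only opcodes 0x83 and 0x15 are modeled in this sketch")


def prefix_66_is_length_changing(opcode: int, default_opsize: int = 32) -> bool:
    """A 66h prefix toggles operand size between 32 and 16 bits. It is
    length-changing if that toggle changes rest_length for this opcode."""
    toggled = 16 if default_opsize == 32 else 32
    return rest_length(opcode, default_opsize) != rest_length(opcode, toggled)


# 32-bit mode: adc ax, 0x33 needs a 66h prefix.
print(prefix_66_is_length_changing(0x15))  # True: 66 15 33 00, imm shrinks 4 -> 2
print(prefix_66_is_length_changing(0x83))  # False: 66 83 d0 33, still imm8
```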

See Agner Fog's microarch.pdf and other performance links in the x86 tag wiki. See also "x86 instruction encoding how to choose opcode", which is a duplicate of this question except that adc is a special case.

In the specific case of adc/sbb, there is another advantage to avoiding the ax, imm16 encoding (see "Which Intel microarchitecture introduced the ADC reg,0 single-uop special case?"). On Sandybridge through Haswell, adc ax, 0 is special-cased as a single-uop instruction, instead of the usual 2 uops for a 3-input operation (ax, flags, immediate).

But this special casing doesn't work for the no-ModRM short-form encodings, so the 3-byte adc ax, imm16 still decodes to 2 uops. Only the decoder for the imm8 form checks whether the immediate is zero before decoding to a single uop. (And it still doesn't work for adc al, imm8.)
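Below is a simplified model of the uop counts claimed above for Sandybridge through Haswell. It encodes only what this answer states: the imm8 form 83 /2 ib with a zero immediate is 1 uop, and the short forms 15 iw/id and 14 ib (adc al, imm8) are 2 uops. Treating every other form as 2 uops is an assumption of this sketch, not data from a uop table. The function name is hypothetical.

```python
def adc_imm_uops_snb_to_haswell(opcode: int, imm: int) -> int:
    """Simplified model of adc-with-immediate uop counts on Sandybridge
    through Haswell, following only the claims in the answer above.
    For opcode 0x83 the caller guarantees the ModRM reg field is /2 (ADC)."""
    # 83 /2 ib (adc r/m16/32/64, imm8): the decoder checks for imm == 0
    # and emits a single uop for adc reg, 0.
    if opcode == 0x83 and imm == 0:
        return 1
    # 15 iw/id (adc ax/eax/rax short form) and 14 ib (adc al, imm8) never get
    # the special case. Other forms are ASSUMED to be 2 uops here as well.
    return 2


# Two encodings of `adc ax, 0` in 16-bit mode: same length, different uop counts.
candidates = {"15 00 00": (0x15, 0), "83 d0 00": (0x83, 0)}
for text, (opcode, imm) in candidates.items():
    print(text, "->", adc_imm_uops_snb_to_haswell(opcode, imm), "uop(s)")
```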

So always choosing the sign-extended-imm8 encoding whenever possible is optimal for this too. That holds even in 16-bit mode, where adc ax, 0 needs no operand-size prefix and the LCP-stall issue therefore doesn't arise.

Most assemblers don't provide an override to avoid the no-ModRM short form. When they were designed, there wasn't a performance use-case for one, other than intentionally lengthening instructions to get alignment without adding NOPs before the top of a loop or other branch target (see "What methods can be used to efficiently extend instruction length on modern x86?").
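Lengthening instructions instead of adding NOPs amounts to a small search. You pick one encoding per instruction so that the branch target after them lands on an alignment boundary. The hand-written ALTERNATIVES table (32-bit mode) and the pad_by_lengthening helper are hypothetical illustrations, not any assembler's API; a real assembler would generate the alternatives itself. The brute-force search is fine for a handful of instructions but grows exponentially.

```python
from itertools import product

# Hand-written alternative encodings (32-bit mode) of two instructions that sit
# just before a loop top. Every entry in a row decodes to the same instruction.
ALTERNATIVES = [
    [bytes.fromhex("83 c0 05"),            # add eax, 5 as 83 /0 ib
     bytes.fromhex("05 05 00 00 00"),       # add eax, 5 as 05 id (no ModRM)
     bytes.fromhex("81 c0 05 00 00 00")],   # add eax, 5 as 81 /0 id
    [bytes.fromhex("b9 0a 00 00 00"),       # mov ecx, 10 as B8+rd id
     bytes.fromhex("c7 c1 0a 00 00 00")],   # mov ecx, 10 as C7 /0 id
]


def pad_by_lengthening(start, align, alternatives):
    """Choose one encoding per instruction so the code that follows starts on a
    multiple of `align`. Returns the shortest such choice, or None if no
    combination lands exactly on a boundary (NOPs would still be needed)."""
    best = None
    for combo in product(*alternatives):
        total = sum(len(enc) for enc in combo)
        if (start + total) % align == 0:
            if best is None or total < sum(len(enc) for enc in best):
                best = combo
    return best


choice = pad_by_lengthening(0x05, 16, ALTERNATIVES)
if choice is not None:
    for enc in choice:
        print(enc.hex(" "))
```

Starting at offset 0x05, the sketch needs 11 bytes to reach 0x10. It gets them by combining the 5-byte short form of add with the 6-byte C7 form of mov. Starting at 0x09, no combination lands exactly on 0x10, so the function returns None.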

If you're designing a new flavour of asm syntax, you might consider allowing more control over the encoding with override keywords. For existing designs, check out NASM's strict and nosplit keywords, and GAS's {vex2}, {vex3}, {disp32} and similar "prefixes".

"How to force NASM to encode [1 + rax*2] as disp32 + index*2 instead of disp8 + base + index?": using nosplit to force a longer but more efficient encoding for LEA.
"How do GNU assembler x86 instruction suffixes like ".s" in "mov.s" work?": GAS {disp32} and so on, and {load} or {store} to choose whether you prefer the op r/m, r or the op r, r/m encoding.
"Sign or Zero Extension of address in 64bit mode for MOV moffs32?": in 64-bit mode, a32 mov eax, [0x123456] with the no-ModRM moffs encoding causes an LCP stall on Intel CPUs. It's shorter than ModRM+SIB+disp32 for absolute addressing, but potentially slower.
"Why NASM on Linux changes registers in x86_64 assembly": NASM assembles mov rax, 1 in 5 bytes, vs. mov rax, strict dword 1 (7 bytes) vs. mov rax, strict qword 1 (10-byte imm64 encoding).
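You can check the moffs trade-off from the list above by building both encodings of a 32-bit load from an absolute address in 64-bit mode. The function names are hypothetical. moffs_rest_length shows why the 67h prefix is length-changing: the moffs field follows the address size, so it shrinks from 8 bytes to 4. The ModRM form needs a SIB byte because in 64-bit mode mod=00 with r/m=101 means RIP-relative, not absolute.

```python
import struct


def load_eax_abs32_encodings(addr: int) -> dict[str, bytes]:
    """Two 64-bit-mode encodings of `mov eax, [addr]`. They reach the same
    address only for addr < 0x80000000: the disp32 in the ModRM form is
    sign-extended, while the a32 moffs32 is zero-extended."""
    disp = struct.pack("<I", addr)
    return {
        # 67 A1 moffs32: the 67h address-size prefix shrinks moffs to 4 bytes.
        "a32 mov eax, [moffs32]": bytes([0x67, 0xA1]) + disp,
        # 8B /r, ModRM 04 (mod=00, reg=eax, r/m=100 -> SIB follows),
        # SIB 25 (no index, no base -> disp32 follows).
        "mov eax, [disp32] via ModRM+SIB": bytes([0x8B, 0x04, 0x25]) + disp,
    }


def moffs_rest_length(address_size: int) -> int:
    """Bytes from opcode A1 onward: the moffs width follows the address size."""
    return 1 + address_size // 8


for name, enc in load_eax_abs32_encodings(0x123456).items():
    print(f"{len(enc)} bytes: {enc.hex(' ')}  ({name})")
# The 67h prefix changes the rest of the instruction from 9 to 5 bytes: an LCP.
print(moffs_rest_length(64), "->", moffs_rest_length(32))
```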
