In AT&T syntax, memory operands have the following form (see the note at the end):
displacement(base_register, index_register, scale_factor)
The base, index and displacement components can be used in any combination, and any of them can be omitted; obviously, though, the commas must be retained if you omit the base register, otherwise the assembler could not tell which of those components you are leaving out.
All this data gets combined to calculate the address you are specifying, with the following formula:
effective_address = displacement + base_register + index_register*scale_factor
(which incidentally is almost exactly how you would specify this in Intel syntax).
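If it helps to see the formula in action, here is a minimal Python model of it. This is only a sketch: the function name effective_address and the register values are made up for illustration, and the 32-bit wrap-around assumes the 32-bit code discussed here.

```python
def effective_address(displacement=0, base=0, index=0, scale=1):
    """Model the address computed for displacement(base, index, scale)."""
    # x86 only allows these four scale factors.
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    # An omitted component simply contributes 0 (or a scale of 1);
    # the sum wraps around at 32 bits, as it would in 32-bit code.
    return (displacement + base + index * scale) & 0xFFFFFFFF


# 8(%ebx) with a hypothetical %ebx = 0x1000
print(hex(effective_address(displacement=8, base=0x1000)))    # 0x1008
# (%ebx,%esi,2) with hypothetical %ebx = 0x1000 and %esi = 3
print(hex(effective_address(base=0x1000, index=3, scale=2)))  # 0x1006
```

Omitting a component in the assembly operand corresponds to leaving the matching argument at its default here.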
So, armed with this knowledge we can decode your instruction:
movl data_items(,%edi,4), %eax
Matching the syntax above, you see that:
- data_items is the displacement;
- base_register is omitted, so is not put into the formula above;
- %edi is index_register;
- 4 is scale_factor.
So, you are telling the CPU to move a long from the location data_items+%edi*4 to the register %eax.
The *4 is necessary because each element of your array is 4 bytes wide, so to transform the index (in %edi) into an offset (in bytes) from the start of the array you have to multiply it by 4.
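To see what goes wrong without the scaling, here is a rough Python model of the load. It is only a sketch under assumptions: the array contents are hypothetical, memory is modelled as a bytes object, data_items is assumed to sit at offset 0 of it, and load_long is a made-up helper, not anything the assembler provides.

```python
import struct

# Hypothetical array contents, stored as consecutive little-endian 32-bit values.
values = [3, 67, 34, 222]
memory = struct.pack("<4i", *values)

data_items = 0  # assumption: the label resolves to offset 0 of our model memory


def load_long(memory, displacement, index, scale):
    """Model movl displacement(,index,scale), %eax on a bytes object."""
    address = displacement + index * scale
    # Read 4 bytes starting at the computed byte address.
    return struct.unpack_from("<i", memory, address)[0]


# Scale 4: index 2 becomes byte offset 8, the start of the third element.
print(load_long(memory, data_items, 2, 4))  # 34
# Scale 1: index 2 becomes byte offset 2, which straddles the first two elements.
print(load_long(memory, data_items, 2, 1))  # 4390912
```

With a scale of 1 the CPU still reads 4 bytes, just from the wrong place, so you get a meaningless mix of two neighbouring elements.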
"Since we've already declared that each number is 4 bytes with .long, why do we need to do it again here?"
Assemblers are low-level tools that know nothing about types.
.long is not an array declaration; it is just a directive telling the assembler to emit the bytes corresponding to the 32-bit representation of its parameters. data_items is not an array; it is just a symbol that gets resolved to some memory location, exactly like the other labels. The fact that you placed a .long directive after it is of no particular significance to the assembler.
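As an illustration of "just bytes plus a label", here is a small Python sketch. The emit_long helper and the values are hypothetical, and the output is modelled as a bytes object with data_items assumed to be offset 0; a real assembler does more than this, of course.

```python
import struct


def emit_long(*values):
    """Model .long: append the 4-byte little-endian encoding of each value."""
    return b"".join(struct.pack("<I", v & 0xFFFFFFFF) for v in values)


section = emit_long(3, 67, 34)  # hypothetical values
data_items = 0                  # the label is only the offset where the bytes start

print(len(section))   # 12
print(section.hex())  # 030000004300000022000000

# Nothing in the bytes records an element size: the same location can be
# read as 1, 2 or 4 bytes, depending only on the instruction doing the read.
print(section[data_items + 4])                               # 67
print(struct.unpack_from("<H", section, data_items + 4)[0])  # 67
print(struct.unpack_from("<I", section, data_items + 4)[0])  # 67
```

The element width lives only in the code that accesses the data, which is why the scale factor has to be spelled out in the instruction.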
Notes
- Technically, there would also be the segment specifier, but given that we are talking about 32-bit code on Linux I'll omit segments entirely, as they would only add confusion.