x86 - Assembly code to return smallest integer in array instead randomly returns either last or second to last number

Question

Welcome To Ask or Share your Answers For Others

x86 - Assembly code to return smallest integer in array instead randomly returns either last or second to last number

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

x86 - Assembly code to return smallest integer in array instead randomly returns either last or second to last number

I'm trying to create a function in nasm which, given an array of integers and the length of the array, returns the smallest integer. This is based on the CodeWars problem "Find the smallest integer in the array". I'm doing this on 64 bit BlackArch Linux. My function looks like this:

SECTION .text
global find_smallest_int

find_smallest_int:
  ; [rdi] is the first value in the array.
  ; We'll store the smallest value so far found
  ; in rax. The first value in the array is the
  ; smallest so far found, therefore we store it
  ; in rax.
  mov rax, [rdi]

  ; rsi is the second argument to int find_smallest_int(int *, int)
  ; which represents the length of the array.
  ; Store it in rbx to be explicit.
  mov rbx, rsi

  loop:
    ; Check to see if we've reached the end of the array.
    ; If we have, we jump to the end of the function and 
    ; return the smallest value (which should be whatever
    ; is in rax at the moment.
    cmp rbx, 0
    je end

    ; Subtract one from our counter. This started as 
    ; the number of elements in the array - when it
    ; gets to 0, we'll have looped through the entire thing.
    sub rbx, 1

    ; If rax is smaller than [rdi], we'll jump down to the
    ; rest of the loop. Only if rax is bigger than [rdi] will
    ; we reassign rax to be the new smallest-yet vaue.
    cmp rax, [rdi]
    jl postassign

    assign:
      ; If we execute this code, it means rax was not less
      ; than [rdi]. Therefore, we can safely reassign
      ; rax to [rdi].
      mov rax, [rdi]


    postassign:
    ; Set rdi to point to the next value in the array
    add rdi, 4

    ; if we get here, then we aren't finishing looping yet
    ; because rbx (the counter) hasn't eached 0 yet.
    jmp loop

  end:
    ret

I then call this function via the following C code:

extern int find_smallest_int(int *array, int size);

int main(void)
{
    int nums[4] = {800, 300, 100, 11};
    int ret = find_smallest_int(nums, 4);

    return ret;
}

Finally, I compile and run the whole thing using the following commands:

#!/bin/bash

# Make an object file from my assembly code with nasm
nasm -f elf64 -o sum.o call_sum.s

# make an object file from my C code
gcc -O0 -m64 -c -o call_sum.o call_sum.c -g

# compile my two object files into an executable
gcc -O0 -m64 -o run sum.o call_sum.o -g

# Run the executable and get the output in the
# form of the exit code.
./run
echo $?

Instead of getting the smallest integer, I either get 100 or 11 (the second to last and last members of the integer array that I pass to my assembly function, respectively). Which result I get appears to be completely random. I can run the program a few times and get 11, then run it a few more and then start getting 100.

If anyone could help me understand this strange behavior I'd appreciate it immensely. Thanks!

Update: I implemented the changes from Jester's comment (using 32bit registers to hold ints) and it works, but I do not really understand why.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T20:07:11+0000

The beginning of this answer is based on what Jester commented on. It just expands on that and explains the changes in some more detail. I did some additional changes too, two of which are also addressing mistakes in your sources.

First, this part:

An int is 4 bytes but you use 8 all over your code. Use eax instead of rax.

These instructions in your example are accessing 8 bytes from the array each:

    mov rax, [rdi]

    cmp rax, [rdi]

    mov rax, [rdi]

This is because rax is a 64-bit register, so doing a full rax load or compare with a memory operand accesses 8 bytes of memory. In NASM syntax, you are allowed to specify the memory operand's size explicitly, such as by writing the following:

    mov rax, qword [rdi]

If you had done that you might have figured out earlier that you were accessing memory in 8-byte units (quadwords). Trying to access a doubleword explicitly while using rax as the destination register would fail. The following line causes the error "mismatch in operand sizes" at assembly time:

    mov rax, dword [rdi]

The following two lines are fine, and both load into rax from a doubleword memory operand. The first one uses zero-extension (which is implicit in the AMD64 instruction set when writing to a 32-bit register part), the second one uses (explicit) sign extension:

    mov eax, dword [rdi]
    movsx rax, dword [rdi]

(A movzx instruction from a dword memory operand to rax does not exist, as it would be redundant with the mov to eax.)

Later in your example, you're using rdi as the address of a 4-byte wide type, advancing the array entry pointer by adding 4 to it:

    add rdi, 4

This is correct for the int type, but clashes with your use of quadwords as the memory operands' sizes.

Two more issues are given by Jester's comment:

Also do not use rbx because that is a callee-saved register and it's pointless to copy from rsi anyway. As before, you'd better use esi because that's another int.

The rsi problem is that the high 32 bits of the 64-bit rsi might hold nonzero values depending on the ABI. If you aren't sure about whether a nonzero value is allowed, you should assume it is and that you should use the 32-bit value in esi only.

The rbx (or ebx) problem is that rbx needs to be preserved across function calls for the AMD64 psABI that is used by Linux, see Where is the x86-64 System V ABI documented? for the documentation of that ABI. In your simple test program changing rbx may not cause any failure, but it easily would in a non-trivial context.

The next problem I found is with your initialisation of eax. You wrote it this way:

  ; [rdi] is the first value in the array.
  ; We'll store the smallest value so far found
  ; in rax. The first value in the array is the
  ; smallest so far found, therefore we store it
  ; in rax.
  mov rax, [rdi]

However, as evidenced by your loop flow-control logic, you allow for the caller to pass in a zero for the size argument. In that case, you should not access the array at all, because "the first value in the array" may not even exist or be initialised to anything at all. Logically, you should initialise the smallest-yet value with INT_MAX instead of the first array entry.

There is one more problem: You're using rsi or esi as an unsigned number, counting down to zero. However, in your declaration of the function you specified the type of the size argument as int, which is signed. I fixed that by changing the declaration to unsigned int.

I did a few more optional changes to your program. I used NASM local labels for the "sub"-labels of your function, which is useful because you could re-use for example .loop or .end in other functions in the same source file if there were any to be added.

I also corrected one of the comments to note that we jump for eax being less than the array entry, and do not jump for eax being greater than or equal to the array entry. You could change this conditional jump to be jle instead which would jump for equal comparisons too. Arguably one or the other may be preferred for clarity, or performance, but I don't have much of an answer as to which.

I also used dec esi instead of sub esi, 1 which is not very much better but just sits better with me. In 32-bit mode, dec esi is a single-byte instruction. Not so in 64-bit mode though; dec esi is 2 bytes against sub esi, 1 being 3 bytes.

Further, I changed the initial check for esi being zero from using cmp to test, which is slightly better, refer to Test whether a register is zero with CMP reg,0 vs OR reg,reg?

Finally, I changed the actual loop conditional to be at the end of the loop body, which means the loop uses one jump instruction less. The unconditional jump to the loop body's start is replaced by the conditional jump that checks for the while condition. The test at the beginning of the function is still needed to handle the zero-length-array possibility. Also, instead of using cmp or test again to check for zero in esi, I just use the Zero Flag as already set up by the dec instruction to check whether esi was decremented to zero.

You could use ecx or rcx for the loop counter, but that would probably not be much of a win on modern CPUs. Would allow for slightly more compact code if you resorted to using the jrcxz, jecxz, or loop instructions. They aren't recommended though, due to slower performance.

Instead of comparing to the dword [rdi] and then, if less-or-equal than eax, loading from the same memory dword, you could load the array entry's value into a register first and then use that as source for cmp and mov. This might be faster, but it does result in more opcode bytes.

A trick I otherwise use to advance the Destination Index (rdi in 64-bit mode) by 4 is to use a single scasd instruction, which modifies only the flags and the index register. This is a single-byte instruction instead of the 4-byte add rdi, 4 but is highly probably slower to run.

I uploaded a repo with your original source and my improvements to https://hg.ulukai.org/ecm/testsmal/file/2b8637ca416a/ (Taken under the CC BY-SA usage conditions of stackoverflow content.) I modified the C part and the test script too, but these are trivial and mostly irrelevant to your question. Here's the assembly source:


INT_MAX equ 7FFF_FFFFh

SECTION .text
global find_smallest_int

find_smallest_int:

                ; If the array is empty (size = 0) then we want to return
                ; without reading from the array at all. The value to return
                ; then logically should be the highest possible number for a
                ; 32-bit signed integer. This is called INT_MAX in the C
                ; header limits.h and for 32-bit int is equal to 7FFF_FFFFh.
                ;
                ; If the array is not empty, the first iteration will then
                ; always leave our result register equal to the value in
                ; the first array entry. This is either equal to INT_MAX
                ; again, or less than that.
        mov eax, INT_MAX

                ; esi is the second argument to our function, which is
                ; declared as int find_smallest_int(int *, unsigned int).
                ; It represents the length of the array. We use this
                ; as a counter. rsi (and its part esi) need not be preserved
                ; across function calls for the AMD64 psABI that is used by
                ; Linux, see https://stackoverflow.com/a/40348010/738287

                ; Check for an initial zero value in esi. If this is found,
                ; skip the loop without any iteration (while x do y) and
                ; return eax as initialised to INT_MAX at the start.
        test esi, esi
        jz .end

.loop:
                ; If eax is smaller than dword [rdi], we'll jump down to the
                ; rest of the loop. Only if eax is bigger than or equal to
                ; the dword [rdi] will we reassign eax to that, to hold the
                ; new smallest-yet value.
        cmp eax, dword [rdi]
        jl .postassign

.assign:
                ; If we execute this code, it means eax was not less
                ; than dword [rdi]. Therefore, we can safely reassign
                ; eax to dword [rdi].
        mov eax, dword [rdi]


.postassign:
                ; Set rdi to point to the next value in the array.
        add rdi, 4


                ; Subtract one from our counter. This started as 
                ; the number of elements in the array - when it
                ; gets to 0, we'll have looped through the entire thing.
        dec esi

                ; Check to see if we've reached the end of the array.
                ; To do this, we use the Zero Flag as set by the prior
                ; dec instruction. If esi has reached zero yet (ZR) then
                ; we do not continue looping. In that case, we return the
                ; smallest value found yet (which is in eax at the moment).
                ;
                ; Else, we jump to the start of the loop to begin the next
                ; iteration.
        jnz .loop

.end:
        retn

Here's an alternative for the conditional jump within the loop body. The cmov instruction seems to be supported by all AMD64 CPUs. It is a conditional move: If the condition is met it runs like a mov -- it has no effect otherwise, with one exception: It may read the source operand (and thus might fault from the memory access). You can flip the condition you used for the branch around the mov to get the condition for the cmov instruction. (I came across this thread involving Linus Torvalds which indicates that the conditional jump solution may actually be better or no worse than cmov. Make of that what you will.)

.loop:
                ; If eax is greater than or equal to dword [rdi], we'll
                ; reassign eax to that dword, the new smallest-yet value.
        cmp eax, dword [rdi]
        cmovge eax, dword [rdi]

                ; Set rdi to point to the next value in the array.
        add rdi, 4

Categories

x86 - Assembly code to return smallest integer in array instead randomly returns either last or second to last number

x86 - Assembly code to return smallest integer in array instead randomly returns either last or second to last number

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags