The Problem
The answer to your question is buried in your question; it just isn't obvious. You quoted my General Bootloader Tips:
- When the BIOS jumps to your code you can't rely on CS,DS,ES,SS,SP registers having valid or expected values. They should be set up appropriately when your bootloader starts. You can only be guaranteed that your bootloader will be loaded and run from physical address 0x00007c00 and that the boot drive number is loaded into the DL register.
Your code correctly sets up DS and sets its own stack (SS and SP). You didn't blindly copy CS to DS, but what you do is rely on CS being an expected value (0x0000). Before I explain what I mean by that, I'd like to draw your attention to a recent Stackoverflow answer I gave about how the ORG directive (or the origin point specified by any linker) works together with the segment:offset pair used by the BIOS to jump to physical address 0x07c00.
The answer details how CS being copied to DS can cause problems when referencing memory addresses (variables for example). In the summary I stated:
Don't assume CS is a value we expect, and don't blindly copy CS to DS. Set DS explicitly.
The key thing is: don't assume CS is a value we expect. So your next question may be, "I don't seem to be using CS, am I?" The answer is yes, you are. Normally when you use a typical CALL or JMP instruction it looks like this:
call print_char
jmp somewhereelse
In 16-bit code both of these are relative: the target is encoded as a displacement forward or back from the instruction right after the JMP or CALL. Where your code is placed within a segment doesn't matter, since the target is a plus/minus displacement from where you currently are. The current value of CS doesn't matter with relative jumps and calls, so they should work as expected.
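To make that concrete, here is a small Python sketch of the arithmetic. It is only an illustration: the helper names `physical` and `relative_target` are made up, and it assumes a 3-byte near CALL located 0x11 bytes into the boot sector with a displacement of 0x0018 (the values of the first relative call in the disassembly of your code further down).

```python
def physical(segment, offset):
    """Real-mode physical address for a segment:offset pair."""
    return (segment << 4) + offset

def relative_target(ip_of_call, displacement, length=3):
    """IP reached by a near relative CALL/JMP: IP of the next
    instruction plus the signed displacement, wrapped to 16 bits."""
    return (ip_of_call + length + displacement) & 0xFFFF

# BIOS entered at 0x0000:0x7c00, so the CALL sits at IP 0x7c11
ip_a = relative_target(0x7C11, 0x0018)
# BIOS entered at 0x07c0:0x0000, so the same CALL sits at IP 0x0011
ip_b = relative_target(0x0011, 0x0018)

# Different IP values, same physical destination
print(hex(ip_a), hex(physical(0x0000, ip_a)))
print(hex(ip_b), hex(physical(0x07C0, ip_b)))
```

The IP values differ between the two cases, but both resolve to the same physical address, which is why the relative calls behave the same regardless of CS.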
Your example of instructions that don't always seem to work correctly included:
call [call_tbl] ; Call print_char using near indirect absolute call
; via memory operand
call [ds:call_tbl] ; Call print_char using near indirect absolute call
; via memory operand w/segment override
call near [si] ; Call print_char using near indirect absolute call
; via register
All of these have one thing in common: the addresses that are CALLed or JMPed to are ABSOLUTE, not relative. The offset of the label will be influenced by the ORG (origin point of the code). If we look at a disassembly of your code we will see this:
objdump -mi8086 -Mintel -D -b binary boot.bin --adjust-vma 0x7c00
boot.bin: file format binary
Disassembly of section .data:
00007c00 <.data>:
7c00: 31 c0 xor ax,ax
7c02: 8e d8 mov ds,ax
7c04: fa cli
7c05: 8e d0 mov ss,ax
7c07: bc 00 7c mov sp,0x7c00
7c0a: fb sti
7c0b: be 34 7c mov si,0x7c34
7c0e: a0 36 7c mov al,ds:0x7c36
7c11: e8 18 00 call 0x7c2c ; Relative call works
7c14: a0 37 7c mov al,ds:0x7c37
7c17: ff 16 34 7c call WORD PTR ds:0x7c34 ; Near/Indirect/Absolute call
7c1b: 3e ff 16 34 7c call WORD PTR ds:0x7c34 ; Near/Indirect/Absolute call
7c20: ff 14 call WORD PTR [si] ; Near/Indirect/Absolute call
7c22: a0 38 7c mov al,ds:0x7c38
7c25: e8 04 00 call 0x7c2c ; Relative call works
7c28: fa cli
7c29: f4 hlt
7c2a: eb fd jmp 0x7c29
7c2c: b4 0e mov ah,0xe ; Beginning of print_char
7c2e: bb 00 00 mov bx,0x0 ; function
7c31: cd 10 int 0x10
7c33: c3 ret
7c34: 2c 7c sub al,0x7c ; 0x7c2c offset of print_char
; Only entry in call_tbl
7c36: 42 inc dx ; 0x42 = ASCII 'B'
7c37: 4d dec bp ; 0x4D = ASCII 'M'
7c38: 45 inc bp ; 0x45 = ASCII 'E'
...
7dfd: 00 55 aa add BYTE PTR [di-0x56],dl
I've manually added some comments where the CALL statements are, including both the relative ones that work and the near/indirect/absolute ones that may not. I've also identified where the print_char function is, and where it appears in the call_tbl.
From the data area after the code we do see that the call_tbl is at 0x7c34 and it contains a 2-byte absolute offset of 0x7c2c. This is all correct, but when you use an absolute 2-byte offset it is assumed to be in the current CS. If you have read this Stackoverflow answer (that I referenced earlier) about what happens when the wrong DS and offset are used to reference a variable, you might now realize that the same may apply to JMPs and CALLs that use NEAR 2-byte absolute offsets.
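If it isn't obvious how the bytes `2c 7c` in the disassembly become the offset 0x7c2c, this short Python sketch decodes them. It is just an illustration (the variable names are mine); the only fact it relies on is that x86 stores 16-bit words little-endian.

```python
# The two bytes stored at call_tbl (offset 0x7c34 in the disassembly)
call_tbl_bytes = bytes([0x2C, 0x7C])

# x86 stores 16-bit words little-endian: low byte first, high byte second
near_offset = int.from_bytes(call_tbl_bytes, "little")

print(hex(near_offset))  # 0x7c2c
```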
As an example let us take this call that doesn't always work:
call [call_tbl]
The value of call_tbl is loaded from DS:[call_tbl]. We properly set DS to 0x0000 when we start the bootloader, so this does correctly retrieve the value 0x7c2c from memory address 0x0000:0x7c34. The processor will then set IP=0x7c2c BUT it assumes it is relative to the currently set CS. Since we can't assume CS is an expected value, the processor potentially can CALL or JMP to the wrong location. It all depends on what CS:IP the BIOS used to jump to our bootloader with (it can vary).
In the case where the BIOS does the equivalent of a FAR JMP to our bootloader at 0x0000:0x7c00, CS will be set to 0x0000 and IP to 0x7c00. When we encounter call [call_tbl] it would have resolved to a CALL to CS:IP=0x0000:0x7c2c. This is physical address (0x0000<<4)+0x7c2c=0x07c2c, which is in fact where the print_char function physically starts in memory.
Some BIOSes do the equivalent of a FAR JMP to our bootloader at 0x07c0:0x0000; in that case CS will be set to 0x07c0 and IP to 0x0000. This too maps to physical address (0x07c0<<4)+0=0x07c00. When we encounter call [call_tbl] it would have resolved to a CALL to CS:IP=0x07c0:0x7c2c. This is physical address (0x07c0<<4)+0x7c2c=0x0f82c. This is clearly wrong since the print_char function is at physical address 0x07c2c, not 0x0f82c.
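The two cases above can be checked with a few lines of Python. This is only a sketch of the address arithmetic, not anything the CPU or BIOS runs; `physical`, `PRINT_CHAR` and `NEAR_OFFSET` are names I made up for the illustration.

```python
def physical(segment, offset):
    """Real-mode physical address for a segment:offset pair."""
    return (segment << 4) + offset

PRINT_CHAR = 0x07C2C   # physical address where print_char really is
NEAR_OFFSET = 0x7C2C   # 16-bit value stored in call_tbl (assembled with ORG 0x7c00)

# A near absolute CALL loads IP with NEAR_OFFSET and leaves CS untouched
for cs in (0x0000, 0x07C0):
    target = physical(cs, NEAR_OFFSET)
    status = "ok" if target == PRINT_CHAR else "WRONG"
    print(f"CS=0x{cs:04x} -> physical 0x{target:05x} ({status})")
```

Only the CS=0x0000 case lands on print_char; with CS=0x07c0 the same 16-bit offset ends up 0x7c00 bytes too high.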
Having CS set incorrectly will cause problems for JMP and CALL instructions that do Near/Absolute addressing. The same goes for any memory operands that use a segment override of CS:. An example of using the CS: override in a real mode interrupt handler can be found in this Stackoverflow answer.
Solution
Since it has been shown that we can't rely on the value CS has when the BIOS jumps to our code, we can set CS ourselves. To set CS we can do a FAR JMP to our own code, which will set CS:IP to values that make sense for the ORG (origin point of the code and data) we are using. An example of such a jump if we use ORG 0x7c00:
jmp 0x0000:$+5
$+5 says to use an offset that is 5 above our current program counter. A far jmp is 5 bytes long, so this has the effect of doing a far jump to the instruction after our jmp. It could have been coded this way too:
jmp 0x0000:farjmp
farjmp:
When either of these instructions is complete CS will be set to 0x0000 and IP will be set to the offset of the next instruction. The key thing for us is that CS will be 0x0000. When paired with an ORG of 0x7c00 it will properly resolve absolute addresses so that they work properly when physically running on the CPU. 0x0000:0x7c00=(0x0000<<4)+0x7c00=physical address 0x07c00.
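To see why the far jmp is 5 bytes long, here is a Python sketch that builds the encoding by hand. It assumes the standard 16-bit direct far jump encoding (opcode 0xEA, then the offset word, then the segment word), and it assumes the jmp is the very first instruction of a bootloader assembled with ORG 0x7c00; `encode_far_jmp` is a made-up helper, not part of any assembler.

```python
def encode_far_jmp(segment, offset):
    """Encode a 16-bit `jmp segment:offset`: opcode 0xEA,
    the offset word, then the segment word (both little-endian)."""
    return bytes([0xEA]) + offset.to_bytes(2, "little") + segment.to_bytes(2, "little")

ORG = 0x7C00
jmp_location = ORG   # assumption: the far jmp is the first instruction

# Equivalent of: jmp 0x0000:$+5
insn = encode_far_jmp(0x0000, jmp_location + 5)

print(len(insn))      # 5
print(insn.hex(" "))  # ea 05 7c 00 00
```

One opcode byte plus two 2-byte words gives the 5 bytes that $+5 skips over.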
Of course if we use ORG 0x0000 then we need to set CS to 0x07c0. This is because (0x07c0<<4)+0x0000=0x07c00. So we could code the far jmp this way:
jmp 0x07c0:$+5
CS will be set to 0x07c0 and IP will be set to the offset of the next instruction.
The end result of all this is that we are setting CS to the segment we want, rather than relying on a value that we can't guarantee when the BIOS finishes jumping to our code.
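The pairing of CS and ORG can be sanity-checked with the following Python sketch. It is an illustration under stated assumptions: `LABEL_POS` is print_char's position within the boot sector (0x2c, taken from the disassembly), and `resolves` is a hypothetical helper that models what value the assembler emits for the label under a given ORG.

```python
BOOT_BASE = 0x07C00   # physical address the boot sector is loaded at
LABEL_POS = 0x2C      # print_char's position within the sector

def physical(segment, offset):
    """Real-mode physical address for a segment:offset pair."""
    return (segment << 4) + offset

def resolves(cs, org):
    """True if a near absolute offset assembled with `org`
    reaches the label when executed with the given `cs`."""
    label_offset = org + LABEL_POS   # value the assembler emits for the label
    return physical(cs, label_offset) == BOOT_BASE + LABEL_POS

pairs = [(0x0000, 0x7C00), (0x07C0, 0x0000), (0x07C0, 0x7C00), (0x0000, 0x0000)]
for cs, org in pairs:
    print(f"CS=0x{cs:04x} ORG=0x{org:04x}: {'matches' if resolves(cs, org) else 'mismatch'}")
```

The first two pairs are the ones the far jumps above establish; the last two show what happens when CS and ORG disagree.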
Issues with Different Environments
As we have seen, the value of CS can matter. Most BIOSes, whether in an emulator, virtual machine, or real hardware, do the equivalent of a far jump to 0x0000:0x7c00, and in those environments your bootloader would have worked. Some environments, like older AMI BIOSes and Bochs 2.6 when booting from a CD, start our bootloader with CS:IP = 0x07c0:0x0000. As discussed, in those environments near/absolute CALLs and JMPs will execute from the wrong memory locations and cause our bootloader to function incorrectly.
So what about Bochs working for a floppy image and not for an ISO image? This is a peculiarity in earlier versions of Bochs. When booting from a floppy the virtual BIOS jumps to 0x0000:0x7c00, and when it boots from an ISO image it uses 0x07c0:0x0000. This explains why it works differently. This odd behavior apparently came about because of a literal interpretation of one of the El Torito specifications that specifically mentioned segment 0x07c0. Newer versions of Bochs' virtual BIOS were modified to use 0x0000:0x7c00 for both.
Does this Mean some BIOSes have a Bug?
The answer to this question is subjective. In the first versions of IBM's PC-DOS (prior to 2.1) the bootloader assumed that the BIOS jumped to 0x0000:0x7c00, but this wasn't clearly defined. Some BIOS manufacturers in the 80s started using 0x07c0:0x0000 and broke some early versions of DOS. When this was discovered, bootloaders were modified to be well behaved so as not to make any assumptions about which segment:offset pair was used to reach physical address 0x07c00. At the time one may have considered this a bug, but it stemmed from the ambiguities introduced with 20-bit segment:offset pairs.
Since the mid 80s, it is my opinion that any new bootloader that assumes CS is a specific value has been coded in error.