This is fairly easy to do. Push the return address onto the stack and then jump to the subroutine.
The final code looks like this:
PUSH 5
PUSH 4
PUSH offset label1
jmp Function
label1: ; returns here
lea esp, 8[esp] ; discard the two pushed arguments
... ; rest of the caller (must not fall through into Function)
Function:
...
ret
While this works, you really don't want to do it. Most modern processors keep an on-chip cache of call-stack return addresses (often called a return stack buffer), which pushes a return address on every CALL and pops one on every RET. Because it lives on the processor, it has extremely short update and access times, so the RET instruction can use the popped value to predict where the PC should go next, rather than waiting for the actual read from the memory location pointed to by ESP. If you do the "PUSH offset label1" trick, this cache does not get updated, so the prediction for the matching RET is wrong and the processor pipeline gets flushed, which has a severe negative impact on performance. (I think IBM has a patent on special instructions which are essentially "PUSHRETURNADDRESS k" and "POPRETURNADDRESS", allowing this trick to be used on some of their CPUs. Alas, not on the x86.)
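To make the mismatch concrete, here is a rough toy model in Python of the behaviour described above. It is only a sketch: the names ReturnStackBuffer, ToyCpu, run and nested_example are made up for illustration, the depth of 16 entries is an assumption, and real predictors differ in many details. It just counts how often the address predicted for a RET disagrees with the address actually popped from the stack.

```python
class ReturnStackBuffer:
    """Toy on-chip return-address predictor (hypothetical, fixed depth)."""

    def __init__(self, depth=16):  # depth is an assumed value
        self.depth = depth
        self.entries = []

    def on_call(self, return_address):
        # A CALL records its return address; the oldest entry is lost when full.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_address)

    def predict_ret(self):
        # A RET pops the most recent entry as its predicted target.
        return self.entries.pop() if self.entries else None


class ToyCpu:
    def __init__(self):
        self.stack = []                 # the real stack in memory (ESP)
        self.rsb = ReturnStackBuffer()  # the on-chip predictor
        self.mispredicts = 0

    def call(self, return_address):
        # CALL: pushes onto the memory stack AND updates the predictor.
        self.stack.append(return_address)
        self.rsb.on_call(return_address)

    def push_jmp(self, return_address):
        # PUSH offset label / JMP: only the memory stack changes.
        self.stack.append(return_address)

    def ret(self):
        predicted = self.rsb.predict_ret()
        actual = self.stack.pop()
        if predicted != actual:
            self.mispredicts += 1       # stands in for a pipeline flush
        return actual


def run(use_call, iterations=1000):
    """Enter and leave a subroutine repeatedly; return the mispredict count."""
    cpu = ToyCpu()
    for i in range(iterations):
        return_address = 0x1000 + i
        if use_call:
            cpu.call(return_address)
        else:
            cpu.push_jmp(return_address)
        cpu.ret()
    return cpu.mispredicts


def nested_example():
    """A real CALL whose callee uses the PUSH/JMP trick once."""
    cpu = ToyCpu()
    cpu.call(0x2000)      # outer call, tracked by the predictor
    cpu.push_jmp(0x3000)  # inner "call", invisible to the predictor
    cpu.ret()             # inner RET is predicted as 0x2000, actually 0x3000
    cpu.ret()             # outer RET finds the predictor empty
    return cpu.mispredicts


if __name__ == "__main__":
    print(run(use_call=True))   # 0
    print(run(use_call=False))  # 1000
    print(nested_example())     # 2
```

In this model every RET that follows a PUSH/JMP "call" is mispredicted, and the nested case suggests that a single use of the trick can also spoil the prediction for an enclosing, ordinary CALL.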