| \C{64bit} Writing 64-bit Code (Unix, Win64) |
| |
| This chapter attempts to cover some of the common issues involved when |
| writing 64-bit code, to run under \i{Win64} or Unix. It covers how to |
| write assembly code to interface with 64-bit C routines, and how to |
| write position-independent code for shared libraries. |
| |
| All 64-bit code uses a flat memory model, since segmentation is not |
| available in 64-bit mode. The one exception is the \c{FS} and \c{GS} |
| registers, which still add their bases. |
| |
| Position independence is significantly simpler in 64-bit mode than in |
| 32-bit mode, since the processor supports \c{RIP}-relative addressing directly; see the |
| \c{REL} keyword (\k{effaddr}). On most 64-bit platforms, it is |
| probably desirable to make that the default, using the directive |
| \c{DEFAULT REL} (\k{default}). |
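| |
| As a minimal sketch of the effect, assuming an illustrative data |
| symbol \c{counter} (not part of any real interface), the memory |
| references below assemble as \c{RIP}-relative once \c{DEFAULT REL} |
| is in effect: |
| |
| \c         default rel |
| \c |
| \c         section .data |
| \c counter: dq 0                  ; hypothetical variable |
| \c |
| \c         section .text |
| \c bump:   mov rax,[counter]      ; RIP-relative load |
| \c         inc rax |
| \c         mov [counter],rax      ; RIP-relative store |
| \c         ret |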
| |
| \c{DEFAULT REL} is likely to become the default in a future version of NASM. |
| |
| 64-bit programming is relatively similar to 32-bit programming, but |
| of course pointers are 64 bits long; additionally, all existing |
| platforms pass arguments in registers rather than on the stack. |
| Furthermore, 64-bit platforms use SSE2 by default for floating point. |
| Please see the ABI documentation for your platform. |
| |
| 64-bit platforms differ in the sizes of the C/C++ fundamental |
| datatypes, not just from 32-bit platforms but from each other. If a |
| specific size data type is desired, it is probably best to use the |
| types defined in the standard C header \c{<inttypes.h>}. |
| |
| All known 64-bit platforms except some embedded platforms require that |
| the stack be 16-byte aligned at each function call. Specifically, |
| the stack pointer (\c{RSP}) needs to be 16-byte aligned just before the |
| \c{CALL} instruction; since \c{CALL} pushes an 8-byte return address, |
| \c{RSP} is 8 bytes off a 16-byte boundary on entry to the function. |
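| |
| The following sketch shows one way to satisfy this requirement. It |
| assumes a Unix ELF target and the C library function \c{puts}; the |
| names \c{say_hello} and \c{msg} are purely illustrative. Because |
| \c{RSP} is 8 bytes off a 16-byte boundary on entry, subtracting |
| another 8 bytes restores alignment before the \c{CALL}: |
| |
| \c         default rel |
| \c         extern  puts |
| \c |
| \c         section .rodata |
| \c msg:    db "hello",0           ; hypothetical string |
| \c |
| \c         section .text |
| \c         global  say_hello |
| \c say_hello: |
| \c         sub rsp,8              ; realign RSP to 16 bytes |
| \c         lea rdi,[msg]          ; first integer argument (Unix ABI) |
| \c         call puts wrt ..plt    ; RSP is 16-byte aligned here |
| \c         add rsp,8              ; undo the adjustment |
| \c         ret |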
| |
| In 64-bit mode, the default operand size is still 32 bits. When |
| loading a value into a 32-bit register (but not an 8- or 16-bit |
| register), the upper 32 bits of the corresponding 64-bit register are |
| set to zero. |
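| |
| A short illustration of this rule; the values in the comments show |
| the contents of \c{RAX} after each instruction: |
| |
| \c         mov rax,-1             ; RAX = 0xFFFFFFFFFFFFFFFF |
| \c         mov eax,1              ; RAX = 0x0000000000000001 (upper half zeroed) |
| \c         mov rax,-1             ; RAX = 0xFFFFFFFFFFFFFFFF |
| \c         mov ax,1               ; RAX = 0xFFFFFFFFFFFF0001 (upper bits kept) |
| \c         mov al,2               ; RAX = 0xFFFFFFFFFFFF0002 (upper bits kept) |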
| |
| \H{reg64} Register Names in 64-bit Mode |
| |
| NASM uses the following names for general-purpose registers in 64-bit |
| mode, for 8-, 16-, 32- and 64-bit references, respectively: |
| |
| \c AL/AH, CL/CH, DL/DH, BL/BH, SPL, BPL, SIL, DIL, R8B-R15B |
| \c AX, CX, DX, BX, SP, BP, SI, DI, R8W-R15W |
| \c EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, R8D-R15D |
| \c RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8-R15 |
| |
| This is consistent with the AMD documentation and most other |
| assemblers. The Intel documentation, however, uses the names |
| \c{R8L-R15L} for 8-bit references to the higher registers. It is |
| possible to use those names by defining them as macros; similarly, |
| if one wants to use numeric names for the low 8 registers, define them |
| as macros. The standard macro package \c{altreg} (see \k{pkg_altreg}) |
| can be used for this purpose. |
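| |
| For instance, assuming the \c{altreg} package is available in the |
| NASM version in use, the alternate names can be used as in this |
| sketch (the label \c{proc} is illustrative): |
| |
| \c %use altreg |
| \c |
| \c proc: |
| \c         mov r0l,r3h            ; same as mov al,bh |
| \c         mov r8l,1              ; same as mov r8b,1 |
| \c         ret |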
| |
| \H{id64} Immediates and Displacements in 64-bit Mode |
| |
| In 64-bit mode, immediates and displacements are generally only 32 |
| bits wide. NASM will therefore truncate most displacements and |
| immediates to 32 bits. |
| |
| \S{id64imm} Immediate 64-bit Operands |
| |
| The only instruction which takes a full \i{64-bit immediate} is: |
| |
| \c MOV reg64,imm64 |
| |
| NASM will produce this instruction whenever the programmer uses |
| \c{MOV} with an immediate into a 64-bit register. If this is not |
| desirable, simply specify the equivalent 32-bit register, which will |
| be automatically zero-extended by the processor, or specify the |
| immediate as \c{DWORD}: |
| |
| \c mov rax,foo ; 64-bit immediate |
| \c mov rax,qword foo ; (identical) |
| \c mov eax,foo ; 32-bit immediate, zero-extended |
| \c mov rax,dword foo ; 32-bit immediate, sign-extended |
| |
| The lengths of these instructions are 10, 10, 5 and 7 bytes, respectively. |
| |
| If optimization is enabled and NASM can determine at assembly time |
| that a shorter instruction will suffice, the shorter instruction will |
| be emitted, unless \c{STRICT QWORD} or \c{STRICT DWORD} is |
| specified (see \k{strict}): |
| |
| \c mov rax,1 ; Assembles as "mov eax,1" (5 bytes) |
| \c mov rax,strict qword 1 ; Full 10-byte instruction |
| \c mov rax,strict dword 1 ; 7-byte instruction |
| \c mov rax,symbol ; 10 bytes, not known at assembly time |
| \c lea rax,[rel symbol] ; 7 bytes, usually preferred by the ABI |
| |
| Note that \c{lea rax,[rel symbol]} is position-independent, whereas |
| \c{mov rax,symbol} is not. Most ABIs prefer or even require |
| position-independent code in 64-bit mode. However, the \c{MOV} |
| instruction is able to reference a symbol anywhere in the 64-bit |
| address space, whereas \c{LEA} is only able to access a symbol |
| within 2 GB of the instruction itself (see below). |
| |
| \S{id64disp} 64-bit Displacements |
| |
| The only instructions which take a full \I{64-bit displacement}64-bit |
| \e{displacement} are the forms of \c{MOV} that load or store \c{AL}, \c{AX}, |
| \c{EAX} or \c{RAX} (but no other registers) from or to an absolute 64-bit address. |
| Since these forms are relatively rarely used (64-bit code generally uses |
| relative addressing), the programmer has to explicitly declare the |
| displacement size as \c{ABS QWORD}: |
| |
| \c default abs |
| \c |
| \c mov eax,[foo] ; 32-bit absolute disp, sign-extended |
| \c mov eax,[a32 foo] ; 32-bit absolute disp, zero-extended |
| \c mov eax,[qword foo] ; 64-bit absolute disp |
| \c |
| \c default rel |
| \c |
| \c mov eax,[foo] ; 32-bit relative disp |
| \c mov eax,[a32 foo] ; ditto, address truncated to 32 bits(!) |
| \c mov eax,[qword foo] ; error |
| \c mov eax,[abs qword foo] ; 64-bit absolute disp |
| |
| A sign-extended absolute displacement can access from -2 GB to +2 GB; |
| a zero-extended absolute displacement can access from 0 to 4 GB. |
| |
| \H{unix64} Interfacing to 64-bit C Programs (Unix) |
| |
| On Unix, the 64-bit ABI as well as the x32 ABI (32-bit ABI with the |
| CPU in 64-bit mode) are defined by the documents at: |
| |
| \W{https://www.nasm.us/abi/unix64}\c{https://www.nasm.us/abi/unix64} |
| |
| Although written for AT&T-syntax assembly, the concepts apply equally |
| well for NASM-style assembly. What follows is a simplified summary. |
| |
| The first six integer arguments (from the left) are passed in \c{RDI}, |
| \c{RSI}, \c{RDX}, \c{RCX}, \c{R8}, and \c{R9}, in that order. |
| Additional integer arguments are passed on the stack. These |
| registers, plus \c{RAX}, \c{R10} and \c{R11}, are destroyed by function |
| calls, and thus are available for use by the function without saving. |
| |
| Integer return values are passed in \c{RAX} and \c{RDX}, in that order. |
| |
| Floating point is done using SSE registers, except for \c{long |
| double}, which is 80 bits (\c{TWORD}) on most platforms (Android is |
| one exception; there \c{long double} is 64 bits and treated the same |
| as \c{double}). Floating-point arguments are passed in \c{XMM0} to |
| \c{XMM7}; return values are in \c{XMM0} and \c{XMM1}. \c{long double} |
| values are passed on the stack, and returned in \c{ST0} and \c{ST1}. |
| |
| All SSE and x87 registers are destroyed by function calls. |
| |
| On 64-bit Unix, \c{long} is 64 bits. |
| |
| Integer and SSE register arguments are counted separately, so for the case of |
| |
| \c void foo(long a, double b, int c) |
| |
| \c{a} is passed in \c{RDI}, \c{b} in \c{XMM0}, and \c{c} in \c{ESI}. |
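| |
| As a sketch of how a callee sees such mixed arguments, consider a |
| hypothetical function \c{double scale(long n, double x)} that returns |
| \c{n * x}. Under the Unix ABI, \c{n} arrives in \c{RDI}, \c{x} in |
| \c{XMM0}, and the result is returned in \c{XMM0}: |
| |
| \c         section .text |
| \c         global  scale |
| \c scale:                         ; double scale(long n, double x) |
| \c         cvtsi2sd xmm1,rdi      ; convert n (first integer arg) to double |
| \c         mulsd xmm0,xmm1        ; x (first SSE arg) times n |
| \c         ret                    ; result in XMM0 |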
| |
| \H{win64} Interfacing to 64-bit C Programs (Win64) |
| |
| The Win64 ABI is described by the document at: |
| |
| \W{https://www.nasm.us/abi/win64}\c{https://www.nasm.us/abi/win64} |
| |
| What follows is a simplified summary. |
| |
| The first four integer arguments are passed in \c{RCX}, \c{RDX}, |
| \c{R8} and \c{R9}, in that order. Additional integer arguments are |
| passed on the stack. These registers, plus \c{RAX}, \c{R10} and |
| \c{R11}, are destroyed by function calls, and thus are available for |
| use by the function without saving. |
| |
| Integer return values are passed in \c{RAX} only. |
| |
| Floating point is done using SSE registers, except for \c{long |
| double}. Floating-point arguments are passed in \c{XMM0} to \c{XMM3}; |
| return is \c{XMM0} only. |
| |
| On Win64, \c{long} is 32 bits; \c{long long} or \c{_int64} is 64 bits. |
| |
| Integer and SSE register arguments are counted together, so for the case of |
| |
| \c void foo(long long a, double b, int c) |
| |
| \c{a} is passed in \c{RCX}, \c{b} in \c{XMM1}, and \c{c} in \c{R8D}. |
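| |
| As a sketch of the callee's view, consider a hypothetical function |
| \c{double scale(long long n, double x)} that returns \c{n * x}. On |
| Win64, \c{n} occupies the first argument slot (\c{RCX}) and \c{x} the |
| second (\c{XMM1}); the result is returned in \c{XMM0}. The function |
| never modifies \c{RSP}, so it qualifies as a leaf function: |
| |
| \c         section .text |
| \c         global  scale |
| \c scale:                         ; double scale(long long n, double x) |
| \c         cvtsi2sd xmm0,rcx      ; convert n (slot 1, RCX) to double |
| \c         mulsd xmm0,xmm1        ; times x (slot 2, XMM1) |
| \c         ret                    ; result in XMM0 |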
| |
| Before calling a function, the caller must allocate a 32-byte "shadow space" |
| on the stack, which is owned by the callee. The callee may use it to |
| store the arguments that were passed in registers (e.g. for |
| debugging purposes), or in fact any other desired values. The shadow |
| space sits immediately below (at lower addresses than) the stack space used for non-register |
| arguments (5th and beyond, if any). |
| |
| Before a function call, 16-byte stack alignment is required. |
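| |
| The following sketch combines both requirements. It assumes a |
| hypothetical external function \c{void func5(int a, int b, int c, int d, int e)}; |
| the name \c{caller} is likewise illustrative. On entry \c{RSP} is 8 |
| bytes off a 16-byte boundary, so reserving 40 bytes (32 bytes of |
| shadow space plus 8 bytes for the fifth argument) leaves it 16-byte |
| aligned at the \c{CALL}: |
| |
| \c         extern  func5 |
| \c |
| \c         section .text |
| \c         global  caller |
| \c caller: |
| \c         sub rsp,40             ; 32 shadow + 8 for the 5th argument |
| \c         mov ecx,1              ; a |
| \c         mov edx,2              ; b |
| \c         mov r8d,3              ; c |
| \c         mov r9d,4              ; d |
| \c         mov qword [rsp+32],5   ; e, just above the shadow space |
| \c         call func5             ; RSP is 16-byte aligned here |
| \c         add rsp,40             ; release the reserved space |
| \c         ret |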
| |
| Regarding shadow space and stack alignment, an exception is made for leaf |
| functions, which in Win64 terms means no modification to \c{RSP} at all |
| (not just having no function calls). |