| \C{32bit} Writing 32-bit Code (Unix, Win32, DJGPP) |
| |
| This chapter attempts to cover some of the common issues involved |
| when writing 32-bit code, to run under \i{Win32} or Unix, or to be |
| linked with C code generated by a Unix-style C compiler such as |
| \i{DJGPP}. It covers how to write assembly code to interface with |
| 32-bit C routines, and how to write position-independent code for |
| shared libraries. |
| |
| Almost all 32-bit code, and in particular all code running under |
| \c{Win32}, \c{DJGPP} or any of the PC Unix variants, runs in the \I{flat |
| memory model}\e{flat} memory model. This means that the segment registers |
| and paging have already been set up to give you the same 32-bit 4GB |
| address space no matter what segment you work relative to, and that |
| you should ignore all segment registers completely. When writing |
| flat-model application code, you never need to use a segment |
| override or modify any segment register, and the code-section |
| addresses you pass to \c{CALL} and \c{JMP} live in the same address |
| space as the data-section addresses you access your variables by and |
| the stack-section addresses you access local variables and procedure |
| parameters by. Every address is 32 bits long and contains only an |
| offset part. |
| |
| |
| \H{32c} Interfacing to 32-bit C Programs |
| |
| A lot of the discussion in \k{16c}, about interfacing to 16-bit C |
| programs, still applies when working in 32 bits. The absence of |
| memory models or segmentation worries simplifies things a lot. |
| |
| |
| \S{32cunder} External Symbol Names |
| |
| Most 32-bit C compilers share the convention used by 16-bit |
| compilers, that the names of all global symbols (functions or data) |
| they define are formed by prefixing an underscore to the name as it |
| appears in the C program. However, not all of them do: the \c{ELF} |
| specification states that C symbols do \e{not} have a leading |
| underscore on their assembly-language names. |
| |
| The older Linux \c{a.out} C compiler, all \c{Win32} compilers, |
| \c{DJGPP}, \c{NetBSD} and \c{FreeBSD} all use the leading |
| underscore; for these compilers, the macros \c{cextern} and |
| \c{cglobal}, as given in \k{16cunder}, will still work. For \c{ELF}, |
| though, the leading underscore should not be used. |
| |
| See also \k{opt-pfix}. |
| |
| \S{32cfunc} Function Definitions and Function Calls |
| |
| \I{functions, C calling convention}The \i{C calling convention} |
| in 32-bit programs is as follows. In the following description, |
| the words \e{caller} and \e{callee} are used to denote |
| the function doing the calling and the function which gets called. |
| |
| \b The caller pushes the function's parameters on the stack, one |
| after another, in reverse order (right to left, so that the first |
| argument specified to the function is pushed last). |
| |
| \b The caller then executes a near \c{CALL} instruction to pass |
| control to the callee. |
| |
| \b The callee receives control, and typically (although this is not |
| actually necessary, in functions which do not need to access their |
| parameters) starts by saving the value of \c{ESP} in \c{EBP} so as |
| to be able to use \c{EBP} as a base pointer to find its parameters |
| on the stack. However, the caller was probably doing this too, so |
| part of the calling convention states that \c{EBP} must be preserved |
| by any C function. Hence the callee, if it is going to set up |
| \c{EBP} as a \i{frame pointer}, must push the previous value first. |
| |
| \b The callee may then access its parameters relative to \c{EBP}. |
| The doubleword at \c{[EBP]} holds the previous value of \c{EBP} as |
| it was pushed; the next doubleword, at \c{[EBP+4]}, holds the return |
| address, pushed implicitly by \c{CALL}. The parameters start after |
| that, at \c{[EBP+8]}. The leftmost parameter of the function, since |
| it was pushed last, is accessible at this offset from \c{EBP}; the |
| others follow, at successively greater offsets. Thus, in a function |
| such as \c{printf} which takes a variable number of parameters, the |
| pushing of the parameters in reverse order means that the function |
| knows where to find its first parameter, which tells it the number |
| and type of the remaining ones. |
| |
| \b The callee may also wish to decrease \c{ESP} further, so as to |
| allocate space on the stack for local variables, which will then be |
| accessible at negative offsets from \c{EBP}. |
| |
| \b The callee, if it wishes to return a value to the caller, should |
| leave the value in \c{AL}, \c{AX} or \c{EAX} depending on the size |
| of the value. Floating-point results are typically returned in |
| \c{ST0}. |
| |
| \b Once the callee has finished processing, it restores \c{ESP} from |
| \c{EBP} if it had allocated local stack space, then pops the previous |
| value of \c{EBP}, and returns via \c{RET} (equivalently, \c{RETN}). |
| |
| \b When the caller regains control from the callee, the function |
| parameters are still on the stack, so it typically adds an immediate |
| constant to \c{ESP} to remove them (instead of executing a number of |
| slow \c{POP} instructions). Thus, if a function is accidentally |
| called with the wrong number of parameters due to a prototype |
| mismatch, the stack will still be returned to a sensible state since |
| the caller, which \e{knows} how many parameters it pushed, does the |
| removing. |
| |
| There is an alternative calling convention used by Win32 programs |
| for Windows API calls, and also for functions called \e{by} the |
| Windows API such as window procedures: they follow what Microsoft |
| calls the \c{__stdcall} convention. This is slightly closer to the |
| Pascal convention, in that the callee clears the stack by passing a |
| parameter to the \c{RET} instruction. However, the parameters are |
| still pushed in right-to-left order. |
| |
| Thus, you would define a function in C style in the following way: |
| |
| \c global _myfunc |
| \c |
| \c _myfunc: |
| \c push ebp |
| \c mov ebp,esp |
| \c sub esp,0x40 ; 64 bytes of local stack space |
| \c mov eax,[ebp+8] ; first parameter to function |
| \c |
| \c ; some more code |
| \c |
| \c leave ; mov esp,ebp / pop ebp |
| \c ret |
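| |
| For comparison, a \c{__stdcall} version of a two-parameter function |
| might look like the sketch below. The name \c{_addtwo@8} is |
| hypothetical; the \c{@8} suffix follows the decoration Microsoft |
| compilers apply to \c{__stdcall} names, which is an assumption about |
| your toolchain rather than anything NASM requires. The only change |
| to the body is that \c{RET} takes the size of the parameters: |
| |
| \c global _addtwo@8 |
| \c |
| \c _addtwo@8: |
| \c push ebp |
| \c mov ebp,esp |
| \c mov eax,[ebp+8] ; first parameter |
| \c add eax,[ebp+12] ; second parameter |
| \c pop ebp |
| \c ret 8 ; callee removes both parameters |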
| |
| At the other end of the process, to call a C function from your |
| assembly code, you would do something like this: |
| |
| \c extern _printf |
| \c |
| \c ; and then, further down... |
| \c |
| \c push dword [myint] ; one of my integer variables |
| \c push dword mystring ; pointer into my data segment |
| \c call _printf |
| \c add esp,byte 8 ; `byte' saves space |
| \c |
| \c ; then those data items... |
| \c |
| \c segment _DATA |
| \c |
| \c myint dd 1234 |
| \c mystring db 'This number -> %d <- should be 1234',10,0 |
| |
| This piece of code is the assembly equivalent of the C code |
| |
| \c int myint = 1234; |
| \c printf("This number -> %d <- should be 1234\n", myint); |
| |
| |
| \S{32cdata} Accessing Data Items |
| |
| To get at the contents of C variables, or to declare variables which |
| C can access, you need only declare the names as \c{GLOBAL} or |
| \c{EXTERN}. (Again, the names require leading underscores, as stated |
| in \k{32cunder}.) Thus, a C variable declared as \c{int i} can be |
| accessed from assembler as |
| |
| \c extern _i |
| \c mov eax,[_i] |
| |
| And to declare your own integer variable which C programs can access |
| as \c{extern int j}, you do this (making sure you are assembling in |
| the \c{_DATA} segment, if necessary): |
| |
| \c global _j |
| \c _j dd 0 |
| |
| To access a C array, you need to know the size of the components of |
| the array. For example, \c{int} variables are four bytes long, so if |
| a C program declares an array as \c{int a[10]}, you can access |
| \c{a[3]} by coding \c{mov eax,[_a+12]}. (The byte offset 12 is obtained |
| by multiplying the desired array index, 3, by the size of the array |
| element, 4.) The sizes of the C base types in 32-bit compilers are: |
| 1 for \c{char}, 2 for \c{short}, 4 for \c{int}, \c{long} and |
| \c{float}, and 8 for \c{double}. Pointers, being 32-bit addresses, |
| are also 4 bytes long. |
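| |
| When the index is only known at run time, the same multiplication |
| can be folded into the effective address. A minimal sketch, assuming |
| the index has already been placed in \c{ECX}: |
| |
| \c extern _a |
| \c |
| \c mov ecx,3 ; array index |
| \c mov eax,[_a+ecx*4] ; a[ecx], scaled by the size of an int |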
| |
| To access a C \i{data structure}, you need to know the offset from |
| the base of the structure to the field you are interested in. You |
| can either do this by converting the C structure definition into a |
| NASM structure definition (using \c{STRUC}), or by calculating the |
| one offset and using just that. |
| |
| To do either of these, you should read your C compiler's manual to |
| find out how it organizes data structures. NASM gives no special |
| alignment to structure members in its own \i\c{STRUC} macro, so you |
| have to specify alignment yourself if the C compiler generates it. |
| Typically, you might find that a structure like |
| |
| \c struct { |
| \c char c; |
| \c int i; |
| \c } foo; |
| |
| might be eight bytes long rather than five, since the \c{int} field |
| would be aligned to a four-byte boundary. However, this sort of |
| feature is sometimes a configurable option in the C compiler, either |
| using command-line options or \c{#pragma} lines, so you have to find |
| out how your own compiler does it. |
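| |
| As a sketch, if the compiler does pad the structure above to eight |
| bytes, a matching NASM definition could use \c{ALIGNB} to insert the |
| padding. The structure name \c{foo_t} is made up for this example, |
| and the four-byte alignment is an assumption you should check |
| against your compiler: |
| |
| \c struc foo_t |
| \c .c: resb 1 ; char c, at offset 0 |
| \c alignb 4 ; three bytes of padding |
| \c .i: resd 1 ; int i, at offset 4 |
| \c endstruc |
| \c |
| \c extern _foo |
| \c |
| \c mov eax,[_foo+foo_t.i] ; load foo.i |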
| |
| |
| \S{32cmacro} \i\c{c32.mac}: Helper Macros for the 32-bit C Interface |
| |
| Included in the NASM archives, in the \I{misc directory}\c{misc} |
| directory, is a file \c{c32.mac} of macros. It defines three macros: |
| \i\c{proc}, \i\c{arg} and \i\c{endproc}. These are intended to be |
| used for C-style procedure definitions, and they automate a lot of |
| the work involved in keeping track of the calling convention. |
| |
| An example of an assembly function using the macro set is given |
| here: |
| |
| \c proc _proc32 |
| \c |
| \c %$i arg |
| \c %$j arg |
| \c mov eax,[ebp + %$i] |
| \c mov ecx,[ebp + %$j] |
| \c add eax,[ecx] |
| \c |
| \c endproc |
| |
| This defines \c{_proc32} to be a procedure taking two arguments, the |
| first (\c{i}) an integer and the second (\c{j}) a pointer to an |
| integer. It returns \c{i + *j}. |
| |
| Note that the \c{arg} macro has an \c{EQU} as the first line of its |
| expansion, and since the label before the macro call gets prepended |
| to the first line of the expanded macro, the \c{EQU} works, defining |
| \c{%$i} to be an offset from \c{EBP}. A context-local variable is |
| used, local to the context pushed by the \c{proc} macro and popped |
| by the \c{endproc} macro, so that the same argument name can be used |
| in later procedures. Of course, you don't \e{have} to do that. |
| |
| \c{arg} can take an optional parameter, giving the size of the |
| argument. If no size is given, 4 is assumed, since it is likely that |
| many function parameters will be of type \c{int} or pointers. |
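| |
| For instance, a function taking a \c{double} followed by an \c{int} |
| might be written as below. This is only a sketch: the name |
| \c{_scale32} is hypothetical, and it assumes that \c{arg} simply |
| advances the running offset by the size it is given. |
| |
| \c proc _scale32 |
| \c |
| \c %$x arg 8 ; a double occupies 8 bytes |
| \c %$n arg ; an int, default size 4 |
| \c fld qword [ebp + %$x] |
| \c fimul dword [ebp + %$n] ; result x * n is left in ST0 |
| \c |
| \c endproc |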
| |
| |
| \H{picdll} Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF \i{Shared |
| Libraries} |
| |
| \I{Linux}\i{ELF} replaced the older \c{a.out} object file format |
| under Linux because it contains support for \i{position-independent |
| code} (\i{PIC}), which makes writing shared libraries much |
| easier. NASM supports the \c{ELF} position-independent code features, |
| so you can write Linux \c{ELF} shared libraries in NASM. |
| |
| \i{NetBSD}, and its close cousins \i{FreeBSD} and \i{OpenBSD}, take |
| a different approach by hacking PIC support into the \c{a.out} |
| format. NASM supports this as the \i\c{aoutb} output format, so you |
| can write \i{BSD} shared libraries in NASM too. |
| |
| The operating system loads a PIC shared library by memory-mapping |
| the library file at an arbitrarily chosen point in the address space |
| of the running process. The contents of the library's code section |
| must therefore not depend on where it is loaded in memory. |
| |
| Therefore, you cannot get at your variables by writing code like |
| this: |
| |
| \c mov eax,[myvar] ; WRONG |
| |
| Instead, the linker provides an area of memory called the |
| \i\e{global offset table}, or \i{GOT}; the GOT is situated at a |
| constant distance from your library's code, so if you can find out |
| where your library is loaded (which is typically done using a |
| \c{CALL} and \c{POP} combination), you can obtain the address of the |
| GOT, and you can then load the addresses of your variables out of |
| linker-generated entries in the GOT. |
| |
| The \e{data} section of a PIC shared library does not have these |
| restrictions: since the data section is writable, it has to be |
| copied into memory anyway rather than just paged in from the library |
| file, so as long as it's being copied it can be relocated too. So |
| you can put ordinary types of relocation in the data section without |
| too much worry (but see \k{picglobal} for a caveat). |
| |
| |
| \S{picgot} Obtaining the Address of the GOT |
| |
| Each code module in your shared library should define the GOT as an |
| external symbol: |
| |
| \c extern _GLOBAL_OFFSET_TABLE_ ; in ELF |
| \c extern __GLOBAL_OFFSET_TABLE_ ; in BSD a.out |
| |
| At the beginning of any function in your shared library which plans |
| to access your data or BSS sections, you must first calculate the |
| address of the GOT. This is typically done by writing the function |
| in this form: |
| |
| \c func: push ebp |
| \c mov ebp,esp |
| \c push ebx |
| \c call .get_GOT |
| \c .get_GOT: |
| \c pop ebx |
| \c add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc |
| \c |
| \c ; the function body comes here |
| \c |
| \c mov ebx,[ebp-4] |
| \c mov esp,ebp |
| \c pop ebp |
| \c ret |
| |
| (For BSD, again, the symbol \c{_GLOBAL_OFFSET_TABLE_} requires a |
| second leading underscore.) |
| |
| The first two lines of this function are simply the standard C |
| prologue to set up a stack frame, and the last three lines are |
| standard C function epilogue. The third line, and the fourth to last |
| line, save and restore the \c{EBX} register, because PIC shared |
| libraries use this register to store the address of the GOT. |
| |
| The interesting bit is the \c{CALL} instruction and the following |
| two lines. The \c{CALL} and \c{POP} combination obtains the address |
| of the label \c{.get_GOT}, without having to know in advance where |
| the program was loaded (since the \c{CALL} instruction is encoded |
| relative to the current position). The \c{ADD} instruction makes use |
| of one of the special PIC relocation types: \i{GOTPC relocation}. |
| With the \i\c{WRT ..gotpc} qualifier specified, the symbol |
| referenced (here \c{_GLOBAL_OFFSET_TABLE_}, the special symbol |
| assigned to the GOT) is given as an offset from the beginning of the |
| section. (Actually, \c{ELF} encodes it as the offset from the operand |
| field of the \c{ADD} instruction, but NASM simplifies this |
| deliberately, so you do things the same way for both \c{ELF} and |
| \c{BSD}.) So the instruction then \e{adds} the beginning of the section, |
| to get the real address of the GOT, and subtracts the value of |
| \c{.get_GOT} which it knows is in \c{EBX}. Therefore, by the time |
| that instruction has finished, \c{EBX} contains the address of the GOT. |
| |
| If you didn't follow that, don't worry: it's never necessary to |
| obtain the address of the GOT by any other means, so you can put |
| those three instructions into a macro and safely ignore them: |
| |
| \c %macro get_GOT 0 |
| \c |
| \c call %%getgot |
| \c %%getgot: |
| \c pop ebx |
| \c add ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc |
| \c |
| \c %endmacro |
| |
| \S{piclocal} Finding Your Local Data Items |
| |
| Having got the GOT, you can then use it to obtain the addresses of |
| your data items. Most variables will reside in the sections you have |
| declared; they can be accessed using the \I{GOTOFF |
| relocation}\c{..gotoff} special \I\c{WRT ..gotoff}\c{WRT} type, |
| like this: |
| |
| \c lea eax,[ebx+myvar wrt ..gotoff] |
| |
| The expression \c{myvar wrt ..gotoff} is calculated, when the shared |
| library is linked, to be the offset to the local variable \c{myvar} |
| from the beginning of the GOT. Therefore, adding it to \c{EBX} as |
| above will place the real address of \c{myvar} in \c{EAX}. |
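| |
| If you want the contents of the variable rather than its address, |
| the same expression can be used directly as a memory operand. A |
| short sketch, assuming \c{EBX} still holds the GOT address: |
| |
| \c mov eax,[ebx+myvar wrt ..gotoff] ; load myvar |
| \c inc eax |
| \c mov [ebx+myvar wrt ..gotoff],eax ; store it back |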
| |
| If you declare variables as \c{GLOBAL} without specifying a size for |
| them, they are shared between code modules in the library, but do |
| not get exported from the library to the program that loaded it. |
| They will still be in your ordinary data and BSS sections, so you |
| can access them in the same way as local variables, using the above |
| \c{..gotoff} mechanism. |
| |
| Note that due to a peculiarity of the way BSD \c{a.out} format |
| handles this relocation type, there must be at least one non-local |
| symbol in the same section as the address you're trying to access. |
| |
| |
| \S{picextern} Finding External and Common Data Items |
| |
| If your library needs to get at an external variable (external to |
| the \e{library}, not just to one of the modules within it), you must |
| use the \I{GOT relocations}\I\c{WRT ..got}\c{..got} type to get at |
| it. The \c{..got} type, instead of giving you the offset from the |
| GOT base to the variable, gives you the offset from the GOT base to |
| a GOT \e{entry} containing the address of the variable. The linker |
| will set up this GOT entry when it builds the library, and the |
| dynamic linker will place the correct address in it at load time. So |
| to obtain the address of an external variable \c{extvar} in \c{EAX}, |
| you would code |
| |
| \c mov eax,[ebx+extvar wrt ..got] |
| |
| This loads the address of \c{extvar} out of an entry in the GOT. The |
| linker, when it builds the shared library, collects together every |
| relocation of type \c{..got}, and builds the GOT so as to ensure it |
| has every necessary entry present. |
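| |
| Since the GOT entry holds only the address, reading the variable |
| itself takes a second load. A sketch, again assuming \c{EBX} holds |
| the GOT address: |
| |
| \c extern extvar |
| \c |
| \c mov eax,[ebx+extvar wrt ..got] ; EAX = address of extvar |
| \c mov eax,[eax] ; EAX = contents of extvar |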
| |
| Common variables must also be accessed in this way. |
| |
| |
| \S{picglobal} Exporting Symbols to the Library User |
| |
| If you want to export symbols to the user of the library, you have to |
| declare whether they are functions or data, and if they are data, you |
| have to give the size of the data item. This is because the dynamic |
| linker has to build \I{PLT}\e{procedure linkage table} entries for any |
| exported functions, and also moves exported data items away from the |
| library's data section in which they were declared. |
| |
| So to export a function to users of the library, you must use |
| |
| \c global func:function ; declare it as a function |
| \c |
| \c func: push ebp |
| \c |
| \c ; etc. |
| |
| And to export a data item such as an array, you would have to code |
| |
| \c global array:data array.end-array ; give the size too |
| \c |
| \c array: resd 128 |
| \c .end: |
| |
| Be careful: if you export a variable to the library user, by |
| declaring it as \c{GLOBAL} and supplying a size, the variable will |
| end up living in the data section of the main program, rather than |
| in your library's data section, where you declared it. So you will |
| have to access your own global variable with the \c{..got} mechanism |
| rather than \c{..gotoff}, as if it were external (which, |
| effectively, it has become). |
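| |
| As a sketch, code inside the library that clears the first element |
| of the exported \c{array} from the example above would then read |
| (assuming \c{EBX} holds the GOT address): |
| |
| \c mov eax,[ebx+array wrt ..got] ; wherever array now lives |
| \c mov dword [eax],0 ; array[0] = 0 |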
| |
| Equally, if you need to store the address of an exported global in |
| one of your data sections, you can't do it by means of the standard |
| sort of code: |
| |
| \c dataptr: dd global_data_item ; WRONG |
| |
| NASM will interpret this code as an ordinary relocation, in which |
| \c{global_data_item} is merely an offset from the beginning of the |
| \c{.data} section (or whatever); so this reference will end up |
| pointing at your data section instead of at the exported global |
| which resides elsewhere. |
| |
| Instead of the above code, then, you must write |
| |
| \c dataptr: dd global_data_item wrt ..sym |
| |
| which makes use of the special \c{WRT} type \I\c{WRT ..sym}\c{..sym} |
| to instruct NASM to search the symbol table for a particular symbol |
| at that address, rather than just relocating by section base. |
| |
| Either method will work for functions: referring to one of your |
| functions by means of |
| |
| \c funcptr: dd my_function |
| |
| will give the user the address of the code you wrote, whereas |
| |
| \c funcptr: dd my_function wrt ..sym |
| |
| will give the address of the procedure linkage table entry for the |
| function, which is where the calling program will \e{believe} the |
| function lives. Either address is a valid way to call the function. |
| |
| |
| \S{picproc} Calling Procedures Outside the Library |
| |
| Calling procedures outside your shared library has to be done by means |
| of a \I{PLT relocations}\e{procedure linkage table}, or \i{PLT}. The |
| PLT is placed at a known offset from where the library is loaded, so |
| the library code can make calls to the PLT in a position-independent |
| way. Within the PLT there is code to jump to offsets contained in the |
| GOT, so function calls to other shared libraries or to routines in the |
| main program can be transparently passed off to their real |
| destinations. |
| |
| To call an external routine, you must use another special PIC |
| relocation type, \I{PLT relocations}\i\c{WRT ..plt}. This is much |
| easier than the GOT-based ones: you simply replace calls such as |
| \c{CALL printf} with the PLT-relative version \c{CALL printf WRT |
| ..plt}. |
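| |
| Putting the pieces together, a complete ELF routine that prints a |
| string through the PLT could look like the following sketch. The |
| names \c{say_hello} and \c{msg} are invented for illustration, the |
| \c{get_GOT} macro is the one from \k{picgot}, and the padding |
| assumes a C library that expects \c{ESP} to be 16-byte aligned at |
| each call: |
| |
| \c extern printf |
| \c extern _GLOBAL_OFFSET_TABLE_ |
| \c |
| \c section .data |
| \c |
| \c msg: db 'hello from the library',10,0 |
| \c |
| \c section .text |
| \c |
| \c global say_hello:function |
| \c |
| \c say_hello: |
| \c push ebx ; EBX must be preserved |
| \c get_GOT ; EBX = address of the GOT |
| \c sub esp,4 ; padding for stack alignment |
| \c lea eax,[ebx+msg wrt ..gotoff] |
| \c push eax ; pointer to the string |
| \c call printf wrt ..plt |
| \c add esp,8 ; remove parameter and padding |
| \c pop ebx |
| \c ret |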
| |
| |
| \S{link} Generating the Library File |
| |
| Having written some code modules and assembled them to \c{.o} files, |
| you then generate your shared library with a command such as |
| |
| \c ld -shared -o library.so module1.o module2.o # for ELF |
| \c ld -Bshareable -o library.so module1.o module2.o # for BSD |
| |
| For ELF, if your shared library is going to reside in system |
| directories such as \c{/usr/lib} or \c{/lib}, it is usually worth |
| using the \i\c{-soname} flag to the linker, to store the final |
| library file name, with a version number, into the library: |
| |
| \c ld -shared -soname library.so.1 -o library.so.1.2 *.o |
| |
| You would then copy \c{library.so.1.2} into the library directory, |
| and create \c{library.so.1} as a symbolic link to it. |