| \C{outfmt} \i{Output Formats} |
| |
| NASM is a portable assembler, designed to be able to compile on any |
| ANSI C-supporting platform and produce output to run on a variety of |
| Intel x86 operating systems. For this reason, it has a large number |
| of available output formats, selected using the \i\c{-f} option on |
| the NASM \i{command line}. Each of these formats, along with its |
| extensions to the base NASM syntax, is detailed in this chapter. |
| |
| As stated in \k{opt-o}, NASM chooses a \i{default name} for your |
| output file based on the input file name and the chosen output |
| format. This will be generated by removing the filename \i{extension} |
| (\c{.asm}, \c{.s}, or whatever you like to use) from the input file |
| name, and substituting an extension defined by the output format. |
| The extensions are given with each format below. |
| |
| |
| \H{binfmt} \i\c{bin}: \i{Flat-Form Binary}\I{pure binary} Output |
| |
| The \c{bin} format does not produce object files: it generates |
| nothing in the output file except the code you wrote. Such `pure |
| binary' files are used by \i{MS-DOS}: \i\c{.COM} executables and |
| \i\c{.SYS} device drivers are pure binary files. Pure binary output |
| is also useful for \i{operating system} and \i{boot loader} |
| development. |
| |
| The \c{bin} format supports \i{multiple section names}. For details of |
| how NASM handles sections in the \c{bin} format, see \k{multisec}. |
| |
| Using the \c{bin} format puts NASM by default into 16-bit mode (see |
| \k{bits}). In order to use \c{bin} to write 32-bit or 64-bit code, |
| such as an OS kernel, you need to explicitly issue the \I\c{BITS}\c{BITS 32} |
| or \I\c{BITS}\c{BITS 64} directive. |
| |
| \c{bin} has no default output file name extension: instead, it |
| leaves your file name as it is once the original extension has been |
| removed. Thus, the default is for NASM to assemble \c{binprog.asm} |
| into a binary file called \c{binprog}. |
| |
| It is important to understand that the binary output format is, in |
| effect, \e{a linker built into the NASM executable.} As such, NASM |
| behaves just as it does when producing any |
| other output format: notably the list file reflects the code output |
| \e{before} relocation, and the addresses in the list file are |
| addresses relative to the start of the current output section. |
| |
| |
| \S{org} \i\c{ORG}: Binary File \i{Program Origin} |
| |
| The \c{bin} format provides an additional directive to the list |
| given in \k{directive}: \c{ORG}. The function of the \c{ORG} |
| directive is to specify the origin address which NASM will assume |
| the program begins at when it is loaded into memory. |
| |
| For example, the following code will generate the longword |
| \c{0x00000104}: |
| |
| \c org 0x100 |
| \c dd label |
| \c label: |
| |
| Unlike the \c{ORG} directive provided by MASM-compatible assemblers, |
| which allows you to jump around in the object file and overwrite |
| code you have already generated, NASM's \c{ORG} does exactly what |
| the directive says: \e{origin}. Its sole function is to specify one |
| offset which is added to all internal address references within the |
| section; it does not permit any of the trickery that MASM's version |
| does. See \k{proborg} for further comments. |
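| |
| As a sketch of typical use, the following is a minimal MS-DOS |
| \c{.COM} program. It relies on the \c{.COM} convention that DOS loads |
| the file at offset \c{0x100} of its segment; the label names are |
| illustrative only. |
| |
| \c         org 0x100               ; .COM files are loaded at offset 0x100 |
| \c |
| \c start:  mov dx, msg             ; offset of msg, origin included |
| \c         mov ah, 9               ; DOS function 09h: print '$'-terminated string |
| \c         int 0x21 |
| \c         mov ax, 0x4C00          ; DOS function 4Ch: exit with code 0 |
| \c         int 0x21 |
| \c |
| \c msg:    db 'Hello, world', 13, 10, '$' |
| |
| Without the \c{ORG} line, \c{msg} would be computed relative to |
| zero, and the string would be read from the wrong offset. |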
| |
| |
| \S{binseg} \c{bin} Extensions to the \c{SECTION} |
| Directive\I{section, bin extensions to} |
| |
| The \c{bin} output format extends the \c{SECTION} (or \c{SEGMENT}) |
| directive to allow you to specify the alignment requirements of |
| segments. This is done by appending the \i\c{ALIGN} qualifier to the |
| end of the section-definition line. For example, |
| |
| \c section .data align=16 |
| |
| switches to the section \c{.data} and also specifies that it must be |
| aligned on a 16-byte boundary. |
| |
| The parameter to \c{ALIGN} is the required alignment in bytes: the |
| section start address is forced to a multiple of it, so that its low |
| bits are zero. The alignment value given may be any power of |
| two.\I{section alignment, in bin}\I{segment alignment, in |
| bin}\I{alignment, in bin sections} |
| |
| |
| \S{multisec} \i{Multisection}\I{bin, multisection} Support for the \c{bin} Format |
| |
| The \c{bin} format allows the use of multiple sections, of arbitrary names, |
| besides the "known" \c{.text}, \c{.data}, and \c{.bss} names. |
| |
| \b Sections may be designated \i\c{progbits} or \i\c{nobits}. Default |
| is \c{progbits} (except \c{.bss}, which defaults to \c{nobits}, |
| of course). |
| |
| \b Sections can be aligned at a specified boundary following the previous |
| section with \c{align=}, or at an arbitrary byte-granular position with |
| \i\c{start=}. |
| |
| \b Sections can be given a virtual start address, which will be used |
| for the calculation of all memory references within that section |
| with \i\c{vstart=}. |
| |
| \b Sections can be ordered using \i\c{follows=}\c{<section>} or |
| \i\c{vfollows=}\c{<section>} as an alternative to specifying an explicit |
| start address. |
| |
| \b Arguments to \c{org}, \c{start}, \c{vstart}, and \c{align=} are |
| critical expressions. See \k{crit}. For example, in the case of |
| \c{align=(1 << ALIGN_SHIFT)}, \c{ALIGN_SHIFT} must be defined before |
| it is used here. |
| |
| \b Any code which comes before an explicit \c{SECTION} directive |
| is directed by default into the \c{.text} section. |
| |
| \b If an \c{ORG} statement is not given, \c{ORG 0} is used |
| by default. |
| |
| \b The \c{.bss} section will be placed after the last \c{progbits} |
| section, unless \c{start=}, \c{vstart=}, \c{follows=}, or \c{vfollows=} |
| has been specified. |
| |
| \b All sections are aligned on dword boundaries, unless a different |
| alignment has been specified. |
| |
| \b Sections may not overlap. |
| |
| \b NASM creates the \c{section.<secname>.start} for each section, |
| which may be used in your code. |
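| |
| Several of these features can be combined. The following is only a |
| sketch: it assumes that the image is loaded at address \c{0x7C00}, |
| that \c{DS} and \c{ES} are both zero and that the direction flag is |
| clear. The section name \c{hicode} and the labels are hypothetical. |
| |
| \c         org 0x7C00 |
| \c |
| \c section .text |
| \c         mov si, section.hicode.start ; where the bytes of hicode sit when loaded |
| \c         mov di, hi_entry             ; where they are meant to run (from vstart) |
| \c         mov cx, hi_end - hi_entry    ; number of bytes to copy |
| \c         rep movsb |
| \c         jmp hi_entry |
| \c |
| \c section hicode vstart=0x8000 follows=.text |
| \c hi_entry: |
| \c         hlt |
| \c         jmp hi_entry |
| \c hi_end: |
| |
| Labels inside \c{hicode} are calculated from its \c{vstart=} address, |
| while \c{section.hicode.start} refers to the position the section |
| actually occupies after \c{.text}. |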
| |
| \S{map}\i{Map Files} |
| |
| Map files can be generated in \c{-f bin} format by means of the \c{[map]} |
| option. Map types of \c{all} (default), \c{brief}, \c{sections}, \c{segments}, |
| or \c{symbols} may be specified. Output may be directed to \c{stdout} |
| (default), \c{stderr}, or a specified file. E.g. |
| \c{[map symbols myfile.map]}. No "user form" exists; the square |
| brackets must be used. |
| |
| |
| \H{ithfmt} \i\c{ith}: \i{Intel Hex} Output |
| |
| The \c{ith} file format produces Intel hex-format files. Like the |
| \c{bin} format, this is a flat memory image format with no support for |
| further relocation or linking. It is usually used with ROM |
| programmers and similar utilities. |
| |
| From a programmer's point of view, this behaves identically to the |
| \c{bin} format; the only difference is the encoding of the |
| output. All extensions supported by the \c{bin} file format are also |
| supported by the \c{ith} file format. |
| |
| \c{ith} provides a default output file-name extension of \c{.ith}. |
| |
| |
| \H{srecfmt} \i\c{srec}: \i{Motorola S-Records} Output |
| |
| The \c{srec} file format produces Motorola S-record files. Like the |
| \c{bin} format, this is a flat memory image format with no support for |
| relocation or linking. It is usually used with ROM programmers and |
| similar utilities. |
| |
| From a programmer's point of view, this behaves identically to the |
| \c{bin} format; the only difference is the encoding of the |
| output. All extensions supported by the \c{bin} file format are also |
| supported by the \c{srec} file format. |
| |
| \c{srec} provides a default output file-name extension of \c{.srec}. |
| |
| |
| \H{objfmt} \i\c{obj}: \i{Microsoft OMF}\I{OMF} Object Files |
| |
| The \c{obj} file format (NASM calls it \c{obj} rather than \c{omf} |
| for historical reasons) is the one produced by \i{MASM} and |
| \i{TASM}, which is typically fed to 16-bit DOS linkers to produce |
| \i\c{.EXE} files. It is also the format used by \i{OS/2}. |
| |
| \c{obj} provides a default output file-name extension of \c{.obj}. |
| |
| \c{obj} is not exclusively a 16-bit format, though; NASM has full |
| support for the 32-bit extensions to the format. In particular, |
| 32-bit \c{obj} format files are used by \i{Borland's Win32 |
| compilers}, instead of using Microsoft's newer \i\c{win32} object |
| file format. |
| |
| The \c{obj} format does not define any special segment names: you |
| can call your segments anything you like. Typical names for segments |
| in \c{obj} format files are \c{CODE}, \c{DATA} and \c{BSS}. |
| |
| If your source file contains code before specifying an explicit |
| \c{SEGMENT} directive, then NASM will invent its own segment called |
| \i\c{__NASMDEFSEG} for you. |
| |
| When you define a segment in an \c{obj} file, NASM defines the |
| segment name as a symbol as well, so that you can access the segment |
| address of the segment. So, for example: |
| |
| \c segment data |
| \c |
| \c dvar: dw 1234 |
| \c |
| \c segment code |
| \c |
| \c function: |
| \c mov ax,data ; get segment address of data |
| \c mov ds,ax ; and move it into DS |
| \c inc word [dvar] ; now this reference will work |
| \c ret |
| |
| The \c{obj} format also enables the use of the \i\c{SEG} and |
| \i\c{WRT} operators, so that you can write code which does things |
| like |
| |
| \c extern foo |
| \c |
| \c mov ax,seg foo ; get preferred segment of foo |
| \c mov ds,ax |
| \c mov ax,data ; a different segment |
| \c mov es,ax |
| \c mov ax,[ds:foo] ; this accesses `foo' |
| \c mov [es:foo wrt data],bx ; so does this |
| |
| |
| \S{objseg} \c{obj} Extensions to the \c{SEGMENT} |
| Directive\I{SEGMENT, obj extensions to} |
| |
| The \c{obj} output format extends the \c{SEGMENT} (or \c{SECTION}) |
| directive to allow you to specify various properties of the segment |
| you are defining. This is done by appending extra qualifiers to the |
| end of the segment-definition line. For example, |
| |
| \c segment code private align=16 |
| |
| defines the segment \c{code}, but also declares it to be a private |
| segment, and requires that the portion of it described in this code |
| module must be aligned on a 16-byte boundary. |
| |
| The available qualifiers are: |
| |
| \b \i\c{PRIVATE}, \i\c{PUBLIC}, \i\c{COMMON} and \i\c{STACK} specify |
| the combination characteristics of the segment. \c{PRIVATE} segments |
| do not get combined with any others by the linker; \c{PUBLIC} and |
| \c{STACK} segments get concatenated together at link time; and |
| \c{COMMON} segments all get overlaid on top of each other rather |
| than stuck end-to-end. |
| |
| \b \i\c{ALIGN} is used, as shown above, to give the required |
| alignment of the segment start address in bytes. The alignment |
| value given may be any power of two from 1 to 4096; in reality, the |
| only values supported are 1, 2, 4, 16, 256 and 4096, so if 8 is |
| specified it will be rounded up to 16, and 32, 64 and 128 will all |
| be rounded up to 256, and so on. Note that alignment to 4096-byte |
| boundaries is a \i{PharLap} extension to the format and may not be |
| supported by all linkers.\I{section alignment, in OBJ}\I{segment |
| alignment, in OBJ}\I{alignment, in OBJ sections} |
| |
| \b \i\c{CLASS} can be used to specify the segment class; this feature |
| indicates to the linker that segments of the same class should be |
| placed near each other in the output file. The class name can be any |
| word, e.g. \c{CLASS=CODE}. |
| |
| \b \i\c{OVERLAY}, like \c{CLASS}, is specified with an arbitrary word |
| as an argument, and provides overlay information to an |
| overlay-capable linker. |
| |
| \b Segments can be declared as \i\c{USE16} or \i\c{USE32}, which has |
| the effect of recording the choice in the object file and also |
| ensuring that NASM's default assembly mode when assembling in that |
| segment is 16-bit or 32-bit respectively. |
| |
| \b When writing \i{OS/2} object files, you should declare 32-bit |
| segments as \i\c{FLAT}, which causes the default segment base for |
| anything in the segment to be the special group \c{FLAT}, and also |
| defines the group if it is not already defined. |
| |
| \b The \c{obj} file format also allows segments to be declared as |
| having a pre-defined absolute segment address, although no linkers |
| are currently known to make sensible use of this feature; |
| nevertheless, NASM allows you to declare a segment such as |
| \c{SEGMENT SCREEN ABSOLUTE=0xB800} if you need to. The \i\c{ABSOLUTE} |
| and \c{ALIGN} keywords are mutually exclusive. |
| |
| NASM's default segment attributes are \c{PUBLIC}, \c{ALIGN=1}, no |
| class, no overlay, and \c{USE16}. |
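| |
| As an illustration, the qualifiers above might be combined as |
| follows. The segment and class names are arbitrary choices for this |
| sketch, not names required by the format. |
| |
| \c segment code  public align=16 class=CODE use16 |
| \c segment data  public align=2  class=DATA |
| \c segment stack stack  align=16 class=STACK |
| \c         resb 512                ; reserve 512 bytes of stack space |
| |
| Any qualifier left out, such as \c{USE16} on the last two lines, |
| takes the default value listed above. |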
| |
| |
| \S{group} \i\c{GROUP}: Defining Groups of Segments\I{segments, groups of} |
| |
| The \c{obj} format also allows segments to be grouped, so that a |
| single segment register can be used to refer to all the segments in |
| a group. NASM therefore supplies the \c{GROUP} directive, whereby |
| you can code |
| |
| \c segment data |
| \c |
| \c ; some data |
| \c |
| \c segment bss |
| \c |
| \c ; some uninitialized data |
| \c |
| \c group dgroup data bss |
| |
| which will define a group called \c{dgroup} to contain the segments |
| \c{data} and \c{bss}. Like \c{SEGMENT}, \c{GROUP} causes the group |
| name to be defined as a symbol, so that you can refer to a variable |
| \c{var} in the \c{data} segment as \c{var wrt data} or as \c{var wrt |
| dgroup}, depending on which segment value is currently in your |
| segment register. |
| |
| If you just refer to \c{var}, however, and \c{var} is declared in a |
| segment which is part of a group, then NASM will default to giving |
| you the offset of \c{var} from the beginning of the \e{group}, not |
| the \e{segment}. Therefore \c{SEG var}, also, will return the group |
| base rather than the segment base. |
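| |
| The difference can be sketched as follows, reusing the \c{dgroup} |
| example above. The labels \c{var} and \c{buf} are illustrative, and |
| the code assumes a 16-bit program. |
| |
| \c segment data |
| \c var:    dw 0 |
| \c |
| \c segment bss |
| \c buf:    resw 1 |
| \c |
| \c group dgroup data bss |
| \c |
| \c segment code |
| \c         mov ax, dgroup |
| \c         mov ds, ax |
| \c         inc word [var]             ; offset of var from the start of dgroup |
| \c         mov ax, data |
| \c         mov es, ax |
| \c         inc word [es:var wrt data] ; offset of var from the start of data |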
| |
| NASM will allow a segment to be part of more than one group, but |
| will generate a warning if you do this. Variables declared in a |
| segment which is part of more than one group will default to being |
| relative to the first group that was defined to contain the segment. |
| |
| A group does not have to contain any segments; you can still make |
| \c{WRT} references to a group which does not contain the variable |
| you are referring to. OS/2, for example, defines the special group |
| \c{FLAT} with no segments in it. |
| |
| \c{GROUP} is cumulative. The above example can be done like this: |
| |
| \c group dgroup data |
| \c group dgroup bss |
| |
| \S{uppercase} \i\c{UPPERCASE}: Disabling Case Sensitivity in Output |
| |
| Although NASM itself is \i{case sensitive}, some OMF linkers are |
| not; therefore it can be useful for NASM to output single-case |
| object files. The \c{UPPERCASE} format-specific directive causes all |
| segment, group and symbol names that are written to the object file |
| to be forced to upper case just before being written. Within a |
| source file, NASM is still case-sensitive; but the object file can |
| be written entirely in upper case if desired. |
| |
| \c{UPPERCASE} is used alone on a line; it requires no parameters. |
| |
| |
| \S{import} \i\c{IMPORT}: Importing DLL Symbols\I{DLL symbols, |
| importing}\I{symbols, importing from DLLs} |
| |
| The \c{IMPORT} format-specific directive defines a symbol to be |
| imported from a DLL, for use if you are writing a DLL's \i{import |
| library} in NASM. You still need to declare the symbol as \c{EXTERN} |
| as well as using the \c{IMPORT} directive. |
| |
| The \c{IMPORT} directive takes two required parameters, separated by |
| white space, which are (respectively) the name of the symbol you |
| wish to import and the name of the library you wish to import it |
| from. For example: |
| |
| \c import WSAStartup wsock32.dll |
| |
| A third optional parameter gives the name by which the symbol is |
| known in the library you are importing it from, in case this is not |
| the same as the name you wish the symbol to be known by to your code |
| once you have imported it. For example: |
| |
| \c import asyncsel wsock32.dll WSAAsyncSelect |
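| |
| Since the symbol must be declared \c{EXTERN} as well, a complete |
| declaration for the first example above would look something like |
| this sketch: |
| |
| \c extern WSAStartup |
| \c import WSAStartup wsock32.dll |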
| |
| |
| \S{export} \i\c{EXPORT}: Exporting DLL Symbols\I{DLL symbols, |
| exporting}\I{symbols, exporting from DLLs} |
| |
| The \c{EXPORT} format-specific directive defines a global symbol to |
| be exported as a DLL symbol, for use if you are writing a DLL in |
| NASM. You still need to declare the symbol as \c{GLOBAL} as well as |
| using the \c{EXPORT} directive. |
| |
| \c{EXPORT} takes one required parameter, which is the name of the |
| symbol you wish to export, as it was defined in your source file. An |
| optional second parameter (separated by white space from the first) |
| gives the \e{external} name of the symbol: the name by which you |
| wish the symbol to be known to programs using the DLL. If this name |
| is the same as the internal name, you may leave the second parameter |
| off. |
| |
| Further parameters can be given to define attributes of the exported |
| symbol. These parameters, like the second, are separated by white |
| space. If further parameters are given, the external name must also |
| be specified, even if it is the same as the internal name. The |
| available attributes are: |
| |
| \b \c{resident} indicates that the exported name is to be kept |
| resident by the system loader. This is an optimization for |
| frequently used symbols imported by name. |
| |
| \b \c{nodata} indicates that the exported symbol is a function which |
| does not make use of any initialized data. |
| |
| \b \c{parm=NNN}, where \c{NNN} is an integer, sets the number of |
| parameter words for the case in which the symbol is a call gate |
| between 32-bit and 16-bit segments. |
| |
| \b An attribute which is just a number indicates that the symbol |
| should be exported with an identifying number (ordinal), and gives |
| the desired number. |
| |
| For example: |
| |
| \c export myfunc |
| \c export myfunc TheRealMoreFormalLookingFunctionName |
| \c export myfunc myfunc 1234 ; export by ordinal |
| \c export myfunc myfunc resident parm=23 nodata |
| |
| |
| \S{dotdotstart} \i\c{..start}: Defining the \i{Program Entry |
| Point} |
| |
| \c{OMF} linkers require exactly one of the object files being linked to |
| define the program entry point, where execution will begin when the |
| program is run. If the object file that defines the entry point is |
| assembled using NASM, you specify the entry point by declaring the |
| special symbol \c{..start} at the point where you wish execution to |
| begin. |
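| |
| For example, a very small 16-bit DOS program might mark its entry |
| point as in the following sketch. The segment names and the stack |
| size are arbitrary, and the program does nothing but exit. |
| |
| \c segment code |
| \c |
| \c ..start: |
| \c         mov ax, 0x4C00          ; DOS function 4Ch: exit with code 0 |
| \c         int 0x21 |
| \c |
| \c segment stack stack |
| \c         resb 256                ; reserve 256 bytes of stack space |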
| |
| |
| \S{objextern} \c{obj} Extensions to the \c{EXTERN} |
| Directive\I{EXTERN, obj extensions to} |
| |
| If you declare an external symbol with the directive |
| |
| \c extern foo |
| |
| then references such as \c{mov ax,foo} will give you the offset of |
| \c{foo} from its preferred segment base (as specified in whichever |
| module \c{foo} is actually defined in). So to access the contents of |
| \c{foo} you will usually need to do something like |
| |
| \c mov ax,seg foo ; get preferred segment base |
| \c mov es,ax ; move it into ES |
| \c mov ax,[es:foo] ; and use offset `foo' from it |
| |
| This is a little unwieldy, particularly if you know that an external |
| is going to be accessible from a given segment or group, say |
| \c{dgroup}. So if \c{DS} already contained \c{dgroup}, you could |
| simply code |
| |
| \c mov ax,[foo wrt dgroup] |
| |
| However, having to type this every time you want to access \c{foo} |
| can be a pain; so NASM allows you to declare \c{foo} in the |
| alternative form |
| |
| \c extern foo:wrt dgroup |
| |
| This form causes NASM to pretend that the preferred segment base of |
| \c{foo} is in fact \c{dgroup}; so the expression \c{seg foo} will |
| now return \c{dgroup}, and the expression \c{foo} is equivalent to |
| \c{foo wrt dgroup}. |
| |
| This \I{default-WRT mechanism}default-\c{WRT} mechanism can be used |
| to make externals appear to be relative to any group or segment in |
| your program. It can also be applied to common variables: see |
| \k{objcommon}. |
| |
| |
| \S{objcommon} \c{obj} Extensions to the \c{COMMON} |
| Directive\I{COMMON, obj extensions to} |
| |
| The \c{obj} format allows common variables to be either near\I{near |
| common variables} or far\I{far common variables}; NASM allows you to |
| specify which your variables should be by the use of the syntax |
| |
| \c common nearvar 2:near ; `nearvar' is a near common |
| \c common farvar 10:far ; and `farvar' is far |
| |
| Far common variables may be greater in size than 64KB, and so the |
| OMF specification says that they are declared as a number of |
| \e{elements} of a given size. So a 10-byte far common variable could |
| be declared as ten one-byte elements, five two-byte elements, two |
| five-byte elements or one ten-byte element. |
| |
| Some \c{OMF} linkers require the \I{element size, in common |
| variables}\I{common variables, element size}element size, as well as |
| the variable size, to match when resolving common variables declared |
| in more than one module. Therefore NASM must allow you to specify |
| the element size on your far common variables. This is done by the |
| following syntax: |
| |
| \c common c_5by2 10:far 5 ; two five-byte elements |
| \c common c_2by5 10:far 2 ; five two-byte elements |
| |
| If no element size is specified, the default is 1. Also, the \c{FAR} |
| keyword is not required when an element size is specified, since |
| only far commons may have element sizes at all. So the above |
| declarations could equivalently be |
| |
| \c common c_5by2 10:5 ; two five-byte elements |
| \c common c_2by5 10:2 ; five two-byte elements |
| |
| In addition to these extensions, the \c{COMMON} directive in \c{obj} |
| also supports default-\c{WRT} specification like \c{EXTERN} does |
| (explained in \k{objextern}). So you can also declare things like |
| |
| \c common foo 10:wrt dgroup |
| \c common bar 16:far 2:wrt data |
| \c common baz 24:wrt data:6 |
| |
| |
| \S{objdepend} Embedded File Dependency Information |
| |
| Since NASM 2.13.02, \c{obj} files contain embedded dependency file |
| information. To suppress the generation of dependencies, use |
| |
| \c %pragma obj nodepend |
| |
| |
| \H{obj2fmt} \i\c{obj2}: \i{OS/2 32-bit OMF}\I{OMF} Object Files |
| |
| The \c{obj2} output format is the same as \c{obj} except: |
| |
| \b Default attributes for a segment are \c{ALIGN=16} and \c{USE32}. |
| |
| \b All 32-bit segments are implicitly added to the \c{FLAT} group. |
| |
| \b Unix-style section names such as \c{.text}, \c{.rodata}, \c{.data} |
| and \c{.bss} are supported for compatibility with Unix platforms; they |
| are aliased to \c{TEXT32}, \c{CONST32}, \c{DATA32} and \c{BSS32}, respectively. |
| |
| \b Default classes are set implicitly for known segments such as |
| \c{TEXT32}, \c{CONST32}, \c{DATA32} and \c{BSS32}. |
| |
| The defaults assumed by NASM if you do not specify the qualifiers are: |
| |
| \c SECTION .text ALIGN=16 USE32 CLASS=CODE FLAT |
| \c SECTION .rodata ALIGN=16 USE32 CLASS=CONST FLAT |
| \c SECTION .data ALIGN=16 USE32 CLASS=DATA FLAT |
| \c SECTION .bss ALIGN=16 USE32 CLASS=BSS FLAT |
| \c SECTION CODE ALIGN=16 USE32 CLASS=CODE FLAT |
| \c SECTION TEXT ALIGN=16 USE32 CLASS=CODE FLAT |
| \c SECTION CONST ALIGN=16 USE32 CLASS=CONST FLAT |
| \c SECTION DATA ALIGN=16 USE32 CLASS=DATA FLAT |
| \c SECTION BSS ALIGN=16 USE32 CLASS=BSS FLAT |
| \c SECTION STACK ALIGN=16 USE32 CLASS=STACK FLAT |
| \c SECTION CODE32 ALIGN=16 USE32 CLASS=CODE FLAT |
| \c SECTION TEXT32 ALIGN=16 USE32 CLASS=CODE FLAT |
| \c SECTION CONST32 ALIGN=16 USE32 CLASS=CONST FLAT |
| \c SECTION DATA32 ALIGN=16 USE32 CLASS=DATA FLAT |
| \c SECTION BSS32 ALIGN=16 USE32 CLASS=BSS FLAT |
| \c SECTION STACK32 ALIGN=16 USE32 CLASS=STACK FLAT |
| |
| |
| \H{win32fmt} \i\c{win32}: Microsoft Win32 Object Files |
| |
| The \c{win32} output format generates Microsoft Win32 object files, |
| suitable for passing to Microsoft linkers such as \i{Visual C++}. |
| Note that Borland Win32 compilers do not use this format, but use |
| \c{obj} instead (see \k{objfmt}). |
| |
| \c{win32} provides a default output file-name extension of \c{.obj}. |
| |
| Note that although Microsoft say that Win32 object files follow the |
| \c{COFF} (Common Object File Format) standard, the object files produced |
| by Microsoft Win32 compilers are not compatible with COFF linkers |
| such as DJGPP's, and vice versa. This is due to a difference of |
| opinion over the precise semantics of PC-relative relocations. To |
| produce COFF files suitable for DJGPP, use NASM's \c{coff} output |
| format; conversely, the \c{coff} format does not produce object |
| files that Win32 linkers can generate correct output from. |
| |
| |
| \S{win32sect} \c{win32} Extensions to the \c{SECTION} |
| Directive\I{SECTION, Windows extensions to} |
| |
| Like the \c{obj} format, \c{win32} allows you to specify additional |
| information on the \c{SECTION} directive line, to control the type |
| and properties of sections you declare. Section types and properties |
| are generated automatically by NASM for the \i{standard section names} |
| \c{.text}, \c{.data} and \c{.bss}, but may still be overridden by |
| these qualifiers. |
| |
| The available qualifiers are: |
| |
| \b \c{code}, or equivalently \c{text}, defines the section to be a |
| code section. This marks the section as readable and executable, but |
| not writable, and also indicates to the linker that the type of the |
| section is code. |
| |
| \b \c{data} and \c{bss} define the section to be a data section, |
| analogously to \c{code}. Data sections are marked as readable and |
| writable, but not executable. \c{data} declares an initialized data |
| section, whereas \c{bss} declares an uninitialized data section. |
| |
| \b \c{rdata} declares an initialized data section that is readable |
| but not writable. Microsoft compilers place constants in this |
| section. |
| |
| \b \c{info} defines the section to be an \i{informational section}, |
| which is not included in the executable file by the linker, but may |
| (for example) pass information \e{to} the linker. For example, |
| declaring an \c{info}-type section called \i\c{.drectve} causes the |
| linker to interpret the contents of the section as command-line |
| options. |
| |
| \b \c{align=}, used with a trailing number as in \c{obj}, gives the |
| \I{section alignment, in win32}\I{alignment, in win32 |
| sections}alignment requirements of the section. The maximum you may |
| specify is 64: the Win32 object file format contains no means to |
| request a greater section alignment than this. If alignment is not |
| explicitly specified, the defaults are 16-byte alignment for code |
| sections, 8-byte alignment for rdata sections and 4-byte alignment |
| for data (and BSS) sections. |
| Informational sections get a default alignment of 1 byte (no |
| alignment), though the value does not matter. |
| |
| \b \I{comdat, win32 attribute}\c{comdat=}, followed by a number |
| ("selection"), colon (acting as a separator) and a name, |
| marks the section as a \I{COMDAT section, in win32}"COMDAT section". |
| It allows Microsoft linkers to perform function-level linking, |
| to deal with multiply defined symbols, and to eliminate dead code/data. |
| |
| The "selection" number should be one of the |
| \c{IMAGE_COMDAT_SELECT_*} constants from |
| \W{https://github.com/MicrosoftDocs/win32/blob/docs/desktop-src/Debug/pe-format.md#comdat-sections-object-only}\c{COFF format specification}; |
| this value controls whether the linker allows multiply defined symbols |
| and how it handles them. |
| |
| The name is the \I{COMDAT symbol, in win32}"COMDAT symbol" |
| - basically a new name for the section. So even though you have one |
| section given by the main name (e.g. \c{.text}), it can actually |
| consist of hundreds of COMDAT sections having their own name |
| (and alignment). |
| |
| When the "selection" is \c{IMAGE_COMDAT_SELECT_ASSOCIATIVE} (5), |
| the following name is the "COMDAT symbol" of the associated COMDAT |
| section; this way you can link a piece of code or data only when |
| another piece of code or data is actually linked. |
| |
| \> So, when linking a NASM-compiled file with some C code, |
| the source may be structured as follows. |
| Note that the default \c{.text} section is handled in a special |
| way and it doesn't work well with \c{comdat}; you may want to append |
| a \c{$} character and an arbitrary suffix to the section name. |
| It will get linked into the \c{.text} section anyway - see the info on |
| \W{https://github.com/MicrosoftDocs/win32/blob/docs/desktop-src/Debug/pe-format.md#grouped-sections-object-only}\c{Grouped Sections}. |
| |
| \c section .text$1 align=16 comdat=1:FirstFnc |
| \c ... ; Code linked only if referenced from C |
| \c |
| \c section .text$1 align=16 comdat=1:SecondFnc |
| \c ... ; Code linked only if referenced from C |
| \c |
| \c section .rdata align=32 comdat=5:FirstFnc |
| \c ... ; Data linked only if the related code (FirstFnc) is linked |
| \c |
| |
| The defaults assumed by NASM if you do not specify the above |
| qualifiers are: |
| |
| \c section .text code align=16 |
| \c section .data data align=4 |
| \c section .rdata rdata align=8 |
| \c section .bss bss align=4 |
| |
| The \c{win64} format also adds: |
| |
| \c section .pdata rdata align=4 |
| \c section .xdata rdata align=8 |
| |
| Any other section name is treated by default like \c{.text}. |
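| |
| Since other names are treated as code by default, a section with a |
| name of your own needs a qualifier if it is to hold data. In the |
| following sketch, the names \c{consts} and \c{scratch} are |
| hypothetical: |
| |
| \c section .text                   ; code, 16-byte aligned by default |
| \c section consts  rdata align=32  ; read-only data under a custom name |
| \c section scratch bss   align=64  ; uninitialized data, maximum alignment |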
| |
| \S{win32safeseh} \c{win32}: Safe Structured Exception Handling |
| |
| Among other improvements in Windows XP SP2 and Windows Server 2003, |
| Microsoft has introduced the concept of "safe structured exception |
| handling." The general idea is to collect handlers' entry points |
| in a designated read-only table and have SEH entry points verified |
| against this table before exception control is passed to the |
| corresponding handler. In order for an executable module to be |
| equipped with this read-only table, all object modules on the linker |
| command line have to comply with certain criteria. If even a single |
| module among them does not, then the table in question is omitted |
| and the above-mentioned run-time checks will not be performed for the |
| application in question. Table omission is silent by default and |
| can therefore easily be missed. One can instruct the linker to |
| refuse to produce a binary without such a table by passing the |
| \c{/safeseh} command line option. |
| |
| Without regard to this run-time check, it's natural to expect |
| NASM to be capable of generating modules suitable for \c{/safeseh} |
| linking. From the developer's viewpoint the problem is two-fold: |
| |
| \b how to adapt modules not deploying exception handlers of their own; |
| |
| \b how to adapt/develop modules utilizing custom exception handling. |
| |
| The former can be easily achieved with any NASM version by adding the |
| following line to the source code: |
| |
| \c $@feat.00 equ 1 |
| |
| As of version 2.03 NASM adds this absolute symbol automatically, if |
| it is not already present (in which case the developer can choose to |
| assign another value, if desired, for whatever reason). |
| |
| Registering a custom exception handler on the other hand requires |
| certain "magic." As of version 2.03, an additional \c{safeseh} directive |
| is implemented, which instructs the assembler to produce appropriately |
| formatted input data for the above-mentioned "safe exception handler |
| table." Its typical use would be: |
| |
| \c section .text |
| \c extern _MessageBoxA@16 |
| \c %if __?NASM_VERSION_ID?__ >= 0x02030000 |
| \c safeseh handler ; register handler as "safe handler" |
| \c %endif |
| \c handler: |
| \c push DWORD 1 ; MB_OKCANCEL |
| \c push DWORD caption |
| \c push DWORD text |
| \c push DWORD 0 |
| \c call _MessageBoxA@16 |
| \c sub eax,1 ; incidentally suits as return value |
| \c ; for exception handler |
| \c ret |
| \c global _main |
| \c _main: |
| \c push DWORD handler |
| \c push DWORD [fs:0] |
| \c mov DWORD [fs:0],esp ; engage exception handler |
| \c xor eax,eax |
| \c mov eax,DWORD[eax] ; cause exception |
| \c pop DWORD [fs:0] ; disengage exception handler |
| \c add esp,4 |
| \c ret |
| \c text: db 'OK to rethrow, CANCEL to generate core dump',0 |
| \c caption:db 'SEGV',0 |
| \c |
| \c section .drectve info |
| \c db '/defaultlib:user32.lib /defaultlib:msvcrt.lib ' |
| |
| As you might imagine, it's perfectly possible to produce an .exe binary |
| with the "safe exception handler table" and yet invoke an unregistered |
| exception handler. A handler is invoked by manipulating \c{[fs:0]} |
| at run-time, something the linker has no power over. It is therefore |
| important to note that such failure to register a handler's entry point |
| with the \c{safeseh} directive will have undesired side effects at |
| run-time. If an exception is raised and an unregistered handler is to be |
| executed, the application is abruptly terminated without any notification |
| whatsoever. One can argue that the system should at least log some kind |
| of "non-safe exception handler in x.exe at address n" message in the |
| event log, but unfortunately the user is left without any clue as to |
| what might have caused the crash. |
| |
| Finally, all mentions of the linker in this paragraph refer to the Microsoft |
| linker version 7.x and later. The presence of the \c{@feat.00} symbol and input |
| data for the "safe exception handler table" causes no backward |
| incompatibilities, and "safeseh" modules generated by NASM 2.03 and |
| later can still be linked by earlier versions or non-Microsoft linkers. |
| |
| \S{codeview} Debugging formats for Windows |
| \I{Windows debugging formats} |
| |
| The \c{win32} and \c{win64} formats support the Microsoft \i{CodeView |
| debugging format}. Currently CodeView version 8 format is supported |
| (\i\c{cv8}), but newer versions of the CodeView debugger should be |
| able to handle this format as well. |
| |
| |
| \H{win64fmt} \i\c{win64}: Microsoft Win64 Object Files |
| |
| The \c{win64} output format generates Microsoft Win64 object files. |
| The format is nearly identical to the \c{win32} object format |
| (\k{win32fmt}), except that it targets 64-bit code and the x86-64 |
| platform. Apart from that difference, NASM uses it in exactly the same |
| way as the \c{win32} object format. |
| |
| \S{win64pic} \c{win64}: Writing Position-Independent Code |
| |
| While \c{REL} takes good care of RIP-relative addressing, there is one |
| aspect that is easy to overlook for a Win64 programmer: indirect |
| references. Consider a switch dispatch table: |
| |
| \c jmp qword [dsptch+rax*8] |
| \c ... |
| \c dsptch: dq case0 |
| \c dq case1 |
| \c ... |
| |
| Even a novice Win64 assembler programmer will soon realize that the code |
| is not 64-bit savvy. Most notably the linker will refuse to link it, showing: |
| |
| \c 'ADDR32' relocation to '.text' invalid without /LARGEADDRESSAWARE:NO |
| |
| So [s]he will have to split the \c{jmp} instruction as follows: |
| |
| \c lea rbx,[rel dsptch] |
| \c jmp qword [rbx+rax*8] |
| |
| What happens behind the scenes is that the effective address in \c{lea} |
| is encoded relative to the instruction pointer, in a perfectly |
| position-independent manner. But this is only part of the problem! |
| The issue is that in a .dll context, the \c{caseN} relocations will make |
| their way to the final module and might have to be adjusted at .dll load |
| time (specifically, when it can't be loaded at the preferred address). |
| When this occurs, pages with such relocations will be rendered private |
| to the current process, which undermines the idea of a shared .dll. |
| But not to worry, it's trivial to fix: |
| |
| \c lea rbx,[rel dsptch] |
| \c add rbx,[rbx+rax*8] |
| \c jmp rbx |
| \c ... |
| \c dsptch: dq case0-dsptch |
| \c dq case1-dsptch |
| \c ... |
| |
| NASM version 2.03 and later provides another alternative, the \c{wrt |
| ..imagebase} operator, which returns an offset from the base address of the |
| current image, be it an .exe or .dll module, hence the name. For those |
| acquainted with the PE-COFF format, this base address denotes the start of |
| the \c{IMAGE_DOS_HEADER} structure. Here is how to implement a switch |
| statement with these image-relative references: |
| |
| \c lea rbx,[rel dsptch] |
| \c mov eax,[rbx+rax*4] |
| \c sub rbx,dsptch wrt ..imagebase |
| \c add rbx,rax |
| \c jmp rbx |
| \c ... |
| \c dsptch: dd case0 wrt ..imagebase |
| \c dd case1 wrt ..imagebase |
| |
| That said, the snippet before last works just fine with any NASM version |
| and is not even Windows specific, which makes this operator unnecessary |
| in this case. The real reason for the \c{wrt ..imagebase} operator will |
| become apparent in the next section. |
| |
| It should be noted that \c{wrt ..imagebase} is defined as a 32-bit |
| operand only: |
| |
| \c dd label wrt ..imagebase ; ok |
| \c dq label wrt ..imagebase ; bad |
| \c mov eax,label wrt ..imagebase ; ok |
| \c mov rax,label wrt ..imagebase ; bad |
| |
| \S{win64seh} \c{win64}: Structured Exception Handling |
| |
| Structured exception handling in Win64 is completely different compared |
| to Win32. When an exception occurs, the program counter is noted, and a |
| linker-generated table containing start and end addresses of all the |
| functions (in a given executable module) is traversed and compared to |
| the saved program counter. This is used to identify the corresponding |
| \c{UNWIND_INFO} structure. If missing, then the offending subroutine is |
| assumed to be "leaf" and this lookup procedure is instead attempted for |
| its caller. In Win64, a leaf function is a function that neither calls |
| any other function \e{nor} modifies any Win64 non-volatile registers, |
| including the stack pointer. The latter ensures that it's possible to |
| identify a leaf function's caller by simply pulling the value from the |
| top of the stack. |
| |
| While the majority of subroutines written in assembler do not call |
| any other functions, they may not qualify as "leaf" functions in the |
| Win64 sense. The requirement for non-volatile registers to be |
| unchanged leaves the developer with not more than 7 registers and no |
| stack frame, which is not necessarily what they counted on. |
| Customarily one would meet this requirement by saving non-volatile |
| registers on stack and restoring them upon return. However, if (and |
| only if) an exception is raised at run-time and no \c{UNWIND_INFO} |
| structure is associated with such a "leaf" function, the stack unwind |
| procedure will expect to find the caller's return address on the top of |
| the stack immediately followed by its frame. Given that the developer |
| pushed the caller's non-volatile registers onto the stack, the value |
| on top will no longer point to the right place. The developer can |
| attempt to copy the caller's return address to the top of stack, which |
| would work in some very specific circumstances. But unless the |
| developer can guarantee that these circumstances are always met, it's |
| more appropriate to assume the worst, i.e. the stack unwind procedure |
| goes berserk, abruptly terminating without any notification whatsoever |
| (just like in the Win32 case). |
| |
| Now that we understand the significance of the \c{UNWIND_INFO} structure, |
| let us discuss what is in it and how it is processed. First, it is |
| checked for the presence of a reference to a custom language-specific |
| exception handler. If there is one, then it is invoked. Depending on |
| the return value, execution flow is resumed (exception is said to be |
| "handled"), \e{or} the rest of the \c{UNWIND_INFO} structure is |
| processed as follows. Aside from an optional reference to a custom |
| handler, it carries information about the current callee's stack frame |
| and where non-volatile registers are saved. The information is detailed |
| enough to be able to reconstruct the contents of the caller's |
| non-volatile registers on entry to the current callee. And so the |
| caller's context is reconstructed, at which point the unwind procedure |
| is repeated, using the \c{UNWIND_INFO} structure associated with the |
| caller's instruction pointer. The procedure is repeated recursively |
| until the exception is handled. As a last resort, the system "handles" |
| it by generating a memory dump and terminating the application. |
| |
| As of this writing, NASM unfortunately does not facilitate generation |
| of the above-mentioned detailed information about stack frame layout. But |
| as of version 2.03, it implements building blocks for generating the |
| structures involved in stack unwinding. Here is a simple example |
| showing how to deploy a custom exception handler for a leaf function: |
| |
| \c default rel |
| \c section .text |
| \c extern MessageBoxA |
| \c handler: |
| \c sub rsp,40 |
| \c mov rcx,0 |
| \c lea rdx,[text] |
| \c lea r8,[caption] |
| \c mov r9,1 ; MB_OKCANCEL |
| \c call MessageBoxA |
| \c sub eax,1 ; incidentally suits as return value |
| \c ; for exception handler |
| \c add rsp,40 |
| \c ret |
| \c global main |
| \c main: |
| \c xor rax,rax |
| \c mov rax,QWORD[rax] ; cause exception |
| \c ret |
| \c main_end: |
| \c text: db 'OK to rethrow, CANCEL to generate core dump',0 |
| \c caption:db 'SEGV',0 |
| \c |
| \c section .pdata rdata align=4 |
| \c dd main wrt ..imagebase |
| \c dd main_end wrt ..imagebase |
| \c dd xmain wrt ..imagebase |
| \c section .xdata rdata align=8 |
| \c xmain: db 9,0,0,0 |
| \c dd handler wrt ..imagebase |
| \c section .drectve info |
| \c db '/defaultlib:user32.lib /defaultlib:msvcrt.lib ' |
| |
| What you see is that the \c{.pdata} section contains a single-element |
| table, containing function start and end addresses, along with references |
| to associated \c{UNWIND_INFO} structures (only one in this case). The |
| \c{.xdata} section contains the referenced \c{UNWIND_INFO} structure, |
| describing a function with no frame, but with a designated exception handler. |
| These references are \e{required} to be image-relative (which is the real |
| reason for implementing the \c{wrt ..imagebase} operator). It should be |
| noted that \c{rdata align=n}, as well as \c{wrt ..imagebase}, are actually |
| optional in the context of these two segments (they apply even when omitted); |
| \e{all} 32-bit references placed into these two segments will be image-relative. |
| This is important to understand, as the developer is allowed to append |
| handler-specific data to the \c{UNWIND_INFO} structure, and any 32-bit |
| references that are added may require adjustment to obtain the real pointer. |
| |
| As already mentioned, in Win64 terms, a leaf function is one that neither |
| calls any other function \e{nor} modifies any non-volatile registers, |
| including the stack pointer. But it is not uncommon for the programmer |
| to intend to utilize every single register and sometimes even have a |
| variable stack frame, requiring a more complicated \c{UNWIND_INFO} structure |
| than in the example above. Is there anything one can do with these simpler |
| building blocks, and avoid manually composing fully-fledged \c{UNWIND_INFO} |
| structures, which would surely be considered error-prone? Yes, there is. |
| Recall that an exception handler is called first, before the stack layout |
| is analyzed. As it turns out, it is perfectly possible to manipulate |
| the current callee's context in a custom handler in a manner that permits |
| further stack unwinding. The general idea is that the handler would not |
| actually "handle" the exception, but instead restore the callee's context |
| (restore to state at entry point) and thus mimic a Win64 leaf function. |
| In other words, the handler would effectively undertake part of the |
| unwinding procedure. Consider the following example: |
| |
| \c function: |
| \c mov rax,rsp ; copy rsp to volatile register |
| \c push r15 ; save non-volatile registers |
| \c push rbx |
| \c push rbp |
| \c mov r11,rsp ; prepare variable stack frame |
| \c sub r11,rcx |
| \c and r11,-64 |
| \c mov QWORD[r11],rax ; check for exceptions |
| \c mov rsp,r11 ; allocate stack frame |
| \c mov QWORD[rsp],rax ; save original rsp value |
| \c magic_point: |
| \c ... |
| \c mov r11,QWORD[rsp] ; pull original rsp value |
| \c mov rbp,QWORD[r11-24] |
| \c mov rbx,QWORD[r11-16] |
| \c mov r15,QWORD[r11-8] |
| \c mov rsp,r11 ; destroy frame |
| \c ret |
| |
| The key is that until \c{magic_point}, the original \c{rsp} value |
| remains in the chosen volatile register, and no non-volatile register |
| except for \c{rsp} is modified. After \c{magic_point}, \c{rsp} remains |
| constant till the very end of the \c{function}. In this case a custom |
| language-specific exception handler would look like this: |
| |
| \c EXCEPTION_DISPOSITION handler (EXCEPTION_RECORD *rec,ULONG64 frame, |
| \c CONTEXT *context,DISPATCHER_CONTEXT *disp) |
| \c { ULONG64 *rsp; |
| \c if (context->Rip<(ULONG64)magic_point) |
| \c rsp = (ULONG64 *)context->Rax; |
| \c else |
| \c { rsp = ((ULONG64 **)context->Rsp)[0]; |
| \c context->Rbp = rsp[-3]; |
| \c context->Rbx = rsp[-2]; |
| \c context->R15 = rsp[-1]; |
| \c } |
| \c context->Rsp = (ULONG64)rsp; |
| \c |
| \c memcpy (disp->ContextRecord,context,sizeof(CONTEXT)); |
| \c RtlVirtualUnwind(UNW_FLAG_NHANDLER,disp->ImageBase, |
| \c disp->ControlPc,disp->FunctionEntry,disp->ContextRecord, |
| \c &disp->HandlerData,&disp->EstablisherFrame,NULL); |
| \c return ExceptionContinueSearch; |
| \c } |
| |
| As this custom handler allows the example function to mimic a Win64 leaf |
| function, the corresponding \c{UNWIND_INFO} structure does not need to |
| contain any information about the stack frame and its layout. |
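| |
| As a sketch only, and by analogy with the \c{main} example earlier in this |
| section, the unwind data for \c{function} might then look like the |
| following. The labels \c{function_end} (assumed to be placed immediately |
| after the final \c{ret}) and \c{xfunction} are hypothetical names |
| introduced for this illustration, and \c{handler} is assumed to be the |
| C routine above, linked in from a separate object file: |
| |
| \c         extern  handler |
| \c section .pdata  rdata align=4 |
| \c         dd      function wrt ..imagebase |
| \c         dd      function_end wrt ..imagebase |
| \c         dd      xfunction wrt ..imagebase |
| \c section .xdata  rdata align=8 |
| \c xfunction: db   9,0,0,0 ; same header as xmain: handler, no unwind codes |
| \c         dd      handler wrt ..imagebase |
| |
| Since the C code refers to \c{magic_point}, that label would also have |
| to be declared \c{global} in the assembly source. |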
| |
| \H{cofffmt} \i\c{coff}: \i{Common Object File Format} |
| |
| The \c{coff} output type produces \c{COFF} object files suitable for |
| linking with the \i{DJGPP} linker. |
| |
| \c{coff} provides a default output file-name extension of \c{.o}. |
| |
| The \c{coff} format supports the same extensions to the \c{SECTION} |
| directive as \c{win32} does, except that the \c{align} qualifier and |
| the \c{info} section type are not supported. |
| |
| \H{machofmt} \I{Mach-O}\i\c{macho32} and \i\c{macho64}: \i{Mach Object File Format} |
| |
| The \c{macho32} and \c{macho64} output formats produce Mach-O |
| object files suitable for linking with the \i{MacOS X} linker. |
| \i\c{macho} is a synonym for \c{macho32}. |
| |
| \c{macho} provides a default output file-name extension of \c{.o}. |
| |
| \S{machosect} \c{macho} extensions to the \c{SECTION} Directive |
| \I{SECTION, macho extensions to} |
| |
| The \c{macho} output format specifies section names in the format |
| "\e{segment}\c{,}\e{section}". No spaces are allowed around the |
| comma. The following flags can also be specified: |
| |
| \b \c{data} - this section contains initialized data items |
| |
| \b \c{code} - this section contains code exclusively |
| |
| \b \c{mixed} - this section contains both code and data |
| |
| \b \c{bss} - this section is uninitialized and filled with zero |
| |
| \b \c{zerofill} - same as \c{bss} |
| |
| \b \c{no_dead_strip} - inhibit dead code stripping for this section |
| |
| \b \c{live_support} - set the live support flag for this section |
| |
| \b \c{strip_static_syms} - strip static symbols for this section |
| |
| \b \c{debug} - this section contains debugging information |
| |
| \b \c{align=}\e{alignment} - specify section alignment |
| |
| The default is \c{data}, unless the section name is \c{__text} or |
| \c{__bss} in which case the default is \c{text} or \c{bss}, |
| respectively. |
| |
| For compatibility with other Unix platforms, the following standard |
| names are also supported: |
| |
| \c .text = __TEXT,__text text |
| \c .rodata = __DATA,__const data |
| \c .data = __DATA,__data data |
| \c .bss = __DATA,__bss bss |
| |
| If the \c{.rodata} section contains no relocations, it is instead put |
| into the \c{__TEXT,__const} section unless this section has already |
| been specified explicitly. However, it is probably better to specify |
| \c{__TEXT,__const} and \c{__DATA,__const} explicitly as appropriate. |
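| |
| A minimal sketch of these naming rules follows, assuming the |
| \c{macho64} format. The section name \c{__mydata} and all labels are |
| hypothetical, chosen only for illustration: |
| |
| \c         section __TEXT,__text           ; defaults apply by name |
| \c         global  _getmsg |
| \c _getmsg: |
| \c         lea     rax,[rel msg]           ; address of the constant string |
| \c         ret |
| \c |
| \c         section __TEXT,__const          ; read-only data, named explicitly |
| \c msg:    db      'hello',0 |
| \c |
| \c         section __DATA,__mydata data align=16 |
| \c table:  dq      0,1,2,3 |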
| |
| \S{machotls} \i{Thread Local Storage in Mach-O}\I{TLS}: \c{macho} special |
| symbols and \i\c{WRT} |
| |
| Mach-O defines the following special symbols that can be used on the |
| right-hand side of the \c{WRT} operator: |
| |
| \b \c{..tlvp} is used to specify access to thread-local storage. |
| |
| \b \c{..gotpcrel} is used to specify references to the Global Offset |
| Table. The GOT is supported in the \c{macho64} format only. |
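| |
| For instance, a \c{macho64} module might obtain the address of an |
| external variable through the GOT roughly as follows; \c{_counter} |
| is a hypothetical external symbol: |
| |
| \c         extern  _counter |
| \c         section __TEXT,__text |
| \c         mov     rax,[rel _counter wrt ..gotpcrel] ; rax = address of _counter |
| \c         mov     eax,[rax]                         ; load its 32-bit value |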
| |
| \S{macho-ssvs} \c{macho} specific directive \i\c{subsections_via_symbols} |
| |
| The directive \c{subsections_via_symbols} sets the |
| \c{MH_SUBSECTIONS_VIA_SYMBOLS} flag in the Mach-O header, which effectively |
| divides a section into blocks (or subsections) based on symbols. It is often |
| used to let the linker eliminate dead code. |
| |
| This directive takes no arguments. |
| |
| This is a macro implemented as a \c{%pragma}. It can also be |
| specified in its \c{%pragma} form, in which case it will not affect |
| non-Mach-O builds of the same source code: |
| |
| \c %pragma macho subsections_via_symbols |
| |
| \S{macho-ssvs} \c{macho} specific directive \i\c{no_dead_strip} |
| |
| The directive \c{no_dead_strip} sets the Mach-O \c{SH_NO_DEAD_STRIP} |
| section flag on the section containing a specific symbol. This |
| directive takes a list of symbols as its arguments. |
| |
| This is a macro implemented as a \c{%pragma}. It can also be |
| specified in its \c{%pragma} form, in which case it will not affect |
| non-Mach-O builds of the same source code: |
| |
| \c %pragma macho no_dead_strip symbol... |
| |
| \S{macho-pext} \c{macho} specific extensions to the \c{GLOBAL} |
| Directive: \i\c{private_extern} |
| |
| This extension to the \c{GLOBAL} directive marks the symbol as having |
| limited global scope. For example, you can declare a global symbol with |
| this extension: |
| |
| \c global foo:private_extern |
| \c foo: |
| \c ; code |
| |
| Linking with the static linker clears the private extern attribute, |
| unless a linker option such as \c{-keep_private_externs} is used to keep it. |
| |
| \S{macho-bver} \c{macho} specific directive \i\c{build_version} |
| |
| The directive \c{build_version} generates a \c{LC_BUILD_VERSION} |
| load command in the Mach-O header, which allows specifying a |
| target platform, minimum OS version and optionally SDK version. |
| Newer Xcode linker versions warn if this is not present in object |
| files. |
| |
| This directive takes the target platform name and minimum OS |
| version as arguments, in this form: |
| |
| \c build_version macos,10,7 |
| |
| Platform names that make sense for x86 code are \c{macos}, |
| \c{iossimulator}, \c{tvossimulator} and \c{watchossimulator}. |
| |
| Optionally, a trailing version number and minimum SDK version |
| can also be specified with this syntax: |
| |
| \c build_version macos, 10, 14, 0 sdk_version 10, 14, 0 |
| |
| This is a macro implemented as a \c{%pragma}. It can also be |
| specified in its \c{%pragma} form, in which case it will not |
| affect non-Mach-O builds of the same source code: |
| |
| \c %pragma macho build_version ... |
| |
| This latter form is also useful on the command line when using |
| the \c{--pragma} command-line switch: |
| |
| \c nasm -f macho64 --pragma "macho build_version macos,10,9" ... |
| |
| \H{elffmt} \i\c{elf32}, \i\c{elf64}, \i\c{elfx32}: |
| \I{ELF}\I{linux, elf}Executable and Linkable Format Object Files |
| |
| The \c{elf32}, \c{elf64} and \c{elfx32} output formats generate |
| \c{ELF32} and \c{ELF64} (Executable and Linkable Format) object files, as |
| used by \i{Linux} as well as \i{Unix System V}, including \i{Solaris x86}, |
| \i{UnixWare} and \i{SCO Unix}. ELF provides a default output |
| file-name extension of \c{.o}. \c{elf} is a synonym for \c{elf32}. |
| |
| The \c{elfx32} file format is an ELF32 file containing 64-bit x86 |
| code, and is used for the \i{x32} ABI, which runs the CPU in 64-bit |
| mode while using 32-bit values for pointers to reduce memory |
| footprint. Thus, code intended to be used with the x32 ABI should be |
| assembled with \c{BITS 64}. |
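| |
| The following fragment sketches what this means in practice, assuming |
| it is assembled with \c{-f elfx32}; all labels are hypothetical: |
| |
| \c         bits    64 |
| \c         section .data |
| \c value:  dq      42 |
| \c ptr:    dd      value           ; an x32 pointer occupies 32 bits |
| \c |
| \c         section .text |
| \c load:   mov     eax,[rel ptr]   ; 32-bit load zero-extends into rax |
| \c         mov     rax,[rax]       ; 64-bit registers remain available |
| \c         ret |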
| |
| \S{abisect} ELF specific directive \i\c{osabi} |
| |
| The ELF header specifies the application binary interface for the |
| target operating system (OSABI). This field can be set by using the |
| \c{osabi} directive with the numeric value (0-255) of the target |
| system. If this directive is not used, the default value will be "UNIX |
| System V ABI" (0) which will work on most systems which support ELF. |
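| |
| As an illustration, and assuming the standard ELF value 9 for FreeBSD |
| (\c{ELFOSABI_FREEBSD}), a source file intended for that system might |
| contain: |
| |
| \c         osabi   9               ; 9 = ELFOSABI_FREEBSD |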
| |
| \S{elfsect} ELF extensions to the \c{SECTION} Directive |
| \I{SECTION, ELF extensions to} |
| |
| Like the \c{obj} format, \c{elf} allows you to specify additional |
| information on the \c{SECTION} directive line, to control the type |
| and properties of sections you declare. Section types and properties |
| are generated automatically by NASM for the \i{standard section |
| names}, but may still be |
| overridden by these qualifiers. |
| |
| The available qualifiers are: |
| |
| \b \i\c{alloc} defines the section to be one which is loaded into |
| memory when the program is run. \i\c{noalloc} defines it to be one |
| which is not, such as an informational or comment section. |
| |
| \b \i\c{exec} defines the section to be one which should have execute |
| permission when the program is run. \i\c{noexec} defines it as one |
| which should not. |
| |
| \b \i\c{write} defines the section to be one which should be writable |
| when the program is run. \i\c{nowrite} defines it as one which should |
| not. |
| |
| \b \i\c{progbits} defines the section to be one with explicit contents |
| stored in the object file: an ordinary code or data section, for |
| example. |
| |
| \b \i\c{nobits} defines the section to be one with no explicit |
| contents given, such as a BSS section. |
| |
| \b \i\c{note} indicates that this section contains ELF notes. The |
| contents of ELF notes are specified using normal assembly instructions; |
| it is up to the programmer to ensure these are valid ELF notes. |
| |
| \b \i\c{preinit_array} indicates that this section contains function |
| addresses to be called before any other initialization has happened. |
| |
| \b \i\c{init_array} indicates that this section contains function |
| addresses to be called during initialization. |
| |
| \b \i\c{fini_array} indicates that this section contains function |
| pointers to be called during termination. |
| |
| \b \I{align, ELF attribute}\c{align=}, used with a trailing number as in \c{obj}, gives the |
| \I{section alignment, in elf}\I{alignment, in elf sections}alignment |
| requirements of the section. |
| |
| \b \c{byte}, \c{word}, \c{dword}, \c{qword}, \c{tword}, \c{oword}, |
| \c{yword}, or \c{zword} with an optional \c{*}\i{multiplier} specify |
| the fundamental data item size for a section which contains either |
| fixed-sized data structures or strings; it also sets a default |
| alignment. This is generally used with the \c{strings} and \c{merge} |
| attributes (see below.) For example \c{byte*4} defines a unit size of |
| 4 bytes, with a default alignment of 1; \c{dword} also defines a unit |
| size of 4 bytes, but with a default alignment of 4. The \c{align=} |
| attribute, if specified, overrides this default alignment. |
| |
| \b \I{pointer, ELF attribute}\c{pointer} is equivalent to \c{dword} |
| for \c{elf32} or \c{elfx32}, and \c{qword} for \c{elf64}. |
| |
| \b \I{strings, ELF attribute}\c{strings} indicates that this section |
| contains exclusively null-terminated strings. By default these are |
| assumed to be byte strings, but a size specifier can be used to |
| override that. |
| |
| \b \i\c{merge} indicates that duplicate data elements in this section |
| should be merged with data elements from other object files. Data |
| elements can be either fixed-sized objects or null-terminated strings |
| (with the \c{strings} attribute). A size specifier is required unless |
| \c{strings} is specified, in which case the size defaults to \c{byte}. |
| |
| \b \i\c{tls} defines the section to be one which contains |
| thread local variables. |
| |
| The defaults assumed by NASM if you do not specify the above |
| qualifiers are: |
| |
| \I\c{.text} \I\c{.rodata} \I\c{.lrodata} \I\c{.data} \I\c{.ldata} |
| \I\c{.bss} \I\c{.lbss} \I\c{.tdata} \I\c{.tbss} \I\c{.comment} |
| |
| \c section .text progbits alloc exec nowrite align=16 |
| \c section .rodata progbits alloc noexec nowrite align=4 |
| \c section .lrodata progbits alloc noexec nowrite align=4 |
| \c section .data progbits alloc noexec write align=4 |
| \c section .ldata progbits alloc noexec write align=4 |
| \c section .bss nobits alloc noexec write align=4 |
| \c section .lbss nobits alloc noexec write align=4 |
| \c section .tdata progbits alloc noexec write align=4 tls |
| \c section .tbss nobits alloc noexec write align=4 tls |
| \c section .comment progbits noalloc noexec nowrite align=1 |
| \c section .preinit_array preinit_array alloc noexec nowrite pointer |
| \c section .init_array init_array alloc noexec nowrite pointer |
| \c section .fini_array fini_array alloc noexec nowrite pointer |
| \c section .note note noalloc noexec nowrite align=4 |
| \c section other progbits alloc noexec nowrite align=1 |
| |
| (Any section name other than those in the above table |
| is treated by default like \c{other} in the above table. |
| Please note that section names are case sensitive.) |
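| |
| As an illustration of overriding these defaults, the declarations below |
| use hypothetical section names, all of which would otherwise be treated |
| like \c{other}: |
| |
| \c section .hotcode progbits alloc exec nowrite align=32 |
| \c section .scratch nobits alloc noexec write align=64 |
| \c section .msgs progbits alloc noexec nowrite merge strings |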
| |
| |
| \S{elfwrt} \i{Position-Independent Code}\I{PIC}: ELF Special |
| Symbols and \i\c{WRT} |
| |
| Since \c{ELF} does not support segment-base references, the \c{WRT} |
| operator is not used for its normal purpose; therefore NASM's |
| \c{elf} output format makes use of \c{WRT} for a different purpose, |
| namely the PIC-specific \I{relocations, PIC-specific}relocation |
| types. |
| |
| \c{elf} defines five special symbols which you can use as the |
| right-hand side of the \c{WRT} operator to obtain PIC relocation |
| types. They are \i\c{..gotpc}, \i\c{..gotoff}, \i\c{..got}, |
| \i\c{..plt} and \i\c{..sym}. Their functions are summarized here: |
| |
| \b Referring to the symbol marking the global offset table base |
| using \c{wrt ..gotpc} will end up giving the distance from the |
| beginning of the current section to the global offset table. |
| (\i\c{_GLOBAL_OFFSET_TABLE_} is the standard symbol name used to |
| refer to the \i{GOT}.) So you would then need to add \i\c{$$} to the |
| result to get the real address of the GOT. |
| |
| \b Referring to a location in one of your own sections using \c{wrt |
| ..gotoff} will give the distance from the beginning of the GOT to |
| the specified location, so that adding on the address of the GOT |
| would give the real address of the location you wanted. |
| |
| \b Referring to an external or global symbol using \c{wrt ..got} |
| causes the linker to build an entry \e{in} the GOT containing the |
| address of the symbol, and the reference gives the distance from the |
| beginning of the GOT to the entry; so you can add on the address of |
| the GOT, load from the resulting address, and end up with the |
| address of the symbol. |
| |
| \b Referring to a procedure name using \c{wrt ..plt} causes the |
| linker to build a \i{procedure linkage table} entry for the symbol, |
| and the reference gives the address of the \i{PLT} entry. You can |
| only use this in contexts which would generate a PC-relative |
| relocation normally (i.e. as the destination for \c{CALL} or |
| \c{JMP}), since ELF contains no relocation type to refer to PLT |
| entries absolutely. |
| |
| \b Referring to a symbol name using \c{wrt ..sym} causes NASM to |
| write an ordinary relocation, but instead of making the relocation |
| relative to the start of the section and then adding on the offset |
| to the symbol, it will write a relocation record aimed directly at |
| the symbol in question. The distinction is a necessary one due to a |
| peculiarity of the dynamic linker. |
| |
| A fuller explanation of how to use these relocation types to write |
| shared libraries entirely in NASM is given in \k{picdll}. |
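| |
| As a brief sketch, assuming 32-bit code assembled with \c{-f elf32}, |
| the first four relocation types might be combined as follows. The |
| symbols \c{myvar}, \c{extvar} and \c{extfunc} are hypothetical: |
| |
| \c         extern  _GLOBAL_OFFSET_TABLE_ |
| \c         extern  extvar, extfunc |
| \c |
| \c         section .data |
| \c myvar:  dd      0 |
| \c |
| \c         section .text |
| \c         global  func:function |
| \c func:   push    ebx |
| \c         call    .get_GOT |
| \c .get_GOT: |
| \c         pop     ebx                     ; ebx = address of .get_GOT |
| \c         add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc |
| \c                                         ; ebx = address of the GOT |
| \c         lea     eax,[ebx+myvar wrt ..gotoff]  ; eax = address of myvar |
| \c         mov     ecx,[ebx+extvar wrt ..got]    ; ecx = address of extvar |
| \c         call    extfunc wrt ..plt             ; call via the PLT |
| \c         pop     ebx |
| \c         ret |
| |
| Here \c{ebx} is kept pointing at the GOT for the whole function body, |
| since the \c{..gotoff} and \c{..got} references are both relative to it. |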
| |
| \S{elftls} \i{Thread Local Storage in ELF}\I{TLS}: \c{elf} Special |
| Symbols and \i\c{WRT} |
| |
| \b In ELF32 mode, referring to an external or global symbol using |
| \c{wrt ..tlsie} \I\c{..tlsie} |
| causes the linker to build an entry \e{in} the GOT containing the |
| offset of the symbol within the TLS block, so you can access the value |
| of the symbol with code such as: |
| |
| \c mov eax,[tid wrt ..tlsie] |
| \c mov [gs:eax],ebx |
| |
| |
| \b In ELF64 or ELFx32 mode, referring to an external or global symbol using |
| \c{wrt ..gottpoff} \I\c{..gottpoff} |
| causes the linker to build an entry \e{in} the GOT containing the |
| offset of the symbol within the TLS block, so you can access the value |
| of the symbol with code such as: |
| |
| \c mov rax,[rel tid wrt ..gottpoff] |
| \c mov rcx,[fs:rax] |
| |
| |
| \S{elfglob} \c{elf} Extensions to the \c{GLOBAL} Directive\I{GLOBAL, |
| elf extensions to}\I{GLOBAL, aoutb extensions to} |
| |
| \c{ELF} object files can contain more information about a global |
| symbol than just its address: they can contain the \I{symbols, |
| specifying sizes}\I{size, of symbols}size of the symbol and its |
| \I{symbols, specifying types}\I{type, of symbols}type as well. These |
| are not merely debugger conveniences, but are actually necessary when |
| the program being written is a \I{elf shared library}shared |
| library. NASM therefore supports some extensions to the \c{GLOBAL} |
| directive, allowing you to specify these features. |
| |
| You can specify whether a global variable is a function or a data |
| object by suffixing the name with a colon and the word |
| \i\c{function} or \i\c{data}. (\i\c{object} is a synonym for |
| \c{data}.) For example: |
| |
| \c global hashlookup:function, hashtable:data |
| |
| exports the global symbol \c{hashlookup} as a function and |
| \c{hashtable} as a data object. |
| |
| Optionally, you can control the ELF visibility of the symbol. Just |
| add one of the \I{elf visibility}visibility keywords: |
| \I{default, elf}\c{default}, |
| \I{internal, elf}\c{internal}, |
| \I{hidden, elf}\c{hidden}, |
| or \I{protected, elf}\c{protected}. The default is |
| \c{default} of course. For example, to make \c{hashlookup} hidden: |
| |
| \c global hashlookup:function hidden |
| |
| Since version 2.15, it is possible to specify symbol binding. The keywords |
| are \i\c{weak}, to generate a weak symbol, or \i\c{strong}. The default is \i\c{strong}. |
| |
| You can also specify the size of the data associated with the |
| symbol, as a numeric expression (which may involve labels, and even |
| forward references) after the type specifier. Like this: |
| |
| \c global hashtable:data (hashtable.end - hashtable) |
| \c |
| \c hashtable: |
| \c db this,that,theother ; some data here |
| \c .end: |
| |
| This makes NASM automatically calculate the length of the table and |
| place that information into the \c{ELF} symbol table. |
| |
| Declaring the type and size of global symbols is necessary when |
| writing shared library code. For more information, see |
| \k{picglobal}. |
| |
| |
| \S{elfextrn} \c{elf} Extensions to the \c{EXTERN} Directive\I{EXTERN, |
| elf extensions to}\I{EXTERN, elf extensions to} |
| |
| Since version 2.15 it is possible to specify the keyword \i\c{weak} to generate a weak |
| external reference. Example: |
| |
| \c extern weak_ref:weak |
| |
| |
| \S{elfcomm} \c{elf} Extensions to the \c{COMMON} Directive |
| \I{COMMON, elf extensions to} |
| |
| \c{ELF} also allows you to specify alignment requirements \I{common |
| variables, alignment in elf}\I{alignment, of elf common variables}on |
| common variables. This is done by putting a number (which must be a |
| power of two) after the name and size of the common variable, |
| separated (as usual) by a colon. For example, an array of |
| doublewords would benefit from 4-byte alignment: |
| |
| \c common dwordarray 128:4 |
| |
| This declares the total size of the array to be 128 bytes, and |
| requires that it be aligned on a 4-byte boundary. |
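| |
| As a further sketch, using the hypothetical variable name \c{pagebuf}, |
| a 4096-byte buffer aligned on a 16-byte boundary could be declared in |
| the same way: |
| |
| \c common pagebuf 4096:16 |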
| |
| |
| \S{elf16} 16-bit code and ELF |
| \I{ELF, 16-bit code} |
| |
| Older versions of the \c{ELF32} specification did not provide |
| relocations for 8- and 16-bit values. These relocations are now part of |
| the formal specification, and any sufficiently recent linker should |
| support them. |
| |
| ELF currently has no support for segmented programming. |
| |
| \S{elfdbg} Debug formats and ELF |
| \I{ELF, debug formats} |
| |
| ELF provides debug information in \c{STABS} and \c{DWARF} formats. |
| Line number information is generated for all executable sections, but |
| note that only the \c{.text} section is executable by default. |
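| |
| For example, assuming a source file with the hypothetical name |
| \c{myprog.asm}, DWARF debug information can be requested with the |
| \c{-g} and \c{-F} options: |
| |
| \c nasm -f elf64 -g -F dwarf myprog.asm |
| |
| Substituting \c{stabs} for \c{dwarf} selects the \c{STABS} format |
| instead. |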
| |
| \H{aoutfmt} \i\c{aout}: Linux \I{a.out, Linux version}\I{linux, a.out}\c{a.out} Object Files |
| |
| The \c{aout} format generates \c{a.out} object files, in the form used |
| by early \i{Linux} systems (current Linux systems use ELF; see |
| \k{elffmt}). These differ from other \c{a.out} object files in that |
| the magic number in the first four bytes of the file is |
| different; also, some implementations of \c{a.out}, for example |
| NetBSD's, support position-independent code, which Linux's |
| implementation does not. |
| |
| \c{a.out} provides a default output file-name extension of \c{.o}. |
| |
| \c{a.out} is a very simple object format. It supports no special |
| directives, no special symbols, no use of \c{SEG} or \c{WRT}, and no |
| extensions to any standard directives. It supports only the three |
| \i{standard section names} \i\c{.text}, \i\c{.data} and \i\c{.bss}. |
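| |
| A minimal sketch of a source file that stays within these limits is |
| shown below. The label names are hypothetical and chosen only for |
| illustration: |
| |
| \c section .data |
| \c counter: dd 0 ; initialised data |
| \c |
| \c section .bss |
| \c scratch: resb 64 ; uninitialised storage |
| \c |
| \c section .text |
| \c bump: inc dword [counter] |
| \c ret |
| |
| Assembling such a file with \c{nasm -f aout} should produce an object |
| file with the \c{.o} extension described above. |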
| |
| |
| \H{aoutfmt} \i\c{aoutb}: \i{NetBSD}/\i{FreeBSD}/\i{OpenBSD} |
| \I{a.out, BSD version}\c{a.out} Object Files |
| |
| The \c{aoutb} format generates \c{a.out} object files, in the form |
| used by the various free \c{BSD Unix} clones, \c{NetBSD}, \c{FreeBSD} |
| and \c{OpenBSD}. For simple object files, this object format is exactly |
| the same as \c{aout} except for the magic number in the first four bytes |
| of the file. However, the \c{aoutb} format supports |
| \I{PIC}\i{position-independent code} in the same way as the \c{elf} |
| format, so you can use it to write \c{BSD} \i{shared libraries}. |
| |
| \c{aoutb} provides a default output file-name extension of \c{.o}. |
| |
| \c{aoutb} supports no special directives, no special symbols, and |
| only the three \i{standard section names} \i\c{.text}, \i\c{.data} |
| and \i\c{.bss}. However, it also supports the same use of \i\c{WRT} as |
| \c{elf} does, to provide position-independent code relocation types. |
| See \k{elfwrt} for full documentation of this feature. |
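| |
| For instance, assuming an external routine with the hypothetical name |
| \c{helper}, a call routed through the procedure linkage table might be |
| sketched as follows, using the same \c{..plt} special symbol as |
| \c{elf}: |
| |
| \c extern helper |
| \c |
| \c call helper wrt ..plt ; call via the PLT |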
| |
| \c{aoutb} also supports the same extensions to the \c{GLOBAL} |
| directive as \c{elf} does: see \k{elfglob} for documentation of |
| this. |
| |
| |
| \H{as86fmt} \c{as86}: \i{Minix}/Linux\I{linux, as86} \i\c{as86} Object Files |
| |
| The Minix/Linux 16-bit assembler \c{as86} has its own non-standard |
| object file format. Although its companion linker \i\c{ld86} produces |
| something close to ordinary \c{a.out} binaries as output, the object |
| file format used to communicate between \c{as86} and \c{ld86} is not |
| itself \c{a.out}. |
| |
| NASM supports this format, just in case it is useful, as \c{as86}. |
| \c{as86} provides a default output file-name extension of \c{.o}. |
| |
| \c{as86} is a very simple object format (from the NASM user's point |
| of view). It supports no special directives, no use of \c{SEG} or \c{WRT}, |
| and no extensions to any standard directives. It supports only the three |
| \i{standard section names} \i\c{.text}, \i\c{.data} and \i\c{.bss}. The |
| only special symbol supported is \c{..start}. |
| |
| |
| \H{dbgfmt} \i\c{dbg}: Debugging Format |
| |
| The \c{dbg} format does not output an object file as such; instead, |
| it outputs a text file which contains a complete list of all the |
| transactions between the main body of NASM and the output-format |
| back end module. It is primarily intended to aid people who want to |
| write their own output drivers, so that they can get a clearer idea |
| of the various requests the main program makes of the output driver, |
| and in what order they happen. |
| |
| For simple files, one can easily use the \c{dbg} format like this: |
| |
| \c nasm -f dbg filename.asm |
| |
| which will generate a diagnostic file called \c{filename.dbg}. |
| However, this will not work well on files which were designed for a |
| different object format, because each object format defines its own |
| macros (usually user-level forms of directives), and those macros |
| will not be defined in the \c{dbg} format. Therefore it can be |
| useful to run NASM twice, in order to do the preprocessing with the |
| native object format selected: |
| |
| \c nasm -e -f elf32 -o elfprog.i elfprog.asm |
| \c nasm -a -f dbg elfprog.i |
| |
| This preprocesses \c{elfprog.asm} into \c{elfprog.i}, keeping the |
| \c{elf32} object format selected in order to make sure ELF special |
| directives are converted into primitive form correctly. Then the |
| preprocessed source is fed through the \c{dbg} format to generate the |
| final diagnostic output. |
| |
| This workaround will still typically not work for programs intended |
| for \c{obj} format, because the \c{obj} \c{SEGMENT} and \c{GROUP} |
| directives have side effects of defining the segment and group names |
| as symbols; \c{dbg} will not do this, so the program will not |
| assemble. You will have to work around that by defining the symbols |
| yourself (using \c{EXTERN}, for example) if you really need to get a |
| \c{dbg} trace of an \c{obj}-specific source file. |
| |
| \c{dbg} accepts any section name and any directives at all, and logs |
| them all to its output file. |
| |
| \c{dbg} accepts and logs any \c{%pragma}, but the specific |
| \c{%pragma}: |
| |
| \c %pragma dbg maxdump <size> |
| |
| where \c{<size>} is either a number or \c{unlimited}, can be used to |
| control the maximum size for dumping the full contents of a |
| \c{rawdata} output object. |
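| |
| For example, a limit of 64 (an arbitrary value chosen for |
| illustration) or no limit at all could be requested like this: |
| |
| \c %pragma dbg maxdump 64 |
| \c %pragma dbg maxdump unlimited |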