| \# -------------------------------------------------------------------------- |
| \# |
| \# Copyright 1996-2020 The NASM Authors - All Rights Reserved |
| \# See the file AUTHORS included with the NASM distribution for |
| \# the specific copyright holders. |
| \# |
| \# Redistribution and use in source and binary forms, with or without |
| \# modification, are permitted provided that the following |
| \# conditions are met: |
| \# |
| \# * Redistributions of source code must retain the above copyright |
| \# notice, this list of conditions and the following disclaimer. |
| \# * Redistributions in binary form must reproduce the above |
| \# copyright notice, this list of conditions and the following |
| \# disclaimer in the documentation and/or other materials provided |
| \# with the distribution. |
| \# |
| \# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND |
| \# CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, |
| \# INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF |
| \# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE |
| \# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR |
| \# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, |
| \# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT |
| \# NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; |
| \# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) |
| \# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN |
| \# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR |
| \# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, |
| \# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
| \# |
| \# -------------------------------------------------------------------------- |
| \# |
| \# Source code to NASM documentation |
| \# |
| |
| \M{category}{Programming} |
| \M{title}{NASM - The Netwide Assembler} |
| \M{year}{1996-2017} |
| \M{author}{The NASM Development Team} |
| \M{copyright_tail}{-- All Rights Reserved} |
| \M{license}{This document is redistributable under the license given in the file "LICENSE" distributed in the NASM archive.} |
| \M{summary}{This file documents NASM, the Netwide Assembler: an assembler targeting the Intel x86 series of processors, with portable source.} |
| \M{infoname}{NASM} |
| \M{infofile}{nasm} |
| \M{infotitle}{The Netwide Assembler for x86} |
| \M{epslogo}{nasmlogo.eps} |
| \M{logoyadj}{-72} |
| |
| \& version.src |
| |
| \IR{-D} \c{-D} option |
| \IR{-E} \c{-E} option |
| \IR{-F} \c{-F} option |
| \IR{-I} \c{-I} option |
| \IR{-L} \c{-L} option |
| \IR{-M} \c{-M} option |
| \IR{-MD} \c{-MD} option |
| \IR{-MF} \c{-MF} option |
| \IR{-MG} \c{-MG} option |
| \IR{-MP} \c{-MP} option |
| \IR{-MQ} \c{-MQ} option |
| \IR{-MT} \c{-MT} option |
| \IR{-MW} \c{-MW} option |
| \IR{-O} \c{-O} option |
| \IR{-P} \c{-P} option |
| \IR{-U} \c{-U} option |
| \IR{-X} \c{-X} option |
| \IR{-a} \c{-a} option |
| \IR{-d} \c{-d} option |
| \IR{-e} \c{-e} option |
| \IR{-f} \c{-f} option |
| \IR{-g} \c{-g} option |
| \IR{-i} \c{-i} option |
| \IR{-l} \c{-l} option |
| \IR{-o} \c{-o} option |
| \IR{-p} \c{-p} option |
| \IR{-s} \c{-s} option |
| \IR{-u} \c{-u} option |
| \IR{-v} \c{-v} option |
| \IR{-W} \c{-W} option |
| \IR{-Werror} \c{-Werror} option |
| \IR{-Wno-error} \c{-Wno-error} option |
| \IR{-w} \c{-w} option |
| \IR{-Z} \c{-Z} option |
| \IR{!=} \c{!=} operator |
| \IR{$, here} \c{$}, Here token |
| \IR{$, prefix} \c{$}, prefix |
| \IR{$$} \c{$$} token |
| \IR{%} \c{%} operator |
| \IR{%db} \c{%} prefix to \c{DB} lists |
| \IR{%%} \c{%%} operator |
| \IR{%+1} \c{%+1} and \c{%-1} syntax |
| \IA{%-1}{%+1} |
| \IR{%0} \c{%0} parameter count |
| \IR{&} \c{&} operator |
| \IR{&&} \c{&&} operator |
| \IR{*} \c{*} operator |
| \IR{..@} \c{..@} symbol prefix |
| \IR{/} \c{/} operator |
| \IR{//} \c{//} operator |
| \IR{<} \c{<} operator |
| \IR{<<} \c{<<} operator |
| \IR{<<<} \c{<<<} operator |
| \IR{<=>} \c{<=>} operator |
| \IR{<=} \c{<=} operator |
| \IR{<>} \c{<>} operator |
| \IR{=} \c{=} operator |
| \IR{==} \c{==} operator |
| \IR{>} \c{>} operator |
| \IR{>=} \c{>=} operator |
| \IR{>>} \c{>>} operator |
| \IR{>>>} \c{>>>} operator |
| \IR{?db} \c{?}, data syntax |
| \IR{?op} \c{?}, operator |
| \IR{^} \c{^} operator |
| \IR{^^} \c{^^} operator |
| \IR{|} \c{|} operator |
| \IR{||} \c{||} operator |
| \IR{~} \c{~} operator |
| \IR{%$} \c{%$} and \c{%$$} prefixes |
| \IA{%$$}{%$} |
| \IR{+ opaddition} \c{+} operator, binary |
| \IR{+ opunary} \c{+} operator, unary |
| \IR{+ modifier} \c{+} modifier |
| \IR{- opsubtraction} \c{-} operator, binary |
| \IR{- opunary} \c{-} operator, unary |
| \IR{! opunary} \c{!} operator, unary |
| \IR{alignment, in bin sections} alignment, in \c{bin} sections |
| \IR{alignment, in elf sections} alignment, in ELF sections |
| \IR{alignment, in win32 sections} alignment, in \c{win32} sections |
| \IR{alignment, of elf common variables} alignment, of ELF common |
| variables |
| \IR{alignment, in obj sections} alignment, in \c{obj} sections |
| \IR{a.out, bsd version} \c{a.out}, BSD version |
| \IR{a.out, linux version} \c{a.out}, Linux version |
| \IR{bin} \c{bin} output format |
| \IR{bitwise and} bitwise AND |
| \IR{bitwise or} bitwise OR |
| \IR{bitwise xor} bitwise XOR |
| \IR{block ifs} block IFs |
| \IR{borland pascal} Borland, Pascal |
| \IR{borland's win32 compilers} Borland, Win32 compilers |
| \IR{braces, after % sign} braces, after \c{%} sign |
| \IR{bsd} BSD |
| \IR{c calling convention} C calling convention |
| \IR{c symbol names} C symbol names |
| \IA{critical expressions}{critical expression} |
| \IA{command line}{command-line} |
| \IA{case sensitivity}{case sensitive} |
| \IA{case-sensitive}{case sensitive} |
| \IA{case-insensitive}{case sensitive} |
| \IA{character constants}{character constant} |
| \IR{codeview} CodeView debugging format |
| \IR{common object file format} Common Object File Format |
| \IR{common variables, alignment in elf} common variables, alignment |
| in ELF |
| \IR{common, elf extensions to} \c{COMMON}, ELF extensions to |
| \IR{common, obj extensions to} \c{COMMON}, \c{obj} extensions to |
| \IR{declaring structure} declaring structures |
| \IR{default-wrt mechanism} default-\c{WRT} mechanism |
| \IR{devpac} DevPac |
| \IR{djgpp} DJGPP |
| \IR{dll symbols, exporting} DLL symbols, exporting |
| \IR{dll symbols, importing} DLL symbols, importing |
| \IR{dos} DOS |
| \IR{dos archive} DOS archive |
| \IR{dos source archive} DOS source archive |
| \IR{dup} \c{DUP} |
| \IA{effective address}{effective addresses} |
| \IA{effective-address}{effective addresses} |
| \IR{elf} ELF |
| \IR{elf, 16-bit code} ELF, 16-bit code |
| \IR{elf, debug formats} ELF, debug formats |
| \IR{elf shared libraries} ELF, shared libraries |
| \IR{elf32} \c{elf32} |
| \IR{elf64} \c{elf64} |
| \IR{elfx32} \c{elfx32} |
| \IR{executable and linkable format} Executable and Linkable Format |
| \IR{extern, elf extensions to} \c{EXTERN}, \c{elf} extensions to |
| \IR{extern, obj extensions to} \c{EXTERN}, \c{obj} extensions to |
| \IR{extern, rdf extensions to} \c{EXTERN}, \c{rdf} extensions to |
| \IR{floating-point, constants} floating-point, constants |
| \IR{floating-point, packed bcd constants} floating-point, packed BCD constants |
| \IR{freebsd} FreeBSD |
| \IR{freelink} FreeLink |
| \IR{functions, c calling convention} functions, C calling convention |
| \IR{functions, pascal calling convention} functions, Pascal calling |
| convention |
| \IR{global, aoutb extensions to} \c{GLOBAL}, \c{aoutb} extensions to |
| \IR{global, elf extensions to} \c{GLOBAL}, ELF extensions to |
| \IR{global, rdf extensions to} \c{GLOBAL}, \c{rdf} extensions to |
| \IR{got} GOT |
| \IR{got relocations} \c{GOT} relocations |
| \IR{gotoff relocation} \c{GOTOFF} relocations |
| \IR{gotpc relocation} \c{GOTPC} relocations |
| \IR{intel number formats} Intel number formats |
| \IR{linux, elf} Linux, ELF |
| \IR{linux, a.out} Linux, \c{a.out} |
| \IR{linux, as86} Linux, \c{as86} |
| \IR{logical and} logical AND |
| \IR{logical or} logical OR |
| \IR{logical xor} logical XOR |
| \IR{mach object file format} Mach, object file format |
| \IA{mach-o}{macho} |
| \IR{mach-o} Mach-O, object file format |
| \IR{macho32} \c{macho32} |
| \IR{macho64} \c{macho64} |
| \IR{macos x} MacOS X |
| \IR{masm} MASM |
| \IR{masmdb} MASM, \c{DB} syntax |
| \IA{memory reference}{memory references} |
| \IR{minix} Minix |
| \IA{misc directory}{misc subdirectory} |
| \IR{misc subdirectory} \c{misc} subdirectory |
| \IR{microsoft omf} Microsoft OMF |
| \IR{mmx registers} MMX registers |
| \IA{modr/m}{modr/m byte} |
| \IR{modr/m byte} ModR/M byte |
| \IR{ms-dos} MS-DOS |
| \IR{ms-dos device drivers} MS-DOS device drivers |
| \IR{multipush} \c{multipush} macro |
| \IR{nan} NaN |
| \IR{nasm version} NASM version |
| \IR{netbsd} NetBSD |
| \IR{nsis} NSIS |
| \IR{nullsoft scriptable installer} Nullsoft Scriptable Installer |
| \IR{omf} OMF |
| \IR{openbsd} OpenBSD |
| \IR{operating system} operating system |
| \IR{os/2} OS/2 |
| \IR{pascal calling convention} Pascal calling convention |
| \IR{passes} passes, assembly |
| \IR{perl} Perl |
| \IR{pic} PIC |
| \IR{pharlap} PharLap |
| \IR{plt} PLT |
| \IR{plt} \c{PLT} relocations |
| \IA{pre-defining macros}{pre-define} |
| \IA{preprocessor expressions}{preprocessor, expressions} |
| \IA{preprocessor loops}{preprocessor, loops} |
| \IA{preprocessor variables}{preprocessor, variables} |
| \IA{rdoff subdirectory}{rdoff} |
| \IR{rdoff} \c{rdoff} subdirectory |
| \IR{relocatable dynamic object file format} Relocatable Dynamic |
| Object File Format |
| \IR{relocations, pic-specific} relocations, PIC-specific |
| \IA{repeating}{repeating code} |
| \IR{section alignment, in elf} section alignment, in ELF |
| \IR{section alignment, in bin} section alignment, in \c{bin} |
| \IR{section alignment, in obj} section alignment, in \c{obj} |
| \IR{section alignment, in win32} section alignment, in \c{win32} |
| \IR{section, elf extensions to} \c{SECTION}, ELF extensions to |
| \IR{section, macho extensions to} \c{SECTION}, \c{macho} extensions to |
| \IR{section, win32 extensions to} \c{SECTION}, \c{win32} extensions to |
| \IR{segment alignment, in bin} segment alignment, in \c{bin} |
| \IR{segment alignment, in obj} segment alignment, in \c{obj} |
| \IR{segment, obj extensions to} \c{SEGMENT}, \c{obj} extensions to |
| \IR{segment names, borland pascal} segment names, Borland Pascal |
| \IR{shift command} \c{shift} command |
| \IA{sib}{sib byte} |
| \IR{sib byte} SIB byte |
| \IR{align, smart} \c{ALIGN}, smart |
| \IA{sectalign}{sectalign} |
| \IR{solaris x86} Solaris x86 |
| \IA{standard section names}{standardized section names} |
| \IR{symbols, exporting from dlls} symbols, exporting from DLLs |
| \IR{symbols, importing from dlls} symbols, importing from DLLs |
| \IR{test subdirectory} \c{test} subdirectory |
| \IR{thread local storage in elf} thread local storage, in ELF |
| \IR{thread local storage in mach-o} thread local storage, in \c{macho} |
| \IR{tlink} \c{TLINK} |
| \IR{underscore, in c symbols} underscore, in C symbols |
| \IR{unicode} Unicode |
| \IR{unix} Unix |
| \IR{utf-8} UTF-8 |
| \IR{utf-16} UTF-16 |
| \IR{utf-32} UTF-32 |
| \IA{sco unix}{unix, sco} |
| \IR{unix, sco} Unix, SCO |
| \IA{unix source archive}{unix, source archive} |
| \IR{unix, source archive} Unix, source archive |
| \IA{unix system v}{unix, system v} |
| \IR{unix, system v} Unix, System V |
| \IR{unixware} UnixWare |
| \IR{val} VAL |
| \IR{version number of nasm} version number of NASM |
| \IR{visual c++} Visual C++ |
| \IR{www page} WWW page |
| \IR{win32} Win32 |
| \IR{win64} Win64 |
| \IR{windows} Windows |
| \IR{windows 95} Windows 95 |
| \IR{windows nt} Windows NT |
| \# \IC{program entry point}{entry point, program} |
| \# \IC{program entry point}{start point, program} |
| \# \IC{MS-DOS device drivers}{device drivers, MS-DOS} |
| \# \IC{16-bit mode, versus 32-bit mode}{32-bit mode, versus 16-bit mode} |
| \# \IC{c symbol names}{symbol names, in C} |
| |
| |
| \C{intro} Introduction |
| |
| \H{whatsnasm} What Is NASM? |
| |
| The Netwide Assembler, NASM, is an 80x86 and x86-64 assembler designed |
| for portability and modularity. It supports a range of object file |
| formats, including Linux and *BSD \c{a.out}, ELF, Mach-O, 16-bit and |
| 32-bit \c{.obj} (OMF) format, and COFF (including its Win32 and Win64 |
| variants). It can also output plain binary files, Intel hex and |
| Motorola S-Record formats. Its syntax is designed to be simple and |
| easy to understand, similar to the syntax in the Intel Software |
| Developer Manual with minimal complexity. It supports all currently |
| known x86 architectural extensions, and has strong support for macros. |
| |
| NASM also comes with a set of utilities for handling its own RDOFF2 |
| object-file format. |
| |
| \S{legal} \i{License} Conditions |
| |
| Please see the file \c{LICENSE}, supplied as part of any NASM |
| distribution archive, for the license conditions under which you may |
| use NASM. NASM is now under the so-called 2-clause BSD license, also |
| known as the simplified BSD license. |
| |
| Copyright 1996-2017 the NASM Authors - All rights reserved. |
| |
| Redistribution and use in source and binary forms, with or without |
| modification, are permitted provided that the following conditions are |
| met: |
| |
| \b Redistributions of source code must retain the above copyright |
| notice, this list of conditions and the following disclaimer. |
| |
| \b Redistributions in binary form must reproduce the above copyright |
| notice, this list of conditions and the following disclaimer in the |
| documentation and/or other materials provided with the distribution. |
| |
| THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND |
| CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, |
| INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF |
| MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE |
| DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR |
| CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, |
| SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT |
| NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; |
| LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) |
| HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN |
| CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR |
| OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, |
| EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
| |
| \C{running} Running NASM |
| |
| \H{syntax} NASM \i{Command-Line} Syntax |
| |
| To assemble a file, you issue a command of the form |
| |
| \c nasm -f <format> <filename> [-o <output>] |
| |
| For example, |
| |
| \c nasm -f elf myfile.asm |
| |
| will assemble \c{myfile.asm} into an ELF object file \c{myfile.o}. And |
| |
| \c nasm -f bin myfile.asm -o myfile.com |
| |
| will assemble \c{myfile.asm} into a raw binary file \c{myfile.com}. |
| |
| To produce a listing file, with the hex codes output from NASM |
| displayed on the left of the original sources, use the \c{-l} option |
| to give a listing file name, for example: |
| |
| \c nasm -f coff myfile.asm -l myfile.lst |
| |
| To get further usage instructions from NASM, try typing |
| |
| \c nasm -h |
| |
| The option \c{--help} is an alias for the \c{-h} option. |
| |
| If you use Linux but aren't sure whether your system is \c{a.out} |
| or ELF, type |
| |
| \c file nasm |
| |
| (in the directory in which you put the NASM binary when you |
| installed it). If it says something like |
| |
| \c nasm: ELF 32-bit LSB executable i386 (386 and up) Version 1 |
| |
| then your system is \c{ELF}, and you should use the option \c{-f elf} |
| when you want NASM to produce Linux object files. If it says |
| |
| \c nasm: Linux/i386 demand-paged executable (QMAGIC) |
| |
| or something similar, your system is \c{a.out}, and you should use |
| \c{-f aout} instead. (Linux \c{a.out} systems have long been obsolete, |
| and are rare these days.) |
| |
| Like Unix compilers and assemblers, NASM is silent unless it |
| goes wrong: you won't see any output at all, unless it gives error |
| messages. |
| |
| |
| \S{opt-o} The \i\c{-o} Option: Specifying the Output File Name |
| |
| NASM will normally choose the name of your output file for you; |
| precisely how it does this is dependent on the object file format. |
| For Microsoft object file formats (\c{obj}, \c{win32} and \c{win64}), |
| it will remove the \c{.asm} \i{extension} (or whatever extension you |
| like to use - NASM doesn't care) from your source file name and |
| substitute \c{.obj}. For Unix object file formats (\c{aout}, \c{as86}, |
| \c{coff}, \c{elf32}, \c{elf64}, \c{elfx32}, \c{ieee}, \c{macho32} and |
| \c{macho64}) it will substitute \c{.o}. For \c{dbg}, \c{rdf}, \c{ith} |
| and \c{srec}, it will use \c{.dbg}, \c{.rdf}, \c{.ith} and \c{.srec}, |
| respectively, and for the \c{bin} format it will simply remove the |
| extension, so that \c{myfile.asm} produces the output file \c{myfile}. |
| |
| If the output file already exists, NASM will overwrite it, unless it |
| has the same name as the input file, in which case it will give a |
| warning and use \i\c{nasm.out} as the output file name instead. |
| |
| For situations in which this behaviour is unacceptable, NASM |
| provides the \c{-o} command-line option, which allows you to specify |
| your desired output file name. You invoke \c{-o} by following it |
| with the name you wish for the output file, either with or without |
| an intervening space. For example: |
| |
| \c nasm -f bin program.asm -o program.com |
| \c nasm -f bin driver.asm -odriver.sys |
| |
| Note that this is a small \c{o}, and is different from a capital \c{O}, which |
| is used to specify the number of optimisation passes required. See \k{opt-O}. |
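| |
The default naming rules described at the start of this section can be modelled in a few lines. The following is only an illustrative Python sketch, not NASM's actual implementation: the function name \c{default_output_name} and the set names are hypothetical, and the format lists are copied from the text above (aliases such as \c{elf} are not covered).

```python
import os

# Format lists copied from the section text; aliases are deliberately omitted.
OBJ_FORMATS = {"obj", "win32", "win64"}
UNIX_FORMATS = {"aout", "as86", "coff", "elf32", "elf64", "elfx32",
                "ieee", "macho32", "macho64"}
SELF_NAMED = {"dbg", "rdf", "ith", "srec"}


def default_output_name(source, fmt):
    """Model of the documented default output file name (hypothetical helper)."""
    stem, _ext = os.path.splitext(source)
    if fmt in OBJ_FORMATS:
        name = stem + ".obj"
    elif fmt in UNIX_FORMATS:
        name = stem + ".o"
    elif fmt in SELF_NAMED:
        name = stem + "." + fmt
    elif fmt == "bin":
        name = stem                     # bin simply drops the extension
    else:
        raise ValueError("format not covered by this sketch: " + fmt)
    # Documented fallback: the input file is never overwritten.
    if name == source:
        name = "nasm.out"
    return name
```

Under these assumptions, \c{default_output_name("myfile.asm", "bin")} gives \c{myfile}, while a source file that has no extension at all falls back to \c{nasm.out}.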
| |
| |
| \S{opt-f} The \i\c{-f} Option: Specifying the \i{Output File Format} |
| |
| If you do not supply the \c{-f} option to NASM, it will choose an |
| output file format for you itself. In the distribution versions of |
| NASM, the default is always \i\c{bin}; if you've compiled your own |
| copy of NASM, you can redefine \i\c{OF_DEFAULT} at compile time and |
| choose what you want the default to be. |
| |
| Like \c{-o}, the intervening space between \c{-f} and the output |
| file format is optional; so \c{-f elf} and \c{-felf} are both valid. |
| |
| A complete list of the available output file formats can be given by |
| issuing the command \i\c{nasm -h}. |
| |
| |
| \S{opt-l} The \i\c{-l} Option: Generating a \i{Listing File} |
| |
| If you supply the \c{-l} option to NASM, followed (with the usual |
| optional space) by a file name, NASM will generate a |
| \i{source-listing file} for you, in which addresses and generated |
| code are listed on the left, and the actual source code, with |
| expansions of multi-line macros (except those which specifically |
| request no expansion in source listings: see \k{nolist}) on the |
| right. For example: |
| |
| \c nasm -f elf myfile.asm -l myfile.lst |
| |
| If a list file is selected, you may turn off listing for a |
| section of your source with \c{[list -]}, and turn it back on |
| with \c{[list +]} (the default). There is no "user |
| form" (without the brackets). This can be used to list only |
| sections of interest, avoiding excessively long listings. |
| |
| \S{opt-L} The \i\c{-L} Option: Additional or Modified Listing Info |
| |
| Use this option to specify listing output details. |
| |
| Supported options are: |
| |
| \b \c{-Lb} show builtin macro packages (standard and \c{%use}) |
| |
| \b \c{-Ld} show byte and repeat counts in decimal, not hex |
| |
| \b \c{-Le} show the preprocessed input |
| |
| \b \c{-Lf} ignore \c{.nolist} and force listing output |
| |
| \b \c{-Lm} show multi-line macro calls with expanded parameters |
| |
| \b \c{-Lp} output a list file in every pass, in case of errors |
| |
| \b \c{-Ls} show all single-line macro definitions |
| |
| \b \c{-Lw} flush the output after every line (very slow!) |
| |
| \b \c{-L+} enable \e{all} listing options |
| |
| These options can be enabled or disabled at runtime using the |
| \c{%pragma list options} directive: |
| |
| \c %pragma list options [+|-]flags... |
| |
| For example, to turn on the \c{d} and \c{m} flags but disable the |
| \c{s} flag: |
| |
| \c %pragma list options +dm -s |
| |
| For forward compatibility reasons, an undefined flag will be |
| ignored. Thus, a new flag introduced in a newer version of NASM can be |
| specified without breaking older versions. Listing flags will always |
| be a single alphanumeric character and are case sensitive. |
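| |
The way a flag string such as \c{+dm -s} updates the active listing options can be sketched as follows. This is a hedged Python model rather than NASM code: \c{apply_list_options} and \c{KNOWN_FLAGS} are hypothetical names, a group without a sign is assumed to enable its flags, and \c{-L+} (enable everything) is not modelled.

```python
KNOWN_FLAGS = set("bdefmpsw")   # the single-letter flags listed in this section


def apply_list_options(active, spec):
    """Return a new flag set after applying e.g. '+dm -s' to `active`."""
    result = set(active)
    for group in spec.split():
        # Assumption: a group with no leading sign enables its flags.
        enable = not group.startswith("-")
        for flag in group.lstrip("+-"):
            if flag not in KNOWN_FLAGS:
                continue            # undefined flags are ignored, as documented
            if enable:
                result.add(flag)
            else:
                result.discard(flag)
    return result
```

Because flags are case sensitive, \c{+D} is treated as an undefined flag in this sketch and leaves the set unchanged.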
| |
| \S{opt-M} The \i\c{-M} Option: Generate \i{Makefile Dependencies} |
| |
| This option can be used to generate makefile dependencies on stdout. |
| This can be redirected to a file for further processing. For example: |
| |
| \c nasm -M myfile.asm > myfile.dep |
| |
| |
| \S{opt-MG} The \i\c{-MG} Option: Generate \i{Makefile Dependencies} |
| |
| This option can be used to generate makefile dependencies on stdout. |
| This differs from the \c{-M} option in that if a nonexisting file is |
| encountered, it is assumed to be a generated file and is added to the |
| dependency list without a prefix. |
| |
| |
| \S{opt-MF} The \i\c{-MF} Option: Set Makefile Dependency File |
| |
| This option can be used with the \c{-M} or \c{-MG} options to send the |
| output to a file, rather than to stdout. For example: |
| |
| \c nasm -M -MF myfile.dep myfile.asm |
| |
| |
| \S{opt-MD} The \i\c{-MD} Option: Assemble and Generate Dependencies |
| |
| The \c{-MD} option acts as the combination of the \c{-M} and \c{-MF} |
| options (i.e. a filename has to be specified.) However, unlike the |
| \c{-M} or \c{-MG} options, \c{-MD} does \e{not} inhibit the normal |
| operation of the assembler. Use this to automatically generate |
| updated dependencies with every assembly session. For example: |
| |
| \c nasm -f elf -o myfile.o -MD myfile.dep myfile.asm |
| |
| If the argument after \c{-MD} is an option rather than a filename, |
| then the output filename is the first applicable one of: |
| |
| \b the filename set in the \c{-MF} option; |
| |
| \b the output filename from the \c{-o} option with \c{.d} appended; |
| |
| \b the input filename with the extension set to \c{.d}. |
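| |
The fallback order in the list above can be written out as a small Python sketch. The function name \c{dependency_file} and its parameters are hypothetical and only mirror the three documented cases; this is not how NASM itself is implemented.

```python
import os


def dependency_file(mf_name=None, output_name=None, input_name=None):
    """Pick the -MD dependency file name when no filename follows -MD."""
    if mf_name is not None:
        return mf_name                      # 1. the name given with -MF
    if output_name is not None:
        return output_name + ".d"           # 2. the -o name with .d appended
    stem, _ext = os.path.splitext(input_name)
    return stem + ".d"                      # 3. input name, extension set to .d
```

Note the difference between the last two cases: with \c{-o myfile.o} the sketch yields \c{myfile.o.d}, whereas with only the input file \c{myfile.asm} it yields \c{myfile.d}.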
| |
| |
| \S{opt-MT} The \i\c{-MT} Option: Dependency Target Name |
| |
| The \c{-MT} option can be used to override the default name of the |
| dependency target. This is normally the same as the output filename, |
| specified by the \c{-o} option. |
| |
| |
| \S{opt-MQ} The \i\c{-MQ} Option: Dependency Target Name (Quoted) |
| |
| The \c{-MQ} option acts as the \c{-MT} option, except it tries to |
| quote characters that have special meaning in Makefile syntax. This |
| is not foolproof, as not all characters with special meaning are |
| quotable in Make. The default output (if no \c{-MT} or \c{-MQ} option |
| is specified) is automatically quoted. |
| |
| |
| \S{opt-MP} The \i\c{-MP} Option: Emit phony targets |
| |
| When used with any of the dependency generation options, the \c{-MP} |
| option causes NASM to emit a phony target without dependencies for |
| each header file. This prevents Make from complaining if a header |
| file has been removed. |
| |
| |
| \S{opt-MW} The \i\c{-MW} Option: Watcom Make quoting style |
| |
| This option causes NASM to attempt to quote dependencies according to |
| Watcom Make conventions rather than POSIX Make conventions (also used |
| by most other Make variants.) This quotes \c{#} as \c{$#} rather than |
| \c{\\#}, uses \c{&} rather than \c{\\} for continuation lines, and |
| encloses filenames containing whitespace in double quotes. |
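| |
As a rough illustration of the two quoting styles, the Python sketch below models only the rules stated in this section: how \c{#} is quoted, and the double quotes Watcom style puts around names containing whitespace. The function name \c{quote_dependency} is hypothetical, and other special characters (and POSIX handling of whitespace) are intentionally left out because the text above does not specify them.

```python
def quote_dependency(name, watcom=False):
    """Quote a dependency name per the rules stated in this section only."""
    if watcom:
        quoted = name.replace("#", "$#")        # Watcom: '#' becomes '$#'
        if any(ch.isspace() for ch in quoted):
            quoted = '"' + quoted + '"'         # whitespace: wrap in quotes
        return quoted
    return name.replace("#", "\\#")             # POSIX: '#' becomes '\#'
```

The continuation character (\c{&} for Watcom Make, a backslash otherwise) applies to whole lines rather than to individual names, so it is not part of this helper.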
| |
| |
| \S{opt-F} The \i\c{-F} Option: Selecting a \i{Debug Information Format} |
| |
| This option selects the format of the debug information emitted into |
| the output file, for use by a debugger. Prior to version 2.03.01, |
| this switch did \e{not} by itself enable output of the selected debug |
| info format; use \c{-g} (see \k{opt-g}) to enable output. Versions |
| 2.03.01 and later automatically enable \c{-g} if \c{-F} is specified. |
| |
| A complete list of the available debug file formats for an output |
| format can be seen by issuing the command \c{nasm -h}. Not |
| all output formats currently support debugging output. |
| |
| This should not be confused with the \c{-f dbg} output format option, |
| see \k{dbgfmt}. |
| |
| |
| \S{opt-g} The \i\c{-g} Option: Enabling \i{Debug Information} |
| |
| This option can be used to generate debugging information in the specified |
| format. See \k{opt-F}. Using \c{-g} without \c{-F} results in emitting |
| debug info in the default format, if any, for the selected output format. |
| If no debug information is currently implemented in the selected output |
| format, \c{-g} is \e{silently ignored}. |
| |
| |
| \S{opt-X} The \i\c{-X} Option: Selecting an \i{Error Reporting Format} |
| |
| This option can be used to select an error reporting format for any |
| error messages that might be produced by NASM. |
| |
| Currently, two error reporting formats may be selected. They are |
| the \c{-Xvc} option and the \c{-Xgnu} option. The GNU format is |
| the default and looks like this: |
| |
| \c filename.asm:65: error: specific error message |
| |
| where \c{filename.asm} is the name of the source file in which the |
| error was detected, \c{65} is the source file line number on which |
| the error was detected, \c{error} is the severity of the error (this |
| could be \c{warning}), and \c{specific error message} is a more |
| detailed text message which should help pinpoint the exact problem. |
| |
| The other format, specified by \c{-Xvc} is the style used by Microsoft |
| Visual C++ and some other programs. It looks like this: |
| |
| \c filename.asm(65) : error: specific error message |
| |
| where the only difference is that the line number is in parentheses |
| instead of being delimited by colons. |
| |
| See also the \c{Visual C++} output format, \k{win32fmt}. |
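| |
A tool that consumes NASM's messages (an editor plugin, for instance) has to recognise whichever of the two layouts was selected. The following Python sketch shows one way to parse both; the regular expressions are derived only from the two sample lines in this section, and the names \c{GNU_RE}, \c{VC_RE} and \c{parse_diagnostic} are hypothetical.

```python
import re

# GNU style:  filename.asm:65: error: specific error message
GNU_RE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+): (?P<severity>\w+): (?P<message>.*)$")
# VC style:   filename.asm(65) : error: specific error message
VC_RE = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+)\) : (?P<severity>\w+): (?P<message>.*)$")


def parse_diagnostic(text):
    """Return (file, line, severity, message), or None if nothing matches."""
    for pattern in (GNU_RE, VC_RE):
        match = pattern.match(text)
        if match:
            return (match.group("file"), int(match.group("line")),
                    match.group("severity"), match.group("message"))
    return None
```

Both sample lines from this section parse to the same tuple, which is the point of the sketch: only the punctuation around the line number differs.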
| |
| \S{opt-Z} The \i\c{-Z} Option: Send Errors to a File |
| |
| Under \I{DOS}\c{MS-DOS} it can be difficult (though there are ways) to |
| redirect the standard-error output of a program to a file. Since |
| NASM usually produces its warning and \i{error messages} on |
| \i\c{stderr}, this can make it hard to capture the errors if (for |
| example) you want to load them into an editor. |
| |
| NASM therefore provides the \c{-Z} option, taking a filename argument, |
| which causes errors to be sent to the specified file rather than |
| standard error. Therefore you can \I{redirecting errors}redirect |
| the errors into a file by typing |
| |
| \c nasm -Z myfile.err -f obj myfile.asm |
| |
| In earlier versions of NASM, this option was called \c{-E}, but it was |
| changed since \c{-E} is the option conventionally used for |
| preprocessing only, and confusing the two had disastrous results. See \k{opt-E}. |
| |
| \S{opt-s} The \i\c{-s} Option: Send Errors to \i\c{stdout} |
| |
| The \c{-s} option redirects \i{error messages} to \c{stdout} rather |
| than \c{stderr}, so it can be redirected under \I{DOS}\c{MS-DOS}. To |
| assemble the file \c{myfile.asm} and pipe its output to the \c{more} |
| program, you can type: |
| |
| \c nasm -s -f obj myfile.asm | more |
| |
| See also the \c{-Z} option, \k{opt-Z}. |
| |
| |
| \S{opt-i} The \i\c{-i}\I\c{-I} Option: Include File Search Directories |
| |
| When NASM sees the \i\c{%include} or \i\c{%pathsearch} directive in a |
| source file (see \k{include}, \k{pathsearch} or \k{incbin}), it will |
| search for the given file not only in the current directory, but also |
| in any directories specified on the command line by the use of the |
| \c{-i} option. Therefore you can include files from a \i{macro |
| library}, for example, by typing |
| |
| \c nasm -ic:\macrolib\ -f obj myfile.asm |
| |
| (As usual, a space between \c{-i} and the path name is allowed, and |
| optional). |
| |
| Prior to NASM 2.14, the path given to this option was used verbatim, |
| and supplying a trailing path separator was up to the caller. One |
| could therefore concatenate a search path prefix directly with a |
| filename, although this was more of a trick than a useful feature. |
| Now a trailing path separator is always added if it is missing, so |
| \c{-ifoo} is treated as \c{-ifoo/}. |
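| |
The change in behaviour at NASM 2.14 can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: the function name \c{include_candidate} and the \c{since_2_14} switch are hypothetical, and the sketch always appends \c{/} and treats both \c{/} and \c{\\} as existing separators, whereas the real choice of separator depends on the platform.

```python
def include_candidate(search_dir, filename, since_2_14=True):
    """Build the path tried for `filename` under one -i directory."""
    if since_2_14 and not search_dir.endswith(("/", "\\")):
        search_dir += "/"          # the separator is now always present
    return search_dir + filename   # older releases concatenated verbatim
```

With the older behaviour, \c{-ifoo} and the file \c{macros.inc} would combine to \c{foomacros.inc}; with the newer behaviour the same option yields \c{foo/macros.inc}.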
| |
| If you want to define a \e{standard} \i{include search path}, |
| similar to \c{/usr/include} on Unix systems, you should place one or |
| more \c{-i} directives in the \c{NASMENV} environment variable (see |
| \k{nasmenv}). |
| |
| For Makefile compatibility with many C compilers, this option can also |
| be specified as \c{-I}. |
| |
| |
| \S{opt-p} The \i\c{-p}\I\c{-P} Option: \I{pre-including files}Pre-Include a File |
| |
| \I\c{%include}NASM allows you to specify files to be |
| \e{pre-included} into your source file, by the use of the \c{-p} |
| option. So running |
| |
| \c nasm myfile.asm -p myinc.inc |
| |
| is equivalent to running \c{nasm myfile.asm} and placing the |
| directive \c{%include "myinc.inc"} at the start of the file. |
| |
| The \c{--include} option is also accepted. |
| |
| For consistency with the \c{-I}, \c{-D} and \c{-U} options, this |
| option can also be specified as \c{-P}. |
| |
| |
| |
| \S{opt-d} The \i\c{-d}\I\c{-D} Option: \I{pre-defining macros}Pre-Define a Macro |
| |
| \I\c{%define}Just as the \c{-p} option gives an alternative to placing |
| \c{%include} directives at the start of a source file, the \c{-d} |
| option gives an alternative to placing a \c{%define} directive. You |
| could code |
| |
| \c nasm myfile.asm -dFOO=100 |
| |
| as an alternative to placing the directive |
| |
| \c %define FOO 100 |
| |
| at the start of the file. You can omit the macro value, as well: |
| the option \c{-dFOO} is equivalent to coding \c{%define FOO}. This |
| form of the directive may be useful for selecting \i{assembly-time |
| options} which are then tested using \c{%ifdef}, for example |
| \c{-dDEBUG}. |
| |
| For Makefile compatibility with many C compilers, this option can also |
| be specified as \c{-D}. |
| |
| |
| \S{opt-u} The \i\c{-u}\I\c{-U} Option: \I{Undefining macros}Undefine a Macro |
| |
| \I\c{%undef}The \c{-u} option undefines a macro that would otherwise |
| have been pre-defined, either automatically or by a \c{-p} or \c{-d} |
| option specified earlier on the command line. |
| |
| For example, the following command line: |
| |
| \c nasm myfile.asm -dFOO=100 -uFOO |
| |
| would result in \c{FOO} \e{not} being a predefined macro in the |
| program. This is useful to override options specified at a different |
| point in a Makefile. |
| |
| For Makefile compatibility with many C compilers, this option can also |
| be specified as \c{-U}. |
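| |
Since \c{-u} only cancels definitions made earlier on the command line, the order of the options matters. The Python sketch below models that left-to-right processing. It is illustrative only: \c{predefined_macros} is a hypothetical name, only the attached forms (\c{-dFOO=100}, \c{-uFOO}) are handled, and macros pre-defined automatically or through \c{-p} are not modelled.

```python
def predefined_macros(args):
    """Apply -d/-D and -u/-U options left to right; return {name: value}."""
    macros = {}
    for arg in args:
        if arg[:2] in ("-d", "-D"):
            name, _sep, value = arg[2:].partition("=")
            macros[name] = value            # -dFOO behaves like "%define FOO"
        elif arg[:2] in ("-u", "-U"):
            macros.pop(arg[2:], None)       # a later -u cancels an earlier -d
    return macros
```

For the command line shown above, the sketch returns an empty dictionary; swapping the two options so that \c{-uFOO} comes first leaves \c{FOO} defined.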
| |
| |
| \S{opt-E} The \i\c{-E}\I{-e} Option: Preprocess Only |
| |
| NASM allows the \i{preprocessor} to be run on its own, up to a |
| point. Using the \c{-E} option (which requires no arguments) will |
| cause NASM to preprocess its input file, expand all the macro |
| references, remove all the comments and preprocessor directives, and |
| print the resulting file on standard output (or save it to a file, |
| if the \c{-o} option is also used). |
| |
| This option cannot be applied to programs which require the |
| preprocessor to evaluate \I{preprocessor expressions}\i{expressions} |
| which depend on the values of symbols: so code such as |
| |
| \c %assign tablesize ($-tablestart) |
| |
| will cause an error in \i{preprocess-only mode}. |
| |
| For compatibility with older versions of NASM, this option can also be |
| written \c{-e}. \c{-E} in older versions of NASM was the equivalent |
| of the current \c{-Z} option, \k{opt-Z}. |
| |
| \S{opt-a} The \i\c{-a} Option: Don't Preprocess At All |
| |
| If NASM is being used as the back end to a compiler, it might be |
| desirable to \I{suppressing preprocessing}suppress preprocessing |
| completely and assume the compiler has already done it, to save time |
| and increase compilation speeds. The \c{-a} option, requiring no |
| argument, instructs NASM to replace its powerful \i{preprocessor} |
| with a \i{stub preprocessor} which does nothing. |
| |
| |
| \S{opt-O} The \i\c{-O} Option: Specifying \i{Multipass Optimization} |
| |
| Using the \c{-O} option, you can tell NASM to carry out different |
| levels of optimization. Multiple flags can be specified after the |
| \c{-O} option, some of which can be combined in a single option, |
| e.g. \c{-Oxv}. |
| |
| \b \c{-O0}: No optimization. All operands take their long forms, |
| if a short form is not specified, except conditional jumps. |
| This is intended to match NASM 0.98 behavior. |
| |
| \b \c{-O1}: Minimal optimization. As above, but immediate operands |
| which will fit in a signed byte are optimized, |
| unless the long form is specified. Conditional jumps default |
| to the long form unless otherwise specified. |
| |
| \b \c{-Ox} (where \c{x} is the actual letter \c{x}): Multipass optimization. |
| Minimize branch offsets and signed immediate bytes, |
| overriding size specification unless the \c{strict} keyword |
| has been used (see \k{strict}). For compatibility with earlier |
| releases, the letter \c{x} may also be any number greater than |
| one. This number has no effect on the actual number of passes. |
| |
| \b \c{-Ov}: At the end of assembly, print the number of passes |
| actually executed. |
| |
| The \c{-Ox} mode is recommended for most uses, and is the default |
| since NASM 2.09. |
| |
| Note that this is a capital \c{O}, and is different from a small \c{o}, which |
| is used to specify the output file name. See \k{opt-o}. |
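| |
| As a rough sketch of what the levels described above mean in |
| practice, consider a 32-bit instruction whose immediate operand |
| would fit in a signed byte: |
| |
| \c         add     eax,4               ; -O0: long (dword) immediate |
| \c                                     ; -O1, -Ox: signed byte immediate |
| \c         add     eax,strict dword 4  ; dword immediate kept with -Ox |
| |
| The second form uses the \c{strict} keyword (see \k{strict}) to stop |
| the optimizer from overriding the stated size. |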
| |
| |
| \S{opt-t} The \i\c{-t} Option: Enable TASM Compatibility Mode |
| |
| NASM includes a limited form of compatibility with Borland's \i\c{TASM}. |
| When NASM's \c{-t} option is used, the following changes are made: |
| |
| \b local labels may be prefixed with \c{@@} instead of \c{.} |
| |
| \b size override is supported within brackets. In TASM compatible mode, |
| a size override inside square brackets changes the size of the operand, |
| and not the address type of the operand as it does in NASM syntax. E.g. |
| \c{mov eax,[DWORD val]} is valid syntax in TASM compatibility mode. |
| Note that you lose the ability to override the default address type for |
| the instruction. |
| |
| \b unprefixed forms of some directives supported (\c{arg}, \c{elif}, |
| \c{else}, \c{endif}, \c{if}, \c{ifdef}, \c{ifdifi}, \c{ifndef}, |
| \c{include}, \c{local}) |
| |
| \S{opt-w} The \i\c{-w} and \i\c{-W} Options: Enable or Disable Assembly \i{Warnings} |
| |
| NASM can observe many conditions during the course of assembly which |
| are worth mentioning to the user, but not a sufficiently severe |
| error to justify NASM refusing to generate an output file. These |
| conditions are reported like errors, but come up with the word |
| `warning' before the message. Warnings do not prevent NASM from |
| generating an output file and returning a success status to the |
| operating system. |
| |
| Some conditions are even less severe than that: they are only |
| sometimes worth mentioning to the user. Therefore NASM supports the |
| \c{-w} command-line option, which enables or disables certain |
| classes of assembly warning. Such warning classes are described by a |
| name, for example \c{label-orphan}; you can enable warnings of |
| this class by the command-line option \c{-w+label-orphan} and |
| disable it by \c{-w-label-orphan}. |
| |
| The current \i{warning classes} are: |
| |
| \& warnings.src |
| |
| Since version 2.15, NASM has group aliases for all prefixed warnings, |
| so they can be used to enable or disable all warnings in the group. |
| For example, \c{-w+float} enables all warnings with names starting |
| with \c{float-}. |
| |
| Since version 2.00, NASM has also supported the \c{gcc}-like syntax |
| \c{-Wwarning-class} and \c{-Wno-warning-class} instead of |
| \c{-w+warning-class} and \c{-w-warning-class}, respectively; both |
| syntaxes work identically. |
| |
| The option \c{-w+error} or \i\c{-Werror} can be used to treat warnings |
| as errors. This can be controlled on a per warning class basis |
| (\c{-w+error=}\e{warning-class} or \c{-Werror=}\e{warning-class}); |
| if no \e{warning-class} is specified NASM treats it as |
| \c{-w+error=all}; the same applies to \c{-w-error} or |
| \i\c{-Wno-error}, |
| of course. |
| |
| In addition, you can control warnings in the source code itself, using |
| the \i\c{[WARNING]} directive. See \k{asmdir-warning}. |
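| |
| For instance, using the \c{label-orphan} class named above (the file |
| name is only a placeholder), the following command lines |
| respectively enable the class, disable it using the \c{gcc}-like |
| spelling, and promote it to an error: |
| |
| \c nasm -w+label-orphan myfile.asm |
| \c nasm -Wno-label-orphan myfile.asm |
| \c nasm -Werror=label-orphan myfile.asm |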
| |
| |
| \S{opt-v} The \i\c{-v} Option: Display \i{Version} Info |
| |
| Typing \c{NASM -v} will display the version of NASM which you are using, |
| and the date on which it was compiled. |
| |
| You will need the version number if you report a bug. |
| |
| For command-line compatibility with Yasm, the form \i\c{--v} is also |
| accepted for this option starting in NASM version 2.11.05. |
| |
| |
| \S{opt-pfix} The \i\c{--(g|l)prefix}, \i\c{--(g|l)postfix} Options. |
| |
| The \c{--(g)prefix} options prepend the given argument |
| to all \c{extern}, \c{common}, \c{static}, and \c{global} symbols, and the |
| \c{--lprefix} option prepends to all other symbols. Similarly, the |
| \c{--(g)postfix} and \c{--lpostfix} options append |
| the argument in exactly the same way as the \c{--xxprefix} options do. |
| |
| Running this: |
| |
| \c nasm -f macho --gprefix _ |
| |
| is equivalent to placing the directive \c{%pragma macho gprefix \_} |
| at the start of the file (\k{mangling}). It will prepend an underscore |
| to all global and external variables, as C requires in some, but not all, |
| system calling conventions. |
| |
| \S{opt-pragma} The \i\c{--pragma} Option |
| |
| The argument to this option is treated as the body of a \c{%pragma} |
| preprocessor statement placed at the beginning of the source. |
| Running this: |
| |
| \c nasm -f macho --pragma "macho gprefix _" |
| |
| is equivalent to the example in \k{opt-pfix}. See \k{pragma}. |
| |
| |
| \S{opt-before} The \i\c{--before} Option |
| |
| This option takes a preprocessor statement as its argument. The example |
| shown in \k{opt-pragma} is the same as running this: |
| |
| \c nasm -f macho --before "%pragma macho gprefix _" |
| |
| |
| \S{opt-limit} The \i\c{--limit-X} Option |
| |
| This option allows the user to set various maximum values, beyond which |
| NASM will terminate with a fatal error rather than consume an arbitrary |
| amount of compute time. Each limit can be set to a positive number or |
| \c{unlimited}. |
| |
| \b\c{--limit-passes}: Maximum number of allowed passes. Default is |
| \c{unlimited}. |
| |
| \b\c{--limit-stalled-passes}: Maximum number of allowed unfinished |
| passes. Default is 1000. |
| |
| \b\c{--limit-macro-levels}: Maximum depth of macro expansion |
| (in the preprocessor). Default is 10000. |
| |
| \b\c{--limit-macro-tokens}: Maximum number of tokens processed during |
| single-line macro expansion. Default is 10000000. |
| |
| \b\c{--limit-mmacros}: Maximum number of multi-line macros processed |
| before returning to the top-level input. Default is 100000. |
| |
| \b\c{--limit-rep}: Maximum repeat count allowed for a preprocessor |
| loop defined with \c{%rep}. Default is 1000000. |
| |
| \b\c{--limit-eval}: Maximum allowed expression length. |
| Default is 8192 on most systems. |
| |
| \b\c{--limit-lines}: Total number of source lines allowed to be |
| processed. Default is 2000000000. |
| |
| For example, set the maximum line count to 1000: |
| |
| \c nasm --limit-lines 1000 |
| |
| Limits can also be set via the directive \c{%pragma limit}, for |
| example: |
| |
| \c %pragma limit lines 1000 |
| |
| |
| \S{opt-keep-all} The \i\c{--keep-all} Option |
| |
| This option prevents NASM from deleting any output files even if an |
| error happens. |
| |
| \S{opt-no-line} The \i\c{--no-line} Option |
| |
| If this option is given, all \i\c{%line} directives in the source code |
| are ignored. This can be useful for debugging already preprocessed |
| code. See \k{line}. |
| |
| |
| \S{nasmenv} The \i\c{NASMENV} \i{Environment} Variable |
| |
| If you define an environment variable called \c{NASMENV}, the program |
| will interpret it as a list of extra command-line options, which are |
| processed before the real command line. You can use this to define |
| standard search directories for include files, by putting \c{-i} |
| options in the \c{NASMENV} variable. |
| |
| The value of the variable is split up at white space, so that the |
| value \c{-s -ic:\\nasmlib\\} will be treated as two separate options. |
| However, that means that the value \c{-dNAME="my name"} won't do |
| what you might want, because it will be split at the space and the |
| NASM command-line processing will get confused by the two |
| nonsensical words \c{-dNAME="my} and \c{name"}. |
| |
| To get round this, NASM provides a feature whereby, if you begin the |
| \c{NASMENV} environment variable with some character that isn't a minus |
| sign, then NASM will treat this character as the \i{separator |
| character} for options. So setting the \c{NASMENV} variable to the |
| value \c{!-s!-ic:\\nasmlib\\} is equivalent to setting it to \c{-s |
| -ic:\\nasmlib\\}, but \c{!-dNAME="my name"} will work. |
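| |
| As a sketch, assuming a POSIX-style shell (the quoting rules of other |
| command interpreters differ), the variable could be set like this: |
| |
| \c export NASMENV='!-s!-dNAME="my name"' |
| |
| after which NASM would behave as though \c{-s} and |
| \c{-dNAME="my name"} had been given ahead of the real command line. |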
| |
| This environment variable was previously called \c{NASM}. This was |
| changed with version 0.98.31. |
| |
| |
| \H{qstart} \i{Quick Start} for \i{MASM} Users |
| |
| If you're used to writing programs with MASM, or with \i{TASM} in |
| MASM-compatible (non-Ideal) mode, or with \i\c{a86}, this section |
| attempts to outline the major differences between MASM's syntax and |
| NASM's. If you're not already used to MASM, it's probably worth |
| skipping this section. |
| |
| |
| \S{qscs} NASM Is \I{case sensitivity}Case-Sensitive |
| |
| One simple difference is that NASM is case-sensitive. It makes a |
| difference whether you call your label \c{foo}, \c{Foo} or \c{FOO}. |
| If you're assembling to \c{DOS} or \c{OS/2} \c{.OBJ} files, you can |
| invoke the \i\c{UPPERCASE} directive (documented in \k{objfmt}) to |
| ensure that all symbols exported to other code modules are forced |
| to be upper case; but even then, \e{within} a single module, NASM |
| will distinguish between labels differing only in case. |
| |
| |
| \S{qsbrackets} NASM Requires \i{Square Brackets} For \i{Memory References} |
| |
| NASM was designed with simplicity of syntax in mind. One of the |
| \i{design goals} of NASM is that it should be possible, as far as is |
| practical, for the user to look at a single line of NASM code |
| and tell what opcode is generated by it. You can't do this in MASM: |
| if you declare, for example, |
| |
| \c foo equ 1 |
| \c bar dw 2 |
| |
| then the two lines of code |
| |
| \c mov ax,foo |
| \c mov ax,bar |
| |
| generate completely different opcodes, despite having |
| identical-looking syntaxes. |
| |
| NASM avoids this undesirable situation by having a much simpler |
| syntax for memory references. The rule is simply that any access to |
| the \e{contents} of a memory location requires square brackets |
| around the address, and any access to the \e{address} of a variable |
| doesn't. So an instruction of the form \c{mov ax,foo} will |
| \e{always} refer to a compile-time constant, whether it's an \c{EQU} |
| or the address of a variable; and to access the \e{contents} of the |
| variable \c{bar}, you must code \c{mov ax,[bar]}. |
| |
| This also means that NASM has no need for MASM's \i\c{OFFSET} |
| keyword, since the MASM code \c{mov ax,offset bar} means exactly the |
| same thing as NASM's \c{mov ax,bar}. If you're trying to get |
| large amounts of MASM code to assemble sensibly under NASM, you |
| can always code \c{%idefine offset} to make the preprocessor treat |
| the \c{OFFSET} keyword as a no-op. |
| |
| This issue is even more confusing in \i\c{a86}, where declaring a |
| label with a trailing colon defines it to be a `label' as opposed to |
| a `variable' and causes \c{a86} to adopt NASM-style semantics; so in |
| \c{a86}, \c{mov ax,var} has different behaviour depending on whether |
| \c{var} was declared as \c{var: dw 0} (a label) or \c{var dw 0} (a |
| word-size variable). NASM is very simple by comparison: |
| \e{everything} is a label. |
| |
| NASM, in the interests of simplicity, also does not support the |
| \i{hybrid syntaxes} supported by MASM and its clones, such as |
| \c{mov ax,table[bx]}, where a memory reference is denoted by one |
| portion outside square brackets and another portion inside. The |
| correct syntax for the above is \c{mov ax,[table+bx]}. Likewise, |
| \c{mov ax,es:[di]} is wrong and \c{mov ax,[es:di]} is right. |
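| |
| The rules in this section can be summarised in a short sketch, |
| reusing \c{foo} and \c{bar} from above; \c{table} is a hypothetical |
| label added for the example: |
| |
| \c foo     equ     1 |
| \c bar     dw      2 |
| \c table   dw      10,20,30 |
| \c |
| \c         mov     ax,foo          ; the constant 1 |
| \c         mov     ax,bar          ; the address of bar |
| \c         mov     ax,[bar]        ; the contents of bar, i.e. 2 |
| \c         mov     ax,[table+bx]   ; not table[bx] |
| \c         mov     ax,[es:di]      ; not es:[di] |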
| |
| |
| \S{qstypes} NASM Doesn't Store \i{Variable Types} |
| |
| NASM, by design, chooses not to remember the types of variables you |
| declare. Whereas MASM will remember, on seeing \c{var dw 0}, that |
| you declared \c{var} as a word-size variable, and will then be able |
| to fill in the \i{ambiguity} in the size of the instruction \c{mov |
| var,2}, NASM will deliberately remember nothing about the symbol |
| \c{var} except where it begins, and so you must explicitly code |
| \c{mov word [var],2}. |
| |
| For this reason, NASM doesn't support the \c{LODS}, \c{MOVS}, |
| \c{STOS}, \c{SCAS}, \c{CMPS}, \c{INS}, or \c{OUTS} instructions, |
| but only supports the forms such as \c{LODSB}, \c{MOVSW}, and |
| \c{SCASD}, which explicitly specify the size of the components of |
| the strings being manipulated. |
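| |
| A brief sketch of the consequences, with \c{var} declared as in the |
| text above: |
| |
| \c var     dw      0 |
| \c |
| \c         mov     word [var],2    ; the size must be stated |
| \c         mov     [var],ax        ; the register implies the size |
| \c         lodsw                   ; the mnemonic carries the size |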
| |
| |
| \S{qsassume} NASM Doesn't \i\c{ASSUME} |
| |
| As part of NASM's drive for simplicity, it also does not support the |
| \c{ASSUME} directive. NASM will not keep track of what values you |
| choose to put in your segment registers, and will never |
| \e{automatically} generate a \i{segment override} prefix. |
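| |
| In practice this means that any override has to be written out, as |
| in this small sketch for 16-bit code: |
| |
| \c         mov     ax,[bx]         ; default segment, no prefix emitted |
| \c         mov     ax,[es:bx]      ; ES override, because you asked |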
| |
| |
| \S{qsmodel} NASM Doesn't Support \i{Memory Models} |
| |
| NASM also does not have any directives to support different 16-bit |
| memory models. The programmer has to keep track of which functions |
| are supposed to be called with a \i{far call} and which with a |
| \i{near call}, and is responsible for putting the correct form of |
| \c{RET} instruction (\c{RETN} or \c{RETF}; NASM accepts \c{RET} |
| itself as an alternate form for \c{RETN}); in addition, the |
| programmer is responsible for coding \c{CALL FAR} instructions where |
| necessary when calling \e{external} functions, and must also keep |
| track of which external variable definitions are far and which are |
| near. |
| |
| |
| \S{qsfpu} \i{Floating-Point} Differences |
| |
| NASM uses different names to refer to floating-point registers from |
| MASM: where MASM would call them \c{ST(0)}, \c{ST(1)} and so on, and |
| \i\c{a86} would call them simply \c{0}, \c{1} and so on, NASM |
| chooses to call them \c{st0}, \c{st1} etc. |
| |
| As of version 0.96, NASM now treats the instructions with |
| \i{`nowait'} forms in the same way as MASM-compatible assemblers. |
| The idiosyncratic treatment employed by 0.95 and earlier was based |
| on a misunderstanding by the authors. |
| |
| |
| \S{qsother} Other Differences |
| |
| For historical reasons, NASM uses the keyword \i\c{TWORD} where MASM |
| and compatible assemblers use \i\c{TBYTE}. |
| |
| Historically, NASM does not declare \i{uninitialized storage} in the |
| same way as MASM: where a MASM programmer might use \c{stack db 64 dup |
| (?)}, NASM requires \c{stack resb 64}, intended to be read as `reserve |
| 64 bytes'. For a limited amount of compatibility, since NASM treats |
| \c{?} as a valid character in symbol names, you can code \c{? equ 0} |
| and then writing \c{dw ?} will at least do something vaguely useful. |
| |
| As of NASM 2.15, the MASM syntax is also supported. |
| |
| In addition to all of this, macros and directives work completely |
| differently to MASM. See \k{preproc} and \k{directive} for further |
| details. |
| |
| \S{masm-compat} MASM compatibility package |
| |
| See \k{pkg_masm}. |
| |
| |
| \C{lang} The NASM Language |
| |
| \H{syntax} Layout of a NASM Source Line |
| |
| As in most assemblers, each NASM source line contains (unless it |
| is a macro, a preprocessor directive or an assembler directive: see |
| \k{preproc} and \k{directive}) some combination of the four fields |
| |
| \c label: instruction operands ; comment |
| |
| As usual, most of these fields are optional; the presence or absence |
| of any combination of a label, an instruction and a comment is allowed. |
| Of course, the operand field is either required or forbidden by the |
| presence and nature of the instruction field. |
| |
| NASM uses backslash (\\) as the line continuation character; if a line |
| ends with backslash, the next line is considered to be a part of the |
| backslash-ended line. |
| |
| NASM places no restrictions on white space within a line: labels may |
| have white space before them, or instructions may have no space |
| before them, or anything. The \i{colon} after a label is also |
| optional. (Note that this means that if you intend to code \c{lodsb} |
| alone on a line, and type \c{lodab} by accident, then that's still a |
| valid source line which does nothing but define a label. Running |
| NASM with the command-line option |
| \I{label-orphan}\c{-w+label-orphan} will cause it to warn you if |
| you define a label alone on a line without a \i{trailing colon}.) |
| |
| \i{Valid characters} in labels are letters, numbers, \c{_}, \c{$}, |
| \c{#}, \c{@}, \c{~}, \c{.}, and \c{?}. The only characters which may |
| be used as the \e{first} character of an identifier are letters, |
| \c{.} (with special meaning: see \k{locallab}), \c{_} and \c{?}. |
| An identifier may also be prefixed with a \I{$, prefix}\c{$} to |
| indicate that it is intended to be read as an identifier and not a |
| reserved word; thus, if some other module you are linking with |
| defines a symbol called \c{eax}, you can refer to \c{$eax} in NASM |
| code to distinguish the symbol from the register. Maximum length of |
| an identifier is 4095 characters. |
| |
| The instruction field may contain any machine instruction: Pentium |
| and P6 instructions, FPU instructions, MMX instructions and even |
| undocumented instructions are all supported. The instruction may be |
| prefixed by \c{LOCK}, \c{REP}, \c{REPE}/\c{REPZ}, \c{REPNE}/\c{REPNZ}, |
| \c{XACQUIRE}/\c{XRELEASE} or \c{BND}/\c{NOBND}, in the usual way. Explicit |
| \I{address-size prefixes}address-size and \i{operand-size prefixes} \i\c{A16}, |
| \i\c{A32}, \i\c{A64}, \i\c{O16}, \i\c{O32} and \i\c{O64} are provided; one example of their use |
| is given in \k{mixsize}. You can also use the name of a \I{segment |
| override}segment register as an instruction prefix: coding |
| \c{es mov [bx],ax} is equivalent to coding \c{mov [es:bx],ax}. We |
| recommend the latter syntax, since it is consistent with other |
| syntactic features of the language, but for instructions such as |
| \c{LODSB}, which has no operands and yet can require a segment |
| override, there is no clean syntactic way to proceed apart from |
| \c{es lodsb}. |
| |
| An instruction is not required to use a prefix: prefixes such as |
| \c{CS}, \c{A32}, \c{LOCK} or \c{REPE} can appear on a line by |
| themselves, and NASM will just generate the prefix bytes. |
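| |
| The following sketch gathers a few of these forms in one place; |
| \c{count} is a hypothetical label used only for illustration: |
| |
| \c         rep movsb               ; REP prefix on a string instruction |
| \c         lock inc dword [count]  ; LOCK prefix on a memory operand |
| \c         mov     [es:bx],ax      ; the recommended override syntax |
| \c         es lodsb                ; segment register as a prefix |
| \c         lock                    ; a prefix on a line by itself |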
| |
| In addition to actual machine instructions, NASM also supports a |
| number of pseudo-instructions, described in \k{pseudop}. |
| |
| Instruction \i{operands} may take a number of forms: they can be |
| registers, described simply by the register name (e.g. \c{ax}, |
| \c{bp}, \c{ebx}, \c{cr0}: NASM does not use the \c{gas}-style |
| syntax in which register names must be prefixed by a \c{%} sign), or |
| they can be \i{effective addresses} (see \k{effaddr}), constants |
| (\k{const}) or expressions (\k{expr}). |
| |
| For x87 \i{floating-point} instructions, NASM accepts a wide range of |
| syntaxes: you can use two-operand forms like MASM supports, or you |
| can use NASM's native single-operand forms in most cases. |
| \# Details of |
| \# all forms of each supported instruction are given in |
| \# \k{iref}. |
| For example, you can code: |
| |
| \c fadd st1 ; this sets st0 := st0 + st1 |
| \c fadd st0,st1 ; so does this |
| \c |
| \c fadd st1,st0 ; this sets st1 := st1 + st0 |
| \c fadd to st1 ; so does this |
| |
| Almost any x87 floating-point instruction that references memory must |
| use one of the prefixes \i\c{DWORD}, \i\c{QWORD} or \i\c{TWORD} to |
| indicate what size of \i{memory operand} it refers to. |
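| |
| For example, with three hypothetical labels \c{val32}, \c{val64} and |
| \c{val80} naming suitably sized storage, one might write: |
| |
| \c         fld     dword [val32]   ; single-precision operand |
| \c         fadd    qword [val64]   ; double-precision operand |
| \c         fstp    tword [val80]   ; extended-precision operand |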
| |
| |
| \H{pseudop} \i{Pseudo-Instructions} |
| |
| Pseudo-instructions are things which, though not real x86 machine |
| instructions, are used in the instruction field anyway because that's |
| the most convenient place to put them. The current pseudo-instructions |
| are \i\c{DB}, \i\c{DW}, \i\c{DD}, \i\c{DQ}, \i\c{DT}, \i\c{DO}, |
| \i\c{DY} and \i\c{DZ}; their \i{uninitialized} counterparts |
| \i\c{RESB}, \i\c{RESW}, \i\c{RESD}, \i\c{RESQ}, \i\c{REST}, |
| \i\c{RESO}, \i\c{RESY} and \i\c{RESZ}; the \i\c{INCBIN} command, the |
| \i\c{EQU} command, and the \i\c{TIMES} prefix. |
| |
| |
| \S{db} \c{DB} and Friends: Declaring Initialized Data |
| |
| \i\c{DB}, \i\c{DW}, \i\c{DD}, \i\c{DQ}, \i\c{DT}, \i\c{DO}, \i\c{DY} |
| and \i\c{DZ} are used, much as in MASM, to declare initialized data in |
| the output file. They can be invoked in a wide range of ways: |
| \I{floating-point}\I{character constant}\I{string constant} |
| |
| \c db 0x55 ; just the byte 0x55 |
| \c db 0x55,0x56,0x57 ; three bytes in succession |
| \c db 'a',0x55 ; character constants are OK |
| \c db 'hello',13,10,'$' ; so are string constants |
| \c dw 0x1234 ; 0x34 0x12 |
| \c dw 'a' ; 0x61 0x00 (it's just a number) |
| \c dw 'ab' ; 0x61 0x62 (character constant) |
| \c dw 'abc' ; 0x61 0x62 0x63 0x00 (string) |
| \c dd 0x12345678 ; 0x78 0x56 0x34 0x12 |
| \c dd 1.234567e20 ; floating-point constant |
| \c dq 0x123456789abcdef0 ; eight byte constant |
| \c dq 1.234567e20 ; double-precision float |
| \c dt 1.234567e20 ; extended-precision float |
| |
| \c{DT}, \c{DO}, \c{DY} and \c{DZ} do not accept \i{numeric constants} |
| as operands. |
| |
| \I{masmdb} Starting in NASM 2.15, the following MASM-like features |
| have been implemented: |
| |
| \b A \I{?db}\c{?} argument to declare uninitialized data: |
| |
| \c db ? ; uninitialized data |
| |
| \b A superset of the \i\c{DUP} syntax. The NASM version of this has |
| the following syntax specification; capital letters indicate literal |
| keywords: |
| |
| \c dx := DB | DW | DD | DQ | DT | DO | DY | DZ |
| \c type := BYTE | WORD | DWORD | QWORD | TWORD | OWORD | YWORD | ZWORD |
| \c atom := expression | string | float | '?' |
| \c parlist := '(' value [, value ...] ')' |
| \c duplist := expression DUP [type] ['%'] parlist |
| \c list := duplist | '%' parlist | type ['%'] parlist |
| \c value := atom | type value | list |
| \c |
| \c stmt := dx value [, value...] |
| |
| \> Note that a \e{list} needs to be prefixed with a \I{%db}\c{%} sign unless |
| prefixed by either \c{DUP} or a \e{type} in order to avoid confusing it with |
| a parenthesis starting an expression. The following expressions are all |
| valid: |
| |
| \c db 33 |
| \c db (44) ; Integer expression |
| \c ; db (44,55) ; Invalid - error |
| \c db %(44,55) |
| \c db %('XX','YY') |
| \c db ('AA') ; Integer expression - outputs single byte |
| \c db %('BB') ; List, containing a string |
| \c db ? |
| \c db 6 dup (33) |
| \c db 6 dup (33, 34) |
| \c db 6 dup (33, 34), 35 |
| \c db 7 dup (99) |
| \c db 7 dup dword (?, word ?, ?) |
| \c dw byte (?,44) |
| \c dw 3 dup (0xcc, 4 dup byte ('PQR'), ?), 0xabcd |
| \c dd 16 dup (0xaaaa, ?, 0xbbbbbb) |
| \c dd 64 dup (?) |
| |
| \S{resb} \c{RESB} and Friends: Declaring \i{Uninitialized} Data |
| |
| \i\c{RESB}, \i\c{RESW}, \i\c{RESD}, \i\c{RESQ}, \i\c{REST}, |
| \i\c{RESO}, \i\c{RESY} and \i\c{RESZ} are designed to be used in the |
| BSS section of a module: they declare \e{uninitialized} storage |
| space. Each takes a single operand, which is the number of bytes, |
| words, doublewords or whatever to reserve. The operand to a |
| \c{RESB}-type pseudo-instruction is a \i\e{critical expression}: see |
| \k{crit}. |
| |
| For example: |
| |
| \c buffer: resb 64 ; reserve 64 bytes |
| \c wordvar: resw 1 ; reserve a word |
| \c realarray resq 10 ; array of ten reals |
| \c ymmval: resy 1 ; one YMM register |
| \c zmmvals: resz 32 ; 32 ZMM registers |
| |
| \I{masmdb} Since NASM 2.15, the MASM syntax of using \I{?db}\c{?} |
| and \i\c{DUP} in the \c{D}\e{x} directives is also supported. Thus, |
| the above example could also be written: |
| |
| \c buffer: db 64 dup (?) ; reserve 64 bytes |
| \c wordvar: dw ? ; reserve a word |
| \c realarray dq 10 dup (?) ; array of ten reals |
| \c ymmval: dy ? ; one YMM register |
| \c zmmvals: dz 32 dup (?) ; 32 ZMM registers |
| |
| |
| \S{incbin} \i\c{INCBIN}: Including External \i{Binary Files} |
| |
| \c{INCBIN} includes binary file data verbatim into the output |
| file. This can be handy for (for example) including \i{graphics} and |
| \i{sound} data directly into a game executable file. It can be called |
| in one of these three ways: |
| |
| \c incbin "file.dat" ; include the whole file |
| \c incbin "file.dat",1024 ; skip the first 1024 bytes |
| \c incbin "file.dat",1024,512 ; skip the first 1024, and |
| \c ; actually include at most 512 |
| |
| \c{INCBIN} is both a directive and a standard macro; the standard |
| macro version searches for the file in the include file search path |
| and adds the file to the dependency lists. This macro can be |
| overridden if desired. |
| |
| |
| \S{equ} \i\c{EQU}: Defining Constants |
| |
| \c{EQU} defines a symbol to a given constant value: when \c{EQU} is |
| used, the source line must contain a label. The action of \c{EQU} is |
| to define the given label name to the value of its (only) operand. |
| This definition is absolute, and cannot change later. So, for |
| example, |
| |
| \c message db 'hello, world' |
| \c msglen equ $-message |
| |
| defines \c{msglen} to be the constant 12. \c{msglen} may not then be |
| redefined later. This is not a \i{preprocessor} definition either: |
| the value of \c{msglen} is evaluated \e{once}, using the value of |
| \c{$} (see \k{expr} for an explanation of \c{$}) at the point of |
| definition, rather than being evaluated wherever it is referenced |
| and using the value of \c{$} at the point of reference. |
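| |
| To illustrate the difference from a preprocessor definition, here is |
| a small sketch; the macro name \c{curoff} is made up for the example: |
| |
| \c message db      'hello, world' |
| \c msglen  equ     $-message       ; evaluated once, here: 12 |
| \c %define curoff  ($-message)     ; evaluated wherever it is used |
| \c |
| \c         dw      msglen          ; stores 12 |
| \c         dw      curoff          ; stores 14, as $ has moved on |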
| |
| |
| \S{times} \i\c{TIMES}: \i{Repeating} Instructions or Data |
| |
| The \c{TIMES} prefix causes the instruction to be assembled multiple |
| times. This is partly present as NASM's equivalent of the \i\c{DUP} |
| syntax supported by \i{MASM}-compatible assemblers, in that you can |
| code |
| |
| \c zerobuf: times 64 db 0 |
| |
| or similar things; but \c{TIMES} is more versatile than that. The |
| argument to \c{TIMES} is not just a numeric constant, but a numeric |
| \e{expression}, so you can do things like |
| |
| \c buffer: db 'hello, world' |
| \c times 64-$+buffer db ' ' |
| |
| which will store exactly enough spaces to make the total length of |
| \c{buffer} up to 64. Finally, \c{TIMES} can be applied to ordinary |
| instructions, so you can code trivial \i{unrolled loops} in it: |
| |
| \c times 100 movsb |
| |
| Note that there is no effective difference between \c{times 100 resb |
| 1} and \c{resb 100}, except that the latter will be assembled about |
| 100 times faster due to the internal structure of the assembler. |
| |
| The operand to \c{TIMES} is a critical expression (\k{crit}). |
| |
| Note also that \c{TIMES} can't be applied to \i{macros}: the reason |
| for this is that \c{TIMES} is processed after the macro phase, which |
| allows the argument to \c{TIMES} to contain expressions such as |
| \c{64-$+buffer} as above. To repeat more than one line of code, or a |
| complex macro, use the preprocessor \i\c{%rep} directive. |
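| |
| As a sketch of the \c{%rep} alternative (the counter name \c{i} is |
| arbitrary), the following emits the four words 0, 1, 4 and 9, |
| something which a single \c{TIMES} prefix could not express: |
| |
| \c %assign i 0 |
| \c %rep    4 |
| \c         dw      i*i |
| \c %assign i i+1 |
| \c %endrep |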
| |
| |
| \H{effaddr} Effective Addresses |
| |
| An \i{effective address} is any operand to an instruction which |
| \I{memory reference}references memory. Effective addresses, in NASM, |
| have a very simple syntax: they consist of an expression evaluating |
| to the desired address, enclosed in \i{square brackets}. For |
| example: |
| |
| \c wordvar dw 123 |
| \c mov ax,[wordvar] |
| \c mov ax,[wordvar+1] |
| \c mov ax,[es:wordvar+bx] |
| |
| Anything not conforming to this simple system is not a valid memory |
| reference in NASM, for example \c{es:wordvar[bx]}. |
| |
| More complicated effective addresses, such as those involving more |
| than one register, work in exactly the same way: |
| |
| \c mov eax,[ebx*2+ecx+offset] |
| \c mov ax,[bp+di+8] |
| |
| NASM is capable of doing \i{algebra} on these effective addresses, |
| so that things which don't necessarily \e{look} legal are perfectly |
| all right: |
| |
| \c mov eax,[ebx*5] ; assembles as [ebx*4+ebx] |
| \c mov eax,[label1*2-label2] ; ie [label1+(label1-label2)] |
| |
| Some forms of effective address have more than one assembled form; |
| in most such cases NASM will generate the smallest form it can. For |
| example, there are distinct assembled forms for the 32-bit effective |
| addresses \c{[eax*2+0]} and \c{[eax+eax]}, and NASM will generally |
| generate the latter on the grounds that the former requires four |
| bytes to store a zero offset. |
| |
| NASM has a hinting mechanism which will cause \c{[eax+ebx]} and |
| \c{[ebx+eax]} to generate different opcodes; this is occasionally |
| useful because \c{[esi+ebp]} and \c{[ebp+esi]} have different |
| default segment registers. |
| |
| However, you can force NASM to generate an effective address in a |
| particular form by the use of the keywords \c{BYTE}, \c{WORD}, |
| \c{DWORD} and \c{NOSPLIT}. If you need \c{[eax+3]} to be assembled |
| using a double-word offset field instead of the one byte NASM will |
| normally generate, you can code \c{[dword eax+3]}. Similarly, you |
| can force NASM to use a byte offset for a small value which it |
| hasn't seen on the first pass (see \k{crit} for an example of such a |
| code fragment) by using \c{[byte eax+offset]}. As special cases, |
| \c{[byte eax]} will code \c{[eax+0]} with a byte offset of zero, and |
| \c{[dword eax]} will code it with a double-word offset of zero. The |
| normal form, \c{[eax]}, will be coded with no offset field. |
| |
| The form described in the previous paragraph is also useful if you |
| are trying to access data in a 32-bit segment from within 16 bit code. |
| For more information on this see the section on mixed-size addressing |
| (\k{mixaddr}). In particular, if you need to access data with a known |
| offset that is larger than will fit in a 16-bit value and you don't |
| specify that it is a dword offset, NASM will cause the high word of |
| the offset to be lost. |
| |
| Similarly, NASM will split \c{[eax*2]} into \c{[eax+eax]} because |
| that allows the offset field to be absent and space to be saved; in |
| fact, it will also split \c{[eax*2+offset]} into |
| \c{[eax+eax+offset]}. You can combat this behaviour by the use of |
| the \c{NOSPLIT} keyword: \c{[nosplit eax*2]} will force |
| \c{[eax*2+0]} to be generated literally. \c{[nosplit eax*1]} has the |
| same effect. Alternatively, the split EA form \c{[0, eax*2]} can be used. |
| However, \c{NOSPLIT} in \c{[nosplit eax+eax]} will be ignored because the |
| user's intention here is taken to be \c{[eax+eax]}. |
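| |
| The size and splitting keywords described above can be seen side by |
| side in the following sketch for 32-bit code; the comments restate |
| the behaviour given in the text: |
| |
| \c         mov     eax,[eax+3]          ; one-byte offset |
| \c         mov     eax,[dword eax+3]    ; double-word offset |
| \c         mov     eax,[eax]            ; no offset field |
| \c         mov     eax,[byte eax]       ; byte offset of zero |
| \c         mov     eax,[eax*2]          ; split into [eax+eax] |
| \c         mov     eax,[nosplit eax*2]  ; [eax*2+0], literally |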
| |
| In 64-bit mode, NASM will by default generate absolute addresses. The |
| \i\c{REL} keyword makes it produce \c{RIP}-relative addresses. Since |
| this is frequently the desired behaviour, it can be made the default |
| with the \c{DEFAULT} directive (\k{default}). The keyword \i\c{ABS} |
| overrides \i\c{REL}. |
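| |
| As a minimal sketch, assuming a hypothetical data label \c{counter}, |
| the two keywords can be written explicitly on individual operands: |
| |
| \c bits 64 |
| \c |
| \c mov rax,[rel counter] ; RIP-relative address |
| \c mov rax,[abs counter] ; absolute address |
| \c |
| \c counter: dq 0 |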
| |
| A new form of split effective address syntax is also supported. This is |
| mainly intended for mib operands as used by MPX instructions, but can |
| be used for any memory reference. The basic concept of this form is |
| splitting base and index. |
| |
| \c mov eax,[ebx+8,ecx*4] ; ebx=base, ecx=index, 4=scale, 8=disp |
| |
| For mib operands, there are several ways of writing an effective address, |
| depending on the tools. NASM supports all current forms of mib syntax: |
| |
| \c ; bndstx |
| \c ; the next 5 lines are parsed identically |
| \c ; base=rax, index=rbx, scale=1, displacement=3 |
| \c bndstx [rax+0x3,rbx], bnd0 ; NASM - split EA |
| \c bndstx [rbx*1+rax+0x3], bnd0 ; GAS - '*1' indicates an index reg |
| \c bndstx [rax+rbx+3], bnd0 ; GAS - without hints |
| \c bndstx [rax+0x3], bnd0, rbx ; ICC-1 |
| \c bndstx [rax+0x3], rbx, bnd0 ; ICC-2 |
| |
| When a broadcasting decorator is used, the opsize keyword should match |
| the size of each element. |
| |
| \c VDIVPS zmm4, zmm5, dword [rbx]{1to16} ; single-precision float |
| \c VDIVPS zmm4, zmm5, zword [rbx] ; packed 512 bit memory |
| |
| |
| \H{const} \i{Constants} |
| |
| NASM understands four different types of constant: numeric, |
| character, string and floating-point. |
| |
| |
| \S{numconst} \i{Numeric Constants} |
| |
| A numeric constant is simply a number. NASM allows you to specify |
| numbers in a variety of number bases, in a variety of ways: you can |
| suffix \c{H} or \c{X}, \c{D} or \c{T}, \c{Q} or \c{O}, and \c{B} or |
| \c{Y} for \i{hexadecimal}, \i{decimal}, \i{octal} and \i{binary} |
| respectively, or you can prefix \c{0x}, for hexadecimal in the style |
| of C, or you can prefix \c{$} for hexadecimal in the style of Borland |
| Pascal or Motorola Assemblers. Note, though, that the \I{$, |
| prefix}\c{$} prefix does double duty as a prefix on identifiers (see |
| \k{syntax}), so a hex number prefixed with a \c{$} sign must have a |
| digit after the \c{$} rather than a letter. In addition, current |
| versions of NASM accept the prefix \c{0h} for hexadecimal, \c{0d} or |
| \c{0t} for decimal, \c{0o} or \c{0q} for octal, and \c{0b} or \c{0y} |
| for binary. Please note that unlike C, a \c{0} prefix by itself does |
| \e{not} imply an octal constant! |
| |
| Numeric constants can have underscores (\c{_}) interspersed to break |
| up long strings. |
| |
| Some examples (all producing exactly the same code): |
| |
| \c mov ax,200 ; decimal |
| \c mov ax,0200 ; still decimal |
| \c mov ax,0200d ; explicitly decimal |
| \c mov ax,0d200 ; also decimal |
| \c mov ax,0c8h ; hex |
| \c mov ax,$0c8 ; hex again: the 0 is required |
| \c mov ax,0xc8 ; hex yet again |
| \c mov ax,0hc8 ; still hex |
| \c mov ax,310q ; octal |
| \c mov ax,310o ; octal again |
| \c mov ax,0o310 ; octal yet again |
| \c mov ax,0q310 ; octal yet again |
| \c mov ax,11001000b ; binary |
| \c mov ax,1100_1000b ; same binary constant |
| \c mov ax,1100_1000y ; same binary constant once more |
| \c mov ax,0b1100_1000 ; same binary constant yet again |
| \c mov ax,0y1100_1000 ; same binary constant yet again |
| |
| \S{strings} \I{Strings}\i{Character Strings} |
| |
| A character string consists of up to eight characters enclosed in |
| either single quotes (\c{'...'}), double quotes (\c{"..."}) or |
| backquotes (\c{`...`}). Single or double quotes are equivalent to |
| NASM (except of course that surrounding the constant with single |
| quotes allows double quotes to appear within it and vice versa); the |
| contents of those are represented verbatim. Strings enclosed in |
| backquotes support C-style \c{\\}-escapes for special characters. |
| |
| |
| The following \i{escape sequences} are recognized by backquoted strings: |
| |
| \c \' single quote (') |
| \c \" double quote (") |
| \c \` backquote (`) |
| \c \\\ backslash (\) |
| \c \? question mark (?) |
| \c \a BEL (ASCII 7) |
| \c \b BS (ASCII 8) |
| \c \t TAB (ASCII 9) |
| \c \n LF (ASCII 10) |
| \c \v VT (ASCII 11) |
| \c \f FF (ASCII 12) |
| \c \r CR (ASCII 13) |
| \c \e ESC (ASCII 27) |
| \c \377 Up to 3 octal digits - literal byte |
| \c \xFF Up to 2 hexadecimal digits - literal byte |
| \c \u1234 4 hexadecimal digits - Unicode character |
| \c \U12345678 8 hexadecimal digits - Unicode character |
| |
| All other escape sequences are reserved. Note that \c{\\0}, meaning a |
| \c{NUL} character (ASCII 0), is a special case of the octal escape |
| sequence. |
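| |
| To illustrate the difference between the quoting styles, here is a |
| small sketch (the message text is arbitrary). Only the backquoted |
| string has its \c{\\n} escape converted: |
| |
| \c db `hello\n`, 0 ; \n is stored as LF (ASCII 10) |
| \c db 'hello\n', 0 ; backslash and 'n' are stored verbatim |
| \c db 'hello', 10, 0 ; same bytes as the backquoted line |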
| |
| \i{Unicode} characters specified with \c{\\u} or \c{\\U} are converted to |
| \i{UTF-8}. For example, the following lines are all equivalent: |
| |
| \c db `\u263a` ; UTF-8 smiley face |
| \c db `\xe2\x98\xba` ; UTF-8 smiley face |
| \c db 0E2h, 098h, 0BAh ; UTF-8 smiley face |
| |
| |
| \S{chrconst} \i{Character Constants} |
| |
| A character constant consists of a string up to eight bytes long, used |
| in an expression context. It is treated as if it was an integer. |
| |
| A character constant with more than one byte will be arranged |
| with \i{little-endian} order in mind: if you code |
| |
| \c mov eax,'abcd' |
| |
| then the constant generated is not \c{0x61626364}, but |
| \c{0x64636261}, so that if you were then to store the value into |
| memory, it would read \c{abcd} rather than \c{dcba}. This is also |
| the sense of character constants understood by the Pentium's |
| \i\c{CPUID} instruction. |
| |
| |
| \S{strconst} \i{String Constants} |
| |
| String constants are character strings used in the context of some |
| pseudo-instructions, namely the |
| \I\c{DW}\I\c{DD}\I\c{DQ}\I\c{DT}\I\c{DO}\I\c{DY}\i\c{DB} family and |
| \i\c{INCBIN} (where it represents a filename.) They are also used in |
| certain preprocessor directives. |
| |
| A string constant looks like a character constant, only longer. It |
| is treated as a concatenation of maximum-size character constants |
| for the pseudo-instruction it is used with. So the following are equivalent: |
| |
| \c db 'hello' ; string constant |
| \c db 'h','e','l','l','o' ; equivalent character constants |
| |
| And the following are also equivalent: |
| |
| \c dd 'ninechars' ; doubleword string constant |
| \c dd 'nine','char','s' ; becomes three doublewords |
| \c db 'ninechars',0,0,0 ; and really looks like this |
| |
| Note that when used in a string-supporting context, quoted strings are |
| treated as string constants even if they are short enough to be a |
| character constant, because otherwise \c{db 'ab'} would have the same |
| effect as \c{db 'a'}, which would be silly. Similarly, three-character |
| or four-character constants are treated as strings when they are |
| operands to \c{DW}, and so forth. |
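| |
| By the same rule, a three-character string given to \c{DW} is padded |
| to a whole number of words. The following lines are offered as an |
| illustrative sketch and should be equivalent: |
| |
| \c dw 'abc' ; word string constant |
| \c dw 'ab','c' ; becomes two words |
| \c db 'abc',0 ; and really looks like this |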
| |
| \S{unicode} \I{UTF-16}\I{UTF-32}\i{Unicode} Strings |
| |
| The special operators \i\c{__?utf16?__}, \i\c{__?utf16le?__}, |
| \i\c{__?utf16be?__}, \i\c{__?utf32?__}, \i\c{__?utf32le?__} and |
| \i\c{__?utf32be?__} allow definition of Unicode strings. They take a |
| string in UTF-8 format and convert it to UTF-16 or UTF-32, |
| respectively. Unless the \c{be} forms are specified, the output is |
| little-endian. |
| |
| For example: |
| |
| \c %define u(x) __?utf16?__(x) |
| \c %define w(x) __?utf32?__(x) |
| \c |
| \c dw u('C:\WINDOWS'), 0 ; Pathname in UTF-16 |
| \c dd w(`A + B = \u206a`), 0 ; String in UTF-32 |
| |
| The UTF operators can be applied either to strings passed to the |
| \c{DB} family instructions, or to character constants in an expression |
| context. |
| |
| \S{fltconst} \I{floating-point, constants}Floating-Point Constants |
| |
| \i{Floating-point} constants are acceptable only as arguments to |
| \i\c{DB}, \i\c{DW}, \i\c{DD}, \i\c{DQ}, \i\c{DT}, and \i\c{DO}, or as |
| arguments to the special operators \i\c{__?float8?__}, |
| \i\c{__?float16?__}, \i\c{__?float32?__}, \i\c{__?float64?__}, |
| \i\c{__?float80m?__}, \i\c{__?float80e?__}, \i\c{__?float128l?__}, and |
| \i\c{__?float128h?__}. |
| |
| Floating-point constants are expressed in the traditional form: |
| digits, then a period, then optionally more digits, then optionally an |
| \c{E} followed by an exponent. The period is mandatory, so that NASM |
| can distinguish between \c{dd 1}, which declares an integer constant, |
| and \c{dd 1.0} which declares a floating-point constant. |
| |
| NASM also supports C99-style hexadecimal floating-point: \c{0x}, |
| hexadecimal digits, period, optionally more hexadecimal digits, then |
| optionally a \c{P} followed by a \e{binary} (not hexadecimal) exponent |
| in decimal notation. As an extension, NASM additionally supports the |
| \c{0h} and \c{$} prefixes for hexadecimal, as well as binary and octal |
| floating-point, using the \c{0b} or \c{0y} and \c{0o} or \c{0q} |
| prefixes, respectively. |
| |
| Underscores to break up groups of digits are permitted in |
| floating-point constants as well. |
| |
| Some examples: |
| |
| \c db -0.2 ; "Quarter precision" |
| \c dw -0.5 ; IEEE 754r/SSE5 half precision |
| \c dd 1.2 ; an easy one |
| \c dd 1.222_222_222 ; underscores are permitted |
| \c dd 0x1p+2 ; 1.0x2^2 = 4.0 |
| \c dq 0x1p+32 ; 1.0x2^32 = 4 294 967 296.0 |
| \c dq 1.e10 ; 10 000 000 000.0 |
| \c dq 1.e+10 ; synonymous with 1.e10 |
| \c dq 1.e-10 ; 0.000 000 000 1 |
| \c dt 3.141592653589793238462 ; pi |
| \c do 1.e+4000 ; IEEE 754r quad precision |
| |
| The 8-bit "quarter-precision" floating-point format is |
| sign:exponent:mantissa = 1:4:3 with an exponent bias of 7. This |
| appears to be the most frequently used 8-bit floating-point format, |
| although it is not covered by any formal standard. This is sometimes |
| called a "\i{minifloat}." |
| |
| The special operators are used to produce floating-point numbers in |
| other contexts. They produce the binary representation of a specific |
| floating-point number as an integer, and can be used anywhere integer |
| constants are used in an expression. \c{__?float80m?__} and |
| \c{__?float80e?__} produce the 64-bit mantissa and 16-bit exponent of an |
| 80-bit floating-point number, and \c{__?float128l?__} and |
| \c{__?float128h?__} produce the lower and upper 64-bit halves of a 128-bit |
| floating-point number, respectively. |
| |
| For example: |
| |
| \c mov rax,__?float64?__(3.141592653589793238462) |
| |
| ... would assign the binary representation of pi as a 64-bit floating |
| point number into \c{RAX}. This is exactly equivalent to: |
| |
| \c mov rax,0x400921fb54442d18 |
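| |
| The other operators work the same way. As a further sketch, |
| single-precision 1.0 has the IEEE 754 bit pattern \c{0x3f800000}, so |
| the following two lines should be equivalent: |
| |
| \c mov eax,__?float32?__(1.0) |
| \c mov eax,0x3f800000 |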
| |
| NASM cannot do compile-time arithmetic on floating-point constants. |
| This is because NASM is designed to be portable - although it always |
| generates code to run on x86 processors, the assembler itself can |
| run on any system with an ANSI C compiler. Therefore, the assembler |
| cannot guarantee the presence of a floating-point unit capable of |
| handling the \i{Intel number formats}, and so for NASM to be able to |
| do floating arithmetic it would have to include its own complete set |
| of floating-point routines, which would significantly increase the |
| size of the assembler for very little benefit. |
| |
| The special tokens \i\c{__?Infinity?__}, \i\c{__?QNaN?__} (or |
| \i\c{__?NaN?__}) and \i\c{__?SNaN?__} can be used to generate |
| \I{infinity}infinities, quiet \i{NaN}s, and signalling NaNs, |
| respectively. These are normally used as macros: |
| |
| \c %define Inf __?Infinity?__ |
| \c %define NaN __?QNaN?__ |
| \c |
| \c dq +1.5, -Inf, NaN ; Double-precision constants |
| |
| The \c{%use fp} standard macro package contains a set of convenience |
| macros. See \k{pkg_fp}. |
| |
| \S{bcdconst} \I{floating-point, packed BCD constants}Packed BCD Constants |
| |
| x87-style packed BCD constants can be used in the same contexts as |
| 80-bit floating-point numbers. They are suffixed with \c{p} or |
| prefixed with \c{0p}, and can include up to 18 decimal digits. |
| |
| As with other numeric constants, underscores can be used to separate |
| digits. |
| |
| For example: |
| |
| \c dt 12_345_678_901_245_678p |
| \c dt -12_345_678_901_245_678p |
| \c dt +0p33 |
| \c dt 33p |
| |
| |
| \H{expr} \i{Expressions} |
| |
| Expressions in NASM are similar in syntax to those in C. Expressions |
| are evaluated as 64-bit integers which are then adjusted to the |
| appropriate size. |
| |
| NASM supports two special tokens in expressions, allowing |
| calculations to involve the current assembly position: the |
| \I{$, here}\c{$} and \i\c{$$} tokens. \c{$} evaluates to the assembly |
| position at the beginning of the line containing the expression; so |
| you can code an \i{infinite loop} using \c{JMP $}. \c{$$} evaluates |
| to the beginning of the current section; so you can tell how far |
| into the section you are by using \c{($-$$)}. |
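| |
| These tokens are commonly used to compute lengths and padding. The |
| following is a minimal sketch; the label names and the 64-byte section |
| size are arbitrary choices for illustration: |
| |
| \c start: jmp $ ; infinite loop: jumps to itself |
| \c message: db 'hello, world' |
| \c msglen equ $-message ; length of the string, 12 bytes |
| \c times 64-($-$$) db 0 ; pad the section to 64 bytes |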
| |
| The arithmetic \i{operators} provided by NASM are listed here, in |
| increasing order of \i{precedence}. |
| |
| A \e{boolean} value is true if nonzero and false if zero. The |
| operators which return a boolean value always return 1 for true and 0 |
| for false. |
| |
| |
| \S{exptri} \I{?op}\c{?} ... \c{:}: Conditional Operator |
| |
| The syntax of this operator, similar to the C conditional operator, is: |
| |
| \e{boolean} \c{?} \e{trueval} \c{:} \e{falseval} |
| |
| This operator evaluates to \e{trueval} if \e{boolean} is true, |
| otherwise to \e{falseval}. |
| |
| Note that NASM allows \c{?} characters in symbol names. Therefore, it |
| is highly advisable to always put spaces around the \c{?} and \c{:} |
| characters. |
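| |
| A minimal sketch, assuming a hypothetical single-line macro \c{DEBUG} |
| that selects between two buffer sizes: |
| |
| \c %define DEBUG 1 |
| \c |
| \c bufsize equ DEBUG ? 256 : 64 ; 256, because DEBUG is nonzero |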
| |
| |
| \S{expbor}: \i\c{||}: \i{Boolean OR} Operator |
| |
| The \c{||} operator gives a boolean OR: it evaluates to 1 if either side of |
| the expression is nonzero, otherwise 0. |
| |
| |
| \S{expbxor}: \i\c{^^}: \i{Boolean XOR} Operator |
| |
| The \c{^^} operator gives a boolean XOR: it evaluates to 1 if exactly one side of |
| the expression is nonzero, otherwise 0. |
| |
| |
| \S{expband}: \i\c{&&}: \i{Boolean AND} Operator |
| |
| The \c{&&} operator gives a boolean AND: it evaluates to 1 if both sides of |
| the expression are nonzero, otherwise 0. |
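| |
| The three boolean operators can be compared side by side. In this |
| sketch the operand values are arbitrary; any nonzero value counts as |
| true: |
| |
| \c db 5 || 0 ; 1: at least one side is nonzero |
| \c db 5 ^^ 0 ; 1: exactly one side is nonzero |
| \c db 5 ^^ 3 ; 0: both sides are nonzero |
| \c db 5 && 0 ; 0: one side is zero |
| \c db 5 && 3 ; 1: both sides are nonzero |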
| |
| |
| \S{exprel}: \i{Comparison Operators} |
| |
| NASM supports the following comparison operators: |
| |
| \b \i\c{=} or \i\c{==} compare for equality. |
| |
| \b \i\c{!=} or \i\c{<>} compare for inequality. |
| |
| \b \i\c{<} compares signed less than. |
| |
| \b \i\c{<=} compares signed less than or equal. |
| |
| \b \i\c{>} compares signed greater than. |
| |
| \b \i\c{>=} compares signed greater than or equal. |
| |
| These operators evaluate to 0 for false or 1 for true. |
| |
| \b \i\c{<=>} does a signed comparison, and evaluates to -1 for less |
| than, 0 for equal, and 1 for greater than. |
| |
| At this time, NASM does not provide unsigned comparison operators. |
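| |
| Because the comparisons are signed, negative values compare below |
| positive ones. A brief sketch with arbitrary operands: |
| |
| \c db -1 < 1 ; 1: signed comparison |
| \c db 3 == 3 ; 1 |
| \c db 3 != 3 ; 0 |
| \c db 3 <=> 5 ; -1, stored as 0FFh |
| \c db 5 <=> 5 ; 0 |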
| |
| |
| \S{expor} \i\c{|}: \i{Bitwise OR} Operator |
| |
| The \c{|} operator gives a bitwise OR, exactly as performed by the |
| \c{OR} machine instruction. |
| |
| |
| \S{expxor} \i\c{^}: \i{Bitwise XOR} Operator |
| |
| \c{^} provides the bitwise XOR operation. |
| |
| |
| \S{expand} \i\c{&}: \i{Bitwise AND} Operator |
| |
| \c{&} provides the bitwise AND operation. |
| |
| |
| \S{expshift} \i{Bit Shift} Operators |
| |
| \i\c{<<} gives a bit-shift to the left, just as it does in C. So |
| \c{5<<3} evaluates to 5 times 8, or 40. \i\c{>>} gives an \e{unsigned} |
| (logical) bit-shift to the right; the bits shifted in from the left |
| are set to zero. |
| |
| \i\c{<<<} gives a bit-shift to the left, exactly equivalent to the |
| \c{<<} operator; it is included for completeness. \i\c{>>>} gives a |
| \e{signed} (arithmetic) bit-shift to the right; the bits shifted in |
| from the left are filled with copies of the most significant (sign) |
| bit. |
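| |
| The difference between the two right shifts shows up only for |
| negative operands. As a sketch, recalling that expressions are |
| evaluated as 64-bit integers: |
| |
| \c dd 5 << 3 ; 40 |
| \c dq -16 >> 2 ; logical: 0x3FFFFFFFFFFFFFFC |
| \c dq -16 >>> 2 ; arithmetic: -4 (0xFFFFFFFFFFFFFFFC) |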
| |
| |
| \S{expplmi} \I{+ opaddition}\c{+} and \I{- opsubtraction}\c{-}: |
| \i{Addition} and \i{Subtraction} Operators |
| |
| The \c{+} and \c{-} operators do perfectly ordinary addition and |
| subtraction. |
| |
| |
| \S{expmul} \i{Multiplication}, \i{Division} and \i{Modulo} |
| |
| \i\c{*} is the multiplication operator. |
| |
| \i\c{/} and \i\c{//} are both division operators: \c{/} is \i{unsigned |
| division} and \c{//} is \i{signed division}. |
| |
| Similarly, \i\c{%} and \i\c{%%} provide \I{unsigned modulo}\I{modulo |
| operators} unsigned and \i{signed modulo} operators respectively. |
| |
| Since the \c{%} character is used extensively by the macro |
| \i{preprocessor}, you should ensure that both the signed and unsigned |
| modulo operators are followed by white space wherever they appear. |
| |
| NASM, like ANSI C, provides no guarantees about the sensible |
| operation of the signed modulo operator. On most systems it will match |
| the signed division operator, such that: |
| |
| \c b * (a // b) + (a %% b) = a (b != 0) |
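| |
| As a sketch of the difference, the unsigned operators treat \c{-7} as |
| the 64-bit value \c{0xFFFFFFFFFFFFFFF9}. The results of the signed |
| forms are deliberately left unstated, given the caveat above about |
| signed modulo: |
| |
| \c dq 7 / 2 ; 3 |
| \c dq 7 % 2 ; 1 |
| \c dq -7 / 2 ; unsigned: 0x7FFFFFFFFFFFFFFC |
| \c dq -7 // 2 ; signed division |
| \c dq -7 %% 2 ; signed modulo |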
| |
| |
| \S{expmul} \i{Unary Operators} |
| |
| The highest-priority operators in NASM's expression grammar are those |
| which only apply to one argument. These are \I{+ opunary}\c{+}, \I{- |
| opunary}\c{-}, \i\c{~}, \I{! opunary}\c{!}, \i\c{SEG}, and the |
| \i{integer functions} operators. |
| |
| \c{-} negates its operand, \c{+} does nothing (it's provided for |
| symmetry with \c{-}), \c{~} computes the \i{one's complement} of its |
| operand, and \c{!} is the \i{logical negation} operator. |
| |
| \c{SEG} provides the \i{segment address} |
| of its operand (explained in more detail in \k{segwrt}). |
| |
| A set of additional operators with leading and trailing double |
| underscores are used to implement the integer functions of the |
| \c{ifunc} macro package, see \k{pkg_ifunc}. |
| |
| |
| \H{segwrt} \i\c{SEG} and \i\c{WRT} |
| |
| When writing large 16-bit programs, which must be split into |
| multiple \i{segments}, it is often necessary to be able to refer to |
| the \I{segment address}segment part of the address of a symbol. NASM |
| supports the \c{SEG} operator to perform this function. |
| |
| The \c{SEG} operator evaluates to the \i\e{preferred} segment base of a |
| symbol, defined as the segment base relative to which the offset of |
| the symbol makes sense. So the code |
| |
| \c mov ax,seg symbol |
| \c mov es,ax |
| \c mov bx,symbol |
| |
| will load \c{ES:BX} with a valid pointer to the symbol \c{symbol}. |
| |
| Things can be more complex than this: since 16-bit segments and |
| \i{groups} may \I{overlapping segments}overlap, you might occasionally |
| want to refer to some symbol using a different segment base from the |
| preferred one. NASM lets you do this, by the use of the \c{WRT} |
| (With Reference To) keyword. So you can do things like |
| |
| \c mov ax,weird_seg ; weird_seg is a segment base |
| \c mov es,ax |
| \c mov bx,symbol wrt weird_seg |
| |
| to load \c{ES:BX} with a different, but functionally equivalent, |
| pointer to the symbol \c{symbol}. |
| |
| NASM supports far (inter-segment) calls and jumps by means of the |
| syntax \c{call segment:offset}, where \c{segment} and \c{offset} |
| both represent immediate values. So to call a far procedure, you |
| could code either of |
| |
| \c call (seg procedure):procedure |
| \c call weird_seg:(procedure wrt weird_seg) |
| |
| (The parentheses are included for clarity, to show the intended |
| parsing of the above instructions. They are not necessary in |
| practice.) |
| |
| NASM supports the syntax \I\c{CALL FAR}\c{call far procedure} as a |
| synonym for the first of the above usages. \c{JMP} works identically |
| to \c{CALL} in these examples. |
| |
| To declare a \i{far pointer} to a data item in a data segment, you |
| must code |
| |
| \c dw symbol, seg symbol |
| |
| NASM supports no convenient synonym for this, though you can always |
| invent one using the macro processor. |
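| |
| One possible sketch of such a synonym, using a hypothetical |
| single-line macro named \c{farptr} (the name is an assumption, not |
| something NASM defines): |
| |
| \c %define farptr(sym) sym, seg sym |
| \c |
| \c dw farptr(symbol) ; expands to: dw symbol, seg symbol |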
| |
| |
| \H{strict} \i\c{STRICT}: Inhibiting Optimization |
| |
| When assembling with the optimizer set to level 2 or higher (see |
| \k{opt-O}), NASM will use size specifiers (\c{BYTE}, \c{WORD}, |
| \c{DWORD}, \c{QWORD}, \c{TWORD}, \c{OWORD}, \c{YWORD} or \c{ZWORD}), |
| but will give them the smallest possible size. The keyword \c{STRICT} |
| can be used to inhibit optimization and force a particular operand to |
| be emitted in the specified size. For example, with the optimizer on, |
| and in \c{BITS 16} mode, |
| |
| \c push dword 33 |
| |
| is encoded in three bytes \c{66 6A 21}, whereas |
| |
| \c push strict dword 33 |
| |
| is encoded in six bytes, with a full dword immediate operand \c{66 68 |
| 21 00 00 00}. |
| |
| With the optimizer off, the same code (six bytes) is generated whether |
| the \c{STRICT} keyword was used or not. |
| |
| |
| \H{crit} \i{Critical Expressions} |
| |
| Although NASM has an optional multi-pass optimizer, there are some |
| expressions which must be resolvable on the first pass. These are |
| called \e{Critical Expressions}. |
| |
| The first pass is used to determine the size of all the assembled |
| code and data, so that the second pass, when generating all the |
| code, knows all the symbol addresses the code refers to. So one |
| thing NASM can't handle is code whose size depends on the value of a |
| symbol declared after the code in question. For example, |
| |
| \c times (label-$) db 0 |
| \c label: db 'Where am I?' |
| |
| The argument to \i\c{TIMES} in this case could equally legally |
| evaluate to anything at all; NASM will reject this example because |
| it cannot tell the size of the \c{TIMES} line when it first sees it. |
| It will just as firmly reject the slightly \I{paradox}paradoxical |
| code |
| |
| \c times (label-$+1) db 0 |
| \c label: db 'NOW where am I?' |
| |
| in which \e{any} value for the \c{TIMES} argument is by definition |
| wrong! |
| |
| NASM rejects these examples by means of a concept called a |
| \e{critical expression}, which is defined to be an expression whose |
| value is required to be computable in the first pass, and which must |
| therefore depend only on symbols defined before it. The argument to |
| the \c{TIMES} prefix is a critical expression. |
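| |
| By contrast, an expression that refers only to symbols already |
| defined can be resolved on the first pass. The following sketch simply |
| reverses the order of the first example: |
| |
| \c label: db 'Where am I?' |
| \c times ($-label) db 0 ; accepted: emits 11 zero bytes |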
| |
| \H{locallab} \i{Local Labels} |
| |
| NASM gives special treatment to symbols beginning with a \i{period}. |
| A label beginning with a single period is treated as a \e{local} |
| label, which means that it is associated with the previous non-local |
| label. So, for example: |
| |
| \c label1 ; some code |
| \c |
| \c .loop |
| \c ; some more code |
| \c |
| \c jne .loop |
| \c ret |
| \c |
| \c label2 ; some code |
| \c |
| \c .loop |
| \c ; some more code |
| \c |
| \c jne .loop |
| \c ret |
| |
| In the above code fragment, each \c{JNE} instruction jumps to the |
| line immediately before it, because the two definitions of \c{.loop} |
| are kept separate by virtue of each being associated with the |
| previous non-local label. |
| |
| This form of local label handling is borrowed from the old Amiga |
| assembler \i{DevPac}; however, NASM goes one step further, in |
| allowing access to local labels from other parts of the code. This |
| is achieved by means of \e{defining} a local label in terms of the |
| previous non-local label: the first definition of \c{.loop} above is |
| really defining a symbol called \c{label1.loop}, and the second |
| defines a symbol called \c{label2.loop}. So, if you really needed |
| to, you could write |
| |
| \c label3 ; some more code |
| \c ; and some more |
| \c |
| \c jmp label1.loop |
| |
| Sometimes it is useful - in a macro, for instance - to be able to |
| define a label which can be referenced from anywhere but which |
| doesn't interfere with the normal local-label mechanism. Such a |
| label can't be non-local because it would interfere with subsequent |
| definitions of, and references to, local labels; and it can't be |
| local because the macro that defined it wouldn't know the label's |
| full name. NASM therefore introduces a third type of label, which is |
| probably only useful in macro definitions: if a label begins with |
| the \I{label prefix}special prefix \i\c{..@}, then it does nothing |
| to the local label mechanism. So you could code |
| |
| \c label1: ; a non-local label |
| \c .local: ; this is really label1.local |
| \c ..@foo: ; this is a special symbol |
| \c label2: ; another non-local label |
| \c .local: ; this is really label2.local |
| \c |
| \c jmp ..@foo ; this will jump three lines up |
| |
| NASM has the capacity to define other special symbols beginning with |
| a double period: for example, \c{..start} is used to specify the |
| entry point in the \c{obj} output format (see \k{dotdotstart}), |
| \c{..imagebase} is used to find out the offset from a base address |
| of the current image in the \c{win64} output format (see \k{win64pic}). |
| So just keep in mind that symbols beginning with a double period are |
| special. |
| |
| |
| \C{preproc} The NASM \i{Preprocessor} |
| |
| NASM contains a powerful \i{macro processor}, which supports |
| conditional assembly, multi-level file inclusion, two forms of macro |
| (single-line and multi-line), and a `context stack' mechanism for |
| extra macro power. Preprocessor directives all begin with a \c{%} |
| sign. |
| |
| The preprocessor collapses all lines which end with a backslash (\\) |
| character into a single line. Thus: |
| |
| \c %define THIS_VERY_LONG_MACRO_NAME_IS_DEFINED_TO \\ |
| \c THIS_VALUE |
| |
| will work like a single-line macro without the backslash-newline |
| sequence. |
| |
| \H{slmacro} \i{Single-Line Macros} |
| |
| \S{define} The Normal Way: \I\c{%idefine}\i\c{%define} |
| |
| Single-line macros are defined using the \c{%define} preprocessor |
| directive. The definitions work in a similar way to C; so you can do |
| things like |
| |
| \c %define ctrl 0x1F & |
| \c %define param(a,b) ((a)+(a)*(b)) |
| \c |
| \c mov byte [param(2,ebx)], ctrl 'D' |
| |
| which will expand to |
| |
| \c mov byte [(2)+(2)*(ebx)], 0x1F & 'D' |
| |
| When the expansion of a single-line macro contains tokens which |
| invoke another macro, the expansion is performed at invocation time, |
| not at definition time. Thus the code |
| |
| \c %define a(x) 1+b(x) |
| \c %define b(x) 2*x |
| \c |
| \c mov ax,a(8) |
| |
| will evaluate in the expected way to \c{mov ax,1+2*8}, even though |
| the macro \c{b} wasn't defined at the time of definition of \c{a}. |
| |
| Note that a single-line macro argument list cannot be preceded by whitespace; |
| otherwise it will be treated as an expansion. For example: |
| |
| \c %define foo (a,b) ; no arguments, (a,b) is the expansion |
| \c %define bar(a,b) ; two arguments, empty expansion |
| |
| |
| Macros defined with \c{%define} are \i{case sensitive}: after |
| \c{%define foo bar}, only \c{foo} will expand to \c{bar}: \c{Foo} or |
| \c{FOO} will not. By using \c{%idefine} instead of \c{%define} (the |
| `i' stands for `insensitive') you can define all the case variants |
| of a macro at once, so that \c{%idefine foo bar} would cause |
| \c{foo}, \c{Foo}, \c{FOO}, \c{fOO} and so on all to expand to |
| \c{bar}. |
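| |
| A minimal sketch contrasting the two directives; the macro names and |
| values are arbitrary: |
| |
| \c %define foo 1 |
| \c %idefine bar 2 |
| \c |
| \c db foo ; expands to: db 1 |
| \c db BAR ; expands to: db 2 |
| \c db bAr ; expands to: db 2 |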
| |
| There is a mechanism which detects when a macro call has occurred as |
| a result of a previous expansion of the same macro, to guard against |
| \i{circular references} and infinite loops. If this happens, the |
| preprocessor will only expand the first occurrence of the macro. |
| Hence, if you code |
| |
| \c %define a(x) 1+a(x) |
| \c |
| \c mov ax,a(3) |
| |
| the macro \c{a(3)} will expand once, becoming \c{1+a(3)}, and will |
| then expand no further. This behaviour can be useful: see \k{32c} |
| for an example of its use. |
| |
| You can \I{overloading, single-line macros}overload single-line |
| macros: if you write |
| |
| \c %define foo(x) 1+x |
| \c %define foo(x,y) 1+x*y |
| |
| the preprocessor will be able to handle both types of macro call, |
| by counting the parameters you pass; so \c{foo(3)} will become |
| \c{1+3} whereas \c{foo(ebx,2)} will become \c{1+ebx*2}. However, if |
| you define |
| |
| \c %define foo bar |
| |
| then no other definition of \c{foo} will be accepted: a macro with |
| no parameters prohibits the definition of the same name as a macro |
| \e{with} parameters, and vice versa. |
| |
| This doesn't prevent single-line macros being \e{redefined}: you can |
| perfectly well define a macro with |
| |
| \c %define foo bar |
| |
| and then re-define it later in the same source file with |
| |
| \c %define foo baz |
| |
| Then everywhere the macro \c{foo} is invoked, it will be expanded |
| according to the most recent definition. This is particularly useful |
| when defining single-line macros with \c{%assign} (see \k{assign}). |
| |
| The following additional features were added in NASM 2.15: |
| |
| It is possible to define an empty string instead of an argument name |
| if the argument is never used. For example: |
| |
| \c %define ereg(foo,) e %+ foo |
| \c mov eax,ereg(dx,cx) |
| |
| A single pair of parentheses is a subcase of a single, unused argument: |
| |
| \c %define myreg() eax |
| \c mov edx,myreg() |
| |
| This is similar to the behavior of the C preprocessor. |
| |
| \b If declared with an \c{=}, NASM will evaluate the argument as an |
| expression after expansion. |
| |
| \b If declared with an \c{&}, the macro parameter will be |
| turned into a quoted string after expansion. |
| |
| \b If declared with a \c{+}, it is a greedy or variadic parameter; it |
| includes any subsequent commas and parameters. |
| |
| \b If declared with an \c{!}, NASM will not strip whitespace and |
| braces (useful in conjunction with \c{&}). |
| |
| For example: |
| |
| \c %define xyzzy(=expr,&val) expr, val |
| \c %define plugh(x) xyzzy(x,x) |
| \c db plugh(3+5), `\0` ; Expands to: db 8, "3+5", `\0` |
| |
| You can \i{pre-define} single-line macros using the `-d' option on |
| the NASM command line: see \k{opt-d}. |
| |
| |
| \S{xdefine} Resolving \c{%define}: \I\c{%ixdefine}\i\c{%xdefine} |
| |
| To have a reference to an embedded single-line macro resolved at the |
| time that the embedding macro is \e{defined}, as opposed to when the |
| embedding macro is \e{expanded}, you need a different mechanism to the |
| one offered by \c{%define}. The solution is to use \c{%xdefine}, or |
| its \I{case sensitive}case-insensitive counterpart \c{%ixdefine}. |
| |
| Suppose you have the following code: |
| |
| \c %define isTrue 1 |
| \c %define isFalse isTrue |
| \c %define isTrue 0 |
| \c |
| \c val1: db isFalse |
| \c |
| \c %define isTrue 1 |
| \c |
| \c val2: db isFalse |
| |
| In this case, \c{val1} is equal to 0, and \c{val2} is equal to 1. |
| This is because, when a single-line macro is defined using |
| \c{%define}, it is expanded only when it is called. As \c{isFalse} |
| expands to \c{isTrue}, the expansion will be the current value of |
| \c{isTrue}. The first time it is called that is 0, and the second |
| time it is 1. |
| |
| If you wanted \c{isFalse} to expand to the value assigned to the |
| embedded macro \c{isTrue} at the time that \c{isFalse} was defined, |
| you need to change the above code to use \c{%xdefine}. |
| |
| \c %xdefine isTrue 1 |
| \c %xdefine isFalse isTrue |
| \c %xdefine isTrue 0 |
| \c |
| \c val1: db isFalse |
| \c |
| \c %xdefine isTrue 1 |
| \c |
| \c val2: db isFalse |
| |
| Now, each time that \c{isFalse} is called, it expands to 1, |
| as that is what the embedded macro \c{isTrue} expanded to at |
| the time that \c{isFalse} was defined. |
| |
| \c{%xdefine} and \c{%ixdefine} support argument expansion exactly the |
| same way that \c{%define} and \c{%idefine} do. |
| |
| |
| \S{indmacro} \i{Macro Indirection}: \I\c{%[}\c{%[...]} |
| |
| The \c{%[...]} construct can be used to expand macros in contexts |
| where macro expansion would otherwise not occur, including in the |
| names of other macros. For example, if you have a set of macros named |
| \c{Foo16}, \c{Foo32} and \c{Foo64}, you could write: |
| |
| \c mov ax,Foo%[__?BITS?__] ; The Foo value |
| |
| to use the builtin macro \c{__?BITS?__} (see \k{bitsm}) to automatically |
| select between them. Similarly, the two statements: |
| |
| \c %xdefine Bar Quux ; Expands due to %xdefine |
| \c %define Bar %[Quux] ; Expands due to %[...] |
| |
| have, in fact, exactly the same effect. |
| |
| \c{%[...]} concatenates to adjacent tokens in the same way that |
| multi-line macro parameters do; see \k{concat} for details. |
| |
| |
| \S{concat%+} Concatenating Single Line Macro Tokens: \i\c{%+} |
| |
| Individual tokens in single line macros can be concatenated, to produce |
| longer tokens for later processing. This can be useful if there are |
| several similar macros that perform similar functions. |
| |
| Please note that a space is required after \c{%+}, in order to |
| disambiguate it from the syntax \c{%+1} used in multiline macros. |
| |
| As an example, consider the following: |
| |
| \c %define BDASTART 400h ; Start of BIOS data area |
| |
| \c struc tBIOSDA ; its structure |
| \c .COM1addr RESW 1 |
| \c .COM2addr RESW 1 |
| \c ; ..and so on |
| \c endstruc |
| |
| Now, if we need to access the elements of tBIOSDA in different places, |
| we can end up with: |
| |
| \c mov ax,BDASTART + tBIOSDA.COM1addr |
| \c mov bx,BDASTART + tBIOSDA.COM2addr |
| |
| This will become pretty ugly (and tedious) if used in many places, and |
| can be reduced in size significantly by using the following macro: |
| |
| \c ; Macro to access BIOS variables by their names (from tBIOSDA): |
| |
| \c %define BDA(x) BDASTART + tBIOSDA. %+ x |
| |
| Now the above code can be written as: |
| |
| \c mov ax,BDA(COM1addr) |
| \c mov bx,BDA(COM2addr) |
| |
| Using this feature, we can simplify references to a lot of macros (and, |
| in turn, reduce typing errors). |
| |
| |
| \S{selfref%?} The Macro Name Itself: \i\c{%?} and \i\c{%??} |
| |
| The special symbols \c{%?} and \c{%??} can be used to reference the |
| macro name itself inside a macro expansion; this is supported for both |
| single- and multi-line macros. \c{%?} refers to the macro name as |
| \e{invoked}, whereas \c{%??} refers to the macro name as |
| \e{declared}. The two are always the same for case-sensitive |
| macros, but for case-insensitive macros, they can differ. |
| |
| For example: |
| |
| \c %idefine Foo mov %?,%?? |
| \c |
| \c foo |
| \c FOO |
| |
| will expand to: |
| |
| \c mov foo,Foo |
| \c mov FOO,Foo |
| |
| The sequence: |
| |
| \c %idefine keyword $%? |
| |
| can be used to make a keyword "disappear", for example in case a new |
| instruction has been used as a label in older code. For example: |
| |
| \c %idefine pause $%? ; Hide the PAUSE instruction |
| |
| |
| \S{undef} Undefining Single-Line Macros: \i\c{%undef} |
| |
| Single-line macros can be removed with the \c{%undef} directive. For |
| example, the following sequence: |
| |
| \c %define foo bar |
| \c %undef foo |
| \c |
| \c mov eax, foo |
| |
| will expand to the instruction \c{mov eax, foo}, since after |
| \c{%undef} the macro \c{foo} is no longer defined. |
| |
| Macros that would otherwise be pre-defined can be undefined using the |
| `-u' option on the NASM command line: see \k{opt-u}. |
| |
| |
| \S{assign} \i{Preprocessor Variables}: \i\c{%assign} |
| |
| An alternative way to define single-line macros is by means of the |
| \c{%assign} command (and its \I{case sensitive}case-insensitive |
| counterpart \i\c{%iassign}, which differs from \c{%assign} in |
| exactly the same way that \c{%idefine} differs from \c{%define}). |
| |
| \c{%assign} is used to define single-line macros which take no |
| parameters and have a numeric value. This value can be specified in |
| the form of an expression, and it will be evaluated once, when the |
| \c{%assign} directive is processed. |
| |
| Like \c{%define}, macros defined using \c{%assign} can be re-defined |
| later, so you can do things like |
| |
| \c %assign i i+1 |
| |
| to increment the numeric value of a macro. |
| |
| \c{%assign} is useful for controlling the termination of \c{%rep} |
| preprocessor loops: see \k{rep} for an example of this. Another |
| use for \c{%assign} is given in \k{16c} and \k{32c}. |
| |
| The expression passed to \c{%assign} is a \i{critical expression} |
| (see \k{crit}), and must also evaluate to a pure number (rather than |
| a relocatable reference such as a code or data address, or anything |
| involving a register). |
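| |
| As a minimal sketch of the counter idiom (the macro name \c{i} and the |
| table of squares are purely illustrative), the following re-evaluates |
| \c{%assign} on every pass of a \c{%rep} loop: |
| |
| \c %assign i 0 |
| \c %rep 4 |
| \c db i*i ; emits 0, 1, 4, 9 in turn |
| \c %assign i i+1 |
| \c %endrep |
| |
| Because the right-hand side is evaluated each time \c{%assign} is |
| processed, \c{i} always holds a plain number, whereas \c{%xdefine i i+1} |
| would leave \c{i} as an ever-longer token sequence such as \c{0+1+1}. |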
| |
| |
| \S{defstr} Defining Strings: \I\c{%idefstr}\i\c{%defstr} |
| |
| \c{%defstr}, and its case-insensitive counterpart \c{%idefstr}, define |
| or redefine a single-line macro without parameters, but convert the |
| entire right-hand side, after macro expansion, to a quoted string |
| before definition. |
| |
| For example: |
| |
| \c %defstr test TEST |
| |
| is equivalent to |
| |
| \c %define test 'TEST' |
| |
| This can be used, for example, with the \c{%!} construct (see |
| \k{getenv}): |
| |
| \c %defstr PATH %!PATH ; The operating system PATH variable |
| |
| |
| \S{deftok} Defining Tokens: \I\c{%ideftok}\i\c{%deftok} |
| |
| \c{%deftok}, and its case-insensitive counterpart \c{%ideftok}, define |
| or redefine a single-line macro without parameters, but convert the |
| second parameter, after string conversion, to a sequence of tokens. |
| |
| For example: |
| |
| \c %deftok test 'TEST' |
| |
| is equivalent to |
| |
| \c %define test TEST |
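| |
| Since the string may itself be produced by other string operators, |
| \c{%deftok} can turn a computed string back into an identifier. A small |
| sketch, in which the macro names \c{regname} and \c{reg} are illustrative |
| only, and which assumes that the operand is macro-expanded before it is |
| converted: |
| |
| \c %strcat regname 'e', 'ax' ; regname is now the string 'eax' |
| \c %deftok reg regname ; reg is now the token eax |
| \c |
| \c mov reg,1 ; assembles as mov eax,1 |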
| |
| |
| \S{defalias} Defining Aliases: \I\c{%idefalias}\i\c{%defalias} |
| |
| \c{%defalias}, and its case-insensitive counterpart \c{%idefalias}, define an |
| alias to a macro, i.e. the equivalent of a symbolic link. |
| |
| When used with various macro defining and undefining directives, it |
| affects the aliased macro. This functionality is intended for being |
| able to rename macros while retaining the legacy names. |
| |
| If an alias is defined and the aliased macro is then undefined, the |
| alias remains and can legitimately point to a nonexistent macro. |
| |
| The alias can be undefined using the \c{%undefalias} directive. \e{All} |
| aliases can be undefined using the \c{%clear defalias} directive. This |
| includes backwards compatibility aliases defined by NASM itself. |
| |
| To disable aliases without undefining them, use the \c{%aliases off} |
| directive. |
| |
| To check whether an alias is defined, regardless of the existence of |
| the aliased macro, use \c{%ifdefalias}. |
| |
| For example: |
| |
| \c %defalias OLD NEW |
| \c ; OLD and NEW both undefined |
| \c %define NEW 123 |
| \c ; OLD and NEW both 123 |
| \c %undef OLD |
| \c ; OLD and NEW both undefined |
| \c %define OLD 456 |
| \c ; OLD and NEW both 456 |
| \c %undefalias OLD |
| \c ; OLD undefined, NEW defined to 456 |
| |
| \S{cond-comma} \i{Conditional Comma Operator}: \i\c{%,} |
| |
| As of version 2.15, NASM has a conditional comma operator \c{%,} that |
| expands to a comma \e{unless} followed by a null expansion, which |
| allows suppressing the comma before an empty argument. This is |
| especially useful with greedy single-line macros. |
| |
| For example, all the expressions below are valid: |
| |
| \c %define greedy(a,b,c+) a + 66 %, b * 3 %, c |
| \c |
| \c db greedy(1,2) ; db 1 + 66, 2 * 3 |
| \c db greedy(1,2,3) ; db 1 + 66, 2 * 3, 3 |
| \c db greedy(1,2,3,4) ; db 1 + 66, 2 * 3, 3, 4 |
| \c db greedy(1,2,3,4,5) ; db 1 + 66, 2 * 3, 3, 4, 5 |
| |
| |
| \H{strlen} \i{String Manipulation in Macros} |
| |
| It's often useful to be able to handle strings in macros. NASM |
| supports a few simple string handling macro operators from which |
| more complex operations can be constructed. |
| |
| All the string operators define or redefine a single-line macro, |
| assigning it either a string or a numeric value. When producing a string |
| value, they may change the style of quoting of the input string or |
| strings, and possibly use \c{\\}-escapes inside \c{`}-quoted strings. |
| |
| \S{strcat} \i{Concatenating Strings}: \i\c{%strcat} |
| |
| The \c{%strcat} operator concatenates quoted strings and assigns the |
| result to a single-line macro. |
| |
| For example: |
| |
| \c %strcat alpha "Alpha: ", '12" screen' |
| |
| ... would assign the value \c{'Alpha: 12" screen'} to \c{alpha}. |
| Similarly: |
| |
| \c %strcat beta '"foo"\', "'bar'" |
| |
| ... would assign the value \c{`"foo"\\\\'bar'`} to \c{beta}. |
| |
| The use of commas to separate strings is permitted but optional. |
| |
| |
| \S{strlen} \i{String Length}: \i\c{%strlen} |
| |
| The \c{%strlen} operator assigns the length of a string to a macro. |
| For example: |
| |
| \c %strlen charcnt 'my string' |
| |
| In this example, \c{charcnt} would receive the value 9, just as |
| if an \c{%assign} had been used. Here \c{'my string'} |
| was a literal string, but it could also have been a single-line |
| macro that expands to a string, as in the following example: |
| |
| \c %define sometext 'my string' |
| \c %strlen charcnt sometext |
| |
| As in the first case, this would result in \c{charcnt} being |
| assigned the value of 9. |
| |
| |
| \S{substr} \i{Extracting Substrings}: \i\c{%substr} |
| |
| Individual letters or substrings in strings can be extracted using the |
| \c{%substr} operator. An example of its use is probably more useful |
| than the description: |
| |
| \c %substr mychar 'xyzw' 1 ; equivalent to %define mychar 'x' |
| \c %substr mychar 'xyzw' 2 ; equivalent to %define mychar 'y' |
| \c %substr mychar 'xyzw' 3 ; equivalent to %define mychar 'z' |
| \c %substr mychar 'xyzw' 2,2 ; equivalent to %define mychar 'yz' |
| \c %substr mychar 'xyzw' 2,-1 ; equivalent to %define mychar 'yzw' |
| \c %substr mychar 'xyzw' 2,-2 ; equivalent to %define mychar 'yz' |
| |
| As with \c{%strlen} (see \k{strlen}), the first parameter is the |
| single-line macro to be created and the second is the string. The |
| third parameter specifies the first character to be selected, and the |
| optional fourth parameter (preceded by a comma) is the length. Note |
| that the first index is 1, not 0 and the last index is equal to the |
| value that \c{%strlen} would assign given the same string. Index |
| values out of range result in an empty string. A negative length |
| means "until N-1 characters before the end of string", i.e. \c{-1} |
| means until end of string, \c{-2} until one character before, etc. |
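| |
| Combined with \c{%strlen} and \c{%assign}, \c{%substr} can walk over a |
| string one character at a time. The following is only a sketch; the |
| macro names \c{msg}, \c{len}, \c{idx} and \c{chr} are illustrative: |
| |
| \c %define msg 'abc' |
| \c %strlen len msg |
| \c %assign idx 1 |
| \c %rep len |
| \c %substr chr msg idx ; 'a', then 'b', then 'c' |
| \c db chr |
| \c %assign idx idx+1 |
| \c %endrep |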
| |
| |
| \H{mlmacro} \i{Multi-Line Macros}: \I\c{%imacro}\i\c{%macro} |
| |
| Multi-line macros are much more like the type of macro seen in MASM |
| and TASM: a multi-line macro definition in NASM looks something like |
| this: |
| |
| \c %macro prologue 1 |
| \c |
| \c push ebp |
| \c mov ebp,esp |
| \c sub esp,%1 |
| \c |
| \c %endmacro |
| |
| This defines a C-like function prologue as a macro: so you would |
| invoke the macro with a call such as: |
| |
| \c myfunc: prologue 12 |
| |
| which would expand to the three lines of code |
| |
| \c myfunc: push ebp |
| \c mov ebp,esp |
| \c sub esp,12 |
| |
| The number \c{1} after the macro name in the \c{%macro} line defines |
| the number of parameters the macro \c{prologue} expects to receive. |
| The use of \c{%1} inside the macro definition refers to the first |
| parameter to the macro call. With a macro taking more than one |
| parameter, subsequent parameters would be referred to as \c{%2}, |
| \c{%3} and so on. |
| |
| Multi-line macros, like single-line macros, are \i{case-sensitive}, |
| unless you define them using the alternative directive \c{%imacro}. |
| |
| If you need to pass a comma as \e{part} of a parameter to a |
| multi-line macro, you can do that by enclosing the entire parameter |
| in \I{braces, around macro parameters}braces. So you could code |
| things like: |
| |
| \c %macro silly 2 |
| \c |
| \c %2: db %1 |
| \c |
| \c %endmacro |
| \c |
| \c silly 'a', letter_a ; letter_a: db 'a' |
| \c silly 'ab', string_ab ; string_ab: db 'ab' |
| \c silly {13,10}, crlf ; crlf: db 13,10 |
| |
| The behavior with regard to empty arguments at the end of multi-line |
| macros before NASM 2.15 was often very strange. For backwards |
| compatibility, NASM attempts to recognize cases where the legacy |
| behavior would give unexpected results, and issues a warning, but |
| largely tries to match the legacy behavior. This can be disabled with |
| the \c{%pragma} (see \k{pragma-preproc}): |
| |
| \c %pragma preproc sane_empty_expansion |
| |
| |
| \S{mlmacover} Overloading Multi-Line Macros\I{overloading, multi-line macros} |
| |
| As with single-line macros, multi-line macros can be overloaded by |
| defining the same macro name several times with different numbers of |
| parameters. This time, no exception is made for macros with no |
| parameters at all. So you could define |
| |
| \c %macro prologue 0 |
| \c |
| \c push ebp |
| \c mov ebp,esp |
| \c |
| \c %endmacro |
| |
| to define an alternative form of the function prologue which |
| allocates no local stack space. |
| |
| Sometimes, however, you might want to `overload' a machine |
| instruction; for example, you might want to define |
| |
| \c %macro push 2 |
| \c |
| \c push %1 |
| \c push %2 |
| \c |
| \c %endmacro |
| |
| so that you could code |
| |
| \c push ebx ; this line is not a macro call |
| \c push eax,ecx ; but this one is |
| |
| Ordinarily, NASM will give a warning for the first of the above two |
| lines, since \c{push} is now defined to be a macro, and is being |
| invoked with a number of parameters for which no definition has been |
| given. The correct code will still be generated. This warning can be |
| disabled by the use of the \c{-w-macro-params} command-line option |
| (see \k{opt-w}). |
| |
| |
| \S{maclocal} \i{Macro-Local Labels} |
| |
| NASM allows you to define labels within a multi-line macro |
| definition in such a way as to make them local to the macro call: so |
| calling the same macro multiple times will use a different label |
| each time. You do this by prefixing \i\c{%%} to the label name. So |
| you can invent an instruction which executes a \c{RET} if the \c{Z} |
| flag is set by doing this: |
| |
| \c %macro retz 0 |
| \c |
| \c jnz %%skip |
| \c ret |
| \c %%skip: |
| \c |
| \c %endmacro |
| |
| You can call this macro as many times as you want, and every time |
| you call it NASM will make up a different `real' name to substitute |
| for the label \c{%%skip}. The names NASM invents are of the form |
| \c{..@2345.skip}, where the number 2345 changes with every macro |
| call. The \i\c{..@} prefix prevents macro-local labels from |
| interfering with the local label mechanism, as described in |
| \k{locallab}. You should avoid defining your own labels in this form |
| (the \c{..@} prefix, then a number, then another period) in case |
| they interfere with macro-local labels. |
| |
| |
| \S{mlmacgre} \i{Greedy Macro Parameters} |
| |
| Occasionally it is useful to define a macro which lumps its entire |
| command line into one parameter definition, possibly after |
| extracting one or two smaller parameters from the front. An example |
| might be a macro to write a text string to a file in MS-DOS, where |
| you might want to be able to write |
| |
| \c writefile [filehandle],"hello, world",13,10 |
| |
| NASM allows you to define the last parameter of a macro to be |
| \e{greedy}, meaning that if you invoke the macro with more |
| parameters than it expects, all the spare parameters get lumped into |
| the last defined one along with the separating commas. So if you |
| code: |
| |
| \c %macro writefile 2+ |
| \c |
| \c jmp %%endstr |
| \c %%str: db %2 |
| \c %%endstr: |
| \c mov dx,%%str |
| \c mov cx,%%endstr-%%str |
| \c mov bx,%1 |
| \c mov ah,0x40 |
| \c int 0x21 |
| \c |
| \c %endmacro |
| |
| then the example call to \c{writefile} above will work as expected: |
| the text before the first comma, \c{[filehandle]}, is used as the |
| first macro parameter and expanded when \c{%1} is referred to, and |
| all the subsequent text is lumped into \c{%2} and placed after the |
| \c{db}. |
| |
| The greedy nature of the macro is indicated to NASM by the use of |
| the \I{+ modifier}\c{+} sign after the parameter count on the |
| \c{%macro} line. |
| |
| If you define a greedy macro, you are effectively telling NASM how |
| it should expand the macro given \e{any} number of parameters from |
| the actual number specified up to infinity; in this case, for |
| example, NASM now knows what to do when it sees a call to |
| \c{writefile} with 2, 3, 4 or more parameters. NASM will take this |
| into account when overloading macros, and will not allow you to |
| define another form of \c{writefile} taking 4 parameters (for |
| example). |
| |
| Of course, the above macro could have been implemented as a |
| non-greedy macro, in which case the call to it would have had to |
| look like |
| |
| \c writefile [filehandle], {"hello, world",13,10} |
| |
| NASM provides both mechanisms for putting \i{commas in macro |
| parameters}, and you choose which one you prefer for each macro |
| definition. |
| |
| See \k{sectmac} for a better way to write the above macro. |
| |
| \S{mlmacrange} \i{Macro Parameters Range} |
| |
| NASM allows you to expand a range of parameters via the special construct \c{%\{x:y\}}, |
| where \c{x} is the first parameter index and \c{y} is the last. Either index can |
| be negative or positive, but must never be zero. |
| |
| For example |
| |
| \c %macro mpar 1-* |
| \c db %{3:5} |
| \c %endmacro |
| \c |
| \c mpar 1,2,3,4,5,6 |
| |
| expands to the range \c{3,4,5}. |
| |
| The parameters can also be given in reverse order, so that |
| |
| \c %macro mpar 1-* |
| \c db %{5:3} |
| \c %endmacro |
| \c |
| \c mpar 1,2,3,4,5,6 |
| |
| expands to the range \c{5,4,3}. |
| |
| Parameters can also be addressed via negative indices, in which case NASM |
| counts them from the end of the parameter list, much like negative indices |
| in Python. For example, |
| |
| \c %macro mpar 1-* |
| \c db %{-1:-3} |
| \c %endmacro |
| \c |
| \c mpar 1,2,3,4,5,6 |
| |
| expands to the range \c{6,5,4}. |
| |
| Note that NASM uses a \i{comma} to separate the parameters being expanded. |
| |
| As a useful trick, the range \c{%\{-1:-1\}} gives you the \i{last} |
| argument passed to a macro. |
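| |
| A brief sketch of this trick, using a hypothetical macro named |
| \c{lastarg}: |
| |
| \c %macro lastarg 1-* |
| \c db %{-1:-1} |
| \c %endmacro |
| \c |
| \c lastarg 1,2,3 ; db 3 |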
| |
| \S{mlmacdef} \i{Default Macro Parameters} |
| |
| NASM also allows you to define a multi-line macro with a \e{range} |
| of allowable parameter counts. If you do this, you can specify |
| defaults for \i{omitted parameters}. So, for example: |
| |
| \c %macro die 0-1 "Painful program death has occurred." |
| \c |
| \c writefile 2,%1 |
| \c mov ax,0x4c01 |
| \c int 0x21 |
| \c |
| \c %endmacro |
| |
| This macro (which makes use of the \c{writefile} macro defined in |
| \k{mlmacgre}) can be called with an explicit error message, which it |
| will display on the error output stream before exiting, or it can be |
| called with no parameters, in which case it will use the default |
| error message supplied in the macro definition. |
| |
| In general, you supply a minimum and maximum number of parameters |
| for a macro of this type; the minimum number of parameters are then |
| required in the macro call, and then you provide defaults for the |
| optional ones. So if a macro definition began with the line |
| |
| \c %macro foobar 1-3 eax,[ebx+2] |
| |
| then it could be called with between one and three parameters, and |
| \c{%1} would always be taken from the macro call. \c{%2}, if not |
| specified by the macro call, would default to \c{eax}, and \c{%3} if |
| not specified would default to \c{[ebx+2]}. |
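| |
| To illustrate, here is a hypothetical macro (neither the name \c{store} |
| nor the label \c{buf} is part of NASM) with one required parameter and |
| one defaulted parameter: |
| |
| \c %macro store 1-2 eax |
| \c mov [%1],%2 |
| \c %endmacro |
| \c |
| \c store buf ; mov [buf],eax |
| \c store buf,ecx ; mov [buf],ecx |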
| |
| You can provide extra information to a macro by providing |
| too many default parameters: |
| |
| \c %macro quux 1 something |
| |
| This will trigger a warning by default; see \k{opt-w} for |
| more information. |
| When \c{quux} is invoked, it receives not one but two parameters. |
| \c{something} can be referred to as \c{%2}. The difference |
| between passing \c{something} this way and writing \c{something} |
| in the macro body is that the default parameter is evaluated |
| when the macro is defined, not when it is expanded. |
| |
| You may omit parameter defaults from the macro definition, in which |
| case the parameter default is taken to be blank. This can be useful |
| for macros which can take a variable number of parameters, since the |
| \i\c{%0} token (see \k{percent0}) allows you to determine how many |
| parameters were really passed to the macro call. |
| |
| This defaulting mechanism can be combined with the greedy-parameter |
| mechanism; so the \c{die} macro above could be made more powerful, |
| and more useful, by changing the first line of the definition to |
| |
| \c %macro die 0-1+ "Painful program death has occurred.",13,10 |
| |
| The maximum parameter count can be infinite, denoted by \c{*}. In |
| this case, of course, it is impossible to provide a \e{full} set of |
| default parameters. Examples of this usage are shown in \k{rotate}. |
| |
| |
| \S{percent0} \i\c{%0}: \I{counting macro parameters}Macro Parameter Counter |
| |
| The parameter reference \c{%0} will return a numeric constant giving the |
| number of parameters received, that is, if \c{%0} is n then \c{%}n is the |
| last parameter. \c{%0} is mostly useful for macros that can take a variable |
| number of parameters. It can be used as an argument to \c{%rep} |
| (see \k{rep}) in order to iterate through all the parameters of a macro. |
| Examples are given in \k{rotate}. |
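| |
| As a brief sketch (the macro name \c{counted} is illustrative), the |
| following emits the number of arguments as a byte, followed by the |
| arguments themselves: |
| |
| \c %macro counted 1-* |
| \c db %0 |
| \c %rep %0 |
| \c db %1 |
| \c %rotate 1 |
| \c %endrep |
| \c %endmacro |
| \c |
| \c counted 10,20,30 ; db 3, then db 10, db 20, db 30 |
| |
| Rotating the parameters does not change their number, so \c{%0} keeps |
| the same value throughout the loop. |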
| |
| |
| \S{percent00} \i\c{%00}: \I{label preceeding macro}Label Preceding Macro |
| |
| \c{%00} will return the label preceding the macro invocation, if any. The |
| label must be on the same line as the macro invocation, may be a local label |
| (see \k{locallab}), and need not end in a colon. |
| |
| |
| \S{rotate} \i\c{%rotate}: \i{Rotating Macro Parameters} |
| |
| Unix shell programmers will be familiar with the \I{shift |
| command}\c{shift} shell command, which allows the arguments passed |
| to a shell script (referenced as \c{$1}, \c{$2} and so on) to be |
| moved left by one place, so that the argument previously referenced |
| as \c{$2} becomes available as \c{$1}, and the argument previously |
| referenced as \c{$1} is no longer available at all. |
| |
| NASM provides a similar mechanism, in the form of \c{%rotate}. As |
| its name suggests, it differs from the Unix \c{shift} in that no |
| parameters are lost: parameters rotated off the left end of the |
| argument list reappear on the right, and vice versa. |
| |
| \c{%rotate} is invoked with a single numeric argument (which may be |
| an expression). The macro parameters are rotated to the left by that |
| many places. If the argument to \c{%rotate} is negative, the macro |
| parameters are rotated to the right. |
| |
| \I{iterating over macro parameters}So a pair of macros to save and |
| restore a set of registers might work as follows: |
| |
| \c %macro multipush 1-* |
| \c |
| \c %rep %0 |
| \c push %1 |
| \c %rotate 1 |
| \c %endrep |
| \c |
| \c %endmacro |
| |
| This macro invokes the \c{PUSH} instruction on each of its arguments |
| in turn, from left to right. It begins by pushing its first |
| argument, \c{%1}, then invokes \c{%rotate} to move all the arguments |
| one place to the left, so that the original second argument is now |
| available as \c{%1}. Repeating this procedure as many times as there |
| were arguments (achieved by supplying \c{%0} as the argument to |
| \c{%rep}) causes each argument in turn to be pushed. |
| |
| Note also the use of \c{*} as the maximum parameter count, |
| indicating that there is no upper limit on the number of parameters |
| you may supply to the \i\c{multipush} macro. |
| |
| It would be convenient, when using this macro, to have a \c{POP} |
| equivalent, which \e{didn't} require the arguments to be given in |
| reverse order. Ideally, you would write the \c{multipush} macro |
| call, then cut-and-paste the line to where the pop needed to be |
| done, and change the name of the called macro to \c{multipop}, and |
| the macro would take care of popping the registers in the opposite |
| order from the one in which they were pushed. |
| |
| This can be done by the following definition: |
| |
| \c %macro multipop 1-* |
| \c |
| \c %rep %0 |
| \c %rotate -1 |
| \c pop %1 |
| \c %endrep |
| \c |
| \c %endmacro |
| |
| This macro begins by rotating its arguments one place to the |
| \e{right}, so that the original \e{last} argument appears as \c{%1}. |
| This is then popped, and the arguments are rotated right again, so |
| the second-to-last argument becomes \c{%1}. Thus the arguments are |
| iterated through in reverse order. |
| |
| |
| \S{concat} \i{Concatenating Macro Parameters} |
| |
| NASM can concatenate macro parameters and macro indirection constructs |
| on to other text surrounding them. This allows you to declare a family |
| of symbols, for example, in a macro definition. If, for example, you |
| wanted to generate a table of key codes along with offsets into the |
| table, you could code something like |
| |
| \c %macro keytab_entry 2 |
| \c |
| \c keypos%1 equ $-keytab |
| \c db %2 |
| \c |
| \c %endmacro |
| \c |
| \c keytab: |
| \c keytab_entry F1,128+1 |
| \c keytab_entry F2,128+2 |
| \c keytab_entry Return,13 |
| |
| which would expand to |
| |
| \c keytab: |
| \c keyposF1 equ $-keytab |
| \c db 128+1 |
| \c keyposF2 equ $-keytab |
| \c db 128+2 |
| \c keyposReturn equ $-keytab |
| \c db 13 |
| |
| You can just as easily concatenate text on to the other end of a |
| macro parameter, by writing \c{%1foo}. |
| |
| If you need to append a \e{digit} to a macro parameter, for example |
| defining labels \c{foo1} and \c{foo2} when passed the parameter |
| \c{foo}, you can't code \c{%11} because that would be taken as the |
| eleventh macro parameter. Instead, you must code |
| \I{braces, after % sign}\c{%\{1\}1}, which will separate the first |
| \c{1} (giving the number of the macro parameter) from the second |
| (literal text to be concatenated to the parameter). |
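| |
| For instance, a sketch of such a macro (the name \c{pair} is |
| hypothetical) might read: |
| |
| \c %macro pair 1 |
| \c %{1}1: dd 0 |
| \c %{1}2: dd 0 |
| \c %endmacro |
| \c |
| \c pair foo ; defines the labels foo1 and foo2 |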
| |
| This concatenation can also be applied to other preprocessor in-line |
| objects, such as macro-local labels (\k{maclocal}) and context-local |
| labels (\k{ctxlocal}). In all cases, ambiguities in syntax can be |
| resolved by enclosing everything after the \c{%} sign and before the |
| literal text in braces: so \c{%\{%foo\}bar} concatenates the text |
| \c{bar} to the end of the real name of the macro-local label |
| \c{%%foo}. (This is unnecessary, since the form NASM uses for the |
| real names of macro-local labels means that the two usages |
| \c{%\{%foo\}bar} and \c{%%foobar} would both expand to the same |
| thing anyway; nevertheless, the capability is there.) |
| |
| The single-line macro indirection construct, \c{%[...]} |
| (\k{indmacro}), behaves the same way as macro parameters for the |
| purpose of concatenation. |
| |
| See also the \c{%+} operator, \k{concat%+}. |
| |
| |
| \S{mlmaccc} \i{Condition Codes as Macro Parameters} |
| |
| NASM can give special treatment to a macro parameter which contains |
| a condition code. For a start, you can refer to the macro parameter |
| \c{%1} by means of the alternative syntax \i\c{%+1}, which informs |
| NASM that this macro parameter is supposed to contain a condition |
| code, and will cause the preprocessor to report an error message if |
| the macro is called with a parameter which is \e{not} a valid |
| condition code. |
| |
| Far more usefully, though, you can refer to the macro parameter by |
| means of \i\c{%-1}, which NASM will expand as the \e{inverse} |
| condition code. So the \c{retz} macro defined in \k{maclocal} can be |
| replaced by a general \i{conditional-return macro} like this: |
| |
| \c %macro retc 1 |
| \c |
| \c j%-1 %%skip |
| \c ret |
| \c %%skip: |
| \c |
| \c %endmacro |
| |
| This macro can now be invoked using calls like \c{retc ne}, which |
| will cause the conditional-jump instruction in the macro expansion |
| to come out as \c{JE}, or \c{retc po} which will make the jump a |
| \c{JPE}. |
| |
| The \c{%+1} macro-parameter reference is quite happy to interpret |
| the arguments \c{CXZ} and \c{ECXZ} as valid condition codes; |
| however, \c{%-1} will report an error if passed either of these, |
| because no inverse condition code exists. |
| |
| |
| \S{nolist} \i{Disabling Listing Expansion}\I\c{.nolist} |
| |
| When NASM is generating a listing file from your program, it will |
| generally expand multi-line macros by means of writing the macro |
| call and then listing each line of the expansion. This allows you to |
| see which instructions in the macro expansion are generating what |
| code; however, for some macros this clutters the listing up |
| unnecessarily. |
| |
| NASM therefore provides the \c{.nolist} qualifier, which you can |
| include in a macro definition to inhibit the expansion of the macro |
| in the listing file. The \c{.nolist} qualifier comes directly after |
| the number of parameters, like this: |
| |
| \c %macro foo 1.nolist |
| |
| Or like this: |
| |
| \c %macro bar 1-5+.nolist a,b,c,d,e,f,g,h |
| |
| \S{unmacro} Undefining Multi-Line Macros: \i\c{%unmacro} |
| |
| Multi-line macros can be removed with the \c{%unmacro} directive. |
| Unlike the \c{%undef} directive, however, \c{%unmacro} takes an |
| argument specification, and will only remove \i{exact matches} with |
| that argument specification. |
| |
| For example: |
| |
| \c %macro foo 1-3 |
| \c ; Do something |
| \c %endmacro |
| \c %unmacro foo 1-3 |
| |
| removes the previously defined macro \c{foo}, but |
| |
| \c %macro bar 1-3 |
| \c ; Do something |
| \c %endmacro |
| \c %unmacro bar 1 |
| |
| does \e{not} remove the macro \c{bar}, since the argument |
| specification does not match exactly. |
| |
| |
| \H{condasm} \i{Conditional Assembly}\I\c{%if} |
| |
| Similarly to the C preprocessor, NASM allows sections of a source |
| file to be assembled only if certain conditions are met. The general |
| syntax of this feature looks like this: |
| |
| \c %if<condition> |
| \c ; some code which only appears if <condition> is met |
| \c %elif<condition2> |
| \c ; only appears if <condition> is not met but <condition2> is |
| \c %else |
| \c ; this appears if neither <condition> nor <condition2> was met |
| \c %endif |
| |
| The inverse forms \i\c{%ifn} and \i\c{%elifn} are also supported. |
| |
| The \i\c{%else} clause is optional, as is the \i\c{%elif} clause. |
| You can have more than one \c{%elif} clause as well. |
| |
| There are a number of variants of the \c{%if} directive. Each has its |
| corresponding \c{%elif}, \c{%ifn}, and \c{%elifn} directives; for |
| example, the equivalents to the \c{%ifdef} directive are \c{%elifdef}, |
| \c{%ifndef}, and \c{%elifndef}. |
| |
| \S{ifdef} \i\c{%ifdef}: Testing Single-Line Macro Existence\I{testing, |
| single-line macro existence} |
| |
| Beginning a conditional-assembly block with the line \c{%ifdef |
| MACRO} will assemble the subsequent code if, and only if, a |
| single-line macro called \c{MACRO} is defined. If not, then the |
| \c{%elif} and \c{%else} blocks (if any) will be processed instead. |
| |
| For example, when debugging a program, you might want to write code |
| such as |
| |
| \c ; perform some function |
| \c %ifdef DEBUG |
| \c writefile 2,"Function performed successfully",13,10 |
| \c %endif |
| \c ; go and do something else |
| |
| Then you could use the command-line option \c{-dDEBUG} to create a |
| version of the program which produced debugging messages, and remove |
| the option to generate the final release version of the program. |
| |
| You can test for a macro \e{not} being defined by using |
| \i\c{%ifndef} instead of \c{%ifdef}. You can also test for macro |
| definitions in \c{%elif} blocks by using \i\c{%elifdef} and |
| \i\c{%elifndef}. |
| |
| |
| \S{ifmacro} \i\c{%ifmacro}: Testing Multi-Line Macro |
| Existence\I{testing, multi-line macro existence} |
| |
| The \c{%ifmacro} directive operates in the same way as the \c{%ifdef} |
| directive, except that it checks for the existence of a multi-line macro. |
| |
| For example, you may be working with a large project and not have control |
| over the macros in a library. You may want to create a macro with one |
| name if it doesn't already exist, and another name if one with that name |
| does exist. |
| |
| The \c{%ifmacro} directive is considered true if defining a macro with the |
| given name and number of arguments would cause a definition conflict. For example: |
| |
| \c %ifmacro MyMacro 1-3 |
| \c |
| \c %error "MyMacro 1-3" causes a conflict with an existing macro. |
| \c |
| \c %else |
| \c |
| \c %macro MyMacro 1-3 |
| \c |
| \c ; insert code to define the macro |
| \c |
| \c %endmacro |
| \c |
| \c %endif |
| |
| This will create the macro "MyMacro 1-3" if no macro already exists which |
| would conflict with it, and issues an error if there would be a definition |
| conflict. |
| |
| You can test for the macro not existing by using \i\c{%ifnmacro} instead |
| of \c{%ifmacro}. Additional tests can be performed in \c{%elif} blocks by using |
| \i\c{%elifmacro} and \i\c{%elifnmacro}. |
| |
| |
| \S{ifctx} \i\c{%ifctx}: Testing the Context Stack\I{testing, context |
| stack} |
| |
| The conditional-assembly construct \c{%ifctx} will cause the |
| subsequent code to be assembled if and only if the top context on |
| the preprocessor's context stack has the same name as one of the arguments. |
| As with \c{%ifdef}, the inverse and \c{%elif} forms \i\c{%ifnctx}, |
| \i\c{%elifctx} and \i\c{%elifnctx} are also supported. |
| |
| For more details of the context stack, see \k{ctxstack}. For a |
| sample use of \c{%ifctx}, see \k{blockif}. |
| |
| |
| \S{if} \i\c{%if}: Testing Arbitrary Numeric Expressions\I{testing, |
| arbitrary numeric expressions} |
| |
| The conditional-assembly construct \c{%if expr} will cause the |
| subsequent code to be assembled if and only if the value of the |
| numeric expression \c{expr} is non-zero. An example of the use of |
| this feature is in deciding when to break out of a \c{%rep} |
| preprocessor loop: see \k{rep} for a detailed example. |
| |
| The expression given to \c{%if}, and its counterpart \i\c{%elif}, is |
| a critical expression (see \k{crit}). |
| |
| |
| Like other \c{%if} constructs, \c{%if} has a counterpart |
| \i\c{%elif}, and negative forms \i\c{%ifn} and \i\c{%elifn}. |
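| |
| As an illustrative sketch (the names \c{PAGE_SHIFT} and \c{page_size} |
| are assumptions, not something NASM defines), a numeric test can select |
| between several variants and reject unsupported values: |
| |
| \c %assign PAGE_SHIFT 12 |
| \c |
| \c %if PAGE_SHIFT = 12 |
| \c page_size equ 4096 |
| \c %elif PAGE_SHIFT = 21 |
| \c page_size equ 2097152 |
| \c %else |
| \c %error "unsupported PAGE_SHIFT" |
| \c %endif |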
| |
| \S{ifidn} \i\c{%ifidn} and \i\c{%ifidni}: Testing Exact Text |
| Identity\I{testing, exact text identity} |
| |
| The construct \c{%ifidn text1,text2} will cause the subsequent code |
| to be assembled if and only if \c{text1} and \c{text2}, after |
| expanding single-line macros, are identical pieces of text. |
| Differences in white space are not counted. |
| |
| \c{%ifidni} is similar to \c{%ifidn}, but is \i{case-insensitive}. |
| |
| For example, the following macro pushes a register or number on the |
| stack, and allows you to treat \c{IP} as a real register: |
| |
| \c %macro pushparam 1 |
| \c |
| \c %ifidni %1,ip |
| \c call %%label |
| \c %%label: |
| \c %else |
| \c push %1 |
| \c %endif |
| \c |
| \c %endmacro |
| |
| Like other \c{%if} constructs, \c{%ifidn} has a counterpart |
| \i\c{%elifidn}, and negative forms \i\c{%ifnidn} and \i\c{%elifnidn}. |
| Similarly, \c{%ifidni} has counterparts \i\c{%elifidni}, |
| \i\c{%ifnidni} and \i\c{%elifnidni}. |
| |
| \S{iftyp} \i\c{%ifid}, \i\c{%ifnum}, \i\c{%ifstr}: Testing Token |
| Types\I{testing, token types} |
| |
| Some macros will want to perform different tasks depending on |
| whether they are passed a number, a string, or an identifier. For |
| example, a string output macro might want to be able to cope with |
| being passed either a string constant or a pointer to an existing |
| string. |
| |
| The conditional assembly construct \c{%ifid}, taking one parameter |
| (which may be blank), assembles the subsequent code if and only if |
| the first token in the parameter exists and is an identifier. |
| \c{%ifnum} works similarly, but tests for the token being a numeric |
| constant; \c{%ifstr} tests for it being a string. |
| |
| For example, the \c{writefile} macro defined in \k{mlmacgre} can be |
| extended to take advantage of \c{%ifstr} in the following fashion: |
| |
| \c %macro writefile 2-3+ |
| \c |
| \c %ifstr %2 |
| \c jmp %%endstr |
| \c %if %0 = 3 |
| \c %%str: db %2,%3 |
| \c %else |
| \c %%str: db %2 |
| \c %endif |
| \c %%endstr: mov dx,%%str |
| \c mov cx,%%endstr-%%str |
| \c %else |
| \c mov dx,%2 |
| \c mov cx,%3 |
| \c %endif |
| \c mov bx,%1 |
| \c mov ah,0x40 |
| \c int 0x21 |
| \c |
| \c %endmacro |
| |
| Then the \c{writefile} macro can cope with being called in either of |
| the following two ways: |
| |
| \c writefile [file], strpointer, length |
| \c writefile [file], "hello", 13, 10 |
| |
| In the first, \c{strpointer} is used as the address of an |
| already-declared string, and \c{length} is used as its length; in |
| the second, a string is given to the macro, which therefore declares |
| it itself and works out the address and length for itself. |
| |
| Note the use of \c{%if} inside the \c{%ifstr}: this is to detect |
| whether the macro was passed two arguments (so the string would be a |
| single string constant, and \c{db %2} would be adequate) or more (in |
| which case, all but the first two would be lumped together into |
| \c{%3}, and \c{db %2,%3} would be required). |
| |
| The usual \I\c{%elifid}\I\c{%elifnum}\I\c{%elifstr}\c{%elif}..., |
| \I\c{%ifnid}\I\c{%ifnnum}\I\c{%ifnstr}\c{%ifn}..., and |
| \I\c{%elifnid}\I\c{%elifnnum}\I\c{%elifnstr}\c{%elifn}... versions |
| exist for each of \c{%ifid}, \c{%ifnum} and \c{%ifstr}. |
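| |
| A minimal sketch of how the \c{%elif} forms can be chained to |
| dispatch on the token type; the macro name \c{emit} is hypothetical |
| and the generated data is only for illustration: |
| |
| \c %macro emit 1 |
| \c |
| \c %ifnum %1 |
| \c dw %1 ; numeric constant |
| \c %elifstr %1 |
| \c db %1,0 ; string, NUL-terminated |
| \c %elifid %1 |
| \c dw %1 ; identifier, e.g. a label |
| \c %else |
| \c %error "emit: unsupported argument" |
| \c %endif |
| \c |
| \c %endmacro |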
| |
| \S{iftoken} \i\c{%iftoken}: Test for a Single Token |
| |
| Some macros will want to do different things depending on whether they |
| are passed a single token (e.g. to paste it to something else using \c{%+}) |
| or a multi-token sequence. |
| |
| The conditional assembly construct \c{%iftoken} assembles the |
| subsequent code if and only if the expanded parameters consist of |
| exactly one token, possibly surrounded by whitespace. |
| |
| For example: |
| |
| \c %iftoken 1 |
| |
| will assemble the subsequent code, but |
| |
| \c %iftoken -1 |
| |
| will not, since \c{-1} contains two tokens: the unary minus operator |
| \c{-}, and the number \c{1}. |
| |
| The usual \i\c{%eliftoken}, \i\c{%ifntoken}, and \i\c{%elifntoken} |
| variants are also provided. |
| |
| \S{ifempty} \i\c{%ifempty}: Test for Empty Expansion |
| |
| The conditional assembly construct \c{%ifempty} assembles the |
| subsequent code if and only if the expanded parameters do not contain |
| any tokens at all, whitespace excepted. |
| |
| The usual \i\c{%elifempty}, \i\c{%ifnempty}, and \i\c{%elifnempty} |
| variants are also provided. |
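| |
| For instance, a single-line macro that is defined but expands to |
| nothing can be detected as follows. This is only a sketch: the name |
| \c{BUILD_TAG} is hypothetical, and the \c{%else} branch assumes that a |
| non-empty \c{BUILD_TAG} expands to a string constant: |
| |
| \c %define BUILD_TAG |
| \c |
| \c %ifempty BUILD_TAG |
| \c db "untagged",0 |
| \c %else |
| \c db BUILD_TAG,0 |
| \c %endif |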
| |
| \S{ifenv} \i\c{%ifenv}: Test If Environment Variable Exists |
| |
| The conditional assembly construct \c{%ifenv} assembles the |
| subsequent code if and only if the environment variable referenced by |
| the \c{%!}\e{variable} directive exists. |
| |
| The usual \i\c{%elifenv}, \i\c{%ifnenv}, and \i\c{%elifnenv} |
| variants are also provided. |
| |
| Just as for \c{%!}\e{variable} the argument should be written as a |
| string if it contains characters that would not be legal in an |
| identifier. See \k{getenv}. |
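| |
| As a sketch, \c{%ifenv} can guard a use of \c{%!}\e{variable} so that |
| a fallback is used when the variable is not set. The environment |
| variable \c{BUILD_ID} and the macro \c{build_id} are hypothetical names: |
| |
| \c %ifenv BUILD_ID |
| \c %defstr build_id %!BUILD_ID |
| \c %else |
| \c %define build_id "unknown" |
| \c %endif |
| \c |
| \c db build_id,0 |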
| |
| \H{rep} \i{Preprocessor Loops}\I{repeating code}: \i\c{%rep} |
| |
| NASM's \c{TIMES} prefix, though useful, cannot be used to invoke a |
| multi-line macro multiple times, because it is processed by NASM |
| after macros have already been expanded. Therefore NASM provides |
| another form of loop, this time at the preprocessor level: \c{%rep}. |
| |
| The directives \c{%rep} and \i\c{%endrep} (\c{%rep} takes a numeric |
| argument, which can be an expression; \c{%endrep} takes no |
| arguments) can be used to enclose a chunk of code, which is then |
| replicated as many times as specified by the preprocessor: |
| |
| \c %assign i 0 |
| \c %rep 64 |
| \c inc word [table+2*i] |
| \c %assign i i+1 |
| \c %endrep |
| |
| This will generate a sequence of 64 \c{INC} instructions, |
| incrementing every word of memory from \c{[table]} to |
| \c{[table+126]}. |
| |
| For more complex termination conditions, or to break out of a repeat |
| loop part way along, you can use the \i\c{%exitrep} directive to |
| terminate the loop, like this: |
| |
| \c fibonacci: |
| \c %assign i 0 |
| \c %assign j 1 |
| \c %rep 100 |
| \c %if j > 65535 |
| \c %exitrep |
| \c %endif |
| \c dw j |
| \c %assign k j+i |
| \c %assign i j |
| \c %assign j k |
| \c %endrep |
| \c |
| \c fib_number equ ($-fibonacci)/2 |
| |
| This produces a list of all the Fibonacci numbers that will fit in |
| 16 bits. Note that a maximum repeat count must still be given to |
| \c{%rep}. This is to prevent the possibility of NASM getting into an |
| infinite loop in the preprocessor, which (on multitasking or |
| multi-user systems) would typically cause all the system memory to |
| be gradually used up and other applications to start crashing. |
| |
| Note the maximum repeat count is limited to the value specified by the |
| \c{--limit-rep} option or \c{%pragma limit rep}, see \k{opt-limit}. |
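| |
| As a sketch, and assuming the pragma takes the limit name followed by |
| a numeric value in the same way as the command-line option, the limit |
| could be set from within a source file like this (the value 1000 is |
| arbitrary): |
| |
| \c %pragma limit rep 1000 |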
| |
| |
| \H{files} Source Files and Dependencies |
| |
| These commands allow you to split your sources into multiple files. |
| |
| \S{include} \i\c{%include}: \i{Including Other Files} |
| |
| Using, once again, a very similar syntax to the C preprocessor, |
| NASM's preprocessor lets you include other source files into your |
| code. This is done by the use of the \i\c{%include} directive: |
| |
| \c %include "macros.mac" |
| |
| will include the contents of the file \c{macros.mac} into the source |
| file containing the \c{%include} directive. |
| |
| Include files are \I{searching for include files}searched for in the |
| current directory (the directory you're in when you run NASM, as |
| opposed to the location of the NASM executable or the location of |
| the source file), plus any directories specified on the NASM command |
| line using the \c{-i} option. |
| |
| The standard C idiom for preventing a file being included more than |
| once is just as applicable in NASM: if the file \c{macros.mac} has |
| the form |
| |
| \c %ifndef MACROS_MAC |
| \c %define MACROS_MAC |
| \c ; now define some macros |
| \c %endif |
| |
| then including the file more than once will not cause errors: |
| the second time the file is included, nothing will happen, |
| because the macro \c{MACROS_MAC} will already be defined. |
| |
| You can force a file to be included even if there is no \c{%include} |
| directive that explicitly includes it, by using the \i\c{-p} option |
| on the NASM command line (see \k{opt-p}). |
| |
| |
| \S{pathsearch} \i\c{%pathsearch}: Search the Include Path |
| |
| The \c{%pathsearch} directive takes a single-line macro name and a |
| filename, and declares or redefines the specified single-line macro to |
| be the include-path-resolved version of the filename, if the file |
| exists (otherwise, it is passed unchanged.) |
| |
| For example, |
| |
| \c %pathsearch MyFoo "foo.bin" |
| |
| ... with \c{-Ibins/} in the include path may end up defining the macro |
| \c{MyFoo} to be \c{"bins/foo.bin"}. |
| |
| |
| \S{depend} \i\c{%depend}: Add Dependent Files |
| |
| The \c{%depend} directive takes a filename and adds it to the list of |
| files emitted during dependency generation when the \c{-M} option or |
| one of its relatives (see \k{opt-M}) is used. It produces no output. |
| |
| This is generally used in conjunction with \c{%pathsearch}. For |
| example, a simplified version of the standard macro wrapper for the |
| \c{INCBIN} directive looks like: |
| |
| \c %imacro incbin 1-2+ 0 |
| \c %pathsearch dep %1 |
| \c %depend dep |
| \c incbin dep,%2 |
| \c %endmacro |
| |
| This first resolves the location of the file into the macro \c{dep}, |
| then adds it to the dependency lists, and finally issues the |
| assembler-level \c{INCBIN} directive. |
| |
| |
| \S{use} \i\c{%use}: Include Standard Macro Package |
| |
| The \c{%use} directive is similar to \c{%include}, but rather than |
| including the contents of a file, it includes a named standard macro |
| package. The standard macro packages are part of NASM, and are |
| described in \k{macropkg}. |
| |
| Unlike the \c{%include} directive, package names for the \c{%use} |
| directive do not require quotes, but quotes are permitted. In NASM |
| 2.04 and 2.05 the unquoted form would be macro-expanded; this is no |
| longer true. Thus, the following lines are equivalent: |
| |
| \c %use altreg |
| \c %use 'altreg' |
| |
| Standard macro packages are protected from multiple inclusion. When a |
| standard macro package is used, a testable single-line macro of the |
| form \c{__?USE_}\e{package}\c{?__} is also defined, see \k{use_def}. |
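| |
| A minimal sketch of testing for a package, assuming that the package |
| name appears in upper case within the macro name (see \k{use_def} for |
| the authoritative form): |
| |
| \c %use altreg |
| \c |
| \c %ifndef __?USE_ALTREG?__ |
| \c %error "the altreg package has not been loaded" |
| \c %endif |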
| |
| \H{ctxstack} The \i{Context Stack} |
| |
| Having labels that are local to a macro definition is sometimes not |
| quite powerful enough: sometimes you want to be able to share labels |
| between several macro calls. An example might be a \c{REPEAT} ... |
| \c{UNTIL} loop, in which the expansion of the \c{REPEAT} macro |
| would need to be able to refer to a label which the \c{UNTIL} macro |
| had defined. However, for such a macro you would also want to be |
| able to nest these loops. |
| |
| NASM provides this level of power by means of a \e{context stack}. |
| The preprocessor maintains a stack of \e{contexts}, each of which is |
| characterized by a name. You add a new context to the stack using |
| the \i\c{%push} directive, and remove one using \i\c{%pop}. You can |
| define labels that are local to a particular context on the stack. |
| |
| |
| \S{pushpop} \i\c{%push} and \i\c{%pop}: \I{creating |
| contexts}\I{removing contexts}Creating and Removing Contexts |
| |
| The \c{%push} directive is used to create a new context and place it |
| on the top of the context stack. \c{%push} takes an optional argument, |
| which is the name of the context. For example: |
| |
| \c %push foobar |
| |
| This pushes a new context called \c{foobar} on the stack. You can have |
| several contexts on the stack with the same name: they can still be |
| distinguished. If no name is given, the context is unnamed (this is |
| normally used when both the \c{%push} and the \c{%pop} are inside a |
| single macro definition.) |
| |
| The directive \c{%pop}, taking one optional argument, removes the top |
| context from the context stack and destroys it, along with any |
| labels associated with it. If an argument is given, it must match the |
| name of the current context, otherwise it will issue an error. |
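| |
| For example, naming the context in the \c{%pop} as well as in the |
| \c{%push} lets NASM catch mismatched pairs. This is a sketch reusing |
| the \c{foobar} name from above: |
| |
| \c %push foobar |
| \c ; labels and macros local to foobar can be defined here |
| \c %pop foobar |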
| |
| |
| \S{ctxlocal} \i{Context-Local Labels} |
| |
| Just as the usage \c{%%foo} defines a label which is local to the |
| particular macro call in which it is used, the usage \I{%$}\c{%$foo} |
| is used to define a label which is local to the context on the top |
| of the context stack. So the \c{REPEAT} and \c{UNTIL} example given |
| above could be implemented by means of: |
| |
| \c %macro repeat 0 |
| \c |
| \c %push repeat |
| \c %$begin: |
| \c |
| \c %endmacro |
| \c |
| \c %macro until 1 |
| \c |
| \c j%-1 %$begin |
| \c %pop |
| \c |
| \c %endmacro |
| |
| and invoked by means of, for example, |
| |
| \c mov di,string |
| \c repeat |
| \c add di,3 |
| \c scasb |
| \c until e |
| |
| which would scan every fourth byte of a string in search of the byte |
| in \c{AL}. |
| |
| If you need to define, or access, labels local to the context |
| \e{below} the top one on the stack, you can use \I{%$$}\c{%$$foo}, or |
| \c{%$$$foo} for the context below that, and so on. |
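| |
| As a brief sketch (the context names \c{outer} and \c{inner} and the |
| label \c{top} are illustrative), a label defined in one context can be |
| reached from the context pushed on top of it: |
| |
| \c %push outer |
| \c %$top: |
| \c %push inner |
| \c jmp %$$top ; the label defined in "outer" |
| \c %pop |
| \c %pop |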
| |
| |
| \S{ctxdefine} \i{Context-Local Single-Line Macros} |
| |
| NASM also allows you to define single-line macros which are local to |
| a particular context, in just the same way: |
| |
| \c %define %$localmac 3 |
| |
| will define the single-line macro \c{%$localmac} to be local to the |
| top context on the stack. Of course, after a subsequent \c{%push}, |
| it can then still be accessed by the name \c{%$$localmac}. |
| |
| |
| \S{ctxfallthrough} \i{Context Fall-Through Lookup} \e{(deprecated)} |
| |
| Context fall-through lookup (automatic searching of outer contexts) |
| is a feature that was added in NASM version 0.98.03. Unfortunately, |
| this feature is unintuitive and can result in buggy code that would |
| have otherwise been prevented by NASM's error reporting. As a result, |
| this feature has been \e{deprecated}. NASM version 2.09 will issue a |
| warning when usage of this \e{deprecated} feature is detected. Starting |
| with NASM version 2.10, usage of this \e{deprecated} feature will simply |
| result in an \e{expression syntax error}. |
| |
| An example usage of this \e{deprecated} feature follows: |
| |
| \c %macro ctxthru 0 |
| \c %push ctx1 |
| \c %assign %$external 1 |
| \c %push ctx2 |
| \c %assign %$internal 1 |
| \c mov eax, %$external |
| \c mov eax, %$internal |
| \c %pop |
| \c %pop |
| \c %endmacro |
| |
| As demonstrated, \c{%$external} is being defined in the \c{ctx1} |
| context and referenced within the \c{ctx2} context. With context |
| fall-through lookup, referencing an undefined context-local macro |
| like this implicitly searches through the outer contexts until a match |
| is found or every context has been tried. As a result, \c{%$external} |
| referenced within the \c{ctx2} context would implicitly use \c{%$external} |
| as defined in \c{ctx1}. Most people would expect NASM to issue an error in |
| this situation because \c{%$external} was never defined within \c{ctx2} and also |
| isn't qualified with the proper context depth, \c{%$$external}. |
| |
| Here is a revision of the above example with proper context depth: |
| |
| \c %macro ctxthru 0 |
| \c %push ctx1 |
| \c %assign %$external 1 |
| \c %push ctx2 |
| \c %assign %$internal 1 |
| \c mov eax, %$$external |
| \c mov eax, %$internal |
| \c %pop |
| \c %pop |
| \c %endmacro |
| |
| As demonstrated, \c{%$external} is still being defined in the \c{ctx1} |
| context and referenced within the \c{ctx2} context. However, the |
| reference to \c{%$external} within \c{ctx2} has been fully qualified with |
| the proper context depth, \c{%$$external}, and thus is no longer ambiguous, |
| unintuitive or erroneous. |
| |
| |
| \S{ctxrepl} \i\c{%repl}: \I{renaming contexts}Renaming a Context |
| |
| If you need to change the name of the top context on the stack (in |
| order, for example, to have it respond differently to \c{%ifctx}), |
| you can execute a \c{%pop} followed by a \c{%push}; but this will |
| have the side effect of destroying all context-local labels and |
| macros associated with the context that was just popped. |
| |
| NASM provides the directive \c{%repl}, which \e{replaces} a context |
| with a different name, without touching the associated macros and |
| labels. So you could replace the destructive code |
| |
| \c %pop |
| \c %push newname |
| |
| with the non-destructive version \c{%repl newname}. |
| |
| |
| \S{blockif} Example Use of the \i{Context Stack}: \i{Block IFs} |
| |
| This example makes use of almost all the context-stack features, |
| including the conditional-assembly construct \i\c{%ifctx}, to |
| implement a block IF statement as a set of macros. |
| |
| \c %macro if 1 |
| \c |
| \c %push if |
| \c j%-1 %$ifnot |
| \c |
| \c %endmacro |
| \c |
| \c %macro else 0 |
| \c |
| \c %ifctx if |
| \c %repl else |
| \c jmp %$ifend |
| \c %$ifnot: |
| \c %else |
| \c %error "expected `if' before `else'" |
| \c %endif |
| \c |
| \c %endmacro |
| \c |
| \c %macro endif 0 |
| \c |
| \c %ifctx if |
| \c %$ifnot: |
| \c %pop |
| \c %elifctx else |
| \c %$ifend: |
| \c %pop |
| \c %else |
| \c %error "expected `if' or `else' before `endif'" |
| \c %endif |
| \c |
| \c %endmacro |
| |
| This code is more robust than the \c{REPEAT} and \c{UNTIL} macros |
| given in \k{ctxlocal}, because it uses conditional assembly to check |
| that the macros are issued in the right order (for example, not |
| calling \c{endif} before \c{if}) and issues a \c{%error} if they're |
| not. |
| |
| In addition, the \c{endif} macro has to be able to cope with the two |
| distinct cases of either directly following an \c{if}, or following |
| an \c{else}. It achieves this, again, by using conditional assembly |
| to do different things depending on whether the context on top of |
| the stack is \c{if} or \c{else}. |
| |
| The \c{else} macro has to preserve the context on the stack, so that |
| the \c{%$ifnot} it defines is the same one referred to by the \c{if} |
| macro, and the \c{%$ifend} it refers to is the same one later defined |
| by the \c{endif} macro; but it has to change the context's name so |
| that \c{endif} will know there was an intervening \c{else}. It does |
| this by the use of \c{%repl}. |
| |
| A sample usage of these macros might look like: |
| |
| \c cmp ax,bx |
| \c |
| \c if ae |
| \c cmp bx,cx |
| \c |
| \c if ae |
| \c mov ax,cx |
| \c else |
| \c mov ax,bx |
| \c endif |
| \c |
| \c else |
| \c cmp ax,cx |
| \c |
| \c if ae |
| \c mov ax,cx |
| \c endif |
| \c |
| \c endif |
| |
| The block-\c{IF} macros handle nesting quite happily, by means of |
| pushing another context, describing the inner \c{if}, on top of the |
| one describing the outer \c{if}; thus \c{else} and \c{endif} always |
| refer to the last unmatched \c{if} or \c{else}. |
| |
| |
| \H{stackrel} \i{Stack Relative Preprocessor Directives} |
| |
| The following preprocessor directives provide a way to use |
| labels to refer to local variables allocated on the stack. |
| |
| \b\c{%arg} (see \k{arg}) |
| |
| \b\c{%stacksize} (see \k{stacksize}) |
| |
| \b\c{%local} (see \k{local}) |
| |
| |
| \S{arg} \i\c{%arg} Directive |
| |
| The \c{%arg} directive is used to simplify the handling of |
| parameters passed on the stack. Stack based parameter passing |
| is used by many high level languages, including C, C++ and Pascal. |
| |
| While NASM has macros which attempt to duplicate this |
| functionality (see \k{16cmacro}), the syntax is not particularly |
| convenient to use and is not TASM compatible. Here is an example |
| which shows the use of \c{%arg} without any external macros: |
| |
| \c some_function: |
| \c |
| \c %push mycontext ; save the current context |
| \c %stacksize large ; tell NASM to use bp |
| \c %arg i:word, j_ptr:word |
| \c |
| \c mov ax,[i] |
| \c mov bx,[j_ptr] |
| \c add ax,[bx] |
| \c ret |
| \c |
| \c %pop ; restore original context |
| |
| This is similar to the procedure defined in \k{16cmacro}: it adds |
| the value in \c{i} to the value pointed to by \c{j_ptr} and returns the |
| sum in the \c{ax} register. See \k{pushpop} for an explanation of |
| \c{%push} and \c{%pop} and the use of context stacks. |
| |
| |
| \S{stacksize} \i\c{%stacksize} Directive |
| |
| The \c{%stacksize} directive is used in conjunction with the |
| \c{%arg} (see \k{arg}) and the \c{%local} (see \k{local}) directives. |
| It tells NASM the default size to use for subsequent \c{%arg} and |
| \c{%local} directives. The \c{%stacksize} directive takes one |
| required argument which is one of \c{flat}, \c{flat64}, \c{large} or \c{small}. |
| |
| \c %stacksize flat |
| |
| This form causes NASM to use stack-based parameter addressing |
| relative to \c{ebp} and it assumes that a near form of call was used |
| to get to this label (i.e. that \c{eip} is on the stack). |
| |
| \c %stacksize flat64 |
| |
| This form causes NASM to use stack-based parameter addressing |
| relative to \c{rbp} and it assumes that a near form of call was used |
| to get to this label (i.e. that \c{rip} is on the stack). |
| |
| \c %stacksize large |
| |
| This form uses \c{bp} to do stack-based parameter addressing and |
| assumes that a far form of call was used to get to this address |
| (i.e. that \c{ip} and \c{cs} are on the stack). |
| |
| \c %stacksize small |
| |
| This form also uses \c{bp} to address stack parameters, but it is |
| different from \c{large} because it also assumes that the old value |
| of \c{bp} is pushed onto the stack (i.e. it expects an \c{ENTER} |
| instruction). In other words, it expects that \c{bp}, \c{ip} and |
| \c{cs} are on the top of the stack, underneath any local space which |
| may have been allocated by \c{ENTER}. This form is probably most |
| useful when used in combination with the \c{%local} directive |
| (see \k{local}). |
| |
| |
| \S{local} \i\c{%local} Directive |
| |
| The \c{%local} directive is used to simplify the use of local |
| temporary stack variables allocated in a stack frame. Automatic |
| local variables in C are an example of this kind of variable. The |
| \c{%local} directive is most useful when used with the \c{%stacksize} |
| directive (see \k{stacksize}) and is also compatible with the \c{%arg} |
| directive (see \k{arg}). It allows simplified reference to variables on the |
| stack which have been allocated typically by using the \c{ENTER} |
| instruction. |
| \# (see \k{insENTER} for a description of that instruction). |
| An example of its use is the following: |
| |
| \c silly_swap: |
| \c |
| \c %push mycontext ; save the current context |
| \c %stacksize small ; tell NASM to use bp |
| \c %assign %$localsize 0 ; see text for explanation |
| \c %local old_ax:word, old_dx:word |
| \c |
| \c enter %$localsize,0 ; see text for explanation |
| \c mov [old_ax],ax ; swap ax & bx |
| \c mov [old_dx],dx ; and swap dx & cx |
| \c mov ax,bx |
| \c mov dx,cx |
| \c mov bx,[old_ax] |
| \c mov cx,[old_dx] |
| \c leave ; restore old bp |
| \c ret ; |
| \c |
| \c %pop ; restore original context |
| |
| The \c{%$localsize} variable is used internally by the |
| \c{%local} directive and \e{must} be defined within the |
| current context before the \c{%local} directive may be used. |
| Failure to do so will result in one expression syntax error for |
| each \c{%local} variable declared. It then may be used in |
| the construction of an appropriately sized ENTER instruction |
| as shown in the example. |
| |
| |
| \H{pperror} Reporting \i{User-Defined Errors}: \i\c{%error}, \i\c{%warning}, \i\c{%fatal} |
| |
| The preprocessor directive \c{%error} will cause NASM to report an |
| error if it occurs in assembled code. So if other users are going to |
| try to assemble your source files, you can ensure that they define the |
| right macros by means of code like this: |
| |
| \c %ifdef F1 |
| \c ; do some setup |
| \c %elifdef F2 |
| \c ; do some different setup |
| \c %else |
| \c %error "Neither F1 nor F2 was defined." |
| \c %endif |
| |
| Then any user who fails to understand the way your code is supposed |
| to be assembled will be quickly warned of their mistake, rather than |
| having to wait until the program crashes on being run and then not |
| knowing what went wrong. |
| |
| Similarly, \c{%warning} issues a warning, but allows assembly to continue: |
| |
| \c %ifdef F1 |
| \c ; do some setup |
| \c %elifdef F2 |
| \c ; do some different setup |
| \c %else |
| \c %warning "Neither F1 nor F2 was defined, assuming F1." |
| \c %define F1 |
| \c %endif |
| |
| \c{%error} and \c{%warning} are issued only on the final assembly |
| pass. This makes them safe to use in conjunction with tests that |
| depend on symbol values. |
| |
| \c{%fatal} terminates assembly immediately, regardless of pass. This |
| is useful when there is no point in continuing the assembly further, |
| and doing so is likely just going to cause a spew of confusing error |
| messages. |
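| |
| For example, a source file that cannot be assembled at all without a |
| particular definition might stop at once. This is a sketch; the macro |
| name \c{TARGET} is hypothetical: |
| |
| \c %ifndef TARGET |
| \c %fatal "TARGET is not defined; use -dTARGET=<name>" |
| \c %endif |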
| |
| It is optional for the message string after \c{%error}, \c{%warning} |
| or \c{%fatal} to be quoted. If it is \e{not}, then single-line macros |
| are expanded in it, which can be used to display more information to |
| the user. For example: |
| |
| \c %if foo > 64 |
| \c %assign foo_over foo-64 |
| \c %error foo is foo_over bytes too large |
| \c %endif |
| |
| |
| \H{pragma} \i\c{%pragma}: Setting Options |
| |
| The \c{%pragma} directive controls a number of options in |
| NASM. Pragmas are intended to remain backwards compatible, and |
| therefore an unknown \c{%pragma} directive is not an error. |
| |
| The various pragmas are documented with the options they affect. |
| |
| The general structure of a NASM pragma is: |
| |
| \c{%pragma} \e{namespace} \e{directive} [\e{arguments...}] |
| |
| Currently defined namespaces are: |
| |
| \b \c{ignore}: this \c{%pragma} is unconditionally ignored. |
| |
| \b \c{preproc}: preprocessor, see \k{pragma-preproc}. |
| |
| \b \c{limit}: resource limits, see \k{opt-limit}. |
| |
| \b \c{asm}: the parser and assembler proper. Currently no such pragmas |
| are defined. |
| |
| \b \c{list}: listing options, see \k{opt-L}. |
| |
| \b \c{file}: general file handling options. Currently no such pragmas |
| are defined. |
| |
| \b \c{input}: input file handling options. Currently no such pragmas |
| are defined. |
| |
| \b \c{output}: output format options. |
| |
| \b \c{debug}: debug format options. |
| |
| In addition, the name of any output or debug format, and sometimes |
| groups thereof, also constitute \c{%pragma} namespaces. The namespaces |
| \c{output} and \c{debug} simply refer to \e{any} output or debug |
| format, respectively. |
| |
| For example, to prepend an underscore to global symbols regardless of |
| the output format (see \k{mangling}): |
| |
| \c %pragma output gprefix _ |
| |
| ... whereas to prepend an underscore to global symbols only when the |
| output is either \c{win32} or \c{win64}: |
| |
| \c %pragma win gprefix _ |
| |
| |
| \S{pragma-preproc} Preprocessor Pragmas |
| |
| The only preprocessor \c{%pragma} defined in NASM 2.15 is: |
| |
| \b \c{%pragma preproc sane_empty_expansion}: disables legacy |
| compatibility handling of braceless empty arguments to multi-line |
| macros. See \k{mlmacro} and \k{opt-w}. |
| |
| |
| \H{otherpreproc} \i{Other Preprocessor Directives} |
| |
| \S{line} \i\c{%line} Directive |
| |
| The \c{%line} directive is used to notify NASM that the input line |
| corresponds to a specific line number in another file. Typically |
| this other file would be an original source file, with the current |
| NASM input being the output of a pre-processor. The \c{%line} |
| directive allows NASM to output messages which indicate the line |
| number of the original source file, instead of the file that is being |
| read by NASM. |
| |
| This preprocessor directive is not generally used directly by |
| programmers, but may be of interest to preprocessor authors. The |
| usage of the \c{%line} preprocessor directive is as follows: |
| |
| \c %line nnn[+mmm] [filename] |
| |
| In this directive, \c{nnn} identifies the line of the original source |
| file which this line corresponds to. \c{mmm} is an optional parameter |
| which specifies a line increment value; each line of the input file |
| read in is considered to correspond to \c{mmm} lines of the original |
| source file. Finally, \c{filename} is an optional parameter which |
| specifies the file name of the original source file. It may be a |
| quoted string. |
| |
| After reading a \c{%line} preprocessor directive, NASM will report |
| all file names and line numbers relative to the values specified |
| therein. |
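| |
| As a sketch, a tool that generates NASM input from a hypothetical |
| source file \c{parser.y} might emit the following, so that subsequent |
| messages refer to line 120 of that file onwards, advancing by one |
| original line per input line: |
| |
| \c %line 120+1 parser.y |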
| |
| If the command line option \i\c{--no-line} is given, all \c{%line} |
| directives are ignored. This may be useful for debugging preprocessed |
| code. See \k{opt-no-line}. |
| |
| Starting in NASM 2.15, \c{%line} directives are processed before any |
| other processing takes place. |
| |
| \# This isn't a directive, it should be moved elsewhere... |
| \S{getenv} \i\c{%!}\e{variable}: Read an Environment Variable. |
| |
| The \c{%!}\e{variable} directive makes it possible to read the value of an |
| environment variable at assembly time. This could, for example, be used |
| to store the contents of an environment variable into a string, which |
| could be used at some other point in your code. |
| |
| For example, suppose that you have an environment variable \c{FOO}, |
| and you want the contents of \c{FOO} to be embedded in your program as |
| a quoted string. You could do that as follows: |
| |
| \c %defstr FOO %!FOO |
| |
| See \k{defstr} for notes on the \c{%defstr} directive. |
| |
| If the name of the environment variable contains non-identifier |
| characters, you can use string quotes to surround the name of the |
| variable, for example: |
| |
| \c %defstr C_colon %!'C:' |
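| |
| Once defined this way, the macro can be used like any other string. |
| The following is a minimal sketch, assuming that the environment |
| variable \c{FOO} is set and non-empty when NASM runs; the label |
| \c{foo_value} is invented for illustration: |
| |
| \c %defstr FOO %!FOO |
| \c |
| \c foo_value:      db      FOO, 0          ; NUL-terminated copy of FOO |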
| |
| |
| \S{clear} \i\c{%clear}: Clear All Macro Definitions |
| |
| The directive \c{%clear} clears all definitions of a certain type, |
| \e{including the ones defined by NASM itself.} This can be useful when |
| preprocessing non-NASM code, or to drop backwards compatibility |
| aliases. |
| |
| The syntax is: |
| |
| \c %clear [global|context] type... |
| |
| ... where \c{context} indicates that this applies to context-local |
| macros only; the default is \c{global}. |
| |
| \c{type} can be one or more of: |
| |
| \b \c{define} single-line macros |
| |
| \b \c{defalias} single-line macro aliases (useful to remove backwards |
| compatibility aliases) |
| |
| \b \c{alldefine} same as \c{define defalias} |
| |
| \b \c{macro} multi-line macros |
| |
| \b \c{all} same as \c{alldefine macro} (default) |
| |
| In NASM 2.14 and earlier, only the single syntax \c{%clear} was |
| supported, which is equivalent to \c{%clear global all}. |
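| |
| Following the syntax above, a short sketch of two possible uses (the |
| combination of types in the second line is just one example): |
| |
| \c %clear defalias             ; drop single-line macro aliases only |
| \c %clear global define macro  ; drop single-line and multi-line macros, |
| \c                             ; including those defined by NASM itself |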
| |
| |
| |
| |
| \C{stdmac} \i{Standard Macros} |
| |
| NASM defines a set of standard macros, which are already defined when |
| it starts to process any source file. If you really need a program to |
| be assembled with no pre-defined macros, you can use the \i\c{%clear} |
| directive to empty the preprocessor of everything but context-local |
| preprocessor variables and single-line macros, see \k{clear}. |
| |
| Most \i{user-level assembler directives} (see \k{directive}) are |
| implemented as macros which invoke primitive directives; these are |
| described in \k{directive}. The rest of the standard macro set is |
| described here. |
| |
| For compatibility with NASM versions before NASM 2.15, most standard |
| macros of the form \c{__?foo?__} have aliases of form \c{__foo__} (see |
| \k{defalias}). These can be removed with the directive \c{%clear |
| defalias}. |
| |
| |
| \H{stdmacver} \i{NASM Version} Macros |
| |
| The single-line macros \i\c{__?NASM_MAJOR?__}, \i\c{__?NASM_MINOR?__}, |
| \i\c{__?NASM_SUBMINOR?__} and \i\c{__?NASM_PATCHLEVEL?__} expand to the |
| major, minor, subminor and patch level parts of the \i{version |
| number of NASM} being used. So, under NASM 0.98.32p1 for |
| example, \c{__?NASM_MAJOR?__} would be defined as 0, \c{__?NASM_MINOR?__} |
| would be defined as 98, \c{__?NASM_SUBMINOR?__} would be defined as 32, |
| and \c{__?NASM_PATCHLEVEL?__} would be defined as 1. |
| |
| Additionally, the macro \i\c{__?NASM_SNAPSHOT?__} is defined for |
| automatically generated snapshot releases \e{only}. |
| |
| |
| \S{stdmacverid} \i\c{__?NASM_VERSION_ID?__}: \i{NASM Version ID} |
| |
| The single-line macro \c{__?NASM_VERSION_ID?__} expands to a dword integer |
| representing the full version number of the version of NASM being used. |
| The value is equivalent to \c{__?NASM_MAJOR?__}, \c{__?NASM_MINOR?__}, |
| \c{__?NASM_SUBMINOR?__} and \c{__?NASM_PATCHLEVEL?__} concatenated to |
| produce a single doubleword. Hence, for 0.98.32p1, the returned number |
| would be equivalent to: |
| |
| \c dd 0x00622001 |
| |
| or |
| |
| \c db 1,32,98,0 |
| |
| Note that the above lines generate exactly the same code; the second |
| line is used just to give an indication of the order that the separate |
| values will be present in memory. |
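| |
| Because the value is a single number, it lends itself to version |
| checks. The following is a sketch only; the version threshold and the |
| message texts are invented for illustration. Since the \c{__?foo?__} |
| names were introduced with NASM 2.15, the macro is tested for |
| existence first: |
| |
| \c %ifndef __?NASM_VERSION_ID?__ |
| \c  %error "NASM 2.15 or later is required" |
| \c %elif __?NASM_VERSION_ID?__ < 0x020F0500     ; 2.15.05, patch level 0 |
| \c  %warning "only tested with NASM 2.15.05 or later" |
| \c %endif |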
| |
| |
| \S{stdmacverstr} \i\c{__?NASM_VER?__}: \i{NASM Version string} |
| |
| The single-line macro \c{__?NASM_VER?__} expands to a string which defines |
| the version number of NASM being used. So, under NASM 0.98.32 for example, |
| |
| \c db __?NASM_VER?__ |
| |
| would expand to |
| |
| \c db "0.98.32" |
| |
| |
| \H{fileline} \i\c{__?FILE?__} and \i\c{__?LINE?__}: File Name and Line Number |
| |
| Like the C preprocessor, NASM allows the user to find out the file |
| name and line number containing the current instruction. The macro |
| \c{__?FILE?__} expands to a string constant giving the name of the |
| current input file (which may change through the course of assembly |
| if \c{%include} directives are used), and \c{__?LINE?__} expands to a |
| numeric constant giving the current line number in the input file. |
| |
| These macros could be used, for example, to communicate debugging |
| information to a macro, since invoking \c{__?LINE?__} inside a macro |
| definition (either single-line or multi-line) will return the line |
| number of the macro \e{call}, rather than \e{definition}. So to |
| determine where in a piece of code a crash is occurring, for |
| example, one could write a routine \c{stillhere}, which is passed a |
| line number in \c{EAX} and outputs something like \c{line 155: still |
| here}. You could then write a macro: |
| |
| \c %macro notdeadyet 0 |
| \c |
| \c push eax |
| \c mov eax,__?LINE?__ |
| \c call stillhere |
| \c pop eax |
| \c |
| \c %endmacro |
| |
| and then pepper your code with calls to \c{notdeadyet} until you |
| find the crash point. |
| |
| |
| \H{bitsm} \i\c{__?BITS?__}: Current Code Generation Mode |
| |
| The \c{__?BITS?__} standard macro is updated every time that the BITS mode is |
| set using the \c{BITS XX} or \c{[BITS XX]} directive, where XX is a valid mode |
| number of 16, 32 or 64. \c{__?BITS?__} receives the specified mode number and |
| makes it globally available. This can be very useful for those who utilize |
| mode-dependent macros. |
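| |
| As a minimal sketch, a macro could select a pointer size from the |
| current mode. The macro name \c{PTR_SIZE} is invented for this example: |
| |
| \c %if __?BITS?__ == 64 |
| \c  %define PTR_SIZE 8 |
| \c %elif __?BITS?__ == 32 |
| \c  %define PTR_SIZE 4 |
| \c %else |
| \c  %define PTR_SIZE 2          ; 16-bit mode |
| \c %endif |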
| |
| \H{ofmtm} \i\c{__?OUTPUT_FORMAT?__}: Current Output Format |
| |
| The \c{__?OUTPUT_FORMAT?__} standard macro holds the current output |
| format name, as given by the \c{-f} option or NASM's default. Type |
| \c{nasm -h} for a list. |
| |
| \c %ifidn __?OUTPUT_FORMAT?__, win32 |
| \c %define NEWLINE 13, 10 |
| \c %elifidn __?OUTPUT_FORMAT?__, elf32 |
| \c %define NEWLINE 10 |
| \c %endif |
| |
| \H{dfmtm} \i\c{__?DEBUG_FORMAT?__}: Current Debug Format |
| |
| If debugging information generation is enabled, the |
| \c{__?DEBUG_FORMAT?__} standard macro holds the current debug format |
| name as specified by the \c{-F} or \c{-g} option or the output format |
| default. Type \c{nasm -f} \e{output} \c{-y} for a list. |
| |
| \c{__?DEBUG_FORMAT?__} is not defined if debugging is not enabled, or if |
| the debug format specified is \c{null}. |
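| |
| Since the macro is only defined when debugging is enabled, it can be |
| tested with \c{%ifdef}. A small sketch, in which the macro name |
| \c{TRACE_ENABLED} is an assumption made for illustration: |
| |
| \c %ifdef __?DEBUG_FORMAT?__ |
| \c  %define TRACE_ENABLED 1     ; debug information is being generated |
| \c %else |
| \c  %define TRACE_ENABLED 0 |
| \c %endif |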
| |
| \H{datetime} Assembly Date and Time Macros |
| |
| NASM provides a variety of macros that represent the timestamp of the |
| assembly session. |
| |
| \b The \i\c{__?DATE?__} and \i\c{__?TIME?__} macros give the assembly date and |
| time as strings, in ISO 8601 format (\c{"YYYY-MM-DD"} and \c{"HH:MM:SS"}, |
| respectively.) |
| |
| \b The \i\c{__?DATE_NUM?__} and \i\c{__?TIME_NUM?__} macros give the assembly |
| date and time in numeric form; in the format \c{YYYYMMDD} and |
| \c{HHMMSS} respectively. |
| |
| \b The \i\c{__?UTC_DATE?__} and \i\c{__?UTC_TIME?__} macros give the assembly |
| date and time in universal time (UTC) as strings, in ISO 8601 format |
| (\c{"YYYY-MM-DD"} and \c{"HH:MM:SS"}, respectively.) If the host |
| platform doesn't provide UTC time, these macros are undefined. |
| |
| \b The \i\c{__?UTC_DATE_NUM?__} and \i\c{__?UTC_TIME_NUM?__} macros give the |
| assembly date and time in universal time (UTC) in numeric form; in the |
| format \c{YYYYMMDD} and \c{HHMMSS} respectively. If the |
| host platform doesn't provide UTC time, these macros are |
| undefined. |
| |
| \b The \c{__?POSIX_TIME?__} macro is defined as a number containing the |
| number of seconds since the POSIX epoch, 1 January 1970 00:00:00 UTC; |
| excluding any leap seconds. This is computed using UTC time if |
| available on the host platform, otherwise it is computed using the |
| local time as if it was UTC. |
| |
| All instances of time and date macros in the same assembly session |
| produce consistent output. For example, in an assembly session |
| started at 42 seconds after midnight on January 1, 2010 in Moscow |
| (timezone UTC+3) these macros would have the following values, |
| assuming, of course, a properly configured environment with a correct |
| clock: |
| |
| \c __?DATE?__ "2010-01-01" |
| \c __?TIME?__ "00:00:42" |
| \c __?DATE_NUM?__ 20100101 |
| \c __?TIME_NUM?__ 000042 |
| \c __?UTC_DATE?__ "2009-12-31" |
| \c __?UTC_TIME?__ "21:00:42" |
| \c __?UTC_DATE_NUM?__ 20091231 |
| \c __?UTC_TIME_NUM?__ 210042 |
| \c __?POSIX_TIME?__ 1262293242 |
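| |
| One possible use is to embed a build stamp in the output. The sketch |
| below falls back to local time when the UTC macros are undefined; the |
| labels \c{build_stamp} and \c{build_epoch} are invented for this example: |
| |
| \c %ifdef __?UTC_DATE?__ |
| \c build_stamp:    db      __?UTC_DATE?__, " ", __?UTC_TIME?__, " UTC", 0 |
| \c %else |
| \c build_stamp:    db      __?DATE?__, " ", __?TIME?__, 0 |
| \c %endif |
| \c build_epoch:    dq      __?POSIX_TIME?__ |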
| |
| |
| \H{use_def} \I\c{__?USE_*?__}\c{__?USE_}\e{package}\c{?__}: Package |
| Include Test |
| |
| When a standard macro package (see \k{macropkg}) is included with the |
| \c{%use} directive (see \k{use}), a single-line macro of the form |
| \c{__?USE_}\e{package}\c{?__} is automatically defined. This allows |
| testing if a particular package is invoked or not. |
| |
| For example, if the \c{altreg} package is included (see |
| \k{pkg_altreg}), then the macro \c{__?USE_ALTREG?__} is defined. |
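| |
| A short sketch of such a test, choosing between the alternate and the |
| conventional register names: |
| |
| \c %ifdef __?USE_ALTREG?__ |
| \c         mov     r0l,r3h         ; altreg names are available |
| \c %else |
| \c         mov     al,bh           ; same instruction, standard names |
| \c %endif |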
| |
| |
| \H{pass_macro} \i\c{__?PASS?__}: Assembly Pass |
| |
| The macro \c{__?PASS?__} is defined to be \c{1} on preparatory passes, |
| and \c{2} on the final pass. In preprocess-only mode, it is set to |
| \c{3}, and when running only to generate dependencies (due to the |
| \c{-M} or \c{-MG} option, see \k{opt-M}) it is set to \c{0}. |
| |
| \e{Avoid using this macro if at all possible. It is tremendously easy |
| to generate very strange errors by misusing it, and the semantics may |
| change in future versions of NASM.} |
| |
| |
| \H{strucs} \i{Structure Data Types} |
| |
| \S{struc} \i\c{STRUC} and \i\c{ENDSTRUC}: \i{Declaring Structure} Data Types |
| |
| The core of NASM contains no intrinsic means of defining data |
| structures; instead, the preprocessor is sufficiently powerful that |
| data structures can be implemented as a set of macros. The macros |
| \c{STRUC} and \c{ENDSTRUC} are used to define a structure data type. |
| |
| \c{STRUC} takes one or two parameters. The first parameter is the name |
| of the data type. The second, optional parameter is the base offset of |
| the structure. The name of the data type is defined as a symbol with |
| the value of the base offset, and the name of the data type with the |
| suffix \c{_size} appended to it is defined as an \c{EQU} giving the |
| size of the structure. Once \c{STRUC} has been issued, you are |
| defining the structure, and should define fields using the \c{RESB} |
| family of pseudo-instructions, and then invoke \c{ENDSTRUC} to finish |
| the definition. |
| |
| For example, to define a structure called \c{mytype} containing a |
| longword, a word, a byte and a string of bytes, you might code |
| |
| \c struc mytype |
| \c |
| \c mt_long: resd 1 |
| \c mt_word: resw 1 |
| \c mt_byte: resb 1 |
| \c mt_str: resb 32 |
| \c |
| \c endstruc |
| |
| The above code defines six symbols: \c{mt_long} as 0 (the offset |
| from the beginning of a \c{mytype} structure to the longword field), |
| \c{mt_word} as 4, \c{mt_byte} as 6, \c{mt_str} as 7, \c{mytype_size} |
| as 39, and \c{mytype} itself as zero. |
| |
| The reason why the structure type name is defined at zero by default |
| is a side effect of allowing structures to work with the local label |
| mechanism: if your structure members tend to have the same names in |
| more than one structure, you can define the above structure like this: |
| |
| \c struc mytype |
| \c |
| \c .long: resd 1 |
| \c .word: resw 1 |
| \c .byte: resb 1 |
| \c .str: resb 32 |
| \c |
| \c endstruc |
| |
| This defines the offsets to the structure fields as \c{mytype.long}, |
| \c{mytype.word}, \c{mytype.byte} and \c{mytype.str}. |
| |
| NASM, since it has no \e{intrinsic} structure support, does not |
| support any form of period notation to refer to the elements of a |
| structure once you have one (except the above local-label notation), |
| so code such as \c{mov ax,[mystruc.mt_word]} is not valid. |
| \c{mt_word} is a constant just like any other constant, so the |
| correct syntax is \c{mov ax,[mystruc+mt_word]} or \c{mov |
| ax,[mystruc+mytype.word]}. |
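| |
| The following sketch shows a few such accesses in 32-bit code. It |
| assumes the first definition of \c{mytype} above (with the \c{mt_} field |
| names) and an instance labelled \c{mystruc}, declared as described in |
| \k{istruc}: |
| |
| \c         mov     ax,[mystruc+mt_word]    ; instance address plus field offset |
| \c         mov     ebx,mystruc |
| \c         mov     al,[ebx+mt_byte]        ; the same idea through a register |
| \c         mov     ecx,mytype_size         ; 39, the size of the structure |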
| |
| Sometimes you only have the address of the structure displaced by an |
| offset. For example, consider this standard stack frame setup: |
| |
| \c push ebp |
| \c mov ebp, esp |
| \c sub esp, 40 |
| |
| In this case, you could access an element by subtracting the offset: |
| |
| \c mov [ebp - 40 + mytype.word], ax |
| |
| However, if you do not want to repeat this offset, you can use -40 as |
| a base offset: |
| |
| \c struc mytype, -40 |
| |
| And access an element this way: |
| |
| \c mov [ebp + mytype.word], ax |
| |
| |
| \S{istruc} \i\c{ISTRUC}, \i\c{AT} and \i\c{IEND}: Declaring |
| \i{Instances of Structures} |
| |
| Having defined a structure type, the next thing you typically want |
| to do is to declare instances of that structure in your data |
| segment. NASM provides an easy way to do this in the \c{ISTRUC} |
| mechanism. To declare a structure of type \c{mytype} in a program, |
| you code something like this: |
| |
| \c mystruc: |
| \c istruc mytype |
| \c |
| \c at mt_long, dd 123456 |
| \c at mt_word, dw 1024 |
| \c at mt_byte, db 'x' |
| \c at mt_str, db 'hello, world', 13, 10, 0 |
| \c |
| \c iend |
| |
| The function of the \c{AT} macro is to make use of the \c{TIMES} |
| prefix to advance the assembly position to the correct point for the |
| specified structure field, and then to declare the specified data. |
| Therefore the structure fields must be declared in the same order as |
| they were specified in the structure definition. |
| |
| If the data to go in a structure field requires more than one source |
| line to specify, the remaining source lines can easily come after |
| the \c{AT} line. For example: |
| |
| \c at mt_str, db 123,134,145,156,167,178,189 |
| \c db 190,100,0 |
| |
| Depending on personal taste, you can also omit the code part of the |
| \c{AT} line completely, and start the structure field on the next |
| line: |
| |
| \c at mt_str |
| \c db 'hello, world' |
| \c db 13,10,0 |
| |
| \H{alignment} \i{Alignment} Control |
| |
| \S{align} \i\c{ALIGN} and \i\c{ALIGNB}: Code and Data Alignment |
| |
| The \c{ALIGN} and \c{ALIGNB} macros provide a convenient way to |
| align code or data on a word, longword, paragraph or other boundary. |
| (Some assemblers call this directive \i\c{EVEN}.) The syntax of the |
| \c{ALIGN} and \c{ALIGNB} macros is |
| |
| \c align 4 ; align on 4-byte boundary |
| \c align 16 ; align on 16-byte boundary |
| \c align 8,db 0 ; pad with 0s rather than NOPs |
| \c align 4,resb 1 ; align to 4 in the BSS |
| \c alignb 4 ; equivalent to previous line |
| |
| Both macros require their first argument to be a power of two; they |
| both compute the number of additional bytes required to bring the |
| length of the current section up to a multiple of that power of two, |
| and then apply the \c{TIMES} prefix to their second argument to |
| perform the alignment. |
| |
| If the second argument is not specified, the default for \c{ALIGN} |
| is \c{NOP}, and the default for \c{ALIGNB} is \c{RESB 1}. So if the |
| second argument is specified, the two macros are equivalent. |
| Normally, you can just use \c{ALIGN} in code and data sections and |
| \c{ALIGNB} in BSS sections, and never need the second argument |
| except for special purposes. |
| |
| \c{ALIGN} and \c{ALIGNB}, being simple macros, perform no error |
| checking: they cannot warn you if their first argument fails to be a |
| power of two, or if their second argument generates more than one |
| byte of code. In each of these cases they will silently do the wrong |
| thing. |
| |
| \c{ALIGNB} (or \c{ALIGN} with a second argument of \c{RESB 1}) can |
| be used within structure definitions: |
| |
| \c struc mytype2 |
| \c |
| \c mt_byte: |
| \c resb 1 |
| \c alignb 2 |
| \c mt_word: |
| \c resw 1 |
| \c alignb 4 |
| \c mt_long: |
| \c resd 1 |
| \c mt_str: |
| \c resb 32 |
| \c |
| \c endstruc |
| |
| This will ensure that the structure members are sensibly aligned |
| relative to the base of the structure. |
| |
| A final caveat: \c{ALIGN} and \c{ALIGNB} work relative to the |
| beginning of the \e{section}, not the beginning of the address space |
| in the final executable. Aligning to a 16-byte boundary when the |
| section you're in is only guaranteed to be aligned to a 4-byte |
| boundary, for example, is a waste of effort. Again, NASM does not |
| check that the section's alignment characteristics are sensible for |
| the use of \c{ALIGN} or \c{ALIGNB}. |
| |
| Both \c{ALIGN} and \c{ALIGNB} call the \c{SECTALIGN} macro implicitly. |
| See \k{sectalign} for details. |
| |
| See also the \c{smartalign} standard macro package, \k{pkg_smartalign}. |
| |
| |
| \S{sectalign} \i\c{SECTALIGN}: Section Alignment |
| |
| The \c{SECTALIGN} macro provides a way to modify the alignment attribute |
| of an output file section. Unlike the \c{align=} attribute (which is allowed |
| at section definition only) the \c{SECTALIGN} macro may be used at any time. |
| |
| For example the directive |
| |
| \c SECTALIGN 16 |
| |
| sets the section alignment requirement to 16 bytes. Once increased, it |
| cannot be decreased; the alignment may only grow. |
| |
| Note that \c{ALIGN} (see \k{align}) calls the \c{SECTALIGN} macro implicitly, |
| so the active section alignment requirement may be updated. This is the |
| default behaviour; if for some reason you do not want \c{ALIGN} to call |
| \c{SECTALIGN} at all, use the directive |
| |
| \c SECTALIGN OFF |
| |
| It is still possible to turn it on again with |
| |
| \c SECTALIGN ON |
| |
| Note that \c{SECTALIGN <ON|OFF>} affects only the \c{ALIGN}/\c{ALIGNB} directives, |
| not an explicit \c{SECTALIGN} directive. |
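| |
| Put together, the behaviour described above can be sketched as |
| follows. The alignment values are chosen arbitrarily for illustration: |
| |
| \c         section .text |
| \c |
| \c         sectalign off   ; ALIGN no longer calls SECTALIGN |
| \c         align 16        ; pads, but leaves the section alignment alone |
| \c |
| \c         sectalign 32    ; an explicit SECTALIGN still takes effect |
| \c |
| \c         sectalign on    ; restore the default behaviour |
| \c         align 16        ; pads and calls SECTALIGN implicitly |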
| |
| \C{macropkg} \i{Standard Macro Packages} |
| |
| The \i\c{%use} directive (see \k{use}) includes one of the standard |
| macro packages included with the NASM distribution and compiled into |
| the NASM binary. It operates like the \c{%include} directive (see |
| \k{include}), but the included contents is provided by NASM itself. |
| |
| The names of standard macro packages are case insensitive and can be |
| quoted or not. |
| |
| As of version 2.15, NASM has \c{%ifusable} and \c{%ifusing} directives to help |
| the user determine whether an individual package is available in this version |
| of NASM (\c{%ifusable}) or whether a particular package is already loaded |
| (\c{%ifusing}). |
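| |
| A hedged sketch of how these directives might be combined, assuming |
| they take a package name in the same way as \c{%use}: |
| |
| \c %ifusable smartalign |
| \c  %use smartalign             ; load the package only if this NASM has it |
| \c %endif |
| \c |
| \c %ifusing smartalign |
| \c         alignmode generic       ; only meaningful once the package is loaded |
| \c %endif |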
| |
| |
| \H{pkg_altreg} \i\c{altreg}: \i{Alternate Register Names} |
| |
| The \c{altreg} standard macro package provides alternate register |
| names. It provides numeric register names for all registers (not just |
| \c{R8}-\c{R15}), the Intel-defined aliases \c{R8L}-\c{R15L} for the |
| low bytes of registers (as opposed to the NASM/AMD standard names |
| \c{R8B}-\c{R15B}), and the names \c{R0H}-\c{R3H} (by analogy with |
| \c{R0L}-\c{R3L}) for \c{AH}, \c{CH}, \c{DH}, and \c{BH}. |
| |
| Example use: |
| |
| \c %use altreg |
| \c |
| \c proc: |
| \c mov r0l,r3h ; mov al,bh |
| \c ret |
| |
| See also \k{reg64}. |
| |
| |
| \H{pkg_smartalign} \i\c{smartalign}\I{align, smart}: Smart \c{ALIGN} Macro |
| |
| The \c{smartalign} standard macro package provides for an \i\c{ALIGN} |
| macro which is more powerful than the default (and |
| backwards-compatible) one (see \k{align}). When the \c{smartalign} |
| package is enabled, when \c{ALIGN} is used without a second argument, |
| NASM will generate a sequence of instructions more efficient than a |
| series of \c{NOP}. Furthermore, if the padding exceeds a specific |
| threshold, then NASM will generate a jump over the entire padding |
| sequence. |
| |
| The specific instructions generated can be controlled with the |
| new \i\c{ALIGNMODE} macro. This macro takes two parameters: one mode, |
| and an optional jump threshold override. If (for any reason) you need |
| to turn off the jump completely just set jump threshold value to -1 |
| (or set it to \c{nojmp}). The following modes are possible: |
| |
| \b \c{generic}: Works on all x86 CPUs and should have reasonable |
| performance. The default jump threshold is 8. This is the |
| default. |
| |
| \b \c{nop}: Pad out with \c{NOP} instructions. The only difference |
| compared to the standard \c{ALIGN} macro is that NASM can still jump |
| over a large padding area. The default jump threshold is 16. |
| |
| \b \c{k7}: Optimize for the AMD K7 (Athlon/Athlon XP). These |
| instructions should still work on all x86 CPUs. The default jump |
| threshold is 16. |
| |
| \b \c{k8}: Optimize for the AMD K8 (Opteron/Athlon 64). These |
| instructions should still work on all x86 CPUs. The default jump |
| threshold is 16. |
| |
| \b \c{p6}: Optimize for Intel CPUs. This uses the long \c{NOP} |
| instructions first introduced in Pentium Pro. This is incompatible |
| with all CPUs of family 5 or lower, as well as some VIA CPUs and |
| several virtualization solutions. The default jump threshold is 16. |
| |
| The macro \i\c{__?ALIGNMODE?__} is defined to contain the current |
| alignment mode. A number of other macros beginning with \c{__?ALIGN_} |
| are used internally by this macro package. |
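| |
| A minimal usage sketch follows. The mode, the threshold value and the |
| label names are chosen only for illustration: |
| |
| \c %use smartalign |
| \c |
| \c         alignmode generic, 16   ; generic padding, jump threshold set to 16 |
| \c |
| \c first_func:                     ; hypothetical label |
| \c         ret |
| \c |
| \c         align 32                ; no second argument: smart padding is used |
| \c second_func:                    ; hypothetical label |
| \c         ret |
| \c |
| \c         alignmode generic, nojmp ; same mode, never jump over the padding |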
| |
| |
| \H{pkg_fp} \i\c{fp}: Floating-point macros |
| |
| This package contains the following floating-point convenience macros: |
| |
| \c %define Inf __?Infinity?__ |
| \c %define NaN __?QNaN?__ |
| \c %define QNaN __?QNaN?__ |
| \c %define SNaN __?SNaN?__ |
| \c |
| \c %define float8(x) __?float8?__(x) |
| \c %define float16(x) __?float16?__(x) |
| \c %define float32(x) __?float32?__(x) |
| \c %define float64(x) __?float64?__(x) |
| \c %define float80m(x) __?float80m?__(x) |
| \c %define float80e(x) __?float80e?__(x) |
| \c %define float128l(x) __?float128l?__(x) |
| \c %define float128h(x) __?float128h?__(x) |
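| |
| As an illustrative sketch, the macros might be used like this (the |
| label names are invented for the example): |
| |
| \c %use fp |
| \c |
| \c one_half:       dd      float32(0.5)    ; the bit pattern 0x3F000000 |
| \c pos_inf:        dd      Inf             ; single-precision infinity |
| \c |
| \c         mov     eax,float32(1.0)        ; loads 0x3F800000 as an integer |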
| |
| |
| \H{pkg_ifunc} \i\c{ifunc}: \i{Integer functions} |
| |
| This package contains a set of macros which implement integer |
| functions. These are actually implemented as special operators, but |
| are most conveniently accessed via this macro package. |
| |
| The macros provided are: |
| |
| \S{ilog2} \i{Integer logarithms} |
| |
| These functions calculate the integer logarithm base 2 of their |
| argument, considered as an unsigned integer. The only difference |
| between the functions is their respective behavior if the argument |
| provided is not a power of two. |
| |
| The function \i\c{ilog2e()} (alias \i\c{ilog2()}) generates an error if |
| the argument is not a power of two. |
| |
| The function \i\c{ilog2f()} rounds the argument down to the nearest |
| power of two; if the argument is zero it returns zero. |
| |
| The function \i\c{ilog2c()} rounds the argument up to the nearest |
| power of two. |
| |
| The functions \i\c{ilog2fw()} (alias \i\c{ilog2w()}) and |
| \i\c{ilog2cw()} generate a warning if the argument is not a power of |
| two, but otherwise behave like \c{ilog2f()} and \c{ilog2c()}, |
| respectively. |
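| |
| A brief sketch of the difference between the variants. The macro names |
| on the left are invented for this example: |
| |
| \c %use ifunc |
| \c |
| \c %assign PAGE_SHIFT  ilog2e(4096)    ; 12; an error if not a power of two |
| \c %assign FLOOR_BITS  ilog2f(1000)    ; 9, since 2^9 = 512 <= 1000 |
| \c %assign CEIL_BITS   ilog2c(1000)    ; 10, since 2^10 = 1024 >= 1000 |
| \c |
| \c         shr     eax,PAGE_SHIFT      ; divide by the page size |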
| |
| \H{pkg_masm} \i\c{masm}: \i{MASM compatibility} |
| |
| Since version 2.15, NASM has a MASM compatibility package with minimal |
| functionality, as it is intended to be used primarily with machine-generated |
| code. It does not include any "programmer-friendly" shortcuts, nor does it in |
| any way support \c{ASSUME}, symbol typing, or MASM-style structures. |
| |
| Currently, the MASM compatibility package emulates only the \c{PTR} keyword |
| and recognizes the syntax \c{displacement[index]} for memory operations. |
| |
| To enable the package, use the directive: |
| |
| \c %use masm |
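| |
| With the package loaded, code along the following lines should be |
| accepted. This is a sketch based only on the two features named above; |
| the label \c{table} is hypothetical: |
| |
| \c %use masm |
| \c |
| \c         mov     eax,dword ptr [ebx]     ; PTR keyword is accepted |
| \c         mov     al,table[ebx]           ; displacement[index] form |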
| |
| |
| \C{directive} \i{Assembler Directives} |
| |
| NASM, though it attempts to avoid the bureaucracy of assemblers like |
| MASM and TASM, is nevertheless forced to support a \e{few} |
| directives. These are described in this chapter. |
| |
| NASM's directives come in two types: \I{user-level |
| directives}\e{user-level} directives and \I{primitive |
| directives}\e{primitive} directives. Typically, each directive has a |
| user-level form and a primitive form. In almost all cases, we |
| recommend that users use the user-level forms of the directives, |
| which are implemented as macros which call the primitive forms. |
| |
| Primitive directives are enclosed in square brackets; user-level |
| directives are not. |
| |
| In addition to the universal directives described in this chapter, |
| each object file format can optionally supply extra directives in |
| order to control particular features of that file format. These |
| \I{format-specific directives}\e{format-specific} directives are |
| documented along with the formats that implement them, in \k{outfmt}. |
| |
| |
| \H{bits} \i\c{BITS}: Specifying Target \i{Processor Mode} |
| |
| The \c{BITS} directive specifies whether NASM should generate code |
| \I{16-bit mode, versus 32-bit mode}designed to run on a processor |
| operating in 16-bit mode, 32-bit mode or 64-bit mode. The syntax is |
| \c{BITS XX}, where XX is 16, 32 or 64. |
| |
| In most cases, you should not need to use \c{BITS} explicitly. The |
| \c{aout}, \c{coff}, \c{elf*}, \c{macho}, \c{win32} and \c{win64} |
| object formats, which are designed for use in 32-bit or 64-bit |
| operating systems, all cause NASM to select 32-bit or 64-bit mode, |
| respectively, by default. The \c{obj} object format allows you |
| to specify each segment you define as either \c{USE16} or \c{USE32}, |
| and NASM will set its operating mode accordingly, so the use of the |
| \c{BITS} directive is once again unnecessary. |
| |
| The most likely reason for using the \c{BITS} directive is to write |
| 32-bit or 64-bit code in a flat binary file; this is because the \c{bin} |
| output format defaults to 16-bit mode in anticipation of it being |
| used most frequently to write DOS \c{.COM} programs, DOS \c{.SYS} |
| device drivers and boot loader software. |
| |
| The \c{BITS} directive can also be used to generate code for a |
| different mode than the standard one for the output format. |
| |
| You do \e{not} need to specify \c{BITS 32} merely in order to use |
| 32-bit instructions in a 16-bit DOS program; if you do, the |
| assembler will generate incorrect code because it will be writing |
| code targeted at a 32-bit platform, to be run on a 16-bit one. |
| |
| When NASM is in \c{BITS 16} mode, instructions which use 32-bit |
| data are prefixed with an 0x66 byte, and those referring to 32-bit |
| addresses have an 0x67 prefix. In \c{BITS 32} mode, the reverse is |
| true: 32-bit instructions require no prefixes, whereas instructions |
| using 16-bit data need an 0x66 and those working on 16-bit addresses |
| need an 0x67. |
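| |
| The effect on the generated bytes can be sketched with a few simple |
| instructions. The encodings shown in the comments are what one would |
| expect from the rules above: |
| |
| \c         bits 16 |
| \c         mov     ax,bx           ; 89 D8 |
| \c         mov     eax,ebx         ; 66 89 D8    (32-bit data) |
| \c         mov     ax,[ebx]        ; 67 8B 03    (32-bit address) |
| \c |
| \c         bits 32 |
| \c         mov     eax,ebx         ; 89 D8 |
| \c         mov     ax,bx           ; 66 89 D8    (16-bit data) |
| \c         mov     eax,[bx]        ; 67 8B 07    (16-bit address) |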
| |
| When NASM is in \c{BITS 64} mode, most instructions operate the same |
| as they do for \c{BITS 32} mode. However, there are 8 more general and |
| SSE registers, and 16-bit addressing is no longer supported. |
| |
| The default address size is 64 bits; 32-bit addressing can be selected |
| with the 0x67 prefix. The default operand size is still 32 bits, |
| however, and the 0x66 prefix selects 16-bit operand size. The \c{REX} |
| prefix is used both to select 64-bit operand size, and to access the |
| new registers. NASM automatically inserts REX prefixes when |
| necessary. |
| |
| When the \c{REX} prefix is used, the processor does not know how to |
| address the AH, BH, CH or DH (high 8-bit legacy) registers. Instead, |
| it is possible to access the low 8 bits of the SP, BP, SI and DI |
| registers as SPL, BPL, SIL and DIL, respectively; but only when the |
| REX prefix is used. |
| |
| The \c{BITS} directive has an exactly equivalent primitive form, |
| \c{[BITS 16]}, \c{[BITS 32]} and \c{[BITS 64]}. The user-level form is |
| a macro which has no function other than to call the primitive form. |
| |
| Note that the space is necessary, e.g. \c{BITS32} will \e{not} work! |
| |
| \S{USE16 & USE32} \i\c{USE16} & \i\c{USE32}: Aliases for BITS |
| |
| The `\c{USE16}' and `\c{USE32}' directives can be used in place of |
| `\c{BITS 16}' and `\c{BITS 32}', for compatibility with other assemblers. |
| |
| |
| \H{default} \i\c{DEFAULT}: Change the assembler defaults |
| |
| The \c{DEFAULT} directive changes the assembler defaults. Normally, |
| NASM defaults to a mode where the programmer is expected to explicitly |
| specify most features directly. However, this is occasionally |
| obnoxious, as the explicit form is pretty much the only one one wishes |
| to use. |
| |
| Currently, \c{DEFAULT} can set \c{REL} & \c{ABS} and \c{BND} & \c{NOBND}. |
| |
| \S{REL & ABS} \i\c{REL} & \i\c{ABS}: RIP-relative addressing |
| |
| This sets whether registerless instructions in 64-bit mode are \c{RIP}-relative |
| or not. By default, they are absolute unless overridden with the \i\c{REL} |
| specifier (see \k{effaddr}). However, if \c{DEFAULT REL} is |
| specified, \c{REL} is default, unless overridden with the \c{ABS} |
| specifier, \e{except when used with an FS or GS segment override}. |
| |
| The special handling of \c{FS} and \c{GS} overrides is due to the |
| fact that these registers are generally used as thread pointers or |
| other special functions in 64-bit mode, and generating |
| \c{RIP}-relative addresses would be extremely confusing. |
| |
| \c{DEFAULT REL} is disabled with \c{DEFAULT ABS}. |
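| |
| The rules above can be sketched as follows. The label \c{counter} is |
| invented for this example: |
| |
| \c         bits 64 |
| \c         default rel |
| \c |
| \c         mov     rax,[counter]           ; RIP-relative, from DEFAULT REL |
| \c         mov     rax,[abs counter]       ; explicitly absolute |
| \c         mov     rax,[fs:counter]        ; absolute, because of the FS override |
| \c |
| \c         default abs |
| \c |
| \c         mov     rax,[counter]           ; absolute again |
| \c         mov     rax,[rel counter]       ; explicitly RIP-relative |
| \c |
| \c counter:        dq      0               ; hypothetical data item |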
| |
| \S{BND & NOBND} \i\c{BND} & \i\c{NOBND}: \c{BND} prefix |
| |
| If \c{DEFAULT BND} is set, all instructions following this directive |
| that accept a \c{BND} prefix are prefixed with it. To override this, the |
| \c{NOBND} prefix can be used. |
| |
| \c DEFAULT BND |
| \c call foo ; BND will be prefixed |
| \c nobnd call foo ; BND will NOT be prefixed |
| |
| \c{DEFAULT NOBND} disables \c{DEFAULT BND}; the \c{BND} prefix will then be |
| added only when explicitly specified in code. |
| |
| \c{DEFAULT BND} is expected to be the normal configuration for writing |
| MPX-enabled code. |
| |
| \H{section} \i\c{SECTION} or \i\c{SEGMENT}: Changing and \i{Defining |
| Sections} |
| |
| \I{changing sections}\I{switching between sections}The \c{SECTION} |
| directive (\c{SEGMENT} is an exactly equivalent synonym) changes |
| which section of the output file the code you write will be |
| assembled into. In some object file formats, the number and names of |
| sections are fixed; in others, the user may make up as many as they |
| wish. Hence \c{SECTION} may sometimes give an error message, or may |
| define a new section, if you try to switch to a section that does |
| not (yet) exist. |
| |
| The Unix object formats, and the \c{bin} object format (but see |
| \k{multisec}), all support |
| the \i{standardized section names} \c{.text}, \c{.data} and \c{.bss} |
| for the code, data and uninitialized-data sections. The \c{obj} |
| format, by contrast, does not recognize these section names as being |
| special, and indeed will strip off the leading period of any section |
| name that has one. |
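| |
| A minimal sketch of switching between the standardized sections in a |
| format that supports them, assuming 32-bit code. All label names are |
| invented for illustration: |
| |
| \c         section .data |
| \c greeting:       db      'hello', 10     ; initialized data |
| \c |
| \c         section .bss |
| \c buffer:         resb    64              ; uninitialized data |
| \c |
| \c         section .text |
| \c copy_start:     mov     esi,greeting |
| \c                 mov     edi,buffer |
| \c |
| \c         section .data                   ; switch back to .data |
| \c counter:        dd      0 |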
| |
| |
| \S{sectmac} The \i\c{__?SECT?__} Macro |
| |
| The \c{SECTION} directive is unusual in that its user-level form |
| functions differently from its primitive form. The primitive form, |
| \c{[SECTION xyz]}, simply switches the current target section to the |
| one given. The user-level form, \c{SECTION xyz}, however, first |
| defines the single-line macro \c{__?SECT?__} to be the primitive |
| \c{[SECTION]} directive which it is about to issue, and then issues |
| it. So the user-level directive |
| |
| \c SECTION .text |
| |
| expands to the two lines |
| |
| \c %define __?SECT?__ [SECTION .text] |
| \c [SECTION .text] |
| |
| Users may find it useful to make use of this in their own macros. |
| For example, the \c{writefile} macro defined in \k{mlmacgre} can be |
| usefully rewritten in the following more sophisticated form: |
| |
| \c %macro writefile 2+ |
| \c |
| \c [section .data] |
| \c |
| \c %%str: db %2 |
| \c %%endstr: |
| \c |
| \c __?SECT?__ |
| \c |
| \c mov dx,%%str |
| \c mov cx,%%endstr-%%str |
| \c mov bx,%1 |
| \c mov ah,0x40 |
| \c int 0x21 |
| \c |
| \c %endmacro |
| |
| This form of the macro, once passed a string to output, first |
| switches temporarily to the data section of the file, using the |
| primitive form of the \c{SECTION} directive so as not to modify |
| \c{__?SECT?__}. It then declares its string in the data section, and |
| then invokes \c{__?SECT?__} to switch back to \e{whichever} section |
| the user was previously working in. It thus avoids the need, in the |
| previous version of the macro, to include a \c{JMP} instruction to |
| jump over the data, and also does not fail if, in a complicated |
| \c{OBJ} format module, the user could potentially be assembling the |
| code in any of several separate code sections. |
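| |
| The same save-and-restore pattern applies to any macro that needs to |
| emit data out of line. As a sketch, a hypothetical \c{defstring} |
| macro (the name is an assumption made for this example) could |
| declare a labelled string in the data section and then return to |
| whichever section the caller was using: |
| |
| \c %macro  defstring 2+ |
| \c |
| \c         [section .data]         ; primitive form: __?SECT?__ unchanged |
| \c |
| \c   %1:   db      %2 |
| \c |
| \c         __?SECT?__              ; back to the caller's section |
| \c |
| \c %endmacro |
| |
| It might then be invoked as \c{defstring greeting, 'hello', 13, 10} |
| from within any code section. |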
| |
| |
| \H{absolute} \i\c{ABSOLUTE}: Defining Absolute Labels |
| |
| The \c{ABSOLUTE} directive can be thought of as an alternative form |
| of \c{SECTION}: it causes the subsequent code to be directed at no |
| physical section, but at the hypothetical section starting at the |
| given absolute address. The only instructions you can use in this |
| mode are the \c{RESB} family. |
| |
| \c{ABSOLUTE} is used as follows: |
| |
| \c absolute 0x1A |
| \c |
| \c kbuf_chr resw 1 |
| \c kbuf_free resw 1 |
| \c kbuf resw 16 |
| |
| This example describes a section of the PC BIOS data area, at |
| segment address 0x40: the above code defines \c{kbuf_chr} to be |
| 0x1A, \c{kbuf_free} to be 0x1C, and \c{kbuf} to be 0x1E. |
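| |
| Since these labels are plain offsets, code that reads the variables |
| must supply the segment itself. A minimal sketch, assuming 16-bit |
| real-mode code and a free \c{ES} register: |
| |
| \c         mov     ax,0x40             ; BIOS data area segment |
| \c         mov     es,ax |
| \c         mov     bx,[es:kbuf_free]   ; reads the word at 0040:001C |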
| |
| The user-level form of \c{ABSOLUTE}, like that of \c{SECTION}, |
| redefines the \i\c{__?SECT?__} macro when it is invoked. |
| |
| \i\c{STRUC} and \i\c{ENDSTRUC} are defined as macros which use |
| \c{ABSOLUTE} (and also \c{__?SECT?__}). |
| |
| \c{ABSOLUTE} doesn't have to take an absolute constant as an |
| argument: it can take an expression (actually, a \i{critical |
| expression}: see \k{crit}) and it can be a value in a segment. For |
| example, a TSR can re-use its setup code as run-time BSS like this: |
| |
| \c org 100h ; it's a .COM program |
| \c |
| \c jmp setup ; setup code comes last |
| \c |
| \c ; the resident part of the TSR goes here |
| \c setup: |
| \c ; now write the code that installs the TSR here |
| \c |
| \c absolute setup |
| \c |
| \c runtimevar1 resw 1 |
| \c runtimevar2 resd 20 |
| \c |
| \c tsr_end: |
| |
| This defines some variables `on top of' the setup code, so that |
| after the setup has finished running, the space it took up can be |
| re-used as data storage for the running TSR. The symbol `tsr_end' |
| can be used to calculate the total size of the part of the TSR that |
| needs to be made resident. |
| |
| |
| \H{extern} \i\c{EXTERN}: \i{Importing Symbols} from Other Modules |
| |
| \c{EXTERN} is similar to the MASM directive \c{EXTRN} and the C |
| keyword \c{extern}: it is used to declare a symbol which is not |
| defined anywhere in the module being assembled, but is assumed to be |
| defined in some other module and needs to be referred to by this |
| one. Not every object-file format can support external variables: |
| the \c{bin} format cannot. |
| |
| The \c{EXTERN} directive takes as many arguments as you like. Each |
| argument is the name of a symbol: |
| |
| \c extern _printf |
| \c extern _sscanf,_fscanf |
| |
| Some object-file formats provide extra features to the \c{EXTERN} |
| directive. In all cases, the extra features are used by suffixing a |
| colon to the symbol name followed by object-format specific text. |
| For example, the \c{obj} format allows you to declare that the |
| default segment base of an external should be the group \c{dgroup} |
| by means of the directive |
| |
| \c extern _variable:wrt dgroup |
| |
| The primitive form of \c{EXTERN} differs from the user-level form |
| only in that it can take only one argument at a time: the support |
| for multiple arguments is implemented at the preprocessor level. |
| |
| You can declare the same variable as \c{EXTERN} more than once: NASM |
| will quietly ignore the second and later redeclarations. |
| |
| If a variable is declared both \c{GLOBAL} and \c{EXTERN}, or if it is |
| declared as \c{EXTERN} and then defined, it will be treated as |
| \c{GLOBAL}. If a variable is declared both as \c{COMMON} and |
| \c{EXTERN}, it will be treated as \c{COMMON}. |
| |
| |
| \H{required} \i\c{REQUIRED}: \i{Importing Symbols} from Other Modules |
| |
| The \c{REQUIRED} keyword is similar to \c{EXTERN}. The difference is |
| that, as of version 2.15, \c{EXTERN} no longer generates unknown |
| symbols for names that are declared but never used. The old behavior |
| is highly undesirable when using common header files, because it might |
| cause the linker to pull in a number of unnecessary modules, depending |
| on how smart the linker is. |
| |
| If the old behavior is required, use the \c{REQUIRED} keyword instead. |
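| |
| As an illustration of the difference (the symbol names are |
| hypothetical, and the comments restate the behavior described above |
| rather than any particular linker's response), a header file shared |
| by several modules might contain: |
| |
| \c         extern   log_message    ; emitted only if actually referenced |
| \c         required init_runtime   ; always emitted as an undefined symbol |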
| |
| |
| \H{global} \i\c{GLOBAL}: \i{Exporting Symbols} to Other Modules |
| |
| \c{GLOBAL} is the other end of \c{EXTERN}: if one module declares a |
| symbol as \c{EXTERN} and refers to it, then in order to prevent |
| linker errors, some other module must actually \e{define} the |
| symbol and declare it as \c{GLOBAL}. Some assemblers use the name |
| \i\c{PUBLIC} for this purpose. |
| |
| \c{GLOBAL} uses the same syntax as \c{EXTERN}, except that it must |
| refer to symbols which \e{are} defined in the same module as the |
| \c{GLOBAL} directive. For example: |
| |
| \c global _main |
| \c _main: |
| \c ; some code |
| |
| \c{GLOBAL}, like \c{EXTERN}, allows object formats to define private |
| extensions by means of a colon. The ELF object format, for example, |
| lets you specify whether global data items are functions or data: |
| |
| \c global hashlookup:function, hashtable:data |
| |
| Like \c{EXTERN}, the primitive form of \c{GLOBAL} differs from the |
| user-level form only in that it can take only one argument at a |
| time. |
| |
| |
| \H{common} \i\c{COMMON}: Defining Common Data Areas |
| |
| The \c{COMMON} directive is used to declare \i\e{common variables}. |
| A common variable is much like a global variable declared in the |
| uninitialized data section, so that |
| |
| \c common intvar 4 |
| |
| is similar in function to |
| |
| \c global intvar |
| \c section .bss |
| \c |
| \c intvar resd 1 |
| |
| The difference is that if more than one module defines the same |
| common variable, then at link time those variables will be |
| \e{merged}, and references to \c{intvar} in all modules will point |
| at the same piece of memory. |
| |
| Like \c{GLOBAL} and \c{EXTERN}, \c{COMMON} supports object-format |
| specific extensions. For example, the \c{obj} format allows common |
| variables to be NEAR or FAR, and the ELF format allows you to specify |
| the alignment requirements of a common variable: |
| |
| \c common commvar 4:near ; works in OBJ |
| \c common intarray 100:4 ; works in ELF: 4 byte aligned |
| |
| Once again, like \c{EXTERN} and \c{GLOBAL}, the primitive form of |
| \c{COMMON} differs from the user-level form only in that it can take |
| only one argument at a time. |
| |
| \H{static} \i\c{STATIC}: Local Symbols within Modules |
| |
| In contrast to \c{EXTERN} and \c{GLOBAL}, \c{STATIC} declares a local |
| symbol, but one which should be named according to the global mangling |
| rules (it is named by analogy with the C keyword \c{static} as applied |
| to functions or global variables). |
| |
| \c static foo |
| \c foo: |
| \c ; some code |
| |
| Unlike \c{GLOBAL}, \c{STATIC} does not allow object formats to accept |
| private extensions mentioned in \k{global}. |
| |
| \H{mangling} \i\c{(G|L)PREFIX}, \i\c{(G|L)POSTFIX}: Mangling Symbols |
| |
| The \c{PREFIX}, \c{GPREFIX}, \c{LPREFIX}, \c{POSTFIX}, \c{GPOSTFIX}, and |
| \c{LPOSTFIX} directives can prepend or append a string to a certain |
| type of symbol, normally to fit specific ABI conventions. |
| |
| \b\c{PREFIX}|\c{GPREFIX}: Prepend the argument to all \c{EXTERN}, |
| \c{COMMON}, \c{STATIC}, and \c{GLOBAL} symbols. |
| |
| \b\c{LPREFIX}: Prepend the argument to all other symbols |
| such as local labels and backend defined symbols. |
| |
| \b\c{POSTFIX}|\c{GPOSTFIX}: Append the argument to all \c{EXTERN}, |
| \c{COMMON}, \c{STATIC}, and \c{GLOBAL} symbols. |
| |
| \b\c{LPOSTFIX}: Append the argument to all other symbols |
| such as local labels and backend defined symbols. |
| |
| These are macros implemented as pragmas, and using \c{%pragma} syntax |
| they can be restricted to specific backends (see \k{pragma}): |
| |
| \c %pragma macho lprefix L_ |
| |
| Command line options are also available. See also \k{opt-pfix}. |
| |
| One example which supports many ABIs: |
| |
| \c ; The most common conventions |
| \c %pragma output gprefix _ |
| \c %pragma output lprefix L_ |
| \c ; ELF uses a different convention |
| \c %pragma elf gprefix ; empty |
| \c %pragma elf lprefix .L |
| |
| Some toolchains are aware of a particular prefix for their own optimization |
| options, such as code elimination. For instance, the Mach-O backend has a |
| linker that uses a simplistic naming scheme to chunk up sections into a |
| meta section. When the \c{subsections_via_symbols} directive |
| (\k{macho-ssvs}) is declared, each symbol is the start of a |
| separate block. The meta section is, then, defined to include sections |
| before the one that starts with an 'L'. \c{LPREFIX} is useful here to mark |
| all local symbols with the 'L' prefix so that they are excluded from the |
| meta section. This makes local symbols compatible with the particular |
| toolchain. Note that local symbols declared with \c{STATIC} (\k{static}) |
| are excluded from the symbol mangling and are also not marked as global. |
| |
| |
| \H{CPU} \i\c{CPU}: Defining CPU Dependencies |
| |
| The \i\c{CPU} directive restricts assembly to those instructions which |
| are available on the specified CPU. |
| |
| Options are: |
| |
| \b\c{CPU 8086} Assemble only 8086 instruction set |
| |
| \b\c{CPU 186} Assemble instructions up to the 80186 instruction set |
| |
| \b\c{CPU 286} Assemble instructions up to the 286 instruction set |
| |
| \b\c{CPU 386} Assemble instructions up to the 386 instruction set |
| |
| \b\c{CPU 486} 486 instruction set |
| |
| \b\c{CPU 586} Pentium instruction set |
| |
| \b\c{CPU PENTIUM} Same as 586 |
| |
| \b\c{CPU 686} P6 instruction set |
| |
| \b\c{CPU PPRO} Same as 686 |
| |
| \b\c{CPU P2} Same as 686 |
| |
| \b\c{CPU P3} Pentium III (Katmai) instruction set |
| |
| \b\c{CPU KATMAI} Same as P3 |
| |
| \b\c{CPU P4} Pentium 4 (Willamette) instruction set |
| |
| \b\c{CPU WILLAMETTE} Same as P4 |
| |
| \b\c{CPU PRESCOTT} Prescott instruction set |
| |
| \b\c{CPU X64} x86-64 (x64/AMD64/Intel 64) instruction set |
| |
| \b\c{CPU IA64} IA64 CPU (in x86 mode) instruction set |
| |
| All options are case insensitive. An instruction is accepted only if |
| it is available on the selected CPU or an earlier one. By default, all |
| instructions are available. |
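| |
| As a brief sketch of the effect, the following fragment restricts |
| assembly to the 8086 and later raises the limit to the 386. The |
| commented-out line uses a shift by an immediate count other than 1, |
| a form introduced with the 186, and so would be rejected while |
| \c{CPU 8086} is in force: |
| |
| \c         cpu     8086 |
| \c         shl     ax,1            ; available on the 8086 |
| \c ;       shl     ax,4            ; needs a 186 or later |
| \c |
| \c         cpu     386 |
| \c         shl     ax,4            ; now accepted |
| \c         movzx   bx,al           ; 386 instruction |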
| |
| |
| \H{FLOAT} \i\c{FLOAT}: Handling of \I{floating-point, constants}floating-point constants |
| |
| By default, floating-point constants are rounded to nearest, and IEEE |
| denormals are supported. The following options can be set to alter |
| this behaviour: |
| |
| \b\c{FLOAT DAZ} Flush denormals to zero |
| |
| \b\c{FLOAT NODAZ} Do not flush denormals to zero (default) |
| |
| \b\c{FLOAT NEAR} Round to nearest (default) |
| |
| \b\c{FLOAT UP} Round up (toward +Infinity) |
| |
| \b\c{FLOAT DOWN} Round down (toward -Infinity) |
| |
| \b\c{FLOAT ZERO} Round toward zero |
| |
| \b\c{FLOAT DEFAULT} Restore default settings |
| |
| The standard macros \i\c{__?FLOAT_DAZ?__}, \i\c{__?FLOAT_ROUND?__}, and |
| \i\c{__?FLOAT?__} contain the current state, as long as the programmer |
| has avoided the use of the bracketed primitive form (\c{[FLOAT]}). |
| |
| \c{__?FLOAT?__} contains the full set of floating-point settings; this |
| value can be saved away and invoked later to restore the setting. |
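| |
| For example, a module could round one table of constants toward |
| zero and then return to the defaults. This is only a sketch, and |
| the label name is illustrative: |
| |
| \c         float   zero            ; round toward zero |
| \c coeffs  dd      0.1, 0.2, 0.3 |
| \c |
| \c         float   default         ; round to nearest, denormals kept |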
| |
| |
| \H{asmdir-warning} \i\c{[WARNING]}: Enable or disable warnings |
| |
| The \c{[WARNING]} directive can be used to enable or disable classes |
| of warnings in the same way as the \c{-w} option, see \k{opt-w} for |
| more details about warning classes. |
| |
| \b \c{[warning +}\e{warning-class}\c{]} enables warnings for |
| \e{warning-class}. |
| |
| \b \c{[warning -}\e{warning-class}\c{]} disables warnings for |
| \e{warning-class}. |
| |
| \b \c{[warning *}\e{warning-class}\c{]} restores \e{warning-class} to |
| the original value, either the default value or as specified on the |
| command line. |
| |
| \b \c{[warning push]} saves the current warning state on a stack. |
| |
| \b \c{[warning pop]} restores the current warning state from the stack. |
| |
| The \c{[WARNING]} directive also accepts the \c{all}, \c{error} and |
| \c{error=}\e{warning-class} specifiers. |
| |
| No "user form" (without the brackets) currently exists. |
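| |
| A common use is to silence one warning class around a short stretch |
| of code and then restore the previous state. In this sketch the |
| class \c{number-overflow} is chosen purely as an illustration; see |
| \k{opt-w} for the classes actually available: |
| |
| \c         [warning push]              ; remember the current state |
| \c         [warning -number-overflow]  ; disable one class |
| \c |
| \c         ; code that would otherwise trigger the warning |
| \c |
| \c         [warning pop]               ; back to the saved state |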
| |
| |
| \C{outfmt} \i{Output Formats} |
| |
| NASM is a portable assembler, designed to be able to compile on any |
| ANSI C-supporting platform and produce output to run on a variety of |
| Intel x86 operating systems. For this reason, it has a large number |
| of available output formats, selected using the \i\c{-f} option on |
| the NASM \i{command line}. Each of these formats, along with its |
| extensions to the base NASM syntax, is detailed in this chapter. |
| |
| As stated in \k{opt-o}, NASM chooses a \i{default name} for your |
| output file based on the input file name and the chosen output |
| format. This will be generated by removing the \i{extension} |
| (\c{.asm}, \c{.s}, or whatever you like to use) from the input file |
| name, and substituting an extension defined by the output format. |
| The extensions are given with each format below. |
| |
| |
| \H{binfmt} \i\c{bin}: \i{Flat-Form Binary}\I{pure binary} Output |
| |
| The \c{bin} format does not produce object files: it generates |
| nothing in the output file except the code you wrote. Such `pure |
| binary' files are used by \i{MS-DOS}: \i\c{.COM} executables and |
| \i\c{.SYS} device drivers are pure binary files. Pure binary output |
| is also useful for \i{operating system} and \i{boot loader} |
| development. |
| |
| The \c{bin} format supports \i{multiple section names}. For details of |
| how NASM handles sections in the \c{bin} format, see \k{multisec}. |
| |
| Using the \c{bin} format puts NASM by default into 16-bit mode (see |
| \k{bits}). In order to use \c{bin} to write 32-bit or 64-bit code, |
| such as an OS kernel, you need to explicitly issue the \I\c{BITS}\c{BITS 32} |
| or \I\c{BITS}\c{BITS 64} directive. |
| |
| \c{bin} has no default output file name extension: instead, it |
| leaves your file name as it is once the original extension has been |
| removed. Thus, the default is for NASM to assemble \c{binprog.asm} |
| into a binary file called \c{binprog}. |
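| |
| As a minimal sketch, a flat 32-bit image might begin as follows. The |
| load address and the label are assumptions made for the example, and |
| the \c{ORG} directive is described in \k{org}: |
| |
| \c         bits    32              ; bin would otherwise default to 16-bit |
| \c         org     0x100000        ; assumed load address |
| \c |
| \c entry:  mov     eax,entry       ; 32-bit immediate load of the address |
| \c         hlt |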
| |
| |
| \S{org} \i\c{ORG}: Binary File \i{Program Origin} |
| |
| The \c{bin} format provides an additional directive to the list |
| given in \k{directive}: \c{ORG}. The function of the \c{ORG} |
| directive is to specify the origin address which NASM will assume |
| the program begins at when it is loaded into memory. |
| |
| For example, the following code will generate the longword |
| \c{0x00000104}: |
| |
| \c org 0x100 |
| \c dd label |
| \c label: |
| |
| Unlike the \c{ORG} directive provided by MASM-compatible assemblers, |
| which allows you to jump around in the object file and overwrite |
| code you have already generated, NASM's \c{ORG} does exactly what |
| the directive says: \e{origin}. Its sole function is to specify one |
| offset which is added to all internal address references within the |
| section; it does not permit any of the trickery that MASM's version |
| does. See \k{proborg} for further comments. |
| |
| |
| \S{binseg} \c{bin} Extensions to the \c{SECTION} |
| Directive\I{SECTION, bin extensions to} |
| |
| The \c{bin} output format extends the \c{SECTION} (or \c{SEGMENT}) |
| directive to allow you to specify the alignment requirements of |
| segments. This is done by appending the \i\c{ALIGN} qualifier to the |
| end of the section-definition line. For example, |
| |
| \c section .data align=16 |
| |
| switches to the section \c{.data} and also specifies that it must be |
| aligned on a 16-byte boundary. |
| |
| The parameter to \c{ALIGN} is the alignment boundary in bytes: the |
| section start address is forced to a multiple of this value, so that |
| its low bits are zero. The alignment value |
| given may be any power of two.\I{section alignment, in |
| bin}\I{segment alignment, in bin}\I{alignment, in bin sections} |
| |
| |
| \S{multisec} \i{Multisection}\I{bin, multisection} Support for the \c{bin} Format |
| |
| The \c{bin} format allows the use of multiple sections, of arbitrary names, |
| besides the "known" \c{.text}, \c{.data}, and \c{.bss} names. |
| |
| \b Sections may be designated \i\c{progbits} or \i\c{nobits}. Default |
| is \c{progbits} (except \c{.bss}, which defaults to \c{nobits}, |
| of course). |
| |
| \b Sections can be aligned at a specified boundary following the previous |
| section with \c{align=}, or at an arbitrary byte-granular position with |
| \i\c{start=}. |
| |
| \b Sections can be given a virtual start address, which will be used |
| for the calculation of all memory references within that section |
| with \i\c{vstart=}. |
| |
| \b Sections can be ordered using \i\c{follows=}\c{<section>} or |
| \i\c{vfollows=}\c{<section>} as an alternative to specifying an explicit |
| start address. |
| |
| \b Arguments to \c{org}, \c{start}, \c{vstart}, and \c{align=} are |
| critical expressions. See \k{crit}. For example, in |
| \c{align=(1 << ALIGN_SHIFT)}, \c{ALIGN_SHIFT} must be defined before it is used. |
| |
| \b Any code which comes before an explicit \c{SECTION} directive |
| is directed by default into the \c{.text} section. |
| |
| \b If an \c{ORG} statement is not given, \c{ORG 0} is used |
| by default. |
| |
| \b The \c{.bss} section will be placed after the last \c{progbits} |
| section, unless \c{start=}, \c{vstart=}, \c{follows=}, or \c{vfollows=} |
| has been specified. |
| |
| \b All sections are aligned on dword boundaries, unless a different |
| alignment has been specified. |
| |
| \b Sections may not overlap. |
| |
| \b NASM creates the symbol \c{section.<secname>.start} for each section, |
| which may be used in your code. |
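| |
| Putting several of these rules together, a multi-section source |
| file might be sketched as follows. The section name \c{tables} and |
| the labels are illustrative only: |
| |
| \c         org     0x1000 |
| \c |
| \c         section .text                   ; progbits by default |
| \c |
| \c start:  mov     bx,table |
| \c         mov     di,buffer |
| \c |
| \c         section tables progbits follows=.text align=16 |
| \c |
| \c table:  dw      1, 2, 3, 4 |
| \c |
| \c         section .bss                    ; nobits by default |
| \c |
| \c buffer: resb    64 |
| \c |
| \c         section .text                   ; resume the code section |
| \c |
| \c         mov     ax,section.tables.start ; symbol created by NASM |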
| |
| \S{map}\i{Map Files} |
| |
| Map files can be generated in \c{-f bin} format by means of the \c{[map]} |
| option. Map types of \c{all} (default), \c{brief}, \c{sections}, \c{segments}, |
| or \c{symbols} may be specified. Output may be directed to \c{stdout} |
| (default), \c{stderr}, or a specified file. E.g. |
| \c{[map symbols myfile.map]}. No "user form" exists; the square |
| brackets must be used. |
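| |
| For instance, the following line, placed in a source file assembled |
| with \c{-f bin}, requests a brief map on standard error (the choice |
| of map type and destination is arbitrary here): |
| |
| \c         [map brief stderr] |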
| |
| |
| \H{ithfmt} \i\c{ith}: \i{Intel Hex} Output |
| |
| The \c{ith} file format produces Intel hex-format files. Like the |
| \c{bin} format, this is a flat memory image format with no support for |
| relocation or linking. It is usually used with ROM programmers and |
| similar utilities. |
| |
| All extensions supported by the \c{bin} file format are also supported by |
| the \c{ith} file format. |
| |
| \c{ith} provides a default output file-name extension of \c{.ith}. |
| |
| |
| \H{srecfmt} \i\c{srec}: \i{Motorola S-Records} Output |
| |
| The \c{srec} file format produces Motorola S-records files. Like the |
| \c{bin} format, this is a flat memory image format with no support for |
| relocation or linking. It is usually used with ROM programmers and |
| similar utilities. |
| |
| All extensions supported by the \c{bin} file format are also supported by |
| the \c{srec} file format. |
| |
| \c{srec} provides a default output file-name extension of \c{.srec}. |
| |
| |
| \H{objfmt} \i\c{obj}: \i{Microsoft OMF}\I{OMF} Object Files |
| |
| The \c{obj} file format (NASM calls it \c{obj} rather than \c{omf} |
| for historical reasons) is the one produced by \i{MASM} and |
| \i{TASM}, which is typically fed to 16-bit DOS linkers to produce |
| \i\c{.EXE} files. It is also the format used by \i{OS/2}. |
| |
| \c{obj} provides a default output file-name extension of \c{.obj}. |
| |
| \c{obj} is not exclusively a 16-bit format, though: NASM has full |
| support for the 32-bit extensions to the format. In particular, |
| 32-bit \c{obj} format files are used by \i{Borland's Win32 |
| compilers}, instead of using Microsoft's newer \i\c{win32} object |
| file format. |
| |
| The \c{obj} format does not define any special segment names: you |
| can call your segments anything you like. Typical names for segments |
| in \c{obj} format files are \c{CODE}, \c{DATA} and \c{BSS}. |
| |
| If your source file contains code before specifying an explicit |
| \c{SEGMENT} directive, then NASM will invent its own segment called |
| \i\c{__NASMDEFSEG} for you. |
| |
| When you define a segment in an \c{obj} file, NASM defines the |
| segment name as a symbol as well, so that you can access the segment |
| address of the segment. So, for example: |
| |
| \c segment data |
| \c |
| \c dvar: dw 1234 |
| \c |
| \c segment code |
| \c |
| \c function: |
| \c mov ax,data ; get segment address of data |
| \c mov ds,ax ; and move it into DS |
| \c inc word [dvar] ; now this reference will work |
| \c ret |
| |
| The \c{obj} format also enables the use of the \i\c{SEG} and |
| \i\c{WRT} operators, so that you can write code which does things |
| like |
| |
| \c extern foo |
| \c |
| \c mov ax,seg foo ; get preferred segment of foo |
| \c mov ds,ax |
| \c mov ax,data ; a different segment |
| \c mov es,ax |
| \c mov ax,[ds:foo] ; this accesses `foo' |
| \c mov [es:foo wrt data],bx ; so does this |
| |
| |
| \S{objseg} \c{obj} Extensions to the \c{SEGMENT} |
| Directive\I{SEGMENT, obj extensions to} |
| |
| The \c{obj} output format extends the \c{SEGMENT} (or \c{SECTION}) |
| directive to allow you to specify various properties of the segment |
| you are defining. This is done by appending extra qualifiers to the |
| end of the segment-definition line. For example, |
| |
| \c segment code private align=16 |
| |
| defines the segment \c{code}, but also declares it to be a private |
| segment, and requires that the portion of it described in this code |
| module must be aligned on a 16-byte boundary. |
| |
| The available qualifiers are: |
| |
| \b \i\c{PRIVATE}, \i\c{PUBLIC}, \i\c{COMMON} and \i\c{STACK} specify |
| the combination characteristics of the segment. \c{PRIVATE} segments |
| do not get combined with any others by the linker; \c{PUBLIC} and |
| \c{STACK} segments get concatenated together at link time; and |
| \c{COMMON} segments all get overlaid on top of each other rather |
| than stuck end-to-end. |
| |
| \b \i\c{ALIGN} is used, as shown above, to specify the boundary, in |
| bytes, to which the segment start address must be aligned. The alignment |
| value given may be any power of two from 1 to 4096; in reality, the |
| only values supported are 1, 2, 4, 16, 256 and 4096, so if 8 is |
| specified it will be rounded up to 16, and 32, 64 and 128 will all |
| be rounded up to 256, and so on. Note that alignment to 4096-byte |
| boundaries is a \i{PharLap} extension to the format and may not be |
| supported by all linkers.\I{section alignment, in OBJ}\I{segment |
| alignment, in OBJ}\I{alignment, in OBJ sections} |
| |
| \b \i\c{CLASS} can be used to specify the segment class; this feature |
| indicates to the linker that segments of the same class should be |
| placed near each other in the output file. The class name can be any |
| word, e.g. \c{CLASS=CODE}. |
| |
| \b \i\c{OVERLAY}, like \c{CLASS}, is specified with an arbitrary word |
| as an argument, and provides overlay information to an |
| overlay-capable linker. |
| |
| \b Segments can be declared as \i\c{USE16} or \i\c{USE32}, which has |
| the effect of recording the choice in the object file and also |
| ensuring that NASM's default assembly mode when assembling in that |
| segment is 16-bit or 32-bit respectively. |
| |
| \b When writing \i{OS/2} object files, you should declare 32-bit |
| segments as \i\c{FLAT}, which causes the default segment base for |
| anything in the segment to be the special group \c{FLAT}, and also |
| defines the group if it is not already defined. |
| |
| \b The \c{obj} file format also allows segments to be declared as |
| having a pre-defined absolute segment address, although no linkers |
| are currently known to make sensible use of this feature; |
| nevertheless, NASM allows you to declare a segment such as |
| \c{SEGMENT SCREEN ABSOLUTE=0xB800} if you need to. The \i\c{ABSOLUTE} |
| and \c{ALIGN} keywords are mutually exclusive. |
| |
| NASM's default segment attributes are \c{PUBLIC}, \c{ALIGN=1}, no |
| class, no overlay, and \c{USE16}. |
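| |
| Several qualifiers may be combined on one line. As a sketch, with |
| segment and class names chosen only for illustration: |
| |
| \c         segment code public align=16 class=CODE use16 |
| \c         segment data public align=2 class=DATA |
| \c         segment stack stack align=16 class=STACK |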
| |
| |
| \S{group} \i\c{GROUP}: Defining Groups of Segments\I{segments, groups of} |
| |
| The \c{obj} format also allows segments to be grouped, so that a |
| single segment register can be used to refer to all the segments in |
| a group. NASM therefore supplies the \c{GROUP} directive, whereby |
| you can code |
| |
| \c segment data |
| \c |
| \c ; some data |
| \c |
| \c segment bss |
| \c |
| \c ; some uninitialized data |
| \c |
| \c group dgroup data bss |
| |
| which will define a group called \c{dgroup} to contain the segments |
| \c{data} and \c{bss}. Like \c{SEGMENT}, \c{GROUP} causes the group |
| name to be defined as a symbol, so that you can refer to a variable |
| \c{var} in the \c{data} segment as \c{var wrt data} or as \c{var wrt |
| dgroup}, depending on which segment value is currently in your |
| segment register. |
| |
| If you just refer to \c{var}, however, and \c{var} is declared in a |
| segment which is part of a group, then NASM will default to giving |
| you the offset of \c{var} from the beginning of the \e{group}, not |
| the \e{segment}. Therefore \c{SEG var}, also, will return the group |
| base rather than the segment base. |
| |
| NASM will allow a segment to be part of more than one group, but |
| will generate a warning if you do this. Variables declared in a |
| segment which is part of more than one group will default to being |
| relative to the first group that was defined to contain the segment. |
| |
| A group does not have to contain any segments; you can still make |
| \c{WRT} references to a group which does not contain the variable |
| you are referring to. OS/2, for example, defines the special group |
| \c{FLAT} with no segments in it. |
| |
| |
| \S{uppercase} \i\c{UPPERCASE}: Disabling Case Sensitivity in Output |
| |
| Although NASM itself is \i{case sensitive}, some OMF linkers are |
| not; therefore it can be useful for NASM to output single-case |
| object files. The \c{UPPERCASE} format-specific directive causes all |
| segment, group and symbol names that are written to the object file |
| to be forced to upper case just before being written. Within a |
| source file, NASM is still case-sensitive; but the object file can |
| be written entirely in upper case if desired. |
| |
| \c{UPPERCASE} is used alone on a line; it requires no parameters. |
| |
| |
| \S{import} \i\c{IMPORT}: Importing DLL Symbols\I{DLL symbols, |
| importing}\I{symbols, importing from DLLs} |
| |
| The \c{IMPORT} format-specific directive defines a symbol to be |
| imported from a DLL, for use if you are writing a DLL's \i{import |
| library} in NASM. You still need to declare the symbol as \c{EXTERN} |
| as well as using the \c{IMPORT} directive. |
| |
| The \c{IMPORT} directive takes two required parameters, separated by |
| white space, which are (respectively) the name of the symbol you |
| wish to import and the name of the library you wish to import it |
| from. For example: |
| |
| \c import WSAStartup wsock32.dll |
| |
| A third optional parameter gives the name by which the symbol is |
| known in the library you are importing it from, in case this is not |
| the same as the name you wish the symbol to be known by to your code |
| once you have imported it. For example: |
| |
| \c import asyncsel wsock32.dll WSAAsyncSelect |
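| |
| Since the symbol must also be declared \c{EXTERN}, a complete |
| declaration pairs the two directives. A minimal sketch, reusing the |
| names from the examples above: |
| |
| \c         extern  WSAStartup |
| \c         import  WSAStartup wsock32.dll |
| \c |
| \c         extern  asyncsel |
| \c         import  asyncsel wsock32.dll WSAAsyncSelect |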
| |
| |
| \S{export} \i\c{EXPORT}: Exporting DLL Symbols\I{DLL symbols, |
| exporting}\I{symbols, exporting from DLLs} |
| |
| The \c{EXPORT} format-specific directive defines a global symbol to |
| be exported as a DLL symbol, for use if you are writing a DLL in |
| NASM. You still need to declare the symbol as \c{GLOBAL} as well as |
| using the \c{EXPORT} directive. |
| |
| \c{EXPORT} takes one required parameter, which is the name of the |
| symbol you wish to export, as it was defined in your source file. An |
| optional second parameter (separated by white space from the first) |
| gives the \e{external} name of the symbol: the name by which you |
| wish the symbol to be known to programs using the DLL. If this name |
| is the same as the internal name, you may leave the second parameter |
| off. |
| |
| Further parameters can be given to define attributes of the exported |
| symbol. These parameters, like the second, are separated by white |
| space. If further parameters are given, the external name must also |
| be specified, even if it is the same as the internal name. The |
| available attributes are: |
| |
| \b \c{resident} indicates that the exported name is to be kept |
| resident by the system loader. This is an optimisation for |
| frequently used symbols imported by name. |
| |
| \b \c{nodata} indicates that the exported symbol is a function which |
| does not make use of any initialized data. |
| |
| \b \c{parm=NNN}, where \c{NNN} is an integer, sets the number of |
| parameter words for the case in which the symbol is a call gate |
| between 32-bit and 16-bit segments. |
| |
| \b An attribute which is just a number indicates that the symbol |
| should be exported with an identifying number (ordinal), and gives |
| the desired number. |
| |
| For example: |
| |
| \c export myfunc |
| \c export myfunc TheRealMoreFormalLookingFunctionName |
| \c export myfunc myfunc 1234 ; export by ordinal |
| \c export myfunc myfunc resident parm=23 nodata |
| |
| |
| \S{dotdotstart} \i\c{..start}: Defining the \i{Program Entry |
| Point} |
| |
| \c{OMF} linkers require exactly one of the object files being linked to |
| define the program entry point, where execution will begin when the |
| program is run. If the object file that defines the entry point is |
| assembled using NASM, you specify the entry point by declaring the |
| special symbol \c{..start} at the point where you wish execution to |
| begin. |
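| |
| A minimal sketch of a program entry point follows. The segment and |
| label names are illustrative, and the exit call assumes the program |
| runs under MS-DOS: |
| |
| \c         segment data |
| \c |
| \c value   dw      0 |
| \c |
| \c         segment code |
| \c |
| \c ..start: |
| \c         mov     ax,data         ; load the data segment address |
| \c         mov     ds,ax |
| \c         inc     word [value] |
| \c         mov     ax,0x4C00       ; DOS: terminate with exit code 0 |
| \c         int     0x21 |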
| |
| |
| \S{objextern} \c{obj} Extensions to the \c{EXTERN} |
| Directive\I{EXTERN, obj extensions to} |
| |
| If you declare an external symbol with the directive |
| |
| \c extern foo |
| |
| then references such as \c{mov ax,foo} will give you the offset of |
| \c{foo} from its preferred segment base (as specified in whichever |
| module \c{foo} is actually defined in). So to access the contents of |
| \c{foo} you will usually need to do something like |
| |
| \c mov ax,seg foo ; get preferred segment base |
| \c mov es,ax ; move it into ES |
| \c mov ax,[es:foo] ; and use offset `foo' from it |
| |
| This is a little unwieldy, particularly if you know that an external |
| is going to be accessible from a given segment or group, say |
| \c{dgroup}. So if \c{DS} already contained \c{dgroup}, you could |
| simply code |
| |
| \c mov ax,[foo wrt dgroup] |
| |
| However, having to type this every time you want to access \c{foo} |
| can be a pain; so NASM allows you to declare \c{foo} in the |
| alternative form |
| |
| \c extern foo:wrt dgroup |
| |
| This form causes NASM to pretend that the preferred segment base of |
| \c{foo} is in fact \c{dgroup}; so the expression \c{seg foo} will |
| now return \c{dgroup}, and the expression \c{foo} is equivalent to |
| \c{foo wrt dgroup}. |
| |
| This \I{default-WRT mechanism}default-\c{WRT} mechanism can be used |
| to make externals appear to be relative to any group or segment in |
| your program. It can also be applied to common variables: see |
| \k{objcommon}. |
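| |
| The mechanism above can be sketched as follows. The external name |
| \c{counter} and the segment names are hypothetical, and the code |
| assumes that \c{DS} already holds \c{dgroup}: |
| |
| \c         segment data |
| \c         group   dgroup data |
| \c |
| \c         extern  counter:wrt dgroup |
| \c |
| \c         segment code |
| \c         mov     ax,[counter]    ; treated as [counter wrt dgroup] |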
| |
| |
| \S{objcommon} \c{obj} Extensions to the \c{COMMON} |
| Directive\I{COMMON, obj extensions to} |
| |
| The \c{obj} format allows common variables to be either near\I{near |
| common variables} or far\I{far common variables}; NASM allows you to |
| specify which your variables should be by the use of the syntax |
| |
| \c common nearvar 2:near ; `nearvar' is a near common |
| \c common farvar 10:far ; and `farvar' is far |
| |
| Far common variables may be greater in size than 64KB, and so the |
| OMF specification says that they are declared as a number of |
| \e{elements} of a given size. So a 10-byte far common variable could |
| be declared as ten one-byte elements, five two-byte elements, two |
| five-byte elements or one ten-byte element. |
| |
| Some \c{OMF} linkers require the \I{element size, in common |
| variables}\I{common variables, element size}element size, as well as |
| the variable size, to match when resolving common variables declared |
| in more than one module. Therefore NASM must allow you to specify |
| the element size on your far common variables. This is done by the |
| following syntax: |
| |
| \c common c_5by2 10:far 5 ; two five-byte elements |
| \c common c_2by5 10:far 2 ; five two-byte elements |
| |
| If no element size is specified, the default is 1. Also, the \c{FAR} |
| keyword is not required when an element size is specified, since |
| only far commons may have element sizes at all. So the above |
| declarations could equivalently be |
| |
| \c common c_5by2 10:5 ; two five-byte elements |
| \c common c_2by5 10:2 ; five two-byte elements |
| |
| In addition to these extensions, the \c{COMMON} directive in \c{obj} |
| also supports default-\c{WRT} specification like \c{EXTERN} does |
| (explained in \k{objextern}). So you can also declare things like |
| |
| \c common foo 10:wrt dgroup |
| \c common bar 16:far 2:wrt data |
| \c common baz 24:wrt data:6 |
| |
| |
| \S{objdepend} Embedded File Dependency Information |
| |
| Since NASM 2.13.02, \c{obj} files contain embedded dependency file |
| information. To suppress the generation of dependencies, use |
| |
| \c %pragma obj nodepend |
| |
| |
| \H{win32fmt} \i\c{win32}: Microsoft Win32 Object Files |
| |
| The \c{win32} output format generates Microsoft Win32 object files, |
| suitable for passing to Microsoft linkers such as \i{Visual C++}. |
| Note that Borland Win32 compilers do not use this format, but use |
| \c{obj} instead (see \k{objfmt}). |
| |
| \c{win32} provides a default output file-name extension of \c{.obj}. |
| |
| Note that although Microsoft say that Win32 object files follow the |
| \c{COFF} (Common Object File Format) standard, the object files produced |
| by Microsoft Win32 compilers are not compatible with COFF linkers |
| such as DJGPP's, and vice versa. This is due to a difference of |
| opinion over the precise semantics of PC-relative relocations. To |
| produce COFF files suitable for DJGPP, use NASM's \c{coff} output |
| format; conversely, the \c{coff} format does not produce object |
| files that Win32 linkers can generate correct output from. |
| |
| |
| \S{win32sect} \c{win32} Extensions to the \c{SECTION} |
| Directive\I{SECTION, win32 extensions to} |
| |
| Like the \c{obj} format, \c{win32} allows you to specify additional |
| information on the \c{SECTION} directive line, to control the type |
| and properties of sections you declare. Section types and properties |
| are generated automatically by NASM for the \i{standard section names} |
| \c{.text}, \c{.data} and \c{.bss}, but may still be overridden by |
| these qualifiers. |
| |
| The available qualifiers are: |
| |
| \b \c{code}, or equivalently \c{text}, defines the section to be a |
| code section. This marks the section as readable and executable, but |
| not writable, and also indicates to the linker that the type of the |
| section is code. |
| |
| \b \c{data} and \c{bss} define the section to be a data section, |
| analogously to \c{code}. Data sections are marked as readable and |
| writable, but not executable. \c{data} declares an initialized data |
| section, whereas \c{bss} declares an uninitialized data section. |
| |
| \b \c{rdata} declares an initialized data section that is readable |
| but not writable. Microsoft compilers place constants in this |
| section. |
| |
| \b \c{info} defines the section to be an \i{informational section}, |
| which is not included in the executable file by the linker, but may |
| (for example) pass information \e{to} the linker. For example, |
| declaring an \c{info}-type section called \i\c{.drectve} causes the |
| linker to interpret the contents of the section as command-line |
| options. |
| |
| \b \c{align=}, used with a trailing number as in \c{obj}, gives the |
| \I{section alignment, in win32}\I{alignment, in win32 |
| sections}alignment requirements of the section. The maximum you may |
| specify is 64: the Win32 object file format contains no means to |
| request a greater section alignment than this. If alignment is not |
| explicitly specified, the defaults are 16-byte alignment for code |
| sections, 8-byte alignment for rdata sections and 4-byte alignment |
| for data (and BSS) sections. |
| Informational sections get a default alignment of 1 byte (no |
| alignment), though the value does not matter. |
| |
| The defaults assumed by NASM if you do not specify the above |
| qualifiers are: |
| |
| \c section .text code align=16 |
| \c section .data data align=4 |
| \c section .rdata rdata align=8 |
| \c section .bss bss align=4 |
| |
| Any other section name is treated by default like \c{.text}. |
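| |
| Since an unknown section name defaults to code, a non-standard section |
| usually needs explicit qualifiers. A short sketch, using the |
| hypothetical section names \c{.consts} and \c{.scratch}: |
| |
| \c         section .consts rdata align=16  ; read-only data, 16-byte aligned |
| \c         section .scratch bss align=64   ; uninitialized, maximum alignment |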
| |
| \S{win32safeseh} \c{win32}: Safe Structured Exception Handling |
| |
| Among other improvements in Windows XP SP2 and Windows Server 2003, |
| Microsoft introduced the concept of "safe structured exception |
| handling." The general idea is to collect handlers' entry points in a |
| designated read-only table and to verify an alleged entry point |
| against this table before exception control is passed to the handler. |
| For an executable module to be equipped with such a "safe exception |
| handler table," all object modules on the linker command line have to |
| comply with certain criteria. If a single module among them does not, |
| the table is omitted and the run-time checks mentioned above are not |
| performed for that application. Table omission is silent by default |
| and can therefore be easily overlooked. You can instruct the linker to |
| refuse to produce a binary without such a table by passing the |
| \c{/safeseh} command line option. |
| |
| Whatever the merits of this run-time check, it is natural to expect |
| NASM to be capable of generating modules suitable for \c{/safeseh} |
| linking. From the developer's viewpoint the problem is two-fold: |
| |
| \b how to adapt modules that do not deploy exception handlers of their own; |
| |
| \b how to adapt or develop modules that use custom exception handling. |
| |
| The former can be achieved with any NASM version by adding the following |
| line to the source code: |
| |
| \c $@feat.00 equ 1 |
| |
| As of version 2.03 NASM adds this absolute symbol automatically, |
| unless it is already present. That is, if for whatever reason the |
| developer chooses to assign another value in the source file, this |
| remains perfectly possible. |
| |
| Registering a custom exception handler, on the other hand, requires |
| certain "magic." As of version 2.03 an additional directive, |
| \c{safeseh}, is implemented, which instructs the assembler to produce |
| appropriately formatted input data for the above-mentioned "safe |
| exception handler table." Its typical use would be: |
| |
| \c section .text |
| \c extern _MessageBoxA@16 |
| \c %if __?NASM_VERSION_ID?__ >= 0x02030000 |
| \c safeseh handler ; register handler as "safe handler" |
| \c %endif |
| \c handler: |
| \c push DWORD 1 ; MB_OKCANCEL |
| \c push DWORD caption |
| \c push DWORD text |
| \c push DWORD 0 |
| \c call _MessageBoxA@16 |
| \c sub eax,1 ; incidentally suits as return value |
| \c ; for exception handler |
| \c ret |
| \c global _main |
| \c _main: |
| \c push DWORD handler |
| \c push DWORD [fs:0] |
| \c mov DWORD [fs:0],esp ; engage exception handler |
| \c xor eax,eax |
| \c mov eax,DWORD[eax] ; cause exception |
| \c pop DWORD [fs:0] ; disengage exception handler |
| \c add esp,4 |
| \c ret |
| \c text: db 'OK to rethrow, CANCEL to generate core dump',0 |
| \c caption:db 'SEGV',0 |
| \c |
| \c section .drectve info |
| \c db '/defaultlib:user32.lib /defaultlib:msvcrt.lib ' |
| |
| As you might imagine, it is perfectly possible to produce an .exe binary |
| with a "safe exception handler table" and yet engage an unregistered |
| exception handler. Indeed, a handler is engaged simply by manipulating |
| the \c{[fs:0]} location at run-time, something the linker has no power |
| over. It should be explicitly mentioned that failing to register a |
| handler's entry point with the \c{safeseh} directive has an undesired |
| side effect at run-time. If an exception is raised and an unregistered |
| handler is to be executed, the application is abruptly terminated |
| without any notification whatsoever. One can argue that the system |
| could at least have logged some kind of "non-safe exception handler in |
| x.exe at address n" message in the event log, but no, literally no |
| notification is provided and the user is left with no clue as to what |
| caused the application failure. |
| |
| Finally, all mentions of the linker in this section refer to Microsoft |
| linker version 7.x and later. The presence of the \c{@feat.00} symbol and |
| input data for the "safe exception handler table" causes no backward |
| incompatibilities, and "safeseh" modules generated by NASM 2.03 and |
| later can still be linked by earlier versions or by non-Microsoft linkers. |
| |
| \S{codeview} Debugging formats for Windows |
| \I{Windows debugging formats} |
| |
| The \c{win32} and \c{win64} formats support the Microsoft CodeView |
| debugging format. Currently CodeView version 8 format is supported |
| (\i\c{cv8}), but newer versions of the CodeView debugger should be |
| able to handle this format as well. |
| |
| |
| \H{win64fmt} \i\c{win64}: Microsoft Win64 Object Files |
| |
| The \c{win64} output format generates Microsoft Win64 object files. |
| The format is nearly identical to the \c{win32} object format |
| (\k{win32fmt}), except that it targets 64-bit code and the x86-64 |
| platform. Apart from this difference, it is used in NASM in exactly |
| the same way as the \c{win32} format. |
| |
| \S{win64pic} \c{win64}: Writing Position-Independent Code |
| |
| While \c{REL} takes good care of RIP-relative addressing, there is one |
| aspect that is easy to overlook for a Win64 programmer: indirect |
| references. Consider a switch dispatch table: |
| |
| \c jmp qword [dsptch+rax*8] |
| \c ... |
| \c dsptch: dq case0 |
| \c dq case1 |
| \c ... |
| |
| Even a novice Win64 assembler programmer will soon realize that the code |
| is not 64-bit savvy. Most notably, the linker will refuse to link it with |
| |
| \c 'ADDR32' relocation to '.text' invalid without /LARGEADDRESSAWARE:NO |
| |
| So the programmer will have to split the \c{jmp} instruction as follows: |
| |
| \c lea rbx,[rel dsptch] |
| \c jmp qword [rbx+rax*8] |
| |
| What happens behind the scenes is that the effective address in \c{lea} is |
| encoded relative to the instruction pointer, that is, in a perfectly |
| position-independent manner. But this is only part of the problem! |
| The trouble is that in a .dll context the \c{caseN} relocations will make |
| their way to the final module and might have to be adjusted at .dll load |
| time, specifically when the .dll cannot be loaded at its preferred |
| address. When this occurs, pages with such relocations are rendered |
| private to the current process, which undermines the idea of sharing |
| the .dll. Fortunately, this is trivial to fix: |
| |
| \c lea rbx,[rel dsptch] |
| \c add rbx,[rbx+rax*8] |
| \c jmp rbx |
| \c ... |
| \c dsptch: dq case0-dsptch |
| \c dq case1-dsptch |
| \c ... |
| |
| NASM version 2.03 and later provides another alternative, the \c{wrt |
| ..imagebase} operator, which returns the offset from the base address of |
| the current image, be it an .exe or .dll module, hence the name. For those |
| acquainted with the PE-COFF format, the base address denotes the start of |
| the \c{IMAGE_DOS_HEADER} structure. Here is how to implement a switch with |
| these image-relative references: |
| |
| \c lea rbx,[rel dsptch] |
| \c mov eax,[rbx+rax*4] |
| \c sub rbx,dsptch wrt ..imagebase |
| \c add rbx,rax |
| \c jmp rbx |
| \c ... |
| \c dsptch: dd case0 wrt ..imagebase |
| \c dd case1 wrt ..imagebase |
| |
| One can argue that the operator is redundant. Indeed, the snippet before |
| the last one works just fine with any NASM version and is not even |
| Windows specific. The real reason for implementing \c{wrt ..imagebase} |
| will become apparent in the next section. |
| |
| It should be noted that \c{wrt ..imagebase} is defined as a 32-bit |
| operand only: |
| |
| \c dd label wrt ..imagebase ; ok |
| \c dq label wrt ..imagebase ; bad |
| \c mov eax,label wrt ..imagebase ; ok |
| \c mov rax,label wrt ..imagebase ; bad |
| |
| \S{win64seh} \c{win64}: Structured Exception Handling |
| |
| Structured exception handling in Win64 is a completely different matter |
| from Win32. Upon an exception, the program counter value is noted, and a |
| linker-generated table comprising the start and end addresses of all the |
| functions [in the given executable module] is traversed and compared to |
| the saved program counter. In this way the so-called \c{UNWIND_INFO} |
| structure is identified. If it is not found, the offending subroutine is |
| assumed to be a "leaf" and the lookup procedure just described is |
| attempted for its caller. In Win64 a leaf function is a function that |
| neither calls any other function \e{nor} modifies any Win64 non-volatile |
| register, including the stack pointer. The latter ensures that it is |
| possible to identify a leaf function's caller by simply pulling the value |
| from the top of the stack. |
| |
| While the majority of subroutines written in assembler do not call any |
| other function, the requirement that non-volatile registers remain |
| unmodified leaves the developer with no more than 7 registers and no |
| stack frame, which is not necessarily what [s]he counted on. Customarily |
| one would meet the requirement by saving non-volatile registers on the |
| stack and restoring them upon return, so what can go wrong? If [and only |
| if] an exception is raised at run-time and no \c{UNWIND_INFO} structure |
| is associated with such a "leaf" function, the stack unwind procedure |
| will expect to find the caller's return address on the top of the stack, |
| immediately followed by its frame. Given that the developer pushed the |
| caller's non-volatile registers on the stack, would the value on top |
| point at some code segment or even addressable space? The developer can |
| attempt to copy the caller's return address to the top of the stack, and |
| this would actually work in some very specific circumstances. But unless |
| the developer can guarantee that these circumstances are always met, it |
| is more appropriate to assume the worst-case scenario, i.e. the stack |
| unwind procedure going berserk. The relevant question is what happens |
| then. The application is abruptly terminated without any notification |
| whatsoever. Just as in the Win32 case, one can argue that the system |
| could at least have logged "unwind procedure went berserk in x.exe at |
| address n" in the event log, but no, no trace of the failure is left. |
| |
| Now that we understand the significance of the \c{UNWIND_INFO} structure, |
| let's discuss what is in it and how it is processed. First of all, it |
| is checked for the presence of a reference to a custom language-specific |
| exception handler. If there is one, it is invoked. Depending on the |
| return value, either execution flow is resumed (the exception is said to |
| be "handled"), \e{or} the rest of the \c{UNWIND_INFO} structure is |
| processed as follows. Besides the optional reference to a custom handler, |
| it carries information about the current callee's stack frame and where |
| non-volatile registers are saved. The information is detailed enough to |
| reconstruct the contents of the caller's non-volatile registers upon the |
| call to the current callee. And so the caller's context is reconstructed, |
| and then the unwind procedure is repeated, i.e. another \c{UNWIND_INFO} |
| structure is associated, this time with the caller's instruction |
| pointer, and is then checked for the presence of a reference to a |
| language-specific handler, etc. The procedure is repeated until the |
| exception is handled. As a last resort the system "handles" it by |
| generating a memory core dump and terminating the application. |
| |
| At the time of this writing NASM unfortunately does not |
| facilitate generation of the above-mentioned detailed information about |
| stack frame layout. But as of version 2.03 it implements building |
| blocks for generating the structures involved in stack unwinding. As the |
| simplest example, here is how to deploy a custom exception handler for |
| a leaf function: |
| |
| \c default rel |
| \c section .text |
| \c extern MessageBoxA |
| \c handler: |
| \c sub rsp,40 |
| \c mov rcx,0 |
| \c lea rdx,[text] |
| \c lea r8,[caption] |
| \c mov r9,1 ; MB_OKCANCEL |
| \c call MessageBoxA |
| \c sub eax,1 ; incidentally suits as return value |
| \c ; for exception handler |
| \c add rsp,40 |
| \c ret |
| \c global main |
| \c main: |
| \c xor rax,rax |
| \c mov rax,QWORD[rax] ; cause exception |
| \c ret |
| \c main_end: |
| \c text: db 'OK to rethrow, CANCEL to generate core dump',0 |
| \c caption:db 'SEGV',0 |
| \c |
| \c section .pdata rdata align=4 |
| \c dd main wrt ..imagebase |
| \c dd main_end wrt ..imagebase |
| \c dd xmain wrt ..imagebase |
| \c section .xdata rdata align=8 |
| \c xmain: db 9,0,0,0 |
| \c dd handler wrt ..imagebase |
| \c section .drectve info |
| \c db '/defaultlib:user32.lib /defaultlib:msvcrt.lib ' |
| |
| What you see in the \c{.pdata} section is an element of the "table |
| comprising start and end addresses of functions" along with a reference |
| to the associated \c{UNWIND_INFO} structure. And what you see in the |
| \c{.xdata} section is an \c{UNWIND_INFO} structure describing a function |
| with no frame, but with a designated exception handler. References are |
| \e{required} to be image-relative (which is the real reason for |
| implementing the \c{wrt ..imagebase} operator). It should be noted that |
| \c{rdata align=n}, as well as \c{wrt ..imagebase}, are optional in the |
| context of these two segments, i.e. they can be omitted. The latter |
| means that \e{all} 32-bit references placed into these two segments, not |
| only the required ones listed above, turn out image-relative. Why is |
| this important to understand? The developer is allowed to append |
| handler-specific data to the \c{UNWIND_INFO} structure, and if [s]he adds |
| a 32-bit reference, then [s]he will have to remember to adjust its value |
| to obtain the real pointer. |
| |
| As already mentioned, in Win64 terms a leaf function is one that neither |
| calls any other function \e{nor} modifies any non-volatile register, |
| including the stack pointer. But it is not uncommon for an assembler |
| programmer to plan to use every single register and sometimes even |
| to have a variable stack frame. Is there anything one can do with the |
| bare building blocks, besides manually composing a fully-fledged |
| \c{UNWIND_INFO} structure, which would surely be considered |
| error-prone? Yes, there is. Recall that the exception handler is called |
| first, before the stack layout is analyzed. As it turns out, it is |
| perfectly possible to manipulate the current callee's context in a |
| custom handler in a manner that permits further stack unwinding. The |
| general idea is that the handler does not actually "handle" the |
| exception, but instead restores the callee's context as it was at its |
| entry point, and thus mimics a leaf function. In other words, the |
| handler simply undertakes part of the unwinding procedure. Consider the |
| following example: |
| |
| \c function: |
| \c mov rax,rsp ; copy rsp to volatile register |
| \c push r15 ; save non-volatile registers |
| \c push rbx |
| \c push rbp |
| \c mov r11,rsp ; prepare variable stack frame |
| \c sub r11,rcx |
| \c and r11,-64 |
| \c mov QWORD[r11],rax ; check for exceptions |
| \c mov rsp,r11 ; allocate stack frame |
| \c mov QWORD[rsp],rax ; save original rsp value |
| \c magic_point: |
| \c ... |
| \c mov r11,QWORD[rsp] ; pull original rsp value |
| \c mov rbp,QWORD[r11-24] |
| \c mov rbx,QWORD[r11-16] |
| \c mov r15,QWORD[r11-8] |
| \c mov rsp,r11 ; destroy frame |
| \c ret |
| |
| The key point is that up to \c{magic_point} the original \c{rsp} value |
| remains in the chosen volatile register and no non-volatile register, |
| except for \c{rsp}, is modified, while past \c{magic_point} \c{rsp} |
| remains constant until the very end of the \c{function}. In this case a |
| custom language-specific exception handler would look like this: |
| |
| \c EXCEPTION_DISPOSITION handler (EXCEPTION_RECORD *rec,ULONG64 frame, |
| \c CONTEXT *context,DISPATCHER_CONTEXT *disp) |
| \c { ULONG64 *rsp; |
| \c if (context->Rip<(ULONG64)magic_point) |
| \c rsp = (ULONG64 *)context->Rax; |
| \c else |
| \c { rsp = ((ULONG64 **)context->Rsp)[0]; |
| \c context->Rbp = rsp[-3]; |
| \c context->Rbx = rsp[-2]; |
| \c context->R15 = rsp[-1]; |
| \c } |
| \c context->Rsp = (ULONG64)rsp; |
| \c |
| \c memcpy (disp->ContextRecord,context,sizeof(CONTEXT)); |
| \c RtlVirtualUnwind(UNW_FLAG_NHANDLER,disp->ImageBase, |
| \c disp->ControlPc,disp->FunctionEntry,disp->ContextRecord, |
| \c &disp->HandlerData,&disp->EstablisherFrame,NULL); |
| \c return ExceptionContinueSearch; |
| \c } |
| |
| As the custom handler mimics a leaf function, the corresponding |
| \c{UNWIND_INFO} structure does not have to contain any information |
| about the stack frame and its layout. |
| |
| \H{cofffmt} \i\c{coff}: \i{Common Object File Format} |
| |
| The \c{coff} output type produces \c{COFF} object files suitable for |
| linking with the \i{DJGPP} linker. |
| |
| \c{coff} provides a default output file-name extension of \c{.o}. |
| |
| The \c{coff} format supports the same extensions to the \c{SECTION} |
| directive as \c{win32} does, except that the \c{align} qualifier and |
| the \c{info} section type are not supported. |
| |
| \H{machofmt} \I{Mach-O}\i\c{macho32} and \i\c{macho64}: \i{Mach Object File Format} |
| |
| The \c{macho32} and \c{macho64} output formats produce Mach-O |
| object files suitable for linking with the \i{MacOS X} linker. |
| \i\c{macho} is a synonym for \c{macho32}. |
| |
| \c{macho} provides a default output file-name extension of \c{.o}. |
| |
| \S{machosect} \c{macho} extensions to the \c{SECTION} Directive |
| \I{SECTION, macho extensions to} |
| |
| The \c{macho} output format specifies section names in the format |
| "\e{segment}\c{,}\e{section}". No spaces are allowed around the |
| comma. The following flags can also be specified: |
| |
| \b \c{data} - this section contains initialized data items |
| |
| \b \c{code} - this section contains code exclusively |
| |
| \b \c{mixed} - this section contains both code and data |
| |
| \b \c{bss} - this section is uninitialized and filled with zero |
| |
| \b \c{zerofill} - same as \c{bss} |
| |
| \b \c{no_dead_strip} - inhibit dead code stripping for this section |
| |
| \b \c{live_support} - set the live support flag for this section |
| |
| \b \c{strip_static_syms} - strip static symbols for this section |
| |
| \b \c{debug} - this section contains debugging information |
| |
| \b \c{align=}\e{alignment} - specify section alignment |
| |
| The default is \c{data}, unless the section name is \c{__text} or |
| \c{__bss} in which case the default is \c{text} or \c{bss}, |
| respectively. |
| |
| For compatibility with other Unix platforms, the following standard |
| names are also supported: |
| |
| \c .text = __TEXT,__text text |
| \c .rodata = __DATA,__const data |
| \c .data = __DATA,__data data |
| \c .bss = __DATA,__bss bss |
| |
| If the \c{.rodata} section contains no relocations, it is instead put |
| into the \c{__TEXT,__const} section unless this section has already |
| been specified explicitly. However, it is probably better to specify |
| \c{__TEXT,__const} and \c{__DATA,__const} explicitly as appropriate. |
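| |
| A sketch of the explicit form, assuming the \c{macho64} format and the |
| hypothetical labels \c{pi} and \c{pi_ptr}; constants that need no |
| relocation go into \c{__TEXT,__const}, while those holding addresses go |
| into \c{__DATA,__const}: |
| |
| \c         section __TEXT,__const          ; no relocations needed |
| \c pi:     dq      3.141592653589793 |
| \c |
| \c         section __DATA,__const          ; contains a relocated address |
| \c pi_ptr: dq      pi |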
| |
| \S{machotls} \i{Thread Local Storage in Mach-O}\I{TLS}: \c{macho} special |
| symbols and \i\c{WRT} |
| |
| Mach-O defines the following special symbols that can be used on the |
| right-hand side of the \c{WRT} operator: |
| |
| \b \c{..tlvp} is used to specify access to thread-local storage. |
| |
| \b \c{..gotpcrel} is used to specify references to the Global Offset |
| Table. The GOT is supported in the \c{macho64} format only. |
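| |
| As a hedged sketch for \c{macho64}, assuming a hypothetical external |
| variable \c{_extvar} defined in another module or library, a reference |
| through the GOT might be written as: |
| |
| \c         extern  _extvar |
| \c |
| \c         mov     rax,[rel _extvar wrt ..gotpcrel] ; address of _extvar, from the GOT |
| \c         mov     eax,[rax]                        ; then read the variable itself |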
| |
| \S{macho-ssvs} \c{macho} specific directive \i\c{subsections_via_symbols} |
| |
| The directive \c{subsections_via_symbols} sets the |
| \c{MH_SUBSECTIONS_VIA_SYMBOLS} flag in the Mach-O header, which |
| effectively separates a block (or a subsection) based on a symbol. It is |
| often used to let the linker eliminate dead code. |
| |
| This directive takes no arguments. |
| |
| This is a macro implemented as a \c{%pragma}. It can also be |
| specified in its \c{%pragma} form, in which case it will not affect |
| non-Mach-O builds of the same source code: |
| |
| \c %pragma macho subsections_via_symbols |
| |
| \S{macho-ssvs} \c{macho} specific directive \i\c{no_dead_strip} |
| |
| The directive \c{no_dead_strip} sets the Mach-O \c{SH_NO_DEAD_STRIP} |
| section flag on the section containing a specific symbol. This |
| directive takes a list of symbols as its arguments. |
| |
| This is a macro implemented as a \c{%pragma}. It can also be |
| specified in its \c{%pragma} form, in which case it will not affect |
| non-Mach-O builds of the same source code: |
| |
| \c %pragma macho no_dead_strip symbol... |
| |
| \S{macho-pext} \c{macho} specific extensions to the \c{GLOBAL} |
| Directive: \i\c{private_extern} |
| |
| The \c{private_extern} extension to the \c{GLOBAL} directive marks a |
| symbol as having limited global scope. For example, you can declare a |
| global symbol with this extension: |
| |
| \c global foo:private_extern |
| \c foo: |
| \c ; codes |
| |
| Linking with the static linker clears the private extern attribute, |
| unless a linker option such as \c{-keep_private_externs} is used. |
| |
| \H{elffmt} \i\c{elf32}, \i\c{elf64}, \i\c{elfx32}: \I{ELF}\I{linux, elf}\i{Executable and Linkable |
| Format} Object Files |
| |
| The \c{elf32}, \c{elf64} and \c{elfx32} output formats generate |
| \c{ELF32} and \c{ELF64} (Executable and Linkable Format) object files, as |
| used by Linux as well as \i{Unix System V}, including \i{Solaris x86}, |
| \i{UnixWare} and \i{SCO Unix}. ELF provides a default output |
| file-name extension of \c{.o}. \c{elf} is a synonym for \c{elf32}. |
| |
| The \c{elfx32} format is used for the \i{x32} ABI, which is a 32-bit |
| ABI with the CPU in 64-bit mode. |
| |
| \S{abisect} ELF specific directive \i\c{osabi} |
| |
| The ELF header specifies the application binary interface for the |
| target operating system (OSABI). This field can be set by using the |
| \c{osabi} directive with the numeric value (0-255) of the target |
| system. If this directive is not used, the default value will be "UNIX |
| System V ABI" (0) which will work on most systems which support ELF. |
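| |
| For example, a minimal sketch that marks the object file for FreeBSD, |
| whose OSABI value in the ELF specification is 9 (the choice of target |
| here is purely illustrative): |
| |
| \c         osabi   9               ; ELFOSABI_FREEBSD |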
| |
| \S{elfsect} ELF extensions to the \c{SECTION} Directive |
| \I{SECTION, ELF extensions to} |
| |
| Like the \c{obj} format, \c{elf} allows you to specify additional |
| information on the \c{SECTION} directive line, to control the type |
| and properties of sections you declare. Section types and properties |
| are generated automatically by NASM for the \i{standard section |
| names}, but may still be |
| overridden by these qualifiers. |
| |
| The available qualifiers are: |
| |
| \b \i\c{alloc} defines the section to be one which is loaded into |
| memory when the program is run. \i\c{noalloc} defines it to be one |
| which is not, such as an informational or comment section. |
| |
| \b \i\c{exec} defines the section to be one which should have execute |
| permission when the program is run. \i\c{noexec} defines it as one |
| which should not. |
| |
| \b \i\c{write} defines the section to be one which should be writable |
| when the program is run. \i\c{nowrite} defines it as one which should |
| not. |
| |
| \b \i\c{progbits} defines the section to be one with explicit contents |
| stored in the object file: an ordinary code or data section, for |
| example. |
| |
| \b \i\c{nobits} defines the section to be one with no explicit |
| contents given, such as a BSS section. |
| |
| \b \i\c{note} indicates that this section contains ELF notes. The |
| content of ELF notes is specified using normal assembly instructions; |
| it is up to the programmer to ensure these are valid ELF notes. |
| |
| \b \i\c{preinit_array} indicates that this section contains function |
| addresses to be called before any other initialization has happened. |
| |
| \b \i\c{init_array} indicates that this section contains function |
| addresses to be called during initialization. |
| |
| \b \i\c{fini_array} indicates that this section contains function |
| pointers to be called during termination. |
| |
| \b \I{align, ELF attribute}\c{align=}, used with a trailing number as in \c{obj}, gives the |
| \I{section alignment, in elf}\I{alignment, in elf sections}alignment |
| requirements of the section. |
| |
| \b \c{byte}, \c{word}, \c{dword}, \c{qword}, \c{tword}, \c{oword}, |
| \c{yword}, or \c{zword} with an optional \c{*}\i{multiplier} specify |
| the fundamental data item size for a section which contains either |
| fixed-sized data structures or strings; it also sets a default |
| alignment. This is generally used with the \c{strings} and \c{merge} |
| attributes (see below.) For example \c{byte*4} defines a unit size of |
| 4 bytes, with a default alignment of 1; \c{dword} also defines a unit |
| size of 4 bytes, but with a default alignment of 4. The \c{align=} |
| attribute, if specified, overrides this default alignment. |
| |
| \b \I{pointer, ELF attribute}\c{pointer} is equivalent to \c{dword} |
| for \c{elf32} or \c{elfx32}, and \c{qword} for \c{elf64}. |
| |
| \b \I{strings, ELF attribute}\c{strings} indicates that this section |
| contains exclusively null-terminated strings. By default these are |
| assumed to be byte strings, but a size specifier can be used to |
| override that. |
| |
| \b \i\c{merge} indicates that duplicate data elements in this section |
| should be merged with data elements from other object files. Data |
| elements can be either fixed-sized objects or null-terminated strings |
| (with the \c{strings} attribute.) A size specifier is required unless |
| \c{strings} is specified, in which case the size defaults to \c{byte}. |
| |
| \b \i\c{tls} defines the section to be one which contains |
| thread local variables. |
| |
| The defaults assumed by NASM if you do not specify the above |
| qualifiers are: |
| |
| \I\c{.text} \I\c{.rodata} \I\c{.lrodata} \I\c{.data} \I\c{.ldata} |
| \I\c{.bss} \I\c{.lbss} \I\c{.tdata} \I\c{.tbss} \I\c{.comment} |
| |
| \c section .text progbits alloc exec nowrite align=16 |
| \c section .rodata progbits alloc noexec nowrite align=4 |
| \c section .lrodata progbits alloc noexec nowrite align=4 |
| \c section .data progbits alloc noexec write align=4 |
| \c section .ldata progbits alloc noexec write align=4 |
| \c section .bss nobits alloc noexec write align=4 |
| \c section .lbss nobits alloc noexec write align=4 |
| \c section .tdata progbits alloc noexec write align=4 tls |
| \c section .tbss nobits alloc noexec write align=4 tls |
| \c section .comment progbits noalloc noexec nowrite align=1 |
| \c section .preinit_array preinit_array alloc noexec nowrite pointer |
| \c section .init_array init_array alloc noexec nowrite pointer |
| \c section .fini_array fini_array alloc noexec nowrite pointer |
| \c section .note note noalloc noexec nowrite align=4 |
| \c section other progbits alloc noexec nowrite align=1 |
| |
| (Any section name other than those in the above table |
| is treated by default like \c{other} in the above table. |
| Please note that section names are case sensitive.) |
| |
| |
| \S{elfwrt} \i{Position-Independent Code}\I{PIC}: ELF Special |
| Symbols and \i\c{WRT} |
| |
| Since \c{ELF} does not support segment-base references, the \c{WRT} |
| operator is not used for its normal purpose; therefore NASM's |
| \c{elf} output format makes use of \c{WRT} for a different purpose, |
| namely the PIC-specific \I{relocations, PIC-specific}relocation |
| types. |
| |
| \c{elf} defines five special symbols which you can use as the |
| right-hand side of the \c{WRT} operator to obtain PIC relocation |
| types. They are \i\c{..gotpc}, \i\c{..gotoff}, \i\c{..got}, |
| \i\c{..plt} and \i\c{..sym}. Their functions are summarized here: |
| |
| \b Referring to the symbol marking the global offset table base |
| using \c{wrt ..gotpc} will end up giving the distance from the |
| beginning of the current section to the global offset table. |
| (\i\c{_GLOBAL_OFFSET_TABLE_} is the standard symbol name used to |
| refer to the \i{GOT}.) So you would then need to add \i\c{$$} to the |
| result to get the real address of the GOT. |
| |
| \b Referring to a location in one of your own sections using \c{wrt |
| ..gotoff} will give the distance from the beginning of the GOT to |
| the specified location, so that adding on the address of the GOT |
| would give the real address of the location you wanted. |
| |
| \b Referring to an external or global symbol using \c{wrt ..got} |
| causes the linker to build an entry \e{in} the GOT containing the |
| address of the symbol, and the reference gives the distance from the |
| beginning of the GOT to the entry; so you can add on the address of |
| the GOT, load from the resulting address, and end up with the |
| address of the symbol. |
| |
| \b Referring to a procedure name using \c{wrt ..plt} causes the |
| linker to build a \i{procedure linkage table} entry for the symbol, |
| and the reference gives the address of the \i{PLT} entry. You can |
| only use this in contexts which would generate a PC-relative |
| relocation normally (i.e. as the destination for \c{CALL} or |
| \c{JMP}), since ELF contains no relocation type to refer to PLT |
| entries absolutely. |
| |
| \b Referring to a symbol name using \c{wrt ..sym} causes NASM to |
| write an ordinary relocation, but instead of making the relocation |
| relative to the start of the section and then adding on the offset |
| to the symbol, it will write a relocation record aimed directly at |
| the symbol in question. The distinction is a necessary one due to a |
| peculiarity of the dynamic linker. |
| |
| A fuller explanation of how to use these relocation types to write |
| shared libraries entirely in NASM is given in \k{picdll}. |
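| |
| As a brief sketch of how these symbols typically appear in 32-bit |
| code (the names \c{func}, \c{myvar}, \c{extvar} and \c{extfunc} are |
| hypothetical, and \c{EBX} is simply assumed to be the register chosen |
| to hold the address of the GOT): |
| |
| \c extern  _GLOBAL_OFFSET_TABLE_ |
| \c extern  extvar, extfunc |
| \c |
| \c section .text |
| \c |
| \c func:   push    ebx |
| \c         call    .get_GOT |
| \c .get_GOT: |
| \c         pop     ebx             ; EBX = address of .get_GOT |
| \c         add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc |
| \c                                 ; EBX = address of the GOT |
| \c |
| \c         lea     eax,[ebx+myvar wrt ..gotoff]    ; address of myvar |
| \c         mov     eax,[ebx+extvar wrt ..got]      ; address of extvar |
| \c         call    extfunc wrt ..plt               ; call via the PLT |
| \c |
| \c         pop     ebx |
| \c         ret |
| \c |
| \c section .data |
| \c |
| \c myvar:  dd      0 |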
| |
| \S{elftls} \i{Thread Local Storage in ELF}\I{TLS}: \c{elf} Special |
| Symbols and \i\c{WRT} |
| |
| \b In ELF32 mode, referring to an external or global symbol using |
| \c{wrt ..tlsie} \I\c{..tlsie} |
| causes the linker to build an entry \e{in} the GOT containing the |
| offset of the symbol within the TLS block, so you can access the value |
| of the symbol with code such as: |
| |
| \c mov eax,[tid wrt ..tlsie] |
| \c mov [gs:eax],ebx |
| |
| |
| \b In ELF64 or ELFx32 mode, referring to an external or global symbol using |
| \c{wrt ..gottpoff} \I\c{..gottpoff} |
| causes the linker to build an entry \e{in} the GOT containing the |
| offset of the symbol within the TLS block, so you can access the value |
| of the symbol with code such as: |
| |
| \c mov rax,[rel tid wrt ..gottpoff] |
| \c mov rcx,[fs:rax] |
| |
| |
| \S{elfglob} \c{elf} Extensions to the \c{GLOBAL} Directive\I{GLOBAL, |
| elf extensions to}\I{GLOBAL, aoutb extensions to} |
| |
| \c{ELF} object files can contain more information about a global symbol |
| than just its address: they can contain the \I{symbol sizes, |
| specifying}\I{size, of symbols}size of the symbol and its \I{symbol |
| types, specifying}\I{type, of symbols}type as well. These are not |
| merely debugger conveniences, but are actually necessary when the |
| program being written is a \i{shared library}. NASM therefore |
| supports some extensions to the \c{GLOBAL} directive, allowing you |
| to specify these features. |
| |
| You can specify whether a global variable is a function or a data |
| object by suffixing the name with a colon and the word |
| \i\c{function} or \i\c{data}. (\i\c{object} is a synonym for |
| \c{data}.) For example: |
| |
| \c global hashlookup:function, hashtable:data |
| |
| exports the global symbol \c{hashlookup} as a function and |
| \c{hashtable} as a data object. |
| |
| Optionally, you can control the ELF visibility of the symbol. Just |
| add one of the visibility keywords: \i\c{default}, \i\c{internal}, |
| \i\c{hidden}, or \i\c{protected}. The default is \i\c{default} of |
| course. For example, to make \c{hashlookup} hidden: |
| |
| \c global hashlookup:function hidden |
| |
| Since version 2.15, it is possible to specify the symbol binding. The |
| keywords are \i\c{weak}, to generate a weak symbol, and \i\c{strong}. |
| The default is \i\c{strong}. |
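| |
| For example, assuming the binding keyword is written in the same |
| position as the other specifiers shown above, \c{hashlookup} could be |
| exported as a weak function like this: |
| |
| \c global hashlookup:function weak |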
| |
| You can also specify the size of the data associated with the |
| symbol, as a numeric expression (which may involve labels, and even |
| forward references) after the type specifier. Like this: |
| |
| \c global hashtable:data (hashtable.end - hashtable) |
| \c |
| \c hashtable: |
| \c db this,that,theother ; some data here |
| \c .end: |
| |
| This makes NASM automatically calculate the length of the table and |
| place that information into the \c{ELF} symbol table. |
| |
| Declaring the type and size of global symbols is necessary when |
| writing shared library code. For more information, see |
| \k{picglobal}. |
| |
| |
| \S{elfextrn} \c{elf} Extensions to the \c{EXTERN} Directive\I{EXTERN, |
| elf extensions to} |
| |
| Since version 2.15 it is possible to specify the keyword \i\c{weak} to |
| generate a weak external reference. For example: |
| |
| \c extern weak_ref:weak |
| |
| |
| \S{elfcomm} \c{elf} Extensions to the \c{COMMON} Directive |
| \I{COMMON, elf extensions to} |
| |
| \c{ELF} also allows you to specify alignment requirements \I{common |
| variables, alignment in elf}\I{alignment, of elf common variables}on |
| common variables. This is done by putting a number (which must be a |
| power of two) after the name and size of the common variable, |
| separated (as usual) by a colon. For example, an array of |
| doublewords would benefit from 4-byte alignment: |
| |
| \c common dwordarray 128:4 |
| |
| This declares the total size of the array to be 128 bytes, and |
| requires that it be aligned on a 4-byte boundary. |
| |
| |
| \S{elf16} 16-bit code and ELF |
| \I{ELF, 16-bit code} |
| |
| Older versions of the \c{ELF32} specification did not provide |
| relocations for 8- and 16-bit values. They are now part of the formal |
| specification, and any sufficiently recent linker should support them. |
| |
| ELF currently has no support for segmented programming. |
| |
| \S{elfdbg} Debug formats and ELF |
| \I{ELF, debug formats} |
| |
| ELF provides debug information in \c{STABS} and \c{DWARF} formats. |
| Line number information is generated for all executable sections, but please |
| note that only the \c{.text} section is executable by default. |
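| |
| As a sketch, debug information in a particular format can be |
| requested on the command line with the \c{-g} and \c{-F} options (the |
| file name \c{myprog.asm} is hypothetical): |
| |
| \c nasm -f elf64 -g -F dwarf myprog.asm |
| \c nasm -f elf32 -g -F stabs myprog.asm |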
| |
| \H{aoutfmt} \i\c{aout}: Linux \I{a.out, Linux version}\I{linux, a.out}\c{a.out} Object Files |
| |
| The \c{aout} format generates \c{a.out} object files, in the form used |
| by early Linux systems (current Linux systems use ELF, see |
| \k{elffmt}.) These differ from other \c{a.out} object files in that |
| the magic number in the first four bytes of the file is |
| different; also, some implementations of \c{a.out}, for example |
| NetBSD's, support position-independent code, which Linux's |
| implementation does not. |
| |
| \c{a.out} provides a default output file-name extension of \c{.o}. |
| |
| \c{a.out} is a very simple object format. It supports no special |
| directives, no special symbols, no use of \c{SEG} or \c{WRT}, and no |
| extensions to any standard directives. It supports only the three |
| \i{standard section names} \i\c{.text}, \i\c{.data} and \i\c{.bss}. |
| |
| |
| \H{aoutfmt} \i\c{aoutb}: \i{NetBSD}/\i{FreeBSD}/\i{OpenBSD} |
| \I{a.out, BSD version}\c{a.out} Object Files |
| |
| The \c{aoutb} format generates \c{a.out} object files, in the form |
| used by the various free \c{BSD Unix} clones, \c{NetBSD}, \c{FreeBSD} |
| and \c{OpenBSD}. For simple object files, this object format is exactly |
| the same as \c{aout} except for the magic number in the first four bytes |
| of the file. However, the \c{aoutb} format supports |
| \I{PIC}\i{position-independent code} in the same way as the \c{elf} |
| format, so you can use it to write \c{BSD} \i{shared libraries}. |
| |
| \c{aoutb} provides a default output file-name extension of \c{.o}. |
| |
| \c{aoutb} supports no special directives, no special symbols, and |
| only the three \i{standard section names} \i\c{.text}, \i\c{.data} |
| and \i\c{.bss}. However, it also supports the same use of \i\c{WRT} as |
| \c{elf} does, to provide position-independent code relocation types. |
| See \k{elfwrt} for full documentation of this feature. |
| |
| \c{aoutb} also supports the same extensions to the \c{GLOBAL} |
| directive as \c{elf} does: see \k{elfglob} for documentation of |
| this. |
| |
| |
| \H{as86fmt} \c{as86}: \i{Minix}/Linux\I{linux, as86} \i\c{as86} Object Files |
| |
| The Minix/Linux 16-bit assembler \c{as86} has its own non-standard |
| object file format. Although its companion linker \i\c{ld86} produces |
| something close to ordinary \c{a.out} binaries as output, the object |
| file format used to communicate between \c{as86} and \c{ld86} is not |
| itself \c{a.out}. |
| |
| NASM supports this format, just in case it is useful, as \c{as86}. |
| \c{as86} provides a default output file-name extension of \c{.o}. |
| |
| \c{as86} is a very simple object format (from the NASM user's point |
| of view). It supports no special directives, no use of \c{SEG} or \c{WRT}, |
| and no extensions to any standard directives. It supports only the three |
| \i{standard section names} \i\c{.text}, \i\c{.data} and \i\c{.bss}. The |
| only special symbol supported is \c{..start}. |
| |
| |
| \H{rdffmt} \I{RDOFF}\i\c{rdf}: \i{Relocatable Dynamic Object File |
| Format} |
| |
| The \c{rdf} output format produces \c{RDOFF} object files. \c{RDOFF} |
| (Relocatable Dynamic Object File Format) is a home-grown object-file |
| format, designed alongside NASM itself and reflecting in its file |
| format the internal structure of the assembler. |
| |
| \c{RDOFF} is not used by any well-known operating systems. Those |
| writing their own systems, however, may well wish to use \c{RDOFF} |
| as their object format, on the grounds that it is designed primarily |
| for simplicity and contains very little file-header bureaucracy. |
| |
| The Unix NASM archive, and the DOS archive which includes sources, |
| both contain an \I{rdoff subdirectory}\c{rdoff} subdirectory holding |
| a set of RDOFF utilities: an RDF linker, an \c{RDF} static-library |
| manager, an RDF file dump utility, and a program which will load and |
| execute an RDF executable under Linux. |
| |
| \c{rdf} supports only the \i{standard section names} \i\c{.text}, |
| \i\c{.data} and \i\c{.bss}. |
| |
| |
| \S{rdflib} Requiring a Library: The \i\c{LIBRARY} Directive |
| |
| \c{RDOFF} contains a mechanism for an object file to demand a given |
| library to be linked to the module, either at load time or run time. |
| This is done by the \c{LIBRARY} directive, which takes one argument |
| which is the name of the module: |
| |
| \c library mylib.rdl |
| |
| |
| \S{rdfmod} Specifying a Module Name: The \i\c{MODULE} Directive |
| |
| A special \c{RDOFF} header record is used to store the name of the |
| module. It can be used, for example, by a run-time loader to perform |
| dynamic linking. The \c{MODULE} directive takes one argument, which is |
| the name of the current module: |
| |
| \c module mymodname |
| |
| Note that when you statically link modules and tell the linker to strip |
| the symbols from the output file, all module names will be stripped too. |
| To avoid this, you should start module names with \I{$, prefix}\c{$}, like: |
| |
| \c module $kernel.core |
| |
| |
| \S{rdfglob} \c{rdf} Extensions to the \c{GLOBAL} Directive\I{GLOBAL, |
| rdf extensions to} |
| |
| \c{RDOFF} global symbols can contain additional information needed by |
| the static linker. You can mark a global symbol as exported, thus |
| telling the linker not to strip it from the target executable or library |
| file. As in \c{ELF}, you can also specify whether an exported symbol |
| is a procedure (function) or data object. |
| |
| Suffixing the name with a colon and the word \i\c{export} marks the |
| symbol as exported: |
| |
| \c global sys_open:export |
| |
| To specify that an exported symbol is a procedure (function), add the |
| word \i\c{proc} or \i\c{function} after the declaration: |
| |
| \c global sys_open:export proc |
| |
| Similarly, to specify an exported data object, add the word \i\c{data} |
| or \i\c{object} to the directive: |
| |
| \c global kernel_ticks:export data |
| |
| |
| \S{rdfimpt} \c{rdf} Extensions to the \c{EXTERN} Directive\I{EXTERN, |
| rdf extensions to} |
| |
| By default the \c{EXTERN} directive in \c{RDOFF} declares a "pure external" |
| symbol (i.e. the static linker will complain if such a symbol is not resolved). |
| To declare an "imported" symbol, which must be resolved later during a dynamic |
| linking phase, \c{RDOFF} offers an additional \c{import} modifier. As in |
| \c{GLOBAL}, you can also specify whether an imported symbol is a procedure |
| (function) or data object. For example: |
| |
| \c library $libc |
| \c extern _open:import |
| \c extern _printf:import proc |
| \c extern _errno:import data |
| |
| Here the directive \c{LIBRARY} is also included, which gives the dynamic linker |
| a hint as to where to find requested symbols. |
| |
| |
| \H{dbgfmt} \i\c{dbg}: Debugging Format |
| |
| The \c{dbg} format does not output an object file as such; instead, |
| it outputs a text file which contains a complete list of all the |
| transactions between the main body of NASM and the output-format |
| back end module. It is primarily intended to aid people who want to |
| write their own output drivers, so that they can get a clearer idea |
| of the various requests the main program makes of the output driver, |
| and in what order they happen. |
| |
| For simple files, one can easily use the \c{dbg} format like this: |
| |
| \c nasm -f dbg filename.asm |
| |
| which will generate a diagnostic file called \c{filename.dbg}. |
| However, this will not work well on files which were designed for a |
| different object format, because each object format defines its own |
| macros (usually user-level forms of directives), and those macros |
| will not be defined in the \c{dbg} format. Therefore it can be |
| useful to run NASM twice, in order to do the preprocessing with the |
| native object format selected: |
| |
| \c nasm -e -f rdf -o rdfprog.i rdfprog.asm |
| \c nasm -a -f dbg rdfprog.i |
| |
| This preprocesses \c{rdfprog.asm} into \c{rdfprog.i}, keeping the |
| \c{rdf} object format selected in order to make sure RDF special |
| directives are converted into primitive form correctly. Then the |
| preprocessed source is fed through the \c{dbg} format to generate |
| the final diagnostic output. |
| |
| This workaround will still typically not work for programs intended |
| for \c{obj} format, because the \c{obj} \c{SEGMENT} and \c{GROUP} |
| directives have side effects of defining the segment and group names |
| as symbols; \c{dbg} will not do this, so the program will not |
| assemble. You will have to work around that by defining the symbols |
| yourself (using \c{EXTERN}, for example) if you really need to get a |
| \c{dbg} trace of an \c{obj}-specific source file. |
| |
| \c{dbg} accepts any section name and any directives at all, and logs |
| them all to its output file. |
| |
| \c{dbg} accepts and logs any \c{%pragma}, but the specific |
| \c{%pragma}: |
| |
| \c %pragma dbg maxdump <size> |
| |
| where \c{<size>} is either a number or \c{unlimited}, can be used to |
| control the maximum size for dumping the full contents of a |
| \c{rawdata} output object. |
| |
| |
| \C{16bit} Writing 16-bit Code (DOS, Windows 3/3.1) |
| |
| This chapter attempts to cover some of the common issues encountered |
| when writing 16-bit code to run under \c{MS-DOS} or \c{Windows 3.x}. It |
| covers how to link programs to produce \c{.EXE} or \c{.COM} files, |
| how to write \c{.SYS} device drivers, and how to interface assembly |
| language code with 16-bit C compilers and with Borland Pascal. |
| |
| |
| \H{exefiles} Producing \i\c{.EXE} Files |
| |
| Any large program written under DOS needs to be built as a \c{.EXE} |
| file: only \c{.EXE} files have the necessary internal structure |
| required to span more than one 64K segment. \i{Windows} programs, |
| also, have to be built as \c{.EXE} files, since Windows does not |
| support the \c{.COM} format. |
| |
| In general, you generate \c{.EXE} files by using the \c{obj} output |
| format to produce one or more \i\c{.OBJ} files, and then linking |
| them together using a linker. However, NASM also supports the direct |
| generation of simple DOS \c{.EXE} files using the \c{bin} output |
| format (by using \c{DB} and \c{DW} to construct the \c{.EXE} file |
| header), and a macro package is supplied to do this. Thanks to |
| Yann Guidon for contributing the code for this. |
| |
| NASM may also support \c{.EXE} natively as another output format in |
| future releases. |
| |
| |
| \S{objexe} Using the \c{obj} Format To Generate \c{.EXE} Files |
| |
| This section describes the usual method of generating \c{.EXE} files |
| by linking \c{.OBJ} files together. |
| |
| Most 16-bit programming language packages come with a suitable |
| linker; if you have none of these, there is a free linker called |
| \i{VAL}\I{linker, free}, available in \c{LZH} archive format from |
| \W{ftp://x2ftp.oulu.fi/pub/msdos/programming/lang/}\i\c{x2ftp.oulu.fi}. |
| An LZH archiver can be found at |
| \W{ftp://ftp.simtel.net/pub/simtelnet/msdos/arcers}\i\c{ftp.simtel.net}. |
| There is another `free' linker (though this one doesn't come with |
| sources) called \i{FREELINK}, available from |
| \W{http://www.pcorner.com/tpc/old/3-101.html}\i\c{www.pcorner.com}. |
| A third, \i\c{djlink}, written by DJ Delorie, is available at |
| \W{http://www.delorie.com/djgpp/16bit/djlink/}\i\c{www.delorie.com}. |
| A fourth linker, \i\c{ALINK}, written by Anthony A.J. Williams, is |
| available at \W{http://alink.sourceforge.net}\i\c{alink.sourceforge.net}. |
| |
| When linking several \c{.OBJ} files into a \c{.EXE} file, you should |
| ensure that exactly one of them has a start point defined (using the |
| \I{program entry point}\i\c{..start} special symbol defined by the |
| \c{obj} format: see \k{dotdotstart}). If no module defines a start |
| point, the linker will not know what value to give the entry-point |
| field in the output file header; if more than one defines a start |
| point, the linker will not know \e{which} value to use. |
| |
| An example of a NASM source file which can be assembled to a |
| \c{.OBJ} file and linked on its own to a \c{.EXE} is given here. It |
| demonstrates the basic principles of defining a stack, initialising |
| the segment registers, and declaring a start point. This file is |
| also provided in the \I{test subdirectory}\c{test} subdirectory of |
| the NASM archives, under the name \c{objexe.asm}. |
| |
| \c segment code |
| \c |
| \c ..start: |
| \c mov ax,data |
| \c mov ds,ax |
| \c mov ax,stack |
| \c mov ss,ax |
| \c mov sp,stacktop |
| |
| This initial piece of code sets up \c{DS} to point to the data |
| segment, and initializes \c{SS} and \c{SP} to point to the top of |
| the provided stack. Notice that interrupts are implicitly disabled |
| for one instruction after a move into \c{SS}, precisely for this |
| situation, so that there's no chance of an interrupt occurring |
| between the loads of \c{SS} and \c{SP} and not having a stack to |
| execute on. |
| |
| Note also that the special symbol \c{..start} is defined at the |
| beginning of this code, which means that will be the entry point |
| into the resulting executable file. |
| |
| \c mov dx,hello |
| \c mov ah,9 |
| \c int 0x21 |
| |
| The above is the main program: load \c{DS:DX} with a pointer to the |
| greeting message (\c{hello} is implicitly relative to the segment |
| \c{data}, which was loaded into \c{DS} in the setup code, so the |
| full pointer is valid), and call the DOS print-string function. |
| |
| \c mov ax,0x4c00 |
| \c int 0x21 |
| |
| This terminates the program using another DOS system call. |
| |
| \c segment data |
| \c |
| \c hello: db 'hello, world', 13, 10, '$' |
| |
| The data segment contains the string we want to display. |
| |
| \c segment stack stack |
| \c resb 64 |
| \c stacktop: |
| |
| The above code declares a stack segment containing 64 bytes of |
| uninitialized stack space, and points \c{stacktop} at the top of it. |
| The directive \c{segment stack stack} defines a segment \e{called} |
| \c{stack}, and also of \e{type} \c{STACK}. The latter is not |
| necessary to the correct running of the program, but linkers are |
| likely to issue warnings or errors if your program has no segment of |
| type \c{STACK}. |
| |
| The above file, when assembled into a \c{.OBJ} file, will link on |
| its own to a valid \c{.EXE} file, which when run will print `hello, |
| world' and then exit. |
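| |
| Assuming the source is saved as \c{objexe.asm}, it could be assembled |
| with a command line like |
| |
| \c nasm -f obj objexe.asm |
| |
| which should produce \c{objexe.obj}, ready to be passed to whichever |
| linker you are using. The linker's own command line is not shown |
| here, since it differs from one linker to another. |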
| |
| |
| \S{binexe} Using the \c{bin} Format To Generate \c{.EXE} Files |
| |
| The \c{.EXE} file format is simple enough that it's possible to |
| build a \c{.EXE} file by writing a pure-binary program and sticking |
| a 32-byte header on the front. This header is simple enough that it |
| can be generated using \c{DB} and \c{DW} commands by NASM itself, so |
| that you can use the \c{bin} output format to directly generate |
| \c{.EXE} files. |
| |
| Included in the NASM archives, in the \I{misc subdirectory}\c{misc} |
| subdirectory, is a file \i\c{exebin.mac} of macros. It defines three |
| macros: \i\c{EXE_begin}, \i\c{EXE_stack} and \i\c{EXE_end}. |
| |
| To produce a \c{.EXE} file using this method, you should start by |
| using \c{%include} to load the \c{exebin.mac} macro package into |
| your source file. You should then issue the \c{EXE_begin} macro call |
| (which takes no arguments) to generate the file header data. Then |
| write code as normal for the \c{bin} format - you can use all three |
| standard sections \c{.text}, \c{.data} and \c{.bss}. At the end of |
| the file you should call the \c{EXE_end} macro (again, no arguments), |
| which defines some symbols to mark section sizes, and these symbols |
| are referred to in the header code generated by \c{EXE_begin}. |
| |
| In this model, the code you end up writing starts at \c{0x100}, just |
| like a \c{.COM} file - in fact, if you strip off the 32-byte header |
| from the resulting \c{.EXE} file, you will have a valid \c{.COM} |
| program. All the segment bases are the same, so you are limited to a |
| 64K program, again just like a \c{.COM} file. Note that an \c{ORG} |
| directive is issued by the \c{EXE_begin} macro, so you should not |
| explicitly issue one of your own. |
| |
| You can't directly refer to your segment base value, unfortunately, |
| since this would require a relocation in the header, and things |
| would get a lot more complicated. So you should get your segment |
| base by copying it out of \c{CS} instead. |
| |
| On entry to your \c{.EXE} file, \c{SS:SP} are already set up to |
| point to the top of a 2Kb stack. You can adjust the default stack |
| size of 2Kb by calling the \c{EXE_stack} macro. For example, to |
| change the stack size of your program to 64 bytes, you would call |
| \c{EXE_stack 64}. |
| |
| A sample program which generates a \c{.EXE} file in this way is |
| given in the \c{test} subdirectory of the NASM archive, as |
| \c{binexe.asm}. |
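| |
| As a minimal sketch of the steps described above (it follows the |
| general shape of such a program rather than reproducing the sample |
| file, and assumes \c{exebin.mac} can be found by \c{%include}), a |
| complete source file might look like this: |
| |
| \c %include "exebin.mac" |
| \c |
| \c         EXE_begin |
| \c         EXE_stack 256           ; optional: override the default |
| \c |
| \c         section .text |
| \c |
| \c         mov     ax,cs           ; all segment bases are the same, |
| \c         mov     ds,ax           ; so copy the base out of CS |
| \c |
| \c         mov     dx,hello |
| \c         mov     ah,9 |
| \c         int     0x21            ; DOS print-string function |
| \c |
| \c         mov     ax,0x4c00 |
| \c         int     0x21            ; terminate the program |
| \c |
| \c         section .data |
| \c |
| \c hello:  db      'hello, world', 13, 10, '$' |
| \c |
| \c         EXE_end |
| |
| Assuming this is saved under the hypothetical name \c{myexe.asm}, it |
| could then be assembled with a command line such as |
| |
| \c nasm -f bin -o myexe.exe myexe.asm |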
| |
| |
| \H{comfiles} Producing \i\c{.COM} Files |
| |
| While large DOS programs must be written as \c{.EXE} files, small |
| ones are often better written as \c{.COM} files. \c{.COM} files are |
| pure binary, and therefore most easily produced using the \c{bin} |
| output format. |
| |
| |
| \S{combinfmt} Using the \c{bin} Format To Generate \c{.COM} Files |
| |
| \c{.COM} files expect to be loaded at offset \c{100h} into their |
| segment (though the segment may change). Execution then begins at |
| \I\c{ORG}\c{100h}, i.e. right at the start of the program. So to |
| write a \c{.COM} program, you would create a source file looking |
| like |
| |
| \c org 100h |
| \c |
| \c section .text |
| \c |
| \c start: |
| \c ; put your code here |
| \c |
| \c section .data |
| \c |
| \c ; put data items here |
| \c |
| \c section .bss |
| \c |
| \c ; put uninitialized data here |
| |
| The \c{bin} format puts the \c{.text} section first in the file, so |
| you can declare data or BSS items before beginning to write code if |
| you want to and the code will still end up at the front of the file |
| where it belongs. |
| |
| The BSS (uninitialized data) section does not take up space in the |
| \c{.COM} file itself: instead, addresses of BSS items are resolved |
| to point at space beyond the end of the file, on the grounds that |
| this will be free memory when the program is run. Therefore you |
| should not rely on your BSS being initialized to all zeros when you |
| run. |
| |
| To assemble the above program, you should use a command line like |
| |
| \c nasm myprog.asm -fbin -o myprog.com |
| |
| The \c{bin} format would produce a file called \c{myprog} if no |
| explicit output file name were specified, so you have to override it |
| and give the desired file name. |
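| |
| As a small, concrete sketch of the skeleton above (the label names |
| are arbitrary), a complete \c{.COM} program which prints a greeting |
| using the DOS print-string function and then exits might look like |
| this: |
| |
| \c         org 100h |
| \c |
| \c section .text |
| \c |
| \c start:  mov     dx,hello        ; DS already equals CS in a .COM file |
| \c         mov     ah,9 |
| \c         int     0x21            ; DOS print-string function |
| \c |
| \c         mov     ax,0x4c00 |
| \c         int     0x21            ; terminate the program |
| \c |
| \c section .data |
| \c |
| \c hello:  db      'hello, world', 13, 10, '$' |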
| |
| |
| \S{comobjfmt} Using the \c{obj} Format To Generate \c{.COM} Files |
| |
| If you are writing a \c{.COM} program as more than one module, you |
| may wish to assemble several \c{.OBJ} files and link them together |
| into a \c{.COM} program. You can do this, provided you have a linker |
| capable of outputting \c{.COM} files directly (\i{TLINK} does this), |
| or alternatively a converter program such as \i\c{EXE2BIN} to |
| transform the \c{.EXE} file output from the linker into a \c{.COM} |
| file. |
| |
| If you do this, you need to take care of several things: |
| |
| \b The first object file containing code should start its code |
| segment with a line like \c{RESB 100h}. This is to ensure that the |
| code begins at offset \c{100h} relative to the beginning of the code |
| segment, so that the linker or converter program does not have to |
| adjust address references within the file when generating the |
| \c{.COM} file. Other assemblers use an \i\c{ORG} directive for this |
| purpose, but \c{ORG} in NASM is a format-specific directive to the |
| \c{bin} output format, and does not mean the same thing as it does |
| in MASM-compatible assemblers. |
| |
| \b You don't need to define a stack segment. |
| |
| \b All your segments should be in the same group, so that every time |
| your code or data references a symbol offset, all offsets are |
| relative to the same segment base. This is because, when a \c{.COM} |
| file is loaded, all the segment registers contain the same value. |
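| |
| The points above might be sketched as follows. This is only an |
| outline under stated assumptions: the segment names and the group |
| name \c{dgroup} are hypothetical, and whether a \c{..start} entry |
| point is also wanted depends on the linker or converter in use. |
| |
| \c segment code |
| \c segment data |
| \c |
| \c group dgroup code data          ; one group for every segment |
| \c |
| \c segment code |
| \c |
| \c         resb    100h            ; code begins at offset 100h |
| \c |
| \c start:  mov     dx,hello        ; offset is relative to the group |
| \c         mov     ah,9 |
| \c         int     0x21 |
| \c |
| \c         mov     ax,0x4c00 |
| \c         int     0x21 |
| \c |
| \c segment data |
| \c |
| \c hello:  db      'hello, world', 13, 10, '$' |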
| |
| |
| \H{sysfiles} Producing \i\c{.SYS} Files |
| |
| \i{MS-DOS device drivers} - \c{.SYS} files - are pure binary files, |
| similar to \c{.COM} files, except that they start at origin zero |
| rather than \c{100h}. Therefore, if you are writing a device driver |
| using the \c{bin} format, you do not need the \c{ORG} directive, |
| since the default origin for \c{bin} is zero. Similarly, if you are |
| using \c{obj}, you do not need the \c{RESB 100h} at the start of |
| your code segment. |
| |
| \c{.SYS} files start with a header structure, containing pointers to |
| the various routines inside the driver which do the work. This |
| structure should be defined at the start of the code segment, even |
| though it is not actually code. |
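| |
| As a rough sketch for the \c{bin} format, the start of a character |
| device driver might be laid out as below. The field layout follows |
| the commonly documented DOS device header and should be checked |
| against a DOS reference; the device name and the labels are |
| hypothetical, and the routine bodies are left as comments. |
| |
| \c section .text                   ; origin defaults to zero |
| \c |
| \c header: dd      -1              ; link to next driver, set by DOS |
| \c         dw      0x8000          ; attributes: bit 15 = character device |
| \c         dw      strategy        ; offset of the strategy routine |
| \c         dw      interrupt       ; offset of the interrupt routine |
| \c         db      'MYDEVICE'      ; device name, exactly 8 characters |
| \c |
| \c strategy: |
| \c         ; save the request header pointer passed in ES:BX |
| \c         retf |
| \c |
| \c interrupt: |
| \c         ; carry out the request and set its status word |
| \c         retf |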
| |
| For more information on the format of \c{.SYS} files, and the data |
| which has to go in the header structure, a list of books is given in |
| the Frequently Asked Questions list for the newsgroup |
| \W{news:comp.os.msdos.programmer}\i\c{comp.os.msdos.programmer}. |
| |
| |
| \H{16c} Interfacing to 16-bit C Programs |
| |
| This section covers the basics of writing assembly routines that |
| call, or are called from, C programs. To do this, you would |
| typically write an assembly module as a \c{.OBJ} file, and link it |
| with your C modules to produce a \i{mixed-language program}. |
| |
| |
| \S{16cunder} External Symbol Names |
| |
| \I{C symbol names}\I{underscore, in C symbols}C compilers have the |
| convention that the names of all global symbols (functions or data) |
| they define are formed by prefixing an underscore to the name as it |
| appears in the C program. So, for example, the function a C |
| programmer thinks of as \c{printf} appears to an assembly language |
| programmer as \c{_printf}. This means that in your assembly |
| programs, you can define symbols without a leading underscore, and |
| not have to worry about name clashes with C symbols. |
| |
| If you find the underscores inconvenient, you can define macros to |
| replace the \c{GLOBAL} and \c{EXTERN} directives as follows: |
| |
| \c %macro cglobal 1 |
| \c |
| \c global _%1 |
| \c %define %1 _%1 |
| \c |
| \c %endmacro |
| \c |
| \c %macro cextern 1 |
| \c |
| \c extern _%1 |
| \c %define %1 _%1 |
| \c |
| \c %endmacro |
| |
| (These forms of the macros only take one argument at a time; a |
| \c{%rep} construct could solve this.) |
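| |
| As a sketch of that approach, a version of \c{cglobal} accepting any |
| number of names might look like this (a variation on the macro above, |
| not a form supplied with NASM; \c{cextern} would follow the same |
| pattern): |
| |
| \c %macro  cglobal 1-* |
| \c |
| \c   %rep  %0 |
| \c     global  _%1 |
| \c     %define %1 _%1 |
| \c   %rotate 1 |
| \c   %endrep |
| \c |
| \c %endmacro |
| |
| With this definition, several hypothetical symbols could be declared |
| at once: |
| |
| \c cglobal myfunc, myvar |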
| |
| If you then declare an external like this: |
| |
| \c cextern printf |
| |
| then the macro will expand it as |
| |
| \c extern _printf |
| \c %define printf _printf |
| |
| Thereafter, you can reference \c{printf} as if it was a symbol, and |
| the preprocessor will put the leading underscore on where necessary. |
| |
| The \c{cglobal} macro works similarly. You must use \c{cglobal} |
| before defining the symbol in question, but you would have had to do |
| that anyway if you used \c{GLOBAL}. |
| |
| Also see \k{opt-pfix}. |
| |
| \S{16cmodels} \i{Memory Models} |
| |
| NASM contains no mechanism to support the various C memory models |
| directly; you have to keep track yourself of which one you are |
| writing for. This means you have to keep track of the following |
| things: |
| |
| \b In models using a single code segment (tiny, small and compact), |
| functions are near. This means that function pointers, when stored |
| in data segments or pushed on the stack as function arguments, are |
| 16 bits long and contain only an offset field (the \c{CS} register |
| never changes its value, and always gives the segment part of the |
| full function address), and that functions are called using ordinary |
| near \c{CALL} instructions and return using \c{RETN} (which, in |
| NASM, is synonymous with \c{RET} anyway). This means both that you |
| should write your own routines to return with \c{RETN}, and that you |
| should call external C routines with near \c{CALL} instructions. |
| |
| \b In models using more than one code segment (medium, large and |
| huge), functions are far. This means that function pointers are 32 |
| bits long (consisting of a 16-bit offset followed by a 16-bit |
| segment), and that functions are called using \c{CALL FAR} (or |
| \c{CALL seg:offset}) and return using \c{RETF}. Again, you should |
| therefore write your own routines to return with \c{RETF} and use |
| \c{CALL FAR} to call external routines. |
| |
| \b In models using a single data segment (tiny, small and medium), |
| data pointers are 16 bits long, containing only an offset field (the |
| \c{DS} register doesn't change its value, and always gives the |
| segment part of the full data item address). |
| |
| \b In models using more than one data segment (compact, large and |
| huge), data pointers are 32 bits long, consisting of a 16-bit offset |
| followed by a 16-bit segment. You should still be careful not to |
| modify \c{DS} in your routines without restoring it afterwards, but |
| \c{ES} is free for you to use to access the contents of 32-bit data |
| pointers you are passed. |
| |
| \b The huge memory model allows single data items to exceed 64K in |
| size. In all other memory models, you can access the whole of a data |
| item just by doing arithmetic on the offset field of the pointer you |
| are given, whether a segment field is present or not; in huge model, |
| you have to be more careful of your pointer arithmetic. |
| |
| \b In most memory models, there is a \e{default} data segment, whose |
| segment address is kept in \c{DS} throughout the program. This data |
| segment is typically the same segment as the stack, kept in \c{SS}, |
| so that functions' local variables (which are stored on the stack) |
| and global data items can both be accessed easily without changing |
| \c{DS}. Particularly large data items are typically stored in other |
| segments. However, some memory models (though not the standard |
| ones, usually) allow the assumption that \c{SS} and \c{DS} hold the |
| same value to be removed. Be careful about functions' local |
| variables in this latter case. |
| |
| In models with a single code segment, the segment is called |
| \i\c{_TEXT}, so your code segment must also go by this name in order |
| to be linked into the same place as the main code segment. In models |
| with a single data segment, or with a default data segment, it is |
| called \i\c{_DATA}. |
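| |
| As an illustration of how the memory model shapes a routine, the |
| following sketch assumes compact model (near code, far data) and a |
| hypothetical C prototype \c{int deref(int *p)}; the name \c{_deref} |
| is invented for this example, and the stack layout it relies on is |
| the one described in \k{16cfunc}: |
| |
| \c segment _TEXT |
| \c |
| \c global _deref |
| \c |
| \c _deref: |
| \c push bp |
| \c mov bp,sp |
| \c les bx,[bp+4] ; BX = offset, ES = segment of p |
| \c mov ax,[es:bx] ; fetch *p through ES, leaving DS alone |
| \c pop bp |
| \c retn ; near return: single code segment |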
| |
| |
| \S{16cfunc} Function Definitions and Function Calls |
| |
| \I{functions, C calling convention}The \i{C calling convention} in |
| 16-bit programs is as follows. In the following description, the |
| words \e{caller} and \e{callee} are used to denote the function |
| doing the calling and the function which gets called. |
| |
| \b The caller pushes the function's parameters on the stack, one |
| after another, in reverse order (right to left, so that the first |
| argument specified to the function is pushed last). |
| |
| \b The caller then executes a \c{CALL} instruction to pass control |
| to the callee. This \c{CALL} is either near or far depending on the |
| memory model. |
| |
| \b The callee receives control, and typically (although this is not |
| actually necessary, in functions which do not need to access their |
| parameters) starts by saving the value of \c{SP} in \c{BP} so as to |
| be able to use \c{BP} as a base pointer to find its parameters on |
| the stack. However, the caller was probably doing this too, so part |
| of the calling convention states that \c{BP} must be preserved by |
| any C function. Hence the callee, if it is going to set up \c{BP} as |
| a \i\e{frame pointer}, must push the previous value first. |
| |
| \b The callee may then access its parameters relative to \c{BP}. |
| The word at \c{[BP]} holds the previous value of \c{BP} as it was |
| pushed; the next word, at \c{[BP+2]}, holds the offset part of the |
| return address, pushed implicitly by \c{CALL}. In a small-model |
| (near) function, the parameters start after that, at \c{[BP+4]}; in |
| a large-model (far) function, the segment part of the return address |
| lives at \c{[BP+4]}, and the parameters begin at \c{[BP+6]}. The |
| leftmost parameter of the function, since it was pushed last, is |
| accessible at this offset from \c{BP}; the others follow, at |
| successively greater offsets. Thus, in a function such as \c{printf} |
| which takes a variable number of parameters, the pushing of the |
| parameters in reverse order means that the function knows where to |
| find its first parameter, which tells it the number and type of the |
| remaining ones. |
| |
| \b The callee may also wish to decrease \c{SP} further, so as to |
| allocate space on the stack for local variables, which will then be |
| accessible at negative offsets from \c{BP}. |
| |
| \b The callee, if it wishes to return a value to the caller, should |
| leave the value in \c{AL}, \c{AX} or \c{DX:AX} depending on the size |
| of the value. Floating-point results are sometimes (depending on the |
| compiler) returned in \c{ST0}. |
| |
| \b Once the callee has finished processing, it restores \c{SP} from |
| \c{BP} if it had allocated local stack space, then pops the previous |
| value of \c{BP}, and returns via \c{RETN} or \c{RETF} depending on |
| memory model. |
| |
| \b When the caller regains control from the callee, the function |
| parameters are still on the stack, so it typically adds an immediate |
| constant to \c{SP} to remove them (instead of executing a number of |
| slow \c{POP} instructions). Thus, if a function is accidentally |
| called with the wrong number of parameters due to a prototype |
| mismatch, the stack will still be returned to a sensible state since |
| the caller, which \e{knows} how many parameters it pushed, does the |
| removing. |
| |
| It is instructive to compare this calling convention with that for |
| Pascal programs (described in \k{16bpfunc}). Pascal has a simpler |
| convention, since no functions have variable numbers of parameters. |
| Therefore the callee knows how many parameters it should have been |
| passed, and is able to deallocate them from the stack itself by |
| passing an immediate argument to the \c{RET} or \c{RETF} |
| instruction, so the caller does not have to do it. Also, the |
| parameters are pushed in left-to-right order, not right-to-left, |
| which means that a compiler can give better guarantees about |
| sequence points without performance suffering. |
| |
| Thus, you would define a function in C style in the following way. |
| The following example is for small model: |
| |
| \c global _myfunc |
| \c |
| \c _myfunc: |
| \c push bp |
| \c mov bp,sp |
| \c sub sp,0x40 ; 64 bytes of local stack space |
| \c mov bx,[bp+4] ; first parameter to function |
| \c |
| \c ; some more code |
| \c |
| \c mov sp,bp ; undo "sub sp,0x40" above |
| \c pop bp |
| \c ret |
| |
| For a large-model function, you would replace \c{RET} by \c{RETF}, |
| and look for the first parameter at \c{[BP+6]} instead of |
| \c{[BP+4]}. Of course, if one of the parameters is a pointer, then |
| the offsets of \e{subsequent} parameters will change depending on |
| the memory model as well: far pointers take up four bytes on the |
| stack when passed as a parameter, whereas near pointers take up two. |
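| |
| Applying those two changes to the small-model example gives the |
| following large-model sketch; everything else is unchanged: |
| |
| \c global _myfunc |
| \c |
| \c _myfunc: |
| \c push bp |
| \c mov bp,sp |
| \c sub sp,0x40 ; 64 bytes of local stack space |
| \c mov bx,[bp+6] ; first parameter to function |
| \c |
| \c ; some more code |
| \c |
| \c mov sp,bp ; undo "sub sp,0x40" above |
| \c pop bp |
| \c retf ; far return for large model |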
| |
| At the other end of the process, to call a C function from your |
| assembly code, you would do something like this: |
| |
| \c extern _printf |
| \c |
| \c ; and then, further down... |
| \c |
| \c push word [myint] ; one of my integer variables |
| \c push word mystring ; pointer into my data segment |
| \c call _printf |
| \c add sp,byte 4 ; `byte' saves space |
| \c |
| \c ; then those data items... |
| \c |
| \c segment _DATA |
| \c |
| \c myint dw 1234 |
| \c mystring db 'This number -> %d <- should be 1234',10,0 |
| |
| This piece of code is the small-model assembly equivalent of the C |
| code |
| |
| \c int myint = 1234; |
| \c printf("This number -> %d <- should be 1234\n", myint); |
| |
| In large model, the function-call code might look more like this. In |
| this example, it is assumed that \c{DS} already holds the segment |
| base of the segment \c{_DATA}. If not, you would have to initialize |
| it first. |
| |
| \c push word [myint] |
| \c push word seg mystring ; Now push the segment, and... |
| \c push word mystring ; ... offset of "mystring" |
| \c call far _printf |
| \c add sp,byte 6 |
| |
| The integer value still takes up one word on the stack, since large |
| model does not affect the size of the \c{int} data type. The first |
| argument (pushed last) to \c{printf}, however, is a data pointer, |
| and therefore has to contain a segment and offset part. The segment |
| should be stored second in memory, and therefore must be pushed |
| first. (Of course, \c{PUSH DS} would have been a shorter instruction |
| than \c{PUSH WORD SEG mystring}, if \c{DS} was set up as the above |
| example assumed.) Then the actual call becomes a far call, since |
| functions expect far calls in large model; and \c{SP} has to be |
| increased by 6 rather than 4 afterwards to make up for the extra |
| word of parameters. |
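| |
| As a sketch of the shorter form just mentioned, again assuming that |
| \c{DS} already addresses \c{_DATA}, the call could be written as: |
| |
| \c push word [myint] |
| \c push ds ; segment of "mystring" |
| \c push word mystring ; offset of "mystring" |
| \c call far _printf |
| \c add sp,byte 6 |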
| |
| |
| \S{16cdata} Accessing Data Items |
| |
| To get at the contents of C variables, or to declare variables which |
| C can access, you need only declare the names as \c{GLOBAL} or |
| \c{EXTERN}. (Again, the names require leading underscores, as stated |
| in \k{16cunder}.) Thus, a C variable declared as \c{int i} can be |
| accessed from assembler as |
| |
| \c extern _i |
| \c |
| \c mov ax,[_i] |
| |
| And to declare your own integer variable which C programs can access |
| as \c{extern int j}, you do this (making sure you are assembling in |
| the \c{_DATA} segment, if necessary): |
| |
| \c global _j |
| \c |
| \c _j dw 0 |
| |
| To access a C array, you need to know the size of the components of |
| the array. For example, \c{int} variables are two bytes long, so if |
| a C program declares an array as \c{int a[10]}, you can access |
| \c{a[3]} by coding \c{mov ax,[_a+6]}. (The byte offset 6 is obtained |
| by multiplying the desired array index, 3, by the size of the array |
| element, 2.) The sizes of the C base types in 16-bit compilers are: |
| 1 for \c{char}, 2 for \c{short} and \c{int}, 4 for \c{long} and |
| \c{float}, and 8 for \c{double}. |
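| |
| When the index is only known at run time, the same multiplication |
| has to be done in code. A minimal sketch, assuming a hypothetical |
| word variable \c{idx} of your own that holds the index: |
| |
| \c extern _a |
| \c |
| \c mov bx,[idx] ; BX = array index |
| \c shl bx,1 ; scale by the element size, 2 |
| \c mov ax,[_a+bx] ; AX = a[idx] |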
| |
| To access a C \i{data structure}, you need to know the offset from |
| the base of the structure to the field you are interested in. You |
| can either do this by converting the C structure definition into a |
| NASM structure definition (using \i\c{STRUC}), or by calculating the |
| one offset and using just that. |
| |
| To do either of these, you should read your C compiler's manual to |
| find out how it organizes data structures. NASM gives no special |
| alignment to structure members in its own \c{STRUC} macro, so you |
| have to specify alignment yourself if the C compiler generates it. |
| Typically, you might find that a structure like |
| |
| \c struct { |
| \c char c; |
| \c int i; |
| \c } foo; |
| |
| might be four bytes long rather than three, since the \c{int} field |
| would be aligned to a two-byte boundary. However, this sort of |
| feature tends to be a configurable option in the C compiler, either |
| using command-line options or \c{#pragma} lines, so you have to find |
| out how your own compiler does it. |
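| |
| Assuming your compiler does pad the structure to four bytes as |
| described, a matching NASM definition might look like the following |
| sketch. The name \c{foo_t} is chosen for this example only: |
| |
| \c struc foo_t |
| \c .c: resb 1 |
| \c alignb 2 ; pad so that .i sits on a two-byte boundary |
| \c .i: resw 1 |
| \c endstruc |
| \c |
| \c extern _foo |
| \c |
| \c mov ax,[_foo+foo_t.i] ; foo.i, at offset 2 |
| |
| If the compiler is configured to pack structures instead, drop the |
| \c{alignb} line so that \c{foo_t.i} becomes 1. |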
| |
| |
| \S{16cmacro} \i\c{c16.mac}: Helper Macros for the 16-bit C Interface |
| |
| Included in the NASM archives, in the \I{misc subdirectory}\c{misc} |
| directory, is a file \c{c16.mac} of macros. It defines three macros: |
| \i\c{proc}, \i\c{arg} and \i\c{endproc}. These are intended to be |
| used for C-style procedure definitions, and they automate a lot of |
| the work involved in keeping track of the calling convention. |
| |
| (An alternative, TASM-compatible form of \c{arg} is also now built |
| into NASM's preprocessor. See \k{stackrel} for details.) |
| |
| An example of an assembly function using the macro set is given |
| here: |
| |
| \c proc _nearproc |
| \c |
| \c %$i arg |
| \c %$j arg |
| \c mov ax,[bp + %$i] |
| \c mov bx,[bp + %$j] |
| \c add ax,[bx] |
| \c |
| \c endproc |
| |
| This defines \c{_nearproc} to be a procedure taking two arguments, |
| the first (\c{i}) an integer and the second (\c{j}) a pointer to an |
| integer. It returns \c{i + *j}. |
| |
| Note that the \c{arg} macro has an \c{EQU} as the first line of its |
| expansion, and since the label before the macro call gets prepended |
| to the first line of the expanded macro, the \c{EQU} works, defining |
| \c{%$i} to be an offset from \c{BP}. A context-local variable is |
| used, local to the context pushed by the \c{proc} macro and popped |
| by the \c{endproc} macro, so that the same argument name can be used |
| in later procedures. Of course, you don't \e{have} to do that. |
| |
| The macro set produces code for near functions (tiny, small and |
| compact-model code) by default. You can have it generate far |
| functions (medium, large and huge-model code) by means of coding |
| \I\c{FARCODE}\c{%define FARCODE}. This changes the kind of return |
| instruction generated by \c{endproc}, and also changes the starting |
| point for the argument offsets. The macro set contains no intrinsic |
| dependency on whether data pointers are far or not. |
| |
| \c{arg} can take an optional parameter, giving the size of the |
| argument. If no size is given, 2 is assumed, since it is likely that |
| many function parameters will be of type \c{int}. |
| |
| The large-model equivalent of the above function would look like this: |
| |
| \c %define FARCODE |
| \c |
| \c proc _farproc |
| \c |
| \c %$i arg |
| \c %$j arg 4 |
| \c mov ax,[bp + %$i] |
| \c mov bx,[bp + %$j] |
| \c mov es,[bp + %$j + 2] |
| \c add ax,[es:bx] |
| \c |
| \c endproc |
| |
| This makes use of the argument to the \c{arg} macro to define a |
| parameter of size 4, because \c{j} is now a far pointer. When we |
| load from \c{j}, we must load a segment and an offset. |
| |
| |
| \H{16bp} Interfacing to \i{Borland Pascal} Programs |
| |
| Interfacing to Borland Pascal programs is similar in concept to |
| interfacing to 16-bit C programs. The differences are: |
| |
| \b The leading underscore required for interfacing to C programs is |
| not required for Pascal. |
| |
| \b The memory model is always large: functions are far, data |
| pointers are far, and no data item can be more than 64K long. |
| (Actually, some functions are near, but only those functions that |
| are local to a Pascal unit and never called from outside it. All |
| assembly functions that Pascal calls, and all Pascal functions that |
| assembly routines are able to call, are far.) However, all static |
| data declared in a Pascal program goes into the default data |
| segment, which is the one whose segment address will be in \c{DS} |
| when control is passed to your assembly code. The only things that |
| do not live in the default data segment are local variables (they |
| live in the stack segment) and dynamically allocated variables. All |
| data \e{pointers}, however, are far. |
| |
| \b The function calling convention is different - described below. |
| |
| \b Some data types, such as strings, are stored differently. |
| |
| \b There are restrictions on the segment names you are allowed to |
| use - Borland Pascal will ignore code or data declared in a segment |
| it doesn't like the name of. The restrictions are described below. |
| |
| |
| \S{16bpfunc} The Pascal Calling Convention |
| |
| \I{functions, Pascal calling convention}\I{Pascal calling |
| convention}The 16-bit Pascal calling convention is as follows. In |
| the following description, the words \e{caller} and \e{callee} are |
| used to denote the function doing the calling and the function which |
| gets called. |
| |
| \b The caller pushes the function's parameters on the stack, one |
| after another, in normal order (left to right, so that the first |
| argument specified to the function is pushed first). |
| |
| \b The caller then executes a far \c{CALL} instruction to pass |
| control to the callee. |
| |
| \b The callee receives control, and typically (although this is not |
| actually necessary, in functions which do not need to access their |
| parameters) starts by saving the value of \c{SP} in \c{BP} so as to |
| be able to use \c{BP} as a base pointer to find its parameters on |
| the stack. However, the caller was probably doing this too, so part |
| of the calling convention states that \c{BP} must be preserved by |
| any function. Hence the callee, if it is going to set up \c{BP} as a |
| \i{frame pointer}, must push the previous value first. |
| |
| \b The callee may then access its parameters relative to \c{BP}. |
| The word at \c{[BP]} holds the previous value of \c{BP} as it was |
| pushed. The next word, at \c{[BP+2]}, holds the offset part of the |
| return address, and the next one at \c{[BP+4]} the segment part. The |
| parameters begin at \c{[BP+6]}. The rightmost parameter of the |
| function, since it was pushed last, is accessible at this offset |
| from \c{BP}; the others follow, at successively greater offsets. |
| |
| \b The callee may also wish to decrease \c{SP} further, so as to |
| allocate space on the stack for local variables, which will then be |
| accessible at negative offsets from \c{BP}. |
| |
| \b The callee, if it wishes to return a value to the caller, should |
| leave the value in \c{AL}, \c{AX} or \c{DX:AX} depending on the size |
| of the value. Floating-point results are returned in \c{ST0}. |
| Results of type \c{Real} (Borland's own custom floating-point data |
| type, not handled directly by the FPU) are returned in \c{DX:BX:AX}. |
| To return a result of type \c{String}, the caller pushes a pointer |
| to a temporary string before pushing the parameters, and the callee |
| places the returned string value at that location. The pointer is |
| not a parameter, and should not be removed from the stack by the |
| \c{RETF} instruction. |
| |
| \b Once the callee has finished processing, it restores \c{SP} from |
| \c{BP} if it had allocated local stack space, then pops the previous |
| value of \c{BP}, and returns via \c{RETF}. It uses the form of |
| \c{RETF} with an immediate parameter, giving the number of bytes |
| taken up by the parameters on the stack. This causes the parameters |
| to be removed from the stack as a side effect of the return |
| instruction. |
| |
| \b When the caller regains control from the callee, the function |
| parameters have already been removed from the stack, so it needs to |
| do nothing further. |
| |
| Thus, you would define a function in Pascal style, taking two |
| \c{Integer}-type parameters, in the following way: |
| |
| \c global myfunc |
| \c |
| \c myfunc: push bp |
| \c mov bp,sp |
| \c sub sp,0x40 ; 64 bytes of local stack space |
| \c mov bx,[bp+8] ; first parameter to function |
| \c mov cx,[bp+6] ; second parameter to function |
| \c |
| \c ; some more code |
| \c |
| \c mov sp,bp ; undo "sub sp,0x40" above |
| \c pop bp |
| \c retf 4 ; total size of params is 4 |
| |
| At the other end of the process, to call a Pascal function from your |
| assembly code, you would do something like this: |
| |
| \c extern SomeFunc |
| \c |
| \c ; and then, further down... |
| \c |
| \c push word seg mystring ; Now push the segment, and... |
| \c push word mystring ; ... offset of "mystring" |
| \c push word [myint] ; one of my variables |
| \c call far SomeFunc |
| |
| This is equivalent to the Pascal code |
| |
| \c procedure SomeFunc(String: PChar; Int: Integer); |
| \c SomeFunc(@mystring, myint); |
| |
| |
| \S{16bpseg} Borland Pascal \I{segment names, Borland Pascal}Segment |
| Name Restrictions |
| |
| Since Borland Pascal's internal unit file format is completely |
| different from \c{OBJ}, it does only a very sketchy job of actually |
| reading and understanding the various information contained in a |
| real \c{OBJ} file when it links that in. Therefore an object file |
| intended to be linked to a Pascal program must obey a number of |
| restrictions: |
| |
| \b Procedures and functions must be in a segment whose name is |
| either \c{CODE}, \c{CSEG}, or something ending in \c{_TEXT}. |
| |
| \b Initialized data must be in a segment whose name is either |
| \c{CONST} or something ending in \c{_DATA}. |
| |
| \b Uninitialized data must be in a segment whose name is either |
| \c{DATA}, \c{DSEG}, or something ending in \c{_BSS}. |
| |
| \b Any other segments in the object file are completely ignored. |
| \c{GROUP} directives and segment attributes are also ignored. |
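| |
| A skeleton that stays within these rules might look like the |
| following sketch. The procedure and variable names are invented for |
| illustration, and the code assumes that \c{DS} addresses the default |
| data segment on entry, as described in \k{16bp}: |
| |
| \c segment CODE |
| \c |
| \c global BumpCount |
| \c |
| \c BumpCount: |
| \c mov ax,[step] |
| \c add [counter],ax |
| \c retf ; no parameters, so nothing to remove |
| \c |
| \c segment CONST |
| \c |
| \c step dw 1 ; initialized data |
| \c |
| \c segment DATA |
| \c |
| \c counter resw 1 ; uninitialized data |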
| |
| |
| \S{16bpmacro} Using \i\c{c16.mac} With Pascal Programs |
| |
| The \c{c16.mac} macro package, described in \k{16cmacro}, can also |
| be used to simplify writing functions to be called from Pascal |
| programs, if you code \I\c{PASCAL}\c{%define PASCAL}. This |
| definition ensures that functions are far (it implies |
| \i\c{FARCODE}), and also causes procedure return instructions to be |
| generated with an operand. |
| |
| Defining \c{PASCAL} does not change the code which calculates the |
| argument offsets; you must declare your function's arguments in |
| reverse order. For example: |
| |
| \c %define PASCAL |
| \c |
| \c proc _pascalproc |
| \c |
| \c %$j arg 4 |
| \c %$i arg |
| \c mov ax,[bp + %$i] |
| \c mov bx,[bp + %$j] |
| \c mov es,[bp + %$j + 2] |
| \c add ax,[es:bx] |
| \c |
| \c endproc |
| |
| This defines the same routine, conceptually, as the example in |
| \k{16cmacro}: it defines a function taking two arguments, an integer |
| and a pointer to an integer, which returns the sum of the integer |
| and the contents of the pointer. The only difference between this |
| code and the large-model C version is that \c{PASCAL} is defined |
| instead of \c{FARCODE}, and that the arguments are declared in |
| reverse order. |
| |
| |
| \C{32bit} Writing 32-bit Code (Unix, Win32, DJGPP) |
| |
| This chapter attempts to cover some of the common issues involved |
| when writing 32-bit code, to run under \i{Win32} or Unix, or to be |
| linked with C code generated by a Unix-style C compiler such as |
| \i{DJGPP}. It covers how to write assembly code to interface with |
| 32-bit C routines, and how to write position-independent code for |
| shared libraries. |
| |
| Almost all 32-bit code, and in particular all code running under |
| \c{Win32}, \c{DJGPP} or any of the PC Unix variants, runs in \I{flat |
| memory model}\e{flat} memory model. This means that the segment registers |
| and paging have already been set up to give you the same 32-bit 4GB |
| address space no matter what segment you work relative to, and that |
| you should ignore all segment registers completely. When writing |
| flat-model application code, you never need to use a segment |
| override or modify any segment register, and the code-section |
| addresses you pass to \c{CALL} and \c{JMP} live in the same address |
| space as the data-section addresses you access your variables by and |
| the stack-section addresses you access local variables and procedure |
| parameters by. Every address is 32 bits long and contains only an |
| offset part. |
| |
| |
| \H{32c} Interfacing to 32-bit C Programs |
| |
| A lot of the discussion in \k{16c}, about interfacing to 16-bit C |
| programs, still applies when working in 32 bits. The absence of |
| memory models or segmentation worries simplifies things a lot. |
| |
| |
| \S{32cunder} External Symbol Names |
| |
| Most 32-bit C compilers share the convention used by 16-bit |
| compilers, that the names of all global symbols (functions or data) |
| they define are formed by prefixing an underscore to the name as it |
| appears in the C program. However, not all of them do: the \c{ELF} |
| specification states that C symbols do \e{not} have a leading |
| underscore on their assembly-language names. |
| |
| The older Linux \c{a.out} C compiler, all \c{Win32} compilers, |
| \c{DJGPP}, and \c{NetBSD} and \c{FreeBSD}, all use the leading |
| underscore; for these compilers, the macros \c{cextern} and |
| \c{cglobal}, as given in \k{16cunder}, will still work. For \c{ELF}, |
| though, the leading underscore should not be used. |
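| |
| One way to keep a single source file assembling for both |
| conventions is to route every C symbol through a macro. In this |
| sketch the names \c{CSYM} and \c{C_UNDERSCORE} are assumptions made |
| for the example; you would define \c{C_UNDERSCORE} (for instance |
| with \c{-dC_UNDERSCORE} on the command line) only when targeting a |
| compiler that prefixes underscores: |
| |
| \c %ifdef C_UNDERSCORE |
| \c %define CSYM(name) _ %+ name |
| \c %else |
| \c %define CSYM(name) name |
| \c %endif |
| \c |
| \c extern CSYM(printf) |
| \c |
| \c call CSYM(printf) |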
| |
| See also \k{opt-pfix}. |
| |
| \S{32cfunc} Function Definitions and Function Calls |
| |
| \I{functions, C calling convention}The \i{C calling convention} |
| in 32-bit programs is as follows. In the following description, |
| the words \e{caller} and \e{callee} are used to denote |
| the function doing the calling and the function which gets called. |
| |
| \b The caller pushes the function's parameters on the stack, one |
| after another, in reverse order (right to left, so that the first |
| argument specified to the function is pushed last). |
| |
| \b The caller then executes a near \c{CALL} instruction to pass |
| control to the callee. |
| |
| \b The callee receives control, and typically (although this is not |
| actually necessary, in functions which do not need to access their |
| parameters) starts by saving the value of \c{ESP} in \c{EBP} so as |
| to be able to use \c{EBP} as a base pointer to find its parameters |
| on the stack. However, the caller was probably doing this too, so |
| part of the calling convention states that \c{EBP} must be preserved |
| by any C function. Hence the callee, if it is going to set up |
| \c{EBP} as a \i{frame pointer}, must push the previous value first. |
| |
| \b The callee may then access its parameters relative to \c{EBP}. |
| The doubleword at \c{[EBP]} holds the previous value of \c{EBP} as |
| it was pushed; the next doubleword, at \c{[EBP+4]}, holds the return |
| address, pushed implicitly by \c{CALL}. The parameters start after |
| that, at \c{[EBP+8]}. The leftmost parameter of the function, since |
| it was pushed last, is accessible at this offset from \c{EBP}; the |
| others follow, at successively greater offsets. Thus, in a function |
| such as \c{printf} which takes a variable number of parameters, the |
| pushing of the parameters in reverse order means that the function |
| knows where to find its first parameter, which tells it the number |
| and type of the remaining ones. |
| |
| \b The callee may also wish to decrease \c{ESP} further, so as to |
| allocate space on the stack for local variables, which will then be |
| accessible at negative offsets from \c{EBP}. |
| |
| \b The callee, if it wishes to return a value to the caller, should |
| leave the value in \c{AL}, \c{AX} or \c{EAX} depending on the size |
| of the value. Floating-point results are typically returned in |
| \c{ST0}. |
| |
| \b Once the callee has finished processing, it restores \c{ESP} from |
| \c{EBP} if it had allocated local stack space, then pops the previous |
| value of \c{EBP}, and returns via \c{RET} (equivalently, \c{RETN}). |
| |
| \b When the caller regains control from the callee, the function |
| parameters are still on the stack, so it typically adds an immediate |
| constant to \c{ESP} to remove them (instead of executing a number of |
| slow \c{POP} instructions). Thus, if a function is accidentally |
| called with the wrong number of parameters due to a prototype |
| mismatch, the stack will still be returned to a sensible state since |
| the caller, which \e{knows} how many parameters it pushed, does the |
| removing. |
| |
| There is an alternative calling convention used by Win32 programs |
| for Windows API calls, and also for functions called \e{by} the |
| Windows API such as window procedures: they follow what Microsoft |
| calls the \c{__stdcall} convention. This is slightly closer to the |
| Pascal convention, in that the callee clears the stack by passing a |
| parameter to the \c{RET} instruction. However, the parameters are |
| still pushed in right-to-left order. |
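| |
| For instance, a \c{__stdcall} function taking two doubleword |
| parameters could be sketched as below. The name \c{_addtwo@8} is |
| hypothetical; it follows the decoration that Microsoft tools usually |
| apply to \c{__stdcall} names (a leading underscore, then \c{@} and |
| the parameter byte count), which you should verify for your own |
| toolchain: |
| |
| \c global _addtwo@8 |
| \c |
| \c _addtwo@8: |
| \c push ebp |
| \c mov ebp,esp |
| \c mov eax,[ebp+8] ; first parameter |
| \c add eax,[ebp+12] ; second parameter |
| \c pop ebp |
| \c ret 8 ; callee removes both parameters |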
| |
| Thus, you would define a function in C style in the following way: |
| |
| \c global _myfunc |
| \c |
| \c _myfunc: |
| \c push ebp |
| \c mov ebp,esp |
| \c sub esp,0x40 ; 64 bytes of local stack space |
| \c mov ebx,[ebp+8] ; first parameter to function |
| \c |
| \c ; some more code |
| \c |
| \c leave ; mov esp,ebp / pop ebp |
| \c ret |
| |
| At the other end of the process, to call a C function from your |
| assembly code, you would do something like this: |
| |
| \c extern _printf |
| \c |
| \c ; and then, further down... |
| \c |
| \c push dword [myint] ; one of my integer variables |
| \c push dword mystring ; pointer into my data segment |
| \c call _printf |
| \c add esp,byte 8 ; `byte' saves space |
| \c |
| \c ; then those data items... |
| \c |
| \c segment _DATA |
| \c |
| \c myint dd 1234 |
| \c mystring db 'This number -> %d <- should be 1234',10,0 |
| |
| This piece of code is the assembly equivalent of the C code |
| |
| \c int myint = 1234; |
| \c printf("This number -> %d <- should be 1234\n", myint); |
| |
| |
| \S{32cdata} Accessing Data Items |
| |
| To get at the contents of C variables, or to declare variables which |
| C can access, you need only declare the names as \c{GLOBAL} or |
| \c{EXTERN}. (Again, the names require leading underscores on |
| platforms that use them, as stated in \k{32cunder}.) Thus, a C |
| variable declared as \c{int i} can be accessed from assembler as |
| |
| \c extern _i |
| \c mov eax,[_i] |
| |
| And to declare your own integer variable which C programs can access |
| as \c{extern int j}, you do this (making sure you are assembling in |
| the \c{_DATA} segment, if necessary): |
| |
| \c global _j |
| \c _j dd 0 |
| |
| To access a C array, you need to know the size of the components of |
| the array. For example, \c{int} variables are four bytes long, so if |
| a C program declares an array as \c{int a[10]}, you can access |
| \c{a[3]} by coding \c{mov eax,[_a+12]}. (The byte offset 12 is obtained |
| by multiplying the desired array index, 3, by the size of the array |
| element, 4.) The sizes of the C base types in 32-bit compilers are: |
| 1 for \c{char}, 2 for \c{short}, 4 for \c{int}, \c{long} and |
| \c{float}, and 8 for \c{double}. Pointers, being 32-bit addresses, |
| are also 4 bytes long. |
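| |
| When the index is only known at run time, 32-bit addressing lets |
| the processor do the scaling for you. A minimal sketch, assuming a |
| hypothetical doubleword variable \c{idx} of your own that holds the |
| index: |
| |
| \c extern _a |
| \c |
| \c mov ebx,[idx] ; EBX = array index |
| \c mov eax,[_a+ebx*4] ; EAX = a[idx], scaled by the element size |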
| |
| To access a C \i{data structure}, you need to know the offset from |
| the base of the structure to the field you are interested in. You |
| can either do this by converting the C structure definition into a |
| NASM structure definition (using \c{STRUC}), or by calculating the |
| one offset and using just that. |
| |
| To do either of these, you should read your C compiler's manual to |
| find out how it organizes data structures. NASM gives no special |
| alignment to structure members in its own \i\c{STRUC} macro, so you |
| have to specify alignment yourself if the C compiler generates it. |
| Typically, you might find that a structure like |
| |
| \c struct { |
| \c char c; |
| \c int i; |
| \c } foo; |
| |
| might be eight bytes long rather than five, since the \c{int} field |
| would be aligned to a four-byte boundary. However, this sort of |
| feature is sometimes a configurable option in the C compiler, either |
| using command-line options or \c{#pragma} lines, so you have to find |
| out how your own compiler does it. |
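| |
| Assuming your compiler does pad the structure to eight bytes as |
| described, a matching NASM definition might look like the following |
| sketch. The name \c{foo_t} is chosen for this example only: |
| |
| \c struc foo_t |
| \c .c: resb 1 |
| \c alignb 4 ; pad so that .i sits on a four-byte boundary |
| \c .i: resd 1 |
| \c endstruc |
| \c |
| \c extern _foo |
| \c |
| \c mov eax,[_foo+foo_t.i] ; foo.i, at offset 4 |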
| |
| |
| \S{32cmacro} \i\c{c32.mac}: Helper Macros for the 32-bit C Interface |
| |
| Included in the NASM archives, in the \I{misc directory}\c{misc} |
| directory, is a file \c{c32.mac} of macros. It defines three macros: |
| \i\c{proc}, \i\c{arg} and \i\c{endproc}. These are intended to be |
| used for C-style procedure definitions, and they automate a lot of |
| the work involved in keeping track of the calling convention. |
| |
| An example of an assembly function using the macro set is given |
| here: |
| |
| \c proc _proc32 |
| \c |
| \c %$i arg |
| \c %$j arg |
| \c mov eax,[ebp + %$i] |
| \c mov ebx,[ebp + %$j] |
| \c add eax,[ebx] |
| \c |
| \c endproc |
| |
| This defines \c{_proc32} to be a procedure taking two arguments, the |
| first (\c{i}) an integer and the second (\c{j}) a pointer to an |
| integer. It returns \c{i + *j}. |
| |
| Note that the \c{arg} macro has an \c{EQU} as the first line of its |
| expansion, and since the label before the macro call gets prepended |
| to the first line of the expanded macro, the \c{EQU} works, defining |
| \c{%$i} to be an offset from \c{EBP}. A context-local variable is |
| used, local to the context pushed by the \c{proc} macro and popped |
| by the \c{endproc} macro, so that the same argument name can be used |
| in later procedures. Of course, you don't \e{have} to do that. |
| |
| \c{arg} can take an optional parameter, giving the size of the |
| argument. If no size is given, 4 is assumed, since it is likely that |
| many function parameters will be of type \c{int} or pointers. |
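| |
| For instance, a hypothetical function taking a 64-bit integer |
| followed by an \c{int} might declare its arguments like this. This is |
| only a sketch: the name \c{_add64} is made up, and it assumes that |
| \c{c32.mac} is on the include path and that the compiler returns |
| 64-bit values in \c{EDX:EAX}: |
| |
| \c %include "c32.mac" |
| \c |
| \c proc _add64 |
| \c |
| \c %$v arg 8                       ; 64-bit first argument |
| \c %$n arg                         ; int, default size 4 |
| \c         mov eax,[ebp + %$v]     ; low dword of v |
| \c         mov edx,[ebp + %$v + 4] ; high dword of v |
| \c         add eax,[ebp + %$n]     ; add n to the low half, as unsigned |
| \c         adc edx,0               ; propagate the carry |
| \c |
| \c endproc |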
| |
| |
| \H{picdll} Writing NetBSD/FreeBSD/OpenBSD and Linux/ELF \i{Shared |
| Libraries} |
| |
| \c{ELF} replaced the older \c{a.out} object file format under Linux |
| because it contains support for \i{position-independent code} |
| (\i{PIC}), which makes writing shared libraries much easier. NASM |
| supports the \c{ELF} position-independent code features, so you can |
| write Linux \c{ELF} shared libraries in NASM. |
| |
| \i{NetBSD}, and its close cousins \i{FreeBSD} and \i{OpenBSD}, take |
| a different approach by hacking PIC support into the \c{a.out} |
| format. NASM supports this as the \i\c{aoutb} output format, so you |
| can write \i{BSD} shared libraries in NASM too. |
| |
| The operating system loads a PIC shared library by memory-mapping |
| the library file at an arbitrarily chosen point in the address space |
| of the running process. The contents of the library's code section |
| must therefore not depend on where it is loaded in memory. |
| |
| Therefore, you cannot get at your variables by writing code like |
| this: |
| |
| \c mov eax,[myvar] ; WRONG |
| |
| Instead, the linker provides an area of memory called the |
| \i\e{global offset table}, or \i{GOT}; the GOT is situated at a |
| constant distance from your library's code, so if you can find out |
| where your library is loaded (which is typically done using a |
| \c{CALL} and \c{POP} combination), you can obtain the address of the |
| GOT, and you can then load the addresses of your variables out of |
| linker-generated entries in the GOT. |
| |
| The \e{data} section of a PIC shared library does not have these |
| restrictions: since the data section is writable, it has to be |
| copied into memory anyway rather than just paged in from the library |
| file, so as long as it's being copied it can be relocated too. So |
| you can put ordinary types of relocation in the data section without |
| too much worry (but see \k{picglobal} for a caveat). |
| |
| |
| \S{picgot} Obtaining the Address of the GOT |
| |
| Each code module in your shared library should define the GOT as an |
| external symbol: |
| |
| \c extern _GLOBAL_OFFSET_TABLE_ ; in ELF |
| \c extern __GLOBAL_OFFSET_TABLE_ ; in BSD a.out |
| |
| At the beginning of any function in your shared library which plans |
| to access your data or BSS sections, you must first calculate the |
| address of the GOT. This is typically done by writing the function |
| in this form: |
| |
| \c func: push ebp |
| \c mov ebp,esp |
| \c push ebx |
| \c call .get_GOT |
| \c .get_GOT: |
| \c pop ebx |
| \c add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc |
| \c |
| \c ; the function body comes here |
| \c |
| \c mov ebx,[ebp-4] |
| \c mov esp,ebp |
| \c pop ebp |
| \c ret |
| |
| (For BSD, again, the symbol \c{_GLOBAL_OFFSET_TABLE_} requires a |
| second leading underscore.) |
| |
| The first two lines of this function are simply the standard C |
| prologue to set up a stack frame, and the last three lines are |
| standard C function epilogue. The third line, and the fourth to last |
| line, save and restore the \c{EBX} register, because PIC shared |
| libraries use this register to store the address of the GOT. |
| |
| The interesting bit is the \c{CALL} instruction and the following |
| two lines. The \c{CALL} and \c{POP} combination obtains the address |
| of the label \c{.get_GOT}, without having to know in advance where |
| the program was loaded (since the \c{CALL} instruction is encoded |
| relative to the current position). The \c{ADD} instruction makes use |
| of one of the special PIC relocation types: \i{GOTPC relocation}. |
| With the \i\c{WRT ..gotpc} qualifier specified, the symbol |
| referenced (here \c{_GLOBAL_OFFSET_TABLE_}, the special symbol |
| assigned to the GOT) is given as an offset from the beginning of the |
| section. (Actually, \c{ELF} encodes it as the offset from the operand |
| field of the \c{ADD} instruction, but NASM simplifies this |
| deliberately, so you do things the same way for both \c{ELF} and |
| \c{BSD}.) So the instruction then \e{adds} the beginning of the section, |
| to get the real address of the GOT, and subtracts the value of |
| \c{.get_GOT} which it knows is in \c{EBX}. Therefore, by the time |
| that instruction has finished, \c{EBX} contains the address of the GOT. |
| |
| If you didn't follow that, don't worry: it's never necessary to |
| obtain the address of the GOT by any other means, so you can put |
| those three instructions into a macro and safely ignore them: |
| |
| \c %macro get_GOT 0 |
| \c |
| \c call %%getgot |
| \c %%getgot: |
| \c pop ebx |
| \c add ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc |
| \c |
| \c %endmacro |
| |
| \S{piclocal} Finding Your Local Data Items |
| |
| Having got the GOT, you can then use it to obtain the addresses of |
| your data items. Most variables will reside in the sections you have |
| declared; they can be accessed using the \I{GOTOFF |
| relocation}\c{..gotoff} special \I\c{WRT ..gotoff}\c{WRT} type. It |
| works like this: |
| |
| \c lea eax,[ebx+myvar wrt ..gotoff] |
| |
| The expression \c{myvar wrt ..gotoff} is calculated, when the shared |
| library is linked, to be the offset to the local variable \c{myvar} |
| from the beginning of the GOT. Therefore, adding it to \c{EBX} as |
| above will place the real address of \c{myvar} in \c{EAX}. |
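| |
| Putting the pieces together, a function that returns the value of a |
| local variable might look roughly like this in \c{ELF}. The names |
| \c{get_myvar} and \c{myvar} are illustrative only, and the stack |
| frame is omitted for brevity: |
| |
| \c         extern _GLOBAL_OFFSET_TABLE_ |
| \c |
| \c         section .data |
| \c myvar:  dd 42 |
| \c |
| \c         section .text |
| \c get_myvar: |
| \c         push ebx                            ; EBX must be preserved |
| \c         call .get_GOT |
| \c .get_GOT: |
| \c         pop ebx |
| \c         add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc |
| \c         mov eax,[ebx + myvar wrt ..gotoff]  ; load the value itself |
| \c         pop ebx |
| \c         ret |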
| |
| If you declare variables as \c{GLOBAL} without specifying a size for |
| them, they are shared between code modules in the library, but do |
| not get exported from the library to the program that loaded it. |
| They will still be in your ordinary data and BSS sections, so you |
| can access them in the same way as local variables, using the above |
| \c{..gotoff} mechanism. |
| |
| Note that due to a peculiarity of the way BSD \c{a.out} format |
| handles this relocation type, there must be at least one non-local |
| symbol in the same section as the address you're trying to access. |
| |
| |
| \S{picextern} Finding External and Common Data Items |
| |
| If your library needs to get at an external variable (external to |
| the \e{library}, not just to one of the modules within it), you must |
| use the \I{GOT relocations}\I\c{WRT ..got}\c{..got} type to get at |
| it. The \c{..got} type, instead of giving you the offset from the |
| GOT base to the variable, gives you the offset from the GOT base to |
| a GOT \e{entry} containing the address of the variable. The linker |
| will set up this GOT entry when it builds the library, and the |
| dynamic linker will place the correct address in it at load time. So |
| to obtain the address of an external variable \c{extvar} in \c{EAX}, |
| you would code |
| |
| \c mov eax,[ebx+extvar wrt ..got] |
| |
| This loads the address of \c{extvar} out of an entry in the GOT. The |
| linker, when it builds the shared library, collects together every |
| relocation of type \c{..got}, and builds the GOT so as to ensure it |
| has every necessary entry present. |
| |
| Common variables must also be accessed in this way. |
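| |
| Since a \c{..got} reference yields an address rather than the data, |
| reading the variable itself takes a second load. A minimal sketch, |
| assuming \c{EBX} already holds the GOT address and \c{extvar} is a |
| 32-bit integer: |
| |
| \c         extern extvar |
| \c |
| \c         mov eax,[ebx + extvar wrt ..got]    ; EAX = address of extvar |
| \c         mov eax,[eax]                       ; EAX = value of extvar |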
| |
| |
| \S{picglobal} Exporting Symbols to the Library User |
| |
| If you want to export symbols to the user of the library, you have |
| to declare whether they are functions or data, and if they are data, |
| you have to give the size of the data item. This is because the |
| dynamic linker has to build \I{PLT}\i{procedure linkage table} |
| entries for any exported functions, and also moves exported data |
| items away from the library's data section in which they were |
| declared. |
| |
| So to export a function to users of the library, you must use |
| |
| \c global func:function ; declare it as a function |
| \c |
| \c func: push ebp |
| \c |
| \c ; etc. |
| |
| And to export a data item such as an array, you would have to code |
| |
| \c global array:data array.end-array ; give the size too |
| \c |
| \c array: resd 128 |
| \c .end: |
| |
| Be careful: If you export a variable to the library user, by |
| declaring it as \c{GLOBAL} and supplying a size, the variable will |
| end up living in the data section of the main program, rather than |
| in your library's data section, where you declared it. So you will |
| have to access your own global variable with the \c{..got} mechanism |
| rather than \c{..gotoff}, as if it were external (which, |
| effectively, it has become). |
| |
| Equally, if you need to store the address of an exported global in |
| one of your data sections, you can't do it by means of the standard |
| sort of code: |
| |
| \c dataptr: dd global_data_item ; WRONG |
| |
| NASM will interpret this code as an ordinary relocation, in which |
| \c{global_data_item} is merely an offset from the beginning of the |
| \c{.data} section (or whatever); so this reference will end up |
| pointing at your data section instead of at the exported global |
| which resides elsewhere. |
| |
| Instead of the above code, then, you must write |
| |
| \c dataptr: dd global_data_item wrt ..sym |
| |
| which makes use of the special \c{WRT} type \I\c{WRT ..sym}\c{..sym} |
| to instruct NASM to search the symbol table for a particular symbol |
| at that address, rather than just relocating by section base. |
| |
| Either method will work for functions: referring to one of your |
| functions by means of |
| |
| \c funcptr: dd my_function |
| |
| will give the user the address of the code you wrote, whereas |
| |
| \c funcptr: dd my_function wrt ..sym |
| |
| will give the address of the procedure linkage table for the |
| function, which is where the calling program will \e{believe} the |
| function lives. Either address is a valid way to call the function. |
| |
| |
| \S{picproc} Calling Procedures Outside the Library |
| |
| Calling procedures outside your shared library has to be done by |
| means of a \i\e{procedure linkage table}, or \i{PLT}. The PLT is |
| placed at a known offset from where the library is loaded, so the |
| library code can make calls to the PLT in a position-independent |
| way. Within the PLT there is code to jump to offsets contained in |
| the GOT, so function calls to other shared libraries or to routines |
| in the main program can be transparently passed off to their real |
| destinations. |
| |
| To call an external routine, you must use another special PIC |
| relocation type, \I{PLT relocations}\i\c{WRT ..plt}. This is much |
| easier than the GOT-based ones: you simply replace calls such as |
| \c{CALL printf} with the PLT-relative version \c{CALL printf WRT |
| ..plt}. |
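| |
| As an illustration, a small \c{ELF} library routine that prints a |
| string through the C library might be sketched as below. The names |
| \c{say_hello} and \c{msg} are hypothetical. Note that on 32-bit |
| \c{ELF} the PLT code expects \c{EBX} to hold the GOT address at the |
| time of the call: |
| |
| \c         extern _GLOBAL_OFFSET_TABLE_ |
| \c         extern puts |
| \c         global say_hello:function |
| \c |
| \c         section .data |
| \c msg:    db "hello",0 |
| \c |
| \c         section .text |
| \c say_hello: |
| \c         push ebx |
| \c         call .get_GOT |
| \c .get_GOT: |
| \c         pop ebx |
| \c         add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc |
| \c         lea eax,[ebx + msg wrt ..gotoff]    ; address of msg |
| \c         push eax                            ; argument for puts |
| \c         call puts wrt ..plt                 ; EBX holds the GOT here |
| \c         add esp,4                           ; discard the argument |
| \c         pop ebx |
| \c         ret |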
| |
| |
| \S{link} Generating the Library File |
| |
| Having written some code modules and assembled them to \c{.o} files, |
| you then generate your shared library with a command such as |
| |
| \c ld -shared -o library.so module1.o module2.o # for ELF |
| \c ld -Bshareable -o library.so module1.o module2.o # for BSD |
| |
| For ELF, if your shared library is going to reside in system |
| directories such as \c{/usr/lib} or \c{/lib}, it is usually worth |
| using the \i\c{-soname} flag to the linker, to store the final |
| library file name, with a version number, into the library: |
| |
| \c ld -shared -soname library.so.1 -o library.so.1.2 *.o |
| |
| You would then copy \c{library.so.1.2} into the library directory, |
| and create \c{library.so.1} as a symbolic link to it. |
| |
| |
| \C{mixsize} Mixing 16- and 32-bit Code |
| |
| This chapter tries to cover some of the issues, largely related to |
| unusual forms of addressing and jump instructions, encountered when |
| writing operating system code such as protected-mode initialisation |
| routines, which require code that operates in mixed segment sizes, |
| such as code in a 16-bit segment trying to modify data in a 32-bit |
| one, or jumps between different-size segments. |
| |
| |
| \H{mixjump} Mixed-Size Jumps\I{jumps, mixed-size} |
| |
| \I{operating system, writing}\I{writing operating systems}The most |
| common form of \i{mixed-size instruction} is the one used when |
| writing a 32-bit OS: having done your setup in 16-bit mode, such as |
| loading the kernel, you then have to boot it by switching into |
| protected mode and jumping to the 32-bit kernel start address. In a |
| fully 32-bit OS, this tends to be the \e{only} mixed-size |
| instruction you need, since everything before it can be done in pure |
| 16-bit code, and everything after it can be pure 32-bit. |
| |
| This jump must specify a 48-bit far address, since the target |
| segment is a 32-bit one. However, it must be assembled in a 16-bit |
| segment, so just coding, for example, |
| |
| \c jmp 0x1234:0x56789ABC ; wrong! |
| |
| will not work, since the offset part of the address will be |
| truncated to \c{0x9ABC} and the jump will be an ordinary 16-bit far |
| one. |
| |
| The Linux kernel setup code gets round the inability of \c{as86} to |
| generate the required instruction by coding it manually, using |
| \c{DB} instructions. NASM can go one better than that, by actually |
| generating the right instruction itself. Here's how to do it right: |
| |
| \c jmp dword 0x1234:0x56789ABC ; right |
| |
| \I\c{JMP DWORD}The \c{DWORD} prefix (strictly speaking, it should |
| come \e{after} the colon, since it is declaring the \e{offset} field |
| to be a doubleword; but NASM will accept either form, since both are |
| unambiguous) forces the offset part to be treated as 32 bits wide, on |
| the assumption that you are deliberately writing a jump from a 16-bit |
| segment to a 32-bit one. |
| |
| You can do the reverse operation, jumping from a 32-bit segment to a |
| 16-bit one, by means of the \c{WORD} prefix: |
| |
| \c jmp word 0x8765:0x4321 ; 32 to 16 bit |
| |
| If the \c{WORD} prefix is specified in 16-bit mode, or the \c{DWORD} |
| prefix in 32-bit mode, they will be ignored, since each is |
| explicitly forcing NASM into a mode it was in anyway. |
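| |
| To put the 16-to-32-bit jump in context, here is a minimal sketch of |
| a switch into protected mode, intended for the \c{bin} format. It |
| assumes the code is loaded at \c{0x7C00}, uses a made-up GDT in which |
| selector \c{0x08} is a flat 32-bit code segment and \c{0x10} a flat |
| data segment, and ignores details such as the A20 line: |
| |
| \c         bits 16 |
| \c         org 0x7C00 |
| \c |
| \c start:  cli |
| \c         xor ax,ax |
| \c         mov ds,ax |
| \c         lgdt [gdt_desc] |
| \c         mov eax,cr0 |
| \c         or al,1                 ; set the PE bit |
| \c         mov cr0,eax |
| \c         jmp dword 0x08:start32  ; the one mixed-size instruction |
| \c |
| \c         bits 32 |
| \c start32: |
| \c         mov ax,0x10 |
| \c         mov ds,ax |
| \c hang:   jmp hang |
| \c |
| \c gdt:    dq 0                    ; null descriptor |
| \c         dq 0x00CF9A000000FFFF   ; 0x08: flat 32-bit code |
| \c         dq 0x00CF92000000FFFF   ; 0x10: flat 32-bit data |
| \c gdt_desc: |
| \c         dw gdt_desc-gdt-1       ; limit |
| \c         dd gdt                  ; linear base address |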
| |
| |
| \H{mixaddr} Addressing Between Different-Size Segments\I{addressing, |
| mixed-size}\I{mixed-size addressing} |
| |
| If your OS is mixed 16- and 32-bit, or if you are writing a DOS |
| extender, you are likely to have to deal with some 16-bit segments |
| and some 32-bit ones. At some point, you will probably end up |
| writing code in a 16-bit segment which has to access data in a |
| 32-bit segment, or vice versa. |
| |
| If the data you are trying to access in a 32-bit segment lies within |
| the first 64K of the segment, you may be able to get away with using |
| an ordinary 16-bit addressing operation for the purpose; but sooner |
| or later, you will want to do 32-bit addressing from 16-bit mode. |
| |
| The easiest way to do this is to make sure you use a register for |
| the address, since any effective address containing a 32-bit |
| register is forced to be a 32-bit address. So you can do |
| |
| \c mov eax,offset_into_32_bit_segment_specified_by_fs |
| \c mov dword [fs:eax],0x11223344 |
| |
| This is fine, but slightly cumbersome (since it wastes an |
| instruction and a register) if you already know the precise offset |
| you are aiming at. The x86 architecture does allow 32-bit effective |
| addresses to specify nothing but a 4-byte offset, so why shouldn't |
| NASM be able to generate the best instruction for the purpose? |
| |
| It can. As in \k{mixjump}, you need only prefix the address with the |
| \c{DWORD} keyword, and it will be forced to be a 32-bit address: |
| |
| \c mov dword [fs:dword my_offset],0x11223344 |
| |
| Also as in \k{mixjump}, NASM is not fussy about whether the |
| \c{DWORD} prefix comes before or after the segment override, so |
| arguably a nicer-looking way to code the above instruction is |
| |
| \c mov dword [dword fs:my_offset],0x11223344 |
| |
| Don't confuse the \c{DWORD} prefix \e{outside} the square brackets, |
| which controls the size of the data stored at the address, with the |
| one \e{inside} the square brackets which controls the length of the |
| address itself. The two can quite easily be different: |
| |
| \c mov word [dword 0x12345678],0x9ABC |
| |
| This moves 16 bits of data to an address specified by a 32-bit |
| offset. |
| |
| You can also specify \c{WORD} or \c{DWORD} prefixes along with the |
| \c{FAR} prefix to indirect far jumps or calls. For example: |
| |
| \c call dword far [fs:word 0x4321] |
| |
| This instruction contains an address specified by a 16-bit offset; |
| it loads a 48-bit far pointer from that (16-bit segment and 32-bit |
| offset), and calls that address. |
| |
| |
| \H{mixother} Other Mixed-Size Instructions |
| |
| Another way you might want to access data is by using the |
| string instructions (\c{LODSx}, \c{STOSx} and so on) or the |
| \c{XLATB} instruction. These instructions, since they take no |
| parameters, might seem to have no easy way to make them perform |
| 32-bit addressing when assembled in a 16-bit segment. |
| |
| This is the purpose of NASM's \i\c{a16}, \i\c{a32} and \i\c{a64} prefixes. If |
| you are coding \c{LODSB} in a 16-bit segment but it is supposed to |
| be accessing a string in a 32-bit segment, you should load the |
| desired address into \c{ESI} and then code |
| |
| \c a32 lodsb |
| |
| The prefix forces the addressing size to 32 bits, meaning that |
| \c{LODSB} loads from \c{[DS:ESI]} instead of \c{[DS:SI]}. To access |
| a string in a 16-bit segment when coding in a 32-bit one, the |
| corresponding \c{a16} prefix can be used. |
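| |
| The same applies to repeated string instructions, where the address |
| size also selects the count register. As a sketch, 16-bit code |
| copying a block between 32-bit segments might be written as follows; |
| the offsets and the length are arbitrary illustrative values, and |
| \c{DS} and \c{ES} are assumed to be loaded already: |
| |
| \c         bits 16 |
| \c |
| \c         mov esi,0x00120000      ; source offset (illustrative) |
| \c         mov edi,0x00340000      ; destination offset (illustrative) |
| \c         mov ecx,4096            ; byte count (illustrative) |
| \c         cld |
| \c         a32 rep movsb           ; uses DS:ESI, ES:EDI and ECX |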
| |
| The \c{a16}, \c{a32} and \c{a64} prefixes can be applied to any instruction |
| in NASM's instruction table, but most of them can generate all the |
| useful forms without them. The prefixes are necessary only for |
| instructions with implicit addressing: |
| \# \c{CMPSx} (\k{insCMPSB}), |
| \# \c{SCASx} (\k{insSCASB}), \c{LODSx} (\k{insLODSB}), \c{STOSx} |
| \# (\k{insSTOSB}), \c{MOVSx} (\k{insMOVSB}), \c{INSx} (\k{insINSB}), |
| \# \c{OUTSx} (\k{insOUTSB}), and \c{XLATB} (\k{insXLATB}). |
| \c{CMPSx}, \c{SCASx}, \c{LODSx}, \c{STOSx}, \c{MOVSx}, \c{INSx}, |
| \c{OUTSx}, and \c{XLATB}. |
| Also, the various push and pop instructions (\c{PUSHA} and \c{POPF} |
| as well as the more usual \c{PUSH} and \c{POP}) can accept \c{a16}, |
| \c{a32} or \c{a64} prefixes to force a particular one of \c{SP}, |
| \c{ESP} or \c{RSP} to be used as a stack pointer, in case the stack |
| segment in use is a different size from the code segment. |
| |
| \c{PUSH} and \c{POP}, when applied to segment registers in 32-bit |
| mode, also have the slightly odd behaviour that they push and pop 4 |
| bytes at a time, of which the top two are ignored and the bottom two |
| give the value of the segment register being manipulated. To force |
| the 16-bit behaviour of segment-register push and pop instructions, |
| you can use the operand-size prefix \i\c{o16}: |
| |
| \c o16 push ss |
| \c o16 push ds |
| |
| This code saves a doubleword of stack space by fitting two segment |
| registers into the space which would normally be consumed by pushing |
| one. |
| |
| (You can also use the \i\c{o32} prefix to force the 32-bit behaviour |
| when in 16-bit mode, but this seems less useful.) |
| |
| |
| \C{64bit} Writing 64-bit Code (Unix, Win64) |
| |
| This chapter attempts to cover some of the common issues involved when |
| writing 64-bit code, to run under \i{Win64} or Unix. It covers how to |
| write assembly code to interface with 64-bit C routines, and how to |
| write position-independent code for shared libraries. |
| |
| All 64-bit code uses a flat memory model, since segmentation is not |
| available in 64-bit mode. The one exception is the \c{FS} and \c{GS} |
| registers, which still add their bases. |
| |
| Position independence in 64-bit mode is significantly simpler, since |
| the processor supports \c{RIP}-relative addressing directly; see the |
| \c{REL} keyword (\k{effaddr}). On most 64-bit platforms, it is |
| probably desirable to make that the default, using the directive |
| \c{DEFAULT REL} (\k{default}). |
| |
| 64-bit programming is relatively similar to 32-bit programming, but |
| of course pointers are 64 bits long; additionally, all existing |
| platforms pass arguments in registers rather than on the stack. |
| Furthermore, 64-bit platforms use SSE2 by default for floating point. |
| Please see the ABI documentation for your platform. |
| |
| 64-bit platforms differ in the sizes of the C/C++ fundamental |
| datatypes, not just from 32-bit platforms but from each other. If a |
| specific size data type is desired, it is probably best to use the |
| types defined in the standard C header \c{<inttypes.h>}. |
| |
| All known 64-bit platforms except some embedded platforms require that |
| the stack is 16-byte aligned at the point of a \c{CALL} instruction. |
| Since the \c{CALL} pushes an 8-byte return address, the stack pointer |
| (\c{RSP}) is on an \e{odd} multiple of 8 bytes at the entry to a |
| function. |
| |
| In 64-bit mode, the default operand size is still 32 bits. When |
| loading a value into a 32-bit register (but not an 8- or 16-bit |
| register), the upper 32 bits of the corresponding 64-bit register are |
| set to zero. |
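| |
| The following fragment illustrates the difference; the comments give |
| the register contents after each instruction: |
| |
| \c         bits 64 |
| \c |
| \c         mov rax,-1              ; RAX = 0xFFFFFFFFFFFFFFFF |
| \c         mov eax,1               ; RAX = 0x0000000000000001 |
| \c         mov rbx,-1              ; RBX = 0xFFFFFFFFFFFFFFFF |
| \c         mov bx,1                ; RBX = 0xFFFFFFFFFFFF0001 |
| \c         mov bl,2                ; RBX = 0xFFFFFFFFFFFF0002 |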
| |
| \H{reg64} Register Names in 64-bit Mode |
| |
| NASM uses the following names for general-purpose registers in 64-bit |
| mode, for 8-, 16-, 32- and 64-bit references, respectively: |
| |
| \c AL/AH, CL/CH, DL/DH, BL/BH, SPL, BPL, SIL, DIL, R8B-R15B |
| \c AX, CX, DX, BX, SP, BP, SI, DI, R8W-R15W |
| \c EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, R8D-R15D |
| \c RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8-R15 |
| |
| This is consistent with the AMD documentation and most other |
| assemblers. The Intel documentation, however, uses the names |
| \c{R8L-R15L} for 8-bit references to the higher registers. It is |
| possible to use those names by defining them as macros; similarly, |
| if one wants to use numeric names for the low 8 registers, define them |
| as macros. The standard macro package \c{altreg} (see \k{pkg_altreg}) |
| can be used for this purpose. |
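| |
| A brief sketch of what this looks like in practice, assuming the |
| \c{altreg} package is available in your version of NASM: |
| |
| \c %use altreg |
| \c |
| \c         bits 64 |
| \c |
| \c         mov r0l,r3h             ; same as mov al,bh |
| \c         mov r8l,r0l             ; same as mov r8b,al |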
| |
| \H{id64} Immediates and Displacements in 64-bit Mode |
| |
| In 64-bit mode, immediates and displacements are generally only 32 |
| bits wide. NASM will therefore truncate most displacements and |
| immediates to 32 bits. |
| |
| The only instruction which takes a full \i{64-bit immediate} is: |
| |
| \c MOV reg64,imm64 |
| |
| NASM will produce this instruction whenever the programmer uses |
| \c{MOV} with an immediate into a 64-bit register. If this is not |
| desirable, simply specify the equivalent 32-bit register, which will |
| be automatically zero-extended by the processor, or specify the |
| immediate as \c{DWORD}: |
| |
| \c mov rax,foo ; 64-bit immediate |
| \c mov rax,qword foo ; (identical) |
| \c mov eax,foo ; 32-bit immediate, zero-extended |
| \c mov rax,dword foo ; 32-bit immediate, sign-extended |
| |
| The lengths of these instructions are 10, 5 and 7 bytes, respectively. |
| |
| If optimization is enabled and NASM can determine at assembly time |
| that a shorter instruction will suffice, the shorter instruction will |
| be emitted unless of course \c{STRICT QWORD} or \c{STRICT DWORD} is |
| specified (see \k{strict}): |
| |
| \c mov rax,1 ; Assembles as "mov eax,1" (5 bytes) |
| \c mov rax,strict qword 1 ; Full 10-byte instruction |
| \c mov rax,strict dword 1 ; 7-byte instruction |
| \c mov rax,symbol ; 10 bytes, not known at assembly time |
| \c lea rax,[rel symbol] ; 7 bytes, usually preferred by the ABI |
| |
| Note that \c{lea rax,[rel symbol]} is position-independent, whereas |
| \c{mov rax,symbol} is not. Most ABIs prefer or even require |
| position-independent code in 64-bit mode. However, the \c{MOV} |
| instruction is able to reference a symbol anywhere in the 64-bit |
| address space, whereas \c{LEA} is only able to access a symbol |
| within 2 GB of the instruction itself (see below). |
| |
| The only instructions which take a full \I{64-bit displacement}64-bit |
| \e{displacement} are loads and stores, using \c{MOV}, of \c{AL}, \c{AX}, |
| \c{EAX} or \c{RAX} (but no other registers) from or to an absolute 64-bit address. |
| Since this is a relatively rarely used instruction (64-bit code generally uses |
| relative addressing), the programmer has to explicitly declare the |
| displacement size as \c{ABS QWORD}: |
| |
| \c default abs |
| \c |
| \c mov eax,[foo] ; 32-bit absolute disp, sign-extended |
| \c mov eax,[a32 foo] ; 32-bit absolute disp, zero-extended |
| \c mov eax,[qword foo] ; 64-bit absolute disp |
| \c |
| \c default rel |
| \c |
| \c mov eax,[foo] ; 32-bit relative disp |
| \c mov eax,[a32 foo] ; ditto, address truncated to 32 bits(!) |
| \c mov eax,[qword foo] ; error |
| \c mov eax,[abs qword foo] ; 64-bit absolute disp |
| |
| A sign-extended absolute displacement can access from -2 GB to +2 GB; |
| a zero-extended absolute displacement can access from 0 to 4 GB. |
| |
| \H{unix64} Interfacing to 64-bit C Programs (Unix) |
| |
| On Unix, the 64-bit ABI as well as the x32 ABI (32-bit ABI with the |
| CPU in 64-bit mode) is defined by the documents at: |
| |
| \W{http://www.nasm.us/abi/unix64}\c{http://www.nasm.us/abi/unix64} |
| |
| Although written for AT&T-syntax assembly, the concepts apply equally |
| well for NASM-style assembly. What follows is a simplified summary. |
| |
| The first six integer arguments (from the left) are passed in \c{RDI}, |
| \c{RSI}, \c{RDX}, \c{RCX}, \c{R8}, and \c{R9}, in that order. |
| Additional integer arguments are passed on the stack. These |
| registers, plus \c{RAX}, \c{R10} and \c{R11} are destroyed by function |
| calls, and thus are available for use by the function without saving. |
| |
| Integer return values are passed in \c{RAX} and \c{RDX}, in that order. |
| |
| Floating point is done using SSE registers, except for \c{long |
| double}, which is 80 bits (\c{TWORD}) on most platforms (Android is |
| one exception; there \c{long double} is 64 bits and treated the same |
| as \c{double}.) Floating-point arguments are passed in \c{XMM0} to |
| \c{XMM7}; return is \c{XMM0} and \c{XMM1}. \c{long double} values are |
| passed on the stack, and returned in \c{ST0} and \c{ST1}. |
| |
| All SSE and x87 registers are destroyed by function calls. |
| |
| On 64-bit Unix, \c{long} is 64 bits. |
| |
| Integer and SSE register arguments are counted separately, so for the case of |
| |
| \c void foo(long a, double b, int c) |
| |
| \c{a} is passed in \c{RDI}, \c{b} in \c{XMM0}, and \c{c} in \c{ESI}. |
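| |
| As a sketch of how such arguments might be consumed, here is a |
| hypothetical leaf function \c{sum3}, declared in C as |
| \c{long sum3(long a, double b, int c)}, which returns the sum of its |
| arguments with \c{b} truncated to an integer. The name and the |
| behaviour are invented purely for illustration: |
| |
| \c         bits 64 |
| \c         global sum3 |
| \c |
| \c         section .text |
| \c sum3:   cvttsd2si rax,xmm0      ; b, truncated to an integer |
| \c         add rax,rdi             ; + a |
| \c         movsxd rsi,esi          ; sign-extend c to 64 bits |
| \c         add rax,rsi             ; + c |
| \c         ret                     ; result in RAX |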
| |
| \H{win64} Interfacing to 64-bit C Programs (Win64) |
| |
| The Win64 ABI is described by the document at: |
| |
| \W{http://www.nasm.us/abi/win64}\c{http://www.nasm.us/abi/win64} |
| |
| What follows is a simplified summary. |
| |
| The first four integer arguments are passed in \c{RCX}, \c{RDX}, |
| \c{R8} and \c{R9}, in that order. Additional integer arguments are |
| passed on the stack. These registers, plus \c{RAX}, \c{R10} and |
| \c{R11} are destroyed by function calls, and thus are available for |
| use by the function without saving. |
| |
| Integer return values are passed in \c{RAX} only. |
| |
| Floating point is done using SSE registers, except for \c{long |
| double}. Floating-point arguments are passed in \c{XMM0} to \c{XMM3}; |
| return is \c{XMM0} only. |
| |
| On Win64, \c{long} is 32 bits; \c{long long} or \c{_int64} is 64 bits. |
| |
| Integer and SSE register arguments are counted together, so for the case of |
| |
| \c void foo(long long a, double b, int c) |
| |
| \c{a} is passed in \c{RCX}, \c{b} in \c{XMM1}, and \c{c} in \c{R8D}. |
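| |
| As a sketch of how such arguments might be consumed on Win64, here |
| is a hypothetical leaf function \c{sum3}, declared in C as |
| \c{long long sum3(long long a, double b, int c)}, which returns the |
| sum of its arguments with \c{b} truncated to an integer. The name and |
| the behaviour are invented purely for illustration: |
| |
| \c         bits 64 |
| \c         global sum3 |
| \c |
| \c         section .text |
| \c sum3:   cvttsd2si rax,xmm1      ; b is the second argument: XMM1 |
| \c         add rax,rcx             ; + a |
| \c         movsxd r8,r8d           ; sign-extend c to 64 bits |
| \c         add rax,r8              ; + c |
| \c         ret                     ; result in RAX |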
| |
| \C{trouble} Troubleshooting |
| |
| This chapter describes some of the common problems that users have |
| been known to encounter with NASM, and answers them. If you think you |
| have found a bug in NASM, please see \k{bugs}. |
| |
| |
| \H{problems} Common Problems |
| |
| \S{inefficient} NASM Generates \i{Inefficient Code} |
| |
| We sometimes get `bug' reports about NASM generating inefficient, or |
| even `wrong', code on instructions such as \c{ADD ESP,8}. This is a |
| deliberate design feature, connected to predictability of output: |
| NASM, on seeing \c{ADD ESP,8}, will generate the form of the |
| instruction which leaves room for a 32-bit offset. You need to code |
| \I\c{BYTE}\c{ADD ESP,BYTE 8} if you want the space-efficient form of |
| the instruction. This isn't a bug, it's user error: if you prefer to |
| have NASM produce the more efficient code automatically, enable |
| optimization with the \c{-O} option (see \k{opt-O}). |
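| |
| For reference, the encodings in question look like this in 32-bit |
| code. The first comment assumes that optimization has been disabled |
| (for example with \c{-O0}): |
| |
| \c         bits 32 |
| \c |
| \c         add esp,8               ; unoptimized: 81 C4 08 00 00 00 |
| \c         add esp,byte 8          ; 83 C4 08 |
| \c         add esp,strict dword 8  ; always 81 C4 08 00 00 00 |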
| |
| |
| \S{jmprange} My Jumps are Out of Range\I{out of range, jumps} |
| |
| Similarly, people complain that when they issue \i{conditional |
| jumps} (which are \c{SHORT} by default) that try to jump too far, |
| NASM reports `short jump out of range' instead of making the jumps |
| longer. |
| |
| This, again, is partly a predictability issue, but in fact has a |
| more practical reason as well. NASM has no means of being told what |
| type of processor the code it is generating will be run on; so it |
| cannot decide for itself that it should generate \i\c{Jcc NEAR} type |
| instructions, because it doesn't know that it's working for a 386 or |
| above. Alternatively, it could replace the out-of-range short |
| \c{JNE} instruction with a very short \c{JE} instruction that jumps |
| over a \c{JMP NEAR}; this is a sensible solution for processors |
| below a 386, but hardly efficient on processors which have good |
| branch prediction \e{and} could have used \c{JNE NEAR} instead. So, |
| once again, it's up to the user, not the assembler, to decide what |
| instructions should be generated. See \k{opt-O}. |
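| |
| The two alternatives described above might be written by hand as |
| follows, where \c{target} stands for some label too far away for a |
| short jump and \c{skip} is an illustrative label name: |
| |
| \c         jne near target         ; 386 or later: a single instruction |
| \c |
| \c         je short skip           ; any x86: inverted condition... |
| \c         jmp near target         ; ...skipping an unconditional jump |
| \c skip: |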
| |
| |
| \S{proborg} \i\c{ORG} Doesn't Work |
| |
| People writing \i{boot sector} programs in the \c{bin} format often |
| complain that \c{ORG} doesn't work the way they'd like: in order to |
| place the \c{0xAA55} signature word at the end of a 512-byte boot |
| sector, people who are used to MASM tend to code |
| |
| \c ORG 0 |
| \c |
| \c ; some boot sector code |
| \c |
| \c ORG 510 |
| \c DW 0xAA55 |
| |
| This is not the intended use of the \c{ORG} directive in NASM, and |
| will not work. The correct way to solve this problem in NASM is to |
| use the \i\c{TIMES} directive, like this: |
| |
| \c ORG 0 |
| \c |
| \c ; some boot sector code |
| \c |
| \c TIMES 510-($-$$) DB 0 |
| \c DW 0xAA55 |
| |
| The \c{TIMES} directive will insert exactly enough zero bytes into |
| the output to move the assembly point up to 510. This method also |
| has the advantage that if you accidentally fill your boot sector too |
| full, NASM will catch the problem at assembly time and report it, so |
| you won't end up with a boot sector that you have to disassemble to |
| find out what's wrong with it. |
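| |
| A complete, if useless, boot sector along these lines might look |
| like the sketch below. It assumes the conventional BIOS load address |
| of \c{0x7C00} rather than an origin of 0, and simply halts; when |
| assembled with \c{-f bin} it should produce exactly 512 bytes: |
| |
| \c         bits 16 |
| \c         org 0x7C00              ; conventional BIOS load address |
| \c |
| \c start:  cli |
| \c hang:   hlt |
| \c         jmp hang |
| \c |
| \c         times 510-($-$$) db 0   ; pad up to offset 510 |
| \c         dw 0xAA55               ; boot signature |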
| |
| |
| \S{probtimes} \i\c{TIMES} Doesn't Work |
| |
| The other common problem with the above code is people who write the |
| \c{TIMES} line as |
| |
| \c TIMES 510-$ DB 0 |
| |
| by reasoning that \c{$} should be a pure number, just like 510, so |
| the difference between them is also a pure number and can happily be |
| fed to \c{TIMES}. |
| |
| NASM is a \e{modular} assembler: the various component parts are |
| designed to be easily separable for re-use, so they don't exchange |
| information unnecessarily. In consequence, the \c{bin} output |
| format, even though it has been told by the \c{ORG} directive that |
| the \c{.text} section should start at 0, does not pass that |
| information back to the expression evaluator. So from the |
| evaluator's point of view, \c{$} isn't a pure number: it's an offset |
| from a section base. Therefore the difference between \c{$} and 510 |
| is also not a pure number, but involves a section base. Values |
| involving section bases cannot be passed as arguments to \c{TIMES}. |
| |
| The solution, as in the previous section, is to code the \c{TIMES} |
| line in the form |
| |
| \c TIMES 510-($-$$) DB 0 |
| |
| in which \c{$} and \c{$$} are offsets from the same section base, |
| and so their difference is a pure number. This will solve the |
| problem and generate sensible code. |
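| |
| As a sketch of why this form is robust, consider a minimal boot sector |
| assembled with a non-zero origin. The label \c{start}, the origin |
| \c{0x7C00} and the three instructions are illustrative assumptions, |
| not taken from the text above. The three instructions occupy four |
| bytes, so \c{$-$$} evaluates to 4 at the \c{TIMES} line whatever |
| value is given to \c{ORG}, and 506 zero bytes are emitted: |
| |
| \c         ORG 0x7C00              ; assumed load address |
| \c |
| \c start:  cli                     ; 1 byte |
| \c         hlt                     ; 1 byte |
| \c         jmp short start         ; 2 bytes |
| \c |
| \c         TIMES 510-($-$$) DB 0   ; 510-4 = 506 bytes of padding |
| \c         DW 0xAA55               ; signature at offsets 510 and 511 |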
| |
| \A{ndisasm} \i{Ndisasm} |
| |
| The Netwide Disassembler, NDISASM |
| |
| \H{ndisintro} Introduction |
| |
| |
| The Netwide Disassembler is a small companion program to the Netwide |
| Assembler, NASM. It seemed a shame to have an x86 assembler, |
| complete with a full instruction table, and not make as much use of |
| it as possible, so here's a disassembler which shares the |
| instruction table (and some other bits of code) with NASM. |
| |
| The Netwide Disassembler does nothing except produce |
| disassemblies of \e{binary} source files. Unlike \c{objdump}, NDISASM |
| has no understanding of object file formats, and unlike \c{debug} it |
| will not understand \c{DOS .EXE} files. It just disassembles. |
| |
| |
| \H{ndisrun} Running NDISASM |
| |
| To disassemble a file, you will typically use a command of the form |
| |
| \c ndisasm -b {16|32|64} filename |
| |
| NDISASM can disassemble 16-, 32- or 64-bit code equally easily, |
| provided of course that you remember to specify which it is to work |
| with. If no \i\c{-b} switch is present, NDISASM works in 16-bit mode |
| by default. The \i\c{-u} switch (for USE32) also invokes 32-bit mode. |
| |
| Two more command line options are \i\c{-r} which reports the version |
| number of NDISASM you are running, and \i\c{-h} which gives a short |
| summary of command line options. |
| |
| |
| \S{ndiscom} COM Files: Specifying an Origin |
| |
| To disassemble a \c{DOS .COM} file correctly, a disassembler must assume |
| that the first instruction in the file is loaded at address \c{0x100}, |
| rather than at zero. NDISASM, which assumes by default that any file |
| you give it is loaded at zero, will therefore need to be informed of |
| this. |
| |
| The \i\c{-o} option allows you to declare a different origin for the |
| file you are disassembling. Its argument may be expressed in any of |
| the NASM numeric formats: it is decimal by default; if it begins with |
| `\c{$}' or `\c{0x}' or ends in `\c{H}' it is \c{hex}; if it ends in |
| `\c{Q}' it is \c{octal}; and if it ends in `\c{B}' it is \c{binary}. |
| |
| Hence, to disassemble a \c{.COM} file: |
| |
| \c ndisasm -o100h filename.com |
| |
| will do the trick. |
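| |
| Since the origin may be written in any of these formats, the |
| following invocations should be equivalent to the one above |
| (\c{filename.com} is a placeholder name): |
| |
| \c ndisasm -o0x100 filename.com |
| \c ndisasm -o256 filename.com |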
| |
| |
| \S{ndissync} Code Following Data: Synchronisation |
| |
| Suppose you are disassembling a file which contains some data which |
| isn't machine code, and \e{then} contains some machine code. NDISASM |
| will faithfully plough through the data section, producing machine |
| instructions wherever it can (although most of them will look |
| bizarre, and some may have unusual prefixes, e.g. `\c{FS OR AX,0x240A}'), |
| and generating `DB' instructions every so often if it's totally stumped. |
| Then it will reach the code section. |
| |
| Suppose NDISASM has just finished generating a strange machine |
| instruction from part of the data section, and its file position is |
| now one byte \e{before} the beginning of the code section. It's |
| entirely possible that another spurious instruction will get |
| generated, starting with the final byte of the data section, and |
| then the correct first instruction in the code section will not be |
| seen because the starting point skipped over it. This isn't really |
| ideal. |
| |
| To avoid this, you can specify a `\i{synchronisation}' point, or indeed |
| as many synchronisation points as you like (although NDISASM can |
| only handle 2147483647 sync points internally). The definition of a sync |
| point is this: NDISASM guarantees to hit sync points exactly during |
| disassembly. If it is thinking about generating an instruction which |
| would cause it to jump over a sync point, it will discard that |
| instruction and output a `\c{db}' instead. So it \e{will} start |
| disassembly exactly from the sync point, and so you \e{will} see all |
| the instructions in your code section. |
| |
| Sync points are specified using the \i\c{-s} option: they are measured |
| in terms of the program origin, not the file position. So if you |
| want to synchronise after 32 bytes of a \c{.COM} file, you would have to |
| do |
| |
| \c ndisasm -o100h -s120h file.com |
| |
| rather than |
| |
| \c ndisasm -o100h -s20h file.com |
| |
| As stated above, you can specify multiple sync markers if you need |
| to, just by repeating the \c{-s} option. |
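| |
| For instance, assuming a \c{.COM} file whose code resumes 32 bytes and |
| again 128 bytes into the file (both offsets are made up for |
| illustration), each sync address is the origin plus the file offset, |
| i.e. \c{100h+20h = 120h} and \c{100h+80h = 180h}: |
| |
| \c ndisasm -o100h -s120h -s180h file.com |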
| |
| |
| \S{ndisisync} Mixed Code and Data: Automatic (Intelligent) Synchronisation |
| \I\c{auto-sync} |
| |
| Suppose you are disassembling the boot sector of a \c{DOS} floppy (maybe |
| it has a virus, and you need to understand the virus so that you |
| know what kinds of damage it might have done you). Typically, this |
| will contain a \c{JMP} instruction, then some data, then the rest of the |
| code. So there is a very good chance of NDISASM being \e{misaligned} |
| when the data ends and the code begins. Hence a sync point is |
| needed. |
| |
| On the other hand, why should you have to specify the sync point |
| manually? What you'd do in order to find where the sync point would |
| be, surely, would be to read the \c{JMP} instruction, and then to use |
| its target address as a sync point. So can NDISASM do that for you? |
| |
| The answer, of course, is yes: using either of the synonymous |
| switches \i\c{-a} (for automatic sync) or \i\c{-i} (for intelligent |
| sync) will enable \c{auto-sync} mode. Auto-sync mode automatically |
| generates a sync point for any forward-referring PC-relative jump or |
| call instruction that NDISASM encounters. (Since NDISASM is one-pass, |
| if it encounters a PC-relative jump whose target has already been |
| processed, there isn't much it can do about it...) |
| |
| Only PC-relative jumps are processed, since an absolute jump is |
| either through a register (in which case NDISASM doesn't know what |
| the register contains) or involves a segment address (in which case |
| the target code isn't in the same segment that NDISASM is working |
| in, and so the sync point can't be placed anywhere useful). |
| |
| For some kinds of file, this mechanism will automatically put sync |
| points in all the right places, and save you from having to place |
| any sync points manually. However, it should be stressed that |
| auto-sync mode is \e{not} guaranteed to catch all the sync points, and |
| you may still have to place some manually. |
| |
| Auto-sync mode doesn't prevent you from declaring manual sync |
| points: it just adds automatically generated ones to the ones you |
| provide. It's perfectly feasible to specify \c{-i} \e{and} some \c{-s} |
| options. |
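| |
| As an illustration, the following combines auto-sync with one manual |
| sync point; the address \c{140h} and the file name are hypothetical: |
| |
| \c ndisasm -o100h -i -s140h file.com |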
| |
| Another caveat with auto-sync mode is that if, by some unpleasant |
| fluke, something in your data section should disassemble to a |
| PC-relative call or jump instruction, NDISASM may obediently place a |
| sync point in a totally random place, for example in the middle of |
| one of the instructions in your code section. So you may end up with |
| a wrong disassembly even if you use auto-sync. Again, there isn't |
| much I can do about this. If you have problems, you'll have to use |
| manual sync points, or use the \c{-k} option (documented below) to |
| suppress disassembly of the data area. |
| |
| |
| \S{ndisother} Other Options |
| |
| The \i\c{-e} option skips a header on the file, by ignoring the first N |
| bytes. This means that the header is \e{not} counted towards the |
| disassembly offset: if you give \c{-e10 -o10}, disassembly will start |
| at byte 10 in the file, and this will be given offset 10, not 20. |
| |
| The \i\c{-k} option is provided with two comma-separated numeric |
| arguments, the first of which is an assembly offset and the second |
| is a number of bytes to skip. This \e{will} count the skipped bytes |
| towards the assembly offset: its use is to suppress disassembly of a |
| data section which wouldn't contain anything you wanted to see |
| anyway. |
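| |
| As a sketch, suppose a boot sector starts with a three-byte jump |
| followed by 59 bytes of data (both figures are assumptions made for |
| the example, as is the file name \c{boot.bin}). The data could be |
| skipped with |
| |
| \c ndisasm -k3,59 boot.bin |
| |
| so that disassembly resumes at offset 62 (3 + 59), the skipped bytes |
| having been counted towards the offset. |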
| |
| |
| \A{inslist} \i{Instruction List} |
| |
| \H{inslistintro} Introduction |
| |
| The following sections show the instructions which NASM currently supports. For each |
| instruction, there is a separate entry for each supported addressing mode. The third |
| column shows the processor type in which the instruction was introduced and, |
| when appropriate, one or more usage flags. |
| |
| \& inslist.src |
| |
| \A{changelog} \i{NASM Version History} |
| |
| \& changes.src |
| |
| \A{source} Building NASM from Source |
| |
| The source code for NASM is available from our website, |
| \W{http://www.nasm.us/}\c{http://www.nasm.us/}; see \k{website}. |
| |
| \H{tarball} Building from a Source Archive |
| |
| The source archives available on the web site should be capable of |
| building on a number of platforms. This is the recommended method for |
| building NASM to support platforms for which executables are not |
| available. |
| |
| On a system which has a Unix shell (\c{sh}), run: |
| |
| \c sh configure |
| \c make everything |
| |
| A number of options can be passed to \c{configure}; see |
| \c{sh configure --help}. |
| |
| A set of Makefiles for some other environments is also available; |
| please see the file \c{Mkfiles/README}. |
| |
| To build the installer for the Windows platform, you will need the |
| \i\e{Nullsoft Scriptable Installer}, \i{NSIS}, installed. |
| |
| To build the documentation, you will need a set of additional tools. |
| The documentation is not likely to be able to build on non-Unix |
| systems. |
| |
| \H{git} Building from the \i\c{git} Repository |
| |
| The NASM development tree is kept in a source code repository using |
| the \c{git} distributed source control system. The link is available |
| on the website. This is recommended only if you wish to participate |
| in the development of NASM or to assist with testing the development |
| code. |
| |
| To build NASM from the \c{git} repository you will need a Perl |
| interpreter and, if building on a Unix system, GNU autoconf. |
| |
| To build on a Unix system, run: |
| |
| \c sh autogen.sh |
| |
| to create the \c{configure} script and then build as listed above. |
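| |
| Putting the steps from this section and \k{tarball} together, a |
| complete build from a \c{git} checkout on a Unix system would look |
| something like this (assuming the commands are run from the top of |
| the source tree): |
| |
| \c sh autogen.sh |
| \c sh configure |
| \c make everything |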
| |
| \A{contact} Contact Information |
| |
| \H{website} Website |
| |
| NASM has a \i{website} at |
| \W{http://www.nasm.us/}\c{http://www.nasm.us/}. |
| |
| \i{New releases}, \i{release candidates}, and \I{snapshots, daily |
| development}\i{daily development snapshots} of NASM are available from |
| the official web site in source form as well as binaries for a number |
| of common platforms. |
| |
| \S{forums} User Forums |
| |
| Users of NASM may find the Forums on the website useful. These are, |
| however, not frequented much by the developers of NASM, so they are |
| not suitable for reporting bugs. |
| |
| \S{develcom} Development Community |
| |
| The development of NASM is coordinated primarily through the |
| \i\c{nasm-devel} mailing list. If you wish to participate in |
| development of NASM, please join this mailing list. Subscription |
| links and archives of past posts are available on the website. |
| |
| \H{bugs} \i{Reporting Bugs}\I{bugs} |
| |
| To report bugs in NASM, please use the \i{bug tracker} at |
| \W{http://www.nasm.us/}\c{http://www.nasm.us/} (click on "Bug |
| Tracker"), or if that fails then through one of the contacts in |
| \k{website}. |
| |
| Please read \k{qstart} first, and don't report the bug if it's |
| listed in there as a deliberate feature. (If you think the feature |
| is badly thought out, feel free to send us reasons why you think it |
| should be changed, but don't just send us mail saying `This is a |
| bug' if the documentation says we did it on purpose.) Then read |
| \k{problems}, and don't bother reporting the bug if it's listed |
| there. |
| |
| If you do report a bug, \e{please} make sure your bug report includes |
| the following information: |
| |
| \b What operating system you're running NASM under. Linux, |
| FreeBSD, NetBSD, MacOS X, Win16, Win32, Win64, MS-DOS, OS/2, VMS, |
| whatever. |
| |
| \b Whether you compiled your own executable from a source archive, |
| compiled your own executable from \c{git}, used the standard |
| distribution binaries from the website, or got an executable from |
| somewhere else (e.g. a Linux distribution). If you were using a |
| locally built executable, try to reproduce the problem using one of |
| the standard binaries, as this will make it easier for us to |
| reproduce your problem prior to fixing it. |
| |
| \b Which version of NASM you're using, and exactly how you invoked |
| it. Give us the precise command line, and the contents of the |
| \c{NASMENV} environment variable if any. |
| |
| \b Which versions of any supplementary programs you're using, and |
| how you invoked them. If the problem only becomes visible at link |
| time, tell us what linker you're using, what version of it you've |
| got, and the exact linker command line. If the problem involves |
| linking against object files generated by a compiler, tell us what |
| compiler, what version, and what command line or options you used. |
| (If you're compiling in an IDE, please try to reproduce the problem |
| with the command-line version of the compiler.) |
| |
| \b If at all possible, send us a NASM source file which exhibits the |
| problem. If this causes copyright problems (e.g. you can only |
| reproduce the bug in restricted-distribution code) then bear in mind |
| the following two points: firstly, we guarantee that any source code |
| sent to us for the purposes of debugging NASM will be used \e{only} |
| for the purposes of debugging NASM, and that we will delete all our |
| copies of it as soon as we have found and fixed the bug or bugs in |
| question; and secondly, we would prefer \e{not} to be mailed large |
| chunks of code anyway. The smaller the file, the better. A |
| three-line sample file that does nothing useful \e{except} |
| demonstrate the problem is much easier to work with than a |
| fully fledged ten-thousand-line program. (Of course, some errors |
| \e{do} only crop up in large files, so this may not be possible.) |
| |
| \b A description of what the problem actually \e{is}. `It doesn't |
| work' is \e{not} a helpful description! Please describe exactly what |
| is happening that shouldn't be, or what isn't happening that should. |
| Examples might be: `NASM generates an error message saying Line 3 |
| for an error that's actually on Line 5'; `NASM generates an error |
| message that I believe it shouldn't be generating at all'; `NASM |
| fails to generate an error message that I believe it \e{should} be |
| generating'; `the object file produced from this source code crashes |
| my linker'; `the ninth byte of the output file is 66 and I think it |
| should be 77 instead'. |
| |
| \b If you believe the output file from NASM to be faulty, send it to |
| us. That allows us to determine whether our own copy of NASM |
| generates the same file, or whether the problem is related to |
| portability issues between our development platforms and yours. We |
| can handle binary files mailed to us as MIME attachments, uuencoded, |
| and even BinHex. Alternatively, we may be able to provide an FTP |
| site you can upload the suspect files to; but mailing them is easier |
| for us. |
| |
| \b Any other information or data files that might be helpful. If, |
| for example, the problem involves NASM failing to generate an object |
| file while TASM can generate an equivalent file without trouble, |
| then send us \e{both} object files, so we can see what TASM is doing |
| differently from us. |