This doc describes how SuperSize breaks down native binaries into symbols.
Native symbols are those with a
.data.rel.ro (data that is read-only after ELF relocations are applied)
.bss (symbols that are zero-initialized). These consume no space in the binary, and so are generally ignored despite still being collected.
There are 3 modes that SuperSize can use to break an ELF down into symbols:
linker_map - Uses linker map + build directory to create symbols.
dwarf - Uses debug information to create symbols.
sections - Creates one symbol for each ELF section.
The linker_map mode produces the largest number of symbols, and thus is the preferred mode. Information provided only by this mode:
"some string dat...").
** merge strings entries, which tell us where string tables exist within
object_path, which is useful for attributing STL usages to individual source files.
build.ninja is parsed to get:
.a files that were inputs to the linker.
.a) files are parsed:
nm to get symbol list.
nm to get list of string literals.
llvm-bcanalyzer to get list of string literals.
nm to get list of symbol names that were identical-code-folded to the same address.
-Wl,-Map=output.map) parsed to get:
** merge strings entries).
.o file) associated with each symbol
object_path points to a hashed filename within the thinlto cache (not useful).
\0 bytes and creating string literal symbols for each string therein.
.o files define each symbol (match up the names).
.o files define the same string literal.
.h files are never listed as sources. No information about inlined symbols is gathered (by design).
nm reports multiple symbols mapping to the same address.
source_path by removing generated path prefix (and adding
FLAG_GENERATED) when applicable.
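The step above that splits string tables on \0 bytes can be illustrated with a short sketch. This is not SuperSize's actual code: the function name `split_string_table` and the `(address, size, text)` tuple layout are assumptions made only for this example.

```python
def split_string_table(data, base_address):
    """Yields (address, size, text) for each NUL-terminated string in data.

    Hypothetical helper: `data` holds the raw bytes of one merged string
    table and `base_address` is the address at which that table starts.
    """
    offset = 0
    while offset < len(data):
        end = data.find(b'\0', offset)
        # A missing terminator means the remainder is one unterminated string.
        next_offset = len(data) if end == -1 else end + 1
        # The size includes the trailing NUL byte when one is present.
        size = next_offset - offset
        raw = data[offset:next_offset].rstrip(b'\0')
        yield base_address + offset, size, raw.decode('utf-8', errors='replace')
        offset = next_offset


literals = list(split_string_table(b'abc\0de\0', 0x100))
```

Each yielded tuple would then become one string literal symbol. Counting the terminator in the size is a design choice of this sketch, made so that the sizes of all literals add up to the size of the table.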
The dwarf mode creates symbols using only an ELF with debug information enabled. It requires the compiler flag -gmlt to enable full source paths (rather than just basename).
nm (this could have been done at the same time as the previous step, but is done as a separate step in order to share logic with
dwarfdump to find all DW_AT_ranges entries and create a map of address range -> source path.
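A map of address range -> source path, as described above, can be queried with a binary search. The sketch below is only an illustration under stated assumptions: the function names, the half-open `(start, end, source_path)` tuples, and the sample addresses are all made up for this example, and the ranges are assumed not to overlap. It does not parse real dwarfdump output.

```python
import bisect


def build_range_map(ranges):
    """Returns (start, end, source_path) tuples sorted by start address.

    Hypothetical layout: `end` is exclusive and ranges do not overlap.
    """
    return sorted(ranges)


def source_path_for_address(range_map, address):
    """Returns the source path whose range contains address, or None."""
    starts = [r[0] for r in range_map]
    # Index of the last range that starts at or before the address.
    idx = bisect.bisect_right(starts, address) - 1
    if idx >= 0:
        start, end, path = range_map[idx]
        if start <= address < end:
            return path
    return None


# Made-up sample data standing in for ranges extracted from debug info.
range_map = build_range_map([
    (0x2000, 0x2010, 'b.cc'),
    (0x1000, 0x1040, 'a.cc'),
])
```

With such a map, each symbol address collected in the earlier step can be assigned a source path by a single lookup. Addresses that fall in a gap between ranges get no path.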
Bloaty is an excellent tool, and produces size information with similar fidelity to “dwarf” mode, as it uses the same data source. We did not use bloaty since “dwarfdump” was already readily available and gave similar results. It would be nice to also have a “bloaty” mode so that we could more directly compare outputs.
The sections mode uses readelf -s to create one symbol for each ELF section. It is used for native files where no debug information or linker map file is available, and for native files whose ABI does not match the
Some manipulation happens in order to make names and paths more human-readable.
(anonymous::) is removed from names (and stored as a symbol flag).
[clone] suffix is removed (and stored as a symbol flag).
vtable for FOO->
name: Name without template and argument parameters.
template_name: Name without argument parameters.
full_name: Name with all parameters.
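The relationship between the three name fields can be shown with a small sketch. It is a simplification written for this document, not SuperSize's implementation: the helper names are hypothetical, and the bracket stripping ignores harder cases such as operator<, operator(), and function-pointer parameters.

```python
def _strip_groups(text, open_ch, close_ch):
    """Removes every balanced open_ch...close_ch group from text."""
    out = []
    depth = 0
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return ''.join(out)


def derive_names(full_name):
    """Returns (name, template_name, full_name) for a demangled symbol."""
    # template_name: drop argument parameters, keep template parameters.
    template_name = _strip_groups(full_name, '(', ')')
    # name: additionally drop template parameters.
    name = _strip_groups(template_name, '<', '>')
    return name, template_name, full_name


names = derive_names('foo::Bar<int>::Baz(int, char)')
```

Here `names` holds `foo::Bar::Baz`, `foo::Bar<int>::Baz`, and the unchanged full name, in that order.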
OUTLINED_FUNCTION_* symbols. These are renamed to ** outlined functions or ** outlined functions * (count), and are de-duped so an address can have at most one such symbol.
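One way to read the renaming and de-duplication of outlined functions is sketched below. The function name, the `(address, name)` tuple layout, and the interpretation of (count) as the number of collapsed symbols at an address are assumptions of this example, not confirmed details of SuperSize.

```python
import collections

_PREFIX = 'OUTLINED_FUNCTION_'


def collapse_outlined_functions(symbols):
    """Renames outlined-function symbols, keeping at most one per address.

    symbols: list of (address, name) tuples (hypothetical layout).
    """
    counts = collections.Counter(
        addr for addr, name in symbols if name.startswith(_PREFIX))
    result = []
    seen = set()
    for addr, name in symbols:
        if not name.startswith(_PREFIX):
            result.append((addr, name))
            continue
        if addr in seen:
            # A renamed symbol already exists for this address.
            continue
        seen.add(addr)
        count = counts[addr]
        if count == 1:
            new_name = '** outlined functions'
        else:
            new_name = '** outlined functions * (%d)' % count
        result.append((addr, new_name))
    return result


collapsed = collapse_outlined_functions([
    (0x10, 'OUTLINED_FUNCTION_0'),
    (0x10, 'OUTLINED_FUNCTION_7'),
    (0x20, 'foo'),
    (0x30, 'OUTLINED_FUNCTION_3'),
])
```

In this sample, the two symbols at 0x10 collapse into a single entry carrying a count of 2, while the lone symbol at 0x30 gets the plain name.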