This doc describes how SuperSize breaks down native binaries into symbols.
Native symbols are those with a section of:

* `.text` (executable code)
* `.rodata` (read-only data)
* `.data` (writable data)
* `.data.rel.ro` (data that is read-only after ELF relocations are applied)
* `.bss` (symbols that are zero-initialized; these consume no space in the binary, and so are generally ignored despite still being collected)

There are 3 modes that SuperSize can use to break an ELF down into symbols:

1. `linker_map` - Uses linker map + build directory to create symbols.
2. `dwarf` - Uses debug information to create symbols.
3. `sections` - Creates one symbol for each ELF section.

The `linker_map` mode produces the largest number of symbols, and thus is the preferred mode. Information provided only by this mode:

* String literal symbols (shown as `"some string dat..."`).
  * These rely on `** merge strings` entries, which tell us where string tables exist within `.rodata`.
* `object_path`, which is useful for attributing STL usages to individual source files.

The `linker_map` algorithm:

1. `build.ninja` is parsed to get:
   * The `.o` and `.a` files that were inputs to the linker.
   * The mapping of `.cc` -> `.o` files.
2. `.o` (and `.a`) files are parsed:
   * `nm` to get symbol list.
   * `nm` to get list of string literals.
   * `llvm-bcanalyzer` to get list of string literals.
3. `nm` is used to get the list of symbol names that were identical-code-folded to the same address.
4. The linker map (`-Wl,-Map=output.map`) is parsed to get:
   * Symbols (including `** merge strings` entries).
   * The `object_path` (`.o` file) associated with each symbol.
     * For ThinLTO builds, `object_path` points to a hashed filename within the thinlto cache (not useful).
5. String tables are split on `\0` bytes, creating a string literal symbol for each string therein.
6. Determine which `.o` files define each symbol (match up the names).
   * Also determine when multiple `.o` files define the same string literal.
7. Assign `source_path` using the `.o` -> `.cc` mapping from `build.ninja`.
   * `.h` files are never listed as sources. No information about inlined symbols is gathered (by design).
8. Create aliases when `nm` reports multiple symbols mapping to the same address.
9. Normalize `source_path` by removing the generated path prefix (and adding `FLAG_GENERATED`) when applicable.

The `dwarf` mode creates symbols using only an ELF with debug information enabled. It requires the compiler flag `-gmlt` to enable full source paths (rather than just basename).

1. Create the symbol list using `nm --print-size`.
2. Find symbols that share an address using `nm` (this could have been done at the same time as the previous step, but is done as a separate step in order to share logic with `linker_map` mode).
3. Use `dwarfdump` to find all `DW_TAG_compile_unit` and `DW_AT_ranges` entries and create a map of address range -> source path.

Bloaty is an excellent tool, and produces size information with similar fidelity to `dwarf` mode, as it uses the same data source. We did not use Bloaty since `dwarfdump` was already readily available and gave similar results. It would be nice to also have a `bloaty` mode so that we could more directly compare outputs.

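
To make the address range -> source path map from the dwarf algorithm more concrete, here is a minimal sketch of how such a map could be built and queried. It is not SuperSize's implementation: the function names, the half-open range convention, the assumption that ranges do not overlap, and the sample paths are all hypothetical.

```python
import bisect


def build_range_map(ranges):
    """Sorts (start, end, source_path) tuples for lookup.

    Assumes `end` is exclusive and that ranges do not overlap.
    """
    ordered = sorted(ranges)
    starts = [r[0] for r in ordered]
    return starts, ordered


def source_path_for(range_map, address):
    """Returns the source path whose range contains `address`, or None."""
    starts, ordered = range_map
    # Find the last range that starts at or before `address`.
    idx = bisect.bisect_right(starts, address) - 1
    if idx < 0:
        return None
    start, end, path = ordered[idx]
    return path if start <= address < end else None


# Hypothetical ranges, standing in for data gathered from debug information.
range_map = build_range_map([
    (0x2000, 0x2400, 'base/files/file_util.cc'),
    (0x1000, 0x1800, 'base/logging.cc'),
])
print(source_path_for(range_map, 0x1004))
```

Each symbol address from the `nm` step could then be looked up in this map to attach a source path; addresses that fall outside every range simply get no path.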
The `sections` mode uses `readelf -S` to create one symbol for each ELF section. It is used for native files where no debug information or linker map file is available, and for native files whose ABI does not match the `--abi-filter`.

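
One way to read the three mode descriptions together is as a fallback chain. The sketch below is an illustration of that reading only: the function name and its boolean parameters are hypothetical, and the ordering is inferred from the text (linker_map is preferred, sections is the fallback), not taken from SuperSize's code.

```python
def choose_mode(has_linker_map, has_debug_info, abi_matches_filter=True):
    """Picks a symbol-extraction mode from what is available (illustrative)."""
    # Files whose ABI does not match the filter only get per-section symbols.
    if not abi_matches_filter:
        return 'sections'
    # The linker map yields the most symbols, so it is tried first.
    if has_linker_map:
        return 'linker_map'
    if has_debug_info:
        return 'dwarf'
    # Neither a linker map nor debug information is available.
    return 'sections'


print(choose_mode(has_linker_map=False, has_debug_info=True))
```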
Some manipulation happens in order to make names and paths more human-readable.

* `(anonymous::)` is removed from names (and stored as a symbol flag).
* `[clone]` suffix is removed (and stored as a symbol flag).
* `vtable for FOO` -> `FOO [vtable]`
* Names are stored in three forms:
  * `name`: Name without template and argument parameters.
  * `template_name`: Name without argument parameters.
  * `full_name`: Name with all parameters.
* `OUTLINED_FUNCTION_*` symbols are renamed to `** outlined functions` or `** outlined functions * (count)`, and are de-duped so an address can have at most one such symbol.
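
The sketch below illustrates several of these normalizations on demangled names. It is a simplified approximation, not SuperSize's code: the flag constants, helper names, and the exact `[clone ...]` suffix pattern are assumptions, the `(anonymous::)` marker is spelled as in the list above, and operators such as `operator<` or `operator()` as well as `OUTLINED_FUNCTION_*` handling are deliberately left out.

```python
import re

# Hypothetical flag values; the real flags are not specified here.
FLAG_ANONYMOUS = 1
FLAG_CLONE = 2

ANONYMOUS_MARKER = '(anonymous::)'  # Spelled as in the text above.
_CLONE_SUFFIX_RE = re.compile(r'(\s*\[clone[^\]]*\])+$')
_VTABLE_PREFIX = 'vtable for '


def _strip_trailing_args(text):
    """Drops a trailing balanced "(...)" argument list, if present."""
    if not text.endswith(')'):
        return text
    depth = 0
    for i in range(len(text) - 1, -1, -1):
        if text[i] == ')':
            depth += 1
        elif text[i] == '(':
            depth -= 1
            if depth == 0:
                return text[:i]
    return text


def _strip_templates(text):
    """Removes every balanced "<...>" group."""
    out = []
    depth = 0
    for ch in text:
        if ch == '<':
            depth += 1
        elif ch == '>' and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return ''.join(out)


def normalize(raw_name):
    """Returns (name, template_name, full_name, flags) for a demangled name."""
    flags = 0
    full_name = raw_name
    if ANONYMOUS_MARKER in full_name:
        full_name = full_name.replace(ANONYMOUS_MARKER, '')
        flags |= FLAG_ANONYMOUS
    full_name, clone_count = _CLONE_SUFFIX_RE.subn('', full_name)
    if clone_count:
        flags |= FLAG_CLONE
    if full_name.startswith(_VTABLE_PREFIX):
        full_name = full_name[len(_VTABLE_PREFIX):] + ' [vtable]'
    template_name = _strip_trailing_args(full_name)
    name = _strip_templates(template_name)
    return name, template_name, full_name, flags


print(normalize('base::Foo<int>::Bar(int, char) [clone .cold]'))
```

Keeping the removed markers as flags means the shorter names stay readable while the information they carried remains available on the symbol.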