Data Model

The SuperSize data model is a sorted flat list of symbols. Using a flat list is simple, and allows arbitrary queries to be made on symbols.

//tools/binary_size/libsupersize/models.py contains the definition of all data classes.

Python API Reference

SizeInfo

Represents the data within a .size file. Contains:

  • build_config: JSON metadata applicable to all symbols.
  • containers: List of Container instances used by symbols in this SizeInfo.
  • raw_symbols: List of Symbols.

Symbol

Each symbol contains the following fields:

  • container: A (shared) Container instance.
  • section_name: E.g. “.text”, “.rodata”, “.data.rel.local”
  • section: The single character abbreviation of section_name. E.g. “t”, “r”, “d”.
  • size: The number of bytes this symbol takes up, including padding that comes before |address|.
  • padding: The number of bytes of padding before |address|.
  • address (optional): The start address of the symbol.
  • source_path (optional): Path to the source file that caused this symbol to exist (e.g. base64.cc, SomeClass.java).
  • object_path (optional):
    • For native and pak: Path to associated object file. E.g.: base/base64.o
    • For dex: Package path. E.g.: $APK/org/chromium/chrome/SomeClass.class
  • aliases: List of symbols that represent the same bytes. The |aliases| of each symbol in this list points to the same list instance.
  • num_aliases: The number of symbols with the same address (including self).
  • pss: size / num_aliases.
  • padding_pss: padding / num_aliases.
  • full_name: Name for this symbol.
    • Symbols are not required to have unique names, or names as all (empty string is valid).
  • template_name: Derived from full_name. Name with parameter list removed, but template parameters present.
  • name: Derived from full_name. Names with templates and parameter list removed.
  • component: The team that owns this feature (optional, maybe be empty).
  • flags: Bitmask of flags. See FLAG_* constants in models.py.
  • disassembly (optional): The disassembly code for the symbol.

Diffs

Diffs are represented in Python using DeltaSizeInfo, which contains a list of DeltaSymbol instances. DeltaSymbols maintain the full fidelity of symbols in the diff by storing a pointer to the before / after symbol that they represent. See diffs.md for more details.

Concepts

Symbol Aliases

Aliases occur when multiple symbols refer to the same bytes (have the same address, size, and padding).

Examples of where aliases are used:

  • Functions with identical code are de-deuped via identical code folding.
  • Functions that appear in multiple translation units (e.g. functions with inline linkage). These have the same name, but different paths.
    • Represented as one alias per path, but are collapsed into a single symbol with a path of $COMMON_PREFIX/{shared}/$SYMBOL_COUNT when the number of aliases is large.
      • E.g.: base/{shared}/3
  • String literals that are de-duped by identical code folding.
  • Pak entries with identical payloads.

Path Normalization

  • Prefixes are removed: out/Release/, gen/, obj/
    • This causes generated files to overlay non-generated source tree, which is useful for attribution since the two generally mirror one another.
    • Generated symbols have the FLAG_GENERATED bit set.

Overhead and Star Symbols

Overhead symbols are symbols with a name that starts with “Overhead:”. They track bytes that are generally unactionable. They are recorded as padding-only symbols (e.g.: size=10, padding=10, size_without_padding=0) because “padding” better associates with “overhead” vs. size.

  • Overhead: ELF file: elf_file_size - sum(elf_sections).
    • Captures bytes taken up by ELF headers and section alignment.
  • Overhead: APK file: apk_file_size - sum(compressed_file_sizes)
    • Captures bytes taken up by .zip metadata and zipalign padding.
  • Overhead: ${NAME}.pak: pak_file_size - sum(pak_entries)
  • Overhead: aggregate padding of diff'ed symbols: Appears in symbol diffs to represent the per-section cumulative delta in padding.

Star symbols are symbols with a name that starts with “**”. They represent sections of binary that are unattributed.

Examples:

  • ** Merge Globals - Taken from linker map file. A section of data containing unnamed constants.
  • ** Symbol gap: A gap between symbols that is larger than what could be plausibly be due to alignment.
  • ** ELF Section: .rel.dyn: A native code ELF section that is not broken down into smaller symbols.