|  | # Debugging Memory Issues | 
|  |  | 
|  | This page is designed to help Chromium developers debug memory issues. | 
|  |  | 
|  | When in doubt, reach out to memory-dev@chromium.org. | 
|  |  | 
|  | [TOC] | 
|  |  | 
|  | ## Investigating Reproducible Memory Regression | 
|  |  | 
|  | Let's say that there's a CL or feature that reproducibly increases memory usage | 
|  | when it's landed/enabled, given a particular set of repro steps. | 
|  |  | 
|  | * Take a look at [the documentation](/docs/memory/README.md) for both | 
|  | taking and navigating memory-infra traces. | 
|  | * Take two memory-infra traces. One with the reproducible memory regression, and | 
|  | one without. | 
|  | * Load the memory-infra traces into two tabs. | 
|  | * Compare the memory dump providers and look for the one that shows the | 
|  | regression. Follow the relevant link. | 
|  | * [The regression is in the Malloc MemoryDumpProvider.](#Investigating-Reproducible-Memory-Issues) | 
|  | * [The regression is in a non-Malloc | 
|  | MemoryDumpProvider.](#Regression-in-Non-Malloc-MemoryDumpProvider) | 
|  | * [The regression is only observed in **private | 
|  | footprint**.](#Regression-only-in-Private-Footprint) | 
|  | * [No regression is observed.](#No-observed-regression) | 
|  |  | 
|  | ### Regression in Malloc MemoryDumpProvider | 
|  |  | 
|  | Repeat the above steps, but this time also [take a heap | 
|  | dump](#Taking-a-Heap-Dump). Confirm that the regression is also visible in the | 
|  | heap dump, and then compare the two heap dumps to find the difference. You can | 
|  | also use | 
|  | [diff_heap_profiler.py](https://cs.chromium.org/chromium/src/third_party/catapult/experimental/tracing/bin/diff_heap_profiler.py) | 
|  | to perform the diff. | 
|  |  | 
|  | ### Regression in Non-Malloc MemoryDumpProvider | 
|  |  | 
|  | Hopefully the MemoryDumpProvider has sufficient information to help diagnose the | 
|  | leak. Depending on the whether the leaked object is allocated via malloc or new | 
|  | - it usually should be, you can also use the steps for debugging a Malloc | 
|  | MemoryDumpProvider regression. | 
|  |  | 
|  | ### Regression only in Private Footprint | 
|  |  | 
|  | * Repeat the repro steps, but instead of taking a memory-infra trace, use | 
|  | the following tools to map the process's virtual space: | 
|  | * On macOS, use vmmap | 
|  | * On Windows, use SysInternal VMMap | 
|  | * On other OSes, use /proc/<pid\>/smaps. | 
|  | * The results should help diagnose what's happening. Contact the | 
|  | memory-dev@chromium.org mailing list for more help. | 
|  |  | 
|  | ### No observed regression | 
|  |  | 
|  | * If there isn't a regression in PrivateMemoryFootprint, then this might become | 
|  | a question of semantics for what constitutes a memory regression. Common | 
|  | problems include: | 
|  | * Shared Memory, which is hard to attribute, but is mostly accounted for in | 
|  | the memory-infra trace. | 
|  | * Binary size, which is currently not accounted for anywhere. | 
|  |  | 
|  | ## Investigating Heap Dumps From the Wild | 
|  |  | 
|  | For a small set of Chrome users in the wild, Chrome will record and upload | 
|  | anonymized heap dumps. This has the benefit of wider coverage for real code | 
|  | paths, at the expense of reproducibility. | 
|  |  | 
|  | These heap dumps can take some time to grok, but frequently yield valuable | 
|  | insight. At the time of this writing, heap dumps from the wild have resulted in | 
|  | real, high impact bugs being found in Chrome code ~90% of the time. | 
|  |  | 
|  | For an example investigation of a real heap dump, see [this | 
|  | link](/docs/memory/investigating_heap_dump_example.md). | 
|  |  | 
|  | * Raw heap dumps can be viewed in the trace viewer. [See detailed | 
|  | instructions.](/docs/memory-infra/heap_profiler.md#how-to-manually-browse-a-heap-dump). | 
|  | This interface surfaces all available information, but can be overwhelming and | 
|  | is usually unnecessary for investigating heap dumps. | 
|  | * Important note: Heap profiling in the field uses | 
|  | [Poisson process sampling](https://bugs.chromium.org/p/chromium/issues/detail?id=810748) | 
|  | with a rate parameter of 10000. This means that for large/frequent allocations | 
|  | [e.g. >100 MB], the noise will be quite small [much less than 1%]. But | 
|  | there is noise so counts will not be exact. | 
|  | * The heap dump summary typically contains all information necessary to diagnose | 
|  | a memory issue. | 
|  | * The stack trace of the potential memory leak is almost always sufficient to | 
|  | tell the type of object being leaked, since most functions in Chrome | 
|  | have a limited number of calls to new and malloc. | 
|  | * The next thing to do is to determine whether the memory usage is intentional. | 
|  | Very rarely, components in Chrome legitimately need to use many 100s of MBs of | 
|  | memory. In this case, it's important to create a | 
|  | [MemoryDumpProvider](https://cs.chromium.org/chromium/src/base/trace_event/memory_dump_provider.h) | 
|  | to report this memory usage, so that we have a better understanding of which | 
|  | components are using a lot of memory. For an example, see | 
|  | [Issue 813046](https://bugs.chromium.org/p/chromium/issues/detail?id=813046). | 
|  | * Assuming the memory usage is not intentional, the next thing to do is to | 
|  | figure out what is causing the memory leak. | 
|  | * The most common cause is adding elements to a container with no limit. | 
|  | Usually the code makes assumptions about how frequently it will be called | 
|  | in the wild, and something breaks those assumptions. Or sometimes the code | 
|  | to clear the container is not called as frequently as expected [or at | 
|  | all]. [Example | 
|  | 1](https://bugs.chromium.org/p/chromium/issues/detail?id=798012). [Example | 
|  | 2](https://bugs.chromium.org/p/chromium/issues/detail?id=804440). | 
|  | * Retain cycles for ref-counted objects. | 
|  | [Example](https://bugs.chromium.org/p/chromium/issues/detail?id=814334#c23) | 
|  | * Straight up leaks resulting from incorrect use of APIs. [Example | 
|  | 1](https://bugs.chromium.org/p/chromium/issues/detail?id=801702#c31). | 
|  | [Example | 
|  | 2](https://bugs.chromium.org/p/chromium/issues/detail?id=814444#c17). | 
|  |  | 
|  | ## Taking a Heap Dump | 
|  |  | 
|  | Navigate to chrome://flags and search for **memlog**. There are several options | 
|  | that can be used to configure heap dumps. All of these options are also | 
|  | available as command line flags, for automated test runs [e.g. telemetry]. | 
|  |  | 
|  | * `#memlog` controls which processes are profiled. It's also possible to | 
|  | manually specify the process via the interface at `chrome://memory-internals`. | 
|  | * `#memlog-in-process` makes the profiling service to be run within the | 
|  | Chrome browser process. Defaults to run the service as a separate dedicated | 
|  | process. | 
|  | * `#memlog-sampling-rate` specifies the sampling interval in bytes. The lower | 
|  | the interval, the more precise is the profile. However it comes at the cost of | 
|  | performance. Default value is 100KB, that is enough to observe allocation | 
|  | sites that make allocations >500KB total, where total equals to a single | 
|  | allocation size times the number of such allocations at the same call site. | 
|  | * `#memlog-stack-mode` describes the type of metadata recorded for each | 
|  | allocation. `native` stacks provide the most utility. The only time the other | 
|  | options should be considered is for Android official builds, most of which do | 
|  | not support `native` stacks. | 
|  |  | 
|  | Once the flags have been set appropriately, restart Chrome and take a | 
|  | memory-infra trace. The results will have a heap dump. | 
|  |  | 
|  | ## Investigating Memory Corruption | 
|  |  | 
|  | In case you can reproduce the corruption locally, | 
|  | you are advised to run sanitizers (e.g. | 
|  | [ASan](https://chromium.googlesource.com/chromium/src/+/HEAD/docs/asan.md)) | 
|  | to locate and fix UB. | 
|  |  | 
|  | Otherwise, you can look into | 
|  | [minidump](https://sites.google.com/a/google.com/crash/users/how-to/manually-debug-a-minidump) | 
|  | (link Googlers-only) if available. | 
|  |  | 
|  | ### Known Memory Poisoning Patterns | 
|  |  | 
|  | Memory allocation goes through multiple states, | 
|  | and its payload sometimes has a distinctive pattern. | 
|  | You may also see some variance on lower bits, introduced by | 
|  | e.g. an offset within `struct`. | 
|  |  | 
|  | #### Memory held by the OS | 
|  |  | 
|  | * All memory comes from the OS and returns back to the OS at some point. | 
|  | * Access to memory that is already returned to the OS is likely a crash. | 
|  | * Large allocations (>= ~1 MiB) tend to go back to the OS quickly when | 
|  | freed, while smaller allocations are mostly reused. | 
|  |  | 
|  | #### Memory held by the allocator | 
|  |  | 
|  | * The allocator holds the memory region borrowed from the OS in a free-list. | 
|  | * Payload and behavior are implementation-specific. | 
|  | * In Chrome, we use | 
|  | [PartitionAlloc](/base/allocator/partition_allocator/PartitionAlloc.md) as the | 
|  | main allocator. | 
|  | * We embed some data on payload and the original payload before `free()` may | 
|  | or may not be overwritten. | 
|  | * Writes to `free()`d memory may be caught as "free-list corruption". | 
|  | * Following patterns can be written at this stage: | 
|  | * `0xCDCDCDCDCDCDCDCD`: when allocation gets returned to PartitionAlloc. | 
|  | * Shows up only in `PA_BUILDFLAG(EXPENSIVE_DCHECKS_ARE_ON)` builds. | 
|  |  | 
|  | #### Quarantined Memory | 
|  |  | 
|  | * Optionally, the allocator may keep `free()`d memory in quarantine | 
|  | for a while before returning it into a free-list to detect and mitigate | 
|  | UaF bugs. | 
|  | * Following patterns can be written at this stage: | 
|  | * `0xCDCDCDCDCDCDCDCD`: PartitionAlloc's `FreeFlags::kZap`. | 
|  | * As of Aug. 2024 this is used by only [AMSC](https://docs.google.com/document/d/12OM0CSKgKv6NhM9YylSqAAXiV_f4uMgYgaH8KABUe-o/edit?usp=sharing). | 
|  | * `0xEFEFEFEFEFEFEFEF`: In [BRP](https://chromium.googlesource.com/chromium/src/+/HEAD/base/memory/raw_ptr.md) quarantine. | 
|  | * You are using a dangling pointer to access invalidated memory region. | 
|  | * `0xEFED????????8000`: In [LUD](https://docs.google.com/document/d/1xfGa_IMtFZiQ3beOmkncEafODwn4U90ZyL4NfPaAtDY/edit?usp=sharing&resourcekey=0-89BZl1SVILB6ylOHula0IA) quarantine. | 
|  | * (Googlers-only) You may have an access to `free()` stack trace on crashpad. | 
|  | * `0xECEC????????8000`: In [E-LUD](https://docs.google.com/document/d/1_9TSOtQuPR3NjorLDjAkuloi8lYqblb6Ykt5nbVnh9I/edit?usp=sharing) quarantine. | 
|  |  | 
|  |  | 
|  | #### Memory allocation you officially own | 
|  |  | 
|  | In principle, once initialized you should only see values written | 
|  | by your code while your allocation is alive. | 
|  | However, in rare case, you may see values from Write-after-Free. | 
|  |  | 
|  | ```txt | 
|  | void YourFunc() {              | void TheirFunc() { | 
|  | |   int* p1 = new int; | 
|  | |   delete p1; | 
|  | // The allocator may         | | 
|  | // redistribute `p1` to `p2` | | 
|  | int* p2 = new int;           | | 
|  | *p2 = 123;                   | | 
|  | |   // Write-after-Free | 
|  | |   *p1 = 456; | 
|  | // 456 may show up           | | 
|  | printf("%d\n", *p2);         | | 
|  | }                              | } | 
|  | ``` | 
|  |  | 
|  | ...or values from Double-Free. | 
|  |  | 
|  | ``` | 
|  | void YourFunc() {              | void TheirFunc() { | 
|  | |   int* p1 = new int; | 
|  | |   delete p1; | 
|  | // The allocator may         | | 
|  | // redistribute `p1` to `p2` | | 
|  | int* p2 = new int;           | | 
|  | *p2 = 123;                   | | 
|  | |   // Double-Free | 
|  | |   delete p1; | 
|  | | | 
|  | |   // The allocator may | 
|  | |   // redistribute `p2` to `p3` | 
|  | |   int* p3 = new int; | 
|  | |   *p3 = 456; | 
|  | // 456 may show up           | | 
|  | printf("%d\n", *p2);         | | 
|  | }                              | } | 
|  | ``` | 
|  |  | 
|  | * Following patterns can be written at this stage: | 
|  | * `0x0000000000000000`: [zero initialization](https://en.cppreference.com/w/cpp/language/zero_initialization). | 
|  | * `0x0000000000000000`: PartitionAlloc's `AllocFlags::kZeroFill`. | 
|  | * This payload is written as a part of memory allocation but requires | 
|  | explicit opt-in e.g. `calloc()`. | 
|  | - `0xABABABABABABABAB`: PartitionAlloc's newly allocated memory. | 
|  | * Shows up only in `PA_BUILDFLAG(EXPENSIVE_DCHECKS_ARE_ON)` builds. | 
|  | * MSan should be capable of catching this kind of reads to uninitialized | 
|  | regions. | 
|  |  | 
|  |  | 
|  | #### Memory allocation owned by someone else | 
|  |  | 
|  | You may see random values written by someone else | 
|  | if you keep using pointers to `free()`d region. | 
|  |  | 
|  | ``` | 
|  | void YourFunc() {              | void TheirFunc() { | 
|  | int* p1 = new int;           | | 
|  | *p1 = 123;                   | | 
|  | delete p1;                   | | 
|  | |   // The allocator may | 
|  | |   // redistribute `p1` to `p2` | 
|  | |   int* p2 = new int; | 
|  | |   *p2 = 456; | 
|  | // Use-after-Free;           | | 
|  | // 456 may show up           | | 
|  | printf("%d\n", *p1);         | | 
|  | }                              | } | 
|  | ``` |