This document describes how the heap profiler works and how to add heap profiling support to your allocator. If you just want to know how to use it, see Heap Profiling with MemoryInfra.
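The heap profiler consists of three main components: the context tracker (AllocationContextTracker), the allocation register (AllocationRegister), and the heap dump export step (ExportHeapDump()).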
These components are designed to work well together, but to be usable independently as well.
When there is a way to get notified of all allocations and frees, this is the normal flow:
 1. When an allocation occurs, call AllocationContextTracker::GetInstanceForCurrentThread()->GetContextSnapshot() to get an AllocationContext.
 2. Insert that context, together with the address and size, into an AllocationRegister by calling Insert().
 3. When memory is freed, remove it from the register with Remove().
 4. On memory dump, collect the allocations from the register, call ExportHeapDump(), and add the generated heap dump to the memory dump.
When heap profiling is enabled (the --enable-heap-profiling flag is passed), the memory dump manager calls OnHeapProfilingEnabled() on every MemoryDumpProvider as early as possible, so allocators can start recording allocations. This should be done even when tracing has not been started, because these allocations might still be around when a heap dump happens during tracing.
The AllocationContextTracker is a thread-local object. Its main purpose is to keep track of a pseudo stack of trace events. Chrome has been instrumented with lots of TRACE_EVENT macros. These trace events push their name to a thread-local stack when they go into scope, and pop it when they go out of scope, provided that certain conditions are met, including that tracing is enabled and that heap profiling is enabled (with the --enable-heap-profiling flag).
This means that allocations that occur before tracing is started will not have backtrace information in their context.
A thread-local instance of the context tracker is initialized lazily when it is first accessed. This might be because a trace event was pushed or popped, or because GetContextSnapshot() was called when an allocation occurred.
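The push and pop behavior described above can be sketched with a thread-local stack and an RAII helper. This is a minimal illustration, not Chrome's implementation: PseudoStackTracker, ScopedEvent, and g_capture_enabled are hypothetical names, and the single flag stands in for whatever conditions gate the real trace events.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical stand-in for "tracing and heap profiling are both enabled".
std::atomic<bool> g_capture_enabled{false};

// Hypothetical, simplified per-thread tracker holding a pseudo stack.
class PseudoStackTracker {
 public:
  // Lazily creates the instance for the calling thread on first access.
  static PseudoStackTracker& GetForCurrentThread() {
    thread_local PseudoStackTracker tracker;
    return tracker;
  }

  void Push(const char* name) { stack_.push_back(name); }

  void Pop() {
    assert(!stack_.empty());
    stack_.pop_back();
  }

  // Returns a copy of the current pseudo stack, outermost event first.
  std::vector<const char*> Snapshot() const { return stack_; }

 private:
  std::vector<const char*> stack_;
};

// Hypothetical RAII helper playing the role of a trace event macro.
class ScopedEvent {
 public:
  explicit ScopedEvent(const char* name)
      : pushed_(g_capture_enabled.load(std::memory_order_relaxed)) {
    if (pushed_) PseudoStackTracker::GetForCurrentThread().Push(name);
  }

  ~ScopedEvent() {
    // Pop only what this event pushed, so toggling the flag while the
    // event is in scope cannot unbalance the stack.
    if (pushed_) PseudoStackTracker::GetForCurrentThread().Pop();
  }

  ScopedEvent(const ScopedEvent&) = delete;
  ScopedEvent& operator=(const ScopedEvent&) = delete;

 private:
  const bool pushed_;
};
```

Events that go into scope while the flag is off leave nothing on the stack, which mirrors why allocations made before tracing starts carry no backtrace. The sketch uses std::vector for brevity; code that runs inside allocator hooks would want storage that does not itself allocate.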
AllocationContext is used to group and break down allocations. Currently AllocationContext carries a backtrace, taken from the thread-local pseudo stack, and a type name.
It is possible to modify this context after insertion into the register, for instance to set the type name if it was not known at the time of allocation.
The AllocationRegister is a hash table specialized for storing (size, AllocationContext) pairs by address. It has been optimized for Chrome's typical number of unfreed allocations, and it is backed by mmap memory directly, so there are no reentrancy issues when using it to record malloc allocations.
The allocation register is threading-agnostic. Access must be synchronised properly.
Dumping every single allocation in the allocation register straight into the trace log is not an option due to the sheer volume (~300k unfreed allocations). The role of the ExportHeapDump() function is to group allocations, striking a balance between trace log size and detail.
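As an illustration of the trade-off, the sketch below sums bytes per context and folds small contexts into one bucket. It is an example of grouping in general, not the algorithm ExportHeapDump() uses: Allocation, GroupByContext, the min_bytes threshold, and the <other> label are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified record: the context is reduced to a string label.
struct Allocation {
  std::string context;
  size_t size;
};

// Sums bytes per context, then folds every context smaller than |min_bytes|
// into a single "<other>" bucket so the output stays small.
std::map<std::string, size_t> GroupByContext(
    const std::vector<Allocation>& allocations, size_t min_bytes) {
  const std::string other = "<other>";

  size_t total_in = 0;
  std::map<std::string, size_t> bytes_by_context;
  for (const Allocation& alloc : allocations) {
    bytes_by_context[alloc.context] += alloc.size;
    total_in += alloc.size;
  }

  size_t total_out = 0;
  std::map<std::string, size_t> grouped;
  for (const auto& entry : bytes_by_context) {
    const std::string& key = entry.second >= min_bytes ? entry.first : other;
    grouped[key] += entry.second;
    total_out += entry.second;
  }

  // Folding loses detail, but it must not lose bytes.
  assert(total_in == total_out);
  return grouped;
}
```

A higher threshold gives a smaller dump with less detail; a threshold of zero keeps every context separate.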
See the Heap Dump Format document for more details about the structure of the heap dump in the trace log.
Below is an example of adding heap profiling support to an allocator that has an existing memory dump provider.
    class FooDumpProvider : public MemoryDumpProvider {
      // Kept as pointer because |AllocationRegister| allocates a lot of virtual
      // address space when constructed, so only construct it when heap profiling
      // is enabled.
      scoped_ptr<AllocationRegister> allocation_register_;
      Lock allocation_register_lock_;

     public:
      static FooDumpProvider* GetInstance();

      // Records an allocation together with the context it occurred in.
      void InsertAllocation(void* address, size_t size) {
        AllocationContext context =
            AllocationContextTracker::GetInstanceForCurrentThread()
                ->GetContextSnapshot();
        AutoLock lock(allocation_register_lock_);
        allocation_register_->Insert(address, size, context);
      }

      // Forgets an allocation once its memory has been freed.
      void RemoveAllocation(void* address) {
        AutoLock lock(allocation_register_lock_);
        allocation_register_->Remove(address);
      }

      // Will be called as early as possible by the memory dump manager.
      void OnHeapProfilingEnabled(bool enabled) override {
        AutoLock lock(allocation_register_lock_);
        allocation_register_.reset(new AllocationRegister());

        // At this point, make sure that from now on, for every allocation and
        // free, |FooDumpProvider::GetInstance()->InsertAllocation()| and
        // |RemoveAllocation| are called.
      }

      bool OnMemoryDump(const MemoryDumpArgs& args,
                        ProcessMemoryDump* pmd) override {
        // Do regular dumping here.

        // Dump the heap only for detailed dumps.
        if (args.level_of_detail == MemoryDumpLevelOfDetail::DETAILED) {
          TraceEventMemoryOverhead overhead;
          hash_map<AllocationContext, size_t> bytes_by_context;

          {
            AutoLock lock(allocation_register_lock_);
            if (allocation_register_) {
              // Group allocations in the register into |bytes_by_context|, but
              // do no additional processing inside the lock.
              for (const auto& alloc_size : *allocation_register_)
                bytes_by_context[alloc_size.context] += alloc_size.size;

              allocation_register_->EstimateTraceMemoryOverhead(&overhead);
            }
          }

          if (!bytes_by_context.empty()) {
            scoped_refptr<TracedValue> heap_dump = ExportHeapDump(
                bytes_by_context,
                pmd->session_state()->stack_frame_deduplicator(),
                pmd->session_state()->type_name_deduplicator());
            pmd->AddHeapDump("foo_allocator", heap_dump);
            overhead.DumpInto("tracing/heap_profiler", pmd);
          }
        }

        return true;
      }
    };
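Adding heap profiling support to malloc is more complicated because it needs to deal with reentrancy: recording an allocation can itself allocate, which re-enters the allocation hook.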
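One common way to cope with reentrancy is a thread-local guard that makes the hook ignore allocations made by the profiler's own bookkeeping. The sketch below shows only that idea; it is not the malloc implementation referred to here, and g_in_profiler, OnAllocation, and RecordAllocation are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread flag: true while the profiler itself is running.
thread_local bool g_in_profiler = false;

// Hypothetical counters standing in for the real bookkeeping.
int g_recorded = 0;
int g_skipped = 0;

void OnAllocation(void* address, size_t size);

// Stand-in for bookkeeping that allocates and thereby re-enters the hook.
void RecordAllocation(void* address, size_t size) {
  assert(g_in_profiler);  // Only ever called with the guard held.
  (void)address;
  (void)size;
  ++g_recorded;

  // Simulate an allocation made by the bookkeeping itself.
  static char internal_buffer[16];
  OnAllocation(internal_buffer, sizeof(internal_buffer));
}

// Hook called for every allocation on this thread.
void OnAllocation(void* address, size_t size) {
  if (g_in_profiler) {
    // Re-entered from our own bookkeeping: do not record, or we would
    // recurse without bound.
    ++g_skipped;
    return;
  }
  g_in_profiler = true;
  RecordAllocation(address, size);
  g_in_profiler = false;
}
```

The guard is per thread, so one thread doing bookkeeping does not hide allocations made concurrently by other threads.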