tcmalloc: precompute the pointer mask for the doubly linked list

MaskPtr is the hottest function in Chrome, accounting for >2% of Chrome
CPU cycles in the field.
Precomputing the mask cuts the number of machine instructions generated
for MaskPtr in half, with similar potential savings in its running time.

Move the definitions of the doubly linked list API functions to the .cc
file to make the pointer mask local to the free list implementation.
Let the compiler make inlining decisions based on profile information.
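
A minimal sketch of the idea, not the actual tcmalloc code: the mask is
computed once at static initialization and kept file-local, so MaskPtr
reduces to a load and an XOR. AnchorFunction, kPointerMask, UnmaskPtr and
the shift by 13 are assumptions made for illustration only.

```cpp
#include <cstdint>

namespace {

// Hypothetical stand-in for a function whose address carries
// ASLR-dependent bits; the real code may derive the mask differently.
void AnchorFunction() {}

// Computed once during static initialization instead of on every call.
// The anonymous namespace keeps the mask private to this translation unit.
const std::uintptr_t kPointerMask =
    ~(reinterpret_cast<std::uintptr_t>(&AnchorFunction) >> 13);

}  // namespace

// XOR the pointer with the precomputed mask. Not forced inline here, so
// the compiler is free to decide based on profile data.
void* MaskPtr(void* p) {
  return reinterpret_cast<void*>(
      reinterpret_cast<std::uintptr_t>(p) ^ kPointerMask);
}

// XOR is its own inverse, so unmasking applies the same operation.
void* UnmaskPtr(void* p) {
  return MaskPtr(p);
}
```

A free list built on this would store MaskPtr(next) in each node and call
UnmaskPtr when following the link.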

The change shows statistically significant improvements to some timing
metrics in a browsing test, https://screenshot.googleplex.com/GTR9nkiaoBt,
but has no effect on the speedometer2 score.
Full metric results for a few benchmarks are in
https://gmx.users.x20web.corp.google.com/crosperf_maskptr/results-section.html

BUG=b:141578761

Change-Id: I9c252ab5617a2d30eec556b9951a8426a612c14e
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2255561
Reviewed-by: Will Harris <wfh@chromium.org>
Reviewed-by: Gabriel Marin <gmx@chromium.org>
Commit-Queue: Gabriel Marin <gmx@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#780957}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: d57ce8a7828f63b65d673f1f31443ba27e87f19e