| POWERPC NOTES |
| ------------- |
| |
| This branch contains PowerPC-specific performance tunings and |
| platform fixes. Although some heap and cpu profiler tests fail, |
| libtcmalloc works correctly. |
| |
| On newer distros with 64k page size, it is highly recommended |
| to configure with the '-DTCMALLOC_LARGE_PAGES64K' CXX flag. It makes |
| the internal page allocator use a 64K page size, which reduces the |
| number of syscalls needed to allocate memory from the OS. |
| |
| The default number of objects transferred between the central list and |
| the thread cache is increased from 32 to 32768. This is a performance |
| improvement, especially for programs that allocate a lot of objects of |
| the same size (like a std::map<int> with a large set of elements). |
| The value can be changed by setting the environment variable |
| TCMALLOC_TRANSFER_NUM_OBJ. |
| |
| |
| IMPORTANT NOTE FOR 64-BIT USERS |
| ------------------------------- |
| There are known issues with some perftools functionality on x86_64 |
| systems. See 64-BIT ISSUES, below. |
| |
| |
| TCMALLOC |
| -------- |
| Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of |
| tcmalloc -- a replacement for malloc and new. See below for some |
| environment variables you can use with tcmalloc, as well. |
| |
| tcmalloc functionality is available on all systems we've tested; see |
| INSTALL for more details. See README_windows.txt for instructions on |
| using tcmalloc on Windows. |
| |
| NOTE: When compiling programs with gcc that you plan to link |
| with libtcmalloc, it's safest to pass in the flags |
| |
| -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free |
| |
| when compiling. gcc makes some optimizations assuming it is using its |
| own, built-in malloc; that assumption obviously isn't true with |
| tcmalloc. In practice, we haven't seen any problems with this, but |
| the expected risk is highest for users who register their own malloc |
| hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is |
| lowest for folks who use tcmalloc_minimal (or, of course, who pass in |
| the above flags :-) ). |
| |
| |
| HEAP PROFILER |
| ------------- |
| See doc/heapprofile.html for information about how to use tcmalloc's |
| heap profiler and analyze its output. |
| |
| As a quick-start, do the following after installing this package: |
| |
| 1) Link your executable with -ltcmalloc |
| 2) Run your executable with the HEAPPROFILE environment var set: |
| $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args] |
| 3) Run pprof to analyze the heap usage |
| $ pprof <path/to/binary> /tmp/heapprof.0045.heap # run 'ls' to see options |
| $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap |
| |
| You can also use LD_PRELOAD to heap-profile an executable that you |
| didn't compile. |
| |
| There are other environment variables, besides HEAPPROFILE, you can |
| set to adjust the heap-profiler behavior; cf. "ENVIRONMENT VARIABLES" |
| below. |
| |
| The heap profiler is available on all unix-based systems we've tested; |
| see INSTALL for more details. It is not currently available on Windows. |
| |
| |
| HEAP CHECKER |
| ------------ |
| See doc/heap_checker.html for information about how to use tcmalloc's |
| heap checker. |
| |
| In order to catch all heap leaks, tcmalloc must be linked *last* into |
| your executable. The heap checker may mischaracterize some memory |
| accesses in libraries listed after it on the link line. For instance, |
| it may report these libraries as leaking memory when they're not. |
| (See the source code for more details.) |
| |
| As a quick-start, do the following after installing this package: |
| |
| 1) Link your executable with -ltcmalloc |
| 2) Run your executable with the HEAPCHECK environment var set: |
| $ HEAPCHECK=1 <path/to/binary> [binary args] |
| |
| Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian |
| |
| You can also use LD_PRELOAD to heap-check an executable that you |
| didn't compile. |
| |
| The heap checker is only available on Linux at this time; see INSTALL |
| for more details. |
| |
| |
| CPU PROFILER |
| ------------ |
| See doc/cpuprofile.html for information about how to use the CPU |
| profiler and analyze its output. |
| |
| As a quick-start, do the following after installing this package: |
| |
| 1) Link your executable with -lprofiler |
| 2) Run your executable with the CPUPROFILE environment var set: |
| $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args] |
| 3) Run pprof to analyze the CPU usage |
| $ pprof <path/to/binary> /tmp/prof.out # -pg-like text output |
| $ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output |
| |
| There are other environment variables, besides CPUPROFILE, you can set |
| to adjust the cpu-profiler behavior; cf. "ENVIRONMENT VARIABLES" below. |
| |
| The CPU profiler is available on all unix-based systems we've tested; |
| see INSTALL for more details. It is not currently available on Windows. |
| |
| NOTE: CPU profiling doesn't work after fork (unless you immediately |
| do an exec()-like call afterwards). Furthermore, if you do |
| fork, and the child calls exit(), it may corrupt the profile |
| data. You can use _exit() to work around this. We hope to have |
| a fix for both problems in the next release of perftools |
| (hopefully perftools 1.2). |
| |
| |
| EVERYTHING IN ONE |
| ----------------- |
| If you want the CPU profiler, heap profiler, and heap leak-checker to |
| all be available for your application, you can do: |
| gcc -o myapp ... -lprofiler -ltcmalloc |
| |
| However, if you have a reason to use the static versions of the |
| library, this two-library linking won't work: |
| gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a # errors! |
| |
| Instead, use the special libtcmalloc_and_profiler library, which we |
| make for just this purpose: |
| gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a |
| |
| |
| CONFIGURATION OPTIONS |
| --------------------- |
| For advanced users, there are several flags you can pass to |
| './configure' that tweak tcmalloc performance. (These are in addition |
| to the environment variables you can set at runtime to affect |
| tcmalloc, described below.) See the INSTALL file for details. |
| |
| |
| ENVIRONMENT VARIABLES |
| --------------------- |
| The cpu profiler, heap checker, and heap profiler will lie dormant, |
| using no memory or CPU, until you turn them on. (Thus, there's no |
| harm in linking -lprofiler into every application, and also -ltcmalloc |
| assuming you're ok using the non-libc malloc library.) |
| |
| The easiest way to turn them on is by setting the appropriate |
| environment variables. We have several variables that let you |
| enable/disable features as well as tweak parameters. |
| |
| Here are some of the most important variables: |
| |
| HEAPPROFILE=<pre> -- turns on heap profiling and dumps data using this prefix |
| HEAPCHECK=<type> -- turns on heap checking with strictness 'type' |
| CPUPROFILE=<file> -- turns on cpu profiling and dumps data to this file. |
| PROFILESELECTED=1 -- if set, cpu-profiler will only profile regions of code |
| surrounded with ProfilerEnable()/ProfilerDisable(). |
| PROFILEFREQUENCY=x -- how many interrupts/second the cpu-profiler samples. |
| |
| TCMALLOC_DEBUG=<level> -- the higher level, the more messages malloc emits |
| MALLOCSTATS=<level> -- prints memory-use stats at program-exit |
| |
| For a full list of variables, see the documentation pages: |
| doc/cpuprofile.html |
| doc/heapprofile.html |
| doc/heap_checker.html |
| |
| |
| COMPILING ON NON-LINUX SYSTEMS |
| ------------------------------ |
| |
| Perftools was developed and tested on x86 Linux systems, and it works |
| in its full generality only on those systems. However, we've |
| successfully ported much of the tcmalloc library to FreeBSD, Solaris |
| x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic |
| functionality in tcmalloc_minimal to Windows. See INSTALL for details. |
| See README_windows.txt for details on the Windows port. |
| |
| |
| PERFORMANCE |
| ----------- |
| |
| If you're interested in some third-party comparisons of tcmalloc to |
| other malloc libraries, here are a few web pages that have been |
| brought to our attention. The first discusses the effect of using |
| various malloc libraries on OpenLDAP. The second compares tcmalloc to |
| win32's malloc. |
| http://www.highlandsun.com/hyc/malloc/ |
| http://gaiacrtn.free.fr/articles/win32perftools.html |
| |
| It's possible to build tcmalloc in a way that improves |
| performance (particularly for deletes) at the cost of more memory |
| fragmentation (that is, more unusable memory on your system). See the |
| INSTALL file for details. |
| |
| |
| OLD SYSTEM ISSUES |
| ----------------- |
| |
| When compiling perftools on some old systems, like RedHat 8, you may |
| get an error like this: |
| ___tls_get_addr: symbol not found |
| |
| This means that you have a system where some parts are updated enough |
| to support Thread Local Storage, but others are not. The perftools |
| configure script can't always detect this kind of case, leading to |
| that error. To fix it, just comment out (or delete) the line |
| #define HAVE_TLS 1 |
| in your config.h file before building. |
| |
| |
| 64-BIT ISSUES |
| ------------- |
| |
| There are two issues that can cause program hangs or crashes on x86_64 |
| 64-bit systems, which use the libunwind library to get stack-traces. |
| Neither issue should affect the core tcmalloc library; they both |
| affect the perftools tools such as cpu-profiler, heap-checker, and |
| heap-profiler. |
| |
| 1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the |
| libc function dl_iterate_phdr() acquires its locks in the wrong |
| order. This bug should not affect tcmalloc, but may cause occasional |
| deadlock with the cpu-profiler, heap-profiler, and heap-checker. |
| The deadlock becomes more likely the more dlopen() calls an executable |
| makes. Most executables don't make any, though several library routines |
| like getgrgid() call dlopen() behind the scenes. |
| |
| 2) On x86_64 64-bit systems, while tcmalloc itself works fine, the |
| cpu-profiler tool is unreliable: it will sometimes work, but sometimes |
| cause a segfault. I'll explain the problem first, and then some |
| workarounds. |
| |
| Note that this only affects the cpu-profiler, which is a |
| gperftools feature you must turn on manually by setting the |
| CPUPROFILE environment variable. If you do not turn on cpu-profiling, |
| you shouldn't see any crashes due to perftools. |
| |
| The gory details: The underlying problem is in the backtrace() |
| function, which is a built-in function in libc. |
| Backtracing is fairly straightforward in the normal case, but can run |
| into problems when having to backtrace across a signal frame. |
| Unfortunately, the cpu-profiler uses signals in order to register a |
| profiling event, so every backtrace that the profiler does crosses a |
| signal frame. |
| |
| In our experience, the only time there is trouble is when the signal |
| fires in the middle of pthread_mutex_lock. pthread_mutex_lock is |
| called quite a bit from system libraries, particularly at program |
| startup and when creating a new thread. |
| |
| The solution: The dwarf debugging format has support for 'cfi |
| annotations', which make it easy to recognize a signal frame. Some OS |
| distributions, such as Fedora and gentoo 2007.0, already have added |
| cfi annotations to their libc. A future version of libunwind should |
| recognize these annotations; these systems should not see any |
| crashes. |
| |
| Workarounds: If you see problems with crashes when running the |
| cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into |
| your code, rather than setting CPUPROFILE. This will profile only |
| those sections of the codebase. Though we haven't done much testing, |
| in theory this should reduce the chance of crashes by limiting the |
| signal generation to only a small part of the codebase. Ideally, you |
| would not use ProfilerStart()/ProfilerStop() around code that spawns |
| new threads, or is otherwise likely to cause a call to |
| pthread_mutex_lock! |
| |
| --- |
| 17 May 2011 |