tree 4076d0b7d13a25ee27853805f7b4def98b26c56e
parent 1692593c58150eff065220a8e6017be7c1ba97e1
author Thomas Lively <tlively@google.com> 1733279371 -0800
committer Thomas Lively <tlively@google.com> 1733282075 -0800

Parallelize the binary parsing of function bodies

After a linear scan through the code section and input source map to
find the start locations corresponding to each function body, parse the
locals and instructions for each function in parallel.
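
As a rough illustration of the two phases (not the code in this commit),
here is a minimal sketch. It assumes the standard code section layout: a
LEB128 function count followed by size-prefixed bodies. `BodySpan`,
`findBodies`, `parseBody`, and `parseAll` are hypothetical names,
`parseBody` is a stand-in that only sums bytes, and the source map scan
is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <thread>
#include <vector>

// Byte range of one function body inside the code section.
struct BodySpan {
  size_t start;
  size_t size;
};

// Decode an unsigned LEB128 u32 at `pos`, advancing `pos` past it.
uint32_t readU32LEB(const std::vector<uint8_t>& bytes, size_t& pos) {
  uint32_t result = 0;
  for (unsigned shift = 0; shift < 35; shift += 7) {
    if (pos >= bytes.size()) {
      throw std::runtime_error("unexpected end of section");
    }
    uint8_t byte = bytes[pos++];
    result |= uint32_t(byte & 0x7f) << shift;
    if (!(byte & 0x80)) {
      return result;
    }
  }
  throw std::runtime_error("LEB128 value too long");
}

// Phase 1: a sequential scan that records where each body starts,
// skipping over the body contents without parsing them.
std::vector<BodySpan> findBodies(const std::vector<uint8_t>& section) {
  size_t pos = 0;
  uint32_t count = readU32LEB(section, pos);
  std::vector<BodySpan> spans;
  for (uint32_t i = 0; i < count; ++i) {
    uint32_t size = readU32LEB(section, pos);
    if (size > section.size() - pos) {
      throw std::runtime_error("function body extends past section");
    }
    spans.push_back({pos, size});
    pos += size;
  }
  return spans;
}

// Hypothetical stand-in for parsing locals and instructions: it just
// sums the body's bytes so the result is easy to check.
uint64_t parseBody(const std::vector<uint8_t>& section, BodySpan span) {
  uint64_t sum = 0;
  for (size_t i = 0; i < span.size; ++i) {
    sum += section[span.start + i];
  }
  return sum;
}

// Phase 2: each worker handles every numThreads-th body. Workers only
// read the shared section and write to distinct result slots.
std::vector<uint64_t> parseAll(const std::vector<uint8_t>& section,
                               unsigned numThreads) {
  assert(numThreads > 0);
  std::vector<BodySpan> spans = findBodies(section);
  std::vector<uint64_t> results(spans.size());
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t) {
    workers.emplace_back([&, t] {
      for (size_t i = t; i < spans.size(); i += numThreads) {
        results[i] = parseBody(section, spans[i]);
      }
    });
  }
  for (auto& worker : workers) {
    worker.join();
  }
  return results;
}
```

Calling `parseAll(section, 8)` runs the scan once on the calling thread
and then fans the bodies out to eight workers. The scan has to stay
sequential because each body's start depends on the sizes before it.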

This speeds up binary parsing with a source map by about 20% with 8
cores on my machine, but only by about 2% with all 128 cores, so this
parallelization has potential but suffers from scaling overhead[^1].

When running a full -O3 optimization pipeline, the parallel parsing
slightly reduces the number of minor page faults, presumably by better
allocating the original instructions in separate thread-local arenas. It
also slightly reduces the max RSS, but these improvements do not
translate into better overall performance. In fact, overall performance
is slightly lower with this change (at least on my machine).

[^1]: FWIW the full -O3 pipeline also performs significantly better with
8 cores than with 128 cores on my machine.
