help clang inline important fast-path functions

Clang's recent focus on code size doesn't help us in the malloc fast
path, because clang seemingly ignores our inline directives altogether.

To help clang generate the code the original authors actually
intended, we add the always_inline attribute to the key fast-path
functions.
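
A minimal sketch of the technique, assuming a GCC- or clang-compatible
compiler. The ALWAYS_INLINE macro, size_class_index() and
fast_path_class() are hypothetical names for illustration, not the
functions touched by this change. A plain `inline` is only a hint that
the compiler's size heuristics may decline; always_inline asks it to
inline the function regardless.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro name. GCC and clang both accept always_inline;
   combined with 'static inline' it asks the compiler to inline the
   function even when its code-size heuristics would decline. */
#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE static inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE static inline
#endif

/* Hypothetical fast-path helper: round a request up to 8-byte
   granularity and return its size-class index (0 for a 0-byte request). */
ALWAYS_INLINE size_t size_class_index(size_t n) {
    return (n + 7) / 8;
}

/* Hypothetical caller standing in for a malloc fast path; the helper's
   body is expected to be inlined here. */
size_t fast_path_class(size_t n) {
    return size_class_index(n);
}
```

Using a macro keeps the attribute in one place, so compilers without
GCC-style attributes fall back to an ordinary inline hint.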

Clang also guessed the likely branch "wrong" in a couple of places.
This is now addressed by UNLIKELY annotations there.
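
A similar sketch for the branch hints, assuming UNLIKELY wraps the
GCC/clang builtin __builtin_expect (the project's own macro definition
is not shown here). The free_list type, refill_slow() and
pop_or_refill() are hypothetical; marking the empty-list branch as
unlikely lets the compiler lay out the common pop as the fall-through
path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: __builtin_expect tells GCC/clang which value the
   condition usually has, so the expected path can be placed inline and
   the rare path moved out of the way. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical singly linked free list; each free object stores the
   pointer to the next free object in its first word. */
struct free_list {
    void *head;
};

/* Hypothetical slow path; a real allocator would refill the list from a
   shared cache instead of returning NULL. */
static void *refill_slow(struct free_list *fl) {
    (void)fl;
    return NULL;
}

/* Pop one object; the empty-list case is annotated as unlikely. */
void *pop_or_refill(struct free_list *fl) {
    void *obj = fl->head;
    if (UNLIKELY(obj == NULL)) {
        return refill_slow(fl);
    }
    fl->head = *(void **)obj; /* advance to the next free object */
    return obj;
}
```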