help clang with inlining important fast-path functions Clang's recent focus on code size doesn't help us in malloc fast-path because somehow clang completely ignores inline directives. In order to help clang generate code that was actually intended by original authors, we're adding always_inline attribute to key fast-path functions. Clang also guessed likely branch "wrong" in couple places. Which is now addressed by UNLIKELY declarations there.