[libc++] Optimize num_get integral functions (#121795)
```
---------------------------------------------------
Benchmark                            old        new
---------------------------------------------------
BM_num_get<bool>                 86.5 ns    32.3 ns
BM_num_get<long>                 82.1 ns    30.3 ns
BM_num_get<long long>            85.2 ns    33.4 ns
BM_num_get<unsigned short>       85.3 ns    31.2 ns
BM_num_get<unsigned int>         84.2 ns    31.1 ns
BM_num_get<unsigned long>        83.6 ns    31.9 ns
BM_num_get<unsigned long long>   87.7 ns    31.5 ns
BM_num_get<float>                 116 ns     114 ns
BM_num_get<double>                114 ns     114 ns
BM_num_get<long double>           113 ns     114 ns
BM_num_get<void*>                 151 ns     144 ns
```
This patch applies multiple optimizations:
- Stages two and three of do_get are merged, and a custom integer parser
  has been implemented. This avoids allocations, removes the need for
  strto{,u}ll, and avoids __stage2_int_loop (avoiding extra writes to
  memory).
- std::find has been replaced with __atoms_offset, which uses vector
  instructions to look for a character.
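For readers unfamiliar with the approach, the following is a minimal sketch of a single-pass integer parser that accumulates the value directly instead of copying digits into a buffer and calling strtoull. It is not the libc++ implementation: `digit_value` and `parse_unsigned` are hypothetical names, the digit lookup is scalar rather than vectorized, and locale handling (grouping, sign, base prefixes) is left out entirely.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical helper: maps a character to its digit value, or -1 if it is
// not a digit. This scalar lookup only stands in for the idea of locating a
// character in a table of accepted atoms; it is not libc++'s __atoms_offset.
inline int digit_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical parser: reads leading digits of `text` in one pass and
// accumulates them into `out` with no intermediate buffer or allocation.
// Returns false if no digit was consumed or if the value would overflow.
template <class UInt>
bool parse_unsigned(const std::string& text, UInt& out, int base = 10) {
  static_assert(std::numeric_limits<UInt>::is_integer &&
                    !std::numeric_limits<UInt>::is_signed,
                "UInt must be an unsigned integer type");
  assert(base >= 2 && base <= 16);

  const UInt max = std::numeric_limits<UInt>::max();
  const UInt ubase = static_cast<UInt>(base);
  UInt value = 0;
  bool any_digit = false;

  for (char c : text) {
    const int d = digit_value(c);
    if (d < 0 || d >= base) break;  // first non-digit ends the number

    // value * base + d fits in UInt exactly when value <= (max - d) / base.
    const UInt ud = static_cast<UInt>(d);
    if (value > (max - ud) / ubase) return false;  // would overflow

    value = static_cast<UInt>(value * ubase + ud);
    any_digit = true;
  }

  if (!any_digit) return false;
  out = value;
  return true;
}
```

Calling `parse_unsigned("1234", v)` with an `unsigned v` stores 1234 and returns true, while parsing "65536" into an `unsigned short` returns false. Reporting overflow through the return value is a simplification made for this sketch; it does not reproduce how num_get reports errors.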
Fixes #158100
Fixes #158102
NOKEYCHECK=True
GitOrigin-RevId: 2bdd1357c826afe681ab0d6ddfa8fb814b2cef6a
21 files changed