- dc0d882 Fix wrong types in hexagon_hvx.h by Alexander Shaposhnikov · 5 hours ago upstream/master
- 12f3b6e On Hexagon std::int32_t is defined as long, which is distinct from int. Because slinky::thread_pool uses int in its base class signature, using int32_t in the derived class causes a type mismatch. by Alexander Shaposhnikov · 5 hours ago
- 3ed73f5 Rewrote bmm(f32, dequant(qint8)) -> f32 to bmm(f32, qint8 -> qcint8) -> f32. by Mohammadreza Heydary · 11 hours ago
- 43c4f7b Add fine-grain detection of unsupported fp16 ops when falling back to fp32. by Quentin Khan · 16 hours ago
- 1f4e631 Rewrote bmm(f32, dequant(qint8)) -> f32 to bmm(f32, qint8 -> qcint8) -> f32. by Misha Gutman · 20 hours ago
- ce3b7c1 Fix XNNPACK compilation failure on Windows ARM64. by Frank Barchard · 25 hours ago
- 68373b6 Disable F32QC8W tests when using YNNPACK by Dillon Sharlet · 28 hours ago
- cf112cb Minor loop fusion improvements by Dillon Sharlet · 28 hours ago
- 98930ba Test reduce kernels with pi summation in both ascending and descending order by Dillon Sharlet · 29 hours ago
- eb93f24 Correctly zero-initialize `xnn_unary_params`. by Quentin Khan · 32 hours ago
- 5fa627d f16 vlog switch from rational-3-3 to rational-1-3 by Frank Barchard · 32 hours ago
- 51af7f5 Make external tensor order follow topological traversal of the graph. by Quentin Khan · 33 hours ago
- 6f2d84e Fix isomorphic matcher. by Quentin Khan · 33 hours ago
- 859e1a5 Merge pull request #10369 from aizu-m:reduce-axis-bounds by XNNPACK Team · 35 hours ago
- 799de5c Merge pull request #10373 from sin99xx:master by XNNPACK Team · 35 hours ago
- 4bc0e6a Remove useless test main function. by Quentin Khan · 2 days ago
- 207dab6 Enabled f32_qc8w bmm on the subgraph level. by Misha Gutman · 2 days ago
- 932ea64 Make hardware configuration and initialization guards (re)setable. by Quentin Khan · 2 days ago
- 85a6530 Added f32_qc8w operator level support for batch matrix multiply. by Misha Gutman · 2 days ago
- be7bb97 Don't rewrite sum(a * b) => dot(a, b) if the dot would be a vector-vector multiply by Dillon Sharlet · 2 days ago
- a1b0202 Add 2 bit SSE GEMM microkernels by Frank Barchard · 2 days ago
- 2fef1fc Change cache key schema for bazel builds. by Alexander Shaposhnikov · 2 days ago
- d431260 Fix undeclared identifier XNN_SIMD_NUM_RCP_ITER_F16 in wasm relaxed simd fp16 by Frank Barchard · 2 days ago
- 6062dc0 fix-concat-oob-write by waris · 2 days ago
- cced44b Add fp16 and bf16 implementations of `exp`, `expm1`, `log`, `log1p`, `erf`, `tanh` by Dillon Sharlet · 2 days ago
- 383840f Update XNNPACK elementwise benchmarks to use consistent N elements. by Frank Barchard · 2 days ago
- 8649fdd validate reduction axes before nchw remap in static reduce by aizu-m · 3 days ago
- 76de138 Add fp16 arithmetic to SIMD wrappers by Dillon Sharlet · 3 days ago chromium/7874 chromium/7875
- 6c4f444 Use `topological_sort` instead of `swap` by Dillon Sharlet · 3 days ago
- fa26412 Fix NaN handling by XNNPACK Team · 3 days ago
- 0654066 Fix bf16 cast by Dillon Sharlet · 4 days ago
- 1aa63db Fixed flakiness of qd8_f16_qc2w operator level tests. by Misha Gutman · 4 days ago
- 46394a4 Simplified fix for warnings in update-microkernels.py by recursively ignoring subdirectories of ignored roots. by Frank Barchard · 6 days ago
- 078291a Fix overzealous assert by Dillon Sharlet · 6 days ago
- d34f52c Update KleidiAI in XNNPACK by Dillon Sharlet · 6 days ago
- 5d756cd Add approx_tanh operator support behind YNN_FLAG_FAST_MATH. by XNNPACK Team · 6 days ago
- 76228ba Fix NaN handling by XNNPACK Team · 6 days ago
- 58c0a52 Remove RTTI from the Tensor API. Refactor operation handling in the graph. by Quentin Khan · 7 days ago
- d89bf2f Don't allow broadcasts as the first dimension of dot inputs by Dillon Sharlet · 7 days ago
- a74b048 Relax tolerance of sum reduce test by Dillon Sharlet · 7 days ago
- 23ba0fb Remove `force_root` from `static_transpose` scheduling by Dillon Sharlet · 7 days ago
- 0702200 Enable `YNN_FLAG_FAST_MATH` in XNNPACK compatibility layer by Dillon Sharlet · 7 days ago
- bf8d96b Add YNN_FLAG_FAST_MATH and approx_erf operator support behind this flag. by XNNPACK Team · 7 days ago
- 1eb7300 Do not create serial loops for k2, k3, ... by Marie White · 8 days ago
- 9b4a49f Fix warnings in update-microkernels.py by recursively ignoring subdirectories of ignored roots. by Frank Barchard · 8 days ago
- 05c9b91 Remove fxdiv usages from XNNPACK, keeping it only for pthreadpool by Frank Barchard · 8 days ago
- e2ab35a Merge pull request #10242 from ken-unger:f16-vlog-rvv by XNNPACK Team · 8 days ago
- 5004f85 Rewrite `transpose(static_broadcast(x))` => `static_broadcast(transpose(x))` by Dillon Sharlet · 8 days ago
- 01d254d Remove `broadcast` op implementation by Dillon Sharlet · 8 days ago
- 5fc47cc Fuse sequences of transpose(transpose(x)) into one transpose(x) by Dillon Sharlet · 8 days ago
- f9f2c22 Do not rely on tile_k when aligning split_k by Marie White · 8 days ago
- f1ab455 Implement `static_expand_dims` using `static_transpose` by Dillon Sharlet · 8 days ago
- d1da9a5 Add transcendental ops for every x86 architecture by Dillon Sharlet · 9 days ago
- 38eb8ab Remove RTTI from the Tensor API. Rework the `Quantization` hierarchy. by Quentin Khan · 9 days ago
- ac73c5b Remove RTTI from the Tensor API. Rework the `Buffer` hierarchy. by Quentin Khan · 9 days ago
- 43118f4 Remove RTTI from the Tensor API. Introduce `TypeId` class. by Quentin Khan · 9 days ago
- 30e1a98 f16-vtanh using high-accuracy rational polynomial implementation. by Frank Barchard · 9 days ago
- f9ee799 Add indirection_bench to test performance of indirection init by Frank Barchard · 9 days ago
- 134e8ef Fix test timeouts on emulators by Dillon Sharlet · 9 days ago
- dddad07 Split dot operation on K. by Marie White · 9 days ago
- 4d837f2 Fix attempts to use AVX2 instructions on non-AVX2 targets by Dillon Sharlet · 9 days ago
- acb00f5 Remove tile size from kernel function name by Dillon Sharlet · 9 days ago
- e60eb9d Add missing build of arm_neonfma benchmarks by Dillon Sharlet · 9 days ago
- 6cb1a2d Add XNN_ENABLE_RNDNU16 build flag and conditionally use rndnu16 kernels by Frank Barchard · 9 days ago
- 8ea1945 `tanh` accuracy improvements by Dillon Sharlet · 9 days ago
- cc4daec Migrate LiteRT ATS unary op graph generation to use litert::tensor API. by Gerardo Carranza · 9 days ago
- 35ba08c Add `tanh` SIMD wrappers by Dillon Sharlet · 10 days ago
- f8deca0 Replace rational polynomials for exp with non-rational polynomials by Dillon Sharlet · 10 days ago
- f4ed055 Add `erf` SIMD math functions by Dillon Sharlet · 10 days ago
- 41fde00 Improve exp approximation by Dillon Sharlet · 10 days ago
- 2b99af3 Add clarifying comments in call to define_transpose_a(). by Marie White · 11 days ago
- 894ae65 Fix `floor_log2(NaN)` to be `NaN` by Dillon Sharlet · 11 days ago
- eafd6fe Add benchmarks of exp and log for avx and avx2 by Dillon Sharlet · 11 days ago
- 38631e8 Refactor `exp` and `expm1` to use the same implementation by Dillon Sharlet · 11 days ago
- fda3ca7 Fix precision issue in rndnu16 requantization for scales near powers of 2. by Frank Barchard · 13 days ago
- 52bd8d0 Tighten tolerances of `log` from 3 ULPs to 2 by Dillon Sharlet · 13 days ago
- 0ccb84e Implement `YNN_FLAG_CONSISTENT_ARITHMETIC` for unary elementwise kernels by Dillon Sharlet · 13 days ago
- c45d6b4 Don't split the innermost dimension if the type of the input is sub-byte. by Volodymyr Kysenko · 13 days ago
- 5ff101c Merge pull request #10298 from wangw-1991:fix_LUT_fusion by XNNPACK Team · 13 days ago
- 3b5dbb2 Remove `fma` when not available, and add `multiply_add` which optionally uses `fma` when available. by Dillon Sharlet · 13 days ago
- 0080367 Combine x86 SIMD wrapper headers by Dillon Sharlet · 13 days ago
- c4e49c6 Combine ARM SIMD wrapper headers by Dillon Sharlet · 13 days ago
- 00825fc Define all architecture flags transitively implied by enabled architectures. by Dillon Sharlet · 14 days ago
- cccad55 Add math helpers to SIMD wrappers by Dillon Sharlet · 2 weeks ago
- 549deb8 Remove unused `transpose` SIMD wrapper by Dillon Sharlet · 2 weeks ago
- 58a65db Mark values as external outputs in constant folding only if they are actually used in the non-constant pipeline. by Volodymyr Kysenko · 2 weeks ago
- 6c9a1ab Consolidate some SIMD wrapper headers by Dillon Sharlet · 2 weeks ago
- ea77aab Generalize FMA emulation helper by Dillon Sharlet · 2 weeks ago
- 1b849f4 Tune params for unary kernels to avoid tolerance issues by Dillon Sharlet · 2 weeks ago
- 0f6ee41 Initial upload. by Wei Wang · 2 weeks ago
- d4adfcd gemm benchmark documentation fix - update names of models to match files by Frank Barchard · 2 weeks ago
- f56a6c7 Add numerically correct `expm1` kernels by Dillon Sharlet · 2 weeks ago
- 29a1c73 Add std::string overloads for tensor::Create. by XNNPACK Team · 2 weeks ago
- b1a0a5d Merge pull request #10261 from velonica0:f16 by XNNPACK Team · 2 weeks ago
- a0dbef3 Improve `exp` accuracy by Dillon Sharlet · 2 weeks ago
- 3f33e55 Add `select` and conditional operations to SIMD wrappers by Dillon Sharlet · 2 weeks ago
- 5a59a54 Properly open source tensor api in github through copybara by XNNPACK Team · 2 weeks ago
- bcc179a Remove fp64 wasm support by Dillon Sharlet · 2 weeks ago
- 0547829 Remove lo/hi as member functions of `vec<T, N>` by Dillon Sharlet · 2 weeks ago
- cc68da8 Add sigmoid_fp64 kernels by Dillon Sharlet · 2 weeks ago