- f34fa03 Add simd wrappers for bitwise ops. by Volodymyr Kysenko · 72 minutes ago upstream/master
- a94c2ac Update SDE from 10.5 to 10.7 for github testing on Intel by Frank Barchard · 2 hours ago
- 15e04ff Fix incorrectly validating non-zero zero points for qcint8. by Dillon Sharlet · 6 hours ago
- ecb6799 Check that input_b's zero point is 0 before ignoring it by Dillon Sharlet · 7 hours ago
- 60b526f Use simd wrappers for round, ceil, floor, and sqrt. by Volodymyr Kysenko · 8 hours ago
- 03aba00 Add floor, ceil, and sqrt SIMD wrappers. by Volodymyr Kysenko · 9 hours ago
- 6491ba3 Add sse2_fma to emulate fma by Dillon Sharlet · 10 hours ago
- a36bd79 Move fp16 rewrite specific fields in `xnn_value` to a dedicated struct. by Quentin Khan · 10 hours ago
- 12d63c6 Fix incorrect claim that unary_elementwise ops can implicitly broadcast by Dillon Sharlet · 10 hours ago
- 1b40695 Add `xnn_subgraph_add_internal_values()`. by Quentin Khan · 10 hours ago
- fb98246 Refactor `xnn_subgraph_new_node()` to use `xnn_subgraph_add_nodes()`. by Quentin Khan · 11 hours ago
- fcd3a3e Move macro comment to the correct place. by Quentin Khan · 13 hours ago
- 3aacda2 Refactor SIMD wrappers to reduce boilerplate by Dillon Sharlet · 24 hours ago
- 48c0219 Add u8x4 and u8x8 using uint32_t and uint64_t implementations by Dillon Sharlet · 25 hours ago
- 156ef8a Test and fix empty reductions in YNNPACK by Dillon Sharlet · 26 hours ago
- 64e490f Use infix operators if they're defined for simd::vec in elementwise compiler. by Volodymyr Kysenko · 30 hours ago
- 48d83fe Only test empty reduction inputs, not empty reduction outputs. by Dillon Sharlet · 33 hours ago
- 1960fd1 Check and get permissions before executing a kernel in schedule_bench tool. by Marie White · 2 days ago
- 1114532 Refactor emit_op function to remove redundant code. by Volodymyr Kysenko · 3 days ago
- bb52ed0 Remove redundant type cast in elementwise kernel compiler. by Volodymyr Kysenko · 3 days ago
- d94380b Fix empty reductions by Dillon Sharlet · 4 days ago
- f137acb Add a dot kernel benchmark with a manual dot scheduling command line interface by Dillon Sharlet · 5 days ago
- 34e5cce Enable arm fp32 and fp64 dot kernels to unroll k by more than one vector by Dillon Sharlet · 5 days ago
- 67dec31 Limit the number of threads used by `dot_bench` by Dillon Sharlet · 5 days ago
- f652afe Fixed input size issue found via masan in 2-bit FC. by Misha Gutman · 5 days ago
- 34f5e8a Tweak scheduling logic for dots by Dillon Sharlet · 6 days ago
- 2c0677c Rename kAmxTileRowBytes to tile_row_bytes by Marie White · 6 days ago
- 8da5e67 Add 2x2 AMX BF16 and INT8 kernels by Marie White · 6 days ago
- 262e13f Add fp64 dot kernels by Dillon Sharlet · 6 days ago
- 0011854 Added qd8_f16_qc2w and qdu8_f16_qc2w on operator and subgraph levels. by Misha Gutman · 6 days ago
- e4b5160 Enable AVXVNNI qd8 qc2w GEMM microkernel by Frank Barchard · 7 days ago
- 957988f Enable AVXVNNI qs8 qc2w GEMM microkernel by Frank Barchard · 7 days ago
- cdd3b23 Change fp16 softmax kernels to compute the sum as fp32 by Dillon Sharlet · 7 days ago
- 41d9b2d Allow an absolute error of 1 for integer convert outputs. by Dillon Sharlet · 7 days ago
- 5b4268d Run generators to update pqs8 tests and benchmarks by Frank Barchard · 7 days ago
- 7734071 AVXVNNI qd8_f16_qc2w and qd8_f32_qc2w GEMM microkernels by Frank Barchard · 7 days ago
- 9213121 Merge pull request #8880 from qualcomm:sme1/pqs8-qc8w-gemm-igemm by XNNPACK Team · 7 days ago
- dc51753 Scalar qd8_f16_qc2w GEMM microkernel by Frank Barchard · 7 days ago
- 9670025 Reduce log level from info to warning in debug builds by Dillon Sharlet · 7 days ago
- b6b9992 Remove `testonly = True` from `benchmark_main` by Dillon Sharlet · 7 days ago
- c4c44a7 Skip buffer out of bounds checks when running with msan by Dillon Sharlet · 8 days ago
- 26feacc Handle quantization with separate ops by Dillon Sharlet · 8 days ago chromium/7718
- 1b70ef3 AVXVNNI qs8 qc2w GEMM microkernel by Frank Barchard · 8 days ago
- 7aa9183 Add `ynn_define_tensor` API by Dillon Sharlet · 8 days ago
- 05e81ea [gn] February update of DEPS by Richard Townsend · 8 days ago
- cd2a0d4 Merge pull request #9594 from ken-unger:f16-gemm-spmm-dwconv-rvv by XNNPACK Team · 8 days ago
- b7c88d4 Increase timeout for `qd8_f32_qc8w_igemm_minmax_test` by Dillon Sharlet · 9 days ago
- 406cc2d Unroll the tail loop of SME kernels by Dillon Sharlet · 9 days ago
- f870adb Relax tolerance for YNN_NODE_FLAG_F32_DOT_TO_BF16_X3 dots by Dillon Sharlet · 9 days ago
- ce7aaae [gn] Link pthreadpool_standalone only against the module that needs it by Richard Townsend · 11 days ago
- 639decd Use avx512 for small output tiles by Dillon Sharlet · 11 days ago
- 337dacf Rename avx512f and avx512bw kernels to avx512 by Dillon Sharlet · 12 days ago
- 77f08c2 Clean up unnecessary checks in generated dot kernels by Dillon Sharlet · 12 days ago
- 511cc99 Merge remote-tracking branch 'google/master' into sme1/pqs8-qc8w-gemm-igemm by Vaisakh K V · 12 days ago
- 924bf08 Dot kernel naming and other cleanup by Dillon Sharlet · 12 days ago
- cb00a82 Rewrite common subgraphs test to use `ynn_subgraph` by reference by Dillon Sharlet · 12 days ago
- c0b9276 s32_mul for simd-hvx.h use mpyieo for upper 16 bits. by Frank Barchard · 12 days ago
- 035f3dd Add ARM SVE transpose kernels by Dillon Sharlet · 12 days ago
- d2ae61d Add SVE partial loads/stores to SIMD wrappers by Dillon Sharlet · 12 days ago
- 6fb6cad Fix benchmarks attempting to run code without checking the architecture first by Dillon Sharlet · 12 days ago
- 04581a6 Update SDE from 9.58 to 10.5 for github testing on Intel by Frank Barchard · 13 days ago
- 6361eab Add `dequantize` ternary kernels and use these to implement convert by Dillon Sharlet · 13 days ago
- 066e3e7 Add an `operator<<` for `ynn_subgraph` to make test failures easier to read by Dillon Sharlet · 14 days ago
- 8dc139c add rvv fp16 kernels for f16-gemm, f16-igemm, f16-dwconv, f16-spmm by Ken Unger · 14 days ago
- 6425d79 Add "scalar" quantize kernels, and use them to implement convert assuming they exist. by Dillon Sharlet · 14 days ago
- 43feb82 Use slinky::span instead of std::vector when possible by Dillon Sharlet · 14 days ago
- 726be35 Minor cleanups of fusion by Dillon Sharlet · 2 weeks ago
- 9ab5966 Randomize arithmetic tests for SIMD wrappers by Dillon Sharlet · 2 weeks ago
- d6298c9 [gn] Fixups required for integration into Chromium by Richard Townsend · 2 weeks ago
- e3f5ee8 Reorder x86 ISA enum by performance for QD8 to select faster ISA by Frank Barchard · 2 weeks ago
- c28ce78 Add basic HVX reduce kernels. by Dillon Sharlet · 2 weeks ago
- af8ea33 Use guard bytes on 32-bit arm by Dillon Sharlet · 2 weeks ago
- 533d396 Clean up f16-raddstoreexpminusmax kernels by Dillon Sharlet · 2 weeks ago
- 46ac455 Disable our own guard bytes mechanism if we have asan or msan by Dillon Sharlet · 3 weeks ago
- f32d329 Fix padding for subconvolution case by Dillon Sharlet · 3 weeks ago
- 7fd0f79 Use load/store from SIMD wrappers in elementwise compiler. by Volodymyr Kysenko · 3 weeks ago
- bd547a5 Add f16x4 load/store operations. by Volodymyr Kysenko · 3 weeks ago
- b347035 Allow partial SIMD load/store functions to handle full-size vectors. by Volodymyr Kysenko · 3 weeks ago
- 2a32558 Add missing rule for `broadcast_like` test by Dillon Sharlet · 3 weeks ago
- 5121fe0 [gn] Remove unnecessary DEPS by Richard Townsend · 3 weeks ago
- 5b05286 Use generic simd::min/max for elementwise kernels. by Volodymyr Kysenko · 3 weeks ago
- de693ef Replace `TypeGenerator` with `fill_random` by Dillon Sharlet · 3 weeks ago
- 10b7aa0 Use simd::broadcast in elementwise kernel compiler. by Volodymyr Kysenko · 3 weeks ago
- 928baa2 Refactor YNNPACK kernel generation to use simd::vec wrappers. (1/N) by Volodymyr Kysenko · 3 weeks ago
- 226f135 Enable AVX2 and AVX10 5x8 QS8 2 bit GEMM microkernels by Frank Barchard · 3 weeks ago
- 19e1944 Specialize `TypeGenerator` for all integers by Dillon Sharlet · 3 weeks ago
- e607cdd Use shorthand consistent type names for hexagon_hvx simd test by Dillon Sharlet · 3 weeks ago
- 61ba61d Avoid BENCHMARK_CAPTURE when unnecessary by Dillon Sharlet · 3 weeks ago
- e8aad3c Fix 32-bit integer multiply on HVX by Dillon Sharlet · 3 weeks ago
- 9940dd6 Reduce kernel test improvements by Dillon Sharlet · 3 weeks ago
- 57b14a8 Better sigmoid reference implementation by Dillon Sharlet · 3 weeks ago
- 82367b5 Don't replace minimum/maximum operators with clamp when input is broadcast by Reilly Grant · 3 weeks ago
- 9f84c7a Remove unused generate-spmm-test.py by Dillon Sharlet · 3 weeks ago
- 236a3e2 Update use of deprecated benchmark::internal::Benchmark by Dillon Sharlet · 3 weeks ago
- 4a547fe Simplify global construction of GEMM benchmarks by Dillon Sharlet · 3 weeks ago
- 35b9206 Clean up unnecessary codepaths from operator-run by Dillon Sharlet · 3 weeks ago
- 4f38bf9 Added qdu8_qc2w to FC for qc2w. This is needed to use fast avx2 kernels for DQ. by Misha Gutman · 3 weeks ago
- b83baa8 AVX10 qs8 qc2w GEMM microkernels by Frank Barchard · 4 weeks ago
- 810b9af [gn] Add experimental integration with KleidiAI by Richard Townsend · 4 weeks ago
- 661b24e Improve transpose benchmark tile size selection by Dillon Sharlet · 4 weeks ago