  1. f34fa03 Add simd wrappers for bitwise ops. by Volodymyr Kysenko · 72 minutes ago [upstream/master]
  2. a94c2ac Update SDE from 10.5 to 10.7 for github testing on Intel by Frank Barchard · 2 hours ago
  3. 15e04ff Fix incorrectly validating non-zero zero points for qcint8. by Dillon Sharlet · 6 hours ago
  4. ecb6799 Check that input_b's zero point is 0 before ignoring it by Dillon Sharlet · 7 hours ago
  5. 60b526f Use simd wrappers for round, ceil, floor, and sqrt. by Volodymyr Kysenko · 8 hours ago
  6. 03aba00 Add floor, ceil, and sqrt SIMD wrappers. by Volodymyr Kysenko · 9 hours ago
  7. 6491ba3 Add sse2_fma to emulate fma by Dillon Sharlet · 10 hours ago
  8. a36bd79 Move fp16 rewrite specific fields in `xnn_value` to a dedicated struct. by Quentin Khan · 10 hours ago
  9. 12d63c6 Fix incorrect claim that unary_elementwise ops can implicitly broadcast by Dillon Sharlet · 10 hours ago
  10. 1b40695 Add `xnn_subgraph_add_internal_values()`. by Quentin Khan · 10 hours ago
  11. fb98246 Refactor `xnn_subgraph_new_node()` to use `xnn_subgraph_add_nodes()`. by Quentin Khan · 11 hours ago
  12. fcd3a3e Move macro comment to the correct place. by Quentin Khan · 13 hours ago
  13. 3aacda2 Refactor SIMD wrappers to reduce boilerplate by Dillon Sharlet · 24 hours ago
  14. 48c0219 Add u8x4 and u8x8 using uint32_t and uint64_t implementations by Dillon Sharlet · 25 hours ago
  15. 156ef8a Test and fix empty reductions in YNNPACK by Dillon Sharlet · 26 hours ago
  16. 64e490f Use infix operators if they're defined for simd::vec in elementwise compiler. by Volodymyr Kysenko · 30 hours ago
  17. 48d83fe Only test empty reduction inputs, not empty reduction outputs. by Dillon Sharlet · 33 hours ago
  18. 1960fd1 Check and get permissions before executing a kernel in schedule_bench tool. by Marie White · 2 days ago
  19. 1114532 Refactor emit_op function to remove redundant code. by Volodymyr Kysenko · 3 days ago
  20. bb52ed0 Remove redundant type cast in elementwise kernel compiler. by Volodymyr Kysenko · 3 days ago
  21. d94380b Fix empty reductions by Dillon Sharlet · 4 days ago
  22. f137acb Add a dot kernel benchmark with a manual dot scheduling command line interface by Dillon Sharlet · 5 days ago
  23. 34e5cce Enable arm fp32 and fp64 dot kernels to unroll k by more than one vector by Dillon Sharlet · 5 days ago
  24. 67dec31 Limit the number of threads used by `dot_bench` by Dillon Sharlet · 5 days ago
  25. f652afe Fixed input size issue found via masan in 2-bit FC. by Misha Gutman · 5 days ago
  26. 34f5e8a Tweak scheduling logic for dots by Dillon Sharlet · 6 days ago
  27. 2c0677c Rename kAmxTileRowBytes to tile_row_bytes by Marie White · 6 days ago
  28. 8da5e67 Add 2x2 AMX BF16 and INT8 kernels by Marie White · 6 days ago
  29. 262e13f Add fp64 dot kernels by Dillon Sharlet · 6 days ago
  30. 0011854 Added qd8_f16_qc2w and qdu8_f16_qc2w on operator and subgraph levels. by Misha Gutman · 6 days ago
  31. e4b5160 Enable AVXVNNI qd8 qc2w GEMM microkernel by Frank Barchard · 7 days ago
  32. 957988f Enable AVXVNNI qs8 qc2w GEMM microkernel by Frank Barchard · 7 days ago
  33. cdd3b23 Change fp16 softmax kernels to compute the sum as fp32 by Dillon Sharlet · 7 days ago
  34. 41d9b2d Allow an absolute error of 1 for integer convert outputs. by Dillon Sharlet · 7 days ago
  35. 5b4268d Run generators to update pqs8 tests and benchmarks by Frank Barchard · 7 days ago
  36. 7734071 AVXVNNI qd8_f16_qc2w and qd8_f32_qc2w GEMM microkernels by Frank Barchard · 7 days ago
  37. 9213121 Merge pull request #8880 from qualcomm:sme1/pqs8-qc8w-gemm-igemm by XNNPACK Team · 7 days ago
  38. dc51753 Scalar qd8_f16_qc2w GEMM microkernel by Frank Barchard · 7 days ago
  39. 9670025 Reduce log level from info to warning in debug builds by Dillon Sharlet · 7 days ago
  40. b6b9992 Remove `testonly = True` from `benchmark_main` by Dillon Sharlet · 7 days ago
  41. c4c44a7 Skip buffer out of bounds checks when running with msan by Dillon Sharlet · 8 days ago
  42. 26feacc Handle quantization with separate ops by Dillon Sharlet · 8 days ago [chromium/7718]
  43. 1b70ef3 AVXVNNI qs8 qc2w GEMM microkernel by Frank Barchard · 8 days ago
  44. 7aa9183 Add `ynn_define_tensor` API by Dillon Sharlet · 8 days ago
  45. 05e81ea [gn] February update of DEPS by Richard Townsend · 8 days ago
  46. cd2a0d4 Merge pull request #9594 from ken-unger:f16-gemm-spmm-dwconv-rvv by XNNPACK Team · 8 days ago
  47. b7c88d4 Increase timeout for `qd8_f32_qc8w_igemm_minmax_test` by Dillon Sharlet · 9 days ago
  48. 406cc2d Unroll the tail loop of SME kernels by Dillon Sharlet · 9 days ago
  49. f870adb Relax tolerance for YNN_NODE_FLAG_F32_DOT_TO_BF16_X3 dots by Dillon Sharlet · 9 days ago
  50. ce7aaae [gn] Link pthreadpool_standalone only against the module that needs it by Richard Townsend · 11 days ago
  51. 639decd Use avx512 for small output tiles by Dillon Sharlet · 11 days ago
  52. 337dacf Rename avx512f and avx512bw kernels to avx512 by Dillon Sharlet · 12 days ago
  53. 77f08c2 Clean up unnecessary checks in generated dot kernels by Dillon Sharlet · 12 days ago
  54. 511cc99 Merge remote-tracking branch 'google/master' into sme1/pqs8-qc8w-gemm-igemm by Vaisakh K V · 12 days ago
  55. 924bf08 Dot kernel naming and other cleanup by Dillon Sharlet · 12 days ago
  56. cb00a82 Rewrite common subgraphs test to use `ynn_subgraph` by reference by Dillon Sharlet · 12 days ago
  57. c0b9276 s32_mul for simd-hvx.h use mpyieo for upper 16 bits. by Frank Barchard · 12 days ago
  58. 035f3dd Add ARM SVE transpose kernels by Dillon Sharlet · 12 days ago
  59. d2ae61d Add SVE partial loads/stores to SIMD wrappers by Dillon Sharlet · 12 days ago
  60. 6fb6cad Fix benchmarks attempting to run code without checking the architecture first by Dillon Sharlet · 12 days ago
  61. 04581a6 Update SDE from 9.58 to 10.5 for github testing on Intel by Frank Barchard · 13 days ago
  62. 6361eab Add `dequantize` ternary kernels and use these to implement convert by Dillon Sharlet · 13 days ago
  63. 066e3e7 Add an `operator<<` for `ynn_subgraph` to make test failures easier to read by Dillon Sharlet · 14 days ago
  64. 8dc139c add rvv fp16 kernels for f16-gemm, f16-igemm, f16-dwconv, f16-spmm by Ken Unger · 14 days ago
  65. 6425d79 Add "scalar" quantize kernels, and use them to implement convert assuming they exist. by Dillon Sharlet · 14 days ago
  66. 43feb82 Use slinky::span instead of std::vector when possible by Dillon Sharlet · 14 days ago
  67. 726be35 Minor cleanups of fusion by Dillon Sharlet · 2 weeks ago
  68. 9ab5966 Randomize arithmetic tests for SIMD wrappers by Dillon Sharlet · 2 weeks ago
  69. d6298c9 [gn] Fixups required for integration into Chromium by Richard Townsend · 2 weeks ago
  70. e3f5ee8 Reorder x86 ISA enum by performance for QD8 to select faster ISA by Frank Barchard · 2 weeks ago
  71. c28ce78 Add basic HVX reduce kernels. by Dillon Sharlet · 2 weeks ago
  72. af8ea33 Use guard bytes on 32-bit arm by Dillon Sharlet · 2 weeks ago
  73. 533d396 Clean up f16-raddstoreexpminusmax kernels by Dillon Sharlet · 2 weeks ago
  74. 46ac455 Disable our own guard bytes mechanism if we have asan or msan by Dillon Sharlet · 3 weeks ago
  75. f32d329 Fix padding for subconvolution case by Dillon Sharlet · 3 weeks ago
  76. 7fd0f79 Use load/store from SIMD wrappers in elementwise compiler. by Volodymyr Kysenko · 3 weeks ago
  77. bd547a5 Add f16x4 load/store operations. by Volodymyr Kysenko · 3 weeks ago
  78. b347035 Allow partial SIMD load/store functions to handle full-size vectors. by Volodymyr Kysenko · 3 weeks ago
  79. 2a32558 Add missing rule for `broadcast_like` test by Dillon Sharlet · 3 weeks ago
  80. 5121fe0 [gn] Remove unnecessary DEPS by Richard Townsend · 3 weeks ago
  81. 5b05286 Use generic simd::min/max for elementwise kernels. by Volodymyr Kysenko · 3 weeks ago
  82. de693ef Replace `TypeGenerator` with `fill_random` by Dillon Sharlet · 3 weeks ago
  83. 10b7aa0 Use simd::broadcast in elementwise kernel compiler. by Volodymyr Kysenko · 3 weeks ago
  84. 928baa2 Refactor YNNPACK kernel generation to use simd::vec wrappers. (1/N) by Volodymyr Kysenko · 3 weeks ago
  85. 226f135 Enable AVX2 and AVX10 5x8 QS8 2 bit GEMM microkernels by Frank Barchard · 3 weeks ago
  86. 19e1944 Specialize `TypeGenerator` for all integers by Dillon Sharlet · 3 weeks ago
  87. e607cdd Use shorthand consistent type names for hexagon_hvx simd test by Dillon Sharlet · 3 weeks ago
  88. 61ba61d Avoid BENCHMARK_CAPTURE when unnecessary by Dillon Sharlet · 3 weeks ago
  89. e8aad3c Fix 32-bit integer multiply on HVX by Dillon Sharlet · 3 weeks ago
  90. 9940dd6 Reduce kernel test improvements by Dillon Sharlet · 3 weeks ago
  91. 57b14a8 Better sigmoid reference implementation by Dillon Sharlet · 3 weeks ago
  92. 82367b5 Don't replace minimum/maximum operators with clamp when input is broadcast by Reilly Grant · 3 weeks ago
  93. 9f84c7a Remove unused generate-spmm-test.py by Dillon Sharlet · 3 weeks ago
  94. 236a3e2 Update use of deprecated benchmark::internal::Benchmark by Dillon Sharlet · 3 weeks ago
  95. 4a547fe Simplify global construction of GEMM benchmarks by Dillon Sharlet · 3 weeks ago
  96. 35b9206 Clean up unnecessary codepaths from operator-run by Dillon Sharlet · 3 weeks ago
  97. 4f38bf9 Added qdu8_qc2w to FC for qc2w. This is needed to use fast avx2 kernels for DQ. by Misha Gutman · 3 weeks ago
  98. b83baa8 AVX10 qs8 qc2w GEMM microkernels by Frank Barchard · 4 weeks ago
  99. 810b9af [gn] Add experimental integration with KleidiAI by Richard Townsend · 4 weeks ago
  100. 661b24e Improve transpose benchmark tile size selection by Dillon Sharlet · 4 weeks ago