  1. 4536794 Fix KBLOCK size for qc2w GEMM scalar and neondot microkernel tests by Frank Barchard · 3 days ago upstream/master
  2. cafdb1e Fix KBLOCK size for neondot qc2w microkernel tests by Frank Barchard · 3 days ago
  3. 4a0e2db Benchmarks for Apple macos / ios detect cpu frequency by Frank Barchard · 3 days ago
  4. 25b42df Use tags to filter out tests for YNNPACK by Dillon Sharlet · 8 days ago
  5. bf97a38 Fix nc param to gemm/igemm headers by Frank Barchard · 8 days ago
  6. 742fd64 QC2W NEONDOT defer kernel_zero_point compute to after loop by Frank Barchard · 8 days ago
  7. 9b2d3e8 Remove unnecessary assert by Dillon Sharlet · 8 days ago
  8. 2fff7b8 Fix `tile_k > 1` case for rewriting `transpose_a(stencil_copy(x))` -> `stencil_copy(transpose_a(x))` by Dillon Sharlet · 8 days ago
  9. 4f8026d Try to store output of stencil_copy at the innermost loop level. by Volodymyr Kysenko · 11 days ago
  10. e1e7194 Fix stencil_copy bounds if multiple stencils use the same original dimension by Dillon Sharlet · 11 days ago
  11. 911c20f Define Hexagon type for intrinsics compatible with clang by Frank Barchard · 11 days ago
  12. 20821fb 1. Adjust AVX reductions to match AVX512 2. Add consistent_reduce_test.cc by Alexander Shaposhnikov · 11 days ago
  13. da578ff Fix stencil output extent calculation. by Volodymyr Kysenko · 11 days ago
  14. 549d5d2 Remove unused Hexagon intrinsics from intrinsics-polyfill.h by Frank Barchard · 12 days ago
  15. c299a16 Include bias in initializer of dot for f32. by Marie White · 12 days ago
  16. 7d0408c Fix bug in avx512 int32 helpers by Dillon Sharlet · 14 days ago
  17. fc1d7b1 Don't use `EnumerateIndices` unnecessarily by Dillon Sharlet · 2 weeks ago
  18. 82d56fb Change simd::vec to be recursively defined by Dillon Sharlet · 2 weeks ago
  19. f52fbff Fixed full byte requirement in 2-bit Fully Connected test for qs8 version. by Misha Gutman · 2 weeks ago
  20. ecdf584 Don't try to test int2 fully connected with YNNPACK by Dillon Sharlet · 2 weeks ago
  21. 91bc5c2 Added qs8_qc2w FullyConnected. by Misha Gutman · 2 weeks ago
  22. e31a359 Support tile_k > 1 in transpose `stencil_copy` rewrite. by Marie White · 2 weeks ago
  23. 183297d Fix `multi_vec`'s partial `load`/`store` by Dillon Sharlet · 2 weeks ago
  24. ff23b55 Rewrite transpose_a(stencil_copy(x)) to stencil_copy(transpose_a(x)) by XNNPACK Team · 2 weeks ago
  25. 0723f38 Minor fixes to stencil_copy by XNNPACK Team · 2 weeks ago
  26. 944d44d Enable transpose_a kernels for num_k_dims > 1 by Dillon Sharlet · 2 weeks ago
  27. 51dd4c8 Tweak K1 = 1 reduction case by Dillon Sharlet · 2 weeks ago
  28. 996234f Merge pull request #9316 from salmanmkc:upgrade-github-actions-node24 by XNNPACK Team · 2 weeks ago
  29. e317c78 Pass down the original kernel and bias buffers when creating a convolution operation. by Quentin Khan · 2 weeks ago
  30. 103c631 Improved k1=1 reductions. Most of the cases have significant speed-up except scalar kernels, sum bf16_f32 for f16c and sse2, and x8_s32 for avx2. by Misha Gutman · 2 weeks ago
  31. adf70ec Allowed not to pass channel wise zero point to accommodate asymmetric (zero point is assumed 0) 2-bit case. by Misha Gutman · 2 weeks ago
  32. 2c1a512 Enable weight caching for quantized depthwise convolutions. by Quentin Khan · 2 weeks ago
  33. 4574c4d Rewrite SIMD tests to make a new test for each architecture by Dillon Sharlet · 2 weeks ago
  34. 30cd310 Fixed n=0 case that left uninitialized vectors. by Misha Gutman · 3 weeks ago
  35. 7926389 Upgrade GitHub Actions for Node 24 compatibility by Salman Muin Kayser Chishti · 3 weeks ago
  36. 9e77b69 Pass down the original kernel and bias buffers when creating a fully connected operation. by Quentin Khan · 3 weeks ago
  37. 5dab025 Add stencil_copy helper function by XNNPACK Team · 3 weeks ago
  38. d9ad8b4 Attempted to fix 2-bit fully-connected operator flakiness. by Misha Gutman · 3 weeks ago
  39. 7b059b1 Create transpose_a helper by XNNPACK Team · 3 weeks ago
  40. becb9f5 Don't allow computing ops in-place if the output of the op is an external output. by Dillon Sharlet · 3 weeks ago
  41. 5d89a13 Hexagon V81 portable simd - Remove Q6_Vsf_equals_Vqf32 and output Vsf directly by Frank Barchard · 3 weeks ago
  42. f51a395 Added scalar kernels to qc2w gemm-config. by Misha Gutman · 3 weeks ago
  43. 7b275c9 Fix assert in dot.cc by Volodymyr Kysenko · 3 weeks ago
  44. a423689 Tighten up reduce kernels and architectures by Dillon Sharlet · 3 weeks ago
  45. 021f28e Add `fma`, `concat`, and more `convert` functions to SIMD headers by Dillon Sharlet · 3 weeks ago
  46. 3fc92c7 Add a function to get fingerprints by index for testing. by Quentin Khan · 3 weeks ago
  47. 7746844 Added simd extract tests. by Misha Gutman · 3 weeks ago
  48. ddc8e1a Fix uninitialized variable build error for params to rsum for 2 bit by Frank Barchard · 3 weeks ago
  49. 3a7b710 Merge pull request #9270 from JonathanC-ARM:jclohess_fix_missing_debug_logging by XNNPACK Team · 3 weeks ago
  50. e436865 Use /clang: to pass copts to clang-cl by Dillon Sharlet · 3 weeks ago
  51. 4e2113d Minor code cleanup by Dillon Sharlet · 3 weeks ago
  52. 421648d Added convert<bf16x16, f32x16> and convert<f16x16, f32x16> to x86_avx512f to decrease intrinsics usage in reduce kernels. by Misha Gutman · 3 weeks ago
  53. 79a6eac _mm256_packs_epi* is AVX2, not AVX by Dillon Sharlet · 3 weeks ago
  54. 6400256 Fix windows build flags by Dillon Sharlet · 3 weeks ago
  55. 1e2d4d0 Added convert<bf16x8, f32x8> to x86_avx2 to decrease intrinsics usage in reduce kernels. by Misha Gutman · 3 weeks ago
  56. 5a4ee42 Merge branch 'google:master' into jclohess_fix_missing_debug_logging by Jonathan Clohessy · 3 weeks ago
  57. 31b1ede simd test improvements by Dillon Sharlet · 3 weeks ago
  58. 3c9ad05 Fix uses of AVX2 instructions in AVX targeted code by Dillon Sharlet · 3 weeks ago
  59. aabc21c Automated Code Change by XNNPACK Team · 3 weeks ago
  60. 269fbf1 Use a more intuitive formulation of `make_stencil_dim` by Dillon Sharlet · 3 weeks ago
  61. be53d66 Fixed int2 fully connected setting producers in the beginning of row_sum rewrite. by Misha Gutman · 3 weeks ago
  62. b666baa Templated neondot int2 GEMMs. by Misha Gutman · 3 weeks ago
  63. dfaf2ae Added scalar kernels for int2 gemm. by Misha Gutman · 3 weeks ago
  64. aa67973 Re-enable kernels disabled under msan due to msan bugs by Dillon Sharlet · 3 weeks ago
  65. dc05a09 Add @bazel_tools//tools/cpp/compiler:clang-cl as a signal to target MSVC-style builds by Dillon Sharlet · 3 weeks ago
  66. 631f179 Use _mm_packs_epi32 instead of _mm_packus_epi32 for converting to uint8 by Dillon Sharlet · 3 weeks ago
  67. 0852723 Give ternary nodes a proper type by Dillon Sharlet · 4 weeks ago
  68. 470fb9c Fix unused variable warning in `pack-lh-config.c`. by Quentin Khan · 4 weeks ago
  69. ff3f40a Move f16 sum/sum_squared kernels to avx512bw by Dillon Sharlet · 4 weeks ago
  70. c187017 Speed up reduce kernels test by Dillon Sharlet · 4 weeks ago
  71. 81bb443 Minor readability and refactors for reductions by Dillon Sharlet · 4 weeks ago
  72. e7efd4a Fix out of bounds memory access when transposed A kernels have tile_m != 1 by Dillon Sharlet · 4 weeks ago
  73. 84cdd30 Fix avx512fp16 sum to only add each input once. by Dillon Sharlet · 4 weeks ago
  74. 3e588fd Added bf16 sum and sum_squared to sse2. by Misha Gutman · 4 weeks ago
  75. 8ff7ddb Added bf16 sum and sum_squared to avx512. by Misha Gutman · 4 weeks ago
  76. 682892c Add packed igemm debug logging by Jonathan Clohessy · 4 weeks ago
  77. 4d0cba7 Added bf16 sum and sum_squared to avx2. by Misha Gutman · 4 weeks ago
  78. 4f4cfcc Move bias add to after the dot operation. by XNNPACK Team · 4 weeks ago
  79. 940155d Update xnnpack shim by XNNPACK Team · 4 weeks ago
  80. 85468e7 Partially enable convolution and fully_connected subgraph tests when YNNPACK is enabled by Dillon Sharlet · 4 weeks ago
  81. 09e2d3e Fix hole in reduce kernel test coverage by Dillon Sharlet · 4 weeks ago
  82. 59eb057 Fix some regressions in fully connected test by Dillon Sharlet · 4 weeks ago
  83. a342cf6 Minor reduction cleanups by Dillon Sharlet · 4 weeks ago
  84. 995241f Added bf16 sum and sum_squared to arm neon. by Misha Gutman · 4 weeks ago
  85. 9bab77f Merge pull request #9244 from JonathanC-ARM:jclohess_sme2_pf32_igemm by XNNPACK Team · 4 weeks ago
  86. 0399c7d Added sum_squared reductions to arm. by Misha Gutman · 4 weeks ago
  87. 1aef20a Add neon-bf16 dot kernels by Dillon Sharlet · 4 weeks ago
  88. 3e084f0 Refactor ifdef to be inline with sme1 variant by Jonathan Clohessy · 4 weeks ago
  89. fd03d06 Merge branch 'google:master' into jclohess_sme2_pf32_igemm by Jonathan Clohessy · 4 weeks ago
  90. 3c43700 Clean up SIMD headers and improve test coverage by Dillon Sharlet · 4 weeks ago
  91. 92813a3 Move code into ifdef guards and add sme2 packing variant for lhs by Jonathan Clohessy · 4 weeks ago
  92. 149835f Added sum_squared reductions for more AVX variants. by Misha Gutman · 4 weeks ago
  93. 4a84acb Added sum_squared for AVX512BW. Used multi_vec instead of s32x16x4. by Misha Gutman · 4 weeks ago
  94. 2a8e706 Added sum_squared for avx512bf16. by Misha Gutman · 4 weeks ago
  95. d5affbf Change ARM i8mm to be a transpose_a kernel by Dillon Sharlet · 4 weeks ago
  96. 6f79f4d Don't try to optimize if static reshape's new shape isn't fully defined. by Quentin Khan · 4 weeks ago
  97. 549cca9 Fixed 2-bit gemm benchmark. Number of elements in packed weights didn't account for row_sum. by Misha Gutman · 4 weeks ago
  98. 7f398de Added Map to ynnpack reductions. Added sum_squared for avx2. by Misha Gutman · 4 weeks ago
  99. 7884caf Disable inlining of functions that attempt to disable sanitizers by Dillon Sharlet · 4 weeks ago
  100. c18f402 Fix build when `XNN_ENABLE_KLEIDIAI` is false by Dillon Sharlet · 4 weeks ago