  1. 35b9206 Clean up unnecessary codepaths from operator-run by Dillon Sharlet · 6 hours ago upstream/master
  2. 4f38bf9 Added qdu8_qc2w to FC for qc2w. This is needed to use fast avx2 kernels for DQ. by Misha Gutman · 22 hours ago
  3. b83baa8 AVX10 qs8 qc2w GEMM microkernels by Frank Barchard · 3 days ago
  4. 810b9af [gn] Add experimental integration with KleidiAI by Richard Townsend · 3 days ago
  5. 661b24e Improve transpose benchmark tile size selection by Dillon Sharlet · 3 days ago
  6. 538ac8f Merge pull request #9516 from ken-unger:f16-vunary-vbinary-rvv by XNNPACK Team · 3 days ago
  7. 0c8ef5d Use memcpy to load partial NEON vectors by Dillon Sharlet · 4 days ago
  8. 9d47c4a Use simd::store for partial tiles by Dillon Sharlet · 4 days ago
  9. b9fde4f Only allow f32 to bf16 dot rewrite when B is constant. This is a temporary workaround for an accuracy bug. by Marie White · 4 days ago
  10. c0a9b3a Add support for reduce_window(square) rewriting. by Alexander Shaposhnikov · 4 days ago
  11. 7d657e4 Fix fma benchmark by Dillon Sharlet · 4 days ago
  12. 4e3eca3 Strengthen disabling of consistent_reduce_test by Dillon Sharlet · 4 days ago
  13. ba53e53 Add avx512 transpose kernels by Dillon Sharlet · 4 days ago
  14. f6576aa Use gmock to test simd vector ops by Dillon Sharlet · 4 days ago
  15. 27769b3 Emulate fp32 dots with 3 bf16 dots by Marie White · 4 days ago
  16. c140cb6 [gn] Add extra GN config for MacOS by Richard Townsend · 5 days ago
  17. 6e4a91d Use avx512 instructions for 128- and 256-bit partial loads/stores when available. by Dillon Sharlet · 5 days ago
  18. 41fda8b For avx512, combine f, bw, vl, dq into one target by Dillon Sharlet · 5 days ago
  19. cb4d18a [gn] Add initial support for building XNNPACK benchmarks by Richard Townsend · 5 days ago
  20. 49ad0fa Fix msan for simd/bench by Dillon Sharlet · 6 days ago
  21. ad5f476 Make `xnn_fingerprint_id_to_string` part of the available API. by Quentin Khan · 6 days ago
  22. beb38a8 Look up packed weights in the cache before computing them in 2D convolutions. by Quentin Khan · 6 days ago
  23. aa15b51 Add Hexagon transpose kernels to YNNPACK by Dillon Sharlet · 6 days ago
  24. 8748d64 Add `interleave_in_place` and optimize lifetimes of transpose intermediates by Dillon Sharlet · 6 days ago
  25. a670397 Add initial Hexagon HVX support to YNNPACK by Dillon Sharlet · 6 days ago
  26. 6932ee7 AVX qd8 qc2w GEMM microkernels generated for MR=2 to 8 by Frank Barchard · 6 days ago
  27. eec404f Disable avx512f on gcc9 too. by Dillon Sharlet · 6 days ago
  28. 1abd243 Optimize partial loads on x86 by Dillon Sharlet · 6 days ago
  29. 14f12a9 Don't rely on dummy load/store ops to avoid incorrectly computing the min/max by Dillon Sharlet · 6 days ago
  30. 349e42a Run generators for binary ops on RISC-V by Frank Barchard · 6 days ago
  31. eea59d1 Align mask to avoid crossing cache line boundaries by Dillon Sharlet · 7 days ago
  32. d4323cb [gn] Restrict maximum test output lines in CI by Richard Townsend · 7 days ago
  33. 44284b7 Remove `YNN_ALIGN` macro, `alignas` is standard C++ by Dillon Sharlet · 7 days ago
  34. 396a8ef Add `zeros` and `undef` tags for partial loads by Dillon Sharlet · 7 days ago
  35. ef5c0b9 [gn] Switch off AVX512, reduce volume of output by Richard Townsend · 7 days ago
  36. d171011 Add avx2 kernels for statically quantized 2-bit FC. by Misha Gutman · 7 days ago
  37. 2ae275e Refactor `src/operators/convolution-nchw.c` to mirror `convolution-nhwc.c`. by Quentin Khan · 7 days ago
  38. 8023ac4 Add common subexpression elimination pass by Marie White · 7 days ago
  39. 869b909 Fix benchmarks getting stripped by the linker by Dillon Sharlet · 7 days ago
  40. fefde82 Refactor tests: by Marie White · 7 days ago
  41. 33bda67 Docker.riscv and Docker.sme2 are multi-stage docker files, by Alexander Shaposhnikov · 7 days ago
  42. 6b2936d Update partial load implementation in XNNPACK x86 kernels. by Volodymyr Kysenko · 7 days ago
  43. def51dc Add xnnpack user to sudoers inside the container. by Alexander Shaposhnikov · 7 days ago
  44. 394b332 Add benchmarks of some SIMD wrapper operations by Dillon Sharlet · 7 days ago
  45. 0d80f77 Disable all AVX512 on gcc9 by Dillon Sharlet · 7 days ago
  46. b91c032 Another attempt at fixing Windows build. by Volodymyr Kysenko · 7 days ago
  47. 83ce1b6 SIMD wrapper cleanups by Dillon Sharlet · 8 days ago
  48. fb61513 Make SIMD test names consistent with the vector type names. by Dillon Sharlet · 8 days ago
  49. 7b0beed cleanup by Ken Unger · 8 days ago
  50. b6f471e Fix missing declaration of header by Dillon Sharlet · 8 days ago
  51. c8c583a Speed up transpose kernel tests by Dillon Sharlet · 8 days ago
  52. abe167a Fix Windows build. by Volodymyr Kysenko · 8 days ago
  53. 3a0101d Add pf32 support for f32_f16 FC node to enable SME acceleration. by XNNPACK Team · 8 days ago
  54. b6f1b88 Add pf32 support for f32_f16 CONV_2D to enable SME acceleration. by XNNPACK Team · 8 days ago
  55. 9bab87b 1. Skip adding 0 padding 2. Fix handling of broadcastable dimensions. by Alexander Shaposhnikov · 8 days ago
  56. b497ce5 Fix cleanup command for sme2 build. by Alexander Shaposhnikov · 8 days ago
  57. 7dc968f Add xnn_get_fingerprint functions to XNNPACK shim (stubs). by Alexander Shaposhnikov · 9 days ago
  58. d7ddd02 Store static_pad buffer at the innermost level. by Volodymyr Kysenko · 10 days ago
  59. 52ac311 Reorder operations in YNNPACK's grouped convolution definition. by Volodymyr Kysenko · 10 days ago
  60. 4939810 Add scalar support and test coverage of horizontal_sum by Dillon Sharlet · 10 days ago
  61. 548e701 Add scalar implementation of simd::vec by Dillon Sharlet · 10 days ago
  62. f5f93c4 Make all floating point sum kernels arithmetically consistent on x86. by Dillon Sharlet · 11 days ago
  63. 3e0e760 Fix target processor inference has logic bug by Frank Barchard · 11 days ago
  64. 92d786a Add linkopts and malloc settings to new binaries by Dillon Sharlet · 11 days ago
  65. d899254 Add max pooling test case to reduce_window test by Dillon Sharlet · 11 days ago
  66. 1e15a3e Set output shape too by Dillon Sharlet · 11 days ago
  67. 0d1b729 [gn] Add support for most of XNNPACK's tests by Richard Townsend · 11 days ago
  68. d08f992 More narrowly disable sme2-qemu build by Dillon Sharlet · 11 days ago
  69. d698580 Change simd tests to skip in Test::SetUp instead of a custom main by Dillon Sharlet · 12 days ago
  70. 313afbb Added qs8_qc2w neondot kernel. by Misha Gutman · 12 days ago
  71. fa19b54 Add a test only implementation of reduce_window. by Alexander Shaposhnikov · 12 days ago
  72. 023f2bd Force root for pack_b in YNNPACK dot subgraph when num_k_dims > 1. by Volodymyr Kysenko · 12 days ago
  73. 4e3a0a6 Add subtract_fp32_bf16 kernel by Marie White · 12 days ago
  74. 102dac3 Temporarily disable sme2-qemu by Dillon Sharlet · 12 days ago
  75. 675932d Fix fingerprinting test compilation for C++11. by Quentin Khan · 12 days ago
  76. 4cad8c8 Temporarily disable sme2-qemu by Dillon Sharlet · 12 days ago
  77. 4a5d698 Remove WORKSPACE by Dillon Sharlet · 12 days ago
  78. cb7ba0c Update bazel dependencies by Dillon Sharlet · 12 days ago
  79. f837e5b Speculative fix for: by Dillon Sharlet · 13 days ago
  80. 7cf53d6 Fix workflow trigger paths by Dillon Sharlet · 13 days ago
  81. 16c0f7d Temporarily disable sme2-qemu by Dillon Sharlet · 13 days ago
  82. f967def Merge branch 'google:master' into f16-vunary-vbinary-rvv by Ken Unger · 13 days ago
  83. b33278f Add 2-bit QD8_F32_QC2W GEMM for AVX2 - add template and generate 1x8 by Frank Barchard · 13 days ago
  84. f95d1d3 Make SME2 docker image multi-arch by Dillon Sharlet · 13 days ago
  85. 46426be Fix dot_bench argument parsing by Marie White · 13 days ago
  86. fd47a05 Remove explicit bazel version by Dillon Sharlet · 13 days ago
  87. 2ceace7 Fix usage of `std::uniform_int_distribution` with unsupported types. by Dillon Sharlet · 13 days ago
  88. a49f92c Fix incorrect usages of anonymous namespaces by Dillon Sharlet · 13 days ago
  89. 1b32c8a Remove slinky integration in XNNPACK by Dillon Sharlet · 14 days ago
  90. 199903d Merge pull request #9393 from ken-unger:f32-rsum-rvv by XNNPACK Team · 14 days ago
  91. 3caec45 Remove 12x4x4 dot kernel from avx512 by Dillon Sharlet · 14 days ago
  92. b4deb87 Fix flaky failures due to differences in overflow when converting float to bf16 by Dillon Sharlet · 2 weeks ago
  93. 0a76944 Tweak LUTs so we only need unsigned index kernels. by Dillon Sharlet · 2 weeks ago
  94. 4b8cfde Add convert bf16 to fp32 kernels for AVX2 and AVX512 by Marie White · 2 weeks ago
  95. 88634cd Fix 2D lut use cases by Dillon Sharlet · 2 weeks ago
  96. 4db8f78 Implement convert f32 to bf16 for AVX2 and AVX512BF by Marie White · 2 weeks ago
  97. e7da7bc Clean up `get_binary_kernel` by Dillon Sharlet · 2 weeks ago
  98. 7e65961 Update KleidiAI dependency by Dillon Sharlet · 2 weeks ago
  99. e9459e3 Update cpuinfo, gtest, KleidiAI dependencies by Dillon Sharlet · 2 weeks ago
  100. 8ff3752 Add `//ynnpack/subgraph/test:dot_bench` by Dillon Sharlet · 2 weeks ago