  1. 61ba61d Avoid BENCHMARK_CAPTURE when unnecessary by Dillon Sharlet · 9 hours ago upstream/master
  2. e8aad3c Fix 32-bit integer multiply on HVX by Dillon Sharlet · 12 hours ago
  3. 9940dd6 Reduce kernel test improvements by Dillon Sharlet · 12 hours ago
  4. 57b14a8 Better sigmoid reference implementation by Dillon Sharlet · 13 hours ago
  5. 82367b5 Don't replace minimum/maximum operators with clamp when input is broadcast by Reilly Grant · 13 hours ago
  6. 9f84c7a Remove unused generate-spmm-test.py by Dillon Sharlet · 13 hours ago
  7. 236a3e2 Update use of deprecated benchmark::internal::Benchmark by Dillon Sharlet · 13 hours ago
  8. 4a547fe Simplify global construction of GEMM benchmarks by Dillon Sharlet · 19 hours ago
  9. 35b9206 Clean up unnecessary codepaths from operator-run by Dillon Sharlet · 30 hours ago
  10. 4f38bf9 Added qdu8_qc2w to FC for qc2w. This is needed to use fast avx2 kernels for DQ. by Misha Gutman · 2 days ago
  11. b83baa8 AVX10 qs8 qc2w GEMM microkernels by Frank Barchard · 4 days ago
  12. 810b9af [gn] Add experimental integration with KleidiAI by Richard Townsend · 4 days ago
  13. 661b24e Improve transpose benchmark tile size selection by Dillon Sharlet · 4 days ago
  14. 538ac8f Merge pull request #9516 from ken-unger:f16-vunary-vbinary-rvv by XNNPACK Team · 5 days ago
  15. 0c8ef5d Use memcpy to load partial NEON vectors by Dillon Sharlet · 5 days ago
  16. 9d47c4a Use simd::store for partial tiles by Dillon Sharlet · 5 days ago
  17. b9fde4f Only allow f32 to bf16 dot rewrite when B is constant. This is a temporary workaround for an accuracy bug. by Marie White · 5 days ago
  18. c0a9b3a Add support for reduce_window(square) rewriting. by Alexander Shaposhnikov · 5 days ago
  19. 7d657e4 Fix fma benchmark by Dillon Sharlet · 5 days ago
  20. 4e3eca3 Strengthen disabling of consistent_reduce_test by Dillon Sharlet · 5 days ago
  21. ba53e53 Add avx512 transpose kernels by Dillon Sharlet · 5 days ago
  22. f6576aa Use gmock to test simd vector ops by Dillon Sharlet · 5 days ago
  23. 27769b3 Emulate fp32 dots with 3 bf16 dots by Marie White · 5 days ago
  24. c140cb6 [gn] Add extra GN config for MacOS by Richard Townsend · 6 days ago
  25. 6e4a91d Use avx512 instructions for 128- and 256-bit partial loads/stores when available. by Dillon Sharlet · 6 days ago
  26. 41fda8b For avx512, combine f, bw, vl, dq into one target by Dillon Sharlet · 6 days ago
  27. cb4d18a [gn] Add initial support for building XNNPACK benchmarks by Richard Townsend · 6 days ago
  28. 49ad0fa Fix msan for simd/bench by Dillon Sharlet · 7 days ago
  29. ad5f476 Make `xnn_fingerprint_id_to_string` part of the available API. by Quentin Khan · 7 days ago
  30. beb38a8 Look up packed weights in the cache before computing them in 2D convolutions. by Quentin Khan · 7 days ago
  31. aa15b51 Add Hexagon transpose kernels to YNNPACK by Dillon Sharlet · 7 days ago
  32. 8748d64 Add `interleave_in_place` and optimize lifetimes of transpose intermediates by Dillon Sharlet · 7 days ago
  33. a670397 Add initial Hexagon HVX support to YNNPACK by Dillon Sharlet · 7 days ago
  34. 6932ee7 AVX qd8 qc2w GEMM microkernels generated for MR=2 to 8 by Frank Barchard · 7 days ago
  35. eec404f Disable avx512f on gcc9 too. by Dillon Sharlet · 7 days ago
  36. 1abd243 Optimize partial loads on x86 by Dillon Sharlet · 7 days ago
  37. 14f12a9 Don't rely on dummy load/store ops to avoid incorrectly computing the min/max by Dillon Sharlet · 7 days ago
  38. 349e42a Run generators for binary ops on RISC-V by Frank Barchard · 7 days ago
  39. eea59d1 Align mask to avoid crossing cache line boundaries by Dillon Sharlet · 8 days ago
  40. d4323cb [gn] Restrict maximum test output lines in CI by Richard Townsend · 8 days ago
  41. 44284b7 Remove `YNN_ALIGN` macro, `alignas` is standard C++ by Dillon Sharlet · 8 days ago
  42. 396a8ef Add `zeros` and `undef` tags for partial loads by Dillon Sharlet · 8 days ago
  43. ef5c0b9 [gn] Switch off AVX512, reduce volume of output by Richard Townsend · 8 days ago
  44. d171011 Add avx2 kernels for statically quantized 2-bit FC. by Misha Gutman · 8 days ago
  45. 2ae275e Refactor `src/operators/convolution-nchw.c` to mirror `convolution-nhwc.c`. by Quentin Khan · 8 days ago
  46. 8023ac4 Add common subexpression elimination pass by Marie White · 8 days ago
  47. 869b909 Fix benchmarks getting stripped by the linker by Dillon Sharlet · 8 days ago
  48. fefde82 Refactor tests: by Marie White · 8 days ago
  49. 33bda67 Docker.riscv and Docker.sme2 are multi-stage docker files, by Alexander Shaposhnikov · 8 days ago
  50. 6b2936d Update partial load implementation in XNNPACK x86 kernels. by Volodymyr Kysenko · 8 days ago
  51. def51dc Add xnnpack user to sudoers inside the container. by Alexander Shaposhnikov · 8 days ago
  52. 394b332 Add benchmarks of some SIMD wrapper operations by Dillon Sharlet · 8 days ago
  53. 0d80f77 Disable all AVX512 on gcc9 by Dillon Sharlet · 8 days ago
  54. b91c032 Another attempt at fixing Windows build. by Volodymyr Kysenko · 8 days ago
  55. 83ce1b6 SIMD wrapper cleanups by Dillon Sharlet · 9 days ago
  56. fb61513 Make SIMD test names consistent with the vector type names. by Dillon Sharlet · 9 days ago
  57. 7b0beed cleanup by Ken Unger · 9 days ago
  58. b6f471e Fix missing declaration of header by Dillon Sharlet · 9 days ago
  59. c8c583a Speed up transpose kernel tests by Dillon Sharlet · 9 days ago
  60. abe167a Fix Windows build. by Volodymyr Kysenko · 9 days ago
  61. 3a0101d Add pf32 support for f32_f16 FC node to enable SME acceleration. by XNNPACK Team · 9 days ago
  62. b6f1b88 Add pf32 support for f32_f16 CONV_2D to enable SME acceleration. by XNNPACK Team · 9 days ago
  63. 9bab87b 1. Skip adding 0 padding 2. Fix handling of broadcastable dimensions. by Alexander Shaposhnikov · 9 days ago
  64. b497ce5 Fix cleanup command for sme2 build. by Alexander Shaposhnikov · 9 days ago
  65. 7dc968f Add xnn_get_fingerprint functions to XNNPACK shim (stubs). by Alexander Shaposhnikov · 10 days ago
  66. d7ddd02 Store static_pad buffer at the innermost level. by Volodymyr Kysenko · 11 days ago
  67. 52ac311 Reorder operations in YNNPACK's grouped convolution definition. by Volodymyr Kysenko · 11 days ago
  68. 4939810 Add scalar support and test coverage of horizontal_sum by Dillon Sharlet · 11 days ago
  69. 548e701 Add scalar implementation of simd::vec by Dillon Sharlet · 12 days ago
  70. f5f93c4 Make all floating point sum kernels arithmetically consistent on x86. by Dillon Sharlet · 12 days ago
  71. 3e0e760 Fix target processor inference has logic bug by Frank Barchard · 12 days ago
  72. 92d786a Add linkopts and malloc settings to new binaries by Dillon Sharlet · 12 days ago
  73. d899254 Add max pooling test case to reduce_window test by Dillon Sharlet · 12 days ago
  74. 1e15a3e Set output shape too by Dillon Sharlet · 12 days ago
  75. 0d1b729 [gn] Add support for most of XNNPACK's tests by Richard Townsend · 12 days ago
  76. d08f992 More narrowly disable sme2-qemu build by Dillon Sharlet · 13 days ago
  77. d698580 Change simd tests to skip in Test::SetUp instead of a custom main by Dillon Sharlet · 13 days ago
  78. 313afbb Added qs8_qc2w neondot kernel. by Misha Gutman · 13 days ago
  79. fa19b54 Add a test only implementation of reduce_window. by Alexander Shaposhnikov · 13 days ago
  80. 023f2bd Force root for pack_b in YNNPACK dot subgraph when num_k_dims > 1. by Volodymyr Kysenko · 13 days ago
  81. 4e3a0a6 Add subtract_fp32_bf16 kernel by Marie White · 13 days ago
  82. 102dac3 Temporarily disable sme2-qemu by Dillon Sharlet · 13 days ago
  83. 675932d Fix fingerprinting test compilation for C++11. by Quentin Khan · 13 days ago
  84. 4cad8c8 Temporarily disable sme2-qemu by Dillon Sharlet · 13 days ago
  85. 4a5d698 Remove WORKSPACE by Dillon Sharlet · 13 days ago
  86. cb7ba0c Update bazel dependencies by Dillon Sharlet · 13 days ago
  87. f837e5b Speculative fix for: by Dillon Sharlet · 14 days ago
  88. 7cf53d6 Fix workflow trigger paths by Dillon Sharlet · 14 days ago
  89. 16c0f7d Temporarily disable sme2-qemu by Dillon Sharlet · 14 days ago
  90. f967def Merge branch 'google:master' into f16-vunary-vbinary-rvv by Ken Unger · 14 days ago
  91. b33278f Add 2-bit QD8_F32_QC2W GEMM for AVX2 - add template and generate 1x8 by Frank Barchard · 2 weeks ago
  92. f95d1d3 Make SME2 docker image multi-arch by Dillon Sharlet · 2 weeks ago
  93. 46426be Fix dot_bench argument parsing by Marie White · 2 weeks ago
  94. fd47a05 Remove explicit bazel version by Dillon Sharlet · 2 weeks ago
  95. 2ceace7 Fix usage of `std::uniform_int_distribution` with unsupported types. by Dillon Sharlet · 2 weeks ago
  96. a49f92c Fix incorrect usages of anonymous namespaces by Dillon Sharlet · 2 weeks ago
  97. 1b32c8a Remove slinky integration in XNNPACK by Dillon Sharlet · 2 weeks ago
  98. 199903d Merge pull request #9393 from ken-unger:f32-rsum-rvv by XNNPACK Team · 2 weeks ago
  99. 3caec45 Remove 12x4x4 dot kernel from avx512 by Dillon Sharlet · 2 weeks ago
  100. b4deb87 Fix flaky failures due to differences in overflow when converting float to bf16 by Dillon Sharlet · 2 weeks ago