chromium / external / github.com / google / XNNPACK / HEAD
4536794
Fix KBLOCK size for qc2w GEMM scalar and neondot microkernel tests
by Frank Barchard
· 3 days ago
upstream/master
cafdb1e
Fix KBLOCK size for neondot qc2w microkernel tests
by Frank Barchard
· 3 days ago
4a0e2db
Benchmarks for Apple macos / ios detect cpu frequency
by Frank Barchard
· 3 days ago
25b42df
Use tags to filter out tests for YNNPACK
by Dillon Sharlet
· 8 days ago
bf97a38
Fix nc param to gemm/igemm headers
by Frank Barchard
· 8 days ago
742fd64
QC2W NEONDOT defer kernel_zero_point compute to after loop
by Frank Barchard
· 8 days ago
9b2d3e8
Remove unnecessary assert
by Dillon Sharlet
· 8 days ago
2fff7b8
Fix `tile_k > 1` case for rewriting `transpose_a(stencil_copy(x))` -> `stencil_copy(transpose_a(x))`
by Dillon Sharlet
· 8 days ago
4f8026d
Try to store output of stencil_copy at the innermost loop level.
by Volodymyr Kysenko
· 11 days ago
e1e7194
Fix stencil_copy bounds if multiple stencils use the same original dimension
by Dillon Sharlet
· 11 days ago
911c20f
Define Hexagon type for intrinsics compatible with clang
by Frank Barchard
· 11 days ago
20821fb
1. Adjust AVX reductions to match AVX512 2. Add consistent_reduce_test.cc
by Alexander Shaposhnikov
· 11 days ago
da578ff
Fix stencil output extent calculation.
by Volodymyr Kysenko
· 11 days ago
549d5d2
Remove unused Hexagon intrinsics from intrinsics-polyfill.h
by Frank Barchard
· 12 days ago
c299a16
Include bias in initializer of dot for f32.
by Marie White
· 12 days ago
7d0408c
Fix bug in avx512 int32 helpers
by Dillon Sharlet
· 14 days ago
fc1d7b1
Don't use `EnumerateIndices` unnecessarily
by Dillon Sharlet
· 2 weeks ago
82d56fb
Change simd::vec to be recursively defined
by Dillon Sharlet
· 2 weeks ago
f52fbff
Fixed full byte requirement in 2-bit Fully Connected test for qs8 version.
by Misha Gutman
· 2 weeks ago
ecdf584
Don't try to test int2 fully connected with YNNPACK
by Dillon Sharlet
· 2 weeks ago
91bc5c2
Added qs8_qc2w FullyConnected.
by Misha Gutman
· 2 weeks ago
e31a359
Support tile_k > 1 in transpose `stencil_copy` rewrite.
by Marie White
· 2 weeks ago
183297d
Fix `multi_vec`'s partial `load`/`store`
by Dillon Sharlet
· 2 weeks ago
ff23b55
Rewrite transpose_a(stencil_copy(x)) to stencil_copy(transpose_a(x))
by XNNPACK Team
· 2 weeks ago
0723f38
Minor fixes to stencil_copy
by XNNPACK Team
· 2 weeks ago
944d44d
Enable transpose_a kernels for num_k_dims > 1
by Dillon Sharlet
· 2 weeks ago
51dd4c8
Tweak K1 = 1 reduction case
by Dillon Sharlet
· 2 weeks ago
996234f
Merge pull request #9316 from salmanmkc:upgrade-github-actions-node24
by XNNPACK Team
· 2 weeks ago
e317c78
Pass down the original kernel and bias buffers when creating a convolution operation.
by Quentin Khan
· 2 weeks ago
103c631
Improved k1=1 reductions. Most of the cases have significant speed-up except scalar kernels, sum bf16_f32 for f16c and sse2, and x8_s32 for avx2.
by Misha Gutman
· 2 weeks ago
adf70ec
Allowed not to pass channel wise zero point to accommodate asymmetric (zero point is assumed 0) 2-bit case.
by Misha Gutman
· 2 weeks ago
2c1a512
Enable weight caching for quantized depthwise convolutions.
by Quentin Khan
· 2 weeks ago
4574c4d
Rewrite SIMD tests to make a new test for each architecture
by Dillon Sharlet
· 2 weeks ago
30cd310
Fixed n=0 case that left uninitialized vectors.
by Misha Gutman
· 3 weeks ago
7926389
Upgrade GitHub Actions for Node 24 compatibility
by Salman Muin Kayser Chishti
· 3 weeks ago
9e77b69
Pass down the original kernel and bias buffers when creating a fully connected operation.
by Quentin Khan
· 3 weeks ago
5dab025
Add stencil_copy helper function
by XNNPACK Team
· 3 weeks ago
d9ad8b4
Attempted to fix 2-bit fully-connected operator flakiness.
by Misha Gutman
· 3 weeks ago
7b059b1
Create transpose_a helper
by XNNPACK Team
· 3 weeks ago
becb9f5
Don't allow computing ops in-place if the output of the op is an external output.
by Dillon Sharlet
· 3 weeks ago
5d89a13
Hexagon V81 portable simd - Remove Q6_Vsf_equals_Vqf32 and output Vsf directly
by Frank Barchard
· 3 weeks ago
f51a395
Added scalar kernels to qc2w gemm-config.
by Misha Gutman
· 3 weeks ago
7b275c9
Fix assert in dot.cc
by Volodymyr Kysenko
· 3 weeks ago
a423689
Tighten up reduce kernels and architectures
by Dillon Sharlet
· 3 weeks ago
021f28e
Add `fma`, `concat`, and more `convert` functions to SIMD headers
by Dillon Sharlet
· 3 weeks ago
3fc92c7
Add a function to get fingerprints by index for testing.
by Quentin Khan
· 3 weeks ago
7746844
Added simd extract tests.
by Misha Gutman
· 3 weeks ago
ddc8e1a
Fix uninitialized variable build error for params to rsum for 2 bit
by Frank Barchard
· 3 weeks ago
3a7b710
Merge pull request #9270 from JonathanC-ARM:jclohess_fix_missing_debug_logging
by XNNPACK Team
· 3 weeks ago
e436865
Use /clang: to pass copts to clang-cl
by Dillon Sharlet
· 3 weeks ago
4e2113d
Minor code cleanup
by Dillon Sharlet
· 3 weeks ago
421648d
Added convert<bf16x16, f32x16> and convert<f16x16, f32x16> to x86_avx512f to decrease intrinsics usage in reduce kernels.
by Misha Gutman
· 3 weeks ago
79a6eac
_mm256_packs_epi* is AVX2, not AVX
by Dillon Sharlet
· 3 weeks ago
6400256
Fix windows build flags
by Dillon Sharlet
· 3 weeks ago
1e2d4d0
Added convert<bf16x8, f32x8> to x86_avx2 to decrease intrinsics usage in reduce kernels.
by Misha Gutman
· 3 weeks ago
5a4ee42
Merge branch 'google:master' into jclohess_fix_missing_debug_logging
by Jonathan Clohessy
· 3 weeks ago
31b1ede
simd test improvements
by Dillon Sharlet
· 3 weeks ago
3c9ad05
Fix uses of AVX2 instructions in AVX targeted code
by Dillon Sharlet
· 3 weeks ago
aabc21c
Automated Code Change
by XNNPACK Team
· 3 weeks ago
269fbf1
Use a more intuitive formulation of `make_stencil_dim`
by Dillon Sharlet
· 3 weeks ago
be53d66
Fixed int2 fully connected setting producers in the beginning of row_sum rewrite.
by Misha Gutman
· 3 weeks ago
b666baa
Templated neondot int2 GEMMs.
by Misha Gutman
· 3 weeks ago
dfaf2ae
Added scalar kernels for int2 gemm.
by Misha Gutman
· 3 weeks ago
aa67973
Re-enable kernels disabled under msan due to msan bugs
by Dillon Sharlet
· 3 weeks ago
dc05a09
Add @bazel_tools//tools/cpp/compiler:clang-cl as a signal to target MSVC-style builds
by Dillon Sharlet
· 3 weeks ago
631f179
Use _mm_packs_epi32 instead of _mm_packus_epi32 for converting to uint8
by Dillon Sharlet
· 3 weeks ago
0852723
Give ternary nodes a proper type
by Dillon Sharlet
· 4 weeks ago
470fb9c
Fix unused variable warning in `pack-lh-config.c`.
by Quentin Khan
· 4 weeks ago
ff3f40a
Move f16 sum/sum_squared kernels to avx512bw
by Dillon Sharlet
· 4 weeks ago
c187017
Speed up reduce kernels test
by Dillon Sharlet
· 4 weeks ago
81bb443
Minor readability and refactors for reductions
by Dillon Sharlet
· 4 weeks ago
e7efd4a
Fix out of bounds memory access when transposed A kernels have tile_m != 1
by Dillon Sharlet
· 4 weeks ago
84cdd30
Fix avx512fp16 sum to only add each input once.
by Dillon Sharlet
· 4 weeks ago
3e588fd
Added bf16 sum and sum_squared to sse2.
by Misha Gutman
· 4 weeks ago
8ff7ddb
Added bf16 sum and sum_squared to avx512.
by Misha Gutman
· 4 weeks ago
682892c
Add packed igemm debug logging
by Jonathan Clohessy
· 4 weeks ago
4d0cba7
Added bf16 sum and sum_squared to avx2.
by Misha Gutman
· 4 weeks ago
4f4cfcc
Move bias add to after the dot operation.
by XNNPACK Team
· 4 weeks ago
940155d
Update xnnpack shim
by XNNPACK Team
· 4 weeks ago
85468e7
Partially enable convolution and fully_connected subgraph tests when YNNPACK is enabled
by Dillon Sharlet
· 4 weeks ago
09e2d3e
Fix hole in reduce kernel test coverage
by Dillon Sharlet
· 4 weeks ago
59eb057
Fix some regressions in fully connected test
by Dillon Sharlet
· 4 weeks ago
a342cf6
Minor reduction cleanups
by Dillon Sharlet
· 4 weeks ago
995241f
Added bf16 sum and sum_squared to arm neon.
by Misha Gutman
· 4 weeks ago
9bab77f
Merge pull request #9244 from JonathanC-ARM:jclohess_sme2_pf32_igemm
by XNNPACK Team
· 4 weeks ago
0399c7d
Added sum_squared reductions to arm.
by Misha Gutman
· 4 weeks ago
1aef20a
Add neon-bf16 dot kernels
by Dillon Sharlet
· 4 weeks ago
3e084f0
Refactor ifdef to be inline with sme1 variant
by Jonathan Clohessy
· 4 weeks ago
fd03d06
Merge branch 'google:master' into jclohess_sme2_pf32_igemm
by Jonathan Clohessy
· 4 weeks ago
3c43700
Clean up SIMD headers and improve test coverage
by Dillon Sharlet
· 4 weeks ago
92813a3
Move code into ifdef guards and add sme2 packing variant for lhs
by Jonathan Clohessy
· 4 weeks ago
149835f
Added sum_squared reductions for more AVX variants.
by Misha Gutman
· 4 weeks ago
4a84acb
Added sum_squared for AVX512BW. Used multi_vec instead of s32x16x4.
by Misha Gutman
· 4 weeks ago
2a8e706
Added sum_squared for avx512bf16.
by Misha Gutman
· 4 weeks ago
d5affbf
Change ARM i8mm to be a transpose_a kernel
by Dillon Sharlet
· 4 weeks ago
6f79f4d
Don't try to optimize if static reshape's new shape isn't fully defined.
by Quentin Khan
· 4 weeks ago
549cca9
Fixed 2-bit gemm benchmark. Number of elements in packed weights didn't account for row_sum.
by Misha Gutman
· 4 weeks ago
7f398de
Added Map to ynnpack reductions. Added sum_squared for avx2.
by Misha Gutman
· 4 weeks ago
7884caf
Disable inlining of functions that attempt to disable sanitizers
by Dillon Sharlet
· 4 weeks ago
c18f402
Fix build when `XNN_ENABLE_KLEIDIAI` is false
by Dillon Sharlet
· 4 weeks ago