chromium / external / github.com / google / XNNPACK / HEAD
46394a4
Simplified fix for warnings in update-microkernels.py by recursively ignoring subdirectories of ignored roots.
by Frank Barchard
· 20 hours ago
upstream/master
078291a
Fix overzealous assert
by Dillon Sharlet
· 22 hours ago
d34f52c
Update KleidiAI in XNNPACK
by Dillon Sharlet
· 26 hours ago
5d756cd
Add approx_tanh operator support behind YNN_FLAG_FAST_MATH.
by XNNPACK Team
· 27 hours ago
76228ba
Fix NaN handling
by XNNPACK Team
· 29 hours ago
58c0a52
Remove RTTI from the Tensor API. Refactor operation handling in the graph.
by Quentin Khan
· 30 hours ago
d89bf2f
Don't allow broadcasts as the first dimension of dot inputs
by Dillon Sharlet
· 2 days ago
a74b048
Relax tolerance of sum reduce test
by Dillon Sharlet
· 2 days ago
23ba0fb
Remove `force_root` from `static_transpose` scheduling
by Dillon Sharlet
· 2 days ago
0702200
Enable `YNN_FLAG_FAST_MATH` in XNNPACK compatibility layer
by Dillon Sharlet
· 2 days ago
bf8d96b
Add YNN_FLAG_FAST_MATH and approx_erf operator support behind this flag.
by XNNPACK Team
· 2 days ago
1eb7300
Do not create serial loops for k2, k3, ...
by Marie White
· 2 days ago
9b4a49f
Fix warnings in update-microkernels.py by recursively ignoring subdirectories of ignored roots.
by Frank Barchard
· 2 days ago
05c9b91
Remove fxdiv usages from XNNPACK, keeping it only for pthreadpool
by Frank Barchard
· 3 days ago
e2ab35a
Merge pull request #10242 from ken-unger:f16-vlog-rvv
by XNNPACK Team
· 3 days ago
5004f85
Rewrite `transpose(static_broadcast(x))` => `static_broadcast(transpose(x))`
by Dillon Sharlet
· 3 days ago
01d254d
Remove `broadcast` op implementation
by Dillon Sharlet
· 3 days ago
5fc47cc
Fuse sequences of transpose(transpose(x)) into one transpose(x)
by Dillon Sharlet
· 3 days ago
f9f2c22
Do not rely on tile_k when aligning split_k
by Marie White
· 3 days ago
f1ab455
Implement `static_expand_dims` using `static_transpose`
by Dillon Sharlet
· 3 days ago
d1da9a5
Add transcendental ops for every x86 architecture
by Dillon Sharlet
· 3 days ago
38eb8ab
Remove RTTI from the Tensor API. Rework the `Quantization` hierarchy.
by Quentin Khan
· 3 days ago
ac73c5b
Remove RTTI from the Tensor API. Rework the `Buffer` hierarchy.
by Quentin Khan
· 3 days ago
43118f4
Remove RTTI from the Tensor API. Introduce `TypeId` class.
by Quentin Khan
· 3 days ago
30e1a98
f16-vtanh using high-accuracy rational polynomial implementation.
by Frank Barchard
· 4 days ago
f9ee799
Add indirection_bench to test performance of indirection init
by Frank Barchard
· 4 days ago
134e8ef
Fix test timeouts on emulators
by Dillon Sharlet
· 4 days ago
dddad07
Split dot operation on K.
by Marie White
· 4 days ago
4d837f2
Fix attempts to use AVX2 instructions on non-AVX2 targets
by Dillon Sharlet
· 4 days ago
acb00f5
Remove tile size from kernel function name
by Dillon Sharlet
· 4 days ago
e60eb9d
Add missing build of arm_neonfma benchmarks
by Dillon Sharlet
· 4 days ago
6cb1a2d
Add XNN_ENABLE_RNDNU16 build flag and conditionally use rndnu16 kernels
by Frank Barchard
· 4 days ago
8ea1945
`tanh` accuracy improvements
by Dillon Sharlet
· 4 days ago
cc4daec
Migrate LiteRT ATS unary op graph generation to use litert::tensor API.
by Gerardo Carranza
· 4 days ago
35ba08c
Add `tanh` SIMD wrappers
by Dillon Sharlet
· 4 days ago
f8deca0
Replace rational polynomials for exp with non-rational polynomials
by Dillon Sharlet
· 5 days ago
f4ed055
Add `erf` SIMD math functions
by Dillon Sharlet
· 5 days ago
41fde00
Improve exp approximation
by Dillon Sharlet
· 5 days ago
2b99af3
Add clarifying comments in call to define_transpose_a().
by Marie White
· 6 days ago
894ae65
Fix `floor_log2(NaN)` to be `NaN`
by Dillon Sharlet
· 6 days ago
eafd6fe
Add benchmarks of exp and log for avx and avx2
by Dillon Sharlet
· 6 days ago
38631e8
Refactor `exp` and `expm1` to use the same implementation
by Dillon Sharlet
· 6 days ago
fda3ca7
Fix precision issue in rndnu16 requantization for scales near powers of 2.
by Frank Barchard
· 8 days ago
52bd8d0
Tighten tolerances of `log` from 3 ULPs to 2
by Dillon Sharlet
· 8 days ago
0ccb84e
Implement `YNN_FLAG_CONSISTENT_ARITHMETIC` for unary elementwise kernels
by Dillon Sharlet
· 8 days ago
c45d6b4
Don't split the innermost dimension if the type of the input is sub-byte.
by Volodymyr Kysenko
· 8 days ago
5ff101c
Merge pull request #10298 from wangw-1991:fix_LUT_fusion
by XNNPACK Team
· 8 days ago
3b5dbb2
Remove `fma` when not available, and add `multiply_add` which optionally uses `fma` when available.
by Dillon Sharlet
· 8 days ago
0080367
Combine x86 SIMD wrapper headers
by Dillon Sharlet
· 8 days ago
c4e49c6
Combine ARM SIMD wrapper headers
by Dillon Sharlet
· 8 days ago
00825fc
Define all architecture flags transitively implied by enabled architectures.
by Dillon Sharlet
· 8 days ago
cccad55
Add math helpers to SIMD wrappers
by Dillon Sharlet
· 9 days ago
549deb8
Remove unused `transpose` SIMD wrapper
by Dillon Sharlet
· 9 days ago
58a65db
Mark values as external outputs in constant folding only if they are actually used in the non-constant pipeline.
by Volodymyr Kysenko
· 9 days ago
6c9a1ab
Consolidate some SIMD wrapper headers
by Dillon Sharlet
· 9 days ago
ea77aab
Generalize FMA emulation helper
by Dillon Sharlet
· 9 days ago
1b849f4
Tune params for unary kernels to avoid tolerance issues
by Dillon Sharlet
· 9 days ago
0f6ee41
Initial upload.
by Wei Wang
· 9 days ago
d4adfcd
gemm benchmark documentation fix - update names of models to match files
by Frank Barchard
· 10 days ago
f56a6c7
Add numerically correct `expm1` kernels
by Dillon Sharlet
· 10 days ago
29a1c73
Add std::string overloads for tensor::Create.
by XNNPACK Team
· 10 days ago
b1a0a5d
Merge pull request #10261 from velonica0:f16
by XNNPACK Team
· 10 days ago
a0dbef3
Improve `exp` accuracy
by Dillon Sharlet
· 10 days ago
3f33e55
Add `select` and conditional operations to SIMD wrappers
by Dillon Sharlet
· 10 days ago
5a59a54
Properly open source tensor api in github through copybara
by XNNPACK Team
· 11 days ago
bcc179a
Remove fp64 wasm support
by Dillon Sharlet
· 11 days ago
0547829
Remove lo/hi as member functions of `vec<T, N>`
by Dillon Sharlet
· 11 days ago
cc68da8
Add sigmoid_fp64 kernels
by Dillon Sharlet
· 11 days ago
4b53a43
Prevent scheduling of ki/ko loops in packing.
by Volodymyr Kysenko
· 11 days ago
0741ac5
Open source Tensor API in google-ai-edge/LiteRT
by XNNPACK Team
· 11 days ago
ce14e18
Adjust bounds for elementwise unary kernels with sub-byte inputs.
by Volodymyr Kysenko
· 11 days ago
d0004f8
Add `xnn_datatype_qint2` for tensorwise quantized 2-bit values.
by Pedro Gonnet
· 11 days ago
adf9795
Split dot operation on K.
by XNNPACK Team
· 12 days ago
ea44341
Split dot operation on K.
by Marie White
· 12 days ago
7e3c789
Add tanh_fp64 kernels
by Dillon Sharlet
· 12 days ago
7333afb
Optimize `floor_log2` for fp64 for non-AVX512 targets
by Dillon Sharlet
· 12 days ago
5191fee
Change tree reduction factor from 32 to 16, and add another level
by Dillon Sharlet
· 12 days ago
7ef5fdc
Add `round_to_bf16`
by Dillon Sharlet
· 12 days ago
56ac34b
Add a graph rewrite to fallback to fp32 when fp16 isn't supported.
by Quentin Khan
· 12 days ago
chromium/7852, chromium/7853, chromium/7854, chromium/7855, chromium/7856, chromium/7857, chromium/7858, chromium/7859, chromium/7860, chromium/7861, chromium/7862, chromium/7863, chromium/7864, chromium/7865, chromium/7866, chromium/7867, chromium/7868
f873466
Align C++ standard to C++17 in CMake builds to be equal to Bazel builds.
by Quentin Khan
· 12 days ago
3ca1b08
Relax tolerance for sum squared kernel test
by Dillon Sharlet
· 12 days ago
98c8ded
Polynomial approximation improvements for `exp` and `log`
by Dillon Sharlet
· 12 days ago
f1fe9b5
Only rewrite reduce(convert(x)) if we have a kernel for that reduction type.
by Dillon Sharlet
· 12 days ago
01db6e1
Fix possible infinite recursion in convert
by Dillon Sharlet
· 12 days ago
4ae8b8f
fix bug
by velonica0
· 13 days ago
1052f90
[gn] Add support for building/testing AArch32
by Richard Townsend
· 2 weeks ago
8da42ae
Add support for log fp16 in XNNPACK.
by Gerardo Carranza
· 2 weeks ago
1c292bf
[gn] Test building AVX512
by Richard Townsend
· 2 weeks ago
c3ac56a
Add subgraph matcher target to `BUILD.gn`.
by Quentin Khan
· 2 weeks ago
chromium/7847, chromium/7848, chromium/7849, chromium/7850, chromium/7851
7bf9c69
Fix ambiguous std::isfinite, std::abs, and std::fpclassify calls for _Float16 in test framework by explicitly casting to float.
by Frank Barchard
· 2 weeks ago
34c8015
Make sure partial reduction splits match the loop step.
by Volodymyr Kysenko
· 2 weeks ago
ace56b6
Improve `exp` kernel accuracy and correctness
by Dillon Sharlet
· 2 weeks ago
49e266f
Add optimized convert int2/int4 to int8 kernels.
by Volodymyr Kysenko
· 2 weeks ago
11fb885
Implement round to nearest even for float -> bf16 conversions
by Dillon Sharlet
· 2 weeks ago
9ab80cd
Allow adding function own loops even if some of its non-trivial loops has been already fused.
by Volodymyr Kysenko
· 2 weeks ago
95ee916
Use a better unroll factor for log2_fp32_sse2
by Dillon Sharlet
· 2 weeks ago
d72fa85
Improve log_fp32 kernels
by Dillon Sharlet
· 2 weeks ago
4fad5b3
Disable static_slice test until slinky bug is fixed
by Dillon Sharlet
· 2 weeks ago
393da7d
add rvv kernel for f16-vlog
by Ken Unger
· 2 weeks ago
fe16697
Disable static_slice test until slinky bug is fixed
by Dillon Sharlet
· 2 weeks ago