  1. 46394a4 Simplified fix for warnings in update-microkernels.py by recursively ignoring subdirectories of ignored roots. by Frank Barchard · 20 hours ago upstream/master
  2. 078291a Fix overzealous assert by Dillon Sharlet · 22 hours ago
  3. d34f52c Update KleidiAI in XNNPACK by Dillon Sharlet · 26 hours ago
  4. 5d756cd Add approx_tanh operator support behind YNN_FLAG_FAST_MATH. by XNNPACK Team · 27 hours ago
  5. 76228ba Fix NaN handling by XNNPACK Team · 29 hours ago
  6. 58c0a52 Remove RTTI from the Tensor API. Refactor operation handling in the graph. by Quentin Khan · 30 hours ago
  7. d89bf2f Don't allow broadcasts as the first dimension of dot inputs by Dillon Sharlet · 2 days ago
  8. a74b048 Relax tolerance of sum reduce test by Dillon Sharlet · 2 days ago
  9. 23ba0fb Remove `force_root` from `static_transpose` scheduling by Dillon Sharlet · 2 days ago
  10. 0702200 Enable `YNN_FLAG_FAST_MATH` in XNNPACK compatibility layer by Dillon Sharlet · 2 days ago
  11. bf8d96b Add YNN_FLAG_FAST_MATH and approx_erf operator support behind this flag. by XNNPACK Team · 2 days ago
  12. 1eb7300 Do not create serial loops for k2, k3, ... by Marie White · 2 days ago
  13. 9b4a49f Fix warnings in update-microkernels.py by recursively ignoring subdirectories of ignored roots. by Frank Barchard · 2 days ago
  14. 05c9b91 Remove fxdiv usages from XNNPACK, keeping it only for pthreadpool by Frank Barchard · 3 days ago
  15. e2ab35a Merge pull request #10242 from ken-unger:f16-vlog-rvv by XNNPACK Team · 3 days ago
  16. 5004f85 Rewrite `transpose(static_broadcast(x))` => `static_broadcast(transpose(x))` by Dillon Sharlet · 3 days ago
  17. 01d254d Remove `broadcast` op implementation by Dillon Sharlet · 3 days ago
  18. 5fc47cc Fuse sequences of transpose(transpose(x)) into one transpose(x) by Dillon Sharlet · 3 days ago
  19. f9f2c22 Do not rely on tile_k when aligning split_k by Marie White · 3 days ago
  20. f1ab455 Implement `static_expand_dims` using `static_transpose` by Dillon Sharlet · 3 days ago
  21. d1da9a5 Add transcendental ops for every x86 architecture by Dillon Sharlet · 3 days ago
  22. 38eb8ab Remove RTTI from the Tensor API. Rework the `Quantization` hierarchy. by Quentin Khan · 3 days ago
  23. ac73c5b Remove RTTI from the Tensor API. Rework the `Buffer` hierarchy. by Quentin Khan · 3 days ago
  24. 43118f4 Remove RTTI from the Tensor API. Introduce `TypeId` class. by Quentin Khan · 3 days ago
  25. 30e1a98 f16-vtanh using high-accuracy rational polynomial implementation. by Frank Barchard · 4 days ago
  26. f9ee799 Add indirection_bench to test performance of indirection init by Frank Barchard · 4 days ago
  27. 134e8ef Fix test timeouts on emulators by Dillon Sharlet · 4 days ago
  28. dddad07 Split dot operation on K. by Marie White · 4 days ago
  29. 4d837f2 Fix attempts to use AVX2 instructions on non-AVX2 targets by Dillon Sharlet · 4 days ago
  30. acb00f5 Remove tile size from kernel function name by Dillon Sharlet · 4 days ago
  31. e60eb9d Add missing build of arm_neonfma benchmarks by Dillon Sharlet · 4 days ago
  32. 6cb1a2d Add XNN_ENABLE_RNDNU16 build flag and conditionally use rndnu16 kernels by Frank Barchard · 4 days ago
  33. 8ea1945 `tanh` accuracy improvements by Dillon Sharlet · 4 days ago
  34. cc4daec Migrate LiteRT ATS unary op graph generation to use litert::tensor API. by Gerardo Carranza · 4 days ago
  35. 35ba08c Add `tanh` SIMD wrappers by Dillon Sharlet · 4 days ago
  36. f8deca0 Replace rational polynomials for exp with non-rational polynomials by Dillon Sharlet · 5 days ago
  37. f4ed055 Add `erf` SIMD math functions by Dillon Sharlet · 5 days ago
  38. 41fde00 Improve exp approximation by Dillon Sharlet · 5 days ago
  39. 2b99af3 Add clarifying comments in call to define_transpose_a(). by Marie White · 6 days ago
  40. 894ae65 Fix `floor_log2(NaN)` to be `NaN` by Dillon Sharlet · 6 days ago
  41. eafd6fe Add benchmarks of exp and log for avx and avx2 by Dillon Sharlet · 6 days ago
  42. 38631e8 Refactor `exp` and `expm1` to use the same implementation by Dillon Sharlet · 6 days ago
  43. fda3ca7 Fix precision issue in rndnu16 requantization for scales near powers of 2. by Frank Barchard · 8 days ago
  44. 52bd8d0 Tighten tolerances of `log` from 3 ULPs to 2 by Dillon Sharlet · 8 days ago
  45. 0ccb84e Implement `YNN_FLAG_CONSISTENT_ARITHMETIC` for unary elementwise kernels by Dillon Sharlet · 8 days ago
  46. c45d6b4 Don't split the innermost dimension if the type of the input is sub-byte. by Volodymyr Kysenko · 8 days ago
  47. 5ff101c Merge pull request #10298 from wangw-1991:fix_LUT_fusion by XNNPACK Team · 8 days ago
  48. 3b5dbb2 Remove `fma` when not available, and add `multiply_add` which optionally uses `fma` when available. by Dillon Sharlet · 8 days ago
  49. 0080367 Combine x86 SIMD wrapper headers by Dillon Sharlet · 8 days ago
  50. c4e49c6 Combine ARM SIMD wrapper headers by Dillon Sharlet · 8 days ago
  51. 00825fc Define all architecture flags transitively implied by enabled architectures. by Dillon Sharlet · 8 days ago
  52. cccad55 Add math helpers to SIMD wrappers by Dillon Sharlet · 9 days ago
  53. 549deb8 Remove unused `transpose` SIMD wrapper by Dillon Sharlet · 9 days ago
  54. 58a65db Mark values as external outputs in constant folding only if they are actually used in the non-constant pipeline. by Volodymyr Kysenko · 9 days ago
  55. 6c9a1ab Consolidate some SIMD wrapper headers by Dillon Sharlet · 9 days ago
  56. ea77aab Generalize FMA emulation helper by Dillon Sharlet · 9 days ago
  57. 1b849f4 Tune params for unary kernels to avoid tolerance issues by Dillon Sharlet · 9 days ago
  58. 0f6ee41 Initial upload. by Wei Wang · 9 days ago
  59. d4adfcd gemm benchmark documentation fix - update names of models to match files by Frank Barchard · 10 days ago
  60. f56a6c7 Add numerically correct `expm1` kernels by Dillon Sharlet · 10 days ago
  61. 29a1c73 Add std::string overloads for tensor::Create. by XNNPACK Team · 10 days ago
  62. b1a0a5d Merge pull request #10261 from velonica0:f16 by XNNPACK Team · 10 days ago
  63. a0dbef3 Improve `exp` accuracy by Dillon Sharlet · 10 days ago
  64. 3f33e55 Add `select` and conditional operations to SIMD wrappers by Dillon Sharlet · 10 days ago
  65. 5a59a54 Properly open source tensor api in github through copybara by XNNPACK Team · 11 days ago
  66. bcc179a Remove fp64 wasm support by Dillon Sharlet · 11 days ago
  67. 0547829 Remove lo/hi as member functions of `vec<T, N>` by Dillon Sharlet · 11 days ago
  68. cc68da8 Add sigmoid_fp64 kernels by Dillon Sharlet · 11 days ago
  69. 4b53a43 Prevent scheduling of ki/ko loops in packing. by Volodymyr Kysenko · 11 days ago
  70. 0741ac5 Open source Tensor API in google-ai-edge/LiteRT by XNNPACK Team · 11 days ago
  71. ce14e18 Adjust bounds for elementwise unary kernels with sub-byte inputs. by Volodymyr Kysenko · 11 days ago
  72. d0004f8 Add `xnn_datatype_qint2` for tensorwise quantized 2-bit values. by Pedro Gonnet · 11 days ago
  73. adf9795 Split dot operation on K. by XNNPACK Team · 12 days ago
  74. ea44341 Split dot operation on K. by Marie White · 12 days ago
  75. 7e3c789 Add tanh_fp64 kernels by Dillon Sharlet · 12 days ago
  76. 7333afb Optimize `floor_log2` for fp64 for non-AVX512 targets by Dillon Sharlet · 12 days ago
  77. 5191fee Change tree reduction factor from 32 to 16, and add another level by Dillon Sharlet · 12 days ago
  78. 7ef5fdc Add `round_to_bf16` by Dillon Sharlet · 12 days ago
  79. 56ac34b Add a graph rewrite to fallback to fp32 when fp16 isn't supported. by Quentin Khan · 12 days ago chromium/7852 chromium/7853 chromium/7854 chromium/7855 chromium/7856 chromium/7857 chromium/7858 chromium/7859 chromium/7860 chromium/7861 chromium/7862 chromium/7863 chromium/7864 chromium/7865 chromium/7866 chromium/7867 chromium/7868
  80. f873466 Align C++ standard to C++17 in CMake builds to be equal to Bazel builds. by Quentin Khan · 12 days ago
  81. 3ca1b08 Relax tolerance for sum squared kernel test by Dillon Sharlet · 12 days ago
  82. 98c8ded Polynomial approximation improvements for `exp` and `log` by Dillon Sharlet · 12 days ago
  83. f1fe9b5 Only rewrite reduce(convert(x)) if we have a kernel for that reduction type. by Dillon Sharlet · 12 days ago
  84. 01db6e1 Fix possible infinite recursion in convert by Dillon Sharlet · 12 days ago
  85. 4ae8b8f fix bug by velonica0 · 13 days ago
  86. 1052f90 [gn] Add support for building/testing AArch32 by Richard Townsend · 2 weeks ago
  87. 8da42ae Add support for log fp16 in XNNPACK. by Gerardo Carranza · 2 weeks ago
  88. 1c292bf [gn] Test building AVX512 by Richard Townsend · 2 weeks ago
  89. c3ac56a Add subgraph matcher target to `BUILD.gn`. by Quentin Khan · 2 weeks ago chromium/7847 chromium/7848 chromium/7849 chromium/7850 chromium/7851
  90. 7bf9c69 Fix ambiguous std::isfinite, std::abs, and std::fpclassify calls for _Float16 in test framework by explicitly casting to float. by Frank Barchard · 2 weeks ago
  91. 34c8015 Make sure partial reduction splits match the loop step. by Volodymyr Kysenko · 2 weeks ago
  92. ace56b6 Improve `exp` kernel accuracy and correctness by Dillon Sharlet · 2 weeks ago
  93. 49e266f Add optimized convert int2/int4 to int8 kernels. by Volodymyr Kysenko · 2 weeks ago
  94. 11fb885 Implement round to nearest even for float -> bf16 conversions by Dillon Sharlet · 2 weeks ago
  95. 9ab80cd Allow adding function own loops even if some of its non-trivial loops has been already fused. by Volodymyr Kysenko · 2 weeks ago
  96. 95ee916 Use a better unroll factor for log2_fp32_sse2 by Dillon Sharlet · 2 weeks ago
  97. d72fa85 Improve log_fp32 kernels by Dillon Sharlet · 2 weeks ago
  98. 4fad5b3 Disable static_slice test until slinky bug is fixed by Dillon Sharlet · 2 weeks ago
  99. 393da7d add rvv kernel for f16-vlog by Ken Unger · 2 weeks ago
  100. fe16697 Disable static_slice test until slinky bug is fixed by Dillon Sharlet · 2 weeks ago