lib/lz4: enable LZ4_FAST_DEC_LOOP on aarch64 Clang builds

Upstream lz4 reported a performance regression on Qualcomm SoCs when
built with Clang, but not with GCC [1]. However, in my testing on sm8350
with LLVM Clang 15, this patch gives a boost of roughly 8% in
decompression throughput, so enable the fast dec loop for Clang as well.

Testing procedure:
- pre-fill zram with 1GB of real-world zram data dumped under memory
  pressure, for example
  $ dd if=/sdcard/zram.test of=/dev/block/zram0 bs=1m count=1000
- $ fio --readonly --name=randread --direct=1 --rw=randread \
  --ioengine=psync --randrepeat=0 --numjobs=4 --iodepth=1 \
  --group_reporting=1 --filename=/dev/block/zram0 --bs=4K --size=1000M

Results:
- vanilla lz4:  read: IOPS=1646k, BW=6431MiB/s (6743MB/s)(4000MiB/622msec)
- lz4 fast dec: read: IOPS=1775k, BW=6932MiB/s (7269MB/s)(4000MiB/577msec)

[1] https://github.com/lz4/lz4/pull/707

Signed-off-by: Chenyang Zhong <zhongcy95@gmail.com>
2022-06-18 22:18:14 -04:00
commit dde7ceabd2
parent 84ec8660f1
1 changed file with 1 addition and 1 deletion
--- a/lib/lz4/lz4_decompress.c
+++ b/lib/lz4/lz4_decompress.c
@@ -53,7 +53,7 @@
 #ifndef LZ4_FAST_DEC_LOOP
 #if defined(__i386__) || defined(__x86_64__)
 #define LZ4_FAST_DEC_LOOP 1
-#elif defined(__aarch64__) && !defined(__clang__)
+#elif defined(__aarch64__)
     /* On aarch64, we disable this optimization for clang because on certain
      * mobile chipsets and clang, it reduces performance. For more information
      * refer to https://github.com/lz4/lz4/pull/707. */