[RFC,0/6] btrfs: offload zlib-deflate to accelerators

Message ID 20240426110941.5456-1-giovanni.cabiddu@intel.com
Series btrfs: offload zlib-deflate to accelerators

Message

Cabiddu, Giovanni April 26, 2024, 10:54 a.m. UTC
Add support for zlib compression and decompression in BTRFS through the
acomp API. This allows [de]compression operations to be offloaded to
accelerators. This is a rework of [1].

This set also re-enables zlib-deflate in the Crypto API and in the QAT
driver. Support for it was removed from both in [2] because there was no
in-kernel user. The re-enablement is done by reverting the commits that
removed the feature.

The code has been benchmarked on a system with the following specs:
 * Dual socket Intel(R) Xeon(R) Platinum 8470N
 * 512GB (16x32GB DDR5 4800 MT/s)
 * 4 NVMe disks (349.3G INTEL SSDPE21K375GA)
 * 2 QAT 4xxx devices, one per socket, configured for compression only
 * Kernel 6.8.2

The test consisted of 4 processes running `dd` in parallel, writing 50GB
of data (Silesia corpus) to the 4 NVMe disks, one disk per process. We
captured disk write throughput, CPU utilization and compression ratio:

    +------------------------------------+---------+---------+---------+---------+
    |                                    | QAT-L9  | ZSTD-L3 | ZLIB-L3 | LZO-L1  |
    +------------------------------------+---------+---------+---------+---------+
    | Disk write throughput (GiB/s)      | 6.5     | 5.2     | 2.2     | 6.5     |
    +------------------------------------+---------+---------+---------+---------+
    | CPU utilization @ 208 cores        | 4.56%   | 15.67%  | 12.79%  | 19.85%  |
    +------------------------------------+---------+---------+---------+---------+
    | Compression ratio (lower = better) | 34%     | 35%     | 37%     | 58%     |
    +------------------------------------+---------+---------+---------+---------+

The results show that BTRFS with QAT configured for zlib-deflate level 9
matches the highest throughput (6.5 GiB/s, on par with LZO) while using the
least CPU (4.56% versus 12.79% to 19.85%) and achieving the best compression
ratio (34%) compared with software zstd-l3, zlib-l3 and lzo.

Limitations:
  * The implementation is synchronous, even though acomp is an asynchronous
    API.
  * The implementation always tries to use an acomp tfm, even if only
    zlib-deflate-scomp (the software implementation) is present. In that
    case, the compression level configured for zlib is ignored.
  * There is no way to configure a compression level for acomp(zlib-deflate).
    The level is hardcoded in the acomp algorithm implementation (provider).

[1] https://lore.kernel.org/all/1467083180-111750-1-git-send-email-weigang.li@intel.com/
[2] https://lore.kernel.org/all/ZO8ULhlJSrJ0Mcsx@gondor.apana.org.au/

Giovanni Cabiddu (5):
  Revert "crypto: testmgr - Remove zlib-deflate"
  Revert "crypto: deflate - Remove zlib-deflate"
  Revert "crypto: qat - Remove zlib-deflate"
  Revert "crypto: qat - remove unused macros in qat_comp_alg.c"
  crypto: qat - change compressor settings for QAT GEN4

Weigang Li (1):
  btrfs: zlib: add support for zlib-deflate through acomp

 crypto/deflate.c                              |  61 +++--
 crypto/testmgr.c                              |  10 +
 crypto/testmgr.h                              |  75 ++++++
 .../crypto/intel/qat/qat_common/adf_gen4_dc.c |   4 +-
 .../intel/qat/qat_common/qat_comp_algs.c      | 138 ++++++++++-
 fs/btrfs/zlib.c                               | 216 ++++++++++++++++++
 6 files changed, 484 insertions(+), 20 deletions(-)

base-commit: ed265f7fd9a635d77c8022fc6d9a1b735dd4dfd7