[RFC,6/6] btrfs: zlib: add support for zlib-deflate through acomp

Message ID 20240426110941.5456-7-giovanni.cabiddu@intel.com
State RFC
Delegated to: Herbert Xu
Series btrfs: offload zlib-deflate to accelerators

Commit Message

Cabiddu, Giovanni April 26, 2024, 10:54 a.m. UTC
From: Weigang Li <weigang.li@intel.com>

Add support for zlib compression and decompression through the acomp
APIs.
Input pages are added to an sg-list and sent to acomp in one request.
Since acomp is asynchronous, the thread is put to sleep while the request
is in flight, which frees up the CPU. Once compression is done, the acomp
callback is triggered and the thread is woken up.

This patch doesn't change the BTRFS disk format, which means that files
compressed by hardware engines can be decompressed by the zlib software
library, and vice versa.

Limitations:
  * The implementation always tries to use an acomp, even if only
    zlib-deflate-scomp is present
  * Acomp does not provide a way to support compression levels
  * Acomp is an asynchronous API but is used here synchronously
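
To illustrate the last limitation, here is a minimal single-threaded
userspace model of the "submit, then wait for the callback" pattern that
crypto_wait_req() provides. All demo_* names are hypothetical, and the
model polls a fake engine where the kernel would sleep on a completion;
it is a sketch of the control flow, not the crypto API itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct crypto_wait: a done flag plus a status. */
struct demo_wait {
	bool done;
	int err;
};

/* Completion callback, playing the role of crypto_req_done(). */
static void demo_req_done(void *data, int err)
{
	struct demo_wait *wait = data;

	wait->err = err;
	wait->done = true;
}

/* Hypothetical engine that holds at most one in-flight request. */
struct demo_engine {
	void (*complete)(void *data, int err);
	void *data;
	int result;
	bool busy;
};

/* Queue a request; like an async driver, report -EINPROGRESS right away. */
static int demo_submit(struct demo_engine *eng, int result,
		       void (*complete)(void *data, int err), void *data)
{
	assert(!eng->busy);	/* the model handles one request at a time */
	eng->complete = complete;
	eng->data = data;
	eng->result = result;
	eng->busy = true;
	return -EINPROGRESS;
}

/* Pretend the hardware finished: fire the stored callback once. */
static void demo_engine_poll(struct demo_engine *eng)
{
	if (eng->busy) {
		eng->busy = false;
		eng->complete(eng->data, eng->result);
	}
}

/*
 * Modeled on crypto_wait_req(): a synchronous status is returned as is,
 * while -EINPROGRESS means "wait for the callback and return its status".
 * (The kernel helper also waits on -EBUSY; the model omits that case.)
 */
static int demo_wait_req(struct demo_engine *eng, int err,
			 struct demo_wait *wait)
{
	if (err != -EINPROGRESS)
		return err;
	while (!wait->done)
		demo_engine_poll(eng);
	wait->done = false;	/* re-arm, like reinit_completion() */
	return wait->err;
}
```

The caller ends up with ordinary blocking semantics: the return value of
demo_wait_req() is the final status of the request, whether it completed
inline or through the callback.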

Signed-off-by: Weigang Li <weigang.li@intel.com>
---
 fs/btrfs/zlib.c | 216 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 216 insertions(+)

Comments

Josef Bacik April 29, 2024, 1:56 p.m. UTC | #1
On Fri, Apr 26, 2024 at 11:54:29AM +0100, Giovanni Cabiddu wrote:
> From: Weigang Li <weigang.li@intel.com>
> 
> Add support for zlib compression and decompression through the acomp
> APIs.
> Input pages are added to an sg-list and sent to acomp in one request.
> Since acomp is asynchronous, the thread is put to sleep and then the CPU
> is freed up. Once compression is done, the acomp callback is triggered
> and the thread is woke up.
> 
> This patch doesn't change the BTRFS disk format, this means that files
> compressed by hardware engines can be de-compressed by the zlib software
> library, and vice versa.
> 
> Limitations:
>   * The implementation tries always to use an acomp even if only
>     zlib-deflate-scomp is present
>   * Acomp does not provide a way to support compression levels

That's a non-starter.  We can't just lie to the user about the compression level
that is being used.  If the user just does "-o compress=zlib" then you need to
update btrfs_compress_set_level() to figure out the compression level that acomp
is going to use and set that appropriately, so we can report to the user what is
actually being used.

Additionally if a user specifies a compression level you need to make sure we
don't do acomp if it doesn't match what acomp is going to do.
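
The two requirements above can be sketched as one decision function. This
is a userspace model under stated assumptions: the accelerator is assumed
to produce output equivalent to a single fixed level, a requested level
of 0 stands for a plain "-o compress=zlib", and the software default is
assumed to be 3. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_ZLIB_DEFAULT_LEVEL 3	/* assumed software default level */

/* Hypothetical description of what the accelerator backend offers. */
struct demo_accel {
	bool present;
	int fixed_level;	/* the single level the engine is assumed to use */
};

/* Outcome: which backend runs, and which level is reported to the user. */
struct demo_choice {
	bool use_accel;
	int effective_level;
};

/*
 * level_requested == 0 means the user gave no explicit level.
 * The accelerator is used only when that does not contradict the request,
 * and the reported level is always the one that is really applied.
 */
static struct demo_choice demo_pick_backend(const struct demo_accel *accel,
					    int level_requested)
{
	struct demo_choice choice = {
		.use_accel = false,
		.effective_level = level_requested ? level_requested
						   : DEMO_ZLIB_DEFAULT_LEVEL,
	};

	if (!accel->present)
		return choice;

	if (level_requested == 0 || level_requested == accel->fixed_level) {
		choice.use_accel = true;
		choice.effective_level = accel->fixed_level;
	}
	return choice;
}
```

With an engine fixed at level 6, "compress=zlib" would go to the engine
and report level 6, while "compress=zlib:9" would stay on the CPU.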

Finally, for the normal code review, there's a bunch of things that need to be
fixed up before I take a closer look

- We don't use pr_(), we have btrfs specific printk helpers, please use those.
- We do 1 variable per line, fix up the variable declarations in your functions.

Thanks,

Josef
Cabiddu, Giovanni April 29, 2024, 3:21 p.m. UTC | #2
On Mon, Apr 29, 2024 at 09:56:45AM -0400, Josef Bacik wrote:
> On Fri, Apr 26, 2024 at 11:54:29AM +0100, Giovanni Cabiddu wrote:
> > From: Weigang Li <weigang.li@intel.com>
> > 
> > Add support for zlib compression and decompression through the acomp
> > APIs.
> > Input pages are added to an sg-list and sent to acomp in one request.
> > Since acomp is asynchronous, the thread is put to sleep and then the CPU
> > is freed up. Once compression is done, the acomp callback is triggered
> > and the thread is woke up.
> > 
> > This patch doesn't change the BTRFS disk format, this means that files
> > compressed by hardware engines can be de-compressed by the zlib software
> > library, and vice versa.
> > 
> > Limitations:
> >   * The implementation tries always to use an acomp even if only
> >     zlib-deflate-scomp is present
> >   * Acomp does not provide a way to support compression levels
> 
> That's a non-starter.  We can't just lie to the user about the compression level
> that is being used.  If the user just does "-o compress=zlib" then you need to
> update btrfs_compress_set_level() to figure out the compression level that acomp
> is going to use and set that appropriately, so we can report to the user what is
> actually being used.
> 
> Additionally if a user specifies a compression level you need to make sure we
> don't do acomp if it doesn't match what acomp is going to do.
Thanks for the feedback. We should then extend the acomp API to take the
compression level.
@Herbert, do you have any objection if we add the compression level to
the acomp tfm and we add an API to set it? Example:

    tfm = crypto_alloc_acomp("deflate", 0, 0);
    acomp_set_level(tfm, compression_level);

> Finally, for the normal code review, there's a bunch of things that need to be
> fixed up before I take a closer look
> 
> - We don't use pr_(), we have btrfs specific printk helpers, please use those.
> - We do 1 variable per line, fix up the variable declarations in your functions.
I see that the code in fs/btrfs/zlib.c uses both pr_() and more than one
variable per line. If we change it, will mixed style be a concern?
David Sterba April 29, 2024, 3:41 p.m. UTC | #3
On Mon, Apr 29, 2024 at 09:56:45AM -0400, Josef Bacik wrote:
> On Fri, Apr 26, 2024 at 11:54:29AM +0100, Giovanni Cabiddu wrote:
> > From: Weigang Li <weigang.li@intel.com>
> > 
> > Add support for zlib compression and decompression through the acomp
> > APIs.
> > Input pages are added to an sg-list and sent to acomp in one request.
> > Since acomp is asynchronous, the thread is put to sleep and then the CPU
> > is freed up. Once compression is done, the acomp callback is triggered
> > and the thread is woke up.
> > 
> > This patch doesn't change the BTRFS disk format, this means that files
> > compressed by hardware engines can be de-compressed by the zlib software
> > library, and vice versa.
> > 
> > Limitations:
> >   * The implementation tries always to use an acomp even if only
> >     zlib-deflate-scomp is present
> >   * Acomp does not provide a way to support compression levels
> 
> That's a non-starter.  We can't just lie to the user about the compression level
> that is being used.  If the user just does "-o compress=zlib" then you need to
> update btrfs_compress_set_level() to figure out the compression level that acomp
> is going to use and set that appropriately, so we can report to the user what is
> actually being used.
> 
> Additionally if a user specifies a compression level you need to make sure we
> don't do acomp if it doesn't match what acomp is going to do.
> 
> Finally, for the normal code review, there's a bunch of things that need to be
> fixed up before I take a closer look
> 
> - We don't use pr_(), we have btrfs specific printk helpers, please use those.
> - We do 1 variable per line, fix up the variable declarations in your functions.

I'd skip the style and implementation details for now. The absence of
compression level support seems like the biggest problem, especially in
combination with unconditional use of the acomp interface. We'd have to
enhance the compression format specifier to make this configurable, in
the sense: if an accelerator is available use it, otherwise do
synchronous compression on the CPU.

On the other hand, the compression levels are to trade off time and
space. If the QAT implementation with zlib level 9 is always better than
CPU compression then it's not that bad, not counting the possibly
misleading level to the users.

If QAT can also support ZSTD, I'm not sure that the lack of levels can
work there, though: the memory overhead is bigger and it's a more complex
algorithm. Extending the acomp API with levels would be necessary.

Regarding the implementation, there are many allocations that set up the
async request. This is problematic as the compression is at the end of
the IO path and potentially called after memory pressure. We still do
some allocations there but also try not to fail due to ENOMEM; each
allocation is a new failure point. Anything that could be reused should
be in the workspace memory.
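
A minimal userspace model of that idea, assuming a bound of 32 pages per
request: everything the request needs is allocated once with the
workspace, and the per-request path only fills in preallocated arrays.
The demo_* names and the allocation counter are hypothetical and exist
only to make the "no allocation on the hot path" property visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEMO_MAX_PAGES 32	/* assumed upper bound on pages per request */

static unsigned int demo_alloc_calls;	/* counts trips to the allocator */

static void *demo_alloc(size_t n, size_t size)
{
	demo_alloc_calls++;
	return calloc(n, size);
}

/* Hypothetical workspace: reusable request state, allocated up front. */
struct demo_workspace {
	void **in_pages;
	void **out_pages;
};

static void demo_free_workspace(struct demo_workspace *ws)
{
	if (!ws)
		return;
	free(ws->in_pages);
	free(ws->out_pages);
	free(ws);
}

/* Setup path: the only place where ENOMEM can happen. */
static struct demo_workspace *demo_alloc_workspace(void)
{
	struct demo_workspace *ws = demo_alloc(1, sizeof(*ws));

	if (!ws)
		return NULL;
	ws->in_pages = demo_alloc(DEMO_MAX_PAGES, sizeof(void *));
	ws->out_pages = demo_alloc(DEMO_MAX_PAGES, sizeof(void *));
	if (!ws->in_pages || !ws->out_pages) {
		demo_free_workspace(ws);
		return NULL;
	}
	return ws;
}

/* Per-request path: borrows workspace memory, never allocates. */
static int demo_prepare_request(struct demo_workspace *ws, void **pages,
				unsigned int nr_pages)
{
	unsigned int i;

	if (nr_pages > DEMO_MAX_PAGES)
		return -EINVAL;
	for (i = 0; i < nr_pages; i++)
		ws->in_pages[i] = pages[i];
	return 0;
}
```

The design choice is that every failure point moves to workspace setup,
which runs before IO starts and can be retried or fall back safely.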
David Sterba April 29, 2024, 3:44 p.m. UTC | #4
On Mon, Apr 29, 2024 at 04:21:46PM +0100, Cabiddu, Giovanni wrote:
> On Mon, Apr 29, 2024 at 09:56:45AM -0400, Josef Bacik wrote:
> > On Fri, Apr 26, 2024 at 11:54:29AM +0100, Giovanni Cabiddu wrote:
> > > From: Weigang Li <weigang.li@intel.com>
> > > 
> > > Add support for zlib compression and decompression through the acomp
> > > APIs.
> > > Input pages are added to an sg-list and sent to acomp in one request.
> > > Since acomp is asynchronous, the thread is put to sleep and then the CPU
> > > is freed up. Once compression is done, the acomp callback is triggered
> > > and the thread is woke up.
> > > 
> > > This patch doesn't change the BTRFS disk format, this means that files
> > > compressed by hardware engines can be de-compressed by the zlib software
> > > library, and vice versa.
> > > 
> > > Limitations:
> > >   * The implementation tries always to use an acomp even if only
> > >     zlib-deflate-scomp is present
> > >   * Acomp does not provide a way to support compression levels
> > 
> > That's a non-starter.  We can't just lie to the user about the compression level
> > that is being used.  If the user just does "-o compress=zlib" then you need to
> > update btrfs_compress_set_level() to figure out the compression level that acomp
> > is going to use and set that appropriately, so we can report to the user what is
> > actually being used.
> > 
> > Additionally if a user specifies a compression level you need to make sure we
> > don't do acomp if it doesn't match what acomp is going to do.
> Thanks for the feedback. We should then extend the acomp API to take the
> compression level.
> @Herbert, do you have any objection if we add the compression level to
> the acomp tfm and we add an API to set it? Example:
> 
>     tfm = crypto_alloc_acomp("deflate", 0, 0);
>     acomp_set_level(tfm, compression_level);
> 
> > Finally, for the normal code review, there's a bunch of things that need to be
> > fixed up before I take a closer look
> > 
> > - We don't use pr_(), we have btrfs specific printk helpers, please use those.
> > - We do 1 variable per line, fix up the variable declarations in your functions.
> I see that the code in fs/btrfs/zlib.c uses both pr_() and more than one
> variable per line. If we change it, will mixed style be a concern?

I have a work in progress to rework the messages in compression. The
messages with pr_() helpers are there for historical reasons, and using
the proper btrfs_info/... helpers needs extracting structures like fs_info
from various data. Josef's comment is valid but you can skip that for the
QAT series.
David Sterba April 29, 2024, 3:57 p.m. UTC | #5
On Fri, Apr 26, 2024 at 11:54:29AM +0100, Giovanni Cabiddu wrote:
> From: Weigang Li <weigang.li@intel.com>
> +static int acomp_comp_pages(struct address_space *mapping, u64 start,
> +			    unsigned long len, struct page **pages,
> +			    unsigned long *out_pages,
> +			    unsigned long *total_in,
> +			    unsigned long *total_out)
> +{
> +	unsigned int nr_src_pages = 0, nr_dst_pages = 0, nr_pages = 0;
> +	struct sg_table in_sg = { 0 }, out_sg = { 0 };
> +	struct page *in_page, *out_page, **in_pages;
> +	struct crypto_acomp *tfm = NULL;
> +	struct acomp_req *req = NULL;
> +	struct crypto_wait wait;
> +	int ret, i;
> +
> +	nr_src_pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
> +	in_pages = kcalloc(nr_src_pages, sizeof(struct page *), GFP_KERNEL);

The maximum length is bounded so you could store the in_pages array in
zlib's workspace.
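
As a worked example of that bound, assuming 4 KiB pages and a 128 KiB
limit on the input of one compression call (both values are assumptions
of this sketch, as are the demo_* names): the round-up used in the patch
gives at most 32 pages, so a fixed array of 32 pointers in the workspace
would cover every request.

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT		12
#define DEMO_PAGE_SIZE		(1UL << DEMO_PAGE_SHIFT)
#define DEMO_MAX_UNCOMPRESSED	(128UL * 1024)	/* assumed per-call input limit */

/* Same round-up as "(len + PAGE_SIZE - 1) >> PAGE_SHIFT" in the patch. */
#define DEMO_MAX_IN_PAGES \
	((DEMO_MAX_UNCOMPRESSED + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT)

static unsigned long demo_nr_pages(unsigned long len)
{
	return (len + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
}

/* Hypothetical workspace member replacing the per-call kcalloc(). */
struct demo_zlib_workspace {
	void *in_pages[DEMO_MAX_IN_PAGES];	/* 32 pointers, 256 bytes with 8-byte pointers */
};
```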

> +	if (!in_pages) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	for (i = 0; i < nr_src_pages; i++) {
> +		in_page = find_get_page(mapping, start >> PAGE_SHIFT);
> +		out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);

Output pages should be newly allocated by btrfs_alloc_compr_folio()

> +		if (!in_page || !out_page) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +		in_pages[i] = in_page;
> +		pages[i] = out_page;
> +		nr_dst_pages += 1;
> +		start += PAGE_SIZE;
> +	}
> +
> +	ret = sg_alloc_table_from_pages(&in_sg, in_pages, nr_src_pages, 0,
> +					nr_src_pages << PAGE_SHIFT, GFP_KERNEL);

I'm not sure if the sg interface allows using an existing buffer, but
the input parameters are bounded in size and count, so the allocation
should be dropped and replaced by workspace data.

> +	if (ret)
> +		goto out;
> +
> +	ret = sg_alloc_table_from_pages(&out_sg, pages, nr_dst_pages, 0,
> +					nr_dst_pages << PAGE_SHIFT, GFP_KERNEL);
> +	if (ret)
> +		goto out;
> +
> +	crypto_init_wait(&wait);
> +	tfm = crypto_alloc_acomp("zlib-deflate", 0, 0);

AFAIK the TFM should be allocated only once, well before any IO is done,
and then reused. Allocating it here can trigger resolving the best
implementation or maybe even module loading.
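
A small userspace model of "allocate once, reuse on every request". The
demo_* names are hypothetical; demo_alloc_tfm() stands in for the
expensive lookup, and the counter shows that the IO path never triggers
it. Where the one-time setup would live in btrfs (module init, mount,
workspace creation) is left open here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transform handle. */
struct demo_tfm {
	const char *alg;
};

static unsigned int demo_lookups;	/* how often the expensive lookup ran */
static struct demo_tfm demo_backing;
static struct demo_tfm *demo_cached;	/* set once, reused afterwards */

/* Stand-in for the expensive step: algorithm resolution, module loading. */
static struct demo_tfm *demo_alloc_tfm(const char *alg)
{
	demo_lookups++;
	demo_backing.alg = alg;
	return &demo_backing;
}

/* Runs once during setup, never from the IO path. */
static int demo_compress_init(void)
{
	demo_cached = demo_alloc_tfm("zlib-deflate");
	return demo_cached ? 0 : -1;
}

/* IO path: uses the cached handle or reports that it is unavailable. */
static int demo_compress_one(void)
{
	if (!demo_cached)
		return -1;	/* caller would fall back to software zlib */
	/* ... build and submit a request against demo_cached ... */
	return 0;
}
```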

> +	if (IS_ERR(tfm)) {
> +		ret = PTR_ERR(tfm);
> +		goto out;
> +	}
> +
> +	req = acomp_request_alloc(tfm);

The request should be in the workspace; the only initialization I see is
setting the right ->tfm pointer.

> +	if (!req) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	acomp_request_set_params(req, in_sg.sgl, out_sg.sgl, len,
> +				 nr_dst_pages << PAGE_SHIFT);
> +	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> +				   crypto_req_done, &wait);
> +
> +	ret = crypto_wait_req(crypto_acomp_compress(req), &wait);
> +	if (ret)
> +		goto out;
> +
> +	*total_in = len;
> +	*total_out = req->dlen;
> +	nr_pages = (*total_out + PAGE_SIZE - 1) >> PAGE_SHIFT;
> +
> +out:
> +	sg_free_table(&in_sg);
> +	sg_free_table(&out_sg);
> +
> +	if (in_pages) {
> +		for (i = 0; i < nr_src_pages; i++)
> +			put_page(in_pages[i]);
> +		kfree(in_pages);

Pages should be returned to the pool with btrfs_free_compr_folio().

> +	}
> +
> +	/* free un-used out pages */
> +	for (i = nr_pages; i < nr_dst_pages; i++)
> +		put_page(pages[i]);
> +
> +	if (req)
> +		acomp_request_free(req);
> +
> +	if (tfm)
> +		crypto_free_acomp(tfm);
> +
> +	*out_pages = nr_pages;
> +
> +	return ret;
> +}
> +
> +static int acomp_zlib_decomp_bio(struct page **in_pages,
> +				 struct compressed_bio *cb, size_t srclen,
> +				 unsigned long total_pages_in)
> +{
> +	unsigned int nr_dst_pages = BTRFS_MAX_COMPRESSED_PAGES;
> +	struct sg_table in_sg = { 0 }, out_sg = { 0 };
> +	struct bio *orig_bio = &cb->orig_bbio->bio;
> +	char *data_out = NULL, *bv_buf = NULL;
> +	int copy_len = 0, bytes_left = 0;
> +	struct crypto_acomp *tfm = NULL;
> +	struct page **out_pages = NULL;
> +	struct acomp_req *req = NULL;
> +	struct crypto_wait wait;
> +	struct bio_vec bvec;
> +	int ret, i = 0;
> +
> +	ret = sg_alloc_table_from_pages(&in_sg, in_pages, total_pages_in,
> +					0, srclen, GFP_KERNEL);

Any allocation here needs to be GFP_NOFS for now. Actually we'd need
memalloc_nofs_save/memalloc_nofs_restore around all compression and
decompression code that does not use GFP_NOFS directly and could call
other APIs that do GFP_KERNEL. Like crypto or sg.
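
The effect of such a scope can be shown with a simplified userspace
model. The flag values and demo_* names are hypothetical, and the real
GFP masks carry more bits than the two modeled here; the point is only
that a callee hardcoding GFP_KERNEL is silently downgraded to NOFS
behaviour while the scope is active, and that scopes nest correctly.

```c
#include <assert.h>

/* Simplified allocation flags: only the IO and FS bits are modeled. */
#define DEMO_GFP_IO		0x1u
#define DEMO_GFP_FS		0x2u
#define DEMO_GFP_KERNEL		(DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOFS		(DEMO_GFP_IO)

#define DEMO_PF_MEMALLOC_NOFS	0x1u

static unsigned int demo_task_flags;	/* stands in for current->flags */

/* Model of memalloc_nofs_save(): set the flag, return its previous state. */
static unsigned int demo_nofs_save(void)
{
	unsigned int old = demo_task_flags & DEMO_PF_MEMALLOC_NOFS;

	demo_task_flags |= DEMO_PF_MEMALLOC_NOFS;
	return old;
}

/* Model of memalloc_nofs_restore(): put the previous state back. */
static void demo_nofs_restore(unsigned int old)
{
	demo_task_flags = (demo_task_flags & ~DEMO_PF_MEMALLOC_NOFS) | old;
}

/* Model of the allocator's context masking inside a NOFS scope. */
static unsigned int demo_effective_gfp(unsigned int gfp)
{
	if (demo_task_flags & DEMO_PF_MEMALLOC_NOFS)
		gfp &= ~DEMO_GFP_FS;
	return gfp;
}

/* A helper we do not control that hardcodes GFP_KERNEL, like sg or crypto. */
static unsigned int demo_library_alloc(void)
{
	return demo_effective_gfp(DEMO_GFP_KERNEL);
}
```

Because save returns the previous state, an inner scope that ends does
not clear the flag for an outer scope that is still active.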

> +	if (ret)
> +		goto out;
> +
> +	out_pages = kcalloc(nr_dst_pages, sizeof(struct page *), GFP_KERNEL);
> +	if (!out_pages) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	for (i = 0; i < nr_dst_pages; i++) {
> +		out_pages[i] = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> +		if (!out_pages[i]) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +	}
> +
> +	ret = sg_alloc_table_from_pages(&out_sg, out_pages, nr_dst_pages, 0,
> +					nr_dst_pages << PAGE_SHIFT, GFP_KERNEL);
> +	if (ret)
> +		goto out;
> +
> +	crypto_init_wait(&wait);
> +	tfm = crypto_alloc_acomp("zlib-deflate", 0, 0);
> +	if (IS_ERR(tfm)) {
> +		ret = PTR_ERR(tfm);
> +		goto out;
> +	}
> +
> +	req = acomp_request_alloc(tfm);
> +	if (!req) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	acomp_request_set_params(req, in_sg.sgl, out_sg.sgl, srclen,
> +				 nr_dst_pages << PAGE_SHIFT);
> +	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> +				   crypto_req_done, &wait);
> +
> +	ret = crypto_wait_req(crypto_acomp_decompress(req), &wait);
> +	if (ret)
> +		goto out;
> +
> +	/* Copy decompressed buffer to bio pages */
> +	bytes_left = req->dlen;
> +	for (i = 0; i < nr_dst_pages; i++) {
> +		copy_len = bytes_left > PAGE_SIZE ? PAGE_SIZE : bytes_left;
> +		data_out = kmap_local_page(out_pages[i]);
> +
> +		bvec = bio_iter_iovec(orig_bio, orig_bio->bi_iter);
> +		bv_buf = kmap_local_page(bvec.bv_page);
> +		memcpy(bv_buf, data_out, copy_len);
> +		kunmap_local(bv_buf);
> +
> +		bio_advance(orig_bio, copy_len);
> +		if (!orig_bio->bi_iter.bi_size)
> +			break;
> +		bytes_left -= copy_len;
> +		if (bytes_left <= 0)
> +			break;
> +	}
> +out:
> +	sg_free_table(&in_sg);
> +	sg_free_table(&out_sg);
> +
> +	if (out_pages) {
> +		for (i = 0; i < nr_dst_pages; i++) {
> +			if (out_pages[i])
> +				put_page(out_pages[i]);
> +		}
> +		kfree(out_pages);
> +	}
> +
> +	if (req)
> +		acomp_request_free(req);
> +	if (tfm)
> +		crypto_free_acomp(tfm);
> +
> +	return ret;
> +}
> +
>  struct list_head *zlib_get_workspace(unsigned int level)
>  {
>  	struct list_head *ws = btrfs_get_workspace(BTRFS_COMPRESS_ZLIB, level);
> @@ -108,6 +305,15 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>  	unsigned long nr_dest_pages = *out_pages;
>  	const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
>  
> +	if (crypto_has_acomp("zlib-deflate", 0, 0)) {
> +		ret = acomp_comp_pages(mapping, start, len, pages, out_pages,
> +				       total_in, total_out);
> +		if (!ret)
> +			return ret;
> +
> +		pr_warn("BTRFS: acomp compression failed: ret = %d\n", ret);
> +		/* Fallback to SW implementation if HW compression failed */
> +	}
>  	*out_pages = 0;
>  	*total_out = 0;
>  	*total_in = 0;
> @@ -281,6 +487,16 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>  	unsigned long buf_start;
>  	struct page **pages_in = cb->compressed_pages;
>  
> +	if (crypto_has_acomp("zlib-deflate", 0, 0)) {
> +		ret = acomp_zlib_decomp_bio(pages_in, cb, srclen,
> +					    total_pages_in);
> +		if (!ret)
> +			return ret;
> +
> +		pr_warn("BTRFS: acomp decompression failed, ret=%d\n", ret);
> +		/* Fallback to SW implementation if HW decompression failed */
> +	}
> +
>  	data_in = kmap_local_page(pages_in[page_in_index]);
>  	workspace->strm.next_in = data_in;
>  	workspace->strm.avail_in = min_t(size_t, srclen, PAGE_SIZE);
> -- 
> 2.44.0
>
Herbert Xu May 3, 2024, 10:04 a.m. UTC | #6
On Mon, Apr 29, 2024 at 04:21:46PM +0100, Cabiddu, Giovanni wrote:
>
> @Herbert, do you have any objection if we add the compression level to
> the acomp tfm and we add an API to set it? Example:
> 
>     tfm = crypto_alloc_acomp("deflate", 0, 0);
>     acomp_set_level(tfm, compression_level);

Yes I think that's fine.  I'd make it a more generic interface
and model it after setkey so that you can set any parameter for
the given algorithm.
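
One possible shape for such an interface, as a userspace sketch only: the
generic entry point takes an opaque parameter blob and hands it to a
per-algorithm hook, the way setkey hands over key bytes. Nothing here is
an existing crypto API; demo_acomp_setparam(), the params struct and the
1..9 level range are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct demo_tfm;

/* Per-algorithm hook, analogous to an algorithm's ->setkey(). */
struct demo_alg {
	const char *name;
	int (*setparam)(struct demo_tfm *tfm, const void *param, size_t len);
};

struct demo_tfm {
	const struct demo_alg *alg;
	int level;
};

/* Hypothetical parameter blob understood by the deflate algorithm. */
struct demo_deflate_params {
	int level;
};

/* Validate the blob, then apply it; the tfm is untouched on error. */
static int demo_deflate_setparam(struct demo_tfm *tfm, const void *param,
				 size_t len)
{
	struct demo_deflate_params p;

	if (len != sizeof(p))
		return -EINVAL;
	memcpy(&p, param, sizeof(p));
	if (p.level < 1 || p.level > 9)
		return -EINVAL;
	tfm->level = p.level;
	return 0;
}

static const struct demo_alg demo_deflate = {
	.name = "deflate",
	.setparam = demo_deflate_setparam,
};

/* An algorithm (or engine) with nothing to configure. */
static const struct demo_alg demo_fixed = {
	.name = "fixed",
	.setparam = NULL,
};

/* Generic entry point: opaque bytes in, interpreted by the algorithm. */
static int demo_acomp_setparam(struct demo_tfm *tfm, const void *param,
			       size_t len)
{
	if (!tfm->alg->setparam)
		return -EOPNOTSUPP;
	return tfm->alg->setparam(tfm, param, len);
}
```

An engine that cannot honour a requested level would return an error
from its hook, which gives the caller a clean signal to fall back to the
software implementation.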

Cheers,

Patch

diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index e5b3f2003896..b5bbb8c97244 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -18,6 +18,8 @@ 
 #include <linux/pagemap.h>
 #include <linux/bio.h>
 #include <linux/refcount.h>
+#include <crypto/acompress.h>
+#include <linux/scatterlist.h>
 #include "compression.h"
 
 /* workspace buffer size for s390 zlib hardware support */
@@ -33,6 +35,201 @@  struct workspace {
 
 static struct workspace_manager wsm;
 
+static int acomp_comp_pages(struct address_space *mapping, u64 start,
+			    unsigned long len, struct page **pages,
+			    unsigned long *out_pages,
+			    unsigned long *total_in,
+			    unsigned long *total_out)
+{
+	unsigned int nr_src_pages = 0, nr_dst_pages = 0, nr_pages = 0;
+	struct sg_table in_sg = { 0 }, out_sg = { 0 };
+	struct page *in_page, *out_page, **in_pages;
+	struct crypto_acomp *tfm = NULL;
+	struct acomp_req *req = NULL;
+	struct crypto_wait wait;
+	int ret, i;
+
+	nr_src_pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	in_pages = kcalloc(nr_src_pages, sizeof(struct page *), GFP_KERNEL);
+	if (!in_pages) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	for (i = 0; i < nr_src_pages; i++) {
+		in_page = find_get_page(mapping, start >> PAGE_SHIFT);
+		out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+		if (!in_page || !out_page) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		in_pages[i] = in_page;
+		pages[i] = out_page;
+		nr_dst_pages += 1;
+		start += PAGE_SIZE;
+	}
+
+	ret = sg_alloc_table_from_pages(&in_sg, in_pages, nr_src_pages, 0,
+					nr_src_pages << PAGE_SHIFT, GFP_KERNEL);
+	if (ret)
+		goto out;
+
+	ret = sg_alloc_table_from_pages(&out_sg, pages, nr_dst_pages, 0,
+					nr_dst_pages << PAGE_SHIFT, GFP_KERNEL);
+	if (ret)
+		goto out;
+
+	crypto_init_wait(&wait);
+	tfm = crypto_alloc_acomp("zlib-deflate", 0, 0);
+	if (IS_ERR(tfm)) {
+		ret = PTR_ERR(tfm);
+		goto out;
+	}
+
+	req = acomp_request_alloc(tfm);
+	if (!req) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	acomp_request_set_params(req, in_sg.sgl, out_sg.sgl, len,
+				 nr_dst_pages << PAGE_SHIFT);
+	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+				   crypto_req_done, &wait);
+
+	ret = crypto_wait_req(crypto_acomp_compress(req), &wait);
+	if (ret)
+		goto out;
+
+	*total_in = len;
+	*total_out = req->dlen;
+	nr_pages = (*total_out + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+out:
+	sg_free_table(&in_sg);
+	sg_free_table(&out_sg);
+
+	if (in_pages) {
+		for (i = 0; i < nr_src_pages; i++)
+			put_page(in_pages[i]);
+		kfree(in_pages);
+	}
+
+	/* free un-used out pages */
+	for (i = nr_pages; i < nr_dst_pages; i++)
+		put_page(pages[i]);
+
+	if (req)
+		acomp_request_free(req);
+
+	if (tfm)
+		crypto_free_acomp(tfm);
+
+	*out_pages = nr_pages;
+
+	return ret;
+}
+
+static int acomp_zlib_decomp_bio(struct page **in_pages,
+				 struct compressed_bio *cb, size_t srclen,
+				 unsigned long total_pages_in)
+{
+	unsigned int nr_dst_pages = BTRFS_MAX_COMPRESSED_PAGES;
+	struct sg_table in_sg = { 0 }, out_sg = { 0 };
+	struct bio *orig_bio = &cb->orig_bbio->bio;
+	char *data_out = NULL, *bv_buf = NULL;
+	int copy_len = 0, bytes_left = 0;
+	struct crypto_acomp *tfm = NULL;
+	struct page **out_pages = NULL;
+	struct acomp_req *req = NULL;
+	struct crypto_wait wait;
+	struct bio_vec bvec;
+	int ret, i = 0;
+
+	ret = sg_alloc_table_from_pages(&in_sg, in_pages, total_pages_in,
+					0, srclen, GFP_KERNEL);
+	if (ret)
+		goto out;
+
+	out_pages = kcalloc(nr_dst_pages, sizeof(struct page *), GFP_KERNEL);
+	if (!out_pages) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	for (i = 0; i < nr_dst_pages; i++) {
+		out_pages[i] = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+		if (!out_pages[i]) {
+			ret = -ENOMEM;
+			goto out;
+		}
+	}
+
+	ret = sg_alloc_table_from_pages(&out_sg, out_pages, nr_dst_pages, 0,
+					nr_dst_pages << PAGE_SHIFT, GFP_KERNEL);
+	if (ret)
+		goto out;
+
+	crypto_init_wait(&wait);
+	tfm = crypto_alloc_acomp("zlib-deflate", 0, 0);
+	if (IS_ERR(tfm)) {
+		ret = PTR_ERR(tfm);
+		goto out;
+	}
+
+	req = acomp_request_alloc(tfm);
+	if (!req) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	acomp_request_set_params(req, in_sg.sgl, out_sg.sgl, srclen,
+				 nr_dst_pages << PAGE_SHIFT);
+	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+				   crypto_req_done, &wait);
+
+	ret = crypto_wait_req(crypto_acomp_decompress(req), &wait);
+	if (ret)
+		goto out;
+
+	/* Copy decompressed buffer to bio pages */
+	bytes_left = req->dlen;
+	for (i = 0; i < nr_dst_pages; i++) {
+		copy_len = bytes_left > PAGE_SIZE ? PAGE_SIZE : bytes_left;
+		data_out = kmap_local_page(out_pages[i]);
+
+		bvec = bio_iter_iovec(orig_bio, orig_bio->bi_iter);
+		bv_buf = kmap_local_page(bvec.bv_page);
+		memcpy(bv_buf, data_out, copy_len);
+		kunmap_local(bv_buf);
+
+		bio_advance(orig_bio, copy_len);
+		if (!orig_bio->bi_iter.bi_size)
+			break;
+		bytes_left -= copy_len;
+		if (bytes_left <= 0)
+			break;
+	}
+out:
+	sg_free_table(&in_sg);
+	sg_free_table(&out_sg);
+
+	if (out_pages) {
+		for (i = 0; i < nr_dst_pages; i++) {
+			if (out_pages[i])
+				put_page(out_pages[i]);
+		}
+		kfree(out_pages);
+	}
+
+	if (req)
+		acomp_request_free(req);
+	if (tfm)
+		crypto_free_acomp(tfm);
+
+	return ret;
+}
+
 struct list_head *zlib_get_workspace(unsigned int level)
 {
 	struct list_head *ws = btrfs_get_workspace(BTRFS_COMPRESS_ZLIB, level);
@@ -108,6 +305,15 @@  int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
 	unsigned long nr_dest_pages = *out_pages;
 	const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
 
+	if (crypto_has_acomp("zlib-deflate", 0, 0)) {
+		ret = acomp_comp_pages(mapping, start, len, pages, out_pages,
+				       total_in, total_out);
+		if (!ret)
+			return ret;
+
+		pr_warn("BTRFS: acomp compression failed: ret = %d\n", ret);
+		/* Fallback to SW implementation if HW compression failed */
+	}
 	*out_pages = 0;
 	*total_out = 0;
 	*total_in = 0;
@@ -281,6 +487,16 @@  int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 	unsigned long buf_start;
 	struct page **pages_in = cb->compressed_pages;
 
+	if (crypto_has_acomp("zlib-deflate", 0, 0)) {
+		ret = acomp_zlib_decomp_bio(pages_in, cb, srclen,
+					    total_pages_in);
+		if (!ret)
+			return ret;
+
+		pr_warn("BTRFS: acomp decompression failed, ret=%d\n", ret);
+		/* Fallback to SW implementation if HW decompression failed */
+	}
+
 	data_in = kmap_local_page(pages_in[page_in_index]);
 	workspace->strm.next_in = data_in;
 	workspace->strm.avail_in = min_t(size_t, srclen, PAGE_SIZE);