
[v7,4/9] crypto: acomp - add support for lzo via scomp

Message ID: 1473770981-9869-5-git-send-email-giovanni.cabiddu@intel.com (mailing list archive)
State: Changes Requested
Delegated to: Herbert Xu

Commit Message

Cabiddu, Giovanni Sept. 13, 2016, 12:49 p.m. UTC
Add scomp backend for lzo compression algorithm

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 crypto/Kconfig |    1 +
 crypto/lzo.c   |  146 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 134 insertions(+), 13 deletions(-)

Comments

Herbert Xu Sept. 20, 2016, 9:26 a.m. UTC | #1
On Tue, Sep 13, 2016 at 01:49:36PM +0100, Giovanni Cabiddu wrote:
>
> +	lzo_src_scratches = crypto_scomp_alloc_scratches(LZO_SCRATCH_SIZE);
> +	if (!lzo_src_scratches)
> +		return -ENOMEM;

Rather than duplicating the scratch buffer handling in every scomp
algorithm, let's centralize this and put it into scomp.c.

Thanks,
Cabiddu, Giovanni Sept. 20, 2016, 12:23 p.m. UTC | #2
Hi Herbert,

apologies for the duplicate. The previous email didn't get delivered to
the ML.

On Tue, Sep 20, 2016 at 05:26:18PM +0800, Herbert Xu wrote:
> Rather than duplicating the scratch buffer handling in every scomp
> algorithm, let's centralize this and put it into scomp.c.
Are you suggesting to hide the scratch buffers from the scomp
implementations and allocate them inside crypto_register_scomp?
Does that mean we have to go back to a scomp_alg with a flat buffer API
and linearize inside scomp?
If we take this direction, how do we support DMA from scomp
implementation? Scratch buffers are allocated using vmalloc.

Thanks,

--
Giovanni
--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu Sept. 22, 2016, 9:22 a.m. UTC | #3
On Tue, Sep 20, 2016 at 12:51:40PM +0100, Giovanni Cabiddu wrote:
> Hi Herbert,
> 
> On Tue, Sep 20, 2016 at 05:26:18PM +0800, Herbert Xu wrote:
> > Rather than duplicating the scratch buffer handling in every scomp
> > algorithm, let's centralize this and put it into scomp.c.
> Are you suggesting to hide the scratch buffers from the scomp implementations 
> and allocate them inside crypto_register_scomp?

I'm suggesting that we have just one set of buffers for all scomp
algorithms.  After all, a CPU can only process one request at a
time.
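
The shared-scratch scheme suggested here can be sketched in plain user-space C. The names (`NR_CPUS`, `scomp_alloc_scratches`, etc.) are illustrative stand-ins, not the kernel scomp API; `malloc` stands in for `vmalloc` and a fixed CPU count stands in for the kernel's:

```c
#include <stdlib.h>

#define NR_CPUS      4            /* stand-in for the kernel's CPU count */
#define SCRATCH_SIZE (128 * 1024) /* matches LZO_SCRATCH_SIZE in the patch */

/* One src/dst scratch pair per CPU, shared by every scomp algorithm.
 * Since a CPU processes one synchronous request at a time, per-CPU
 * buffers need no extra locking (the kernel disables preemption
 * around their use via get_cpu()/put_cpu()). */
static unsigned char *scomp_src_scratch[NR_CPUS];
static unsigned char *scomp_dst_scratch[NR_CPUS];

static void scomp_free_scratches(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		free(scomp_src_scratch[i]);
		free(scomp_dst_scratch[i]);
		scomp_src_scratch[i] = NULL;
		scomp_dst_scratch[i] = NULL;
	}
}

static int scomp_alloc_scratches(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		scomp_src_scratch[i] = malloc(SCRATCH_SIZE);
		scomp_dst_scratch[i] = malloc(SCRATCH_SIZE);
		if (!scomp_src_scratch[i] || !scomp_dst_scratch[i]) {
			scomp_free_scratches();
			return -1; /* -ENOMEM in kernel terms */
		}
	}
	return 0;
}
```

With this, a per-algorithm init routine no longer owns any scratch memory; registration and teardown of the shared buffers happens once, in scomp.c.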

> Does that mean we have to go back to a scomp_alg with a flat buffer API
> and linearize inside scomp?

Yes, scomp should just be flat.  A sync algorithm capable of producing
partial output should use the acomp interface.

> If we take this direction, how do we support DMA from scomp implementation?
> Scratch buffers are allocated using vmalloc.

Huh? If you're doing DMA then you'd be using the acomp interface;
how can you even get into scomp?

I think you may have misread my earlier message from June.  What
I'd like to see is for the acomp layer to allocate the output
memory, rather than have it provided by the user as is the case
with the current interface.  The user could provide a maximum to
prevent crazy cases consuming unlimited memory.

What will happen with scomp is that it will simply use a linear
vmalloc buffer to store the output before copying it into an SG
list of individual pages allocated on the spot.
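
The gather/process/scatter flow described here can be illustrated with a small user-space sketch; `struct frag` is a stand-in for `struct scatterlist`, and plain `memcpy` loops stand in for `scatterwalk_map_and_copy`:

```c
#include <string.h>

/* User-space stand-in for one entry of a struct scatterlist. */
struct frag {
	unsigned char *buf;
	size_t len;
};

/* Gather scattered input fragments into one flat buffer -- the role
 * the linear vmalloc scratch plays on the scomp side. */
static size_t gather(const struct frag *frags, int n, unsigned char *flat)
{
	size_t off = 0;

	for (int i = 0; i < n; i++) {
		memcpy(flat + off, frags[i].buf, frags[i].len);
		off += frags[i].len;
	}
	return off;
}

/* Scatter a flat result back out across output fragments -- the copy
 * of the linear output into the caller's SG list of pages. */
static void scatter(const unsigned char *flat, size_t len,
		    struct frag *frags, int n)
{
	size_t off = 0;

	for (int i = 0; i < n && off < len; i++) {
		size_t chunk = frags[i].len < len - off ?
			       frags[i].len : len - off;

		memcpy(frags[i].buf, flat + off, chunk);
		off += chunk;
	}
}
```

The compression call itself then only ever sees the two flat buffers, which is exactly what a flat scomp API requires.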

Cheers,
Cabiddu, Giovanni Sept. 22, 2016, 10:54 p.m. UTC | #4
On Thu, Sep 22, 2016 at 05:22:44PM +0800, Herbert Xu wrote:
> I'm suggesting that we have just one set of buffers for all scomp
> algorithms.  After all, a CPU can only process one request at a
> time.
Makes sense. Implemented in v8.

> Yes scomp should just be flat.  A sync algorithm capable of producing
> partial output should use the acomp interface.
I went back to the flat scomp interface in v6.
>
> I think you may have misread my earlier message from June.  What
> I'd like to see is for the acomp layer to allocate the output
> memory, rather than have it provided by the user as is the case
> with the current interface.  The user could provide a maximum to
> prevent crazy cases consuming unlimited memory.
Why would you prefer to have the output buffer allocated in the acomp
layer? Is there a use case for this? Who would free that memory?

We believe that the output buffer should be allocated by the user of the
API. A caller might decide to allocate memory upfront or point the
buffer list to pre-allocated buffers. This is the case in BTRFS, where
block buffers are already allocated before submission to the
compression API.

Thanks,

--
Giovanni
Herbert Xu Sept. 23, 2016, 3:05 p.m. UTC | #5
On Thu, Sep 22, 2016 at 11:54:25PM +0100, Giovanni Cabiddu wrote:
> > I think you may have misread my earlier message from June.  What
> > I'd like to see is for the acomp layer to allocate the output
> > memory, rather than have it provided by the user as is the case
> > with the current interface.  The user could provide a maximum to
> > prevent crazy cases consuming unlimited memory.
> Why would you prefer to have the output buffer allocated in the acomp
> layer? Is there a use case for this? Who would free that memory?

When I said acomp layer I'm referring specifically to the algorithm
or driver.  As to your last question it would be the caller's
responsibility to free that memory.

The use-case is our oldest user, IPcomp.  Most packets are 1500 bytes
max but we have to allocate 64K of memory to cover the worst case.
An algorithm that can deal with SG lists can instead allocate
pages of memory as it goes and place them in an SG list.

> We believe that the output buffer should be allocated by the user of the
> API. A caller might decide to allocate memory upfront or point the
> buffer list to pre-allocated buffers. This is the case in BTRFS, where
> block buffers are already allocated before submission to the
> compression API.

Sure if you already have memory allocated then we don't want to
force you to allocate it again in the algorithm/driver.  But our
interface should allow the memory to be allocated in the driver.

Cheers,
Cabiddu, Giovanni Sept. 26, 2016, 5:27 p.m. UTC | #6
On Fri, Sep 23, 2016 at 11:05:18PM +0800, Herbert Xu wrote:
> When I said acomp layer I'm referring specifically to the algorithm
> or driver.  As to your last question it would be the caller's
> responsibility to free that memory.
>
> The use-case is our oldest user, IPcomp.  Most packets are 1500 bytes
> max but we have to allocate 64K of memory to cover the worst case.
> For an algorithm that can deal with SG lists it can easily allocate
> pages of memory as it goes and place them in an SG list.
It is clear now. Thanks.

> Sure if you already have memory allocated then we don't want to
> force you to allocate it again in the algorithm/driver.  But our
> interface should allow the memory to be allocated in the driver.
I think the definition of the acomp interface already allows for this.
If the destination scatterlist inside the request is NULL, the
algorithm/driver can allocate pages of memory for the output buffers as
well as the scatterlist. In this case, the destination length, if not zero,
could be used to specify the maximum size to allocate.
What do you think?
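
The proposed convention can be sketched as a small helper. This is a hypothetical user-space illustration, not the kernel acomp API: `acomp_get_dst` and `DEFAULT_MAX_OUT` are made-up names, and `malloc` stands in for page allocation by the driver:

```c
#include <stdlib.h>

/* Illustrative cap for the "no maximum given" case, mirroring the 64K
 * worst case mentioned for IPcomp. */
#define DEFAULT_MAX_OUT (64 * 1024)

/* Convention under discussion: a NULL destination asks the
 * algorithm/driver to allocate the output itself, bounded by *dlen
 * when it is non-zero.  The caller remains responsible for freeing
 * any buffer allocated here. */
static unsigned char *acomp_get_dst(unsigned char *dst, size_t *dlen,
				    int *caller_owned)
{
	if (dst) {			/* caller supplied the buffer */
		*caller_owned = 1;
		return dst;
	}

	*caller_owned = 0;
	if (*dlen == 0)			/* no cap given: fall back to a default */
		*dlen = DEFAULT_MAX_OUT;
	return malloc(*dlen);
}
```

A real SG-capable driver would allocate pages incrementally instead of one block, but the contract at the interface boundary is the same: NULL dst means "allocate for me", dlen caps the allocation.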

--
Giovanni
Herbert Xu Sept. 27, 2016, 3:07 a.m. UTC | #7
On Mon, Sep 26, 2016 at 06:27:04PM +0100, Giovanni Cabiddu wrote:
>
> I think the definition of the acomp interface already allows for this.
> If the destination scatterlist inside the request is NULL, the
> algorithm/driver can allocate pages of memory for the output buffers as
> well as the scatterlist. In this case, the destination length, if not zero,
> could be used to specify the maximum size to allocate.
> What do you think?

Sounds good to me.  We should also add a function to free the
resulting SG list along with the allocated pages.

Cheers,

Patch

diff --git a/crypto/Kconfig b/crypto/Kconfig
index f553f66..d275591 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1589,6 +1589,7 @@  config CRYPTO_DEFLATE
 config CRYPTO_LZO
 	tristate "LZO compression algorithm"
 	select CRYPTO_ALGAPI
+	select CRYPTO_ACOMP2
 	select LZO_COMPRESS
 	select LZO_DECOMPRESS
 	help
diff --git a/crypto/lzo.c b/crypto/lzo.c
index c3f3dd9..6faed95 100644
--- a/crypto/lzo.c
+++ b/crypto/lzo.c
@@ -22,40 +22,61 @@ 
 #include <linux/vmalloc.h>
 #include <linux/mm.h>
 #include <linux/lzo.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/scompress.h>
+
+#define LZO_SCRATCH_SIZE	131072
 
 struct lzo_ctx {
 	void *lzo_comp_mem;
 };
 
+static void * __percpu *lzo_src_scratches;
+static void * __percpu *lzo_dst_scratches;
+
+static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
+{
+	void *ctx;
+
+	ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
+	if (!ctx)
+		ctx = vmalloc(LZO1X_MEM_COMPRESS);
+	if (!ctx)
+		return ERR_PTR(-ENOMEM);
+
+	return ctx;
+}
+
 static int lzo_init(struct crypto_tfm *tfm)
 {
 	struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
 
-	ctx->lzo_comp_mem = kmalloc(LZO1X_MEM_COMPRESS,
-				    GFP_KERNEL | __GFP_NOWARN);
-	if (!ctx->lzo_comp_mem)
-		ctx->lzo_comp_mem = vmalloc(LZO1X_MEM_COMPRESS);
-	if (!ctx->lzo_comp_mem)
+	ctx->lzo_comp_mem = lzo_alloc_ctx(NULL);
+	if (IS_ERR(ctx->lzo_comp_mem))
 		return -ENOMEM;
 
 	return 0;
 }
 
+static void lzo_free_ctx(struct crypto_scomp *tfm, void *ctx)
+{
+	kvfree(ctx);
+}
+
 static void lzo_exit(struct crypto_tfm *tfm)
 {
 	struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
 
-	kvfree(ctx->lzo_comp_mem);
+	lzo_free_ctx(NULL, ctx->lzo_comp_mem);
 }
 
-static int lzo_compress(struct crypto_tfm *tfm, const u8 *src,
-			    unsigned int slen, u8 *dst, unsigned int *dlen)
+static int __lzo_compress(const u8 *src, unsigned int slen,
+			  u8 *dst, unsigned int *dlen, void *ctx)
 {
-	struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
 	size_t tmp_len = *dlen; /* size_t(ulong) <-> uint on 64 bit */
 	int err;
 
-	err = lzo1x_1_compress(src, slen, dst, &tmp_len, ctx->lzo_comp_mem);
+	err = lzo1x_1_compress(src, slen, dst, &tmp_len, ctx);
 
 	if (err != LZO_E_OK)
 		return -EINVAL;
@@ -64,8 +85,16 @@  static int lzo_compress(struct crypto_tfm *tfm, const u8 *src,
 	return 0;
 }
 
-static int lzo_decompress(struct crypto_tfm *tfm, const u8 *src,
-			      unsigned int slen, u8 *dst, unsigned int *dlen)
+static int lzo_compress(struct crypto_tfm *tfm, const u8 *src,
+			unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+	struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	return __lzo_compress(src, slen, dst, dlen, ctx->lzo_comp_mem);
+}
+
+static int __lzo_decompress(const u8 *src, unsigned int slen,
+			    u8 *dst, unsigned int *dlen)
 {
 	int err;
 	size_t tmp_len = *dlen; /* size_t(ulong) <-> uint on 64 bit */
@@ -77,7 +106,56 @@  static int lzo_decompress(struct crypto_tfm *tfm, const u8 *src,
 
 	*dlen = tmp_len;
 	return 0;
+}
 
+static int lzo_decompress(struct crypto_tfm *tfm, const u8 *src,
+			  unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+	return __lzo_decompress(src, slen, dst, dlen);
+}
+
+static int lzo_scomp_comp_decomp(struct crypto_scomp *tfm,
+				 struct scatterlist *src, unsigned int slen,
+				 struct scatterlist *dst, unsigned int *dlen,
+				 void *ctx, int dir)
+{
+	const int cpu = get_cpu();
+	u8 *scratch_src = *per_cpu_ptr(lzo_src_scratches, cpu);
+	u8 *scratch_dst = *per_cpu_ptr(lzo_dst_scratches, cpu);
+	int ret;
+
+	if (slen > LZO_SCRATCH_SIZE || *dlen > LZO_SCRATCH_SIZE) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	scatterwalk_map_and_copy(scratch_src, src, 0, slen, 0);
+	if (dir)
+		ret = __lzo_compress(scratch_src, slen, scratch_dst, dlen, ctx);
+	else
+		ret = __lzo_decompress(scratch_src, slen, scratch_dst, dlen);
+	if (!ret)
+		scatterwalk_map_and_copy(scratch_dst, dst, 0, *dlen, 1);
+
+out:
+	put_cpu();
+	return ret;
+}
+
+static int lzo_scomp_compress(struct crypto_scomp *tfm,
+			      struct scatterlist *src, unsigned int slen,
+			      struct scatterlist *dst, unsigned int *dlen,
+			      void *ctx)
+{
+	return lzo_scomp_comp_decomp(tfm, src, slen, dst, dlen, ctx, 1);
+}
+
+static int lzo_scomp_decompress(struct crypto_scomp *tfm,
+				struct scatterlist *src, unsigned int slen,
+				struct scatterlist *dst, unsigned int *dlen,
+				void *ctx)
+{
+	return lzo_scomp_comp_decomp(tfm, src, slen, dst, dlen, ctx, 0);
 }
 
 static struct crypto_alg alg = {
@@ -92,14 +170,56 @@  static struct crypto_alg alg = {
 	.coa_decompress  	= lzo_decompress } }
 };
 
+static struct scomp_alg scomp = {
+	.alloc_ctx		= lzo_alloc_ctx,
+	.free_ctx		= lzo_free_ctx,
+	.compress		= lzo_scomp_compress,
+	.decompress		= lzo_scomp_decompress,
+	.base			= {
+		.cra_name	= "lzo",
+		.cra_driver_name = "lzo-scomp",
+		.cra_module	 = THIS_MODULE,
+	}
+};
+
 static int __init lzo_mod_init(void)
 {
-	return crypto_register_alg(&alg);
+	int ret;
+
+	lzo_src_scratches = crypto_scomp_alloc_scratches(LZO_SCRATCH_SIZE);
+	if (!lzo_src_scratches)
+		return -ENOMEM;
+
+	lzo_dst_scratches = crypto_scomp_alloc_scratches(LZO_SCRATCH_SIZE);
+	if (!lzo_dst_scratches) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	ret = crypto_register_alg(&alg);
+	if (ret)
+		goto error;
+
+	ret = crypto_register_scomp(&scomp);
+	if (ret) {
+		crypto_unregister_alg(&alg);
+		goto error;
+	}
+
+	return ret;
+
+error:
+	crypto_scomp_free_scratches(lzo_src_scratches);
+	crypto_scomp_free_scratches(lzo_dst_scratches);
+	return ret;
 }
 
 static void __exit lzo_mod_fini(void)
 {
 	crypto_unregister_alg(&alg);
+	crypto_unregister_scomp(&scomp);
+	crypto_scomp_free_scratches(lzo_src_scratches);
+	crypto_scomp_free_scratches(lzo_dst_scratches);
 }
 
 module_init(lzo_mod_init);