
[v5,1/2] crypto: skcipher AF_ALG - overhaul memory management

Message ID 16908830.B9gcyeAhPX@positron.chronox.de (mailing list archive)
State Changes Requested
Delegated to: Herbert Xu

Commit Message

Stephan Mueller Feb. 17, 2017, 10:31 p.m. UTC
The updated memory management is described in the top part of the code.
As one benefit of the changed memory management, the AIO and synchronous
operations are now implemented in one common function. The AF_ALG
operation uses the asynchronous kernel crypto API interface for each
cipher operation. Thus, the only difference between the AIO and sync
operation types visible from user space is:

1. the callback function to be invoked when the asynchronous operation
   is completed

2. whether to wait for the completion of the kernel crypto API operation
   or not

In addition, the code structure is adjusted to match the structure of
algif_aead for easier code assessment.

The user space interface changed slightly as follows: the old AIO
operation returned zero upon success and < 0 in case of an error to user
space. As all other AF_ALG interfaces (including the sync skcipher
interface) returned the number of processed bytes upon success and < 0
in case of an error, the new skcipher interface (regardless of AIO or
sync) returns the number of processed bytes in case of success.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 crypto/algif_skcipher.c | 540 ++++++++++++++++++++++--------------------------
 1 file changed, 251 insertions(+), 289 deletions(-)

Comments

Herbert Xu March 16, 2017, 8:39 a.m. UTC | #1
On Fri, Feb 17, 2017 at 11:31:41PM +0100, Stephan Müller wrote:
>
> +	} else {
> +		/* Synchronous operation */
> +		skcipher_request_set_callback(&areq->req,
> +					      CRYPTO_TFM_REQ_MAY_SLEEP |
> +					      CRYPTO_TFM_REQ_MAY_BACKLOG,
> +					      af_alg_complete,
> +					      &ctx->completion);
> +		err = af_alg_wait_for_completion(ctx->enc ?
> +					crypto_skcipher_encrypt(&areq->req) :
> +					crypto_skcipher_decrypt(&areq->req),
> +						 &ctx->completion);
> +	}

This is now outside of the loop for the sync case.  The purpose
of the loop in the sync case was to segment the data when we get
a very large SG list that does not fit inside a single call.

Or did I miss something?

Overall I like this patch.

Thanks,
Stephan Mueller March 16, 2017, 8:55 a.m. UTC | #2
On Thursday, 16 March 2017, 09:39:23 CET, Herbert Xu wrote:

Hi Herbert,

> On Fri, Feb 17, 2017 at 11:31:41PM +0100, Stephan Müller wrote:
> > +	} else {
> > +		/* Synchronous operation */
> > +		skcipher_request_set_callback(&areq->req,
> > +					      CRYPTO_TFM_REQ_MAY_SLEEP |
> > +					      CRYPTO_TFM_REQ_MAY_BACKLOG,
> > +					      af_alg_complete,
> > +					      &ctx->completion);
> > +		err = af_alg_wait_for_completion(ctx->enc ?
> > +					crypto_skcipher_encrypt(&areq->req) :
> > +					crypto_skcipher_decrypt(&areq->req),
> > +						 &ctx->completion);
> > +	}
> 
> This is now outside of the loop for the sync case.  The purpose
> of the loop in the sync case was to segment the data when we get
> a very large SG list that does not fit inside a single call.
> 
> Or did I miss something?

The while loop in skcipher_recvmsg_sync in the current upstream kernel
operates on ctx->rsgl. That data structure is defined as:

struct af_alg_sgl {
        struct scatterlist sg[ALG_MAX_PAGES + 1];
        struct page *pages[ALG_MAX_PAGES];
        unsigned int npages;
};

I.e. recvmsg operates on at most 16 SGs, each of which can hold one page of
data. Thus, if a user provides more data than fits into 16 pages, such a
while loop makes sense.

The patch proposed here is not limited to a fixed set of 16 SGs in the RX SGL.
The recvmsg code path allocates the required amount of SGL space:

        /* convert iovecs of output buffers into RX SGL */
        while (len < ctx->used && iov_iter_count(&msg->msg_iter)) {
                struct skcipher_rsgl *rsgl;
...
                        rsgl = sock_kmalloc(sk, sizeof(*rsgl), GFP_KERNEL);
...

struct skcipher_rsgl uses the af_alg_sgl data structure with its 16 SG
entries. But now you can have an arbitrary number of af_alg_sgl instances, up
to the memory limit enforced by sock_kmalloc. I.e. recvmsg can now operate on
multiple af_alg_sgl instances in one go.

With this approach I thought that the while loop could be a thing of the past, 
considering that this is also the approach taken in skcipher_recvmsg_async 
that is present in the current upstream code base.

> 
> Overall I like this patch.

Thank you very much.
> 
> Thanks,


Ciao
Stephan
Herbert Xu March 16, 2017, 9:08 a.m. UTC | #3
On Thu, Mar 16, 2017 at 09:55:17AM +0100, Stephan Müller wrote:
>
> With this approach I thought that the while loop could be a thing of the past, 
> considering that this is also the approach taken in skcipher_recvmsg_async 
> that is present in the current upstream code base.

The reason there is a limit is so that user-space doesn't pin down
unlimited amounts of memory.  How is this addressed under your
scheme?

Cheers,
Stephan Mueller March 16, 2017, 9:23 a.m. UTC | #4
On Thursday, 16 March 2017, 10:08:23 CET, Herbert Xu wrote:

Hi Herbert,

> On Thu, Mar 16, 2017 at 09:55:17AM +0100, Stephan Müller wrote:
> > With this approach I thought that the while loop could be a thing of the
> > past, considering that this is also the approach taken in
> > skcipher_recvmsg_async that is present in the current upstream code base.
> 
> The reason there is a limit is so that user-space doesn't pin down
> unlimited amounts of memory.  How is this addressed under your
> scheme?

sock_kmalloc limits the number of SG tables that we can manage. On my system, 
sock_kmalloc has 20480 bytes at its disposal as the limiting factor.

As this concept of limiting the impact of user space on kernel memory is
already used in the current upstream skcipher_recvmsg_async, I simply reused
that approach:

        while (iov_iter_count(&msg->msg_iter)) {
                struct skcipher_async_rsgl *rsgl;
...
                        rsgl = kmalloc(sizeof(*rsgl), GFP_KERNEL);
...

Note that I use sock_kmalloc instead of the currently used kmalloc to enforce
this limitation.

Also, please note that this approach is identical to the code currently used
in algif_aead:aead_recvmsg_sync. That implementation led me to believe that
my new code would be appropriate in this case, too.

Ciao
Stephan
Herbert Xu March 16, 2017, 9:52 a.m. UTC | #5
On Thu, Mar 16, 2017 at 10:23:49AM +0100, Stephan Müller wrote:
> On Thursday, 16 March 2017, 10:08:23 CET, Herbert Xu wrote:
> 
> Hi Herbert,
> 
> > On Thu, Mar 16, 2017 at 09:55:17AM +0100, Stephan Müller wrote:
> > > With this approach I thought that the while loop could be a thing of the
> > > past, considering that this is also the approach taken in
> > > skcipher_recvmsg_async that is present in the current upstream code base.
> > 
> > The reason there is a limit is so that user-space doesn't pin down
> > unlimited amounts of memory.  How is this addressed under your
> > scheme?
> 
> sock_kmalloc limits the number of SG tables that we can manage. On my system, 
> sock_kmalloc has 20480 bytes at its disposal as the limiting factor.
> 
> As this concept for limiting the impact of user space on kernel memory is used 
> in the current upstream skcipher_recvmsg_async, I simply re-used that 
> approach:
> 
>         while (iov_iter_count(&msg->msg_iter)) {
>                 struct skcipher_async_rsgl *rsgl;
> ...
>                         rsgl = kmalloc(sizeof(*rsgl), GFP_KERNEL);
> ...
> 
> Note I use sock_kmalloc instead of the currently existing kmalloc to ensure 
> such limitation.

First of all you're only limiting the amount of memory occupied
by the SG list which is not the same thing as the memory pinned
down by the actual recvmsg.

More importantly, with the current code, a very large recvmsg
would still work by doing it piecemeal.  With your patch, won't
it fail because sock_kmalloc could fail to allocate memory for
the whole thing?

Cheers,
Stephan Mueller March 16, 2017, 10:18 a.m. UTC | #6
On Thursday, 16 March 2017, 10:52:48 CET, Herbert Xu wrote:

Hi Herbert,

> First of all you're only limiting the amount of memory occupied
> by the SG list which is not the same thing as the memory pinned
> down by the actual recvmsg.

I am fully aware of that. As this was present in the code, I thought I could 
reuse that approach.

Are you saying that you want to stop this approach? 
> 
> More importantly, with the current code, a very large recvmsg
> would still work by doing it piecemeal.

This piecemeal processing is only done in the sync case. The current AIO code
path does not achieve such piecemeal processing.

> With your patch, won't
> it fail because sock_kmalloc could fail to allocate memory for
> the whole thing?

In this case, yes. To handle AIO and sync in a common code path (and thus make 
both behave identically), this would be the result.

If very large recvmsg requests are to be handled, the socket option could be
increased, or the caller could split the request into pieces.

Otherwise, if we keep the piecemeal handling as it is done now, I see the
following drawbacks:

- no common sync/AIO code and no common behavior

- no common approach between skcipher and other implementations (algif_aead
and the algif_akcipher that I have in the pipeline, which looks identical to
the proposed patch set)

Ciao
Stephan
Stephan Mueller March 16, 2017, 12:20 p.m. UTC | #7
On Thursday, 16 March 2017, 11:18:33 CET, Stephan Müller wrote:

Hi Stephan,

> On Thursday, 16 March 2017, 10:52:48 CET, Herbert Xu wrote:
> 
> Hi Herbert,
> 
> > First of all you're only limiting the amount of memory occupied
> > by the SG list which is not the same thing as the memory pinned
> > down by the actual recvmsg.
> 
> I am fully aware of that. As this was present in the code, I thought I could
> reuse that approach.
> 
> Are you saying that you want to stop this approach?

I would like to bring another point to the table for this discussion, relating
both to the maximum amount of memory to be used and to the size of the RX SGL:
please let us consider skcipher_sendmsg (sendpage works conceptually
similarly, so it should be covered by this discussion as well).

skcipher_alloc_sgl, used in the sendmsg code path, is conceptually identical
to the RX SGL allocation that I propose here: it allocates a new SGL with
MAX_SGL_ENTS entries for as long as the caller sends data. It limits the
amount of memory to be allocated based on the maximum size of the SG
management data, not on the actual plaintext/ciphertext data sent by user
space. I was therefore again under the impression that my suggested approach
to memory allocation in the recvmsg code path would be acceptable.

Regarding the amount of data processed with the RX SGL, I would like to
consider the size of the TX SGL. In the skcipher case, the RX SGL (or the
anticipated amount of RX data to be processed) does not need to be larger than
the TX SGL. Hence, the existing while() loop of the upstream code that we
discuss here is only executed as often as needed to consume the available
TX SGL data. Given that both are subject to the sock_kmalloc memory limit, I
would infer that using a conceptually identical allocation approach for the
TX SGL and the RX SGL is a sufficient replacement for the while loop without
being visible to user space (i.e. without imposing a limit that was not there
before).

Ciao
Stephan
Stephan Mueller March 19, 2017, 4:34 a.m. UTC | #8
On Thursday, 16 March 2017, 10:52:48 CET, Herbert Xu wrote:

Hi Herbert,
> 
> First of all you're only limiting the amount of memory occupied
> by the SG list which is not the same thing as the memory pinned
> down by the actual recvmsg.

Considering af_alg_make_sg, the function iov_iter_get_pages is used to obtain
the user space pages for recvmsg. Looking closely at that function, I
understand that the user space pages are pinned in memory and made accessible
from kernel space. That said, I would infer from the code that no kernel-local
memory is allocated for the actual data. Similar to zero-copy, pages are
shared between kernel and user space and will hold the result of the crypto
operation (ciphertext for encryption or plaintext for decryption).

Therefore, I would infer that the memory usage of the discussed code is
limited to the skcipher_rsgl metadata, which is bounded by sock_kmalloc. This
finally implies that the kernel will not occupy more memory than allocated by
sock_kmalloc.

Or am I missing something?

Thanks
Stephan
Stephan Mueller March 30, 2017, 1:59 p.m. UTC | #9
On Thursday, 16 March 2017, 10:52:48 CEST, Herbert Xu wrote:

Hi Herbert,

> More importantly, with the current code, a very large recvmsg
> would still work by doing it piecemeal.  With your patch, won't
> it fail because sock_kmalloc could fail to allocate memory for
> the whole thing?

For testing purposes, I wrote the app that is attached. It simply encrypts a
given file with ctr(aes).

With the current implementation of algif_skcipher, where I directly pipe in
the provided memory block (i.e. using the stream operation of libkcapi):

dd if=/dev/zero of=testfile bs=1024 count=1000000
./kcapi-long testfile testfile.out
encryption failed with error -14

==> The -EFAULT happens at sendmsg() -- i.e. you cannot provide as much data 
as you want.

==> With the proposed update, I see the same error.

When I use vmsplice, the current implementation somehow simply waits (I do not 
yet know what happens). My new implementation simply returns the amount of
data that could be spliced (64k), which would allow user space to implement a
loop to send chunks.


When I ask libkcapi to send the data in chunks, libkcapi implements the loop 
that sends/receives the data. In this case, both implementations of 
algif_skcipher.c work to encrypt the whole file regardless of its size.

Thus, I would conclude that the outer loop in recvmsg of the current
algif_skcipher is not really helpful, as the bottleneck is on the sendmsg side.


With this, I would conclude that the new implementation of algif_skcipher.c 
proposed in this patch set has the same behavior as the old one.

Ciao
Stephan
Herbert Xu March 31, 2017, 10:33 a.m. UTC | #10
On Thu, Mar 16, 2017 at 11:18:33AM +0100, Stephan Müller wrote:
> On Thursday, 16 March 2017, 10:52:48 CET, Herbert Xu wrote:
> 
> Hi Herbert,
> 
> > First of all you're only limiting the amount of memory occupied
> > by the SG list which is not the same thing as the memory pinned
> > down by the actual recvmsg.
> 
> I am fully aware of that. As this was present in the code, I thought I could 
> reuse that approach.
> 
> Are you saying that you want to stop this approach? 

No you're confusing things.  Previously there was an explicit
limit on the number of pages that can be pinned.  Now you're
only indirectly limiting it by limiting the size of the metadata
through sock_kmalloc.

The end result is that you're now allowing a huge amount of user
memory to be pinned down by the system call.  This is *unacceptable*.

Cheers,

Patch

diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index a9e79d8..d3e0226 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -10,6 +10,21 @@ 
  * Software Foundation; either version 2 of the License, or (at your option)
  * any later version.
  *
+ * The following concept of the memory management is used:
+ *
+ * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
+ * filled by user space with the data submitted via sendpage/sendmsg. Filling
+ * up the TX SGL does not cause a crypto operation -- the data will only be
+ * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
+ * provide a buffer which is tracked with the RX SGL.
+ *
+ * During the processing of the recvmsg operation, the cipher request is
+ * allocated and prepared. As part of the recvmsg operation, the processed
+ * TX buffers are extracted from the TX SGL into a separate SGL.
+ *
+ * After the completion of the crypto operation, the RX SGL and the cipher
+ * request is released. The extracted TX SGL parts are released together with
+ * the RX SGL release.
  */
 
 #include <crypto/scatterwalk.h>
@@ -23,86 +38,58 @@ 
 #include <linux/net.h>
 #include <net/sock.h>
 
-struct skcipher_sg_list {
+struct skcipher_tsgl {
 	struct list_head list;
-
 	int cur;
-
 	struct scatterlist sg[0];
 };
 
+struct skcipher_rsgl {
+	struct af_alg_sgl sgl;
+	struct list_head list;
+};
+
+struct skcipher_async_req {
+	struct kiocb *iocb;
+	struct sock *sk;
+
+	struct skcipher_rsgl first_sgl;
+	struct list_head rsgl_list;
+
+	struct scatterlist *tsgl;
+	unsigned int tsgl_entries;
+
+	unsigned int areqlen;
+	struct skcipher_request req;
+};
+
 struct skcipher_tfm {
 	struct crypto_skcipher *skcipher;
 	bool has_key;
 };
 
 struct skcipher_ctx {
-	struct list_head tsgl;
-	struct af_alg_sgl rsgl;
+	struct list_head tsgl_list;
 
 	void *iv;
 
 	struct af_alg_completion completion;
 
-	atomic_t inflight;
+	unsigned int inflight;
 	size_t used;
 
-	unsigned int len;
 	bool more;
 	bool merge;
 	bool enc;
 
-	struct skcipher_request req;
-};
-
-struct skcipher_async_rsgl {
-	struct af_alg_sgl sgl;
-	struct list_head list;
+	unsigned int len;
 };
 
-struct skcipher_async_req {
-	struct kiocb *iocb;
-	struct skcipher_async_rsgl first_sgl;
-	struct list_head list;
-	struct scatterlist *tsg;
-	atomic_t *inflight;
-	struct skcipher_request req;
-};
+static DECLARE_WAIT_QUEUE_HEAD(skcipher_aio_finish_wait);
 
-#define MAX_SGL_ENTS ((4096 - sizeof(struct skcipher_sg_list)) / \
+#define MAX_SGL_ENTS ((4096 - sizeof(struct skcipher_tsgl)) / \
 		      sizeof(struct scatterlist) - 1)
 
-static void skcipher_free_async_sgls(struct skcipher_async_req *sreq)
-{
-	struct skcipher_async_rsgl *rsgl, *tmp;
-	struct scatterlist *sgl;
-	struct scatterlist *sg;
-	int i, n;
-
-	list_for_each_entry_safe(rsgl, tmp, &sreq->list, list) {
-		af_alg_free_sg(&rsgl->sgl);
-		if (rsgl != &sreq->first_sgl)
-			kfree(rsgl);
-	}
-	sgl = sreq->tsg;
-	n = sg_nents(sgl);
-	for_each_sg(sgl, sg, n, i)
-		put_page(sg_page(sg));
-
-	kfree(sreq->tsg);
-}
-
-static void skcipher_async_cb(struct crypto_async_request *req, int err)
-{
-	struct skcipher_async_req *sreq = req->data;
-	struct kiocb *iocb = sreq->iocb;
-
-	atomic_dec(sreq->inflight);
-	skcipher_free_async_sgls(sreq);
-	kzfree(sreq);
-	iocb->ki_complete(iocb, err, err);
-}
-
 static inline int skcipher_sndbuf(struct sock *sk)
 {
 	struct alg_sock *ask = alg_sk(sk);
@@ -117,15 +104,15 @@  static inline bool skcipher_writable(struct sock *sk)
 	return PAGE_SIZE <= skcipher_sndbuf(sk);
 }
 
-static int skcipher_alloc_sgl(struct sock *sk)
+static int skcipher_alloc_tsgl(struct sock *sk)
 {
 	struct alg_sock *ask = alg_sk(sk);
 	struct skcipher_ctx *ctx = ask->private;
-	struct skcipher_sg_list *sgl;
+	struct skcipher_tsgl *sgl;
 	struct scatterlist *sg = NULL;
 
-	sgl = list_entry(ctx->tsgl.prev, struct skcipher_sg_list, list);
-	if (!list_empty(&ctx->tsgl))
+	sgl = list_entry(ctx->tsgl_list.prev, struct skcipher_tsgl, list);
+	if (!list_empty(&ctx->tsgl_list))
 		sg = sgl->sg;
 
 	if (!sg || sgl->cur >= MAX_SGL_ENTS) {
@@ -141,31 +128,66 @@  static int skcipher_alloc_sgl(struct sock *sk)
 		if (sg)
 			sg_chain(sg, MAX_SGL_ENTS + 1, sgl->sg);
 
-		list_add_tail(&sgl->list, &ctx->tsgl);
+		list_add_tail(&sgl->list, &ctx->tsgl_list);
 	}
 
 	return 0;
 }
 
-static void skcipher_pull_sgl(struct sock *sk, size_t used, int put)
+static unsigned int skcipher_count_tsgl(struct sock *sk, size_t bytes)
+{
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+	struct skcipher_tsgl *sgl, *tmp;
+	unsigned int i;
+	unsigned int sgl_count = 0;
+
+	if (!bytes)
+		return 0;
+
+	list_for_each_entry_safe(sgl, tmp, &ctx->tsgl_list, list) {
+		struct scatterlist *sg = sgl->sg;
+
+		for (i = 0; i < sgl->cur; i++) {
+			sgl_count++;
+			if (sg[i].length >= bytes)
+				return sgl_count;
+
+			bytes -= sg[i].length;
+		}
+	}
+
+	return sgl_count;
+}
+
+static void skcipher_pull_tsgl(struct sock *sk, size_t used,
+			       struct scatterlist *dst)
 {
 	struct alg_sock *ask = alg_sk(sk);
 	struct skcipher_ctx *ctx = ask->private;
-	struct skcipher_sg_list *sgl;
+	struct skcipher_tsgl *sgl;
 	struct scatterlist *sg;
-	int i;
+	unsigned int i;
 
-	while (!list_empty(&ctx->tsgl)) {
-		sgl = list_first_entry(&ctx->tsgl, struct skcipher_sg_list,
+	while (!list_empty(&ctx->tsgl_list)) {
+		sgl = list_first_entry(&ctx->tsgl_list, struct skcipher_tsgl,
 				       list);
 		sg = sgl->sg;
 
 		for (i = 0; i < sgl->cur; i++) {
 			size_t plen = min_t(size_t, used, sg[i].length);
+			struct page *page = sg_page(sg + i);
 
-			if (!sg_page(sg + i))
+			if (!page)
 				continue;
 
+			/*
+			 * Assumption: caller created skcipher_count_tsgl(len)
+			 * SG entries in dst.
+			 */
+			if (dst)
+				sg_set_page(dst + i, page, plen, sg[i].offset);
+
 			sg[i].length -= plen;
 			sg[i].offset += plen;
 
@@ -174,27 +196,45 @@  static void skcipher_pull_sgl(struct sock *sk, size_t used, int put)
 
 			if (sg[i].length)
 				return;
-			if (put)
-				put_page(sg_page(sg + i));
+
+			if (!dst)
+				put_page(page);
 			sg_assign_page(sg + i, NULL);
 		}
 
 		list_del(&sgl->list);
-		sock_kfree_s(sk, sgl,
-			     sizeof(*sgl) + sizeof(sgl->sg[0]) *
-					    (MAX_SGL_ENTS + 1));
+		sock_kfree_s(sk, sgl, sizeof(*sgl) + sizeof(sgl->sg[0]) *
+						     (MAX_SGL_ENTS + 1));
 	}
 
 	if (!ctx->used)
 		ctx->merge = 0;
 }
 
-static void skcipher_free_sgl(struct sock *sk)
+static void skcipher_free_areq_sgls(struct skcipher_async_req *areq)
 {
-	struct alg_sock *ask = alg_sk(sk);
-	struct skcipher_ctx *ctx = ask->private;
+	struct sock *sk = areq->sk;
+	struct skcipher_rsgl *rsgl, *tmp;
+	struct scatterlist *tsgl;
+	struct scatterlist *sg;
+	unsigned int i;
+
+	list_for_each_entry_safe(rsgl, tmp, &areq->rsgl_list, list) {
+		af_alg_free_sg(&rsgl->sgl);
+		list_del(&rsgl->list);
+		if (rsgl != &areq->first_sgl)
+			sock_kfree_s(sk, rsgl, sizeof(*rsgl));
+	}
+
+	tsgl = areq->tsgl;
+	for_each_sg(tsgl, sg, areq->tsgl_entries, i) {
+		if (!sg_page(sg))
+			continue;
+		put_page(sg_page(sg));
+	}
 
-	skcipher_pull_sgl(sk, ctx->used, 1);
+	if (areq->tsgl && areq->tsgl_entries)
+		sock_kfree_s(sk, tsgl, areq->tsgl_entries * sizeof(*tsgl));
 }
 
 static int skcipher_wait_for_wmem(struct sock *sk, unsigned flags)
@@ -301,7 +341,7 @@  static int skcipher_sendmsg(struct socket *sock, struct msghdr *msg,
 	struct skcipher_tfm *skc = pask->private;
 	struct crypto_skcipher *tfm = skc->skcipher;
 	unsigned ivsize = crypto_skcipher_ivsize(tfm);
-	struct skcipher_sg_list *sgl;
+	struct skcipher_tsgl *sgl;
 	struct af_alg_control con = {};
 	long copied = 0;
 	bool enc = 0;
@@ -348,8 +388,8 @@  static int skcipher_sendmsg(struct socket *sock, struct msghdr *msg,
 		size_t plen;
 
 		if (ctx->merge) {
-			sgl = list_entry(ctx->tsgl.prev,
-					 struct skcipher_sg_list, list);
+			sgl = list_entry(ctx->tsgl_list.prev,
+					 struct skcipher_tsgl, list);
 			sg = sgl->sg + sgl->cur - 1;
 			len = min_t(unsigned long, len,
 				    PAGE_SIZE - sg->offset - sg->length);
@@ -378,11 +418,12 @@  static int skcipher_sendmsg(struct socket *sock, struct msghdr *msg,
 
 		len = min_t(unsigned long, len, skcipher_sndbuf(sk));
 
-		err = skcipher_alloc_sgl(sk);
+		err = skcipher_alloc_tsgl(sk);
 		if (err)
 			goto unlock;
 
-		sgl = list_entry(ctx->tsgl.prev, struct skcipher_sg_list, list);
+		sgl = list_entry(ctx->tsgl_list.prev, struct skcipher_tsgl,
+				 list);
 		sg = sgl->sg;
 		if (sgl->cur)
 			sg_unmark_end(sg + sgl->cur - 1);
@@ -434,7 +475,7 @@  static ssize_t skcipher_sendpage(struct socket *sock, struct page *page,
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
 	struct skcipher_ctx *ctx = ask->private;
-	struct skcipher_sg_list *sgl;
+	struct skcipher_tsgl *sgl;
 	int err = -EINVAL;
 
 	if (flags & MSG_SENDPAGE_NOTLAST)
@@ -453,12 +494,12 @@  static ssize_t skcipher_sendpage(struct socket *sock, struct page *page,
 			goto unlock;
 	}
 
-	err = skcipher_alloc_sgl(sk);
+	err = skcipher_alloc_tsgl(sk);
 	if (err)
 		goto unlock;
 
 	ctx->merge = 0;
-	sgl = list_entry(ctx->tsgl.prev, struct skcipher_sg_list, list);
+	sgl = list_entry(ctx->tsgl_list.prev, struct skcipher_tsgl, list);
 
 	if (sgl->cur)
 		sg_unmark_end(sgl->sg + sgl->cur - 1);
@@ -479,25 +520,36 @@  static ssize_t skcipher_sendpage(struct socket *sock, struct page *page,
 	return err ?: size;
 }
 
-static int skcipher_all_sg_nents(struct skcipher_ctx *ctx)
+static void skcipher_async_cb(struct crypto_async_request *req, int err)
 {
-	struct skcipher_sg_list *sgl;
-	struct scatterlist *sg;
-	int nents = 0;
+	struct skcipher_async_req *areq = req->data;
+	struct sock *sk = areq->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+	struct kiocb *iocb = areq->iocb;
+	unsigned int resultlen;
 
-	list_for_each_entry(sgl, &ctx->tsgl, list) {
-		sg = sgl->sg;
+	lock_sock(sk);
 
-		while (!sg->length)
-			sg++;
+	BUG_ON(!ctx->inflight);
 
-		nents += sg_nents(sg);
-	}
-	return nents;
+	/* Buffer size written by crypto operation. */
+	resultlen = areq->req.cryptlen;
+
+	skcipher_free_areq_sgls(areq);
+	sock_kfree_s(sk, areq, areq->areqlen);
+	__sock_put(sk);
+	ctx->inflight--;
+
+	iocb->ki_complete(iocb, err ? err : resultlen, 0);
+
+	release_sock(sk);
+
+	wake_up_interruptible(&skcipher_aio_finish_wait);
 }
 
-static int skcipher_recvmsg_async(struct socket *sock, struct msghdr *msg,
-				  int flags)
+static int skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
+			    size_t ignored, int flags)
 {
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
@@ -506,215 +558,137 @@  static int skcipher_recvmsg_async(struct socket *sock, struct msghdr *msg,
 	struct skcipher_ctx *ctx = ask->private;
 	struct skcipher_tfm *skc = pask->private;
 	struct crypto_skcipher *tfm = skc->skcipher;
-	struct skcipher_sg_list *sgl;
-	struct scatterlist *sg;
-	struct skcipher_async_req *sreq;
-	struct skcipher_request *req;
-	struct skcipher_async_rsgl *last_rsgl = NULL;
-	unsigned int txbufs = 0, len = 0, tx_nents;
-	unsigned int reqsize = crypto_skcipher_reqsize(tfm);
-	unsigned int ivsize = crypto_skcipher_ivsize(tfm);
+	unsigned int bs = crypto_skcipher_blocksize(tfm);
+	unsigned int areqlen = sizeof(struct skcipher_async_req) +
+			       crypto_skcipher_reqsize(tfm);
+	struct skcipher_async_req *areq;
+	struct skcipher_rsgl *last_rsgl = NULL;
 	int err = -ENOMEM;
-	bool mark = false;
-	char *iv;
-
-	sreq = kzalloc(sizeof(*sreq) + reqsize + ivsize, GFP_KERNEL);
-	if (unlikely(!sreq))
-		goto out;
-
-	req = &sreq->req;
-	iv = (char *)(req + 1) + reqsize;
-	sreq->iocb = msg->msg_iocb;
-	INIT_LIST_HEAD(&sreq->list);
-	sreq->inflight = &ctx->inflight;
+	size_t len = 0;
 
 	lock_sock(sk);
-	tx_nents = skcipher_all_sg_nents(ctx);
-	sreq->tsg = kcalloc(tx_nents, sizeof(*sg), GFP_KERNEL);
-	if (unlikely(!sreq->tsg))
+
+	/* Allocate cipher request for current operation. */
+	areq = sock_kmalloc(sk, areqlen, GFP_KERNEL);
+	if (unlikely(!areq))
 		goto unlock;
-	sg_init_table(sreq->tsg, tx_nents);
-	memcpy(iv, ctx->iv, ivsize);
-	skcipher_request_set_tfm(req, tfm);
-	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP,
-				      skcipher_async_cb, sreq);
+	areq->areqlen = areqlen;
+	areq->sk = sk;
+	INIT_LIST_HEAD(&areq->rsgl_list);
+	areq->tsgl = NULL;
+	areq->tsgl_entries = 0;
 
-	while (iov_iter_count(&msg->msg_iter)) {
-		struct skcipher_async_rsgl *rsgl;
-		int used;
+	/* convert iovecs of output buffers into RX SGL */
+	while (len < ctx->used && iov_iter_count(&msg->msg_iter)) {
+		struct skcipher_rsgl *rsgl;
+		size_t seglen;
 
 		if (!ctx->used) {
 			err = skcipher_wait_for_data(sk, flags);
 			if (err)
 				goto free;
 		}
-		sgl = list_first_entry(&ctx->tsgl,
-				       struct skcipher_sg_list, list);
-		sg = sgl->sg;
 
-		while (!sg->length)
-			sg++;
-
-		used = min_t(unsigned long, ctx->used,
-			     iov_iter_count(&msg->msg_iter));
-		used = min_t(unsigned long, used, sg->length);
-
-		if (txbufs == tx_nents) {
-			struct scatterlist *tmp;
-			int x;
-			/* Ran out of tx slots in async request
-			 * need to expand */
-			tmp = kcalloc(tx_nents * 2, sizeof(*tmp),
-				      GFP_KERNEL);
-			if (!tmp) {
-				err = -ENOMEM;
-				goto free;
-			}
+		seglen = min_t(size_t, ctx->used,
+			       iov_iter_count(&msg->msg_iter));
 
-			sg_init_table(tmp, tx_nents * 2);
-			for (x = 0; x < tx_nents; x++)
-				sg_set_page(&tmp[x], sg_page(&sreq->tsg[x]),
-					    sreq->tsg[x].length,
-					    sreq->tsg[x].offset);
-			kfree(sreq->tsg);
-			sreq->tsg = tmp;
-			tx_nents *= 2;
-			mark = true;
-		}
-		/* Need to take over the tx sgl from ctx
-		 * to the asynch req - these sgls will be freed later */
-		sg_set_page(sreq->tsg + txbufs++, sg_page(sg), sg->length,
-			    sg->offset);
-
-		if (list_empty(&sreq->list)) {
-			rsgl = &sreq->first_sgl;
-			list_add_tail(&rsgl->list, &sreq->list);
+		if (list_empty(&areq->rsgl_list)) {
+			rsgl = &areq->first_sgl;
 		} else {
-			rsgl = kmalloc(sizeof(*rsgl), GFP_KERNEL);
+			rsgl = sock_kmalloc(sk, sizeof(*rsgl), GFP_KERNEL);
 			if (!rsgl) {
 				err = -ENOMEM;
 				goto free;
 			}
-			list_add_tail(&rsgl->list, &sreq->list);
 		}
 
-		used = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, used);
-		err = used;
-		if (used < 0)
+		rsgl->sgl.npages = 0;
+		list_add_tail(&rsgl->list, &areq->rsgl_list);
+
+		/* make one iovec available as scatterlist */
+		err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen);
+		if (err < 0)
 			goto free;
+
+		/* chain the new scatterlist with previous one */
 		if (last_rsgl)
 			af_alg_link_sg(&last_rsgl->sgl, &rsgl->sgl);
 
 		last_rsgl = rsgl;
-		len += used;
-		skcipher_pull_sgl(sk, used, 0);
-		iov_iter_advance(&msg->msg_iter, used);
+		len += err;
+		iov_iter_advance(&msg->msg_iter, err);
 	}
 
-	if (mark)
-		sg_mark_end(sreq->tsg + txbufs - 1);
+	/* Process only as much RX buffers for which we have TX data */
+	if (len > ctx->used)
+		len = ctx->used;
+
+	/*
+	 * If more buffers are to be expected to be processed, process only
+	 * full block size buffers.
+	 */
+	if (ctx->more || len < ctx->used)
+		len -= len % bs;
+
+	/*
+	 * Create a per request TX SGL for this request which tracks the
+	 * SG entries from the global TX SGL.
+	 */
+	areq->tsgl_entries = skcipher_count_tsgl(sk, len);
+	if (!areq->tsgl_entries)
+		areq->tsgl_entries = 1;
+	areq->tsgl = sock_kmalloc(sk, sizeof(*areq->tsgl) * areq->tsgl_entries,
+				  GFP_KERNEL);
+	if (!areq->tsgl) {
+		err = -ENOMEM;
+		goto free;
+	}
+	sg_init_table(areq->tsgl, areq->tsgl_entries);
+	skcipher_pull_tsgl(sk, len, areq->tsgl);
+
+	/* Initialize the crypto operation */
+	skcipher_request_set_tfm(&areq->req, tfm);
+	skcipher_request_set_crypt(&areq->req, areq->tsgl,
+				   areq->first_sgl.sgl.sg, len, ctx->iv);
+
+	if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) {
+		/* AIO operation */
+		areq->iocb = msg->msg_iocb;
+		skcipher_request_set_callback(&areq->req,
+					      CRYPTO_TFM_REQ_MAY_SLEEP,
+					      skcipher_async_cb, areq);
+		err = ctx->enc ? crypto_skcipher_encrypt(&areq->req) :
+				 crypto_skcipher_decrypt(&areq->req);
+	} else {
+		/* Synchronous operation */
+		skcipher_request_set_callback(&areq->req,
+					      CRYPTO_TFM_REQ_MAY_SLEEP |
+					      CRYPTO_TFM_REQ_MAY_BACKLOG,
+					      af_alg_complete,
+					      &ctx->completion);
+		err = af_alg_wait_for_completion(ctx->enc ?
+					crypto_skcipher_encrypt(&areq->req) :
+					crypto_skcipher_decrypt(&areq->req),
+						 &ctx->completion);
+	}
 
-	skcipher_request_set_crypt(req, sreq->tsg, sreq->first_sgl.sgl.sg,
-				   len, iv);
-	err = ctx->enc ? crypto_skcipher_encrypt(req) :
-			 crypto_skcipher_decrypt(req);
+	/* AIO operation in progress */
 	if (err == -EINPROGRESS) {
-		atomic_inc(&ctx->inflight);
+		sock_hold(sk);
 		err = -EIOCBQUEUED;
-		sreq = NULL;
+		ctx->inflight++;
 		goto unlock;
 	}
-free:
-	skcipher_free_async_sgls(sreq);
-unlock:
-	skcipher_wmem_wakeup(sk);
-	release_sock(sk);
-	kzfree(sreq);
-out:
-	return err;
-}
-
-static int skcipher_recvmsg_sync(struct socket *sock, struct msghdr *msg,
-				 int flags)
-{
-	struct sock *sk = sock->sk;
-	struct alg_sock *ask = alg_sk(sk);
-	struct sock *psk = ask->parent;
-	struct alg_sock *pask = alg_sk(psk);
-	struct skcipher_ctx *ctx = ask->private;
-	struct skcipher_tfm *skc = pask->private;
-	struct crypto_skcipher *tfm = skc->skcipher;
-	unsigned bs = crypto_skcipher_blocksize(tfm);
-	struct skcipher_sg_list *sgl;
-	struct scatterlist *sg;
-	int err = -EAGAIN;
-	int used;
-	long copied = 0;
-
-	lock_sock(sk);
-	while (msg_data_left(msg)) {
-		if (!ctx->used) {
-			err = skcipher_wait_for_data(sk, flags);
-			if (err)
-				goto unlock;
-		}
-
-		used = min_t(unsigned long, ctx->used, msg_data_left(msg));
-
-		used = af_alg_make_sg(&ctx->rsgl, &msg->msg_iter, used);
-		err = used;
-		if (err < 0)
-			goto unlock;
-
-		if (ctx->more || used < ctx->used)
-			used -= used % bs;
-
-		err = -EINVAL;
-		if (!used)
-			goto free;
-
-		sgl = list_first_entry(&ctx->tsgl,
-				       struct skcipher_sg_list, list);
-		sg = sgl->sg;
-
-		while (!sg->length)
-			sg++;
-
-		skcipher_request_set_crypt(&ctx->req, sg, ctx->rsgl.sg, used,
-					   ctx->iv);
-
-		err = af_alg_wait_for_completion(
-				ctx->enc ?
-					crypto_skcipher_encrypt(&ctx->req) :
-					crypto_skcipher_decrypt(&ctx->req),
-				&ctx->completion);
 
 free:
-		af_alg_free_sg(&ctx->rsgl);
-
-		if (err)
-			goto unlock;
-
-		copied += used;
-		skcipher_pull_sgl(sk, used, 1);
-		iov_iter_advance(&msg->msg_iter, used);
-	}
-
-	err = 0;
+	skcipher_free_areq_sgls(areq);
+	if (areq)
+		sock_kfree_s(sk, areq, areqlen);
 
 unlock:
 	skcipher_wmem_wakeup(sk);
 	release_sock(sk);
-
-	return copied ?: err;
-}
-
-static int skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
-			    size_t ignored, int flags)
-{
-	return (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) ?
-		skcipher_recvmsg_async(sock, msg, flags) :
-		skcipher_recvmsg_sync(sock, msg, flags);
+	return err ? err : len;
 }
 
 static unsigned int skcipher_poll(struct file *file, struct socket *sock,
@@ -723,10 +697,9 @@  static unsigned int skcipher_poll(struct file *file, struct socket *sock,
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
 	struct skcipher_ctx *ctx = ask->private;
-	unsigned int mask;
+	unsigned int mask = 0;
 
 	sock_poll_wait(file, sk_sleep(sk), wait);
-	mask = 0;
 
 	if (ctx->used)
 		mask |= POLLIN | POLLRDNORM;
@@ -894,26 +867,20 @@  static int skcipher_setkey(void *private, const u8 *key, unsigned int keylen)
 	return err;
 }
 
-static void skcipher_wait(struct sock *sk)
-{
-	struct alg_sock *ask = alg_sk(sk);
-	struct skcipher_ctx *ctx = ask->private;
-	int ctr = 0;
-
-	while (atomic_read(&ctx->inflight) && ctr++ < 100)
-		msleep(100);
-}
-
 static void skcipher_sock_destruct(struct sock *sk)
 {
 	struct alg_sock *ask = alg_sk(sk);
 	struct skcipher_ctx *ctx = ask->private;
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(&ctx->req);
+	struct sock *psk = ask->parent;
+	struct alg_sock *pask = alg_sk(psk);
+	struct skcipher_tfm *skc = pask->private;
+	struct crypto_skcipher *tfm = skc->skcipher;
 
-	if (atomic_read(&ctx->inflight))
-		skcipher_wait(sk);
+	/* Suspend the caller while AIO operations are in flight. */
+	wait_event_interruptible(skcipher_aio_finish_wait,
+				 (ctx->inflight == 0));
 
-	skcipher_free_sgl(sk);
+	skcipher_pull_tsgl(sk, ctx->used, NULL);
 	sock_kzfree_s(sk, ctx->iv, crypto_skcipher_ivsize(tfm));
 	sock_kfree_s(sk, ctx, ctx->len);
 	af_alg_release_parent(sk);
@@ -925,7 +892,7 @@  static int skcipher_accept_parent_nokey(void *private, struct sock *sk)
 	struct alg_sock *ask = alg_sk(sk);
 	struct skcipher_tfm *tfm = private;
 	struct crypto_skcipher *skcipher = tfm->skcipher;
-	unsigned int len = sizeof(*ctx) + crypto_skcipher_reqsize(skcipher);
+	unsigned int len = sizeof(*ctx);
 
 	ctx = sock_kmalloc(sk, len, GFP_KERNEL);
 	if (!ctx)
@@ -940,22 +907,17 @@  static int skcipher_accept_parent_nokey(void *private, struct sock *sk)
 
 	memset(ctx->iv, 0, crypto_skcipher_ivsize(skcipher));
 
-	INIT_LIST_HEAD(&ctx->tsgl);
+	INIT_LIST_HEAD(&ctx->tsgl_list);
 	ctx->len = len;
 	ctx->used = 0;
+	ctx->inflight = 0;
 	ctx->more = 0;
 	ctx->merge = 0;
 	ctx->enc = 0;
-	atomic_set(&ctx->inflight, 0);
 	af_alg_init_completion(&ctx->completion);
 
 	ask->private = ctx;
 
-	skcipher_request_set_tfm(&ctx->req, skcipher);
-	skcipher_request_set_callback(&ctx->req, CRYPTO_TFM_REQ_MAY_SLEEP |
-						 CRYPTO_TFM_REQ_MAY_BACKLOG,
-				      af_alg_complete, &ctx->completion);
-
 	sk->sk_destruct = skcipher_sock_destruct;
 
 	return 0;
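For context, the userspace-visible contract described in the commit message (recvmsg now reports the number of processed bytes for both the sync and AIO paths) can be exercised with a minimal AF_ALG skcipher client. This is a hypothetical illustration, not part of the patch; the function name `afalg_encrypt_len` and the graceful -1 fallback for sandboxed environments are my own additions:

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/if_alg.h>

#ifndef SOL_ALG
#define SOL_ALG 279
#endif

/*
 * Encrypt one AES block via AF_ALG and return the byte count that the
 * read (recvmsg) path reports, or -1 if AF_ALG is unavailable (e.g. in
 * a restricted sandbox). Demo key/data only.
 */
static long afalg_encrypt_len(void)
{
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type   = "skcipher",
		.salg_name   = "ecb(aes)",
	};
	const unsigned char key[16] = { 0 };		/* all-zero demo key */
	unsigned char buf[16] = "0123456789abcde";	/* one AES block */
	int tfmfd, opfd;
	long n;

	tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfmfd < 0)
		return -1;
	if (bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
	    setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key)) < 0 ||
	    (opfd = accept(tfmfd, NULL, 0)) < 0) {
		close(tfmfd);
		return -1;
	}

	/* Select the encrypt operation via ancillary data. */
	{
		char cbuf[CMSG_SPACE(sizeof(int))] = { 0 };
		struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
		struct msghdr msg = {
			.msg_iov = &iov, .msg_iovlen = 1,
			.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
		};
		struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

		cmsg->cmsg_level = SOL_ALG;
		cmsg->cmsg_type = ALG_SET_OP;
		cmsg->cmsg_len = CMSG_LEN(sizeof(int));
		*(int *)CMSG_DATA(cmsg) = ALG_OP_ENCRYPT;

		if (sendmsg(opfd, &msg, 0) != sizeof(buf)) {
			close(opfd);
			close(tfmfd);
			return -1;
		}
	}

	/* With this patch, sync and AIO reads both report processed bytes. */
	n = read(opfd, buf, sizeof(buf));
	close(opfd);
	close(tfmfd);
	return n;
}
```

On a kernel providing AF_ALG, the final read reports 16 (one processed block), matching the unified return-value semantics this patch introduces.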