[net,3/5] tls: don't skip over different type records from the rx_list

Message ID f00c0c0afa080c60f016df1471158c1caf983c34.1708007371.git.sd@queasysnail.net (mailing list archive)
State Accepted
Commit ec823bf3a479d42c589dc0f28ef4951c49cd2d2a
Delegated to: Netdev Maintainers
Series tls: fixes for record type handling with PEEK

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 956 this patch: 956
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 7 of 7 maintainers
netdev/build_clang              success  Errors and warnings before: 973 this patch: 973
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 973 this patch: 973
netdev/checkpatch               warning  WARNING: line length of 84 exceeds 80 columns
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-02-19--18-00 (tests: 1449)

Commit Message

Sabrina Dubroca Feb. 15, 2024, 4:17 p.m. UTC

If we queue 3 records:
 - record 1, type DATA
 - record 2, some other type
 - record 3, type DATA
and do a recv(PEEK), the rx_list will contain the first two records.

The next large recv will walk through the rx_list and copy data from
record 1, then stop because record 2 is a different type. Since we
haven't filled up our buffer, we will process the next available
record. It's also DATA, so we can merge it with the current read.

We shouldn't do that, since there was a record in between that we
ignored.

Add a flag to let process_rx_list inform tls_sw_recvmsg that it had
more data available.

Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
 net/tls/tls_sw.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)
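
The scenario in the commit message can be modelled in user space. The sketch below is not the kernel code: struct rec, model_process_rx_list() and model_recvmsg() are hypothetical stand-ins, record types are reduced to two enum values, and partially consumed records are ignored. It only illustrates why the caller needs to know that the rx_list walk stopped at a record of another type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, for illustration only. */
enum rec_type { REC_DATA = 23, REC_OTHER = 21 };

struct rec {
	enum rec_type type;
	size_t len;
};

/*
 * Model of the rx_list walk: copy DATA records until the buffer is full
 * or a record of a different type is found. *more is set when the walk
 * stopped early because of such a record.
 */
static size_t model_process_rx_list(const struct rec *rx_list, size_t n,
				    size_t len, bool *more)
{
	size_t copied = 0;
	size_t i, chunk;

	assert(rx_list || n == 0);
	for (i = 0; i < n && len > 0; i++) {
		if (rx_list[i].type != REC_DATA) {
			if (more)
				*more = true;
			break;
		}
		chunk = rx_list[i].len < len ? rx_list[i].len : len;
		copied += chunk;
		len -= chunk;
	}
	return copied;
}

/*
 * Model of the receive path: drain the rx_list, then continue with the
 * records still queued behind it. honor_more selects the patched
 * behaviour, which stops as soon as the rx_list reported an unread record.
 */
static size_t model_recvmsg(const struct rec *rx_list, size_t n_list,
			    const struct rec *queued, size_t n_queued,
			    size_t len, bool honor_more)
{
	bool rx_more = false;
	size_t copied, i, chunk;

	copied = model_process_rx_list(rx_list, n_list, len, &rx_more);
	if (copied >= len || (honor_more && rx_more))
		return copied;

	for (i = 0; i < n_queued && copied < len; i++) {
		if (queued[i].type != REC_DATA)
			break;
		chunk = queued[i].len < len - copied ? queued[i].len : len - copied;
		copied += chunk;
	}
	return copied;
}
```

With records 1 and 2 on the modelled rx_list and record 3 still queued, the unpatched flow (honor_more == false) returns the bytes of records 1 and 3 together, silently stepping over record 2. The patched flow returns only record 1.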

Comments

Jakub Kicinski Feb. 19, 2024, 8:07 p.m. UTC | #1
On Thu, 15 Feb 2024 17:17:31 +0100 Sabrina Dubroca wrote:
> @@ -1772,7 +1772,8 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
>  			   u8 *control,
>  			   size_t skip,
>  			   size_t len,
> -			   bool is_peek)
> +			   bool is_peek,
> +			   bool *more)
>  {
>  	struct sk_buff *skb = skb_peek(&ctx->rx_list);
>  	struct tls_msg *tlm;


> @@ -1844,6 +1845,10 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
>  
>  out:
>  	return copied ? : err;
> +more:
> +	if (more)
> +		*more = true;
> +	goto out;

Patches look correct, one small nit here -

I don't have great ideas how to avoid the 7th argument completely but 
I think it'd be a little cleaner if we either:
 - passed in err as an output argument (some datagram code does that
   IIRC), then function can always return copied directly, or 
 - passed copied as an output argument, and then we can always return
   err?
I like the former a little better because we won't have to special case
NULL for the "after async decryption" call sites.
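
For readers following the review, the two conventions being compared can be sketched in isolation. This is a minimal illustration with hypothetical helper names (combined_ret, split_ret), not code from tls_sw.c. It assumes a POSIX environment for ssize_t.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Combined convention, as in "return copied ? : err": the caller gets
 * the byte count if anything was copied, otherwise the error code.
 */
static ssize_t combined_ret(ssize_t copied, int err)
{
	return copied ? copied : err;
}

/*
 * Split convention: the status is always returned and the byte count is
 * always reported through out_copied, so neither value hides the other.
 */
static int split_ret(ssize_t copied, int err, ssize_t *out_copied)
{
	assert(out_copied);
	*out_copied = copied;
	return err;
}
```

In the combined form, an error that happens after some bytes were copied is not visible to the caller, since a non-zero copied wins. The split form reports both, at the cost of an extra pointer argument.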

Sabrina Dubroca Feb. 19, 2024, 11:10 p.m. UTC | #2
2024-02-19, 12:07:03 -0800, Jakub Kicinski wrote:
> On Thu, 15 Feb 2024 17:17:31 +0100 Sabrina Dubroca wrote:
> > @@ -1772,7 +1772,8 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
> >  			   u8 *control,
> >  			   size_t skip,
> >  			   size_t len,
> > -			   bool is_peek)
> > +			   bool is_peek,
> > +			   bool *more)
> >  {
> >  	struct sk_buff *skb = skb_peek(&ctx->rx_list);
> >  	struct tls_msg *tlm;
> 
> 
> > @@ -1844,6 +1845,10 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
> >  
> >  out:
> >  	return copied ? : err;
> > +more:
> > +	if (more)
> > +		*more = true;
> > +	goto out;
> 
> Patches look correct, one small nit here -
> 
> I don't have great ideas how to avoid the 7th argument completely but 

I hesitated between this patch and a variant combining is_peek and
more into a single u8 *flags, but that felt a bit messy (or does that
fall into what you describe as "not [having] great ideas"? :))

@@ -1772,9 +1777,10 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
 			   u8 *control,
 			   size_t skip,
 			   size_t len,
-			   bool is_peek)
+			   u8 *flags)
 {
 	struct sk_buff *skb = skb_peek(&ctx->rx_list);
+	bool is_peek = *flags & RXLIST_PEEK;
 	struct tls_msg *tlm;
 	ssize_t copied = 0;
 	int err;
[...]
@@ -1844,6 +1850,9 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
 
 out:
 	return copied ? : err;
+more:
+	*flags |= RXLIST_MORE;
+	goto out;
 }


and then in tls_sw_recvmsg:
u8 rxlist_flags = is_peek ? RXLIST_PEEK : 0;
err = process_rx_list(ctx, msg, &control, 0, len, &rxlist_flags);
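
As a rough user-space illustration of that in/out flags idea: the names RXLIST_PEEK and RXLIST_MORE come from the snippet above, but their values here, struct model_list and model_walk() are assumptions made for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values are assumed for this sketch; the thread does not define them. */
#define RXLIST_PEEK 0x1	/* in:  caller is peeking, keep records queued */
#define RXLIST_MORE 0x2	/* out: walk stopped at a record of another type */

/* Hypothetical model of a record list: types[head..n) are still queued. */
struct model_list {
	const uint8_t *types;
	size_t n;
	size_t head;
};

/*
 * Walk the list while the record type matches 'want'. Returns how many
 * records matched. Matched records are dequeued unless RXLIST_PEEK is
 * set on entry; RXLIST_MORE is set on exit if a mismatch ended the walk.
 */
static size_t model_walk(struct model_list *l, uint8_t want, uint8_t *flags)
{
	bool is_peek;
	size_t i, matched = 0;

	assert(l && flags);
	is_peek = *flags & RXLIST_PEEK;

	for (i = l->head; i < l->n; i++) {
		if (l->types[i] != want) {
			*flags |= RXLIST_MORE;
			break;
		}
		matched++;
	}

	if (!is_peek)
		l->head += matched;
	return matched;
}
```

The same byte carries the caller's request in and the callee's result out, which is what makes it a true in/out argument.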


> I think it'd be a little cleaner if we either:
>  - passed in err as an output argument (some datagram code does that
>    IIRC), then function can always return copied directly, or 

(yes, __skb_wait_for_more_packets, __skb_try_recv_datagram, and their
variants)

>  - passed copied as an output argument, and then we can always return
>    err?

Aren't those 2 options adding an 8th argument?

I tend to find ">= 0 on success, otherwise errno" more readable,
probably because that's a very common pattern (either for recvmsg
style of cases, or all the ERR_PTR type situations).

> I like the former a little better because we won't have to special case
> NULL for the "after async decryption" call sites.

We could also pass &rx_more every time and not check for NULL.

What do you want to clean up more specifically? The number of
arguments, the backwards goto, the NULL check before setting *more,
something else/all of the above?

Jakub Kicinski Feb. 21, 2024, 1:50 a.m. UTC | #3
On Tue, 20 Feb 2024 00:10:58 +0100 Sabrina Dubroca wrote:
> 2024-02-19, 12:07:03 -0800, Jakub Kicinski wrote:
> > On Thu, 15 Feb 2024 17:17:31 +0100 Sabrina Dubroca wrote:  
> > > @@ -1772,7 +1772,8 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
> > >  			   u8 *control,
> > >  			   size_t skip,
> > >  			   size_t len,
> > > -			   bool is_peek)
> > > +			   bool is_peek,
> > > +			   bool *more)
> > >  {
> > >  	struct sk_buff *skb = skb_peek(&ctx->rx_list);
> > >  	struct tls_msg *tlm;  
> > 
> > > @@ -1844,6 +1845,10 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
> > >  
> > >  out:
> > >  	return copied ? : err;
> > > +more:
> > > +	if (more)
> > > +		*more = true;
> > > +	goto out;  
> > 
> > Patches look correct, one small nit here -
> > 
> > I don't have great ideas how to avoid the 7th argument completely but   
> 
> I hesitated between this patch and a variant combining is_peek and
> more into a single u8 *flags, but that felt a bit messy (or does that
> fall into what you describe as "not [having] great ideas"? :))

I guess it saves a register, it seems a bit better but then it's a
truly in/out argument :)

> > I think it'd be a little cleaner if we either:
> >  - passed in err as an output argument (some datagram code does that
> >    IIRC), then function can always return copied directly, or   
> 
> (yes, __skb_wait_for_more_packets, __skb_try_recv_datagram, and their
> variants)
> 
> >  - passed copied as an output argument, and then we can always return
> >    err?  
> 
> Aren't those 2 options adding an 8th argument?

No, no, still 7, if we separate copied from err - checking err < 0
is enough to know that we need to exit.

Differently put, perhaps, my preference is to pass an existing entity
(err or copied), rather than conjure a new concept (more) on one end and
interpret it on the other.

> I tend to find ">= 0 on success, otherwise errno" more readable,
> probably because that's a very common pattern (either for recvmsg
> style of cases, or all the ERR_PTR type situations).

Right it definitely is a good pattern. I think passing copied via
argument would give us those semantics still?

> > I like the former a little better because we won't have to special case
> > NULL for the "after async decryption" call sites.  
> 
> We could also pass &rx_more every time and not check for NULL.
> 
> What do you want to clean up more specifically? The number of
> arguments, the backwards goto, the NULL check before setting *more,
> something else/all of the above?

Not compiled, but what I had in mind was something along the lines of:

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 9fbc70200cd0..6e6e6d89b173 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1772,7 +1772,8 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
 			   u8 *control,
 			   size_t skip,
 			   size_t len,
-			   bool is_peek)
+			   bool is_peek,
+			   int *out_copied)
 {
 	struct sk_buff *skb = skb_peek(&ctx->rx_list);
 	struct tls_msg *tlm;
@@ -1843,7 +1844,8 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
 	err = 0;
 
 out:
-	return copied ? : err;
+	*out_copied = copied;
+	return err;
 }
 
 static bool
@@ -1966,11 +1968,10 @@ int tls_sw_recvmsg(struct sock *sk,
 		goto end;
 
 	/* Process pending decrypted records. It must be non-zero-copy */
-	err = process_rx_list(ctx, msg, &control, 0, len, is_peek);
+	err = process_rx_list(ctx, msg, &control, 0, len, is_peek, &copied);
 	if (err < 0)
 		goto end;
 
-	copied = err;
 	if (len <= copied)
 		goto end;
 
@@ -2128,10 +2129,10 @@ int tls_sw_recvmsg(struct sock *sk,
 		/* Drain records from the rx_list & copy if required */
 		if (is_peek || is_kvec)
 			err = process_rx_list(ctx, msg, &control, copied,
-					      decrypted, is_peek);
+					      decrypted, is_peek, &ret);
 		else
 			err = process_rx_list(ctx, msg, &control, 0,
-					      async_copy_bytes, is_peek);
+					      async_copy_bytes, is_peek, &ret);
 	}
 
 	copied += decrypted;

Sabrina Dubroca Feb. 21, 2024, 1:59 p.m. UTC | #4
2024-02-20, 17:50:53 -0800, Jakub Kicinski wrote:
> On Tue, 20 Feb 2024 00:10:58 +0100 Sabrina Dubroca wrote:
> > 2024-02-19, 12:07:03 -0800, Jakub Kicinski wrote:
> > > On Thu, 15 Feb 2024 17:17:31 +0100 Sabrina Dubroca wrote:  
> > > > @@ -1772,7 +1772,8 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
> > > >  			   u8 *control,
> > > >  			   size_t skip,
> > > >  			   size_t len,
> > > > -			   bool is_peek)
> > > > +			   bool is_peek,
> > > > +			   bool *more)
> > > >  {
> > > >  	struct sk_buff *skb = skb_peek(&ctx->rx_list);
> > > >  	struct tls_msg *tlm;  
> > > 
> > > > @@ -1844,6 +1845,10 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
> > > >  
> > > >  out:
> > > >  	return copied ? : err;
> > > > +more:
> > > > +	if (more)
> > > > +		*more = true;
> > > > +	goto out;  
> > > 
> > > Patches look correct, one small nit here -
> > > 
> > > I don't have great ideas how to avoid the 7th argument completely but   
> > 
> > I hesitated between this patch and a variant combining is_peek and
> > more into a single u8 *flags, but that felt a bit messy (or does that
> > fall into what you describe as "not [having] great ideas"? :))
> 
> I guess it saves a register, it seems a bit better but then it's a
> truly in/out argument :)

We already do that with darg all over the receive code, so it
shouldn't be too confusing to readers. It can be named flags_inout if
you think that would help, or have a comment like above tls_decrypt_sg.

> > > I think it'd be a little cleaner if we either:
> > >  - passed in err as an output argument (some datagram code does that
> > >    IIRC), then function can always return copied directly, or   
> > 
> > (yes, __skb_wait_for_more_packets, __skb_try_recv_datagram, and their
> > variants)
> > 
> > >  - passed copied as an output argument, and then we can always return
> > >    err?  
> > 
> > Aren't those 2 options adding an 8th argument?
> 
> No, no, still 7, if we separate copied from err - checking err < 0
> is enough to know that we need to exit.

Right, I realized that you probably meant something like that as I was
going to bed last night.

It's not exactly enough, since tls_record_content_type will return 0
on a content type mismatch. We'll have to translate that into an
"error". I think it would be a bit nicer to set err=1 and then check
err != 0 in tls_sw_recvmsg (we can document that in a comment above
process_rx_list) rather than making up a fake errno. See diff [1].

Or we could swap the 0/1 returns from tls_record_content_type and
switch the err <= 0 tests to err != 0 after the existing calls, then
process_rx_list doesn't have a weird special case [2].

What do you think?
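
To make the return-value problem concrete, here is a small user-space model. It assumes the current behaviour described above (1 on match, 0 on mismatch, negative on error) and models the error path as -EIO for a zero record type, which is a simplification. model_content_type() and model_rx_status() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/*
 * Model of the existing tri-state return:
 *    1 - record type matches *control (or sets it on the first record)
 *    0 - record type differs from *control
 *   <0 - error (simplified here to -EIO for a zero record type)
 */
static int model_content_type(unsigned char *control, unsigned char rec_type)
{
	assert(control);
	if (!*control) {
		/* First record of this read decides the type. */
		*control = rec_type;
		if (!*control)
			return -EIO;
	} else if (*control != rec_type) {
		return 0;
	}
	return 1;
}

/*
 * Translation along the lines of option [1]: turn the tri-state value
 * into a status where 0 means "keep going", 1 means "stopped on a type
 * mismatch" and <0 is a real error.
 */
static int model_rx_status(int ct_ret)
{
	if (ct_ret < 0)
		return ct_ret;
	return ct_ret == 0 ? 1 : 0;
}
```

A caller that only tests the raw value with err < 0 cannot tell a mismatch (0) from "nothing wrong". After the translation, a single err != 0 test covers both the mismatch and the error case.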


> Differently put, perhaps, my preference is to pass an existing entity
> (err or copied), rather than conjure a new concept (more) on one end and
> interpret it on the other.
> 
> > I tend to find ">= 0 on success, otherwise errno" more readable,
> > probably because that's a very common pattern (either for recvmsg
> > style of cases, or all the ERR_PTR type situations).
> 
> Right it definitely is a good pattern. I think passing copied via
> argument would give us those semantics still?

For recvmsg sure, but not for process_rx_list.

> > > I like the former a little better because we won't have to special case
> > > NULL for the "after async decryption" call sites.  
> > 
> > We could also pass &rx_more every time and not check for NULL.
> > 
> > What do you want to clean up more specifically? The number of
> > arguments, the backwards goto, the NULL check before setting *more,
> > something else/all of the above?
> 
> Not compiled, but what I had in mind was something along the lines of:

copied is a ssize_t (but ret isn't), so the change gets a bit uglier :(


------------ 8< ------------

[1] fix by setting err=1 in process_rx_list

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 43dd0d82b6ed..711504614da7 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1766,13 +1766,19 @@ static void tls_rx_rec_done(struct tls_sw_context_rx *ctx)
  * decrypted records into the buffer provided by caller zero copy is not
  * true. Further, the records are removed from the rx_list if it is not a peek
  * case and the record has been consumed completely.
+ *
+ * Return:
+ *  - 0 if len bytes were copied
+ *  - 1 if < len bytes were copied due to a record type mismatch
+ *  - <0 if an error occurred
  */
 static int process_rx_list(struct tls_sw_context_rx *ctx,
 			   struct msghdr *msg,
 			   u8 *control,
 			   size_t skip,
 			   size_t len,
-			   bool is_peek)
+			   bool is_peek,
+			   ssize_t *out_copied)
 {
 	struct sk_buff *skb = skb_peek(&ctx->rx_list);
 	struct tls_msg *tlm;
@@ -1802,8 +1808,11 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
 		tlm = tls_msg(skb);
 
 		err = tls_record_content_type(msg, tlm, control);
-		if (err <= 0)
+		if (err <= 0) {
+			if (err == 0)
+				err = 1;
 			goto out;
+		}
 
 		err = skb_copy_datagram_msg(skb, rxm->offset + skip,
 					    msg, chunk);
@@ -1843,7 +1852,8 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
 	err = 0;
 
 out:
-	return copied ? : err;
+	*out_copied = copied;
+	return err;
 }
 
 static bool
@@ -1966,11 +1976,10 @@ int tls_sw_recvmsg(struct sock *sk,
 		goto end;
 
 	/* Process pending decrypted records. It must be non-zero-copy */
-	err = process_rx_list(ctx, msg, &control, 0, len, is_peek);
-	if (err < 0)
+	err = process_rx_list(ctx, msg, &control, 0, len, is_peek, &copied);
+	if (err != 0)
 		goto end;
 
-	copied = err;
 	if (len <= copied || (copied && control != TLS_RECORD_TYPE_DATA))
 		goto end;
 
@@ -2114,6 +2123,7 @@ int tls_sw_recvmsg(struct sock *sk,
 
 recv_end:
 	if (async) {
+		ssize_t ret2;
 		int ret;
 
 		/* Wait for all previously submitted records to be decrypted */
@@ -2130,10 +2140,10 @@ int tls_sw_recvmsg(struct sock *sk,
 		/* Drain records from the rx_list & copy if required */
 		if (is_peek || is_kvec)
 			err = process_rx_list(ctx, msg, &control, copied,
-					      decrypted, is_peek);
+					      decrypted, is_peek, &ret2);
 		else
 			err = process_rx_list(ctx, msg, &control, 0,
-					      async_copy_bytes, is_peek);
+					      async_copy_bytes, is_peek, &ret2);
 	}
 
 	copied += decrypted;


------------ 8< ------------

[2] fixing the bug by changing tls_record_content_type as well

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 43dd0d82b6ed..3da62ba97945 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1734,6 +1734,11 @@ int decrypt_skb(struct sock *sk, struct scatterlist *sgout)
 	return tls_decrypt_sg(sk, NULL, sgout, &darg);
 }
 
+/* Return:
+ *  - 0 on success
+ *  - 1 if the record's type doesn't match the value in control
+ *  - <0 if an error occurred
+ */
 static int tls_record_content_type(struct msghdr *msg, struct tls_msg *tlm,
 				   u8 *control)
 {
@@ -1751,10 +1756,10 @@ static int tls_record_content_type(struct msghdr *msg, struct tls_msg *tlm,
 				return -EIO;
 		}
 	} else if (*control != tlm->control) {
-		return 0;
+		return 1;
 	}
 
-	return 1;
+	return 0;
 }
 
 static void tls_rx_rec_done(struct tls_sw_context_rx *ctx)
@@ -1766,13 +1771,19 @@ static void tls_rx_rec_done(struct tls_sw_context_rx *ctx)
  * decrypted records into the buffer provided by caller zero copy is not
  * true. Further, the records are removed from the rx_list if it is not a peek
  * case and the record has been consumed completely.
+ *
+ * Return:
+ *  - 0 if len bytes were copied
+ *  - 1 if < len bytes were copied due to a record type mismatch
+ *  - <0 if an error occurred
  */
 static int process_rx_list(struct tls_sw_context_rx *ctx,
 			   struct msghdr *msg,
 			   u8 *control,
 			   size_t skip,
 			   size_t len,
-			   bool is_peek)
+			   bool is_peek,
+			   ssize_t *out_copied)
 {
 	struct sk_buff *skb = skb_peek(&ctx->rx_list);
 	struct tls_msg *tlm;
@@ -1784,7 +1795,7 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
 		tlm = tls_msg(skb);
 
 		err = tls_record_content_type(msg, tlm, control);
-		if (err <= 0)
+		if (err != 0)
 			goto out;
 
 		if (skip < rxm->full_len)
@@ -1802,7 +1813,7 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
 		tlm = tls_msg(skb);
 
 		err = tls_record_content_type(msg, tlm, control);
-		if (err <= 0)
+		if (err != 0)
 			goto out;
 
 		err = skb_copy_datagram_msg(skb, rxm->offset + skip,
@@ -1843,7 +1854,8 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
 	err = 0;
 
 out:
-	return copied ? : err;
+	*out_copied = copied;
+	return err;
 }
 
 static bool
@@ -1966,11 +1978,10 @@ int tls_sw_recvmsg(struct sock *sk,
 		goto end;
 
 	/* Process pending decrypted records. It must be non-zero-copy */
-	err = process_rx_list(ctx, msg, &control, 0, len, is_peek);
-	if (err < 0)
+	err = process_rx_list(ctx, msg, &control, 0, len, is_peek, &copied);
+	if (err != 0)
 		goto end;
 
-	copied = err;
 	if (len <= copied || (copied && control != TLS_RECORD_TYPE_DATA))
 		goto end;
 
@@ -2032,7 +2043,7 @@ int tls_sw_recvmsg(struct sock *sk,
 		 * For tls1.3, we disable async.
 		 */
 		err = tls_record_content_type(msg, tls_msg(darg.skb), &control);
-		if (err <= 0) {
+		if (err != 0) {
 			DEBUG_NET_WARN_ON_ONCE(darg.zc);
 			tls_rx_rec_done(ctx);
 put_on_rx_list_err:
@@ -2114,6 +2125,7 @@ int tls_sw_recvmsg(struct sock *sk,
 
 recv_end:
 	if (async) {
+		ssize_t ret2;
 		int ret;
 
 		/* Wait for all previously submitted records to be decrypted */
@@ -2130,10 +2142,10 @@ int tls_sw_recvmsg(struct sock *sk,
 		/* Drain records from the rx_list & copy if required */
 		if (is_peek || is_kvec)
 			err = process_rx_list(ctx, msg, &control, copied,
-					      decrypted, is_peek);
+					      decrypted, is_peek, &ret2);
 		else
 			err = process_rx_list(ctx, msg, &control, 0,
-					      async_copy_bytes, is_peek);
+					      async_copy_bytes, is_peek, &ret2);
 	}
 
 	copied += decrypted;

Jakub Kicinski Feb. 21, 2024, 6:33 p.m. UTC | #5
On Wed, 21 Feb 2024 14:59:40 +0100 Sabrina Dubroca wrote:
> It's not exactly enough, since tls_record_content_type will return 0
> on a content type mismatch. We'll have to translate that into an
> "error". 

Ugh, that's unpleasant.

> I think it would be a bit nicer to set err=1 and then check
> err != 0 in tls_sw_recvmsg (we can document that in a comment above
> process_rx_list) rather than making up a fake errno. See diff [1].
> 
> Or we could swap the 0/1 returns from tls_record_content_type and
> switch the err <= 0 tests to err != 0 after the existing calls, then
> process_rx_list doesn't have a weird special case [2].
> 
> What do you think?

I missed the error = 1 case, sorry. No strong preference, then.
Checking for error = 1 will be as special as the new rx_more
flag. Should I apply this version as is, then?

Sabrina Dubroca Feb. 21, 2024, 6:42 p.m. UTC | #6
2024-02-21, 10:33:30 -0800, Jakub Kicinski wrote:
> On Wed, 21 Feb 2024 14:59:40 +0100 Sabrina Dubroca wrote:
> > It's not exactly enough, since tls_record_content_type will return 0
> > on a content type mismatch. We'll have to translate that into an
> > "error". 
> 
> Ugh, that's unpleasant.
> 
> > I think it would be a bit nicer to set err=1 and then check
> > err != 0 in tls_sw_recvmsg (we can document that in a comment above
> > process_rx_list) rather than making up a fake errno. See diff [1].
> > 
> > Or we could swap the 0/1 returns from tls_record_content_type and
> > switch the err <= 0 tests to err != 0 after the existing calls, then
> > process_rx_list doesn't have a weird special case [2].
> > 
> > What do you think?
> 
> I missed the error = 1 case, sorry. No strong preference, then.
> Checking for error = 1 will be as special as the new rx_more
> flag. Should I apply this version as is, then?

If you're ok with that version, sure. Thanks.
Patch

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 43dd0d82b6ed..de96959336c4 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1772,7 +1772,8 @@  static int process_rx_list(struct tls_sw_context_rx *ctx,
 			   u8 *control,
 			   size_t skip,
 			   size_t len,
-			   bool is_peek)
+			   bool is_peek,
+			   bool *more)
 {
 	struct sk_buff *skb = skb_peek(&ctx->rx_list);
 	struct tls_msg *tlm;
@@ -1785,7 +1786,7 @@  static int process_rx_list(struct tls_sw_context_rx *ctx,
 
 		err = tls_record_content_type(msg, tlm, control);
 		if (err <= 0)
-			goto out;
+			goto more;
 
 		if (skip < rxm->full_len)
 			break;
@@ -1803,12 +1804,12 @@  static int process_rx_list(struct tls_sw_context_rx *ctx,
 
 		err = tls_record_content_type(msg, tlm, control);
 		if (err <= 0)
-			goto out;
+			goto more;
 
 		err = skb_copy_datagram_msg(skb, rxm->offset + skip,
 					    msg, chunk);
 		if (err < 0)
-			goto out;
+			goto more;
 
 		len = len - chunk;
 		copied = copied + chunk;
@@ -1844,6 +1845,10 @@  static int process_rx_list(struct tls_sw_context_rx *ctx,
 
 out:
 	return copied ? : err;
+more:
+	if (more)
+		*more = true;
+	goto out;
 }
 
 static bool
@@ -1947,6 +1952,7 @@  int tls_sw_recvmsg(struct sock *sk,
 	int target, err;
 	bool is_kvec = iov_iter_is_kvec(&msg->msg_iter);
 	bool is_peek = flags & MSG_PEEK;
+	bool rx_more = false;
 	bool released = true;
 	bool bpf_strp_enabled;
 	bool zc_capable;
@@ -1966,12 +1972,12 @@  int tls_sw_recvmsg(struct sock *sk,
 		goto end;
 
 	/* Process pending decrypted records. It must be non-zero-copy */
-	err = process_rx_list(ctx, msg, &control, 0, len, is_peek);
+	err = process_rx_list(ctx, msg, &control, 0, len, is_peek, &rx_more);
 	if (err < 0)
 		goto end;
 
 	copied = err;
-	if (len <= copied || (copied && control != TLS_RECORD_TYPE_DATA))
+	if (len <= copied || (copied && control != TLS_RECORD_TYPE_DATA) || rx_more)
 		goto end;
 
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
@@ -2130,10 +2136,10 @@  int tls_sw_recvmsg(struct sock *sk,
 		/* Drain records from the rx_list & copy if required */
 		if (is_peek || is_kvec)
 			err = process_rx_list(ctx, msg, &control, copied,
-					      decrypted, is_peek);
+					      decrypted, is_peek, NULL);
 		else
 			err = process_rx_list(ctx, msg, &control, 0,
-					      async_copy_bytes, is_peek);
+					      async_copy_bytes, is_peek, NULL);
 	}
 
 	copied += decrypted;