diff mbox series

net: remove release/lock_sock in tcp_splice_read

Message ID 20240815084330.166987-1-sunyiqixm@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net: remove release/lock_sock in tcp_splice_read | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 29 this patch: 29
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: axboe@kernel.dk; 1 maintainers not CCed: axboe@kernel.dk
netdev/build_clang success Errors and warnings before: 29 this patch: 29
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 29 this patch: 29
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0

Commit Message

sunyiqi Aug. 15, 2024, 8:43 a.m. UTC
When entering tcp_splice_read(), lock_sock() is called on sk to
prevent other threads from acquiring the socket, and it is released
before returning.

But in the while (tss.len) loop, it releases and re-locks sk, giving
other threads a small window to grab the socket lock.

As a result, the release_sock()/lock_sock() pair in the while loop of
tcp_splice_read() may cause a race condition.

Fixes: 9c55e01c0cc8 ("[TCP]: Splice receive support.")
Signed-off-by: sunyiqi <sunyiqixm@gmail.com>
---
 net/ipv4/tcp.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Paolo Abeni Aug. 15, 2024, 10 a.m. UTC | #1
On 8/15/24 10:43, sunyiqi wrote:
> When entering tcp_splice_read(), lock_sock() is called on sk to
> prevent other threads from acquiring the socket, and it is released
> before returning.
> 
> But in the while (tss.len) loop, it releases and re-locks sk, giving
> other threads a small window to grab the socket lock.
> 
> As a result, the release_sock()/lock_sock() pair in the while loop of
> tcp_splice_read() may cause a race condition.

Which race condition, exactly? Do you have a backtrace?

> 
> Fixes: 9c55e01c0cc8 ("[TCP]: Splice receive support.")
> Signed-off-by: sunyiqi <sunyiqixm@gmail.com>
> ---
>   net/ipv4/tcp.c | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e03a342c9162..7a2ce0e2e5be 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -856,8 +856,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
>   
>   		if (!tss.len || !timeo)
>   			break;
> -		release_sock(sk);
> -		lock_sock(sk);

This is needed to flush the sk backlog.

Somewhat related, I think we could replace the pair with sk_flush_backlog().
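Something along these lines, as a completely untested sketch:

		if (!tss.len || !timeo)
			break;
-		release_sock(sk);
-		lock_sock(sk);
+		sk_flush_backlog(sk);

IIRC sk_flush_backlog() only grabs the socket spinlock and runs
__release_sock() when sk->sk_backlog.tail is set, so it drains the
backlog without a full release_sock()/lock_sock() cycle.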

Thanks,

Paolo
Jason Xing Aug. 15, 2024, 10:48 a.m. UTC | #2
On Thu, Aug 15, 2024 at 5:46 PM sunyiqi <sunyiqixm@gmail.com> wrote:
>
> When entering tcp_splice_read(), lock_sock() is called on sk to
> prevent other threads from acquiring the socket, and it is released
> before returning.
>
> But in the while (tss.len) loop, it releases and re-locks sk, giving
> other threads a small window to grab the socket lock.
>
> As a result, the release_sock()/lock_sock() pair in the while loop of
> tcp_splice_read() may cause a race condition.
>
> Fixes: 9c55e01c0cc8 ("[TCP]: Splice receive support.")
> Signed-off-by: sunyiqi <sunyiqixm@gmail.com>

It's more of an optimization than a bug fix, no?

I don't consider it a bug unless you can prove it... Let me ask:
what kind of race could the release/re-lock cause?

I think holding the socket lock for too long is not a good idea,
because releasing the lock gives others some breathing room (a chance
to finish their own work). Please see release_sock().
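For reference, release_sock() does roughly the following (paraphrased
from net/core/sock.c, the current tree may differ slightly):

void release_sock(struct sock *sk)
{
	spin_lock_bh(&sk->sk_lock.slock);
	if (sk->sk_backlog.tail)
		__release_sock(sk);	/* process packets queued while the lock was owned */

	if (sk->sk_prot->release_cb)
		sk->sk_prot->release_cb(sk);

	sock_release_ownership(sk);
	if (waitqueue_active(&sk->sk_lock.wq))
		wake_up(&sk->sk_lock.wq);	/* wake up other lock_sock() waiters */
	spin_unlock_bh(&sk->sk_lock.slock);
}

So dropping and re-taking the lock in the loop both flushes the backlog
and lets other lock_sock() callers make progress.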

Thanks,
Jason

> ---
>  net/ipv4/tcp.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e03a342c9162..7a2ce0e2e5be 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -856,8 +856,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
>
>                 if (!tss.len || !timeo)
>                         break;
> -               release_sock(sk);
> -               lock_sock(sk);
>
>                 if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
>                     (sk->sk_shutdown & RCV_SHUTDOWN) ||
> --
> 2.34.1
>
>
Jason Xing Aug. 15, 2024, 10:55 a.m. UTC | #3
Hello Paolo,

On Thu, Aug 15, 2024 at 6:40 PM Paolo Abeni <pabeni@redhat.com> wrote:
[...]
> > -             release_sock(sk);
> > -             lock_sock(sk);
>
> This is needed to flush the sk backlog.
>
> Somewhat related, I think we could replace the pair with sk_flush_backlog().
>

Do you think we could do this like the following commit:

commit d41a69f1d390fa3f2546498103cdcd78b30676ff
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Apr 29 14:16:53 2016 -0700

    tcp: make tcp_sendmsg() aware of socket backlog

    Large sendmsg()/write() hold socket lock for the duration of the call,
    unless sk->sk_sndbuf limit is hit. This is bad because incoming packets
    are parked into socket backlog for a long time.

?

Then we can avoid holding the lock for too long, which results in too
many packets sitting in the backlog.
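i.e. something like the following in the tcp_splice_read() loop (not
even compile-tested, just to illustrate the idea):

		if (!tss.len || !timeo)
			break;

		/* Drain packets queued on the backlog while we held the
		 * lock, instead of a full release_sock()/lock_sock()
		 * round trip.
		 */
		sk_flush_backlog(sk);

		if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
		    (sk->sk_shutdown & RCV_SHUTDOWN) ||
		    signal_pending(current))
			break;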

Thanks,
Jason
Paolo Abeni Aug. 15, 2024, 11:22 a.m. UTC | #4
On 8/15/24 12:55, Jason Xing wrote:
> On Thu, Aug 15, 2024 at 6:40 PM Paolo Abeni <pabeni@redhat.com> wrote:
> [...]
>>> -             release_sock(sk);
>>> -             lock_sock(sk);
>>
>> This is needed to flush the sk backlog.
>>
>> Somewhat related, I think we could replace the pair with sk_flush_backlog().
>>
> 
> Do you think we could do this like the following commit:
> 
> commit d41a69f1d390fa3f2546498103cdcd78b30676ff
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Fri Apr 29 14:16:53 2016 -0700
> 
>      tcp: make tcp_sendmsg() aware of socket backlog
> 
>      Large sendmsg()/write() hold socket lock for the duration of the call,
>      unless sk->sk_sndbuf limit is hit. This is bad because incoming packets
>      are parked into socket backlog for a long time.
>
> ?

Yep. To be more accurate I was looking at commit 
93afcfd1db35882921b2521a637c78755c27b02c

In any case this should be unrelated to the supposed issue.

Cheers,

Paolo
Jason Xing Aug. 15, 2024, 11:34 a.m. UTC | #5
On Thu, Aug 15, 2024 at 7:23 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
>
>
> On 8/15/24 12:55, Jason Xing wrote:
> > On Thu, Aug 15, 2024 at 6:40 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > [...]
> >>> -             release_sock(sk);
> >>> -             lock_sock(sk);
> >>
> >> This is needed to flush the sk backlog.
> >>
> >> Somewhat related, I think we could replace the pair with sk_flush_backlog().
> >>
> >
> > Do you think we could do this like the following commit:
> >
> > commit d41a69f1d390fa3f2546498103cdcd78b30676ff
> > Author: Eric Dumazet <edumazet@google.com>
> > Date:   Fri Apr 29 14:16:53 2016 -0700
> >
> >      tcp: make tcp_sendmsg() aware of socket backlog
> >
> >      Large sendmsg()/write() hold socket lock for the duration of the call,
> >      unless sk->sk_sndbuf limit is hit. This is bad because incoming packets
> >      are parked into socket backlog for a long time.
> >
> > ?
>
> Yep. To be more accurate I was looking at commit
> 93afcfd1db35882921b2521a637c78755c27b02c

Thanks. That piques my interest. Now I do believe we can do such an
optimization in this function.

>
> In any case this should be unrelated to the supposed issue.

Sure.

Thanks,
Jason
diff mbox series

Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e03a342c9162..7a2ce0e2e5be 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -856,8 +856,6 @@  ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
 
 		if (!tss.len || !timeo)
 			break;
-		release_sock(sk);
-		lock_sock(sk);
 
 		if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
 		    (sk->sk_shutdown & RCV_SHUTDOWN) ||