Message ID: 20240815084330.166987-1-sunyiqixm@gmail.com (mailing list archive)
State: Changes Requested
Delegated to: Netdev Maintainers
Series: net: remove release/lock_sock in tcp_splice_read
On 8/15/24 10:43, sunyiqi wrote:
> When entering tcp_splice_read, tcp_splice_read calls lock_sock on sk
> in order to prevent other threads from acquiring sk, and releases it
> before returning.
>
> But in the while (tss.len) loop, it releases and re-locks sk, giving
> other threads a small window in which to lock sk.
>
> As a result, the release_sock/lock_sock pair in the while loop in
> tcp_splice_read may cause a race condition.

Which race condition exactly? Do you have a backtrace?

> Fixes: 9c55e01c0cc8 ("[TCP]: Splice receive support.")
> Signed-off-by: sunyiqi <sunyiqixm@gmail.com>
> ---
>  net/ipv4/tcp.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e03a342c9162..7a2ce0e2e5be 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -856,8 +856,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
>
>  		if (!tss.len || !timeo)
>  			break;
> -		release_sock(sk);
> -		lock_sock(sk);

This is needed to flush the sk backlog.

Somewhat related, I think we could replace the pair with sk_flush_backlog().

Thanks,

Paolo
On Thu, Aug 15, 2024 at 5:46 PM sunyiqi <sunyiqixm@gmail.com> wrote:
>
> When entering tcp_splice_read, tcp_splice_read calls lock_sock on sk
> in order to prevent other threads from acquiring sk, and releases it
> before returning.
>
> But in the while (tss.len) loop, it releases and re-locks sk, giving
> other threads a small window in which to lock sk.
>
> As a result, the release_sock/lock_sock pair in the while loop in
> tcp_splice_read may cause a race condition.
>
> Fixes: 9c55e01c0cc8 ("[TCP]: Splice receive support.")
> Signed-off-by: sunyiqi <sunyiqixm@gmail.com>

It's more of an optimization than a bug fix, no? I don't consider it a
bug unless you can prove it... Let me ask: what kind of race issue could
the re-lock cause?

I think holding the socket lock for too long is not a good idea, because
releasing the lock gives others some breathing room (a chance to finish
their own work). Please see release_sock().

Thanks,
Jason

> ---
>  net/ipv4/tcp.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e03a342c9162..7a2ce0e2e5be 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -856,8 +856,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
>
>  		if (!tss.len || !timeo)
>  			break;
> -		release_sock(sk);
> -		lock_sock(sk);
>
>  		if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
>  		    (sk->sk_shutdown & RCV_SHUTDOWN) ||
> --
> 2.34.1
>
>
Hello Paolo,

On Thu, Aug 15, 2024 at 6:40 PM Paolo Abeni <pabeni@redhat.com> wrote:
[...]
> > -		release_sock(sk);
> > -		lock_sock(sk);
>
> This is needed to flush the sk backlog.
>
> Somewhat related, I think we could replace the pair with sk_flush_backlog().
>

Do you think we could do this like the following commit:

commit d41a69f1d390fa3f2546498103cdcd78b30676ff
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Apr 29 14:16:53 2016 -0700

    tcp: make tcp_sendmsg() aware of socket backlog

    Large sendmsg()/write() hold socket lock for the duration of the call,
    unless sk->sk_sndbuf limit is hit. This is bad because incoming packets
    are parked into socket backlog for a long time.

?

Then we can avoid holding the lock for too long, which results in too
many packets piling up in the backlog.

Thanks,
Jason
On 8/15/24 12:55, Jason Xing wrote:
> On Thu, Aug 15, 2024 at 6:40 PM Paolo Abeni <pabeni@redhat.com> wrote:
> [...]
>>> -		release_sock(sk);
>>> -		lock_sock(sk);
>>
>> This is needed to flush the sk backlog.
>>
>> Somewhat related, I think we could replace the pair with sk_flush_backlog().
>>
>
> Do you think we could do this like the following commit:
>
> commit d41a69f1d390fa3f2546498103cdcd78b30676ff
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Fri Apr 29 14:16:53 2016 -0700
>
>     tcp: make tcp_sendmsg() aware of socket backlog
>
>     Large sendmsg()/write() hold socket lock for the duration of the call,
>     unless sk->sk_sndbuf limit is hit. This is bad because incoming packets
>     are parked into socket backlog for a long time.
>
> ?

Yep. To be more accurate, I was looking at commit
93afcfd1db35882921b2521a637c78755c27b02c.

In any case, this should be unrelated to the supposed issue.

Cheers,

Paolo
On Thu, Aug 15, 2024 at 7:23 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 8/15/24 12:55, Jason Xing wrote:
> > On Thu, Aug 15, 2024 at 6:40 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > [...]
> >>> -		release_sock(sk);
> >>> -		lock_sock(sk);
> >>
> >> This is needed to flush the sk backlog.
> >>
> >> Somewhat related, I think we could replace the pair with sk_flush_backlog().
> >>
> >
> > Do you think we could do this like the following commit:
> >
> > commit d41a69f1d390fa3f2546498103cdcd78b30676ff
> > Author: Eric Dumazet <edumazet@google.com>
> > Date:   Fri Apr 29 14:16:53 2016 -0700
> >
> >     tcp: make tcp_sendmsg() aware of socket backlog
> >
> >     Large sendmsg()/write() hold socket lock for the duration of the call,
> >     unless sk->sk_sndbuf limit is hit. This is bad because incoming packets
> >     are parked into socket backlog for a long time.
> >
> > ?
>
> Yep. To be more accurate, I was looking at commit
> 93afcfd1db35882921b2521a637c78755c27b02c.

Thanks. That piques my interest. Now I do believe we can do such an
optimization in this function.

>
> In any case, this should be unrelated to the supposed issue.

Sure.

Thanks,
Jason
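[Editor's note: an untested sketch of what Paolo's sk_flush_backlog()
suggestion might look like in tcp_splice_read(); this is a hypothetical
illustration, not a posted patch, and assumes sk_flush_backlog() is
usable here the way tcp_sendmsg() uses it.]

```diff
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,

 		if (!tss.len || !timeo)
 			break;
-		release_sock(sk);
-		lock_sock(sk);
+		/* drain packets parked on the backlog without dropping
+		 * socket ownership */
+		sk_flush_backlog(sk);
```

One behavioral difference worth evaluating: sk_flush_backlog() processes
the backlog while keeping socket ownership, so unlike the
release_sock()/lock_sock() pair it does not give other lock_sock()
waiters a chance to run in between.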
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e03a342c9162..7a2ce0e2e5be 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -856,8 +856,6 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,

 		if (!tss.len || !timeo)
 			break;
-		release_sock(sk);
-		lock_sock(sk);

 		if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
 		    (sk->sk_shutdown & RCV_SHUTDOWN) ||
When entering tcp_splice_read, tcp_splice_read calls lock_sock on sk in
order to prevent other threads from acquiring sk, and releases it before
returning.

But in the while (tss.len) loop, it releases and re-locks sk, giving
other threads a small window in which to lock sk.

As a result, the release_sock/lock_sock pair in the while loop in
tcp_splice_read may cause a race condition.

Fixes: 9c55e01c0cc8 ("[TCP]: Splice receive support.")
Signed-off-by: sunyiqi <sunyiqixm@gmail.com>
---
 net/ipv4/tcp.c | 2 --
 1 file changed, 2 deletions(-)