diff mbox series

[v2,net-next,03/11] af_unix: Don't retry after unix_state_lock_nested() in unix_stream_connect().

Message ID 20240611222905.34695-4-kuniyu@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series af_unix: Remove spin_lock_nested() and convert to lock_cmp_fn. | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 856 this patch: 856
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 854 this patch: 854
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 860 this patch: 860
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 52 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-06-14--06-00 (tests: 647)

Commit Message

Kuniyuki Iwashima June 11, 2024, 10:28 p.m. UTC
When a SOCK_(STREAM|SEQPACKET) socket connect()s to another one, we need
to lock the two sockets to check their states in unix_stream_connect().

We use unix_state_lock() for the server and unix_state_lock_nested() for
client with tricky sk->sk_state check to avoid deadlock.

The possible deadlock scenario are the following:

  1) Self connect()
  2) Simultaneous connect()

The former is simple, attempt to grab the same lock, and the latter is
AB-BA deadlock.

After the server's unix_state_lock(), we check the server socket's state,
and if it's not TCP_LISTEN, connect() fails with -EINVAL.

Then, we avoid the former deadlock by checking the client's state before
unix_state_lock_nested().  If its state is not TCP_LISTEN, we can make
sure that the client and the server are not identical based on the state.

Also, the latter deadlock can be avoided in the same way.  Due to the
server sk->sk_state requirement, AB-BA deadlock could happen only with
TCP_LISTEN sockets.  So, if the client's state is TCP_LISTEN, we can
give up the second lock to avoid the deadlock.

  CPU 1                 CPU 2                  CPU 3
  connect(A -> B)       connect(B -> A)        listen(A)
  ---                   ---                    ---
  unix_state_lock(B)
  B->sk_state == TCP_LISTEN
  READ_ONCE(A->sk_state) == TCP_CLOSE
                            ^^^^^^^^^
                            ok, will lock A    unix_state_lock(A)
             .--------------'                  WRITE_ONCE(A->sk_state, TCP_LISTEN)
             |                                 unix_state_unlock(A)
             |
             |          unix_state_lock(A)
             |          A->sk_sk_state == TCP_LISTEN
             |          READ_ONCE(B->sk_state) == TCP_LISTEN
             v                                    ^^^^^^^^^^
  unix_state_lock_nested(A)                       Don't lock B !!

Currently, while checking the client's state, we also check if it's
TCP_ESTABLISHED, but this is unlikely and can be checked after we know
the state is not TCP_CLOSE.

Moreover, if it happens after the second lock, we now jump to the restart
label, but it's unlikely that the server is not found during the retry,
so the jump is mostly to revist the client state check.

Let's remove the retry logic and check the state against TCP_CLOSE first.

Note that sk->sk_state does not change once it's changed from TCP_CLOSE,
so READ_ONCE() is not needed in the second state read in the first check.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 34 ++++++++--------------------------
 1 file changed, 8 insertions(+), 26 deletions(-)

Comments

Paolo Abeni June 14, 2024, 10:49 a.m. UTC | #1
On Tue, 2024-06-11 at 15:28 -0700, Kuniyuki Iwashima wrote:
> When a SOCK_(STREAM|SEQPACKET) socket connect()s to another one, we need
> to lock the two sockets to check their states in unix_stream_connect().
> 
> We use unix_state_lock() for the server and unix_state_lock_nested() for
> client with tricky sk->sk_state check to avoid deadlock.
> 
> The possible deadlock scenario are the following:
> 
>   1) Self connect()
>   2) Simultaneous connect()
> 
> The former is simple, attempt to grab the same lock, and the latter is
> AB-BA deadlock.
> 
> After the server's unix_state_lock(), we check the server socket's state,
> and if it's not TCP_LISTEN, connect() fails with -EINVAL.
> 
> Then, we avoid the former deadlock by checking the client's state before
> unix_state_lock_nested().  If its state is not TCP_LISTEN, we can make
> sure that the client and the server are not identical based on the state.
> 
> Also, the latter deadlock can be avoided in the same way.  Due to the
> server sk->sk_state requirement, AB-BA deadlock could happen only with
> TCP_LISTEN sockets.  So, if the client's state is TCP_LISTEN, we can
> give up the second lock to avoid the deadlock.
> 
>   CPU 1                 CPU 2                  CPU 3
>   connect(A -> B)       connect(B -> A)        listen(A)
>   ---                   ---                    ---
>   unix_state_lock(B)
>   B->sk_state == TCP_LISTEN
>   READ_ONCE(A->sk_state) == TCP_CLOSE
>                             ^^^^^^^^^
>                             ok, will lock A    unix_state_lock(A)
>              .--------------'                  WRITE_ONCE(A->sk_state, TCP_LISTEN)
>              |                                 unix_state_unlock(A)
>              |
>              |          unix_state_lock(A)
>              |          A->sk_sk_state == TCP_LISTEN
>              |          READ_ONCE(B->sk_state) == TCP_LISTEN
>              v                                    ^^^^^^^^^^
>   unix_state_lock_nested(A)                       Don't lock B !!
> 
> Currently, while checking the client's state, we also check if it's
> TCP_ESTABLISHED, but this is unlikely and can be checked after we know
> the state is not TCP_CLOSE.
> 
> Moreover, if it happens after the second lock, we now jump to the restart
> label, but it's unlikely that the server is not found during the retry,
> so the jump is mostly to revist the client state check.
> 
> Let's remove the retry logic and check the state against TCP_CLOSE first.
> 
> Note that sk->sk_state does not change once it's changed from TCP_CLOSE,
> so READ_ONCE() is not needed in the second state read in the first check.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>  net/unix/af_unix.c | 34 ++++++++--------------------------
>  1 file changed, 8 insertions(+), 26 deletions(-)
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index c09bf2b03582..a6dc8bb360ca 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1546,7 +1546,6 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
>  		goto out;
>  	}
>  
> -	/* Latch state of peer */
>  	unix_state_lock(other);
>  
>  	/* Apparently VFS overslept socket death. Retry. */
> @@ -1576,37 +1575,20 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
>  		goto restart;
>  	}
>  
> -	/* Latch our state.
> -
> -	   It is tricky place. We need to grab our state lock and cannot
> -	   drop lock on peer. It is dangerous because deadlock is
> -	   possible. Connect to self case and simultaneous
> -	   attempt to connect are eliminated by checking socket
> -	   state. other is TCP_LISTEN, if sk is TCP_LISTEN we
> -	   check this before attempt to grab lock.
> -
> -	   Well, and we have to recheck the state after socket locked.
> +	/* self connect and simultaneous connect are eliminated
> +	 * by rejecting TCP_LISTEN socket to avoid deadlock.
>  	 */
> -	switch (READ_ONCE(sk->sk_state)) {
> -	case TCP_CLOSE:
> -		/* This is ok... continue with connect */
> -		break;
> -	case TCP_ESTABLISHED:
> -		/* Socket is already connected */
> -		err = -EISCONN;
> -		goto out_unlock;
> -	default:
> -		err = -EINVAL;
> +	if (unlikely(READ_ONCE(sk->sk_state) != TCP_CLOSE)) {
> +		err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EINVAL;

I find the mixed READ_ONCE()/plain read confusing. What about using a
single READ_ONCE() caching the return value?

>  		goto out_unlock;
>  	}
>  
>  	unix_state_lock_nested(sk, U_LOCK_SECOND);
>  
> -	if (sk->sk_state != TCP_CLOSE) {
> -		unix_state_unlock(sk);
> -		unix_state_unlock(other);
> -		sock_put(other);
> -		goto restart;
> +	if (unlikely(sk->sk_state != TCP_CLOSE)) {
> +		err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EINVAL;
> +		unix_state_lock(sk);

Should likely be:
		unix_state_unlock(sk)
?



Thanks!

Paolo
Kuniyuki Iwashima June 14, 2024, 6:53 p.m. UTC | #2
From: Paolo Abeni <pabeni@redhat.com>
Date: Fri, 14 Jun 2024 12:49:35 +0200
> On Tue, 2024-06-11 at 15:28 -0700, Kuniyuki Iwashima wrote:
> > When a SOCK_(STREAM|SEQPACKET) socket connect()s to another one, we need
> > to lock the two sockets to check their states in unix_stream_connect().
> > 
> > We use unix_state_lock() for the server and unix_state_lock_nested() for
> > client with tricky sk->sk_state check to avoid deadlock.
> > 
> > The possible deadlock scenario are the following:
> > 
> >   1) Self connect()
> >   2) Simultaneous connect()
> > 
> > The former is simple, attempt to grab the same lock, and the latter is
> > AB-BA deadlock.
> > 
> > After the server's unix_state_lock(), we check the server socket's state,
> > and if it's not TCP_LISTEN, connect() fails with -EINVAL.
> > 
> > Then, we avoid the former deadlock by checking the client's state before
> > unix_state_lock_nested().  If its state is not TCP_LISTEN, we can make
> > sure that the client and the server are not identical based on the state.
> > 
> > Also, the latter deadlock can be avoided in the same way.  Due to the
> > server sk->sk_state requirement, AB-BA deadlock could happen only with
> > TCP_LISTEN sockets.  So, if the client's state is TCP_LISTEN, we can
> > give up the second lock to avoid the deadlock.
> > 
> >   CPU 1                 CPU 2                  CPU 3
> >   connect(A -> B)       connect(B -> A)        listen(A)
> >   ---                   ---                    ---
> >   unix_state_lock(B)
> >   B->sk_state == TCP_LISTEN
> >   READ_ONCE(A->sk_state) == TCP_CLOSE
> >                             ^^^^^^^^^
> >                             ok, will lock A    unix_state_lock(A)
> >              .--------------'                  WRITE_ONCE(A->sk_state, TCP_LISTEN)
> >              |                                 unix_state_unlock(A)
> >              |
> >              |          unix_state_lock(A)
> >              |          A->sk_sk_state == TCP_LISTEN
> >              |          READ_ONCE(B->sk_state) == TCP_LISTEN
> >              v                                    ^^^^^^^^^^
> >   unix_state_lock_nested(A)                       Don't lock B !!
> > 
> > Currently, while checking the client's state, we also check if it's
> > TCP_ESTABLISHED, but this is unlikely and can be checked after we know
> > the state is not TCP_CLOSE.
> > 
> > Moreover, if it happens after the second lock, we now jump to the restart
> > label, but it's unlikely that the server is not found during the retry,
> > so the jump is mostly to revist the client state check.
> > 
> > Let's remove the retry logic and check the state against TCP_CLOSE first.
> > 
> > Note that sk->sk_state does not change once it's changed from TCP_CLOSE,
> > so READ_ONCE() is not needed in the second state read in the first check.
> > 
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > ---
> >  net/unix/af_unix.c | 34 ++++++++--------------------------
> >  1 file changed, 8 insertions(+), 26 deletions(-)
> > 
> > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> > index c09bf2b03582..a6dc8bb360ca 100644
> > --- a/net/unix/af_unix.c
> > +++ b/net/unix/af_unix.c
> > @@ -1546,7 +1546,6 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> >  		goto out;
> >  	}
> >  
> > -	/* Latch state of peer */
> >  	unix_state_lock(other);
> >  
> >  	/* Apparently VFS overslept socket death. Retry. */
> > @@ -1576,37 +1575,20 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> >  		goto restart;
> >  	}
> >  
> > -	/* Latch our state.
> > -
> > -	   It is tricky place. We need to grab our state lock and cannot
> > -	   drop lock on peer. It is dangerous because deadlock is
> > -	   possible. Connect to self case and simultaneous
> > -	   attempt to connect are eliminated by checking socket
> > -	   state. other is TCP_LISTEN, if sk is TCP_LISTEN we
> > -	   check this before attempt to grab lock.
> > -
> > -	   Well, and we have to recheck the state after socket locked.
> > +	/* self connect and simultaneous connect are eliminated
> > +	 * by rejecting TCP_LISTEN socket to avoid deadlock.
> >  	 */
> > -	switch (READ_ONCE(sk->sk_state)) {
> > -	case TCP_CLOSE:
> > -		/* This is ok... continue with connect */
> > -		break;
> > -	case TCP_ESTABLISHED:
> > -		/* Socket is already connected */
> > -		err = -EISCONN;
> > -		goto out_unlock;
> > -	default:
> > -		err = -EINVAL;
> > +	if (unlikely(READ_ONCE(sk->sk_state) != TCP_CLOSE)) {
> > +		err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EINVAL;
> 
> I find the mixed READ_ONCE()/plain read confusing. What about using a
> single READ_ONCE() caching the return value?

Will use cached sk_state.

> 
> >  		goto out_unlock;
> >  	}
> >  
> >  	unix_state_lock_nested(sk, U_LOCK_SECOND);
> >  
> > -	if (sk->sk_state != TCP_CLOSE) {
> > -		unix_state_unlock(sk);
> > -		unix_state_unlock(other);
> > -		sock_put(other);
> > -		goto restart;
> > +	if (unlikely(sk->sk_state != TCP_CLOSE)) {
> > +		err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EINVAL;
> > +		unix_state_lock(sk);
> 
> Should likely be:
> 		unix_state_unlock(sk)
> ?

Oops, will fix it.

Thanks!
diff mbox series

Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index c09bf2b03582..a6dc8bb360ca 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1546,7 +1546,6 @@  static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 		goto out;
 	}
 
-	/* Latch state of peer */
 	unix_state_lock(other);
 
 	/* Apparently VFS overslept socket death. Retry. */
@@ -1576,37 +1575,20 @@  static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 		goto restart;
 	}
 
-	/* Latch our state.
-
-	   It is tricky place. We need to grab our state lock and cannot
-	   drop lock on peer. It is dangerous because deadlock is
-	   possible. Connect to self case and simultaneous
-	   attempt to connect are eliminated by checking socket
-	   state. other is TCP_LISTEN, if sk is TCP_LISTEN we
-	   check this before attempt to grab lock.
-
-	   Well, and we have to recheck the state after socket locked.
+	/* self connect and simultaneous connect are eliminated
+	 * by rejecting TCP_LISTEN socket to avoid deadlock.
 	 */
-	switch (READ_ONCE(sk->sk_state)) {
-	case TCP_CLOSE:
-		/* This is ok... continue with connect */
-		break;
-	case TCP_ESTABLISHED:
-		/* Socket is already connected */
-		err = -EISCONN;
-		goto out_unlock;
-	default:
-		err = -EINVAL;
+	if (unlikely(READ_ONCE(sk->sk_state) != TCP_CLOSE)) {
+		err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EINVAL;
 		goto out_unlock;
 	}
 
 	unix_state_lock_nested(sk, U_LOCK_SECOND);
 
-	if (sk->sk_state != TCP_CLOSE) {
-		unix_state_unlock(sk);
-		unix_state_unlock(other);
-		sock_put(other);
-		goto restart;
+	if (unlikely(sk->sk_state != TCP_CLOSE)) {
+		err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EINVAL;
+		unix_state_lock(sk);
+		goto out_unlock;
 	}
 
 	err = security_unix_stream_connect(sk, other, newsk);