diff mbox series

[net] unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()

Message ID 38a920a7-cfba-7929-886d-c3c6effc0c43@ya.ru (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg() | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 9 this patch: 9
netdev/cc_maintainers warning 1 maintainers not CCed: kuniyu@amazon.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 9 this patch: 9
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 22 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Kirill Tkhai Nov. 19, 2022, 11:16 p.m. UTC
There is a race resulting in alive SOCK_SEQPACKET socket
may change its state from TCP_ESTABLISHED to TCP_CLOSE:

unix_release_sock(peer)                  unix_dgram_sendmsg(sk)
  sock_orphan(peer)
    sock_set_flag(peer, SOCK_DEAD)
                                           sock_alloc_send_pskb()
                                             if !(sk->sk_shutdown & SEND_SHUTDOWN)
                                               OK
                                           if sock_flag(peer, SOCK_DEAD)
                                             sk->sk_state = TCP_CLOSE
  sk->sk_shutdown = SHUTDOWN_MASK


After that socket sk remains almost normal: it is able to connect, listen, accept
and recvmsg, while it can't sendmsg.

Since this is the only possibility for alive SOCK_SEQPACKET to change
the state in such way, we should better fix this strange and potentially
danger corner case.

Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock.

Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
---
 net/unix/af_unix.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Kuniyuki Iwashima Nov. 20, 2022, 9:09 a.m. UTC | #1
From:   Kirill Tkhai <tkhai@ya.ru>
Date:   Sun, 20 Nov 2022 02:16:47 +0300
> There is a race resulting in alive SOCK_SEQPACKET socket
> may change its state from TCP_ESTABLISHED to TCP_CLOSE:
> 
> unix_release_sock(peer)                  unix_dgram_sendmsg(sk)
>   sock_orphan(peer)
>     sock_set_flag(peer, SOCK_DEAD)
>                                            sock_alloc_send_pskb()
>                                              if !(sk->sk_shutdown & SEND_SHUTDOWN)
>                                                OK
>                                            if sock_flag(peer, SOCK_DEAD)
>                                              sk->sk_state = TCP_CLOSE
>   sk->sk_shutdown = SHUTDOWN_MASK
> 
> 
> After that socket sk remains almost normal: it is able to connect, listen, accept
> and recvmsg, while it can't sendmsg.

nit: Then, also recvmsg() fails with -ENOTCONN.  And after connect(), even
both of recvmsg() and sendmsg() does not fail.

static int unix_seqpacket_recvmsg(struct socket *sock, struct msghdr *msg,
				  size_t size, int flags)
{
	struct sock *sk = sock->sk;

	if (sk->sk_state != TCP_ESTABLISHED)
		return -ENOTCONN;

	return unix_dgram_recvmsg(sock, msg, size, flags);
}


> 
> Since this is the only possibility for alive SOCK_SEQPACKET to change
> the state in such way, we should better fix this strange and potentially
> danger corner case.
> 
> Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock.
> 
> Signed-off-by: Kirill Tkhai <tkhai@ya.ru>

Fixes tag is needed:

Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")

Before this commit, there was no state change and SEQPACKET sk also went
through the same path.  The bug was introduced because the commit did not
consider SEAPACKET.

So, I think the fix should be like below, then we can free the peer faster.
Note unix_dgram_peer_wake_disconnect_wakeup() is dgram-specific too.

---8<---
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index b3545fc68097..be40023a61fb 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2001,11 +2001,14 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 		err = 0;
 		if (unix_peer(sk) == other) {
 			unix_peer(sk) = NULL;
-			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
+
+			if (sk->sk_type == SOCK_DGRAM) {
+				unix_dgram_peer_wake_disconnect_wakeup(sk, other);
+				sk->sk_state = TCP_CLOSE;
+			}
 
 			unix_state_unlock(sk);
 
-			sk->sk_state = TCP_CLOSE;
 			unix_dgram_disconnected(sk, other);
 			sock_put(other);
 			err = -ECONNREFUSED;
---8<---

Also, it's better to mention that moving TCP_CLOSE under the lock resolves
another rare race with unix_dgram_connect() for DGRAM sk:

  unix_state_unlock(sk);
  <--------------------------> connect() could set TCP_ESTABLISHED here.
  sk->sk_state = TCP_CLOSE;


Thank you!


> ---
>  net/unix/af_unix.c |   11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index b3545fc68097..6fd745cfc492 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1999,13 +1999,20 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
>  			unix_state_lock(sk);
>  
>  		err = 0;
> -		if (unix_peer(sk) == other) {
> +		if (sk->sk_type == SOCK_SEQPACKET) {
> +			/* We are here only when racing with unix_release_sock()
> +			 * is clearing @other. Never change state to TCP_CLOSE
> +			 * unlike SOCK_DGRAM wants.
> +			 */
> +			unix_state_unlock(sk);
> +			err = -EPIPE;
> +		} else if (unix_peer(sk) == other) {
>  			unix_peer(sk) = NULL;
>  			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
>  
> +			sk->sk_state = TCP_CLOSE;
>  			unix_state_unlock(sk);
>  
> -			sk->sk_state = TCP_CLOSE;
>  			unix_dgram_disconnected(sk, other);
>  			sock_put(other);
>  			err = -ECONNREFUSED;
Kirill Tkhai Nov. 20, 2022, 9:46 a.m. UTC | #2
On 20.11.2022 12:09, Kuniyuki Iwashima wrote:
> From:   Kirill Tkhai <tkhai@ya.ru>
> Date:   Sun, 20 Nov 2022 02:16:47 +0300
>> There is a race resulting in alive SOCK_SEQPACKET socket
>> may change its state from TCP_ESTABLISHED to TCP_CLOSE:
>>
>> unix_release_sock(peer)                  unix_dgram_sendmsg(sk)
>>   sock_orphan(peer)
>>     sock_set_flag(peer, SOCK_DEAD)
>>                                            sock_alloc_send_pskb()
>>                                              if !(sk->sk_shutdown & SEND_SHUTDOWN)
>>                                                OK
>>                                            if sock_flag(peer, SOCK_DEAD)
>>                                              sk->sk_state = TCP_CLOSE
>>   sk->sk_shutdown = SHUTDOWN_MASK
>>
>>
>> After that socket sk remains almost normal: it is able to connect, listen, accept
>> and recvmsg, while it can't sendmsg.
> 
> nit: Then, also recvmsg() fails with -ENOTCONN.  And after connect(), even
> both of recvmsg() and sendmsg() does not fail.

Possible, I wrote not clear. I mean sendmsg fails after connect, while recvmsg does not fail :)
Here is sendmsg abort path:

unix_dgram_sendmsg()
  sock_alloc_send_pskb()
    if (sk->sk_shutdown & SEND_SHUTDOWN)
      FAIL

> static int unix_seqpacket_recvmsg(struct socket *sock, struct msghdr *msg,
> 				  size_t size, int flags)
> {
> 	struct sock *sk = sock->sk;
> 
> 	if (sk->sk_state != TCP_ESTABLISHED)
> 		return -ENOTCONN;
> 
> 	return unix_dgram_recvmsg(sock, msg, size, flags);
> }
> 
> 
>>
>> Since this is the only possibility for alive SOCK_SEQPACKET to change
>> the state in such way, we should better fix this strange and potentially
>> danger corner case.
>>
>> Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock.
>>
>> Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
> 
> Fixes tag is needed:
> 
> Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
>> Before this commit, there was no state change and SEQPACKET sk also went
> through the same path.  The bug was introduced because the commit did not
> consider SEAPACKET.
> 
> So, I think the fix should be like below, then we can free the peer faster.
> Note unix_dgram_peer_wake_disconnect_wakeup() is dgram-specific too.
>
> ---8<---
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index b3545fc68097..be40023a61fb 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -2001,11 +2001,14 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
>  		err = 0;
>  		if (unix_peer(sk) == other) {
>  			unix_peer(sk) = NULL;
> -			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
> +
> +			if (sk->sk_type == SOCK_DGRAM) {
> +				unix_dgram_peer_wake_disconnect_wakeup(sk, other);
> +				sk->sk_state = TCP_CLOSE;
> +			}
>  
>  			unix_state_unlock(sk);
>  
> -			sk->sk_state = TCP_CLOSE;
>  			unix_dgram_disconnected(sk, other);
>  			sock_put(other);
>  			err = -ECONNREFUSED;
> ---8<---
> 
> Also, it's better to mention that moving TCP_CLOSE under the lock resolves
> another rare race with unix_dgram_connect() for DGRAM sk:
> 
>   unix_state_unlock(sk);
>   <--------------------------> connect() could set TCP_ESTABLISHED here.
>   sk->sk_state = TCP_CLOSE;

Sounds good, I'll test this variant. Thanks!

Kirill

>> ---
>>  net/unix/af_unix.c |   11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>> index b3545fc68097..6fd745cfc492 100644
>> --- a/net/unix/af_unix.c
>> +++ b/net/unix/af_unix.c
>> @@ -1999,13 +1999,20 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
>>  			unix_state_lock(sk);
>>  
>>  		err = 0;
>> -		if (unix_peer(sk) == other) {
>> +		if (sk->sk_type == SOCK_SEQPACKET) {
>> +			/* We are here only when racing with unix_release_sock()
>> +			 * is clearing @other. Never change state to TCP_CLOSE
>> +			 * unlike SOCK_DGRAM wants.
>> +			 */
>> +			unix_state_unlock(sk);
>> +			err = -EPIPE;
>> +		} else if (unix_peer(sk) == other) {
>>  			unix_peer(sk) = NULL;
>>  			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
>>  
>> +			sk->sk_state = TCP_CLOSE;
>>  			unix_state_unlock(sk);
>>  
>> -			sk->sk_state = TCP_CLOSE;
>>  			unix_dgram_disconnected(sk, other);
>>  			sock_put(other);
>>  			err = -ECONNREFUSED;
Kirill Tkhai Nov. 20, 2022, 11:43 a.m. UTC | #3
On 20.11.2022 12:46, Kirill Tkhai wrote:
> On 20.11.2022 12:09, Kuniyuki Iwashima wrote:
>> From:   Kirill Tkhai <tkhai@ya.ru>
>> Date:   Sun, 20 Nov 2022 02:16:47 +0300
>>> There is a race resulting in alive SOCK_SEQPACKET socket
>>> may change its state from TCP_ESTABLISHED to TCP_CLOSE:
>>>
>>> unix_release_sock(peer)                  unix_dgram_sendmsg(sk)
>>>   sock_orphan(peer)
>>>     sock_set_flag(peer, SOCK_DEAD)
>>>                                            sock_alloc_send_pskb()
>>>                                              if !(sk->sk_shutdown & SEND_SHUTDOWN)
>>>                                                OK
>>>                                            if sock_flag(peer, SOCK_DEAD)
>>>                                              sk->sk_state = TCP_CLOSE
>>>   sk->sk_shutdown = SHUTDOWN_MASK
>>>
>>>
>>> After that socket sk remains almost normal: it is able to connect, listen, accept
>>> and recvmsg, while it can't sendmsg.
>>
>> nit: Then, also recvmsg() fails with -ENOTCONN.  And after connect(), even
>> both of recvmsg() and sendmsg() does not fail.
> 
> Possible, I wrote not clear. I mean sendmsg fails after connect, while recvmsg does not fail :)
> Here is sendmsg abort path:
> 
> unix_dgram_sendmsg()
>   sock_alloc_send_pskb()
>     if (sk->sk_shutdown & SEND_SHUTDOWN)
>       FAIL
> 
>> static int unix_seqpacket_recvmsg(struct socket *sock, struct msghdr *msg,
>> 				  size_t size, int flags)
>> {
>> 	struct sock *sk = sock->sk;
>>
>> 	if (sk->sk_state != TCP_ESTABLISHED)
>> 		return -ENOTCONN;
>>
>> 	return unix_dgram_recvmsg(sock, msg, size, flags);
>> }
>>
>>
>>>
>>> Since this is the only possibility for alive SOCK_SEQPACKET to change
>>> the state in such way, we should better fix this strange and potentially
>>> danger corner case.
>>>
>>> Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock.
>>>
>>> Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
>>
>> Fixes tag is needed:
>>
>> Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
>>> Before this commit, there was no state change and SEQPACKET sk also went
>> through the same path.  The bug was introduced because the commit did not
>> consider SEAPACKET.
>>
>> So, I think the fix should be like below, then we can free the peer faster.
>> Note unix_dgram_peer_wake_disconnect_wakeup() is dgram-specific too.
>>
>> ---8<---
>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>> index b3545fc68097..be40023a61fb 100644
>> --- a/net/unix/af_unix.c
>> +++ b/net/unix/af_unix.c
>> @@ -2001,11 +2001,14 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
>>  		err = 0;
>>  		if (unix_peer(sk) == other) {
>>  			unix_peer(sk) = NULL;
>> -			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
>> +
>> +			if (sk->sk_type == SOCK_DGRAM) {
>> +				unix_dgram_peer_wake_disconnect_wakeup(sk, other);
>> +				sk->sk_state = TCP_CLOSE;
>> +			}
>>  
>>  			unix_state_unlock(sk);
>>  
>> -			sk->sk_state = TCP_CLOSE;
>>  			unix_dgram_disconnected(sk, other);
>>  			sock_put(other);
>>  			err = -ECONNREFUSED;
>> ---8<---
>>
>> Also, it's better to mention that moving TCP_CLOSE under the lock resolves
>> another rare race with unix_dgram_connect() for DGRAM sk:
>>
>>   unix_state_unlock(sk);
>>   <--------------------------> connect() could set TCP_ESTABLISHED here.
>>   sk->sk_state = TCP_CLOSE;
> 
> Sounds good, I'll test this variant. Thanks!

Thinking again, I'd argue about disconnecting from peer (unix_peer(sk) = NULL) here.
Normally, unix_dgram_sendmsg() never came here like I wrote:

 unix_dgram_sendmsg()
   sock_alloc_send_pskb()
     if (sk->sk_shutdown & SEND_SHUTDOWN)
       FAIL with EPIPE

So, this optimization will work very rare, it fact there will not real performance profit.
But this creates a cornet case (safe but corner), which seems not good. Corner cases are not
good in general.

I'd leave an original variant. What do you think about this?

Kirill

>>> ---
>>>  net/unix/af_unix.c |   11 +++++++++--
>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>>> index b3545fc68097..6fd745cfc492 100644
>>> --- a/net/unix/af_unix.c
>>> +++ b/net/unix/af_unix.c
>>> @@ -1999,13 +1999,20 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
>>>  			unix_state_lock(sk);
>>>  
>>>  		err = 0;
>>> -		if (unix_peer(sk) == other) {
>>> +		if (sk->sk_type == SOCK_SEQPACKET) {
>>> +			/* We are here only when racing with unix_release_sock()
>>> +			 * is clearing @other. Never change state to TCP_CLOSE
>>> +			 * unlike SOCK_DGRAM wants.
>>> +			 */
>>> +			unix_state_unlock(sk);
>>> +			err = -EPIPE;
>>> +		} else if (unix_peer(sk) == other) {
>>>  			unix_peer(sk) = NULL;
>>>  			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
>>>  
>>> +			sk->sk_state = TCP_CLOSE;
>>>  			unix_state_unlock(sk);
>>>  
>>> -			sk->sk_state = TCP_CLOSE;
>>>  			unix_dgram_disconnected(sk, other);
>>>  			sock_put(other);
>>>  			err = -ECONNREFUSED;
>
Kuniyuki Iwashima Nov. 21, 2022, 5:16 p.m. UTC | #4
From:   Kirill Tkhai <tkhai@ya.ru>
Date:   Sun, 20 Nov 2022 14:43:06 +0300
> On 20.11.2022 12:46, Kirill Tkhai wrote:
> > On 20.11.2022 12:09, Kuniyuki Iwashima wrote:
> >> From:   Kirill Tkhai <tkhai@ya.ru>
> >> Date:   Sun, 20 Nov 2022 02:16:47 +0300
> >>> There is a race resulting in alive SOCK_SEQPACKET socket
> >>> may change its state from TCP_ESTABLISHED to TCP_CLOSE:
> >>>
> >>> unix_release_sock(peer)                  unix_dgram_sendmsg(sk)
> >>>   sock_orphan(peer)
> >>>     sock_set_flag(peer, SOCK_DEAD)
> >>>                                            sock_alloc_send_pskb()
> >>>                                              if !(sk->sk_shutdown & SEND_SHUTDOWN)
> >>>                                                OK
> >>>                                            if sock_flag(peer, SOCK_DEAD)
> >>>                                              sk->sk_state = TCP_CLOSE
> >>>   sk->sk_shutdown = SHUTDOWN_MASK
> >>>
> >>>
> >>> After that socket sk remains almost normal: it is able to connect, listen, accept
> >>> and recvmsg, while it can't sendmsg.
> >>
> >> nit: Then, also recvmsg() fails with -ENOTCONN.  And after connect(), even
> >> both of recvmsg() and sendmsg() does not fail.
> > 
> > Possible, I wrote not clear. I mean sendmsg fails after connect, while recvmsg does not fail :)
> > Here is sendmsg abort path:
> > 
> > unix_dgram_sendmsg()
> >   sock_alloc_send_pskb()
> >     if (sk->sk_shutdown & SEND_SHUTDOWN)
> >       FAIL
> > 
> >> static int unix_seqpacket_recvmsg(struct socket *sock, struct msghdr *msg,
> >> 				  size_t size, int flags)
> >> {
> >> 	struct sock *sk = sock->sk;
> >>
> >> 	if (sk->sk_state != TCP_ESTABLISHED)
> >> 		return -ENOTCONN;
> >>
> >> 	return unix_dgram_recvmsg(sock, msg, size, flags);
> >> }
> >>
> >>
> >>>
> >>> Since this is the only possibility for alive SOCK_SEQPACKET to change
> >>> the state in such way, we should better fix this strange and potentially
> >>> danger corner case.
> >>>
> >>> Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock.
> >>>
> >>> Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
> >>
> >> Fixes tag is needed:
> >>
> >> Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
> >>> Before this commit, there was no state change and SEQPACKET sk also went
> >> through the same path.  The bug was introduced because the commit did not
> >> consider SEAPACKET.
> >>
> >> So, I think the fix should be like below, then we can free the peer faster.
> >> Note unix_dgram_peer_wake_disconnect_wakeup() is dgram-specific too.
> >>
> >> ---8<---
> >> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> >> index b3545fc68097..be40023a61fb 100644
> >> --- a/net/unix/af_unix.c
> >> +++ b/net/unix/af_unix.c
> >> @@ -2001,11 +2001,14 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> >>  		err = 0;
> >>  		if (unix_peer(sk) == other) {
> >>  			unix_peer(sk) = NULL;
> >> -			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
> >> +
> >> +			if (sk->sk_type == SOCK_DGRAM) {
> >> +				unix_dgram_peer_wake_disconnect_wakeup(sk, other);
> >> +				sk->sk_state = TCP_CLOSE;
> >> +			}
> >>  
> >>  			unix_state_unlock(sk);
> >>  
> >> -			sk->sk_state = TCP_CLOSE;
> >>  			unix_dgram_disconnected(sk, other);
> >>  			sock_put(other);
> >>  			err = -ECONNREFUSED;
> >> ---8<---
> >>
> >> Also, it's better to mention that moving TCP_CLOSE under the lock resolves
> >> another rare race with unix_dgram_connect() for DGRAM sk:
> >>
> >>   unix_state_unlock(sk);
> >>   <--------------------------> connect() could set TCP_ESTABLISHED here.
> >>   sk->sk_state = TCP_CLOSE;
> > 
> > Sounds good, I'll test this variant. Thanks!
> 
> Thinking again, I'd argue about disconnecting from peer (unix_peer(sk) = NULL) here.
> Normally, unix_dgram_sendmsg() never came here like I wrote:
> 
>  unix_dgram_sendmsg()
>    sock_alloc_send_pskb()
>      if (sk->sk_shutdown & SEND_SHUTDOWN)
>        FAIL with EPIPE
> 
> So, this optimization will work very rare, it fact there will not real performance profit.
> But this creates a cornet case (safe but corner), which seems not good. Corner cases are not
> good in general.
> 
> I'd leave an original variant. What do you think about this?

I think there is no good reason to delay freeing unused memory, not only
sock_put(other), unix_dgram_disconnected() frees sk->sk_receive_queue.

And if peer is cleared, we need not call sock_alloc_send_pskb() and
return earlier with -ENOTCONN in the following sendmsg()s.  It's easy
to read because we need not dive into another layer implemented in
sock_alloc_send_pskb().

Also, at least, the code has been safe for decades, and if we don't
clear peer and sk_receive_queue, we have to check other places if
there are sanity checks for such cases.  IMHO, we should minimize the
risk if the patch is for -stable.

Thank you.


> 
> Kirill
> 
> >>> ---
> >>>  net/unix/af_unix.c |   11 +++++++++--
> >>>  1 file changed, 9 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> >>> index b3545fc68097..6fd745cfc492 100644
> >>> --- a/net/unix/af_unix.c
> >>> +++ b/net/unix/af_unix.c
> >>> @@ -1999,13 +1999,20 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
> >>>  			unix_state_lock(sk);
> >>>  
> >>>  		err = 0;
> >>> -		if (unix_peer(sk) == other) {
> >>> +		if (sk->sk_type == SOCK_SEQPACKET) {
> >>> +			/* We are here only when racing with unix_release_sock()
> >>> +			 * is clearing @other. Never change state to TCP_CLOSE
> >>> +			 * unlike SOCK_DGRAM wants.
> >>> +			 */
> >>> +			unix_state_unlock(sk);
> >>> +			err = -EPIPE;
> >>> +		} else if (unix_peer(sk) == other) {
> >>>  			unix_peer(sk) = NULL;
> >>>  			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
> >>>  
> >>> +			sk->sk_state = TCP_CLOSE;
> >>>  			unix_state_unlock(sk);
> >>>  
> >>> -			sk->sk_state = TCP_CLOSE;
> >>>  			unix_dgram_disconnected(sk, other);
> >>>  			sock_put(other);
> >>>  			err = -ECONNREFUSED;
Kirill Tkhai Dec. 3, 2022, 10:34 a.m. UTC | #5
On 21.11.2022 20:16, Kuniyuki Iwashima wrote:
> From:   Kirill Tkhai <tkhai@ya.ru>
> Date:   Sun, 20 Nov 2022 14:43:06 +0300
>> On 20.11.2022 12:46, Kirill Tkhai wrote:
>>> On 20.11.2022 12:09, Kuniyuki Iwashima wrote:
>>>> From:   Kirill Tkhai <tkhai@ya.ru>
>>>> Date:   Sun, 20 Nov 2022 02:16:47 +0300
>>>>> There is a race resulting in alive SOCK_SEQPACKET socket
>>>>> may change its state from TCP_ESTABLISHED to TCP_CLOSE:
>>>>>
>>>>> unix_release_sock(peer)                  unix_dgram_sendmsg(sk)
>>>>>   sock_orphan(peer)
>>>>>     sock_set_flag(peer, SOCK_DEAD)
>>>>>                                            sock_alloc_send_pskb()
>>>>>                                              if !(sk->sk_shutdown & SEND_SHUTDOWN)
>>>>>                                                OK
>>>>>                                            if sock_flag(peer, SOCK_DEAD)
>>>>>                                              sk->sk_state = TCP_CLOSE
>>>>>   sk->sk_shutdown = SHUTDOWN_MASK
>>>>>
>>>>>
>>>>> After that socket sk remains almost normal: it is able to connect, listen, accept
>>>>> and recvmsg, while it can't sendmsg.
>>>>
>>>> nit: Then, also recvmsg() fails with -ENOTCONN.  And after connect(), even
>>>> both of recvmsg() and sendmsg() does not fail.
>>>
>>> Possible, I wrote not clear. I mean sendmsg fails after connect, while recvmsg does not fail :)
>>> Here is sendmsg abort path:
>>>
>>> unix_dgram_sendmsg()
>>>   sock_alloc_send_pskb()
>>>     if (sk->sk_shutdown & SEND_SHUTDOWN)
>>>       FAIL
>>>
>>>> static int unix_seqpacket_recvmsg(struct socket *sock, struct msghdr *msg,
>>>> 				  size_t size, int flags)
>>>> {
>>>> 	struct sock *sk = sock->sk;
>>>>
>>>> 	if (sk->sk_state != TCP_ESTABLISHED)
>>>> 		return -ENOTCONN;
>>>>
>>>> 	return unix_dgram_recvmsg(sock, msg, size, flags);
>>>> }
>>>>
>>>>
>>>>>
>>>>> Since this is the only possibility for alive SOCK_SEQPACKET to change
>>>>> the state in such way, we should better fix this strange and potentially
>>>>> danger corner case.
>>>>>
>>>>> Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock.
>>>>>
>>>>> Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
>>>>
>>>> Fixes tag is needed:
>>>>
>>>> Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
>>>>> Before this commit, there was no state change and SEQPACKET sk also went
>>>> through the same path.  The bug was introduced because the commit did not
>>>> consider SEAPACKET.
>>>>
>>>> So, I think the fix should be like below, then we can free the peer faster.
>>>> Note unix_dgram_peer_wake_disconnect_wakeup() is dgram-specific too.
>>>>
>>>> ---8<---
>>>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
>>>> index b3545fc68097..be40023a61fb 100644
>>>> --- a/net/unix/af_unix.c
>>>> +++ b/net/unix/af_unix.c
>>>> @@ -2001,11 +2001,14 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
>>>>  		err = 0;
>>>>  		if (unix_peer(sk) == other) {
>>>>  			unix_peer(sk) = NULL;
>>>> -			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
>>>> +
>>>> +			if (sk->sk_type == SOCK_DGRAM) {
>>>> +				unix_dgram_peer_wake_disconnect_wakeup(sk, other);
>>>> +				sk->sk_state = TCP_CLOSE;
>>>> +			}
>>>>  
>>>>  			unix_state_unlock(sk);
>>>>  
>>>> -			sk->sk_state = TCP_CLOSE;
>>>>  			unix_dgram_disconnected(sk, other);
>>>>  			sock_put(other);
>>>>  			err = -ECONNREFUSED;
>>>> ---8<---
>>>>
>>>> Also, it's better to mention that moving TCP_CLOSE under the lock resolves
>>>> another rare race with unix_dgram_connect() for DGRAM sk:
>>>>
>>>>   unix_state_unlock(sk);
>>>>   <--------------------------> connect() could set TCP_ESTABLISHED here.
>>>>   sk->sk_state = TCP_CLOSE;
>>>
>>> Sounds good, I'll test this variant. Thanks!
>>
>> Thinking again, I'd argue about disconnecting from peer (unix_peer(sk) = NULL) here.
>> Normally, unix_dgram_sendmsg() never came here like I wrote:
>>
>>  unix_dgram_sendmsg()
>>    sock_alloc_send_pskb()
>>      if (sk->sk_shutdown & SEND_SHUTDOWN)
>>        FAIL with EPIPE
>>
>> So, this optimization will work very rare, it fact there will not real performance profit.
>> But this creates a cornet case (safe but corner), which seems not good. Corner cases are not
>> good in general.
>>
>> I'd leave an original variant. What do you think about this?
> 
> I think there is no good reason to delay freeing unused memory, not only
> sock_put(other), unix_dgram_disconnected() frees sk->sk_receive_queue.

Hm, it's not about freeing memory. This optimization will work almost never, because of the probability
to get into this is very small. This just adds additional race condition, which is bad from any side.

So, finally I don't like this, sorry.

Thanks for your opinion.
Kirill

> And if peer is cleared, we need not call sock_alloc_send_pskb() and
> return earlier with -ENOTCONN in the following sendmsg()s.  It's easy
> to read because we need not dive into another layer implemented in
> sock_alloc_send_pskb().
> 
> Also, at least, the code has been safe for decades, and if we don't
> clear peer and sk_receive_queue, we have to check other places if
> there are sanity checks for such cases.  IMHO, we should minimize the
> risk if the patch is for -stable.
diff mbox series

Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index b3545fc68097..6fd745cfc492 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1999,13 +1999,20 @@  static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 			unix_state_lock(sk);
 
 		err = 0;
-		if (unix_peer(sk) == other) {
+		if (sk->sk_type == SOCK_SEQPACKET) {
+			/* We are here only when racing with unix_release_sock()
+			 * is clearing @other. Never change state to TCP_CLOSE
+			 * unlike SOCK_DGRAM wants.
+			 */
+			unix_state_unlock(sk);
+			err = -EPIPE;
+		} else if (unix_peer(sk) == other) {
 			unix_peer(sk) = NULL;
 			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
 
+			sk->sk_state = TCP_CLOSE;
 			unix_state_unlock(sk);
 
-			sk->sk_state = TCP_CLOSE;
 			unix_dgram_disconnected(sk, other);
 			sock_put(other);
 			err = -ECONNREFUSED;