[net] af_unix: fix races in sk_peer_pid and sk_peer_cred accesses

Message ID 20210929225750.2548112-1-eric.dumazet@gmail.com (mailing list archive)
State Accepted
Commit 35306eb23814444bd4021f8a1c3047d3cb0c8b2b
Delegated to: Netdev Maintainers
Headers show
Series [net] af_unix: fix races in sk_peer_pid and sk_peer_cred accesses | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers fail 2 blamed authors not CCed: daniel.lezcano@free.fr xemul@openvz.org; 16 maintainers not CCed: xiangxia.m.yue@gmail.com yangbo.lu@nxp.com cong.wang@bytedance.com ast@kernel.org Rao.Shoaib@oracle.com xemul@openvz.org pabeni@redhat.com ceggers@arri.de viro@zeniv.linux.org.uk christian.brauner@ubuntu.com jiang.wang@bytedance.com fw@strlen.de kuniyu@amazon.co.jp daniel.lezcano@free.fr aahringo@redhat.com tglx@linutronix.de
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 3081 this patch: 3081
netdev/kdoc fail Errors and warnings before: 0 this patch: 1
netdev/verify_fixes success Link
netdev/checkpatch warning CHECK: spinlock_t definition without comment
netdev/build_allmodconfig_warn success Errors and warnings before: 3191 this patch: 3191
netdev/header_inline success Link

Commit Message

Eric Dumazet Sept. 29, 2021, 10:57 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.

In order to fix this issue, this patch adds a new spinlock that needs
to be used whenever these fields are read or written.

Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
reading sk->sk_peer_pid which makes no sense, as this field
is only possibly set by AF_UNIX sockets.
We will have to clean this in a separate patch.
This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback"
or implementing what was truly expected.

Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jann Horn <jannh@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
---
 include/net/sock.h |  2 ++
 net/core/sock.c    | 32 ++++++++++++++++++++++++++------
 net/unix/af_unix.c | 34 ++++++++++++++++++++++++++++------
 3 files changed, 56 insertions(+), 12 deletions(-)
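
The race described above can be exercised from user space with something
like the sketch below (a hypothetical reproducer, not taken from the report
or the patch; it relies on the fact that every successful listen() call on
an AF_UNIX socket re-runs init_peercred(), while the SO_PEERCRED reader
previously walked sk_peer_pid/sk_peer_cred without any lock; the path name
is arbitrary and error checking is omitted):

#define _GNU_SOURCE		/* for struct ucred */
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static int fd;

/* Writer side: each successful listen() re-initializes sk_peer_pid/cred. */
static void *relisten(void *arg)
{
	(void)arg;
	for (;;)
		listen(fd, 1);
	return NULL;
}

int main(void)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	struct ucred peer;
	socklen_t len;
	pthread_t t;

	strcpy(addr.sun_path, "/tmp/peercred-race");
	unlink(addr.sun_path);

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));

	pthread_create(&t, NULL, relisten, NULL);

	/* Reader side: getsockopt() copies sk_peer_* into a struct ucred. */
	for (;;) {
		len = sizeof(peer);
		getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &peer, &len);
	}
}

Build with gcc -pthread.  Whether this produces a visible crash is
timing-dependent, but without sk_peer_lock the reader can dereference a
cred that the concurrent re-initialization has already dropped; the
SO_PEERGROUPS path is similar, only with a longer window between reading
the cred pointer and using it.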

Comments

patchwork-bot+netdevbpf@kernel.org Sept. 30, 2021, 1:30 p.m. UTC | #1
Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Wed, 29 Sep 2021 15:57:50 -0700 you wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
> are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
> 
> In order to fix this issue, this patch adds a new spinlock that needs
> to be used whenever these fields are read or written.
> 
> [...]

Here is the summary with links:
  - [net] af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
    https://git.kernel.org/netdev/net/c/35306eb23814

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Wang Hai March 24, 2022, 8:03 a.m. UTC | #2
On 2021/9/30 6:57, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
> are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
>
> In order to fix this issue, this patch adds a new spinlock that needs
> to be used whenever these fields are read or written.
>
> Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
> reading sk->sk_peer_pid which makes no sense, as this field
> is only possibly set by AF_UNIX sockets.
> We will have to clean this in a separate patch.
> This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback"
> or implementing what was truly expected.
>
> Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Jann Horn <jannh@google.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
> Cc: Marcel Holtmann <marcel@holtmann.org>
> ---
...
>   static void copy_peercred(struct sock *sk, struct sock *peersk)
>   {
> -	put_pid(sk->sk_peer_pid);
> -	if (sk->sk_peer_cred)
> -		put_cred(sk->sk_peer_cred);
> +	const struct cred *old_cred;
> +	struct pid *old_pid;
> +
> +	if (sk < peersk) {
> +		spin_lock(&sk->sk_peer_lock);
> +		spin_lock_nested(&peersk->sk_peer_lock, SINGLE_DEPTH_NESTING);
> +	} else {
> +		spin_lock(&peersk->sk_peer_lock);
> +		spin_lock_nested(&sk->sk_peer_lock, SINGLE_DEPTH_NESTING);
> +	}
Hi, ALL.
I'm sorry to bother you.

This patch adds sk_peer_lock to solve the problem that af_unix may
concurrently change sk_peer_pid and sk_peer_cred.

I am confused as to why the locks need to be taken in an order based on
the addresses of sk and peersk.

Any feedback would be appreciated, thanks.
> +	old_pid = sk->sk_peer_pid;
> +	old_cred = sk->sk_peer_cred;
>   	sk->sk_peer_pid  = get_pid(peersk->sk_peer_pid);
>   	sk->sk_peer_cred = get_cred(peersk->sk_peer_cred);
> +
> +	spin_unlock(&sk->sk_peer_lock);
> +	spin_unlock(&peersk->sk_peer_lock);
> +
> +	put_pid(old_pid);
> +	put_cred(old_cred);
>   }
>   
>   static int unix_listen(struct socket *sock, int backlog)
Iwashima, Kuniyuki March 24, 2022, 9:04 a.m. UTC | #3
From:   "wanghai (M)" <wanghai38@huawei.com>
Date:   Thu, 24 Mar 2022 16:03:31 +0800
> On 2021/9/30 6:57, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
>> are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
>>
>> In order to fix this issue, this patch adds a new spinlock that needs
>> to be used whenever these fields are read or written.
>>
>> Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
>> reading sk->sk_peer_pid which makes no sense, as this field
>> is only possibly set by AF_UNIX sockets.
>> We will have to clean this in a separate patch.
>> This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback"
>> or implementing what was truly expected.
>>
>> Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.")
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Reported-by: Jann Horn <jannh@google.com>
>> Cc: Eric W. Biederman <ebiederm@xmission.com>
>> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
>> Cc: Marcel Holtmann <marcel@holtmann.org>
>> ---
> ...
>>   static void copy_peercred(struct sock *sk, struct sock *peersk)
>>   {
>> -	put_pid(sk->sk_peer_pid);
>> -	if (sk->sk_peer_cred)
>> -		put_cred(sk->sk_peer_cred);
>> +	const struct cred *old_cred;
>> +	struct pid *old_pid;
>> +
>> +	if (sk < peersk) {
>> +		spin_lock(&sk->sk_peer_lock);
>> +		spin_lock_nested(&peersk->sk_peer_lock, SINGLE_DEPTH_NESTING);
>> +	} else {
>> +		spin_lock(&peersk->sk_peer_lock);
>> +		spin_lock_nested(&sk->sk_peer_lock, SINGLE_DEPTH_NESTING);
>> +	}
> Hi, ALL.
> I'm sorry to bother you.
> 
> This patch adds sk_peer_lock to solve the problem that af_unix may
> concurrently change sk_peer_pid and sk_peer_cred.
> 
> I am confused as to why the locks need to be taken in an order based on
> the addresses of sk and peersk.

Simply to avoid deadlock.  These locks must always be acquired in the same
order: the lock at the smaller address is taken first, then the larger one.

  e.g.) CPU-A calls copy_peercred(sk-A, sk-B), and
        CPU-B calls copy_peercred(sk-B, sk-A).

There are several existing implementations of this pattern in the tree, for example:

  $ grep -rn double_lock
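
For illustration only, the ordering rule can be factored into a small
helper (double_peer_lock() is a hypothetical name, not something the
patch adds):

static void double_peer_lock(struct sock *sk1, struct sock *sk2)
{
	/* Always take the lock at the lower address first. */
	if (sk1 > sk2)
		swap(sk1, sk2);

	spin_lock(&sk1->sk_peer_lock);
	/* Same lock class taken twice: tell lockdep the nesting is intentional. */
	spin_lock_nested(&sk2->sk_peer_lock, SINGLE_DEPTH_NESTING);
}

Whichever order the callers pass the two sockets in, both CPUs end up
acquiring the pair in the same address order, so neither can hold one
lock while spinning on the other.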


> 
> Any feedback would be appreciated, thanks.
>> +	old_pid = sk->sk_peer_pid;
>> +	old_cred = sk->sk_peer_cred;
>>   	sk->sk_peer_pid  = get_pid(peersk->sk_peer_pid);
>>   	sk->sk_peer_cred = get_cred(peersk->sk_peer_cred);
>> +
>> +	spin_unlock(&sk->sk_peer_lock);
>> +	spin_unlock(&peersk->sk_peer_lock);
>> +
>> +	put_pid(old_pid);
>> +	put_cred(old_cred);
>>   }
>>   
>>   static int unix_listen(struct socket *sock, int backlog)
> 
> -- 
> Wang Hai
Wang Hai March 24, 2022, 11:15 a.m. UTC | #4
On 2022/3/24 17:04, Kuniyuki Iwashima wrote:
> From:   "wanghai (M)" <wanghai38@huawei.com>
> Date:   Thu, 24 Mar 2022 16:03:31 +0800
>> On 2021/9/30 6:57, Eric Dumazet wrote:
>>> From: Eric Dumazet <edumazet@google.com>
>>>
>>> Jann Horn reported that SO_PEERCRED and SO_PEERGROUPS implementations
>>> are racy, as af_unix can concurrently change sk_peer_pid and sk_peer_cred.
>>>
>>> In order to fix this issue, this patch adds a new spinlock that needs
>>> to be used whenever these fields are read or written.
>>>
>>> Jann also pointed out that l2cap_sock_get_peer_pid_cb() is currently
>>> reading sk->sk_peer_pid which makes no sense, as this field
>>> is only possibly set by AF_UNIX sockets.
>>> We will have to clean this in a separate patch.
>>> This could be done by reverting b48596d1dc25 "Bluetooth: L2CAP: Add get_peer_pid callback"
>>> or implementing what was truly expected.
>>>
>>> Fixes: 109f6e39fa07 ("af_unix: Allow SO_PEERCRED to work across namespaces.")
>>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>>> Reported-by: Jann Horn <jannh@google.com>
>>> Cc: Eric W. Biederman <ebiederm@xmission.com>
>>> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
>>> Cc: Marcel Holtmann <marcel@holtmann.org>
>>> ---
>> ...
>>>    static void copy_peercred(struct sock *sk, struct sock *peersk)
>>>    {
>>> -	put_pid(sk->sk_peer_pid);
>>> -	if (sk->sk_peer_cred)
>>> -		put_cred(sk->sk_peer_cred);
>>> +	const struct cred *old_cred;
>>> +	struct pid *old_pid;
>>> +
>>> +	if (sk < peersk) {
>>> +		spin_lock(&sk->sk_peer_lock);
>>> +		spin_lock_nested(&peersk->sk_peer_lock, SINGLE_DEPTH_NESTING);
>>> +	} else {
>>> +		spin_lock(&peersk->sk_peer_lock);
>>> +		spin_lock_nested(&sk->sk_peer_lock, SINGLE_DEPTH_NESTING);
>>> +	}
>> Hi, ALL.
>> I'm sorry to bother you.
>>
>> This patch adds sk_peer_lock to solve the problem that af_unix may
>> concurrently change sk_peer_pid and sk_peer_cred.
>>
>> I am confused as to why the locks need to be taken in an order based on
>> the addresses of sk and peersk.
> Simply to avoid deadlock.  These locks must always be acquired in the same
> order: the lock at the smaller address is taken first, then the larger one.
>
>    e.g.) CPU-A calls copy_peercred(sk-A, sk-B), and
>          CPU-B calls copy_peercred(sk-B, sk-A).
>
> There are several existing implementations of this pattern in the tree, for example:
>
>    $ grep -rn double_lock
I got it, thank you for your patient explanation.
>
>
>> Any feedback would be appreciated, thanks.
>>> +	old_pid = sk->sk_peer_pid;
>>> +	old_cred = sk->sk_peer_cred;
>>>    	sk->sk_peer_pid  = get_pid(peersk->sk_peer_pid);
>>>    	sk->sk_peer_cred = get_cred(peersk->sk_peer_cred);
>>> +
>>> +	spin_unlock(&sk->sk_peer_lock);
>>> +	spin_unlock(&peersk->sk_peer_lock);
>>> +
>>> +	put_pid(old_pid);
>>> +	put_cred(old_cred);
>>>    }
>>>    
>>>    static int unix_listen(struct socket *sock, int backlog)
>> -- 
>> Wang Hai
> .
>

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index c005c3c750e89b6d4d36d36ccfad392a7e733603..f471cd0a6f98cb5b2b961f01c2de62870d61521e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -488,8 +488,10 @@  struct sock {
 	u8			sk_prefer_busy_poll;
 	u16			sk_busy_poll_budget;
 #endif
+	spinlock_t		sk_peer_lock;
 	struct pid		*sk_peer_pid;
 	const struct cred	*sk_peer_cred;
+
 	long			sk_rcvtimeo;
 	ktime_t			sk_stamp;
 #if BITS_PER_LONG==32
diff --git a/net/core/sock.c b/net/core/sock.c
index 512e629f97803f3216db8f054e97fd1b7a6e6c63..c70cbce69a66181c3688862ea51aa781b3ce4f97 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1376,6 +1376,16 @@  int sock_setsockopt(struct socket *sock, int level, int optname,
 }
 EXPORT_SYMBOL(sock_setsockopt);
 
+static const struct cred *sk_get_peer_cred(struct sock *sk)
+{
+	const struct cred *cred;
+
+	spin_lock(&sk->sk_peer_lock);
+	cred = get_cred(sk->sk_peer_cred);
+	spin_unlock(&sk->sk_peer_lock);
+
+	return cred;
+}
 
 static void cred_to_ucred(struct pid *pid, const struct cred *cred,
 			  struct ucred *ucred)
@@ -1552,7 +1562,11 @@  int sock_getsockopt(struct socket *sock, int level, int optname,
 		struct ucred peercred;
 		if (len > sizeof(peercred))
 			len = sizeof(peercred);
+
+		spin_lock(&sk->sk_peer_lock);
 		cred_to_ucred(sk->sk_peer_pid, sk->sk_peer_cred, &peercred);
+		spin_unlock(&sk->sk_peer_lock);
+
 		if (copy_to_user(optval, &peercred, len))
 			return -EFAULT;
 		goto lenout;
@@ -1560,20 +1574,23 @@  int sock_getsockopt(struct socket *sock, int level, int optname,
 
 	case SO_PEERGROUPS:
 	{
+		const struct cred *cred;
 		int ret, n;
 
-		if (!sk->sk_peer_cred)
+		cred = sk_get_peer_cred(sk);
+		if (!cred)
 			return -ENODATA;
 
-		n = sk->sk_peer_cred->group_info->ngroups;
+		n = cred->group_info->ngroups;
 		if (len < n * sizeof(gid_t)) {
 			len = n * sizeof(gid_t);
+			put_cred(cred);
 			return put_user(len, optlen) ? -EFAULT : -ERANGE;
 		}
 		len = n * sizeof(gid_t);
 
-		ret = groups_to_user((gid_t __user *)optval,
-				     sk->sk_peer_cred->group_info);
+		ret = groups_to_user((gid_t __user *)optval, cred->group_info);
+		put_cred(cred);
 		if (ret)
 			return ret;
 		goto lenout;
@@ -1935,9 +1952,10 @@  static void __sk_destruct(struct rcu_head *head)
 		sk->sk_frag.page = NULL;
 	}
 
-	if (sk->sk_peer_cred)
-		put_cred(sk->sk_peer_cred);
+	/* We do not need to acquire sk->sk_peer_lock, we are the last user. */
+	put_cred(sk->sk_peer_cred);
 	put_pid(sk->sk_peer_pid);
+
 	if (likely(sk->sk_net_refcnt))
 		put_net(sock_net(sk));
 	sk_prot_free(sk->sk_prot_creator, sk);
@@ -3145,6 +3163,8 @@  void sock_init_data(struct socket *sock, struct sock *sk)
 
 	sk->sk_peer_pid 	=	NULL;
 	sk->sk_peer_cred	=	NULL;
+	spin_lock_init(&sk->sk_peer_lock);
+
 	sk->sk_write_pending	=	0;
 	sk->sk_rcvlowat		=	1;
 	sk->sk_rcvtimeo		=	MAX_SCHEDULE_TIMEOUT;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index f505b89bda6adefe973806f842001d428fc6c5ca..efac5989edb5d37881139bb1ad2fb9cf63bd7342 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -608,20 +608,42 @@  static void unix_release_sock(struct sock *sk, int embrion)
 
 static void init_peercred(struct sock *sk)
 {
-	put_pid(sk->sk_peer_pid);
-	if (sk->sk_peer_cred)
-		put_cred(sk->sk_peer_cred);
+	const struct cred *old_cred;
+	struct pid *old_pid;
+
+	spin_lock(&sk->sk_peer_lock);
+	old_pid = sk->sk_peer_pid;
+	old_cred = sk->sk_peer_cred;
 	sk->sk_peer_pid  = get_pid(task_tgid(current));
 	sk->sk_peer_cred = get_current_cred();
+	spin_unlock(&sk->sk_peer_lock);
+
+	put_pid(old_pid);
+	put_cred(old_cred);
 }
 
 static void copy_peercred(struct sock *sk, struct sock *peersk)
 {
-	put_pid(sk->sk_peer_pid);
-	if (sk->sk_peer_cred)
-		put_cred(sk->sk_peer_cred);
+	const struct cred *old_cred;
+	struct pid *old_pid;
+
+	if (sk < peersk) {
+		spin_lock(&sk->sk_peer_lock);
+		spin_lock_nested(&peersk->sk_peer_lock, SINGLE_DEPTH_NESTING);
+	} else {
+		spin_lock(&peersk->sk_peer_lock);
+		spin_lock_nested(&sk->sk_peer_lock, SINGLE_DEPTH_NESTING);
+	}
+	old_pid = sk->sk_peer_pid;
+	old_cred = sk->sk_peer_cred;
 	sk->sk_peer_pid  = get_pid(peersk->sk_peer_pid);
 	sk->sk_peer_cred = get_cred(peersk->sk_peer_cred);
+
+	spin_unlock(&sk->sk_peer_lock);
+	spin_unlock(&peersk->sk_peer_lock);
+
+	put_pid(old_pid);
+	put_cred(old_cred);
 }
 
 static int unix_listen(struct socket *sock, int backlog)