Message ID | 20240729192601.97316-1-kuniyu@amazon.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [v1,net] sctp: Fix null-ptr-deref in reuseport_add_sock(). |
Hm. Note both 'reuseport_add_sock()' and 'reuseport_detach_sock()' use the
global 'reuseport_lock' internally. So what about using a read lock to
protect the 'sctp_for_each_hentry()' loop and upgrading to the write lock
only if the hash bucket actually needs to be updated? E.g.:

diff --git a/net/sctp/input.c b/net/sctp/input.c
index 17fcaa9b0df9..4fbff388b1b4 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -735,6 +735,7 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
 	struct sock *sk = ep->base.sk;
 	struct net *net = sock_net(sk);
 	struct sctp_hashbucket *head;
+	int err = 0;
 
 	ep->hashent = sctp_ep_hashfn(net, ep->base.bind_addr.port);
 	head = &sctp_ep_hashtable[ep->hashent];
@@ -743,11 +744,14 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
 		bool any = sctp_is_ep_boundall(sk);
 		struct sctp_endpoint *ep2;
 		struct list_head *list;
-		int cnt = 0, err = 1;
+		int cnt = 0;
+
+		err = 1;
 
 		list_for_each(list, &ep->base.bind_addr.address_list)
 			cnt++;
 
+		read_lock(&head->lock);
 		sctp_for_each_hentry(ep2, &head->chain) {
 			struct sock *sk2 = ep2->base.sk;
 
@@ -760,25 +764,30 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
 					       sctp_sk(sk), cnt);
 			if (!err) {
 				err = reuseport_add_sock(sk, sk2, any);
-				if (err)
-					return err;
+				if (err) {
+					read_unlock(&head->lock);
+					goto out;
+				}
 				break;
 			} else if (err < 0) {
-				return err;
+				read_unlock(&head->lock);
+				goto out;
 			}
 		}
+		read_unlock(&head->lock);
 
 		if (err) {
 			err = reuseport_alloc(sk, any);
 			if (err)
-				return err;
+				goto out;
 		}
 	}
 
 	write_lock(&head->lock);
 	hlist_add_head(&ep->node, &head->chain);
 	write_unlock(&head->lock);
-	return 0;
+out:
+	return err;
 }
 
 /* Add an endpoint to the hash. Local BH-safe. */

Dmitry
From: Dmitry Antipov <dmantipov@yandex.ru>
Date: Tue, 30 Jul 2024 16:46:05 +0300
> Hm. Note both 'reuseport_add_sock()' and 'reuseport_detach_sock()' use the
> global 'reuseport_lock' internally. So what about using a read lock to
> protect the 'sctp_for_each_hentry()' loop and upgrading to the write lock
> only if the hash bucket actually needs to be updated? E.g.:
> 
> diff --git a/net/sctp/input.c b/net/sctp/input.c
> index 17fcaa9b0df9..4fbff388b1b4 100644
> --- a/net/sctp/input.c
> +++ b/net/sctp/input.c
> @@ -735,6 +735,7 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
>  	struct sock *sk = ep->base.sk;
>  	struct net *net = sock_net(sk);
>  	struct sctp_hashbucket *head;
> +	int err = 0;
>  
>  	ep->hashent = sctp_ep_hashfn(net, ep->base.bind_addr.port);
>  	head = &sctp_ep_hashtable[ep->hashent];
> @@ -743,11 +744,14 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
>  		bool any = sctp_is_ep_boundall(sk);
>  		struct sctp_endpoint *ep2;
>  		struct list_head *list;
> -		int cnt = 0, err = 1;
> +		int cnt = 0;
> +
> +		err = 1;
>  
>  		list_for_each(list, &ep->base.bind_addr.address_list)
>  			cnt++;
>  
> +		read_lock(&head->lock);
>  		sctp_for_each_hentry(ep2, &head->chain) {
>  			struct sock *sk2 = ep2->base.sk;
>  
> @@ -760,25 +764,30 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
>  					       sctp_sk(sk), cnt);
>  			if (!err) {
>  				err = reuseport_add_sock(sk, sk2, any);
> -				if (err)
> -					return err;
> +				if (err) {
> +					read_unlock(&head->lock);
> +					goto out;
> +				}
>  				break;
>  			} else if (err < 0) {
> -				return err;
> +				read_unlock(&head->lock);
> +				goto out;
>  			}
>  		}
> +		read_unlock(&head->lock);
>  
>  		if (err) {
>  			err = reuseport_alloc(sk, any);

What happens if two sockets matching each other reach here ?
There would be no error but the first hashed socket will be
silently dead as lookup stops at the 2nd sk ?

>  			if (err)
> -				return err;
> +				goto out;
>  		}
>  	}
>  
>  	write_lock(&head->lock);
>  	hlist_add_head(&ep->node, &head->chain);
>  	write_unlock(&head->lock);
> -	return 0;
> +out:
> +	return err;
>  }
>  
>  /* Add an endpoint to the hash. Local BH-safe. */
> 
> Dmitry
> What happens if two sockets matching each other reach here ?
Not sure. In general, an attempt to use rather external thing (struct
sctp_hashbucket) to implement extra synchronization for reuse innards
looks somewhat weird.
So let's look at reuseport_add_sock() again. Note extra comments:
int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany)
{
	struct sock_reuseport *old_reuse, *reuse;

	/* RCU access _outside_ of reuseport_lock'ed section */
	if (!rcu_access_pointer(sk2->sk_reuseport_cb)) {
		int err = reuseport_alloc(sk2, bind_inany);

		if (err)
			return err;
	}

	spin_lock_bh(&reuseport_lock);
	reuse = rcu_dereference_protected(sk2->sk_reuseport_cb,
					  lockdep_is_held(&reuseport_lock));
	old_reuse = rcu_dereference_protected(sk->sk_reuseport_cb,
					      lockdep_is_held(&reuseport_lock));
	if (old_reuse && old_reuse->num_closed_socks) {
		/* sk was shutdown()ed before */
		int err = reuseport_resurrect(sk, old_reuse, reuse,
					      reuse->bind_inany);

		spin_unlock_bh(&reuseport_lock);
		return err;
	}

	if (old_reuse && old_reuse->num_socks != 1) {
		spin_unlock_bh(&reuseport_lock);
		return -EBUSY;
	}

	if (reuse->num_socks + reuse->num_closed_socks == reuse->max_socks) {
		reuse = reuseport_grow(reuse);
		if (!reuse) {
			spin_unlock_bh(&reuseport_lock);
			return -ENOMEM;
		}
	}

	__reuseport_add_sock(sk, reuse);
	/* RCU access _inside_ of reuseport_lock'ed section */
	rcu_assign_pointer(sk->sk_reuseport_cb, reuse);

	spin_unlock_bh(&reuseport_lock);

	if (old_reuse)
		call_rcu(&old_reuse->rcu, reuseport_free_rcu);

	return 0;
}
Why is this so? I've tried to extend the reuseport_lock'ed section to include
the first rcu_access_pointer() in question as well, e.g.:
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index 5a165286e4d8..39a353ab8ff8 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -186,16 +186,11 @@ static struct sock_reuseport *__reuseport_alloc(unsigned int max_socks)
 	return reuse;
 }
 
-int reuseport_alloc(struct sock *sk, bool bind_inany)
+static int reuseport_alloc_unlocked(struct sock *sk, bool bind_inany)
 {
 	struct sock_reuseport *reuse;
 	int id, ret = 0;
 
-	/* bh lock used since this function call may precede hlist lock in
-	 * soft irq of receive path or setsockopt from process context
-	 */
-	spin_lock_bh(&reuseport_lock);
-
 	/* Allocation attempts can occur concurrently via the setsockopt path
 	 * and the bind/hash path. Nothing to do when we lose the race.
 	 */
@@ -236,8 +231,19 @@ int reuseport_alloc(struct sock *sk, bool bind_inany)
 	reuse->num_socks = 1;
 	reuseport_get_incoming_cpu(sk, reuse);
 	rcu_assign_pointer(sk->sk_reuseport_cb, reuse);
-
 out:
+	return ret;
+}
+
+int reuseport_alloc(struct sock *sk, bool bind_inany)
+{
+	int ret;
+
+	/* bh lock used since this function call may precede hlist lock in
+	 * soft irq of receive path or setsockopt from process context
+	 */
+	spin_lock_bh(&reuseport_lock);
+	ret = reuseport_alloc_unlocked(sk, bind_inany);
 	spin_unlock_bh(&reuseport_lock);
 
 	return ret;
@@ -322,14 +328,17 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany)
 {
 	struct sock_reuseport *old_reuse, *reuse;
 
+	spin_lock_bh(&reuseport_lock);
+
 	if (!rcu_access_pointer(sk2->sk_reuseport_cb)) {
-		int err = reuseport_alloc(sk2, bind_inany);
+		int err = reuseport_alloc_unlocked(sk2, bind_inany);
 
-		if (err)
+		if (err) {
+			spin_unlock_bh(&reuseport_lock);
 			return err;
+		}
 	}
 
-	spin_lock_bh(&reuseport_lock);
-
 	reuse = rcu_dereference_protected(sk2->sk_reuseport_cb,
 					  lockdep_is_held(&reuseport_lock));
 	old_reuse = rcu_dereference_protected(sk->sk_reuseport_cb,
and this seems to fix the original crash as well.
Dmitry
From: Dmitry Antipov <dmantipov@yandex.ru>
Date: Wed, 31 Jul 2024 10:05:37 +0300
> > What happens if two sockets matching each other reach here ?
> 
> Not sure.

If two sockets reach the reuseport_alloc() path, two identical reuseport
groups are created. Then, in __sctp_rcv_lookup_endpoint(), incoming packets
hit only one socket group, the one placed before the other in the hash
bucket, and the other socket no longer receives data and has silently died
from the userspace POV.

The suggested change papers over the problem. reuseport_add_sock() and
reuseport_alloc() must be placed in the same writer critical section.

> In general, an attempt to use rather external thing (struct
> sctp_hashbucket) to implement extra synchronization for reuse innards
> looks somewhat weird.

reuseport_lock is to synchronise operations within the same reuseport
group, but which socket should belong to which group is out of scope.
That must be well-defined and synchronised at the upper level.

I'll post v2 with updated changelog. Thanks!
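For context, the scenario described above boils down to two SCTP listeners
sharing one port via SO_REUSEPORT, so that hashing the second endpoint goes
through the reuseport_add_sock()/reuseport_alloc() path. A minimal userspace
sketch of that setup follows; the port number and loopback address are
arbitrary, and the setup here is sequential, whereas actually hitting the
race needs the two endpoints to be hashed concurrently (e.g. from two
threads or processes).

/* Two SCTP listeners sharing an arbitrary port via SO_REUSEPORT.
 * Illustrative only; error handling is minimal. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int sctp_listener(unsigned short port)
{
	struct sockaddr_in addr;
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);

	/* SO_REUSEPORT must be set before bind() for the port sharing. */
	if (fd < 0 ||
	    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 8) < 0)
		return -1;

	return fd;
}

int main(void)
{
	/* Both listeners bind the same addr:port; when the second endpoint
	 * is hashed, the kernel takes the reuseport_add_sock() path
	 * discussed in this thread. */
	int a = sctp_listener(7777);
	int b = sctp_listener(7777);

	if (a < 0 || b < 0) {
		perror("sctp listener");
		return 1;
	}

	pause();	/* keep both endpoints hashed */
	return 0;
}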
On 7/31/24 10:06 PM, Kuniyuki Iwashima wrote:

> reuseport_lock is to synchronise operations within the same
> reuseport group, but which socket should belong to which group
> is out of scope.

Hm... then the lock should be a member of 'struct sock_reuseport'
rather than global, shouldn't it?

Dmitry
From: Dmitry Antipov <dmantipov@yandex.ru>
Date: Wed, 31 Jul 2024 22:15:12 +0300
> On 7/31/24 10:06 PM, Kuniyuki Iwashima wrote:
> 
> > reuseport_lock is to synchronise operations within the same
> > reuseport group, but which socket should belong to which group
> > is out of scope.
> 
> Hm... then the lock should be a member of 'struct sock_reuseport'
> rather than global, shouldn't it?

I thought that before, and it would be doable with more complex locking
since we still need another synchronisation for add/detach and alloc.
Such operations are unlikely to be done concurrently or frequently, and
the global lock would be enough.

I'll check that again later, but anyway, it's a different topic for
net-next.
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 17fcaa9b0df9..a8a254a5008e 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -735,15 +735,19 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
 	struct sock *sk = ep->base.sk;
 	struct net *net = sock_net(sk);
 	struct sctp_hashbucket *head;
+	int err = 0;
 
 	ep->hashent = sctp_ep_hashfn(net, ep->base.bind_addr.port);
 	head = &sctp_ep_hashtable[ep->hashent];
 
+	write_lock(&head->lock);
 	if (sk->sk_reuseport) {
 		bool any = sctp_is_ep_boundall(sk);
 		struct sctp_endpoint *ep2;
 		struct list_head *list;
-		int cnt = 0, err = 1;
+		int cnt = 0;
+
+		err = 1;
 
 		list_for_each(list, &ep->base.bind_addr.address_list)
 			cnt++;
@@ -761,24 +765,24 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
 			if (!err) {
 				err = reuseport_add_sock(sk, sk2, any);
 				if (err)
-					return err;
+					goto out;
 				break;
 			} else if (err < 0) {
-				return err;
+				goto out;
 			}
 		}
 
 		if (err) {
 			err = reuseport_alloc(sk, any);
 			if (err)
-				return err;
+				goto out;
 		}
 	}
 
-	write_lock(&head->lock);
 	hlist_add_head(&ep->node, &head->chain);
+out:
 	write_unlock(&head->lock);
-	return 0;
+	return err;
 }
 
 /* Add an endpoint to the hash. Local BH-safe. */
@@ -803,10 +807,9 @@ static void __sctp_unhash_endpoint(struct sctp_endpoint *ep)
 
 	head = &sctp_ep_hashtable[ep->hashent];
 
+	write_lock(&head->lock);
 	if (rcu_access_pointer(sk->sk_reuseport_cb))
 		reuseport_detach_sock(sk);
-
-	write_lock(&head->lock);
 	hlist_del_init(&ep->node);
 	write_unlock(&head->lock);
 }
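The essence of the fix above is that the check for an existing group, the
group allocation, and the hash insertion now all run under the same
write_lock(&head->lock). A rough userspace analogue of that pattern follows;
all names here (bucket, endpoint, group) are made up for illustration and
are not the kernel APIs.

/* Userspace analogue: the lookup for an existing group, the allocation of
 * a new group, and the insertion into the bucket all run under one writer
 * lock, so two concurrent registrations cannot each create their own group.
 * Build with: cc -pthread demo.c */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct group {
	int nsocks;
};

struct endpoint {
	struct group *grp;
	struct endpoint *next;
};

static struct {
	pthread_rwlock_t lock;		/* plays the role of head->lock */
	struct endpoint *chain;
	int groups_allocated;
} bucket = { .lock = PTHREAD_RWLOCK_INITIALIZER };

static void hash_endpoint(struct endpoint *ep)
{
	struct endpoint *ep2;
	struct group *grp = NULL;

	pthread_rwlock_wrlock(&bucket.lock);

	/* Reuse a matching group if an already-hashed endpoint has one. */
	for (ep2 = bucket.chain; ep2; ep2 = ep2->next) {
		if (ep2->grp) {
			grp = ep2->grp;
			break;
		}
	}

	if (!grp) {			/* first endpoint: create the group */
		grp = calloc(1, sizeof(*grp));
		bucket.groups_allocated++;
	}
	grp->nsocks++;
	ep->grp = grp;

	/* Insertion happens in the same critical section as the check. */
	ep->next = bucket.chain;
	bucket.chain = ep;

	pthread_rwlock_unlock(&bucket.lock);
}

static void *worker(void *arg)
{
	hash_endpoint(arg);
	return NULL;
}

int main(void)
{
	struct endpoint a = {0}, b = {0};
	pthread_t t1, t2;

	pthread_create(&t1, NULL, worker, &a);
	pthread_create(&t2, NULL, worker, &b);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/* With the whole sequence under the write lock, exactly one group
	 * exists and both endpoints share it. */
	printf("groups allocated: %d\n", bucket.groups_allocated);
	assert(bucket.groups_allocated == 1 && a.grp == b.grp);
	return 0;
}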