
[v1,net] sctp: Fix null-ptr-deref in reuseport_add_sock().

Message ID 20240729192601.97316-1-kuniyu@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [v1,net] sctp: Fix null-ptr-deref in reuseport_add_sock().

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 42 this patch: 42
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 8 maintainers
netdev/build_clang success Errors and warnings before: 43 this patch: 43
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 46 this patch: 46
netdev/checkpatch warning WARNING: Possible repeated word: 'Google'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest warning net-next-2024-07-30--03-00 (tests: 675)

Commit Message

Kuniyuki Iwashima July 29, 2024, 7:26 p.m. UTC
syzkaller reported a null-ptr-deref while accessing sk2->sk_reuseport_cb
in reuseport_add_sock(). [0]

The repro first creates a listener with SO_REUSEPORT.  Then, it creates
another listener on the same port and concurrently close()s the first
listener.

The second listen() calls reuseport_add_sock() with the first listener as
sk2, whose sk_reuseport_cb is not expected to be cleared concurrently, but
the close() does clear it.
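
A minimal userspace sketch of the scenario (illustrative only; the actual
syzkaller repro differs, and error handling is omitted for brevity):

/* Two one-to-one SCTP sockets share a port via SO_REUSEPORT; close() of
 * the first listener races with listen() on the second one.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

static int sctp_listener(void)
{
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(12345),
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
	};
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);

	setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
	bind(fd, (struct sockaddr *)&addr, sizeof(addr));
	return fd;
}

static void *close_first(void *arg)
{
	close(*(int *)arg);		/* races with the second listen() */
	return NULL;
}

int main(void)
{
	int fd1 = sctp_listener();
	int fd2 = sctp_listener();
	pthread_t t;

	listen(fd1, 1);			/* first listener is hashed */
	pthread_create(&t, NULL, close_first, &fd1);
	listen(fd2, 1);			/* may see fd1's sk_reuseport_cb cleared */
	pthread_join(t, NULL);
	close(fd2);
	return 0;
}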

The assumption in reuseport_add_sock() is valid for TCP/UDP, which
serialise the changes to sk_reuseport_cb by holding the hash bucket lock.
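
As an illustration, that pattern looks roughly like the sketch below (the
types and names are made up for illustration, not actual kernel code):

/* The bucket lock is held across the reuseport update, and unhash/close
 * takes the same lock, so sk2->sk_reuseport_cb cannot be cleared while
 * reuseport_add_sock() runs.
 */
static int proto_hash_listener(struct proto_hashbucket *head,
			       struct sock *sk, struct sock *sk2, bool any)
{
	int err;

	spin_lock(&head->lock);
	err = reuseport_add_sock(sk, sk2, any);
	if (!err)
		hlist_add_head(&sk->sk_node, &head->chain);
	spin_unlock(&head->lock);

	return err;
}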

However, __sctp_hash_endpoint() does not hold sctp_hashbucket.lock before
calling reuseport_add_sock(), nor does __sctp_unhash_endpoint() before
calling reuseport_detach_sock().

Let's fix it by acquiring the bucket lock before reuseport_add_sock() and
reuseport_detach_sock().

[0]:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
CPU: 1 UID: 0 PID: 10230 Comm: syz-executor119 Not tainted 6.10.0-syzkaller-12585-g301927d2d2eb #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
RIP: 0010:reuseport_add_sock+0x27e/0x5e0 net/core/sock_reuseport.c:350
Code: 00 0f b7 5d 00 bf 01 00 00 00 89 de e8 1b a4 ff f7 83 fb 01 0f 85 a3 01 00 00 e8 6d a0 ff f7 49 8d 7e 12 48 89 f8 48 c1 e8 03 <42> 0f b6 04 28 84 c0 0f 85 4b 02 00 00 41 0f b7 5e 12 49 8d 7e 14
RSP: 0018:ffffc9000b947c98 EFLAGS: 00010202
RAX: 0000000000000002 RBX: ffff8880252ddf98 RCX: ffff888079478000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000012
RBP: 0000000000000001 R08: ffffffff8993e18d R09: 1ffffffff1fef385
R10: dffffc0000000000 R11: fffffbfff1fef386 R12: ffff8880252ddac0
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f24e45b96c0(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffcced5f7b8 CR3: 00000000241be000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __sctp_hash_endpoint net/sctp/input.c:762 [inline]
 sctp_hash_endpoint+0x52a/0x600 net/sctp/input.c:790
 sctp_listen_start net/sctp/socket.c:8570 [inline]
 sctp_inet_listen+0x767/0xa20 net/sctp/socket.c:8625
 __sys_listen_socket net/socket.c:1883 [inline]
 __sys_listen+0x1b7/0x230 net/socket.c:1894
 __do_sys_listen net/socket.c:1902 [inline]
 __se_sys_listen net/socket.c:1900 [inline]
 __x64_sys_listen+0x5a/0x70 net/socket.c:1900
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f24e46039b9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 91 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f24e45b9228 EFLAGS: 00000246 ORIG_RAX: 0000000000000032
RAX: ffffffffffffffda RBX: 00007f24e468e428 RCX: 00007f24e46039b9
RDX: 00007f24e46039b9 RSI: 0000000000000003 RDI: 0000000000000004
RBP: 00007f24e468e420 R08: 00007f24e45b96c0 R09: 00007f24e45b96c0
R10: 00007f24e45b96c0 R11: 0000000000000246 R12: 00007f24e468e42c
R13: 00007f24e465a5dc R14: 0020000000000001 R15: 00007ffcced5f7d8
 </TASK>
Modules linked in:

Fixes: 6ba845740267 ("sctp: process sk_reuseport in sctp_get_port_local")
Reported-by: syzbot+e6979a5d2f10ecb700e4@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e6979a5d2f10ecb700e4
Tested-by: syzbot+e6979a5d2f10ecb700e4@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/sctp/input.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

Comments

Dmitry Antipov July 30, 2024, 1:46 p.m. UTC | #1
Hm. Note that both 'reuseport_add_sock()' and 'reuseport_detach_sock()' use the
global 'reuseport_lock' internally. So what about using a read lock to protect
the 'sctp_for_each_hentry()' loop and upgrading to the write lock only if the
hash bucket actually needs to be updated? E.g.:

diff --git a/net/sctp/input.c b/net/sctp/input.c
index 17fcaa9b0df9..4fbff388b1b4 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -735,6 +735,7 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
 	struct sock *sk = ep->base.sk;
 	struct net *net = sock_net(sk);
 	struct sctp_hashbucket *head;
+	int err = 0;
 
 	ep->hashent = sctp_ep_hashfn(net, ep->base.bind_addr.port);
 	head = &sctp_ep_hashtable[ep->hashent];
@@ -743,11 +744,14 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
 		bool any = sctp_is_ep_boundall(sk);
 		struct sctp_endpoint *ep2;
 		struct list_head *list;
-		int cnt = 0, err = 1;
+		int cnt = 0;
+
+		err = 1;
 
 		list_for_each(list, &ep->base.bind_addr.address_list)
 			cnt++;
 
+		read_lock(&head->lock);
 		sctp_for_each_hentry(ep2, &head->chain) {
 			struct sock *sk2 = ep2->base.sk;
 
@@ -760,25 +764,30 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
 						    sctp_sk(sk), cnt);
 			if (!err) {
 				err = reuseport_add_sock(sk, sk2, any);
-				if (err)
-					return err;
+				if (err) {
+					read_unlock(&head->lock);
+					goto out;
+				}
 				break;
 			} else if (err < 0) {
-				return err;
+				read_unlock(&head->lock);
+				goto out;
 			}
 		}
+		read_unlock(&head->lock);
 
 		if (err) {
 			err = reuseport_alloc(sk, any);
 			if (err)
-				return err;
+				goto out;
 		}
 	}
 
 	write_lock(&head->lock);
 	hlist_add_head(&ep->node, &head->chain);
 	write_unlock(&head->lock);
-	return 0;
+out:
+	return err;
 }
 
 /* Add an endpoint to the hash. Local BH-safe. */

Dmitry
Kuniyuki Iwashima July 30, 2024, 5:06 p.m. UTC | #2
From: Dmitry Antipov <dmantipov@yandex.ru>
Date: Tue, 30 Jul 2024 16:46:05 +0300
> Hm. Note that both 'reuseport_add_sock()' and 'reuseport_detach_sock()' use the
> global 'reuseport_lock' internally. So what about using a read lock to protect
> the 'sctp_for_each_hentry()' loop and upgrading to the write lock only if the
> hash bucket actually needs to be updated? E.g.:
> 
> diff --git a/net/sctp/input.c b/net/sctp/input.c
> index 17fcaa9b0df9..4fbff388b1b4 100644
> --- a/net/sctp/input.c
> +++ b/net/sctp/input.c
> @@ -735,6 +735,7 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
>  	struct sock *sk = ep->base.sk;
>  	struct net *net = sock_net(sk);
>  	struct sctp_hashbucket *head;
> +	int err = 0;
>  
>  	ep->hashent = sctp_ep_hashfn(net, ep->base.bind_addr.port);
>  	head = &sctp_ep_hashtable[ep->hashent];
> @@ -743,11 +744,14 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
>  		bool any = sctp_is_ep_boundall(sk);
>  		struct sctp_endpoint *ep2;
>  		struct list_head *list;
> -		int cnt = 0, err = 1;
> +		int cnt = 0;
> +
> +		err = 1;
>  
>  		list_for_each(list, &ep->base.bind_addr.address_list)
>  			cnt++;
>  
> +		read_lock(&head->lock);
>  		sctp_for_each_hentry(ep2, &head->chain) {
>  			struct sock *sk2 = ep2->base.sk;
>  
> @@ -760,25 +764,30 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
>  						    sctp_sk(sk), cnt);
>  			if (!err) {
>  				err = reuseport_add_sock(sk, sk2, any);
> -				if (err)
> -					return err;
> +				if (err) {
> +					read_unlock(&head->lock);
> +					goto out;
> +				}
>  				break;
>  			} else if (err < 0) {
> -				return err;
> +				read_unlock(&head->lock);
> +				goto out;
>  			}
>  		}
> +		read_unlock(&head->lock);
>  
>  		if (err) {
>  			err = reuseport_alloc(sk, any);

What happens if two sockets matching each other reach here ?

There would be no error, but the first hashed socket will be silently
dead as the lookup stops at the 2nd sk ?


>  			if (err)
> -				return err;
> +				goto out;
>  		}
>  	}
>  
>  	write_lock(&head->lock);
>  	hlist_add_head(&ep->node, &head->chain);
>  	write_unlock(&head->lock);
> -	return 0;
> +out:
> +	return err;
>  }
>  
>  /* Add an endpoint to the hash. Local BH-safe. */
> 
> Dmitry
Dmitry Antipov July 31, 2024, 7:05 a.m. UTC | #3
> What happens if two sockets matching each other reach here ?

Not sure. In general, an attempt to use a rather external thing (struct
sctp_hashbucket) to implement extra synchronization for the reuseport
innards looks somewhat weird.

So let's look at reuseport_add_sock() again. Note the extra comments:

int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany)
{
	struct sock_reuseport *old_reuse, *reuse;

	/* RCU access _outside_ of reuseport_lock'ed section */
	if (!rcu_access_pointer(sk2->sk_reuseport_cb)) {
		int err = reuseport_alloc(sk2, bind_inany);

		if (err)
			return err;
	}

	spin_lock_bh(&reuseport_lock);
	reuse = rcu_dereference_protected(sk2->sk_reuseport_cb,
					  lockdep_is_held(&reuseport_lock));
	old_reuse = rcu_dereference_protected(sk->sk_reuseport_cb,
					      lockdep_is_held(&reuseport_lock));
	if (old_reuse && old_reuse->num_closed_socks) {
		/* sk was shutdown()ed before */
		int err = reuseport_resurrect(sk, old_reuse, reuse, reuse->bind_inany);

		spin_unlock_bh(&reuseport_lock);
		return err;
	}

	if (old_reuse && old_reuse->num_socks != 1) {
		spin_unlock_bh(&reuseport_lock);
		return -EBUSY;
	}

	if (reuse->num_socks + reuse->num_closed_socks == reuse->max_socks) {
		reuse = reuseport_grow(reuse);
		if (!reuse) {
			spin_unlock_bh(&reuseport_lock);
			return -ENOMEM;
		}
	}

	__reuseport_add_sock(sk, reuse);
	/* RCU access _inside_ of reuseport_lock'ed section */
	rcu_assign_pointer(sk->sk_reuseport_cb, reuse);

	spin_unlock_bh(&reuseport_lock);

	if (old_reuse)
		call_rcu(&old_reuse->rcu, reuseport_free_rcu);
	return 0;
}

Why is this so? I've tried to extend the reuseport_lock'ed section to include
the first rcu_access_pointer() in question as well, e.g.:

diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index 5a165286e4d8..39a353ab8ff8 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -186,16 +186,11 @@ static struct sock_reuseport *__reuseport_alloc(unsigned int max_socks)
 	return reuse;
 }
 
-int reuseport_alloc(struct sock *sk, bool bind_inany)
+static int reuseport_alloc_unlocked(struct sock *sk, bool bind_inany)
 {
 	struct sock_reuseport *reuse;
 	int id, ret = 0;
 
-	/* bh lock used since this function call may precede hlist lock in
-	 * soft irq of receive path or setsockopt from process context
-	 */
-	spin_lock_bh(&reuseport_lock);
-
 	/* Allocation attempts can occur concurrently via the setsockopt path
 	 * and the bind/hash path.  Nothing to do when we lose the race.
 	 */
@@ -236,8 +231,19 @@ int reuseport_alloc(struct sock *sk, bool bind_inany)
 	reuse->num_socks = 1;
 	reuseport_get_incoming_cpu(sk, reuse);
 	rcu_assign_pointer(sk->sk_reuseport_cb, reuse);
-
 out:
+	return ret;
+}
+
+int reuseport_alloc(struct sock *sk, bool bind_inany)
+{
+	int ret;
+
+	/* bh lock used since this function call may precede hlist lock in
+	 * soft irq of receive path or setsockopt from process context
+	 */
+	spin_lock_bh(&reuseport_lock);
+	ret = reuseport_alloc_unlocked(sk, bind_inany);
 	spin_unlock_bh(&reuseport_lock);
 
 	return ret;
@@ -322,14 +328,17 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany)
 {
 	struct sock_reuseport *old_reuse, *reuse;
 
+	spin_lock_bh(&reuseport_lock);
+
 	if (!rcu_access_pointer(sk2->sk_reuseport_cb)) {
-		int err = reuseport_alloc(sk2, bind_inany);
+		int err = reuseport_alloc_unlocked(sk2, bind_inany);
 
-		if (err)
+		if (err) {
+			spin_unlock_bh(&reuseport_lock);
 			return err;
+		}
 	}
 
-	spin_lock_bh(&reuseport_lock);
 	reuse = rcu_dereference_protected(sk2->sk_reuseport_cb,
 					  lockdep_is_held(&reuseport_lock));
 	old_reuse = rcu_dereference_protected(sk->sk_reuseport_cb,

and this seems to fix the original crash as well.

Dmitry
Kuniyuki Iwashima July 31, 2024, 7:06 p.m. UTC | #4
From: Dmitry Antipov <dmantipov@yandex.ru>
Date: Wed, 31 Jul 2024 10:05:37 +0300
> > What happens if two sockets matching each other reach here ?
> 
> Not sure.

If two sockets reach the reuseport_alloc() path, two identical
reuseport groups are created.
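
For illustration, with the suggested read_lock() scheme the two listen()ers
could interleave like this (simplified):

  CPU0: listen() on sk_a                CPU1: listen() on sk_b
  ----------------------                ----------------------
  read_lock(&head->lock);
  sctp_for_each_hentry(): no match
  read_unlock(&head->lock);
                                        read_lock(&head->lock);
                                        sctp_for_each_hentry(): no match
                                        read_unlock(&head->lock);
  reuseport_alloc(sk_a, any);
  write_lock(&head->lock);
  hlist_add_head(&ep_a->node, ...);
  write_unlock(&head->lock);
                                        reuseport_alloc(sk_b, any);
                                        write_lock(&head->lock);
                                        hlist_add_head(&ep_b->node, ...);
                                        write_unlock(&head->lock);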

Then, in __sctp_rcv_lookup_endpoint(), incoming packets hit only the
socket group placed before the other in the hash bucket, and the other
socket no longer receives data and silently dies from the userspace
POV.

The suggested change papers over the problem.

reuseport_add_sock() and reuseport_alloc() must be placed in the
same writer critical section.


> In general, an attempt to use rather external thing (struct
> sctp_hashbucket) to implement extra synchronization for reuse innards
> looks somewhat weird.

reuseport_lock is to synchronise operations within the same
reuseport group, but which socket should belong to which group
is out of scope.

That must be well-defined and synchronised at the upper level.

I'll post v2 with updated changelog.

Thanks!
Dmitry Antipov July 31, 2024, 7:15 p.m. UTC | #5
On 7/31/24 10:06 PM, Kuniyuki Iwashima wrote:

> reuseport_lock is to synchronise operations within the same
> reuseport group, but which socket should belong to which group
> is out of scope.

Hm... then the lock should be a member of 'struct sock_reuseport'
rather than global, shouldn't it?

Dmitry
Kuniyuki Iwashima July 31, 2024, 7:33 p.m. UTC | #6
From: Dmitry Antipov <dmantipov@yandex.ru>
Date: Wed, 31 Jul 2024 22:15:12 +0300
> On 7/31/24 10:06 PM, Kuniyuki Iwashima wrote:
> 
> > reuseport_lock is to synchronise operations within the same
> > reuseport group, but which socket should belong to which group
> > is out of scope.
> 
> Hm... then the lock should be a member of 'struct sock_reuseport'
> rather than global, shouldn't it?

I thought about that before, and it would be doable with more complex
locking since we would still need another synchronisation for add/detach
and alloc.  Such operations are unlikely to be done concurrently or
frequently, so the global lock would be enough.

I'll check that again later, but anyway, it's a different topic
for net-next.

Patch

diff --git a/net/sctp/input.c b/net/sctp/input.c
index 17fcaa9b0df9..a8a254a5008e 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -735,15 +735,19 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
 	struct sock *sk = ep->base.sk;
 	struct net *net = sock_net(sk);
 	struct sctp_hashbucket *head;
+	int err = 0;
 
 	ep->hashent = sctp_ep_hashfn(net, ep->base.bind_addr.port);
 	head = &sctp_ep_hashtable[ep->hashent];
 
+	write_lock(&head->lock);
 	if (sk->sk_reuseport) {
 		bool any = sctp_is_ep_boundall(sk);
 		struct sctp_endpoint *ep2;
 		struct list_head *list;
-		int cnt = 0, err = 1;
+		int cnt = 0;
+
+		err = 1;
 
 		list_for_each(list, &ep->base.bind_addr.address_list)
 			cnt++;
@@ -761,24 +765,24 @@ static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
 			if (!err) {
 				err = reuseport_add_sock(sk, sk2, any);
 				if (err)
-					return err;
+					goto out;
 				break;
 			} else if (err < 0) {
-				return err;
+				goto out;
 			}
 		}
 
 		if (err) {
 			err = reuseport_alloc(sk, any);
 			if (err)
-				return err;
+				goto out;
 		}
 	}
 
-	write_lock(&head->lock);
 	hlist_add_head(&ep->node, &head->chain);
+out:
 	write_unlock(&head->lock);
-	return 0;
+	return err;
 }
 
 /* Add an endpoint to the hash. Local BH-safe. */
@@ -803,10 +807,9 @@ static void __sctp_unhash_endpoint(struct sctp_endpoint *ep)
 
 	head = &sctp_ep_hashtable[ep->hashent];
 
+	write_lock(&head->lock);
 	if (rcu_access_pointer(sk->sk_reuseport_cb))
 		reuseport_detach_sock(sk);
-
-	write_lock(&head->lock);
 	hlist_del_init(&ep->node);
 	write_unlock(&head->lock);
 }