diff mbox series

[net] bpf, skmsg: fix NULL pointer dereference in sk_psock_skb_ingress_enqueue

Message ID 20240329134037.92124-1-kerneljasonxing@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] bpf, skmsg: fix NULL pointer dereference in sk_psock_skb_ingress_enqueue | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 945 this patch: 945
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 955 this patch: 955
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 956 this patch: 956
netdev/checkpatch warning WARNING: Possible repeated word: 'Google'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-03-31--03-00 (tests: 953)

Commit Message

Jason Xing March 29, 2024, 1:40 p.m. UTC
From: Jason Xing <kernelxing@tencent.com>

Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which
syzbot reported [1].

[1]
BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue

write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1:
 sk_psock_stop_verdict net/core/skmsg.c:1257 [inline]
 sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843
 sk_psock_put include/linux/skmsg.h:459 [inline]
 sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648
 unix_release+0x4b/0x80 net/unix/af_unix.c:1048
 __sock_release net/socket.c:659 [inline]
 sock_close+0x68/0x150 net/socket.c:1421
 __fput+0x2c1/0x660 fs/file_table.c:422
 __fput_sync+0x44/0x60 fs/file_table.c:507
 __do_sys_close fs/open.c:1556 [inline]
 __se_sys_close+0x101/0x1b0 fs/open.c:1541
 __x64_sys_close+0x1f/0x30 fs/open.c:1541
 do_syscall_64+0xd3/0x1d0
 entry_SYSCALL_64_after_hwframe+0x6d/0x75

read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0:
 sk_psock_data_ready include/linux/skmsg.h:464 [inline]
 sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555
 sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606
 sk_psock_verdict_apply net/core/skmsg.c:1008 [inline]
 sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202
 unix_read_skb net/unix/af_unix.c:2546 [inline]
 unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682
 sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223
 unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x140/0x180 net/socket.c:745
 ____sys_sendmsg+0x312/0x410 net/socket.c:2584
 ___sys_sendmsg net/socket.c:2638 [inline]
 __sys_sendmsg+0x1e9/0x280 net/socket.c:2667
 __do_sys_sendmsg net/socket.c:2676 [inline]
 __se_sys_sendmsg net/socket.c:2674 [inline]
 __x64_sys_sendmsg+0x46/0x50 net/socket.c:2674
 do_syscall_64+0xd3/0x1d0
 entry_SYSCALL_64_after_hwframe+0x6d/0x75

value changed: 0xffffffff83d7feb0 -> 0x0000000000000000

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G        W          6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024

Prior to this, commit 4cd12c6065df ("bpf, sockmap: Fix NULL pointer
dereference in sk_psock_verdict_data_ready()") fixed one NULL pointer
similarly due to no protection of saved_data_ready. Here is another
different caller causing the same issue because of the same reason. So
we should protect it with sk_callback_lock read lock because the writer
side in the sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);".

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
 net/core/skmsg.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jason Xing March 31, 2024, 9:44 a.m. UTC | #1
On Fri, Mar 29, 2024 at 9:40 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which
> syzbot reported [1].
>
> [1]
> BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue
>
> write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1:
>  sk_psock_stop_verdict net/core/skmsg.c:1257 [inline]
>  sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843
>  sk_psock_put include/linux/skmsg.h:459 [inline]
>  sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648
>  unix_release+0x4b/0x80 net/unix/af_unix.c:1048
>  __sock_release net/socket.c:659 [inline]
>  sock_close+0x68/0x150 net/socket.c:1421
>  __fput+0x2c1/0x660 fs/file_table.c:422
>  __fput_sync+0x44/0x60 fs/file_table.c:507
>  __do_sys_close fs/open.c:1556 [inline]
>  __se_sys_close+0x101/0x1b0 fs/open.c:1541
>  __x64_sys_close+0x1f/0x30 fs/open.c:1541
>  do_syscall_64+0xd3/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x6d/0x75
>
> read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0:
>  sk_psock_data_ready include/linux/skmsg.h:464 [inline]
>  sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555
>  sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606
>  sk_psock_verdict_apply net/core/skmsg.c:1008 [inline]
>  sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202
>  unix_read_skb net/unix/af_unix.c:2546 [inline]
>  unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682
>  sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223
>  unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339
>  sock_sendmsg_nosec net/socket.c:730 [inline]
>  __sock_sendmsg+0x140/0x180 net/socket.c:745
>  ____sys_sendmsg+0x312/0x410 net/socket.c:2584
>  ___sys_sendmsg net/socket.c:2638 [inline]
>  __sys_sendmsg+0x1e9/0x280 net/socket.c:2667
>  __do_sys_sendmsg net/socket.c:2676 [inline]
>  __se_sys_sendmsg net/socket.c:2674 [inline]
>  __x64_sys_sendmsg+0x46/0x50 net/socket.c:2674
>  do_syscall_64+0xd3/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x6d/0x75
>
> value changed: 0xffffffff83d7feb0 -> 0x0000000000000000
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G        W          6.8.0-syzkaller-08951-gfe46a7dd189e #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
>
> Prior to this, commit 4cd12c6065df ("bpf, sockmap: Fix NULL pointer
> dereference in sk_psock_verdict_data_ready()") fixed one NULL pointer
> similarly due to no protection of saved_data_ready. Here is another
> different caller causing the same issue because of the same reason. So
> we should protect it with sk_callback_lock read lock because the writer
> side in the sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);".

Should I use 'read_lock(&sk->sk_callback_lock);' in bpf_tcp_ingress()
to protect sk_callback_lock field? If it's ok, I will do it in another
patch.

Thanks,
Jason

>
> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  net/core/skmsg.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 4d75ef9d24bf..67c4c01c5235 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -552,7 +552,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
>         msg->skb = skb;
>
>         sk_psock_queue_msg(psock, msg);
> +       read_lock_bh(&sk->sk_callback_lock);
>         sk_psock_data_ready(sk, psock);
> +       read_unlock_bh(&sk->sk_callback_lock);
>         return copied;
>  }
>
> --
> 2.37.3
>
Jason Xing April 4, 2024, 12:44 a.m. UTC | #2
On Fri, Mar 29, 2024 at 9:40 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> From: Jason Xing <kernelxing@tencent.com>
>
> Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which
> syzbot reported [1].
>
> [1]
> BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue
>
> write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1:
>  sk_psock_stop_verdict net/core/skmsg.c:1257 [inline]
>  sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843
>  sk_psock_put include/linux/skmsg.h:459 [inline]
>  sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648
>  unix_release+0x4b/0x80 net/unix/af_unix.c:1048
>  __sock_release net/socket.c:659 [inline]
>  sock_close+0x68/0x150 net/socket.c:1421
>  __fput+0x2c1/0x660 fs/file_table.c:422
>  __fput_sync+0x44/0x60 fs/file_table.c:507
>  __do_sys_close fs/open.c:1556 [inline]
>  __se_sys_close+0x101/0x1b0 fs/open.c:1541
>  __x64_sys_close+0x1f/0x30 fs/open.c:1541
>  do_syscall_64+0xd3/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x6d/0x75
>
> read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0:
>  sk_psock_data_ready include/linux/skmsg.h:464 [inline]
>  sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555
>  sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606
>  sk_psock_verdict_apply net/core/skmsg.c:1008 [inline]
>  sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202
>  unix_read_skb net/unix/af_unix.c:2546 [inline]
>  unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682
>  sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223
>  unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339
>  sock_sendmsg_nosec net/socket.c:730 [inline]
>  __sock_sendmsg+0x140/0x180 net/socket.c:745
>  ____sys_sendmsg+0x312/0x410 net/socket.c:2584
>  ___sys_sendmsg net/socket.c:2638 [inline]
>  __sys_sendmsg+0x1e9/0x280 net/socket.c:2667
>  __do_sys_sendmsg net/socket.c:2676 [inline]
>  __se_sys_sendmsg net/socket.c:2674 [inline]
>  __x64_sys_sendmsg+0x46/0x50 net/socket.c:2674
>  do_syscall_64+0xd3/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x6d/0x75
>
> value changed: 0xffffffff83d7feb0 -> 0x0000000000000000
>
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G        W          6.8.0-syzkaller-08951-gfe46a7dd189e #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
>
> Prior to this, commit 4cd12c6065df ("bpf, sockmap: Fix NULL pointer
> dereference in sk_psock_verdict_data_ready()") fixed one NULL pointer
> similarly due to no protection of saved_data_ready. Here is another
> different caller causing the same issue because of the same reason. So
> we should protect it with sk_callback_lock read lock because the writer
> side in the sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);".

Hello experts, any comments are welcome:) Did I miss something?

Thanks,
Jason

>
> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  net/core/skmsg.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 4d75ef9d24bf..67c4c01c5235 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -552,7 +552,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
>         msg->skb = skb;
>
>         sk_psock_queue_msg(psock, msg);
> +       read_lock_bh(&sk->sk_callback_lock);
>         sk_psock_data_ready(sk, psock);
> +       read_unlock_bh(&sk->sk_callback_lock);
>         return copied;
>  }
>
> --
> 2.37.3
>
John Fastabend April 4, 2024, 1 a.m. UTC | #3
Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
> 
> Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which
> syzbot reported [1].
> 
> [1]
> BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue
> 
> write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1:
>  sk_psock_stop_verdict net/core/skmsg.c:1257 [inline]
>  sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843
>  sk_psock_put include/linux/skmsg.h:459 [inline]
>  sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648
>  unix_release+0x4b/0x80 net/unix/af_unix.c:1048
>  __sock_release net/socket.c:659 [inline]
>  sock_close+0x68/0x150 net/socket.c:1421
>  __fput+0x2c1/0x660 fs/file_table.c:422
>  __fput_sync+0x44/0x60 fs/file_table.c:507
>  __do_sys_close fs/open.c:1556 [inline]
>  __se_sys_close+0x101/0x1b0 fs/open.c:1541
>  __x64_sys_close+0x1f/0x30 fs/open.c:1541
>  do_syscall_64+0xd3/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> 
> read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0:
>  sk_psock_data_ready include/linux/skmsg.h:464 [inline]
>  sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555
>  sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606
>  sk_psock_verdict_apply net/core/skmsg.c:1008 [inline]
>  sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202
>  unix_read_skb net/unix/af_unix.c:2546 [inline]
>  unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682
>  sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223
>  unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339
>  sock_sendmsg_nosec net/socket.c:730 [inline]
>  __sock_sendmsg+0x140/0x180 net/socket.c:745
>  ____sys_sendmsg+0x312/0x410 net/socket.c:2584
>  ___sys_sendmsg net/socket.c:2638 [inline]
>  __sys_sendmsg+0x1e9/0x280 net/socket.c:2667
>  __do_sys_sendmsg net/socket.c:2676 [inline]
>  __se_sys_sendmsg net/socket.c:2674 [inline]
>  __x64_sys_sendmsg+0x46/0x50 net/socket.c:2674
>  do_syscall_64+0xd3/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> 
> value changed: 0xffffffff83d7feb0 -> 0x0000000000000000
> 
> Reported by Kernel Concurrency Sanitizer on:
> CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G        W          6.8.0-syzkaller-08951-gfe46a7dd189e #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> 
> Prior to this, commit 4cd12c6065df ("bpf, sockmap: Fix NULL pointer
> dereference in sk_psock_verdict_data_ready()") fixed one NULL pointer
> similarly due to no protection of saved_data_ready. Here is another
> different caller causing the same issue because of the same reason. So
> we should protect it with sk_callback_lock read lock because the writer
> side in the sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);".
> 
> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
>  net/core/skmsg.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 4d75ef9d24bf..67c4c01c5235 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -552,7 +552,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
>  	msg->skb = skb;
>  
>  	sk_psock_queue_msg(psock, msg);
> +	read_lock_bh(&sk->sk_callback_lock);
>  	sk_psock_data_ready(sk, psock);
> +	read_unlock_bh(&sk->sk_callback_lock);
>  	return copied;
>  }

The problem is the check and then usage presumably it is already set
to NULL:

 static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
 {
	if (psock->saved_data_ready)
		psock->saved_data_ready(sk);


I'm thinking we might be able to get away with just a READ_ONCE here with
similar WRITE_ONCE on other side. Something like this,

  sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
  {
       saved_data_ready = READ_ONCE(psock->saved_data_ready)

       if (saved_data_ready)
             saved_data_ready(sk)
       ....

And then in sk_psock_stop_verdict,

	WRITE_ONCE(sk->sk_data_ready, psock->saved_data_ready);
	WRITE_ONCE(psock->saved_data_ready, NULL);

And because we don't actually release the sock until a RCU grace period we
should be OK. The TCP stack manages to work correctly without wrapping
tcp_data_ready in locks like this. But nice thing there is you don't change
this callback on live sockets.

I think at least to keep backport simply above patch is ok, but lets move
the read_lock_bh()/unlock_bh() into the sk_psock_data_ready() call and then
we don't duplicate this error again. Does that make sense?

Thanks,
John
Jason Xing April 4, 2024, 1:25 a.m. UTC | #4
Hello John,

On Thu, Apr 4, 2024 at 9:01 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which
> > syzbot reported [1].
> >
> > [1]
> > BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue
> >
> > write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1:
> >  sk_psock_stop_verdict net/core/skmsg.c:1257 [inline]
> >  sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843
> >  sk_psock_put include/linux/skmsg.h:459 [inline]
> >  sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648
> >  unix_release+0x4b/0x80 net/unix/af_unix.c:1048
> >  __sock_release net/socket.c:659 [inline]
> >  sock_close+0x68/0x150 net/socket.c:1421
> >  __fput+0x2c1/0x660 fs/file_table.c:422
> >  __fput_sync+0x44/0x60 fs/file_table.c:507
> >  __do_sys_close fs/open.c:1556 [inline]
> >  __se_sys_close+0x101/0x1b0 fs/open.c:1541
> >  __x64_sys_close+0x1f/0x30 fs/open.c:1541
> >  do_syscall_64+0xd3/0x1d0
> >  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> >
> > read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0:
> >  sk_psock_data_ready include/linux/skmsg.h:464 [inline]
> >  sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555
> >  sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606
> >  sk_psock_verdict_apply net/core/skmsg.c:1008 [inline]
> >  sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202
> >  unix_read_skb net/unix/af_unix.c:2546 [inline]
> >  unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682
> >  sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223
> >  unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339
> >  sock_sendmsg_nosec net/socket.c:730 [inline]
> >  __sock_sendmsg+0x140/0x180 net/socket.c:745
> >  ____sys_sendmsg+0x312/0x410 net/socket.c:2584
> >  ___sys_sendmsg net/socket.c:2638 [inline]
> >  __sys_sendmsg+0x1e9/0x280 net/socket.c:2667
> >  __do_sys_sendmsg net/socket.c:2676 [inline]
> >  __se_sys_sendmsg net/socket.c:2674 [inline]
> >  __x64_sys_sendmsg+0x46/0x50 net/socket.c:2674
> >  do_syscall_64+0xd3/0x1d0
> >  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> >
> > value changed: 0xffffffff83d7feb0 -> 0x0000000000000000
> >
> > Reported by Kernel Concurrency Sanitizer on:
> > CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G        W          6.8.0-syzkaller-08951-gfe46a7dd189e #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> >
> > Prior to this, commit 4cd12c6065df ("bpf, sockmap: Fix NULL pointer
> > dereference in sk_psock_verdict_data_ready()") fixed one NULL pointer
> > similarly due to no protection of saved_data_ready. Here is another
> > different caller causing the same issue because of the same reason. So
> > we should protect it with sk_callback_lock read lock because the writer
> > side in the sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);".
> >
> > Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> > Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com
> > Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> >  net/core/skmsg.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index 4d75ef9d24bf..67c4c01c5235 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -552,7 +552,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
> >       msg->skb = skb;
> >
> >       sk_psock_queue_msg(psock, msg);
> > +     read_lock_bh(&sk->sk_callback_lock);
> >       sk_psock_data_ready(sk, psock);
> > +     read_unlock_bh(&sk->sk_callback_lock);
> >       return copied;
> >  }
>
> The problem is the check and then usage presumably it is already set
> to NULL:
>
>  static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
>  {
>         if (psock->saved_data_ready)
>                 psock->saved_data_ready(sk);

Yes.

>
>
> I'm thinking we might be able to get away with just a READ_ONCE here with
> similar WRITE_ONCE on other side. Something like this,

The simple fix that popped into my mind at the beginning is the same
as you: adding the READ_ONCE/WRITE_ONCE pair.

>
>   sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
>   {
>        saved_data_ready = READ_ONCE(psock->saved_data_ready)
>
>        if (saved_data_ready)
>              saved_data_ready(sk)
>        ....
>
> And then in sk_psock_stop_verdict,
>
>         WRITE_ONCE(sk->sk_data_ready, psock->saved_data_ready);
>         WRITE_ONCE(psock->saved_data_ready, NULL);
>
> And because we don't actually release the sock until a RCU grace period we
> should be OK. The TCP stack manages to work correctly without wrapping
> tcp_data_ready in locks like this. But nice thing there is you don't change
> this callback on live sockets.
>
> I think at least to keep backport simply above patch is ok, but lets move
> the read_lock_bh()/unlock_bh() into the sk_psock_data_ready() call and then
> we don't duplicate this error again. Does that make sense?

Thanks for your quick response and such detailed analysis:)

It makes sense. I will move it into the sk_psock_data_ready(), then
three callers for now and possible callers in future would never go
wrong.

>
> Thanks,
> John
>
John Fastabend April 5, 2024, 4:45 a.m. UTC | #5
Jason Xing wrote:
> Hello John,
> 
> On Thu, Apr 4, 2024 at 9:01 AM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which
> > > syzbot reported [1].
> > >
> > > [1]
> > > BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue
> > >
> > > write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1:
> > >  sk_psock_stop_verdict net/core/skmsg.c:1257 [inline]
> > >  sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843
> > >  sk_psock_put include/linux/skmsg.h:459 [inline]
> > >  sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648
> > >  unix_release+0x4b/0x80 net/unix/af_unix.c:1048
> > >  __sock_release net/socket.c:659 [inline]
> > >  sock_close+0x68/0x150 net/socket.c:1421
> > >  __fput+0x2c1/0x660 fs/file_table.c:422
> > >  __fput_sync+0x44/0x60 fs/file_table.c:507
> > >  __do_sys_close fs/open.c:1556 [inline]
> > >  __se_sys_close+0x101/0x1b0 fs/open.c:1541
> > >  __x64_sys_close+0x1f/0x30 fs/open.c:1541
> > >  do_syscall_64+0xd3/0x1d0
> > >  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > >
> > > read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0:
> > >  sk_psock_data_ready include/linux/skmsg.h:464 [inline]
> > >  sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555
> > >  sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606
> > >  sk_psock_verdict_apply net/core/skmsg.c:1008 [inline]
> > >  sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202
> > >  unix_read_skb net/unix/af_unix.c:2546 [inline]
> > >  unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682
> > >  sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223
> > >  unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339
> > >  sock_sendmsg_nosec net/socket.c:730 [inline]
> > >  __sock_sendmsg+0x140/0x180 net/socket.c:745
> > >  ____sys_sendmsg+0x312/0x410 net/socket.c:2584
> > >  ___sys_sendmsg net/socket.c:2638 [inline]
> > >  __sys_sendmsg+0x1e9/0x280 net/socket.c:2667
> > >  __do_sys_sendmsg net/socket.c:2676 [inline]
> > >  __se_sys_sendmsg net/socket.c:2674 [inline]
> > >  __x64_sys_sendmsg+0x46/0x50 net/socket.c:2674
> > >  do_syscall_64+0xd3/0x1d0
> > >  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > >
> > > value changed: 0xffffffff83d7feb0 -> 0x0000000000000000
> > >
> > > Reported by Kernel Concurrency Sanitizer on:
> > > CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G        W          6.8.0-syzkaller-08951-gfe46a7dd189e #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> > >
> > > Prior to this, commit 4cd12c6065df ("bpf, sockmap: Fix NULL pointer
> > > dereference in sk_psock_verdict_data_ready()") fixed one NULL pointer
> > > similarly due to no protection of saved_data_ready. Here is another
> > > different caller causing the same issue because of the same reason. So
> > > we should protect it with sk_callback_lock read lock because the writer
> > > side in the sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);".
> > >
> > > Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> > > Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com
> > > Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > >  net/core/skmsg.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > > index 4d75ef9d24bf..67c4c01c5235 100644
> > > --- a/net/core/skmsg.c
> > > +++ b/net/core/skmsg.c
> > > @@ -552,7 +552,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
> > >       msg->skb = skb;
> > >
> > >       sk_psock_queue_msg(psock, msg);
> > > +     read_lock_bh(&sk->sk_callback_lock);
> > >       sk_psock_data_ready(sk, psock);
> > > +     read_unlock_bh(&sk->sk_callback_lock);
> > >       return copied;
> > >  }
> >
> > The problem is the check and then usage presumably it is already set
> > to NULL:
> >
> >  static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
> >  {
> >         if (psock->saved_data_ready)
> >                 psock->saved_data_ready(sk);
> 
> Yes.
> 
> >
> >
> > I'm thinking we might be able to get away with just a READ_ONCE here with
> > similar WRITE_ONCE on other side. Something like this,
> 
> The simple fix that popped into my mind at the beginning is the same
> as you: adding the READ_ONCE/WRITE_ONCE pair.

Let me know if you want to try doing a patch with the READ_ONCE/WRITE_ONCE
we could push something like that through bpf-next I think. Just needs
some extra thought and testing.
Jason Xing April 5, 2024, 5:12 a.m. UTC | #6
On Fri, Apr 5, 2024 at 12:45 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Jason Xing wrote:
> > Hello John,
> >
> > On Thu, Apr 4, 2024 at 9:01 AM John Fastabend <john.fastabend@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which
> > > > syzbot reported [1].
> > > >
> > > > [1]
> > > > BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue
> > > >
> > > > write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1:
> > > >  sk_psock_stop_verdict net/core/skmsg.c:1257 [inline]
> > > >  sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843
> > > >  sk_psock_put include/linux/skmsg.h:459 [inline]
> > > >  sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648
> > > >  unix_release+0x4b/0x80 net/unix/af_unix.c:1048
> > > >  __sock_release net/socket.c:659 [inline]
> > > >  sock_close+0x68/0x150 net/socket.c:1421
> > > >  __fput+0x2c1/0x660 fs/file_table.c:422
> > > >  __fput_sync+0x44/0x60 fs/file_table.c:507
> > > >  __do_sys_close fs/open.c:1556 [inline]
> > > >  __se_sys_close+0x101/0x1b0 fs/open.c:1541
> > > >  __x64_sys_close+0x1f/0x30 fs/open.c:1541
> > > >  do_syscall_64+0xd3/0x1d0
> > > >  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > > >
> > > > read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0:
> > > >  sk_psock_data_ready include/linux/skmsg.h:464 [inline]
> > > >  sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555
> > > >  sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606
> > > >  sk_psock_verdict_apply net/core/skmsg.c:1008 [inline]
> > > >  sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202
> > > >  unix_read_skb net/unix/af_unix.c:2546 [inline]
> > > >  unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682
> > > >  sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223
> > > >  unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339
> > > >  sock_sendmsg_nosec net/socket.c:730 [inline]
> > > >  __sock_sendmsg+0x140/0x180 net/socket.c:745
> > > >  ____sys_sendmsg+0x312/0x410 net/socket.c:2584
> > > >  ___sys_sendmsg net/socket.c:2638 [inline]
> > > >  __sys_sendmsg+0x1e9/0x280 net/socket.c:2667
> > > >  __do_sys_sendmsg net/socket.c:2676 [inline]
> > > >  __se_sys_sendmsg net/socket.c:2674 [inline]
> > > >  __x64_sys_sendmsg+0x46/0x50 net/socket.c:2674
> > > >  do_syscall_64+0xd3/0x1d0
> > > >  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > > >
> > > > value changed: 0xffffffff83d7feb0 -> 0x0000000000000000
> > > >
> > > > Reported by Kernel Concurrency Sanitizer on:
> > > > CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G        W          6.8.0-syzkaller-08951-gfe46a7dd189e #0
> > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> > > >
> > > > Prior to this, commit 4cd12c6065df ("bpf, sockmap: Fix NULL pointer
> > > > dereference in sk_psock_verdict_data_ready()") fixed one NULL pointer
> > > > similarly due to no protection of saved_data_ready. Here is another
> > > > different caller causing the same issue because of the same reason. So
> > > > we should protect it with sk_callback_lock read lock because the writer
> > > > side in the sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);".
> > > >
> > > > Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> > > > Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com
> > > > Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > >  net/core/skmsg.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > > > index 4d75ef9d24bf..67c4c01c5235 100644
> > > > --- a/net/core/skmsg.c
> > > > +++ b/net/core/skmsg.c
> > > > @@ -552,7 +552,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
> > > >       msg->skb = skb;
> > > >
> > > >       sk_psock_queue_msg(psock, msg);
> > > > +     read_lock_bh(&sk->sk_callback_lock);
> > > >       sk_psock_data_ready(sk, psock);
> > > > +     read_unlock_bh(&sk->sk_callback_lock);
> > > >       return copied;
> > > >  }
> > >
> > > The problem is the check and then usage presumably it is already set
> > > to NULL:
> > >
> > >  static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
> > >  {
> > >         if (psock->saved_data_ready)
> > >                 psock->saved_data_ready(sk);
> >
> > Yes.
> >
> > >
> > >
> > > I'm thinking we might be able to get away with just a READ_ONCE here with
> > > similar WRITE_ONCE on other side. Something like this,
> >
> > The simple fix that popped into my mind at the beginning is the same
> > as you: adding the READ_ONCE/WRITE_ONCE pair.
>
> Let me know if you want to try doing a patch with the READ_ONCE/WRITE_ONCE
> we could push something like that through bpf-next I think. Just needs
> some extra thought and testing.

Yes, I'm interested in it. Just a little bit worried that I cannot do
it well. I will take some time to dig into it.

BTW, would this modification conflict with the current patch? The
final solution you're thinking of is using the READ_ONCE/WRITE_ONCE
pair?

Thanks,
Jason
John Fastabend April 5, 2024, 2:58 p.m. UTC | #7
Jason Xing wrote:
> On Fri, Apr 5, 2024 at 12:45 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > Hello John,
> > >
> > > On Thu, Apr 4, 2024 at 9:01 AM John Fastabend <john.fastabend@gmail.com> wrote:
> > > >
> > > > Jason Xing wrote:
> > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > >
> > > > > Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which
> > > > > syzbot reported [1].
> > > > >
> > > > > [1]
> > > > > BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue
> > > > >
> > > > > write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1:
> > > > >  sk_psock_stop_verdict net/core/skmsg.c:1257 [inline]
> > > > >  sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843
> > > > >  sk_psock_put include/linux/skmsg.h:459 [inline]
> > > > >  sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648
> > > > >  unix_release+0x4b/0x80 net/unix/af_unix.c:1048
> > > > >  __sock_release net/socket.c:659 [inline]
> > > > >  sock_close+0x68/0x150 net/socket.c:1421
> > > > >  __fput+0x2c1/0x660 fs/file_table.c:422
> > > > >  __fput_sync+0x44/0x60 fs/file_table.c:507
> > > > >  __do_sys_close fs/open.c:1556 [inline]
> > > > >  __se_sys_close+0x101/0x1b0 fs/open.c:1541
> > > > >  __x64_sys_close+0x1f/0x30 fs/open.c:1541
> > > > >  do_syscall_64+0xd3/0x1d0
> > > > >  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > > > >
> > > > > read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0:
> > > > >  sk_psock_data_ready include/linux/skmsg.h:464 [inline]
> > > > >  sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555
> > > > >  sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606
> > > > >  sk_psock_verdict_apply net/core/skmsg.c:1008 [inline]
> > > > >  sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202
> > > > >  unix_read_skb net/unix/af_unix.c:2546 [inline]
> > > > >  unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682
> > > > >  sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223
> > > > >  unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339
> > > > >  sock_sendmsg_nosec net/socket.c:730 [inline]
> > > > >  __sock_sendmsg+0x140/0x180 net/socket.c:745
> > > > >  ____sys_sendmsg+0x312/0x410 net/socket.c:2584
> > > > >  ___sys_sendmsg net/socket.c:2638 [inline]
> > > > >  __sys_sendmsg+0x1e9/0x280 net/socket.c:2667
> > > > >  __do_sys_sendmsg net/socket.c:2676 [inline]
> > > > >  __se_sys_sendmsg net/socket.c:2674 [inline]
> > > > >  __x64_sys_sendmsg+0x46/0x50 net/socket.c:2674
> > > > >  do_syscall_64+0xd3/0x1d0
> > > > >  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > > > >
> > > > > value changed: 0xffffffff83d7feb0 -> 0x0000000000000000
> > > > >
> > > > > Reported by Kernel Concurrency Sanitizer on:
> > > > > CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G        W          6.8.0-syzkaller-08951-gfe46a7dd189e #0
> > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> > > > >
> > > > > Prior to this, commit 4cd12c6065df ("bpf, sockmap: Fix NULL pointer
> > > > > dereference in sk_psock_verdict_data_ready()") fixed one NULL pointer
> > > > > similarly due to no protection of saved_data_ready. Here is another
> > > > > different caller causing the same issue because of the same reason. So
> > > > > we should protect it with sk_callback_lock read lock because the writer
> > > > > side in the sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);".
> > > > >
> > > > > Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> > > > > Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com
> > > > > Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d
> > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > > ---
> > > > >  net/core/skmsg.c | 2 ++
> > > > >  1 file changed, 2 insertions(+)
> > > > >
> > > > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > > > > index 4d75ef9d24bf..67c4c01c5235 100644
> > > > > --- a/net/core/skmsg.c
> > > > > +++ b/net/core/skmsg.c
> > > > > @@ -552,7 +552,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
> > > > >       msg->skb = skb;
> > > > >
> > > > >       sk_psock_queue_msg(psock, msg);
> > > > > +     read_lock_bh(&sk->sk_callback_lock);
> > > > >       sk_psock_data_ready(sk, psock);
> > > > > +     read_unlock_bh(&sk->sk_callback_lock);
> > > > >       return copied;
> > > > >  }
> > > >
> > > > The problem is the check and then usage presumably it is already set
> > > > to NULL:
> > > >
> > > >  static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
> > > >  {
> > > >         if (psock->saved_data_ready)
> > > >                 psock->saved_data_ready(sk);
> > >
> > > Yes.
> > >
> > > >
> > > >
> > > > I'm thinking we might be able to get away with just a READ_ONCE here with
> > > > similar WRITE_ONCE on other side. Something like this,
> > >
> > > The simple fix that popped into my mind at the beginning is the same
> > > as you: adding the READ_ONCE/WRITE_ONCE pair.
> >
> > Let me know if you want to try doing a patch with the READ_ONCE/WRITE_ONCE
> > we could push something like that through bpf-next I think. Just needs
> > some extra thought and testing.
> 
> Yes, I'm interested in it. Just a little bit worried that I cannot do
> it well. I will take some time to dig into it.
> 
> BTW, would this modification conflict with the current patch? The
> final solution you're thinking of is using the READ_ONCE/WRITE_ONCE
> pair?

Idea would be you can drop the read_lock/unlock once READ_ONCE/WRITE_ONCE
and such are in place.

> 
> Thanks,
> Jason
Jason Xing April 5, 2024, 3 p.m. UTC | #8
On Fri, Apr 5, 2024 at 10:58 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Fri, Apr 5, 2024 at 12:45 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > Hello John,
> > > >
> > > > On Thu, Apr 4, 2024 at 9:01 AM John Fastabend <john.fastabend@gmail.com> wrote:
> > > > >
> > > > > Jason Xing wrote:
> > > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > > >
> > > > > > Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which
> > > > > > syzbot reported [1].
> > > > > >
> > > > > > [1]
> > > > > > BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue
> > > > > >
> > > > > > write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1:
> > > > > >  sk_psock_stop_verdict net/core/skmsg.c:1257 [inline]
> > > > > >  sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843
> > > > > >  sk_psock_put include/linux/skmsg.h:459 [inline]
> > > > > >  sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648
> > > > > >  unix_release+0x4b/0x80 net/unix/af_unix.c:1048
> > > > > >  __sock_release net/socket.c:659 [inline]
> > > > > >  sock_close+0x68/0x150 net/socket.c:1421
> > > > > >  __fput+0x2c1/0x660 fs/file_table.c:422
> > > > > >  __fput_sync+0x44/0x60 fs/file_table.c:507
> > > > > >  __do_sys_close fs/open.c:1556 [inline]
> > > > > >  __se_sys_close+0x101/0x1b0 fs/open.c:1541
> > > > > >  __x64_sys_close+0x1f/0x30 fs/open.c:1541
> > > > > >  do_syscall_64+0xd3/0x1d0
> > > > > >  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > > > > >
> > > > > > read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0:
> > > > > >  sk_psock_data_ready include/linux/skmsg.h:464 [inline]
> > > > > >  sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555
> > > > > >  sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606
> > > > > >  sk_psock_verdict_apply net/core/skmsg.c:1008 [inline]
> > > > > >  sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202
> > > > > >  unix_read_skb net/unix/af_unix.c:2546 [inline]
> > > > > >  unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682
> > > > > >  sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223
> > > > > >  unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339
> > > > > >  sock_sendmsg_nosec net/socket.c:730 [inline]
> > > > > >  __sock_sendmsg+0x140/0x180 net/socket.c:745
> > > > > >  ____sys_sendmsg+0x312/0x410 net/socket.c:2584
> > > > > >  ___sys_sendmsg net/socket.c:2638 [inline]
> > > > > >  __sys_sendmsg+0x1e9/0x280 net/socket.c:2667
> > > > > >  __do_sys_sendmsg net/socket.c:2676 [inline]
> > > > > >  __se_sys_sendmsg net/socket.c:2674 [inline]
> > > > > >  __x64_sys_sendmsg+0x46/0x50 net/socket.c:2674
> > > > > >  do_syscall_64+0xd3/0x1d0
> > > > > >  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > > > > >
> > > > > > value changed: 0xffffffff83d7feb0 -> 0x0000000000000000
> > > > > >
> > > > > > Reported by Kernel Concurrency Sanitizer on:
> > > > > > CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G        W          6.8.0-syzkaller-08951-gfe46a7dd189e #0
> > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> > > > > >
> > > > > > Prior to this, commit 4cd12c6065df ("bpf, sockmap: Fix NULL pointer
> > > > > > dereference in sk_psock_verdict_data_ready()") fixed one NULL pointer
> > > > > > similarly due to no protection of saved_data_ready. Here is another
> > > > > > different caller causing the same issue because of the same reason. So
> > > > > > we should protect it with sk_callback_lock read lock because the writer
> > > > > > side in the sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);".
> > > > > >
> > > > > > Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> > > > > > Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com
> > > > > > Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d
> > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > > > ---
> > > > > >  net/core/skmsg.c | 2 ++
> > > > > >  1 file changed, 2 insertions(+)
> > > > > >
> > > > > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > > > > > index 4d75ef9d24bf..67c4c01c5235 100644
> > > > > > --- a/net/core/skmsg.c
> > > > > > +++ b/net/core/skmsg.c
> > > > > > @@ -552,7 +552,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
> > > > > >       msg->skb = skb;
> > > > > >
> > > > > >       sk_psock_queue_msg(psock, msg);
> > > > > > +     read_lock_bh(&sk->sk_callback_lock);
> > > > > >       sk_psock_data_ready(sk, psock);
> > > > > > +     read_unlock_bh(&sk->sk_callback_lock);
> > > > > >       return copied;
> > > > > >  }
> > > > >
> > > > > The problem is the check and then usage presumably it is already set
> > > > > to NULL:
> > > > >
> > > > >  static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
> > > > >  {
> > > > >         if (psock->saved_data_ready)
> > > > >                 psock->saved_data_ready(sk);
> > > >
> > > > Yes.
> > > >
> > > > >
> > > > >
> > > > > I'm thinking we might be able to get away with just a READ_ONCE here with
> > > > > similar WRITE_ONCE on other side. Something like this,
> > > >
> > > > The simple fix that popped into my mind at the beginning is the same
> > > > as you: adding the READ_ONCE/WRITE_ONCE pair.
> > >
> > > Let me know if you want to try doing a patch with the READ_ONCE/WRITE_ONCE
> > > we could push something like that through bpf-next I think. Just needs
> > > some extra thought and testing.
> >
> > Yes, I'm interested in it. Just a little bit worried that I cannot do
> > it well. I will take some time to dig into it.
> >
> > BTW, would this modification conflict with the current patch? The
> > final solution you're thinking of is using the READ_ONCE/WRITE_ONCE
> > pair?
>
> Idea would be you can drop the read_lock/unlock once READ_ONCE/WRITE_ONCE
> and such are in place.

Got it. Allow me to work on it. Lockless protection is surely better
than rw lock. But can we let the current patch go into the tree first?

Thanks,
Jason
>
> >
> > Thanks,
> > Jason
>
>
John Fastabend April 5, 2024, 3:28 p.m. UTC | #9
Jason Xing wrote:
> On Fri, Apr 5, 2024 at 10:58 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Jason Xing wrote:
> > > On Fri, Apr 5, 2024 at 12:45 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > > >
> > > > Jason Xing wrote:
> > > > > Hello John,
> > > > >
> > > > > On Thu, Apr 4, 2024 at 9:01 AM John Fastabend <john.fastabend@gmail.com> wrote:
> > > > > >
> > > > > > Jason Xing wrote:
> > > > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > > > >
> > > > > > > Fix NULL pointer data-races in sk_psock_skb_ingress_enqueue() which
> > > > > > > syzbot reported [1].
> > > > > > >
> > > > > > > [1]
> > > > > > > BUG: KCSAN: data-race in sk_psock_drop / sk_psock_skb_ingress_enqueue
> > > > > > >
> > > > > > > write to 0xffff88814b3278b8 of 8 bytes by task 10724 on cpu 1:
> > > > > > >  sk_psock_stop_verdict net/core/skmsg.c:1257 [inline]
> > > > > > >  sk_psock_drop+0x13e/0x1f0 net/core/skmsg.c:843
> > > > > > >  sk_psock_put include/linux/skmsg.h:459 [inline]
> > > > > > >  sock_map_close+0x1a7/0x260 net/core/sock_map.c:1648
> > > > > > >  unix_release+0x4b/0x80 net/unix/af_unix.c:1048
> > > > > > >  __sock_release net/socket.c:659 [inline]
> > > > > > >  sock_close+0x68/0x150 net/socket.c:1421
> > > > > > >  __fput+0x2c1/0x660 fs/file_table.c:422
> > > > > > >  __fput_sync+0x44/0x60 fs/file_table.c:507
> > > > > > >  __do_sys_close fs/open.c:1556 [inline]
> > > > > > >  __se_sys_close+0x101/0x1b0 fs/open.c:1541
> > > > > > >  __x64_sys_close+0x1f/0x30 fs/open.c:1541
> > > > > > >  do_syscall_64+0xd3/0x1d0
> > > > > > >  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > > > > > >
> > > > > > > read to 0xffff88814b3278b8 of 8 bytes by task 10713 on cpu 0:
> > > > > > >  sk_psock_data_ready include/linux/skmsg.h:464 [inline]
> > > > > > >  sk_psock_skb_ingress_enqueue+0x32d/0x390 net/core/skmsg.c:555
> > > > > > >  sk_psock_skb_ingress_self+0x185/0x1e0 net/core/skmsg.c:606
> > > > > > >  sk_psock_verdict_apply net/core/skmsg.c:1008 [inline]
> > > > > > >  sk_psock_verdict_recv+0x3e4/0x4a0 net/core/skmsg.c:1202
> > > > > > >  unix_read_skb net/unix/af_unix.c:2546 [inline]
> > > > > > >  unix_stream_read_skb+0x9e/0xf0 net/unix/af_unix.c:2682
> > > > > > >  sk_psock_verdict_data_ready+0x77/0x220 net/core/skmsg.c:1223
> > > > > > >  unix_stream_sendmsg+0x527/0x860 net/unix/af_unix.c:2339
> > > > > > >  sock_sendmsg_nosec net/socket.c:730 [inline]
> > > > > > >  __sock_sendmsg+0x140/0x180 net/socket.c:745
> > > > > > >  ____sys_sendmsg+0x312/0x410 net/socket.c:2584
> > > > > > >  ___sys_sendmsg net/socket.c:2638 [inline]
> > > > > > >  __sys_sendmsg+0x1e9/0x280 net/socket.c:2667
> > > > > > >  __do_sys_sendmsg net/socket.c:2676 [inline]
> > > > > > >  __se_sys_sendmsg net/socket.c:2674 [inline]
> > > > > > >  __x64_sys_sendmsg+0x46/0x50 net/socket.c:2674
> > > > > > >  do_syscall_64+0xd3/0x1d0
> > > > > > >  entry_SYSCALL_64_after_hwframe+0x6d/0x75
> > > > > > >
> > > > > > > value changed: 0xffffffff83d7feb0 -> 0x0000000000000000
> > > > > > >
> > > > > > > Reported by Kernel Concurrency Sanitizer on:
> > > > > > > CPU: 0 PID: 10713 Comm: syz-executor.4 Tainted: G        W          6.8.0-syzkaller-08951-gfe46a7dd189e #0
> > > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> > > > > > >
> > > > > > > Prior to this, commit 4cd12c6065df ("bpf, sockmap: Fix NULL pointer
> > > > > > > dereference in sk_psock_verdict_data_ready()") fixed one NULL pointer
> > > > > > > similarly due to no protection of saved_data_ready. Here is another
> > > > > > > different caller causing the same issue because of the same reason. So
> > > > > > > we should protect it with sk_callback_lock read lock because the writer
> > > > > > > side in the sk_psock_drop() uses "write_lock_bh(&sk->sk_callback_lock);".
> > > > > > >
> > > > > > > Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> > > > > > > Reported-by: syzbot+aa8c8ec2538929f18f2d@syzkaller.appspotmail.com
> > > > > > > Closes: https://syzkaller.appspot.com/bug?extid=aa8c8ec2538929f18f2d
> > > > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > > > > ---
> > > > > > >  net/core/skmsg.c | 2 ++
> > > > > > >  1 file changed, 2 insertions(+)
> > > > > > >
> > > > > > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > > > > > > index 4d75ef9d24bf..67c4c01c5235 100644
> > > > > > > --- a/net/core/skmsg.c
> > > > > > > +++ b/net/core/skmsg.c
> > > > > > > @@ -552,7 +552,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
> > > > > > >       msg->skb = skb;
> > > > > > >
> > > > > > >       sk_psock_queue_msg(psock, msg);
> > > > > > > +     read_lock_bh(&sk->sk_callback_lock);
> > > > > > >       sk_psock_data_ready(sk, psock);
> > > > > > > +     read_unlock_bh(&sk->sk_callback_lock);
> > > > > > >       return copied;
> > > > > > >  }
> > > > > >
> > > > > > The problem is the check and then usage presumably it is already set
> > > > > > to NULL:
> > > > > >
> > > > > >  static inline void sk_psock_data_ready(struct sock *sk, struct sk_psock *psock)
> > > > > >  {
> > > > > >         if (psock->saved_data_ready)
> > > > > >                 psock->saved_data_ready(sk);
> > > > >
> > > > > Yes.
> > > > >
> > > > > >
> > > > > >
> > > > > > I'm thinking we might be able to get away with just a READ_ONCE here with
> > > > > > similar WRITE_ONCE on other side. Something like this,
> > > > >
> > > > > The simple fix that popped into my mind at the beginning is the same
> > > > > as you: adding the READ_ONCE/WRITE_ONCE pair.
> > > >
> > > > Let me know if you want to try doing a patch with the READ_ONCE/WRITE_ONCE
> > > > we could push something like that through bpf-next I think. Just needs
> > > > some extra thought and testing.
> > >
> > > Yes, I'm interested in it. Just a little bit worried that I cannot do
> > > it well. I will take some time to dig into it.
> > >
> > > BTW, would this modification conflict with the current patch? The
> > > final solution you're thinking of is using the READ_ONCE/WRITE_ONCE
> > > pair?
> >
> > Idea would be you can drop the read_lock/unlock once READ_ONCE/WRITE_ONCE
> > and such are in place.
> 
> Got it. Allow me to work on it. Lockless protection is surely better
> than rw lock. But can we let the current patch go into the tree first?

Yes v2 fix has my Reviewed-by and should go in as a fix. The above
improvement can go through bpf-next when its ready.
diff mbox series

Patch

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 4d75ef9d24bf..67c4c01c5235 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -552,7 +552,9 @@  static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
 	msg->skb = skb;
 
 	sk_psock_queue_msg(psock, msg);
+	read_lock_bh(&sk->sk_callback_lock);
 	sk_psock_data_ready(sk, psock);
+	read_unlock_bh(&sk->sk_callback_lock);
 	return copied;
 }