diff mbox series

[bpf] tcp_bpf: fix return value of tcp_bpf_sendmsg()

Message ID 20240821030744.320934-1-xiyou.wangcong@gmail.com (mailing list archive)
State Accepted
Delegated to: Netdev Maintainers
Headers show
Series [bpf] tcp_bpf: fix return value of tcp_bpf_sendmsg() | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for bpf
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 2 blamed authors not CCed: ast@kernel.org daniel@iogearbox.net; 6 maintainers not CCed: pabeni@redhat.com ast@kernel.org kuba@kernel.org edumazet@google.com dsahern@kernel.org daniel@iogearbox.net
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 16 this patch: 16
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-VM_Test-19 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-VM_Test-16 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-VM_Test-17 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-18 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-20 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-15 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-21 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-22 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-25 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-27 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-29 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-28 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-VM_Test-33 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-VM_Test-34 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-35 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-36 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-41 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-13 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-14 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc

Commit Message

Cong Wang Aug. 21, 2024, 3:07 a.m. UTC
From: Cong Wang <cong.wang@bytedance.com>

When we cork messages in psock->cork, the last message triggers the
flushing will result in sending a sk_msg larger than the current
message size. In this case, in tcp_bpf_send_verdict(), 'copied' becomes
negative at least in the following case:

468         case __SK_DROP:
469         default:
470                 sk_msg_free_partial(sk, msg, tosend);
471                 sk_msg_apply_bytes(psock, tosend);
472                 *copied -= (tosend + delta); // <==== HERE
473                 return -EACCES;

Therefore, it could lead to the following BUG with a proper value of
'copied' (thanks to syzbot). We should not use negative 'copied' as a
return value here.

  ------------[ cut here ]------------
  kernel BUG at net/socket.c:733!
  Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 UID: 0 PID: 3265 Comm: syz-executor510 Not tainted 6.11.0-rc3-syzkaller-00060-gd07b43284ab3 #0
  Hardware name: linux,dummy-virt (DT)
  pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
  pc : sock_sendmsg_nosec net/socket.c:733 [inline]
  pc : sock_sendmsg_nosec net/socket.c:728 [inline]
  pc : __sock_sendmsg+0x5c/0x60 net/socket.c:745
  lr : sock_sendmsg_nosec net/socket.c:730 [inline]
  lr : __sock_sendmsg+0x54/0x60 net/socket.c:745
  sp : ffff800088ea3b30
  x29: ffff800088ea3b30 x28: fbf00000062bc900 x27: 0000000000000000
  x26: ffff800088ea3bc0 x25: ffff800088ea3bc0 x24: 0000000000000000
  x23: f9f00000048dc000 x22: 0000000000000000 x21: ffff800088ea3d90
  x20: f9f00000048dc000 x19: ffff800088ea3d90 x18: 0000000000000001
  x17: 0000000000000000 x16: 0000000000000000 x15: 000000002002ffaf
  x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
  x11: 0000000000000000 x10: ffff8000815849c0 x9 : ffff8000815b49c0
  x8 : 0000000000000000 x7 : 000000000000003f x6 : 0000000000000000
  x5 : 00000000000007e0 x4 : fff07ffffd239000 x3 : fbf00000062bc900
  x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000fffffdef
  Call trace:
   sock_sendmsg_nosec net/socket.c:733 [inline]
   __sock_sendmsg+0x5c/0x60 net/socket.c:745
   ____sys_sendmsg+0x274/0x2ac net/socket.c:2597
   ___sys_sendmsg+0xac/0x100 net/socket.c:2651
   __sys_sendmsg+0x84/0xe0 net/socket.c:2680
   __do_sys_sendmsg net/socket.c:2689 [inline]
   __se_sys_sendmsg net/socket.c:2687 [inline]
   __arm64_sys_sendmsg+0x24/0x30 net/socket.c:2687
   __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
   invoke_syscall+0x48/0x110 arch/arm64/kernel/syscall.c:49
   el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:132
   do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:151
   el0_svc+0x34/0xec arch/arm64/kernel/entry-common.c:712
   el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
   el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
  Code: f9404463 d63f0060 3108441f 54fffe81 (d4210000)
  ---[ end trace 0000000000000000 ]---

Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Reported-by: syzbot+58c03971700330ce14d8@syzkaller.appspotmail.com
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/ipv4/tcp_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Simon Horman Aug. 21, 2024, 2:55 p.m. UTC | #1
On Tue, Aug 20, 2024 at 08:07:44PM -0700, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> When we cork messages in psock->cork, the last message triggers the
> flushing will result in sending a sk_msg larger than the current
> message size. In this case, in tcp_bpf_send_verdict(), 'copied' becomes
> negative at least in the following case:
> 
> 468         case __SK_DROP:
> 469         default:
> 470                 sk_msg_free_partial(sk, msg, tosend);
> 471                 sk_msg_apply_bytes(psock, tosend);
> 472                 *copied -= (tosend + delta); // <==== HERE
> 473                 return -EACCES;
> 
> Therefore, it could lead to the following BUG with a proper value of
> 'copied' (thanks to syzbot). We should not use negative 'copied' as a
> return value here.
> 
>   ------------[ cut here ]------------
>   kernel BUG at net/socket.c:733!
>   Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>   Modules linked in:
>   CPU: 0 UID: 0 PID: 3265 Comm: syz-executor510 Not tainted 6.11.0-rc3-syzkaller-00060-gd07b43284ab3 #0
>   Hardware name: linux,dummy-virt (DT)
>   pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>   pc : sock_sendmsg_nosec net/socket.c:733 [inline]
>   pc : sock_sendmsg_nosec net/socket.c:728 [inline]
>   pc : __sock_sendmsg+0x5c/0x60 net/socket.c:745
>   lr : sock_sendmsg_nosec net/socket.c:730 [inline]
>   lr : __sock_sendmsg+0x54/0x60 net/socket.c:745
>   sp : ffff800088ea3b30
>   x29: ffff800088ea3b30 x28: fbf00000062bc900 x27: 0000000000000000
>   x26: ffff800088ea3bc0 x25: ffff800088ea3bc0 x24: 0000000000000000
>   x23: f9f00000048dc000 x22: 0000000000000000 x21: ffff800088ea3d90
>   x20: f9f00000048dc000 x19: ffff800088ea3d90 x18: 0000000000000001
>   x17: 0000000000000000 x16: 0000000000000000 x15: 000000002002ffaf
>   x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
>   x11: 0000000000000000 x10: ffff8000815849c0 x9 : ffff8000815b49c0
>   x8 : 0000000000000000 x7 : 000000000000003f x6 : 0000000000000000
>   x5 : 00000000000007e0 x4 : fff07ffffd239000 x3 : fbf00000062bc900
>   x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000fffffdef
>   Call trace:
>    sock_sendmsg_nosec net/socket.c:733 [inline]
>    __sock_sendmsg+0x5c/0x60 net/socket.c:745
>    ____sys_sendmsg+0x274/0x2ac net/socket.c:2597
>    ___sys_sendmsg+0xac/0x100 net/socket.c:2651
>    __sys_sendmsg+0x84/0xe0 net/socket.c:2680
>    __do_sys_sendmsg net/socket.c:2689 [inline]
>    __se_sys_sendmsg net/socket.c:2687 [inline]
>    __arm64_sys_sendmsg+0x24/0x30 net/socket.c:2687
>    __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
>    invoke_syscall+0x48/0x110 arch/arm64/kernel/syscall.c:49
>    el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:132
>    do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:151
>    el0_svc+0x34/0xec arch/arm64/kernel/entry-common.c:712
>    el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
>    el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
>   Code: f9404463 d63f0060 3108441f 54fffe81 (d4210000)
>   ---[ end trace 0000000000000000 ]---
> 
> Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
> Reported-by: syzbot+58c03971700330ce14d8@syzkaller.appspotmail.com
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>  net/ipv4/tcp_bpf.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> index 53b0d62fd2c2..fe6178715ba0 100644
> --- a/net/ipv4/tcp_bpf.c
> +++ b/net/ipv4/tcp_bpf.c
> @@ -577,7 +577,7 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>  		err = sk_stream_error(sk, msg->msg_flags, err);
>  	release_sock(sk);
>  	sk_psock_put(sk, psock);
> -	return copied ? copied : err;
> +	return copied > 0 ? copied : err;

Does it make more sense to make the condition err:
is err 0 iif everything is ok? (completely untested!)

	return err ? err : copied;

>  }
>  
>  enum {
> -- 
> 2.34.1
> 
>
Cong Wang Aug. 22, 2024, 12:49 a.m. UTC | #2
On Wed, Aug 21, 2024 at 03:55:33PM +0100, Simon Horman wrote:
> On Tue, Aug 20, 2024 at 08:07:44PM -0700, Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> > 
> > When we cork messages in psock->cork, the last message triggers the
> > flushing will result in sending a sk_msg larger than the current
> > message size. In this case, in tcp_bpf_send_verdict(), 'copied' becomes
> > negative at least in the following case:
> > 
> > 468         case __SK_DROP:
> > 469         default:
> > 470                 sk_msg_free_partial(sk, msg, tosend);
> > 471                 sk_msg_apply_bytes(psock, tosend);
> > 472                 *copied -= (tosend + delta); // <==== HERE
> > 473                 return -EACCES;
> > 
> > Therefore, it could lead to the following BUG with a proper value of
> > 'copied' (thanks to syzbot). We should not use negative 'copied' as a
> > return value here.
> > 
> >   ------------[ cut here ]------------
> >   kernel BUG at net/socket.c:733!
> >   Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> >   Modules linked in:
> >   CPU: 0 UID: 0 PID: 3265 Comm: syz-executor510 Not tainted 6.11.0-rc3-syzkaller-00060-gd07b43284ab3 #0
> >   Hardware name: linux,dummy-virt (DT)
> >   pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> >   pc : sock_sendmsg_nosec net/socket.c:733 [inline]
> >   pc : sock_sendmsg_nosec net/socket.c:728 [inline]
> >   pc : __sock_sendmsg+0x5c/0x60 net/socket.c:745
> >   lr : sock_sendmsg_nosec net/socket.c:730 [inline]
> >   lr : __sock_sendmsg+0x54/0x60 net/socket.c:745
> >   sp : ffff800088ea3b30
> >   x29: ffff800088ea3b30 x28: fbf00000062bc900 x27: 0000000000000000
> >   x26: ffff800088ea3bc0 x25: ffff800088ea3bc0 x24: 0000000000000000
> >   x23: f9f00000048dc000 x22: 0000000000000000 x21: ffff800088ea3d90
> >   x20: f9f00000048dc000 x19: ffff800088ea3d90 x18: 0000000000000001
> >   x17: 0000000000000000 x16: 0000000000000000 x15: 000000002002ffaf
> >   x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> >   x11: 0000000000000000 x10: ffff8000815849c0 x9 : ffff8000815b49c0
> >   x8 : 0000000000000000 x7 : 000000000000003f x6 : 0000000000000000
> >   x5 : 00000000000007e0 x4 : fff07ffffd239000 x3 : fbf00000062bc900
> >   x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000fffffdef
> >   Call trace:
> >    sock_sendmsg_nosec net/socket.c:733 [inline]
> >    __sock_sendmsg+0x5c/0x60 net/socket.c:745
> >    ____sys_sendmsg+0x274/0x2ac net/socket.c:2597
> >    ___sys_sendmsg+0xac/0x100 net/socket.c:2651
> >    __sys_sendmsg+0x84/0xe0 net/socket.c:2680
> >    __do_sys_sendmsg net/socket.c:2689 [inline]
> >    __se_sys_sendmsg net/socket.c:2687 [inline]
> >    __arm64_sys_sendmsg+0x24/0x30 net/socket.c:2687
> >    __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
> >    invoke_syscall+0x48/0x110 arch/arm64/kernel/syscall.c:49
> >    el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:132
> >    do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:151
> >    el0_svc+0x34/0xec arch/arm64/kernel/entry-common.c:712
> >    el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
> >    el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
> >   Code: f9404463 d63f0060 3108441f 54fffe81 (d4210000)
> >   ---[ end trace 0000000000000000 ]---
> > 
> > Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
> > Reported-by: syzbot+58c03971700330ce14d8@syzkaller.appspotmail.com
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
> >  net/ipv4/tcp_bpf.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> > index 53b0d62fd2c2..fe6178715ba0 100644
> > --- a/net/ipv4/tcp_bpf.c
> > +++ b/net/ipv4/tcp_bpf.c
> > @@ -577,7 +577,7 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> >  		err = sk_stream_error(sk, msg->msg_flags, err);
> >  	release_sock(sk);
> >  	sk_psock_put(sk, psock);
> > -	return copied ? copied : err;
> > +	return copied > 0 ? copied : err;
> 
> Does it make more sense to make the condition err:
> is err 0 iif everything is ok? (completely untested!)

Mind to elaborate?

From my point of view, 'copied' is to handle partial transmission, for
example:

0. User wants to send 2 * 1K bytes with sendmsg()
1. Kernel already sent the first 1K successfully
2. Kernel got some error when sending the 2nd 1K

In this scenario, we should return 1K instead of the error to the caller to
indicate this partial transmission situation, otherwise we could not
distinguish it with a compete failure (that is, 0 byte sent).

Do I miss anything?

Thanks.
John Fastabend Aug. 22, 2024, 8:45 p.m. UTC | #3
Cong Wang wrote:
> On Wed, Aug 21, 2024 at 03:55:33PM +0100, Simon Horman wrote:
> > On Tue, Aug 20, 2024 at 08:07:44PM -0700, Cong Wang wrote:
> > > From: Cong Wang <cong.wang@bytedance.com>
> > > 
> > > When we cork messages in psock->cork, the last message triggers the
> > > flushing will result in sending a sk_msg larger than the current
> > > message size. In this case, in tcp_bpf_send_verdict(), 'copied' becomes
> > > negative at least in the following case:
> > > 
> > > 468         case __SK_DROP:
> > > 469         default:
> > > 470                 sk_msg_free_partial(sk, msg, tosend);
> > > 471                 sk_msg_apply_bytes(psock, tosend);
> > > 472                 *copied -= (tosend + delta); // <==== HERE
> > > 473                 return -EACCES;
> > > 
> > > Therefore, it could lead to the following BUG with a proper value of
> > > 'copied' (thanks to syzbot). We should not use negative 'copied' as a
> > > return value here.
> > > 
> > >   ------------[ cut here ]------------
> > >   kernel BUG at net/socket.c:733!
> > >   Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> > >   Modules linked in:
> > >   CPU: 0 UID: 0 PID: 3265 Comm: syz-executor510 Not tainted 6.11.0-rc3-syzkaller-00060-gd07b43284ab3 #0
> > >   Hardware name: linux,dummy-virt (DT)
> > >   pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > >   pc : sock_sendmsg_nosec net/socket.c:733 [inline]
> > >   pc : sock_sendmsg_nosec net/socket.c:728 [inline]
> > >   pc : __sock_sendmsg+0x5c/0x60 net/socket.c:745
> > >   lr : sock_sendmsg_nosec net/socket.c:730 [inline]
> > >   lr : __sock_sendmsg+0x54/0x60 net/socket.c:745
> > >   sp : ffff800088ea3b30
> > >   x29: ffff800088ea3b30 x28: fbf00000062bc900 x27: 0000000000000000
> > >   x26: ffff800088ea3bc0 x25: ffff800088ea3bc0 x24: 0000000000000000
> > >   x23: f9f00000048dc000 x22: 0000000000000000 x21: ffff800088ea3d90
> > >   x20: f9f00000048dc000 x19: ffff800088ea3d90 x18: 0000000000000001
> > >   x17: 0000000000000000 x16: 0000000000000000 x15: 000000002002ffaf
> > >   x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> > >   x11: 0000000000000000 x10: ffff8000815849c0 x9 : ffff8000815b49c0
> > >   x8 : 0000000000000000 x7 : 000000000000003f x6 : 0000000000000000
> > >   x5 : 00000000000007e0 x4 : fff07ffffd239000 x3 : fbf00000062bc900
> > >   x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000fffffdef
> > >   Call trace:
> > >    sock_sendmsg_nosec net/socket.c:733 [inline]
> > >    __sock_sendmsg+0x5c/0x60 net/socket.c:745
> > >    ____sys_sendmsg+0x274/0x2ac net/socket.c:2597
> > >    ___sys_sendmsg+0xac/0x100 net/socket.c:2651
> > >    __sys_sendmsg+0x84/0xe0 net/socket.c:2680
> > >    __do_sys_sendmsg net/socket.c:2689 [inline]
> > >    __se_sys_sendmsg net/socket.c:2687 [inline]
> > >    __arm64_sys_sendmsg+0x24/0x30 net/socket.c:2687
> > >    __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
> > >    invoke_syscall+0x48/0x110 arch/arm64/kernel/syscall.c:49
> > >    el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:132
> > >    do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:151
> > >    el0_svc+0x34/0xec arch/arm64/kernel/entry-common.c:712
> > >    el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
> > >    el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
> > >   Code: f9404463 d63f0060 3108441f 54fffe81 (d4210000)
> > >   ---[ end trace 0000000000000000 ]---
> > > 
> > > Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
> > > Reported-by: syzbot+58c03971700330ce14d8@syzkaller.appspotmail.com
> > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > ---
> > >  net/ipv4/tcp_bpf.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> > > index 53b0d62fd2c2..fe6178715ba0 100644
> > > --- a/net/ipv4/tcp_bpf.c
> > > +++ b/net/ipv4/tcp_bpf.c
> > > @@ -577,7 +577,7 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > >  		err = sk_stream_error(sk, msg->msg_flags, err);
> > >  	release_sock(sk);
> > >  	sk_psock_put(sk, psock);
> > > -	return copied ? copied : err;
> > > +	return copied > 0 ? copied : err;
> > 
> > Does it make more sense to make the condition err:
> > is err 0 iif everything is ok? (completely untested!)
> 
> Mind to elaborate?
> 
> From my point of view, 'copied' is to handle partial transmission, for
> example:
> 
> 0. User wants to send 2 * 1K bytes with sendmsg()
> 1. Kernel already sent the first 1K successfully
> 2. Kernel got some error when sending the 2nd 1K
> 
> In this scenario, we should return 1K instead of the error to the caller to
> indicate this partial transmission situation, otherwise we could not
> distinguish it with a compete failure (that is, 0 byte sent).

Yep, if we don't return the positive value on partial send we will confuse
apps and they will probably resent data.

From my side this looks good.

Reviewed-by: John Fastabend <john.fastabend@gmail.com>

> 
> Do I miss anything?
> 
> Thanks.
Simon Horman Aug. 23, 2024, 8:25 a.m. UTC | #4
On Thu, Aug 22, 2024 at 01:45:51PM -0700, John Fastabend wrote:
> Cong Wang wrote:
> > On Wed, Aug 21, 2024 at 03:55:33PM +0100, Simon Horman wrote:
> > > On Tue, Aug 20, 2024 at 08:07:44PM -0700, Cong Wang wrote:
> > > > From: Cong Wang <cong.wang@bytedance.com>
> > > > 
> > > > When we cork messages in psock->cork, the last message triggers the
> > > > flushing will result in sending a sk_msg larger than the current
> > > > message size. In this case, in tcp_bpf_send_verdict(), 'copied' becomes
> > > > negative at least in the following case:
> > > > 
> > > > 468         case __SK_DROP:
> > > > 469         default:
> > > > 470                 sk_msg_free_partial(sk, msg, tosend);
> > > > 471                 sk_msg_apply_bytes(psock, tosend);
> > > > 472                 *copied -= (tosend + delta); // <==== HERE
> > > > 473                 return -EACCES;
> > > > 
> > > > Therefore, it could lead to the following BUG with a proper value of
> > > > 'copied' (thanks to syzbot). We should not use negative 'copied' as a
> > > > return value here.
> > > > 
> > > >   ------------[ cut here ]------------
> > > >   kernel BUG at net/socket.c:733!
> > > >   Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> > > >   Modules linked in:
> > > >   CPU: 0 UID: 0 PID: 3265 Comm: syz-executor510 Not tainted 6.11.0-rc3-syzkaller-00060-gd07b43284ab3 #0
> > > >   Hardware name: linux,dummy-virt (DT)
> > > >   pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> > > >   pc : sock_sendmsg_nosec net/socket.c:733 [inline]
> > > >   pc : sock_sendmsg_nosec net/socket.c:728 [inline]
> > > >   pc : __sock_sendmsg+0x5c/0x60 net/socket.c:745
> > > >   lr : sock_sendmsg_nosec net/socket.c:730 [inline]
> > > >   lr : __sock_sendmsg+0x54/0x60 net/socket.c:745
> > > >   sp : ffff800088ea3b30
> > > >   x29: ffff800088ea3b30 x28: fbf00000062bc900 x27: 0000000000000000
> > > >   x26: ffff800088ea3bc0 x25: ffff800088ea3bc0 x24: 0000000000000000
> > > >   x23: f9f00000048dc000 x22: 0000000000000000 x21: ffff800088ea3d90
> > > >   x20: f9f00000048dc000 x19: ffff800088ea3d90 x18: 0000000000000001
> > > >   x17: 0000000000000000 x16: 0000000000000000 x15: 000000002002ffaf
> > > >   x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> > > >   x11: 0000000000000000 x10: ffff8000815849c0 x9 : ffff8000815b49c0
> > > >   x8 : 0000000000000000 x7 : 000000000000003f x6 : 0000000000000000
> > > >   x5 : 00000000000007e0 x4 : fff07ffffd239000 x3 : fbf00000062bc900
> > > >   x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000fffffdef
> > > >   Call trace:
> > > >    sock_sendmsg_nosec net/socket.c:733 [inline]
> > > >    __sock_sendmsg+0x5c/0x60 net/socket.c:745
> > > >    ____sys_sendmsg+0x274/0x2ac net/socket.c:2597
> > > >    ___sys_sendmsg+0xac/0x100 net/socket.c:2651
> > > >    __sys_sendmsg+0x84/0xe0 net/socket.c:2680
> > > >    __do_sys_sendmsg net/socket.c:2689 [inline]
> > > >    __se_sys_sendmsg net/socket.c:2687 [inline]
> > > >    __arm64_sys_sendmsg+0x24/0x30 net/socket.c:2687
> > > >    __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
> > > >    invoke_syscall+0x48/0x110 arch/arm64/kernel/syscall.c:49
> > > >    el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:132
> > > >    do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:151
> > > >    el0_svc+0x34/0xec arch/arm64/kernel/entry-common.c:712
> > > >    el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
> > > >    el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
> > > >   Code: f9404463 d63f0060 3108441f 54fffe81 (d4210000)
> > > >   ---[ end trace 0000000000000000 ]---
> > > > 
> > > > Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
> > > > Reported-by: syzbot+58c03971700330ce14d8@syzkaller.appspotmail.com
> > > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > > ---
> > > >  net/ipv4/tcp_bpf.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
> > > > index 53b0d62fd2c2..fe6178715ba0 100644
> > > > --- a/net/ipv4/tcp_bpf.c
> > > > +++ b/net/ipv4/tcp_bpf.c
> > > > @@ -577,7 +577,7 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > > >  		err = sk_stream_error(sk, msg->msg_flags, err);
> > > >  	release_sock(sk);
> > > >  	sk_psock_put(sk, psock);
> > > > -	return copied ? copied : err;
> > > > +	return copied > 0 ? copied : err;
> > > 
> > > Does it make more sense to make the condition err:
> > > is err 0 iif everything is ok? (completely untested!)
> > 
> > Mind to elaborate?

I was thinking that a valid test for for being in an error state is that
err is non-zero. Although given the below, it seems that I was mistaken.

> > 
> > From my point of view, 'copied' is to handle partial transmission, for
> > example:
> > 
> > 0. User wants to send 2 * 1K bytes with sendmsg()
> > 1. Kernel already sent the first 1K successfully
> > 2. Kernel got some error when sending the 2nd 1K
> > 
> > In this scenario, we should return 1K instead of the error to the caller to
> > indicate this partial transmission situation, otherwise we could not
> > distinguish it with a compete failure (that is, 0 byte sent).
> 
> Yep, if we don't return the positive value on partial send we will confuse
> apps and they will probably resent data.
> 
> >From my side this looks good.
> 
> Reviewed-by: John Fastabend <john.fastabend@gmail.com>

Thanks for responding to my query.
FWIIW, I am now happy with this patch.
Martin KaFai Lau Aug. 29, 2024, 6:36 p.m. UTC | #5
On 8/22/24 1:45 PM, John Fastabend wrote:
> Cong Wang wrote:
>> On Wed, Aug 21, 2024 at 03:55:33PM +0100, Simon Horman wrote:
>>> On Tue, Aug 20, 2024 at 08:07:44PM -0700, Cong Wang wrote:
>>>> From: Cong Wang <cong.wang@bytedance.com>
>>>>
>>>> When we cork messages in psock->cork, the last message triggers the
>>>> flushing will result in sending a sk_msg larger than the current
>>>> message size. In this case, in tcp_bpf_send_verdict(), 'copied' becomes
>>>> negative at least in the following case:
>>>>
>>>> 468         case __SK_DROP:
>>>> 469         default:
>>>> 470                 sk_msg_free_partial(sk, msg, tosend);
>>>> 471                 sk_msg_apply_bytes(psock, tosend);
>>>> 472                 *copied -= (tosend + delta); // <==== HERE
>>>> 473                 return -EACCES;
>>>>
>>>> Therefore, it could lead to the following BUG with a proper value of
>>>> 'copied' (thanks to syzbot). We should not use negative 'copied' as a
>>>> return value here.
>>>>
>>>>    ------------[ cut here ]------------
>>>>    kernel BUG at net/socket.c:733!
>>>>    Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
>>>>    Modules linked in:
>>>>    CPU: 0 UID: 0 PID: 3265 Comm: syz-executor510 Not tainted 6.11.0-rc3-syzkaller-00060-gd07b43284ab3 #0
>>>>    Hardware name: linux,dummy-virt (DT)
>>>>    pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>>>>    pc : sock_sendmsg_nosec net/socket.c:733 [inline]
>>>>    pc : sock_sendmsg_nosec net/socket.c:728 [inline]
>>>>    pc : __sock_sendmsg+0x5c/0x60 net/socket.c:745
>>>>    lr : sock_sendmsg_nosec net/socket.c:730 [inline]
>>>>    lr : __sock_sendmsg+0x54/0x60 net/socket.c:745
>>>>    sp : ffff800088ea3b30
>>>>    x29: ffff800088ea3b30 x28: fbf00000062bc900 x27: 0000000000000000
>>>>    x26: ffff800088ea3bc0 x25: ffff800088ea3bc0 x24: 0000000000000000
>>>>    x23: f9f00000048dc000 x22: 0000000000000000 x21: ffff800088ea3d90
>>>>    x20: f9f00000048dc000 x19: ffff800088ea3d90 x18: 0000000000000001
>>>>    x17: 0000000000000000 x16: 0000000000000000 x15: 000000002002ffaf
>>>>    x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
>>>>    x11: 0000000000000000 x10: ffff8000815849c0 x9 : ffff8000815b49c0
>>>>    x8 : 0000000000000000 x7 : 000000000000003f x6 : 0000000000000000
>>>>    x5 : 00000000000007e0 x4 : fff07ffffd239000 x3 : fbf00000062bc900
>>>>    x2 : 0000000000000000 x1 : 0000000000000000 x0 : 00000000fffffdef
>>>>    Call trace:
>>>>     sock_sendmsg_nosec net/socket.c:733 [inline]
>>>>     __sock_sendmsg+0x5c/0x60 net/socket.c:745
>>>>     ____sys_sendmsg+0x274/0x2ac net/socket.c:2597
>>>>     ___sys_sendmsg+0xac/0x100 net/socket.c:2651
>>>>     __sys_sendmsg+0x84/0xe0 net/socket.c:2680
>>>>     __do_sys_sendmsg net/socket.c:2689 [inline]
>>>>     __se_sys_sendmsg net/socket.c:2687 [inline]
>>>>     __arm64_sys_sendmsg+0x24/0x30 net/socket.c:2687
>>>>     __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
>>>>     invoke_syscall+0x48/0x110 arch/arm64/kernel/syscall.c:49
>>>>     el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:132
>>>>     do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:151
>>>>     el0_svc+0x34/0xec arch/arm64/kernel/entry-common.c:712
>>>>     el0t_64_sync_handler+0x100/0x12c arch/arm64/kernel/entry-common.c:730
>>>>     el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:598
>>>>    Code: f9404463 d63f0060 3108441f 54fffe81 (d4210000)
>>>>    ---[ end trace 0000000000000000 ]---
>>>>
>>>> Fixes: 4f738adba30a ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
>>>> Reported-by: syzbot+58c03971700330ce14d8@syzkaller.appspotmail.com
>>>> Cc: John Fastabend <john.fastabend@gmail.com>
>>>> Cc: Jakub Sitnicki <jakub@cloudflare.com>
>>>> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
>>>> ---
>>>>   net/ipv4/tcp_bpf.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
>>>> index 53b0d62fd2c2..fe6178715ba0 100644
>>>> --- a/net/ipv4/tcp_bpf.c
>>>> +++ b/net/ipv4/tcp_bpf.c
>>>> @@ -577,7 +577,7 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
>>>>   		err = sk_stream_error(sk, msg->msg_flags, err);
>>>>   	release_sock(sk);
>>>>   	sk_psock_put(sk, psock);
>>>> -	return copied ? copied : err;
>>>> +	return copied > 0 ? copied : err;
>>>
>>> Does it make more sense to make the condition err:
>>> is err 0 iif everything is ok? (completely untested!)
>>
>> Mind to elaborate?
>>
>>  From my point of view, 'copied' is to handle partial transmission, for
>> example:
>>
>> 0. User wants to send 2 * 1K bytes with sendmsg()
>> 1. Kernel already sent the first 1K successfully
>> 2. Kernel got some error when sending the 2nd 1K
>>
>> In this scenario, we should return 1K instead of the error to the caller to
>> indicate this partial transmission situation, otherwise we could not
>> distinguish it with a compete failure (that is, 0 byte sent).
> 
> Yep, if we don't return the positive value on partial send we will confuse
> apps and they will probably resent data.
> 
>  From my side this looks good.
> 
> Reviewed-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Martin KaFai Lau <martin.lau@kernel.org>

Jakub, can you directly land it to the net tree?
Jakub Kicinski Aug. 29, 2024, 7:51 p.m. UTC | #6
On Thu, 29 Aug 2024 11:36:41 -0700 Martin KaFai Lau wrote:
> Jakub, can you directly land it to the net tree?

Sure thing! Let me re-assign it to netdev so it goes to the CI once
again, and I'll push it out.
diff mbox series

Patch

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 53b0d62fd2c2..fe6178715ba0 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -577,7 +577,7 @@  static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 		err = sk_stream_error(sk, msg->msg_flags, err);
 	release_sock(sk);
 	sk_psock_put(sk, psock);
-	return copied ? copied : err;
+	return copied > 0 ? copied : err;
 }
 
 enum {