diff mbox series

[RFC] bpf: Remove in_atomic() from bpf_link_put().

Message ID 20230509132433.2FSY_6t7@linutronix.de (mailing list archive)
State RFC
Delegated to: BPF
Headers show
Series [RFC] bpf: Remove in_atomic() from bpf_link_put(). | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-7 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_verifier on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-36 success Logs for veristat
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-16 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 fail Logs for test_progs_no_alu32 on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-20 fail Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on s390x with gcc
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 10 this patch: 10
netdev/cc_maintainers warning 8 maintainers not CCed: yhs@fb.com kpsingh@kernel.org martin.lau@linux.dev song@kernel.org sdf@google.com andrii@kernel.org jolsa@kernel.org haoluo@google.com
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 10 this patch: 10
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 14 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Sebastian Andrzej Siewior May 9, 2023, 1:24 p.m. UTC
bpf_free_inode() is invoked as a RCU callback. Usually RCU callbacks are
invoked within softirq context. By setting rcutree.use_softirq=0 boot
option the RCU callbacks will be invoked in a per-CPU kthread with
bottom halves disabled which implies a RCU read section.

On PREEMPT_RT the context remains fully preemptible. The RCU read
section however does not allow schedule() invocation. The latter happens
in mutex_lock() performed by bpf_trampoline_unlink_prog() originated
from bpf_link_put().

Remove the context checks and use the workqueue unconditionally.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
The warning can be observed as:
| BUG: sleeping function called from invalid context at kernel/locking/rtmutex_api.c:510
| in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 47, name: rcuc/3
| preempt_count: 0, expected: 0
| RCU nest depth: 2, expected: 0
| CPU: 3 PID: 47 Comm: rcuc/3 Tainted: G            E      v6.3-rt12 #1
| Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.3a 01/06/2021
| Call Trace:
|  <TASK>
|  dump_stack_lvl+0x43/0x60
|  __might_resched+0x137/0x190
|  mutex_lock+0x1a/0x50
|  bpf_trampoline_unlink_prog+0x1b/0x100
|  bpf_tracing_link_release+0x12/0x40
|  bpf_link_free+0x70/0x90
|  bpf_free_inode+0x3e/0x80
|  rcu_core+0x4ff/0x7c0
|  rcu_cpu_kthread+0xa9/0x2f0
|  smpboot_thread_fn+0x141/0x2c0
|  kthread+0x110/0x130
|  ret_from_fork+0x2c/0x50
|  </TASK>

 kernel/bpf/syscall.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

Comments

Andrii Nakryiko May 10, 2023, 7:19 a.m. UTC | #1
On Tue, May 9, 2023 at 6:24 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> bpf_free_inode() is invoked as a RCU callback. Usually RCU callbacks are
> invoked within softirq context. By setting rcutree.use_softirq=0 boot
> option the RCU callbacks will be invoked in a per-CPU kthread with
> bottom halves disabled which implies a RCU read section.
>
> On PREEMPT_RT the context remains fully preemptible. The RCU read
> section however does not allow schedule() invocation. The latter happens
> in mutex_lock() performed by bpf_trampoline_unlink_prog() originated
> from bpf_link_put().
>
> Remove the context checks and use the workqueue unconditionally.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---

Please see [0] and corresponding revert commit. We do want
bpf_link_free() to happen synchronously if it's caused by close()
syscall.

f00f2f7fe860 ("Revert "bpf: Fix potential call bpf_link_free() in
atomic context"")

  [0] https://lore.kernel.org/bpf/CAEf4BzZ9zwA=SrLTx9JT50OeM6fVPg0Py0Gx+K9ah2we8YtCRA@mail.gmail.com/

> The warning can be observed as:
> | BUG: sleeping function called from invalid context at kernel/locking/rtmutex_api.c:510
> | in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 47, name: rcuc/3
> | preempt_count: 0, expected: 0
> | RCU nest depth: 2, expected: 0
> | CPU: 3 PID: 47 Comm: rcuc/3 Tainted: G            E      v6.3-rt12 #1
> | Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.3a 01/06/2021
> | Call Trace:
> |  <TASK>
> |  dump_stack_lvl+0x43/0x60
> |  __might_resched+0x137/0x190
> |  mutex_lock+0x1a/0x50
> |  bpf_trampoline_unlink_prog+0x1b/0x100
> |  bpf_tracing_link_release+0x12/0x40
> |  bpf_link_free+0x70/0x90
> |  bpf_free_inode+0x3e/0x80
> |  rcu_core+0x4ff/0x7c0
> |  rcu_cpu_kthread+0xa9/0x2f0
> |  smpboot_thread_fn+0x141/0x2c0
> |  kthread+0x110/0x130
> |  ret_from_fork+0x2c/0x50
> |  </TASK>
>
>  kernel/bpf/syscall.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 14f39c1e573ee..0adaa1bfbb0d2 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2785,12 +2785,8 @@ void bpf_link_put(struct bpf_link *link)
>         if (!atomic64_dec_and_test(&link->refcnt))
>                 return;
>
> -       if (in_atomic()) {
> -               INIT_WORK(&link->work, bpf_link_put_deferred);
> -               schedule_work(&link->work);
> -       } else {
> -               bpf_link_free(link);
> -       }
> +       INIT_WORK(&link->work, bpf_link_put_deferred);
> +       schedule_work(&link->work);
>  }
>  EXPORT_SYMBOL(bpf_link_put);
>
> --
> 2.40.1
>
>
Alexei Starovoitov May 17, 2023, 5:26 a.m. UTC | #2
On Wed, May 10, 2023 at 12:19 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, May 9, 2023 at 6:24 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > bpf_free_inode() is invoked as a RCU callback. Usually RCU callbacks are
> > invoked within softirq context. By setting rcutree.use_softirq=0 boot
> > option the RCU callbacks will be invoked in a per-CPU kthread with
> > bottom halves disabled which implies a RCU read section.
> >
> > On PREEMPT_RT the context remains fully preemptible. The RCU read
> > section however does not allow schedule() invocation. The latter happens
> > in mutex_lock() performed by bpf_trampoline_unlink_prog() originated
> > from bpf_link_put().
> >
> > Remove the context checks and use the workqueue unconditionally.
> >
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > ---
>
> Please see [0] and corresponding revert commit. We do want
> bpf_link_free() to happen synchronously if it's caused by close()
> syscall.
>
> f00f2f7fe860 ("Revert "bpf: Fix potential call bpf_link_free() in
> atomic context"")
>
>   [0] https://lore.kernel.org/bpf/CAEf4BzZ9zwA=SrLTx9JT50OeM6fVPg0Py0Gx+K9ah2we8YtCRA@mail.gmail.com/

Sebastian,

Andrii is correct. We cannot do this unconditionally,
but we can do it for IS_ENABLED(CONFIG_PREEMPT_RT)
if it's causing issues on RT, but BPF users won't be happy
with non deterministic prog detach.
Do you see a different way of solving it?
Sebastian Andrzej Siewior May 17, 2023, 6:09 a.m. UTC | #3
On 2023-05-16 22:26:01 [-0700], Alexei Starovoitov wrote:
> Sebastian,
Hi Alexei,

> Andrii is correct. We cannot do this unconditionally,
> but we can do it for IS_ENABLED(CONFIG_PREEMPT_RT)
> if it's causing issues on RT, but BPF users won't be happy
> with non deterministic prog detach.
> Do you see a different way of solving it?

Yes. I've been distracted with other things, I get back to it.

Sebastian
Andrii Nakryiko May 17, 2023, 4:26 p.m. UTC | #4
On Tue, May 16, 2023 at 10:26 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, May 10, 2023 at 12:19 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, May 9, 2023 at 6:24 AM Sebastian Andrzej Siewior
> > <bigeasy@linutronix.de> wrote:
> > >
> > > bpf_free_inode() is invoked as a RCU callback. Usually RCU callbacks are
> > > invoked within softirq context. By setting rcutree.use_softirq=0 boot
> > > option the RCU callbacks will be invoked in a per-CPU kthread with
> > > bottom halves disabled which implies a RCU read section.
> > >
> > > On PREEMPT_RT the context remains fully preemptible. The RCU read
> > > section however does not allow schedule() invocation. The latter happens
> > > in mutex_lock() performed by bpf_trampoline_unlink_prog() originated
> > > from bpf_link_put().
> > >
> > > Remove the context checks and use the workqueue unconditionally.
> > >
> > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > ---
> >
> > Please see [0] and corresponding revert commit. We do want
> > bpf_link_free() to happen synchronously if it's caused by close()
> > syscall.
> >
> > f00f2f7fe860 ("Revert "bpf: Fix potential call bpf_link_free() in
> > atomic context"")
> >
> >   [0] https://lore.kernel.org/bpf/CAEf4BzZ9zwA=SrLTx9JT50OeM6fVPg0Py0Gx+K9ah2we8YtCRA@mail.gmail.com/
>
> Sebastian,
>
> Andrii is correct. We cannot do this unconditionally,
> but we can do it for IS_ENABLED(CONFIG_PREEMPT_RT)
> if it's causing issues on RT, but BPF users won't be happy
> with non deterministic prog detach.
> Do you see a different way of solving it?

Is struct file_operations.release() callback guaranteed to be called
from user context? If yes, then perhaps the most straightforward way
to guarantee synchronous bpf_link cleanup on close(link_fd) is to have
a bpf_link_put() variant that will be only called from user-context
and will just do bpf_link_free(link) directly, without checking
in_atomic().
diff mbox series

Patch

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 14f39c1e573ee..0adaa1bfbb0d2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2785,12 +2785,8 @@  void bpf_link_put(struct bpf_link *link)
 	if (!atomic64_dec_and_test(&link->refcnt))
 		return;
 
-	if (in_atomic()) {
-		INIT_WORK(&link->work, bpf_link_put_deferred);
-		schedule_work(&link->work);
-	} else {
-		bpf_link_free(link);
-	}
+	INIT_WORK(&link->work, bpf_link_put_deferred);
+	schedule_work(&link->work);
 }
 EXPORT_SYMBOL(bpf_link_put);