Message ID | 20230520094722.5393-1-zegao@tencent.com (mailing list archive)
---|---
State | RFC
Delegated to | BPF
On 5/20/23 2:47 AM, Ze Gao wrote:
>
> Hi Jiri,
>
> Would you like to consider adding an rcu_is_watching check
> to solve this from the viewpoint of kprobe_multi_link_prog_run
> itself? Accounting of missed runs can be added as well
> to improve observability.
>
> Regards,
> Ze
>
>
> -----------------
> From 29fd3cd713e65461325c2703cf5246a6fae5d4fe Mon Sep 17 00:00:00 2001
> From: Ze Gao <zegao@tencent.com>
> Date: Sat, 20 May 2023 17:32:05 +0800
> Subject: [PATCH] bpf: kprobe_multi runs bpf progs only when rcu_is_watching
>
> From the perspective of kprobe_multi_link_prog_run, any traceable
> function can be attached, while bpf progs need special care and
> ought to run under rcu protection. To solve the likely rcu lockdep
> warnings once and for all, in case (future) functions in the idle
> path get attached accidentally, we had better pay some cost to
> check at least on the kernel side, and return when rcu is not
> watching, which helps avoid any unpredictable results.

kprobe_multi/fprobe share the same set of attachments with fentry.
Currently, fentry does not filter with !rcu_is_watching, maybe
because this is an extreme corner case. Not sure whether it is
worthwhile or not.

Maybe if you can give a concrete example (e.g., attachment point)
with the current code base to show what issue you encountered, it
will make it easier to judge whether adding !rcu_is_watching() is
necessary or not.

> Signed-off-by: Ze Gao <zegao@tencent.com>
> ---
>  kernel/trace/bpf_trace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 9a050e36dc6c..3e6ea7274765 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2622,7 +2622,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
>  	struct bpf_run_ctx *old_run_ctx;
>  	int err;
>
> -	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> +	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1 || !rcu_is_watching())) {
>  		err = 0;
>  		goto out;
>  	}
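For context, the fentry path Yonghong mentions enters BPF programs through the trampoline's enter helper rather than through kprobe_multi_link_prog_run(). A simplified sketch of that helper follows (paraphrased, not verbatim kernel source; the exact name and signature, e.g. __bpf_prog_enter vs. __bpf_prog_enter_recur, vary across kernel versions). The relevant property is that rcu_read_lock() is taken with no rcu_is_watching() filter:

```c
/*
 * Simplified sketch of the BPF trampoline entry path (fentry);
 * paraphrased from kernel/bpf/trampoline.c, details vary by version.
 */
static u64 notrace __bpf_prog_enter(struct bpf_prog *prog,
				    struct bpf_tramp_run_ctx *run_ctx)
	__acquires(RCU)
{
	rcu_read_lock();		/* no rcu_is_watching() check */
	migrate_disable();

	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);

	/* Re-entrancy on this CPU is counted as a miss, not executed. */
	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
		bpf_prog_inc_misses_counter(prog);
		return 0;
	}
	return bpf_prog_start_time();
}
```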
On Sat, May 20, 2023 at 05:47:24PM +0800, Ze Gao wrote:
>
> Hi Jiri,
>
> Would you like to consider adding an rcu_is_watching check
> to solve this from the viewpoint of kprobe_multi_link_prog_run

I think this was discussed in here:
https://lore.kernel.org/bpf/20230321020103.13494-1-laoar.shao@gmail.com/

and was considered a bug; there's a fix mentioned later in that thread.

There's also this recent patchset:
https://lore.kernel.org/bpf/20230517034510.15639-3-zegao@tencent.com/

that solves related problems.

> itself? Accounting of missed runs can be added as well
> to improve observability.

right, we count fprobe->nmissed but it's not exposed; we should allow
getting 'missed' stats from both fprobe and kprobe_multi later, which
is missing now, will check

thanks,
jirka

>
> Regards,
> Ze
>
>
> -----------------
> From 29fd3cd713e65461325c2703cf5246a6fae5d4fe Mon Sep 17 00:00:00 2001
> From: Ze Gao <zegao@tencent.com>
> Date: Sat, 20 May 2023 17:32:05 +0800
> Subject: [PATCH] bpf: kprobe_multi runs bpf progs only when rcu_is_watching
>
> From the perspective of kprobe_multi_link_prog_run, any traceable
> function can be attached, while bpf progs need special care and
> ought to run under rcu protection. To solve the likely rcu lockdep
> warnings once and for all, in case (future) functions in the idle
> path get attached accidentally, we had better pay some cost to
> check at least on the kernel side, and return when rcu is not
> watching, which helps avoid any unpredictable results.
>
> Signed-off-by: Ze Gao <zegao@tencent.com>
> ---
>  kernel/trace/bpf_trace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 9a050e36dc6c..3e6ea7274765 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2622,7 +2622,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
>  	struct bpf_run_ctx *old_run_ctx;
>  	int err;
>
> -	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> +	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1 || !rcu_is_watching())) {
>  		err = 0;
>  		goto out;
>  	}
> --
> 2.40.1
>
On Sun, 21 May 2023 10:08:46 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> On Sat, May 20, 2023 at 05:47:24PM +0800, Ze Gao wrote:
> >
> > Hi Jiri,
> >
> > Would you like to consider adding an rcu_is_watching check
> > to solve this from the viewpoint of kprobe_multi_link_prog_run
>
> I think this was discussed in here:
> https://lore.kernel.org/bpf/20230321020103.13494-1-laoar.shao@gmail.com/
>
> and was considered a bug; there's a fix mentioned later in that thread.
>
> There's also this recent patchset:
> https://lore.kernel.org/bpf/20230517034510.15639-3-zegao@tencent.com/
>
> that solves related problems.

I think this rcu_is_watching() is a bit different issue. The
rcu_is_watching() check is required if kprobe_multi_link_prog_run()
uses any RCU API. E.g. rethook_try_get() also checks rcu_is_watching()
because it uses call_rcu().

Thank you,

> > itself? Accounting of missed runs can be added as well
> > to improve observability.
>
> right, we count fprobe->nmissed but it's not exposed; we should allow
> getting 'missed' stats from both fprobe and kprobe_multi later, which
> is missing now, will check
>
> thanks,
> jirka
>
> >
> > Regards,
> > Ze
> >
> >
> > -----------------
> > From 29fd3cd713e65461325c2703cf5246a6fae5d4fe Mon Sep 17 00:00:00 2001
> > From: Ze Gao <zegao@tencent.com>
> > Date: Sat, 20 May 2023 17:32:05 +0800
> > Subject: [PATCH] bpf: kprobe_multi runs bpf progs only when rcu_is_watching
> >
> > From the perspective of kprobe_multi_link_prog_run, any traceable
> > function can be attached, while bpf progs need special care and
> > ought to run under rcu protection. To solve the likely rcu lockdep
> > warnings once and for all, in case (future) functions in the idle
> > path get attached accidentally, we had better pay some cost to
> > check at least on the kernel side, and return when rcu is not
> > watching, which helps avoid any unpredictable results.
> >
> > Signed-off-by: Ze Gao <zegao@tencent.com>
> > ---
> >  kernel/trace/bpf_trace.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 9a050e36dc6c..3e6ea7274765 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -2622,7 +2622,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
> >  	struct bpf_run_ctx *old_run_ctx;
> >  	int err;
> >
> > -	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> > +	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1 || !rcu_is_watching())) {
> >  		err = 0;
> >  		goto out;
> >  	}
> > --
> > 2.40.1
> >

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
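For reference, the rethook check Masami describes looks roughly like this (abridged from kernel/trace/rethook.c of this era; the pool implementation differs across versions):

```c
/* Abridged from kernel/trace/rethook.c (circa v6.3); not verbatim. */
struct rethook_node *rethook_try_get(struct rethook *rh)
{
	rethook_handler_t handler = READ_ONCE(rh->handler);

	if (!handler)
		return NULL;

	/*
	 * Nodes are eventually reclaimed via call_rcu() in
	 * rethook_recycle(), so the caller must be in an
	 * RCU-available context.
	 */
	if (unlikely(!rcu_is_watching()))
		return NULL;

	return (struct rethook_node *)freelist_try_get(&rh->pool);
}
```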
On Sun, May 21, 2023 at 6:09 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Sun, 21 May 2023 10:08:46 +0200
> Jiri Olsa <olsajiri@gmail.com> wrote:
>
> > On Sat, May 20, 2023 at 05:47:24PM +0800, Ze Gao wrote:
> > >
> > > Hi Jiri,
> > >
> > > Would you like to consider adding an rcu_is_watching check
> > > to solve this from the viewpoint of kprobe_multi_link_prog_run
> >
> > I think this was discussed in here:
> > https://lore.kernel.org/bpf/20230321020103.13494-1-laoar.shao@gmail.com/
> >
> > and was considered a bug; there's a fix mentioned later in that thread.
> >
> > There's also this recent patchset:
> > https://lore.kernel.org/bpf/20230517034510.15639-3-zegao@tencent.com/
> >
> > that solves related problems.
>
> I think this rcu_is_watching() is a bit different issue. The
> rcu_is_watching() check is required if kprobe_multi_link_prog_run()
> uses any RCU API. E.g. rethook_try_get() also checks rcu_is_watching()
> because it uses call_rcu().

Yes, that's my point!

Regards,
Ze

> > > itself? Accounting of missed runs can be added as well
> > > to improve observability.
> >
> > right, we count fprobe->nmissed but it's not exposed; we should allow
> > getting 'missed' stats from both fprobe and kprobe_multi later, which
> > is missing now, will check
> >
> > thanks,
> > jirka
> >
> > >
> > > Regards,
> > > Ze
> > >
> > >
> > > -----------------
> > > From 29fd3cd713e65461325c2703cf5246a6fae5d4fe Mon Sep 17 00:00:00 2001
> > > From: Ze Gao <zegao@tencent.com>
> > > Date: Sat, 20 May 2023 17:32:05 +0800
> > > Subject: [PATCH] bpf: kprobe_multi runs bpf progs only when rcu_is_watching
> > >
> > > From the perspective of kprobe_multi_link_prog_run, any traceable
> > > function can be attached, while bpf progs need special care and
> > > ought to run under rcu protection. To solve the likely rcu lockdep
> > > warnings once and for all, in case (future) functions in the idle
> > > path get attached accidentally, we had better pay some cost to
> > > check at least on the kernel side, and return when rcu is not
> > > watching, which helps avoid any unpredictable results.
> > >
> > > Signed-off-by: Ze Gao <zegao@tencent.com>
> > > ---
> > >  kernel/trace/bpf_trace.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index 9a050e36dc6c..3e6ea7274765 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -2622,7 +2622,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
> > >  	struct bpf_run_ctx *old_run_ctx;
> > >  	int err;
> > >
> > > -	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> > > +	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1 || !rcu_is_watching())) {
> > >  		err = 0;
> > >  		goto out;
> > >  	}
> > > --
> > > 2.40.1
> > >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
> kprobe_multi/fprobe share the same set of attachments with fentry.
> Currently, fentry does not filter with !rcu_is_watching, maybe
> because this is an extreme corner case. Not sure whether it is
> worthwhile or not.

Agreed, it's rare, especially after Peter's patches, which narrow down
the rcu eqs regions in the idle path and reduce the chance of any
traceable function running in between.

However, from RCU's perspective, we theoretically ought to check
rcu_is_watching whenever there's a chance our code will run in the
idle path while we also need rcu to be alive. And we cannot simply
make assumptions about any future changes in the idle path. You know,
just like what was hit in the thread.

> Maybe if you can give a concrete example (e.g., attachment point)
> with the current code base to show what issue you encountered, it
> will make it easier to judge whether adding !rcu_is_watching() is
> necessary or not.

I can reproduce the likely warnings on v6.1.18, where arch_cpu_idle is
traceable, but not on the latest version so far. But as I stated above,
in theory we need it. So here is a gentle ping :).
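To make the reproducer concrete: a minimal kprobe_multi program attached to an idle-path function would look something like the sketch below (hypothetical file and program names; on a kernel such as v6.1.18 where arch_cpu_idle is still traceable, loading this can provoke the RCU lockdep warning described above):

```c
// repro.bpf.c: hypothetical minimal kprobe_multi reproducer.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* kprobe.multi programs run via fprobe -> kprobe_multi_link_prog_run(). */
SEC("kprobe.multi/arch_cpu_idle")
int BPF_PROG(on_idle)
{
	bpf_printk("idle entered");
	return 0;
}
```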
On Sun, May 21, 2023 at 11:10:16PM +0800, Ze Gao wrote:
> > kprobe_multi/fprobe share the same set of attachments with fentry.
> > Currently, fentry does not filter with !rcu_is_watching, maybe
> > because this is an extreme corner case. Not sure whether it is
> > worthwhile or not.
>
> Agreed, it's rare, especially after Peter's patches, which narrow down
> the rcu eqs regions in the idle path and reduce the chance of any
> traceable function running in between.
>
> However, from RCU's perspective, we theoretically ought to check
> rcu_is_watching whenever there's a chance our code will run in the
> idle path while we also need rcu to be alive. And we cannot simply
> make assumptions about any future changes in the idle path. You know,
> just like what was hit in the thread.
>
> > Maybe if you can give a concrete example (e.g., attachment point)
> > with the current code base to show what issue you encountered, it
> > will make it easier to judge whether adding !rcu_is_watching() is
> > necessary or not.
>
> I can reproduce the likely warnings on v6.1.18, where arch_cpu_idle is
> traceable, but not on the latest version so far. But as I stated above,
> in theory we need it. So here is a gentle ping :).

hum, this change [1] added a rcu_is_watching check to
ftrace_test_recursion_trylock, which we use in fprobe_handler and
which is coming to fprobe_exit_handler in [2]

I might be missing something, but it seems we don't need another
rcu_is_watching call on the kprobe_multi level

jirka

[1] d099dbfd3306 cpuidle: tracing: Warn about !rcu_is_watching()
[2] https://lore.kernel.org/bpf/20230517034510.15639-4-zegao@tencent.com/
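The path Jiri refers to looks roughly like this (abridged from kernel/trace/fprobe.c of this era; handler signatures changed in later versions). The recursion trylock both rejects RCU-unsafe contexts and feeds the nmissed counter discussed earlier:

```c
/* Abridged from kernel/trace/fprobe.c (circa v6.3); not verbatim. */
static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
			   struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
	struct fprobe *fp;
	int bit;

	fp = container_of(ops, struct fprobe, ops);
	if (fprobe_disabled(fp))
		return;

	/* Fails on recursion and, per [1] above, when !rcu_is_watching(). */
	bit = ftrace_test_recursion_trylock(ip, parent_ip);
	if (bit < 0) {
		fp->nmissed++;	/* counted, but not yet exposed to users */
		return;
	}

	if (fp->entry_handler)
		fp->entry_handler(fp, ip, ftrace_get_regs(fregs));

	ftrace_test_recursion_unlock(bit);
}
```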
On Sun, 21 May 2023 22:26:37 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> On Sun, May 21, 2023 at 11:10:16PM +0800, Ze Gao wrote:
> > > kprobe_multi/fprobe share the same set of attachments with fentry.
> > > Currently, fentry does not filter with !rcu_is_watching, maybe
> > > because this is an extreme corner case. Not sure whether it is
> > > worthwhile or not.
> >
> > Agreed, it's rare, especially after Peter's patches, which narrow down
> > the rcu eqs regions in the idle path and reduce the chance of any
> > traceable function running in between.
> >
> > However, from RCU's perspective, we theoretically ought to check
> > rcu_is_watching whenever there's a chance our code will run in the
> > idle path while we also need rcu to be alive. And we cannot simply
> > make assumptions about any future changes in the idle path. You know,
> > just like what was hit in the thread.
> >
> > > Maybe if you can give a concrete example (e.g., attachment point)
> > > with the current code base to show what issue you encountered, it
> > > will make it easier to judge whether adding !rcu_is_watching() is
> > > necessary or not.
> >
> > I can reproduce the likely warnings on v6.1.18, where arch_cpu_idle is
> > traceable, but not on the latest version so far. But as I stated above,
> > in theory we need it. So here is a gentle ping :).
>
> hum, this change [1] added a rcu_is_watching check to
> ftrace_test_recursion_trylock, which we use in fprobe_handler and
> which is coming to fprobe_exit_handler in [2]
>
> I might be missing something, but it seems we don't need another
> rcu_is_watching call on the kprobe_multi level

Good point! OK, then it seems we don't need it. The rethook continues
to use rcu_is_watching() because it is also used by kprobes, but
kprobe_multi doesn't need it.

Thank you,

> jirka
>
> [1] d099dbfd3306 cpuidle: tracing: Warn about !rcu_is_watching()
> [2] https://lore.kernel.org/bpf/20230517034510.15639-4-zegao@tencent.com/
Oops, I missed that. Thanks for pointing it out; I had thought it was
a conditional use of rcu_is_watching before.

One last point: I think we should double check the statement
"fentry does not filter with !rcu_is_watching"
as quoted from Yonghong, and discuss whether fentry needs
the same check as well.

Regards,
Ze
On 5/21/23 7:07 PM, Ze Gao wrote:
> Oops, I missed that. Thanks for pointing it out; I had thought it was
> a conditional use of rcu_is_watching before.
>
> One last point: I think we should double check the statement
> "fentry does not filter with !rcu_is_watching"
> as quoted from Yonghong, and discuss whether fentry needs
> the same check as well.

I would suggest that we address the rcu_is_watching issue for fentry
only if we have a reproducible case showing that something goes wrong...

> Regards,
> Ze
On Mon, 22 May 2023 10:07:42 +0800
Ze Gao <zegao2021@gmail.com> wrote:

> Oops, I missed that. Thanks for pointing it out; I had thought it was
> a conditional use of rcu_is_watching before.
>
> One last point: I think we should double check the statement
> "fentry does not filter with !rcu_is_watching"
> as quoted from Yonghong, and discuss whether fentry needs
> the same check as well.

rcu_is_watching() comment says;

 * if the current CPU is not in its idle loop or is in an interrupt or
 * NMI handler, return true.

Thus it returns *fault* if the current CPU is in the idle loop and not
in any interrupt (including NMI) context. This means if any traceable
function is called from the idle loop, it can be !rcu_is_watching(). I
mean, this is a 'context'-based check, thus fentry cannot filter out
that some commonly used function is called from that context, but it
can be detected.

Thank you,

> Regards,
> Ze
On Tue, May 23, 2023 at 01:30:19PM +0800, Masami Hiramatsu wrote:
> On Mon, 22 May 2023 10:07:42 +0800
> Ze Gao <zegao2021@gmail.com> wrote:
>
> > Oops, I missed that. Thanks for pointing it out; I had thought it was
> > a conditional use of rcu_is_watching before.
> >
> > One last point: I think we should double check the statement
> > "fentry does not filter with !rcu_is_watching"
> > as quoted from Yonghong, and discuss whether fentry needs
> > the same check as well.
>
> rcu_is_watching() comment says;
>
>  * if the current CPU is not in its idle loop or is in an interrupt or
>  * NMI handler, return true.
>
> Thus it returns *fault* if the current CPU is in the idle loop and not
> in any interrupt (including NMI) context. This means if any traceable
> function is called from the idle loop, it can be !rcu_is_watching(). I
> mean, this is a 'context'-based check, thus fentry cannot filter out
> that some commonly used function is called from that context, but it
> can be detected.

It really does return false (rather than faulting?) if the current CPU
is deep within the idle loop.

In addition, the recent x86/entry rework (thank you Peter and
Thomas!) means that the "idle loop" is quite restricted, as can be
seen by the invocations of ct_cpuidle_enter() and ct_cpuidle_exit().
For example, in default_idle_call(), these are immediately before and
after the call to arch_cpu_idle().

Would the following help? Or am I missing your point?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 1449cb69a0e0..fae9b4e29c93 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -679,10 +679,14 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
 /**
  * rcu_is_watching - see if RCU thinks that the current CPU is not idle
  *
- * Return true if RCU is watching the running CPU, which means that this
- * CPU can safely enter RCU read-side critical sections.  In other words,
- * if the current CPU is not in its idle loop or is in an interrupt or
- * NMI handler, return true.
+ * Return @true if RCU is watching the running CPU and @false otherwise.
+ * An @true return means that this CPU can safely enter RCU read-side
+ * critical sections.
+ *
+ * More specifically, if the current CPU is not deep within its idle
+ * loop, return @true.  Note that rcu_is_watching() will return @true if
+ * invoked from an interrupt or NMI handler, even if that interrupt or
+ * NMI interrupted the CPU while it was deep within its idle loop.
  *
  * Make notrace because it can be called by the internal functions of
  * ftrace, and making this notrace removes unnecessary recursion calls.
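Concretely, the narrow RCU-idle window Paul describes sits in default_idle_call(), roughly as sketched below (abridged from kernel/sched/idle.c of this period; timing and instrumentation calls elided):

```c
/* Abridged from kernel/sched/idle.c (circa v6.4); not verbatim. */
void __cpuidle default_idle_call(void)
{
	if (current_clr_polling_and_test()) {
		local_irq_enable();
	} else {
		trace_cpu_idle(1, smp_processor_id());
		ct_cpuidle_enter();	/* RCU stops watching here ... */
		arch_cpu_idle();
		ct_cpuidle_exit();	/* ... and starts watching again */
		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
	}
}
```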
[ Added a subject, as I always want to delete these emails as spam! ]

On Mon, 22 May 2023 10:07:42 +0800
Ze Gao <zegao2021@gmail.com> wrote:

> Oops, I missed that. Thanks for pointing it out; I had thought it was
> a conditional use of rcu_is_watching before.
>
> One last point: I think we should double check the statement
> "fentry does not filter with !rcu_is_watching"
> as quoted from Yonghong, and discuss whether fentry needs
> the same check as well.

Note that trace_test_and_set_recursion() (which is used by
ftrace_test_recursion_trylock()) checks for rcu_is_watching() and
returns false if it isn't (and the trylock will fail).

-- Steve
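The check Steve points at sits in trace_test_and_set_recursion(), roughly as follows (abridged from include/linux/trace_recursion.h; the transition-bit recursion bookkeeping is elided):

```c
/* Abridged from include/linux/trace_recursion.h; not verbatim. */
static __always_inline int trace_test_and_set_recursion(unsigned long ip,
							unsigned long pip,
							int start)
{
	unsigned int val = READ_ONCE(current->trace_recursion);
	int bit;

	/* If RCU is not watching, refuse to run the handler at all. */
	if (!rcu_is_watching())
		return -1;

	bit = trace_get_context_bit() + start;
	if (unlikely(val & (1 << bit)))
		return -1;	/* recursion (transition handling elided) */

	val |= 1 << bit;
	current->trace_recursion = val;
	barrier();

	preempt_disable_notrace();
	return bit;
}
```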
Thanks Steven, I think we've come to a consensus on this. The question
here is whether the bpf tracing fentry path, i.e.
__bpf_prog_enter{_sleepable}, needs to check rcu_is_watching as well
before using rcu-related calls. And Yonghong suggested making a change
only when some bad case indeed occurs, since it's rare for the tracee
to be in the idle path.

Regards,
Ze

On Tue, May 23, 2023 at 10:10 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> [ Added a subject, as I always want to delete these emails as spam! ]
>
> On Mon, 22 May 2023 10:07:42 +0800
> Ze Gao <zegao2021@gmail.com> wrote:
>
> > Oops, I missed that. Thanks for pointing it out; I had thought it was
> > a conditional use of rcu_is_watching before.
> >
> > One last point: I think we should double check the statement
> > "fentry does not filter with !rcu_is_watching"
> > as quoted from Yonghong, and discuss whether fentry needs
> > the same check as well.
>
> Note that trace_test_and_set_recursion() (which is used by
> ftrace_test_recursion_trylock()) checks for rcu_is_watching() and
> returns false if it isn't (and the trylock will fail).
>
> -- Steve
On Mon, 22 May 2023 23:59:28 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> On Tue, May 23, 2023 at 01:30:19PM +0800, Masami Hiramatsu wrote:
> > On Mon, 22 May 2023 10:07:42 +0800
> > Ze Gao <zegao2021@gmail.com> wrote:
> >
> > > Oops, I missed that. Thanks for pointing it out; I had thought it was
> > > a conditional use of rcu_is_watching before.
> > >
> > > One last point: I think we should double check the statement
> > > "fentry does not filter with !rcu_is_watching"
> > > as quoted from Yonghong, and discuss whether fentry needs
> > > the same check as well.
> >
> > rcu_is_watching() comment says;
> >
> >  * if the current CPU is not in its idle loop or is in an interrupt or
> >  * NMI handler, return true.
> >
> > Thus it returns *fault* if the current CPU is in the idle loop and not
> > in any interrupt (including NMI) context. This means if any traceable
> > function is called from the idle loop, it can be !rcu_is_watching(). I
> > mean, this is a 'context'-based check, thus fentry cannot filter out
> > that some commonly used function is called from that context, but it
> > can be detected.
>
> It really does return false (rather than faulting?) if the current CPU
> is deep within the idle loop.
>
> In addition, the recent x86/entry rework (thank you Peter and
> Thomas!) means that the "idle loop" is quite restricted, as can be
> seen by the invocations of ct_cpuidle_enter() and ct_cpuidle_exit().
> For example, in default_idle_call(), these are immediately before and
> after the call to arch_cpu_idle().

Thanks! I also found that default_idle_call() is small enough, and it
seems this is not happening on fentry because there are no commonly
used functions on that path.

> Would the following help? Or am I missing your point?

Yes, thank you for the update!

>
> 							Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 1449cb69a0e0..fae9b4e29c93 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -679,10 +679,14 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
>  /**
>   * rcu_is_watching - see if RCU thinks that the current CPU is not idle
>   *
> - * Return true if RCU is watching the running CPU, which means that this
> - * CPU can safely enter RCU read-side critical sections.  In other words,
> - * if the current CPU is not in its idle loop or is in an interrupt or
> - * NMI handler, return true.
> + * Return @true if RCU is watching the running CPU and @false otherwise.
> + * An @true return means that this CPU can safely enter RCU read-side
> + * critical sections.
> + *
> + * More specifically, if the current CPU is not deep within its idle
> + * loop, return @true.  Note that rcu_is_watching() will return @true if
> + * invoked from an interrupt or NMI handler, even if that interrupt or
> + * NMI interrupted the CPU while it was deep within its idle loop.
>  *
>  * Make notrace because it can be called by the internal functions of
>  * ftrace, and making this notrace removes unnecessary recursion calls.
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 9a050e36dc6c..3e6ea7274765 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2622,7 +2622,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
 	struct bpf_run_ctx *old_run_ctx;
 	int err;
 
-	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
+	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1 || !rcu_is_watching())) {
 		err = 0;
 		goto out;
 	}