Message ID | 20240117061636.288412-1-chenzhongjin@huawei.com (mailing list archive)
---|---
State | Superseded
Series | kprobes: Use synchronize_rcu_tasks_rude in kprobe_optimizer
On Wed, 17 Jan 2024 06:16:36 +0000 Chen Zhongjin <chenzhongjin@huawei.com> wrote:

> There is a deadlock scenario in kprobe_optimizer():
>
> pid A                           pid B                       pid C
> kprobe_optimizer()              do_exit()                   perf_kprobe_init()
> mutex_lock(&kprobe_mutex)       exit_tasks_rcu_start()      mutex_lock(&kprobe_mutex)
> synchronize_rcu_tasks()         zap_pid_ns_processes()      // waiting kprobe_mutex
> // waiting tasks_rcu_exit_srcu  kernel_wait4()
>                                 // waiting pid C exit
>
> To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in
> kprobe_optimizer() rather than synchronize_rcu_tasks().
> synchronize_rcu_tasks_rude() can also promise that all preempted tasks
> have scheduled, but it will not wait tasks_rcu_exit_srcu.
>
> Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>

Thanks.  Should we backport this fix into earlier kernels?  If so, are
we able to identify a suitable Fixes: target?
On Wed, Jan 17, 2024 at 12:31:33PM -0800, Andrew Morton wrote:
> On Wed, 17 Jan 2024 06:16:36 +0000 Chen Zhongjin <chenzhongjin@huawei.com> wrote:
>
> > There is a deadlock scenario in kprobe_optimizer():
> >
> > pid A                           pid B                       pid C
> > kprobe_optimizer()              do_exit()                   perf_kprobe_init()
> > mutex_lock(&kprobe_mutex)       exit_tasks_rcu_start()      mutex_lock(&kprobe_mutex)
> > synchronize_rcu_tasks()         zap_pid_ns_processes()      // waiting kprobe_mutex
> > // waiting tasks_rcu_exit_srcu  kernel_wait4()
> >                                 // waiting pid C exit
> >
> > To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in
> > kprobe_optimizer() rather than synchronize_rcu_tasks().
> > synchronize_rcu_tasks_rude() can also promise that all preempted tasks
> > have scheduled, but it will not wait tasks_rcu_exit_srcu.

Hello, Chen Zhongjin,

Relying on the non-preemptibility of the last portion of the do_exit()
function does appear to be the basis for a much better solution than what
we currently have.  So at the very least, thank you for the idea!!!  I
feel a bit silly for not having thought of it myself.  ;-)

However, just invoking synchronize_rcu_tasks_rude() will be bad for both
energy efficiency and real-time response.  This is because
synchronize_rcu_tasks_rude() sends an IPI to each and every online CPU,
almost none of which will be in the non-preemptible tail of do_exit() at
any given time.  These additional unnecessary IPIs will drain battery
when they hit idle CPUs and degrade real-time response when they hit
CPUs running aggressive real-time applications.  Which might not make
people happy.

So, would you be willing to use RCU's do_exit() hooks and RCU's hook
into the scheduler (either rcu_note_context_switch() or rcu_tasks_qs(),
whichever would work better) to maintain a per-CPU variable that is a
pointer to the task in the non-preemptible tail of do_exit() if there is
one, or NULL otherwise?  This would get us the deadlock-avoidance
simplicity of your underlying idea, while avoiding (almost all of) the
energy-efficiency and real-time-response issues with your patch.

This does require a bit of memory-ordering care, so if you would prefer
that I do the implementation, just let me know.  (The memory-ordering
trick is to use "smp_mb(); smp_load_acquire();" to sample the pointer
and "smp_store_release(); smp_mb();" to update it.)

							Thanx, Paul

> > Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
>
> Thanks.  Should we backport this fix into earlier kernels?  If so, are
> we able to identify a suitable Fixes: target?
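For concreteness, a minimal sketch of the per-CPU scheme described above,
using the stated memory-ordering pattern.  The hook and variable names
here (rcu_task_exit_start(), rcu_task_exit_end(), rcu_exit_task_on(),
exit_tail_task) are hypothetical illustrations, not upstream APIs; only
DEFINE_PER_CPU(), this_cpu_ptr(), per_cpu_ptr(), smp_store_release(),
smp_load_acquire(), and smp_mb() are real kernel primitives:

	#include <linux/percpu.h>
	#include <linux/sched.h>

	/*
	 * Hypothetical sketch of the proposal above -- not upstream code.
	 * Each CPU tracks the task (if any) currently running the
	 * non-preemptible tail of do_exit() on that CPU.
	 */
	static DEFINE_PER_CPU(struct task_struct *, exit_tail_task);

	/* Called from do_exit() on entry to its non-preemptible tail. */
	static void rcu_task_exit_start(void)
	{
		/* Publish the pointer, ordered after all prior accesses. */
		smp_store_release(this_cpu_ptr(&exit_tail_task), current);
		smp_mb();
	}

	/* Called once the exiting task leaves the region of interest. */
	static void rcu_task_exit_end(void)
	{
		smp_store_release(this_cpu_ptr(&exit_tail_task), NULL);
		smp_mb();
	}

	/* Grace-period side: sample a given CPU's exiting task, if any. */
	static struct task_struct *rcu_exit_task_on(int cpu)
	{
		smp_mb();  /* Pairs with the update side's smp_mb(). */
		return smp_load_acquire(per_cpu_ptr(&exit_tail_task, cpu));
	}

In such a scheme, the grace-period machinery would only need to consider
the handful of CPUs whose pointer is non-NULL, rather than IPIing every
online CPU the way synchronize_rcu_tasks_rude() does.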
diff --git a/arch/Kconfig b/arch/Kconfig
index f4b210ab0612..dc6a18854017 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -104,7 +104,7 @@ config STATIC_CALL_SELFTEST
 config OPTPROBES
 	def_bool y
 	depends on KPROBES && HAVE_OPTPROBES
-	select TASKS_RCU if PREEMPTION
+	select TASKS_RUDE_RCU
 
 config KPROBES_ON_FTRACE
 	def_bool y
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index d5a0ee40bf66..09056ae50c58 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -623,7 +623,7 @@ static void kprobe_optimizer(struct work_struct *work)
 	 * Note that on non-preemptive kernel, this is transparently converted
 	 * to synchronoze_sched() to wait for all interrupts to have completed.
 	 */
-	synchronize_rcu_tasks();
+	synchronize_rcu_tasks_rude();
 
 	/* Step 3: Optimize kprobes after quiesence period */
 	do_optimize_kprobes();
There is a deadlock scenario in kprobe_optimizer():

pid A                           pid B                       pid C
kprobe_optimizer()              do_exit()                   perf_kprobe_init()
mutex_lock(&kprobe_mutex)       exit_tasks_rcu_start()      mutex_lock(&kprobe_mutex)
synchronize_rcu_tasks()         zap_pid_ns_processes()      // waiting kprobe_mutex
// waiting tasks_rcu_exit_srcu  kernel_wait4()
                                // waiting pid C exit

To avoid this deadlock loop, use synchronize_rcu_tasks_rude() in
kprobe_optimizer() rather than synchronize_rcu_tasks().
synchronize_rcu_tasks_rude() can also promise that all preempted tasks
have scheduled, but it will not wait tasks_rcu_exit_srcu.

Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
---
 arch/Kconfig     | 2 +-
 kernel/kprobes.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
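To make the "waiting tasks_rcu_exit_srcu" edge in the diagram concrete:
part of the exit path runs inside a tasks_rcu_exit_srcu read-side
critical section, which the Tasks RCU grace period waits out, so pid B
blocking under that section blocks synchronize_rcu_tasks().  A
simplified sketch of the relevant ordering in do_exit() (not verbatim
upstream code; details elided):

	void do_exit(long code)
	{
		struct task_struct *tsk = current;
		int group_dead = 0;	/* simplified for the sketch */

		/* ... earlier teardown: signals, mm, files, ... */

		exit_tasks_rcu_start();	/* enter tasks_rcu_exit_srcu read side */

		/*
		 * For the init task of a dying pid namespace, exit_notify()
		 * reaches zap_pid_ns_processes(), whose kernel_wait4()
		 * blocks until pid C exits -- while still inside the SRCU
		 * read-side section that synchronize_rcu_tasks() (pid A,
		 * holding kprobe_mutex) is waiting to end.
		 */
		exit_notify(tsk, group_dead);

		/* ... */
		exit_tasks_rcu_finish();	/* leave the read side */
		/* ... */
	}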