| Message ID | 20241220134234.3809621-1-koichiro.den@canonical.com (mailing list archive) |
|---|---|
| State | New |
| Series | vmstat: disable vmstat_work on vmstat_cpu_down_prep() |
On Fri, Dec 20, 2024 at 10:42:34PM +0900, Koichiro Den wrote:
> Even after mm/vmstat:online teardown, shepherd may still queue work for
> the dying cpu until the cpu is removed from online mask. While it's
> quite rare, this means that after unbind_workers() unbinds a per-cpu
> kworker, it potentially runs vmstat_update for the dying CPU on an
> irrelevant cpu before entering STARTING section.
> When CONFIG_DEBUG_PREEMPT=y, it results in the following error with the
> backtrace.
>
> BUG: using smp_processor_id() in preemptible [00000000] code: \
> kworker/7:3/1702
> caller is refresh_cpu_vm_stats+0x235/0x5f0
> CPU: 0 UID: 0 PID: 1702 Comm: kworker/7:3 Tainted: G
> Tainted: [N]=TEST
> Workqueue: mm_percpu_wq vmstat_update
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x8d/0xb0
>  check_preemption_disabled+0xce/0xe0
>  refresh_cpu_vm_stats+0x235/0x5f0
>  vmstat_update+0x17/0xa0
>  process_one_work+0x869/0x1aa0
>  worker_thread+0x5e5/0x1100
>  kthread+0x29e/0x380
>  ret_from_fork+0x2d/0x70
>  ret_from_fork_asm+0x1a/0x30
>  </TASK>
>
> So, disable vmstat_work reliably on vmstat_cpu_down_prep().
>
> Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
> ---
>  mm/vmstat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 4d016314a56c..44e1d87dcf01 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -2154,7 +2154,7 @@ static int vmstat_cpu_online(unsigned int cpu)
>
>  static int vmstat_cpu_down_prep(unsigned int cpu)
>  {
> -	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
> +	disable_delayed_work_sync(&per_cpu(vmstat_work, cpu));
>  	return 0;
>  }
>
> --

Andrew, I just noticed my silly mistake - I needed to enable the work in
the opposite direction. It looks like you've already queued this (v1) to
mm-hotfixes-unstable. Could you please replace it with v2?
(https://lore.kernel.org/all/20241221033321.4154409-1-koichiro.den@canonical.com/)
Sorry to bother you.

Let me know if submitting a separate follow-up patch would be more
appropriate. Thank you.

-Koichiro Den
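[Editor's note] The distinction driving both the fix and the v2 follow-up: cancel_delayed_work_sync() only removes the currently queued instance, so the shepherd can immediately requeue the work for the dying CPU, whereas disable_delayed_work_sync() additionally rejects future queue attempts until a matching enable call is made (which is why the online path must re-enable the work, per the author's note above). A toy userspace sketch of those semantics in Python, with made-up class and method names that only mimic the kernel API behavior (this is not the workqueue implementation):

```python
import threading

class DelayedWork:
    """Toy model of a kernel delayed work item (illustration only)."""
    def __init__(self, fn):
        self.fn = fn
        self.pending = False
        self.disabled = False
        self.lock = threading.Lock()

    def queue(self):
        # Mimics queue_delayed_work(): a disabled or already-pending
        # item is not queued again.
        with self.lock:
            if self.disabled or self.pending:
                return False
            self.pending = True
            return True

    def cancel_sync(self):
        # Mimics cancel_delayed_work_sync(): drops the pending instance,
        # but does nothing to stop a later queue() from succeeding.
        with self.lock:
            self.pending = False

    def disable_sync(self):
        # Mimics disable_delayed_work_sync(): also marks the item
        # disabled, so queueing becomes a no-op until enable().
        with self.lock:
            self.pending = False
            self.disabled = True

    def enable(self):
        # Mimics the enable that v2 adds on the online path.
        with self.lock:
            self.disabled = False

work = DelayedWork(lambda: None)
work.queue()
work.cancel_sync()
print(work.queue())   # True: the shepherd can still requeue after a cancel

work.disable_sync()
print(work.queue())   # False: requeue attempts are rejected while disabled

work.enable()
print(work.queue())   # True: works again once re-enabled on cpu-online
```

This is why the one-line change closes the race: after vmstat_cpu_down_prep() disables the item, a late shepherd tick can no longer arm vmstat_update for the dying CPU.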