Message ID | 20230210003937.1030753-1-qiang1.zhang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v3] sched/isolation: Fix illegal CPU value by housekeeping_any_cpu() return |
> From: Zqiang <qiang1.zhang@intel.com>
> Sent: Friday, February 10, 2023 8:40 AM
> To: mingo@redhat.com; peterz@infradead.org; juri.lelli@redhat.com;
> paulmck@kernel.org; frederic@kernel.org; joel@joelfernandes.org;
> rcu@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH v3] sched/isolation: Fix illegal CPU value by
> housekeeping_any_cpu() return
>
> For kernels built with CONFIG_NO_HZ_FULL=y, running the following tests:
>
> runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" bootparams=
> "console=ttyS0 nohz_full=0,1 rcu_nocbs=0,1 sched_verbose" -d
>
> root@qemux86-64:~# echo 0 > /sys/devices/system/cpu/cpu2/online
> root@qemux86-64:~# echo 0 > /sys/devices/system/cpu/cpu3/online

Hi Qiang,

Did some quick testing using the same kernel parameters and the same reproduction steps as yours:

1) Without this v3 applied, the kernel panicked as you found.

2) With this v3 applied, the kernel did NOT panic and worked well.
   But a WARNING call trace [1] was thrown.
   Not sure whether [1] is another issue.

[1]
[ 2445.396928] smpboot: CPU 2 is now offline
[ 2445.399084] CPU2 attaching NULL sched-domain.
[ 2445.399091] CPU3 attaching NULL sched-domain.
[ 2445.399202] CPU3 attaching NULL sched-domain.
[ 2445.399208] root domain span: 3 (max cpu_capacity = 1024)
[ 2449.731424] process 672 (tuned) no longer affine to cpu3
[ 2449.733332] process 509 (systemd-journal) no longer affine to cpu3
[ 2449.742278] process 541 (systemd-udevd) no longer affine to cpu3
[ 2449.745409] process 760 (bash) no longer affine to cpu3
[ 2449.748550] smpboot: CPU 3 is now offline
[ 2449.755129] CPU3 attaching NULL sched-domain.
[ 2449.755194] ------------[ cut here ]------------
[ 2449.756296] WARNING: CPU: 0 PID: 483 at kernel/sched/topology.c:2257 build_sched_domains+0x104c/0x1430
[ 2449.758227] Modules linked in: rfkill sunrpc psmouse i2c_piix4 atkbd libps2 vivaldi_fmap serio_raw virtio_net net_failover failover sr_mod cdrom i8042 qemu_fw_cfg pata_acpi ipmi_devintf ipmi_msghandler
[ 2449.760804] CPU: 0 PID: 483 Comm: kworker/3:6 Not tainted 6.2.0-rc7-rcu+ #21
[ 2449.761820] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 2449.762931] Workqueue: events cpuset_hotplug_workfn
[ 2449.763676] RIP: 0010:build_sched_domains+0x104c/0x1430
[ 2449.764465] Code: 45 98 f4 ff ff ff 0f 84 1a f8 ff ff 48 8b 7d 90 31 f6 e8 17 48 ff ff e9 0a f8 ff ff 0f 0b e9 01 fe ff ff 0f 0b e9 b6 fb ff ff <0f> 0b c7 45 98 f4 ff ff ff e9 3f f7 ff ff 48 c7 45 90 00 00 00 00
[ 2449.766934] RSP: 0000:ffffab51c08f7c00 EFLAGS: 00010246
[ 2449.767568] process 737 (tuned) no longer affine to cpu3
[ 2449.768378] RAX: 0000000000000004 RBX: 0000000000000004 RCX: 0000000000000000
[ 2449.769079] RDX: 0000000000000040 RSI: 0000000000000004 RDI: ffff9486442d7f08
[ 2449.769785] RBP: ffffab51c08f7ca0 R08: 0000000000000000 R09: 0000000000000000
[ 2449.770501] R10: 0000000000000190 R11: ffffab51c08f7ab8 R12: 0000000000000001
[ 2449.771227] R13: 0000000000000000 R14: ffff9486424379c0 R15: 0000000000000001
[ 2449.771920] FS: 0000000000000000(0000) GS:ffff948777c00000(0000) knlGS:0000000000000000
[ 2449.772714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2449.773303] CR2: 000055ed0e0ed158 CR3: 000000010091a002 CR4: 0000000000370ef0
[ 2449.774011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2449.774725] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2449.775437] Call Trace:
[ 2449.775752] <TASK>
[ 2449.776053] ? cpu_attach_domain+0x3d7/0x810
[ 2449.776532] ? wait_for_completion+0xff/0x110
[ 2449.777015] partition_sched_domains_locked+0x1e7/0x3a0
[ 2449.777554] rebuild_sched_domains_locked+0x545/0x800
[ 2449.778032] ? rcu_sync_enter+0x6b/0xc0
[ 2449.778377] rebuild_sched_domains+0x1a/0x40
[ 2449.778728] cpuset_hotplug_workfn+0x18a/0xe10
[ 2449.779105] ? balance_push+0x51/0x110
[ 2449.779444] ? finish_task_switch+0x85/0x2c0
[ 2449.779810] ? __schedule+0x2f7/0x9f0
[ 2449.780134] process_one_work+0x1cd/0x3e0
[ 2449.780495] worker_thread+0x32/0x380
[ 2449.781436] ? process_one_work+0x3e0/0x3e0
[ 2449.782006] kthread+0xe8/0x110
[ 2449.782478] ? kthread_complete_and_exit+0x20/0x20
[ 2449.783067] ret_from_fork+0x1f/0x30
[ 2449.783566] </TASK>
[ 2449.783953] ---[ end trace 0000000000000000 ]---
[ 2449.789269] process 741 (tuned) no longer affine to cpu3
[ 2449.794191] process 759 (sshd) no longer affine to cpu3
[ 2450.188215] process 732 (in:imjournal) no longer affine to cpu3
[ 2450.188457] process 733 (rs:main Q:Reg) no longer affine to cpu3
[ 2453.011183] process 659 (gmain) no longer affine to cpu3
[ 2465.517178] select_fallback_rq: 1 callbacks suppressed
[ 2465.517185] process 605 (rpcbind) no longer affine to cpu3
[ 2479.794154] process 652 (chronyd) no longer affine to cpu2
...
> Hi Qiang,
>
> Did some quick testing using the same kernel parameters and the same reproduction steps as yours:
>
> 1) Without this v3 applied, the kernel panicked as you found.
>
> 2) With this v3 applied, the kernel did NOT panic and worked well.
>    But a WARNING call trace [1] was thrown.
>    Not sure whether [1] is another issue.

Thanks for testing. Yes, this is another issue: enabling the CONFIG_CPUSETS option triggers the call trace in [1].

Thanks
Zqiang
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 373d42c707bc..edfba557a2e1 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -46,7 +46,8 @@ int housekeeping_any_cpu(enum hk_type type)
 			if (cpu < nr_cpu_ids)
 				return cpu;
 
-			return cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+			cpu = cpumask_any_and(housekeeping.cpumasks[type], cpu_online_mask);
+			return (cpu >= nr_cpu_ids) ? smp_processor_id() : cpu;
 		}
 	}
 	return smp_processor_id();
For kernels built with CONFIG_NO_HZ_FULL=y, running the following tests:

runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" bootparams=
"console=ttyS0 nohz_full=0,1 rcu_nocbs=0,1 sched_verbose" -d

root@qemux86-64:~# echo 0 > /sys/devices/system/cpu/cpu2/online
root@qemux86-64:~# echo 0 > /sys/devices/system/cpu/cpu3/online

[ 22.838290] BUG: unable to handle page fault for address: ffffffff84cd48c0
[ 22.839409] #PF: supervisor read access in kernel mode
[ 22.840215] #PF: error_code(0x0000) - not-present page
[ 22.841028] PGD 3e19067 P4D 3e19067 PUD 3e1a063 PMD 800ffffffb3ff062
[ 22.841889] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
[ 22.842175] CPU: 0 PID: 16 Comm: rcu_preempt Not tainted 6.2.0-rc1-yocto-standard+ #658
[ 22.842534] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.o4
[ 22.843036] RIP: 0010:do_raw_spin_trylock+0x70/0x120
[ 22.843267] Code: 81 c7 00 f1 f1 f1 f1 c7 40 04 04 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 e0 31 c0 e8 b8 0
[ 22.844187] RSP: 0018:ffff8880072b7b30 EFLAGS: 00010046
[ 22.844429] RAX: 0000000000000000 RBX: ffffffff84cd48c0 RCX: dffffc0000000000
[ 22.844751] RDX: 0000000000000003 RSI: 0000000000000004 RDI: ffffffff84cd48c0
[ 22.845074] RBP: ffff8880072b7ba8 R08: ffffffff811daa20 R09: fffffbfff099a919
[ 22.845400] R10: ffffffff84cd48c3 R11: fffffbfff099a918 R12: 1ffff11000e56f66
[ 22.845719] R13: ffffffff84cd48d8 R14: ffffffff84cd48c0 R15: ffff8880072b7cd8
[ 22.846040] FS: 0000000000000000(0000) GS:ffff888035200000(0000) knlGS:0000000000000000
[ 22.846403] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 22.846667] CR2: ffffffff84cd48c0 CR3: 000000001036c000 CR4: 00000000001506f0
[ 22.846988] Call Trace:
[ 22.847105] <TASK>
[ 22.847208] ? __pfx_do_raw_spin_trylock+0x10/0x10
[ 22.847430] ? rcu_read_unlock+0x26/0x80
[ 22.847612] ? trace_preempt_off+0x2a/0x130
[ 22.847812] _raw_spin_lock+0x41/0x80
[ 22.847984] ? schedule_timeout+0x242/0x580
[ 22.848178] schedule_timeout+0x242/0x580
[ 22.848366] ? __pfx_schedule_timeout+0x10/0x10
[ 22.848575] ? __pfx_do_raw_spin_trylock+0x10/0x10
[ 22.848796] ? __pfx_process_timeout+0x10/0x10
[ 22.849005] ? _raw_spin_unlock_irqrestore+0x46/0x80
[ 22.849232] ? prepare_to_swait_event+0xb8/0x210
[ 22.849450] rcu_gp_fqs_loop+0x66e/0xe70
[ 22.849633] ? rcu_gp_init+0x87c/0x1130
[ 22.849813] ? __pfx_rcu_gp_fqs_loop+0x10/0x10
[ 22.850022] ? _raw_spin_unlock_irqrestore+0x46/0x80
[ 22.850251] ? finish_swait+0xce/0x100
[ 22.850429] rcu_gp_kthread+0x2ea/0x6b0
[ 22.850608] ? __pfx_do_raw_spin_trylock+0x10/0x10
[ 22.850829] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 22.851039] ? __kasan_check_read+0x11/0x20
[ 22.851233] ? __kthread_parkme+0xe8/0x110
[ 22.851424] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 22.851627] kthread+0x172/0x1a0
[ 22.851781] ? __pfx_kthread+0x10/0x10
[ 22.851956] ret_from_fork+0x2c/0x50
[ 22.852129] </TASK>

schedule_timeout()
->__mod_timer()
  ->get_target_base(base, timer->flags)
    ->get_timer_cpu_base(tflags, get_nohz_timer_target());
      ->cpu = get_nohz_timer_target()
        ->housekeeping_any_cpu(HK_TYPE_TIMER)
          /*housekeeping.cpumasks[type] is 2-3*/
          /*cpu_online_mask is 0-1*/
          ->cpu = cpumask_any_and(housekeeping.cpumasks[type],
                                   cpu_online_mask);
          /*cpu value is 4*/
      ->new_base = per_cpu_ptr(&timer_bases[BASE_DEF], cpu);
      /*new_base is illegal address*/
  ->if (base != new_base)
    ->raw_spin_lock(&new_base->lock); ==> trigger Oops

This commit therefore adds a check on the return value of
cpumask_any_and() in housekeeping_any_cpu(): if cpumask_any_and()
returns an illegal CPU value, housekeeping_any_cpu() returns the
current CPU number instead.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
---
 kernel/sched/isolation.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
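
For readers who want to check the arithmetic in the call chain above, here is a minimal userspace sketch of the failure and of the fallback. It is not kernel code: the `model_*` functions and the one-bit-per-CPU `unsigned long` masks are hypothetical stand-ins, and it assumes, as the commit message reports, that cpumask_any_and() yields a value >= nr_cpu_ids when the two masks share no CPU.

```c
#include <assert.h>

/* Hypothetical model: 4 possible CPUs, one bit per CPU in an unsigned long. */
#define NR_CPU_IDS 4u

/*
 * Stand-in for cpumask_any_and(): returns the lowest CPU set in both
 * masks, or NR_CPU_IDS when the intersection is empty (assumed behavior,
 * matching the "cpu value is 4" note in the commit message).
 */
static unsigned int model_cpumask_any_and(unsigned long a, unsigned long b)
{
	unsigned long both = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (both & (1UL << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/* Pre-patch shape: the raw result is returned, even if out of range. */
static unsigned int model_hk_any_cpu_old(unsigned long hk, unsigned long online,
					 unsigned int this_cpu)
{
	(void)this_cpu; /* unused before the patch */
	return model_cpumask_any_and(hk, online);
}

/* Post-patch shape: fall back to the caller's CPU on an empty intersection. */
static unsigned int model_hk_any_cpu_new(unsigned long hk, unsigned long online,
					 unsigned int this_cpu)
{
	unsigned int cpu = model_cpumask_any_and(hk, online);

	assert(this_cpu < NR_CPU_IDS); /* the caller runs on a valid CPU */
	return (cpu >= NR_CPU_IDS) ? this_cpu : cpu;
}
```

With a housekeeping mask of 0xC (CPUs 2-3) and an online mask of 0x3 (CPUs 0-1), the pre-patch model returns 4, which is not a valid index into any per-CPU array, while the post-patch model returns the caller's CPU. When at least one housekeeping CPU is still online, both models return that CPU.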