Message ID | 20220330155611.30216-1-pauld@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] arch/arm64: Fix topology initialization for core scheduling | expand |
On 30/03/2022 17:56, Phil Auld wrote: > Arm64 systems rely on store_cpu_topology() to call update_siblings_masks() > to transfer the toplogy to the various cpu masks. This needs to be done > before the call to notify_cpu_starting() which tells the scheduler about > each cpu found, otherwise the core scheduling data structures are setup > in a way that does not match the actual topology. > > Without this change stress-ng (which enables core scheduling in its prctl > tests) causes a warning and then a crash (trimmed for legibility): > > [ 1853.805168] ------------[ cut here ]------------ > [ 1853.809784] task_rq(b)->core != rq->core > [ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4 > ... > [ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 > ... > [ 1854.231256] Call trace: > [ 1854.233689] pick_next_task+0x3dc/0x81c > [ 1854.237512] __schedule+0x10c/0x4cc > [ 1854.240988] schedule_idle+0x34/0x54 > > Fixes: 9edeaea1bc45 ("sched: Core-wide rq->lock") > Signed-off-by: Phil Auld <pauld@redhat.com> > --- > This is a similar issue to > f2703def339c ("MIPS: smp: fill in sibling and core maps earlier") > which fixed it for MIPS. > > v2: Fixed the commit message. No code change. Ah, the reason is that smt_mask is not correctly setup, so we bail on `cpumask_weight(smt_mask) == 1` for !leaders in: notify_cpu_starting() cpuhp_invoke_callback_range() sched_cpu_starting() sched_core_cpu_starting() which leads to rq->core not being correctly set for !leader-rq's. LGTM. Tested on: HPE Apollo 70 X1 Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> [...]
On Thu, Mar 31, 2022 at 11:04:31AM +0200 Dietmar Eggemann wrote: > On 30/03/2022 17:56, Phil Auld wrote: > > Arm64 systems rely on store_cpu_topology() to call update_siblings_masks() > > to transfer the toplogy to the various cpu masks. This needs to be done > > before the call to notify_cpu_starting() which tells the scheduler about > > each cpu found, otherwise the core scheduling data structures are setup > > in a way that does not match the actual topology. > > > > Without this change stress-ng (which enables core scheduling in its prctl > > tests) causes a warning and then a crash (trimmed for legibility): > > > > [ 1853.805168] ------------[ cut here ]------------ > > [ 1853.809784] task_rq(b)->core != rq->core > > [ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4 > > ... > > [ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 > > ... > > [ 1854.231256] Call trace: > > [ 1854.233689] pick_next_task+0x3dc/0x81c > > [ 1854.237512] __schedule+0x10c/0x4cc > > [ 1854.240988] schedule_idle+0x34/0x54 > > > > Fixes: 9edeaea1bc45 ("sched: Core-wide rq->lock") > > Signed-off-by: Phil Auld <pauld@redhat.com> > > --- > > This is a similar issue to > > f2703def339c ("MIPS: smp: fill in sibling and core maps earlier") > > which fixed it for MIPS. > > > > v2: Fixed the commit message. No code change. > > Ah, the reason is that smt_mask is not correctly setup, so we bail on > `cpumask_weight(smt_mask) == 1` for !leaders in: > > notify_cpu_starting() > cpuhp_invoke_callback_range() > sched_cpu_starting() > sched_core_cpu_starting() > > which leads to rq->core not being correctly set for !leader-rq's. > Exactly, sorry I was not clearer. smt_mask must be setup correctly by the time sched_core_cpu_starting() is called. (Maybe I should crib some of the above lines into the commit message?) > LGTM. Tested on: HPE Apollo 70 X1 > > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> > Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> > Thanks! Cheers, Phil > [...] >
On 31/03/2022 15:21, Phil Auld wrote: > On Thu, Mar 31, 2022 at 11:04:31AM +0200 Dietmar Eggemann wrote: >> On 30/03/2022 17:56, Phil Auld wrote: [...] >> Ah, the reason is that smt_mask is not correctly setup, so we bail on >> `cpumask_weight(smt_mask) == 1` for !leaders in: >> >> notify_cpu_starting() >> cpuhp_invoke_callback_range() >> sched_cpu_starting() >> sched_core_cpu_starting() >> >> which leads to rq->core not being correctly set for !leader-rq's. >> > > Exactly, sorry I was not clearer. smt_mask must be setup correctly > by the time sched_core_cpu_starting() is called. (Maybe I should crib > some of the above lines into the commit message?) Yeah, maybe, it wouldn't hurt I guess. IMHO mentioning stress-ng's prctl needs PR_SCHED_CORE support could also be handy since today's stress-ng packages don't seem to have this yet.
On Thu, Mar 31, 2022 at 04:37:50PM +0200 Dietmar Eggemann wrote: > On 31/03/2022 15:21, Phil Auld wrote: > > On Thu, Mar 31, 2022 at 11:04:31AM +0200 Dietmar Eggemann wrote: > >> On 30/03/2022 17:56, Phil Auld wrote: > > [...] > > >> Ah, the reason is that smt_mask is not correctly setup, so we bail on > >> `cpumask_weight(smt_mask) == 1` for !leaders in: > >> > >> notify_cpu_starting() > >> cpuhp_invoke_callback_range() > >> sched_cpu_starting() > >> sched_core_cpu_starting() > >> > >> which leads to rq->core not being correctly set for !leader-rq's. > >> > > > > Exactly, sorry I was not clearer. smt_mask must be setup correctly > > by the time sched_core_cpu_starting() is called. (Maybe I should crib > > some of the above lines into the commit message?) > > Yeah, maybe, it wouldn't hurt I guess. IMHO mentioning stress-ng's prctl > needs PR_SCHED_CORE support could also be handy since today's stress-ng > packages don't seem to have this yet. > My scripts clone it so I did not realize that was not in prepackaged versions yet. But that said, that's really just a way to tickle the problem. Anyone using core scheduling on such a system will hit this (at least the WARN part, the actual crash was harder to create w/o all the threads and tasks stress-ng uses). I can send a v3 with a further commit message update. Cheers, Phil
On 31/03/2022 16:53, Phil Auld wrote: > On Thu, Mar 31, 2022 at 04:37:50PM +0200 Dietmar Eggemann wrote: >> On 31/03/2022 15:21, Phil Auld wrote: >>> On Thu, Mar 31, 2022 at 11:04:31AM +0200 Dietmar Eggemann wrote: >>>> On 30/03/2022 17:56, Phil Auld wrote: [...] >> Yeah, maybe, it wouldn't hurt I guess. IMHO mentioning stress-ng's prctl >> needs PR_SCHED_CORE support could also be handy since today's stress-ng >> packages don't seem to have this yet. >> > > My scripts clone it so I did not realize that was not in prepackaged versions > yet. But that said, that's really just a way to tickle the problem. Anyone > using core scheduling on such a system will hit this (at least the WARN part, > the actual crash was harder to create w/o all the threads and tasks stress-ng > uses). OK. I just saw that there is even a kselftest for this: 9f2699007493 - kselftest: Add test for core sched prctl interface (2021-05-12 Chris Hyser) But it doesn't trigger the issue. > I can send a v3 with a further commit message update. Sounds good.
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 27df5c1e6baa..3b46041f2b97 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -234,6 +234,7 @@ asmlinkage notrace void secondary_start_kernel(void) * Log the CPU info before it is marked online and might get read. */ cpuinfo_store_cpu(); + store_cpu_topology(cpu); /* * Enable GIC and timers. @@ -242,7 +243,6 @@ asmlinkage notrace void secondary_start_kernel(void) ipi_setup(cpu); - store_cpu_topology(cpu); numa_add_cpu(cpu); /*
Arm64 systems rely on store_cpu_topology() to call update_siblings_masks() to transfer the toplogy to the various cpu masks. This needs to be done before the call to notify_cpu_starting() which tells the scheduler about each cpu found, otherwise the core scheduling data structures are setup in a way that does not match the actual topology. Without this change stress-ng (which enables core scheduling in its prctl tests) causes a warning and then a crash (trimmed for legibility): [ 1853.805168] ------------[ cut here ]------------ [ 1853.809784] task_rq(b)->core != rq->core [ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4 ... [ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 ... [ 1854.231256] Call trace: [ 1854.233689] pick_next_task+0x3dc/0x81c [ 1854.237512] __schedule+0x10c/0x4cc [ 1854.240988] schedule_idle+0x34/0x54 Fixes: 9edeaea1bc45 ("sched: Core-wide rq->lock") Signed-off-by: Phil Auld <pauld@redhat.com> --- This is a similar issue to f2703def339c ("MIPS: smp: fill in sibling and core maps earlier") which fixed it for MIPS. v2: Fixed the commit message. No code change. arch/arm64/kernel/smp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)