Message ID | alpine.LRH.2.02.2206011316440.25830@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive) |
---|---|
State | Accepted, archived |
Series | [v2] parisc: fix a crash with multicore scheduler |
Hi Mikulas,

On 6/1/22 19:18, Mikulas Patocka wrote:
> With the kernel 5.18, the system will hang on boot if it is compiled with
> CONFIG_SCHED_MC. The last printed message is "Brought up 1 node, 1 CPU".
>
> The crash happens in sd_init
> tl->mask (which is cpu_coregroup_mask) returns an empty mask. This happens
> because cpu_topology[0].core_sibling is empty.
> Consequently, sd_span is set to an empty mask
> sd_id = cpumask_first(sd_span) sets sd_id == NR_CPUS (because the mask is
> empty)
> sd->shared = *per_cpu_ptr(sdd->sds, sd_id); sets sd->shared to NULL
> because sd_id is out of range
> atomic_inc(&sd->shared->ref); crashes without printing anything
>
> We can fix it by calling reset_cpu_topology() from init_cpu_topology() -
> this will initialize the sibling masks on CPUs, so that they're not empty.
>
> This patch also removes the variable "dualcores_found", it is useless,
> because during boot, init_cpu_topology is called before
> store_cpu_topology. Thus, set_sched_topology(parisc_mc_topology) is never
> called. We don't need to call it at all because default_topology in
> kernel/sched/topology.c contains the same items as parisc_mc_topology.
>
> Note that we should not call store_cpu_topology() from init_per_cpu()
> because it is called too early in the kernel initialization process and it
> results in the message "Failure to register CPU0 device". Before this
> patch, store_cpu_topology() would exit immediately because
> cpuid_topo->core_id was uninitialized and it was 0.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org # v5.18

Thanks a lot !!!

It took me some time to test it, but it looks good and boots on
all of my machines so far. I was curious if 32-bit kernels still
work since that was one of the issues with the older patches...

With your patch we can drop the "config SCHED_MC" entry from
arch/parisc/Kconfig as well.
Will you respin, or should I simply add this to your patch?

Helge

>
> ---
>  arch/parisc/kernel/processor.c |    2 --
>  arch/parisc/kernel/topology.c  |   16 +---------------
>  2 files changed, 1 insertion(+), 17 deletions(-)
>
> Index: linux-2.6/arch/parisc/kernel/topology.c
> ===================================================================
> --- linux-2.6.orig/arch/parisc/kernel/topology.c	2022-06-01 15:32:59.000000000 +0200
> +++ linux-2.6/arch/parisc/kernel/topology.c	2022-06-01 18:37:37.000000000 +0200
> @@ -20,8 +20,6 @@
>
>  static DEFINE_PER_CPU(struct cpu, cpu_devices);
>
> -static int dualcores_found;
> -
>  /*
>   * store_cpu_topology is called at boot when only one cpu is running
>   * and with the mutex cpu_hotplug.lock locked, when several cpus have booted,
> @@ -60,7 +58,6 @@ void store_cpu_topology(unsigned int cpu
>  			if (p->cpu_loc) {
>  				cpuid_topo->core_id++;
>  				cpuid_topo->package_id = cpu_topology[cpu].package_id;
> -				dualcores_found = 1;
>  				continue;
>  			}
>  		}
> @@ -80,22 +77,11 @@ void store_cpu_topology(unsigned int cpu
>  		cpu_topology[cpuid].package_id);
>  }
>
> -static struct sched_domain_topology_level parisc_mc_topology[] = {
> -#ifdef CONFIG_SCHED_MC
> -	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
> -#endif
> -
> -	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> -	{ NULL, },
> -};
> -
>  /*
>   * init_cpu_topology is called at boot when only one cpu is running
>   * which prevent simultaneous write access to cpu_topology array
>   */
>  void __init init_cpu_topology(void)
>  {
> -	/* Set scheduler topology descriptor */
> -	if (dualcores_found)
> -		set_sched_topology(parisc_mc_topology);
> +	reset_cpu_topology();
>  }
> Index: linux-2.6/arch/parisc/kernel/processor.c
> ===================================================================
> --- linux-2.6.orig/arch/parisc/kernel/processor.c	2022-06-01 15:32:59.000000000 +0200
> +++ linux-2.6/arch/parisc/kernel/processor.c	2022-06-01 18:35:12.000000000 +0200
> @@ -327,8 +327,6 @@ int init_per_cpu(int cpunum)
>  	set_firmware_width();
>  	ret = pdc_coproc_cfg(&coproc_cfg);
>
> -	store_cpu_topology(cpunum);
> -
>  	if(ret >= 0 && coproc_cfg.ccr_functional) {
>  		mtctl(coproc_cfg.ccr_functional, 10);  /* 10 == Coprocessor Control Reg */
>
>
On Thu, 2 Jun 2022, Helge Deller wrote:

> Hi Mikulas,
>
> Thanks a lot !!!
>
> It took me some time to test it, but it looks good and boots on
> all of my machines so far. I was curious if 32-bit kernels still
> work since that was one of the issues with the older patches...
>
> With your patch we can drop the "config SCHED_MC" entry from
> arch/parisc/Kconfig as well.
> Will you respin, or should I simply add this to your patch?
>
> Helge

I think that we don't have to drop "config SCHED_MC". It is used in
kernel/sched/topology.c to select the multicore-aware scheduler. There is
no reason why the multicore scheduler would not work on parisc.

Mikulas
Index: linux-2.6/arch/parisc/kernel/topology.c
===================================================================
--- linux-2.6.orig/arch/parisc/kernel/topology.c	2022-06-01 15:32:59.000000000 +0200
+++ linux-2.6/arch/parisc/kernel/topology.c	2022-06-01 18:37:37.000000000 +0200
@@ -20,8 +20,6 @@
 
 static DEFINE_PER_CPU(struct cpu, cpu_devices);
 
-static int dualcores_found;
-
 /*
  * store_cpu_topology is called at boot when only one cpu is running
  * and with the mutex cpu_hotplug.lock locked, when several cpus have booted,
@@ -60,7 +58,6 @@ void store_cpu_topology(unsigned int cpu
 			if (p->cpu_loc) {
 				cpuid_topo->core_id++;
 				cpuid_topo->package_id = cpu_topology[cpu].package_id;
-				dualcores_found = 1;
 				continue;
 			}
 		}
@@ -80,22 +77,11 @@ void store_cpu_topology(unsigned int cpu
 		cpu_topology[cpuid].package_id);
 }
 
-static struct sched_domain_topology_level parisc_mc_topology[] = {
-#ifdef CONFIG_SCHED_MC
-	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
-#endif
-
-	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
-	{ NULL, },
-};
-
 /*
  * init_cpu_topology is called at boot when only one cpu is running
  * which prevent simultaneous write access to cpu_topology array
  */
 void __init init_cpu_topology(void)
 {
-	/* Set scheduler topology descriptor */
-	if (dualcores_found)
-		set_sched_topology(parisc_mc_topology);
+	reset_cpu_topology();
 }
Index: linux-2.6/arch/parisc/kernel/processor.c
===================================================================
--- linux-2.6.orig/arch/parisc/kernel/processor.c	2022-06-01 15:32:59.000000000 +0200
+++ linux-2.6/arch/parisc/kernel/processor.c	2022-06-01 18:35:12.000000000 +0200
@@ -327,8 +327,6 @@ int init_per_cpu(int cpunum)
 	set_firmware_width();
 	ret = pdc_coproc_cfg(&coproc_cfg);
 
-	store_cpu_topology(cpunum);
-
 	if(ret >= 0 && coproc_cfg.ccr_functional) {
 		mtctl(coproc_cfg.ccr_functional, 10);  /* 10 == Coprocessor Control Reg */
 
With the kernel 5.18, the system will hang on boot if it is compiled with
CONFIG_SCHED_MC. The last printed message is "Brought up 1 node, 1 CPU".

The crash happens in sd_init
tl->mask (which is cpu_coregroup_mask) returns an empty mask. This happens
because cpu_topology[0].core_sibling is empty.
Consequently, sd_span is set to an empty mask
sd_id = cpumask_first(sd_span) sets sd_id == NR_CPUS (because the mask is
empty)
sd->shared = *per_cpu_ptr(sdd->sds, sd_id); sets sd->shared to NULL
because sd_id is out of range
atomic_inc(&sd->shared->ref); crashes without printing anything

We can fix it by calling reset_cpu_topology() from init_cpu_topology() -
this will initialize the sibling masks on CPUs, so that they're not empty.

This patch also removes the variable "dualcores_found", it is useless,
because during boot, init_cpu_topology is called before
store_cpu_topology. Thus, set_sched_topology(parisc_mc_topology) is never
called. We don't need to call it at all because default_topology in
kernel/sched/topology.c contains the same items as parisc_mc_topology.

Note that we should not call store_cpu_topology() from init_per_cpu()
because it is called too early in the kernel initialization process and it
results in the message "Failure to register CPU0 device". Before this
patch, store_cpu_topology() would exit immediately because
cpuid_topo->core_id was uninitialized and it was 0.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v5.18

---
 arch/parisc/kernel/processor.c |    2 --
 arch/parisc/kernel/topology.c  |   16 +---------------
 2 files changed, 1 insertion(+), 17 deletions(-)
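
The failure chain described in the commit message (empty sibling mask, first-set-bit lookup returning NR_CPUS, out-of-range per-CPU lookup) can be illustrated with a small userspace model. This is only a sketch and not kernel code: every `model_*` name, the 4-CPU limit, and the plain-integer bitmask are assumptions made for illustration, and the reset helper assumes that initializing the sibling masks means each CPU is at least its own sibling. Where the kernel would dereference a bad pointer, the model returns NULL so the condition can be checked safely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS 4	/* hypothetical small limit for this model */

/* One bit per CPU; a stand-in for the kernel's struct cpumask. */
typedef unsigned int model_mask_t;

/* Hypothetical reduced topology record: only the sibling mask matters here. */
struct model_topology {
	model_mask_t core_sibling;
};

static struct model_topology model_cpu_topology[NR_CPUS];

/* Per-CPU slots that the scheduler would index with sd_id. */
static int model_shared[NR_CPUS];

/* Index of the lowest set bit, or NR_CPUS when the mask is empty. */
unsigned int model_mask_first(model_mask_t mask)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/* Assumed effect of the reset: every CPU becomes its own sibling. */
void model_reset_topology(void)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		model_cpu_topology[cpu].core_sibling = 1u << cpu;
}

/*
 * Look up the shared slot for a CPU's sibling span.
 * Returns NULL when the span is empty, i.e. when sd_id is out of range.
 */
int *model_lookup_shared(unsigned int cpu)
{
	unsigned int sd_id =
		model_mask_first(model_cpu_topology[cpu].core_sibling);

	if (sd_id >= NR_CPUS)
		return NULL;
	return &model_shared[sd_id];
}
```

With the masks left zeroed, `model_lookup_shared(0)` returns NULL, which corresponds to the state in which the real code crashes in atomic_inc(). After `model_reset_topology()`, the same call returns the slot for CPU 0.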