
[v2] parisc: fix a crash with multicore scheduler

Message ID alpine.LRH.2.02.2206011316440.25830@file01.intranet.prod.int.rdu2.redhat.com (mailing list archive)
State Accepted, archived

Commit Message

Mikulas Patocka June 1, 2022, 5:18 p.m. UTC
With the kernel 5.18, the system will hang on boot if it is compiled with
CONFIG_SCHED_MC. The last printed message is "Brought up 1 node, 1 CPU".

The crash happens in sd_init:
tl->mask (which is cpu_coregroup_mask) returns an empty mask. This happens
	because cpu_topology[0].core_sibling is empty.
Consequently, sd_span is set to an empty mask
sd_id = cpumask_first(sd_span) sets sd_id == NR_CPUS (because the mask is
	empty)
sd->shared = *per_cpu_ptr(sdd->sds, sd_id); sets sd->shared to NULL
	because sd_id is out of range
atomic_inc(&sd->shared->ref); crashes without printing anything
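
The chain above can be reproduced in a small userspace model. This is only a
sketch, not kernel code: MODEL_NR_CPUS, model_cpumask, model_sds and the
model_* functions are hypothetical stand-ins for NR_CPUS, struct cpumask, the
per-CPU sdd->sds storage and the real helpers. It assumes that "first CPU" of
an empty mask yields the CPU count, as described in the list above.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4u            /* hypothetical stand-in for NR_CPUS */

typedef unsigned int model_cpumask; /* one bit per CPU, stand-in for struct cpumask */

struct model_sd_shared { int ref; };
static struct model_sd_shared model_sds[MODEL_NR_CPUS];

/* Models cpumask_first(): lowest set CPU, or the CPU count if the mask is empty. */
static unsigned int model_cpumask_first(model_cpumask mask)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return MODEL_NR_CPUS;
}

/* Bounds-checked lookup, added here only to make the bad index visible. */
static struct model_sd_shared *model_shared_for(unsigned int sd_id)
{
	return sd_id < MODEL_NR_CPUS ? &model_sds[sd_id] : NULL;
}

/* Walks the steps from the list above with an empty span. */
static void model_demo(void)
{
	model_cpumask sd_span = 0;                       /* empty core_sibling */
	unsigned int sd_id = model_cpumask_first(sd_span);

	assert(sd_id == MODEL_NR_CPUS);                  /* index is out of range */
	assert(model_shared_for(sd_id) == NULL);         /* nothing valid to dereference */
}
```

Calling model_demo() from a test program passes both assertions. In the model
the bad index is caught by the explicit bounds check; sd_init has no such
check, which is why the failure there shows up as a silent crash.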

We can fix it by calling reset_cpu_topology() from init_cpu_topology() -
this will initialize the sibling masks on CPUs, so that they're not empty.
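
As a rough illustration of why the reset helps, here is a userspace sketch of
what I understand reset_cpu_topology() to do: mark the IDs as unset and put
each CPU into its own sibling mask. The struct layout, the -1 sentinel and all
model_* names are assumptions made for this example, and the real
cpu_coregroup_mask() is more involved than the one-line accessor below.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4u            /* hypothetical stand-in for NR_CPUS */

struct model_cpu_topology {
	int core_id;
	int package_id;
	unsigned int core_sibling;  /* bitmask, one bit per CPU */
};

/* Static storage starts zeroed, so every core_sibling mask is empty. */
static struct model_cpu_topology model_topology[MODEL_NR_CPUS];

/* Model of the reset: IDs become "unset", each CPU is its own sibling. */
static void model_reset_cpu_topology(void)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		model_topology[cpu].core_id = -1;
		model_topology[cpu].package_id = -1;
		model_topology[cpu].core_sibling = 1u << cpu;
	}
}

/* Simplified stand-in for cpu_coregroup_mask(). */
static unsigned int model_coregroup_mask(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return model_topology[cpu].core_sibling;
}
```

Before model_reset_cpu_topology() runs, model_coregroup_mask(0) is 0 (empty).
Afterwards it contains at least CPU 0, so a span built from it is no longer
empty.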

This patch also removes the variable "dualcores_found". It is useless
because during boot, init_cpu_topology is called before
store_cpu_topology. Thus, set_sched_topology(parisc_mc_topology) is never
called. We don't need to call it at all because default_topology in
kernel/sched/topology.c contains the same items as parisc_mc_topology.

Note that we should not call store_cpu_topology() from init_per_cpu()
because it is called too early in the kernel initialization process and it
results in the message "Failure to register CPU0 device". Before this
patch, store_cpu_topology() would exit immediately because
cpuid_topo->core_id was uninitialized and it was 0.
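
The early exit described here can be sketched as follows. This is a userspace
model under the assumption that store_cpu_topology() treats core_id == -1 as
"not yet populated" and returns early for any other value; the sketch_* names
and the boolean return value are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 2u           /* hypothetical CPU count */

struct sketch_topology { int core_id; };

/* Static storage is zero-initialized, so core_id starts out as 0, not -1. */
static struct sketch_topology sketch_cpus[SKETCH_NR_CPUS];

/* Returns true if the topology was filled in, false on the early exit. */
static bool sketch_store_cpu_topology(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	if (sketch_cpus[cpu].core_id != -1)
		return false;           /* looks populated already, do nothing */
	sketch_cpus[cpu].core_id = 0;   /* placeholder for the real probing */
	return true;
}

/* Model of the reset: mark every CPU as not yet populated. */
static void sketch_reset(void)
{
	for (unsigned int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sketch_cpus[cpu].core_id = -1;
}
```

Without the reset, sketch_store_cpu_topology(0) returns false because the
zeroed core_id looks like a valid ID. After sketch_reset() the first call does
the work and later calls exit early, which is the intended behaviour of the
guard.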

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org	# v5.18

---
 arch/parisc/kernel/processor.c |    2 --
 arch/parisc/kernel/topology.c  |   16 +---------------
 2 files changed, 1 insertion(+), 17 deletions(-)

Comments

Helge Deller June 2, 2022, 8:52 p.m. UTC | #1
Hi Mikulas,

On 6/1/22 19:18, Mikulas Patocka wrote:
> With the kernel 5.18, the system will hang on boot if it is compiled with
> CONFIG_SCHED_MC. The last printed message is "Brought up 1 node, 1 CPU".
>
> The crash happens in sd_init
> tl->mask (which is cpu_coregroup_mask) returns an empty mask. This happens
> 	because cpu_topology[0].core_sibling is empty.
> Consequently, sd_span is set to an empty mask
> sd_id = cpumask_first(sd_span) sets sd_id == NR_CPUS (because the mask is
> 	empty)
> sd->shared = *per_cpu_ptr(sdd->sds, sd_id); sets sd->shared to NULL
> 	because sd_id is out of range
> atomic_inc(&sd->shared->ref); crashes without printing anything
>
> We can fix it by calling reset_cpu_topology() from init_cpu_topology() -
> this will initialize the sibling masks on CPUs, so that they're not empty.
>
> This patch also removes the variable "dualcores_found", it is useless,
> because during boot, init_cpu_topology is called before
> store_cpu_topology. Thus, set_sched_topology(parisc_mc_topology) is never
> called. We don't need to call it at all because default_topology in
> kernel/sched/topology.c contains the same items as parisc_mc_topology.
>
> Note that we should not call store_cpu_topology() from init_per_cpu()
> because it is called too early in the kernel initialization process and it
> results in the message "Failure to register CPU0 device". Before this
> patch, store_cpu_topology() would exit immediately because
> cpuid_topo->core_id was uninitialized and it was 0.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org	# v5.18

Thanks a lot !!!

It took me some time to test it, but it looks good and boots on
all of my machines so far. I was curious if 32-bit kernels still
work since that was one of the issues with the older patches...

With your patch we can drop the "config SCHED_MC" entry from
arch/parisc/Kconfig as well.
Will you respin, or should I simply add this to your patch?

Helge


>
> ---
>  arch/parisc/kernel/processor.c |    2 --
>  arch/parisc/kernel/topology.c  |   16 +---------------
>  2 files changed, 1 insertion(+), 17 deletions(-)
>
> Index: linux-2.6/arch/parisc/kernel/topology.c
> ===================================================================
> --- linux-2.6.orig/arch/parisc/kernel/topology.c	2022-06-01 15:32:59.000000000 +0200
> +++ linux-2.6/arch/parisc/kernel/topology.c	2022-06-01 18:37:37.000000000 +0200
> @@ -20,8 +20,6 @@
>
>  static DEFINE_PER_CPU(struct cpu, cpu_devices);
>
> -static int dualcores_found;
> -
>  /*
>   * store_cpu_topology is called at boot when only one cpu is running
>   * and with the mutex cpu_hotplug.lock locked, when several cpus have booted,
> @@ -60,7 +58,6 @@ void store_cpu_topology(unsigned int cpu
>  			if (p->cpu_loc) {
>  				cpuid_topo->core_id++;
>  				cpuid_topo->package_id = cpu_topology[cpu].package_id;
> -				dualcores_found = 1;
>  				continue;
>  			}
>  		}
> @@ -80,22 +77,11 @@ void store_cpu_topology(unsigned int cpu
>  		cpu_topology[cpuid].package_id);
>  }
>
> -static struct sched_domain_topology_level parisc_mc_topology[] = {
> -#ifdef CONFIG_SCHED_MC
> -	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
> -#endif
> -
> -	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> -	{ NULL, },
> -};
> -
>  /*
>   * init_cpu_topology is called at boot when only one cpu is running
>   * which prevent simultaneous write access to cpu_topology array
>   */
>  void __init init_cpu_topology(void)
>  {
> -	/* Set scheduler topology descriptor */
> -	if (dualcores_found)
> -		set_sched_topology(parisc_mc_topology);
> +	reset_cpu_topology();
>  }
> Index: linux-2.6/arch/parisc/kernel/processor.c
> ===================================================================
> --- linux-2.6.orig/arch/parisc/kernel/processor.c	2022-06-01 15:32:59.000000000 +0200
> +++ linux-2.6/arch/parisc/kernel/processor.c	2022-06-01 18:35:12.000000000 +0200
> @@ -327,8 +327,6 @@ int init_per_cpu(int cpunum)
>  	set_firmware_width();
>  	ret = pdc_coproc_cfg(&coproc_cfg);
>
> -	store_cpu_topology(cpunum);
> -
>  	if(ret >= 0 && coproc_cfg.ccr_functional) {
>  		mtctl(coproc_cfg.ccr_functional, 10);  /* 10 == Coprocessor Control Reg */
>
>
Mikulas Patocka June 3, 2022, 7:40 a.m. UTC | #2
On Thu, 2 Jun 2022, Helge Deller wrote:

> Hi Mikulas,
> 
> Thanks a lot !!!
> 
> It took me some time to test it, but it looks good and boots on
> all of my machines so far. I was curious if 32-bit kernels still
> work since that was one of the issues with the older patches...
> 
> With your patch we can drop the "config SCHED_MC" entry from
> arch/parisc/Kconfig as well.
> Will you respin, or should I simply add this to your patch?
> 
> Helge

I think that we don't have to drop "config SCHED_MC". It is used in 
kernel/sched/topology.c to select the multicore-aware scheduler. There is 
no reason why the multicore scheduler would not work on parisc.

Mikulas

Patch

Index: linux-2.6/arch/parisc/kernel/topology.c
===================================================================
--- linux-2.6.orig/arch/parisc/kernel/topology.c	2022-06-01 15:32:59.000000000 +0200
+++ linux-2.6/arch/parisc/kernel/topology.c	2022-06-01 18:37:37.000000000 +0200
@@ -20,8 +20,6 @@ 
 
 static DEFINE_PER_CPU(struct cpu, cpu_devices);
 
-static int dualcores_found;
-
 /*
  * store_cpu_topology is called at boot when only one cpu is running
  * and with the mutex cpu_hotplug.lock locked, when several cpus have booted,
@@ -60,7 +58,6 @@  void store_cpu_topology(unsigned int cpu
 			if (p->cpu_loc) {
 				cpuid_topo->core_id++;
 				cpuid_topo->package_id = cpu_topology[cpu].package_id;
-				dualcores_found = 1;
 				continue;
 			}
 		}
@@ -80,22 +77,11 @@  void store_cpu_topology(unsigned int cpu
 		cpu_topology[cpuid].package_id);
 }
 
-static struct sched_domain_topology_level parisc_mc_topology[] = {
-#ifdef CONFIG_SCHED_MC
-	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
-#endif
-
-	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
-	{ NULL, },
-};
-
 /*
  * init_cpu_topology is called at boot when only one cpu is running
  * which prevent simultaneous write access to cpu_topology array
  */
 void __init init_cpu_topology(void)
 {
-	/* Set scheduler topology descriptor */
-	if (dualcores_found)
-		set_sched_topology(parisc_mc_topology);
+	reset_cpu_topology();
 }
Index: linux-2.6/arch/parisc/kernel/processor.c
===================================================================
--- linux-2.6.orig/arch/parisc/kernel/processor.c	2022-06-01 15:32:59.000000000 +0200
+++ linux-2.6/arch/parisc/kernel/processor.c	2022-06-01 18:35:12.000000000 +0200
@@ -327,8 +327,6 @@  int init_per_cpu(int cpunum)
 	set_firmware_width();
 	ret = pdc_coproc_cfg(&coproc_cfg);
 
-	store_cpu_topology(cpunum);
-
 	if(ret >= 0 && coproc_cfg.ccr_functional) {
 		mtctl(coproc_cfg.ccr_functional, 10);  /* 10 == Coprocessor Control Reg */