diff mbox series

[RFC,v2,05/35] drivers: base: Print a warning instead of panic() when register_cpu() fails

Message ID 20230913163823.7880-6-james.morse@arm.com (mailing list archive)
State Handled Elsewhere
Series ACPI/arm64: add support for virtual cpuhotplug

Checks

Context                                   Check    Description
conchuod/cover_letter                     success  Series has a cover letter
conchuod/tree_selection                   success  Guessed tree name to be for-next at HEAD 0bb80ecc33a8
conchuod/fixes_present                    success  Fixes tag not required for -next series
conchuod/maintainers_pattern              success  MAINTAINERS pattern errors before the patch: 5 and now 5
conchuod/verify_signedoff                 success  Signed-off-by tag matches author and committer
conchuod/kdoc                             success  Errors and warnings before: 0 this patch: 0
conchuod/build_rv64_clang_allmodconfig    success  Errors and warnings before: 9 this patch: 9
conchuod/module_param                     success  Was 0 now: 0
conchuod/build_rv64_gcc_allmodconfig      success  Errors and warnings before: 9 this patch: 9
conchuod/build_rv32_defconfig             success  Build OK
conchuod/dtb_warn_rv64                    success  Errors and warnings before: 25 this patch: 25
conchuod/header_inline                    success  No static functions without inline keyword in header files
conchuod/checkpatch                       success  total: 0 errors, 0 warnings, 0 checks, 18 lines checked
conchuod/build_rv64_nommu_k210_defconfig  success  Build OK
conchuod/verify_fixes                     success  No Fixes tag
conchuod/build_rv64_nommu_virt_defconfig  success  Build OK

Commit Message

James Morse Sept. 13, 2023, 4:37 p.m. UTC
loongarch, mips, parisc, riscv and sh all print a warning if
register_cpu() returns an error. Architectures that use
GENERIC_CPU_DEVICES call panic() instead.

Errors in this path indicate something is wrong with the firmware
description of the platform, but the kernel is able to keep running.

Downgrade this to a warning to make it easier to debug this issue.

This will allow architectures that are switching over to GENERIC_CPU_DEVICES
to drop their warning, but keep the existing behaviour.

Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/base/cpu.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Russell King (Oracle) Sept. 14, 2023, 9:52 a.m. UTC | #1
On Wed, Sep 13, 2023 at 04:37:53PM +0000, James Morse wrote:
> loongarch, mips, parisc, riscv and sh all print a warning if
> register_cpu() returns an error. Architectures that use
> GENERIC_CPU_DEVICES call panic() instead.
> 
> Errors in this path indicate something is wrong with the firmware
> description of the platform, but the kernel is able to keep running.
> 
> Downgrade this to a warning to make it easier to debug this issue.
> 
> This will allow architectures that are switching over to GENERIC_CPU_DEVICES
> to drop their warning, but keep the existing behaviour.
> 
> Signed-off-by: James Morse <james.morse@arm.com>

Assuming other architectures do something similar to x86 (which only returns
the error code from register_cpu()), the only error that could occur here
is a device_register() failure, which would be catastrophic, and I
suspect the system would fail to boot anyway.

Downgrading the panic to a warning at least gives us a chance that
the system may come up sufficiently to examine what happened, so I
think this makes sense:

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Gavin Shan Sept. 18, 2023, 3:33 a.m. UTC | #2
On 9/14/23 02:37, James Morse wrote:
> loongarch, mips, parisc, riscv and sh all print a warning if
> register_cpu() returns an error. Architectures that use
> GENERIC_CPU_DEVICES call panic() instead.
> 
> Errors in this path indicate something is wrong with the firmware
> description of the platform, but the kernel is able to keep running.
> 
> Downgrade this to a warning to make it easier to debug this issue.
> 
> This will allow architectures that are switching over to GENERIC_CPU_DEVICES
> to drop their warning, but keep the existing behaviour.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
>   drivers/base/cpu.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index 579064fda97b..d31c936f0955 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -535,14 +535,15 @@ int __weak arch_register_cpu(int cpu)
>   
>   static void __init cpu_dev_register_generic(void)
>   {
> -	int i;
> +	int i, ret;
>   
>   	if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
>   		return;
>   
>   	for_each_present_cpu(i) {
> -		if (arch_register_cpu(i))
> -			panic("Failed to register CPU device");
> +		ret = arch_register_cpu(i);
> +		if (ret)
> +			pr_warn("register_cpu %d failed (%d)\n", i, ret);
>   	}
>   }
>   

The same warning message is already printed by arch/loongarch/kernel/topology.c::arch_register_cpu().
To avoid the duplication, I think the warning message in arch/loongarch needs to be dropped?

Thanks,
Gavin

Russell King (Oracle) Oct. 20, 2023, 11:16 a.m. UTC | #3
On Mon, Sep 18, 2023 at 01:33:37PM +1000, Gavin Shan wrote:
> 
> 
> On 9/14/23 02:37, James Morse wrote:
> > loongarch, mips, parisc, riscv and sh all print a warning if
> > register_cpu() returns an error. Architectures that use
> > GENERIC_CPU_DEVICES call panic() instead.
> > 
> > Errors in this path indicate something is wrong with the firmware
> > description of the platform, but the kernel is able to keep running.
> > 
> > Downgrade this to a warning to make it easier to debug this issue.
> > 
> > This will allow architectures that are switching over to GENERIC_CPU_DEVICES
> > to drop their warning, but keep the existing behaviour.
> > 
> > Signed-off-by: James Morse <james.morse@arm.com>
> > ---
> >   drivers/base/cpu.c | 7 ++++---
> >   1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> > index 579064fda97b..d31c936f0955 100644
> > --- a/drivers/base/cpu.c
> > +++ b/drivers/base/cpu.c
> > @@ -535,14 +535,15 @@ int __weak arch_register_cpu(int cpu)
> >   static void __init cpu_dev_register_generic(void)
> >   {
> > -	int i;
> > +	int i, ret;
> >   	if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
> >   		return;
> >   	for_each_present_cpu(i) {
> > -		if (arch_register_cpu(i))
> > -			panic("Failed to register CPU device");
> > +		ret = arch_register_cpu(i);
> > +		if (ret)
> > +			pr_warn("register_cpu %d failed (%d)\n", i, ret);
> >   	}
> >   }
> 
> The same warning message is already printed by arch/loongarch/kernel/topology.c::arch_register_cpu().
> To avoid the duplication, I think the warning message in arch/loongarch needs to be dropped?

No, it doesn't, as far as LoongArch is concerned. Given where this change
sits in the series, the patch is correct as it stands.

The reason is that this code path can only be reached when
CONFIG_GENERIC_CPU_DEVICES is set, which is something the arch has to
select. LoongArch doesn't select that until patch 9 in the series,
"LoongArch: Switch over to GENERIC_CPU_DEVICES", and that patch is
where the warning message in arch/loongarch is removed.
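
[Editorial illustration, not part of the thread] The gating described above can be sketched as a small user-space model of the patched loop. This is only a sketch under stated assumptions: generic_cpu_devices, fake_arch_register_cpu() and register_present_cpus() are hypothetical stand-ins for IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES), arch_register_cpu() and cpu_dev_register_generic(), and the failing CPU number and error value are made up. It shows that the generic warning can only be printed when the option is enabled, and that one failure no longer stops the remaining CPUs from being registered.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES). */
static bool generic_cpu_devices = true;

/* Hypothetical stand-in for arch_register_cpu(): CPU 2 fails with -19. */
static int fake_arch_register_cpu(int cpu)
{
	return cpu == 2 ? -19 : 0;
}

/*
 * Models the patched loop: warn and carry on instead of panicking.
 * Returns the number of CPUs that failed to register.
 */
static int register_present_cpus(int nr_cpus)
{
	int i, ret, failures = 0;

	/* Arch has not selected the option: the generic loop never runs. */
	if (!generic_cpu_devices)
		return 0;

	for (i = 0; i < nr_cpus; i++) {
		ret = fake_arch_register_cpu(i);
		if (ret) {
			fprintf(stderr, "register_cpu %d failed (%d)\n", i, ret);
			failures++;
		}
	}

	return failures;
}
```

Calling register_present_cpus(4) with the flag set reports the single failing CPU and still visits CPU 3. With the flag cleared, which models an architecture that has not yet selected the option, the function returns before any warning can be printed, so only the arch's own message would appear.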

Patch

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 579064fda97b..d31c936f0955 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -535,14 +535,15 @@  int __weak arch_register_cpu(int cpu)
 
 static void __init cpu_dev_register_generic(void)
 {
-	int i;
+	int i, ret;
 
 	if (!IS_ENABLED(CONFIG_GENERIC_CPU_DEVICES))
 		return;
 
 	for_each_present_cpu(i) {
-		if (arch_register_cpu(i))
-			panic("Failed to register CPU device");
+		ret = arch_register_cpu(i);
+		if (ret)
+			pr_warn("register_cpu %d failed (%d)\n", i, ret);
 	}
 }