diff mbox series

[v2,06/16] x86/mce/amd: Prep DFR handler before enabling banks

Message ID 20240404151359.47970-7-yazen.ghannam@amd.com (mailing list archive)
State New, archived
Headers show
Series MCA Updates | expand

Commit Message

Yazen Ghannam April 4, 2024, 3:13 p.m. UTC
Scalable MCA systems use the per-bank MCA_CONFIG register to enable
deferred error interrupts. This is done as part of SMCA configuration.

Currently, the deferred error interrupt handler is set up after SMCA
configuration.

Move the deferred error interrupt handler set up before SMCA
configuration. This ensures the kernel is ready to receive the
interrupts before the hardware is configured to send them.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lkml.kernel.org/r/20231118193248.1296798-11-yazen.ghannam@amd.com
    
    v1->v2:
    * No change.

 arch/x86/kernel/cpu/mce/amd.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Borislav Petkov April 24, 2024, 6:34 p.m. UTC | #1
On Thu, Apr 04, 2024 at 10:13:49AM -0500, Yazen Ghannam wrote:
> Scalable MCA systems use the per-bank MCA_CONFIG register to enable
> deferred error interrupts. This is done as part of SMCA configuration.
> 
> Currently, the deferred error interrupt handler is set up after SMCA
> configuration.
> 
> Move the deferred error interrupt handler set up before SMCA
> configuration. This ensures the kernel is ready to receive the
> interrupts before the hardware is configured to send them.
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
> 
> Notes:
>     Link:
>     https://lkml.kernel.org/r/20231118193248.1296798-11-yazen.ghannam@amd.com
>     
>     v1->v2:
>     * No change.
> 
>  arch/x86/kernel/cpu/mce/amd.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 3093fed06194..e8e78d91082b 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -589,6 +589,9 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
>  	u32 low = 0, high = 0;
>  	int def_offset = -1, def_new;
>  
> +	if (!mce_flags.succor)

Does succor imply smca?

Because you have now this order:

	deferred_error_interrupt_enable(c);

	...

	configure_smca(bank);

and that one tests mce_flags.smca

Now, if succor didn't imply smca, we'll enable the DF irq handler for
no good reason on (succor=true && smca=false) systems.

If the implication is given:

Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Yazen Ghannam April 25, 2024, 1:31 p.m. UTC | #2
On 4/24/2024 2:34 PM, Borislav Petkov wrote:
> On Thu, Apr 04, 2024 at 10:13:49AM -0500, Yazen Ghannam wrote:
>> Scalable MCA systems use the per-bank MCA_CONFIG register to enable
>> deferred error interrupts. This is done as part of SMCA configuration.
>>
>> Currently, the deferred error interrupt handler is set up after SMCA
>> configuration.
>>
>> Move the deferred error interrupt handler set up before SMCA
>> configuration. This ensures the kernel is ready to receive the
>> interrupts before the hardware is configured to send them.
>>
>> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
>> ---
>>
>> Notes:
>>     Link:
>>     https://lkml.kernel.org/r/20231118193248.1296798-11-yazen.ghannam@amd.com
>>     
>>     v1->v2:
>>     * No change.
>>
>>  arch/x86/kernel/cpu/mce/amd.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
>> index 3093fed06194..e8e78d91082b 100644
>> --- a/arch/x86/kernel/cpu/mce/amd.c
>> +++ b/arch/x86/kernel/cpu/mce/amd.c
>> @@ -589,6 +589,9 @@ static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
>>  	u32 low = 0, high = 0;
>>  	int def_offset = -1, def_new;
>>  
>> +	if (!mce_flags.succor)
> 
> Does succor imply smca?
> 
> Because you have now this order:
> 
> 	deferred_error_interrupt_enable(c);
> 
> 	...
> 
> 	configure_smca(bank);
> 
> and that one tests mce_flags.smca
> 
> Now, if succor didn't imply smca, we'll enable the DF irq handler for
> no good reason on (succor=true && smca=false) systems.
> 
> If the implication is given:
> 
> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
> 

They are independent features. SUCCOR is the feature that defines the deferred
error interrupt and data poisoning. This predates SMCA. SUCCOR was introduced
in the later Family 15h systems.

Thanks,
Yazen
Borislav Petkov April 29, 2024, 12:38 p.m. UTC | #3
On Thu, Apr 25, 2024 at 09:31:58AM -0400, Yazen Ghannam wrote:
> They are independent features. SUCCOR is the feature that defines the deferred
> error interrupt and data poisoning. This predates SMCA. SUCCOR was introduced
> in the later Family 15h systems.

... and as we've established, it is not really enabled on them for
whatever reason so for simplicity's sake, we'll simply assume that it
was enabled together with SMCA and that would simplify a lot of things
in the code.

Please put that in the commit message so that we know.

Thx.
Yazen Ghannam April 29, 2024, 1:22 p.m. UTC | #4
On 4/29/2024 8:38 AM, Borislav Petkov wrote:
> On Thu, Apr 25, 2024 at 09:31:58AM -0400, Yazen Ghannam wrote:
>> They are independent features. SUCCOR is the feature that defines the deferred
>> error interrupt and data poisoning. This predates SMCA. SUCCOR was introduced
>> in the later Family 15h systems.
> 
> ... and as we've established, it is not really enabled on them for
> whatever reason so for simplicity's sake, we'll simply assume that it
> was enabled together with SMCA and that would simplify a lot of things
> in the code.
> 
> Please put that in the commit message so that we know.
> 
> Thx.
> 

Okay, will do.

Thanks,
Yazen
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 3093fed06194..e8e78d91082b 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -589,6 +589,9 @@  static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
 	u32 low = 0, high = 0;
 	int def_offset = -1, def_new;
 
+	if (!mce_flags.succor)
+		return;
+
 	if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
 		return;
 
@@ -768,6 +771,7 @@  void mce_amd_feature_init(struct cpuinfo_x86 *c)
 	u32 low = 0, high = 0, address = 0;
 	int offset = -1;
 
+	deferred_error_interrupt_enable(c);
 
 	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
 		if (mce_flags.smca)
@@ -794,9 +798,6 @@  void mce_amd_feature_init(struct cpuinfo_x86 *c)
 			offset = prepare_threshold_block(bank, block, address, offset, high);
 		}
 	}
-
-	if (mce_flags.succor)
-		deferred_error_interrupt_enable(c);
 }
 
 /*