diff mbox series

x86/MCE: Prevent CPU offline for SMCA CPUs with non-core banks

Message ID 20240821140017.330105-1-yazen.ghannam@amd.com (mailing list archive)
State New
Headers show
Series x86/MCE: Prevent CPU offline for SMCA CPUs with non-core banks | expand

Commit Message

Yazen Ghannam Aug. 21, 2024, 2 p.m. UTC
Logical CPUs in AMD Scalable MCA (SMCA) systems can manage non-core
banks. Each of these banks represents unique and separate hardware
located within the system. Each bank is managed by a single logical CPU;
they are not shared. Furthermore, the "CPU to MCA bank" assignment
cannot be modified at run time.

The MCE subsystem supports run time CPU hotplug. Many vendors have
non-core MCA banks, so MCA settings are not cleared when a CPU is
offlined for these vendors.

Even though the non-core MCA banks remain enabled, MCA errors will not
be handled (reported, cleared, etc.) on SMCA systems when the managing
CPU is offline.

Check if a CPU manages non-core MCA banks and, if so, prevent it from
being taken offline.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 arch/x86/kernel/cpu/mce/core.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

Comments

Boris Ostrovsky Aug. 21, 2024, 6:35 p.m. UTC | #1
On 8/21/24 10:00 AM, Yazen Ghannam wrote:

>   
> +	if (!mce_cpu_is_hotpluggable()) {
> +		pr_info("CPU%d is not hotpluggable\n", cpu);

Can this error message be made a little more informative, e.g. that it 
is emitted from MCA code? And is info the right level? I think notice 
would be more appropriate.


-boris
Yazen Ghannam Aug. 22, 2024, 2:14 p.m. UTC | #2
On Wed, Aug 21, 2024 at 02:35:23PM -0400, boris.ostrovsky@oracle.com wrote:
> 
> 
> On 8/21/24 10:00 AM, Yazen Ghannam wrote:
> 
> > +	if (!mce_cpu_is_hotpluggable()) {
> > +		pr_info("CPU%d is not hotpluggable\n", cpu);
> 
> Can this error message be made a little more informative, e.g. that it is
> emitted from MCA code? And is info the right level? I think notice would be
> more appropriate.
>

Sure thing. I can change the level to notice.

The MCE subsystem has a global prefix set already by pr_fmt. This is
defined in arch/x86/kernel/cpu/mce/internal.h.

Thanks,
Yazen
Thomas Gleixner Aug. 25, 2024, 11:16 a.m. UTC | #3
On Wed, Aug 21 2024 at 09:00, Yazen Ghannam wrote:
> Logical CPUs in AMD Scalable MCA (SMCA) systems can manage non-core
> banks. Each of these banks represents unique and separate hardware
> located within the system. Each bank is managed by a single logical CPU;
> they are not shared. Furthermore, the "CPU to MCA bank" assignment
> cannot be modified at run time.
>
> The MCE subsystem supports run time CPU hotplug. Many vendors have
> non-core MCA banks, so MCA settings are not cleared when a CPU is
> offlined for these vendors.
>
> Even though the non-core MCA banks remain enabled, MCA errors will not
> be handled (reported, cleared, etc.) on SMCA systems when the managing
> CPU is offline.
>
> Check if a CPU manages non-core MCA banks and, if so, prevent it from
> being taken offline.

Which in turn breaks hibernation and kexec...

Thanks,

        tglx
Yazen Ghannam Aug. 26, 2024, 1:20 p.m. UTC | #4
On Sun, Aug 25, 2024 at 01:16:37PM +0200, Thomas Gleixner wrote:
> On Wed, Aug 21 2024 at 09:00, Yazen Ghannam wrote:
> > Logical CPUs in AMD Scalable MCA (SMCA) systems can manage non-core
> > banks. Each of these banks represents unique and separate hardware
> > located within the system. Each bank is managed by a single logical CPU;
> > they are not shared. Furthermore, the "CPU to MCA bank" assignment
> > cannot be modified at run time.
> >
> > The MCE subsystem supports run time CPU hotplug. Many vendors have
> > non-core MCA banks, so MCA settings are not cleared when a CPU is
> > offlined for these vendors.
> >
> > Even though the non-core MCA banks remain enabled, MCA errors will not
> > be handled (reported, cleared, etc.) on SMCA systems when the managing
> > CPU is offline.
> >
> > Check if a CPU manages non-core MCA banks and, if so, prevent it from
> > being taken offline.
> 
> Which in turn breaks hibernation and kexec...
>

Right, good point.

Maybe this change can apply only to a user-initiated (sysfs) case?

Thanks,
Yazen
Borislav Petkov Aug. 27, 2024, 12:50 p.m. UTC | #5
On August 26, 2024 3:20:57 PM GMT+02:00, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
>On Sun, Aug 25, 2024 at 01:16:37PM +0200, Thomas Gleixner wrote:
>> On Wed, Aug 21 2024 at 09:00, Yazen Ghannam wrote:
>> > Logical CPUs in AMD Scalable MCA (SMCA) systems can manage non-core
>> > banks. Each of these banks represents unique and separate hardware
>> > located within the system. Each bank is managed by a single logical CPU;
>> > they are not shared. Furthermore, the "CPU to MCA bank" assignment
>> > cannot be modified at run time.
>> >
>> > The MCE subsystem supports run time CPU hotplug. Many vendors have
>> > non-core MCA banks, so MCA settings are not cleared when a CPU is
>> > offlined for these vendors.
>> >
>> > Even though the non-core MCA banks remain enabled, MCA errors will not
>> > be handled (reported, cleared, etc.) on SMCA systems when the managing
>> > CPU is offline.
>> >
>> > Check if a CPU manages non-core MCA banks and, if so, prevent it from
>> > being taken offline.
>> 
>> Which in turn breaks hibernation and kexec...
>>
>
>Right, good point.
>
>Maybe this change can apply only to a user-initiated (sysfs) case?
>
>Thanks,
>Yazen
>

Or, you can simply say that the MCE cannot be processed because the user took the managing CPU offline. 

What is this actually really fixing anyway?
Yazen Ghannam Aug. 27, 2024, 1:47 p.m. UTC | #6
On Tue, Aug 27, 2024 at 02:50:40PM +0200, Borislav Petkov wrote:
> On August 26, 2024 3:20:57 PM GMT+02:00, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
> >On Sun, Aug 25, 2024 at 01:16:37PM +0200, Thomas Gleixner wrote:
> >> On Wed, Aug 21 2024 at 09:00, Yazen Ghannam wrote:
> >> > Logical CPUs in AMD Scalable MCA (SMCA) systems can manage non-core
> >> > banks. Each of these banks represents unique and separate hardware
> >> > located within the system. Each bank is managed by a single logical CPU;
> >> > they are not shared. Furthermore, the "CPU to MCA bank" assignment
> >> > cannot be modified at run time.
> >> >
> >> > The MCE subsystem supports run time CPU hotplug. Many vendors have
> >> > non-core MCA banks, so MCA settings are not cleared when a CPU is
> >> > offlined for these vendors.
> >> >
> >> > Even though the non-core MCA banks remain enabled, MCA errors will not
> >> > be handled (reported, cleared, etc.) on SMCA systems when the managing
> >> > CPU is offline.
> >> >
> >> > Check if a CPU manages non-core MCA banks and, if so, prevent it from
> >> > being taken offline.
> >> 
> >> Which in turn breaks hibernation and kexec...
> >>
> >
> >Right, good point.
> >
> >Maybe this change can apply only to a user-initiated (sysfs) case?
> >
> >Thanks,
> >Yazen
> >
> 
> Or, you can simply say that the MCE cannot be processed because the user took the managing CPU offline. 
>

I found that we can not populate the "cpuN/online" file. This would
prevent a user from offlining a CPU, but it shouldn't prevent the system
from doing what it needs.

This is already done for CPU0, and other cases I think.

> What is this actually really fixing anyway?

There are times where a user wants to take CPUs offline due to software
licensing. And this would prevent the user from unintentionally
offlining CPUs that would affect MCA handling.

Thanks,
Yazen
Borislav Petkov Aug. 29, 2024, 8:39 a.m. UTC | #7
On August 27, 2024 3:47:06 PM GMT+02:00, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
>On Tue, Aug 27, 2024 at 02:50:40PM +0200, Borislav Petkov wrote:
>> On August 26, 2024 3:20:57 PM GMT+02:00, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
>> >On Sun, Aug 25, 2024 at 01:16:37PM +0200, Thomas Gleixner wrote:
>> >> On Wed, Aug 21 2024 at 09:00, Yazen Ghannam wrote:
>> >> > Logical CPUs in AMD Scalable MCA (SMCA) systems can manage non-core
>> >> > banks. Each of these banks represents unique and separate hardware
>> >> > located within the system. Each bank is managed by a single logical CPU;
>> >> > they are not shared. Furthermore, the "CPU to MCA bank" assignment
>> >> > cannot be modified at run time.
>> >> >
>> >> > The MCE subsystem supports run time CPU hotplug. Many vendors have
>> >> > non-core MCA banks, so MCA settings are not cleared when a CPU is
>> >> > offlined for these vendors.
>> >> >
>> >> > Even though the non-core MCA banks remain enabled, MCA errors will not
>> >> > be handled (reported, cleared, etc.) on SMCA systems when the managing
>> >> > CPU is offline.
>> >> >
>> >> > Check if a CPU manages non-core MCA banks and, if so, prevent it from
>> >> > being taken offline.
>> >> 
>> >> Which in turn breaks hibernation and kexec...
>> >>
>> >
>> >Right, good point.
>> >
>> >Maybe this change can apply only to a user-initiated (sysfs) case?
>> >
>> >Thanks,
>> >Yazen
>> >
>> 
>> Or, you can simply say that the MCE cannot be processed because the user took the managing CPU offline. 
>>
>
>I found that we can not populate the "cpuN/online" file. This would
>prevent a user from offlining a CPU, but it shouldn't prevent the system
>from doing what it needs.
>
>This is already done for CPU0, and other cases I think.
>
>> What is this actually really fixing anyway?
>
>There are times where a user wants to take CPUs offline due to software
>licensing. And this would prevent the user from unintentionally
>offlining CPUs that would affect MCA handling.
>
>Thanks,
>Yazen

If the user offlines CPUs and some MCEs cannot be handled as a result, then that's her/his problem, no?

- Why does it hurt when I do this? 
- Well, don't do that then.
Yazen Ghannam Aug. 29, 2024, 2:03 p.m. UTC | #8
On Thu, Aug 29, 2024 at 10:39:41AM +0200, Borislav Petkov wrote:
> On August 27, 2024 3:47:06 PM GMT+02:00, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
> >On Tue, Aug 27, 2024 at 02:50:40PM +0200, Borislav Petkov wrote:
> >> On August 26, 2024 3:20:57 PM GMT+02:00, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
> >> >On Sun, Aug 25, 2024 at 01:16:37PM +0200, Thomas Gleixner wrote:
> >> >> On Wed, Aug 21 2024 at 09:00, Yazen Ghannam wrote:
> >> >> > Logical CPUs in AMD Scalable MCA (SMCA) systems can manage non-core
> >> >> > banks. Each of these banks represents unique and separate hardware
> >> >> > located within the system. Each bank is managed by a single logical CPU;
> >> >> > they are not shared. Furthermore, the "CPU to MCA bank" assignment
> >> >> > cannot be modified at run time.
> >> >> >
> >> >> > The MCE subsystem supports run time CPU hotplug. Many vendors have
> >> >> > non-core MCA banks, so MCA settings are not cleared when a CPU is
> >> >> > offlined for these vendors.
> >> >> >
> >> >> > Even though the non-core MCA banks remain enabled, MCA errors will not
> >> >> > be handled (reported, cleared, etc.) on SMCA systems when the managing
> >> >> > CPU is offline.
> >> >> >
> >> >> > Check if a CPU manages non-core MCA banks and, if so, prevent it from
> >> >> > being taken offline.
> >> >> 
> >> >> Which in turn breaks hibernation and kexec...
> >> >>
> >> >
> >> >Right, good point.
> >> >
> >> >Maybe this change can apply only to a user-initiated (sysfs) case?
> >> >
> >> >Thanks,
> >> >Yazen
> >> >
> >> 
> >> Or, you can simply say that the MCE cannot be processed because the user took the managing CPU offline. 
> >>
> >
> >I found that we can not populate the "cpuN/online" file. This would
> >prevent a user from offlining a CPU, but it shouldn't prevent the system
> >from doing what it needs.
> >
> >This is already done for CPU0, and other cases I think.
> >
> >> What is this actually really fixing anyway?
> >
> >There are times where a user wants to take CPUs offline due to software
> >licensing. And this would prevent the user from unintentionally
> >offlining CPUs that would affect MCA handling.
> >
> >Thanks,
> >Yazen
> 
> If the user offlines CPUs and some MCEs cannot be handled as a result, then that's her/his problem, no?
> 
> - Why does it hurt when I do this? 
> - Well, don't do that then.
> -- 

Right, that was our initial feedback.

But there was an ask to have the kernel manage this.

Do you think we should we continue to pursue this or no?

Thanks,
Yazen
Borislav Petkov Aug. 29, 2024, 2:14 p.m. UTC | #9
On Thu, Aug 29, 2024 at 10:03:05AM -0400, Yazen Ghannam wrote:
> Do you think we should we continue to pursue this or no?

You mean the kernel should prevent those folks from shooting themselves in the
foot?

How would that patch look like?
Yazen Ghannam Aug. 29, 2024, 2:18 p.m. UTC | #10
On Thu, Aug 29, 2024 at 04:14:15PM +0200, Borislav Petkov wrote:
> On Thu, Aug 29, 2024 at 10:03:05AM -0400, Yazen Ghannam wrote:
> > Do you think we should we continue to pursue this or no?
> 
> You mean the kernel should prevent those folks from shooting themselves in the
> foot?
> 
> How would that patch look like?
>

Right, I'm working on another revision. I'll try to send it today.

The gist is that we can hide the sysfs interface for CPUs that shouldn't
be hotplugged. We already do this today for other special cases like
CPU0.

Thanks,
Yazen
diff mbox series

Patch

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2a938f429c4d..cf1529d0e6b1 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2770,10 +2770,34 @@  static int mce_cpu_online(unsigned int cpu)
 	return 0;
 }
 
+static bool mce_cpu_is_hotpluggable(void)
+{
+	if (!mce_flags.smca)
+		return true;
+
+	/*
+	 * SMCA systems use banks 0-6 for core units. Banks 7 and later are
+	 * used for non-core units.
+	 *
+	 * Logical CPUs with 7 or fewer banks can be offlined, since they are not
+	 * managing any non-core units.
+	 *
+	 * Check if non-core banks are enabled using MCG_CTL. The hardware may
+	 * report MCG_CAP[Count] greater than is actually present, so it is not a
+	 * good indicator that a CPU has non-core banks.
+	 */
+	return fls_long(mce_rdmsrl(MSR_IA32_MCG_CTL)) <= 7;
+}
+
 static int mce_cpu_pre_down(unsigned int cpu)
 {
 	struct timer_list *t = this_cpu_ptr(&mce_timer);
 
+	if (!mce_cpu_is_hotpluggable()) {
+		pr_info("CPU%d is not hotpluggable\n", cpu);
+		return -EOPNOTSUPP;
+	}
+
 	mce_disable_cpu();
 	del_timer_sync(t);
 	mce_threshold_remove_device(cpu);