[RFC,v2,0/2] CPU offlining with non-core MCA banks

Message ID: 20240829223225.223639-1-yazen.ghannam@amd.com

Yazen Ghannam Aug. 29, 2024, 10:32 p.m. UTC
Hi all,

The major change in this revision is to prevent the sysfs "online"
attribute (cpuX/online) from being created in the first place for CPUs
that shouldn't be offlined. This is a more direct way to prevent users
from taking these CPUs offline, and it shouldn't affect the kernel's
internal hotplug flows.
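For illustration only, here is a userspace sketch of how a tool might check whether a CPU exposes the "online" attribute before trying to offline it. It assumes the behavior described above (no cpuX/online file for CPUs that must stay online). The helper name cpu_online_attr_present() is hypothetical, and a missing file by itself does not say why the kernel omitted it.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: report whether logical CPU 'cpu' exposes the
 * per-CPU "online" attribute that userspace writes to offline it.
 * Assumption (from this series): the attribute is not created for
 * CPUs that must stay online. 'sysfs_cpu_dir' is normally
 * "/sys/devices/system/cpu"; taking it as a parameter allows testing
 * against a mock directory tree.
 */
bool cpu_online_attr_present(const char *sysfs_cpu_dir, unsigned int cpu)
{
	char path[256];
	struct stat st;
	int n = snprintf(path, sizeof(path), "%s/cpu%u/online",
			 sysfs_cpu_dir, cpu);

	/* Treat encoding errors or a truncated path as "not present". */
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;

	/* sysfs attributes appear as regular files. */
	return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}
```

On a live system, a caller would pass "/sys/devices/system/cpu" and skip the offline request when the helper returns false, rather than relying on a failed write.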

Also, I've changed this set to RFC, because there are still open questions
about how to address this use case. Here are just a couple to start...

1) What if a user wants to offline a CPU, and they don't know or care
   about this restriction?

   Should this behavior be controlled by a kernel parameter? In this
   way, a system admin can enforce this policy without affecting the
   general user base.

2) Should this use case be generalized and indicated by the platform?
   Maybe a new flag in the ACPI MADT Processor Local APIC Structure?
   This would be set by firmware to inform the OS to not allow a logical
   CPU to be taken offline. Again, this could be enforced by a system
   admin by changing system BIOS/firmware settings.
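The firmware flag proposed in item 2 could be pictured as one more bit in the 32-bit Flags field of the Processor Local APIC Structure. In ACPI 6.3 and later, bit 0 is Enabled, bit 1 is Online Capable, and the remaining bits are reserved. The MADT_LAPIC_NO_OFFLINE_HYP bit and the madt_cpu_may_offline() helper in this sketch are purely hypothetical placeholders for such a flag; neither is defined by the specification or the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Defined flag bits of the MADT Processor Local APIC Structure. */
#define MADT_LAPIC_ENABLED         (1u << 0)
/* Meaningful only when Enabled is clear: the CPU can be onlined later. */
#define MADT_LAPIC_ONLINE_CAPABLE  (1u << 1)
/*
 * HYPOTHETICAL: bits 2-31 are reserved today. This bit only illustrates
 * a firmware-set "do not allow this CPU to be taken offline" flag.
 */
#define MADT_LAPIC_NO_OFFLINE_HYP  (1u << 2)

/* Hypothetical policy check an OS could apply per logical CPU. */
bool madt_cpu_may_offline(uint32_t flags)
{
	/* A CPU that is not enabled is not online, so there is nothing to offline. */
	if (!(flags & MADT_LAPIC_ENABLED))
		return false;

	/* Honor the hypothetical firmware restriction. */
	return !(flags & MADT_LAPIC_NO_OFFLINE_HYP);
}
```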

Thanks,
Yazen

Link:
https://lkml.kernel.org/r/20240821140017.330105-1-yazen.ghannam@amd.com/

v1->v2:
* Change to RFC.
* Include new patch to adjust the number of MCA banks.
* Change solution to prevent the creation of "cpuX/online".

Yazen Ghannam (2):
  x86/mce: Set a more accurate value for mce_num_banks
  x86/mce: Prevent CPU offline for SMCA CPUs with non-core banks

 arch/x86/include/asm/mce.h     |  2 ++
 arch/x86/kernel/cpu/mce/core.c | 22 +++++++++++++++++++++-
 arch/x86/kernel/setup.c        |  2 +-
 3 files changed, 24 insertions(+), 2 deletions(-)


base-commit: 793aa4bf192d0ad07cca001a596f955d121f5c10