[v2,00/16] MCA Updates

Message ID 20240404151359.47970-1-yazen.ghannam@amd.com (mailing list archive)

Yazen Ghannam April 4, 2024, 3:13 p.m. UTC
Hi all,

This set is a collection of logically independent updates that touch
common code. I've collected them into one series to resolve conflicts
and ordering between them. This is also the first half of a larger set.
The second half focuses on refactoring the AMD MCA Thresholding feature
support, so I've left it out for now. It will also include AMD interrupt
storm handling support on top of the refactored code. Please see the
link below for a work-in-progress branch with the remaining changes.

Patches 1-2 deal with BERT MCA decode and preemption.

Patches 3-8 are general refactoring in preparation for later patches in
this set and the second planned set. The overall theme is to simplify
the AMD MCA init flow and to remove unnecessary data caching in per-CPU
variables. The init flow refactor will be completed in the second patch
set, since much of the cached data is used to set up MCA Thresholding.

Patches 9-10 unify the AMD THR and DFR interrupt handlers with MCA
polling.

Patch 11 is a small cleanup for the MCA Thresholding init path.

Patch 12 adds support for a new Corrected Error Interrupt on Scalable
MCA systems.

Patches 13-16 add support for new Scalable MCA registers and the FRU
Text decoding feature.

Thanks,
Yazen

Branch for this set:
https://github.com/AMDESE/linux/tree/mca-updates-v2

Branch for remaining changes (work-in-progress):
https://github.com/AMDESE/linux/tree/wip-mca

Link:
https://lkml.kernel.org/r/20231118193248.1296798-1-yazen.ghannam@amd.com

Avadhut Naik (2):
  x86/mce: Add wrapper for struct mce to export vendor specific info
  x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

Yazen Ghannam (14):
  x86/mce: Define mce_setup() helpers for common and per-CPU fields
  x86/mce: Use mce_setup() helpers for apei_smca_report_x86_error()
  x86/mce/amd: Use fixed bank number for quirks
  x86/mce/amd: Look up bank type by IPID
  x86/mce/amd: Clean up SMCA configuration
  x86/mce/amd: Prep DFR handler before enabling banks
  x86/mce/amd: Simplify DFR handler setup
  x86/mce/amd: Clean up enable_deferred_error_interrupt()
  x86/mce: Unify AMD THR handler with MCA Polling
  x86/mce: Unify AMD DFR handler with MCA Polling
  x86/mce: Skip AMD threshold init if no threshold banks found
  x86/mce/amd: Support SMCA Corrected Error Interrupt
  x86/mce/apei: Handle variable register array size
  EDAC/mce_amd: Add support for FRU Text in MCA

 arch/x86/include/asm/mce.h              |  24 +-
 arch/x86/kernel/cpu/mce/amd.c           | 461 ++++++++++++++----------
 arch/x86/kernel/cpu/mce/apei.c          | 124 +++++--
 arch/x86/kernel/cpu/mce/core.c          | 253 ++++++++-----
 arch/x86/kernel/cpu/mce/dev-mcelog.c    |   2 +-
 arch/x86/kernel/cpu/mce/genpool.c       |  20 +-
 arch/x86/kernel/cpu/mce/inject.c        |   4 +-
 arch/x86/kernel/cpu/mce/internal.h      |  13 +-
 drivers/acpi/acpi_extlog.c              |   2 +-
 drivers/acpi/nfit/mce.c                 |   3 +-
 drivers/edac/amd64_edac.c               |   2 +-
 drivers/edac/i7core_edac.c              |   2 +-
 drivers/edac/igen6_edac.c               |   2 +-
 drivers/edac/mce_amd.c                  |  29 +-
 drivers/edac/pnd2_edac.c                |   2 +-
 drivers/edac/sb_edac.c                  |   2 +-
 drivers/edac/skx_common.c               |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c |   4 +-
 drivers/ras/amd/fmpm.c                  |   2 +-
 drivers/ras/cec.c                       |   3 +-
 include/trace/events/mce.h              |  51 +--
 21 files changed, 620 insertions(+), 387 deletions(-)


base-commit: f382ab1037497f49d290ce6ceb9cdb10b186682e