[v5,0/5] MCE wrapper and support for new SMCA syndrome MSRs

Message ID 20241001181617.604573-1-avadhut.naik@amd.com (mailing list archive)

Avadhut Naik Oct. 1, 2024, 6:12 p.m. UTC
This patchset adds a new wrapper for struct mce to keep that structure
from bloating and to export vendor-specific error information. It also
adds support for two new "syndrome" MSRs used in newer AMD Scalable MCA
(SMCA) systems, along with a new "FRU Text in MCA" feature that uses
these MSRs.

Patch 1 adds the new wrapper structure mce_hw_err for the struct mce
while also modifying the mce_record tracepoint to use the new wrapper.
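
For readers unfamiliar with the pattern, here is a minimal userspace
sketch of what wrapping struct mce can look like. Only the names
mce_hw_err and struct mce come from this series; the trimmed-down field
list, the vendor.amd.synd1/synd2 members and the to_mce_hw_err() helper
are assumptions made for illustration and may not match the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct mce; only a few fields are shown. */
struct mce {
	uint64_t status;
	uint64_t addr;
	uint64_t synd;
};

/*
 * Assumed layout of the wrapper: the existing record is embedded
 * unchanged, and vendor-specific data lives next to it, so struct mce
 * itself does not grow.
 */
struct mce_hw_err {
	struct mce m;
	union {
		struct {
			uint64_t synd1;	/* assumed: first new syndrome MSR */
			uint64_t synd2;	/* assumed: second new syndrome MSR */
		} amd;
	} vendor;
};

/* Hypothetical helper: recover the wrapper from the embedded struct mce. */
static inline struct mce_hw_err *to_mce_hw_err(struct mce *m)
{
	return (struct mce_hw_err *)((char *)m - offsetof(struct mce_hw_err, m));
}
```

Code that only has a struct mce pointer can then reach the vendor data
without any change to the layout of struct mce itself.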

Patch 2 introduces a new helper function, __print_dynamic_array(), for
logging dynamic arrays through tracepoints.
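
As background, a dynamic array in a trace entry is described by a single
32-bit "data location" word, which is why its length can be recovered
without a separate field. The userspace sketch below assumes the usual
ftrace encoding (offset in the low 16 bits, length in bytes in the high
16 bits). The data_loc_*() helper names are hypothetical and are not
part of the kernel API.

```c
#include <stdint.h>

/*
 * Assumed encoding of a dynamic array descriptor: the low 16 bits hold
 * the byte offset of the data from the start of the trace entry, the
 * high 16 bits hold the length of the data in bytes.
 */
static inline uint32_t data_loc_pack(uint16_t offset, uint16_t len)
{
	return ((uint32_t)len << 16) | offset;
}

static inline uint16_t data_loc_offset(uint32_t loc)
{
	return loc & 0xffff;
}

static inline uint16_t data_loc_len(uint32_t loc)
{
	return (loc >> 16) & 0xffff;
}

/* Number of el_size-byte elements stored in the array (0 if el_size is 0). */
static inline unsigned int data_loc_count(uint32_t loc, unsigned int el_size)
{
	return el_size ? data_loc_len(loc) / el_size : 0;
}
```

A print helper built on this only needs the array name and the element
size; the element count is the stored length divided by the element size.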

Patch 3 adds support for the new "syndrome" registers. They are read/printed
wherever the existing MCA_SYND register is used.

Patch 4 updates the function that pulls MCA information from UEFI x86
Common Platform Error Records (CPERs) to handle systems that support the
new registers.

Patch 5 adds support to the AMD MCE decoder module to detect and use the
"FRU Text in MCA" feature which leverages the new registers.
  
NOTE:
 
This set was initially submitted as part of the larger MCA Updates set.
 
v1: https://lore.kernel.org/linux-edac/20231118193248.1296798-1-yazen.ghannam@amd.com/
v2: https://lore.kernel.org/linux-edac/20240404151359.47970-1-yazen.ghannam@amd.com/
 
However, since the MCA Updates set has been split up into smaller sets,
this set, going forward, will be submitted independently.
 
Having said that, this set depends on, and applies cleanly on top of,
the two sets below.
 
[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

Changes in v2:
 - Drop dependencies on sets [1] and [2] above and rebase on top of
   tip/master.

Changes in v3:
 - Move wrapper changes required in mce_read_aux() and mce_no_way_out()
   from second patch to the first patch.
 - Modify commit messages for second and fourth patch per feedback
   received.
 - Add comments to explain purpose of the new wrapper structure.
 - Incorporate suggested touchup in the third patch.
 - Remove call to memset() for the frutext string in the fourth patch.
   Instead, just ensure that the string is NUL-terminated.
 - Fix SoB chains on all patches to properly reflect the patch path.
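
The frutext item above can be illustrated with a small userspace sketch.
It assumes that the FRU text is 16 bytes of ASCII split evenly across
the two new syndrome registers, first register first; that split, the
FRU_TEXT_LEN constant and the build_fru_text() helper are all
assumptions for illustration and are not taken from the patch. Since
every byte of the buffer except the last is overwritten by the copies,
clearing it first with memset() is unnecessary; only the terminator has
to be written.

```c
#include <stdint.h>
#include <string.h>

#define FRU_TEXT_LEN 16	/* assumed: 8 bytes from each syndrome register */

/*
 * Hypothetical helper: assemble the FRU text from the raw values of the
 * two syndrome registers. The caller provides FRU_TEXT_LEN + 1 bytes.
 */
static void build_fru_text(char out[FRU_TEXT_LEN + 1],
			   uint64_t synd1, uint64_t synd2)
{
	/* Copy the raw register bytes in memory order. */
	memcpy(&out[0], &synd1, sizeof(synd1));
	memcpy(&out[8], &synd2, sizeof(synd2));

	/* No memset() needed: only the final byte is left to initialize. */
	out[FRU_TEXT_LEN] = '\0';
}
```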

Changes in v4:
 - Resolve kernel test robot's warning on the use of memset() in
   do_machine_check().
 - Rebase on top of tip/master to avoid merge conflicts.

Changes in v5:
 - Introduce a new helper function __print_dynamic_array() for logging
   dynamic arrays through tracepoints.
 - Remove "len" field from the modified mce_record tracepoint since the
   length of a dynamic array can be fetched from its metadata.
 - Substitute __print_array() with __print_dynamic_array().

Links:
v1: https://lore.kernel.org/linux-edac/20240530211620.1829453-1-avadhut.naik@amd.com/
v2: https://lore.kernel.org/linux-edac/20240625195624.2565741-1-avadhut.naik@amd.com/
v3: https://lore.kernel.org/linux-edac/20240730185406.3709876-1-avadhut.naik@amd.com/T/#t
v4: https://lore.kernel.org/linux-edac/20240815211635.1336721-1-avadhut.naik@amd.com/

Avadhut Naik (2):
  x86/mce: Add wrapper for struct mce to export vendor specific info
  x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

Steven Rostedt (1):
  tracing: Add __print_dynamic_array() helper

Yazen Ghannam (2):
  x86/mce/apei: Handle variable register array size
  EDAC/mce_amd: Add support for FRU Text in MCA

 arch/x86/include/asm/mce.h                 |  36 +++-
 arch/x86/include/uapi/asm/mce.h            |   3 +-
 arch/x86/kernel/cpu/mce/amd.c              |  31 +--
 arch/x86/kernel/cpu/mce/apei.c             | 109 ++++++++---
 arch/x86/kernel/cpu/mce/core.c             | 210 ++++++++++++---------
 arch/x86/kernel/cpu/mce/dev-mcelog.c       |   2 +-
 arch/x86/kernel/cpu/mce/genpool.c          |  20 +-
 arch/x86/kernel/cpu/mce/inject.c           |   4 +-
 arch/x86/kernel/cpu/mce/internal.h         |   4 +-
 drivers/acpi/acpi_extlog.c                 |   2 +-
 drivers/acpi/nfit/mce.c                    |   2 +-
 drivers/edac/i7core_edac.c                 |   2 +-
 drivers/edac/igen6_edac.c                  |   2 +-
 drivers/edac/mce_amd.c                     |  27 ++-
 drivers/edac/pnd2_edac.c                   |   2 +-
 drivers/edac/sb_edac.c                     |   2 +-
 drivers/edac/skx_common.c                  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |   2 +-
 drivers/ras/amd/fmpm.c                     |   2 +-
 drivers/ras/cec.c                          |   2 +-
 include/trace/events/mce.h                 |  49 ++---
 include/trace/stages/stage3_trace_output.h |   8 +
 include/trace/stages/stage7_class_define.h |   1 +
 samples/trace_events/trace-events-sample.h |   7 +-
 24 files changed, 342 insertions(+), 189 deletions(-)


base-commit: 040fcc9d4c772223d07972eceab5bfd25a2edbc9