Message ID | 1535654266-40053-1-git-send-email-wufan@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] EDAC, ghes: use CPER module handles to locate DIMMs | expand |
Hi Fan, On Thu, Aug 30, 2018 at 2:37 PM, Fan Wu <wufan@codeaurora.org> wrote: > For platforms whose firmwares provide valid module handles > (SMBIOS type 17) in error records, this patch uses the module > handles to locate corresponding DIMMs and enables per-DIMM > error counter update. > > Signed-off-by: Fan Wu <wufan@codeaurora.org> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c > index 473aeec..949f603 100644 > --- a/drivers/edac/ghes_edac.c > +++ b/drivers/edac/ghes_edac.c > @@ -81,6 +81,20 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg) > (*num_dimm)++; > } > > +static int get_dimm_smbios_index(u16 handle) > +{ > + struct mem_ctl_info *mci; > + int i; > + > + mci = ghes_pvt->mci; > + Minor nit: you could define and set mci in the same line to save some space here. Otherwise this patch looks good to me. Reviewed-by: Tyler Baicar <baicar.tyler@gmail.com>
Hi Tyler, > > +static int get_dimm_smbios_index(u16 handle) { > > + struct mem_ctl_info *mci; > > + int i; > > + > > + mci = ghes_pvt->mci; > > + > > Minor nit: you could define and set mci in the same line to save some space > here. > > Otherwise this patch looks good to me. Good catch Tyler! I'll fix it. Thanks, Fan > Reviewed-by: Tyler Baicar <baicar.tyler@gmail.com>
On Thu, 2018-08-30 at 18:37 +0000, Fan Wu wrote: > For platforms whose firmwares provide valid module handles > (SMBIOS type 17) in error records, this patch uses the module > handles to locate corresponding DIMMs and enables per-DIMM > error counter update. > > Signed-off-by: Fan Wu <wufan@codeaurora.org> Tested on an HPE x86 server. It turns out that mem_err->mem_dev_handle was set to 0 with this FW, and get_dimm_smbios_index() returned -1. But the code handles this case fine. dimm->smbios_handle's were initialized properly. So: Tested-by: Toshi Kani <toshi.kani@hpe.com> Thanks, -Toshi
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index 473aeec..949f603 100644 --- a/drivers/edac/ghes_edac.c +++ b/drivers/edac/ghes_edac.c @@ -81,6 +81,20 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg) (*num_dimm)++; } +static int get_dimm_smbios_index(u16 handle) +{ + struct mem_ctl_info *mci; + int i; + + mci = ghes_pvt->mci; + + for (i = 0; i < mci->tot_dimms; i++) { + if (mci->dimms[i]->smbios_handle == handle) + return i; + } + return -1; +} + static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg) { struct ghes_edac_dimm_fill *dimm_fill = arg; @@ -177,6 +191,8 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg) entry->total_width, entry->data_width); } + dimm->smbios_handle = entry->handle; + dimm_fill->count++; } } @@ -327,12 +343,21 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err) p += sprintf(p, "bit_pos:%d ", mem_err->bit_pos); if (mem_err->validation_bits & CPER_MEM_VALID_MODULE_HANDLE) { const char *bank = NULL, *device = NULL; + int index = -1; + dmi_memdev_name(mem_err->mem_dev_handle, &bank, &device); if (bank != NULL && device != NULL) p += sprintf(p, "DIMM location:%s %s ", bank, device); else p += sprintf(p, "DIMM DMI handle: 0x%.4x ", mem_err->mem_dev_handle); + + index = get_dimm_smbios_index(mem_err->mem_dev_handle); + if (index >= 0) { + e->top_layer = index; + e->enable_per_layer_report = true; + } + } if (p > e->location) *(p - 1) = '\0'; diff --git a/include/linux/edac.h b/include/linux/edac.h index bffb978..a45ce1f 100644 --- a/include/linux/edac.h +++ b/include/linux/edac.h @@ -451,6 +451,8 @@ struct dimm_info { u32 nr_pages; /* number of pages on this dimm */ unsigned csrow, cschannel; /* Points to the old API data */ + + u16 smbios_handle; /* Handle for SMBIOS type 17 */ }; /**
For platforms whose firmwares provide valid module handles (SMBIOS type 17) in error records, this patch uses the module handles to locate corresponding DIMMs and enables per-DIMM error counter update. Signed-off-by: Fan Wu <wufan@codeaurora.org> --- Changes from v1: * Changed the new function name to get_dimm_smbios_index * Removed unnecessary checks inside get_dimm_smbios_index * Reverted the change of the DMI handle print * Updated commit message --- drivers/edac/ghes_edac.c | 25 +++++++++++++++++++++++++ include/linux/edac.h | 2 ++ 2 files changed, 27 insertions(+)