Message ID | 20200306151318.17422-12-rrichter@marvell.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | EDAC/ghes: Cleanup, rework and improvement of memory reporting | expand |
On Fri, Mar 06, 2020 at 04:13:18PM +0100, Robert Richter wrote: > The ghes driver only creates one memory controller for the whole > system. This does not reflect memory topology especially in multi-node > systems. E.g. a Marvell ThunderX2 system shows: > > /sys/devices/system/edac/mc/mc0/dimm0 > /sys/devices/system/edac/mc/mc0/dimm1 > /sys/devices/system/edac/mc/mc0/dimm2 > /sys/devices/system/edac/mc/mc0/dimm3 > /sys/devices/system/edac/mc/mc0/dimm4 > /sys/devices/system/edac/mc/mc0/dimm5 > /sys/devices/system/edac/mc/mc0/dimm6 > /sys/devices/system/edac/mc/mc0/dimm7 > /sys/devices/system/edac/mc/mc0/dimm8 > /sys/devices/system/edac/mc/mc0/dimm9 > /sys/devices/system/edac/mc/mc0/dimm10 > /sys/devices/system/edac/mc/mc0/dimm11 > /sys/devices/system/edac/mc/mc0/dimm12 > /sys/devices/system/edac/mc/mc0/dimm13 > /sys/devices/system/edac/mc/mc0/dimm14 > /sys/devices/system/edac/mc/mc0/dimm15 > > The DIMMs 9-15 are located on the 2nd node of the system. On > comparable x86 systems there is one memory controller per node. The > ghes driver should also group DIMMs depending on the topology and > create one MC per node. > > There are several options to detect the topology. ARM64 systems > retrieve the (NUMA) node information from the ACPI SRAT table (see > acpi_table_parse_srat()). The node id is later stored in the physical > address page. The pfn_to_nid() macro could be used for a DIMM after > determining its physical address. The drawback of this approach is > that there are too many subsystems involved it depends on. It could > easily break and makes the implementation complex. E.g. pfn_to_nid() > can only be reliable used on mapped address ranges which is not always > granted, there are various firmware instances involved which could be > broken, or results may vary depending on NUMA settings. > > Another approach that was suggested by James' is to use the DIMM's > physical memory array handle to group DIMMs [1]. The advantage is to > only use the information on memory devices from the SMBIOS table that > contains a reference to the physical memory array it belongs too. This > information is mandatory same as the use of DIMM handle references by > GHES to provide the DIMM location of an error. There is only a single > table to parse which eases implementation. This patch uses this > approach for DIMM grouping. > > Modify the DMI decoder to also detect the physical memory array a DIMM > is linked to and create one memory controller per array to group > DIMMs. With the change DIMMs are grouped, e.g. a ThunderX2 system > shows one MC per node now: > > # grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label > /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0 > /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0 > /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0 > /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0 > /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0 > /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0 > /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0 > /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0 > /sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0 > /sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0 > /sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0 > /sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0 > /sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0 > /sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0 > /sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0 > /sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0 > > [1] https://lkml.kernel.org/r/f878201f-f8fd-0f2a-5072-ba60c64eefaf@arm.com > > Suggested-by: James Morse <james.morse@arm.com> > Signed-off-by: Robert Richter <rrichter@marvell.com> > --- > drivers/edac/ghes_edac.c | 137 ++++++++++++++++++++++++++++++--------- > 1 file changed, 107 insertions(+), 30 deletions(-) This is all fine and good but that change affects the one x86 platform we support so the whole patchset should be tested there too. Adding Toshi. As a matter of fact, the final version of this set should be tested on all platforms which are using this thing. Adding John Garry too who reported issues with this driver recently on his platform. Thx.
On 16/03/2020 09:51, Borislav Petkov wrote: > On Fri, Mar 06, 2020 at 04:13:18PM +0100, Robert Richter wrote: >> The ghes driver only creates one memory controller for the whole >> system. This does not reflect memory topology especially in multi-node >> systems. E.g. a Marvell ThunderX2 system shows: >> >> /sys/devices/system/edac/mc/mc0/dimm0 >> /sys/devices/system/edac/mc/mc0/dimm1 >> /sys/devices/system/edac/mc/mc0/dimm2 >> /sys/devices/system/edac/mc/mc0/dimm3 >> /sys/devices/system/edac/mc/mc0/dimm4 >> /sys/devices/system/edac/mc/mc0/dimm5 >> /sys/devices/system/edac/mc/mc0/dimm6 >> /sys/devices/system/edac/mc/mc0/dimm7 >> /sys/devices/system/edac/mc/mc0/dimm8 >> /sys/devices/system/edac/mc/mc0/dimm9 >> /sys/devices/system/edac/mc/mc0/dimm10 >> /sys/devices/system/edac/mc/mc0/dimm11 >> /sys/devices/system/edac/mc/mc0/dimm12 >> /sys/devices/system/edac/mc/mc0/dimm13 >> /sys/devices/system/edac/mc/mc0/dimm14 >> /sys/devices/system/edac/mc/mc0/dimm15 >> >> The DIMMs 9-15 are located on the 2nd node of the system. On >> comparable x86 systems there is one memory controller per node. The >> ghes driver should also group DIMMs depending on the topology and >> create one MC per node. >> >> There are several options to detect the topology. ARM64 systems >> retrieve the (NUMA) node information from the ACPI SRAT table (see >> acpi_table_parse_srat()). The node id is later stored in the physical >> address page. The pfn_to_nid() macro could be used for a DIMM after >> determining its physical address. The drawback of this approach is >> that there are too many subsystems involved it depends on. It could >> easily break and makes the implementation complex. E.g. pfn_to_nid() >> can only be reliable used on mapped address ranges which is not always >> granted, there are various firmware instances involved which could be >> broken, or results may vary depending on NUMA settings. >> >> Another approach that was suggested by James' is to use the DIMM's >> physical memory array handle to group DIMMs [1]. The advantage is to >> only use the information on memory devices from the SMBIOS table that >> contains a reference to the physical memory array it belongs too. This >> information is mandatory same as the use of DIMM handle references by >> GHES to provide the DIMM location of an error. There is only a single >> table to parse which eases implementation. This patch uses this >> approach for DIMM grouping. >> >> Modify the DMI decoder to also detect the physical memory array a DIMM >> is linked to and create one memory controller per array to group >> DIMMs. With the change DIMMs are grouped, e.g. a ThunderX2 system >> shows one MC per node now: >> >> # grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label >> /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0 >> /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0 >> /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0 >> /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0 >> /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0 >> /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0 >> /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0 >> /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0 >> /sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0 >> /sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0 >> /sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0 >> /sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0 >> /sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0 >> /sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0 >> /sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0 >> /sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0 >> >> [1] https://lkml.kernel.org/r/f878201f-f8fd-0f2a-5072-ba60c64eefaf@arm.com >> >> Suggested-by: James Morse <james.morse@arm.com> >> Signed-off-by: Robert Richter <rrichter@marvell.com> >> --- >> drivers/edac/ghes_edac.c | 137 ++++++++++++++++++++++++++++++--------- >> 1 file changed, 107 insertions(+), 30 deletions(-) > > This is all fine and good but that change affects the one x86 platform > we support so the whole patchset should be tested there too. Adding > Toshi. > > As a matter of fact, the final version of this set should be tested on > all platforms which are using this thing. Adding John Garry too who > reported issues with this driver recently on his platform. Adding other RAS-centric guys for H. Cheers, John > > Thx. >
> > This is all fine and good but that change affects the one x86 platform > > we support so the whole patchset should be tested there too. Adding > > Toshi. Thanks for the heads-up. I do not have an access to a test system at the moment, but am checking my colleague to do the test. -Toshi
On Tue, Mar 17, 2020 at 10:14:46PM +0000, Kani, Toshi wrote: > > > This is all fine and good but that change affects the one x86 platform > > > we support so the whole patchset should be tested there too. Adding > > > Toshi. > > Thanks for the heads-up. I do not have an access to a test system at > the moment, but am checking my colleague to do the test. Thx but hold that off for now - Robert is reworking the set.
> On Tue, Mar 17, 2020 at 10:14:46PM +0000, Kani, Toshi wrote: > > > > This is all fine and good but that change affects the one x86 > > > > platform we support so the whole patchset should be tested there > > > > too. Adding Toshi. > > > > Thanks for the heads-up. I do not have an access to a test system at > > the moment, but am checking my colleague to do the test. > > Thx but hold that off for now - Robert is reworking the set. Got it. Thanks, -Toshi
On 17.03.20 22:53:09, Kani, Toshi wrote: > > On Tue, Mar 17, 2020 at 10:14:46PM +0000, Kani, Toshi wrote: > > > > > This is all fine and good but that change affects the one x86 > > > > > platform we support so the whole patchset should be tested there > > > > > too. Adding Toshi. > > > > > > Thanks for the heads-up. I do not have an access to a test system at > > > the moment, but am checking my colleague to do the test. > > > > Thx but hold that off for now - Robert is reworking the set. It would still be good to have a test run as the general concept wont change and we will then early see potential issues, especially wrt SMBIOS/DMI. It would esp. be interesting to see how the node mapping reflects the memory topology: # grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0 /sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0 /sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0 /sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0 /sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0 /sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0 /sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0 /sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0 /sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0 Thanks, -Robert
On 2020/3/18 0:34, John Garry wrote: > On 16/03/2020 09:51, Borislav Petkov wrote: >> On Fri, Mar 06, 2020 at 04:13:18PM +0100, Robert Richter wrote: >>> The ghes driver only creates one memory controller for the whole >>> system. This does not reflect memory topology especially in multi-node >>> systems. E.g. a Marvell ThunderX2 system shows: >>> >>> /sys/devices/system/edac/mc/mc0/dimm0 >>> /sys/devices/system/edac/mc/mc0/dimm1 >>> /sys/devices/system/edac/mc/mc0/dimm2 >>> /sys/devices/system/edac/mc/mc0/dimm3 >>> /sys/devices/system/edac/mc/mc0/dimm4 >>> /sys/devices/system/edac/mc/mc0/dimm5 >>> /sys/devices/system/edac/mc/mc0/dimm6 >>> /sys/devices/system/edac/mc/mc0/dimm7 >>> /sys/devices/system/edac/mc/mc0/dimm8 >>> /sys/devices/system/edac/mc/mc0/dimm9 >>> /sys/devices/system/edac/mc/mc0/dimm10 >>> /sys/devices/system/edac/mc/mc0/dimm11 >>> /sys/devices/system/edac/mc/mc0/dimm12 >>> /sys/devices/system/edac/mc/mc0/dimm13 >>> /sys/devices/system/edac/mc/mc0/dimm14 >>> /sys/devices/system/edac/mc/mc0/dimm15 >>> >>> The DIMMs 9-15 are located on the 2nd node of the system. On >>> comparable x86 systems there is one memory controller per node. The >>> ghes driver should also group DIMMs depending on the topology and >>> create one MC per node. >>> >>> There are several options to detect the topology. ARM64 systems >>> retrieve the (NUMA) node information from the ACPI SRAT table (see >>> acpi_table_parse_srat()). The node id is later stored in the physical >>> address page. The pfn_to_nid() macro could be used for a DIMM after >>> determining its physical address. The drawback of this approach is >>> that there are too many subsystems involved it depends on. It could >>> easily break and makes the implementation complex. E.g. pfn_to_nid() >>> can only be reliable used on mapped address ranges which is not always >>> granted, there are various firmware instances involved which could be >>> broken, or results may vary depending on NUMA settings. >>> >>> Another approach that was suggested by James' is to use the DIMM's >>> physical memory array handle to group DIMMs [1]. The advantage is to >>> only use the information on memory devices from the SMBIOS table that >>> contains a reference to the physical memory array it belongs too. This >>> information is mandatory same as the use of DIMM handle references by >>> GHES to provide the DIMM location of an error. There is only a single >>> table to parse which eases implementation. This patch uses this >>> approach for DIMM grouping. >>> >>> Modify the DMI decoder to also detect the physical memory array a DIMM >>> is linked to and create one memory controller per array to group >>> DIMMs. With the change DIMMs are grouped, e.g. a ThunderX2 system >>> shows one MC per node now: >>> >>> # grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label >>> /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0 >>> /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0 >>> /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0 >>> /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0 >>> /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0 >>> /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0 >>> /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0 >>> /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0 >>> /sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0 >>> /sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0 >>> /sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0 >>> /sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0 >>> /sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0 >>> /sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0 >>> /sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0 >>> /sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0 >>> >>> [1] https://lkml.kernel.org/r/f878201f-f8fd-0f2a-5072-ba60c64eefaf@arm.com >>> >>> Suggested-by: James Morse <james.morse@arm.com> >>> Signed-off-by: Robert Richter <rrichter@marvell.com> >>> --- >>> drivers/edac/ghes_edac.c | 137 ++++++++++++++++++++++++++++++--------- >>> 1 file changed, 107 insertions(+), 30 deletions(-) >> >> This is all fine and good but that change affects the one x86 platform >> we support so the whole patchset should be tested there too. Adding >> Toshi. >> >> As a matter of fact, the final version of this set should be tested on >> all platforms which are using this thing. Adding John Garry too who >> reported issues with this driver recently on his platform. > > Adding other RAS-centric guys for H. Hi John & Borislav & Robert I have tested this patch set on our platform. Only one memory controller found when there is one DIMM on each socket or node. Just like this: estuary:/$ grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:SOCKET 0 CHANNEL 0 DIMM 0 DIMM0 /sys/devices/system/edac/mc/mc0/dimm20/dimm_label:SOCKET 1 CHANNEL 2 DIMM 0 DIMM1 It is not the problem of the patch set. Because our BIOS only defined one "Physical Memory Array Handle" in DMI table. Just like this: estuary:/$ dmidecode -t memory | grep "Array Handle" Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 Array Handle: 0x0000 BTW, i also test other function of edac driver our platform used. They're all good. :) > > Cheers, > John > >> >> Thx. >> > > > . >
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index 64220397296e..35b38cccc6da 100644 --- a/drivers/edac/ghes_edac.c +++ b/drivers/edac/ghes_edac.c @@ -125,12 +125,44 @@ static void ghes_dimm_release(struct list_head *dimms) list_splice(dimms, &ghes_dimm_pool); } -static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg) +struct ghes_mci_fill { + unsigned long *map; + int index; + int count; + int num_mc; + int num_dimm; + u16 handle; +}; + +static void ghes_edac_dmidecode_mci(const struct dmi_header *dh, void *arg) { - int *num_dimm = arg; + struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh; + struct ghes_mci_fill *mci_fill = arg; + + if (dh->type != DMI_ENTRY_MEM_DEVICE) + return; + + /* First run, no mapping, just count. */ + if (!mci_fill->map) { + mci_fill->count++; + return; + } + + if (mci_fill->index >= mci_fill->count) + goto out; - if (dh->type == DMI_ENTRY_MEM_DEVICE) - (*num_dimm)++; + if (test_bit(mci_fill->index, mci_fill->map)) + goto out; + + if (!mci_fill->num_dimm) + mci_fill->handle = entry->phys_mem_array_handle; + else if (mci_fill->handle != entry->phys_mem_array_handle) + goto out; + + set_bit(mci_fill->index, mci_fill->map); + mci_fill->num_dimm++; +out: + mci_fill->index++; } /* @@ -181,17 +213,29 @@ struct ghes_dimm_fill { struct list_head dimms; struct mem_ctl_info *mci; int index; + u16 link; }; -static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg) +static void ghes_edac_dmidecode_dimm(const struct dmi_header *dh, void *arg) { struct ghes_dimm_fill *dimm_fill = arg; struct mem_ctl_info *mci = dimm_fill->mci; + struct memdev_dmi_entry *entry; + struct ghes_dimm *ghes_dimm; + struct dimm_info *dimm; if (dh->type == DMI_ENTRY_MEM_DEVICE) { - struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh; - struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->index, 0, 0); - struct ghes_dimm *ghes_dimm; + entry = (struct memdev_dmi_entry *)dh; + if (entry->phys_mem_array_handle != dimm_fill->link) + return; + + /* + * Always returns non-zero as the mci should have + * allocated the correct number of DIMMs. + */ + dimm = edac_get_dimm_by_index(mci, dimm_fill->index); + if (WARN_ON_ONCE(!dimm)) + return; ghes_dimm = ghes_dimm_alloc(dimm, entry->handle); if (ghes_dimm) @@ -605,29 +649,35 @@ static int ghes_mc_add_or_free(struct mem_ctl_info *mci, static void ghes_mc_free(void) { struct ghes_dimm *ghes_dimm, *tmp; - struct mem_ctl_info *mci = NULL; + struct mem_ctl_info *mci; LIST_HEAD(dimm_list); unsigned long flags; - spin_lock_irqsave(&ghes_lock, flags); + while (1) { + mci = NULL; - list_for_each_entry_safe(ghes_dimm, tmp, &ghes_dimm_list, entry) { - mci = mci ?: ghes_dimm->dimm->mci; - WARN_ON_ONCE(mci != ghes_dimm->dimm->mci); - list_move_tail(&ghes_dimm->entry, &dimm_list); - } + spin_lock_irqsave(&ghes_lock, flags); - WARN_ON_ONCE(!list_empty(&ghes_dimm_list)); + list_for_each_entry_safe(ghes_dimm, tmp, &ghes_dimm_list, entry) { + mci = mci ?: ghes_dimm->dimm->mci; + if (mci != ghes_dimm->dimm->mci) + continue; + list_move_tail(&ghes_dimm->entry, &dimm_list); + } - spin_unlock_irqrestore(&ghes_lock, flags); + WARN_ON_ONCE(!mci && !list_empty(&ghes_dimm_list)); - ghes_dimm_release(&dimm_list); + spin_unlock_irqrestore(&ghes_lock, flags); - if (!mci) - return; + ghes_dimm_release(&dimm_list); + + if (!mci) + return; + + mci = edac_mc_del_mc(mci->pdev); + if (!mci) + continue; - mci = edac_mc_del_mc(mci->pdev); - if (mci) { platform_device_unregister(to_platform_device(mci->pdev)); edac_mc_free(mci); } @@ -661,7 +711,8 @@ static int ghes_edac_register_fake(struct device *dev) return ghes_mc_add_or_free(mci, &dimm_list); } -static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm) +static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm, + u16 handle) { struct ghes_dimm_fill dimm_fill; struct mem_ctl_info *mci; @@ -672,16 +723,18 @@ static int ghes_edac_register_one(struct device *dev, int mc_idx, int num_dimm) dimm_fill.index = 0; dimm_fill.mci = mci; + dimm_fill.link = handle; INIT_LIST_HEAD(&dimm_fill.dimms); - dmi_walk(ghes_edac_dmidecode, &dimm_fill); + dmi_walk(ghes_edac_dmidecode_dimm, &dimm_fill); return ghes_mc_add_or_free(mci, &dimm_fill.dimms); } int ghes_edac_register(struct ghes *ghes, struct device *dev) { - int rc = 0, num_dimm = 0; + struct ghes_mci_fill mci_fill = { }; + int rc = 0; int idx; if (IS_ENABLED(CONFIG_X86)) { @@ -703,13 +756,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev) goto unlock; /* Get the number of DIMMs */ - dmi_walk(ghes_edac_count_dimms, &num_dimm); + dmi_walk(ghes_edac_dmidecode_mci, &mci_fill); - rc = ghes_dimm_init(num_dimm ?: 1); + rc = ghes_dimm_init(mci_fill.count ?: 1); if (rc) goto unlock; - if (!num_dimm) { + if (!mci_fill.count) { /* * Bogus BIOS: Ignore DMI topology and use a single MC * with only one DIMM for the whole address range to @@ -732,10 +785,34 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev) pr_info("So, the end result of using this driver varies from vendor to vendor.\n"); pr_info("If you find incorrect reports, please contact your hardware vendor\n"); pr_info("to correct its BIOS.\n"); - pr_info("This system has %d DIMM sockets.\n", num_dimm); + pr_info("This system has %d DIMM sockets.\n", mci_fill.count); } - rc = ghes_edac_register_one(dev, 0, num_dimm); + mci_fill.map = kcalloc(BITS_TO_LONGS(mci_fill.count), + sizeof(*mci_fill.map), GFP_KERNEL); + + if (!mci_fill.map) { + rc = -ENOMEM; + goto unlock; + } + + while (1) { + dmi_walk(ghes_edac_dmidecode_mci, &mci_fill); + if (!mci_fill.num_dimm) + break; + + rc = ghes_edac_register_one(dev, mci_fill.num_mc, + mci_fill.num_dimm, mci_fill.handle); + if (rc) + break; + + mci_fill.index = 0; + mci_fill.num_dimm = 0; + mci_fill.num_mc++; + } + + kfree(mci_fill.map); + if (rc) goto unlock;
The ghes driver only creates one memory controller for the whole system. This does not reflect memory topology especially in multi-node systems. E.g. a Marvell ThunderX2 system shows: /sys/devices/system/edac/mc/mc0/dimm0 /sys/devices/system/edac/mc/mc0/dimm1 /sys/devices/system/edac/mc/mc0/dimm2 /sys/devices/system/edac/mc/mc0/dimm3 /sys/devices/system/edac/mc/mc0/dimm4 /sys/devices/system/edac/mc/mc0/dimm5 /sys/devices/system/edac/mc/mc0/dimm6 /sys/devices/system/edac/mc/mc0/dimm7 /sys/devices/system/edac/mc/mc0/dimm8 /sys/devices/system/edac/mc/mc0/dimm9 /sys/devices/system/edac/mc/mc0/dimm10 /sys/devices/system/edac/mc/mc0/dimm11 /sys/devices/system/edac/mc/mc0/dimm12 /sys/devices/system/edac/mc/mc0/dimm13 /sys/devices/system/edac/mc/mc0/dimm14 /sys/devices/system/edac/mc/mc0/dimm15 The DIMMs 9-15 are located on the 2nd node of the system. On comparable x86 systems there is one memory controller per node. The ghes driver should also group DIMMs depending on the topology and create one MC per node. There are several options to detect the topology. ARM64 systems retrieve the (NUMA) node information from the ACPI SRAT table (see acpi_table_parse_srat()). The node id is later stored in the physical address page. The pfn_to_nid() macro could be used for a DIMM after determining its physical address. The drawback of this approach is that there are too many subsystems involved it depends on. It could easily break and makes the implementation complex. E.g. pfn_to_nid() can only be reliable used on mapped address ranges which is not always granted, there are various firmware instances involved which could be broken, or results may vary depending on NUMA settings. Another approach that was suggested by James' is to use the DIMM's physical memory array handle to group DIMMs [1]. The advantage is to only use the information on memory devices from the SMBIOS table that contains a reference to the physical memory array it belongs too. This information is mandatory same as the use of DIMM handle references by GHES to provide the DIMM location of an error. There is only a single table to parse which eases implementation. This patch uses this approach for DIMM grouping. Modify the DMI decoder to also detect the physical memory array a DIMM is linked to and create one memory controller per array to group DIMMs. With the change DIMMs are grouped, e.g. a ThunderX2 system shows one MC per node now: # grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0 /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0 /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0 /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0 /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0 /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0 /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0 /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0 /sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0 /sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0 /sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0 /sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0 /sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0 /sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0 /sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0 /sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0 [1] https://lkml.kernel.org/r/f878201f-f8fd-0f2a-5072-ba60c64eefaf@arm.com Suggested-by: James Morse <james.morse@arm.com> Signed-off-by: Robert Richter <rrichter@marvell.com> --- drivers/edac/ghes_edac.c | 137 ++++++++++++++++++++++++++++++--------- 1 file changed, 107 insertions(+), 30 deletions(-)