Message ID | 1531762009-15112-1-git-send-email-tbaicar@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote: > Enable per-layer error reporting for ARM systems so that the error > counters are incremented per-DIMM. > > On ARM systems that use firmware first error handling it is understood > that card=channel and module=DIMM on that channel. Populate that > information and enable per layer error reporting for ARM systems so that > the EDAC error counters are incremented based on DIMM number as per the > SMBIOS table rather than just incrementing the noinfo counters on the > memory controller. I guess. James? > Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> > --- > drivers/edac/ghes_edac.c | 15 ++++++++++++--- > 1 file changed, 12 insertions(+), 3 deletions(-) > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c > index 473aeec..e4c8b6e 100644 > --- a/drivers/edac/ghes_edac.c > +++ b/drivers/edac/ghes_edac.c > @@ -213,9 +213,18 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err) > strcpy(e->label, "unknown label"); > e->msg = pvt->msg; > e->other_detail = pvt->other_detail; > - e->top_layer = -1; > - e->mid_layer = -1; > - e->low_layer = -1; <----- newline here. > + if ((IS_ENABLED(CONFIG_ARM) || IS_ENABLED(CONFIG_ARM64)) > + && (mem_err->validation_bits & CPER_MEM_VALID_CARD) > + && (mem_err->validation_bits & CPER_MEM_VALID_MODULE)) { > + e->top_layer = mem_err->card; > + e->mid_layer = mem_err->module; > + e->low_layer = -1; > + e->enable_per_layer_report = true; > + } else { > + e->top_layer = -1; > + e->mid_layer = -1; > + e->low_layer = -1; > + } ditto. > *pvt->other_detail = '\0'; > *pvt->msg = '\0'; > > --
Hi guys, On 19/07/18 15:01, Borislav Petkov wrote: > On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote: >> Enable per-layer error reporting for ARM systems so that the error >> counters are incremented per-DIMM. >> >> On ARM systems that use firmware first error handling it is understood understood by whom? Is this written down somewhere, or is it the convention. (in which case, lets get it written down somewhere) >> that card=channel and module=DIMM on that channel. Populate that I'm guessing this is the mapping between CPER records and the DMItable data. >> information and enable per layer error reporting for ARM systems so that >> the EDAC error counters are incremented based on DIMM number as per the >> SMBIOS table rather than just incrementing the noinfo counters on the >> memory controller. Does this work on x86, and its just the dmi/cper fields have a subtle difference? > I guess. > > James? I don't know anything about this stuff. Looks like the SMBIOS specification is my new bedtime reading. Thanks, James
On 7/19/2018 10:46 AM, James Morse wrote: > On 19/07/18 15:01, Borislav Petkov wrote: >> On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote: >>> Enable per-layer error reporting for ARM systems so that the error >>> counters are incremented per-DIMM. >>> >>> On ARM systems that use firmware first error handling it is understood > understood by whom? Is this written down somewhere, or is it the convention. (in > which case, lets get it written down somewhere) Hey Boris, James, It has just been convention, but Harb recently brought up the idea of adding it to SBBR. >>> that card=channel and module=DIMM on that channel. Populate that > I'm guessing this is the mapping between CPER records and the DMItable data. Unfortunately the DMI table doesn't actually have channel and DIMM number values which makes this more complicated than I originally thought... >>> information and enable per layer error reporting for ARM systems so that >>> the EDAC error counters are incremented based on DIMM number as per the >>> SMBIOS table rather than just incrementing the noinfo counters on the >>> memory controller. > Does this work on x86, and its just the dmi/cper fields have a subtle difference? There are CPU specific EDAC drivers for a lot of x86 folks and those drivers populate the layer information in a custom way. With more investigation and testing it turns out a simple patch like this is not going to work. This worked for me on a 1DPC board since the card number turned out to always be the same as the index into the DMI table to find the proper DIMM. On a 2DPC board this fails completely. The ghes_edac driver only sets up a single layer so it is only using the card number with this patch. That setup can be seen here: https://elixir.bootlin.com/linux/v4.18-rc5/source/drivers/edac/ghes_edac.c#L469 So it is only setting up a single layer with all the DIMMs on that layer. In order to properly enable the layers to represent channel and DIMM number on that channel, we would need to have a way of determining the number of channels (which would be layers[0].size) and the number of DIMMs each channel supported (layers[1].size). There doesn't appear to be a way to determine that information at this point. With the current ghes_edac setup, it seems the only way this could work would be to have the firmware always report the module value to be the index into the DMI table that this DIMM information lives. When I say index into the DMI table, I'm meaning the index into the list of "type 17" DMI entries. So, DIMM number doesn't actually matter, what really matters is the ordering of the type 17 entries in the DMI table. This seems pretty hacky to me, so if anyone has other suggestions please share them. The goal is to be able to enable the per layer error reporting in the ghes_edac driver so that the per dimm counters exposed in the EDAC sysfs nodes are properly updated. The other obvious but more messy way would be to have notifiers register to be called by ghes_edac and have a custom EDAC driver for each CPU to properly populate their layer information. Thanks, Tyler
On Thu, Jul 19, 2018 at 02:36:21PM -0400, Tyler Baicar wrote: > With the current ghes_edac setup, it seems the only way this could > work would be to have the firmware always report the module value to My experience with firmware so far is that it is a lost cause, considering all the bugs, snafus and incompleteness it demonstrates... > The other obvious but more messy way would be to have notifiers > register to be called by ghes_edac and have a custom EDAC driver for > each CPU to properly populate their layer information. ... which is what we do on x86 and whitelist only known-good platforms in ghes_edac, which claim that their fw info is correct.
Hi guys, (CC: +Fan Wu) On 19/07/18 19:36, Tyler Baicar wrote: > On 7/19/2018 10:46 AM, James Morse wrote: >> On 19/07/18 15:01, Borislav Petkov wrote: >>> On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote: >>>> Enable per-layer error reporting for ARM systems so that the error >>>> counters are incremented per-DIMM. This 'layer' term seems to be EDAC's artificial view of memory. >> I'm guessing this is the mapping between CPER records and the DMItable data. > Unfortunately the DMI table doesn't actually have channel and DIMM number values > which makes this more complicated than I originally thought... >>>> information and enable per layer error reporting for ARM systems so that >>>> the EDAC error counters are incremented based on DIMM number as per the >>>> SMBIOS table rather than just incrementing the noinfo counters on the >>>> memory controller. >> Does this work on x86, and its just the dmi/cper fields have a subtle difference? > There are CPU specific EDAC drivers for a lot of x86 folks and those drivers > populate the layer information in a custom way. Not for GHES surely? > With more investigation and testing it turns out a simple patch like this is not > going to work. This worked for > me on a 1DPC board since the card number turned out to always be the same as the > index into the DMI table > to find the proper DIMM. On a 2DPC board this fails completely. The ghes_edac > driver only sets up a single > layer so it is only using the card number with this patch. (DPC == DIMM per Channel?) > So it is only setting up a single layer with all the DIMMs on that layer. In > order to properly enable the layers > to represent channel and DIMM number on that channel, we would need to have a > way of determining the > number of channels (which would be layers[0].size) and the number of DIMMs each > channel supported > (layers[1].size). There doesn't appear to be a way to determine that information > at this point. So, we're after a mapping for EDAC:layers that includes 'channel'? What would you do with a channel counter? Isn't that part of the SoC? (it can't be replaced!) > The goal is to be able to enable the per layer error reporting in the ghes_edac > driver so that the per dimm counters exposed in the EDAC sysfs nodes are properly > updated. What do you mean by layer? I can't find anything in the ACPI/UEFI/SMBIOS specs that uses this term... If its just 'per dimm counters' you're after, this looks straightforward. > The other obvious but more messy way would be to have notifiers register to be called > by ghes_edac and have a custom EDAC driver for each CPU to properly populate their layer > information. Yuck. This isn't platform specific, its firmware specific. You're hooking the driver to say "my firmware thinks 'card' means this". Where would this information come from? We don't want per-soc,per-firmware-version code mangling what was supposed to be a standard interface. ... we can't be the first people to try and do this ... [re-ordered hunk:] > This seems pretty hacky to me, so if anyone has other suggestions please share > them. CPER's "Memory Error Record 2" thinks that "NODE, CARD and MODULE should provide the information necessary to identify the failing FRU". As EDAC has three 'levels', these are what they should correspond to for ghes-edac. I assume NODE means rack/chassis in some distributed system. Lets ignore it as it doesn't seem to map to anything in the SMBIOS table. The CPER record's card and module numbers are useless to us, as we need to know how many there will be in advance. (does this version of firmware count from 0 or 1?) ... but CPER also gives us a 'Card Handle' and 'Module Handle'. 'Module Handle' maps to SMBIOS:17 Memory Device (aka, a DIMM). The Handle is a word-value in the structure, so it doesn't depend on the layout/parse-order of the SMBIOS tables. When we count the DIMMs in edac-ghes we can give them some level-idx, then use the handle to find which level-idx to use for this DIMM. ghes_edac_report_mem_error() already picks up the module-handle, but only uses it to print the bank/device. 'Card' doesn't mean much to me, but it maps to SMBIOS:17 "Memory Array Structure", which the Memory Device structure also points to. Card then must mean "a collection of memory devices (DIMMs) that operate together to form an address space". This might be what I think of as a memory-controller, or it might be something more complicated. Regardless, the CPER records think its relevant. For the edac:layers, we could walk the DMI table to find these structures, and build the layers from them. If the Memory-array-structures are missing, we can use the existing 1:NUM_DIMMS approach. Hope this makes sense! Thanks, James
Hello James, On Thu, Aug 23, 2018 at 5:29 AM James Morse <james.morse@arm.com> wrote: > On 19/07/18 19:36, Tyler Baicar wrote: > > On 7/19/2018 10:46 AM, James Morse wrote: > >> On 19/07/18 15:01, Borislav Petkov wrote: > >>> On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote: > >>>> Enable per-layer error reporting for ARM systems so that the error > >>>> counters are incremented per-DIMM. > > This 'layer' term seems to be EDAC's artificial view of memory. > Yes, it's just the terminology that EDAC uses for locating a DIMM. "Layer" can mean several things here: https://elixir.bootlin.com/linux/latest/source/include/linux/edac.h#L318 We should be able to avoid the layer definitions with the SMBIOS handles. > > >> Does this work on x86, and its just the dmi/cper fields have a subtle difference? > > > There are CPU specific EDAC drivers for a lot of x86 folks and those drivers > > populate the layer information in a custom way. > > Not for GHES surely? > Correct, the x86 drivers that properly increment the DIMM error counters are not tied to the ghes_edac driver. > > (DPC == DIMM per Channel?) > Yes. > > The goal is to be able to enable the per layer error reporting in the ghes_edac > > driver so that the per dimm counters exposed in the EDAC sysfs nodes are properly > > updated. > > What do you mean by layer? I can't find anything in the ACPI/UEFI/SMBIOS specs > that uses this term... > > If its just 'per dimm counters' you're after, this looks straightforward. > Yes, we just need a way to increment the per DIMM counters that are exposed by the EDAC sysfs nodes. > [re-ordered hunk:] > > This seems pretty hacky to me, so if anyone has other suggestions please share > > them. > > CPER's "Memory Error Record 2" thinks that "NODE, CARD and MODULE should provide > the information necessary to identify the failing FRU". As EDAC has three > 'levels', these are what they should correspond to for ghes-edac. > > I assume NODE means rack/chassis in some distributed system. Lets ignore it as > it doesn't seem to map to anything in the SMBIOS table. I believe NODE should map to socket number for multi-socket systems. > The CPER record's card and module numbers are useless to us, as we need to know > how many there will be in advance. (does this version of firmware count from 0 > or 1?) > > ... but CPER also gives us a 'Card Handle' and 'Module Handle'. > 'Module Handle' maps to SMBIOS:17 Memory Device (aka, a DIMM). The Handle is a > word-value in the structure, so it doesn't depend on the layout/parse-order of > the SMBIOS tables. When we count the DIMMs in edac-ghes we can give them some > level-idx, then use the handle to find which level-idx to use for this DIMM. > > ghes_edac_report_mem_error() already picks up the module-handle, but only uses > it to print the bank/device. > > 'Card' doesn't mean much to me, but it maps to SMBIOS:17 "Memory Array > Structure", which the Memory Device structure also points to. > Card then must mean "a collection of memory devices (DIMMs) that operate > together to form an address space". > > This might be what I think of as a memory-controller, or it might be something > more complicated. Regardless, the CPER records think its relevant. > > For the edac:layers, we could walk the DMI table to find these structures, and > build the layers from them. If the Memory-array-structures are missing, we can > use the existing 1:NUM_DIMMS approach. > I think the proper way to get this working would be to use these handles. We can avoid populating this layer information and instead have a mapping of type 17 index number (how edac is numbering the DIMMs today) to the handle number. Then we will need a new function to increment the counter based on the handle number rather than this layer information. Is that how you are envisioning it? Thanks, Tyler
Hi Tyler, On 23/08/18 16:46, Tyler Baicar wrote: > On Thu, Aug 23, 2018 at 5:29 AM James Morse <james.morse@arm.com> wrote: >> On 19/07/18 19:36, Tyler Baicar wrote: >>> On 7/19/2018 10:46 AM, James Morse wrote: >>>> On 19/07/18 15:01, Borislav Petkov wrote: >>>>> On Mon, Jul 16, 2018 at 01:26:49PM -0400, Tyler Baicar wrote: >>>>>> Enable per-layer error reporting for ARM systems so that the error >>>>>> counters are incremented per-DIMM. >> >> This 'layer' term seems to be EDAC's artificial view of memory. >> > > Yes, it's just the terminology that EDAC uses for locating a DIMM. > > "Layer" can mean several things here: > > https://elixir.bootlin.com/linux/latest/source/include/linux/edac.h#L318 Aha, its an enum. I thought it was an upper/middle/lower mapping at the whim of the edac driver. [...] >> [re-ordered hunk:] >>> This seems pretty hacky to me, so if anyone has other suggestions please share >>> them. >> >> CPER's "Memory Error Record 2" thinks that "NODE, CARD and MODULE should provide >> the information necessary to identify the failing FRU". As EDAC has three >> 'levels', these are what they should correspond to for ghes-edac. >> >> I assume NODE means rack/chassis in some distributed system. Lets ignore it as >> it doesn't seem to map to anything in the SMBIOS table. > > I believe NODE should map to socket number for multi-socket systems. Isn't the Memory Array Structure still unique in a multi-socket system? If so the node isn't telling us anything new. Do sockets show up in the SMBIOS table? We would need to know how many there are in advance. For arm systems the cpu topology from PPTT is the best bet for this information, but what do we do if that table is missing? (also, does firmware count from 1 or 0?) I suspect we can't use this field unless we know what the range of values is going to be in advance. I assumed this node must be a level of information above Card/Memory-Array's address-space. Somehow the Card handle isn't no long unique, we need the node number too. If the CPER records were all being pumped at a single agent, (shared BMC in a blade/chassis thing) then this might matter. I suspect we can ignore it in linux. >> The CPER record's card and module numbers are useless to us, as we need to know >> how many there will be in advance. (does this version of firmware count from 0 >> or 1?) >> >> ... but CPER also gives us a 'Card Handle' and 'Module Handle'. >> 'Module Handle' maps to SMBIOS:17 Memory Device (aka, a DIMM). The Handle is a >> word-value in the structure, so it doesn't depend on the layout/parse-order of >> the SMBIOS tables. When we count the DIMMs in edac-ghes we can give them some >> level-idx, then use the handle to find which level-idx to use for this DIMM. >> >> ghes_edac_report_mem_error() already picks up the module-handle, but only uses >> it to print the bank/device. >> >> 'Card' doesn't mean much to me, but it maps to SMBIOS:17 "Memory Array >> Structure", which the Memory Device structure also points to. >> Card then must mean "a collection of memory devices (DIMMs) that operate >> together to form an address space". >> >> This might be what I think of as a memory-controller, or it might be something >> more complicated. Regardless, the CPER records think its relevant. >> >> For the edac:layers, we could walk the DMI table to find these structures, and >> build the layers from them. If the Memory-array-structures are missing, we can >> use the existing 1:NUM_DIMMS approach. > I think the proper way to get this working would be to use these handles. We can > avoid populating this layer information and instead have a mapping of type 17 > index number (how edac is numbering the DIMMs today) to the handle number. Why get avoid the layer stuff? Isn't counting DIMM/memory-devices what EDAC_MC_LAYER_SLOT is for? > Then we will need a new function to increment the counter based on the handle > number rather than this layer information. Is that how you are envisioning it? I'm not familiar with edac's internals, so I didn't have any particular vision! Isn't the problem that ghes_edac_report_mem_error() does this: | e->top_layer = -1; | e->mid_layer = -1; | e->low_layer = -1; so edac_raw_mc_handle_error() has no clue where the error happened. (I haven't read what it does with this information yet). ghes_edac_report_mem_error() does check CPER_MEM_VALID_MODULE_HANDLE, and if its set, it uses the handle to find the bank/device strings and prints them out. Naively I thought we could generate some index during ghes_edac_count_dimms(), and use this as e->${whichever}_layer. I hoped there would be something we could already use as the index, but I can't spot it, so this will be more than the one-liner I was hoping for! Thanks, James
On Fri, Aug 24, 2018 at 10:48:24AM +0100, James Morse wrote: > Why get avoid the layer stuff? Isn't counting DIMM/memory-devices what > EDAC_MC_LAYER_SLOT is for? Yap. > so edac_raw_mc_handle_error() has no clue where the error happened. (I haven't > read what it does with this information yet). See edac_inc_ce_error(), for example - it uses the layers which are not negative (-1) to increment the error counts of the respective layer. It all depends on what granularity of the hardware part you're reporting the error for: is it a DIMM rank, a whole DIMM or for a channel which can span multiple DIMM ranks. And so on... Look at some of the drivers and how they're doing that layering. It all depends on whether you can get the precise info from the hw. > ghes_edac_report_mem_error() does check CPER_MEM_VALID_MODULE_HANDLE, and if its > set, it uses the handle to find the bank/device strings and prints them out. Yap, and the error counts are lumped together into /sys/devices/system/edac/mc/mc*/ce_noinfo_count > Naively I thought we could generate some index during ghes_edac_count_dimms(), > and use this as e->${whichever}_layer. I hoped there would be something we could > already use as the index, but I can't spot it, so this will be more than the > one-liner I was hoping for! If you can get that info from the hardware and injecting an error into a DIMM gives you the correct DIMM number so that we can increment the proper counter, then you're golden. I don't think that works reliably on x86, though, therefore the lumping together.
Hi James, > Why get avoid the layer stuff? Isn't counting DIMM/memory-devices what > EDAC_MC_LAYER_SLOT is for? Borislav has explained it in his response. Here let me elaborate a little more. To use the layer information you need an accurate way to pinpoint each component in the layer and the parent components in the layers above. For example, to use EDAC_MC_LAYER_SLOT you also need information for the parent layer say EDAC_MC_LAYER_CHANNEL, or another layer on top say EDAC_MC_LAYER_BRANCH. There are no clear ways to get the information from SMBIOS table. In the case of "memory channel" we looked at type 37 which has the exact spelling but it was introduced to support RamBus and Synclink. Not sure we can readily use it for modern architecture concept of "channel/slot". I think it is good enough if we can pin each error to the corresponding DIMM. At the end of the day DIMMs are what customer can replace in the memory system and that's all that they care about. For the manufacturers of the board/chips they have the knowledge to map the specific DIMMs to the upper layer components, so they can easily collect error counter data for upper layers. > CPER's "Memory Error Record 2" thinks that "NODE, CARD and MODULE > should provide the information necessary to identify the failing FRU". As > EDAC has three 'levels', these are what they should correspond to for ghes- > edac. > > I assume NODE means rack/chassis in some distributed system. Lets ignore it > as it doesn't seem to map to anything in the SMBIOS table. How about type 4 "Processor Information"? > 'Card' doesn't mean much to me, but it maps to SMBIOS:17 "Memory Array > Structure", which the Memory Device structure also points to. > Card then must mean "a collection of memory devices (DIMMs) that operate > together to form an address space". > > This might be what I think of as a memory-controller, or it might be > something more complicated. Regardless, the CPER records think its relevant. Originally I thought "Card" were memory channel. But looking at the definition of "Card Handle" in CPER: "... this field contains the SMBIOS handle for the Type 16 Memory Array Structure that represents the memory card". So Card is memory controller or something similar to that. Right now ghes-edac assumes one mc. We probably need to map mc(s) to the type 16 instances in SMBIOS table. Thanks, Fan
On Fri, Aug 24, 2018 at 5:48 AM, James Morse <james.morse@arm.com> wrote: > On 23/08/18 16:46, Tyler Baicar wrote: >> On Thu, Aug 23, 2018 at 5:29 AM James Morse <james.morse@arm.com> wrote: >>> On 19/07/18 19:36, Tyler Baicar wrote: >>>> This seems pretty hacky to me, so if anyone has other suggestions please share >>>> them. >>> >>> CPER's "Memory Error Record 2" thinks that "NODE, CARD and MODULE should provide >>> the information necessary to identify the failing FRU". As EDAC has three >>> 'levels', these are what they should correspond to for ghes-edac. >>> >>> I assume NODE means rack/chassis in some distributed system. Lets ignore it as >>> it doesn't seem to map to anything in the SMBIOS table. >> >> I believe NODE should map to socket number for multi-socket systems. > > Isn't the Memory Array Structure still unique in a multi-socket system? If so > the node isn't telling us anything new. Yes, the Memory Array structure in SMBIOS is still unique, but the NODE value is needed in NODE, CARD, MODULE because the CARD number here typically maps to channel number which each socket has their own channel numbers. (i.e. socket 0 can have channel 0 and socket 1 can have a channel 0) > Do sockets show up in the SMBIOS table? We would need to know how many there are > in advance. For arm systems the cpu topology from PPTT is the best bet for this > information, but what do we do if that table is missing? (also, does firmware > count from 1 or 0?) I suspect we can't use this field unless we know what the > range of values is going to be in advance. An Fan mentioned in his response, what the customers really care about is mapping to a particular DIMM since that is what they can replace. To do this, the Memory Device handle should be enough since those are all unique regardless of Memory Array handle and which socket the DIMM is on. The Firmware I've worked with counts from 0, but I'm not sure if that is required. That won't matter if we just use the Memory Device handle. >> I think the proper way to get this working would be to use these handles. We can >> avoid populating this layer information and instead have a mapping of type 17 >> index number (how edac is numbering the DIMMs today) to the handle number. > > Why get avoid the layer stuff? Isn't counting DIMM/memory-devices what > EDAC_MC_LAYER_SLOT is for? The problem with the layer reporting is that you need to know all the layer information as Fan mentioned. SoCs can support multiple board combinations (ie 1DPC vs. 2DPC) and there is no standardized way of knowing whether you are booted on a 1DPC or 2DPC board. >> Then we will need a new function to increment the counter based on the handle >> number rather than this layer information. Is that how you are envisioning it? > > I'm not familiar with edac's internals, so I didn't have any particular vision! > > Isn't the problem that ghes_edac_report_mem_error() does this: > | e->top_layer = -1; > | e->mid_layer = -1; > | e->low_layer = -1; The other problem is that the sysfs nodes are all setup with a single layer representing all of the memory on the board. https://elixir.bootlin.com/linux/latest/source/drivers/edac/ghes_edac.c#L469 So the DIMM counters exposed in sysfs are all under a single memory controller and just numbered from 0 to n-1 based on the order in which the type 17 SMBIOS entries show up in the DMI walk. > so edac_raw_mc_handle_error() has no clue where the error happened. (I haven't > read what it does with this information yet). > > ghes_edac_report_mem_error() does check CPER_MEM_VALID_MODULE_HANDLE, and if its > set, it uses the handle to find the bank/device strings and prints them out. Yes, I think this is where we need to add support to increment the count based on that module handle. > Naively I thought we could generate some index during ghes_edac_count_dimms(), > and use this as e->${whichever}_layer. I hoped there would be something we could > already use as the index, but I can't spot it, so this will be more than the > one-liner I was hoping for! We could use what ghes_edac_register does by setting up a single layer with all memory and then keep a map of which module handle maps to which index into that layer. From that it would be easy to increment the proper sysfs exposed DIMM counters using the single layer (that way we can probably avoid the custom increment function I eluded to in my previous response). Thanks, Tyler
Hi Boris, On 24/08/18 13:01, Borislav Petkov wrote: > On Fri, Aug 24, 2018 at 10:48:24AM +0100, James Morse wrote: >> so edac_raw_mc_handle_error() has no clue where the error happened. (I haven't >> read what it does with this information yet). > > See edac_inc_ce_error(), for example - it uses the layers which are not > negative (-1) to increment the error counts of the respective layer. It > all depends on what granularity of the hardware part you're reporting > the error for: is it a DIMM rank, a whole DIMM or for a channel which > can span multiple DIMM ranks. And so on... > > Look at some of the drivers and how they're doing that layering. It all > depends on whether you can get the precise info from the hw. Hmmm, in this example we need the information from firmware, as that is where ghes-edac gets its information from. We already count the module/device/dimms in the smbios table, memory is described as 'EDAC_MC_LAYER_ALL_MEM' with num_dimms. I think all we're missing is which dimm in ghes_edac_report_mem_error(). We have the handle, we just need a number between 1 and num_dimms. If it turns out firmware doesn't populate the handles in its cper records, then we can keep e->enable_per_layer_report false when calling edac_raw_mc_handle_error(). (I suggest we ignore 'card', and just do this for the device/dimms). >> Naively I thought we could generate some index during ghes_edac_count_dimms(), >> and use this as e->${whichever}_layer. I hoped there would be something we could >> already use as the index, but I can't spot it, so this will be more than the >> one-liner I was hoping for! > > If you can get that info from the hardware and injecting an error into > a DIMM gives you the correct DIMM number so that we can increment the > proper counter, then you're golden. I don't think that works reliably on > x86, though, therefore the lumping together. ... 'correct DIMM number' ... Does x86 have another source of memory-topology information it needs to correlate smbios with? For arm there is nothing else describing the memory-topology, so as long as we can correlate the smbios table and ghes:cper records through the handles, we can get this working for all systems. Thanks, James
Hi Fan, On 24/08/18 15:30, wufan wrote: >> Why get avoid the layer stuff? Isn't counting DIMM/memory-devices what >> EDAC_MC_LAYER_SLOT is for? > > Borislav has explained it in his response. Here let me elaborate a little more. > To use the layer information you need an accurate way to pinpoint each component > in the layer and the parent components in the layers above. For example, to use > EDAC_MC_LAYER_SLOT you also need information for the parent layer say > EDAC_MC_LAYER_CHANNEL, or another layer on top say EDAC_MC_LAYER_BRANCH. I haven't spotted anything that forces a particular meaning/topology on these types. (there are four of them, but only three 'levels') i3000_edac.c has EDAC_MC_LAYER_CHIP_SELECT then EDAC_MC_LAYER_CHANNEL, but i5400_edac.c has EDAC_MC_LAYER_BRANCH then EDAC_MC_LAYER_CHANNEL. pnd2_edac.c agrees that channel comes before slot, but thunderx_edac.c just uses slot directly.... ghes-edac already describes memory as 'EDAC_MC_LAYER_ALL_MEM', with num_dimms, I think we just need to get 'the' dimm number. Using the cper-module-handle means we don't have to worry about firmware's count of dimms being different, as we both agree its smbios-type-17 we're talking about. > There > are no clear ways to get the information from SMBIOS table. In the case of "memory > channel" we looked at type 37 which has the exact spelling but it was introduced > to support RamBus and Synclink. Not sure we can readily use it for modern > architecture concept of "channel/slot". I think we should avoid this 'channel' thing as it means different things to different people! We can use ghes:card smbios:physical-memory-array as the UEFI spec tells us they are the same, and we don't actually need to know what they mean. > I think it is good enough if we can pin each error to the corresponding DIMM. > At the end of the day DIMMs are what customer can replace in the memory system > and that's all that they care about. For the manufacturers of the board/chips > they have the knowledge to map the specific DIMMs to the upper layer components, > so they can easily collect error counter data for upper layers. I agree. >> CPER's "Memory Error Record 2" thinks that "NODE, CARD and MODULE >> should provide the information necessary to identify the failing FRU". As >> EDAC has three 'levels', these are what they should correspond to for ghes- >> edac. >> >> I assume NODE means rack/chassis in some distributed system. Lets ignore it >> as it doesn't seem to map to anything in the SMBIOS table. > > How about type 4 "Processor Information"? As the spec doesn't tell us what the field means, we can't really do anything other than print the value out. >> 'Card' doesn't mean much to me, but it maps to SMBIOS:17 "Memory Array >> Structure", which the Memory Device structure also points to. >> Card then must mean "a collection of memory devices (DIMMs) that operate >> together to form an address space". >> >> This might be what I think of as a memory-controller, or it might be >> something more complicated. Regardless, the CPER records think its relevant. > > Originally I thought "Card" were memory channel. But looking at the definition > of "Card Handle" in CPER: "... this field contains the SMBIOS handle for the > Type 16 Memory Array Structure that represents the memory card". So Card is > memory controller or something similar to that. > Right now ghes-edac assumes > one mc. We probably need to map mc(s) to the type 16 instances in SMBIOS table. I think we should ignore 'mc's, and just report the dimm numbers. ghes-edac only cares about the number of dimms today, and this would work on systems that only describe the dimms in the smbios table. Thanks, James
Hi Tyler, On 24/08/18 16:14, Tyler Baicar wrote: > On Fri, Aug 24, 2018 at 5:48 AM, James Morse <james.morse@arm.com> wrote: >> On 23/08/18 16:46, Tyler Baicar wrote: >>> On Thu, Aug 23, 2018 at 5:29 AM James Morse <james.morse@arm.com> wrote: >>>> On 19/07/18 19:36, Tyler Baicar wrote: >>>>> This seems pretty hacky to me, so if anyone has other suggestions please share >>>>> them. >>>> >>>> CPER's "Memory Error Record 2" thinks that "NODE, CARD and MODULE should provide >>>> the information necessary to identify the failing FRU". As EDAC has three >>>> 'levels', these are what they should correspond to for ghes-edac. >>>> >>>> I assume NODE means rack/chassis in some distributed system. Lets ignore it as >>>> it doesn't seem to map to anything in the SMBIOS table. >>> >>> I believe NODE should map to socket number for multi-socket systems. >> >> Isn't the Memory Array Structure still unique in a multi-socket system? If so >> the node isn't telling us anything new. > Yes, the Memory Array structure in SMBIOS is still unique, but the NODE value > is needed in NODE, CARD, MODULE because the CARD number here typically > maps to channel number which each socket has their own channel numbers. /me flinches at 'typically'. Okay, so the hierarchy applies to the numbers, not the handles. How come there isn't a node handle? I think we should ignore the extra hierarchy. The module/devices/dimms are the only replaceable part, and we don't know if the firmware will provide the information. >> Do sockets show up in the SMBIOS table? We would need to know how many there are >> in advance. For arm systems the cpu topology from PPTT is the best bet for this >> information, but what do we do if that table is missing? (also, does firmware >> count from 1 or 0?) I suspect we can't use this field unless we know what the >> range of values is going to be in advance. > > An Fan mentioned in his response, what the customers really care about > is mapping to > a particular DIMM since that is what they can replace. To do this, the > Memory Device > handle should be enough since those are all unique regardless of > Memory Array handle > and which socket the DIMM is on. The Firmware I've worked with counts > from 0, but I'm > not sure if that is required. If the spec doesn't say, its difficult to know the range of values until we've seen one of the limits. > That won't matter if we just use the Memory Device handle. I agree. >>> I think the proper way to get this working would be to use these handles. We can >>> avoid populating this layer information and instead have a mapping of type 17 >>> index number (how edac is numbering the DIMMs today) to the handle number. >> >> Why get avoid the layer stuff? Isn't counting DIMM/memory-devices what >> EDAC_MC_LAYER_SLOT is for? > > The problem with the layer reporting is that you need to know all the > layer information as Fan mentioned. I haven't spotted what requires this, there seems to be a bit of a mix of numbers-of-layers and what order they go in. thunderx_edac.c just uses SLOT. I think we can get away with using EDAC_MC_LAYER_ALL_MEM, sized by num_dimms (which ghes-edac already does). Providing extra topology may not be useful unless the firmware populates this information. What do we do if we export card+module, but then firmware only populates the module-handle? > SoCs can support multiple board combinations (ie > 1DPC vs. 2DPC) > and there is no standardized way of knowing whether you are booted on a 1DPC or > 2DPC board. > >>> Then we will need a new function to increment the counter based on the handle >>> number rather than this layer information. Is that how you are envisioning it? >> >> I'm not familiar with edac's internals, so I didn't have any particular vision! >> >> Isn't the problem that ghes_edac_report_mem_error() does this: >> | e->top_layer = -1; >> | e->mid_layer = -1; >> | e->low_layer = -1; > > The other problem is that the sysfs nodes are all setup with a single > layer representing > all of the memory on the board. > > https://elixir.bootlin.com/linux/latest/source/drivers/edac/ghes_edac.c#L469 > > So the DIMM counters exposed in sysfs are all under a single memory > controller and just > numbered from 0 to n-1 based on the order in which the type 17 SMBIOS > entries show up > in the DMI walk. User-space depending on the dmi walk order makes me nervous. Doing this gives you per-dimm counters, if user-space needs to know which physical-dimm/slot that is, we'd need a way of matching it with one of the smbios location strings. >> so edac_raw_mc_handle_error() has no clue where the error happened. (I haven't >> read what it does with this information yet). >> >> ghes_edac_report_mem_error() does check CPER_MEM_VALID_MODULE_HANDLE, and if its >> set, it uses the handle to find the bank/device strings and prints them out. > > Yes, I think this is where we need to add support to increment the > count based on that module handle. If layer[0] is EDAC_MC_LAYER_ALL_MEM, sized for num_dimm, don't we just put the dimm number in e->top_layer and flip e->enable_per_layer_report? >> Naively I thought we could generate some index during ghes_edac_count_dimms(), >> and use this as e->${whichever}_layer. I hoped there would be something we could >> already use as the index, but I can't spot it, so this will be more than the >> one-liner I was hoping for! > > We could use what ghes_edac_register does by setting up a single layer > with all memory and > then keep a map of which module handle maps to which index into that > layer. From that it would > be easy to increment the proper sysfs exposed DIMM counters using the > single layer Yes, I think this is what we should do! Thanks, James
On Tue, Aug 28, 2018 at 1:11 PM, James Morse <james.morse@arm.com> wrote: > On 24/08/18 16:14, Tyler Baicar wrote: >> On Fri, Aug 24, 2018 at 5:48 AM, James Morse <james.morse@arm.com> wrote: >>> On 23/08/18 16:46, Tyler Baicar wrote: >>> so edac_raw_mc_handle_error() has no clue where the error happened. (I haven't >>> read what it does with this information yet). >>> >>> ghes_edac_report_mem_error() does check CPER_MEM_VALID_MODULE_HANDLE, and if its >>> set, it uses the handle to find the bank/device strings and prints them out. >> >> Yes, I think this is where we need to add support to increment the >> count based on that module handle. > > If layer[0] is EDAC_MC_LAYER_ALL_MEM, sized for num_dimm, don't we just put the > dimm number in e->top_layer and flip e->enable_per_layer_report? Yes, that is all we would need to do. Figuring out that DIMM number is the issue, but that can be done with the map of module handles to DIMM index. >>> Naively I thought we could generate some index during ghes_edac_count_dimms(), >>> and use this as e->${whichever}_layer. I hoped there would be something we could >>> already use as the index, but I can't spot it, so this will be more than the >>> one-liner I was hoping for! >> >> We could use what ghes_edac_register does by setting up a single layer >> with all memory and >> then keep a map of which module handle maps to which index into that >> layer. From that it would >> be easy to increment the proper sysfs exposed DIMM counters using the >> single layer > > Yes, I think this is what we should do! Sounds good, I'll start working on this! Thanks, Tyler
On Tue, Aug 28, 2018 at 06:09:24PM +0100, James Morse wrote: > Does x86 have another source of memory-topology information it needs to > correlate smbios with? Bah, pinpointing the DIMM on x86 is a mess. There's no reliable way to say which DIMM it is in certain cases (interleaving, mirrorring, ...) and it is all platform-dependent. So we do the layers to dump a memory location (node, memory controller, ....) so that we can at least limit the number of DIMMs the user needs to replace/try. In an ideal world, I'd like to be able to query the SPD chips on the DIMMs and build the topology and then when an error happens to say, "error in DIMM <silkscreen>" where silkscreen is what is written on the motherboard under the DIMM socket. But I don't see that happening any time soon... > For arm there is nothing else describing the memory-topology, so as long as we > can correlate the smbios table and ghes:cper records through the handles, we can > get this working for all systems. And then make sure vendors fill in the proper info in smbios. Because that's also a mess on x86.
Hi Boris, On 29/08/18 08:38, Borislav Petkov wrote: > On Tue, Aug 28, 2018 at 06:09:24PM +0100, James Morse wrote: >> Does x86 have another source of memory-topology information it needs to >> correlate smbios with? > > Bah, pinpointing the DIMM on x86 is a mess. There's no reliable way to > say which DIMM it is in certain cases (interleaving, mirrorring, ...) > and it is all platform-dependent. So we do the layers to dump a memory > location (node, memory controller, ....) so that we can at least limit > the number of DIMMs the user needs to replace/try. Right. I'd like ghes-edac to work in the same way for both architectures. I think this is best done by stuffing the dmi-handle in struct dimm_info during ghes_edac_dmidecode(), then populating the struct edac_raw_error_desc layers from the matching mci->dimms 'location'. For EDAC_MC_LAYER_ALL_MEM this boils down to a flat index, so pointer arithmetic on mci->dimms is an appropriate short cut. (We should probably 'FIXME: It shouldn't be hard to also fill the DIMM labels' at the same time so that no-one is tempted to interpret the edac:dimm-idx) > In an ideal world, I'd like to be able to query the SPD chips on the (oh, that can be done?) > DIMMs and build the topology and then when an error happens to say, > "error in DIMM <silkscreen>" where silkscreen is what is written on the > motherboard under the DIMM socket. > > But I don't see that happening any time soon... >> For arm there is nothing else describing the memory-topology, so as long as we >> can correlate the smbios table and ghes:cper records through the handles, we can >> get this working for all systems. > > And then make sure vendors fill in the proper info in smbios. Because that's > also a mess on x86. I got educated by the people who look after specifications last time I touched this [0]. SMBIOS tables are required by Arm's 'Server Base Boot Requirements', It lists the memory-device and physical-memory-array as required. I will drop them a note that we will be depending on the handle, and it should go on the list too... if its not populated on today's systems we can fall back to !e->enable_per_layer_report as we do today. Thanks, James [0] https://www.spinics.net/lists/arm-kernel/msg653133.html
On Wed, Aug 29, 2018 at 11:20:48AM +0100, James Morse wrote: > Right. I'd like ghes-edac to work in the same way for both architectures. > > I think this is best done by stuffing the dmi-handle in struct dimm_info during > ghes_edac_dmidecode(), then populating the struct edac_raw_error_desc layers > from the matching mci->dimms 'location'. > > For EDAC_MC_LAYER_ALL_MEM this boils down to a flat index, so pointer arithmetic > on mci->dimms is an appropriate short cut. It all sounds nice on paper but you should try it on a couple of machines first. See whether/how it actually works there. Also, this probably would need to not change x86 unless you wanna fix it there too. I'd think twice before I attempt such a thing though :) > (We should probably 'FIXME: It shouldn't be hard to also fill the DIMM labels' > at the same time so that no-one is tempted to interpret the edac:dimm-idx) See above. > > In an ideal world, I'd like to be able to query the SPD chips on the > > (oh, that can be done?) There was some talk initially and I've seen BIOS read SPD chips and showing DIMM info but I've heard the word "proprietary" a couple of times. Haven't dug any deeper though. > I got educated by the people who look after specifications last time I touched > this [0]. SMBIOS tables are required by Arm's 'Server Base Boot Requirements', > It lists the memory-device and physical-memory-array as required. It is always better to have certification on your side. Should make ARM vendors dance. > I will drop them a note that we will be depending on the handle, and it should > go on the list too... if its not populated on today's systems we can fall back > to !e->enable_per_layer_report as we do today. Right.
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c index 473aeec..e4c8b6e 100644 --- a/drivers/edac/ghes_edac.c +++ b/drivers/edac/ghes_edac.c @@ -213,9 +213,18 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err) strcpy(e->label, "unknown label"); e->msg = pvt->msg; e->other_detail = pvt->other_detail; - e->top_layer = -1; - e->mid_layer = -1; - e->low_layer = -1; + if ((IS_ENABLED(CONFIG_ARM) || IS_ENABLED(CONFIG_ARM64)) + && (mem_err->validation_bits & CPER_MEM_VALID_CARD) + && (mem_err->validation_bits & CPER_MEM_VALID_MODULE)) { + e->top_layer = mem_err->card; + e->mid_layer = mem_err->module; + e->low_layer = -1; + e->enable_per_layer_report = true; + } else { + e->top_layer = -1; + e->mid_layer = -1; + e->low_layer = -1; + } *pvt->other_detail = '\0'; *pvt->msg = '\0';
Enable per-layer error reporting for ARM systems so that the error counters are incremented per-DIMM. On ARM systems that use firmware first error handling it is understood that card=channel and module=DIMM on that channel. Populate that information and enable per layer error reporting for ARM systems so that the EDAC error counters are incremented based on DIMM number as per the SMBIOS table rather than just incrementing the noinfo counters on the memory controller. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> --- drivers/edac/ghes_edac.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-)