Message ID | 20220929232712.12202-14-Sergey.Semin@baikalelectronics.ru (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | EDAC/mc/synopsys: Various fixes and cleanups | expand |
On Fri, Sep 30, 2022 at 02:27:08AM +0300, Serge Semin wrote: > In case of the unique index allocation it's not that optimal to always > rely on the low-level device drivers (platform drivers), because they get > to start to implement either the same design pattern (for instance global > static MC counter) or may end-up with having non-unique index eventually > at runtime. Needless to say that having a generic unique index > allocation/tracking procedure will make code more readable and safer. I guess this is trying to say that the current memory controller index thing doesn't work. But why doesn't it work? It works just fine with the x86 drivers - there the memory controller number is the same as the node number where it is located so that works just fine. If that scheme cannot work on other systems, then I need to see an explanation why it cannot work first. > The suggested implementation is based on the kernel IDA infrastructure > exposed by the lib/idr.c driver with API described in linux/idr.h header > file. It's used to create an ID resource descriptor "mc_idr", which then > is utilized either to track the custom MC idx specified by EDAC LLDDs or > to allocate the next-free MC idx. This is talking about the "what" and not the "why". > A new special MC index is introduced here. It's defined by the > EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least > probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM > index is specified by the EDAC LLDD, the MC index will be either retrieved > from the MC device OF-node alias index ("mc[:number:]") or automatically > generated as the next-free MC index found by the ID allocation procedure. This is also talking about the "what" and not the "why". At best, what you're doing should be visible from the patch itself. Here's a longer explanation of how a commit message should look like: https://kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes Thx.
On Wed, Oct 12, 2022 at 07:29:22PM +0200, Borislav Petkov wrote: > On Fri, Sep 30, 2022 at 02:27:08AM +0300, Serge Semin wrote: > > In case of the unique index allocation it's not that optimal to always > > rely on the low-level device drivers (platform drivers), because they get > > to start to implement either the same design pattern (for instance global > > static MC counter) or may end-up with having non-unique index eventually > > at runtime. Needless to say that having a generic unique index > > allocation/tracking procedure will make code more readable and safer. > > I guess this is trying to say that the current memory controller index > thing doesn't work. But why doesn't it work? From what have you got this? I said that the current MC indexing approach wasn't that optimal (always relying on the low-level driver to allocate the index) because it caused having the same IDx allocation pattern re-implemented in the drivers. It can be avoided by the provided patch. The unified approach makes code indeed more readable in the platform drivers and safer since they didn't have to bother with more coding. See for instance the drivers with the static variable-based IDs allocation. It doesn't seem like these drivers bother with the detected DDR devices order. If so then the automatic IDs allocation is perfect for them. Note the static variable increment isn't atomic. Thus the ID allocation algorithm there is prone to races should the devices probe is run concurrently. > > It works just fine with the x86 drivers - there the memory controller > number is the same as the node number where it is located so that works > just fine. > > If that scheme cannot work on other systems, then I need to see an > explanation why it cannot work first. > > > The suggested implementation is based on the kernel IDA infrastructure > > exposed by the lib/idr.c driver with API described in linux/idr.h header > > file. It's used to create an ID resource descriptor "mc_idr", which then > > is utilized either to track the custom MC idx specified by EDAC LLDDs or > > to allocate the next-free MC idx. > > This is talking about the "what" and not the "why". > > > A new special MC index is introduced here. It's defined by the > > EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least > > probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM > > index is specified by the EDAC LLDD, the MC index will be either retrieved > > from the MC device OF-node alias index ("mc[:number:]") or automatically > > generated as the next-free MC index found by the ID allocation procedure. > > This is also talking about the "what" and not the "why". > > At best, what you're doing should be visible from the patch itself. > > Here's a longer explanation of how a commit message should look like: > > https://kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes Have you read it yourself? Here is a short excerpt from there: "Once the problem is established, describe what you are actually doing about it in technical detail. It's important to describe the change in plain English for the reviewer to verify that the code is behaving as you intend it to." So the "problem" is described in the first paragraph and the technical details in the later paragraphs. -Sergey > > Thx. > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Oct 12, 2022 at 11:01:54PM +0300, Serge Semin wrote: > The unified approach makes code indeed more readable in the platform > drivers and safer since they didn't have to bother with more coding. > See for instance the drivers with the static variable-based IDs > allocation. Which drivers? Concrete examples please. > Have you read it yourself? Yes. I even have improved it over the years. > Here is a short excerpt from there: > "Once the problem is established, describe what you are actually doing > about it in technical detail. It's important to describe the change > in plain English for the reviewer to verify that the code is behaving > as you intend it to." Maybe that part can be misunderstood: "describe what you're doing about it". That doesn't mean the text should explain what you're adding and how stuff is defined: "It's defined by the EDAC_AUTO_MC_NUM macro." I can see that from the diff. So let me try to explain to you what I'm expecting from commit messages in the EDAC tree: The commit message should explain *why* a change is being done so that, months, years from now, when you've gone on to do something else, people doing git archeology can actually figure out *why* this change was done. And the explanation in that commit message should be *complete* and should contain *all* necessary information to understand why this change was done. Your commit message is not explaining the problem. "In case of the unique index allocation it's not that optimal to always rely on the low-level device drivers (platform drivers)" That's your statement. That needs to have exact details so that people can look at that commit message, look at the code which *you* point them to in it and go, aha, that is the problem. "because they get to start to implement either the same design pattern (for instance global static MC counter) or may end-up with having non-unique index eventually at runtime." Who are they, exact pointers please. "The suggested implementation is based on the kernel IDA infrastructure exposed by the lib/idr.c driver with API described in linux/idr.h header file." That doesn't matter one bit for the change you're doing. You could have added it under the "---" line. "A new special MC index is introduced here. It's defined by the EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM index is specified by the EDAC LLDD, the MC index will be either retrieved from the MC device OF-node alias index ("mc[:number:]") or automatically generated as the next-free MC index found by the ID allocation procedure." Some of that paragraph should go over the function as a comment - not in the commit message as it pertains to what the function does and it would make a *lot* more sense there when someone tries to figure out what the function does instead of in the commit message. So, I'm still not convinced why do some EDAC drivers need unique MC identifiers, why the current scheme doesn't work and where it doesn't work.
On Wed, Oct 12, 2022 at 10:33:35PM +0200, Borislav Petkov wrote: > On Wed, Oct 12, 2022 at 11:01:54PM +0300, Serge Semin wrote: > "A new special MC index is introduced here. It's defined by the > EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least > probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM > index is specified by the EDAC LLDD, the MC index will be either retrieved > from the MC device OF-node alias index ("mc[:number:]") or automatically > generated as the next-free MC index found by the ID allocation procedure." Just curious.If I have an EDAC driver using EDAC_AUTO_MC_NUM, and I unload and reload the driver a bunch of times. Do I get the same memory controller numbers each reload, or get an ever increasing set each time? -Tony
On Wed, Oct 12, 2022 at 01:44:59PM -0700, Tony Luck wrote: > On Wed, Oct 12, 2022 at 10:33:35PM +0200, Borislav Petkov wrote: > > On Wed, Oct 12, 2022 at 11:01:54PM +0300, Serge Semin wrote: > > > "A new special MC index is introduced here. It's defined by the > > EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least > > probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM > > index is specified by the EDAC LLDD, the MC index will be either retrieved > > from the MC device OF-node alias index ("mc[:number:]") or automatically > > generated as the next-free MC index found by the ID allocation procedure." > > Just curious.If I have an EDAC driver using EDAC_AUTO_MC_NUM, and I > unload and reload the driver a bunch of times. Do I get the same memory > controller numbers each reload, or get an ever increasing set each time? You'll get the same numbers if no other device-driver allocated them (which we already established isn't relevant to EDAC since only one driver per system is supposed to work). In order to implement what you said I should have called idr_alloc_cyclic() instead. Here is what the suggested implementation implies: 1. If there is a particular ID assigned in the framework of the dts aliases (i.e. mcX = &ddr0) then that value will be selected for the controller. 2. If no particular ID is assigned but there are mcX-like aliases defined in DTS, then the next free id greater than the last mcX value will be assigned. 3. If none if the above is relevant the first free ID in the pool will be allocated. 4. On the driver unload the driver remove method will be called thus the allocated ID will be freed on the MC-controller de-registration. Thus on the next driver loading the freed IDs will be reused by the idr_alloc() method design. -Sergey > > -Tony
On Wed, Oct 12, 2022 at 10:33:35PM +0200, Borislav Petkov wrote: > On Wed, Oct 12, 2022 at 11:01:54PM +0300, Serge Semin wrote: > > The unified approach makes code indeed more readable in the platform > > drivers and safer since they didn't have to bother with more coding. > > See for instance the drivers with the static variable-based IDs > > allocation. > > Which drivers? Concrete examples please. See below. > > > Have you read it yourself? > > Yes. I even have improved it over the years. > > > Here is a short excerpt from there: > > "Once the problem is established, describe what you are actually doing > > about it in technical detail. It's important to describe the change > > in plain English for the reviewer to verify that the code is behaving > > as you intend it to." > > Maybe that part can be misunderstood: "describe what you're doing about > it". That doesn't mean the text should explain what you're adding and > how stuff is defined: "It's defined by the EDAC_AUTO_MC_NUM macro." I > can see that from the diff. > > So let me try to explain to you what I'm expecting from commit messages > in the EDAC tree: > > The commit message should explain *why* a change is being done so that, > months, years from now, when you've gone on to do something else, people > doing git archeology can actually figure out *why* this change was done. > > And the explanation in that commit message should be *complete* and > should contain *all* necessary information to understand why this change > was done. A level of completeness can be relative to each person. For all the years I've submitting the patches to the kernel I couldn't even remember the last request to elaborate my logs. In no means I want to say they were perfect. I could just be too immersed into the problem so thought that the provided text was descriptive enough especially for the subsystem maintainer. So to speak asking for more details would be more than enough. > > Your commit message is not explaining the problem. > > "In case of the unique index allocation it's not that optimal to always > rely on the low-level device drivers (platform drivers)" > > That's your statement. That needs to have exact details so that people > can look at that commit message, look at the code which *you* point them > to in it and go, aha, that is the problem. > > "because they get to start to implement either the same design pattern > (for instance global static MC counter) or may end-up with having > non-unique index eventually at runtime." > > Who are they, exact pointers please. So you need more details. You should have just asked. I can't read your mind after all. IMO the description was detailed enough to understand the problem. Anyway as I already said the current MC indexing approach wasn't that optimal (always relying on the low-level driver to allocate the index) because it caused having the same IDx allocation pattern re-implemented in the drivers. The brightest example is the drivers with the static variable-based IDs allocation. It doesn't seem like these drivers bother with the detected DDR devices order. If so then the automatic IDs allocation is perfect for them. Instead they can just pass the EDAC_AUTO_MC_NUM id to the edac_mc_alloc() method and drop the static-based pattern. Thus getting smaller and more readable drivers code. Moreover the variable increment isn't atomic. Thus the ID allocation algorithm there is prone to races should the devices probe is run concurrently. The last but not least there is no currently way to assign the controllers ID by means of the DTS file. The suggested patch provides such functionality by means of the DT aliases. If it describes the problem better I'll add the text to the patchlog on the next patchset re-spin. > > "The suggested implementation is based on the kernel IDA infrastructure > exposed by the lib/idr.c driver with API described in linux/idr.h header > file." > > That doesn't matter one bit for the change you're doing. You could have > added it under the "---" line. Ok. I'll drop it from the log. > > "A new special MC index is introduced here. It's defined by the > EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least > probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM > index is specified by the EDAC LLDD, the MC index will be either retrieved > from the MC device OF-node alias index ("mc[:number:]") or automatically > generated as the next-free MC index found by the ID allocation procedure." > > Some of that paragraph should go over the function as a comment - not in > the commit message as it pertains to what the function does and it would > make a *lot* more sense there when someone tries to figure out what the > function does instead of in the commit message. > IMO the function isn't that complex to add the comment there. The semantic can be easily enough inferred from the implementation. > So, I'm still not convinced why do some EDAC drivers need unique MC > identifiers, why the current scheme doesn't work and where it doesn't > work. Once again. I didn't say it didn't work. I said it wasn't optimal. Though in some circumstance it can misbehave for some drivers. Please see my response above. -Sergey > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Oct 13, 2022 at 01:30:39AM +0300, Serge Semin wrote: > A level of completeness can be relative to each person. For all the > years I've submitting the patches to the kernel I couldn't even > remember the last request to elaborate my logs. In no means I want to > say they were perfect. I could just be too immersed into the problem > so thought that the provided text was descriptive enough especially > for the subsystem maintainer. So to speak asking for more details > would be more than enough. Dude, are you even reading what I'm writing to you?! I don't care how immersed you were in the problem and who asked or didn't ask you to elaborate your logs. If you're submitting patches to the EDAC tree, those logs need to be complete and explain things sufficiently and exactly. Period. > So you need more details. You should have just asked. I can't read > your mind after all. And I can't read yours too. And I asked like three times already. And yet, you still are not giving me a concrete answer. I said "exact pointers please". That means, you point me to a driver and the *exact* *code* in there which you think is doing something which needs fixing. What you've given me again is the same spiel as before. So let me save you and me some time: your patches are not going anywhere until they explain the thing they're fixing properly and precisely. End of story.
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c index 24814839d885..634c41ea7804 100644 --- a/drivers/edac/edac_mc.c +++ b/drivers/edac/edac_mc.c @@ -29,6 +29,9 @@ #include <linux/edac.h> #include <linux/bitops.h> #include <linux/uaccess.h> +#include <linux/idr.h> +#include <linux/of.h> + #include <asm/page.h> #include "edac_mc.h" #include "edac_module.h" @@ -46,6 +49,7 @@ EXPORT_SYMBOL_GPL(edac_op_state); /* lock to memory controller's control array */ static DEFINE_MUTEX(mem_ctls_mutex); static LIST_HEAD(mc_devices); +static DEFINE_IDR(mc_idr); /* * Used to lock EDAC MC to just one module, avoiding two drivers e. g. @@ -493,7 +497,64 @@ void edac_mc_reset_delay_period(unsigned long value) mutex_unlock(&mem_ctls_mutex); } +/** + * edac_mc_alloc_id() - Allocate unique Memory Controller identifier + * + * @mci: pointer to the mci structure to allocate ID for + * + * Use edac_mc_free_id() to coherently free the MC identifier. + * + * .. note:: + * locking model: must be called with the mem_ctls_mutex lock held + * + * Returns: + * 0 on Success, or an error code on failure + */ +static int edac_mc_alloc_id(struct mem_ctl_info *mci) +{ + struct device_node *np = dev_of_node(mci->pdev); + int ret, min, max; + + if (mci->mc_idx == EDAC_AUTO_MC_NUM) { + ret = of_alias_get_id(np, "mc"); + if (ret >= 0) { + min = ret; + max = ret + 1; + } else { + min = of_alias_get_highest_id("mc"); + if (min >= 0) + min++; + else + min = 0; + + max = 0; + } + } else { + min = mci->mc_idx; + max = mci->mc_idx + 1; + } + + ret = idr_alloc(&mc_idr, mci, min, max, GFP_KERNEL); + if (ret < 0) + return ret == -ENOSPC ? -EBUSY : ret; + + mci->mc_idx = ret; + + return 0; +} +/** + * edac_mc_free_id() - Free Memory Controller identifier + * + * @mci: pointer to the mci structure to free ID from + * + * .. note:: + * locking model: must be called with the mem_ctls_mutex lock held + */ +static void edac_mc_free_id(struct mem_ctl_info *mci) +{ + idr_remove(&mc_idr, mci->mc_idx); +} /** * edac_mc_init_labels() - Initialize DIMM labels @@ -612,7 +673,8 @@ EXPORT_SYMBOL_GPL(edac_get_owner); int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci, const struct attribute_group **groups) { - int ret = -EINVAL; + int ret; + edac_dbg(0, "\n"); #ifdef CONFIG_EDAC_DEBUG @@ -649,20 +711,30 @@ int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci, goto fail0; } + ret = edac_mc_alloc_id(mci); + if (ret) { + edac_printk(KERN_ERR, EDAC_MC, "failed to allocate MC idx %u\n", + mci->mc_idx); + goto fail0; + } + edac_mc_init_labels(mci); - if (add_mc_to_global_list(mci)) - goto fail0; + if (add_mc_to_global_list(mci)) { + ret = -EINVAL; + goto fail1; + } /* set load time so that error rate can be tracked */ mci->start_time = jiffies; mci->bus = edac_get_sysfs_subsys(); - if (edac_create_sysfs_mci_device(mci, groups)) { + ret = edac_create_sysfs_mci_device(mci, groups); + if (ret) { edac_mc_printk(mci, KERN_WARNING, "failed to create sysfs device\n"); - goto fail1; + goto fail2; } if (mci->edac_check) { @@ -686,9 +758,12 @@ int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci, mutex_unlock(&mem_ctls_mutex); return 0; -fail1: +fail2: del_mc_from_global_list(mci); +fail1: + edac_mc_free_id(mci); + fail0: mutex_unlock(&mem_ctls_mutex); return ret; @@ -716,6 +791,8 @@ struct mem_ctl_info *edac_mc_del_mc(struct device *dev) if (del_mc_from_global_list(mci)) edac_mc_owner = NULL; + edac_mc_free_id(mci); + mutex_unlock(&mem_ctls_mutex); if (mci->edac_check) diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h index 881b00eadf7a..4b6676235b1b 100644 --- a/drivers/edac/edac_mc.h +++ b/drivers/edac/edac_mc.h @@ -23,6 +23,7 @@ #define _EDAC_MC_H_ #include <linux/kernel.h> +#include <linux/limits.h> #include <linux/types.h> #include <linux/module.h> #include <linux/spinlock.h> @@ -37,6 +38,9 @@ #include <linux/workqueue.h> #include <linux/edac.h> +/* Generate MC identifier automatically */ +#define EDAC_AUTO_MC_NUM UINT_MAX + #if PAGE_SHIFT < 20 #define PAGES_TO_MiB(pages) ((pages) >> (20 - PAGE_SHIFT)) #define MiB_TO_PAGES(mb) ((mb) << (20 - PAGE_SHIFT))
In case of the unique index allocation it's not that optimal to always rely on the low-level device drivers (platform drivers), because they get to start to implement either the same design pattern (for instance global static MC counter) or may end-up with having non-unique index eventually at runtime. Needless to say that having a generic unique index allocation/tracking procedure will make code more readable and safer. The suggested implementation is based on the kernel IDA infrastructure exposed by the lib/idr.c driver with API described in linux/idr.h header file. It's used to create an ID resource descriptor "mc_idr", which then is utilized either to track the custom MC idx specified by EDAC LLDDs or to allocate the next-free MC idx. A new special MC index is introduced here. It's defined by the EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM index is specified by the EDAC LLDD, the MC index will be either retrieved from the MC device OF-node alias index ("mc[:number:]") or automatically generated as the next-free MC index found by the ID allocation procedure. Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru> --- Note the approach implemented here has been partly ported from the SPI core driver using IDA to track/allocate SPI bus numbers. Link: https://elixir.bootlin.com/linux/latest/source/drivers/spi/spi.c#L2957 --- drivers/edac/edac_mc.c | 89 +++++++++++++++++++++++++++++++++++++++--- drivers/edac/edac_mc.h | 4 ++ 2 files changed, 87 insertions(+), 6 deletions(-)