diff mbox series

[RESEND,v3,13/17] EDAC/mc: Add MC unique index allocation procedure

Message ID 20220929232712.12202-14-Sergey.Semin@baikalelectronics.ru (mailing list archive)
State New, archived
Headers show
Series EDAC/mc/synopsys: Various fixes and cleanups | expand

Commit Message

Serge Semin Sept. 29, 2022, 11:27 p.m. UTC
In case of the unique index allocation it's not that optimal to always
rely on the low-level device drivers (platform drivers), because they get
to start to implement either the same design pattern (for instance global
static MC counter) or may end-up with having non-unique index eventually
at runtime. Needless to say that having a generic unique index
allocation/tracking procedure will make code more readable and safer.

The suggested implementation is based on the kernel IDA infrastructure
exposed by the lib/idr.c driver with API described in linux/idr.h header
file. It's used to create an ID resource descriptor "mc_idr", which then
is utilized either to track the custom MC idx specified by EDAC LLDDs or
to allocate the next-free MC idx.

A new special MC index is introduced here. It's defined by the
EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least
probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM
index is specified by the EDAC LLDD, the MC index will be either retrieved
from the MC device OF-node alias index ("mc[:number:]") or automatically
generated as the next-free MC index found by the ID allocation procedure.

Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>

---

Note the approach implemented here has been partly ported from the SPI
core driver using IDA to track/allocate SPI bus numbers.
Link: https://elixir.bootlin.com/linux/latest/source/drivers/spi/spi.c#L2957
---
 drivers/edac/edac_mc.c | 89 +++++++++++++++++++++++++++++++++++++++---
 drivers/edac/edac_mc.h |  4 ++
 2 files changed, 87 insertions(+), 6 deletions(-)

Comments

Borislav Petkov Oct. 12, 2022, 5:29 p.m. UTC | #1
On Fri, Sep 30, 2022 at 02:27:08AM +0300, Serge Semin wrote:
> In case of the unique index allocation it's not that optimal to always
> rely on the low-level device drivers (platform drivers), because they get
> to start to implement either the same design pattern (for instance global
> static MC counter) or may end-up with having non-unique index eventually
> at runtime. Needless to say that having a generic unique index
> allocation/tracking procedure will make code more readable and safer.

I guess this is trying to say that the current memory controller index
thing doesn't work. But why doesn't it work?

It works just fine with the x86 drivers - there the memory controller
number is the same as the node number where it is located so that works
just fine.

If that scheme cannot work on other systems, then I need to see an
explanation why it cannot work first.

> The suggested implementation is based on the kernel IDA infrastructure
> exposed by the lib/idr.c driver with API described in linux/idr.h header
> file. It's used to create an ID resource descriptor "mc_idr", which then
> is utilized either to track the custom MC idx specified by EDAC LLDDs or
> to allocate the next-free MC idx.

This is talking about the "what" and not the "why".

> A new special MC index is introduced here. It's defined by the
> EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least
> probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM
> index is specified by the EDAC LLDD, the MC index will be either retrieved
> from the MC device OF-node alias index ("mc[:number:]") or automatically
> generated as the next-free MC index found by the ID allocation procedure.

This is also talking about the "what" and not the "why".

At best, what you're doing should be visible from the patch itself.

Here's a longer explanation of how a commit message should look like:

https://kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

Thx.
Serge Semin Oct. 12, 2022, 8:01 p.m. UTC | #2
On Wed, Oct 12, 2022 at 07:29:22PM +0200, Borislav Petkov wrote:
> On Fri, Sep 30, 2022 at 02:27:08AM +0300, Serge Semin wrote:
> > In case of the unique index allocation it's not that optimal to always
> > rely on the low-level device drivers (platform drivers), because they get
> > to start to implement either the same design pattern (for instance global
> > static MC counter) or may end-up with having non-unique index eventually
> > at runtime. Needless to say that having a generic unique index
> > allocation/tracking procedure will make code more readable and safer.
> 

> I guess this is trying to say that the current memory controller index
> thing doesn't work. But why doesn't it work?

From what have you got this? I said that the current MC indexing
approach wasn't that optimal (always relying on the low-level driver
to allocate the index) because it caused having the same IDx
allocation pattern re-implemented in the drivers. It can be avoided by
the provided patch. The unified approach makes code indeed more
readable in the platform drivers and safer since they didn't have to
bother with more coding. See for instance the drivers with the
static variable-based IDs allocation. It doesn't seem like these
drivers bother with the detected DDR devices order. If so then the
automatic IDs allocation is perfect for them. Note the static variable
increment isn't atomic. Thus the ID allocation algorithm there is prone
to races should the devices probe is run concurrently.

> 
> It works just fine with the x86 drivers - there the memory controller
> number is the same as the node number where it is located so that works
> just fine.
> 
> If that scheme cannot work on other systems, then I need to see an
> explanation why it cannot work first.
> 
> > The suggested implementation is based on the kernel IDA infrastructure
> > exposed by the lib/idr.c driver with API described in linux/idr.h header
> > file. It's used to create an ID resource descriptor "mc_idr", which then
> > is utilized either to track the custom MC idx specified by EDAC LLDDs or
> > to allocate the next-free MC idx.
> 

> This is talking about the "what" and not the "why".
> 
> > A new special MC index is introduced here. It's defined by the
> > EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least
> > probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM
> > index is specified by the EDAC LLDD, the MC index will be either retrieved
> > from the MC device OF-node alias index ("mc[:number:]") or automatically
> > generated as the next-free MC index found by the ID allocation procedure.
> 
> This is also talking about the "what" and not the "why".
> 
> At best, what you're doing should be visible from the patch itself.
> 
> Here's a longer explanation of how a commit message should look like:
> 
> https://kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

Have you read it yourself? Here is a short excerpt from there:
"Once the problem is established, describe what you are actually doing
about it in technical detail.  It's important to describe the change
in plain English for the reviewer to verify that the code is behaving
as you intend it to."

So the "problem" is described in the first paragraph and the technical
details in the later paragraphs.

-Sergey

> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov Oct. 12, 2022, 8:33 p.m. UTC | #3
On Wed, Oct 12, 2022 at 11:01:54PM +0300, Serge Semin wrote:
> The unified approach makes code indeed more readable in the platform
> drivers and safer since they didn't have to bother with more coding.
> See for instance the drivers with the static variable-based IDs
> allocation.

Which drivers? Concrete examples please.

> Have you read it yourself? 

Yes. I even have improved it over the years.

> Here is a short excerpt from there:
> "Once the problem is established, describe what you are actually doing
> about it in technical detail.  It's important to describe the change
> in plain English for the reviewer to verify that the code is behaving
> as you intend it to."

Maybe that part can be misunderstood: "describe what you're doing about
it". That doesn't mean the text should explain what you're adding and
how stuff is defined: "It's defined by the EDAC_AUTO_MC_NUM macro." I
can see that from the diff.

So let me try to explain to you what I'm expecting from commit messages
in the EDAC tree:

The commit message should explain *why* a change is being done so that,
months, years from now, when you've gone on to do something else, people
doing git archeology can actually figure out *why* this change was done.

And the explanation in that commit message should be *complete* and
should contain *all* necessary information to understand why this change
was done.

Your commit message is not explaining the problem.

"In case of the unique index allocation it's not that optimal to always
rely on the low-level device drivers (platform drivers)"

That's your statement. That needs to have exact details so that people
can look at that commit message, look at the code which *you* point them
to in it and go, aha, that is the problem.

"because they get to start to implement either the same design pattern
(for instance global static MC counter) or may end-up with having
non-unique index eventually at runtime."

Who are they, exact pointers please.

"The suggested implementation is based on the kernel IDA infrastructure
exposed by the lib/idr.c driver with API described in linux/idr.h header
file."

That doesn't matter one bit for the change you're doing. You could have
added it under the "---" line.

"A new special MC index is introduced here. It's defined by the
EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least
probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM
index is specified by the EDAC LLDD, the MC index will be either retrieved
from the MC device OF-node alias index ("mc[:number:]") or automatically
generated as the next-free MC index found by the ID allocation procedure."

Some of that paragraph should go over the function as a comment - not in
the commit message as it pertains to what the function does and it would
make a *lot* more sense there when someone tries to figure out what the
function does instead of in the commit message.

So, I'm still not convinced why do some EDAC drivers need unique MC
identifiers, why the current scheme doesn't work and where it doesn't
work.
Tony Luck Oct. 12, 2022, 8:44 p.m. UTC | #4
On Wed, Oct 12, 2022 at 10:33:35PM +0200, Borislav Petkov wrote:
> On Wed, Oct 12, 2022 at 11:01:54PM +0300, Serge Semin wrote:

> "A new special MC index is introduced here. It's defined by the
> EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least
> probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM
> index is specified by the EDAC LLDD, the MC index will be either retrieved
> from the MC device OF-node alias index ("mc[:number:]") or automatically
> generated as the next-free MC index found by the ID allocation procedure."

Just curious.If I have an EDAC driver using EDAC_AUTO_MC_NUM, and I
unload and reload the driver a bunch of times. Do I get the same memory
controller numbers each reload, or get an ever increasing set each time?

-Tony
Serge Semin Oct. 12, 2022, 9:31 p.m. UTC | #5
On Wed, Oct 12, 2022 at 01:44:59PM -0700, Tony Luck wrote:
> On Wed, Oct 12, 2022 at 10:33:35PM +0200, Borislav Petkov wrote:
> > On Wed, Oct 12, 2022 at 11:01:54PM +0300, Serge Semin wrote:
> 
> > "A new special MC index is introduced here. It's defined by the
> > EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least
> > probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM
> > index is specified by the EDAC LLDD, the MC index will be either retrieved
> > from the MC device OF-node alias index ("mc[:number:]") or automatically
> > generated as the next-free MC index found by the ID allocation procedure."
> 

> Just curious.If I have an EDAC driver using EDAC_AUTO_MC_NUM, and I
> unload and reload the driver a bunch of times. Do I get the same memory
> controller numbers each reload, or get an ever increasing set each time?

You'll get the same numbers if no other device-driver allocated them
(which we already established isn't relevant to EDAC since only one
driver per system is supposed to work). In order to implement what you
said I should have called idr_alloc_cyclic() instead.

Here is what the suggested implementation implies:

1. If there is a particular ID assigned in the framework of the dts aliases
(i.e. mcX = &ddr0) then that value will be selected for the controller.
2. If no particular ID is assigned but there are mcX-like aliases
defined in DTS, then the next free id greater than the last mcX value
will be assigned.
3. If none if the above is relevant the first free ID in the pool will be
allocated.
4. On the driver unload the driver remove method will be called thus
the allocated ID will be freed on the MC-controller de-registration.
Thus on the next driver loading the freed IDs will be reused by the
idr_alloc() method design.

-Sergey

> 
> -Tony
Serge Semin Oct. 12, 2022, 10:30 p.m. UTC | #6
On Wed, Oct 12, 2022 at 10:33:35PM +0200, Borislav Petkov wrote:
> On Wed, Oct 12, 2022 at 11:01:54PM +0300, Serge Semin wrote:
> > The unified approach makes code indeed more readable in the platform
> > drivers and safer since they didn't have to bother with more coding.
> > See for instance the drivers with the static variable-based IDs
> > allocation.
> 
> Which drivers? Concrete examples please.

See below.

> 
> > Have you read it yourself? 
> 
> Yes. I even have improved it over the years.
> 
> > Here is a short excerpt from there:
> > "Once the problem is established, describe what you are actually doing
> > about it in technical detail.  It's important to describe the change
> > in plain English for the reviewer to verify that the code is behaving
> > as you intend it to."
> 
> Maybe that part can be misunderstood: "describe what you're doing about
> it". That doesn't mean the text should explain what you're adding and
> how stuff is defined: "It's defined by the EDAC_AUTO_MC_NUM macro." I
> can see that from the diff.
> 
> So let me try to explain to you what I'm expecting from commit messages
> in the EDAC tree:
> 

> The commit message should explain *why* a change is being done so that,
> months, years from now, when you've gone on to do something else, people
> doing git archeology can actually figure out *why* this change was done.
> 
> And the explanation in that commit message should be *complete* and
> should contain *all* necessary information to understand why this change
> was done.

A level of completeness can be relative to each person. For all the
years I've submitting the patches to the kernel I couldn't even
remember the last request to elaborate my logs. In no means I want to
say they were perfect. I could just be too immersed into the problem
so thought that the provided text was descriptive enough especially
for the subsystem maintainer. So to speak asking for more details
would be more than enough.

> 
> Your commit message is not explaining the problem.
> 
> "In case of the unique index allocation it's not that optimal to always
> rely on the low-level device drivers (platform drivers)"
> 

> That's your statement. That needs to have exact details so that people
> can look at that commit message, look at the code which *you* point them
> to in it and go, aha, that is the problem.
> 
> "because they get to start to implement either the same design pattern
> (for instance global static MC counter) or may end-up with having
> non-unique index eventually at runtime."
> 
> Who are they, exact pointers please.

So you need more details. You should have just asked. I can't read
your mind after all. IMO the description was detailed enough to
understand the problem. Anyway as I already said the current MC
indexing approach wasn't that optimal (always relying on the low-level
driver to allocate the index) because it caused having the same IDx
allocation pattern re-implemented in the drivers. The brightest
example is the drivers with the static variable-based IDs allocation.
It doesn't seem like these drivers bother with the detected DDR
devices order. If so then the automatic IDs allocation is perfect for
them. Instead they can just pass the EDAC_AUTO_MC_NUM id to the
edac_mc_alloc() method and drop the static-based pattern. Thus getting
smaller and more readable drivers code. Moreover the variable
increment isn't atomic. Thus the ID allocation algorithm there is
prone to races should the devices probe is run concurrently.

The last but not least there is no currently way to assign the
controllers ID by means of the DTS file. The suggested patch provides
such functionality by means of the DT aliases.

If it describes the problem better I'll add the text to the patchlog
on the next patchset re-spin.

> 
> "The suggested implementation is based on the kernel IDA infrastructure
> exposed by the lib/idr.c driver with API described in linux/idr.h header
> file."
> 
> That doesn't matter one bit for the change you're doing. You could have
> added it under the "---" line.

Ok. I'll drop it from the log.

> 
> "A new special MC index is introduced here. It's defined by the
> EDAC_AUTO_MC_NUM macro with a value specifically chosen as the least
> probable value used for the real MC index. In case if the EDAC_AUTO_MC_NUM
> index is specified by the EDAC LLDD, the MC index will be either retrieved
> from the MC device OF-node alias index ("mc[:number:]") or automatically
> generated as the next-free MC index found by the ID allocation procedure."
>
 
> Some of that paragraph should go over the function as a comment - not in
> the commit message as it pertains to what the function does and it would
> make a *lot* more sense there when someone tries to figure out what the
> function does instead of in the commit message.
> 

IMO the function isn't that complex to add the comment there. The
semantic can be easily enough inferred from the implementation. 

> So, I'm still not convinced why do some EDAC drivers need unique MC
> identifiers, why the current scheme doesn't work and where it doesn't
> work.

Once again. I didn't say it didn't work. I said it wasn't optimal.
Though in some circumstance it can misbehave for some drivers.
Please see my response above.

-Sergey

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov Oct. 13, 2022, 9:38 a.m. UTC | #7
On Thu, Oct 13, 2022 at 01:30:39AM +0300, Serge Semin wrote:
> A level of completeness can be relative to each person. For all the
> years I've submitting the patches to the kernel I couldn't even
> remember the last request to elaborate my logs. In no means I want to
> say they were perfect. I could just be too immersed into the problem
> so thought that the provided text was descriptive enough especially
> for the subsystem maintainer. So to speak asking for more details
> would be more than enough.

Dude, are you even reading what I'm writing to you?!

I don't care how immersed you were in the problem and who asked or
didn't ask you to elaborate your logs. If you're submitting patches
to the EDAC tree, those logs need to be complete and explain things
sufficiently and exactly. Period.

> So you need more details. You should have just asked. I can't read
> your mind after all.

And I can't read yours too. And I asked like three times already. And
yet, you still are not giving me a concrete answer.

I said "exact pointers please". That means, you point me to a driver and
the *exact* *code* in there which you think is doing something which
needs fixing.

What you've given me again is the same spiel as before.

So let me save you and me some time: your patches are not going anywhere
until they explain the thing they're fixing properly and precisely. End
of story.
diff mbox series

Patch

diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
index 24814839d885..634c41ea7804 100644
--- a/drivers/edac/edac_mc.c
+++ b/drivers/edac/edac_mc.c
@@ -29,6 +29,9 @@ 
 #include <linux/edac.h>
 #include <linux/bitops.h>
 #include <linux/uaccess.h>
+#include <linux/idr.h>
+#include <linux/of.h>
+
 #include <asm/page.h>
 #include "edac_mc.h"
 #include "edac_module.h"
@@ -46,6 +49,7 @@  EXPORT_SYMBOL_GPL(edac_op_state);
 /* lock to memory controller's control array */
 static DEFINE_MUTEX(mem_ctls_mutex);
 static LIST_HEAD(mc_devices);
+static DEFINE_IDR(mc_idr);
 
 /*
  * Used to lock EDAC MC to just one module, avoiding two drivers e. g.
@@ -493,7 +497,64 @@  void edac_mc_reset_delay_period(unsigned long value)
 	mutex_unlock(&mem_ctls_mutex);
 }
 
+/**
+ * edac_mc_alloc_id() - Allocate unique Memory Controller identifier
+ *
+ * @mci: pointer to the mci structure to allocate ID for
+ *
+ * Use edac_mc_free_id() to coherently free the MC identifier.
+ *
+ * .. note::
+ *	locking model: must be called with the mem_ctls_mutex lock held
+ *
+ * Returns:
+ *	0 on Success, or an error code on failure
+ */
+static int edac_mc_alloc_id(struct mem_ctl_info *mci)
+{
+	struct device_node *np = dev_of_node(mci->pdev);
+	int ret, min, max;
+
+	if (mci->mc_idx == EDAC_AUTO_MC_NUM) {
+		ret = of_alias_get_id(np, "mc");
+		if (ret >= 0) {
+			min = ret;
+			max = ret + 1;
+		} else {
+			min = of_alias_get_highest_id("mc");
+			if (min >= 0)
+				min++;
+			else
+				min = 0;
+
+			max = 0;
+		}
+	} else {
+		min = mci->mc_idx;
+		max = mci->mc_idx + 1;
+	}
+
+	ret = idr_alloc(&mc_idr, mci, min, max, GFP_KERNEL);
+	if (ret < 0)
+		return ret == -ENOSPC ? -EBUSY : ret;
+
+	mci->mc_idx = ret;
+
+	return 0;
+}
 
+/**
+ * edac_mc_free_id() - Free Memory Controller identifier
+ *
+ * @mci: pointer to the mci structure to free ID from
+ *
+ * .. note::
+ *	locking model: must be called with the mem_ctls_mutex lock held
+ */
+static void edac_mc_free_id(struct mem_ctl_info *mci)
+{
+	idr_remove(&mc_idr, mci->mc_idx);
+}
 
 /**
  * edac_mc_init_labels() - Initialize DIMM labels
@@ -612,7 +673,8 @@  EXPORT_SYMBOL_GPL(edac_get_owner);
 int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
 			       const struct attribute_group **groups)
 {
-	int ret = -EINVAL;
+	int ret;
+
 	edac_dbg(0, "\n");
 
 #ifdef CONFIG_EDAC_DEBUG
@@ -649,20 +711,30 @@  int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
 		goto fail0;
 	}
 
+	ret = edac_mc_alloc_id(mci);
+	if (ret) {
+		edac_printk(KERN_ERR, EDAC_MC, "failed to allocate MC idx %u\n",
+			    mci->mc_idx);
+		goto fail0;
+	}
+
 	edac_mc_init_labels(mci);
 
-	if (add_mc_to_global_list(mci))
-		goto fail0;
+	if (add_mc_to_global_list(mci)) {
+		ret = -EINVAL;
+		goto fail1;
+	}
 
 	/* set load time so that error rate can be tracked */
 	mci->start_time = jiffies;
 
 	mci->bus = edac_get_sysfs_subsys();
 
-	if (edac_create_sysfs_mci_device(mci, groups)) {
+	ret = edac_create_sysfs_mci_device(mci, groups);
+	if (ret) {
 		edac_mc_printk(mci, KERN_WARNING,
 			"failed to create sysfs device\n");
-		goto fail1;
+		goto fail2;
 	}
 
 	if (mci->edac_check) {
@@ -686,9 +758,12 @@  int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
 	mutex_unlock(&mem_ctls_mutex);
 	return 0;
 
-fail1:
+fail2:
 	del_mc_from_global_list(mci);
 
+fail1:
+	edac_mc_free_id(mci);
+
 fail0:
 	mutex_unlock(&mem_ctls_mutex);
 	return ret;
@@ -716,6 +791,8 @@  struct mem_ctl_info *edac_mc_del_mc(struct device *dev)
 	if (del_mc_from_global_list(mci))
 		edac_mc_owner = NULL;
 
+	edac_mc_free_id(mci);
+
 	mutex_unlock(&mem_ctls_mutex);
 
 	if (mci->edac_check)
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
index 881b00eadf7a..4b6676235b1b 100644
--- a/drivers/edac/edac_mc.h
+++ b/drivers/edac/edac_mc.h
@@ -23,6 +23,7 @@ 
 #define _EDAC_MC_H_
 
 #include <linux/kernel.h>
+#include <linux/limits.h>
 #include <linux/types.h>
 #include <linux/module.h>
 #include <linux/spinlock.h>
@@ -37,6 +38,9 @@ 
 #include <linux/workqueue.h>
 #include <linux/edac.h>
 
+/* Generate MC identifier automatically */
+#define EDAC_AUTO_MC_NUM	UINT_MAX
+
 #if PAGE_SHIFT < 20
 #define PAGES_TO_MiB(pages)	((pages) >> (20 - PAGE_SHIFT))
 #define MiB_TO_PAGES(mb)	((mb) << (20 - PAGE_SHIFT))