diff mbox series

[1/9] of: Add warpper function of_find_node_by_name_balanced()

Message ID 20250207013117.104205-2-zhangzekun11@huawei.com (mailing list archive)
State New
Headers show
Series Add wrapper function of_find_node_by_name_balanced() | expand

Commit Message

zhangzekun (A) Feb. 7, 2025, 1:31 a.m. UTC
There are many drivers use of_find_node_by_name() with a not-NULL
device_node pointer, and a number of callers would require a call to
of_node_get() before using it. There are also some drivers who forget
to call of_node_get() which would cause a ref count leak[1]. So, Add a
wraper function for of_find_node_by_name(), drivers may use this function
to call of_find_node_by_name() with the refcount already balanced.

[1] https://lore.kernel.org/all/20241024015909.58654-1-zhangzekun11@huawei.com/

Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
---
 include/linux/of.h | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Oleksij Rempel Feb. 7, 2025, 8:24 a.m. UTC | #1
On Fri, Feb 07, 2025 at 09:31:09AM +0800, Zhang Zekun wrote:
> There are many drivers use of_find_node_by_name() with a not-NULL
> device_node pointer, and a number of callers would require a call to
> of_node_get() before using it. There are also some drivers who forget
> to call of_node_get() which would cause a ref count leak[1]. So, Add a
> wraper function for of_find_node_by_name(), drivers may use this function
> to call of_find_node_by_name() with the refcount already balanced.
> 
> [1] https://lore.kernel.org/all/20241024015909.58654-1-zhangzekun11@huawei.com/

Hi Zhang Zekun,

thank you for working on this issue!

First of all, let's take a step back and analyze the initial problem.
Everything following is only my opinion...

The main issue I see is that the current API - of_find_node_by_name -
modifies the refcount of its input by calling of_node_put(from) as part
of its search. Typically, a "find" function is expected to treat its
input as read-only. That is, when you pass an object into such a
function, you expect its reference count to remain unchanged unless
ownership is explicitly transferred. In this case, lowering the refcount
on the input node is counterintuitive and already lead to unexpected
behavior and subtle bugs.

To address this, the workaround introduces a wrapper function,
of_find_node_by_name_balanced, which first increments the input’s
refcount (via of_node_get()) before calling the original function. While
this "balances" the refcount change, the naming remains problematic from
my perspective. The "_balanced" suffix isn’t part of our common naming
conventions (traditions? :)). Most drivers expect that a function
starting with "find" will not alter the reference count of its input.
The term "balanced" doesn’t clearly convey that the input's refcount is
being explicitly managed - it instead obscures the underlying behavior,
leaving many developers confused about what guarantees the API provides.

In my view, a more natural solution would be to redesign the API so that
it doesn’t modify the input object’s refcount at all. Instead, it should
solely increase the refcount of the returned node (if found) for safe
asynchronous usage. This approach would align with established
conventions where "find" implies no side effects on inputs or output,
and a "get" indicates that the output comes with an extra reference. For
example, a function named of_get_node_by_name would clearly signal that
only the returned node is subject to a refcount increase while leaving
the input intact.

Thus, while the current workaround "balances" the reference count, it
doesn't address the underlying design flaw. The naming still suggests a
"find" function that should leave the input untouched, which isn’t the
case here. A redesign of the API - with both the behavior and naming
aligned to common expectations - would be a clearer and more robust
solution.

Nevertheless, it is only my POV, and the final decision rests with the
OpenFirmware framework maintainers.

Best Regards,
Oleksij
Laurent Pinchart Feb. 7, 2025, 8:57 a.m. UTC | #2
Hi Oleksij,

On Fri, Feb 07, 2025 at 09:24:10AM +0100, Oleksij Rempel wrote:
> On Fri, Feb 07, 2025 at 09:31:09AM +0800, Zhang Zekun wrote:
> > There are many drivers use of_find_node_by_name() with a not-NULL
> > device_node pointer, and a number of callers would require a call to
> > of_node_get() before using it. There are also some drivers who forget
> > to call of_node_get() which would cause a ref count leak[1]. So, Add a
> > wraper function for of_find_node_by_name(), drivers may use this function
> > to call of_find_node_by_name() with the refcount already balanced.
> > 
> > [1] https://lore.kernel.org/all/20241024015909.58654-1-zhangzekun11@huawei.com/
> 
> Hi Zhang Zekun,
> 
> thank you for working on this issue!
> 
> First of all, let's take a step back and analyze the initial problem.
> Everything following is only my opinion...
> 
> The main issue I see is that the current API - of_find_node_by_name -
> modifies the refcount of its input by calling of_node_put(from) as part
> of its search. Typically, a "find" function is expected to treat its
> input as read-only. That is, when you pass an object into such a
> function, you expect its reference count to remain unchanged unless
> ownership is explicitly transferred. In this case, lowering the refcount
> on the input node is counterintuitive and already lead to unexpected
> behavior and subtle bugs.
> 
> To address this, the workaround introduces a wrapper function,
> of_find_node_by_name_balanced, which first increments the input’s
> refcount (via of_node_get()) before calling the original function. While
> this "balances" the refcount change, the naming remains problematic from
> my perspective. The "_balanced" suffix isn’t part of our common naming
> conventions (traditions? :)). Most drivers expect that a function
> starting with "find" will not alter the reference count of its input.
> The term "balanced" doesn’t clearly convey that the input's refcount is
> being explicitly managed - it instead obscures the underlying behavior,
> leaving many developers confused about what guarantees the API provides.
> 
> In my view, a more natural solution would be to redesign the API so that
> it doesn’t modify the input object’s refcount at all. Instead, it should
> solely increase the refcount of the returned node (if found) for safe
> asynchronous usage. This approach would align with established
> conventions where "find" implies no side effects on inputs or output,
> and a "get" indicates that the output comes with an extra reference. For
> example, a function named of_get_node_by_name would clearly signal that
> only the returned node is subject to a refcount increase while leaving
> the input intact.
> 
> Thus, while the current workaround "balances" the reference count, it
> doesn't address the underlying design flaw. The naming still suggests a
> "find" function that should leave the input untouched, which isn’t the
> case here. A redesign of the API - with both the behavior and naming
> aligned to common expectations - would be a clearer and more robust
> solution.
> 
> Nevertheless, it is only my POV, and the final decision rests with the
> OpenFirmware framework maintainers.

I agree overall that the naming is not optimal. Looking at the other
patches in the series, I think at least some of them misuse
of_find_node_by_name(). For instance, drivers/media/i2c/max9286.c calls
the function to find a *child* node of the device's of_node named
"i2c-mux", while of_find_node_by_name() looks at children first but will
then walk the *whole* DT to find a named node. I haven't checked all
patches, but other ones seem to suffer from the same misuse.

Assuming that the named node those drivers are looking for is a direct
child of the node passed as argument to of_find_node_by_name(), the
right fix would tbe to use of_get_child_by_name(). If it's not a direct
child, a recursive version of of_get_child_by_name() could be useful.
zhangzekun (A) Feb. 7, 2025, 11:28 a.m. UTC | #3
在 2025/2/7 16:24, Oleksij Rempel 写道:
> On Fri, Feb 07, 2025 at 09:31:09AM +0800, Zhang Zekun wrote:
>> There are many drivers use of_find_node_by_name() with a not-NULL
>> device_node pointer, and a number of callers would require a call to
>> of_node_get() before using it. There are also some drivers who forget
>> to call of_node_get() which would cause a ref count leak[1]. So, Add a
>> wraper function for of_find_node_by_name(), drivers may use this function
>> to call of_find_node_by_name() with the refcount already balanced.
>>
>> [1] https://lore.kernel.org/all/20241024015909.58654-1-zhangzekun11@huawei.com/
> 
> Hi Zhang Zekun,
> 
> thank you for working on this issue!
> 
> First of all, let's take a step back and analyze the initial problem.
> Everything following is only my opinion...
> 
> The main issue I see is that the current API - of_find_node_by_name -
> modifies the refcount of its input by calling of_node_put(from) as part
> of its search. Typically, a "find" function is expected to treat its
> input as read-only. That is, when you pass an object into such a
> function, you expect its reference count to remain unchanged unless
> ownership is explicitly transferred. In this case, lowering the refcount
> on the input node is counterintuitive and already lead to unexpected
> behavior and subtle bugs.
> 
> To address this, the workaround introduces a wrapper function,
> of_find_node_by_name_balanced, which first increments the input’s
> refcount (via of_node_get()) before calling the original function. While
> this "balances" the refcount change, the naming remains problematic from
> my perspective. The "_balanced" suffix isn’t part of our common naming
> conventions (traditions? :)). Most drivers expect that a function
> starting with "find" will not alter the reference count of its input.
> The term "balanced" doesn’t clearly convey that the input's refcount is
> being explicitly managed - it instead obscures the underlying behavior,
> leaving many developers confused about what guarantees the API provides.
> 
> In my view, a more natural solution would be to redesign the API so that
> it doesn’t modify the input object’s refcount at all. Instead, it should
> solely increase the refcount of the returned node (if found) for safe
> asynchronous usage. This approach would align with established
> conventions where "find" implies no side effects on inputs or output,
> and a "get" indicates that the output comes with an extra reference. For
> example, a function named of_get_node_by_name would clearly signal that
> only the returned node is subject to a refcount increase while leaving
> the input intact.
> 
> Thus, while the current workaround "balances" the reference count, it
> doesn't address the underlying design flaw. The naming still suggests a
> "find" function that should leave the input untouched, which isn’t the
> case here. A redesign of the API - with both the behavior and naming
> aligned to common expectations - would be a clearer and more robust
> solution.
> 
> Nevertheless, it is only my POV, and the final decision rests with the
> OpenFirmware framework maintainers.
> 
> Best Regards,
> Oleksij

Hi, Oleksij,

Thanks for your review. I think redesign the API would be a fundamental 
way to fix this issue, but it would cause impact for current users who 
knows the exact functionality of of_find_node_by_name(), which need to 
put the "from" before calling it (I can't make the assumption that all 
of drivers calling of_find_node_by_name() with a not-NULL "from" 
pointer, but not calling of_node_get() before is misuse). The basic idea 
for adding a new function is try not to impact current users. For users 
who are confused about the "_balanced" suffix,the comments of 
of_find_node_by_name() explains why this interface exists.

Thanks,
Zekun
Laurent Pinchart Feb. 7, 2025, 3:37 p.m. UTC | #4
Hi Zekun,

On Fri, Feb 07, 2025 at 07:28:23PM +0800, zhangzekun (A) wrote:
> 在 2025/2/7 16:24, Oleksij Rempel 写道:
> > On Fri, Feb 07, 2025 at 09:31:09AM +0800, Zhang Zekun wrote:
> >> There are many drivers use of_find_node_by_name() with a not-NULL
> >> device_node pointer, and a number of callers would require a call to
> >> of_node_get() before using it. There are also some drivers who forget
> >> to call of_node_get() which would cause a ref count leak[1]. So, Add a
> >> wraper function for of_find_node_by_name(), drivers may use this function
> >> to call of_find_node_by_name() with the refcount already balanced.
> >>
> >> [1] https://lore.kernel.org/all/20241024015909.58654-1-zhangzekun11@huawei.com/
> > 
> > Hi Zhang Zekun,
> > 
> > thank you for working on this issue!
> > 
> > First of all, let's take a step back and analyze the initial problem.
> > Everything following is only my opinion...
> > 
> > The main issue I see is that the current API - of_find_node_by_name -
> > modifies the refcount of its input by calling of_node_put(from) as part
> > of its search. Typically, a "find" function is expected to treat its
> > input as read-only. That is, when you pass an object into such a
> > function, you expect its reference count to remain unchanged unless
> > ownership is explicitly transferred. In this case, lowering the refcount
> > on the input node is counterintuitive and already lead to unexpected
> > behavior and subtle bugs.
> > 
> > To address this, the workaround introduces a wrapper function,
> > of_find_node_by_name_balanced, which first increments the input’s
> > refcount (via of_node_get()) before calling the original function. While
> > this "balances" the refcount change, the naming remains problematic from
> > my perspective. The "_balanced" suffix isn’t part of our common naming
> > conventions (traditions? :)). Most drivers expect that a function
> > starting with "find" will not alter the reference count of its input.
> > The term "balanced" doesn’t clearly convey that the input's refcount is
> > being explicitly managed - it instead obscures the underlying behavior,
> > leaving many developers confused about what guarantees the API provides.
> > 
> > In my view, a more natural solution would be to redesign the API so that
> > it doesn’t modify the input object’s refcount at all. Instead, it should
> > solely increase the refcount of the returned node (if found) for safe
> > asynchronous usage. This approach would align with established
> > conventions where "find" implies no side effects on inputs or output,
> > and a "get" indicates that the output comes with an extra reference. For
> > example, a function named of_get_node_by_name would clearly signal that
> > only the returned node is subject to a refcount increase while leaving
> > the input intact.
> > 
> > Thus, while the current workaround "balances" the reference count, it
> > doesn't address the underlying design flaw. The naming still suggests a
> > "find" function that should leave the input untouched, which isn’t the
> > case here. A redesign of the API - with both the behavior and naming
> > aligned to common expectations - would be a clearer and more robust
> > solution.
> > 
> > Nevertheless, it is only my POV, and the final decision rests with the
> > OpenFirmware framework maintainers.
> > 
> > Best Regards,
> > Oleksij
> 
> Hi, Oleksij,
> 
> Thanks for your review. I think redesign the API would be a fundamental 
> way to fix this issue, but it would cause impact for current users who 
> knows the exact functionality of of_find_node_by_name(), which need to 
> put the "from" before calling it (I can't make the assumption that all 
> of drivers calling of_find_node_by_name() with a not-NULL "from" 
> pointer, but not calling of_node_get() before is misuse). The basic idea 
> for adding a new function is try not to impact current users. For users 
> who are confused about the "_balanced" suffix,the comments of 
> of_find_node_by_name() explains why this interface exists.

I think we all agree that of_find_node_by_name() is miused, and that it
shows the API isn't optimal. What we have different opinions on is how
to make the API less error-prone. I think adding a new
of_find_node_by_name_balanced() function works around the issue and
doesn't improve the situation much, I would argue it makes things even
more confusing.

We have only 20 calls to of_find_node_by_name() with a non-NULL first
argument in v6.14-rc1:

arch/powerpc/platforms/chrp/pci.c:      rtas = of_find_node_by_name (root, "rtas");

The 'root' variable here is the result of a call to
'of_find_node_by_path("/")', so I think we could pass a null pointer
instead to simplify things.

arch/powerpc/platforms/powermac/pic.c:          slave = of_find_node_by_name(master, "mac-io");

Here I believe of_find_node_by_name() is called to find a *child* node
of 'master'. of_find_node_by_name() is the wrong function for that.

arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(rootnp, "GAISLER_IRQMP");
arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(rootnp, "01_00d");
arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(nnp, "GAISLER_GPTIMER");
arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(nnp, "01_011");

Here too the code seems to be looking for child nodes only (but I
couldn't find a DT example or binding in-tree, so I'm not entirely
sure).

drivers/clk/ti/clk.c:   return of_find_node_by_name(from, tmp);

Usage here seems correct, the reference-count decrement is intended.

drivers/media/i2c/max9286.c:    i2c_mux = of_find_node_by_name(dev->of_node, "i2c-mux");
drivers/media/platform/qcom/venus/core.c:       enp = of_find_node_by_name(dev->of_node, node_name);
drivers/net/dsa/bcm_sf2.c:      ports = of_find_node_by_name(dn, "ports");
drivers/net/dsa/hirschmann/hellcreek_ptp.c:     leds = of_find_node_by_name(hellcreek->dev->of_node, "leds");
drivers/net/ethernet/broadcom/asp2/bcmasp.c:    ports_node = of_find_node_by_name(dev->of_node, "ethernet-ports");
drivers/net/ethernet/marvell/prestera/prestera_main.c:  ports = of_find_node_by_name(sw->np, "ports");
drivers/net/pse-pd/tps23881.c:  channels_node = of_find_node_by_name(priv->np, "channels");
drivers/regulator/scmi-regulator.c:     np = of_find_node_by_name(handle->dev->of_node, "regulators");
drivers/regulator/tps6594-regulator.c:          np = of_find_node_by_name(tps->dev->of_node, multi_regs[multi].supply_name);

Incorrect usage, as far as I understand all those drivers are looking
for child nodes only.

drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest16");
drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest17");
drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest18");
drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest19");

Here too I think only child nodes are meant to be considered.

of_find_node_by_name() is very much misused as most callers want to find
child nodes, while of_find_node_by_name() will walk the whole DT from a
given starting point.

I think the right fix here is to

- Replace of_find_node_by_name(root, ...) with
  of_find_node_by_name(NULL, ...) in arch/powerpc/platforms/chrp/pci.c
  (if my understanding of the code is correct).

- Replace of_find_node_by_name() with of_get_child_by_name() in callers
  that need to search immediate children only (I expected that to be the
  majority of the above call sites).

- If there are other callers that need to find indirect children,
  introduce a new of_get_child_by_name_recursive() function.

At that point, the only remaining caller of of_find_node_by_name()
(beside its usage in the for_each_node_by_name() macro) will be
drivers/clk/ti/clk.c, which uses the function correctly.

I'm tempted to then rename of_find_node_by_name() to
__of_find_node_by_name() to indicate it's an internal function not meant
to be called except in special cases. It could all be renamed to
__of_find_next_node_by_name() to make its behaviour clearer.
Dan Carpenter Feb. 8, 2025, 4:18 a.m. UTC | #5
On Fri, Feb 07, 2025 at 05:37:22PM +0200, Laurent Pinchart wrote:
> I'm tempted to then rename of_find_node_by_name() to
> __of_find_node_by_name() to indicate it's an internal function not meant
> to be called except in special cases. It could all be renamed to
> __of_find_next_node_by_name() to make its behaviour clearer.
> 

Adding "next" to the name would help a lot.

Joe Hattori was finding some of these bugs using his static checker.
We could easily write something really specific to find this sort of
bug using Smatch.  If you have ideas like this feel free to ask on
smatch@vger.kernel.org.  It doesn't find anything that your grep
didn't find but any new bugs will be detected when they're introduced.

regards,
dan carpenter
drivers/net/ethernet/broadcom/asp2/bcmasp.c:1370 bcmasp_probe() warn: 'dev->of_node' was not incremented
drivers/net/pse-pd/tps23881.c:505 tps23881_get_of_channels() warn: 'priv->np' was not incremented
drivers/media/platform/qcom/venus/core.c:301 venus_add_video_core() warn: 'dev->of_node' was not incremented
drivers/regulator/tps6594-regulator.c:618 tps6594_regulator_probe() warn: 'tps->dev->of_node' was not incremented
zhangzekun (A) Feb. 10, 2025, 6:47 a.m. UTC | #6
Hi, Laurent,

> I think we all agree that of_find_node_by_name() is miused, and that it
> shows the API isn't optimal. What we have different opinions on is how
> to make the API less error-prone. I think adding a new
> of_find_node_by_name_balanced() function works around the issue and
> doesn't improve the situation much, I would argue it makes things even
> more confusing.
> 
> We have only 20 calls to of_find_node_by_name() with a non-NULL first
> argument in v6.14-rc1:
> 
> arch/powerpc/platforms/chrp/pci.c:      rtas = of_find_node_by_name (root, "rtas");
> 
> The 'root' variable here is the result of a call to
> 'of_find_node_by_path("/")', so I think we could pass a null pointer
> instead to simplify things.
> 
> arch/powerpc/platforms/powermac/pic.c:          slave = of_find_node_by_name(master, "mac-io");
> 
> Here I believe of_find_node_by_name() is called to find a *child* node
> of 'master'. of_find_node_by_name() is the wrong function for that.
> 
> arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(rootnp, "GAISLER_IRQMP");
> arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(rootnp, "01_00d");
> arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(nnp, "GAISLER_GPTIMER");
> arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(nnp, "01_011");
> 
> Here too the code seems to be looking for child nodes only (but I
> couldn't find a DT example or binding in-tree, so I'm not entirely
> sure).
> 
> drivers/clk/ti/clk.c:   return of_find_node_by_name(from, tmp);
> 
> Usage here seems correct, the reference-count decrement is intended.
> 
> drivers/media/i2c/max9286.c:    i2c_mux = of_find_node_by_name(dev->of_node, "i2c-mux");
> drivers/media/platform/qcom/venus/core.c:       enp = of_find_node_by_name(dev->of_node, node_name);
> drivers/net/dsa/bcm_sf2.c:      ports = of_find_node_by_name(dn, "ports");
> drivers/net/dsa/hirschmann/hellcreek_ptp.c:     leds = of_find_node_by_name(hellcreek->dev->of_node, "leds");
> drivers/net/ethernet/broadcom/asp2/bcmasp.c:    ports_node = of_find_node_by_name(dev->of_node, "ethernet-ports");
> drivers/net/ethernet/marvell/prestera/prestera_main.c:  ports = of_find_node_by_name(sw->np, "ports");
> drivers/net/pse-pd/tps23881.c:  channels_node = of_find_node_by_name(priv->np, "channels");
> drivers/regulator/scmi-regulator.c:     np = of_find_node_by_name(handle->dev->of_node, "regulators");
> drivers/regulator/tps6594-regulator.c:          np = of_find_node_by_name(tps->dev->of_node, multi_regs[multi].supply_name);
> 
> Incorrect usage, as far as I understand all those drivers are looking
> for child nodes only.
> 
> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest16");
> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest17");
> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest18");
> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest19");
> 
> Here too I think only child nodes are meant to be considered.
> 
> of_find_node_by_name() is very much misused as most callers want to find
> child nodes, while of_find_node_by_name() will walk the whole DT from a
> given starting point.
> 
> I think the right fix here is to
> 
> - Replace of_find_node_by_name(root, ...) with
>    of_find_node_by_name(NULL, ...) in arch/powerpc/platforms/chrp/pci.c
>    (if my understanding of the code is correct).

For arch/powerpc/platforms/chrp/pci.c, noticing that there is a comment 
in setup_peg2():
  /* keep the reference to the root node */

It might can not be convert to of_find_node_by_name(NULL, ...), and the 
origin use of of_find_node_by_name() put the ref count which want to be 
kept.

> 
> - Replace of_find_node_by_name() with of_get_child_by_name() in callers
>    that need to search immediate children only (I expected that to be the
>    majority of the above call sites)
Since there is no enough information about these DT nodes, it would take 
time to prove if it is OK to make such convert.
> 
> - If there are other callers that need to find indirect children,
>    introduce a new of_get_child_by_name_recursive() function.
> 
> At that point, the only remaining caller of of_find_node_by_name()
> (beside its usage in the for_each_node_by_name() macro) will be
> drivers/clk/ti/clk.c, which uses the function correctly.
> 
> I'm tempted to then rename of_find_node_by_name() to
> __of_find_node_by_name() to indicate it's an internal function not meant
> to be called except in special cases. It could all be renamed to
> __of_find_next_node_by_name() to make its behaviour clearer.
>

The actual code logic of of_find_node_by_name() is more suitable to be 
used in a loop.So,rename of_find_node_by_name() to 
__of_find_next_node_by_name() seems to be a good idea.

Best regards,
Zekun
Laurent Pinchart Feb. 10, 2025, 10:03 a.m. UTC | #7
Hi Zekun,

On Mon, Feb 10, 2025 at 02:47:28PM +0800, zhangzekun (A) wrote:
> > I think we all agree that of_find_node_by_name() is miused, and that it
> > shows the API isn't optimal. What we have different opinions on is how
> > to make the API less error-prone. I think adding a new
> > of_find_node_by_name_balanced() function works around the issue and
> > doesn't improve the situation much, I would argue it makes things even
> > more confusing.
> > 
> > We have only 20 calls to of_find_node_by_name() with a non-NULL first
> > argument in v6.14-rc1:
> > 
> > arch/powerpc/platforms/chrp/pci.c:      rtas = of_find_node_by_name (root, "rtas");
> > 
> > The 'root' variable here is the result of a call to
> > 'of_find_node_by_path("/")', so I think we could pass a null pointer
> > instead to simplify things.
> > 
> > arch/powerpc/platforms/powermac/pic.c:          slave = of_find_node_by_name(master, "mac-io");
> > 
> > Here I believe of_find_node_by_name() is called to find a *child* node
> > of 'master'. of_find_node_by_name() is the wrong function for that.
> > 
> > arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(rootnp, "GAISLER_IRQMP");
> > arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(rootnp, "01_00d");
> > arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(nnp, "GAISLER_GPTIMER");
> > arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(nnp, "01_011");
> > 
> > Here too the code seems to be looking for child nodes only (but I
> > couldn't find a DT example or binding in-tree, so I'm not entirely
> > sure).
> > 
> > drivers/clk/ti/clk.c:   return of_find_node_by_name(from, tmp);
> > 
> > Usage here seems correct, the reference-count decrement is intended.
> > 
> > drivers/media/i2c/max9286.c:    i2c_mux = of_find_node_by_name(dev->of_node, "i2c-mux");
> > drivers/media/platform/qcom/venus/core.c:       enp = of_find_node_by_name(dev->of_node, node_name);
> > drivers/net/dsa/bcm_sf2.c:      ports = of_find_node_by_name(dn, "ports");
> > drivers/net/dsa/hirschmann/hellcreek_ptp.c:     leds = of_find_node_by_name(hellcreek->dev->of_node, "leds");
> > drivers/net/ethernet/broadcom/asp2/bcmasp.c:    ports_node = of_find_node_by_name(dev->of_node, "ethernet-ports");
> > drivers/net/ethernet/marvell/prestera/prestera_main.c:  ports = of_find_node_by_name(sw->np, "ports");
> > drivers/net/pse-pd/tps23881.c:  channels_node = of_find_node_by_name(priv->np, "channels");
> > drivers/regulator/scmi-regulator.c:     np = of_find_node_by_name(handle->dev->of_node, "regulators");
> > drivers/regulator/tps6594-regulator.c:          np = of_find_node_by_name(tps->dev->of_node, multi_regs[multi].supply_name);
> > 
> > Incorrect usage, as far as I understand all those drivers are looking
> > for child nodes only.
> > 
> > drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest16");
> > drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest17");
> > drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest18");
> > drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest19");
> > 
> > Here too I think only child nodes are meant to be considered.
> > 
> > of_find_node_by_name() is very much misused as most callers want to find
> > child nodes, while of_find_node_by_name() will walk the whole DT from a
> > given starting point.
> > 
> > I think the right fix here is to
> > 
> > - Replace of_find_node_by_name(root, ...) with
> >    of_find_node_by_name(NULL, ...) in arch/powerpc/platforms/chrp/pci.c
> >    (if my understanding of the code is correct).
> 
> For arch/powerpc/platforms/chrp/pci.c, noticing that there is a comment 
> in setup_peg2():
>   /* keep the reference to the root node */
> 
> It might can not be convert to of_find_node_by_name(NULL, ...), and the 
> origin use of of_find_node_by_name() put the ref count which want to be 
> kept.

But the reference is dropped by of_find_node_by_name(). Unless I'm
missing something, dropping the lien

	struct device_node *root = of_find_node_by_path("/");

and changing

	rtas = of_find_node_by_name (root, "rtas");

to

	rtas = of_find_node_by_name (NULL, "rtas");

will not change the behaviour of the code.

> > 
> > - Replace of_find_node_by_name() with of_get_child_by_name() in callers
> >    that need to search immediate children only (I expected that to be the
> >    majority of the above call sites)
>
> Since there is no enough information about these DT nodes, it would take 
> time to prove if it is OK to make such convert.

It will take a bit of time, yes. I'm afraid time is needed to improve
things :-) In most cases, as DT bindings are available, it shouldn't be
very difficult.

> > 
> > - If there are other callers that need to find indirect children,
> >    introduce a new of_get_child_by_name_recursive() function.
> > 
> > At that point, the only remaining caller of of_find_node_by_name()
> > (beside its usage in the for_each_node_by_name() macro) will be
> > drivers/clk/ti/clk.c, which uses the function correctly.
> > 
> > I'm tempted to then rename of_find_node_by_name() to
> > __of_find_node_by_name() to indicate it's an internal function not meant
> > to be called except in special cases. It could all be renamed to
> > __of_find_next_node_by_name() to make its behaviour clearer.
> >
> 
> The actual code logic of of_find_node_by_name() is more suitable to be 
> used in a loop.So,rename of_find_node_by_name() to 
> __of_find_next_node_by_name() seems to be a good idea.
zhangzekun (A) Feb. 11, 2025, 11:26 a.m. UTC | #8
在 2025/2/10 18:03, Laurent Pinchart 写道:
> Hi Zekun,
> 
> On Mon, Feb 10, 2025 at 02:47:28PM +0800, zhangzekun (A) wrote:
>>> I think we all agree that of_find_node_by_name() is miused, and that it
>>> shows the API isn't optimal. What we have different opinions on is how
>>> to make the API less error-prone. I think adding a new
>>> of_find_node_by_name_balanced() function works around the issue and
>>> doesn't improve the situation much, I would argue it makes things even
>>> more confusing.
>>>
>>> We have only 20 calls to of_find_node_by_name() with a non-NULL first
>>> argument in v6.14-rc1:
>>>
>>> arch/powerpc/platforms/chrp/pci.c:      rtas = of_find_node_by_name (root, "rtas");
>>>
>>> The 'root' variable here is the result of a call to
>>> 'of_find_node_by_path("/")', so I think we could pass a null pointer
>>> instead to simplify things.
>>>
>>> arch/powerpc/platforms/powermac/pic.c:          slave = of_find_node_by_name(master, "mac-io");
>>>
>>> Here I believe of_find_node_by_name() is called to find a *child* node
>>> of 'master'. of_find_node_by_name() is the wrong function for that.
>>>
>>> arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(rootnp, "GAISLER_IRQMP");
>>> arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(rootnp, "01_00d");
>>> arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(nnp, "GAISLER_GPTIMER");
>>> arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(nnp, "01_011");
>>>
>>> Here too the code seems to be looking for child nodes only (but I
>>> couldn't find a DT example or binding in-tree, so I'm not entirely
>>> sure).
>>>
>>> drivers/clk/ti/clk.c:   return of_find_node_by_name(from, tmp);
>>>
>>> Usage here seems correct, the reference-count decrement is intended.
>>>
>>> drivers/media/i2c/max9286.c:    i2c_mux = of_find_node_by_name(dev->of_node, "i2c-mux");
>>> drivers/media/platform/qcom/venus/core.c:       enp = of_find_node_by_name(dev->of_node, node_name);
>>> drivers/net/dsa/bcm_sf2.c:      ports = of_find_node_by_name(dn, "ports");
>>> drivers/net/dsa/hirschmann/hellcreek_ptp.c:     leds = of_find_node_by_name(hellcreek->dev->of_node, "leds");
>>> drivers/net/ethernet/broadcom/asp2/bcmasp.c:    ports_node = of_find_node_by_name(dev->of_node, "ethernet-ports");
>>> drivers/net/ethernet/marvell/prestera/prestera_main.c:  ports = of_find_node_by_name(sw->np, "ports");
>>> drivers/net/pse-pd/tps23881.c:  channels_node = of_find_node_by_name(priv->np, "channels");
>>> drivers/regulator/scmi-regulator.c:     np = of_find_node_by_name(handle->dev->of_node, "regulators");
>>> drivers/regulator/tps6594-regulator.c:          np = of_find_node_by_name(tps->dev->of_node, multi_regs[multi].supply_name);
>>>
>>> Incorrect usage, as far as I understand all those drivers are looking
>>> for child nodes only.
>>>
>>> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest16");
>>> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest17");
>>> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest18");
>>> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest19");
>>>
>>> Here too I think only child nodes are meant to be considered.
>>>
>>> of_find_node_by_name() is very much misused as most callers want to find
>>> child nodes, while of_find_node_by_name() will walk the whole DT from a
>>> given starting point.
>>>
>>> I think the right fix here is to
>>>
>>> - Replace of_find_node_by_name(root, ...) with
>>>     of_find_node_by_name(NULL, ...) in arch/powerpc/platforms/chrp/pci.c
>>>     (if my understanding of the code is correct).
>>
>> For arch/powerpc/platforms/chrp/pci.c, noticing that there is a comment
>> in setup_peg2():
>>    /* keep the reference to the root node */
>>
>> It might can not be convert to of_find_node_by_name(NULL, ...), and the
>> origin use of of_find_node_by_name() put the ref count which want to be
>> kept.
> 
> But the reference is dropped by of_find_node_by_name(). Unless I'm
> missing something, dropping the lien
> 
> 	struct device_node *root = of_find_node_by_path("/");
> 
> and changing
> 
> 	rtas = of_find_node_by_name (root, "rtas");
> 
> to
> 
> 	rtas = of_find_node_by_name (NULL, "rtas");
> 
> will not change the behaviour of the code.
> 

Hi, Laurent,

I think that the original code try to keep the refcount get by 
of_find_node_by_path(), but leak it accidently by 
of_find_node_by_name(). I am not sure that what driver really wants to 
do and if it has a bug here.

Beset Regards,
Zekun
Laurent Pinchart Feb. 11, 2025, 11:43 a.m. UTC | #9
On Tue, Feb 11, 2025 at 07:26:18PM +0800, zhangzekun (A) wrote:
> 在 2025/2/10 18:03, Laurent Pinchart 写道:
> > On Mon, Feb 10, 2025 at 02:47:28PM +0800, zhangzekun (A) wrote:
> >>> I think we all agree that of_find_node_by_name() is miused, and that it
> >>> shows the API isn't optimal. What we have different opinions on is how
> >>> to make the API less error-prone. I think adding a new
> >>> of_find_node_by_name_balanced() function works around the issue and
> >>> doesn't improve the situation much, I would argue it makes things even
> >>> more confusing.
> >>>
> >>> We have only 20 calls to of_find_node_by_name() with a non-NULL first
> >>> argument in v6.14-rc1:
> >>>
> >>> arch/powerpc/platforms/chrp/pci.c:      rtas = of_find_node_by_name (root, "rtas");
> >>>
> >>> The 'root' variable here is the result of a call to
> >>> 'of_find_node_by_path("/")', so I think we could pass a null pointer
> >>> instead to simplify things.
> >>>
> >>> arch/powerpc/platforms/powermac/pic.c:          slave = of_find_node_by_name(master, "mac-io");
> >>>
> >>> Here I believe of_find_node_by_name() is called to find a *child* node
> >>> of 'master'. of_find_node_by_name() is the wrong function for that.
> >>>
> >>> arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(rootnp, "GAISLER_IRQMP");
> >>> arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(rootnp, "01_00d");
> >>> arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(nnp, "GAISLER_GPTIMER");
> >>> arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(nnp, "01_011");
> >>>
> >>> Here too the code seems to be looking for child nodes only (but I
> >>> couldn't find a DT example or binding in-tree, so I'm not entirely
> >>> sure).
> >>>
> >>> drivers/clk/ti/clk.c:   return of_find_node_by_name(from, tmp);
> >>>
> >>> Usage here seems correct, the reference-count decrement is intended.
> >>>
> >>> drivers/media/i2c/max9286.c:    i2c_mux = of_find_node_by_name(dev->of_node, "i2c-mux");
> >>> drivers/media/platform/qcom/venus/core.c:       enp = of_find_node_by_name(dev->of_node, node_name);
> >>> drivers/net/dsa/bcm_sf2.c:      ports = of_find_node_by_name(dn, "ports");
> >>> drivers/net/dsa/hirschmann/hellcreek_ptp.c:     leds = of_find_node_by_name(hellcreek->dev->of_node, "leds");
> >>> drivers/net/ethernet/broadcom/asp2/bcmasp.c:    ports_node = of_find_node_by_name(dev->of_node, "ethernet-ports");
> >>> drivers/net/ethernet/marvell/prestera/prestera_main.c:  ports = of_find_node_by_name(sw->np, "ports");
> >>> drivers/net/pse-pd/tps23881.c:  channels_node = of_find_node_by_name(priv->np, "channels");
> >>> drivers/regulator/scmi-regulator.c:     np = of_find_node_by_name(handle->dev->of_node, "regulators");
> >>> drivers/regulator/tps6594-regulator.c:          np = of_find_node_by_name(tps->dev->of_node, multi_regs[multi].supply_name);
> >>>
> >>> Incorrect usage, as far as I understand all those drivers are looking
> >>> for child nodes only.
> >>>
> >>> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest16");
> >>> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest17");
> >>> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest18");
> >>> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest19");
> >>>
> >>> Here too I think only child nodes are meant to be considered.
> >>>
> >>> of_find_node_by_name() is very much misused as most callers want to find
> >>> child nodes, while of_find_node_by_name() will walk the whole DT from a
> >>> given starting point.
> >>>
> >>> I think the right fix here is to
> >>>
> >>> - Replace of_find_node_by_name(root, ...) with
> >>>     of_find_node_by_name(NULL, ...) in arch/powerpc/platforms/chrp/pci.c
> >>>     (if my understanding of the code is correct).
> >>
> >> For arch/powerpc/platforms/chrp/pci.c, noticing that there is a comment
> >> in setup_peg2():
> >>    /* keep the reference to the root node */
> >>
> >> It might can not be convert to of_find_node_by_name(NULL, ...), and the
> >> origin use of of_find_node_by_name() put the ref count which want to be
> >> kept.
> > 
> > But the reference is dropped by of_find_node_by_name(). Unless I'm
> > missing something, dropping the lien
> > 
> > 	struct device_node *root = of_find_node_by_path("/");
> > 
> > and changing
> > 
> > 	rtas = of_find_node_by_name (root, "rtas");
> > 
> > to
> > 
> > 	rtas = of_find_node_by_name (NULL, "rtas");
> > 
> > will not change the behaviour of the code.
> 
> Hi, Laurent,
> 
> I think that the original code try to keep the refcount get by 
> of_find_node_by_path(), but leak it accidently by 
> of_find_node_by_name(). I am not sure that what driver really wants to 
> do and if it has a bug here.

Looking at the git history, I don't think the code needs or tries to
keep a reference to the root node.
Rob Herring (Arm) Feb. 11, 2025, 2:15 p.m. UTC | #10
On Fri, Feb 7, 2025 at 9:37 AM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Zekun,
>
> On Fri, Feb 07, 2025 at 07:28:23PM +0800, zhangzekun (A) wrote:
> > 在 2025/2/7 16:24, Oleksij Rempel 写道:
> > > On Fri, Feb 07, 2025 at 09:31:09AM +0800, Zhang Zekun wrote:
> > >> There are many drivers use of_find_node_by_name() with a not-NULL
> > >> device_node pointer, and a number of callers would require a call to
> > >> of_node_get() before using it. There are also some drivers who forget
> > >> to call of_node_get() which would cause a ref count leak[1]. So, Add a
> > >> wraper function for of_find_node_by_name(), drivers may use this function
> > >> to call of_find_node_by_name() with the refcount already balanced.
> > >>
> > >> [1] https://lore.kernel.org/all/20241024015909.58654-1-zhangzekun11@huawei.com/
> > >
> > > Hi Zhang Zekun,
> > >
> > > thank you for working on this issue!
> > >
> > > First of all, let's take a step back and analyze the initial problem.
> > > Everything following is only my opinion...
> > >
> > > The main issue I see is that the current API - of_find_node_by_name -
> > > modifies the refcount of its input by calling of_node_put(from) as part
> > > of its search. Typically, a "find" function is expected to treat its
> > > input as read-only. That is, when you pass an object into such a
> > > function, you expect its reference count to remain unchanged unless
> > > ownership is explicitly transferred. In this case, lowering the refcount
> > > on the input node is counterintuitive and already lead to unexpected
> > > behavior and subtle bugs.
> > >
> > > To address this, the workaround introduces a wrapper function,
> > > of_find_node_by_name_balanced, which first increments the input’s
> > > refcount (via of_node_get()) before calling the original function. While
> > > this "balances" the refcount change, the naming remains problematic from
> > > my perspective. The "_balanced" suffix isn’t part of our common naming
> > > conventions (traditions? :)). Most drivers expect that a function
> > > starting with "find" will not alter the reference count of its input.
> > > The term "balanced" doesn’t clearly convey that the input's refcount is
> > > being explicitly managed - it instead obscures the underlying behavior,
> > > leaving many developers confused about what guarantees the API provides.
> > >
> > > In my view, a more natural solution would be to redesign the API so that
> > > it doesn’t modify the input object’s refcount at all. Instead, it should
> > > solely increase the refcount of the returned node (if found) for safe
> > > asynchronous usage. This approach would align with established
> > > conventions where "find" implies no side effects on inputs or output,
> > > and a "get" indicates that the output comes with an extra reference. For
> > > example, a function named of_get_node_by_name would clearly signal that
> > > only the returned node is subject to a refcount increase while leaving
> > > the input intact.
> > >
> > > Thus, while the current workaround "balances" the reference count, it
> > > doesn't address the underlying design flaw. The naming still suggests a
> > > "find" function that should leave the input untouched, which isn’t the
> > > case here. A redesign of the API - with both the behavior and naming
> > > aligned to common expectations - would be a clearer and more robust
> > > solution.
> > >
> > > Nevertheless, it is only my POV, and the final decision rests with the
> > > OpenFirmware framework maintainers.
> > >
> > > Best Regards,
> > > Oleksij
> >
> > Hi, Oleksij,
> >
> > Thanks for your review. I think redesign the API would be a fundamental
> > way to fix this issue, but it would cause impact for current users who
> > knows the exact functionality of of_find_node_by_name(), which need to
> > put the "from" before calling it (I can't make the assumption that all
> > of drivers calling of_find_node_by_name() with a not-NULL "from"
> > pointer, but not calling of_node_get() before is misuse). The basic idea
> > for adding a new function is try not to impact current users. For users
> > who are confused about the "_balanced" suffix,the comments of
> > of_find_node_by_name() explains why this interface exists.
>
> I think we all agree that of_find_node_by_name() is miused, and that it
> shows the API isn't optimal. What we have different opinions on is how
> to make the API less error-prone. I think adding a new
> of_find_node_by_name_balanced() function works around the issue and
> doesn't improve the situation much, I would argue it makes things even
> more confusing.
>
> We have only 20 calls to of_find_node_by_name() with a non-NULL first
> argument in v6.14-rc1:
>
> arch/powerpc/platforms/chrp/pci.c:      rtas = of_find_node_by_name (root, "rtas");
>
> The 'root' variable here is the result of a call to
> 'of_find_node_by_path("/")', so I think we could pass a null pointer
> instead to simplify things.

I think this could just be 'of_find_node_by_path("/rtas")' which does
occur elsewhere.

> arch/powerpc/platforms/powermac/pic.c:          slave = of_find_node_by_name(master, "mac-io");
>
> Here I believe of_find_node_by_name() is called to find a *child* node
> of 'master'. of_find_node_by_name() is the wrong function for that.

Yes, I think that's always a child of the PCI bus.

>
> arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(rootnp, "GAISLER_IRQMP");
> arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(rootnp, "01_00d");
> arch/sparc/kernel/leon_kernel.c:        np = of_find_node_by_name(nnp, "GAISLER_GPTIMER");
> arch/sparc/kernel/leon_kernel.c:                np = of_find_node_by_name(nnp, "01_011");
>
> Here too the code seems to be looking for child nodes only (but I
> couldn't find a DT example or binding in-tree, so I'm not entirely
> sure).

Sparc doesn't use OF_DYNAMIC, so who cares... But there are some Sparc
DT dumps here:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/prtconfs.git/

>
> drivers/clk/ti/clk.c:   return of_find_node_by_name(from, tmp);
>
> Usage here seems correct, the reference-count decrement is intended.
>
> drivers/media/i2c/max9286.c:    i2c_mux = of_find_node_by_name(dev->of_node, "i2c-mux");
> drivers/media/platform/qcom/venus/core.c:       enp = of_find_node_by_name(dev->of_node, node_name);
> drivers/net/dsa/bcm_sf2.c:      ports = of_find_node_by_name(dn, "ports");
> drivers/net/dsa/hirschmann/hellcreek_ptp.c:     leds = of_find_node_by_name(hellcreek->dev->of_node, "leds");
> drivers/net/ethernet/broadcom/asp2/bcmasp.c:    ports_node = of_find_node_by_name(dev->of_node, "ethernet-ports");
> drivers/net/ethernet/marvell/prestera/prestera_main.c:  ports = of_find_node_by_name(sw->np, "ports");
> drivers/net/pse-pd/tps23881.c:  channels_node = of_find_node_by_name(priv->np, "channels");
> drivers/regulator/scmi-regulator.c:     np = of_find_node_by_name(handle->dev->of_node, "regulators");
> drivers/regulator/tps6594-regulator.c:          np = of_find_node_by_name(tps->dev->of_node, multi_regs[multi].supply_name);
>
> Incorrect usage, as far as I understand all those drivers are looking
> for child nodes only.
>
> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest16");
> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest17");
> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest18");
> drivers/of/unittest.c:          found = of_find_node_by_name(nd->overlay, "test-unittest19");
>
> Here too I think only child nodes are meant to be considered.
>
> of_find_node_by_name() is very much misused as most callers want to find
> child nodes, while of_find_node_by_name() will walk the whole DT from a
> given starting point.

Agreed. In general, it's preferred to look for nodes by compatible
rather than node name with child nodes being an exception.

> I think the right fix here is to
>
> - Replace of_find_node_by_name(root, ...) with
>   of_find_node_by_name(NULL, ...) in arch/powerpc/platforms/chrp/pci.c
>   (if my understanding of the code is correct).
>
> - Replace of_find_node_by_name() with of_get_child_by_name() in callers
>   that need to search immediate children only (I expected that to be the
>   majority of the above call sites).

+1

> - If there are other callers that need to find indirect children,
>   introduce a new of_get_child_by_name_recursive() function.
>
> At that point, the only remaining caller of of_find_node_by_name()
> (beside its usage in the for_each_node_by_name() macro) will be
> drivers/clk/ti/clk.c, which uses the function correctly.
>
> I'm tempted to then rename of_find_node_by_name() to
> __of_find_node_by_name() to indicate it's an internal function not meant
> to be called except in special cases. It could all be renamed to
> __of_find_next_node_by_name() to make its behaviour clearer.

+1

There's a number of functions which I classify as "don't add new
users" which I keep meaning to document. It's generally older stuff
still using them. Anything returning pointers to raw DT data are
problematic for dynamic DT. Hence all the patches removing
of_find_property/of_get_property calls.

Rob
Krzysztof Kozlowski Feb. 12, 2025, 5:47 a.m. UTC | #11
On 07/02/2025 02:31, Zhang Zekun wrote:
> There are many drivers use of_find_node_by_name() with a not-NULL
> device_node pointer, and a number of callers would require a call to
> of_node_get() before using it. There are also some drivers who forget
> to call of_node_get() which would cause a ref count leak[1]. So, Add a
> wraper function for of_find_node_by_name(), drivers may use this function
> to call of_find_node_by_name() with the refcount already balanced.
> 
> [1] https://lore.kernel.org/all/20241024015909.58654-1-zhangzekun11@huawei.com/
> 
> Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
> ---
>  include/linux/of.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/linux/of.h b/include/linux/of.h
> index eaf0e2a2b75c..b7c6d7ff278c 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -268,6 +268,11 @@ static inline const char *of_node_full_name(const struct device_node *np)
>  #define for_each_of_allnodes(dn) for_each_of_allnodes_from(NULL, dn)
>  extern struct device_node *of_find_node_by_name(struct device_node *from,
>  	const char *name);
> +static inline struct device_node *of_find_node_by_name_balanced(struct device_node *from,
> +								const char *name)
> +{
> +	return of_find_node_by_name(of_node_get(from), name);

I don't think that solution to people not reading API description is to
create more API with similar but a bit different behavior, especially
undocumented.

Best regards,
Krzysztof
diff mbox series

Patch

diff --git a/include/linux/of.h b/include/linux/of.h
index eaf0e2a2b75c..b7c6d7ff278c 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -268,6 +268,11 @@  static inline const char *of_node_full_name(const struct device_node *np)
 #define for_each_of_allnodes(dn) for_each_of_allnodes_from(NULL, dn)
 extern struct device_node *of_find_node_by_name(struct device_node *from,
 	const char *name);
+static inline struct device_node *of_find_node_by_name_balanced(struct device_node *from,
+								const char *name)
+{
+	return of_find_node_by_name(of_node_get(from), name);
+}
 extern struct device_node *of_find_node_by_type(struct device_node *from,
 	const char *type);
 extern struct device_node *of_find_compatible_node(struct device_node *from,