diff mbox series

[v3,09/12] of: property: Simplify of_link_to_phandle()

Message ID 20230207014207.1678715-10-saravanak@google.com (mailing list archive)
State Handled Elsewhere, archived
Headers show
Series fw_devlink improvements | expand

Commit Message

Saravana Kannan Feb. 7, 2023, 1:42 a.m. UTC
The driver core now:
- Has the parent device of a supplier pick up the consumers if the
  supplier never has a device created for it.
- Ignores a supplier if the supplier has no parent device and will never
  be probed by a driver

And already prevents creating a device link with the consumer as a
supplier of a parent.

So, we no longer need to find the "compatible" node of the supplier or
do any other checks in of_link_to_phandle(). We simply need to make sure
that the supplier is available in DT.

Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: Colin Foster <colin.foster@in-advantage.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
---
 drivers/of/property.c | 84 +++++++------------------------------------
 1 file changed, 13 insertions(+), 71 deletions(-)

Comments

Geert Uytterhoeven Feb. 7, 2023, 8:57 p.m. UTC | #1
Hi Saravana,

On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <saravanak@google.com> wrote:
> The driver core now:
> - Has the parent device of a supplier pick up the consumers if the
>   supplier never has a device created for it.
> - Ignores a supplier if the supplier has no parent device and will never
>   be probed by a driver
>
> And already prevents creating a device link with the consumer as a
> supplier of a parent.
>
> So, we no longer need to find the "compatible" node of the supplier or
> do any other checks in of_link_to_phandle(). We simply need to make sure
> that the supplier is available in DT.
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>

Thanks for your patch!

This patch introduces a regression when dynamically loading DT overlays.
Unfortunately this happens when using the out-of-tree OF configfs,
which is not supported upstream.  Still, there may be (obscure)
in-tree users.

When loading a DT overlay[1] to enable an SPI controller, and
instantiate a connected SPI EEPROM:

    $ overlay add 25lc040
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /keys/status
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/pinctrl-0
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/pinctrl-names
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/cs-gpios
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/status
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /__symbols__/msiof0_pins

The SPI controller and the SPI EEPROM are no longer instantiated.

    # cat /sys/kernel/debug/devices_deferred
    e6e90000.spi    platform: wait for supplier msiof0

Let's remove the overlay again:

    $ overlay rm 25lc040
    input: keys as /devices/platform/keys/input/input1

And retry:

    $ overlay add 25lc040
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /keys/status
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/pinctrl-0
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/pinctrl-names
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/cs-gpios
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/status
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /__symbols__/msiof0_pins
    spi_sh_msiof e6e90000.spi: DMA available
    spi_sh_msiof e6e90000.spi: registered master spi0
    spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
    at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
    spi_sh_msiof e6e90000.spi: registered child spi0.0

Now it succeeds, and the SPI EEPROM is available, and works.

Without this patch, or with this patch reverted after applying the
full series:

    $ overlay add 25lc040
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /keys/status
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/pinctrl-0
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/pinctrl-names
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/cs-gpios
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /soc/spi@e6e90000/status
    OF: overlay: WARNING: memory leak will occur if overlay removed,
property: /__symbols__/msiof0_pins
    OF: Not linking spi@e6e90000 to interrupt-controller@f1010000 - No
struct device
    spi_sh_msiof e6e90000.spi: DMA available
    spi_sh_msiof e6e90000.spi: registered master spi0
    spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
    at25 spi0.0: 444 bps (2 bytes in 9 ticks)
    at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
    spi_sh_msiof e6e90000.spi: registered child spi0.0

The SPI EEPROM is available on the first try after boot.

All output is with #define DEBUG in drivers/of/property.c, and with
CONFIG_SPI_DEBUG=y.

Note that your patch has no impact on drivers/of/unittest.c, as that
checks only internal DT structures, not actual device instantiation.

Thanks! ;-)

[1] https://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git/diff/arch/arm64/boot/dts/renesas/r8a77990-ebisu-cn41-msiof0-25lc040.dtso?h=topic/renesas-overlays&id=86d0cf6fa7f191145380485c22f684873c5cce26

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Saravana Kannan Feb. 8, 2023, 2:08 a.m. UTC | #2
On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <saravanak@google.com> wrote:
> > The driver core now:
> > - Has the parent device of a supplier pick up the consumers if the
> >   supplier never has a device created for it.
> > - Ignores a supplier if the supplier has no parent device and will never
> >   be probed by a driver
> >
> > And already prevents creating a device link with the consumer as a
> > supplier of a parent.
> >
> > So, we no longer need to find the "compatible" node of the supplier or
> > do any other checks in of_link_to_phandle(). We simply need to make sure
> > that the supplier is available in DT.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> Thanks for your patch!
>
> This patch introduces a regression when dynamically loading DT overlays.
> Unfortunately this happens when using the out-of-tree OF configfs,
> which is not supported upstream.  Still, there may be (obscure)
> in-tree users.
>
> When loading a DT overlay[1] to enable an SPI controller, and
> instantiate a connected SPI EEPROM:
>
>     $ overlay add 25lc040
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /keys/status
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/pinctrl-0
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/pinctrl-names
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/cs-gpios
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/status
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /__symbols__/msiof0_pins
>
> The SPI controller and the SPI EEPROM are no longer instantiated.
>
>     # cat /sys/kernel/debug/devices_deferred
>     e6e90000.spi    platform: wait for supplier msiof0
>
> Let's remove the overlay again:
>
>     $ overlay rm 25lc040
>     input: keys as /devices/platform/keys/input/input1
>
> And retry:
>
>     $ overlay add 25lc040
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /keys/status
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/pinctrl-0
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/pinctrl-names
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/cs-gpios
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/status
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /__symbols__/msiof0_pins
>     spi_sh_msiof e6e90000.spi: DMA available
>     spi_sh_msiof e6e90000.spi: registered master spi0
>     spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
>     at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
>     spi_sh_msiof e6e90000.spi: registered child spi0.0
>
> Now it succeeds, and the SPI EEPROM is available, and works.
>
> Without this patch, or with this patch reverted after applying the
> full series:
>
>     $ overlay add 25lc040
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /keys/status
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/pinctrl-0
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/pinctrl-names
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/cs-gpios
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /soc/spi@e6e90000/status
>     OF: overlay: WARNING: memory leak will occur if overlay removed,
> property: /__symbols__/msiof0_pins
>     OF: Not linking spi@e6e90000 to interrupt-controller@f1010000 - No
> struct device
>     spi_sh_msiof e6e90000.spi: DMA available
>     spi_sh_msiof e6e90000.spi: registered master spi0
>     spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
>     at25 spi0.0: 444 bps (2 bytes in 9 ticks)
>     at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
>     spi_sh_msiof e6e90000.spi: registered child spi0.0
>
> The SPI EEPROM is available on the first try after boot.

Sigh... I spent way too long trying to figure out if I caused a memory
leak. I should have scrolled down further! Doesn't look like that part
is related to anything I did.

There are some flags set to avoid re-parsing fwnodes multiple times.
My guess is that the issue you are seeing has to do with how many of
the in memory structs are reused vs not when an overlay is
applied/removed and some of these flags might not be getting cleared
and this is having a bigger impact with this patch (because the fwnode
links are no longer anchored on "compatible" nodes).

With/without this patch (let's keep the series) can you look at how
the following things change between each step you do above (add,
remove, retry):
1) List of directories under /sys/class/devlink
2) Enable the debug logs inside __fwnode_link_add(),
__fwnode_link_del(), device_link_add()

My guess is that the final solution would entail clearing
FWNODE_FLAG_LINKS_ADDED for some fwnodes.

Thanks,
Saravana
Geert Uytterhoeven Feb. 8, 2023, 7:30 a.m. UTC | #3
Hi Saravana,

On Wed, Feb 8, 2023 at 3:08 AM Saravana Kannan <saravanak@google.com> wrote:
> On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <saravanak@google.com> wrote:
> > > The driver core now:
> > > - Has the parent device of a supplier pick up the consumers if the
> > >   supplier never has a device created for it.
> > > - Ignores a supplier if the supplier has no parent device and will never
> > >   be probed by a driver
> > >
> > > And already prevents creating a device link with the consumer as a
> > > supplier of a parent.
> > >
> > > So, we no longer need to find the "compatible" node of the supplier or
> > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > that the supplier is available in DT.
> > >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> >
> > Thanks for your patch!
> >
> > This patch introduces a regression when dynamically loading DT overlays.
> > Unfortunately this happens when using the out-of-tree OF configfs,
> > which is not supported upstream.  Still, there may be (obscure)
> > in-tree users.
> >
> > When loading a DT overlay[1] to enable an SPI controller, and
> > instantiate a connected SPI EEPROM:
> >
> >     $ overlay add 25lc040
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> >
> > The SPI controller and the SPI EEPROM are no longer instantiated.
> >
> >     # cat /sys/kernel/debug/devices_deferred
> >     e6e90000.spi    platform: wait for supplier msiof0
> >
> > Let's remove the overlay again:
> >
> >     $ overlay rm 25lc040
> >     input: keys as /devices/platform/keys/input/input1
> >
> > And retry:
> >
> >     $ overlay add 25lc040
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> >     spi_sh_msiof e6e90000.spi: DMA available
> >     spi_sh_msiof e6e90000.spi: registered master spi0
> >     spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> >     at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> >     spi_sh_msiof e6e90000.spi: registered child spi0.0
> >
> > Now it succeeds, and the SPI EEPROM is available, and works.
> >
> > Without this patch, or with this patch reverted after applying the
> > full series:
> >
> >     $ overlay add 25lc040
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> >     OF: Not linking spi@e6e90000 to interrupt-controller@f1010000 - No
> > struct device
> >     spi_sh_msiof e6e90000.spi: DMA available
> >     spi_sh_msiof e6e90000.spi: registered master spi0
> >     spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> >     at25 spi0.0: 444 bps (2 bytes in 9 ticks)
> >     at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> >     spi_sh_msiof e6e90000.spi: registered child spi0.0
> >
> > The SPI EEPROM is available on the first try after boot.
>
> Sigh... I spent way too long trying to figure out if I caused a memory
> leak. I should have scrolled down further! Doesn't look like that part
> is related to anything I did.

Please ignore the memory leak messages.  They are known issues
in the upstream DT overlay code, and not related to your series.

> There are some flags set to avoid re-parsing fwnodes multiple times.
> My guess is that the issue you are seeing has to do with how many of
> the in memory structs are reused vs not when an overlay is
> applied/removed and some of these flags might not be getting cleared
> and this is having a bigger impact with this patch (because the fwnode
> links are no longer anchored on "compatible" nodes).
>
> With/without this patch (let's keep the series) can you look at how
> the following things change between each step you do above (add,
> remove, retry):
> 1) List of directories under /sys/class/devlink
> 2) Enable the debug logs inside __fwnode_link_add(),
> __fwnode_link_del(), device_link_add()

Thanks, I'll give that a try, later...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Saravana Kannan Feb. 8, 2023, 7:31 a.m. UTC | #4
On Tue, Feb 7, 2023 at 6:08 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> > Hi Saravana,
> >
> > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <saravanak@google.com> wrote:
> > > The driver core now:
> > > - Has the parent device of a supplier pick up the consumers if the
> > >   supplier never has a device created for it.
> > > - Ignores a supplier if the supplier has no parent device and will never
> > >   be probed by a driver
> > >
> > > And already prevents creating a device link with the consumer as a
> > > supplier of a parent.
> > >
> > > So, we no longer need to find the "compatible" node of the supplier or
> > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > that the supplier is available in DT.
> > >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> >
> > Thanks for your patch!
> >
> > This patch introduces a regression when dynamically loading DT overlays.
> > Unfortunately this happens when using the out-of-tree OF configfs,
> > which is not supported upstream.  Still, there may be (obscure)
> > in-tree users.
> >
> > When loading a DT overlay[1] to enable an SPI controller, and
> > instantiate a connected SPI EEPROM:
> >
> >     $ overlay add 25lc040
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> >
> > The SPI controller and the SPI EEPROM are no longer instantiated.
> >
> >     # cat /sys/kernel/debug/devices_deferred
> >     e6e90000.spi    platform: wait for supplier msiof0
> >
> > Let's remove the overlay again:
> >
> >     $ overlay rm 25lc040
> >     input: keys as /devices/platform/keys/input/input1
> >
> > And retry:
> >
> >     $ overlay add 25lc040
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> >     spi_sh_msiof e6e90000.spi: DMA available
> >     spi_sh_msiof e6e90000.spi: registered master spi0
> >     spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> >     at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> >     spi_sh_msiof e6e90000.spi: registered child spi0.0
> >
> > Now it succeeds, and the SPI EEPROM is available, and works.
> >
> > Without this patch, or with this patch reverted after applying the
> > full series:
> >
> >     $ overlay add 25lc040
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> >     OF: Not linking spi@e6e90000 to interrupt-controller@f1010000 - No
> > struct device
> >     spi_sh_msiof e6e90000.spi: DMA available
> >     spi_sh_msiof e6e90000.spi: registered master spi0
> >     spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> >     at25 spi0.0: 444 bps (2 bytes in 9 ticks)
> >     at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> >     spi_sh_msiof e6e90000.spi: registered child spi0.0
> >
> > The SPI EEPROM is available on the first try after boot.
>
> Sigh... I spent way too long trying to figure out if I caused a memory
> leak. I should have scrolled down further! Doesn't look like that part
> is related to anything I did.
>
> There are some flags set to avoid re-parsing fwnodes multiple times.
> My guess is that the issue you are seeing has to do with how many of
> the in memory structs are reused vs not when an overlay is
> applied/removed and some of these flags might not be getting cleared
> and this is having a bigger impact with this patch (because the fwnode
> links are no longer anchored on "compatible" nodes).
>
> With/without this patch (let's keep the series) can you look at how
> the following things change between each step you do above (add,
> remove, retry):
> 1) List of directories under /sys/class/devlink
> 2) Enable the debug logs inside __fwnode_link_add(),
> __fwnode_link_del(), device_link_add()
>
> My guess is that the final solution would entail clearing
> FWNODE_FLAG_LINKS_ADDED for some fwnodes.

You replied just as I was about to hit send. So sending this anyway...

Ok, I took a closer look and I think it's a bit of a mess. The fact
that it even worked for you without this patch is a bit of a
coincidence.

Let's just take platform devices that are created by
driver/of/platform.c as an example.

The main problem is that when you add/remove properties to a DT node
of an existing platform device, nothing is really done about it at the
device level. We don't even unbind and rebind the driver so the driver
could make use of the new properties. We don't remove and add back the
device so whoever might use the new property will use it. And if you
are adding a new node, it'll only trigger any platform device level
impact if it's a new node of a "simple-bus" (or similar bus) device.

Problem 1:
So if you add a new child node to an existing probed device that adds
its children explicitly (as in, the parent is not a "simple-bus" like
device), nothing will happen. The newly added child device node will
get converted into a platform device, not will the parent device
notice it. So in your case of adding msiof0_pins, it's just that when
the consumer gets the pins, the driver doesn't get involved much and
it's the pinctrl framework that reads the DT and figures it out.

With this patch, the fwnode links point to the actual resource and the
actual parent device inherits them if they don't get converted to a
struct device. But since we are adding this msiof0_pins after the
parent device has probed, the fwnode link isn't inherited by the
parent pinctrl device.

Problem 2:
So if you add a property to an already bound device, nothing is done
by the driver. In your overlay example, if you move the status="okay"
line to be the first property you change in the msiof0 spi device,
you'll probably see that fw_devlink is no longer the one blocking the
probe. This is because the platform device will get added as soon as
the status flips from disabled to enabled and at that point fw_devlink
will think it has no suppliers and won't do any probe deferring. And
then as the new properties get added nothing will happen at the device
or fw_devlink level. If the msiof0's spi driver fails immediately with
NOT -EPROBE_DEFER when platform device is added because it couldn't
find any pinctrl property, then msiof0 will never probe (unless you
remove and add the driver). If it had failed with -EPROBE_DEFER, then
it might probe again if something else triggers a deferred probe
attempt. Clearly, things working/not working based on the order of
properties in DT is not a good implementation.

Problem 3:
What if you enable a previously disabled supplier. There's no way to
handle that from a fw_devlink level without re-parsing the entire
device tree because existing devices might be consumers now.

Anyway, long story short, it's sorta worked due to coincidence and
it's quite messy to get it to work correctly.

Another way to get this to work is to:
1) unload pinctrl driver, unload spi driver.
2) apply overlay
3) reload pinctrl driver, reload spi driver.

This is assuming unloading those 2 drivers doesn't crash your system.

In terms of difficult + inefficiency of solving the problems, the
easiest/efficient to hardest/inefficient would be problem 1, 2 and
then 3. I'll think about them, but it's broken anyway without the
series/patch. The only real guarantee as of today is that we aren't
leaking any memory or corrupting anything.

-Saravana
Greg KH Feb. 8, 2023, 7:33 a.m. UTC | #5
On Tue, Feb 07, 2023 at 09:57:14PM +0100, Geert Uytterhoeven wrote:
> Hi Saravana,
> 
> On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <saravanak@google.com> wrote:
> > The driver core now:
> > - Has the parent device of a supplier pick up the consumers if the
> >   supplier never has a device created for it.
> > - Ignores a supplier if the supplier has no parent device and will never
> >   be probed by a driver
> >
> > And already prevents creating a device link with the consumer as a
> > supplier of a parent.
> >
> > So, we no longer need to find the "compatible" node of the supplier or
> > do any other checks in of_link_to_phandle(). We simply need to make sure
> > that the supplier is available in DT.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> 
> Thanks for your patch!
> 
> This patch introduces a regression when dynamically loading DT overlays.
> Unfortunately this happens when using the out-of-tree OF configfs,
> which is not supported upstream.  Still, there may be (obscure)
> in-tree users.

As we can't do anything about out-of-tree code, why does this matter?

thanks,

greg k-h
Geert Uytterhoeven Feb. 8, 2023, 7:50 a.m. UTC | #6
Hi Greg,

On Wed, Feb 8, 2023 at 8:33 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Tue, Feb 07, 2023 at 09:57:14PM +0100, Geert Uytterhoeven wrote:
> > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <saravanak@google.com> wrote:
> > > The driver core now:
> > > - Has the parent device of a supplier pick up the consumers if the
> > >   supplier never has a device created for it.
> > > - Ignores a supplier if the supplier has no parent device and will never
> > >   be probed by a driver
> > >
> > > And already prevents creating a device link with the consumer as a
> > > supplier of a parent.
> > >
> > > So, we no longer need to find the "compatible" node of the supplier or
> > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > that the supplier is available in DT.
> > >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> >
> > Thanks for your patch!
> >
> > This patch introduces a regression when dynamically loading DT overlays.
> > Unfortunately this happens when using the out-of-tree OF configfs,
> > which is not supported upstream.  Still, there may be (obscure)
> > in-tree users.
>
> As we can't do anything about out-of-tree code, why does this matter?

Because the actual DT overlay mechanism is upstream.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Geert Uytterhoeven Feb. 8, 2023, 7:56 a.m. UTC | #7
Hi Saravana,

On Wed, Feb 8, 2023 at 8:32 AM Saravana Kannan <saravanak@google.com> wrote:
> On Tue, Feb 7, 2023 at 6:08 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > The driver core now:
> > > > - Has the parent device of a supplier pick up the consumers if the
> > > >   supplier never has a device created for it.
> > > > - Ignores a supplier if the supplier has no parent device and will never
> > > >   be probed by a driver
> > > >
> > > > And already prevents creating a device link with the consumer as a
> > > > supplier of a parent.
> > > >
> > > > So, we no longer need to find the "compatible" node of the supplier or
> > > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > > that the supplier is available in DT.
> > > >
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > >
> > > Thanks for your patch!
> > >
> > > This patch introduces a regression when dynamically loading DT overlays.
> > > Unfortunately this happens when using the out-of-tree OF configfs,
> > > which is not supported upstream.  Still, there may be (obscure)
> > > in-tree users.
> > >
> > > When loading a DT overlay[1] to enable an SPI controller, and
> > > instantiate a connected SPI EEPROM:

[...]

> > > The SPI controller and the SPI EEPROM are no longer instantiated.

> > Sigh... I spent way too long trying to figure out if I caused a memory
> > leak. I should have scrolled down further! Doesn't look like that part
> > is related to anything I did.
> >
> > There are some flags set to avoid re-parsing fwnodes multiple times.
> > My guess is that the issue you are seeing has to do with how many of
> > the in memory structs are reused vs not when an overlay is
> > applied/removed and some of these flags might not be getting cleared
> > and this is having a bigger impact with this patch (because the fwnode
> > links are no longer anchored on "compatible" nodes).
> >
> > With/without this patch (let's keep the series) can you look at how
> > the following things change between each step you do above (add,
> > remove, retry):
> > 1) List of directories under /sys/class/devlink
> > 2) Enable the debug logs inside __fwnode_link_add(),
> > __fwnode_link_del(), device_link_add()
> >
> > My guess is that the final solution would entail clearing
> > FWNODE_FLAG_LINKS_ADDED for some fwnodes.
>
> You replied just as I was about to hit send. So sending this anyway...
>
> Ok, I took a closer look and I think it's a bit of a mess. The fact
> that it even worked for you without this patch is a bit of a
> coincidence.
>
> Let's just take platform devices that are created by
> driver/of/platform.c as an example.
>
> The main problem is that when you add/remove properties to a DT node
> of an existing platform device, nothing is really done about it at the
> device level. We don't even unbind and rebind the driver so the driver
> could make use of the new properties. We don't remove and add back the
> device so whoever might use the new property will use it. And if you
> are adding a new node, it'll only trigger any platform device level
> impact if it's a new node of a "simple-bus" (or similar bus) device.
>
> Problem 1:
> So if you add a new child node to an existing probed device that adds
> its children explicitly (as in, the parent is not a "simple-bus" like
> device), nothing will happen. The newly added child device node will
> get converted into a platform device, not will the parent device
> notice it. So in your case of adding msiof0_pins, it's just that when
> the consumer gets the pins, the driver doesn't get involved much and
> it's the pinctrl framework that reads the DT and figures it out.
>
> With this patch, the fwnode links point to the actual resource and the
> actual parent device inherits them if they don't get converted to a
> struct device. But since we are adding this msiof0_pins after the
> parent device has probed, the fwnode link isn't inherited by the
> parent pinctrl device.
>
> Problem 2:
> So if you add a property to an already bound device, nothing is done
> by the driver. In your overlay example, if you move the status="okay"
> line to be the first property you change in the msiof0 spi device,
> you'll probably see that fw_devlink is no longer the one blocking the
> probe. This is because the platform device will get added as soon as
> the status flips from disabled to enabled and at that point fw_devlink
> will think it has no suppliers and won't do any probe deferring. And
> then as the new properties get added nothing will happen at the device
> or fw_devlink level. If the msiof0's spi driver fails immediately with
> NOT -EPROBE_DEFER when platform device is added because it couldn't
> find any pinctrl property, then msiof0 will never probe (unless you
> remove and add the driver). If it had failed with -EPROBE_DEFER, then
> it might probe again if something else triggers a deferred probe
> attempt. Clearly, things working/not working based on the order of
> properties in DT is not a good implementation.
>
> Problem 3:
> What if you enable a previously disabled supplier. There's no way to
> handle that from a fw_devlink level without re-parsing the entire
> device tree because existing devices might be consumers now.
>
> Anyway, long story short, it's sorta worked due to coincidence and
> it's quite messy to get it to work correctly.

Several subsystems register notifiers to be informed of the events
above. E.g. drivers/spi/spi.c:

        if (IS_ENABLED(CONFIG_OF_DYNAMIC))
                WARN_ON(of_reconfig_notifier_register(&spi_of_notifier));
        if (IS_ENABLED(CONFIG_ACPI))
                WARN_ON(acpi_reconfig_notifier_register(&spi_acpi_notifier));

So my issue might be triggered using ACPI, too.

> Another way to get this to work is to:
> 1) unload pinctrl driver, unload spi driver.
> 2) apply overlay
> 3) reload pinctrl driver, reload spi driver.
>
> This is assuming unloading those 2 drivers doesn't crash your system.

Unloading the pinctrl driver is not an option.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Saravana Kannan Feb. 8, 2023, 8:35 a.m. UTC | #8
On Tue, Feb 7, 2023 at 11:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Wed, Feb 8, 2023 at 8:32 AM Saravana Kannan <saravanak@google.com> wrote:
> > On Tue, Feb 7, 2023 at 6:08 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > The driver core now:
> > > > > - Has the parent device of a supplier pick up the consumers if the
> > > > >   supplier never has a device created for it.
> > > > > - Ignores a supplier if the supplier has no parent device and will never
> > > > >   be probed by a driver
> > > > >
> > > > > And already prevents creating a device link with the consumer as a
> > > > > supplier of a parent.
> > > > >
> > > > > So, we no longer need to find the "compatible" node of the supplier or
> > > > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > > > that the supplier is available in DT.
> > > > >
> > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > >
> > > > Thanks for your patch!
> > > >
> > > > This patch introduces a regression when dynamically loading DT overlays.
> > > > Unfortunately this happens when using the out-of-tree OF configfs,
> > > > which is not supported upstream.  Still, there may be (obscure)
> > > > in-tree users.
> > > >
> > > > When loading a DT overlay[1] to enable an SPI controller, and
> > > > instantiate a connected SPI EEPROM:
>
> [...]
>
> > > > The SPI controller and the SPI EEPROM are no longer instantiated.
>
> > > Sigh... I spent way too long trying to figure out if I caused a memory
> > > leak. I should have scrolled down further! Doesn't look like that part
> > > is related to anything I did.
> > >
> > > There are some flags set to avoid re-parsing fwnodes multiple times.
> > > My guess is that the issue you are seeing has to do with how many of
> > > the in memory structs are reused vs not when an overlay is
> > > applied/removed and some of these flags might not be getting cleared
> > > and this is having a bigger impact with this patch (because the fwnode
> > > links are no longer anchored on "compatible" nodes).
> > >
> > > With/without this patch (let's keep the series) can you look at how
> > > the following things change between each step you do above (add,
> > > remove, retry):
> > > 1) List of directories under /sys/class/devlink
> > > 2) Enable the debug logs inside __fwnode_link_add(),
> > > __fwnode_link_del(), device_link_add()
> > >
> > > My guess is that the final solution would entail clearing
> > > FWNODE_FLAG_LINKS_ADDED for some fwnodes.
> >
> > You replied just as I was about to hit send. So sending this anyway...
> >
> > Ok, I took a closer look and I think it's a bit of a mess. The fact
> > that it even worked for you without this patch is a bit of a
> > coincidence.
> >
> > Let's just take platform devices that are created by
> > driver/of/platform.c as an example.
> >
> > The main problem is that when you add/remove properties to a DT node
> > of an existing platform device, nothing is really done about it at the
> > device level. We don't even unbind and rebind the driver so the driver
> > could make use of the new properties. We don't remove and add back the
> > device so whoever might use the new property will use it. And if you
> > are adding a new node, it'll only trigger any platform device level
> > impact if it's a new node of a "simple-bus" (or similar bus) device.
> >
> > Problem 1:
> > So if you add a new child node to an existing probed device that adds
> > its children explicitly (as in, the parent is not a "simple-bus" like
> > device), nothing will happen. The newly added child device node will
> > get converted into a platform device, not will the parent device
> > notice it. So in your case of adding msiof0_pins, it's just that when
> > the consumer gets the pins, the driver doesn't get involved much and
> > it's the pinctrl framework that reads the DT and figures it out.
> >
> > With this patch, the fwnode links point to the actual resource and the
> > actual parent device inherits them if they don't get converted to a
> > struct device. But since we are adding this msiof0_pins after the
> > parent device has probed, the fwnode link isn't inherited by the
> > parent pinctrl device.
> >
> > Problem 2:
> > So if you add a property to an already bound device, nothing is done
> > by the driver. In your overlay example, if you move the status="okay"
> > line to be the first property you change in the msiof0 spi device,
> > you'll probably see that fw_devlink is no longer the one blocking the
> > probe. This is because the platform device will get added as soon as
> > the status flips from disabled to enabled and at that point fw_devlink
> > will think it has no suppliers and won't do any probe deferring. And
> > then as the new properties get added nothing will happen at the device
> > or fw_devlink level. If the msiof0's spi driver fails immediately with
> > NOT -EPROBE_DEFER when platform device is added because it couldn't
> > find any pinctrl property, then msiof0 will never probe (unless you
> > remove and add the driver). If it had failed with -EPROBE_DEFER, then
> > it might probe again if something else triggers a deferred probe
> > attempt. Clearly, things working/not working based on the order of
> > properties in DT is not a good implementation.
> >
> > Problem 3:
> > What if you enable a previously disabled supplier. There's no way to
> > handle that from a fw_devlink level without re-parsing the entire
> > device tree because existing devices might be consumers now.
> >
> > Anyway, long story short, it's sorta worked due to coincidence and
> > it's quite messy to get it to work correctly.
>
> Several subsystems register notifiers to be informed of the events
> above. E.g. drivers/spi/spi.c:
>
>         if (IS_ENABLED(CONFIG_OF_DYNAMIC))
>                 WARN_ON(of_reconfig_notifier_register(&spi_of_notifier));
>         if (IS_ENABLED(CONFIG_ACPI))
>                 WARN_ON(acpi_reconfig_notifier_register(&spi_acpi_notifier));
>
> So my issue might be triggered using ACPI, too.

Yeah, I did notice this before my email. Here's an ugly hack (at end
of email) to test my theory about Problem 1. I didn't compile test it
(because I should go to bed now), but you get the idea. Can you give
this a shot? It should fix your specific case. Basically for all
overlays (I hope the function is only used for overlays) we assume all
nodes are NOT devices until they actually get added as a device. Don't
review the code, it's not meant to be :)

-Saravana

--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -226,6 +226,7 @@ static void __of_attach_node(struct device_node *np)
        np->sibling = np->parent->child;
        np->parent->child = np;
        of_node_clear_flag(np, OF_DETACHED);
+       np->fwnode.flags |= FWNODE_FLAG_NOT_DEVICE;
 }

 /**
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 81c8c227ab6b..7299cd668e51 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -732,6 +732,7 @@ static int of_platform_notify(struct notifier_block *nb,
                if (of_node_check_flag(rd->dn, OF_POPULATED))
                        return NOTIFY_OK;

+               rd->dn->fwnode.flags &= ~FWNODE_FLAG_NOT_DEVICE;
                /* pdev_parent may be NULL when no bus platform device */
                pdev_parent = of_find_device_by_node(rd->dn->parent);
                pdev = of_platform_device_create(rd->dn, NULL,
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 15f174f4e056..1de55561b25d 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -4436,6 +4436,7 @@ static int of_spi_notify(struct notifier_block
*nb, unsigned long action,
                        return NOTIFY_OK;
                }

+               rd->dn->fwnode.flags &= ~FWNODE_FLAG_NOT_DEVICE;
                spi = of_register_spi_device(ctlr, rd->dn);
                put_device(&ctlr->dev);
Andy Shevchenko Feb. 8, 2023, 1:37 p.m. UTC | #9
On Tue, Feb 07, 2023 at 11:31:57PM -0800, Saravana Kannan wrote:
> On Tue, Feb 7, 2023 at 6:08 PM Saravana Kannan <saravanak@google.com> wrote:

...

> Another way to get this to work is to:
> 1) unload pinctrl driver, unload spi driver.
> 2) apply overlay
> 3) reload pinctrl driver, reload spi driver.
> 
> This is assuming unloading those 2 drivers doesn't crash your system.

Just a side note.

For ACPI case the ACPICA prevents appearing of the same device in the
namespace, so the above is kinda enforced, that's why overlays work better
there (but have a lot of limitations).
Geert Uytterhoeven Feb. 13, 2023, 1:04 p.m. UTC | #10
Hi Saravana,

On Wed, Feb 8, 2023 at 3:08 AM Saravana Kannan <saravanak@google.com> wrote:
> On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <saravanak@google.com> wrote:
> > > The driver core now:
> > > - Has the parent device of a supplier pick up the consumers if the
> > >   supplier never has a device created for it.
> > > - Ignores a supplier if the supplier has no parent device and will never
> > >   be probed by a driver
> > >
> > > And already prevents creating a device link with the consumer as a
> > > supplier of a parent.
> > >
> > > So, we no longer need to find the "compatible" node of the supplier or
> > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > that the supplier is available in DT.
> > >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> >
> > Thanks for your patch!
> >
> > This patch introduces a regression when dynamically loading DT overlays.
> > Unfortunately this happens when using the out-of-tree OF configfs,
> > which is not supported upstream.  Still, there may be (obscure)
> > in-tree users.
> >
> > When loading a DT overlay[1] to enable an SPI controller, and
> > instantiate a connected SPI EEPROM:
> >
> >     $ overlay add 25lc040
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> >
> > The SPI controller and the SPI EEPROM are no longer instantiated.
> >
> >     # cat /sys/kernel/debug/devices_deferred
> >     e6e90000.spi    platform: wait for supplier msiof0
> >
> > Let's remove the overlay again:
> >
> >     $ overlay rm 25lc040
> >     input: keys as /devices/platform/keys/input/input1
> >
> > And retry:
> >
> >     $ overlay add 25lc040
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> >     spi_sh_msiof e6e90000.spi: DMA available
> >     spi_sh_msiof e6e90000.spi: registered master spi0
> >     spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> >     at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> >     spi_sh_msiof e6e90000.spi: registered child spi0.0
> >
> > Now it succeeds, and the SPI EEPROM is available, and works.
> >
> > Without this patch, or with this patch reverted after applying the
> > full series:
> >
> >     $ overlay add 25lc040
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /keys/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-0
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/pinctrl-names
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/cs-gpios
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /soc/spi@e6e90000/status
> >     OF: overlay: WARNING: memory leak will occur if overlay removed,
> > property: /__symbols__/msiof0_pins
> >     OF: Not linking spi@e6e90000 to interrupt-controller@f1010000 - No
> > struct device
> >     spi_sh_msiof e6e90000.spi: DMA available
> >     spi_sh_msiof e6e90000.spi: registered master spi0
> >     spi spi0.0: setup mode 0, 8 bits/w, 100000 Hz max --> 0
> >     at25 spi0.0: 444 bps (2 bytes in 9 ticks)
> >     at25 spi0.0: 512 Byte at25 eeprom, pagesize 16
> >     spi_sh_msiof e6e90000.spi: registered child spi0.0
> >
> > The SPI EEPROM is available on the first try after boot.
>
> Sigh... I spent way too long trying to figure out if I caused a memory
> leak. I should have scrolled down further! Doesn't look like that part
> is related to anything I did.
>
> There are some flags set to avoid re-parsing fwnodes multiple times.
> My guess is that the issue you are seeing has to do with how many of
> the in memory structs are reused vs not when an overlay is
> applied/removed and some of these flags might not be getting cleared
> and this is having a bigger impact with this patch (because the fwnode
> links are no longer anchored on "compatible" nodes).
>
> With/without this patch (let's keep the series) can you look at how
> the following things change between each step you do above (add,
> remove, retry):
> 1) List of directories under /sys/class/devlink
> 2) Enable the debug logs inside __fwnode_link_add(),
> __fwnode_link_del(), device_link_add()

Without "of: property: Simplify of_link_to_phandle()":

  - Adding overlay:

        spi@e6e90000 Linked as a fwnode consumer to clock-controller@e6150000
        spi@e6e90000 Linked as a fwnode consumer to system-controller@e6180000
        spi@e6e90000 Linked as a fwnode consumer to pinctrl@e6060000
        spi@e6e90000 Linked as a fwnode consumer to gpio@e6055000
        platform e6e90000.spi: Linked as a consumer to e6055000.gpio
        spi@e6e90000 Dropping the fwnode link to gpio@e6055000
        platform e6e90000.spi: Linked as a consumer to e6060000.pinctrl
        spi@e6e90000 Dropping the fwnode link to pinctrl@e6060000
        spi@e6e90000 Dropping the fwnode link to system-controller@e6180000
        platform e6e90000.spi: Linked as a consumer to e6150000.clock-controller
        spi@e6e90000 Dropping the fwnode link to clock-controller@e6150000

        +platform:e6055000.gpio--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
        +platform:e6060000.pinctrl--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:e6e90000.spi
        +platform:e6150000.clock-controller--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
        -platform:e6060000.pinctrl--platform:keys ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:keys

        SPI EEPROM works

  - Removing overlay:

        platform keys: Linked as a sync state only consumer to e6055000.gpio

        -platform:e6055000.gpio--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
        -platform:e6060000.pinctrl--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:e6e90000.spi
        -platform:e6150000.clock-controller--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi

        platform:e6060000.pinctrl--platform:keys link is not recreated?!?!?

  - Adding overlay again:

        No debug output
        No change in sys/class/devlink?!?!?
        SPI EEPROM works


With "of: property: Simplify of_link_to_phandle()":

  - Adding overlay:

        spi@e6e90000 Linked as a fwnode consumer to
interrupt-controller@f1010000
        spi@e6e90000 Linked as a fwnode consumer to clock-controller@e6150000
        spi@e6e90000 Linked as a fwnode consumer to system-controller@e6180000
        spi@e6e90000 Linked as a fwnode consumer to msiof0
        spi@e6e90000 Linked as a fwnode consumer to gpio@e6055000
        platform e6e90000.spi: Linked as a consumer to e6055000.gpio
        spi@e6e90000 Dropping the fwnode link to gpio@e6055000
        spi@e6e90000 Dropping the fwnode link to system-controller@e6180000
        platform e6e90000.spi: Linked as a consumer to e6150000.clock-controller
        spi@e6e90000 Dropping the fwnode link to clock-controller@e6150000
        platform e6e90000.spi: Linked as a consumer to soc
        spi@e6e90000 Dropping the fwnode link to interrupt-controller@f1010000

        +platform:e6055000.gpio--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
        +platform:e6150000.clock-controller--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
        +platform:soc--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:soc--platform:e6e90000.spi

        -platform:e6060000.pinctrl--platform:keys ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:keys

        SPI EEPROM does not work

  - Removing overlay:

        platform keys: Linked as a sync state only consumer to e6055000.gpio
        spi@e6e90000 Dropping the fwnode link to msiof0

        -platform:e6055000.gpio--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
        -platform:e6150000.clock-controller--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
        -platform:soc--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:soc--platform:e6e90000.spi

        platform:e6060000.pinctrl--platform:keys link is not recreated?!?!?


  - Adding overlay again:

        No debug output
        No change in sys/class/devlink?!?!?
        SPI EEPROM works

Gr{oetje,eeting}s,

                        Geert
Geert Uytterhoeven Feb. 13, 2023, 1:10 p.m. UTC | #11
Hi Saravana,

On Wed, Feb 8, 2023 at 9:35 AM Saravana Kannan <saravanak@google.com> wrote:
> On Tue, Feb 7, 2023 at 11:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Wed, Feb 8, 2023 at 8:32 AM Saravana Kannan <saravanak@google.com> wrote:
> > > On Tue, Feb 7, 2023 at 6:08 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > On Tue, Feb 7, 2023 at 12:57 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > On Tue, Feb 7, 2023 at 2:42 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > The driver core now:
> > > > > > - Has the parent device of a supplier pick up the consumers if the
> > > > > >   supplier never has a device created for it.
> > > > > > - Ignores a supplier if the supplier has no parent device and will never
> > > > > >   be probed by a driver
> > > > > >
> > > > > > And already prevents creating a device link with the consumer as a
> > > > > > supplier of a parent.
> > > > > >
> > > > > > So, we no longer need to find the "compatible" node of the supplier or
> > > > > > do any other checks in of_link_to_phandle(). We simply need to make sure
> > > > > > that the supplier is available in DT.
> > > > > >
> > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > >
> > > > > Thanks for your patch!
> > > > >
> > > > > This patch introduces a regression when dynamically loading DT overlays.
> > > > > Unfortunately this happens when using the out-of-tree OF configfs,
> > > > > which is not supported upstream.  Still, there may be (obscure)
> > > > > in-tree users.
> > > > >
> > > > > When loading a DT overlay[1] to enable an SPI controller, and
> > > > > instantiate a connected SPI EEPROM:
> >
> > [...]
> >
> > > > > The SPI controller and the SPI EEPROM are no longer instantiated.
> >
> > > > Sigh... I spent way too long trying to figure out if I caused a memory
> > > > leak. I should have scrolled down further! Doesn't look like that part
> > > > is related to anything I did.
> > > >
> > > > There are some flags set to avoid re-parsing fwnodes multiple times.
> > > > My guess is that the issue you are seeing has to do with how many of
> > > > the in memory structs are reused vs not when an overlay is
> > > > applied/removed and some of these flags might not be getting cleared
> > > > and this is having a bigger impact with this patch (because the fwnode
> > > > links are no longer anchored on "compatible" nodes).
> > > >
> > > > With/without this patch (let's keep the series) can you look at how
> > > > the following things change between each step you do above (add,
> > > > remove, retry):
> > > > 1) List of directories under /sys/class/devlink
> > > > 2) Enable the debug logs inside __fwnode_link_add(),
> > > > __fwnode_link_del(), device_link_add()
> > > >
> > > > My guess is that the final solution would entail clearing
> > > > FWNODE_FLAG_LINKS_ADDED for some fwnodes.
> > >
> > > You replied just as I was about to hit send. So sending this anyway...
> > >
> > > Ok, I took a closer look and I think it's a bit of a mess. The fact
> > > that it even worked for you without this patch is a bit of a
> > > coincidence.
> > >
> > > Let's just take platform devices that are created by
> > > driver/of/platform.c as an example.
> > >
> > > The main problem is that when you add/remove properties to a DT node
> > > of an existing platform device, nothing is really done about it at the
> > > device level. We don't even unbind and rebind the driver so the driver
> > > could make use of the new properties. We don't remove and add back the
> > > device so whoever might use the new property will use it. And if you
> > > are adding a new node, it'll only trigger any platform device level
> > > impact if it's a new node of a "simple-bus" (or similar bus) device.
> > >
> > > Problem 1:
> > > So if you add a new child node to an existing probed device that adds
> > > its children explicitly (as in, the parent is not a "simple-bus" like
> > > device), nothing will happen. The newly added child device node will
> > > get converted into a platform device, not will the parent device
> > > notice it. So in your case of adding msiof0_pins, it's just that when
> > > the consumer gets the pins, the driver doesn't get involved much and
> > > it's the pinctrl framework that reads the DT and figures it out.
> > >
> > > With this patch, the fwnode links point to the actual resource and the
> > > actual parent device inherits them if they don't get converted to a
> > > struct device. But since we are adding this msiof0_pins after the
> > > parent device has probed, the fwnode link isn't inherited by the
> > > parent pinctrl device.
> > >
> > > Problem 2:
> > > So if you add a property to an already bound device, nothing is done
> > > by the driver. In your overlay example, if you move the status="okay"
> > > line to be the first property you change in the msiof0 spi device,
> > > you'll probably see that fw_devlink is no longer the one blocking the
> > > probe. This is because the platform device will get added as soon as
> > > the status flips from disabled to enabled and at that point fw_devlink
> > > will think it has no suppliers and won't do any probe deferring. And
> > > then as the new properties get added nothing will happen at the device
> > > or fw_devlink level. If the msiof0's spi driver fails immediately with
> > > NOT -EPROBE_DEFER when platform device is added because it couldn't
> > > find any pinctrl property, then msiof0 will never probe (unless you
> > > remove and add the driver). If it had failed with -EPROBE_DEFER, then
> > > it might probe again if something else triggers a deferred probe
> > > attempt. Clearly, things working/not working based on the order of
> > > properties in DT is not a good implementation.
> > >
> > > Problem 3:
> > > What if you enable a previously disabled supplier. There's no way to
> > > handle that from a fw_devlink level without re-parsing the entire
> > > device tree because existing devices might be consumers now.
> > >
> > > Anyway, long story short, it's sorta worked due to coincidence and
> > > it's quite messy to get it to work correctly.
> >
> > Several subsystems register notifiers to be informed of the events
> > above. E.g. drivers/spi/spi.c:
> >
> >         if (IS_ENABLED(CONFIG_OF_DYNAMIC))
> >                 WARN_ON(of_reconfig_notifier_register(&spi_of_notifier));
> >         if (IS_ENABLED(CONFIG_ACPI))
> >                 WARN_ON(acpi_reconfig_notifier_register(&spi_acpi_notifier));
> >
> > So my issue might be triggered using ACPI, too.
>
> Yeah, I did notice this before my email. Here's an ugly hack (at end
> of email) to test my theory about Problem 1. I didn't compile test it
> (because I should go to bed now), but you get the idea. Can you give
> this a shot? It should fix your specific case. Basically for all
> overlays (I hope the function is only used for overlays) we assume all
> nodes are NOT devices until they actually get added as a device. Don't
> review the code, it's not meant to be :)
>
> -Saravana
>
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -226,6 +226,7 @@ static void __of_attach_node(struct device_node *np)
>         np->sibling = np->parent->child;
>         np->parent->child = np;
>         of_node_clear_flag(np, OF_DETACHED);
> +       np->fwnode.flags |= FWNODE_FLAG_NOT_DEVICE;
>  }
>
>  /**
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 81c8c227ab6b..7299cd668e51 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -732,6 +732,7 @@ static int of_platform_notify(struct notifier_block *nb,
>                 if (of_node_check_flag(rd->dn, OF_POPULATED))
>                         return NOTIFY_OK;
>
> +               rd->dn->fwnode.flags &= ~FWNODE_FLAG_NOT_DEVICE;
>                 /* pdev_parent may be NULL when no bus platform device */
>                 pdev_parent = of_find_device_by_node(rd->dn->parent);
>                 pdev = of_platform_device_create(rd->dn, NULL,
> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
> index 15f174f4e056..1de55561b25d 100644
> --- a/drivers/spi/spi.c
> +++ b/drivers/spi/spi.c
> @@ -4436,6 +4436,7 @@ static int of_spi_notify(struct notifier_block
> *nb, unsigned long action,
>                         return NOTIFY_OK;
>                 }
>
> +               rd->dn->fwnode.flags &= ~FWNODE_FLAG_NOT_DEVICE;
>                 spi = of_register_spi_device(ctlr, rd->dn);
>                 put_device(&ctlr->dev);

Thanks, these changes fix my SPI EEPROM in a DT overlay.
A similar change should be applied to the i2c bus core (and to other
users of of_reconfig_notifier_register()?).

For reference, the same debug output and /sys/class/devlink
changes with this fix applied can be found below.

Note that there are still a few remaining issues, for which I do not
know the full impact:
  - platform:e6060000.pinctrl--platform:keys link is not recreated
     on overlay remove,
  - There is no change in /sys/class/devlink after an add/remove/add
    cycle.
Shouldn't removing a DT overlay restore  /sys/class/devlink to
the exact same state as before adding the DT overlay?

With extra FWNODE_FLAG_NOT_DEVICE handling:

  - Adding overlay:

        spi@e6e90000 Linked as a fwnode consumer to
interrupt-controller@f1010000
        spi@e6e90000 Linked as a fwnode consumer to clock-controller@e6150000
        spi@e6e90000 Linked as a fwnode consumer to system-controller@e6180000
        spi@e6e90000 Linked as a fwnode consumer to msiof0
        spi@e6e90000 Linked as a fwnode consumer to gpio@e6055000
        platform e6e90000.spi: Linked as a consumer to e6055000.gpio
        spi@e6e90000 Dropping the fwnode link to gpio@e6055000
        platform e6e90000.spi: Linked as a consumer to e6060000.pinctrl
        spi@e6e90000 Dropping the fwnode link to msiof0
        spi@e6e90000 Dropping the fwnode link to system-controller@e6180000
        platform e6e90000.spi: Linked as a consumer to e6150000.clock-controller
        spi@e6e90000 Dropping the fwnode link to clock-controller@e6150000
        platform e6e90000.spi: Linked as a consumer to soc
        spi@e6e90000 Dropping the fwnode link to interrupt-controller@f1010000

        +platform:e6055000.gpio--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
        +platform:e6060000.pinctrl--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:e6e90000.spi
        +platform:e6150000.clock-controller--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
        +platform:soc--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:soc--platform:e6e90000.spi
        -platform:e6060000.pinctrl--platform:keys ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:keys

        SPI EEPROM works

  - Removing overlay:

        platform keys: Linked as a sync state only consumer to e6055000.gpio

        -platform:e6055000.gpio--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6055000.gpio--platform:e6e90000.spi
        -platform:e6060000.pinctrl--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6060000.pinctrl--platform:e6e90000.spi
        -platform:e6150000.clock-controller--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:e6150000.clock-controller--platform:e6e90000.spi
        -platform:soc--platform:e6e90000.spi ->
../../devices/virtual/devlink/platform:soc--platform:e6e90000.spi

        platform:e6060000.pinctrl--platform:keys link is not recreated?!?!?

  - Adding overlay again:

        No debug output
        No change in sys/class/devlink?!?!?
        SPI EEPROM works

Gr{oetje,eeting}s,

                        Geert
diff mbox series

Patch

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 134cfc980b70..c651aad6f34b 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1062,20 +1062,6 @@  of_fwnode_device_get_match_data(const struct fwnode_handle *fwnode,
 	return of_device_get_match_data(dev);
 }
 
-static bool of_is_ancestor_of(struct device_node *test_ancestor,
-			      struct device_node *child)
-{
-	of_node_get(child);
-	while (child) {
-		if (child == test_ancestor) {
-			of_node_put(child);
-			return true;
-		}
-		child = of_get_next_parent(child);
-	}
-	return false;
-}
-
 static struct device_node *of_get_compat_node(struct device_node *np)
 {
 	of_node_get(np);
@@ -1106,71 +1092,27 @@  static struct device_node *of_get_compat_node_parent(struct device_node *np)
 	return node;
 }
 
-/**
- * of_link_to_phandle - Add fwnode link to supplier from supplier phandle
- * @con_np: consumer device tree node
- * @sup_np: supplier device tree node
- *
- * Given a phandle to a supplier device tree node (@sup_np), this function
- * finds the device that owns the supplier device tree node and creates a
- * device link from @dev consumer device to the supplier device. This function
- * doesn't create device links for invalid scenarios such as trying to create a
- * link with a parent device as the consumer of its child device. In such
- * cases, it returns an error.
- *
- * Returns:
- * - 0 if fwnode link successfully created to supplier
- * - -EINVAL if the supplier link is invalid and should not be created
- * - -ENODEV if struct device will never be create for supplier
- */
-static int of_link_to_phandle(struct device_node *con_np,
+static void of_link_to_phandle(struct device_node *con_np,
 			      struct device_node *sup_np)
 {
-	struct device *sup_dev;
-	struct device_node *tmp_np = sup_np;
+	struct device_node *tmp_np = of_node_get(sup_np);
 
-	/*
-	 * Find the device node that contains the supplier phandle.  It may be
-	 * @sup_np or it may be an ancestor of @sup_np.
-	 */
-	sup_np = of_get_compat_node(sup_np);
-	if (!sup_np) {
-		pr_debug("Not linking %pOFP to %pOFP - No device\n",
-			 con_np, tmp_np);
-		return -ENODEV;
-	}
+	/* Check that sup_np and its ancestors are available. */
+	while (tmp_np) {
+		if (of_fwnode_handle(tmp_np)->dev) {
+			of_node_put(tmp_np);
+			break;
+		}
 
-	/*
-	 * Don't allow linking a device node as a consumer of one of its
-	 * descendant nodes. By definition, a child node can't be a functional
-	 * dependency for the parent node.
-	 */
-	if (of_is_ancestor_of(con_np, sup_np)) {
-		pr_debug("Not linking %pOFP to %pOFP - is descendant\n",
-			 con_np, sup_np);
-		of_node_put(sup_np);
-		return -EINVAL;
-	}
+		if (!of_device_is_available(tmp_np)) {
+			of_node_put(tmp_np);
+			return;
+		}
 
-	/*
-	 * Don't create links to "early devices" that won't have struct devices
-	 * created for them.
-	 */
-	sup_dev = get_dev_from_fwnode(&sup_np->fwnode);
-	if (!sup_dev &&
-	    (of_node_check_flag(sup_np, OF_POPULATED) ||
-	     sup_np->fwnode.flags & FWNODE_FLAG_NOT_DEVICE)) {
-		pr_debug("Not linking %pOFP to %pOFP - No struct device\n",
-			 con_np, sup_np);
-		of_node_put(sup_np);
-		return -ENODEV;
+		tmp_np = of_get_next_parent(tmp_np);
 	}
-	put_device(sup_dev);
 
 	fwnode_link_add(of_fwnode_handle(con_np), of_fwnode_handle(sup_np));
-	of_node_put(sup_np);
-
-	return 0;
 }
 
 /**