diff mbox series

[v3,2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD

Message ID 20210915170940.617415-3-saravanak@google.com (mailing list archive)
State Not Applicable
Headers show
Series fw_devlink bug fixes | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Guessed tree name to be net-next
netdev/subject_prefix success Link
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 16212 this patch: 16212
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 43 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 16070 this patch: 16070
netdev/header_inline success Link

Commit Message

Saravana Kannan Sept. 15, 2021, 5:09 p.m. UTC
If a parent device is also a supplier to a child device, fw_devlink=on by
design delays the probe() of the child device until the probe() of the
parent finishes successfully.

However, some drivers of such parent devices (where parent is also a
supplier) expect the child device to finish probing successfully as soon as
they are added using device_add() and before the probe() of the parent
device has completed successfully. One example of such a case is discussed
in the link mentioned below.

Add a flag to make fw_devlink=on not enforce these supplier-consumer
relationships, so these drivers can continue working.

Link: https://lore.kernel.org/netdev/CAGETcx_uj0V4DChME-gy5HGKTYnxLBX=TH2rag29f_p=UcG+Tg@mail.gmail.com/
Fixes: ea718c699055 ("Revert "Revert "driver core: Set fw_devlink=on by default""")
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/core.c    | 19 +++++++++++++++++++
 include/linux/fwnode.h | 11 ++++++++---
 2 files changed, 27 insertions(+), 3 deletions(-)

Comments

Rafael J. Wysocki Sept. 18, 2021, 3:03 p.m. UTC | #1
On Wed, Sep 15, 2021 at 7:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> If a parent device is also a supplier to a child device, fw_devlink=on by
> design delays the probe() of the child device until the probe() of the
> parent finishes successfully.
>
> However, some drivers of such parent devices (where parent is also a
> supplier) expect the child device to finish probing successfully as soon as
> they are added using device_add() and before the probe() of the parent
> device has completed successfully. One example of such a case is discussed
> in the link mentioned below.
>
> Add a flag to make fw_devlink=on not enforce these supplier-consumer
> relationships, so these drivers can continue working.
>
> Link: https://lore.kernel.org/netdev/CAGETcx_uj0V4DChME-gy5HGKTYnxLBX=TH2rag29f_p=UcG+Tg@mail.gmail.com/
> Fixes: ea718c699055 ("Revert "Revert "driver core: Set fw_devlink=on by default""")
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/core.c    | 19 +++++++++++++++++++
>  include/linux/fwnode.h | 11 ++++++++---
>  2 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 316df6027093..21d4cb5d3767 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1722,6 +1722,25 @@ static int fw_devlink_create_devlink(struct device *con,
>         struct device *sup_dev;
>         int ret = 0;
>
> +       /*
> +        * In some cases, a device P might also be a supplier to its child node
> +        * C. However, this would defer the probe of C until the probe of P
> +        * completes successfully. This is perfectly fine in the device driver
> +        * model. device_add() doesn't guarantee probe completion of the device
> +        * by the time it returns.

That's right.

However, I don't quite see a point in device links where the supplier
is a direct ancestor of the consumer.  From the PM perspective they
are simply redundant and I'm not sure about any other use cases where
they aren't.

So what's the reason to add them in the first place?

> +        *
> +        * However, there are a few drivers that assume C will finish probing
> +        * as soon as it's added and before P finishes probing. So, we provide
> +        * a flag to let fw_devlink know not to delay the probe of C until the
> +        * probe of P completes successfully.

Well, who's expected to set that flag and when?  This needs to be
mentioned here IMO.

> +        *
> +        * When such a flag is set, we can't create device links where P is the
> +        * supplier of C as that would delay the probe of C.
> +        */
> +       if (sup_handle->flags & FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD &&
> +           fwnode_is_ancestor_of(sup_handle, con->fwnode))
> +               return -EINVAL;
> +
>         sup_dev = get_dev_from_fwnode(sup_handle);
>         if (sup_dev) {
>                 /*
> diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> index 59828516ebaf..9f4ad719bfe3 100644
> --- a/include/linux/fwnode.h
> +++ b/include/linux/fwnode.h
> @@ -22,10 +22,15 @@ struct device;
>   * LINKS_ADDED:        The fwnode has already be parsed to add fwnode links.
>   * NOT_DEVICE: The fwnode will never be populated as a struct device.
>   * INITIALIZED: The hardware corresponding to fwnode has been initialized.
> + * NEEDS_CHILD_BOUND_ON_ADD: For this fwnode/device to probe successfully, its
> + *                          driver needs its child devices to be bound with
> + *                          their respective drivers as soon as they are
> + *                          added.

The fact that this requires so much comment text here is a clear
band-aid indication to me.

>   */
> -#define FWNODE_FLAG_LINKS_ADDED                BIT(0)
> -#define FWNODE_FLAG_NOT_DEVICE         BIT(1)
> -#define FWNODE_FLAG_INITIALIZED                BIT(2)
> +#define FWNODE_FLAG_LINKS_ADDED                        BIT(0)
> +#define FWNODE_FLAG_NOT_DEVICE                 BIT(1)
> +#define FWNODE_FLAG_INITIALIZED                        BIT(2)
> +#define FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD   BIT(3)
>
>  struct fwnode_handle {
>         struct fwnode_handle *secondary;
> --
> 2.33.0.309.g3052b89438-goog
>
Andrew Lunn Sept. 19, 2021, 1:16 a.m. UTC | #2
> > diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> > index 59828516ebaf..9f4ad719bfe3 100644
> > --- a/include/linux/fwnode.h
> > +++ b/include/linux/fwnode.h
> > @@ -22,10 +22,15 @@ struct device;
> >   * LINKS_ADDED:        The fwnode has already be parsed to add fwnode links.
> >   * NOT_DEVICE: The fwnode will never be populated as a struct device.
> >   * INITIALIZED: The hardware corresponding to fwnode has been initialized.
> > + * NEEDS_CHILD_BOUND_ON_ADD: For this fwnode/device to probe successfully, its
> > + *                          driver needs its child devices to be bound with
> > + *                          their respective drivers as soon as they are
> > + *                          added.
> 
> The fact that this requires so much comment text here is a clear
> band-aid indication to me.

This whole patchset is a band aid, but it is for stable, to fix things
which are currently broken. So we need to answer the question, is a
bad aid good enough for stable, with the assumption a real fix will
come along later?

     Andrew
Greg Kroah-Hartman Sept. 19, 2021, 6:41 a.m. UTC | #3
On Sun, Sep 19, 2021 at 03:16:34AM +0200, Andrew Lunn wrote:
> > > diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> > > index 59828516ebaf..9f4ad719bfe3 100644
> > > --- a/include/linux/fwnode.h
> > > +++ b/include/linux/fwnode.h
> > > @@ -22,10 +22,15 @@ struct device;
> > >   * LINKS_ADDED:        The fwnode has already be parsed to add fwnode links.
> > >   * NOT_DEVICE: The fwnode will never be populated as a struct device.
> > >   * INITIALIZED: The hardware corresponding to fwnode has been initialized.
> > > + * NEEDS_CHILD_BOUND_ON_ADD: For this fwnode/device to probe successfully, its
> > > + *                          driver needs its child devices to be bound with
> > > + *                          their respective drivers as soon as they are
> > > + *                          added.
> > 
> > The fact that this requires so much comment text here is a clear
> > band-aid indication to me.
> 
> This whole patchset is a band aid, but it is for stable, to fix things
> which are currently broken. So we need to answer the question, is a
> bad aid good enough for stable, with the assumption a real fix will
> come along later?

Fix it properly first and worry about stable later.

greg k-h
Greg Kroah-Hartman Sept. 21, 2021, 4:15 p.m. UTC | #4
On Sun, Sep 19, 2021 at 03:16:34AM +0200, Andrew Lunn wrote:
> > > diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> > > index 59828516ebaf..9f4ad719bfe3 100644
> > > --- a/include/linux/fwnode.h
> > > +++ b/include/linux/fwnode.h
> > > @@ -22,10 +22,15 @@ struct device;
> > >   * LINKS_ADDED:        The fwnode has already be parsed to add fwnode links.
> > >   * NOT_DEVICE: The fwnode will never be populated as a struct device.
> > >   * INITIALIZED: The hardware corresponding to fwnode has been initialized.
> > > + * NEEDS_CHILD_BOUND_ON_ADD: For this fwnode/device to probe successfully, its
> > > + *                          driver needs its child devices to be bound with
> > > + *                          their respective drivers as soon as they are
> > > + *                          added.
> > 
> > The fact that this requires so much comment text here is a clear
> > band-aid indication to me.
> 
> This whole patchset is a band aid, but it is for stable, to fix things
> which are currently broken. So we need to answer the question, is a
> bad aid good enough for stable, with the assumption a real fix will
> come along later?

Fix it properly first, don't worry about stable.

But what is wrong with this as-is?  What needs to be done that is not
happening here that you feels still needs to be addressed?

thanks,

greg k-h
Rafael J. Wysocki Sept. 21, 2021, 4:35 p.m. UTC | #5
On Tue, Sep 21, 2021 at 6:15 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Sun, Sep 19, 2021 at 03:16:34AM +0200, Andrew Lunn wrote:
> > > > diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> > > > index 59828516ebaf..9f4ad719bfe3 100644
> > > > --- a/include/linux/fwnode.h
> > > > +++ b/include/linux/fwnode.h
> > > > @@ -22,10 +22,15 @@ struct device;
> > > >   * LINKS_ADDED:        The fwnode has already be parsed to add fwnode links.
> > > >   * NOT_DEVICE: The fwnode will never be populated as a struct device.
> > > >   * INITIALIZED: The hardware corresponding to fwnode has been initialized.
> > > > + * NEEDS_CHILD_BOUND_ON_ADD: For this fwnode/device to probe successfully, its
> > > > + *                          driver needs its child devices to be bound with
> > > > + *                          their respective drivers as soon as they are
> > > > + *                          added.
> > >
> > > The fact that this requires so much comment text here is a clear
> > > band-aid indication to me.
> >
> > This whole patchset is a band aid, but it is for stable, to fix things
> > which are currently broken. So we need to answer the question, is a
> > bad aid good enough for stable, with the assumption a real fix will
> > come along later?
>
> Fix it properly first, don't worry about stable.
>
> But what is wrong with this as-is?  What needs to be done that is not
> happening here that you feels still needs to be addressed?

The existing code attempts to "enforce" device links where the
supplier is a direct ancestor of the consumer (e.g. its parent), which
is questionable by itself (why do that?) and that's the source of the
underlying issue (self-inflicted circular dependencies that cause
devices to wait for a deferred probe forever) which this patchest
attempts to avoid by adding an extra flag to the driver core and
expecting specific drivers to mark their devices as "special".  And
that's "until we have a real fix".
Andrew Lunn Sept. 21, 2021, 5:56 p.m. UTC | #6
> The existing code attempts to "enforce" device links where the
> supplier is a direct ancestor of the consumer (e.g. its parent), which
> is questionable by itself (why do that?)

In this case, we have an Ethernet switch as the parent device. It
registers an interrupt controller, to the interrupt subsystem. It also
registers an MDIO controller to the MDIO subsystem. The MDIO subsystem
finds the Ethernet PHYs on the MDIO bus, and registers the PHYs to the
PHY subsystem.

Device tree is then used to glue all the parts together. The PHY has
an interrupt output which is connected to the interrupt controller,
and a standard DT property is used to connect the two. The MACs in the
switch are connected to the PHYs, and standard DT properties are used
to connect them together. So we have a loop. But the driver model does
not have a problem with this, at least not until fw_devlink came
along. As soon as a resource is registered with a subsystem, it can be
used. Where as fw_devlink seems to assume a resource cannot be used
until the driver providing it completes probe.

Now, we could ignore all these subsystems, re-invent the wheels inside
the switch driver, and then not have suppliers and consumers at all,
it is all internal. But that seems like a bad idea, more wheels, more
bugs.

So for me, the real fix is that fw_devlink learns that resources are
available as soon as they are registered, not when the provider device
completes probe.

	  Andrew
Rafael J. Wysocki Sept. 21, 2021, 6:56 p.m. UTC | #7
On Tue, Sep 21, 2021 at 7:56 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > The existing code attempts to "enforce" device links where the
> > supplier is a direct ancestor of the consumer (e.g. its parent), which
> > is questionable by itself (why do that?)
>
> In this case, we have an Ethernet switch as the parent device. It
> registers an interrupt controller, to the interrupt subsystem. It also
> registers an MDIO controller to the MDIO subsystem. The MDIO subsystem
> finds the Ethernet PHYs on the MDIO bus, and registers the PHYs to the
> PHY subsystem.
>
> Device tree is then used to glue all the parts together. The PHY has
> an interrupt output which is connected to the interrupt controller,
> and a standard DT property is used to connect the two. The MACs in the
> switch are connected to the PHYs, and standard DT properties are used
> to connect them together. So we have a loop. But the driver model does
> not have a problem with this, at least not until fw_devlink came
> along. As soon as a resource is registered with a subsystem, it can be
> used. Where as fw_devlink seems to assume a resource cannot be used
> until the driver providing it completes probe.

It works at a device level, so it doesn't know about resources.  The
only information it has is of the "this device may depend on that
other device" type and it uses that information to figure out a usable
probe ordering for drivers.

> Now, we could ignore all these subsystems, re-invent the wheels inside
> the switch driver, and then not have suppliers and consumers at all,
> it is all internal. But that seems like a bad idea, more wheels, more
> bugs.
>
> So for me, the real fix is that fw_devlink learns that resources are
> available as soon as they are registered, not when the provider device
> completes probe.

Because it doesn't know about individual resources, it cannot really do that.

Also if the probe has already started, it may still return
-EPROBE_DEFER at any time in theory, so as a rule the dependency is
actually known to be satisfied when the probe has successfully
completed.

However, making children wait for their parents to complete probing is
generally artificial, especially in the cases when the children are
registered by the parent's driver.  So waiting should be an exception
in these cases, not a rule.
Andrew Lunn Sept. 21, 2021, 7:50 p.m. UTC | #8
> It works at a device level, so it doesn't know about resources.  The
> only information it has is of the "this device may depend on that
> other device" type and it uses that information to figure out a usable
> probe ordering for drivers.

And that simplification is the problem. A phandle does not point to a
device, it points to a resource of a device. It should really be doing
what the driver would do, follow the phandle to the resource and see
if it exists yet. If it does not exist then yes it can defer the
probe. If the resource does exist, allow the driver to probe.

> Also if the probe has already started, it may still return
> -EPROBE_DEFER at any time in theory

Sure it can, and does. And any driver which is not broken will
unregister its resources on the error path. And that causes users of
the resources to release them. It all nicely unravels, and then tries
again later. This all works, it is what these drivers do.

> However, making children wait for their parents to complete probing is
> generally artificial, especially in the cases when the children are
> registered by the parent's driver.  So waiting should be an exception
> in these cases, not a rule.

So are you suggesting that fw_devlink core needs to change, recognise
the dependency on a parent, and allow the probe? Works for me. Gets us
back to before fw_devlink.

    Andrew
Saravana Kannan Sept. 21, 2021, 8:07 p.m. UTC | #9
Sorry I've been busy with LPC and some other stuff and could respond earlier.

On Tue, Sep 21, 2021 at 12:50 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > It works at a device level, so it doesn't know about resources.  The
> > only information it has is of the "this device may depend on that
> > other device" type and it uses that information to figure out a usable
> > probe ordering for drivers.
>
> And that simplification is the problem. A phandle does not point to a
> device, it points to a resource of a device. It should really be doing
> what the driver would do, follow the phandle to the resource and see
> if it exists yet. If it does not exist then yes it can defer the
> probe. If the resource does exist, allow the driver to probe.
>
> > Also if the probe has already started, it may still return
> > -EPROBE_DEFER at any time in theory
>
> Sure it can, and does. And any driver which is not broken will
> unregister its resources on the error path. And that causes users of
> the resources to release them. It all nicely unravels, and then tries
> again later. This all works, it is what these drivers do.

One of the points of fw_devlink=on is to avoid the pointless deferred
probes that'd happen in this situation. So saying "let this happen"
when fw_devlink=on kinda beats the point of it. See further below.

>
> > However, making children wait for their parents to complete probing is
> > generally artificial, especially in the cases when the children are
> > registered by the parent's driver.  So waiting should be an exception
> > in these cases, not a rule.

Rafael,

There are cases where the children try to probe too quickly (before
the parent has had time to set up all the resources it's setting up)
and the child defers the probe. Even Andrew had an example of that
with some ethernet driver where the deferred probe is attempted
multiple times wasting time and then it eventually succeeds.

Considering there's no guarantee that a device_add() will result in
the device being bound immediately, why shouldn't we make the child
device wait until the parent has completely probed and we know all the
resources from the parent are guaranteed to be available? Why can't we
treat drivers that assume a device will get bound as soon as it's
added as the exception (because we don't guarantee that anyway)?

Also, this assumption that the child will be bound successfully upon
addition forces the parent/child drivers to play initcall chicken --
the child's driver has to be registered before the parent's driver. We
should be getting away from those by fixing the parent driver that's
making these assumptions (I'll be glad to help with that). We need to
be moving towards reducing pointless deferred probes and initcall
ordering requirements instead of saying "this bad assumption used to
work, so allow me to continue doing that".

-Saravana

> So are you suggesting that fw_devlink core needs to change, recognise
> the dependency on a parent, and allow the probe? Works for me. Gets us
> back to before fw_devlink.
Andrew Lunn Sept. 21, 2021, 9:02 p.m. UTC | #10
> There are cases where the children try to probe too quickly (before
> the parent has had time to set up all the resources it's setting up)
> and the child defers the probe. Even Andrew had an example of that
> with some ethernet driver where the deferred probe is attempted
> multiple times wasting time and then it eventually succeeds.

And i prefer an occasional EPROBE_DEFER over a broken Ethernet switch,
which is the current state. I'm happy to see optimisations, but not at
the expense of breaking working stuff.

> Also, this assumption that the child will be bound successfully upon
> addition forces the parent/child drivers to play initcall chicken

We have never had any initcall chicken problems. The switch drivers
all are standard mdio_module_driver, module_platform_driver,
module_i2c_driver, module_pci_driver. Nothing special here. Things
load in whatever order they load, and it all works out, maybe with an
EPROBE_DEFER cycle. Which is good, we get our error paths tested, and
sometimes find bugs that way.

     Andrew
Saravana Kannan Sept. 21, 2021, 9:54 p.m. UTC | #11
On Tue, Sep 21, 2021 at 2:03 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > There are cases where the children try to probe too quickly (before
> > the parent has had time to set up all the resources it's setting up)
> > and the child defers the probe. Even Andrew had an example of that
> > with some ethernet driver where the deferred probe is attempted
> > multiple times wasting time and then it eventually succeeds.
>
> And i prefer an occasional EPROBE_DEFER over a broken Ethernet switch,
> which is the current state. I'm happy to see optimisations, but not at
> the expense of breaking working stuff.

Right, but in that case, the long term solution should be to make
changes so we don't expect the child to be bound as soon as it's
added. Not disable the optimization. Agree?

>
> > Also, this assumption that the child will be bound successfully upon
> > addition forces the parent/child drivers to play initcall chicken
>
> We have never had any initcall chicken problems. The switch drivers
> all are standard mdio_module_driver, module_platform_driver,
> module_i2c_driver, module_pci_driver. Nothing special here. Things
> load in whatever order they load, and it all works out, maybe with an
> EPROBE_DEFER cycle. Which is good, we get our error paths tested, and
> sometimes find bugs that way.

My comment was a general comment about parent drives that expect the
child drivers to be bound on addition -- not specific to DSA.

But even in the DSA case, not playing initcall chicken means if you
change the order of driver registration, things should still work.
However, as it stands, if you register the switch driver before the
PHY drivers are registered, you are going to force bind the PHYs to
the generic driver.

-Saravana
Andrew Lunn Sept. 21, 2021, 10:04 p.m. UTC | #12
On Tue, Sep 21, 2021 at 02:54:47PM -0700, Saravana Kannan wrote:
> On Tue, Sep 21, 2021 at 2:03 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > There are cases where the children try to probe too quickly (before
> > > the parent has had time to set up all the resources it's setting up)
> > > and the child defers the probe. Even Andrew had an example of that
> > > with some ethernet driver where the deferred probe is attempted
> > > multiple times wasting time and then it eventually succeeds.
> >
> > And i prefer an occasional EPROBE_DEFER over a broken Ethernet switch,
> > which is the current state. I'm happy to see optimisations, but not at
> > the expense of breaking working stuff.
> 
> Right, but in that case, the long term solution should be to make
> changes so we don't expect the child to be bound as soon as it's
> added. Not disable the optimization. Agree?

Maybe. Lets see how you fix what is currently broken. At the moment, i
don't care too much about the long term solution. The current quick
fix for stable does not seem to be making any progress. So we need the
real fix now, to unbreak what is currently broken, then we can think
about the long term.

    Andrew
Saravana Kannan Sept. 21, 2021, 10:26 p.m. UTC | #13
On Tue, Sep 21, 2021 at 3:04 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Tue, Sep 21, 2021 at 02:54:47PM -0700, Saravana Kannan wrote:
> > On Tue, Sep 21, 2021 at 2:03 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > There are cases where the children try to probe too quickly (before
> > > > the parent has had time to set up all the resources it's setting up)
> > > > and the child defers the probe. Even Andrew had an example of that
> > > > with some ethernet driver where the deferred probe is attempted
> > > > multiple times wasting time and then it eventually succeeds.
> > >
> > > And i prefer an occasional EPROBE_DEFER over a broken Ethernet switch,
> > > which is the current state. I'm happy to see optimisations, but not at
> > > the expense of breaking working stuff.
> >
> > Right, but in that case, the long term solution should be to make
> > changes so we don't expect the child to be bound as soon as it's
> > added. Not disable the optimization. Agree?
>
> Maybe. Lets see how you fix what is currently broken. At the moment, i
> don't care too much about the long term solution. The current quick
> fix for stable does not seem to be making any progress. So we need the
> real fix now, to unbreak what is currently broken, then we can think
> about the long term.

Wait, what's the difference between a real fix vs a long term fix? To
me those are the same.

I agree that having DSA be broken till we have the final fix isn't
good. Especially because DSA is complicated and the over eager gen PHY
driver makes it even harder to fix it quickly.

Merging this patch gives an exception to DSA, while we figure out a
good fix. It also makes sure more drivers don't get merged with the
same assumptions (because fw_devlink=on won't give them the
exception).

Greg/Rafael, can we merge this please while we figure out a fix for
DSA/PHY. It's a non-trivial problem to solve because PHYs need some
kind of driver "match rating".

-Saravana
Andrew Lunn Sept. 22, 2021, 12:45 a.m. UTC | #14
> Wait, what's the difference between a real fix vs a long term fix? To
> me those are the same.

Maybe the long term fix is you follow the phandle to the actual
resources, see it is present, and allow the probe? That brings you in
line with how things actually work with devices probing against
resources.

I don't know how much work that is, since there is no uniform API to
follow a phandle to a resource. I think each phandle type has its own
helper. For an interrupt phandle you need to use of_irq_get(), for a
gpio phandle maybe of_get_named_gpio_flags(), for a reset phandle
__of_reset_control_get(), etc.

Because this does not sounds too simple, maybe you can find something
simpler which is a real fix for now, good enough that it will get
merged, and then you can implement this phandle following for the long
term fix?

	 Andrew
Saravana Kannan Sept. 22, 2021, 1 a.m. UTC | #15
On Tue, Sep 21, 2021 at 5:45 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Wait, what's the difference between a real fix vs a long term fix? To
> > me those are the same.
>
> Maybe the long term fix is you follow the phandle to the actual
> resources, see it is present, and allow the probe? That brings you in
> line with how things actually work with devices probing against
> resources.
>
> I don't know how much work that is, since there is no uniform API to
> follow a phandle to a resource. I think each phandle type has its own
> helper. For an interrupt phandle you need to use of_irq_get(), for a
> gpio phandle maybe of_get_named_gpio_flags(), for a reset phandle
> __of_reset_control_get(), etc.

That goes back to Rafael's reply (and I agree):

"Also if the probe has already started, it may still return
-EPROBE_DEFER at any time in theory, so as a rule the dependency is
actually known to be satisfied when the probe has successfully
completed."

So waiting for the probe to finish is the right behavior/intentional
for fw_devlink.

> Because this does not sounds too simple, maybe you can find something
> simpler which is a real fix for now, good enough that it will get
> merged, and then you can implement this phandle following for the long
> term fix?

The simpler fix is really just this patch. I'm hoping Greg/Rafael see
my point about doing the exception this way prevents things from
getting worse will we address existing cases that need the flag.

The long/proper fix is to the DSA framework. I have some ideas that I
think will work but I've had time to get to (but on the top of my
upstream work list). We can judge that after I send out the patches :)

-Saravana
Andrew Lunn Sept. 22, 2021, 12:52 p.m. UTC | #16
> That goes back to Rafael's reply (and I agree):
> 
> "Also if the probe has already started, it may still return
> -EPROBE_DEFER at any time in theory, so as a rule the dependency is
> actually known to be satisfied when the probe has successfully
> completed."
> 
> So waiting for the probe to finish is the right behavior/intentional
> for fw_devlink.

But differs to how things actually work in the driver model. The
driver model does not care if a driver has finished probing, you can
use a resource as soon as it is registered. Hence this whole
problem/discussion.

	Andrew
Rafael J. Wysocki Sept. 23, 2021, 12:48 p.m. UTC | #17
On Tue, Sep 21, 2021 at 10:07 PM Saravana Kannan <saravanak@google.com> wrote:
>
> Sorry I've been busy with LPC and some other stuff and could respond earlier.
>
> On Tue, Sep 21, 2021 at 12:50 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > It works at a device level, so it doesn't know about resources.  The
> > > only information it has is of the "this device may depend on that
> > > other device" type and it uses that information to figure out a usable
> > > probe ordering for drivers.
> >
> > And that simplification is the problem. A phandle does not point to a
> > device, it points to a resource of a device. It should really be doing
> > what the driver would do, follow the phandle to the resource and see
> > if it exists yet. If it does not exist then yes it can defer the
> > probe. If the resource does exist, allow the driver to probe.
> >
> > > Also if the probe has already started, it may still return
> > > -EPROBE_DEFER at any time in theory
> >
> > Sure it can, and does. And any driver which is not broken will
> > unregister its resources on the error path. And that causes users of
> > the resources to release them. It all nicely unravels, and then tries
> > again later. This all works, it is what these drivers do.
>
> One of the points of fw_devlink=on is to avoid the pointless deferred
> probes that'd happen in this situation. So saying "let this happen"
> when fw_devlink=on kinda beats the point of it. See further below.

Well, you need to define "pointless deferred probes" in the first
place.  fw_devlink adds deferred probes by itself, so why are those
not pointless whereas the others are?

> >
> > > However, making children wait for their parents to complete probing is
> > > generally artificial, especially in the cases when the children are
> > > registered by the parent's driver.  So waiting should be an exception
> > > in these cases, not a rule.
>
> Rafael,
>
> There are cases where the children try to probe too quickly (before
> the parent has had time to set up all the resources it's setting up)
> and the child defers the probe. Even Andrew had an example of that
> with some ethernet driver where the deferred probe is attempted
> multiple times wasting time and then it eventually succeeds.

You seem to be arguing that it may be possible to replace multiple
probe attempts that each are deferred with one probe deferral which
then is beneficial from the performance perspective.

Yes, there are cases like that, but when this is used as a general
rule, it introduces a problem if it does a deferred probe when there
is no need for a probe deferral at all (like in the specific problem
case at hand).  Also if the probing of the child is deferred just
once, adding an extra dependency on the parent to it doesn't really
help.

> Considering there's no guarantee that a device_add() will result in
> the device being bound immediately, why shouldn't we make the child
> device wait until the parent has completely probed and we know all the
> resources from the parent are guaranteed to be available? Why can't we
> treat drivers that assume a device will get bound as soon as it's
> added as the exception (because we don't guarantee that anyway)?

Because this adds artificial constraints that otherwise aren't there
in some cases to the picture and asking drivers to mark themselves as
"please don't add these artificial constraints for me" is not
particularly straightforward.  Moreover, it does that retroactively
causing things that are entirely correct and previously worked just
fine to now have to paint themselves red to continue working as
before.

The fact that immediate probe completion cannot be guaranteed in
general doesn't mean that it cannot be assumed in certain situations.
For example, a parent driver registering a child may know what the
child driver is and so it may know that the child will either probe
successfully right away or the probing of it will fail and your extra
constraint breaks that assumption.  You can't really know how many of
such cases there are and trying to cover them with a new flag is a
retroactive whack-a-mole game.

> Also, this assumption that the child will be bound successfully upon
> addition forces the parent/child drivers to play initcall chicken --
> the child's driver has to be registered before the parent's driver.

That's true, but why is this a general problem?  For example, they
both may be registered by the same function in the right order.
What's wrong with that?

> We should be getting away from those by fixing the parent driver that's
> making these assumptions (I'll be glad to help with that). We need to
> be moving towards reducing pointless deferred probes and initcall
> ordering requirements instead of saying "this bad assumption used to
> work, so allow me to continue doing that".

It is not always a bad assumption.  It may be code designed this way.
diff mbox series

Patch

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 316df6027093..21d4cb5d3767 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1722,6 +1722,25 @@  static int fw_devlink_create_devlink(struct device *con,
 	struct device *sup_dev;
 	int ret = 0;
 
+	/*
+	 * In some cases, a device P might also be a supplier to its child node
+	 * C. However, this would defer the probe of C until the probe of P
+	 * completes successfully. This is perfectly fine in the device driver
+	 * model. device_add() doesn't guarantee probe completion of the device
+	 * by the time it returns.
+	 *
+	 * However, there are a few drivers that assume C will finish probing
+	 * as soon as it's added and before P finishes probing. So, we provide
+	 * a flag to let fw_devlink know not to delay the probe of C until the
+	 * probe of P completes successfully.
+	 *
+	 * When such a flag is set, we can't create device links where P is the
+	 * supplier of C as that would delay the probe of C.
+	 */
+	if (sup_handle->flags & FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD &&
+	    fwnode_is_ancestor_of(sup_handle, con->fwnode))
+		return -EINVAL;
+
 	sup_dev = get_dev_from_fwnode(sup_handle);
 	if (sup_dev) {
 		/*
diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index 59828516ebaf..9f4ad719bfe3 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -22,10 +22,15 @@  struct device;
  * LINKS_ADDED:	The fwnode has already be parsed to add fwnode links.
  * NOT_DEVICE:	The fwnode will never be populated as a struct device.
  * INITIALIZED: The hardware corresponding to fwnode has been initialized.
+ * NEEDS_CHILD_BOUND_ON_ADD: For this fwnode/device to probe successfully, its
+ *			     driver needs its child devices to be bound with
+ *			     their respective drivers as soon as they are
+ *			     added.
  */
-#define FWNODE_FLAG_LINKS_ADDED		BIT(0)
-#define FWNODE_FLAG_NOT_DEVICE		BIT(1)
-#define FWNODE_FLAG_INITIALIZED		BIT(2)
+#define FWNODE_FLAG_LINKS_ADDED			BIT(0)
+#define FWNODE_FLAG_NOT_DEVICE			BIT(1)
+#define FWNODE_FLAG_INITIALIZED			BIT(2)
+#define FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD	BIT(3)
 
 struct fwnode_handle {
 	struct fwnode_handle *secondary;