
driver core: Fix double failed probing with fw_devlink=on

Message ID: 20210215111619.2385030-1-geert+renesas@glider.be (mailing list archive)
State: Under Review
Delegated to: Geert Uytterhoeven
Series: driver core: Fix double failed probing with fw_devlink=on

Commit Message

Geert Uytterhoeven Feb. 15, 2021, 11:16 a.m. UTC
With fw_devlink=permissive, devices are added to the deferred probe
pending list if their driver's .probe() method returns -EPROBE_DEFER.

With fw_devlink=on, devices are added to the deferred probe pending list
if they are determined to be a consumer, which happens before their
driver's .probe() method is called.  If the actual probe fails later
(real failure, not -EPROBE_DEFER), the device will still be on the
deferred probe pending list, and it will be probed again when deferred
probing kicks in, which is futile.

Fix this by explicitly removing the device from the deferred probe
pending list in case of probe failures.

Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
Seen on various Renesas R-Car platforms, cfr.
https://lore.kernel.org/linux-acpi/CAMuHMdVL-1RKJ5u-HDVA4F4w_+8yGvQQuJQBcZMsdV4yXzzfcw@mail.gmail.com
---
 drivers/base/dd.c | 2 ++
 1 file changed, 2 insertions(+)
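
To make the sequence described in the commit message concrete, here is a minimal user-space sketch in C. It is a toy model, not the code in drivers/base/dd.c: toy_device, toy_supplier_bound(), toy_probe(), toy_deferred_retry() and toy_count_probes() are hypothetical names invented for illustration, and the deferred probe pending list is reduced to a single flag. The sketch assumes the consumer is queued when its supplier binds, is then probed once through the regular path, and is probed again by deferred probing only if it is still queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; not the kernel's types or helpers. */
#define TOY_EPROBE_DEFER 517	/* same numeric value as the kernel's EPROBE_DEFER */
#define TOY_EINVAL 22

struct toy_device {
	bool on_pending_list;	/* models membership of the deferred probe pending list */
	int probe_result;	/* what the toy driver's probe returns (0 or negative) */
	int probe_calls;	/* number of probe attempts so far */
};

/* Models a supplier binding: the consumer is queued before it was ever probed. */
static void toy_supplier_bound(struct toy_device *consumer)
{
	consumer->on_pending_list = true;
}

/* Models one probe attempt and the list bookkeeping that follows it. */
static int toy_probe(struct toy_device *dev, bool remove_on_failure)
{
	int ret = dev->probe_result;

	dev->probe_calls++;
	if (ret == -TOY_EPROBE_DEFER)
		dev->on_pending_list = true;	/* ask for a later retry */
	else if (ret == 0)
		dev->on_pending_list = false;	/* bound devices leave the list */
	else if (remove_on_failure)
		dev->on_pending_list = false;	/* the step the patch adds */
	return ret;
}

/* Models deferred probing kicking in: retry only if the device is still queued. */
static void toy_deferred_retry(struct toy_device *dev, bool remove_on_failure)
{
	if (!dev->on_pending_list)
		return;
	dev->on_pending_list = false;
	toy_probe(dev, remove_on_failure);
}

/* Runs the whole scenario and reports how often probe was called. */
int toy_count_probes(int probe_result, bool remove_on_failure)
{
	struct toy_device dev = { false, probe_result, 0 };

	assert(probe_result <= 0);
	toy_supplier_bound(&dev);
	toy_probe(&dev, remove_on_failure);		/* regular probe */
	toy_deferred_retry(&dev, remove_on_failure);	/* deferred probing */
	return dev.probe_calls;
}
```

Under these assumptions, toy_count_probes(-TOY_EINVAL, false) counts two probe attempts (the futile retry), while toy_count_probes(-TOY_EINVAL, true) counts one. A probe that returns -TOY_EPROBE_DEFER is retried in both cases, which is the intended behaviour.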

Comments

Rafael J. Wysocki Feb. 15, 2021, 2:58 p.m. UTC | #1
On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
<geert+renesas@glider.be> wrote:
>
> With fw_devlink=permissive, devices are added to the deferred probe
> pending list if their driver's .probe() method returns -EPROBE_DEFER.
>
> With fw_devlink=on, devices are added to the deferred probe pending list
> if they are determined to be a consumer, which happens before their
> driver's .probe() method is called.  If the actual probe fails later
> (real failure, not -EPROBE_DEFER), the device will still be on the
> deferred probe pending list, and it will be probed again when deferred
> probing kicks in, which is futile.
>
> Fix this by explicitly removing the device from the deferred probe
> pending list in case of probe failures.
>
> Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

Good catch:

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
> Seen on various Renesas R-Car platforms, cfr.
> https://lore.kernel.org/linux-acpi/CAMuHMdVL-1RKJ5u-HDVA4F4w_+8yGvQQuJQBcZMsdV4yXzzfcw@mail.gmail.com
> ---
>  drivers/base/dd.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 9179825ff646f4e3..91c4181093c43709 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -639,11 +639,13 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>         case -ENXIO:
>                 pr_debug("%s: probe of %s rejects match %d\n",
>                          drv->name, dev_name(dev), ret);
> +               driver_deferred_probe_del(dev);
>                 break;
>         default:
>                 /* driver matched but the probe failed */
>                 pr_warn("%s: probe of %s failed with error %d\n",
>                         drv->name, dev_name(dev), ret);
> +               driver_deferred_probe_del(dev);
>         }
>         /*
>          * Ignore errors returned by ->probe so that the next driver can try
> --
> 2.25.1
>
Saravana Kannan Feb. 15, 2021, 6:26 p.m. UTC | #2
On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
> <geert+renesas@glider.be> wrote:
> >
> > With fw_devlink=permissive, devices are added to the deferred probe
> > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> >
> > With fw_devlink=on, devices are added to the deferred probe pending list
> > if they are determined to be a consumer,

If they are determined to be a consumer or if they are determined to
have a supplier that hasn't probed yet?

> > which happens before their
> > driver's .probe() method is called.  If the actual probe fails later
> > (real failure, not -EPROBE_DEFER), the device will still be on the
> > deferred probe pending list, and it will be probed again when deferred
> > probing kicks in, which is futile.
> >
> > Fix this by explicitly removing the device from the deferred probe
> > pending list in case of probe failures.
> >
> > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
>
> Good catch:
>
> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Geert,

The issue is real and needs to be fixed. But I'm confused how this can
happen. We won't even enter really_probe() if the driver isn't ready.
We also won't get to run the driver's .probe() if the suppliers aren't
ready. So how does the device get added to the deferred probe list
before the driver is ready? Is this due to device_links_driver_bound()
on the supplier?

Can you give a more detailed step by step on the case you are hitting?

Greg/Rafael,

Let's hold off picking this patch till I get to take a closer look
(within a day or two) please.

-Saravana

>
> > ---
> > Seen on various Renesas R-Car platforms, cfr.
> > https://lore.kernel.org/linux-acpi/CAMuHMdVL-1RKJ5u-HDVA4F4w_+8yGvQQuJQBcZMsdV4yXzzfcw@mail.gmail.com
> > ---
> >  drivers/base/dd.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 9179825ff646f4e3..91c4181093c43709 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -639,11 +639,13 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> >         case -ENXIO:
> >                 pr_debug("%s: probe of %s rejects match %d\n",
> >                          drv->name, dev_name(dev), ret);
> > +               driver_deferred_probe_del(dev);
> >                 break;
> >         default:
> >                 /* driver matched but the probe failed */
> >                 pr_warn("%s: probe of %s failed with error %d\n",
> >                         drv->name, dev_name(dev), ret);
> > +               driver_deferred_probe_del(dev);
> >         }
> >         /*
> >          * Ignore errors returned by ->probe so that the next driver can try
> > --
> > 2.25.1
> >
Geert Uytterhoeven Feb. 15, 2021, 7:08 p.m. UTC | #3
Hi Saravana,

On Mon, Feb 15, 2021 at 7:27 PM Saravana Kannan <saravanak@google.com> wrote:
> On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
> > <geert+renesas@glider.be> wrote:
> > > With fw_devlink=permissive, devices are added to the deferred probe
> > > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> > >
> > > With fw_devlink=on, devices are added to the deferred probe pending list
> > > if they are determined to be a consumer,
>
> If they are determined to be a consumer or if they are determined to
> have a supplier that hasn't probed yet?

When the supplier has probed:

    bus: 'platform': driver_probe_device: matched device e6150000.clock-controller with driver renesas-cpg-mssr
    bus: 'platform': really_probe: probing driver renesas-cpg-mssr with device e6150000.clock-controller
    PM: Added domain provider from /soc/clock-controller@e6150000
    driver: 'renesas-cpg-mssr': driver_bound: bound to device 'e6150000.clock-controller'
    platform e6055800.gpio: Added to deferred list
    [...]
    platform e6020000.watchdog: Added to deferred list
    [...]
    platform fe000000.pcie: Added to deferred list

> > > which happens before their
> > > driver's .probe() method is called.  If the actual probe fails later
> > > (real failure, not -EPROBE_DEFER), the device will still be on the
> > > deferred probe pending list, and it will be probed again when deferred
> > > probing kicks in, which is futile.
> > >
> > > Fix this by explicitly removing the device from the deferred probe
> > > pending list in case of probe failures.
> > >
> > > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> >
> > Good catch:
> >
> > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The issue is real and needs to be fixed. But I'm confused how this can
> happen. We won't even enter really_probe() if the driver isn't ready.
> We also won't get to run the driver's .probe() if the suppliers aren't
> ready. So how does the device get added to the deferred probe list
> before the driver is ready? Is this due to device_links_driver_bound()
> on the supplier?
>
> Can you give a more detailed step by step on the case you are hitting?

The device is added to the list due to device_links_driver_bound()
calling driver_deferred_probe_add() on all consumer devices.

> > > +++ b/drivers/base/dd.c
> > > @@ -639,11 +639,13 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > >         case -ENXIO:
> > >                 pr_debug("%s: probe of %s rejects match %d\n",
> > >                          drv->name, dev_name(dev), ret);
> > > +               driver_deferred_probe_del(dev);
> > >                 break;
> > >         default:
> > >                 /* driver matched but the probe failed */
> > >                 pr_warn("%s: probe of %s failed with error %d\n",
> > >                         drv->name, dev_name(dev), ret);
> > > +               driver_deferred_probe_del(dev);
> > >         }
> > >         /*
> > >          * Ignore errors returned by ->probe so that the next driver can try

Gr{oetje,eeting}s,

                        Geert
Saravana Kannan Feb. 15, 2021, 8:59 p.m. UTC | #4
On Mon, Feb 15, 2021 at 11:08 AM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Mon, Feb 15, 2021 at 7:27 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
> > > <geert+renesas@glider.be> wrote:
> > > > With fw_devlink=permissive, devices are added to the deferred probe
> > > > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> > > >
> > > > With fw_devlink=on, devices are added to the deferred probe pending list
> > > > if they are determined to be a consumer,
> >
> > If they are determined to be a consumer or if they are determined to
> > have a supplier that hasn't probed yet?
>
> When the supplier has probed:
>
>     bus: 'platform': driver_probe_device: matched device e6150000.clock-controller with driver renesas-cpg-mssr
>     bus: 'platform': really_probe: probing driver renesas-cpg-mssr with device e6150000.clock-controller
>     PM: Added domain provider from /soc/clock-controller@e6150000
>     driver: 'renesas-cpg-mssr': driver_bound: bound to device 'e6150000.clock-controller'
>     platform e6055800.gpio: Added to deferred list
>     [...]
>     platform e6020000.watchdog: Added to deferred list
>     [...]
>     platform fe000000.pcie: Added to deferred list
>
> > > > which happens before their
> > > > driver's .probe() method is called.  If the actual probe fails later
> > > > (real failure, not -EPROBE_DEFER), the device will still be on the
> > > > deferred probe pending list, and it will be probed again when deferred
> > > > probing kicks in, which is futile.
> > > >
> > > > Fix this by explicitly removing the device from the deferred probe
> > > > pending list in case of probe failures.
> > > >
> > > > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > >
> > > Good catch:
> > >
> > > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The issue is real and needs to be fixed. But I'm confused how this can
> > happen. We won't even enter really_probe() if the driver isn't ready.
> > We also won't get to run the driver's .probe() if the suppliers aren't
> > ready. So how does the device get added to the deferred probe list
> > before the driver is ready? Is this due to device_links_driver_bound()
> > on the supplier?
> >
> > Can you give a more detailed step by step on the case you are hitting?
>
> The device is added to the list due to device_links_driver_bound()
> calling driver_deferred_probe_add() on all consumer devices.

Thanks for the explanation. Maybe add more details like this to the
commit text or in the code?

For the code:
Reviewed-by: Saravana Kannan <saravanak@google.com>

-Saravana

>
> > > > +++ b/drivers/base/dd.c
> > > > @@ -639,11 +639,13 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > > >         case -ENXIO:
> > > >                 pr_debug("%s: probe of %s rejects match %d\n",
> > > >                          drv->name, dev_name(dev), ret);
> > > > +               driver_deferred_probe_del(dev);
> > > >                 break;
> > > >         default:
> > > >                 /* driver matched but the probe failed */
> > > >                 pr_warn("%s: probe of %s failed with error %d\n",
> > > >                         drv->name, dev_name(dev), ret);
> > > > +               driver_deferred_probe_del(dev);
> > > >         }
> > > >         /*
> > > >          * Ignore errors returned by ->probe so that the next driver can try
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
Saravana Kannan Feb. 16, 2021, 5:07 p.m. UTC | #5
On Mon, Feb 15, 2021 at 12:59 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Mon, Feb 15, 2021 at 11:08 AM Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> >
> > Hi Saravana,
> >
> > On Mon, Feb 15, 2021 at 7:27 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Mon, Feb 15, 2021 at 6:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > On Mon, Feb 15, 2021 at 12:16 PM Geert Uytterhoeven
> > > > <geert+renesas@glider.be> wrote:
> > > > > With fw_devlink=permissive, devices are added to the deferred probe
> > > > > pending list if their driver's .probe() method returns -EPROBE_DEFER.
> > > > >
> > > > > With fw_devlink=on, devices are added to the deferred probe pending list
> > > > > if they are determined to be a consumer,
> > >
> > > If they are determined to be a consumer or if they are determined to
> > > have a supplier that hasn't probed yet?
> >
> > When the supplier has probed:
> >
> >     bus: 'platform': driver_probe_device: matched device e6150000.clock-controller with driver renesas-cpg-mssr
> >     bus: 'platform': really_probe: probing driver renesas-cpg-mssr with device e6150000.clock-controller
> >     PM: Added domain provider from /soc/clock-controller@e6150000
> >     driver: 'renesas-cpg-mssr': driver_bound: bound to device 'e6150000.clock-controller'
> >     platform e6055800.gpio: Added to deferred list
> >     [...]
> >     platform e6020000.watchdog: Added to deferred list
> >     [...]
> >     platform fe000000.pcie: Added to deferred list
> >
> > > > > which happens before their
> > > > > driver's .probe() method is called.  If the actual probe fails later
> > > > > (real failure, not -EPROBE_DEFER), the device will still be on the
> > > > > deferred probe pending list, and it will be probed again when deferred
> > > > > probing kicks in, which is futile.
> > > > >
> > > > > Fix this by explicitly removing the device from the deferred probe
> > > > > pending list in case of probe failures.
> > > > >
> > > > > Fixes: e590474768f1cc04 ("driver core: Set fw_devlink=on by default")
> > > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
> > > >
> > > > Good catch:
> > > >
> > > > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > >
> > > The issue is real and needs to be fixed. But I'm confused how this can
> > > happen. We won't even enter really_probe() if the driver isn't ready.
> > > We also won't get to run the driver's .probe() if the suppliers aren't
> > > ready. So how does the device get added to the deferred probe list
> > > before the driver is ready? Is this due to device_links_driver_bound()
> > > on the supplier?
> > >
> > > Can you give a more detailed step by step on the case you are hitting?
> >
> > The device is added to the list due to device_links_driver_bound()
> > calling driver_deferred_probe_add() on all consumer devices.
>
> Thanks for the explanation. Maybe add more details like this to the
> commit text or in the code?
>
> For the code:
> Reviewed-by: Saravana Kannan <saravanak@google.com>

Ugh... I just realized that I might have to give this a Nak because of
bad locking in deferred_probe_work_func(). The unlock/lock inside the
loop is a terrible hack. If we add this patch, we can end up modifying
a linked list while it's being traversed and cause a crash or busy
loop (you'll accidentally end up on an "empty list"). I ran into a
similar issue during one of my unrelated refactors.

-Saravana
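
The hazard raised in this last comment can be illustrated with a small user-space sketch in C. It is not the kernel's deferred_probe_work_func() and does not model its locking: toy_list, toy_init(), toy_add_tail(), toy_del_init(), toy_walk() and toy_demo() are hypothetical names, and the "other code path" is simulated by unlinking a node in the middle of the walk. The only assumption carried over is that a removed node is re-initialised to point at itself, so a walker still holding that node finds itself on a one-element "empty list" and never reaches the real list head again.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of list_head; a toy, not kernel code. */
struct toy_list {
	struct toy_list *next, *prev;
};

static void toy_init(struct toy_list *n)
{
	n->next = n;
	n->prev = n;
}

static void toy_add_tail(struct toy_list *n, struct toy_list *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n and re-initialise it, so n becomes its own one-element "empty list". */
static void toy_del_init(struct toy_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	toy_init(n);
}

/*
 * Walk the list with a cursor. If interfere is non-NULL it is unlinked while
 * the cursor sits on the first node, mimicking a modification made while the
 * walker is not holding the lock. Returns the number of nodes visited, or -1
 * if the walk exceeds max_steps (the walker is stuck).
 */
int toy_walk(struct toy_list *head, struct toy_list *interfere, int max_steps)
{
	struct toy_list *pos = head->next;
	int steps = 0;

	assert(max_steps > 0);
	while (pos != head) {
		if (++steps > max_steps)
			return -1;
		if (steps == 1 && interfere)
			toy_del_init(interfere);
		pos = pos->next;
	}
	return steps;
}

/* which: 0 = no interference, 1 = unlink the node under the cursor, 2 = unlink a later node. */
int toy_demo(int which)
{
	struct toy_list head, a, b, c;
	struct toy_list *victim = NULL;

	toy_init(&head);
	toy_add_tail(&a, &head);
	toy_add_tail(&b, &head);
	toy_add_tail(&c, &head);
	if (which == 1)
		victim = &a;
	else if (which == 2)
		victim = &b;
	return toy_walk(&head, victim, 16);
}
```

In this sketch toy_demo(0) visits all three nodes, toy_demo(2) silently skips the unlinked node, and toy_demo(1) hits the step limit because the cursor keeps following a node that now points only at itself. A real traversal without a step limit would spin indefinitely in that last case.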

Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9179825ff646f4e3..91c4181093c43709 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -639,11 +639,13 @@  static int really_probe(struct device *dev, struct device_driver *drv)
 	case -ENXIO:
 		pr_debug("%s: probe of %s rejects match %d\n",
 			 drv->name, dev_name(dev), ret);
+		driver_deferred_probe_del(dev);
 		break;
 	default:
 		/* driver matched but the probe failed */
 		pr_warn("%s: probe of %s failed with error %d\n",
 			drv->name, dev_name(dev), ret);
+		driver_deferred_probe_del(dev);
 	}
 	/*
 	 * Ignore errors returned by ->probe so that the next driver can try