diff mbox series

[01/10] driver core: Do not continue searching for drivers if deferred probe is used

Message ID 1-v1-324b2038f212+1041f1-vfio3a_jgg@nvidia.com (mailing list archive)
State New, archived
Headers show
Series Allow mdev drivers to directly create the vfio_device | expand

Commit Message

Jason Gunthorpe June 8, 2021, 12:55 a.m. UTC
Once a driver has been matched and probe() returns with -EPROBE_DEFER the
device is added to a deferred list and will be retried later.

At this point __device_attach_driver() should stop trying other drivers as
we have "matched" this driver and already scheduled another probe to
happen later.

Return the -EPROBE_DEFER from really_probe() instead of squashing it to
zero. This is similar to the code at the top of the function which
directly returns -EPROBE_DEFER.

It is not really a bug as, AFAIK, we don't actually have cases where
multiple drivers can bind.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/base/dd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christoph Hellwig June 8, 2021, 5:51 a.m. UTC | #1
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Greg KH June 8, 2021, 6:44 a.m. UTC | #2
On Mon, Jun 07, 2021 at 09:55:43PM -0300, Jason Gunthorpe wrote:
> Once a driver has been matched and probe() returns with -EPROBE_DEFER the
> device is added to a deferred list and will be retried later.
> 
> At this point __device_attach_driver() should stop trying other drivers as
> we have "matched" this driver and already scheduled another probe to
> happen later.
> 
> Return the -EPROBE_DEFER from really_probe() instead of squashing it to
> zero. This is similar to the code at the top of the function which
> directly returns -EPROBE_DEFER.
> 
> It is not really a bug as, AFAIK, we don't actually have cases where
> multiple drivers can bind.

We _do_ have devices that multiple drivers can bind to.  Are you sure
you did not just break them?

Why are you needing to change this?  What is it helping?  What problem
is this solving?

thanks,

greg k-h
Greg KH June 8, 2021, 7:35 a.m. UTC | #3
On Mon, Jun 07, 2021 at 09:55:43PM -0300, Jason Gunthorpe wrote:
> Once a driver has been matched and probe() returns with -EPROBE_DEFER the
> device is added to a deferred list and will be retried later.
> 
> At this point __device_attach_driver() should stop trying other drivers as
> we have "matched" this driver and already scheduled another probe to
> happen later.
> 
> Return the -EPROBE_DEFER from really_probe() instead of squashing it to
> zero. This is similar to the code at the top of the function which
> directly returns -EPROBE_DEFER.
> 
> It is not really a bug as, AFAIK, we don't actually have cases where
> multiple drivers can bind.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/base/dd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index ecd7cf848daff7..9d79a139290271 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -656,7 +656,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>  		/* Driver requested deferred probing */
>  		dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
>  		driver_deferred_probe_add_trigger(dev, local_trigger_count);
> -		break;
> +		goto done;
>  	case -ENODEV:
>  	case -ENXIO:
>  		pr_debug("%s: probe of %s rejects match %d\n",
> -- 
> 2.31.1
> 

Why is lkml not cc:ed on driver core changes like get_maintainer.pl will
say?

thanks,

greg k-h
Jason Gunthorpe June 8, 2021, 12:16 p.m. UTC | #4
On Tue, Jun 08, 2021 at 08:44:20AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Jun 07, 2021 at 09:55:43PM -0300, Jason Gunthorpe wrote:
> > Once a driver has been matched and probe() returns with -EPROBE_DEFER the
> > device is added to a deferred list and will be retried later.
> > 
> > At this point __device_attach_driver() should stop trying other drivers as
> > we have "matched" this driver and already scheduled another probe to
> > happen later.
> > 
> > Return the -EPROBE_DEFER from really_probe() instead of squashing it to
> > zero. This is similar to the code at the top of the function which
> > directly returns -EPROBE_DEFER.
> > 
> > It is not really a bug as, AFAIK, we don't actually have cases where
> > multiple drivers can bind.
> 
> We _do_ have devices that multiple drivers can bind to.  Are you sure
> you did not just break them?

I asked Cornelia Huck who added this code if she knew who was using it
and she said she added it but never ended up using it. Can you point
at where there are multiple drivers matching the same device?

If multiple drivers are matchable what creates determinism in which
will bind?

And yes, this would break devices with multiple drivers that are using
EPROBE_DEFER to set driver bind ordering. Do you know of any place
doing that?

> Why are you needing to change this?  What is it helping?  What problem
> is this solving?

This special flow works by returning 'success' when the driver bind
actually failed. This is OK in the one call site that continues to
scan but is not OK in the other callsites that don't keep scanning and
want to see error codes.

So after the later patches this becomes quite a bit more complicated
to keep implemented as-is. Better to delete it if it is safe to delete.

Jason
Jason Gunthorpe June 8, 2021, 12:17 p.m. UTC | #5
On Tue, Jun 08, 2021 at 09:35:17AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Jun 07, 2021 at 09:55:43PM -0300, Jason Gunthorpe wrote:
> > Once a driver has been matched and probe() returns with -EPROBE_DEFER the
> > device is added to a deferred list and will be retried later.
> > 
> > At this point __device_attach_driver() should stop trying other drivers as
> > we have "matched" this driver and already scheduled another probe to
> > happen later.
> > 
> > Return the -EPROBE_DEFER from really_probe() instead of squashing it to
> > zero. This is similar to the code at the top of the function which
> > directly returns -EPROBE_DEFER.
> > 
> > It is not really a bug as, AFAIK, we don't actually have cases where
> > multiple drivers can bind.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> >  drivers/base/dd.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index ecd7cf848daff7..9d79a139290271 100644
> > +++ b/drivers/base/dd.c
> > @@ -656,7 +656,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> >  		/* Driver requested deferred probing */
> >  		dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
> >  		driver_deferred_probe_add_trigger(dev, local_trigger_count);
> > -		break;
> > +		goto done;
> >  	case -ENODEV:
> >  	case -ENXIO:
> >  		pr_debug("%s: probe of %s rejects match %d\n",
> 
> Why is lkml not cc:ed on driver core changes like get_maintainer.pl will
> say?

Sorry, this is my error, it was intended to be cc'd but scripting
seems to have malfunctioned in this case. It is what HCH observed too.

Jason
Greg KH June 8, 2021, 1:13 p.m. UTC | #6
On Tue, Jun 08, 2021 at 09:16:54AM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 08, 2021 at 08:44:20AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Jun 07, 2021 at 09:55:43PM -0300, Jason Gunthorpe wrote:
> > > Once a driver has been matched and probe() returns with -EPROBE_DEFER the
> > > device is added to a deferred list and will be retried later.
> > > 
> > > At this point __device_attach_driver() should stop trying other drivers as
> > > we have "matched" this driver and already scheduled another probe to
> > > happen later.
> > > 
> > > Return the -EPROBE_DEFER from really_probe() instead of squashing it to
> > > zero. This is similar to the code at the top of the function which
> > > directly returns -EPROBE_DEFER.
> > > 
> > > It is not really a bug as, AFAIK, we don't actually have cases where
> > > multiple drivers can bind.
> > 
> > We _do_ have devices that multiple drivers can bind to.  Are you sure
> > you did not just break them?
> 
> I asked Cornelia Huck who added this code if she knew who was using it
> and she said she added it but never ended up using it. Can you point
> at where there are multiple drivers matching the same device?

USB storage devices is one set of devices where the drivers have the
ability to both bind to the same device but use a way to figure out the
"best" one to do so.

> If multiple drivers are matchable what creates determinism in which
> will bind?

Magic :)

> And yes, this would break devices with multiple drivers that are using
> EPROBE_DEFER to set driver bind ordering. Do you know of any place
> doing that?

Other than usb-storage and uas, not off the top of my head.  I think
there was some platform drivers using this but I can't find them at the
moment.

> > Why are you needing to change this?  What is it helping?  What problem
> > is this solving?
> 
> This special flow works by returning 'success' when the driver bind
> actually failed. This is OK in the one call site that continues to
> scan but is not OK in the other callsites that don't keep scanning and
> want to see error codes.
> 
> So after the later patches this becomes quite a bit more complicated
> to keep implemented as-is. Better to delete it if it is safe to delete.

I think you need to determine if it is safe first as it is changing the
core functionality in the kernel that a few thousand drivers depend on
:)

So a "proof" is needed before I can remove something like this.

thanks,

greg k-h
Jason Gunthorpe June 8, 2021, 1:53 p.m. UTC | #7
On Tue, Jun 08, 2021 at 03:13:37PM +0200, Greg Kroah-Hartman wrote:

> > If multiple drivers are matchable what creates determinism in which
> > will bind?
> 
> Magic :)

I suppose you mean something like this code:

        /* Probe the USB device with the driver in hand, but only
         * defer to a generic driver in case the current USB
         * device driver has an id_table or a match function; i.e.,
         * when the device driver was explicitly matched against
         * a device.
         *
         * If the device driver does not have either of these,
         * then we assume that it can bind to any device and is
         * not truly a more specialized/non-generic driver, so a
         * return value of -ENODEV should not force the device
         * to be handled by the generic USB driver, as there
         * can still be another, more specialized, device driver.
         *
         * This accommodates the usbip driver.
         *
         * TODO: What if, in the future, there are multiple
         * specialized USB device drivers for a particular device?
         * In such cases, there is a need to try all matching
         * specialised device drivers prior to setting the
         * use_generic_driver bit.
         */
        error = udriver->probe(udev);
        if (error == -ENODEV && udriver != &usb_generic_driver &&
            (udriver->id_table || udriver->match)) {
                udev->use_generic_driver = 1;
                return -EPROBE_DEFER;
        }

So it does use EPROBE_DEFER to recycle through another attempt at
driver binding. Yikes. Why didn't it return -ENODEV I wonder?

But OK, I can think of no way to find all the cases like this to even
try to do something else at this point, so this has to be
preserved. I'll try to do that, maybe add a comment or two.

Jason
diff mbox series

Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index ecd7cf848daff7..9d79a139290271 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -656,7 +656,7 @@  static int really_probe(struct device *dev, struct device_driver *drv)
 		/* Driver requested deferred probing */
 		dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
 		driver_deferred_probe_add_trigger(dev, local_trigger_count);
-		break;
+		goto done;
 	case -ENODEV:
 	case -ENXIO:
 		pr_debug("%s: probe of %s rejects match %d\n",