diff mbox series

[03/10] driver core: Flow the return code from ->probe() through to sysfs bind

Message ID 3-v1-324b2038f212+1041f1-vfio3a_jgg@nvidia.com (mailing list archive)
State New, archived
Series Allow mdev drivers to directly create the vfio_device

Commit Message

Jason Gunthorpe June 8, 2021, 12:55 a.m. UTC
Currently really_probe() returns 1 on success and 0 if the probe() call
fails. This return code arrangement is designed to be useful for
__device_attach_driver() which is walking the device list and trying every
driver. 0 means to keep trying.

However, it is not useful for the other places that call through to
really_probe() that do actually want to see the probe() return code.

For instance bind_store() would be better to return the actual error code
from the driver's probe method, not discarding it and returning -ENODEV.

Reorganize things so that really_probe() always returns an error code on
failure and 0 on success. Move the special code for device list walking
into the walker callback __device_attach_driver() and trigger it based on
an output flag from really_probe(). Update the rest of the API surface to
return a normal -ERR or 0 on success.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/base/bus.c |  6 +----
 drivers/base/dd.c  | 61 ++++++++++++++++++++++++++++++----------------
 2 files changed, 41 insertions(+), 26 deletions(-)
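
The return-code split described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct probe_result, walker_step() and walk_drivers() are hypothetical names that do not exist in the kernel, and the bool field stands in for whatever mechanism reports that the error came from a probe() call.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the outcome of one probe attempt. */
struct probe_result {
	int err;          /* 0 on success, negative errno on failure */
	bool from_driver; /* true if err was returned by the driver's probe() */
};

/*
 * Walker-side translation, loosely modelled on the patched
 * __device_attach_driver():
 *   1   -> device bound, stop walking
 *   0   -> this driver's probe() failed, keep trying the next driver
 *   < 0 -> an error from outside probe(), abort the walk with it
 */
static int walker_step(struct probe_result r)
{
	if (!r.err)
		return 1;
	if (r.from_driver)
		return 0;
	return r.err;
}

/* Try each result in order and stop at the first non-zero walker value. */
static int walk_drivers(const struct probe_result *results, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = walker_step(results[i]);

		if (ret)
			return ret;
	}
	return 0;
}
```

A direct caller such as bind_store() would use the err field as-is, while only the walker applies the 1/0 translation.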

Comments

Christoph Hellwig June 8, 2021, 6:07 a.m. UTC | #1
> index 36d0c654ea6124..03591f82251302 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -212,13 +212,9 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf,
>  	dev = bus_find_device_by_name(bus, NULL, buf);
>  	if (dev && dev->driver == NULL && driver_match_device(drv, dev)) {
>  		err = device_driver_attach(drv, dev);
> -
> -		if (err > 0) {
> +		if (!err) {
>  			/* success */
>  			err = count;
> -		} else if (err == 0) {
> -			/* driver didn't accept device */
> -			err = -ENODEV;
>  		}
>  	}

I think we can also drop the dev->driver == NULL check above given
that device_driver_attach covers it now.

Greg KH June 8, 2021, 6:47 a.m. UTC | #2
On Mon, Jun 07, 2021 at 09:55:45PM -0300, Jason Gunthorpe wrote:
> Currently really_probe() returns 1 on success and 0 if the probe() call
> fails. This return code arrangement is designed to be useful for
> __device_attach_driver() which is walking the device list and trying every
> driver. 0 means to keep trying.
> 
> However, it is not useful for the other places that call through to
> really_probe() that do actually want to see the probe() return code.
> 
> For instance bind_store() would be better to return the actual error code
> from the driver's probe method, not discarding it and returning -ENODEV.

Why does that matter?  Why does it need to know this?

> Reorganize things so that really_probe() always returns an error code on
> failure and 0 on success. Move the special code for device list walking
> into the walker callback __device_attach_driver() and trigger it based on
> an output flag from really_probe(). Update the rest of the API surface to
> return a normal -ERR or 0 on success.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/base/bus.c |  6 +----
>  drivers/base/dd.c  | 61 ++++++++++++++++++++++++++++++----------------
>  2 files changed, 41 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 36d0c654ea6124..03591f82251302 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -212,13 +212,9 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf,
>  	dev = bus_find_device_by_name(bus, NULL, buf);
>  	if (dev && dev->driver == NULL && driver_match_device(drv, dev)) {
>  		err = device_driver_attach(drv, dev);
> -
> -		if (err > 0) {
> +		if (!err) {
>  			/* success */
>  			err = count;
> -		} else if (err == 0) {
> -			/* driver didn't accept device */
> -			err = -ENODEV;
>  		}
>  	}
>  	put_device(dev);
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index c1a92cff159873..7fb58e6219b255 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -513,7 +513,13 @@ static ssize_t state_synced_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(state_synced);
>  
> -static int really_probe(struct device *dev, struct device_driver *drv)
> +enum {
> +	/* Set on output if the -ERR has come from a probe() function */
> +	PROBEF_DRV_FAILED = 1 << 0,
> +};
> +
> +static int really_probe(struct device *dev, struct device_driver *drv,
> +			unsigned int *flags)

Ugh, no, please no functions with random "flags" in them, that way lies
madness and unmaintainable code for decades to come.

Especially as I have no idea what this is trying to solve here at all...

greg k-h

Jason Gunthorpe June 8, 2021, 12:30 p.m. UTC | #3
On Tue, Jun 08, 2021 at 08:47:19AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Jun 07, 2021 at 09:55:45PM -0300, Jason Gunthorpe wrote:
> > Currently really_probe() returns 1 on success and 0 if the probe() call
> > fails. This return code arrangement is designed to be useful for
> > __device_attach_driver() which is walking the device list and trying every
> > driver. 0 means to keep trying.
> > 
> > However, it is not useful for the other places that call through to
> > really_probe() that do actually want to see the probe() return code.
> > 
> > For instance bind_store() would be better to return the actual error code
> > from the driver's probe method, not discarding it and returning -ENODEV.
> 
> Why does that matter?  Why does it need to know this?

Proper return codes to userspace are important. Knowing why the driver
probe() fails is certainly helpful for debugging. Is there a reason
to hide them? I think this is an improvement for sysfs bind.
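
To make the userspace side concrete, here is a minimal sketch of how a tool might write a device name to a driver's sysfs bind file and keep the errno it gets back. The helper name write_bind() and both of its parameters are hypothetical; the caller supplies the actual bind path and device name.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write dev_name to the file at bind_path.
 * Returns 0 on success or a negative errno value, so the caller can tell
 * a generic -ENODEV apart from a more specific error.
 */
static int write_bind(const char *bind_path, const char *dev_name)
{
	int fd = open(bind_path, O_WRONLY);
	ssize_t n;
	int err;

	if (fd < 0)
		return -errno;

	n = write(fd, dev_name, strlen(dev_name));
	err = (n < 0) ? -errno : 0;

	close(fd);
	return err;
}
```

With the old behaviour every probe() failure would surface here as -ENODEV; with the proposed change the value would be whatever the driver's probe() returned.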

Why this series needs it is because mdev has a fixed sysfs uAPI at this point
that requires carrying the return code from device driver probe() to
a mdev sysfs function.

> > -static int really_probe(struct device *dev, struct device_driver *drv)
> > +enum {
> > +	/* Set on output if the -ERR has come from a probe() function */
> > +	PROBEF_DRV_FAILED = 1 << 0,
> > +};
> > +
> > +static int really_probe(struct device *dev, struct device_driver *drv,
> > +			unsigned int *flags)
> 
> Ugh, no, please no functions with random "flags" in them, that way lies
> madness and unmaintainable code for decades to come.

The alternative to this is something like this:

static int really_probe(struct device *dev, struct device_driver *drv,
			int *probe_err)

And since we still need the 'do not probe defer' in next patches then
it would have to be this:

static int really_probe(struct device *dev, struct device_driver *drv,
			int *probe_err, bool allow_probe_defer)

And the two new arguments flowed up through several function call
sites.

Do you prefer one of these more?

For your other question PROBEF_ means 'probe flag'.

Jason

Greg KH June 8, 2021, 1:16 p.m. UTC | #4
On Tue, Jun 08, 2021 at 09:30:23AM -0300, Jason Gunthorpe wrote:
> On Tue, Jun 08, 2021 at 08:47:19AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Jun 07, 2021 at 09:55:45PM -0300, Jason Gunthorpe wrote:
> > > Currently really_probe() returns 1 on success and 0 if the probe() call
> > > fails. This return code arrangement is designed to be useful for
> > > __device_attach_driver() which is walking the device list and trying every
> > > driver. 0 means to keep trying.
> > > 
> > > However, it is not useful for the other places that call through to
> > > really_probe() that do actually want to see the probe() return code.
> > > 
> > > For instance bind_store() would be better to return the actual error code
> > > from the driver's probe method, not discarding it and returning -ENODEV.
> > 
> > Why does that matter?  Why does it need to know this?
> 
> Proper return codes to userspace are important. Knowing why the driver
> probe() fails is certainly helpful for debugging. Is there a reason
> to hide them? I think this is an improvement for sysfs bind.
> 
> Why this series needs it is because mdev has a fixed sysfs uAPI at this point
> that requires carrying the return code from device driver probe() to
> a mdev sysfs function.

What is mdev and what userspace tool requires such a userspace api to
depend on this?

Tools doing manual bind/unbind from userspace are crazy, it's always
been a "look at this neat hack!" type of thing.  To do it "right" you
should always do it correctly within the kernel.

> > > +enum {
> > > +	/* Set on output if the -ERR has come from a probe() function */
> > > +	PROBEF_DRV_FAILED = 1 << 0,
> > > +};
> > > +
> > > +static int really_probe(struct device *dev, struct device_driver *drv,
> > > +			unsigned int *flags)
> > 
> > Ugh, no, please no functions with random "flags" in them, that way lies
> > madness and unmaintainable code for decades to come.
> 
> The alternative to this is something like this:
> 
> static int really_probe(struct device *dev, struct device_driver *drv,
> 			int *probe_err)
> 
> And since we still need the 'do not probe defer' in next patches then
> it would have to be this:
> 
> static int really_probe(struct device *dev, struct device_driver *drv,
> 			int *probe_err, bool allow_probe_defer)
> 
> And the two new arguments flowed up through several function call
> sites.
> 
> Do you prefer one of these more?

Random boolean flags as parameters are just as bad.

Make the functions able to be understood when read.

> For your other question PROBEF_ means 'probe flag'.

That was not obvious at all, and not something I would remember the next
time I have to look at this code...

Please use full words, we don't have a limit on restricted characters
anymore, this isn't the 1980's...

thanks,

greg k-h

Jason Gunthorpe June 8, 2021, 2:03 p.m. UTC | #5
On Tue, Jun 08, 2021 at 03:16:51PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Jun 08, 2021 at 09:30:23AM -0300, Jason Gunthorpe wrote:
> > On Tue, Jun 08, 2021 at 08:47:19AM +0200, Greg Kroah-Hartman wrote:
> > > On Mon, Jun 07, 2021 at 09:55:45PM -0300, Jason Gunthorpe wrote:
> > > > Currently really_probe() returns 1 on success and 0 if the probe() call
> > > > fails. This return code arrangement is designed to be useful for
> > > > __device_attach_driver() which is walking the device list and trying every
> > > > driver. 0 means to keep trying.
> > > > 
> > > > However, it is not useful for the other places that call through to
> > > > really_probe() that do actually want to see the probe() return code.
> > > > 
> > > > For instance bind_store() would be better to return the actual error code
> > > > from the driver's probe method, not discarding it and returning -ENODEV.
> > > 
> > > Why does that matter?  Why does it need to know this?
> > 
> > Proper return codes to userspace are important. Knowing why the driver
> > probe() fails is certainly helpful for debugging. Is there a reason
> > to hide them? I think this is an improvement for sysfs bind.
> > 
> > Why this series needs it is because mdev has a fixed sysfs uAPI at this point
> > that requires carrying the return code from device driver probe() to
> > a mdev sysfs function.
> 
> What is mdev and what userspace tool requires such a userspace api to
> depend on this?

Were you able to see the cover letter? mdev is part of vfio, it is
very ugly, but it has a userspace ecosystem now.

> Tools doing manual bind/unbind from userspace are crazy, it's always
> been a "look at this neat hack!" type of thing.  To do it "right" you
> should always do it correctly within the kernel.

Which is what the later patches do for mdev, but the dual operation of
creating the struct device and connecting it to its driver has a
historical requirement to return the error code from driver probe to
userspace.

v1 of this just did a hacky approach inside mdev to achieve this but
Dan and CH thought it would be more widely useful so asked for this
series to allow the driver core to handle it. This did turn out fairly
nice so I tend to agree - but returning the error code is important.

> Random boolean flags as parameters are just as bad.

Sure, but the function needs different behavior depending on the call
site. 

An alternative is to rework all the call chains to somehow embed the
difference directly and I don't have a clear vision how to do that
that is any nicer than this.

Jason

Jason Gunthorpe June 8, 2021, 11:53 p.m. UTC | #6
On Tue, Jun 08, 2021 at 07:07:49AM +0100, Christoph Hellwig wrote:
> > index 36d0c654ea6124..03591f82251302 100644
> > +++ b/drivers/base/bus.c
> > @@ -212,13 +212,9 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf,
> >  	dev = bus_find_device_by_name(bus, NULL, buf);
> >  	if (dev && dev->driver == NULL && driver_match_device(drv, dev)) {
> >  		err = device_driver_attach(drv, dev);
> > -
> > -		if (err > 0) {
> > +		if (!err) {
> >  			/* success */
> >  			err = count;
> > -		} else if (err == 0) {
> > -			/* driver didn't accept device */
> > -			err = -ENODEV;
> >  		}
> >  	}
> 
> I think we can also drop the dev->driver == NULL check above given
> that device_driver_attach covers it now.

I'm glad you noticed this because it is wonky today:

static ssize_t bind_store() {
        int err = -ENODEV;

        if (dev && dev->driver == NULL && driver_match_device(drv, dev)) {
               err = device_driver_attach() {
                    int ret = 0;
                    __device_driver_lock(dev, dev->parent);
                    if (!dev->p->dead && !dev->driver)
                           ..
                    return ret;
               }
        }
        return err;

Thus if dev->driver == NULL this will usually return -ENODEV unless it
races just right and returns 0. So I fixed it up to always return
-EBUSY and always read dev->driver under the lock.

Jason
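
The "always read dev->driver under the lock and return -EBUSY" idea from the reply above can be sketched in userspace C with a pthread mutex. This is only an analogy under stated assumptions: struct fake_device and fake_attach() are hypothetical and are not the driver core's types or functions.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical device: driver is NULL while nothing is bound. */
struct fake_device {
	pthread_mutex_t lock;
	const char *driver;
};

/*
 * Bind drv to dev. The "already bound?" check happens only while the lock
 * is held, so two racing callers cannot both see driver == NULL, and the
 * loser always gets -EBUSY instead of a racy 0 or -ENODEV.
 */
static int fake_attach(struct fake_device *dev, const char *drv)
{
	int ret;

	pthread_mutex_lock(&dev->lock);
	if (dev->driver) {
		ret = -EBUSY;
	} else {
		dev->driver = drv;
		ret = 0;
	}
	pthread_mutex_unlock(&dev->lock);

	return ret;
}
```

The design point is that an unlocked pre-check in the caller adds nothing once the locked check exists, which matches the suggestion to drop it from bind_store().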

Patch

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 36d0c654ea6124..03591f82251302 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -212,13 +212,9 @@  static ssize_t bind_store(struct device_driver *drv, const char *buf,
 	dev = bus_find_device_by_name(bus, NULL, buf);
 	if (dev && dev->driver == NULL && driver_match_device(drv, dev)) {
 		err = device_driver_attach(drv, dev);
-
-		if (err > 0) {
+		if (!err) {
 			/* success */
 			err = count;
-		} else if (err == 0) {
-			/* driver didn't accept device */
-			err = -ENODEV;
 		}
 	}
 	put_device(dev);
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index c1a92cff159873..7fb58e6219b255 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -513,7 +513,13 @@  static ssize_t state_synced_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(state_synced);
 
-static int really_probe(struct device *dev, struct device_driver *drv)
+enum {
+	/* Set on output if the -ERR has come from a probe() function */
+	PROBEF_DRV_FAILED = 1 << 0,
+};
+
+static int really_probe(struct device *dev, struct device_driver *drv,
+			unsigned int *flags)
 {
 	int ret = -EPROBE_DEFER;
 	int local_trigger_count = atomic_read(&deferred_trigger_count);
@@ -574,12 +580,16 @@  static int really_probe(struct device *dev, struct device_driver *drv)
 
 	if (dev->bus->probe) {
 		ret = dev->bus->probe(dev);
-		if (ret)
+		if (ret) {
+			*flags |= PROBEF_DRV_FAILED;
 			goto probe_failed;
+		}
 	} else if (drv->probe) {
 		ret = drv->probe(dev);
-		if (ret)
+		if (ret) {
+			*flags |= PROBEF_DRV_FAILED;
 			goto probe_failed;
+		}
 	}
 
 	if (device_add_groups(dev, drv->dev_groups)) {
@@ -621,7 +631,6 @@  static int really_probe(struct device *dev, struct device_driver *drv)
 		dev->pm_domain->sync(dev);
 
 	driver_bound(dev);
-	ret = 1;
 	pr_debug("bus: '%s': %s: bound device %s to driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 	goto done;
@@ -656,7 +665,7 @@  static int really_probe(struct device *dev, struct device_driver *drv)
 		/* Driver requested deferred probing */
 		dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
 		driver_deferred_probe_add_trigger(dev, local_trigger_count);
-		goto done;
+		break;
 	case -ENODEV:
 	case -ENXIO:
 		pr_debug("%s: probe of %s rejects match %d\n",
@@ -667,11 +676,6 @@  static int really_probe(struct device *dev, struct device_driver *drv)
 		pr_warn("%s: probe of %s failed with error %d\n",
 			drv->name, dev_name(dev), ret);
 	}
-	/*
-	 * Ignore errors returned by ->probe so that the next driver can try
-	 * its luck.
-	 */
-	ret = 0;
 done:
 	atomic_dec(&probe_count);
 	wake_up_all(&probe_waitqueue);
@@ -681,13 +685,14 @@  static int really_probe(struct device *dev, struct device_driver *drv)
 /*
  * For initcall_debug, show the driver probe time.
  */
-static int really_probe_debug(struct device *dev, struct device_driver *drv)
+static int really_probe_debug(struct device *dev, struct device_driver *drv,
+			      unsigned int *flags)
 {
 	ktime_t calltime, rettime;
 	int ret;
 
 	calltime = ktime_get();
-	ret = really_probe(dev, drv);
+	ret = really_probe(dev, drv, flags);
 	rettime = ktime_get();
 	pr_debug("probe of %s returned %d after %lld usecs\n",
 		 dev_name(dev), ret, ktime_us_delta(rettime, calltime));
@@ -732,17 +737,18 @@  EXPORT_SYMBOL_GPL(wait_for_device_probe);
  * driver_probe_device - attempt to bind device & driver together
  * @drv: driver to bind a device to
  * @dev: device to try to bind to the driver
+ * @flags: PROBEF flags input/output
  *
  * This function returns -ENODEV if the device is not registered, -EBUSY if it
- * already has a driver, and 1 if the device is bound successfully and 0
- * otherwise.
+ * already has a driver, and 0 if the device is bound successfully.
  *
  * This function must be called with @dev lock held.  When called for a
  * USB interface, @dev->parent lock must be held as well.
  *
  * If the device has a parent, runtime-resume the parent before driver probing.
  */
-static int driver_probe_device(struct device_driver *drv, struct device *dev)
+static int driver_probe_device(struct device_driver *drv, struct device *dev,
+			       unsigned int *flags)
 {
 	int ret = 0;
 
@@ -761,9 +767,9 @@  static int driver_probe_device(struct device_driver *drv, struct device *dev)
 
 	pm_runtime_barrier(dev);
 	if (initcall_debug)
-		ret = really_probe_debug(dev, drv);
+		ret = really_probe_debug(dev, drv, flags);
 	else
-		ret = really_probe(dev, drv);
+		ret = really_probe(dev, drv, flags);
 	pm_request_idle(dev);
 
 	if (dev->parent)
@@ -847,6 +853,7 @@  static int __device_attach_driver(struct device_driver *drv, void *_data)
 	struct device_attach_data *data = _data;
 	struct device *dev = data->dev;
 	bool async_allowed;
+	unsigned int flags = 0;
 	int ret;
 
 	ret = driver_match_device(drv, dev);
@@ -870,7 +877,17 @@  static int __device_attach_driver(struct device_driver *drv, void *_data)
 	if (data->check_async && async_allowed != data->want_async)
 		return 0;
 
-	return driver_probe_device(drv, dev);
+	ret = driver_probe_device(drv, dev, &flags);
+	if (ret) {
+		/*
+		 * Ignore errors returned by ->probe so that the next driver can
+		 * try its luck.
+		 */
+		if (flags & PROBEF_DRV_FAILED)
+			return 0;
+		return ret;
+	}
+	return 1;
 }
 
 static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
@@ -1026,10 +1043,11 @@  static void __device_driver_unlock(struct device *dev, struct device *parent)
  * @dev: Device to attach it to
  *
  * Manually attach driver to a device. Will acquire both @dev lock and
- * @dev->parent lock if needed.
+ * @dev->parent lock if needed. Returns 0 on success, -ERR on failure.
  */
 int device_driver_attach(struct device_driver *drv, struct device *dev)
 {
+	unsigned int flags = 0;
 	int ret = 0;
 
 	__device_driver_lock(dev, dev->parent);
@@ -1039,7 +1057,7 @@  int device_driver_attach(struct device_driver *drv, struct device *dev)
 	 * just skip the driver probe call.
 	 */
 	if (!dev->driver)
-		ret = driver_probe_device(drv, dev);
+		ret = driver_probe_device(drv, dev, &flags);
 
 	__device_driver_unlock(dev, dev->parent);
 
@@ -1050,11 +1068,12 @@  static void __driver_attach_async_helper(void *_dev, async_cookie_t cookie)
 {
 	struct device *dev = _dev;
 	struct device_driver *drv;
+	unsigned int flags = 0;
 	int ret;
 
 	__device_driver_lock(dev, dev->parent);
 	drv = dev->p->async_driver;
-	ret = driver_probe_device(drv, dev);
+	ret = driver_probe_device(drv, dev, &flags);
 	__device_driver_unlock(dev, dev->parent);
 
 	dev_dbg(dev, "driver %s async attach completed: %d\n", drv->name, ret);