[driver-core,v7,4/9] driver core: Probe devices asynchronously instead of the driver

Message ID 154345154692.18040.8161459765233879389.stgit@ahduyck-desk1.amr.corp.intel.com (mailing list archive)
State Superseded
Series Add NUMA aware async_schedule calls

Commit Message

Alexander Duyck Nov. 29, 2018, 12:32 a.m. UTC
Probe devices asynchronously instead of the driver. As a result we see the
same behavior whether the device is registered before or after the driver.
This way we avoid serializing the initialization when the driver is not
loaded until after the devices have already been added.

The motivation behind this is that if we have a set of devices that
take a significant amount of time to load we can greatly reduce the time to
load by processing them in parallel instead of one at a time. In addition,
each device can exist on a different node so placing a single thread on one
CPU to initialize all of the devices for a given driver can result in poor
performance on a system with multiple nodes.

This approach can reduce the time needed to scan SCSI LUNs significantly.
The only way to realize that speedup is by enabling more concurrency, which
is what this patch achieves.

To achieve this it was necessary to add a new member "async_driver" to the
device_private structure to store the driver pointer while we wait on the
deferred probe call.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 drivers/base/base.h |    2 +
 drivers/base/bus.c  |   23 ++---------------
 drivers/base/dd.c   |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 73 insertions(+), 21 deletions(-)
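
The handoff described in the commit message (stash the driver pointer in
async_driver, then let a deferred helper decide whether to probe) can be
sketched as a small userspace model. This is only an illustration under
stated assumptions: every model_* name is hypothetical, a plain array stands
in for the kernel async queue, locking is omitted because the model is
single-threaded, and assigning dev->driver stands in for
driver_probe_device().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel structures. */
struct model_driver {
	const char *name;
};

struct model_device {
	struct model_driver *driver;       /* bound driver, NULL if unbound */
	struct model_driver *async_driver; /* driver awaiting deferred probe */
	bool async_probe;                  /* true while a probe is pending */
};

#define MODEL_QUEUE_MAX 16
static struct model_device *model_queue[MODEL_QUEUE_MAX];
static size_t model_queue_len;

/* Mirrors dev_get_drv_async(): only valid while async_probe is set. */
static struct model_driver *model_get_drv_async(const struct model_device *dev)
{
	return dev->async_probe ? dev->async_driver : NULL;
}

/* Mirrors dev_set_drv_async(): record the driver and mark it pending. */
static void model_set_drv_async(struct model_device *dev,
				struct model_driver *drv)
{
	dev->async_driver = drv;
	dev->async_probe = true;
}

/* Models the async branch of __driver_attach(): queue, do not probe yet. */
static bool model_schedule_probe(struct model_device *dev,
				 struct model_driver *drv)
{
	if (dev->driver || model_queue_len == MODEL_QUEUE_MAX)
		return false;
	model_set_drv_async(dev, drv);
	model_queue[model_queue_len++] = dev;
	return true;
}

/*
 * Models __driver_attach_async_helper() for every queued device.
 * Returns how many devices were bound by this pass.
 */
static size_t model_run_pending(void)
{
	size_t bound = 0;

	for (size_t i = 0; i < model_queue_len; i++) {
		struct model_device *dev = model_queue[i];
		struct model_driver *drv = model_get_drv_async(dev);

		/* Skip the probe if someone bound a driver in the meantime. */
		if (drv && !dev->driver) {
			dev->driver = drv; /* stands in for the real probe */
			bound++;
		}
		/* The attempt was made, so clear the pending flag. */
		dev->async_probe = false;
	}
	model_queue_len = 0;
	return bound;
}
```

In this model, a device that gets bound between model_schedule_probe() and
model_run_pending() keeps its existing driver, which is the case the
"drv && !dev->driver" check in the patch is guarding against.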

Comments

Luis Chamberlain Dec. 1, 2018, 2:48 a.m. UTC | #1
On Wed, Nov 28, 2018 at 04:32:26PM -0800, Alexander Duyck wrote:
> Probe devices asynchronously instead of the driver.

> +static void __driver_attach_async_helper(void *_dev, async_cookie_t cookie)
> +{
> +	struct device *dev = _dev;
> +	struct device_driver *drv;
> +
> +	__device_driver_lock(dev, dev->parent);
> +
> +	/*
> +	 * If someone attempted to bind a driver either successfully or
> +	 * unsuccessfully before we got here we should just skip the driver
> +	 * probe call.
> +	 */
> +	drv = dev_get_drv_async(dev);
> +	if (drv && !dev->driver)
> +		driver_probe_device(drv, dev);

I believe this should mean drivers which have async work on probe can
deadlock. For instance, if a driver does call async_schedule() or a
derivative call does this for it, the kernel will call
async_synchronize_full() and I believe we deadlock.

Are we sure most subsystems which would use async probe will not have
an async_schedule() call?

  Luis
Alexander Duyck Dec. 3, 2018, 4:44 p.m. UTC | #2
On Fri, 2018-11-30 at 18:48 -0800, Luis Chamberlain wrote:
> On Wed, Nov 28, 2018 at 04:32:26PM -0800, Alexander Duyck wrote:
> > Probe devices asynchronously instead of the driver.
> > +static void __driver_attach_async_helper(void *_dev, async_cookie_t cookie)
> > +{
> > +	struct device *dev = _dev;
> > +	struct device_driver *drv;
> > +
> > +	__device_driver_lock(dev, dev->parent);
> > +
> > +	/*
> > +	 * If someone attempted to bind a driver either successfully or
> > +	 * unsuccessfully before we got here we should just skip the driver
> > +	 * probe call.
> > +	 */
> > +	drv = dev_get_drv_async(dev);
> > +	if (drv && !dev->driver)
> > +		driver_probe_device(drv, dev);
> 
> I believe this should mean drivers which have async work on probe can
> deadlock. For instance, if a driver does call async_schedule() or a
> derivative call does this for it, the kernel will call
> async_synchronize_full() and I believe we deadlock.
> 
> Are we sure most subsystems which would use async probe will not have
> an async_schedule() call?
> 
>   Luis

So the async_schedule call isn't a problem. It would only be an issue if
they are calling async_synchronize_full while we are holding a lock
and/or mutex. To mitigate that I believe many drivers are just using
the domain version of things instead of using the global async calls.

An issue like what you have described would already exist if there is
code like that floating around out there. As is, this patch isn't
changing the fact that a driver can load asynchronously. All it is
doing is allowing each device to be handled asynchronously instead of
having just one thread work its way through all the devices one at a
time.

The earlier bug we were addressing in patch 1/9 was something like what
you were describing where we were performing an async_synchronize_full
while holding the device lock. I would think the requirement if you are
going to use async within a driver is to use the domain specific
version instead of just synchronizing entire domains, or if you must
synchronize the entire domain you should not be doing so while holding
any locks and/or mutexes.
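
The rule above can be illustrated with a rough userspace model of which
pending items a synchronize call has to wait for. It is a sketch under
assumptions, not kernel code: the model_* names are hypothetical, a NULL
domain stands in for async_synchronize_full(), a non-NULL domain stands in
for async_synchronize_full_domain(), and the kernel's distinction between
registered and exclusive domains is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an async domain. */
struct model_domain {
	const char *name;
};

/* One pending async work item in the model. */
struct model_work {
	const struct model_domain *domain; /* domain it was scheduled in */
	bool needs_device_lock;            /* must take the lock to finish */
};

/*
 * Returns true if the synchronize call could never complete: the caller
 * holds the device lock and at least one item it waits on needs that lock.
 * domain == NULL means "wait for everything" in this model.
 */
static bool model_sync_blocks_forever(const struct model_work *pending,
				      size_t n,
				      const struct model_domain *domain,
				      bool caller_holds_device_lock)
{
	for (size_t i = 0; i < n; i++) {
		bool waited_on = !domain || pending[i].domain == domain;

		if (waited_on && caller_holds_device_lock &&
		    pending[i].needs_device_lock)
			return true;
	}
	return false;
}
```

With one lock-taking item in a global domain and one lock-free item in a
driver-private domain, the model reports a hang only for the "wait for
everything while holding the lock" case. Waiting on the private domain, or
dropping the lock first, both complete.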

One of the reasons why I am using a flag to perform the synchronization
between the device_add and device_del in patch 2 is because technically
any driver can be turned into an asynchronous probing driver by just
adding the kernel parameter <driver>.async_probe. That flag is somewhat
hidden here as dev_get_drv_async was checking for the async_probe flag
in this version of the patch. In the future I plan to replace the
"async_probe" flag with a "dead" flag to indicate that the device is in
the process of going through a device_del which should accomplish the
same thing.

- Alex
Patch

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 3f22ebd6117a..c95384a8e53c 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -64,6 +64,7 @@  struct driver_private {
  *	binding of drivers which were unable to get all the resources needed by
  *	the device; typically because it depends on another driver getting
  *	probed first.
+ * @async_driver - pointer to device driver awaiting probe via async_probe
  * @device - pointer back to the struct device that this structure is
  * associated with.
  *
@@ -75,6 +76,7 @@  struct device_private {
 	struct klist_node knode_driver;
 	struct klist_node knode_bus;
 	struct list_head deferred_probe;
+	struct device_driver *async_driver;
 	struct device *device;
 };
 #define to_device_private_parent(obj)	\
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 8a630f9bd880..0cd2eadd0816 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -606,17 +606,6 @@  static ssize_t uevent_store(struct device_driver *drv, const char *buf,
 }
 static DRIVER_ATTR_WO(uevent);
 
-static void driver_attach_async(void *_drv, async_cookie_t cookie)
-{
-	struct device_driver *drv = _drv;
-	int ret;
-
-	ret = driver_attach(drv);
-
-	pr_debug("bus: '%s': driver %s async attach completed: %d\n",
-		 drv->bus->name, drv->name, ret);
-}
-
 /**
  * bus_add_driver - Add a driver to the bus.
  * @drv: driver.
@@ -649,15 +638,9 @@  int bus_add_driver(struct device_driver *drv)
 
 	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
 	if (drv->bus->p->drivers_autoprobe) {
-		if (driver_allows_async_probing(drv)) {
-			pr_debug("bus: '%s': probing driver %s asynchronously\n",
-				drv->bus->name, drv->name);
-			async_schedule(driver_attach_async, drv);
-		} else {
-			error = driver_attach(drv);
-			if (error)
-				goto out_unregister;
-		}
+		error = driver_attach(drv);
+		if (error)
+			goto out_unregister;
 	}
 	module_add_driver(drv->owner, drv);
 
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index d2515520569e..036c8ffa522f 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -674,6 +674,22 @@  int driver_probe_device(struct device_driver *drv, struct device *dev)
 	return ret;
 }
 
+static inline struct device_driver *dev_get_drv_async(const struct device *dev)
+{
+	return dev->async_probe ? dev->p->async_driver : NULL;
+}
+
+static inline void dev_set_drv_async(struct device *dev,
+				     struct device_driver *drv)
+{
+	/*
+	 * Set async_probe to true indicating we are waiting for this data to be
+	 * loaded as a potential driver.
+	 */
+	dev->p->async_driver = drv;
+	dev->async_probe = true;
+}
+
 bool driver_allows_async_probing(struct device_driver *drv)
 {
 	switch (drv->probe_type) {
@@ -836,7 +852,7 @@  static int __device_attach(struct device *dev, bool allow_async)
 			 */
 			dev_dbg(dev, "scheduling asynchronous probe\n");
 			get_device(dev);
-			dev->async_probe = true;
+			dev_set_drv_async(dev, NULL);
 			async_schedule(__device_attach_async_helper, dev);
 		} else {
 			pm_request_idle(dev);
@@ -929,6 +945,32 @@  int device_driver_attach(struct device_driver *drv, struct device *dev)
 	return ret;
 }
 
+static void __driver_attach_async_helper(void *_dev, async_cookie_t cookie)
+{
+	struct device *dev = _dev;
+	struct device_driver *drv;
+
+	__device_driver_lock(dev, dev->parent);
+
+	/*
+	 * If someone attempted to bind a driver either successfully or
+	 * unsuccessfully before we got here we should just skip the driver
+	 * probe call.
+	 */
+	drv = dev_get_drv_async(dev);
+	if (drv && !dev->driver)
+		driver_probe_device(drv, dev);
+
+	/* We made our attempt at an async_probe, clear the flag */
+	dev->async_probe = false;
+
+	__device_driver_unlock(dev, dev->parent);
+
+	put_device(dev);
+
+	dev_dbg(dev, "async probe completed\n");
+}
+
 static int __driver_attach(struct device *dev, void *data)
 {
 	struct device_driver *drv = data;
@@ -956,6 +998,25 @@  static int __driver_attach(struct device *dev, void *data)
 		return ret;
 	} /* ret > 0 means positive match */
 
+	if (driver_allows_async_probing(drv)) {
+		/*
+		 * Instead of probing the device synchronously we will
+		 * probe it asynchronously to allow for more parallelism.
+		 *
+		 * We only take the device lock here in order to guarantee
+		 * that the dev->driver and async_driver fields are protected
+		 */
+		dev_dbg(dev, "scheduling asynchronous probe\n");
+		device_lock(dev);
+		if (!dev->driver) {
+			get_device(dev);
+			dev_set_drv_async(dev, drv);
+			async_schedule(__driver_attach_async_helper, dev);
+		}
+		device_unlock(dev);
+		return 0;
+	}
+
 	device_driver_attach(drv, dev);
 
 	return 0;
@@ -1054,6 +1115,12 @@  void device_release_driver_internal(struct device *dev,
 {
 	__device_driver_lock(dev, parent);
 
+	/*
+	 * We shouldn't need to add a check for any pending async_probe here
+	 * because the only caller that will pass us a driver, driver_detach,
+	 * should have been called after the driver was removed from the bus
+	 * and will call async_synchronize_full before we get to this point.
+	 */
 	if (!drv || drv == dev->driver)
 		__device_release_driver(dev, parent);