[driver-core,v6,6/9] driver core: Probe devices asynchronously instead of the driver

Message ID 154170043123.12967.3591757325647337726.stgit@ahduyck-desk1.jf.intel.com
State Superseded
Series
  • Add NUMA aware async_schedule calls

Commit Message

Alexander Duyck Nov. 8, 2018, 6:07 p.m. UTC
Probe devices asynchronously instead of the driver. This results in us
seeing the same behavior if the device is registered before the driver or
after. This way we can avoid serializing the initialization should the
driver not be loaded until after the devices have already been added.

The motivation behind this is that if we have a set of devices that
take a significant amount of time to load we can greatly reduce the time to
load by processing them in parallel instead of one at a time. In addition,
each device can exist on a different node so placing a single thread on one
CPU to initialize all of the devices for a given driver can result in poor
performance on a system with multiple nodes.

I am using the driver_data member of the device struct to store the driver
pointer while we wait on the deferred probe call. This should be safe to do
as the value will either be set to NULL on a failed probe or driver load
followed by unload, or the driver value itself will be set on a successful
driver load. In addition I have used the async_probe flag to add additional
protection as it will be cleared if someone overwrites the driver_data
member as a part of loading the driver.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 drivers/base/bus.c     |   23 ++--------------
 drivers/base/dd.c      |   68 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/device.h |   10 ++++++-
 3 files changed, 80 insertions(+), 21 deletions(-)
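
The interplay between driver_data and the async_probe flag described in the
commit message can be modelled outside the kernel. The following is only a
userspace sketch: the struct definitions are cut-down stand-ins that keep just
the fields the patch touches, and stash_demo() is a hypothetical name used for
illustration. The three helpers mirror the bodies shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct device_driver { const char *name; };

struct device {
	struct device_driver *driver;	/* bound driver, NULL if unbound */
	void *driver_data;		/* driver private data OR parked driver */
	bool async_probe;		/* true while driver_data holds a driver */
};

/* Mirrors the patched dev_set_drvdata(): any ordinary write drops the flag. */
static void dev_set_drvdata(struct device *dev, void *data)
{
	dev->async_probe = false;
	dev->driver_data = data;
}

/* Mirrors dev_set_drv_async(): park the driver, then raise the flag. */
static void dev_set_drv_async(struct device *dev, struct device_driver *drv)
{
	dev->driver_data = drv;
	dev->async_probe = true;
}

/* Mirrors dev_get_drv_async(): trust driver_data only while flagged. */
static struct device_driver *dev_get_drv_async(const struct device *dev)
{
	return dev->async_probe ? dev->driver_data : NULL;
}

/* Returns 1 if an ordinary drvdata write invalidates the parked driver. */
int stash_demo(void)
{
	struct device_driver drv = { "demo" };
	struct device dev = { 0 };
	int priv = 42;

	dev_set_drv_async(&dev, &drv);
	if (dev_get_drv_async(&dev) != &drv)
		return 0;

	dev_set_drvdata(&dev, &priv);	/* someone overwrites driver_data */
	return dev_get_drv_async(&dev) == NULL && dev.driver_data == &priv;
}
```

The flag is what lets the async helper tell a parked driver pointer apart from
real driver private data; without it the helper would have no way to know that
driver_data had been overwritten in the meantime.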

Comments

Bart Van Assche Nov. 8, 2018, 11:59 p.m. UTC | #1
On Thu, 2018-11-08 at 10:07 -0800, Alexander Duyck wrote:
> Probe devices asynchronously instead of the driver. This results in us
> seeing the same behavior if the device is registered before the driver or
> after. This way we can avoid serializing the initialization should the
> driver not be loaded until after the devices have already been added.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

Dan Williams Nov. 27, 2018, 2:48 a.m. UTC | #2
On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
<alexander.h.duyck@linux.intel.com> wrote:
>
> Probe devices asynchronously instead of the driver. This results in us
> seeing the same behavior if the device is registered before the driver or
> after. This way we can avoid serializing the initialization should the
> driver not be loaded until after the devices have already been added.
>
> The motivation behind this is that if we have a set of devices that
> take a significant amount of time to load we can greatly reduce the time to
> load by processing them in parallel instead of one at a time. In addition,
> each device can exist on a different node so placing a single thread on one
> CPU to initialize all of the devices for a given driver can result in poor
> performance on a system with multiple nodes.

Do you have numbers on effects of this change individually? Is this
change necessary for the libnvdimm init speedup, or is it independent?

> I am using the driver_data member of the device struct to store the driver
> pointer while we wait on the deferred probe call. This should be safe to do
> as the value will either be set to NULL on a failed probe or driver load
> followed by unload, or the driver value itself will be set on a successful
> driver load. In addition I have used the async_probe flag to add additional
> protection as it will be cleared if someone overwrites the driver_data
> member as a part of loading the driver.

I would not put it past a device-driver to call dev_get_drvdata()
before dev_set_drvdata(), to check "has this device already been
initialized". So I don't think it is safe to assume that the core can
stash this information in ->driver_data. Why not put this
infrastructure in struct device_private?

Alexander Duyck Nov. 27, 2018, 5:57 p.m. UTC | #3
On Mon, 2018-11-26 at 18:48 -0800, Dan Williams wrote:
> On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
> <alexander.h.duyck@linux.intel.com> wrote:
> > 
> > Probe devices asynchronously instead of the driver. This results in us
> > seeing the same behavior if the device is registered before the driver or
> > after. This way we can avoid serializing the initialization should the
> > driver not be loaded until after the devices have already been added.
> > 
> > The motivation behind this is that if we have a set of devices that
> > take a significant amount of time to load we can greatly reduce the time to
> > load by processing them in parallel instead of one at a time. In addition,
> > each device can exist on a different node so placing a single thread on one
> > CPU to initialize all of the devices for a given driver can result in poor
> > performance on a system with multiple nodes.
> 
> Do you have numbers on effects of this change individually? Is this
> change necessary for the libnvdimm init speedup, or is it independent?

It depends on the case. I was using X86_PMEM_LEGACY_DEVICE to spawn a
couple of 32GB persistent memory devices. I had to use this patch and
the async_probe option to get them loading in parallel versus serial as
the driver load order is a bit different.

Basically as long as all the necessary drivers are loaded for libnvdimm
you are good, however if the device can get probed before the driver is
loaded you run into issues as the loading will be serialized without
this patch.

> > I am using the driver_data member of the device struct to store the driver
> > pointer while we wait on the deferred probe call. This should be safe to do
> > as the value will either be set to NULL on a failed probe or driver load
> > followed by unload, or the driver value itself will be set on a successful
> > driver load. In addition I have used the async_probe flag to add additional
> > protection as it will be cleared if someone overwrites the driver_data
> > member as a part of loading the driver.
> 
> I would not put it past a device-driver to call dev_get_drvdata()
> before dev_set_drvdata(), to check "has this device already been
> initialized". So I don't think it is safe to assume that the core can
> stash this information in ->driver_data. Why not put this
> infrastructure in struct device_private?

The data should be cleared before we even get to the probe call so I am
not sure that is something we would need to worry about.

As far as why I didn't use device_private, it was mostly just for the
sake of space savings. I only had to add one bit to an existing
bitfield to make the async_probe approach work, and the drvdata just
seemed like the obvious place to put the deferred driver.

Dan Williams Nov. 27, 2018, 6:32 p.m. UTC | #4
On Tue, Nov 27, 2018 at 9:58 AM Alexander Duyck
<alexander.h.duyck@linux.intel.com> wrote:
>
> On Mon, 2018-11-26 at 18:48 -0800, Dan Williams wrote:
> > On Thu, Nov 8, 2018 at 10:07 AM Alexander Duyck
> > <alexander.h.duyck@linux.intel.com> wrote:
> > >
> > > Probe devices asynchronously instead of the driver. This results in us
> > > seeing the same behavior if the device is registered before the driver or
> > > after. This way we can avoid serializing the initialization should the
> > > driver not be loaded until after the devices have already been added.
> > >
> > > The motivation behind this is that if we have a set of devices that
> > > take a significant amount of time to load we can greatly reduce the time to
> > > load by processing them in parallel instead of one at a time. In addition,
> > > each device can exist on a different node so placing a single thread on one
> > > CPU to initialize all of the devices for a given driver can result in poor
> > > performance on a system with multiple nodes.
> >
> > Do you have numbers on effects of this change individually? Is this
> > change necessary for the libnvdimm init speedup, or is it independent?
>
> It depends on the case. I was using X86_PMEM_LEGACY_DEVICE to spawn a
> couple of 32GB persistent memory devices. I had to use this patch and
> the async_probe option to get them loading in parallel versus serial as
> the driver load order is a bit different.
>
> Basically as long as all the necessary drivers are loaded for libnvdimm
> you are good, however if the device can get probed before the driver is
> loaded you run into issues as the loading will be serialized without
> this patch.

I think we could achieve the same with something like the following:

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 77f188cd8023..66c9827efdb4 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -3718,5 +3718,6 @@ static __exit void nfit_exit(void)

 module_init(nfit_init);
 module_exit(nfit_exit);
+MODULE_SOFTDEP("pre: nd_pmem");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Intel Corporation");

...to ensure that the pmem driver is loaded and ready to service
devices before they start being discovered.

>
> > > I am using the driver_data member of the device struct to store the driver
> > > pointer while we wait on the deferred probe call. This should be safe to do
> > > as the value will either be set to NULL on a failed probe or driver load
> > > followed by unload, or the driver value itself will be set on a successful
> > > driver load. In addition I have used the async_probe flag to add additional
> > > protection as it will be cleared if someone overwrites the driver_data
> > > member as a part of loading the driver.
> >
> > I would not put it past a device-driver to call dev_get_drvdata()
> > before dev_set_drvdata(), to check "has this device already been
> > initialized". So I don't think it is safe to assume that the core can
> > stash this information in ->driver_data. Why not put this
> > infrastructure in struct device_private?
>
> The data should be cleared before we even get to the probe call so I am
> not sure that is something we would need to worry about.

Yes it "should", but I have the sense that I have seen code that looks
at dev_get_drvdata() != NULL when it really should be looking at
dev->driver. Maybe not in leaf drivers, but bus code.
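
The concern can be illustrated with a small userspace model. This is a sketch
under assumptions, not code from any real bus: bound_by_drvdata() and
bound_by_driver() are hypothetical helpers standing in for the two styles of
check, and the structs are cut-down stand-ins for the kernel types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct device_driver { const char *name; };

struct device {
	struct device_driver *driver;	/* bound driver, NULL if unbound */
	void *driver_data;
	bool async_probe;
};

/* As in the patch: park the matched driver until the async probe runs. */
static void dev_set_drv_async(struct device *dev, struct device_driver *drv)
{
	dev->driver_data = drv;
	dev->async_probe = true;
}

/* Hypothetical bus helper that infers "bound" from drvdata. */
static bool bound_by_drvdata(const struct device *dev)
{
	return dev->driver_data != NULL;
}

/* Hypothetical bus helper that checks the field that records binding. */
static bool bound_by_driver(const struct device *dev)
{
	return dev->driver != NULL;
}

/* Returns 1 when the two checks disagree while a probe is still pending. */
int pending_probe_confuses_drvdata_check(void)
{
	struct device_driver drv = { "demo" };
	struct device dev = { 0 };

	dev_set_drv_async(&dev, &drv);
	return bound_by_drvdata(&dev) && !bound_by_driver(&dev);
}
```

In this model the drvdata-based check reports the device as initialized during
the window between scheduling and running the async probe, even though no
driver is bound yet.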

> As far as why I didn't use device_private, it was mostly just for the
> sake of space savings. I only had to add one bit to an existing
> bitfield to make the async_probe approach work, and the drvdata just
> seemed like the obvious place to put the deferred driver.

It seems device_private already has deferred_probe data, so why not
async_probe as well?
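
A rough userspace sketch of what that alternative might look like follows.
Everything here is an assumption made for illustration: the async_driver
member, dev_take_drv_async() and async_helper() are hypothetical names, and
the structs are cut-down stand-ins rather than the real kernel definitions.
The point is only that parking the driver in core-private state leaves
driver_data untouched.

```c
#include <assert.h>
#include <stddef.h>

struct device_driver { const char *name; };

/* Core-private state; async_driver is a hypothetical member for this sketch. */
struct device_private {
	struct device_driver *async_driver;
};

struct device {
	struct device_private *p;
	struct device_driver *driver;	/* bound driver, NULL if unbound */
	void *driver_data;		/* stays owned by the driver */
};

/* Park the matched driver in core-private state instead of driver_data. */
static void dev_set_drv_async(struct device *dev, struct device_driver *drv)
{
	dev->p->async_driver = drv;
}

/* Fetch and clear the parked driver so it is consumed at most once. */
static struct device_driver *dev_take_drv_async(struct device *dev)
{
	struct device_driver *drv = dev->p->async_driver;

	dev->p->async_driver = NULL;
	return drv;
}

/* Stand-in for the async helper: probe only if the device is still unbound. */
static int async_helper(struct device *dev)
{
	struct device_driver *drv = dev_take_drv_async(dev);

	if (drv && !dev->driver) {
		dev->driver = drv;	/* pretend the probe succeeded */
		return 1;
	}
	return 0;
}
```

With this layout a dev_get_drvdata() != NULL style check never observes the
parked driver, at the cost of one extra pointer in the private structure
rather than a single bit in an existing bitfield.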

Patch

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 8a630f9bd880..0cd2eadd0816 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -606,17 +606,6 @@  static ssize_t uevent_store(struct device_driver *drv, const char *buf,
 }
 static DRIVER_ATTR_WO(uevent);
 
-static void driver_attach_async(void *_drv, async_cookie_t cookie)
-{
-	struct device_driver *drv = _drv;
-	int ret;
-
-	ret = driver_attach(drv);
-
-	pr_debug("bus: '%s': driver %s async attach completed: %d\n",
-		 drv->bus->name, drv->name, ret);
-}
-
 /**
  * bus_add_driver - Add a driver to the bus.
  * @drv: driver.
@@ -649,15 +638,9 @@  int bus_add_driver(struct device_driver *drv)
 
 	klist_add_tail(&priv->knode_bus, &bus->p->klist_drivers);
 	if (drv->bus->p->drivers_autoprobe) {
-		if (driver_allows_async_probing(drv)) {
-			pr_debug("bus: '%s': probing driver %s asynchronously\n",
-				drv->bus->name, drv->name);
-			async_schedule(driver_attach_async, drv);
-		} else {
-			error = driver_attach(drv);
-			if (error)
-				goto out_unregister;
-		}
+		error = driver_attach(drv);
+		if (error)
+			goto out_unregister;
 	}
 	module_add_driver(drv->owner, drv);
 
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index ed19cf0d6f9a..f4e84d639c69 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -808,6 +808,7 @@  static int __device_attach(struct device *dev, bool allow_async)
 			ret = 1;
 		else {
 			dev->driver = NULL;
+			dev_set_drvdata(dev, NULL);
 			ret = 0;
 		}
 	} else {
@@ -925,6 +926,48 @@  int device_driver_attach(struct device_driver *drv, struct device *dev)
 	return ret;
 }
 
+static inline struct device_driver *dev_get_drv_async(const struct device *dev)
+{
+	return dev->async_probe ? dev->driver_data : NULL;
+}
+
+static inline void dev_set_drv_async(struct device *dev,
+				     struct device_driver *drv)
+{
+	/*
+	 * Set async_probe to true indicating we are waiting for this data to be
+	 * loaded as a potential driver.
+	 */
+	dev->driver_data = drv;
+	dev->async_probe = true;
+}
+
+static void __driver_attach_async_helper(void *_dev, async_cookie_t cookie)
+{
+	struct device *dev = _dev;
+	struct device_driver *drv;
+
+	__device_driver_lock(dev, dev->parent);
+
+	/*
+	 * If someone attempted to bind a driver either successfully or
+	 * unsuccessfully before we got here we should just skip the driver
+	 * probe call.
+	 */
+	drv = dev_get_drv_async(dev);
+	if (drv && !dev->driver)
+		driver_probe_device(drv, dev);
+
+	/* We made our attempt at an async_probe, clear the flag */
+	dev->async_probe = false;
+
+	__device_driver_unlock(dev, dev->parent);
+
+	put_device(dev);
+
+	dev_dbg(dev, "async probe completed\n");
+}
+
 static int __driver_attach(struct device *dev, void *data)
 {
 	struct device_driver *drv = data;
@@ -952,6 +995,25 @@  static int __driver_attach(struct device *dev, void *data)
 		return ret;
 	} /* ret > 0 means positive match */
 
+	if (driver_allows_async_probing(drv)) {
+		/*
+		 * Instead of probing the device synchronously we will
+		 * probe it asynchronously to allow for more parallelism.
+		 *
+		 * We only take the device lock here in order to guarantee
+		 * that the dev->driver and driver_data fields are protected
+		 */
+		dev_dbg(dev, "scheduling asynchronous probe\n");
+		device_lock(dev);
+		if (!dev->driver) {
+			get_device(dev);
+			dev_set_drv_async(dev, drv);
+			async_schedule(__driver_attach_async_helper, dev);
+		}
+		device_unlock(dev);
+		return 0;
+	}
+
 	device_driver_attach(drv, dev);
 
 	return 0;
@@ -1049,6 +1111,12 @@  void device_release_driver_internal(struct device *dev,
 {
 	__device_driver_lock(dev, parent);
 
+	/*
+	 * We shouldn't need to add a check for any pending async_probe here
+	 * because the only caller that will pass us a driver, driver_detach,
+	 * should have been called after the driver was removed from the bus
+	 * and will call async_synchronize_full before we get to this point.
+	 */
 	if (!drv || drv == dev->driver)
 		__device_release_driver(dev, parent);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index 4d2eb2c74149..2305eb886006 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -910,7 +910,9 @@  struct dev_links_info {
  * 		variants, which GPIO pins act in what additional roles, and so
  * 		on.  This shrinks the "Board Support Packages" (BSPs) and
  * 		minimizes board-specific #ifdefs in drivers.
- * @driver_data: Private pointer for driver specific info.
+ * @driver_data: Private pointer for driver specific info if driver is
+ *		non-NULL. Pointer to deferred driver to be attached if driver
+ *		is NULL.
  * @links:	Links to suppliers and consumers of this device.
  * @power:	For device power management.
  *		See Documentation/driver-api/pm/devices.rst for details.
@@ -1118,6 +1120,12 @@  static inline void *dev_get_drvdata(const struct device *dev)
 
 static inline void dev_set_drvdata(struct device *dev, void *data)
 {
+	/*
+	 * clear async_probe to prevent us from attempting to read driver_data
+	 * as a driver. We can reset this to true for the one case where we are
+	 * using this to record an actual driver.
+	 */
+	dev->async_probe = false;
 	dev->driver_data = data;
 }