[driver-core,v7,2/9] driver core: Establish clear order of operations for deferred probe and remove
diff mbox series

Message ID 154345153672.18040.3771035148218843351.stgit@ahduyck-desk1.amr.corp.intel.com
State Superseded
Headers show
Series
  • Add NUMA aware async_schedule calls
Related show

Commit Message

Alexander Duyck Nov. 29, 2018, 12:32 a.m. UTC
Add an additional bit flag to the device struct named async_probe. This
additional flag allows us to guarantee ordering between probe and remove
operations.

This allows us to guarantee that if we execute a remove operation on a
given interface it will not attempt to update the driver member
asynchronously following the earlier operation. Previously this guarantee
was not present and could result in us attempting to remove a driver from
an interface only to have it attempt to attach the driver later when we
finally complete the deferred asynchronous probe call.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 drivers/base/dd.c      |   16 ++++++++++++++++
 include/linux/device.h |    3 +++
 2 files changed, 19 insertions(+)

Comments

Dan Williams Nov. 29, 2018, 1:57 a.m. UTC | #1
On Wed, Nov 28, 2018 at 4:32 PM Alexander Duyck
<alexander.h.duyck@linux.intel.com> wrote:
>
> Add an additional bit flag to the device struct named async_probe. This
> additional flag allows us to guarantee ordering between probe and remove
> operations.
>
> This allows us to guarantee that if we execute a remove operation on a

You missed the review comment on the usage of "us". I've long been an
abuser of this as well saying "we" and "us" to casually refer to
whatever part of the kernel I'm currently modifying. The problem is
that it is ambiguous and assumes the reader happens translates the
"us" / "we" to the same specific subject you had in mind. It leaves
room for confusion that can be eliminated by explicitly referencing
the expected agent, subject, object in mind.

I long blew off suggestions to correct usages like this, but it
finally sunk in for me after reading Thomas' rewrite of a "we" and
"this" laden changelog, and why he and other tip-maintainers want to
push back on the usage in the tip tree, see the "Changelog" section of
the guidance in "[patch 2/2] Documentation/process: Add tip tree
handbook": https://lkml.org/lkml/2018/11/7/932.

Patch review is quicker without the speed bumps of translating
occurrences of the "we" and "us"

> given interface it will not attempt to update the driver member
> asynchronously following the earlier operation. Previously this guarantee
> was not present and could result in us attempting to remove a driver from
> an interface only to have it attempt to attach the driver later when we
> finally complete the deferred asynchronous probe call.
>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  drivers/base/dd.c      |   16 ++++++++++++++++
>  include/linux/device.h |    3 +++
>  2 files changed, 19 insertions(+)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 88713f182086..ef3f70a7cb5a 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -774,6 +774,10 @@ static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
>
>         device_lock(dev);
>
> +       /* nothing to do if async_probe has been cleared */
> +       if (!dev->async_probe)
> +               goto out_unlock;
> +
>         if (dev->parent)
>                 pm_runtime_get_sync(dev->parent);
>
> @@ -785,6 +789,9 @@ static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
>         if (dev->parent)
>                 pm_runtime_put(dev->parent);
>
> +       /* We made our attempt at an async_probe, clear the flag */
> +       dev->async_probe = false;
> +out_unlock:
>         device_unlock(dev);
>
>         put_device(dev);
> @@ -829,6 +836,7 @@ static int __device_attach(struct device *dev, bool allow_async)
>                          */
>                         dev_dbg(dev, "scheduling asynchronous probe\n");
>                         get_device(dev);
> +                       dev->async_probe = true;
>                         async_schedule(__device_attach_async_helper, dev);
>                 } else {
>                         pm_request_idle(dev);
> @@ -929,6 +937,14 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>  {
>         struct device_driver *drv;
>
> +       /*
> +        * In the event that we are asked to release the driver on an
> +        * interface that is still waiting on a probe we can just terminate
> +        * the probe by setting async_probe to false. When the async call
> +        * is finally completed it will see this state and just exit.
> +        */
> +       dev->async_probe = false;
> +
>         drv = dev->driver;
>         if (drv) {
>                 while (device_links_busy(dev)) {
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 1b25c7a43f4c..4d2eb2c74149 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -957,6 +957,8 @@ struct dev_links_info {
>   *              device.
>   * @dma_coherent: this particular device is dma coherent, even if the
>   *             architecture supports non-coherent devices.
> + * @async_probe: This device has an asynchronous probe event pending. Should
> + *              only be updated while holding device lock.
>   *
>   * At the lowest level, every device in a Linux system is represented by an
>   * instance of struct device. The device structure contains the information
> @@ -1051,6 +1053,7 @@ struct device {
>      defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
>         bool                    dma_coherent:1;
>  #endif
> +       bool                    async_probe:1;

I think this flag is misnamed, the wrong polarity and should be set in
the device removal path, not the driver detach path. The wider problem
is the removal of a device while actions initiated by its arrival may
still be in flight, or have yet to start. It's not just the probe path
in the driver-core that might be interested in this state, but also
bus implementations that kick off their own async operations.

I think the flag should be named "cancel" and set it in the
device_del() path. Otherwise this is encoding code flow state in the
struct rather than device-state that the code needs to comprehend.
Alexander Duyck Nov. 29, 2018, 6:07 p.m. UTC | #2
On Wed, 2018-11-28 at 17:57 -0800, Dan Williams wrote:
> On Wed, Nov 28, 2018 at 4:32 PM Alexander Duyck
> <alexander.h.duyck@linux.intel.com> wrote:
> > 
> > Add an additional bit flag to the device struct named async_probe. This
> > additional flag allows us to guarantee ordering between probe and remove
> > operations.
> > 
> > This allows us to guarantee that if we execute a remove operation on a
> 
> You missed the review comment on the usage of "us". I've long been an
> abuser of this as well saying "we" and "us" to casually refer to
> whatever part of the kernel I'm currently modifying. The problem is
> that it is ambiguous and assumes the reader happens translates the
> "us" / "we" to the same specific subject you had in mind. It leaves
> room for confusion that can be eliminated by explicitly referencing
> the expected agent, subject, object in mind.
> 
> I long blew off suggestions to correct usages like this, but it
> finally sunk in for me after reading Thomas' rewrite of a "we" and
> "this" laden changelog, and why he and other tip-maintainers want to
> push back on the usage in the tip tree, see the "Changelog" section of
> the guidance in "[patch 2/2] Documentation/process: Add tip tree
> handbook": https://lkml.org/lkml/2018/11/7/932.
> 
> Patch review is quicker without the speed bumps of translating
> occurrences of the "we" and "us"

It wasn't my intention to blow it off. I have gone through and updated
it in my repo and I can see how it can be confusing as in one spot  I
wasn't sure if the "we"/"us" was the probe or the remove routine.

> > given interface it will not attempt to update the driver member
> > asynchronously following the earlier operation. Previously this guarantee
> > was not present and could result in us attempting to remove a driver from
> > an interface only to have it attempt to attach the driver later when we
> > finally complete the deferred asynchronous probe call.
> > 
> > Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> >  drivers/base/dd.c      |   16 ++++++++++++++++
> >  include/linux/device.h |    3 +++
> >  2 files changed, 19 insertions(+)
> > 
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 88713f182086..ef3f70a7cb5a 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -774,6 +774,10 @@ static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
> > 
> >         device_lock(dev);
> > 
> > +       /* nothing to do if async_probe has been cleared */
> > +       if (!dev->async_probe)
> > +               goto out_unlock;
> > +
> >         if (dev->parent)
> >                 pm_runtime_get_sync(dev->parent);
> > 
> > @@ -785,6 +789,9 @@ static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
> >         if (dev->parent)
> >                 pm_runtime_put(dev->parent);
> > 
> > +       /* We made our attempt at an async_probe, clear the flag */
> > +       dev->async_probe = false;
> > +out_unlock:
> >         device_unlock(dev);
> > 
> >         put_device(dev);
> > @@ -829,6 +836,7 @@ static int __device_attach(struct device *dev, bool allow_async)
> >                          */
> >                         dev_dbg(dev, "scheduling asynchronous probe\n");
> >                         get_device(dev);
> > +                       dev->async_probe = true;
> >                         async_schedule(__device_attach_async_helper, dev);
> >                 } else {
> >                         pm_request_idle(dev);
> > @@ -929,6 +937,14 @@ static void __device_release_driver(struct device *dev, struct device *parent)
> >  {
> >         struct device_driver *drv;
> > 
> > +       /*
> > +        * In the event that we are asked to release the driver on an
> > +        * interface that is still waiting on a probe we can just terminate
> > +        * the probe by setting async_probe to false. When the async call
> > +        * is finally completed it will see this state and just exit.
> > +        */
> > +       dev->async_probe = false;
> > +
> >         drv = dev->driver;
> >         if (drv) {
> >                 while (device_links_busy(dev)) {
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index 1b25c7a43f4c..4d2eb2c74149 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -957,6 +957,8 @@ struct dev_links_info {
> >   *              device.
> >   * @dma_coherent: this particular device is dma coherent, even if the
> >   *             architecture supports non-coherent devices.
> > + * @async_probe: This device has an asynchronous probe event pending. Should
> > + *              only be updated while holding device lock.
> >   *
> >   * At the lowest level, every device in a Linux system is represented by an
> >   * instance of struct device. The device structure contains the information
> > @@ -1051,6 +1053,7 @@ struct device {
> >      defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
> >         bool                    dma_coherent:1;
> >  #endif
> > +       bool                    async_probe:1;
> 
> I think this flag is misnamed, the wrong polarity and should be set in
> the device removal path, not the driver detach path. The wider problem
> is the removal of a device while actions initiated by its arrival may
> still be in flight, or have yet to start. It's not just the probe path
> in the driver-core that might be interested in this state, but also
> bus implementations that kick off their own async operations.

Okay, so increase the scope so that the information is usable outside
of driver core.

> I think the flag should be named "cancel" and set it in the
> device_del() path. Otherwise this is encoding code flow state in the
> struct rather than device-state that the code needs to comprehend.

Instead of "cancel" what would you think of "dead"? In my mind once we
call device_del we are essentially working with a dead device object so
that might make more sense in terms of a state rather than "cancel"
which doesn't really tell us what should be canceled.

Looking over the code I could probably set it before we start calling
the notifiers for BUS_NOTIFY_DEL_DEVICE. The only thing I am not sure
about is if we would need to add any sort of synchronization primitives
around it.
Dan Williams Nov. 29, 2018, 6:55 p.m. UTC | #3
On Thu, Nov 29, 2018 at 10:07 AM Alexander Duyck
<alexander.h.duyck@linux.intel.com> wrote:
>
> On Wed, 2018-11-28 at 17:57 -0800, Dan Williams wrote:
[..]
> > I think the flag should be named "cancel" and set it in the
> > device_del() path. Otherwise this is encoding code flow state in the
> > struct rather than device-state that the code needs to comprehend.
>
> Instead of "cancel" what would you think of "dead"? In my mind once we
> call device_del we are essentially working with a dead device object so
> that might make more sense in terms of a state rather than "cancel"
> which doesn't really tell us what should be canceled.

That sounds good to me.

> Looking over the code I could probably set it before we start calling
> the notifiers for BUS_NOTIFY_DEL_DEVICE. The only thing I am not sure
> about is if we would need to add any sort of synchronization primitives
> around it.
>

I think it needs to be something like a barrier:

    dev->dead;
    device_lock();
    device_unlock();

...where you can be sure that anyone after that device_unlock() has
acted on dev->dead, or will see dev->dead.
Alexander Duyck Nov. 29, 2018, 9:53 p.m. UTC | #4
On Thu, 2018-11-29 at 10:55 -0800, Dan Williams wrote:
> On Thu, Nov 29, 2018 at 10:07 AM Alexander Duyck
> <alexander.h.duyck@linux.intel.com> wrote:
> > 
> > On Wed, 2018-11-28 at 17:57 -0800, Dan Williams wrote:
> 
> [..]
> > > I think the flag should be named "cancel" and set it in the
> > > device_del() path. Otherwise this is encoding code flow state in the
> > > struct rather than device-state that the code needs to comprehend.
> > 
> > Instead of "cancel" what would you think of "dead"? In my mind once we
> > call device_del we are essentially working with a dead device object so
> > that might make more sense in terms of a state rather than "cancel"
> > which doesn't really tell us what should be canceled.
> 
> That sounds good to me.
> 
> > Looking over the code I could probably set it before we start calling
> > the notifiers for BUS_NOTIFY_DEL_DEVICE. The only thing I am not sure
> > about is if we would need to add any sort of synchronization primitives
> > around it.
> > 
> 
> I think it needs to be something like a barrier:
> 
>     dev->dead;
>     device_lock();
>     device_unlock();
> 
> ...where you can be sure that anyone after that device_unlock() has
> acted on dev->dead, or will see dev->dead.

Actually what I think I will do is set dev->dead to true with the
device lock held at the start of device_del. So something like:
	device_lock();
	dev->dead = true;
	device_unlock();

It seems like that would probably make the most sense and be consistent
with the handling of other bits such as dev->offline. It means adding
one more call to device_lock/unlock but it guarantees that the update
behavior is consistent with the other bitfields near it and that we
cannot have an asynchronous probe routine trying to run while we are
tearing out the bus/class/sysfs from underneath the device.
Dan Williams Nov. 29, 2018, 10 p.m. UTC | #5
On Thu, Nov 29, 2018 at 1:54 PM Alexander Duyck
<alexander.h.duyck@linux.intel.com> wrote:
>
> On Thu, 2018-11-29 at 10:55 -0800, Dan Williams wrote:
> > On Thu, Nov 29, 2018 at 10:07 AM Alexander Duyck
> > <alexander.h.duyck@linux.intel.com> wrote:
> > >
> > > On Wed, 2018-11-28 at 17:57 -0800, Dan Williams wrote:
> >
> > [..]
> > > > I think the flag should be named "cancel" and set it in the
> > > > device_del() path. Otherwise this is encoding code flow state in the
> > > > struct rather than device-state that the code needs to comprehend.
> > >
> > > Instead of "cancel" what would you think of "dead"? In my mind once we
> > > call device_del we are essentially working with a dead device object so
> > > that might make more sense in terms of a state rather than "cancel"
> > > which doesn't really tell us what should be canceled.
> >
> > That sounds good to me.
> >
> > > Looking over the code I could probably set it before we start calling
> > > the notifiers for BUS_NOTIFY_DEL_DEVICE. The only thing I am not sure
> > > about is if we would need to add any sort of synchronization primitives
> > > around it.
> > >
> >
> > I think it needs to be something like a barrier:
> >
> >     dev->dead;
> >     device_lock();
> >     device_unlock();
> >
> > ...where you can be sure that anyone after that device_unlock() has
> > acted on dev->dead, or will see dev->dead.
>
> Actually what I think I will do is set dev->dead to true with the
> device lock held at the start of device_del. So something like:
>         device_lock();
>         dev->dead = true;
>         device_unlock();

Yeah, the lock is needed anyway since it's a bitfield.

> It seems like that would probably make the most sense and be consistent
> with the handling of other bits such as dev->offline. It means adding
> one more call to device_lock/unlock but it guarantees that the update
> behavior is consistent with the other bitfields near it and that we
> cannot have an asynchronous probe routine trying to run while we are
> tearing out the bus/class/sysfs from underneath the device.

I think that last statement

"guarantee that the update behavior is consistent with the other
bitfields near it and that we cannot have an asynchronous probe
routine trying to run while we are tearing out the bus/class/sysfs
from underneath the device."

...would be good to include as a comment when setting 'dead' to true.
Luis Chamberlain Nov. 30, 2018, 11:40 p.m. UTC | #6
On Wed, Nov 28, 2018 at 04:32:16PM -0800, Alexander Duyck wrote:
> Add an additional bit flag to the device struct named async_probe. This
> additional flag allows us to guarantee ordering between probe and remove
> operations.
> 
> This allows us to guarantee that if we execute a remove operation on a
> given interface it will not attempt to update the driver member
> asynchronously following the earlier operation. Previously this guarantee
> was not present and could result in us attempting to remove a driver from
> an interface only to have it attempt to attach the driver later when we
> finally complete the deferred asynchronous probe call.
> 
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

This is the sort of corner case that is best if we had a test case for
it, as it is hard to reproduce and -- how do we know we won't regress
later? Not sure if it helps but we have lib/test_kmod.c and its
respective tools/testing/selftests/kmod/kmod.sh, a new enum kmod_test_case
might be in order for device emulation creeping up / disappearing
during a custom mock driver using async probe.

Yeah.. I know.. "yes this seems good but how about later"? While we're going
through the motions here and have your attention on this I think it
would be valuable for this now. This is the sort of code that won't
change often, but if modified *can* really break things badly.

  Luis

> ---
>  drivers/base/dd.c      |   16 ++++++++++++++++
>  include/linux/device.h |    3 +++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 88713f182086..ef3f70a7cb5a 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -774,6 +774,10 @@ static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
>  
>  	device_lock(dev);
>  
> +	/* nothing to do if async_probe has been cleared */
> +	if (!dev->async_probe)
> +		goto out_unlock;
> +
>  	if (dev->parent)
>  		pm_runtime_get_sync(dev->parent);
>  
> @@ -785,6 +789,9 @@ static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
>  	if (dev->parent)
>  		pm_runtime_put(dev->parent);
>  
> +	/* We made our attempt at an async_probe, clear the flag */
> +	dev->async_probe = false;
> +out_unlock:
>  	device_unlock(dev);
>  
>  	put_device(dev);
> @@ -829,6 +836,7 @@ static int __device_attach(struct device *dev, bool allow_async)
>  			 */
>  			dev_dbg(dev, "scheduling asynchronous probe\n");
>  			get_device(dev);
> +			dev->async_probe = true;
>  			async_schedule(__device_attach_async_helper, dev);
>  		} else {
>  			pm_request_idle(dev);
> @@ -929,6 +937,14 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>  {
>  	struct device_driver *drv;
>  
> +	/*
> +	 * In the event that we are asked to release the driver on an
> +	 * interface that is still waiting on a probe we can just terminate
> +	 * the probe by setting async_probe to false. When the async call
> +	 * is finally completed it will see this state and just exit.
> +	 */
> +	dev->async_probe = false;
> +
>  	drv = dev->driver;
>  	if (drv) {
>  		while (device_links_busy(dev)) {
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 1b25c7a43f4c..4d2eb2c74149 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -957,6 +957,8 @@ struct dev_links_info {
>   *              device.
>   * @dma_coherent: this particular device is dma coherent, even if the
>   *		architecture supports non-coherent devices.
> + * @async_probe: This device has an asynchronous probe event pending. Should
> + *		 only be updated while holding device lock.
>   *
>   * At the lowest level, every device in a Linux system is represented by an
>   * instance of struct device. The device structure contains the information
> @@ -1051,6 +1053,7 @@ struct device {
>      defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
>  	bool			dma_coherent:1;
>  #endif
> +	bool			async_probe:1;
>  };
>  
>  static inline struct device *kobj_to_dev(struct kobject *kobj)
>

Patch
diff mbox series

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 88713f182086..ef3f70a7cb5a 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -774,6 +774,10 @@  static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
 
 	device_lock(dev);
 
+	/* nothing to do if async_probe has been cleared */
+	if (!dev->async_probe)
+		goto out_unlock;
+
 	if (dev->parent)
 		pm_runtime_get_sync(dev->parent);
 
@@ -785,6 +789,9 @@  static void __device_attach_async_helper(void *_dev, async_cookie_t cookie)
 	if (dev->parent)
 		pm_runtime_put(dev->parent);
 
+	/* We made our attempt at an async_probe, clear the flag */
+	dev->async_probe = false;
+out_unlock:
 	device_unlock(dev);
 
 	put_device(dev);
@@ -829,6 +836,7 @@  static int __device_attach(struct device *dev, bool allow_async)
 			 */
 			dev_dbg(dev, "scheduling asynchronous probe\n");
 			get_device(dev);
+			dev->async_probe = true;
 			async_schedule(__device_attach_async_helper, dev);
 		} else {
 			pm_request_idle(dev);
@@ -929,6 +937,14 @@  static void __device_release_driver(struct device *dev, struct device *parent)
 {
 	struct device_driver *drv;
 
+	/*
+	 * In the event that we are asked to release the driver on an
+	 * interface that is still waiting on a probe we can just terminate
+	 * the probe by setting async_probe to false. When the async call
+	 * is finally completed it will see this state and just exit.
+	 */
+	dev->async_probe = false;
+
 	drv = dev->driver;
 	if (drv) {
 		while (device_links_busy(dev)) {
diff --git a/include/linux/device.h b/include/linux/device.h
index 1b25c7a43f4c..4d2eb2c74149 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -957,6 +957,8 @@  struct dev_links_info {
  *              device.
  * @dma_coherent: this particular device is dma coherent, even if the
  *		architecture supports non-coherent devices.
+ * @async_probe: This device has an asynchronous probe event pending. Should
+ *		 only be updated while holding device lock.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -1051,6 +1053,7 @@  struct device {
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
 	bool			dma_coherent:1;
 #endif
+	bool			async_probe:1;
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)