diff mbox series

[v2,1/1] vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING"

Message ID 159c2af6cc2fc8ca838cfa7ab9a54e8a1b7507b9.1554315372.git.alifm@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/1] vfio: Fix WARNING "do not call blocking ops when !TASK_RUNNING" | expand

Commit Message

Farhan Ali April 3, 2019, 6:22 p.m. UTC
vfio_dev_present() which is the condition to
wait_event_interruptible_timeout(), will call vfio_group_get_device
and try to acquire the mutex group->device_lock.

wait_event_interruptible_timeout() will set the state of the current
task to TASK_INTERRUPTIBLE, before doing the condition check. This
means that we will try to accquire the mutex while already in a
sleeping state. The scheduler warns us by giving the following
warning:

[ 4050.264464] ------------[ cut here ]------------
[ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188
[ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90
....

 4050.264756] Call Trace:
[ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90)
[ 4050.264774]  [<0000000000b97edc>] __mutex_lock+0x44/0x8c0
[ 4050.264782]  [<0000000000b9878a>] mutex_lock_nested+0x32/0x40
[ 4050.264793]  [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio]
[ 4050.264803]  [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio]
[ 4050.264813]  [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev]
[ 4050.264825]  [<00000000008e01b0>] device_release_driver_internal+0x168/0x268
[ 4050.264834]  [<00000000008de692>] bus_remove_device+0x162/0x190
[ 4050.264843]  [<00000000008daf42>] device_del+0x1e2/0x368
[ 4050.264851]  [<00000000008db12c>] device_unregister+0x64/0x88
[ 4050.264862]  [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev]
[ 4050.264872]  [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev]
[ 4050.264881]  [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8
[ 4050.264890]  [<00000000003c1530>] __vfs_write+0x38/0x1a8
[ 4050.264899]  [<00000000003c187c>] vfs_write+0xb4/0x198
[ 4050.264908]  [<00000000003c1af2>] ksys_write+0x5a/0xb0
[ 4050.264916]  [<0000000000b9e270>] system_call+0xdc/0x2d8
[ 4050.264925] 4 locks held by sh/35924:
[ 4050.264933]  #0: 000000001ef90325 (sb_writers#4){.+.+}, at: vfs_write+0x9e/0x198
[ 4050.264948]  #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cc/0x1f8
[ 4050.264963]  #2: 0000000034831ab8 (kn->count#297){++++}, at: kernfs_remove_self+0x12e/0x150
[ 4050.264979]  #3: 00000000e152484f (&dev->mutex){....}, at: device_release_driver_internal+0x5c/0x268
[ 4050.264993] Last Breaking-Event-Address:
[ 4050.265002]  [<000000000017bbaa>] __might_sleep+0x72/0x90
[ 4050.265010] irq event stamp: 7039
[ 4050.265020] hardirqs last  enabled at (7047): [<00000000001cee7a>] console_unlock+0x6d2/0x740
[ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] console_unlock+0xd6/0x740
[ 4050.265040] softirqs last  enabled at (6416): [<0000000000b8fe26>] __udelay+0xb6/0x100
[ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] __udelay+0x96/0x100
[ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]---

Let's fix this as described in the article https://lwn.net/Articles/628628/.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---

I have already tested in my environment and the warning goes 
away for me with this patch. But appreciate further testing
and review feedback on the patch.

Thanks
Farhan


ChangeLog
---------
v1 -> v2
   - Keep the same behavior as before, so the task goes into
     TASK_UNINTERRUPTIBLE state after being interrupted once

---

 drivers/vfio/vfio.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

Comments

Farhan Ali April 9, 2019, 1:13 p.m. UTC | #1
On 04/03/2019 02:22 PM, Farhan Ali wrote:
> vfio_dev_present() which is the condition to
> wait_event_interruptible_timeout(), will call vfio_group_get_device
> and try to acquire the mutex group->device_lock.
> 
> wait_event_interruptible_timeout() will set the state of the current
> task to TASK_INTERRUPTIBLE, before doing the condition check. This
> means that we will try to accquire the mutex while already in a
> sleeping state. The scheduler warns us by giving the following
> warning:
> 
> [ 4050.264464] ------------[ cut here ]------------
> [ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188
> [ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90
> ....
> 
>   4050.264756] Call Trace:
> [ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90)
> [ 4050.264774]  [<0000000000b97edc>] __mutex_lock+0x44/0x8c0
> [ 4050.264782]  [<0000000000b9878a>] mutex_lock_nested+0x32/0x40
> [ 4050.264793]  [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio]
> [ 4050.264803]  [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio]
> [ 4050.264813]  [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev]
> [ 4050.264825]  [<00000000008e01b0>] device_release_driver_internal+0x168/0x268
> [ 4050.264834]  [<00000000008de692>] bus_remove_device+0x162/0x190
> [ 4050.264843]  [<00000000008daf42>] device_del+0x1e2/0x368
> [ 4050.264851]  [<00000000008db12c>] device_unregister+0x64/0x88
> [ 4050.264862]  [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev]
> [ 4050.264872]  [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev]
> [ 4050.264881]  [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8
> [ 4050.264890]  [<00000000003c1530>] __vfs_write+0x38/0x1a8
> [ 4050.264899]  [<00000000003c187c>] vfs_write+0xb4/0x198
> [ 4050.264908]  [<00000000003c1af2>] ksys_write+0x5a/0xb0
> [ 4050.264916]  [<0000000000b9e270>] system_call+0xdc/0x2d8
> [ 4050.264925] 4 locks held by sh/35924:
> [ 4050.264933]  #0: 000000001ef90325 (sb_writers#4){.+.+}, at: vfs_write+0x9e/0x198
> [ 4050.264948]  #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cc/0x1f8
> [ 4050.264963]  #2: 0000000034831ab8 (kn->count#297){++++}, at: kernfs_remove_self+0x12e/0x150
> [ 4050.264979]  #3: 00000000e152484f (&dev->mutex){....}, at: device_release_driver_internal+0x5c/0x268
> [ 4050.264993] Last Breaking-Event-Address:
> [ 4050.265002]  [<000000000017bbaa>] __might_sleep+0x72/0x90
> [ 4050.265010] irq event stamp: 7039
> [ 4050.265020] hardirqs last  enabled at (7047): [<00000000001cee7a>] console_unlock+0x6d2/0x740
> [ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] console_unlock+0xd6/0x740
> [ 4050.265040] softirqs last  enabled at (6416): [<0000000000b8fe26>] __udelay+0xb6/0x100
> [ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] __udelay+0x96/0x100
> [ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]---
> 
> Let's fix this as described in the articlehttps://lwn.net/Articles/628628/.
> 
> Signed-off-by: Farhan Ali<alifm@linux.ibm.com>
> ---
> 
> I have already tested in my environment and the warning goes
> away for me with this patch. But appreciate further testing
> and review feedback on the patch.
> 
> Thanks
> Farhan
> 
> 
> ChangeLog
> ---------
> v1 -> v2
>     - Keep the same behavior as before, so the task goes into
>       TASK_UNINTERRUPTIBLE state after being interrupted once
> 
> ---



Hi Alex,

Polite ping :)

Do you have any other feedback regarding the patch?

Thanks
Farhan
Alex Williamson April 9, 2019, 2:27 p.m. UTC | #2
On Tue, 9 Apr 2019 09:13:48 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 04/03/2019 02:22 PM, Farhan Ali wrote:
> > vfio_dev_present() which is the condition to
> > wait_event_interruptible_timeout(), will call vfio_group_get_device
> > and try to acquire the mutex group->device_lock.
> > 
> > wait_event_interruptible_timeout() will set the state of the current
> > task to TASK_INTERRUPTIBLE, before doing the condition check. This
> > means that we will try to accquire the mutex while already in a
> > sleeping state. The scheduler warns us by giving the following
> > warning:
> > 
> > [ 4050.264464] ------------[ cut here ]------------
> > [ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188
> > [ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90
> > ....
> > 
> >   4050.264756] Call Trace:
> > [ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90)
> > [ 4050.264774]  [<0000000000b97edc>] __mutex_lock+0x44/0x8c0
> > [ 4050.264782]  [<0000000000b9878a>] mutex_lock_nested+0x32/0x40
> > [ 4050.264793]  [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio]
> > [ 4050.264803]  [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio]
> > [ 4050.264813]  [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev]
> > [ 4050.264825]  [<00000000008e01b0>] device_release_driver_internal+0x168/0x268
> > [ 4050.264834]  [<00000000008de692>] bus_remove_device+0x162/0x190
> > [ 4050.264843]  [<00000000008daf42>] device_del+0x1e2/0x368
> > [ 4050.264851]  [<00000000008db12c>] device_unregister+0x64/0x88
> > [ 4050.264862]  [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev]
> > [ 4050.264872]  [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev]
> > [ 4050.264881]  [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8
> > [ 4050.264890]  [<00000000003c1530>] __vfs_write+0x38/0x1a8
> > [ 4050.264899]  [<00000000003c187c>] vfs_write+0xb4/0x198
> > [ 4050.264908]  [<00000000003c1af2>] ksys_write+0x5a/0xb0
> > [ 4050.264916]  [<0000000000b9e270>] system_call+0xdc/0x2d8
> > [ 4050.264925] 4 locks held by sh/35924:
> > [ 4050.264933]  #0: 000000001ef90325 (sb_writers#4){.+.+}, at: vfs_write+0x9e/0x198
> > [ 4050.264948]  #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cc/0x1f8
> > [ 4050.264963]  #2: 0000000034831ab8 (kn->count#297){++++}, at: kernfs_remove_self+0x12e/0x150
> > [ 4050.264979]  #3: 00000000e152484f (&dev->mutex){....}, at: device_release_driver_internal+0x5c/0x268
> > [ 4050.264993] Last Breaking-Event-Address:
> > [ 4050.265002]  [<000000000017bbaa>] __might_sleep+0x72/0x90
> > [ 4050.265010] irq event stamp: 7039
> > [ 4050.265020] hardirqs last  enabled at (7047): [<00000000001cee7a>] console_unlock+0x6d2/0x740
> > [ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] console_unlock+0xd6/0x740
> > [ 4050.265040] softirqs last  enabled at (6416): [<0000000000b8fe26>] __udelay+0xb6/0x100
> > [ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] __udelay+0x96/0x100
> > [ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]---
> > 
> > Let's fix this as described in the articlehttps://lwn.net/Articles/628628/.
> > 
> > Signed-off-by: Farhan Ali<alifm@linux.ibm.com>
> > ---
> > 
> > I have already tested in my environment and the warning goes
> > away for me with this patch. But appreciate further testing
> > and review feedback on the patch.
> > 
> > Thanks
> > Farhan
> > 
> > 
> > ChangeLog
> > ---------
> > v1 -> v2
> >     - Keep the same behavior as before, so the task goes into
> >       TASK_UNINTERRUPTIBLE state after being interrupted once
> > 
> > ---  
> 
> 
> 
> Hi Alex,
> 
> Polite ping :)
> 
> Do you have any other feedback regarding the patch?

Hi Farhan,

It looks fine to me, it's pending integration and further testing on
my end.  I'll target this for v5.2.  Thanks,

Alex
Alex Williamson April 18, 2019, 10:17 p.m. UTC | #3
On Wed,  3 Apr 2019 14:22:27 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> vfio_dev_present() which is the condition to
> wait_event_interruptible_timeout(), will call vfio_group_get_device
> and try to acquire the mutex group->device_lock.
> 
> wait_event_interruptible_timeout() will set the state of the current
> task to TASK_INTERRUPTIBLE, before doing the condition check. This
> means that we will try to accquire the mutex while already in a
> sleeping state. The scheduler warns us by giving the following
> warning:
> 
> [ 4050.264464] ------------[ cut here ]------------
> [ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188
> [ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90
> ....
> 
>  4050.264756] Call Trace:
> [ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90)
> [ 4050.264774]  [<0000000000b97edc>] __mutex_lock+0x44/0x8c0
> [ 4050.264782]  [<0000000000b9878a>] mutex_lock_nested+0x32/0x40
> [ 4050.264793]  [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio]
> [ 4050.264803]  [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio]
> [ 4050.264813]  [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev]
> [ 4050.264825]  [<00000000008e01b0>] device_release_driver_internal+0x168/0x268
> [ 4050.264834]  [<00000000008de692>] bus_remove_device+0x162/0x190
> [ 4050.264843]  [<00000000008daf42>] device_del+0x1e2/0x368
> [ 4050.264851]  [<00000000008db12c>] device_unregister+0x64/0x88
> [ 4050.264862]  [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev]
> [ 4050.264872]  [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev]
> [ 4050.264881]  [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8
> [ 4050.264890]  [<00000000003c1530>] __vfs_write+0x38/0x1a8
> [ 4050.264899]  [<00000000003c187c>] vfs_write+0xb4/0x198
> [ 4050.264908]  [<00000000003c1af2>] ksys_write+0x5a/0xb0
> [ 4050.264916]  [<0000000000b9e270>] system_call+0xdc/0x2d8
> [ 4050.264925] 4 locks held by sh/35924:
> [ 4050.264933]  #0: 000000001ef90325 (sb_writers#4){.+.+}, at: vfs_write+0x9e/0x198
> [ 4050.264948]  #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cc/0x1f8
> [ 4050.264963]  #2: 0000000034831ab8 (kn->count#297){++++}, at: kernfs_remove_self+0x12e/0x150
> [ 4050.264979]  #3: 00000000e152484f (&dev->mutex){....}, at: device_release_driver_internal+0x5c/0x268
> [ 4050.264993] Last Breaking-Event-Address:
> [ 4050.265002]  [<000000000017bbaa>] __might_sleep+0x72/0x90
> [ 4050.265010] irq event stamp: 7039
> [ 4050.265020] hardirqs last  enabled at (7047): [<00000000001cee7a>] console_unlock+0x6d2/0x740
> [ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] console_unlock+0xd6/0x740
> [ 4050.265040] softirqs last  enabled at (6416): [<0000000000b8fe26>] __udelay+0xb6/0x100
> [ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] __udelay+0x96/0x100
> [ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]---
> 
> Let's fix this as described in the article https://lwn.net/Articles/628628/.
> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
> 
> I have already tested in my environment and the warning goes 
> away for me with this patch. But appreciate further testing
> and review feedback on the patch.
> 
> Thanks
> Farhan
> 
> 
> ChangeLog
> ---------
> v1 -> v2
>    - Keep the same behavior as before, so the task goes into
>      TASK_UNINTERRUPTIBLE state after being interrupted once
> 
> ---
> 
>  drivers/vfio/vfio.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 6483387..62f9637 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -34,6 +34,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/vfio.h>
>  #include <linux/wait.h>
> +#include <linux/sched/signal.h>
>  
>  #define DRIVER_VERSION	"0.3"
>  #define DRIVER_AUTHOR	"Alex Williamson <alex.williamson@redhat.com>"
> @@ -922,12 +923,12 @@ static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
>   * removed.  Open file descriptors for the device... */
>  void *vfio_del_group_dev(struct device *dev)
>  {
> +	DEFINE_WAIT_FUNC(wait, woken_wake_function);
>  	struct vfio_device *device = dev_get_drvdata(dev);
>  	struct vfio_group *group = device->group;
>  	void *device_data = device->device_data;
>  	struct vfio_unbound_dev *unbound;
>  	unsigned int i = 0;
> -	long ret;
>  	bool interrupted = false;
>  
>  	/*
> @@ -964,6 +965,8 @@ void *vfio_del_group_dev(struct device *dev)
>  	 * interval with counter to allow the driver to take escalating
>  	 * measures to release the device if it has the ability to do so.
>  	 */
> +	add_wait_queue(&vfio.release_q, &wait);
> +
>  	do {
>  		device = vfio_group_get_device(group, dev);
>  		if (!device)
> @@ -974,13 +977,14 @@ void *vfio_del_group_dev(struct device *dev)
>  
>  		vfio_device_put(device);
>  
> +		if (!vfio_dev_present(group, dev))
> +			break;

Hi Farhan,

Sorry for the delay, this seems to work well, but I think the above
vfio_dev_present() check is redundant.  The code above this test is:

                device = vfio_group_get_device(group, dev);
                if (!device)
                        break;

                if (device->ops->request)
                        device->ops->request(device_data, i++);

                vfio_device_put(device);

And vfio_dev_present() is:

static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
{
        struct vfio_device *device;

        device = vfio_group_get_device(group, dev);
        if (!device)
                return false;

        vfio_device_put(device);
        return true;
}

vfio_dev_present() exists entirely to support the wait_event_timeout()
calls, so I think we can simply drop it and delete the function with
your conversion to this new method.  If you agree, I can go ahead and
make the change below on commit.  Thanks,

Alex


diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index fb7cc8f11ce5..82fcf07fa9ea 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -902,19 +902,6 @@ void *vfio_device_data(struct vfio_device *device)
 }
 EXPORT_SYMBOL_GPL(vfio_device_data);
 
-/* Given a referenced group, check if it contains the device */
-static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
-{
-	struct vfio_device *device;
-
-	device = vfio_group_get_device(group, dev);
-	if (!device)
-		return false;
-
-	vfio_device_put(device);
-	return true;
-}
-
 /*
  * Decrement the device reference count and wait for the device to be
  * removed.  Open file descriptors for the device... */
@@ -974,9 +961,6 @@ void *vfio_del_group_dev(struct device *dev)
 
 		vfio_device_put(device);
 
-		if (!vfio_dev_present(group, dev))
-			break;
-
 		if (interrupted) {
 			wait_woken(&wait, TASK_UNINTERRUPTIBLE, HZ * 10);
 		} else {
Farhan Ali April 19, 2019, 12:05 p.m. UTC | #4
On 04/18/2019 06:17 PM, Alex Williamson wrote:
> On Wed,  3 Apr 2019 14:22:27 -0400
> Farhan Ali <alifm@linux.ibm.com> wrote:
> 
>> vfio_dev_present() which is the condition to
>> wait_event_interruptible_timeout(), will call vfio_group_get_device
>> and try to acquire the mutex group->device_lock.
>>
>> wait_event_interruptible_timeout() will set the state of the current
>> task to TASK_INTERRUPTIBLE, before doing the condition check. This
>> means that we will try to accquire the mutex while already in a
>> sleeping state. The scheduler warns us by giving the following
>> warning:
>>
>> [ 4050.264464] ------------[ cut here ]------------
>> [ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188
>> [ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90
>> ....
>>
>>   4050.264756] Call Trace:
>> [ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90)
>> [ 4050.264774]  [<0000000000b97edc>] __mutex_lock+0x44/0x8c0
>> [ 4050.264782]  [<0000000000b9878a>] mutex_lock_nested+0x32/0x40
>> [ 4050.264793]  [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio]
>> [ 4050.264803]  [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio]
>> [ 4050.264813]  [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev]
>> [ 4050.264825]  [<00000000008e01b0>] device_release_driver_internal+0x168/0x268
>> [ 4050.264834]  [<00000000008de692>] bus_remove_device+0x162/0x190
>> [ 4050.264843]  [<00000000008daf42>] device_del+0x1e2/0x368
>> [ 4050.264851]  [<00000000008db12c>] device_unregister+0x64/0x88
>> [ 4050.264862]  [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev]
>> [ 4050.264872]  [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev]
>> [ 4050.264881]  [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8
>> [ 4050.264890]  [<00000000003c1530>] __vfs_write+0x38/0x1a8
>> [ 4050.264899]  [<00000000003c187c>] vfs_write+0xb4/0x198
>> [ 4050.264908]  [<00000000003c1af2>] ksys_write+0x5a/0xb0
>> [ 4050.264916]  [<0000000000b9e270>] system_call+0xdc/0x2d8
>> [ 4050.264925] 4 locks held by sh/35924:
>> [ 4050.264933]  #0: 000000001ef90325 (sb_writers#4){.+.+}, at: vfs_write+0x9e/0x198
>> [ 4050.264948]  #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cc/0x1f8
>> [ 4050.264963]  #2: 0000000034831ab8 (kn->count#297){++++}, at: kernfs_remove_self+0x12e/0x150
>> [ 4050.264979]  #3: 00000000e152484f (&dev->mutex){....}, at: device_release_driver_internal+0x5c/0x268
>> [ 4050.264993] Last Breaking-Event-Address:
>> [ 4050.265002]  [<000000000017bbaa>] __might_sleep+0x72/0x90
>> [ 4050.265010] irq event stamp: 7039
>> [ 4050.265020] hardirqs last  enabled at (7047): [<00000000001cee7a>] console_unlock+0x6d2/0x740
>> [ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] console_unlock+0xd6/0x740
>> [ 4050.265040] softirqs last  enabled at (6416): [<0000000000b8fe26>] __udelay+0xb6/0x100
>> [ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] __udelay+0x96/0x100
>> [ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]---
>>
>> Let's fix this as described in the article https://lwn.net/Articles/628628/.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>
>> I have already tested in my environment and the warning goes
>> away for me with this patch. But appreciate further testing
>> and review feedback on the patch.
>>
>> Thanks
>> Farhan
>>
>>
>> ChangeLog
>> ---------
>> v1 -> v2
>>     - Keep the same behavior as before, so the task goes into
>>       TASK_UNINTERRUPTIBLE state after being interrupted once
>>
>> ---
>>
>>   drivers/vfio/vfio.c | 20 +++++++++++++-------
>>   1 file changed, 13 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>> index 6483387..62f9637 100644
>> --- a/drivers/vfio/vfio.c
>> +++ b/drivers/vfio/vfio.c
>> @@ -34,6 +34,7 @@
>>   #include <linux/uaccess.h>
>>   #include <linux/vfio.h>
>>   #include <linux/wait.h>
>> +#include <linux/sched/signal.h>
>>   
>>   #define DRIVER_VERSION	"0.3"
>>   #define DRIVER_AUTHOR	"Alex Williamson <alex.williamson@redhat.com>"
>> @@ -922,12 +923,12 @@ static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
>>    * removed.  Open file descriptors for the device... */
>>   void *vfio_del_group_dev(struct device *dev)
>>   {
>> +	DEFINE_WAIT_FUNC(wait, woken_wake_function);
>>   	struct vfio_device *device = dev_get_drvdata(dev);
>>   	struct vfio_group *group = device->group;
>>   	void *device_data = device->device_data;
>>   	struct vfio_unbound_dev *unbound;
>>   	unsigned int i = 0;
>> -	long ret;
>>   	bool interrupted = false;
>>   
>>   	/*
>> @@ -964,6 +965,8 @@ void *vfio_del_group_dev(struct device *dev)
>>   	 * interval with counter to allow the driver to take escalating
>>   	 * measures to release the device if it has the ability to do so.
>>   	 */
>> +	add_wait_queue(&vfio.release_q, &wait);
>> +
>>   	do {
>>   		device = vfio_group_get_device(group, dev);
>>   		if (!device)
>> @@ -974,13 +977,14 @@ void *vfio_del_group_dev(struct device *dev)
>>   
>>   		vfio_device_put(device);
>>   
>> +		if (!vfio_dev_present(group, dev))
>> +			break;
> 
> Hi Farhan,
> 
> Sorry for the delay, this seems to work well, but I think the above
> vfio_dev_present() check is redundant.  The code above this test is:
> 
>                  device = vfio_group_get_device(group, dev);
>                  if (!device)
>                          break;
> 
>                  if (device->ops->request)
>                          device->ops->request(device_data, i++);
> 
>                  vfio_device_put(device);
> 
> And vfio_dev_present() is:
> 
> static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
> {
>          struct vfio_device *device;
> 
>          device = vfio_group_get_device(group, dev);
>          if (!device)
>                  return false;
> 
>          vfio_device_put(device);
>          return true;
> }
> 
> vfio_dev_present() exists entirely to support the wait_event_timeout()
> calls, so I think we can simply drop it and delete the function with
> your conversion to this new method.  If you agree, I can go ahead and
> make the change below on commit.  Thanks,
> 
> Alex
> 
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index fb7cc8f11ce5..82fcf07fa9ea 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -902,19 +902,6 @@ void *vfio_device_data(struct vfio_device *device)
>   }
>   EXPORT_SYMBOL_GPL(vfio_device_data);
>   
> -/* Given a referenced group, check if it contains the device */
> -static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
> -{
> -	struct vfio_device *device;
> -
> -	device = vfio_group_get_device(group, dev);
> -	if (!device)
> -		return false;
> -
> -	vfio_device_put(device);
> -	return true;
> -}
> -
>   /*
>    * Decrement the device reference count and wait for the device to be
>    * removed.  Open file descriptors for the device... */
> @@ -974,9 +961,6 @@ void *vfio_del_group_dev(struct device *dev)
>   
>   		vfio_device_put(device);
>   
> -		if (!vfio_dev_present(group, dev))
> -			break;
> -
>   		if (interrupted) {
>   			wait_woken(&wait, TASK_UNINTERRUPTIBLE, HZ * 10);
>   		} else {
> 
> 

Hi Alex,

You are right and this change would remove redundant code. I like it :) 
and I am fine with the change.

Thanks
Farhan
Alex Williamson April 23, 2019, 5:37 p.m. UTC | #5
On Fri, 19 Apr 2019 08:05:00 -0400
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 04/18/2019 06:17 PM, Alex Williamson wrote:
> > On Wed,  3 Apr 2019 14:22:27 -0400
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >   
> >> vfio_dev_present() which is the condition to
> >> wait_event_interruptible_timeout(), will call vfio_group_get_device
> >> and try to acquire the mutex group->device_lock.
> >>
> >> wait_event_interruptible_timeout() will set the state of the current
> >> task to TASK_INTERRUPTIBLE, before doing the condition check. This
> >> means that we will try to accquire the mutex while already in a
> >> sleeping state. The scheduler warns us by giving the following
> >> warning:
> >>
> >> [ 4050.264464] ------------[ cut here ]------------
> >> [ 4050.264508] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000b33c00e2>] prepare_to_wait_event+0x14a/0x188
> >> [ 4050.264529] WARNING: CPU: 12 PID: 35924 at kernel/sched/core.c:6112 __might_sleep+0x76/0x90
> >> ....
> >>
> >>   4050.264756] Call Trace:
> >> [ 4050.264765] ([<000000000017bbaa>] __might_sleep+0x72/0x90)
> >> [ 4050.264774]  [<0000000000b97edc>] __mutex_lock+0x44/0x8c0
> >> [ 4050.264782]  [<0000000000b9878a>] mutex_lock_nested+0x32/0x40
> >> [ 4050.264793]  [<000003ff800d7abe>] vfio_group_get_device+0x36/0xa8 [vfio]
> >> [ 4050.264803]  [<000003ff800d87c0>] vfio_del_group_dev+0x238/0x378 [vfio]
> >> [ 4050.264813]  [<000003ff8015f67c>] mdev_remove+0x3c/0x68 [mdev]
> >> [ 4050.264825]  [<00000000008e01b0>] device_release_driver_internal+0x168/0x268
> >> [ 4050.264834]  [<00000000008de692>] bus_remove_device+0x162/0x190
> >> [ 4050.264843]  [<00000000008daf42>] device_del+0x1e2/0x368
> >> [ 4050.264851]  [<00000000008db12c>] device_unregister+0x64/0x88
> >> [ 4050.264862]  [<000003ff8015ed84>] mdev_device_remove+0xec/0x130 [mdev]
> >> [ 4050.264872]  [<000003ff8015f074>] remove_store+0x6c/0xa8 [mdev]
> >> [ 4050.264881]  [<000000000046f494>] kernfs_fop_write+0x14c/0x1f8
> >> [ 4050.264890]  [<00000000003c1530>] __vfs_write+0x38/0x1a8
> >> [ 4050.264899]  [<00000000003c187c>] vfs_write+0xb4/0x198
> >> [ 4050.264908]  [<00000000003c1af2>] ksys_write+0x5a/0xb0
> >> [ 4050.264916]  [<0000000000b9e270>] system_call+0xdc/0x2d8
> >> [ 4050.264925] 4 locks held by sh/35924:
> >> [ 4050.264933]  #0: 000000001ef90325 (sb_writers#4){.+.+}, at: vfs_write+0x9e/0x198
> >> [ 4050.264948]  #1: 000000005c1ab0b3 (&of->mutex){+.+.}, at: kernfs_fop_write+0x1cc/0x1f8
> >> [ 4050.264963]  #2: 0000000034831ab8 (kn->count#297){++++}, at: kernfs_remove_self+0x12e/0x150
> >> [ 4050.264979]  #3: 00000000e152484f (&dev->mutex){....}, at: device_release_driver_internal+0x5c/0x268
> >> [ 4050.264993] Last Breaking-Event-Address:
> >> [ 4050.265002]  [<000000000017bbaa>] __might_sleep+0x72/0x90
> >> [ 4050.265010] irq event stamp: 7039
> >> [ 4050.265020] hardirqs last  enabled at (7047): [<00000000001cee7a>] console_unlock+0x6d2/0x740
> >> [ 4050.265029] hardirqs last disabled at (7054): [<00000000001ce87e>] console_unlock+0xd6/0x740
> >> [ 4050.265040] softirqs last  enabled at (6416): [<0000000000b8fe26>] __udelay+0xb6/0x100
> >> [ 4050.265049] softirqs last disabled at (6415): [<0000000000b8fe06>] __udelay+0x96/0x100
> >> [ 4050.265057] ---[ end trace d04a07d39d99a9f9 ]---
> >>
> >> Let's fix this as described in the article https://lwn.net/Articles/628628/.
> >>
> >> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >> ---
> >>
> >> I have already tested in my environment and the warning goes
> >> away for me with this patch. But appreciate further testing
> >> and review feedback on the patch.
> >>
> >> Thanks
> >> Farhan
> >>
> >>
> >> ChangeLog
> >> ---------
> >> v1 -> v2
> >>     - Keep the same behavior as before, so the task goes into
> >>       TASK_UNINTERRUPTIBLE state after being interrupted once
> >>
> >> ---
> >>
> >>   drivers/vfio/vfio.c | 20 +++++++++++++-------
> >>   1 file changed, 13 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> >> index 6483387..62f9637 100644
> >> --- a/drivers/vfio/vfio.c
> >> +++ b/drivers/vfio/vfio.c
> >> @@ -34,6 +34,7 @@
> >>   #include <linux/uaccess.h>
> >>   #include <linux/vfio.h>
> >>   #include <linux/wait.h>
> >> +#include <linux/sched/signal.h>
> >>   
> >>   #define DRIVER_VERSION	"0.3"
> >>   #define DRIVER_AUTHOR	"Alex Williamson <alex.williamson@redhat.com>"
> >> @@ -922,12 +923,12 @@ static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
> >>    * removed.  Open file descriptors for the device... */
> >>   void *vfio_del_group_dev(struct device *dev)
> >>   {
> >> +	DEFINE_WAIT_FUNC(wait, woken_wake_function);
> >>   	struct vfio_device *device = dev_get_drvdata(dev);
> >>   	struct vfio_group *group = device->group;
> >>   	void *device_data = device->device_data;
> >>   	struct vfio_unbound_dev *unbound;
> >>   	unsigned int i = 0;
> >> -	long ret;
> >>   	bool interrupted = false;
> >>   
> >>   	/*
> >> @@ -964,6 +965,8 @@ void *vfio_del_group_dev(struct device *dev)
> >>   	 * interval with counter to allow the driver to take escalating
> >>   	 * measures to release the device if it has the ability to do so.
> >>   	 */
> >> +	add_wait_queue(&vfio.release_q, &wait);
> >> +
> >>   	do {
> >>   		device = vfio_group_get_device(group, dev);
> >>   		if (!device)
> >> @@ -974,13 +977,14 @@ void *vfio_del_group_dev(struct device *dev)
> >>   
> >>   		vfio_device_put(device);
> >>   
> >> +		if (!vfio_dev_present(group, dev))
> >> +			break;  
> > 
> > Hi Farhan,
> > 
> > Sorry for the delay, this seems to work well, but I think the above
> > vfio_dev_present() check is redundant.  The code above this test is:
> > 
> >                  device = vfio_group_get_device(group, dev);
> >                  if (!device)
> >                          break;
> > 
> >                  if (device->ops->request)
> >                          device->ops->request(device_data, i++);
> > 
> >                  vfio_device_put(device);
> > 
> > And vfio_dev_present() is:
> > 
> > static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
> > {
> >          struct vfio_device *device;
> > 
> >          device = vfio_group_get_device(group, dev);
> >          if (!device)
> >                  return false;
> > 
> >          vfio_device_put(device);
> >          return true;
> > }
> > 
> > vfio_dev_present() exists entirely to support the wait_event_timeout()
> > calls, so I think we can simply drop it and delete the function with
> > your conversion to this new method.  If you agree, I can go ahead and
> > make the change below on commit.  Thanks,
> > 
> > Alex
> > 
> > 
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index fb7cc8f11ce5..82fcf07fa9ea 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -902,19 +902,6 @@ void *vfio_device_data(struct vfio_device *device)
> >   }
> >   EXPORT_SYMBOL_GPL(vfio_device_data);
> >   
> > -/* Given a referenced group, check if it contains the device */
> > -static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
> > -{
> > -	struct vfio_device *device;
> > -
> > -	device = vfio_group_get_device(group, dev);
> > -	if (!device)
> > -		return false;
> > -
> > -	vfio_device_put(device);
> > -	return true;
> > -}
> > -
> >   /*
> >    * Decrement the device reference count and wait for the device to be
> >    * removed.  Open file descriptors for the device... */
> > @@ -974,9 +961,6 @@ void *vfio_del_group_dev(struct device *dev)
> >   
> >   		vfio_device_put(device);
> >   
> > -		if (!vfio_dev_present(group, dev))
> > -			break;
> > -
> >   		if (interrupted) {
> >   			wait_woken(&wait, TASK_UNINTERRUPTIBLE, HZ * 10);
> >   		} else {
> > 
> >   
> 
> Hi Alex,
> 
> You are right and this change would remove redundant code. I like it :) 
> and I am fine with the change.

Great, folded into patch and applied to vfio next branch for v5.2.
Thanks!

Alex
diff mbox series

Patch

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 6483387..62f9637 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -34,6 +34,7 @@ 
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
 #include <linux/wait.h>
+#include <linux/sched/signal.h>
 
 #define DRIVER_VERSION	"0.3"
 #define DRIVER_AUTHOR	"Alex Williamson <alex.williamson@redhat.com>"
@@ -922,12 +923,12 @@  static bool vfio_dev_present(struct vfio_group *group, struct device *dev)
  * removed.  Open file descriptors for the device... */
 void *vfio_del_group_dev(struct device *dev)
 {
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	struct vfio_device *device = dev_get_drvdata(dev);
 	struct vfio_group *group = device->group;
 	void *device_data = device->device_data;
 	struct vfio_unbound_dev *unbound;
 	unsigned int i = 0;
-	long ret;
 	bool interrupted = false;
 
 	/*
@@ -964,6 +965,8 @@  void *vfio_del_group_dev(struct device *dev)
 	 * interval with counter to allow the driver to take escalating
 	 * measures to release the device if it has the ability to do so.
 	 */
+	add_wait_queue(&vfio.release_q, &wait);
+
 	do {
 		device = vfio_group_get_device(group, dev);
 		if (!device)
@@ -974,13 +977,14 @@  void *vfio_del_group_dev(struct device *dev)
 
 		vfio_device_put(device);
 
+		if (!vfio_dev_present(group, dev))
+			break;
+
 		if (interrupted) {
-			ret = wait_event_timeout(vfio.release_q,
-					!vfio_dev_present(group, dev), HZ * 10);
+			wait_woken(&wait, TASK_UNINTERRUPTIBLE, HZ * 10);
 		} else {
-			ret = wait_event_interruptible_timeout(vfio.release_q,
-					!vfio_dev_present(group, dev), HZ * 10);
-			if (ret == -ERESTARTSYS) {
+			wait_woken(&wait, TASK_INTERRUPTIBLE, HZ * 10);
+			if (signal_pending(current)) {
 				interrupted = true;
 				dev_warn(dev,
 					 "Device is currently in use, task"
@@ -989,8 +993,10 @@  void *vfio_del_group_dev(struct device *dev)
 					 current->comm, task_pid_nr(current));
 			}
 		}
-	} while (ret <= 0);
 
+	} while (1);
+
+	remove_wait_queue(&vfio.release_q, &wait);
 	/*
 	 * In order to support multiple devices per group, devices can be
 	 * plucked from the group while other devices in the group are still