diff mbox series

[1/1] vhost: Provide a kernel warning if mutex is held whilst clean-up in progress

Message ID 20220303151929.2505822-1-lee.jones@linaro.org (mailing list archive)
State New, archived
Headers show
Series [1/1] vhost: Provide a kernel warning if mutex is held whilst clean-up in progress | expand

Commit Message

Lee Jones March 3, 2022, 3:19 p.m. UTC
All workers/users should be halted before any clean-up should take place.

Suggested-by:  Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 drivers/vhost/vhost.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Leon Romanovsky March 3, 2022, 7:14 p.m. UTC | #1
On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> All workers/users should be halted before any clean-up should take place.
> 
> Suggested-by:  Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Lee Jones <lee.jones@linaro.org>
> ---
>  drivers/vhost/vhost.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index bbaff6a5e21b8..d935d2506963f 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>  	int i;
>  
>  	for (i = 0; i < dev->nvqs; ++i) {
> +		/* Ideally all workers should be stopped prior to clean-up */
> +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> +
>  		mutex_lock(&dev->vqs[i]->mutex);

I know nothing about vhost, but this construction and patch looks
strange to me.

If all workers were stopped, you won't need mutex_lock(). The mutex_lock
here suggests to me that workers can still run here.

Thanks

>  		if (dev->vqs[i]->error_ctx)
>  			eventfd_ctx_put(dev->vqs[i]->error_ctx);
> -- 
> 2.35.1.574.g5d30c73bfb-goog
>
Lee Jones March 3, 2022, 7:38 p.m. UTC | #2
On Thu, 03 Mar 2022, Leon Romanovsky wrote:

> On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > All workers/users should be halted before any clean-up should take place.
> > 
> > Suggested-by:  Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > ---
> >  drivers/vhost/vhost.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index bbaff6a5e21b8..d935d2506963f 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> >  	int i;
> >  
> >  	for (i = 0; i < dev->nvqs; ++i) {
> > +		/* Ideally all workers should be stopped prior to clean-up */
> > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > +
> >  		mutex_lock(&dev->vqs[i]->mutex);
> 
> I know nothing about vhost, but this construction and patch looks
> strange to me.
> 
> If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> here suggests to me that workers can still run here.

The suggestion for this patch came from the maintainer.

Please see the conversation here:

https://lore.kernel.org/all/20220302082021-mutt-send-email-mst@kernel.org/
Michael S. Tsirkin March 3, 2022, 9:01 p.m. UTC | #3
On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
> On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > All workers/users should be halted before any clean-up should take place.
> > 
> > Suggested-by:  Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > ---
> >  drivers/vhost/vhost.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index bbaff6a5e21b8..d935d2506963f 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> >  	int i;
> >  
> >  	for (i = 0; i < dev->nvqs; ++i) {
> > +		/* Ideally all workers should be stopped prior to clean-up */
> > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > +
> >  		mutex_lock(&dev->vqs[i]->mutex);
> 
> I know nothing about vhost, but this construction and patch looks
> strange to me.
> 
> If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> here suggests to me that workers can still run here.
> 
> Thanks


"Ideally" here is misleading, we need a bigger detailed comment
along the lines of:

/* 
 * By design, no workers can run here. But if there's a bug and the
 * driver did not flush all work properly then they might, and we
 * encountered such bugs in the past.  With no proper flush guest won't
 * work correctly but avoiding host memory corruption in this case
 * sounds like a good idea.
 */

> >  		if (dev->vqs[i]->error_ctx)
> >  			eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > -- 
> > 2.35.1.574.g5d30c73bfb-goog
> >
Leon Romanovsky March 4, 2022, 7:08 a.m. UTC | #4
On Thu, Mar 03, 2022 at 04:01:06PM -0500, Michael S. Tsirkin wrote:
> On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
> > On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > > All workers/users should be halted before any clean-up should take place.
> > > 
> > > Suggested-by:  Michael S. Tsirkin <mst@redhat.com>
> > > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > > ---
> > >  drivers/vhost/vhost.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index bbaff6a5e21b8..d935d2506963f 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >  	int i;
> > >  
> > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > +		/* Ideally all workers should be stopped prior to clean-up */
> > > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > +
> > >  		mutex_lock(&dev->vqs[i]->mutex);
> > 
> > I know nothing about vhost, but this construction and patch looks
> > strange to me.
> > 
> > If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> > here suggests to me that workers can still run here.
> > 
> > Thanks
> 
> 
> "Ideally" here is misleading, we need a bigger detailed comment
> along the lines of:
> 
> /* 
>  * By design, no workers can run here. But if there's a bug and the
>  * driver did not flush all work properly then they might, and we
>  * encountered such bugs in the past.  With no proper flush guest won't
>  * work correctly but avoiding host memory corruption in this case
>  * sounds like a good idea.
>  */

This description looks better, but the check is inherently racy.
Why don't you add a comment and mutex_lock()? The WARN_ON here is
more distraction than actual help.

Thanks

> 
> > >  		if (dev->vqs[i]->error_ctx)
> > >  			eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > -- 
> > > 2.35.1.574.g5d30c73bfb-goog
> > > 
>
Stefano Garzarella March 4, 2022, 7:50 a.m. UTC | #5
On Thu, Mar 03, 2022 at 04:01:06PM -0500, Michael S. Tsirkin wrote:
>On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
>> On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
>> > All workers/users should be halted before any clean-up should take place.
>> >
>> > Suggested-by:  Michael S. Tsirkin <mst@redhat.com>
>> > Signed-off-by: Lee Jones <lee.jones@linaro.org>
>> > ---
>> >  drivers/vhost/vhost.c | 3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> > index bbaff6a5e21b8..d935d2506963f 100644
>> > --- a/drivers/vhost/vhost.c
>> > +++ b/drivers/vhost/vhost.c
>> > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>> >  	int i;
>> >
>> >  	for (i = 0; i < dev->nvqs; ++i) {
>> > +		/* Ideally all workers should be stopped prior to clean-up */
>> > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
>> > +
>> >  		mutex_lock(&dev->vqs[i]->mutex);
>>
>> I know nothing about vhost, but this construction and patch looks
>> strange to me.
>>
>> If all workers were stopped, you won't need mutex_lock(). The mutex_lock
>> here suggests to me that workers can still run here.
>>
>> Thanks
>
>
>"Ideally" here is misleading, we need a bigger detailed comment
>along the lines of:
>
>/*
> * By design, no workers can run here. But if there's a bug and the
> * driver did not flush all work properly then they might, and we
> * encountered such bugs in the past.  With no proper flush guest won't
> * work correctly but avoiding host memory corruption in this case
> * sounds like a good idea.
> */

Can we use vhost_vq_get_backend() to check this situation?

IIUC all the vhost devices clear the backend to stop the workers.
This is not racy (if we do after the mutex_lock) and should cover all 
cases.

Thanks,
Stefano
Lee Jones March 4, 2022, 8:11 a.m. UTC | #6
On Fri, 04 Mar 2022, Leon Romanovsky wrote:

> On Thu, Mar 03, 2022 at 04:01:06PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
> > > On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > > > All workers/users should be halted before any clean-up should take place.
> > > > 
> > > > Suggested-by:  Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > > > ---
> > > >  drivers/vhost/vhost.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index bbaff6a5e21b8..d935d2506963f 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > >  	int i;
> > > >  
> > > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > > +		/* Ideally all workers should be stopped prior to clean-up */
> > > > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > +
> > > >  		mutex_lock(&dev->vqs[i]->mutex);

                HERE ---^

> > > I know nothing about vhost, but this construction and patch looks
> > > strange to me.
> > > 
> > > If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> > > here suggests to me that workers can still run here.
> > > 
> > > Thanks
> > 
> > 
> > "Ideally" here is misleading, we need a bigger detailed comment
> > along the lines of:
> > 
> > /* 
> >  * By design, no workers can run here. But if there's a bug and the
> >  * driver did not flush all work properly then they might, and we
> >  * encountered such bugs in the past.  With no proper flush guest won't
> >  * work correctly but avoiding host memory corruption in this case
> >  * sounds like a good idea.
> >  */
> 
> This description looks better, but the check is inherently racy.
> Why don't you add a comment and mutex_lock()?

We do, look up.  ^

> The WARN_ON here is more distraction than actual help.

The WARN() is just an indication that something else has gone wrong.

Stefano patched one problem in:

  vhost: Protect the virtqueue from being cleared whilst still in use

... but others may crop up and the WARN() is how we'll be informed.
Lee Jones March 4, 2022, 8:12 a.m. UTC | #7
On Fri, 04 Mar 2022, Stefano Garzarella wrote:

> On Thu, Mar 03, 2022 at 04:01:06PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
> > > On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > > > All workers/users should be halted before any clean-up should take place.
> > > >
> > > > Suggested-by:  Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > > > ---
> > > >  drivers/vhost/vhost.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index bbaff6a5e21b8..d935d2506963f 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > >  	int i;
> > > >
> > > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > > +		/* Ideally all workers should be stopped prior to clean-up */
> > > > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > +
> > > >  		mutex_lock(&dev->vqs[i]->mutex);
> > > 
> > > I know nothing about vhost, but this construction and patch looks
> > > strange to me.
> > > 
> > > If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> > > here suggests to me that workers can still run here.
> > > 
> > > Thanks
> > 
> > 
> > "Ideally" here is misleading, we need a bigger detailed comment
> > along the lines of:
> > 
> > /*
> > * By design, no workers can run here. But if there's a bug and the
> > * driver did not flush all work properly then they might, and we
> > * encountered such bugs in the past.  With no proper flush guest won't
> > * work correctly but avoiding host memory corruption in this case
> > * sounds like a good idea.
> > */
> 
> Can we use vhost_vq_get_backend() to check this situation?
> 
> IIUC all the vhost devices clear the backend to stop the workers.
> This is not racy (if we do after the mutex_lock) and should cover all cases.

I can look into this too if you like.
diff mbox series

Patch

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index bbaff6a5e21b8..d935d2506963f 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -693,6 +693,9 @@  void vhost_dev_cleanup(struct vhost_dev *dev)
 	int i;
 
 	for (i = 0; i < dev->nvqs; ++i) {
+		/* Ideally all workers should be stopped prior to clean-up */
+		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
+
 		mutex_lock(&dev->vqs[i]->mutex);
 		if (dev->vqs[i]->error_ctx)
 			eventfd_ctx_put(dev->vqs[i]->error_ctx);