| Message ID | 20220303151929.2505822-1-lee.jones@linaro.org (mailing list archive) |
|---|---|
| State | Not Applicable |
| Delegated to | Netdev Maintainers |
| Series | [1/1] vhost: Provide a kernel warning if mutex is held whilst clean-up in progress |

| Context | Check | Description |
|---|---|---|
| netdev/tree_selection | success | Not a local patch |
On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> All workers/users should be halted before any clean-up should take place.
>
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Lee Jones <lee.jones@linaro.org>
> ---
>  drivers/vhost/vhost.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index bbaff6a5e21b8..d935d2506963f 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>  	int i;
>
>  	for (i = 0; i < dev->nvqs; ++i) {
> +		/* Ideally all workers should be stopped prior to clean-up */
> +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> +
>  		mutex_lock(&dev->vqs[i]->mutex);

I know nothing about vhost, but this construction and patch looks
strange to me.

If all workers were stopped, you won't need mutex_lock(). The mutex_lock
here suggests to me that workers can still run here.

Thanks

> 		if (dev->vqs[i]->error_ctx)
> 			eventfd_ctx_put(dev->vqs[i]->error_ctx);
> --
> 2.35.1.574.g5d30c73bfb-goog
>
On Thu, 03 Mar 2022, Leon Romanovsky wrote:

> On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > All workers/users should be halted before any clean-up should take place.
> >
> > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > ---
> >  drivers/vhost/vhost.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index bbaff6a5e21b8..d935d2506963f 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> >  	int i;
> >
> >  	for (i = 0; i < dev->nvqs; ++i) {
> > +		/* Ideally all workers should be stopped prior to clean-up */
> > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > +
> >  		mutex_lock(&dev->vqs[i]->mutex);
>
> I know nothing about vhost, but this construction and patch looks
> strange to me.
>
> If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> here suggests to me that workers can still run here.

The suggestion for this patch came from the maintainer.

Please see the conversation here:

https://lore.kernel.org/all/20220302082021-mutt-send-email-mst@kernel.org/
On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
> On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > All workers/users should be halted before any clean-up should take place.
> >
> > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > ---
> >  drivers/vhost/vhost.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > index bbaff6a5e21b8..d935d2506963f 100644
> > --- a/drivers/vhost/vhost.c
> > +++ b/drivers/vhost/vhost.c
> > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> >  	int i;
> >
> >  	for (i = 0; i < dev->nvqs; ++i) {
> > +		/* Ideally all workers should be stopped prior to clean-up */
> > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > +
> >  		mutex_lock(&dev->vqs[i]->mutex);
>
> I know nothing about vhost, but this construction and patch looks
> strange to me.
>
> If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> here suggests to me that workers can still run here.
>
> Thanks

"Ideally" here is misleading, we need a bigger detailed comment
along the lines of:

/*
 * By design, no workers can run here. But if there's a bug and the
 * driver did not flush all work properly then they might, and we
 * encountered such bugs in the past. With no proper flush guest won't
 * work correctly but avoiding host memory corruption in this case
 * sounds like a good idea.
 */

> > 		if (dev->vqs[i]->error_ctx)
> > 			eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > --
> > 2.35.1.574.g5d30c73bfb-goog
> >
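[Editor's note: for reference, a sketch of how the clean-up hunk might read with Michael's fuller comment folded in place of the one-line "Ideally" comment. This is illustrative only and was not posted as a revision of the patch.]

```c
	for (i = 0; i < dev->nvqs; ++i) {
		/*
		 * By design, no workers can run here. But if there's a bug and the
		 * driver did not flush all work properly then they might, and we
		 * encountered such bugs in the past. With no proper flush guest won't
		 * work correctly but avoiding host memory corruption in this case
		 * sounds like a good idea.
		 */
		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));

		mutex_lock(&dev->vqs[i]->mutex);
		if (dev->vqs[i]->error_ctx)
			eventfd_ctx_put(dev->vqs[i]->error_ctx);
		/* ... remainder of the per-virtqueue clean-up ... */
	}
```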
On Thu, Mar 03, 2022 at 04:01:06PM -0500, Michael S. Tsirkin wrote:
> On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
> > On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > > All workers/users should be halted before any clean-up should take place.
> > >
> > > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > > ---
> > >  drivers/vhost/vhost.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > index bbaff6a5e21b8..d935d2506963f 100644
> > > --- a/drivers/vhost/vhost.c
> > > +++ b/drivers/vhost/vhost.c
> > > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > >  	int i;
> > >
> > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > +		/* Ideally all workers should be stopped prior to clean-up */
> > > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > +
> > >  		mutex_lock(&dev->vqs[i]->mutex);
> >
> > I know nothing about vhost, but this construction and patch looks
> > strange to me.
> >
> > If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> > here suggests to me that workers can still run here.
> >
> > Thanks
>
> "Ideally" here is misleading, we need a bigger detailed comment
> along the lines of:
>
> /*
>  * By design, no workers can run here. But if there's a bug and the
>  * driver did not flush all work properly then they might, and we
>  * encountered such bugs in the past. With no proper flush guest won't
>  * work correctly but avoiding host memory corruption in this case
>  * sounds like a good idea.
>  */

This description looks better, but the check is inherently racy.
Why don't you add a comment and mutex_lock()?

The WARN_ON here is more distraction than actual help.

Thanks

> > > 		if (dev->vqs[i]->error_ctx)
> > > 			eventfd_ctx_put(dev->vqs[i]->error_ctx);
> > > --
> > > 2.35.1.574.g5d30c73bfb-goog
> > >
>
On Thu, Mar 03, 2022 at 04:01:06PM -0500, Michael S. Tsirkin wrote:
>On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
>> On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
>> > All workers/users should be halted before any clean-up should take place.
>> >
>> > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
>> > Signed-off-by: Lee Jones <lee.jones@linaro.org>
>> > ---
>> >  drivers/vhost/vhost.c | 3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> > index bbaff6a5e21b8..d935d2506963f 100644
>> > --- a/drivers/vhost/vhost.c
>> > +++ b/drivers/vhost/vhost.c
>> > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>> >  	int i;
>> >
>> >  	for (i = 0; i < dev->nvqs; ++i) {
>> > +		/* Ideally all workers should be stopped prior to clean-up */
>> > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
>> > +
>> >  		mutex_lock(&dev->vqs[i]->mutex);
>>
>> I know nothing about vhost, but this construction and patch looks
>> strange to me.
>>
>> If all workers were stopped, you won't need mutex_lock(). The mutex_lock
>> here suggests to me that workers can still run here.
>>
>> Thanks
>
>
>"Ideally" here is misleading, we need a bigger detailed comment
>along the lines of:
>
>/*
> * By design, no workers can run here. But if there's a bug and the
> * driver did not flush all work properly then they might, and we
> * encountered such bugs in the past. With no proper flush guest won't
> * work correctly but avoiding host memory corruption in this case
> * sounds like a good idea.
> */

Can we use vhost_vq_get_backend() to check this situation?

IIUC all the vhost devices clear the backend to stop the workers.
This is not racy (if we do after the mutex_lock) and should cover all cases.

Thanks,
Stefano
On Fri, 04 Mar 2022, Leon Romanovsky wrote:

> On Thu, Mar 03, 2022 at 04:01:06PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
> > > On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > > > All workers/users should be halted before any clean-up should take place.
> > > >
> > > > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > > > ---
> > > >  drivers/vhost/vhost.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index bbaff6a5e21b8..d935d2506963f 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > >  	int i;
> > > >
> > > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > > +		/* Ideally all workers should be stopped prior to clean-up */
> > > > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > +
> > > >  		mutex_lock(&dev->vqs[i]->mutex);

  HERE ---^

> > > I know nothing about vhost, but this construction and patch looks
> > > strange to me.
> > >
> > > If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> > > here suggests to me that workers can still run here.
> > >
> > > Thanks
> >
> >
> > "Ideally" here is misleading, we need a bigger detailed comment
> > along the lines of:
> >
> > /*
> >  * By design, no workers can run here. But if there's a bug and the
> >  * driver did not flush all work properly then they might, and we
> >  * encountered such bugs in the past. With no proper flush guest won't
> >  * work correctly but avoiding host memory corruption in this case
> >  * sounds like a good idea.
> >  */
>
> This description looks better, but the check is inherently racy.
> Why don't you add a comment and mutex_lock()?

We do, look up. ^

> The WARN_ON here is more distraction than actual help.

The WARN() is just an indication that something else has gone wrong.

Stefano patched one problem in:

  vhost: Protect the virtqueue from being cleared whilst still in use

... but others may crop up and the WARN() is how we'll be informed.
On Fri, 04 Mar 2022, Stefano Garzarella wrote:

> On Thu, Mar 03, 2022 at 04:01:06PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Mar 03, 2022 at 09:14:36PM +0200, Leon Romanovsky wrote:
> > > On Thu, Mar 03, 2022 at 03:19:29PM +0000, Lee Jones wrote:
> > > > All workers/users should be halted before any clean-up should take place.
> > > >
> > > > Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > > > ---
> > > >  drivers/vhost/vhost.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index bbaff6a5e21b8..d935d2506963f 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
> > > >  	int i;
> > > >
> > > >  	for (i = 0; i < dev->nvqs; ++i) {
> > > > +		/* Ideally all workers should be stopped prior to clean-up */
> > > > +		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
> > > > +
> > > >  		mutex_lock(&dev->vqs[i]->mutex);
> > >
> > > I know nothing about vhost, but this construction and patch looks
> > > strange to me.
> > >
> > > If all workers were stopped, you won't need mutex_lock(). The mutex_lock
> > > here suggests to me that workers can still run here.
> > >
> > > Thanks
> >
> >
> > "Ideally" here is misleading, we need a bigger detailed comment
> > along the lines of:
> >
> > /*
> >  * By design, no workers can run here. But if there's a bug and the
> >  * driver did not flush all work properly then they might, and we
> >  * encountered such bugs in the past. With no proper flush guest won't
> >  * work correctly but avoiding host memory corruption in this case
> >  * sounds like a good idea.
> >  */
>
> Can we use vhost_vq_get_backend() to check this situation?
>
> IIUC all the vhost devices clear the backend to stop the workers.
> This is not racy (if we do after the mutex_lock) and should cover all cases.

I can look into this too if you like.
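[Editor's note: a minimal sketch of the alternative Stefano describes, assuming every vhost driver clears the virtqueue backend (vq->private_data, read via vhost_vq_get_backend(), which expects the vq mutex to be held) before tear-down begins. This is an illustration of the idea under those assumptions, not a tested or posted patch.]

```c
	for (i = 0; i < dev->nvqs; ++i) {
		mutex_lock(&dev->vqs[i]->mutex);

		/*
		 * A non-NULL backend here means the workers for this
		 * virtqueue were never stopped/flushed before clean-up.
		 */
		WARN_ON(vhost_vq_get_backend(dev->vqs[i]));

		if (dev->vqs[i]->error_ctx)
			eventfd_ctx_put(dev->vqs[i]->error_ctx);
		/* ... remainder of the per-virtqueue clean-up ... */
		mutex_unlock(&dev->vqs[i]->mutex);
	}
```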
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index bbaff6a5e21b8..d935d2506963f 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -693,6 +693,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 	int i;
 
 	for (i = 0; i < dev->nvqs; ++i) {
+		/* Ideally all workers should be stopped prior to clean-up */
+		WARN_ON(mutex_is_locked(&dev->vqs[i]->mutex));
+
 		mutex_lock(&dev->vqs[i]->mutex);
 		if (dev->vqs[i]->error_ctx)
 			eventfd_ctx_put(dev->vqs[i]->error_ctx);
All workers/users should be halted before any clean-up should take place.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 drivers/vhost/vhost.c | 3 +++
 1 file changed, 3 insertions(+)