| Message ID | 20191202215150.10250-1-mchristi@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | nbd: fix shutdown and recv work deadlock |
Josef and Jens,

Ignore this patch. It could also deadlock, but in a different way, and it looks like there are other possible issues with races and refcounts. I will send some new patches.

On 12/02/2019 03:51 PM, Mike Christie wrote:
> This fixes a regression added with:
>
> commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
> Author: Mike Christie <mchristi@redhat.com>
> Date:   Sun Aug 4 14:10:06 2019 -0500
>
>     nbd: fix max number of supported devs
>
> where we can deadlock during device shutdown. The problem will occur if
> userspace has done a NBD_CLEAR_SOCK call, then does close() before the
> recv_work work has done its nbd_config_put() call. If recv_work does the
> last call then it will do destroy_workqueue, which will then be stuck
> waiting for the work we are running from.
>
> This fixes the issue by having nbd_start_device_ioctl flush the work
> queue on both the failure and success cases, and by holding a refcount on
> the nbd_device while it is flushing the work queue.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Mike Christie <mchristi@redhat.com>
> ---
>  drivers/block/nbd.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 57532465fb83..f8597d2fb365 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -1293,13 +1293,15 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
>
>  	if (max_part)
>  		bdev->bd_invalidated = 1;
> +
> +	refcount_inc(&nbd->config_refs);
>  	mutex_unlock(&nbd->config_lock);
>  	ret = wait_event_interruptible(config->recv_wq,
>  				       atomic_read(&config->recv_threads) == 0);
> -	if (ret) {
> +	if (ret)
>  		sock_shutdown(nbd);
> -		flush_workqueue(nbd->recv_workq);
> -	}
> +	flush_workqueue(nbd->recv_workq);
> +
>  	mutex_lock(&nbd->config_lock);
>  	nbd_bdev_reset(bdev);
>  	/* user requested, ignore socket errors */
> @@ -1307,6 +1309,7 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
>  	ret = 0;
>  	if (test_bit(NBD_RT_TIMEDOUT, &config->runtime_flags))
>  		ret = -ETIMEDOUT;
> +	nbd_config_put(nbd);
>  	return ret;
>  }
```diff
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 57532465fb83..f8597d2fb365 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1293,13 +1293,15 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
 
 	if (max_part)
 		bdev->bd_invalidated = 1;
+
+	refcount_inc(&nbd->config_refs);
 	mutex_unlock(&nbd->config_lock);
 	ret = wait_event_interruptible(config->recv_wq,
 				       atomic_read(&config->recv_threads) == 0);
-	if (ret) {
+	if (ret)
 		sock_shutdown(nbd);
-		flush_workqueue(nbd->recv_workq);
-	}
+	flush_workqueue(nbd->recv_workq);
+
 	mutex_lock(&nbd->config_lock);
 	nbd_bdev_reset(bdev);
 	/* user requested, ignore socket errors */
@@ -1307,6 +1309,7 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
 	ret = 0;
 	if (test_bit(NBD_RT_TIMEDOUT, &config->runtime_flags))
 		ret = -ETIMEDOUT;
+	nbd_config_put(nbd);
 	return ret;
 }
 
```
This fixes a regression added with:

commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
Author: Mike Christie <mchristi@redhat.com>
Date:   Sun Aug 4 14:10:06 2019 -0500

    nbd: fix max number of supported devs

where we can deadlock during device shutdown. The problem will occur if
userspace has done a NBD_CLEAR_SOCK call, then does close() before the
recv_work work has done its nbd_config_put() call. If recv_work does the
last call then it will do destroy_workqueue, which will then be stuck
waiting for the work we are running from.

This fixes the issue by having nbd_start_device_ioctl flush the work
queue on both the failure and success cases, and by holding a refcount on
the nbd_device while it is flushing the work queue.

Cc: stable@vger.kernel.org
Signed-off-by: Mike Christie <mchristi@redhat.com>
---
 drivers/block/nbd.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)