| Message ID | 20191202215150.10250-1-mchristi@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | nbd: fix shutdown and recv work deadlock |
Josef and Jens,

Ignore this patch. It could also deadlock, but in a different way, and it looks like there are other possible issues with races and refcounts. I will send some new patches.

On 12/02/2019 03:51 PM, Mike Christie wrote:
> This fixes a regression added with:
>
> commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
> Author: Mike Christie <mchristi@redhat.com>
> Date:   Sun Aug 4 14:10:06 2019 -0500
>
>     nbd: fix max number of supported devs
>
> where we can deadlock during device shutdown. The problem will occur if
> userspace has done a NBD_CLEAR_SOCK call, then does close() before the
> recv_work work has done its nbd_config_put() call. If recv_work does the
> last call then it will do destroy_workqueue, which will then be stuck
> waiting for the work we are running from.
>
> This fixes the issue by having nbd_start_device_ioctl flush the work
> queue on both the failure and success cases, and by holding a refcount on
> the nbd_device while it is flushing the work queue.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Mike Christie <mchristi@redhat.com>
> ---
>  drivers/block/nbd.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 57532465fb83..f8597d2fb365 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -1293,13 +1293,15 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
>
>  	if (max_part)
>  		bdev->bd_invalidated = 1;
> +
> +	refcount_inc(&nbd->config_refs);
>  	mutex_unlock(&nbd->config_lock);
>  	ret = wait_event_interruptible(config->recv_wq,
>  				       atomic_read(&config->recv_threads) == 0);
> -	if (ret) {
> +	if (ret)
>  		sock_shutdown(nbd);
> -		flush_workqueue(nbd->recv_workq);
> -	}
> +	flush_workqueue(nbd->recv_workq);
> +
>  	mutex_lock(&nbd->config_lock);
>  	nbd_bdev_reset(bdev);
>  	/* user requested, ignore socket errors */
> @@ -1307,6 +1309,7 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
>  	ret = 0;
>  	if (test_bit(NBD_RT_TIMEDOUT, &config->runtime_flags))
>  		ret = -ETIMEDOUT;
> +	nbd_config_put(nbd);
>  	return ret;
>  }
```diff
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 57532465fb83..f8597d2fb365 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1293,13 +1293,15 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
 
 	if (max_part)
 		bdev->bd_invalidated = 1;
+
+	refcount_inc(&nbd->config_refs);
 	mutex_unlock(&nbd->config_lock);
 	ret = wait_event_interruptible(config->recv_wq,
 				       atomic_read(&config->recv_threads) == 0);
-	if (ret) {
+	if (ret)
 		sock_shutdown(nbd);
-		flush_workqueue(nbd->recv_workq);
-	}
+	flush_workqueue(nbd->recv_workq);
+
 	mutex_lock(&nbd->config_lock);
 	nbd_bdev_reset(bdev);
 	/* user requested, ignore socket errors */
@@ -1307,6 +1309,7 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd, struct block_device *b
 	ret = 0;
 	if (test_bit(NBD_RT_TIMEDOUT, &config->runtime_flags))
 		ret = -ETIMEDOUT;
+	nbd_config_put(nbd);
 	return ret;
 }
 
```
This fixes a regression added with:

commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4
Author: Mike Christie <mchristi@redhat.com>
Date:   Sun Aug 4 14:10:06 2019 -0500

    nbd: fix max number of supported devs

where we can deadlock during device shutdown. The problem will occur if
userspace has done a NBD_CLEAR_SOCK call, then does close() before the
recv_work work has done its nbd_config_put() call. If recv_work does the
last call then it will do destroy_workqueue, which will then be stuck
waiting for the work we are running from.

This fixes the issue by having nbd_start_device_ioctl flush the work
queue on both the failure and success cases, and by holding a refcount on
the nbd_device while it is flushing the work queue.

Cc: stable@vger.kernel.org
Signed-off-by: Mike Christie <mchristi@redhat.com>
---
 drivers/block/nbd.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)