block: init flush rq ref count to 1

Message ID 20190307213718.28017-1-josef@toxicpanda.com (mailing list archive)
State New, archived

Commit Message

Josef Bacik March 7, 2019, 9:37 p.m. UTC
We discovered a problem in newer kernels where disconnecting an NBD
device while a flush request was pending would result in a hang.  This
is because the blk-mq timeout handler does

        if (!refcount_inc_not_zero(&rq->ref))
                return true;

to determine if it's ok to run the timeout handler for the request.
A flush_rq never has its ref count set, so the count stays at zero, the
check fails, and we skip the timeout handler for this request, which
then just sits there in limbo forever.
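
The skipped-timeout behaviour described above can be modelled in userspace.
The following is a minimal sketch using C11 atomics, not the kernel's
refcount_t implementation; every model_* name is hypothetical and exists only
for illustration.  It shows that an "increment unless zero" check never
succeeds on a request whose count was left at 0, and does succeed once the
count starts at 1.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a request; not struct request. */
struct model_rq {
	atomic_uint ref;	/* models rq->ref */
	bool timeout_ran;	/* set when the modelled timeout handler runs */
};

/* Initialise a model request with a chosen starting ref count. */
static void model_rq_init(struct model_rq *rq, unsigned int ref)
{
	atomic_init(&rq->ref, ref);
	rq->timeout_ran = false;
}

/* Take a reference only if the count is currently non-zero. */
static bool model_inc_not_zero(atomic_uint *ref)
{
	unsigned int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Same shape as the quoted check: skip the request if no ref can be taken. */
static void model_timeout_check(struct model_rq *rq)
{
	if (!model_inc_not_zero(&rq->ref))
		return;			/* count is 0: request is skipped */

	rq->timeout_ran = true;		/* modelled timeout handling */
	atomic_fetch_sub(&rq->ref, 1);	/* drop the reference taken above */
}
```

Calling model_timeout_check() on a request initialised with
model_rq_init(&rq, 0) leaves timeout_ran false, which corresponds to the
flush request sitting in limbo; initialising it with 1 lets the handler run
and leaves the count back at 1 afterwards.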

Fix this by always setting the refcount of any request going through
blk_rq_init() to 1.  I tested this with an nbd-server that dropped flush
requests to verify that it hung, and then tested with this patch to
verify I got the timeout as expected and the error handling kicked in.
Thanks,

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 block/blk-core.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Jens Axboe March 7, 2019, 9:54 p.m. UTC | #1
On 3/7/19 2:37 PM, Josef Bacik wrote:
> We discovered a problem in newer kernels where disconnecting an NBD
> device while a flush request was pending would result in a hang.  This
> is because the blk-mq timeout handler does
> 
>         if (!refcount_inc_not_zero(&rq->ref))
>                 return true;
> 
> to determine if it's ok to run the timeout handler for the request.
> Flush_rq's don't have a ref count set, so we'd skip running the timeout
> handler for this request and it would just sit there in limbo forever.
> 
> Fix this by always setting the refcount of any request going through
> blk_rq_init() to 1.  I tested this with an nbd-server that dropped flush
> requests to verify that it hung, and then tested with this patch to
> verify I got the timeout as expected and the error handling kicked in.
> Thanks,

Looks good to me, thanks Josef.

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 6b78ec56a4f2..6107b27c14fb 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -116,6 +116,7 @@  void blk_rq_init(struct request_queue *q, struct request *rq)
 	rq->internal_tag = -1;
 	rq->start_time_ns = ktime_get_ns();
 	rq->part = NULL;
+	refcount_set(&rq->ref, 1);
 }
 EXPORT_SYMBOL(blk_rq_init);