[2/4] block: Fix a race between request queue freezing and running queues

Message ID 20190401212014.192753-3-bvanassche@acm.org (mailing list archive)
State New, archived
Series Fix a race between request queue cleanup and running queues

Commit Message

Bart Van Assche April 1, 2019, 9:20 p.m. UTC
Any request queue data structure may change while a queue is frozen.
Hence make sure that blk_mq_run_hw_queues() does not access any hw
queue while a request queue is frozen.

After blk_cleanup_queue() has marked a queue as dead it is no longer
safe to access the hardware queue data structures. This patch avoids
that blk_mq_run_hw_queues() crashes when called during or after
blk_cleanup_queue() has freed the hardware queues. This patch is a
variant of a patch posted by Hannes Reinecke ("[PATCH] block: don't
call blk_mq_run_hw_queues() for dead or dying queues "). This patch
is similar in nature to commit c246e80d8673 ("block: Avoid that
request_fn is invoked on a dead queue"; v3.8). An example of a crash
that is fixed by this patch:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8135a10b>] sbitmap_any_bit_set+0xb/0x30
Call Trace:
 [<ffffffff81303a88>] blk_mq_run_hw_queues+0x48/0x90
 [<ffffffff813053cc>] blk_mq_requeue_work+0x10c/0x120
 [<ffffffff81098cb4>] process_one_work+0x154/0x410
 [<ffffffff81099896>] worker_thread+0x116/0x4a0
 [<ffffffff8109edb9>] kthread+0xc9/0xe0
 [<ffffffff81619b05>] ret_from_fork+0x55/0x80

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: <stable@vger.kernel.org>
Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17.
Reported-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Ming Lei April 2, 2019, 12:53 a.m. UTC | #1
On Mon, Apr 01, 2019 at 02:20:12PM -0700, Bart Van Assche wrote:
> Any request queue data structure may change while a queue is frozen.
> Hence make sure that blk_mq_run_hw_queues() does not access any hw
> queue while a request queue is frozen.
> 
> After blk_cleanup_queue() has marked a queue as dead it is no longer
> safe to access the hardware queue data structures. This patch avoids
> that blk_mq_run_hw_queues() crashes when called during or after
> blk_cleanup_queue() has freed the hardware queues. This patch is a
> variant of a patch posted by Hannes Reinecke ("[PATCH] block: don't
> call blk_mq_run_hw_queues() for dead or dying queues "). This patch
> is similar in nature to commit c246e80d8673 ("block: Avoid that
> request_fn is invoked on a dead queue"; v3.8). An example of a crash
> that is fixed by this patch:
> 
> BUG: unable to handle kernel NULL pointer dereference at (null)
> IP: [<ffffffff8135a10b>] sbitmap_any_bit_set+0xb/0x30
> Call Trace:
>  [<ffffffff81303a88>] blk_mq_run_hw_queues+0x48/0x90
>  [<ffffffff813053cc>] blk_mq_requeue_work+0x10c/0x120
>  [<ffffffff81098cb4>] process_one_work+0x154/0x410
>  [<ffffffff81099896>] worker_thread+0x116/0x4a0
>  [<ffffffff8109edb9>] kthread+0xc9/0xe0
>  [<ffffffff81619b05>] ret_from_fork+0x55/0x80
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: James Smart <james.smart@broadcom.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: <stable@vger.kernel.org>
> Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17.
> Reported-by: James Smart <james.smart@broadcom.com>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3ff3d7b49969..652d0c6d5945 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1499,12 +1499,20 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
>  	struct blk_mq_hw_ctx *hctx;
>  	int i;
>  
> +	/*
> +	 * Do not run any hardware queues if the queue is frozen or if a
> +	 * concurrent blk_cleanup_queue() call is removing any data
> +	 * structures used by this function.
> +	 */
> +	if (!percpu_ref_tryget(&q->q_usage_counter))
> +		return;
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		if (blk_mq_hctx_stopped(hctx))
>  			continue;
>  
>  		blk_mq_run_hw_queue(hctx, async);
>  	}
> +	percpu_ref_put(&q->q_usage_counter);
>  }
>  EXPORT_SYMBOL(blk_mq_run_hw_queues);

I don't see it is necessary to add percpu_ref_tryget()/percpu_ref_put()
in the fast path if we simply release all hctx resource in hctx's
release handler by the following patch:

https://lore.kernel.org/linux-block/20190401044247.29881-2-ming.lei@redhat.com/T/#u

Even we can kill the percpu_ref_tryget_live()/percpu_ref_put() in
scsi_end_request().

Thanks,
Ming
Bart Van Assche April 2, 2019, 3:44 p.m. UTC | #2
On Tue, 2019-04-02 at 08:53 +0800, Ming Lei wrote:
> On Mon, Apr 01, 2019 at 02:20:12PM -0700, Bart Van Assche wrote:
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 3ff3d7b49969..652d0c6d5945 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1499,12 +1499,20 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
> >  	struct blk_mq_hw_ctx *hctx;
> >  	int i;
> >  
> > +	/*
> > +	 * Do not run any hardware queues if the queue is frozen or if a
> > +	 * concurrent blk_cleanup_queue() call is removing any data
> > +	 * structures used by this function.
> > +	 */
> > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > +		return;
> >  	queue_for_each_hw_ctx(q, hctx, i) {
> >  		if (blk_mq_hctx_stopped(hctx))
> >  			continue;
> >  
> >  		blk_mq_run_hw_queue(hctx, async);
> >  	}
> > +	percpu_ref_put(&q->q_usage_counter);
> >  }
> >  EXPORT_SYMBOL(blk_mq_run_hw_queues);
> 
> I don't see it is necessary to add percpu_ref_tryget()/percpu_ref_put()
> in the fast path if we simply release all hctx resource in hctx's
> release handler by the following patch:
> 
> https://lore.kernel.org/linux-block/20190401044247.29881-2-ming.lei@redhat.com/T/#u
> 
> Even we can kill the percpu_ref_tryget_live()/percpu_ref_put() in
> scsi_end_request().

The above approach has the advantages of being easy to review and to maintain.

Patch "[PATCH V2 1/3] blk-mq: free hw queue's resource in hctx's release handler"
makes the block layer more complicated because it introduces a new state for
hardware queues: block driver cleanup has happened (set->ops->exit_hctx(...)) but
the hardware queues are still in use by the block layer core.

Let's see what other reviewers think.

Bart.
Ming Lei April 3, 2019, 3:31 a.m. UTC | #3
On Tue, Apr 02, 2019 at 08:44:10AM -0700, Bart Van Assche wrote:
> On Tue, 2019-04-02 at 08:53 +0800, Ming Lei wrote:
> > On Mon, Apr 01, 2019 at 02:20:12PM -0700, Bart Van Assche wrote:
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 3ff3d7b49969..652d0c6d5945 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -1499,12 +1499,20 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
> > >  	struct blk_mq_hw_ctx *hctx;
> > >  	int i;
> > >  
> > > +	/*
> > > +	 * Do not run any hardware queues if the queue is frozen or if a
> > > +	 * concurrent blk_cleanup_queue() call is removing any data
> > > +	 * structures used by this function.
> > > +	 */
> > > +	if (!percpu_ref_tryget(&q->q_usage_counter))
> > > +		return;
> > >  	queue_for_each_hw_ctx(q, hctx, i) {
> > >  		if (blk_mq_hctx_stopped(hctx))
> > >  			continue;
> > >  
> > >  		blk_mq_run_hw_queue(hctx, async);
> > >  	}
> > > +	percpu_ref_put(&q->q_usage_counter);
> > >  }
> > >  EXPORT_SYMBOL(blk_mq_run_hw_queues);
> > 
> > I don't see it is necessary to add percpu_ref_tryget()/percpu_ref_put()
> > in the fast path if we simply release all hctx resource in hctx's
> > release handler by the following patch:
> > 
> > https://lore.kernel.org/linux-block/20190401044247.29881-2-ming.lei@redhat.com/T/#u
> > 
> > Even we can kill the percpu_ref_tryget_live()/percpu_ref_put() in
> > scsi_end_request().
> 
> The above approach has the advantages of being easy to review and to maintain.
> 
> Patch "[PATCH V2 1/3] blk-mq: free hw queue's resource in hctx's release handler"
> makes the block layer more complicated because it introduces a new state for
> hardware queues: block driver cleanup has happened (set->ops->exit_hctx(...)) but

We are done with driver after blk_freeze_queue() and blk_sync_queue(),
then call .exit_hctx() to say good bye with driver, I don't see it
causes any issue.

> the hardware queues are still in use by the block layer core.

Block layer has the correct in-memory state to work well, and no
driver activity is involved too.

Thanks,
Ming
Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3ff3d7b49969..652d0c6d5945 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1499,12 +1499,20 @@  void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 	struct blk_mq_hw_ctx *hctx;
 	int i;
 
+	/*
+	 * Do not run any hardware queues if the queue is frozen or if a
+	 * concurrent blk_cleanup_queue() call is removing any data
+	 * structures used by this function.
+	 */
+	if (!percpu_ref_tryget(&q->q_usage_counter))
+		return;
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (blk_mq_hctx_stopped(hctx))
 			continue;
 
 		blk_mq_run_hw_queue(hctx, async);
 	}
+	percpu_ref_put(&q->q_usage_counter);
 }
 EXPORT_SYMBOL(blk_mq_run_hw_queues);
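
The guard added by the hunk above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not kernel code: the model_* names are hypothetical, and a plain C11 atomic counter stands in for q->q_usage_counter. It assumes the usual percpu_ref behavior that a plain tryget only fails once the count has dropped to zero, which in this model corresponds to a fully frozen or cleaned-up queue.

```c
/* Userspace model of the tryget/put guard; all model_* names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for q->q_usage_counter: a count of zero means "frozen". */
struct model_ref {
	atomic_long count;
};

/* Take a reference unless the count has already reached zero. */
static bool model_ref_tryget(struct model_ref *ref)
{
	long old = atomic_load(&ref->count);

	while (old > 0) {
		/* On failure 'old' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; dropping the initial one models freezing the queue. */
static void model_ref_put(struct model_ref *ref)
{
	long old = atomic_fetch_sub(&ref->count, 1);

	assert(old > 0);
	(void)old;
}

struct model_queue {
	struct model_ref usage;
	int *hw_queues;		/* may be NULL once cleanup has freed them */
	size_t nr_hw_queues;
};

/*
 * Model of the patched function: touch the hardware queue array only
 * while holding a usage reference. Returns the number of queues "run",
 * or 0 if the guard refused because the count was already zero.
 */
static size_t model_run_hw_queues(struct model_queue *q)
{
	size_t i, ran = 0;

	if (!model_ref_tryget(&q->usage))
		return 0;
	for (i = 0; i < q->nr_hw_queues; i++) {
		q->hw_queues[i]++;	/* stands in for blk_mq_run_hw_queue() */
		ran++;
	}
	model_ref_put(&q->usage);
	return ran;
}
```

In this model, a caller that initializes the count to 1 sees model_run_hw_queues() visit every slot. After the initial reference is dropped and hw_queues is set to NULL, the function returns 0 without dereferencing the pointer, which is the failure mode shown in the crash trace of the commit message.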