
[V3,1/6] blk-mq: grab .q_usage_counter when queuing request from plug code path

Message ID: 20190403102609.18707-2-ming.lei@redhat.com
State: New, archived
Series: blk-mq: fix races related with freeing queue

Commit Message

Ming Lei April 3, 2019, 10:26 a.m. UTC
Just like aio/io_uring, we need to grab two refcounts for queuing one
request: one for submission, and another for completion.

If the request isn't queued from the plug code path, the refcount grabbed
in generic_make_request() covers the submission side. In theory, this
refcount should have been released once the submission (the async queue run)
is done, but blk_freeze_queue() works together with blk_sync_queue() to
cancel async run-queue work in case hctx->run_work was scheduled while the
refcount was held.

However, if a request is staged on the plug list and finally queued from
the plug code path, the submission-side refcount is actually missed. We may
then start to run the queue after the queue has been removed, and a kernel
oops is triggered.

Fix the issue by grabbing .q_usage_counter before calling
blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This is safe
because the queue is guaranteed to be alive before the requests are inserted.
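
The same two-refcount discipline can be sketched in plain user-space C. This
is an analogue for illustration only, with hypothetical names (a bare atomic
counter standing in for the percpu_ref behind .q_usage_counter); it is not
block-layer code:

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct queue {
    atomic_int usage;           /* stands in for q->q_usage_counter */
};

static void queue_get(struct queue *q)
{
    atomic_fetch_add(&q->usage, 1);
}

static void queue_put(struct queue *q)
{
    /* dropping the last reference frees the queue */
    if (atomic_fetch_sub(&q->usage, 1) == 1) {
        printf("queue freed\n");
        free(q);
    }
}

static void submit_one_request(struct queue *q)
{
    queue_get(q);   /* submission reference (what generic_make_request() grabs) */
    queue_get(q);   /* completion reference, held for the life of the request */

    /* ... insert the request and kick the queue here ... */

    queue_put(q);   /* submission side is done */
}

static void complete_one_request(struct queue *q)
{
    queue_put(q);   /* completion side drops its reference */
}

int main(void)
{
    struct queue *q = malloc(sizeof(*q));

    atomic_init(&q->usage, 1);  /* creator's reference */
    submit_one_request(q);
    complete_one_request(q);
    queue_put(q);               /* creator drops its reference; queue freed */
    return 0;
}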

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: jianchao wang <jianchao.w.wang@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Bart Van Assche April 3, 2019, 3:21 p.m. UTC | #1
On Wed, 2019-04-03 at 18:26 +0800, Ming Lei wrote:
> However, if a request is staged on the plug list and finally queued from
> the plug code path, the submission-side refcount is actually missed. We may
> then start to run the queue after the queue has been removed, and a kernel
> oops is triggered.

I don't think that this patch is necessary. blk_mq_get_request() increases
q_usage_counter. In other words, as long as at least one request has been
allocated that has not finished, it is guaranteed that q_usage_counter > 0.
So there is no need for additional protection in blk_mq_flush_plug_list().

Bart.
Ming Lei April 3, 2019, 4:27 p.m. UTC | #2
On Wed, Apr 03, 2019 at 08:21:11AM -0700, Bart Van Assche wrote:
> On Wed, 2019-04-03 at 18:26 +0800, Ming Lei wrote:
> > However, if a request is staged on the plug list and finally queued from
> > the plug code path, the submission-side refcount is actually missed. We may
> > then start to run the queue after the queue has been removed, and a kernel
> > oops is triggered.
> 
> I don't think that this patch is necessary. blk_mq_get_request() increases
> q_usage_counter. In other words, as long as at least one request has been
> allocated that has not finished, it is guaranteed that q_usage_counter > 0.
> So there is no need for additional protection in blk_mq_flush_plug_list().

blk_mq_flush_plug_list():
	blk_mq_sched_insert_requests()
		insert requests to sw queue or scheduler queue 
		blk_mq_run_hw_queue

Because of a concurrent queue run, all of the requests inserted above may be
completed before the above blk_mq_run_hw_queue() is called. The queue can then
be freed during that blk_mq_run_hw_queue() call.
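
To make that window concrete, here is a hedged user-space analogue (pthreads
plus a plain atomic counter, all names hypothetical) of what the patch does:
the flush path pins the queue with its own reference before inserting, so even
if a concurrent completion drops the request's reference immediately, the
subsequent hardware-queue run cannot see a freed queue.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct queue {
    atomic_int usage;           /* models q->q_usage_counter */
};

static void queue_put(struct queue *q)
{
    if (atomic_fetch_sub(&q->usage, 1) == 1) {
        printf("last reference dropped, queue freed\n");
        free(q);
    }
}

static void *completion_thread(void *arg)
{
    /* the completion path drops the request's reference as soon as it can */
    queue_put(arg);
    return NULL;
}

int main(void)
{
    struct queue *q = malloc(sizeof(*q));
    pthread_t t;

    /*
     * Two references: one held by the in-flight request, one taken by the
     * flush path before inserting (the percpu_ref_get() in the patch).
     */
    atomic_init(&q->usage, 2);

    /* "insert the request": completion may run at any time from here on */
    pthread_create(&t, NULL, completion_thread, q);

    /* "blk_mq_run_hw_queue()": safe, the flush path still holds a reference */
    printf("running queue, usage=%d\n", atomic_load(&q->usage));

    pthread_join(t, NULL);
    queue_put(q);   /* the flush path's reference, dropped only after the run */
    return 0;
}

In the kernel, that pin is the percpu_ref_get(&this_q->q_usage_counter) /
percpu_ref_put() pair placed around blk_mq_sched_insert_requests() in the
hunks below.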

Thanks,
Ming
Bart Van Assche April 3, 2019, 9:22 p.m. UTC | #3
On Thu, 2019-04-04 at 00:27 +0800, Ming Lei wrote:
> blk_mq_flush_plug_list():
> 	blk_mq_sched_insert_requests()
> 		insert requests to sw queue or scheduler queue 
> 		blk_mq_run_hw_queue
> 
> Because of a concurrent queue run, all of the requests inserted above may be
> completed before the above blk_mq_run_hw_queue() is called. The queue can then
> be freed during that blk_mq_run_hw_queue() call.

Thanks for the clarification. That makes sense to me.

Bart.
Ming Lei April 4, 2019, 6:57 a.m. UTC | #4
On Wed, Apr 03, 2019 at 02:22:54PM -0700, Bart Van Assche wrote:
> On Thu, 2019-04-04 at 00:27 +0800, Ming Lei wrote:
> > blk_mq_flush_plug_list():
> > 	blk_mq_sched_insert_requests()
> > 		insert requests to sw queue or scheduler queue 
> > 		blk_mq_run_hw_queue
> > 
> > Because of a concurrent queue run, all of the requests inserted above may be
> > completed before the above blk_mq_run_hw_queue() is called. The queue can then
> > be freed during that blk_mq_run_hw_queue() call.
> 
> Thanks for the clarification. That makes sense to me.

I guess it would be helpful for review to add the above description to the
commit log; I will do that in V4.

Thanks,
Ming

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3ff3d7b49969..5b586affee09 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1728,9 +1728,12 @@  void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 		if (rq->mq_hctx != this_hctx || rq->mq_ctx != this_ctx) {
 			if (this_hctx) {
 				trace_block_unplug(this_q, depth, !from_schedule);
+
+				percpu_ref_get(&this_q->q_usage_counter);
 				blk_mq_sched_insert_requests(this_hctx, this_ctx,
 								&rq_list,
 								from_schedule);
+				percpu_ref_put(&this_q->q_usage_counter);
 			}
 
 			this_q = rq->q;
@@ -1749,8 +1752,11 @@  void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
 	 */
 	if (this_hctx) {
 		trace_block_unplug(this_q, depth, !from_schedule);
+
+		percpu_ref_get(&this_q->q_usage_counter);
 		blk_mq_sched_insert_requests(this_hctx, this_ctx, &rq_list,
 						from_schedule);
+		percpu_ref_put(&this_q->q_usage_counter);
 	}
 }