[V3,6/6] blk-mq: support concurrent queue quiesce/unquiesce

Message ID: 20211009034713.1489183-7-ming.lei@redhat.com
State: New, archived
Series: blk-mq: support concurrent queue quiescing

Commit Message

Ming Lei Oct. 9, 2021, 3:47 a.m. UTC
blk_mq_quiesce_queue() is now widely used, but so far we don't support
concurrent/nested quiesce. The biggest issue is that the queue can be
unquiesced unexpectedly when quiesce/unquiesce are run concurrently from
more than one context.

This patch introduces q->quiesce_depth to deal with concurrent quiesce,
and we only unquiesce the queue when the call is the last/outermost one
of all contexts.
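
The nesting problem and the depth counter described above can be modelled in user space. This is a minimal sketch, not the kernel code: struct model_queue and every function below are hypothetical names, the bool stands in for QUEUE_FLAG_QUIESCED, and the locking that the patch does with q->queue_lock is left out because the interleaving is replayed sequentially.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct request_queue. */
struct model_queue {
	bool quiesced;		/* models QUEUE_FLAG_QUIESCED */
	int quiesce_depth;	/* models q->quiesce_depth */
};

/* Pre-patch behaviour: the flag alone, no notion of nesting. */
static void flag_quiesce(struct model_queue *q) { q->quiesced = true; }
static void flag_unquiesce(struct model_queue *q) { q->quiesced = false; }

/* Patched behaviour; the kernel does this under q->queue_lock. */
static void depth_quiesce(struct model_queue *q)
{
	assert(q->quiesce_depth >= 0);
	if (!q->quiesce_depth++)
		q->quiesced = true;
}

/* Returns true when the caller would go on to re-run the hw queues. */
static bool depth_unquiesce(struct model_queue *q)
{
	if (q->quiesce_depth <= 0)
		return false;	/* unbalanced call; the kernel warns here */
	if (--q->quiesce_depth)
		return false;	/* another context still holds the queue */
	q->quiesced = false;
	return true;
}

/* Contexts A and B both quiesce, then A alone unquiesces. */
static bool still_quiesced_flag_only(void)
{
	struct model_queue q = { false, 0 };

	flag_quiesce(&q);	/* A */
	flag_quiesce(&q);	/* B */
	flag_unquiesce(&q);	/* A is done, B is not */
	return q.quiesced;
}

static bool still_quiesced_with_depth(void)
{
	struct model_queue q = { false, 0 };

	depth_quiesce(&q);	/* A */
	depth_quiesce(&q);	/* B */
	depth_unquiesce(&q);	/* A is done, B is not */
	return q.quiesced;
}
```

In this model still_quiesced_flag_only() reports that the queue is no longer quiesced even though context B has not finished, which is the unexpected unquiesce. still_quiesced_with_depth() keeps the queue quiesced until the second unquiesce.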

Several kernel panics have been reported[1][2][3] when running quiesce
stress tests, and this patch has been verified against these reports.

[1] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m1fc52431fad7f33b1ffc3f12c4450e4238540787
[2] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m10ad90afeb9c8cc318334190a7c24c8b5c5e0722
[3] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         | 21 ++++++++++++++++++---
 include/linux/blkdev.h |  2 ++
 2 files changed, 20 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig Oct. 12, 2021, 10:30 a.m. UTC | #1
On Sat, Oct 09, 2021 at 11:47:13AM +0800, Ming Lei wrote:
> +	spin_lock_irqsave(&q->queue_lock, flags);
> +	if (!q->quiesce_depth++)
> +		blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);

We can get rid of the QUEUE_FLAG_QUIESCED flag now and just look
at ->quiesce_depth directly.

> +	spin_lock_irqsave(&q->queue_lock, flags);
> +	WARN_ON_ONCE(q->quiesce_depth <= 0);
> +	if (q->quiesce_depth > 0 && !--q->quiesce_depth) {

	if (WARN_ON_ONCE(q->quiesce_depth <= 0))
		; /* oops */
	else if (!--q->quiesce_depth)
		run_queue = true;

Otherwise this looks sensible.
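
The two forms of the unquiesce check can be compared side by side in user space. This is only a sketch under stated assumptions: model_warn_on_once() is a hypothetical stand-in for WARN_ON_ONCE() that returns the condition it was given, the per-call-site "warned" state is passed in by the caller, and the flag clearing and locking from the patch are omitted.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for WARN_ON_ONCE(): reports the first time the
 * condition is true and always returns the condition. The real macro keeps
 * one "warned" flag per call site; here the caller passes it in.
 */
static bool model_warn_on_once(bool cond, bool *warned)
{
	if (cond && !*warned) {
		*warned = true;
		fprintf(stderr, "WARNING: unbalanced unquiesce\n");
	}
	return cond;
}

/* Form posted in the patch: warn, then test the depth a second time. */
static bool unquiesce_v3(int *depth, bool *warned)
{
	bool run_queue = false;

	model_warn_on_once(*depth <= 0, warned);
	if (*depth > 0 && !--*depth)
		run_queue = true;
	return run_queue;
}

/* Form suggested in review: branch on the warning's return value. */
static bool unquiesce_suggested(int *depth, bool *warned)
{
	bool run_queue = false;

	if (model_warn_on_once(*depth <= 0, warned))
		; /* unbalanced call: leave the depth alone */
	else if (!--*depth)
		run_queue = true;
	return run_queue;
}
```

Both functions leave a non-positive depth untouched and report run_queue only when the depth drops to zero. The suggested form evaluates the depth condition once instead of twice.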
Ming Lei Oct. 12, 2021, 3:06 p.m. UTC | #2
On Tue, Oct 12, 2021 at 12:30:10PM +0200, Christoph Hellwig wrote:
> On Sat, Oct 09, 2021 at 11:47:13AM +0800, Ming Lei wrote:
> > +	spin_lock_irqsave(&q->queue_lock, flags);
> > +	if (!q->quiesce_depth++)
> > +		blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
> 
> We can get rid of the QUEUE_FLAG_QUIESCED flag now and just look
> at ->quiesce_depth directly.

I'd rather not do that, given that we need to check QUEUE_FLAG_QUIESCED in
the fast path.

> 
> > +	spin_lock_irqsave(&q->queue_lock, flags);
> > +	WARN_ON_ONCE(q->quiesce_depth <= 0);
> > +	if (q->quiesce_depth > 0 && !--q->quiesce_depth) {
> 
> 	if (WARN_ON_ONCE(q->quiesce_depth <= 0))
> 		; /* oops */
> 	else if (!--q->quiesce_depth)
> 		run_queue = true;

OK.


Thanks,
Ming
Christoph Hellwig Oct. 12, 2021, 3:08 p.m. UTC | #3
On Tue, Oct 12, 2021 at 11:06:51PM +0800, Ming Lei wrote:
> > We can get rid of the QUEUE_FLAG_QUIESCED flag now and just look
> > at ->quiesce_depth directly.
> 
> I'd rather not do that, given that we need to check QUEUE_FLAG_QUIESCED in
> the fast path.

Checking an integer instead of checking a bit is actually faster, or at
least the same speed, depending on the architecture / microarchitecture.
Ming Lei Oct. 12, 2021, 3:13 p.m. UTC | #4
On Tue, Oct 12, 2021 at 05:08:27PM +0200, Christoph Hellwig wrote:
> On Tue, Oct 12, 2021 at 11:06:51PM +0800, Ming Lei wrote:
> > > We can get rid of the QUEUE_FLAG_QUIESCED flag now and just look
> > > at ->quiesce_depth directly.
> > 
> > I'd rather not do that, given that we need to check QUEUE_FLAG_QUIESCED in
> > the fast path.
> 
> Checking an integer instead of checking a bit is actually faster, or at
> least the same speed, depending on the architecture / microarchitecture.

->queue_flags is always hot, but quiesce_depth can't be and shouldn't be,
since it is used very rarely.
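
The cache argument in this exchange can be pictured with a toy layout. This is a sketch under loud assumptions: the 64-byte cache line, the bit number, the forced alignment and all model_* names are made up for illustration and do not reproduce the real struct request_queue layout.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CACHE_LINE 64		/* assumed cache line size */
#define MODEL_FLAG_QUIESCED_BIT 3	/* arbitrary bit for this sketch */

/* Hypothetical layout: a hot flags word and a rarely touched counter. */
struct model_queue {
	unsigned long queue_flags;	/* read on every dispatch */
	_Alignas(MODEL_CACHE_LINE) int quiesce_depth; /* quiesce paths only */
};

/* Fast path: one bit test in a word the dispatch code already reads. */
static bool model_queue_quiesced(const struct model_queue *q)
{
	return (q->queue_flags & (1UL << MODEL_FLAG_QUIESCED_BIT)) != 0;
}

/* Slow path: the counter is updated and mirrored into the flag bit. */
static void model_quiesce(struct model_queue *q)
{
	if (!q->quiesce_depth++)
		q->queue_flags |= 1UL << MODEL_FLAG_QUIESCED_BIT;
}

static void model_unquiesce(struct model_queue *q)
{
	if (q->quiesce_depth > 0 && !--q->quiesce_depth)
		q->queue_flags &= ~(1UL << MODEL_FLAG_QUIESCED_BIT);
}

/* True when the two fields fall in different (assumed) cache lines. */
static bool model_fields_in_different_lines(void)
{
	return offsetof(struct model_queue, queue_flags) / MODEL_CACHE_LINE !=
	       offsetof(struct model_queue, quiesce_depth) / MODEL_CACHE_LINE;
}
```

With this layout the dispatch-side check never reads quiesce_depth, so the counter's cache line is touched only by the quiesce and unquiesce calls. Whether the two fields really sit in different cache lines in the kernel depends on the actual structure layout.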

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 21bf4c3f0825..cb58f21c5be9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -209,7 +209,12 @@  EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
  */
 void blk_mq_quiesce_queue_nowait(struct request_queue *q)
 {
-	blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->queue_lock, flags);
+	if (!q->quiesce_depth++)
+		blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
+	spin_unlock_irqrestore(&q->queue_lock, flags);
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
 
@@ -250,10 +255,20 @@  EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
  */
 void blk_mq_unquiesce_queue(struct request_queue *q)
 {
-	blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
+	unsigned long flags;
+	bool run_queue = false;
+
+	spin_lock_irqsave(&q->queue_lock, flags);
+	WARN_ON_ONCE(q->quiesce_depth <= 0);
+	if (q->quiesce_depth > 0 && !--q->quiesce_depth) {
+		blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
+		run_queue = true;
+	}
+	spin_unlock_irqrestore(&q->queue_lock, flags);
 
 	/* dispatch requests which are inserted during quiescing */
-	blk_mq_run_hw_queues(q, true);
+	if (run_queue)
+		blk_mq_run_hw_queues(q, true);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unquiesce_queue);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0e960d74615e..74c60e2d61f9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -315,6 +315,8 @@  struct request_queue {
 	 */
 	struct mutex		mq_freeze_lock;
 
+	int			quiesce_depth;
+
 	struct blk_mq_tag_set	*tag_set;
 	struct list_head	tag_set_list;
 	struct bio_set		bio_split;