diff mbox

dm-multipath test scripts

Message ID 56CADA20.7050209@ce.jp.nec.com (mailing list archive)
State Accepted, archived
Delegated to: Mike Snitzer
Headers show

Commit Message

Junichi Nomura Feb. 22, 2016, 9:51 a.m. UTC
On 02/20/16 15:12, Mike Snitzer wrote:
> On Fri, Feb 19 2016 at  2:42pm -0500, Mike Snitzer <snitzer@redhat.com> wrote:
>> Have you been running with blk-mq?
>> Either by setting CONFIG_DM_MQ_DEFAULT or:
>> echo Y > /sys/module/dm_mod/parameters/use_blk_mq
>>
>> I'm seeing test_02_sdev_delete fail with blk-mq enabled.
> 
> I only see failure if I stack dm-mq ontop of old non-mq scsi devices with:
> 
> echo N > /sys/module/scsi_mod/parameters/use_blk_mq
> echo Y > /sys/module/dm_mod/parameters/use_blk_mq

Ah, I didn't test that combination. I can see the failure, too.

> But this makes me think the novelty of having dm-mq support stacking on
> non-blk-mq devices was misplaced.  It is a senseless config.  I'll
> probably remove support for such stacking soon (next week). 

Looking at the failure, I suspect it could be a common issue of dm-mq
regardless of underlying device type.

When requeueing, following calls happen in dm-mq:
  dm_requeue_original_request() {
    ..
    blk_mq_requeue_request(rq);
    blk_mq_kick_requeue_list(rq->q);

then from block workqueue:
  blk_mq_requeue_work() {
    ..
    blk_mq_start_hw_queue(q);

and blk_mq_start_hw_queue() re-starts the queue even if DM has
stopped it for suspending. As a result, dm-mq ends up repeating
submit-error-requeue forever and suspend never completes. Or,
suspend somehow proceeds to clear DMF_NOFLUSH_SUSPENDING and
I/O error may directly be returned to submitter.

Attached patch fixes the problem for DM. But given the code comment,
there should be call sites which depend on 'start-if-stopped' behavior
of blk_mq_requeue_work and we may need other solution.

Comments

Mike Snitzer Feb. 22, 2016, 3:09 p.m. UTC | #1
On Mon, Feb 22 2016 at  4:51am -0500,
Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> On 02/20/16 15:12, Mike Snitzer wrote:
> > On Fri, Feb 19 2016 at  2:42pm -0500, Mike Snitzer <snitzer@redhat.com> wrote:
> >> Have you been running with blk-mq?
> >> Either by setting CONFIG_DM_MQ_DEFAULT or:
> >> echo Y > /sys/module/dm_mod/parameters/use_blk_mq
> >>
> >> I'm seeing test_02_sdev_delete fail with blk-mq enabled.
> > 
> > I only see failure if I stack dm-mq ontop of old non-mq scsi devices with:
> > 
> > echo N > /sys/module/scsi_mod/parameters/use_blk_mq
> > echo Y > /sys/module/dm_mod/parameters/use_blk_mq
> 
> Ah, I didn't test that combination. I can see the failure, too.
> 
> > But this makes me think the novelty of having dm-mq support stacking on
> > non-blk-mq devices was misplaced.  It is a senseless config.  I'll
> > probably remove support for such stacking soon (next week). 
> 
> Looking at the failure, I suspect it could be a common issue of dm-mq
> regardless of underlying device type.

In practice I'm not seeing any issues with dm-mq on scsi-mq.

> When requeueing, following calls happen in dm-mq:
>   dm_requeue_original_request() {
>     ..
>     blk_mq_requeue_request(rq);
>     blk_mq_kick_requeue_list(rq->q);
> 
> then from block workqueue:
>   blk_mq_requeue_work() {
>     ..
>     blk_mq_start_hw_queue(q);
> 
> and blk_mq_start_hw_queue() re-starts the queue even if DM has
> stopped it for suspending. As a result, dm-mq ends up repeating
> submit-error-requeue forever and suspend never completes. Or,
> suspend somehow proceeds to clear DMF_NOFLUSH_SUSPENDING and
> I/O error may directly be returned to submitter.

I should note that I applied this patch for 4.6:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.6&id=7db905b3d4294e5db4c2938fb7d0e5ba4bd798d6

(but it was purely a fallout of code-review, and looking at the nvme's
use of blk_mq_requeue_request, I did't consider it to be a critical fix
or anything)

> Attached patch fixes the problem for DM. But given the code comment,
> there should be call sites which depend on 'start-if-stopped' behavior
> of blk_mq_requeue_work and we may need other solution.

Nice catch, it certainly does seem like the blk-mq requeue code is
undo-ing steps DM took to protect dm-mpath during suspend.  It likely
doesn't bite dm-mq on scsi-mq because in general blk-mq takes the
rq->q->queue_lock much less frequently.  But when stacking blk-mq on
.request_fn queues it causes live-lock you detailed above.

I'm not sure what the right fix is, but it would seem we need
something.  I cannot speak to why blk_mq_start_hw_queues() was used to
begin with (or why it is important for blk-mq to forcibly kicked stopped
queues on requeue).  Jens?

I see commit 8b95741569ea ("blk-mq: use blk_mq_start_hw_queues() when
running requeue work") but I'm still missing why the upper-layer driver
of the blk-mq queue (dm-mq in this case) isn't free to keep the queue
stopped.  This is pretty important for DM's suspend functionality.

> -- 
> Jun'ichi Nomura, NEC Corporation
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 56c0a72..bbfe936 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -481,11 +481,7 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  		blk_mq_insert_request(rq, false, false, false);
>  	}
>  
> -	/*
> -	 * Use the start variant of queue running here, so that running
> -	 * the requeue work will kick stopped queues.
> -	 */
> -	blk_mq_start_hw_queues(q);
> +	blk_mq_run_hw_queues(q, false);
>  }
>  
>  void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
Junichi Nomura Feb. 23, 2016, 1:34 a.m. UTC | #2
On 02/23/16 00:09, Mike Snitzer wrote:
> I should note that I applied this patch for 4.6:
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.6&id=7db905b3d4294e5db4c2938fb7d0e5ba4bd798d6
> 
> (but it was purely a fallout of code-review, and looking at the nvme's
> use of blk_mq_requeue_request, I did't consider it to be a critical fix
> or anything)

The patch above contains following change:

> +static void dm_mq_requeue_request(struct request *rq)
> +{
> +	struct request_queue *q = rq->q;
> +	unsigned long flags;
> +
> +	blk_mq_requeue_request(rq);
> +	spin_lock_irqsave(q->queue_lock, flags);
> +	if (!blk_queue_stopped(q))
> +		blk_mq_kick_requeue_list(q);
> +	spin_unlock_irqrestore(q->queue_lock, flags);
> +}

If you make it conditional to call blk_mq_kick_requeue_list() here,
I think we have to call the function from start_queue(), too,
otherwise requeued requests might stay forever in q->requeue_list.
Mike Snitzer Feb. 23, 2016, 3:43 a.m. UTC | #3
On Mon, Feb 22 2016 at  8:34pm -0500,
Junichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> On 02/23/16 00:09, Mike Snitzer wrote:
> > I should note that I applied this patch for 4.6:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.6&id=7db905b3d4294e5db4c2938fb7d0e5ba4bd798d6
> > 
> > (but it was purely a fallout of code-review, and looking at the nvme's
> > use of blk_mq_requeue_request, I did't consider it to be a critical fix
> > or anything)
> 
> The patch above contains following change:
> 
> > +static void dm_mq_requeue_request(struct request *rq)
> > +{
> > +	struct request_queue *q = rq->q;
> > +	unsigned long flags;
> > +
> > +	blk_mq_requeue_request(rq);
> > +	spin_lock_irqsave(q->queue_lock, flags);
> > +	if (!blk_queue_stopped(q))
> > +		blk_mq_kick_requeue_list(q);
> > +	spin_unlock_irqrestore(q->queue_lock, flags);
> > +}
> 
> If you make it conditional to call blk_mq_kick_requeue_list() here,
> I think we have to call the function from start_queue(), too,
> otherwise requeued requests might stay forever in q->requeue_list.

Yes, you're right.  Fixed up and pushed to rebased linux-dm.git 'dm-4.6'
branch:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.6&id=818c5f3bef750eb5998b468f84391e4d656b97ed

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
diff mbox

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 56c0a72..bbfe936 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -481,11 +481,7 @@  static void blk_mq_requeue_work(struct work_struct *work)
 		blk_mq_insert_request(rq, false, false, false);
 	}
 
-	/*
-	 * Use the start variant of queue running here, so that running
-	 * the requeue work will kick stopped queues.
-	 */
-	blk_mq_start_hw_queues(q);
+	blk_mq_run_hw_queues(q, false);
 }
 
 void blk_mq_add_to_requeue_list(struct request *rq, bool at_head)