
dm-mq and end_clone_request()

Message ID 20160720142727.GA57399@redhat.com (mailing list archive)
State Not Applicable, archived

Commit Message

Mike Snitzer July 20, 2016, 2:27 p.m. UTC
On Wed, Jul 20 2016 at 10:08am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Jul 19 2016 at  6:57pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
> > Hello Mike,
> > 
> > If I run a fio data integrity test against kernel v4.7-rc7 then I
> > see often that fio reports I/O errors if a path is removed despite
> > queue_if_no_path having been set in /etc/multipath.conf. Further
> > analysis showed that this happens because during SCSI device removal
> > a SCSI device enters state SDEV_CANCEL before the block layer queue
> > is marked as "dying". In that state I/O requests submitted to that
> > SCSI device are failed with -EIO. The behavior for
> > end_clone_request() in drivers/md/dm.c for such requests is as
...
> > - With multiqueue support enabled, pass the "error" argument to
> >   dm_complete_request().
> 
> The error arg is passed to dm_complete_request() regardless of queue
> type but it is only immediately used by the blk-mq API (via
> blk_mq_complete_request).
> 
> > Shouldn't end_clone_request() requeue failed requests in both cases
> > instead of passing the I/O error to the submitter only if multiqueue
> > is enabled?
> 
> Pretty sure you'll find it is _not_ blk-mq that is passing the error
> up.  (But if I'm proven wrong that will be welcomed news).
>
> The error passed to dm_complete_request() is always used to set
> tio->error which is later used by dm_done().  DM core handles errors
> later via softirq in dm_done() -- where the error is passed into the
> target_type's rq_end_io hook.
> 
> So in DM multipath you'll see in do_end_io() that we finally act on the
> error we got from the lower layer.  And if the error is -EIO,
> noretry_error() will return true and -EIO will be returned up the IO
> stack.

For some reason I thought -EIO was considered not retryable.  That's
obviously wrong (e.g. noretry_error() doesn't treat -EIO as non-retryable).
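
For reference, here is roughly what noretry_error() looks like in the
v4.7-era drivers/md/dm-mpath.c (quoting from memory, so treat it as an
approximation):

static int noretry_error(int error)
{
	switch (error) {
	case -EOPNOTSUPP:
	case -EREMOTEIO:
	case -EILSEQ:
	case -ENODATA:
	case -ENOSPC:
		return 1;
	}

	/* anything else, including -EIO, could be a path failure */
	return 0;
}

Note that -EIO is absent from the switch.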

> In the end we're relying on SCSI to properly categorize the underlying
> faults as retryable vs not -- via SCSI's differentiated IO errors.
> 
> Unfortunately I'm not seeing anything that DM multipath can do
> differently here.  -EIO is _always_ propagated up.
> 
> It is strange that all the dm-mq testing that has been done didn't ever
> catch this.  The mptest testsuite is a baseline for validating DM
> multipath (and request-based DM core) changes.  But I've also had Red
> Hat's QE hammer dm-mq with heavy IO (in terms of the "dt" utility) on a
> larger NetApp testbed in the face of regular controller faults.
> 
> It must be that this SDEV_CANCEL scenario is a race that is relatively
> unique/rare to your testbed?
> 
> This raises the question: should SCSI be returning something other than
> -EIO for this case?  E.g. an error that is retryable?

So it must be that blk-mq is somehow returning -EIO earlier, based on the
rq->errors value that gets set during blk_mq_complete_request().
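
For reference, blk_mq_complete_request() in v4.7 looks roughly like this
(again from memory, so approximate):

void blk_mq_complete_request(struct request *rq, int error)
{
	struct request_queue *q = rq->q;

	if (unlikely(blk_should_fake_timeout(q)))
		return;
	if (!blk_mark_rq_complete(rq)) {
		/* the error is recorded in the request itself ... */
		rq->errors = error;
		/* ... before the softirq completion handler runs */
		__blk_mq_complete_request(rq);
	}
}

So whatever error is passed in here lands in rq->errors regardless of
what DM core later decides in dm_done().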

Please try this patch (not happy with it since it assumes all
request-based DM targets will handle IO errors -- which is fine for now
since DM multipath is the only one).  Could be you've already tried
this?  Does it fix your problem?


Comments

Bart Van Assche July 20, 2016, 5:37 p.m. UTC | #1
On 07/20/2016 07:27 AM, Mike Snitzer wrote:
> On Wed, Jul 20 2016 at 10:08am -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
> [ ... ]
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index 7a96618..347ff25 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -414,7 +414,7 @@ static void dm_complete_request(struct request *rq, int error)
>  	if (!rq->q->mq_ops)
>  		blk_complete_request(rq);
>  	else
> -		blk_mq_complete_request(rq, error);
> +		blk_mq_complete_request(rq, 0);
>  }
>
>  /*

Hello Mike,

Thanks for the patch. But unfortunately even with this patch applied I
see fio reporting I/O errors when run on top of dm-mq after a
(simulated) cable pull. I think these errors occur because
dm_softirq_done() fails requests if clone == NULL and tio->error != 0.
Adding WARN_ON_ONCE(error && !tio->clone) in dm_complete_request()
produced the following call stack:

Workqueue: kblockd blk_mq_run_work_fn
Call Trace:
  [<ffffffff8132aa27>] dump_stack+0x68/0xa1
  [<ffffffff81061ed6>] __warn+0xc6/0xe0
  [<ffffffff81061fa8>] warn_slowpath_null+0x18/0x20
  [<ffffffffa0253fda>] dm_complete_request+0x8a/0xb0 [dm_mod]
  [<ffffffffa0255200>] map_request+0x70/0x2e0 [dm_mod]
  [<ffffffffa025771d>] dm_mq_queue_rq+0x7d/0x120 [dm_mod]
  [<ffffffff8131195a>] __blk_mq_run_hw_queue+0x1da/0x350
  [<ffffffff813120a0>] blk_mq_run_work_fn+0x10/0x20
  [<ffffffff8107f279>] process_one_work+0x1f9/0x6a0
  [<ffffffff8107f769>] worker_thread+0x49/0x490
  [<ffffffff81085f7a>] kthread+0xea/0x100
  [<ffffffff8162faff>] ret_from_fork+0x1f/0x40
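
For context, the clone == NULL branch of dm_softirq_done() looks roughly
like this (my reading of the current drivers/md/dm-rq.c, so approximate):

static void dm_softirq_done(struct request *rq)
{
	struct dm_rq_target_io *tio = tio_from_request(rq);
	struct request *clone = tio->clone;

	if (!clone) {
		/* never mapped: complete directly with tio->error,
		 * bypassing the multipath retry logic entirely */
		rq_end_stats(tio->md, rq);
		if (!rq->q->mq_ops)
			blk_end_request_all(rq, tio->error);
		else
			blk_mq_end_request(rq, tio->error);
		rq_completed(tio->md, rq_data_dir(rq), false);
		return;
	}
	/* ... mapped requests are finished via dm_done() ... */
}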

Please let me know if you need more information.

Thanks,

Bart.

Mike Snitzer July 20, 2016, 6:33 p.m. UTC | #2
On Wed, Jul 20 2016 at  1:37pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 07/20/2016 07:27 AM, Mike Snitzer wrote:
> >On Wed, Jul 20 2016 at 10:08am -0400,
> >Mike Snitzer <snitzer@redhat.com> wrote:
> >[ ... ]
> >diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> >index 7a96618..347ff25 100644
> >--- a/drivers/md/dm-rq.c
> >+++ b/drivers/md/dm-rq.c
> >@@ -414,7 +414,7 @@ static void dm_complete_request(struct request *rq, int error)
> > 	if (!rq->q->mq_ops)
> > 		blk_complete_request(rq);
> > 	else
> >-		blk_mq_complete_request(rq, error);
> >+		blk_mq_complete_request(rq, 0);
> > }
> >
> > /*
> 
> Hello Mike,
> 
> Thanks for the patch. But unfortunately even with this patch applied
> I see fio reporting I/O errors when run on top of dm-mq after a
> (simulated) cable pull. I think these errors occur because
> dm_softirq_done() fails requests if clone == NULL and tio->error !=
> 0. Adding WARN_ON_ONCE(error && !tio->clone) in
> dm_complete_request() produced the following call stack:
> 
> Workqueue: kblockd blk_mq_run_work_fn
> Call Trace:
>  [<ffffffff8132aa27>] dump_stack+0x68/0xa1
>  [<ffffffff81061ed6>] __warn+0xc6/0xe0
>  [<ffffffff81061fa8>] warn_slowpath_null+0x18/0x20
>  [<ffffffffa0253fda>] dm_complete_request+0x8a/0xb0 [dm_mod]
>  [<ffffffffa0255200>] map_request+0x70/0x2e0 [dm_mod]
>  [<ffffffffa025771d>] dm_mq_queue_rq+0x7d/0x120 [dm_mod]
>  [<ffffffff8131195a>] __blk_mq_run_hw_queue+0x1da/0x350
>  [<ffffffff813120a0>] blk_mq_run_work_fn+0x10/0x20
>  [<ffffffff8107f279>] process_one_work+0x1f9/0x6a0
>  [<ffffffff8107f769>] worker_thread+0x49/0x490
>  [<ffffffff81085f7a>] kthread+0xea/0x100
>  [<ffffffff8162faff>] ret_from_fork+0x1f/0x40
> 
> Please let me know if you need more information.

Cable pull testing isn't new.  I'd have thought it was covered (albeit
simulated) via mptest.  Does mptest pass for you?
https://github.com/snitm/mptest

Last I knew your tests were all passing with dm-mq.  Would love to
understand how/when they have regressed.

If clone is NULL then the request wasn't actually issued to (or even
allocated from the tag space of) the underlying blk-mq request_queue
(via clone_and_map_rq's call to __multipath_map).

The original request could've been previously cloned, failed due to the
cable pull, and begun its retry via requeue; but on retry
blk_mq_alloc_request() could've failed in __multipath_map() (maybe the
request_queue was "dying" by then?).  Or __multipath_map() failed for
some other reason.

Would be interesting to know the error returned from map_request()'s
ti->type->clone_and_map_rq().  Really should just be DM_MAPIO_REQUEUE.
But the stack you've provided shows map_request calling
dm_complete_request(), which implies dm_kill_unmapped_request() is being
called due to ti->type->clone_and_map_rq() returning < 0.
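
For reference, the error path in map_request() that leads there is
roughly (current dm-rq.c, from memory):

	r = ti->type->clone_and_map_rq(ti, rq, &tio->info, &clone);
	if (r < 0) {
		/* the target wants to complete the I/O with this error */
		dm_kill_unmapped_request(rq, r);
		return r;
	}

and dm_kill_unmapped_request() ends up in dm_complete_request() with
tio->clone still NULL.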

Could be that __multipath_map() is returning an error even before it
would've otherwise called blk_mq_alloc_request().

You said you are using queue_if_no_path?  It would be interesting to see
where __multipath_map() is returning an error.

Could be that __multipath_map() should respond differently if
blk_mq_alloc_request() fails and the queue is dying -- at a minimum the
path should be failed (see the sketch below).  But before we explore
that we need to understand why clone_and_map_rq() is returning < 0.
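
Something like this in __multipath_map()'s blk-mq branch is what I have
in mind -- an untested sketch, where the blk_queue_dying() check and the
fail_path() call are the speculative part:

	*__clone = blk_mq_alloc_request(bdev_get_queue(bdev),
					rq_data_dir(rq), BLK_MQ_REQ_NOWAIT);
	if (IS_ERR(*__clone)) {
		/* speculative: fail the path if its queue is dying,
		 * rather than endlessly requeueing against it */
		if (blk_queue_dying(bdev_get_queue(bdev)))
			fail_path(pgpath);
		return r;	/* r == DM_MAPIO_REQUEUE */
	}
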
Mike Snitzer July 21, 2016, 8:32 p.m. UTC | #3
On Wed, Jul 20 2016 at  1:37pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 07/20/2016 07:27 AM, Mike Snitzer wrote:
> >On Wed, Jul 20 2016 at 10:08am -0400,
> >Mike Snitzer <snitzer@redhat.com> wrote:
> >[ ... ]
> >diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> >index 7a96618..347ff25 100644
> >--- a/drivers/md/dm-rq.c
> >+++ b/drivers/md/dm-rq.c
> >@@ -414,7 +414,7 @@ static void dm_complete_request(struct request *rq, int error)
> > 	if (!rq->q->mq_ops)
> > 		blk_complete_request(rq);
> > 	else
> >-		blk_mq_complete_request(rq, error);
> >+		blk_mq_complete_request(rq, 0);
> > }
> >
> > /*
> 
> Hello Mike,
> 
> Thanks for the patch. But unfortunately even with this patch applied
> I see fio reporting I/O errors when run on top of dm-mq after a
> (simulated) cable pull.

BTW, curious, how is it that you're simulating the cable pull?  I'd like
to see if doing the same, via a new test in mptest, exposes this issue
you're hitting.

Mike
Bart Van Assche July 21, 2016, 8:40 p.m. UTC | #4
On 07/21/2016 01:32 PM, Mike Snitzer wrote:
> BTW, curious, how is it that you're simulating the cable pull?  I'd like
> to see if doing the same, via a new test in mptest, exposes this issue
> you're hitting.

Hello Mike,

Writing into the /sys/class/srp_remote_ports/port-*/delete sysfs
attribute causes the SRP initiator driver to call scsi_remove_host(),
which also happens when the dev_loss_tmo timer expires.
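
For example (the port name below is illustrative; use whatever port your
system shows under /sys/class/srp_remote_ports):

  echo 1 > /sys/class/srp_remote_ports/port-1:0/delete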

Bart.
Bart Van Assche July 21, 2016, 8:58 p.m. UTC | #5
On 07/20/2016 11:33 AM, Mike Snitzer wrote:
> Would be interesting to know the error returned from map_request()'s
> ti->type->clone_and_map_rq().  Really should just be DM_MAPIO_REQUEUE.
> But the stack you've provided shows map_request calling
> dm_complete_request(), which implies dm_kill_unmapped_request() is being
> called due to ti->type->clone_and_map_rq() returning < 0.

Hello Mike,

Apparently certain requests fail with -EIO because DM_DEV_SUSPEND ioctls
are being submitted to the same multipath target. As you know,
DM_DEV_SUSPEND changes QUEUE_IF_NO_PATH from 1 to 0. A WARN_ON()
statement that I added in the dm-mpath driver showed me that multipathd
is submitting these DM_DEV_SUSPEND ioctls. In the output of
strace -fp$(pidof multipathd) I found the following:

[pid 13927] ioctl(5, DM_TABLE_STATUS, 0x7fa1000483f0) = 0
[pid 13927] write(1, "mpathbe: failed to setup multipa"..., 35) = 35
[pid 13927] write(1, "dm-0: uev_add_map failed\n", 25) = 25
[pid 13927] write(1, "uevent trigger error\n", 21) = 21
[pid 13927] write(1, "sdh: remove path (uevent)\n", 26) = 26
[pid 13927] ioctl(5, DM_TABLE_LOAD, 0x7fa1000483f0) = 0
[pid 13927] ioctl(5, DM_DEV_SUSPEND, 0x7fa1000483f0) = 0

I'm still analyzing these and other messages.

Bart.
Mike Snitzer July 25, 2016, 5:53 p.m. UTC | #6
On Thu, Jul 21 2016 at  4:58pm -0400,
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> On 07/20/2016 11:33 AM, Mike Snitzer wrote:
> >Would be interesting to know the error returned from map_request()'s
> >ti->type->clone_and_map_rq().  Really should just be DM_MAPIO_REQUEUE.
> >But the stack you've provided shows map_request calling
> >dm_complete_request(), which implies dm_kill_unmapped_request() is being
> >called due to ti->type->clone_and_map_rq() returning < 0.
> 
> Hello Mike,
> 
> Apparently certain requests fail with -EIO because DM_DEV_SUSPEND
> ioctls are being submitted to the same multipath target. As you know
> DM_DEV_SUSPEND changes QUEUE_IF_NO_PATH from 1 to 0. A WARN_ON()
> statement that I added in the dm-mpath driver showed me that
> multipathd is submitting these DM_DEV_SUSPEND ioctls. In the output
> of strace -fp$(pidof multipathd) I found the following:
> 
> [pid 13927] ioctl(5, DM_TABLE_STATUS, 0x7fa1000483f0) = 0
> [pid 13927] write(1, "mpathbe: failed to setup multipa"..., 35) = 35
> [pid 13927] write(1, "dm-0: uev_add_map failed\n", 25) = 25
> [pid 13927] write(1, "uevent trigger error\n", 21) = 21
> [pid 13927] write(1, "sdh: remove path (uevent)\n", 26) = 26
> [pid 13927] ioctl(5, DM_TABLE_LOAD, 0x7fa1000483f0) = 0
> [pid 13927] ioctl(5, DM_DEV_SUSPEND, 0x7fa1000483f0) = 0
> 
> I'm still analyzing these and other messages.

The various ioctls you're seeing are just multipathd responding to the
failures.  Part of reloading a table (with revised path info, etc) is to
suspend and then resume the device that is being updated.

But I'm not actually sure of the historic reasoning for why
queue_if_no_path is disabled (and the active setting saved) on suspend.
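
For context, the suspend/resume hooks in question look roughly like this
in the current drivers/md/dm-mpath.c (from memory, so approximate):

static void multipath_presuspend(struct dm_target *ti)
{
	struct multipath *m = ti->private;

	/* save the current setting, then disable queue_if_no_path */
	queue_if_no_path(m, false, true);
}

static void multipath_resume(struct dm_target *ti)
{
	struct multipath *m = ti->private;

	/* restore whatever setting was saved at presuspend time */
	if (test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags))
		set_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
	else
		clear_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
}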

I'll think about this further but maybe others recall why?
Mike Snitzer July 25, 2016, 9:23 p.m. UTC | #7
On Mon, Jul 25 2016 at  1:53pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Thu, Jul 21 2016 at  4:58pm -0400,
> Bart Van Assche <bart.vanassche@sandisk.com> wrote:
> 
> > On 07/20/2016 11:33 AM, Mike Snitzer wrote:
> > >Would be interesting to know the error returned from map_request()'s
> > >ti->type->clone_and_map_rq().  Really should just be DM_MAPIO_REQUEUE.
> > >But the stack you've provided shows map_request calling
> > >dm_complete_request(), which implies dm_kill_unmapped_request() is being
> > >called due to ti->type->clone_and_map_rq() returning < 0.
> > 
> > Hello Mike,
> > 
> > Apparently certain requests fail with -EIO because DM_DEV_SUSPEND
> > ioctls are being submitted to the same multipath target. As you know
> > DM_DEV_SUSPEND changes QUEUE_IF_NO_PATH from 1 to 0. A WARN_ON()
> > statement that I added in the dm-mpath driver showed me that
> > multipathd is submitting these DM_DEV_SUSPEND ioctls. In the output
> > of strace -fp$(pidof multipathd) I found the following:
> > 
> > [pid 13927] ioctl(5, DM_TABLE_STATUS, 0x7fa1000483f0) = 0
> > [pid 13927] write(1, "mpathbe: failed to setup multipa"..., 35) = 35
> > [pid 13927] write(1, "dm-0: uev_add_map failed\n", 25) = 25
> > [pid 13927] write(1, "uevent trigger error\n", 21) = 21
> > [pid 13927] write(1, "sdh: remove path (uevent)\n", 26) = 26
> > [pid 13927] ioctl(5, DM_TABLE_LOAD, 0x7fa1000483f0) = 0
> > [pid 13927] ioctl(5, DM_DEV_SUSPEND, 0x7fa1000483f0) = 0
> > 
> > I'm still analyzing these and other messages.
> 
> The various ioctls you're seeing are just multipathd responding to the
> failures.  Part of reloading a table (with revised path info, etc) is to
> suspend and then resume the device that is being updated.
> 
> But I'm not actually sure of the historic reasoning for why
> queue_if_no_path is disabled (and the active setting saved) on suspend.
> 
> I'll think about this further but maybe others recall why?

I think it dates back to when we queued IO within the multipath target.
Commit e809917735ebf ("dm mpath: push back requests instead of
queueing") obviously changed how we handle the retry.

But regardless, __must_push_back() should catch the case where
queue_if_no_path is cleared during suspend (by checking whether the
current setting differs from the saved one).
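
For reference, must_push_back() is roughly (quoting from memory, so
approximate):

static int must_push_back(struct multipath *m)
{
	return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
		((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
		  test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
		 dm_noflush_suspending(m->ti)));
}

Note the current-vs-saved comparison only helps while
dm_noflush_suspending() is true.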

So I'd be curious to know if your debugging has enabled you to identify
exactly where in the dm-mpath.c code the -EIO return is being
established.  do_end_io() is the likely candidate -- but again the
__must_push_back() check should prevent it and DM_ENDIO_REQUEUE should
be returned.

Mike
Bart Van Assche July 25, 2016, 10 p.m. UTC | #8
On 07/25/2016 02:23 PM, Mike Snitzer wrote:
> So I'd be curious to know if your debugging has enabled you to identify
> exactly where in the dm-mpath.c code the -EIO return is being
> established.  do_end_io() is the likely candidate -- but again the
> __must_push_back() check should prevent it and DM_ENDIO_REQUEUE should
> be returned.

Hello Mike,

Thanks for looking further into this. The pr_info() statement that I had
added in the following code block in __multipath_map() fired, which told
me that this code block triggered the -EIO return:

	if (!pgpath) {
		if (!must_push_back(m))
			r = -EIO;	/* Failed */
		pr_info("%s(): (a) returning %d\n", __func__, r);
		return r;	/* r starts out as DM_MAPIO_REQUEUE */
	}

From the system log:

kernel: mpath 254:0: queue_if_no_path 1 -> 0
kernel: __multipath_map(): (a) returning -5

The code that I had added in queue_if_no_path() is as follows:

	old = test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags);
	[ ... ]
	pr_info("mpath %s: queue_if_no_path %d -> %d\n",
		dm_device_name(dm_table_get_md(m->ti->table)), old,
		queue_if_no_path);

Bart.

Patch

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index 7a96618..347ff25 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -414,7 +414,7 @@  static void dm_complete_request(struct request *rq, int error)
 	if (!rq->q->mq_ops)
 		blk_complete_request(rq);
 	else
-		blk_mq_complete_request(rq, error);
+		blk_mq_complete_request(rq, 0);
 }
 
 /*