Message ID | 20160801175948.GA6685@redhat.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On 08/01/2016 10:59 AM, Mike Snitzer wrote: > This says to me that must_push_back is returning false because > dm_noflush_suspending() is false. When this happens -EIO will escape up > the IO stack. > > And this confirms that must_push_back() calling dm_noflush_suspending() > is quite suspect given queue_if_no_path was configured: we should > _always_ pushback if no paths are available. > > I'll dig deeper on really understanding _why_ must_push_back() is coded > like it is. Hello Mike, Earlier I had reported that I observe this behavior with CONFIG_DM_MQ_DEFAULT=y after the first simulated cable pull. I have been able to reproduce this behavior with CONFIG_DM_MQ_DEFAULT=n but it takes a large number of iterations to trigger this behavior. The output that appears on my setup in the kernel log with a bunch of printk()'s added in the dm-mpath driver for CONFIG_DM_MQ_DEFAULT=n is as follows (mpath 254:0 and /dev/mapper/mpathbe refer to the same multipath device): [ 314.755582] mpath 254:0: queue_if_no_path 0 -> 1 [ 314.770571] executing DM ioctl DEV_SUSPEND on mpathbe [ 314.770622] mpath 254:0: queue_if_no_path 1 -> 0 [ 314.770657] __multipath_map(): (a) returning -5 [ 314.770657] map_request(): clone_and_map_rq() returned -5 [ 314.770658] dm_complete_request: error = -5 Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Aug 01 2016 at 2:55pm -0400, Bart Van Assche <bart.vanassche@sandisk.com> wrote: > On 08/01/2016 10:59 AM, Mike Snitzer wrote: > >This says to me that must_push_back is returning false because > >dm_noflush_suspending() is false. When this happens -EIO will escape up > >the IO stack. > > > >And this confirms that must_push_back() calling dm_noflush_suspending() > >is quite suspect given queue_if_no_path was configured: we should > >_always_ pushback if no paths are available. > > > >I'll dig deeper on really understanding _why_ must_push_back() is coded > >like it is. > > Hello Mike, > > Earlier I had reported that I observe this behavior with > CONFIG_DM_MQ_DEFAULT=y after the first simulated cable pull. I have > been able to reproduce this behavior with CONFIG_DM_MQ_DEFAULT=n but > it takes a large number of iterations to trigger this behavior. The > output that appears on my setup in the kernel log with a bunch of > printk()'s added in the dm-mpath driver for CONFIG_DM_MQ_DEFAULT=n > is as follows (mpath 254:0 and /dev/mapper/mpathbe refer to the same > multipath device): > > [ 314.755582] mpath 254:0: queue_if_no_path 0 -> 1 > [ 314.770571] executing DM ioctl DEV_SUSPEND on mpathbe > [ 314.770622] mpath 254:0: queue_if_no_path 1 -> 0 > [ 314.770657] __multipath_map(): (a) returning -5 > [ 314.770657] map_request(): clone_and_map_rq() returned -5 > [ 314.770658] dm_complete_request: error = -5 OK, that makes no sense at all (nevermind that your trace really isn't useful, my debug patch is more so). The old .request_fn code checks !blk_queue_stopped() in dm_request_fn(). When a dm-mpath device is suspended the queue gets stopped (same goes for dm-mq). So once a request-based DM device is suspended the .map_rq and .clone_and_map_rq hooks shouldn't get called. BUT the difference between .request_fn and dm-mq is that dm-mq doesn't ever check if the queue if stopped. We rely on blk-core to ensure that IO is not submitted (via dm_mq_queue_rq) while a device is suspended (and because the queue is stopped it shouldn't). Now you're saying both cases aren't working.. which is really confusing. Basically the request-based DM core should protect against requests being mapped while a device is suspended... seems to be mounting evidence that isn't the case. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 52baf8a..22baf29 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -433,10 +433,22 @@ failed: */ static int must_push_back(struct multipath *m) { + bool queue_if_no_path = test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags); + bool suspend_active = (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) != + test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)); + bool suspending = (suspend_active && dm_noflush_suspending(m->ti)); + +#if 0 return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) || ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) != test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) && dm_noflush_suspending(m->ti))); +#else + if (!queue_if_no_path || !suspending) + DMERR_LIMIT("%s: queue_if_no_path=%d suspend_active=%d suspending=%d", + __func__, queue_if_no_path, suspend_active, suspending); + return (queue_if_no_path || suspending); +#endif } /* @@ -459,7 +471,7 @@ static int __multipath_map(struct dm_target *ti, struct request *clone, pgpath = choose_pgpath(m, nr_bytes); if (!pgpath) { - if (!must_push_back(m)) + if (WARN_ON_ONCE(!must_push_back(m))) r = -EIO; /* Failed */ return r; } else if (test_bit(MPATHF_QUEUE_IO, &m->flags) || @@ -1347,7 +1359,7 @@ static int do_end_io(struct multipath *m, struct request *clone, if (!atomic_read(&m->nr_valid_paths)) { if (!test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) { - if (!must_push_back(m)) + if (WARN_ON_ONCE(!must_push_back(m))) r = -EIO; } else { if (error == -EBADE)