Message ID | 20161107192646.GA11572@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On 11/07/2016 12:26 PM, Mike Snitzer wrote:
> Otherwise users can easily shoot themselves in the foot by creating the
> situation where the top-level stacked device (e.g. DM multipath) has a
> larger max_sectors_kb than the underlying device(s). Which will
> certainly lead to IO errors due to the "over max size limit" check in
> blk_cloned_rq_check_limits().
>
> This is a crude, yet effective, solution that forces the use of system
> software (e.g. udev rules or multipathd) to tweak max_sectors_kb of the
> underlying devices _before_ a layer like DM multipath is layered on top.

Maybe I'm missing something, but the code we have in place splits it
into max sectors for software and hardware. Shouldn't the stacked
devices have max_hw_sectors capped to what the lower levels support? If
that was done, we would not have to worry about a user fiddling with
max_sectors_kb, since it could only be smaller (or equal to) the max
size of the lower level.
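For readers outside the thread: the "over max size limit" check being discussed lives in blk_cloned_rq_check_limits() in block/blk-core.c. Below is a condensed sketch of it as it looked in kernels of that era (~v4.8/v4.9), trimmed for illustration rather than quoted verbatim:

```c
/* Condensed sketch of blk_cloned_rq_check_limits(), ~v4.8/v4.9.
 * A clone built against the top-level (e.g. mpath) queue is checked
 * against the underlying queue's limits; an oversized clone is
 * failed with EIO rather than split. */
static int blk_cloned_rq_check_limits(struct request_queue *q,
				      struct request *rq)
{
	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, req_op(rq))) {
		printk(KERN_ERR "%s: over max size limit.\n", __func__);
		return -EIO;
	}

	/* (The real function also recounts the request's physical
	 * segments and checks them against the lower queue.) */
	return 0;
}
```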
On Mon, Nov 07 2016 at 2:32pm -0500, Jens Axboe <axboe@kernel.dk> wrote:

> On 11/07/2016 12:26 PM, Mike Snitzer wrote:
> > Otherwise users can easily shoot themselves in the foot by creating the
> > situation where the top-level stacked device (e.g. DM multipath) has a
> > larger max_sectors_kb than the underlying device(s). Which will
> > certainly lead to IO errors due to the "over max size limit" check in
> > blk_cloned_rq_check_limits().
> >
> > This is a crude, yet effective, solution that forces the use of system
> > software (e.g. udev rules or multipathd) to tweak max_sectors_kb of the
> > underlying devices _before_ a layer like DM multipath is layered on top.
>
> Maybe I'm missing something, but the code we have in place splits it
> into max sectors for software and hardware. Shouldn't the stacked
> devices have max_hw_sectors capped to what the lower levels support? If
> that was done, we would not have to worry about a user fiddling with
> max_sectors_kb, since it could only be smaller (or equal to) the max
> size of the lower level.

DM multipath just uses blk_stack_limits() to stack limits, which has:

	t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
	t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
	t->max_dev_sectors = min_not_zero(t->max_dev_sectors, b->max_dev_sectors);

But I assume you realize that.. I'm just missing the relation you're
saying exists, or should exist, between max_hw_sectors and max_sectors
(other than the obvious: max_sectors cannot be greater than
max_hw_sectors) as they relate to stacking.

You're suggesting that when the DM multipath device's limits are
stacked up from the underlying devices: cap the mpath's max_hw_sectors
to the underlying devices' max_sectors?
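For reference, min_not_zero() treats a zero limit as "unset", so an unset limit on either side never clamps the stack to zero. From include/linux/kernel.h of the same era:

```c
/* A zero value means "no limit set", so it never wins the min(). */
#define min_not_zero(x, y) ({			\
	typeof(x) __x = (x);			\
	typeof(y) __y = (y);			\
	__x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
```

Note that this stacking happens once, at table-load time: if both paths advertise, say, a max_sectors of 2560 sectors at that point (a hypothetical figure), the mpath queue also gets 2560. Lowering a path's max_sectors_kb afterwards leaves the mpath limit behind, which is exactly the mismatch that trips the check in blk_cloned_rq_check_limits() shown above.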
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike,
Mike> You're suggesting that when the DM multipath device's limits are
Mike> stacked up from the underlying devices: cap the mpath's
Mike> max_hw_sectors to the underlying devices' max_sectors?
That will break SG_IO, firmware upgrades, etc. (things you probably
shouldn't do on the MP device in the first place but which users often
do).
However, doesn't it make more sense to tweak limits on the DM device
instead of the underlying ones? It seems a bit counter-intuitive to me
to change max_sectors_kb on a different device than the one where the
filesystem whose max I/O size you want to change is located.
I guess this brings us back to the desire to be able to restack the
limits at will when something changes somewhere...
On Mon, Nov 07 2016 at 9:46pm -0500, Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>
> Mike,
>
> Mike> You're suggesting that when the DM multipath device's limits are
> Mike> stacked up from the underlying devices: cap the mpath's
> Mike> max_hw_sectors to the underlying devices' max_sectors?
>
> That will break SG_IO, firmware upgrades, etc. (things you probably
> shouldn't do on the MP device in the first place but which users often
> do).
>
> However, doesn't it make more sense to tweak limits on the DM device
> instead of the underlying ones? It seems a bit counter-intuitive to me
> to change max_sectors_kb on a different device than the one where the
> filesystem whose max I/O size you want to change is located.

As you know, the limits are stacked from the bottom-up.

> I guess this brings us back to the desire to be able to restack the
> limits at will when something changes somewhere...

Tough to do that in a race-free way when DM multipath requests are
cloned for submission to the underlying devices.

As the commit header from this thread mentioned, what I've arrived at
is to have multipathd establish a desired max_sectors_kb (if configured
to do so in multipath.conf) on the underlying paths _before_ the
multipath device is created. Yet to check with Ben Marzinski or others,
but it seems like a reasonable feature (and really the cleanest way to
deal with this issue IMHO.. no need for lots of kernel code changes for
something so niche).
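To make the multipathd approach concrete: multipath-tools later gained a max_sectors_kb option (usable in multipath.conf's defaults, devices, and multipaths sections) that multipathd applies to each path device before assembling the map. A hypothetical snippet, with an illustrative value rather than a recommendation:

```
# multipath.conf sketch: have multipathd set max_sectors_kb on the
# underlying paths before the multipath device is created, so the
# stacked limit can never exceed what the paths accept.
defaults {
	max_sectors_kb 1024
}
```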
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: Mike, >> However, doesn't it make more sense to tweak limits on DM device >> instead of the underlying ones? It seems a bit counter-intuitive to >> me to change max_sectors_kb on a different device than the one where >> the filesystem whose max I/O size you want to change is located. Mike> As you know, the limits are stacked from the bottom-up. Yep, but since max_sectors_kb is a performance tunable and not something enforced by the hardware, maybe we should consider treating it differently? Mike> As the commit header from this thread mentioned, what I've arrived Mike> at is to have multipathd establish a desired max_sectors_kb (if Mike> configured to do so in multipath.conf) on the underlying paths Mike> _before_ the multipath device is created. Yet to check with Ben Mike> Marzinski or others but it seems like a reasonable feature (and Mike> really the cleaniset way to deal with this issue IMHO.. no need Mike> for lots of kernel code changes for something so niche). That's fine. Another option would be to use the max_dev_sectors queue limit to contain the minimum max_sectors from below. It was really only envisioned as a LLD limit but it may be useful in this case. queue_max_sectors_store() already enforces it.
```diff
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9cc8d7c..934f326 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -199,8 +199,12 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 	unsigned long max_sectors_kb,
 		max_hw_sectors_kb = queue_max_hw_sectors(q) >> 1,
 			page_kb = 1 << (PAGE_SHIFT - 10);
-	ssize_t ret = queue_var_store(&max_sectors_kb, page, count);
+	ssize_t ret;
+
+	if (blk_queue_stackable(q))
+		return -EINVAL;
 
+	ret = queue_var_store(&max_sectors_kb, page, count);
 	if (ret < 0)
 		return ret;
 
```
Otherwise users can easily shoot themselves in the foot by creating the
situation where the top-level stacked device (e.g. DM multipath) has a
larger max_sectors_kb than the underlying device(s). Which will
certainly lead to IO errors due to the "over max size limit" check in
blk_cloned_rq_check_limits().

This is a crude, yet effective, solution that forces the use of system
software (e.g. udev rules or multipathd) to tweak max_sectors_kb of the
underlying devices _before_ a layer like DM multipath is layered on top.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-sysfs.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)