
[v2] block: disallow changing max_sectors_kb on a request stacking device

Message ID 20161107192646.GA11572@redhat.com (mailing list archive)
State New, archived

Commit Message

Mike Snitzer Nov. 7, 2016, 7:26 p.m. UTC
Otherwise users can easily shoot themselves in the foot by creating a
situation where the top-level stacked device (e.g. DM multipath) has a
larger max_sectors_kb than the underlying device(s), which will
certainly lead to IO errors due to the "over max size limit" check in
blk_cloned_rq_check_limits().

This is a crude, yet effective, solution that forces the use of system
software (e.g. udev rules or multipathd) to tweak max_sectors_kb of the
underlying devices _before_ a layer like DM multipath is layered on top.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 block/blk-sysfs.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Jens Axboe Nov. 7, 2016, 7:32 p.m. UTC | #1
On 11/07/2016 12:26 PM, Mike Snitzer wrote:
> Otherwise users can easily shoot themselves in the foot by creating a
> situation where the top-level stacked device (e.g. DM multipath) has a
> larger max_sectors_kb than the underlying device(s), which will
> certainly lead to IO errors due to the "over max size limit" check in
> blk_cloned_rq_check_limits().
>
> This is a crude, yet effective, solution that forces the use of system
> software (e.g. udev rules or multipathd) to tweak max_sectors_kb of the
> underlying devices _before_ a layer like DM multipath is layered on top.

Maybe I'm missing something, but the code we have in place splits it
into max sectors for software and hardware. Shouldn't the stacked
devices have max_hw_sectors capped to what the lower levels support? If
that was done, we would not have to worry about a user fiddling with
max_sectors_kb, since it could only be smaller than (or equal to) the max
size of the lower level.
Mike Snitzer Nov. 7, 2016, 9:27 p.m. UTC | #2
On Mon, Nov 07 2016 at  2:32pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:

> On 11/07/2016 12:26 PM, Mike Snitzer wrote:
> >Otherwise users can easily shoot themselves in the foot by creating a
> >situation where the top-level stacked device (e.g. DM multipath) has a
> >larger max_sectors_kb than the underlying device(s), which will
> >certainly lead to IO errors due to the "over max size limit" check in
> >blk_cloned_rq_check_limits().
> >
> >This is a crude, yet effective, solution that forces the use of system
> >software (e.g. udev rules or multipathd) to tweak max_sectors_kb of the
> >underlying devices _before_ a layer like DM multipath is layered on top.
> 
> Maybe I'm missing something, but the code we have in place splits it
> into max sectors for software and hardware. Shouldn't the stacked
> devices have max_hw_sectors capped to what the lower levels support? If
> that was done, we would not have to worry about a user fiddling with
> max_sectors_kb, since it could only be smaller than (or equal to) the max
> size of the lower level.

DM multipath just uses blk_stack_limits() to stack limits, which has:

        t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
        t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
        t->max_dev_sectors = min_not_zero(t->max_dev_sectors, b->max_dev_sectors);

But I assume you realize that.. I'm just missing the relation you're
saying exists, or should exist, between max_hw_sectors and max_sectors
(other than the obvious: max_sectors cannot be greater than
max_hw_sectors) as they relate to stacking.

You're suggesting that when the DM multipath device's limits are stacked
up from the underlying devices: cap the mpath's max_hw_sectors to the
underlying devices' max_sectors?
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Martin K. Petersen Nov. 8, 2016, 2:46 a.m. UTC | #3
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike,

Mike> You're suggesting that when the DM multipath device's limits are
Mike> stacked up from the underlying devices: cap the mpath's
Mike> max_hw_sectors to the underlying devices' max_sectors?

That will break SG_IO, firmware upgrades, etc. (things you probably
shouldn't do on the MP device in the first place but which users often
do).

However, doesn't it make more sense to tweak limits on the DM device instead
of the underlying ones? It seems a bit counter-intuitive to me to change
max_sectors_kb on a different device than the one where the filesystem
whose max I/O size you want to change is located.

I guess this brings us back to the desire to be able to restack the
limits at will when something changes somewhere...
Mike Snitzer Nov. 8, 2016, 3:34 a.m. UTC | #4
On Mon, Nov 07 2016 at  9:46pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike,
> 
> Mike> You're suggesting that when the DM multipath device's limits are
> Mike> stacked up from the underlying devices: cap the mpath's
> Mike> max_hw_sectors to the underlying devices' max_sectors?
> 
> That will break SG_IO, firmware upgrades, etc. (things you probably
> shouldn't do on the MP device in the first place but which users often
> do).
> 
> However, doesn't it make more sense to tweak limits on the DM device instead
> of the underlying ones? It seems a bit counter-intuitive to me to change
> max_sectors_kb on a different device than the one where the filesystem
> whose max I/O size you want to change is located.

As you know, the limits are stacked from the bottom-up.

> I guess this brings us back to the desire to be able to restack the
> limits at will when something changes somewhere...

Tough to do that in a race-free way when DM multipath requests are
cloned for submission to the underlying devices.

As the commit header from this thread mentioned, what I've arrived at is
to have multipathd establish a desired max_sectors_kb (if configured to
do so in multipath.conf) on the underlying paths _before_ the multipath
device is created.  I've yet to check with Ben Marzinski or others, but it
seems like a reasonable feature (and really the cleanest way to deal
with this issue IMHO; no need for lots of kernel code changes for
something so niche).
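As a sketch of what that multipathd feature might look like, a
multipath.conf fragment along these lines could let the admin pick the
limit that multipathd applies to each path before assembling the map.
The option name and placement here are hypothetical, taken from the
proposal in this thread rather than from a shipped multipath-tools
release:

```
defaults {
	# hypothetical knob: multipathd would write this value to
	# /sys/block/<path>/queue/max_sectors_kb on every underlying
	# path _before_ creating the multipath device on top of them
	max_sectors_kb 1024
}
```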
Martin K. Petersen Nov. 8, 2016, 9:10 p.m. UTC | #5
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike,

>> However, doesn't it make more sense to tweak limits on DM device
>> instead of the underlying ones? It seems a bit counter-intuitive to
>> me to change max_sectors_kb on a different device than the one where
>> the filesystem whose max I/O size you want to change is located.

Mike> As you know, the limits are stacked from the bottom-up.

Yep, but since max_sectors_kb is a performance tunable and not something
enforced by the hardware, maybe we should consider treating it
differently?

Mike> As the commit header from this thread mentioned, what I've arrived
Mike> at is to have multipathd establish a desired max_sectors_kb (if
Mike> configured to do so in multipath.conf) on the underlying paths
Mike> _before_ the multipath device is created.  I've yet to check with
Mike> Ben Marzinski or others, but it seems like a reasonable feature
Mike> (and really the cleanest way to deal with this issue IMHO; no need
Mike> for lots of kernel code changes for something so niche).

That's fine. Another option would be to use the max_dev_sectors queue
limit to contain the minimum max_sectors from below. It was really only
envisioned as a LLD limit but it may be useful in this case.
queue_max_sectors_store() already enforces it.

Patch

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9cc8d7c..934f326 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -199,8 +199,12 @@  queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 	unsigned long max_sectors_kb,
 		max_hw_sectors_kb = queue_max_hw_sectors(q) >> 1,
 			page_kb = 1 << (PAGE_SHIFT - 10);
-	ssize_t ret = queue_var_store(&max_sectors_kb, page, count);
+	ssize_t ret;
+
+	if (blk_queue_stackable(q))
+		return -EINVAL;
 
+	ret = queue_var_store(&max_sectors_kb, page, count);
 	if (ret < 0)
 		return ret;