Message ID | 20200528081003.238804-1-linus.walleij@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Series | block: Flag elevators suitable for single queue |
On 28/05/2020 10:12, Linus Walleij wrote:
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index b490f47fd553..324047add271 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -794,7 +794,8 @@ static struct elevator_type mq_deadline = {
>  	.elevator_attrs = deadline_attrs,
>  	.elevator_name = "mq-deadline",
>  	.elevator_alias = "deadline",
> -	.elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE,
> +	.elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE |
> +		ELEVATOR_F_SINGLE_HW_QUEUE,

That indentation looks a bit odd to me but for the general concept

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
On Thu, May 28, 2020 at 10:26 AM Johannes Thumshirn <Johannes.Thumshirn@wdc.com> wrote:
> On 28/05/2020 10:12, Linus Walleij wrote:
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index b490f47fd553..324047add271 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -794,7 +794,8 @@ static struct elevator_type mq_deadline = {
> >  	.elevator_attrs = deadline_attrs,
> >  	.elevator_name = "mq-deadline",
> >  	.elevator_alias = "deadline",
> > -	.elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE,
> > +	.elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE |
> > +		ELEVATOR_F_SINGLE_HW_QUEUE,
>
> That indentation looks a bit odd to me but for the general concept

Yeah it's what the EMACS default "linux" indentation suggest if I put the
flag on a new line, but I can adjust to whatever the block maintainers
suggest.

> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Thanks man!

Yours,
Linus Walleij
On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> The Kyber block scheduler is not suitable for single hardware
> queue devices, so add a new flag for single hardware queue
> devices and add that to the deadline and BFQ schedulers
> so the Kyber scheduler will not be selected for single queue
> devices.

The above may not be true for some single hw queue high performance HBA(
such as megasas), which can get better performance from none, so it is
reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
issue directly if hw queue isn't busy in case of 'none'"), and the
following link:

https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/

Thanks,
Ming
On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@redhat.com> wrote:
> On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> > The Kyber block scheduler is not suitable for single hardware
> > queue devices, so add a new flag for single hardware queue
> > devices and add that to the deadline and BFQ schedulers
> > so the Kyber scheduler will not be selected for single queue
> > devices.
>
> The above may not be true for some single hw queue high performance HBA(
> such as megasas), which can get better performance from none, so it is
> reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
> issue directly if hw queue isn't busy in case of 'none'"), and the
> following link:
>
> https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/

I see, but isn't the case rather that none is preferred and kyber gives
the same characteristics because it's not standing in the way
as much?

It looks like if we should add a special flag for these devices with
very fast single queues so they can say "I prefer none", do you
agree?

Yours,
Linus Walleij
On Mon, Jun 01, 2020 at 01:36:54PM +0200, Linus Walleij wrote:
> On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@redhat.com> wrote:
> > On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> > > The Kyber block scheduler is not suitable for single hardware
> > > queue devices, so add a new flag for single hardware queue
> > > devices and add that to the deadline and BFQ schedulers
> > > so the Kyber scheduler will not be selected for single queue
> > > devices.
> >
> > The above may not be true for some single hw queue high performance HBA(
> > such as megasas), which can get better performance from none, so it is
> > reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
> > issue directly if hw queue isn't busy in case of 'none'"), and the
> > following link:
> >
> > https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/
>
> I see, but isn't the case rather that none is preferred and kyber gives
> the same characteristics because it's not standing in the way
> as much?

Kyber has its own characteristic, such as fair read & write, better
IO merge. And the decision on scheduler isn't only related with device,
but also with workloads.

>
> It looks like if we should add a special flag for these devices with
> very fast single queues so they can say "I prefer none", do you
> agree?

I am not sure if it is easy to add such flag, because it isn't only
related with HBA, but also with the attached disks.

Thanks,
Ming
On Mon, 1 Jun 2020 at 13:58, Ming Lei <ming.lei@redhat.com> wrote:
>
> On Mon, Jun 01, 2020 at 01:36:54PM +0200, Linus Walleij wrote:
> > On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> > > > The Kyber block scheduler is not suitable for single hardware
> > > > queue devices, so add a new flag for single hardware queue
> > > > devices and add that to the deadline and BFQ schedulers
> > > > so the Kyber scheduler will not be selected for single queue
> > > > devices.
> > >
> > > The above may not be true for some single hw queue high performance HBA(
> > > such as megasas), which can get better performance from none, so it is
> > > reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
> > > issue directly if hw queue isn't busy in case of 'none'"), and the
> > > following link:
> > >
> > > https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/
> >
> > I see, but isn't the case rather that none is preferred and kyber gives
> > the same characteristics because it's not standing in the way
> > as much?
>
> Kyber has its own characteristic, such as fair read & write, better
> IO merge. And the decision on scheduler isn't only related with device,
> but also with workloads.
> >
> > It looks like if we should add a special flag for these devices with
> > very fast single queues so they can say "I prefer none", do you
> > agree?
>
> I am not sure if it is easy to add such flag, because it isn't only
> related with HBA, but also with the attached disks.
>

In general I don't mind the idea of giving hints from lower layer
block devices, about what kind of scheduling algorithm that could make
sense (as long it's on a reasonable granularity).

If I understand your point correctly, what you are saying is that it
isn't easy or even possible for some block devices HWs. However, that
should be fine, as it wouldn't be mandatory to set this kind of flags,
but instead could help where we see it fit, right?

Kind regards
Uffe
On Mon, 2020-06-01 at 14:53 +0200, Ulf Hansson wrote:
> On Mon, 1 Jun 2020 at 13:58, Ming Lei <ming.lei@redhat.com> wrote:
> > On Mon, Jun 01, 2020 at 01:36:54PM +0200, Linus Walleij wrote:
> > > On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > > On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> > > > > The Kyber block scheduler is not suitable for single hardware
> > > > > queue devices, so add a new flag for single hardware queue
> > > > > devices and add that to the deadline and BFQ schedulers
> > > > > so the Kyber scheduler will not be selected for single queue
> > > > > devices.
> > > >
> > > > The above may not be true for some single hw queue high performance HBA(
> > > > such as megasas), which can get better performance from none, so it is
> > > > reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
> > > > issue directly if hw queue isn't busy in case of 'none'"), and the
> > > > following link:
> > > >
> > > > https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/
> > >
> > > I see, but isn't the case rather that none is preferred and kyber gives
> > > the same characteristics because it's not standing in the way
> > > as much?
> >
> > Kyber has its own characteristic, such as fair read & write, better
> > IO merge. And the decision on scheduler isn't only related with device,
> > but also with workloads.
> >
> > > It looks like if we should add a special flag for these devices with
> > > very fast single queues so they can say "I prefer none", do you
> > > agree?
> >
> > I am not sure if it is easy to add such flag, because it isn't only
> > related with HBA, but also with the attached disks.
> >
>
> In general I don't mind the idea of giving hints from lower layer
> block devices, about what kind of scheduling algorithm that could make
> sense (as long it's on a reasonable granularity).
>
> If I understand your point correctly, what you are saying is that it
> isn't easy or even possible for some block devices HWs. However, that
> should be fine, as it wouldn't be mandatory to set this kind of flags,
> but
> instead could help where we see it fit, right?

The elevator features flag was implemented not as a hint, but as hard
requirements for elevators that are needed (mandatory) for a particular
device type for correct operation. By correct operation, I mean "no IO
errors or weird behavior resulting in errors such as timeouts". Until
now, the only hard requirement we have is for zoned block devices which
need mq-deadline to guarantee in-order dispatch of write commands (for
sequential zones writing).

We definitely could add hint flags to better help the block layer
decide on the default optimal elevator for a particular device type,
but as is, the elevator features will completely prevent the use of any
other elevator that does not have the feature set. Those elevators will
not be seen in /sys/block/<dev>/queue/scheduler. This may be a little
too much for hint level rather than hard requirement.

Furthermore, as Ming said, this depends on the HBA too rather than just
the device itself. E.g. the smartpqi driver (Microsemi SAS HBAs)
exposes single hard-disks as well as fast RAID arrays as multi-queue
devices. While kyber may make sense for the latter, it probably does
not make much sense for the former.

In kernel vs udev rules for setting the optimal elevator for a
particular device type should also be considered.

>
> Kind regards
> Uffe
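
[Editorial sketch, not part of the thread] The "hard requirement" behaviour described above can be illustrated with a small user-space model. It assumes the check is a plain bitmask test: an elevator stays selectable only if it advertises every bit the queue requires. The flag values are the ones from the patch; the struct, the table, and every `elv_sketch_*` name are hypothetical stand-ins, not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Flag values as defined in the patch under discussion. */
#define ELEVATOR_F_ZBD_SEQ_WRITE	(1U << 0)
#define ELEVATOR_F_SINGLE_HW_QUEUE	(1U << 1)

/* Hypothetical stand-in for struct elevator_type: name and features only. */
struct elv_sketch {
	const char *name;
	unsigned int features;
};

/* Feature sets as they would look with the patch applied (assumption). */
static const struct elv_sketch elv_sketch_table[] = {
	{ "mq-deadline", ELEVATOR_F_ZBD_SEQ_WRITE | ELEVATOR_F_SINGLE_HW_QUEUE },
	{ "bfq",         ELEVATOR_F_SINGLE_HW_QUEUE },
	{ "kyber",       0 },
};

/* An elevator qualifies only if it provides every required feature bit. */
static bool elv_sketch_supports(unsigned int elv_features, unsigned int required)
{
	return (required & elv_features) == required;
}

/* Print the elevators that remain selectable and return how many there are. */
static size_t elv_sketch_list(const struct elv_sketch *tbl, size_t n,
			      unsigned int required)
{
	size_t shown = 0;

	for (size_t i = 0; i < n; i++) {
		if (elv_sketch_supports(tbl[i].features, required)) {
			printf("%s\n", tbl[i].name);
			shown++;
		}
	}
	return shown;
}
```

Under these assumptions, requiring ELEVATOR_F_SINGLE_HW_QUEUE on every single-queue device filters kyber out entirely instead of merely lowering its priority, which is the objection raised in the message above.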
On 6/1/20 5:37 PM, Damien Le Moal wrote:
> On Mon, 2020-06-01 at 14:53 +0200, Ulf Hansson wrote:
>> On Mon, 1 Jun 2020 at 13:58, Ming Lei <ming.lei@redhat.com> wrote:
>>> On Mon, Jun 01, 2020 at 01:36:54PM +0200, Linus Walleij wrote:
>>>> On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@redhat.com> wrote:
>>>>> On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
>>>>>> The Kyber block scheduler is not suitable for single hardware
>>>>>> queue devices, so add a new flag for single hardware queue
>>>>>> devices and add that to the deadline and BFQ schedulers
>>>>>> so the Kyber scheduler will not be selected for single queue
>>>>>> devices.
>>>>>
>>>>> The above may not be true for some single hw queue high performance HBA(
>>>>> such as megasas), which can get better performance from none, so it is
>>>>> reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
>>>>> issue directly if hw queue isn't busy in case of 'none'"), and the
>>>>> following link:
>>>>>
>>>>> https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/
>>>>
>>>> I see, but isn't the case rather that none is preferred and kyber gives
>>>> the same characteristics because it's not standing in the way
>>>> as much?
>>>
>>> Kyber has its own characteristic, such as fair read & write, better
>>> IO merge. And the decision on scheduler isn't only related with device,
>>> but also with workloads.
>>>
>>>> It looks like if we should add a special flag for these devices with
>>>> very fast single queues so they can say "I prefer none", do you
>>>> agree?
>>>
>>> I am not sure if it is easy to add such flag, because it isn't only
>>> related with HBA, but also with the attached disks.
>>>
>>
>> In general I don't mind the idea of giving hints from lower layer
>> block devices, about what kind of scheduling algorithm that could make
>> sense (as long it's on a reasonable granularity).
>>
>> If I understand your point correctly, what you are saying is that it
>> isn't easy or even possible for some block devices HWs. However, that
>> should be fine, as it wouldn't be mandatory to set this kind of flags,
>> but
>> instead could help where we see it fit, right?
>
> The elevator features flag was implemented not as a hint, but as hard
> requirements for elevators that are needed (mandatory) for a particular
> device type for correct operation. By correct operation, I mean "no IO
> errors or weird behavior resulting in errors such as timeouts". Until
> now, the only hard requirement we have is for zoned block devices which
> need mq-deadline to guarantee in-order dispatch of write commands (for
> sequential zones writing).
>
> We definitely could add hint flags to better help the block layer
> decide on the default optimal elevator for a particular device type,
> but as is, the elevator features will completely prevent the use of any
> other elevator that does not have the feature set. Those elevators will
> not be seen in /sys/block/<dev>/queue/scheduler. This may be a little
> too much for hint level rather than hard requirement.
>
> Furthermore, as Ming said, this depends on the HBA too rather than just
> the device itself. E.g. the smartpqi driver (Microsemi SAS HBAs)
> exposes single hard-disks as well as fast RAID arrays as multi-queue
> devices. While kyber may make sense for the latter, it probably does
> not make much sense for the former.
>
> In kernel vs udev rules for setting the optimal elevator for a
> particular device type should also be considered.

Agree, the elevator flags are hard requirements, which doesn't match
what this patch is trying to do. There's absolutely nothing wrong with
using none or kyber on single queue devices, hence it should be possible
to configure it as such.
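
[Editorial sketch, not part of the thread] For readers who want to check which scheduler a queue actually ended up with, the sysfs file mentioned above, /sys/block/<dev>/queue/scheduler, lists the available schedulers on one line and marks the active one with square brackets. The helper below is a minimal user-space parser for that format; the function name and the sample line used in the note after it are illustrative assumptions, and reading the file itself is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the bracketed (active) scheduler name from a line such as
 * "mq-deadline kyber [bfq] none" into out.
 * Returns 0 on success, -1 if no bracketed entry exists or out is too small.
 */
static int active_scheduler(const char *line, char *out, size_t outlen)
{
	const char *lb = strchr(line, '[');
	const char *rb = lb ? strchr(lb, ']') : NULL;
	size_t len;

	if (!lb || !rb)
		return -1;

	len = (size_t)(rb - lb - 1);	/* characters between '[' and ']' */
	if (len + 1 > outlen)		/* need room for the terminating NUL */
		return -1;

	memcpy(out, lb + 1, len);
	out[len] = '\0';
	return 0;
}
```

Given the illustrative line "mq-deadline kyber [bfq] none", the helper stores "bfq" in the output buffer. A scheduler that has been filtered out by a required feature would simply not appear in the line at all.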
On Tue, 2 Jun 2020 at 01:45, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 6/1/20 5:37 PM, Damien Le Moal wrote:
> > On Mon, 2020-06-01 at 14:53 +0200, Ulf Hansson wrote:
> >> On Mon, 1 Jun 2020 at 13:58, Ming Lei <ming.lei@redhat.com> wrote:
> >>> On Mon, Jun 01, 2020 at 01:36:54PM +0200, Linus Walleij wrote:
> >>>> On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@redhat.com> wrote:
> >>>>> On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> >>>>>> The Kyber block scheduler is not suitable for single hardware
> >>>>>> queue devices, so add a new flag for single hardware queue
> >>>>>> devices and add that to the deadline and BFQ schedulers
> >>>>>> so the Kyber scheduler will not be selected for single queue
> >>>>>> devices.
> >>>>>
> >>>>> The above may not be true for some single hw queue high performance HBA(
> >>>>> such as megasas), which can get better performance from none, so it is
> >>>>> reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
> >>>>> issue directly if hw queue isn't busy in case of 'none'"), and the
> >>>>> following link:
> >>>>>
> >>>>> https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/
> >>>>
> >>>> I see, but isn't the case rather that none is preferred and kyber gives
> >>>> the same characteristics because it's not standing in the way
> >>>> as much?
> >>>
> >>> Kyber has its own characteristic, such as fair read & write, better
> >>> IO merge. And the decision on scheduler isn't only related with device,
> >>> but also with workloads.
> >>>
> >>>> It looks like if we should add a special flag for these devices with
> >>>> very fast single queues so they can say "I prefer none", do you
> >>>> agree?
> >>>
> >>> I am not sure if it is easy to add such flag, because it isn't only
> >>> related with HBA, but also with the attached disks.
> >>>
> >>
> >> In general I don't mind the idea of giving hints from lower layer
> >> block devices, about what kind of scheduling algorithm that could make
> >> sense (as long it's on a reasonable granularity).
> >>
> >> If I understand your point correctly, what you are saying is that it
> >> isn't easy or even possible for some block devices HWs. However, that
> >> should be fine, as it wouldn't be mandatory to set this kind of flags,
> >> but
> >> instead could help where we see it fit, right?
> >
> > The elevator features flag was implemented not as a hint, but as hard
> > requirements for elevators that are needed (mandatory) for a particular
> > device type for correct operation. By correct operation, I mean "no IO
> > errors or weird behavior resulting in errors such as timeouts". Until
> > now, the only hard requirement we have is for zoned block devices which
> > need mq-deadline to guarantee in-order dispatch of write commands (for
> > sequential zones writing).
> >
> > We definitely could add hint flags to better help the block layer
> > decide on the default optimal elevator for a particular device type,
> > but as is, the elevator features will completely prevent the use of any
> > other elevator that does not have the feature set. Those elevators will
> > not be seen in /sys/block/<dev>/queue/scheduler. This may be a little
> > too much for hint level rather than hard requirement.
> >
> > Furthermore, as Ming said, this depends on the HBA too rather than just
> > the device itself. E.g. the smartpqi driver (Microsemi SAS HBAs)
> > exposes single hard-disks as well as fast RAID arrays as multi-queue
> > devices. While kyber may make sense for the latter, it probably does
> > not make much sense for the former.
> >
> > In kernel vs udev rules for setting the optimal elevator for a
> > particular device type should also be considered.
>
> Agree, the elevator flags are hard requirements, which doesn't match
> what this patch is trying to do. There's absolutely nothing wrong with
> using none or kyber on single queue devices, hence it should be possible
> to configure it as such.

I agree, the elevator flags as is, currently don't work for giving
hints from lower block layers. However, I still think it would be worth
exploring the idea that is brought up here.

The point is, even if it's perfectly fine to use kyber for MMC/SD, for
example, it would make little sense as BFQ performs better on this type
of single queue storage device. So, why solely rely on userspace udev
rules, when we can, in-kernel, help to decide what is the best
configuration?

Kind regards
Uffe
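
[Editorial sketch, not part of the thread] One way to picture the "hint" idea raised above, as opposed to a hard requirement, is a second mask that influences only which elevator is picked as the default and never removes anything from the selectable set. Everything below is hypothetical: no `preferred` mask exists in the patch or in the messages above, and the `elv_hint_*` names are invented for illustration. Only the two flag values come from the patch.

```c
#include <stddef.h>

/* Flag values as defined in the patch under discussion. */
#define ELEVATOR_F_ZBD_SEQ_WRITE	(1U << 0)
#define ELEVATOR_F_SINGLE_HW_QUEUE	(1U << 1)

/* Hypothetical stand-in for struct elevator_type: name and features only. */
struct elv_hint_sketch {
	const char *name;
	unsigned int features;
};

/*
 * Pick a default elevator. 'required' is a hard filter; 'preferred' is a
 * soft hint that only affects ordering. Returns NULL if nothing satisfies
 * the hard requirement.
 */
static const struct elv_hint_sketch *
elv_hint_pick_default(const struct elv_hint_sketch *tbl, size_t n,
		      unsigned int required, unsigned int preferred)
{
	const struct elv_hint_sketch *fallback = NULL;

	for (size_t i = 0; i < n; i++) {
		if ((tbl[i].features & required) != required)
			continue;	/* hard requirement not met */
		if ((tbl[i].features & preferred) == preferred)
			return &tbl[i];	/* first elevator matching the hint */
		if (!fallback)
			fallback = &tbl[i];	/* eligible, but not preferred */
	}
	return fallback;
}
```

With a table ordered kyber, bfq, mq-deadline and a single-queue hint, this sketch would default to bfq while leaving kyber selectable by the administrator, which is the distinction the thread is circling around.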
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 3d411716d7ee..7bf99fd83472 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6812,6 +6812,7 @@ static struct elevator_type iosched_bfq_mq = {
 	.icq_align = __alignof__(struct bfq_io_cq),
 	.elevator_attrs = bfq_attrs,
 	.elevator_name = "bfq",
+	.elevator_features = ELEVATOR_F_SINGLE_HW_QUEUE,
 	.elevator_owner = THIS_MODULE,
 };
 MODULE_ALIAS("bfq-iosched");
diff --git a/block/elevator.c b/block/elevator.c
index 4eab3d70e880..ebb4fc875b86 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -678,6 +678,9 @@ void elevator_init_mq(struct request_queue *q)
 	if (unlikely(q->elevator))
 		return;
 
+	if (q->nr_hw_queues == 1)
+		q->required_elevator_features |= ELEVATOR_F_SINGLE_HW_QUEUE;
+
 	if (!q->required_elevator_features)
 		e = elevator_get_default(q);
 	else
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index b490f47fd553..324047add271 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -794,7 +794,8 @@ static struct elevator_type mq_deadline = {
 	.elevator_attrs = deadline_attrs,
 	.elevator_name = "mq-deadline",
 	.elevator_alias = "deadline",
-	.elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE,
+	.elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE |
+		ELEVATOR_F_SINGLE_HW_QUEUE,
 	.elevator_owner = THIS_MODULE,
 };
 MODULE_ALIAS("mq-deadline-iosched");
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 901bda352dcb..03057fa2f569 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -172,6 +172,8 @@ extern struct request *elv_rb_find(struct rb_root *, sector_t);
 
 /* Supports zoned block devices sequential write constraint */
 #define ELEVATOR_F_ZBD_SEQ_WRITE (1U << 0)
+/* Elevator is suitable for single hardware queue devices */
+#define ELEVATOR_F_SINGLE_HW_QUEUE (1U << 1)
 
 #endif /* CONFIG_BLOCK */
 #endif
The Kyber block scheduler is not suitable for single hardware
queue devices, so add a new flag for single hardware queue
devices and add that to the deadline and BFQ schedulers
so the Kyber scheduler will not be selected for single queue
devices.

Deadline and BFQ are applicable to single HW queues
so flag each of these as single HW queue-friendly.

Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 block/bfq-iosched.c      | 1 +
 block/elevator.c         | 3 +++
 block/mq-deadline.c      | 3 ++-
 include/linux/elevator.h | 2 ++
 4 files changed, 8 insertions(+), 1 deletion(-)