block: Flag elevators suitable for single queue

Message ID 20200528081003.238804-1-linus.walleij@linaro.org (mailing list archive)
State New, archived

Commit Message

Linus Walleij May 28, 2020, 8:10 a.m. UTC
The Kyber block scheduler is not suitable for single hardware
queue devices, so add a new flag for single hardware queue
devices and add that to the deadline and BFQ schedulers
so the Kyber scheduler will not be selected for single queue
devices.

Deadline and BFQ are applicable to single HW queues so flag
each of these as single HW queue-friendly.

Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 block/bfq-iosched.c      | 1 +
 block/elevator.c         | 3 +++
 block/mq-deadline.c      | 3 ++-
 include/linux/elevator.h | 2 ++
 4 files changed, 8 insertions(+), 1 deletion(-)

Comments

Johannes Thumshirn May 28, 2020, 8:26 a.m. UTC | #1
On 28/05/2020 10:12, Linus Walleij wrote:
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index b490f47fd553..324047add271 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -794,7 +794,8 @@ static struct elevator_type mq_deadline = {
>  	.elevator_attrs = deadline_attrs,
>  	.elevator_name = "mq-deadline",
>  	.elevator_alias = "deadline",
> -	.elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE,
> +	.elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE |
> +	ELEVATOR_F_SINGLE_HW_QUEUE,

That indentation looks a bit odd to me but for the general concept

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Linus Walleij May 28, 2020, 11:59 a.m. UTC | #2
On Thu, May 28, 2020 at 10:26 AM Johannes Thumshirn
<Johannes.Thumshirn@wdc.com> wrote:
> On 28/05/2020 10:12, Linus Walleij wrote:
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index b490f47fd553..324047add271 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -794,7 +794,8 @@ static struct elevator_type mq_deadline = {
> >       .elevator_attrs = deadline_attrs,
> >       .elevator_name = "mq-deadline",
> >       .elevator_alias = "deadline",
> > -     .elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE,
> > +     .elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE |
> > +     ELEVATOR_F_SINGLE_HW_QUEUE,
>
> That indentation looks a bit odd to me but for the general concept

Yeah, it's what the EMACS default "linux" indentation suggests if I
put the flag on a new line, but I can adjust to whatever the block
maintainers suggest.

> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Thanks man!

Yours,
Linus Walleij
Ming Lei June 1, 2020, 7:49 a.m. UTC | #3
On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> The Kyber block scheduler is not suitable for single hardware
> queue devices, so add a new flag for single hardware queue
> devices and add that to the deadline and BFQ schedulers
> so the Kyber scheduler will not be selected for single queue
> devices.

The above may not be true for some single hw queue high performance HBAs
(such as megasas), which can get better performance from none, so it is
reasonable to expect better performance from kyber as well, see
6ce3dd6eec11 ("blk-mq: issue directly if hw queue isn't busy in case of
'none'"), and the following link:

https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/

Thanks, 
Ming
Linus Walleij June 1, 2020, 11:36 a.m. UTC | #4
On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@redhat.com> wrote:
> On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> > The Kyber block scheduler is not suitable for single hardware
> > queue devices, so add a new flag for single hardware queue
> > devices and add that to the deadline and BFQ schedulers
> > so the Kyber scheduler will not be selected for single queue
> > devices.
>
> The above may not be true for some single hw queue high performance HBA(
> such as megasas), which can get better performance from none, so it is
> reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
> issue directly if hw queue isn't busy in case of 'none'"), and the
> following link:
>
> https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/

I see, but isn't the case rather that none is preferred and kyber gives
the same characteristics because it's not standing in the way
as much?

It looks as if we should add a special flag for these devices with
very fast single queues so they can say "I prefer none", do you
agree?

Yours,
Linus Walleij
Ming Lei June 1, 2020, 11:58 a.m. UTC | #5
On Mon, Jun 01, 2020 at 01:36:54PM +0200, Linus Walleij wrote:
> On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@redhat.com> wrote:
> > On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> > > The Kyber block scheduler is not suitable for single hardware
> > > queue devices, so add a new flag for single hardware queue
> > > devices and add that to the deadline and BFQ schedulers
> > > so the Kyber scheduler will not be selected for single queue
> > > devices.
> >
> > The above may not be true for some single hw queue high performance HBA(
> > such as megasas), which can get better performance from none, so it is
> > reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
> > issue directly if hw queue isn't busy in case of 'none'"), and the
> > following link:
> >
> > https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/
> 
> I see, but isn't the case rather that none is preferred and kyber gives
> the same characteristics because it's not standing in the way
> as much?

Kyber has its own characteristics, such as fair read & write handling and
better IO merging. And the choice of scheduler depends not only on the
device, but also on the workload.

> 
> It looks like if we should add a special flag for these devices with
> very fast single queues so they can say "I prefer none", do you
> agree?

I am not sure it is easy to add such a flag, because it depends not only
on the HBA, but also on the attached disks.


Thanks,
Ming
Ulf Hansson June 1, 2020, 12:53 p.m. UTC | #6
On Mon, 1 Jun 2020 at 13:58, Ming Lei <ming.lei@redhat.com> wrote:
>
> On Mon, Jun 01, 2020 at 01:36:54PM +0200, Linus Walleij wrote:
> > On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> > > > The Kyber block scheduler is not suitable for single hardware
> > > > queue devices, so add a new flag for single hardware queue
> > > > devices and add that to the deadline and BFQ schedulers
> > > > so the Kyber scheduler will not be selected for single queue
> > > > devices.
> > >
> > > The above may not be true for some single hw queue high performance HBA(
> > > such as megasas), which can get better performance from none, so it is
> > > reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
> > > issue directly if hw queue isn't busy in case of 'none'"), and the
> > > following link:
> > >
> > > https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/
> >
> > I see, but isn't the case rather that none is preferred and kyber gives
> > the same characteristics because it's not standing in the way
> > as much?
>
> Kyber has its own characteristic, such as fair read & write, better
> IO merge. And the decision on scheduler isn't only related with device,
> but also with workloads.
>
> >
> > It looks like if we should add a special flag for these devices with
> > very fast single queues so they can say "I prefer none", do you
> > agree?
>
> I am not sure if it is easy to add such flag, because it isn't only
> related with HBA, but also with the attached disks.
>

In general I don't mind the idea of lower layer block devices giving
hints about what kind of scheduling algorithm could make sense (as
long as it's at a reasonable granularity).

If I understand your point correctly, what you are saying is that it
isn't easy, or even possible, for some block device HW. However, that
should be fine, as it wouldn't be mandatory to set this kind of flag;
instead it could help where we see fit, right?

Kind regards
Uffe
Damien Le Moal June 1, 2020, 11:37 p.m. UTC | #7
On Mon, 2020-06-01 at 14:53 +0200, Ulf Hansson wrote:
> On Mon, 1 Jun 2020 at 13:58, Ming Lei <ming.lei@redhat.com> wrote:
> > On Mon, Jun 01, 2020 at 01:36:54PM +0200, Linus Walleij wrote:
> > > On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > > On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> > > > > The Kyber block scheduler is not suitable for single hardware
> > > > > queue devices, so add a new flag for single hardware queue
> > > > > devices and add that to the deadline and BFQ schedulers
> > > > > so the Kyber scheduler will not be selected for single queue
> > > > > devices.
> > > > 
> > > > The above may not be true for some single hw queue high performance HBA(
> > > > such as megasas), which can get better performance from none, so it is
> > > > reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
> > > > issue directly if hw queue isn't busy in case of 'none'"), and the
> > > > following link:
> > > > 
> > > > https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/
> > > 
> > > I see, but isn't the case rather that none is preferred and kyber gives
> > > the same characteristics because it's not standing in the way
> > > as much?
> > 
> > Kyber has its own characteristic, such as fair read & write, better
> > IO merge. And the decision on scheduler isn't only related with device,
> > but also with workloads.
> > 
> > > It looks like if we should add a special flag for these devices with
> > > very fast single queues so they can say "I prefer none", do you
> > > agree?
> > 
> > I am not sure if it is easy to add such flag, because it isn't only
> > related with HBA, but also with the attached disks.
> > 
> 
> In general I don't mind the idea of giving hints from lower layer
> block devices, about what kind of scheduling algorithm that could make
> sense (as long it's on a reasonable granularity).
> 
> If I understand your point correctly, what you are saying is that it
> isn't easy or even possible for some block devices HWs. However, that
> should be fine, as it wouldn't be mandatory to set this kind of flags,
> but
> instead could help where we see it fit, right?

The elevator features flag was implemented not as a hint, but as hard
requirements for elevators that are needed (mandatory) for a particular
device type for correct operation. By correct operation, I mean "no IO
errors or weird behavior resulting in errors such as timeouts". Until
now, the only hard requirement we have is for zoned block devices which
need mq-deadline to guarantee in-order dispatch of write commands (for
sequential zones writing).

We definitely could add hint flags to better help the block layer
decide on the default optimal elevator for a particular device type,
but as is, the elevator features will completely prevent the use of any
other elevator that does not have the feature set. Those elevators will
not be seen in /sys/block/<dev>/queue/scheduler. This may be a little
too much for hint level rather than hard requirement.
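
To make the "hard requirement" point concrete, here is a minimal
userspace sketch of the matching rule described above. It is a model
under stated assumptions, not the kernel implementation: the struct and
function names (elv_model, elv_model_find, elv_model_usable,
elv_model_count_usable) are hypothetical, and the feature sets are the
ones this patch would assign.

```c
/* Userspace model of the feature check described above; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ELEVATOR_F_ZBD_SEQ_WRITE	(1U << 0)
#define ELEVATOR_F_SINGLE_HW_QUEUE	(1U << 1)	/* proposed by this patch */

/* Hypothetical stand-in for the name and feature mask of an elevator. */
struct elv_model {
	const char *name;
	unsigned int features;
};

/* Feature sets as they would look with this patch applied. */
static const struct elv_model elv_models[] = {
	{ "mq-deadline", ELEVATOR_F_ZBD_SEQ_WRITE | ELEVATOR_F_SINGLE_HW_QUEUE },
	{ "bfq",         ELEVATOR_F_SINGLE_HW_QUEUE },
	{ "kyber",       0 },
};

#define ELV_MODEL_COUNT (sizeof(elv_models) / sizeof(elv_models[0]))

/* Look up a modelled elevator by name; returns NULL if it is unknown. */
const struct elv_model *elv_model_find(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < ELV_MODEL_COUNT; i++)
		if (strcmp(elv_models[i].name, name) == 0)
			return &elv_models[i];
	return NULL;
}

/* Usable only if the elevator provides every required feature bit. */
bool elv_model_usable(const struct elv_model *e, unsigned int required)
{
	return e != NULL && (e->features & required) == required;
}

/* Number of elevators that stay selectable for a given required mask. */
size_t elv_model_count_usable(unsigned int required)
{
	size_t i, n = 0;

	for (i = 0; i < ELV_MODEL_COUNT; i++)
		if (elv_model_usable(&elv_models[i], required))
			n++;
	return n;
}
```

In this model, a queue whose required mask contains
ELEVATOR_F_SINGLE_HW_QUEUE no longer accepts kyber at all, which is the
"too much for hint level" effect: the scheduler is excluded rather than
merely not preferred.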

Furthermore, as Ming said, this depends on the HBA too rather than just
the device itself. E.g. the smartpqi driver (Microsemi SAS HBAs)
exposes single hard-disks as well as fast RAID arrays as multi-queue
devices. While kyber may make sense for the latter, it probably does
not make much sense for the former.

Whether the optimal elevator for a particular device type should be set
in-kernel or through udev rules should also be considered.

> 
> Kind regards
> Uffe
Jens Axboe June 1, 2020, 11:45 p.m. UTC | #8
On 6/1/20 5:37 PM, Damien Le Moal wrote:
> On Mon, 2020-06-01 at 14:53 +0200, Ulf Hansson wrote:
>> On Mon, 1 Jun 2020 at 13:58, Ming Lei <ming.lei@redhat.com> wrote:
>>> On Mon, Jun 01, 2020 at 01:36:54PM +0200, Linus Walleij wrote:
>>>> On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@redhat.com> wrote:
>>>>> On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
>>>>>> The Kyber block scheduler is not suitable for single hardware
>>>>>> queue devices, so add a new flag for single hardware queue
>>>>>> devices and add that to the deadline and BFQ schedulers
>>>>>> so the Kyber scheduler will not be selected for single queue
>>>>>> devices.
>>>>>
>>>>> The above may not be true for some single hw queue high performance HBA(
>>>>> such as megasas), which can get better performance from none, so it is
>>>>> reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
>>>>> issue directly if hw queue isn't busy in case of 'none'"), and the
>>>>> following link:
>>>>>
>>>>> https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/
>>>>
>>>> I see, but isn't the case rather that none is preferred and kyber gives
>>>> the same characteristics because it's not standing in the way
>>>> as much?
>>>
>>> Kyber has its own characteristic, such as fair read & write, better
>>> IO merge. And the decision on scheduler isn't only related with device,
>>> but also with workloads.
>>>
>>>> It looks like if we should add a special flag for these devices with
>>>> very fast single queues so they can say "I prefer none", do you
>>>> agree?
>>>
>>> I am not sure if it is easy to add such flag, because it isn't only
>>> related with HBA, but also with the attached disks.
>>>
>>
>> In general I don't mind the idea of giving hints from lower layer
>> block devices, about what kind of scheduling algorithm that could make
>> sense (as long it's on a reasonable granularity).
>>
>> If I understand your point correctly, what you are saying is that it
>> isn't easy or even possible for some block devices HWs. However, that
>> should be fine, as it wouldn't be mandatory to set this kind of flags,
>> but
>> instead could help where we see it fit, right?
> 
> The elevator features flag was implemented not as a hint, but as hard
> requirements for elevators that are needed (mandatory) for a particular
> device type for correct operation. By correct operation, I mean "no IO
> errors or weird behavior resulting in errors such as timeouts". Until
> now, the only hard requirement we have is for zoned block devices which
> need mq-deadline to guarantee in-order dispatch of write commands (for
> sequential zones writing).
> 
> We definitely could add hint flags to better help the block layer
> decide on the default optimal elevator for a particular device type,
> but as is, the elevator features will completely prevent the use of any
> other elevator that does not have the feature set. Those elevators will
> not be seen in /sys/block/<dev>/queue/scheduler. This may be a little
> too much for hint level rather than hard requirement.
> 
> Furthermore, as Ming said, this depends on the HBA too rather than just
> the device itself. E.g. the smartpqi driver (Microsemi SAS HBAs)
> exposes single hard-disks as well as fast RAID arrays as multi-queue
> devices. While kyber may make sense for the latter, it probably does
> not make much sense for the former.
> 
> In kernel vs udev rules for setting the optimal elevator for a
> particular device type should also be considered.

Agree, the elevator flags are hard requirements, which doesn't match
what this patch is trying to do. There's absolutely nothing wrong with
using none or kyber on single queue devices, hence it should be possible
to configure it as such.
Ulf Hansson June 2, 2020, 6:46 a.m. UTC | #9
On Tue, 2 Jun 2020 at 01:45, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 6/1/20 5:37 PM, Damien Le Moal wrote:
> > On Mon, 2020-06-01 at 14:53 +0200, Ulf Hansson wrote:
> >> On Mon, 1 Jun 2020 at 13:58, Ming Lei <ming.lei@redhat.com> wrote:
> >>> On Mon, Jun 01, 2020 at 01:36:54PM +0200, Linus Walleij wrote:
> >>>> On Mon, Jun 1, 2020 at 9:50 AM Ming Lei <ming.lei@redhat.com> wrote:
> >>>>> On Thu, May 28, 2020 at 10:10:03AM +0200, Linus Walleij wrote:
> >>>>>> The Kyber block scheduler is not suitable for single hardware
> >>>>>> queue devices, so add a new flag for single hardware queue
> >>>>>> devices and add that to the deadline and BFQ schedulers
> >>>>>> so the Kyber scheduler will not be selected for single queue
> >>>>>> devices.
> >>>>>
> >>>>> The above may not be true for some single hw queue high performance HBA(
> >>>>> such as megasas), which can get better performance from none, so it is
> >>>>> reasonable to get better performance from kyber, see 6ce3dd6eec11 ("blk-mq:
> >>>>> issue directly if hw queue isn't busy in case of 'none'"), and the
> >>>>> following link:
> >>>>>
> >>>>> https://lore.kernel.org/linux-block/20180710010331.27479-1-ming.lei@redhat.com/
> >>>>
> >>>> I see, but isn't the case rather that none is preferred and kyber gives
> >>>> the same characteristics because it's not standing in the way
> >>>> as much?
> >>>
> >>> Kyber has its own characteristic, such as fair read & write, better
> >>> IO merge. And the decision on scheduler isn't only related with device,
> >>> but also with workloads.
> >>>
> >>>> It looks like if we should add a special flag for these devices with
> >>>> very fast single queues so they can say "I prefer none", do you
> >>>> agree?
> >>>
> >>> I am not sure if it is easy to add such flag, because it isn't only
> >>> related with HBA, but also with the attached disks.
> >>>
> >>
> >> In general I don't mind the idea of giving hints from lower layer
> >> block devices, about what kind of scheduling algorithm that could make
> >> sense (as long it's on a reasonable granularity).
> >>
> >> If I understand your point correctly, what you are saying is that it
> >> isn't easy or even possible for some block devices HWs. However, that
> >> should be fine, as it wouldn't be mandatory to set this kind of flags,
> >> but
> >> instead could help where we see it fit, right?
> >
> > The elevator features flag was implemented not as a hint, but as hard
> > requirements for elevators that are needed (mandatory) for a particular
> > device type for correct operation. By correct operation, I mean "no IO
> > errors or weird behavior resulting in errors such as timeouts". Until
> > now, the only hard requirement we have is for zoned block devices which
> > need mq-deadline to guarantee in-order dispatch of write commands (for
> > sequential zones writing).
> >
> > We definitely could add hint flags to better help the block layer
> > decide on the default optimal elevator for a particular device type,
> > but as is, the elevator features will completely prevent the use of any
> > other elevator that does not have the feature set. Those elevators will
> > not be seen in /sys/block/<dev>/queue/scheduler. This may be a little
> > too much for hint level rather than hard requirement.
> >
> > Furthermore, as Ming said, this depends on the HBA too rather than just
> > the device itself. E.g. the smartpqi driver (Microsemi SAS HBAs)
> > exposes single hard-disks as well as fast RAID arrays as multi-queue
> > devices. While kyber may make sense for the latter, it probably does
> > not make much sense for the former.
> >
> > In kernel vs udev rules for setting the optimal elevator for a
> > particular device type should also be considered.
>
> Agree, the elevator flags are hard requirements, which doesn't match
> what this patch is trying to do. There's absolutely nothing wrong with
> using none or kyber on single queue devices, hence it should be possible
> to configure it as such.

I agree, the elevator flags as is, currently don't work for giving
hints from lower block layers. However, I still think it would be
worth exploring the idea that is brought up here.

The point is, even if it's perfectly fine to use kyber for MMC/SD, for
example, it would make little sense as BFQ performs better on this
type of single queue storage device. So, why solely rely on userspace
udev rules, when we can, in-kernel, help to decide what is the best
configuration?
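
One way to picture the hint idea is the sketch below. It is purely
illustrative and assumes a separate, hypothetical hint mask that only
influences which elevator is picked as the default; nothing like
HINT_F_SINGLE_HW_QUEUE, struct elv_hint_model or
elv_hint_pick_default() exists in the kernel today, and the table
contents are example data.

```c
/* Userspace sketch of a hint-based default choice; not kernel code. */
#include <assert.h>
#include <stddef.h>

#define ELEVATOR_F_ZBD_SEQ_WRITE	(1U << 0)	/* existing hard feature */
#define HINT_F_SINGLE_HW_QUEUE		(1U << 0)	/* hypothetical hint bit */

/* Hypothetical model: hard features and soft hints kept in separate masks. */
struct elv_hint_model {
	const char *name;
	unsigned int features;	/* hard capabilities, filtered strictly */
	unsigned int hints;	/* soft preferences, only affect the default */
};

/* Example data only: which elevators advertise the hypothetical hint. */
const struct elv_hint_model elv_hint_table[] = {
	{ "kyber",       0,                        0 },
	{ "bfq",         0,                        HINT_F_SINGLE_HW_QUEUE },
	{ "mq-deadline", ELEVATOR_F_ZBD_SEQ_WRITE, HINT_F_SINGLE_HW_QUEUE },
};

/*
 * Pick a default elevator. Hard requirements exclude candidates; hints
 * only reorder the choice, so every elevator passing the hard filter
 * would remain selectable by the user afterwards.
 */
const struct elv_hint_model *
elv_hint_pick_default(const struct elv_hint_model *tbl, size_t n,
		      unsigned int required, unsigned int hinted)
{
	const struct elv_hint_model *fallback = NULL;
	size_t i;

	assert(tbl != NULL);
	for (i = 0; i < n; i++) {
		if ((tbl[i].features & required) != required)
			continue;		/* hard filter, as today */
		if (hinted && (tbl[i].hints & hinted) == hinted)
			return &tbl[i];		/* first hint match wins */
		if (!fallback)
			fallback = &tbl[i];	/* first usable, unhinted */
	}
	return fallback;
}
```

With this split, a single queue device could default to bfq while kyber
and none stay configurable, since the hint never enters the hard
feature check.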

Kind regards
Uffe
Patch

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 3d411716d7ee..7bf99fd83472 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6812,6 +6812,7 @@  static struct elevator_type iosched_bfq_mq = {
 	.icq_align =		__alignof__(struct bfq_io_cq),
 	.elevator_attrs =	bfq_attrs,
 	.elevator_name =	"bfq",
+	.elevator_features =	ELEVATOR_F_SINGLE_HW_QUEUE,
 	.elevator_owner =	THIS_MODULE,
 };
 MODULE_ALIAS("bfq-iosched");
diff --git a/block/elevator.c b/block/elevator.c
index 4eab3d70e880..ebb4fc875b86 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -678,6 +678,9 @@  void elevator_init_mq(struct request_queue *q)
 	if (unlikely(q->elevator))
 		return;
 
+	if (q->nr_hw_queues == 1)
+		q->required_elevator_features |= ELEVATOR_F_SINGLE_HW_QUEUE;
+
 	if (!q->required_elevator_features)
 		e = elevator_get_default(q);
 	else
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index b490f47fd553..324047add271 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -794,7 +794,8 @@  static struct elevator_type mq_deadline = {
 	.elevator_attrs = deadline_attrs,
 	.elevator_name = "mq-deadline",
 	.elevator_alias = "deadline",
-	.elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE,
+	.elevator_features = ELEVATOR_F_ZBD_SEQ_WRITE |
+	ELEVATOR_F_SINGLE_HW_QUEUE,
 	.elevator_owner = THIS_MODULE,
 };
 MODULE_ALIAS("mq-deadline-iosched");
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 901bda352dcb..03057fa2f569 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -172,6 +172,8 @@  extern struct request *elv_rb_find(struct rb_root *, sector_t);
 
 /* Supports zoned block devices sequential write constraint */
 #define ELEVATOR_F_ZBD_SEQ_WRITE	(1U << 0)
+/* Elevator is suitable for single hardware queue devices */
+#define ELEVATOR_F_SINGLE_HW_QUEUE	(1U << 1)
 
 #endif /* CONFIG_BLOCK */
 #endif