
[GIT,PULL] Block pull request for 4.11-rc1

Message ID ef431f1c-e3c6-4940-bb2a-f5131ca96855@kernel.dk (mailing list archive)
State New, archived

Commit Message

Jens Axboe Feb. 21, 2017, 11:15 p.m. UTC
On 02/21/2017 04:02 PM, Linus Torvalds wrote:
> Hmm. The new config options are incomprehensible and their help
> messages don't actually help.
> 
> So can you fill in the blanks on what
> 
>   Default single-queue blk-mq I/O scheduler
>   Default multi-queue blk-mq I/O scheduler
> 
> config options mean, and why they default to none?
> 
> The config phase of the kernel is one of the worst parts of the whole
> project, and adding these kinds of random and incomprehensible config
> options does *not* help.

I'll try and see if I can come up with some better sounding/reading
explanations.

But under a device managed by blk-mq, that device exposes a number of
hardware queues. For older style devices, that number is typically 1
(single queue). This is true for most SCSI devices that are run by
scsi-mq, which often host rotational storage. Faster devices, like
nvme, expose a lot more hardware queues (multi-queue). Hence the
distinction between having a scheduler attached for single-queue
devices, and for multi-queue devices. For rotational devices, we'll want
to default to something like mq-deadline, and I actually thought that
was the default already. It should be (see below). For multi-queue
devices, we'll want to initially default to "none", and then later
attach a properly multiqueue scheduler, when we have it (it's still in
development).

"none" just means that we don't have a scheduler attached.

In essence, we want to default to having a sane IO scheduler attached
depending on device class. For single-queue devices, that's deadline for
now. For multi-queue, we'll want to wait until we have something that
scales really well. It's not that easy to present this information in a
user grokkable fashion, since most people would not know the difference
between the two.
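For reference, the per-device scheduler choice is exposed through sysfs at runtime; this is a sketch (the device name sdb is just a hypothetical example):

```shell
# The active scheduler is the bracketed entry in the sysfs file, e.g.
#   $ cat /sys/block/sdb/queue/scheduler      (sdb is a hypothetical device)
#   [mq-deadline] none
# and switching at runtime is a plain write (as root):
#   $ echo none > /sys/block/sdb/queue/scheduler

# Small helper that extracts the active (bracketed) scheduler from such a line:
active_sched() {
    echo "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

active_sched "[mq-deadline] none"   # prints: mq-deadline
```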

I'll send the below as a real patch, and also ponder how we can improve
the Kconfig text.

Comments

Linus Torvalds Feb. 21, 2017, 11:23 p.m. UTC | #1
On Tue, Feb 21, 2017 at 3:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>
> But under a device managed by blk-mq, that device exposes a number of
> hardware queues. For older style devices, that number is typically 1
> (single queue).

... but why would this ever be different from the normal IO scheduler?

IOW, what makes single-queue mq scheduling so special that

 (a) it needs its own config option

 (b) it is different from just the regular IO scheduler in the first place?

So the whole thing stinks. The fact that it then has an
incomprehensible config option seems to be just gravy on top of the
crap.

> "none" just means that we don't have a scheduler attached.

.. which makes no sense to me in the first place.

People used to try to convince us that doing IO schedulers was a
mistake, because modern disk hardware did a better job than we could
do in software.

Those people were full of crap. The regular IO scheduler used to have
a "NONE" option too. Maybe it even still has one, but only insane
people actually use it.

Why is the MQ stuff magically so different that NONE would make sense at all?

And equally importantly: why do we _ask_ people these issues? Is this
some kind of sick "cover your ass" thing, where you can say "well, I
asked about it", when inevitably the choice ends up being the wrong
one?

We have too damn many Kconfig options as-is, I'm trying to push back
on them. These two options seem fundamentally broken and stupid.

The "we have no good idea, so let's add a Kconfig option" seems like a
broken excuse for these things existing.

So why ask this question in the first place?

Is there any possible reason why "NONE" is a good option at all? And
if it is the _only_ option (because no other better choice exists), it
damn well shouldn't be a kconfig option!

             Linus
Jens Axboe Feb. 22, 2017, 6:14 p.m. UTC | #2
On 02/21/2017 04:23 PM, Linus Torvalds wrote:
> On Tue, Feb 21, 2017 at 3:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> But under a device managed by blk-mq, that device exposes a number of
>> hardware queues. For older style devices, that number is typically 1
>> (single queue).
> 
> ... but why would this ever be different from the normal IO scheduler?

Because we have a different set of schedulers for blk-mq, different
than the legacy path. mq-deadline is a basic port that will work
fine with rotational storage, but it's not going to be a good choice
for NVMe because of scalability issues.

We'll have BFQ on the blk-mq side, catering to the needs of those
folks that currently rely on the richer feature set that CFQ supports.

We've continually been working towards getting rid of the legacy
IO path, and its set of schedulers. So if it's any consolation,
those options will go away in the future.

> IOW, what makes single-queue mq scheduling so special that
> 
>  (a) it needs its own config option
> 
>  (b) it is different from just the regular IO scheduler in the first place?
> 
> So the whole thing stinks. The fact that it then has an
> incomprehensible config option seems to be just gravy on top of the
> crap.

What do you mean by "the regular IO scheduler"? These are different
schedulers.

As explained above, single-queue mq devices generally DO want mq-deadline.
multi-queue mq devices, we don't have a good choice for them right now,
so we retain the current behavior (that we've had since blk-mq was
introduced in 3.13) of NOT doing any IO scheduling for them. If you
do want scheduling for them, set the option, or configure udev to
make the right choice for you.
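Such a udev rule could look roughly like the following sketch (the rule file name and the choice of mq-deadline here are illustrative, not a recommendation):

```
# /etc/udev/rules.d/60-iosched.rules (hypothetical path)
# Attach mq-deadline to every block device that exposes a scheduler knob.
ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/scheduler}="mq-deadline"
```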

I agree the wording isn't great, and we can improve that. But I do
think that the current choices make sense.

>> "none" just means that we don't have a scheduler attached.
> 
> .. which makes no sense to me in the first place.
> 
> People used to try to convince us that doing IO schedulers was a
> mistake, because modern disk hardware did a better job than we could
> do in software.
> 
> Those people were full of crap. The regular IO scheduler used to have
> a "NONE" option too. Maybe it even still has one, but only insane
> people actually use it.
> 
> Why is the MQ stuff magically so different that NONE would make sense at all?

I was never one of those people, and I've always been a strong advocate
for imposing scheduling to keep devices in check. The regular IO scheduler
pool includes "noop", which is probably the one you are thinking of. That
one is a bit different than the new "none" option for blk-mq, in that it
does do insertion sorts and it does do merges. "none" does some merging,
but only where it happens to make sense. There's no insertion sorting.

> And equally importantly: why do we _ask_ people these issues? Is this
> some kind of sick "cover your ass" thing, where you can say "well, I
> asked about it", when inevitably the choice ends up being the wrong
> one?
> 
> We have too damn many Kconfig options as-is, I'm trying to push back
> on them. These two options seem fundamentally broken and stupid.
> 
> The "we have no good idea, so let's add a Kconfig option" seems like a
> broken excuse for these things existing.
> 
> So why ask this question in the first place?
> 
> Is there any possible reason why "NONE" is a good option at all? And
> if it is the _only_ option (because no other better choice exists), it
> damn well shouldn't be a kconfig option!

I'm all for NOT asking questions, and not providing tunables. That's
generally how I do write code. See the blk-wbt stuff, for instance, that
basically just has one tunable that's set sanely by default, and we
figure out the rest.

I don't want to regress performance of blk-mq devices by attaching
mq-deadline to them. When we do have a sane scheduler choice, we'll
make that the default. And yes, maybe we can remove the Kconfig option
at that point.

For single queue devices, we could kill the option. But we're expecting
bfq-mq for 4.12, and we'll want to have the option at that point unless
you want to rely solely on runtime setting of the scheduler through
udev or by the sysadmin.
Linus Torvalds Feb. 22, 2017, 6:26 p.m. UTC | #3
On Wed, Feb 22, 2017 at 10:14 AM, Jens Axboe <axboe@kernel.dk> wrote:
>
> What do you mean by "the regular IO scheduler"? These are different
> schedulers.

Not to the user they aren't.

If the user already answered once about the IO schedulers, we damn
well shouldn't ask again about another small implementation detail.

How hard is this to understand? You're asking users stupid things.

It's not just about the wording. It's a fundamental issue.  These
questions are about internal implementation details. They make no
sense to a user. They don't even make sense to a kernel developer, for
chrissake!

Don't make the kconfig mess worse. This "we can't make good defaults
in the kernel, so ask users about random things that they cannot
possibly answer" model is not an acceptable model.

If the new schedulers aren't better than NOOP, they shouldn't exist.
And if you want people to be able to test, they should be dynamic.

And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER?

It's really that simple.

             Linus
Jens Axboe Feb. 22, 2017, 6:41 p.m. UTC | #4
On 02/22/2017 11:26 AM, Linus Torvalds wrote:
> On Wed, Feb 22, 2017 at 10:14 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> What do you mean by "the regular IO scheduler"? These are different
>> schedulers.
> 
> Not to the user they aren't.
> 
> If the user already answered once about the IO schedulers, we damn
> well shouldn't ask again about another small implementation detail.
> 
> How hard is this to understand? You're asking users stupid things.

The fact is that we have two different sets, until we can yank
the old ones. So I can't just ask one question, since the sets
aren't identical.

This IS confusing to the user, and it's an artifact of the situation
that we have where we are phasing out the old IO path and switching
to blk-mq. I don't want the user to know about blk-mq, I just want
it to be what everything runs on. But until that happens, and it is
happening, we are going to be stuck with that situation.

We have this exposed in other places, too. Like for dm, and for
SCSI. Not a perfect situation, but something that WILL go away
eventually.

> It's not just about the wording. It's a fundamental issue.  These
> questions are about internal implementation details. They make no
> sense to a user. They don't even make sense to a kernel developer, for
> chrissake!
> 
> Don't make the kconfig mess worse. This "we can't make good defaults
> in the kernel, so ask users about random things that they cannot
> possibly answer" model is not an acceptable model.

There are good defaults! mq single-queue should default to mq-deadline,
and mq multi-queue should default to "none" for now. If you feel that
strongly about it (and I'm guessing you do, judging by the speed
typing and generally annoyed demeanor), then by all means, let's kill
the config entries and I'll just hardwire the defaults. The config
entries were implemented similarly to the old schedulers, and each
scheduler is selectable individually. I'd greatly prefer just
improving the wording so it makes more sense.

> If the new schedulers aren't better than NOOP, they shouldn't exist.
> And if you want people to be able to test, they should be dynamic.

They are dynamic! You can build them as modules, you can switch at
runtime. Just like we have always been able to. I can't make it more
dynamic than that. We're reusing the same internal infrastructure for
that, AND the user visible ABI for checking what is available, and
setting a new one.

> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER?

BECAUSE IT'S POLICY! Fact of that matter is, if I just default to what
we had before, it'd all be running with none. In a few years time, if
I'm lucky, someone will have shipped udev rules setting this appropriately.
If I ask the question, we'll get testing NOW. People will run with
the default set.
Linus Torvalds Feb. 22, 2017, 6:42 p.m. UTC | #5
On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER?

Basically, I'm pushing back on config options that I can't personally
even sanely answer.

If it's a config option about "do I have a particular piece of
hardware", it makes sense. But these new ones were just complete
garbage.

The whole "default IO scheduler" thing is a disease. We should stop
making up these shit schedulers and then say "we don't know which one
works best for you".

All it does is encourage developers to make shortcuts and create crap
that isn't generically useful, and then blame the user and say "well,
you should have picked a different scheduler" when they say "this does
not work well for me".

We have had too many of those kinds of broken choices.  And when the
new Kconfig options get so confusing and so esoteric that I go "Hmm, I
have no idea if my hardware does a single queue or not", I put my foot
down.

When the IO scheduler questions were about a generic IO scheduler for
everything, I can kind of understand them. I think it was still a
mistake (for the reasons outlined above), but at least it was a
comprehensible question to ask.

But when it gets to "what should I do about a single-queue version of
a MQ scheduler", the question is no longer even remotely sensible. The
question should simply NOT EXIST. There is no possible valid reason to
ask that kind of crap.

               Linus
Jens Axboe Feb. 22, 2017, 6:44 p.m. UTC | #6
On 02/22/2017 11:42 AM, Linus Torvalds wrote:
> On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER?
> 
> Basically, I'm pushing back on config options that I can't personally
> even sanely answer.

I got that much, and I don't disagree on that part.

> If it's a config option about "do I have a particular piece of
> hardware", it makes sense. But these new ones were just complete
> garbage.
> 
> The whole "default IO scheduler" thing is a disease. We should stop
> making up these shit schedulers and then say "we don't know which one
> works best for you".
> 
> All it does is encourage developers to make shortcuts and create crap
> that isn't generically useful, and then blame the user and say "well,
> you should have picked a different scheduler" when they say "this does
> not work well for me".
> 
> We have had too many of those kinds of broken choices.  And when the
> new Kconfig options get so confusing and so esoteric that I go "Hmm, I
> have no idea if my hardware does a single queue or not", I put my foot
> down.
> 
> When the IO scheduler questions were about a generic IO scheduler for
> everything, I can kind of understand them. I think it was still a
> mistake (for the reasons outlined above), but at least it was a
> comprehensible question to ask.
> 
> But when it gets to "what should I do about a single-queue version of
> a MQ scheduler", the question is no longer even remotely sensible. The
> question should simply NOT EXIST. There is no possible valid reason to
> ask that kind of crap.

OK, so here's what I'll do:

1) We'll kill the default scheduler choices. sq blk-mq will default to
   mq-deadline, mq blk-mq will default to "none" (at least for now, until
   the new scheduler is done).
2) The individual schedulers will be y/m/n selectable, just like any
   other driver.

I hope that works for everyone.
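Point (2) would make each scheduler an ordinary tristate in block/Kconfig.iosched, roughly along these lines (a sketch of the shape, not the final patch):

```
config MQ_IOSCHED_DEADLINE
	tristate "MQ deadline I/O scheduler"
	default y
	help
	  MQ version of the deadline IO scheduler.
```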
Linus Torvalds Feb. 22, 2017, 6:45 p.m. UTC | #7
On Wed, Feb 22, 2017 at 10:41 AM, Jens Axboe <axboe@kernel.dk> wrote:
>
> The fact is that we have two different sets, until we can yank
> the old ones. So I can't just ask one question, since the sets
> aren't identical.

Bullshit.

I'm saying: rip out the question ENTIRELY. For *both* cases.

If you cannot yourself give a good answer, then there's no f*cking way
any user can give a good answer. So asking the question is totally and
utterly pointless.

All it means is that different people will try different (in random
ways) configurations, and the end result is random crap.

So get rid of those questions. Pick a default, and live with it. And
if people complain about performance, fix the performance issue.

It's that simple.

                Linus
Jens Axboe Feb. 22, 2017, 6:52 p.m. UTC | #8
On 02/22/2017 11:45 AM, Linus Torvalds wrote:
> On Wed, Feb 22, 2017 at 10:41 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> The fact is that we have two different sets, until we can yank
>> the old ones. So I can't just ask one question, since the sets
>> aren't identical.
> 
> Bullshit.
> 
> I'm saying: rip out the question ENTIRELY. For *both* cases.
> 
> If you cannot yourself give a good answer, then there's no f*cking way
> any user can give a good answer. So asking the question is totally and
> utterly pointless.
> 
> All it means is that different people will try different (in random
> ways) configurations, and the end result is random crap.
> 
> So get rid of those questions. Pick a default, and live with it. And
> if people complain about performance, fix the performance issue.
> 
> It's that simple.

No, it's not that simple at all. Fact is, some optimizations make sense
for some workloads, and some do not. CFQ works great for some cases, and
it works poorly for others, even if we try to make heuristics that enable
it to work well for all cases. Some optimizations are costly; that's
fine on certain types of hardware, or maybe it's a trade-off you want
to make. Otherwise we end up with tons of settings for a single driver,
which does not reduce the configuration matrix at all.

By that logic, why do we have ANY config options outside of what drivers
to build? What should I set HZ at? RCU options? Let's default to ext4,
and kill off xfs? Or btrfs? slab/slob/slub/whatever?

Yes, that's taking the argument a bit more to the extreme, but it's the
same damn thing.

I'm fine with getting rid of the default selections, but we're NOT
going to be able to have just one scheduler for everything. We can
make sane defaults based on the hardware type.
Linus Torvalds Feb. 22, 2017, 6:56 p.m. UTC | #9
On Wed, Feb 22, 2017 at 10:52 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> It's that simple.
>
> No, it's not that simple at all. Fact is, some optimizations make sense
> for some workloads, and some do not.

Are you even listening?

I'm saying no user can ever give a sane answer to your question. The
question is insane and wrong.

I already said you can have a dynamic configuration (and maybe even an
automatic heuristic - like saying that a ramdisk gets NOOP by default,
real hardware does not).

But asking a user at kernel config time for a default is insane. If
*you* cannot answer it, then the user sure as hell cannot.

Other configuration questions have problems too, but at least the
question about "should I support ext4" is something a user (or distro)
can sanely answer. So your comparisons are pure bullshit.

                     Linus
Jens Axboe Feb. 22, 2017, 6:58 p.m. UTC | #10
On 02/22/2017 11:56 AM, Linus Torvalds wrote:
> On Wed, Feb 22, 2017 at 10:52 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>> It's that simple.
>>
>> No, it's not that simple at all. Fact is, some optimizations make sense
>> for some workloads, and some do not.
> 
> Are you even listening?
> 
> I'm saying no user can ever give a sane answer to your question. The
> question is insane and wrong.
> 
> I already said you can have a dynamic configuration (and maybe even an
> automatic heuristic - like saying that a ramdisk gets NOOP by default,
> real hardware does not).
> 
> But asking a user at kernel config time for a default is insane. If
> *you* cannot answer it, then the user sure as hell cannot.
> 
> Other configuration questions have problems too, but at least the
> question about "should I support ext4" is something a user (or distro)
> can sanely answer. So your comparisons are pure bullshit.

As per the previous email, this was my proposed solution:

OK, so here's what I'll do:

1) We'll kill the default scheduler choices. sq blk-mq will default to
   mq-deadline, mq blk-mq will default to "none" (at least for now, until
   the new scheduler is done).
2) The individual schedulers will be y/m/n selectable, just like any
   other driver.

Any further settings on that can be done at runtime, through sysfs.
Linus Torvalds Feb. 22, 2017, 7:04 p.m. UTC | #11
On Wed, Feb 22, 2017 at 10:58 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 02/22/2017 11:56 AM, Linus Torvalds wrote:
>
> OK, so here's what I'll do:
>
> 1) We'll kill the default scheduler choices. sq blk-mq will default to
>    mq-deadline, mq blk-mq will default to "none" (at least for now, until
>    the new scheduler is done).
> 2) The individual schedulers will be y/m/n selectable, just like any
>    other driver.

Yes. That makes sense as options. I can (or, perhaps even more
importantly, a distro can) answer those kinds of questions.

                   Linus
Jens Axboe Feb. 22, 2017, 9:29 p.m. UTC | #12
On 02/22/2017 12:04 PM, Linus Torvalds wrote:
> On Wed, Feb 22, 2017 at 10:58 AM, Jens Axboe <axboe@kernel.dk> wrote:
>> On 02/22/2017 11:56 AM, Linus Torvalds wrote:
>>
>> OK, so here's what I'll do:
>>
>> 1) We'll kill the default scheduler choices. sq blk-mq will default to
>>    mq-deadline, mq blk-mq will default to "none" (at least for now, until
>>    the new scheduler is done).
>> 2) The individual schedulers will be y/m/n selectable, just like any
>>    other driver.
> 
> Yes. That makes sense as options. I can (or, perhaps even more
> importantly, a distro can) answer those kinds of questions.

Someone misspelled pacman:

parman (PARMAN) [N/m/y] (NEW) ?

There is no help available for this option.

Or I think it's pacman, because I have no idea what else it could be. I'm
going to say N.
Markus Trippelsdorf Feb. 22, 2017, 9:50 p.m. UTC | #13
On 2017.02.22 at 11:44 -0700, Jens Axboe wrote:
> On 02/22/2017 11:42 AM, Linus Torvalds wrote:
> > On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER?
> > 
> > Basically, I'm pushing back on config options that I can't personally
> > even sanely answer.
> 
> I got that much, and I don't disagree on that part.
> 
> > If it's a config option about "do I have a particular piece of
> > hardware", it makes sense. But these new ones were just complete
> > garbage.
> > 
> > The whole "default IO scheduler" thing is a disease. We should stop
> > making up these shit schedulers and then say "we don't know which one
> > works best for you".
> > 
> > All it does is encourage developers to make shortcuts and create crap
> > that isn't generically useful, and then blame the user and say "well,
> > you should have picked a different scheduler" when they say "this does
> > not work well for me".
> > 
> > We have had too many of those kinds of broken choices.  And when the
> > new Kconfig options get so confusing and so esoteric that I go "Hmm, I
> > have no idea if my hardware does a single queue or not", I put my foot
> > down.
> > 
> > When the IO scheduler questions were about a generic IO scheduler for
> > everything, I can kind of understand them. I think it was still a
> > mistake (for the reasons outlined above), but at least it was a
> > comprehensible question to ask.
> > 
> > But when it gets to "what should I do about a single-queue version of
> > a MQ scheduler", the question is no longer even remotely sensible. The
> > question should simply NOT EXIST. There is no possible valid reason to
> > ask that kind of crap.
> 
> OK, so here's what I'll do:
> 
> 1) We'll kill the default scheduler choices. sq blk-mq will default to
>    mq-deadline, mq blk-mq will default to "none" (at least for now, until
>    the new scheduler is done).

But what about e.g. SATA SSDs? Wouldn't they be better off without any
scheduler? 
So perhaps setting "none" for queue/rotational==0 and mq-deadline for
spinning drives automatically in the sq blk-mq case?
Jens Axboe Feb. 22, 2017, 9:55 p.m. UTC | #14
On 02/22/2017 02:50 PM, Markus Trippelsdorf wrote:
> On 2017.02.22 at 11:44 -0700, Jens Axboe wrote:
>> On 02/22/2017 11:42 AM, Linus Torvalds wrote:
>>> On Wed, Feb 22, 2017 at 10:26 AM, Linus Torvalds
>>> <torvalds@linux-foundation.org> wrote:
>>>>
>>>> And dammit, IF YOU DON'T EVEN KNOW, WHY THE HELL ARE YOU ASKING THE POOR USER?
>>>
>>> Basically, I'm pushing back on config options that I can't personally
>>> even sanely answer.
>>
>> I got that much, and I don't disagree on that part.
>>
>>> If it's a config option about "do I have a particular piece of
>>> hardware", it makes sense. But these new ones were just complete
>>> garbage.
>>>
>>> The whole "default IO scheduler" thing is a disease. We should stop
>>> making up these shit schedulers and then say "we don't know which one
>>> works best for you".
>>>
>>> All it does is encourage developers to make shortcuts and create crap
>>> that isn't generically useful, and then blame the user and say "well,
>>> you should have picked a different scheduler" when they say "this does
>>> not work well for me".
>>>
>>> We have had too many of those kinds of broken choices.  And when the
>>> new Kconfig options get so confusing and so esoteric that I go "Hmm, I
>>> have no idea if my hardware does a single queue or not", I put my foot
>>> down.
>>>
>>> When the IO scheduler questions were about a generic IO scheduler for
>>> everything, I can kind of understand them. I think it was still a
>>> mistake (for the reasons outline above), but at least it was a
>>> comprehensible question to ask.
>>>
>>> But when it gets to "what should I do about a single-queue version of
>>> a MQ scheduler", the question is no longer even remotely sensible. The
>>> question should simply NOT EXIST. There is no possible valid reason to
>>> ask that kind of crap.
>>
>> OK, so here's what I'll do:
>>
>> 1) We'll kill the default scheduler choices. sq blk-mq will default to
>>    mq-deadline, mq blk-mq will default to "none" (at least for now, until
>>    the new scheduler is done).
> 
> But what about e.g. SATA SSDs? Wouldn't they be better off without any
> scheduler? 

Marginal. If they are single queue, using a basic scheduler like
deadline isn't going to be a significant amount of overhead. In some
cases they are going to be better off, due to better merging. In the
worst case, overhead is slightly higher. Net result is positive, I'd
say.

> So perhaps setting "none" for queue/rotational==0 and mq-deadline for
> spinning drives automatically in the sq blk-mq case?

You can do that through a udev rule. The kernel doesn't know if the
device is rotational or not when we set up the scheduler. So we'd either
have to add code to do that, or simply just do it with a udev rule. I'd
prefer the latter.
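A udev rule keyed on the rotational attribute might look roughly like this (a sketch; the rule file path is hypothetical):

```
# /etc/udev/rules.d/60-iosched.rules (hypothetical path)
# Spinning disks get mq-deadline; non-rotational devices get none.
ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/rotational}=="1", \
  ATTR{queue/scheduler}="mq-deadline"
ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/rotational}=="0", \
  ATTR{queue/scheduler}="none"
```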
Linus Torvalds Feb. 23, 2017, 12:16 a.m. UTC | #15
On Wed, Feb 22, 2017 at 1:50 PM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
>
> But what about e.g. SATA SSDs? Wouldn't they be better off without any
> scheduler?
> So perhaps setting "none" for queue/rotational==0 and mq-deadline for
> spinning drives automatically in the sq blk-mq case?

Jens already said that the merging advantage can outweigh the costs,
but he didn't actually talk much about it.

The scheduler advantage can outweigh the costs of running a scheduler
by an absolutely _huge_ amount.

An SSD isn't zero-cost, and each command tends to have some fixed
overhead on the controller, and pretty much all SSD's heavily prefer
fewer large request over lots of tiny ones.

There are also fairness/latency issues that tend to very heavily favor
having an actual scheduler, ie reads want to be scheduled before
writes on an SSD (within reason) in order to make latency better.

Ten years ago, there were lots of people who argued that you don't
want to do scheduling for SSD's, because SSD's were so fast that
you only added overhead. Nobody really believes that fairytale any
more.

So you might have particular loads that look better with noop, but
they will be rare and far between. Try it, by all means, and if it
works for you, set it in your udev rules.

The main place where a noop scheduler currently might make sense is
likely for a ramdisk, but quite frankly, since the main real use case
for a ram-disk tends to be to make it easy to profile and find the
bottlenecks for performance analysis (for emulating future "infinitely
fast" media), even that isn't true - using noop there defeats the
whole purpose.

              Linus

Patch

diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
index 0715ce93daef..f6144c5d7c70 100644
--- a/block/Kconfig.iosched
+++ b/block/Kconfig.iosched
@@ -75,7 +75,7 @@  config MQ_IOSCHED_NONE
 
 choice
 	prompt "Default single-queue blk-mq I/O scheduler"
-	default DEFAULT_SQ_NONE
+	default DEFAULT_SQ_DEADLINE if MQ_IOSCHED_DEADLINE=y
 	help
 	  Select the I/O scheduler which will be used by default for blk-mq
 	  managed block devices with a single queue.