diff mbox series

[8/8] block: Always initialize bio IO priority on submit

Message ID 20220615161616.5055-8-jack@suse.cz (mailing list archive)
State New, archived
Headers show
Series block: Fix IO priority mess | expand

Commit Message

Jan Kara June 15, 2022, 4:16 p.m. UTC
Currently, IO priority set in task's IO context is not reflected in the
bio->bi_ioprio for most IO (only io_uring and direct IO set it). This
results in odd results where process is submitting some bios with one
priority and other bios with a different (unset) priority and due to
differing priorities bios cannot be merged. Make sure bio->bi_ioprio is
always set on bio submission.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-mq.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Damien Le Moal June 16, 2022, 3:15 a.m. UTC | #1
On 6/16/22 01:16, Jan Kara wrote:
> Currently, IO priority set in task's IO context is not reflected in the
> bio->bi_ioprio for most IO (only io_uring and direct IO set it). This
> results in odd results where process is submitting some bios with one
> priority and other bios with a different (unset) priority and due to
> differing priorities bios cannot be merged. Make sure bio->bi_ioprio is
> always set on bio submission.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>   block/blk-mq.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index e17d822e6051..e976f696babc 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2793,7 +2793,13 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
>   
>   static void bio_set_ioprio(struct bio *bio)
>   {
> +	unsigned short ioprio_class;
> +
>   	blkcg_set_ioprio(bio);

Shouldn't this function set the default using the below "if" code ?

> +	ioprio_class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> +	/* Nobody set ioprio so far? Initialize it based on task's nice value */

I do not think that the ioprio_class variable is useful.
This can be:

	if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE)
		bio->bi_ioprio = get_current_ioprio();

> +	if (ioprio_class == IOPRIO_CLASS_NONE)
> +		bio->bi_ioprio = get_current_ioprio();
>   }
>   
>   /**

Beside this comment, I am still scratching my head regarding what the 
user gets with ioprio_get(). If I understood your patches correctly, the 
user may still see IOPRIO_CLASS_NONE ?
For that case, to be in sync with the man page, I thought the returned 
ioprio should be the effective one based on the task io nice value, that 
is, the value returned by get_current_ioprio(). Am I missing something... ?
Jan Kara June 16, 2022, 11:23 a.m. UTC | #2
On Thu 16-06-22 12:15:25, Damien Le Moal wrote:
> On 6/16/22 01:16, Jan Kara wrote:
> > Currently, IO priority set in task's IO context is not reflected in the
> > bio->bi_ioprio for most IO (only io_uring and direct IO set it). This
> > results in odd results where process is submitting some bios with one
> > priority and other bios with a different (unset) priority and due to
> > differing priorities bios cannot be merged. Make sure bio->bi_ioprio is
> > always set on bio submission.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >   block/blk-mq.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index e17d822e6051..e976f696babc 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2793,7 +2793,13 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
> >   static void bio_set_ioprio(struct bio *bio)
> >   {
> > +	unsigned short ioprio_class;
> > +
> >   	blkcg_set_ioprio(bio);
> 
> Shouldn't this function set the default using the below "if" code ?
> 
> > +	ioprio_class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> > +	/* Nobody set ioprio so far? Initialize it based on task's nice value */
> 
> I do not think that the ioprio_class variable is useful.
> This can be:
> 
> 	if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE)
> 		bio->bi_ioprio = get_current_ioprio();

You are right. Fixed.

> > +	if (ioprio_class == IOPRIO_CLASS_NONE)
> > +		bio->bi_ioprio = get_current_ioprio();
> >   }
> >   /**
> 
> Beside this comment, I am still scratching my head regarding what the user
> gets with ioprio_get(). If I understood your patches correctly, the user may
> still see IOPRIO_CLASS_NONE ?
> For that case, to be in sync with the man page, I thought the returned
> ioprio should be the effective one based on the task io nice value, that is,
> the value returned by get_current_ioprio(). Am I missing something... ?

The trouble with returning "effective ioprio" is that with IOPRIO_WHO_PGRP
or IOPRIO_WHO_USER the effective IO priority may be different for different
processes considered and it can be also further influenced by blk-ioprio
settings. But thinking about it now after things have settled I agree that
what you suggests makes more sense. I'll fix that. Thanks for suggestion!

								Honza
Jan Kara June 16, 2022, 12:24 p.m. UTC | #3
On Thu 16-06-22 13:23:03, Jan Kara wrote:
> On Thu 16-06-22 12:15:25, Damien Le Moal wrote:
> > On 6/16/22 01:16, Jan Kara wrote:
> > > +	if (ioprio_class == IOPRIO_CLASS_NONE)
> > > +		bio->bi_ioprio = get_current_ioprio();
> > >   }
> > >   /**
> > 
> > Beside this comment, I am still scratching my head regarding what the user
> > gets with ioprio_get(). If I understood your patches correctly, the user may
> > still see IOPRIO_CLASS_NONE ?
> > For that case, to be in sync with the man page, I thought the returned
> > ioprio should be the effective one based on the task io nice value, that is,
> > the value returned by get_current_ioprio(). Am I missing something... ?
> 
> The trouble with returning "effective ioprio" is that with IOPRIO_WHO_PGRP
> or IOPRIO_WHO_USER the effective IO priority may be different for different
> processes considered and it can be also further influenced by blk-ioprio
> settings. But thinking about it now after things have settled I agree that
> what you suggests makes more sense. I'll fix that. Thanks for suggestion!

Oh, now I've remembered why I've done it that way. With IOPRIO_WHO_PROCESS
(which is probably the most used and the best defined variant), we were
returning IOPRIO_CLASS_NONE if the task didn't have set IO priority until
commit e70344c05995 ("block: fix default IO priority handling"). So my
patch was just making behavior of IOPRIO_WHO_PGRP & IOPRIO_WHO_USER
consistent with the behavior of IOPRIO_WHO_PROCESS. I'd be reluctant to
change the behavior of IOPRIO_WHO_PROCESS because that has the biggest
chances for userspace regressions. But perhaps it makes sense to keep
IOPRIO_WHO_PGRP & IOPRIO_WHO_USER inconsistent with IOPRIO_WHO_PROCESS and
just use effective IO priority in those two variants. That looks like the
smallest API change to make things at least somewhat sensible...

								Honza

> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
Damien Le Moal June 17, 2022, 12:04 a.m. UTC | #4
On 6/16/22 21:24, Jan Kara wrote:
> On Thu 16-06-22 13:23:03, Jan Kara wrote:
>> On Thu 16-06-22 12:15:25, Damien Le Moal wrote:
>>> On 6/16/22 01:16, Jan Kara wrote:
>>>> +	if (ioprio_class == IOPRIO_CLASS_NONE)
>>>> +		bio->bi_ioprio = get_current_ioprio();
>>>>   }
>>>>   /**
>>>
>>> Beside this comment, I am still scratching my head regarding what the user
>>> gets with ioprio_get(). If I understood your patches correctly, the user may
>>> still see IOPRIO_CLASS_NONE ?
>>> For that case, to be in sync with the man page, I thought the returned
>>> ioprio should be the effective one based on the task io nice value, that is,
>>> the value returned by get_current_ioprio(). Am I missing something... ?
>>
>> The trouble with returning "effective ioprio" is that with IOPRIO_WHO_PGRP
>> or IOPRIO_WHO_USER the effective IO priority may be different for different
>> processes considered and it can be also further influenced by blk-ioprio
>> settings. But thinking about it now after things have settled I agree that
>> what you suggests makes more sense. I'll fix that. Thanks for suggestion!
> 
> Oh, now I've remembered why I've done it that way. With IOPRIO_WHO_PROCESS
> (which is probably the most used and the best defined variant), we were
> returning IOPRIO_CLASS_NONE if the task didn't have set IO priority until
> commit e70344c05995 ("block: fix default IO priority handling"). So my
> patch was just making behavior of IOPRIO_WHO_PGRP & IOPRIO_WHO_USER
> consistent with the behavior of IOPRIO_WHO_PROCESS. I'd be reluctant to
> change the behavior of IOPRIO_WHO_PROCESS because that has the biggest
> chances for userspace regressions. But perhaps it makes sense to keep
> IOPRIO_WHO_PGRP & IOPRIO_WHO_USER inconsistent with IOPRIO_WHO_PROCESS and
> just use effective IO priority in those two variants. That looks like the
> smallest API change to make things at least somewhat sensible...

Still bit lost. Let me try to summarize your goal:

1) If IOPRIO_WHO_PGRP is not set, ioprio_get(IOPRIO_WHO_PGRP) will return
the effective priority

2) If IOPRIO_WHO_USER is not set, ioprio_get(IOPRIO_WHO_USER) will also
return the effective priority

3) if IOPRIO_WHO_PROCESS is not set, return ? I am lost for this one. Do
you want to go back to IOPRIO_CLASS_NONE ? Keep default (IOPRIO_CLASS_BE)
? Or switch to using the effective IO priority ? Not that the last 2
choices are actually equivalent if the user did not IO nice the process
(the default for the effective IO prio is class BE)

For (1) and (2), I am not sure. Given that my last changes to the ioprio
default did not seem to have bothered anyone (nobody screamed at me :)) I
am tempted to say: any choice is OK. So we should try to get as close as
the man page defined behavior as possible.
Damien Le Moal June 17, 2022, 12:05 a.m. UTC | #5
On 6/16/22 20:23, Jan Kara wrote:
> On Thu 16-06-22 12:15:25, Damien Le Moal wrote:
>> On 6/16/22 01:16, Jan Kara wrote:
>>> Currently, IO priority set in task's IO context is not reflected in the
>>> bio->bi_ioprio for most IO (only io_uring and direct IO set it). This
>>> results in odd results where process is submitting some bios with one
>>> priority and other bios with a different (unset) priority and due to
>>> differing priorities bios cannot be merged. Make sure bio->bi_ioprio is
>>> always set on bio submission.
>>>
>>> Signed-off-by: Jan Kara <jack@suse.cz>
>>> ---
>>>   block/blk-mq.c | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index e17d822e6051..e976f696babc 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -2793,7 +2793,13 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
>>>   static void bio_set_ioprio(struct bio *bio)
>>>   {
>>> +	unsigned short ioprio_class;
>>> +
>>>   	blkcg_set_ioprio(bio);
>>
>> Shouldn't this function set the default using the below "if" code ?
>>
>>> +	ioprio_class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
>>> +	/* Nobody set ioprio so far? Initialize it based on task's nice value */
>>
>> I do not think that the ioprio_class variable is useful.
>> This can be:
>>
>> 	if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE)
>> 		bio->bi_ioprio = get_current_ioprio();
> 
> You are right. Fixed.

What about moving this inside blkcg_set_ioprio() ? bio_set_ioprio() would
then not be needed at all and blkcg_set_ioprio() called directly ?

> 
>>> +	if (ioprio_class == IOPRIO_CLASS_NONE)
>>> +		bio->bi_ioprio = get_current_ioprio();
>>>   }
>>>   /**
>>
>> Beside this comment, I am still scratching my head regarding what the user
>> gets with ioprio_get(). If I understood your patches correctly, the user may
>> still see IOPRIO_CLASS_NONE ?
>> For that case, to be in sync with the man page, I thought the returned
>> ioprio should be the effective one based on the task io nice value, that is,
>> the value returned by get_current_ioprio(). Am I missing something... ?
> 
> The trouble with returning "effective ioprio" is that with IOPRIO_WHO_PGRP
> or IOPRIO_WHO_USER the effective IO priority may be different for different
> processes considered and it can be also further influenced by blk-ioprio
> settings. But thinking about it now after things have settled I agree that
> what you suggests makes more sense. I'll fix that. Thanks for suggestion!
> 
> 								Honza
Jan Kara June 17, 2022, 11:49 a.m. UTC | #6
On Fri 17-06-22 09:04:34, Damien Le Moal wrote:
> On 6/16/22 21:24, Jan Kara wrote:
> > On Thu 16-06-22 13:23:03, Jan Kara wrote:
> >> On Thu 16-06-22 12:15:25, Damien Le Moal wrote:
> >>> On 6/16/22 01:16, Jan Kara wrote:
> >>>> +	if (ioprio_class == IOPRIO_CLASS_NONE)
> >>>> +		bio->bi_ioprio = get_current_ioprio();
> >>>>   }
> >>>>   /**
> >>>
> >>> Beside this comment, I am still scratching my head regarding what the user
> >>> gets with ioprio_get(). If I understood your patches correctly, the user may
> >>> still see IOPRIO_CLASS_NONE ?
> >>> For that case, to be in sync with the man page, I thought the returned
> >>> ioprio should be the effective one based on the task io nice value, that is,
> >>> the value returned by get_current_ioprio(). Am I missing something... ?
> >>
> >> The trouble with returning "effective ioprio" is that with IOPRIO_WHO_PGRP
> >> or IOPRIO_WHO_USER the effective IO priority may be different for different
> >> processes considered and it can be also further influenced by blk-ioprio
> >> settings. But thinking about it now after things have settled I agree that
> >> what you suggests makes more sense. I'll fix that. Thanks for suggestion!
> > 
> > Oh, now I've remembered why I've done it that way. With IOPRIO_WHO_PROCESS
> > (which is probably the most used and the best defined variant), we were
> > returning IOPRIO_CLASS_NONE if the task didn't have set IO priority until
> > commit e70344c05995 ("block: fix default IO priority handling"). So my
> > patch was just making behavior of IOPRIO_WHO_PGRP & IOPRIO_WHO_USER
> > consistent with the behavior of IOPRIO_WHO_PROCESS. I'd be reluctant to
> > change the behavior of IOPRIO_WHO_PROCESS because that has the biggest
> > chances for userspace regressions. But perhaps it makes sense to keep
> > IOPRIO_WHO_PGRP & IOPRIO_WHO_USER inconsistent with IOPRIO_WHO_PROCESS and
> > just use effective IO priority in those two variants. That looks like the
> > smallest API change to make things at least somewhat sensible...
> 
> Still bit lost. Let me try to summarize your goal:
> 
> 1) If IOPRIO_WHO_PGRP is not set, ioprio_get(IOPRIO_WHO_PGRP) will return
> the effective priority

You make it sound here like IOPRIO_WHO_PGRP would be some different type of
IO priority. For record it is not, there's just one IO priority per task,
if you set ioprio with IOPRIO_WHO_PGRP, it will just iterate all the tasks
in PGRP and set IO priority for each task. After my patches,
ioprio_get(IOPRIO_WHO_PGRPIO) will return the best of the effective IO
priorities of tasks within PGRP. Before my patch it was doing the same but
if IO priority was unset for some task it considered it to be CLASS_BE,4.

> 2) If IOPRIO_WHO_USER is not set, ioprio_get(IOPRIO_WHO_USER) will also
> return the effective priority.

This is the same as above. Just the calls iterate over all tasks of the
given user...

> 3) if IOPRIO_WHO_PROCESS is not set, return ? I am lost for this one. Do
> you want to go back to IOPRIO_CLASS_NONE ? Keep default (IOPRIO_CLASS_BE)
> ? Or switch to using the effective IO priority ? Not that the last 2
> choices are actually equivalent if the user did not IO nice the process
> (the default for the effective IO prio is class BE)
 
I want to go back to returning IOPRIO_CLASS_NONE for tasks with unset IO
priority.

> For (1) and (2), I am not sure. Given that my last changes to the ioprio
> default did not seem to have bothered anyone (nobody screamed at me :)) I
> am tempted to say: any choice is OK. So we should try to get as close as
> the man page defined behavior as possible.

I also don't find (1) and (2) too important. (3) is IMHO somewhat important
and I think that the reason why nobody complained about the change there is
because your change is relatively new so it didn't propagate yet to any
widely used distro kernel...

								Honza
Jan Kara June 17, 2022, 11:52 a.m. UTC | #7
On Fri 17-06-22 09:05:12, Damien Le Moal wrote:
> On 6/16/22 20:23, Jan Kara wrote:
> > On Thu 16-06-22 12:15:25, Damien Le Moal wrote:
> >> On 6/16/22 01:16, Jan Kara wrote:
> >>> Currently, IO priority set in task's IO context is not reflected in the
> >>> bio->bi_ioprio for most IO (only io_uring and direct IO set it). This
> >>> results in odd results where process is submitting some bios with one
> >>> priority and other bios with a different (unset) priority and due to
> >>> differing priorities bios cannot be merged. Make sure bio->bi_ioprio is
> >>> always set on bio submission.
> >>>
> >>> Signed-off-by: Jan Kara <jack@suse.cz>
> >>> ---
> >>>   block/blk-mq.c | 6 ++++++
> >>>   1 file changed, 6 insertions(+)
> >>>
> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >>> index e17d822e6051..e976f696babc 100644
> >>> --- a/block/blk-mq.c
> >>> +++ b/block/blk-mq.c
> >>> @@ -2793,7 +2793,13 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
> >>>   static void bio_set_ioprio(struct bio *bio)
> >>>   {
> >>> +	unsigned short ioprio_class;
> >>> +
> >>>   	blkcg_set_ioprio(bio);
> >>
> >> Shouldn't this function set the default using the below "if" code ?
> >>
> >>> +	ioprio_class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> >>> +	/* Nobody set ioprio so far? Initialize it based on task's nice value */
> >>
> >> I do not think that the ioprio_class variable is useful.
> >> This can be:
> >>
> >> 	if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE)
> >> 		bio->bi_ioprio = get_current_ioprio();
> > 
> > You are right. Fixed.
> 
> What about moving this inside blkcg_set_ioprio() ? bio_set_ioprio() would
> then not be needed at all and blkcg_set_ioprio() called directly ?

What I dislike about this is that it is counterintuitive that
blkcg_set_prio() does anything when blkcgs are disabled in kernel config
(and it would have to to keep things working). So if you dislike
bio_set_ioprio(), I can just opencode this function in its single call
site. Would that be better?

								Honza
Damien Le Moal June 17, 2022, noon UTC | #8
On 6/17/22 20:52, Jan Kara wrote:
> On Fri 17-06-22 09:05:12, Damien Le Moal wrote:
>> On 6/16/22 20:23, Jan Kara wrote:
>>> On Thu 16-06-22 12:15:25, Damien Le Moal wrote:
>>>> On 6/16/22 01:16, Jan Kara wrote:
>>>>> Currently, IO priority set in task's IO context is not reflected in the
>>>>> bio->bi_ioprio for most IO (only io_uring and direct IO set it). This
>>>>> results in odd results where process is submitting some bios with one
>>>>> priority and other bios with a different (unset) priority and due to
>>>>> differing priorities bios cannot be merged. Make sure bio->bi_ioprio is
>>>>> always set on bio submission.
>>>>>
>>>>> Signed-off-by: Jan Kara <jack@suse.cz>
>>>>> ---
>>>>>   block/blk-mq.c | 6 ++++++
>>>>>   1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index e17d822e6051..e976f696babc 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -2793,7 +2793,13 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
>>>>>   static void bio_set_ioprio(struct bio *bio)
>>>>>   {
>>>>> +	unsigned short ioprio_class;
>>>>> +
>>>>>   	blkcg_set_ioprio(bio);
>>>>
>>>> Shouldn't this function set the default using the below "if" code ?
>>>>
>>>>> +	ioprio_class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
>>>>> +	/* Nobody set ioprio so far? Initialize it based on task's nice value */
>>>>
>>>> I do not think that the ioprio_class variable is useful.
>>>> This can be:
>>>>
>>>> 	if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE)
>>>> 		bio->bi_ioprio = get_current_ioprio();
>>>
>>> You are right. Fixed.
>>
>> What about moving this inside blkcg_set_ioprio() ? bio_set_ioprio() would
>> then not be needed at all and blkcg_set_ioprio() called directly ?
> 
> What I dislike about this is that it is counterintuitive that
> blkcg_set_prio() does anything when blkcgs are disabled in kernel config
> (and it would have to to keep things working). So if you dislike
> bio_set_ioprio(), I can just opencode this function in its single call
> site. Would that be better?

Ah, yes, blkcg may be disabled. Got it. So sure, bio_set_ioprio() is fine
as is.

> 
> 								Honza
Damien Le Moal June 17, 2022, 12:03 p.m. UTC | #9
On 6/17/22 20:49, Jan Kara wrote:
> On Fri 17-06-22 09:04:34, Damien Le Moal wrote:
>> On 6/16/22 21:24, Jan Kara wrote:
>>> On Thu 16-06-22 13:23:03, Jan Kara wrote:
>>>> On Thu 16-06-22 12:15:25, Damien Le Moal wrote:
>>>>> On 6/16/22 01:16, Jan Kara wrote:
>>>>>> +	if (ioprio_class == IOPRIO_CLASS_NONE)
>>>>>> +		bio->bi_ioprio = get_current_ioprio();
>>>>>>   }
>>>>>>   /**
>>>>>
>>>>> Beside this comment, I am still scratching my head regarding what the user
>>>>> gets with ioprio_get(). If I understood your patches correctly, the user may
>>>>> still see IOPRIO_CLASS_NONE ?
>>>>> For that case, to be in sync with the man page, I thought the returned
>>>>> ioprio should be the effective one based on the task io nice value, that is,
>>>>> the value returned by get_current_ioprio(). Am I missing something... ?
>>>>
>>>> The trouble with returning "effective ioprio" is that with IOPRIO_WHO_PGRP
>>>> or IOPRIO_WHO_USER the effective IO priority may be different for different
>>>> processes considered and it can be also further influenced by blk-ioprio
>>>> settings. But thinking about it now after things have settled I agree that
>>>> what you suggests makes more sense. I'll fix that. Thanks for suggestion!
>>>
>>> Oh, now I've remembered why I've done it that way. With IOPRIO_WHO_PROCESS
>>> (which is probably the most used and the best defined variant), we were
>>> returning IOPRIO_CLASS_NONE if the task didn't have set IO priority until
>>> commit e70344c05995 ("block: fix default IO priority handling"). So my
>>> patch was just making behavior of IOPRIO_WHO_PGRP & IOPRIO_WHO_USER
>>> consistent with the behavior of IOPRIO_WHO_PROCESS. I'd be reluctant to
>>> change the behavior of IOPRIO_WHO_PROCESS because that has the biggest
>>> chances for userspace regressions. But perhaps it makes sense to keep
>>> IOPRIO_WHO_PGRP & IOPRIO_WHO_USER inconsistent with IOPRIO_WHO_PROCESS and
>>> just use effective IO priority in those two variants. That looks like the
>>> smallest API change to make things at least somewhat sensible...
>>
>> Still bit lost. Let me try to summarize your goal:
>>
>> 1) If IOPRIO_WHO_PGRP is not set, ioprio_get(IOPRIO_WHO_PGRP) will return
>> the effective priority
> 
> You make it sound here like IOPRIO_WHO_PGRP would be some different type of
> IO priority. For record it is not, there's just one IO priority per task,
> if you set ioprio with IOPRIO_WHO_PGRP, it will just iterate all the tasks
> in PGRP and set IO priority for each task. After my patches,
> ioprio_get(IOPRIO_WHO_PGRPIO) will return the best of the effective IO
> priorities of tasks within PGRP. Before my patch it was doing the same but
> if IO priority was unset for some task it considered it to be CLASS_BE,4.

OK. Got it. Thanks for clarifying.

> 
>> 2) If IOPRIO_WHO_USER is not set, ioprio_get(IOPRIO_WHO_USER) will also
>> return the effective priority.
> 
> This is the same as above. Just the calls iterate over all tasks of the
> given user...
> 
>> 3) if IOPRIO_WHO_PROCESS is not set, return ? I am lost for this one. Do
>> you want to go back to IOPRIO_CLASS_NONE ? Keep default (IOPRIO_CLASS_BE)
>> ? Or switch to using the effective IO priority ? Not that the last 2
>> choices are actually equivalent if the user did not IO nice the process
>> (the default for the effective IO prio is class BE)
>  
> I want to go back to returning IOPRIO_CLASS_NONE for tasks with unset IO
> priority.

And that would be to retain the older (broken) behavior. Because if we
consider the man page, tasks with an unset IO prio should be reported as
having the effective IO nice based priority, which is class BE if IO nice
is not set. Right ? I am OK with that, but I think we should add this
explanation as a comment somewhere in the prio code. No ?

> 
>> For (1) and (2), I am not sure. Given that my last changes to the ioprio
>> default did not seem to have bothered anyone (nobody screamed at me :)) I
>> am tempted to say: any choice is OK. So we should try to get as close as
>> the man page defined behavior as possible.
> 
> I also don't find (1) and (2) too important. (3) is IMHO somewhat important
> and I think that the reason why nobody complained about the change there is
> because your change is relatively new so it didn't propagate yet to any
> widely used distro kernel...

Indeed.

> 
> 								Honza
Jan Kara June 17, 2022, 2:54 p.m. UTC | #10
On Fri 17-06-22 21:03:45, Damien Le Moal wrote:
> On 6/17/22 20:49, Jan Kara wrote:
> > On Fri 17-06-22 09:04:34, Damien Le Moal wrote:
> >> 2) If IOPRIO_WHO_USER is not set, ioprio_get(IOPRIO_WHO_USER) will also
> >> return the effective priority.
> > 
> > This is the same as above. Just the calls iterate over all tasks of the
> > given user...
> > 
> >> 3) if IOPRIO_WHO_PROCESS is not set, return ? I am lost for this one. Do
> >> you want to go back to IOPRIO_CLASS_NONE ? Keep default (IOPRIO_CLASS_BE)
> >> ? Or switch to using the effective IO priority ? Not that the last 2
> >> choices are actually equivalent if the user did not IO nice the process
> >> (the default for the effective IO prio is class BE)
> >  
> > I want to go back to returning IOPRIO_CLASS_NONE for tasks with unset IO
> > priority.
> 
> And that would be to retain the older (broken) behavior. Because if we
> consider the man page, tasks with an unset IO prio should be reported as
> having the effective IO nice based priority, which is class BE if IO nice
> is not set. Right ? I am OK with that, but I think we should add this
> explanation as a comment somewhere in the prio code. No ?

Adding a comment regarding this is certainly a good idea, I'll do that. WRT
whether the old behavior is broken or not - I actually think the old
behavior is more useful because it allows userspace to distinguish a
situation when IO priority is set based on nice value from a situation when
IO priority is set to a fixed value. Also the old behavior makes

  ioprio_set(pid, IOPRIO_WHO_PROCESS, ioprio_get(pid, IOPRIO_WHO_PROCESS))

a noop which is IMO a good property to have for a get/set APIs.

								Honza
diff mbox series

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e17d822e6051..e976f696babc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2793,7 +2793,13 @@  static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
 
 static void bio_set_ioprio(struct bio *bio)
 {
+	unsigned short ioprio_class;
+
 	blkcg_set_ioprio(bio);
+	ioprio_class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
+	/* Nobody set ioprio so far? Initialize it based on task's nice value */
+	if (ioprio_class == IOPRIO_CLASS_NONE)
+		bio->bi_ioprio = get_current_ioprio();
 }
 
 /**