diff mbox series

[BUGFIX/IMPROVEMENT,2/6] block, bfq: put reqs of waker and woken in dispatch list

Message ID 20210126105102.53102-3-paolo.valente@linaro.org (mailing list archive)
State New, archived
Headers show
Series block, bfq: third and last batch of fixes and improvements | expand

Commit Message

Paolo Valente Jan. 26, 2021, 10:50 a.m. UTC
Consider a new I/O request that arrives for a bfq_queue bfqq. If, when
this happens, the only active bfq_queues are bfqq and either its waker
bfq_queue or one of its woken bfq_queues, then there is no point in
queueing this new I/O request in bfqq for service. In fact, the
in-service queue and bfqq agree on serving this new I/O request as
soon as possible. So this commit puts this new I/O request directly
into the dispatch list.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

Comments

Jens Axboe Jan. 26, 2021, 4:18 p.m. UTC | #1
On 1/26/21 3:50 AM, Paolo Valente wrote:
> Consider a new I/O request that arrives for a bfq_queue bfqq. If, when
> this happens, the only active bfq_queues are bfqq and either its waker
> bfq_queue or one of its woken bfq_queues, then there is no point in
> queueing this new I/O request in bfqq for service. In fact, the
> in-service queue and bfqq agree on serving this new I/O request as
> soon as possible. So this commit puts this new I/O request directly
> into the dispatch list.
> 
> Tested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
> ---
>  block/bfq-iosched.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index a83149407336..e5b83910fbe0 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -5640,7 +5640,22 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  
>  	spin_lock_irq(&bfqd->lock);
>  	bfqq = bfq_init_rq(rq);
> -	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
> +
> +	/*
> +	 * Additional case for putting rq directly into the dispatch
> +	 * queue: the only active bfq_queues are bfqq and either its
> +	 * waker bfq_queue or one of its woken bfq_queues. In this
> +	 * case, there is no point in queueing rq in bfqq for
> +	 * service. In fact, the in-service queue and bfqq agree on
> +	 * serving this new I/O request as soon as possible.
> +	 */
> +	if (!bfqq ||
> +	    (bfqq != bfqd->in_service_queue &&
> +	     bfqd->in_service_queue != NULL &&
> +	     bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) &&
> +	     (bfqq->waker_bfqq == bfqd->in_service_queue ||
> +	      bfqd->in_service_queue->waker_bfqq == bfqq)) ||
> +	    at_head || blk_rq_is_passthrough(rq)) {
>  		if (at_head)
>  			list_add(&rq->queuelist, &bfqd->dispatch);
>  		else
> 

This is unreadable... Just seems like you are piling heuristics in to
catch some case, and it's neither readable nor clean.
Paolo Valente Jan. 28, 2021, 5:54 p.m. UTC | #2
> Il giorno 26 gen 2021, alle ore 17:18, Jens Axboe <axboe@kernel.dk> ha scritto:
> 
> On 1/26/21 3:50 AM, Paolo Valente wrote:
>> Consider a new I/O request that arrives for a bfq_queue bfqq. If, when
>> this happens, the only active bfq_queues are bfqq and either its waker
>> bfq_queue or one of its woken bfq_queues, then there is no point in
>> queueing this new I/O request in bfqq for service. In fact, the
>> in-service queue and bfqq agree on serving this new I/O request as
>> soon as possible. So this commit puts this new I/O request directly
>> into the dispatch list.
>> 
>> Tested-by: Jan Kara <jack@suse.cz>
>> Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
>> ---
>> block/bfq-iosched.c | 17 ++++++++++++++++-
>> 1 file changed, 16 insertions(+), 1 deletion(-)
>> 
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index a83149407336..e5b83910fbe0 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -5640,7 +5640,22 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>> 
>> 	spin_lock_irq(&bfqd->lock);
>> 	bfqq = bfq_init_rq(rq);
>> -	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>> +
>> +	/*
>> +	 * Additional case for putting rq directly into the dispatch
>> +	 * queue: the only active bfq_queues are bfqq and either its
>> +	 * waker bfq_queue or one of its woken bfq_queues. In this
>> +	 * case, there is no point in queueing rq in bfqq for
>> +	 * service. In fact, the in-service queue and bfqq agree on
>> +	 * serving this new I/O request as soon as possible.
>> +	 */
>> +	if (!bfqq ||
>> +	    (bfqq != bfqd->in_service_queue &&
>> +	     bfqd->in_service_queue != NULL &&
>> +	     bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) &&
>> +	     (bfqq->waker_bfqq == bfqd->in_service_queue ||
>> +	      bfqd->in_service_queue->waker_bfqq == bfqq)) ||
>> +	    at_head || blk_rq_is_passthrough(rq)) {
>> 		if (at_head)
>> 			list_add(&rq->queuelist, &bfqd->dispatch);
>> 		else
>> 
> 
> This is unreadable... Just seems like you are piling heuristics in to
> catch some case, and it's neither readable nor clean.
> 

Yeah, these comments inappropriately assume that the reader knows the
waker mechanism in depth.  And they do not stress at all how important
this improvement is.

I'll do my best to improve these comments.

To try to do a better job, let me also explain the matter early here.
Maybe you or others can give me some early feedback (or just tell me
to proceed).

This change is one of the main improvements that boosted
throughput in Jan's tests.  Here is the rationale:
- consider a bfq_queue, say Q1, detected as a waker of another
  bfq_queue, say Q2
- by definition of a waker, Q1 blocks the I/O of Q2, i.e., some I/O of
  of Q1 needs to be completed for new I/O of Q1 to arrive.  A notable
  example is journald
- so, Q1 and Q2 are in any respect two cooperating processes: if the
  service of Q1's I/O is delayed, Q2 can only suffer from it.
  Conversely, if Q2's I/O is delayed, the purpose of Q1 is just defeated.
- as a consequence if some I/O of Q1/Q2 arrives while Q2/Q1 is the
  only queue in service, there is absolutely no point in delaying the
  service of such an I/O.  The only possible result is a throughput
  loss, detected by Jan's test
- so, when the above condition holds, the most effective and efficient
  action is to put the new I/O directly in the dispatch list
- as an additional restriction, Q1 and Q2 must be the only busy queues
  for this commit to put the I/O of Q2/Q1 in the dispatch list.  This is
  necessary, because, if also other queues are waiting for service, then
  putting new I/O directly in the dispatch list may evidently cause a
  violation of service guarantees for the other queues

If these comments make things clearer, then I'll put them in the
commit message and the code, and I'll proceed with a V2.

Thanks,
Paolo


> -- 
> Jens Axboe
>
Paolo Valente Feb. 3, 2021, 11:01 a.m. UTC | #3
> Il giorno 28 gen 2021, alle ore 18:54, Paolo Valente <paolo.valente@linaro.org> ha scritto:
> 
> 
> 
>> Il giorno 26 gen 2021, alle ore 17:18, Jens Axboe <axboe@kernel.dk> ha scritto:
>> 
>> On 1/26/21 3:50 AM, Paolo Valente wrote:
>>> Consider a new I/O request that arrives for a bfq_queue bfqq. If, when
>>> this happens, the only active bfq_queues are bfqq and either its waker
>>> bfq_queue or one of its woken bfq_queues, then there is no point in
>>> queueing this new I/O request in bfqq for service. In fact, the
>>> in-service queue and bfqq agree on serving this new I/O request as
>>> soon as possible. So this commit puts this new I/O request directly
>>> into the dispatch list.
>>> 
>>> Tested-by: Jan Kara <jack@suse.cz>
>>> Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
>>> ---
>>> block/bfq-iosched.c | 17 ++++++++++++++++-
>>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>> index a83149407336..e5b83910fbe0 100644
>>> --- a/block/bfq-iosched.c
>>> +++ b/block/bfq-iosched.c
>>> @@ -5640,7 +5640,22 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>> 
>>> 	spin_lock_irq(&bfqd->lock);
>>> 	bfqq = bfq_init_rq(rq);
>>> -	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>>> +
>>> +	/*
>>> +	 * Additional case for putting rq directly into the dispatch
>>> +	 * queue: the only active bfq_queues are bfqq and either its
>>> +	 * waker bfq_queue or one of its woken bfq_queues. In this
>>> +	 * case, there is no point in queueing rq in bfqq for
>>> +	 * service. In fact, the in-service queue and bfqq agree on
>>> +	 * serving this new I/O request as soon as possible.
>>> +	 */
>>> +	if (!bfqq ||
>>> +	    (bfqq != bfqd->in_service_queue &&
>>> +	     bfqd->in_service_queue != NULL &&
>>> +	     bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) &&
>>> +	     (bfqq->waker_bfqq == bfqd->in_service_queue ||
>>> +	      bfqd->in_service_queue->waker_bfqq == bfqq)) ||
>>> +	    at_head || blk_rq_is_passthrough(rq)) {
>>> 		if (at_head)
>>> 			list_add(&rq->queuelist, &bfqd->dispatch);
>>> 		else
>>> 
>> 
>> This is unreadable... Just seems like you are piling heuristics in to
>> catch some case, and it's neither readable nor clean.
>> 
> 
> Yeah, these comments inappropriately assume that the reader knows the
> waker mechanism in depth.  And they do not stress at all how important
> this improvement is.
> 
> I'll do my best to improve these comments.
> 
> To try to do a better job, let me also explain the matter early here.
> Maybe you or others can give me some early feedback (or just tell me
> to proceed).
> 
> This change is one of the main improvements that boosted
> throughput in Jan's tests.  Here is the rationale:
> - consider a bfq_queue, say Q1, detected as a waker of another
>  bfq_queue, say Q2
> - by definition of a waker, Q1 blocks the I/O of Q2, i.e., some I/O of
>  of Q1 needs to be completed for new I/O of Q1 to arrive.  A notable
>  example is journald
> - so, Q1 and Q2 are in any respect two cooperating processes: if the
>  service of Q1's I/O is delayed, Q2 can only suffer from it.
>  Conversely, if Q2's I/O is delayed, the purpose of Q1 is just defeated.
> - as a consequence if some I/O of Q1/Q2 arrives while Q2/Q1 is the
>  only queue in service, there is absolutely no point in delaying the
>  service of such an I/O.  The only possible result is a throughput
>  loss, detected by Jan's test
> - so, when the above condition holds, the most effective and efficient
>  action is to put the new I/O directly in the dispatch list
> - as an additional restriction, Q1 and Q2 must be the only busy queues
>  for this commit to put the I/O of Q2/Q1 in the dispatch list.  This is
>  necessary, because, if also other queues are waiting for service, then
>  putting new I/O directly in the dispatch list may evidently cause a
>  violation of service guarantees for the other queues
> 
> If these comments make things clearer, then I'll put them in the
> commit message and the code, and I'll proceed with a V2.
> 

Hi Jens,
may I proceed with a V2?

Thanks,
Paolo

> Thanks,
> Paolo
> 
> 
>> -- 
>> Jens Axboe
Jan Kara Feb. 3, 2021, 11:43 a.m. UTC | #4
On Thu 28-01-21 18:54:05, Paolo Valente wrote:
> 
> 
> > Il giorno 26 gen 2021, alle ore 17:18, Jens Axboe <axboe@kernel.dk> ha scritto:
> > 
> > On 1/26/21 3:50 AM, Paolo Valente wrote:
> >> Consider a new I/O request that arrives for a bfq_queue bfqq. If, when
> >> this happens, the only active bfq_queues are bfqq and either its waker
> >> bfq_queue or one of its woken bfq_queues, then there is no point in
> >> queueing this new I/O request in bfqq for service. In fact, the
> >> in-service queue and bfqq agree on serving this new I/O request as
> >> soon as possible. So this commit puts this new I/O request directly
> >> into the dispatch list.
> >> 
> >> Tested-by: Jan Kara <jack@suse.cz>
> >> Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
> >> ---
> >> block/bfq-iosched.c | 17 ++++++++++++++++-
> >> 1 file changed, 16 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> >> index a83149407336..e5b83910fbe0 100644
> >> --- a/block/bfq-iosched.c
> >> +++ b/block/bfq-iosched.c
> >> @@ -5640,7 +5640,22 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >> 
> >> 	spin_lock_irq(&bfqd->lock);
> >> 	bfqq = bfq_init_rq(rq);
> >> -	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
> >> +
> >> +	/*
> >> +	 * Additional case for putting rq directly into the dispatch
> >> +	 * queue: the only active bfq_queues are bfqq and either its
> >> +	 * waker bfq_queue or one of its woken bfq_queues. In this
> >> +	 * case, there is no point in queueing rq in bfqq for
> >> +	 * service. In fact, the in-service queue and bfqq agree on
> >> +	 * serving this new I/O request as soon as possible.
> >> +	 */
> >> +	if (!bfqq ||
> >> +	    (bfqq != bfqd->in_service_queue &&
> >> +	     bfqd->in_service_queue != NULL &&
> >> +	     bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) &&
> >> +	     (bfqq->waker_bfqq == bfqd->in_service_queue ||
> >> +	      bfqd->in_service_queue->waker_bfqq == bfqq)) ||
> >> +	    at_head || blk_rq_is_passthrough(rq)) {
> >> 		if (at_head)
> >> 			list_add(&rq->queuelist, &bfqd->dispatch);
> >> 		else
> >> 
> > 
> > This is unreadable... Just seems like you are piling heuristics in to
> > catch some case, and it's neither readable nor clean.
> > 
> 
> Yeah, these comments inappropriately assume that the reader knows the
> waker mechanism in depth.  And they do not stress at all how important
> this improvement is.
> 
> I'll do my best to improve these comments.
> 
> To try to do a better job, let me also explain the matter early here.
> Maybe you or others can give me some early feedback (or just tell me
> to proceed).
> 
> This change is one of the main improvements that boosted
> throughput in Jan's tests.  Here is the rationale:
> - consider a bfq_queue, say Q1, detected as a waker of another
>   bfq_queue, say Q2
> - by definition of a waker, Q1 blocks the I/O of Q2, i.e., some I/O of
>   of Q1 needs to be completed for new I/O of Q1 to arrive.  A notable
					       ^^ Q2?

>   example is journald
> - so, Q1 and Q2 are in any respect two cooperating processes: if the
>   service of Q1's I/O is delayed, Q2 can only suffer from it.
>   Conversely, if Q2's I/O is delayed, the purpose of Q1 is just defeated.

What do you exactly mean by this last sentence?

> - as a consequence if some I/O of Q1/Q2 arrives while Q2/Q1 is the
>   only queue in service, there is absolutely no point in delaying the
>   service of such an I/O.  The only possible result is a throughput
>   loss, detected by Jan's test

If we are idling at that moment waiting for more IO from in service queue,
I agree. But that doesn't seem to be part of your condition above?

> - so, when the above condition holds, the most effective and efficient
>   action is to put the new I/O directly in the dispatch list
> - as an additional restriction, Q1 and Q2 must be the only busy queues
>   for this commit to put the I/O of Q2/Q1 in the dispatch list.  This is
>   necessary, because, if also other queues are waiting for service, then
>   putting new I/O directly in the dispatch list may evidently cause a
>   violation of service guarantees for the other queues

This last restriction is not ideal for cases like jbd2 thread since it may
still lead to pointless idling but I understand that without some
restriction like this several waking threads could just starve other ones.
So I guess it's fine for now.

								Honza
Paolo Valente Feb. 5, 2021, 10:16 a.m. UTC | #5
> Il giorno 3 feb 2021, alle ore 12:43, Jan Kara <jack@suse.cz> ha scritto:
> 
> On Thu 28-01-21 18:54:05, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 26 gen 2021, alle ore 17:18, Jens Axboe <axboe@kernel.dk> ha scritto:
>>> 
>>> On 1/26/21 3:50 AM, Paolo Valente wrote:
>>>> Consider a new I/O request that arrives for a bfq_queue bfqq. If, when
>>>> this happens, the only active bfq_queues are bfqq and either its waker
>>>> bfq_queue or one of its woken bfq_queues, then there is no point in
>>>> queueing this new I/O request in bfqq for service. In fact, the
>>>> in-service queue and bfqq agree on serving this new I/O request as
>>>> soon as possible. So this commit puts this new I/O request directly
>>>> into the dispatch list.
>>>> 
>>>> Tested-by: Jan Kara <jack@suse.cz>
>>>> Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
>>>> ---
>>>> block/bfq-iosched.c | 17 ++++++++++++++++-
>>>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>>> 
>>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>>> index a83149407336..e5b83910fbe0 100644
>>>> --- a/block/bfq-iosched.c
>>>> +++ b/block/bfq-iosched.c
>>>> @@ -5640,7 +5640,22 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>> 
>>>> 	spin_lock_irq(&bfqd->lock);
>>>> 	bfqq = bfq_init_rq(rq);
>>>> -	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>>>> +
>>>> +	/*
>>>> +	 * Additional case for putting rq directly into the dispatch
>>>> +	 * queue: the only active bfq_queues are bfqq and either its
>>>> +	 * waker bfq_queue or one of its woken bfq_queues. In this
>>>> +	 * case, there is no point in queueing rq in bfqq for
>>>> +	 * service. In fact, the in-service queue and bfqq agree on
>>>> +	 * serving this new I/O request as soon as possible.
>>>> +	 */
>>>> +	if (!bfqq ||
>>>> +	    (bfqq != bfqd->in_service_queue &&
>>>> +	     bfqd->in_service_queue != NULL &&
>>>> +	     bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) &&
>>>> +	     (bfqq->waker_bfqq == bfqd->in_service_queue ||
>>>> +	      bfqd->in_service_queue->waker_bfqq == bfqq)) ||
>>>> +	    at_head || blk_rq_is_passthrough(rq)) {
>>>> 		if (at_head)
>>>> 			list_add(&rq->queuelist, &bfqd->dispatch);
>>>> 		else
>>>> 
>>> 
>>> This is unreadable... Just seems like you are piling heuristics in to
>>> catch some case, and it's neither readable nor clean.
>>> 
>> 
>> Yeah, these comments inappropriately assume that the reader knows the
>> waker mechanism in depth.  And they do not stress at all how important
>> this improvement is.
>> 
>> I'll do my best to improve these comments.
>> 
>> To try to do a better job, let me also explain the matter early here.
>> Maybe you or others can give me some early feedback (or just tell me
>> to proceed).
>> 
>> This change is one of the main improvements that boosted
>> throughput in Jan's tests.  Here is the rationale:
>> - consider a bfq_queue, say Q1, detected as a waker of another
>>  bfq_queue, say Q2
>> - by definition of a waker, Q1 blocks the I/O of Q2, i.e., some I/O of
>>  of Q1 needs to be completed for new I/O of Q1 to arrive.  A notable
> 					       ^^ Q2?
> 

Yes, thank you!

(after this interaction, I'll fix and improve all this description,
according to your comments)

>>  example is journald
>> - so, Q1 and Q2 are in any respect two cooperating processes: if the
>>  service of Q1's I/O is delayed, Q2 can only suffer from it.
>>  Conversely, if Q2's I/O is delayed, the purpose of Q1 is just defeated.
> 
> What do you exactly mean by this last sentence?

By definition of waker, the purpose of Q1's I/O is doing what needs to
be done, so that new Q2's I/O can finally be issued.  Delaying Q2's I/O
is the opposite of this goal.

> 
>> - as a consequence if some I/O of Q1/Q2 arrives while Q2/Q1 is the
>>  only queue in service, there is absolutely no point in delaying the
>>  service of such an I/O.  The only possible result is a throughput
>>  loss, detected by Jan's test
> 
> If we are idling at that moment waiting for more IO from in service queue,
> I agree.

And I agree too, if the drive has no internal queueing, has no
parallelism or pipeline, or is at least one order of magnitude slower
than the CPU is processing I/O.  In all other cases, serving the I/O
of only one queue at a time means throwing away throughput.  For
example, on a consumer SSD, moving from one to two I/O threads served
in parallel usually means doubling the throughput.

So, the best thing to do, if all the above conditions are met, is to
have this new I/O dispatched as soon as possible.

The most efficient way to attain this goal is to just put the new I/O
directly into the dispatch list.

> But that doesn't seem to be part of your condition above?
> 
>> - so, when the above condition holds, the most effective and efficient
>>  action is to put the new I/O directly in the dispatch list
>> - as an additional restriction, Q1 and Q2 must be the only busy queues
>>  for this commit to put the I/O of Q2/Q1 in the dispatch list.  This is
>>  necessary, because, if also other queues are waiting for service, then
>>  putting new I/O directly in the dispatch list may evidently cause a
>>  violation of service guarantees for the other queues
> 
> This last restriction is not ideal for cases like jbd2 thread since it may
> still lead to pointless idling but I understand that without some
> restriction like this several waking threads could just starve other ones.

Yeah, the goal here is to reduce a little bit false positives.

> So I guess it's fine for now.
> 

Yes, hopefully experience will lead us to even improvements or even
better solutions.

Thanks,
Paolo

> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
Paolo Valente Feb. 9, 2021, 5:09 p.m. UTC | #6
> Il giorno 5 feb 2021, alle ore 11:16, Paolo Valente <paolo.valente@linaro.org> ha scritto:
> 
> 
> 
>> Il giorno 3 feb 2021, alle ore 12:43, Jan Kara <jack@suse.cz> ha scritto:
>> 
>> On Thu 28-01-21 18:54:05, Paolo Valente wrote:
>>> 
>>> 
>>>> Il giorno 26 gen 2021, alle ore 17:18, Jens Axboe <axboe@kernel.dk> ha scritto:
>>>> 
>>>> On 1/26/21 3:50 AM, Paolo Valente wrote:
>>>>> Consider a new I/O request that arrives for a bfq_queue bfqq. If, when
>>>>> this happens, the only active bfq_queues are bfqq and either its waker
>>>>> bfq_queue or one of its woken bfq_queues, then there is no point in
>>>>> queueing this new I/O request in bfqq for service. In fact, the
>>>>> in-service queue and bfqq agree on serving this new I/O request as
>>>>> soon as possible. So this commit puts this new I/O request directly
>>>>> into the dispatch list.
>>>>> 
>>>>> Tested-by: Jan Kara <jack@suse.cz>
>>>>> Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
>>>>> ---
>>>>> block/bfq-iosched.c | 17 ++++++++++++++++-
>>>>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>>>> index a83149407336..e5b83910fbe0 100644
>>>>> --- a/block/bfq-iosched.c
>>>>> +++ b/block/bfq-iosched.c
>>>>> @@ -5640,7 +5640,22 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>>> 
>>>>> 	spin_lock_irq(&bfqd->lock);
>>>>> 	bfqq = bfq_init_rq(rq);
>>>>> -	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>>>>> +
>>>>> +	/*
>>>>> +	 * Additional case for putting rq directly into the dispatch
>>>>> +	 * queue: the only active bfq_queues are bfqq and either its
>>>>> +	 * waker bfq_queue or one of its woken bfq_queues. In this
>>>>> +	 * case, there is no point in queueing rq in bfqq for
>>>>> +	 * service. In fact, the in-service queue and bfqq agree on
>>>>> +	 * serving this new I/O request as soon as possible.
>>>>> +	 */
>>>>> +	if (!bfqq ||
>>>>> +	    (bfqq != bfqd->in_service_queue &&
>>>>> +	     bfqd->in_service_queue != NULL &&
>>>>> +	     bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) &&
>>>>> +	     (bfqq->waker_bfqq == bfqd->in_service_queue ||
>>>>> +	      bfqd->in_service_queue->waker_bfqq == bfqq)) ||
>>>>> +	    at_head || blk_rq_is_passthrough(rq)) {
>>>>> 		if (at_head)
>>>>> 			list_add(&rq->queuelist, &bfqd->dispatch);
>>>>> 		else
>>>>> 
>>>> 
>>>> This is unreadable... Just seems like you are piling heuristics in to
>>>> catch some case, and it's neither readable nor clean.
>>>> 
>>> 
>>> Yeah, these comments inappropriately assume that the reader knows the
>>> waker mechanism in depth.  And they do not stress at all how important
>>> this improvement is.
>>> 
>>> I'll do my best to improve these comments.
>>> 
>>> To try to do a better job, let me also explain the matter early here.
>>> Maybe you or others can give me some early feedback (or just tell me
>>> to proceed).
>>> 
>>> This change is one of the main improvements that boosted
>>> throughput in Jan's tests.  Here is the rationale:
>>> - consider a bfq_queue, say Q1, detected as a waker of another
>>> bfq_queue, say Q2
>>> - by definition of a waker, Q1 blocks the I/O of Q2, i.e., some I/O of
>>> of Q1 needs to be completed for new I/O of Q1 to arrive.  A notable
>> 					       ^^ Q2?
>> 
> 
> Yes, thank you!
> 
> (after this interaction, I'll fix and improve all this description,
> according to your comments)
> 
>>> example is journald
>>> - so, Q1 and Q2 are in any respect two cooperating processes: if the
>>> service of Q1's I/O is delayed, Q2 can only suffer from it.
>>> Conversely, if Q2's I/O is delayed, the purpose of Q1 is just defeated.
>> 
>> What do you exactly mean by this last sentence?
> 
> By definition of waker, the purpose of Q1's I/O is doing what needs to
> be done, so that new Q2's I/O can finally be issued.  Delaying Q2's I/O
> is the opposite of this goal.
> 
>> 
>>> - as a consequence if some I/O of Q1/Q2 arrives while Q2/Q1 is the
>>> only queue in service, there is absolutely no point in delaying the
>>> service of such an I/O.  The only possible result is a throughput
>>> loss, detected by Jan's test
>> 
>> If we are idling at that moment waiting for more IO from in service queue,
>> I agree.
> 
> And I agree too, if the drive has no internal queueing, has no
> parallelism or pipeline, or is at least one order of magnitude slower
> than the CPU is processing I/O.  In all other cases, serving the I/O
> of only one queue at a time means throwing away throughput.  For
> example, on a consumer SSD, moving from one to two I/O threads served
> in parallel usually means doubling the throughput.
> 
> So, the best thing to do, if all the above conditions are met, is to
> have this new I/O dispatched as soon as possible.
> 
> The most efficient way to attain this goal is to just put the new I/O
> directly into the dispatch list.
> 
>> But that doesn't seem to be part of your condition above?
>> 
>>> - so, when the above condition holds, the most effective and efficient
>>> action is to put the new I/O directly in the dispatch list
>>> - as an additional restriction, Q1 and Q2 must be the only busy queues
>>> for this commit to put the I/O of Q2/Q1 in the dispatch list.  This is
>>> necessary, because, if also other queues are waiting for service, then
>>> putting new I/O directly in the dispatch list may evidently cause a
>>> violation of service guarantees for the other queues
>> 
>> This last restriction is not ideal for cases like jbd2 thread since it may
>> still lead to pointless idling but I understand that without some
>> restriction like this several waking threads could just starve other ones.
> 
> Yeah, the goal here is to reduce a little bit false positives.
> 
>> So I guess it's fine for now.
>> 
> 
> Yes, hopefully experience will lead us to even improvements or even
> better solutions.
> 

Hi Jens,
on a separate thread, Jan told me that my last reply, and therefore
also this patch are ok for him.  May I now proceed with a V2, in which
I'll report my extra comments? Or are there some other issues for you?

Thanks,
Paolo

> Thanks,
> Paolo
> 
>> 								Honza
>> -- 
>> Jan Kara <jack@suse.com>
>> SUSE Labs, CR
diff mbox series

Patch

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a83149407336..e5b83910fbe0 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5640,7 +5640,22 @@  static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	spin_lock_irq(&bfqd->lock);
 	bfqq = bfq_init_rq(rq);
-	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
+
+	/*
+	 * Additional case for putting rq directly into the dispatch
+	 * queue: the only active bfq_queues are bfqq and either its
+	 * waker bfq_queue or one of its woken bfq_queues. In this
+	 * case, there is no point in queueing rq in bfqq for
+	 * service. In fact, the in-service queue and bfqq agree on
+	 * serving this new I/O request as soon as possible.
+	 */
+	if (!bfqq ||
+	    (bfqq != bfqd->in_service_queue &&
+	     bfqd->in_service_queue != NULL &&
+	     bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) &&
+	     (bfqq->waker_bfqq == bfqd->in_service_queue ||
+	      bfqd->in_service_queue->waker_bfqq == bfqq)) ||
+	    at_head || blk_rq_is_passthrough(rq)) {
 		if (at_head)
 			list_add(&rq->queuelist, &bfqd->dispatch);
 		else