
[2/2] Revert "mq-deadline: Fix request accounting"

Message ID: 20210907142145.112096-3-Niklas.Cassel@wdc.com
State: New, archived
Series: don't call io scheduler callbacks for passthrough requests

Commit Message

Niklas Cassel Sept. 7, 2021, 2:21 p.m. UTC
From: Niklas Cassel <niklas.cassel@wdc.com>

This reverts commit b6d2b054e8baaee53fd2d4854c63cbf0f2c6262a.

blk-mq will no longer call the I/O scheduler .finish_request() callback
for requests that were never inserted to the I/O scheduler.

Therefore, we can remove the logic inside mq-deadline that was added to
work around the (no longer existing) quirky behavior of blk-mq.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
 block/mq-deadline.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

Comments

Bart Van Assche Sept. 7, 2021, 2:54 p.m. UTC | #1
On 9/7/21 7:21 AM, Niklas Cassel wrote:
> From: Niklas Cassel <niklas.cassel@wdc.com>
> 
> This reverts commit b6d2b054e8baaee53fd2d4854c63cbf0f2c6262a.
> 
> blk-mq will no longer call the I/O scheduler .finish_request() callback
> for requests that were never inserted to the I/O scheduler.
> 
> Therefore, we can remove the logic inside mq-deadline that was added to
> work around the (no longer existing) quirky behavior of blk-mq.
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> ---
>   block/mq-deadline.c | 16 +++++-----------
>   1 file changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index 7f3c3932b723..b2d1e3adcb39 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -678,7 +678,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>   
>   	prio = ioprio_class_to_prio[ioprio_class];
>   	dd_count(dd, inserted, prio);
> -	rq->elv.priv[0] = (void *)(uintptr_t)1;
>   
>   	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
>   		blk_mq_free_requests(&free);
> @@ -727,10 +726,12 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
>   	spin_unlock(&dd->lock);
>   }
>   
> -/* Callback from inside blk_mq_rq_ctx_init(). */
> +/*
> + * Nothing to do here. This is defined only to ensure that .finish_request
> + * method is called upon request completion.
> + */
>   static void dd_prepare_request(struct request *rq)
>   {
> -	rq->elv.priv[0] = NULL;
>   }

Please take a look at
https://lore.kernel.org/linux-block/18594aff-4a94-8a48-334c-f21ae32120c6@acm.org/
If dd_prepare_request() is removed I will have to reintroduce it.

Thanks,

Bart.
Bart Van Assche Sept. 7, 2021, 3:15 p.m. UTC | #2
On 9/7/21 7:21 AM, Niklas Cassel wrote:
> blk-mq will no longer call the I/O scheduler .finish_request() callback
> for requests that were never inserted to the I/O scheduler.

I do not agree. Even with patch 1/2 from this series applied, finish_request()
will still be called for requests inserted by blk_insert_cloned_request()
although these requests are never inserted to the I/O scheduler.

Bart.
Niklas Cassel Sept. 7, 2021, 4:07 p.m. UTC | #3
On Tue, Sep 07, 2021 at 07:54:09AM -0700, Bart Van Assche wrote:
> On 9/7/21 7:21 AM, Niklas Cassel wrote:
> > From: Niklas Cassel <niklas.cassel@wdc.com>
> > 
> > This reverts commit b6d2b054e8baaee53fd2d4854c63cbf0f2c6262a.
> > 
> > blk-mq will no longer call the I/O scheduler .finish_request() callback
> > for requests that were never inserted to the I/O scheduler.
> > 
> > Therefore, we can remove the logic inside mq-deadline that was added to
> > work around the (no longer existing) quirky behavior of blk-mq.
> > 
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > ---
> >   block/mq-deadline.c | 16 +++++-----------
> >   1 file changed, 5 insertions(+), 11 deletions(-)
> > 
> > diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> > index 7f3c3932b723..b2d1e3adcb39 100644
> > --- a/block/mq-deadline.c
> > +++ b/block/mq-deadline.c
> > @@ -678,7 +678,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
> >   	prio = ioprio_class_to_prio[ioprio_class];
> >   	dd_count(dd, inserted, prio);
> > -	rq->elv.priv[0] = (void *)(uintptr_t)1;
> >   	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
> >   		blk_mq_free_requests(&free);
> > @@ -727,10 +726,12 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
> >   	spin_unlock(&dd->lock);
> >   }
> > -/* Callback from inside blk_mq_rq_ctx_init(). */
> > +/*
> > + * Nothing to do here. This is defined only to ensure that .finish_request
> > + * method is called upon request completion.
> > + */
> >   static void dd_prepare_request(struct request *rq)
> >   {
> > -	rq->elv.priv[0] = NULL;
> >   }
> 
> Please take a look at
> https://lore.kernel.org/linux-block/18594aff-4a94-8a48-334c-f21ae32120c6@acm.org/
> If dd_prepare_request() is removed I will have to reintroduce it.

dd_prepare_request() itself is not removed, just the
rq->elv.priv[0] = NULL; inside dd_prepare_request().

If you need to modify dd_prepare_request() in a future
commit, that should be fine, no?

Without patch 1/2, e->type->ops.requeue_request() can get called
both for requests that bypassed the I/O scheduler, and for requests
that were inserted in the I/O scheduler.

See block/blk-mq-sched.h:blk_mq_sched_requeue_request():
if the RQF_ELVPRIV flag is not set, e->type->ops.requeue_request()
will not be called.
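
For reference, a minimal paraphrased sketch of that gating (not verbatim
upstream code, just the shape of the check):

static inline void blk_mq_sched_requeue_request(struct request *rq)
{
	struct request_queue *q = rq->q;
	struct elevator_queue *e = q->elevator;

	/* Only requests the elevator actually owns get the callback. */
	if ((rq->rq_flags & RQF_ELVPRIV) && e && e->type->ops.requeue_request)
		e->type->ops.requeue_request(rq);
}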

Perhaps you are having issues with requests that were inserted
in the scheduler, but later requeued?

If so, shouldn't these fixes help you, since you do not need to
worry about passthrough requests resulting in spurious calls to
the I/O scheduler callbacks?


Kind regards,
Niklas
Niklas Cassel Sept. 7, 2021, 4:28 p.m. UTC | #4
On Tue, Sep 07, 2021 at 08:15:03AM -0700, Bart Van Assche wrote:
> On 9/7/21 7:21 AM, Niklas Cassel wrote:
> > blk-mq will no longer call the I/O scheduler .finish_request() callback
> > for requests that were never inserted to the I/O scheduler.
> 
> I do not agree. Even with patch 1/2 from this series applied, finish_request()
> will still be called for requests inserted by blk_insert_cloned_request()
> although these requests are never inserted to the I/O scheduler.
> 
> Bart.

Hello Bart,


Looking at blk_mq_free_request(),
e->type->ops.finish_request() will only be called if RQF_ELVPRIV
is set.
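
An abridged sketch of that path (paraphrased, with the unrelated parts
omitted; not verbatim upstream code):

void blk_mq_free_request(struct request *rq)
{
	struct request_queue *q = rq->q;
	struct elevator_queue *e = q->elevator;

	if (rq->rq_flags & RQF_ELVPRIV) {
		/* Scheduler-owned requests get the finish callback ... */
		if (e && e->type->ops.finish_request)
			e->type->ops.finish_request(rq);
		/* ... and their io_context reference dropped. */
		if (rq->elv.icq) {
			put_io_context(rq->elv.icq->ioc);
			rq->elv.icq = NULL;
		}
	}
	/* tag release, accounting, etc. omitted */
}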

blk_insert_cloned_request() doesn't seem to allocate a request
itself, but instead takes an already cloned request.

So I guess it depends on how the supplied request was cloned.

I would assume if the original request doesn't have RQF_ELVPRIV set,
then neither will the cloned request?

I tried to look at blk_rq_prep_clone(), which seems to be a common
cloning function, but I don't see req->rq_flags being copied
(except for RQF_SPECIAL_PAYLOAD).

Anyway, I don't see how .finish_request() will be called in relation
to blk_insert_cloned_request(). Could you please help me out and
give me an example of a call chain where this can happen?


Kind regards,
Niklas
Bart Van Assche Sept. 7, 2021, 4:49 p.m. UTC | #5
On 9/7/21 9:07 AM, Niklas Cassel wrote:
> On Tue, Sep 07, 2021 at 07:54:09AM -0700, Bart Van Assche wrote:
>> Please take a look at
>> https://lore.kernel.org/linux-block/18594aff-4a94-8a48-334c-f21ae32120c6@acm.org/
>> If dd_prepare_request() is removed I will have to reintroduce it.
> 
> dd_prepare_request() itself is not removed, just the
> rq->elv.priv[0] = NULL; inside dd_prepare_request().
> 
> If you need to modify dd_prepare_request() in a future
> commit, that should be fine, no?
> 
> Without patch 1/2, e->type->ops.requeue_request() can get called
> both for requests that bypassed the I/O scheduler, and for requests
> that were inserted in the I/O scheduler.
> 
> See:
> block/blk-mq-sched.h:blk_mq_sched_requeue_request()
> If the RQF_ELVPRIV flag is not set, e->type->ops.requeue_request()
> will not be called.
> 
> Perhaps you are having issues with requests that were inserted
> in the scheduler, but later requeued?
> 
> If so, shouldn't these fixes help you, since you do not need to
> worry about passthrough requests resulting in spurious calls to
> the I/O scheduler callbacks?

In my comment I was indeed referring to requeued requests. If a request
is requeued, the scheduler insert callback function is called multiple
times. From my point of view this patch series doesn't help much, since
requeued requests are not addressed.

Thanks,

Bart.
Bart Van Assche Sept. 7, 2021, 5:12 p.m. UTC | #6
On 9/7/21 9:28 AM, Niklas Cassel wrote:
> On Tue, Sep 07, 2021 at 08:15:03AM -0700, Bart Van Assche wrote:
>> On 9/7/21 7:21 AM, Niklas Cassel wrote:
>>> blk-mq will no longer call the I/O scheduler .finish_request() callback
>>> for requests that were never inserted to the I/O scheduler.
>>
>> I do not agree. Even with patch 1/2 from this series applied, finish_request()
>> will still be called for requests inserted by blk_insert_cloned_request()
>> although these requests are never inserted to the I/O scheduler.
> 
> Looking at blk_mq_free_request(),
> e->type->ops.finish_request() will only be called if RQF_ELVPRIV
> is set.
> 
> blk_insert_cloned_request() doesn't seem to allocate a request
> itself, but instead takes an already cloned request.
> 
> So I guess it depends on how the supplied request was cloned.
> 
> I would assume if the original request doesn't have RQF_ELVPRIV set,
> then neither will the cloned request?
> 
> I tried to look at blk_rq_prep_clone(), which seems to be a common
> cloning function, but I don't see req->rq_flags being copied
> (except for RQF_SPECIAL_PAYLOAD).
> 
> Anyway, I don't see how .finish_request() will be called in relation
> to blk_insert_cloned_request(). Could you please help me out and
> give me an example of a call chain where this can happen?

Hi Niklas,

This is a bit outside my area of expertise. Anyway: map_request() calls
.clone_and_map_rq(). At least multipath_clone_and_map() calls
blk_get_request(). I think this shows that blk_insert_cloned_request()
may insert an entirely new request. Is my understanding correct that
blk_mq_rq_ctx_init() will set RQF_ELVPRIV for the cloned request if a
scheduler is associated with the request queue associated with the
cloned request?

Bart.
Niklas Cassel Sept. 8, 2021, 11:57 a.m. UTC | #7
On Tue, Sep 07, 2021 at 10:12:49AM -0700, Bart Van Assche wrote:
> On 9/7/21 9:28 AM, Niklas Cassel wrote:
> > On Tue, Sep 07, 2021 at 08:15:03AM -0700, Bart Van Assche wrote:
> > > On 9/7/21 7:21 AM, Niklas Cassel wrote:
> > > > blk-mq will no longer call the I/O scheduler .finish_request() callback
> > > > for requests that were never inserted to the I/O scheduler.
> > > 
> > > I do not agree. Even with patch 1/2 from this series applied, finish_request()
> > > will still be called for requests inserted by blk_insert_cloned_request()
> > > although these requests are never inserted to the I/O scheduler.
> > 
> > Looking at blk_mq_free_request(),
> > e->type->ops.finish_request() will only be called if RQF_ELVPRIV
> > is set.
> > 
> > blk_insert_cloned_request() doesn't seem to allocate a request
> > itself, but instead takes an already cloned request.
> > 
> > So I guess it depends on how the supplied request was cloned.
> > 
> > I would assume if the original request doesn't have RQF_ELVPRIV set,
> > then neither will the cloned request?
> > 
> > I tried to look at blk_rq_prep_clone(), which seems to be a common
> > cloning function, but I don't see req->rq_flags being copied
> > (except for RQF_SPECIAL_PAYLOAD).
> > 
> > Anyway, I don't see how .finish_request() will be called in relation
> > to blk_insert_cloned_request(). Could you please help me out and
> > give me an example of a call chain where this can happen?
> 
> Hi Niklas,
> 
> This is a bit outside my area of expertise. Anyway: map_request() calls
> .clone_and_map_rq(). At least multipath_clone_and_map() calls
> blk_get_request(). I think this shows that blk_insert_cloned_request()
> may insert an entirely new request. Is my understanding correct that
> blk_mq_rq_ctx_init() will set RQF_ELVPRIV for the cloned request if a
> scheduler is associated with the request queue associated with the
> cloned request?
> 
> Bart.

Thank you Bart.


dm-rq.c:map_request() calls .clone_and_map_rq()

one example of a .clone_and_map_rq() implementation is
multipath_clone_and_map().

multipath_clone_and_map(), to get a clone, simply calls blk_get_request(),
which calls blk_mq_alloc_request(), which calls __blk_mq_alloc_request(),
which calls blk_mq_rq_ctx_init(), which will set RQF_ELVPRIV as long as
the request is neither a flush nor a passthrough request.

dm-rq.c:dm_dispatch_clone_request() calls blk_insert_cloned_request(),
which will bypass the I/O scheduler.

So, a request cloned by e.g. drivers/md/dm-mpath.c will
bypass the I/O scheduler, but can still have the RQF_ELVPRIV flag set.
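
To make the flag setting concrete, a simplified sketch of the
allocation-side logic (the helper name is hypothetical and this is not
verbatim upstream code):

static void blk_mq_rq_init_elv(struct blk_mq_alloc_data *data,
			       struct request *rq)
{
	struct elevator_queue *e = data->q->elevator;

	/* Flush requests are skipped; with patch 1/2, passthrough
	 * requests are skipped as well. */
	if (op_is_flush(data->cmd_flags) ||
	    blk_op_is_passthrough(data->cmd_flags))
		return;

	if (e && e->type->ops.prepare_request) {
		e->type->ops.prepare_request(rq);
		/* Set at allocation time, even though a clone may later be
		 * dispatched via blk_insert_cloned_request() and bypass the
		 * elevator entirely. */
		rq->rq_flags |= RQF_ELVPRIV;
	}
}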



This just tells me that if someone wants to clean up this mess,
we have to do something similar to what I proposed in my initial RFC:
https://lore.kernel.org/linux-block/20210827124100.98112-2-Niklas.Cassel@wdc.com/

i.e., set the RQF_ELVPRIV flag immediately before calling
e->type->ops.insert_requests(), and only then.

It seems logical to simply set the flag before actually inserting the request
into the scheduler, rather than finding all the corner cases where we don't.
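
As a rough illustration of that idea (hypothetical helper name, assuming
the insert_requests() signature from struct elevator_mq_ops; this is not
the actual RFC patch):

static void blk_mq_sched_insert_list(struct blk_mq_hw_ctx *hctx,
				     struct list_head *list, bool at_head)
{
	struct elevator_queue *e = hctx->queue->elevator;
	struct request *rq;

	/* Mark elevator ownership only at the point of insertion. */
	list_for_each_entry(rq, list, queuelist)
		rq->rq_flags |= RQF_ELVPRIV;

	e->type->ops.insert_requests(hctx, list, at_head);
}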

Anyway, I'll leave this potential cleanup to people more familiar with the
blk-mq code for now.


Kind regards,
Niklas

Patch

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 7f3c3932b723..b2d1e3adcb39 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -678,7 +678,6 @@  static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	prio = ioprio_class_to_prio[ioprio_class];
 	dd_count(dd, inserted, prio);
-	rq->elv.priv[0] = (void *)(uintptr_t)1;
 
 	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
 		blk_mq_free_requests(&free);
@@ -727,10 +726,12 @@  static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 	spin_unlock(&dd->lock);
 }
 
-/* Callback from inside blk_mq_rq_ctx_init(). */
+/*
+ * Nothing to do here. This is defined only to ensure that .finish_request
+ * method is called upon request completion.
+ */
 static void dd_prepare_request(struct request *rq)
 {
-	rq->elv.priv[0] = NULL;
 }
 
 /*
@@ -757,14 +758,7 @@  static void dd_finish_request(struct request *rq)
 	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];
 
-	/*
-	 * The block layer core may call dd_finish_request() without having
-	 * called dd_insert_requests(). Hence only update statistics for
-	 * requests for which dd_insert_requests() has been called. See also
-	 * blk_mq_request_bypass_insert().
-	 */
-	if (rq->elv.priv[0])
-		dd_count(dd, completed, prio);
+	dd_count(dd, completed, prio);
 
 	if (blk_queue_is_zoned(q)) {
 		unsigned long flags;