mq-deadline: Fix request accounting

Message ID 20210824170520.1659173-1-bvanassche@acm.org (mailing list archive)
State New, archived
Series mq-deadline: Fix request accounting

Commit Message

Bart Van Assche Aug. 24, 2021, 5:05 p.m. UTC
The block layer may call the I/O scheduler .finish_request() callback
without having called the .insert_requests() callback. Make sure that the
mq-deadline I/O statistics are correct if the block layer inserts an I/O
request that bypasses the I/O scheduler. This patch prevents lower-priority
I/O from being delayed longer than necessary for mixed I/O priority
workloads.

Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Fixes: 08a9ad8bf607 ("block/mq-deadline: Add cgroup support")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/mq-deadline.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

Comments

Jens Axboe Aug. 24, 2021, 5:09 p.m. UTC | #1
On 8/24/21 11:05 AM, Bart Van Assche wrote:
> The block layer may call the I/O scheduler .finish_request() callback
> without having called the .insert_requests() callback. Make sure that the
> mq-deadline I/O statistics are correct if the block layer inserts an I/O
> request that bypasses the I/O scheduler. This patch prevents lower-priority
> I/O from being delayed longer than necessary for mixed I/O priority
> workloads.
> 
> Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com>
> Fixes: 08a9ad8bf607 ("block/mq-deadline: Add cgroup support")

This one is a little confusing, since that commit got reverted...
Yet the patch still applies.

Shouldn't it be:

Fixes: 38ba64d12d4c ("block/mq-deadline: Track I/O statistics")
Bart Van Assche Aug. 24, 2021, 5:13 p.m. UTC | #2
On 8/24/21 10:09 AM, Jens Axboe wrote:
> Shouldn't it be:
> 
> Fixes: 38ba64d12d4c ("block/mq-deadline: Track I/O statistics")

Right, that's what the Fixes line should look like. Do you want me to
resend this patch?

Thanks,

Bart.
Jens Axboe Aug. 24, 2021, 5:18 p.m. UTC | #3
On 8/24/21 11:13 AM, Bart Van Assche wrote:
> On 8/24/21 10:09 AM, Jens Axboe wrote:
>> Shouldn't it be:
>>
>> Fixes: 38ba64d12d4c ("block/mq-deadline: Track I/O statistics")
> 
> Right, that's what the Fixes line should look like. Do you want me to
> resend this patch?

Nah that's fine, I'll just correct it.
Niklas Cassel Aug. 24, 2021, 9:35 p.m. UTC | #4
On Tue, Aug 24, 2021 at 10:05:20AM -0700, Bart Van Assche wrote:
> The block layer may call the I/O scheduler .finish_request() callback
> without having called the .insert_requests() callback. Make sure that the
> mq-deadline I/O statistics are correct if the block layer inserts an I/O
> request that bypasses the I/O scheduler. This patch prevents lower-priority
> I/O from being delayed longer than necessary for mixed I/O priority
> workloads.
> 
> Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
> Cc: Damien Le Moal <damien.lemoal@wdc.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com>
> Fixes: 08a9ad8bf607 ("block/mq-deadline: Add cgroup support")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/mq-deadline.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index dbd74bae9f11..28f2e7655a5f 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -713,6 +713,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  
>  	prio = ioprio_class_to_prio[ioprio_class];
>  	dd_count(dd, inserted, prio);
> +	rq->elv.priv[0] = (void *)(uintptr_t)1;
>  
>  	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
>  		blk_mq_free_requests(&free);
> @@ -761,12 +762,10 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
>  	spin_unlock(&dd->lock);
>  }
>  
> -/*
> - * Nothing to do here. This is defined only to ensure that .finish_request
> - * method is called upon request completion.
> - */
> +/* Callback from inside blk_mq_rq_ctx_init(). */
>  static void dd_prepare_request(struct request *rq)
>  {
> +	rq->elv.priv[0] = NULL;
>  }
>  
>  /*
> @@ -793,7 +792,14 @@ static void dd_finish_request(struct request *rq)
>  	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
>  	struct dd_per_prio *per_prio = &dd->per_prio[prio];
>  
> -	dd_count(dd, completed, prio);
> +	/*
> +	 * The block layer core may call dd_finish_request() without having
> +	 * called dd_insert_requests(). Hence only update statistics for
> +	 * requests for which dd_insert_requests() has been called. See also
> +	 * blk_mq_request_bypass_insert().
> +	 */
> +	if (rq->elv.priv[0])
> +		dd_count(dd, completed, prio);
>  
>  	if (blk_queue_is_zoned(q)) {
>  		unsigned long flags;

Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
Bart Van Assche Aug. 27, 2021, 2:22 a.m. UTC | #5
On 8/24/21 2:35 PM, Niklas Cassel wrote:
> Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
> Tested-by: Niklas Cassel <niklas.cassel@wdc.com>

Hi Niklas,

Thank you for the help, that's really appreciated. Earlier today I discovered
that this patch does not handle requeuing correctly, so I have started testing
the patch below.

---
 block/mq-deadline.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index dbd74bae9f11..c4c3c2e3f446 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -712,7 +712,10 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	blk_req_zone_write_unlock(rq);

 	prio = ioprio_class_to_prio[ioprio_class];
-	dd_count(dd, inserted, prio);
+	if (!rq->elv.priv[0]) {
+		dd_count(dd, inserted, prio);
+		rq->elv.priv[0] = (void *)(uintptr_t)1;
+	}

 	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
 		blk_mq_free_requests(&free);
@@ -761,12 +764,10 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 	spin_unlock(&dd->lock);
 }

-/*
- * Nothing to do here. This is defined only to ensure that .finish_request
- * method is called upon request completion.
- */
+/* Callback from inside blk_mq_rq_ctx_init(). */
 static void dd_prepare_request(struct request *rq)
 {
+	rq->elv.priv[0] = NULL;
 }

 /*
@@ -793,6 +794,14 @@ static void dd_finish_request(struct request *rq)
 	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];

+	/*
+	 * The block layer core may call dd_finish_request() without having
+	 * called dd_insert_requests(). Skip requests that bypassed I/O
+	 * scheduling. See also blk_mq_request_bypass_insert().
+	 */
+	if (!rq->elv.priv[0])
+		return;
+
 	dd_count(dd, completed, prio);

 	if (blk_queue_is_zoned(q)) {
Niklas Cassel Aug. 27, 2021, 9:20 a.m. UTC | #6
On Thu, Aug 26, 2021 at 07:22:38PM -0700, Bart Van Assche wrote:
> On 8/24/21 2:35 PM, Niklas Cassel wrote:
> > Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
> > Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
> 
> Hi Niklas,
> 
> Thank you for the help, that's really appreciated. Earlier today I discovered
> that this patch does not handle requeuing correctly so I have started testing
> the patch below.

Since Jens has reverted "block/mq-deadline: Prioritize high-priority requests",
which was the culprit for the 10 second slowdowns in the first place, there is
no longer any urgency for you to make the counters behave properly on
requeues, since without that patch, the counters are not used for
decision-making anyway.

Looking more closely at how it is solved in BFQ:
they have both .finish_request and .requeue_request pointing to the same
function: bfq_finish_requeue_request().

bfq_finish_requeue_request() also sets rq->elv.priv[0] = NULL; at the end
of the function.
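
To make that concrete, a heavily simplified sketch of that wiring (not the
actual bfq-iosched.c code) looks roughly like this:

static void bfq_finish_requeue_request(struct request *rq)
{
	/* ... per-queue accounting and cleanup happens here in the real code ... */

	/* Clear the marker so a later call becomes a no-op. */
	rq->elv.priv[0] = NULL;
}

static struct elevator_type iosched_bfq_mq = {
	.ops = {
		/* Both callbacks funnel into the same function. */
		.finish_request		= bfq_finish_requeue_request,
		.requeue_request	= bfq_finish_requeue_request,
		/* ... */
	},
	/* ... */
};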

Also see the big note about how this really should be solved in blk-mq:
https://github.com/torvalds/linux/blob/master/block/bfq-iosched.c#L6456-L6462

It seems wrong to clutter multiple callback functions in both BFQ and
mq-deadline because of shortcomings in blk-mq.


It seems like the way forward is to add an RQF_ELEVATOR request flag, which
only gets set in req->rq_flags if e->type->ops.insert_requests() gets called
(by blk_mq_sched_insert_request() or blk_mq_sched_insert_requests()).
Then blk_mq_free_request() only calls ops.finish_request if RQF_ELEVATOR is
set, and similarly blk_mq_sched_requeue_request() only calls
ops.requeue_request if RQF_ELEVATOR is set (and perhaps also
blk_mq_sched_completed_request() for ops.completed_request).
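
On the completion/requeue side that could look roughly like the sketch below
(RQF_ELEVATOR is hypothetical and the helpers are simplified, so this is only
an illustration of the idea, not a patch):

static void sched_finish_request(struct request *rq)
{
	struct elevator_queue *e = rq->q->elevator;

	/* Only tell the elevator about requests it has actually seen. */
	if ((rq->rq_flags & RQF_ELEVATOR) && e->type->ops.finish_request)
		e->type->ops.finish_request(rq);
}

static void sched_requeue_request(struct request *rq)
{
	struct elevator_queue *e = rq->q->elevator;

	if ((rq->rq_flags & RQF_ELEVATOR) && e->type->ops.requeue_request)
		e->type->ops.requeue_request(rq);
}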

This should enable us to remove ugly if-statements from .insert_requests and
.finish_request for both mq-deadline and BFQ. (And BFQ can also remove ugly
if-statements and comments from its .requeue_request callback.)


And no, we cannot reuse RQF_ELVPRIV, since that flag gets set by
__blk_mq_alloc_request(), which is way too early a stage: at that point we
don't know whether the request will bypass the scheduler or not, so the flag
has to be set at insertion time.
(Since right now, a request that has RQF_ELVPRIV, and has thus been prepared
by ops.prepare_request, might still not get inserted into the scheduler.)

We should probably just remove RQF_ELVPRIV, and as a first step, simply have
blk_mq_sched_insert_request() and blk_mq_sched_insert_requests() call
ops.prepare_request immediately before ops.insert_requests. Currently, it seems
that all elevators only use .prepare_request to clear private fields; no
elevator has any real logic in its .prepare_request callback.
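
The insert side could then combine both ideas, for example (again only a
sketch using the hypothetical RQF_ELEVATOR flag, not actual blk-mq-sched.c
code):

static void sched_prepare_and_insert(struct blk_mq_hw_ctx *hctx,
				     struct request *rq, bool at_head)
{
	struct elevator_queue *e = rq->q->elevator;
	LIST_HEAD(list);

	/* Prepare only requests that the elevator will actually own. */
	if (e->type->ops.prepare_request)
		e->type->ops.prepare_request(rq);
	rq->rq_flags |= RQF_ELEVATOR;
	list_add(&rq->queuelist, &list);
	e->type->ops.insert_requests(hctx, &list, at_head);
}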


Kind regards,
Niklas

Patch

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index dbd74bae9f11..28f2e7655a5f 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -713,6 +713,7 @@  static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	prio = ioprio_class_to_prio[ioprio_class];
 	dd_count(dd, inserted, prio);
+	rq->elv.priv[0] = (void *)(uintptr_t)1;
 
 	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
 		blk_mq_free_requests(&free);
@@ -761,12 +762,10 @@  static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 	spin_unlock(&dd->lock);
 }
 
-/*
- * Nothing to do here. This is defined only to ensure that .finish_request
- * method is called upon request completion.
- */
+/* Callback from inside blk_mq_rq_ctx_init(). */
 static void dd_prepare_request(struct request *rq)
 {
+	rq->elv.priv[0] = NULL;
 }
 
 /*
@@ -793,7 +792,14 @@  static void dd_finish_request(struct request *rq)
 	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];
 
-	dd_count(dd, completed, prio);
+	/*
+	 * The block layer core may call dd_finish_request() without having
+	 * called dd_insert_requests(). Hence only update statistics for
+	 * requests for which dd_insert_requests() has been called. See also
+	 * blk_mq_request_bypass_insert().
+	 */
+	if (rq->elv.priv[0])
+		dd_count(dd, completed, prio);
 
 	if (blk_queue_is_zoned(q)) {
 		unsigned long flags;