
[1/8] block: Provide icq in request allocation data

Message ID: 20211123103158.17284-1-jack@suse.cz
State: New, archived
Series: bfq: Limit number of allocated scheduler tags per cgroup

Commit Message

Jan Kara Nov. 23, 2021, 10:29 a.m. UTC
Currently we look up the ICQ only after the request is allocated.
However, BFQ will want to decide how many scheduler tags it allows a
given bfq queue (effectively a process) to consume based on cgroup
weight. So look up the ICQ earlier and provide it in struct
blk_mq_alloc_data so that BFQ can use it.

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-mq-sched.c | 18 ++++++++++--------
 block/blk-mq-sched.h |  3 ++-
 block/blk-mq.c       |  8 ++++----
 block/blk-mq.h       |  1 +
 4 files changed, 17 insertions(+), 13 deletions(-)
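
As a rough illustration of what the new data->icq field enables, here is a
sketch of a scheduler-side ->limit_depth() hook that consumes it. This is not
code from the series: example_limit_depth() and example_depth_for() are
made-up names, and the depth policy is only a placeholder for what the later
BFQ patches are expected to do.

static void example_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
{
	/* Filled in by __blk_mq_alloc_requests() before any tag is taken. */
	struct io_cq *icq = data->icq;

	if (!icq)
		return;		/* task has no io_context yet, keep the full depth */

	/*
	 * Because the icq is known before a scheduler tag is allocated, the
	 * elevator can map the request to its internal queue (and thus to a
	 * cgroup) and limit how deep this process may dip into the scheduler
	 * tag space. example_depth_for() stands in for that policy.
	 */
	data->shallow_depth = example_depth_for(icq, data->q);
}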

Comments

Jens Axboe Nov. 23, 2021, 4:06 p.m. UTC | #1
On 11/23/21 3:29 AM, Jan Kara wrote:
> Currently we look up the ICQ only after the request is allocated.
> However, BFQ will want to decide how many scheduler tags it allows a
> given bfq queue (effectively a process) to consume based on cgroup
> weight. So look up the ICQ earlier and provide it in struct
> blk_mq_alloc_data so that BFQ can use it.

I've been trying to clean this path up a bit, since I don't like having
something that just one scheduler needs in the fast path. See:

https://git.kernel.dk/cgit/linux-block/commit/?h=perf-wip&id=f1f8191a8f9a0cdcd5ad99dfd7e551e8f444bec5

It would be better if we could avoid adding io_cq to blk_mq_alloc_data
for that reason. Would it be possible to hide this away in the sched
code instead, on top of the above?
Jan Kara Nov. 23, 2021, 10:30 p.m. UTC | #2
On Tue 23-11-21 09:06:47, Jens Axboe wrote:
> On 11/23/21 3:29 AM, Jan Kara wrote:
> > Currently we look up the ICQ only after the request is allocated.
> > However, BFQ will want to decide how many scheduler tags it allows a
> > given bfq queue (effectively a process) to consume based on cgroup
> > weight. So look up the ICQ earlier and provide it in struct
> > blk_mq_alloc_data so that BFQ can use it.
> 
> I've been trying to clean this path up a bit, since I don't like having
> something that just one scheduler needs in the fast path. See:
> 
> https://git.kernel.dk/cgit/linux-block/commit/?h=perf-wip&id=f1f8191a8f9a0cdcd5ad99dfd7e551e8f444bec5
> 
> It would be better if we could avoid adding io_cq to blk_mq_alloc_data
> for that reason. Would it be possible to hide this away in the sched
> code instead, on top of the above?

Understood. We could certainly handle ICQ allocation & assignment only
inside BFQ. It would just mean we need to look up the ICQ once in
bfq_limit_depth() and then a second time in bfq_prepare_request(). I
guess that's not a huge deal given the amount of work BFQ does for each
request anyway. So can I pull the above commit into the series and
rebase this patch on top of it?

								Honza
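
For comparison, a rough sketch of the alternative described above: leave
struct blk_mq_alloc_data untouched and let BFQ resolve the icq itself in both
hooks. bfq_lookup_icq() is a hypothetical helper whose body mirrors
blk_mq_sched_lookup_icq() from the patch below; the two hooks are reduced to
the lookups themselves.

static struct io_cq *bfq_lookup_icq(struct request_queue *q)
{
	struct io_context *ioc = current->io_context;
	struct io_cq *icq;

	if (!ioc)
		return NULL;

	spin_lock_irq(&q->queue_lock);
	icq = ioc_lookup_icq(ioc, q);
	spin_unlock_irq(&q->queue_lock);

	return icq ? icq : ioc_create_icq(ioc, q, GFP_ATOMIC);
}

static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
{
	/* First lookup: used to pick data->shallow_depth for this process. */
	struct io_cq *icq = bfq_lookup_icq(data->q);

	/* ... derive the allowed depth from icq ... */
}

static void bfq_prepare_request(struct request *rq)
{
	/* Second lookup for the same task, as noted above. */
	struct io_cq *icq = bfq_lookup_icq(rq->q);

	/* ... get_io_context(icq->ioc) and assign rq->elv.icq ... */
}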
Jens Axboe Nov. 23, 2021, 11:10 p.m. UTC | #3
On 11/23/21 3:30 PM, Jan Kara wrote:
> On Tue 23-11-21 09:06:47, Jens Axboe wrote:
>> On 11/23/21 3:29 AM, Jan Kara wrote:
>>> Currently we look up the ICQ only after the request is allocated.
>>> However, BFQ will want to decide how many scheduler tags it allows a
>>> given bfq queue (effectively a process) to consume based on cgroup
>>> weight. So look up the ICQ earlier and provide it in struct
>>> blk_mq_alloc_data so that BFQ can use it.
>>
>> I've been trying to clean this path up a bit, since I don't like having
>> something that just one scheduler needs in the fast path. See:
>>
>> https://git.kernel.dk/cgit/linux-block/commit/?h=perf-wip&id=f1f8191a8f9a0cdcd5ad99dfd7e551e8f444bec5
>>
>> It would be better if we could avoid adding io_cq to blk_mq_alloc_data
>> for that reason. Would it be possible to hide this away in the sched
>> code instead, on top of the above?
> 
> Understood. We could certainly handle ICQ allocation & assignment only
> inside BFQ. It would just mean we need to look up the ICQ once in
> bfq_limit_depth() and then a second time in bfq_prepare_request(). I
> guess that's not a huge deal given the amount of work BFQ does for
> each request anyway.

Exactly. It's noise there, but it would not be in the general core.

> So can I pull the above commit into the series and
> rebase this patch on top of it?

It's in my for-5.17/block branch, but it might get rebased... For a
series of patches like this, basing on it would be fine, though, so you
should just do that.

Patch

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ba21449439cc..c4015b82a003 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -18,9 +18,8 @@ 
 #include "blk-mq-tag.h"
 #include "blk-wbt.h"
 
-void blk_mq_sched_assign_ioc(struct request *rq)
+struct io_cq *blk_mq_sched_lookup_icq(struct request_queue *q)
 {
-	struct request_queue *q = rq->q;
 	struct io_context *ioc;
 	struct io_cq *icq;
 
@@ -29,17 +28,20 @@  void blk_mq_sched_assign_ioc(struct request *rq)
 	 */
 	ioc = current->io_context;
 	if (!ioc)
-		return;
+		return NULL;
 
 	spin_lock_irq(&q->queue_lock);
 	icq = ioc_lookup_icq(ioc, q);
 	spin_unlock_irq(&q->queue_lock);
+	if (icq)
+		return icq;
+	return ioc_create_icq(ioc, q, GFP_ATOMIC);
+}
 
-	if (!icq) {
-		icq = ioc_create_icq(ioc, q, GFP_ATOMIC);
-		if (!icq)
-			return;
-	}
+void blk_mq_sched_assign_ioc(struct request *rq, struct io_cq *icq)
+{
+	if (!icq)
+		return;
 	get_io_context(icq->ioc);
 	rq->elv.icq = icq;
 }
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 25d1034952b6..40be707f69d7 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -8,7 +8,8 @@ 
 
 #define MAX_SCHED_RQ (16 * BLKDEV_DEFAULT_RQ)
 
-void blk_mq_sched_assign_ioc(struct request *rq);
+struct io_cq *blk_mq_sched_lookup_icq(struct request_queue *q);
+void blk_mq_sched_assign_ioc(struct request *rq, struct io_cq *icq);
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs, struct request **merged_request);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8799fa73ef34..69b224c76d7c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -388,9 +388,7 @@  static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 
 		if (!op_is_flush(data->cmd_flags) &&
 		    e->type->ops.prepare_request) {
-			if (e->type->icq_cache)
-				blk_mq_sched_assign_ioc(rq);
-
+			blk_mq_sched_assign_ioc(rq, data->icq);
 			e->type->ops.prepare_request(rq);
 			rq->rq_flags |= RQF_ELVPRIV;
 		}
@@ -449,7 +447,9 @@  static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 		struct elevator_queue *e = q->elevator;
 
 		data->rq_flags |= RQF_ELV;
-
+		if (!op_is_flush(data->cmd_flags) && e->type->icq_cache &&
+		    e->type->ops.prepare_request)
+			data->icq = blk_mq_sched_lookup_icq(q);
 		/*
 		 * Flush/passthrough requests are special and go directly to the
 		 * dispatch list. Don't include reserved tags in the
diff --git a/block/blk-mq.h b/block/blk-mq.h
index afcf9931a489..4e9cf92ca587 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -152,6 +152,7 @@  static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
 struct blk_mq_alloc_data {
 	/* input parameter */
 	struct request_queue *q;
+	struct io_cq *icq;
 	blk_mq_req_flags_t flags;
 	unsigned int shallow_depth;
 	unsigned int cmd_flags;