[1/2] blk-mq: Remove blk_mq_put_ctx()

Message ID 20190604181736.903-2-bvanassche@acm.org
State New
Series
  • Simplify blk-mq implementation

Commit Message

Bart Van Assche June 4, 2019, 6:17 p.m. UTC
No code that occurs between blk_mq_get_ctx() and blk_mq_put_ctx() depends
on preemption being disabled for its correctness. Since removing the CPU
preemption calls does not measurably affect performance, simplify the
blk-mq code by removing the blk_mq_put_ctx() function and also by not
disabling preemption in blk_mq_get_ctx().
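For reference, get_cpu() returns the current CPU id and disables preemption until the matching put_cpu(), while raw_smp_processor_id() only reads the CPU id. A simplified before/after sketch of the change (kernel-style C, not standalone-compilable):

```c
/* Before: the returned ctx was pinned to the submitting CPU because
 * get_cpu() disables preemption until the matching put_cpu().
 */
static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
{
	return __blk_mq_get_ctx(q, get_cpu());	/* disables preemption */
}

static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
{
	put_cpu();				/* re-enables preemption */
}

/* After: just a snapshot of the current CPU. The task may migrate
 * afterwards, which is fine because no caller relies on staying on
 * the same CPU between blk_mq_get_ctx() and the old blk_mq_put_ctx().
 */
static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
{
	return __blk_mq_get_ctx(q, raw_smp_processor_id());
}
```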

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-sched.c  |  5 +----
 block/blk-mq-tag.c    |  8 --------
 block/blk-mq.c        | 16 +++-------------
 block/blk-mq.h        |  7 +------
 block/kyber-iosched.c |  1 -
 5 files changed, 5 insertions(+), 32 deletions(-)

Comments

Christoph Hellwig June 8, 2019, 8:19 a.m. UTC | #1
On Tue, Jun 04, 2019 at 11:17:35AM -0700, Bart Van Assche wrote:
> No code that occurs between blk_mq_get_ctx() and blk_mq_put_ctx() depends
> on preemption being disabled for its correctness. Since removing the CPU
> preemption calls does not measurably affect performance, simplify the
> blk-mq code by removing the blk_mq_put_ctx() function and also by not
> disabling preemption in blk_mq_get_ctx().

I like the idea behind this, but I think it makes some small issues
we have in the current code even worse.  As far as I can tell the idea
behind this call was that we operate on the same blk_mq_ctx for the
duration of the I/O submission.  Now it should not matter which one,
that is we don't care if we get preempted, but it should stay the same.

To actually make this work properly we'll need to pass down the
blk_mq_ctx into the I/O scheduler merge path, which dereferences it.
Note that I also have an outstanding series to pass down an additional
parameter there (the bi_phys_segments removal) so we'll need to
coordinate that.
Bart Van Assche June 10, 2019, 10 p.m. UTC | #2
On 6/8/19 1:19 AM, Christoph Hellwig wrote:
> On Tue, Jun 04, 2019 at 11:17:35AM -0700, Bart Van Assche wrote:
>> No code that occurs between blk_mq_get_ctx() and blk_mq_put_ctx() depends
>> on preemption being disabled for its correctness. Since removing the CPU
>> preemption calls does not measurably affect performance, simplify the
>> blk-mq code by removing the blk_mq_put_ctx() function and also by not
>> disabling preemption in blk_mq_get_ctx().
> 
> I like the idea behind this, but I think it makes some small issues
> we have in the current code even worse.  As far as I can tell the idea
> behind this call was that we operate on the same blk_mq_ctx for the
> duration of the I/O submission.  Now it should not matter which one,
> that is we don't care if we get preempted, but it should stay the same.

Hi Christoph,

Can you clarify this? Isn't the goal of the rq->mq_ctx = data->ctx 
assignment in blk_mq_rq_ctx_init() to ensure that the same blk_mq_ctx is 
used during I/O submission? As you know, blk_mq_rq_ctx_init() is called 
immediately after a tag has been allocated.
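The assignment in question looks roughly like this (abridged sketch of blk_mq_rq_ctx_init(); the tag lookup is simplified and most field initialization is omitted):

```c
/* Abridged: the ctx chosen at allocation time is recorded in the
 * request itself, so a later preemption or CPU migration does not
 * change which software queue the request belongs to.
 */
static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
					  unsigned int tag, unsigned int op)
{
	struct request *rq = blk_mq_tags_from_data(data)->static_rqs[tag];

	rq->mq_ctx = data->ctx;		/* pin the software queue */
	rq->mq_hctx = data->hctx;	/* and the hardware queue */
	rq->cmd_flags = op;
	/* ... remaining field initialization elided ... */
	return rq;
}
```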

> To actually make this work properly we'll need to pass down the
> blk_mq_ctx into the I/O scheduler merge path, which dereferences it.
> Note that I also have an outstanding series to pass down an additional
> parameter there (the bi_phys_segments removal) so we'll need to
> coordinate that.

I can wait until Jens has queued your patch series.

Thanks,

Bart.
Christoph Hellwig June 11, 2019, 8:09 a.m. UTC | #3
On Mon, Jun 10, 2019 at 03:00:39PM -0700, Bart Van Assche wrote:
> On 6/8/19 1:19 AM, Christoph Hellwig wrote:
>> On Tue, Jun 04, 2019 at 11:17:35AM -0700, Bart Van Assche wrote:
>>> No code that occurs between blk_mq_get_ctx() and blk_mq_put_ctx() depends
>>> on preemption being disabled for its correctness. Since removing the CPU
>>> preemption calls does not measurably affect performance, simplify the
>>> blk-mq code by removing the blk_mq_put_ctx() function and also by not
>>> disabling preemption in blk_mq_get_ctx().
>>
>> I like the idea behind this, but I think it makes some small issues
>> we have in the current code even worse.  As far as I can tell the idea
>> behind this call was that we operate on the same blk_mq_ctx for the
>> duration of the I/O submission.  Now it should not matter which one,
>> that is we don't care if we get preempted, but it should stay the same.
>
> Hi Christoph,
>
> Can you clarify this? Isn't the goal of the rq->mq_ctx = data->ctx 
> assignment in blk_mq_rq_ctx_init() to ensure that the same blk_mq_ctx is 
> used during I/O submission?

Yes.  But we still have additional blk_mq_get_ctx calls that I was
concerned about.  Looking deeper, though, it seems like the additional ones are
just used locally for I/O scheduler merge decisions, and we should be ok
even if the context changes due to a preemption between the failed merge
and the request allocation.  That being said it would still be nice
to pass the ctx from __blk_mq_sched_bio_merge to ->bio_merge instead
of having to find it again in kyber_bio_merge, but that isn't urgent.
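Not part of this patch, but the follow-up Christoph suggests could look roughly like this (hypothetical signature change, kernel-style sketch):

```c
/* Hypothetical: ->bio_merge() takes the ctx the caller already holds. */
bool (*bio_merge)(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
		  struct bio *bio);

/* __blk_mq_sched_bio_merge() would then pass its ctx down ... */
if (e && e->type->ops.bio_merge)
	return e->type->ops.bio_merge(hctx, ctx, bio);

/* ... and kyber_bio_merge() would use that ctx directly instead of
 * calling blk_mq_get_ctx(hctx->queue) a second time.
 */
```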

Patch

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 74c6bb871f7e..0daadb8d2274 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -326,10 +326,8 @@  bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
 	bool ret = false;
 	enum hctx_type type;
 
-	if (e && e->type->ops.bio_merge) {
-		blk_mq_put_ctx(ctx);
+	if (e && e->type->ops.bio_merge)
 		return e->type->ops.bio_merge(hctx, bio);
-	}
 
 	type = hctx->type;
 	if ((hctx->flags & BLK_MQ_F_SHOULD_MERGE) &&
@@ -340,7 +338,6 @@  bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio)
 		spin_unlock(&ctx->lock);
 	}
 
-	blk_mq_put_ctx(ctx);
 	return ret;
 }
 
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 7513c8eaabee..da19f0bc8876 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -113,7 +113,6 @@  unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	struct sbq_wait_state *ws;
 	DEFINE_SBQ_WAIT(wait);
 	unsigned int tag_offset;
-	bool drop_ctx;
 	int tag;
 
 	if (data->flags & BLK_MQ_REQ_RESERVED) {
@@ -136,7 +135,6 @@  unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		return BLK_MQ_TAG_FAIL;
 
 	ws = bt_wait_ptr(bt, data->hctx);
-	drop_ctx = data->ctx == NULL;
 	do {
 		struct sbitmap_queue *bt_prev;
 
@@ -161,9 +159,6 @@  unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		if (tag != -1)
 			break;
 
-		if (data->ctx)
-			blk_mq_put_ctx(data->ctx);
-
 		bt_prev = bt;
 		io_schedule();
 
@@ -189,9 +184,6 @@  unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		ws = bt_wait_ptr(bt, data->hctx);
 	} while (1);
 
-	if (drop_ctx && data->ctx)
-		blk_mq_put_ctx(data->ctx);
-
 	sbitmap_finish_wait(bt, ws, &wait);
 
 found_tag:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 35948d45d7b9..00d572265aa3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -359,13 +359,13 @@  static struct request *blk_mq_get_request(struct request_queue *q,
 	struct elevator_queue *e = q->elevator;
 	struct request *rq;
 	unsigned int tag;
-	bool put_ctx_on_error = false;
+	bool clear_ctx_on_error = false;
 
 	blk_queue_enter_live(q);
 	data->q = q;
 	if (likely(!data->ctx)) {
 		data->ctx = blk_mq_get_ctx(q);
-		put_ctx_on_error = true;
+		clear_ctx_on_error = true;
 	}
 	if (likely(!data->hctx))
 		data->hctx = blk_mq_map_queue(q, data->cmd_flags,
@@ -391,10 +391,8 @@  static struct request *blk_mq_get_request(struct request_queue *q,
 
 	tag = blk_mq_get_tag(data);
 	if (tag == BLK_MQ_TAG_FAIL) {
-		if (put_ctx_on_error) {
-			blk_mq_put_ctx(data->ctx);
+		if (clear_ctx_on_error)
 			data->ctx = NULL;
-		}
 		blk_queue_exit(q);
 		return NULL;
 	}
@@ -431,8 +429,6 @@  struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,
 	if (!rq)
 		return ERR_PTR(-EWOULDBLOCK);
 
-	blk_mq_put_ctx(alloc_data.ctx);
-
 	rq->__data_len = 0;
 	rq->__sector = (sector_t) -1;
 	rq->bio = rq->biotail = NULL;
@@ -1975,7 +1971,6 @@  static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 
 	plug = current->plug;
 	if (unlikely(is_flush_fua)) {
-		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
 
 		/* bypass scheduler for flush rq */
@@ -1989,7 +1984,6 @@  static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		unsigned int request_count = plug->rq_count;
 		struct request *last = NULL;
 
-		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
 
 		if (!request_count)
@@ -2023,8 +2017,6 @@  static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		blk_add_rq_to_plug(plug, rq);
 		trace_block_plug(q);
 
-		blk_mq_put_ctx(data.ctx);
-
 		if (same_queue_rq) {
 			data.hctx = same_queue_rq->mq_hctx;
 			trace_block_unplug(q, 1, true);
@@ -2033,11 +2025,9 @@  static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		}
 	} else if ((q->nr_hw_queues > 1 && is_sync) || (!q->elevator &&
 			!data.hctx->dispatch_busy)) {
-		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
 		blk_mq_try_issue_directly(data.hctx, rq, &cookie);
 	} else {
-		blk_mq_put_ctx(data.ctx);
 		blk_mq_bio_to_request(rq, bio);
 		blk_mq_sched_insert_request(rq, false, true, true);
 	}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 633a5a77ee8b..f4bf5161333e 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -151,12 +151,7 @@  static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
  */
 static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
 {
-	return __blk_mq_get_ctx(q, get_cpu());
-}
-
-static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx)
-{
-	put_cpu();
+	return __blk_mq_get_ctx(q, raw_smp_processor_id());
 }
 
 struct blk_mq_alloc_data {
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index c3b05119cebd..71c7b07b645a 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -574,7 +574,6 @@  static bool kyber_bio_merge(struct blk_mq_hw_ctx *hctx, struct bio *bio)
 	spin_lock(&kcq->lock);
 	merged = blk_mq_bio_list_merge(hctx->queue, rq_list, bio);
 	spin_unlock(&kcq->lock);
-	blk_mq_put_ctx(ctx);
 
 	return merged;
 }