
[2/4] block/mq-deadline: serialize request dispatching

Message ID 20240119160338.1191281-3-axboe@kernel.dk (mailing list archive)
State New, archived
Series mq-deadline scalability improvements

Commit Message

Jens Axboe Jan. 19, 2024, 4:02 p.m. UTC
If we're entering request dispatch but someone else is already
dispatching, just skip this dispatch. We know IO is in flight, and any
completion will trigger another dispatch event. This may result in a
slightly lower queue depth for contended cases, but those are slowed
down anyway and this should not cause an issue.

By itself, this patch doesn't help a whole lot, as the reduction in
dispatch lock contention is simply eaten up by the same dd->lock now
seeing increased insertion contention. But it's required groundwork for
being able to reduce the lock contention in general.
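The dispatch gate described above can be sketched in userspace with C11
atomics (a hypothetical model for illustration, not the kernel code:
test_bit() maps to a relaxed load, test_and_set_bit() to an atomic
exchange, and the function names below are made up):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Model of the DD_DISPATCHING gate: the first caller wins the flag and
 * dispatches; concurrent callers bail out immediately instead of piling
 * up on dd->lock. */
static atomic_bool dispatching;

static bool try_enter_dispatch(void)
{
	/* Cheap read-only test first, mirroring the test_bit() before
	 * test_and_set_bit(): avoids dirtying the cacheline when the
	 * flag is already set. */
	if (atomic_load_explicit(&dispatching, memory_order_relaxed))
		return false;
	/* Exchange returns the previous value; a previous value of true
	 * means someone else got there first, so skip this dispatch. */
	return !atomic_exchange(&dispatching, true);
}

static void exit_dispatch(void)
{
	/* Release ordering so stores done while dispatching are visible
	 * before the flag is observed clear on other CPUs. */
	atomic_store_explicit(&dispatching, false, memory_order_release);
}
```

A skipped caller simply returns NULL from dispatch; the blk-mq core
retries via the next completion event, which is why the gate is safe.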

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/mq-deadline.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

Comments

Bart Van Assche Jan. 19, 2024, 11:24 p.m. UTC | #1
On 1/19/24 08:02, Jens Axboe wrote:
> +	/*
> +	 * If someone else is already dispatching, skip this one. This will
> +	 * defer the next dispatch event to when something completes, and could
> +	 * potentially lower the queue depth for contended cases.
> +	 *
> +	 * See the logic in blk_mq_do_dispatch_sched(), which loops and
> +	 * retries if nothing is dispatched.
> +	 */
> +	if (test_bit(DD_DISPATCHING, &dd->run_state) ||
> +	    test_and_set_bit(DD_DISPATCHING, &dd->run_state))
> +		return NULL;
> +
>   	spin_lock(&dd->lock);
>   	rq = dd_dispatch_prio_aged_requests(dd, now);
>   	if (rq)
> @@ -616,6 +635,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>   	}
>   
>   unlock:
> +	clear_bit(DD_DISPATCHING, &dd->run_state);
>   	spin_unlock(&dd->lock);

 From Documentation/memory-barriers.txt: "These are also used for atomic RMW
bitop functions that do not imply a memory barrier (such as set_bit and
clear_bit)." Does this mean that CPUs with a weak memory model (e.g. ARM)
are allowed to execute the clear_bit() call earlier than where it occurs in
the code? I think that spin_trylock() has "acquire" semantics and also that
"spin_unlock()" has release semantics. While a CPU is allowed to execute
clear_bit() before the memory operations that come before it, I don't think
that is the case for spin_unlock(). See also
tools/memory-model/Documentation/locking.txt.

Thanks,

Bart.
Jens Axboe Jan. 20, 2024, midnight UTC | #2
On 1/19/24 4:24 PM, Bart Van Assche wrote:
> On 1/19/24 08:02, Jens Axboe wrote:
>> +    /*
>> +     * If someone else is already dispatching, skip this one. This will
>> +     * defer the next dispatch event to when something completes, and could
>> +     * potentially lower the queue depth for contended cases.
>> +     *
>> +     * See the logic in blk_mq_do_dispatch_sched(), which loops and
>> +     * retries if nothing is dispatched.
>> +     */
>> +    if (test_bit(DD_DISPATCHING, &dd->run_state) ||
>> +        test_and_set_bit(DD_DISPATCHING, &dd->run_state))
>> +        return NULL;
>> +
>>       spin_lock(&dd->lock);
>>       rq = dd_dispatch_prio_aged_requests(dd, now);
>>       if (rq)
>> @@ -616,6 +635,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
>>       }
>>     unlock:
>> +    clear_bit(DD_DISPATCHING, &dd->run_state);
>>       spin_unlock(&dd->lock);
> 
> From Documentation/memory-barriers.txt: "These are also used for atomic RMW
> bitop functions that do not imply a memory barrier (such as set_bit and
> clear_bit)." Does this mean that CPUs with a weak memory model (e.g. ARM)
> are allowed to execute the clear_bit() call earlier than where it occurs in
> the code? I think that spin_trylock() has "acquire" semantics and also that
> "spin_unlock()" has release semantics. While a CPU is allowed to execute
> clear_bit() before the memory operations that come before it, I don't think
> that is the case for spin_unlock(). See also
> tools/memory-model/Documentation/locking.txt.

Not sure why I didn't do it upfront, but they just need to be the _lock
variants of the bitops. I'll make that change.
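For reference, the _lock bitop variants pair acquire semantics on the set
with release semantics on the clear. A userspace analog using C11 atomics
(hypothetical function names modeling test_and_set_bit_lock() and
clear_bit_unlock(), not the kernel implementation):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Analog of the _lock bitop variants: acquire on set, release on clear,
 * so work done inside the guarded region cannot be reordered past the
 * clear on weakly ordered CPUs (the concern raised in the review). */
static bool test_and_set_bit_lock_model(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	/* fetch_or with acquire ordering: later loads/stores cannot be
	 * reordered above a successful set. Returns the old bit value. */
	return atomic_fetch_or_explicit(addr, mask,
					memory_order_acquire) & mask;
}

static void clear_bit_unlock_model(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	/* fetch_and with release ordering: earlier loads/stores cannot
	 * be reordered below the clear. */
	atomic_fetch_and_explicit(addr, ~mask, memory_order_release);
}

static atomic_ulong run_state;
```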

Patch

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 9b7563e9d638..b579ce282176 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -79,10 +79,20 @@  struct dd_per_prio {
 	struct io_stats_per_prio stats;
 };
 
+enum {
+	DD_DISPATCHING	= 0,
+};
+
 struct deadline_data {
 	/*
 	 * run time data
 	 */
+	struct {
+		spinlock_t lock;
+		spinlock_t zone_lock;
+	} ____cacheline_aligned_in_smp;
+
+	unsigned long run_state;
 
 	struct dd_per_prio per_prio[DD_PRIO_COUNT];
 
@@ -100,9 +110,6 @@  struct deadline_data {
 	int front_merges;
 	u32 async_depth;
 	int prio_aging_expire;
-
-	spinlock_t lock;
-	spinlock_t zone_lock;
 };
 
 /* Maps an I/O priority class to a deadline scheduler priority. */
@@ -600,6 +607,18 @@  static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	struct request *rq;
 	enum dd_prio prio;
 
+	/*
+	 * If someone else is already dispatching, skip this one. This will
+	 * defer the next dispatch event to when something completes, and could
+	 * potentially lower the queue depth for contended cases.
+	 *
+	 * See the logic in blk_mq_do_dispatch_sched(), which loops and
+	 * retries if nothing is dispatched.
+	 */
+	if (test_bit(DD_DISPATCHING, &dd->run_state) ||
+	    test_and_set_bit(DD_DISPATCHING, &dd->run_state))
+		return NULL;
+
 	spin_lock(&dd->lock);
 	rq = dd_dispatch_prio_aged_requests(dd, now);
 	if (rq)
@@ -616,6 +635,7 @@  static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	}
 
 unlock:
+	clear_bit(DD_DISPATCHING, &dd->run_state);
 	spin_unlock(&dd->lock);
 
 	return rq;
@@ -706,6 +726,9 @@  static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
 
 	eq->elevator_data = dd;
 
+	spin_lock_init(&dd->lock);
+	spin_lock_init(&dd->zone_lock);
+
 	for (prio = 0; prio <= DD_PRIO_MAX; prio++) {
 		struct dd_per_prio *per_prio = &dd->per_prio[prio];
 
@@ -722,8 +745,6 @@  static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
 	dd->last_dir = DD_WRITE;
 	dd->fifo_batch = fifo_batch;
 	dd->prio_aging_expire = prio_aging_expire;
-	spin_lock_init(&dd->lock);
-	spin_lock_init(&dd->zone_lock);
 
 	/* We dispatch from request queue wide instead of hw queue */
 	blk_queue_flag_set(QUEUE_FLAG_SQ_SCHED, q);