diff mbox series

[v2,2/2] block/blk-mq: Don't complete locally if capacities are different

Message ID 20240223155749.2958009-3-qyousef@layalina.io (mailing list archive)
State New, archived
Headers show
Series sched: blk: Handle HMP systems when completing IO | expand

Commit Message

Qais Yousef Feb. 23, 2024, 3:57 p.m. UTC
The logic in blk_mq_complete_need_ipi() assumes SMP systems where all
CPUs have equal compute capacities and only LLC cache can make
a different on perceived performance. But this assumption falls apart on
HMP systems where LLC is shared, but the CPUs have different capacities.
Staying local then can have a big performance impact if the IO request
was done from a CPU with higher capacity but the interrupt is serviced
on a lower capacity CPU.

Use the new cpus_equal_capacity() function to check if we need to send
an IPI.

Without the patch I see the BLOCK softirq always running on little cores
(where the hardirq is serviced). With it I can see it running on all
cores.

This was noticed after the topology change [1] where now on a big.LITTLE
we truly get that the LLC is shared between all cores where as in the
past it was being misrepresented for historical reasons. The logic
exposed a missing dependency on capacities for such systems where there
can be a big performance difference between the CPUs.

This of course introduced a noticeable change in behavior depending on
how the topology is presented. Leading to regressions in some workloads
as the performance of the BLOCK softirq on littles can be noticeably
worse on some platforms.

Worth noting that we could have checked for capacities being greater
than or equal instead for equality. This will lead to favouring higher
performance always. But opted for equality instead to match the
performance of the requester without making an assumption that can lead
to power trade-offs which these systems tend to be sensitive about. If
the requester would like to run faster, it's better to rely on the
scheduler to give the IO requester via some facility to run on a faster
core; and then if the interrupt triggered on a CPU with different
capacity we'll make sure to match the performance the requester is
supposed to run at.

[1] https://lpc.events/event/16/contributions/1342/attachments/962/1883/LPC-2022-Android-MC-Phantom-Domains.pdf

Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
 block/blk-mq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Bart Van Assche Feb. 23, 2024, 4:09 p.m. UTC | #1
On 2/23/24 07:57, Qais Yousef wrote:
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 2dc01551e27c..ea69047e12f7 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1167,10 +1167,11 @@ static inline bool blk_mq_complete_need_ipi(struct request *rq)
>   	if (force_irqthreads())
>   		return false;
>   
> -	/* same CPU or cache domain?  Complete locally */
> +	/* same CPU or cache domain and capacity?  Complete locally */
>   	if (cpu == rq->mq_ctx->cpu ||
>   	    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
> -	     cpus_share_cache(cpu, rq->mq_ctx->cpu)))
> +	     cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
> +	     cpus_equal_capacity(cpu, rq->mq_ctx->cpu)))
>   		return false;
>   
>   	/* don't try to IPI to an offline CPU */

I think it's worth mentioning that this change is intended for storage
controllers that only support a single completion interrupt. Anyway:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Qais Yousef Feb. 24, 2024, 11:06 p.m. UTC | #2
On 02/23/24 08:09, Bart Van Assche wrote:
> On 2/23/24 07:57, Qais Yousef wrote:
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 2dc01551e27c..ea69047e12f7 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1167,10 +1167,11 @@ static inline bool blk_mq_complete_need_ipi(struct request *rq)
> >   	if (force_irqthreads())
> >   		return false;
> > -	/* same CPU or cache domain?  Complete locally */
> > +	/* same CPU or cache domain and capacity?  Complete locally */
> >   	if (cpu == rq->mq_ctx->cpu ||
> >   	    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
> > -	     cpus_share_cache(cpu, rq->mq_ctx->cpu)))
> > +	     cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
> > +	     cpus_equal_capacity(cpu, rq->mq_ctx->cpu)))
> >   		return false;
> >   	/* don't try to IPI to an offline CPU */
> 
> I think it's worth mentioning that this change is intended for storage
> controllers that only support a single completion interrupt. Anyway:

Sorry I didn't realize it is only applied for this scenario.

> 
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>

Thanks for the reviews!
diff mbox series

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2dc01551e27c..ea69047e12f7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1167,10 +1167,11 @@  static inline bool blk_mq_complete_need_ipi(struct request *rq)
 	if (force_irqthreads())
 		return false;
 
-	/* same CPU or cache domain?  Complete locally */
+	/* same CPU or cache domain and capacity?  Complete locally */
 	if (cpu == rq->mq_ctx->cpu ||
 	    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
-	     cpus_share_cache(cpu, rq->mq_ctx->cpu)))
+	     cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
+	     cpus_equal_capacity(cpu, rq->mq_ctx->cpu)))
 		return false;
 
 	/* don't try to IPI to an offline CPU */