Message ID | 20240828114958.29422-1-quic_mapa@quicinc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blk-mq: Allow complete locally if capacities are different | expand |
On 8/28/24 7:49 AM, Manish Pandey wrote: > 'Commit af550e4c9682 ("block/blk-mq: Don't complete locally if > capacities are different")' enforces to complete the request locally > only if the submission and completion CPUs have same capacity. > > To have optimal IO load balancing or to avoid contention b/w submission > path and completion path, user may need to complete IO request of large > capacity CPU(s) on Small Capacity CPU(s) or vice versa. > > Hence introduce a QUEUE_FLAG_ALLOW_DIFF_CAPACITY blk queue flag to let > user decide if it wants to complete the request locally or need an IPI > even if the capacity of the requesting and completion queue is different. > This gives flexibility to user to choose best CPU for their completion > to give best performance for their system. I think that the following is missing from the above description: - Mentioning that this is for an unusual interrupt routing technology (SoC sends the interrupt to another CPU core than what has been specified in the smp_affinity mask). - An explanation why the desired effect cannot be achieved by changing rq_affinity into 0. > block/blk-mq-debugfs.c | 1 + > block/blk-mq.c | 3 ++- > block/blk-sysfs.c | 12 ++++++++++-- > include/linux/blkdev.h | 1 + > 4 files changed, 14 insertions(+), 3 deletions(-) Since the semantics of a sysfs attribute are modified, Documentation/ABI/stable/sysfs-block should be updated. Thanks, Bart.
On 8/28/24 13:13, Bart Van Assche wrote: > On 8/28/24 7:49 AM, Manish Pandey wrote: >> 'Commit af550e4c9682 ("block/blk-mq: Don't complete locally if >> capacities are different")' enforces to complete the request locally >> only if the submission and completion CPUs have same capacity. >> >> To have optimal IO load balancing or to avoid contention b/w submission >> path and completion path, user may need to complete IO request of large >> capacity CPU(s) on Small Capacity CPU(s) or vice versa. >> >> Hence introduce a QUEUE_FLAG_ALLOW_DIFF_CAPACITY blk queue flag to let >> user decide if it wants to complete the request locally or need an IPI >> even if the capacity of the requesting and completion queue is different. >> This gives flexibility to user to choose best CPU for their completion >> to give best performance for their system. Why have the flag be QUEUE_FLAG_ALLOW_DIFF_CAPACITY instead of just QUEUE_FLAG_SAME_LLC so it isn't as HMP specifically-worded? (And of course then having QUEUE_FLAG_SAME_COMP be strictly stronger than QUEUE_FLAG_SAME_LLC. On !HMP they are equal.) That would also answer Bart's question below how this is different to rq_affinity=0. > > I think that the following is missing from the above description: > - Mentioning that this is for an unusual interrupt routing technology > (SoC sends the interrupt to another CPU core than what has been > specified in the smp_affinity mask). FWIW on !mcq that doesn't have to be the case. > - An explanation why the desired effect cannot be achieved by changing > rq_affinity into 0. > >> block/blk-mq-debugfs.c | 1 + >> block/blk-mq.c | 3 ++- >> block/blk-sysfs.c | 12 ++++++++++-- >> include/linux/blkdev.h | 1 + >> 4 files changed, 14 insertions(+), 3 deletions(-) > > Since the semantics of a sysfs attribute are modified, > Documentation/ABI/stable/sysfs-block should be updated. > > Thanks, > > Bart. >
On 8/28/24 8:26 AM, Christian Loehle wrote: > On 8/28/24 13:13, Bart Van Assche wrote: >> I think that the following is missing from the above description: >> - Mentioning that this is for an unusual interrupt routing technology >> (SoC sends the interrupt to another CPU core than what has been >> specified in the smp_affinity mask). > > FWIW on !mcq that doesn't have to be the case. Hmm ... is there any x86 architecture that ignores the smp_affinity mask? I have not yet encountered an x86 system that does not respect the smp_affinity mask. Thanks, Bart.
Thanks for the CC Bart. Manish, if you're going to send a patch to address an issue from another merged patch, the etiquet is to keep the CC list of the original patch the same and include the author of that patch in the loop. On 08/28/24 08:13, Bart Van Assche wrote: > On 8/28/24 7:49 AM, Manish Pandey wrote: > > 'Commit af550e4c9682 ("block/blk-mq: Don't complete locally if > > capacities are different")' enforces to complete the request locally > > only if the submission and completion CPUs have same capacity. > > > > To have optimal IO load balancing or to avoid contention b/w submission > > path and completion path, user may need to complete IO request of large > > capacity CPU(s) on Small Capacity CPU(s) or vice versa. > > > > Hence introduce a QUEUE_FLAG_ALLOW_DIFF_CAPACITY blk queue flag to let > > user decide if it wants to complete the request locally or need an IPI I answered you here https://lore.kernel.org/lkml/20240901171317.bm5z3vplqgdwp4bc@airbuntu/ This approach is not acceptable. I think you need to better explain why rq_affinity=0 is not usable instead of confusing rq_affinity=1 needs to be hacked in this manner. The right extension would be to teach the system how to detect cases where it is better not to keep them on the same LLC/capacity because of scenario XYZ that is known (genericaly and scalably) to break and requires an exception. rq_affinity=0 would give you what you want AFAICT and don't see a reason for this hack. > > even if the capacity of the requesting and completion queue is different. > > This gives flexibility to user to choose best CPU for their completion > > to give best performance for their system. > > I think that the following is missing from the above description: > - Mentioning that this is for an unusual interrupt routing technology > (SoC sends the interrupt to another CPU core than what has been > specified in the smp_affinity mask). > - An explanation why the desired effect cannot be achieved by changing > rq_affinity into 0. It fails to mention a lot of things from the discussion from the previous thread sadly... Including the fact that there's a strange argument about regression on a platform that is easily fixed by using rq_affinity=0, but the argument of not using this is because some other platforms don't need to use rq_affinity=0. I'm not sure if rq_affinity=1 is supposed to work for all cases especially with the specific and custom setup Manish has. Anyway. The submission has a broken CC list that omits a lot of folks from the discussion. > > > block/blk-mq-debugfs.c | 1 + > > block/blk-mq.c | 3 ++- > > block/blk-sysfs.c | 12 ++++++++++-- > > include/linux/blkdev.h | 1 + > > 4 files changed, 14 insertions(+), 3 deletions(-) > > Since the semantics of a sysfs attribute are modified, > Documentation/ABI/stable/sysfs-block should be updated. > > Thanks, > > Bart.
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 5463697a8442..af048dad9667 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -93,6 +93,7 @@ static const char *const blk_queue_flag_name[] = { QUEUE_FLAG_NAME(RQ_ALLOC_TIME), QUEUE_FLAG_NAME(HCTX_ACTIVE), QUEUE_FLAG_NAME(SQ_SCHED), + QUEUE_FLAG_NAME(ALLOW_DIFF_CAPACITY), }; #undef QUEUE_FLAG_NAME diff --git a/block/blk-mq.c b/block/blk-mq.c index aa28157b1aaf..1584312d870a 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1164,7 +1164,8 @@ static inline bool blk_mq_complete_need_ipi(struct request *rq) if (cpu == rq->mq_ctx->cpu || (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) && cpus_share_cache(cpu, rq->mq_ctx->cpu) && - cpus_equal_capacity(cpu, rq->mq_ctx->cpu))) + (test_bit(QUEUE_FLAG_ALLOW_DIFF_CAPACITY, &rq->q->queue_flags) || + cpus_equal_capacity(cpu, rq->mq_ctx->cpu)))) return false; /* don't try to IPI to an offline CPU */ diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 60116d13cb80..37d6ab325180 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -300,8 +300,9 @@ static ssize_t queue_rq_affinity_show(struct gendisk *disk, char *page) { bool set = test_bit(QUEUE_FLAG_SAME_COMP, &disk->queue->queue_flags); bool force = test_bit(QUEUE_FLAG_SAME_FORCE, &disk->queue->queue_flags); + bool allow = test_bit(QUEUE_FLAG_ALLOW_DIFF_CAPACITY, &disk->queue->queue_flags); - return queue_var_show(set << force, page); + return queue_var_show((set << force) | (allow << set), page); } static ssize_t @@ -316,15 +317,22 @@ queue_rq_affinity_store(struct gendisk *disk, const char *page, size_t count) if (ret < 0) return ret; - if (val == 2) { + if (val == 3) { + blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q); + blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q); + blk_queue_flag_set(QUEUE_FLAG_ALLOW_DIFF_CAPACITY, q); + } else if (val == 2) { blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q); blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, q); + blk_queue_flag_clear(QUEUE_FLAG_ALLOW_DIFF_CAPACITY, q); } else if (val == 1) { blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q); blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q); + blk_queue_flag_clear(QUEUE_FLAG_ALLOW_DIFF_CAPACITY, q); } else if (val == 0) { blk_queue_flag_clear(QUEUE_FLAG_SAME_COMP, q); blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q); + blk_queue_flag_clear(QUEUE_FLAG_ALLOW_DIFF_CAPACITY, q); } #endif return ret; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b7664d593486..902fb726ebe1 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -602,6 +602,7 @@ enum { QUEUE_FLAG_RQ_ALLOC_TIME, /* record rq->alloc_time_ns */ QUEUE_FLAG_HCTX_ACTIVE, /* at least one blk-mq hctx is active */ QUEUE_FLAG_SQ_SCHED, /* single queue style io dispatch */ + QUEUE_FLAG_ALLOW_DIFF_CAPACITY, /* complete on different capacity CPU-group */ QUEUE_FLAG_MAX };
'Commit af550e4c9682 ("block/blk-mq: Don't complete locally if capacities are different")' enforces to complete the request locally only if the submission and completion CPUs have same capacity. To have optimal IO load balancing or to avoid contention b/w submission path and completion path, user may need to complete IO request of large capacity CPU(s) on Small Capacity CPU(s) or vice versa. Hence introduce a QUEUE_FLAG_ALLOW_DIFF_CAPACITY blk queue flag to let user decide if it wants to complete the request locally or need an IPI even if the capacity of the requesting and completion queue is different. This gives flexibility to user to choose best CPU for their completion to give best performance for their system. Link: https://lore.kernel.org/all/66912a22-540d-4b9a-bd06-cce55b9ad304@quicinc.com/T/ Co-developed-by: Can Guo <quic_cang@quicinc.com> Co-developed-by: Nitin Rawat <quic_nitirawa@quicinc.com> Signed-off-by: Manish Pandey <quic_mapa@quicinc.com> --- block/blk-mq-debugfs.c | 1 + block/blk-mq.c | 3 ++- block/blk-sysfs.c | 12 ++++++++++-- include/linux/blkdev.h | 1 + 4 files changed, 14 insertions(+), 3 deletions(-)