Message ID: c044e71afa25fdf65ca9abd21f8a5032e1b424eb.1580211965.git.zhangweiping@didiglobal.com (mailing list archive)
State: New, archived
Series: Add support Weighted Round Robin for blkcg and nvme
Weiping Zhang <zhangweiping@didiglobal.com> writes:

> nvme driver will add 4 sets for supporting NVMe weighted round robin,
> and some of these sets may be empty (depends on user configuration),
> so each particular set is assigned one static index for avoiding the
> management trouble, then the empty set will be skipped by
> irq_create_affinity_masks().

What's the point of an empty interrupt set in the first place? This does
not make sense and smells like a really bad hack.

Can you please explain in detail why this is required and why it
actually makes sense?

Thanks,

        tglx
Thomas Gleixner <tglx@linutronix.de> wrote on Sat, Feb 1, 2020 at 5:19 PM:
>
> Weiping Zhang <zhangweiping@didiglobal.com> writes:
>
> > nvme driver will add 4 sets for supporting NVMe weighted round robin,
> > and some of these sets may be empty (depends on user configuration),
> > so each particular set is assigned one static index for avoiding the
> > management trouble, then the empty set will be skipped by
> > irq_create_affinity_masks().
>
> What's the point of an empty interrupt set in the first place? This does
> not make sense and smells like a really bad hack.
>
> Can you please explain in detail why this is required and why it
> actually makes sense?
>
Hi Thomas,

Sorry for the late reply; I will post a new patch that avoids creating
empty sets.

In this version, nvme adds 4 extra sets because it splits its io queues
into 7 parts (poll, default, read, wrr_low, wrr_medium, wrr_high,
wrr_urgent). The poll queues do not use irqs, so nvme has at most 6 irq
sets. The driver uses two variables (dev->io_queues[index] and
affd->set_size[index]) to track how many queues/irqs are in each part,
and the user may set some of the queue counts to 0. For example, with 96
io queues:

default     dev->io_queues[0] = 90    affd->set_size[0] = 90
read        dev->io_queues[1] = 0     affd->set_size[1] = 0
wrr low     dev->io_queues[2] = 0     affd->set_size[2] = 0
wrr medium  dev->io_queues[3] = 0     affd->set_size[3] = 0
wrr high    dev->io_queues[4] = 6     affd->set_size[4] = 6
wrr urgent  dev->io_queues[5] = 0     affd->set_size[5] = 0

In this case the sets at index 1 to 3 have 0 irqs. But actually there is
no need to use a fixed index for io_queues and set_size: nvme only needs
to tell the irq engine how many irq sets it has and how many irqs are in
each set, so I will post a V5 to solve this problem.
	nr_sets = 1;
	dev->io_queues[HCTX_TYPE_DEFAULT] = nr_default;
	affd->set_size[nr_sets - 1] = nr_default;

	dev->io_queues[HCTX_TYPE_READ] = nr_read;
	if (nr_read) {
		nr_sets++;
		affd->set_size[nr_sets - 1] = nr_read;
	}

	dev->io_queues[HCTX_TYPE_WRR_LOW] = nr_wrr_low;
	if (nr_wrr_low) {
		nr_sets++;
		affd->set_size[nr_sets - 1] = nr_wrr_low;
	}

	dev->io_queues[HCTX_TYPE_WRR_MEDIUM] = nr_wrr_medium;
	if (nr_wrr_medium) {
		nr_sets++;
		affd->set_size[nr_sets - 1] = nr_wrr_medium;
	}

	dev->io_queues[HCTX_TYPE_WRR_HIGH] = nr_wrr_high;
	if (nr_wrr_high) {
		nr_sets++;
		affd->set_size[nr_sets - 1] = nr_wrr_high;
	}

	dev->io_queues[HCTX_TYPE_WRR_URGENT] = nr_wrr_urgent;
	if (nr_wrr_urgent) {
		nr_sets++;
		affd->set_size[nr_sets - 1] = nr_wrr_urgent;
	}

	affd->nr_sets = nr_sets;

Thanks
Weiping
diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 4d89ad4fae3b..83154615cc9d 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -462,6 +462,10 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
 		unsigned int this_vecs = affd->set_size[i];
 		int ret;
 
+		/* skip empty affinity set */
+		if (this_vecs == 0)
+			continue;
+
 		ret = irq_build_affinity_masks(curvec, this_vecs,
 					       curvec, masks);
 		if (ret) {
The nvme driver will add 4 sets to support NVMe weighted round robin, and
some of these sets may be empty (depending on user configuration), so
each particular set is assigned one static index to avoid management
trouble; the empty sets will then be skipped by
irq_create_affinity_masks().

This patch makes the API more compatible.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 kernel/irq/affinity.c | 4 ++++
 1 file changed, 4 insertions(+)