
[v4,4/5] genirq/affinity: allow driver's discontiguous affinity set

Message ID c044e71afa25fdf65ca9abd21f8a5032e1b424eb.1580211965.git.zhangweiping@didiglobal.com (mailing list archive)
State New, archived
Series Add support Weighted Round Robin for blkcg and nvme

Commit Message

Weiping Zhang Jan. 28, 2020, 11:53 a.m. UTC
The nvme driver will add 4 sets to support NVMe weighted round robin,
and some of these sets may be empty (depending on user configuration),
so each particular set is assigned one static index to avoid
management trouble; the empty sets will then be seen by
irq_create_affinity_masks(). This patch makes the API more compatible
by skipping empty sets.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 kernel/irq/affinity.c | 4 ++++
 1 file changed, 4 insertions(+)
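
A rough sketch, not part of this patch: a driver using such static indices
could fill the affinity descriptor from a calc_sets()-style callback (the
struct irq_affinity API from <linux/interrupt.h>). The callback name,
struct example_dev, and the nr_* counts below are hypothetical, and using
6 sets assumes IRQ_AFFINITY_MAX_SETS is raised accordingly elsewhere in
the series.

static void example_calc_irq_sets(struct irq_affinity *affd,
                                  unsigned int nvecs)
{
        struct example_dev *dev = affd->priv;   /* driver private data */

        /* one static index per queue type; some entries may stay 0 */
        affd->nr_sets = 6;
        affd->set_size[0] = dev->nr_default;
        affd->set_size[1] = dev->nr_read;       /* may be 0 */
        affd->set_size[2] = dev->nr_wrr_low;    /* may be 0 */
        affd->set_size[3] = dev->nr_wrr_medium; /* may be 0 */
        affd->set_size[4] = dev->nr_wrr_high;
        affd->set_size[5] = dev->nr_wrr_urgent; /* may be 0 */

        /*
         * A real callback would also clamp these sizes to nvecs; with
         * this patch, irq_create_affinity_masks() skips the empty sets.
         */
}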

Comments

Thomas Gleixner Feb. 1, 2020, 9:08 a.m. UTC | #1
Weiping Zhang <zhangweiping@didiglobal.com> writes:

> The nvme driver will add 4 sets to support NVMe weighted round robin,
> and some of these sets may be empty (depending on user configuration),
> so each particular set is assigned one static index to avoid
> management trouble; the empty sets will then be seen by
> irq_create_affinity_masks().

What's the point of an empty interrupt set in the first place? This does
not make sense and smells like a really bad hack.

Can you please explain in detail why this is required and why it
actually makes sense?

Thanks,

        tglx
Weiping Zhang Feb. 4, 2020, 3:11 a.m. UTC | #2
Thomas Gleixner <tglx@linutronix.de> wrote on Saturday, February 1, 2020 at 5:19 PM:
>
> Weiping Zhang <zhangweiping@didiglobal.com> writes:
>
> > The nvme driver will add 4 sets to support NVMe weighted round robin,
> > and some of these sets may be empty (depending on user configuration),
> > so each particular set is assigned one static index to avoid
> > management trouble; the empty sets will then be seen by
> > irq_create_affinity_masks().
>
> What's the point of an empty interrupt set in the first place? This does
> not make sense and smells like a really bad hack.
>
> Can you please explain in detail why this is required and why it
> actually makes sense?
>
Hi Thomas,
Sorry for the late reply; I will post a new patch to avoid creating empty sets.
In this version, nvme adds 4 extra sets because it splits its io queues
into 7 parts (poll, default, read, wrr_low, wrr_medium, wrr_high,
wrr_urgent). The poll queues do not use irqs, so nvme has at most 6 irq
sets. The nvme driver uses two variables (dev->io_queues[index] and
affd->set_size[index]) to track how many queues/irqs are in each part,
and the user may set some queue counts to 0. For example, with 96 io
queues:

default
dev->io_queues[0]=90
affd->set_size[0] = 90

read
dev->io_queues[1]=0
affd->set_size[1] = 0

wrr low
dev->io_queues[2]=0
affd->set_size[2] = 0

wrr medium
dev->io_queues[3]=0
affd->set_size[3] = 0

wrr high
dev->io_queues[4]=6
affd->set_size[4] = 6

wrr urgent
dev->io_queues[5]=0
affd->set_size[5] = 0

In this case, the sets at index 1 to 3 will have 0 irqs.

But actually, there is no need to use a fixed index for io_queues and
set_size; nvme just tells the irq core how many irq sets it has and how
many irqs are in each set, so I will post a v5 to solve this problem:
        /* the default set always exists and takes the first index */
        nr_sets = 1;
        dev->io_queues[HCTX_TYPE_DEFAULT] = nr_default;
        affd->set_size[nr_sets - 1] = nr_default;
        dev->io_queues[HCTX_TYPE_READ] = nr_read;
        if (nr_read) {
                nr_sets++;
                affd->set_size[nr_sets - 1] = nr_read;
        }
        dev->io_queues[HCTX_TYPE_WRR_LOW] = nr_wrr_low;
        if (nr_wrr_low) {
                nr_sets++;
                affd->set_size[nr_sets - 1] = nr_wrr_low;
        }
        dev->io_queues[HCTX_TYPE_WRR_MEDIUM] = nr_wrr_medium;
        if (nr_wrr_medium) {
                nr_sets++;
                affd->set_size[nr_sets - 1] = nr_wrr_medium;
        }
        dev->io_queues[HCTX_TYPE_WRR_HIGH] = nr_wrr_high;
        if (nr_wrr_high) {
                nr_sets++;
                affd->set_size[nr_sets - 1] = nr_wrr_high;
        }
        dev->io_queues[HCTX_TYPE_WRR_URGENT] = nr_wrr_urgent;
        if (nr_wrr_urgent) {
                nr_sets++;
                affd->set_size[nr_sets - 1] = nr_wrr_urgent;
        }
        affd->nr_sets = nr_sets;
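
For the example configuration above (90 default queues, 6 wrr_high queues,
everything else 0), this compaction would register only the two non-empty
sets:

        /* resulting descriptor for the example above */
        affd->nr_sets = 2;
        affd->set_size[0] = 90;         /* default */
        affd->set_size[1] = 6;          /* wrr high */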

Thanks
Weiping

Patch

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index 4d89ad4fae3b..83154615cc9d 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -462,6 +462,10 @@  irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd)
 		unsigned int this_vecs = affd->set_size[i];
 		int ret;
 
+		/* skip empty affinity set */
+		if (this_vecs == 0)
+			continue;
+
 		ret = irq_build_affinity_masks(curvec, this_vecs,
 					       curvec, masks);
 		if (ret) {