Message ID | 1526636866-209210-1-git-send-email-shawn.lin@rock-chips.com (mailing list archive) |
---|---|
State | New, archived |
Hi Shawn,

On 18/05/18 10:47, Shawn Lin wrote:
> gic-v3 seems to only support distributing a hwirq to one CPU, despite
> setting it via /proc/irq/*/smp_affinity.
>
> My RK3399 platform has 6 CPUs and I was trying to bind the emmc
> irq, whose hwirq is 43 and virq is 30, to all cores:
>
> echo 3f > /proc/irq/30/smp_affinity
>
> but the I/O test still shows the irq being fired on CPU0. For real
> use cases, we may want to distribute different hwirqs to different
> cores, ideally to the core with the fewest bound irqs. With the
> current implementation, gic-v3 always distributes it to the first
> masked cpu, which is what cpumask_any_and actually does in practice
> on my platform.

That's because GICv3 cannot broadcast the interrupt to all CPUs, and
has to pick one.

> So I was thinking of recording how many hwirqs are distributed to
> each core and picking the least used one.
>
> This patch is rather rough and only lightly tested on my board. I'm
> just asking for your advice. :)

My advice is not to do this in the kernel. Why don't you run something
like irqbalance? It will watch the interrupt usage and move interrupts
around.

Thanks,

M.
Hi Marc,

On 2018/5/18 18:05, Marc Zyngier wrote:
> Hi Shawn,
>
> On 18/05/18 10:47, Shawn Lin wrote:
>> gic-v3 seems to only support distributing a hwirq to one CPU, despite
>> setting it via /proc/irq/*/smp_affinity.
>>
>> My RK3399 platform has 6 CPUs and I was trying to bind the emmc
>> irq, whose hwirq is 43 and virq is 30, to all cores:
>>
>> echo 3f > /proc/irq/30/smp_affinity
>>
>> but the I/O test still shows the irq being fired on CPU0. For real
>> use cases, we may want to distribute different hwirqs to different
>> cores, ideally to the core with the fewest bound irqs. With the
>> current implementation, gic-v3 always distributes it to the first
>> masked cpu, which is what cpumask_any_and actually does in practice
>> on my platform.
>
> That's because GICv3 cannot broadcast the interrupt to all CPUs, and
> has to pick one.

yep, that's what I got from the GIC-V3 TRM. Btw, IIRC, gic-400
(belonging to GICv2) on the RK3288 platform could broadcast interrupts
to all CPUs, so I was a bit surprised to learn that GICv3 cannot, as v3
intuitively sounds like it should be more powerful than v2. :)))

>> So I was thinking of recording how many hwirqs are distributed to
>> each core and picking the least used one.
>>
>> This patch is rather rough and only lightly tested on my board. I'm
>> just asking for your advice. :)
>
> My advice is not to do this in the kernel. Why don't you run something
> like irqbalance? It will watch the interrupt usage and move interrupts
> around.

I will take a look at how irqbalance works in practice. Thanks for
your advice.
On Sat, 19 May 2018 07:35:55 +0800
Shawn Lin <shawn.lin@rock-chips.com> wrote:

> Hi Marc,
>
> On 2018/5/18 18:05, Marc Zyngier wrote:
> > Hi Shawn,
> >
> > On 18/05/18 10:47, Shawn Lin wrote:
> >> gic-v3 seems to only support distributing a hwirq to one CPU,
> >> despite setting it via /proc/irq/*/smp_affinity.
> >>
> >> My RK3399 platform has 6 CPUs and I was trying to bind the emmc
> >> irq, whose hwirq is 43 and virq is 30, to all cores:
> >>
> >> echo 3f > /proc/irq/30/smp_affinity
> >>
> >> but the I/O test still shows the irq being fired on CPU0. For real
> >> use cases, we may want to distribute different hwirqs to different
> >> cores, ideally to the core with the fewest bound irqs. With the
> >> current implementation, gic-v3 always distributes it to the first
> >> masked cpu, which is what cpumask_any_and actually does in
> >> practice on my platform.
> >
> > That's because GICv3 cannot broadcast the interrupt to all CPUs,
> > and has to pick one.
>
> yep, that's what I got from the GIC-V3 TRM. Btw, IIRC, gic-400
> (belonging to GICv2) on the RK3288 platform could broadcast
> interrupts to all CPUs, so I was a bit surprised to learn that GICv3
> cannot, as v3 intuitively sounds like it should be more powerful than
> v2. :)))

The GICv2 1:N feature is really nasty, actually. It places an overhead
on all CPUs: they will all take an interrupt, and only one will
actually service it, while the others may only see a spurious
interrupt. So in practice you don't really gain anything, unless your
CPUs are completely idle. On a busy system, you see an actual reduction
of your overall throughput, for a very small benefit in latency.

That's why I refuse to support this feature in Linux. It may be useful
on latency-sensitive systems where the software is too primitive to do
effective balancing, but Linux is a bit better than that. Thankfully,
GICv3 got rid of this misfeature, and I'm making sure it won't come
back.
<rant>
Overall, interrupt affinity is too critical to be left to the hardware,
which has no knowledge of how busy the CPUs are. It is a bit like
implementing the scheduler in HW: it works very well if your SW is
minimal enough, but grow a bigger system and HW scheduling becomes a
total pain. That's why you only see such a feature on small
microcontrollers, and not on larger CPUs.
</rant>

> >> So I was thinking of recording how many hwirqs are distributed to
> >> each core and picking the least used one.
> >>
> >> This patch is rather rough and only lightly tested on my board.
> >> I'm just asking for your advice. :)
> >
> > My advice is not to do this in the kernel. Why don't you run
> > something like irqbalance? It will watch the interrupt usage and
> > move interrupts around.
>
> I will take a look at how irqbalance works in practice. Thanks
> for your advice.

Let me know how it goes.

Thanks,

M.
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 5a67ec0..b838fda 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -65,6 +65,7 @@ struct gic_chip_data {
 static struct gic_kvm_info gic_v3_kvm_info;
 static DEFINE_PER_CPU(bool, has_rss);
+static DEFINE_PER_CPU(int, bind_irq_nr);
 
 #define MPIDR_RS(mpidr)		(((mpidr) & 0xF0UL) >> 4)
 #define gic_data_rdist()	(this_cpu_ptr(gic_data.rdists.rdist))
@@ -340,7 +341,7 @@ static u64 gic_mpidr_to_affinity(unsigned long mpidr)
 	       MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16 |
 	       MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8  |
 	       MPIDR_AFFINITY_LEVEL(mpidr, 0));
-
+	per_cpu(bind_irq_nr, mpidr) += 1;
 	return aff;
 }
 
@@ -774,15 +775,31 @@ static void gic_smp_init(void)
 static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 			    bool force)
 {
-	unsigned int cpu;
+	unsigned int cpu = 0, min_irq_nr_cpu;
 	void __iomem *reg;
 	int enabled;
 	u64 val;
+	cpumask_t local;
+	u8 aff;
 
-	if (force)
+	if (force) {
 		cpu = cpumask_first(mask_val);
-	else
-		cpu = cpumask_any_and(mask_val, cpu_online_mask);
+	} else {
+		cpu = cpumask_and(&local, mask_val, cpu_online_mask);
+		if (cpu) {
+			min_irq_nr_cpu = cpumask_first(&local);
+			for_each_cpu(cpu, &local) {
+				if (per_cpu(bind_irq_nr, cpu) <
+				    per_cpu(bind_irq_nr, min_irq_nr_cpu))
+					min_irq_nr_cpu = cpu;
+			}
+
+			cpu = min_irq_nr_cpu;
+		} else {
+			cpu = cpumask_any_and(mask_val, cpu_online_mask);
+		}
+	}
 
 	if (cpu >= nr_cpu_ids)
 		return -EINVAL;
@@ -796,6 +813,9 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 	gic_mask_irq(d);
 
 	reg = gic_dist_base(d) + GICD_IROUTER + (gic_irq(d) * 8);
+	aff = readq_relaxed(reg) & 0xff; //arch_gicv3.h
+	if (per_cpu(bind_irq_nr, aff))
+		per_cpu(bind_irq_nr, aff) -= 1;
 	val = gic_mpidr_to_affinity(cpu_logical_map(cpu));
 
 	gic_write_irouter(val, reg);
gic-v3 seems to only support distributing a hwirq to one CPU, despite
setting it via /proc/irq/*/smp_affinity.

My RK3399 platform has 6 CPUs and I was trying to bind the emmc
irq, whose hwirq is 43 and virq is 30, to all cores:

echo 3f > /proc/irq/30/smp_affinity

but the I/O test still shows the irq being fired on CPU0. For real use
cases, we may want to distribute different hwirqs to different cores,
ideally to the core with the fewest bound irqs. With the current
implementation, gic-v3 always distributes it to the first masked cpu,
which is what cpumask_any_and actually does in practice on my platform.

So I was thinking of recording how many hwirqs are distributed to each
core and picking the least used one.

This patch is rather rough and only lightly tested on my board. I'm
just asking for your advice. :)

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
---
 drivers/irqchip/irq-gic-v3.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)