[RFC] irqchip/gic-v3: Try to distribute irq affinity to the least used CPU

Message ID 1526636866-209210-1-git-send-email-shawn.lin@rock-chips.com (mailing list archive)
State New, archived

Commit Message

Shawn Lin May 18, 2018, 9:47 a.m. UTC
gic-v3 seems to only support distributing a hwirq to a single CPU, in
spite of what is set via /proc/irq/*/smp_affinity.

My RK3399 platform has 6 CPUs and I was trying to bind the eMMC
irq, whose hwirq is 43 and virq is 30, to all cores:

echo 3f > /proc/irq/30/smp_affinity

but the I/O test still shows the irq is always fired on CPU0. For real
use cases, we may want to distribute different hwirqs to different
cores, ideally to the core with the fewest irqs bound to it. Otherwise,
with the current implementation, gic-v3 always routes an irq to the
first CPU in the mask, which is what cpumask_any_and does in practice
on my platform.

So I was thinking of recording how many hwirqs are distributed to each
core and trying to pick the least used one.
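
Roughly, the selection I have in mind looks like this (just a sketch to
illustrate the idea; pick_least_bound_cpu() is a made-up name and the
actual change is in the diff below):

#include <linux/cpumask.h>
#include <linux/percpu.h>

/* how many hwirqs are currently routed to each CPU */
static DEFINE_PER_CPU(unsigned int, bind_irq_nr);

/*
 * Return the online CPU in mask_val with the fewest hwirqs bound to it,
 * or nr_cpu_ids if the mask is empty (callers check for that, just as
 * gic_set_affinity already does).
 */
static unsigned int pick_least_bound_cpu(const struct cpumask *mask_val)
{
	unsigned int cpu;
	unsigned int best = cpumask_first_and(mask_val, cpu_online_mask);

	for_each_cpu_and(cpu, mask_val, cpu_online_mask)
		if (per_cpu(bind_irq_nr, cpu) < per_cpu(bind_irq_nr, best))
			best = cpu;

	return best;
}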

This patch is rather rough and only lightly tested on my board. I'm
just asking for advice from your collective wisdom. :)

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
---

 drivers/irqchip/irq-gic-v3.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

Comments

Marc Zyngier May 18, 2018, 10:05 a.m. UTC | #1
Hi Shawn,

On 18/05/18 10:47, Shawn Lin wrote:
> gic-v3 seems to only support distributing a hwirq to a single CPU, in
> spite of what is set via /proc/irq/*/smp_affinity.
> 
> My RK3399 platform has 6 CPUs and I was trying to bind the eMMC
> irq, whose hwirq is 43 and virq is 30, to all cores:
> 
> echo 3f > /proc/irq/30/smp_affinity
> 
> but the I/O test still shows the irq is always fired on CPU0. For real
> use cases, we may want to distribute different hwirqs to different
> cores, ideally to the core with the fewest irqs bound to it. Otherwise,
> with the current implementation, gic-v3 always routes an irq to the
> first CPU in the mask, which is what cpumask_any_and does in practice
> on my platform.

That's because GICv3 cannot broadcast the interrupt to all CPUs, and has
to pick one.
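
Each SPI has a single GICD_IROUTER entry, which takes exactly one
target MPIDR. This is what gic_set_affinity already programs (slightly
simplified from your diff below):

	/* one target per SPI: there is no "broadcast to all CPUs" encoding */
	reg = gic_dist_base(d) + GICD_IROUTER + (gic_irq(d) * 8);
	val = gic_mpidr_to_affinity(cpu_logical_map(cpu));
	gic_write_irouter(val, reg);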

> 
> So I was thinking of recording how many hwirqs are distributed to each
> core and trying to pick the least used one.
> 
> This patch is rather rough and only lightly tested on my board. I'm
> just asking for advice from your collective wisdom. :)

My advice is not to do this in the kernel. Why don't you run something
like irqbalance? It will watch the interrupt usage and move
interrupts around.
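
For a quick experiment (from memory, untested here):

irqbalance --oneshot

does a single balancing pass over the active interrupts and then exits;
running the plain irqbalance daemon instead rebalances periodically
based on the observed load.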

Thanks,

	M.
Shawn Lin May 18, 2018, 11:35 p.m. UTC | #2
Hi Marc,

On 2018/5/18 18:05, Marc Zyngier wrote:
> Hi Shawn,
> 
> On 18/05/18 10:47, Shawn Lin wrote:
>> gic-v3 seems to only support distributing a hwirq to a single CPU, in
>> spite of what is set via /proc/irq/*/smp_affinity.
>>
>> My RK3399 platform has 6 CPUs and I was trying to bind the eMMC
>> irq, whose hwirq is 43 and virq is 30, to all cores:
>>
>> echo 3f > /proc/irq/30/smp_affinity
>>
>> but the I/O test still shows the irq is always fired on CPU0. For real
>> use cases, we may want to distribute different hwirqs to different
>> cores, ideally to the core with the fewest irqs bound to it. Otherwise,
>> with the current implementation, gic-v3 always routes an irq to the
>> first CPU in the mask, which is what cpumask_any_and does in practice
>> on my platform.
> 
> That's because GICv3 cannot broadcast the interrupt to all CPUs, and has
> to pick one.
> 

Yep, that's what I got from the GICv3 TRM. Btw, IIRC, the GIC-400 (a
GICv2 implementation) on the RK3288 platform could broadcast interrupts
to all CPUs, so I was a bit surprised to learn that GICv3 cannot do it,
as v3 instinctively sounds like it should be more powerful than v2. :)))

>>
>> So I was thinking of recording how many hwirqs are distributed to each
>> core and trying to pick the least used one.
>>
>> This patch is rather rough and only lightly tested on my board. I'm
>> just asking for advice from your collective wisdom. :)
> 
> My advice is not to do this in the kernel. Why don't you run something
> like irqbalance? It will watch the interrupt usage and move
> interrupts around.

I will take a look at how irqbalance works in practice. Thanks
for your advice.


> 
> Thanks,
> 
> 	M.
>
Marc Zyngier May 19, 2018, 10:04 a.m. UTC | #3
On Sat, 19 May 2018 07:35:55 +0800
Shawn Lin <shawn.lin@rock-chips.com> wrote:

> Hi Marc,
> 
> On 2018/5/18 18:05, Marc Zyngier wrote:
> > Hi Shawn,
> > 
> > On 18/05/18 10:47, Shawn Lin wrote:  
> >> gic-v3 seems to only support distributing a hwirq to a single CPU, in
> >> spite of what is set via /proc/irq/*/smp_affinity.
> >>
> >> My RK3399 platform has 6 CPUs and I was trying to bind the eMMC
> >> irq, whose hwirq is 43 and virq is 30, to all cores:
> >>
> >> echo 3f > /proc/irq/30/smp_affinity
> >>
> >> but the I/O test still shows the irq is always fired on CPU0. For real
> >> use cases, we may want to distribute different hwirqs to different
> >> cores, ideally to the core with the fewest irqs bound to it. Otherwise,
> >> with the current implementation, gic-v3 always routes an irq to the
> >> first CPU in the mask, which is what cpumask_any_and does in practice
> >> on my platform.
> > 
> > That's because GICv3 cannot broadcast the interrupt to all CPUs, and has
> > to pick one.
> >   
> 
> Yep, that's what I got from the GICv3 TRM. Btw, IIRC, the GIC-400 (a
> GICv2 implementation) on the RK3288 platform could broadcast
> interrupts to all CPUs, so I was a bit surprised to learn that GICv3
> cannot do it, as v3 instinctively sounds like it should be more
> powerful than v2. :)))

The GICv2 1:N feature is really nasty, actually. It places an overhead
on all CPUs (they will all take the interrupt and only one will
actually service it, while the others may only see a spurious
interrupt). So in practice, you don't really gain anything unless your
CPUs are completely idle.
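
Roughly what every targeted CPU ends up doing (a sketch along the lines
of the GICv2 driver's handler, not code as merged):

	u32 irqnr;

	/* each CPU that took the 1:N interrupt races to ack it */
	irqnr = readl_relaxed(cpu_base + GIC_CPU_INTACK) & GICC_IAR_INT_ID_MASK;
	if (irqnr >= 1020)
		return;	/* lost the race: spurious ID, pure overhead */
	/* only the "winner" gets here and actually services the interrupt */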

On a busy system, you see an actual reduction of your overall
throughput, for a very small benefit in latency. That's why I refuse
to support this feature in Linux. It may be useful on latency-sensitive
systems where the software is too primitive to do effective balancing,
but Linux is a bit better than that. Thankfully, GICv3 got rid of this
misfeature, and I'm making sure it won't come back.

<rant>

Overall, interrupt affinity is too critical to be left to the hardware,
which has no knowledge of how busy the CPUs are. It is a bit like
implementing the scheduler in HW: it works very well as long as your SW
is minimal enough, but grow a bigger system and HW scheduling becomes a
total pain. That's why you only see such a feature on small
microcontrollers, and not on larger CPUs.

</rant>

> 
> >>
> >> So I was thinking of recording how many hwirqs are distributed to each
> >> core and trying to pick the least used one.
> >>
> >> This patch is rather rough and only lightly tested on my board. I'm
> >> just asking for advice from your collective wisdom. :)
> > 
> > My advice is not to do this in the kernel. Why don't you run something
> > like irqbalance? It will watch the interrupt usage and move
> > interrupts around.
> 
> I will take a look at how irqbalance works in practice. Thanks
> for your advice.

Let me know how it goes.

Thanks,

	M.

Patch

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 5a67ec0..b838fda 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -65,6 +65,7 @@ struct gic_chip_data {
 
 static struct gic_kvm_info gic_v3_kvm_info;
 static DEFINE_PER_CPU(bool, has_rss);
+static DEFINE_PER_CPU(int, bind_irq_nr);
 
 #define MPIDR_RS(mpidr)			(((mpidr) & 0xF0UL) >> 4)
 #define gic_data_rdist()		(this_cpu_ptr(gic_data.rdists.rdist))
@@ -340,7 +341,7 @@ static u64 gic_mpidr_to_affinity(unsigned long mpidr)
 	       MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16 |
 	       MPIDR_AFFINITY_LEVEL(mpidr, 1) << 8  |
 	       MPIDR_AFFINITY_LEVEL(mpidr, 0));
-
+	per_cpu(bind_irq_nr, mpidr) += 1;
 	return aff;
 }
 
@@ -774,15 +775,31 @@ static void gic_smp_init(void)
 static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 			    bool force)
 {
-	unsigned int cpu;
+	unsigned int cpu = 0, min_irq_nr_cpu;
 	void __iomem *reg;
 	int enabled;
 	u64 val;
+	cpumask_t local;
+	u8 aff;
 
-	if (force)
+	if (force) {
 		cpu = cpumask_first(mask_val);
-	else
-		cpu = cpumask_any_and(mask_val, cpu_online_mask);
+	} else {
+		cpu = cpumask_and(&local, mask_val, cpu_online_mask);
+		if (cpu) {
+			min_irq_nr_cpu = cpumask_first(&local);
+			for_each_cpu(cpu, &local) {
+				if (per_cpu(bind_irq_nr, cpu) <
+						per_cpu(bind_irq_nr, min_irq_nr_cpu))
+					min_irq_nr_cpu = cpu;
+			}
+
+			cpu = min_irq_nr_cpu;
+
+		} else {
+			cpu = cpumask_any_and(mask_val, cpu_online_mask);
+		}
+	}
 
 	if (cpu >= nr_cpu_ids)
 		return -EINVAL;
@@ -796,6 +813,9 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 		gic_mask_irq(d);
 
 	reg = gic_dist_base(d) + GICD_IROUTER + (gic_irq(d) * 8);
+	aff = readq_relaxed(reg) & 0xff; /* Aff0, see arch_gicv3.h */
+	if (per_cpu(bind_irq_nr, aff))
+		per_cpu(bind_irq_nr, aff) -= 1;
 	val = gic_mpidr_to_affinity(cpu_logical_map(cpu));
 
 	gic_write_irouter(val, reg);