
irqchip/sifive-plic: Fix plic_set_affinity() only enables 1 cpu

Message ID 20240703072659.1427616-1-namcao@linutronix.de (mailing list archive)
State Changes Requested, archived
Series irqchip/sifive-plic: Fix plic_set_affinity() only enables 1 cpu

Checks

Context Check Description
conchuod/vmtest-for-next-PR success PR summary
conchuod/patch-1-test-1 success .github/scripts/patches/tests/build_rv32_defconfig.sh
conchuod/patch-1-test-2 success .github/scripts/patches/tests/build_rv64_clang_allmodconfig.sh
conchuod/patch-1-test-3 success .github/scripts/patches/tests/build_rv64_gcc_allmodconfig.sh
conchuod/patch-1-test-4 success .github/scripts/patches/tests/build_rv64_nommu_k210_defconfig.sh
conchuod/patch-1-test-5 success .github/scripts/patches/tests/build_rv64_nommu_virt_defconfig.sh
conchuod/patch-1-test-6 success .github/scripts/patches/tests/checkpatch.sh
conchuod/patch-1-test-7 success .github/scripts/patches/tests/dtb_warn_rv64.sh
conchuod/patch-1-test-8 success .github/scripts/patches/tests/header_inline.sh
conchuod/patch-1-test-9 success .github/scripts/patches/tests/kdoc.sh
conchuod/patch-1-test-10 success .github/scripts/patches/tests/module_param.sh
conchuod/patch-1-test-11 success .github/scripts/patches/tests/verify_fixes.sh
conchuod/patch-1-test-12 success .github/scripts/patches/tests/verify_signedoff.sh

Commit Message

Nam Cao July 3, 2024, 7:26 a.m. UTC
plic_set_affinity() only enables the interrupt for the first possible CPU
in the mask. The point is to prevent all CPUs from trying to claim an
interrupt when only one of them can succeed, with the other CPUs wasting
clock cycles for nothing.
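
For context, a condensed sketch of the per-hart claim path (modelled on
plic_handle_irq() in this driver; names and offsets here are illustrative
rather than verbatim driver code):

/*
 * Condensed sketch of the per-hart claim path, modelled on plic_handle_irq().
 */
static void plic_claim_sketch(struct plic_handler *handler)
{
	void __iomem *claim = handler->hart_base + CONTEXT_CLAIM;
	irq_hw_number_t hwirq;

	/*
	 * Every hart that has the interrupt enabled takes the external
	 * interrupt trap and reads the claim register. The PLIC hands the
	 * pending interrupt ID to exactly one reader; the others read back
	 * 0 and have burned the trap entry/exit for nothing.
	 */
	while ((hwirq = readl(claim)))
		generic_handle_domain_irq(handler->priv->irqdomain, hwirq);
}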

However, there are two problems with that:
1. Users cannot enable an interrupt on multiple CPUs (for example, to
minimize interrupt latency).
2. Even if users do not touch SMP interrupt affinity, plic_set_affinity()
is still invoked once (from plic_irqdomain_map()). Thus, by default, only
CPU0 handles interrupts from the PLIC, which may overload CPU0 (see the
sketch below).

Considering that this optimization is not strictly a win (it is a tradeoff
between CPU cycles and interrupt latency), it should not be forced on
users.
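
To make point 2 concrete, the default path looks roughly like this (a
sketch condensed from plic_irqdomain_map() and the current
plic_set_affinity(), i.e. the code the diff below removes; not verbatim,
and the force/offline handling is elided):

/* Sketch: what happens for every PLIC interrupt at map time today. */
static int plic_irqdomain_map_sketch(struct irq_domain *d, unsigned int irq,
				     irq_hw_number_t hwirq)
{
	struct plic_priv *priv = d->host_data;

	/* ... chip/handler setup elided ... */
	irq_set_affinity(irq, &priv->lmask);	/* full hart mask requested */
	return 0;
}

/* Sketch of the current setter: only the first CPU of the mask survives. */
static int plic_set_affinity_sketch(struct irq_data *d,
				    const struct cpumask *mask_val, bool force)
{
	struct plic_priv *priv = irq_data_get_irq_chip_data(d);
	unsigned int cpu = cpumask_first_and_and(&priv->lmask, mask_val,
						 cpu_online_mask);

	if (cpu >= nr_cpu_ids)
		return -EINVAL;

	/* effective affinity collapses to a single CPU, CPU0 by default */
	irq_data_update_effective_affinity(d, cpumask_of(cpu));
	return IRQ_SET_MASK_OK_DONE;
}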

Rewrite plic_set_affinity() to enable the interrupt for all possible CPUs
in the mask.

Before:
$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
 10:       2538       2695       3080       2309  RISC-V INTC   5 Edge      riscv-timer
 12:          3          0          0          0  SiFive PLIC 111 Edge      17030000.power-controller
 13:       1163          0          0          0  SiFive PLIC  25 Edge      13010000.spi
 14:         60          0          0          0  SiFive PLIC   7 Edge      end0
 15:          0          0          0          0  SiFive PLIC   6 Edge      end0
 16:          0          0          0          0  SiFive PLIC   5 Edge      end0
 17:          0          0          0          0  SiFive PLIC  78 Edge      end1
 18:          0          0          0          0  SiFive PLIC  77 Edge      end1
 19:          0          0          0          0  SiFive PLIC  76 Edge      end1
 22:        796          0          0          0  SiFive PLIC  32 Edge      ttyS0
 23:          0          0          0          0  SiFive PLIC  38 Edge      pl022
 24:       9062          0          0          0  SiFive PLIC  75 Edge      dw-mci
 25:          0          0          0          0  SiFive PLIC  35 Edge      10030000.i2c
 26:          0          0          0          0  SiFive PLIC  37 Edge      10050000.i2c
 27:          1          0          0          0  SiFive PLIC  50 Edge      12050000.i2c
 28:          0          0          0          0  SiFive PLIC  51 Edge      12060000.i2c
IPI0:       118         98         88        138  Rescheduling interrupts
IPI1:      2272       1910       3758       3200  Function call interrupts
IPI2:         0          0          0          0  CPU stop interrupts
IPI3:         0          0          0          0  CPU stop (for crash dump) interrupts
IPI4:         0          0          0          0  IRQ work interrupts
IPI5:         0          0          0          0  Timer broadcast interrupts

After:
$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
 10:       2539       2734       2295       2552  RISC-V INTC   5 Edge      riscv-timer
 12:          2          1          0          0  SiFive PLIC 111 Edge      17030000.power-controller
 13:        643        194        368         75  SiFive PLIC  25 Edge      13010000.spi
 14:          6         22         19         27  SiFive PLIC   7 Edge      end0
 15:          0          0          0          0  SiFive PLIC   6 Edge      end0
 16:          0          0          0          0  SiFive PLIC   5 Edge      end0
 17:          0          0          0          0  SiFive PLIC  78 Edge      end1
 18:          0          0          0          0  SiFive PLIC  77 Edge      end1
 19:          0          0          0          0  SiFive PLIC  76 Edge      end1
 22:        158        254        226        207  SiFive PLIC  32 Edge      ttyS0
 23:          0          0          0          0  SiFive PLIC  38 Edge      pl022
 24:       2265       2250       1452       2024  SiFive PLIC  75 Edge      dw-mci
 25:          0          0          0          0  SiFive PLIC  35 Edge      10030000.i2c
 26:          0          0          0          0  SiFive PLIC  37 Edge      10050000.i2c
 27:          0          0          0          1  SiFive PLIC  50 Edge      12050000.i2c
 28:          0          0          0          0  SiFive PLIC  51 Edge      12060000.i2c
IPI0:        92        118        116        120  Rescheduling interrupts
IPI1:      4135       2653       2170       3160  Function call interrupts
IPI2:         0          0          0          0  CPU stop interrupts
IPI3:         0          0          0          0  CPU stop (for crash dump) interrupts
IPI4:         0          0          0          0  IRQ work interrupts
IPI5:         0          0          0          0  Timer broadcast interrupts

Signed-off-by: Nam Cao <namcao@linutronix.de>
Cc: Anup Patel <anup@brainfault.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
---
 drivers/irqchip/irq-sifive-plic.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

Comments

Anup Patel July 3, 2024, 11:58 a.m. UTC | #1
On Wed, Jul 3, 2024 at 12:57 PM Nam Cao <namcao@linutronix.de> wrote:
>
> plic_set_affinity() only enables interrupt for the first possible CPU in
> the mask. The point is to prevent all CPUs trying to claim an interrupt,
> but only one CPU succeeds and the other CPUs wasted some clock cycles for
> nothing.
>
> However, there are two problems with that:
> 1. Users cannot enable interrupt on multiple CPUs (for example, to minimize
> interrupt latency).

Well, you are assuming that multiple CPUs are always idle or available
to process interrupts. In other words, if the system is loaded, running
some workload on each CPU, then performance on multiple CPUs will
degrade, since multiple CPUs will wastefully try to claim the interrupt.

In reality, we can't make such assumptions and it is better to target a
particular CPU for processing interrupts (just like various other interrupt
controllers). For balancing interrupt processing load, we have software
irq balancers running in user space (or kernel space) which do a
reasonably fine job of picking an appropriate CPU for interrupt processing.

> 2. Even if users do not touch SMP interrupt affinity, plic_set_affinity()
> is still invoked once (in plic_irqdomain_map()). Thus, by default, only
> CPU0 handles interrupts from PLIC. That may overload CPU0.
>
> Considering this optimization is not strictly the best (it is tradeoff
> between CPU cycles and interrupt latency), it should not be forced on
> users.

Randomly attempting to take an interrupt on multiple CPUs affects the
workload running on all such CPUs (see the comment above).

It's better to have a more predictable and targeted interrupt affinity
so that software irq balancers work effectively.

>
> Rewrite plic_set_affinity() to enable interrupt for all possible CPUs in
> the mask.

At least from my side, it is a NO to this approach.

Regards,
Anup

Nam Cao July 3, 2024, 12:33 p.m. UTC | #2
On Wed, Jul 03, 2024 at 05:28:23PM +0530, Anup Patel wrote:
> On Wed, Jul 3, 2024 at 12:57 PM Nam Cao <namcao@linutronix.de> wrote:
> >
> > plic_set_affinity() only enables interrupt for the first possible CPU in
> > the mask. The point is to prevent all CPUs trying to claim an interrupt,
> > but only one CPU succeeds and the other CPUs wasted some clock cycles for
> > nothing.
> >
> > However, there are two problems with that:
> > 1. Users cannot enable interrupt on multiple CPUs (for example, to minimize
> > interrupt latency).
> 
> Well, you are assuming that multiple CPUs are always idle or available
> to process interrupts. In other words, if the system is loaded running
> some workload on each CPU then performance on multiple CPUs
> will degrade since multiple CPUs will wastefully try to claim interrupt.
> 
> In reality, we can't make such assumptions and it is better to target a
> particular CPU for processing interrupts (just like various other interrupt
> controllers). For balancing interrupt processing load, we have software
> irq balancers running in user-space (or kernel space) which do a
> reasonably fine job of picking appropriate CPU for interrupt processing.

Then we should leave the job of distributing interrupts to those tools,
right? Not all use cases want minimally wasted CPU cycles. For example, a
particular interrupt may not arrive very often, but when it does, it needs
to be handled fast; in that case, enabling this interrupt on all CPUs is
clearly superior.

But I am sure there are users who don't use something like irqbalance and
just rely on the default behavior, so I see your point about not wasting
CPU cycles. How about we keep this patch, but also add a "default policy"
which evenly distributes the interrupts to individual CPUs (best effort
only)? Something like the untested patch below?

Best regards,
Nam

diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index f30bdb94ceeb..953f375835b0 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -312,7 +312,7 @@ static int plic_irqdomain_map(struct irq_domain *d, unsigned int irq,
 	irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
 			    handle_fasteoi_irq, NULL, NULL);
 	irq_set_noprobe(irq);
-	irq_set_affinity(irq, &priv->lmask);
+	irq_set_affinity(irq, cpumask_of(cpumask_any_distribute(&priv->lmask)));
 	return 0;
 }
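
For reference, cpumask_any_distribute() rotates through the mask on
successive calls, so with the change above each newly mapped interrupt
would be pinned to a different hart. A purely hypothetical, driver-local
helper illustrating that effect (not the kernel's implementation, and
without the protection a real version would need for concurrent callers)
might look like:

#include <linux/cpumask.h>

/*
 * Hypothetical helper, only to illustrate the round-robin effect that
 * cpumask_any_distribute() is used for above; not the kernel's code.
 */
static unsigned int plic_pick_next_cpu_sketch(const struct cpumask *mask)
{
	static int last = -1;			/* illustrative state only */
	unsigned int cpu = cpumask_next(last, mask);

	if (cpu >= nr_cpu_ids)			/* wrap back to the start */
		cpu = cpumask_first(mask);
	last = cpu;
	return cpu;
}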
Anup Patel July 3, 2024, 3:01 p.m. UTC | #3
On Wed, Jul 3, 2024 at 6:03 PM Nam Cao <namcao@linutronix.de> wrote:
>
> On Wed, Jul 03, 2024 at 05:28:23PM +0530, Anup Patel wrote:
> > On Wed, Jul 3, 2024 at 12:57 PM Nam Cao <namcao@linutronix.de> wrote:
> > >
> > > plic_set_affinity() only enables interrupt for the first possible CPU in
> > > the mask. The point is to prevent all CPUs trying to claim an interrupt,
> > > but only one CPU succeeds and the other CPUs wasted some clock cycles for
> > > nothing.
> > >
> > > However, there are two problems with that:
> > > 1. Users cannot enable interrupt on multiple CPUs (for example, to minimize
> > > interrupt latency).
> >
> > Well, you are assuming that multiple CPUs are always idle or available
> > to process interrupts. In other words, if the system is loaded running
> > some workload on each CPU then performance on multiple CPUs
> > will degrade since multiple CPUs will wastefully try to claim interrupt.
> >
> > In reality, we can't make such assumptions and it is better to target a
> > particular CPU for processing interrupts (just like various other interrupt
> > controllers). For balancing interrupt processing load, we have software
> > irq balancers running in user-space (or kernel space) which do a
> > reasonably fine job of picking appropriate CPU for interrupt processing.
>
> Then we should leave the job of distributing interrupts to those tools,
> right? Not all use cases want minimally wasted CPU cycles. For example, if
> a particular interrupt does not arrive very often, but when it does, it
> needs to be handled fast; in this example, clearly enabling this interrupt
> for all CPUs is superior.

This is a very specific case which you are trying to optimize, and in the
process you are hurting performance in many other cases. There is plenty of
high-speed I/O (network, storage, etc.) where the interrupt rate is high,
so for such I/O your patch will degrade performance on multiple CPUs.

>
> But I am sure there are users who don't use something like irqbalance and
> just let the system do the default behavior. So I see your point of not
> wasting CPU cycles. So, how about we keep this patch, but also add a
> "default policy" which evenly distributes the interrupts to individually
> CPUs (best effort only). Something like the un-tested patch below?

I would suggest dropping this patch; for the sake of distributing
interrupts at boot time, we can have the change quoted below.

>
> Best regards,
> Nam
>
> diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
> index f30bdb94ceeb..953f375835b0 100644
> --- a/drivers/irqchip/irq-sifive-plic.c
> +++ b/drivers/irqchip/irq-sifive-plic.c
> @@ -312,7 +312,7 @@ static int plic_irqdomain_map(struct irq_domain *d, unsigned int irq,
>         irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
>                             handle_fasteoi_irq, NULL, NULL);
>         irq_set_noprobe(irq);
> -       irq_set_affinity(irq, &priv->lmask);
> +       irq_set_affinity(irq, cpumask_of(cpumask_any_distribute(&priv->lmask)));
>         return 0;
>  }
>

Regards,
Anup

Patch

diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index 9e22f7e378f5..f30bdb94ceeb 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -163,20 +163,19 @@  static void plic_irq_eoi(struct irq_data *d)
 static int plic_set_affinity(struct irq_data *d,
 			     const struct cpumask *mask_val, bool force)
 {
-	unsigned int cpu;
 	struct plic_priv *priv = irq_data_get_irq_chip_data(d);
+	struct cpumask new_mask;
 
-	if (force)
-		cpu = cpumask_first_and(&priv->lmask, mask_val);
-	else
-		cpu = cpumask_first_and_and(&priv->lmask, mask_val, cpu_online_mask);
+	cpumask_and(&new_mask, mask_val, &priv->lmask);
+	if (!force)
+		cpumask_and(&new_mask, &new_mask, cpu_online_mask);
 
-	if (cpu >= nr_cpu_ids)
+	if (cpumask_empty(&new_mask))
 		return -EINVAL;
 
 	plic_irq_disable(d);
 
-	irq_data_update_effective_affinity(d, cpumask_of(cpu));
+	irq_data_update_effective_affinity(d, &new_mask);
 
 	if (!irqd_irq_disabled(d))
 		plic_irq_enable(d);