[RFC] genirq/cpuhotplug, PCI/rcar-host: Silence set affinity failed warning

Message ID 20240706132758.53298-1-marek.vasut+renesas@mailbox.org (mailing list archive)
State Superseded
Delegated to: Geert Uytterhoeven

Commit Message

Marek Vasut July 6, 2024, 1:27 p.m. UTC
This is an RFC patch, I am looking for input on the approach taken here.
If the approach is sound, this patch would be split into proper patchset.

Various PCIe controllers that mux MSIs onto single IRQ line produce these
"IRQ%d: set affinity failed" warnings when entering suspend. This has been
discussed before [1] [2] and an example test case is included at the end
of this commit message.

Attempt to silence the warning by returning specific error code -EOPNOTSUPP
from the irqchip .irq_set_affinity callback, which skips printing the warning
in cpuhotplug.c . The -EOPNOTSUPP was chosen because it indicates exactly what
the problem is, it is not possible to set affinity of each MSI IRQ line to a
specific CPU due to hardware limitation.

```
$ grep 25 /proc/interrupts
 25:   0 0 0 0 0 0 0 0   PCIe MSI   0   Edge   PCIe PME

$ echo core > /sys/power/pm_test ; echo mem > /sys/power/state
...
Disabling non-boot CPUs ...
IRQ25: set affinity failed(-22). <---------- This is being silenced here
psci: CPU7 killed (polled 4 ms)
...
```

[1] https://lore.kernel.org/all/d4a6eea3c5e33a3a4056885419df95a7@kernel.org/
[2] https://lore.kernel.org/all/5f4947b18bf381615a37aa81c2242477@kernel.org/

Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
---
Cc: "Krzysztof Wilczyński" <kw@linux.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Cc: Jon Hunter <jonathanh@nvidia.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: linux-pci@vger.kernel.org
Cc: linux-renesas-soc@vger.kernel.org
---
 drivers/pci/controller/pcie-rcar-host.c | 2 +-
 kernel/irq/cpuhotplug.c                 | 6 ++++--
 2 files changed, 5 insertions(+), 3 deletions(-)

Comments

Thomas Gleixner July 7, 2024, 6:47 p.m. UTC | #1
Marek!

On Sat, Jul 06 2024 at 15:27, Marek Vasut wrote:

> This is an RFC patch, I am looking for input on the approach taken here.
> If the approach is sound, this patch would be split into proper patchset.
>
> Various PCIe controllers that mux MSIs onto single IRQ line produce these
> "IRQ%d: set affinity failed" warnings when entering suspend. This has been
> discussed before [1] [2] and an example test case is included at the end
> of this commit message.
>
> Attempt to silence the warning by returning specific error code -EOPNOTSUPP
> from the irqchip .irq_set_affinity callback, which skips printing the warning
> in cpuhotplug.c . The -EOPNOTSUPP was chosen because it indicates exactly what
> the problem is, it is not possible to set affinity of each MSI IRQ line to a
> specific CPU due to hardware limitation.

Why does the irq_chip in question have an irq_set_affinity() callback in
the first place?

Thanks,

        tglx

Marek Vasut July 8, 2024, 11:55 a.m. UTC | #2
On 7/7/24 8:47 PM, Thomas Gleixner wrote:
> Marek!

Hello Thomas,

> On Sat, Jul 06 2024 at 15:27, Marek Vasut wrote:
> 
>> This is an RFC patch, I am looking for input on the approach taken here.
>> If the approach is sound, this patch would be split into proper patchset.
>>
>> Various PCIe controllers that mux MSIs onto single IRQ line produce these
>> "IRQ%d: set affinity failed" warnings when entering suspend. This has been
>> discussed before [1] [2] and an example test case is included at the end
>> of this commit message.
>>
>> Attempt to silence the warning by returning specific error code -EOPNOTSUPP
>> from the irqchip .irq_set_affinity callback, which skips printing the warning
>> in cpuhotplug.c . The -EOPNOTSUPP was chosen because it indicates exactly what
>> the problem is, it is not possible to set affinity of each MSI IRQ line to a
>> specific CPU due to hardware limitation.
> 
> Why does the irq_chip in question have an irq_set_affinity() callback in
> the first place?
I believe originally (at least that's what's being discussed in the 
linked threads) it was because the irqchip code didn't check whether 
.irq_set_affinity was not NULL at all, so if it was missing, there would 
be NULL pointer dereference.

Now this is checked and irq_do_set_affinity() returns -EINVAL, which 
triggers the warning that is being silenced by this patch.

If you think this is better, I can:
- Tweak the cpuhotplug.c code to do some
   if (chip && !chip->irq_set_affinity) return false;
- Remove all the .irq_set_affinity implementations from PCI drivers
   which only return -EINVAL

Would that be better ?

Thomas Gleixner July 9, 2024, 5:18 p.m. UTC | #3
On Mon, Jul 08 2024 at 13:55, Marek Vasut wrote:
> On 7/7/24 8:47 PM, Thomas Gleixner wrote:
>> Why does the irq_chip in question have an irq_set_affinity() callback in
>> the first place?
> I believe originally (at least that's what's being discussed in the 
> linked threads) it was because the irqchip code didn't check whether 
> .irq_set_affinity was not NULL at all, so if it was missing, there would 
> be NULL pointer dereference.
>
> Now this is checked and irq_do_set_affinity() returns -EINVAL, which 
> triggers the warning that is being silenced by this patch.
>
> If you think this is better, I can:
> - Tweak the cpuhotplug.c code to do some
>    if (chip && !chip->irq_set_affinity) return false;

It does already:

migrate_one_irq()
  if (chip && !chip->irq_set_affinity)
    return false;

Right at the top.

> - Remove all the .irq_set_affinity implementations from PCI drivers
>    which only return -EINVAL
>
> Would that be better ?

I think so.

Thanks,

        tglx

Thomas Gleixner July 10, 2024, 4:11 p.m. UTC | #4
On Tue, Jul 09 2024 at 19:18, Thomas Gleixner wrote:

> On Mon, Jul 08 2024 at 13:55, Marek Vasut wrote:
>> On 7/7/24 8:47 PM, Thomas Gleixner wrote:
>>> Why does the irq_chip in question have an irq_set_affinity() callback in
>>> the first place?
>> I believe originally (at least that's what's being discussed in the 
>> linked threads) it was because the irqchip code didn't check whether 
>> .irq_set_affinity was not NULL at all, so if it was missing, there would 
>> be NULL pointer dereference.
>>
>> Now this is checked and irq_do_set_affinity() returns -EINVAL, which 
>> triggers the warning that is being silenced by this patch.
>>
>> If you think this is better, I can:
>> - Tweak the cpuhotplug.c code to do some
>>    if (chip && !chip->irq_set_affinity) return false;
>
> It does already:
>
> migrate_one_irq()
>   if (chip && !chip->irq_set_affinity)
>     return false;
>
> Right at the top.

  if (!chip || !chip->irq_set_affinity) {

Obviously :)
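
The effect of that guard can be modelled in userspace. This is a minimal sketch and not kernel code: `model_irq_chip`, `model_irq_desc`, `model_migrate_one_irq()`, `model_set_affinity_einval()` and `model_warnings` are hypothetical, simplified stand-ins invented for illustration. It assumes only what the thread states: a chip without an `.irq_set_affinity` callback is skipped at the top of `migrate_one_irq()`, while a callback that returns an error leads to the "set affinity failed" warning.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for struct irq_chip. */
struct model_irq_chip {
	int (*irq_set_affinity)(unsigned int irq);
};

/* Hypothetical, heavily simplified stand-in for struct irq_desc. */
struct model_irq_desc {
	unsigned int irq;
	const struct model_irq_chip *chip;
};

/* Counts how many times the model printed the warning. */
static int model_warnings;

/* Stub callback that can only fail, like rcar_msi_set_affinity() before the patch. */
static int model_set_affinity_einval(unsigned int irq)
{
	(void)irq;
	return -EINVAL;
}

/*
 * Model of the control flow discussed in the thread. The return value
 * stands for "affinity was broken"; this model never breaks affinity,
 * so it always returns false.
 */
static bool model_migrate_one_irq(const struct model_irq_desc *desc)
{
	const struct model_irq_chip *chip;
	int err;

	assert(desc != NULL);
	chip = desc->chip;

	/* The guard: no chip or no callback means there is nothing to migrate. */
	if (!chip || !chip->irq_set_affinity)
		return false;

	err = chip->irq_set_affinity(desc->irq);
	if (err) {
		fprintf(stderr, "IRQ%u: set affinity failed(%d).\n",
			desc->irq, err);
		model_warnings++;
	}

	return false;
}
```

In this model, a descriptor whose chip leaves `.irq_set_affinity` unset never reaches the warning, whereas one using the `-EINVAL` stub does. That is the reason dropping the stub callbacks from the PCI drivers would silence the message without touching cpuhotplug.c.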

Patch

diff --git a/drivers/pci/controller/pcie-rcar-host.c b/drivers/pci/controller/pcie-rcar-host.c
index c01efc6ea64f6..2314b2b30df8a 100644
--- a/drivers/pci/controller/pcie-rcar-host.c
+++ b/drivers/pci/controller/pcie-rcar-host.c
@@ -660,7 +660,7 @@  static void rcar_msi_irq_unmask(struct irq_data *d)
 
 static int rcar_msi_set_affinity(struct irq_data *d, const struct cpumask *mask, bool force)
 {
-	return -EINVAL;
+	return -EOPNOTSUPP;
 }
 
 static void rcar_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index eb86283901565..822bd6ca40bf9 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -147,8 +147,10 @@  static bool migrate_one_irq(struct irq_desc *desc)
 	}
 
 	if (err) {
-		pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
-				    d->irq, err);
+		if (err != -EOPNOTSUPP) {
+			pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
+					    d->irq, err);
+		}
 		brokeaff = false;
 	}
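
The error filter added by the cpuhotplug.c hunk can be exercised in isolation. This is a minimal userspace sketch under stated assumptions: `model_should_warn()` and `model_report_affinity_error()` are hypothetical helpers that do not exist in the kernel, and `fprintf()` stands in for `pr_warn_ratelimited()` without any rate limiting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Mirrors the patched condition: any non-zero error warns, except
 * -EOPNOTSUPP, which the RFC treats as "the hardware cannot do this".
 */
static bool model_should_warn(int err)
{
	/* Kernel-style error codes are zero or negative. */
	assert(err <= 0);
	return err != 0 && err != -EOPNOTSUPP;
}

/* Prints the warning when the filter allows it; returns true if it printed. */
static bool model_report_affinity_error(unsigned int irq, int err)
{
	if (!model_should_warn(err))
		return false;

	fprintf(stderr, "IRQ%u: set affinity failed(%d).\n", irq, err);
	return true;
}
```

With this filter, the `-EINVAL` returned by the old `rcar_msi_set_affinity()` still prints the message, while the `-EOPNOTSUPP` returned by the patched version stays silent.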