[RFC] genirq: Change the non-balanced irq to balance irq when the cpu of the irq bounded off line

Message ID 1459481291-10136-1-git-send-email-majun258@huawei.com (mailing list archive)
State New, archived

Commit Message

majun (F) April 1, 2016, 3:28 a.m. UTC
From: Ma Jun <majun258@huawei.com>

When the CPU that a non-balanced irq is bound to goes offline, the irq is
migrated to another CPU, usually the first online CPU.

Consider a system that has more than one non-balanced irq. In the extreme
case, all of these irqs are migrated to the same CPU, which then runs under
heavy irq pressure and may even bring the system down.
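
As a rough illustration of the scenario above, the following userspace C
sketch models a few pinned irqs and what happens to them as their CPUs go
offline. It is not kernel code: struct pinned_irq, first_online_cpu(),
offline_cpu() and irqs_on_cpu() are hypothetical names, and picking the
lowest-numbered online CPU as the migration target is an assumption standing
in for "usually the first online CPU".

```c
#include <assert.h>

#define NR_CPUS 4
#define NR_IRQS 3

/* Hypothetical model: each non-balanced irq targets exactly one CPU. */
struct pinned_irq {
	int cpu;
};

/* Return the lowest-numbered online CPU, or -1 if none is online. */
static int first_online_cpu(const int online[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (online[cpu])
			return cpu;
	return -1;
}

/*
 * Mark @cpu offline and move every irq that was pinned to it to the
 * first CPU that is still online. Returns the number of irqs moved.
 */
static int offline_cpu(int cpu, int online[NR_CPUS],
		       struct pinned_irq irqs[NR_IRQS])
{
	int moved = 0;
	int target;

	assert(cpu >= 0 && cpu < NR_CPUS);
	online[cpu] = 0;
	target = first_online_cpu(online);

	for (int i = 0; i < NR_IRQS; i++) {
		if (irqs[i].cpu == cpu && target >= 0) {
			irqs[i].cpu = target;
			moved++;
		}
	}
	return moved;
}

/* Count how many irqs currently target @cpu. */
static int irqs_on_cpu(int cpu, const struct pinned_irq irqs[NR_IRQS])
{
	int count = 0;

	for (int i = 0; i < NR_IRQS; i++)
		if (irqs[i].cpu == cpu)
			count++;
	return count;
}
```

In this model, starting with three irqs pinned to CPUs 1, 2 and 3 and taking
those CPUs offline one after another leaves all three irqs on CPU 0.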

So I think we may need to turn such a non-balanced irq into one that can be
balanced, to avoid the problem described above.

This may not be a good solution to the problem; please suggest a better one
if you have it.

Signed-off-by: Ma Jun <majun258@huawei.com>
---
 kernel/irq/cpuhotplug.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

Comments

Marc Zyngier April 1, 2016, 8:30 a.m. UTC | #1
Hi Ma Jun,

On 01/04/16 04:28, MaJun wrote:
> From: Ma Jun <majun258@huawei.com>
> 
> When the CPU that a non-balanced irq is bound to goes offline, the irq is
> migrated to another CPU, usually the first online CPU.
> 
> Consider a system that has more than one non-balanced irq. In the extreme
> case, all of these irqs are migrated to the same CPU, which then runs under
> heavy irq pressure and may even bring the system down.

It would take a hell of a lot of interrupts (and a very badly designed
system) for that system to collapse under the interrupt load. Whatever
people tend to think, interrupts are a very rare event.

Any moderately ancient CPU can take several hundred thousand
interrupts per second, and you still barely notice it (try any embedded
platform with a bunch of MMC controllers...).

Now, let's get to the actual question:

> So I think we may need to turn such a non-balanced irq into one that can be
> balanced, to avoid the problem described above.

But what makes you think that you can safely clear that flag? If it has
been excluded from balancing, that's surely for a good reason, and the
device driver that requested this probably doesn't expect the interrupt
affinity to change, other than by the effect of CPU hotplug itself.

So if you're seeing a problem with an interrupt not being balanced,
please first investigate *why* the driver asked for it in the first place.

But to the best of my understanding, this patch doesn't solve anything.

Thanks,

	N,

Patch

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 011f8c4..80d54a5 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -30,6 +30,8 @@  static bool migrate_one_irq(struct irq_desc *desc)
 		return false;
 
 	if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
+		if (irq_settings_has_no_balance_set(desc))
+			irqd_clear(d, IRQD_NO_BALANCING);
 		affinity = cpu_online_mask;
 		ret = true;
 	}
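
To make the effect of the hunk above easier to follow, here is a simplified
userspace model of the patched branch. It is only a sketch under stated
assumptions: struct model_irq and model_migrate_one_irq() are hypothetical
names, CPU masks are modelled as plain 32-bit bitmasks, the no_balancing field
stands in for IRQD_NO_BALANCING, and everything else the real
migrate_one_irq() does is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-irq state touched by the hunk. */
struct model_irq {
	uint32_t affinity;	/* bit n set: irq may target CPU n */
	bool no_balancing;	/* models IRQD_NO_BALANCING */
};

/*
 * Model of the patched branch: if no CPU in the affinity mask is still
 * online, clear the no-balancing flag and fall back to all online CPUs.
 * Returns true when the original affinity had to be broken.
 */
static bool model_migrate_one_irq(struct model_irq *irq, uint32_t online_mask)
{
	assert(online_mask != 0);	/* at least one CPU must stay online */

	if ((irq->affinity & online_mask) == 0) {
		irq->no_balancing = false;	/* what the RFC adds */
		irq->affinity = online_mask;
		return true;
	}
	return false;
}
```

In this model the flag is dropped as a side effect of hotplug, without the
driver that asked for it being involved, and it is only touched when none of
the irq's CPUs remain online.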