arm64: fix a migrating irq bug when hotplug cpu

Message ID 55E1AD09.7020701@huawei.com (mailing list archive)
State New, archived
Commit Message

Yang Yingliang Aug. 29, 2015, 1 p.m. UTC
From: Yang Yingliang <yangyingliang@huawei.com>

When a cpu is disabled, all irqs will be migrated to another cpu.
In some cases the new affinity is different and needs to be copied
to the irq's affinity. But if the irq is an LPI, its affinity will
not be copied because of irq_set_affinity's return value.
So copy the affinity when the return value is IRQ_SET_MASK_OK_DONE.

Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
  arch/arm64/kernel/irq.c | 9 ++++++---
  1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Jiang Liu Aug. 29, 2015, 3:12 p.m. UTC | #1
On 2015/8/29 21:00, Yang Yingliang wrote:
> From: Yang Yingliang <yangyingliang@huawei.com>
> 
> When a cpu is disabled, all irqs will be migrated to another cpu.
> In some cases the new affinity is different and needs to be copied
> to the irq's affinity. But if the irq is an LPI, its affinity will
> not be copied because of irq_set_affinity's return value.
> So copy the affinity when the return value is IRQ_SET_MASK_OK_DONE.
Hi Yingliang,
	If irq_set_affinity callback returns IRQ_SET_MASK_OK_DONE,
it means that irq_set_affinity has copied the new CPU mask to irq
affinity mask. It would be better to change irq_set_affinity for LPI
to follow this rule.
Thanks!
Gerry
> 
> Cc: Jiang Liu <jiang.liu@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm64/kernel/irq.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
> index 463fa2e..2acc8ec 100644
> --- a/arch/arm64/kernel/irq.c
> +++ b/arch/arm64/kernel/irq.c
> @@ -78,10 +78,13 @@ static bool migrate_one_irq(struct irq_desc *desc)
>      }
> 
>      c = irq_data_get_irq_chip(d);
> -    if (!c->irq_set_affinity)
> +    if (!c->irq_set_affinity) {
>          pr_debug("IRQ%u: unable to set affinity\n", d->irq);
> -    else if (c->irq_set_affinity(d, affinity, false) == IRQ_SET_MASK_OK && ret)
> -        cpumask_copy(irq_data_get_affinity_mask(d), affinity);
> +    } else {
> +        int r = c->irq_set_affinity(d, affinity, false);
> +        if ((r == IRQ_SET_MASK_OK || r == IRQ_SET_MASK_OK_DONE) && ret)
> +            cpumask_copy(irq_data_get_affinity_mask(d), affinity);
> +    }
> 
>      return ret;
>  }
Marc Zyngier Aug. 29, 2015, 6:12 p.m. UTC | #2
On 2015-08-29 16:12, Jiang Liu wrote:
> On 2015/8/29 21:00, Yang Yingliang wrote:
>> From: Yang Yingliang <yangyingliang@huawei.com>
>>
>> When a cpu is disabled, all irqs will be migrated to another cpu.
>> In some cases the new affinity is different and needs to be copied
>> to the irq's affinity. But if the irq is an LPI, its affinity will
>> not be copied because of irq_set_affinity's return value.
>> So copy the affinity when the return value is IRQ_SET_MASK_OK_DONE.
> Hi Yingliang,
> 	If irq_set_affinity callback returns IRQ_SET_MASK_OK_DONE,
> it means that irq_set_affinity has copied the new CPU mask to irq
> affinity mask. It would be better to change irq_set_affinity for LPI
> to follow this rule.

The main issue here seems to be that we do not call irq_set_affinity, but
that we directly call into the top-level irqchip method, which relies on
the core code to do the copy (see irq_do_set_affinity). Too bad.

It feels like the arm/arm64 code would probably be better consolidated into
kernel/irq/migration.c, which already deals with some of this for x86
and ia64. It would save us the duplication and will make sure we don't
miss things next time we add a new return code, as irq_do_set_affinity
would handle this properly.

Thoughts?

          M.
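
The distinction drawn in the message above, between calling the chip
callback directly and going through the core helper, can be modeled in
userspace. The following is a minimal sketch, not kernel source. The enum
names follow include/linux/irq.h, but the cpumask is reduced to a single
unsigned long, and struct model_irq, lpi_like_set_affinity(),
core_style_set_affinity(), arch_style_migrate() and model_self_check() are
hypothetical names invented for illustration. It assumes, as the thread
describes, that the core copies the mask for both IRQ_SET_MASK_OK and
IRQ_SET_MASK_OK_DONE, while the arch code copies only for IRQ_SET_MASK_OK.

```c
#include <assert.h>

/* Return codes as named in include/linux/irq.h. */
enum {
	IRQ_SET_MASK_OK = 0,	/* caller records the new mask */
	IRQ_SET_MASK_OK_NOCOPY,	/* chip already recorded the mask */
	IRQ_SET_MASK_OK_DONE,	/* treated like OK by the core */
};

/* Hypothetical stand-in for struct irq_data: one word as the cpumask. */
struct model_irq {
	unsigned long affinity;
	int (*set_affinity)(struct model_irq *irq, unsigned long mask);
};

/* Assumed LPI-like callback: it leaves the recorded affinity alone
 * and reports IRQ_SET_MASK_OK_DONE. */
static int lpi_like_set_affinity(struct model_irq *irq, unsigned long mask)
{
	(void)irq;
	(void)mask;
	return IRQ_SET_MASK_OK_DONE;
}

/* Models the copy logic the thread attributes to irq_do_set_affinity(). */
static int core_style_set_affinity(struct model_irq *irq, unsigned long mask)
{
	int ret = irq->set_affinity(irq, mask);

	switch (ret) {
	case IRQ_SET_MASK_OK:
	case IRQ_SET_MASK_OK_DONE:
		irq->affinity = mask;
		ret = 0;
		break;
	case IRQ_SET_MASK_OK_NOCOPY:
		ret = 0;
		break;
	}
	return ret;
}

/* Models the pre-patch arch code: copy only on IRQ_SET_MASK_OK. */
static void arch_style_migrate(struct model_irq *irq, unsigned long mask)
{
	if (irq->set_affinity(irq, mask) == IRQ_SET_MASK_OK)
		irq->affinity = mask;
}

/* Shows the stale mask on the arch-style path and the fix on the core-style path. */
static void model_self_check(void)
{
	struct model_irq irq = { 0x1UL, lpi_like_set_affinity };

	arch_style_migrate(&irq, 0x2UL);
	assert(irq.affinity == 0x1UL);	/* stale: DONE was not handled */

	assert(core_style_set_affinity(&irq, 0x2UL) == 0);
	assert(irq.affinity == 0x2UL);	/* mask recorded */
}
```

In this model a new return code only has to be taught to one function,
which is the argument made for consolidating the arch code into the core.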
Hanjun Guo Aug. 30, 2015, 1:15 p.m. UTC | #3
On 08/30/2015 02:12 AM, Marc Zyngier wrote:
> On 2015-08-29 16:12, Jiang Liu wrote:
>> On 2015/8/29 21:00, Yang Yingliang wrote:
>>> From: Yang Yingliang <yangyingliang@huawei.com>
>>>
>>> When a cpu is disabled, all irqs will be migrated to another cpu.
>>> In some cases the new affinity is different and needs to be copied
>>> to the irq's affinity. But if the irq is an LPI, its affinity will
>>> not be copied because of irq_set_affinity's return value.
>>> So copy the affinity when the return value is IRQ_SET_MASK_OK_DONE.
>> Hi Yingliang,
>>     If irq_set_affinity callback returns IRQ_SET_MASK_OK_DONE,
>> it means that irq_set_affinity has copied the new CPU mask to irq
>> affinity mask. It would be better to change irq_set_affinity for LPI
>> to follow this rule.
>
> The main issue here seems to be that we do not call irq_set_affinity, but
> that we directly call into the top-level irqchip method, which relies on
> the core code to do the copy (see irq_do_set_affinity). Too bad.
>
> It feels like the arm/arm64 code would probably be better consolidated into
> kernel/irq/migration.c, which already deals with some of this for x86
> and ia64. It would save us the duplication and will make sure we don't
> miss things next time we add a new return code, as irq_do_set_affinity
> would handle this properly.
>
> Thoughts?

I agree. The irq migration code in arch/arm64/kernel/irq.c is the same
as the ARM32 code, so it is duplicated. But this is a bugfix; can we fix
it in a simple way first, and refactor the code later?

Thanks
Hanjun
Marc Zyngier Aug. 31, 2015, 12:20 p.m. UTC | #4
On Sun, 30 Aug 2015 21:15:56 +0800
Hanjun Guo <hanjun.guo@linaro.org> wrote:

> On 08/30/2015 02:12 AM, Marc Zyngier wrote:
> > On 2015-08-29 16:12, Jiang Liu wrote:
> >> On 2015/8/29 21:00, Yang Yingliang wrote:
> >>> From: Yang Yingliang <yangyingliang@huawei.com>
> >>>
> >>> When a cpu is disabled, all irqs will be migrated to another cpu.
> >>> In some cases the new affinity is different and needs to be copied
> >>> to the irq's affinity. But if the irq is an LPI, its affinity will
> >>> not be copied because of irq_set_affinity's return value.
> >>> So copy the affinity when the return value is IRQ_SET_MASK_OK_DONE.
> >> Hi Yingliang,
> >>     If irq_set_affinity callback returns IRQ_SET_MASK_OK_DONE,
> >> it means that irq_set_affinity has copied the new CPU mask to irq
> >> affinity mask. It would be better to change irq_set_affinity for LPI
> >> to follow this rule.
> >
> > The main issue here seems to be that we do not call irq_set_affinity, but
> > that we directly call into the top-level irqchip method, which relies on
> > the core code to do the copy (see irq_do_set_affinity). Too bad.
> >
> > It feels like the arm/arm64 code would probably be better consolidated into
> > kernel/irq/migration.c, which already deals with some of this for x86
> > and ia64. It would save us the duplication and will make sure we don't
> > miss things next time we add a new return code, as irq_do_set_affinity
> > would handle this properly.
> >
> > Thoughts?
> 
> I agree. The irq migration code in arch/arm64/kernel/irq.c is the same
> as the ARM32 code, so it is duplicated. But this is a bugfix; can we fix
> it in a simple way first, and refactor the code later?

I'm not buying this.

I really can't see how adding more duplication can be beneficial. It is
not so much that there is duplication between arm and arm64 that
bothers me (as if that was the only thing...). The real issue is that
there is duplication between the arch code and the core code.

Migrating interrupts is a core code matter, and that's where it should
be handled IMHO. Plus, we're in the merge window, and there is plenty
of time to get this fixed the proper way.

Thanks,

	M.
Will Deacon Sept. 1, 2015, 8:48 a.m. UTC | #5
On Mon, Aug 31, 2015 at 01:20:31PM +0100, Marc Zyngier wrote:
> On Sun, 30 Aug 2015 21:15:56 +0800
> Hanjun Guo <hanjun.guo@linaro.org> wrote:
> > I agree. The irq migration code in arch/arm64/kernel/irq.c is the same
> > as the ARM32 code, so it is duplicated. But this is a bugfix; can we fix
> > it in a simple way first, and refactor the code later?
> 
> I'm not buying this.
> 
> I really can't see how adding more duplication can be beneficial. It is
> not so much that there is duplication between arm and arm64 that
> bothers me (as if that was the only thing...). The real issue is that
> there is duplication between the arch code and the core code.
> 
> Migrating interrupts is a core code matter, and that's where it should
> be handled IMHO. Plus, we're in the merge window, and there is plenty
> of time to get this fixed the proper way.

Yup. I suggested this over a year ago, but I'm not sure why nothing happened:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2014-June/266923.html

Will
Yang Yingliang Sept. 1, 2015, 10:02 a.m. UTC | #6
On 2015/8/31 20:20, Marc Zyngier wrote:
> On Sun, 30 Aug 2015 21:15:56 +0800
> Hanjun Guo <hanjun.guo@linaro.org> wrote:
>
>> On 08/30/2015 02:12 AM, Marc Zyngier wrote:
>>> On 2015-08-29 16:12, Jiang Liu wrote:
>>>> On 2015/8/29 21:00, Yang Yingliang wrote:
>>>>> From: Yang Yingliang <yangyingliang@huawei.com>
>>>>>
>>>>> When a cpu is disabled, all irqs will be migrated to another cpu.
>>>>> In some cases the new affinity is different and needs to be copied
>>>>> to the irq's affinity. But if the irq is an LPI, its affinity will
>>>>> not be copied because of irq_set_affinity's return value.
>>>>> So copy the affinity when the return value is IRQ_SET_MASK_OK_DONE.
>>>> Hi Yingliang,
>>>>      If irq_set_affinity callback returns IRQ_SET_MASK_OK_DONE,
>>>> it means that irq_set_affinity has copied the new CPU mask to irq
>>>> affinity mask. It would be better to change irq_set_affinity for LPI
>>>> to follow this rule.
>>>
>>> The main issue here seems to be that we do not call irq_set_affinity, but
>>> that we directly call into the top-level irqchip method, which relies on
>>> the core code to do the copy (see irq_do_set_affinity). Too bad.
>>>
>>> It feels like the arm/arm64 code would probably be better consolidated into
>>> kernel/irq/migration.c, which already deals with some of this for x86
>>> and ia64. It would save us the duplication and will make sure we don't
>>> miss things next time we add a new return code, as irq_do_set_affinity
>>> would handle this properly.
>>>
>>> Thoughts?
>>
>> I agree. The irq migration code in arch/arm64/kernel/irq.c is the same
>> as the ARM32 code, so it is duplicated. But this is a bugfix; can we fix
>> it in a simple way first, and refactor the code later?
>
> I'm not buying this.
>
> I really can't see how adding more duplication can be beneficial. It is
> not so much that there is duplication between arm and arm64 that
> bothers me (as if that was the only thing...). The real issue is that
> there is duplication between the arch code and the core code.
>
> Migrating interrupts is a core code matter, and that's where it should
> be handled IMHO. Plus, we're in the merge window, and there is plenty
> of time to get this fixed the proper way.

Got it. I'll try to move the irq migration code into kernel/irq/migration.c.

Regards
Yang

>
> Thanks,
>
> 	M.
>
Patch

diff --git a/arch/arm64/kernel/irq.c b/arch/arm64/kernel/irq.c
index 463fa2e..2acc8ec 100644
--- a/arch/arm64/kernel/irq.c
+++ b/arch/arm64/kernel/irq.c
@@ -78,10 +78,13 @@ static bool migrate_one_irq(struct irq_desc *desc)
 	}
 
 	c = irq_data_get_irq_chip(d);
-	if (!c->irq_set_affinity)
+	if (!c->irq_set_affinity) {
 		pr_debug("IRQ%u: unable to set affinity\n", d->irq);
-	else if (c->irq_set_affinity(d, affinity, false) == IRQ_SET_MASK_OK && ret)
-		cpumask_copy(irq_data_get_affinity_mask(d), affinity);
+	} else {
+		int r = c->irq_set_affinity(d, affinity, false);
+		if ((r == IRQ_SET_MASK_OK || r == IRQ_SET_MASK_OK_DONE) && ret)
+			cpumask_copy(irq_data_get_affinity_mask(d), affinity);
+	}
 
 	return ret;
 }
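
The intent stated in the commit message, copying the mask when the callback
returns either IRQ_SET_MASK_OK or IRQ_SET_MASK_OK_DONE, can be checked in
isolation. The following is a minimal userspace sketch under stated
assumptions, not part of the patch: should_copy_affinity() and
should_copy_self_check() are hypothetical helpers, and the `ret` parameter
simply mirrors the local `ret` flag used in migrate_one_irq().

```c
#include <assert.h>
#include <stdbool.h>

/* Return codes as named in include/linux/irq.h. */
enum {
	IRQ_SET_MASK_OK = 0,
	IRQ_SET_MASK_OK_NOCOPY,
	IRQ_SET_MASK_OK_DONE,
};

/* Hypothetical helper: true when the caller should record the new mask.
 * r is the value returned by a single call to the chip callback. */
static bool should_copy_affinity(int r, bool ret)
{
	return (r == IRQ_SET_MASK_OK || r == IRQ_SET_MASK_OK_DONE) && ret;
}

/* Exercises each return code once. */
static void should_copy_self_check(void)
{
	assert(should_copy_affinity(IRQ_SET_MASK_OK, true));
	assert(should_copy_affinity(IRQ_SET_MASK_OK_DONE, true));
	assert(!should_copy_affinity(IRQ_SET_MASK_OK_NOCOPY, true));
	assert(!should_copy_affinity(IRQ_SET_MASK_OK_DONE, false));
	assert(!should_copy_affinity(-1, true));	/* any error value */
}
```

Storing the callback's result in a local before testing it keeps the
callback to a single invocation per migration.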