
[RESEND,v1,3/9] spmi: pmic-arb: check apid against limits before calling irq handler

Message ID 1631860384-26608-4-git-send-email-quic_fenglinw@quicinc.com (mailing list archive)
State Superseded
Series A bunch of fix and optimization patches in spmi-pmic-arb.c

Commit Message

Fenglin Wu Sept. 17, 2021, 6:32 a.m. UTC
From: David Collins <collinsd@codeaurora.org>

Check that the apid for an SPMI interrupt falls between the
min_apid and max_apid that can be handled by the APPS processor
before invoking the per-apid interrupt handler:
periph_interrupt().

This avoids an access violation in rare cases where the status
bit is set for an interrupt that is not owned by the APPS
processor.

Signed-off-by: David Collins <collinsd@codeaurora.org>
Signed-off-by: Fenglin Wu <quic_fenglinw@quicinc.com>
---
 drivers/spmi/spmi-pmic-arb.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Stephen Boyd Oct. 12, 2021, 6:02 p.m. UTC | #1
Quoting Fenglin Wu (2021-09-16 23:32:58)
> From: David Collins <collinsd@codeaurora.org>
> 
> Check that the apid for an SPMI interrupt falls between the
> min_apid and max_apid that can be handled by the APPS processor
> before invoking the per-apid interrupt handler:
> periph_interrupt().
> 
> This avoids an access violation in rare cases where the status
> bit is set for an interrupt that is not owned by the APPS
> processor.
> 
> Signed-off-by: David Collins <collinsd@codeaurora.org>
> Signed-off-by: Fenglin Wu <quic_fenglinw@quicinc.com>
> ---

Fixes? BTW, a lot of these patches are irqchip specific. It would be
good to get review from irqchip maintainers. Maybe we should split the
irqchip driver off via the auxiliary bus so that irqchip maintainers can
review. Please Cc them on irqchip related patches.

IRQCHIP DRIVERS
M:      Thomas Gleixner <tglx@linutronix.de>
M:      Marc Zyngier <maz@kernel.org>

>  drivers/spmi/spmi-pmic-arb.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
> index 4d7ad004..c4adc06 100644
> --- a/drivers/spmi/spmi-pmic-arb.c
> +++ b/drivers/spmi/spmi-pmic-arb.c
> @@ -535,6 +535,12 @@ static void pmic_arb_chained_irq(struct irq_desc *desc)
>                         id = ffs(status) - 1;
>                         status &= ~BIT(id);
>                         apid = id + i * 32;
> +                       if (apid < pmic_arb->min_apid
> +                           || apid > pmic_arb->max_apid) {

The || goes on the line above. What about making a local variable for
first and last and then shifting by 5 in the loop?

int first = pmic_arb->min_apid;
int last = pmic_arb->max_apid;

for (i = first >> 5; i <= last >> 5; i++)

	if (apid < first || apid > last)

> +                               WARN_ONCE(true, "spurious spmi irq received for apid=%d\n",
> +                                       apid);

Is there any way to recover from this? Or once the mapping is wrong
we're going to get interrupts that we don't know what to do with
forever?

> +                               continue;
> +                       }
>                         enable = readl_relaxed(
>                                         ver_ops->acc_enable(pmic_arb, apid));
>                         if (enable & SPMI_PIC_ACC_ENABLE_BIT)
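A minimal userspace sketch of the loop reshaping suggested in this review comment, assuming the status words can be modeled as a plain array. `struct arb_model`, `scan_status()`, `lowest_set_bit()` and `MAX_WORDS` are hypothetical names made up for illustration; only `min_apid`, `max_apid` and the `apid = id + i * 32` arithmetic come from the patch. It is not the driver code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WORDS 16	/* hypothetical: number of 32-bit status words modeled */

/* Hypothetical stand-in for the driver state; only the limits matter here. */
struct arb_model {
	int min_apid;
	int max_apid;
	uint32_t status[MAX_WORDS];	/* models the latched status words */
};

/* Portable equivalent of ffs(word) - 1; must be called with word != 0. */
static int lowest_set_bit(uint32_t word)
{
	int n = 0;

	while (!(word & 1u)) {
		word >>= 1;
		n++;
	}
	return n;
}

/*
 * Walk only the status words that can contain an owned apid and skip
 * any set bit outside [first, last]. Returns the number of in-range
 * apids seen; out-of-range ones are counted in *spurious.
 */
static int scan_status(const struct arb_model *arb, int *spurious)
{
	int first = arb->min_apid;
	int last = arb->max_apid;
	int handled = 0;
	int i;

	assert(first >= 0 && first <= last && (last >> 5) < MAX_WORDS);
	*spurious = 0;

	for (i = first >> 5; i <= last >> 5; i++) {
		uint32_t status = arb->status[i];

		while (status) {
			int id = lowest_set_bit(status);
			int apid = id + i * 32;

			status &= ~(UINT32_C(1) << id);
			if (apid < first || apid > last) {
				(*spurious)++;	/* the patch warns once here */
				continue;
			}
			handled++;	/* the driver would check enable and dispatch */
		}
	}
	return handled;
}
```

With `min_apid = 40` and `max_apid = 70`, the loop visits only words 1 and 2. The per-bit check is still needed because those two words also cover apids 32..39 and 71..95.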
Fenglin Wu Oct. 13, 2021, 5:31 a.m. UTC | #2
On 10/13/2021 2:02 AM, Stephen Boyd wrote:
> Quoting Fenglin Wu (2021-09-16 23:32:58)
>> From: David Collins <collinsd@codeaurora.org>
>>
>> Check that the apid for an SPMI interrupt falls between the
>> min_apid and max_apid that can be handled by the APPS processor
>> before invoking the per-apid interrupt handler:
>> periph_interrupt().
>>
>> This avoids an access violation in rare cases where the status
>> bit is set for an interrupt that is not owned by the APPS
>> processor.
>>
>> Signed-off-by: David Collins <collinsd@codeaurora.org>
>> Signed-off-by: Fenglin Wu <quic_fenglinw@quicinc.com>
>> ---
> Fixes? BTW, a lot of these patches are irqchip specific. It would be
> good to get review from irqchip maintainers. Maybe we should split the
> irqchip driver off via the auxiliary bus so that irqchip maintainers can
> review. Please Cc them on irqchip related patches.
>
> IRQCHIP DRIVERS
> M:      Thomas Gleixner <tglx@linutronix.de>
> M:      Marc Zyngier <maz@kernel.org>
Sure, I have copied Thomas and Marc for code review.
This is a fix to avoid a register access violation when an interrupt
fires in a PMIC module that is not owned by the APPS processor.
>>   drivers/spmi/spmi-pmic-arb.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
>> index 4d7ad004..c4adc06 100644
>> --- a/drivers/spmi/spmi-pmic-arb.c
>> +++ b/drivers/spmi/spmi-pmic-arb.c
>> @@ -535,6 +535,12 @@ static void pmic_arb_chained_irq(struct irq_desc *desc)
>>                          id = ffs(status) - 1;
>>                          status &= ~BIT(id);
>>                          apid = id + i * 32;
>> +                       if (apid < pmic_arb->min_apid
>> +                           || apid > pmic_arb->max_apid) {
> The || goes on the line above. What about making a local variable for
> first and last and then shifting by 5 in the loop?
>
> int first = pmic_arb->min_apid;
> int last = pmic_arb->max_apid;
>
> for (i = first >> 5; i <= last >> 5; i++)
>
> 	if (apid < first || apid > last)
ACK, will update it following this.
>> +                               WARN_ONCE(true, "spurious spmi irq received for apid=%d\n",
>> +                                       apid);
> Is there any way to recover from this? Or once the mapping is wrong
> we're going to get interrupts that we don't know what to do with
> forever?
This is a rare case where an unexpected interrupt fires in a module
not owned by the APPS processor. The interrupt itself is not expected, so
there is nothing to recover from; we just bail out to avoid the register
access violation that would follow.
>> +                               continue;
>> +                       }
>>                          enable = readl_relaxed(
>>                                          ver_ops->acc_enable(pmic_arb, apid));
>>                          if (enable & SPMI_PIC_ACC_ENABLE_BIT)
Stephen Boyd Oct. 13, 2021, 7:25 p.m. UTC | #3
Quoting Fenglin Wu (2021-10-12 22:31:22)
> 
> On 10/13/2021 2:02 AM, Stephen Boyd wrote:
> > Quoting Fenglin Wu (2021-09-16 23:32:58)
> >> From: David Collins <collinsd@codeaurora.org>
> >>
> >> Check that the apid for an SPMI interrupt falls between the
> >> min_apid and max_apid that can be handled by the APPS processor
> >> before invoking the per-apid interrupt handler:
> >> periph_interrupt().
> >>
> >> This avoids an access violation in rare cases where the status
> >> bit is set for an interrupt that is not owned by the APPS
> >> processor.
> >>
> >> Signed-off-by: David Collins <collinsd@codeaurora.org>
> >> Signed-off-by: Fenglin Wu <quic_fenglinw@quicinc.com>
> >> ---
> > Fixes? BTW, a lot of these patches are irqchip specific. It would be
> > good to get review from irqchip maintainers. Maybe we should split the
> > irqchip driver off via the auxiliary bus so that irqchip maintainers can
> > review. Please Cc them on irqchip related patches.
> >
> > IRQCHIP DRIVERS
> > M:      Thomas Gleixner <tglx@linutronix.de>
> > M:      Marc Zyngier <maz@kernel.org>
> Sure, copied Thomas and Marc for code review.
> This is a fix to avoid the register access violation in a case that an
> interrupt is fired in a PMIC module which is not owned by APPS
> processor.

Got it.

> >>   drivers/spmi/spmi-pmic-arb.c | 6 ++++++
> >>   1 file changed, 6 insertions(+)
> >>
> >> diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
> >> index 4d7ad004..c4adc06 100644
> >> --- a/drivers/spmi/spmi-pmic-arb.c
> >> +++ b/drivers/spmi/spmi-pmic-arb.c
> >> @@ -535,6 +535,12 @@ static void pmic_arb_chained_irq(struct irq_desc *desc)
> >>                          id = ffs(status) - 1;
> >>                          status &= ~BIT(id);
> >>                          apid = id + i * 32;
> >> +                       if (apid < pmic_arb->min_apid
> >> +                           || apid > pmic_arb->max_apid) {
> > The || goes on the line above. What about making a local variable for
> > first and last and then shifting by 5 in the loop?
> >
> > int first = pmic_arb->min_apid;
> > int last = pmic_arb->max_apid;
> >
> > for (i = first >> 5; i <= last >> 5; i++)
> >
> >       if (apid < first || apid > last)
> ACK, will update it following this.
> >> +                               WARN_ONCE(true, "spurious spmi irq received for apid=%d\n",
> >> +                                       apid);
> > Is there any way to recover from this? Or once the mapping is wrong
> > we're going to get interrupts that we don't know what to do with
> > forever?
> This is a rare case that the unexpected interrupt is fired in a module
> not owned by APPS process, so the interrupt itself is not expected hence
> no need to recover from this but just bail out to avoid following register
> access violation.

And then the irq stops coming? It feels like a misconfiguration in the
firmware that we're trying to hide, hence the WARN_ONCE(). Can we
somehow silence irqs that aren't owned by the APPS when this driver
probes so that they can't even happen after probe?
Fenglin Wu Oct. 14, 2021, 3:11 a.m. UTC | #4
On 10/14/2021 3:25 AM, Stephen Boyd wrote:
> Quoting Fenglin Wu (2021-10-12 22:31:22)
>> On 10/13/2021 2:02 AM, Stephen Boyd wrote:
>>> Quoting Fenglin Wu (2021-09-16 23:32:58)
>>>> From: David Collins <collinsd@codeaurora.org>
>>>>
>>>> Check that the apid for an SPMI interrupt falls between the
>>>> min_apid and max_apid that can be handled by the APPS processor
>>>> before invoking the per-apid interrupt handler:
>>>> periph_interrupt().
>>>>
>>>> This avoids an access violation in rare cases where the status
>>>> bit is set for an interrupt that is not owned by the APPS
>>>> processor.
>>>>
>>>> Signed-off-by: David Collins <collinsd@codeaurora.org>
>>>> Signed-off-by: Fenglin Wu <quic_fenglinw@quicinc.com>
>>>> ---
>>> Fixes? BTW, a lot of these patches are irqchip specific. It would be
>>> good to get review from irqchip maintainers. Maybe we should split the
>>> irqchip driver off via the auxiliary bus so that irqchip maintainers can
>>> review. Please Cc them on irqchip related patches.
>>>
>>> IRQCHIP DRIVERS
>>> M:      Thomas Gleixner <tglx@linutronix.de>
>>> M:      Marc Zyngier <maz@kernel.org>
>> Sure, copied Thomas and Marc for code review.
>> This is a fix to avoid the register access violation in a case that an
>> interrupt is fired in a PMIC module which is not owned by APPS
>> processor.
> Got it.
>
>>>>    drivers/spmi/spmi-pmic-arb.c | 6 ++++++
>>>>    1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
>>>> index 4d7ad004..c4adc06 100644
>>>> --- a/drivers/spmi/spmi-pmic-arb.c
>>>> +++ b/drivers/spmi/spmi-pmic-arb.c
>>>> @@ -535,6 +535,12 @@ static void pmic_arb_chained_irq(struct irq_desc *desc)
>>>>                           id = ffs(status) - 1;
>>>>                           status &= ~BIT(id);
>>>>                           apid = id + i * 32;
>>>> +                       if (apid < pmic_arb->min_apid
>>>> +                           || apid > pmic_arb->max_apid) {
>>> The || goes on the line above. What about making a local variable for
>>> first and last and then shifting by 5 in the loop?
>>>
>>> int first = pmic_arb->min_apid;
>>> int last = pmic_arb->max_apid;
>>>
>>> for (i = first >> 5; i <= last >> 5; i++)
>>>
>>>        if (apid < first || apid > last)
>> ACK, will update it following this.
>>>> +                               WARN_ONCE(true, "spurious spmi irq received for apid=%d\n",
>>>> +                                       apid);
>>> Is there any way to recover from this? Or once the mapping is wrong
>>> we're going to get interrupts that we don't know what to do with
>>> forever?
>> This is a rare case that the unexpected interrupt is fired in a module
>> not owned by APPS process, so the interrupt itself is not expected hence
>> no need to recover from this but just bail out to avoid following register
>> access violation.
> And then the irq stops coming? It feels like a misconfiguration in the
> firmware that we're trying to hide, hence the WARN_ONCE(). Can we
> somehow silence irqs that aren't owned by the APPS when this driver
> probes so that they can't even happen after probe?
Actually, this case happens rarely and can't be reproduced easily or
consistently for further debugging. I agree that it is probably caused by a
HW misconfiguration, or even by some unknown HW bug that sends out SPMI
interrupt messages with an incorrect APID, but we have never had a chance
to find the root cause. The patch here simply checks the APID and bails
out if it is not in the valid range; it doesn't cause anything bad and it
improves the SW robustness. After that, the IRQ won't be triggered again
because the latched status in the PMIC is not cleared. Also, because access
to the registers corresponding to this APID is restricted, there is
nothing we can do from the APPS processor side to keep it silent.
Stephen Boyd Oct. 15, 2021, 1:15 a.m. UTC | #5
Quoting Fenglin Wu (2021-10-13 20:11:40)
> 
> On 10/14/2021 3:25 AM, Stephen Boyd wrote:
> > Quoting Fenglin Wu (2021-10-12 22:31:22)
> >> On 10/13/2021 2:02 AM, Stephen Boyd wrote:
> >>> Quoting Fenglin Wu (2021-09-16 23:32:58)
> >>>> From: David Collins <collinsd@codeaurora.org>
> >>>>
> >>>> Check that the apid for an SPMI interrupt falls between the
> >>>> min_apid and max_apid that can be handled by the APPS processor
> >>>> before invoking the per-apid interrupt handler:
> >>>> periph_interrupt().
> >>>>
> >>>> This avoids an access violation in rare cases where the status
> >>>> bit is set for an interrupt that is not owned by the APPS
> >>>> processor.
> >>>>
> >>>> Signed-off-by: David Collins <collinsd@codeaurora.org>
> >>>> Signed-off-by: Fenglin Wu <quic_fenglinw@quicinc.com>
> >>>> ---
> >>> Fixes? BTW, a lot of these patches are irqchip specific. It would be
> >>> good to get review from irqchip maintainers. Maybe we should split the
> >>> irqchip driver off via the auxiliary bus so that irqchip maintainers can
> >>> review. Please Cc them on irqchip related patches.
> >>>
> >>> IRQCHIP DRIVERS
> >>> M:      Thomas Gleixner <tglx@linutronix.de>
> >>> M:      Marc Zyngier <maz@kernel.org>
> >> Sure, copied Thomas and Marc for code review.
> >> This is a fix to avoid the register access violation in a case that an
> >> interrupt is fired in a PMIC module which is not owned by APPS
> >> processor.
> > Got it.
> >
> >>>>    drivers/spmi/spmi-pmic-arb.c | 6 ++++++
> >>>>    1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
> >>>> index 4d7ad004..c4adc06 100644
> >>>> --- a/drivers/spmi/spmi-pmic-arb.c
> >>>> +++ b/drivers/spmi/spmi-pmic-arb.c
> >>>> @@ -535,6 +535,12 @@ static void pmic_arb_chained_irq(struct irq_desc *desc)
> >>>>                           id = ffs(status) - 1;
> >>>>                           status &= ~BIT(id);
> >>>>                           apid = id + i * 32;
> >>>> +                       if (apid < pmic_arb->min_apid
> >>>> +                           || apid > pmic_arb->max_apid) {
> >>> The || goes on the line above. What about making a local variable for
> >>> first and last and then shifting by 5 in the loop?
> >>>
> >>> int first = pmic_arb->min_apid;
> >>> int last = pmic_arb->max_apid;
> >>>
> >>> for (i = first >> 5; i <= last >> 5; i++)
> >>>
> >>>        if (apid < first || apid > last)
> >> ACK, will update it following this.
> >>>> +                               WARN_ONCE(true, "spurious spmi irq received for apid=%d\n",
> >>>> +                                       apid);
> >>> Is there any way to recover from this? Or once the mapping is wrong
> >>> we're going to get interrupts that we don't know what to do with
> >>> forever?
> >> This is a rare case that the unexpected interrupt is fired in a module
> >> not owned by APPS process, so the interrupt itself is not expected hence
> >> no need to recover from this but just bail out to avoid following register
> >> access violation.
> > And then the irq stops coming? It feels like a misconfiguration in the
> > firmware that we're trying to hide, hence the WARN_ONCE(). Can we
> > somehow silence irqs that aren't owned by the APPS when this driver
> > probes so that they can't even happen after probe?
> Actually this is a rarely happened case that couldn't be reproduced easily
> and consistently for further debug. I agreed this should be caused by HW
> misconfiguration or even some unknown HW bug that it would send out SPMI
> interrupt messages with incorrect APID, but we have never had any chance
> to find out the root cause. The patch here simply checked the APID and
> bail out if it's not in the valid range, it won't cause anything bad but
> improves the SW robustness. After that, the IRQ won't be triggered again
> because the latched status in PMIC is not cleared. Also, because of the
> access restriction to the registers corresponding to this APID, there is
> nothing we can do from APPS processor side to keep it silent.

This patch seems like a band-aid for an issue that isn't fully
understood. I suppose it's good that the irq will stay asserted forever
and then it won't happen again until it gets cleared by some other
processor in the SoC. Instead of the WARN_ONCE() can we track if any irq
is handled when the chained irq is raised, and if nothing is handled
then call handle_bad_irq() on the chained descriptor? Take a look at
pinctrl-msm.c to see how they handled spurious irqs that aren't actually
directed at the APPS processor. We should do something similar here.
Fenglin Wu Oct. 15, 2021, 1:54 a.m. UTC | #6
On 10/15/2021 9:15 AM, Stephen Boyd wrote:
> Quoting Fenglin Wu (2021-10-13 20:11:40)
>> On 10/14/2021 3:25 AM, Stephen Boyd wrote:
>>> Quoting Fenglin Wu (2021-10-12 22:31:22)
>>>> On 10/13/2021 2:02 AM, Stephen Boyd wrote:
>>>>> Quoting Fenglin Wu (2021-09-16 23:32:58)
>>>>>> From: David Collins <collinsd@codeaurora.org>
>>>>>>
>>>>>> Check that the apid for an SPMI interrupt falls between the
>>>>>> min_apid and max_apid that can be handled by the APPS processor
>>>>>> before invoking the per-apid interrupt handler:
>>>>>> periph_interrupt().
>>>>>>
>>>>>> This avoids an access violation in rare cases where the status
>>>>>> bit is set for an interrupt that is not owned by the APPS
>>>>>> processor.
>>>>>>
>>>>>> Signed-off-by: David Collins <collinsd@codeaurora.org>
>>>>>> Signed-off-by: Fenglin Wu <quic_fenglinw@quicinc.com>
>>>>>> ---
>>>>> Fixes? BTW, a lot of these patches are irqchip specific. It would be
>>>>> good to get review from irqchip maintainers. Maybe we should split the
>>>>> irqchip driver off via the auxiliary bus so that irqchip maintainers can
>>>>> review. Please Cc them on irqchip related patches.
>>>>>
>>>>> IRQCHIP DRIVERS
>>>>> M:      Thomas Gleixner <tglx@linutronix.de>
>>>>> M:      Marc Zyngier <maz@kernel.org>
>>>> Sure, copied Thomas and Marc for code review.
>>>> This is a fix to avoid the register access violation in a case that an
>>>> interrupt is fired in a PMIC module which is not owned by APPS
>>>> processor.
>>> Got it.
>>>
>>>>>>     drivers/spmi/spmi-pmic-arb.c | 6 ++++++
>>>>>>     1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
>>>>>> index 4d7ad004..c4adc06 100644
>>>>>> --- a/drivers/spmi/spmi-pmic-arb.c
>>>>>> +++ b/drivers/spmi/spmi-pmic-arb.c
>>>>>> @@ -535,6 +535,12 @@ static void pmic_arb_chained_irq(struct irq_desc *desc)
>>>>>>                            id = ffs(status) - 1;
>>>>>>                            status &= ~BIT(id);
>>>>>>                            apid = id + i * 32;
>>>>>> +                       if (apid < pmic_arb->min_apid
>>>>>> +                           || apid > pmic_arb->max_apid) {
>>>>> The || goes on the line above. What about making a local variable for
>>>>> first and last and then shifting by 5 in the loop?
>>>>>
>>>>> int first = pmic_arb->min_apid;
>>>>> int last = pmic_arb->max_apid;
>>>>>
>>>>> for (i = first >> 5; i <= last >> 5; i++)
>>>>>
>>>>>         if (apid < first || apid > last)
>>>> ACK, will update it following this.
>>>>>> +                               WARN_ONCE(true, "spurious spmi irq received for apid=%d\n",
>>>>>> +                                       apid);
>>>>> Is there any way to recover from this? Or once the mapping is wrong
>>>>> we're going to get interrupts that we don't know what to do with
>>>>> forever?
>>>> This is a rare case that the unexpected interrupt is fired in a module
>>>> not owned by APPS process, so the interrupt itself is not expected hence
>>>> no need to recover from this but just bail out to avoid following register
>>>> access violation.
>>> And then the irq stops coming? It feels like a misconfiguration in the
>>> firmware that we're trying to hide, hence the WARN_ONCE(). Can we
>>> somehow silence irqs that aren't owned by the APPS when this driver
>>> probes so that they can't even happen after probe?
>> Actually this is a rarely happened case that couldn't be reproduced easily
>> and consistently for further debug. I agreed this should be caused by HW
>> misconfiguration or even some unknown HW bug that it would send out SPMI
>> interrupt messages with incorrect APID, but we have never had any chance
>> to find out the root cause. The patch here simply checked the APID and
>> bail out if it's not in the valid range, it won't cause anything bad but
>> improves the SW robustness. After that, the IRQ won't be triggered again
>> because the latched status in PMIC is not cleared. Also, because of the
>> access restriction to the registers corresponding to this APID, there is
>> nothing we can do from APPS processor side to keep it silent.
> This patch seems like a band-aid for an issue that isn't fully
> understood. I suppose it's good that the irq will stay asserted forever
> and then it won't happen again until it gets cleared by some other
> processor in the SoC. Instead of the WARN_ONCE() can we track if any irq
> is handled when the chained irq is raised, and if nothing is handled
> then call handle_bad_irq() on the chained descriptor? Take a look at
> pinctrl-msm.c to see how they handled spurious irqs that aren't actually
> directed at the APPS processor. We should do something similar here.
Sure, I will do it that way.
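The approach agreed on above (count what was handled, and report a bad irq when nothing was) can be sketched in userspace as follows. This is only a model under stated assumptions: `struct chained_model`, `model_chained_irq()`, `model_periph_interrupt()` and `model_handle_bad_irq()` are hypothetical stand-ins, and the real `handle_bad_irq()` takes the chained irq descriptor rather than a model struct. It does not claim to be what the next revision of the patch looks like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_WORDS 4	/* hypothetical: number of status words modeled */

struct chained_model {
	int min_apid;
	int max_apid;
	uint32_t status[NUM_WORDS];
	int bad_irq_calls;	/* how often the bad-irq path was taken */
};

/* Stand-in for handle_bad_irq(desc): only records that it was called. */
static void model_handle_bad_irq(struct chained_model *m)
{
	m->bad_irq_calls++;
}

/* Stand-in for periph_interrupt(): pretend every owned apid is handled. */
static bool model_periph_interrupt(struct chained_model *m, int apid)
{
	(void)m;
	(void)apid;
	return true;
}

/* Portable equivalent of ffs(word) - 1; must be called with word != 0. */
static int lowest_set_bit(uint32_t word)
{
	int n = 0;

	while (!(word & 1u)) {
		word >>= 1;
		n++;
	}
	return n;
}

/* Returns how many apids were handled during this chained interrupt. */
static int model_chained_irq(struct chained_model *m)
{
	int first = m->min_apid;
	int last = m->max_apid;
	int handled = 0;
	int i;

	assert(first >= 0 && first <= last && (last >> 5) < NUM_WORDS);

	for (i = first >> 5; i <= last >> 5; i++) {
		uint32_t status = m->status[i];

		while (status) {
			int id = lowest_set_bit(status);
			int apid = id + i * 32;

			status &= ~(UINT32_C(1) << id);
			if (apid < first || apid > last)
				continue;	/* not owned: skip silently */
			if (model_periph_interrupt(m, apid))
				handled++;
		}
	}

	/* Nothing owned by this processor fired: treat the irq as spurious. */
	if (handled == 0)
		model_handle_bad_irq(m);

	return handled;
}
```

Compared with an unconditional `WARN_ONCE()`, this reports through the bad-irq path only when the whole chained interrupt turned out to carry nothing for this processor.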

Patch

diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
index 4d7ad004..c4adc06 100644
--- a/drivers/spmi/spmi-pmic-arb.c
+++ b/drivers/spmi/spmi-pmic-arb.c
@@ -535,6 +535,12 @@  static void pmic_arb_chained_irq(struct irq_desc *desc)
 			id = ffs(status) - 1;
 			status &= ~BIT(id);
 			apid = id + i * 32;
+			if (apid < pmic_arb->min_apid
+			    || apid > pmic_arb->max_apid) {
+				WARN_ONCE(true, "spurious spmi irq received for apid=%d\n",
+					apid);
+				continue;
+			}
 			enable = readl_relaxed(
 					ver_ops->acc_enable(pmic_arb, apid));
 			if (enable & SPMI_PIC_ACC_ENABLE_BIT)