diff mbox series

iio: adc: stm32-adc: fix erroneous handling of spurious IRQs

Message ID 20210112152441.20665-1-a.fatoum@pengutronix.de (mailing list archive)
State New, archived
Headers show
Series iio: adc: stm32-adc: fix erroneous handling of spurious IRQs | expand

Commit Message

Ahmad Fatoum Jan. 12, 2021, 3:24 p.m. UTC
1c6c69525b40 ("genirq: Reject bogus threaded irq requests") makes sure
that threaded IRQs either
  - have IRQF_ONESHOT set
  - don't have the default just return IRQ_WAKE_THREAD primary handler

This is necessary because level-triggered interrupts need to be masked,
either at device or irqchip, to avoid an interrupt storm.

For spurious interrupts, the STM32 ADC driver still does this bogus
request though:
  - It doesn't set IRQF_ONESHOT
  - Its primary handler just returns IRQ_WAKE_THREAD if the interrupt
    is unexpected, i.e. !(status & enabled_mask)
  - stm32mp151.dtsi describes the ADC interrupt as level-triggered

Fix this by setting IRQF_ONESHOT to have the irqchip mask the IRQ
until the IRQ thread has finished.

IRQF_ONESHOT also has the effect that the primary handler is no longer
forced into a thread. This makes the issue with spurious interrupts
interrupts disappear when reading the ADC on a threadirqs=1 kernel.
This used to result in following kernel error message:

	iio iio:device1: Unexpected IRQ: IER=0x00000000, ISR=0x0000100e
or
	iio iio:device1: Unexpected IRQ: IER=0x00000004, ISR=0x0000100a

But with this patch applied (or threaded IRQs disabled), this no longer
occurs.

Cc: Lucas Stach <l.stach@pengutronix.de>
Reported-by: Holger Assmann <has@pengutronix.de>
Fixes: 695e2f5c289b ("iio: adc: stm32-adc: fix a regression when using dma and irq")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 drivers/iio/adc/stm32-adc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jonathan Cameron Jan. 16, 2021, 5:53 p.m. UTC | #1
On Tue, 12 Jan 2021 16:24:42 +0100
Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:

> 1c6c69525b40 ("genirq: Reject bogus threaded irq requests") makes sure
> that threaded IRQs either
>   - have IRQF_ONESHOT set
>   - don't have the default just return IRQ_WAKE_THREAD primary handler
> 
> This is necessary because level-triggered interrupts need to be masked,
> either at device or irqchip, to avoid an interrupt storm.
> 
> For spurious interrupts, the STM32 ADC driver still does this bogus
> request though:
>   - It doesn't set IRQF_ONESHOT
>   - Its primary handler just returns IRQ_WAKE_THREAD if the interrupt
>     is unexpected, i.e. !(status & enabled_mask)

This seems 'unusual'.  If this is a spurious interrupt we should be
returning IRQ_NONE and letting the spurious interrupt protection
stuff kick in.

The only reason I can see that it does this is print an error message.
I'm not sure why we need to go into the thread to do that given
it's not supposed to happen. If we need that message at all, I'd
suggest doing it in the interrupt handler then return IRQ_NONE;

>   - stm32mp151.dtsi describes the ADC interrupt as level-triggered
> 
> Fix this by setting IRQF_ONESHOT to have the irqchip mask the IRQ
> until the IRQ thread has finished.
> 
> IRQF_ONESHOT also has the effect that the primary handler is no longer
> forced into a thread. This makes the issue with spurious interrupts
> interrupts disappear when reading the ADC on a threadirqs=1 kernel.
> This used to result in following kernel error message:
> 
> 	iio iio:device1: Unexpected IRQ: IER=0x00000000, ISR=0x0000100e
> or
> 	iio iio:device1: Unexpected IRQ: IER=0x00000004, ISR=0x0000100a
> 
> But with this patch applied (or threaded IRQs disabled), this no longer
> occurs.
> 
> Cc: Lucas Stach <l.stach@pengutronix.de>
> Reported-by: Holger Assmann <has@pengutronix.de>
> Fixes: 695e2f5c289b ("iio: adc: stm32-adc: fix a regression when using dma and irq")
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> ---
>  drivers/iio/adc/stm32-adc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iio/adc/stm32-adc.c b/drivers/iio/adc/stm32-adc.c
> index c067c994dae2..7e0e21c79ac8 100644
> --- a/drivers/iio/adc/stm32-adc.c
> +++ b/drivers/iio/adc/stm32-adc.c
> @@ -1910,7 +1910,7 @@ static int stm32_adc_probe(struct platform_device *pdev)
>  
>  	ret = devm_request_threaded_irq(&pdev->dev, adc->irq, stm32_adc_isr,
>  					stm32_adc_threaded_isr,
> -					0, pdev->name, indio_dev);
> +					IRQF_ONESHOT, pdev->name, indio_dev);
>  	if (ret) {
>  		dev_err(&pdev->dev, "failed to request IRQ\n");
>  		return ret;
Ahmad Fatoum Jan. 18, 2021, 11:42 a.m. UTC | #2
Hello Jonathan,

On 16.01.21 18:53, Jonathan Cameron wrote:
> On Tue, 12 Jan 2021 16:24:42 +0100
> Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
> 
>> 1c6c69525b40 ("genirq: Reject bogus threaded irq requests") makes sure
>> that threaded IRQs either
>>   - have IRQF_ONESHOT set
>>   - don't have the default just return IRQ_WAKE_THREAD primary handler
>>
>> This is necessary because level-triggered interrupts need to be masked,
>> either at device or irqchip, to avoid an interrupt storm.
>>
>> For spurious interrupts, the STM32 ADC driver still does this bogus
>> request though:
>>   - It doesn't set IRQF_ONESHOT
>>   - Its primary handler just returns IRQ_WAKE_THREAD if the interrupt
>>     is unexpected, i.e. !(status & enabled_mask)
> 
> This seems 'unusual'.  If this is a spurious interrupt we should be
> returning IRQ_NONE and letting the spurious interrupt protection
> stuff kick in.
> 
> The only reason I can see that it does this is print an error message.
> I'm not sure why we need to go into the thread to do that given
> it's not supposed to happen. If we need that message at all, I'd
> suggest doing it in the interrupt handler then return IRQ_NONE;

As described, I run into the spurious IRQ case, so I think the message is
still useful (until that's properly fixed), but yes, it should've returned
IRQ_NONE in that case.

With these changes, IRQF_ONESHOT shouldn't be necessary, but in practice
the driver doesn't function correctly with the primary IRQ handler threaded.

Olivier, Fabrice: Are you aware of this problem?

Cheers,
Ahmad

> 
>>   - stm32mp151.dtsi describes the ADC interrupt as level-triggered
>>
>> Fix this by setting IRQF_ONESHOT to have the irqchip mask the IRQ
>> until the IRQ thread has finished.
>>
>> IRQF_ONESHOT also has the effect that the primary handler is no longer
>> forced into a thread. This makes the issue with spurious interrupts
>> interrupts disappear when reading the ADC on a threadirqs=1 kernel.
>> This used to result in following kernel error message:
>>
>> 	iio iio:device1: Unexpected IRQ: IER=0x00000000, ISR=0x0000100e
>> or
>> 	iio iio:device1: Unexpected IRQ: IER=0x00000004, ISR=0x0000100a
>>
>> But with this patch applied (or threaded IRQs disabled), this no longer
>> occurs.
>>
>> Cc: Lucas Stach <l.stach@pengutronix.de>
>> Reported-by: Holger Assmann <has@pengutronix.de>
>> Fixes: 695e2f5c289b ("iio: adc: stm32-adc: fix a regression when using dma and irq")
>> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
>> ---
>>  drivers/iio/adc/stm32-adc.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iio/adc/stm32-adc.c b/drivers/iio/adc/stm32-adc.c
>> index c067c994dae2..7e0e21c79ac8 100644
>> --- a/drivers/iio/adc/stm32-adc.c
>> +++ b/drivers/iio/adc/stm32-adc.c
>> @@ -1910,7 +1910,7 @@ static int stm32_adc_probe(struct platform_device *pdev)
>>  
>>  	ret = devm_request_threaded_irq(&pdev->dev, adc->irq, stm32_adc_isr,
>>  					stm32_adc_threaded_isr,
>> -					0, pdev->name, indio_dev);
>> +					IRQF_ONESHOT, pdev->name, indio_dev);
>>  	if (ret) {
>>  		dev_err(&pdev->dev, "failed to request IRQ\n");
>>  		return ret;
> 
>
Fabrice Gasnier Jan. 19, 2021, 5:56 p.m. UTC | #3
On 1/18/21 12:42 PM, Ahmad Fatoum wrote:
> Hello Jonathan,
>
> On 16.01.21 18:53, Jonathan Cameron wrote:
>> On Tue, 12 Jan 2021 16:24:42 +0100
>> Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>>
>>> 1c6c69525b40 ("genirq: Reject bogus threaded irq requests") makes sure
>>> that threaded IRQs either
>>>   - have IRQF_ONESHOT set
>>>   - don't have the default just return IRQ_WAKE_THREAD primary handler
>>>
>>> This is necessary because level-triggered interrupts need to be masked,
>>> either at device or irqchip, to avoid an interrupt storm.
>>>
>>> For spurious interrupts, the STM32 ADC driver still does this bogus
>>> request though:
>>>   - It doesn't set IRQF_ONESHOT
>>>   - Its primary handler just returns IRQ_WAKE_THREAD if the interrupt
>>>     is unexpected, i.e. !(status & enabled_mask)
>> This seems 'unusual'.  If this is a spurious interrupt we should be
>> returning IRQ_NONE and letting the spurious interrupt protection
>> stuff kick in.
>>
>> The only reason I can see that it does this is print an error message.
>> I'm not sure why we need to go into the thread to do that given
>> it's not supposed to happen. If we need that message at all, I'd
>> suggest doing it in the interrupt handler then return IRQ_NONE;
> As described, I run into the spurious IRQ case, so I think the message is
> still useful (until that's properly fixed), but yes, it should've returned
> IRQ_NONE in that case.
>
> With these changes, IRQF_ONESHOT shouldn't be necessary, but in practice
> the driver doesn't function correctly with the primary IRQ handler threaded.
>
> Olivier, Fabrice: Are you aware of this problem?


Hi Ahmad, Jonathan,

I wasn't aware of this up to now. I confirm we've the same behavior at
our end with threadirqs=1.

Olivier and I started to look at this. Indeed, the IRQF_ONESHOT makes
the issue to disappear.
I'm not sure 100% that's for the above reasons. Please let me share some
piece of logs, analysis and thoughts.


I may miss it but, the patch "genirq: Reject bogus threaded irq
requests" seems to handle the case where no HW handler is provided, but
only the threaded part?

In the stm32-adc both are provided. Also the IRQ domain in
stm32-adc-core maybe a key here ?


We did some testing, ftrace and observed following behavior for one
capture (a single cat in_voltage..._raw) :

in stm32-adc-core, as IRQ source is still active until the IRQ thread
can execute:
- stm32_adc_irq_handler <-- generic_handle_irq
- stm32_adc_irq_handler <-- generic_handle_irq
- stm32_adc_irq_handler <-- generic_handle_irq
...

- sched_switch to the 1st IRQ thread
- stm32_adc_irq_handler <-- generic_handle_irq (again until DR get read)

- stm32_adc_isr <-- irq_forced_thread_fn (from stm32-adc)
  DR read, clears the active flag
- stm32_adc_isr <-- irq_forced_thread_fn
  wakes the 2nd IRQ thread to print an error (unexpected...)

sched_switch to the 2nd IRQ thread that prints the message.

- stm32_adc_threaded_isr <-- irq_thread_fn


So my understanding is: the cause seems to be the concurrency between

- stm32_adc_irq_handler() storm calls in stm32-adc-core
- stm32_adc_isr() call to clear the cause (forced into a thread with
threadirqs=1).

To properly work, the stm32_adc_irq_handler() should be masked in between.

As you explain, this works in this case: the call to stm32_adc_isr (in
stm32-adc) isn't longer forced threaded with IRQF_ONESHOT.

It looks like IRQF_NO_THREAD for forced threading would have similar
effect? Maybe the same would be applicable here ? (I haven't tested...)


Hopefully this helps and is similar to what you observed.

Thanks and best regards,
Fabrice

>
> Cheers,
> Ahmad
>
>>>   - stm32mp151.dtsi describes the ADC interrupt as level-triggered
>>>
>>> Fix this by setting IRQF_ONESHOT to have the irqchip mask the IRQ
>>> until the IRQ thread has finished.
>>>
>>> IRQF_ONESHOT also has the effect that the primary handler is no longer
>>> forced into a thread. This makes the issue with spurious interrupts
>>> interrupts disappear when reading the ADC on a threadirqs=1 kernel.
>>> This used to result in following kernel error message:
>>>
>>> 	iio iio:device1: Unexpected IRQ: IER=0x00000000, ISR=0x0000100e
>>> or
>>> 	iio iio:device1: Unexpected IRQ: IER=0x00000004, ISR=0x0000100a
>>>
>>> But with this patch applied (or threaded IRQs disabled), this no longer
>>> occurs.
>>>
>>> Cc: Lucas Stach <l.stach@pengutronix.de>
>>> Reported-by: Holger Assmann <has@pengutronix.de>
>>> Fixes: 695e2f5c289b ("iio: adc: stm32-adc: fix a regression when using dma and irq")
>>> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
>>> ---
>>>  drivers/iio/adc/stm32-adc.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/iio/adc/stm32-adc.c b/drivers/iio/adc/stm32-adc.c
>>> index c067c994dae2..7e0e21c79ac8 100644
>>> --- a/drivers/iio/adc/stm32-adc.c
>>> +++ b/drivers/iio/adc/stm32-adc.c
>>> @@ -1910,7 +1910,7 @@ static int stm32_adc_probe(struct platform_device *pdev)
>>>  
>>>  	ret = devm_request_threaded_irq(&pdev->dev, adc->irq, stm32_adc_isr,
>>>  					stm32_adc_threaded_isr,
>>> -					0, pdev->name, indio_dev);
>>> +					IRQF_ONESHOT, pdev->name, indio_dev);
>>>  	if (ret) {
>>>  		dev_err(&pdev->dev, "failed to request IRQ\n");
>>>  		return ret;
>>
Ahmad Fatoum Jan. 22, 2021, 12:18 p.m. UTC | #4
Hello Fabrice,

On 19.01.21 18:56, Fabrice Gasnier wrote:
> On 1/18/21 12:42 PM, Ahmad Fatoum wrote:
>> Hello Jonathan,
>>
>> On 16.01.21 18:53, Jonathan Cameron wrote:
>>> On Tue, 12 Jan 2021 16:24:42 +0100
>>> Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>>>
>>>> 1c6c69525b40 ("genirq: Reject bogus threaded irq requests") makes sure
>>>> that threaded IRQs either
>>>>   - have IRQF_ONESHOT set
>>>>   - don't have the default just return IRQ_WAKE_THREAD primary handler
>>>>
>>>> This is necessary because level-triggered interrupts need to be masked,
>>>> either at device or irqchip, to avoid an interrupt storm.
>>>>
>>>> For spurious interrupts, the STM32 ADC driver still does this bogus
>>>> request though:
>>>>   - It doesn't set IRQF_ONESHOT
>>>>   - Its primary handler just returns IRQ_WAKE_THREAD if the interrupt
>>>>     is unexpected, i.e. !(status & enabled_mask)
>>> This seems 'unusual'.  If this is a spurious interrupt we should be
>>> returning IRQ_NONE and letting the spurious interrupt protection
>>> stuff kick in.
>>>
>>> The only reason I can see that it does this is print an error message.
>>> I'm not sure why we need to go into the thread to do that given
>>> it's not supposed to happen. If we need that message at all, I'd
>>> suggest doing it in the interrupt handler then return IRQ_NONE;
>> As described, I run into the spurious IRQ case, so I think the message is
>> still useful (until that's properly fixed), but yes, it should've returned
>> IRQ_NONE in that case.
>>
>> With these changes, IRQF_ONESHOT shouldn't be necessary, but in practice
>> the driver doesn't function correctly with the primary IRQ handler threaded.
>>
>> Olivier, Fabrice: Are you aware of this problem?
> 
> 
> Hi Ahmad, Jonathan,
> 
> I wasn't aware of this up to now. I confirm we've the same behavior at
> our end with threadirqs=1.
> 
> Olivier and I started to look at this. Indeed, the IRQF_ONESHOT makes
> the issue to disappear.
> I'm not sure 100% that's for the above reasons. Please let me share some
> piece of logs, analysis and thoughts.

Thanks for looking at this.

> I may miss it but, the patch "genirq: Reject bogus threaded irq
> requests" seems to handle the case where no HW handler is provided, but
> only the threaded part?

There is still a primary handler, but that one does only do IRQ_WAKE_THREAD,
so I assumed that would be equivalent to what the driver is doing in the
spurious IRQ case.

> In the stm32-adc both are provided. Also the IRQ domain in
> stm32-adc-core maybe a key here ?

Oh, missed completely that the stm32-adc-core does the interrupt routing.

> We did some testing, ftrace and observed following behavior for one
> capture (a single cat in_voltage..._raw) :
> 
> in stm32-adc-core, as IRQ source is still active until the IRQ thread
> can execute:
> - stm32_adc_irq_handler <-- generic_handle_irq
> - stm32_adc_irq_handler <-- generic_handle_irq
> - stm32_adc_irq_handler <-- generic_handle_irq
> ...
> 
> - sched_switch to the 1st IRQ thread
> - stm32_adc_irq_handler <-- generic_handle_irq (again until DR get read)
> 
> - stm32_adc_isr <-- irq_forced_thread_fn (from stm32-adc)
>   DR read, clears the active flag
> - stm32_adc_isr <-- irq_forced_thread_fn
>   wakes the 2nd IRQ thread to print an error (unexpected...)
> 
> sched_switch to the 2nd IRQ thread that prints the message.
> 
> - stm32_adc_threaded_isr <-- irq_thread_fn
> 
> 
> So my understanding is: the cause seems to be the concurrency between
> 
> - stm32_adc_irq_handler() storm calls in stm32-adc-core
> - stm32_adc_isr() call to clear the cause (forced into a thread with
> threadirqs=1).

I can't follow here. Where does stm32_adc_isr() clear the IRQ cause?
I assumed it can't be isr_ovr.mask, because that's checked in the
primary handler.

> To properly work, the stm32_adc_irq_handler() should be masked in between.
> 
> As you explain, this works in this case: the call to stm32_adc_isr (in
> stm32-adc) isn't longer forced threaded with IRQF_ONESHOT.
> 
> It looks like IRQF_NO_THREAD for forced threading would have similar
> effect? Maybe the same would be applicable here ? (I haven't tested...)

I guess IRQF_NO_THREAD is meant for use with request_irq and
IRQF_ONESHOT for request_threaded_irq?
 
> Hopefully this helps and is similar to what you observed.

Cheers,
Ahmad

> 
> Thanks and best regards,
> Fabrice
> 
>>
>> Cheers,
>> Ahmad
>>
>>>>   - stm32mp151.dtsi describes the ADC interrupt as level-triggered
>>>>
>>>> Fix this by setting IRQF_ONESHOT to have the irqchip mask the IRQ
>>>> until the IRQ thread has finished.
>>>>
>>>> IRQF_ONESHOT also has the effect that the primary handler is no longer
>>>> forced into a thread. This makes the issue with spurious interrupts
>>>> interrupts disappear when reading the ADC on a threadirqs=1 kernel.
>>>> This used to result in following kernel error message:
>>>>
>>>> 	iio iio:device1: Unexpected IRQ: IER=0x00000000, ISR=0x0000100e
>>>> or
>>>> 	iio iio:device1: Unexpected IRQ: IER=0x00000004, ISR=0x0000100a
>>>>
>>>> But with this patch applied (or threaded IRQs disabled), this no longer
>>>> occurs.
>>>>
>>>> Cc: Lucas Stach <l.stach@pengutronix.de>
>>>> Reported-by: Holger Assmann <has@pengutronix.de>
>>>> Fixes: 695e2f5c289b ("iio: adc: stm32-adc: fix a regression when using dma and irq")
>>>> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
>>>> ---
>>>>  drivers/iio/adc/stm32-adc.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/iio/adc/stm32-adc.c b/drivers/iio/adc/stm32-adc.c
>>>> index c067c994dae2..7e0e21c79ac8 100644
>>>> --- a/drivers/iio/adc/stm32-adc.c
>>>> +++ b/drivers/iio/adc/stm32-adc.c
>>>> @@ -1910,7 +1910,7 @@ static int stm32_adc_probe(struct platform_device *pdev)
>>>>  
>>>>  	ret = devm_request_threaded_irq(&pdev->dev, adc->irq, stm32_adc_isr,
>>>>  					stm32_adc_threaded_isr,
>>>> -					0, pdev->name, indio_dev);
>>>> +					IRQF_ONESHOT, pdev->name, indio_dev);
>>>>  	if (ret) {
>>>>  		dev_err(&pdev->dev, "failed to request IRQ\n");
>>>>  		return ret;
>>>
>
Fabrice Gasnier Jan. 26, 2021, 3:52 p.m. UTC | #5
On 1/22/21 1:18 PM, Ahmad Fatoum wrote:
> Hello Fabrice,
> 
> On 19.01.21 18:56, Fabrice Gasnier wrote:
>> On 1/18/21 12:42 PM, Ahmad Fatoum wrote:
>>> Hello Jonathan,
>>>
>>> On 16.01.21 18:53, Jonathan Cameron wrote:
>>>> On Tue, 12 Jan 2021 16:24:42 +0100
>>>> Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:
>>>>
>>>>> 1c6c69525b40 ("genirq: Reject bogus threaded irq requests") makes sure
>>>>> that threaded IRQs either
>>>>>   - have IRQF_ONESHOT set
>>>>>   - don't have the default just return IRQ_WAKE_THREAD primary handler
>>>>>
>>>>> This is necessary because level-triggered interrupts need to be masked,
>>>>> either at device or irqchip, to avoid an interrupt storm.
>>>>>
>>>>> For spurious interrupts, the STM32 ADC driver still does this bogus
>>>>> request though:
>>>>>   - It doesn't set IRQF_ONESHOT
>>>>>   - Its primary handler just returns IRQ_WAKE_THREAD if the interrupt
>>>>>     is unexpected, i.e. !(status & enabled_mask)
>>>> This seems 'unusual'.  If this is a spurious interrupt we should be
>>>> returning IRQ_NONE and letting the spurious interrupt protection
>>>> stuff kick in.
>>>>
>>>> The only reason I can see that it does this is print an error message.
>>>> I'm not sure why we need to go into the thread to do that given
>>>> it's not supposed to happen. If we need that message at all, I'd
>>>> suggest doing it in the interrupt handler then return IRQ_NONE;
>>> As described, I run into the spurious IRQ case, so I think the message is
>>> still useful (until that's properly fixed), but yes, it should've returned
>>> IRQ_NONE in that case.
>>>
>>> With these changes, IRQF_ONESHOT shouldn't be necessary, but in practice
>>> the driver doesn't function correctly with the primary IRQ handler threaded.
>>>
>>> Olivier, Fabrice: Are you aware of this problem?
>>
>>
>> Hi Ahmad, Jonathan,
>>
>> I wasn't aware of this up to now. I confirm we've the same behavior at
>> our end with threadirqs=1.
>>
>> Olivier and I started to look at this. Indeed, the IRQF_ONESHOT makes
>> the issue to disappear.
>> I'm not sure 100% that's for the above reasons. Please let me share some
>> piece of logs, analysis and thoughts.
> 
> Thanks for looking at this.
> 
>> I may miss it but, the patch "genirq: Reject bogus threaded irq
>> requests" seems to handle the case where no HW handler is provided, but
>> only the threaded part?
> 
> There is still a primary handler, but that one does only do IRQ_WAKE_THREAD,
> so I assumed that would be equivalent to what the driver is doing in the
> spurious IRQ case.
> 
>> In the stm32-adc both are provided. Also the IRQ domain in
>> stm32-adc-core maybe a key here ?
> 
> Oh, missed completely that the stm32-adc-core does the interrupt routing.

Hi Ahmad, Jonathan,

The interrupt routing is done in the core by using "dummy_irq_chip".

Currently (with threadirqs=1), irq_mask and irq_unmask callbacks are
called, but this makes a "noop":
struct irq_chip dummy_irq_chip = {
...
    .irq_mask    = noop,
    .irq_unmask    = noop,
...

That's the reason for the hw irq storm until the primary threaded
handler can run.

I see no easy way to mask the irq from the core driver. e.g. enable bits
are in "child" registers.
The child adc driver already clear/set them at will (in IER: EOC/OVR bits).

Please find other considerations here after

> 
>> We did some testing, ftrace and observed following behavior for one
>> capture (a single cat in_voltage..._raw) :
>>
>> in stm32-adc-core, as IRQ source is still active until the IRQ thread
>> can execute:
>> - stm32_adc_irq_handler <-- generic_handle_irq
>> - stm32_adc_irq_handler <-- generic_handle_irq
>> - stm32_adc_irq_handler <-- generic_handle_irq
>> ...
>>
>> - sched_switch to the 1st IRQ thread
>> - stm32_adc_irq_handler <-- generic_handle_irq (again until DR get read)
>>
>> - stm32_adc_isr <-- irq_forced_thread_fn (from stm32-adc)
>>   DR read, clears the active flag
>> - stm32_adc_isr <-- irq_forced_thread_fn
>>   wakes the 2nd IRQ thread to print an error (unexpected...)
>>
>> sched_switch to the 2nd IRQ thread that prints the message.
>>
>> - stm32_adc_threaded_isr <-- irq_thread_fn
>>
>>
>> So my understanding is: the cause seems to be the concurrency between
>>
>> - stm32_adc_irq_handler() storm calls in stm32-adc-core
>> - stm32_adc_isr() call to clear the cause (forced into a thread with
>> threadirqs=1).
> 
> I can't follow here. Where does stm32_adc_isr() clear the IRQ cause?

The 'eoc' end of conversion flag is cleared by reading data register.

> I assumed it can't be isr_ovr.mask, because that's checked in the
> primary handler.
> 
>> To properly work, the stm32_adc_irq_handler() should be masked in between.
>>
>> As you explain, this works in this case: the call to stm32_adc_isr (in
>> stm32-adc) isn't longer forced threaded with IRQF_ONESHOT.
>>
>> It looks like IRQF_NO_THREAD for forced threading would have similar
>> effect? Maybe the same would be applicable here ? (I haven't tested...)
> 
> I guess IRQF_NO_THREAD is meant for use with request_irq and
> IRQF_ONESHOT for request_threaded_irq?

Thanks for pointing this. So I guess IRQF_ONESHOT can be used here, so
we don't hit the "noop" irq_mask/unmask.

Some other considerations are: some stm32 ADC have a single IRQ line
(stm32f4/stm32h7) but stm32mp1 have one irq line per "child" adc.
So I'd prefer to keep this approach, to prevent using a threaded primary
handler. Also, there's not always a fifo in stm32 adc (so tight timing
to read data in PIO mode).

Ahmad, Jonathan,

Is it reasonable to improve the commit message to summarize the root
cause, explain the rationale, and keep IRQF_ONESHOT (e.g. current patch ?) ?

Best Regards,
Fabrice

>  
>> Hopefully this helps and is similar to what you observed.
> 
> Cheers,
> Ahmad
> 
>>
>> Thanks and best regards,
>> Fabrice
>>
>>>
>>> Cheers,
>>> Ahmad
>>>
>>>>>   - stm32mp151.dtsi describes the ADC interrupt as level-triggered
>>>>>
>>>>> Fix this by setting IRQF_ONESHOT to have the irqchip mask the IRQ
>>>>> until the IRQ thread has finished.
>>>>>
>>>>> IRQF_ONESHOT also has the effect that the primary handler is no longer
>>>>> forced into a thread. This makes the issue with spurious interrupts
>>>>> interrupts disappear when reading the ADC on a threadirqs=1 kernel.
>>>>> This used to result in following kernel error message:
>>>>>
>>>>> 	iio iio:device1: Unexpected IRQ: IER=0x00000000, ISR=0x0000100e
>>>>> or
>>>>> 	iio iio:device1: Unexpected IRQ: IER=0x00000004, ISR=0x0000100a
>>>>>
>>>>> But with this patch applied (or threaded IRQs disabled), this no longer
>>>>> occurs.
>>>>>
>>>>> Cc: Lucas Stach <l.stach@pengutronix.de>
>>>>> Reported-by: Holger Assmann <has@pengutronix.de>
>>>>> Fixes: 695e2f5c289b ("iio: adc: stm32-adc: fix a regression when using dma and irq")
>>>>> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
>>>>> ---
>>>>>  drivers/iio/adc/stm32-adc.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/iio/adc/stm32-adc.c b/drivers/iio/adc/stm32-adc.c
>>>>> index c067c994dae2..7e0e21c79ac8 100644
>>>>> --- a/drivers/iio/adc/stm32-adc.c
>>>>> +++ b/drivers/iio/adc/stm32-adc.c
>>>>> @@ -1910,7 +1910,7 @@ static int stm32_adc_probe(struct platform_device *pdev)
>>>>>  
>>>>>  	ret = devm_request_threaded_irq(&pdev->dev, adc->irq, stm32_adc_isr,
>>>>>  					stm32_adc_threaded_isr,
>>>>> -					0, pdev->name, indio_dev);
>>>>> +					IRQF_ONESHOT, pdev->name, indio_dev);
>>>>>  	if (ret) {
>>>>>  		dev_err(&pdev->dev, "failed to request IRQ\n");
>>>>>  		return ret;
>>>>
>>
>
diff mbox series

Patch

diff --git a/drivers/iio/adc/stm32-adc.c b/drivers/iio/adc/stm32-adc.c
index c067c994dae2..7e0e21c79ac8 100644
--- a/drivers/iio/adc/stm32-adc.c
+++ b/drivers/iio/adc/stm32-adc.c
@@ -1910,7 +1910,7 @@  static int stm32_adc_probe(struct platform_device *pdev)
 
 	ret = devm_request_threaded_irq(&pdev->dev, adc->irq, stm32_adc_isr,
 					stm32_adc_threaded_isr,
-					0, pdev->name, indio_dev);
+					IRQF_ONESHOT, pdev->name, indio_dev);
 	if (ret) {
 		dev_err(&pdev->dev, "failed to request IRQ\n");
 		return ret;