iio: trigger: sysfs: Disable irqs before calling iio_trigger_poll()
diff mbox series

Message ID 20200727145714.4377-1-ceggers@arri.de
State New
Headers show
Series
  • iio: trigger: sysfs: Disable irqs before calling iio_trigger_poll()
Related show

Commit Message

Christian Eggers July 27, 2020, 2:57 p.m. UTC
iio_trigger_poll() calls generic_handle_irq(). This function expects to
be run with local IRQs disabled.

Signed-off-by: Christian Eggers <ceggers@arri.de>
Cc: stable@vger.kernel.org
---
 drivers/iio/trigger/iio-trig-sysfs.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jonathan Cameron Aug. 1, 2020, 4:02 p.m. UTC | #1
On Mon, 27 Jul 2020 16:57:13 +0200
Christian Eggers <ceggers@arri.de> wrote:

> iio_trigger_poll() calls generic_handle_irq(). This function expects to
> be run with local IRQs disabled.

Was there an error or warning that lead to this patch?
Or can you point to what call in generic_handle_irq is making the
assumption that we are breaking?

Given this is using the irq_work framework I'm wondering if this is
a more general problem?

Basically more info please!

Thanks,

Jonathan


> 
> Signed-off-by: Christian Eggers <ceggers@arri.de>
> Cc: stable@vger.kernel.org
> ---
>  drivers/iio/trigger/iio-trig-sysfs.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/iio/trigger/iio-trig-sysfs.c b/drivers/iio/trigger/iio-trig-sysfs.c
> index e09e58072872..66a96b1632f8 100644
> --- a/drivers/iio/trigger/iio-trig-sysfs.c
> +++ b/drivers/iio/trigger/iio-trig-sysfs.c
> @@ -94,7 +94,9 @@ static void iio_sysfs_trigger_work(struct irq_work *work)
>  	struct iio_sysfs_trig *trig = container_of(work, struct iio_sysfs_trig,
>  							work);
>  
> +	local_irq_disable();
>  	iio_trigger_poll(trig->trig);
> +	local_irq_enable();
>  }
>  
>  static ssize_t iio_sysfs_trigger_poll(struct device *dev,
Lars-Peter Clausen Aug. 2, 2020, 3:05 p.m. UTC | #2
On 8/1/20 6:02 PM, Jonathan Cameron wrote:
> On Mon, 27 Jul 2020 16:57:13 +0200
> Christian Eggers <ceggers@arri.de> wrote:
>
>> iio_trigger_poll() calls generic_handle_irq(). This function expects to
>> be run with local IRQs disabled.
> Was there an error or warning that lead to this patch?
> Or can you point to what call in generic_handle_irq is making the
> assumption that we are breaking?
>
> Given this is using the irq_work framework I'm wondering if this is
> a more general problem?
>
> Basically more info please!

There is this series https://lkml.org/lkml/2020/3/6/433 which causes 
generic_handle_irq() to issue an warning if it is called with IRQs on, 
for a IRQ controller that can't handle it.

But I'm not convinced this applies to the IIO code, since this is a 
purely virtual interrupt and is not interfering with any interrupt 
controller hardware.
Christian Eggers Aug. 3, 2020, 5:16 a.m. UTC | #3
On Saturday, 1 August 2020, 18:02:34 CEST, Jonathan Cameron wrote:
> On Mon, 27 Jul 2020 16:57:13 +0200
> 
> Christian Eggers <ceggers@arri.de> wrote:
> > iio_trigger_poll() calls generic_handle_irq(). This function expects to
> > be run with local IRQs disabled.
> 
> Was there an error or warning that lead to this patch?
[   17.448466] 000: ------------[ cut here ]------------
[   17.448481] 000: WARNING: CPU: 0 PID: 9 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x55/0xae
[   17.448511] 000: irq 236 handler irq_default_primary_handler+0x1/0x4 enabled interrupts
[   17.448526] 000: Modules linked in: bridge stp llc usb_f_ncm u_ether libcomposite sd_mod configfs cdc_acm usb_storage scsi_mod ci_hdrc_imx ci_hdrc st_magn_spi ulpi st_sensors_spi ehci_hcd regmap_spi tcpm roles st_magn_i2c typec st_sensors_i2c udc_core st_magn as73211 st_sensors imx_thermal usb49xx usbcore industrialio_triggered_buffer rtc_rv3028 kfifo_buf at24 usb_common nls_base i2c_dev usbmisc_imx phy_mxs_usb anatop_regulator imx2_wdt imx_fan spidev leds_pwm leds_gpio led_class iio_trig_sysfs imx6sx_adc industrialio fixed at25 spi_imx spi_bitbang imx_napi dev imx_sdma virt_dma nfsv3 nfs lockd grace sunrpc ksz9477_i2c ksz9477 tag_ksz ksz_common dsa_core phylink regmap_i2c i2c_imx i2c_core fec ptp pps_core micrel
[   17.448712] 000: CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.47-rt28+ #446
[   17.448723] 000: Hardware name: Freescale i.MX6 Ultralite (Device Tree)
[   17.448738] 000: [<c0108265>] (unwind_backtrace) from [<c01070a7>] (show_stack+0xb/0xc)
[   17.448754] 000: [<c01070a7>] (show_stack) from [<c0110673>] (__warn+0x7b/0x8c)
[   17.448772] 000: [<c0110673>] (__warn) from [<c01106b5>] (warn_slowpath_fmt+0x31/0x50)
[   17.448787] 000: [<c01106b5>] (warn_slowpath_fmt) from [<c012be53>] (__handle_irq_event_percpu+0x55/0xae)
[   17.448807] 000: [<c012be53>] (__handle_irq_event_percpu) from [<c012bec5>] (handle_irq_event_percpu+0x19/0x40)
[   17.448823] 000: [<c012bec5>] (handle_irq_event_percpu) from [<c012bf2b>] (handle_irq_event+0x3f/0x5c)
[   17.448839] 000: [<c012bf2b>] (handle_irq_event) from [<c012dd73>] (handle_simple_irq+0x67/0x6a)
[   17.448854] 000: [<c012dd73>] (handle_simple_irq) from [<c012b915>] (generic_handle_irq+0xd/0x16)
[   17.448870] 000: [<c012b915>] (generic_handle_irq) from [<bf8fcf05>] (iio_trigger_poll+0x33/0x44 [industrialio])
[   17.448962] 000: [<bf8fcf05>] (iio_trigger_poll [industrialio]) from [<c0147b93>] (irq_work_run_list+0x43/0x66)
[   17.449010] 000: [<c0147b93>] (irq_work_run_list) from [<c013804f>] (run_timer_softirq+0x7/0x3c)
[   17.449029] 000: [<c013804f>] (run_timer_softirq) from [<c01022cf>] (__do_softirq+0x10f/0x160)
[   17.449045] 000: [<c01022cf>] (__do_softirq) from [<c0112255>] (run_ksoftirqd+0x19/0x2c)
[   17.449061] 000: [<c0112255>] (run_ksoftirqd) from [<c012153b>] (smpboot_thread_fn+0x13b/0x140)
[   17.449078] 000: [<c012153b>] (smpboot_thread_fn) from [<c011f823>] (kthread+0xa3/0xac)
[   17.449095] 000: [<c011f823>] (kthread) from [<c01010f1>] (ret_from_fork+0x11/0x20)
[   17.449110] 000: Exception stack(0xc2063fb0 to 0xc2063ff8)
[   17.449119] 000: 3fa0:                                     00000000 00000000 00000000 00000000
[   17.449130] 000: 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   17.449139] 000: 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   17.449146] 000: ---[ end trace 0000000000000002 ]---



> Or can you point to what call in generic_handle_irq is making the
> assumption that we are breaking?
> 
> Given this is using the irq_work framework I'm wondering if this is
> a more general problem?

If I understand correctly, the kernel temporarily disables hardware interrupts
while hardware irq handlers are run. In case of the iio-trig-hrtim and iio-trig-sysfs
interrupts, __handle_irq_event_percpu() is not called from a hardware irq
(where interrupts would be disabled), but from software.

Similar examples found here:
0a29ac5bd3 ("net: usb: lan78xx: Disable interrupts before calling generic_handle_irq()")

and
drivers/i2c/busses/i2c-cht-wc.c:103


> 
> Basically more info please!
> 
> Thanks,
> 
> Jonathan
> 
Regards
Christian
Lars-Peter Clausen Aug. 3, 2020, 6:37 a.m. UTC | #4
On 8/3/20 7:16 AM, Christian Eggers wrote:
> On Saturday, 1 August 2020, 18:02:34 CEST, Jonathan Cameron wrote:
>> On Mon, 27 Jul 2020 16:57:13 +0200
>>
>> Christian Eggers <ceggers@arri.de> wrote:
>>> iio_trigger_poll() calls generic_handle_irq(). This function expects to
>>> be run with local IRQs disabled.
>> Was there an error or warning that lead to this patch?
> [   17.448466] 000: ------------[ cut here ]------------
> [   17.448481] 000: WARNING: CPU: 0 PID: 9 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x55/0xae
> [   17.448511] 000: irq 236 handler irq_default_primary_handler+0x1/0x4 enabled interrupts
> [   17.448526] 000: Modules linked in: bridge stp llc usb_f_ncm u_ether libcomposite sd_mod configfs cdc_acm usb_storage scsi_mod ci_hdrc_imx ci_hdrc st_magn_spi ulpi st_sensors_spi ehci_hcd regmap_spi tcpm roles st_magn_i2c typec st_sensors_i2c udc_core st_magn as73211 st_sensors imx_thermal usb49xx usbcore industrialio_triggered_buffer rtc_rv3028 kfifo_buf at24 usb_common nls_base i2c_dev usbmisc_imx phy_mxs_usb anatop_regulator imx2_wdt imx_fan spidev leds_pwm leds_gpio led_class iio_trig_sysfs imx6sx_adc industrialio fixed at25 spi_imx spi_bitbang imx_napi dev imx_sdma virt_dma nfsv3 nfs lockd grace sunrpc ksz9477_i2c ksz9477 tag_ksz ksz_common dsa_core phylink regmap_i2c i2c_imx i2c_core fec ptp pps_core micrel
> [   17.448712] 000: CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.47-rt28+ #446
> [   17.448723] 000: Hardware name: Freescale i.MX6 Ultralite (Device Tree)
> [   17.448738] 000: [<c0108265>] (unwind_backtrace) from [<c01070a7>] (show_stack+0xb/0xc)
> [   17.448754] 000: [<c01070a7>] (show_stack) from [<c0110673>] (__warn+0x7b/0x8c)
> [   17.448772] 000: [<c0110673>] (__warn) from [<c01106b5>] (warn_slowpath_fmt+0x31/0x50)
> [   17.448787] 000: [<c01106b5>] (warn_slowpath_fmt) from [<c012be53>] (__handle_irq_event_percpu+0x55/0xae)
> [   17.448807] 000: [<c012be53>] (__handle_irq_event_percpu) from [<c012bec5>] (handle_irq_event_percpu+0x19/0x40)
> [   17.448823] 000: [<c012bec5>] (handle_irq_event_percpu) from [<c012bf2b>] (handle_irq_event+0x3f/0x5c)
> [   17.448839] 000: [<c012bf2b>] (handle_irq_event) from [<c012dd73>] (handle_simple_irq+0x67/0x6a)
> [   17.448854] 000: [<c012dd73>] (handle_simple_irq) from [<c012b915>] (generic_handle_irq+0xd/0x16)
> [   17.448870] 000: [<c012b915>] (generic_handle_irq) from [<bf8fcf05>] (iio_trigger_poll+0x33/0x44 [industrialio])
> [   17.448962] 000: [<bf8fcf05>] (iio_trigger_poll [industrialio]) from [<c0147b93>] (irq_work_run_list+0x43/0x66)
> [   17.449010] 000: [<c0147b93>] (irq_work_run_list) from [<c013804f>] (run_timer_softirq+0x7/0x3c)
> [   17.449029] 000: [<c013804f>] (run_timer_softirq) from [<c01022cf>] (__do_softirq+0x10f/0x160)
> [   17.449045] 000: [<c01022cf>] (__do_softirq) from [<c0112255>] (run_ksoftirqd+0x19/0x2c)
> [   17.449061] 000: [<c0112255>] (run_ksoftirqd) from [<c012153b>] (smpboot_thread_fn+0x13b/0x140)
> [   17.449078] 000: [<c012153b>] (smpboot_thread_fn) from [<c011f823>] (kthread+0xa3/0xac)
> [   17.449095] 000: [<c011f823>] (kthread) from [<c01010f1>] (ret_from_fork+0x11/0x20)
> [   17.449110] 000: Exception stack(0xc2063fb0 to 0xc2063ff8)
> [   17.449119] 000: 3fa0:                                     00000000 00000000 00000000 00000000
> [   17.449130] 000: 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [   17.449139] 000: 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [   17.449146] 000: ---[ end trace 0000000000000002 ]---
>
>
>
>> Or can you point to what call in generic_handle_irq is making the
>> assumption that we are breaking?
>>
>> Given this is using the irq_work framework I'm wondering if this is
>> a more general problem?
> If I understand correctly, the kernel temporarily disables hardware interrupts
> while hardware irq handlers are run. In case of the iio-trig-hrtim and iio-trig-sysfs
> interrupts, __handle_irq_event_percpu() is not called from a hardware irq
> (where interrupts would be disabled), but from software.

The sysfs IIO trigger uses irq_work to schedule the iio_trigger_poll() 
and the promise of irq_work is that the callback will run in hard IRQ 
context. That's the whole point of it.

irq_work_run_list(), which shows up in your callgraph, has as 
BUG_ON(!irqs_disabled())[1], so we should never even get to calling 
iio_trigger_poll() if IRQs where not disabled at this point. That's the 
same condition that triggers the WARN_ON() in __handle_irq_event_percpu.

Are you using a non-upstream kernel? Maybe a RT kernel?

- Lars

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq_work.c#n163
Christian Eggers Aug. 3, 2020, 6:44 a.m. UTC | #5
Hi Lars,

On Monday, 3 August 2020, 08:37:43 CEST, Lars-Peter Clausen wrote:
> The sysfs IIO trigger uses irq_work to schedule the iio_trigger_poll()
> and the promise of irq_work is that the callback will run in hard IRQ
> context. That's the whole point of it.
> 
> irq_work_run_list(), which shows up in your callgraph, has as
> BUG_ON(!irqs_disabled())[1], so we should never even get to calling
> iio_trigger_poll() if IRQs where not disabled at this point. That's the
> same condition that triggers the WARN_ON() in __handle_irq_event_percpu.
is my patch sufficient, or would you prefer a different solution?

> Are you using a non-upstream kernel? Maybe a RT kernel?
I use v5.4.<almost-latest>-rt

regards
Christian
Lars-Peter Clausen Aug. 3, 2020, 6:52 a.m. UTC | #6
On 8/3/20 8:44 AM, Christian Eggers wrote:
> Hi Lars,
>
> On Monday, 3 August 2020, 08:37:43 CEST, Lars-Peter Clausen wrote:
>> The sysfs IIO trigger uses irq_work to schedule the iio_trigger_poll()
>> and the promise of irq_work is that the callback will run in hard IRQ
>> context. That's the whole point of it.
>>
>> irq_work_run_list(), which shows up in your callgraph, has as
>> BUG_ON(!irqs_disabled())[1], so we should never even get to calling
>> iio_trigger_poll() if IRQs where not disabled at this point. That's the
>> same condition that triggers the WARN_ON() in __handle_irq_event_percpu.
> is my patch sufficient, or would you prefer a different solution?
The code in normal upstream is correct, there is no need to patch it 
since iio_sysfs_trigger_work() always runs with IRQs disabled.
>
>> Are you using a non-upstream kernel? Maybe a RT kernel?
> I use v5.4.<almost-latest>-rt

That explains it. Have a look at 
0200-irqwork-push-most-work-into-softirq-context.patch.

The right fix for this issue is to add the following snippet to the RT 
patchset.

diff --git a/drivers/iio/trigger/iio-trig-sysfs.c 
b/drivers/iio/trigger/iio-trig-sysfs.c
--- a/drivers/iio/trigger/iio-trig-sysfs.c
+++ b/drivers/iio/trigger/iio-trig-sysfs.c
@@ -161,6 +161,7 @@ static int iio_sysfs_trigger_probe(int id)
      iio_trigger_set_drvdata(t->trig, t);

      init_irq_work(&t->work, iio_sysfs_trigger_work);
+    t->work.flags = IRQ_WORK_HARD_IRQ;

      ret = iio_trigger_register(t->trig);
      if (ret)
Christian Eggers Aug. 12, 2020, 11:01 a.m. UTC | #7
Hi Lars

On Monday, 3 August 2020, 08:52:54 CEST, Lars-Peter Clausen wrote:
> On 8/3/20 8:44 AM, Christian Eggers wrote:
> > ...
> > is my patch sufficient, or would you prefer a different solution?
> 
> The code in normal upstream is correct, there is no need to patch it
> since iio_sysfs_trigger_work() always runs with IRQs disabled.
> 
> >> Are you using a non-upstream kernel? Maybe a RT kernel?
> > 
> > I use v5.4.<almost-latest>-rt
> 
> That explains it. Have a look at
> 0200-irqwork-push-most-work-into-softirq-context.patch.
> 
> The right fix for this issue is to add the following snippet to the RT
> patchset.
> 
> diff --git a/drivers/iio/trigger/iio-trig-sysfs.c
> b/drivers/iio/trigger/iio-trig-sysfs.c
> --- a/drivers/iio/trigger/iio-trig-sysfs.c
> +++ b/drivers/iio/trigger/iio-trig-sysfs.c
> @@ -161,6 +161,7 @@ static int iio_sysfs_trigger_probe(int id)
>       iio_trigger_set_drvdata(t->trig, t);
> 
>       init_irq_work(&t->work, iio_sysfs_trigger_work);
> +    t->work.flags = IRQ_WORK_HARD_IRQ;
> 
>       ret = iio_trigger_register(t->trig);
>       if (ret)

I can confirm that this works for iio-trig-sysfs on 5.4.54-rt32. Currently I 
do not use iio-trig-hrtimer, but if I remember correctly, the problem was also 
present there.

Do you want to apply your patch for mainline? In contrast to v5.4, 
IRQ_WORK_HARD_IRQ is already available there (moved to smp_types.h). 
Unfortunately I cannot test it on mainline for now, as my BSP stuff is not 
ported yet.

Best regards
Christian
Lars-Peter Clausen Aug. 13, 2020, 7:23 a.m. UTC | #8
On 8/12/20 1:01 PM, Christian Eggers wrote:
> Hi Lars
>
> On Monday, 3 August 2020, 08:52:54 CEST, Lars-Peter Clausen wrote:
>> On 8/3/20 8:44 AM, Christian Eggers wrote:
>>> ...
>>> is my patch sufficient, or would you prefer a different solution?
>> The code in normal upstream is correct, there is no need to patch it
>> since iio_sysfs_trigger_work() always runs with IRQs disabled.
>>
>>>> Are you using a non-upstream kernel? Maybe a RT kernel?
>>> I use v5.4.<almost-latest>-rt
>> That explains it. Have a look at
>> 0200-irqwork-push-most-work-into-softirq-context.patch.
>>
>> The right fix for this issue is to add the following snippet to the RT
>> patchset.
>>
>> diff --git a/drivers/iio/trigger/iio-trig-sysfs.c
>> b/drivers/iio/trigger/iio-trig-sysfs.c
>> --- a/drivers/iio/trigger/iio-trig-sysfs.c
>> +++ b/drivers/iio/trigger/iio-trig-sysfs.c
>> @@ -161,6 +161,7 @@ static int iio_sysfs_trigger_probe(int id)
>>        iio_trigger_set_drvdata(t->trig, t);
>>
>>        init_irq_work(&t->work, iio_sysfs_trigger_work);
>> +    t->work.flags = IRQ_WORK_HARD_IRQ;
>>
>>        ret = iio_trigger_register(t->trig);
>>        if (ret)
> I can confirm that this works for iio-trig-sysfs on 5.4.54-rt32. Currently I
> do not use iio-trig-hrtimer, but if I remember correctly, the problem was also
> present there.

Similar story, I think. On mainline hrtimers run in hardirq mode by 
default, whereas in RT they run in softirq mode by default. So we 
haven't see the issue in mainline.

To fix this we need to explicitly specify that the IIO hrtimer always 
needs to run in hardirq mode by using the HRTIMER_MODE_REL_HARD flag. 
I'll send a patch.

> Do you want to apply your patch for mainline? In contrast to v5.4,
> IRQ_WORK_HARD_IRQ is already available there (moved to smp_types.h).
> Unfortunately I cannot test it on mainline for now, as my BSP stuff is not
> ported yet.

Sounds like a plan :)

Patch
diff mbox series

diff --git a/drivers/iio/trigger/iio-trig-sysfs.c b/drivers/iio/trigger/iio-trig-sysfs.c
index e09e58072872..66a96b1632f8 100644
--- a/drivers/iio/trigger/iio-trig-sysfs.c
+++ b/drivers/iio/trigger/iio-trig-sysfs.c
@@ -94,7 +94,9 @@  static void iio_sysfs_trigger_work(struct irq_work *work)
 	struct iio_sysfs_trig *trig = container_of(work, struct iio_sysfs_trig,
 							work);
 
+	local_irq_disable();
 	iio_trigger_poll(trig->trig);
+	local_irq_enable();
 }
 
 static ssize_t iio_sysfs_trigger_poll(struct device *dev,