Message ID | 20200727145714.4377-1-ceggers@arri.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | iio: trigger: sysfs: Disable irqs before calling iio_trigger_poll() | expand |
On Mon, 27 Jul 2020 16:57:13 +0200 Christian Eggers <ceggers@arri.de> wrote: > iio_trigger_poll() calls generic_handle_irq(). This function expects to > be run with local IRQs disabled. Was there an error or warning that lead to this patch? Or can you point to what call in generic_handle_irq is making the assumption that we are breaking? Given this is using the irq_work framework I'm wondering if this is a more general problem? Basically more info please! Thanks, Jonathan > > Signed-off-by: Christian Eggers <ceggers@arri.de> > Cc: stable@vger.kernel.org > --- > drivers/iio/trigger/iio-trig-sysfs.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/iio/trigger/iio-trig-sysfs.c b/drivers/iio/trigger/iio-trig-sysfs.c > index e09e58072872..66a96b1632f8 100644 > --- a/drivers/iio/trigger/iio-trig-sysfs.c > +++ b/drivers/iio/trigger/iio-trig-sysfs.c > @@ -94,7 +94,9 @@ static void iio_sysfs_trigger_work(struct irq_work *work) > struct iio_sysfs_trig *trig = container_of(work, struct iio_sysfs_trig, > work); > > + local_irq_disable(); > iio_trigger_poll(trig->trig); > + local_irq_enable(); > } > > static ssize_t iio_sysfs_trigger_poll(struct device *dev,
On 8/1/20 6:02 PM, Jonathan Cameron wrote: > On Mon, 27 Jul 2020 16:57:13 +0200 > Christian Eggers <ceggers@arri.de> wrote: > >> iio_trigger_poll() calls generic_handle_irq(). This function expects to >> be run with local IRQs disabled. > Was there an error or warning that lead to this patch? > Or can you point to what call in generic_handle_irq is making the > assumption that we are breaking? > > Given this is using the irq_work framework I'm wondering if this is > a more general problem? > > Basically more info please! There is this series https://lkml.org/lkml/2020/3/6/433 which causes generic_handle_irq() to issue an warning if it is called with IRQs on, for a IRQ controller that can't handle it. But I'm not convinced this applies to the IIO code, since this is a purely virtual interrupt and is not interfering with any interrupt controller hardware.
On Saturday, 1 August 2020, 18:02:34 CEST, Jonathan Cameron wrote: > On Mon, 27 Jul 2020 16:57:13 +0200 > > Christian Eggers <ceggers@arri.de> wrote: > > iio_trigger_poll() calls generic_handle_irq(). This function expects to > > be run with local IRQs disabled. > > Was there an error or warning that lead to this patch? [ 17.448466] 000: ------------[ cut here ]------------ [ 17.448481] 000: WARNING: CPU: 0 PID: 9 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x55/0xae [ 17.448511] 000: irq 236 handler irq_default_primary_handler+0x1/0x4 enabled interrupts [ 17.448526] 000: Modules linked in: bridge stp llc usb_f_ncm u_ether libcomposite sd_mod configfs cdc_acm usb_storage scsi_mod ci_hdrc_imx ci_hdrc st_magn_spi ulpi st_sensors_spi ehci_hcd regmap_spi tcpm roles st_magn_i2c typec st_sensors_i2c udc_core st_magn as73211 st_sensors imx_thermal usb49xx usbcore industrialio_triggered_buffer rtc_rv3028 kfifo_buf at24 usb_common nls_base i2c_dev usbmisc_imx phy_mxs_usb anatop_regulator imx2_wdt imx_fan spidev leds_pwm leds_gpio led_class iio_trig_sysfs imx6sx_adc industrialio fixed at25 spi_imx spi_bitbang imx_napi dev imx_sdma virt_dma nfsv3 nfs lockd grace sunrpc ksz9477_i2c ksz9477 tag_ksz ksz_common dsa_core phylink regmap_i2c i2c_imx i2c_core fec ptp pps_core micrel [ 17.448712] 000: CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.47-rt28+ #446 [ 17.448723] 000: Hardware name: Freescale i.MX6 Ultralite (Device Tree) [ 17.448738] 000: [<c0108265>] (unwind_backtrace) from [<c01070a7>] (show_stack+0xb/0xc) [ 17.448754] 000: [<c01070a7>] (show_stack) from [<c0110673>] (__warn+0x7b/0x8c) [ 17.448772] 000: [<c0110673>] (__warn) from [<c01106b5>] (warn_slowpath_fmt+0x31/0x50) [ 17.448787] 000: [<c01106b5>] (warn_slowpath_fmt) from [<c012be53>] (__handle_irq_event_percpu+0x55/0xae) [ 17.448807] 000: [<c012be53>] (__handle_irq_event_percpu) from [<c012bec5>] (handle_irq_event_percpu+0x19/0x40) [ 17.448823] 000: [<c012bec5>] (handle_irq_event_percpu) from [<c012bf2b>] (handle_irq_event+0x3f/0x5c) [ 17.448839] 000: [<c012bf2b>] (handle_irq_event) from [<c012dd73>] (handle_simple_irq+0x67/0x6a) [ 17.448854] 000: [<c012dd73>] (handle_simple_irq) from [<c012b915>] (generic_handle_irq+0xd/0x16) [ 17.448870] 000: [<c012b915>] (generic_handle_irq) from [<bf8fcf05>] (iio_trigger_poll+0x33/0x44 [industrialio]) [ 17.448962] 000: [<bf8fcf05>] (iio_trigger_poll [industrialio]) from [<c0147b93>] (irq_work_run_list+0x43/0x66) [ 17.449010] 000: [<c0147b93>] (irq_work_run_list) from [<c013804f>] (run_timer_softirq+0x7/0x3c) [ 17.449029] 000: [<c013804f>] (run_timer_softirq) from [<c01022cf>] (__do_softirq+0x10f/0x160) [ 17.449045] 000: [<c01022cf>] (__do_softirq) from [<c0112255>] (run_ksoftirqd+0x19/0x2c) [ 17.449061] 000: [<c0112255>] (run_ksoftirqd) from [<c012153b>] (smpboot_thread_fn+0x13b/0x140) [ 17.449078] 000: [<c012153b>] (smpboot_thread_fn) from [<c011f823>] (kthread+0xa3/0xac) [ 17.449095] 000: [<c011f823>] (kthread) from [<c01010f1>] (ret_from_fork+0x11/0x20) [ 17.449110] 000: Exception stack(0xc2063fb0 to 0xc2063ff8) [ 17.449119] 000: 3fa0: 00000000 00000000 00000000 00000000 [ 17.449130] 000: 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [ 17.449139] 000: 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [ 17.449146] 000: ---[ end trace 0000000000000002 ]--- > Or can you point to what call in generic_handle_irq is making the > assumption that we are breaking? > > Given this is using the irq_work framework I'm wondering if this is > a more general problem? If I understand correctly, the kernel temporarily disables hardware interrupts while hardware irq handlers are run. In case of the iio-trig-hrtim and iio-trig-sysfs interrupts, __handle_irq_event_percpu() is not called from a hardware irq (where interrupts would be disabled), but from software. Similar examples found here: 0a29ac5bd3 ("net: usb: lan78xx: Disable interrupts before calling generic_handle_irq()") and drivers/i2c/busses/i2c-cht-wc.c:103 > > Basically more info please! > > Thanks, > > Jonathan > Regards Christian
On 8/3/20 7:16 AM, Christian Eggers wrote: > On Saturday, 1 August 2020, 18:02:34 CEST, Jonathan Cameron wrote: >> On Mon, 27 Jul 2020 16:57:13 +0200 >> >> Christian Eggers <ceggers@arri.de> wrote: >>> iio_trigger_poll() calls generic_handle_irq(). This function expects to >>> be run with local IRQs disabled. >> Was there an error or warning that lead to this patch? > [ 17.448466] 000: ------------[ cut here ]------------ > [ 17.448481] 000: WARNING: CPU: 0 PID: 9 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x55/0xae > [ 17.448511] 000: irq 236 handler irq_default_primary_handler+0x1/0x4 enabled interrupts > [ 17.448526] 000: Modules linked in: bridge stp llc usb_f_ncm u_ether libcomposite sd_mod configfs cdc_acm usb_storage scsi_mod ci_hdrc_imx ci_hdrc st_magn_spi ulpi st_sensors_spi ehci_hcd regmap_spi tcpm roles st_magn_i2c typec st_sensors_i2c udc_core st_magn as73211 st_sensors imx_thermal usb49xx usbcore industrialio_triggered_buffer rtc_rv3028 kfifo_buf at24 usb_common nls_base i2c_dev usbmisc_imx phy_mxs_usb anatop_regulator imx2_wdt imx_fan spidev leds_pwm leds_gpio led_class iio_trig_sysfs imx6sx_adc industrialio fixed at25 spi_imx spi_bitbang imx_napi dev imx_sdma virt_dma nfsv3 nfs lockd grace sunrpc ksz9477_i2c ksz9477 tag_ksz ksz_common dsa_core phylink regmap_i2c i2c_imx i2c_core fec ptp pps_core micrel > [ 17.448712] 000: CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.47-rt28+ #446 > [ 17.448723] 000: Hardware name: Freescale i.MX6 Ultralite (Device Tree) > [ 17.448738] 000: [<c0108265>] (unwind_backtrace) from [<c01070a7>] (show_stack+0xb/0xc) > [ 17.448754] 000: [<c01070a7>] (show_stack) from [<c0110673>] (__warn+0x7b/0x8c) > [ 17.448772] 000: [<c0110673>] (__warn) from [<c01106b5>] (warn_slowpath_fmt+0x31/0x50) > [ 17.448787] 000: [<c01106b5>] (warn_slowpath_fmt) from [<c012be53>] (__handle_irq_event_percpu+0x55/0xae) > [ 17.448807] 000: [<c012be53>] (__handle_irq_event_percpu) from [<c012bec5>] (handle_irq_event_percpu+0x19/0x40) > [ 17.448823] 000: [<c012bec5>] (handle_irq_event_percpu) from [<c012bf2b>] (handle_irq_event+0x3f/0x5c) > [ 17.448839] 000: [<c012bf2b>] (handle_irq_event) from [<c012dd73>] (handle_simple_irq+0x67/0x6a) > [ 17.448854] 000: [<c012dd73>] (handle_simple_irq) from [<c012b915>] (generic_handle_irq+0xd/0x16) > [ 17.448870] 000: [<c012b915>] (generic_handle_irq) from [<bf8fcf05>] (iio_trigger_poll+0x33/0x44 [industrialio]) > [ 17.448962] 000: [<bf8fcf05>] (iio_trigger_poll [industrialio]) from [<c0147b93>] (irq_work_run_list+0x43/0x66) > [ 17.449010] 000: [<c0147b93>] (irq_work_run_list) from [<c013804f>] (run_timer_softirq+0x7/0x3c) > [ 17.449029] 000: [<c013804f>] (run_timer_softirq) from [<c01022cf>] (__do_softirq+0x10f/0x160) > [ 17.449045] 000: [<c01022cf>] (__do_softirq) from [<c0112255>] (run_ksoftirqd+0x19/0x2c) > [ 17.449061] 000: [<c0112255>] (run_ksoftirqd) from [<c012153b>] (smpboot_thread_fn+0x13b/0x140) > [ 17.449078] 000: [<c012153b>] (smpboot_thread_fn) from [<c011f823>] (kthread+0xa3/0xac) > [ 17.449095] 000: [<c011f823>] (kthread) from [<c01010f1>] (ret_from_fork+0x11/0x20) > [ 17.449110] 000: Exception stack(0xc2063fb0 to 0xc2063ff8) > [ 17.449119] 000: 3fa0: 00000000 00000000 00000000 00000000 > [ 17.449130] 000: 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > [ 17.449139] 000: 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > [ 17.449146] 000: ---[ end trace 0000000000000002 ]--- > > > >> Or can you point to what call in generic_handle_irq is making the >> assumption that we are breaking? >> >> Given this is using the irq_work framework I'm wondering if this is >> a more general problem? > If I understand correctly, the kernel temporarily disables hardware interrupts > while hardware irq handlers are run. In case of the iio-trig-hrtim and iio-trig-sysfs > interrupts, __handle_irq_event_percpu() is not called from a hardware irq > (where interrupts would be disabled), but from software. The sysfs IIO trigger uses irq_work to schedule the iio_trigger_poll() and the promise of irq_work is that the callback will run in hard IRQ context. That's the whole point of it. irq_work_run_list(), which shows up in your callgraph, has as BUG_ON(!irqs_disabled())[1], so we should never even get to calling iio_trigger_poll() if IRQs where not disabled at this point. That's the same condition that triggers the WARN_ON() in __handle_irq_event_percpu. Are you using a non-upstream kernel? Maybe a RT kernel? - Lars [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/irq_work.c#n163
Hi Lars, On Monday, 3 August 2020, 08:37:43 CEST, Lars-Peter Clausen wrote: > The sysfs IIO trigger uses irq_work to schedule the iio_trigger_poll() > and the promise of irq_work is that the callback will run in hard IRQ > context. That's the whole point of it. > > irq_work_run_list(), which shows up in your callgraph, has as > BUG_ON(!irqs_disabled())[1], so we should never even get to calling > iio_trigger_poll() if IRQs where not disabled at this point. That's the > same condition that triggers the WARN_ON() in __handle_irq_event_percpu. is my patch sufficient, or would you prefer a different solution? > Are you using a non-upstream kernel? Maybe a RT kernel? I use v5.4.<almost-latest>-rt regards Christian
On 8/3/20 8:44 AM, Christian Eggers wrote: > Hi Lars, > > On Monday, 3 August 2020, 08:37:43 CEST, Lars-Peter Clausen wrote: >> The sysfs IIO trigger uses irq_work to schedule the iio_trigger_poll() >> and the promise of irq_work is that the callback will run in hard IRQ >> context. That's the whole point of it. >> >> irq_work_run_list(), which shows up in your callgraph, has as >> BUG_ON(!irqs_disabled())[1], so we should never even get to calling >> iio_trigger_poll() if IRQs where not disabled at this point. That's the >> same condition that triggers the WARN_ON() in __handle_irq_event_percpu. > is my patch sufficient, or would you prefer a different solution? The code in normal upstream is correct, there is no need to patch it since iio_sysfs_trigger_work() always runs with IRQs disabled. > >> Are you using a non-upstream kernel? Maybe a RT kernel? > I use v5.4.<almost-latest>-rt That explains it. Have a look at 0200-irqwork-push-most-work-into-softirq-context.patch. The right fix for this issue is to add the following snippet to the RT patchset. diff --git a/drivers/iio/trigger/iio-trig-sysfs.c b/drivers/iio/trigger/iio-trig-sysfs.c --- a/drivers/iio/trigger/iio-trig-sysfs.c +++ b/drivers/iio/trigger/iio-trig-sysfs.c @@ -161,6 +161,7 @@ static int iio_sysfs_trigger_probe(int id) iio_trigger_set_drvdata(t->trig, t); init_irq_work(&t->work, iio_sysfs_trigger_work); + t->work.flags = IRQ_WORK_HARD_IRQ; ret = iio_trigger_register(t->trig); if (ret)
Hi Lars On Monday, 3 August 2020, 08:52:54 CEST, Lars-Peter Clausen wrote: > On 8/3/20 8:44 AM, Christian Eggers wrote: > > ... > > is my patch sufficient, or would you prefer a different solution? > > The code in normal upstream is correct, there is no need to patch it > since iio_sysfs_trigger_work() always runs with IRQs disabled. > > >> Are you using a non-upstream kernel? Maybe a RT kernel? > > > > I use v5.4.<almost-latest>-rt > > That explains it. Have a look at > 0200-irqwork-push-most-work-into-softirq-context.patch. > > The right fix for this issue is to add the following snippet to the RT > patchset. > > diff --git a/drivers/iio/trigger/iio-trig-sysfs.c > b/drivers/iio/trigger/iio-trig-sysfs.c > --- a/drivers/iio/trigger/iio-trig-sysfs.c > +++ b/drivers/iio/trigger/iio-trig-sysfs.c > @@ -161,6 +161,7 @@ static int iio_sysfs_trigger_probe(int id) > iio_trigger_set_drvdata(t->trig, t); > > init_irq_work(&t->work, iio_sysfs_trigger_work); > + t->work.flags = IRQ_WORK_HARD_IRQ; > > ret = iio_trigger_register(t->trig); > if (ret) I can confirm that this works for iio-trig-sysfs on 5.4.54-rt32. Currently I do not use iio-trig-hrtimer, but if I remember correctly, the problem was also present there. Do you want to apply your patch for mainline? In contrast to v5.4, IRQ_WORK_HARD_IRQ is already available there (moved to smp_types.h). Unfortunately I cannot test it on mainline for now, as my BSP stuff is not ported yet. Best regards Christian
On 8/12/20 1:01 PM, Christian Eggers wrote: > Hi Lars > > On Monday, 3 August 2020, 08:52:54 CEST, Lars-Peter Clausen wrote: >> On 8/3/20 8:44 AM, Christian Eggers wrote: >>> ... >>> is my patch sufficient, or would you prefer a different solution? >> The code in normal upstream is correct, there is no need to patch it >> since iio_sysfs_trigger_work() always runs with IRQs disabled. >> >>>> Are you using a non-upstream kernel? Maybe a RT kernel? >>> I use v5.4.<almost-latest>-rt >> That explains it. Have a look at >> 0200-irqwork-push-most-work-into-softirq-context.patch. >> >> The right fix for this issue is to add the following snippet to the RT >> patchset. >> >> diff --git a/drivers/iio/trigger/iio-trig-sysfs.c >> b/drivers/iio/trigger/iio-trig-sysfs.c >> --- a/drivers/iio/trigger/iio-trig-sysfs.c >> +++ b/drivers/iio/trigger/iio-trig-sysfs.c >> @@ -161,6 +161,7 @@ static int iio_sysfs_trigger_probe(int id) >> iio_trigger_set_drvdata(t->trig, t); >> >> init_irq_work(&t->work, iio_sysfs_trigger_work); >> + t->work.flags = IRQ_WORK_HARD_IRQ; >> >> ret = iio_trigger_register(t->trig); >> if (ret) > I can confirm that this works for iio-trig-sysfs on 5.4.54-rt32. Currently I > do not use iio-trig-hrtimer, but if I remember correctly, the problem was also > present there. Similar story, I think. On mainline hrtimers run in hardirq mode by default, whereas in RT they run in softirq mode by default. So we haven't see the issue in mainline. To fix this we need to explicitly specify that the IIO hrtimer always needs to run in hardirq mode by using the HRTIMER_MODE_REL_HARD flag. I'll send a patch. > Do you want to apply your patch for mainline? In contrast to v5.4, > IRQ_WORK_HARD_IRQ is already available there (moved to smp_types.h). > Unfortunately I cannot test it on mainline for now, as my BSP stuff is not > ported yet. Sounds like a plan :)
diff --git a/drivers/iio/trigger/iio-trig-sysfs.c b/drivers/iio/trigger/iio-trig-sysfs.c index e09e58072872..66a96b1632f8 100644 --- a/drivers/iio/trigger/iio-trig-sysfs.c +++ b/drivers/iio/trigger/iio-trig-sysfs.c @@ -94,7 +94,9 @@ static void iio_sysfs_trigger_work(struct irq_work *work) struct iio_sysfs_trig *trig = container_of(work, struct iio_sysfs_trig, work); + local_irq_disable(); iio_trigger_poll(trig->trig); + local_irq_enable(); } static ssize_t iio_sysfs_trigger_poll(struct device *dev,
iio_trigger_poll() calls generic_handle_irq(). This function expects to be run with local IRQs disabled. Signed-off-by: Christian Eggers <ceggers@arri.de> Cc: stable@vger.kernel.org --- drivers/iio/trigger/iio-trig-sysfs.c | 2 ++ 1 file changed, 2 insertions(+)