Message ID | 20240430083934.26980-1-jason-jh.lin@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mailbox: mtk-cmdq: Fix sleeping function called from invalid context | expand |
Il 30/04/24 10:39, Jason-JH.Lin ha scritto: > When we run kernel with lockdebug option, we will get the BUG below: > [ 106.692124] BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1164 > [ 106.692190] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 3616, name: kworker/u17:3 > [ 106.692226] preempt_count: 1, expected: 0 > [ 106.692254] RCU nest depth: 0, expected: 0 > [ 106.692282] INFO: lockdep is turned off. > [ 106.692306] irq event stamp: 0 > [ 106.692331] hardirqs last enabled at (0): [<0000000000000000>] 0x0 > [ 106.692376] hardirqs last disabled at (0): [<ffffffee15d37fa0>] copy_process+0xc90/0x2ac0 > [ 106.692429] softirqs last enabled at (0): [<ffffffee15d37fc4>] copy_process+0xcb4/0x2ac0 > [ 106.692473] softirqs last disabled at (0): [<0000000000000000>] 0x0 > [ 106.692513] CPU: 1 PID: 3616 Comm: kworker/u17:3 Not tainted 6.1.87-lockdep-14133-g26e933aca785 #1 6839942e1cf34914b0a366137843dd2366f52aa9 > [ 106.692556] Hardware name: Google Ciri sku0/unprovisioned board (DT) > [ 106.692586] Workqueue: imgsys_runner imgsys_runner_func > [ 106.692638] Call trace: > [ 106.692662] dump_backtrace+0x100/0x120 > [ 106.692702] show_stack+0x20/0x2c > [ 106.692737] dump_stack_lvl+0x84/0xb4 > [ 106.692775] dump_stack+0x18/0x48 > [ 106.692809] __might_resched+0x354/0x4c0 > [ 106.692847] __might_sleep+0x98/0xe4 > [ 106.692883] __pm_runtime_resume+0x70/0x124 > [ 106.692921] cmdq_mbox_send_data+0xe4/0xb1c > [ 106.692964] msg_submit+0x194/0x2dc > [ 106.693003] mbox_send_message+0x190/0x330 > [ 106.693043] imgsys_cmdq_sendtask+0x1618/0x2224 > [ 106.693082] imgsys_runner_func+0xac/0x11c > [ 106.693118] process_one_work+0x638/0xf84 > [ 106.693158] worker_thread+0x808/0xcd0 > [ 106.693196] kthread+0x24c/0x324 > [ 106.693231] ret_from_fork+0x10/0x20 > > We found that there is a spin_lock_irqsave protection in msg_submit() > of mailbox.c. > So when cmdq driver calls pm_runtime_get_sync() in cmdq_mbox_send_data(), > it will get this BUG report. > > Add pm_runtime_irq_safe() to let pm_runtime callbacks is safe in atomic > context with interrupts disabled. > I see. The problem with this is that pm_runtime_irq_safe() will raise the refcount of the parent device "forever", which isn't the best and partially defeats what we are trying to do here (keeping stuff off unless really needed). At this point I wonder if it'd be just better to really fix this instead of working around the problem. I'd say that one of the options would be to perform a change to msg_submit() so that it will take into account that a .send_data() callback might be using RPM functions. Ideas? :-) Cheers, Angelo > Fixes: 8afe816b0c99 ("mailbox: mtk-cmdq-mailbox: Implement Runtime PM with autosuspend") > Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> > --- > drivers/mailbox/mtk-cmdq-mailbox.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c b/drivers/mailbox/mtk-cmdq-mailbox.c > index ead2200f39ba..3b4f19633c83 100644 > --- a/drivers/mailbox/mtk-cmdq-mailbox.c > +++ b/drivers/mailbox/mtk-cmdq-mailbox.c > @@ -694,6 +694,7 @@ static int cmdq_probe(struct platform_device *pdev) > > pm_runtime_set_autosuspend_delay(dev, CMDQ_MBOX_AUTOSUSPEND_DELAY_MS); > pm_runtime_use_autosuspend(dev); > + pm_runtime_irq_safe(dev); > > return 0; > }
On Tue, Apr 30, 2024 at 4:39 AM AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> wrote: > > Il 30/04/24 10:39, Jason-JH.Lin ha scritto: > > When we run kernel with lockdebug option, we will get the BUG below: > > [ 106.692124] BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1164 > > [ 106.692190] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 3616, name: kworker/u17:3 > > [ 106.692226] preempt_count: 1, expected: 0 > > [ 106.692254] RCU nest depth: 0, expected: 0 > > [ 106.692282] INFO: lockdep is turned off. > > [ 106.692306] irq event stamp: 0 > > [ 106.692331] hardirqs last enabled at (0): [<0000000000000000>] 0x0 > > [ 106.692376] hardirqs last disabled at (0): [<ffffffee15d37fa0>] copy_process+0xc90/0x2ac0 > > [ 106.692429] softirqs last enabled at (0): [<ffffffee15d37fc4>] copy_process+0xcb4/0x2ac0 > > [ 106.692473] softirqs last disabled at (0): [<0000000000000000>] 0x0 > > [ 106.692513] CPU: 1 PID: 3616 Comm: kworker/u17:3 Not tainted 6.1.87-lockdep-14133-g26e933aca785 #1 6839942e1cf34914b0a366137843dd2366f52aa9 > > [ 106.692556] Hardware name: Google Ciri sku0/unprovisioned board (DT) > > [ 106.692586] Workqueue: imgsys_runner imgsys_runner_func > > [ 106.692638] Call trace: > > [ 106.692662] dump_backtrace+0x100/0x120 > > [ 106.692702] show_stack+0x20/0x2c > > [ 106.692737] dump_stack_lvl+0x84/0xb4 > > [ 106.692775] dump_stack+0x18/0x48 > > [ 106.692809] __might_resched+0x354/0x4c0 > > [ 106.692847] __might_sleep+0x98/0xe4 > > [ 106.692883] __pm_runtime_resume+0x70/0x124 > > [ 106.692921] cmdq_mbox_send_data+0xe4/0xb1c > > [ 106.692964] msg_submit+0x194/0x2dc > > [ 106.693003] mbox_send_message+0x190/0x330 > > [ 106.693043] imgsys_cmdq_sendtask+0x1618/0x2224 > > [ 106.693082] imgsys_runner_func+0xac/0x11c > > [ 106.693118] process_one_work+0x638/0xf84 > > [ 106.693158] worker_thread+0x808/0xcd0 > > [ 106.693196] kthread+0x24c/0x324 > > [ 106.693231] ret_from_fork+0x10/0x20 > > > > We found that there is a spin_lock_irqsave protection in msg_submit() > > of mailbox.c. > > So when cmdq driver calls pm_runtime_get_sync() in cmdq_mbox_send_data(), > > it will get this BUG report. > > > > Add pm_runtime_irq_safe() to let pm_runtime callbacks is safe in atomic > > context with interrupts disabled. > > > > I see. The problem with this is that pm_runtime_irq_safe() will raise the refcount > of the parent device "forever", which isn't the best and partially defeats what we > are trying to do here (keeping stuff off unless really needed). > > At this point I wonder if it'd be just better to really fix this instead of working > around the problem. > > I'd say that one of the options would be to perform a change to msg_submit() so > that it will take into account that a .send_data() callback might be using RPM > functions. > > Ideas? :-) > @send_data: The API asks the MBOX controller driver, in atomic * context try to transmit a message on the bus. Returns 0 if * data is accepted for transmission, -EBUSY while rejecting * if the remote hasn't yet read the last data sent. Actual * transmission of data is reported by the controller via * mbox_chan_txdone (if it has some TX ACK irq). It must not * sleep. As long as send_data() does not sleep, ideas are welcome. Cheers.
Hi Angelo, Thanks for the reviews. On Tue, 2024-04-30 at 11:39 +0200, AngeloGioacchino Del Regno wrote: > Il 30/04/24 10:39, Jason-JH.Lin ha scritto: > > When we run kernel with lockdebug option, we will get the BUG > > below: > > [ 106.692124] BUG: sleeping function called from invalid context > > at drivers/base/power/runtime.c:1164 > > [ 106.692190] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, > > pid: 3616, name: kworker/u17:3 > > [ 106.692226] preempt_count: 1, expected: 0 > > [ 106.692254] RCU nest depth: 0, expected: 0 > > [ 106.692282] INFO: lockdep is turned off. > > [ 106.692306] irq event stamp: 0 > > [ 106.692331] hardirqs last enabled at (0): [<0000000000000000>] > > 0x0 > > [ 106.692376] hardirqs last disabled at (0): [<ffffffee15d37fa0>] > > copy_process+0xc90/0x2ac0 > > [ 106.692429] softirqs last enabled at (0): [<ffffffee15d37fc4>] > > copy_process+0xcb4/0x2ac0 > > [ 106.692473] softirqs last disabled at (0): [<0000000000000000>] > > 0x0 > > [ 106.692513] CPU: 1 PID: 3616 Comm: kworker/u17:3 Not tainted > > 6.1.87-lockdep-14133-g26e933aca785 #1 > > 6839942e1cf34914b0a366137843dd2366f52aa9 > > [ 106.692556] Hardware name: Google Ciri sku0/unprovisioned board > > (DT) > > [ 106.692586] Workqueue: imgsys_runner imgsys_runner_func > > [ 106.692638] Call trace: > > [ 106.692662] dump_backtrace+0x100/0x120 > > [ 106.692702] show_stack+0x20/0x2c > > [ 106.692737] dump_stack_lvl+0x84/0xb4 > > [ 106.692775] dump_stack+0x18/0x48 > > [ 106.692809] __might_resched+0x354/0x4c0 > > [ 106.692847] __might_sleep+0x98/0xe4 > > [ 106.692883] __pm_runtime_resume+0x70/0x124 > > [ 106.692921] cmdq_mbox_send_data+0xe4/0xb1c > > [ 106.692964] msg_submit+0x194/0x2dc > > [ 106.693003] mbox_send_message+0x190/0x330 > > [ 106.693043] imgsys_cmdq_sendtask+0x1618/0x2224 > > [ 106.693082] imgsys_runner_func+0xac/0x11c > > [ 106.693118] process_one_work+0x638/0xf84 > > [ 106.693158] worker_thread+0x808/0xcd0 > > [ 106.693196] kthread+0x24c/0x324 > > [ 106.693231] ret_from_fork+0x10/0x20 > > > > We found that there is a spin_lock_irqsave protection in > > msg_submit() > > of mailbox.c. > > So when cmdq driver calls pm_runtime_get_sync() in > > cmdq_mbox_send_data(), > > it will get this BUG report. > > > > Add pm_runtime_irq_safe() to let pm_runtime callbacks is safe in > > atomic > > context with interrupts disabled. > > > > I see. The problem with this is that pm_runtime_irq_safe() will raise > the refcount > of the parent device "forever", which isn't the best and partially > defeats what we > are trying to do here (keeping stuff off unless really needed). > > At this point I wonder if it'd be just better to really fix this > instead of working > around the problem. > > I'd say that one of the options would be to perform a change to > msg_submit() so > that it will take into account that a .send_data() callback might be > using RPM > functions. > > Ideas? :-) > > Cheers, > Angelo > As Jassi mentioned, .send_data() can not use sleep, so we can not use pm_runtime_get_sync() in cmdq_mbox_send_data(). Because we have to make sure the clocks is enabled before setting GCE registers inside the cmdq_mbox_send_data(). I think we can't use the ASYNC pm_runtime_get() instead of pm_runtime_get_sync() either. And you're right, I didn't notice that raising the irq_safe flag will make the parent device can not be suspended. How about this: 1. Use clk_bluk_enable() + pm_runtime_get() instead of pm_runtime_get_sync() everywhere in mtk-cmdq-mailbox.c. 2. Use refcount++ instead of clk_bluk_enable() in cmdq_runtime_resume(). 3. Call clk_bluk_disable() when refcount > 0 in cmdq_runtime_suspend(). Is this an option? Regards, Jason-JH.Lin > > Fixes: 8afe816b0c99 ("mailbox: mtk-cmdq-mailbox: Implement Runtime > > PM with autosuspend") > > Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> > > --- > > drivers/mailbox/mtk-cmdq-mailbox.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c > > b/drivers/mailbox/mtk-cmdq-mailbox.c > > index ead2200f39ba..3b4f19633c83 100644 > > --- a/drivers/mailbox/mtk-cmdq-mailbox.c > > +++ b/drivers/mailbox/mtk-cmdq-mailbox.c > > @@ -694,6 +694,7 @@ static int cmdq_probe(struct platform_device > > *pdev) > > > > pm_runtime_set_autosuspend_delay(dev, > > CMDQ_MBOX_AUTOSUSPEND_DELAY_MS); > > pm_runtime_use_autosuspend(dev); > > + pm_runtime_irq_safe(dev); > > > > return 0; > > } > > >
diff --git a/drivers/mailbox/mtk-cmdq-mailbox.c b/drivers/mailbox/mtk-cmdq-mailbox.c index ead2200f39ba..3b4f19633c83 100644 --- a/drivers/mailbox/mtk-cmdq-mailbox.c +++ b/drivers/mailbox/mtk-cmdq-mailbox.c @@ -694,6 +694,7 @@ static int cmdq_probe(struct platform_device *pdev) pm_runtime_set_autosuspend_delay(dev, CMDQ_MBOX_AUTOSUSPEND_DELAY_MS); pm_runtime_use_autosuspend(dev); + pm_runtime_irq_safe(dev); return 0; }
When we run kernel with lockdebug option, we will get the BUG below: [ 106.692124] BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1164 [ 106.692190] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 3616, name: kworker/u17:3 [ 106.692226] preempt_count: 1, expected: 0 [ 106.692254] RCU nest depth: 0, expected: 0 [ 106.692282] INFO: lockdep is turned off. [ 106.692306] irq event stamp: 0 [ 106.692331] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [ 106.692376] hardirqs last disabled at (0): [<ffffffee15d37fa0>] copy_process+0xc90/0x2ac0 [ 106.692429] softirqs last enabled at (0): [<ffffffee15d37fc4>] copy_process+0xcb4/0x2ac0 [ 106.692473] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 106.692513] CPU: 1 PID: 3616 Comm: kworker/u17:3 Not tainted 6.1.87-lockdep-14133-g26e933aca785 #1 6839942e1cf34914b0a366137843dd2366f52aa9 [ 106.692556] Hardware name: Google Ciri sku0/unprovisioned board (DT) [ 106.692586] Workqueue: imgsys_runner imgsys_runner_func [ 106.692638] Call trace: [ 106.692662] dump_backtrace+0x100/0x120 [ 106.692702] show_stack+0x20/0x2c [ 106.692737] dump_stack_lvl+0x84/0xb4 [ 106.692775] dump_stack+0x18/0x48 [ 106.692809] __might_resched+0x354/0x4c0 [ 106.692847] __might_sleep+0x98/0xe4 [ 106.692883] __pm_runtime_resume+0x70/0x124 [ 106.692921] cmdq_mbox_send_data+0xe4/0xb1c [ 106.692964] msg_submit+0x194/0x2dc [ 106.693003] mbox_send_message+0x190/0x330 [ 106.693043] imgsys_cmdq_sendtask+0x1618/0x2224 [ 106.693082] imgsys_runner_func+0xac/0x11c [ 106.693118] process_one_work+0x638/0xf84 [ 106.693158] worker_thread+0x808/0xcd0 [ 106.693196] kthread+0x24c/0x324 [ 106.693231] ret_from_fork+0x10/0x20 We found that there is a spin_lock_irqsave protection in msg_submit() of mailbox.c. So when cmdq driver calls pm_runtime_get_sync() in cmdq_mbox_send_data(), it will get this BUG report. Add pm_runtime_irq_safe() to let pm_runtime callbacks is safe in atomic context with interrupts disabled. Fixes: 8afe816b0c99 ("mailbox: mtk-cmdq-mailbox: Implement Runtime PM with autosuspend") Signed-off-by: Jason-JH.Lin <jason-jh.lin@mediatek.com> --- drivers/mailbox/mtk-cmdq-mailbox.c | 1 + 1 file changed, 1 insertion(+)