Message ID | 20250226155534.1099538-1-nitin.r.gote@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | dma-buf: Take a breath during dma-fence-chain subtests | expand |
Hi, Changes introduced with this patch is not related to the below Possible regression or failure. Please update CBL filters and re-report. Thanks, Nitin
Hi,
https://patchwork.freedesktop.org/series/145497/- Re-reported.
i915.CI.FULL - Re-reported.
Thanks,
Ravali.
From: I915-ci-infra <i915-ci-infra-bounces@lists.freedesktop.org> On Behalf Of Gote, Nitin R
Sent: 27 February 2025 12:34
To: I915-ci-infra@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: ✗ i915.CI.Full: failure for dma-buf: Take a breath during dma-fence-chain subtests
Hi,
Changes introduced with this patch is not related to the below Possible regression or failure.
Please update CBL filters and re-report.
Thanks,
Nitin
Hi Nitin, On Wed, Feb 26, 2025 at 09:25:34PM +0530, Nitin Gote wrote: > Give the scheduler a chance to breath by calling cond_resched() > as some of the loops may take some time on old machines (like apl/bsw/pnv), > and so catch the attention of the watchdogs. > > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904 > Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> This patch goes beyond the intel-gfx domain so that you need to add some people in Cc. By running checkpatch, you should add: Sumit Semwal <sumit.semwal@linaro.org> (maintainer:DMA BUFFER SHARING FRAMEWORK) "Christian König" <christian.koenig@amd.com> (maintainer:DMA BUFFER SHARING FRAMEWORK) linux-media@vger.kernel.org (open list:DMA BUFFER SHARING FRAMEWORK) dri-devel@lists.freedesktop.org (open list:DMA BUFFER SHARING FRAMEWORK) I added them now, but you might still be asked to resend. Said that, at a first glance, I don't have anything against this patch. Andi > --- > Hi, > > For reviewer reference, adding here watchdog issue seen on old machines > during dma-fence-chain subtests testing. This log is retrieved from device > pstore log while testing dam-buf@all-tests: > > dma-buf: Running dma_fence_chain > Panic#1 Part7 > <6> sizeof(dma_fence_chain)=184 > <6> dma-buf: Running dma_fence_chain/sanitycheck > <6> dma-buf: Running dma_fence_chain/find_seqno > <6> dma-buf: Running dma_fence_chain/find_signaled > <6> dma-buf: Running dma_fence_chain/find_out_of_order > <6> dma-buf: Running dma_fence_chain/find_gap > <6> dma-buf: Running dma_fence_chain/find_race > <6> Completed 4095 cycles > <6> dma-buf: Running dma_fence_chain/signal_forward > <6> dma-buf: Running dma_fence_chain/signal_backward > <6> dma-buf: Running dma_fence_chain/wait_forward > <6> dma-buf: Running dma_fence_chain/wait_backward > <0> watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [dmabuf:2263] > Panic#1 Part6 > <4> irq event stamp: 415735 > <4> hardirqs last enabled at (415734): [<ffffffff813d3a1b>] handle_softirqs+0xab/0x4d0 > <4> hardirqs last disabled at (415735): [<ffffffff827c7e31>] sysvec_apic_timer_interrupt+0x11/0xc0 > <4> softirqs last enabled at (415728): [<ffffffff813d3f8f>] __irq_exit_rcu+0x13f/0x160 > <4> softirqs last disabled at (415733): [<ffffffff813d3f8f>] __irq_exit_rcu+0x13f/0x160 > <4> CPU: 2 UID: 0 PID: 2263 Comm: dmabuf Not tainted 6.14.0-rc2-drm-next_483-g7b91683e7de7+ #1 > <4> Hardware name: Intel corporation NUC6CAYS/NUC6CAYB, BIOS AYAPLCEL.86A.0056.2018.0926.1100 09/26/2018 > <4> RIP: 0010:handle_softirqs+0xb1/0x4d0 > <4> RSP: 0018:ffffc90000154f60 EFLAGS: 00000246 > <4> RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 > <4> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > <4> RBP: ffffc90000154fb8 R08: 0000000000000000 R09: 0000000000000000 > <4> R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000a > <4> R13: 0000000000000200 R14: 0000000000000200 R15: 0000000000400100 > <4> FS: 000077521c5cd940(0000) GS:ffff888277900000(0000) knlGS:0000000000000000 > Panic#1 Part5 > <4> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > <4> CR2: 00005dbfee8c00c4 CR3: 0000000133d38000 CR4: 00000000003526f0 > <4> Call Trace: > <4> <IRQ> > <4> ? show_regs+0x6c/0x80 > <4> ? watchdog_timer_fn+0x247/0x2d0 > <4> ? __pfx_watchdog_timer_fn+0x10/0x10 > <4> ? __hrtimer_run_queues+0x1d0/0x420 > <4> ? hrtimer_interrupt+0x116/0x290 > <4> ? __sysvec_apic_timer_interrupt+0x70/0x1e0 > <4> ? sysvec_apic_timer_interrupt+0x47/0xc0 > <4> ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 > <4> ? handle_softirqs+0xb1/0x4d0 > <4> __irq_exit_rcu+0x13f/0x160 > <4> irq_exit_rcu+0xe/0x20 > <4> sysvec_irq_work+0xa0/0xc0 > <4> </IRQ> > <4> <TASK> > <4> asm_sysvec_irq_work+0x1b/0x20 > <4> RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80 > <4> RSP: 0018:ffffc9000292b8f0 EFLAGS: 00000246 > <4> RAX: 0000000000000000 RBX: ffff88810f235480 RCX: 0000000000000000 > <4> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > <4> RBP: ffffc9000292b900 R08: 0000000000000000 R09: 0000000000000000 > <4> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246 > <4> R13: 0000000000000000 R14: 0000000000000246 R15: 000000000003828c > Panic#1 Part4 > <4> dma_fence_signal+0x49/0xb0 > <4> wait_backward+0xf8/0x140 [dmabuf_selftests] > <4> __subtests+0x75/0x1f0 [dmabuf_selftests] > <4> dma_fence_chain+0x94/0xe0 [dmabuf_selftests] > <4> st_init+0x6a/0xff0 [dmabuf_selftests] > <4> ? __pfx_st_init+0x10/0x10 [dmabuf_selftests] > <4> do_one_initcall+0x79/0x400 > <4> do_init_module+0x97/0x2a0 > <4> load_module+0x2c23/0x2f60 > <4> init_module_from_file+0x97/0xe0 > <4> ? init_module_from_file+0x97/0xe0 > <4> idempotent_init_module+0x134/0x350 > <4> __x64_sys_finit_module+0x77/0x100 > <4> x64_sys_call+0x1f37/0x2650 > <4> do_syscall_64+0x91/0x180 > <4> ? trace_hardirqs_off+0x5d/0xe0 > <4> ? syscall_exit_to_user_mode+0x95/0x260 > <4> ? do_syscall_64+0x9d/0x180 > <4> ? do_syscall_64+0x9d/0x180 > <4> ? irqentry_exit+0x77/0xb0 > <4> ? sysvec_apic_timer_interrupt+0x57/0xc0 > <4> entry_SYSCALL_64_after_hwframe+0x76/0x7e > <4> RIP: 0033:0x77521e72725d > > > drivers/dma-buf/st-dma-fence-chain.c | 14 +++++++++++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c > index ed4b323886e4..328a66ed59e5 100644 > --- a/drivers/dma-buf/st-dma-fence-chain.c > +++ b/drivers/dma-buf/st-dma-fence-chain.c > @@ -505,6 +505,7 @@ static int signal_forward(void *arg) > > for (i = 0; i < fc.chain_length; i++) { > dma_fence_signal(fc.fences[i]); > + cond_resched(); > > if (!dma_fence_is_signaled(fc.chains[i])) { > pr_err("chain[%d] not signaled!\n", i); > @@ -537,6 +538,7 @@ static int signal_backward(void *arg) > > for (i = fc.chain_length; i--; ) { > dma_fence_signal(fc.fences[i]); > + cond_resched(); > > if (i > 0 && dma_fence_is_signaled(fc.chains[i])) { > pr_err("chain[%d] is signaled!\n", i); > @@ -587,8 +589,10 @@ static int wait_forward(void *arg) > get_task_struct(tsk); > yield_to(tsk, true); > > - for (i = 0; i < fc.chain_length; i++) > + for (i = 0; i < fc.chain_length; i++) { > dma_fence_signal(fc.fences[i]); > + cond_resched(); > + } > > err = kthread_stop_put(tsk); > > @@ -616,8 +620,10 @@ static int wait_backward(void *arg) > get_task_struct(tsk); > yield_to(tsk, true); > > - for (i = fc.chain_length; i--; ) > + for (i = fc.chain_length; i--; ) { > dma_fence_signal(fc.fences[i]); > + cond_resched(); > + } > > err = kthread_stop_put(tsk); > > @@ -663,8 +669,10 @@ static int wait_random(void *arg) > get_task_struct(tsk); > yield_to(tsk, true); > > - for (i = 0; i < fc.chain_length; i++) > + for (i = 0; i < fc.chain_length; i++) { > dma_fence_signal(fc.fences[i]); > + cond_resched(); > + } > > err = kthread_stop_put(tsk); > > -- > 2.25.1
Am 27.02.25 um 13:52 schrieb Andi Shyti: > Hi Nitin, > > On Wed, Feb 26, 2025 at 09:25:34PM +0530, Nitin Gote wrote: >> Give the scheduler a chance to breath by calling cond_resched() >> as some of the loops may take some time on old machines (like apl/bsw/pnv), >> and so catch the attention of the watchdogs. >> >> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904 >> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> > This patch goes beyond the intel-gfx domain so that you need to > add some people in Cc. By running checkpatch, you should add: > > Sumit Semwal <sumit.semwal@linaro.org> (maintainer:DMA BUFFER SHARING FRAMEWORK) > "Christian König" <christian.koenig@amd.com> (maintainer:DMA BUFFER SHARING FRAMEWORK) > linux-media@vger.kernel.org (open list:DMA BUFFER SHARING FRAMEWORK) > dri-devel@lists.freedesktop.org (open list:DMA BUFFER SHARING FRAMEWORK) > > I added them now, but you might still be asked to resend. > > Said that, at a first glance, I don't have anything against this > patch. There has been some push to deprecate cond_resched() cause it is almost always not appropriate. Saying that if I'm not completely mistaken that here is also not 100% correct usage. Question is why is the test taking 26 (busy?) seconds to complete? That sounds really long even for a very old CPU. Do we maybe have an udelay() here which should have been an usleep() or similar? Regards, Christian. > > Andi > >> --- >> Hi, >> >> For reviewer reference, adding here watchdog issue seen on old machines >> during dma-fence-chain subtests testing. This log is retrieved from device >> pstore log while testing dam-buf@all-tests: >> >> dma-buf: Running dma_fence_chain >> Panic#1 Part7 >> <6> sizeof(dma_fence_chain)=184 >> <6> dma-buf: Running dma_fence_chain/sanitycheck >> <6> dma-buf: Running dma_fence_chain/find_seqno >> <6> dma-buf: Running dma_fence_chain/find_signaled >> <6> dma-buf: Running dma_fence_chain/find_out_of_order >> <6> dma-buf: Running dma_fence_chain/find_gap >> <6> dma-buf: Running dma_fence_chain/find_race >> <6> Completed 4095 cycles >> <6> dma-buf: Running dma_fence_chain/signal_forward >> <6> dma-buf: Running dma_fence_chain/signal_backward >> <6> dma-buf: Running dma_fence_chain/wait_forward >> <6> dma-buf: Running dma_fence_chain/wait_backward >> <0> watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [dmabuf:2263] >> Panic#1 Part6 >> <4> irq event stamp: 415735 >> <4> hardirqs last enabled at (415734): [<ffffffff813d3a1b>] handle_softirqs+0xab/0x4d0 >> <4> hardirqs last disabled at (415735): [<ffffffff827c7e31>] sysvec_apic_timer_interrupt+0x11/0xc0 >> <4> softirqs last enabled at (415728): [<ffffffff813d3f8f>] __irq_exit_rcu+0x13f/0x160 >> <4> softirqs last disabled at (415733): [<ffffffff813d3f8f>] __irq_exit_rcu+0x13f/0x160 >> <4> CPU: 2 UID: 0 PID: 2263 Comm: dmabuf Not tainted 6.14.0-rc2-drm-next_483-g7b91683e7de7+ #1 >> <4> Hardware name: Intel corporation NUC6CAYS/NUC6CAYB, BIOS AYAPLCEL.86A.0056.2018.0926.1100 09/26/2018 >> <4> RIP: 0010:handle_softirqs+0xb1/0x4d0 >> <4> RSP: 0018:ffffc90000154f60 EFLAGS: 00000246 >> <4> RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 >> <4> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 >> <4> RBP: ffffc90000154fb8 R08: 0000000000000000 R09: 0000000000000000 >> <4> R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000a >> <4> R13: 0000000000000200 R14: 0000000000000200 R15: 0000000000400100 >> <4> FS: 000077521c5cd940(0000) GS:ffff888277900000(0000) knlGS:0000000000000000 >> Panic#1 Part5 >> <4> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> <4> CR2: 00005dbfee8c00c4 CR3: 0000000133d38000 CR4: 00000000003526f0 >> <4> Call Trace: >> <4> <IRQ> >> <4> ? show_regs+0x6c/0x80 >> <4> ? watchdog_timer_fn+0x247/0x2d0 >> <4> ? __pfx_watchdog_timer_fn+0x10/0x10 >> <4> ? __hrtimer_run_queues+0x1d0/0x420 >> <4> ? hrtimer_interrupt+0x116/0x290 >> <4> ? __sysvec_apic_timer_interrupt+0x70/0x1e0 >> <4> ? sysvec_apic_timer_interrupt+0x47/0xc0 >> <4> ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 >> <4> ? handle_softirqs+0xb1/0x4d0 >> <4> __irq_exit_rcu+0x13f/0x160 >> <4> irq_exit_rcu+0xe/0x20 >> <4> sysvec_irq_work+0xa0/0xc0 >> <4> </IRQ> >> <4> <TASK> >> <4> asm_sysvec_irq_work+0x1b/0x20 >> <4> RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80 >> <4> RSP: 0018:ffffc9000292b8f0 EFLAGS: 00000246 >> <4> RAX: 0000000000000000 RBX: ffff88810f235480 RCX: 0000000000000000 >> <4> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 >> <4> RBP: ffffc9000292b900 R08: 0000000000000000 R09: 0000000000000000 >> <4> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246 >> <4> R13: 0000000000000000 R14: 0000000000000246 R15: 000000000003828c >> Panic#1 Part4 >> <4> dma_fence_signal+0x49/0xb0 >> <4> wait_backward+0xf8/0x140 [dmabuf_selftests] >> <4> __subtests+0x75/0x1f0 [dmabuf_selftests] >> <4> dma_fence_chain+0x94/0xe0 [dmabuf_selftests] >> <4> st_init+0x6a/0xff0 [dmabuf_selftests] >> <4> ? __pfx_st_init+0x10/0x10 [dmabuf_selftests] >> <4> do_one_initcall+0x79/0x400 >> <4> do_init_module+0x97/0x2a0 >> <4> load_module+0x2c23/0x2f60 >> <4> init_module_from_file+0x97/0xe0 >> <4> ? init_module_from_file+0x97/0xe0 >> <4> idempotent_init_module+0x134/0x350 >> <4> __x64_sys_finit_module+0x77/0x100 >> <4> x64_sys_call+0x1f37/0x2650 >> <4> do_syscall_64+0x91/0x180 >> <4> ? trace_hardirqs_off+0x5d/0xe0 >> <4> ? syscall_exit_to_user_mode+0x95/0x260 >> <4> ? do_syscall_64+0x9d/0x180 >> <4> ? do_syscall_64+0x9d/0x180 >> <4> ? irqentry_exit+0x77/0xb0 >> <4> ? sysvec_apic_timer_interrupt+0x57/0xc0 >> <4> entry_SYSCALL_64_after_hwframe+0x76/0x7e >> <4> RIP: 0033:0x77521e72725d >> >> >> drivers/dma-buf/st-dma-fence-chain.c | 14 +++++++++++--- >> 1 file changed, 11 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c >> index ed4b323886e4..328a66ed59e5 100644 >> --- a/drivers/dma-buf/st-dma-fence-chain.c >> +++ b/drivers/dma-buf/st-dma-fence-chain.c >> @@ -505,6 +505,7 @@ static int signal_forward(void *arg) >> >> for (i = 0; i < fc.chain_length; i++) { >> dma_fence_signal(fc.fences[i]); >> + cond_resched(); >> >> if (!dma_fence_is_signaled(fc.chains[i])) { >> pr_err("chain[%d] not signaled!\n", i); >> @@ -537,6 +538,7 @@ static int signal_backward(void *arg) >> >> for (i = fc.chain_length; i--; ) { >> dma_fence_signal(fc.fences[i]); >> + cond_resched(); >> >> if (i > 0 && dma_fence_is_signaled(fc.chains[i])) { >> pr_err("chain[%d] is signaled!\n", i); >> @@ -587,8 +589,10 @@ static int wait_forward(void *arg) >> get_task_struct(tsk); >> yield_to(tsk, true); >> >> - for (i = 0; i < fc.chain_length; i++) >> + for (i = 0; i < fc.chain_length; i++) { >> dma_fence_signal(fc.fences[i]); >> + cond_resched(); >> + } >> >> err = kthread_stop_put(tsk); >> >> @@ -616,8 +620,10 @@ static int wait_backward(void *arg) >> get_task_struct(tsk); >> yield_to(tsk, true); >> >> - for (i = fc.chain_length; i--; ) >> + for (i = fc.chain_length; i--; ) { >> dma_fence_signal(fc.fences[i]); >> + cond_resched(); >> + } >> >> err = kthread_stop_put(tsk); >> >> @@ -663,8 +669,10 @@ static int wait_random(void *arg) >> get_task_struct(tsk); >> yield_to(tsk, true); >> >> - for (i = 0; i < fc.chain_length; i++) >> + for (i = 0; i < fc.chain_length; i++) { >> dma_fence_signal(fc.fences[i]); >> + cond_resched(); >> + } >> >> err = kthread_stop_put(tsk); >> >> -- >> 2.25.1
Hi, > Am 27.02.25 um 13:52 schrieb Andi Shyti: > > Hi Nitin, > > > > On Wed, Feb 26, 2025 at 09:25:34PM +0530, Nitin Gote wrote: > >> Give the scheduler a chance to breath by calling cond_resched() as > >> some of the loops may take some time on old machines (like > >> apl/bsw/pnv), and so catch the attention of the watchdogs. > >> > >> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904 > >> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> > > This patch goes beyond the intel-gfx domain so that you need to add > > some people in Cc. By running checkpatch, you should add: > > > > Sumit Semwal <sumit.semwal@linaro.org> (maintainer:DMA BUFFER SHARING > > FRAMEWORK) "Christian König" <christian.koenig@amd.com> > > (maintainer:DMA BUFFER SHARING FRAMEWORK) linux- > media@vger.kernel.org > > (open list:DMA BUFFER SHARING FRAMEWORK) > > dri-devel@lists.freedesktop.org (open list:DMA BUFFER SHARING > > FRAMEWORK) > > > > I added them now, but you might still be asked to resend. Thank you Andi, for adding dma-buf maintainers. > > > > Said that, at a first glance, I don't have anything against this > > patch. > > There has been some push to deprecate cond_resched() cause it is almost always > not appropriate. Thank you Konig for review. I'm not finding any push/commit or documentation of deprecated cond_resched() api. If you have any reference, Could you please share a push of deprecated cond_resched()? > > Saying that if I'm not completely mistaken that here is also not 100% correct > usage. > > Question is why is the test taking 26 (busy?) seconds to complete? That sounds > really long even for a very old CPU. > > Do we maybe have an udelay() here which should have been an usleep() or > similar? I will check and test with udelay() or similar api. And I will resend a patch after testing. Regards, Nitin > > Regards, > Christian. > > > > > Andi > > > >> --- > >> Hi, > >> > >> For reviewer reference, adding here watchdog issue seen on old > >> machines during dma-fence-chain subtests testing. This log is > >> retrieved from device pstore log while testing dam-buf@all-tests: > >> > >> dma-buf: Running dma_fence_chain > >> Panic#1 Part7 > >> <6> sizeof(dma_fence_chain)=184 > >> <6> dma-buf: Running dma_fence_chain/sanitycheck <6> dma-buf: Running > >> dma_fence_chain/find_seqno <6> dma-buf: Running > >> dma_fence_chain/find_signaled <6> dma-buf: Running > >> dma_fence_chain/find_out_of_order <6> dma-buf: Running > >> dma_fence_chain/find_gap <6> dma-buf: Running > >> dma_fence_chain/find_race <6> Completed 4095 cycles <6> dma-buf: > >> Running dma_fence_chain/signal_forward <6> dma-buf: Running > >> dma_fence_chain/signal_backward <6> dma-buf: Running > >> dma_fence_chain/wait_forward <6> dma-buf: Running > >> dma_fence_chain/wait_backward <0> watchdog: BUG: soft lockup - CPU#2 > >> stuck for 26s! [dmabuf:2263] > >> Panic#1 Part6 > >> <4> irq event stamp: 415735 > >> <4> hardirqs last enabled at (415734): [<ffffffff813d3a1b>] > >> handle_softirqs+0xab/0x4d0 <4> hardirqs last disabled at (415735): > >> [<ffffffff827c7e31>] sysvec_apic_timer_interrupt+0x11/0xc0 > >> <4> softirqs last enabled at (415728): [<ffffffff813d3f8f>] > >> __irq_exit_rcu+0x13f/0x160 <4> softirqs last disabled at (415733): > >> [<ffffffff813d3f8f>] __irq_exit_rcu+0x13f/0x160 <4> CPU: 2 UID: 0 > >> PID: 2263 Comm: dmabuf Not tainted > >> 6.14.0-rc2-drm-next_483-g7b91683e7de7+ #1 <4> Hardware name: Intel > >> corporation NUC6CAYS/NUC6CAYB, BIOS > AYAPLCEL.86A.0056.2018.0926.1100 > >> 09/26/2018 <4> RIP: 0010:handle_softirqs+0xb1/0x4d0 <4> RSP: > >> 0018:ffffc90000154f60 EFLAGS: 00000246 <4> RAX: 0000000000000000 RBX: > >> 0000000000000001 RCX: 0000000000000000 <4> RDX: 0000000000000000 > RSI: > >> 0000000000000000 RDI: 0000000000000000 <4> RBP: ffffc90000154fb8 R08: > >> 0000000000000000 R09: 0000000000000000 <4> R10: 0000000000000000 > R11: > >> 0000000000000000 R12: 000000000000000a <4> R13: 0000000000000200 > R14: > >> 0000000000000200 R15: 0000000000400100 <4> FS: > >> 000077521c5cd940(0000) GS:ffff888277900000(0000) > >> knlGS:0000000000000000 > >> Panic#1 Part5 > >> <4> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> CR2: > >> 00005dbfee8c00c4 CR3: 0000000133d38000 CR4: 00000000003526f0 <4> > Call > >> Trace: > >> <4> <IRQ> > >> <4> ? show_regs+0x6c/0x80 > >> <4> ? watchdog_timer_fn+0x247/0x2d0 > >> <4> ? __pfx_watchdog_timer_fn+0x10/0x10 <4> ? > >> __hrtimer_run_queues+0x1d0/0x420 <4> ? hrtimer_interrupt+0x116/0x290 > >> <4> ? __sysvec_apic_timer_interrupt+0x70/0x1e0 > >> <4> ? sysvec_apic_timer_interrupt+0x47/0xc0 > >> <4> ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 > >> <4> ? handle_softirqs+0xb1/0x4d0 > >> <4> __irq_exit_rcu+0x13f/0x160 > >> <4> irq_exit_rcu+0xe/0x20 > >> <4> sysvec_irq_work+0xa0/0xc0 > >> <4> </IRQ> > >> <4> <TASK> > >> <4> asm_sysvec_irq_work+0x1b/0x20 > >> <4> RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80 > >> <4> RSP: 0018:ffffc9000292b8f0 EFLAGS: 00000246 <4> RAX: > >> 0000000000000000 RBX: ffff88810f235480 RCX: 0000000000000000 <4> RDX: > >> 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 <4> > RBP: > >> ffffc9000292b900 R08: 0000000000000000 R09: 0000000000000000 <4> R10: > >> 0000000000000000 R11: 0000000000000000 R12: 0000000000000246 <4> > R13: > >> 0000000000000000 R14: 0000000000000246 R15: 000000000003828c > >> Panic#1 Part4 > >> <4> dma_fence_signal+0x49/0xb0 > >> <4> wait_backward+0xf8/0x140 [dmabuf_selftests] <4> > >> __subtests+0x75/0x1f0 [dmabuf_selftests] <4> > >> dma_fence_chain+0x94/0xe0 [dmabuf_selftests] <4> st_init+0x6a/0xff0 > >> [dmabuf_selftests] <4> ? __pfx_st_init+0x10/0x10 [dmabuf_selftests] > >> <4> do_one_initcall+0x79/0x400 <4> do_init_module+0x97/0x2a0 <4> > >> load_module+0x2c23/0x2f60 <4> init_module_from_file+0x97/0xe0 <4> ? > >> init_module_from_file+0x97/0xe0 <4> > >> idempotent_init_module+0x134/0x350 > >> <4> __x64_sys_finit_module+0x77/0x100 <4> x64_sys_call+0x1f37/0x2650 > >> <4> do_syscall_64+0x91/0x180 <4> ? trace_hardirqs_off+0x5d/0xe0 <4> ? > >> syscall_exit_to_user_mode+0x95/0x260 > >> <4> ? do_syscall_64+0x9d/0x180 > >> <4> ? do_syscall_64+0x9d/0x180 > >> <4> ? irqentry_exit+0x77/0xb0 > >> <4> ? sysvec_apic_timer_interrupt+0x57/0xc0 > >> <4> entry_SYSCALL_64_after_hwframe+0x76/0x7e > >> <4> RIP: 0033:0x77521e72725d > >> > >> > >> drivers/dma-buf/st-dma-fence-chain.c | 14 +++++++++++--- > >> 1 file changed, 11 insertions(+), 3 deletions(-) > >> > >> diff --git a/drivers/dma-buf/st-dma-fence-chain.c > >> b/drivers/dma-buf/st-dma-fence-chain.c > >> index ed4b323886e4..328a66ed59e5 100644 > >> --- a/drivers/dma-buf/st-dma-fence-chain.c > >> +++ b/drivers/dma-buf/st-dma-fence-chain.c > >> @@ -505,6 +505,7 @@ static int signal_forward(void *arg) > >> > >> for (i = 0; i < fc.chain_length; i++) { > >> dma_fence_signal(fc.fences[i]); > >> + cond_resched(); > >> > >> if (!dma_fence_is_signaled(fc.chains[i])) { > >> pr_err("chain[%d] not signaled!\n", i); @@ -537,6 > +538,7 @@ > >> static int signal_backward(void *arg) > >> > >> for (i = fc.chain_length; i--; ) { > >> dma_fence_signal(fc.fences[i]); > >> + cond_resched(); > >> > >> if (i > 0 && dma_fence_is_signaled(fc.chains[i])) { > >> pr_err("chain[%d] is signaled!\n", i); @@ -587,8 +589,10 > @@ > >> static int wait_forward(void *arg) > >> get_task_struct(tsk); > >> yield_to(tsk, true); > >> > >> - for (i = 0; i < fc.chain_length; i++) > >> + for (i = 0; i < fc.chain_length; i++) { > >> dma_fence_signal(fc.fences[i]); > >> + cond_resched(); > >> + } > >> > >> err = kthread_stop_put(tsk); > >> > >> @@ -616,8 +620,10 @@ static int wait_backward(void *arg) > >> get_task_struct(tsk); > >> yield_to(tsk, true); > >> > >> - for (i = fc.chain_length; i--; ) > >> + for (i = fc.chain_length; i--; ) { > >> dma_fence_signal(fc.fences[i]); > >> + cond_resched(); > >> + } > >> > >> err = kthread_stop_put(tsk); > >> > >> @@ -663,8 +669,10 @@ static int wait_random(void *arg) > >> get_task_struct(tsk); > >> yield_to(tsk, true); > >> > >> - for (i = 0; i < fc.chain_length; i++) > >> + for (i = 0; i < fc.chain_length; i++) { > >> dma_fence_signal(fc.fences[i]); > >> + cond_resched(); > >> + } > >> > >> err = kthread_stop_put(tsk); > >> > >> -- > >> 2.25.1
diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c index ed4b323886e4..328a66ed59e5 100644 --- a/drivers/dma-buf/st-dma-fence-chain.c +++ b/drivers/dma-buf/st-dma-fence-chain.c @@ -505,6 +505,7 @@ static int signal_forward(void *arg) for (i = 0; i < fc.chain_length; i++) { dma_fence_signal(fc.fences[i]); + cond_resched(); if (!dma_fence_is_signaled(fc.chains[i])) { pr_err("chain[%d] not signaled!\n", i); @@ -537,6 +538,7 @@ static int signal_backward(void *arg) for (i = fc.chain_length; i--; ) { dma_fence_signal(fc.fences[i]); + cond_resched(); if (i > 0 && dma_fence_is_signaled(fc.chains[i])) { pr_err("chain[%d] is signaled!\n", i); @@ -587,8 +589,10 @@ static int wait_forward(void *arg) get_task_struct(tsk); yield_to(tsk, true); - for (i = 0; i < fc.chain_length; i++) + for (i = 0; i < fc.chain_length; i++) { dma_fence_signal(fc.fences[i]); + cond_resched(); + } err = kthread_stop_put(tsk); @@ -616,8 +620,10 @@ static int wait_backward(void *arg) get_task_struct(tsk); yield_to(tsk, true); - for (i = fc.chain_length; i--; ) + for (i = fc.chain_length; i--; ) { dma_fence_signal(fc.fences[i]); + cond_resched(); + } err = kthread_stop_put(tsk); @@ -663,8 +669,10 @@ static int wait_random(void *arg) get_task_struct(tsk); yield_to(tsk, true); - for (i = 0; i < fc.chain_length; i++) + for (i = 0; i < fc.chain_length; i++) { dma_fence_signal(fc.fences[i]); + cond_resched(); + } err = kthread_stop_put(tsk);
Give the scheduler a chance to breath by calling cond_resched() as some of the loops may take some time on old machines (like apl/bsw/pnv), and so catch the attention of the watchdogs. Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904 Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> --- Hi, For reviewer reference, adding here watchdog issue seen on old machines during dma-fence-chain subtests testing. This log is retrieved from device pstore log while testing dam-buf@all-tests: dma-buf: Running dma_fence_chain Panic#1 Part7 <6> sizeof(dma_fence_chain)=184 <6> dma-buf: Running dma_fence_chain/sanitycheck <6> dma-buf: Running dma_fence_chain/find_seqno <6> dma-buf: Running dma_fence_chain/find_signaled <6> dma-buf: Running dma_fence_chain/find_out_of_order <6> dma-buf: Running dma_fence_chain/find_gap <6> dma-buf: Running dma_fence_chain/find_race <6> Completed 4095 cycles <6> dma-buf: Running dma_fence_chain/signal_forward <6> dma-buf: Running dma_fence_chain/signal_backward <6> dma-buf: Running dma_fence_chain/wait_forward <6> dma-buf: Running dma_fence_chain/wait_backward <0> watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [dmabuf:2263] Panic#1 Part6 <4> irq event stamp: 415735 <4> hardirqs last enabled at (415734): [<ffffffff813d3a1b>] handle_softirqs+0xab/0x4d0 <4> hardirqs last disabled at (415735): [<ffffffff827c7e31>] sysvec_apic_timer_interrupt+0x11/0xc0 <4> softirqs last enabled at (415728): [<ffffffff813d3f8f>] __irq_exit_rcu+0x13f/0x160 <4> softirqs last disabled at (415733): [<ffffffff813d3f8f>] __irq_exit_rcu+0x13f/0x160 <4> CPU: 2 UID: 0 PID: 2263 Comm: dmabuf Not tainted 6.14.0-rc2-drm-next_483-g7b91683e7de7+ #1 <4> Hardware name: Intel corporation NUC6CAYS/NUC6CAYB, BIOS AYAPLCEL.86A.0056.2018.0926.1100 09/26/2018 <4> RIP: 0010:handle_softirqs+0xb1/0x4d0 <4> RSP: 0018:ffffc90000154f60 EFLAGS: 00000246 <4> RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 <4> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 <4> RBP: ffffc90000154fb8 R08: 0000000000000000 R09: 0000000000000000 <4> R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000a <4> R13: 0000000000000200 R14: 0000000000000200 R15: 0000000000400100 <4> FS: 000077521c5cd940(0000) GS:ffff888277900000(0000) knlGS:0000000000000000 Panic#1 Part5 <4> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> CR2: 00005dbfee8c00c4 CR3: 0000000133d38000 CR4: 00000000003526f0 <4> Call Trace: <4> <IRQ> <4> ? show_regs+0x6c/0x80 <4> ? watchdog_timer_fn+0x247/0x2d0 <4> ? __pfx_watchdog_timer_fn+0x10/0x10 <4> ? __hrtimer_run_queues+0x1d0/0x420 <4> ? hrtimer_interrupt+0x116/0x290 <4> ? __sysvec_apic_timer_interrupt+0x70/0x1e0 <4> ? sysvec_apic_timer_interrupt+0x47/0xc0 <4> ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 <4> ? handle_softirqs+0xb1/0x4d0 <4> __irq_exit_rcu+0x13f/0x160 <4> irq_exit_rcu+0xe/0x20 <4> sysvec_irq_work+0xa0/0xc0 <4> </IRQ> <4> <TASK> <4> asm_sysvec_irq_work+0x1b/0x20 <4> RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80 <4> RSP: 0018:ffffc9000292b8f0 EFLAGS: 00000246 <4> RAX: 0000000000000000 RBX: ffff88810f235480 RCX: 0000000000000000 <4> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 <4> RBP: ffffc9000292b900 R08: 0000000000000000 R09: 0000000000000000 <4> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246 <4> R13: 0000000000000000 R14: 0000000000000246 R15: 000000000003828c Panic#1 Part4 <4> dma_fence_signal+0x49/0xb0 <4> wait_backward+0xf8/0x140 [dmabuf_selftests] <4> __subtests+0x75/0x1f0 [dmabuf_selftests] <4> dma_fence_chain+0x94/0xe0 [dmabuf_selftests] <4> st_init+0x6a/0xff0 [dmabuf_selftests] <4> ? __pfx_st_init+0x10/0x10 [dmabuf_selftests] <4> do_one_initcall+0x79/0x400 <4> do_init_module+0x97/0x2a0 <4> load_module+0x2c23/0x2f60 <4> init_module_from_file+0x97/0xe0 <4> ? init_module_from_file+0x97/0xe0 <4> idempotent_init_module+0x134/0x350 <4> __x64_sys_finit_module+0x77/0x100 <4> x64_sys_call+0x1f37/0x2650 <4> do_syscall_64+0x91/0x180 <4> ? trace_hardirqs_off+0x5d/0xe0 <4> ? syscall_exit_to_user_mode+0x95/0x260 <4> ? do_syscall_64+0x9d/0x180 <4> ? do_syscall_64+0x9d/0x180 <4> ? irqentry_exit+0x77/0xb0 <4> ? sysvec_apic_timer_interrupt+0x57/0xc0 <4> entry_SYSCALL_64_after_hwframe+0x76/0x7e <4> RIP: 0033:0x77521e72725d drivers/dma-buf/st-dma-fence-chain.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)