
dma-buf: Take a breath during dma-fence-chain subtests

Message ID 20250226155534.1099538-1-nitin.r.gote@intel.com (mailing list archive)
State New
Series dma-buf: Take a breath during dma-fence-chain subtests

Commit Message

Nitin Gote Feb. 26, 2025, 3:55 p.m. UTC
Give the scheduler a chance to breathe by calling cond_resched(),
as some of the loops may take a long time on old machines (like apl/bsw/pnv),
and so catch the attention of the watchdogs.

Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
---
Hi,

For the reviewers' reference, here is the watchdog issue seen on old machines
during dma-fence-chain subtest testing. This log was retrieved from the device's
pstore log while testing dma-buf@all-tests:

dma-buf: Running dma_fence_chain
Panic#1 Part7
<6> sizeof(dma_fence_chain)=184
<6> dma-buf: Running dma_fence_chain/sanitycheck
<6> dma-buf: Running dma_fence_chain/find_seqno
<6> dma-buf: Running dma_fence_chain/find_signaled
<6> dma-buf: Running dma_fence_chain/find_out_of_order
<6> dma-buf: Running dma_fence_chain/find_gap
<6> dma-buf: Running dma_fence_chain/find_race
<6> Completed 4095 cycles
<6> dma-buf: Running dma_fence_chain/signal_forward
<6> dma-buf: Running dma_fence_chain/signal_backward
<6> dma-buf: Running dma_fence_chain/wait_forward
<6> dma-buf: Running dma_fence_chain/wait_backward
<0> watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [dmabuf:2263]
Panic#1 Part6
<4> irq event stamp: 415735
<4> hardirqs last  enabled at (415734): [<ffffffff813d3a1b>] handle_softirqs+0xab/0x4d0
<4> hardirqs last disabled at (415735): [<ffffffff827c7e31>] sysvec_apic_timer_interrupt+0x11/0xc0
<4> softirqs last  enabled at (415728): [<ffffffff813d3f8f>] __irq_exit_rcu+0x13f/0x160
<4> softirqs last disabled at (415733): [<ffffffff813d3f8f>] __irq_exit_rcu+0x13f/0x160
<4> CPU: 2 UID: 0 PID: 2263 Comm: dmabuf Not tainted 6.14.0-rc2-drm-next_483-g7b91683e7de7+ #1
<4> Hardware name: Intel corporation NUC6CAYS/NUC6CAYB, BIOS AYAPLCEL.86A.0056.2018.0926.1100 09/26/2018
<4> RIP: 0010:handle_softirqs+0xb1/0x4d0
<4> RSP: 0018:ffffc90000154f60 EFLAGS: 00000246
<4> RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
<4> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
<4> RBP: ffffc90000154fb8 R08: 0000000000000000 R09: 0000000000000000
<4> R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000a
<4> R13: 0000000000000200 R14: 0000000000000200 R15: 0000000000400100
<4> FS:  000077521c5cd940(0000) GS:ffff888277900000(0000) knlGS:0000000000000000
Panic#1 Part5
<4> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> CR2: 00005dbfee8c00c4 CR3: 0000000133d38000 CR4: 00000000003526f0
<4> Call Trace:
<4>  <IRQ>
<4>  ? show_regs+0x6c/0x80
<4>  ? watchdog_timer_fn+0x247/0x2d0
<4>  ? __pfx_watchdog_timer_fn+0x10/0x10
<4>  ? __hrtimer_run_queues+0x1d0/0x420
<4>  ? hrtimer_interrupt+0x116/0x290
<4>  ? __sysvec_apic_timer_interrupt+0x70/0x1e0
<4>  ? sysvec_apic_timer_interrupt+0x47/0xc0
<4>  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
<4>  ? handle_softirqs+0xb1/0x4d0
<4>  __irq_exit_rcu+0x13f/0x160
<4>  irq_exit_rcu+0xe/0x20
<4>  sysvec_irq_work+0xa0/0xc0
<4>  </IRQ>
<4>  <TASK>
<4>  asm_sysvec_irq_work+0x1b/0x20
<4> RIP: 0010:_raw_spin_unlock_irqrestore+0x57/0x80
<4> RSP: 0018:ffffc9000292b8f0 EFLAGS: 00000246
<4> RAX: 0000000000000000 RBX: ffff88810f235480 RCX: 0000000000000000
<4> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
<4> RBP: ffffc9000292b900 R08: 0000000000000000 R09: 0000000000000000
<4> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246
<4> R13: 0000000000000000 R14: 0000000000000246 R15: 000000000003828c
Panic#1 Part4
<4> dma_fence_signal+0x49/0xb0
<4> wait_backward+0xf8/0x140 [dmabuf_selftests]
<4> __subtests+0x75/0x1f0 [dmabuf_selftests]
<4> dma_fence_chain+0x94/0xe0 [dmabuf_selftests]
<4> st_init+0x6a/0xff0 [dmabuf_selftests]
<4> ? __pfx_st_init+0x10/0x10 [dmabuf_selftests]
<4> do_one_initcall+0x79/0x400
<4> do_init_module+0x97/0x2a0
<4> load_module+0x2c23/0x2f60
<4> init_module_from_file+0x97/0xe0
<4> ? init_module_from_file+0x97/0xe0
<4> idempotent_init_module+0x134/0x350
<4> __x64_sys_finit_module+0x77/0x100
<4> x64_sys_call+0x1f37/0x2650
<4> do_syscall_64+0x91/0x180
<4> ? trace_hardirqs_off+0x5d/0xe0
<4> ? syscall_exit_to_user_mode+0x95/0x260
<4> ? do_syscall_64+0x9d/0x180
<4> ? do_syscall_64+0x9d/0x180
<4> ? irqentry_exit+0x77/0xb0
<4> ? sysvec_apic_timer_interrupt+0x57/0xc0
<4> entry_SYSCALL_64_after_hwframe+0x76/0x7e
<4> RIP: 0033:0x77521e72725d


 drivers/dma-buf/st-dma-fence-chain.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
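
For context, the pattern the patch applies is sketched below. This is a minimal,
hypothetical example (the helper name and loop are made up and are not the
selftest's actual code): calling cond_resched() inside a long fence-signalling
loop lets the scheduler run other tasks so the soft-lockup watchdog does not fire.

#include <linux/dma-fence.h>
#include <linux/sched.h>

/*
 * Minimal sketch, not the selftest code: signalling a very long chain of
 * fences in a tight loop can keep the CPU busy long enough to trip the
 * soft-lockup watchdog on slow machines.  Calling cond_resched() once per
 * iteration voluntarily yields the CPU when the scheduler needs it,
 * without changing what the loop does.
 */
static void signal_all_fences(struct dma_fence **fences, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		dma_fence_signal(fences[i]);

		/* Give other tasks (and the watchdog) a chance to run. */
		cond_resched();
	}
}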

Comments

Nitin Gote Feb. 27, 2025, 7:03 a.m. UTC | #1
Hi,
The changes introduced with this patch are not related to the possible regression or failure below.
Please update the CBL filters and re-report.

Thanks,
Nitin
Ravali, JupallyX Feb. 27, 2025, 9:27 a.m. UTC | #2
Hi,

https://patchwork.freedesktop.org/series/145497/ - Re-reported.
i915.CI.FULL - Re-reported.

Thanks,
Ravali.

From: I915-ci-infra <i915-ci-infra-bounces@lists.freedesktop.org> On Behalf Of Gote, Nitin R
Sent: 27 February 2025 12:34
To: I915-ci-infra@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: ✗ i915.CI.Full: failure for dma-buf: Take a breath during dma-fence-chain subtests

Hi,
The changes introduced with this patch are not related to the possible regression or failure below.
Please update the CBL filters and re-report.

Thanks,
Nitin
Andi Shyti Feb. 27, 2025, 12:52 p.m. UTC | #3
Hi Nitin,

On Wed, Feb 26, 2025 at 09:25:34PM +0530, Nitin Gote wrote:
> Give the scheduler a chance to breathe by calling cond_resched(),
> as some of the loops may take a long time on old machines (like apl/bsw/pnv),
> and so catch the attention of the watchdogs.
> 
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904
> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>

This patch goes beyond the intel-gfx domain, so you need to
add some people in Cc. Running checkpatch shows that you should add:

Sumit Semwal <sumit.semwal@linaro.org> (maintainer:DMA BUFFER SHARING FRAMEWORK)
"Christian König" <christian.koenig@amd.com> (maintainer:DMA BUFFER SHARING FRAMEWORK)
linux-media@vger.kernel.org (open list:DMA BUFFER SHARING FRAMEWORK)
dri-devel@lists.freedesktop.org (open list:DMA BUFFER SHARING FRAMEWORK)

I added them now, but you might still be asked to resend.

That said, at first glance I don't have anything against this
patch.

Andi

> [snip: quoted cover letter, pstore log and diff]
Christian König Feb. 27, 2025, 2:11 p.m. UTC | #4
On 27.02.25 13:52, Andi Shyti wrote:
> Hi Nitin,
>
> On Wed, Feb 26, 2025 at 09:25:34PM +0530, Nitin Gote wrote:
>> Give the scheduler a chance to breathe by calling cond_resched(),
>> as some of the loops may take a long time on old machines (like apl/bsw/pnv),
>> and so catch the attention of the watchdogs.
>>
>> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904
>> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
> This patch goes beyond the intel-gfx domain so that you need to
> add some people in Cc. By running checkpatch, you should add:
>
> Sumit Semwal <sumit.semwal@linaro.org> (maintainer:DMA BUFFER SHARING FRAMEWORK)
> "Christian König" <christian.koenig@amd.com> (maintainer:DMA BUFFER SHARING FRAMEWORK)
> linux-media@vger.kernel.org (open list:DMA BUFFER SHARING FRAMEWORK)
> dri-devel@lists.freedesktop.org (open list:DMA BUFFER SHARING FRAMEWORK)
>
> I added them now, but you might still be asked to resend.
>
> Said that, at a first glance, I don't have anything against this
> patch.

There has been some push to deprecate cond_resched() because it is almost never appropriate.

That said, if I'm not completely mistaken, the usage here is also not 100% correct.

The question is why the test takes 26 (busy?) seconds to complete. That sounds really long, even for a very old CPU.

Do we maybe have a udelay() here that should have been a usleep() or similar?

Regards,
Christian.

>
> [snip: remainder of quoted reply, pstore log and diff]
Nitin Gote Feb. 28, 2025, 5:12 a.m. UTC | #5
Hi,
 
> On 27.02.25 13:52, Andi Shyti wrote:
> > Hi Nitin,
> >
> > On Wed, Feb 26, 2025 at 09:25:34PM +0530, Nitin Gote wrote:
> >> Give the scheduler a chance to breathe by calling cond_resched() as
> >> some of the loops may take a long time on old machines (like
> >> apl/bsw/pnv) and so catch the attention of the watchdogs.
> >>
> >> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12904
> >> Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
> > This patch goes beyond the intel-gfx domain, so you need to add
> > some people in Cc. Running checkpatch shows that you should add:
> >
> > Sumit Semwal <sumit.semwal@linaro.org> (maintainer:DMA BUFFER SHARING FRAMEWORK)
> > "Christian König" <christian.koenig@amd.com> (maintainer:DMA BUFFER SHARING FRAMEWORK)
> > linux-media@vger.kernel.org (open list:DMA BUFFER SHARING FRAMEWORK)
> > dri-devel@lists.freedesktop.org (open list:DMA BUFFER SHARING FRAMEWORK)
> >
> > I added them now, but you might still be asked to resend.

Thank you Andi, for adding dma-buf maintainers.

> >
> > Said that, at a first glance, I don't have anything against this
> > patch.
> 
> There has been some push to deprecate cond_resched() because it is almost never
> appropriate.

Thank you, Christian, for the review.
I couldn't find any commit or documentation deprecating the cond_resched() API.
If you have one, could you please share a reference for the deprecation of cond_resched()?

> 
> That said, if I'm not completely mistaken, the usage here is also not 100%
> correct.
> 
> The question is why the test takes 26 (busy?) seconds to complete. That sounds
> really long, even for a very old CPU.
> 
> Do we maybe have a udelay() here that should have been a usleep() or
> similar?

I will check and test with usleep() or a similar API,
and I will resend the patch after testing.

Regards,
Nitin
> [snip: remainder of quoted reply, pstore log and diff]

Patch

diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
index ed4b323886e4..328a66ed59e5 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -505,6 +505,7 @@  static int signal_forward(void *arg)
 
 	for (i = 0; i < fc.chain_length; i++) {
 		dma_fence_signal(fc.fences[i]);
+		cond_resched();
 
 		if (!dma_fence_is_signaled(fc.chains[i])) {
 			pr_err("chain[%d] not signaled!\n", i);
@@ -537,6 +538,7 @@  static int signal_backward(void *arg)
 
 	for (i = fc.chain_length; i--; ) {
 		dma_fence_signal(fc.fences[i]);
+		cond_resched();
 
 		if (i > 0 && dma_fence_is_signaled(fc.chains[i])) {
 			pr_err("chain[%d] is signaled!\n", i);
@@ -587,8 +589,10 @@  static int wait_forward(void *arg)
 	get_task_struct(tsk);
 	yield_to(tsk, true);
 
-	for (i = 0; i < fc.chain_length; i++)
+	for (i = 0; i < fc.chain_length; i++) {
 		dma_fence_signal(fc.fences[i]);
+		cond_resched();
+	}
 
 	err = kthread_stop_put(tsk);
 
@@ -616,8 +620,10 @@  static int wait_backward(void *arg)
 	get_task_struct(tsk);
 	yield_to(tsk, true);
 
-	for (i = fc.chain_length; i--; )
+	for (i = fc.chain_length; i--; ) {
 		dma_fence_signal(fc.fences[i]);
+		cond_resched();
+	}
 
 	err = kthread_stop_put(tsk);
 
@@ -663,8 +669,10 @@  static int wait_random(void *arg)
 	get_task_struct(tsk);
 	yield_to(tsk, true);
 
-	for (i = 0; i < fc.chain_length; i++)
+	for (i = 0; i < fc.chain_length; i++) {
 		dma_fence_signal(fc.fences[i]);
+		cond_resched();
+	}
 
 	err = kthread_stop_put(tsk);