
[RESEND,v4] drm: Don't free jobs in wait_event_interruptible()

Message ID 20191024162424.38548-1-steven.price@arm.com (mailing list archive)
State New, archived

Commit Message

Steven Price Oct. 24, 2019, 4:24 p.m. UTC
drm_sched_cleanup_jobs() attempts to free finished jobs, however because
it is called as the condition of wait_event_interruptible() it must not
sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.

Instead let's rename drm_sched_cleanup_jobs() to
drm_sched_get_cleanup_job() and simply return a job for processing if
there is one. The caller can then call the free_job() callback outside
the wait_event_interruptible() where sleeping is possible before
re-checking and returning to sleep if necessary.

Signed-off-by: Steven Price <steven.price@arm.com>
---
Previous posting: https://lore.kernel.org/lkml/20190926141630.14258-1-steven.price@arm.com/

 drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
 1 file changed, 26 insertions(+), 19 deletions(-)

Comments

Christian Gmeiner Oct. 25, 2019, 9:43 a.m. UTC | #1
On Thu, 24 Oct 2019 at 18:25, Steven Price <steven.price@arm.com> wrote:
>
> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
> it is called as the condition of wait_event_interruptible() it must not
> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>
> Instead let's rename drm_sched_cleanup_jobs() to
> drm_sched_get_cleanup_job() and simply return a job for processing if
> there is one. The caller can then call the free_job() callback outside
> the wait_event_interruptible() where sleeping is possible before
> re-checking and returning to sleep if necessary.
>
> Signed-off-by: Steven Price <steven.price@arm.com>

Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com>

Without this patch I get the following warning:

[  242.935254] ------------[ cut here ]------------
[  242.940044] WARNING: CPU: 2 PID: 109 at kernel/sched/core.c:6731
__might_sleep+0x94/0xa8
[  242.948242] do not call blocking ops when !TASK_RUNNING; state=1
set at [<38751e36>] prepare_to_wait_event+0xa8/0x180
[  242.958923] Modules linked in:
[  242.962010] CPU: 2 PID: 109 Comm: 130000.gpu Not tainted 5.4.0-rc4 #10
[  242.968551] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  242.975112] [<c0113160>] (unwind_backtrace) from [<c010cf34>]
(show_stack+0x10/0x14)
[  242.982879] [<c010cf34>] (show_stack) from [<c0c065ec>]
(dump_stack+0xd8/0x110)
[  242.990213] [<c0c065ec>] (dump_stack) from [<c0128adc>] (__warn+0xc0/0x10c)
[  242.997194] [<c0128adc>] (__warn) from [<c0128f10>]
(warn_slowpath_fmt+0x8c/0xb8)
[  243.004697] [<c0128f10>] (warn_slowpath_fmt) from [<c01598bc>]
(__might_sleep+0x94/0xa8)
[  243.012810] [<c01598bc>] (__might_sleep) from [<c0c246e4>]
(__mutex_lock+0x38/0xa1c)
[  243.020571] [<c0c246e4>] (__mutex_lock) from [<c0c250e4>]
(mutex_lock_nested+0x1c/0x24)
[  243.028600] [<c0c250e4>] (mutex_lock_nested) from [<c064f020>]
(etnaviv_cmdbuf_free+0x40/0x8c)
[  243.037233] [<c064f020>] (etnaviv_cmdbuf_free) from [<c06503a0>]
(etnaviv_submit_put+0x38/0x1c8)
[  243.046042] [<c06503a0>] (etnaviv_submit_put) from [<c064177c>]
(drm_sched_cleanup_jobs+0xc8/0xec)
[  243.055021] [<c064177c>] (drm_sched_cleanup_jobs) from [<c06419b4>]
(drm_sched_main+0x214/0x298)
[  243.063826] [<c06419b4>] (drm_sched_main) from [<c0152890>]
(kthread+0x140/0x158)
[  243.071329] [<c0152890>] (kthread) from [<c01010b4>]
(ret_from_fork+0x14/0x20)
[  243.078563] Exception stack(0xec691fb0 to 0xec691ff8)
[  243.083630] 1fa0:                                     00000000
00000000 00000000 00000000
[  243.091822] 1fc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[  243.100013] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  243.106795] irq event stamp: 321
[  243.110098] hardirqs last  enabled at (339): [<c0193854>]
console_unlock+0x430/0x620
[  243.117864] hardirqs last disabled at (346): [<c01934cc>]
console_unlock+0xa8/0x620
[  243.125592] softirqs last  enabled at (362): [<c01024e0>]
__do_softirq+0x2c0/0x590
[  243.133232] softirqs last disabled at (373): [<c0130ed0>]
irq_exit+0x100/0x18c
[  243.140517] ---[ end trace 8afcd79e9e2725b2 ]---
Christian König Oct. 25, 2019, 9:49 a.m. UTC | #2
On 24.10.19 18:24, Steven Price wrote:
> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
> it is called as the condition of wait_event_interruptible() it must not
> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>
> Instead let's rename drm_sched_cleanup_jobs() to
> drm_sched_get_cleanup_job() and simply return a job for processing if
> there is one. The caller can then call the free_job() callback outside
> the wait_event_interruptible() where sleeping is possible before
> re-checking and returning to sleep if necessary.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Previous posting: https://lore.kernel.org/lkml/20190926141630.14258-1-steven.price@arm.com/
>
>   drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
>   1 file changed, 26 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 9a0ee74d82dc..148468447ba9 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>   }
>   
>   /**
> - * drm_sched_cleanup_jobs - destroy finished jobs
> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>    *
>    * @sched: scheduler instance
>    *
> - * Remove all finished jobs from the mirror list and destroy them.
> + * Returns the next finished job from the mirror list (if there is one)
> + * ready for it to be destroyed.
>    */
> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
> +static struct drm_sched_job *
> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>   {
> +	struct drm_sched_job *job = NULL;

Please don't initialize job here.

>   	unsigned long flags;
>   
>   	/* Don't destroy jobs while the timeout worker is running */
>   	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>   	    !cancel_delayed_work(&sched->work_tdr))
> -		return;
> -
> +		return NULL;
>   
> -	while (!list_empty(&sched->ring_mirror_list)) {
> -		struct drm_sched_job *job;
> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>   
> -		job = list_first_entry(&sched->ring_mirror_list,
> +	job = list_first_entry_or_null(&sched->ring_mirror_list,
>   				       struct drm_sched_job, node);
> -		if (!dma_fence_is_signaled(&job->s_fence->finished))
> -			break;
>   
> -		spin_lock_irqsave(&sched->job_list_lock, flags);
> +	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>   		/* remove job from ring_mirror_list */
>   		list_del_init(&job->node);
> -		spin_unlock_irqrestore(&sched->job_list_lock, flags);
> -
> -		sched->ops->free_job(job);
> +	} else {
> +		job = NULL;
> +		/* queue timeout for next job */
> +		drm_sched_start_timeout(sched);
>   	}
>   
> -	/* queue timeout for next job */
> -	spin_lock_irqsave(&sched->job_list_lock, flags);
> -	drm_sched_start_timeout(sched);
>   	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>   
> +	return job;
>   }
>   
>   /**
> @@ -698,12 +696,21 @@ static int drm_sched_main(void *param)
>   		struct drm_sched_fence *s_fence;
>   		struct drm_sched_job *sched_job;
>   		struct dma_fence *fence;
> +		struct drm_sched_job *cleanup_job = NULL;
>   
>   		wait_event_interruptible(sched->wake_up_worker,
> -					 (drm_sched_cleanup_jobs(sched),
> +					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>   					 (!drm_sched_blocked(sched) &&
>   					  (entity = drm_sched_select_entity(sched))) ||
> -					 kthread_should_stop()));
> +					 kthread_should_stop());
> +
> +		while (cleanup_job) {

Better make this just "if (cleanup_job)"... to make sure that we stop 
immediately when the thread should stop.

Apart from that looks good to me now,
Christian.

> +			sched->ops->free_job(cleanup_job);
> +			/* queue timeout for next job */
> +			drm_sched_start_timeout(sched);
> +
> +			cleanup_job = drm_sched_get_cleanup_job(sched);
> +		}
>   
>   		if (!entity)
>   			continue;
Steven Price Oct. 25, 2019, 9:49 a.m. UTC | #3
On 25/10/2019 10:43, Christian Gmeiner wrote:
> On Thu, 24 Oct 2019 at 18:25, Steven Price <steven.price@arm.com> wrote:
>>
>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>> it is called as the condition of wait_event_interruptible() it must not
>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>
>> Instead let's rename drm_sched_cleanup_jobs() to
>> drm_sched_get_cleanup_job() and simply return a job for processing if
>> there is one. The caller can then call the free_job() callback outside
>> the wait_event_interruptible() where sleeping is possible before
>> re-checking and returning to sleep if necessary.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
> 
> Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com>
> 
> Without this patch I get the following warning:

Thanks. If you've got an (easily) reproducible case, can you check which
commit this fixes? I *think*:

Fixes: 5918045c4ed4 ("drm/scheduler: rework job destruction")

But I haven't got a reliable way of reproducing this (with Panfrost).

Thanks,

Steve

> 
> [  242.935254] ------------[ cut here ]------------
> [  242.940044] WARNING: CPU: 2 PID: 109 at kernel/sched/core.c:6731
> __might_sleep+0x94/0xa8
> [  242.948242] do not call blocking ops when !TASK_RUNNING; state=1
> set at [<38751e36>] prepare_to_wait_event+0xa8/0x180
> [  242.958923] Modules linked in:
> [  242.962010] CPU: 2 PID: 109 Comm: 130000.gpu Not tainted 5.4.0-rc4 #10
> [  242.968551] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [  242.975112] [<c0113160>] (unwind_backtrace) from [<c010cf34>]
> (show_stack+0x10/0x14)
> [  242.982879] [<c010cf34>] (show_stack) from [<c0c065ec>]
> (dump_stack+0xd8/0x110)
> [  242.990213] [<c0c065ec>] (dump_stack) from [<c0128adc>] (__warn+0xc0/0x10c)
> [  242.997194] [<c0128adc>] (__warn) from [<c0128f10>]
> (warn_slowpath_fmt+0x8c/0xb8)
> [  243.004697] [<c0128f10>] (warn_slowpath_fmt) from [<c01598bc>]
> (__might_sleep+0x94/0xa8)
> [  243.012810] [<c01598bc>] (__might_sleep) from [<c0c246e4>]
> (__mutex_lock+0x38/0xa1c)
> [  243.020571] [<c0c246e4>] (__mutex_lock) from [<c0c250e4>]
> (mutex_lock_nested+0x1c/0x24)
> [  243.028600] [<c0c250e4>] (mutex_lock_nested) from [<c064f020>]
> (etnaviv_cmdbuf_free+0x40/0x8c)
> [  243.037233] [<c064f020>] (etnaviv_cmdbuf_free) from [<c06503a0>]
> (etnaviv_submit_put+0x38/0x1c8)
> [  243.046042] [<c06503a0>] (etnaviv_submit_put) from [<c064177c>]
> (drm_sched_cleanup_jobs+0xc8/0xec)
> [  243.055021] [<c064177c>] (drm_sched_cleanup_jobs) from [<c06419b4>]
> (drm_sched_main+0x214/0x298)
> [  243.063826] [<c06419b4>] (drm_sched_main) from [<c0152890>]
> (kthread+0x140/0x158)
> [  243.071329] [<c0152890>] (kthread) from [<c01010b4>]
> (ret_from_fork+0x14/0x20)
> [  243.078563] Exception stack(0xec691fb0 to 0xec691ff8)
> [  243.083630] 1fa0:                                     00000000
> 00000000 00000000 00000000
> [  243.091822] 1fc0: 00000000 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000
> [  243.100013] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [  243.106795] irq event stamp: 321
> [  243.110098] hardirqs last  enabled at (339): [<c0193854>]
> console_unlock+0x430/0x620
> [  243.117864] hardirqs last disabled at (346): [<c01934cc>]
> console_unlock+0xa8/0x620
> [  243.125592] softirqs last  enabled at (362): [<c01024e0>]
> __do_softirq+0x2c0/0x590
> [  243.133232] softirqs last disabled at (373): [<c0130ed0>]
> irq_exit+0x100/0x18c
> [  243.140517] ---[ end trace 8afcd79e9e2725b2 ]---
>
Steven Price Oct. 25, 2019, 10:26 a.m. UTC | #4
On 25/10/2019 10:49, Koenig, Christian wrote:
> On 24.10.19 18:24, Steven Price wrote:
>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>> it is called as the condition of wait_event_interruptible() it must not
>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>
>> Instead let's rename drm_sched_cleanup_jobs() to
>> drm_sched_get_cleanup_job() and simply return a job for processing if
>> there is one. The caller can then call the free_job() callback outside
>> the wait_event_interruptible() where sleeping is possible before
>> re-checking and returning to sleep if necessary.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>> Previous posting: https://lore.kernel.org/lkml/20190926141630.14258-1-steven.price@arm.com/
>>
>>   drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
>>   1 file changed, 26 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 9a0ee74d82dc..148468447ba9 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>   }
>>   
>>   /**
>> - * drm_sched_cleanup_jobs - destroy finished jobs
>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>    *
>>    * @sched: scheduler instance
>>    *
>> - * Remove all finished jobs from the mirror list and destroy them.
>> + * Returns the next finished job from the mirror list (if there is one)
>> + * ready for it to be destroyed.
>>    */
>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>> +static struct drm_sched_job *
>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>   {
>> +	struct drm_sched_job *job = NULL;
> 
> Please don't initialize job here.

Good spot, will fix.

>>   	unsigned long flags;
>>   
>>   	/* Don't destroy jobs while the timeout worker is running */
>>   	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>   	    !cancel_delayed_work(&sched->work_tdr))
>> -		return;
>> -
>> +		return NULL;
>>   
>> -	while (!list_empty(&sched->ring_mirror_list)) {
>> -		struct drm_sched_job *job;
>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>   
>> -		job = list_first_entry(&sched->ring_mirror_list,
>> +	job = list_first_entry_or_null(&sched->ring_mirror_list,
>>   				       struct drm_sched_job, node);
>> -		if (!dma_fence_is_signaled(&job->s_fence->finished))
>> -			break;
>>   
>> -		spin_lock_irqsave(&sched->job_list_lock, flags);
>> +	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>   		/* remove job from ring_mirror_list */
>>   		list_del_init(&job->node);
>> -		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>> -
>> -		sched->ops->free_job(job);
>> +	} else {
>> +		job = NULL;
>> +		/* queue timeout for next job */
>> +		drm_sched_start_timeout(sched);
>>   	}
>>   
>> -	/* queue timeout for next job */
>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>> -	drm_sched_start_timeout(sched);
>>   	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>   
>> +	return job;
>>   }
>>   
>>   /**
>> @@ -698,12 +696,21 @@ static int drm_sched_main(void *param)
>>   		struct drm_sched_fence *s_fence;
>>   		struct drm_sched_job *sched_job;
>>   		struct dma_fence *fence;
>> +		struct drm_sched_job *cleanup_job = NULL;
>>   
>>   		wait_event_interruptible(sched->wake_up_worker,
>> -					 (drm_sched_cleanup_jobs(sched),
>> +					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>>   					 (!drm_sched_blocked(sched) &&
>>   					  (entity = drm_sched_select_entity(sched))) ||
>> -					 kthread_should_stop()));
>> +					 kthread_should_stop());
>> +
>> +		while (cleanup_job) {
> 
> Better make this just "if (cleanup_job)"... to make sure that we stop 
> immediately when the thread should stop.

Ok, no problem. Note that this is a change in behaviour (previously
drm_sched_cleanup_jobs() was called *before* checking
kthread_should_stop()). But I can't see the harm.

Steve

> Apart from that looks good to me now,
> Christian.
> 
>> +			sched->ops->free_job(cleanup_job);
>> +			/* queue timeout for next job */
>> +			drm_sched_start_timeout(sched);
>> +
>> +			cleanup_job = drm_sched_get_cleanup_job(sched);
>> +		}
>>   
>>   		if (!entity)
>>   			continue;
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
Christian König Oct. 25, 2019, 10:34 a.m. UTC | #5
On 25.10.19 12:26, Steven Price wrote:
> On 25/10/2019 10:49, Koenig, Christian wrote:
>> On 24.10.19 18:24, Steven Price wrote:
>>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>>> it is called as the condition of wait_event_interruptible() it must not
>>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>>
>>> Instead let's rename drm_sched_cleanup_jobs() to
>>> drm_sched_get_cleanup_job() and simply return a job for processing if
>>> there is one. The caller can then call the free_job() callback outside
>>> the wait_event_interruptible() where sleeping is possible before
>>> re-checking and returning to sleep if necessary.
>>>
>>> Signed-off-by: Steven Price <steven.price@arm.com>
>>> ---
>>> Previous posting: https://lore.kernel.org/lkml/20190926141630.14258-1-steven.price@arm.com/
>>>
>>>    drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
>>>    1 file changed, 26 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 9a0ee74d82dc..148468447ba9 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>>    }
>>>    
>>>    /**
>>> - * drm_sched_cleanup_jobs - destroy finished jobs
>>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>>     *
>>>     * @sched: scheduler instance
>>>     *
>>> - * Remove all finished jobs from the mirror list and destroy them.
>>> + * Returns the next finished job from the mirror list (if there is one)
>>> + * ready for it to be destroyed.
>>>     */
>>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>> +static struct drm_sched_job *
>>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>>    {
>>> +	struct drm_sched_job *job = NULL;
>> Please don't initialize job here.
> Good spot, will fix.
>
>>>    	unsigned long flags;
>>>    
>>>    	/* Don't destroy jobs while the timeout worker is running */
>>>    	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>    	    !cancel_delayed_work(&sched->work_tdr))
>>> -		return;
>>> -
>>> +		return NULL;
>>>    
>>> -	while (!list_empty(&sched->ring_mirror_list)) {
>>> -		struct drm_sched_job *job;
>>> +	spin_lock_irqsave(&sched->job_list_lock, flags);
>>>    
>>> -		job = list_first_entry(&sched->ring_mirror_list,
>>> +	job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>    				       struct drm_sched_job, node);
>>> -		if (!dma_fence_is_signaled(&job->s_fence->finished))
>>> -			break;
>>>    
>>> -		spin_lock_irqsave(&sched->job_list_lock, flags);
>>> +	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
>>>    		/* remove job from ring_mirror_list */
>>>    		list_del_init(&job->node);
>>> -		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>> -
>>> -		sched->ops->free_job(job);
>>> +	} else {
>>> +		job = NULL;
>>> +		/* queue timeout for next job */
>>> +		drm_sched_start_timeout(sched);
>>>    	}
>>>    
>>> -	/* queue timeout for next job */
>>> -	spin_lock_irqsave(&sched->job_list_lock, flags);
>>> -	drm_sched_start_timeout(sched);
>>>    	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>>    
>>> +	return job;
>>>    }
>>>    
>>>    /**
>>> @@ -698,12 +696,21 @@ static int drm_sched_main(void *param)
>>>    		struct drm_sched_fence *s_fence;
>>>    		struct drm_sched_job *sched_job;
>>>    		struct dma_fence *fence;
>>> +		struct drm_sched_job *cleanup_job = NULL;
>>>    
>>>    		wait_event_interruptible(sched->wake_up_worker,
>>> -					 (drm_sched_cleanup_jobs(sched),
>>> +					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>>>    					 (!drm_sched_blocked(sched) &&
>>>    					  (entity = drm_sched_select_entity(sched))) ||
>>> -					 kthread_should_stop()));
>>> +					 kthread_should_stop());
>>> +
>>> +		while (cleanup_job) {
>> Better make this just "if (cleanup_job)"... to make sure that we stop
>> immediately when the thread should stop.
> Ok, no problem. Note that this is a change in behaviour (previously
> drm_sched_cleanup_jobs() was called *before* checking
> kthread_should_stop()). But I can't see the harm.

Yeah, but this is actually a rather nice improvement.

When we say that the thread should stop, that should happen
immediately rather than cleaning up the jobs first.

Christian.

>
> Steve
>
>> Apart from that looks good to me now,
>> Christian.
>>
>>> +			sched->ops->free_job(cleanup_job);
>>> +			/* queue timeout for next job */
>>> +			drm_sched_start_timeout(sched);
>>> +
>>> +			cleanup_job = drm_sched_get_cleanup_job(sched);
>>> +		}
>>>    
>>>    		if (!entity)
>>>    			continue;

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 9a0ee74d82dc..148468447ba9 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -622,43 +622,41 @@  static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
 }
 
 /**
- * drm_sched_cleanup_jobs - destroy finished jobs
+ * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
  *
  * @sched: scheduler instance
  *
- * Remove all finished jobs from the mirror list and destroy them.
+ * Returns the next finished job from the mirror list (if there is one)
+ * ready for it to be destroyed.
  */
-static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
+static struct drm_sched_job *
+drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 {
+	struct drm_sched_job *job = NULL;
 	unsigned long flags;
 
 	/* Don't destroy jobs while the timeout worker is running */
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
 	    !cancel_delayed_work(&sched->work_tdr))
-		return;
-
+		return NULL;
 
-	while (!list_empty(&sched->ring_mirror_list)) {
-		struct drm_sched_job *job;
+	spin_lock_irqsave(&sched->job_list_lock, flags);
 
-		job = list_first_entry(&sched->ring_mirror_list,
+	job = list_first_entry_or_null(&sched->ring_mirror_list,
 				       struct drm_sched_job, node);
-		if (!dma_fence_is_signaled(&job->s_fence->finished))
-			break;
 
-		spin_lock_irqsave(&sched->job_list_lock, flags);
+	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
 		/* remove job from ring_mirror_list */
 		list_del_init(&job->node);
-		spin_unlock_irqrestore(&sched->job_list_lock, flags);
-
-		sched->ops->free_job(job);
+	} else {
+		job = NULL;
+		/* queue timeout for next job */
+		drm_sched_start_timeout(sched);
 	}
 
-	/* queue timeout for next job */
-	spin_lock_irqsave(&sched->job_list_lock, flags);
-	drm_sched_start_timeout(sched);
 	spin_unlock_irqrestore(&sched->job_list_lock, flags);
 
+	return job;
 }
 
 /**
@@ -698,12 +696,21 @@  static int drm_sched_main(void *param)
 		struct drm_sched_fence *s_fence;
 		struct drm_sched_job *sched_job;
 		struct dma_fence *fence;
+		struct drm_sched_job *cleanup_job = NULL;
 
 		wait_event_interruptible(sched->wake_up_worker,
-					 (drm_sched_cleanup_jobs(sched),
+					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
 					 (!drm_sched_blocked(sched) &&
 					  (entity = drm_sched_select_entity(sched))) ||
-					 kthread_should_stop()));
+					 kthread_should_stop());
+
+		while (cleanup_job) {
+			sched->ops->free_job(cleanup_job);
+			/* queue timeout for next job */
+			drm_sched_start_timeout(sched);
+
+			cleanup_job = drm_sched_get_cleanup_job(sched);
+		}
 
 		if (!entity)
 			continue;