diff mbox series

[v5,4/6] drm/sched: Keep s_fence->parent pointer

Message ID 1555599624-12285-4-git-send-email-andrey.grodzovsky@amd.com (mailing list archive)
State New, archived
Headers show
Series [v5,1/6] drm/amd/display: wait for fence without holding reservation lock | expand

Commit Message

Andrey Grodzovsky April 18, 2019, 3 p.m. UTC
For later driver's reference to see if the fence is signaled.

v2: Move parent fence put to resubmit jobs.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Chunming Zhou April 22, 2019, 12:59 p.m. UTC | #1
+Monk to response this patch.


在 2019/4/18 23:00, Andrey Grodzovsky 写道:
> For later driver's reference to see if the fence is signaled.
>
> v2: Move parent fence put to resubmit jobs.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 11 +++++++++--
>   1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 7816de7..03e6bd8 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -375,8 +375,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>   		if (s_job->s_fence->parent &&
>   		    dma_fence_remove_callback(s_job->s_fence->parent,
>   					      &s_job->cb)) {
> -			dma_fence_put(s_job->s_fence->parent);
> -			s_job->s_fence->parent = NULL;

I vaguely remember Monk set parent to be NULL to avoiod potiential free 
problem after callback removal.


-David


>   			atomic_dec(&sched->hw_rq_count);
>   		} else {
>   			/*
> @@ -403,6 +401,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>   				sched->ops->free_job(s_job);
>   		}
>   	}
> +
> +	/*
> +	 * Stop pending timer in flight as we rearm it in  drm_sched_start. This
> +	 * avoids the pending timeout work in progress to fire right away after
> +	 * this TDR finished and before the newly restarted jobs had a
> +	 * chance to complete.
> +	 */
> +	cancel_delayed_work(&sched->work_tdr);
>   }
>   
>   EXPORT_SYMBOL(drm_sched_stop);
> @@ -477,6 +483,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
>   		if (found_guilty && s_job->s_fence->scheduled.context == guilty_context)
>   			dma_fence_set_error(&s_fence->finished, -ECANCELED);
>   
> +		dma_fence_put(s_job->s_fence->parent);
>   		s_job->s_fence->parent = sched->ops->run_job(s_job);
>   	}
>   }
Andrey Grodzovsky April 23, 2019, 3:14 p.m. UTC | #2
On 4/22/19 8:59 AM, Zhou, David(ChunMing) wrote:
> +Monk to response this patch.
>
>
> 在 2019/4/18 23:00, Andrey Grodzovsky 写道:
>> For later driver's reference to see if the fence is signaled.
>>
>> v2: Move parent fence put to resubmit jobs.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
>> ---
>>    drivers/gpu/drm/scheduler/sched_main.c | 11 +++++++++--
>>    1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 7816de7..03e6bd8 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -375,8 +375,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>    		if (s_job->s_fence->parent &&
>>    		    dma_fence_remove_callback(s_job->s_fence->parent,
>>    					      &s_job->cb)) {
>> -			dma_fence_put(s_job->s_fence->parent);
>> -			s_job->s_fence->parent = NULL;
> I vaguely remember Monk set parent to be NULL to avoiod potiential free
> problem after callback removal.
>
>
> -David

I see, we have to avoid setting it to NULL here as in case the guilty 
job does signal and we avoid HW reset we are not going to resubmit the 
jobs and hence stay with the same parent on reattachment of the cb. So I 
need to know exactly what scenario this set to NULL fixes.

Andrey


>
>
>>    			atomic_dec(&sched->hw_rq_count);
>>    		} else {
>>    			/*
>> @@ -403,6 +401,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
>>    				sched->ops->free_job(s_job);
>>    		}
>>    	}
>> +
>> +	/*
>> +	 * Stop pending timer in flight as we rearm it in  drm_sched_start. This
>> +	 * avoids the pending timeout work in progress to fire right away after
>> +	 * this TDR finished and before the newly restarted jobs had a
>> +	 * chance to complete.
>> +	 */
>> +	cancel_delayed_work(&sched->work_tdr);
>>    }
>>    
>>    EXPORT_SYMBOL(drm_sched_stop);
>> @@ -477,6 +483,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
>>    		if (found_guilty && s_job->s_fence->scheduled.context == guilty_context)
>>    			dma_fence_set_error(&s_fence->finished, -ECANCELED);
>>    
>> +		dma_fence_put(s_job->s_fence->parent);
>>    		s_job->s_fence->parent = sched->ops->run_job(s_job);
>>    	}
>>    }
diff mbox series

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 7816de7..03e6bd8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -375,8 +375,6 @@  void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 		if (s_job->s_fence->parent &&
 		    dma_fence_remove_callback(s_job->s_fence->parent,
 					      &s_job->cb)) {
-			dma_fence_put(s_job->s_fence->parent);
-			s_job->s_fence->parent = NULL;
 			atomic_dec(&sched->hw_rq_count);
 		} else {
 			/*
@@ -403,6 +401,14 @@  void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)
 				sched->ops->free_job(s_job);
 		}
 	}
+
+	/*
+	 * Stop pending timer in flight as we rearm it in  drm_sched_start. This
+	 * avoids the pending timeout work in progress to fire right away after
+	 * this TDR finished and before the newly restarted jobs had a
+	 * chance to complete.
+	 */
+	cancel_delayed_work(&sched->work_tdr);
 }
 
 EXPORT_SYMBOL(drm_sched_stop);
@@ -477,6 +483,7 @@  void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched)
 		if (found_guilty && s_job->s_fence->scheduled.context == guilty_context)
 			dma_fence_set_error(&s_fence->finished, -ECANCELED);
 
+		dma_fence_put(s_job->s_fence->parent);
 		s_job->s_fence->parent = sched->ops->run_job(s_job);
 	}
 }