Message ID | 1555599624-12285-4-git-send-email-andrey.grodzovsky@amd.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v5,1/6] drm/amd/display: wait for fence without holding reservation lock | expand |
+Monk to response this patch. 在 2019/4/18 23:00, Andrey Grodzovsky 写道: > For later driver's reference to see if the fence is signaled. > > v2: Move parent fence put to resubmit jobs. > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> > Reviewed-by: Christian König <christian.koenig@amd.com> > --- > drivers/gpu/drm/scheduler/sched_main.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 7816de7..03e6bd8 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -375,8 +375,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) > if (s_job->s_fence->parent && > dma_fence_remove_callback(s_job->s_fence->parent, > &s_job->cb)) { > - dma_fence_put(s_job->s_fence->parent); > - s_job->s_fence->parent = NULL; I vaguely remember Monk set parent to be NULL to avoiod potiential free problem after callback removal. -David > atomic_dec(&sched->hw_rq_count); > } else { > /* > @@ -403,6 +401,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) > sched->ops->free_job(s_job); > } > } > + > + /* > + * Stop pending timer in flight as we rearm it in drm_sched_start. This > + * avoids the pending timeout work in progress to fire right away after > + * this TDR finished and before the newly restarted jobs had a > + * chance to complete. > + */ > + cancel_delayed_work(&sched->work_tdr); > } > > EXPORT_SYMBOL(drm_sched_stop); > @@ -477,6 +483,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) > if (found_guilty && s_job->s_fence->scheduled.context == guilty_context) > dma_fence_set_error(&s_fence->finished, -ECANCELED); > > + dma_fence_put(s_job->s_fence->parent); > s_job->s_fence->parent = sched->ops->run_job(s_job); > } > }
On 4/22/19 8:59 AM, Zhou, David(ChunMing) wrote: > +Monk to response this patch. > > > 在 2019/4/18 23:00, Andrey Grodzovsky 写道: >> For later driver's reference to see if the fence is signaled. >> >> v2: Move parent fence put to resubmit jobs. >> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >> Reviewed-by: Christian König <christian.koenig@amd.com> >> --- >> drivers/gpu/drm/scheduler/sched_main.c | 11 +++++++++-- >> 1 file changed, 9 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >> index 7816de7..03e6bd8 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -375,8 +375,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) >> if (s_job->s_fence->parent && >> dma_fence_remove_callback(s_job->s_fence->parent, >> &s_job->cb)) { >> - dma_fence_put(s_job->s_fence->parent); >> - s_job->s_fence->parent = NULL; > I vaguely remember Monk set parent to be NULL to avoiod potiential free > problem after callback removal. > > > -David I see, we have to avoid setting it to NULL here as in case the guilty job does signal and we avoid HW reset we are not going to resubmit the jobs and hence stay with the same parent on reattachment of the cb. So I need to know exactly what scenario this set to NULL fixes. Andrey > > >> atomic_dec(&sched->hw_rq_count); >> } else { >> /* >> @@ -403,6 +401,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) >> sched->ops->free_job(s_job); >> } >> } >> + >> + /* >> + * Stop pending timer in flight as we rearm it in drm_sched_start. This >> + * avoids the pending timeout work in progress to fire right away after >> + * this TDR finished and before the newly restarted jobs had a >> + * chance to complete. >> + */ >> + cancel_delayed_work(&sched->work_tdr); >> } >> >> EXPORT_SYMBOL(drm_sched_stop); >> @@ -477,6 +483,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) >> if (found_guilty && s_job->s_fence->scheduled.context == guilty_context) >> dma_fence_set_error(&s_fence->finished, -ECANCELED); >> >> + dma_fence_put(s_job->s_fence->parent); >> s_job->s_fence->parent = sched->ops->run_job(s_job); >> } >> }
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 7816de7..03e6bd8 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -375,8 +375,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) if (s_job->s_fence->parent && dma_fence_remove_callback(s_job->s_fence->parent, &s_job->cb)) { - dma_fence_put(s_job->s_fence->parent); - s_job->s_fence->parent = NULL; atomic_dec(&sched->hw_rq_count); } else { /* @@ -403,6 +401,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) sched->ops->free_job(s_job); } } + + /* + * Stop pending timer in flight as we rearm it in drm_sched_start. This + * avoids the pending timeout work in progress to fire right away after + * this TDR finished and before the newly restarted jobs had a + * chance to complete. + */ + cancel_delayed_work(&sched->work_tdr); } EXPORT_SYMBOL(drm_sched_stop); @@ -477,6 +483,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched) if (found_guilty && s_job->s_fence->scheduled.context == guilty_context) dma_fence_set_error(&s_fence->finished, -ECANCELED); + dma_fence_put(s_job->s_fence->parent); s_job->s_fence->parent = sched->ops->run_job(s_job); } }