diff mbox series

drm/scheduler: fix timeout worker setup for out of order job completions

Message ID 20180803131825.21945-1-l.stach@pengutronix.de (mailing list archive)
State New, archived
Series drm/scheduler: fix timeout worker setup for out of order job completions

Commit Message

Lucas Stach Aug. 3, 2018, 1:18 p.m. UTC
drm_sched_job_finish() is a work item scheduled for each finished job on
an unbound system workqueue. This means the workers can execute out of order
with regard to the real hardware job completions.

If this happens, queueing a timeout worker for the first job on the ring
mirror list is wrong, as this may be a job which has already finished
executing. Fix this by reorganizing the code to always queue the worker
for the next job on the list, if this job hasn't finished yet. This is
robust against a potential reordering of the finish workers.

Also move the cancelling of the timeout worker out of the locked section,
so that we don't need to take the job list lock twice. As a small
optimization, list_del is used to remove the job from the ring mirror
list, as there is no need to reinit the list head in the job we are about
to free.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

Comments

Christian König Aug. 3, 2018, 1:51 p.m. UTC | #1
Hi Lucas,

thanks a lot for taking care of that, but there is one thing you have 
missed:

It is perfectly possible that the job is the last one on the list and
list_next_entry() doesn't test for that, i.e. it never returns NULL.

BTW: There are also quite a lot of other things we could optimize here.
So if you need something to do, just ping me :)

Cheers,
Christian.

Am 03.08.2018 um 15:18 schrieb Lucas Stach:
> drm_sched_job_finish() is a work item scheduled for each finished job on
> an unbound system workqueue. This means the workers can execute out of order
> with regard to the real hardware job completions.
>
> If this happens queueing a timeout worker for the first job on the ring
> mirror list is wrong, as this may be a job which has already finished
> executing. Fix this by reorganizing the code to always queue the worker
> for the next job on the list, if this job hasn't finished yet. This is
> robust against a potential reordering of the finish workers.
>
> Also move out the timeout worker cancelling, so that we don't need to
> take the job list lock twice. As a small optimization list_del is used
> to remove the job from the ring mirror list, as there is no need to
> reinit the list head in the job we are about to free.
>
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 20 +++++++++-----------
>   1 file changed, 9 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 44d480768dfe..0be2859d7b80 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct work_struct *work)
>   						   finish_work);
>   	struct drm_gpu_scheduler *sched = s_job->sched;
>   
> -	/* remove job from ring_mirror_list */
> +	if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
> +		cancel_delayed_work_sync(&s_job->work_tdr);
> +
>   	spin_lock(&sched->job_list_lock);
> -	list_del_init(&s_job->node);
>   	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
> -		struct drm_sched_job *next;
> -
> -		spin_unlock(&sched->job_list_lock);
> -		cancel_delayed_work_sync(&s_job->work_tdr);
> -		spin_lock(&sched->job_list_lock);
> +		struct drm_sched_job *next = list_next_entry(s_job, node);
>   
>   		/* queue TDR for next job */
> -		next = list_first_entry_or_null(&sched->ring_mirror_list,
> -						struct drm_sched_job, node);
> -
> -		if (next)
> +		if (next != s_job &&
> +		    !dma_fence_is_signaled(&s_job->s_fence->finished))
>   			schedule_delayed_work(&next->work_tdr, sched->timeout);
>   	}
> +	/* remove job from ring_mirror_list */
> +	list_del(&s_job->node);
>   	spin_unlock(&sched->job_list_lock);
> +
>   	dma_fence_put(&s_job->s_fence->finished);
>   	sched->ops->free_job(s_job);
>   }
Lucas Stach Aug. 3, 2018, 2:09 p.m. UTC | #2
Hi Christian,

Am Freitag, den 03.08.2018, 15:51 +0200 schrieb Christian König:
> Hi Lucas,
> 
> thanks a lot for taking care of that, but there is one thing you have 
> missed:
> 
> It is perfectly possible that the job is the last one on the list and
> list_next_entry() doesn't test for that, i.e. it never returns NULL.

Urgh, right. For some reason I was thinking I would get the same
sched_job in that case again, thus the "next != s_job" in the condition
below. But of course the next pointer points to the list head in the
scheduler; must be the temperature in here causing bitflips in my
brain. ;)

Updated patch coming up in a few minutes.

> BTW: There are also quite a lot of other things we could optimize here. 
> So if you need something to do, just ping me :)

Hehe, I think we are all in the same boat here: too much stuff on the
plate.

Regards,
Lucas

> Cheers,
> Christian.
> 
> Am 03.08.2018 um 15:18 schrieb Lucas Stach:
> > drm_sched_job_finish() is a work item scheduled for each finished job on
> > an unbound system workqueue. This means the workers can execute out of order
> > with regard to the real hardware job completions.
> > 
> > If this happens queueing a timeout worker for the first job on the ring
> > mirror list is wrong, as this may be a job which has already finished
> > executing. Fix this by reorganizing the code to always queue the worker
> > for the next job on the list, if this job hasn't finished yet. This is
> > robust against a potential reordering of the finish workers.
> > 
> > Also move out the timeout worker cancelling, so that we don't need to
> > take the job list lock twice. As a small optimization list_del is used
> > to remove the job from the ring mirror list, as there is no need to
> > reinit the list head in the job we are about to free.
> > 
> > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> > ---
> >   drivers/gpu/drm/scheduler/gpu_scheduler.c | 20 +++++++++-----------
> >   1 file changed, 9 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > index 44d480768dfe..0be2859d7b80 100644
> > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > @@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct work_struct *work)
> >   						   finish_work);
> >   	struct drm_gpu_scheduler *sched = s_job->sched;
> >   
> > -	/* remove job from ring_mirror_list */
> > +	if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
> > +		cancel_delayed_work_sync(&s_job->work_tdr);
> > +
> >   	spin_lock(&sched->job_list_lock);
> > -	list_del_init(&s_job->node);
> >   	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
> > -		struct drm_sched_job *next;
> > -
> > -		spin_unlock(&sched->job_list_lock);
> > -		cancel_delayed_work_sync(&s_job->work_tdr);
> > -		spin_lock(&sched->job_list_lock);
> > +		struct drm_sched_job *next = list_next_entry(s_job, node);
> >   
> >   		/* queue TDR for next job */
> > -		next = list_first_entry_or_null(&sched->ring_mirror_list,
> > -						struct drm_sched_job, node);
> > -
> > -		if (next)
> > +		if (next != s_job &&
> > +		    !dma_fence_is_signaled(&s_job->s_fence->finished))
> >   			schedule_delayed_work(&next->work_tdr, sched->timeout);
> >   	}
> > +	/* remove job from ring_mirror_list */
> > +	list_del(&s_job->node);
> >   	spin_unlock(&sched->job_list_lock);
> > +
> >   	dma_fence_put(&s_job->s_fence->finished);
> >   	sched->ops->free_job(s_job);
> >   }
> 
>
Christian König Aug. 3, 2018, 2:24 p.m. UTC | #3
Am 03.08.2018 um 16:09 schrieb Lucas Stach:
> Hi Christian,
>
> Am Freitag, den 03.08.2018, 15:51 +0200 schrieb Christian König:
>> Hi Lucas,
>>
>> thanks a lot for taking care of that, but there is one thing you have
>> missed:
>>
>> It is perfectly possible that the job is the last one on the list and
>> list_next_entry() doesn't test for that, i.e. it never returns NULL.
> Urgh, right. For some reason I was thinking I would get the same
> sched_job in that case again, thus the "next != s_job" in the condition
> below. But for sure the next pointer is to the list head in the
> scheduler, must be the temperature in here causing bitflips in my
> brain. ;)
>
> Updated patch coming up in few minutes.
>
>> BTW: There are also quite a lot of other things we could optimize here.
>> So if you need something to do, just ping me :)
> Hehe, I think we are all in the same boat here: too much stuff on the
> plate.

Yeah, of course :)

Well, what I wanted to note is that you could turn the list into a ring
buffer; that might make the whole handling a lot easier.

Cheers,
Christian.

>
> Regards,
> Lucas
>
>> Cheers,
>> Christian.
>>
>> Am 03.08.2018 um 15:18 schrieb Lucas Stach:
>>> drm_sched_job_finish() is a work item scheduled for each finished job on
>>> an unbound system workqueue. This means the workers can execute out of order
>>> with regard to the real hardware job completions.
>>>
>>> If this happens queueing a timeout worker for the first job on the ring
>>> mirror list is wrong, as this may be a job which has already finished
>>> executing. Fix this by reorganizing the code to always queue the worker
>>> for the next job on the list, if this job hasn't finished yet. This is
>>> robust against a potential reordering of the finish workers.
>>>
>>> Also move out the timeout worker cancelling, so that we don't need to
>>> take the job list lock twice. As a small optimization list_del is used
>>> to remove the job from the ring mirror list, as there is no need to
>>> reinit the list head in the job we are about to free.
>>>
>>> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
>>> ---
>>>    drivers/gpu/drm/scheduler/gpu_scheduler.c | 20 +++++++++-----------
>>>    1 file changed, 9 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>> index 44d480768dfe..0be2859d7b80 100644
>>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>> @@ -452,24 +452,22 @@ static void drm_sched_job_finish(struct work_struct *work)
>>>    						   finish_work);
>>>    	struct drm_gpu_scheduler *sched = s_job->sched;
>>>    
>>> -	/* remove job from ring_mirror_list */
>>> +	if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
>>> +		cancel_delayed_work_sync(&s_job->work_tdr);
>>> +
>>>    	spin_lock(&sched->job_list_lock);
>>> -	list_del_init(&s_job->node);
>>>    	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
>>> -		struct drm_sched_job *next;
>>> -
>>> -		spin_unlock(&sched->job_list_lock);
>>> -		cancel_delayed_work_sync(&s_job->work_tdr);
>>> -		spin_lock(&sched->job_list_lock);
>>> +		struct drm_sched_job *next = list_next_entry(s_job, node);
>>>    
>>>    		/* queue TDR for next job */
>>> -		next = list_first_entry_or_null(&sched->ring_mirror_list,
>>> -						struct drm_sched_job, node);
>>> -
>>> -		if (next)
>>> +		if (next != s_job &&
>>> +		    !dma_fence_is_signaled(&s_job->s_fence->finished))
>>>    			schedule_delayed_work(&next->work_tdr, sched->timeout);
>>>    	}
>>> +	/* remove job from ring_mirror_list */
>>> +	list_del(&s_job->node);
>>>    	spin_unlock(&sched->job_list_lock);
>>> +
>>>    	dma_fence_put(&s_job->s_fence->finished);
>>>    	sched->ops->free_job(s_job);
>>>    }
>>

Patch

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 44d480768dfe..0be2859d7b80 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -452,24 +452,22 @@  static void drm_sched_job_finish(struct work_struct *work)
 						   finish_work);
 	struct drm_gpu_scheduler *sched = s_job->sched;
 
-	/* remove job from ring_mirror_list */
+	if (sched->timeout != MAX_SCHEDULE_TIMEOUT)
+		cancel_delayed_work_sync(&s_job->work_tdr);
+
 	spin_lock(&sched->job_list_lock);
-	list_del_init(&s_job->node);
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
-		struct drm_sched_job *next;
-
-		spin_unlock(&sched->job_list_lock);
-		cancel_delayed_work_sync(&s_job->work_tdr);
-		spin_lock(&sched->job_list_lock);
+		struct drm_sched_job *next = list_next_entry(s_job, node);
 
 		/* queue TDR for next job */
-		next = list_first_entry_or_null(&sched->ring_mirror_list,
-						struct drm_sched_job, node);
-
-		if (next)
+		if (next != s_job &&
+		    !dma_fence_is_signaled(&s_job->s_fence->finished))
 			schedule_delayed_work(&next->work_tdr, sched->timeout);
 	}
+	/* remove job from ring_mirror_list */
+	list_del(&s_job->node);
 	spin_unlock(&sched->job_list_lock);
+
 	dma_fence_put(&s_job->s_fence->finished);
 	sched->ops->free_job(s_job);
 }