
drm/sched: Re-queue run job worker when drm_sched_entity_pop_job() returns NULL

Message ID 20240130030413.2031009-1-matthew.brost@intel.com (mailing list archive)
State New, archived
Series drm/sched: Re-queue run job worker when drm_sched_entity_pop_job() returns NULL

Commit Message

Matthew Brost Jan. 30, 2024, 3:04 a.m. UTC
Rather than loop over entities until one with a ready job is found,
re-queue the run job worker when drm_sched_entity_pop_job() returns NULL.

Fixes: 6dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)
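
For context on the new early return: drm_sched_run_job_queue() is expected to
simply re-arm the run-job work item so the scheduler comes back and tries the
next entity. Below is a minimal sketch of that assumption -- illustrative only,
the exact upstream body may differ, and sched->submit_wq / sched->work_run_job
are assumed to be the scheduler's submit workqueue and run-job work item fields:

#include <linux/workqueue.h>
#include <drm/gpu_scheduler.h>

/*
 * Sketch: re-arm the run-job work item on the scheduler's submit
 * workqueue unless submission has been paused.
 */
static void drm_sched_run_job_queue(struct drm_gpu_scheduler *sched)
{
	if (!READ_ONCE(sched->pause_submit))
		queue_work(sched->submit_wq, &sched->work_run_job);
}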

Comments

Christian König Jan. 30, 2024, 7:05 a.m. UTC | #1
Am 30.01.24 um 04:04 schrieb Matthew Brost:
> Rather then loop over entities until one with a ready job is found,
> re-queue the run job worker when drm_sched_entity_pop_job() returns NULL.
>
> Fixes: 6dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 15 +++++++++------
>   1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 8acbef7ae53d..7e90c9f95611 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1178,21 +1178,24 @@ static void drm_sched_run_job_work(struct work_struct *w)
>   	struct drm_sched_entity *entity;
>   	struct dma_fence *fence;
>   	struct drm_sched_fence *s_fence;
> -	struct drm_sched_job *sched_job = NULL;
> +	struct drm_sched_job *sched_job;
>   	int r;
>   
>   	if (READ_ONCE(sched->pause_submit))
>   		return;
>   
>   	/* Find entity with a ready job */
> -	while (!sched_job && (entity = drm_sched_select_entity(sched))) {
> -		sched_job = drm_sched_entity_pop_job(entity);
> -		if (!sched_job)
> -			complete_all(&entity->entity_idle);
> -	}
> +	entity = drm_sched_select_entity(sched);
>   	if (!entity)
>   		return;	/* No more work */
>   
> +	sched_job = drm_sched_entity_pop_job(entity);
> +	if (!sched_job) {
> +		complete_all(&entity->entity_idle);
> +		drm_sched_run_job_queue(sched);
> +		return;
> +	}
> +
>   	s_fence = sched_job->s_fence;
>   
>   	atomic_add(sched_job->credits, &sched->credit_count);
Rodrigo Vivi Feb. 2, 2024, 9:58 p.m. UTC | #2
On Tue, Jan 30, 2024 at 08:05:29AM +0100, Christian König wrote:
> Am 30.01.24 um 04:04 schrieb Matthew Brost:
> > Rather then loop over entities until one with a ready job is found,
> > re-queue the run job worker when drm_sched_entity_pop_job() returns NULL.
> > 
> > Fixes: 6dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")

First of all there's a small typo in this Fixes tag that needs to be fixed.
The correct one is:

Fixes: 66dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")

But I couldn't apply this right now in any of our drm-tip trees because it
is not clear where this is coming from originally.

likely amd tree?!

> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> 
> Reviewed-by: Christian König <christian.koenig@amd.com>

Christian, if this came from the amd, could you please apply it there and
propagate through your fixes flow?

Thanks,
Rodrigo.

> 
> > ---
> >   drivers/gpu/drm/scheduler/sched_main.c | 15 +++++++++------
> >   1 file changed, 9 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 8acbef7ae53d..7e90c9f95611 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1178,21 +1178,24 @@ static void drm_sched_run_job_work(struct work_struct *w)
> >   	struct drm_sched_entity *entity;
> >   	struct dma_fence *fence;
> >   	struct drm_sched_fence *s_fence;
> > -	struct drm_sched_job *sched_job = NULL;
> > +	struct drm_sched_job *sched_job;
> >   	int r;
> >   	if (READ_ONCE(sched->pause_submit))
> >   		return;
> >   	/* Find entity with a ready job */
> > -	while (!sched_job && (entity = drm_sched_select_entity(sched))) {
> > -		sched_job = drm_sched_entity_pop_job(entity);
> > -		if (!sched_job)
> > -			complete_all(&entity->entity_idle);
> > -	}
> > +	entity = drm_sched_select_entity(sched);
> >   	if (!entity)
> >   		return;	/* No more work */
> > +	sched_job = drm_sched_entity_pop_job(entity);
> > +	if (!sched_job) {
> > +		complete_all(&entity->entity_idle);
> > +		drm_sched_run_job_queue(sched);
> > +		return;
> > +	}
> > +
> >   	s_fence = sched_job->s_fence;
> >   	atomic_add(sched_job->credits, &sched->credit_count);
>
Christian König Feb. 5, 2024, 8:44 a.m. UTC | #3
Am 02.02.24 um 22:58 schrieb Rodrigo Vivi:
> On Tue, Jan 30, 2024 at 08:05:29AM +0100, Christian König wrote:
>> Am 30.01.24 um 04:04 schrieb Matthew Brost:
>>> Rather then loop over entities until one with a ready job is found,
>>> re-queue the run job worker when drm_sched_entity_pop_job() returns NULL.
>>>
>>> Fixes: 6dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
> First of all there's a small typo in this Fixes tag that needs to be fixed.
> The correct one is:
>
> Fixes: 66dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
>
> But I couldn't apply this right now in any of our drm-tip trees because it
> is not clear where this is coming from originally.
>
> likely amd tree?!

No, this comes from Matthew's work on the DRM scheduler.

Matthew's patches were most likely merged through drm-misc.

Regards,
Christian.

>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
> Christian, if this came from the amd, could you please apply it there and
> propagate through your fixes flow?
>
> Thanks,
> Rodrigo.
>
>>> ---
>>>    drivers/gpu/drm/scheduler/sched_main.c | 15 +++++++++------
>>>    1 file changed, 9 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 8acbef7ae53d..7e90c9f95611 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -1178,21 +1178,24 @@ static void drm_sched_run_job_work(struct work_struct *w)
>>>    	struct drm_sched_entity *entity;
>>>    	struct dma_fence *fence;
>>>    	struct drm_sched_fence *s_fence;
>>> -	struct drm_sched_job *sched_job = NULL;
>>> +	struct drm_sched_job *sched_job;
>>>    	int r;
>>>    	if (READ_ONCE(sched->pause_submit))
>>>    		return;
>>>    	/* Find entity with a ready job */
>>> -	while (!sched_job && (entity = drm_sched_select_entity(sched))) {
>>> -		sched_job = drm_sched_entity_pop_job(entity);
>>> -		if (!sched_job)
>>> -			complete_all(&entity->entity_idle);
>>> -	}
>>> +	entity = drm_sched_select_entity(sched);
>>>    	if (!entity)
>>>    		return;	/* No more work */
>>> +	sched_job = drm_sched_entity_pop_job(entity);
>>> +	if (!sched_job) {
>>> +		complete_all(&entity->entity_idle);
>>> +		drm_sched_run_job_queue(sched);
>>> +		return;
>>> +	}
>>> +
>>>    	s_fence = sched_job->s_fence;
>>>    	atomic_add(sched_job->credits, &sched->credit_count);
Rodrigo Vivi Feb. 5, 2024, 1:33 p.m. UTC | #4
On Mon, Feb 05, 2024 at 09:44:56AM +0100, Christian König wrote:
> Am 02.02.24 um 22:58 schrieb Rodrigo Vivi:
> > On Tue, Jan 30, 2024 at 08:05:29AM +0100, Christian König wrote:
> > > Am 30.01.24 um 04:04 schrieb Matthew Brost:
> > > > Rather then loop over entities until one with a ready job is found,
> > > > re-queue the run job worker when drm_sched_entity_pop_job() returns NULL.
> > > > 
> > > > Fixes: 6dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
> > First of all there's a small typo in this Fixes tag that needs to be fixed.
> > The correct one is:
> > 
> > Fixes: 66dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")

Cc: Dave Airlie <airlied@redhat.com>

> > 
> > But I couldn't apply this right now in any of our drm-tip trees because it
> > is not clear where this is coming from originally.
> > 
> > likely amd tree?!
> 
> No, this comes from Matthews work on the DRM scheduler.
> 
> Matthews patches were most likely merged through drm-misc.

The original is not there in drm-misc-next;
it looks like Dave took that one directly to drm-next.
So we either need the drm-misc maintainers to do a backmerge, or
Dave to take this through drm-fixes directly.

> 
> Regards,
> Christian.
> 
> > 
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > Reviewed-by: Christian König <christian.koenig@amd.com>
> > Christian, if this came from the amd, could you please apply it there and
> > propagate through your fixes flow?
> > 
> > Thanks,
> > Rodrigo.
> > 
> > > > ---
> > > >    drivers/gpu/drm/scheduler/sched_main.c | 15 +++++++++------
> > > >    1 file changed, 9 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > > index 8acbef7ae53d..7e90c9f95611 100644
> > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > @@ -1178,21 +1178,24 @@ static void drm_sched_run_job_work(struct work_struct *w)
> > > >    	struct drm_sched_entity *entity;
> > > >    	struct dma_fence *fence;
> > > >    	struct drm_sched_fence *s_fence;
> > > > -	struct drm_sched_job *sched_job = NULL;
> > > > +	struct drm_sched_job *sched_job;
> > > >    	int r;
> > > >    	if (READ_ONCE(sched->pause_submit))
> > > >    		return;
> > > >    	/* Find entity with a ready job */
> > > > -	while (!sched_job && (entity = drm_sched_select_entity(sched))) {
> > > > -		sched_job = drm_sched_entity_pop_job(entity);
> > > > -		if (!sched_job)
> > > > -			complete_all(&entity->entity_idle);
> > > > -	}
> > > > +	entity = drm_sched_select_entity(sched);
> > > >    	if (!entity)
> > > >    		return;	/* No more work */
> > > > +	sched_job = drm_sched_entity_pop_job(entity);
> > > > +	if (!sched_job) {
> > > > +		complete_all(&entity->entity_idle);
> > > > +		drm_sched_run_job_queue(sched);
> > > > +		return;
> > > > +	}
> > > > +
> > > >    	s_fence = sched_job->s_fence;
> > > >    	atomic_add(sched_job->credits, &sched->credit_count);
>
Luben Tuikov Feb. 6, 2024, 12:45 a.m. UTC | #5
On 2024-01-29 22:04, Matthew Brost wrote:
> Rather then loop over entities until one with a ready job is found,
> re-queue the run job worker when drm_sched_entity_pop_job() returns NULL.
> 
> Fixes: 6dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Indeed, we cannot have any loops in the GPU scheduler work items, as we need to bounce
between submit and free in the same work queue. (The loop is a holdover from the original
design, before work items/queues were introduced.)

Thanks for fixing this, Matt!

Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
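
To make the constraint concrete, here is a toy sketch of the pattern -- not
scheduler code; the workqueue, work items and the *_one_job() helpers are
hypothetical stand-ins. On an ordered workqueue only one item runs at a time,
so a handler that loops until it runs out of work starves the other item;
handling one unit of work and re-queueing lets the run and free items alternate:

#include <linux/workqueue.h>

/* Hypothetical stand-ins for the scheduler's submit queue and work items. */
static struct workqueue_struct *submit_wq;   /* assume alloc_ordered_workqueue() */
static struct work_struct run_work;          /* plays the role of work_run_job  */
static struct work_struct free_work;         /* plays the role of work_free_job */

/* Hypothetical helpers: each processes at most one job, returns false if idle. */
bool run_one_job(void);
bool free_one_job(void);

static void run_work_fn(struct work_struct *w)
{
	if (!run_one_job())
		return;                /* nothing ready: stop until re-armed */
	/* Don't loop: re-queue so free_work_fn can run in between ("bounce"). */
	queue_work(submit_wq, &run_work);
}

static void free_work_fn(struct work_struct *w)
{
	if (!free_one_job())
		return;
	queue_work(submit_wq, &free_work);
}

With both handlers written this way, finished jobs keep getting freed even
while new jobs keep becoming ready, which is the behaviour the re-queue in
this patch preserves.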
Luben Tuikov Feb. 6, 2024, 12:56 a.m. UTC | #6
On 2024-02-05 08:33, Rodrigo Vivi wrote:
> On Mon, Feb 05, 2024 at 09:44:56AM +0100, Christian König wrote:
>> Am 02.02.24 um 22:58 schrieb Rodrigo Vivi:
>>> On Tue, Jan 30, 2024 at 08:05:29AM +0100, Christian König wrote:
>>>> Am 30.01.24 um 04:04 schrieb Matthew Brost:
>>>>> Rather then loop over entities until one with a ready job is found,
>>>>> re-queue the run job worker when drm_sched_entity_pop_job() returns NULL.
>>>>>
>>>>> Fixes: 6dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
>>> First of all there's a small typo in this Fixes tag that needs to be fixed.
>>> The correct one is:
>>>
>>> Fixes: 66dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
> 
> Cc: Dave Airlie <airlied@redhat.com>
> 
>>>
>>> But I couldn't apply this right now in any of our drm-tip trees because it
>>> is not clear where this is coming from originally.
>>>
>>> likely amd tree?!
>>
>> No, this comes from Matthews work on the DRM scheduler.
>>
>> Matthews patches were most likely merged through drm-misc.
> 
> the original is not there in drm-misc-next.
> it looks like Dave had taken that one directly to drm-next.
> So we either need the drm-misc maintainers to have a backmerge or
> Dave to take this through the drm-fixes directly.

This is indeed the case.

I was going to push this patch through drm-misc-next, but the original/base patch
(<20240124210811.1639040-1-matthew.brost@intel.com>) isn't there.

If the drm-misc maintainers back-merge drm-fixes into drm-misc-next,
I'll push this patch into drm-misc-next right away, so that it is complete
and people can run and test it fully.

Else, Dave will have to pull this patch directly into drm-fixes with our tags,
as was done for the base patch.

Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>

Regards,
Luben

> 
>>
>> Regards,
>> Christian.
>>
>>>
>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>> Christian, if this came from the amd, could you please apply it there and
>>> propagate through your fixes flow?
>>>
>>> Thanks,
>>> Rodrigo.
>>>
>>>>> ---
>>>>>    drivers/gpu/drm/scheduler/sched_main.c | 15 +++++++++------
>>>>>    1 file changed, 9 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index 8acbef7ae53d..7e90c9f95611 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -1178,21 +1178,24 @@ static void drm_sched_run_job_work(struct work_struct *w)
>>>>>    	struct drm_sched_entity *entity;
>>>>>    	struct dma_fence *fence;
>>>>>    	struct drm_sched_fence *s_fence;
>>>>> -	struct drm_sched_job *sched_job = NULL;
>>>>> +	struct drm_sched_job *sched_job;
>>>>>    	int r;
>>>>>    	if (READ_ONCE(sched->pause_submit))
>>>>>    		return;
>>>>>    	/* Find entity with a ready job */
>>>>> -	while (!sched_job && (entity = drm_sched_select_entity(sched))) {
>>>>> -		sched_job = drm_sched_entity_pop_job(entity);
>>>>> -		if (!sched_job)
>>>>> -			complete_all(&entity->entity_idle);
>>>>> -	}
>>>>> +	entity = drm_sched_select_entity(sched);
>>>>>    	if (!entity)
>>>>>    		return;	/* No more work */
>>>>> +	sched_job = drm_sched_entity_pop_job(entity);
>>>>> +	if (!sched_job) {
>>>>> +		complete_all(&entity->entity_idle);
>>>>> +		drm_sched_run_job_queue(sched);
>>>>> +		return;
>>>>> +	}
>>>>> +
>>>>>    	s_fence = sched_job->s_fence;
>>>>>    	atomic_add(sched_job->credits, &sched->credit_count);
>>
Dave Airlie Feb. 6, 2024, 2:05 a.m. UTC | #7
On Tue, 6 Feb 2024 at 10:56, Luben Tuikov <ltuikov89@gmail.com> wrote:
>
> On 2024-02-05 08:33, Rodrigo Vivi wrote:
> > On Mon, Feb 05, 2024 at 09:44:56AM +0100, Christian König wrote:
> >> Am 02.02.24 um 22:58 schrieb Rodrigo Vivi:
> >>> On Tue, Jan 30, 2024 at 08:05:29AM +0100, Christian König wrote:
> >>>> Am 30.01.24 um 04:04 schrieb Matthew Brost:
> >>>>> Rather then loop over entities until one with a ready job is found,
> >>>>> re-queue the run job worker when drm_sched_entity_pop_job() returns NULL.
> >>>>>
> >>>>> Fixes: 6dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
> >>> First of all there's a small typo in this Fixes tag that needs to be fixed.
> >>> The correct one is:
> >>>
> >>> Fixes: 66dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
> >
> > Cc: Dave Airlie <airlied@redhat.com>
> >
> >>>
> >>> But I couldn't apply this right now in any of our drm-tip trees because it
> >>> is not clear where this is coming from originally.
> >>>
> >>> likely amd tree?!
> >>
> >> No, this comes from Matthews work on the DRM scheduler.
> >>
> >> Matthews patches were most likely merged through drm-misc.
> >
> > the original is not there in drm-misc-next.
> > it looks like Dave had taken that one directly to drm-next.
> > So we either need the drm-misc maintainers to have a backmerge or
> > Dave to take this through the drm-fixes directly.
>
> This is indeed the case.
>
> I was going to push this patch through drm-misc-next, but the original/base patch
> (<20240124210811.1639040-1-matthew.brost@intel.com>) isn't there.
>
> If drm-misc maintainers back merge drm-fixes into drm-misc-next,
> I'll push this patch into drm-misc-next right away, so that it is complete,
> and people can run and test it fully.
>
> Else, Dave will have to pull this patch directly into drm-fixes with our tags,
> as was done for the base patch.
>

I'll pull this into drm-fixes now, as that is where the base patch went.

Dave.

> Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
>
> Regards,
> Luben
>
> >
> >>
> >> Regards,
> >> Christian.
> >>
> >>>
> >>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>>> Reviewed-by: Christian König <christian.koenig@amd.com>
> >>> Christian, if this came from the amd, could you please apply it there and
> >>> propagate through your fixes flow?
> >>>
> >>> Thanks,
> >>> Rodrigo.
> >>>
> >>>>> ---
> >>>>>    drivers/gpu/drm/scheduler/sched_main.c | 15 +++++++++------
> >>>>>    1 file changed, 9 insertions(+), 6 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>> index 8acbef7ae53d..7e90c9f95611 100644
> >>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>>>> @@ -1178,21 +1178,24 @@ static void drm_sched_run_job_work(struct work_struct *w)
> >>>>>    	struct drm_sched_entity *entity;
> >>>>>    	struct dma_fence *fence;
> >>>>>    	struct drm_sched_fence *s_fence;
> >>>>> -	struct drm_sched_job *sched_job = NULL;
> >>>>> +	struct drm_sched_job *sched_job;
> >>>>>    	int r;
> >>>>>    	if (READ_ONCE(sched->pause_submit))
> >>>>>    		return;
> >>>>>    	/* Find entity with a ready job */
> >>>>> -	while (!sched_job && (entity = drm_sched_select_entity(sched))) {
> >>>>> -		sched_job = drm_sched_entity_pop_job(entity);
> >>>>> -		if (!sched_job)
> >>>>> -			complete_all(&entity->entity_idle);
> >>>>> -	}
> >>>>> +	entity = drm_sched_select_entity(sched);
> >>>>>    	if (!entity)
> >>>>>    		return;	/* No more work */
> >>>>> +	sched_job = drm_sched_entity_pop_job(entity);
> >>>>> +	if (!sched_job) {
> >>>>> +		complete_all(&entity->entity_idle);
> >>>>> +		drm_sched_run_job_queue(sched);
> >>>>> +		return;
> >>>>> +	}
> >>>>> +
> >>>>>    	s_fence = sched_job->s_fence;
> >>>>>    	atomic_add(sched_job->credits, &sched->credit_count);
> >>
>
> --
> Regards,
> Luben

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 8acbef7ae53d..7e90c9f95611 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1178,21 +1178,24 @@  static void drm_sched_run_job_work(struct work_struct *w)
 	struct drm_sched_entity *entity;
 	struct dma_fence *fence;
 	struct drm_sched_fence *s_fence;
-	struct drm_sched_job *sched_job = NULL;
+	struct drm_sched_job *sched_job;
 	int r;
 
 	if (READ_ONCE(sched->pause_submit))
 		return;
 
 	/* Find entity with a ready job */
-	while (!sched_job && (entity = drm_sched_select_entity(sched))) {
-		sched_job = drm_sched_entity_pop_job(entity);
-		if (!sched_job)
-			complete_all(&entity->entity_idle);
-	}
+	entity = drm_sched_select_entity(sched);
 	if (!entity)
 		return;	/* No more work */
 
+	sched_job = drm_sched_entity_pop_job(entity);
+	if (!sched_job) {
+		complete_all(&entity->entity_idle);
+		drm_sched_run_job_queue(sched);
+		return;
+	}
+
 	s_fence = sched_job->s_fence;
 
 	atomic_add(sched_job->credits, &sched->credit_count);