
[1/4] drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM

Message ID 20241021175705.1584521-2-matthew.brost@intel.com (mailing list archive)
State New
Series Mark work queues with WQ_MEM_RECLAIM

Commit Message

Matthew Brost Oct. 21, 2024, 5:57 p.m. UTC
DRM scheduler work queues are used to submit jobs, jobs are in the path
or dma-fences, and dma-fences are in the path of reclaim. Mark scheduler
work queues with WQ_MEM_RECLAIM so these work queues can continue to
make forward progress during reclaim.
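
For context (an illustrative sketch, not part of the patch): a dma-fence that reclaim waits on can only signal once the scheduler's submit work has run, so the queue backing that work must be able to make progress under memory pressure. The allocation the patch changes looks roughly like this (kernel code, not standalone; the queue name is made up here):

```c
/* Sketch: a queue whose work items sit in the dma-fence signalling
 * path must be created with WQ_MEM_RECLAIM, which guarantees the
 * queue a pre-allocated rescuer thread so its work can still run
 * when spawning new workers fails under memory pressure. */
struct workqueue_struct *submit_wq;

submit_wq = alloc_ordered_workqueue("drm-sched-submit", WQ_MEM_RECLAIM);
if (!submit_wq)
	return -ENOMEM;
```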

Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Philipp Stanner Oct. 22, 2024, 2:19 p.m. UTC | #1
On Mon, 2024-10-21 at 10:57 -0700, Matthew Brost wrote:
> DRM scheduler work queues are used to submit jobs, jobs are in the
> path

"scheduler work queues" is very generic, how about
"drm_gpu_scheduler.submit_wq is used to submit jobs, [...]"

> or dma-fences, and dma-fences are in the path of reclaim. Mark

s/or/of

> scheduler
> work queues with WQ_MEM_RECLAIM so these work queues can continue to
> make forward progress during reclaim.

It is just *one* queue (per scheduler) really, isn't it?

If the change above is applied, could just say: "Create the work queue
with WQ_MEM_RECLAIM so it can continue [...]"

So for my understanding: is this a performance optimization or is it a
bug? IOW, would forward progress just be delayed or entirely prevented?
Would be cool to state that a bit more clearly in the commit message.

Work-queue docu says "MUST":

``WQ_MEM_RECLAIM`` All wq which might be used in the memory reclaim
paths **MUST** have this flag set. The wq is guaranteed to have at
least one execution context regardless of memory pressure.
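
The deadlock that guarantee prevents can be sketched as follows (illustrative only; this is the reasoning behind the MUST, not code from the patch):

```c
/* Reclaim blocks on a fence that only the scheduler's submit work
 * can signal: */
dma_fence_wait(fence, false);

/* Without WQ_MEM_RECLAIM, running that work may require forking a
 * new worker thread, which needs memory, which needs reclaim to
 * finish first - a cycle. The flag breaks the cycle by reserving
 * one execution context (the rescuer) up front. */
```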

So it seems to me that this fixes a bug? Should it be backported in
your opinion?


> 
> Cc: Luben Tuikov <ltuikov89@gmail.com>

btw., how did you send this email? I couldn't find Luben on CC. Added
him.

Thx,
P.

> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Philipp Stanner <pstanner@redhat.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 6e4d004d09ce..567811957c0f 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1275,10 +1275,10 @@ int drm_sched_init(struct drm_gpu_scheduler
> *sched,
>   sched->own_submit_wq = false;
>   } else {
>  #ifdef CONFIG_LOCKDEP
> - sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
> + sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name,
> WQ_MEM_RECLAIM,
>          &drm_sched_lockdep_map);
>  #else
> - sched->submit_wq = alloc_ordered_workqueue(name, 0);
> + sched->submit_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
>  #endif
>   if (!sched->submit_wq)
>   return -ENOMEM;
Matthew Brost Oct. 22, 2024, 6:11 p.m. UTC | #2
On Tue, Oct 22, 2024 at 04:19:18PM +0200, Philipp Stanner wrote:
> On Mon, 2024-10-21 at 10:57 -0700, Matthew Brost wrote:
> > DRM scheduler work queues are used to submit jobs, jobs are in the
> > path
> 
> "scheduler work queues" is very generic, how about
> "drm_gpu_scheduler.submit_wq is used to submit jobs, [...]"
> 

Sure.

> > or dma-fences, and dma-fences are in the path of reclaim. Mark
> 
> s/or/of
> 

Yep.

> > scheduler
> > work queues with WQ_MEM_RECLAIM so these work queues can continue to
> > make forward progress during reclaim.
> 
> It is just *one* queue (per scheduler) really, isn't it?
> 

Yes.

> If the change above is applied, could just say: "Create the work queue
> with WQ_MEM_RECLAIM so it can continue [...]"
> 

Now you are confusing me.

> So for my understanding: is this a performance optimization or is it a
> bug? IOW, would forward progress just be delayed or entirely prevented?
> Would be cool to state that a bit more clearly in the commit message.
> 

I can make that a bit more clear.

> Work-queue docu says "MUST":
> 
> ``WQ_MEM_RECLAIM`` All wq which might be used in the memory reclaim
> paths **MUST** have this flag set. The wq is guaranteed to have at
> least one execution context regardless of memory pressure.
> 
> So it seems to me that this fixes a bug? Should it be backported in
> your opinion?
> 

Bug - yeah, probably a Fixes tag then for backporting. Will add in next rev.

> 
> > 
> > Cc: Luben Tuikov <ltuikov89@gmail.com>
> 
> btw., how did you send this email? I couldn't find Luben on CC. Added
> him.

git send-email...

I may have forgotten to include him on the Cc list.

Matt

> 
> Thx,
> P.
> 
> > Cc: Danilo Krummrich <dakr@kernel.org>
> > Cc: Philipp Stanner <pstanner@redhat.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 6e4d004d09ce..567811957c0f 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1275,10 +1275,10 @@ int drm_sched_init(struct drm_gpu_scheduler
> > *sched,
> >   sched->own_submit_wq = false;
> >   } else {
> >  #ifdef CONFIG_LOCKDEP
> > - sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
> > + sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name,
> > WQ_MEM_RECLAIM,
> >          &drm_sched_lockdep_map);
> >  #else
> > - sched->submit_wq = alloc_ordered_workqueue(name, 0);
> > + sched->submit_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
> >  #endif
> >   if (!sched->submit_wq)
> >   return -ENOMEM;
>
Philipp Stanner Oct. 23, 2024, 7:56 a.m. UTC | #3
On Tue, 2024-10-22 at 18:11 +0000, Matthew Brost wrote:
> On Tue, Oct 22, 2024 at 04:19:18PM +0200, Philipp Stanner wrote:
> > On Mon, 2024-10-21 at 10:57 -0700, Matthew Brost wrote:
> > > DRM scheduler work queues are used to submit jobs, jobs are in
> > > the
> > > path
> > 
> > "scheduler work queues" is very generic, how about
> > "drm_gpu_scheduler.submit_wq is used to submit jobs, [...]"
> > 
> 
> Sure.
> 
> > > or dma-fences, and dma-fences are in the path of reclaim. Mark
> > 
> > s/or/of
> > 
> 
> Yep.
> 
> > > scheduler
> > > work queues with WQ_MEM_RECLAIM so these work queues can continue
> > > to
> > > make forward progress during reclaim.
> > 
> > It is just *one* queue (per scheduler) really, isn't it?
> > 
> 
> Yes.
> 
> > If the change above is applied, could just say: "Create the work
> > queue
> > with WQ_MEM_RECLAIM so it can continue [...]"
> > 
> 
> Now you are confusing me.

I was just hinting at using the singular instead of the plural, so:
s/work queues/work queue

But that's just a nit.

> 
> > So for my understanding: is this a performance optimization or is
> > it a
> > bug? IOW, would forward progress just be delayed or entirely
> > prevented?
> > Would be cool to state that a bit more clearly in the commit
> > message.
> > 
> 
> I can make that a bit more clear.
> 
> > Work-queue docu says "MUST":
> > 
> > ``WQ_MEM_RECLAIM`` All wq which might be used in the memory reclaim
> > paths **MUST** have this flag set. The wq is guaranteed to have at
> > least one execution context regardless of memory pressure.
> > 
> > So it seems to me that this fixes a bug? Should it be backported in
> > your opinion?
> > 
> 
> Bug - yea probably a fix tag then for backporting. Will add in next
> rev.

Cool, thx for clarifying!

P.

> 
> > 
> > > 
> > > Cc: Luben Tuikov <ltuikov89@gmail.com>
> > 
> > btw., how did you send this email? I couldn't find Luben on CC.
> > Added
> > him.
> 
> git send-email...
> 
> I may have forgotten to include him on the Cc list.
> 
> Matt
> 
> > 
> > Thx,
> > P.
> > 
> > > Cc: Danilo Krummrich <dakr@kernel.org>
> > > Cc: Philipp Stanner <pstanner@redhat.com>
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 6e4d004d09ce..567811957c0f 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -1275,10 +1275,10 @@ int drm_sched_init(struct
> > > drm_gpu_scheduler
> > > *sched,
> > >   sched->own_submit_wq = false;
> > >   } else {
> > >  #ifdef CONFIG_LOCKDEP
> > > - sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
> > > + sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name,
> > > WQ_MEM_RECLAIM,
> > >          &drm_sched_lockdep_map);
> > >  #else
> > > - sched->submit_wq = alloc_ordered_workqueue(name, 0);
> > > + sched->submit_wq = alloc_ordered_workqueue(name,
> > > WQ_MEM_RECLAIM);
> > >  #endif
> > >   if (!sched->submit_wq)
> > >   return -ENOMEM;
> > 
>

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 6e4d004d09ce..567811957c0f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1275,10 +1275,10 @@  int drm_sched_init(struct drm_gpu_scheduler *sched,
 		sched->own_submit_wq = false;
 	} else {
 #ifdef CONFIG_LOCKDEP
-		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
+		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, WQ_MEM_RECLAIM,
 								       &drm_sched_lockdep_map);
 #else
-		sched->submit_wq = alloc_ordered_workqueue(name, 0);
+		sched->submit_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
 #endif
 		if (!sched->submit_wq)
 			return -ENOMEM;
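
One consequence worth noting (an assumption about callers, not something the patch states): the hunk only covers the `else` branch, where drm_sched_init() allocates the queue itself. A driver that supplies its own submit_wq (the branch setting `sched->own_submit_wq = false`) would need to apply the flag on its side, roughly like this (hypothetical driver code; the queue name is invented):

```c
/* Hypothetical driver-side allocation: a queue handed to
 * drm_sched_init() is used unmodified, so the driver must set
 * WQ_MEM_RECLAIM itself - the patched hunk above only affects the
 * queue the scheduler allocates on the driver's behalf. */
struct workqueue_struct *submit_wq;

submit_wq = alloc_ordered_workqueue("drv-sched-submit", WQ_MEM_RECLAIM);
if (!submit_wq)
	return -ENOMEM;
/* ... then pass submit_wq to drm_sched_init() ... */
```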