Message ID: 20241021175705.1584521-2-matthew.brost@intel.com (mailing list archive)
State:      New
Series:     Mark work queues with WQ_MEM_RECLAIM
On Mon, 2024-10-21 at 10:57 -0700, Matthew Brost wrote:
> DRM scheduler work queues are used to submit jobs, jobs are in the
> path

"scheduler work queues" is very generic, how about
"drm_gpu_scheduler.submit_wq is used to submit jobs, [...]"

> or dma-fences, and dma-fences are in the path of reclaim. Mark

s/or/of

> scheduler
> work queues with WQ_MEM_RECLAIM so these work queues can continue to
> make forward progress during reclaim.

It is just *one* queue (per scheduler) really, isn't it?

If the change above is applied, could just say: "Create the work queue
with WQ_MEM_RECLAIM so it can continue [...]"

So for my understanding: is this a performance optimization or is it a
bug? IOW, would forward progress just be delayed or entirely prevented?
Would be cool to state that a bit more clearly in the commit message.

Work-queue docu says "MUST":

``WQ_MEM_RECLAIM`` All wq which might be used in the memory reclaim
paths **MUST** have this flag set. The wq is guaranteed to have at
least one execution context regardless of memory pressure.

So it seems to me that this fixes a bug? Should it be backported in
your opinion?

>
> Cc: Luben Tuikov <ltuikov89@gmail.com>

btw., how did you send this email? I couldn't find Luben on CC. Added
him.

Thx,
P.
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Philipp Stanner <pstanner@redhat.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 6e4d004d09ce..567811957c0f 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1275,10 +1275,10 @@ int drm_sched_init(struct drm_gpu_scheduler
> *sched,
>  		sched->own_submit_wq = false;
>  	} else {
>  #ifdef CONFIG_LOCKDEP
> -		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
> +		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name,
> WQ_MEM_RECLAIM,
> 								       &drm_sched_lockdep_map);
>  #else
> -		sched->submit_wq = alloc_ordered_workqueue(name, 0);
> +		sched->submit_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
>  #endif
>  		if (!sched->submit_wq)
>  			return -ENOMEM;
On Tue, Oct 22, 2024 at 04:19:18PM +0200, Philipp Stanner wrote:
> On Mon, 2024-10-21 at 10:57 -0700, Matthew Brost wrote:
> > DRM scheduler work queues are used to submit jobs, jobs are in the
> > path
>
> "scheduler work queues" is very generic, how about
> "drm_gpu_scheduler.submit_wq is used to submit jobs, [...]"
>

Sure.

> > or dma-fences, and dma-fences are in the path of reclaim. Mark
>
> s/or/of
>

Yep.

> > scheduler
> > work queues with WQ_MEM_RECLAIM so these work queues can continue to
> > make forward progress during reclaim.
>
> It is just *one* queue (per scheduler) really, isn't it?
>

Yes.

> If the change above is applied, could just say: "Create the work queue
> with WQ_MEM_RECLAIM so it can continue [...]"
>

Now you are confusing me.

> So for my understanding: is this a performance optimization or is it a
> bug? IOW, would forward progress just be delayed or entirely prevented?
> Would be cool to state that a bit more clearly in the commit message.
>

I can make that a bit more clear.

> Work-queue docu says "MUST":
>
> ``WQ_MEM_RECLAIM`` All wq which might be used in the memory reclaim
> paths **MUST** have this flag set. The wq is guaranteed to have at
> least one execution context regardless of memory pressure.
>
> So it seems to me that this fixes a bug? Should it be backported in
> your opinion?
>

Bug - yea probably a fix tag then for backporting. Will add in next rev.

> >
> > Cc: Luben Tuikov <ltuikov89@gmail.com>
>
> btw., how did you send this email? I couldn't find Luben on CC. Added
> him.

git send-email...

I may have forgot to include him on the Cc list.

Matt

>
> Thx,
> P.
> >
> > Cc: Danilo Krummrich <dakr@kernel.org>
> > Cc: Philipp Stanner <pstanner@redhat.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > b/drivers/gpu/drm/scheduler/sched_main.c
> > index 6e4d004d09ce..567811957c0f 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1275,10 +1275,10 @@ int drm_sched_init(struct drm_gpu_scheduler
> > *sched,
> >  		sched->own_submit_wq = false;
> >  	} else {
> >  #ifdef CONFIG_LOCKDEP
> > -		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
> > +		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name,
> > WQ_MEM_RECLAIM,
> > 								       &drm_sched_lockdep_map);
> >  #else
> > -		sched->submit_wq = alloc_ordered_workqueue(name, 0);
> > +		sched->submit_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
> >  #endif
> >  		if (!sched->submit_wq)
> >  			return -ENOMEM;
>
On Tue, 2024-10-22 at 18:11 +0000, Matthew Brost wrote:
> On Tue, Oct 22, 2024 at 04:19:18PM +0200, Philipp Stanner wrote:
> > On Mon, 2024-10-21 at 10:57 -0700, Matthew Brost wrote:
> > > DRM scheduler work queues are used to submit jobs, jobs are in
> > > the
> > > path
> >
> > "scheduler work queues" is very generic, how about
> > "drm_gpu_scheduler.submit_wq is used to submit jobs, [...]"
> >
>
> Sure.
>
> > > or dma-fences, and dma-fences are in the path of reclaim. Mark
> >
> > s/or/of
> >
>
> Yep.
>
> > > scheduler
> > > work queues with WQ_MEM_RECLAIM so these work queues can continue
> > > to
> > > make forward progress during reclaim.
> >
> > It is just *one* queue (per scheduler) really, isn't it?
> >
>
> Yes.
>
> > If the change above is applied, could just say: "Create the work
> > queue
> > with WQ_MEM_RECLAIM so it can continue [...]"
> >
>
> Now you are confusing me.

I just was hinting at using singular instead of plural. So

s/work queues/work queue

But that's just a nit

>
> > So for my understanding: is this a performance optimization or is
> > it a
> > bug? IOW, would forward progress just be delayed or entirely
> > prevented?
> > Would be cool to state that a bit more clearly in the commit
> > message.
> >
>
> I can make that a bit more clear.
>
> > Work-queue docu says "MUST":
> >
> > ``WQ_MEM_RECLAIM`` All wq which might be used in the memory reclaim
> > paths **MUST** have this flag set. The wq is guaranteed to have at
> > least one execution context regardless of memory pressure.
> >
> > So it seems to me that this fixes a bug? Should it be backported in
> > your opinion?
> >
>
> Bug - yea probably a fix tag then for backporting. Will add in next
> rev.

Cool, thx for clarifying!

P.

> > >
> > > Cc: Luben Tuikov <ltuikov89@gmail.com>
> >
> > btw., how did you send this email? I couldn't find Luben on CC.
> > Added
> > him.
>
> git send-email...
>
> I may have forgot to include him on the Cc list.
>
> Matt
>
> >
> > Thx,
> > P.
> > >
> > > Cc: Danilo Krummrich <dakr@kernel.org>
> > > Cc: Philipp Stanner <pstanner@redhat.com>
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 6e4d004d09ce..567811957c0f 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -1275,10 +1275,10 @@ int drm_sched_init(struct
> > > drm_gpu_scheduler
> > > *sched,
> > >  		sched->own_submit_wq = false;
> > >  	} else {
> > >  #ifdef CONFIG_LOCKDEP
> > > -		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
> > > +		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name,
> > > WQ_MEM_RECLAIM,
> > > 								       &drm_sched_lockdep_map);
> > >  #else
> > > -		sched->submit_wq = alloc_ordered_workqueue(name, 0);
> > > +		sched->submit_wq = alloc_ordered_workqueue(name,
> > > WQ_MEM_RECLAIM);
> > >  #endif
> > >  		if (!sched->submit_wq)
> > >  			return -ENOMEM;
> >
>
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 6e4d004d09ce..567811957c0f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1275,10 +1275,10 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 		sched->own_submit_wq = false;
 	} else {
 #ifdef CONFIG_LOCKDEP
-		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
+		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, WQ_MEM_RECLAIM,
								       &drm_sched_lockdep_map);
 #else
-		sched->submit_wq = alloc_ordered_workqueue(name, 0);
+		sched->submit_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
 #endif
 		if (!sched->submit_wq)
 			return -ENOMEM;
DRM scheduler work queues are used to submit jobs, jobs are in the path
or dma-fences, and dma-fences are in the path of reclaim. Mark scheduler
work queues with WQ_MEM_RECLAIM so these work queues can continue to
make forward progress during reclaim.

Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
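The WQ_MEM_RECLAIM guarantee the reviewers quote (at least one execution context regardless of memory pressure) can be pictured with a rough userspace analogy. This is illustrative Python only, not kernel code, and every name in it (`ReclaimSafeQueue` and so on) is invented for the sketch: the point is that the one worker is created up front, while resources are still available, mirroring the kernel's pre-allocated rescuer thread that lets reclaim-path work items run even when new workers cannot be spawned.

```python
import threading
import queue

class ReclaimSafeQueue:
    """Rough analogy (not kernel code) of a WQ_MEM_RECLAIM workqueue:
    one execution context is allocated at creation time, so queued
    work can always make progress even if creating additional worker
    threads later were to fail under resource pressure."""

    def __init__(self):
        self._q = queue.Queue()
        # The pre-allocated worker: the analogue of the kernel's
        # guaranteed "rescuer" execution context.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            fn = self._q.get()
            fn()
            self._q.task_done()

    def submit(self, fn):
        self._q.put(fn)

    def drain(self):
        # Block until every submitted item has executed.
        self._q.join()


results = []
wq = ReclaimSafeQueue()
for i in range(3):
    wq.submit(lambda i=i: results.append(i))
wq.drain()
print(results)  # -> [0, 1, 2]
```

The single worker also makes the queue ordered, which loosely matches the `alloc_ordered_workqueue()` call in the patch; the actual deadlock risk the patch addresses (a reclaim path waiting on a dma-fence whose signaling work cannot run) has no direct userspace equivalent.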