[v2] drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM

Message ID 20241023235917.1836428-1-matthew.brost@intel.com (mailing list archive)
State New

Commit Message

Matthew Brost Oct. 23, 2024, 11:59 p.m. UTC
drm_gpu_scheduler.submit_wq is used to submit jobs, jobs are in the path
of dma-fences, and dma-fences are in the path of reclaim. Mark scheduler
work queue with WQ_MEM_RECLAIM to ensure forward progress during
reclaim; without WQ_MEM_RECLAIM, work queues cannot make forward
progress during reclaim.

v2:
 - Fixes tags (Philipp)
 - Reword commit message (Philipp)

Cc: Luben Tuikov <ltuikov89@gmail.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Philipp Stanner <pstanner@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 34f50cc6441b ("drm/sched: Use drm sched lockdep map for submit_wq")
Fixes: a6149f039369 ("drm/sched: Convert drm scheduler to use a work queue rather than kthread")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Nirmoy Das Oct. 24, 2024, 11:44 a.m. UTC | #1
On 10/24/2024 1:59 AM, Matthew Brost wrote:
> drm_gpu_scheduler.submit_wq is used to submit jobs, jobs are in the path
> of dma-fences, and dma-fences are in the path of reclaim. Mark scheduler
> work queue with WQ_MEM_RECLAIM to ensure forward progress during
> reclaim; without WQ_MEM_RECLAIM, work queues cannot make forward
> progress during reclaim.
>
> v2:
>  - Fixes tags (Philipp)
>  - Reword commit message (Philipp)
>
> Cc: Luben Tuikov <ltuikov89@gmail.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Philipp Stanner <pstanner@redhat.com>
> Cc: stable@vger.kernel.org
> Fixes: 34f50cc6441b ("drm/sched: Use drm sched lockdep map for submit_wq")
> Fixes: a6149f039369 ("drm/sched: Convert drm scheduler to use a work queue rather than kthread")
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Acked-by: Nirmoy Das <nirmoy.das@intel.com>

Looks like Xe has a dependency on this now that xe->ordered_wq is allocated with the WQ_MEM_RECLAIM flag:

https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140135v2/bat-lnl-1/igt@xe_exec_fault_mode@twice-invalid-fault.html

> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 540231e6bac6..df0a5abb1400 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1283,10 +1283,11 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>  		sched->own_submit_wq = false;
>  	} else {
>  #ifdef CONFIG_LOCKDEP
> -		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
> +		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name,
> +								       WQ_MEM_RECLAIM,
>  								       &drm_sched_lockdep_map);
>  #else
> -		sched->submit_wq = alloc_ordered_workqueue(name, 0);
> +		sched->submit_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
>  #endif
>  		if (!sched->submit_wq)
>  			return -ENOMEM;
Matthew Brost Oct. 24, 2024, 3:22 p.m. UTC | #2
On Thu, Oct 24, 2024 at 01:44:41PM +0200, Nirmoy Das wrote:
> 
> On 10/24/2024 1:59 AM, Matthew Brost wrote:
> > drm_gpu_scheduler.submit_wq is used to submit jobs, jobs are in the path
> > of dma-fences, and dma-fences are in the path of reclaim. Mark scheduler
> > work queue with WQ_MEM_RECLAIM to ensure forward progress during
> > reclaim; without WQ_MEM_RECLAIM, work queues cannot make forward
> > progress during reclaim.
> >
> > v2:
> >  - Fixes tags (Philipp)
> >  - Reword commit message (Philipp)
> >
> > Cc: Luben Tuikov <ltuikov89@gmail.com>
> > Cc: Danilo Krummrich <dakr@kernel.org>
> > Cc: Philipp Stanner <pstanner@redhat.com>
> > Cc: stable@vger.kernel.org
> > Fixes: 34f50cc6441b ("drm/sched: Use drm sched lockdep map for submit_wq")
> > Fixes: a6149f039369 ("drm/sched: Convert drm scheduler to use a work queue rather than kthread")
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> 
> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
> 
> Looks like Xe has a dependency on this now that xe->ordered_wq is allocated with the WQ_MEM_RECLAIM flag:
> 

Thanks for pointing this out.

I merged the Xe patches first, not realizing this was going to break CI.
Hopefully I can merge this scheduler patch soon.

Matt

> https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-140135v2/bat-lnl-1/igt@xe_exec_fault_mode@twice-invalid-fault.html
> 
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 540231e6bac6..df0a5abb1400 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1283,10 +1283,11 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> >  		sched->own_submit_wq = false;
> >  	} else {
> >  #ifdef CONFIG_LOCKDEP
> > -		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
> > +		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name,
> > +								       WQ_MEM_RECLAIM,
> >  								       &drm_sched_lockdep_map);
> >  #else
> > -		sched->submit_wq = alloc_ordered_workqueue(name, 0);
> > +		sched->submit_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
> >  #endif
> >  		if (!sched->submit_wq)
> >  			return -ENOMEM;
Philipp Stanner Oct. 24, 2024, 3:35 p.m. UTC | #3
On Wed, 2024-10-23 at 16:59 -0700, Matthew Brost wrote:
> drm_gpu_scheduler.submit_wq is used to submit jobs, jobs are in the
> path
> of dma-fences, and dma-fences are in the path of reclaim. Mark
> scheduler
> work queue with WQ_MEM_RECLAIM to ensure forward progress during
> reclaim; without WQ_MEM_RECLAIM, work queues cannot make forward
> progress during reclaim.
> 
> v2:
>  - Fixes tags (Philipp)
>  - Reword commit message (Philipp)
> 
> Cc: Luben Tuikov <ltuikov89@gmail.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Philipp Stanner <pstanner@redhat.com>
> Cc: stable@vger.kernel.org
> Fixes: 34f50cc6441b ("drm/sched: Use drm sched lockdep map for
> submit_wq")
> Fixes: a6149f039369 ("drm/sched: Convert drm scheduler to use a work
> queue rather than kthread")
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 540231e6bac6..df0a5abb1400 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -1283,10 +1283,11 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>  		sched->own_submit_wq = false;
>  	} else {
>  #ifdef CONFIG_LOCKDEP
> -		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
> +		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name,
> +								       WQ_MEM_RECLAIM,
>  								       &drm_sched_lockdep_map);
>  #else
> -		sched->submit_wq = alloc_ordered_workqueue(name, 0);
> +		sched->submit_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
>  #endif
>  		if (!sched->submit_wq)
>  			return -ENOMEM;


Cool, thx – looks good from my POV.

Since you now sent this patch as a single one, what would be the
preferred merge plan for this? Your XE-Series doesn't depend on this
IIUC, so should we take this patch here separately into drm-misc-next?


Regards,
P.
Matthew Brost Oct. 24, 2024, 3:47 p.m. UTC | #4
On Thu, Oct 24, 2024 at 05:35:47PM +0200, Philipp Stanner wrote:
> On Wed, 2024-10-23 at 16:59 -0700, Matthew Brost wrote:
> > drm_gpu_scheduler.submit_wq is used to submit jobs, jobs are in the
> > path
> > of dma-fences, and dma-fences are in the path of reclaim. Mark
> > scheduler
> > work queue with WQ_MEM_RECLAIM to ensure forward progress during
> > reclaim; without WQ_MEM_RECLAIM, work queues cannot make forward
> > progress during reclaim.
> > 
> > v2:
> >  - Fixes tags (Philipp)
> >  - Reword commit message (Philipp)
> > 
> > Cc: Luben Tuikov <ltuikov89@gmail.com>
> > Cc: Danilo Krummrich <dakr@kernel.org>
> > Cc: Philipp Stanner <pstanner@redhat.com>
> > Cc: stable@vger.kernel.org
> > Fixes: 34f50cc6441b ("drm/sched: Use drm sched lockdep map for
> > submit_wq")
> > Fixes: a6149f039369 ("drm/sched: Convert drm scheduler to use a work
> > queue rather than kthread")
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 540231e6bac6..df0a5abb1400 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -1283,10 +1283,11 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> >  		sched->own_submit_wq = false;
> >  	} else {
> >  #ifdef CONFIG_LOCKDEP
> > -		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
> > +		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name,
> > +								       WQ_MEM_RECLAIM,
> >  								       &drm_sched_lockdep_map);
> >  #else
> > -		sched->submit_wq = alloc_ordered_workqueue(name, 0);
> > +		sched->submit_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
> >  #endif
> >  		if (!sched->submit_wq)
> >  			return -ENOMEM;
> 
> 
> Cool, thx – looks good from my POV.
> 

Can I get an RB?

> Since you now sent this patch as a single one, what would be the
> preferred merge plan for this? Your XE-Series doesn't depend on this
> IIUC, so should we take this patch here separately into drm-misc-next?
> 

Merge this one to drm-misc and we will backport into drm-xe-next.

Matt

> 
> Regards,
> P.
>
Philipp Stanner Oct. 25, 2024, 10:06 a.m. UTC | #5
On Thu, 2024-10-24 at 15:47 +0000, Matthew Brost wrote:
> On Thu, Oct 24, 2024 at 05:35:47PM +0200, Philipp Stanner wrote:
> > On Wed, 2024-10-23 at 16:59 -0700, Matthew Brost wrote:
> > > drm_gpu_scheduler.submit_wq is used to submit jobs, jobs are in
> > > the
> > > path
> > > of dma-fences, and dma-fences are in the path of reclaim. Mark
> > > scheduler
> > > work queue with WQ_MEM_RECLAIM to ensure forward progress during
> > > reclaim; without WQ_MEM_RECLAIM, work queues cannot make forward
> > > progress during reclaim.
> > > 
> > > v2:
> > >  - Fixes tags (Philipp)
> > >  - Reword commit message (Philipp)
> > > 
> > > Cc: Luben Tuikov <ltuikov89@gmail.com>
> > > Cc: Danilo Krummrich <dakr@kernel.org>
> > > Cc: Philipp Stanner <pstanner@redhat.com>
> > > Cc: stable@vger.kernel.org
> > > Fixes: 34f50cc6441b ("drm/sched: Use drm sched lockdep map for
> > > submit_wq")
> > > Fixes: a6149f039369 ("drm/sched: Convert drm scheduler to use a
> > > work
> > > queue rather than kthread")
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/scheduler/sched_main.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 540231e6bac6..df0a5abb1400 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -1283,10 +1283,11 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
> > >  		sched->own_submit_wq = false;
> > >  	} else {
> > >  #ifdef CONFIG_LOCKDEP
> > > -		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
> > > +		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name,
> > > +								       WQ_MEM_RECLAIM,
> > >  								       &drm_sched_lockdep_map);
> > >  #else
> > > -		sched->submit_wq = alloc_ordered_workqueue(name, 0);
> > > +		sched->submit_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
> > >  #endif
> > >  		if (!sched->submit_wq)
> > >  			return -ENOMEM;
> > 
> > 
> > Cool, thx – looks good from my POV.
> > 
> 
> Can I get an RB?

Oh, sure:

Reviewed-by: Philipp Stanner <pstanner@redhat.com>

> 
> > Since you now sent this patch as a single one, what would be the
> > preferred merge plan for this? Your XE-Series doesn't depend on
> > this
> > IIUC, so should we take this patch here separately into drm-misc-
> > next?
> > 
> 
> Merge this one to drm-misc and we will backport into drm-xe-next.

OK – feel free to apply it yourself if you want; then we wouldn't need
to synchronize.

Philipp

> 
> Matt
> 
> > 
> > Regards,
> > P.
> > 
>

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 540231e6bac6..df0a5abb1400 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1283,10 +1283,11 @@  int drm_sched_init(struct drm_gpu_scheduler *sched,
 		sched->own_submit_wq = false;
 	} else {
 #ifdef CONFIG_LOCKDEP
-		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name, 0,
+		sched->submit_wq = alloc_ordered_workqueue_lockdep_map(name,
+								       WQ_MEM_RECLAIM,
 								       &drm_sched_lockdep_map);
 #else
-		sched->submit_wq = alloc_ordered_workqueue(name, 0);
+		sched->submit_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
 #endif
 		if (!sched->submit_wq)
 			return -ENOMEM;
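
For readers less familiar with the flag, here is a small userspace model of
the two properties this thread relies on. It is a sketch only, not kernel
code: every name starting with model_ or MODEL_ is hypothetical, the flag
value is illustrative, and the real logic lives in kernel/workqueue.c.
The model assumes (1) a WQ_MEM_RECLAIM queue reserves a rescuer worker at
allocation time, so it keeps one execution context even when no new worker
can be created, and (2) a reclaim-safe queue should not wait on work that
sits on a queue without that guarantee, which is presumably the kind of
dependency Nirmoy's report points at.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative value only; the real flag is defined in <linux/workqueue.h>. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

/* Hypothetical stand-in for a workqueue; not the kernel's struct. */
struct model_wq {
	unsigned int flags;
	bool has_rescuer;	/* worker reserved up front, at allocation time */
};

/* Model of allocation: only a reclaim-safe queue reserves a rescuer. */
static struct model_wq model_alloc_wq(unsigned int flags)
{
	struct model_wq wq = { .flags = flags, .has_rescuer = false };

	wq.has_rescuer = (flags & MODEL_WQ_MEM_RECLAIM) != 0;
	return wq;
}

/*
 * Can a queued item be guaranteed to run? can_spawn_worker is false when
 * memory pressure prevents creating a new worker. In that case only a
 * queue with a rescuer is guaranteed to make forward progress.
 */
static bool model_can_run_work(const struct model_wq *wq, bool can_spawn_worker)
{
	assert(wq != NULL);
	return can_spawn_worker || wq->has_rescuer;
}

/*
 * Is it safe for work running on 'flusher' to wait for work on 'target'?
 * A reclaim-safe flusher inherits the weaker guarantee of its target, so
 * waiting on a queue without the flag defeats the purpose of the flag.
 */
static bool model_flush_is_safe(const struct model_wq *flusher,
				const struct model_wq *target)
{
	assert(flusher != NULL && target != NULL);
	if (!(flusher->flags & MODEL_WQ_MEM_RECLAIM))
		return true;
	return (target->flags & MODEL_WQ_MEM_RECLAIM) != 0;
}
```

In terms of this model, the patch moves submit_wq from model_alloc_wq(0) to
model_alloc_wq(MODEL_WQ_MEM_RECLAIM), so a reclaim-safe caller such as
xe->ordered_wq can wait on it without losing its forward-progress guarantee.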