
[v5,7/7] drm/sched: Add helper to queue TDR immediately for current and future jobs

Message ID 20231011235826.585624-8-matthew.brost@intel.com (mailing list archive)
State New, archived
Series DRM scheduler changes for Xe

Commit Message

Matthew Brost Oct. 11, 2023, 11:58 p.m. UTC
Add helper to queue TDR immediately for current and future jobs. This is
used in Xe, a new Intel GPU driver, to trigger a TDR to clean up a
drm_scheduler that encounters errors.

v2:
 - Drop timeout args, rename function, use mod delayed work (Luben)
v3:
 - s/XE/Xe (Luben)
 - present tense in commit message (Luben)
 - Adjust comment for drm_sched_tdr_queue_imm (Luben)

Cc: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 18 +++++++++++++++++-
 include/drm/gpu_scheduler.h            |  1 +
 2 files changed, 18 insertions(+), 1 deletion(-)
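
For illustration only (nothing below is from this series; the driver structure and
field names are hypothetical), a driver that detects an unrecoverable error on an
engine could use the new helper to fire its timeout handler right away rather than
waiting for the configured job timeout to elapse:

	/* Hypothetical error-interrupt path; only drm_sched_tdr_queue_imm() is from this patch. */
	static void my_engine_error_irq(struct my_engine *engine)
	{
		/* Record why the TDR is running so the driver's .timedout_job() callback can tell. */
		engine->banned = true;

		/*
		 * Queue the TDR immediately: the currently pending job (and any
		 * job submitted afterwards) goes through the scheduler's timeout
		 * path without waiting for the normal timeout period.
		 */
		drm_sched_tdr_queue_imm(&engine->sched);
	}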

Comments

Luben Tuikov Oct. 14, 2023, 3:04 a.m. UTC | #1
On 2023-10-11 19:58, Matthew Brost wrote:
> Add helper to queue TDR immediately for current and future jobs. This is
> used in Xe, a new Intel GPU driver, to trigger a TDR to clean up a
> drm_scheduler that encounters errors.

I think the best (most optimal) thing to do is to remove the last sentence
mentioning Xe. It is irrelevant to this patch. This patch is functional
as is, and worth having as is.

So it's best to have just:

	Add a helper whereby a driver can invoke TDR immediately.

Also remove "for current and future jobs" from the title, as it is
implied by how TDR works. We want to say less.

	drm/sched: Add a helper to queue TDR immediately

These are only GPU scheduler changes, worth having on their own. The fact
that a new (future as of this moment) driver (Xe) would use them is irrelevant
at the moment. Other drivers (new, current?) would most likely end up using the changes
in these patches, and these changes go in on their own merit.

Regards,
Luben

> 
> v2:
>  - Drop timeout args, rename function, use mod delayed work (Luben)
> v3:
>  - s/XE/Xe (Luben)
>  - present tense in commit message (Luben)
>  - Adjust comment for drm_sched_tdr_queue_imm (Luben)
> 
> Cc: Luben Tuikov <luben.tuikov@amd.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 18 +++++++++++++++++-
>  include/drm/gpu_scheduler.h            |  1 +
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index c4d5c3d265a8..f2846745b067 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -431,7 +431,7 @@ static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
>  
>  	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>  	    !list_empty(&sched->pending_list))
> -		queue_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
> +		mod_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
>  }
>  
>  static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
> @@ -441,6 +441,22 @@ static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
>  	spin_unlock(&sched->job_list_lock);
>  }
>  
> +/**
> + * drm_sched_tdr_queue_imm: - immediately start job timeout handler
> + *
> + * @sched: scheduler for which the timeout handling should be started.
> + *
> + * Start timeout handling immediately for the named scheduler.
> + */
> +void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched)
> +{
> +	spin_lock(&sched->job_list_lock);
> +	sched->timeout = 0;
> +	drm_sched_start_timeout(sched);
> +	spin_unlock(&sched->job_list_lock);
> +}
> +EXPORT_SYMBOL(drm_sched_tdr_queue_imm);
> +
>  /**
>   * drm_sched_fault - immediately start timeout handler
>   *
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 625ffe040de3..998b32b8d212 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -568,6 +568,7 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
>  				    struct drm_gpu_scheduler **sched_list,
>                                     unsigned int num_sched_list);
>  
> +void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
>  void drm_sched_job_cleanup(struct drm_sched_job *job);
>  void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched);
>  bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
Matthew Brost Oct. 16, 2023, 3:03 p.m. UTC | #2
On Fri, Oct 13, 2023 at 11:04:47PM -0400, Luben Tuikov wrote:
> On 2023-10-11 19:58, Matthew Brost wrote:
> > Add helper to queue TDR immediately for current and future jobs. This is
> > used in Xe, a new Intel GPU driver, to trigger a TDR to clean up a
> > drm_scheduler that encounters errors.
> 
> I think the best (most optimal) thing to do is to remove the last sentence
> mentioning Xe. It is irrelevant to this patch. This patch is functional
> as is, and worth having as is.
> 
> So it's best to have just:
> 
> 	Add a helper whereby a driver can invoke TDR immediately.
> 

+1.

> Also remove "for current and future jobs" from the title, as it is
> implied by how TDR works. We want to say less.
> 
> 	drm/sched: Add a helper to queue TDR immediately
>

Yep, my bad, I forgot to adjust the commit message in this rev. Will fix.

Matt

> These are only GPU scheduler changes, worth having on their own. The fact
> that a new (future as of this moment) driver (Xe) would use them is irrelevant
> at the moment. Other drivers (new, current?) would most likely end up using the changes
> in these patches, and these changes go in on their own merit.
> 
> Regards,
> Luben
> 
> > 
> > v2:
> >  - Drop timeout args, rename function, use mod delayed work (Luben)
> > v3:
> >  - s/XE/Xe (Luben)
> >  - present tense in commit message (Luben)
> >  - Adjust comment for drm_sched_tdr_queue_imm (Luben)
> > 
> > Cc: Luben Tuikov <luben.tuikov@amd.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >  drivers/gpu/drm/scheduler/sched_main.c | 18 +++++++++++++++++-
> >  include/drm/gpu_scheduler.h            |  1 +
> >  2 files changed, 18 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index c4d5c3d265a8..f2846745b067 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -431,7 +431,7 @@ static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
> >  
> >  	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
> >  	    !list_empty(&sched->pending_list))
> > -		queue_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
> > +		mod_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
> >  }
> >  
> >  static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
> > @@ -441,6 +441,22 @@ static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
> >  	spin_unlock(&sched->job_list_lock);
> >  }
> >  
> > +/**
> > + * drm_sched_tdr_queue_imm: - immediately start job timeout handler
> > + *
> > + * @sched: scheduler for which the timeout handling should be started.
> > + *
> > + * Start timeout handling immediately for the named scheduler.
> > + */
> > +void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched)
> > +{
> > +	spin_lock(&sched->job_list_lock);
> > +	sched->timeout = 0;
> > +	drm_sched_start_timeout(sched);
> > +	spin_unlock(&sched->job_list_lock);
> > +}
> > +EXPORT_SYMBOL(drm_sched_tdr_queue_imm);
> > +
> >  /**
> >   * drm_sched_fault - immediately start timeout handler
> >   *
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 625ffe040de3..998b32b8d212 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -568,6 +568,7 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
> >  				    struct drm_gpu_scheduler **sched_list,
> >                                     unsigned int num_sched_list);
> >  
> > +void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
> >  void drm_sched_job_cleanup(struct drm_sched_job *job);
> >  void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched);
> >  bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
>

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index c4d5c3d265a8..f2846745b067 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -431,7 +431,7 @@  static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
 
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
 	    !list_empty(&sched->pending_list))
-		queue_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
+		mod_delayed_work(sched->timeout_wq, &sched->work_tdr, sched->timeout);
 }
 
 static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
@@ -441,6 +441,22 @@  static void drm_sched_start_timeout_unlocked(struct drm_gpu_scheduler *sched)
 	spin_unlock(&sched->job_list_lock);
 }
 
+/**
+ * drm_sched_tdr_queue_imm: - immediately start job timeout handler
+ *
+ * @sched: scheduler for which the timeout handling should be started.
+ *
+ * Start timeout handling immediately for the named scheduler.
+ */
+void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched)
+{
+	spin_lock(&sched->job_list_lock);
+	sched->timeout = 0;
+	drm_sched_start_timeout(sched);
+	spin_unlock(&sched->job_list_lock);
+}
+EXPORT_SYMBOL(drm_sched_tdr_queue_imm);
+
 /**
  * drm_sched_fault - immediately start timeout handler
  *
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 625ffe040de3..998b32b8d212 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -568,6 +568,7 @@  void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
 				    struct drm_gpu_scheduler **sched_list,
                                    unsigned int num_sched_list);
 
+void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
 void drm_sched_job_cleanup(struct drm_sched_job *job);
 void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched);
 bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
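
A note on the queue_delayed_work() -> mod_delayed_work() change above:
queue_delayed_work() does nothing (and returns false) when the work item is
already pending, whereas mod_delayed_work() re-arms the pending item's timer
with the new delay. That is what lets drm_sched_tdr_queue_imm() set
sched->timeout to 0 and have an already-queued TDR fire immediately. A minimal
illustration of the difference (not part of this patch):

	/* Assume &work was already queued with a one second delay. */
	queue_delayed_work(wq, &work, 0);	/* no effect: work is already pending */
	mod_delayed_work(wq, &work, 0);		/* timer moved up: work runs as soon as possible */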