[v4,2/3] drm/i915/guc: Close deregister-context race against CT-loss

Message ID 20230926190518.105393-3-alan.previn.teres.alexis@intel.com (mailing list archive)
State New, archived
Series Resolve suspend-resume racing with GuC destroy-context-worker

Commit Message

Alan Previn Sept. 26, 2023, 7:05 p.m. UTC
If we are at the end of suspend or very early in resume,
it's possible an async fence signal (via call_rcu) is triggered
to free_engines, which could lead us to the execution of
the context destruction worker (after a prior worker flush).
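
For context, the async free path looks roughly like this (a simplified
sketch based on i915's i915_gem_context.c; exact code may differ across
kernel versions):

   /* The last fence signal frees the engines via call_rcu, so it can
    * run arbitrarily late relative to the suspend sequence:
    */
   static void free_engines_rcu(struct rcu_head *rcu)
   {
           struct i915_gem_engines *engines =
                   container_of(rcu, struct i915_gem_engines, rcu);

           i915_sw_fence_fini(&engines->fence);
           free_engines(engines); /* can queue the destroyed-worker */
   }

   /* ...scheduled from the engines' fence-notify callback: */
   call_rcu(&engines->rcu, free_engines_rcu);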

Thus, when suspending, insert an rcu_barrier at the start
of wait_for_suspend so that all such cases have completed
and the context destruction list isn't missing anything.

In destroyed_worker_func, close the race against CT-loss
by checking that CT is enabled before calling into
deregister_destroyed_contexts.
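
A minimal sketch of that check as it appears in the patch below:

   /* In destroyed_worker_func(), before touching the destroy list: */
   if (!intel_guc_is_ready(guc))
           return; /* contexts stay queued; purged at GuC sanitization */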

Based on testing, guc_lrc_desc_unpin may still race and fail
as we traverse the GuC's context-destroy list because the
CT could be disabled right before calling GuC's CT send function.

We've witnessed this race condition once every ~6000-8000
suspend-resume cycles while ensuring a workload that renders
something onscreen is continuously started just before
we suspend (and the workload is small enough to complete
and trigger the queued engine/context free-up either very
late in suspend or very early in resume).

In such a case, we need to unroll the entire process because
guc_lrc_desc_unpin takes a gt wakeref which only gets released in
the G2H IRQ reply that never comes through in this corner
case. Without the unroll, the taken wakeref is leaked and will
cascade into a kernel hang later at the tail end of suspend in
this function:

   intel_wakeref_wait_for_idle(&gt->wakeref)
   (called by) - intel_gt_pm_wait_for_idle
   (called by) - wait_for_suspend
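
For reference, intel_gt_pm_wait_for_idle is a thin wrapper (paraphrased
from intel_gt_pm.h) that blocks until the gt wakeref count drops to
zero, so a single leaked __intel_gt_pm_get() with no matching put
stalls suspend indefinitely:

   static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
   {
           return intel_wakeref_wait_for_idle(&gt->wakeref);
   }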

Thus, do an unroll in guc_lrc_desc_unpin and
deregister_destroyed_contexts if guc_lrc_desc_unpin fails
due to a CT send failure.
When unrolling, keep the context in the GuC's destroy-list so
it can get picked up on the next destroy worker invocation
(if suspend aborted) or get fully purged as part of a GuC
sanitization (end of suspend) or a reset flow.
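
Condensed pseudocode of the intended recovery flow (the helper names
here are illustrative, not the actual i915 functions; see the patch
below for the real code):

   /* deregister_destroyed_contexts(), simplified: */
   while ((ce = take_first_destroyed_context(guc))) {
           if (guc_lrc_desc_unpin(ce)) {
                   /* H2G failed: CT is gone; requeue and retry later */
                   requeue_on_destroyed_list(guc, ce);
                   break;
           }
   }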

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Tested-by: Mousumi Jana <mousumi.jana@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |  7 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 81 ++++++++++++++++---
 2 files changed, 77 insertions(+), 11 deletions(-)

Comments

Gupta, Anshuman Oct. 4, 2023, 6:34 a.m. UTC | #1
> -----Original Message-----
> From: Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com>
> Sent: Wednesday, September 27, 2023 12:35 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com>; dri-
> devel@lists.freedesktop.org; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Ceraolo
> Spurio, Daniele <daniele.ceraolospurio@intel.com>; Harrison, John C
> <john.c.harrison@intel.com>; Anshuman Gupta <anshuman.gupta@intel.com>;
> Alan Previn <alan.previn.teres.alexis@intel.com>; Gupta,
> Anshuman <anshuman.gupta@intel.com>; Jana, Mousumi
> <mousumi.jana@intel.com>
> Subject: [PATCH v4 2/3] drm/i915/guc: Close deregister-context race against CT-
> loss
> 
> [snip commit message and diffstat]
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index 5a942af0a14e..59b5658a17fb 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -289,6 +289,13 @@ int intel_gt_resume(struct intel_gt *gt)
> 
>  static void wait_for_suspend(struct intel_gt *gt)
>  {
> +	/*
> +	 * On rare occasions, we've observed the fence completion trigger
> +	 * free_engines asynchronously via call_rcu. Ensure those are done.
> +	 * This path is only called on suspend, so it's an acceptable cost.
> +	 */
> +	rcu_barrier();
Let's add the barrier after the end of suspend-prepare and at the start of late suspend,
to make sure we don't have any async destroys from any user request or internal KMD request during i915 suspend.
Br,
Anshuman Gupta.
> +
>  	if (!intel_gt_pm_is_awake(gt))
>  		return;
> 
> [snip remainder of patch]
Alan Previn Oct. 4, 2023, 4:46 p.m. UTC | #2
On Wed, 2023-10-04 at 06:34 +0000, Gupta, Anshuman wrote:
> 
> > -----Original Message-----
> > From: Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com>

alan:snip
> > @@ -289,6 +289,13 @@ int intel_gt_resume(struct intel_gt *gt)
> > 
> >  static void wait_for_suspend(struct intel_gt *gt)  {
> > +	/*
> > +	 * On rare occasions, we've observed the fence completion trigger
> > +	 * free_engines asynchronously via rcu_call. Ensure those are done.
> > +	 * This path is only called on suspend, so it's an acceptable cost.
> > +	 */
> > +	rcu_barrier();
> Let's add the barrier after the end of prepare suspend and at start of late suspend.
> To make sure we don't have any async destroy from any user request or any internal  kmd request during i915 suspend?
> Br,
> Anshuman Gupta.
alan: some thoughts: actually, wait_for_suspend is called from both intel_gt_suspend_prepare
and intel_gt_suspend_late, so putting the barrier in the above location would get hit for both steps.
However, because wait_for_suspend may optionally wedge the system if outstanding requests were stuck
for more than I915_GT_SUSPEND_IDLE_TIMEOUT, wouldn't it be better to add the barrier before that
check (i.e. in the above location) as opposed to after the return from wait_for_suspend at the end
of suspend_prepare? The latter would defeat the purpose of even checking intel_gt_wait_for_idle
from within wait_for_suspend (which is existing code). Perhaps we need to get one last test on this?
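
For reference, the two call sites look roughly like this (paraphrased
from intel_gt_pm.c; exact bodies may differ):

   void intel_gt_suspend_prepare(struct intel_gt *gt)
   {
           user_forcewake(gt, true);
           wait_for_suspend(gt);   /* barrier runs here... */
   }

   void intel_gt_suspend_late(struct intel_gt *gt)
   {
           /* ... */
           wait_for_suspend(gt);   /* ...and again here */
           /* ... */
   }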

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index 5a942af0a14e..59b5658a17fb 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -289,6 +289,13 @@  int intel_gt_resume(struct intel_gt *gt)
 
 static void wait_for_suspend(struct intel_gt *gt)
 {
+	/*
+	 * On rare occasions, we've observed the fence completion trigger
+	 * free_engines asynchronously via call_rcu. Ensure those are done.
+	 * This path is only called on suspend, so it's an acceptable cost.
+	 */
+	rcu_barrier();
+
 	if (!intel_gt_pm_is_awake(gt))
 		return;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index fdd7179f502a..465baf7660d7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -235,6 +235,13 @@  set_context_destroyed(struct intel_context *ce)
 	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline void
+clr_context_destroyed(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state &= ~SCHED_STATE_DESTROYED;
+}
+
 static inline bool context_pending_disable(struct intel_context *ce)
 {
 	return ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE;
@@ -612,6 +619,8 @@  static int guc_submission_send_busy_loop(struct intel_guc *guc,
 					 u32 g2h_len_dw,
 					 bool loop)
 {
+	int ret;
+
 	/*
 	 * We always loop when a send requires a reply (i.e. g2h_len_dw > 0),
 	 * so we don't handle the case where we don't get a reply because we
@@ -622,7 +631,11 @@  static int guc_submission_send_busy_loop(struct intel_guc *guc,
 	if (g2h_len_dw)
 		atomic_inc(&guc->outstanding_submission_g2h);
 
-	return intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+	ret = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+	if (ret)
+		atomic_dec(&guc->outstanding_submission_g2h);
+
+	return ret;
 }
 
 int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
@@ -3205,12 +3218,13 @@  static void guc_context_close(struct intel_context *ce)
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 }
 
-static inline void guc_lrc_desc_unpin(struct intel_context *ce)
+static inline int guc_lrc_desc_unpin(struct intel_context *ce)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 	struct intel_gt *gt = guc_to_gt(guc);
 	unsigned long flags;
 	bool disabled;
+	int ret;
 
 	GEM_BUG_ON(!intel_gt_pm_is_awake(gt));
 	GEM_BUG_ON(!ctx_id_mapped(guc, ce->guc_id.id));
@@ -3220,19 +3234,38 @@  static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 	/* Seal race with Reset */
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	disabled = submission_disabled(guc);
-	if (likely(!disabled)) {
-		__intel_gt_pm_get(gt);
-		set_context_destroyed(ce);
-		clr_context_registered(ce);
-	}
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	if (unlikely(disabled)) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
-		return;
+		return 0;
 	}
 
-	deregister_context(ce, ce->guc_id.id);
+	/* GuC is active, let's destroy this context,
+	 * but at this point we can still be racing with
+	 * suspend, so we undo everything if the H2G fails
+	 */
+
+	/* Change context state to destroyed and get gt-pm */
+	__intel_gt_pm_get(gt);
+	set_context_destroyed(ce);
+	clr_context_registered(ce);
+
+	ret = deregister_context(ce, ce->guc_id.id);
+	if (ret) {
+		/* Undo the state change and put gt-pm if that failed */
+		set_context_registered(ce);
+		clr_context_destroyed(ce);
+		/*
+		 * Don't use might_sleep / ASYNC version of put because
+		 * CT loss in deregister_context could mean an ongoing
+		 * reset or suspend flow. Immediately put before the unlock
+		 */
+		__intel_wakeref_put(&gt->wakeref, 0);
+	}
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	return ret;
 }
 
 static void __guc_context_destroy(struct intel_context *ce)
@@ -3300,7 +3333,22 @@  static void deregister_destroyed_contexts(struct intel_guc *guc)
 		if (!ce)
 			break;
 
-		guc_lrc_desc_unpin(ce);
+		if (guc_lrc_desc_unpin(ce)) {
+			/*
+			 * This means the GuC's CT link severed mid-way, which could happen
+			 * in suspend-resume corner cases. In this case, put the
+			 * context back into the destroyed_contexts list which will
+			 * get picked up on the next context deregistration event or
+			 * purged in a GuC sanitization event (reset/unload/wedged/...).
+			 */
+			spin_lock_irqsave(&guc->submission_state.lock, flags);
+			list_add_tail(&ce->destroyed_link,
+				      &guc->submission_state.destroyed_contexts);
+			spin_unlock_irqrestore(&guc->submission_state.lock, flags);
+			/* Bail now since the list might never be emptied if h2gs fail */
+			break;
+		}
+
 	}
 }
 
@@ -3311,6 +3359,17 @@  static void destroyed_worker_func(struct work_struct *w)
 	struct intel_gt *gt = guc_to_gt(guc);
 	int tmp;
 
+	/*
+	 * In rare cases we can get here via async context-free fence-signals that
+	 * come very late in suspend flow or very early in resume flows. In these
+	 * cases, GuC won't be ready, but just skipping it here is fine as these
+	 * pending-destroy-contexts get destroyed totally at GuC reset time at
+	 * the end of suspend, OR this worker can be picked up later on the next
+	 * context destruction trigger after resume completes.
+	 */
+	if (!intel_guc_is_ready(guc))
+		return;
+
 	with_intel_gt_pm(gt, tmp)
 		deregister_destroyed_contexts(guc);
 }