[1/6] drm/i915/gt: Sanitize GPU during prepare-to-suspend

Message ID 20210210221955.10025-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Feb. 10, 2021, 10:19 p.m. UTC
After calling intel_gt_suspend_prepare(), the driver starts to turn off
various subsystems, such as clearing the GGTT, before calling
intel_gt_suspend_late() to relinquish control over the GT. However, if
we still have internal GPU state active as we clear the GGTT, the GPU
may write back its internal state to the residual GGTT addresses that
are now pointing into scratch. Let's reset the GPU to clear that
internal state as soon as we have idled the GPU in prepare-to-suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_gt_pm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
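
The ordering problem described in the commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions: every name below (`toy_gt`, `toy_sanitize`, `toy_clear_ggtt`, `toy_run_suspend`, and so on) is hypothetical and none of it is i915 code. In particular, the model assumes that any still-live internal state is written back exactly once when the GGTT is cleared, which is a simplification of whatever the hardware actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not i915 code. */
struct toy_gt {
	bool internal_state_live; /* GPU still holds internal/context state */
	bool ggtt_cleared;        /* GGTT entries now point into scratch */
	int scratch_writes;       /* write-backs that landed in scratch */
};

/* Stand-in for a GPU reset that discards internal state. */
void toy_sanitize(struct toy_gt *gt)
{
	assert(gt != NULL);
	gt->internal_state_live = false;
}

/* Stand-in for prepare-to-suspend; optionally sanitize once idle. */
void toy_suspend_prepare(struct toy_gt *gt, bool sanitize_early)
{
	assert(gt != NULL);
	if (sanitize_early)
		toy_sanitize(gt);
}

/*
 * Stand-in for clearing the GGTT between prepare and late.
 * Assumption: live internal state is written back once at this point.
 */
void toy_clear_ggtt(struct toy_gt *gt)
{
	assert(gt != NULL);
	gt->ggtt_cleared = true;
	if (gt->internal_state_live)
		gt->scratch_writes++;
}

/* Stand-in for the late suspend step, which always sanitizes. */
void toy_suspend_late(struct toy_gt *gt)
{
	toy_sanitize(gt);
}

/* Run prepare -> clear GGTT -> late; return the number of stray writes. */
int toy_run_suspend(bool sanitize_early)
{
	struct toy_gt gt = { .internal_state_live = true };

	toy_suspend_prepare(&gt, sanitize_early);
	toy_clear_ggtt(&gt);
	toy_suspend_late(&gt);
	return gt.scratch_writes;
}
```

In this model, `toy_run_suspend(false)` counts one write into scratch because the state is only discarded in the late step, while `toy_run_suspend(true)` counts none because the state is already gone by the time the GGTT is cleared.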

Comments

Rodrigo Vivi Feb. 11, 2021, 4:25 a.m. UTC | #1
On Wed, Feb 10, 2021 at 10:19:50PM +0000, Chris Wilson wrote:
> After calling intel_gt_suspend_prepare(), the driver starts to turn off
> various subsystems, such as clearing the GGTT, before calling
> intel_gt_suspend_late() to relinquish control over the GT. However, if
> we still have internal GPU state active as we clear the GGTT, the GPU
> may write back its internal state to the residual GGTT addresses that
> are now pointing into scratch. Let's reset the GPU to clear that
> internal state as soon as we have idled the GPU in prepare-to-suspend.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_pm.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index 0bd303d2823e..f41612faa269 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -295,6 +295,9 @@ void intel_gt_suspend_prepare(struct intel_gt *gt)
>  	wait_for_suspend(gt);

you just wedged the gpu here...

>  
>  	intel_uc_suspend(&gt->uc);
> +
> +	/* Flush all the contexts and internal state before turning off GGTT */
> +	gt_sanitize(gt, false);

and now we are unsetting wedge here...

is this right?

>  }
>  
>  static suspend_state_t pm_suspend_target(void)
> @@ -337,7 +340,7 @@ void intel_gt_suspend_late(struct intel_gt *gt)
>  		intel_llc_disable(&gt->llc);
>  	}
>  
> -	gt_sanitize(gt, false);
> +	gt_sanitize(gt, false); /* Be paranoid, remove all residual GPU state */
>  
>  	GT_TRACE(gt, "\n");
>  }
> -- 
> 2.20.1
Chris Wilson Feb. 11, 2021, 8:57 a.m. UTC | #2
Quoting Rodrigo Vivi (2021-02-11 04:25:17)
> On Wed, Feb 10, 2021 at 10:19:50PM +0000, Chris Wilson wrote:
> > After calling intel_gt_suspend_prepare(), the driver starts to turn off
> > various subsystems, such as clearing the GGTT, before calling
> > intel_gt_suspend_late() to relinquish control over the GT. However, if
> > we still have internal GPU state active as we clear the GGTT, the GPU
> > may write back its internal state to the residual GGTT addresses that
> > are now pointing into scratch. Let's reset the GPU to clear that
> > internal state as soon as we have idled the GPU in prepare-to-suspend.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_gt_pm.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > index 0bd303d2823e..f41612faa269 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > @@ -295,6 +295,9 @@ void intel_gt_suspend_prepare(struct intel_gt *gt)
> >       wait_for_suspend(gt);
> 
> you just wedged the gpu here...

Potentially. As a means to clear a stuck GPU and force it to idle.
 
> >       intel_uc_suspend(&gt->uc);
> > +
> > +     /* Flush all the contexts and internal state before turning off GGTT */
> > +     gt_sanitize(gt, false);
> 
> and now we are unsetting wedge here...
> 
> is this right?

But irrelevant, since it is undone on any of the resume pathways which
must be taken by this point.

Resume has been for many years the method to unwedge a GPU; with the
presumption being that the intervening PCI level reset would be enough
to recover the GPU. Otherwise, it would presumably quite quickly go back
into the wedged state.

The wedging on suspend is just there to cancel outstanding work. Which
is not what we want (we just want to remove work), but is what we have
for the moment. The sanitize is to make sure we don't leak our state
beyond our control of the HW.
-Chris
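
The wedge/unwedge exchange above can be summarised as a toy state model. This is a sketch under assumptions taken only from the thread: waiting for suspend may wedge the GPU to cancel outstanding work, the sanitize step clears the wedge, and every resume path clears it as well. All names (`toy_gpu`, `toy_gpu_wait_for_suspend`, `toy_gpu_sanitize`, `toy_gpu_resume`, `toy_gpu_wedged_after_cycle`) are hypothetical and do not correspond to i915 functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the i915 implementation. */
struct toy_gpu {
	int pending_requests; /* work still outstanding at suspend time */
	bool wedged;          /* set when work was forcibly cancelled */
};

/* If the GPU will not idle on its own, wedge it to cancel the work. */
void toy_gpu_wait_for_suspend(struct toy_gpu *gpu)
{
	assert(gpu != NULL);
	if (gpu->pending_requests > 0) {
		gpu->wedged = true;
		gpu->pending_requests = 0;
	}
}

/* Sanitize as discussed in the thread: a reset that also unsets the wedge. */
void toy_gpu_sanitize(struct toy_gpu *gpu)
{
	assert(gpu != NULL);
	gpu->wedged = false;
}

/* Resume is assumed to unwedge unconditionally. */
void toy_gpu_resume(struct toy_gpu *gpu)
{
	assert(gpu != NULL);
	gpu->wedged = false;
}

/* Run one suspend/resume cycle and report the final wedged flag. */
bool toy_gpu_wedged_after_cycle(int pending, bool sanitize_in_prepare)
{
	struct toy_gpu gpu = { .pending_requests = pending, .wedged = false };

	toy_gpu_wait_for_suspend(&gpu);
	if (sanitize_in_prepare)
		toy_gpu_sanitize(&gpu);
	toy_gpu_resume(&gpu);
	return gpu.wedged;
}
```

Under these assumptions the final state after resume is the same whether or not the sanitize in prepare clears the wedge, which is the sense in which the early unwedge is described as irrelevant.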
Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index 0bd303d2823e..f41612faa269 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -295,6 +295,9 @@  void intel_gt_suspend_prepare(struct intel_gt *gt)
 	wait_for_suspend(gt);
 
 	intel_uc_suspend(&gt->uc);
+
+	/* Flush all the contexts and internal state before turning off GGTT */
+	gt_sanitize(gt, false);
 }
 
 static suspend_state_t pm_suspend_target(void)
@@ -337,7 +340,7 @@  void intel_gt_suspend_late(struct intel_gt *gt)
 		intel_llc_disable(&gt->llc);
 	}
 
-	gt_sanitize(gt, false);
+	gt_sanitize(gt, false); /* Be paranoid, remove all residual GPU state */
 
 	GT_TRACE(gt, "\n");
 }