Message ID | 20190909225536.7037-2-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] drm/i915/gt: Allow clobbering gpu resets if the display is off | expand |
On 9/9/19 3:55 PM, Chris Wilson wrote: > Unwedging the GPU requires a successful GPU reset before we restore the > default submission, or else we may see residual context switch events > that we were not expecting. > > Reported-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > --- > drivers/gpu/drm/i915/gt/intel_reset.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c > index fe57296b790c..5242496a893a 100644 > --- a/drivers/gpu/drm/i915/gt/intel_reset.c > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c > @@ -809,6 +809,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt) > struct intel_gt_timelines *timelines = >->timelines; > struct intel_timeline *tl; > unsigned long flags; > + bool ok; > > if (!test_bit(I915_WEDGED, >->reset.flags)) > return true; > @@ -854,7 +855,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt) > } > spin_unlock_irqrestore(&timelines->lock, flags); > > - intel_gt_sanitize(gt, false); > + ok = false; > + if (!reset_clobbers_display(gt->i915)) > + ok = __intel_gt_reset(gt, ALL_ENGINES) == 0; Of the thing we had in the gt_sanitize, we're ok skipping the uc_sanitize() because we take care of that during wedge (from intel_uc_reset_prepare), but what about the loop of __intel_engine_reset()? Is that safe to skip here? Apart from that, the patch LGTM. Worth noting that with this change a successful reset is required to unwedge even after a suspend/resume cycle (in gem_sanitize), which is a good thing IMO. Daniele > + if (!ok) > + return false; > > /* > * Undo nop_submit_request. We prevent all new i915 requests from >
Quoting Daniele Ceraolo Spurio (2019-09-10 01:59:38) > > > On 9/9/19 3:55 PM, Chris Wilson wrote: > > Unwedging the GPU requires a successful GPU reset before we restore the > > default submission, or else we may see residual context switch events > > that we were not expecting. > > > > Reported-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> > > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > > --- > > drivers/gpu/drm/i915/gt/intel_reset.c | 7 ++++++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c > > index fe57296b790c..5242496a893a 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_reset.c > > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c > > @@ -809,6 +809,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt) > > struct intel_gt_timelines *timelines = >->timelines; > > struct intel_timeline *tl; > > unsigned long flags; > > + bool ok; > > > > if (!test_bit(I915_WEDGED, >->reset.flags)) > > return true; > > @@ -854,7 +855,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt) > > } > > spin_unlock_irqrestore(&timelines->lock, flags); > > > > - intel_gt_sanitize(gt, false); > > + ok = false; > > + if (!reset_clobbers_display(gt->i915)) > > + ok = __intel_gt_reset(gt, ALL_ENGINES) == 0; > > Of the thing we had in the gt_sanitize, we're ok skipping the > uc_sanitize() because we take care of that during wedge (from > intel_uc_reset_prepare), but what about the loop of > __intel_engine_reset()? Is that safe to skip here? I think yes, because we always follow the unwedge with a GT restart. That is either via the full reset or the sanitize+restart on resume. Both call paths will also set the wedged bit if they fail. gem_eio/suspend should be testing the recovery upon resume path, and even gem_eio/*-stress should give responsible coverage of the normal recovery via full reset. > Apart from that, the patch LGTM. Worth noting that with this change a > successful reset is required to unwedge even after a suspend/resume > cycle (in gem_sanitize), which is a good thing IMO. Hence why relaxing the gpu_clobbers_display is important to retain the ability to clear wedged across suspend on older devices. -Chris
Hi Chris, On Tuesday, September 10, 2019 12:55:36 AM CEST Chris Wilson wrote: > Unwedging the GPU requires a successful GPU reset before we restore the > default submission, or else we may see residual context switch events > that we were not expecting. > > Reported-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> > --- > drivers/gpu/drm/i915/gt/intel_reset.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/ gt/intel_reset.c > index fe57296b790c..5242496a893a 100644 > --- a/drivers/gpu/drm/i915/gt/intel_reset.c > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c > @@ -809,6 +809,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt) > struct intel_gt_timelines *timelines = >->timelines; > struct intel_timeline *tl; > unsigned long flags; > + bool ok; > > if (!test_bit(I915_WEDGED, >->reset.flags)) > return true; > @@ -854,7 +855,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt) > } > spin_unlock_irqrestore(&timelines->lock, flags); > > - intel_gt_sanitize(gt, false); > + ok = false; > + if (!reset_clobbers_display(gt->i915)) > + ok = __intel_gt_reset(gt, ALL_ENGINES) == 0; > + if (!ok) > + return false; Before your change, that code was executed inside intel_gt_sanitize(gt, false) which unfortunately didn't return any result. The same outcome could be achieved by redefining intel_gt_sanitize() to return that result and saying: if (!intel_gt_sanitize(gt, false) return false; Is there any specific reason for intel_gt_sanitize() returning void? Thanks, Janusz > > /* > * Undo nop_submit_request. We prevent all new i915 requests from >
On 9/9/19 11:06 PM, Chris Wilson wrote: > Quoting Daniele Ceraolo Spurio (2019-09-10 01:59:38) >> >> >> On 9/9/19 3:55 PM, Chris Wilson wrote: >>> Unwedging the GPU requires a successful GPU reset before we restore the >>> default submission, or else we may see residual context switch events >>> that we were not expecting. >>> >>> Reported-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> >>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> >>> --- >>> drivers/gpu/drm/i915/gt/intel_reset.c | 7 ++++++- >>> 1 file changed, 6 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c >>> index fe57296b790c..5242496a893a 100644 >>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c >>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c >>> @@ -809,6 +809,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt) >>> struct intel_gt_timelines *timelines = >->timelines; >>> struct intel_timeline *tl; >>> unsigned long flags; >>> + bool ok; >>> >>> if (!test_bit(I915_WEDGED, >->reset.flags)) >>> return true; >>> @@ -854,7 +855,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt) >>> } >>> spin_unlock_irqrestore(&timelines->lock, flags); >>> >>> - intel_gt_sanitize(gt, false); >>> + ok = false; >>> + if (!reset_clobbers_display(gt->i915)) >>> + ok = __intel_gt_reset(gt, ALL_ENGINES) == 0; >> >> Of the thing we had in the gt_sanitize, we're ok skipping the >> uc_sanitize() because we take care of that during wedge (from >> intel_uc_reset_prepare), but what about the loop of >> __intel_engine_reset()? Is that safe to skip here? > > I think yes, because we always follow the unwedge with a GT restart. That > is either via the full reset or the sanitize+restart on resume. Both call > paths will also set the wedged bit if they fail. gem_eio/suspend should > be testing the recovery upon resume path, and even gem_eio/*-stress > should give responsible coverage of the normal recovery via full reset. > >> Apart from that, the patch LGTM. Worth noting that with this change a >> successful reset is required to unwedge even after a suspend/resume >> cycle (in gem_sanitize), which is a good thing IMO. > > Hence why relaxing the gpu_clobbers_display is important to retain the > ability to clear wedged across suspend on older devices. > -Chris > Sold! Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Daniele
Quoting Janusz Krzysztofik (2019-09-10 08:39:51) > Hi Chris, > > On Tuesday, September 10, 2019 12:55:36 AM CEST Chris Wilson wrote: > > @@ -854,7 +855,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt > *gt) > > } > > spin_unlock_irqrestore(&timelines->lock, flags); > > > > - intel_gt_sanitize(gt, false); > > + ok = false; > > + if (!reset_clobbers_display(gt->i915)) > > + ok = __intel_gt_reset(gt, ALL_ENGINES) == 0; > > + if (!ok) > > + return false; > > Before your change, that code was executed inside intel_gt_sanitize(gt, false) > which unfortunately didn't return any result. The same outcome could be > achieved by redefining intel_gt_sanitize() to return that result and saying: > > if (!intel_gt_sanitize(gt, false) > return false; > > Is there any specific reason for intel_gt_sanitize() returning void? The intent is that sanitize scrubs the leftover BIOS state, failure is not an option. The biggest change with respect to intel_gt_sanitize() is the game we play with reset_clobbers_display -- we need the reset, whereas in sanitize, the reset is good to have (but realistically we do not expect there to be any contexts to scrub and so can take the risk). -Chris
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index fe57296b790c..5242496a893a 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -809,6 +809,7 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt) struct intel_gt_timelines *timelines = >->timelines; struct intel_timeline *tl; unsigned long flags; + bool ok; if (!test_bit(I915_WEDGED, >->reset.flags)) return true; @@ -854,7 +855,11 @@ static bool __intel_gt_unset_wedged(struct intel_gt *gt) } spin_unlock_irqrestore(&timelines->lock, flags); - intel_gt_sanitize(gt, false); + ok = false; + if (!reset_clobbers_display(gt->i915)) + ok = __intel_gt_reset(gt, ALL_ENGINES) == 0; + if (!ok) + return false; /* * Undo nop_submit_request. We prevent all new i915 requests from
Unwedging the GPU requires a successful GPU reset before we restore the default submission, or else we may see residual context switch events that we were not expecting. Reported-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> --- drivers/gpu/drm/i915/gt/intel_reset.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)