diff mbox series

[3/3] drm/i915/execlists: Reset CSB pointers on canceling requests (wedging)

Message ID 20180914080017.30308-3-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series [1/3] drm/i915: Limit the backpressure for i915_request allocation | expand

Commit Message

Chris Wilson Sept. 14, 2018, 8 a.m. UTC
The prior assumption was that we did not need to reset the CSB on
wedging when cancelling the outstanding requests as it would be cleaned
up in the subsequent reset prior to restarting the GPU. However, what
was not accounted for was that in performing the reset, we would try to
process the outstanding CSB entries. If the GPU happened to complete a
CS event just as we were performing the cancellation of requests, that
event would be kept in the CSB until the reset -- but our bookkeeping
was cleared, causing confusion when trying to complete the CS event.

v2: Use a sanitize on unwedge to avoid interfering with eio suspend
(where we intentionally disable GPU reset).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107925
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Mika Kuoppala Sept. 14, 2018, 2:17 p.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> The prior assumption was that we did not need to reset the CSB on
> wedging when cancelling the outstanding requests as it would be cleaned
> up in the subsequent reset prior to restarting the GPU. However, what
> was not accounted for was that in performing the reset, we would try to

'performing the reset' could be 'preparing engine for reset'

> process the outstanding CSB entries. If the GPU happened to complete a
> CS event just as we were performing the cancellation of requests, that
> event would be kept in the CSB until the reset -- but our bookkeeping
> was cleared, causing confusion when trying to complete the CS event.
>
> v2: Use a sanitize on unwedge to avoid interfering with eio suspend
> (where we intentionally disable GPU reset).
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107925
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

I was glad to notice that there were quality comments
on resetting/clearing the csb/ports.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d6f2bbd6a0dc..c8020719bcfb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3438,6 +3438,9 @@  bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	i915_retire_requests(i915);
 	GEM_BUG_ON(i915->gt.active_requests);
 
+	if (!intel_gpu_reset(i915, ALL_ENGINES))
+		intel_engines_sanitize(i915);
+
 	/*
 	 * Undo nop_submit_request. We prevent all new i915 requests from
 	 * being queued (by disallowing execbuf whilst wedged) so having