Message ID | 20210125140136.10494-2-chris@chris-wilson.co.uk |
---|---|
State | New, archived |
Series | [01/41] drm/i915/selftests: Check for engine-reset errors in the middle of workarounds |
On 25/01/2021 14:00, Chris Wilson wrote:
> In defer_request() we start with the request we just unsubmitted (that
> should be the active request on the gpu) and then defer all of its
> waiters. No waiter should be ahead of the active request, so none should
> be marked as active. That assert failed.
>
> Of particular note this machine was undergoing persistent GPU result due

s/result/reset/

> to underlying HW issues, so that may be a clue. A request is also marked
> as active when it is retired, regardless of current queue status, and so
> this assertion failure may be a result of the queue being completed by
> the reset and then subsequently processed by the tasklet.
>
> We can filter out retired requests here by doing the assertion check
> after the is-ready check (active is a subset of being ready).
>
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2978
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 24731be6e462..56e36d938851 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1061,7 +1061,6 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl)
>  				   __i915_request_has_started(w) &&
>  				   !__i915_request_is_complete(rq));
>  
> -			GEM_BUG_ON(i915_request_is_active(w));
>  			if (!i915_request_is_ready(w))
>  				continue;
>  
> @@ -1069,6 +1068,7 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl)
>  				continue;
>  
>  			GEM_BUG_ON(rq_prio(w) > rq_prio(rq));
> +			GEM_BUG_ON(i915_request_is_active(w));
>  			list_move_tail(&w->sched.link, &list);
>  		}
>  
>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 24731be6e462..56e36d938851 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1061,7 +1061,6 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl)
 				   __i915_request_has_started(w) &&
 				   !__i915_request_is_complete(rq));
 
-			GEM_BUG_ON(i915_request_is_active(w));
 			if (!i915_request_is_ready(w))
 				continue;
 
@@ -1069,6 +1068,7 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl)
 				continue;
 
 			GEM_BUG_ON(rq_prio(w) > rq_prio(rq));
+			GEM_BUG_ON(i915_request_is_active(w));
 			list_move_tail(&w->sched.link, &list);
 		}
In defer_request() we start with the request we just unsubmitted (that should be the active request on the GPU) and then defer all of its waiters. No waiter should be ahead of the active request, so none should be marked as active. That assert failed.

Of particular note, this machine was undergoing persistent GPU reset due to underlying HW issues, so that may be a clue. A request is also marked as active when it is retired, regardless of current queue status, and so this assertion failure may be a result of the queue being completed by the reset and then subsequently processed by the tasklet.

We can filter out retired requests here by doing the assertion check after the is-ready check (active is a subset of being ready).

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2978
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
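The effect of moving the assertion can be illustrated with a minimal, self-contained C sketch. This is not i915 code. `struct model_request` and `model_defer_waiters()` are hypothetical names, `on_sched_list` stands in for i915_request_is_ready(), and `active` stands in for i915_request_is_active(). The sketch assumes, as the commit message implies, that a retired request is marked active but is no longer ready, so skipping not-ready waiters before asserting keeps retired waiters from tripping the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of a waiter's scheduler state.
 * Not the real struct i915_request; field names are illustrative only.
 */
struct model_request {
	bool on_sched_list; /* stands in for i915_request_is_ready() */
	bool active;        /* stands in for i915_request_is_active() */
};

/*
 * Count the waiters that would be deferred. Following the patched
 * ordering, the "not active" assertion only runs for ready waiters.
 * A retired waiter (active, but no longer on a scheduler list) is
 * skipped first and never reaches the assertion.
 */
static size_t model_defer_waiters(const struct model_request *w, size_t n)
{
	size_t deferred = 0;

	for (size_t i = 0; i < n; i++) {
		if (!w[i].on_sched_list)
			continue; /* not ready, including retired requests */

		/* A ready waiter must not already be active. */
		assert(!w[i].active);
		deferred++;
	}
	return deferred;
}
```

Calling it with one ready waiter, one retired waiter (`active` set, `on_sched_list` clear), and one idle waiter returns 1 without firing the assertion. If the assertion were placed before the `on_sched_list` check, as in the pre-patch ordering, the retired entry would abort the run.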