
[RFC] drm/i915/guc: Fix premature release of context on reset

Message ID 20190724150525.18291-1-janusz.krzysztofik@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Janusz Krzysztofik July 24, 2019, 3:05 p.m. UTC
When using GuC submission, some helper functions originating from
execlists submission are reused.  One of them, called from the
guc_reset() and guc_cancel_requests() callbacks introduced by commit
292ad25c22d9 ("drm/i915/guc: Implement reset locally"), unfortunately
invokes the execlists_schedule_out() helper instead of its GuC-specific
equivalent.  As execlists functions maintain context references for
themselves, that helper releases the context associated with the
request being processed as soon as the context's inflight queue becomes
empty.  Since GuC submission doesn't keep extra context references,
contexts that may still be active can be released prematurely,
resulting in a kernel panic.

Fix it by providing a local, modified copy of the
execlists_cancel_port_requests() helper.

Fixes: 292ad25c22d9 ("drm/i915/guc: Implement reset locally")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
Based on drm-intel-next-queued as of 2019-07-24, can be rebased easily
on top of drm-intel-next-2019-07-08 if needed (source file moved).

 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Chris Wilson July 24, 2019, 3:11 p.m. UTC | #1
Quoting Janusz Krzysztofik (2019-07-24 16:05:25)
> When using GuC submission, some execlists originated helper functions
> are reused.  One of them, used inside guc_reset() and
> guc_cancel_requests() callbacks introduced by commit 292ad25c22d9
> ("drm/i915/guc: Implement reset locally"), unfortunately calls
> execlists_schedule_out() helper instead of its GuC specific equivalent.
> As execlists functions maintain context references for themselves, that
> helper releases a context associated with a request being processed as
> soon as the context inflight queue becomes empty.  Since GuC submission
> doesn't keep extra context references, possibly still active contexts
> may be released prematurely, resulting in kernel panic.
> 
> Fix it by providing a local, modified copy of
> execlists_cancel_port_requests() helper.

Then remove the export and stub.
-Chris
Chris Wilson July 24, 2019, 3:24 p.m. UTC | #2
Quoting Janusz Krzysztofik (2019-07-24 16:05:25)
> When using GuC submission, some execlists originated helper functions
> are reused.  One of them, used inside guc_reset() and
> guc_cancel_requests() callbacks introduced by commit 292ad25c22d9
> ("drm/i915/guc: Implement reset locally"), unfortunately calls
> execlists_schedule_out() helper instead of its GuC specific equivalent.
> As execlists functions maintain context references for themselves, that
> helper releases a context associated with a request being processed as
> soon as the context inflight queue becomes empty.  Since GuC submission
> doesn't keep extra context references, possibly still active contexts
> may be released prematurely, resulting in kernel panic.

Fwiw, that rq->context->inflight = NULL in schedule_out() is not
protected against a dangling dereference. Which, if you are not falling
into the execlists trap, you can remove entirely to avoid the potential
use-after-free.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a0f2a01365bc..b1ea7a818d61 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -651,6 +651,21 @@  static void guc_reset_prepare(struct intel_engine_cs *engine)
 	__tasklet_disable_sync_once(&execlists->tasklet);
 }
 
+static void
+cancel_port_requests(struct intel_engine_execlists * const execlists)
+{
+	struct i915_request * const *port, *rq;
+
+	for (port = execlists->pending; (rq = *port); port++)
+		schedule_out(rq);
+	memset(execlists->pending, 0, sizeof(execlists->pending));
+
+	for (port = execlists->active; (rq = *port); port++)
+		schedule_out(rq);
+	execlists->active =
+		memset(execlists->inflight, 0, sizeof(execlists->inflight));
+}
+
 static void guc_reset(struct intel_engine_cs *engine, bool stalled)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -659,7 +674,7 @@  static void guc_reset(struct intel_engine_cs *engine, bool stalled)
 
 	spin_lock_irqsave(&engine->active.lock, flags);
 
-	execlists_cancel_port_requests(execlists);
+	cancel_port_requests(execlists);
 
 	/* Push back any incomplete requests for replay after the reset. */
 	rq = execlists_unwind_incomplete_requests(execlists);
@@ -702,7 +717,7 @@  static void guc_cancel_requests(struct intel_engine_cs *engine)
 	spin_lock_irqsave(&engine->active.lock, flags);
 
 	/* Cancel the requests on the HW and clear the ELSP tracker. */
-	execlists_cancel_port_requests(execlists);
+	cancel_port_requests(execlists);
 
 	/* Mark all executing requests as skipped. */
 	list_for_each_entry(rq, &engine->active.requests, sched.link) {