drm/i915/gvt: Fix crash after request->hw_context change

Message ID 20180518101305.8840-1-zhenyuw@linux.intel.com
State New
Headers show

Commit Message

Zhenyu Wang May 18, 2018, 10:13 a.m. UTC
When we do shadowing, the workload's request might not be allocated yet,
so we still need the shadow context's object. And when completing a workload,
delay zeroing the workload's request pointer until after it has been used to
update the guest context.

Fixes: 1fc44d9b1afb ("drm/i915: Store a pointer to intel_context in i915_request")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 drivers/gpu/drm/i915/gvt/scheduler.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
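The shape of the fix can be sketched in miniature. The structs and helpers below are hypothetical, heavily simplified stand-ins for the real i915/gvt types (`intel_context`, `i915_request`, `intel_vgpu_workload`), not the actual driver code: during population the context object comes from the always-pinned shadow context rather than the possibly-NULL request, and on completion the request pointer is only zeroed after the request has been used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the i915/gvt structures. */
struct intel_context { void *state; };
struct i915_request  { struct intel_context *hw_context; };

struct workload {
	struct i915_request  *req;        /* may still be NULL during shadowing */
	struct intel_context *shadow_ctx; /* pinned before shadowing starts */
};

/*
 * Before the fix this would dereference w->req->hw_context->state and
 * crash when the request had not been allocated yet; after the fix the
 * state is taken from the pinned shadow context instead.
 */
static void *shadow_state(struct workload *w)
{
	return w->shadow_ctx->state;
}

/* On completion: use the request first, zero the pointer last. */
static struct i915_request *workload_complete(struct workload *w)
{
	struct i915_request *rq = w->req; /* keep using rq here...          */
	/* ...guest context update would happen at this point, using rq...  */
	w->req = NULL;                    /* ...and only then drop the link. */
	return rq;
}
```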

Comments

Chris Wilson May 18, 2018, 10:22 a.m. UTC | #1
Quoting Zhenyu Wang (2018-05-18 11:13:05)
> When we do shadowing, the workload's request might not be allocated yet,
> so we still need the shadow context's object. And when completing a workload,
> delay zeroing the workload's request pointer until after it has been used to
> update the guest context.

Please allocate the context earlier then. The point is that until you
do, shadow_ctx->__engine[ring_id]->state is *undefined* and this code is
still illegal. :-p

The intent is that you start tracking the lifetime of the state you are
using because the assumptions made here will not hold for much longer.
-Chris
Chris Wilson May 18, 2018, 11:03 a.m. UTC | #2
Quoting Patchwork (2018-05-18 11:55:01)
> == Series Details ==
> 
> Series: drm/i915/gvt: Fix crash after request->hw_context change
> URL   : https://patchwork.freedesktop.org/series/43406/
> State : failure
> 
> == Summary ==
> 
> Applying: drm/i915/gvt: Fix crash after request->hw_context change
> error: sha1 information is lacking or useless (drivers/gpu/drm/i915/gvt/scheduler.c).
> error: could not build fake ancestor
> Patch failed at 0001 drm/i915/gvt: Fix crash after request->hw_context change

Wrong tree used as the baseline, could you resend? Or an alternative to
avoid dereferencing state outside of being pinned?
-Chris
Zhenyu Wang May 18, 2018, 2:55 p.m. UTC | #3
On 2018.05.18 12:03:02 +0100, Chris Wilson wrote:
> Quoting Patchwork (2018-05-18 11:55:01)
> > == Series Details ==
> > 
> > Series: drm/i915/gvt: Fix crash after request->hw_context change
> > URL   : https://patchwork.freedesktop.org/series/43406/
> > State : failure
> > 
> > == Summary ==
> > 
> > Applying: drm/i915/gvt: Fix crash after request->hw_context change
> > error: sha1 information is lacking or useless (drivers/gpu/drm/i915/gvt/scheduler.c).
> > error: could not build fake ancestor
> > Patch failed at 0001 drm/i915/gvt: Fix crash after request->hw_context change
> 
> Wrong tree used as the baseline, could you resend? Or an alternative to
> avoid dereferencing state outside of being pinned?

Sorry, I applied it against the gvt dev tree, which carries changes not yet
merged into dinq. The current code tries to do the shadowing as early as
possible, without requiring the real request to be allocated; I need to take
another look at the proper lifetime to sort this out.
Zhenyu Wang May 21, 2018, 3:44 a.m. UTC | #4
On 2018.05.18 11:22:06 +0100, Chris Wilson wrote:
> Quoting Zhenyu Wang (2018-05-18 11:13:05)
> > When we do shadowing, the workload's request might not be allocated yet,
> > so we still need the shadow context's object. And when completing a workload,
> > delay zeroing the workload's request pointer until after it has been used to
> > update the guest context.
> 
> Please allocate the context earlier then. The point is that until you
> do, shadow_ctx->__engine[ring_id]->state is *undefined* and this code is
> still illegal. :-p
> 
> The intent is that you start tracking the lifetime of the state you are
> using because the assumptions made here will not hold for much longer.

Chris, after double-checking: for shadowing we do make sure to pin our
context for the target engine earlier, so shadow_ctx->__engine[ring_id]->state
is always valid. We just don't require the real request to be generated that
early during scan/shadow, so we use the pinned context state pointer directly.
Will that be a problem in the future?
Chris Wilson May 21, 2018, 7:39 a.m. UTC | #5
Quoting Zhenyu Wang (2018-05-21 04:44:34)
> On 2018.05.18 11:22:06 +0100, Chris Wilson wrote:
> > Quoting Zhenyu Wang (2018-05-18 11:13:05)
> > > When we do shadowing, the workload's request might not be allocated yet,
> > > so we still need the shadow context's object. And when completing a workload,
> > > delay zeroing the workload's request pointer until after it has been used to
> > > update the guest context.
> > 
> > Please allocate the context earlier then. The point is that until you
> > do, shadow_ctx->__engine[ring_id]->state is *undefined* and this code is
> > still illegal. :-p
> > 
> > The intent is that you start tracking the lifetime of the state you are
> > using because the assumptions made here will not hold for much longer.
> 
> Chris, after double-checking: for shadowing we do make sure to pin our
> context for the target engine earlier, so shadow_ctx->__engine[ring_id]->state
> is always valid. We just don't require the real request to be generated that
> early during scan/shadow, so we use the pinned context state pointer directly.
> Will that be a problem in the future?

It's just a matter of how you get it. The only reason I was using the
request was that it had an explicit pointer to the pinned context; it
looked easier to keep using that rather than add a separate context
member. If you would rather do that, do so -- please don't keep
tacitly using the gem_context to derive the intel_context without any
obvious sign of protection. (It's just that the link between gem_context
and intel_context is going to become more tenuous, so it is important we
make the ownership and lifecycle of intel_context more explicit.)
-Chris
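Chris's suggestion of a separate context member can also be sketched. The code below is a hypothetical illustration, not the driver's actual API: the workload takes its own explicit reference to the pinned intel_context when it is pinned and drops it when the workload completes, so ownership is visible in the workload rather than tacitly re-derived from the gem_context each time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real i915 types. */
struct intel_context {
	int   pin_count;
	void *state;
};

struct workload {
	struct intel_context *ctx; /* explicit reference, held while pinned */
};

/* Stand-in for intel_context_pin(): ownership becomes visible here. */
static void workload_pin_context(struct workload *w, struct intel_context *ce)
{
	ce->pin_count++;
	w->ctx = ce;
}

/* Stand-in for intel_context_unpin(): the lifetime ends explicitly. */
static void workload_unpin_context(struct workload *w)
{
	w->ctx->pin_count--;
	w->ctx = NULL;
}
```

With this shape, any code running between pin and unpin can dereference `w->ctx->state` knowing the pin is held, which is the explicit lifetime tracking the thread is asking for.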

Patch

diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index 2efb723b90cb..00f79fc940da 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -125,8 +125,9 @@  static int populate_shadow_context(struct intel_vgpu_workload *workload)
 	struct intel_vgpu *vgpu = workload->vgpu;
 	struct intel_gvt *gvt = vgpu->gvt;
 	int ring_id = workload->ring_id;
+	struct i915_gem_context *shadow_ctx = vgpu->submission.shadow_ctx;
 	struct drm_i915_gem_object *ctx_obj =
-		workload->req->hw_context->state->obj;
+		shadow_ctx->__engine[ring_id].state->obj;
 	struct execlist_ring_context *shadow_ring_context;
 	struct page *page;
 	void *dst;
@@ -595,8 +596,6 @@  static int prepare_workload(struct intel_vgpu_workload *workload)
 		return ret;
 	}
 
-	update_shadow_pdps(workload);
-
 	ret = intel_vgpu_sync_oos_pages(workload->vgpu);
 	if (ret) {
 		gvt_vgpu_err("fail to vgpu sync oos pages\n");
@@ -615,6 +614,8 @@  static int prepare_workload(struct intel_vgpu_workload *workload)
 		goto err_unpin_mm;
 	}
 
+	update_shadow_pdps(workload);
+
 	ret = prepare_shadow_batch_buffer(workload);
 	if (ret) {
 		gvt_vgpu_err("fail to prepare_shadow_batch_buffer\n");
@@ -825,7 +826,7 @@  static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
 		scheduler->current_workload[ring_id];
 	struct intel_vgpu *vgpu = workload->vgpu;
 	struct intel_vgpu_submission *s = &vgpu->submission;
-	struct i915_request *rq;
+	struct i915_request *rq = workload->req;
 	int event;
 
 	mutex_lock(&vgpu->vgpu_lock);
@@ -835,7 +836,6 @@  static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
 	 * switch to make sure request is completed.
 	 * For the workload w/o request, directly complete the workload.
 	 */
-	rq = fetch_and_zero(&workload->req);
 	if (rq) {
 		wait_event(workload->shadow_ctx_status_wq,
 			   !atomic_read(&workload->shadow_ctx_active));
@@ -866,7 +866,7 @@  static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
 		intel_context_unpin(rq->hw_context);
 		mutex_unlock(&rq->i915->drm.struct_mutex);
 
-		i915_request_put(rq);
+		i915_request_put(fetch_and_zero(&workload->req));
 	}
 
 	gvt_dbg_sched("ring id %d complete workload %p status %d\n",