Message ID | 20160121110250.GG16147@nuc-i3427.alporthouse.com (mailing list archive)
---|---
State | New, archived
On Thu, Jan 21, 2016 at 11:02:50AM +0000, Chris Wilson wrote:
> 	intel_gpu_reset();
> 	for_each_engine() unpin(engine->last_context);
> 	unpin(dev_priv->default_context);
>
> After a few more passes, we should be able to call the reset and cleanup
> functions higher up (as part of stopping the GPU upon suspend, before
> tearing down the sw structs) at which point we only need the
> unpin(default_context) here.

unref(dev_priv->default_context)! Whether or not the individual engines
have a pin on it will be left to themselves to decide.
-Chris
On 21/01/16 11:02, Chris Wilson wrote:
> On Thu, Jan 21, 2016 at 10:10:42AM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> In GuC mode LRC pinning lifetime depends exclusively on the
>> request lifetime. Since that is terminated by the seqno update,
>> that opens up a race condition between the GPU finishing writing
>> out the context image and the driver unpinning the LRC.
>>
>> To extend the LRC lifetime we will employ a similar approach
>> to what legacy ringbuffer submission does.
>>
>> We will start tracking the last submitted context per engine
>> and keep it pinned until it is replaced by another one.
>>
>> Note that the driver unload path is a bit fragile and could
>> benefit greatly from efforts to unify the legacy and execlist
>> submission code paths.
>>
>> At the moment i915_gem_context_fini has special casing for the
>> two which is potentially not needed, and also depends on
>> i915_gem_cleanup_ringbuffer running before itself.
>>
>> v2:
>>  * Move pinning into engine->emit_request and actually fix
>>    the reference/unreference logic. (Chris Wilson)
>>  * ring->dev can be NULL on driver unload so use a different
>>    route towards it.
>>
>> v3:
>>  * Rebase.
>>  * Handle the reset path. (Chris Wilson)
>>  * Exclude the default context from the pinning - it is impossible
>>    to get it right before default context special casing in
>>    general is eliminated.
>
> Since you have the refactoring in place, eliminating the special case is
> trivial. So patch 2.5?
>
> Looks like http://patchwork.freedesktop.org/patch/69972/ but you already
> have half of that patch in place.
>
> Also if we take a step to add patch 2.1:
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index c25083c78ba7..d43c886fd95a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -321,6 +321,14 @@ err_destroy:
>  	return ERR_PTR(ret);
>  }
>
> +static void i915_gem_context_unpin(struct intel_context *ctx,
> +				   struct intel_engine_cs *rcs)
> +{
> +	if (rcs->id == RCS && ctx->legacy_hw_ctx.rcs_state)
> +		i915_gem_object_ggtt_unpin(ctx->legacy_hw_ctx.rcs_state);
> +	i915_gem_context_unreference(ctx);
> +}
> +
>  void i915_gem_context_reset(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -338,13 +346,9 @@ void i915_gem_context_reset(struct drm_device *dev)
>
>  	for (i = 0; i < I915_NUM_RINGS; i++) {
>  		struct intel_engine_cs *ring = &dev_priv->ring[i];
> -		struct intel_context *lctx = ring->last_context;
>
> -		if (lctx) {
> -			if (lctx->legacy_hw_ctx.rcs_state && i == RCS)
> -				i915_gem_object_ggtt_unpin(lctx->legacy_hw_ctx.rcs_state);
> -
> -			i915_gem_context_unreference(lctx);
> +		if (ring->last_context) {
> +			i915_gem_context_unpin(ring->last_context, ring);
>  			ring->last_context = NULL;
>  		}
>
> @@ -424,25 +428,18 @@ void i915_gem_context_fini(struct drm_device *dev)
>  	 * to offset the do_switch part, so that i915_gem_context_unreference()
>  	 * can then free the base object correctly.
>  	 */
>  	WARN_ON(!dev_priv->ring[RCS].last_context);
> -	if (dev_priv->ring[RCS].last_context == dctx) {
> -		/* Fake switch to NULL context */
> -		WARN_ON(dctx->legacy_hw_ctx.rcs_state->active);
> -		i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
> -		i915_gem_context_unreference(dctx);
> -		dev_priv->ring[RCS].last_context = NULL;
> -	}
> -
>  	i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
>  	}
>
>  	for (i = 0; i < I915_NUM_RINGS; i++) {
>  		struct intel_engine_cs *ring = &dev_priv->ring[i];
>
> -		if (ring->last_context)
> -			i915_gem_context_unreference(ring->last_context);
> +		if (ring->last_context) {
> +			i915_gem_context_unpin(ring->last_context, ring);
> +			ring->last_context = NULL;
> +		}
>
>  		ring->default_context = NULL;
> -		ring->last_context = NULL;
>  	}
>
>  	i915_gem_context_unreference(dctx);
>
> Then we just end up with
>
> 	if (i915.enable_execlists)
> 		intel_lr_context_unpin(ring->last_context, ring);
> 	else
> 		i915_gem_context_unpin(ring->last_context, ring);
>
> which is much less messy in this patch and one step closer to unifying
> engine->pin_context()/engine->unpin_context() vfuncs.
>
> The only tricky part there is having to disentangle the legacy paths,
> but doing so should allow us to see clearly that the sequence is just
>
> 	intel_gpu_reset();
> 	for_each_engine() unpin(engine->last_context);
> 	unpin(dev_priv->default_context);
>
> After a few more passes, we should be able to call the reset and cleanup
> functions higher up (as part of stopping the GPU upon suspend, before
> tearing down the sw structs) at which point we only need the
> unpin(default_context) here.

I was confused by the current code which does the reset after unpinning:

	...
	i915_gem_reset(dev);

	simulated = dev_priv->gpu_error.stop_rings != 0;

	ret = intel_gpu_reset(dev);
	...

What is right then? Sounds bad to be unpinning with the GPU in an
unknown state. But obviously it has been like this for who knows how
long, so I have no idea.
:( Asking since GuC is at the moment struggling with reset, but then
again that might be something else entirely...

Regards,

Tvrtko
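[Editorial sketch: the per-engine "keep the last submitted context pinned until it is replaced" scheme described in the commit message above can be illustrated in isolation as follows. The types and names here (`context`, `engine`, `engine_submit`) are simplified mocks, not the actual i915 structures.]

```c
#include <assert.h>

struct context {
	int pin_count; /* stands in for the LRC pin + reference count */
};

struct engine {
	struct context *last_context;
};

static void ctx_pin(struct context *ctx)   { ctx->pin_count++; }
static void ctx_unpin(struct context *ctx) { ctx->pin_count--; }

/*
 * On each submission, pin the incoming context before releasing the
 * previous one, so the context image the GPU may still be writing back
 * stays resident until another submission replaces it.
 */
static void engine_submit(struct engine *e, struct context *ctx)
{
	ctx_pin(ctx);
	if (e->last_context)
		ctx_unpin(e->last_context);
	e->last_context = ctx;
}
```

Note that resubmitting the same context is a no-op for the pin count (pin then unpin of the same object), which is exactly the behaviour wanted here.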
On Fri, Jan 22, 2016 at 04:33:18PM +0000, Tvrtko Ursulin wrote:
> I was confused by the current code which does the reset after unpinning:
>
> 	...
> 	i915_gem_reset(dev);
>
> 	simulated = dev_priv->gpu_error.stop_rings != 0;
>
> 	ret = intel_gpu_reset(dev);
> 	...
>
> What is right then?

Yeah, it would be better to do the sw reset after the hw reset so that
the GPU is in a known state when we tear down everything.

> Sounds bad to be unpinning with the GPU in unknown state. But
> obviously it has been like this for who knows how long. So I have no
> idea. :(

i915_reset() is only called on a hung GPU, and I expect we simply
haven't stressed the system enough with an active-vs-hung pair of
engines to be able to see stray writes and whatnot.

The downside is that if we jiggle the i915_gem_reset() we have to ask
the awkward question of what to do with state when reset fails? We
should keep it around because the hw state is unknown - but that can be
a significant amount of memory trapped.
-Chris
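[Editorial sketch: the ordering point made above - hardware reset first, software unpin second, so nothing is unpinned while the GPU is in an unknown state - modelled with mock types. All names here (`reset_dev`, `hw_reset`, `sw_cleanup`) are hypothetical, not driver API.]

```c
#include <assert.h>

enum gpu_state { GPU_ACTIVE, GPU_IDLE };

struct reset_dev {
	enum gpu_state state;
	int last_context_pinned;
	int unpinned_while_active; /* records the hazard discussed above */
};

/* Stands in for intel_gpu_reset(): quiesces the engines. */
static void hw_reset(struct reset_dev *dev)
{
	dev->state = GPU_IDLE;
}

/* Stands in for the sw teardown (i915_gem_reset() and friends). */
static void sw_cleanup(struct reset_dev *dev)
{
	if (dev->state != GPU_IDLE)
		dev->unpinned_while_active = 1; /* the race being worried about */
	dev->last_context_pinned = 0;
}

/*
 * The suggested order: hw reset first, then sw unpin, so the GPU is in
 * a known state before any context backing store is released.
 */
static void reset_sequence(struct reset_dev *dev)
{
	hw_reset(dev);
	sw_cleanup(dev);
}
```

Swapping the two calls in `reset_sequence` reproduces the ordering the current code has, where the unpin happens against a possibly still-active GPU.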
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index c25083c78ba7..d43c886fd95a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -321,6 +321,14 @@ err_destroy:
 	return ERR_PTR(ret);
 }
 
+static void i915_gem_context_unpin(struct intel_context *ctx,
+				   struct intel_engine_cs *rcs)
+{
+	if (rcs->id == RCS && ctx->legacy_hw_ctx.rcs_state)
+		i915_gem_object_ggtt_unpin(ctx->legacy_hw_ctx.rcs_state);
+	i915_gem_context_unreference(ctx);
+}
+
 void i915_gem_context_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -338,13 +346,9 @@ void i915_gem_context_reset(struct drm_device *dev)
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct intel_engine_cs *ring = &dev_priv->ring[i];
-		struct intel_context *lctx = ring->last_context;
 
-		if (lctx) {
-			if (lctx->legacy_hw_ctx.rcs_state && i == RCS)
-				i915_gem_object_ggtt_unpin(lctx->legacy_hw_ctx.rcs_state);
-
-			i915_gem_context_unreference(lctx);
+		if (ring->last_context) {
+			i915_gem_context_unpin(ring->last_context, ring);
 			ring->last_context = NULL;
 		}
 
@@ -424,25 +428,18 @@ void i915_gem_context_fini(struct drm_device *dev)
 	 * to offset the do_switch part, so that i915_gem_context_unreference()
 	 * can then free the base object correctly.
 	 */
 	WARN_ON(!dev_priv->ring[RCS].last_context);
-	if (dev_priv->ring[RCS].last_context == dctx) {
-		/* Fake switch to NULL context */
-		WARN_ON(dctx->legacy_hw_ctx.rcs_state->active);
-		i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
-		i915_gem_context_unreference(dctx);
-		dev_priv->ring[RCS].last_context = NULL;
-	}
-
 	i915_gem_object_ggtt_unpin(dctx->legacy_hw_ctx.rcs_state);
 	}
 
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct intel_engine_cs *ring = &dev_priv->ring[i];
 
-		if (ring->last_context)
-			i915_gem_context_unreference(ring->last_context);
+		if (ring->last_context) {
+			i915_gem_context_unpin(ring->last_context, ring);
+			ring->last_context = NULL;
+		}
 
 		ring->default_context = NULL;
-		ring->last_context = NULL;
 	}
 
 	i915_gem_context_unreference(dctx);
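[Editorial sketch: the unification the thread is working towards - replacing the `if (i915.enable_execlists)` branch in the cleanup loop with per-engine `pin_context()`/`unpin_context()` vfuncs - can be illustrated with mock types. All names here (`uctx`, `uengine`, `engine_cleanup`) are hypothetical, not the real driver API.]

```c
#include <assert.h>
#include <stddef.h>

struct uctx {
	int pinned;
};

struct uengine {
	struct uctx *last_context;
	/* hypothetical vfunc, set once at engine init per submission mode */
	void (*unpin_context)(struct uctx *ctx);
};

/* Two submission modes, one signature. */
static void legacy_unpin(struct uctx *ctx)    { ctx->pinned = 0; }
static void execlists_unpin(struct uctx *ctx) { ctx->pinned = 0; }

/*
 * The cleanup loop body from i915_gem_context_fini(), with the
 * execlists-vs-legacy branch folded into the vfunc: the loop no longer
 * needs to know which submission mode the engine uses.
 */
static void engine_cleanup(struct uengine *e)
{
	if (e->last_context) {
		e->unpin_context(e->last_context);
		e->last_context = NULL;
	}
}
```

The same shape would let the reset path and the fini path share one loop, which is the "much less messy" end state described in the review.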