Message ID | 1403118289-19550-1-git-send-email-ville.syrjala@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jun 18, 2014 at 10:04:48PM +0300, ville.syrjala@linux.intel.com wrote: > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > We're forgetting to unpin the last_context from the ggtt at GPU reset > time. This leads to the vma pin_count leaking at every reset if the > last context wasn't the ring default context. Further use of the same > context will trigger the pin_count check in i915_gem_object_pin() and > userspace will be faced with EBUSY as a result. > > This plaques kms_flip rather badly since it performs lots of resets, > and every fd has its own default context these days. > > Fix the problem by properly unpinning the last context at reset. Ah, the context reset here is faked because we never restore the default context state. Hmm, in fact, I get the impression that we should just delete i915_gem_context_reset(), and make i915_gem_context_enable() function correctly after the GPU is reset. -Chris
> -----Original Message----- > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf > Of Chris Wilson > Sent: Thursday, June 19, 2014 8:47 AM > To: ville.syrjala@linux.intel.com > Cc: intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Unpin last_context at reset > > On Wed, Jun 18, 2014 at 10:04:48PM +0300, ville.syrjala@linux.intel.com > wrote: > > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > We're forgetting to unpin the last_context from the ggtt at GPU reset > > time. This leads to the vma pin_count leaking at every reset if the > > last context wasn't the ring default context. Further use of the same > > context will trigger the pin_count check in i915_gem_object_pin() and > > userspace will be faced with EBUSY as a result. > > > > This plaques kms_flip rather badly since it performs lots of resets, > > and every fd has its own default context these days. > > > > Fix the problem by properly unpinning the last context at reset. > > Ah, the context reset here is faked because we never restore the default > context state. Hmm, in fact, I get the impression that we should just delete > i915_gem_context_reset(), and make i915_gem_context_enable() function > correctly after the GPU is reset. > -Chris I´m doing something similar in a parallel conversation (see http://lists.freedesktop.org/archives/intel-gfx/2014-June/047744.html). My proposal was to make both the APPGTT switch and default context switch completely synchronous (RING_PP_DIR_BASE/GEN8_RING_PDP for the PPGTT, CCID for the context) in i915_gem_context_enable(). That way we get rid of i915_gem_context_reset() and we don´t need all the gpu reset special casing inside ppgtt->switch_mm. Daniel wants a completely asynchronous thing, which seems complex without intel_ring_begin, ring->flush, intel_ring_advance, etc..
On Wed, Jun 18, 2014 at 10:04:48PM +0300, ville.syrjala@linux.intel.com wrote: > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > We're forgetting to unpin the last_context from the ggtt at GPU reset > time. This leads to the vma pin_count leaking at every reset if the > last context wasn't the ring default context. Further use of the same > context will trigger the pin_count check in i915_gem_object_pin() and > userspace will be faced with EBUSY as a result. > > This plaques kms_flip rather badly since it performs lots of resets, > and every fd has its own default context these days. > > Fix the problem by properly unpinning the last context at reset. > > Testcase: igt/gem_ctx_exec/reset-pin-leak > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> I pushed the igt, and also kms_flip hits this easily. It looks to me that the bug has been there since the introduction of i915_gem_context_reset() in [1], so I think we may want this patch merged with cc: stable. Afterwards people can work on killing i915_gem_context_reset() if they so wish. [1] commit acce9ffa4807027965ebd948456fa8385bbee32e Author: Ben Widawsky <ben@bwidawsk.net> Date: Fri Dec 6 14:11:03 2013 -0800 drm/i915: Better reset handling for contexts > --- > drivers/gpu/drm/i915/i915_gem_context.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > index 3ffe308..e362c96 100644 > --- a/drivers/gpu/drm/i915/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > @@ -382,6 +382,9 @@ void i915_gem_context_reset(struct drm_device *dev) > dctx->obj->active = 0; > } > > + if (ring->last_context->obj && i == RCS) > + i915_gem_object_ggtt_unpin(ring->last_context->obj); > + > i915_gem_context_unreference(ring->last_context); > i915_gem_context_reference(dctx); > ring->last_context = dctx; > -- > 1.8.5.5
On Mon, Jun 23, 2014 at 01:07:50PM +0300, Ville Syrjälä wrote: > On Wed, Jun 18, 2014 at 10:04:48PM +0300, ville.syrjala@linux.intel.com wrote: > > From: Ville Syrjälä <ville.syrjala@linux.intel.com> > > > > We're forgetting to unpin the last_context from the ggtt at GPU reset > > time. This leads to the vma pin_count leaking at every reset if the > > last context wasn't the ring default context. Further use of the same > > context will trigger the pin_count check in i915_gem_object_pin() and > > userspace will be faced with EBUSY as a result. > > > > This plaques kms_flip rather badly since it performs lots of resets, > > and every fd has its own default context these days. > > > > Fix the problem by properly unpinning the last context at reset. > > > > Testcase: igt/gem_ctx_exec/reset-pin-leak > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> > > I pushed the igt, and also kms_flip hits this easily. > > It looks to me that the bug has been there since the introduction of > i915_gem_context_reset() in [1], so I think we may want this patch > merged with cc: stable. Afterwards people can work on killing > i915_gem_context_reset() if they so wish. > > [1] > commit acce9ffa4807027965ebd948456fa8385bbee32e > Author: Ben Widawsky <ben@bwidawsk.net> > Date: Fri Dec 6 14:11:03 2013 -0800 > > drm/i915: Better reset handling for contexts I've added this regression marker but forgone the cc: stable - haven't heard anyone complain about this loud enough yet ... So just merged to dinq. We can do the proper fix of unifying the paths here a bit more later on. Thanks for the patch. -Daniel > > > --- > > drivers/gpu/drm/i915/i915_gem_context.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c > > index 3ffe308..e362c96 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.c > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > > @@ -382,6 +382,9 @@ void i915_gem_context_reset(struct drm_device *dev) > > dctx->obj->active = 0; > > } > > > > + if (ring->last_context->obj && i == RCS) > > + i915_gem_object_ggtt_unpin(ring->last_context->obj); > > + > > i915_gem_context_unreference(ring->last_context); > > i915_gem_context_reference(dctx); > > ring->last_context = dctx; > > -- > > 1.8.5.5 > > -- > Ville Syrjälä > Intel OTC > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 3ffe308..e362c96 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -382,6 +382,9 @@ void i915_gem_context_reset(struct drm_device *dev) dctx->obj->active = 0; } + if (ring->last_context->obj && i == RCS) + i915_gem_object_ggtt_unpin(ring->last_context->obj); + i915_gem_context_unreference(ring->last_context); i915_gem_context_reference(dctx); ring->last_context = dctx;