diff mbox

drm/i915: Unpin last_context at reset

Message ID 1403118289-19550-1-git-send-email-ville.syrjala@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ville Syrjälä June 18, 2014, 7:04 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

We're forgetting to unpin the last_context from the ggtt at GPU reset
time. This leads to the vma pin_count leaking at every reset if the
last context wasn't the ring default context. Further use of the same
context will trigger the pin_count check in i915_gem_object_pin() and
userspace will be faced with EBUSY as a result.

This plaques kms_flip rather badly since it performs lots of resets,
and every fd has its own default context these days.

Fix the problem by properly unpinning the last context at reset.

Testcase: igt/gem_ctx_exec/reset-pin-leak
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Chris Wilson June 19, 2014, 7:47 a.m. UTC | #1
On Wed, Jun 18, 2014 at 10:04:48PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> We're forgetting to unpin the last_context from the ggtt at GPU reset
> time. This leads to the vma pin_count leaking at every reset if the
> last context wasn't the ring default context. Further use of the same
> context will trigger the pin_count check in i915_gem_object_pin() and
> userspace will be faced with EBUSY as a result.
> 
> This plaques kms_flip rather badly since it performs lots of resets,
> and every fd has its own default context these days.
> 
> Fix the problem by properly unpinning the last context at reset.

Ah, the context reset here is faked because we never restore the default
context state.  Hmm, in fact, I get the impression that we should just
delete i915_gem_context_reset(), and make i915_gem_context_enable()
function correctly after the GPU is reset.
-Chris
oscar.mateo@intel.com June 19, 2014, 4:21 p.m. UTC | #2
> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Chris Wilson
> Sent: Thursday, June 19, 2014 8:47 AM
> To: ville.syrjala@linux.intel.com
> Cc: intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Unpin last_context at reset
> 
> On Wed, Jun 18, 2014 at 10:04:48PM +0300, ville.syrjala@linux.intel.com
> wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > We're forgetting to unpin the last_context from the ggtt at GPU reset
> > time. This leads to the vma pin_count leaking at every reset if the
> > last context wasn't the ring default context. Further use of the same
> > context will trigger the pin_count check in i915_gem_object_pin() and
> > userspace will be faced with EBUSY as a result.
> >
> > This plaques kms_flip rather badly since it performs lots of resets,
> > and every fd has its own default context these days.
> >
> > Fix the problem by properly unpinning the last context at reset.
> 
> Ah, the context reset here is faked because we never restore the default
> context state.  Hmm, in fact, I get the impression that we should just delete
> i915_gem_context_reset(), and make i915_gem_context_enable() function
> correctly after the GPU is reset.
> -Chris

I´m doing something similar in a parallel conversation (see http://lists.freedesktop.org/archives/intel-gfx/2014-June/047744.html). My proposal was to make both the APPGTT switch and default context switch completely synchronous (RING_PP_DIR_BASE/GEN8_RING_PDP for the PPGTT, CCID for the context) in i915_gem_context_enable(). That way we get rid of i915_gem_context_reset() and we don´t need all the gpu reset special casing inside ppgtt->switch_mm.
Daniel wants a completely asynchronous thing, which seems complex without intel_ring_begin, ring->flush, intel_ring_advance, etc..
Ville Syrjälä June 23, 2014, 10:07 a.m. UTC | #3
On Wed, Jun 18, 2014 at 10:04:48PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> We're forgetting to unpin the last_context from the ggtt at GPU reset
> time. This leads to the vma pin_count leaking at every reset if the
> last context wasn't the ring default context. Further use of the same
> context will trigger the pin_count check in i915_gem_object_pin() and
> userspace will be faced with EBUSY as a result.
> 
> This plaques kms_flip rather badly since it performs lots of resets,
> and every fd has its own default context these days.
> 
> Fix the problem by properly unpinning the last context at reset.
> 
> Testcase: igt/gem_ctx_exec/reset-pin-leak
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

I pushed the igt, and also kms_flip hits this easily.

It looks to me that the bug has been there since the introduction of
i915_gem_context_reset() in [1], so I think we may want this patch
merged with cc: stable. Afterwards people can work on killing
i915_gem_context_reset() if they so wish.

[1]
 commit acce9ffa4807027965ebd948456fa8385bbee32e
 Author: Ben Widawsky <ben@bwidawsk.net>
 Date:   Fri Dec 6 14:11:03 2013 -0800

    drm/i915: Better reset handling for contexts

> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 3ffe308..e362c96 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -382,6 +382,9 @@ void i915_gem_context_reset(struct drm_device *dev)
>  			dctx->obj->active = 0;
>  		}
>  
> +		if (ring->last_context->obj && i == RCS)
> +			i915_gem_object_ggtt_unpin(ring->last_context->obj);
> +
>  		i915_gem_context_unreference(ring->last_context);
>  		i915_gem_context_reference(dctx);
>  		ring->last_context = dctx;
> -- 
> 1.8.5.5
Daniel Vetter July 7, 2014, 2:38 p.m. UTC | #4
On Mon, Jun 23, 2014 at 01:07:50PM +0300, Ville Syrjälä wrote:
> On Wed, Jun 18, 2014 at 10:04:48PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > We're forgetting to unpin the last_context from the ggtt at GPU reset
> > time. This leads to the vma pin_count leaking at every reset if the
> > last context wasn't the ring default context. Further use of the same
> > context will trigger the pin_count check in i915_gem_object_pin() and
> > userspace will be faced with EBUSY as a result.
> > 
> > This plaques kms_flip rather badly since it performs lots of resets,
> > and every fd has its own default context these days.
> > 
> > Fix the problem by properly unpinning the last context at reset.
> > 
> > Testcase: igt/gem_ctx_exec/reset-pin-leak
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> I pushed the igt, and also kms_flip hits this easily.
> 
> It looks to me that the bug has been there since the introduction of
> i915_gem_context_reset() in [1], so I think we may want this patch
> merged with cc: stable. Afterwards people can work on killing
> i915_gem_context_reset() if they so wish.
> 
> [1]
>  commit acce9ffa4807027965ebd948456fa8385bbee32e
>  Author: Ben Widawsky <ben@bwidawsk.net>
>  Date:   Fri Dec 6 14:11:03 2013 -0800
> 
>     drm/i915: Better reset handling for contexts

I've added this regression marker but forgone the cc: stable - haven't
heard anyone complain about this loud enough yet ... So just merged to
dinq.

We can do the proper fix of unifying the paths here a bit more later on.

Thanks for the patch.
-Daniel
> 
> > ---
> >  drivers/gpu/drm/i915/i915_gem_context.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 3ffe308..e362c96 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -382,6 +382,9 @@ void i915_gem_context_reset(struct drm_device *dev)
> >  			dctx->obj->active = 0;
> >  		}
> >  
> > +		if (ring->last_context->obj && i == RCS)
> > +			i915_gem_object_ggtt_unpin(ring->last_context->obj);
> > +
> >  		i915_gem_context_unreference(ring->last_context);
> >  		i915_gem_context_reference(dctx);
> >  		ring->last_context = dctx;
> > -- 
> > 1.8.5.5
> 
> -- 
> Ville Syrjälä
> Intel OTC
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 3ffe308..e362c96 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -382,6 +382,9 @@  void i915_gem_context_reset(struct drm_device *dev)
 			dctx->obj->active = 0;
 		}
 
+		if (ring->last_context->obj && i == RCS)
+			i915_gem_object_ggtt_unpin(ring->last_context->obj);
+
 		i915_gem_context_unreference(ring->last_context);
 		i915_gem_context_reference(dctx);
 		ring->last_context = dctx;