diff mbox

[2/6] drm/i915: Don't wait for page flips if there was GPU reset

Message ID 1359476018-31274-3-git-send-email-ville.syrjala@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ville Syrjälä Jan. 29, 2013, 4:13 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

If a GPU reset occurs while a page flip has been submitted to the ring,
the flip will never complete once the ring has been reset.

The GPU reset can be detected by sampling the reset_counter before the
flip is submitted, and then while waiting for the flip, the sampled
counter is compared with the current reset_counter value.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 14 +++++++++++++-
 drivers/gpu/drm/i915/intel_drv.h     |  3 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

Comments

Lespiau, Damien Feb. 13, 2013, 10:23 a.m. UTC | #1
On Tue, Jan 29, 2013 at 06:13:34PM +0200, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> If a GPU reset occurs while a page flip has been submitted to the ring,
> the flip will never complete once the ring has been reset.
> 
> The GPU reset can be detected by sampling the reset_counter before the
> flip is submitted, and then while waiting for the flip, the sampled
> counter is compared with the current reset_counter value.
> 
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

You might want to rename reset_counter to flip_reset_counter to indicate
this field is specific to the flipping code. Other parts of the code
might need something similar as well?
Ville Syrjälä Feb. 13, 2013, 10:51 a.m. UTC | #2
On Wed, Feb 13, 2013 at 10:23:28AM +0000, Damien Lespiau wrote:
> On Tue, Jan 29, 2013 at 06:13:34PM +0200, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > If a GPU reset occurs while a page flip has been submitted to the ring,
> > the flip will never complete once the ring has been reset.
> > 
> > The GPU reset can be detected by sampling the reset_counter before the
> > flip is submitted, and then while waiting for the flip, the sampled
> > counter is compared with the current reset_counter value.
> > 
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
> 
> You might want to rename reset_counter to flip_reset_counter to indicate
> this field is specific to the flipping code. Other parts of the code
> might need something similar as well?

IIRC I used flip_reset_counter initially but then I decided it's too
long and dropped the flip_ part. I can change it back if that's what
people prefer.
Daniel Vetter Feb. 13, 2013, 11:49 a.m. UTC | #3
On Wed, Feb 13, 2013 at 12:51:33PM +0200, Ville Syrjälä wrote:
> On Wed, Feb 13, 2013 at 10:23:28AM +0000, Damien Lespiau wrote:
> > On Tue, Jan 29, 2013 at 06:13:34PM +0200, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > If a GPU reset occurs while a page flip has been submitted to the ring,
> > > the flip will never complete once the ring has been reset.
> > > 
> > > The GPU reset can be detected by sampling the reset_counter before the
> > > flip is submitted, and then while waiting for the flip, the sampled
> > > counter is compared with the current reset_counter value.
> > > 
> > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
> > 
> > You might want to rename reset_counter to flip_reset_counter to indicate
> > this field is specific to the flipping code. Other parts of the code
> > might need something similar as well?
> 
> IIRC I used flip_reset_counter initially but then I decided it's too
> long and dropped the flip_ part. I can change it back if that's what
> people prefer.

Imo the generic reset_counter name is ok. At least as long as we don't
need to keep track of more than one of these per crtc for different things
(or if we ever start to add more fine-grained reset domains).
-Daniel
Daniel Vetter Feb. 13, 2013, 3:23 p.m. UTC | #4
On Tue, Jan 29, 2013 at 06:13:34PM +0200, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> If a GPU reset occurs while a page flip has been submitted to the ring,
> the flip will never complete once the ring has been reset.
> 
> The GPU reset can be detected by sampling the reset_counter before the
> flip is submitted, and then while waiting for the flip, the sampled
> counter is compared with the current reset_counter value.
> 
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_display.c | 14 +++++++++++++-
>  drivers/gpu/drm/i915/intel_drv.h     |  3 +++
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 4097118..e348a68 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -2862,10 +2862,12 @@ static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
>  {
>  	struct drm_device *dev = crtc->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
>  	unsigned long flags;
>  	bool pending;
>  
> -	if (i915_reset_in_progress(&dev_priv->gpu_error))
> +	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
> +	    intel_crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
>  		return false;
>  
>  	spin_lock_irqsave(&dev->event_lock, flags);
> @@ -6912,6 +6914,8 @@ static int intel_gen2_queue_flip(struct drm_device *dev,
>  	if (ret)
>  		goto err_unpin;
>  
> +	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);

Ok no bikeshed about ->reset_counter, but a different one: Why does this
need to be in the per-gen callback? Can't we just grab this before we call
down into these callbacks? Imo races wrt the reset/hang code don't matter
that much as long as we don't wait forever for a pageflip which won't
happen. Hanging the gpu concurrently to pageflipping is undefined anyway
right now ...
-Daniel

> +
>  	/* Can't queue multiple flips, so wait for the previous
>  	 * one to finish before executing the next.
>  	 */
> @@ -6956,6 +6960,8 @@ static int intel_gen3_queue_flip(struct drm_device *dev,
>  	if (ret)
>  		goto err_unpin;
>  
> +	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> +
>  	if (intel_crtc->plane)
>  		flip_mask = MI_WAIT_FOR_PLANE_B_FLIP;
>  	else
> @@ -6997,6 +7003,8 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
>  	if (ret)
>  		goto err_unpin;
>  
> +	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> +
>  	/* i965+ uses the linear or tiled offsets from the
>  	 * Display Registers (which do not change across a page-flip)
>  	 * so we need only reprogram the base address.
> @@ -7045,6 +7053,8 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
>  	if (ret)
>  		goto err_unpin;
>  
> +	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> +
>  	intel_ring_emit(ring, MI_DISPLAY_FLIP |
>  			MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));
>  	intel_ring_emit(ring, fb->pitches[0] | obj->tiling_mode);
> @@ -7111,6 +7121,8 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
>  	if (ret)
>  		goto err_unpin;
>  
> +	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> +
>  	intel_ring_emit(ring, MI_DISPLAY_FLIP_I915 | plane_bit);
>  	intel_ring_emit(ring, (fb->pitches[0] | obj->tiling_mode));
>  	intel_ring_emit(ring, obj->gtt_offset + intel_crtc->dspaddr_offset);
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index fcdfe42..a5521d9 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -235,6 +235,9 @@ struct intel_crtc {
>  	/* We can share PLLs across outputs if the timings match */
>  	struct intel_pch_pll *pch_pll;
>  	uint32_t ddi_pll_sel;
> +
> +	/* reset counter value when the last flip was submitted */
> +	unsigned int reset_counter;
>  };
>  
>  struct intel_plane {
> -- 
> 1.7.12.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Ville Syrjälä Feb. 13, 2013, 4:52 p.m. UTC | #5
On Wed, Feb 13, 2013 at 04:23:27PM +0100, Daniel Vetter wrote:
> On Tue, Jan 29, 2013 at 06:13:34PM +0200, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > If a GPU reset occurs while a page flip has been submitted to the ring,
> > the flip will never complete once the ring has been reset.
> > 
> > The GPU reset can be detected by sampling the reset_counter before the
> > flip is submitted, and then while waiting for the flip, the sampled
> > counter is compared with the current reset_counter value.
> > 
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_display.c | 14 +++++++++++++-
> >  drivers/gpu/drm/i915/intel_drv.h     |  3 +++
> >  2 files changed, 16 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index 4097118..e348a68 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -2862,10 +2862,12 @@ static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
> >  {
> >  	struct drm_device *dev = crtc->dev;
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> > +	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> >  	unsigned long flags;
> >  	bool pending;
> >  
> > -	if (i915_reset_in_progress(&dev_priv->gpu_error))
> > +	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
> > +	    intel_crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
> >  		return false;
> >  
> >  	spin_lock_irqsave(&dev->event_lock, flags);
> > @@ -6912,6 +6914,8 @@ static int intel_gen2_queue_flip(struct drm_device *dev,
> >  	if (ret)
> >  		goto err_unpin;
> >  
> > +	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> 
> Ok no bikeshed about ->reset_counter, but a different one: Why does this
> need to be in the per-gen callback? Can't we just grab this before we call
> down into these callbacks? Imo races wrt the reset/hang code don't matter
> that much as long as we don't wait forever for a pageflip which won't
> happen. Hanging the gpu concurrently to pageflipping is undefined anyway
> right now ...

Yeah. It could be moved to happen a little earlier, and thus avoid the
duplication.

> -Daniel
> 
> > +
> >  	/* Can't queue multiple flips, so wait for the previous
> >  	 * one to finish before executing the next.
> >  	 */
> > @@ -6956,6 +6960,8 @@ static int intel_gen3_queue_flip(struct drm_device *dev,
> >  	if (ret)
> >  		goto err_unpin;
> >  
> > +	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> > +
> >  	if (intel_crtc->plane)
> >  		flip_mask = MI_WAIT_FOR_PLANE_B_FLIP;
> >  	else
> > @@ -6997,6 +7003,8 @@ static int intel_gen4_queue_flip(struct drm_device *dev,
> >  	if (ret)
> >  		goto err_unpin;
> >  
> > +	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> > +
> >  	/* i965+ uses the linear or tiled offsets from the
> >  	 * Display Registers (which do not change across a page-flip)
> >  	 * so we need only reprogram the base address.
> > @@ -7045,6 +7053,8 @@ static int intel_gen6_queue_flip(struct drm_device *dev,
> >  	if (ret)
> >  		goto err_unpin;
> >  
> > +	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> > +
> >  	intel_ring_emit(ring, MI_DISPLAY_FLIP |
> >  			MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));
> >  	intel_ring_emit(ring, fb->pitches[0] | obj->tiling_mode);
> > @@ -7111,6 +7121,8 @@ static int intel_gen7_queue_flip(struct drm_device *dev,
> >  	if (ret)
> >  		goto err_unpin;
> >  
> > +	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> > +
> >  	intel_ring_emit(ring, MI_DISPLAY_FLIP_I915 | plane_bit);
> >  	intel_ring_emit(ring, (fb->pitches[0] | obj->tiling_mode));
> >  	intel_ring_emit(ring, obj->gtt_offset + intel_crtc->dspaddr_offset);
> > diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> > index fcdfe42..a5521d9 100644
> > --- a/drivers/gpu/drm/i915/intel_drv.h
> > +++ b/drivers/gpu/drm/i915/intel_drv.h
> > @@ -235,6 +235,9 @@ struct intel_crtc {
> >  	/* We can share PLLs across outputs if the timings match */
> >  	struct intel_pch_pll *pch_pll;
> >  	uint32_t ddi_pll_sel;
> > +
> > +	/* reset counter value when the last flip was submitted */
> > +	unsigned int reset_counter;
> >  };
> >  
> >  struct intel_plane {
> > -- 
> > 1.7.12.4
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Feb. 13, 2013, 5:09 p.m. UTC | #6
On Wed, Feb 13, 2013 at 06:52:29PM +0200, Ville Syrjälä wrote:
> On Wed, Feb 13, 2013 at 04:23:27PM +0100, Daniel Vetter wrote:
> > On Tue, Jan 29, 2013 at 06:13:34PM +0200, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > If a GPU reset occurs while a page flip has been submitted to the ring,
> > > the flip will never complete once the ring has been reset.
> > > 
> > > The GPU reset can be detected by sampling the reset_counter before the
> > > flip is submitted, and then while waiting for the flip, the sampled
> > > counter is compared with the current reset_counter value.
> > > 
> > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/intel_display.c | 14 +++++++++++++-
> > >  drivers/gpu/drm/i915/intel_drv.h     |  3 +++
> > >  2 files changed, 16 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > > index 4097118..e348a68 100644
> > > --- a/drivers/gpu/drm/i915/intel_display.c
> > > +++ b/drivers/gpu/drm/i915/intel_display.c
> > > @@ -2862,10 +2862,12 @@ static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
> > >  {
> > >  	struct drm_device *dev = crtc->dev;
> > >  	struct drm_i915_private *dev_priv = dev->dev_private;
> > > +	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
> > >  	unsigned long flags;
> > >  	bool pending;
> > >  
> > > -	if (i915_reset_in_progress(&dev_priv->gpu_error))
> > > +	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
> > > +	    intel_crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
> > >  		return false;
> > >  
> > >  	spin_lock_irqsave(&dev->event_lock, flags);
> > > @@ -6912,6 +6914,8 @@ static int intel_gen2_queue_flip(struct drm_device *dev,
> > >  	if (ret)
> > >  		goto err_unpin;
> > >  
> > > +	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
> > 
> > Ok no bikeshed about ->reset_counter, but a different one: Why does this
> > need to be in the per-gen callback? Can't we just grab this before we call
> > down into these callbacks? Imo races wrt the reset/hang code don't matter
> > that much as long as we don't wait forever for a pageflip which won't
> > happen. Hanging the gpu concurrently to pageflipping is undefined anyway
> > right now ...
> 
> Yeah. It could be moved to happen a little earlier, and thus avoid the
> duplication.

Applied the patch and moved the assignment around a bit, thanks.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 4097118..e348a68 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2862,10 +2862,12 @@  static bool intel_crtc_has_pending_flip(struct drm_crtc *crtc)
 {
 	struct drm_device *dev = crtc->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_crtc *intel_crtc = to_intel_crtc(crtc);
 	unsigned long flags;
 	bool pending;
 
-	if (i915_reset_in_progress(&dev_priv->gpu_error))
+	if (i915_reset_in_progress(&dev_priv->gpu_error) ||
+	    intel_crtc->reset_counter != atomic_read(&dev_priv->gpu_error.reset_counter))
 		return false;
 
 	spin_lock_irqsave(&dev->event_lock, flags);
@@ -6912,6 +6914,8 @@  static int intel_gen2_queue_flip(struct drm_device *dev,
 	if (ret)
 		goto err_unpin;
 
+	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+
 	/* Can't queue multiple flips, so wait for the previous
 	 * one to finish before executing the next.
 	 */
@@ -6956,6 +6960,8 @@  static int intel_gen3_queue_flip(struct drm_device *dev,
 	if (ret)
 		goto err_unpin;
 
+	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+
 	if (intel_crtc->plane)
 		flip_mask = MI_WAIT_FOR_PLANE_B_FLIP;
 	else
@@ -6997,6 +7003,8 @@  static int intel_gen4_queue_flip(struct drm_device *dev,
 	if (ret)
 		goto err_unpin;
 
+	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+
 	/* i965+ uses the linear or tiled offsets from the
 	 * Display Registers (which do not change across a page-flip)
 	 * so we need only reprogram the base address.
@@ -7045,6 +7053,8 @@  static int intel_gen6_queue_flip(struct drm_device *dev,
 	if (ret)
 		goto err_unpin;
 
+	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+
 	intel_ring_emit(ring, MI_DISPLAY_FLIP |
 			MI_DISPLAY_FLIP_PLANE(intel_crtc->plane));
 	intel_ring_emit(ring, fb->pitches[0] | obj->tiling_mode);
@@ -7111,6 +7121,8 @@  static int intel_gen7_queue_flip(struct drm_device *dev,
 	if (ret)
 		goto err_unpin;
 
+	intel_crtc->reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
+
 	intel_ring_emit(ring, MI_DISPLAY_FLIP_I915 | plane_bit);
 	intel_ring_emit(ring, (fb->pitches[0] | obj->tiling_mode));
 	intel_ring_emit(ring, obj->gtt_offset + intel_crtc->dspaddr_offset);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index fcdfe42..a5521d9 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -235,6 +235,9 @@  struct intel_crtc {
 	/* We can share PLLs across outputs if the timings match */
 	struct intel_pch_pll *pch_pll;
 	uint32_t ddi_pll_sel;
+
+	/* reset counter value when the last flip was submitted */
+	unsigned int reset_counter;
 };
 
 struct intel_plane {