[v3] drm/i915: Avoid GPU Hang when coming out of s3 or s4

Message ID 1431330645-26704-1-git-send-email-peter.antoine@intel.com
State New

Commit Message

Peter Antoine May 11, 2015, 7:50 a.m. UTC
This patch fixes a timing issue that causes a GPU hang when the system
comes out of power saving.

During pm_resume, we submit batchbuffers before enabling interrupts.
This causes us to miss the context switch interrupt, and in
consequence intel_execlists_handle_ctx_events is not triggered.

This patch is based on a patch from Deepak S <deepak.s@intel.com>
from another platform.

The patch fixes an issue introduced by:
  commit e7778be1eab918274f79603d7c17b3ec8be77386
  drm/i915: Fix startup failure in LRC mode after recent init changes

The above patch added a call to init_context() to fix an issue introduced
by a previous patch. However, it opened a small timing window in which the
batches added by init_context() (which set up the context) can complete
before the interrupts have been turned on, thus hanging the GPU.

BUG: https://bugs.freedesktop.org/show_bug.cgi?id=89600
Signed-off-by: Peter Antoine <peter.antoine@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

Comments

Dave Gordon May 11, 2015, 10:08 a.m. UTC | #1
On 11/05/15 08:50, Peter Antoine wrote:
> This patch fixes a timing issue that causes a GPU hang when the system
> comes out of power saving.
> 
> During pm_resume, We are submitting batchbuffers before enabling
> Interrupts this is causing us to miss the context switch interrupt,
> and in consequence intel_execlists_handle_ctx_events is not triggered.
> 
> This patch is based on a patch from Deepak S <deepak.s@intel.com>
> from another platform.
> 
> The patch fixes an issue introduced by:
>   commit e7778be1eab918274f79603d7c17b3ec8be77386
>   drm/i915: Fix startup failure in LRC mode after recent init changes
> 
> The above patch added a call to init_context() to fix an issue introduced
> by a previous patch. But, it then opened up a small timing window for the
> batches being added by the init_context (basically setting up the context)
> to complete before the interrupts have been turned on, thus hanging the
> GPU.
> 
> BUG: https://bugs.freedesktop.org/show_bug.cgi?id=89600
> Signed-off-by: Peter Antoine <peter.antoine@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 6bb6c47..90b1309 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -734,6 +734,12 @@ static int i915_drm_resume(struct drm_device *dev)
>  	intel_init_pch_refclk(dev);
>  	drm_mode_config_reset(dev);
>  
> +	/* Interrupts have to enabled so that any batches that are completed
> +	 * when the context is restarted are caught so that the ring buffer
> +	 * does not handle.
> +	 */

The comment above is ungrammatical, and the format is not the preferred
one. How about this instead:

	/*
	 * Interrupts have to be enabled before reinitialising the
	 * hardware, so that batchbuffers can be submitted and the
	 * resulting interrupts handled. Failure to do so will result
	 * in a ring hang, as the batches will not be seen to complete.
	 */

> +	intel_runtime_pm_enable_interrupts(dev_priv);
> +
>  	mutex_lock(&dev->struct_mutex);
>  	if (i915_gem_init_hw(dev)) {
>  		DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n");
> @@ -741,9 +747,7 @@ static int i915_drm_resume(struct drm_device *dev)
>  	}
>  	mutex_unlock(&dev->struct_mutex);
>  
> -	/* We need working interrupts for modeset enabling ... */
> -	intel_runtime_pm_enable_interrupts(dev_priv);
> -
> +	/* This must follow the pm enable interrupts */
>  	intel_modeset_init_hw(dev);
>  
>  	spin_lock_irq(&dev_priv->irq_lock);
>
Daniel Vetter May 11, 2015, 10:33 a.m. UTC | #2
On Mon, May 11, 2015 at 08:50:45AM +0100, Peter Antoine wrote:
> This patch fixes a timing issue that causes a GPU hang when the system
> comes out of power saving.
> 
> During pm_resume, We are submitting batchbuffers before enabling
> Interrupts this is causing us to miss the context switch interrupt,
> and in consequence intel_execlists_handle_ctx_events is not triggered.
> 
> This patch is based on a patch from Deepak S <deepak.s@intel.com>
> from another platform.
> 
> The patch fixes an issue introduced by:
>   commit e7778be1eab918274f79603d7c17b3ec8be77386
>   drm/i915: Fix startup failure in LRC mode after recent init changes
> 
> The above patch added a call to init_context() to fix an issue introduced
> by a previous patch. But, it then opened up a small timing window for the
> batches being added by the init_context (basically setting up the context)
> to complete before the interrupts have been turned on, thus hanging the
> GPU.
> 
> BUG: https://bugs.freedesktop.org/show_bug.cgi?id=89600

It's "Bugzilla:". Also, this needs to be backported to 4.0, so it needs a
Cc: stable@vger.kernel.org

With this change, the resume code is once again in line with the driver
load and gpu reset code wrt the ordering of interrupt enabling and
gem_init_hw.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Cheers, Daniel
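
The ordering requirement discussed above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not i915
code: struct model_gpu and every model_* function are hypothetical names
invented for illustration, and the model assumes the worst case in which
the context-setup batch completes immediately after submission.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the device state; none of this is real i915 code. */
struct model_gpu {
	bool irqs_enabled;      /* interrupt delivery is turned on */
	int batches_in_flight;  /* submitted, completion not yet seen */
	int missed_events;      /* completions that raised no interrupt */
};

/* Hypothetical stand-in for intel_runtime_pm_enable_interrupts(). */
static void model_enable_interrupts(struct model_gpu *gpu)
{
	assert(gpu);
	gpu->irqs_enabled = true;
}

/*
 * Hypothetical stand-in for i915_gem_init_hw(): submits one
 * context-setup batch, which the modelled hardware completes at once.
 */
static void model_init_hw(struct model_gpu *gpu)
{
	assert(gpu);
	gpu->batches_in_flight++;
	if (gpu->irqs_enabled)
		gpu->batches_in_flight--;  /* handler sees the completion */
	else
		gpu->missed_events++;      /* completion goes unnoticed */
}

/* Returns true if the modelled resume leaves a batch stuck (a "hang"). */
static bool model_resume_hangs(bool irqs_first)
{
	struct model_gpu gpu = { false, 0, 0 };

	if (irqs_first) {
		/* Order after the patch: interrupts, then hardware init. */
		model_enable_interrupts(&gpu);
		model_init_hw(&gpu);
	} else {
		/* Order before the patch: hardware init, then interrupts. */
		model_init_hw(&gpu);
		model_enable_interrupts(&gpu);
	}
	return gpu.batches_in_flight != 0;
}
```

In this model, model_resume_hangs(false) reports a stuck batch because the
completion happened while interrupts were still off, whereas
model_resume_hangs(true) does not.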

Daniel Vetter May 11, 2015, 11:55 a.m. UTC | #3
On Mon, May 11, 2015 at 08:50:45AM +0100, Peter Antoine wrote:
> This patch fixes a timing issue that causes a GPU hang when the system
> comes out of power saving.
> 
> During pm_resume, We are submitting batchbuffers before enabling
> Interrupts this is causing us to miss the context switch interrupt,
> and in consequence intel_execlists_handle_ctx_events is not triggered.
> 
> This patch is based on a patch from Deepak S <deepak.s@intel.com>
> from another platform.
> 
> The patch fixes an issue introduced by:
>   commit e7778be1eab918274f79603d7c17b3ec8be77386
>   drm/i915: Fix startup failure in LRC mode after recent init changes
> 
> The above patch added a call to init_context() to fix an issue introduced
> by a previous patch. But, it then opened up a small timing window for the
> batches being added by the init_context (basically setting up the context)
> to complete before the interrupts have been turned on, thus hanging the
> GPU.
> 
> BUG: https://bugs.freedesktop.org/show_bug.cgi?id=89600
> Signed-off-by: Peter Antoine <peter.antoine@intel.com>

btw can you please follow up with a patch to encode these dependencies? A

WARN_ON(!irqs_enabled);

in execlists_context_unqueue after the spinlock assert would be good I
think.
-Daniel
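
A rough userspace sketch of what such a check could look like follows. It
is an illustration only: MODEL_WARN_ON, struct model_engine and
model_context_unqueue() are hypothetical names, the real WARN_ON() and
execlists_context_unqueue() live in the kernel and are not reproduced
here, and "irqs_enabled" above is a placeholder rather than a known
driver field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Number of warnings raised so far; lets callers check for violations. */
static int model_warn_count;

/* Userspace stand-in for WARN_ON(): report the condition and carry on. */
static bool model_warn_on(bool cond, const char *what)
{
	if (cond) {
		model_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}
#define MODEL_WARN_ON(cond) model_warn_on((cond), #cond)

/* Hypothetical engine state, reduced to what the check needs. */
struct model_engine {
	bool irqs_enabled;  /* assumed flag: interrupts are being delivered */
	int submitted;      /* number of submissions made so far */
};

/* Hypothetical analogue of execlists_context_unqueue(). */
static void model_context_unqueue(struct model_engine *engine)
{
	assert(engine);
	/*
	 * Submission depends on the context switch interrupt to make
	 * progress, so submitting with interrupts off is flagged.
	 */
	MODEL_WARN_ON(!engine->irqs_enabled);
	engine->submitted++;
}
```

The check warns rather than aborts, so a misordered caller is reported
while the submission itself still goes ahead.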


Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 6bb6c47..90b1309 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -734,6 +734,12 @@  static int i915_drm_resume(struct drm_device *dev)
 	intel_init_pch_refclk(dev);
 	drm_mode_config_reset(dev);
 
+	/* Interrupts have to enabled so that any batches that are completed
+	 * when the context is restarted are caught so that the ring buffer
+	 * does not handle.
+	 */
+	intel_runtime_pm_enable_interrupts(dev_priv);
+
 	mutex_lock(&dev->struct_mutex);
 	if (i915_gem_init_hw(dev)) {
 		DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n");
@@ -741,9 +747,7 @@  static int i915_drm_resume(struct drm_device *dev)
 	}
 	mutex_unlock(&dev->struct_mutex);
 
-	/* We need working interrupts for modeset enabling ... */
-	intel_runtime_pm_enable_interrupts(dev_priv);
-
+	/* This must follow the pm enable interrupts */
 	intel_modeset_init_hw(dev);
 
 	spin_lock_irq(&dev_priv->irq_lock);