[v2,3/8] drm/i915: Cope with request list state change during error state capture

Message ID 1445266373-12952-1-git-send-email-tomas.elf@intel.com (mailing list archive)
State New, archived

Commit Message

Tomas Elf Oct. 19, 2015, 2:52 p.m. UTC
Since we're not synchronizing the ring request list during error state capture,
the request list state might change between the time the corresponding error
request list was allocated and dimensioned and the time when the ring request
list is actually captured into the error state. If this happens, exit early
and be aware that the captured error state might not be fully reliable.

* v2:
- Chris Wilson: Removed WARN_ON from size check since having the error state
  request list and the live driver request list diverge like this is a
  legitimate behaviour.

- Tomas Elf: Removed update of num_request field since this made no sense. Just
  exit and move on.

Signed-off-by: Tomas Elf <tomas.elf@intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Daniel Vetter Oct. 19, 2015, 4:07 p.m. UTC | #1
On Mon, Oct 19, 2015 at 03:52:53PM +0100, Tomas Elf wrote:
> Since we're not synchronizing the ring request list during error state capture
> the request list state might change between the time the corresponding error
> request list was allocated and dimensioned to the time when the ring request
> list is actually captured into the error state. If this happens then do an
> early exit and be aware that the captured error state might not be fully
> reliable.
> 
> * v2:
> - Chris Wilson: Removed WARN_ON from size check since having the error state
>   request list and the live driver request list diverge like this is a
>   legitimate behaviour.
> 
> - Tomas Elf: Removed update of num_request field since this made no sense. Just
>   exit and move on.
> 
> Signed-off-by: Tomas Elf <tomas.elf@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 2f04e4f..b08a76b 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1071,6 +1071,18 @@ static void i915_gem_record_rings(struct drm_device *dev,
>  		list_for_each_entry(request, &ring->request_list, list) {
>  			struct drm_i915_error_request *erq;
>  
> +			if (count >= error->ring[i].num_requests) {
> +				/*
> +				 * If the ring request list was changed in
> +				 * between the point where the error request
> +				 * list was created and dimensioned and this
> +				 * point then just exit early to avoid crashes.
> +				 */
> +				DRM_ERROR("Request list changed size since allocation (%u->%u)\n",
> +					error->ring[i].num_requests, count);

DRM_ERROR still results in reports about dmesg noise; I think
DRM_DEBUG_DRV is the suitable one here.
-Daniel

> +				break;
> +			}
> +
>  			erq = &error->ring[i].requests[count++];
>  			erq->seqno = request->seqno;
>  			erq->jiffies = request->emitted_jiffies;
> -- 
> 1.9.1
> 
Patch

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 2f04e4f..b08a76b 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1071,6 +1071,18 @@ static void i915_gem_record_rings(struct drm_device *dev,
 		list_for_each_entry(request, &ring->request_list, list) {
 			struct drm_i915_error_request *erq;
 
+			if (count >= error->ring[i].num_requests) {
+				/*
+				 * If the ring request list was changed in
+				 * between the point where the error request
+				 * list was created and dimensioned and this
+				 * point then just exit early to avoid crashes.
+				 */
+				DRM_ERROR("Request list changed size since allocation (%u->%u)\n",
+					error->ring[i].num_requests, count);
+				break;
+			}
+
 			erq = &error->ring[i].requests[count++];
 			erq->seqno = request->seqno;
 			erq->jiffies = request->emitted_jiffies;
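
Illustration only, not part of the patch: the guard added above can be sketched outside the driver as a bounded copy from a live list into an array that was sized in an earlier pass. The types and names below (struct request, struct snapshot, capture_requests) are hypothetical stand-ins and do not correspond to i915 structures; the sketch only assumes, as the commit message describes, that the list may have grown between sizing and capture.

```c
#include <stddef.h>

/* Hypothetical stand-in for a live request on a ring's request list. */
struct request {
	unsigned int seqno;
	struct request *next;
};

/* Hypothetical stand-in for the error-state array sized in an earlier pass. */
struct snapshot {
	unsigned int *seqnos;	/* storage allocated when the list was counted */
	size_t num_requests;	/* capacity chosen at allocation time */
};

/*
 * Copy at most snap->num_requests entries from the live list.
 * Returns the number of entries captured and sets *truncated when the
 * list held more entries than the array was dimensioned for.
 */
static size_t capture_requests(const struct request *head,
			       struct snapshot *snap, int *truncated)
{
	size_t count = 0;

	*truncated = 0;
	for (const struct request *rq = head; rq; rq = rq->next) {
		if (count >= snap->num_requests) {
			/* List grew since allocation: stop rather than overrun. */
			*truncated = 1;
			break;
		}
		snap->seqnos[count++] = rq->seqno;
	}
	return count;
}
```

With a three-entry list and an array dimensioned for two, the function captures the first two entries and reports truncation, which mirrors the "exit early and treat the capture as possibly incomplete" behaviour described in the commit message.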