[v2] drm/i915: check that rpm ref is held when accessing ringbuf in stolen mem
diff mbox

Message ID 1453904840-10322-1-git-send-email-daniele.ceraolospurio@intel.com
State New
Headers show

Commit Message

Daniele Ceraolo Spurio Jan. 27, 2016, 2:27 p.m. UTC
From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

While running some tests on the scheduler patches with rpm enabled I
came across a corruption in the ringbuffer, which was root-caused to
the GPU being suspended while commands were being emitted to the
ringbuffer. The access to memory was failing because the GPU needs to
be awake when accessing stolen memory (where my ringbuffer was located).
Since we have this constraint it looks like a sensible idea to check
that we hold a refcount when we access the rungbuffer.

v2: move the check from ring_begin to ringbuffer iomap time (Chris)

Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Chris Wilson Jan. 27, 2016, 3:11 p.m. UTC | #1
On Wed, Jan 27, 2016 at 02:27:20PM +0000, daniele.ceraolospurio@intel.com wrote:
> From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> 
> While running some tests on the scheduler patches with rpm enabled I
> came across a corruption in the ringbuffer, which was root-caused to
> the GPU being suspended while commands were being emitted to the
> ringbuffer. The access to memory was failing because the GPU needs to
> be awake when accessing stolen memory (where my ringbuffer was located).
> Since we have this constraint it looks like a sensible idea to check
> that we hold a refcount when we access the rungbuffer.
> 
> v2: move the check from ring_begin to ringbuffer iomap time (Chris)
> 
> Cc: John Harrison <John.C.Harrison@Intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 6f5b511..c661dfe 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2119,6 +2119,9 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
>  			return ret;
>  		}
>  
> +		// if we don't hold a wakeref the I/O access can fail

/* C-style comments only */

/* Access through the GTT requires the device to be awake. */
> +		assert_rpm_wakelock_held(dev_priv);

Now if we could just add intel_runtime_pm_get_noresume() here
and add intel_runtime_pm_put() to unpin to indicate the range of
the access, that would be ideal. However, that requires us to be more
proactive in dropping the context pin.
-Chris

Patch
diff mbox

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6f5b511..c661dfe 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2119,6 +2119,9 @@  int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 			return ret;
 		}
 
+		// if we don't hold a wakeref the I/O access can fail
+		assert_rpm_wakelock_held(dev_priv);
+
 		ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
 						    i915_gem_obj_ggtt_offset(obj), ringbuf->size);
 		if (ringbuf->virtual_start == NULL) {