
[v3] drm/i915: check that rpm ref is held when accessing ringbuf in stolen mem

Message ID 1453909429-11024-1-git-send-email-daniele.ceraolospurio@intel.com
State New, archived

Commit Message

Daniele Ceraolo Spurio Jan. 27, 2016, 3:43 p.m. UTC
From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

While running some tests on the scheduler patches with runtime PM (rpm)
enabled, I came across corruption in the ringbuffer, which was root-caused
to the GPU being suspended while commands were being emitted to the
ringbuffer. The memory accesses were failing because the GPU needs to
be awake when accessing stolen memory (where my ringbuffer was located).
Given this constraint, it seems sensible to check that we hold a runtime
PM reference (wakeref) when we access the ringbuffer.

v2: move the check from ring_begin to ringbuffer iomap time (Chris)
v3: update comment (Chris)

Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
 1 file changed, 3 insertions(+)
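A minimal userspace sketch of the invariant the patch asserts: GTT access to a ringbuffer in stolen memory is only valid while a runtime PM reference (wakeref) is held. Every name here (`struct fake_rpm`, `rpm_get()`, `rpm_put()`, `check_wakelock_held()`, `pin_and_map_ring()`) is hypothetical; the sketch only mimics the warn-and-continue behaviour of `assert_rpm_wakelock_held()` and is not the i915 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of runtime PM bookkeeping; not the i915 structures. */
struct fake_rpm {
	int wakeref_count;  /* outstanding wakerefs */
	bool suspended;     /* true while the device is powered down */
	int warnings;       /* how often the check fired */
};

/* Take a wakeref; the first one resumes the device. */
static void rpm_get(struct fake_rpm *rpm)
{
	if (rpm->wakeref_count++ == 0)
		rpm->suspended = false;
}

/* Drop a wakeref; the last one lets the device suspend
 * (immediately here, with no autosuspend delay modeled). */
static void rpm_put(struct fake_rpm *rpm)
{
	assert(rpm->wakeref_count > 0);
	if (--rpm->wakeref_count == 0)
		rpm->suspended = true;
}

/* Warn, but do not fail, if no wakeref is held. */
static bool check_wakelock_held(struct fake_rpm *rpm)
{
	if (rpm->wakeref_count > 0)
		return true;
	rpm->warnings++;
	fprintf(stderr, "wakeref not held during GTT access\n");
	return false;
}

/* Hypothetical analogue of the stolen-memory branch of
 * intel_pin_and_map_ringbuffer_obj(): check, then map regardless. */
static bool pin_and_map_ring(struct fake_rpm *rpm)
{
	/* Access through the GTT requires the device to be awake. */
	bool held = check_wakelock_held(rpm);

	/* ... the ioremap_wc() of the GTT range would happen here ... */
	return held;
}
```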

Comments

Ville Syrjälä Jan. 27, 2016, 4:39 p.m. UTC | #1
On Wed, Jan 27, 2016 at 03:43:49PM +0000, daniele.ceraolospurio@intel.com wrote:
> [...]
> @@ -2119,6 +2119,9 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
>  			return ret;
>  		}
>  
> +		/* Access through the GTT requires the device to be awake. */
> +		assert_rpm_wakelock_held(dev_priv);
> +

Hmm. This function doesn't actually access the ring buffer, so it's a bit
odd to see this here.

>  		ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
>  						    i915_gem_obj_ggtt_offset(obj), ringbuf->size);
>  		if (ringbuf->virtual_start == NULL) {
Chris Wilson Jan. 27, 2016, 4:44 p.m. UTC | #2
On Wed, Jan 27, 2016 at 03:43:49PM +0000, daniele.ceraolospurio@intel.com wrote:
> [...]

That explains itself nicely, thanks.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

It also rings alarm bells for intel_fbdev.c
-Chris
Daniele Ceraolo Spurio Jan. 28, 2016, 10:55 a.m. UTC | #3
On 27/01/16 16:39, Ville Syrjälä wrote:
> On Wed, Jan 27, 2016 at 03:43:49PM +0000, daniele.ceraolospurio@intel.com wrote:
>> [...]
>> @@ -2119,6 +2119,9 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
>>   			return ret;
>>   		}
>>   
>> +		/* Access through the GTT requires the device to be awake. */
>> +		assert_rpm_wakelock_held(dev_priv);
>> +
> Hmm. This function doesn't actually access the ring buffer, so it's a bit
> odd to see this here.

I had it in ring_begin initially, but Chris suggested moving it here 
because we pin the ringbuffer before accessing it. Do you have a 
different place in mind for where this should be added or would you be 
happy with a simple comment update?

Thanks,
Daniele

>>   		ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
>>   						    i915_gem_obj_ggtt_offset(obj), ringbuf->size);
>>   		if (ringbuf->virtual_start == NULL) {
Chris Wilson Jan. 28, 2016, 11:45 a.m. UTC | #4
On Thu, Jan 28, 2016 at 10:55:16AM +0000, Daniele Ceraolo Spurio wrote:
> 
> 
> On 27/01/16 16:39, Ville Syrjälä wrote:
> >On Wed, Jan 27, 2016 at 03:43:49PM +0000, daniele.ceraolospurio@intel.com wrote:
> >>[...]
> >>@@ -2119,6 +2119,9 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> >>  			return ret;
> >>  		}
> >>+		/* Access through the GTT requires the device to be awake. */
> >>+		assert_rpm_wakelock_held(dev_priv);
> >>+
> >Hmm. This function doesn't actually access the ring buffer, so it's a bit
> >odd to see this here.
> 
> I had it in ring_begin initially, but Chris suggested moving it here
> because we pin the ringbuffer before accessing it. Do you have a
> different place in mind for where this should be added or would you
> be happy with a simple comment update?

This function we call in order to acquire access to the ring iomap for
the request. At the beginning of the request, we should be pinning
everything we need to build the request. If writing through the GTT we
should be ensuring that the device is also awake. The oddity is that
this is not yet explicit and the asymmetry still exists between
legacy/execlists.
-Chris
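A hedged sketch of the ordering Chris describes: acquire the wakeref and pin the ring when the request starts, emit only while both are held, and release them at the end. All names are hypothetical; the model ignores the legacy/execlists asymmetry and real iomap handling, and simply counts a write made without a wakeref instead of corrupting memory.

```c
#include <assert.h>
#include <stdint.h>

#define RING_DWORDS 64

/* Hypothetical models; none of these names are i915 API. */
struct device_model {
	int wakerefs;      /* outstanding runtime PM references */
	int bad_accesses;  /* GTT writes attempted while asleep */
};

struct ring_model {
	uint32_t buf[RING_DWORDS];  /* stands in for the iomapped ring */
	unsigned int tail;
	int pin_count;
};

struct request_model {
	struct device_model *dev;
	struct ring_model *ring;
};

/* Acquire everything the request needs before emitting anything:
 * a wakeref (so GTT writes reach memory) and a pin on the ring. */
static void request_begin(struct request_model *rq,
			  struct device_model *dev, struct ring_model *ring)
{
	rq->dev = dev;
	rq->ring = ring;
	dev->wakerefs++;
	ring->pin_count++;
}

/* Write one dword through the modeled GTT mapping. */
static void request_emit(struct request_model *rq, uint32_t dw)
{
	if (rq->dev->wakerefs == 0) {
		/* On real hardware this write to stolen memory could be lost. */
		rq->dev->bad_accesses++;
		return;
	}
	rq->ring->buf[rq->ring->tail++ % RING_DWORDS] = dw;
}

/* Release in reverse order once the request has been submitted. */
static void request_end(struct request_model *rq)
{
	assert(rq->ring->pin_count > 0 && rq->dev->wakerefs > 0);
	rq->ring->pin_count--;
	rq->dev->wakerefs--;
}
```

Emitting after `request_end()` is exactly the case the wakelock check is meant to catch.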
Ville Syrjälä Jan. 28, 2016, 12:09 p.m. UTC | #5
On Thu, Jan 28, 2016 at 11:45:24AM +0000, Chris Wilson wrote:
> On Thu, Jan 28, 2016 at 10:55:16AM +0000, Daniele Ceraolo Spurio wrote:
> > 
> > 
> > On 27/01/16 16:39, Ville Syrjälä wrote:
> > >On Wed, Jan 27, 2016 at 03:43:49PM +0000, daniele.ceraolospurio@intel.com wrote:
> > >>[...]
> > >>@@ -2119,6 +2119,9 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> > >>  			return ret;
> > >>  		}
> > >>+		/* Access through the GTT requires the device to be awake. */
> > >>+		assert_rpm_wakelock_held(dev_priv);
> > >>+
> > >Hmm. This function doesn't actually access the ring buffer, so it's a bit
> > >odd to see this here.
> > 
> > I had it in ring_begin initially, but Chris suggested moving it here
> > because we pin the ringbuffer before accessing it. Do you have a
> > different place in mind for where this should be added or would you
> > be happy with a simple comment update?
> 
> This function we call in order to acquire access to the ring iomap for
> the request. At the beginning of the request, we should be pinning
> everything we need to build the request. If writing through the GTT we
> should be ensuring that the device is also awake. The oddity is that
> this is not yet explicit and the asymmetry still exists between
> legacy/execlists.

Yeah, with ringbuffer mode this gets executed exactly once, so more or
less useless at the moment. With execlists I suppose it might catch
something on CHV/BXT.
Chris Wilson Jan. 28, 2016, 12:30 p.m. UTC | #6
On Thu, Jan 28, 2016 at 02:09:37PM +0200, Ville Syrjälä wrote:
> On Thu, Jan 28, 2016 at 11:45:24AM +0000, Chris Wilson wrote:
> > On Thu, Jan 28, 2016 at 10:55:16AM +0000, Daniele Ceraolo Spurio wrote:
> > > 
> > > 
> > > On 27/01/16 16:39, Ville Syrjälä wrote:
> > > >On Wed, Jan 27, 2016 at 03:43:49PM +0000, daniele.ceraolospurio@intel.com wrote:
> > > >>[...]
> > > >>@@ -2119,6 +2119,9 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> > > >>  			return ret;
> > > >>  		}
> > > >>+		/* Access through the GTT requires the device to be awake. */
> > > >>+		assert_rpm_wakelock_held(dev_priv);
> > > >>+
> > > >Hmm. This function doesn't actually access the ring buffer, so it's a bit
> > > >odd to see this here.
> > > 
> > > I had it in ring_begin initially, but Chris suggested moving it here
> > > because we pin the ringbuffer before accessing it. Do you have a
> > > different place in mind for where this should be added or would you
> > > be happy with a simple comment update?
> > 
> > This function we call in order to acquire access to the ring iomap for
> > the request. At the beginning of the request, we should be pinning
> > everything we need to build the request. If writing through the GTT we
> > should be ensuring that the device is also awake. The oddity is that
> > this is not yet explicit and the asymmetry still exists between
> > legacy/execlists.
> 
> Yeah, with ringbuffer mode this gets executed exactly once, so more or
> less useless at the moment. With execlists I suppose it might catch
> something on CHV/BXT.

It shouldn't. We hold the wakeref for execbuf request construction, and
that is more or less the only time we create a request in execlists.
(Although it doesn't have to be that way!)
-Chris
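Ville's and Chris's points about how often the check can fire can be modeled as follows, assuming (as they describe) that legacy mode maps its ring once at init, that execlists maps a context's ring when it is first pinned for a request, and that execbuf holds a wakeref while building the request. The names are hypothetical, and requests are retired immediately only to keep the sketch short.

```c
#include <assert.h>

/* Hypothetical model; names are illustrative, not i915 API. */
struct gpu_model {
	int wakerefs;  /* outstanding runtime PM references */
	int checks;    /* times the wakelock check ran */
	int warnings;  /* times it ran with no wakeref held */
};

struct context_model {
	int pin_count;  /* ring stays mapped while > 0 */
};

/* The check added by the patch, run whenever a ring is mapped. */
static void map_ring(struct gpu_model *gpu)
{
	gpu->checks++;
	if (gpu->wakerefs == 0)
		gpu->warnings++;
}

/* Legacy ringbuffer mode: one ring, mapped once during init
 * (modeled as holding a wakeref for the duration). */
static void legacy_init(struct gpu_model *gpu)
{
	gpu->wakerefs++;
	map_ring(gpu);
	gpu->wakerefs--;
}

/* Execlists: a context's ring is mapped when first pinned for a request. */
static void execlists_alloc_request(struct gpu_model *gpu,
				    struct context_model *ctx)
{
	if (ctx->pin_count++ == 0)
		map_ring(gpu);
}

/* Retiring drops the pin; unmapping at zero is not modeled. */
static void execlists_retire(struct context_model *ctx)
{
	assert(ctx->pin_count > 0);
	ctx->pin_count--;
}

/* execbuf holds a wakeref while it constructs the request. */
static void execbuf(struct gpu_model *gpu, struct context_model *ctx)
{
	gpu->wakerefs++;
	execlists_alloc_request(gpu, ctx);
	gpu->wakerefs--;
	execlists_retire(ctx);  /* retired immediately to keep the model short */
}
```

In legacy mode the check runs once no matter how many requests follow; in execlists it runs on every 0-to-1 pin, but stays quiet as long as requests are only built inside execbuf. A request allocated outside that path, with no wakeref, is what would trip it.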
Daniel Vetter Feb. 10, 2016, 7:56 a.m. UTC | #7
On Wed, Jan 27, 2016 at 04:44:37PM +0000, Chris Wilson wrote:
> On Wed, Jan 27, 2016 at 03:43:49PM +0000, daniele.ceraolospurio@intel.com wrote:
> > [...]
> 
> That explains itself nicely, thanks.
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Queued for -next, thanks for the patch.

> It also rings alarm bells for intel_fbdev.c

Oops, indeed. Might explain why we sometimes just die? And fundamentally
it's unfixable (without a shadow fb) since we can't intercept mmaps on
fbdev. But maybe we need to do that (and use the damage tracking that's
already there in 3 copies in various drivers for uploading).
-Daniel
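One way to read the shadow-fb suggestion: fbdev mmap writes land in a system-memory shadow, the driver records a bounding box of the damaged pixels, and only that region is copied to the GTT-backed framebuffer at a point where a wakeref is known to be held. This is a minimal sketch with hypothetical names and a single bounding-box damage rectangle; it is not the DRM fbdev helper code or any driver's existing damage tracking.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FB_W 8
#define FB_H 4

/* Half-open rectangle: columns [x1, x2), rows [y1, y2). */
struct rect {
	int x1, y1, x2, y2;
};

/* Hypothetical shadow framebuffer; not the DRM fbdev helper structures. */
struct shadow_fb {
	uint32_t shadow[FB_H][FB_W];  /* system memory, what fbdev would mmap */
	uint32_t device[FB_H][FB_W];  /* stands in for the GTT-mapped fb */
	struct rect damage;           /* bounding box of pending writes */
	int dirty;
};

/* Grow the damage bounding box to include r. */
static void damage_add(struct shadow_fb *fb, struct rect r)
{
	if (!fb->dirty) {
		fb->damage = r;
		fb->dirty = 1;
		return;
	}
	if (r.x1 < fb->damage.x1)
		fb->damage.x1 = r.x1;
	if (r.y1 < fb->damage.y1)
		fb->damage.y1 = r.y1;
	if (r.x2 > fb->damage.x2)
		fb->damage.x2 = r.x2;
	if (r.y2 > fb->damage.y2)
		fb->damage.y2 = r.y2;
}

/* CPU writes touch only the shadow, so they are safe while suspended. */
static void shadow_write(struct shadow_fb *fb, int x, int y, uint32_t px)
{
	fb->shadow[y][x] = px;
	damage_add(fb, (struct rect){ x, y, x + 1, y + 1 });
}

/* Caller must hold a wakeref: copy the damaged region to the device
 * buffer and return the number of pixels copied. */
static int flush_damage(struct shadow_fb *fb)
{
	int w, copied = 0;

	if (!fb->dirty)
		return 0;
	w = fb->damage.x2 - fb->damage.x1;
	for (int y = fb->damage.y1; y < fb->damage.y2; y++) {
		memcpy(&fb->device[y][fb->damage.x1],
		       &fb->shadow[y][fb->damage.x1],
		       (size_t)w * sizeof(uint32_t));
		copied += w;
	}
	fb->dirty = 0;
	return copied;
}
```

A single bounding box over-copies when writes are far apart; a list of rectangles would trade bookkeeping for less upload traffic.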

Patch

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6f5b511..133321a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2119,6 +2119,9 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 			return ret;
 		}
 
+		/* Access through the GTT requires the device to be awake. */
+		assert_rpm_wakelock_held(dev_priv);
+
 		ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
 						    i915_gem_obj_ggtt_offset(obj), ringbuf->size);
 		if (ringbuf->virtual_start == NULL) {