[08/33] drm/i915/gt: Mark context->state vma as active while pinned

Message ID 20191212140459.1307617-8-chris@chris-wilson.co.uk
State New, archived
Series [01/33] drm/i915: Use EAGAIN for trylock failures

Commit Message

Chris Wilson Dec. 12, 2019, 2:04 p.m. UTC
As we use the active state to keep the vma alive while we are reading
its contents during GPU error capture, we need to mark the
context->state vma as active during execution if we want to include it
in the error state.

Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Lionel Landwerlin Dec. 12, 2019, 7:55 p.m. UTC | #1
On 12/12/2019 16:04, Chris Wilson wrote:
> As we use the active state to keep the vma alive while we are reading
> its contents during GPU error capture, we need to mark the
> context->state vma as active during execution if we want to include it
> in the error state.
>
> Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 61c39e943f69..f7e2f3af007a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -120,6 +120,10 @@ static int __context_pin_state(struct i915_vma *vma)
>   	if (err)
>   		return err;
>   
> +	err = i915_active_acquire(&vma->active);
> +	if (err)
> +		goto err_unpin;
> +
>   	/*
>   	 * And mark it as a globally pinned object to let the shrinker know
>   	 * it cannot reclaim the object until we release it.
> @@ -128,11 +132,16 @@ static int __context_pin_state(struct i915_vma *vma)
>   	vma->obj->mm.dirty = true;
>   
>   	return 0;
> +
> +err_unpin:
> +	i915_vma_unpin(vma);
> +	return err;
>   }
>   
>   static void __context_unpin_state(struct i915_vma *vma)
>   {
>   	i915_vma_make_shrinkable(vma);
> +	i915_active_release(&vma->active);
>   	__i915_vma_unpin(vma);
>   }
>
Lionel Landwerlin Dec. 12, 2019, 9:35 p.m. UTC | #2
On 12/12/2019 21:55, Lionel Landwerlin wrote:
> On 12/12/2019 16:04, Chris Wilson wrote:
>> As we use the active state to keep the vma alive while we are reading
>> its contents during GPU error capture, we need to mark the
>> context->state vma as active during execution if we want to include it
>> in the error state.
>>
>> Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex")
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> 


I'm wondering whether we want a test for this or some kind of assert in 
the error capture.

Something like an assert that if there is a batch, there must also be a
context/ring.


-Lionel
Chris Wilson Dec. 12, 2019, 9:46 p.m. UTC | #3
Quoting Lionel Landwerlin (2019-12-12 21:35:00)
> On 12/12/2019 21:55, Lionel Landwerlin wrote:
> > On 12/12/2019 16:04, Chris Wilson wrote:
> >> As we use the active state to keep the vma alive while we are reading
> >> its contents during GPU error capture, we need to mark the
> >> context->state vma as active during execution if we want to include it
> >> in the error state.
> >>
> >> Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> >> Fixes: b1e3177bd1d8 ("drm/i915: Coordinate i915_active with its own mutex")
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> > Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> 
> 
> 
> I'm wondering whether we want a test for this or some kind of assert in 
> the error capture.

I considered it, but it's not a fundamental part of the ABI; it's just
useful to have until we have something better. There are also a few valid
reasons why we might not be able to capture it, so it's best effort.
-Chris
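
The "best effort" behaviour described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `mock_state`, `mock_error_report` and `mock_capture` are hypothetical names invented for illustration and are not the i915 error-capture code. It assumes the capture path records the context state only when that state is marked active, and otherwise leaves the section out without treating it as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real i915 structures. */
struct mock_state {
	bool active;          /* models "vma is marked active" */
	const char *contents; /* models the context image */
};

struct mock_error_report {
	const char *batch;
	const char *context_state; /* NULL when it could not be captured */
};

/*
 * Best-effort capture: always record the batch, but only record the
 * context state if it is present and marked active. A missing context
 * state is a valid outcome here, not a failure.
 */
static void mock_capture(struct mock_error_report *report,
			 const char *batch,
			 const struct mock_state *state)
{
	assert(report != NULL);

	report->batch = batch;
	report->context_state =
		(state && state->active) ? state->contents : NULL;
}
```

In this model a report with a batch but no context state is still well formed, which is why a hard "batch implies context" assertion would not fit a best-effort capture.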
Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 61c39e943f69..f7e2f3af007a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -120,6 +120,10 @@  static int __context_pin_state(struct i915_vma *vma)
 	if (err)
 		return err;
 
+	err = i915_active_acquire(&vma->active);
+	if (err)
+		goto err_unpin;
+
 	/*
 	 * And mark it as a globally pinned object to let the shrinker know
 	 * it cannot reclaim the object until we release it.
@@ -128,11 +132,16 @@  static int __context_pin_state(struct i915_vma *vma)
 	vma->obj->mm.dirty = true;
 
 	return 0;
+
+err_unpin:
+	i915_vma_unpin(vma);
+	return err;
 }
 
 static void __context_unpin_state(struct i915_vma *vma)
 {
 	i915_vma_make_shrinkable(vma);
+	i915_active_release(&vma->active);
 	__i915_vma_unpin(vma);
 }
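
The control flow of the patch (pin, then acquire the active reference, undo the pin if the acquire fails, and release in reverse order on unpin) can be modelled in plain userspace C. This is a minimal sketch, not the driver code: every `mock_*` name is a hypothetical stand-in, the counters only approximate what the real `i915_vma` and `i915_active` helpers do, and the `shrinkable` flag is an assumption based on the comment in the hunk about the shrinker.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; NOT the real i915 structures. */
struct mock_active {
	int count;         /* number of outstanding acquires */
	bool fail_acquire; /* test knob: force the acquire to fail */
};

struct mock_vma {
	int pin_count;
	bool shrinkable;
	struct mock_active active;
};

static int mock_vma_pin(struct mock_vma *vma)
{
	vma->pin_count++;
	return 0;
}

static void mock_vma_unpin(struct mock_vma *vma)
{
	vma->pin_count--;
}

static int mock_active_acquire(struct mock_active *ref)
{
	if (ref->fail_acquire)
		return -ENOMEM;
	ref->count++;
	return 0;
}

static void mock_active_release(struct mock_active *ref)
{
	assert(ref->count > 0); /* releases must balance acquires */
	ref->count--;
}

/* Mirrors the shape of the patched pin path: pin, acquire, unwind on error. */
static int mock_context_pin_state(struct mock_vma *vma)
{
	int err;

	err = mock_vma_pin(vma);
	if (err)
		return err;

	err = mock_active_acquire(&vma->active);
	if (err)
		goto err_unpin;

	vma->shrinkable = false; /* assumed: pinned state must not be reclaimed */
	return 0;

err_unpin:
	mock_vma_unpin(vma);
	return err;
}

/* Mirrors the patched unpin path: undo the steps in reverse order. */
static void mock_context_unpin_state(struct mock_vma *vma)
{
	vma->shrinkable = true;
	mock_active_release(&vma->active);
	mock_vma_unpin(vma);
}
```

In this model, a successful pin leaves both `pin_count` and `active.count` at 1, and a failed acquire leaves both at 0, so the caller never sees a half-pinned state.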