diff mbox series

[v2,1/6] drm/i915: move the pre_pin earlier

Message ID 20211117142024.1043017-1-matthew.auld@intel.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/6] drm/i915: move the pre_pin earlier | expand

Commit Message

Matthew Auld Nov. 17, 2021, 2:20 p.m. UTC
In intel_context_do_pin_ww, when calling into the pre_pin hook(which is
passed the ww context) it could in theory return -EDEADLK(which is very
likely with debug kernels), once we start adding more ww locking in there,
like in the next patch. If so then we need to be mindful of having to
restart the do_pin at this point.

If this is the kernel_context, or some other early in-kernel context
where we have yet to setup the default_state, then we always inhibit the
context restore, and instead rely on the delayed active_release to set
the CONTEXT_VALID_BIT for us(if we even care), which should indicate
that we have context switched away, and that our newly saved context
state should now be valid. However, since we currently grab the active
reference before the potential ww dance, we can end up setting the
CONTEXT_VALID_BIT much too early, if we need to backoff, and then upon
re-trying the do_pin, we could potentially cause the hardware to
incorrectly load some garbage context state when later context switching
to that context, but at the very least this will trigger the
GEM_BUG_ON() in __engine_unpark. For now let's just move any ww dance
stuff prior to arming the active reference.

For normal user contexts this shouldn't be a concern, since we should
already have the default_state ready when initialising the lrc state,
and so there should be no concern with active_release somehow
prematurely setting the CONTEXT_VALID_BIT.

v2(Thomas):
  - Also re-order the union unwind

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Comments

Thomas Hellström Nov. 17, 2021, 6:49 p.m. UTC | #1
On 11/17/21 15:20, Matthew Auld wrote:
> In intel_context_do_pin_ww, when calling into the pre_pin hook(which is
> passed the ww context) it could in theory return -EDEADLK(which is very
> likely with debug kernels), once we start adding more ww locking in there,
> like in the next patch. If so then we need to be mindful of having to
> restart the do_pin at this point.
>
> If this is the kernel_context, or some other early in-kernel context
> where we have yet to setup the default_state, then we always inhibit the
> context restore, and instead rely on the delayed active_release to set
> the CONTEXT_VALID_BIT for us(if we even care), which should indicate
> that we have context switched away, and that our newly saved context
> state should now be valid. However, since we currently grab the active
> reference before the potential ww dance, we can end up setting the
> CONTEXT_VALID_BIT much too early, if we need to backoff, and then upon
> re-trying the do_pin, we could potentially cause the hardware to
> incorrectly load some garbage context state when later context switching
> to that context, but at the very least this will trigger the
> GEM_BUG_ON() in __engine_unpark. For now let's just move any ww dance
> stuff prior to arming the active reference.
>
> For normal user contexts this shouldn't be a concern, since we should
> already have the default_state ready when initialising the lrc state,
> and so there should be no concern with active_release somehow
> prematurely setting the CONTEXT_VALID_BIT.
>
> v2(Thomas):
>    - Also re-order the union unwind
>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>


> ---
>   drivers/gpu/drm/i915/gt/intel_context.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 5634d14052bc..4c296de1d67d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -228,17 +228,17 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
>   	if (err)
>   		return err;
>   
> -	err = i915_active_acquire(&ce->active);
> +	err = ce->ops->pre_pin(ce, ww, &vaddr);
>   	if (err)
>   		goto err_ctx_unpin;
>   
> -	err = ce->ops->pre_pin(ce, ww, &vaddr);
> +	err = i915_active_acquire(&ce->active);
>   	if (err)
> -		goto err_release;
> +		goto err_post_unpin;
>   
>   	err = mutex_lock_interruptible(&ce->pin_mutex);
>   	if (err)
> -		goto err_post_unpin;
> +		goto err_release;
>   
>   	intel_engine_pm_might_get(ce->engine);
>   
> @@ -273,11 +273,11 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
>   
>   err_unlock:
>   	mutex_unlock(&ce->pin_mutex);
> +err_release:
> +	i915_active_release(&ce->active);
>   err_post_unpin:
>   	if (!handoff)
>   		ce->ops->post_unpin(ce);
> -err_release:
> -	i915_active_release(&ce->active);
>   err_ctx_unpin:
>   	intel_context_post_unpin(ce);
>
Thomas Hellström Nov. 18, 2021, 6:57 a.m. UTC | #2
On Wed, 2021-11-17 at 19:49 +0100, Thomas Hellström wrote:
> 
> On 11/17/21 15:20, Matthew Auld wrote:
> > In intel_context_do_pin_ww, when calling into the pre_pin
> > hook(which is
> > passed the ww context) it could in theory return -EDEADLK(which is
> > very
> > likely with debug kernels), once we start adding more ww locking in
> > there,
> > like in the next patch. If so then we need to be mindful of having
> > to
> > restart the do_pin at this point.
> > 
> > If this is the kernel_context, or some other early in-kernel
> > context
> > where we have yet to setup the default_state, then we always
> > inhibit the
> > context restore, and instead rely on the delayed active_release to
> > set
> > the CONTEXT_VALID_BIT for us(if we even care), which should
> > indicate
> > that we have context switched away, and that our newly saved
> > context
> > state should now be valid. However, since we currently grab the
> > active
> > reference before the potential ww dance, we can end up setting the
> > CONTEXT_VALID_BIT much too early, if we need to backoff, and then
> > upon
> > re-trying the do_pin, we could potentially cause the hardware to
> > incorrectly load some garbage context state when later context
> > switching
> > to that context, but at the very least this will trigger the
> > GEM_BUG_ON() in __engine_unpark. For now let's just move any ww
> > dance
> > stuff prior to arming the active reference.
> > 
> > For normal user contexts this shouldn't be a concern, since we
> > should
> > already have the default_state ready when initialising the lrc
> > state,
> > and so there should be no concern with active_release somehow
> > prematurely setting the CONTEXT_VALID_BIT.
> > 
> > v2(Thomas):
> >    - Also re-order the union unwind

Oh should this be 

s/union/onion/ ?


> > 
> > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> 
> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> 
> 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context.c | 12 ++++++------
> >   1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index 5634d14052bc..4c296de1d67d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -228,17 +228,17 @@ int __intel_context_do_pin_ww(struct
> > intel_context *ce,
> >         if (err)
> >                 return err;
> >   
> > -       err = i915_active_acquire(&ce->active);
> > +       err = ce->ops->pre_pin(ce, ww, &vaddr);
> >         if (err)
> >                 goto err_ctx_unpin;
> >   
> > -       err = ce->ops->pre_pin(ce, ww, &vaddr);
> > +       err = i915_active_acquire(&ce->active);
> >         if (err)
> > -               goto err_release;
> > +               goto err_post_unpin;
> >   
> >         err = mutex_lock_interruptible(&ce->pin_mutex);
> >         if (err)
> > -               goto err_post_unpin;
> > +               goto err_release;
> >   
> >         intel_engine_pm_might_get(ce->engine);
> >   
> > @@ -273,11 +273,11 @@ int __intel_context_do_pin_ww(struct
> > intel_context *ce,
> >   
> >   err_unlock:
> >         mutex_unlock(&ce->pin_mutex);
> > +err_release:
> > +       i915_active_release(&ce->active);
> >   err_post_unpin:
> >         if (!handoff)
> >                 ce->ops->post_unpin(ce);
> > -err_release:
> > -       i915_active_release(&ce->active);
> >   err_ctx_unpin:
> >         intel_context_post_unpin(ce);
> >
Matthew Auld Nov. 18, 2021, 9:58 a.m. UTC | #3
On 18/11/2021 06:57, Thomas Hellström wrote:
> On Wed, 2021-11-17 at 19:49 +0100, Thomas Hellström wrote:
>>
>> On 11/17/21 15:20, Matthew Auld wrote:
>>> In intel_context_do_pin_ww, when calling into the pre_pin
>>> hook(which is
>>> passed the ww context) it could in theory return -EDEADLK(which is
>>> very
>>> likely with debug kernels), once we start adding more ww locking in
>>> there,
>>> like in the next patch. If so then we need to be mindful of having
>>> to
>>> restart the do_pin at this point.
>>>
>>> If this is the kernel_context, or some other early in-kernel
>>> context
>>> where we have yet to setup the default_state, then we always
>>> inhibit the
>>> context restore, and instead rely on the delayed active_release to
>>> set
>>> the CONTEXT_VALID_BIT for us(if we even care), which should
>>> indicate
>>> that we have context switched away, and that our newly saved
>>> context
>>> state should now be valid. However, since we currently grab the
>>> active
>>> reference before the potential ww dance, we can end up setting the
>>> CONTEXT_VALID_BIT much too early, if we need to backoff, and then
>>> upon
>>> re-trying the do_pin, we could potentially cause the hardware to
>>> incorrectly load some garbage context state when later context
>>> switching
>>> to that context, but at the very least this will trigger the
>>> GEM_BUG_ON() in __engine_unpark. For now let's just move any ww
>>> dance
>>> stuff prior to arming the active reference.
>>>
>>> For normal user contexts this shouldn't be a concern, since we
>>> should
>>> already have the default_state ready when initialising the lrc
>>> state,
>>> and so there should be no concern with active_release somehow
>>> prematurely setting the CONTEXT_VALID_BIT.
>>>
>>> v2(Thomas):
>>>     - Also re-order the union unwind
> 
> Oh should this be
> 
> s/union/onion/ ?

Oops, will fixup when pushing :)

> 
> 
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>
>> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>
>>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_context.c | 12 ++++++------
>>>    1 file changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c
>>> b/drivers/gpu/drm/i915/gt/intel_context.c
>>> index 5634d14052bc..4c296de1d67d 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
>>> @@ -228,17 +228,17 @@ int __intel_context_do_pin_ww(struct
>>> intel_context *ce,
>>>          if (err)
>>>                  return err;
>>>    
>>> -       err = i915_active_acquire(&ce->active);
>>> +       err = ce->ops->pre_pin(ce, ww, &vaddr);
>>>          if (err)
>>>                  goto err_ctx_unpin;
>>>    
>>> -       err = ce->ops->pre_pin(ce, ww, &vaddr);
>>> +       err = i915_active_acquire(&ce->active);
>>>          if (err)
>>> -               goto err_release;
>>> +               goto err_post_unpin;
>>>    
>>>          err = mutex_lock_interruptible(&ce->pin_mutex);
>>>          if (err)
>>> -               goto err_post_unpin;
>>> +               goto err_release;
>>>    
>>>          intel_engine_pm_might_get(ce->engine);
>>>    
>>> @@ -273,11 +273,11 @@ int __intel_context_do_pin_ww(struct
>>> intel_context *ce,
>>>    
>>>    err_unlock:
>>>          mutex_unlock(&ce->pin_mutex);
>>> +err_release:
>>> +       i915_active_release(&ce->active);
>>>    err_post_unpin:
>>>          if (!handoff)
>>>                  ce->ops->post_unpin(ce);
>>> -err_release:
>>> -       i915_active_release(&ce->active);
>>>    err_ctx_unpin:
>>>          intel_context_post_unpin(ce);
>>>    
> 
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 5634d14052bc..4c296de1d67d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -228,17 +228,17 @@  int __intel_context_do_pin_ww(struct intel_context *ce,
 	if (err)
 		return err;
 
-	err = i915_active_acquire(&ce->active);
+	err = ce->ops->pre_pin(ce, ww, &vaddr);
 	if (err)
 		goto err_ctx_unpin;
 
-	err = ce->ops->pre_pin(ce, ww, &vaddr);
+	err = i915_active_acquire(&ce->active);
 	if (err)
-		goto err_release;
+		goto err_post_unpin;
 
 	err = mutex_lock_interruptible(&ce->pin_mutex);
 	if (err)
-		goto err_post_unpin;
+		goto err_release;
 
 	intel_engine_pm_might_get(ce->engine);
 
@@ -273,11 +273,11 @@  int __intel_context_do_pin_ww(struct intel_context *ce,
 
 err_unlock:
 	mutex_unlock(&ce->pin_mutex);
+err_release:
+	i915_active_release(&ce->active);
 err_post_unpin:
 	if (!handoff)
 		ce->ops->post_unpin(ce);
-err_release:
-	i915_active_release(&ce->active);
 err_ctx_unpin:
 	intel_context_post_unpin(ce);