diff mbox series

drm/i915/selftest: If reconfigure_sseu is busy, try again

Message ID 20191126164712.2802694-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series drm/i915/selftest: If reconfigure_sseu is busy, try again | expand

Commit Message

Chris Wilson Nov. 26, 2019, 4:47 p.m. UTC
Following 58b4c1a07ada ("drm/i915: Reduce nested prepare_remote_context()
to a trylock"), we punt to the caller if the local intel_context
happens to be busy as we try to rewrite the sseu (due to retiring in
another thread). As the interlude should be short, spin until the lock
is available.

The regret for using mutex_trylock() and not an atomic insertion of the
barrier is growing...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Tvrtko Ursulin Nov. 26, 2019, 5 p.m. UTC | #1
On 26/11/2019 16:47, Chris Wilson wrote:
> Following 58b4c1a07ada ("drm/i915: Reduce nested prepare_remote_context()
> to a trylock"), we punt to the caller if the local intel_context
> happens to be busy as we try to rewrite the sseu (due to retiring in
> another thread). As the interlude should be short, spin until the lock
> is available.
> 
> The regret for using mutex_trylock() and not an atomic insertion of the
> barrier is growing...
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> index 2ea4790f3721..571cc996577c 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> @@ -1197,7 +1197,10 @@ __sseu_test(const char *name,
>   	if (ret)
>   		goto out_pm;
>   
> -	ret = intel_context_reconfigure_sseu(ce, sseu);
> +	do {
> +		ret = intel_context_reconfigure_sseu(ce, sseu);
> +		cond_resched();
> +	} while (ret == -EAGAIN);
>   	if (ret)
>   		goto out_spin;
>   
> 

Hm I looked at the selftests, saw error is correctly propagated, and 
concluded it will be fine. I missed the problem selftests will not 
actually retry. But wait, can we even count that userspace will if all 
of a sudden ctx.set_param starts returning -EAGAIN sporadically? Feels 
like we may need to revert.

Regards,

Tvrtko
Chris Wilson Nov. 26, 2019, 5:05 p.m. UTC | #2
Quoting Tvrtko Ursulin (2019-11-26 17:00:53)
> 
> On 26/11/2019 16:47, Chris Wilson wrote:
> > Following 58b4c1a07ada ("drm/i915: Reduce nested prepare_remote_context()
> > to a trylock"), we punt to the caller if the local intel_context
> > happens to be busy as we try to rewrite the sseu (due to retiring in
> > another thread). As the interlude should be short, spin until the lock
> > is available.
> > 
> > The regret for using mutex_trylock() and not an atomic insertion of the
> > barrier is growing...
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 5 ++++-
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> > index 2ea4790f3721..571cc996577c 100644
> > --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> > @@ -1197,7 +1197,10 @@ __sseu_test(const char *name,
> >       if (ret)
> >               goto out_pm;
> >   
> > -     ret = intel_context_reconfigure_sseu(ce, sseu);
> > +     do {
> > +             ret = intel_context_reconfigure_sseu(ce, sseu);
> > +             cond_resched();
> > +     } while (ret == -EAGAIN);
> >       if (ret)
> >               goto out_spin;
> >   
> > 
> 
> Hm I looked at the selftests, saw error is correctly propagated, and 
> concluded it will be fine. I missed the problem selftests will not 
> actually retry. But wait, can we even count that userspace will if all 
> of a sudden ctx.set_param starts returning -EAGAIN sporadically? Feels 
> like we may need to revert.

We invoke the principle of drmIoctl() catches -EAGAIN.
-Chris
Tvrtko Ursulin Nov. 26, 2019, 5:08 p.m. UTC | #3
On 26/11/2019 17:05, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-11-26 17:00:53)
>>
>> On 26/11/2019 16:47, Chris Wilson wrote:
>>> Following 58b4c1a07ada ("drm/i915: Reduce nested prepare_remote_context()
>>> to a trylock"), we punt to the caller if the local intel_context
>>> happens to be busy as we try to rewrite the sseu (due to retiring in
>>> another thread). As the interlude should be short, spin until the lock
>>> is available.
>>>
>>> The regret for using mutex_trylock() and not an atomic insertion of the
>>> barrier is growing...
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 5 ++++-
>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>>> index 2ea4790f3721..571cc996577c 100644
>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>>> @@ -1197,7 +1197,10 @@ __sseu_test(const char *name,
>>>        if (ret)
>>>                goto out_pm;
>>>    
>>> -     ret = intel_context_reconfigure_sseu(ce, sseu);
>>> +     do {
>>> +             ret = intel_context_reconfigure_sseu(ce, sseu);
>>> +             cond_resched();
>>> +     } while (ret == -EAGAIN);
>>>        if (ret)
>>>                goto out_spin;
>>>    
>>>
>>
>> Hm I looked at the selftests, saw error is correctly propagated, and
>> concluded it will be fine. I missed the problem selftests will not
>> actually retry. But wait, can we even count that userspace will if all
>> of a sudden ctx.set_param starts returning -EAGAIN sporadically? Feels
>> like we may need to revert.
> 
> We invoke the principle of drmIoctl() catches -EAGAIN.

I'm not comfortable with that. :( Not least how we are saying not to use 
libdrm. man 2 ioctl does not mention -EAGAIN. :(

Regards,

Tvrtko
Tvrtko Ursulin Nov. 26, 2019, 5:11 p.m. UTC | #4
On 26/11/2019 17:08, Tvrtko Ursulin wrote:
> 
> On 26/11/2019 17:05, Chris Wilson wrote:
>> Quoting Tvrtko Ursulin (2019-11-26 17:00:53)
>>>
>>> On 26/11/2019 16:47, Chris Wilson wrote:
>>>> Following 58b4c1a07ada ("drm/i915: Reduce nested 
>>>> prepare_remote_context()
>>>> to a trylock"), we punt to the caller if the local intel_context
>>>> happens to be busy as we try to rewrite the sseu (due to retiring in
>>>> another thread). As the interlude should be short, spin until the lock
>>>> is available.
>>>>
>>>> The regret for using mutex_trylock() and not an atomic insertion of the
>>>> barrier is growing...
>>>>
>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 5 ++++-
>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
>>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>>>> index 2ea4790f3721..571cc996577c 100644
>>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>>>> @@ -1197,7 +1197,10 @@ __sseu_test(const char *name,
>>>>        if (ret)
>>>>                goto out_pm;
>>>> -     ret = intel_context_reconfigure_sseu(ce, sseu);
>>>> +     do {
>>>> +             ret = intel_context_reconfigure_sseu(ce, sseu);
>>>> +             cond_resched();
>>>> +     } while (ret == -EAGAIN);
>>>>        if (ret)
>>>>                goto out_spin;
>>>>
>>>
>>> Hm I looked at the selftests, saw error is correctly propagated, and
>>> concluded it will be fine. I missed the problem selftests will not
>>> actually retry. But wait, can we even count that userspace will if all
>>> of a sudden ctx.set_param starts returning -EAGAIN sporadically? Feels
>>> like we may need to revert.
>>
>> We invoke the principle of drmIoctl() catches -EAGAIN.
> 
> I'm not comfortable with that. :( Not least how we are saying not to use 
> libdrm. man 2 ioctl does not mention -EAGAIN. :(

Or duct tape by looping in set_sseu as well?

Regards,

Tvrtko
Chris Wilson Nov. 26, 2019, 5:19 p.m. UTC | #5
Quoting Tvrtko Ursulin (2019-11-26 17:11:04)
> 
> On 26/11/2019 17:08, Tvrtko Ursulin wrote:
> > 
> > On 26/11/2019 17:05, Chris Wilson wrote:
> >> Quoting Tvrtko Ursulin (2019-11-26 17:00:53)
> >>>
> >>> On 26/11/2019 16:47, Chris Wilson wrote:
> >>>> Following 58b4c1a07ada ("drm/i915: Reduce nested 
> >>>> prepare_remote_context()
> >>>> to a trylock"), we punt to the caller if the local intel_context
> >>>> happens to be busy as we try to rewrite the sseu (due to retiring in
> >>>> another thread). As the interlude should be short, spin until the lock
> >>>> is available.
> >>>>
> >>>> The regret for using mutex_trylock() and not an atomic insertion of the
> >>>> barrier is growing...
> >>>>
> >>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 5 ++++-
> >>>>    1 file changed, 4 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
> >>>> b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> >>>> index 2ea4790f3721..571cc996577c 100644
> >>>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> >>>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> >>>> @@ -1197,7 +1197,10 @@ __sseu_test(const char *name,
> >>>>        if (ret)
> >>>>                goto out_pm;
> >>>> -     ret = intel_context_reconfigure_sseu(ce, sseu);
> >>>> +     do {
> >>>> +             ret = intel_context_reconfigure_sseu(ce, sseu);
> >>>> +             cond_resched();
> >>>> +     } while (ret == -EAGAIN);
> >>>>        if (ret)
> >>>>                goto out_spin;
> >>>>
> >>>
> >>> Hm I looked at the selftests, saw error is correctly propagated, and
> >>> concluded it will be fine. I missed the problem selftests will not
> >>> actually retry. But wait, can we even count that userspace will if all
> >>> of a sudden ctx.set_param starts returning -EAGAIN sporadically? Feels
> >>> like we may need to revert.
> >>
> >> We invoke the principle of drmIoctl() catches -EAGAIN.
> > 
> > I'm not comfortable with that. :( Not least how we are saying not to use 
> > libdrm. man 2 ioctl does not mention -EAGAIN. :(
> 
> Or duct tape by looping in set_sseu as well?

Could do, the actual atomic insertion doesn't look to horrendous (albeit
yet another atomic op along that path), but will take a fair amount of
rationalising. Certainly another cup of tea worth.

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index dca15ace88f6..181c6a3aaafa 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -753,11 +753,13 @@ __i915_active_fence_set(struct i915_active_fence *active,
        struct dma_fence *prev;
        unsigned long flags;

-       /* NB: must be serialised by an outer timeline mutex (active->lock) */
+       if (fence == rcu_access_pointer(active->fence))
+               return fence;
+
        spin_lock_irqsave(fence->lock, flags);
        GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));

-       prev = rcu_dereference_protected(active->fence, active_is_held(active));
+       prev = xchg((struct dma_fence ** __force)&active->fence, fence);
        if (prev) {
                GEM_BUG_ON(prev == fence);
                spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
@@ -775,12 +777,13 @@ __i915_active_fence_set(struct i915_active_fence *active,
                 * our list_del() [decoupling prev from the callback] or
                 * the callback...
                 */
-               prev = rcu_access_pointer(active->fence);
+               if (!rcu_access_pointer(active->fence)) {
+                       RCU_INIT_POINTER(active->fence, fence);
+                       prev = NULL;
+               }
        }

-       rcu_assign_pointer(active->fence, fence);
        list_add_tail(&active->cb.node, &fence->cb_list);
-
        spin_unlock_irqrestore(fence->lock, flags);

        return prev;

-Chris
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index 2ea4790f3721..571cc996577c 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -1197,7 +1197,10 @@  __sseu_test(const char *name,
 	if (ret)
 		goto out_pm;
 
-	ret = intel_context_reconfigure_sseu(ce, sseu);
+	do {
+		ret = intel_context_reconfigure_sseu(ce, sseu);
+		cond_resched();
+	} while (ret == -EAGAIN);
 	if (ret)
 		goto out_spin;