diff mbox

drm/i915: Always retire residual requests before suspend

Message ID 20180717084121.28185-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson July 17, 2018, 8:41 a.m. UTC
If the driver is wedged, we skip idling the GPU. However, we may still
have a few requests still not retired following the wedging (since they
will be waiting for a background worker trying to acquire struct_mutex).
As we hold the struct_mutex, always do a quick request retirement in
order to flush the wedged path.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107257
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Tvrtko Ursulin July 18, 2018, 12:53 p.m. UTC | #1
On 17/07/2018 09:41, Chris Wilson wrote:
> If the driver is wedged, we skip idling the GPU. However, we may still
> have a few requests still not retired following the wedging (since they
> will be waiting for a background worker trying to acquire struct_mutex).
> As we hold the struct_mutex, always do a quick request retirement in
> order to flush the wedged path.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107257
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 42d24410a98c..cc875e1dc7f6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5074,6 +5074,8 @@ int i915_gem_suspend(struct drm_i915_private *i915)
>   
>   		assert_kernel_context_is_current(i915);
>   	}
> +	i915_retire_requests(i915); /* ensure we flush after wedging */
> +

We cannot do this in i915_gem_set_wedged due not having the mutex?

I think it should go in an else block of the !terminally_wedged block to 
signify the alternative idling method for that case. And also to make 
sure the !terminally_wedged case does not start relying on this extra 
retire pass.

Or alternative teach i915_gem_wait_for_idle how to handle the wedged 
case and only make switching to kernel context dependant on 
terminally_wedged status in i915_gem_suspend?

Simple else block sounds good enough to me. For that option:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

>   	mutex_unlock(&i915->drm.struct_mutex);
>   
>   	intel_uc_suspend(i915);
>
Chris Wilson July 18, 2018, 1:25 p.m. UTC | #2
Quoting Tvrtko Ursulin (2018-07-18 13:53:16)
> 
> On 17/07/2018 09:41, Chris Wilson wrote:
> > If the driver is wedged, we skip idling the GPU. However, we may still
> > have a few requests still not retired following the wedging (since they
> > will be waiting for a background worker trying to acquire struct_mutex).
> > As we hold the struct_mutex, always do a quick request retirement in
> > order to flush the wedged path.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107257
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 2 ++
> >   1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 42d24410a98c..cc875e1dc7f6 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -5074,6 +5074,8 @@ int i915_gem_suspend(struct drm_i915_private *i915)
> >   
> >               assert_kernel_context_is_current(i915);
> >       }
> > +     i915_retire_requests(i915); /* ensure we flush after wedging */
> > +
> 
> We cannot do this in i915_gem_set_wedged due not having the mutex?

Correct.

> I think it should go in an else block of the !terminally_wedged block to 
> signify the alternative idling method for that case. And also to make 
> sure the !terminally_wedged case does not start relying on this extra 
> retire pass.

I liked the safety net and clarity of not making it conditional.

> Or alternative teach i915_gem_wait_for_idle how to handle the wedged 
> case and only make switching to kernel context dependant on 
> terminally_wedged status in i915_gem_suspend?

Right, it's the checks that actually worry here. Those depend on the gpu
being in a fairly sane state...
-Chris
Tvrtko Ursulin July 18, 2018, 2:57 p.m. UTC | #3
On 18/07/2018 14:25, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-07-18 13:53:16)
>>
>> On 17/07/2018 09:41, Chris Wilson wrote:
>>> If the driver is wedged, we skip idling the GPU. However, we may still
>>> have a few requests still not retired following the wedging (since they
>>> will be waiting for a background worker trying to acquire struct_mutex).
>>> As we hold the struct_mutex, always do a quick request retirement in
>>> order to flush the wedged path.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107257
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem.c | 2 ++
>>>    1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 42d24410a98c..cc875e1dc7f6 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -5074,6 +5074,8 @@ int i915_gem_suspend(struct drm_i915_private *i915)
>>>    
>>>                assert_kernel_context_is_current(i915);
>>>        }
>>> +     i915_retire_requests(i915); /* ensure we flush after wedging */
>>> +
>>
>> We cannot do this in i915_gem_set_wedged due not having the mutex?
> 
> Correct.
> 
>> I think it should go in an else block of the !terminally_wedged block to
>> signify the alternative idling method for that case. And also to make
>> sure the !terminally_wedged case does not start relying on this extra
>> retire pass.
> 
> I liked the safety net and clarity of not making it conditional.

And I disliked the sweeping under the carpet potential of it. :)

>> Or alternative teach i915_gem_wait_for_idle how to handle the wedged
>> case and only make switching to kernel context dependant on
>> terminally_wedged status in i915_gem_suspend?
> 
> Right, it's the checks that actually worry here. Those depend on the gpu
> being in a fairly sane state...

What checks? i915_terminally_wedged cannot be. Request retirement itself?

Regards,

Tvrtko
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 42d24410a98c..cc875e1dc7f6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5074,6 +5074,8 @@  int i915_gem_suspend(struct drm_i915_private *i915)
 
 		assert_kernel_context_is_current(i915);
 	}
+	i915_retire_requests(i915); /* ensure we flush after wedging */
+
 	mutex_unlock(&i915->drm.struct_mutex);
 
 	intel_uc_suspend(i915);