diff mbox series

[07/14] drm/i915: Stop spinning for DROP_IDLE (debugfs/i915_drop_caches)

Message ID 20190501114541.10077-7-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series [01/14] drm/i915/hangcheck: Track context changes | expand

Commit Message

Chris Wilson May 1, 2019, 11:45 a.m. UTC
If the user is racing a call to debugfs/i915_drop_caches with ongoing
submission from another thread/process, we may never end up idling the
GPU and be uninterruptibly spinning in debugfs/i915_drop_caches trying
to catch an idle moment.

Just flush the work once, that should be enough to park the system under
correct conditions. Outside of those we either have a driver bug or the
user is racing themselves. Sadly, because the user may be provoking the
unwanted situation we can't put a warn here to attract attention to a
probable bug.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

Comments

Tvrtko Ursulin May 2, 2019, 1:34 p.m. UTC | #1
On 01/05/2019 12:45, Chris Wilson wrote:
> If the user is racing a call to debugfs/i915_drop_caches with ongoing
> submission from another thread/process, we may never end up idling the
> GPU and be uninterruptibly spinning in debugfs/i915_drop_caches trying
> to catch an idle moment.
> 
> Just flush the work once, that should be enough to park the system under
> correct conditions. Outside of those we either have a driver bug or the
> user is racing themselves. Sadly, because the user may be provoking the
> unwanted situation we can't put a warn here to attract attention to a
> probable bug.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 7e8898d0b78b..2ecefacb1e66 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3933,9 +3933,7 @@ i915_drop_caches_set(void *data, u64 val)
>   	fs_reclaim_release(GFP_KERNEL);
>   
>   	if (val & DROP_IDLE) {
> -		do {
> -			flush_delayed_work(&i915->gem.retire_work);
> -		} while (READ_ONCE(i915->gt.awake));
> +		flush_delayed_work(&i915->gem.retire_work);
>   		flush_work(&i915->gem.idle_work);
>   	}
>   
> 

What were supposed to be semantics of DROP_IDLE? Now it seems rather 
weak. Should it for instance also imply DROP_ACTIVE?

Regards,

Tvrtko
Chris Wilson May 2, 2019, 2 p.m. UTC | #2
Quoting Tvrtko Ursulin (2019-05-02 14:34:11)
> 
> On 01/05/2019 12:45, Chris Wilson wrote:
> > If the user is racing a call to debugfs/i915_drop_caches with ongoing
> > submission from another thread/process, we may never end up idling the
> > GPU and be uninterruptibly spinning in debugfs/i915_drop_caches trying
> > to catch an idle moment.
> > 
> > Just flush the work once, that should be enough to park the system under
> > correct conditions. Outside of those we either have a driver bug or the
> > user is racing themselves. Sadly, because the user may be provoking the
> > unwanted situation we can't put a warn here to attract attention to a
> > probable bug.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_debugfs.c | 4 +---
> >   1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 7e8898d0b78b..2ecefacb1e66 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -3933,9 +3933,7 @@ i915_drop_caches_set(void *data, u64 val)
> >       fs_reclaim_release(GFP_KERNEL);
> >   
> >       if (val & DROP_IDLE) {
> > -             do {
> > -                     flush_delayed_work(&i915->gem.retire_work);
> > -             } while (READ_ONCE(i915->gt.awake));
> > +             flush_delayed_work(&i915->gem.retire_work);
> >               flush_work(&i915->gem.idle_work);
> >       }
> >   
> > 
> 
> What were supposed to be semantics of DROP_IDLE? Now it seems rather 
> weak. Should it for instance also imply DROP_ACTIVE?

All I need for DROP_IDLE is that the idle worker is flushed. I've always
assumed you would pass in DROP_ACTIVE | DROP_RETIRE | DROP_IDLE as the
trifecta.

The biggest problem here is that's it is an uninterruptible loop.
-Chris
Tvrtko Ursulin May 2, 2019, 2:16 p.m. UTC | #3
On 02/05/2019 15:00, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-05-02 14:34:11)
>>
>> On 01/05/2019 12:45, Chris Wilson wrote:
>>> If the user is racing a call to debugfs/i915_drop_caches with ongoing
>>> submission from another thread/process, we may never end up idling the
>>> GPU and be uninterruptibly spinning in debugfs/i915_drop_caches trying
>>> to catch an idle moment.
>>>
>>> Just flush the work once, that should be enough to park the system under
>>> correct conditions. Outside of those we either have a driver bug or the
>>> user is racing themselves. Sadly, because the user may be provoking the
>>> unwanted situation we can't put a warn here to attract attention to a
>>> probable bug.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/i915_debugfs.c | 4 +---
>>>    1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index 7e8898d0b78b..2ecefacb1e66 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -3933,9 +3933,7 @@ i915_drop_caches_set(void *data, u64 val)
>>>        fs_reclaim_release(GFP_KERNEL);
>>>    
>>>        if (val & DROP_IDLE) {
>>> -             do {
>>> -                     flush_delayed_work(&i915->gem.retire_work);
>>> -             } while (READ_ONCE(i915->gt.awake));
>>> +             flush_delayed_work(&i915->gem.retire_work);
>>>                flush_work(&i915->gem.idle_work);
>>>        }
>>>    
>>>
>>
>> What were supposed to be semantics of DROP_IDLE? Now it seems rather
>> weak. Should it for instance also imply DROP_ACTIVE?
> 
> All I need for DROP_IDLE is that the idle worker is flushed. I've always
> assumed you would pass in DROP_ACTIVE | DROP_RETIRE | DROP_IDLE as the
> trifecta.
> 
> The biggest problem here is that's it is an uninterruptible loop.

Okay.. lets see if IGT is playing ball. :)

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 7e8898d0b78b..2ecefacb1e66 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3933,9 +3933,7 @@  i915_drop_caches_set(void *data, u64 val)
 	fs_reclaim_release(GFP_KERNEL);
 
 	if (val & DROP_IDLE) {
-		do {
-			flush_delayed_work(&i915->gem.retire_work);
-		} while (READ_ONCE(i915->gt.awake));
+		flush_delayed_work(&i915->gem.retire_work);
 		flush_work(&i915->gem.idle_work);
 	}