Message ID | 20190501114541.10077-7-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/14] drm/i915/hangcheck: Track context changes | expand |
On 01/05/2019 12:45, Chris Wilson wrote: > If the user is racing a call to debugfs/i915_drop_caches with ongoing > submission from another thread/process, we may never end up idling the > GPU and be uninterruptibly spinning in debugfs/i915_drop_caches trying > to catch an idle moment. > > Just flush the work once, that should be enough to park the system under > correct conditions. Outside of those we either have a driver bug or the > user is racing themselves. Sadly, because the user may be provoking the > unwanted situation we can't put a warn here to attract attention to a > probable bug. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_debugfs.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c > index 7e8898d0b78b..2ecefacb1e66 100644 > --- a/drivers/gpu/drm/i915/i915_debugfs.c > +++ b/drivers/gpu/drm/i915/i915_debugfs.c > @@ -3933,9 +3933,7 @@ i915_drop_caches_set(void *data, u64 val) > fs_reclaim_release(GFP_KERNEL); > > if (val & DROP_IDLE) { > - do { > - flush_delayed_work(&i915->gem.retire_work); > - } while (READ_ONCE(i915->gt.awake)); > + flush_delayed_work(&i915->gem.retire_work); > flush_work(&i915->gem.idle_work); > } > > What were supposed to be semantics of DROP_IDLE? Now it seems rather weak. Should it for instance also imply DROP_ACTIVE? Regards, Tvrtko
Quoting Tvrtko Ursulin (2019-05-02 14:34:11) > > On 01/05/2019 12:45, Chris Wilson wrote: > > If the user is racing a call to debugfs/i915_drop_caches with ongoing > > submission from another thread/process, we may never end up idling the > > GPU and be uninterruptibly spinning in debugfs/i915_drop_caches trying > > to catch an idle moment. > > > > Just flush the work once, that should be enough to park the system under > > correct conditions. Outside of those we either have a driver bug or the > > user is racing themselves. Sadly, because the user may be provoking the > > unwanted situation we can't put a warn here to attract attention to a > > probable bug. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/i915_debugfs.c | 4 +--- > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c > > index 7e8898d0b78b..2ecefacb1e66 100644 > > --- a/drivers/gpu/drm/i915/i915_debugfs.c > > +++ b/drivers/gpu/drm/i915/i915_debugfs.c > > @@ -3933,9 +3933,7 @@ i915_drop_caches_set(void *data, u64 val) > > fs_reclaim_release(GFP_KERNEL); > > > > if (val & DROP_IDLE) { > > - do { > > - flush_delayed_work(&i915->gem.retire_work); > > - } while (READ_ONCE(i915->gt.awake)); > > + flush_delayed_work(&i915->gem.retire_work); > > flush_work(&i915->gem.idle_work); > > } > > > > > > What were supposed to be semantics of DROP_IDLE? Now it seems rather > weak. Should it for instance also imply DROP_ACTIVE? All I need for DROP_IDLE is that the idle worker is flushed. I've always assumed you would pass in DROP_ACTIVE | DROP_RETIRE | DROP_IDLE as the trifecta. The biggest problem here is that's it is an uninterruptible loop. -Chris
On 02/05/2019 15:00, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2019-05-02 14:34:11) >> >> On 01/05/2019 12:45, Chris Wilson wrote: >>> If the user is racing a call to debugfs/i915_drop_caches with ongoing >>> submission from another thread/process, we may never end up idling the >>> GPU and be uninterruptibly spinning in debugfs/i915_drop_caches trying >>> to catch an idle moment. >>> >>> Just flush the work once, that should be enough to park the system under >>> correct conditions. Outside of those we either have a driver bug or the >>> user is racing themselves. Sadly, because the user may be provoking the >>> unwanted situation we can't put a warn here to attract attention to a >>> probable bug. >>> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>> --- >>> drivers/gpu/drm/i915/i915_debugfs.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c >>> index 7e8898d0b78b..2ecefacb1e66 100644 >>> --- a/drivers/gpu/drm/i915/i915_debugfs.c >>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c >>> @@ -3933,9 +3933,7 @@ i915_drop_caches_set(void *data, u64 val) >>> fs_reclaim_release(GFP_KERNEL); >>> >>> if (val & DROP_IDLE) { >>> - do { >>> - flush_delayed_work(&i915->gem.retire_work); >>> - } while (READ_ONCE(i915->gt.awake)); >>> + flush_delayed_work(&i915->gem.retire_work); >>> flush_work(&i915->gem.idle_work); >>> } >>> >>> >> >> What were supposed to be semantics of DROP_IDLE? Now it seems rather >> weak. Should it for instance also imply DROP_ACTIVE? > > All I need for DROP_IDLE is that the idle worker is flushed. I've always > assumed you would pass in DROP_ACTIVE | DROP_RETIRE | DROP_IDLE as the > trifecta. > > The biggest problem here is that's it is an uninterruptible loop. Okay.. lets see if IGT is playing ball. :) Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Regards, Tvrtko
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 7e8898d0b78b..2ecefacb1e66 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -3933,9 +3933,7 @@ i915_drop_caches_set(void *data, u64 val) fs_reclaim_release(GFP_KERNEL); if (val & DROP_IDLE) { - do { - flush_delayed_work(&i915->gem.retire_work); - } while (READ_ONCE(i915->gt.awake)); + flush_delayed_work(&i915->gem.retire_work); flush_work(&i915->gem.idle_work); }
If the user is racing a call to debugfs/i915_drop_caches with ongoing submission from another thread/process, we may never end up idling the GPU and be uninterruptibly spinning in debugfs/i915_drop_caches trying to catch an idle moment. Just flush the work once, that should be enough to park the system under correct conditions. Outside of those we either have a driver bug or the user is racing themselves. Sadly, because the user may be provoking the unwanted situation we can't put a warn here to attract attention to a probable bug. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_debugfs.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)