drm/i915: During shrink_all we only need to idle the GPU

Message ID 1435493199-26869-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson June 28, 2015, 12:06 p.m. UTC
We can forgo an evict-everything here as the shrinker operation itself
will unbind any vma as required. If we explicitly idle the GPU through a
switch to the default context, we not only create a request in an
illegal context (e.g. whilst shrinking during execbuf with a request
already allocated), but switching to the default context will not free
up the memory backing the active contexts - unless in the unlikely
situation that context had already been closed (and just kept alive by
being the current context). The saving is near zero and the danger real.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_shrinker.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Comments

Daniel Vetter June 29, 2015, 7:13 a.m. UTC | #1
On Sun, Jun 28, 2015 at 01:06:39PM +0100, Chris Wilson wrote:
> We can forgo an evict-everything here as the shrinker operation itself
> will unbind any vma as required. If we explicitly idle the GPU through a
> switch to the default context, we not only create a request in an
> illegal context (e.g. whilst shrinking during execbuf with a request
> already allocated), but switching to the default context will not free
> up the memory backing the active contexts - unless in the unlikely
> situation that context had already been closed (and just kept alive by
> being the current context). The saving is near zero and the danger real.

Has this already blown up in some bugzilla somewhere? Should be a fairly
recent regression with the olr removal.

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_shrinker.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index c41ddf92e404..2d8c79b8c378 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -158,9 +158,16 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
>   */
>  unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv)
>  {
> -	i915_gem_evict_everything(dev_priv->dev);
> -	return i915_gem_shrink(dev_priv, LONG_MAX,
> -			       I915_SHRINK_BOUND | I915_SHRINK_UNBOUND);
> +	unsigned long count;
> +
> +	count = i915_gem_shrink(dev_priv, LONG_MAX,
> +				I915_SHRINK_BOUND | I915_SHRINK_UNBOUND);
> +
> +	/* Use a double call to retire to flush any staged frees  */
> +	i915_gem_retire_requests(dev_priv->dev);
> +	i915_gem_retire_requests(dev_priv->dev);

I'm lost - where's that staged free?
-Daniel

> +
> +	return count;
>  }
>  
>  static bool i915_gem_shrinker_lock(struct drm_device *dev, bool *unlock)
> -- 
> 2.1.4
> 
Chris Wilson June 29, 2015, 7:30 a.m. UTC | #2
On Mon, Jun 29, 2015 at 09:13:31AM +0200, Daniel Vetter wrote:
> On Sun, Jun 28, 2015 at 01:06:39PM +0100, Chris Wilson wrote:
> > We can forgo an evict-everything here as the shrinker operation itself
> > will unbind any vma as required. If we explicitly idle the GPU through a
> > switch to the default context, we not only create a request in an
> > illegal context (e.g. whilst shrinking during execbuf with a request
> > already allocated), but switching to the default context will not free
> > up the memory backing the active contexts - unless in the unlikely
> > situation that context had already been closed (and just kept alive by
> > being the current context). The saving is near zero and the danger real.
> 
> Has this already blown up in some bugzilla somewhere? Should be a fairly
> recent regression with the olr removal.

It's damn tricky to catch in an igt, since it requires an oom pass
inside the submit callback. Running out of memory is easy, running out
of memory with outstanding rendering is easy, triggering the recursion
is hard. (Which of course meant it happened almost immediately when
running benchmarks on resource-limited machines.) Fault injection is the
only tractable solution. (And don't think about what happens if we cancel
a request after doing a switch_context: stale requests galore.)
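
One way to picture the fault injection mentioned above is a toy userspace allocator wrapper that starts failing after a configurable number of successful allocations, so that an out-of-memory path can be hit deterministically. This is only a sketch: toy_alloc and toy_fail_after are hypothetical names for illustration, and this is not the kernel's fault-injection framework.

```c
#include <stdlib.h>

/*
 * Hypothetical knob (not a kernel interface): number of allocations that
 * may still succeed. A negative value disables injection entirely.
 */
static long toy_fail_after = -1;

/* Allocate like malloc(), but fail once the injection budget is used up. */
static void *toy_alloc(size_t size)
{
	if (toy_fail_after == 0)
		return NULL;		/* injected failure */
	if (toy_fail_after > 0)
		toy_fail_after--;	/* consume one permitted allocation */
	return malloc(size);
}
```

Setting toy_fail_after to N lets N allocations through and fails every one after that, which is enough to force a chosen call site down its error path without actually exhausting memory.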

> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_shrinker.c | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > index c41ddf92e404..2d8c79b8c378 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > @@ -158,9 +158,16 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
> >   */
> >  unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv)
> >  {
> > -	i915_gem_evict_everything(dev_priv->dev);
> > -	return i915_gem_shrink(dev_priv, LONG_MAX,
> > -			       I915_SHRINK_BOUND | I915_SHRINK_UNBOUND);
> > +	unsigned long count;
> > +
> > +	count = i915_gem_shrink(dev_priv, LONG_MAX,
> > +				I915_SHRINK_BOUND | I915_SHRINK_UNBOUND);
> > +
> > +	/* Use a double call to retire to flush any staged frees  */
> > +	i915_gem_retire_requests(dev_priv->dev);
> > +	i915_gem_retire_requests(dev_priv->dev);
> 
> I'm lost - where's that staged free?

Execlists has one, and I'm adding another.
-Chris
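
For readers following the exchange above, here is a minimal toy sketch of why a second retire pass can matter when frees are staged. It is not i915 code: toy_queue and toy_retire are hypothetical names, and the assumption that one pass only stages a free which the next pass completes is an illustration, not a description of the execlists implementation.

```c
#include <stddef.h>

/* Toy model only; every name here is hypothetical. */
struct toy_queue {
	size_t completed;	/* finished requests still holding memory */
	size_t staged;		/* retired once, waiting for the real free */
	size_t freed;		/* requests whose memory has been released */
};

/*
 * One retire pass: release whatever the previous pass staged, then stage
 * the requests that have completed since.
 */
static void toy_retire(struct toy_queue *q)
{
	q->freed += q->staged;
	q->staged = q->completed;
	q->completed = 0;
}
```

In this model a single call to toy_retire() leaves the just-completed requests staged but not yet freed; calling it twice in a row is what actually releases them, which is the shape of the double call in the patch.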
Shuang He June 29, 2015, 3:49 p.m. UTC | #3
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 6655
-------------------------------------Summary-------------------------------------
Platform  Delta  drm-intel-nightly  Series Applied
ILK              302/302            302/302
SNB              312/316            312/316
IVB              343/343            343/343
BYT       -2     287/287            285/287
-------------------------------------Detailed-------------------------------------
Platform  Test                                      drm-intel-nightly  Series Applied
*BYT      igt@gem_partial_pwrite_pread@reads        PASS(1)            FAIL(1)
*BYT      igt@gem_tiled_partial_pwrite_pread@reads  PASS(1)            FAIL(1)
Note: Pay particular attention to lines starting with '*'.

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index c41ddf92e404..2d8c79b8c378 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -158,9 +158,16 @@  i915_gem_shrink(struct drm_i915_private *dev_priv,
  */
 unsigned long i915_gem_shrink_all(struct drm_i915_private *dev_priv)
 {
-	i915_gem_evict_everything(dev_priv->dev);
-	return i915_gem_shrink(dev_priv, LONG_MAX,
-			       I915_SHRINK_BOUND | I915_SHRINK_UNBOUND);
+	unsigned long count;
+
+	count = i915_gem_shrink(dev_priv, LONG_MAX,
+				I915_SHRINK_BOUND | I915_SHRINK_UNBOUND);
+
+	/* Use a double call to retire to flush any staged frees  */
+	i915_gem_retire_requests(dev_priv->dev);
+	i915_gem_retire_requests(dev_priv->dev);
+
+	return count;
 }
 
 static bool i915_gem_shrinker_lock(struct drm_device *dev, bool *unlock)