diff mbox

drm/i915: Use MLC (l3$) for context objects

Message ID 1365427720-10926-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson April 8, 2013, 1:28 p.m. UTC
Enabling context support increases SwapBuffers latency by about 20%
(measured on an i7-3720qm). We can offset that loss slightly by enabling
faster caching for the contexts. As they are not backed by any
particular cache (such as the sampler or render caches) our only option
is to select the generic mid-level cache. This reduces the latency of
the swap by about 5%.

Oddly this effect can be observed running smokin-guns on IVB at
1280x1024:
Using BLT copies for swaps: 151.67 fps
Using Render copies for swaps (unpatched):  141.70 fps
With contexts disabled: 150.23 fps
With contexts in L3$: 150.77 fps

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
---
 drivers/gpu/drm/i915/i915_gem_context.c |    7 +++++++
 1 file changed, 7 insertions(+)

Comments

Kenneth Graunke April 8, 2013, 2:57 p.m. UTC | #1
On 04/08/2013 06:28 AM, Chris Wilson wrote:
> Enabling context support increases SwapBuffers latency by about 20%
> (measured on an i7-3720qm). We can offset that loss slightly by enabling
> faster caching for the contexts. As they are not backed by any
> particular cache (such as the sampler or render caches) our only option
> is to select the generic mid-level cache. This reduces the latency of
> the swap by about 5%.
>
> Oddly this effect can be observed running smokin-guns on IVB at
> 1280x1024:
> Using BLT copies for swaps: 151.67 fps
> Using Render copies for swaps (unpatched):  141.70 fps
> With contexts disabled: 150.23 fps
> With contexts in L3$: 150.77 fps
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c |    7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 94d873a..a1e8ecb 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -152,6 +152,13 @@ create_hw_context(struct drm_device *dev,
>   		return ERR_PTR(-ENOMEM);
>   	}
>
> +	if (INTEL_INFO(dev)->gen >= 7) {
> +		ret = i915_gem_object_set_cache_level(ctx->obj,
> +						      I915_CACHE_LLC_MLC);
> +		if (ret)
> +			goto err_out;
> +	}
> +
>   	/* The ring associated with the context object is handled by the normal
>   	 * object tracking code. We give an initial ring value simple to pass an
>   	 * assertion in the context switch code.

Sounds good to me, Chris.  Is this also useful on Sandybridge?  I don't 
see why we couldn't use both L3 & LLC there as well.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

On a tangent, it would be nice to clean up the I915_CACHE_* enums.  The 
term "MLC" isn't really used in the modern documentation, and there's 
even text that says "Ivybridge doesn't have MLC".  But what it really 
means here is LLC + L3.  It's probably not worth reworking until we can 
send out the new Haswell bits, though.

Thanks for doing this!

--Ken
Daniel Vetter April 8, 2013, 3:12 p.m. UTC | #2
On Mon, Apr 08, 2013 at 07:57:40AM -0700, Kenneth Graunke wrote:
> On 04/08/2013 06:28 AM, Chris Wilson wrote:
> >Enabling context support increases SwapBuffers latency by about 20%
> >(measured on an i7-3720qm). We can offset that loss slightly by enabling
> >faster caching for the contexts. As they are not backed by any
> >particular cache (such as the sampler or render caches) our only option
> >is to select the generic mid-level cache. This reduces the latency of
> >the swap by about 5%.
> >
> >Oddly this effect can be observed running smokin-guns on IVB at
> >1280x1024:
> >Using BLT copies for swaps: 151.67 fps
> >Using Render copies for swaps (unpatched):  141.70 fps
> >With contexts disabled: 150.23 fps
> >With contexts in L3$: 150.77 fps
> >
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >Cc: Ben Widawsky <ben@bwidawsk.net>
> >Cc: Kenneth Graunke <kenneth@whitecape.org>
> >---
> >  drivers/gpu/drm/i915/i915_gem_context.c |    7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> >index 94d873a..a1e8ecb 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_context.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >@@ -152,6 +152,13 @@ create_hw_context(struct drm_device *dev,
> >  		return ERR_PTR(-ENOMEM);
> >  	}
> >
> >+	if (INTEL_INFO(dev)->gen >= 7) {
> >+		ret = i915_gem_object_set_cache_level(ctx->obj,
> >+						      I915_CACHE_LLC_MLC);
> >+		if (ret)
> >+			goto err_out;
> >+	}
> >+
> >  	/* The ring associated with the context object is handled by the normal
> >  	 * object tracking code. We give an initial ring value simple to pass an
> >  	 * assertion in the context switch code.
> 
> Sounds good to me, Chris.  Is this also useful on Sandybridge?  I
> don't see why we couldn't use both L3 & LLC there as well.

Snb doesn't have gpu l3 afaik.
> 
> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
> 
> On a tangent, it would be nice to clean up the I915_CACHE_* enums.
> The term "MLC" isn't really used in the modern documentation, and
> there's even text that says "Ivybridge doesn't have MLC".  But what
> it really means here is LLC + L3.  It's probably not worth reworking
> until we can send out the new Haswell bits, though.

Yeah, my idea is that iternally we split this into cache_level + flags,
where cache_level controls how coherency works (i.e. where we have to
clflush and do similar magic) and the flags do the fine-tuning (like
enabling l3 caching). But like you've said, can wait for more hsw magic
(or gfdt) landing.

Queued for -next, thanks for the patch.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 94d873a..a1e8ecb 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -152,6 +152,13 @@  create_hw_context(struct drm_device *dev,
 		return ERR_PTR(-ENOMEM);
 	}
 
+	if (INTEL_INFO(dev)->gen >= 7) {
+		ret = i915_gem_object_set_cache_level(ctx->obj,
+						      I915_CACHE_LLC_MLC);
+		if (ret)
+			goto err_out;
+	}
+
 	/* The ring associated with the context object is handled by the normal
 	 * object tracking code. We give an initial ring value simple to pass an
 	 * assertion in the context switch code.