drm/i915: Drop the overzealous warning from i915_gem_set_cache_level

Message ID 1376304377-11695-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Aug. 12, 2013, 10:46 a.m. UTC
By our earlier reckoning, moving from a snooped/LLC setting to an uncached
setting leaves the CPU cache in a consistent state irrespective of our
domain tracking, so we can forgo the warning about the lack of
invalidation. Similarly, we know that any writes posted to the snooped CPU
domain will be safely clflushed to the uncached PTEs after forcing the
domain change.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Ben Widawsky Aug. 12, 2013, 9:02 p.m. UTC | #1
On Mon, Aug 12, 2013 at 11:46:17AM +0100, Chris Wilson wrote:
> By our earlier reckoning, move from a snooped/llc setting to an uncached
> setting, leaves the CPU cache in a consistent state irrespective of our
> domain tracking - so we can forgo the warning about the lack of
> invalidation. Similarly for any writes posted to the snooped CPU domain,
> we know will be safely clflushed to the uncached PTEs after forcing the
> domain change.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>

I ran into this several times while doing the PPGTT development, and was
always scared to just remove it. Does it make sense to keep the
write_domain assertion with this gone?

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 925c77d..1d3e57e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3520,7 +3520,6 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  		 * Just set it to the CPU cache for now.
>  		 */
>  		WARN_ON(obj->base.write_domain & ~I915_GEM_DOMAIN_CPU);
> -		WARN_ON(obj->base.read_domains & ~I915_GEM_DOMAIN_CPU);
>  
>  		old_read_domains = obj->base.read_domains;
>  		old_write_domain = obj->base.write_domain;
> -- 
> 1.8.4.rc2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Chris Wilson Aug. 12, 2013, 9:29 p.m. UTC | #2
On Mon, Aug 12, 2013 at 02:02:09PM -0700, Ben Widawsky wrote:
> On Mon, Aug 12, 2013 at 11:46:17AM +0100, Chris Wilson wrote:
> > By our earlier reckoning, move from a snooped/llc setting to an uncached
> > setting, leaves the CPU cache in a consistent state irrespective of our
> > domain tracking - so we can forgo the warning about the lack of
> > invalidation. Similarly for any writes posted to the snooped CPU domain,
> > we know will be safely clflushed to the uncached PTEs after forcing the
> > domain change.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> I ran into this several times while doing the PPGTT development, and was
> always scared to just remove it. Does it make sense to keep the
> write_domain assertion with this gone?

I think we've justified in the earlier series why we can drop the
WARN_ON(write) with impunity. As we don't need to do so immediately, I'd
like to sleep on it for a while.
-Chris

Daniel Vetter Aug. 13, 2013, 7:54 a.m. UTC | #3
On Mon, Aug 12, 2013 at 11:46:17AM +0100, Chris Wilson wrote:
> By our earlier reckoning, move from a snooped/llc setting to an uncached
> setting, leaves the CPU cache in a consistent state irrespective of our
> domain tracking - so we can forgo the warning about the lack of
> invalidation. Similarly for any writes posted to the snooped CPU domain,
> we know will be safely clflushed to the uncached PTEs after forcing the
> domain change.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68040
Tested-by: cancan,feng <cancan.feng@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 925c77d..1d3e57e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3520,7 +3520,6 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  		 * Just set it to the CPU cache for now.
>  		 */
>  		WARN_ON(obj->base.write_domain & ~I915_GEM_DOMAIN_CPU);
> -		WARN_ON(obj->base.read_domains & ~I915_GEM_DOMAIN_CPU);
>  
>  		old_read_domains = obj->base.read_domains;
>  		old_write_domain = obj->base.write_domain;
> -- 
> 1.8.4.rc2

Daniel Vetter Aug. 13, 2013, 10:38 a.m. UTC | #4
On Tue, Aug 13, 2013 at 09:54:48AM +0200, Daniel Vetter wrote:
> On Mon, Aug 12, 2013 at 11:46:17AM +0100, Chris Wilson wrote:
> > By our earlier reckoning, move from a snooped/llc setting to an uncached
> > setting, leaves the CPU cache in a consistent state irrespective of our
> > domain tracking - so we can forgo the warning about the lack of
> > invalidation. Similarly for any writes posted to the snooped CPU domain,
> > we know will be safely clflushed to the uncached PTEs after forcing the
> > domain change.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68040
> Tested-by: cancan,feng <cancan.feng@intel.com>

Oh, and QA reports that this WARN first pops up with

commit d46f1c3f1372e3a72fab97c60480aa4a1084387f
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Thu Aug 8 14:41:06 2013 +0100
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Sat Aug 10 11:24:18 2013 +0200

    drm/i915: Allow the GPU to cache stolen memory

Ville Syrjälä Aug. 13, 2013, 12:12 p.m. UTC | #5
On Mon, Aug 12, 2013 at 11:46:17AM +0100, Chris Wilson wrote:
> By our earlier reckoning, move from a snooped/llc setting to an uncached
> setting, leaves the CPU cache in a consistent state irrespective of our
> domain tracking - so we can forgo the warning about the lack of
> invalidation. Similarly for any writes posted to the snooped CPU domain,
> we know will be safely clflushed to the uncached PTEs after forcing the
> domain change.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 925c77d..1d3e57e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3520,7 +3520,6 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  		 * Just set it to the CPU cache for now.
>  		 */
>  		WARN_ON(obj->base.write_domain & ~I915_GEM_DOMAIN_CPU);
> -		WARN_ON(obj->base.read_domains & ~I915_GEM_DOMAIN_CPU);

AFAICS this can only be reached by stolen objs starting in GTT read
domain. Normally set_cache_level checks if the object is bound and
then calls finish_gtt, and unbind also calls finish_gtt, and GPU
domain is handled in a similar way. So I don't see that we can end up
here any other way. Based on that, both WARNs seem rather pointless
actually.

Then again I'm not really sure what we gain from setting stolen objs
to GTT read domain initially.

The write domain check might make a bit of sense, except for the fact
that finish_gtt/gpu clears it just before.

Thinking about this stuff a bit, I think I actually came up with a
scenario where we would currently fail to invalidate the CPU cache
between non-snooped GPU/GTT access and CPU access:

1. make bo non-snooped w/ pin_display=true (wd=0, rd|=gtt)
2. set to CPU read domain (wd=0 rd|=cpu)
3. set to GTT (or GPU) write domain (wd=gtt, rd=gtt) -> CPU cache is stale after this point
4. make bo snooped -> pin_display=true still so we directly set (wd=cpu, rd=cpu)
5. set to CPU domain -> CPU cache is still stale
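
The five steps above can be traced with a small toy model. This is only a sketch, not driver code: `struct toy_bo`, the `toy_*` helpers and the `cpu_cache_stale` flag are hypothetical names invented for illustration, and the model assumes that setting the CPU domain invalidates only when CPU is not already in the read domains. The two domain constants use the values from the i915 uapi header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Domain bits, values as in the i915 uapi header. */
#define I915_GEM_DOMAIN_CPU 0x00000001
#define I915_GEM_DOMAIN_GTT 0x00000040

/* Hypothetical toy object; cpu_cache_stale exists only in this model. */
struct toy_bo {
	uint32_t read_domains;
	uint32_t write_domain;
	bool snooped;
	bool pin_display;
	bool cpu_cache_stale;
};

/* Steps 2 and 5: invalidate only when CPU is not yet a read domain. */
static void toy_set_to_cpu_domain(struct toy_bo *bo)
{
	if (!(bo->read_domains & I915_GEM_DOMAIN_CPU)) {
		bo->cpu_cache_stale = false;
		bo->read_domains |= I915_GEM_DOMAIN_CPU;
	}
}

/* Step 3: a non-snooped GTT write leaves previously cached lines stale. */
static void toy_set_to_gtt_write(struct toy_bo *bo)
{
	if (!bo->snooped)
		bo->cpu_cache_stale = true;
	bo->read_domains = I915_GEM_DOMAIN_GTT;
	bo->write_domain = I915_GEM_DOMAIN_GTT;
}

/* Step 4: domains are set to CPU directly, with no invalidation. */
static void toy_make_snooped(struct toy_bo *bo)
{
	bo->snooped = true;
	bo->read_domains = I915_GEM_DOMAIN_CPU;
	bo->write_domain = I915_GEM_DOMAIN_CPU;
}

/* Runs steps 1-5; returns whether the modelled CPU cache ends up stale. */
static bool toy_run_scenario(bool flip_to_snooped)
{
	struct toy_bo bo = {
		.read_domains = I915_GEM_DOMAIN_GTT,	/* step 1 */
		.pin_display = true,
	};

	toy_set_to_cpu_domain(&bo);		/* step 2 */
	toy_set_to_gtt_write(&bo);		/* step 3 */
	if (flip_to_snooped)
		toy_make_snooped(&bo);		/* step 4 */
	toy_set_to_cpu_domain(&bo);		/* step 5 */
	return bo.cpu_cache_stale;
}
```

In this model the cache only stays stale when step 4 is taken: skipping the switch to snooped leaves the object in the GTT read domain, so step 5 performs the invalidation.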

>  		old_read_domains = obj->base.read_domains;
>  		old_write_domain = obj->base.write_domain;
> -- 
> 1.8.4.rc2

Chris Wilson Aug. 13, 2013, 12:20 p.m. UTC | #6
On Tue, Aug 13, 2013 at 03:12:59PM +0300, Ville Syrjälä wrote:
> Thinking about this stuff a bit, I think I actually came up with a
> scenario where we would currently fail to invalidate the CPU cache
> between non-snooped GPU/GTT access and CPU access:
> 
> 1. make bo non-snooped w/ pin_display=true (wd=0, rd|=gtt)
> 2. set to CPU read domain (wd=0 rd|=cpu)
> 3. set to GTT (or GPU) write domain (wd=gtt, rd=gtt) -> CPU cache is stale after this point
> 4. make bo snooped -> pin_display=true still so we directly set (wd=cpu, rd=cpu)
> 5. set to CPU domain -> CPU cache is still stale

You will also find the scanout reads stale data as well. You've managed
to shoot yourself in both feet. The kernel can't fix that, so should we
care about the other foot?
-Chris

Ville Syrjälä Aug. 13, 2013, 12:37 p.m. UTC | #7
On Tue, Aug 13, 2013 at 01:20:13PM +0100, Chris Wilson wrote:
> On Tue, Aug 13, 2013 at 03:12:59PM +0300, Ville Syrjälä wrote:
> > Thinking about this stuff a bit, I think I actually came up with a
> > scenario where we would currently fail to invalidate the CPU cache
> > between non-snooped GPU/GTT access and CPU access:
> > 
> > 1. make bo non-snooped w/ pin_display=true (wd=0, rd|=gtt)
> > 2. set to CPU read domain (wd=0 rd|=cpu)
> > 3. set to GTT (or GPU) write domain (wd=gtt, rd=gtt) -> CPU cache is stale after this point
> > 4. make bo snooped -> pin_display=true still so we directly set (wd=cpu, rd=cpu)
> > 5. set to CPU domain -> CPU cache is still stale
> 
> You will also find the scanout reads stale data as well.

Well, assuming you actually write something to the bo w/ the CPU. If
not, then it keeps scanning out the correct data.

> You've managed
> to shoot yourself in both feet. The kernel can't fix that, so should we
> care about the other foot?

Yeah, I suppose we shouldn't care too much about problems the user
created for himself.

Daniel Vetter Aug. 14, 2013, 8:49 a.m. UTC | #8
On Tue, Aug 13, 2013 at 03:37:56PM +0300, Ville Syrjälä wrote:
> On Tue, Aug 13, 2013 at 01:20:13PM +0100, Chris Wilson wrote:
> > On Tue, Aug 13, 2013 at 03:12:59PM +0300, Ville Syrjälä wrote:
> > > Thinking about this stuff a bit, I think I actually came up with a
> > > scenario where we would currently fail to invalidate the CPU cache
> > > between non-snooped GPU/GTT access and CPU access:
> > > 
> > > 1. make bo non-snooped w/ pin_display=true (wd=0, rd|=gtt)
> > > 2. set to CPU read domain (wd=0 rd|=cpu)
> > > 3. set to GTT (or GPU) write domain (wd=gtt, rd=gtt) -> CPU cache is stale after this point
> > > 4. make bo snooped -> pin_display=true still so we directly set (wd=cpu, rd=cpu)
> > > 5. set to CPU domain -> CPU cache is still stale
> > 
> > You will also find the scanout reads stale data as well.
> 
> Well, assuming you actually write something to the bo w/ the CPU. If
> not, then it keeps scanning out the correct data.

I think an if (obj->pin_display) return -EBUSY; in the set_caching ioctl
would be good to fix this.
-Daniel
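
For illustration, the guard suggested above might look roughly like the following. It is a standalone sketch under stated assumptions, not the actual ioctl: `struct toy_obj` and `toy_set_caching()` are hypothetical stand-ins, and only the `pin_display` test and the `-EBUSY` return come from the suggestion itself.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GEM object; only the fields used here. */
struct toy_obj {
	bool pin_display;
	int cache_level;
};

/* Refuse to change the caching mode while the object is pinned for display. */
static int toy_set_caching(struct toy_obj *obj, int level)
{
	if (obj->pin_display)
		return -EBUSY;

	obj->cache_level = level;
	return 0;
}
```

Rejecting the request up front keeps step 4 of the scenario from ever running on a scanout buffer, so the domain tracking is never overwritten while the display engine is reading the object.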

Chris Wilson Aug. 14, 2013, 8:54 a.m. UTC | #9
On Wed, Aug 14, 2013 at 10:49:11AM +0200, Daniel Vetter wrote:
> On Tue, Aug 13, 2013 at 03:37:56PM +0300, Ville Syrjälä wrote:
> > On Tue, Aug 13, 2013 at 01:20:13PM +0100, Chris Wilson wrote:
> > > On Tue, Aug 13, 2013 at 03:12:59PM +0300, Ville Syrjälä wrote:
> > > > Thinking about this stuff a bit, I think I actually came up with a
> > > > scenario where we would currently fail to invalidate the CPU cache
> > > > between non-snooped GPU/GTT access and CPU access:
> > > > 
> > > > 1. make bo non-snooped w/ pin_display=true (wd=0, rd|=gtt)
> > > > 2. set to CPU read domain (wd=0 rd|=cpu)
> > > > 3. set to GTT (or GPU) write domain (wd=gtt, rd=gtt) -> CPU cache is stale after this point
> > > > 4. make bo snooped -> pin_display=true still so we directly set (wd=cpu, rd=cpu)
> > > > 5. set to CPU domain -> CPU cache is still stale
> > > 
> > > You will also find the scanout reads stale data as well.
> > 
> > Well, assuming you actually write something to the bo w/ the CPU. If
> > not, then it keeps scanning out the correct data.
> 
> I think an if (obj->pin_display) return -EBUSY; in the set_caching ioctl
> would be good to fix this.

And we already do that check (as a result of obj->pin_count).
Sorted.
-Chris

Daniel Vetter Aug. 14, 2013, 10:01 a.m. UTC | #10
On Wed, Aug 14, 2013 at 09:54:05AM +0100, Chris Wilson wrote:
> On Wed, Aug 14, 2013 at 10:49:11AM +0200, Daniel Vetter wrote:
> > On Tue, Aug 13, 2013 at 03:37:56PM +0300, Ville Syrjälä wrote:
> > > On Tue, Aug 13, 2013 at 01:20:13PM +0100, Chris Wilson wrote:
> > > > On Tue, Aug 13, 2013 at 03:12:59PM +0300, Ville Syrjälä wrote:
> > > > > Thinking about this stuff a bit, I think I actually came up with a
> > > > > scenario where we would currently fail to invalidate the CPU cache
> > > > > between non-snooped GPU/GTT access and CPU access:
> > > > > 
> > > > > 1. make bo non-snooped w/ pin_display=true (wd=0, rd|=gtt)
> > > > > 2. set to CPU read domain (wd=0 rd|=cpu)
> > > > > 3. set to GTT (or GPU) write domain (wd=gtt, rd=gtt) -> CPU cache is stale after this point
> > > > > 4. make bo snooped -> pin_display=true still so we directly set (wd=cpu, rd=cpu)
> > > > > 5. set to CPU domain -> CPU cache is still stale
> > > > 
> > > > You will also find the scanout reads stale data as well.
> > > 
> > > Well, assuming you actually write something to the bo w/ the CPU. If
> > > not, then it keeps scanning out the correct data.
> > 
> > I think an if (obj->pin_display) return -EBUSY; in the set_caching ioctl
> > would be good to fix this.
> 
> And we already do that check (as a result of obj->pin_count).
> Sorted.

Indeed. Patch merged to dinq (with a pimped commit message), thanks.
-Daniel

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 925c77d..1d3e57e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3520,7 +3520,6 @@  int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		 * Just set it to the CPU cache for now.
 		 */
 		WARN_ON(obj->base.write_domain & ~I915_GEM_DOMAIN_CPU);
-		WARN_ON(obj->base.read_domains & ~I915_GEM_DOMAIN_CPU);
 
 		old_read_domains = obj->base.read_domains;
 		old_write_domain = obj->base.write_domain;
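
As a rough illustration of what the two WARN_ON conditions test, the mask expression can be pulled out into a standalone helper. The helper name `has_non_cpu_domain()` is hypothetical and not part of the driver; the domain constants use the values from the i915 uapi header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Domain bits, values as in the i915 uapi header. */
#define I915_GEM_DOMAIN_CPU 0x00000001
#define I915_GEM_DOMAIN_GTT 0x00000040

/*
 * Mirrors the condition inside WARN_ON(domains & ~I915_GEM_DOMAIN_CPU):
 * true when any domain bit other than CPU is set.
 */
static bool has_non_cpu_domain(uint32_t domains)
{
	return (domains & ~(uint32_t)I915_GEM_DOMAIN_CPU) != 0;
}
```

An object whose read domains include GTT, such as the stolen objects discussed in the thread, makes this condition true, which is why the removed read_domains check fired; the retained write_domain check stays quiet as long as the write domain is empty or CPU only.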