[4/4] drm/i915: Review the memory barriers around CPU access to buffers

Message ID 1349807080-9005-4-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Oct. 9, 2012, 6:24 p.m. UTC
We need to treat the GPU core as a distinct processor and so apply the
same SMP memory barriers. In this case, in addition to flushing the
chipset cache, which is a no-op on LLC platforms, apply a write barrier
beforehand. And then when we invalidate the CPU cache, make sure the
memory is coherent (again this was a no-op on LLC platforms).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/char/agp/intel-gtt.c    |    1 +
 drivers/gpu/drm/i915/i915_gem.c |    1 +
 2 files changed, 2 insertions(+)

Comments

Jesse Barnes Oct. 11, 2012, 7:52 p.m. UTC | #1
On Tue,  9 Oct 2012 19:24:40 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> We need to treat the GPU core as a distinct processor and so apply the
> same SMP memory barriers. In this case, in addition to flushing the
> chipset cache, which is a no-op on LLC platforms, apply a write barrier
> beforehand. And then when we invalidate the CPU cache, make sure the
> memory is coherent (again this was a no-op on LLC platforms).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/char/agp/intel-gtt.c    |    1 +
>  drivers/gpu/drm/i915/i915_gem.c |    1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
> index 8b0f6d19..1223128 100644
> --- a/drivers/char/agp/intel-gtt.c
> +++ b/drivers/char/agp/intel-gtt.c
> @@ -1706,6 +1706,7 @@ EXPORT_SYMBOL(intel_gtt_get);
>  
>  void intel_gtt_chipset_flush(void)
>  {
> +	wmb();
>  	if (intel_private.driver->chipset_flush)
>  		intel_private.driver->chipset_flush();
>  }
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ed8d21a..b1ebb88 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3528,6 +3528,7 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>  	/* Flush the CPU cache if it's still invalid. */
>  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
>  		i915_gem_clflush_object(obj);
> +		mb(); /* in case the clflush above is optimised away */
>  
>  		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
>  	}

These need more comments too.

I think the first is to make sure any previous loads have completed
before we start using the new object?  If so, don't we want reads to
complete first too?

The second one looks unnecessary.  If the object isn't in the CPU
domain, there should be no loads/stores against it right?

Daniel Vetter Oct. 11, 2012, 8:46 p.m. UTC | #2
On Tue, Oct 09, 2012 at 07:24:40PM +0100, Chris Wilson wrote:
> We need to treat the GPU core as a distinct processor and so apply the
> same SMP memory barriers. In this case, in addition to flushing the
> chipset cache, which is a no-op on LLC platforms, apply a write barrier
> beforehand. And then when we invalidate the CPU cache, make sure the
> memory is coherent (again this was a no-op on LLC platforms).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I think this one here deserves some love still:
- the fancy new pwrite/pread code does some crazy coherency tricks behind
  the domain tracking code. This patch misses those.
- like Jesse said: comments.
- I'd still wish we'd have some i-g-t tests for this stuff ...

And now my crazy new theory: We've already had some bug reports that
suggested that we're not fully coherent around unbind/rebind and papered
over it with:

commit c501ae7f332cdaf42e31af30b72b4b66cbbb1604
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 14 13:57:23 2011 +0100

    drm/i915: Only clear the GPU domains upon a successful finish

And now we have the cpu_reloc regression from Dave Airlie, which could be
explained by similar rebinding penalties (if we're creative). I am
somewhat hopeful that both can be explained by the lack of proper memory
barriers ... So if you can gather tested-by's with the above duct-tape
reverted and these patches applied, I'd be almost as happy as with some
i-g-t tests for these patches.

Cheers, Daniel

Chris Wilson Oct. 19, 2012, 8:48 p.m. UTC | #3
On Thu, 11 Oct 2012 12:52:15 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> On Tue,  9 Oct 2012 19:24:40 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > We need to treat the GPU core as a distinct processor and so apply the
> > same SMP memory barriers. In this case, in addition to flushing the
> > chipset cache, which is a no-op on LLC platforms, apply a write barrier
> > beforehand. And then when we invalidate the CPU cache, make sure the
> > memory is coherent (again this was a no-op on LLC platforms).
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/char/agp/intel-gtt.c    |    1 +
> >  drivers/gpu/drm/i915/i915_gem.c |    1 +
> >  2 files changed, 2 insertions(+)
> > 
> > diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
> > index 8b0f6d19..1223128 100644
> > --- a/drivers/char/agp/intel-gtt.c
> > +++ b/drivers/char/agp/intel-gtt.c
> > @@ -1706,6 +1706,7 @@ EXPORT_SYMBOL(intel_gtt_get);
> >  
> >  void intel_gtt_chipset_flush(void)
> >  {
> > +	wmb();
> >  	if (intel_private.driver->chipset_flush)
> >  		intel_private.driver->chipset_flush();
> >  }
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index ed8d21a..b1ebb88 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3528,6 +3528,7 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
> >  	/* Flush the CPU cache if it's still invalid. */
> >  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> >  		i915_gem_clflush_object(obj);
> > +		mb(); /* in case the clflush above is optimised away */
> >  
> >  		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
> >  	}
> 
> These need more comments too.
> 
> I think the first is to make sure any previous loads have completed
> before we start using the new object?  If so, don't we want reads to
> complete first too?

The wmb() is only there to make sure that writes from the CPU have hit
the cache and/or chipset buffers before we flush them out of the chipset
buffers. Userspace is welcome to race reads/writes between the cores and
the GPU, and there is nothing we can do to prevent that without adopting
a strict coherency model.

Also note that in the past I have proposed this wmb() to fix some
observed incoherency in the cursor sprite: #21442.
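
As a rough userspace analogy of that ordering (not the kernel code): the
CPU fills a buffer, issues a write barrier, and only then signals the
other agent. The sketch below uses C11 fences where the patch uses
wmb(); payload, doorbell, producer_publish() and consumer_try_read() are
hypothetical names, and a release fence only approximates what wmb()
does for device-visible writes.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PAYLOAD_WORDS 4

/* Hypothetical stand-ins: "payload" is the buffer the CPU fills and
 * "doorbell" is the signal that tells the other agent to look at it. */
static int payload[PAYLOAD_WORDS];
static atomic_int doorbell;

/* Producer side: write the data, then order those writes before the
 * signal. The release fence plays the role of wmb() in this analogy. */
static void producer_publish(void)
{
	for (int i = 0; i < PAYLOAD_WORDS; i++)
		payload[i] = i + 1;

	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&doorbell, 1, memory_order_relaxed);
}

/* Consumer side: only read the data once the signal has been observed,
 * and order the data reads after that observation. Returns false if
 * nothing has been published yet. */
static bool consumer_try_read(int *sum)
{
	if (atomic_load_explicit(&doorbell, memory_order_relaxed) == 0)
		return false;

	atomic_thread_fence(memory_order_acquire);

	*sum = 0;
	for (int i = 0; i < PAYLOAD_WORDS; i++)
		*sum += payload[i];
	return true;
}
```

producer_publish() is meant to run on one thread and consumer_try_read()
on another. Without the release fence, the consumer would be allowed to
observe the doorbell before the payload writes.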
 
> The second one looks unnecessary.  If the object isn't in the CPU
> domain, there should be no loads/stores against it right?

It just depends on the programming model between CPU and GPU. The barrier
is there to make sure all the writes into the shared cache from another
core (the GPU in this case) are complete before we begin our reads.
Assuming that the GPU behaves as another core...
-Chris
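
To illustrate why the mb() sits outside the flush helper, here is a
minimal userspace model, assuming the flush can return early on
cache-coherent (LLC) platforms as the commit message suggests. The
struct, the cache_coherent flag, the flush counter and the fake_*
functions are all hypothetical; atomic_thread_fence() with seq_cst
ordering stands in for mb().

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DOMAIN_CPU 0x1u /* hypothetical stand-in for I915_GEM_DOMAIN_CPU */

struct fake_obj {
	unsigned int read_domains;
	bool cache_coherent; /* models an LLC platform */
	int flushes;         /* counts flushes that were really issued */
};

/* Models a flush helper that is skipped when the object is coherent. */
static void fake_clflush(struct fake_obj *obj)
{
	if (obj->cache_coherent)
		return;
	obj->flushes++;
}

/* Models the domain transition: the flush may be a no-op, but the full
 * barrier is issued unconditionally before the CPU starts reading. */
static void fake_set_to_cpu_domain(struct fake_obj *obj)
{
	if ((obj->read_domains & DOMAIN_CPU) == 0) {
		fake_clflush(obj);
		atomic_thread_fence(memory_order_seq_cst);

		obj->read_domains |= DOMAIN_CPU;
	}
}
```

The point of the sketch is only the placement: the fence does not depend
on whether fake_clflush() did any work, so the coherent case still gets
an ordering point before the object is marked readable by the CPU.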

Patch

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 8b0f6d19..1223128 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -1706,6 +1706,7 @@  EXPORT_SYMBOL(intel_gtt_get);
 
 void intel_gtt_chipset_flush(void)
 {
+	wmb();
 	if (intel_private.driver->chipset_flush)
 		intel_private.driver->chipset_flush();
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ed8d21a..b1ebb88 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3528,6 +3528,7 @@  i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
 		i915_gem_clflush_object(obj);
+		mb(); /* in case the clflush above is optimised away */
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}