[v2] drm/i915: Force wmb() on using GTT io_mapping_map_wc
diff mbox

Message ID 1432576124-20523-1-git-send-email-chris@chris-wilson.co.uk
State New
Headers show

Commit Message

Chris Wilson May 25, 2015, 5:48 p.m. UTC
Since the advent of mmap(wc), where we reused the same cache domain for
WC and GTT paths (oh, how I regret that double-edged advice), we need to
be extra cautious when using GTT iomap_wc internally. Since userspace
may be modifying through the mmap(wc) while we modify through an
aliased WC path through the GTT, those writes may overlap and not be
visible to the other path. The easiest way to trigger this appears to
be writing the batch through mmap(wc) and then performing relocations
through the GTT; corruption quickly ensues.

v2: Be stricter and do a full mb() in case we are reading through one
path and about to write through the second path. Also apply the
barriers on transitioning into the GTT domain to cover the white
lie asynchronous cases.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/i915_gem.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

Comments

Daniel Vetter May 26, 2015, 8:01 a.m. UTC | #1
On Mon, May 25, 2015 at 06:48:44PM +0100, Chris Wilson wrote:
> Since the advent of mmap(wc), where we reused the same cache domain for
> WC and GTT paths (oh, how I regret that double-edged advice), we need to
> be extra cautious when using GTT iomap_wc internally. Since userspace
> may be modifying through the mmap(wc) while we modify through an
> aliased WC path through the GTT, those writes may overlap and not be
> visible to the other path. The easiest way to trigger this appears to
> be writing the batch through mmap(wc) and then performing relocations
> through the GTT; corruption quickly ensues.
> 
> v2: Be stricter and do a full mb() in case we are reading through one
> path and about to write through the second path. Also apply the
> barriers on transitioning into the GTT domain to cover the white
> lie asynchronous cases.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Akash Goel <akash.goel@intel.com>
> Cc: stable@vger.kernel.org

There are a lot more mb() and wmb() in the gem code. I count execlist,
legacy rings, pwrite, set_domain and finish_gtt.

Time to add a flags parameter to the set_domain ioctl so that we can
untangle the domain tracking from the waiting? Add mb() all over the place
in userspace around unsynchronized access? Something else?
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 26 ++++++++++++++++++--------
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 517c5b8100d1..f17168b2cd37 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3923,7 +3923,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>  	int ret;
>  
>  	if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
> -		return 0;
> +		goto out;
>  
>  	ret = i915_gem_object_wait_rendering(obj, !write);
>  	if (ret)
> @@ -3943,13 +3943,6 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>  
>  	i915_gem_object_flush_cpu_write_domain(obj);
>  
> -	/* Serialise direct access to this object with the barriers for
> -	 * coherent writes from the GPU, by effectively invalidating the
> -	 * GTT domain upon first access.
> -	 */
> -	if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0)
> -		mb();
> -
>  	old_write_domain = obj->base.write_domain;
>  	old_read_domains = obj->base.read_domains;
>  
> @@ -3977,6 +3970,23 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>  		list_move_tail(&vma->mm_list,
>  			       &to_i915(obj->base.dev)->gtt.base.inactive_list);
>  
> +out:
> +	/*
> +	 * Userspace may be writing through mmap(wc) with
> +	 * write_domain=GTT (or even with a white lie unsynchronized write),
> +	 * so we need to explicitly flush before transitioning to writing
> +	 * through the GTT, or vice versa. As we reused the GTT cache domain
> +	 * for both paths, we must always have a memory barrier just in case.
> +	 *
> +	 * We need a full memory barrier in case we are reading through
> +	 * the GTT and are about to start writing through the WC (or vice versa).
> +	 * If we are only about to read through this access, we need only
> +	 * wait for any pending writes on the other path.
> +	 */
> +	if (write)
> +		mb();
> +	else
> +		wmb();
>  	return 0;
>  }
>  
> -- 
> 2.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson May 26, 2015, 8:49 a.m. UTC | #2
On Tue, May 26, 2015 at 10:01:24AM +0200, Daniel Vetter wrote:
> On Mon, May 25, 2015 at 06:48:44PM +0100, Chris Wilson wrote:
> > Since the advent of mmap(wc), where we reused the same cache domain for
> > WC and GTT paths (oh, how I regret that double-edged advice), we need to
> > be extra cautious when using GTT iomap_wc internally. Since userspace
> > may be modifying through the mmap(wc) while we modify through an
> > aliased WC path through the GTT, those writes may overlap and not be
> > visible to the other path. The easiest way to trigger this appears to
> > be writing the batch through mmap(wc) and then performing relocations
> > through the GTT; corruption quickly ensues.
> > 
> > v2: Be stricter and do a full mb() in case we are reading through one
> > path and about to write through the second path. Also apply the
> > barriers on transitioning into the GTT domain to cover the white
> > lie asynchronous cases.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Akash Goel <akash.goel@intel.com>
> > Cc: stable@vger.kernel.org
> 
> There are a lot more mb() and wmb() in the gem code. I count execlist,
> legacy rings, pwrite, set_domain and finish_gtt.

You have more than I have...
-Chris
Daniel Vetter May 26, 2015, 9:56 a.m. UTC | #3
On Tue, May 26, 2015 at 09:49:15AM +0100, Chris Wilson wrote:
> On Tue, May 26, 2015 at 10:01:24AM +0200, Daniel Vetter wrote:
> > On Mon, May 25, 2015 at 06:48:44PM +0100, Chris Wilson wrote:
> > > Since the advent of mmap(wc), where we reused the same cache domain for
> > > WC and GTT paths (oh, how I regret that double-edged advice), we need to
> > > be extra cautious when using GTT iomap_wc internally. Since userspace
> > > may be modifying through the mmap(wc) while we modify through an
> > > aliased WC path through the GTT, those writes may overlap and not be
> > > visible to the other path. The easiest way to trigger this appears to
> > > be writing the batch through mmap(wc) and then performing relocations
> > > through the GTT; corruption quickly ensues.
> > > 
> > > v2: Be stricter and do a full mb() in case we are reading through one
> > > path and about to write through the second path. Also apply the
> > > barriers on transitioning into the GTT domain to cover the white
> > > lie asynchronous cases.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Akash Goel <akash.goel@intel.com>
> > > Cc: stable@vger.kernel.org
> > 
> > There are a lot more mb() and wmb() in the gem code. I count execlist,
> > legacy rings, pwrite, set_domain and finish_gtt.
> 
> You have more than I have...

Well yeah, but the patch is also cc: stable. Which means we'd need to frob
the barriers first, then refactor. And I'm still not sure whether this is
the road we want to go down, or whether adding explicit begin/end coherency
ioctls wouldn't be the better long-term option. It always irked me a bit
that the kernel needs to infer the other edge of a cpu access; explicit
begin/end seems better (and more in line with what all the gpu apis
expose). Something like this:

struct i915_mmap_control {
#define BEGIN 0
#define END 1
#define READ 0
#define WRITE 2
#define UNSYNCHRONIZED 	4 /* or whatever the bikeshed will be */
#define MAPPING_MASK 0xf0
#define GTT_WC		(1 << 4)
#define CPU_WC		(2 << 4)
#define CPU_CACHED	(3 << 4)

	u32 flags;
	u32 bo;
	u64 start;
	u64 length;
};
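For illustration only, here is how userspace might fill in such a request. The struct layout and flag values are restated from the proposal above so the sketch is self-contained; the helper name is invented, and no such ioctl actually exists, so how the request would reach the kernel (the DRM ioctl number and wrapper) is left open.

```c
#include <stdint.h>

/* Flag bits restated from the proposal above */
#define BEGIN 0
#define END 1
#define READ 0
#define WRITE 2
#define UNSYNCHRONIZED 4
#define MAPPING_MASK 0xf0
#define GTT_WC     (1 << 4)
#define CPU_WC     (2 << 4)
#define CPU_CACHED (3 << 4)

struct i915_mmap_control {
	uint32_t flags;
	uint32_t bo;
	uint64_t start;
	uint64_t length;
};

/* Hypothetical helper: fill in a begin-write request covering a range
 * of a bo mapped write-combining through the GTT. BEGIN and WRITE are
 * just bit values ORed into flags alongside the mapping type. */
static inline void mmap_control_begin_write(struct i915_mmap_control *c,
					    uint32_t bo_handle,
					    uint64_t start, uint64_t length)
{
	c->flags = BEGIN | WRITE | GTT_WC;
	c->bo = bo_handle;
	c->start = start;
	c->length = length;
}
```

A matching `END | WRITE | GTT_WC` request after the access would give the kernel both edges explicitly, instead of it having to infer the end of the cpu access as today.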

Pretty much modelled after the begin/end_cpu_access hooks from dma-buf,
which seem to be gaining some traction for a generic up/download interface ...

It just feels like incrementally fixing things might lead us to a very bad
place.
-Daniel
Chris Wilson May 26, 2015, 9:59 a.m. UTC | #4
On Tue, May 26, 2015 at 11:56:18AM +0200, Daniel Vetter wrote:
> On Tue, May 26, 2015 at 09:49:15AM +0100, Chris Wilson wrote:
> > On Tue, May 26, 2015 at 10:01:24AM +0200, Daniel Vetter wrote:
> > > On Mon, May 25, 2015 at 06:48:44PM +0100, Chris Wilson wrote:
> > > > Since the advent of mmap(wc), where we reused the same cache domain for
> > > > WC and GTT paths (oh, how I regret that double-edged advice), we need to
> > > > be extra cautious when using GTT iomap_wc internally. Since userspace
> > > > may be modifying through the mmap(wc) while we modify through an
> > > > aliased WC path through the GTT, those writes may overlap and not be
> > > > visible to the other path. The easiest way to trigger this appears to
> > > > be writing the batch through mmap(wc) and then performing relocations
> > > > through the GTT; corruption quickly ensues.
> > > > 
> > > > v2: Be stricter and do a full mb() in case we are reading through one
> > > > path and about to write through the second path. Also apply the
> > > > barriers on transitioning into the GTT domain to cover the white
> > > > lie asynchronous cases.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Akash Goel <akash.goel@intel.com>
> > > > Cc: stable@vger.kernel.org
> > > 
> > > There are a lot more mb() and wmb() in the gem code. I count execlist,
> > > legacy rings, pwrite, set_domain and finish_gtt.
> > 
> > You have more than I have...
> 
> Well yeah, but the patch is also cc: stable. Which means we'd need to frob
> the barriers first, then refactor. And I'm still not sure whether this is
> the road we want to go down, or whether adding explicit begin/end coherency
> ioctls wouldn't be the better long-term option. It always irked me a bit
> that the kernel needs to infer the other edge of a cpu access; explicit
> begin/end seems better (and more in line with what all the gpu apis
> expose). Something like this:
> 
> struct i915_mmap_control {
> #define BEGIN 0
> #define END 1
> #define READ 0
> #define WRITE 2
> #define UNSYNCHRONIZED 	4 /* or whatever the bikeshed will be */
> #define MAPPING_MASK 0xf0
> #define GTT_WC		(1 << 4)
> #define CPU_WC		(2 << 4)
> #define CPU_CACHED	(3 << 4)
> 
> 	u32 flags;
> 	u32 bo;
> 	u64 start;
> 	u64 length;
> };

GL advertises persistent mappings (i.e. start, no end), which is what
the rest of userspace assumes as well.

As far as I can see, I don't need any more tools to do exactly what I
want from userspace - the only points in contention at the moment are
where the kernel tries to do CPU access, and with mb() we can make that
transparent to the user.
-Chris

Patch
diff mbox

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 517c5b8100d1..f17168b2cd37 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3923,7 +3923,7 @@  i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	int ret;
 
 	if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
-		return 0;
+		goto out;
 
 	ret = i915_gem_object_wait_rendering(obj, !write);
 	if (ret)
@@ -3943,13 +3943,6 @@  i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 
 	i915_gem_object_flush_cpu_write_domain(obj);
 
-	/* Serialise direct access to this object with the barriers for
-	 * coherent writes from the GPU, by effectively invalidating the
-	 * GTT domain upon first access.
-	 */
-	if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0)
-		mb();
-
 	old_write_domain = obj->base.write_domain;
 	old_read_domains = obj->base.read_domains;
 
@@ -3977,6 +3970,23 @@  i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 		list_move_tail(&vma->mm_list,
 			       &to_i915(obj->base.dev)->gtt.base.inactive_list);
 
+out:
+	/*
+	 * Userspace may be writing through mmap(wc) with
+	 * write_domain=GTT (or even with a white lie unsynchronized write),
+	 * so we need to explicitly flush before transitioning to writing
+	 * through the GTT, or vice versa. As we reused the GTT cache domain
+	 * for both paths, we must always have a memory barrier just in case.
+	 *
+	 * We need a full memory barrier in case we are reading through
+	 * the GTT and are about to start writing through the WC (or vice versa).
+	 * If we are only about to read through this access, we need only
+	 * wait for any pending writes on the other path.
+	 */
+	if (write)
+		mb();
+	else
+		wmb();
 	return 0;
 }