Message ID | 1432576124-20523-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, May 25, 2015 at 06:48:44PM +0100, Chris Wilson wrote: > Since the advent of mmap(wc), where we reused the same cache domain for > WC and GTT paths (oh, how I regret that double-edged advice), we need to > be extra cautious when using GTT iomap_wc internally. Since userspace maybe > modifying through the mmap(wc) and we then modify modifying through an > aliased WC path through the GTT, those writes may overlap and not be > visible to the other path. Easiest to trigger appears to be write the > batch through mmap(wc) and then attempt to perform reloc the GTT, > corruption quickly ensues. > > v2: Be stricter and do a full mb() in case we are reading through one > path and about to write through the second path. Also, apply the > barriers on transitioning into the GTT domain as well to cover the white > lie asychronous cases. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Akash Goel <akash.goel@intel.com> > Cc: stable@vger.kernel.org There are a lot more mb() and wmb() in the gem code. I count execlist, legacy rings, pwrite, set_domain and finish_gtt. Time to add a flags parameter to the set_domain ioctl so that we can untangle the domain tracking from the waiting? Add mb() all over the place in userspace around unsychronized access? Something else? -Daniel > --- > drivers/gpu/drm/i915/i915_gem.c | 26 ++++++++++++++++++-------- > 1 file changed, 18 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 517c5b8100d1..f17168b2cd37 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -3923,7 +3923,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write) > int ret; > > if (obj->base.write_domain == I915_GEM_DOMAIN_GTT) > - return 0; > + goto out; > > ret = i915_gem_object_wait_rendering(obj, !write); > if (ret) > @@ -3943,13 +3943,6 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write) > > i915_gem_object_flush_cpu_write_domain(obj); > > - /* Serialise direct access to this object with the barriers for > - * coherent writes from the GPU, by effectively invalidating the > - * GTT domain upon first access. > - */ > - if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0) > - mb(); > - > old_write_domain = obj->base.write_domain; > old_read_domains = obj->base.read_domains; > > @@ -3977,6 +3970,23 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write) > list_move_tail(&vma->mm_list, > &to_i915(obj->base.dev)->gtt.base.inactive_list); > > +out: > + /* > + * Userspace may be writing through mmap(wc) with > + * write_domain=GTT (or even with a white lie unsynchronized write), > + * so we need to explicitly flush before transitioning to writing > + * through the GTT, or vice versa. As we reused the GTT cache domain > + * for both paths, we must always have a memory barrier just in case. > + * > + * We need a full memory barrier in case we are reading through > + * the GTT and before we starting through the WC (or vice versa). > + * If we are only about to read through this access, we need only > + * wait for any pending writes on the other path. > + */ > + if (write) > + mb(); > + else > + wmb(); > return 0; > } > > -- > 2.1.4 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Tue, May 26, 2015 at 10:01:24AM +0200, Daniel Vetter wrote: > On Mon, May 25, 2015 at 06:48:44PM +0100, Chris Wilson wrote: > > Since the advent of mmap(wc), where we reused the same cache domain for > > WC and GTT paths (oh, how I regret that double-edged advice), we need to > > be extra cautious when using GTT iomap_wc internally. Since userspace maybe > > modifying through the mmap(wc) and we then modify modifying through an > > aliased WC path through the GTT, those writes may overlap and not be > > visible to the other path. Easiest to trigger appears to be write the > > batch through mmap(wc) and then attempt to perform reloc the GTT, > > corruption quickly ensues. > > > > v2: Be stricter and do a full mb() in case we are reading through one > > path and about to write through the second path. Also, apply the > > barriers on transitioning into the GTT domain as well to cover the white > > lie asychronous cases. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Akash Goel <akash.goel@intel.com> > > Cc: stable@vger.kernel.org > > There are a lot more mb() and wmb() in the gem code. I count execlist, > legacy rings, pwrite, set_domain and finish_gtt. You have more than I have... -Chris
On Tue, May 26, 2015 at 09:49:15AM +0100, Chris Wilson wrote: > On Tue, May 26, 2015 at 10:01:24AM +0200, Daniel Vetter wrote: > > On Mon, May 25, 2015 at 06:48:44PM +0100, Chris Wilson wrote: > > > Since the advent of mmap(wc), where we reused the same cache domain for > > > WC and GTT paths (oh, how I regret that double-edged advice), we need to > > > be extra cautious when using GTT iomap_wc internally. Since userspace maybe > > > modifying through the mmap(wc) and we then modify modifying through an > > > aliased WC path through the GTT, those writes may overlap and not be > > > visible to the other path. Easiest to trigger appears to be write the > > > batch through mmap(wc) and then attempt to perform reloc the GTT, > > > corruption quickly ensues. > > > > > > v2: Be stricter and do a full mb() in case we are reading through one > > > path and about to write through the second path. Also, apply the > > > barriers on transitioning into the GTT domain as well to cover the white > > > lie asychronous cases. > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > Cc: Akash Goel <akash.goel@intel.com> > > > Cc: stable@vger.kernel.org > > > > There are a lot more mb() and wmb() in the gem code. I count execlist, > > legacy rings, pwrite, set_domain and finish_gtt. > > You have more than I have... Well yeah, but the patch is also cc: stable. Which means we'd need to frob the barriers first, then refactor. And I'm still not sure whether this is road we want to go down, or whether adding explicit begin/end coherency ioctls wouldn't be the better long-term option. It always irked me a bit that the kernel needs to infer the other edge of a cpu access, explicit begin/end seemes better (and more in line with what all the gpu apis expose). Something like this struct i915_mmap_control { #define BEGIN 0 #define END 1 #define READ 0 #define WRITE 2 #define UNSYNCHRONIZED 4 /* or whetever the bikeshed will be */ #define MAPPING_MASK 0xf0 #define GTT_WC (1 << 4) #define CPU_WC (2 << 4) #define CPU_CACHED (3 << 4) u32 flags; u32 bo; u64 start; u64 length; }; Pretty much modelled after the begin/end_cpu_access hooks from dma-buf. Which seems to gain some traction for a generic up/download interface ... It just feels like incrementally fixing things might lead us to a very bad place. -Daniel
On Tue, May 26, 2015 at 11:56:18AM +0200, Daniel Vetter wrote: > On Tue, May 26, 2015 at 09:49:15AM +0100, Chris Wilson wrote: > > On Tue, May 26, 2015 at 10:01:24AM +0200, Daniel Vetter wrote: > > > On Mon, May 25, 2015 at 06:48:44PM +0100, Chris Wilson wrote: > > > > Since the advent of mmap(wc), where we reused the same cache domain for > > > > WC and GTT paths (oh, how I regret that double-edged advice), we need to > > > > be extra cautious when using GTT iomap_wc internally. Since userspace maybe > > > > modifying through the mmap(wc) and we then modify modifying through an > > > > aliased WC path through the GTT, those writes may overlap and not be > > > > visible to the other path. Easiest to trigger appears to be write the > > > > batch through mmap(wc) and then attempt to perform reloc the GTT, > > > > corruption quickly ensues. > > > > > > > > v2: Be stricter and do a full mb() in case we are reading through one > > > > path and about to write through the second path. Also, apply the > > > > barriers on transitioning into the GTT domain as well to cover the white > > > > lie asychronous cases. > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > Cc: Akash Goel <akash.goel@intel.com> > > > > Cc: stable@vger.kernel.org > > > > > > There are a lot more mb() and wmb() in the gem code. I count execlist, > > > legacy rings, pwrite, set_domain and finish_gtt. > > > > You have more than I have... > > Well yeah, but the patch is also cc: stable. Which means we'd need to frob > the barriers first, then refactor. And I'm still not sure whether this is > road we want to go down, or whether adding explicit begin/end coherency > ioctls wouldn't be the better long-term option. It always irked me a bit > that the kernel needs to infer the other edge of a cpu access, explicit > begin/end seemes better (and more in line with what all the gpu apis > expose). Something like this > > struct i915_mmap_control { > #define BEGIN 0 > #define END 1 > #define READ 0 > #define WRITE 2 > #define UNSYNCHRONIZED 4 /* or whetever the bikeshed will be */ > #define MAPPING_MASK 0xf0 > #define GTT_WC (1 << 4) > #define CPU_WC (2 << 4) > #define CPU_CACHED (3 << 4) > > u32 flags; > u32 bo; > u64 start; > u64 length; > }; GL advertises persistent mappings (i.e. start, no end), which is what the rest of userspace assumes as well. As far as I can see, I don't need any more tools to do exactly what I want from userspace - the only points in contention at the moment are where the kernel tries to do CPU access and with mb we can make that transparent to the user. -Chris
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 517c5b8100d1..f17168b2cd37 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3923,7 +3923,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write) int ret; if (obj->base.write_domain == I915_GEM_DOMAIN_GTT) - return 0; + goto out; ret = i915_gem_object_wait_rendering(obj, !write); if (ret) @@ -3943,13 +3943,6 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write) i915_gem_object_flush_cpu_write_domain(obj); - /* Serialise direct access to this object with the barriers for - * coherent writes from the GPU, by effectively invalidating the - * GTT domain upon first access. - */ - if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0) - mb(); - old_write_domain = obj->base.write_domain; old_read_domains = obj->base.read_domains; @@ -3977,6 +3970,23 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write) list_move_tail(&vma->mm_list, &to_i915(obj->base.dev)->gtt.base.inactive_list); +out: + /* + * Userspace may be writing through mmap(wc) with + * write_domain=GTT (or even with a white lie unsynchronized write), + * so we need to explicitly flush before transitioning to writing + * through the GTT, or vice versa. As we reused the GTT cache domain + * for both paths, we must always have a memory barrier just in case. + * + * We need a full memory barrier in case we are reading through + * the GTT and before we starting through the WC (or vice versa). + * If we are only about to read through this access, we need only + * wait for any pending writes on the other path. + */ + if (write) + mb(); + else + wmb(); return 0; }
Since the advent of mmap(wc), where we reused the same cache domain for WC and GTT paths (oh, how I regret that double-edged advice), we need to be extra cautious when using GTT iomap_wc internally. Since userspace maybe modifying through the mmap(wc) and we then modify modifying through an aliased WC path through the GTT, those writes may overlap and not be visible to the other path. Easiest to trigger appears to be write the batch through mmap(wc) and then attempt to perform reloc the GTT, corruption quickly ensues. v2: Be stricter and do a full mb() in case we are reading through one path and about to write through the second path. Also, apply the barriers on transitioning into the GTT domain as well to cover the white lie asychronous cases. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: stable@vger.kernel.org --- drivers/gpu/drm/i915/i915_gem.c | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-)