Message ID | 1431330696-15379-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote: > With the advent of mmap(wc), we have a path to write directly into > active GPU buffers. When combined with async updates (i.e. avoiding the > explicit domain management along with the memory barriers and GPU > stalls) we start to see the GPU read the wrong values from memory - i.e. > we have insufficient memory barriers along the execbuffer path. Writes > through the GTT should have been naturally serialised with execution > through the GTT as well and so the impact only seems to be from the WC > paths. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Akash Goel <akash.goel@intel.com> > Cc: stable@vger.kernel.org Do we have a nasty igt for this? Bugzilla? -Daniel > --- > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > index 650ae02484b0..4f97275ba799 100644 > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > @@ -1127,8 +1127,12 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring, > if (flush_chipset) > i915_gem_chipset_flush(ring->dev); > > - if (flush_domains & I915_GEM_DOMAIN_GTT) > - wmb(); > + /* Unconditionally flush out writes to memory as the user may be > + * doing asynchronous streaming writes to active buffers (i.e. > + * lazy domain management to avoid serialisation) directly into > + * the physical pages and so not naturally serialised by the GTT. > + */ > + wmb(); > > /* Unconditionally invalidate gpu caches and ensure that we do flush > * any residual writes from the previous batch. > -- > 2.1.4 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote: > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote: > > With the advent of mmap(wc), we have a path to write directly into > > active GPU buffers. When combined with async updates (i.e. avoiding the > > explicit domain management along with the memory barriers and GPU > > stalls) we start to see the GPU read the wrong values from memory - i.e. > > we have insufficient memory barriers along the execbuffer path. Writes > > through the GTT should have been naturally serialised with execution > > through the GTT as well and so the impact only seems to be from the WC > > paths. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Akash Goel <akash.goel@intel.com> > > Cc: stable@vger.kernel.org > > Do we have a nasty igt for this? Bugzilla? Maybe some of the 4.0 mmap(wc) regressions, we don't have an igt (yet). The corruption, spotted whilst playing with enabling mmap(wc) for mesa on byt, is very sporadic and so capturing it in a concise igt may be quite difficult. -Chris
On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote: > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote: > > With the advent of mmap(wc), we have a path to write directly into > > active GPU buffers. When combined with async updates (i.e. avoiding the > > explicit domain management along with the memory barriers and GPU > > stalls) we start to see the GPU read the wrong values from memory - i.e. > > we have insufficient memory barriers along the execbuffer path. Writes > > through the GTT should have been naturally serialised with execution > > through the GTT as well and so the impact only seems to be from the WC > > paths. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Akash Goel <akash.goel@intel.com> > > Cc: stable@vger.kernel.org > > Do we have a nasty igt for this? Bugzilla? I've added igt/gem_streaming_writes. That wmb() is not enough for !llc. Since the wmb() made piglit happy it is quite possible I haven't hit the same path exactly, but it's going to take some investigation to see if igt/gem_streaming_writes can possibly work on !llc. -Chris
On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote: > On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote: > > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote: > > > With the advent of mmap(wc), we have a path to write directly into > > > active GPU buffers. When combined with async updates (i.e. avoiding the > > > explicit domain management along with the memory barriers and GPU > > > stalls) we start to see the GPU read the wrong values from memory - i.e. > > > we have insufficient memory barriers along the execbuffer path. Writes > > > through the GTT should have been naturally serialised with execution > > > through the GTT as well and so the impact only seems to be from the WC > > > paths. > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > Cc: Akash Goel <akash.goel@intel.com> > > > Cc: stable@vger.kernel.org > > > > Do we have a nasty igt for this? Bugzilla? > > I've added igt/gem_streaming_writes. So far the only thing that makes a difference on !llc is hitting it with wbinvd(). I wonder if EXEC_OBJECT_WBINVD (or EXEC_OBJECT_WMB to be less specific) is acceptable? -Chris
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 6378
-------------------------------------Summary-------------------------------------
Platform Delta drm-intel-nightly Series Applied
PNV 272/272 272/272
ILK 302/302 302/302
SNB -1 315/315 314/315
IVB 343/343 343/343
BYT 287/287 287/287
BDW 317/317 317/317
-------------------------------------Detailed-------------------------------------
Platform Test drm-intel-nightly Series Applied
SNB igt@pm_rpm@dpms-mode-unset-non-lpsp DMESG_WARN(6)PASS(1) DMESG_WARN(1)
(dmesg patch applied)WARNING:at_drivers/gpu/drm/i915/intel_uncore.c:#assert_device_not_suspended[i915]()@WARNING:.* at .* assert_device_not_suspended+0x
Note: You need to pay more attention to line start with '*'
On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote: > On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote: > > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote: > > > With the advent of mmap(wc), we have a path to write directly into > > > active GPU buffers. When combined with async updates (i.e. avoiding the > > > explicit domain management along with the memory barriers and GPU > > > stalls) we start to see the GPU read the wrong values from memory - i.e. > > > we have insufficient memory barriers along the execbuffer path. Writes > > > through the GTT should have been naturally serialised with execution > > > through the GTT as well and so the impact only seems to be from the WC > > > paths. > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > Cc: Akash Goel <akash.goel@intel.com> > > > Cc: stable@vger.kernel.org > > > > Do we have a nasty igt for this? Bugzilla? > > I've added igt/gem_streaming_writes. > > That wmb() is not enough for !llc. Since the wmb() made piglit happy it > is quite possible I haven't hit the same path exactly, but it's going to > take some investigation to see if igt/gem_streaming_writes can possibly > work on !llc. Humbug. Found the bug in gem_streaming_writes, even though I still think the wmb() is strictly required, it runs fine without (presumably I haven't managed to avoid all barriers in the execbuffer path yet). However, I think can improve the stress by inserting extra gpu load -- that should help make the CPU writes / GPU reads of the buffer concurrent? -Chris
On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote: > On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote: > > On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote: > > > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote: > > > > With the advent of mmap(wc), we have a path to write directly into > > > > active GPU buffers. When combined with async updates (i.e. avoiding the > > > > explicit domain management along with the memory barriers and GPU > > > > stalls) we start to see the GPU read the wrong values from memory - i.e. > > > > we have insufficient memory barriers along the execbuffer path. Writes > > > > through the GTT should have been naturally serialised with execution > > > > through the GTT as well and so the impact only seems to be from the WC > > > > paths. > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > Cc: Akash Goel <akash.goel@intel.com> > > > > Cc: stable@vger.kernel.org > > > > > > Do we have a nasty igt for this? Bugzilla? > > > > I've added igt/gem_streaming_writes. > > > > That wmb() is not enough for !llc. Since the wmb() made piglit happy it > > is quite possible I haven't hit the same path exactly, but it's going to > > take some investigation to see if igt/gem_streaming_writes can possibly > > work on !llc. > > Humbug. > > Found the bug in gem_streaming_writes, even though I still think the > wmb() is strictly required, it runs fine without (presumably I haven't > managed to avoid all barriers in the execbuffer path yet). However, I > think can improve the stress by inserting extra gpu load -- that should > help make the CPU writes / GPU reads of the buffer concurrent? Just a small update. I haven't found a way to reproduce this in igt yet, but I can still observe the effect using vbo-map-unsync and the fix there is the above patch to make the wmb() unconditional. We need to put this into stable@ reasonably quickly (I suspect some of the 4.0 mmap(wc) regressions are due to this as well). -Chris
On Thu, May 21, 2015 at 02:00:34PM +0100, Chris Wilson wrote: > On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote: > > On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote: > > > On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote: > > > > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote: > > > > > With the advent of mmap(wc), we have a path to write directly into > > > > > active GPU buffers. When combined with async updates (i.e. avoiding the > > > > > explicit domain management along with the memory barriers and GPU > > > > > stalls) we start to see the GPU read the wrong values from memory - i.e. > > > > > we have insufficient memory barriers along the execbuffer path. Writes > > > > > through the GTT should have been naturally serialised with execution > > > > > through the GTT as well and so the impact only seems to be from the WC > > > > > paths. > > > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > > Cc: Akash Goel <akash.goel@intel.com> > > > > > Cc: stable@vger.kernel.org > > > > > > > > Do we have a nasty igt for this? Bugzilla? > > > > > > I've added igt/gem_streaming_writes. > > > > > > That wmb() is not enough for !llc. Since the wmb() made piglit happy it > > > is quite possible I haven't hit the same path exactly, but it's going to > > > take some investigation to see if igt/gem_streaming_writes can possibly > > > work on !llc. > > > > Humbug. > > > > Found the bug in gem_streaming_writes, even though I still think the > > wmb() is strictly required, it runs fine without (presumably I haven't > > managed to avoid all barriers in the execbuffer path yet). However, I > > think can improve the stress by inserting extra gpu load -- that should > > help make the CPU writes / GPU reads of the buffer concurrent? > > Just a small update. I haven't found a way to reproduce this in igt yet, > but I can still observe the effect using vbo-map-unsync and the fix > there is the above patch to make the wmb() unconditional. > > We need to put this into stable@ reasonably quickly (I suspect some of > the 4.0 mmap(wc) regressions are due to this as well). What about if (flush_domains & (GTT | CPU)) wmb(); instead? That would imo explain things a lot better, since cpu wc is treated as if in the CPU domains. Hm, looking at the igt that's not quite the case, we still put it into the GTT domain for wc mmaps afaict. -Daniel
On Thu, May 21, 2015 at 03:07:54PM +0200, Daniel Vetter wrote: > On Thu, May 21, 2015 at 02:00:34PM +0100, Chris Wilson wrote: > > On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote: > > > On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote: > > > > On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote: > > > > > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote: > > > > > > With the advent of mmap(wc), we have a path to write directly into > > > > > > active GPU buffers. When combined with async updates (i.e. avoiding the > > > > > > explicit domain management along with the memory barriers and GPU > > > > > > stalls) we start to see the GPU read the wrong values from memory - i.e. > > > > > > we have insufficient memory barriers along the execbuffer path. Writes > > > > > > through the GTT should have been naturally serialised with execution > > > > > > through the GTT as well and so the impact only seems to be from the WC > > > > > > paths. > > > > > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > Cc: Akash Goel <akash.goel@intel.com> > > > > > > Cc: stable@vger.kernel.org > > > > > > > > > > Do we have a nasty igt for this? Bugzilla? > > > > > > > > I've added igt/gem_streaming_writes. > > > > > > > > That wmb() is not enough for !llc. Since the wmb() made piglit happy it > > > > is quite possible I haven't hit the same path exactly, but it's going to > > > > take some investigation to see if igt/gem_streaming_writes can possibly > > > > work on !llc. > > > > > > Humbug. > > > > > > Found the bug in gem_streaming_writes, even though I still think the > > > wmb() is strictly required, it runs fine without (presumably I haven't > > > managed to avoid all barriers in the execbuffer path yet). However, I > > > think can improve the stress by inserting extra gpu load -- that should > > > help make the CPU writes / GPU reads of the buffer concurrent? > > > > Just a small update. I haven't found a way to reproduce this in igt yet, > > but I can still observe the effect using vbo-map-unsync and the fix > > there is the above patch to make the wmb() unconditional. > > > > We need to put this into stable@ reasonably quickly (I suspect some of > > the 4.0 mmap(wc) regressions are due to this as well). > > What about > > if (flush_domains & (GTT | CPU)) > wmb(); > > instead? That would imo explain things a lot better, since cpu wc is > treated as if in the CPU domains. Hm, looking at the igt that's not quite > the case, we still put it into the GTT domain for wc mmaps afaict. No. flush_domains is 0. We are talking about async writes which means that userspace is not telling the kernel about susbsequent writes into the inactive portions of the bo, and trusting that the buffer is coherent and the writes are flushed. Putting the wmb() in the kernel is not the only solution, but the most convenient (and allows us to just emit one wmb() - but given the large number of other potential barriers in this path, I am surprised that is required. Empirical evidence to the contrary!) -Chris
On Thu, May 21, 2015 at 02:13:01PM +0100, Chris Wilson wrote: > On Thu, May 21, 2015 at 03:07:54PM +0200, Daniel Vetter wrote: > > On Thu, May 21, 2015 at 02:00:34PM +0100, Chris Wilson wrote: > > > On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote: > > > > On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote: > > > > > On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote: > > > > > > On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote: > > > > > > > With the advent of mmap(wc), we have a path to write directly into > > > > > > > active GPU buffers. When combined with async updates (i.e. avoiding the > > > > > > > explicit domain management along with the memory barriers and GPU > > > > > > > stalls) we start to see the GPU read the wrong values from memory - i.e. > > > > > > > we have insufficient memory barriers along the execbuffer path. Writes > > > > > > > through the GTT should have been naturally serialised with execution > > > > > > > through the GTT as well and so the impact only seems to be from the WC > > > > > > > paths. > > > > > > > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > > > > Cc: Akash Goel <akash.goel@intel.com> > > > > > > > Cc: stable@vger.kernel.org > > > > > > > > > > > > Do we have a nasty igt for this? Bugzilla? > > > > > > > > > > I've added igt/gem_streaming_writes. > > > > > > > > > > That wmb() is not enough for !llc. Since the wmb() made piglit happy it > > > > > is quite possible I haven't hit the same path exactly, but it's going to > > > > > take some investigation to see if igt/gem_streaming_writes can possibly > > > > > work on !llc. > > > > > > > > Humbug. > > > > > > > > Found the bug in gem_streaming_writes, even though I still think the > > > > wmb() is strictly required, it runs fine without (presumably I haven't > > > > managed to avoid all barriers in the execbuffer path yet). However, I > > > > think can improve the stress by inserting extra gpu load -- that should > > > > help make the CPU writes / GPU reads of the buffer concurrent? > > > > > > Just a small update. I haven't found a way to reproduce this in igt yet, > > > but I can still observe the effect using vbo-map-unsync and the fix > > > there is the above patch to make the wmb() unconditional. > > > > > > We need to put this into stable@ reasonably quickly (I suspect some of > > > the 4.0 mmap(wc) regressions are due to this as well). > > > > What about > > > > if (flush_domains & (GTT | CPU)) > > wmb(); > > > > instead? That would imo explain things a lot better, since cpu wc is > > treated as if in the CPU domains. Hm, looking at the igt that's not quite > > the case, we still put it into the GTT domain for wc mmaps afaict. > > No. flush_domains is 0. We are talking about async writes which means > that userspace is not telling the kernel about susbsequent writes into > the inactive portions of the bo, and trusting that the buffer is > coherent and the writes are flushed. Putting the wmb() in the kernel is > not the only solution, but the most convenient (and allows us to just > emit one wmb() - but given the large number of other potential barriers > in this path, I am surprised that is required. Empirical evidence to the > contrary!) Hm right. What about emphasising this a bit more in the comment: /* * Empirical evidence indicates that we need a write barrier to * make sure write-combined writes (both to the gtt, but also to * the cpu mmaps). But userspace also uses wc mmaps as * unsynchronized upload paths where it inform the kernel about * domain changes (to avoid the stalls). Hence we must do this * barrier unconditinally. */ Mostly just rewording, unsing unsynchronized as used by gl/libdrm and clarification why we need to have the barrier unconditionally. With that Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> And I guess also Cc: stable@vger.kernel.org
On Thu, May 21, 2015 at 04:21:46PM +0200, Daniel Vetter wrote: > Hm right. What about emphasising this a bit more in the comment: > > /* > * Empirical evidence indicates that we need a write barrier to > * make sure write-combined writes (both to the gtt, but also to > * the cpu mmaps). But userspace also uses wc mmaps as > * unsynchronized upload paths where it inform the kernel about > * domain changes (to avoid the stalls). Hence we must do this > * barrier unconditinally. > */ For reference the wording in the commit is: /* Unconditionally flush out writes to memory as the user may be * doing asynchronous streaming writes to active buffers (i.e. * lazy domain management to avoid serialisation) directly into * the physical pages and so not naturally serialised by the GTT. */ > Mostly just rewording, unsing unsynchronized as used by gl/libdrm and > clarification why we need to have the barrier unconditionally. With that Hmm, glMapBufferRange does use unsynchronized, but async is almost universally preferred when talking about io and runqueues. /* Unconditionally flush out writes to memory as the user may be * doing asynchronous streaming writes to active buffers in this * batch (i.e. lazy domain management to avoid serialisation, for * example with glMapBufferRange(GL_MAP_UNSYNCHRONIZED_BIT)) directly * into the physical pages and so not naturally serialised by the GTT. */ > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> > > And I guess also > > Cc: stable@vger.kernel.org It already was ;-) -Chris
On Thu, May 21, 2015 at 04:22:55PM +0100, Chris Wilson wrote: > On Thu, May 21, 2015 at 04:21:46PM +0200, Daniel Vetter wrote: > > Hm right. What about emphasising this a bit more in the comment: > > > > /* > > * Empirical evidence indicates that we need a write barrier to > > * make sure write-combined writes (both to the gtt, but also to > > * the cpu mmaps). But userspace also uses wc mmaps as > > * unsynchronized upload paths where it inform the kernel about > > * domain changes (to avoid the stalls). Hence we must do this > > * barrier unconditinally. > > */ > > For reference the wording in the commit is: > > /* Unconditionally flush out writes to memory as the user may be > * doing asynchronous streaming writes to active buffers (i.e. > * lazy domain management to avoid serialisation) directly into > * the physical pages and so not naturally serialised by the GTT. > */ > > > Mostly just rewording, unsing unsynchronized as used by gl/libdrm and > > clarification why we need to have the barrier unconditionally. With that > > Hmm, glMapBufferRange does use unsynchronized, but async is almost > universally preferred when talking about io and runqueues. > > /* Unconditionally flush out writes to memory as the user may be > * doing asynchronous streaming writes to active buffers in this > * batch (i.e. lazy domain management to avoid serialisation, for > * example with glMapBufferRange(GL_MAP_UNSYNCHRONIZED_BIT)) directly > * into the physical pages and so not naturally serialised by the GTT. > */ > > > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Yeah, r-b: me also with this one. Jani, can you please exchange the comment and apply to -fixes? -Daniel > > > > And I guess also > > > > Cc: stable@vger.kernel.org > > It already was ;-) > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre
On 05/21/2015 06:00 AM, Chris Wilson wrote: > On Tue, May 19, 2015 at 03:41:48PM +0100, Chris Wilson wrote: >> On Mon, May 11, 2015 at 04:25:52PM +0100, Chris Wilson wrote: >>> On Mon, May 11, 2015 at 12:34:37PM +0200, Daniel Vetter wrote: >>>> On Mon, May 11, 2015 at 08:51:36AM +0100, Chris Wilson wrote: >>>>> With the advent of mmap(wc), we have a path to write directly into >>>>> active GPU buffers. When combined with async updates (i.e. avoiding the >>>>> explicit domain management along with the memory barriers and GPU >>>>> stalls) we start to see the GPU read the wrong values from memory - i.e. >>>>> we have insufficient memory barriers along the execbuffer path. Writes >>>>> through the GTT should have been naturally serialised with execution >>>>> through the GTT as well and so the impact only seems to be from the WC >>>>> paths. >>>>> >>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>>>> Cc: Akash Goel <akash.goel@intel.com> >>>>> Cc: stable@vger.kernel.org >>>> >>>> Do we have a nasty igt for this? Bugzilla? >>> >>> I've added igt/gem_streaming_writes. >>> >>> That wmb() is not enough for !llc. Since the wmb() made piglit happy it >>> is quite possible I haven't hit the same path exactly, but it's going to >>> take some investigation to see if igt/gem_streaming_writes can possibly >>> work on !llc. >> >> Humbug. >> >> Found the bug in gem_streaming_writes, even though I still think the >> wmb() is strictly required, it runs fine without (presumably I haven't >> managed to avoid all barriers in the execbuffer path yet). However, I >> think can improve the stress by inserting extra gpu load -- that should >> help make the CPU writes / GPU reads of the buffer concurrent? > > Just a small update. I haven't found a way to reproduce this in igt yet, > but I can still observe the effect using vbo-map-unsync and the fix > there is the above patch to make the wmb() unconditional. > > We need to put this into stable@ reasonably quickly (I suspect some of > the 4.0 mmap(wc) regressions are due to this as well). So the symptom is that the GPU picks up older values from memory, and your theory is that the wmb() kicks the values out of the store buffer or WC buffer prior to the execbuf? I think that's reasonable, and I'm hoping "globally visible" is spec'd to include the GPU and other system agents in the sfence definition. Jesse
On Thu, May 21, 2015 at 5:30 PM, Daniel Vetter <daniel@ffwll.ch> wrote: > On Thu, May 21, 2015 at 04:22:55PM +0100, Chris Wilson wrote: >> On Thu, May 21, 2015 at 04:21:46PM +0200, Daniel Vetter wrote: >> > Hm right. What about emphasising this a bit more in the comment: >> > >> > /* >> > * Empirical evidence indicates that we need a write barrier to >> > * make sure write-combined writes (both to the gtt, but also to >> > * the cpu mmaps). But userspace also uses wc mmaps as >> > * unsynchronized upload paths where it inform the kernel about >> > * domain changes (to avoid the stalls). Hence we must do this >> > * barrier unconditinally. >> > */ >> >> For reference the wording in the commit is: >> >> /* Unconditionally flush out writes to memory as the user may be >> * doing asynchronous streaming writes to active buffers (i.e. >> * lazy domain management to avoid serialisation) directly into >> * the physical pages and so not naturally serialised by the GTT. >> */ >> >> > Mostly just rewording, unsing unsynchronized as used by gl/libdrm and >> > clarification why we need to have the barrier unconditionally. With that >> >> Hmm, glMapBufferRange does use unsynchronized, but async is almost >> universally preferred when talking about io and runqueues. >> >> /* Unconditionally flush out writes to memory as the user may be >> * doing asynchronous streaming writes to active buffers in this >> * batch (i.e. lazy domain management to avoid serialisation, for >> * example with glMapBufferRange(GL_MAP_UNSYNCHRONIZED_BIT)) directly >> * into the physical pages and so not naturally serialised by the GTT. >> */ >> >> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> > > Yeah, r-b: me also with this one. Jani, can you please exchange the > comment and apply to -fixes? I retract my r-b for now, this seems to be a much bigger can of worms - it seems like we need to make all the mb()/wmb() we have all over the place unconditional. -Daniel
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 650ae02484b0..4f97275ba799 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1127,8 +1127,12 @@ i915_gem_execbuffer_move_to_gpu(struct intel_engine_cs *ring, if (flush_chipset) i915_gem_chipset_flush(ring->dev); - if (flush_domains & I915_GEM_DOMAIN_GTT) - wmb(); + /* Unconditionally flush out writes to memory as the user may be + * doing asynchronous streaming writes to active buffers (i.e. + * lazy domain management to avoid serialisation) directly into + * the physical pages and so not naturally serialised by the GTT. + */ + wmb(); /* Unconditionally invalidate gpu caches and ensure that we do flush * any residual writes from the previous batch.
With the advent of mmap(wc), we have a path to write directly into active GPU buffers. When combined with async updates (i.e. avoiding the explicit domain management along with the memory barriers and GPU stalls) we start to see the GPU read the wrong values from memory - i.e. we have insufficient memory barriers along the execbuffer path. Writes through the GTT should have been naturally serialised with execution through the GTT as well and so the impact only seems to be from the WC paths. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Cc: stable@vger.kernel.org --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)