Message ID | 20160324212001.GA27742@nuc-i3427.alporthouse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Em Qui, 2016-03-24 às 21:20 +0000, chris@chris-wilson.co.uk escreveu: > On Thu, Mar 24, 2016 at 09:03:59PM +0000, chris@chris-wilson.co.uk > wrote: > > > > On Thu, Mar 24, 2016 at 08:53:21PM +0000, Zanoni, Paulo R wrote: > > > > > > Em Qui, 2016-03-24 às 19:31 +0000, Chris Wilson escreveu: > > > > > > > > On Thu, Mar 24, 2016 at 04:16:11PM -0300, Paulo Zanoni wrote: > > > > > > > > > > > > > > > FBC and the frontbuffer tracking infrastructure were designed > > > > > assuming > > > > > that user space applications would follow a specific set of > > > > > rules > > > > > regarding frontbuffer management and mmapping. I recently > > > > > discovered > > > > > that current user space is not exactly following these rules: > > > > > my > > > > > investigation led me to the conclusion that the generic > > > > > backend > > > > > from > > > > > SNA - used by SKL and the other new platforms without a > > > > > specific > > > > > backend - is not issuing sw_finish/dirty_fb IOCTLs when using > > > > > the > > > > > CPU > > > > > and WC mmaps. I discovered this when running lightdm: I would > > > > > type > > > > > the > > > > > password and nothing would appear on the screen unless I > > > > > moved the > > > > > mouse over the place where the letters were supposed to > > > > > appear. > > > > Yes, that is a kernel bug. The protocol we said the kernel > > > > would > > > > follow > > > > is to disable FBC/WC when userspace marks the object for > > > > writing by > > > > the > > > > CPU and would only reestablish FBC/WC upon dirtyfb. > > > But on WC mmaps we mark the object for writing by the GTT instead > > > of > > > the CPU, and while the tracking engine is able to see "normal" > > > GTT mmap > > > writes, it's not able to see WC mmap writes, so we established > > > that > > > we'd call dirtyfb after frontbuffer drawing through WC mmaps, > > > which is > > > something that the DDX never implemented. This was discussed on > > > #intel- > > > gfx on Nov 5 2014, and also possibly other places, but I can't > > > find the > > > logs. Daniel also confirmed this to me again on private IRC on > > > Jun 16 > > > 2015. So I still don't understand why this is a Kernel bug > > > instead of a > > > DDX bug. > > Because we said that once invalidated, it would not be restored > > until > > dirtyfb. The kernel is not doing that. Your patch does not do that. > > To > > be even close, you should be setting the origin flag based on the > > existence > > of wc mmaping for the object inside set-to-gtt-domain. Otherwise, > > you > > are not implementing even close to the protocol you say you are. > > That is > > invalidate on set-domain, flush on dirtyfb. > > > > The kernel's bug is that is not cancelling FBC. Userspace's bug is > > not > > signalling when to reenable it. > diff --git a/drivers/gpu/drm/i915/i915_drv.h > b/drivers/gpu/drm/i915/i915_drv.h > index 8dec2e8..0314346 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -2145,6 +2145,7 @@ struct drm_i915_gem_object { > unsigned int cache_dirty:1; > > unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS; > + unsigned int has_wc_mmap:1; > > /** Count of VMA actually bound by this object */ > unsigned int bind_count; > diff --git a/drivers/gpu/drm/i915/i915_gem.c > b/drivers/gpu/drm/i915/i915_gem.c > index 05ae706..29ca96d 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -1310,6 +1310,13 @@ unlock: > return ret; > } > > +static enum fb_op_origin > +write_origin(struct drm_i915_gem_object *obj, unsigned domain) > +{ > + return domain == I915_GEM_DOMAIN_GTT && !obj->has_wc_mmap ? > + ORIGIN_GTT : ORIGIN_CPU; What if I have both a WC mmap and a GTT mmap, and I'm actually using the GTT mmap now? My set_domain call will be treated as WC mmap usage, while in fact it should be treated as GTT usage. Is there a way to differentiate between them with the current set_domain API? > +} > + > /** > * Called when user space prepares to use an object with the CPU, > either > * through the mmap ioctl's mapping or a GTT mapping. > @@ -1363,9 +1370,7 @@ i915_gem_set_domain_ioctl(struct drm_device > *dev, void *data, > ret = i915_gem_object_set_to_cpu_domain(obj, > write_domain != 0); > > if (write_domain != 0) > - intel_fb_obj_invalidate(obj, > - write_domain == > I915_GEM_DOMAIN_GTT ? > - ORIGIN_GTT : ORIGIN_CPU); > + intel_fb_obj_invalidate(obj, write_origin(obj, > write_domain)); > > unref: > drm_gem_object_unreference(&obj->base); > @@ -1466,6 +1471,9 @@ i915_gem_mmap_ioctl(struct drm_device *dev, > void *data, > else > addr = -ENOMEM; > up_write(&mm->mmap_sem); > + > + /* This may race, but that's ok, it only gets set */ > + to_intel_bo(obj)->has_wc_mmap = true; > } >
On Fri, Mar 25, 2016 at 02:05:42PM +0000, Zanoni, Paulo R wrote: > What if I have both a WC mmap and a GTT mmap, and I'm actually using > the GTT mmap now? My set_domain call will be treated as WC mmap usage, > while in fact it should be treated as GTT usage. Is there a way to > differentiate between them with the current set_domain API? No, there is not. As far as the cache domain is regarded WC/GTT appear to be identical, i.e. GTT is just WC. That the GTT is not as coherent as previously regarded is yet another bug. -Chris
On Fri, Mar 25, 2016 at 02:05:42PM +0000, Zanoni, Paulo R wrote: > Em Qui, 2016-03-24 às 21:20 +0000, chris@chris-wilson.co.uk escreveu: > > On Thu, Mar 24, 2016 at 09:03:59PM +0000, chris@chris-wilson.co.uk > > wrote: > > > > > > On Thu, Mar 24, 2016 at 08:53:21PM +0000, Zanoni, Paulo R wrote: > > > > > > > > Em Qui, 2016-03-24 às 19:31 +0000, Chris Wilson escreveu: > > > > > > > > > > On Thu, Mar 24, 2016 at 04:16:11PM -0300, Paulo Zanoni wrote: > > > > > > > > > > > > > > > > > > FBC and the frontbuffer tracking infrastructure were designed > > > > > > assuming > > > > > > that user space applications would follow a specific set of > > > > > > rules > > > > > > regarding frontbuffer management and mmapping. I recently > > > > > > discovered > > > > > > that current user space is not exactly following these rules: > > > > > > my > > > > > > investigation led me to the conclusion that the generic > > > > > > backend > > > > > > from > > > > > > SNA - used by SKL and the other new platforms without a > > > > > > specific > > > > > > backend - is not issuing sw_finish/dirty_fb IOCTLs when using > > > > > > the > > > > > > CPU > > > > > > and WC mmaps. I discovered this when running lightdm: I would > > > > > > type > > > > > > the > > > > > > password and nothing would appear on the screen unless I > > > > > > moved the > > > > > > mouse over the place where the letters were supposed to > > > > > > appear. > > > > > Yes, that is a kernel bug. The protocol we said the kernel > > > > > would > > > > > follow > > > > > is to disable FBC/WC when userspace marks the object for > > > > > writing by > > > > > the > > > > > CPU and would only reestablish FBC/WC upon dirtyfb. > > > > But on WC mmaps we mark the object for writing by the GTT instead > > > > of > > > > the CPU, and while the tracking engine is able to see "normal" > > > > GTT mmap > > > > writes, it's not able to see WC mmap writes, so we established > > > > that > > > > we'd call dirtyfb after frontbuffer drawing through WC mmaps, > > > > which is > > > > something that the DDX never implemented. This was discussed on > > > > #intel- > > > > gfx on Nov 5 2014, and also possibly other places, but I can't > > > > find the > > > > logs. Daniel also confirmed this to me again on private IRC on > > > > Jun 16 > > > > 2015. So I still don't understand why this is a Kernel bug > > > > instead of a > > > > DDX bug. > > > Because we said that once invalidated, it would not be restored > > > until > > > dirtyfb. The kernel is not doing that. Your patch does not do that. > > > To > > > be even close, you should be setting the origin flag based on the > > > existence > > > of wc mmaping for the object inside set-to-gtt-domain. Otherwise, > > > you > > > are not implementing even close to the protocol you say you are. > > > That is > > > invalidate on set-domain, flush on dirtyfb. > > > > > > The kernel's bug is that is not cancelling FBC. Userspace's bug is > > > not > > > signalling when to reenable it. > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > > b/drivers/gpu/drm/i915/i915_drv.h > > index 8dec2e8..0314346 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -2145,6 +2145,7 @@ struct drm_i915_gem_object { > > unsigned int cache_dirty:1; > > > > unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS; > > + unsigned int has_wc_mmap:1; > > > > /** Count of VMA actually bound by this object */ > > unsigned int bind_count; > > diff --git a/drivers/gpu/drm/i915/i915_gem.c > > b/drivers/gpu/drm/i915/i915_gem.c > > index 05ae706..29ca96d 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -1310,6 +1310,13 @@ unlock: > > return ret; > > } > > > > +static enum fb_op_origin > > +write_origin(struct drm_i915_gem_object *obj, unsigned domain) > > +{ > > + return domain == I915_GEM_DOMAIN_GTT && !obj->has_wc_mmap ? > > + ORIGIN_GTT : ORIGIN_CPU; > > What if I have both a WC mmap and a GTT mmap, and I'm actually using > the GTT mmap now? My set_domain call will be treated as WC mmap usage, > while in fact it should be treated as GTT usage. Is there a way to > differentiate between them with the current set_domain API? *Blonk* or whatever the sound for suddenly realization is. Totally forgot that we're reuseding set_domain(GTT) for wc mmaps because this it's a nice trick. Otoh, is that trick the reason why wc mmaps aren't coherent enough? One possible difference is that this won't do the magic chipset flush in intel-gtt.c that we have on gen3-5. Let's pretend wbinv doesn't exist on gen2 ;-) But that's just an aside really ... Anyway, now that I can see again, ack on the trick to decide on ORIGIN_GTT vs. ORIGIN_CPU. -Daniel > > > > +} > > + > > /** > > * Called when user space prepares to use an object with the CPU, > > either > > * through the mmap ioctl's mapping or a GTT mapping. > > @@ -1363,9 +1370,7 @@ i915_gem_set_domain_ioctl(struct drm_device > > *dev, void *data, > > ret = i915_gem_object_set_to_cpu_domain(obj, > > write_domain != 0); > > > > if (write_domain != 0) > > - intel_fb_obj_invalidate(obj, > > - write_domain == > > I915_GEM_DOMAIN_GTT ? > > - ORIGIN_GTT : ORIGIN_CPU); > > + intel_fb_obj_invalidate(obj, write_origin(obj, > > write_domain)); > > > > unref: > > drm_gem_object_unreference(&obj->base); > > @@ -1466,6 +1471,9 @@ i915_gem_mmap_ioctl(struct drm_device *dev, > > void *data, > > else > > addr = -ENOMEM; > > up_write(&mm->mmap_sem); > > + > > + /* This may race, but that's ok, it only gets set */ > > + to_intel_bo(obj)->has_wc_mmap = true; > > } > >
On Tue, Mar 29, 2016 at 01:55:02PM +0200, Daniel Vetter wrote: > *Blonk* or whatever the sound for suddenly realization is. Totally forgot > that we're reuseding set_domain(GTT) for wc mmaps because this it's a nice > trick. > > Otoh, is that trick the reason why wc mmaps aren't coherent enough? One > possible difference is that this won't do the magic chipset flush in > intel-gtt.c that we have on gen3-5. Let's pretend wbinv doesn't exist on > gen2 ;-) But that's just an aside really ... No, writes through the *GTT* are not coherent. igt/gem_mmap_wc/coherency -> PASS igt/gem_mmap_gtt/coherency -> FAIL on byt/bsw Also see patch on list to insert a magic delay that fixes the issue exposed in the kernel (which causes various hangs and EINVALs). -Chris
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 8dec2e8..0314346 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2145,6 +2145,7 @@ struct drm_i915_gem_object { unsigned int cache_dirty:1; unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS; + unsigned int has_wc_mmap:1; /** Count of VMA actually bound by this object */ unsigned int bind_count; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 05ae706..29ca96d 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1310,6 +1310,13 @@ unlock: return ret; } +static enum fb_op_origin +write_origin(struct drm_i915_gem_object *obj, unsigned domain) +{ + return domain == I915_GEM_DOMAIN_GTT && !obj->has_wc_mmap ? + ORIGIN_GTT : ORIGIN_CPU; +} + /** * Called when user space prepares to use an object with the CPU, either * through the mmap ioctl's mapping or a GTT mapping. @@ -1363,9 +1370,7 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, ret = i915_gem_object_set_to_cpu_domain(obj, write_domain != 0); if (write_domain != 0) - intel_fb_obj_invalidate(obj, - write_domain == I915_GEM_DOMAIN_GTT ? - ORIGIN_GTT : ORIGIN_CPU); + intel_fb_obj_invalidate(obj, write_origin(obj, write_domain)); unref: drm_gem_object_unreference(&obj->base); @@ -1466,6 +1471,9 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data, else addr = -ENOMEM; up_write(&mm->mmap_sem); + + /* This may race, but that's ok, it only gets set */ + to_intel_bo(obj)->has_wc_mmap = true; } -- Chris Wilson, Intel Open Source Technology Centre