diff mbox

[3/4] drm/i915: opt-out CPU and WC mmaps from FBC

Message ID 20160324212001.GA27742@nuc-i3427.alporthouse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson March 24, 2016, 9:20 p.m. UTC
On Thu, Mar 24, 2016 at 09:03:59PM +0000, chris@chris-wilson.co.uk wrote:
> On Thu, Mar 24, 2016 at 08:53:21PM +0000, Zanoni, Paulo R wrote:
> > Em Qui, 2016-03-24 às 19:31 +0000, Chris Wilson escreveu:
> > > On Thu, Mar 24, 2016 at 04:16:11PM -0300, Paulo Zanoni wrote:
> > > > 
> > > > FBC and the frontbuffer tracking infrastructure were designed
> > > > assuming
> > > > that user space applications would follow a specific set of rules
> > > > regarding frontbuffer management and mmapping. I recently
> > > > discovered
> > > > that current user space is not exactly following these rules: my
> > > > investigation led me to the conclusion that the generic backend
> > > > from
> > > > SNA - used by SKL and the other new platforms without a specific
> > > > backend - is not issuing sw_finish/dirty_fb IOCTLs when using the
> > > > CPU
> > > > and WC mmaps. I discovered this when running lightdm: I would type
> > > > the
> > > > password and nothing would appear on the screen unless I moved the
> > > > mouse over the place where the letters were supposed to appear.
> > > Yes, that is a kernel bug. The protocol we said the kernel would
> > > follow
> > > is to disable FBC/WC when userspace marks the object for writing by
> > > the
> > > CPU and would only reestablish FBC/WC upon dirtyfb.
> > 
> > But on WC mmaps we mark the object for writing by the GTT instead of
> > the CPU, and while the tracking engine is able to see "normal" GTT mmap
> > writes, it's not able to see WC mmap writes, so we established that
> > we'd call dirtyfb after frontbuffer drawing through WC mmaps, which is
> > something that the DDX never implemented. This was discussed on #intel-
> > gfx on Nov 5 2014, and also possibly other places, but I can't find the
> > logs. Daniel also confirmed this to me again on private IRC on Jun 16
> > 2015. So I still don't understand why this is a Kernel bug instead of a
> > DDX bug.
> 
> Because we said that once invalidated, it would not be restored until
> dirtyfb. The kernel is not doing that. Your patch does not do that. To
> be even close, you should be setting the origin flag based on the existence
> of wc mmaping for the object inside set-to-gtt-domain. Otherwise, you
> are not implementing even close to the protocol you say you are. That is
> invalidate on set-domain, flush on dirtyfb.
> 
> The kernel's bug is that is not cancelling FBC. Userspace's bug is not
> signalling when to reenable it.

Comments

Zanoni, Paulo R March 25, 2016, 2:05 p.m. UTC | #1
Em Qui, 2016-03-24 às 21:20 +0000, chris@chris-wilson.co.uk escreveu:
> On Thu, Mar 24, 2016 at 09:03:59PM +0000, chris@chris-wilson.co.uk

> wrote:

> > 

> > On Thu, Mar 24, 2016 at 08:53:21PM +0000, Zanoni, Paulo R wrote:

> > > 

> > > Em Qui, 2016-03-24 às 19:31 +0000, Chris Wilson escreveu:

> > > > 

> > > > On Thu, Mar 24, 2016 at 04:16:11PM -0300, Paulo Zanoni wrote:

> > > > > 

> > > > > 

> > > > > FBC and the frontbuffer tracking infrastructure were designed

> > > > > assuming

> > > > > that user space applications would follow a specific set of

> > > > > rules

> > > > > regarding frontbuffer management and mmapping. I recently

> > > > > discovered

> > > > > that current user space is not exactly following these rules:

> > > > > my

> > > > > investigation led me to the conclusion that the generic

> > > > > backend

> > > > > from

> > > > > SNA - used by SKL and the other new platforms without a

> > > > > specific

> > > > > backend - is not issuing sw_finish/dirty_fb IOCTLs when using

> > > > > the

> > > > > CPU

> > > > > and WC mmaps. I discovered this when running lightdm: I would

> > > > > type

> > > > > the

> > > > > password and nothing would appear on the screen unless I

> > > > > moved the

> > > > > mouse over the place where the letters were supposed to

> > > > > appear.

> > > > Yes, that is a kernel bug. The protocol we said the kernel

> > > > would

> > > > follow

> > > > is to disable FBC/WC when userspace marks the object for

> > > > writing by

> > > > the

> > > > CPU and would only reestablish FBC/WC upon dirtyfb.

> > > But on WC mmaps we mark the object for writing by the GTT instead

> > > of

> > > the CPU, and while the tracking engine is able to see "normal"

> > > GTT mmap

> > > writes, it's not able to see WC mmap writes, so we established

> > > that

> > > we'd call dirtyfb after frontbuffer drawing through WC mmaps,

> > > which is

> > > something that the DDX never implemented. This was discussed on

> > > #intel-

> > > gfx on Nov 5 2014, and also possibly other places, but I can't

> > > find the

> > > logs. Daniel also confirmed this to me again on private IRC on

> > > Jun 16

> > > 2015. So I still don't understand why this is a Kernel bug

> > > instead of a

> > > DDX bug.

> > Because we said that once invalidated, it would not be restored

> > until

> > dirtyfb. The kernel is not doing that. Your patch does not do that.

> > To

> > be even close, you should be setting the origin flag based on the

> > existence

> > of wc mmaping for the object inside set-to-gtt-domain. Otherwise,

> > you

> > are not implementing even close to the protocol you say you are.

> > That is

> > invalidate on set-domain, flush on dirtyfb.

> > 

> > The kernel's bug is that is not cancelling FBC. Userspace's bug is

> > not

> > signalling when to reenable it.

> diff --git a/drivers/gpu/drm/i915/i915_drv.h

> b/drivers/gpu/drm/i915/i915_drv.h

> index 8dec2e8..0314346 100644

> --- a/drivers/gpu/drm/i915/i915_drv.h

> +++ b/drivers/gpu/drm/i915/i915_drv.h

> @@ -2145,6 +2145,7 @@ struct drm_i915_gem_object {

>         unsigned int cache_dirty:1;

>  

>         unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;

> +       unsigned int has_wc_mmap:1;

>  

>         /** Count of VMA actually bound by this object */

>         unsigned int bind_count;

> diff --git a/drivers/gpu/drm/i915/i915_gem.c

> b/drivers/gpu/drm/i915/i915_gem.c

> index 05ae706..29ca96d 100644

> --- a/drivers/gpu/drm/i915/i915_gem.c

> +++ b/drivers/gpu/drm/i915/i915_gem.c

> @@ -1310,6 +1310,13 @@ unlock:

>         return ret;

>  }

>  

> +static enum fb_op_origin

> +write_origin(struct drm_i915_gem_object *obj, unsigned domain)

> +{

> +       return domain == I915_GEM_DOMAIN_GTT && !obj->has_wc_mmap ?

> +             ORIGIN_GTT : ORIGIN_CPU;


What if I have both a WC mmap and a GTT mmap, and I'm actually using
the GTT mmap now? My set_domain call will be treated as WC mmap usage,
while in fact it should be treated as GTT usage. Is there a way to
differentiate between them with the current set_domain API?


> +}

> +

>  /**

>   * Called when user space prepares to use an object with the CPU,

> either

>   * through the mmap ioctl's mapping or a GTT mapping.

> @@ -1363,9 +1370,7 @@ i915_gem_set_domain_ioctl(struct drm_device

> *dev, void *data,

>                 ret = i915_gem_object_set_to_cpu_domain(obj,

> write_domain != 0);

>  

>         if (write_domain != 0)

> -               intel_fb_obj_invalidate(obj,

> -                                       write_domain ==

> I915_GEM_DOMAIN_GTT ?

> -                                       ORIGIN_GTT : ORIGIN_CPU);

> +               intel_fb_obj_invalidate(obj, write_origin(obj,

> write_domain));

>  

>  unref:

>         drm_gem_object_unreference(&obj->base);

> @@ -1466,6 +1471,9 @@ i915_gem_mmap_ioctl(struct drm_device *dev,

> void *data,

>                 else

>                         addr = -ENOMEM;

>                 up_write(&mm->mmap_sem);

> +

> +               /* This may race, but that's ok, it only gets set */

> +               to_intel_bo(obj)->has_wc_mmap = true;

>         }

>
Chris Wilson March 25, 2016, 2:17 p.m. UTC | #2
On Fri, Mar 25, 2016 at 02:05:42PM +0000, Zanoni, Paulo R wrote:
> What if I have both a WC mmap and a GTT mmap, and I'm actually using
> the GTT mmap now? My set_domain call will be treated as WC mmap usage,
> while in fact it should be treated as GTT usage. Is there a way to
> differentiate between them with the current set_domain API?

No, there is not. As far as the cache domain is regarded WC/GTT appear to
be identical, i.e. GTT is just WC. That the GTT is not as coherent as
previously regarded is yet another bug.
-Chris
Daniel Vetter March 29, 2016, 11:55 a.m. UTC | #3
On Fri, Mar 25, 2016 at 02:05:42PM +0000, Zanoni, Paulo R wrote:
> Em Qui, 2016-03-24 às 21:20 +0000, chris@chris-wilson.co.uk escreveu:
> > On Thu, Mar 24, 2016 at 09:03:59PM +0000, chris@chris-wilson.co.uk
> > wrote:
> > > 
> > > On Thu, Mar 24, 2016 at 08:53:21PM +0000, Zanoni, Paulo R wrote:
> > > > 
> > > > Em Qui, 2016-03-24 às 19:31 +0000, Chris Wilson escreveu:
> > > > > 
> > > > > On Thu, Mar 24, 2016 at 04:16:11PM -0300, Paulo Zanoni wrote:
> > > > > > 
> > > > > > 
> > > > > > FBC and the frontbuffer tracking infrastructure were designed
> > > > > > assuming
> > > > > > that user space applications would follow a specific set of
> > > > > > rules
> > > > > > regarding frontbuffer management and mmapping. I recently
> > > > > > discovered
> > > > > > that current user space is not exactly following these rules:
> > > > > > my
> > > > > > investigation led me to the conclusion that the generic
> > > > > > backend
> > > > > > from
> > > > > > SNA - used by SKL and the other new platforms without a
> > > > > > specific
> > > > > > backend - is not issuing sw_finish/dirty_fb IOCTLs when using
> > > > > > the
> > > > > > CPU
> > > > > > and WC mmaps. I discovered this when running lightdm: I would
> > > > > > type
> > > > > > the
> > > > > > password and nothing would appear on the screen unless I
> > > > > > moved the
> > > > > > mouse over the place where the letters were supposed to
> > > > > > appear.
> > > > > Yes, that is a kernel bug. The protocol we said the kernel
> > > > > would
> > > > > follow
> > > > > is to disable FBC/WC when userspace marks the object for
> > > > > writing by
> > > > > the
> > > > > CPU and would only reestablish FBC/WC upon dirtyfb.
> > > > But on WC mmaps we mark the object for writing by the GTT instead
> > > > of
> > > > the CPU, and while the tracking engine is able to see "normal"
> > > > GTT mmap
> > > > writes, it's not able to see WC mmap writes, so we established
> > > > that
> > > > we'd call dirtyfb after frontbuffer drawing through WC mmaps,
> > > > which is
> > > > something that the DDX never implemented. This was discussed on
> > > > #intel-
> > > > gfx on Nov 5 2014, and also possibly other places, but I can't
> > > > find the
> > > > logs. Daniel also confirmed this to me again on private IRC on
> > > > Jun 16
> > > > 2015. So I still don't understand why this is a Kernel bug
> > > > instead of a
> > > > DDX bug.
> > > Because we said that once invalidated, it would not be restored
> > > until
> > > dirtyfb. The kernel is not doing that. Your patch does not do that.
> > > To
> > > be even close, you should be setting the origin flag based on the
> > > existence
> > > of wc mmaping for the object inside set-to-gtt-domain. Otherwise,
> > > you
> > > are not implementing even close to the protocol you say you are.
> > > That is
> > > invalidate on set-domain, flush on dirtyfb.
> > > 
> > > The kernel's bug is that is not cancelling FBC. Userspace's bug is
> > > not
> > > signalling when to reenable it.
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > b/drivers/gpu/drm/i915/i915_drv.h
> > index 8dec2e8..0314346 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2145,6 +2145,7 @@ struct drm_i915_gem_object {
> >         unsigned int cache_dirty:1;
> >  
> >         unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
> > +       unsigned int has_wc_mmap:1;
> >  
> >         /** Count of VMA actually bound by this object */
> >         unsigned int bind_count;
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index 05ae706..29ca96d 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1310,6 +1310,13 @@ unlock:
> >         return ret;
> >  }
> >  
> > +static enum fb_op_origin
> > +write_origin(struct drm_i915_gem_object *obj, unsigned domain)
> > +{
> > +       return domain == I915_GEM_DOMAIN_GTT && !obj->has_wc_mmap ?
> > +             ORIGIN_GTT : ORIGIN_CPU;
> 
> What if I have both a WC mmap and a GTT mmap, and I'm actually using
> the GTT mmap now? My set_domain call will be treated as WC mmap usage,
> while in fact it should be treated as GTT usage. Is there a way to
> differentiate between them with the current set_domain API?

*Blonk* or whatever the sound for suddenly realization is. Totally forgot
that we're reuseding set_domain(GTT) for wc mmaps because this it's a nice
trick.

Otoh, is that trick the reason why wc mmaps aren't coherent enough? One
possible difference is that this won't do the magic chipset flush in
intel-gtt.c that we have on gen3-5. Let's pretend wbinv doesn't exist on
gen2 ;-) But that's just an aside really ...

Anyway, now that I can see again, ack on the trick to decide on ORIGIN_GTT
vs. ORIGIN_CPU.
-Daniel

> 
> 
> > +}
> > +
> >  /**
> >   * Called when user space prepares to use an object with the CPU,
> > either
> >   * through the mmap ioctl's mapping or a GTT mapping.
> > @@ -1363,9 +1370,7 @@ i915_gem_set_domain_ioctl(struct drm_device
> > *dev, void *data,
> >                 ret = i915_gem_object_set_to_cpu_domain(obj,
> > write_domain != 0);
> >  
> >         if (write_domain != 0)
> > -               intel_fb_obj_invalidate(obj,
> > -                                       write_domain ==
> > I915_GEM_DOMAIN_GTT ?
> > -                                       ORIGIN_GTT : ORIGIN_CPU);
> > +               intel_fb_obj_invalidate(obj, write_origin(obj,
> > write_domain));
> >  
> >  unref:
> >         drm_gem_object_unreference(&obj->base);
> > @@ -1466,6 +1471,9 @@ i915_gem_mmap_ioctl(struct drm_device *dev,
> > void *data,
> >                 else
> >                         addr = -ENOMEM;
> >                 up_write(&mm->mmap_sem);
> > +
> > +               /* This may race, but that's ok, it only gets set */
> > +               to_intel_bo(obj)->has_wc_mmap = true;
> >         }
> >
Chris Wilson March 29, 2016, 12:24 p.m. UTC | #4
On Tue, Mar 29, 2016 at 01:55:02PM +0200, Daniel Vetter wrote:
> *Blonk* or whatever the sound for suddenly realization is. Totally forgot
> that we're reuseding set_domain(GTT) for wc mmaps because this it's a nice
> trick.
> 
> Otoh, is that trick the reason why wc mmaps aren't coherent enough? One
> possible difference is that this won't do the magic chipset flush in
> intel-gtt.c that we have on gen3-5. Let's pretend wbinv doesn't exist on
> gen2 ;-) But that's just an aside really ...

No, writes through the *GTT* are not coherent.

igt/gem_mmap_wc/coherency -> PASS
igt/gem_mmap_gtt/coherency -> FAIL on byt/bsw

Also see patch on list to insert a magic delay that fixes the issue
exposed in the kernel (which causes various hangs and EINVALs).
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8dec2e8..0314346 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2145,6 +2145,7 @@  struct drm_i915_gem_object {
        unsigned int cache_dirty:1;
 
        unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
+       unsigned int has_wc_mmap:1;
 
        /** Count of VMA actually bound by this object */
        unsigned int bind_count;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 05ae706..29ca96d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1310,6 +1310,13 @@  unlock:
        return ret;
 }
 
+static enum fb_op_origin
+write_origin(struct drm_i915_gem_object *obj, unsigned domain)
+{
+       return domain == I915_GEM_DOMAIN_GTT && !obj->has_wc_mmap ?
+             ORIGIN_GTT : ORIGIN_CPU;
+}
+
 /**
  * Called when user space prepares to use an object with the CPU, either
  * through the mmap ioctl's mapping or a GTT mapping.
@@ -1363,9 +1370,7 @@  i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
                ret = i915_gem_object_set_to_cpu_domain(obj, write_domain != 0);
 
        if (write_domain != 0)
-               intel_fb_obj_invalidate(obj,
-                                       write_domain == I915_GEM_DOMAIN_GTT ?
-                                       ORIGIN_GTT : ORIGIN_CPU);
+               intel_fb_obj_invalidate(obj, write_origin(obj, write_domain));
 
 unref:
        drm_gem_object_unreference(&obj->base);
@@ -1466,6 +1471,9 @@  i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
                else
                        addr = -ENOMEM;
                up_write(&mm->mmap_sem);
+
+               /* This may race, but that's ok, it only gets set */
+               to_intel_bo(obj)->has_wc_mmap = true;
        }

-- 
Chris Wilson, Intel Open Source Technology Centre