diff mbox

drm/i915/execlists: Use rmb() to order CSB reads

Message ID 20180508125230.14528-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson May 8, 2018, 12:52 p.m. UTC
We assume that the CSB is written using the normal ringbuffer
coherency protocols, as outlined in kernel/events/ring_buffer.c:

    *   (HW)                              (DRIVER)
    *
    *   if (LOAD ->data_tail) {            LOAD ->data_head
    *                      (A)             smp_rmb()       (C)
    *      STORE $data                     LOAD $data
    *      smp_wmb()       (B)             smp_mb()        (D)
    *      STORE ->data_head               STORE ->data_tail
    *   }

So we assume that the HW fulfils it's ordering requirements (B), and so
we should use a complimentary rmb (C) to ensure that our read of its
WRITE pointer is completed before we start accessing the data.

The final mb (D) is implied by the uncached mmio we perform to inform
the HW of our READ pointer.

References: https://bugs.freedesktop.org/show_bug.cgi?id=105064
References: https://bugs.freedesktop.org/show_bug.cgi?id=105888
References: https://bugs.freedesktop.org/show_bug.cgi?id=106185
References: 61bf9719fa17 ("drm/i915/cnl: Use mmio access to context status buffer")
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Timo Aaltonen <tjaalton@ubuntu.com>
Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
---

Just tweaked the commitmsg to cross reference the mb against the
diagram.
-Chris

---
 drivers/gpu/drm/i915/intel_engine_cs.c | 3 ---
 drivers/gpu/drm/i915/intel_lrc.c       | 1 +
 2 files changed, 1 insertion(+), 3 deletions(-)

Comments

Chris Wilson May 8, 2018, 12:54 p.m. UTC | #1
Quoting Chris Wilson (2018-05-08 13:52:30)
> We assume that the CSB is written using the normal ringbuffer
> coherency protocols, as outlined in kernel/events/ring_buffer.c:
> 
>     *   (HW)                              (DRIVER)
>     *
>     *   if (LOAD ->data_tail) {            LOAD ->data_head
>     *                      (A)             smp_rmb()       (C)
>     *      STORE $data                     LOAD $data
>     *      smp_wmb()       (B)             smp_mb()        (D)
>     *      STORE ->data_head               STORE ->data_tail
>     *   }
> 
> So we assume that the HW fulfils it's ordering requirements (B), and so
> we should use a complimentary rmb (C) to ensure that our read of its
> WRITE pointer is completed before we start accessing the data.
> 
> The final mb (D) is implied by the uncached mmio we perform to inform
> the HW of our READ pointer.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=105064
> References: https://bugs.freedesktop.org/show_bug.cgi?id=105888
> References: https://bugs.freedesktop.org/show_bug.cgi?id=106185
> References: 61bf9719fa17 ("drm/i915/cnl: Use mmio access to context status buffer")
> Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Rafael Antognolli <rafael.antognolli@intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Timo Aaltonen <tjaalton@ubuntu.com>
> Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
> ---
> 
> Just tweaked the commitmsg to cross reference the mb against the
> diagram.

And still forgot to fix s/it's/its/
-Chris
Mika Kuoppala May 8, 2018, 1:21 p.m. UTC | #2
Chris Wilson <chris@chris-wilson.co.uk> writes:

> We assume that the CSB is written using the normal ringbuffer
> coherency protocols, as outlined in kernel/events/ring_buffer.c:
>
>     *   (HW)                              (DRIVER)
>     *
>     *   if (LOAD ->data_tail) {            LOAD ->data_head
>     *                      (A)             smp_rmb()       (C)
>     *      STORE $data                     LOAD $data
>     *      smp_wmb()       (B)             smp_mb()        (D)
>     *      STORE ->data_head               STORE ->data_tail
>     *   }
>
> So we assume that the HW fulfils it's ordering requirements (B), and so
> we should use a complimentary rmb (C) to ensure that our read of its
> WRITE pointer is completed before we start accessing the data.
>
> The final mb (D) is implied by the uncached mmio we perform to inform
> the HW of our READ pointer.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=105064
> References: https://bugs.freedesktop.org/show_bug.cgi?id=105888
> References: https://bugs.freedesktop.org/show_bug.cgi?id=106185
> References: 61bf9719fa17 ("drm/i915/cnl: Use mmio access to context status buffer")
> Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Rafael Antognolli <rafael.antognolli@intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Timo Aaltonen <tjaalton@ubuntu.com>
> Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
> ---
>
> Just tweaked the commitmsg to cross reference the mb against the
> diagram.
> -Chris
>
> ---
>  drivers/gpu/drm/i915/intel_engine_cs.c | 3 ---
>  drivers/gpu/drm/i915/intel_lrc.c       | 1 +
>  2 files changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 70325e0824e3..8303e05b0c7d 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -470,9 +470,6 @@ static bool csb_force_mmio(struct drm_i915_private *i915)
>  	if (intel_vgpu_active(i915) && !intel_vgpu_has_hwsp_emulation(i915))
>  		return true;
>  
> -	if (IS_CANNONLAKE(i915))
> -		return true;
> -
>  	return false;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 911f288f78aa..8977600f0d81 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -992,6 +992,7 @@ static void execlists_submission_tasklet(unsigned long data)
>  
>  			head = execlists->csb_head;
>  			tail = READ_ONCE(buf[write_idx]);
> +			rmb(); /* Hopefully paired with a wmb() in HW */

If the gpu does ordered writes (with write buffer in between), this is ok.

If the gpu does not order writes, we would still need the rmb()
to prevent cpu from loading an stale csb entry ahead of tail read?

Quoting memory-barries.txt:
" (*) loads may be done speculatively, leading to the result having been fetched
     at the wrong time in the expected sequence of events;
"

So I would change the comment to /* Enforce tail vs csb entry read order */

-Mika

>  		}
>  		GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
>  			  engine->name,
> -- 
> 2.17.0
Chris Wilson May 8, 2018, 1:30 p.m. UTC | #3
Quoting Mika Kuoppala (2018-05-08 14:21:34)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > We assume that the CSB is written using the normal ringbuffer
> > coherency protocols, as outlined in kernel/events/ring_buffer.c:
> >
> >     *   (HW)                              (DRIVER)
> >     *
> >     *   if (LOAD ->data_tail) {            LOAD ->data_head
> >     *                      (A)             smp_rmb()       (C)
> >     *      STORE $data                     LOAD $data
> >     *      smp_wmb()       (B)             smp_mb()        (D)
> >     *      STORE ->data_head               STORE ->data_tail
> >     *   }
> >
> > So we assume that the HW fulfils it's ordering requirements (B), and so
> > we should use a complimentary rmb (C) to ensure that our read of its
> > WRITE pointer is completed before we start accessing the data.
> >
> > The final mb (D) is implied by the uncached mmio we perform to inform
> > the HW of our READ pointer.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=105064
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=105888
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=106185
> > References: 61bf9719fa17 ("drm/i915/cnl: Use mmio access to context status buffer")
> > Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > Cc: Rafael Antognolli <rafael.antognolli@intel.com>
> > Cc: Michel Thierry <michel.thierry@intel.com>
> > Cc: Timo Aaltonen <tjaalton@ubuntu.com>
> > Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
> > ---
> >
> > Just tweaked the commitmsg to cross reference the mb against the
> > diagram.
> > -Chris
> >
> > ---
> >  drivers/gpu/drm/i915/intel_engine_cs.c | 3 ---
> >  drivers/gpu/drm/i915/intel_lrc.c       | 1 +
> >  2 files changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 70325e0824e3..8303e05b0c7d 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -470,9 +470,6 @@ static bool csb_force_mmio(struct drm_i915_private *i915)
> >       if (intel_vgpu_active(i915) && !intel_vgpu_has_hwsp_emulation(i915))
> >               return true;
> >  
> > -     if (IS_CANNONLAKE(i915))
> > -             return true;
> > -
> >       return false;
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 911f288f78aa..8977600f0d81 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -992,6 +992,7 @@ static void execlists_submission_tasklet(unsigned long data)
> >  
> >                       head = execlists->csb_head;
> >                       tail = READ_ONCE(buf[write_idx]);
> > +                     rmb(); /* Hopefully paired with a wmb() in HW */
> 
> If the gpu does ordered writes (with write buffer in between), this is ok.
> 
> If the gpu does not order writes, we would still need the rmb()
> to prevent cpu from loading an stale csb entry ahead of tail read?
> 
> Quoting memory-barries.txt:
> " (*) loads may be done speculatively, leading to the result having been fetched
>      at the wrong time in the expected sequence of events;
> "
> 
> So I would change the comment to /* Enforce tail vs csb entry read order */

Barriers are always paired, and the comment should tell the reader where
the other barrier is. Otherwise we would just need read_barrier_depends()
to enforce the read order (which is just READ_ONCE).
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 70325e0824e3..8303e05b0c7d 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -470,9 +470,6 @@  static bool csb_force_mmio(struct drm_i915_private *i915)
 	if (intel_vgpu_active(i915) && !intel_vgpu_has_hwsp_emulation(i915))
 		return true;
 
-	if (IS_CANNONLAKE(i915))
-		return true;
-
 	return false;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 911f288f78aa..8977600f0d81 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -992,6 +992,7 @@  static void execlists_submission_tasklet(unsigned long data)
 
 			head = execlists->csb_head;
 			tail = READ_ONCE(buf[write_idx]);
+			rmb(); /* Hopefully paired with a wmb() in HW */
 		}
 		GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
 			  engine->name,