diff mbox

[1/2] drm/i915/execlists: Use rmb() to order CSB reads

Message ID 20180508163042.21825-2-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson May 8, 2018, 4:30 p.m. UTC
We assume that the CSB is written using the normal ringbuffer
coherency protocols, as outlined in kernel/events/ring_buffer.c:

    *   (HW)                              (DRIVER)
    *
    *   if (LOAD ->data_tail) {            LOAD ->data_head
    *                      (A)             smp_rmb()       (C)
    *      STORE $data                     LOAD $data
    *      smp_wmb()       (B)             smp_mb()        (D)
    *      STORE ->data_head               STORE ->data_tail
    *   }

So we assume that the HW fulfils its ordering requirements (B), and so
we should use a complimentary rmb (C) to ensure that our read of its
WRITE pointer is completed before we start accessing the data.

The final mb (D) is implied by the uncached mmio we perform to inform
the HW of our READ pointer.

References: https://bugs.freedesktop.org/show_bug.cgi?id=105064
References: https://bugs.freedesktop.org/show_bug.cgi?id=105888
References: https://bugs.freedesktop.org/show_bug.cgi?id=106185
Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
References: 61bf9719fa17 ("drm/i915/cnl: Use mmio access to context status buffer")
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Timo Aaltonen <tjaalton@ubuntu.com>
Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Michel Thierry May 8, 2018, 5:20 p.m. UTC | #1
On 05/08/2018 09:30 AM, Chris Wilson wrote:
> We assume that the CSB is written using the normal ringbuffer
> coherency protocols, as outlined in kernel/events/ring_buffer.c:
> 
>      *   (HW)                              (DRIVER)
>      *
>      *   if (LOAD ->data_tail) {            LOAD ->data_head
>      *                      (A)             smp_rmb()       (C)
>      *      STORE $data                     LOAD $data
>      *      smp_wmb()       (B)             smp_mb()        (D)
>      *      STORE ->data_head               STORE ->data_tail
>      *   }
> 
> So we assume that the HW fulfils its ordering requirements (B), and so
> we should use a complimentary rmb (C) to ensure that our read of its
> WRITE pointer is completed before we start accessing the data.
> 
> The final mb (D) is implied by the uncached mmio we perform to inform
> the HW of our READ pointer.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=105064
> References: https://bugs.freedesktop.org/show_bug.cgi?id=105888
> References: https://bugs.freedesktop.org/show_bug.cgi?id=106185
> Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
> References: 61bf9719fa17 ("drm/i915/cnl: Use mmio access to context status buffer")
> Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Rafael Antognolli <rafael.antognolli@intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Timo Aaltonen <tjaalton@ubuntu.com>
> Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>

Acked-by: Michel Thierry <michel.thierry@intel.com>

> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 911f288f78aa..8977600f0d81 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -992,6 +992,7 @@ static void execlists_submission_tasklet(unsigned long data)
>   
>   			head = execlists->csb_head;
>   			tail = READ_ONCE(buf[write_idx]);
> +			rmb(); /* Hopefully paired with a wmb() in HW */
>   		}
>   		GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
>   			  engine->name,
>
Chris Wilson May 9, 2018, 7:31 a.m. UTC | #2
Quoting Chris Wilson (2018-05-08 17:30:41)
> We assume that the CSB is written using the normal ringbuffer
> coherency protocols, as outlined in kernel/events/ring_buffer.c:
> 
>     *   (HW)                              (DRIVER)
>     *
>     *   if (LOAD ->data_tail) {            LOAD ->data_head
>     *                      (A)             smp_rmb()       (C)
>     *      STORE $data                     LOAD $data
>     *      smp_wmb()       (B)             smp_mb()        (D)
>     *      STORE ->data_head               STORE ->data_tail
>     *   }
> 
> So we assume that the HW fulfils its ordering requirements (B), and so
> we should use a complimentary rmb (C) to ensure that our read of its
> WRITE pointer is completed before we start accessing the data.
> 
> The final mb (D) is implied by the uncached mmio we perform to inform
> the HW of our READ pointer.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=105064
-References: https://bugs.freedesktop.org/show_bug.cgi?id=105064

Resolved as ba74cb10c775 ("drm/i915/execlists: Delay writing to ELSP
until HW has processed the previous write")

So the other possibility is that this fixes up the issue with VT-d
latency.
-Chris
Mika Kuoppala May 11, 2018, 12:02 p.m. UTC | #3
Chris Wilson <chris@chris-wilson.co.uk> writes:

> We assume that the CSB is written using the normal ringbuffer
> coherency protocols, as outlined in kernel/events/ring_buffer.c:
>
>     *   (HW)                              (DRIVER)
>     *
>     *   if (LOAD ->data_tail) {            LOAD ->data_head
>     *                      (A)             smp_rmb()       (C)
>     *      STORE $data                     LOAD $data
>     *      smp_wmb()       (B)             smp_mb()        (D)
>     *      STORE ->data_head               STORE ->data_tail
>     *   }
>
> So we assume that the HW fulfils its ordering requirements (B), and so
> we should use a complimentary rmb (C) to ensure that our read of its
> WRITE pointer is completed before we start accessing the data.
>
> The final mb (D) is implied by the uncached mmio we perform to inform
> the HW of our READ pointer.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=105064
> References: https://bugs.freedesktop.org/show_bug.cgi?id=105888
> References: https://bugs.freedesktop.org/show_bug.cgi?id=106185
> Fixes: 767a983ab255 ("drm/i915/execlists: Read the context-status HEAD from the HWSP")
> References: 61bf9719fa17 ("drm/i915/cnl: Use mmio access to context status buffer")
> Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Rafael Antognolli <rafael.antognolli@intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Timo Aaltonen <tjaalton@ubuntu.com>
> Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>

Acked-by: Mika Kuoppala <mika.kuoppala@intel.com>

This also makes the hwsp access work fine with iommu on (kbl).

-Mika

> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 911f288f78aa..8977600f0d81 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -992,6 +992,7 @@ static void execlists_submission_tasklet(unsigned long data)
>  
>  			head = execlists->csb_head;
>  			tail = READ_ONCE(buf[write_idx]);
> +			rmb(); /* Hopefully paired with a wmb() in HW */
>  		}
>  		GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
>  			  engine->name,
> -- 
> 2.17.0
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 911f288f78aa..8977600f0d81 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -992,6 +992,7 @@  static void execlists_submission_tasklet(unsigned long data)
 
 			head = execlists->csb_head;
 			tail = READ_ONCE(buf[write_idx]);
+			rmb(); /* Hopefully paired with a wmb() in HW */
 		}
 		GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
 			  engine->name,