[1/6] drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+
diff mbox

Message ID 1452868545-19586-2-git-send-email-chris@chris-wilson.co.uk
State New
Headers show

Commit Message

Chris Wilson Jan. 15, 2016, 2:35 p.m. UTC
In order to ensure seqno/irq coherency, we current read a ring register.
We are not sure quite how it works, only that is does. Experiments show
that e.g. doing a clflush(seqno) instead is not sufficient, but we can
remove the forcewake dance from the mmio access.

v2: Baytrail wants a clflush too.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Mika Kuoppala Jan. 15, 2016, 5:55 p.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> In order to ensure seqno/irq coherency, we current read a ring register.
> We are not sure quite how it works, only that is does. Experiments show
> that e.g. doing a clflush(seqno) instead is not sufficient, but we can
> remove the forcewake dance from the mmio access.
>
> v2: Baytrail wants a clflush too.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 8cd8aabcc3ff..935add1422ae 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1492,10 +1492,21 @@ gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
>  {
>  	/* Workaround to force correct ordering between irq and seqno writes on
>  	 * ivb (and maybe also on snb) by reading from a CS register (like
> -	 * ACTHD) before reading the status page. */
> +	 * ACTHD) before reading the status page.
> +	 *
> +	 * Note that this effectively effectively stalls the read by the
> time

effectively twice there.

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> +	 * it takes to do a memory transaction, which more or less ensures
> +	 * that the write from the GPU has sufficient time to invalidate
> +	 * the CPU cacheline. Alternatively we could delay the interrupt from
> +	 * the CS ring to give the write time to land, but that would incur
> +	 * a delay after every batch i.e. much more frequent than a delay
> +	 * when waiting for the interrupt (with the same net latency).
> +	 */
>  	if (!lazy_coherency) {
>  		struct drm_i915_private *dev_priv = ring->dev->dev_private;
> -		POSTING_READ(RING_ACTHD(ring->mmio_base));
> +		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
> +
> +		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
>  	}
>  
>  	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
> -- 
> 2.7.0.rc3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Patch
diff mbox

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 8cd8aabcc3ff..935add1422ae 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1492,10 +1492,21 @@  gen6_ring_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
 {
 	/* Workaround to force correct ordering between irq and seqno writes on
 	 * ivb (and maybe also on snb) by reading from a CS register (like
-	 * ACTHD) before reading the status page. */
+	 * ACTHD) before reading the status page.
+	 *
+	 * Note that this effectively effectively stalls the read by the time
+	 * it takes to do a memory transaction, which more or less ensures
+	 * that the write from the GPU has sufficient time to invalidate
+	 * the CPU cacheline. Alternatively we could delay the interrupt from
+	 * the CS ring to give the write time to land, but that would incur
+	 * a delay after every batch i.e. much more frequent than a delay
+	 * when waiting for the interrupt (with the same net latency).
+	 */
 	if (!lazy_coherency) {
 		struct drm_i915_private *dev_priv = ring->dev->dev_private;
-		POSTING_READ(RING_ACTHD(ring->mmio_base));
+		POSTING_READ_FW(RING_ACTHD(ring->mmio_base));
+
+		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 	}
 
 	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);