RFT drm/i915/execlists: Flush memory before signaling ELSQ

Message ID	20181031090243.31993-1-chris@chris-wilson.co.uk (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Chris Wilson <chris@chris-wilson.co.uk> To: intel-gfx@lists.freedesktop.org Date: Wed, 31 Oct 2018 09:02:43 +0000 Message-Id: <20181031090243.31993-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH] RFT drm/i915/execlists: Flush memory before signaling ELSQ Precedence: list Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
Series	RFT drm/i915/execlists: Flush memory before signaling ELSQ \| expand RFT drm/i915/execlists: Flush memory before signaling ELSQ

Message ID

20181031090243.31993-1-chris@chris-wilson.co.uk (mailing list archive)

State

New, archived

Headers

From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Wed, 31 Oct 2018 09:02:43 +0000
Message-Id: <20181031090243.31993-1-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH] RFT drm/i915/execlists: Flush memory before
 signaling ELSQ
Precedence: list
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Series

RFT drm/i915/execlists: Flush memory before signaling ELSQ | expand

Commit Message

Chris Wilson Oct. 31, 2018, 9:02 a.m. UTC

We observe that the ordering of writes for a CS event is not as strong
from the GPU as we would like, and that on occasions we see the
ringbuffer tail updated before the event is written into the ringbuffer,
leading us to reuse the stale data.

Through around a big hammer to try and batter ELSQ into submission with
the presumption that perhaps the UC mmio write is not flushing our
writes into the context images.

References: https://bugs.freedesktop.org/show_bug.cgi?id=108315
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Mika Kuoppala Oct. 31, 2018, 2:49 p.m. UTC | #1

Chris Wilson <chris@chris-wilson.co.uk> writes:

> We observe that the ordering of writes for a CS event is not as strong
> from the GPU as we would like, and that on occasions we see the
> ringbuffer tail updated before the event is written into the ringbuffer,
> leading us to reuse the stale data.
>
> Through around a big hammer to try and batter ELSQ into submission with
> the presumption that perhaps the UC mmio write is not flushing our
> writes into the context images.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=108315
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 22b57b8926fc..ba61849fbb9b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -454,8 +454,10 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
>  	}
>  
>  	/* we need to manually load the submit queue */
> -	if (execlists->ctrl_reg)
> +	if (execlists->ctrl_reg) {
> +		wmb(); /* XXX Big hammer or paper? XXX */
>  		writel(EL_CTRL_LOAD, execlists->ctrl_reg);

Will be difficult to know if it is the additional delay or
not.

The question is then why would it only be with ctrl_reg and
not legacy submission mode. So lets try with legacy mode?

-Mika

> +	}
>  
>  	execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK);
>  }
> -- 
> 2.19.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Chris Wilson Oct. 31, 2018, 3:06 p.m. UTC | #2

Quoting Mika Kuoppala (2018-10-31 14:49:38)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > We observe that the ordering of writes for a CS event is not as strong
> > from the GPU as we would like, and that on occasions we see the
> > ringbuffer tail updated before the event is written into the ringbuffer,
> > leading us to reuse the stale data.
> >
> > Through around a big hammer to try and batter ELSQ into submission with
> > the presumption that perhaps the UC mmio write is not flushing our
> > writes into the context images.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=108315
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/intel_lrc.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 22b57b8926fc..ba61849fbb9b 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -454,8 +454,10 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
> >       }
> >  
> >       /* we need to manually load the submit queue */
> > -     if (execlists->ctrl_reg)
> > +     if (execlists->ctrl_reg) {
> > +             wmb(); /* XXX Big hammer or paper? XXX */
> >               writel(EL_CTRL_LOAD, execlists->ctrl_reg);
> 
> Will be difficult to know if it is the additional delay or
> not.

Either way; its notourbug.
 
> The question is then why would it only be with ctrl_reg and
> not legacy submission mode. So lets try with legacy mode?

Because it is definitely platform specific, and the only platform that
shows sign of this is uses ELSQ.
-Chris

Chris Wilson Oct. 31, 2018, 4:43 p.m. UTC | #3

Quoting Patchwork (2018-10-31 16:37:57)
> == Series Details ==
> 
> Series: RFT drm/i915/execlists: Flush memory before signaling ELSQ
> URL   : https://patchwork.freedesktop.org/series/51796/
> State : success
> 
> == Summary ==
> 
> = CI Bug Log - changes from CI_DRM_5062 -> Patchwork_10671 =
> 
> == Summary - WARNING ==
> 
>   Minor unknown changes coming with Patchwork_10671 need to be verified
>   manually.
>   
>   If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_10671, please notify your bug team to allow them
>   to document this new failure mode, which will reduce false positives in CI.
> 
>   External URL: https://patchwork.freedesktop.org/api/1.0/series/51796/revisions/1/mbox/
> 
> == Possible new issues ==
> 
>   Here are the unknown changes that may have been introduced in Patchwork_10671:
> 
>   === IGT changes ===
> 
>     ==== Warnings ====
> 
>     igt@prime_vgem@basic-fence-flip:
>       fi-ivb-3520m:       SKIP -> PASS
> 
>     
> == Known issues ==
> 
>   Here are the changes found in Patchwork_10671 that come from known issues:
> 
>   === IGT changes ===
> 
>     ==== Issues hit ====
> 
>     igt@drv_selftest@live_hangcheck:
>       fi-icl-u:           PASS -> INCOMPLETE (fdo#108315)

Hmm, that could be the failure we are looking for. Hard to tell as it
might just have been the preceding error during reset.
-Chris

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 22b57b8926fc..ba61849fbb9b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -454,8 +454,10 @@  static void execlists_submit_ports(struct intel_engine_cs *engine)
 	}
 
 	/* we need to manually load the submit queue */
-	if (execlists->ctrl_reg)
+	if (execlists->ctrl_reg) {
+		wmb(); /* XXX Big hammer or paper? XXX */
 		writel(EL_CTRL_LOAD, execlists->ctrl_reg);
+	}
 
 	execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK);
 }