diff mbox series

[v2,2/3] drm/i915: Setup MCR steering for RCS engine workarounds

Message ID 20200502045744.407060-3-matthew.d.roper@intel.com (mailing list archive)
State New, archived
Headers show
Series Steer multicast register workaround verification | expand

Commit Message

Matt Roper May 2, 2020, 4:57 a.m. UTC
Reads of multicast registers give the value associated with
slice/subslice 0 by default unless we manually steer the reads to a
different slice/subslice.  If slice/subslice 0 are fused off in hardware,
performing unsteered reads of multicast registers will return a value of
0 rather than the value we wrote into the multicast register.

To ensure we can properly readback and verify workarounds that touch
registers in a multicast range, we currently setup MCR steering to a
known-valid slice/subslice as the very first item in the GT workaround
list for gen10+.  That steering will then be in place as we verify the
rest of the registers that show up in the GT workaround list, and at
initialization the steering will also still be in effect when we move on
to applying and verifying the workarounds in the RCS engine's workaround
list (which is where most of the multicast registers actually show up).

However we seem run into problems during resets where RCS engine
workarounds are applied without being preceded by application of the GT
workaround list and the steering isn't in place.  Let's add the same MCR
steering to the beginning of the RCS engine's workaround list to ensure
that it's always in place and we don't get erroneous messages about RCS
engine workarounds failing to apply.

References: https://gitlab.freedesktop.org/drm/intel/issues/1222
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: chris@chris-wilson.co.uk
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Tvrtko Ursulin May 4, 2020, 11:43 a.m. UTC | #1
On 02/05/2020 05:57, Matt Roper wrote:
> Reads of multicast registers give the value associated with
> slice/subslice 0 by default unless we manually steer the reads to a
> different slice/subslice.  If slice/subslice 0 are fused off in hardware,
> performing unsteered reads of multicast registers will return a value of
> 0 rather than the value we wrote into the multicast register.
> 
> To ensure we can properly readback and verify workarounds that touch
> registers in a multicast range, we currently setup MCR steering to a
> known-valid slice/subslice as the very first item in the GT workaround
> list for gen10+.  That steering will then be in place as we verify the
> rest of the registers that show up in the GT workaround list, and at
> initialization the steering will also still be in effect when we move on
> to applying and verifying the workarounds in the RCS engine's workaround
> list (which is where most of the multicast registers actually show up).
> 
> However we seem run into problems during resets where RCS engine
> workarounds are applied without being preceded by application of the GT
> workaround list and the steering isn't in place.  Let's add the same MCR
> steering to the beginning of the RCS engine's workaround list to ensure
> that it's always in place and we don't get erroneous messages about RCS
> engine workarounds failing to apply.
> 
> References: https://gitlab.freedesktop.org/drm/intel/issues/1222
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: chris@chris-wilson.co.uk
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_workarounds.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index 4a255de13394..b11b83546696 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -1345,6 +1345,9 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
>   
> +	if (INTEL_GEN(i915) >= 10)
> +		wa_init_mcr(i915, wal);
> +
>   	if (IS_TGL_REVID(i915, TGL_REVID_A0, TGL_REVID_A0)) {
>   		/*
>   		 * Wa_1607138336:tgl
> 

No complaints, only a question - is live_engine_reset_workarounds able 
to catch this, presumably sporadic, 0xfdc loss after engine reset?

Regards,

Tvrtko
Matt Roper May 4, 2020, 11:34 p.m. UTC | #2
On Mon, May 04, 2020 at 12:43:54PM +0100, Tvrtko Ursulin wrote:
> 
> On 02/05/2020 05:57, Matt Roper wrote:
> > Reads of multicast registers give the value associated with
> > slice/subslice 0 by default unless we manually steer the reads to a
> > different slice/subslice.  If slice/subslice 0 are fused off in hardware,
> > performing unsteered reads of multicast registers will return a value of
> > 0 rather than the value we wrote into the multicast register.
> > 
> > To ensure we can properly readback and verify workarounds that touch
> > registers in a multicast range, we currently setup MCR steering to a
> > known-valid slice/subslice as the very first item in the GT workaround
> > list for gen10+.  That steering will then be in place as we verify the
> > rest of the registers that show up in the GT workaround list, and at
> > initialization the steering will also still be in effect when we move on
> > to applying and verifying the workarounds in the RCS engine's workaround
> > list (which is where most of the multicast registers actually show up).
> > 
> > However we seem run into problems during resets where RCS engine
> > workarounds are applied without being preceded by application of the GT
> > workaround list and the steering isn't in place.  Let's add the same MCR
> > steering to the beginning of the RCS engine's workaround list to ensure
> > that it's always in place and we don't get erroneous messages about RCS
> > engine workarounds failing to apply.
> > 
> > References: https://gitlab.freedesktop.org/drm/intel/issues/1222
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > Cc: chris@chris-wilson.co.uk
> > Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_workarounds.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > index 4a255de13394..b11b83546696 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > @@ -1345,6 +1345,9 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
> >   {
> >   	struct drm_i915_private *i915 = engine->i915;
> > +	if (INTEL_GEN(i915) >= 10)
> > +		wa_init_mcr(i915, wal);
> > +
> >   	if (IS_TGL_REVID(i915, TGL_REVID_A0, TGL_REVID_A0)) {
> >   		/*
> >   		 * Wa_1607138336:tgl
> > 
> 
> No complaints, only a question - is live_engine_reset_workarounds able to
> catch this, presumably sporadic, 0xfdc loss after engine reset?

From what I can see, it looks like that selftests uses a separate
ring-based approach to handling the workarounds rather than using the
CPU.  It looks like that selftest just skips all MCR registers since we
can't steer ring accesses the way we can with the CPU.


Matt

> 
> Regards,
> 
> Tvrtko
Tvrtko Ursulin May 5, 2020, 8:14 a.m. UTC | #3
On 05/05/2020 00:34, Matt Roper wrote:
> On Mon, May 04, 2020 at 12:43:54PM +0100, Tvrtko Ursulin wrote:
>> On 02/05/2020 05:57, Matt Roper wrote:
>>> Reads of multicast registers give the value associated with
>>> slice/subslice 0 by default unless we manually steer the reads to a
>>> different slice/subslice.  If slice/subslice 0 are fused off in hardware,
>>> performing unsteered reads of multicast registers will return a value of
>>> 0 rather than the value we wrote into the multicast register.
>>>
>>> To ensure we can properly readback and verify workarounds that touch
>>> registers in a multicast range, we currently setup MCR steering to a
>>> known-valid slice/subslice as the very first item in the GT workaround
>>> list for gen10+.  That steering will then be in place as we verify the
>>> rest of the registers that show up in the GT workaround list, and at
>>> initialization the steering will also still be in effect when we move on
>>> to applying and verifying the workarounds in the RCS engine's workaround
>>> list (which is where most of the multicast registers actually show up).
>>>
>>> However we seem run into problems during resets where RCS engine
>>> workarounds are applied without being preceded by application of the GT
>>> workaround list and the steering isn't in place.  Let's add the same MCR
>>> steering to the beginning of the RCS engine's workaround list to ensure
>>> that it's always in place and we don't get erroneous messages about RCS
>>> engine workarounds failing to apply.
>>>
>>> References: https://gitlab.freedesktop.org/drm/intel/issues/1222
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> Cc: chris@chris-wilson.co.uk
>>> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_workarounds.c | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
>>> index 4a255de13394..b11b83546696 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
>>> @@ -1345,6 +1345,9 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>>>    {
>>>    	struct drm_i915_private *i915 = engine->i915;
>>> +	if (INTEL_GEN(i915) >= 10)
>>> +		wa_init_mcr(i915, wal);
>>> +
>>>    	if (IS_TGL_REVID(i915, TGL_REVID_A0, TGL_REVID_A0)) {
>>>    		/*
>>>    		 * Wa_1607138336:tgl
>>>
>>
>> No complaints, only a question - is live_engine_reset_workarounds able to
>> catch this, presumably sporadic, 0xfdc loss after engine reset?
> 
>>From what I can see, it looks like that selftests uses a separate
> ring-based approach to handling the workarounds rather than using the
> CPU.  It looks like that selftest just skips all MCR registers since we
> can't steer ring accesses the way we can with the CPU.

But 0xfdc is verified after engine reset with a mmio read. Because it is 
in the GT list. Strange.. Ack on the series anyway.

Regards,

Tvrtko
Daniele Ceraolo Spurio May 12, 2020, 5:30 p.m. UTC | #4
On 5/1/20 9:57 PM, Matt Roper wrote:
> Reads of multicast registers give the value associated with
> slice/subslice 0 by default unless we manually steer the reads to a
> different slice/subslice.  If slice/subslice 0 are fused off in hardware,
> performing unsteered reads of multicast registers will return a value of
> 0 rather than the value we wrote into the multicast register.
> 
> To ensure we can properly readback and verify workarounds that touch
> registers in a multicast range, we currently setup MCR steering to a
> known-valid slice/subslice as the very first item in the GT workaround
> list for gen10+.  That steering will then be in place as we verify the
> rest of the registers that show up in the GT workaround list, and at
> initialization the steering will also still be in effect when we move on
> to applying and verifying the workarounds in the RCS engine's workaround
> list (which is where most of the multicast registers actually show up).
> 
> However we seem run into problems during resets where RCS engine
> workarounds are applied without being preceded by application of the GT
> workaround list and the steering isn't in place.  Let's add the same MCR
> steering to the beginning of the RCS engine's workaround list to ensure
> that it's always in place and we don't get erroneous messages about RCS
> engine workarounds failing to apply.
> 

Did you check with the HW team about this behavior? AFAIU from reading 
the specs, the 0xfdc register should be nowhere close the render domain 
and therefore it should not be impacted by RCS reset. If it really is 
impacted, it might indicate that something funny is happening on the HW 
and there might be other register in the area that are also incorrectly 
losing their value, so IMO worth investigating.

Daniele

> References: https://gitlab.freedesktop.org/drm/intel/issues/1222
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: chris@chris-wilson.co.uk
> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_workarounds.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index 4a255de13394..b11b83546696 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -1345,6 +1345,9 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
>   
> +	if (INTEL_GEN(i915) >= 10)
> +		wa_init_mcr(i915, wal);
> +
>   	if (IS_TGL_REVID(i915, TGL_REVID_A0, TGL_REVID_A0)) {
>   		/*
>   		 * Wa_1607138336:tgl
>
Matt Roper May 12, 2020, 11:28 p.m. UTC | #5
On Tue, May 12, 2020 at 10:30:31AM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 5/1/20 9:57 PM, Matt Roper wrote:
> > Reads of multicast registers give the value associated with
> > slice/subslice 0 by default unless we manually steer the reads to a
> > different slice/subslice.  If slice/subslice 0 are fused off in hardware,
> > performing unsteered reads of multicast registers will return a value of
> > 0 rather than the value we wrote into the multicast register.
> > 
> > To ensure we can properly readback and verify workarounds that touch
> > registers in a multicast range, we currently setup MCR steering to a
> > known-valid slice/subslice as the very first item in the GT workaround
> > list for gen10+.  That steering will then be in place as we verify the
> > rest of the registers that show up in the GT workaround list, and at
> > initialization the steering will also still be in effect when we move on
> > to applying and verifying the workarounds in the RCS engine's workaround
> > list (which is where most of the multicast registers actually show up).
> > 
> > However we seem run into problems during resets where RCS engine
> > workarounds are applied without being preceded by application of the GT
> > workaround list and the steering isn't in place.  Let's add the same MCR
> > steering to the beginning of the RCS engine's workaround list to ensure
> > that it's always in place and we don't get erroneous messages about RCS
> > engine workarounds failing to apply.
> > 
> 
> Did you check with the HW team about this behavior? AFAIU from reading the
> specs, the 0xfdc register should be nowhere close the render domain and
> therefore it should not be impacted by RCS reset. If it really is impacted,
> it might indicate that something funny is happening on the HW and there
> might be other register in the area that are also incorrectly losing their
> value, so IMO worth investigating.

The theory is that it might not be the RCS reset itself that causes us
to lose 0xFDC but rather something else that happened between the last
time we applied the steering (e.g., at init) and when we do the reset.

We've asked the hardware guys about this register not sticking, but
really the only thing we've heard back so far is "try checking the
steering."  Presumably steering is good at startup since the workaround
sticks at that point, but then gets changed or cleared later by the time
we do the reset.  Unfortunately none of our pre-merge CI machines or
local machines exhibit this problem, only the CI "re-run" machine, so we
don't have a good way to investigate this in a more direct manner.

Would re-applying the steering in the engine workaround list cause any
problems?  It seems like in the worst case it would do nothing at all
and the problem would persist, but at least it would protect us in case
some other unknown event causes 0xfdc to be reset, and the eventual
results we get from CI re-runs would let us know if we're on the right
track or not.


Matt

> 
> Daniele
> 
> > References: https://gitlab.freedesktop.org/drm/intel/issues/1222
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > Cc: chris@chris-wilson.co.uk
> > Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_workarounds.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > index 4a255de13394..b11b83546696 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> > @@ -1345,6 +1345,9 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
> >   {
> >   	struct drm_i915_private *i915 = engine->i915;
> > +	if (INTEL_GEN(i915) >= 10)
> > +		wa_init_mcr(i915, wal);
> > +
> >   	if (IS_TGL_REVID(i915, TGL_REVID_A0, TGL_REVID_A0)) {
> >   		/*
> >   		 * Wa_1607138336:tgl
> >
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 4a255de13394..b11b83546696 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -1345,6 +1345,9 @@  rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 {
 	struct drm_i915_private *i915 = engine->i915;
 
+	if (INTEL_GEN(i915) >= 10)
+		wa_init_mcr(i915, wal);
+
 	if (IS_TGL_REVID(i915, TGL_REVID_A0, TGL_REVID_A0)) {
 		/*
 		 * Wa_1607138336:tgl