[v2] drm/i915/dg2: Add performance workaround 18019455067

Message ID 20220627125928.177845-1-lionel.g.landwerlin@intel.com
State New, archived

Commit Message

Lionel Landwerlin June 27, 2022, 12:59 p.m. UTC
The recommended number of stackIDs for the Ray Tracing subsystem is 512
rather than 2048 (the default HW programming).

v2: Move the programming to dg2_ctx_gt_tuning_init() (Lucas)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_regs.h     | 4 ++++
 drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 +++++
 2 files changed, 9 insertions(+)

Comments

Matt Roper June 29, 2022, 10:16 p.m. UTC | #1
On Mon, Jun 27, 2022 at 03:59:28PM +0300, Lionel Landwerlin wrote:
> The recommended number of stackIDs for Ray Tracing subsystem is 512
> rather than 2048 (default HW programming).
> 
> v2: Move the programming to dg2_ctx_gt_tuning_init() (Lucas)

I'm not sure this is actually the correct move.  As far as I can see on
bspec 46261, RT_CTRL isn't part of the engine's context, so we need to
make sure it gets added to engine->wa_list instead of
engine->ctx_wa_list, otherwise it won't be properly re-applied after
engine resets and such.  Most of our other tuning values are part of the
context image, so this one is a bit unusual.

To get it onto the engine->wa_list, the workaround needs to either be
defined via rcs_engine_wa_init() or general_render_compute_wa_init().
The latter is the new, preferred location for registers that are part of
the render/compute reset domain, but that don't live in the RCS engine's
0x2xxx MMIO range (since all RCS and CCS engines get reset together, the
items in general_render_compute_wa_init() will make sure it's dealt with
as part of the handling for the first RCS/CCS engine, so that we won't
miss out on applying it if the platform doesn't have an RCS).

At the moment we don't have too many "tuning" values that we need to set
that aren't part of an engine's context, so we don't yet have a
dedicated "tuning" function for engine-style workarounds like we do with
ctx-style workarounds.


Matt

Lucas De Marchi June 29, 2022, 11:11 p.m. UTC | #2
On Wed, Jun 29, 2022 at 03:16:09PM -0700, Matt Roper wrote:
>On Mon, Jun 27, 2022 at 03:59:28PM +0300, Lionel Landwerlin wrote:
>> The recommended number of stackIDs for Ray Tracing subsystem is 512
>> rather than 2048 (default HW programming).
>>
>> v2: Move the programming to dg2_ctx_gt_tuning_init() (Lucas)
>
>I'm not sure this is actually the correct move.  As far as I can see on
>bspec 46261, RT_CTRL isn't part of the engine's context, so we need to
>make sure it gets added to engine->wa_list instead of
>engine->ctx_wa_list, otherwise it won't be properly re-applied after
>engine resets and such.  Most of our other tuning values are part of the
>context image, so this one is a bit unusual.
>
>To get it onto the engine->wa_list, the workaround needs to either be
>defined via rcs_engine_wa_init() or general_render_compute_wa_init().
>The latter is the new, preferred location for registers that are part of
>the render/compute reset domain, but that don't live in the RCS engine's
>0x2xxx MMIO range (since all RCS and CCS engines get reset together, the
>items in general_render_compute_wa_init() will make sure it's dealt with
>as part of the handling for the first RCS/CCS engine, so that we won't
>miss out on applying it if the platform doesn't have an RCS).
>
>At the moment we don't have too many "tuning" values that we need to set
>that aren't part of an engine's context, so we don't yet have a
>dedicated "tuning" function for engine-style workarounds like we do with
>ctx-style workarounds.


What I meant in my review was not to move it to
dg2_ctx_gt_tuning_init(), but rather to follow the same logic: we need
an equivalent tuning function for engine workarounds.

Lucas De Marchi
Lionel Landwerlin June 30, 2022, 8:36 a.m. UTC | #3
On 30/06/2022 01:16, Matt Roper wrote:
> On Mon, Jun 27, 2022 at 03:59:28PM +0300, Lionel Landwerlin wrote:
>> The recommended number of stackIDs for Ray Tracing subsystem is 512
>> rather than 2048 (default HW programming).
>>
>> v2: Move the programming to dg2_ctx_gt_tuning_init() (Lucas)
> I'm not sure this is actually the correct move.  As far as I can see on
> bspec 46261, RT_CTRL isn't part of the engine's context, so we need to
> make sure it gets added to engine->wa_list instead of
> engine->ctx_wa_list, otherwise it won't be properly re-applied after
> engine resets and such.  Most of our other tuning values are part of the
> context image, so this one is a bit unusual.
>
> To get it onto the engine->wa_list, the workaround needs to either be
> defined via rcs_engine_wa_init() or general_render_compute_wa_init().
> The latter is the new, preferred location for registers that are part of
> the render/compute reset domain, but that don't live in the RCS engine's
> 0x2xxx MMIO range (since all RCS and CCS engines get reset together, the
> items in general_render_compute_wa_init() will make sure it's dealt with
> as part of the handling for the first RCS/CCS engine, so that we won't
> miss out on applying it if the platform doesn't have an RCS).
>
> At the moment we don't have too many "tuning" values that we need to set
> that aren't part of an engine's context, so we don't yet have a
> dedicated "tuning" function for engine-style workarounds like we do with
> ctx-style workarounds.
>
>
> Matt


Thanks Matt,


I didn't pay attention to the register offset, or to the fact that the
register is not context/engine specific.

I'll move it to general_render_compute_wa_init().


-Lionel


Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 07ef111947b8c..12fc87b957425 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1112,6 +1112,10 @@ 
 #define   GEN12_PUSH_CONST_DEREF_HOLD_DIS	REG_BIT(8)
 
 #define RT_CTRL					_MMIO(0xe530)
+#define   RT_CTRL_NUMBER_OF_STACKIDS_MASK	REG_GENMASK(6, 5)
+#define   NUMBER_OF_STACKIDS_512		2
+#define   NUMBER_OF_STACKIDS_1024		1
+#define   NUMBER_OF_STACKIDS_2048		0
 #define   DIS_NULL_QUERY			REG_BIT(10)
 
 #define EU_PERF_CNTL1				_MMIO(0xe558)
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 3213c593a55f4..4d80716b957d4 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -575,6 +575,11 @@  static void dg2_ctx_gt_tuning_init(struct intel_engine_cs *engine,
 	       FF_MODE2_TDS_TIMER_MASK,
 	       FF_MODE2_TDS_TIMER_128,
 	       0, false);
+	wa_write_clr_set(wal,
+			 RT_CTRL,
+			 RT_CTRL_NUMBER_OF_STACKIDS_MASK,
+			 REG_FIELD_PREP(RT_CTRL_NUMBER_OF_STACKIDS_MASK,
+					NUMBER_OF_STACKIDS_512));
 }
 
 /*