diff mbox series

[5/8] drm/i915/perf: lock powergating configuration to default when active

Message ID 20180814144058.19286-6-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series Per context dynamic (sub)slice power-gating | expand

Commit Message

Tvrtko Ursulin Aug. 14, 2018, 2:40 p.m. UTC
From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

If some of the contexts submitting workloads to the GPU have been
configured to shutdown slices/subslices, we might loose the NOA
configurations written in the NOA muxes.

One possible solution to this problem is to reprogram the NOA muxes
when we switch to a new context. We initially tried this in the
workaround batchbuffer but some concerns where raised about the cost
of reprogramming at every context switch. This solution is also not
without consequences from the userspace point of view. Reprogramming
of the muxes can only happen once the powergating configuration has
changed (which happens after context switch). This means for a window
of time during the recording, counters recorded by the OA unit might
be invalid. This requires userspace dealing with OA reports to discard
the invalid values.

Minimizing the reprogramming could be implemented by tracking of the
last programmed configuration somewhere in GGTT and use MI_PREDICATE
to discard some of the programming commands, but the command streamer
would still have to parse all the MI_LRI instructions in the
workaround batchbuffer.

Another solution, which this change implements, is to simply disregard
the user requested configuration for the period of time when i915/perf
is active. There is no known issue with this apart from a performance
penality for some media workloads that benefit from running on a
partially powergated GPU. We already prevent RC6 from affecting the
programming so it doesn't sound completely unreasonable to hold on
powergating for the same reason.

v2: Leave RPCS programming in intel_lrc.c (Lionel)

v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
    More to_intel_context() (Tvrtko)
    s/dev_priv/i915/ (Tvrtko)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  | 15 +++++++++++++++
 drivers/gpu/drm/i915/i915_perf.c | 23 ++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++----
 drivers/gpu/drm/i915/intel_lrc.h |  3 +++
 4 files changed, 43 insertions(+), 9 deletions(-)

Comments

Lionel Landwerlin Aug. 14, 2018, 2:59 p.m. UTC | #1
Hey Tvrtko,

Thanks for taking over this series.

I've been talking to developers using the i915/perf interface and from 
their point of view, they expect the system to be in a stable 
configuration when doing measurements.

One issue with this patch on Gen11 is that it will lock the system in a 
configuration that isn't workable for media workloads (all subslices 
enabled).
So I think we should set the value for the locked configuration per 
generation (gen8-10: all slices/subslices, gen11: only subslices that 
contain VME samplers) so that we always get a functional configurations 
for all workloads.
Could we want to select that configuration when opening perf?

Another question is how do we expose the selected value to the user. But 
that can be solved in a different series.

Cheers,

-
Lionel

On 14/08/18 15:40, Tvrtko Ursulin wrote:
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
> If some of the contexts submitting workloads to the GPU have been
> configured to shutdown slices/subslices, we might loose the NOA
> configurations written in the NOA muxes.
>
> One possible solution to this problem is to reprogram the NOA muxes
> when we switch to a new context. We initially tried this in the
> workaround batchbuffer but some concerns where raised about the cost
> of reprogramming at every context switch. This solution is also not
> without consequences from the userspace point of view. Reprogramming
> of the muxes can only happen once the powergating configuration has
> changed (which happens after context switch). This means for a window
> of time during the recording, counters recorded by the OA unit might
> be invalid. This requires userspace dealing with OA reports to discard
> the invalid values.
>
> Minimizing the reprogramming could be implemented by tracking of the
> last programmed configuration somewhere in GGTT and use MI_PREDICATE
> to discard some of the programming commands, but the command streamer
> would still have to parse all the MI_LRI instructions in the
> workaround batchbuffer.
>
> Another solution, which this change implements, is to simply disregard
> the user requested configuration for the period of time when i915/perf
> is active. There is no known issue with this apart from a performance
> penality for some media workloads that benefit from running on a
> partially powergated GPU. We already prevent RC6 from affecting the
> programming so it doesn't sound completely unreasonable to hold on
> powergating for the same reason.
>
> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>
> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>      More to_intel_context() (Tvrtko)
>      s/dev_priv/i915/ (Tvrtko)
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h  | 15 +++++++++++++++
>   drivers/gpu/drm/i915/i915_perf.c | 23 ++++++++++++++++++-----
>   drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++----
>   drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>   4 files changed, 43 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index d6049c3f911b..5c12d2676435 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3851,4 +3851,19 @@ static inline int intel_hws_csb_write_index(struct drm_i915_private *i915)
>   		return I915_HWS_CSB_WRITE_INDEX;
>   }
>   
> +static inline struct intel_sseu
> +intel_engine_prepare_sseu(struct intel_engine_cs *engine,
> +			  struct intel_sseu sseu)
> +{
> +	struct drm_i915_private *i915 = engine->i915;
> +
> +	/*
> +	 * If i915/perf is active, we want a stable powergating configuration
> +	 * on the system. The most natural configuration to take in that case
> +	 * is the default (i.e maximum the hardware can do).
> +	 */
> +	return i915->perf.oa.exclusive_stream ?
> +		intel_device_default_sseu(i915) : sseu;
> +}
> +
>   #endif
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index ccb20230df2c..c2fc2399e0ed 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1631,7 +1631,8 @@ static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
>    */
>   static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>   					   u32 *reg_state,
> -					   const struct i915_oa_config *oa_config)
> +					   const struct i915_oa_config *oa_config,
> +					   struct intel_sseu sseu)
>   {
>   	struct drm_i915_private *dev_priv = ctx->i915;
>   	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
> @@ -1677,6 +1678,9 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>   
>   		CTX_REG(reg_state, state_offset, flex_regs[i], value);
>   	}
> +
> +	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> +		gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu));
>   }
>   
>   /*
> @@ -1807,6 +1811,7 @@ static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_pr
>   static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
>   				       const struct i915_oa_config *oa_config)
>   {
> +	struct intel_sseu default_sseu = intel_device_default_sseu(dev_priv);
>   	struct intel_engine_cs *engine = dev_priv->engine[RCS];
>   	struct i915_gem_context *ctx;
>   	int ret;
> @@ -1854,7 +1859,8 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
>   		ce->state->obj->mm.dirty = true;
>   		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
>   
> -		gen8_update_reg_state_unlocked(ctx, regs, oa_config);
> +		gen8_update_reg_state_unlocked(ctx, regs, oa_config,
> +					       oa_config ? default_sseu : ce->sseu);
>   
>   		i915_gem_object_unpin_map(ce->state->obj);
>   	}
> @@ -2226,14 +2232,21 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
>   			    struct i915_gem_context *ctx,
>   			    u32 *reg_state)
>   {
> +	struct drm_i915_private *i915 = engine->i915;
>   	struct i915_perf_stream *stream;
>   
>   	if (engine->id != RCS)
>   		return;
>   
> -	stream = engine->i915->perf.oa.exclusive_stream;
> -	if (stream)
> -		gen8_update_reg_state_unlocked(ctx, reg_state, stream->oa_config);
> +	stream = i915->perf.oa.exclusive_stream;
> +	if (stream) {
> +		struct intel_sseu default_sseu =
> +			intel_device_default_sseu(i915);
> +
> +		gen8_update_reg_state_unlocked(ctx, reg_state,
> +					       stream->oa_config,
> +					       default_sseu);
> +	}
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 7b2f2d6bb057..8a2997be7ef7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2484,8 +2484,8 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
>   	return logical_ring_init(engine);
>   }
>   
> -static u32 make_rpcs(const struct sseu_dev_info *sseu,
> -		     struct intel_sseu ctx_sseu)
> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
> +		   struct intel_sseu ctx_sseu)
>   {
>   	u32 rpcs = 0;
>   
> @@ -2635,10 +2635,13 @@ static void execlists_init_reg_state(u32 *regs,
>   	}
>   
>   	if (rcs) {
> +		struct intel_sseu sseu = to_intel_context(ctx, engine)->sseu;
> +
>   		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>   		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> -			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
> -				  to_intel_context(ctx, engine)->sseu));
> +			gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu,
> +				       intel_engine_prepare_sseu(engine,
> +								 sseu)));
>   
>   		i915_oa_init_reg_state(engine, ctx, regs);
>   	}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index f5a5502ecf70..bf3acdc3d0af 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -104,4 +104,7 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv);
>   
>   void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
>   
> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
> +		   struct intel_sseu ctx_sseu);
> +
>   #endif /* _INTEL_LRC_H_ */
Tvrtko Ursulin Aug. 15, 2018, 11:58 a.m. UTC | #2
On 14/08/2018 15:59, Lionel Landwerlin wrote:
> Hey Tvrtko,
> 
> Thanks for taking over this series.
> 
> I've been talking to developers using the i915/perf interface and from 
> their point of view, they expect the system to be in a stable 
> configuration when doing measurements.
> 
> One issue with this patch on Gen11 is that it will lock the system in a 
> configuration that isn't workable for media workloads (all subslices 
> enabled).
> So I think we should set the value for the locked configuration per 
> generation (gen8-10: all slices/subslices, gen11: only subslices that 
> contain VME samplers) so that we always get a functional configurations 
> for all workloads.
> Could we want to select that configuration when opening perf?

This would be via i915_perf.c/gen8_configure_all_contexts?

Sounds like an unfortunate but workable compromise. As long as there 
doesn't appear another asymmetric slice feature in the future, which 
doesn't align with VME. Like one workloads wants slices 0 & 2, another 
wants 1 & 3, or whatever. If that happens then I don't know what we do 
apart from locking out perf/OA.

> Another question is how do we expose the selected value to the user. But 
> that can be solved in a different series.

SSEU get_param would cut it? Although it would be perhaps be unexpected 
to get different results depending on whether perf/OA is active or not.. 
Hm... export device and available bitmasks via get param? Device bitmask 
would be fixed and available would change depending on whether perf/OA 
is active or not.

Regards,

Tvrtko

> Cheers,
> 
> -
> Lionel
> 
> On 14/08/18 15:40, Tvrtko Ursulin wrote:
>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>
>> If some of the contexts submitting workloads to the GPU have been
>> configured to shutdown slices/subslices, we might loose the NOA
>> configurations written in the NOA muxes.
>>
>> One possible solution to this problem is to reprogram the NOA muxes
>> when we switch to a new context. We initially tried this in the
>> workaround batchbuffer but some concerns where raised about the cost
>> of reprogramming at every context switch. This solution is also not
>> without consequences from the userspace point of view. Reprogramming
>> of the muxes can only happen once the powergating configuration has
>> changed (which happens after context switch). This means for a window
>> of time during the recording, counters recorded by the OA unit might
>> be invalid. This requires userspace dealing with OA reports to discard
>> the invalid values.
>>
>> Minimizing the reprogramming could be implemented by tracking of the
>> last programmed configuration somewhere in GGTT and use MI_PREDICATE
>> to discard some of the programming commands, but the command streamer
>> would still have to parse all the MI_LRI instructions in the
>> workaround batchbuffer.
>>
>> Another solution, which this change implements, is to simply disregard
>> the user requested configuration for the period of time when i915/perf
>> is active. There is no known issue with this apart from a performance
>> penality for some media workloads that benefit from running on a
>> partially powergated GPU. We already prevent RC6 from affecting the
>> programming so it doesn't sound completely unreasonable to hold on
>> powergating for the same reason.
>>
>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>
>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>      More to_intel_context() (Tvrtko)
>>      s/dev_priv/i915/ (Tvrtko)
>>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h  | 15 +++++++++++++++
>>   drivers/gpu/drm/i915/i915_perf.c | 23 ++++++++++++++++++-----
>>   drivers/gpu/drm/i915/intel_lrc.c | 11 +++++++----
>>   drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>>   4 files changed, 43 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index d6049c3f911b..5c12d2676435 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -3851,4 +3851,19 @@ static inline int 
>> intel_hws_csb_write_index(struct drm_i915_private *i915)
>>           return I915_HWS_CSB_WRITE_INDEX;
>>   }
>> +static inline struct intel_sseu
>> +intel_engine_prepare_sseu(struct intel_engine_cs *engine,
>> +              struct intel_sseu sseu)
>> +{
>> +    struct drm_i915_private *i915 = engine->i915;
>> +
>> +    /*
>> +     * If i915/perf is active, we want a stable powergating 
>> configuration
>> +     * on the system. The most natural configuration to take in that 
>> case
>> +     * is the default (i.e maximum the hardware can do).
>> +     */
>> +    return i915->perf.oa.exclusive_stream ?
>> +        intel_device_default_sseu(i915) : sseu;
>> +}
>> +
>>   #endif
>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>> b/drivers/gpu/drm/i915/i915_perf.c
>> index ccb20230df2c..c2fc2399e0ed 100644
>> --- a/drivers/gpu/drm/i915/i915_perf.c
>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>> @@ -1631,7 +1631,8 @@ static void hsw_disable_metric_set(struct 
>> drm_i915_private *dev_priv)
>>    */
>>   static void gen8_update_reg_state_unlocked(struct i915_gem_context 
>> *ctx,
>>                          u32 *reg_state,
>> -                       const struct i915_oa_config *oa_config)
>> +                       const struct i915_oa_config *oa_config,
>> +                       struct intel_sseu sseu)
>>   {
>>       struct drm_i915_private *dev_priv = ctx->i915;
>>       u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
>> @@ -1677,6 +1678,9 @@ static void 
>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>           CTX_REG(reg_state, state_offset, flex_regs[i], value);
>>       }
>> +
>> +    CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>> +        gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu));
>>   }
>>   /*
>> @@ -1807,6 +1811,7 @@ static int 
>> gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_pr
>>   static int gen8_configure_all_contexts(struct drm_i915_private 
>> *dev_priv,
>>                          const struct i915_oa_config *oa_config)
>>   {
>> +    struct intel_sseu default_sseu = 
>> intel_device_default_sseu(dev_priv);
>>       struct intel_engine_cs *engine = dev_priv->engine[RCS];
>>       struct i915_gem_context *ctx;
>>       int ret;
>> @@ -1854,7 +1859,8 @@ static int gen8_configure_all_contexts(struct 
>> drm_i915_private *dev_priv,
>>           ce->state->obj->mm.dirty = true;
>>           regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
>> -        gen8_update_reg_state_unlocked(ctx, regs, oa_config);
>> +        gen8_update_reg_state_unlocked(ctx, regs, oa_config,
>> +                           oa_config ? default_sseu : ce->sseu);
>>           i915_gem_object_unpin_map(ce->state->obj);
>>       }
>> @@ -2226,14 +2232,21 @@ void i915_oa_init_reg_state(struct 
>> intel_engine_cs *engine,
>>                   struct i915_gem_context *ctx,
>>                   u32 *reg_state)
>>   {
>> +    struct drm_i915_private *i915 = engine->i915;
>>       struct i915_perf_stream *stream;
>>       if (engine->id != RCS)
>>           return;
>> -    stream = engine->i915->perf.oa.exclusive_stream;
>> -    if (stream)
>> -        gen8_update_reg_state_unlocked(ctx, reg_state, 
>> stream->oa_config);
>> +    stream = i915->perf.oa.exclusive_stream;
>> +    if (stream) {
>> +        struct intel_sseu default_sseu =
>> +            intel_device_default_sseu(i915);
>> +
>> +        gen8_update_reg_state_unlocked(ctx, reg_state,
>> +                           stream->oa_config,
>> +                           default_sseu);
>> +    }
>>   }
>>   /**
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index 7b2f2d6bb057..8a2997be7ef7 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2484,8 +2484,8 @@ int logical_xcs_ring_init(struct intel_engine_cs 
>> *engine)
>>       return logical_ring_init(engine);
>>   }
>> -static u32 make_rpcs(const struct sseu_dev_info *sseu,
>> -             struct intel_sseu ctx_sseu)
>> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> +           struct intel_sseu ctx_sseu)
>>   {
>>       u32 rpcs = 0;
>> @@ -2635,10 +2635,13 @@ static void execlists_init_reg_state(u32 *regs,
>>       }
>>       if (rcs) {
>> +        struct intel_sseu sseu = to_intel_context(ctx, engine)->sseu;
>> +
>>           regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>>           CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>> -            make_rpcs(&INTEL_INFO(dev_priv)->sseu,
>> -                  to_intel_context(ctx, engine)->sseu));
>> +            gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu,
>> +                       intel_engine_prepare_sseu(engine,
>> +                                 sseu)));
>>           i915_oa_init_reg_state(engine, ctx, regs);
>>       }
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.h 
>> b/drivers/gpu/drm/i915/intel_lrc.h
>> index f5a5502ecf70..bf3acdc3d0af 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.h
>> +++ b/drivers/gpu/drm/i915/intel_lrc.h
>> @@ -104,4 +104,7 @@ void intel_lr_context_resume(struct 
>> drm_i915_private *dev_priv);
>>   void intel_execlists_set_default_submission(struct intel_engine_cs 
>> *engine);
>> +u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
>> +           struct intel_sseu ctx_sseu);
>> +
>>   #endif /* _INTEL_LRC_H_ */
> 
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Tvrtko Ursulin Aug. 16, 2018, 7:31 a.m. UTC | #3
On 15/08/2018 12:58, Tvrtko Ursulin wrote:
> 
> On 14/08/2018 15:59, Lionel Landwerlin wrote:
>> Hey Tvrtko,
>>
>> Thanks for taking over this series.
>>
>> I've been talking to developers using the i915/perf interface and from 
>> their point of view, they expect the system to be in a stable 
>> configuration when doing measurements.
>>
>> One issue with this patch on Gen11 is that it will lock the system in 
>> a configuration that isn't workable for media workloads (all subslices 
>> enabled).
>> So I think we should set the value for the locked configuration per 
>> generation (gen8-10: all slices/subslices, gen11: only subslices that 
>> contain VME samplers) so that we always get a functional 
>> configurations for all workloads.
>> Could we want to select that configuration when opening perf?
> 
> This would be via i915_perf.c/gen8_configure_all_contexts?
> 
> Sounds like an unfortunate but workable compromise. As long as there 
> doesn't appear another asymmetric slice feature in the future, which 
> doesn't align with VME. Like one workloads wants slices 0 & 2, another 
> wants 1 & 3, or whatever. If that happens then I don't know what we do 
> apart from locking out perf/OA.
> 
>> Another question is how do we expose the selected value to the user. 
>> But that can be solved in a different series.
> 
> SSEU get_param would cut it? Although it would be perhaps be unexpected 
> to get different results depending on whether perf/OA is active or not.. 
> Hm... export device and available bitmasks via get param? Device bitmask 
> would be fixed and available would change depending on whether perf/OA 
> is active or not.

Big downside to this is that observing something via OA would cause slow 
down of the whole system - since the SSEU config would be locked to a 
subset of slices on Gen11. :( Regardless or not of the details.. say 
even if we choose to lock only if there is one non-default SSEU context, 
that still locks all whilst perf/OA is active.

That's quite bad I think. But the alternative is to lock out perf/OA 
while non-default SSEU contexts are present. Which is also very bad. :(

Overall I don't see a solution. Latter is maybe better by being more 
explicit? Who are the perf/OA users and what would they prefer?

Regards,

Tvrtko
Lionel Landwerlin Aug. 16, 2018, 9:56 a.m. UTC | #4
On 16/08/18 08:31, Tvrtko Ursulin wrote:
>
> On 15/08/2018 12:58, Tvrtko Ursulin wrote:
>>
>> On 14/08/2018 15:59, Lionel Landwerlin wrote:
>>> Hey Tvrtko,
>>>
>>> Thanks for taking over this series.
>>>
>>> I've been talking to developers using the i915/perf interface and 
>>> from their point of view, they expect the system to be in a stable 
>>> configuration when doing measurements.
>>>
>>> One issue with this patch on Gen11 is that it will lock the system 
>>> in a configuration that isn't workable for media workloads (all 
>>> subslices enabled).
>>> So I think we should set the value for the locked configuration per 
>>> generation (gen8-10: all slices/subslices, gen11: only subslices 
>>> that contain VME samplers) so that we always get a functional 
>>> configurations for all workloads.
>>> Could we want to select that configuration when opening perf?
>>
>> This would be via i915_perf.c/gen8_configure_all_contexts?
>>
>> Sounds like an unfortunate but workable compromise. As long as there 
>> doesn't appear another asymmetric slice feature in the future, which 
>> doesn't align with VME. Like one workloads wants slices 0 & 2, 
>> another wants 1 & 3, or whatever. If that happens then I don't know 
>> what we do apart from locking out perf/OA.
>>
>>> Another question is how do we expose the selected value to the user. 
>>> But that can be solved in a different series.
>>
>> SSEU get_param would cut it? Although it would be perhaps be 
>> unexpected to get different results depending on whether perf/OA is 
>> active or not.. Hm... export device and available bitmasks via get 
>> param? Device bitmask would be fixed and available would change 
>> depending on whether perf/OA is active or not.

I would avoid changing the get_param. This is supposed to report the 
physical device's properties.
Maybe add an ioctl entry to report that configuration onto the i915_perf fd?

>
> Big downside to this is that observing something via OA would cause 
> slow down of the whole system - since the SSEU config would be locked 
> to a subset of slices on Gen11. :( Regardless or not of the details.. 
> say even if we choose to lock only if there is one non-default SSEU 
> context, that still locks all whilst perf/OA is active.
>
> That's quite bad I think. But the alternative is to lock out perf/OA 
> while non-default SSEU contexts are present. Which is also very bad. :(
>
> Overall I don't see a solution. Latter is maybe better by being more 
> explicit? Who are the perf/OA users and what would they prefer?

The other solution I could think of would be to prevent schedule 
contexts that are not compatible with the OA configuration.
That doesn't sound great either...

>
> Regards,
>
> Tvrtko
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d6049c3f911b..5c12d2676435 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3851,4 +3851,19 @@  static inline int intel_hws_csb_write_index(struct drm_i915_private *i915)
 		return I915_HWS_CSB_WRITE_INDEX;
 }
 
+static inline struct intel_sseu
+intel_engine_prepare_sseu(struct intel_engine_cs *engine,
+			  struct intel_sseu sseu)
+{
+	struct drm_i915_private *i915 = engine->i915;
+
+	/*
+	 * If i915/perf is active, we want a stable powergating configuration
+	 * on the system. The most natural configuration to take in that case
+	 * is the default (i.e maximum the hardware can do).
+	 */
+	return i915->perf.oa.exclusive_stream ?
+		intel_device_default_sseu(i915) : sseu;
+}
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ccb20230df2c..c2fc2399e0ed 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1631,7 +1631,8 @@  static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
  */
 static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 					   u32 *reg_state,
-					   const struct i915_oa_config *oa_config)
+					   const struct i915_oa_config *oa_config,
+					   struct intel_sseu sseu)
 {
 	struct drm_i915_private *dev_priv = ctx->i915;
 	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
@@ -1677,6 +1678,9 @@  static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 
 		CTX_REG(reg_state, state_offset, flex_regs[i], value);
 	}
+
+	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
+		gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu, sseu));
 }
 
 /*
@@ -1807,6 +1811,7 @@  static int gen8_switch_to_updated_kernel_context(struct drm_i915_private *dev_pr
 static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 				       const struct i915_oa_config *oa_config)
 {
+	struct intel_sseu default_sseu = intel_device_default_sseu(dev_priv);
 	struct intel_engine_cs *engine = dev_priv->engine[RCS];
 	struct i915_gem_context *ctx;
 	int ret;
@@ -1854,7 +1859,8 @@  static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 		ce->state->obj->mm.dirty = true;
 		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
 
-		gen8_update_reg_state_unlocked(ctx, regs, oa_config);
+		gen8_update_reg_state_unlocked(ctx, regs, oa_config,
+					       oa_config ? default_sseu : ce->sseu);
 
 		i915_gem_object_unpin_map(ce->state->obj);
 	}
@@ -2226,14 +2232,21 @@  void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 			    struct i915_gem_context *ctx,
 			    u32 *reg_state)
 {
+	struct drm_i915_private *i915 = engine->i915;
 	struct i915_perf_stream *stream;
 
 	if (engine->id != RCS)
 		return;
 
-	stream = engine->i915->perf.oa.exclusive_stream;
-	if (stream)
-		gen8_update_reg_state_unlocked(ctx, reg_state, stream->oa_config);
+	stream = i915->perf.oa.exclusive_stream;
+	if (stream) {
+		struct intel_sseu default_sseu =
+			intel_device_default_sseu(i915);
+
+		gen8_update_reg_state_unlocked(ctx, reg_state,
+					       stream->oa_config,
+					       default_sseu);
+	}
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7b2f2d6bb057..8a2997be7ef7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2484,8 +2484,8 @@  int logical_xcs_ring_init(struct intel_engine_cs *engine)
 	return logical_ring_init(engine);
 }
 
-static u32 make_rpcs(const struct sseu_dev_info *sseu,
-		     struct intel_sseu ctx_sseu)
+u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
+		   struct intel_sseu ctx_sseu)
 {
 	u32 rpcs = 0;
 
@@ -2635,10 +2635,13 @@  static void execlists_init_reg_state(u32 *regs,
 	}
 
 	if (rcs) {
+		struct intel_sseu sseu = to_intel_context(ctx, engine)->sseu;
+
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-			make_rpcs(&INTEL_INFO(dev_priv)->sseu,
-				  to_intel_context(ctx, engine)->sseu));
+			gen8_make_rpcs(&INTEL_INFO(dev_priv)->sseu,
+				       intel_engine_prepare_sseu(engine,
+								 sseu)));
 
 		i915_oa_init_reg_state(engine, ctx, regs);
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f5a5502ecf70..bf3acdc3d0af 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -104,4 +104,7 @@  void intel_lr_context_resume(struct drm_i915_private *dev_priv);
 
 void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
 
+u32 gen8_make_rpcs(const struct sseu_dev_info *sseu,
+		   struct intel_sseu ctx_sseu);
+
 #endif /* _INTEL_LRC_H_ */