[2/2] drm/i915/selftest: Bump up sample period for busy stats selftest

Message ID 20221105003235.1717908-3-umesh.nerlige.ramappa@intel.com (mailing list archive)
State New, archived
Series Fix live busy stats selftest failure

Commit Message

Umesh Nerlige Ramappa Nov. 5, 2022, 12:32 a.m. UTC
Engine busyness sampled over a 10 ms period is failing, with busyness
ranging from approx. 87% to 115%. The expected range is +/- 5% of the
sample period.

When determining the busyness of an active engine, the GuC-based engine
busyness implementation relies on a 64-bit timestamp register read. The
latency incurred by this register read causes the failure.

On DG1, when the test fails, the observed latencies range from 900 us to
1.5 ms.

One solution tried was to reduce the latency between the register read
and the CPU timestamp capture, but such an optimization does not add
value for the user, since the CPU timestamp obtained here is only used
for (1) the selftest and (2) the i915 rps implementation specific to the
execlist scheduler. Also, this solution only reduces the frequency of
failure and does not eliminate it.

In order to make the selftest more robust and account for such
latencies, increase the sample period to 100 ms.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
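
The arithmetic behind the longer sample period can be sketched as follows. This is a simplified model, not i915 code. It assumes a fully busy engine and that the whole register-read latency shows up as skew between the GPU and CPU timestamps at one end of the sample window. The function names are hypothetical.

```c
#include <stdbool.h>

/*
 * Simplified model (an assumption, not i915 code): a fully busy engine is
 * sampled over period_us, and the GPU and CPU timestamps are skewed by
 * skew_us at one end of the window. skew_us may be negative.
 */
static double reported_busy_pct(double period_us, double skew_us)
{
	return 100.0 * (period_us + skew_us) / period_us;
}

/* True if the reported busyness stays within +/- tol_pct of 100%. */
static bool within_tolerance(double period_us, double skew_us, double tol_pct)
{
	double hi = reported_busy_pct(period_us, skew_us);
	double lo = reported_busy_pct(period_us, -skew_us);

	return hi <= 100.0 + tol_pct && lo >= 100.0 - tol_pct;
}
```

Under this model, a 1.5 ms skew gives 85% to 115% over a 10 ms period, which is outside the +/- 5% tolerance, but only 98.5% to 101.5% over a 100 ms period.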

Comments

Tvrtko Ursulin Nov. 7, 2022, 10:16 a.m. UTC | #1
On 05/11/2022 00:32, Umesh Nerlige Ramappa wrote:
> Engine busyness samples around a 10ms period is failing with busyness
> ranging approx. from 87% to 115%. The expected range is +/- 5% of the
> sample period.
> 
> When determining busyness of active engine, the GuC based engine
> busyness implementation relies on a 64 bit timestamp register read. The
> latency incurred by this register read causes the failure.
> 
> On DG1, when the test fails, the observed latencies range from 900us -
> 1.5ms.

Is it at all faster with the locked 2x32 read, or can the same
unexplained display-related latencies still happen?

> One solution tried was to reduce the latency between reg read and
> CPU timestamp capture, but such optimization does not add value to user
> since the CPU timestamp obtained here is only used for (1) selftest and
> (2) i915 rps implementation specific to execlist scheduler. Also, this
> solution only reduces the frequency of failure and does not eliminate
> it.
> 
> In order to make the selftest more robust and account for such
> latencies, increase the sample period to 100 ms.
> 
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> index 0dcb3ed44a73..87c94314cf67 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> @@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
>   		ENGINE_TRACE(engine, "measuring busy time\n");
>   		preempt_disable();
>   		de = intel_engine_get_busy_time(engine, &t[0]);
> -		mdelay(10);
> +		mdelay(100);
>   		de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
>   		preempt_enable();
>   		dt = ktime_sub(t[1], t[0]);

Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Umesh Nerlige Ramappa Nov. 7, 2022, 7:01 p.m. UTC | #2
On Mon, Nov 07, 2022 at 10:16:20AM +0000, Tvrtko Ursulin wrote:
>
>On 05/11/2022 00:32, Umesh Nerlige Ramappa wrote:
>>Engine busyness samples around a 10ms period is failing with busyness
>>ranging approx. from 87% to 115%. The expected range is +/- 5% of the
>>sample period.
>>
>>When determining busyness of active engine, the GuC based engine
>>busyness implementation relies on a 64 bit timestamp register read. The
>>latency incurred by this register read causes the failure.
>>
>>On DG1, when the test fails, the observed latencies range from 900us -
>>1.5ms.
>
>Is it at all faster with the locked 2x32 or still the same unexplained 
>display related latencies can happen?

Originally this failed 1 in 10 runs. The locked 2x32 patch in this
series reduces the failure rate to 1 in 50.

What really helps is taking the CPU timestamp within the forcewake
block: the correlation between GPU and CPU times is then very good, and
the selftest failure frequency drops to 1 in 200. More like this:

uncore_lock
fw_get
read 64-bit GPU time
read CPU timestamp
fw_put
uncore_unlock.

I recall we had arrived at this sequence in the past when implementing 
query_cs_cycles 
- https://patchwork.freedesktop.org/patch/432041/?series=89766&rev=1
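
As a rough userspace analogue of the sequence above (not the driver code): the lock, the forcewake reference count and the "GPU" clock below are hypothetical stand-ins. The point is only that both timestamps are captured back to back inside the same critical section.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the uncore lock and the forcewake reference. */
static atomic_flag uncore_lock = ATOMIC_FLAG_INIT;
static int fw_refcount;

struct sample {
	uint64_t gpu_ns;	/* simulated GPU timestamp */
	uint64_t cpu_ns;	/* CPU timestamp */
};

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static struct sample sample_correlated(void)
{
	struct sample s;

	while (atomic_flag_test_and_set(&uncore_lock))
		;				/* uncore_lock */
	fw_refcount++;				/* fw_get */
	s.gpu_ns = now_ns();			/* read 64-bit GPU time (simulated) */
	s.cpu_ns = now_ns();			/* read CPU timestamp */
	fw_refcount--;				/* fw_put */
	atomic_flag_clear(&uncore_lock);	/* uncore_unlock */

	return s;
}
```

Nothing can run between the two reads other than the reads themselves, so the gap between gpu_ns and cpu_ns stays small. In the driver, the first read would be the timestamp register rather than a second CPU clock read.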

I still included the locked 2x32 patch here because 1 in 50 is still 
better than 1 in 10.

For now, the 100 ms sample period is the only promising solution I see:
no failures in 1000 runs.

Thanks,
Umesh

>
>>One solution tried was to reduce the latency between reg read and
>>CPU timestamp capture, but such optimization does not add value to user
>>since the CPU timestamp obtained here is only used for (1) selftest and
>>(2) i915 rps implementation specific to execlist scheduler. Also, this
>>solution only reduces the frequency of failure and does not eliminate
>>it.
>>
>>In order to make the selftest more robust and account for such
>>latencies, increase the sample period to 100 ms.
>>
>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>---
>>  drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>index 0dcb3ed44a73..87c94314cf67 100644
>>--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
>>@@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
>>  		ENGINE_TRACE(engine, "measuring busy time\n");
>>  		preempt_disable();
>>  		de = intel_engine_get_busy_time(engine, &t[0]);
>>-		mdelay(10);
>>+		mdelay(100);
>>  		de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
>>  		preempt_enable();
>>  		dt = ktime_sub(t[1], t[0]);
>
>Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
>Regards,
>
>Tvrtko
Dixit, Ashutosh Nov. 7, 2022, 11:33 p.m. UTC | #3
On Fri, 04 Nov 2022 17:32:35 -0700, Umesh Nerlige Ramappa wrote:
>
> Engine busyness samples around a 10ms period is failing with busyness
> ranging approx. from 87% to 115%. The expected range is +/- 5% of the
> sample period.
>
> When determining busyness of active engine, the GuC based engine
> busyness implementation relies on a 64 bit timestamp register read. The
> latency incurred by this register read causes the failure.
>
> On DG1, when the test fails, the observed latencies range from 900us -
> 1.5ms.
>
> One solution tried was to reduce the latency between reg read and
> CPU timestamp capture, but such optimization does not add value to user
> since the CPU timestamp obtained here is only used for (1) selftest and
> (2) i915 rps implementation specific to execlist scheduler. Also, this
> solution only reduces the frequency of failure and does not eliminate
> it.
>
> In order to make the selftest more robust and account for such
> latencies, increase the sample period to 100 ms.

Hi Umesh,

I think it would be good to add to the commit message:

* GitLab bug number, if any
* A paste of the actual dmesg error
* An update to the above commit message to reflect that we've now added
  the optimized 64-bit read

With that this is:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

If you want me to review the new commit message I can do that too.

Thanks.
--
Ashutosh


>
> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/selftest_engine_pm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> index 0dcb3ed44a73..87c94314cf67 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
> @@ -317,7 +317,7 @@ static int live_engine_busy_stats(void *arg)
>		ENGINE_TRACE(engine, "measuring busy time\n");
>		preempt_disable();
>		de = intel_engine_get_busy_time(engine, &t[0]);
> -		mdelay(10);
> +		mdelay(100);
>		de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
>		preempt_enable();
>		dt = ktime_sub(t[1], t[0]);
> --
> 2.36.1
>

Patch

diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
index 0dcb3ed44a73..87c94314cf67 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_pm.c
@@ -317,7 +317,7 @@  static int live_engine_busy_stats(void *arg)
 		ENGINE_TRACE(engine, "measuring busy time\n");
 		preempt_disable();
 		de = intel_engine_get_busy_time(engine, &t[0]);
-		mdelay(10);
+		mdelay(100);
 		de = ktime_sub(intel_engine_get_busy_time(engine, &t[1]), de);
 		preempt_enable();
 		dt = ktime_sub(t[1], t[0]);