diff mbox series

drm/i915/selftests: Bump the scheduling threshold for fast heartbeats

Message ID 20210113125939.10205-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Series drm/i915/selftests: Bump the scheduling threshold for fast heartbeats

Commit Message

Chris Wilson Jan. 13, 2021, 12:59 p.m. UTC
Since we are system_highpri_wq, we expected the heartbeat to be
scheduled promptly. However, we see delays of over 10ms upsetting our
assertions. Accept this as inevitable and bump the error threshold to
20ms (from 6ms).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Comments

Mika Kuoppala Jan. 13, 2021, 2:13 p.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Since we are system_highpri_wq, we expected the heartbeat to be
> scheduled promptly. However, we see delays of over 10ms upsetting our
> assertions. Accept this as inevitable and bump the error threshold to
> 20ms (from 6ms).
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> index b88aa35ad75b..e88a01390dc5 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> @@ -197,6 +197,7 @@ static int cmp_u32(const void *_a, const void *_b)
>  
>  static int __live_heartbeat_fast(struct intel_engine_cs *engine)
>  {
> +	const int error_threshold = max(20000, jffies_to_usecs(6));

s/jffies/jiffies

Also for the commit message, 6 jiffies are not 6ms so it needs
some mending.

-Mika

>  	struct intel_context *ce;
>  	struct i915_request *rq;
>  	ktime_t t0, t1;
> @@ -254,12 +255,18 @@ static int __live_heartbeat_fast(struct intel_engine_cs *engine)
>  		times[0],
>  		times[ARRAY_SIZE(times) - 1]);
>  
> -	/* Min work delay is 2 * 2 (worst), +1 for scheduling, +1 for slack */
> -	if (times[ARRAY_SIZE(times) / 2] > jiffies_to_usecs(6)) {
> +	/*
> +	 * Ideally, the upper bound on min work delay would be something like
> +	 * 2 * 2 (worst), +1 for scheduling, +1 for slack. In practice, we
> +	 * are, even with system_wq_highpri, at the mercy of the CPU scheduler
> +	 * and may be stuck behind some slow work for many millisecond. Such
> +	 * as our very own display workers.
> +	 */
> +	if (times[ARRAY_SIZE(times) / 2] > error_threshold) {
>  		pr_err("%s: Heartbeat delay was %uus, expected less than %dus\n",
>  		       engine->name,
>  		       times[ARRAY_SIZE(times) / 2],
> -		       jiffies_to_usecs(6));
> +		       error_threshold);
>  		err = -EINVAL;
>  	}
>  
> -- 
> 2.20.1
>
Chris Wilson Jan. 13, 2021, 2:20 p.m. UTC | #2
Quoting Mika Kuoppala (2021-01-13 14:13:57)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Since we are system_highpri_wq, we expected the heartbeat to be
> > scheduled promptly. However, we see delays of over 10ms upsetting our
> > assertions. Accept this as inevitable and bump the error threshold to
> > 20ms (from 6ms).
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> > index b88aa35ad75b..e88a01390dc5 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> > @@ -197,6 +197,7 @@ static int cmp_u32(const void *_a, const void *_b)
> >  
> >  static int __live_heartbeat_fast(struct intel_engine_cs *engine)
> >  {
> > +     const int error_threshold = max(20000, jffies_to_usecs(6));
> 
> s/jffies/jiffies
> 
> Also for the commit message, 6 jiffies are not 6ms so it needs
> some mending.

Ok, might as well pull the failure messages from CI as well for a bit
more information.
-Chris
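
Mika's point that 6 jiffies are not 6ms comes down to the tick rate: the length of a jiffy depends on the kernel's HZ setting. The sketch below works through the arithmetic in userspace C. The helper names jiffies_to_usecs_at() and error_threshold_us() are hypothetical, the HZ values are common configurations chosen for illustration, and the conversion only approximates the kernel's jiffies_to_usecs() for tick rates that divide one second evenly.

```c
#include <assert.h>

#define USEC_PER_SEC 1000000L

/*
 * Hypothetical userspace stand-in for the kernel's jiffies_to_usecs().
 * Assumption: only tick rates that divide one second evenly are handled.
 */
static long jiffies_to_usecs_at(long jiffies, long hz)
{
	assert(hz > 0 && USEC_PER_SEC % hz == 0);
	return jiffies * (USEC_PER_SEC / hz);
}

/* Mirrors max(20000, jiffies_to_usecs(6)) from the patch, in microseconds. */
static long error_threshold_us(long hz)
{
	long six_jiffies = jiffies_to_usecs_at(6, hz);

	return six_jiffies > 20000 ? six_jiffies : 20000;
}
```

Under these assumptions, HZ=1000 gives an old limit of 6000us and a new limit of 20000us, which matches the "20ms (from 6ms)" wording. With HZ=250 the old limit is already 24000us and the max() leaves it unchanged, and with HZ=100 both are 60000us, so the commit message only describes the HZ=1000 case.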

Patch

diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index b88aa35ad75b..e88a01390dc5 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -197,6 +197,7 @@  static int cmp_u32(const void *_a, const void *_b)
 
 static int __live_heartbeat_fast(struct intel_engine_cs *engine)
 {
+	const int error_threshold = max(20000, jffies_to_usecs(6));
 	struct intel_context *ce;
 	struct i915_request *rq;
 	ktime_t t0, t1;
@@ -254,12 +255,18 @@  static int __live_heartbeat_fast(struct intel_engine_cs *engine)
 		times[0],
 		times[ARRAY_SIZE(times) - 1]);
 
-	/* Min work delay is 2 * 2 (worst), +1 for scheduling, +1 for slack */
-	if (times[ARRAY_SIZE(times) / 2] > jiffies_to_usecs(6)) {
+	/*
+	 * Ideally, the upper bound on min work delay would be something like
+	 * 2 * 2 (worst), +1 for scheduling, +1 for slack. In practice, we
+	 * are, even with system_wq_highpri, at the mercy of the CPU scheduler
+	 * and may be stuck behind some slow work for many millisecond. Such
+	 * as our very own display workers.
+	 */
+	if (times[ARRAY_SIZE(times) / 2] > error_threshold) {
 		pr_err("%s: Heartbeat delay was %uus, expected less than %dus\n",
 		       engine->name,
 		       times[ARRAY_SIZE(times) / 2],
-		       jiffies_to_usecs(6));
+		       error_threshold);
 		err = -EINVAL;
 	}
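
The check in the hunk above compares the middle element of the sorted delay samples, times[ARRAY_SIZE(times) / 2], against the threshold, so a single slow sample does not fail the test on its own. A minimal userspace sketch of that idea follows. It assumes qsort() in place of the kernel's sorting helper, the body of cmp_u32() is a guess since the diff only shows its signature, and median_u32() and check_heartbeat_delay() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison for qsort(); body is an assumption, not the kernel's. */
static int cmp_u32(const void *_a, const void *_b)
{
	const uint32_t *a = _a, *b = _b;

	return (*a > *b) - (*a < *b);
}

/* Sort the samples in place and return the middle one. */
static uint32_t median_u32(uint32_t *times, size_t count)
{
	assert(count > 0);
	qsort(times, count, sizeof(*times), cmp_u32);
	return times[count / 2];
}

/* Return -EINVAL when the median delay exceeds the threshold, else 0. */
static int check_heartbeat_delay(uint32_t *times, size_t count,
				 uint32_t threshold_us)
{
	return median_u32(times, count) > threshold_us ? -EINVAL : 0;
}
```

With five samples of 3000, 25000, 4000, 5000 and 2000us and a 20000us threshold, the median is 4000us and the check passes despite the 25000us outlier. It fails only once the middle sample itself is above the threshold.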