
[1/2] drm/i915/guc: Fix for potential false positives in GuC hang selftest

Message ID 20231106235929.454983-2-John.C.Harrison@Intel.com
State New, archived
Series Selftest for FAST_REQUEST feature

Commit Message

John Harrison Nov. 6, 2023, 11:59 p.m. UTC
From: John Harrison <John.C.Harrison@Intel.com>

Noticed that the hangcheck selftest is submitting a non-preemptible
spinner. That means that even if the GuC does not die, the heartbeat
will still kick in and trigger a reset, which rather defeats the
purpose of the test: to verify that the heartbeat will kick in if the
GuC itself has died. The test deliberately kills the GuC, so it
should never hit the case of a live GuC. However, it is possible that
the kill could stop working at some point in the future due to other
driver re-work.
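The reasoning above can be illustrated with a toy userspace model. This is a hypothetical sketch, not i915 or GuC code: `enum cmd`, the `CMD_*` values and `heartbeat_forces_reset()` are invented names. The model assumes, as the commit message states, that the GuC can only switch the spinner out at an arbitration point, and that the heartbeat escalates to a reset when its pulse cannot run within a fixed budget.

```c
#include <stdbool.h>

/* Hypothetical toy command set; these are not real GPU opcode encodings. */
enum cmd { CMD_NOOP, CMD_ARB_CHECK, CMD_JUMP_TO_START };

/*
 * Execute a spinner loop of { arb_cmd, CMD_JUMP_TO_START } for at most
 * max_steps commands. In this model a pending heartbeat pulse can only be
 * scheduled when a live GuC reaches an arbitration point (CMD_ARB_CHECK).
 * Returns true if the pulse never ran, i.e. the heartbeat escalates to a
 * reset.
 */
static bool heartbeat_forces_reset(enum cmd arb_cmd, bool guc_alive,
				   int max_steps)
{
	const enum cmd batch[] = { arb_cmd, CMD_JUMP_TO_START };
	int pc = 0;

	for (int step = 0; step < max_steps; step++) {
		enum cmd c = batch[pc];

		/* Arbitration point: a live GuC switches to the pulse here. */
		if (c == CMD_ARB_CHECK && guc_alive)
			return false;

		/* Spin forever by jumping back to the start of the batch. */
		pc = (c == CMD_JUMP_TO_START) ? 0 : pc + 1;
	}

	/* The pulse was starved for the whole budget: reset. */
	return true;
}
```

With a live GuC, `heartbeat_forces_reset(CMD_NOOP, true, 1000)` returns true, which is the false positive described above, while `heartbeat_forces_reset(CMD_ARB_CHECK, true, 1000)` returns false. With a dead GuC, both spinner variants end in a reset.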

So, make the spinner preemptible by using MI_ARB_CHECK instead of
MI_NOOP as its arbitration command. That way the heartbeat can get
through if the GuC is alive and context switching, so a reset only
happens if the GuC dies. Thus, if the kill ever stops working, the
test will fail rather than falsely claim to pass.
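The resulting pass/fail behaviour can be tabulated with a second toy model. Again this is a hypothetical sketch: `reset_observed()`, `selftest_passes()` and `is_false_pass()` are invented names, and the selftest's pass criterion is simplified to "a reset was observed", as the commit message describes it.

```c
#include <stdbool.h>

/*
 * Hypothetical model, not i915 code: was a reset observed? A dead GuC can
 * never schedule the heartbeat pulse, so the heartbeat always escalates to
 * a reset. A live GuC can schedule the pulse only if the spinner offers
 * arbitration points (i.e. is preemptible).
 */
static bool reset_observed(bool guc_killed, bool spinner_preemptible)
{
	if (guc_killed)
		return true;
	return !spinner_preemptible;
}

/* Simplified pass criterion: the selftest passes if it sees a reset. */
static bool selftest_passes(bool guc_killed, bool spinner_preemptible)
{
	return reset_observed(guc_killed, spinner_preemptible);
}

/* A false pass: the test reports success even though the kill failed. */
static bool is_false_pass(bool guc_killed, bool spinner_preemptible)
{
	return !guc_killed && selftest_passes(guc_killed, spinner_preemptible);
}
```

In this model, `is_false_pass(false, false)` is true (failed kill, non-preemptible spinner), whereas `selftest_passes(false, true)` is false: with a preemptible spinner, a failed kill is reported as a test failure.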

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Daniele Ceraolo Spurio Nov. 9, 2023, 8:33 p.m. UTC
On 11/6/2023 3:59 PM, John.C.Harrison@Intel.com wrote:
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Daniele


Patch

diff --git a/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c b/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c
index 34b5d952e2bcb..26fdc392fce6c 100644
--- a/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/uc/selftest_guc_hangcheck.c
@@ -74,7 +74,7 @@ static int intel_hang_guc(void *arg)
 		goto err;
 	}
 
-	rq = igt_spinner_create_request(&spin, ce, MI_NOOP);
+	rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
 	intel_context_put(ce);
 	if (IS_ERR(rq)) {
 		ret = PTR_ERR(rq);