diff mbox series

[6/8] drm/i915/selftests: Terminate hangcheck sanitycheck forcibly

Message ID 20181203113701.12106-6-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Series [1/8] drm/i915/breadcrumbs: Reduce missed-breadcrumb false positive rate

Commit Message

Chris Wilson Dec. 3, 2018, 11:36 a.m. UTC
If all else fails and we are stuck eternally waiting for the undying
request, abandon all hope.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
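
The patch uses `igt_wedge_on_timeout()` as a prefix to a single statement, which suggests a scoped guard built from a one-pass `for` loop: arm a watchdog, run the body once, disarm. The following is a minimal userspace sketch of that idiom, not the driver's implementation. Every name in it (`fake_device`, `wedge_guard`, `wedge_on_timeout`, `fake_request_wait`, `sanitycheck_wait`) is hypothetical, the clock is simulated in ticks, and the watchdog is modelled inside the wait loop rather than as a separate timer. It also assumes that a wait forced to end by wedging can still return a non-negative value, which is why the result is overridden with `-EIO` afterwards, as the patch does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; 'wedged' stands in for the terminally wedged state. */
struct fake_device {
	bool wedged;
	long now;			/* simulated clock, in ticks */
};

/* Hypothetical guard; 'dev' is non-NULL only while the guard is armed. */
struct wedge_guard {
	struct fake_device *dev;
	long deadline;
};

static void guard_init(struct wedge_guard *w, struct fake_device *dev,
		       long timeout)
{
	w->dev = dev;
	w->deadline = dev->now + timeout;
}

static void guard_fini(struct wedge_guard *w)
{
	w->dev = NULL;			/* disarm, which ends the loop after one pass */
}

/* Runs the statement that follows exactly once, with the guard armed around it. */
#define wedge_on_timeout(W, DEV, TIMEOUT) \
	for (guard_init((W), (DEV), (TIMEOUT)); (W)->dev; guard_fini((W)))

/*
 * Simulated unbounded wait. The request completes at tick 'completes_at',
 * or never if that value is negative. Once the guard's deadline passes,
 * the device is wedged and the wait is forced to return.
 */
static long fake_request_wait(struct wedge_guard *w, long completes_at)
{
	struct fake_device *dev = w->dev;

	for (;;) {
		if (completes_at >= 0 && dev->now >= completes_at)
			return 1;	/* completed normally */
		if (dev->now >= w->deadline) {
			dev->wedged = true;	/* watchdog fired */
			return 0;	/* non-negative despite the failure */
		}
		dev->now++;
	}
}

/* Mirrors the shape of the patched wait: guard, wait, then check for wedging. */
static long sanitycheck_wait(struct fake_device *dev, long completes_at)
{
	struct wedge_guard w;
	long timeout = 0;

	wedge_on_timeout(&w, dev, 10 /* ticks */)
		timeout = fake_request_wait(&w, completes_at);
	if (dev->wedged)
		timeout = -EIO;

	return timeout;
}
```

In this sketch, a request that completes before the deadline leaves the device untouched and returns the wait's own result, while a request that never completes wedges the device and is reported as `-EIO`.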

Comments

Mika Kuoppala Dec. 3, 2018, 12:09 p.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> If all else fails and we are stuck eternally waiting for the undying
> request, abandon all hope.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index defe671130ab..a48fbe2557ea 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -308,6 +308,7 @@ static int igt_hang_sanitycheck(void *arg)
>  		goto unlock;
>  
>  	for_each_engine(engine, i915, id) {
> +		struct igt_wedge_me w;
>  		long timeout;
>  
>  		if (!intel_engine_can_store_dword(engine))
> @@ -328,9 +329,14 @@ static int igt_hang_sanitycheck(void *arg)
>  
>  		i915_request_add(rq);
>  
> -		timeout = i915_request_wait(rq,
> -					    I915_WAIT_LOCKED,
> -					    MAX_SCHEDULE_TIMEOUT);
> +		timeout = 0;
> +		igt_wedge_on_timeout(&w, i915, HZ / 10 /* 100ms timeout*/)

100ms? We are emitting a hanging batch here, so there is something
I am missing here.

-Mika


> +			timeout = i915_request_wait(rq,
> +						    I915_WAIT_LOCKED,
> +						    MAX_SCHEDULE_TIMEOUT);
> +		if (i915_terminally_wedged(&i915->gpu_error))
> +			timeout = -EIO;
> +
>  		i915_request_put(rq);
>  
>  		if (timeout < 0) {
> -- 
> 2.20.0.rc1

Chris Wilson Dec. 3, 2018, 12:17 p.m. UTC | #2
Quoting Mika Kuoppala (2018-12-03 12:09:39)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > If all else fails and we are stuck eternally waiting for the undying
> > request, abandon all hope.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > index defe671130ab..a48fbe2557ea 100644
> > --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > @@ -308,6 +308,7 @@ static int igt_hang_sanitycheck(void *arg)
> >               goto unlock;
> >  
> >       for_each_engine(engine, i915, id) {
> > +             struct igt_wedge_me w;
> >               long timeout;
> >  
> >               if (!intel_engine_can_store_dword(engine))
> > @@ -328,9 +329,14 @@ static int igt_hang_sanitycheck(void *arg)
> >  
> >               i915_request_add(rq);
> >  
> > -             timeout = i915_request_wait(rq,
> > -                                         I915_WAIT_LOCKED,
> > -                                         MAX_SCHEDULE_TIMEOUT);
> > +             timeout = 0;
> > +             igt_wedge_on_timeout(&w, i915, HZ / 10 /* 100ms timeout*/)
> 
> 100ms? We are emitting a hanging batch here, so there is something
> I am missing here.

It's not a hanging batch anymore, due to the terminator applied a couple
of lines above.
-Chris

Mika Kuoppala Dec. 3, 2018, 12:21 p.m. UTC | #3
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-12-03 12:09:39)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > If all else fails and we are stuck eternally waiting for the undying
>> > request, abandon all hope.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > ---
>> >  drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 12 +++++++++---
>> >  1 file changed, 9 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
>> > index defe671130ab..a48fbe2557ea 100644
>> > --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
>> > +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
>> > @@ -308,6 +308,7 @@ static int igt_hang_sanitycheck(void *arg)
>> >               goto unlock;
>> >  
>> >       for_each_engine(engine, i915, id) {
>> > +             struct igt_wedge_me w;
>> >               long timeout;
>> >  
>> >               if (!intel_engine_can_store_dword(engine))
>> > @@ -328,9 +329,14 @@ static int igt_hang_sanitycheck(void *arg)
>> >  
>> >               i915_request_add(rq);
>> >  
>> > -             timeout = i915_request_wait(rq,
>> > -                                         I915_WAIT_LOCKED,
>> > -                                         MAX_SCHEDULE_TIMEOUT);
>> > +             timeout = 0;
>> > +             igt_wedge_on_timeout(&w, i915, HZ / 10 /* 100ms timeout*/)
>> 
>> 100ms? We are emitting a hanging batch here, so there is something
>> I am missing here.
>
> It's not a hanging batch anymore, due to the terminator applied a couple
> of lines above.

There it is. I did read the code, I did have coffee. It is Monday.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> -Chris

Patch

diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index defe671130ab..a48fbe2557ea 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -308,6 +308,7 @@  static int igt_hang_sanitycheck(void *arg)
 		goto unlock;
 
 	for_each_engine(engine, i915, id) {
+		struct igt_wedge_me w;
 		long timeout;
 
 		if (!intel_engine_can_store_dword(engine))
@@ -328,9 +329,14 @@  static int igt_hang_sanitycheck(void *arg)
 
 		i915_request_add(rq);
 
-		timeout = i915_request_wait(rq,
-					    I915_WAIT_LOCKED,
-					    MAX_SCHEDULE_TIMEOUT);
+		timeout = 0;
+		igt_wedge_on_timeout(&w, i915, HZ / 10 /* 100ms timeout*/)
+			timeout = i915_request_wait(rq,
+						    I915_WAIT_LOCKED,
+						    MAX_SCHEDULE_TIMEOUT);
+		if (i915_terminally_wedged(&i915->gpu_error))
+			timeout = -EIO;
+
 		i915_request_put(rq);
 
 		if (timeout < 0) {