diff mbox

[igt,2/2] lib: Remove illegal instructions from hang injection

Message ID 20170807123636.24907-2-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson Aug. 7, 2017, 12:36 p.m. UTC
The idea behind using an illegal instruction was to hang the GPU must
faster than simply using the recursive batch. However, we stopped doing
so on gen8+ as the CS parser was much laxer and allowed the illegal
command through but still interpreted the packet length (jumping over
the recursive batch buffer start that followed). Sandybridge doesn't
just hang the GPU when it encounters an illegal command on the BLT
engine, it hangs the machine. That goes above and beyond testing our
hangcheck + reset, so remove the deadly instructions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 lib/igt_gt.c | 25 +++++--------------------
 1 file changed, 5 insertions(+), 20 deletions(-)

Comments

Mika Kuoppala Aug. 7, 2017, 1:33 p.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> The idea behind using an illegal instruction was to hang the GPU must
> faster than simply using the recursive batch. However, we stopped doing
> so on gen8+ as the CS parser was much laxer and allowed the illegal
> command through but still interpreted the packet length (jumping over
> the recursive batch buffer start that followed). Sandybridge doesn't
> just hang the GPU when it encounters an illegal command on the BLT
> engine, it hangs the machine. That goes above and beyond testing our
> hangcheck + reset, so remove the deadly instructions.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  lib/igt_gt.c | 25 +++++--------------------
>  1 file changed, 5 insertions(+), 20 deletions(-)
>
> diff --git a/lib/igt_gt.c b/lib/igt_gt.c
> index 6f7daa5e..d5e8b557 100644
> --- a/lib/igt_gt.c
> +++ b/lib/igt_gt.c
> @@ -270,30 +270,15 @@ igt_hang_t igt_hang_ctx(int fd,
>  
>  	memset(b, 0xc5, sizeof(b));
>  
> -	/*
> -	 * We emit invalid command to provoke a gpu hang.
> -	 * If that doesn't work, we do bb start loop.
> -	 * Note that the bb start aligment is illegal due this.
> -	 * But hey, we are here to hang the gpu so whatever works.
> -	 * We skip 0xfffffff on gen9 as it confuses hw in an such a way that
> -	 * it will skip over the bb start, causing runaway head and
> -	 * thus much slower hang detection.
> -	 */

Daydreaming about MI_HALT,

Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

>  	len = 2;
> -	if (intel_gen(intel_get_drm_devid(fd)) >= 8) {
> -		b[0] = MI_NOOP;
> +	if (intel_gen(intel_get_drm_devid(fd)) >= 8)
>  		len++;
> -	} else {
> -		b[0] = 0xffffffff;
> -	}
> -
> -	b[1] = MI_BATCH_BUFFER_START | (len - 2);
> -	b[1+len] = MI_BATCH_BUFFER_END;
> -	b[2+len] = MI_NOOP;
> +	b[0] = MI_BATCH_BUFFER_START | (len - 2);
> +	b[len] = MI_BATCH_BUFFER_END;
> +	b[len+1] = MI_NOOP;
>  	gem_write(fd, exec.handle, 0, b, sizeof(b));
>  
> -	reloc.offset = 8;
> -	reloc.delta = 4;
> +	reloc.offset = sizeof(uint32_t);
>  	reloc.target_handle = exec.handle;
>  	reloc.read_domains = I915_GEM_DOMAIN_COMMAND;
>  
> -- 
> 2.13.3
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter Aug. 8, 2017, 1:25 p.m. UTC | #2
On Mon, Aug 07, 2017 at 04:33:40PM +0300, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > The idea behind using an illegal instruction was to hang the GPU must
> > faster than simply using the recursive batch. However, we stopped doing
> > so on gen8+ as the CS parser was much laxer and allowed the illegal
> > command through but still interpreted the packet length (jumping over
> > the recursive batch buffer start that followed). Sandybridge doesn't
> > just hang the GPU when it encounters an illegal command on the BLT
> > engine, it hangs the machine. That goes above and beyond testing our
> > hangcheck + reset, so remove the deadly instructions.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  lib/igt_gt.c | 25 +++++--------------------
> >  1 file changed, 5 insertions(+), 20 deletions(-)
> >
> > diff --git a/lib/igt_gt.c b/lib/igt_gt.c
> > index 6f7daa5e..d5e8b557 100644
> > --- a/lib/igt_gt.c
> > +++ b/lib/igt_gt.c
> > @@ -270,30 +270,15 @@ igt_hang_t igt_hang_ctx(int fd,
> >  
> >  	memset(b, 0xc5, sizeof(b));
> >  
> > -	/*
> > -	 * We emit invalid command to provoke a gpu hang.
> > -	 * If that doesn't work, we do bb start loop.
> > -	 * Note that the bb start aligment is illegal due this.
> > -	 * But hey, we are here to hang the gpu so whatever works.
> > -	 * We skip 0xfffffff on gen9 as it confuses hw in an such a way that
> > -	 * it will skip over the bb start, causing runaway head and
> > -	 * thus much slower hang detection.
> > -	 */
> 
> Daydreaming about MI_HALT,
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

I tested both on my snb for a few hours, works solid. I guess I botched
the job when I tried this conversion, resulting in a gpu that couldn't
reset somehow.

Both patches pushed to igt, thanks a lot.
-Daniel

> 
> >  	len = 2;
> > -	if (intel_gen(intel_get_drm_devid(fd)) >= 8) {
> > -		b[0] = MI_NOOP;
> > +	if (intel_gen(intel_get_drm_devid(fd)) >= 8)
> >  		len++;
> > -	} else {
> > -		b[0] = 0xffffffff;
> > -	}
> > -
> > -	b[1] = MI_BATCH_BUFFER_START | (len - 2);
> > -	b[1+len] = MI_BATCH_BUFFER_END;
> > -	b[2+len] = MI_NOOP;
> > +	b[0] = MI_BATCH_BUFFER_START | (len - 2);
> > +	b[len] = MI_BATCH_BUFFER_END;
> > +	b[len+1] = MI_NOOP;
> >  	gem_write(fd, exec.handle, 0, b, sizeof(b));
> >  
> > -	reloc.offset = 8;
> > -	reloc.delta = 4;
> > +	reloc.offset = sizeof(uint32_t);
> >  	reloc.target_handle = exec.handle;
> >  	reloc.read_domains = I915_GEM_DOMAIN_COMMAND;
> >  
> > -- 
> > 2.13.3
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff mbox

Patch

diff --git a/lib/igt_gt.c b/lib/igt_gt.c
index 6f7daa5e..d5e8b557 100644
--- a/lib/igt_gt.c
+++ b/lib/igt_gt.c
@@ -270,30 +270,15 @@  igt_hang_t igt_hang_ctx(int fd,
 
 	memset(b, 0xc5, sizeof(b));
 
-	/*
-	 * We emit invalid command to provoke a gpu hang.
-	 * If that doesn't work, we do bb start loop.
-	 * Note that the bb start aligment is illegal due this.
-	 * But hey, we are here to hang the gpu so whatever works.
-	 * We skip 0xfffffff on gen9 as it confuses hw in an such a way that
-	 * it will skip over the bb start, causing runaway head and
-	 * thus much slower hang detection.
-	 */
 	len = 2;
-	if (intel_gen(intel_get_drm_devid(fd)) >= 8) {
-		b[0] = MI_NOOP;
+	if (intel_gen(intel_get_drm_devid(fd)) >= 8)
 		len++;
-	} else {
-		b[0] = 0xffffffff;
-	}
-
-	b[1] = MI_BATCH_BUFFER_START | (len - 2);
-	b[1+len] = MI_BATCH_BUFFER_END;
-	b[2+len] = MI_NOOP;
+	b[0] = MI_BATCH_BUFFER_START | (len - 2);
+	b[len] = MI_BATCH_BUFFER_END;
+	b[len+1] = MI_NOOP;
 	gem_write(fd, exec.handle, 0, b, sizeof(b));
 
-	reloc.offset = 8;
-	reloc.delta = 4;
+	reloc.offset = sizeof(uint32_t);
 	reloc.target_handle = exec.handle;
 	reloc.read_domains = I915_GEM_DOMAIN_COMMAND;