Message ID | 20170807123636.24907-2-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Chris Wilson <chris@chris-wilson.co.uk> writes: > The idea behind using an illegal instruction was to hang the GPU must > faster than simply using the recursive batch. However, we stopped doing > so on gen8+ as the CS parser was much laxer and allowed the illegal > command through but still interpreted the packet length (jumping over > the recursive batch buffer start that followed). Sandybridge doesn't > just hang the GPU when it encounters an illegal command on the BLT > engine, it hangs the machine. That goes above and beyond testing our > hangcheck + reset, so remove the deadly instructions. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > lib/igt_gt.c | 25 +++++-------------------- > 1 file changed, 5 insertions(+), 20 deletions(-) > > diff --git a/lib/igt_gt.c b/lib/igt_gt.c > index 6f7daa5e..d5e8b557 100644 > --- a/lib/igt_gt.c > +++ b/lib/igt_gt.c > @@ -270,30 +270,15 @@ igt_hang_t igt_hang_ctx(int fd, > > memset(b, 0xc5, sizeof(b)); > > - /* > - * We emit invalid command to provoke a gpu hang. > - * If that doesn't work, we do bb start loop. > - * Note that the bb start aligment is illegal due this. > - * But hey, we are here to hang the gpu so whatever works. > - * We skip 0xfffffff on gen9 as it confuses hw in an such a way that > - * it will skip over the bb start, causing runaway head and > - * thus much slower hang detection. > - */ Daydreaming about MI_HALT, Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> > len = 2; > - if (intel_gen(intel_get_drm_devid(fd)) >= 8) { > - b[0] = MI_NOOP; > + if (intel_gen(intel_get_drm_devid(fd)) >= 8) > len++; > - } else { > - b[0] = 0xffffffff; > - } > - > - b[1] = MI_BATCH_BUFFER_START | (len - 2); > - b[1+len] = MI_BATCH_BUFFER_END; > - b[2+len] = MI_NOOP; > + b[0] = MI_BATCH_BUFFER_START | (len - 2); > + b[len] = MI_BATCH_BUFFER_END; > + b[len+1] = MI_NOOP; > gem_write(fd, exec.handle, 0, b, sizeof(b)); > > - reloc.offset = 8; > - reloc.delta = 4; > + reloc.offset = sizeof(uint32_t); > reloc.target_handle = exec.handle; > reloc.read_domains = I915_GEM_DOMAIN_COMMAND; > > -- > 2.13.3 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Mon, Aug 07, 2017 at 04:33:40PM +0300, Mika Kuoppala wrote: > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > The idea behind using an illegal instruction was to hang the GPU must > > faster than simply using the recursive batch. However, we stopped doing > > so on gen8+ as the CS parser was much laxer and allowed the illegal > > command through but still interpreted the packet length (jumping over > > the recursive batch buffer start that followed). Sandybridge doesn't > > just hang the GPU when it encounters an illegal command on the BLT > > engine, it hangs the machine. That goes above and beyond testing our > > hangcheck + reset, so remove the deadly instructions. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > lib/igt_gt.c | 25 +++++-------------------- > > 1 file changed, 5 insertions(+), 20 deletions(-) > > > > diff --git a/lib/igt_gt.c b/lib/igt_gt.c > > index 6f7daa5e..d5e8b557 100644 > > --- a/lib/igt_gt.c > > +++ b/lib/igt_gt.c > > @@ -270,30 +270,15 @@ igt_hang_t igt_hang_ctx(int fd, > > > > memset(b, 0xc5, sizeof(b)); > > > > - /* > > - * We emit invalid command to provoke a gpu hang. > > - * If that doesn't work, we do bb start loop. > > - * Note that the bb start aligment is illegal due this. > > - * But hey, we are here to hang the gpu so whatever works. > > - * We skip 0xfffffff on gen9 as it confuses hw in an such a way that > > - * it will skip over the bb start, causing runaway head and > > - * thus much slower hang detection. > > - */ > > Daydreaming about MI_HALT, > > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> I tested both on my snb for a few hours, works solid. I guess I botched the job when I tried this conversion, resulting in a gpu that couldn't reset somehow. Both patches pushed to igt, thanks a lot. -Daniel > > > len = 2; > > - if (intel_gen(intel_get_drm_devid(fd)) >= 8) { > > - b[0] = MI_NOOP; > > + if (intel_gen(intel_get_drm_devid(fd)) >= 8) > > len++; > > - } else { > > - b[0] = 0xffffffff; > > - } > > - > > - b[1] = MI_BATCH_BUFFER_START | (len - 2); > > - b[1+len] = MI_BATCH_BUFFER_END; > > - b[2+len] = MI_NOOP; > > + b[0] = MI_BATCH_BUFFER_START | (len - 2); > > + b[len] = MI_BATCH_BUFFER_END; > > + b[len+1] = MI_NOOP; > > gem_write(fd, exec.handle, 0, b, sizeof(b)); > > > > - reloc.offset = 8; > > - reloc.delta = 4; > > + reloc.offset = sizeof(uint32_t); > > reloc.target_handle = exec.handle; > > reloc.read_domains = I915_GEM_DOMAIN_COMMAND; > > > > -- > > 2.13.3 > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff --git a/lib/igt_gt.c b/lib/igt_gt.c index 6f7daa5e..d5e8b557 100644 --- a/lib/igt_gt.c +++ b/lib/igt_gt.c @@ -270,30 +270,15 @@ igt_hang_t igt_hang_ctx(int fd, memset(b, 0xc5, sizeof(b)); - /* - * We emit invalid command to provoke a gpu hang. - * If that doesn't work, we do bb start loop. - * Note that the bb start aligment is illegal due this. - * But hey, we are here to hang the gpu so whatever works. - * We skip 0xfffffff on gen9 as it confuses hw in an such a way that - * it will skip over the bb start, causing runaway head and - * thus much slower hang detection. - */ len = 2; - if (intel_gen(intel_get_drm_devid(fd)) >= 8) { - b[0] = MI_NOOP; + if (intel_gen(intel_get_drm_devid(fd)) >= 8) len++; - } else { - b[0] = 0xffffffff; - } - - b[1] = MI_BATCH_BUFFER_START | (len - 2); - b[1+len] = MI_BATCH_BUFFER_END; - b[2+len] = MI_NOOP; + b[0] = MI_BATCH_BUFFER_START | (len - 2); + b[len] = MI_BATCH_BUFFER_END; + b[len+1] = MI_NOOP; gem_write(fd, exec.handle, 0, b, sizeof(b)); - reloc.offset = 8; - reloc.delta = 4; + reloc.offset = sizeof(uint32_t); reloc.target_handle = exec.handle; reloc.read_domains = I915_GEM_DOMAIN_COMMAND;
The idea behind using an illegal instruction was to hang the GPU must faster than simply using the recursive batch. However, we stopped doing so on gen8+ as the CS parser was much laxer and allowed the illegal command through but still interpreted the packet length (jumping over the recursive batch buffer start that followed). Sandybridge doesn't just hang the GPU when it encounters an illegal command on the BLT engine, it hangs the machine. That goes above and beyond testing our hangcheck + reset, so remove the deadly instructions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- lib/igt_gt.c | 25 +++++-------------------- 1 file changed, 5 insertions(+), 20 deletions(-)