Message ID | 1448468364-20610-1-git-send-email-daniel.vetter@ffwll.ch (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Nov 25, 2015 at 05:19:24PM +0100, Daniel Vetter wrote: > Since $debugfs/i915_wedged restores a wedged gpu by using a normal gpu > hang we need to be careful to not run into the "hanging too fast > check": That's not how it works. It restores the GPU by triggering a manual reset. -Chris
On Wed, Nov 25, 2015 at 04:28:02PM +0000, Chris Wilson wrote: > On Wed, Nov 25, 2015 at 05:19:24PM +0100, Daniel Vetter wrote: > > Since $debugfs/i915_wedged restores a wedged gpu by using a normal gpu > > hang we need to be careful to not run into the "hanging too fast > > check": > > That's not how it works. It restores the GPU by triggering a manual > reset. Maybe in your tree, but here it goes through the full hang recovery. And the gem_eio default context _does_ get banned without this. Maybe it's a bug in the ban logic, but it didn't look like that. After all debugfs/i915_wedged is a hack for developers - by that time userspace has already either died or fallen back to sw rendering anyway. So I figured no point in fixing this in the kernel with special cases. -Daniel
On Thu, Nov 26, 2015 at 09:34:01AM +0100, Daniel Vetter wrote: > On Wed, Nov 25, 2015 at 04:28:02PM +0000, Chris Wilson wrote: > > On Wed, Nov 25, 2015 at 05:19:24PM +0100, Daniel Vetter wrote: > > > Since $debugfs/i915_wedged restores a wedged gpu by using a normal gpu > > > hang we need to be careful to not run into the "hanging too fast > > > check": > > > > That's not how it works. It restores the GPU by triggering a manual > > reset. > > Maybe in your tree, but here it goes through the full hang recovery. Right. > And > the gem_eio default context _does_ get banned without this. Wrong. > Maybe it's a > bug in the ban logic, but it didn't look like that. Indeed, that is not where the *kernel* bug is. -Chris
diff --git a/tests/gem_eio.c b/tests/gem_eio.c index ad67332eae59..f6e41db63a7d 100644 --- a/tests/gem_eio.c +++ b/tests/gem_eio.c @@ -84,13 +84,17 @@ static void trigger_reset(int fd) static void wedge_gpu(int fd) { + igt_hang_ring_t hang; + /* First idle the GPU then disable GPU resets before injecting a hang */ gem_quiescent_gpu(fd); igt_require(i915_reset_control(false)); igt_debug("Wedging GPU by injecting hang\n"); - igt_post_hang_ring(fd, igt_hang_ring(fd, I915_EXEC_DEFAULT)); + hang = igt_hang_ring(fd, I915_EXEC_DEFAULT); + hang.ban = 0; + igt_post_hang_ring(fd, hang); igt_assert(i915_reset_control(true)); } @@ -161,10 +165,14 @@ static void test_wait(int fd) { igt_hang_ring_t hang; + /* First idle the GPU then disable GPU resets before injecting a hang */ + gem_quiescent_gpu(fd); + igt_require(i915_reset_control(false)); hang = igt_hang_ring(fd, I915_EXEC_DEFAULT); igt_assert_eq(__gem_wait(fd, hang.handle, -1), -EIO); + hang.ban = 0; igt_post_hang_ring(fd, hang); igt_assert(i915_reset_control(true));
Since $debugfs/i915_wedged restores a wedged gpu by using a normal gpu hang we need to be careful to not run into the "hanging too fast check": - don't restore the ban period, but instead keep it at 0. - make sure we idle the gpu fully before hanging it again (wait subtest missted that). With this gem_eio works now reliable even when I don't run the subtests individually. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> --- tests/gem_eio.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)