Message ID | 1436533566-11048-1-git-send-email-tim.gore@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Jul 10, 2015 at 02:06:06PM +0100, tim.gore@intel.com wrote: > From: Tim Gore <tim.gore@intel.com> > > In function igt_set_stop_rings, the test > igt_assert_f(flags == 0 || current == 0, .. > > will fail if we are trying to force a hang but the > STOP_RINGS_ALLOW_BAN or STOP_RINGS_ALLOW_ERROR bit is set. > With the introduction of per ring resets in the driver > (in android) these bits do not get cleared to zero when > an individual ring is reset. This causes subsequent > attempt to cause a ring hang via this function to fail, > leading to several igt tests failing (ie gem_reset_stats > subtest ban-xxx etc). Fix tdr to reset these instead? -Daniel > So, modify this test to look only at the bits that are > used to hang the gpu rings. > > Signed-off-by: Tim Gore <tim.gore@intel.com> > --- > lib/igt_gt.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/lib/igt_gt.c b/lib/igt_gt.c > index 8e5e076..12c56fa 100644 > --- a/lib/igt_gt.c > +++ b/lib/igt_gt.c > @@ -345,8 +345,8 @@ void igt_set_stop_rings(enum stop_ring_flags flags) > STOP_RING_ALLOW_ERRORS)) == 0); > > current = igt_get_stop_rings(); > - igt_assert_f(flags == 0 || current == 0, > - "previous i915_ring_stop is still 0x%x\n", current); > + igt_assert_f( (flags & STOP_RING_ALL) == 0 || (current & STOP_RING_ALL) == 0, > + "previous i915_ring_stop is still 0x%x\n", current); > > stop_rings_write(flags); > current = igt_get_stop_rings(); > -- > 1.9.1 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Tim Gore Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ > -----Original Message----- > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel > Vetter > Sent: Monday, July 13, 2015 10:30 AM > To: Gore, Tim > Cc: intel-gfx@lists.freedesktop.org; Wood, Thomas > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes to stop_rings > mode bits > > On Fri, Jul 10, 2015 at 02:06:06PM +0100, tim.gore@intel.com wrote: > > From: Tim Gore <tim.gore@intel.com> > > > > In function igt_set_stop_rings, the test > > igt_assert_f(flags == 0 || current == 0, .. > > > > will fail if we are trying to force a hang but the > > STOP_RINGS_ALLOW_BAN or STOP_RINGS_ALLOW_ERROR bit is set. > > With the introduction of per ring resets in the driver (in android) > > these bits do not get cleared to zero when an individual ring is > > reset. This causes subsequent attempt to cause a ring hang via this > > function to fail, leading to several igt tests failing (ie > > gem_reset_stats subtest ban-xxx etc). > > Fix tdr to reset these instead? > -Daniel > I could change tdr, but why. When the TDR handles a ring hang and resets the ring, why would it modify the flag that defines if the driver should ban a frequently hanging context? If we get rid of the stop_rings interface, as Chris Wilson suggested, we would still need to keep the STOP_RING_ALLOW_BAN/ALLOW_ERRORS bits in debugfs, but you would not expect to have to re-write these bits each time there is a ring reset. Tim > > So, modify this test to look only at the bits that are used to hang > > the gpu rings. > > > > Signed-off-by: Tim Gore <tim.gore@intel.com> > > --- > > lib/igt_gt.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/lib/igt_gt.c b/lib/igt_gt.c index 8e5e076..12c56fa 100644 > > --- a/lib/igt_gt.c > > +++ b/lib/igt_gt.c > > @@ -345,8 +345,8 @@ void igt_set_stop_rings(enum stop_ring_flags flags) > > STOP_RING_ALLOW_ERRORS)) == 0); > > > > current = igt_get_stop_rings(); > > - igt_assert_f(flags == 0 || current == 0, > > - "previous i915_ring_stop is still 0x%x\n", current); > > + igt_assert_f( (flags & STOP_RING_ALL) == 0 || (current & > STOP_RING_ALL) == 0, > > + "previous i915_ring_stop is still 0x%x\n", current); > > > > stop_rings_write(flags); > > current = igt_get_stop_rings(); > > -- > > 1.9.1 > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
On Mon, Jul 13, 2015 at 09:43:11AM +0000, Gore, Tim wrote: > > > Tim Gore > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ > > > -----Original Message----- > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel > > Vetter > > Sent: Monday, July 13, 2015 10:30 AM > > To: Gore, Tim > > Cc: intel-gfx@lists.freedesktop.org; Wood, Thomas > > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes to stop_rings > > mode bits > > > > On Fri, Jul 10, 2015 at 02:06:06PM +0100, tim.gore@intel.com wrote: > > > From: Tim Gore <tim.gore@intel.com> > > > > > > In function igt_set_stop_rings, the test > > > igt_assert_f(flags == 0 || current == 0, .. > > > > > > will fail if we are trying to force a hang but the > > > STOP_RINGS_ALLOW_BAN or STOP_RINGS_ALLOW_ERROR bit is set. > > > With the introduction of per ring resets in the driver (in android) > > > these bits do not get cleared to zero when an individual ring is > > > reset. This causes subsequent attempt to cause a ring hang via this > > > function to fail, leading to several igt tests failing (ie > > > gem_reset_stats subtest ban-xxx etc). > > > > Fix tdr to reset these instead? > > -Daniel > > > I could change tdr, but why. When the TDR handles a ring hang and resets the ring, > why would it modify the flag that defines if the driver should ban a frequently hanging > context? If we get rid of the stop_rings interface, as Chris Wilson suggested, we would > still need to keep the STOP_RING_ALLOW_BAN/ALLOW_ERRORS bits in debugfs, > but you would not expect to have to re-write these bits each time there is a ring > reset. The fix current hang recover code to no reset this, add some grace period, then push this patch to igt. We don't have full-blown abi guarantees for debugfs/igt stuff, but I want at least a few months (really last released kernel&igt) of backwards/forward compatibility. And inconsistent behaviour isn't great imo. -Daniel
Tim Gore Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ > -----Original Message----- > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel > Vetter > Sent: Monday, July 13, 2015 3:59 PM > To: Gore, Tim > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org; Wood, Thomas > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes to stop_rings > mode bits > > On Mon, Jul 13, 2015 at 09:43:11AM +0000, Gore, Tim wrote: > > > > > > Tim Gore > > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon > > SN3 1RJ > > > > > -----Original Message----- > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of > > > Daniel Vetter > > > Sent: Monday, July 13, 2015 10:30 AM > > > To: Gore, Tim > > > Cc: intel-gfx@lists.freedesktop.org; Wood, Thomas > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes > > > to stop_rings mode bits > > > > > > On Fri, Jul 10, 2015 at 02:06:06PM +0100, tim.gore@intel.com wrote: > > > > From: Tim Gore <tim.gore@intel.com> > > > > > > > > In function igt_set_stop_rings, the test > > > > igt_assert_f(flags == 0 || current == 0, .. > > > > > > > > will fail if we are trying to force a hang but the > > > > STOP_RINGS_ALLOW_BAN or STOP_RINGS_ALLOW_ERROR bit is set. > > > > With the introduction of per ring resets in the driver (in > > > > android) these bits do not get cleared to zero when an individual > > > > ring is reset. This causes subsequent attempt to cause a ring hang > > > > via this function to fail, leading to several igt tests failing > > > > (ie gem_reset_stats subtest ban-xxx etc). > > > > > > Fix tdr to reset these instead? > > > -Daniel > > > > > I could change tdr, but why. When the TDR handles a ring hang and > > resets the ring, why would it modify the flag that defines if the > > driver should ban a frequently hanging context? If we get rid of the > > stop_rings interface, as Chris Wilson suggested, we would still need > > to keep the STOP_RING_ALLOW_BAN/ALLOW_ERRORS bits in debugfs, but > you > > would not expect to have to re-write these bits each time there is a ring > reset. > > The fix current hang recover code to no reset this, add some grace period, > then push this patch to igt. We don't have full-blown abi guarantees for > debugfs/igt stuff, but I want at least a few months (really last released > kernel&igt) of backwards/forward compatibility. And inconsistent behaviour > isn't great imo. > -Daniel Sorry Daniel, I didn't really follow that. I didn't want a gpu reset to clear the ALLOW_BAN bit, since this will defeat the point of this bit, except perhaps in test situations where you can keep setting it each time you deliberately cause a hang. It seems like the ALLOW_BAN bit has uses in real world situations, although I don't know it anyone uses it. Would you be more comfortable with Igt_assert_f ( ( flags ==0 ) || (( current & STOP_RING_ALL) ==0) && ((current ^ flags) & ~ STOP_RING_ALL == 0 ) ) So the either the new flags must be 0 (currently allowed) or the existing flags must indicate that all hangs are cleared (0 except possibly the mode bits) AND the mode bits you are writing are the same as the current values. ?? Tim > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
On Mon, Jul 13, 2015 at 04:07:14PM +0000, Gore, Tim wrote: > > > Tim Gore > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ > > > > -----Original Message----- > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel > > Vetter > > Sent: Monday, July 13, 2015 3:59 PM > > To: Gore, Tim > > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org; Wood, Thomas > > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes to stop_rings > > mode bits > > > > On Mon, Jul 13, 2015 at 09:43:11AM +0000, Gore, Tim wrote: > > > > > > > > > Tim Gore > > > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon > > > SN3 1RJ > > > > > > > -----Original Message----- > > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of > > > > Daniel Vetter > > > > Sent: Monday, July 13, 2015 10:30 AM > > > > To: Gore, Tim > > > > Cc: intel-gfx@lists.freedesktop.org; Wood, Thomas > > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes > > > > to stop_rings mode bits > > > > > > > > On Fri, Jul 10, 2015 at 02:06:06PM +0100, tim.gore@intel.com wrote: > > > > > From: Tim Gore <tim.gore@intel.com> > > > > > > > > > > In function igt_set_stop_rings, the test > > > > > igt_assert_f(flags == 0 || current == 0, .. > > > > > > > > > > will fail if we are trying to force a hang but the > > > > > STOP_RINGS_ALLOW_BAN or STOP_RINGS_ALLOW_ERROR bit is set. > > > > > With the introduction of per ring resets in the driver (in > > > > > android) these bits do not get cleared to zero when an individual > > > > > ring is reset. This causes subsequent attempt to cause a ring hang > > > > > via this function to fail, leading to several igt tests failing > > > > > (ie gem_reset_stats subtest ban-xxx etc). > > > > > > > > Fix tdr to reset these instead? > > > > -Daniel > > > > > > > I could change tdr, but why. When the TDR handles a ring hang and > > > resets the ring, why would it modify the flag that defines if the > > > driver should ban a frequently hanging context? If we get rid of the > > > stop_rings interface, as Chris Wilson suggested, we would still need > > > to keep the STOP_RING_ALLOW_BAN/ALLOW_ERRORS bits in debugfs, but > > you > > > would not expect to have to re-write these bits each time there is a ring > > reset. > > > > The fix current hang recover code to no reset this, add some grace period, > > then push this patch to igt. We don't have full-blown abi guarantees for > > debugfs/igt stuff, but I want at least a few months (really last released > > kernel&igt) of backwards/forward compatibility. And inconsistent behaviour > > isn't great imo. > > -Daniel > > Sorry Daniel, I didn't really follow that. > I didn't want a gpu reset to clear the ALLOW_BAN bit, since this will defeat the > point of this bit, except perhaps in test situations where you can keep setting it > each time you deliberately cause a hang. It seems like the ALLOW_BAN bit has > uses in real world situations, although I don't know it anyone uses it. ALLOW_BAN is only for testing. It's meant to re-enable auto-banning because we disable that by default for automated tests. This is all meant to be used together with the stop_rings stuff only. > Would you be more comfortable with > Igt_assert_f ( ( flags ==0 ) || > (( current & STOP_RING_ALL) ==0) && ((current ^ flags) & ~ STOP_RING_ALL == 0 ) ) > > So the either the new flags must be 0 (currently allowed) or the existing flags must > indicate that all hangs are cleared (0 except possibly the mode bits) AND the mode > bits you are writing are the same as the current values. ?? Iirc gem_reset_stats uses stop_rings only to supress gpu reset warnings in dmesg (since they're expected) and not for the actual stop_rings logic. It sets stop_rings after submitting the hanging batch, but before that one is detected as hung. We could just add another bit to stop_rings for that case. What I still don't understand is why tdr can't just keep on properly resetting stop_rings. It'll break tons of existing tests, and I don't understand the upside. stop_rings has become quasi-abi (that's why the ALLOW bits have such funky semantics, it's for backwards compat), if you need to change it you need to extend it, not break it. -Daniel
Tim Gore Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ > -----Original Message----- > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel > Vetter > Sent: Tuesday, July 14, 2015 9:09 AM > To: Gore, Tim > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org; Wood, Thomas > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes to stop_rings > mode bits > > On Mon, Jul 13, 2015 at 04:07:14PM +0000, Gore, Tim wrote: > > > > > > Tim Gore > > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon > > SN3 1RJ > > > > > > > -----Original Message----- > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of > > > Daniel Vetter > > > Sent: Monday, July 13, 2015 3:59 PM > > > To: Gore, Tim > > > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org; Wood, Thomas > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow changes > > > to stop_rings mode bits > > > > > > On Mon, Jul 13, 2015 at 09:43:11AM +0000, Gore, Tim wrote: > > > > > > > > > > > > Tim Gore > > > > Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, > > > > Swindon > > > > SN3 1RJ > > > > > > > > > -----Original Message----- > > > > > From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of > > > > > Daniel Vetter > > > > > Sent: Monday, July 13, 2015 10:30 AM > > > > > To: Gore, Tim > > > > > Cc: intel-gfx@lists.freedesktop.org; Wood, Thomas > > > > > Subject: Re: [Intel-gfx] [PATCH i-g-t] lib/igt_gt.c : allow > > > > > changes to stop_rings mode bits > > > > > > > > > > On Fri, Jul 10, 2015 at 02:06:06PM +0100, tim.gore@intel.com wrote: > > > > > > From: Tim Gore <tim.gore@intel.com> > > > > > > > > > > > > In function igt_set_stop_rings, the test > > > > > > igt_assert_f(flags == 0 || current == 0, .. > > > > > > > > > > > > will fail if we are trying to force a hang but the > > > > > > STOP_RINGS_ALLOW_BAN or STOP_RINGS_ALLOW_ERROR bit is > set. > > > > > > With the introduction of per ring resets in the driver (in > > > > > > android) these bits do not get cleared to zero when an > > > > > > individual ring is reset. This causes subsequent attempt to > > > > > > cause a ring hang via this function to fail, leading to > > > > > > several igt tests failing (ie gem_reset_stats subtest ban-xxx etc). > > > > > > > > > > Fix tdr to reset these instead? > > > > > -Daniel > > > > > > > > > I could change tdr, but why. When the TDR handles a ring hang and > > > > resets the ring, why would it modify the flag that defines if the > > > > driver should ban a frequently hanging context? If we get rid of > > > > the stop_rings interface, as Chris Wilson suggested, we would > > > > still need to keep the STOP_RING_ALLOW_BAN/ALLOW_ERRORS bits in > > > > debugfs, but > > > you > > > > would not expect to have to re-write these bits each time there is > > > > a ring > > > reset. > > > > > > The fix current hang recover code to no reset this, add some grace > > > period, then push this patch to igt. We don't have full-blown abi > > > guarantees for debugfs/igt stuff, but I want at least a few months > > > (really last released > > > kernel&igt) of backwards/forward compatibility. And inconsistent > > > behaviour isn't great imo. > > > -Daniel > > > > Sorry Daniel, I didn't really follow that. > > I didn't want a gpu reset to clear the ALLOW_BAN bit, since this will > > defeat the point of this bit, except perhaps in test situations where > > you can keep setting it each time you deliberately cause a hang. It > > seems like the ALLOW_BAN bit has uses in real world situations, although I > don't know it anyone uses it. > > ALLOW_BAN is only for testing. It's meant to re-enable auto-banning > because we disable that by default for automated tests. This is all meant to > be used together with the stop_rings stuff only. > > > Would you be more comfortable with > > Igt_assert_f ( ( flags ==0 ) || > > (( current & STOP_RING_ALL) ==0) && ((current ^ flags) & ~ > > STOP_RING_ALL == 0 ) ) > > > > So the either the new flags must be 0 (currently allowed) or the > > existing flags must indicate that all hangs are cleared (0 except > > possibly the mode bits) AND the mode bits you are writing are the same as > the current values. ?? > > Iirc gem_reset_stats uses stop_rings only to supress gpu reset warnings in > dmesg (since they're expected) and not for the actual stop_rings logic. It sets > stop_rings after submitting the hanging batch, but before that one is > detected as hung. We could just add another bit to stop_rings for that case. > > What I still don't understand is why tdr can't just keep on properly resetting > stop_rings. It'll break tons of existing tests, and I don't understand the > upside. stop_rings has become quasi-abi (that's why the ALLOW bits have > such funky semantics, it's for backwards compat), if you need to change it > you need to extend it, not break it. > -Daniel > -- OK, I hadn't spotted that it was a debug only flag. In that case it is just up to the tests to keep setting it. It only seems to be used in gem_reset_stats anyway. So I'll change TDR to clear ALLOW_BAN/WARN when stop_bits gets cleared to 0. Tim > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
diff --git a/lib/igt_gt.c b/lib/igt_gt.c index 8e5e076..12c56fa 100644 --- a/lib/igt_gt.c +++ b/lib/igt_gt.c @@ -345,8 +345,8 @@ void igt_set_stop_rings(enum stop_ring_flags flags) STOP_RING_ALLOW_ERRORS)) == 0); current = igt_get_stop_rings(); - igt_assert_f(flags == 0 || current == 0, - "previous i915_ring_stop is still 0x%x\n", current); + igt_assert_f( (flags & STOP_RING_ALL) == 0 || (current & STOP_RING_ALL) == 0, + "previous i915_ring_stop is still 0x%x\n", current); stop_rings_write(flags); current = igt_get_stop_rings();