Message ID: 20191111114045.28097-1-chris@chris-wilson.co.uk (mailing list archive)
State: New, archived
Series: [i-g-t] i915/gem_eio: Flush RCU before timing our own critical sections
On 11/11/2019 11:40, Chris Wilson wrote:
> We cannot control how long RCU takes to find a quiescent point as that
> depends upon the background load and so may take an arbitrary time.
> Instead, let's try to avoid that impacting our measurements by inserting
> an rcu_barrier() before our critical timing sections and hope that hides
> the issue, letting us always perform a fast reset. Fwiw, we do the
> expedited RCU synchronize, but that is not always enough.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  tests/i915/gem_eio.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/tests/i915/gem_eio.c b/tests/i915/gem_eio.c
> index 8d6cb9760..49d2a99e9 100644
> --- a/tests/i915/gem_eio.c
> +++ b/tests/i915/gem_eio.c
> @@ -71,6 +71,7 @@ static void trigger_reset(int fd)
>  {
>  	struct timespec ts = { };
>
> +	rcu_barrier(fd); /* flush any excess work before we start timing */
>  	igt_nsec_elapsed(&ts);
>
>  	igt_kmsg(KMSG_DEBUG "Forcing GPU reset\n");
> @@ -227,6 +228,10 @@ static void hang_handler(union sigval arg)
>  		igt_debug("hang delay = %.2fus\n",
>  			  igt_nsec_elapsed(&ctx->delay) / 1000.0);
>
> +	/* flush any excess work before we start timing our reset */
> +	igt_assert(igt_sysfs_printf(ctx->debugfs, "i915_drop_caches",
> +				    "%d", DROP_RCU));
> +
>  	igt_nsec_elapsed(ctx->ts);
>  	igt_assert(igt_sysfs_set(ctx->debugfs, "i915_wedged", "-1"));

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

You could avoid scoring demerit points by adding a reference to the bugzilla entry, presumably the one linking to CI results, showing this was known to be flaky. :)

Regards,

Tvrtko
diff --git a/tests/i915/gem_eio.c b/tests/i915/gem_eio.c
index 8d6cb9760..49d2a99e9 100644
--- a/tests/i915/gem_eio.c
+++ b/tests/i915/gem_eio.c
@@ -71,6 +71,7 @@ static void trigger_reset(int fd)
 {
 	struct timespec ts = { };

+	rcu_barrier(fd); /* flush any excess work before we start timing */
 	igt_nsec_elapsed(&ts);

 	igt_kmsg(KMSG_DEBUG "Forcing GPU reset\n");
@@ -227,6 +228,10 @@ static void hang_handler(union sigval arg)
 		igt_debug("hang delay = %.2fus\n",
 			  igt_nsec_elapsed(&ctx->delay) / 1000.0);

+	/* flush any excess work before we start timing our reset */
+	igt_assert(igt_sysfs_printf(ctx->debugfs, "i915_drop_caches",
+				    "%d", DROP_RCU));
+
 	igt_nsec_elapsed(ctx->ts);
 	igt_assert(igt_sysfs_set(ctx->debugfs, "i915_wedged", "-1"));
We cannot control how long RCU takes to find a quiescent point as that
depends upon the background load and so may take an arbitrary time.
Instead, let's try to avoid that impacting our measurements by inserting
an rcu_barrier() before our critical timing sections and hope that hides
the issue, letting us always perform a fast reset. Fwiw, we do the
expedited RCU synchronize, but that is not always enough.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 tests/i915/gem_eio.c | 5 +++++
 1 file changed, 5 insertions(+)