Message ID | 20180514080251.11224-5-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
On 14/05/2018 09:02, Chris Wilson wrote:
> trigger_reset() imposes a tight time constraint (2s) so that we verify
> that the reset itself completes quickly. In the middle of this check, we
> call gem_quiescent_gpu() which may invoke an rcu_barrier() or two to
> clear out the freed memory (DROP_FREED). Those barriers may have
> unbounded latency pushing beyond the 2s timeout, so restrict the
> operation to only wait-for-idle (DROP_ACTIVE).
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105957
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  tests/gem_eio.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> index 4720b47b5..e1aff639d 100644
> --- a/tests/gem_eio.c
> +++ b/tests/gem_eio.c
> @@ -74,8 +74,7 @@ static void trigger_reset(int fd)
>  	/* And just check the gpu is indeed running again */
>  	igt_debug("Checking that the GPU recovered\n");
>  	gem_test_engine(fd, ALL_ENGINES);
> -
> -	gem_quiescent_gpu(fd);
> +	igt_drop_caches_set(fd, DROP_ACTIVE);
>
>  	/* We expect forced reset and health check to be quick. */
>  	igt_assert(igt_seconds_elapsed(&ts) < 2);
>

Sounds fine to only wait for idle:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

However I am a bit surprised that under plain IGT environment RCU
latency can be so high.

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2018-05-14 12:03:58)
>
> On 14/05/2018 09:02, Chris Wilson wrote:
> > trigger_reset() imposes a tight time constraint (2s) so that we verify
> > that the reset itself completes quickly. In the middle of this check, we
> > call gem_quiescent_gpu() which may invoke an rcu_barrier() or two to
> > clear out the freed memory (DROP_FREED). Those barriers may have
> > unbounded latency pushing beyond the 2s timeout, so restrict the
> > operation to only wait-for-idle (DROP_ACTIVE).
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105957
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >  tests/gem_eio.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> > index 4720b47b5..e1aff639d 100644
> > --- a/tests/gem_eio.c
> > +++ b/tests/gem_eio.c
> > @@ -74,8 +74,7 @@ static void trigger_reset(int fd)
> >  	/* And just check the gpu is indeed running again */
> >  	igt_debug("Checking that the GPU recovered\n");
> >  	gem_test_engine(fd, ALL_ENGINES);
> > -
> > -	gem_quiescent_gpu(fd);
> > +	igt_drop_caches_set(fd, DROP_ACTIVE);
> >
> >  	/* We expect forced reset and health check to be quick. */
> >  	igt_assert(igt_seconds_elapsed(&ts) < 2);
> >
>
> Sounds fine to only wait for idle:
>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> However I am a bit surprised that under plain IGT environment RCU
> latency can be so high.

I suspect it's more of a latency issue with recent kernels or CI
systems. There are a few mysterious multi-second sleeps that cause
sporadic fails around the place. rcu_barrier() can be slow (like 30s
slow) but only under load, afaik.
-Chris
diff --git a/tests/gem_eio.c b/tests/gem_eio.c
index 4720b47b5..e1aff639d 100644
--- a/tests/gem_eio.c
+++ b/tests/gem_eio.c
@@ -74,8 +74,7 @@ static void trigger_reset(int fd)
 	/* And just check the gpu is indeed running again */
 	igt_debug("Checking that the GPU recovered\n");
 	gem_test_engine(fd, ALL_ENGINES);
-
-	gem_quiescent_gpu(fd);
+	igt_drop_caches_set(fd, DROP_ACTIVE);
 
 	/* We expect forced reset and health check to be quick. */
 	igt_assert(igt_seconds_elapsed(&ts) < 2);
trigger_reset() imposes a tight time constraint (2s) so that we verify
that the reset itself completes quickly. In the middle of this check, we
call gem_quiescent_gpu() which may invoke an rcu_barrier() or two to
clear out the freed memory (DROP_FREED). Those barriers may have
unbounded latency pushing beyond the 2s timeout, so restrict the
operation to only wait-for-idle (DROP_ACTIVE).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105957
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/gem_eio.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
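
To illustrate the reasoning in the commit message, here is a minimal
standalone sketch in C of why dropping the freed-memory flush from the
timed window matters. It is a simulation, not IGT code: every SKETCH_*
name and sketch_* function is hypothetical, the flag values are not the
real DROP_ACTIVE/DROP_FREED bits, and the 100ms reset and 50ms idle-wait
costs are made-up placeholders. Only the 2s budget and the "30s slow"
barrier figure come from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the drop-caches flags (not the IGT values). */
enum {
	SKETCH_DROP_ACTIVE = 1 << 0, /* wait for the GPU to go idle */
	SKETCH_DROP_FREED  = 1 << 1, /* flush freed memory, may need rcu_barrier() */
};

/* Assumed costs in milliseconds, chosen only for illustration. */
#define SKETCH_RESET_MS      100L  /* forced reset plus health check */
#define SKETCH_IDLE_WAIT_MS   50L  /* wait-for-idle */
#define SKETCH_BUDGET_MS    2000L  /* the 2s limit from trigger_reset() */

/* Simulated cost of the drop-caches step for a given flag set. */
static long sketch_drop_caches_cost_ms(unsigned int flags,
				       long barrier_latency_ms)
{
	long cost = 0;

	if (flags & SKETCH_DROP_ACTIVE)
		cost += SKETCH_IDLE_WAIT_MS;
	/* The barrier latency is only paid when freed memory is flushed. */
	if (flags & SKETCH_DROP_FREED)
		cost += barrier_latency_ms;

	return cost;
}

/* Would the timed section still finish inside the 2s budget? */
static bool sketch_reset_within_budget(unsigned int flags,
				       long barrier_latency_ms)
{
	long elapsed = SKETCH_RESET_MS +
		       sketch_drop_caches_cost_ms(flags, barrier_latency_ms);

	return elapsed < SKETCH_BUDGET_MS;
}
```

Under these assumptions, calling sketch_reset_within_budget() with
SKETCH_DROP_ACTIVE alone stays inside the budget however slow the
barrier is, while adding SKETCH_DROP_FREED makes the result depend on
the barrier latency, which is the flakiness the patch removes.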