[i-g-t,5/6] tests/gem_eio: Only wait-for-idle inside trigger_reset()

Message ID 20180514080251.11224-5-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson May 14, 2018, 8:02 a.m. UTC
trigger_reset() imposes a tight time constraint (2s) so that we verify
that the reset itself completes quickly. In the middle of this check, we
call gem_quiescent_gpu() which may invoke an rcu_barrier() or two to
clear out the freed memory (DROP_FREED). Those barriers may have
unbounded latency pushing beyond the 2s timeout, so restrict the
operation to only wait-for-idle (DROP_ACTIVE).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105957
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/gem_eio.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Tvrtko Ursulin May 14, 2018, 11:03 a.m. UTC | #1
On 14/05/2018 09:02, Chris Wilson wrote:
> trigger_reset() imposes a tight time constraint (2s) so that we verify
> that the reset itself completes quickly. In the middle of this check, we
> call gem_quiescent_gpu() which may invoke an rcu_barrier() or two to
> clear out the freed memory (DROP_FREED). Those barriers may have
> unbounded latency pushing beyond the 2s timeout, so restrict the
> operation to only wait-for-idle (DROP_ACTIVE).
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105957
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   tests/gem_eio.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> index 4720b47b5..e1aff639d 100644
> --- a/tests/gem_eio.c
> +++ b/tests/gem_eio.c
> @@ -74,8 +74,7 @@ static void trigger_reset(int fd)
>   	/* And just check the gpu is indeed running again */
>   	igt_debug("Checking that the GPU recovered\n");
>   	gem_test_engine(fd, ALL_ENGINES);
> -
> -	gem_quiescent_gpu(fd);
> +	igt_drop_caches_set(fd, DROP_ACTIVE);
>   
>   	/* We expect forced reset and health check to be quick. */
>   	igt_assert(igt_seconds_elapsed(&ts) < 2);
> 

Sounds fine to only wait for idle:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

However, I am a bit surprised that under a plain IGT environment RCU
latency can be so high.

Regards,

Tvrtko

Chris Wilson May 14, 2018, 11:07 a.m. UTC | #2
Quoting Tvrtko Ursulin (2018-05-14 12:03:58)
> 
> On 14/05/2018 09:02, Chris Wilson wrote:
> > trigger_reset() imposes a tight time constraint (2s) so that we verify
> > that the reset itself completes quickly. In the middle of this check, we
> > call gem_quiescent_gpu() which may invoke an rcu_barrier() or two to
> > clear out the freed memory (DROP_FREED). Those barriers may have
> > unbounded latency pushing beyond the 2s timeout, so restrict the
> > operation to only wait-for-idle (DROP_ACTIVE).
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105957
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   tests/gem_eio.c | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/tests/gem_eio.c b/tests/gem_eio.c
> > index 4720b47b5..e1aff639d 100644
> > --- a/tests/gem_eio.c
> > +++ b/tests/gem_eio.c
> > @@ -74,8 +74,7 @@ static void trigger_reset(int fd)
> >       /* And just check the gpu is indeed running again */
> >       igt_debug("Checking that the GPU recovered\n");
> >       gem_test_engine(fd, ALL_ENGINES);
> > -
> > -     gem_quiescent_gpu(fd);
> > +     igt_drop_caches_set(fd, DROP_ACTIVE);
> >   
> >       /* We expect forced reset and health check to be quick. */
> >       igt_assert(igt_seconds_elapsed(&ts) < 2);
> > 
> 
> Sounds fine to only wait for idle:
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> However, I am a bit surprised that under a plain IGT environment RCU
> latency can be so high.

I suspect it's more of a latency issue with recent kernels or CI systems.
There are a few mysterious multi-second sleeps that cause sporadic fails
around the place. rcu_barrier() can be slow (like 30s slow) but only
under load, afaik.
-Chris

Patch

diff --git a/tests/gem_eio.c b/tests/gem_eio.c
index 4720b47b5..e1aff639d 100644
--- a/tests/gem_eio.c
+++ b/tests/gem_eio.c
@@ -74,8 +74,7 @@  static void trigger_reset(int fd)
 	/* And just check the gpu is indeed running again */
 	igt_debug("Checking that the GPU recovered\n");
 	gem_test_engine(fd, ALL_ENGINES);
-
-	gem_quiescent_gpu(fd);
+	igt_drop_caches_set(fd, DROP_ACTIVE);
 
 	/* We expect forced reset and health check to be quick. */
 	igt_assert(igt_seconds_elapsed(&ts) < 2);
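
The reasoning behind the patch above can be illustrated with a small standalone sketch. It is not IGT code: every `sketch_*` name, the `SKETCH_DROP_*` flag values, and the sleep durations are hypothetical stand-ins chosen for illustration, and the debugfs write is replaced by a stub that sleeps. It only shows why a timed check should contain nothing but the work it is meant to measure: the idle-only wait fits the budget, while adding the freed-memory flush (modelled here as a longer sleep in place of rcu_barrier()) can push the same check past it.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical flag values, for illustration only; not copied from IGT. */
enum {
	SKETCH_DROP_ACTIVE = 1 << 0, /* wait for the GPU to idle */
	SKETCH_DROP_FREED  = 1 << 1, /* also flush freed memory (slow path) */
};

/* Seconds elapsed since *start on the monotonic clock. */
static double sketch_seconds_since(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (double)(now.tv_sec - start->tv_sec) +
	       (double)(now.tv_nsec - start->tv_nsec) / 1e9;
}

/* Stand-in for the debugfs write: sleeps to model the work requested. */
static void sketch_drop_caches(unsigned int flags)
{
	struct timespec idle = { 0, 10 * 1000 * 1000 };    /* 10ms idle wait */
	struct timespec barrier = { 0, 50 * 1000 * 1000 }; /* 50ms "barrier" */

	if (flags & SKETCH_DROP_ACTIVE)
		nanosleep(&idle, NULL);
	if (flags & SKETCH_DROP_FREED)
		nanosleep(&barrier, NULL);
}

/* Run the requested work and report whether it fit in budget_s seconds. */
static bool sketch_within_budget(unsigned int flags, double budget_s)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	sketch_drop_caches(flags);
	return sketch_seconds_since(&ts) < budget_s;
}

/* Example use: the idle-only wait must fit the 2s budget from the test. */
static void sketch_demo(void)
{
	assert(sketch_within_budget(SKETCH_DROP_ACTIVE, 2.0));
}
```

Calling `sketch_demo()` from a test harness exercises the idle-only path. With the stub's made-up timings, passing `SKETCH_DROP_ACTIVE | SKETCH_DROP_FREED` with a budget shorter than the modelled barrier makes `sketch_within_budget()` return false, which mirrors the sporadic failure the patch is trying to avoid.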