diff mbox

drm/i915/selftests: Fix error checking for wait_var_timeout

Message ID 20180417170638.20550-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson April 17, 2018, 5:06 p.m. UTC
The old wait_on_atomic_t used a custom callback to perform the
schedule(), which used my return semantics of reporting an error code on
timeout. wait_var_event_timeout() uses the schedule() return semantics
of reporting the remaining jiffies (1 if it timed out with 0 jiffies
remaining!) and 0 on failure. This semantic mismatch lead to us falsely
claiming a time out occurred.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106085
Fixes: d224985a5e31 ("sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Joonas Lahtinen April 18, 2018, 9:10 a.m. UTC | #1
Quoting Chris Wilson (2018-04-17 20:06:38)
> The old wait_on_atomic_t used a custom callback to perform the
> schedule(), which used my return semantics of reporting an error code on
> timeout. wait_var_event_timeout() uses the schedule() return semantics
> of reporting the remaining jiffies (1 if it timed out with 0 jiffies
> remaining!) and 0 on failure. This semantic mismatch lead to us falsely
> claiming a time out occurred.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106085
> Fixes: d224985a5e31 ("sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
Chris Wilson April 18, 2018, 9:14 a.m. UTC | #2
Quoting Joonas Lahtinen (2018-04-18 10:10:17)
> Quoting Chris Wilson (2018-04-17 20:06:38)
> > The old wait_on_atomic_t used a custom callback to perform the
> > schedule(), which used my return semantics of reporting an error code on
> > timeout. wait_var_event_timeout() uses the schedule() return semantics
> > of reporting the remaining jiffies (1 if it timed out with 0 jiffies
> > remaining!) and 0 on failure. This semantic mismatch lead to us falsely
> > claiming a time out occurred.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106085
> > Fixes: d224985a5e31 ("sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

How about a backmerge of rc1 onto drm-intel-next-queued so we can apply
the fix?
-Chris
Joonas Lahtinen April 18, 2018, 10:11 a.m. UTC | #3
Quoting Chris Wilson (2018-04-18 12:14:15)
> Quoting Joonas Lahtinen (2018-04-18 10:10:17)
> > Quoting Chris Wilson (2018-04-17 20:06:38)
> > > The old wait_on_atomic_t used a custom callback to perform the
> > > schedule(), which used my return semantics of reporting an error code on
> > > timeout. wait_var_event_timeout() uses the schedule() return semantics
> > > of reporting the remaining jiffies (1 if it timed out with 0 jiffies
> > > remaining!) and 0 on failure. This semantic mismatch lead to us falsely
> > > claiming a time out occurred.
> > > 
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106085
> > > Fixes: d224985a5e31 ("sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API")
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> 
> How about a backmerge of rc1 onto drm-intel-next-queued so we can apply
> the fix?

+ Jani for that

Regards, Joonas

> -Chris
Mika Kuoppala April 18, 2018, 10:36 a.m. UTC | #4
Chris Wilson <chris@chris-wilson.co.uk> writes:

> The old wait_on_atomic_t used a custom callback to perform the
> schedule(), which used my return semantics of reporting an error code on
> timeout. wait_var_event_timeout() uses the schedule() return semantics
> of reporting the remaining jiffies (1 if it timed out with 0 jiffies
> remaining!) and 0 on failure. This semantic mismatch lead to us falsely
> claiming a time out occurred.

I might have gone too far into the rabbit hole on this one.
The more I abseiled, the more there was macro trickery and
shadowing return values. Anxiety started to creep in.

I am back but will need to recuperate.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106085
> Fixes: d224985a5e31 ("sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c b/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
> index 46580026c7fc..d6926e7820e5 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
> @@ -412,10 +412,11 @@ static int igt_wakeup(void *arg)
>  		 * that they are ready for the next test. We wait until all
>  		 * threads are complete and waiting for us (i.e. not a seqno).
>  		 */
> -		err = wait_var_event_timeout(&done, !atomic_read(&done), 10 * HZ);
> -		if (err) {
> +		if (!wait_var_event_timeout(&done,
> +					    !atomic_read(&done), 10 * HZ)) {
>  			pr_err("Timed out waiting for %d remaining waiters\n",
>  			       atomic_read(&done));
> +			err = -ETIMEDOUT;
>  			break;
>  		}
>  
> -- 
> 2.17.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson May 2, 2018, 10:20 a.m. UTC | #5
Quoting Patchwork (2018-04-18 14:38:56)
>     ==== Possible fixes ====
> 
>     igt@drv_selftest@mock_breadcrumbs:
>       shard-hsw:          DMESG-FAIL (fdo#106085) -> PASS
>       shard-snb:          DMESG-FAIL (fdo#106085) -> PASS
>       shard-apl:          DMESG-FAIL (fdo#106085) -> PASS
>       shard-kbl:          DMESG-FAIL (fdo#106085) -> PASS

And with dinq rolled forward, I've pushed the patch. Thanks for the
review,
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c b/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
index 46580026c7fc..d6926e7820e5 100644
--- a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
@@ -412,10 +412,11 @@  static int igt_wakeup(void *arg)
 		 * that they are ready for the next test. We wait until all
 		 * threads are complete and waiting for us (i.e. not a seqno).
 		 */
-		err = wait_var_event_timeout(&done, !atomic_read(&done), 10 * HZ);
-		if (err) {
+		if (!wait_var_event_timeout(&done,
+					    !atomic_read(&done), 10 * HZ)) {
 			pr_err("Timed out waiting for %d remaining waiters\n",
 			       atomic_read(&done));
+			err = -ETIMEDOUT;
 			break;
 		}