mbox series

[i-g-t,0/5] tests/gem_ctx_exec: Fix failing preempt timeout updates

Message ID 20240715181523.2825921-7-janusz.krzysztofik@linux.intel.com (mailing list archive)
Headers show
Series tests/gem_ctx_exec: Fix failing preempt timeout updates | expand

Message

Janusz Krzysztofik July 15, 2024, 6:13 p.m. UTC
CI reports the following failures from basic-nohangcheck subtest:

(gem_ctx_exec:1115) CRITICAL: Test assertion failure function nohangcheck_hostile, file ../../../usr/src/igt-gpu-tools/tests/intel/gem_ctx_exec.c:374:
(gem_ctx_exec:1115) CRITICAL: Failed assertion: err == 0
(gem_ctx_exec:1115) CRITICAL: Last errno: 2, No such file or directory
(gem_ctx_exec:1115) CRITICAL: Hostile unpreemptable context was not cancelled immediately upon closure

The subtest sets 50 ms preempt timeout on each engine before proceding
with submission of spins, then it waits up to 1 second for those spins to
be terminated.  However, dump of engines' debugfs data performed by the
subtest after the failure shows preempt timeouts still at their default
values: 7500 ms on rcs0 and 640 ms on other class engines.  Dmesg records
confirm preemption timeouts triggered on other engines after 640 ms and
not on rcs0 within the 1 second limit.

Fix the issue.

Link: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/6268

Janusz Krzysztofik (5):
  tests/gem_ctx_exec: Fail on unsuccessful preempt timeout update
  lib: Add more debug messages to error paths
  lib/gem_engine_topology: Fix premature break from primary find loop
  lib/gem_engine_topology: Simplify the method of opening a primary
  lib/gem_engine_topology: Fix broken compare of device links

 lib/i915/gem_engine_topology.c | 24 ++++++++++--------------
 lib/igt_sysfs.c                |  4 ++--
 tests/intel/gem_ctx_exec.c     |  5 +++--
 3 files changed, 15 insertions(+), 18 deletions(-)