
[i-g-t,v5,00/21] tests/core_hotunplug: Fixes and enhancements

Message ID 20200828075927.17061-1-janusz.krzysztofik@linux.intel.com (mailing list archive)

Message

Janusz Krzysztofik Aug. 28, 2020, 7:59 a.m. UTC
Clean up the test code, add some new basic subtests, then unblock
unbind test variants.

No incompletes / aborts have been reported by Trybot this time.

Series changelog:
v2: New patch "Un-blocklist *bind* subtests" added.
v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
    from subtest failures".
  - a new patch "Clean up device open error handling" added, an old
    patch "Fix missing newline", obsoleted by the new one, dropped,
  - other new patches added:
    - "Let the driver time out essential sysfs operations",
    - "More thorough i915 healthcheck and recovery",
  - a patch "Add 'lateclose before restore' variants" from another
    series included.
v4: Optional patch "Duplicate debug messages in dmesg" from another
    series included.
v5: New patch added that works around a Haswell audio related kernel
    warning and replaces it with an IGT warning to preserve visibility
    of the issue.

@Michał: Since some patch updates are trivial, I've preserved your
v1/v2 Reviewed-by: except for patches with non-trivial changes, where I
marked your R-b as v1/v2 applicable.  Please have a look and confirm if
you are still OK with them.

@Tvrtko: As I already asked before, please support my attempt to remove
the unbind test variants from the blocklist.

@Petri, @Martin: Assuming CI results will be as good as those obtained
on Trybot, please give me your green light for merging this series if
you have no objections.

Thanks,
Janusz


Janusz Krzysztofik (21):
  tests/core_hotunplug: Use igt_assert_fd()
  tests/core_hotunplug: Constify dev_bus_addr string
  tests/core_hotunplug: Clean up device open error handling
  tests/core_hotunplug: Consolidate duplicated debug messages
  tests/core_hotunplug: Assert successful device filter application
  tests/core_hotunplug: Maintain a single data structure instance
  tests/core_hotunplug: Pass errors via a data structure field
  tests/core_hotunplug: Handle device close errors
  tests/core_hotunplug: Prepare invariant data once per test run
  tests/core_hotunplug: Skip selectively on sysfs close errors
  tests/core_hotunplug: Recover from subtest failures
  tests/core_hotunplug: Fail subtests on device close errors
  tests/core_hotunplug: Let the driver time out essential sysfs
    operations
  tests/core_hotunplug: Process return values of sysfs operations
  tests/core_hotunplug: Assert expected device presence/absence
  tests/core_hotunplug: Explicitly ignore unused return values
  tests/core_hotunplug: More thorough i915 healthcheck and recovery
  tests/core_hotunplug: Add 'lateclose before restore' variants
  tests/core_hotunplug: Duplicate debug messages in dmesg
  tests/core_hotunplug: HSW audio issue workaround
  tests/core_hotunplug: Un-blocklist *bind* subtests

 tests/core_hotunplug.c       | 542 ++++++++++++++++++++++++++---------
 tests/intel-ci/blacklist.txt |   2 +-
 2 files changed, 410 insertions(+), 134 deletions(-)

Comments

Janusz Krzysztofik Aug. 28, 2020, 1:05 p.m. UTC | #1
On Fri, 2020-08-28 at 11:53 +0000, Patchwork wrote:
> Patch Details
> Series:	tests/core_hotunplug: Fixes and enhancements (rev5)
> URL:	https://patchwork.freedesktop.org/series/79671/
> State:	failure
> Details:	https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4914/index.html
> CI Bug Log - changes from IGT_5774_full -> IGTPW_4914_full
> Summary
> FAILURE
> 
> Serious unknown changes coming with IGTPW_4914_full absolutely need to be
> verified manually.
> 
> If you think the reported changes have nothing to do with the changes
> introduced in IGTPW_4914_full, please notify your bug team to allow them
> to document this new failure mode, which will reduce false positives in CI.
> 
> External URL: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4914/index.html
> 
> Possible new issues
> Here are the unknown changes that may have been introduced in IGTPW_4914_full:
> 
> IGT changes
> Possible regressions
> {igt@core_hotunplug@hotrebind-lateclose} (NEW):
> 
> shard-snb: NOTRUN -> FAIL
> 
> shard-iclb: NOTRUN -> FAIL
> 
> shard-tglb: NOTRUN -> DMESG-WARN
> 
> shard-glk: NOTRUN -> FAIL
> 
> shard-hsw: NOTRUN -> FAIL
> 
> shard-kbl: NOTRUN -> FAIL

As before (rev4), this is an existing but previously unreported GPU
hang driver issue exhibited by the test, not a regression.  The issue
needs to be fixed in the driver for the test to succeed.  As one can
see from CI reports, the test successfully recovers from that condition
and subsequent tests don't report GPU hangs.

> 
> {igt@core_hotunplug@unbind-rebind} (NEW):
> 
> shard-hsw: NOTRUN -> WARN +1 similar issue

This is an IGT warning that replaces a former (rev4) DMESG-WARN ->
INCOMPLETE caused by a known driver issue already reported by
igt@device_reset@unbind-reset-rebind.  The issue has nothing to do with
device reset, only with driver unbind on Haswell with Azalia audio.
The kernel side needs to be fixed for the WARN not to be triggered and
the tests to succeed.  Meanwhile, the IGT warning workaround keeps the
issue visible in CI without affecting CI runs.
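
For readers unfamiliar with what a driver unbind/rebind cycle involves,
here is a minimal sketch of the generic Linux sysfs mechanism: the
device's bus address is written to the driver's "unbind" attribute and
then to its "bind" attribute.  This is not the code from
tests/core_hotunplug.c; the helper names write_driver_attr() and
unbind_rebind() are hypothetical, and the driver directory is passed in
as a parameter (on a real system it would be something like
/sys/bus/pci/drivers/i915, which is an assumption here).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a device bus address to one attribute
 * ("bind" or "unbind") found in a driver's sysfs directory.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_driver_attr(const char *drv_dir, const char *attr,
			     const char *bus_addr)
{
	char path[4096];
	ssize_t len = (ssize_t)strlen(bus_addr);
	ssize_t n;
	int fd, ret = 0;

	if (snprintf(path, sizeof(path), "%s/%s", drv_dir, attr) >=
	    (int)sizeof(path))
		return -ENAMETOOLONG;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, bus_addr, (size_t)len);
	if (n < 0)
		ret = -errno;
	else if (n != len)
		ret = -EIO;	/* short write, treat as an I/O error */

	/* Report a close error only if the write itself succeeded. */
	if (close(fd) < 0 && !ret)
		ret = -errno;

	return ret;
}

/*
 * Hypothetical helper: unbind the driver from the device, then bind it
 * again.  The rebind is skipped if the unbind failed.
 */
static int unbind_rebind(const char *drv_dir, const char *bus_addr)
{
	int ret = write_driver_attr(drv_dir, "unbind", bus_addr);

	if (ret)
		return ret;

	return write_driver_attr(drv_dir, "bind", bus_addr);
}
```

A caller with sufficient privileges could invoke, for example,
unbind_rebind("/sys/bus/pci/drivers/i915", "0000:00:02.0"); both the
path and the address are illustrative only.  Passing errors back as
negative errno values lets the caller decide whether a failed step
should fail or skip a subtest.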

> igt@gem_render_copy@linear:
> 
> shard-tglb: PASS -> FAIL +2 similar issues

This is a strange issue of an inaccessible "i915_gem_drop_caches"
debugfs entry for the render device node of the device just exercised
with igt@core_hotunplug@hotrebind-lateclose on a GuC platform.  Not
reported by Trybot unfortunately, but here evidently affecting
subsequent tests.  Looks like the health check and recovery phase of
the test still needs more work, sorry.

Thanks,
Janusz


> New tests
> New tests have been introduced between IGT_5774_full and IGTPW_4914_full:
> 
> New IGT tests (3)
> igt@core_hotunplug@hotrebind-lateclose:
> 
> Statuses : 1 dmesg-warn(s) 6 fail(s)
> Exec time: [6.13, 17.39] s
> igt@core_hotunplug@hotunbind-rebind:
> 
> Statuses : 6 pass(s) 1 warn(s)
> Exec time: [0.39, 1.96] s
> igt@core_hotunplug@unbind-rebind:
> 
> Statuses : 6 pass(s) 1 warn(s)
> Exec time: [0.38, 1.91] s
> Known issues
> Here are the changes found in IGTPW_4914_full that come from known issues:
> 
> IGT changes
> Issues hit
> igt@gem_exec_reloc@basic-concurrent0:
> 
> shard-tglb: PASS -> TIMEOUT (i915#1958)
> 
> shard-kbl: PASS -> TIMEOUT (i915#1958) +1 similar issue
> 
> igt@gem_exec_whisper@basic-forked:
> 
> shard-iclb: PASS -> TIMEOUT (i915#1958)
> igt@gem_exec_whisper@basic-forked-all:
> 
> shard-glk: PASS -> DMESG-WARN (i915#118 / i915#95)
> igt@gem_exec_whisper@basic-queues-forked-all:
> 
> shard-glk: PASS -> TIMEOUT (i915#1958) +4 similar issues
> 
> shard-apl: PASS -> TIMEOUT (i915#1635 / i915#1958) +1 similar issue
> 
> igt@gen9_exec_parse@allowed-all:
> 
> shard-apl: PASS -> DMESG-WARN (i915#1436 / i915#1635 / i915#716)
> igt@i915_pm_dc@dc6-psr:
> 
> shard-iclb: PASS -> FAIL (i915#1899)
> igt@i915_pm_rpm@reg-read-ioctl:
> 
> shard-kbl: PASS -> DMESG-WARN (i915#165)
> igt@i915_selftest@mock@contexts:
> 
> shard-hsw: PASS -> INCOMPLETE (i915#2278)
> igt@kms_frontbuffer_tracking@fbc-1p-primscrn-shrfb-pgflip-blt:
> 
> shard-tglb: PASS -> DMESG-WARN (i915#1982) +2 similar issues
> igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-shrfb-draw-mmap-wc:
> 
> shard-glk: PASS -> FAIL (i915#49)
> igt@kms_frontbuffer_tracking@fbc-badstride:
> 
> shard-glk: PASS -> DMESG-WARN (i915#1982)
> igt@kms_frontbuffer_tracking@fbc-indfb-scaledprimary:
> 
> shard-kbl: PASS -> FAIL (i915#49)
> 
> shard-apl: PASS -> FAIL (i915#1635 / i915#49)
> 
> igt@kms_hdmi_inject@inject-audio:
> 
> shard-tglb: PASS -> SKIP (i915#433)
> igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes:
> 
> shard-kbl: PASS -> DMESG-WARN (i915#180)
> igt@kms_psr2_su@frontbuffer:
> 
> shard-iclb: PASS -> SKIP (fdo#109642 / fdo#111068)
> igt@kms_psr@psr2_primary_mmap_cpu:
> 
> shard-iclb: PASS -> SKIP (fdo#109441) +1 similar issue
> igt@kms_universal_plane@universal-plane-gen9-features-pipe-a:
> 
> shard-kbl: PASS -> DMESG-WARN (i915#1982) +1 similar issue
> igt@kms_vblank@pipe-a-query-busy-hang:
> 
> shard-apl: PASS -> DMESG-WARN (i915#1635 / i915#1982)
> Possible fixes
> igt@gem_exec_reloc@basic-many-active@rcs0:
> 
> shard-apl: FAIL (i915#1635 / i915#2389) -> PASS
> 
> shard-hsw: FAIL (i915#2389) -> PASS
> 
> igt@gem_exec_whisper@basic-contexts-priority:
> 
> shard-apl: TIMEOUT (i915#1635 / i915#1958) -> PASS
> igt@gem_exec_whisper@basic-fds:
> 
> shard-iclb: TIMEOUT (i915#1958) -> PASS +1 similar issue
> igt@gem_exec_whisper@basic-normal:
> 
> shard-glk: TIMEOUT (i915#1958) -> PASS
> igt@i915_selftest@mock@contexts:
> 
> shard-apl: INCOMPLETE (i915#1635 / i915#2278) -> PASS
> igt@i915_suspend@fence-restore-tiled2untiled:
> 
> shard-kbl: INCOMPLETE (i915#155) -> PASS
> igt@kms_big_fb@x-tiled-64bpp-rotate-0:
> 
> shard-glk: DMESG-FAIL (i915#118 / i915#95) -> PASS
> igt@kms_cursor_crc@pipe-b-cursor-64x21-onscreen:
> 
> shard-kbl: FAIL (i915#54) -> PASS
> 
> shard-apl: FAIL (i915#1635 / i915#54) -> PASS
> 
> shard-glk: FAIL (i915#54) -> PASS
> 
> igt@kms_flip@2x-blocking-absolute-wf_vblank-interruptible@ab-vga1-hdmi-a1:
> 
> shard-hsw: DMESG-WARN (i915#1982) -> PASS +1 similar issue
> igt@kms_flip@dpms-vs-vblank-race-interruptible@a-dp1:
> 
> shard-kbl: DMESG-WARN (i915#1982) -> PASS +1 similar issue
> igt@kms_flip@flip-vs-expired-vblank@a-hdmi-a1:
> 
> shard-glk: FAIL (i915#79) -> PASS
> igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
> 
> shard-kbl: DMESG-WARN (i915#180) -> PASS +12 similar issues
> igt@kms_psr@psr2_sprite_mmap_gtt:
> 
> shard-iclb: SKIP (fdo#109441) -> PASS +2 similar issues
> igt@kms_universal_plane@universal-plane-pipe-c-sanity:
> 
> shard-tglb: DMESG-WARN (i915#1982) -> PASS +1 similar issue
> Warnings
> igt@runner@aborted:
> 
> shard-hsw: FAIL (i915#2283) -> (FAIL, FAIL) (i915#1436 / i915#2283)
> 
> shard-apl: FAIL (i915#1635) -> FAIL (fdo#109271 / i915#1635 / i915#716)
> 
> {name}: This element is suppressed. This means it is ignored when computing
> the status of the difference (SUCCESS, WARNING, or FAILURE).
> 
> Participating hosts (8 -> 8)
> No changes in participating hosts
> 
> Build changes
> CI: CI-20190529 -> None
> IGT: IGT_5774 -> IGTPW_4914
> CI-20190529: 20190529
> CI_DRM_8937: 78b090a913c972368c81f05352a532590200cc89 @ git://anongit.freedesktop.org/gfx-ci/linux
> IGTPW_4914: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_4914/index.html
> IGT_5774: 2a5db9f60241383272aeec176e1b97b3f487209f @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools