[i-g-t,v6,00/24] tests/core_hotunplug: Fixes and enhancements

Message ID 20200911103039.4574-1-janusz.krzysztofik@linux.intel.com

Janusz Krzysztofik Sept. 11, 2020, 10:30 a.m. UTC
Clean up the test code, add some new basic subtests, then unblock
unbind test variants.

Trybot has reported no incompletes or aborts, nor any issues in
subsequently run tests.  The hotrebind-lateclose subtest fails on a so
far unidentified driver sysfs issue, but the device is fully recovered
and left in a usable state.  A perceived Haswell/Broadwell issue with
audio power management has been worked around, and its potential
occurrence is reported as an IGT warning.

Series changelog:
v2: New patch "Un-blocklist *bind* subtests" added.
v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
    from subtest failures".
  - a new patch "Clean up device open error handling" added, an old
    patch "Fix missing newline" obsoleted by the new one dropped,
  - other new patches added:
    - "Let the driver time out essential sysfs operations",
    - "More thorough i915 healthcheck and recovery",
  - a patch "Add 'lateclose before restore' variants" from another
    series included.
v4: Optional patch "Duplicate debug messages in dmesg" from another
    series included.
v5: New patch added with Haswell audio related kernel warning worked
    around and replaced with an IGT warning to preserve visibility of
    the issue.
v6: New patch added for also checking health of render device nodes,
  - new patch added with proper handling of health check before late
    close,
  - inclusion of unbind-rebind scenario to BAT scope proposed.

@Michał: Since some patch updates are trivial, I've preserved your
v1/v2 Reviewed-by: except for patches with non-trivial changes, where I
marked your R-b as v1/v2 applicable.  Please have a look and confirm if
you are still OK with them.

@Tvrtko: As I already asked before, please support my attempt to remove
the unbind test variants from the blocklist.

@Petri, @Martin: Assuming CI results will be as good as those obtained
on Trybot, please give me your green light for merging this series if
you have no objections.

Thanks,
Janusz

Janusz Krzysztofik (24):
  tests/core_hotunplug: Use igt_assert_fd()
  tests/core_hotunplug: Constify dev_bus_addr string
  tests/core_hotunplug: Clean up device open error handling
  tests/core_hotunplug: Consolidate duplicated debug messages
  tests/core_hotunplug: Assert successful device filter application
  tests/core_hotunplug: Maintain a single data structure instance
  tests/core_hotunplug: Pass errors via a data structure field
  tests/core_hotunplug: Handle device close errors
  tests/core_hotunplug: Prepare invariant data once per test run
  tests/core_hotunplug: Skip selectively on sysfs close errors
  tests/core_hotunplug: Recover from subtest failures
  tests/core_hotunplug: Fail subtests on device close errors
  tests/core_hotunplug: Let the driver time out essential sysfs
    operations
  tests/core_hotunplug: Process return values of sysfs operations
  tests/core_hotunplug: Assert expected device presence/absence
  tests/core_hotunplug: Explicitly ignore unused return values
  tests/core_hotunplug: Also check health of render device node
  tests/core_hotunplug: More thorough i915 healthcheck and recovery
  tests/core_hotunplug: Add 'lateclose before restore' variants
  tests/core_hotunplug: Check health both before and after late close
  tests/core_hotunplug: HSW/BDW audio issue workaround
  tests/core_hotunplug: Duplicate debug messages in dmesg
  tests/core_hotunplug: Un-blocklist *bind* subtests
  tests/core_hotunplug: Add unbind-rebind subtest to BAT scope

 tests/core_hotunplug.c                | 560 ++++++++++++++++++++------
 tests/intel-ci/blacklist.txt          |   2 +-
 tests/intel-ci/fast-feedback.testlist |   1 +
 3 files changed, 431 insertions(+), 132 deletions(-)

Comments

Michał Winiarski Sept. 14, 2020, 6:18 p.m. UTC | #1
Quoting Janusz Krzysztofik (2020-09-11 12:30:15)
> @Michał: Since some patch updates are trivial, I've preserved your
> v1/v2 Reviewed-by: except for patches with non-trivial changes, where I
> marked your R-b as v1/v2 applicable.  Please have a look and confirm if
> you are still OK with them.

Feel free to add:
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>

For the whole series (with the exception of intel-ci part).

-Michał

Janusz Krzysztofik Sept. 14, 2020, 7:30 p.m. UTC | #2
On Mon, 2020-09-14 at 20:18 +0200, Michał Winiarski wrote:
> Feel free to add:
> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
> 
> For the whole series (with the exception of intel-ci part).

Pushed.

@Petri, @Michał - thank you for the review.

@Lakshmi:
- please open a new bug for the issue reported by the
igt@core_hotunplug@hotrebind-lateclose subtest failing on all platforms,
- the IGT warning reported by igt@core_hotunplug@*bind* on Haswell and
Broadwell platforms is caused by the same issue as the one now reported
in a similar way on Haswell by igt@device_reset@unbind-reset-rebind -
please update the associated filter so it covers all those tests.

Thanks,
Janusz


Vudum, Lakshminarayana Sept. 14, 2020, 8:43 p.m. UTC | #3
The igt@core_hotunplug@hotrebind-lateclose test is not yet in the CI bug log. Otherwise I filed the issue https://gitlab.freedesktop.org/drm/intel/-/issues/2464

Thanks,
Lakshmi.

Janusz Krzysztofik Sept. 15, 2020, 7:47 a.m. UTC | #4
Hi Lakshmi,

On Mon, 2020-09-14 at 20:43 +0000, Vudum, Lakshminarayana wrote:
> The igt@core_hotunplug@hotrebind-lateclose test is not yet in the CI bug log.

Here is fresh evidence:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9008/shard-tglb5/igt@core_hotunplug@hotrebind-lateclose.html

Thanks,
Janusz

>  Otherwise I filed the issue https://gitlab.freedesktop.org/drm/intel/-/issues/2464
> 
> Thanks,
> Lakshmi.
Vudum, Lakshminarayana Sept. 15, 2020, 3:39 p.m. UTC | #5
Hi Janusz,

I have filed https://gitlab.freedesktop.org/drm/intel/-/issues/2469 for the igt@core_hotunplug@hotrebind-lateclose failure.
Is it a GuC issue?

Thanks,
Lakshmi


Janusz Krzysztofik Sept. 16, 2020, 7:59 a.m. UTC | #6
Hi Lakshmi,

On Tue, 2020-09-15 at 15:39 +0000, Vudum, Lakshminarayana wrote:
> Hi Janusz,
> 
> I have filed https://gitlab.freedesktop.org/drm/intel/-/issues/2469 for igt@core_hotunplug@hotrebind-lateclose failure. 
> Is it a GuC issue?

Wow, I thought that issue had been hidden behind another one, and I
forgot about it.  It's great that you've identified it.  And yes, it is
GuC specific.  However, as far as I can tell, the test recovers from
that condition, so it is not the root cause of the subtest failures;
those happen on non-GuC platforms as well.

Then, we need to open another bug with a filter that captures the
following from the test standard error:

(core_hotunplug:2056) igt_aux-CRITICAL: Test assertion failure function igt_fork_hang_detector, file ../lib/igt_aux.c:517:
(core_hotunplug:2056) igt_aux-CRITICAL: Failed assertion: igt_params_set(fd, "reset", "%d", 1 )
(core_hotunplug:2056) igt_aux-CRITICAL: Last errno: 13, Permission denied

I have no idea whether CI filters are able to trigger more than one bug
from a single subtest run.  If not, then I think the GuC issue should
have higher priority set so that both are visible.

Thanks,
Janusz

> 
> Thanks,
> Lakshmi
> 
> 
> -----Original Message-----
> From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> 
> Sent: Tuesday, September 15, 2020 12:47 AM
> To: Vudum, Lakshminarayana <lakshminarayana.vudum@intel.com>; Winiarski, Michal <michal.winiarski@intel.com>; igt-dev@lists.freedesktop.org
> Cc: Michał Winiarski <michal@hardline.pl>; intel-gfx@lists.freedesktop.org; Latvala, Petri <petri.latvala@intel.com>
> Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: Fixes and enhancements
> 
> Hi Lakshmi,
> 
> On Mon, 2020-09-14 at 20:43 +0000, Vudum, Lakshminarayana wrote:
> > igt@core_hotunplug@hotrebind-lateclose test is not yet in CI bug log.
> 
> Here is fresh evidence:
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9008/shard-tglb5/igt@core_hotunplug@hotrebind-lateclose.html
> 
> Thanks,
> Janusz
> 
> >  Otherwise I filed the issue 
> > https://gitlab.freedesktop.org/drm/intel/-/issues/2464
> > 
> > Thanks,
> > Lakshmi.
> > 
> > -----Original Message-----
> > From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> > Sent: Monday, September 14, 2020 12:31 PM
> > To: Winiarski, Michal <michal.winiarski@intel.com>; 
> > igt-dev@lists.freedesktop.org
> > Cc: Michał Winiarski <michal@hardline.pl>; 
> > intel-gfx@lists.freedesktop.org; Latvala, Petri 
> > <petri.latvala@intel.com>; Vudum, Lakshminarayana 
> > <lakshminarayana.vudum@intel.com>
> > Subject: Re: [Intel-gfx] [PATCH i-g-t v6 00/24] tests/core_hotunplug: 
> > Fixes and enhancements
> > 
> > On Mon, 2020-09-14 at 20:18 +0200, Michał Winiarski wrote:
> > > Quoting Janusz Krzysztofik (2020-09-11 12:30:15)
> > > > Clean up the test code, add some new basic subtests, then unblock 
> > > > unbind test variants.
> > > > 
> > > > No incompletes / aborts nor subsequently run test issues have been 
> > > > reported by Trybot.  The hotrebind-lateclose subtest fails on a so 
> > > > far unidentified driver sysfs issue but the device is fully 
> > > > recovered and left in a usable state.  Perceived Haswell/Broadwell 
> > > > issue with audio power management has been worked around and its 
> > > > potential occurrence is reported as an IGT warning.
> > > > 
> > > > Series changelog:
> > > > v2: New patch "Un-blocklist *bind* subtests" added.
> > > > v3: Patch "Follow failed subtests with healthcheck" renamed to "Recover
> > > >     from subtest failures".
> > > >   - a new patch "Clean up device open error handling" added, an old
> > > >     patch "Fix missing newline" obsoleted by the new one dropped,
> > > >   - other new patches added:
> > > >     - "Let the driver time out essential sysfs operations",
> > > >     - "More thorough i915 healthcheck and recovery",
> > > >   - a patch "Add 'lateclose before restore' variants" from another
> > > >     series included.
> > > > v4: Optional patch "Duplicate debug messages in dmesg" from another
> > > >     series included.
> > > > v5: New patch added with Haswell audio related kernel warning worked
> > > >     around and replaced with an IGT warning to preserve visibility of
> > > >     the issue.
> > > > v6: New patch added for also checking health of render device nodes,
> > > >   - new patch added with proper handling of health check before late
> > > >     close,
> > > >   - inclusion of unbind-rebind scenario to BAT scope proposed.
> > > > 
> > > > @Michał: Since some patch updates are trivial, I've preserved your
> > > > v1/v2 Reviewed-by: except for patches with non-trivial changes, 
> > > > where I marked your R-b as v1/v2 applicable.  Please have a look 
> > > > and confirm if you are still OK with them.
> > > 
> > > Feel free to add:
> > > Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
> > > 
> > > For the whole series (with the exception of intel-ci part).
> > 
> > Pushed.
> > 
> > @Petri, @Michał - thank you for review.
> > 
> > @Lakshmi:
> > - please open a new bug for the issue reported by the
> >   igt@core_hotunplug@hotrebind-lateclose subtest failing on all
> >   platforms,
> > - the IGT warning reported by igt@core_hotunplug@*bind* on Haswell and
> >   Broadwell platforms is caused by the same issue as the one now
> >   reported in a similar way on Haswell by
> >   igt@device_reset@unbind-reset-rebind - please update the associated
> >   filter so it covers all those tests.
> > 
> > Thanks,
> > Janusz
> > 
> > 
> > > -Michał
> > > 
> > > > @Tvrtko: As I already asked before, please support my attempt to 
> > > > remove the unbind test variants from the blocklist.
> > > > 
> > > > @Petri, @Martin: Assuming CI results will be as good as those 
> > > > obtained on Trybot, please give me your green light for merging 
> > > > this series if you have no objections.
> > > > 
> > > > Thanks,
> > > > Janusz
> > > > 
> > > > Janusz Krzysztofik (24):
> > > >   tests/core_hotunplug: Use igt_assert_fd()
> > > >   tests/core_hotunplug: Constify dev_bus_addr string
> > > >   tests/core_hotunplug: Clean up device open error handling
> > > >   tests/core_hotunplug: Consolidate duplicated debug messages
> > > >   tests/core_hotunplug: Assert successful device filter application
> > > >   tests/core_hotunplug: Maintain a single data structure instance
> > > >   tests/core_hotunplug: Pass errors via a data structure field
> > > >   tests/core_hotunplug: Handle device close errors
> > > >   tests/core_hotunplug: Prepare invariant data once per test run
> > > >   tests/core_hotunplug: Skip selectively on sysfs close errors
> > > >   tests/core_hotunplug: Recover from subtest failures
> > > >   tests/core_hotunplug: Fail subtests on device close errors
> > > >   tests/core_hotunplug: Let the driver time out essential sysfs
> > > >     operations
> > > >   tests/core_hotunplug: Process return values of sysfs operations
> > > >   tests/core_hotunplug: Assert expected device presence/absence
> > > >   tests/core_hotunplug: Explicitly ignore unused return values
> > > >   tests/core_hotunplug: Also check health of render device node
> > > >   tests/core_hotunplug: More thorough i915 healthcheck and recovery
> > > >   tests/core_hotunplug: Add 'lateclose before restore' variants
> > > >   tests/core_hotunplug: Check health both before and after late close
> > > >   tests/core_hotunplug: HSW/BDW audio issue workaround
> > > >   tests/core_hotunplug: Duplicate debug messages in dmesg
> > > >   tests/core_hotunplug: Un-blocklist *bind* subtests
> > > >   tests/core_hotunplug: Add unbind-rebind subtest to BAT scope
> > > > 
> > > >  tests/core_hotunplug.c                | 560 ++++++++++++++++++++------
> > > >  tests/intel-ci/blacklist.txt          |   2 +-
> > > >  tests/intel-ci/fast-feedback.testlist |   1 +
> > > >  3 files changed, 431 insertions(+), 132 deletions(-)
> > > > 
> > > > --
> > > > 2.21.1
> > > > 
> > > > _______________________________________________
> > > > Intel-gfx mailing list
> > > > Intel-gfx@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx