[v2,0/9] GuC fixes

Message ID: 20190522193203.23932-1-michal.wajdeczko@intel.com

Michal Wajdeczko May 22, 2019, 7:31 p.m. UTC
Misc GuC fixes for upcoming 32.0.3

v2: modified reset selftests

Michal Wajdeczko (9):
  drm/i915/selftests: Move some reset testcases to separate file
  drm/i915/selftests: Split igt_atomic_reset testcase
  drm/i915/selftests: Use prepare/finish during atomic reset test
  drm/i915/guc: Rename intel_guc_is_alive to intel_guc_is_loaded
  drm/i915/uc: Explicitly sanitize GuC/HuC on failure and finish
  drm/i915/uc: Use GuC firmware status helper
  drm/i915/uc: Skip GuC HW unwinding if GuC is already dead
  drm/i915/uc: Stop talking with GuC when resetting
  drm/i915/uc: Skip reset preparation if GuC is already dead

 drivers/gpu/drm/i915/gt/intel_reset.c         |   4 +
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 159 +++---------------
 drivers/gpu/drm/i915/gt/selftest_reset.c      | 119 +++++++++++++
 drivers/gpu/drm/i915/intel_guc.h              |  10 +-
 drivers/gpu/drm/i915/intel_guc_ct.h           |   5 +
 drivers/gpu/drm/i915/intel_guc_submission.c   |   2 +-
 drivers/gpu/drm/i915/intel_uc.c               |  44 +++--
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 drivers/gpu/drm/i915/selftests/igt_atomic.h   |  56 ++++++
 drivers/gpu/drm/i915/selftests/igt_reset.c    |   8 +
 drivers/gpu/drm/i915/selftests/igt_reset.h    |   1 +
 11 files changed, 249 insertions(+), 160 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/selftest_reset.c
 create mode 100644 drivers/gpu/drm/i915/selftests/igt_atomic.h

Michal Wajdeczko May 23, 2019, 2:46 p.m. UTC | #1
On Wed, 22 May 2019 22:53:11 +0200, Patchwork  
<patchwork@emeril.freedesktop.org> wrote:

> == Series Details ==
>
> Series: GuC fixes (rev2)
> URL   : https://patchwork.freedesktop.org/series/60795/
> State : failure
>
> == Summary ==
>
> CI Bug Log - changes from CI_DRM_6123 -> Patchwork_13075
> ====================================================
>
> Summary
> -------
>
>   **FAILURE**
>
>   Serious unknown changes coming with Patchwork_13075 absolutely need to be
>   verified manually.
>   If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_13075, please notify your bug team to allow them
>   to document this new failure mode, which will reduce false positives in CI.
>
>   External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13075/
>
> Possible new issues
> -------------------
>
>   Here are the unknown changes that may have been introduced in Patchwork_13075:
>
> ### IGT changes ###
>
> #### Possible regressions ####
>
>   * igt@i915_module_load@reload:
>     - fi-apl-guc:         [PASS][1] -> [DMESG-WARN][2]
>    [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6123/fi-apl-guc/igt@i915_module_load@reload.html
>    [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13075/fi-apl-guc/igt@i915_module_load@reload.html

These are doorbell warnings seen at unload/load when GuC submission is enabled:

<7> [309.389262] [drm:doorbell_ok [i915]] Doorbell 0 has unexpected state: valid=yes
<7> [309.389683] [drm:doorbell_ok [i915]] Doorbell 128 has unexpected state: valid=yes
<4> [309.390061] ------------[ cut here ]------------
<4> [309.390067] WARN_ON(!guc_verify_doorbells(guc))
<4> [309.390360] Call Trace:
<4> [309.390445]  intel_uc_fini+0x46/0x140 [i915]
<4> [309.390520]  i915_gem_fini+0x70/0x1a0 [i915]
<4> [309.390585]  i915_driver_unload+0xdd/0x130 [i915]
<4> [309.390651]  i915_pci_remove+0x19/0x30 [i915]

...

<7> [310.812673] [drm:doorbell_ok [i915]] Doorbell 0 has unexpected state: valid=yes
<7> [310.813014] [drm:doorbell_ok [i915]] Doorbell 128 has unexpected state: valid=yes
<4> [310.813290] ------------[ cut here ]------------
<4> [310.813295] WARN_ON(!guc_verify_doorbells(guc))
<4> [310.813646] Call Trace:
<4> [310.813755]  intel_uc_init+0xc8/0x1f0 [i915]
<4> [310.813856]  i915_gem_init+0x49b/0xa90 [i915]
<4> [310.813945]  i915_driver_load+0xdb8/0x18b0 [i915]

They can be fixed by patch [a], but since we are going to disable GuC
submission soon [b], maybe we don't need to bother fixing them, as it works [c].

Also note that newer GuC takes care of doorbell maintenance, so the
above warnings will simply be removed.

[a] https://patchwork.freedesktop.org/patch/305573/?series=60735&rev=3
[b] https://patchwork.freedesktop.org/patch/305562/?series=60735&rev=3
[c] https://patchwork.freedesktop.org/series/60925/
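
As a triage aid (not part of this series), here is a minimal Python sketch that pairs each WARN_ON in a dmesg excerpt with the first frame of its call trace. It assumes the `<level> [timestamp] message` layout seen in the log lines above. The names `DMESG`, `LINE`, `FRAME` and `warn_sites` are hypothetical and exist only for this illustration. Applied to the excerpt, it shows the same doorbell check firing once from the unload path and once from the load path.

```python
import re

# Excerpt copied from the dmesg lines quoted above (call traces shortened).
DMESG = """\
<4> [309.390067] WARN_ON(!guc_verify_doorbells(guc))
<4> [309.390360] Call Trace:
<4> [309.390445]  intel_uc_fini+0x46/0x140 [i915]
<4> [309.390520]  i915_gem_fini+0x70/0x1a0 [i915]
<4> [310.813295] WARN_ON(!guc_verify_doorbells(guc))
<4> [310.813646] Call Trace:
<4> [310.813755]  intel_uc_init+0xc8/0x1f0 [i915]
<4> [310.813856]  i915_gem_init+0x49b/0xa90 [i915]
"""

# Assumed line layout: "<level> [seconds.micros] message".
LINE = re.compile(r"^<\d+> \[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")
# A call-trace frame looks like "function+0xoffset/0xsize [module]".
FRAME = re.compile(r"^(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def warn_sites(dmesg):
    """Return (timestamp, condition, first call-trace frame) per WARN_ON."""
    sites = []
    pending = None    # (timestamp, condition) still waiting for its first frame
    in_trace = False  # True once "Call Trace:" was seen for the pending warning
    for raw in dmesg.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        msg = m.group("msg")
        if msg.startswith("WARN_ON("):
            pending = (m.group("ts"), msg)
            in_trace = False
        elif msg == "Call Trace:":
            in_trace = pending is not None
        elif in_trace:
            frame = FRAME.match(msg)
            if frame:
                sites.append((*pending, frame.group("func")))
                pending, in_trace = None, False
    return sites


for ts, cond, func in warn_sites(DMESG):
    print(f"{ts}: {cond} first frame: {func}")
```

The first frame is enough to tell the two reports apart here; a fuller tool would probably keep the whole trace.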


>
> #### Warnings ####
>
>   * igt@runner@aborted:
>     - fi-apl-guc:         [FAIL][3] ([fdo#110622]) -> [FAIL][4]
>    [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6123/fi-apl-guc/igt@runner@aborted.html
>    [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13075/fi-apl-guc/igt@runner@aborted.html
>
> New tests
> ---------
>
>   New tests have been introduced between CI_DRM_6123 and Patchwork_13075:
>
> ### New IGT tests (1) ###
>
>   * igt@i915_selftest@live_reset:
>     - Statuses : 43 pass(s)
>     - Exec time: [0.31, 1.29] s
>
>
> Known issues
> ------------
>
>   Here are the changes found in Patchwork_13075 that come from known issues:
>
> ### IGT changes ###
>
> #### Issues hit ####
>
>   * igt@amdgpu/amd_basic@userptr:
>     - fi-kbl-8809g:       [PASS][5] -> [DMESG-WARN][6] ([fdo#108965])
>    [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6123/fi-kbl-8809g/igt@amdgpu/amd_basic@userptr.html
>    [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13075/fi-kbl-8809g/igt@amdgpu/amd_basic@userptr.html
>
>   * igt@i915_selftest@live_contexts:
>     - fi-skl-gvtdvm:      [PASS][7] -> [DMESG-FAIL][8] ([fdo#110235])
>    [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6123/fi-skl-gvtdvm/igt@i915_selftest@live_contexts.html
>    [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13075/fi-skl-gvtdvm/igt@i915_selftest@live_contexts.html
>
> #### Possible fixes ####
>
>   * igt@gem_exec_suspend@basic-s3:
>     - fi-blb-e6850:       [INCOMPLETE][9] ([fdo#107718]) -> [PASS][10]
>    [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6123/fi-blb-e6850/igt@gem_exec_suspend@basic-s3.html
>    [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13075/fi-blb-e6850/igt@gem_exec_suspend@basic-s3.html
>
>   * igt@gem_render_linear_blits@basic:
>     - {fi-icl-u3}:        [DMESG-WARN][11] ([fdo#107724]) -> [PASS][12] +2 similar issues
>    [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6123/fi-icl-u3/igt@gem_render_linear_blits@basic.html
>    [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13075/fi-icl-u3/igt@gem_render_linear_blits@basic.html
>
>   * igt@i915_pm_rpm@module-reload:
>     - fi-skl-6770hq:      [FAIL][13] ([fdo#108511]) -> [PASS][14]
>    [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6123/fi-skl-6770hq/igt@i915_pm_rpm@module-reload.html
>    [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13075/fi-skl-6770hq/igt@i915_pm_rpm@module-reload.html
>
>   * igt@i915_selftest@live_contexts:
>     - fi-bdw-gvtdvm:      [DMESG-FAIL][15] ([fdo#110235]) -> [PASS][16]
>    [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6123/fi-bdw-gvtdvm/igt@i915_selftest@live_contexts.html
>    [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13075/fi-bdw-gvtdvm/igt@i915_selftest@live_contexts.html
>
>   {name}: This element is suppressed. This means it is ignored when computing
>           the status of the difference (SUCCESS, WARNING, or FAILURE).
>
>   [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
>   [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
>   [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
>   [fdo#108511]: https://bugs.freedesktop.org/show_bug.cgi?id=108511
>   [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
>   [fdo#108965]: https://bugs.freedesktop.org/show_bug.cgi?id=108965
>   [fdo#110235]: https://bugs.freedesktop.org/show_bug.cgi?id=110235
>   [fdo#110622]: https://bugs.freedesktop.org/show_bug.cgi?id=110622
>
>
> Participating hosts (53 -> 45)
> ------------------------------
>
>   Missing    (8): fi-kbl-soraka fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-cfl-8109u fi-byt-clapper fi-bdw-samus
>
>
> Build changes
> -------------
>
>   * Linux: CI_DRM_6123 -> Patchwork_13075
>
>   CI_DRM_6123: d37dd1f4dfcc7a8814fd27f8bdfa97ea5c0a9bd3 @ git://anongit.freedesktop.org/gfx-ci/linux
>   IGT_5005: adf9f435a795d692e30cd6eafe26eddf4993c8ff @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
>   Patchwork_13075: df2d1b4314459cf88eddc80b6d13a54446cfbccf @ git://anongit.freedesktop.org/gfx-ci/linux
>
>
> == Linux commits ==
>
> df2d1b431445 drm/i915/uc: Skip reset preparation if GuC is already dead
> 191f064b4da3 drm/i915/uc: Stop talking with GuC when resetting
> cea15b131008 drm/i915/uc: Skip GuC HW unwinding if GuC is already dead
> 599cfb7e0a02 drm/i915/uc: Use GuC firmware status helper
> 6a0e54a07d39 drm/i915/uc: Explicitly sanitize GuC/HuC on failure and finish
> 346a854fbd49 drm/i915/guc: Rename intel_guc_is_alive to intel_guc_is_loaded
> a24312534312 drm/i915/selftests: Use prepare/finish during atomic reset test
> 010025dfb682 drm/i915/selftests: Split igt_atomic_reset testcase
> 54ff1dca4c58 drm/i915/selftests: Move some reset testcases to separate file
>
> == Logs ==
>
> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13075/

Chris Wilson May 23, 2019, 8:58 p.m. UTC | #2
Quoting Michal Wajdeczko (2019-05-23 15:46:58)
> On Wed, 22 May 2019 22:53:11 +0200, Patchwork  
> <patchwork@emeril.freedesktop.org> wrote:
> 
> > == Series Details ==
> >
> > Series: GuC fixes (rev2)
> > URL   : https://patchwork.freedesktop.org/series/60795/
> > State : failure
> >
> > == Summary ==
> >
> > CI Bug Log - changes from CI_DRM_6123 -> Patchwork_13075
> > ====================================================
> >
> > Summary
> > -------
> >
> >   **FAILURE**
> >
> >   Serious unknown changes coming with Patchwork_13075 absolutely need to be
> >   verified manually.
> >   If you think the reported changes have nothing to do with the changes
> >   introduced in Patchwork_13075, please notify your bug team to allow them
> >   to document this new failure mode, which will reduce false positives in CI.
> >
> >   External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13075/
> >
> > Possible new issues
> > -------------------
> >
> >   Here are the unknown changes that may have been introduced in Patchwork_13075:
> >
> > ### IGT changes ###
> >
> > #### Possible regressions ####
> >
> >   * igt@i915_module_load@reload:
> >     - fi-apl-guc:         [PASS][1] -> [DMESG-WARN][2]
> >    [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6123/fi-apl-guc/igt@i915_module_load@reload.html
> >    [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13075/fi-apl-guc/igt@i915_module_load@reload.html
> 
> These are doorbell warnings seen at unload/load when GuC submission is enabled:
> 
> <7> [309.389262] [drm:doorbell_ok [i915]] Doorbell 0 has unexpected state: valid=yes
> <7> [309.389683] [drm:doorbell_ok [i915]] Doorbell 128 has unexpected state: valid=yes
> <4> [309.390061] ------------[ cut here ]------------
> <4> [309.390067] WARN_ON(!guc_verify_doorbells(guc))
> <4> [309.390360] Call Trace:
> <4> [309.390445]  intel_uc_fini+0x46/0x140 [i915]
> <4> [309.390520]  i915_gem_fini+0x70/0x1a0 [i915]
> <4> [309.390585]  i915_driver_unload+0xdd/0x130 [i915]
> <4> [309.390651]  i915_pci_remove+0x19/0x30 [i915]
> 
> ...
> 
> <7> [310.812673] [drm:doorbell_ok [i915]] Doorbell 0 has unexpected state: valid=yes
> <7> [310.813014] [drm:doorbell_ok [i915]] Doorbell 128 has unexpected state: valid=yes
> <4> [310.813290] ------------[ cut here ]------------
> <4> [310.813295] WARN_ON(!guc_verify_doorbells(guc))
> <4> [310.813646] Call Trace:
> <4> [310.813755]  intel_uc_init+0xc8/0x1f0 [i915]
> <4> [310.813856]  i915_gem_init+0x49b/0xa90 [i915]
> <4> [310.813945]  i915_driver_load+0xdb8/0x18b0 [i915]
> 
> They can be fixed by patch [a], but since we are going to disable GuC
> submission soon [b], maybe we don't need to bother fixing them, as it works [c].

Hmm, it's a bit of a nuisance, as it means CI stops running tests and we
miss the selftests for guc/apl.

I don't think that's a huge issue, as we have your gen11 enabling patch
to land in the very near future which, as you say, makes this a non-issue.
Till then we will just have to upset Martin!

Thanks for the patches, pushed.
-Chris