
[0/1] drm/i915/active: Fix misuse of non-idle barriers as fence trackers

Message ID 20230213130546.20370-1-janusz.krzysztofik@linux.intel.com
Series drm/i915/active: Fix misuse of non-idle barriers as fence trackers

Message

Janusz Krzysztofik Feb. 13, 2023, 1:05 p.m. UTC
Test-with: <20230213095040.13457-2-janusz.krzysztofik@linux.intel.com>

Users reported oopses on list corruption when using i915 perf with a
number of concurrently running graphics applications.  Root cause analysis
pointed to an issue in barrier processing code: a race between perf
open / close, which replaces active barriers with perf requests on kernel
contexts, and concurrent barrier preallocate / acquire operations performed
during first pin / last unpin of user contexts.

Respect results of barrier deletion attempts: mark the barrier as idle
only after it has been successfully deleted from the list.  Then, before
proceeding with setting our fence as the one currently tracked, make sure
that the tracker we've got is not a non-idle barrier.  If that check fails,
don't use that tracker but go back and try to acquire a new, usable one.
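The retry logic described above can be illustrated with a small, single-threaded userspace model.  This is only a sketch under stated assumptions, not the actual i915_active.c change: every name in it (struct tracker, BARRIER_MARK, try_del_barrier(), acquire_tracker(), set_fence(), struct pool) is hypothetical, the barrier list is reduced to an atomic on_list flag whose exchange decides which caller "wins" the deletion, and tracker lookup is reduced to a fixed pool handed out in order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model, NOT the real i915_active.c code. */
static int barrier_sentinel;
#define BARRIER_MARK ((void *)&barrier_sentinel)

struct tracker {
	void *fence;         /* NULL: idle; BARRIER_MARK: barrier; else: a fence */
	atomic_bool on_list; /* true while the barrier sits on a barrier list */
};

static bool is_barrier(const struct tracker *t)
{
	return t->fence == BARRIER_MARK;
}

/* Attempt to remove the barrier from its list; only one caller can win. */
static bool try_del_barrier(struct tracker *t)
{
	return atomic_exchange(&t->on_list, false);
}

/* Hypothetical stand-in for tracker lookup/allocation. */
struct pool {
	struct tracker *slots;
	size_t n;
	size_t next;
};

static struct tracker *acquire_tracker(struct pool *p)
{
	return p->next < p->n ? &p->slots[p->next++] : NULL;
}

/* Track @fence on a usable tracker; returns that tracker, or NULL if none. */
static struct tracker *set_fence(struct pool *p, void *fence)
{
	struct tracker *t;

	assert(fence != BARRIER_MARK); /* the sentinel is not a real fence */
	do {
		t = acquire_tracker(p);
		if (!t)
			return NULL;
		/* Respect the deletion result: mark idle only if we won it. */
		if (is_barrier(t) && try_del_barrier(t))
			t->fence = NULL;
		/* Still a non-idle barrier: someone else owns it, try another. */
	} while (is_barrier(t));

	t->fence = fence;
	return t;
}
```

In this model, a barrier whose deletion was already taken by a concurrent path keeps its barrier mark, so set_fence() skips it instead of overwriting it with a fence, which is the misuse the patch subject refers to.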

Note:
I'm submitting this fix with a request to CI to test it with a new
subtest, igt@gem_barrier_race@remote-request, developed for this case and
not yet in upstream IGT.  I've selected a trybot submission of the test,
with the test added to the BAT testlist, to get results from the widest
possible HW range.

Janusz Krzysztofik (1):
  drm/i915/active: Fix misuse of non-idle barriers as fence trackers

 drivers/gpu/drm/i915/i915_active.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

Comments

Janusz Krzysztofik Feb. 13, 2023, 3:12 p.m. UTC | #1
On Monday, 13 February 2023 14:47:42 CET Patchwork wrote:
> == Series Details ==
> 
> Series: drm/i915/active: Fix misuse of non-idle barriers as fence trackers
> URL   : https://patchwork.freedesktop.org/series/113950/
> State : success
> 
> == Summary ==
> 
> CI Bug Log - changes from CI_DRM_12730 -> Patchwork_113950v1
> ====================================================
> 
> Summary
> -------
> 
>   **SUCCESS**
> 
>   No regressions found.
> 
>   External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/index.html
> 
> Participating hosts (40 -> 37)
> ------------------------------
> 
>   Missing    (3): fi-tgl-1115g4 bat-atsm-1 fi-snb-2520m 
> 
> Possible new issues
> -------------------
> 
>   Here are the unknown changes that may have been introduced in Patchwork_113950v1:
> 
> ### IGT changes ###
> 
> #### Possible regressions ####

No more list corruptions, only issues already reported by the new
igt@gem_barrier_race@remote-request test before (without this patch in place).

>   * {igt@gem_barrier_race@remote-request@rcs0} (NEW):
>     - fi-rkl-11600:       NOTRUN -> [ABORT][1]
>    [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/fi-rkl-11600/igt@gem_barrier_race@remote-request@rcs0.html
>     - bat-dg1-5:          NOTRUN -> [ABORT][2]
>    [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/bat-dg1-5/igt@gem_barrier_race@remote-request@rcs0.html

Infinite __i915_active_wait(), similar to 
https://intel-gfx-ci.01.org/tree/drm-tip/TrybotIGT_706/fi-cfl-8109u/igt@gem_barrier_race@remote-request@rcs0.html

>     - {bat-adlm-1}:       NOTRUN -> [DMESG-WARN][3]
>    [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/bat-adlm-1/igt@gem_barrier_race@remote-request@rcs0.html

Suspicious RCU usage, equivalent to 
https://intel-gfx-ci.01.org/tree/drm-tip/TrybotIGT_706/bat-dg1-7/igt@gem_barrier_race@remote-request@rcs0.html,
also seen before, e.g., 
https://gitlab.freedesktop.org/drm/intel/-/issues/7390, 
https://gitlab.freedesktop.org/drm/intel/-/issues/6616

Thanks,
Janusz

> 
>   
> New tests
> ---------
> 
>   New tests have been introduced between CI_DRM_12730 and Patchwork_113950v1:
> 
> ### New IGT tests (2) ###
> 
>   * igt@gem_barrier_race@remote-request:
>     - Statuses :
>     - Exec time: [None] s
> 
>   * igt@gem_barrier_race@remote-request@rcs0:
>     - Statuses : 2 abort(s) 1 dmesg-warn(s) 27 pass(s) 5 skip(s)
>     - Exec time: [0.0] s
> 
>   
> 
> Known issues
> ------------
> 
>   Here are the changes found in Patchwork_113950v1 that come from known issues:
> 
> ### IGT changes ###
> 
> #### Issues hit ####
> 
>   * {igt@gem_barrier_race@remote-request@rcs0} (NEW):
>     - fi-pnv-d510:        NOTRUN -> [SKIP][4] ([fdo#109271])
>    [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/fi-pnv-d510/igt@gem_barrier_race@remote-request@rcs0.html
>     - fi-blb-e6850:       NOTRUN -> [SKIP][5] ([fdo#109271])
>    [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/fi-blb-e6850/igt@gem_barrier_race@remote-request@rcs0.html
>     - fi-ivb-3770:        NOTRUN -> [SKIP][6] ([fdo#109271])
>    [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/fi-ivb-3770/igt@gem_barrier_race@remote-request@rcs0.html
>     - fi-elk-e7500:       NOTRUN -> [SKIP][7] ([fdo#109271])
>    [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/fi-elk-e7500/igt@gem_barrier_race@remote-request@rcs0.html
>     - fi-ilk-650:         NOTRUN -> [SKIP][8] ([fdo#109271])
>    [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/fi-ilk-650/igt@gem_barrier_race@remote-request@rcs0.html
> 
>   
> #### Possible fixes ####
> 
>   * igt@i915_selftest@live@gt_heartbeat:
>     - fi-apl-guc:         [DMESG-FAIL][9] ([i915#5334]) -> [PASS][10]
>    [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12730/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html
>    [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/fi-apl-guc/igt@i915_selftest@live@gt_heartbeat.html
> 
>   * igt@i915_selftest@live@requests:
>     - {bat-rpls-2}:       [ABORT][11] ([i915#7982]) -> [PASS][12]
>    [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12730/bat-rpls-2/igt@i915_selftest@live@requests.html
>    [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/bat-rpls-2/igt@i915_selftest@live@requests.html
> 
>   * igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions:
>     - fi-bsw-n3050:       [FAIL][13] ([i915#6298]) -> [PASS][14]
>    [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12730/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html
>    [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html
> 
>   
>   {name}: This element is suppressed. This means it is ignored when computing
>           the status of the difference (SUCCESS, WARNING, or FAILURE).
> 
>   [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
>   [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
>   [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
>   [i915#6298]: https://gitlab.freedesktop.org/drm/intel/issues/6298
>   [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
>   [i915#7699]: https://gitlab.freedesktop.org/drm/intel/issues/7699
>   [i915#7982]: https://gitlab.freedesktop.org/drm/intel/issues/7982
>   [i915#7996]: https://gitlab.freedesktop.org/drm/intel/issues/7996
> 
> 
> Build changes
> -------------
> 
>   * IGT: IGT_7157 -> TrybotIGT_706
>   * Linux: CI_DRM_12730 -> Patchwork_113950v1
> 
>   CI-20190529: 20190529
>   CI_DRM_12730: c54b5fcf3e686a0abfdd7d6af53e9014c137023a @ git://anongit.freedesktop.org/gfx-ci/linux
>   IGT_7157: 96d12fdc942cee9526a951b377b195ca9c8276b1 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
>   Patchwork_113950v1: c54b5fcf3e686a0abfdd7d6af53e9014c137023a @ git://anongit.freedesktop.org/gfx-ci/linux
>   TrybotIGT_706: https://intel-gfx-ci.01.org/tree/drm-tip/TrybotIGT_706/index.html
> 
> 
> ### Linux commits
> 
> 44de67f6c674 drm/i915/active: Fix misuse of non-idle barriers as fence trackers
> 
> == Logs ==
> 
> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_113950v1/index.html
>