diff mbox series

[v6] drm/i915/display: disable HPD workers before display driver unregister

Message ID 20220513160641.3096487-1-andrzej.hajda@intel.com (mailing list archive)
State New, archived
Headers show
Series [v6] drm/i915/display: disable HPD workers before display driver unregister | expand

Commit Message

Andrzej Hajda May 13, 2022, 4:06 p.m. UTC
Handling HPD during driver removal is pointless, and can cause different
use-after-free/concurrency issues:
1. Setup of deferred fbdev after fbdev unregistration.
2. Access to DP-AUX after DP-AUX removal.

Below stacktraces of both cases observed on CI:

[272.634530] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
[272.634536] CPU: 0 PID: 6030 Comm: i915_selftest Tainted: G     U            5.18.0-rc5-CI_DRM_11603-g12dccf4f5eef+ #1
[272.634541] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.2397.A01.2109300731 09/30/2021
[272.634545] RIP: 0010:fb_do_apertures_overlap.part.14+0x26/0x60
...
[272.634582] Call Trace:
[272.634583]  <TASK>
[272.634585]  do_remove_conflicting_framebuffers+0x59/0xa0
[272.634589]  remove_conflicting_framebuffers+0x2d/0xc0
[272.634592]  remove_conflicting_pci_framebuffers+0xc8/0x110
[272.634595]  drm_aperture_remove_conflicting_pci_framebuffers+0x52/0x70
[272.634604]  i915_driver_probe+0x63a/0xdd0 [i915]

[283.405824] cpu_latency_qos_update_request called for unknown object
[283.405866] WARNING: CPU: 2 PID: 240 at kernel/power/qos.c:296 cpu_latency_qos_update_request+0x2d/0x100
[283.405912] CPU: 2 PID: 240 Comm: kworker/u64:9 Not tainted 5.18.0-rc6-Patchwork_103738v3-g1672d1c43e43+ #1
[283.405915] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.2397.A01.2109300731 09/30/2021
[283.405916] Workqueue: i915-dp i915_digport_work_func [i915]
[283.406020] RIP: 0010:cpu_latency_qos_update_request+0x2d/0x100
...
[283.406040] Call Trace:
[283.406041]  <TASK>
[283.406044]  intel_dp_aux_xfer+0x60e/0x8e0 [i915]
[283.406131]  ? finish_swait+0x80/0x80
[283.406139]  intel_dp_aux_transfer+0xc5/0x2b0 [i915]
[283.406218]  drm_dp_dpcd_access+0x79/0x130 [drm_display_helper]
[283.406227]  drm_dp_dpcd_read+0xe2/0xf0 [drm_display_helper]
[283.406233]  intel_dp_hpd_pulse+0x134/0x570 [i915]
[283.406308]  ? __down_killable+0x70/0x140
[283.406313]  i915_digport_work_func+0xba/0x150 [i915]

Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
---
Hi all,

This is my Nth attempt to solve some old CI bug[1].
v-2: caused issues in kms code [2],
v-1: revealed that not only fbdev does not like HPD on removal [3],
v3: lacks drm_kms_helper_poll_disable[4]
v4: added dump_stack to detect late fb registration
v5: added intel_dp_mst_suspend to stop late fb registration
v6: removed dump_stack with hope everything is handled

Moreover this is quite rare bug, but due to specific configuration
of one of CI machines it appears there very frequently.
Forgive me spamming the list.

[1]: https://gitlab.freedesktop.org/drm/intel/-/issues/5329
[2]: https://patchwork.freedesktop.org/series/103621/
[3]: https://patchwork.freedesktop.org/series/103738/
[4]: https://patchwork.freedesktop.org/series/103811/

Regards
Andrzej
---
 drivers/gpu/drm/i915/display/intel_display.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

Comments

Andrzej Hajda May 16, 2022, 8:02 a.m. UTC | #1
On 14.05.2022 19:03, Patchwork wrote:
> Project List - Patchwork *Patch Details*
> *Series:* 	drm/i915/display: disable HPD workers before display driver 
> unregister (rev7)
> *URL:* 	https://patchwork.freedesktop.org/series/103811/
> *State:* 	failure
> *Details:* 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_103811v7/index.html
>
>
>   CI Bug Log - changes from CI_DRM_11653_full -> Patchwork_103811v7_full
>
>
>     Summary
>
> *FAILURE*
>
> Serious unknown changes coming with Patchwork_103811v7_full absolutely 
> need to be
> verified manually.
>
> If you think the reported changes have nothing to do with the changes
> introduced in Patchwork_103811v7_full, please notify your bug team to 
> allow them
> to document this new failure mode, which will reduce false positives 
> in CI.
>
>
>     Participating hosts (13 -> 13)
>
> No changes in participating hosts
>
>
>     Possible new issues
>
> Here are the unknown changes that may have been introduced in 
> Patchwork_103811v7_full:
>
>
>       IGT changes
>
>
>         Possible regressions
>
>   * igt@i915_pm_rpm@system-suspend-modeset:
>       o shard-apl: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11653/shard-apl7/igt@i915_pm_rpm@system-suspend-modeset.html>
>         -> INCOMPLETE
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_103811v7/shard-apl7/igt@i915_pm_rpm@system-suspend-modeset.html>
>


This is unrelated - the patch changes only behavior on driver removal path.

Regards
Andrzej
Andrzej Hajda May 19, 2022, 11:27 a.m. UTC | #2
Seems the patch correctly fixes the the issue and passes CI so added 
maintainers/reviewers to CC.


On 13.05.2022 18:06, Andrzej Hajda wrote:
> Handling HPD during driver removal is pointless, and can cause different
> use-after-free/concurrency issues:
> 1. Setup of deferred fbdev after fbdev unregistration.
> 2. Access to DP-AUX after DP-AUX removal.
> 
> Below stacktraces of both cases observed on CI:
> 
> [272.634530] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
> [272.634536] CPU: 0 PID: 6030 Comm: i915_selftest Tainted: G     U            5.18.0-rc5-CI_DRM_11603-g12dccf4f5eef+ #1
> [272.634541] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.2397.A01.2109300731 09/30/2021
> [272.634545] RIP: 0010:fb_do_apertures_overlap.part.14+0x26/0x60
> ...
> [272.634582] Call Trace:
> [272.634583]  <TASK>
> [272.634585]  do_remove_conflicting_framebuffers+0x59/0xa0
> [272.634589]  remove_conflicting_framebuffers+0x2d/0xc0
> [272.634592]  remove_conflicting_pci_framebuffers+0xc8/0x110
> [272.634595]  drm_aperture_remove_conflicting_pci_framebuffers+0x52/0x70
> [272.634604]  i915_driver_probe+0x63a/0xdd0 [i915]
> 
> [283.405824] cpu_latency_qos_update_request called for unknown object
> [283.405866] WARNING: CPU: 2 PID: 240 at kernel/power/qos.c:296 cpu_latency_qos_update_request+0x2d/0x100
> [283.405912] CPU: 2 PID: 240 Comm: kworker/u64:9 Not tainted 5.18.0-rc6-Patchwork_103738v3-g1672d1c43e43+ #1
> [283.405915] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.2397.A01.2109300731 09/30/2021
> [283.405916] Workqueue: i915-dp i915_digport_work_func [i915]
> [283.406020] RIP: 0010:cpu_latency_qos_update_request+0x2d/0x100
> ...
> [283.406040] Call Trace:
> [283.406041]  <TASK>
> [283.406044]  intel_dp_aux_xfer+0x60e/0x8e0 [i915]
> [283.406131]  ? finish_swait+0x80/0x80
> [283.406139]  intel_dp_aux_transfer+0xc5/0x2b0 [i915]
> [283.406218]  drm_dp_dpcd_access+0x79/0x130 [drm_display_helper]
> [283.406227]  drm_dp_dpcd_read+0xe2/0xf0 [drm_display_helper]
> [283.406233]  intel_dp_hpd_pulse+0x134/0x570 [i915]
> [283.406308]  ? __down_killable+0x70/0x140
> [283.406313]  i915_digport_work_func+0xba/0x150 [i915]
> 
> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
> ---
> Hi all,
> 
> This is my Nth attempt to solve some old CI bug[1].
> v-2: caused issues in kms code [2],
> v-1: revealed that not only fbdev does not like HPD on removal [3],
> v3: lacks drm_kms_helper_poll_disable[4]
> v4: added dump_stack to detect late fb registration
> v5: added intel_dp_mst_suspend to stop late fb registration
> v6: removed dump_stack with hope everything is handled
> 
> Moreover this is quite rare bug, but due to specific configuration
> of one of CI machines it appears there very frequently.
> Forgive me spamming the list.
> 
> [1]: https://gitlab.freedesktop.org/drm/intel/-/issues/5329
> [2]: https://patchwork.freedesktop.org/series/103621/
> [3]: https://patchwork.freedesktop.org/series/103738/
> [4]: https://patchwork.freedesktop.org/series/103811/
> 
> Regards
> Andrzej
> ---
>   drivers/gpu/drm/i915/display/intel_display.c | 11 ++++-------
>   1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 806d50b302ab92..007bc6daef7d31 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -10486,13 +10486,6 @@ void intel_modeset_driver_remove_noirq(struct drm_i915_private *i915)
>   	 */
>   	intel_hpd_poll_fini(i915);
>   
> -	/*
> -	 * MST topology needs to be suspended so we don't have any calls to
> -	 * fbdev after it's finalized. MST will be destroyed later as part of
> -	 * drm_mode_config_cleanup()
> -	 */
> -	intel_dp_mst_suspend(i915);
> -
>   	/* poll work can call into fbdev, hence clean that up afterwards */
>   	intel_fbdev_fini(i915);
>   
> @@ -10584,6 +10577,10 @@ void intel_display_driver_unregister(struct drm_i915_private *i915)
>   	if (!HAS_DISPLAY(i915))
>   		return;
>   
> +	intel_dp_mst_suspend(i915);
> +	intel_hpd_cancel_work(i915);
> +	drm_kms_helper_poll_disable(&i915->drm);
> +
>   	intel_fbdev_unregister(i915);
>   	intel_audio_deinit(i915);
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 806d50b302ab92..007bc6daef7d31 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -10486,13 +10486,6 @@  void intel_modeset_driver_remove_noirq(struct drm_i915_private *i915)
 	 */
 	intel_hpd_poll_fini(i915);
 
-	/*
-	 * MST topology needs to be suspended so we don't have any calls to
-	 * fbdev after it's finalized. MST will be destroyed later as part of
-	 * drm_mode_config_cleanup()
-	 */
-	intel_dp_mst_suspend(i915);
-
 	/* poll work can call into fbdev, hence clean that up afterwards */
 	intel_fbdev_fini(i915);
 
@@ -10584,6 +10577,10 @@  void intel_display_driver_unregister(struct drm_i915_private *i915)
 	if (!HAS_DISPLAY(i915))
 		return;
 
+	intel_dp_mst_suspend(i915);
+	intel_hpd_cancel_work(i915);
+	drm_kms_helper_poll_disable(&i915->drm);
+
 	intel_fbdev_unregister(i915);
 	intel_audio_deinit(i915);