[v3,1/3] drm/i915: Block fbdev HPD processing during suspend

Message ID 20190129191001.442-2-lyude@redhat.com (mailing list archive)
State New, archived
Series drm/i915: MST and wakeref leak fixes

Commit Message

Lyude Paul Jan. 29, 2019, 7:09 p.m. UTC
When resuming, we check whether or not any previously connected
MST topologies are still present and if so, attempt to resume them. If
this fails, we disable said MST topologies and fire off a hotplug event
so that userspace knows to reprobe.

However, sending a hotplug event involves calling
drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a
connector reprobe in the caller's thread - something we can't do at the
point in which i915 calls drm_dp_mst_topology_mgr_resume() since
hotplugging hasn't been fully initialized yet.

This currently causes some rather subtle but fatal issues. For example,
on my T480s the laptop dock connected to it usually disappears during a
suspend cycle, and comes back up a short while after the system has been
resumed. This guarantees that on pretty much every suspend and resume
cycle, drm_dp_mst_topology_mgr_set_mst(mgr, false); will be called and,
in turn, a connector hotplug will occur. Now it's Rube Goldberg time:
when the connector hotplug occurs, i915 reprobes /all/ of the
connectors, including eDP. However, eDP probing requires that we power
on the panel VDD which, in turn, grabs a wakeref to the appropriate
power domain on the GPU (on my T480s, this is the PORT_DDI_A_IO domain).
This is where things start breaking: since this all happens before
intel_power_domains_enable() is called, we end up leaking the wakeref
that was acquired and never releasing it later. Come the next
suspend/resume cycle, this causes us to fail to shut down the GPU
properly, which causes it not to resume properly and die a horrible
complicated death.

(as a note: this only happens when there's both an eDP panel and MST
topology connected which is removed mid-suspend. One or the other seems
to always be OK).

We could try to fix the VDD wakeref leak, but this doesn't seem like
it's worth it at all since we aren't able to handle hotplug detection
while resuming anyway. So, let's go with a more robust solution inspired
by nouveau: block fbdev from handling hotplug events until we resume
fbdev. This allows us to still send sysfs hotplug events to be handled
later by user space while we're resuming, while also preventing us from
actually processing any hotplug events we receive until it's safe.

This fixes the wakeref leak observed on the T480s and as such, also
fixes suspend/resume with MST topologies connected on this machine.
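
To illustrate the deferral scheme described above, here is a minimal
userspace sketch in C. It is not the i915 code: struct hpd_gate,
hpd_event(), hpd_set_suspended() and handle_hotplug() are hypothetical
names, a pthread mutex stands in for the kernel mutex, and a counter
stands in for the real connector reprobe. It only shows the pattern:
record the event under the lock, decide whether to act, then do the
actual work after the lock has been dropped.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fbdev HPD state; not the i915 struct. */
struct hpd_gate {
	pthread_mutex_t lock;	/* protects suspended and waiting */
	bool suspended;		/* HPD processing is currently blocked */
	bool waiting;		/* an event arrived since the last state change */
	atomic_int handled;	/* how many times the mock reprobe ran */
};

/* Mock of the expensive reprobe. Callers must not hold g->lock here. */
static void handle_hotplug(struct hpd_gate *g)
{
	atomic_fetch_add(&g->handled, 1);
}

/* Called for every incoming hotplug event. */
static void hpd_event(struct hpd_gate *g)
{
	bool send;

	pthread_mutex_lock(&g->lock);
	send = !g->suspended;
	g->waiting = true;	/* always recorded, even when handled now */
	pthread_mutex_unlock(&g->lock);

	if (send)
		handle_hotplug(g);
}

/* Blocks or unblocks processing; replays one deferred event on unblock. */
static void hpd_set_suspended(struct hpd_gate *g, bool suspended)
{
	bool send;

	pthread_mutex_lock(&g->lock);
	g->suspended = suspended;
	send = !g->suspended && g->waiting;
	g->waiting = false;
	pthread_mutex_unlock(&g->lock);

	if (send)
		handle_hotplug(g);
}
```

With a gate initialized via PTHREAD_MUTEX_INITIALIZER, calling
hpd_set_suspended(&g, true), then hpd_event(&g) twice, then
hpd_set_suspended(&g, false) runs the mock handler once: events that
arrive while blocked are coalesced into a single deferred reprobe.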

Changes since v2:
* Don't call drm_fb_helper_hotplug_event() under lock, do it after lock
  (Chris Wilson)
* Don't call drm_fb_helper_hotplug_event() in
  intel_fbdev_output_poll_changed() under lock (Chris Wilson)
* Always set ifbdev->hpd_waiting (Chris Wilson)

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)")
Cc: Todd Previte <tprevite@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v3.17+
---
 drivers/gpu/drm/i915/intel_drv.h   | 10 +++++++++
 drivers/gpu/drm/i915/intel_fbdev.c | 33 +++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 1 deletion(-)

Comments

Chris Wilson Jan. 29, 2019, 7:20 p.m. UTC | #1
Quoting Lyude Paul (2019-01-29 19:09:59)
> When resuming, we check whether or not any previously connected
> MST topologies are still present and if so, attempt to resume them. If
> this fails, we disable said MST topologies and fire off a hotplug event
> so that userspace knows to reprobe.
> 
> However, sending a hotplug event involves calling
> drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a
> connector reprobe in the caller's thread - something we can't do at the
> point in which i915 calls drm_dp_mst_topology_mgr_resume() since
> hotplugging hasn't been fully initialized yet.
> 
> This currently causes some rather subtle but fatal issues. For example,
> on my T480s the laptop dock connected to it usually disappears during a
> suspend cycle, and comes back up a short while after the system has been
> resumed. This guarantees that on pretty much every suspend and resume
> cycle, drm_dp_mst_topology_mgr_set_mst(mgr, false); will be called and,
> in turn, a connector hotplug will occur. Now it's Rube Goldberg time:
> when the connector hotplug occurs, i915 reprobes /all/ of the
> connectors, including eDP. However, eDP probing requires that we power
> on the panel VDD which, in turn, grabs a wakeref to the appropriate
> power domain on the GPU (on my T480s, this is the PORT_DDI_A_IO domain).
> This is where things start breaking: since this all happens before
> intel_power_domains_enable() is called, we end up leaking the wakeref
> that was acquired and never releasing it later. Come the next
> suspend/resume cycle, this causes us to fail to shut down the GPU
> properly, which causes it not to resume properly and die a horrible
> complicated death.
> 
> (as a note: this only happens when there's both an eDP panel and MST
> topology connected which is removed mid-suspend. One or the other seems
> to always be OK).
> 
> We could try to fix the VDD wakeref leak, but this doesn't seem like
> it's worth it at all since we aren't able to handle hotplug detection
> while resuming anyway. So, let's go with a more robust solution inspired
> by nouveau: block fbdev from handling hotplug events until we resume
> fbdev. This allows us to still send sysfs hotplug events to be handled
> later by user space while we're resuming, while also preventing us from
> actually processing any hotplug events we receive until it's safe.
> 
> This fixes the wakeref leak observed on the T480s and as such, also
> fixes suspend/resume with MST topologies connected on this machine.
> 
> Changes since v2:
> * Don't call drm_fb_helper_hotplug_event() under lock, do it after lock
>   (Chris Wilson)
> * Don't call drm_fb_helper_hotplug_event() in
>   intel_fbdev_output_poll_changed() under lock (Chris Wilson)
> * Always set ifbdev->hpd_waiting (Chris Wilson)
> 
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)")
> Cc: Todd Previte <tprevite@gmail.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Jani Nikula <jani.nikula@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: intel-gfx@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v3.17+

I suspect the locking is overkill, but it is certainly easier to reason
about than trying to remember all the ins and outs of fbdev with its
dubious async initialisation.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Sasha Levin Jan. 30, 2019, 2:46 p.m. UTC | #2
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 0e32b39ceed6 drm/i915: add DP 1.2 MST support (v0.7).

The bot has tested the following trees: v4.20.5, v4.19.18, v4.14.96, v4.9.153, v4.4.172, v3.18.133.

v4.20.5: Build OK!
v4.19.18: Build OK!
v4.14.96: Failed to apply! Possible dependencies:
    df9e6521749a ("drm/i915/fbdev: Enable late fbdev initial configuration")

v4.9.153: Failed to apply! Possible dependencies:
    1c777c5d1dcd ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
    275f039db56f ("drm/i915: Move user fault tracking to a separate list")
    3594a3e21f1f ("drm/i915: Remove superfluous locking around userfault_list")
    4f256d8219f2 ("drm/i915: Fix fbdev unload sequence")
    7c108fd8feac ("drm/i915: Move fence cancellation to runtime suspend")
    8baa1f04b9ed ("drm/i915: Update debugfs describe_obj() to show fault-mappable")
    96d776345277 ("drm/i915: Use a radixtree for random access to the object's backing storage")
    a4f5ea64f0a8 ("drm/i915: Refactor object page API")
    ad88d7fc6c03 ("drm/i915/fbdev: Serialise early hotplug events with async fbdev config")
    bf9e8429ab97 ("drm/i915: Make various init functions take dev_priv")
    eef57324d926 ("drm/i915: setup bridge for HDMI LPE audio driver")
    f8a7fde45610 ("drm/i915: Defer active reference until required")
    fbbd37b36fa5 ("drm/i915: Move object release to a freelist + worker")

v4.4.172: Failed to apply! Possible dependencies:
    61642ff03523 ("drm/i915: Inspect subunit states on hangcheck")
    c45eb4fed12d ("drm/i915/fbdev: Check for the framebuffer before use")
    ca82580c9cea ("drm/i915: Do not call API requiring struct_mutex where it is not available")
    cbdc12a9fc9d ("drm/i915: make A0 wa's applied to A1")
    e28e404c3e93 ("drm/i915: tidy up a few leftovers")
    e2f80391478a ("drm/i915: Rename local struct intel_engine_cs variables")
    e87a005d90c3 ("drm/i915: add helpers for platform specific revision id range checks")
    ed54c1a1d11c ("drm/i915: abolish separate per-ring default_context pointers")
    ef712bb4b700 ("drm/i915: remove parens around revision ids")
    fac5e23e3c38 ("drm/i915: Mass convert dev->dev_private to to_i915(dev)")
    fffda3f4fb49 ("drm/i915/bxt: add revision id for A1 stepping and use it")

v3.18.133: Failed to apply! Possible dependencies:
    0794aed30285 ("drm/i915: Fix context object leak for legacy contexts")
    20e28fba48f2 ("drm/i915: Be consistent on printing seqnos")
    24955f2412fa ("drm/i915: Clarify mmio_flip_lock locking")
    26ff27621080 ("drm/i915: Add kerneldoc for intel_pipe_update_{start, end}")
    3a8a946efbe0 ("drm/i915: Remove redundant flip_work->flip_queued_ring")
    481a3d43b94f ("drm/i915: Include active flag when describing objects in debugfs")
    493018dcb1c7 ("drm/i915: Implement a framework for batch buffer pools")
    536f5b5e86b2 ("drm/i915: Make mmio flip wait for seqno in the work function")
    6259cead57eb ("drm/i915: Remove 'outstanding_lazy_seqno'")
    9362c7c576d3 ("drm/i915: Use vblank evade mechanism in mmio_flip")
    97b2a6a10a1a ("drm/i915: Replace last_[rwf]_seqno with last_[rwf]_req")
    9eba5d4a1d79 ("drm/i915: Ensure OLS & PLR are always in sync")
    abfe262ae762 ("drm/i915: Add reference count to request structure")
    b47161858ba1 ("drm/i915: Implement inter-engine read-read optimisations")
    c45eb4fed12d ("drm/i915/fbdev: Check for the framebuffer before use")
    dcb4c12a6877 ("drm/i915/bdw: Pin the context backing objects to GGTT on-demand")
    e2f80391478a ("drm/i915: Rename local struct intel_engine_cs variables")
    f06cc1b9401c ("drm/i915: Convert 'flip_queued_seqno' into 'flip_queued_request'")
    fac5e23e3c38 ("drm/i915: Mass convert dev->dev_private to to_i915(dev)")


How should we proceed with this patch?

--
Thanks,
Sasha
Lyude Paul Feb. 6, 2019, 7:37 p.m. UTC | #3
On Wed, 2019-01-30 at 14:46 +0000, Sasha Levin wrote:
> Hi,
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 0e32b39ceed6 drm/i915: add DP 1.2 MST support (v0.7).
> 
> The bot has tested the following trees: v4.20.5, v4.19.18, v4.14.96,
> v4.9.153, v4.4.172, v3.18.133.
> 
> v4.20.5: Build OK!
> v4.19.18: Build OK!

Just apply this to 4.20 and 4.19 then
Patch

diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 90ba5436370e..9ec3d00fbd19 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -213,6 +213,16 @@  struct intel_fbdev {
 	unsigned long vma_flags;
 	async_cookie_t cookie;
 	int preferred_bpp;
+
+	/* Whether or not fbdev hpd processing is temporarily suspended */
+	bool hpd_suspended : 1;
+	/* Set when a hotplug was received while HPD processing was
+	 * suspended
+	 */
+	bool hpd_waiting : 1;
+
+	/* Protects hpd_suspended */
+	struct mutex hpd_lock;
 };
 
 struct intel_encoder {
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 8cf3efe88f02..376ffe842e26 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -681,6 +681,7 @@  int intel_fbdev_init(struct drm_device *dev)
 	if (ifbdev == NULL)
 		return -ENOMEM;
 
+	mutex_init(&ifbdev->hpd_lock);
 	drm_fb_helper_prepare(dev, &ifbdev->helper, &intel_fb_helper_funcs);
 
 	if (!intel_fbdev_init_bios(dev, ifbdev))
@@ -754,6 +755,26 @@  void intel_fbdev_fini(struct drm_i915_private *dev_priv)
 	intel_fbdev_destroy(ifbdev);
 }
 
+/* Suspends/resumes fbdev processing of incoming HPD events. When resuming HPD
+ * processing, fbdev will perform a full connector reprobe if a hotplug event
+ * was received while HPD was suspended.
+ */
+static void intel_fbdev_hpd_set_suspend(struct intel_fbdev *ifbdev, int state)
+{
+	bool send_hpd = false;
+
+	mutex_lock(&ifbdev->hpd_lock);
+	ifbdev->hpd_suspended = state == FBINFO_STATE_SUSPENDED;
+	send_hpd = !ifbdev->hpd_suspended && ifbdev->hpd_waiting;
+	ifbdev->hpd_waiting = false;
+	mutex_unlock(&ifbdev->hpd_lock);
+
+	if (send_hpd) {
+		DRM_DEBUG_KMS("Handling delayed fbcon HPD event\n");
+		drm_fb_helper_hotplug_event(&ifbdev->helper);
+	}
+}
+
 void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
@@ -775,6 +796,7 @@  void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
 		 */
 		if (state != FBINFO_STATE_RUNNING)
 			flush_work(&dev_priv->fbdev_suspend_work);
+
 		console_lock();
 	} else {
 		/*
@@ -802,17 +824,26 @@  void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
 
 	drm_fb_helper_set_suspend(&ifbdev->helper, state);
 	console_unlock();
+
+	intel_fbdev_hpd_set_suspend(ifbdev, state);
 }
 
 void intel_fbdev_output_poll_changed(struct drm_device *dev)
 {
 	struct intel_fbdev *ifbdev = to_i915(dev)->fbdev;
+	bool send_hpd;
 
 	if (!ifbdev)
 		return;
 
 	intel_fbdev_sync(ifbdev);
-	if (ifbdev->vma || ifbdev->helper.deferred_setup)
+
+	mutex_lock(&ifbdev->hpd_lock);
+	send_hpd = !ifbdev->hpd_suspended;
+	ifbdev->hpd_waiting = true;
+	mutex_unlock(&ifbdev->hpd_lock);
+
+	if (send_hpd && (ifbdev->vma || ifbdev->helper.deferred_setup))
 		drm_fb_helper_hotplug_event(&ifbdev->helper);
 }