Message ID | 20190128205603.16372-2-lyude@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915: MST and wakeref leak fixes |
Quoting Lyude Paul (2019-01-28 20:56:01)
> When resuming, we check whether or not any previously connected
> MST topologies are still present and if so, attempt to resume them. If
> this fails, we disable said MST topologies and fire off a hotplug event
> so that userspace knows to reprobe.
>
> However, sending a hotplug event involves calling
> drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a
> connector reprobe in the caller's thread - something we can't do at the
> point in which i915 calls drm_dp_mst_topology_mgr_resume() since
> hotplugging hasn't been fully initialized yet.
>
> This currently causes some rather subtle but fatal issues. For example,
> on my T480s the laptop dock connected to it usually disappears during a
> suspend cycle, and comes back up a short while after the system has been
> resumed. This guarantees pretty much every suspend and resume cycle,
> drm_dp_mst_topology_mgr_set_mst(mgr, false); will be called and in turn,
> a connector hotplug will occur. Now it's Rube Goldberg time: when the
> connector hotplug occurs, i915 reprobes /all/ of the connectors,
> including eDP. However, eDP probing requires that we power on the panel
> VDD which in turn, grabs a wakeref to the appropriate power domain on
> the GPU (on my T480s, this is the PORT_DDI_A_IO domain). This is where
> things start breaking, since this all happens before
> intel_power_domains_enable() is called we end up leaking the wakeref
> that was acquired and never releasing it later. Come next suspend/resume
> cycle, this causes us to fail to shut down the GPU properly, which
> causes it not to resume properly and die a horrible complicated death.
>
> (as a note: this only happens when there's both an eDP panel and MST
> topology connected which is removed mid-suspend. One or the other seems
> to always be OK).
>
> We could try to fix the VDD wakeref leak, but this doesn't seem like
> it's worth it at all since we aren't able to handle hotplug detection
> while resuming anyway. So, let's go with a more robust solution inspired
> by nouveau: block fbdev from handling hotplug events until we resume
> fbdev. This allows us to still send sysfs hotplug events to be handled
> later by user space while we're resuming, while also preventing us from
> actually processing any hotplug events we receive until it's safe.
>
> This fixes the wakeref leak observed on the T480s and as such, also
> fixes suspend/resume with MST topologies connected on this machine.
>
> Signed-off-by: Lyude Paul <lyude@redhat.com>
> Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)")
> Cc: Todd Previte <tprevite@gmail.com>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Jani Nikula <jani.nikula@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: intel-gfx@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v3.17+
> ---
>  drivers/gpu/drm/i915/intel_drv.h   | 10 ++++++++++
>  drivers/gpu/drm/i915/intel_fbdev.c | 30 +++++++++++++++++++++++++++++-
>  2 files changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 85b913ea6e80..c8549588b2ce 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -213,6 +213,16 @@ struct intel_fbdev {
>  	unsigned long vma_flags;
>  	async_cookie_t cookie;
>  	int preferred_bpp;
> +
> +	/* Whether or not fbdev hpd processing is temporarily suspended */
> +	bool hpd_suspended : 1;
> +	/* Set when a hotplug was received while HPD processing was
> +	 * suspended
> +	 */
> +	bool hpd_waiting : 1;
> +
> +	/* Protects hpd_suspended */
> +	struct mutex hpd_lock;
>  };
>
>  struct intel_encoder {
> diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> index 8cf3efe88f02..3a6c0bebaaf9 100644
> --- a/drivers/gpu/drm/i915/intel_fbdev.c
> +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> @@ -681,6 +681,7 @@ int intel_fbdev_init(struct drm_device *dev)
>  	if (ifbdev == NULL)
>  		return -ENOMEM;
>
> +	mutex_init(&ifbdev->hpd_lock);
>  	drm_fb_helper_prepare(dev, &ifbdev->helper, &intel_fb_helper_funcs);
>
>  	if (!intel_fbdev_init_bios(dev, ifbdev))
> @@ -754,6 +755,23 @@ void intel_fbdev_fini(struct drm_i915_private *dev_priv)
>  	intel_fbdev_destroy(ifbdev);
>  }
>
> +/* Suspends/resumes fbdev processing of incoming HPD events. When resuming HPD
> + * processing, fbdev will perform a full connector reprobe if a hotplug event
> + * was received while HPD was suspended.
> + */
> +static void intel_fbdev_hpd_set_suspend(struct intel_fbdev *ifbdev, int state)
> +{
> +	mutex_lock(&ifbdev->hpd_lock);
> +	ifbdev->hpd_suspended = state == FBINFO_STATE_SUSPENDED;
> +	if (ifbdev->hpd_waiting) {
> +		ifbdev->hpd_waiting = false;
> +
> +		DRM_DEBUG_KMS("Handling delayed fbcon HPD event\n");
> +		drm_fb_helper_hotplug_event(&ifbdev->helper);

Even when set_suspend(true)? Then on resume, we don't care as another
hotplug is generated.

Calling the hotplug_event from under the mutex feels like unnecessary
lock spreading.

> +	}
> +	mutex_unlock(&ifbdev->hpd_lock);

I am thinking along the lines of:

	bool send_event = false;

	mutex_lock(&hpd_lock);
	ifbdev->hpd_suspended = state == SUSPENDED;
	send_event = !hpd_suspended && hpd_waiting;
	hpd_waiting = false;
	mutex_unlock(&hpd_lock);

	if (send_event) {
		DRM_DEBUG_KMS("Handling delayed fbcon HPD event\n");
		drm_fb_helper_hotplug_event(&ifbdev->helper);
	}

>  void intel_fbdev_output_poll_changed(struct drm_device *dev)
> @@ -812,8 +833,15 @@ void intel_fbdev_output_poll_changed(struct drm_device *dev)
>  		return;
>
>  	intel_fbdev_sync(ifbdev);
> -	if (ifbdev->vma || ifbdev->helper.deferred_setup)
> +
> +	mutex_lock(&ifbdev->hpd_lock);
> +	if (ifbdev->hpd_suspended) {
> +		DRM_DEBUG_KMS("fbdev HPD event deferred until resume\n");
> +		ifbdev->hpd_waiting = true;
> +	} else if (ifbdev->vma || ifbdev->helper.deferred_setup) {
>  		drm_fb_helper_hotplug_event(&ifbdev->helper);
> +	}

Seems ripe for simplification. We can always just set hpd_waiting as it
will be reset over suspend, so then we just need a way of tidying up the
"am I ready to send" check.
-Chris
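The restructuring suggested in the review above can be tried out in userspace. What follows is only a sketch under stated assumptions: struct fbdev_model and every model_* function are hypothetical stand-ins, a pthread mutex replaces the kernel mutex, and the lock_held flag exists purely so the demo can check that the callback runs after the lock has been dropped. It is not the code from the patch, nor necessarily what a later revision would look like.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the HPD fields of struct intel_fbdev. */
struct fbdev_model {
	pthread_mutex_t hpd_lock;
	bool lock_held;		/* demo-only: tracks whether hpd_lock is held */
	bool hpd_suspended;
	bool hpd_waiting;
	int reprobes;		/* times the stand-in hotplug handler ran */
};

static void model_init(struct fbdev_model *m)
{
	pthread_mutex_init(&m->hpd_lock, NULL);
	m->lock_held = false;
	m->hpd_suspended = false;
	m->hpd_waiting = false;
	m->reprobes = 0;
}

static void model_lock(struct fbdev_model *m)
{
	pthread_mutex_lock(&m->hpd_lock);
	m->lock_held = true;
}

static void model_unlock(struct fbdev_model *m)
{
	m->lock_held = false;
	pthread_mutex_unlock(&m->hpd_lock);
}

/* Stand-in for drm_fb_helper_hotplug_event(). */
static void model_hotplug_event(struct fbdev_model *m)
{
	/* The point of the suggestion: hpd_lock is not held while reprobing.
	 * This check is only meaningful in a single-threaded demo. */
	assert(!m->lock_held);
	m->reprobes++;
}

/* Record a hotplug that should be replayed later. */
static void model_note_hotplug(struct fbdev_model *m)
{
	model_lock(m);
	m->hpd_waiting = true;
	model_unlock(m);
}

/* Decide under the lock, fire the event after dropping it. */
static void model_hpd_set_suspend(struct fbdev_model *m, bool suspended)
{
	bool send_event;

	model_lock(m);
	m->hpd_suspended = suspended;
	send_event = !m->hpd_suspended && m->hpd_waiting;
	m->hpd_waiting = false;
	model_unlock(m);

	if (send_event)
		model_hotplug_event(m);
}
```

With this shape, suspending clears any stale pending flag without firing, and resuming with a pending flag fires a single event once the lock has been released.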
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 0e32b39ceed6 drm/i915: add DP 1.2 MST support (v0.7).

The bot has tested the following trees: v4.20.5, v4.19.18, v4.14.96, v4.9.153, v4.4.172, v3.18.133.

v4.20.5: Build OK!
v4.19.18: Build OK!
v4.14.96: Failed to apply! Possible dependencies:
    df9e6521749a ("drm/i915/fbdev: Enable late fbdev initial configuration")

v4.9.153: Failed to apply! Possible dependencies:
    1c777c5d1dcd ("drm/i915/hsw: Fix GPU hang during resume from S3-devices state")
    275f039db56f ("drm/i915: Move user fault tracking to a separate list")
    3594a3e21f1f ("drm/i915: Remove superfluous locking around userfault_list")
    4f256d8219f2 ("drm/i915: Fix fbdev unload sequence")
    7c108fd8feac ("drm/i915: Move fence cancellation to runtime suspend")
    8baa1f04b9ed ("drm/i915: Update debugfs describe_obj() to show fault-mappable")
    96d776345277 ("drm/i915: Use a radixtree for random access to the object's backing storage")
    a4f5ea64f0a8 ("drm/i915: Refactor object page API")
    ad88d7fc6c03 ("drm/i915/fbdev: Serialise early hotplug events with async fbdev config")
    bf9e8429ab97 ("drm/i915: Make various init functions take dev_priv")
    eef57324d926 ("drm/i915: setup bridge for HDMI LPE audio driver")
    f8a7fde45610 ("drm/i915: Defer active reference until required")
    fbbd37b36fa5 ("drm/i915: Move object release to a freelist + worker")

v4.4.172: Failed to apply! Possible dependencies:
    0673ad472b98 ("drm/i915: Merge i915_dma.c into i915_drv.c")
    0a9d2bed5557 ("drm/i915/skl: Making DC6 entry is the last call in suspend flow.")
    0ad35fed618c ("drm/i915: gvt: Introduce the basic architecture of GVT-g")
    1f814daca43a ("drm/i915: add support for checking if we hold an RPM reference")
    2f693e28b8df ("drm/i915: Make turning on/off PW1 and Misc I/O part of the init/fini sequences")
    366e39b4d2c5 ("drm/i915: Tear down fbdev if initialization fails")
    399bb5b6db02 ("drm/i915: Move allocation of various workqueues earlier during init")
    414b7999b8be ("drm/i915/gen9: Remove csr.state, csr_lock and related code.")
    4f256d8219f2 ("drm/i915: Fix fbdev unload sequence")
    5d7a6eefc3b0 ("drm/i915: Split out load time early initialization")
    643a24b6ecdc ("drm/i915: Kconfig for extra driver debugging")
    666a45379e2c ("drm/i915: Separate cherryview from valleyview")
    73dfc227ff5c ("drm/i915/skl: init/uninit display core as part of the HW power domain state")
    755412e29c77 ("drm/i915: Add an optional selection from i915 of CONFIG_MMU_NOTIFIER")
    9c5308ea1cd4 ("drm/i915/skl: Refuse to load outdated dmc firmware")
    ad88d7fc6c03 ("drm/i915/fbdev: Serialise early hotplug events with async fbdev config")
    b6e7d894c3d2 ("drm/i915/skl: Store and print the DMC firmware version we load")
    bc87229f323e ("drm/i915/skl: enable PC9/10 power states during suspend-to-idle")
    c73666f394fc ("drm/i915/skl: If needed sanitize bios programmed cdclk")
    ebae38d061df ("drm/i915/gen9: csr_init after runtime pm enable")
    f4448375467d ("drm/i915/gen9: Use dev_priv in csr functions")
    f514c2d84285 ("drm/i915/gen9: flush DMC fw loading work during system suspend")

v3.18.133: Failed to apply! Possible dependencies:
    03e515f7f894 ("drm/i915: Make sure we invalidate frontbuffer on fbcon.")
    0673ad472b98 ("drm/i915: Merge i915_dma.c into i915_drv.c")
    08524a9ffa39 ("drm/i915/skl: Restore pipe B/C interrupts")
    21cff1484742 ("drm/i915: Use new drm_fb_helper functions")
    366e39b4d2c5 ("drm/i915: Tear down fbdev if initialization fails")
    4f256d8219f2 ("drm/i915: Fix fbdev unload sequence")
    9c065a7d5b67 ("drm/i915: Extract intel_runtime_pm.c")
    ad88d7fc6c03 ("drm/i915/fbdev: Serialise early hotplug events with async fbdev config")
    cf9d2890da19 ("drm/i915: Introduce a PV INFO page structure for Intel GVT-g.")
    d2dee86cece9 ("drm/i915: extract intel_init_fbc()")
    d9a946b52350 ("drm/i915: Another fbdev hack to avoid PSR on fbcon.")
    e4e7684fc5c5 ("drm/i915: Kerneldoc for intel_runtime_pm.c")
    f458ebbc3329 ("drm/i915: Bikeshed rpm functions name a bit.")
    fca52a5565fb ("drm/i915: kerneldoc for interrupt enable/disable functions")

How should we proceed with this patch?

--
Thanks,
Sasha
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 85b913ea6e80..c8549588b2ce 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -213,6 +213,16 @@ struct intel_fbdev {
 	unsigned long vma_flags;
 	async_cookie_t cookie;
 	int preferred_bpp;
+
+	/* Whether or not fbdev hpd processing is temporarily suspended */
+	bool hpd_suspended : 1;
+	/* Set when a hotplug was received while HPD processing was
+	 * suspended
+	 */
+	bool hpd_waiting : 1;
+
+	/* Protects hpd_suspended */
+	struct mutex hpd_lock;
 };
 
 struct intel_encoder {
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 8cf3efe88f02..3a6c0bebaaf9 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -681,6 +681,7 @@ int intel_fbdev_init(struct drm_device *dev)
 	if (ifbdev == NULL)
 		return -ENOMEM;
 
+	mutex_init(&ifbdev->hpd_lock);
 	drm_fb_helper_prepare(dev, &ifbdev->helper, &intel_fb_helper_funcs);
 
 	if (!intel_fbdev_init_bios(dev, ifbdev))
@@ -754,6 +755,23 @@ void intel_fbdev_fini(struct drm_i915_private *dev_priv)
 	intel_fbdev_destroy(ifbdev);
 }
 
+/* Suspends/resumes fbdev processing of incoming HPD events. When resuming HPD
+ * processing, fbdev will perform a full connector reprobe if a hotplug event
+ * was received while HPD was suspended.
+ */
+static void intel_fbdev_hpd_set_suspend(struct intel_fbdev *ifbdev, int state)
+{
+	mutex_lock(&ifbdev->hpd_lock);
+	ifbdev->hpd_suspended = state == FBINFO_STATE_SUSPENDED;
+	if (ifbdev->hpd_waiting) {
+		ifbdev->hpd_waiting = false;
+
+		DRM_DEBUG_KMS("Handling delayed fbcon HPD event\n");
+		drm_fb_helper_hotplug_event(&ifbdev->helper);
+	}
+	mutex_unlock(&ifbdev->hpd_lock);
+}
+
 void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
@@ -775,6 +793,7 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
 		 */
 		if (state != FBINFO_STATE_RUNNING)
 			flush_work(&dev_priv->fbdev_suspend_work);
+
 		console_lock();
 	} else {
 		/*
@@ -802,6 +821,8 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
 
 	drm_fb_helper_set_suspend(&ifbdev->helper, state);
 	console_unlock();
+
+	intel_fbdev_hpd_set_suspend(ifbdev, state);
 }
 
 void intel_fbdev_output_poll_changed(struct drm_device *dev)
@@ -812,8 +833,15 @@ void intel_fbdev_output_poll_changed(struct drm_device *dev)
 		return;
 
 	intel_fbdev_sync(ifbdev);
-	if (ifbdev->vma || ifbdev->helper.deferred_setup)
+
+	mutex_lock(&ifbdev->hpd_lock);
+	if (ifbdev->hpd_suspended) {
+		DRM_DEBUG_KMS("fbdev HPD event deferred until resume\n");
+		ifbdev->hpd_waiting = true;
+	} else if (ifbdev->vma || ifbdev->helper.deferred_setup) {
 		drm_fb_helper_hotplug_event(&ifbdev->helper);
+	}
+	mutex_unlock(&ifbdev->hpd_lock);
 }
 
 void intel_fbdev_restore_mode(struct drm_device *dev)
When resuming, we check whether or not any previously connected
MST topologies are still present and if so, attempt to resume them. If
this fails, we disable said MST topologies and fire off a hotplug event
so that userspace knows to reprobe.

However, sending a hotplug event involves calling
drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a
connector reprobe in the caller's thread - something we can't do at the
point in which i915 calls drm_dp_mst_topology_mgr_resume() since
hotplugging hasn't been fully initialized yet.

This currently causes some rather subtle but fatal issues. For example,
on my T480s the laptop dock connected to it usually disappears during a
suspend cycle, and comes back up a short while after the system has been
resumed. This guarantees that on pretty much every suspend and resume
cycle, drm_dp_mst_topology_mgr_set_mst(mgr, false) will be called and,
in turn, a connector hotplug will occur. Now it's Rube Goldberg time:
when the connector hotplug occurs, i915 reprobes /all/ of the
connectors, including eDP. However, eDP probing requires that we power
on the panel VDD which, in turn, grabs a wakeref to the appropriate
power domain on the GPU (on my T480s, this is the PORT_DDI_A_IO domain).
This is where things start breaking: since this all happens before
intel_power_domains_enable() is called, we end up leaking the wakeref
that was acquired and never releasing it later. Come next suspend/resume
cycle, this causes us to fail to shut down the GPU properly, which
causes it not to resume properly and die a horrible complicated death.

(as a note: this only happens when there's both an eDP panel and MST
topology connected which is removed mid-suspend. One or the other seems
to always be OK).

We could try to fix the VDD wakeref leak, but this doesn't seem like
it's worth it at all since we aren't able to handle hotplug detection
while resuming anyway. So, let's go with a more robust solution inspired
by nouveau: block fbdev from handling hotplug events until we resume
fbdev. This allows us to still send sysfs hotplug events to be handled
later by user space while we're resuming, while also preventing us from
actually processing any hotplug events we receive until it's safe.

This fixes the wakeref leak observed on the T480s and as such, also
fixes suspend/resume with MST topologies connected on this machine.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)")
Cc: Todd Previte <tprevite@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v3.17+
---
 drivers/gpu/drm/i915/intel_drv.h   | 10 ++++++++++
 drivers/gpu/drm/i915/intel_fbdev.c | 30 +++++++++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)
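To make the "block fbdev from handling hotplug events until we resume fbdev" idea concrete, here is a toy, single-threaded model of the gating. Everything in it is an assumption for illustration: struct hpd_gate and the gate_* functions are hypothetical names, the locking that the patch does with hpd_lock is left out, the ifbdev->vma / deferred_setup condition is ignored, and the replay happens only on resume, as the comment in the patch describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two flags the patch adds to struct intel_fbdev.
 * The real fields are protected by ifbdev->hpd_lock; locking is omitted. */
struct hpd_gate {
	bool suspended;	/* HPD processing is temporarily blocked */
	bool waiting;	/* at least one hotplug arrived while blocked */
	int reprobes;	/* how often the stand-in reprobe ran */
};

/* Stand-in for the connector reprobe done by the fbdev hotplug path. */
static void gate_reprobe(struct hpd_gate *g)
{
	g->reprobes++;
}

/* Models intel_fbdev_output_poll_changed(): defer while suspended. */
static void gate_hotplug(struct hpd_gate *g)
{
	if (g->suspended)
		g->waiting = true;
	else
		gate_reprobe(g);
}

/* Models the suspend/resume transition: replay one deferred event on resume. */
static void gate_set_suspended(struct hpd_gate *g, bool suspended)
{
	g->suspended = suspended;
	if (!suspended && g->waiting) {
		g->waiting = false;
		gate_reprobe(g);
	}
}

/* Three hotplugs arrive while suspended; returns the reprobe count after resume. */
static int gate_demo(void)
{
	struct hpd_gate g = { .suspended = false, .waiting = false, .reprobes = 0 };

	gate_set_suspended(&g, true);
	gate_hotplug(&g);
	gate_hotplug(&g);
	gate_hotplug(&g);
	assert(g.reprobes == 0);	/* nothing is processed while suspended */

	gate_set_suspended(&g, false);
	return g.reprobes;
}
```

Because waiting is a single flag rather than a counter, any number of hotplugs received while suspended collapse into one reprobe at resume, which is sufficient since a reprobe looks at all connectors anyway.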