Message ID: 20250108-mhi_recovery_fix-v1-2-a0a00a17da46@linaro.org
State: Not Applicable
Series: bus: mhi: host: pci_generic: Couple of recovery fixes
On Wed, 8 Jan 2025 at 14:39, Manivannan Sadhasivam via B4 Relay
<devnull+manivannan.sadhasivam.linaro.org@kernel.org> wrote:
>
> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
>
> Currently, in mhi_pci_runtime_resume(), if the resume fails, recovery_work
> is started asynchronously and success is returned. But this doesn't align
> with what the PM core expects, as documented in
> Documentation/power/runtime_pm.rst:
>
> "Once the subsystem-level resume callback (or the driver resume callback,
> if invoked directly) has completed successfully, the PM core regards the
> device as fully operational, which means that the device _must_ be able to
> complete I/O operations as needed. The runtime PM status of the device is
> then 'active'."
>
> So the PM core ends up marking the runtime PM status of the device as
> 'active', even though the device is not able to handle I/O operations.
> The same condition more or less applies to system resume as well.
>
> So to avoid this ambiguity, try to recover the device synchronously from
> mhi_pci_runtime_resume() and return the actual error code in the case of
> recovery failure.
>
> To do so, move the recovery code to a __mhi_pci_recovery_work() helper and
> call that from both mhi_pci_recovery_work() and mhi_pci_runtime_resume().
> The former still ignores the return value, while the latter passes it to
> the PM core.
>
> Cc: stable@vger.kernel.org # 5.13
> Reported-by: Johan Hovold <johan@kernel.org>
> Closes: https://lore.kernel.org/mhi/Z2PbEPYpqFfrLSJi@hovoldconsulting.com
> Fixes: d3800c1dce24 ("bus: mhi: pci_generic: Add support for runtime PM")
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

Note that this will noticeably impact the user experience on system-wide
resume (mhi_pci_resume), because MHI devices usually take a while (a few
seconds) to cold boot and reach a ready state (or time out in the worst
case). So we may have people complaining about a delayed-resume regression
on their laptops even if they are not using the MHI device/modem function.
Are we OK with that?

Regards,
Loic
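The runtime-PM contract the patch description quotes can be illustrated with a small userspace toy model (hypothetical names, not kernel code): the PM core marks a device 'active' whenever the resume callback returns 0, so queueing recovery asynchronously and returning success leaves the core believing a non-functional device can do I/O.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the runtime-PM contract. */
enum rpm_status { RPM_SUSPENDED, RPM_ACTIVE };

struct fake_dev {
    enum rpm_status status;     /* what the PM core believes */
    bool operational;           /* whether the device can really do I/O */
    bool recovery_queued;       /* asynchronous recovery pending */
};

/* Old behaviour: on failure, queue recovery and report success anyway. */
static int resume_old(struct fake_dev *d, bool hw_resume_ok)
{
    d->operational = hw_resume_ok;
    if (!hw_resume_ok)
        d->recovery_queued = true;      /* recover later, lie to the core */
    return 0;
}

/* Fixed behaviour: recover synchronously and propagate the real result. */
static int resume_new(struct fake_dev *d, bool hw_resume_ok, bool recovery_ok)
{
    d->operational = hw_resume_ok;
    if (!hw_resume_ok) {
        d->operational = recovery_ok;   /* synchronous recovery attempt */
        return recovery_ok ? 0 : -22;
    }
    return 0;
}

/* What the core does with the callback's return value. */
static void pm_core_resume(struct fake_dev *d, int ret)
{
    if (ret == 0)
        d->status = RPM_ACTIVE; /* "fully operational" per runtime_pm.rst */
}
```

The old path produces the mismatch the commit message describes: status 'active' while the device cannot handle I/O; the fixed path makes the failure visible to the core.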
On Wed, Jan 08, 2025 at 04:19:06PM +0100, Loic Poulain wrote:
> On Wed, 8 Jan 2025 at 14:39, Manivannan Sadhasivam via B4 Relay
> <devnull+manivannan.sadhasivam.linaro.org@kernel.org> wrote:
> >
> > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> >
> > [...]
> >
> > So to avoid this ambiguity, try to recover the device synchronously from
> > mhi_pci_runtime_resume() and return the actual error code in the case of
> > recovery failure.
> >
> > [...]
>
> Note that it will noticeably impact the user experience on system-wide
> resume (mhi_pci_resume), because MHI devices usually take a while (a
> few seconds) to cold boot and reach a ready state (or time out in the
> worst case). So we may have people complaining about delayed resume
> regression on their laptop even if they are not using the MHI
> device/modem function. Are we ok with that?

Are you saying that the modem will enter D3cold all the time during system
suspend? I think you are referring to x86 host machines here.

If that is the case, we should not be using the mhi_pci_runtime_*() calls
in mhi_pci_suspend/resume(). Rather, the MHI stack should be powered down
during suspend and powered on during resume.

- Mani
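The two strategies under discussion can be contrasted in a toy state machine (hypothetical names; in the real driver the power-cycle path would go through calls like mhi_power_down()/mhi_sync_power_up()): an M3-based resume only works if the device actually stayed in M3 across suspend, while a full power cycle is correct whether or not the platform cut power to the slot.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two suspend strategies. */
enum dev_state { DEV_OFF, DEV_M3, DEV_READY };

struct toy_mhi { enum dev_state state; };

/* The platform may or may not cut power to the slot while suspended;
 * the driver cannot reliably tell in advance. */
static void platform_suspend(struct toy_mhi *d, bool cuts_power)
{
    if (cuts_power)
        d->state = DEV_OFF;     /* D3cold: all device context lost */
}

/* Strategy 1: park the device in M3 and hope power stays on. */
static void toy_suspend_m3(struct toy_mhi *d) { d->state = DEV_M3; }

static int toy_resume_from_m3(struct toy_mhi *d)
{
    if (d->state != DEV_M3)
        return -22;             /* cf. "Resuming from non M3 state" */
    d->state = DEV_READY;
    return 0;
}

/* Strategy 2: power the stack fully down on suspend and cold boot it
 * on resume; correct regardless of what the platform did with power. */
static void toy_suspend(struct toy_mhi *d) { d->state = DEV_OFF; }

static int toy_resume(struct toy_mhi *d)
{
    d->state = DEV_READY;       /* stands in for a full power-up */
    return 0;
}
```

The model reproduces the failure mode seen later in this thread: resuming from a non-M3 state yields -22, while the power-cycle path is insensitive to slot power loss.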
On Wed, 8 Jan 2025 at 17:02, Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> On Wed, Jan 08, 2025 at 04:19:06PM +0100, Loic Poulain wrote:
> > On Wed, 8 Jan 2025 at 14:39, Manivannan Sadhasivam via B4 Relay
> > <devnull+manivannan.sadhasivam.linaro.org@kernel.org> wrote:
> > >
> > > [...]
> >
> > Note that it will noticeably impact the user experience on system-wide
> > resume (mhi_pci_resume), because MHI devices usually take a while (a
> > few seconds) to cold boot and reach a ready state (or time out in the
> > worst case). So we may have people complaining about delayed resume
> > regression on their laptop even if they are not using the MHI
> > device/modem function. Are we ok with that?
>
> Are you saying that the modem will enter D3Cold all the time during system
> suspend? I think you are referring to x86 host machines here.

It depends on the host and its firmware implementation, but yes, I have
observed that x86_64-based laptops power off the mPCIe slot while
suspended.

> If that is the case, we should not be using mhi_pci_runtime_*() calls in
> mhi_pci_suspend/resume(). Rather the MHI stack should be powered down during
> suspend and powered ON during resume.

Yes, but what about the hosts that keep power in the suspend state? We
cannot really know that programmatically, AFAIK.

Regards,
Loic
On Thu, Jan 09, 2025 at 09:50:55PM +0100, Loic Poulain wrote:
> On Wed, 8 Jan 2025 at 17:02, Manivannan Sadhasivam
> <manivannan.sadhasivam@linaro.org> wrote:
> >
> > [...]
> >
> > Are you saying that the modem will enter D3Cold all the time during system
> > suspend? I think you are referring to x86 host machines here.
>
> It depends on the host and its firmware implementation, but yes I
> observed that x86_64 based laptops are powering off the mPCIe slot
> while suspended.

Then the default behavior should be to power down the MHI stack during
suspend.

> > If that is the case, we should not be using mhi_pci_runtime_*() calls in
> > mhi_pci_suspend/resume(). Rather the MHI stack should be powered down during
> > suspend and powered ON during resume.
>
> Yes, but what about the hosts keeping power in suspend state? we can
> not really know that programmatically AFAIK.

Well, there are a few APIs we can rely on, but they are not reliable, at
least on DT platforms. However, powering down the MHI stack should be safe
irrespective of what the platform decides to do.

Regarding your comment on the device taking time to resume, we can opt for
async PM to let the device come up without affecting overall system resume.

Let me know if both of the above options make sense to you. I'll submit
patches to incorporate them.

- Mani
On Wed, Jan 08, 2025 at 07:09:28PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
>
> Currently, in mhi_pci_runtime_resume(), if the resume fails, recovery_work
> is started asynchronously and success is returned. But this doesn't align
> with what the PM core expects, as documented in
> Documentation/power/runtime_pm.rst:
>
> [...]
>
> So to avoid this ambiguity, try to recover the device synchronously from
> mhi_pci_runtime_resume() and return the actual error code in the case of
> recovery failure.
>
> [...]
>
> Cc: stable@vger.kernel.org # 5.13
> Reported-by: Johan Hovold <johan@kernel.org>
> Closes: https://lore.kernel.org/mhi/Z2PbEPYpqFfrLSJi@hovoldconsulting.com
> Fixes: d3800c1dce24 ("bus: mhi: pci_generic: Add support for runtime PM")
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

The reasoning above makes sense, and I do indeed see resume taking five
seconds longer with this patch, as Loic suggested it would.
Unfortunately, something else is broken, as the recovery code now deadlocks
again when the modem fails to resume (with both patches applied):

[ 729.833701] PM: suspend entry (deep)
[ 729.841377] Filesystems sync: 0.000 seconds
[ 729.867672] Freezing user space processes
[ 729.869494] Freezing user space processes completed (elapsed 0.001 seconds)
[ 729.869499] OOM killer disabled.
[ 729.869501] Freezing remaining freezable tasks
[ 729.870882] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 730.184254] mhi-pci-generic 0005:01:00.0: mhi_pci_runtime_resume
[ 730.190643] mhi mhi0: Resuming from non M3 state (SYS ERROR)
[ 730.196587] mhi-pci-generic 0005:01:00.0: failed to resume device: -22
[ 730.203412] mhi-pci-generic 0005:01:00.0: device recovery started

I've reproduced this three times in three different paths (runtime resume
before suspend; runtime resume during suspend; and during system resume).

I didn't try to figure out what causes the deadlock this time (and lockdep
does not trigger), but you should be able to reproduce this by
instrumenting a resume failure.

Johan
On Wed, Jan 22, 2025 at 04:24:27PM +0100, Johan Hovold wrote:
> On Wed, Jan 08, 2025 at 07:09:28PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> >
> > Currently, in mhi_pci_runtime_resume(), if the resume fails, recovery_work
> > is started asynchronously and success is returned. But this doesn't align
> > with what PM core expects as documented in
> > Documentation/power/runtime_pm.rst:
> >
> > [...]
>
> Reasoning above makes sense, and I do indeed see resume taking five
> seconds longer with this patch as Loic suggested it would.

I forgot to mention the following warnings that now show up when system
resume succeeds. Recovery was run before this patch as well, but the
"parent mhi0 should not be sleeping" warnings are new:

[   68.753288] qcom_mhi_qrtr mhi0_IPCR: failed to prepare for autoqueue transfer -22
[   68.761109] qcom_mhi_qrtr mhi0_IPCR: PM: dpm_run_callback(): qcom_mhi_qrtr_pm_resume_early [qrtr_mhi] returns -22
[   68.771804] qcom_mhi_qrtr mhi0_IPCR: PM: failed to resume early: error -22
[   68.795053] mhi-pci-generic 0005:01:00.0: mhi_pci_resume
[   68.800709] mhi-pci-generic 0005:01:00.0: mhi_pci_runtime_resume
[   68.800794] mhi mhi0: Resuming from non M3 state (RESET)
[   68.800804] mhi-pci-generic 0005:01:00.0: failed to resume device: -22
[   68.819517] mhi-pci-generic 0005:01:00.0: device recovery started
[   68.819532] mhi-pci-generic 0005:01:00.0: __mhi_power_down
[   68.819543] mhi-pci-generic 0005:01:00.0: __mhi_power_down - pm mutex taken
[   68.819554] mhi-pci-generic 0005:01:00.0: __mhi_power_down - pm lock taken
[   68.820060] wwan wwan0: port wwan0qcdm0 disconnected
[   68.824839] nvme nvme0: 12/0/0 default/read/poll queues
[   68.857908] wwan wwan0: port wwan0mbim0 disconnected
[   68.864012] wwan wwan0: port wwan0qmi0 disconnected
[   68.943307] mhi-pci-generic 0005:01:00.0: __mhi_power_down - returns
[   68.956253] mhi mhi0: Requested to power ON
[   68.960753] mhi mhi0: Power on setup success
[   68.965262] mhi-pci-generic 0005:01:00.0: mhi_sync_power_up - wait event timeout_ms = 8000
[   73.183086] mhi mhi0: Wait for device to enter SBL or Mission mode
[   73.653462] mhi-pci-generic 0005:01:00.0: mhi_sync_power_up - wait event returns, ret = 0
[   73.653752] mhi mhi0_DIAG: PM: parent mhi0 should not be sleeping
[   73.661955] mhi-pci-generic 0005:01:00.0: mhi_sync_power_up - returns
[   73.668461] mhi mhi0_MBIM: PM: parent mhi0 should not be sleeping
[   73.674950] mhi-pci-generic 0005:01:00.0: Recovery completed
[   73.681428] mhi mhi0_QMI: PM: parent mhi0 should not be sleeping
[   74.315919] OOM killer enabled.
[   74.316475] wwan wwan0: port wwan0qcdm0 attached
[   74.319206] Restarting tasks ...
[   74.322825] done.
[   74.322870] random: crng reseeded on system resumption
[   74.325956] wwan wwan0: port wwan0mbim0 attached
[   74.334467] wwan wwan0: port wwan0qmi0 attached

> Unfortunately, something else is broken as the recovery code now
> deadlocks again when the modem fails to resume (with both patches
> applied):
>
> [ 729.833701] PM: suspend entry (deep)
> [ 729.841377] Filesystems sync: 0.000 seconds
> [ 729.867672] Freezing user space processes
> [ 729.869494] Freezing user space processes completed (elapsed 0.001 seconds)
> [ 729.869499] OOM killer disabled.
> [ 729.869501] Freezing remaining freezable tasks
> [ 729.870882] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
> [ 730.184254] mhi-pci-generic 0005:01:00.0: mhi_pci_runtime_resume
> [ 730.190643] mhi mhi0: Resuming from non M3 state (SYS ERROR)
> [ 730.196587] mhi-pci-generic 0005:01:00.0: failed to resume device: -22
> [ 730.203412] mhi-pci-generic 0005:01:00.0: device recovery started
>
> I've reproduced this three times in three different paths (runtime
> resume before suspend; runtime resume during suspend; and during system
> resume).
>
> I didn't try to figure what causes the deadlock this time (and lockdep
> does not trigger), but you should be able to reproduce this by
> instrumenting a resume failure.

Johan
diff --git a/drivers/bus/mhi/host/pci_generic.c b/drivers/bus/mhi/host/pci_generic.c
index e92df380c785..f6de407e077e 100644
--- a/drivers/bus/mhi/host/pci_generic.c
+++ b/drivers/bus/mhi/host/pci_generic.c
@@ -997,10 +997,8 @@ static void mhi_pci_runtime_put(struct mhi_controller *mhi_cntrl)
 	pm_runtime_put(mhi_cntrl->cntrl_dev);
 }
 
-static void mhi_pci_recovery_work(struct work_struct *work)
+static int __mhi_pci_recovery_work(struct mhi_pci_device *mhi_pdev)
 {
-	struct mhi_pci_device *mhi_pdev = container_of(work, struct mhi_pci_device,
-						       recovery_work);
 	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
 	struct pci_dev *pdev = to_pci_dev(mhi_cntrl->cntrl_dev);
 	int err;
@@ -1035,13 +1033,25 @@ static void mhi_pci_recovery_work(struct work_struct *work)
 	set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
 	mod_timer(&mhi_pdev->health_check_timer, jiffies + HEALTH_CHECK_PERIOD);
-	return;
+
+	return 0;
 
 err_unprepare:
 	mhi_unprepare_after_power_down(mhi_cntrl);
 err_try_reset:
-	if (pci_try_reset_function(pdev))
+	err = pci_try_reset_function(pdev);
+	if (err)
 		dev_err(&pdev->dev, "Recovery failed\n");
+
+	return err;
+}
+
+static void mhi_pci_recovery_work(struct work_struct *work)
+{
+	struct mhi_pci_device *mhi_pdev = container_of(work, struct mhi_pci_device,
+						       recovery_work);
+
+	__mhi_pci_recovery_work(mhi_pdev);
 }
 
 static void health_check(struct timer_list *t)
@@ -1400,15 +1410,10 @@ static int __maybe_unused mhi_pci_runtime_resume(struct device *dev)
 	return 0;
 
 err_recovery:
-	/* Do not fail to not mess up our PCI device state, the device likely
-	 * lost power (d3cold) and we simply need to reset it from the recovery
-	 * procedure, trigger the recovery asynchronously to prevent system
-	 * suspend exit delaying.
-	 */
-	queue_work(system_long_wq, &mhi_pdev->recovery_work);
+	err = __mhi_pci_recovery_work(mhi_pdev);
 	pm_runtime_mark_last_busy(dev);
 
-	return 0;
+	return err;
 }
 
 static int __maybe_unused mhi_pci_suspend(struct device *dev)
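The shape of the patch above reduces to a pattern: pull the body of an asynchronous work handler into a helper that returns an error code, keep a thin fire-and-forget wrapper for the work path, and call the helper directly where the caller must see the result. A minimal userspace sketch of that pattern (hypothetical names; the work_struct plumbing is elided):

```c
#include <assert.h>

/* Hypothetical knob standing in for whether recovery succeeds. */
static int recovery_err;            /* 0 = success, negative = failure */

/* Shared body: previously inlined in the work handler only. */
static int __recover(void)
{
    return recovery_err;
}

/* Workqueue path: return value intentionally ignored (fire-and-forget). */
static void recovery_work(void)
{
    __recover();
}

/* Resume path: recover synchronously so the caller (here, standing in
 * for the PM core) sees the real outcome. */
static int runtime_resume(int hw_err)
{
    if (!hw_err)
        return 0;                   /* normal resume succeeded */
    return __recover();             /* was: queue work and return 0 */
}
```

The trade-off, as raised earlier in the thread, is that the synchronous call now sits on the resume path and its latency (a multi-second cold boot in the MHI case) becomes visible to whoever is waiting on resume.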