Message ID | 1614953347-10192-1-git-send-email-loic.poulain@linaro.org (mailing list archive) |
---|---|
State | Not Applicable, archived |
Series | bus: mhi: pm: Change mhi_pm_resume timeout value |
On 3/5/2021 7:09 AM, Loic Poulain wrote:
> mhi_cntrl->timeout_ms is set by the controller and indicates the
> maximum amount of time the controller device will take to be ready.
> In case of PCI modems, this value is quite high given modems can take
> up to 15 seconds from cold boot to be ready.
>
> Reusing this value in mhi_pm_resume can cause huge resuming latency
> and delay the whole system resume (in case of system wide suspend/
> resume), leading to bad user experience.

I think this needs more explanation. The timeout is a maximum value.
You indicate that 2 seconds is more than enough for any MHI device to
exit M3 (citation needed), but 15 seconds is too much? The difference
should only be apparent when the device doesn't transition in the timeout.

Put another way, this doesn't say why 15 seconds is bad, if every device
only needs 2, given that wait_event_timeout() doesn't always wait for
the entire timeout value if the event occurs earlier.

> This change adjusts the resume timeout to a fixed 2s value, which is
> more than enough for any MHI device for exiting M3.
>
> Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
> ---
>  drivers/bus/mhi/core/pm.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> index 0cd6445..e2d83a9 100644
> --- a/drivers/bus/mhi/core/pm.c
> +++ b/drivers/bus/mhi/core/pm.c
> @@ -17,6 +17,8 @@
>  #include <linux/wait.h>
>  #include "internal.h"
>
> +#define MHI_PM_RESUME_TIMEOUT_MS 2000
> +
>  /*
>   * Not all MHI state transitions are synchronous. Transitions like Linkdown,
>   * SYS_ERR, and shutdown can happen anytime asynchronously. This function will
> @@ -942,7 +944,7 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
>  	ret = wait_event_timeout(mhi_cntrl->state_event,
>  				 mhi_cntrl->dev_state == MHI_STATE_M0 ||
>  				 MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state),
> -				 msecs_to_jiffies(mhi_cntrl->timeout_ms));
> +				 msecs_to_jiffies(MHI_PM_RESUME_TIMEOUT_MS));
>
>  	if (!ret || MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state)) {
>  		dev_err(dev,
>
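The point that a timeout is only an upper bound can be illustrated with a small userspace model in C. This is a sketch for illustration, not kernel code: it does not call wait_event_timeout(), and `struct wait_result`, `model_timed_wait()` and the 300 ms wake-up latency mentioned afterwards are all made-up assumptions.

```c
#include <stdbool.h>

/* Outcome of one modeled wait (hypothetical type, not an MHI structure). */
struct wait_result {
	bool reached;		/* true if the awaited state was reached */
	unsigned int waited_ms;	/* time actually spent waiting */
};

/*
 * Model a timed wait: the waiter returns as soon as the event fires, or
 * when timeout_ms expires, whichever comes first. A negative
 * ready_after_ms means the event never fires.
 */
struct wait_result model_timed_wait(long ready_after_ms, unsigned int timeout_ms)
{
	struct wait_result res;

	if (ready_after_ms >= 0 && (unsigned long)ready_after_ms <= timeout_ms) {
		res.reached = true;
		res.waited_ms = (unsigned int)ready_after_ms;
	} else {
		res.reached = false;
		res.waited_ms = timeout_ms;
	}
	return res;
}
```

In this model, a device that reaches M0 after 300 ms costs 300 ms with either a 2000 ms or a 15000 ms timeout. The two values only differ when the event never fires, in which case the full timeout is spent.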
Hi Jeffrey,

On Fri, 5 Mar 2021 at 15:49, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>
> On 3/5/2021 7:09 AM, Loic Poulain wrote:
> > mhi_cntrl->timeout_ms is set by the controller and indicates the
> > maximum amount of time the controller device will take to be ready.
> > In case of PCI modems, this value is quite high given modems can take
> > up to 15 seconds from cold boot to be ready.
> >
> > Reusing this value in mhi_pm_resume can cause huge resuming latency
> > and delay the whole system resume (in case of system wide suspend/
> > resume), leading to bad user experience.
>
> I think this needs more explanation. The timeout is a maximum value.
> You indicate that 2 seconds is more than enough for any MHI device to
> exit M3 (citation needed), but 15 seconds is too much? The difference
> should only be apparent when the device doesn't transition in the timeout.
>
> Put another way, this doesn't say why 15 seconds is bad, if every device
> only needs 2, given that wait_event_timeout() doesn't always wait for
> the entire timeout value if the event occurs earlier.

Yes, right, that deserves an explanation: depending on the platform and
the suspend type (deep, s2idle), the PCI device may or may not lose
power. In case power is maintained, there is no problem and the
controller is successfully moved to M0. But in case of power loss, the
device is going to restart, and MHI resuming is going to time out and
fail since M0 will never be reached. On the PCI side we simply
reinitialize the controller in case of resume failure. So in other
words, MHI resume is expected to fail in some cases and it should be
handled with minimal impact on the system.

Regards,
Loic
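The recovery flow described above (try to resume, and reinitialize the controller if that fails) could be modeled roughly as follows. This is a userspace sketch under stated assumptions: `struct fake_ctrl`, `fake_mhi_resume()`, `fake_reinit()` and `fake_pci_resume()` are hypothetical stand-ins, not the actual MHI or PCI driver functions, and the `-EIO` return value is simply the error this model uses for a failed resume.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical controller model; these are not the real MHI structures. */
struct fake_ctrl {
	bool power_lost;	/* device lost power during suspend */
	bool in_m0;		/* device is operational */
	bool reinitialized;	/* recovery path was taken */
};

/* Stand-in for the MHI resume step: fails if the device was power-cycled. */
int fake_mhi_resume(struct fake_ctrl *ctrl)
{
	if (ctrl->power_lost)
		return -EIO;	/* modeled failure: M0 is never reached */
	ctrl->in_m0 = true;
	return 0;
}

/* Stand-in for a full re-initialization of the controller. */
int fake_reinit(struct fake_ctrl *ctrl)
{
	ctrl->power_lost = false;
	ctrl->reinitialized = true;
	ctrl->in_m0 = true;
	return 0;
}

/* Bus-side resume: try the cheap path first, recover on failure. */
int fake_pci_resume(struct fake_ctrl *ctrl)
{
	if (fake_mhi_resume(ctrl) == 0)
		return 0;
	return fake_reinit(ctrl);
}
```

The model leaves out timing on purpose. In the thread's scenario, the failing branch of the resume step is where the full timeout would be spent before recovery starts.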
On 3/5/2021 8:08 AM, Loic Poulain wrote:
> Hi Jeffrey,
>
> On Fri, 5 Mar 2021 at 15:49, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>>
>> On 3/5/2021 7:09 AM, Loic Poulain wrote:
>>> mhi_cntrl->timeout_ms is set by the controller and indicates the
>>> maximum amount of time the controller device will take to be ready.
>>> In case of PCI modems, this value is quite high given modems can take
>>> up to 15 seconds from cold boot to be ready.
>>>
>>> Reusing this value in mhi_pm_resume can cause huge resuming latency
>>> and delay the whole system resume (in case of system wide suspend/
>>> resume), leading to bad user experience.
>>
>> I think this needs more explanation. The timeout is a maximum value.
>> You indicate that 2 seconds is more than enough for any MHI device to
>> exit M3 (citation needed), but 15 seconds is too much? The difference
>> should only be apparent when the device doesn't transition in the timeout.
>>
>> Put another way, this doesn't say why 15 seconds is bad, if every device
>> only needs 2, given that wait_event_timeout() doesn't always wait for
>> the entire timeout value if the event occurs earlier.
>
> Yes, right, that deserves an explanation: depending on the platform and
> the suspend type (deep, s2idle), the PCI device may or may not lose
> power. In case power is maintained, there is no problem and the
> controller is successfully moved to M0. But in case of power loss, the
> device is going to restart, and MHI resuming is going to time out and
> fail since M0 will never be reached. On the PCI side we simply
> reinitialize the controller in case of resume failure. So in other
> words, MHI resume is expected to fail in some cases and it should be
> handled with minimal impact on the system.

Can we detect the power loss in far less than 2 seconds, and abort the
resume process? Waiting for the entire timeout, regardless of the
value, in the power loss scenario you describe seems less than ideal for
the system impact you are attempting to optimize.
On Fri, 5 Mar 2021 at 16:09, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>
> On 3/5/2021 8:08 AM, Loic Poulain wrote:
> > Hi Jeffrey,
> >
> > On Fri, 5 Mar 2021 at 15:49, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
> >>
> >> On 3/5/2021 7:09 AM, Loic Poulain wrote:
> >>> mhi_cntrl->timeout_ms is set by the controller and indicates the
> >>> maximum amount of time the controller device will take to be ready.
> >>> In case of PCI modems, this value is quite high given modems can take
> >>> up to 15 seconds from cold boot to be ready.
> >>>
> >>> Reusing this value in mhi_pm_resume can cause huge resuming latency
> >>> and delay the whole system resume (in case of system wide suspend/
> >>> resume), leading to bad user experience.
> >>
> >> I think this needs more explanation. The timeout is a maximum value.
> >> You indicate that 2 seconds is more than enough for any MHI device to
> >> exit M3 (citation needed), but 15 seconds is too much? The difference
> >> should only be apparent when the device doesn't transition in the timeout.
> >>
> >> Put another way, this doesn't say why 15 seconds is bad, if every device
> >> only needs 2, given that wait_event_timeout() doesn't always wait for
> >> the entire timeout value if the event occurs earlier.
> >
> > Yes, right, that deserves an explanation: depending on the platform and
> > the suspend type (deep, s2idle), the PCI device may or may not lose
> > power. In case power is maintained, there is no problem and the
> > controller is successfully moved to M0. But in case of power loss, the
> > device is going to restart, and MHI resuming is going to time out and
> > fail since M0 will never be reached. On the PCI side we simply
> > reinitialize the controller in case of resume failure. So in other
> > words, MHI resume is expected to fail in some cases and it should be
> > handled with minimal impact on the system.
>
> Can we detect the power loss in far less than 2 seconds, and abort the
> resume process? Waiting for the entire timeout, regardless of the
> value, in the power loss scenario you describe seems less than ideal for
> the system impact you are attempting to optimize.

That's a good question, e.g. checking that the state is M3 before trying
anything. I need to check that.

Regards,
Loic
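The idea of checking for M3 before waiting could look roughly like the userspace model below. It only sketches the idea raised in this exchange and is not the change that was later submitted. `enum fake_state`, `fake_resume_checked()` and its parameters are hypothetical names, and treating a non-M3 state as an immediate `-EINVAL` is an assumption of the model.

```c
#include <errno.h>

/* Hypothetical subset of device states, for illustration only. */
enum fake_state { FAKE_STATE_RESET, FAKE_STATE_M0, FAKE_STATE_M3 };

/*
 * Modeled resume with an early state check.
 * state_at_resume: state read back from the device when resume starts.
 * wake_ms:         time a healthy device needs to go from M3 to M0.
 * timeout_ms:      upper bound on the wait.
 * waited_ms:       out parameter, time spent waiting.
 */
int fake_resume_checked(enum fake_state state_at_resume, unsigned int wake_ms,
			unsigned int timeout_ms, unsigned int *waited_ms)
{
	*waited_ms = 0;

	/*
	 * A device that is no longer in M3 is assumed to have been reset
	 * (for example after a power loss): fail at once instead of waiting.
	 */
	if (state_at_resume != FAKE_STATE_M3)
		return -EINVAL;

	/* Device kept its state but is too slow: the full timeout is spent. */
	if (wake_ms > timeout_ms) {
		*waited_ms = timeout_ms;
		return -ETIMEDOUT;
	}

	*waited_ms = wake_ms;
	return 0;
}
```

With such a check, the power-loss case costs no waiting time whatever the timeout value is, so the size of the timeout only matters for a device that is still in M3 but slow to wake.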
On Fri, 5 Mar 2021 at 16:34, Loic Poulain <loic.poulain@linaro.org> wrote:
>
> On Fri, 5 Mar 2021 at 16:09, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
> >
> > On 3/5/2021 8:08 AM, Loic Poulain wrote:
> > > Hi Jeffrey,
> > >
> > > On Fri, 5 Mar 2021 at 15:49, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
> > >>
> > >> On 3/5/2021 7:09 AM, Loic Poulain wrote:
> > >>> mhi_cntrl->timeout_ms is set by the controller and indicates the
> > >>> maximum amount of time the controller device will take to be ready.
> > >>> In case of PCI modems, this value is quite high given modems can take
> > >>> up to 15 seconds from cold boot to be ready.
> > >>>
> > >>> Reusing this value in mhi_pm_resume can cause huge resuming latency
> > >>> and delay the whole system resume (in case of system wide suspend/
> > >>> resume), leading to bad user experience.
> > >>
> > >> I think this needs more explanation. The timeout is a maximum value.
> > >> You indicate that 2 seconds is more than enough for any MHI device to
> > >> exit M3 (citation needed), but 15 seconds is too much? The difference
> > >> should only be apparent when the device doesn't transition in the timeout.
> > >>
> > >> Put another way, this doesn't say why 15 seconds is bad, if every device
> > >> only needs 2, given that wait_event_timeout() doesn't always wait for
> > >> the entire timeout value if the event occurs earlier.
> > >
> > > Yes, right, that deserves an explanation: depending on the platform and
> > > the suspend type (deep, s2idle), the PCI device may or may not lose
> > > power. In case power is maintained, there is no problem and the
> > > controller is successfully moved to M0. But in case of power loss, the
> > > device is going to restart, and MHI resuming is going to time out and
> > > fail since M0 will never be reached. On the PCI side we simply
> > > reinitialize the controller in case of resume failure. So in other
> > > words, MHI resume is expected to fail in some cases and it should be
> > > handled with minimal impact on the system.
> >
> > Can we detect the power loss in far less than 2 seconds, and abort the
> > resume process? Waiting for the entire timeout, regardless of the
> > value, in the power loss scenario you describe seems less than ideal for
> > the system impact you are attempting to optimize.
>
> That's a good question, e.g. checking that the state is M3 before trying
> anything. I need to check that.

OK, please discard this patch; I've submitted another change that takes
care of this more properly. Thanks, Jeffrey, for challenging this.

Loic
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index 0cd6445..e2d83a9 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -17,6 +17,8 @@
 #include <linux/wait.h>
 #include "internal.h"
 
+#define MHI_PM_RESUME_TIMEOUT_MS 2000
+
 /*
  * Not all MHI state transitions are synchronous. Transitions like Linkdown,
  * SYS_ERR, and shutdown can happen anytime asynchronously. This function will
@@ -942,7 +944,7 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
 	ret = wait_event_timeout(mhi_cntrl->state_event,
 				 mhi_cntrl->dev_state == MHI_STATE_M0 ||
 				 MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state),
-				 msecs_to_jiffies(mhi_cntrl->timeout_ms));
+				 msecs_to_jiffies(MHI_PM_RESUME_TIMEOUT_MS));
 
 	if (!ret || MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state)) {
 		dev_err(dev,
mhi_cntrl->timeout_ms is set by the controller and indicates the
maximum amount of time the controller device will take to be ready.
In case of PCI modems, this value is quite high given modems can take
up to 15 seconds from cold boot to be ready.

Reusing this value in mhi_pm_resume can cause huge resuming latency
and delay the whole system resume (in case of system wide suspend/
resume), leading to bad user experience.

This change adjusts the resume timeout to a fixed 2s value, which is
more than enough for any MHI device for exiting M3.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
---
 drivers/bus/mhi/core/pm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)