
bus: mhi: pm: Change mhi_pm_resume timeout value

Message ID 1614953347-10192-1-git-send-email-loic.poulain@linaro.org (mailing list archive)
State Not Applicable, archived

Commit Message

Loic Poulain March 5, 2021, 2:09 p.m. UTC
mhi_cntrl->timeout_ms is set by the controller and indicates the
maximum amount of time the controller device will take to be ready.
In case of PCI modems, this value is quite high given modems can take
up to 15 seconds from cold boot to be ready.

Reusing this value in mhi_pm_resume can cause huge resume latency
and delay the whole system resume (in case of system-wide
suspend/resume), leading to a bad user experience.

This change adjusts the resume timeout to a fixed 2s value, which is
more than enough for any MHI device to exit M3.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
---
 drivers/bus/mhi/core/pm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Jeffrey Hugo March 5, 2021, 2:49 p.m. UTC | #1
On 3/5/2021 7:09 AM, Loic Poulain wrote:
> mhi_cntrl->timeout_ms is set by the controller and indicates the
> maximum amount of time the controller device will take to be ready.
> In case of PCI modems, this value is quite high given modems can take
> up to 15 seconds from cold boot to be ready.
> 
> Reusing this value in mhi_pm_resume can cause huge resuming latency
> and delay the whole system resume (in case of system wide suspend/
> resume), leading to bad use experience.

I think this needs more explanation.  The timeout is a maximum value. 
You indicate that 2 seconds is more than enough for any MHI device to 
exit M3 (citation needed), but 15 seconds is too much?  The difference 
should only be apparent when the device doesn't transition in the timeout.

Put another way, this doesn't say why 15 seconds is bad, if every device 
only needs 2, given that wait_event_timeout() doesn't always wait for 
the entire timeout value if the event occurs earlier.
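
To make the point above concrete, here is a minimal userspace sketch of a bounded wait. It is not kernel code: `sim_device`, `sim_event_occurred` and `sim_wait_ms` are hypothetical names, and time is simulated with a counter. Unlike the real `wait_event_timeout()`, which returns the remaining jiffies (0 on timeout), this model returns the time spent waiting, to make the comparison easier to read.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for a device. It reaches the awaited state at
 * ready_at_ms, or never does if ready_at_ms is negative.
 */
struct sim_device {
	long ready_at_ms;
};

static bool sim_event_occurred(const struct sim_device *dev, long now_ms)
{
	return dev->ready_at_ms >= 0 && now_ms >= dev->ready_at_ms;
}

/*
 * Model of a bounded wait: stop as soon as the event occurs, otherwise give
 * up once timeout_ms has elapsed. Returns the simulated time spent waiting.
 */
static long sim_wait_ms(const struct sim_device *dev, long timeout_ms)
{
	long now_ms;

	for (now_ms = 0; now_ms < timeout_ms; now_ms++) {
		if (sim_event_occurred(dev, now_ms))
			return now_ms;	/* event seen early: timeout is irrelevant */
	}

	return timeout_ms;		/* event never seen: full timeout is paid */
}
```

With a device that becomes ready after 300 ms, the model waits 300 ms whether the timeout is 2000 or 15000. Only a device that never becomes ready pays the full timeout, which is the case the commit message would need to describe.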

> 
> This change adjusts the resume timeout to a fixed 2s value, which is
> more than enough for any MHI device for exiting M3.
> 
> Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
> ---
>   drivers/bus/mhi/core/pm.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> index 0cd6445..e2d83a9 100644
> --- a/drivers/bus/mhi/core/pm.c
> +++ b/drivers/bus/mhi/core/pm.c
> @@ -17,6 +17,8 @@
>   #include <linux/wait.h>
>   #include "internal.h"
>   
> +#define MHI_PM_RESUME_TIMEOUT_MS 2000
> +
>   /*
>    * Not all MHI state transitions are synchronous. Transitions like Linkdown,
>    * SYS_ERR, and shutdown can happen anytime asynchronously. This function will
> @@ -942,7 +944,7 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
>   	ret = wait_event_timeout(mhi_cntrl->state_event,
>   				 mhi_cntrl->dev_state == MHI_STATE_M0 ||
>   				 MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state),
> -				 msecs_to_jiffies(mhi_cntrl->timeout_ms));
> +				 msecs_to_jiffies(MHI_PM_RESUME_TIMEOUT_MS));
>   
>   	if (!ret || MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state)) {
>   		dev_err(dev,
>

Loic Poulain March 5, 2021, 3:08 p.m. UTC | #2
Hi Jeffrey,

On Fri, 5 Mar 2021 at 15:49, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>
> On 3/5/2021 7:09 AM, Loic Poulain wrote:
> > mhi_cntrl->timeout_ms is set by the controller and indicates the
> > maximum amount of time the controller device will take to be ready.
> > In case of PCI modems, this value is quite high given modems can take
> > up to 15 seconds from cold boot to be ready.
> >
> > Reusing this value in mhi_pm_resume can cause huge resuming latency
> > and delay the whole system resume (in case of system wide suspend/
> > resume), leading to bad use experience.
>
> I think this needs more explanation.  The timeout is a maximum value.
> You indicate that 2 seconds is more than enough for any MHI device to
> exit M3 (citation needed), but 15 seconds is too much?  The difference
> should only be apparent when the device doesn't transition in the timeout.
>
> Put another way, this doesn't say why 15 seconds is bad, if every device
> only needs 2, given that wait_event_timeout() doesn't always wait for
> the entire timeout value if the event occurs earlier.

Yes, right, that deserves an explanation: depending on the platform and
the suspend type (deep, s2idle), the PCI device may or may not lose
power. If power is maintained, there is no problem and the
controller is successfully moved to M0. But in case of power loss, the
device is going to restart, and MHI resume is going to time out and
fail since M0 will never be reached. On the PCI side we simply
reinitialize the controller in case of resume failure. So in other
words, MHI resume is expected to fail in some cases, and that failure
should be handled with minimal impact on the system.
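
The fallback described above can be sketched roughly as follows. This is a userspace model under stated assumptions, not the actual PCI controller driver: `sim_modem`, `sim_mhi_resume` and `sim_pci_resume` are hypothetical names, and the -1 return value merely stands in for whatever error the real resume path reports.

```c
#include <stdbool.h>

enum sim_result {
	SIM_RESUMED,		/* device kept power and went back to M0 */
	SIM_REINITIALIZED,	/* resume failed, controller was set up again */
};

/* Hypothetical modem model: only tracks what matters for the fallback. */
struct sim_modem {
	bool lost_power;	/* true if the device restarted during suspend */
	int reinit_count;	/* how many times the fallback path ran */
};

/* Stand-in for the MHI resume step: fails when the device lost power. */
static int sim_mhi_resume(const struct sim_modem *modem)
{
	return modem->lost_power ? -1 : 0;
}

/*
 * Stand-in for the controller-side resume handler: try the normal resume
 * first and fall back to a full reinitialization if it fails.
 */
static enum sim_result sim_pci_resume(struct sim_modem *modem)
{
	if (sim_mhi_resume(modem) == 0)
		return SIM_RESUMED;

	/* Expected failure after power loss: start over from scratch. */
	modem->reinit_count++;
	modem->lost_power = false;

	return SIM_REINITIALIZED;
}
```

In this model the cost of the failure path is whatever time `sim_mhi_resume` spends before reporting the error, which is why the length of the resume timeout matters in the power-loss case.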

Regards,
Loic

Jeffrey Hugo March 5, 2021, 3:09 p.m. UTC | #3
On 3/5/2021 8:08 AM, Loic Poulain wrote:
> Hi Jeffrey,
> 
> On Fri, 5 Mar 2021 at 15:49, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>>
>> On 3/5/2021 7:09 AM, Loic Poulain wrote:
>>> mhi_cntrl->timeout_ms is set by the controller and indicates the
>>> maximum amount of time the controller device will take to be ready.
>>> In case of PCI modems, this value is quite high given modems can take
>>> up to 15 seconds from cold boot to be ready.
>>>
>>> Reusing this value in mhi_pm_resume can cause huge resuming latency
>>> and delay the whole system resume (in case of system wide suspend/
>>> resume), leading to bad use experience.
>>
>> I think this needs more explanation.  The timeout is a maximum value.
>> You indicate that 2 seconds is more than enough for any MHI device to
>> exit M3 (citation needed), but 15 seconds is too much?  The difference
>> should only be apparent when the device doesn't transition in the timeout.
>>
>> Put another way, this doesn't say why 15 seconds is bad, if every device
>> only needs 2, given that wait_event_timeout() doesn't always wait for
>> the entire timeout value if the event occurs earlier.
> 
> Yes, right that deserves an explanation: depending on the platform and
> the suspend type (deep, s2idle), the PCI device may or may not lose
> power. In case power is maintained, there is no problem and the
> controller is successfully moved to M0. But in case of power loss, the
> device is going to restart, and MHI resuming is going to timeout and
> fail since M0 will never be reached. On PCI side we simply
> reinitialize the controller in case of resume failure. So in other
> words, MHI resume is expected to fail in some cases and it should be
> handled with minimal impact on the system.

Can we detect the power loss in far less than 2 seconds, and abort the 
resume process?  Waiting for the entire timeout, regardless of the 
value, in the power loss scenario you describe seems less than ideal for 
the system impact you are attempting to optimize.

Loic Poulain March 5, 2021, 3:34 p.m. UTC | #4
On Fri, 5 Mar 2021 at 16:09, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>
> On 3/5/2021 8:08 AM, Loic Poulain wrote:
> > Hi Jeffrey,
> >
> > On Fri, 5 Mar 2021 at 15:49, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
> >>
> >> On 3/5/2021 7:09 AM, Loic Poulain wrote:
> >>> mhi_cntrl->timeout_ms is set by the controller and indicates the
> >>> maximum amount of time the controller device will take to be ready.
> >>> In case of PCI modems, this value is quite high given modems can take
> >>> up to 15 seconds from cold boot to be ready.
> >>>
> >>> Reusing this value in mhi_pm_resume can cause huge resuming latency
> >>> and delay the whole system resume (in case of system wide suspend/
> >>> resume), leading to bad use experience.
> >>
> >> I think this needs more explanation.  The timeout is a maximum value.
> >> You indicate that 2 seconds is more than enough for any MHI device to
> >> exit M3 (citation needed), but 15 seconds is too much?  The difference
> >> should only be apparent when the device doesn't transition in the timeout.
> >>
> >> Put another way, this doesn't say why 15 seconds is bad, if every device
> >> only needs 2, given that wait_event_timeout() doesn't always wait for
> >> the entire timeout value if the event occurs earlier.
> >
> > Yes, right that deserves an explanation: depending on the platform and
> > the suspend type (deep, s2idle), the PCI device may or may not lose
> > power. In case power is maintained, there is no problem and the
> > controller is successfully moved to M0. But in case of power loss, the
> > device is going to restart, and MHI resuming is going to timeout and
> > fail since M0 will never be reached. On PCI side we simply
> > reinitialize the controller in case of resume failure. So in other
> > words, MHI resume is expected to fail in some cases and it should be
> > handled with minimal impact on the system.
>
> Can we detect the power loss in far less than 2 seconds, and abort the
> resume process?  Waiting for the entire timeout, regardless of the
> value, in the power loss scenario you describe seems less than ideal for
> the system impact you are attempting to optimize.

That's a good question. Something like checking that the state is M3
before trying anything might work; I need to check that.
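
A rough userspace sketch of that idea, for illustration only. It is not the follow-up change mentioned later in the thread; `sim_ctrl`, `sim_state` and `sim_resume` are hypothetical names, and the model assumes that a device which lost power no longer reports M3.

```c
#include <stdbool.h>

enum sim_state {
	SIM_STATE_RESET,	/* device restarted, e.g. after power loss */
	SIM_STATE_M0,
	SIM_STATE_M3,
};

struct sim_ctrl {
	enum sim_state dev_state;
	long waited_ms;		/* simulated time spent in the last resume */
};

/*
 * Model of a resume attempt. With check_first set, a device that is not in
 * M3 is rejected immediately. Without it, the same device is only detected
 * once the full timeout has elapsed. Returns 0 on success, -1 on failure.
 */
static int sim_resume(struct sim_ctrl *ctrl, long timeout_ms, bool check_first)
{
	ctrl->waited_ms = 0;

	if (check_first && ctrl->dev_state != SIM_STATE_M3)
		return -1;	/* nothing to resume from: abort without waiting */

	if (ctrl->dev_state == SIM_STATE_M3) {
		ctrl->dev_state = SIM_STATE_M0;	/* powered device wakes up */
		return 0;
	}

	ctrl->waited_ms = timeout_ms;	/* M0 never reached: timeout expires */
	return -1;
}
```

With the check in place, the power-loss case fails after zero simulated wait regardless of the timeout value, so the choice between 2 and 15 seconds stops affecting system resume latency.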

Regards,
Loic

Loic Poulain March 5, 2021, 4:16 p.m. UTC | #5
On Fri, 5 Mar 2021 at 16:34, Loic Poulain <loic.poulain@linaro.org> wrote:
>
> On Fri, 5 Mar 2021 at 16:09, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
> >
> > On 3/5/2021 8:08 AM, Loic Poulain wrote:
> > > Hi Jeffrey,
> > >
> > > On Fri, 5 Mar 2021 at 15:49, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
> > >>
> > >> On 3/5/2021 7:09 AM, Loic Poulain wrote:
> > >>> mhi_cntrl->timeout_ms is set by the controller and indicates the
> > >>> maximum amount of time the controller device will take to be ready.
> > >>> In case of PCI modems, this value is quite high given modems can take
> > >>> up to 15 seconds from cold boot to be ready.
> > >>>
> > >>> Reusing this value in mhi_pm_resume can cause huge resuming latency
> > >>> and delay the whole system resume (in case of system wide suspend/
> > >>> resume), leading to bad use experience.
> > >>
> > >> I think this needs more explanation.  The timeout is a maximum value.
> > >> You indicate that 2 seconds is more than enough for any MHI device to
> > >> exit M3 (citation needed), but 15 seconds is too much?  The difference
> > >> should only be apparent when the device doesn't transition in the timeout.
> > >>
> > >> Put another way, this doesn't say why 15 seconds is bad, if every device
> > >> only needs 2, given that wait_event_timeout() doesn't always wait for
> > >> the entire timeout value if the event occurs earlier.
> > >
> > > Yes, right that deserves an explanation: depending on the platform and
> > > the suspend type (deep, s2idle), the PCI device may or may not lose
> > > power. In case power is maintained, there is no problem and the
> > > controller is successfully moved to M0. But in case of power loss, the
> > > device is going to restart, and MHI resuming is going to timeout and
> > > fail since M0 will never be reached. On PCI side we simply
> > > reinitialize the controller in case of resume failure. So in other
> > > words, MHI resume is expected to fail in some cases and it should be
> > > handled with minimal impact on the system.
> >
> > Can we detect the power loss in far less than 2 seconds, and abort the
> > resume process?  Waiting for the entire timeout, regardless of the
> > value, in the power loss scenario you describe seems less than ideal for
> > the system impact you are attempting to optimize.
>
> That's a good question, like checking the state is M3 before trying
> anything, need to check that.

OK, please discard this patch. I've submitted another change that
handles this more properly.
Thanks, Jeffrey, for challenging this.

Loic

Patch

diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index 0cd6445..e2d83a9 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -17,6 +17,8 @@ 
 #include <linux/wait.h>
 #include "internal.h"
 
+#define MHI_PM_RESUME_TIMEOUT_MS 2000
+
 /*
  * Not all MHI state transitions are synchronous. Transitions like Linkdown,
  * SYS_ERR, and shutdown can happen anytime asynchronously. This function will
@@ -942,7 +944,7 @@  int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
 	ret = wait_event_timeout(mhi_cntrl->state_event,
 				 mhi_cntrl->dev_state == MHI_STATE_M0 ||
 				 MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state),
-				 msecs_to_jiffies(mhi_cntrl->timeout_ms));
+				 msecs_to_jiffies(MHI_PM_RESUME_TIMEOUT_MS));
 
 	if (!ret || MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state)) {
 		dev_err(dev,