[v2,2/2] soc: qcom: pmic_glink: Handle GLINK intent allocation rejections

Message ID 20241023-pmic-glink-ecancelled-v2-2-ebc268129407@oss.qualcomm.com (mailing list archive)
State New
Series soc: qcom: pmic_glink: Resolve failures to bring up pmic_glink

Commit Message

Bjorn Andersson Oct. 23, 2024, 5:24 p.m. UTC
Some versions of the pmic_glink firmware do not allow dynamic GLINK
intent allocations; attempting to send a message before the firmware has
allocated its receive buffers and announced these intent allocations
will fail. When this happens, something like this shows up in the log:

    pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
    pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
    ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
    qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications

GLINK has been updated to distinguish between the cases where the remote
is going down (-ECANCELED) and the intent allocation being rejected
(-EAGAIN).

Retry the send until intent buffers become available, or an actual
error occurs.

To avoid infinitely waiting for the firmware in the event that this
misbehaves and no intents arrive, an arbitrary 5 second timeout is
used.

This patch was developed with input from Chris Lew.

Reported-by: Johan Hovold <johan@kernel.org>
Closes: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/#t
Cc: stable@vger.kernel.org # rpmsg: glink: Handle rejected intent request better
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver")
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
---
 drivers/soc/qcom/pmic_glink.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)
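
The distinction the commit message relies on can be written as a small predicate. This is only a sketch: the helper name pmic_glink_send_should_retry() is hypothetical and does not appear in the patch, which open-codes the check. On Linux, ECANCELED is 125, which is the -125 seen in the log lines above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* The two cases must be distinguishable for the retry logic to make sense. */
static_assert(EAGAIN != ECANCELED, "EAGAIN and ECANCELED must differ");

/*
 * Hypothetical helper, not part of the patch: decide whether a failed
 * send is worth retrying.
 */
static bool pmic_glink_send_should_retry(int ret)
{
	/*
	 * -EAGAIN: the intent allocation was rejected; the firmware may
	 * still announce receive buffers later, so retrying can succeed.
	 * Anything else, including -ECANCELED (remote going down), is
	 * treated as final.
	 */
	return ret == -EAGAIN;
}
```

Only -EAGAIN leads to another attempt; success and every other error are returned to the caller unchanged.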

Comments

Chris Lew Oct. 23, 2024, 10:26 p.m. UTC | #1
On 10/23/2024 10:24 AM, Bjorn Andersson wrote:
> Some versions of the pmic_glink firmware do not allow dynamic GLINK
> intent allocations; attempting to send a message before the firmware has
> allocated its receive buffers and announced these intent allocations
> will fail. When this happens, something like this shows up in the log:
> 
>      pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125)
>      pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125
>      ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125
>      qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications
> 
> GLINK has been updated to distinguish between the cases where the remote
> is going down (-ECANCELED) and the intent allocation being rejected
> (-EAGAIN).
> 
> Retry the send until intent buffers become available, or an actual
> error occurs.
> 
> To avoid infinitely waiting for the firmware in the event that this
> misbehaves and no intents arrive, an arbitrary 5 second timeout is
> used.
> 
> This patch was developed with input from Chris Lew.
> 
> Reported-by: Johan Hovold <johan@kernel.org>
> Closes: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/#t
> Cc: stable@vger.kernel.org # rpmsg: glink: Handle rejected intent request better
> Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver")
> Tested-by: Johan Hovold <johan+linaro@kernel.org>
> Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
> Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
> ---

Reviewed-by: Chris Lew <quic_clew@quicinc.com>

Johan Hovold Oct. 24, 2024, 6:39 a.m. UTC | #2
On Wed, Oct 23, 2024 at 05:24:33PM +0000, Bjorn Andersson wrote:
> Some versions of the pmic_glink firmware do not allow dynamic GLINK
> intent allocations; attempting to send a message before the firmware has
> allocated its receive buffers and announced these intent allocations
> will fail.

> Retry the send until intent buffers become available, or an actual
> error occurs.

> Reported-by: Johan Hovold <johan@kernel.org>
> Closes: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/#t
> Cc: stable@vger.kernel.org # rpmsg: glink: Handle rejected intent request better
> Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver")
> Tested-by: Johan Hovold <johan+linaro@kernel.org>
> Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
> Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>

Thanks for the update. Still works as intended here.

>  int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len)
>  {
>  	struct pmic_glink *pg = client->pg;
> +	bool timeout_reached = false;
> +	unsigned long start;
>  	int ret;
>  
>  	mutex_lock(&pg->state_lock);
> -	if (!pg->ept)
> +	if (!pg->ept) {
>  		ret = -ECONNRESET;
> -	else
> -		ret = rpmsg_send(pg->ept, data, len);
> +	} else {
> +		start = jiffies;
> +		for (;;) {
> +			ret = rpmsg_send(pg->ept, data, len);
> +			if (ret != -EAGAIN)
> +				break;
> +
> +			if (timeout_reached) {
> +				ret = -ETIMEDOUT;
> +				break;
> +			}
> +
> +			usleep_range(1000, 5000);

I ran some quick tests of this patch this morning (reproducing the issue
five times), and with the above delay it seems a single resend is
enough. Dropping the delay I once hit:

[    8.723479] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
[    8.723877] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
[    8.723921] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
[    8.723951] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
[    8.723981] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
[    8.724010] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
[    8.724046] qcom_pmic_glink pmic-glink: pmic_glink_send - resend

which seems to suggest that a one millisecond sleep is sufficient for
the currently observed issue.

It would still mean up to 5k calls if you ever tried to send too large a
buffer or similar and spun here for five seconds, however. Perhaps
nothing to worry about at this point, but increasing the delay or
lowering the timeout could be considered.

> +			timeout_reached = time_after(jiffies, start + PMIC_GLINK_SEND_TIMEOUT);
> +		}
> +	}
>  	mutex_unlock(&pg->state_lock);
>  
>  	return ret;

Johan

Bjorn Andersson Oct. 24, 2024, 5:49 p.m. UTC | #3
On Thu, Oct 24, 2024 at 08:39:25AM GMT, Johan Hovold wrote:
> On Wed, Oct 23, 2024 at 05:24:33PM +0000, Bjorn Andersson wrote:
> > Some versions of the pmic_glink firmware do not allow dynamic GLINK
> > intent allocations; attempting to send a message before the firmware has
> > allocated its receive buffers and announced these intent allocations
> > will fail.
> 
> > Retry the send until intent buffers become available, or an actual
> > error occurs.
> 
> > Reported-by: Johan Hovold <johan@kernel.org>
> > Closes: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/#t
> > Cc: stable@vger.kernel.org # rpmsg: glink: Handle rejected intent request better
> > Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver")
> > Tested-by: Johan Hovold <johan+linaro@kernel.org>
> > Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
> > Signed-off-by: Bjorn Andersson <bjorn.andersson@oss.qualcomm.com>
> 
> Thanks for the update. Still works as intended here.
> 

Thanks for the confirmation.

> >  int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len)
> >  {
> >  	struct pmic_glink *pg = client->pg;
> > +	bool timeout_reached = false;
> > +	unsigned long start;
> >  	int ret;
> >  
> >  	mutex_lock(&pg->state_lock);
> > -	if (!pg->ept)
> > +	if (!pg->ept) {
> >  		ret = -ECONNRESET;
> > -	else
> > -		ret = rpmsg_send(pg->ept, data, len);
> > +	} else {
> > +		start = jiffies;
> > +		for (;;) {
> > +			ret = rpmsg_send(pg->ept, data, len);
> > +			if (ret != -EAGAIN)
> > +				break;
> > +
> > +			if (timeout_reached) {
> > +				ret = -ETIMEDOUT;
> > +				break;
> > +			}
> > +
> > +			usleep_range(1000, 5000);
> 
> I ran some quick tests of this patch this morning (reproducing the issue
> five times), and with the above delay it seems a single resend is
> enough. Dropping the delay I once hit:
> 
> [    8.723479] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
> [    8.723877] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
> [    8.723921] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
> [    8.723951] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
> [    8.723981] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
> [    8.724010] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
> [    8.724046] qcom_pmic_glink pmic-glink: pmic_glink_send - resend
> 
> which seems to suggest that a one millisecond sleep is sufficient for
> the currently observed issue.
> 
> It would still mean up to 5k calls if you ever tried to send too large a
> buffer or similar and spun here for five seconds, however. Perhaps
> nothing to worry about at this point, but increasing the delay or
> lowering the timeout could be considered.
> 

I did consider this as well, but this code path is specific to
pmic-glink, so we shouldn't have any messages of a size the other side
doesn't expect...

If we do, then let's fix that. If I'm wrong in my assumptions, I'd be
happy to see this corrected, without my arbitrarily chosen timeout
values.

Thanks,
Bjorn

> > +			timeout_reached = time_after(jiffies, start + PMIC_GLINK_SEND_TIMEOUT);
> > +		}
> > +	}
> >  	mutex_unlock(&pg->state_lock);
> >  
> >  	return ret;
> 
> Johan
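
The "up to 5k calls" figure raised in the review can be checked with simple arithmetic: usleep_range(1000, 5000) sleeps between 1 ms and 5 ms per iteration, and PMIC_GLINK_SEND_TIMEOUT is 5 seconds. The sketch below assumes the time spent inside rpmsg_send() is negligible; max_attempts() and the *_US macros are illustrative names, not part of the patch.

```c
#include <assert.h>

/* Values taken from the patch, expressed in microseconds. */
#define SEND_TIMEOUT_US	(5L * 1000 * 1000)	/* PMIC_GLINK_SEND_TIMEOUT = 5 * HZ */
#define SLEEP_MIN_US	1000L			/* lower bound of usleep_range() */
#define SLEEP_MAX_US	5000L			/* upper bound of usleep_range() */

/*
 * Hypothetical helper: approximate number of sleep-and-retry rounds
 * that fit in the timeout for a given sleep length per round.
 */
static long max_attempts(long timeout_us, long sleep_us)
{
	assert(sleep_us > 0);
	return timeout_us / sleep_us;
}
```

With these values, max_attempts(SEND_TIMEOUT_US, SLEEP_MIN_US) gives 5000 and max_attempts(SEND_TIMEOUT_US, SLEEP_MAX_US) gives 1000, so a send that keeps returning -EAGAIN would be retried roughly 1000 to 5000 times before -ETIMEDOUT is returned.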

Patch

diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
index 9606222993fd78e80d776ea299cad024a0197e91..baa4ac6704a901661d1055c5caeaab61dc315795 100644
--- a/drivers/soc/qcom/pmic_glink.c
+++ b/drivers/soc/qcom/pmic_glink.c
@@ -4,6 +4,7 @@ 
  * Copyright (c) 2022, Linaro Ltd
  */
 #include <linux/auxiliary_bus.h>
+#include <linux/delay.h>
 #include <linux/module.h>
 #include <linux/of.h>
 #include <linux/platform_device.h>
@@ -13,6 +14,8 @@ 
 #include <linux/soc/qcom/pmic_glink.h>
 #include <linux/spinlock.h>
 
+#define PMIC_GLINK_SEND_TIMEOUT (5 * HZ)
+
 enum {
 	PMIC_GLINK_CLIENT_BATT = 0,
 	PMIC_GLINK_CLIENT_ALTMODE,
@@ -112,13 +115,29 @@  EXPORT_SYMBOL_GPL(pmic_glink_client_register);
 int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len)
 {
 	struct pmic_glink *pg = client->pg;
+	bool timeout_reached = false;
+	unsigned long start;
 	int ret;
 
 	mutex_lock(&pg->state_lock);
-	if (!pg->ept)
+	if (!pg->ept) {
 		ret = -ECONNRESET;
-	else
-		ret = rpmsg_send(pg->ept, data, len);
+	} else {
+		start = jiffies;
+		for (;;) {
+			ret = rpmsg_send(pg->ept, data, len);
+			if (ret != -EAGAIN)
+				break;
+
+			if (timeout_reached) {
+				ret = -ETIMEDOUT;
+				break;
+			}
+
+			usleep_range(1000, 5000);
+			timeout_reached = time_after(jiffies, start + PMIC_GLINK_SEND_TIMEOUT);
+		}
+	}
 	mutex_unlock(&pg->state_lock);
 
 	return ret;
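
The control flow of the retry loop can be exercised outside the kernel with a user-space sketch. Everything prefixed with fake_ is a hypothetical stand-in (a tick counter instead of jiffies, a stub instead of rpmsg_send(), a counter bump instead of usleep_range()), and tick_after() mirrors the wraparound-safe comparison idea behind the kernel's time_after(). It is meant as an illustration of the loop's behavior under those assumptions, not as the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Wraparound-safe "a is after b", the same idea as the kernel's
 * time_after(). Relies on the usual two's complement conversion.
 */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical stand-ins for the kernel environment. */
static unsigned long fake_jiffies;	/* advanced by fake_sleep() */
static int sends_until_ready;		/* number of -EAGAIN results to return */
static int send_calls;			/* how often the stub was called */

static int fake_rpmsg_send(void)
{
	send_calls++;
	if (sends_until_ready > 0) {
		sends_until_ready--;
		return -EAGAIN;	/* intent not yet available */
	}
	return 0;
}

static void fake_sleep(void)
{
	fake_jiffies++;		/* one tick passes per sleep */
}

/* Same structure as the loop added to pmic_glink_send(). */
static int send_with_retry(unsigned long timeout_ticks)
{
	bool timeout_reached = false;
	unsigned long start = fake_jiffies;
	int ret;

	for (;;) {
		ret = fake_rpmsg_send();
		if (ret != -EAGAIN)
			break;		/* success or a real error */

		if (timeout_reached) {
			ret = -ETIMEDOUT;
			break;
		}

		fake_sleep();
		timeout_reached = tick_after(fake_jiffies, start + timeout_ticks);
	}

	return ret;
}
```

Because the timeout flag is checked only after a send has failed, the loop makes one final attempt after the deadline has passed before giving up with -ETIMEDOUT. Starting fake_jiffies close to the maximum unsigned long value is a simple way to confirm that the comparison keeps working across counter wraparound.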