Message ID | 20240818-pmic-glink-v6-11-races-v1-0-f87c577e0bc9@quicinc.com (mailing list archive) |
---|---|
Headers | show |
Series | soc: qcom: pmic_glink: v6.11-rc bug fixes | expand |
On Mon, 19 Aug 2024 at 04:47, Bjorn Andersson <quic_bjorande@quicinc.com> wrote: > > Amit and Johan both reported a NULL pointer dereference in the > pmic_glink client code during initialization, and Stephen Boyd pointed > out the problem (race condition). > > While investigating, and writing the fix, I noticed that > ucsi_unregister() is called in atomic context but tries to sleep, and I > also noticed that the condition for when to inform the pmic_glink client > drivers when the remote has gone down is just wrong. > > So, let's fix all three. > > As mentioned in the commit message for the UCSI fix, I have a series in > the works that makes the GLINK callback happen in a sleepable context, > which would remove the need for the clients list to be protected by a > spinlock, and removing the work scheduling. This is however not -rc > material... > > In addition to the NULL pointer dereference, there is the -ECANCELED > issue reported here: > https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ > I have not yet been able to either reproduce this or convince myself > that this is the same issue. > Thank you for the fixes Bjorn. I'm not able to reproduce that pmic_glink kernel panic on SM8550-HDK anymore. Tested-by: Amit Pundir <amit.pundir@linaro.org> > Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> > --- > Bjorn Andersson (3): > soc: qcom: pmic_glink: Fix race during initialization > usb: typec: ucsi: Move unregister out of atomic section > soc: qcom: pmic_glink: Actually communicate with remote goes down > > drivers/power/supply/qcom_battmgr.c | 16 ++++++++----- > drivers/soc/qcom/pmic_glink.c | 40 +++++++++++++++++++++---------- > drivers/soc/qcom/pmic_glink_altmode.c | 17 +++++++++----- > drivers/usb/typec/ucsi/ucsi_glink.c | 44 ++++++++++++++++++++++++++--------- > include/linux/soc/qcom/pmic_glink.h | 11 +++++---- > 5 files changed, 88 insertions(+), 40 deletions(-) > --- > base-commit: 296c871d2904cff2b4742702ef94512ab467a8e3 > change-id: 20240818-pmic-glink-v6-11-races-363f5964c339 > > Best regards, > -- > Bjorn Andersson <quic_bjorande@quicinc.com> >
On Sun, Aug 18, 2024 at 04:17:36PM -0700, Bjorn Andersson wrote: > Amit and Johan both reported a NULL pointer dereference in the > pmic_glink client code during initialization, and Stephen Boyd pointed > out the problem (race condition). > > While investigating, and writing the fix, I noticed that > ucsi_unregister() is called in atomic context but tries to sleep, and I > also noticed that the condition for when to inform the pmic_glink client > drivers when the remote has gone down is just wrong. > > So, let's fix all three. > > As mentioned in the commit message for the UCSI fix, I have a series in > the works that makes the GLINK callback happen in a sleepable context, > which would remove the need for the clients list to be protected by a > spinlock, and removing the work scheduling. This is however not -rc > material... > > In addition to the NULL pointer dereference, there is the -ECANCELED > issue reported here: > https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ > I have not yet been able to either reproduce this or convince myself > that this is the same issue. > > Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> What tree are these to go through? I can take them through mine, but if someone else wants to, feel free to route them some other way. thanks, greg k-h
On Mon, Aug 19, 2024 at 04:07:41PM +0200, Greg Kroah-Hartman wrote: > On Sun, Aug 18, 2024 at 04:17:36PM -0700, Bjorn Andersson wrote: > > Amit and Johan both reported a NULL pointer dereference in the > > pmic_glink client code during initialization, and Stephen Boyd pointed > > out the problem (race condition). > > > > While investigating, and writing the fix, I noticed that > > ucsi_unregister() is called in atomic context but tries to sleep, and I > > also noticed that the condition for when to inform the pmic_glink client > > drivers when the remote has gone down is just wrong. > > > > So, let's fix all three. > > > > As mentioned in the commit message for the UCSI fix, I have a series in > > the works that makes the GLINK callback happen in a sleepable context, > > which would remove the need for the clients list to be protected by a > > spinlock, and removing the work scheduling. This is however not -rc > > material... > > > > In addition to the NULL pointer dereference, there is the -ECANCELED > > issue reported here: > > https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ > > I have not yet been able to either reproduce this or convince myself > > that this is the same issue. > > > > Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> > > What tree are these to go through? I can take them through mine, but if > someone else wants to, feel free to route them some other way. > It's primarily soc/qcom content, so I can pick them through the qcom soc tree. Regards, Bjorn > thanks, > > greg k-h >
On Sun, Aug 18, 2024 at 04:17:36PM -0700, Bjorn Andersson wrote: > Amit and Johan both reported a NULL pointer dereference in the > pmic_glink client code during initialization, and Stephen Boyd pointed > out the problem (race condition). > In addition to the NULL pointer dereference, there is the -ECANCELED > issue reported here: > https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ > I have not yet been able to either reproduce this or convince myself > that this is the same issue. I can confirm that I still see the -ECANCELED issue with this series applied: [ 8.979329] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125) [ 9.004735] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125 Johan
On Mon, Aug 19, 2024 at 05:48:26PM +0200, Johan Hovold wrote: > On Sun, Aug 18, 2024 at 04:17:36PM -0700, Bjorn Andersson wrote: > > Amit and Johan both reported a NULL pointer dereference in the > > pmic_glink client code during initialization, and Stephen Boyd pointed > > out the problem (race condition). > > > In addition to the NULL pointer dereference, there is the -ECANCELED > > issue reported here: > > https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ > > I have not yet been able to either reproduce this or convince myself > > that this is the same issue. > > I can confirm that I still see the -ECANCELED issue with this series > applied: > > [ 8.979329] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125) > [ 9.004735] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125 > Could you confirm that you're seeing a call to qcom_glink_handle_intent_req_ack() with granted == 0, leading to the transfer failing. It would also be nice, just for completeness sake to rule out that you do not get a call to qcom_glink_intent_req_abort() here. Regards, Bjorn > Johan >
On Mon, Aug 19, 2024 at 09:53:09AM -0700, Bjorn Andersson wrote: > On Mon, Aug 19, 2024 at 05:48:26PM +0200, Johan Hovold wrote: > > I can confirm that I still see the -ECANCELED issue with this series > > applied: > > > > [ 8.979329] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125) > > [ 9.004735] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125 > > Could you confirm that you're seeing a call to > qcom_glink_handle_intent_req_ack() with granted == 0, leading to the > transfer failing. It appears so: [ 9.539415] 30000000.remoteproc:glink-edge: qcom_glink_handle_intent_req_ack - cid = 9, granted = 0 [ 9.561750] qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications [ 9.448945] 30000000.remoteproc:glink-edge: qcom_glink_handle_intent_req_ack - cid = 9, granted = 0 [ 9.461267] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125) [ 9.469241] qcom,apr 30000000.remoteproc:glink-edge.adsp_apps.-1.-1: Adding APR/GPR dev: gprsvc:service:2:1 [ 9.478968] pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125 > It would also be nice, just for completeness sake to rule out that you > do not get a call to qcom_glink_intent_req_abort() here. And I'm not seeing this function being called. Johan
Amit and Johan both reported a NULL pointer dereference in the pmic_glink client code during initialization, and Stephen Boyd pointed out the problem (race condition). While investigating, and writing the fix, I noticed that ucsi_unregister() is called in atomic context but tries to sleep, and I also noticed that the condition for when to inform the pmic_glink client drivers when the remote has gone down is just wrong. So, let's fix all three. As mentioned in the commit message for the UCSI fix, I have a series in the works that makes the GLINK callback happen in a sleepable context, which would remove the need for the clients list to be protected by a spinlock, and removing the work scheduling. This is however not -rc material... In addition to the NULL pointer dereference, there is the -ECANCELED issue reported here: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/ I have not yet been able to either reproduce this or convince myself that this is the same issue. Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com> --- Bjorn Andersson (3): soc: qcom: pmic_glink: Fix race during initialization usb: typec: ucsi: Move unregister out of atomic section soc: qcom: pmic_glink: Actually communicate with remote goes down drivers/power/supply/qcom_battmgr.c | 16 ++++++++----- drivers/soc/qcom/pmic_glink.c | 40 +++++++++++++++++++++---------- drivers/soc/qcom/pmic_glink_altmode.c | 17 +++++++++----- drivers/usb/typec/ucsi/ucsi_glink.c | 44 ++++++++++++++++++++++++++--------- include/linux/soc/qcom/pmic_glink.h | 11 +++++---- 5 files changed, 88 insertions(+), 40 deletions(-) --- base-commit: 296c871d2904cff2b4742702ef94512ab467a8e3 change-id: 20240818-pmic-glink-v6-11-races-363f5964c339 Best regards,