Message ID | 20210516040953.622409-1-bjorn.andersson@linaro.org (mailing list archive) |
---|---|
State | Accepted |
Commit | 8c9b3caab3ac26db1da00b8117901640c55a69dd |
Headers | show |
Series | usb: typec: ucsi: Clear pending after acking connector change | expand |
On Sat, May 15, 2021 at 09:09:53PM -0700, Bjorn Andersson wrote: > It's possible that the interrupt handler for the UCSI driver signals a > connector changes after the handler clears the PENDING bit, but before > it has sent the acknowledge request. The result is that the handler is > invoked yet again, to ack the same connector change. > > At least some versions of the Qualcomm UCSI firmware will not handle the > second - "spurious" - acknowledgment gracefully. So make sure to not > clear the pending flag until the change is acknowledged. > > Any connector changes coming in after the acknowledgment, that would > have the pending flag incorrectly cleared, would afaict be covered by > the subsequent connector status check. > > Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing change information") > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> I'm OK with this if Bejamin does not see any problems with it. I'll wait for his comments before giving my reviewed-by tag. That workaround (commit 217504a05532) is unfortunately too fragile. I'm going to now separate the processing of the connector state from the event handler (interrupt handler). That way we should be fairly sure we don't loose any of the connector states even if an event is generated while we are still in the middle of processing the previous one(s), and at the same time be sure that we also don't confuse the firmware. So the event handler shall after that only read the connector status, schedule the unique job where it's processed and ACK the event. Nothing else. > --- > drivers/usb/typec/ucsi/ucsi.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c > index 282c3c825c13..f451ce0132a9 100644 > --- a/drivers/usb/typec/ucsi/ucsi.c > +++ b/drivers/usb/typec/ucsi/ucsi.c > @@ -694,8 +694,8 @@ static void ucsi_handle_connector_change(struct work_struct *work) > ucsi_send_command(con->ucsi, command, NULL, 0); > > /* 3. ACK connector change */ > - clear_bit(EVENT_PENDING, &ucsi->flags); > ret = ucsi_acknowledge_connector_change(ucsi); > + clear_bit(EVENT_PENDING, &ucsi->flags); > if (ret) { > dev_err(ucsi->dev, "%s: ACK failed (%d)", __func__, ret); > goto out_unlock; > -- > 2.29.2 thanks,
Hi, On Mon, May 17, 2021 at 01:03:12PM +0300, Heikki Krogerus wrote: > On Sat, May 15, 2021 at 09:09:53PM -0700, Bjorn Andersson wrote: > > It's possible that the interrupt handler for the UCSI driver signals a > > connector changes after the handler clears the PENDING bit, but before > > it has sent the acknowledge request. The result is that the handler is > > invoked yet again, to ack the same connector change. > > > > At least some versions of the Qualcomm UCSI firmware will not handle the > > second - "spurious" - acknowledgment gracefully. So make sure to not > > clear the pending flag until the change is acknowledged. > > > > Any connector changes coming in after the acknowledgment, that would > > have the pending flag incorrectly cleared, would afaict be covered by > > the subsequent connector status check. > > > > Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing change information") > > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> > > I'm OK with this if Bejamin does not see any problems with it. I'll > wait for his comments before giving my reviewed-by tag. > > That workaround (commit 217504a05532) is unfortunately too fragile. > I'm going to now separate the processing of the connector state from > the event handler (interrupt handler). That way we should be fairly > sure we don't loose any of the connector states even if an event is > generated while we are still in the middle of processing the previous > one(s), and at the same time be sure that we also don't confuse the > firmware. > > So the event handler shall after that only read the connector status, > schedule the unique job where it's processed and ACK the event. > Nothing else. Seems to be straightforward to implement. I'm attaching the patch I made for that. I think it should actually also remove the problem you are seeing. Can you test it? thanks,
On Mon, 2021-05-17 at 13:03 +0300, Heikki Krogerus wrote: > On Sat, May 15, 2021 at 09:09:53PM -0700, Bjorn Andersson wrote: > > It's possible that the interrupt handler for the UCSI driver > > signals a > > connector changes after the handler clears the PENDING bit, but > > before > > it has sent the acknowledge request. The result is that the handler > > is > > invoked yet again, to ack the same connector change. > > > > At least some versions of the Qualcomm UCSI firmware will not > > handle the > > second - "spurious" - acknowledgment gracefully. So make sure to > > not > > clear the pending flag untuntil the changeil the change is > > acknowledged. > > > > Any connector changes coming in after the acknowledgment, that > > would > > have the pending flag incorrectly cleared, would afaict be covered > > by > > the subsequent connector status check. > > > > Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing > > change information") > > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> > > I'm OK with this if Bejamin does not see any problems with it. I'll > wait for his comments before giving my reviewed-by tag. Interesting problem. I wonder if we can detect this situation and restart the worker instead of sending the ACK. As for the patch, I do believe it is safe. I agree with the assessment in the commit message, that the subsequent connector check will detect any race condition and recovers from it. Acked-By: Benjamin Berg <bberg@redhat.com> Benjamin > That workaround (commit 217504a05532) is unfortunately too fragile. > I'm going to now separate the processing of the connector state from > the event handler (interrupt handler). That way we should be fairly > sure we don't loose any of the connector states even if an event is > generated while we are still in the middle of processing the previous > one(s), and at the same time be sure that we also don't confuse the > firmware. > So the event handler shall after that only read the connector status, > schedule the unique job where it's processed and ACK the event. > Nothing else. > > > --- > > drivers/usb/typec/ucsi/ucsi.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/usb/typec/ucsi/ucsi.c > > b/drivers/usb/typec/ucsi/ucsi.c > > index 282c3c825c13..f451ce0132a9 100644 > > --- a/drivers/usb/typec/ucsi/ucsi.c > > +++ b/drivers/usb/typec/ucsi/ucsi.c > > @@ -694,8 +694,8 @@ static void ucsi_handle_connector_change(struct > > work_struct *work) > > ucsi_send_command(con->ucsi, command, NULL, 0); > > > > /* 3. ACK connector change */ > > - clear_bit(EVENT_PENDING, &ucsi->flags); > > ret = ucsi_acknowledge_connector_change(ucsi); > > + clear_bit(EVENT_PENDING, &ucsi->flags); > > if (ret) { > > dev_err(ucsi->dev, "%s: ACK failed (%d)", __func__, > > ret); > > goto out_unlock; > > -- > > 2.29.2 > > thanks, >
Hi Heikki, On Mon, 2021-05-17 at 15:15 +0300, Heikki Krogerus wrote: > On Mon, May 17, 2021 at 01:03:12PM +0300, Heikki Krogerus wrote: > > On Sat, May 15, 2021 at 09:09:53PM -0700, Bjorn Andersson wrote: > > > It's possible that the interrupt handler for the UCSI driver > > > signals a > > > connector changes after the handler clears the PENDING bit, but > > > before > > > it has sent the acknowledge request. The result is that the handler > > > is > > > invoked yet again, to ack the same connector change. > > > > > > At least some versions of the Qualcomm UCSI firmware will not > > > handle the > > > second - "spurious" - acknowledgment gracefully. So make sure to > > > not > > > clear the pending flag until the change is acknowledged. > > > > > > Any connector changes coming in after the acknowledgment, that > > > would > > > have the pending flag incorrectly cleared, would afaict be covered > > > by > > > the subsequent connector status check. > > > > > > Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing > > > change information") > > > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> > > > > I'm OK with this if Bejamin does not see any problems with it. I'll > > wait for his comments before giving my reviewed-by tag. > > > > That workaround (commit 217504a05532) is unfortunately too fragile. > > I'm going to now separate the processing of the connector state from > > the event handler (interrupt handler). That way we should be fairly > > sure we don't loose any of the connector states even if an event is > > generated while we are still in the middle of processing the previous > > one(s), and at the same time be sure that we also don't confuse the > > firmware. > > > > So the event handler shall after that only read the connector status, > > schedule the unique job where it's processed and ACK the event. > > Nothing else. > > Seems to be straightforward to implement. I'm attaching the patch I > made for that. I think it should actually also remove the problem you > are seeing. Can you test it? Hmm, I am happy to try the patch tomorrow in the scenario where I observed my race condition. Unfortunately, I don't feel it'll work. The problem that I was seeing looked like a race condition in the PPM itself, where the window is the time between the UCSI_GET_CONNECTOR_STATUS command and the subsequent ACK. For such a firmware level bug in the PPM, we need a way to detect the race condition when it happens (or get a fix for the firmware). Note that there was also some code to always sent a UCSI_GET_CAM_SUPPORTED command "to ensure the PPM does not get stuck in case it assumes we do". I have no idea where this came from or what PPMs might require this workaround. But I doubt it is a good idea to drop it. Benjamin
On Sat, May 15, 2021 at 09:09:53PM -0700, Bjorn Andersson wrote: > It's possible that the interrupt handler for the UCSI driver signals a > connector changes after the handler clears the PENDING bit, but before > it has sent the acknowledge request. The result is that the handler is > invoked yet again, to ack the same connector change. > > At least some versions of the Qualcomm UCSI firmware will not handle the > second - "spurious" - acknowledgment gracefully. So make sure to not > clear the pending flag until the change is acknowledged. > > Any connector changes coming in after the acknowledgment, that would > have the pending flag incorrectly cleared, would afaict be covered by > the subsequent connector status check. > > Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing change information") > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> > --- > drivers/usb/typec/ucsi/ucsi.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c > index 282c3c825c13..f451ce0132a9 100644 > --- a/drivers/usb/typec/ucsi/ucsi.c > +++ b/drivers/usb/typec/ucsi/ucsi.c > @@ -694,8 +694,8 @@ static void ucsi_handle_connector_change(struct work_struct *work) > ucsi_send_command(con->ucsi, command, NULL, 0); > > /* 3. ACK connector change */ > - clear_bit(EVENT_PENDING, &ucsi->flags); > ret = ucsi_acknowledge_connector_change(ucsi); > + clear_bit(EVENT_PENDING, &ucsi->flags); > if (ret) { > dev_err(ucsi->dev, "%s: ACK failed (%d)", __func__, ret); > goto out_unlock; > -- > 2.29.2
On Mon, May 17, 2021 at 02:57:28PM +0200, Benjamin Berg wrote: > Hi Heikki, > > > On Mon, 2021-05-17 at 15:15 +0300, Heikki Krogerus wrote: > > On Mon, May 17, 2021 at 01:03:12PM +0300, Heikki Krogerus wrote: > > > On Sat, May 15, 2021 at 09:09:53PM -0700, Bjorn Andersson wrote: > > > > It's possible that the interrupt handler for the UCSI driver > > > > signals a > > > > connector changes after the handler clears the PENDING bit, but > > > > before > > > > it has sent the acknowledge request. The result is that the handler > > > > is > > > > invoked yet again, to ack the same connector change. > > > > > > > > At least some versions of the Qualcomm UCSI firmware will not > > > > handle the > > > > second - "spurious" - acknowledgment gracefully. So make sure to > > > > not > > > > clear the pending flag until the change is acknowledged. > > > > > > > > Any connector changes coming in after the acknowledgment, that > > > > would > > > > have the pending flag incorrectly cleared, would afaict be covered > > > > by > > > > the subsequent connector status check. > > > > > > > > Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing > > > > change information") > > > > Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> > > > > > > I'm OK with this if Bejamin does not see any problems with it. I'll > > > wait for his comments before giving my reviewed-by tag. > > > > > > That workaround (commit 217504a05532) is unfortunately too fragile. > > > I'm going to now separate the processing of the connector state from > > > the event handler (interrupt handler). That way we should be fairly > > > sure we don't loose any of the connector states even if an event is > > > generated while we are still in the middle of processing the previous > > > one(s), and at the same time be sure that we also don't confuse the > > > firmware. > > > > > > So the event handler shall after that only read the connector status, > > > schedule the unique job where it's processed and ACK the event. > > > Nothing else. > > > > Seems to be straightforward to implement. I'm attaching the patch I > > made for that. I think it should actually also remove the problem you > > are seeing. Can you test it? > > Hmm, I am happy to try the patch tomorrow in the scenario where I > observed my race condition. > > Unfortunately, I don't feel it'll work. The problem that I was seeing > looked like a race condition in the PPM itself, where the window is the > time between the UCSI_GET_CONNECTOR_STATUS command and the subsequent > ACK. > For such a firmware level bug in the PPM, we need a way to detect the > race condition when it happens (or get a fix for the firmware). OK. Let me know does the patch bring the issue back for you. > Note that there was also some code to always sent a > UCSI_GET_CAM_SUPPORTED command "to ensure the PPM does not get stuck in > case it assumes we do". I have no idea where this came from or what > PPMs might require this workaround. But I doubt it is a good idea to > drop it. Sure. thanks,
On Tue, 2021-05-18 at 16:29 +0300, Heikki Krogerus wrote: > On Mon, May 17, 2021 at 02:57:28PM +0200, Benjamin Berg wrote: > > > > [SNIP] > > Unfortunately, I don't feel it'll work. The problem that I was > > seeing > > looked like a race condition in the PPM itself, where the window is > > the > > time between the UCSI_GET_CONNECTOR_STATUS command and the > > subsequent > > ACK. > > For such a firmware level bug in the PPM, we need a way to detect > > the > > race condition when it happens (or get a fix for the firmware). > > OK. Let me know does the patch bring the issue back for you. So, I just tried the patch, and I can occasionally reproduce the issue where "online" for the ucsi power adapter is stuck at "1" after unplugging with the patch applied. Benjamin
On Tue, May 18, 2021 at 08:04:14PM +0200, Benjamin Berg wrote: > On Tue, 2021-05-18 at 16:29 +0300, Heikki Krogerus wrote: > > On Mon, May 17, 2021 at 02:57:28PM +0200, Benjamin Berg wrote: > > > > > > [SNIP] > > > Unfortunately, I don't feel it'll work. The problem that I was > > > seeing > > > looked like a race condition in the PPM itself, where the window is > > > the > > > time between the UCSI_GET_CONNECTOR_STATUS command and the > > > subsequent > > > ACK. > > > For such a firmware level bug in the PPM, we need a way to detect > > > the > > > race condition when it happens (or get a fix for the firmware). > > > > OK. Let me know does the patch bring the issue back for you. > > So, I just tried the patch, and I can occasionally reproduce the issue > where "online" for the ucsi power adapter is stuck at "1" after > unplugging with the patch applied. Thanks for testing it. I'm still not sure that the PPM is the culprit here. I have a feeling that the problem you are seeing is caused by the workaround (bad workaround) that we have for the issue where the EC firmware does not return with the BUSY bit set in the CCI when it should in many cases. The UCSI ACPI driver has one minute timeout value for command completion because of that, which is way too long. So if the EC firmware decides to take its time before acknowledging command, the driver is stuck, and we start loosing the events... Well, I guess technically the PPM would be the culprit in the end in any case, but I'm just not sure that there is any race like you suspected. But this is off topic. I'll send you an RFC proposal what I think we could do about that. thanks,
On Wed, 2021-05-19 at 14:33 +0300, Heikki Krogerus wrote: > On Tue, May 18, 2021 at 08:04:14PM +0200, Benjamin Berg wrote: > > On Tue, 2021-05-18 at 16:29 +0300, Heikki Krogerus wrote: > > > On Mon, May 17, 2021 at 02:57:28PM +0200, Benjamin Berg wrote: > > > > > > > > [SNIP] > > > > Unfortunately, I don't feel it'll work. The problem that I was > > > > seeing > > > > looked like a race condition in the PPM itself, where the > > > > window is > > > > the > > > > time between the UCSI_GET_CONNECTOR_STATUS command and the > > > > subsequent > > > > ACK. > > > > For such a firmware level bug in the PPM, we need a way to > > > > detect > > > > the > > > > race condition when it happens (or get a fix for the firmware). > > > > > > OK. Let me know does the patch bring the issue back for you. > > > > So, I just tried the patch, and I can occasionally reproduce the > > issue > > where "online" for the ucsi power adapter is stuck at "1" after > > unplugging with the patch applied. > > Thanks for testing it. > > I'm still not sure that the PPM is the culprit here. I have a feeling > that the problem you are seeing is caused by the workaround (bad > workaround) that we have for the issue where the EC firmware does not > return with the BUSY bit set in the CCI when it should in many cases. > The UCSI ACPI driver has one minute timeout value for command > completion because of that, which is way too long. So if the EC > firmware decides to take its time before acknowledging command, the > driver is stuck, and we start loosing the events... Hmm, interesting, yes. If the PPM is sending a second change event before or after we ACK'ed the first one, and we loose this event, then that would absolutely explain the issue I am seeing. In my case, I was able to considerably increase the probability of the bug by inserting a sleep just before acknowledging the connector change. Not sure whether this is meaningful, but maybe that tells us something about who is at fault here. Benjamin > Well, I guess > technically the PPM would be the culprit in the end in any case, but > I'm just not sure that there is any race like you suspected. > > But this is off topic. I'll send you an RFC proposal what I think we > could do about that. > > > thanks, >
diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c index 282c3c825c13..f451ce0132a9 100644 --- a/drivers/usb/typec/ucsi/ucsi.c +++ b/drivers/usb/typec/ucsi/ucsi.c @@ -694,8 +694,8 @@ static void ucsi_handle_connector_change(struct work_struct *work) ucsi_send_command(con->ucsi, command, NULL, 0); /* 3. ACK connector change */ - clear_bit(EVENT_PENDING, &ucsi->flags); ret = ucsi_acknowledge_connector_change(ucsi); + clear_bit(EVENT_PENDING, &ucsi->flags); if (ret) { dev_err(ucsi->dev, "%s: ACK failed (%d)", __func__, ret); goto out_unlock;
It's possible that the interrupt handler for the UCSI driver signals a connector changes after the handler clears the PENDING bit, but before it has sent the acknowledge request. The result is that the handler is invoked yet again, to ack the same connector change. At least some versions of the Qualcomm UCSI firmware will not handle the second - "spurious" - acknowledgment gracefully. So make sure to not clear the pending flag until the change is acknowledged. Any connector changes coming in after the acknowledgment, that would have the pending flag incorrectly cleared, would afaict be covered by the subsequent connector status check. Fixes: 217504a05532 ("usb: typec: ucsi: Work around PPM losing change information") Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> --- drivers/usb/typec/ucsi/ucsi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)