mbox series

[0/5] Fix various races in UCSI

Message ID 20240320073927.1641788-1-lk@c--e.de (mailing list archive)
Headers show
Series Fix various races in UCSI | expand

Message

Christian A. Ehrhardt March 20, 2024, 7:39 a.m. UTC
Fix various races in UCSI code:
- The EVENT_PENDING bit should be cleared under the PPM lock to
  avoid spurious re-checking of the connector status.
- The initial connector change notification during init may be
  lost which can cause a stuck UCSI controller. Observed by me
  and others during resume or after module reload.
- Unsupported commands must be ACKed. This was uncovered by the
  recent change from Jameson Thies that did sent unsupported commands.
- The DELL quirk still isn't quite complete and I've found a more
  elegant way to handle this. A connector change ack _is_ accepted
  on affected systems if it is bundled with a command ack.
- If we do two consecutive resets or the controller is already
  reset at boog the second reset might complete early because the
  reset complete bit is already set. ucsi_ccg.c has a work around
  for this but it looks like an more general issue to me.

NOTE:
As a result of these individual fixes we could think about the
question if there are additional cases where we send some type
of command to the PPM while the bit that indicates its completion
is already set in CCI. And in fact there is one more case where
this can happen: The ack command that clears the connector change
is sent directly after the ack command for the previous command.
It might be possible to simply ack the connector change along with
the first command ucsi_handle_connector_change() and not at the
end. AFAICS the connector lock should protect us from races that
might arise out of this.

Christian A. Ehrhardt (5):
  usb: typec: ucsi: Clear EVENT_PENDING under PPM lock
  usb: typec: ucsi: Check for notifications after init
  usb: typec: ucsi: Ack unsupported commands
  usb: typec: ucsi_acpi: Refactor and fix DELL quirk
  usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset

 drivers/usb/typec/ucsi/ucsi.c      | 56 ++++++++++++++++++++--
 drivers/usb/typec/ucsi/ucsi_acpi.c | 75 +++++++++++++-----------------
 2 files changed, 84 insertions(+), 47 deletions(-)

Comments

Kenneth Crudup March 20, 2024, 10:53 a.m. UTC | #1
Applied (cleanly) onto 6.8.1; I'll be testing over the next few days, 
but a few connects/disconnects mixed in with suspend/resume cycles shows 
no obvious issues (and everything seems to work).

Dell XPS 9320, BIOS 2.10.0

-K

On 3/20/24 00:39, Christian A. Ehrhardt wrote:
> Fix various races in UCSI code:
> - The EVENT_PENDING bit should be cleared under the PPM lock to
>    avoid spurious re-checking of the connector status.
> - The initial connector change notification during init may be
>    lost which can cause a stuck UCSI controller. Observed by me
>    and others during resume or after module reload.
> - Unsupported commands must be ACKed. This was uncovered by the
>    recent change from Jameson Thies that did sent unsupported commands.
> - The DELL quirk still isn't quite complete and I've found a more
>    elegant way to handle this. A connector change ack _is_ accepted
>    on affected systems if it is bundled with a command ack.
> - If we do two consecutive resets or the controller is already
>    reset at boog the second reset might complete early because the
>    reset complete bit is already set. ucsi_ccg.c has a work around
>    for this but it looks like an more general issue to me.
> 
> NOTE:
> As a result of these individual fixes we could think about the
> question if there are additional cases where we send some type
> of command to the PPM while the bit that indicates its completion
> is already set in CCI. And in fact there is one more case where
> this can happen: The ack command that clears the connector change
> is sent directly after the ack command for the previous command.
> It might be possible to simply ack the connector change along with
> the first command ucsi_handle_connector_change() and not at the
> end. AFAICS the connector lock should protect us from races that
> might arise out of this.
> 
> Christian A. Ehrhardt (5):
>    usb: typec: ucsi: Clear EVENT_PENDING under PPM lock
>    usb: typec: ucsi: Check for notifications after init
>    usb: typec: ucsi: Ack unsupported commands
>    usb: typec: ucsi_acpi: Refactor and fix DELL quirk
>    usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
> 
>   drivers/usb/typec/ucsi/ucsi.c      | 56 ++++++++++++++++++++--
>   drivers/usb/typec/ucsi/ucsi_acpi.c | 75 +++++++++++++-----------------
>   2 files changed, 84 insertions(+), 47 deletions(-)
>
Heikki Krogerus March 22, 2024, 10:02 a.m. UTC | #2
On Wed, Mar 20, 2024 at 08:39:21AM +0100, Christian A. Ehrhardt wrote:
> Fix various races in UCSI code:
> - The EVENT_PENDING bit should be cleared under the PPM lock to
>   avoid spurious re-checking of the connector status.
> - The initial connector change notification during init may be
>   lost which can cause a stuck UCSI controller. Observed by me
>   and others during resume or after module reload.
> - Unsupported commands must be ACKed. This was uncovered by the
>   recent change from Jameson Thies that did sent unsupported commands.
> - The DELL quirk still isn't quite complete and I've found a more
>   elegant way to handle this. A connector change ack _is_ accepted
>   on affected systems if it is bundled with a command ack.
> - If we do two consecutive resets or the controller is already
>   reset at boog the second reset might complete early because the
>   reset complete bit is already set. ucsi_ccg.c has a work around
>   for this but it looks like an more general issue to me.
> 
> NOTE:
> As a result of these individual fixes we could think about the
> question if there are additional cases where we send some type
> of command to the PPM while the bit that indicates its completion
> is already set in CCI. And in fact there is one more case where
> this can happen: The ack command that clears the connector change
> is sent directly after the ack command for the previous command.
> It might be possible to simply ack the connector change along with
> the first command ucsi_handle_connector_change() and not at the
> end. AFAICS the connector lock should protect us from races that
> might arise out of this.

That sounds good to me.

thanks,
Neil Armstrong March 22, 2024, 10:57 a.m. UTC | #3
On 20/03/2024 11:53, Kenneth Crudup wrote:
> 
> Applied (cleanly) onto 6.8.1; I'll be testing over the next few days, but a few connects/disconnects mixed in with suspend/resume cycles shows no obvious issues (and everything seems to work).
> 
> Dell XPS 9320, BIOS 2.10.0
> 
> -K
> 
> On 3/20/24 00:39, Christian A. Ehrhardt wrote:
>> Fix various races in UCSI code:
>> - The EVENT_PENDING bit should be cleared under the PPM lock to
>>    avoid spurious re-checking of the connector status.
>> - The initial connector change notification during init may be
>>    lost which can cause a stuck UCSI controller. Observed by me
>>    and others during resume or after module reload.
>> - Unsupported commands must be ACKed. This was uncovered by the
>>    recent change from Jameson Thies that did sent unsupported commands.
>> - The DELL quirk still isn't quite complete and I've found a more
>>    elegant way to handle this. A connector change ack _is_ accepted
>>    on affected systems if it is bundled with a command ack.
>> - If we do two consecutive resets or the controller is already
>>    reset at boog the second reset might complete early because the
>>    reset complete bit is already set. ucsi_ccg.c has a work around
>>    for this but it looks like an more general issue to me.
>>
>> NOTE:
>> As a result of these individual fixes we could think about the
>> question if there are additional cases where we send some type
>> of command to the PPM while the bit that indicates its completion
>> is already set in CCI. And in fact there is one more case where
>> this can happen: The ack command that clears the connector change
>> is sent directly after the ack command for the previous command.
>> It might be possible to simply ack the connector change along with
>> the first command ucsi_handle_connector_change() and not at the
>> end. AFAICS the connector lock should protect us from races that
>> might arise out of this.
>>
>> Christian A. Ehrhardt (5):
>>    usb: typec: ucsi: Clear EVENT_PENDING under PPM lock
>>    usb: typec: ucsi: Check for notifications after init
>>    usb: typec: ucsi: Ack unsupported commands
>>    usb: typec: ucsi_acpi: Refactor and fix DELL quirk
>>    usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
>>
>>   drivers/usb/typec/ucsi/ucsi.c      | 56 ++++++++++++++++++++--
>>   drivers/usb/typec/ucsi/ucsi_acpi.c | 75 +++++++++++++-----------------
>>   2 files changed, 84 insertions(+), 47 deletions(-)
>>
> 

with [1] applied:

Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-HDK

[1] https://lore.kernel.org/all/20240315171836.343830-1-jthies@google.com/

Thanks,
Neil