mbox series

[v3,0/4] Bluetooth: Improve retrying of connection attempts

Message ID 20240108224614.56900-1-verdre@v0yd.nl (mailing list archive)
Headers show
Series Bluetooth: Improve retrying of connection attempts | expand

Message

Jonas Dreßler Jan. 8, 2024, 10:46 p.m. UTC
Since commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect
requests"), the kernel supports trying to connect again in case the
bluetooth card is busy and fails to connect.

The logic that should handle this became a bit spotty over time, and also
cards these days appear to fail with more errors than just "Command
Disallowed".

This series refactores the handling of concurrent connection requests
by serializing all "Create Connection" commands for ACL connections
similar to how we do it for LE connections.

---

v1: https://lore.kernel.org/linux-bluetooth/20240102185933.64179-1-verdre@v0yd.nl/
v2: https://lore.kernel.org/linux-bluetooth/20240108183938.468426-1-verdre@v0yd.nl/
v3:
  - Move the new sync function to hci_sync.c as requested by review
  - Abort connection on failure using hci_abort_conn_sync() instead of
    hci_abort_conn()
  - Make the last commit message a bit more precise regarding the meaning
    of BT_CONNECT2 state

Jonas Dreßler (4):
  Bluetooth: Remove superfluous call to hci_conn_check_pending()
  Bluetooth: hci_event: Use HCI error defines instead of magic values
  Bluetooth: hci_conn: Only do ACL connections sequentially
  Bluetooth: Remove pending ACL connection attempts

 include/net/bluetooth/hci.h      |  3 ++
 include/net/bluetooth/hci_core.h |  1 -
 include/net/bluetooth/hci_sync.h |  3 ++
 net/bluetooth/hci_conn.c         | 83 +++-----------------------------
 net/bluetooth/hci_event.c        | 29 +++--------
 net/bluetooth/hci_sync.c         | 72 +++++++++++++++++++++++++++
 6 files changed, 93 insertions(+), 98 deletions(-)

Comments

Luiz Augusto von Dentz Jan. 9, 2024, 5:53 p.m. UTC | #1
Hi Jonas,

On Mon, Jan 8, 2024 at 5:46 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
>
> Since commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect
> requests"), the kernel supports trying to connect again in case the
> bluetooth card is busy and fails to connect.
>
> The logic that should handle this became a bit spotty over time, and also
> cards these days appear to fail with more errors than just "Command
> Disallowed".
>
> This series refactores the handling of concurrent connection requests
> by serializing all "Create Connection" commands for ACL connections
> similar to how we do it for LE connections.
>
> ---
>
> v1: https://lore.kernel.org/linux-bluetooth/20240102185933.64179-1-verdre@v0yd.nl/
> v2: https://lore.kernel.org/linux-bluetooth/20240108183938.468426-1-verdre@v0yd.nl/
> v3:
>   - Move the new sync function to hci_sync.c as requested by review
>   - Abort connection on failure using hci_abort_conn_sync() instead of
>     hci_abort_conn()
>   - Make the last commit message a bit more precise regarding the meaning
>     of BT_CONNECT2 state
>
> Jonas Dreßler (4):
>   Bluetooth: Remove superfluous call to hci_conn_check_pending()
>   Bluetooth: hci_event: Use HCI error defines instead of magic values
>   Bluetooth: hci_conn: Only do ACL connections sequentially
>   Bluetooth: Remove pending ACL connection attempts
>
>  include/net/bluetooth/hci.h      |  3 ++
>  include/net/bluetooth/hci_core.h |  1 -
>  include/net/bluetooth/hci_sync.h |  3 ++
>  net/bluetooth/hci_conn.c         | 83 +++-----------------------------
>  net/bluetooth/hci_event.c        | 29 +++--------
>  net/bluetooth/hci_sync.c         | 72 +++++++++++++++++++++++++++
>  6 files changed, 93 insertions(+), 98 deletions(-)
>
> --
> 2.43.0

After rebasing and fixing a little bit here and there, see v4, looks
like this changes is affecting the following mgmt-tester -s "Pair
Device - Power off 1":

Pair Device - Power off 1 - init
  Read Version callback
    Status: Success (0x00)
    Version 1.22
  Read Commands callback
    Status: Success (0x00)
  Read Index List callback
    Status: Success (0x00)
  Index Added callback
    Index: 0x0000
  Enable management Mesh interface
  Enabling Mesh feature
  Read Info callback
    Status: Success (0x00)
    Address: 00:AA:01:00:00:00
    Version: 0x09
    Manufacturer: 0x05f1
    Supported settings: 0x0001bfff
    Current settings: 0x00000080
    Class: 0x000000
    Name:
    Short name:
  Mesh feature is enabled
Pair Device - Power off 1 - setup
  Setup sending Set Bondable (0x0009)
  Setup sending Set Powered (0x0005)
  Initial settings completed
  Test setup condition added, total 1
  Client set connectable: Success (0x00)
  Test setup condition complete, 0 left
Pair Device - Power off 1 - setup complete
Pair Device - Power off 1 - run
  Sending Pair Device (0x0019)
Bluetooth: hci0: command 0x0405 tx timeout
Bluetooth: hci0: command 0x0408 tx timeout
  Test condition added, total 1
Pair Device - Power off 1 - test timed out
  Pair Device (0x0019): Disconnected (0x0e)
Pair Device - Power off 1 - test not run
Pair Device - Power off 1 - teardown
Pair Device - Power off 1 - teardown
  Index Removed callback
    Index: 0x0000
Pair Device - Power off 1 - teardown complete
Pair Device - Power off 1 - done
patchwork-bot+bluetooth@kernel.org Jan. 9, 2024, 7:20 p.m. UTC | #2
Hello:

This series was applied to bluetooth/bluetooth-next.git (master)
by Luiz Augusto von Dentz <luiz.von.dentz@intel.com>:

On Mon,  8 Jan 2024 23:46:05 +0100 you wrote:
> Since commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect
> requests"), the kernel supports trying to connect again in case the
> bluetooth card is busy and fails to connect.
> 
> The logic that should handle this became a bit spotty over time, and also
> cards these days appear to fail with more errors than just "Command
> Disallowed".
> 
> [...]

Here is the summary with links:
  - [v3,1/4] Bluetooth: Remove superfluous call to hci_conn_check_pending()
    https://git.kernel.org/bluetooth/bluetooth-next/c/a7ee39bea315
  - [v3,2/4] Bluetooth: hci_event: Use HCI error defines instead of magic values
    https://git.kernel.org/bluetooth/bluetooth-next/c/f8c47ee39e6d
  - [v3,3/4] Bluetooth: hci_conn: Only do ACL connections sequentially
    (no matching commit)
  - [v3,4/4] Bluetooth: Remove pending ACL connection attempts
    (no matching commit)

You are awesome, thank you!
Jonas Dreßler Jan. 9, 2024, 9:57 p.m. UTC | #3
Hi Luiz,

On 1/9/24 18:53, Luiz Augusto von Dentz wrote:
> Hi Jonas,
> 
> On Mon, Jan 8, 2024 at 5:46 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
>>
>> Since commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect
>> requests"), the kernel supports trying to connect again in case the
>> bluetooth card is busy and fails to connect.
>>
>> The logic that should handle this became a bit spotty over time, and also
>> cards these days appear to fail with more errors than just "Command
>> Disallowed".
>>
>> This series refactores the handling of concurrent connection requests
>> by serializing all "Create Connection" commands for ACL connections
>> similar to how we do it for LE connections.
>>
>> ---
>>
>> v1: https://lore.kernel.org/linux-bluetooth/20240102185933.64179-1-verdre@v0yd.nl/
>> v2: https://lore.kernel.org/linux-bluetooth/20240108183938.468426-1-verdre@v0yd.nl/
>> v3:
>>    - Move the new sync function to hci_sync.c as requested by review
>>    - Abort connection on failure using hci_abort_conn_sync() instead of
>>      hci_abort_conn()
>>    - Make the last commit message a bit more precise regarding the meaning
>>      of BT_CONNECT2 state
>>
>> Jonas Dreßler (4):
>>    Bluetooth: Remove superfluous call to hci_conn_check_pending()
>>    Bluetooth: hci_event: Use HCI error defines instead of magic values
>>    Bluetooth: hci_conn: Only do ACL connections sequentially
>>    Bluetooth: Remove pending ACL connection attempts
>>
>>   include/net/bluetooth/hci.h      |  3 ++
>>   include/net/bluetooth/hci_core.h |  1 -
>>   include/net/bluetooth/hci_sync.h |  3 ++
>>   net/bluetooth/hci_conn.c         | 83 +++-----------------------------
>>   net/bluetooth/hci_event.c        | 29 +++--------
>>   net/bluetooth/hci_sync.c         | 72 +++++++++++++++++++++++++++
>>   6 files changed, 93 insertions(+), 98 deletions(-)
>>
>> --
>> 2.43.0
> 
> After rebasing and fixing a little bit here and there, see v4, looks
> like this changes is affecting the following mgmt-tester -s "Pair
> Device - Power off 1":
> 
> Pair Device - Power off 1 - init
>    Read Version callback
>      Status: Success (0x00)
>      Version 1.22
>    Read Commands callback
>      Status: Success (0x00)
>    Read Index List callback
>      Status: Success (0x00)
>    Index Added callback
>      Index: 0x0000
>    Enable management Mesh interface
>    Enabling Mesh feature
>    Read Info callback
>      Status: Success (0x00)
>      Address: 00:AA:01:00:00:00
>      Version: 0x09
>      Manufacturer: 0x05f1
>      Supported settings: 0x0001bfff
>      Current settings: 0x00000080
>      Class: 0x000000
>      Name:
>      Short name:
>    Mesh feature is enabled
> Pair Device - Power off 1 - setup
>    Setup sending Set Bondable (0x0009)
>    Setup sending Set Powered (0x0005)
>    Initial settings completed
>    Test setup condition added, total 1
>    Client set connectable: Success (0x00)
>    Test setup condition complete, 0 left
> Pair Device - Power off 1 - setup complete
> Pair Device - Power off 1 - run
>    Sending Pair Device (0x0019)
> Bluetooth: hci0: command 0x0405 tx timeout
> Bluetooth: hci0: command 0x0408 tx timeout
>    Test condition added, total 1
> Pair Device - Power off 1 - test timed out
>    Pair Device (0x0019): Disconnected (0x0e)
> Pair Device - Power off 1 - test not run
> Pair Device - Power off 1 - teardown
> Pair Device - Power off 1 - teardown
>    Index Removed callback
>      Index: 0x0000
> Pair Device - Power off 1 - teardown complete
> Pair Device - Power off 1 - done
> 

Thanks for landing the first two commits!

I think this is actually the same issue causing the test failure
as in the other issue I had:
https://lore.kernel.org/linux-bluetooth/7cee4e74-3a0c-4b7c-9984-696e646160f8@v0yd.nl/

It seems that the emulator is unable to reply to HCI commands sent
from the hci_sync machinery, possibly because that is sending things
on a separate thread?

Cheers,
Jonas
Jonas Dreßler Jan. 24, 2024, 4:17 p.m. UTC | #4
Hi Luiz,

On 1/9/24 10:57 PM, Jonas Dreßler wrote:
> Hi Luiz,
> 
> On 1/9/24 18:53, Luiz Augusto von Dentz wrote:
>> Hi Jonas,
>>
>> On Mon, Jan 8, 2024 at 5:46 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
>>>
>>> Since commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect
>>> requests"), the kernel supports trying to connect again in case the
>>> bluetooth card is busy and fails to connect.
>>>
>>> The logic that should handle this became a bit spotty over time, and also
>>> cards these days appear to fail with more errors than just "Command
>>> Disallowed".
>>>
>>> This series refactores the handling of concurrent connection requests
>>> by serializing all "Create Connection" commands for ACL connections
>>> similar to how we do it for LE connections.
>>>
>>> ---
>>>
>>> v1: https://lore.kernel.org/linux-bluetooth/20240102185933.64179-1-verdre@v0yd.nl/
>>> v2: https://lore.kernel.org/linux-bluetooth/20240108183938.468426-1-verdre@v0yd.nl/
>>> v3:
>>>    - Move the new sync function to hci_sync.c as requested by review
>>>    - Abort connection on failure using hci_abort_conn_sync() instead of
>>>      hci_abort_conn()
>>>    - Make the last commit message a bit more precise regarding the meaning
>>>      of BT_CONNECT2 state
>>>
>>> Jonas Dreßler (4):
>>>    Bluetooth: Remove superfluous call to hci_conn_check_pending()
>>>    Bluetooth: hci_event: Use HCI error defines instead of magic values
>>>    Bluetooth: hci_conn: Only do ACL connections sequentially
>>>    Bluetooth: Remove pending ACL connection attempts
>>>
>>>   include/net/bluetooth/hci.h      |  3 ++
>>>   include/net/bluetooth/hci_core.h |  1 -
>>>   include/net/bluetooth/hci_sync.h |  3 ++
>>>   net/bluetooth/hci_conn.c         | 83 +++-----------------------------
>>>   net/bluetooth/hci_event.c        | 29 +++--------
>>>   net/bluetooth/hci_sync.c         | 72 +++++++++++++++++++++++++++
>>>   6 files changed, 93 insertions(+), 98 deletions(-)
>>>
>>> -- 
>>> 2.43.0
>>
>> After rebasing and fixing a little bit here and there, see v4, looks
>> like this changes is affecting the following mgmt-tester -s "Pair
>> Device - Power off 1":
>>
>> Pair Device - Power off 1 - init
>>    Read Version callback
>>      Status: Success (0x00)
>>      Version 1.22
>>    Read Commands callback
>>      Status: Success (0x00)
>>    Read Index List callback
>>      Status: Success (0x00)
>>    Index Added callback
>>      Index: 0x0000
>>    Enable management Mesh interface
>>    Enabling Mesh feature
>>    Read Info callback
>>      Status: Success (0x00)
>>      Address: 00:AA:01:00:00:00
>>      Version: 0x09
>>      Manufacturer: 0x05f1
>>      Supported settings: 0x0001bfff
>>      Current settings: 0x00000080
>>      Class: 0x000000
>>      Name:
>>      Short name:
>>    Mesh feature is enabled
>> Pair Device - Power off 1 - setup
>>    Setup sending Set Bondable (0x0009)
>>    Setup sending Set Powered (0x0005)
>>    Initial settings completed
>>    Test setup condition added, total 1
>>    Client set connectable: Success (0x00)
>>    Test setup condition complete, 0 left
>> Pair Device - Power off 1 - setup complete
>> Pair Device - Power off 1 - run
>>    Sending Pair Device (0x0019)
>> Bluetooth: hci0: command 0x0405 tx timeout
>> Bluetooth: hci0: command 0x0408 tx timeout
>>    Test condition added, total 1
>> Pair Device - Power off 1 - test timed out
>>    Pair Device (0x0019): Disconnected (0x0e)
>> Pair Device - Power off 1 - test not run
>> Pair Device - Power off 1 - teardown
>> Pair Device - Power off 1 - teardown
>>    Index Removed callback
>>      Index: 0x0000
>> Pair Device - Power off 1 - teardown complete
>> Pair Device - Power off 1 - done
>>
> 
> Thanks for landing the first two commits!
> 
> I think this is actually the same issue causing the test failure
> as in the other issue I had:
> https://lore.kernel.org/linux-bluetooth/7cee4e74-3a0c-4b7c-9984-696e646160f8@v0yd.nl/
> 
> It seems that the emulator is unable to reply to HCI commands sent
> from the hci_sync machinery, possibly because that is sending things
> on a separate thread?

Okay I did some further digging now: Turns out this actually not a problem
with vhci and the emulator, but (in this test case) it's actually intended
that there's the command times out, because force_power_off is TRUE for
this test case, and the HCI device gets shut down right after sending the MGMT
command.

The test broke because the "Command Complete" MGMT event comes back with status
"Disconnected" instead of "Not Powered": The reason for that is the
hci_abort_conn_sync() that I added in the case where the "Create Connection" HCI
times out. hci_abort_conn_sync() calls hci_conn_failed() with
HCI_ERROR_LOCAL_HOST_TERM as expected, this in turn calls the hci_connect_cfm()
callback (pairing_complete_cb), and there we we look up HCI_ERROR_LOCAL_HOST_TERM
in mgmt_status_table, ending up with MGMT_STATUS_DISCONNECTED.

When I remove the hci_abort_conn_sync() we get the "Not Powered" failure again,
I'm not exactly sure why that happens (I assume there's some kind of generic mgmt
failure return handler that checks hdev_is_powered() and then sets the error).

So the question now is do we want to adjust the test (and possibly bluetoothd?)
to expect "Disconnected" instead of "Not Powered", or should I get rid of the
hci_abort_conn_sync() again? Fwiw, in hci_le_create_conn_sync() we also clean
up like this on ETIMEDOUT (maybe the spec is just different there?), so
consistency wise it seems better to adjust the test to expect "Disconnected".

Cheers,
Jonas

> 
> Cheers,
> Jonas
Luiz Augusto von Dentz Jan. 24, 2024, 4:34 p.m. UTC | #5
Hi Jonas,

On Wed, Jan 24, 2024 at 11:17 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
>
> Hi Luiz,
>
> On 1/9/24 10:57 PM, Jonas Dreßler wrote:
> > Hi Luiz,
> >
> > On 1/9/24 18:53, Luiz Augusto von Dentz wrote:
> >> Hi Jonas,
> >>
> >> On Mon, Jan 8, 2024 at 5:46 PM Jonas Dreßler <verdre@v0yd.nl> wrote:
> >>>
> >>> Since commit 4c67bc74f016 ("[Bluetooth] Support concurrent connect
> >>> requests"), the kernel supports trying to connect again in case the
> >>> bluetooth card is busy and fails to connect.
> >>>
> >>> The logic that should handle this became a bit spotty over time, and also
> >>> cards these days appear to fail with more errors than just "Command
> >>> Disallowed".
> >>>
> >>> This series refactores the handling of concurrent connection requests
> >>> by serializing all "Create Connection" commands for ACL connections
> >>> similar to how we do it for LE connections.
> >>>
> >>> ---
> >>>
> >>> v1: https://lore.kernel.org/linux-bluetooth/20240102185933.64179-1-verdre@v0yd.nl/
> >>> v2: https://lore.kernel.org/linux-bluetooth/20240108183938.468426-1-verdre@v0yd.nl/
> >>> v3:
> >>>    - Move the new sync function to hci_sync.c as requested by review
> >>>    - Abort connection on failure using hci_abort_conn_sync() instead of
> >>>      hci_abort_conn()
> >>>    - Make the last commit message a bit more precise regarding the meaning
> >>>      of BT_CONNECT2 state
> >>>
> >>> Jonas Dreßler (4):
> >>>    Bluetooth: Remove superfluous call to hci_conn_check_pending()
> >>>    Bluetooth: hci_event: Use HCI error defines instead of magic values
> >>>    Bluetooth: hci_conn: Only do ACL connections sequentially
> >>>    Bluetooth: Remove pending ACL connection attempts
> >>>
> >>>   include/net/bluetooth/hci.h      |  3 ++
> >>>   include/net/bluetooth/hci_core.h |  1 -
> >>>   include/net/bluetooth/hci_sync.h |  3 ++
> >>>   net/bluetooth/hci_conn.c         | 83 +++-----------------------------
> >>>   net/bluetooth/hci_event.c        | 29 +++--------
> >>>   net/bluetooth/hci_sync.c         | 72 +++++++++++++++++++++++++++
> >>>   6 files changed, 93 insertions(+), 98 deletions(-)
> >>>
> >>> --
> >>> 2.43.0
> >>
> >> After rebasing and fixing a little bit here and there, see v4, looks
> >> like this changes is affecting the following mgmt-tester -s "Pair
> >> Device - Power off 1":
> >>
> >> Pair Device - Power off 1 - init
> >>    Read Version callback
> >>      Status: Success (0x00)
> >>      Version 1.22
> >>    Read Commands callback
> >>      Status: Success (0x00)
> >>    Read Index List callback
> >>      Status: Success (0x00)
> >>    Index Added callback
> >>      Index: 0x0000
> >>    Enable management Mesh interface
> >>    Enabling Mesh feature
> >>    Read Info callback
> >>      Status: Success (0x00)
> >>      Address: 00:AA:01:00:00:00
> >>      Version: 0x09
> >>      Manufacturer: 0x05f1
> >>      Supported settings: 0x0001bfff
> >>      Current settings: 0x00000080
> >>      Class: 0x000000
> >>      Name:
> >>      Short name:
> >>    Mesh feature is enabled
> >> Pair Device - Power off 1 - setup
> >>    Setup sending Set Bondable (0x0009)
> >>    Setup sending Set Powered (0x0005)
> >>    Initial settings completed
> >>    Test setup condition added, total 1
> >>    Client set connectable: Success (0x00)
> >>    Test setup condition complete, 0 left
> >> Pair Device - Power off 1 - setup complete
> >> Pair Device - Power off 1 - run
> >>    Sending Pair Device (0x0019)
> >> Bluetooth: hci0: command 0x0405 tx timeout
> >> Bluetooth: hci0: command 0x0408 tx timeout
> >>    Test condition added, total 1
> >> Pair Device - Power off 1 - test timed out
> >>    Pair Device (0x0019): Disconnected (0x0e)
> >> Pair Device - Power off 1 - test not run
> >> Pair Device - Power off 1 - teardown
> >> Pair Device - Power off 1 - teardown
> >>    Index Removed callback
> >>      Index: 0x0000
> >> Pair Device - Power off 1 - teardown complete
> >> Pair Device - Power off 1 - done
> >>
> >
> > Thanks for landing the first two commits!
> >
> > I think this is actually the same issue causing the test failure
> > as in the other issue I had:
> > https://lore.kernel.org/linux-bluetooth/7cee4e74-3a0c-4b7c-9984-696e646160f8@v0yd.nl/
> >
> > It seems that the emulator is unable to reply to HCI commands sent
> > from the hci_sync machinery, possibly because that is sending things
> > on a separate thread?
>
> Okay I did some further digging now: Turns out this actually not a problem
> with vhci and the emulator, but (in this test case) it's actually intended
> that there's the command times out, because force_power_off is TRUE for
> this test case, and the HCI device gets shut down right after sending the MGMT
> command.
>
> The test broke because the "Command Complete" MGMT event comes back with status
> "Disconnected" instead of "Not Powered": The reason for that is the
> hci_abort_conn_sync() that I added in the case where the "Create Connection" HCI
> times out. hci_abort_conn_sync() calls hci_conn_failed() with
> HCI_ERROR_LOCAL_HOST_TERM as expected, this in turn calls the hci_connect_cfm()
> callback (pairing_complete_cb), and there we we look up HCI_ERROR_LOCAL_HOST_TERM
> in mgmt_status_table, ending up with MGMT_STATUS_DISCONNECTED.
>
> When I remove the hci_abort_conn_sync() we get the "Not Powered" failure again,
> I'm not exactly sure why that happens (I assume there's some kind of generic mgmt
> failure return handler that checks hdev_is_powered() and then sets the error).
>
> So the question now is do we want to adjust the test (and possibly bluetoothd?)
> to expect "Disconnected" instead of "Not Powered", or should I get rid of the
> hci_abort_conn_sync() again? Fwiw, in hci_le_create_conn_sync() we also clean
> up like this on ETIMEDOUT (maybe the spec is just different there?), so
> consistency wise it seems better to adjust the test to expect "Disconnected".

Great that you find time to dig into this, and yes I think it is fine
to expect a different error if in the process we clean up using
hci_abort_conn_sync we just need to make sure nothing else is affected
by this change.

> Cheers,
> Jonas
>
> >
> > Cheers,
> > Jonas