Message ID | 20240214012328.BlueZ.1.I10eda6f375bc8dfedf4eef3a8cf6572c65803afc@changeid (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [BlueZ] Bluetooth: hci_core: Skip hci_cmd_work if hci_request is pending | expand |
Hi Hsin-chen, On Tue, Feb 13, 2024 at 12:24 PM Hsin-chen Chuang <chharry@chromium.org> wrote: > > hci_cmd_work overwrites the hdev->sent_cmd which contains the required > info for a hci_request to work. In the real world, it's observed that > a request from hci_le_ext_create_conn_sync could be interrupted by > the authentication (hci_conn_auth) caused by rfcomm_sock_connect. When > it happends, hci_le_ext_create_conn_sync hangs until timeout; If the > LE connection is triggered by MGMT, it freezes the whole MGMT interface. > > Signed-off-by: Hsin-chen Chuang <chharry@chromium.org> > --- > > net/bluetooth/hci_core.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > index 34c8dca2069f..e3706889976d 100644 > --- a/net/bluetooth/hci_core.c > +++ b/net/bluetooth/hci_core.c > @@ -4213,8 +4213,11 @@ static void hci_cmd_work(struct work_struct *work) > BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name, > atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q)); > > - /* Send queued commands */ > - if (atomic_read(&hdev->cmd_cnt)) { > + /* Send queued commands. Don't send the command when there is a pending > + * hci_request because the request callbacks would be overwritten. > + */ > + if (atomic_read(&hdev->cmd_cnt) && > + !hci_dev_test_flag(hdev, HCI_CMD_PENDING)) { > skb = skb_dequeue(&hdev->cmd_q); > if (!skb) > return; > -- > 2.43.0.687.g38aa6559b0-goog This seems to be causing some mgmt-tester failures: Pair Device - Sec Mode 3 Success 1 Timed out 22.753 seconds Pair Device - Sec Mode 3 Reject 1 Timed out 22.533 seconds Pair Device - Sec Mode 3 Reject 2 Timed out 22.526 seconds I think this is because we need to respond to an event with a command like: < HCI Command: Create Conn.. (0x01|0x0005) plen 13 #241 [hci0] 16:25:38.699066 Address: 00:AA:01:01:00:00 (Intel Corporation) Packet type: 0x0018 DM1 may be used DH1 may be used Page scan repetition mode: R2 (0x02) Page scan mode: Mandatory (0x00) Clock offset: 0x0000 Role switch: Allow peripheral (0x01) > HCI Event: Command Status (0x0f) plen 4 #242 [hci0] 16:25:38.701881 Create Connection (0x01|0x0005) ncmd 1 Status: Success (0x00) > HCI Event: Link Key Request (0x17) plen 6 #243 [hci0] 16:25:38.702375 Address: 00:AA:01:01:00:00 (Intel Corporation) But because Create Connection is pending we cannot respond to Link Key Request, so it is actually a design problem if we cannot send commands because something is pending so perhaps we need to redesign how we store cmd_sent so we can have multiple outstanding commands rather than just one.
Hi Hsin-chen, On Thu, Feb 15, 2024 at 11:21 PM Hsin-chen Chuang <chharry@google.com> wrote: > > +Some Googlers who would be interested in > > Hi Luiz, > > How about moving the hci_req-related data out from sent_cmd? This allows sending HCI commands while hci_req data would not be overwritten. I have something like the following in the works: https://gist.github.com/Vudentz/251275bb688fac32585f90ac0076c407 It is not stable yet, but I think we can get away with it since it just means we can keep the pending request stored in the req_skb, that said we might need to overhaul this design since it is not very clean in my opinion. > On Fri, Feb 16, 2024 at 5:37 AM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote: >> >> Hi Hsin-chen, >> >> On Tue, Feb 13, 2024 at 12:24 PM Hsin-chen Chuang <chharry@chromium.org> wrote: >> > >> > hci_cmd_work overwrites the hdev->sent_cmd which contains the required >> > info for a hci_request to work. In the real world, it's observed that >> > a request from hci_le_ext_create_conn_sync could be interrupted by >> > the authentication (hci_conn_auth) caused by rfcomm_sock_connect. When >> > it happends, hci_le_ext_create_conn_sync hangs until timeout; If the >> > LE connection is triggered by MGMT, it freezes the whole MGMT interface. >> > >> > Signed-off-by: Hsin-chen Chuang <chharry@chromium.org> >> > --- >> > >> > net/bluetooth/hci_core.c | 7 +++++-- >> > 1 file changed, 5 insertions(+), 2 deletions(-) >> > >> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c >> > index 34c8dca2069f..e3706889976d 100644 >> > --- a/net/bluetooth/hci_core.c >> > +++ b/net/bluetooth/hci_core.c >> > @@ -4213,8 +4213,11 @@ static void hci_cmd_work(struct work_struct *work) >> > BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name, >> > atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q)); >> > >> > - /* Send queued commands */ >> > - if (atomic_read(&hdev->cmd_cnt)) { >> > + /* Send queued commands. Don't send the command when there is a pending >> > + * hci_request because the request callbacks would be overwritten. >> > + */ >> > + if (atomic_read(&hdev->cmd_cnt) && >> > + !hci_dev_test_flag(hdev, HCI_CMD_PENDING)) { >> > skb = skb_dequeue(&hdev->cmd_q); >> > if (!skb) >> > return; >> > -- >> > 2.43.0.687.g38aa6559b0-goog >> >> >> This seems to be causing some mgmt-tester failures: >> >> Pair Device - Sec Mode 3 Success 1 Timed out 22.753 seconds >> Pair Device - Sec Mode 3 Reject 1 Timed out 22.533 seconds >> Pair Device - Sec Mode 3 Reject 2 Timed out 22.526 seconds >> >> I think this is because we need to respond to an event with a command like: >> >> < HCI Command: Create Conn.. (0x01|0x0005) plen 13 #241 [hci0] 16:25:38.699066 >> Address: 00:AA:01:01:00:00 (Intel Corporation) >> Packet type: 0x0018 >> DM1 may be used >> DH1 may be used >> Page scan repetition mode: R2 (0x02) >> Page scan mode: Mandatory (0x00) >> Clock offset: 0x0000 >> Role switch: Allow peripheral (0x01) >> > HCI Event: Command Status (0x0f) plen 4 #242 [hci0] 16:25:38.701881 >> Create Connection (0x01|0x0005) ncmd 1 >> Status: Success (0x00) >> > HCI Event: Link Key Request (0x17) plen 6 #243 [hci0] 16:25:38.702375 >> Address: 00:AA:01:01:00:00 (Intel Corporation) >> >> But because Create Connection is pending we cannot respond to Link Key >> Request, so it is actually a design problem if we cannot send commands >> because something is pending so perhaps we need to redesign how we >> store cmd_sent so we can have multiple outstanding commands rather >> than just one. >> >> -- >> Luiz Augusto von Dentz > > > > -- > Best Regards, > Hsin-chen
Hi Luiz, Good point. I found the hdev->req_skb is actually not read at all in the current code base, so perhaps we could remove the legacy code entirely, and reuse it for solving the problem we're facing now. This approach seems cleaner to me. WDYT? OTOH the below snippet in hci_event_packet (also another similar one in hci_le_meta_evt) needs redesign as well. It distinguishes LE/Classic event per hdev->sent_cmd, so the hci_req_cmd_complete for LE command would never be called after a classic command is sent, vice versa. /* Only match event if command OGF is not for LE */ if (hdev->sent_cmd && hci_opcode_ogf(hci_skb_opcode(hdev->sent_cmd)) != 0x08 && hci_skb_event(hdev->sent_cmd) == event) { hci_req_cmd_complete(hdev, hci_skb_opcode(hdev->sent_cmd), status, &req_complete, &req_complete_skb); req_evt = event; } On Fri, Feb 16, 2024 at 11:54 PM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote: > > Hi Hsin-chen, > > On Thu, Feb 15, 2024 at 11:21 PM Hsin-chen Chuang <chharry@google.com> wrote: > > > > +Some Googlers who would be interested in > > > > Hi Luiz, > > > > How about moving the hci_req-related data out from sent_cmd? This allows sending HCI commands while hci_req data would not be overwritten. > > I have something like the following in the works: > > https://gist.github.com/Vudentz/251275bb688fac32585f90ac0076c407 > > It is not stable yet, but I think we can get away with it since it > just means we can keep the pending request stored in the req_skb, that > said we might need to overhaul this design since it is not very clean > in my opinion. > > > On Fri, Feb 16, 2024 at 5:37 AM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote: > >> > >> Hi Hsin-chen, > >> > >> On Tue, Feb 13, 2024 at 12:24 PM Hsin-chen Chuang <chharry@chromium.org> wrote: > >> > > >> > hci_cmd_work overwrites the hdev->sent_cmd which contains the required > >> > info for a hci_request to work. In the real world, it's observed that > >> > a request from hci_le_ext_create_conn_sync could be interrupted by > >> > the authentication (hci_conn_auth) caused by rfcomm_sock_connect. When > >> > it happends, hci_le_ext_create_conn_sync hangs until timeout; If the > >> > LE connection is triggered by MGMT, it freezes the whole MGMT interface. > >> > > >> > Signed-off-by: Hsin-chen Chuang <chharry@chromium.org> > >> > --- > >> > > >> > net/bluetooth/hci_core.c | 7 +++++-- > >> > 1 file changed, 5 insertions(+), 2 deletions(-) > >> > > >> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > >> > index 34c8dca2069f..e3706889976d 100644 > >> > --- a/net/bluetooth/hci_core.c > >> > +++ b/net/bluetooth/hci_core.c > >> > @@ -4213,8 +4213,11 @@ static void hci_cmd_work(struct work_struct *work) > >> > BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name, > >> > atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q)); > >> > > >> > - /* Send queued commands */ > >> > - if (atomic_read(&hdev->cmd_cnt)) { > >> > + /* Send queued commands. Don't send the command when there is a pending > >> > + * hci_request because the request callbacks would be overwritten. > >> > + */ > >> > + if (atomic_read(&hdev->cmd_cnt) && > >> > + !hci_dev_test_flag(hdev, HCI_CMD_PENDING)) { > >> > skb = skb_dequeue(&hdev->cmd_q); > >> > if (!skb) > >> > return; > >> > -- > >> > 2.43.0.687.g38aa6559b0-goog > >> > >> > >> This seems to be causing some mgmt-tester failures: > >> > >> Pair Device - Sec Mode 3 Success 1 Timed out 22.753 seconds > >> Pair Device - Sec Mode 3 Reject 1 Timed out 22.533 seconds > >> Pair Device - Sec Mode 3 Reject 2 Timed out 22.526 seconds > >> > >> I think this is because we need to respond to an event with a command like: > >> > >> < HCI Command: Create Conn.. (0x01|0x0005) plen 13 #241 [hci0] 16:25:38.699066 > >> Address: 00:AA:01:01:00:00 (Intel Corporation) > >> Packet type: 0x0018 > >> DM1 may be used > >> DH1 may be used > >> Page scan repetition mode: R2 (0x02) > >> Page scan mode: Mandatory (0x00) > >> Clock offset: 0x0000 > >> Role switch: Allow peripheral (0x01) > >> > HCI Event: Command Status (0x0f) plen 4 #242 [hci0] 16:25:38.701881 > >> Create Connection (0x01|0x0005) ncmd 1 > >> Status: Success (0x00) > >> > HCI Event: Link Key Request (0x17) plen 6 #243 [hci0] 16:25:38.702375 > >> Address: 00:AA:01:01:00:00 (Intel Corporation) > >> > >> But because Create Connection is pending we cannot respond to Link Key > >> Request, so it is actually a design problem if we cannot send commands > >> because something is pending so perhaps we need to redesign how we > >> store cmd_sent so we can have multiple outstanding commands rather > >> than just one. > >> > >> -- > >> Luiz Augusto von Dentz > > > > > > > > -- > > Best Regards, > > Hsin-chen > > > > -- > Luiz Augusto von Dentz
Hi Hsin-chen, On Fri, Feb 16, 2024 at 11:47 AM Hsin-chen Chuang <chharry@google.com> wrote: > > Hi Luiz, > > Good point. I found the hdev->req_skb is actually not read at all in the > current code base, so perhaps we could remove the legacy code entirely, > and reuse it for solving the problem we're facing now. This approach > seems cleaner to me. WDYT? > > OTOH the below snippet in hci_event_packet (also another similar one in > hci_le_meta_evt) needs redesign as well. It distinguishes LE/Classic > event per hdev->sent_cmd, so the hci_req_cmd_complete for LE command > would never be called after a classic command is sent, vice versa. > > /* Only match event if command OGF is not for LE */ > if (hdev->sent_cmd && > hci_opcode_ogf(hci_skb_opcode(hdev->sent_cmd)) != 0x08 && > hci_skb_event(hdev->sent_cmd) == event) { > hci_req_cmd_complete(hdev, hci_skb_opcode(hdev->sent_cmd), > status, &req_complete, &req_complete_skb); > req_evt = event; > } Yeah, Ive fixed that, but there are some mgmt-tester test that don't seem to be running for some reason, anyway this actually fixes tests like Pair Device - Sec Mode 3 Reject 1 since that happens to receive events that requires commands as responses which was taking 20+ seconds to complete since that would be overwrite the sent_cmd. > > On Fri, Feb 16, 2024 at 11:54 PM Luiz Augusto von Dentz > <luiz.dentz@gmail.com> wrote: > > > > Hi Hsin-chen, > > > > On Thu, Feb 15, 2024 at 11:21 PM Hsin-chen Chuang <chharry@google.com> wrote: > > > > > > +Some Googlers who would be interested in > > > > > > Hi Luiz, > > > > > > How about moving the hci_req-related data out from sent_cmd? This allows sending HCI commands while hci_req data would not be overwritten. > > > > I have something like the following in the works: > > > > https://gist.github.com/Vudentz/251275bb688fac32585f90ac0076c407 > > > > It is not stable yet, but I think we can get away with it since it > > just means we can keep the pending request stored in the req_skb, that > > said we might need to overhaul this design since it is not very clean > > in my opinion. > > > > > On Fri, Feb 16, 2024 at 5:37 AM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote: > > >> > > >> Hi Hsin-chen, > > >> > > >> On Tue, Feb 13, 2024 at 12:24 PM Hsin-chen Chuang <chharry@chromium.org> wrote: > > >> > > > >> > hci_cmd_work overwrites the hdev->sent_cmd which contains the required > > >> > info for a hci_request to work. In the real world, it's observed that > > >> > a request from hci_le_ext_create_conn_sync could be interrupted by > > >> > the authentication (hci_conn_auth) caused by rfcomm_sock_connect. When > > >> > it happends, hci_le_ext_create_conn_sync hangs until timeout; If the > > >> > LE connection is triggered by MGMT, it freezes the whole MGMT interface. > > >> > > > >> > Signed-off-by: Hsin-chen Chuang <chharry@chromium.org> > > >> > --- > > >> > > > >> > net/bluetooth/hci_core.c | 7 +++++-- > > >> > 1 file changed, 5 insertions(+), 2 deletions(-) > > >> > > > >> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > > >> > index 34c8dca2069f..e3706889976d 100644 > > >> > --- a/net/bluetooth/hci_core.c > > >> > +++ b/net/bluetooth/hci_core.c > > >> > @@ -4213,8 +4213,11 @@ static void hci_cmd_work(struct work_struct *work) > > >> > BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name, > > >> > atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q)); > > >> > > > >> > - /* Send queued commands */ > > >> > - if (atomic_read(&hdev->cmd_cnt)) { > > >> > + /* Send queued commands. Don't send the command when there is a pending > > >> > + * hci_request because the request callbacks would be overwritten. > > >> > + */ > > >> > + if (atomic_read(&hdev->cmd_cnt) && > > >> > + !hci_dev_test_flag(hdev, HCI_CMD_PENDING)) { > > >> > skb = skb_dequeue(&hdev->cmd_q); > > >> > if (!skb) > > >> > return; > > >> > -- > > >> > 2.43.0.687.g38aa6559b0-goog > > >> > > >> > > >> This seems to be causing some mgmt-tester failures: > > >> > > >> Pair Device - Sec Mode 3 Success 1 Timed out 22.753 seconds > > >> Pair Device - Sec Mode 3 Reject 1 Timed out 22.533 seconds > > >> Pair Device - Sec Mode 3 Reject 2 Timed out 22.526 seconds > > >> > > >> I think this is because we need to respond to an event with a command like: > > >> > > >> < HCI Command: Create Conn.. (0x01|0x0005) plen 13 #241 [hci0] 16:25:38.699066 > > >> Address: 00:AA:01:01:00:00 (Intel Corporation) > > >> Packet type: 0x0018 > > >> DM1 may be used > > >> DH1 may be used > > >> Page scan repetition mode: R2 (0x02) > > >> Page scan mode: Mandatory (0x00) > > >> Clock offset: 0x0000 > > >> Role switch: Allow peripheral (0x01) > > >> > HCI Event: Command Status (0x0f) plen 4 #242 [hci0] 16:25:38.701881 > > >> Create Connection (0x01|0x0005) ncmd 1 > > >> Status: Success (0x00) > > >> > HCI Event: Link Key Request (0x17) plen 6 #243 [hci0] 16:25:38.702375 > > >> Address: 00:AA:01:01:00:00 (Intel Corporation) > > >> > > >> But because Create Connection is pending we cannot respond to Link Key > > >> Request, so it is actually a design problem if we cannot send commands > > >> because something is pending so perhaps we need to redesign how we > > >> store cmd_sent so we can have multiple outstanding commands rather > > >> than just one. > > >> > > >> -- > > >> Luiz Augusto von Dentz > > > > > > > > > > > > -- > > > Best Regards, > > > Hsin-chen > > > > > > > > -- > > Luiz Augusto von Dentz
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 34c8dca2069f..e3706889976d 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -4213,8 +4213,11 @@ static void hci_cmd_work(struct work_struct *work) BT_DBG("%s cmd_cnt %d cmd queued %d", hdev->name, atomic_read(&hdev->cmd_cnt), skb_queue_len(&hdev->cmd_q)); - /* Send queued commands */ - if (atomic_read(&hdev->cmd_cnt)) { + /* Send queued commands. Don't send the command when there is a pending + * hci_request because the request callbacks would be overwritten. + */ + if (atomic_read(&hdev->cmd_cnt) && + !hci_dev_test_flag(hdev, HCI_CMD_PENDING)) { skb = skb_dequeue(&hdev->cmd_q); if (!skb) return;
hci_cmd_work overwrites the hdev->sent_cmd which contains the required info for a hci_request to work. In the real world, it's observed that a request from hci_le_ext_create_conn_sync could be interrupted by the authentication (hci_conn_auth) caused by rfcomm_sock_connect. When it happends, hci_le_ext_create_conn_sync hangs until timeout; If the LE connection is triggered by MGMT, it freezes the whole MGMT interface. Signed-off-by: Hsin-chen Chuang <chharry@chromium.org> --- net/bluetooth/hci_core.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)