diff mbox series

Bluetooth: Ignore cmd_timeout with HCI_USER_CHANNEL

Message ID 20220808110315.1.I5a39052e33f6f3c7406f53b0304a32ccf9f340fa@changeid (mailing list archive)
State New, archived
Headers show
Series Bluetooth: Ignore cmd_timeout with HCI_USER_CHANNEL | expand

Checks

Context Check Description
tedd_an/pre-ci_am success Success
tedd_an/checkpatch success Checkpatch PASS
tedd_an/gitlint success Gitlint PASS
tedd_an/subjectprefix success PASS
tedd_an/buildkernel success Build Kernel PASS
tedd_an/buildkernel32 success Build Kernel32 PASS
tedd_an/incremental_build success Pass
tedd_an/testrunnersetup success Test Runner Setup PASS
tedd_an/testrunnerl2cap-tester success Total: 40, Passed: 40 (100.0%), Failed: 0, Not Run: 0
tedd_an/testrunnerbnep-tester success Total: 1, Passed: 1 (100.0%), Failed: 0, Not Run: 0
tedd_an/testrunnermgmt-tester success Total: 494, Passed: 494 (100.0%), Failed: 0, Not Run: 0
tedd_an/testrunnerrfcomm-tester success Total: 10, Passed: 10 (100.0%), Failed: 0, Not Run: 0
tedd_an/testrunnersco-tester success Total: 12, Passed: 12 (100.0%), Failed: 0, Not Run: 0
tedd_an/testrunnersmp-tester success Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0
tedd_an/testrunneruserchan-tester success Total: 4, Passed: 4 (100.0%), Failed: 0, Not Run: 0

Commit Message

Abhishek Pandit-Subedi Aug. 8, 2022, 6:04 p.m. UTC
From: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>

When using HCI_USER_CHANNEL, hci traffic is expected to be handled by
userspace entirely. However, it is still possible (and sometimes
desirable) to be able to send commands to the controller directly. In
such cases, the kernel command timeout shouldn't do any error handling.

Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
---
This was tested by running a userchannel stack and sending commands via
hcitool cmd on an Intel AX200 controller. Without this change, each
command sent via hcitool would result in hci_cmd_timeout being called
and after 5 commands, the controller would reset.

Hcitool continues working here because it marks the socket as
promiscuous and gets a copy of all traffic while the socket is open (and
does filtering in userspace).

Tested on Chromebook with 5.4 kernel with patch (and applied cleanly on
bluetooth-next).

 net/bluetooth/hci_core.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

Comments

bluez.test.bot@gmail.com Aug. 8, 2022, 7:14 p.m. UTC | #1
This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=666118

---Test result---

Test Summary:
CheckPatch                    PASS      1.29 seconds
GitLint                       PASS      0.76 seconds
SubjectPrefix                 PASS      0.57 seconds
BuildKernel                   PASS      43.76 seconds
BuildKernel32                 PASS      37.79 seconds
Incremental Build with patchesPASS      52.27 seconds
TestRunner: Setup             PASS      630.09 seconds
TestRunner: l2cap-tester      PASS      21.10 seconds
TestRunner: bnep-tester       PASS      8.49 seconds
TestRunner: mgmt-tester       PASS      133.23 seconds
TestRunner: rfcomm-tester     PASS      12.93 seconds
TestRunner: sco-tester        PASS      12.33 seconds
TestRunner: smp-tester        PASS      11.77 seconds
TestRunner: userchan-tester   PASS      8.42 seconds



---
Regards,
Linux Bluetooth
Luiz Augusto von Dentz Aug. 8, 2022, 8:31 p.m. UTC | #2
Hi Abhishek,

On Mon, Aug 8, 2022 at 11:04 AM Abhishek Pandit-Subedi
<abhishekpandit@google.com> wrote:
>
> From: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
>
> When using HCI_USER_CHANNEL, hci traffic is expected to be handled by
> userspace entirely. However, it is still possible (and sometimes
> desirable) to be able to send commands to the controller directly. In
> such cases, the kernel command timeout shouldn't do any error handling.
>
> Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
> ---
> This was tested by running a userchannel stack and sending commands via
> hcitool cmd on an Intel AX200 controller. Without this change, each
> command sent via hcitool would result in hci_cmd_timeout being called
> and after 5 commands, the controller would reset.
>
> Hcitool continues working here because it marks the socket as
> promiscuous and gets a copy of all traffic while the socket is open (and
> does filtering in userspace).

There is something not quite right here, if you have a controller
using user_channel (addr.hci_channel = HCI_CHANNEL_USER) it probably
shouldn't even accept to be opened again by the likes of hcitool which
uses HCI_CHANNEL_RAW as it can cause conflicts. If you really need a
test tool that does send the command while in HCI_CHANNEL_USER then it
must be send on that mode but I wouldn't do it with hcitool anyway as
that is deprecated and this exercise seem to revolve to a entire stack
on top of HCI_USER_CHANNEL then you shall use tools of that stack and
mix with BlueZ userspace tools.

> Tested on Chromebook with 5.4 kernel with patch (and applied cleanly on
> bluetooth-next).
>
>  net/bluetooth/hci_core.c | 26 +++++++++++++++++---------
>  1 file changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index b3a5a3cc9372..c9a15f6633f7 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -1481,17 +1481,25 @@ static void hci_cmd_timeout(struct work_struct *work)
>         struct hci_dev *hdev = container_of(work, struct hci_dev,
>                                             cmd_timer.work);
>
> -       if (hdev->sent_cmd) {
> -               struct hci_command_hdr *sent = (void *) hdev->sent_cmd->data;
> -               u16 opcode = __le16_to_cpu(sent->opcode);
> +       /* Don't trigger the timeout behavior if it happens while we're in
> +        * userchannel mode. Userspace is responsible for handling any command
> +        * timeouts.
> +        */
> +       if (!(hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> +             test_bit(HCI_UP, &hdev->flags))) {
> +               if (hdev->sent_cmd) {
> +                       struct hci_command_hdr *sent =
> +                               (void *)hdev->sent_cmd->data;
> +                       u16 opcode = __le16_to_cpu(sent->opcode);
>
> -               bt_dev_err(hdev, "command 0x%4.4x tx timeout", opcode);
> -       } else {
> -               bt_dev_err(hdev, "command tx timeout");
> -       }
> +                       bt_dev_err(hdev, "command 0x%4.4x tx timeout", opcode);
> +               } else {
> +                       bt_dev_err(hdev, "command tx timeout");
> +               }
>
> -       if (hdev->cmd_timeout)
> -               hdev->cmd_timeout(hdev);
> +               if (hdev->cmd_timeout)
> +                       hdev->cmd_timeout(hdev);
> +       }

I wonder why hci_cmd_timeout is even active if the controller is in
HCI_USER_CHANNEL mode, that sounds like a bug already.

>         atomic_set(&hdev->cmd_cnt, 1);
>         queue_work(hdev->workqueue, &hdev->cmd_work);
> --
> 2.37.1.559.g78731f0fdb-goog
>
Abhishek Pandit-Subedi Aug. 8, 2022, 9:15 p.m. UTC | #3
On Mon, Aug 8, 2022 at 1:31 PM Luiz Augusto von Dentz
<luiz.dentz@gmail.com> wrote:
>
> Hi Abhishek,
>
> On Mon, Aug 8, 2022 at 11:04 AM Abhishek Pandit-Subedi
> <abhishekpandit@google.com> wrote:
> >
> > From: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
> >
> > When using HCI_USER_CHANNEL, hci traffic is expected to be handled by
> > userspace entirely. However, it is still possible (and sometimes
> > desirable) to be able to send commands to the controller directly. In
> > such cases, the kernel command timeout shouldn't do any error handling.
> >
> > Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
> > ---
> > This was tested by running a userchannel stack and sending commands via
> > hcitool cmd on an Intel AX200 controller. Without this change, each
> > command sent via hcitool would result in hci_cmd_timeout being called
> > and after 5 commands, the controller would reset.
> >
> > Hcitool continues working here because it marks the socket as
> > promiscuous and gets a copy of all traffic while the socket is open (and
> > does filtering in userspace).
>
> There is something not quite right here, if you have a controller
> using user_channel (addr.hci_channel = HCI_CHANNEL_USER) it probably
> shouldn't even accept to be opened again by the likes of hcitool which
> uses HCI_CHANNEL_RAW as it can cause conflicts. If you really need a
> test tool that does send the command while in HCI_CHANNEL_USER then it
> must be send on that mode but I wouldn't do it with hcitool anyway as
> that is deprecated and this exercise seem to revolve to a entire stack
> on top of HCI_USER_CHANNEL then you shall use tools of that stack and
> mix with BlueZ userspace tools.

Our goal is eventually consistent with that aim (not having multiple
users to the socket when using HCI_CHANNEL_USER).

In the interim however, we have existing tooling that expects to be
able to write raw hci to the controller, get responses and not expect
any crashes (Intel Wireless Reporting Tools for example). hcitool is
just an easy test tool here and the real behavior being tested is RAW
channel injections not triggering the cmd timeout.

>
> > Tested on Chromebook with 5.4 kernel with patch (and applied cleanly on
> > bluetooth-next).
> >
> >  net/bluetooth/hci_core.c | 26 +++++++++++++++++---------
> >  1 file changed, 17 insertions(+), 9 deletions(-)
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index b3a5a3cc9372..c9a15f6633f7 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -1481,17 +1481,25 @@ static void hci_cmd_timeout(struct work_struct *work)
> >         struct hci_dev *hdev = container_of(work, struct hci_dev,
> >                                             cmd_timer.work);
> >
> > -       if (hdev->sent_cmd) {
> > -               struct hci_command_hdr *sent = (void *) hdev->sent_cmd->data;
> > -               u16 opcode = __le16_to_cpu(sent->opcode);
> > +       /* Don't trigger the timeout behavior if it happens while we're in
> > +        * userchannel mode. Userspace is responsible for handling any command
> > +        * timeouts.
> > +        */
> > +       if (!(hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > +             test_bit(HCI_UP, &hdev->flags))) {
> > +               if (hdev->sent_cmd) {
> > +                       struct hci_command_hdr *sent =
> > +                               (void *)hdev->sent_cmd->data;
> > +                       u16 opcode = __le16_to_cpu(sent->opcode);
> >
> > -               bt_dev_err(hdev, "command 0x%4.4x tx timeout", opcode);
> > -       } else {
> > -               bt_dev_err(hdev, "command tx timeout");
> > -       }
> > +                       bt_dev_err(hdev, "command 0x%4.4x tx timeout", opcode);
> > +               } else {
> > +                       bt_dev_err(hdev, "command tx timeout");
> > +               }
> >
> > -       if (hdev->cmd_timeout)
> > -               hdev->cmd_timeout(hdev);
> > +               if (hdev->cmd_timeout)
> > +                       hdev->cmd_timeout(hdev);
> > +       }
>
> I wonder why hci_cmd_timeout is even active if the controller is in
> HCI_USER_CHANNEL mode, that sounds like a bug already.

This gets scheduled in hci_cmd_work. I tried not scheduling
hci_cmd_timeout in the first place but that caused the event stream to
hang (I think because subsequent tx work wasn't being scheduled). I
didn't dive very deep here and fix looked complex for a scenario that
we will migrate away from.

>
> >         atomic_set(&hdev->cmd_cnt, 1);
> >         queue_work(hdev->workqueue, &hdev->cmd_work);
> > --
> > 2.37.1.559.g78731f0fdb-goog
> >
>
>
> --
> Luiz Augusto von Dentz
diff mbox series

Patch

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index b3a5a3cc9372..c9a15f6633f7 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1481,17 +1481,25 @@  static void hci_cmd_timeout(struct work_struct *work)
 	struct hci_dev *hdev = container_of(work, struct hci_dev,
 					    cmd_timer.work);
 
-	if (hdev->sent_cmd) {
-		struct hci_command_hdr *sent = (void *) hdev->sent_cmd->data;
-		u16 opcode = __le16_to_cpu(sent->opcode);
+	/* Don't trigger the timeout behavior if it happens while we're in
+	 * userchannel mode. Userspace is responsible for handling any command
+	 * timeouts.
+	 */
+	if (!(hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+	      test_bit(HCI_UP, &hdev->flags))) {
+		if (hdev->sent_cmd) {
+			struct hci_command_hdr *sent =
+				(void *)hdev->sent_cmd->data;
+			u16 opcode = __le16_to_cpu(sent->opcode);
 
-		bt_dev_err(hdev, "command 0x%4.4x tx timeout", opcode);
-	} else {
-		bt_dev_err(hdev, "command tx timeout");
-	}
+			bt_dev_err(hdev, "command 0x%4.4x tx timeout", opcode);
+		} else {
+			bt_dev_err(hdev, "command tx timeout");
+		}
 
-	if (hdev->cmd_timeout)
-		hdev->cmd_timeout(hdev);
+		if (hdev->cmd_timeout)
+			hdev->cmd_timeout(hdev);
+	}
 
 	atomic_set(&hdev->cmd_cnt, 1);
 	queue_work(hdev->workqueue, &hdev->cmd_work);