
Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.

Message ID 20220705125931.3601-1-vasyl.vavrychuk@opensynergy.com (mailing list archive)
State Handled Elsewhere
Series Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.

Checks

Context Check Description
tedd_an/pre-ci_am fail error: patch failed: net/bluetooth/hci_core.c:2675 error: net/bluetooth/hci_core.c: patch does not apply hint: Use 'git am --show-current-patch' to see the failed patch

Commit Message

Vasyl Vavrychuk July 5, 2022, 12:59 p.m. UTC
`cancel_work_sync(&hdev->power_on)` was moved to hci_dev_close_sync in
commit [1] to ensure that the power_on work is canceled after the HCI
interface goes down.

However, in certain cases the power_on work function may itself call
hci_dev_close_sync: hci_power_on -> hci_dev_do_close -> hci_dev_close_sync ->
cancel_work_sync(&hdev->power_on), causing a deadlock. In particular, this
happens when the device is rfkilled on boot. To avoid the deadlock, move
the power_on work cancellation out of hci_dev_do_close/hci_dev_close_sync.
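
The self-wait described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: `work_model`, `hdev_model`, `cancel_work_sync_model`, `close_sync_model` and `power_on_model` are hypothetical names, and the model is single-threaded, so a wait that could never complete is reported as `-EDEADLK` instead of actually blocking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct work_struct: only tracks whether the handler runs. */
struct work_model {
	bool running;
};

/* Toy stand-in for struct hci_dev (hypothetical, heavily reduced). */
struct hdev_model {
	struct work_model power_on;
	bool cancel_in_close_sync; /* true: layout of commit [1]; false: this patch */
	bool rfkilled;
};

/*
 * Model of cancel_work_sync(): the real call waits until the handler has
 * returned. In this single-threaded model the only possible caller while
 * the handler runs is the handler itself, so that case is reported as
 * -EDEADLK rather than hanging forever.
 */
static int cancel_work_sync_model(struct work_model *w)
{
	return w->running ? -EDEADLK : 0;
}

/* Model of hci_dev_close_sync(): cancels power_on only in the old layout. */
static int close_sync_model(struct hdev_model *hdev)
{
	if (hdev->cancel_in_close_sync)
		return cancel_work_sync_model(&hdev->power_on);
	return 0;
}

/* Model of the power_on handler: closes the device when it is rfkilled. */
static int power_on_model(struct hdev_model *hdev)
{
	int err = 0;

	hdev->power_on.running = true;
	if (hdev->rfkilled)
		err = close_sync_model(hdev);
	hdev->power_on.running = false;
	return err;
}
```

With `cancel_in_close_sync` set and `rfkilled` true, `power_on_model()` reports the self-wait; with the cancel moved out of the close path it returns 0.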

The deadlock introduced by commit [1] was reported in [2,3] as broken
suspend. Suspend did not work because `hdev->req_lock` was held as a result
of the `power_on` work deadlock. In fact, other BT features were not working
either. It was not observed when testing [1] because that commit was
verified without rfkill in place.

NOTE: There is no need to cancel the power_on work from the other places
where hci_dev_do_close/hci_dev_close_sync is called:
* Requests serialized through `hdev->req_workqueue`: the power_on
work is first in that workqueue.
* hci_rfkill_set_block, which won't close the device anyway while
HCI_SETUP is set.
* hci_sock_release, which runs after hci_sock_bind, which ensures
HCI_SETUP was cleared.

As a result, behaviour is the same as before commit dd06ed7, except that
the power_on work cancel is added to hci_dev_close.

[1]: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
[2]: https://lore.kernel.org/lkml/20220614181706.26513-1-max.oss.09@gmail.com/
[3]: https://lore.kernel.org/lkml/1236061d-95dd-c3ad-a38f-2dae7aae51ef@o2.pl/

Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
Signed-off-by: Vasyl Vavrychuk <vasyl.vavrychuk@opensynergy.com>
Reported-by: Max Krummenacher <max.krummenacher@toradex.com>
Reported-by: Mateusz Jonczyk <mat.jonczyk@o2.pl>
---
 net/bluetooth/hci_core.c | 3 +++
 net/bluetooth/hci_sync.c | 1 -
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

Max Krummenacher July 5, 2022, 2:12 p.m. UTC | #1
On Tue, Jul 5, 2022 at 3:00 PM Vasyl Vavrychuk
<vasyl.vavrychuk@opensynergy.com> wrote:
>
> `cancel_work_sync(&hdev->power_on)` was moved to hci_dev_close_sync in
> commit [1] to ensure that power_on work is canceled after HCI interface
> down.
>
> But, in certain cases power_on work function may call hci_dev_close_sync
> itself: hci_power_on -> hci_dev_do_close -> hci_dev_close_sync ->
> cancel_work_sync(&hdev->power_on), causing deadlock. In particular, this
> happens when device is rfkilled on boot. To avoid deadlock, move
> power_on work canceling out of hci_dev_do_close/hci_dev_close_sync.
>
> Deadlock introduced by commit [1] was reported in [2,3] as broken
> suspend. Suspend did not work because `hdev->req_lock` held as result of
> `power_on` work deadlock. In fact, other BT features were not working.
> It was not observed when testing [1] since it was verified without
> rfkill in place.
>
> NOTE: It is not needed to cancel power_on work from other places where
> hci_dev_do_close/hci_dev_close_sync is called in case:
> * Requests were serialized due to `hdev->req_workqueue`. The power_on
> work is first in that workqueue.
> * hci_rfkill_set_block which won't close device anyway until HCI_SETUP
> is on.
> * hci_sock_release which runs after hci_sock_bind which ensures
> HCI_SETUP was cleared.
>
> As result, behaviour is the same as in pre-dd06ed7 commit, except
> power_on work cancel added to hci_dev_close.
>
> [1]: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> [2]: https://lore.kernel.org/lkml/20220614181706.26513-1-max.oss.09@gmail.com/
> [3]: https://lore.kernel.org/lkml/1236061d-95dd-c3ad-a38f-2dae7aae51ef@o2.pl/
>
> Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> Signed-off-by: Vasyl Vavrychuk <vasyl.vavrychuk@opensynergy.com>
> Reported-by: Max Krummenacher <max.krummenacher@toradex.com>
> Reported-by: Mateusz Jonczyk <mat.jonczyk@o2.pl>
> ---
>  net/bluetooth/hci_core.c | 3 +++
>  net/bluetooth/hci_sync.c | 1 -
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 59a5c1341c26..a0f99baafd35 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -571,6 +571,7 @@ int hci_dev_close(__u16 dev)
>                 goto done;
>         }
>
> +       cancel_work_sync(&hdev->power_on);
>         if (hci_dev_test_and_clear_flag(hdev, HCI_AUTO_OFF))
>                 cancel_delayed_work(&hdev->power_off);
>
> @@ -2675,6 +2676,8 @@ void hci_unregister_dev(struct hci_dev *hdev)
>         list_del(&hdev->list);
>         write_unlock(&hci_dev_list_lock);
>
> +       cancel_work_sync(&hdev->power_on);
> +
>         hci_cmd_sync_clear(hdev);
>
>         if (!test_bit(HCI_QUIRK_NO_SUSPEND_NOTIFIER, &hdev->quirks))
> diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> index 286d6767f017..1739e8cb3291 100644
> --- a/net/bluetooth/hci_sync.c
> +++ b/net/bluetooth/hci_sync.c
> @@ -4088,7 +4088,6 @@ int hci_dev_close_sync(struct hci_dev *hdev)
>
>         bt_dev_dbg(hdev, "");
>
> -       cancel_work_sync(&hdev->power_on);
>         cancel_delayed_work(&hdev->power_off);
>         cancel_delayed_work(&hdev->ncmd_timer);
>
> --
> 2.30.2
>

This fixes the issue I described in [1], i.e. the kernel no longer
freezes while going to suspend.
Tested-by: Max Krummenacher <max.krummenacher@toradex.com>

Thanks!
Max
bluez.test.bot@gmail.com July 5, 2022, 2:13 p.m. UTC | #2
This is an automated email; please do not reply to it.

Dear Submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
While preparing the CI tests, the patches you submitted couldn't be applied to the current HEAD of the repository.

----- Output -----
error: patch failed: net/bluetooth/hci_core.c:2675
error: net/bluetooth/hci_core.c: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch


Please resolve the issue and submit the patches again.


---
Regards,
Linux Bluetooth
Francesco Dolcini July 5, 2022, 3:14 p.m. UTC | #3
Hello Vasyl,

On Tue, Jul 05, 2022 at 03:59:31PM +0300, Vasyl Vavrychuk wrote:
> Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")

This fixes tag is broken, dd06ed7ad057 does not exist on
torvalds/master, and the `commit` word should be removed.

Should be:

Fixes: ff7f2926114d ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
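
As an illustration of the tag shape only, here is a minimal sketch of a format check. `fixes_tag_looks_valid` is a hypothetical helper, not part of any kernel tooling, and it assumes the usual convention of an abbreviated hash of at least 12 hex digits followed by the quoted subject in parentheses. It cannot tell whether the hash actually exists in a given tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true when `line` has the shape
 *   Fixes: <12 to 40 hex digits> ("<subject>")
 * It checks the format only, not that the commit exists anywhere.
 */
static bool fixes_tag_looks_valid(const char *line)
{
	static const char prefix[] = "Fixes: ";
	size_t n = 0;
	size_t len;

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return false;
	line += sizeof(prefix) - 1;

	/* Count the hex digits of the abbreviated commit hash. */
	while (isxdigit((unsigned char)line[n]))
		n++;
	if (n < 12 || n > 40)
		return false;
	line += n;

	/* The subject must follow as: space, open paren, double quote. */
	if (strncmp(line, " (\"", 3) != 0)
		return false;
	line += 3;

	/* Require at least one subject character before the closing `")`. */
	len = strlen(line);
	return len >= 3 && strcmp(line + len - 2, "\")") == 0;
}
```

The tag as originally submitted is rejected by this check because of the extra `commit` word, while the corrected tag passes.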


Francesco
Luiz Augusto von Dentz July 5, 2022, 5:26 p.m. UTC | #4
Hi,

On Tue, Jul 5, 2022 at 8:14 AM Francesco Dolcini
<francesco.dolcini@toradex.com> wrote:
>
> Hello Vasyl,
>
> On Tue, Jul 05, 2022 at 03:59:31PM +0300, Vasyl Vavrychuk wrote:
> > Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
>
> This fixes tag is broken, dd06ed7ad057 does not exist on
> torvalds/master, and the `commit` word should be removed.
>
> Should be:
>
> Fixes: ff7f2926114d ("Bluetooth: core: Fix missing power_on work cancel on HCI close")

I've rebased the patch on top of bluetooth-next and fixed the hash;
let's see if it passes CI. I might just go ahead and push it.
Jakub Kicinski July 5, 2022, 6:38 p.m. UTC | #5
On Tue, 5 Jul 2022 10:26:08 -0700 Luiz Augusto von Dentz wrote:
> On Tue, Jul 5, 2022 at 8:14 AM Francesco Dolcini
> <francesco.dolcini@toradex.com> wrote:
> >
> > Hello Vasyl,
> >
> > On Tue, Jul 05, 2022 at 03:59:31PM +0300, Vasyl Vavrychuk wrote:  
> > > Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")  
> >
> > This fixes tag is broken, dd06ed7ad057 does not exist on
> > torvalds/master, and the `commit` word should be removed.
> >
> > Should be:
> >
> > Fixes: ff7f2926114d ("Bluetooth: core: Fix missing power_on work cancel on HCI close")  
> 
> I've rebased the patch on top of bluetooth-next and fixed the hash;
> let's see if it passes CI. I might just go ahead and push it.

Thanks for pushing it along, the final version can go through
bluetooth -> net and into 5.19, right?
Mateusz Jończyk July 5, 2022, 6:38 p.m. UTC | #6
On 5.07.2022 at 14:59, Vasyl Vavrychuk wrote:
> `cancel_work_sync(&hdev->power_on)` was moved to hci_dev_close_sync in
> commit [1] to ensure that power_on work is canceled after HCI interface
> down.
>
> But, in certain cases power_on work function may call hci_dev_close_sync
> itself: hci_power_on -> hci_dev_do_close -> hci_dev_close_sync ->
> cancel_work_sync(&hdev->power_on), causing deadlock. In particular, this
> happens when device is rfkilled on boot. To avoid deadlock, move
> power_on work canceling out of hci_dev_do_close/hci_dev_close_sync.
>
> Deadlock introduced by commit [1] was reported in [2,3] as broken
> suspend. Suspend did not work because `hdev->req_lock` held as result of
> `power_on` work deadlock. In fact, other BT features were not working.
> It was not observed when testing [1] since it was verified without
> rfkill in place.
>
> NOTE: It is not needed to cancel power_on work from other places where
> hci_dev_do_close/hci_dev_close_sync is called in case:
> * Requests were serialized due to `hdev->req_workqueue`. The power_on
> work is first in that workqueue.
> * hci_rfkill_set_block which won't close device anyway until HCI_SETUP
> is on.
> * hci_sock_release which runs after hci_sock_bind which ensures
> HCI_SETUP was cleared.
>
> As result, behaviour is the same as in pre-dd06ed7 commit, except
> power_on work cancel added to hci_dev_close.
>
> [1]: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> [2]: https://lore.kernel.org/lkml/20220614181706.26513-1-max.oss.09@gmail.com/
> [3]: https://lore.kernel.org/lkml/1236061d-95dd-c3ad-a38f-2dae7aae51ef@o2.pl/
>
> Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> Signed-off-by: Vasyl Vavrychuk <vasyl.vavrychuk@opensynergy.com>
> Reported-by: Max Krummenacher <max.krummenacher@toradex.com>
> Reported-by: Mateusz Jonczyk <mat.jonczyk@o2.pl>

Works well: suspend (with bluetooth on and also off), hibernation, sending files, rfkill.

Thank you.

Reported-and-tested-by: Mateusz Jończyk <mat.jonczyk@o2.pl>

Greetings,

Mateusz Jończyk
Luiz Augusto von Dentz July 5, 2022, 7 p.m. UTC | #7
Hi Jakub,

On Tue, Jul 5, 2022 at 11:38 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 5 Jul 2022 10:26:08 -0700 Luiz Augusto von Dentz wrote:
> > On Tue, Jul 5, 2022 at 8:14 AM Francesco Dolcini
> > <francesco.dolcini@toradex.com> wrote:
> > >
> > > Hello Vasyl,
> > >
> > > On Tue, Jul 05, 2022 at 03:59:31PM +0300, Vasyl Vavrychuk wrote:
> > > > Fixes: commit dd06ed7ad057 ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> > >
> > > This fixes tag is broken, dd06ed7ad057 does not exist on
> > > torvalds/master, and the `commit` word should be removed.
> > >
> > > Should be:
> > >
> > > Fixes: ff7f2926114d ("Bluetooth: core: Fix missing power_on work cancel on HCI close")
> >
> > > I've rebased the patch on top of bluetooth-next and fixed the hash;
> > > let's see if it passes CI. I might just go ahead and push it.
>
> > Thanks for pushing it along, the final version can go through
> > bluetooth -> net and into 5.19, right?

Yep, I will send the pull request in a moment.
Jakub Kicinski July 5, 2022, 7:13 p.m. UTC | #8
On Tue, 5 Jul 2022 12:00:43 -0700 Luiz Augusto von Dentz wrote:
> > > I've rebased the patch on top of bluetooth-next and fixed the hash;
> > > let's see if it passes CI. I might just go ahead and push it.
> >
> > Thanks for pushing it along, the final version can go through
> > bluetooth -> net and into 5.19, right?
> 
> Yep, I will send the pull request in a moment.

Perfect, thank you!!
patchwork-bot+netdevbpf@kernel.org July 5, 2022, 9:50 p.m. UTC | #9
Hello:

This patch was applied to netdev/net.git (master)
by Luiz Augusto von Dentz <luiz.von.dentz@intel.com>:

On Tue,  5 Jul 2022 15:59:31 +0300 you wrote:
> `cancel_work_sync(&hdev->power_on)` was moved to hci_dev_close_sync in
> commit [1] to ensure that power_on work is canceled after HCI interface
> down.
> 
> But, in certain cases power_on work function may call hci_dev_close_sync
> itself: hci_power_on -> hci_dev_do_close -> hci_dev_close_sync ->
> cancel_work_sync(&hdev->power_on), causing deadlock. In particular, this
> happens when device is rfkilled on boot. To avoid deadlock, move
> power_on work canceling out of hci_dev_do_close/hci_dev_close_sync.
> 
> [...]

Here is the summary with links:
  - Bluetooth: core: Fix deadlock due to `cancel_work_sync(&hdev->power_on)` from hci_power_on_sync.
    https://git.kernel.org/netdev/net/c/e36bea6e78ab

You are awesome, thank you!

Patch

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 59a5c1341c26..a0f99baafd35 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -571,6 +571,7 @@  int hci_dev_close(__u16 dev)
 		goto done;
 	}
 
+	cancel_work_sync(&hdev->power_on);
 	if (hci_dev_test_and_clear_flag(hdev, HCI_AUTO_OFF))
 		cancel_delayed_work(&hdev->power_off);
 
@@ -2675,6 +2676,8 @@  void hci_unregister_dev(struct hci_dev *hdev)
 	list_del(&hdev->list);
 	write_unlock(&hci_dev_list_lock);
 
+	cancel_work_sync(&hdev->power_on);
+
 	hci_cmd_sync_clear(hdev);
 
 	if (!test_bit(HCI_QUIRK_NO_SUSPEND_NOTIFIER, &hdev->quirks))
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 286d6767f017..1739e8cb3291 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -4088,7 +4088,6 @@  int hci_dev_close_sync(struct hci_dev *hdev)
 
 	bt_dev_dbg(hdev, "");
 
-	cancel_work_sync(&hdev->power_on);
 	cancel_delayed_work(&hdev->power_off);
 	cancel_delayed_work(&hdev->ncmd_timer);