Message ID | 20230224115310.kernel.v2.1.If0578b001c1f12567f2ebcac5856507f1adee745@changeid (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Clear workqueue to avoid use-after-free | expand |
Context | Check | Description |
---|---|---|
tedd_an/pre-ci_am | success | Success |
tedd_an/CheckPatch | success | CheckPatch PASS |
tedd_an/GitLint | success | Gitlint PASS |
tedd_an/SubjectPrefix | success | Gitlint PASS |
tedd_an/BuildKernel | success | BuildKernel PASS |
tedd_an/CheckAllWarning | success | CheckAllWarning PASS |
tedd_an/CheckSparse | success | CheckSparse PASS |
tedd_an/CheckSmatch | success | CheckSparse PASS |
tedd_an/BuildKernel32 | success | BuildKernel32 PASS |
tedd_an/TestRunnerSetup | success | TestRunnerSetup PASS |
Hi Zhengping, On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <jiangzp@google.com> wrote: > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid > racing conditions which cause use-after-free. > > When powering off the adapter, the mgmt cmd list will be cleared. If a > work is queued in the cmd_sync_work queue at the same time, it will > cause the risk of use-after-free, as the cmd pointer is not checked > before use. > > Signed-off-by: Zhengping Jiang <jiangzp@google.com> > --- > > Changes in v2: > - Add function to clear the queue without stop the timer > > Changes in v1: > - Clear cmd_sync_work queue before clearing the mgmt cmd list > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++- > 1 file changed, 20 insertions(+), 1 deletion(-) > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > index 117eedb6f709..b70365dfff0c 100644 > --- a/net/bluetooth/hci_sync.c > +++ b/net/bluetooth/hci_sync.c > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev) > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire); > } > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev) > +{ > + struct hci_cmd_sync_work_entry *entry, *tmp; > + > + mutex_lock(&hdev->cmd_sync_work_lock); > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) { > + if (entry->destroy) { > + hci_req_sync_lock(hdev); > + entry->destroy(hdev, entry->data, -ECANCELED); > + hci_req_sync_unlock(hdev); > + } > + list_del(&entry->list); > + kfree(entry); > + } > + mutex_unlock(&hdev->cmd_sync_work_lock); > +} > + > void hci_cmd_sync_clear(struct hci_dev *hdev) > { > struct hci_cmd_sync_work_entry *entry, *tmp; > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev) > > if (!auto_off && hdev->dev_type == HCI_PRIMARY && > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > - hci_dev_test_flag(hdev, HCI_MGMT)) > + hci_dev_test_flag(hdev, HCI_MGMT)) { > + hci_pend_cmd_sync_clear(hdev); Any particular reason why you are not using hci_cmd_sync_clear instead? We also may want to move the clearing logic to hci_dev_close_sync since it should be equivalent to hci_request_cancel_all. > __mgmt_power_off(hdev); > + } > > hci_inquiry_cache_flush(hdev); > hci_pend_le_actions_clear(hdev); > -- > 2.39.2.722.g9855ee24e9-goog >
Hi Luiz, > Any particular reason why you are not using hci_cmd_sync_clear > instead? That is a good question and we used hci_cmd_sync_clear in the first version, but it will clear the queue and also close the timer. As a result, when the adapter is turned on again, the timer will not schedule any new jobs. So the option is to use hci_cmd_sync_clear and re-initiate the queue or to write a new function which only clears the queue. > We also may want to move the clearing logic to > hci_dev_close_sync since it should be equivalent to > hci_request_cancel_all. I actually have a question here. I saw "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought it should force clearing the cmd_sync queue. But it seems cannot prevent the use-after-free situation. Any suggestions to improve the solution? Thanks, Zhengping On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote: > > Hi Zhengping, > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <jiangzp@google.com> wrote: > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid > > racing conditions which cause use-after-free. > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a > > work is queued in the cmd_sync_work queue at the same time, it will > > cause the risk of use-after-free, as the cmd pointer is not checked > > before use. > > > > Signed-off-by: Zhengping Jiang <jiangzp@google.com> > > --- > > > > Changes in v2: > > - Add function to clear the queue without stop the timer > > > > Changes in v1: > > - Clear cmd_sync_work queue before clearing the mgmt cmd list > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++- > > 1 file changed, 20 insertions(+), 1 deletion(-) > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > > index 117eedb6f709..b70365dfff0c 100644 > > --- a/net/bluetooth/hci_sync.c > > +++ b/net/bluetooth/hci_sync.c > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev) > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire); > > } > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev) > > +{ > > + struct hci_cmd_sync_work_entry *entry, *tmp; > > + > > + mutex_lock(&hdev->cmd_sync_work_lock); > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) { > > + if (entry->destroy) { > > + hci_req_sync_lock(hdev); > > + entry->destroy(hdev, entry->data, -ECANCELED); > > + hci_req_sync_unlock(hdev); > > + } > > + list_del(&entry->list); > > + kfree(entry); > > + } > > + mutex_unlock(&hdev->cmd_sync_work_lock); > > +} > > + > > void hci_cmd_sync_clear(struct hci_dev *hdev) > > { > > struct hci_cmd_sync_work_entry *entry, *tmp; > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev) > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY && > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > - hci_dev_test_flag(hdev, HCI_MGMT)) > > + hci_dev_test_flag(hdev, HCI_MGMT)) { > > + hci_pend_cmd_sync_clear(hdev); > > Any particular reason why you are not using hci_cmd_sync_clear > instead? We also may want to move the clearing logic to > hci_dev_close_sync since it should be equivalent to > hci_request_cancel_all. > > > __mgmt_power_off(hdev); > > + } > > > > hci_inquiry_cache_flush(hdev); > > hci_pend_le_actions_clear(hdev); > > -- > > 2.39.2.722.g9855ee24e9-goog > > > > > -- > Luiz Augusto von Dentz
Hi Luiz, I have a question. Given that each command in the cmd_sync queue should clean up the memory in a callback function. I was wondering if the call to cmd_complete_rsp in __mgmt_power_off function is still necessary? Will this always risk a race condition that cmd has been released when the complete callback or _sync function is run? Thanks, Zhengping On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <jiangzp@google.com> wrote: > > Hi Luiz, > > > Any particular reason why you are not using hci_cmd_sync_clear > > instead? > > That is a good question and we used hci_cmd_sync_clear in the first > version, but it will clear the queue and also close the timer. As a > result, when the adapter is turned on again, the timer will not > schedule any new jobs. So the option is to use hci_cmd_sync_clear and > re-initiate the queue or to write a new function which only clears the > queue. > > > We also may want to move the clearing logic to > > hci_dev_close_sync since it should be equivalent to > > hci_request_cancel_all. > > I actually have a question here. I saw > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought > it should force clearing the cmd_sync queue. But it seems cannot > prevent the use-after-free situation. > > Any suggestions to improve the solution? > > Thanks, > Zhengping > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz > <luiz.dentz@gmail.com> wrote: > > > > Hi Zhengping, > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid > > > racing conditions which cause use-after-free. > > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a > > > work is queued in the cmd_sync_work queue at the same time, it will > > > cause the risk of use-after-free, as the cmd pointer is not checked > > > before use. > > > > > > Signed-off-by: Zhengping Jiang <jiangzp@google.com> > > > --- > > > > > > Changes in v2: > > > - Add function to clear the queue without stop the timer > > > > > > Changes in v1: > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list > > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++- > > > 1 file changed, 20 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > > > index 117eedb6f709..b70365dfff0c 100644 > > > --- a/net/bluetooth/hci_sync.c > > > +++ b/net/bluetooth/hci_sync.c > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev) > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire); > > > } > > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev) > > > +{ > > > + struct hci_cmd_sync_work_entry *entry, *tmp; > > > + > > > + mutex_lock(&hdev->cmd_sync_work_lock); > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) { > > > + if (entry->destroy) { > > > + hci_req_sync_lock(hdev); > > > + entry->destroy(hdev, entry->data, -ECANCELED); > > > + hci_req_sync_unlock(hdev); > > > + } > > > + list_del(&entry->list); > > > + kfree(entry); > > > + } > > > + mutex_unlock(&hdev->cmd_sync_work_lock); > > > +} > > > + > > > void hci_cmd_sync_clear(struct hci_dev *hdev) > > > { > > > struct hci_cmd_sync_work_entry *entry, *tmp; > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev) > > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY && > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > - hci_dev_test_flag(hdev, HCI_MGMT)) > > > + hci_dev_test_flag(hdev, HCI_MGMT)) { > > > + hci_pend_cmd_sync_clear(hdev); > > > > Any particular reason why you are not using hci_cmd_sync_clear > > instead? We also may want to move the clearing logic to > > hci_dev_close_sync since it should be equivalent to > > hci_request_cancel_all. > > > > > __mgmt_power_off(hdev); > > > + } > > > > > > hci_inquiry_cache_flush(hdev); > > > hci_pend_le_actions_clear(hdev); > > > -- > > > 2.39.2.722.g9855ee24e9-goog > > > > > > > > > -- > > Luiz Augusto von Dentz
Hi Zhengping, On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <jiangzp@google.com> wrote: > > Hi Luiz, > > I have a question. Given that each command in the cmd_sync queue > should clean up the memory in a callback function. I was wondering if > the call to cmd_complete_rsp in __mgmt_power_off function is still > necessary? Will this always risk a race condition that cmd has been > released when the complete callback or _sync function is run? Not sure I follow you here, do you have a stack trace when the user after free occurs? > Thanks, > Zhengping > > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > Hi Luiz, > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > instead? > > > > That is a good question and we used hci_cmd_sync_clear in the first > > version, but it will clear the queue and also close the timer. As a > > result, when the adapter is turned on again, the timer will not > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and > > re-initiate the queue or to write a new function which only clears the > > queue. > > > > > We also may want to move the clearing logic to > > > hci_dev_close_sync since it should be equivalent to > > > hci_request_cancel_all. > > > > I actually have a question here. I saw > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought > > it should force clearing the cmd_sync queue. But it seems cannot > > prevent the use-after-free situation. > > > > Any suggestions to improve the solution? > > > > Thanks, > > Zhengping > > > > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz > > <luiz.dentz@gmail.com> wrote: > > > > > > Hi Zhengping, > > > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid > > > > racing conditions which cause use-after-free. > > > > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a > > > > work is queued in the cmd_sync_work queue at the same time, it will > > > > cause the risk of use-after-free, as the cmd pointer is not checked > > > > before use. > > > > > > > > Signed-off-by: Zhengping Jiang <jiangzp@google.com> > > > > --- > > > > > > > > Changes in v2: > > > > - Add function to clear the queue without stop the timer > > > > > > > > Changes in v1: > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list > > > > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++- > > > > 1 file changed, 20 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > > > > index 117eedb6f709..b70365dfff0c 100644 > > > > --- a/net/bluetooth/hci_sync.c > > > > +++ b/net/bluetooth/hci_sync.c > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev) > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire); > > > > } > > > > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev) > > > > +{ > > > > + struct hci_cmd_sync_work_entry *entry, *tmp; > > > > + > > > > + mutex_lock(&hdev->cmd_sync_work_lock); > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) { > > > > + if (entry->destroy) { > > > > + hci_req_sync_lock(hdev); > > > > + entry->destroy(hdev, entry->data, -ECANCELED); > > > > + hci_req_sync_unlock(hdev); > > > > + } > > > > + list_del(&entry->list); > > > > + kfree(entry); > > > > + } > > > > + mutex_unlock(&hdev->cmd_sync_work_lock); > > > > +} > > > > + > > > > void hci_cmd_sync_clear(struct hci_dev *hdev) > > > > { > > > > struct hci_cmd_sync_work_entry *entry, *tmp; > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev) > > > > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY && > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > > - hci_dev_test_flag(hdev, HCI_MGMT)) > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) { > > > > + hci_pend_cmd_sync_clear(hdev); > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > instead? We also may want to move the clearing logic to > > > hci_dev_close_sync since it should be equivalent to > > > hci_request_cancel_all. > > > > > > > __mgmt_power_off(hdev); > > > > + } > > > > > > > > hci_inquiry_cache_flush(hdev); > > > > hci_pend_le_actions_clear(hdev); > > > > -- > > > > 2.39.2.722.g9855ee24e9-goog > > > > > > > > > > > > > -- > > > Luiz Augusto von Dentz
Hi Luiz, Sure. Hope this helps. Here is one log from the user. [ 53.368740] ================================================================== [ 53.376167] BUG: KASAN: use-after-free in set_ssp_sync+0x44/0x154 [bluetooth] [ 53.384303] Read of size 8 at addr ffffff80b7ee0318 by task kworker/u17:0/190 [ 53.396342] CPU: 7 PID: 190 Comm: kworker/u17:0 Tainted: G W 5.15.59-lockdep #1 29eed131ef0afd42bc369a6a0ca1c69d3653699b [ 53.408868] Hardware name: Qualcomm Technologies, Inc. sc7280 CRD platform (rev5+) (DT) [ 53.417095] Workqueue: hci0 hci_cmd_sync_work [bluetooth] [ 53.422780] Call trace: [ 53.425310] dump_backtrace+0x0/0x424 [ 53.429108] show_stack+0x20/0x2c [ 53.432534] dump_stack_lvl+0x84/0xb4 [ 53.436514] print_address_description+0x30/0x2fc [ 53.441369] kasan_report+0x15c/0x19c [ 53.445975] __asan_report_load8_noabort+0x44/0x50 [ 53.450910] set_ssp_sync+0x44/0x154 [bluetooth 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] [ 53.460136] hci_cmd_sync_work+0x1c8/0x2c8 [bluetooth 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] [ 53.472214] process_one_work+0x59c/0xa88 [ 53.476990] worker_thread+0x81c/0xd18 [ 53.480854] kthread+0x2d4/0x3d8 [ 53.484272] ret_from_fork+0x10/0x20 [ 53.489733] Allocated by task 1162: [ 53.493336] kasan_save_stack+0x38/0x68 [ 53.498115] __kasan_kmalloc+0xb4/0xd0 [ 53.501993] kmem_cache_alloc_trace+0x29c/0x374 [ 53.506661] mgmt_pending_new+0x74/0x200 [bluetooth] [ 53.511905] mgmt_pending_add+0x28/0xec [bluetooth] [ 53.517059] set_ssp+0x2d8/0x5b0 [bluetooth] [ 53.521575] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] [ 53.526538] hci_sock_sendmsg+0x28c/0x95c [bluetooth] [ 53.531850] sock_sendmsg+0xb4/0xd8 [ 53.535454] sock_write_iter+0x1c0/0x2d0 [ 53.539494] do_iter_readv_writev+0x350/0x4e0 [ 53.543980] do_iter_write+0xf0/0x2e4 [ 53.547747] vfs_writev+0xd0/0x13c [ 53.551254] do_writev+0xe8/0x1fc [ 53.554672] __arm64_sys_writev+0x84/0x98 [ 53.558805] invoke_syscall+0x78/0x20c [ 53.562665] el0_svc_common+0x12c/0x2f0 [ 53.566618] do_el0_svc+0x94/0x13c [ 53.570125] el0_svc+0x5c/0x108 [ 53.573374] el0t_64_sync_handler+0x78/0x108 [ 53.577773] el0t_64_sync+0x1a4/0x1a8 [ 53.583089] Freed by task 3207: [ 53.586325] kasan_save_stack+0x38/0x68 [ 53.590282] kasan_set_track+0x28/0x3c [ 53.594153] kasan_set_free_info+0x28/0x4c [ 53.598369] ____kasan_slab_free+0x138/0x17c [ 53.602767] __kasan_slab_free+0x18/0x28 [ 53.606803] slab_free_freelist_hook+0x188/0x260 [ 53.611559] kfree+0x138/0x29c [ 53.614708] mgmt_pending_free+0xac/0xdc [bluetooth] [ 53.619948] mgmt_pending_remove+0xd8/0xf0 [bluetooth] [ 53.625357] cmd_complete_rsp+0xc8/0x178 [bluetooth] [ 53.630586] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] [ 53.636076] __mgmt_power_off+0x114/0x26c [bluetooth] [ 53.641390] hci_dev_close_sync+0x314/0x814 [bluetooth] [ 53.646882] hci_dev_do_close+0x3c/0x7c [bluetooth] [ 53.652017] hci_dev_close+0xa4/0x15c [bluetooth] [ 53.656980] hci_sock_ioctl+0x298/0x444 [bluetooth] [ 53.662117] sock_do_ioctl+0xd0/0x1e8 [ 53.665900] sock_ioctl+0x4fc/0x72c [ 53.669500] __arm64_sys_ioctl+0x118/0x154 [ 53.673726] invoke_syscall+0x78/0x20c [ 53.677587] el0_svc_common+0x12c/0x2f0 [ 53.681533] do_el0_svc+0x94/0x13c [ 53.685043] el0_svc+0x5c/0x108 [ 53.688278] el0t_64_sync_handler+0x78/0x108 [ 53.692677] el0t_64_sync+0x1a4/0x1a8 [ 53.697988] Last potentially related work creation: [ 53.703009] kasan_save_stack+0x38/0x68 [ 53.706962] kasan_record_aux_stack+0x104/0x130 [ 53.711622] __call_rcu+0x14c/0x860 [ 53.715212] call_rcu+0x18/0x24 [ 53.718448] sk_filter_uncharge+0xc0/0x120 [ 53.722667] __sk_destruct+0xb4/0x4a8 [ 53.726435] sk_destruct+0x78/0xa0 [ 53.729941] __sk_free+0x190/0x270 [ 53.733453] sk_free+0x54/0x8c [ 53.736603] deferred_put_nlk_sk+0x1d4/0x20c [ 53.741000] rcu_do_batch+0x3e8/0xd08 [ 53.744772] nocb_cb_wait+0xc8/0xa3c [ 53.748453] rcu_nocb_cb_kthread+0x48/0x134 [ 53.752768] kthread+0x2d4/0x3d8 [ 53.756098] ret_from_fork+0x10/0x20 This is another one at a different function but with the same signature. [ 43.363512] ================================================================== [ 43.370966] BUG: KASAN: use-after-free in mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth] [ 43.379813] Read of size 8 at addr ffffff8096c28d18 by task kworker/u17:0/192 [ 43.387158] [ 43.388705] CPU: 6 PID: 192 Comm: kworker/u17:0 Tainted: G W 5.15.59-lockdep #1 59f35e3dfc07f6688b084869895c7a39892c891a localhost ~ # [ 43.410184] Workqueue: hci0 hci_cmd_sync_work [bluetooth] [ 43.418887] Call trace: [ 43.422407] dump_backtrace+0x0/0x424 [ 43.426191] show_stack+0x20/0x2c [ 43.429608] dump_stack_lvl+0x84/0xb4 [ 43.433395] print_address_description+0x30/0x2fc [ 43.438243] kasan_report+0x15c/0x19c [ 43.442070] __asan_report_load8_noabort+0x44/0x50 hciconfig hci0 up [ 43.447009] mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth 8dae3a82177133cfa9626e7322b3b0c8f665102d] [ 43.458568] hci_cmd_sync_work+0x1bc/0x2bc [bluetooth 8dae3a82177133cfa9626e7322b3b0c8f665102d] [ 43.467656] process_one_work+0x59c/0xa88 [ 43.472530] worker_thread+0x81c/0xd18 [ 43.476410] kthread+0x2d4/0x3d8 localhost ~ # [ 43.479753] ret_from_fork+0x10/0x20 [ 43.486588] [ 43.488156] Allocated by task 1118: [ 43.491751] kasan_save_stack+0x38/0x68 [ 43.495709] __kasan_kmalloc+0xb4/0xd0 [ 43.499577] kmem_cache_alloc_trace+0x29c/0x374 [ 43.504238] mgmt_pending_new+0x74/0x200 [bluetooth] sleep 2[ 43.509509] mgmt_pending_add+0x28/0xec [bluetooth] [ 43.515244] remove_adv_monitor+0xf8/0x174 [bluetooth] [ 43.521533] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] [ 43.526527] hci_sock_sendmsg+0x28c/0x95c [bluetooth] [ 43.531873] sock_sendmsg+0xb4/0xd8 [ 43.535472] sock_write_iter+0x1c0/0x2d0 [ 43.539519] do_iter_readv_writev+0x350/0x4e0 [ 43.544012] do_iter_write+0xf0/0x2e4 [ 43.547788] vfs_writev+0xd0/0x13c [ 43.551295] do_writev+0xe8/0x1fc [ 43.554710] __arm64_sys_writev+0x84/0x98 [ 43.558838] invoke_syscall+0x78/0x20c [ 43.562709] el0_svc_common+0x12c/0x2f0 [ 43.566654] do_el0_svc+0x94/0x13c [ 43.570155] el0_svc+0x5c/0x108 [ 43.573391] el0t_64_sync_handler+0x78/0x108 [ 43.577785] el0t_64_sync+0x1a4/0x1a8 [ 43.581564] [ 43.583115] Freed by task 3217: [ 43.586356] kasan_save_stack+0x38/0x68 [ 43.590314] kasan_set_track+0x28/0x3c [ 43.594180] kasan_set_free_info+0x28/0x4c [ 43.598396] ____kasan_slab_free+0x138/0x17c [ 43.602794] __kasan_slab_free+0x18/0x28 [ 43.606838] slab_free_freelist_hook+0x188/0x260 [ 43.611591] kfree+0x138/0x29c [ 43.614741] mgmt_pending_free+0xac/0xdc [bluetooth] [ 43.620003] mgmt_pending_remove+0xd8/0xf0 [bluetooth] [ 43.625434] cmd_complete_rsp+0xc8/0x178 [bluetooth] [ 43.630686] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] [ 43.636198] __mgmt_power_off+0x114/0x26c [bluetooth] [ 43.641532] hci_dev_close_sync+0x2ec/0x7ec [bluetooth] [ 43.647049] hci_dev_do_close+0x3c/0x7c [bluetooth] [ 43.652209] hci_dev_close+0xac/0x164 [bluetooth] [ 43.657190] hci_sock_ioctl+0x298/0x444 [bluetooth] [ 43.662353] sock_do_ioctl+0xd0/0x1e8 [ 43.666134] sock_ioctl+0x4fc/0x72c [ 43.669736] __arm64_sys_ioctl+0x118/0x154 [ 43.673961] invoke_syscall+0x78/0x20c [ 43.677820] el0_svc_common+0x12c/0x2f0 [ 43.681770] do_el0_svc+0x94/0x13c [ 43.685278] el0_svc+0x5c/0x108 [ 43.688514] el0t_64_sync_handler+0x78/0x108 [ 43.692913] el0t_64_sync+0x1a4/0x1a8 Thanks, Zhengping On Mon, Feb 27, 2023 at 3:41 PM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote: > > Hi Zhengping, > > On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > Hi Luiz, > > > > I have a question. Given that each command in the cmd_sync queue > > should clean up the memory in a callback function. I was wondering if > > the call to cmd_complete_rsp in __mgmt_power_off function is still > > necessary? Will this always risk a race condition that cmd has been > > released when the complete callback or _sync function is run? > > Not sure I follow you here, do you have a stack trace when the user > after free occurs? > > > Thanks, > > Zhengping > > > > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > Hi Luiz, > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > instead? > > > > > > That is a good question and we used hci_cmd_sync_clear in the first > > > version, but it will clear the queue and also close the timer. As a > > > result, when the adapter is turned on again, the timer will not > > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and > > > re-initiate the queue or to write a new function which only clears the > > > queue. > > > > > > > We also may want to move the clearing logic to > > > > hci_dev_close_sync since it should be equivalent to > > > > hci_request_cancel_all. > > > > > > I actually have a question here. I saw > > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought > > > it should force clearing the cmd_sync queue. But it seems cannot > > > prevent the use-after-free situation. > > > > > > Any suggestions to improve the solution? > > > > > > Thanks, > > > Zhengping > > > > > > > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > Hi Zhengping, > > > > > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid > > > > > racing conditions which cause use-after-free. > > > > > > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a > > > > > work is queued in the cmd_sync_work queue at the same time, it will > > > > > cause the risk of use-after-free, as the cmd pointer is not checked > > > > > before use. > > > > > > > > > > Signed-off-by: Zhengping Jiang <jiangzp@google.com> > > > > > --- > > > > > > > > > > Changes in v2: > > > > > - Add function to clear the queue without stop the timer > > > > > > > > > > Changes in v1: > > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list > > > > > > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++- > > > > > 1 file changed, 20 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > > > > > index 117eedb6f709..b70365dfff0c 100644 > > > > > --- a/net/bluetooth/hci_sync.c > > > > > +++ b/net/bluetooth/hci_sync.c > > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev) > > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire); > > > > > } > > > > > > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev) > > > > > +{ > > > > > + struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > + > > > > > + mutex_lock(&hdev->cmd_sync_work_lock); > > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) { > > > > > + if (entry->destroy) { > > > > > + hci_req_sync_lock(hdev); > > > > > + entry->destroy(hdev, entry->data, -ECANCELED); > > > > > + hci_req_sync_unlock(hdev); > > > > > + } > > > > > + list_del(&entry->list); > > > > > + kfree(entry); > > > > > + } > > > > > + mutex_unlock(&hdev->cmd_sync_work_lock); > > > > > +} > > > > > + > > > > > void hci_cmd_sync_clear(struct hci_dev *hdev) > > > > > { > > > > > struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev) > > > > > > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY && > > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > > > - hci_dev_test_flag(hdev, HCI_MGMT)) > > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) { > > > > > + hci_pend_cmd_sync_clear(hdev); > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > instead? We also may want to move the clearing logic to > > > > hci_dev_close_sync since it should be equivalent to > > > > hci_request_cancel_all. > > > > > > > > > __mgmt_power_off(hdev); > > > > > + } > > > > > > > > > > hci_inquiry_cache_flush(hdev); > > > > > hci_pend_le_actions_clear(hdev); > > > > > -- > > > > > 2.39.2.722.g9855ee24e9-goog > > > > > > > > > > > > > > > > > -- > > > > Luiz Augusto von Dentz > > > > -- > Luiz Augusto von Dentz
Hi Zhengping, On Mon, Feb 27, 2023 at 3:58 PM Zhengping Jiang <jiangzp@google.com> wrote: > > Hi Luiz, > > Sure. Hope this helps. > Here is one log from the user. > > [ 53.368740] ================================================================== > [ 53.376167] BUG: KASAN: use-after-free in set_ssp_sync+0x44/0x154 [bluetooth] > [ 53.384303] Read of size 8 at addr ffffff80b7ee0318 by task kworker/u17:0/190 > [ 53.396342] CPU: 7 PID: 190 Comm: kworker/u17:0 Tainted: G W > 5.15.59-lockdep #1 29eed131ef0afd42bc369a6a0ca1c69d3653699b > [ 53.408868] Hardware name: Qualcomm Technologies, Inc. sc7280 CRD > platform (rev5+) (DT) > [ 53.417095] Workqueue: hci0 hci_cmd_sync_work [bluetooth] > [ 53.422780] Call trace: > [ 53.425310] dump_backtrace+0x0/0x424 > [ 53.429108] show_stack+0x20/0x2c > > [ 53.432534] dump_stack_lvl+0x84/0xb4 > [ 53.436514] print_address_description+0x30/0x2fc > [ 53.441369] kasan_report+0x15c/0x19c > [ 53.445975] __asan_report_load8_noabort+0x44/0x50 > [ 53.450910] set_ssp_sync+0x44/0x154 [bluetooth > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] > [ 53.460136] hci_cmd_sync_work+0x1c8/0x2c8 [bluetooth > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] > [ 53.472214] process_one_work+0x59c/0xa88 > [ 53.476990] worker_thread+0x81c/0xd18 > [ 53.480854] kthread+0x2d4/0x3d8 > [ 53.484272] ret_from_fork+0x10/0x20 > > [ 53.489733] Allocated by task 1162: > [ 53.493336] kasan_save_stack+0x38/0x68 > [ 53.498115] __kasan_kmalloc+0xb4/0xd0 > [ 53.501993] kmem_cache_alloc_trace+0x29c/0x374 > [ 53.506661] mgmt_pending_new+0x74/0x200 [bluetooth] > [ 53.511905] mgmt_pending_add+0x28/0xec [bluetooth] > [ 53.517059] set_ssp+0x2d8/0x5b0 [bluetooth] > [ 53.521575] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] > [ 53.526538] hci_sock_sendmsg+0x28c/0x95c [bluetooth] > [ 53.531850] sock_sendmsg+0xb4/0xd8 > [ 53.535454] sock_write_iter+0x1c0/0x2d0 > [ 53.539494] do_iter_readv_writev+0x350/0x4e0 > [ 53.543980] do_iter_write+0xf0/0x2e4 > [ 53.547747] vfs_writev+0xd0/0x13c > [ 53.551254] do_writev+0xe8/0x1fc > [ 53.554672] __arm64_sys_writev+0x84/0x98 > [ 53.558805] invoke_syscall+0x78/0x20c > [ 53.562665] el0_svc_common+0x12c/0x2f0 > [ 53.566618] do_el0_svc+0x94/0x13c > [ 53.570125] el0_svc+0x5c/0x108 > [ 53.573374] el0t_64_sync_handler+0x78/0x108 > [ 53.577773] el0t_64_sync+0x1a4/0x1a8 > > [ 53.583089] Freed by task 3207: > [ 53.586325] kasan_save_stack+0x38/0x68 > [ 53.590282] kasan_set_track+0x28/0x3c > [ 53.594153] kasan_set_free_info+0x28/0x4c > [ 53.598369] ____kasan_slab_free+0x138/0x17c > [ 53.602767] __kasan_slab_free+0x18/0x28 > [ 53.606803] slab_free_freelist_hook+0x188/0x260 > [ 53.611559] kfree+0x138/0x29c > [ 53.614708] mgmt_pending_free+0xac/0xdc [bluetooth] > [ 53.619948] mgmt_pending_remove+0xd8/0xf0 [bluetooth] > [ 53.625357] cmd_complete_rsp+0xc8/0x178 [bluetooth] > [ 53.630586] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] > [ 53.636076] __mgmt_power_off+0x114/0x26c [bluetooth] > [ 53.641390] hci_dev_close_sync+0x314/0x814 [bluetooth] > [ 53.646882] hci_dev_do_close+0x3c/0x7c [bluetooth] > [ 53.652017] hci_dev_close+0xa4/0x15c [bluetooth] > [ 53.656980] hci_sock_ioctl+0x298/0x444 [bluetooth] > [ 53.662117] sock_do_ioctl+0xd0/0x1e8 > [ 53.665900] sock_ioctl+0x4fc/0x72c > [ 53.669500] __arm64_sys_ioctl+0x118/0x154 > [ 53.673726] invoke_syscall+0x78/0x20c > [ 53.677587] el0_svc_common+0x12c/0x2f0 > [ 53.681533] do_el0_svc+0x94/0x13c > [ 53.685043] el0_svc+0x5c/0x108 > [ 53.688278] el0t_64_sync_handler+0x78/0x108 > [ 53.692677] el0t_64_sync+0x1a4/0x1a8 > > [ 53.697988] Last potentially related work creation: > [ 53.703009] kasan_save_stack+0x38/0x68 > [ 53.706962] kasan_record_aux_stack+0x104/0x130 > [ 53.711622] __call_rcu+0x14c/0x860 > [ 53.715212] call_rcu+0x18/0x24 > [ 53.718448] sk_filter_uncharge+0xc0/0x120 > [ 53.722667] __sk_destruct+0xb4/0x4a8 > [ 53.726435] sk_destruct+0x78/0xa0 > [ 53.729941] __sk_free+0x190/0x270 > [ 53.733453] sk_free+0x54/0x8c > [ 53.736603] deferred_put_nlk_sk+0x1d4/0x20c > [ 53.741000] rcu_do_batch+0x3e8/0xd08 > [ 53.744772] nocb_cb_wait+0xc8/0xa3c > [ 53.748453] rcu_nocb_cb_kthread+0x48/0x134 > [ 53.752768] kthread+0x2d4/0x3d8 > [ 53.756098] ret_from_fork+0x10/0x20 > > This is another one at a different function but with the same signature. > > [ 43.363512] ================================================================== > [ 43.370966] BUG: KASAN: use-after-free in > mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth] > [ 43.379813] Read of size 8 at addr ffffff8096c28d18 by task kworker/u17:0/192 > [ 43.387158] > [ 43.388705] CPU: 6 PID: 192 Comm: kworker/u17:0 Tainted: G W > 5.15.59-lockdep #1 59f35e3dfc07f6688b084869895c7a39892c891a > localhost ~ # [ 43.410184] Workqueue: hci0 hci_cmd_sync_work [bluetooth] > > [ 43.418887] Call trace: > [ 43.422407] dump_backtrace+0x0/0x424 > [ 43.426191] show_stack+0x20/0x2c > [ 43.429608] dump_stack_lvl+0x84/0xb4 > [ 43.433395] print_address_description+0x30/0x2fc > [ 43.438243] kasan_report+0x15c/0x19c > [ 43.442070] __asan_report_load8_noabort+0x44/0x50 > hciconfig hci0 up > [ 43.447009] mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth > 8dae3a82177133cfa9626e7322b3b0c8f665102d] > [ 43.458568] hci_cmd_sync_work+0x1bc/0x2bc [bluetooth > 8dae3a82177133cfa9626e7322b3b0c8f665102d] > [ 43.467656] process_one_work+0x59c/0xa88 > [ 43.472530] worker_thread+0x81c/0xd18 > [ 43.476410] kthread+0x2d4/0x3d8 > localhost ~ # [ 43.479753] ret_from_fork+0x10/0x20 > [ 43.486588] > [ 43.488156] Allocated by task 1118: > [ 43.491751] kasan_save_stack+0x38/0x68 > [ 43.495709] __kasan_kmalloc+0xb4/0xd0 > [ 43.499577] kmem_cache_alloc_trace+0x29c/0x374 > [ 43.504238] mgmt_pending_new+0x74/0x200 [bluetooth] > sleep 2[ 43.509509] mgmt_pending_add+0x28/0xec [bluetooth] > > [ 43.515244] remove_adv_monitor+0xf8/0x174 [bluetooth] > [ 43.521533] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] > [ 43.526527] hci_sock_sendmsg+0x28c/0x95c [bluetooth] > [ 43.531873] sock_sendmsg+0xb4/0xd8 > [ 43.535472] sock_write_iter+0x1c0/0x2d0 > [ 43.539519] do_iter_readv_writev+0x350/0x4e0 > [ 43.544012] do_iter_write+0xf0/0x2e4 > [ 43.547788] vfs_writev+0xd0/0x13c > [ 43.551295] do_writev+0xe8/0x1fc > [ 43.554710] __arm64_sys_writev+0x84/0x98 > [ 43.558838] invoke_syscall+0x78/0x20c > [ 43.562709] el0_svc_common+0x12c/0x2f0 > [ 43.566654] do_el0_svc+0x94/0x13c > [ 43.570155] el0_svc+0x5c/0x108 > [ 43.573391] el0t_64_sync_handler+0x78/0x108 > [ 43.577785] el0t_64_sync+0x1a4/0x1a8 > [ 43.581564] > [ 43.583115] Freed by task 3217: > [ 43.586356] kasan_save_stack+0x38/0x68 > [ 43.590314] kasan_set_track+0x28/0x3c > [ 43.594180] kasan_set_free_info+0x28/0x4c > [ 43.598396] ____kasan_slab_free+0x138/0x17c > [ 43.602794] __kasan_slab_free+0x18/0x28 > [ 43.606838] slab_free_freelist_hook+0x188/0x260 > [ 43.611591] kfree+0x138/0x29c > [ 43.614741] mgmt_pending_free+0xac/0xdc [bluetooth] > [ 43.620003] mgmt_pending_remove+0xd8/0xf0 [bluetooth] > [ 43.625434] cmd_complete_rsp+0xc8/0x178 [bluetooth] > [ 43.630686] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] > [ 43.636198] __mgmt_power_off+0x114/0x26c [bluetooth] > [ 43.641532] hci_dev_close_sync+0x2ec/0x7ec [bluetooth] > [ 43.647049] hci_dev_do_close+0x3c/0x7c [bluetooth] > [ 43.652209] hci_dev_close+0xac/0x164 [bluetooth] > [ 43.657190] hci_sock_ioctl+0x298/0x444 [bluetooth] > [ 43.662353] sock_do_ioctl+0xd0/0x1e8 > [ 43.666134] sock_ioctl+0x4fc/0x72c > [ 43.669736] __arm64_sys_ioctl+0x118/0x154 > [ 43.673961] invoke_syscall+0x78/0x20c > [ 43.677820] el0_svc_common+0x12c/0x2f0 > [ 43.681770] do_el0_svc+0x94/0x13c > [ 43.685278] el0_svc+0x5c/0x108 > [ 43.688514] el0t_64_sync_handler+0x78/0x108 > [ 43.692913] el0t_64_sync+0x1a4/0x1a8 > > Thanks, > Zhengping Ok, how about we do something like the following: https://gist.github.com/Vudentz/365d664275e4d2e2af157e47f0502f50 The actual real culprit seem to be __mgmt_power_off does cleanup mgmt_pending but that is still accessible via cmd_sync_work_list, this is probably how hci_request was designed but in case of cmd_sync we normally have the data as part of cmd_sync_work_list. > On Mon, Feb 27, 2023 at 3:41 PM Luiz Augusto von Dentz > <luiz.dentz@gmail.com> wrote: > > > > Hi Zhengping, > > > > On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > Hi Luiz, > > > > > > I have a question. Given that each command in the cmd_sync queue > > > should clean up the memory in a callback function. I was wondering if > > > the call to cmd_complete_rsp in __mgmt_power_off function is still > > > necessary? Will this always risk a race condition that cmd has been > > > released when the complete callback or _sync function is run? > > > > Not sure I follow you here, do you have a stack trace when the user > > after free occurs? > > > > > Thanks, > > > Zhengping > > > > > > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > Hi Luiz, > > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > > instead? > > > > > > > > That is a good question and we used hci_cmd_sync_clear in the first > > > > version, but it will clear the queue and also close the timer. As a > > > > result, when the adapter is turned on again, the timer will not > > > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and > > > > re-initiate the queue or to write a new function which only clears the > > > > queue. > > > > > > > > > We also may want to move the clearing logic to > > > > > hci_dev_close_sync since it should be equivalent to > > > > > hci_request_cancel_all. > > > > > > > > I actually have a question here. I saw > > > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought > > > > it should force clearing the cmd_sync queue. But it seems cannot > > > > prevent the use-after-free situation. > > > > > > > > Any suggestions to improve the solution? > > > > > > > > Thanks, > > > > Zhengping > > > > > > > > > > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz > > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > > > Hi Zhengping, > > > > > > > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid > > > > > > racing conditions which cause use-after-free. > > > > > > > > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a > > > > > > work is queued in the cmd_sync_work queue at the same time, it will > > > > > > cause the risk of use-after-free, as the cmd pointer is not checked > > > > > > before use. > > > > > > > > > > > > Signed-off-by: Zhengping Jiang <jiangzp@google.com> > > > > > > --- > > > > > > > > > > > > Changes in v2: > > > > > > - Add function to clear the queue without stop the timer > > > > > > > > > > > > Changes in v1: > > > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list > > > > > > > > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++- > > > > > > 1 file changed, 20 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > > > > > > index 117eedb6f709..b70365dfff0c 100644 > > > > > > --- a/net/bluetooth/hci_sync.c > > > > > > +++ b/net/bluetooth/hci_sync.c > > > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev) > > > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire); > > > > > > } > > > > > > > > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev) > > > > > > +{ > > > > > > + struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > > + > > > > > > + mutex_lock(&hdev->cmd_sync_work_lock); > > > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) { > > > > > > + if (entry->destroy) { > > > > > > + hci_req_sync_lock(hdev); > > > > > > + entry->destroy(hdev, entry->data, -ECANCELED); > > > > > > + hci_req_sync_unlock(hdev); > > > > > > + } > > > > > > + list_del(&entry->list); > > > > > > + kfree(entry); > > > > > > + } > > > > > > + mutex_unlock(&hdev->cmd_sync_work_lock); > > > > > > +} > > > > > > + > > > > > > void hci_cmd_sync_clear(struct hci_dev *hdev) > > > > > > { > > > > > > struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev) > > > > > > > > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY && > > > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > > > > - hci_dev_test_flag(hdev, HCI_MGMT)) > > > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) { > > > > > > + hci_pend_cmd_sync_clear(hdev); > > > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > > instead? We also may want to move the clearing logic to > > > > > hci_dev_close_sync since it should be equivalent to > > > > > hci_request_cancel_all. > > > > > > > > > > > __mgmt_power_off(hdev); > > > > > > + } > > > > > > > > > > > > hci_inquiry_cache_flush(hdev); > > > > > > hci_pend_le_actions_clear(hdev); > > > > > > -- > > > > > > 2.39.2.722.g9855ee24e9-goog > > > > > > > > > > > > > > > > > > > > > -- > > > > > Luiz Augusto von Dentz > > > > > > > > -- > > Luiz Augusto von Dentz
Hi Luiz, This looks good to me. I still have a question. Does this prevent a job scheduled between "hci_cmd_sync_work_list_clear(hdev);" and "__mgmt_power_off(hdev);"? Otherwise, the chance for a race condition is still there. Maybe using cancel_work_sync and re-init the workqueue timer is the right thing to do? Thanks, Zhengping On Tue, Feb 28, 2023 at 1:11 PM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote: > > Hi Zhengping, > > On Mon, Feb 27, 2023 at 3:58 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > Hi Luiz, > > > > Sure. Hope this helps. > > Here is one log from the user. > > > > [ 53.368740] ================================================================== > > [ 53.376167] BUG: KASAN: use-after-free in set_ssp_sync+0x44/0x154 [bluetooth] > > [ 53.384303] Read of size 8 at addr ffffff80b7ee0318 by task kworker/u17:0/190 > > [ 53.396342] CPU: 7 PID: 190 Comm: kworker/u17:0 Tainted: G W > > 5.15.59-lockdep #1 29eed131ef0afd42bc369a6a0ca1c69d3653699b > > [ 53.408868] Hardware name: Qualcomm Technologies, Inc. sc7280 CRD > > platform (rev5+) (DT) > > [ 53.417095] Workqueue: hci0 hci_cmd_sync_work [bluetooth] > > [ 53.422780] Call trace: > > [ 53.425310] dump_backtrace+0x0/0x424 > > [ 53.429108] show_stack+0x20/0x2c > > > > [ 53.432534] dump_stack_lvl+0x84/0xb4 > > [ 53.436514] print_address_description+0x30/0x2fc > > [ 53.441369] kasan_report+0x15c/0x19c > > [ 53.445975] __asan_report_load8_noabort+0x44/0x50 > > [ 53.450910] set_ssp_sync+0x44/0x154 [bluetooth > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] > > [ 53.460136] hci_cmd_sync_work+0x1c8/0x2c8 [bluetooth > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] > > [ 53.472214] process_one_work+0x59c/0xa88 > > [ 53.476990] worker_thread+0x81c/0xd18 > > [ 53.480854] kthread+0x2d4/0x3d8 > > [ 53.484272] ret_from_fork+0x10/0x20 > > > > [ 53.489733] Allocated by task 1162: > > [ 53.493336] kasan_save_stack+0x38/0x68 > > [ 53.498115] __kasan_kmalloc+0xb4/0xd0 > > [ 53.501993] kmem_cache_alloc_trace+0x29c/0x374 > > [ 53.506661] mgmt_pending_new+0x74/0x200 [bluetooth] > > [ 53.511905] mgmt_pending_add+0x28/0xec [bluetooth] > > [ 53.517059] set_ssp+0x2d8/0x5b0 [bluetooth] > > [ 53.521575] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] > > [ 53.526538] hci_sock_sendmsg+0x28c/0x95c [bluetooth] > > [ 53.531850] sock_sendmsg+0xb4/0xd8 > > [ 53.535454] sock_write_iter+0x1c0/0x2d0 > > [ 53.539494] do_iter_readv_writev+0x350/0x4e0 > > [ 53.543980] do_iter_write+0xf0/0x2e4 > > [ 53.547747] vfs_writev+0xd0/0x13c > > [ 53.551254] do_writev+0xe8/0x1fc > > [ 53.554672] __arm64_sys_writev+0x84/0x98 > > [ 53.558805] invoke_syscall+0x78/0x20c > > [ 53.562665] el0_svc_common+0x12c/0x2f0 > > [ 53.566618] do_el0_svc+0x94/0x13c > > [ 53.570125] el0_svc+0x5c/0x108 > > [ 53.573374] el0t_64_sync_handler+0x78/0x108 > > [ 53.577773] el0t_64_sync+0x1a4/0x1a8 > > > > [ 53.583089] Freed by task 3207: > > [ 53.586325] kasan_save_stack+0x38/0x68 > > [ 53.590282] kasan_set_track+0x28/0x3c > > [ 53.594153] kasan_set_free_info+0x28/0x4c > > [ 53.598369] ____kasan_slab_free+0x138/0x17c > > [ 53.602767] __kasan_slab_free+0x18/0x28 > > [ 53.606803] slab_free_freelist_hook+0x188/0x260 > > [ 53.611559] kfree+0x138/0x29c > > [ 53.614708] mgmt_pending_free+0xac/0xdc [bluetooth] > > [ 53.619948] mgmt_pending_remove+0xd8/0xf0 [bluetooth] > > [ 53.625357] cmd_complete_rsp+0xc8/0x178 [bluetooth] > > [ 53.630586] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] > > [ 53.636076] __mgmt_power_off+0x114/0x26c [bluetooth] > > [ 53.641390] hci_dev_close_sync+0x314/0x814 [bluetooth] > > [ 53.646882] hci_dev_do_close+0x3c/0x7c [bluetooth] > > [ 53.652017] hci_dev_close+0xa4/0x15c [bluetooth] > > [ 53.656980] hci_sock_ioctl+0x298/0x444 [bluetooth] > > [ 53.662117] sock_do_ioctl+0xd0/0x1e8 > > [ 53.665900] sock_ioctl+0x4fc/0x72c > > [ 53.669500] __arm64_sys_ioctl+0x118/0x154 > > [ 53.673726] invoke_syscall+0x78/0x20c > > [ 53.677587] el0_svc_common+0x12c/0x2f0 > > [ 53.681533] do_el0_svc+0x94/0x13c > > [ 53.685043] el0_svc+0x5c/0x108 > > [ 53.688278] el0t_64_sync_handler+0x78/0x108 > > [ 53.692677] el0t_64_sync+0x1a4/0x1a8 > > > > [ 53.697988] Last potentially related work creation: > > [ 53.703009] kasan_save_stack+0x38/0x68 > > [ 53.706962] kasan_record_aux_stack+0x104/0x130 > > [ 53.711622] __call_rcu+0x14c/0x860 > > [ 53.715212] call_rcu+0x18/0x24 > > [ 53.718448] sk_filter_uncharge+0xc0/0x120 > > [ 53.722667] __sk_destruct+0xb4/0x4a8 > > [ 53.726435] sk_destruct+0x78/0xa0 > > [ 53.729941] __sk_free+0x190/0x270 > > [ 53.733453] sk_free+0x54/0x8c > > [ 53.736603] deferred_put_nlk_sk+0x1d4/0x20c > > [ 53.741000] rcu_do_batch+0x3e8/0xd08 > > [ 53.744772] nocb_cb_wait+0xc8/0xa3c > > [ 53.748453] rcu_nocb_cb_kthread+0x48/0x134 > > [ 53.752768] kthread+0x2d4/0x3d8 > > [ 53.756098] ret_from_fork+0x10/0x20 > > > > This is another one at a different function but with the same signature. > > > > [ 43.363512] ================================================================== > > [ 43.370966] BUG: KASAN: use-after-free in > > mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth] > > [ 43.379813] Read of size 8 at addr ffffff8096c28d18 by task kworker/u17:0/192 > > [ 43.387158] > > [ 43.388705] CPU: 6 PID: 192 Comm: kworker/u17:0 Tainted: G W > > 5.15.59-lockdep #1 59f35e3dfc07f6688b084869895c7a39892c891a > > localhost ~ # [ 43.410184] Workqueue: hci0 hci_cmd_sync_work [bluetooth] > > > > [ 43.418887] Call trace: > > [ 43.422407] dump_backtrace+0x0/0x424 > > [ 43.426191] show_stack+0x20/0x2c > > [ 43.429608] dump_stack_lvl+0x84/0xb4 > > [ 43.433395] print_address_description+0x30/0x2fc > > [ 43.438243] kasan_report+0x15c/0x19c > > [ 43.442070] __asan_report_load8_noabort+0x44/0x50 > > hciconfig hci0 up > > [ 43.447009] mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth > > 8dae3a82177133cfa9626e7322b3b0c8f665102d] > > [ 43.458568] hci_cmd_sync_work+0x1bc/0x2bc [bluetooth > > 8dae3a82177133cfa9626e7322b3b0c8f665102d] > > [ 43.467656] process_one_work+0x59c/0xa88 > > [ 43.472530] worker_thread+0x81c/0xd18 > > [ 43.476410] kthread+0x2d4/0x3d8 > > localhost ~ # [ 43.479753] ret_from_fork+0x10/0x20 > > [ 43.486588] > > [ 43.488156] Allocated by task 1118: > > [ 43.491751] kasan_save_stack+0x38/0x68 > > [ 43.495709] __kasan_kmalloc+0xb4/0xd0 > > [ 43.499577] kmem_cache_alloc_trace+0x29c/0x374 > > [ 43.504238] mgmt_pending_new+0x74/0x200 [bluetooth] > > sleep 2[ 43.509509] mgmt_pending_add+0x28/0xec [bluetooth] > > > > [ 43.515244] remove_adv_monitor+0xf8/0x174 [bluetooth] > > [ 43.521533] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] > > [ 43.526527] hci_sock_sendmsg+0x28c/0x95c [bluetooth] > > [ 43.531873] sock_sendmsg+0xb4/0xd8 > > [ 43.535472] sock_write_iter+0x1c0/0x2d0 > > [ 43.539519] do_iter_readv_writev+0x350/0x4e0 > > [ 43.544012] do_iter_write+0xf0/0x2e4 > > [ 43.547788] vfs_writev+0xd0/0x13c > > [ 43.551295] do_writev+0xe8/0x1fc > > [ 43.554710] __arm64_sys_writev+0x84/0x98 > > [ 43.558838] invoke_syscall+0x78/0x20c > > [ 43.562709] el0_svc_common+0x12c/0x2f0 > > [ 43.566654] do_el0_svc+0x94/0x13c > > [ 43.570155] el0_svc+0x5c/0x108 > > [ 43.573391] el0t_64_sync_handler+0x78/0x108 > > [ 43.577785] el0t_64_sync+0x1a4/0x1a8 > > [ 43.581564] > > [ 43.583115] Freed by task 3217: > > [ 43.586356] kasan_save_stack+0x38/0x68 > > [ 43.590314] kasan_set_track+0x28/0x3c > > [ 43.594180] kasan_set_free_info+0x28/0x4c > > [ 43.598396] ____kasan_slab_free+0x138/0x17c > > [ 43.602794] __kasan_slab_free+0x18/0x28 > > [ 43.606838] slab_free_freelist_hook+0x188/0x260 > > [ 43.611591] kfree+0x138/0x29c > > [ 43.614741] mgmt_pending_free+0xac/0xdc [bluetooth] > > [ 43.620003] mgmt_pending_remove+0xd8/0xf0 [bluetooth] > > [ 43.625434] cmd_complete_rsp+0xc8/0x178 [bluetooth] > > [ 43.630686] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] > > [ 43.636198] __mgmt_power_off+0x114/0x26c [bluetooth] > > [ 43.641532] hci_dev_close_sync+0x2ec/0x7ec [bluetooth] > > [ 43.647049] hci_dev_do_close+0x3c/0x7c [bluetooth] > > [ 43.652209] hci_dev_close+0xac/0x164 [bluetooth] > > [ 43.657190] hci_sock_ioctl+0x298/0x444 [bluetooth] > > [ 43.662353] sock_do_ioctl+0xd0/0x1e8 > > [ 43.666134] sock_ioctl+0x4fc/0x72c > > [ 43.669736] __arm64_sys_ioctl+0x118/0x154 > > [ 43.673961] invoke_syscall+0x78/0x20c > > [ 43.677820] el0_svc_common+0x12c/0x2f0 > > [ 43.681770] do_el0_svc+0x94/0x13c > > [ 43.685278] el0_svc+0x5c/0x108 > > [ 43.688514] el0t_64_sync_handler+0x78/0x108 > > [ 43.692913] el0t_64_sync+0x1a4/0x1a8 > > > > Thanks, > > Zhengping > > Ok, how about we do something like the following: > > https://gist.github.com/Vudentz/365d664275e4d2e2af157e47f0502f50 > > The actual real culprit seem to be __mgmt_power_off does cleanup > mgmt_pending but that is still accessible via cmd_sync_work_list, this > is probably how hci_request was designed but in case of cmd_sync we > normally have the data as part of cmd_sync_work_list. > > > On Mon, Feb 27, 2023 at 3:41 PM Luiz Augusto von Dentz > > <luiz.dentz@gmail.com> wrote: > > > > > > Hi Zhengping, > > > > > > On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > Hi Luiz, > > > > > > > > I have a question. Given that each command in the cmd_sync queue > > > > should clean up the memory in a callback function. I was wondering if > > > > the call to cmd_complete_rsp in __mgmt_power_off function is still > > > > necessary? Will this always risk a race condition that cmd has been > > > > released when the complete callback or _sync function is run? > > > > > > Not sure I follow you here, do you have a stack trace when the user > > > after free occurs? > > > > > > > Thanks, > > > > Zhengping > > > > > > > > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > Hi Luiz, > > > > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > > > instead? > > > > > > > > > > That is a good question and we used hci_cmd_sync_clear in the first > > > > > version, but it will clear the queue and also close the timer. As a > > > > > result, when the adapter is turned on again, the timer will not > > > > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and > > > > > re-initiate the queue or to write a new function which only clears the > > > > > queue. > > > > > > > > > > > We also may want to move the clearing logic to > > > > > > hci_dev_close_sync since it should be equivalent to > > > > > > hci_request_cancel_all. > > > > > > > > > > I actually have a question here. I saw > > > > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought > > > > > it should force clearing the cmd_sync queue. But it seems cannot > > > > > prevent the use-after-free situation. > > > > > > > > > > Any suggestions to improve the solution? > > > > > > > > > > Thanks, > > > > > Zhengping > > > > > > > > > > > > > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz > > > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > > > > > Hi Zhengping, > > > > > > > > > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid > > > > > > > racing conditions which cause use-after-free. > > > > > > > > > > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a > > > > > > > work is queued in the cmd_sync_work queue at the same time, it will > > > > > > > cause the risk of use-after-free, as the cmd pointer is not checked > > > > > > > before use. > > > > > > > > > > > > > > Signed-off-by: Zhengping Jiang <jiangzp@google.com> > > > > > > > --- > > > > > > > > > > > > > > Changes in v2: > > > > > > > - Add function to clear the queue without stop the timer > > > > > > > > > > > > > > Changes in v1: > > > > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list > > > > > > > > > > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++- > > > > > > > 1 file changed, 20 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > > > > > > > index 117eedb6f709..b70365dfff0c 100644 > > > > > > > --- a/net/bluetooth/hci_sync.c > > > > > > > +++ b/net/bluetooth/hci_sync.c > > > > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev) > > > > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire); > > > > > > > } > > > > > > > > > > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev) > > > > > > > +{ > > > > > > > + struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > > > + > > > > > > > + mutex_lock(&hdev->cmd_sync_work_lock); > > > > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) { > > > > > > > + if (entry->destroy) { > > > > > > > + hci_req_sync_lock(hdev); > > > > > > > + entry->destroy(hdev, entry->data, -ECANCELED); > > > > > > > + hci_req_sync_unlock(hdev); > > > > > > > + } > > > > > > > + list_del(&entry->list); > > > > > > > + kfree(entry); > > > > > > > + } > > > > > > > + mutex_unlock(&hdev->cmd_sync_work_lock); > > > > > > > +} > > > > > > > + > > > > > > > void hci_cmd_sync_clear(struct hci_dev *hdev) > > > > > > > { > > > > > > > struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev) > > > > > > > > > > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY && > > > > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > > > > > - hci_dev_test_flag(hdev, HCI_MGMT)) > > > > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) { > > > > > > > + hci_pend_cmd_sync_clear(hdev); > > > > > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > > > instead? We also may want to move the clearing logic to > > > > > > hci_dev_close_sync since it should be equivalent to > > > > > > hci_request_cancel_all. > > > > > > > > > > > > > __mgmt_power_off(hdev); > > > > > > > + } > > > > > > > > > > > > > > hci_inquiry_cache_flush(hdev); > > > > > > > hci_pend_le_actions_clear(hdev); > > > > > > > -- > > > > > > > 2.39.2.722.g9855ee24e9-goog > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Luiz Augusto von Dentz > > > > > > > > > > > > -- > > > Luiz Augusto von Dentz > > > > -- > Luiz Augusto von Dentz
Hi Zhengping, On Tue, Feb 28, 2023 at 4:18 PM Zhengping Jiang <jiangzp@google.com> wrote: > > Hi Luiz, > > This looks good to me. I still have a question. Does this prevent a > job scheduled between "hci_cmd_sync_work_list_clear(hdev);" and > "__mgmt_power_off(hdev);"? Otherwise, the chance for a race condition > is still there. Maybe using cancel_work_sync and re-init the workqueue > timer is the right thing to do? I tried the cancel_work_sync but it doesn't work since to cmd_sync_work itself can call hci_dev_close_sync so it deadlocks, Ive also tried stopping new scheduling of new work based on HCI_UP flag but that causes some tests not to run, perhaps we need a dedicated flag to ensure cmd_sync cannot be schedule after a certain point but I could found the exact point it is, anyway I fine leaving this to when we actually have a more clear understanding or a reproducer. > Thanks, > Zhengping > > On Tue, Feb 28, 2023 at 1:11 PM Luiz Augusto von Dentz > <luiz.dentz@gmail.com> wrote: > > > > Hi Zhengping, > > > > On Mon, Feb 27, 2023 at 3:58 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > Hi Luiz, > > > > > > Sure. Hope this helps. > > > Here is one log from the user. > > > > > > [ 53.368740] ================================================================== > > > [ 53.376167] BUG: KASAN: use-after-free in set_ssp_sync+0x44/0x154 [bluetooth] > > > [ 53.384303] Read of size 8 at addr ffffff80b7ee0318 by task kworker/u17:0/190 > > > [ 53.396342] CPU: 7 PID: 190 Comm: kworker/u17:0 Tainted: G W > > > 5.15.59-lockdep #1 29eed131ef0afd42bc369a6a0ca1c69d3653699b > > > [ 53.408868] Hardware name: Qualcomm Technologies, Inc. sc7280 CRD > > > platform (rev5+) (DT) > > > [ 53.417095] Workqueue: hci0 hci_cmd_sync_work [bluetooth] > > > [ 53.422780] Call trace: > > > [ 53.425310] dump_backtrace+0x0/0x424 > > > [ 53.429108] show_stack+0x20/0x2c > > > > > > [ 53.432534] dump_stack_lvl+0x84/0xb4 > > > [ 53.436514] print_address_description+0x30/0x2fc > > > [ 53.441369] kasan_report+0x15c/0x19c > > > [ 53.445975] __asan_report_load8_noabort+0x44/0x50 > > > [ 53.450910] set_ssp_sync+0x44/0x154 [bluetooth > > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] > > > [ 53.460136] hci_cmd_sync_work+0x1c8/0x2c8 [bluetooth > > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] > > > [ 53.472214] process_one_work+0x59c/0xa88 > > > [ 53.476990] worker_thread+0x81c/0xd18 > > > [ 53.480854] kthread+0x2d4/0x3d8 > > > [ 53.484272] ret_from_fork+0x10/0x20 > > > > > > [ 53.489733] Allocated by task 1162: > > > [ 53.493336] kasan_save_stack+0x38/0x68 > > > [ 53.498115] __kasan_kmalloc+0xb4/0xd0 > > > [ 53.501993] kmem_cache_alloc_trace+0x29c/0x374 > > > [ 53.506661] mgmt_pending_new+0x74/0x200 [bluetooth] > > > [ 53.511905] mgmt_pending_add+0x28/0xec [bluetooth] > > > [ 53.517059] set_ssp+0x2d8/0x5b0 [bluetooth] > > > [ 53.521575] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] > > > [ 53.526538] hci_sock_sendmsg+0x28c/0x95c [bluetooth] > > > [ 53.531850] sock_sendmsg+0xb4/0xd8 > > > [ 53.535454] sock_write_iter+0x1c0/0x2d0 > > > [ 53.539494] do_iter_readv_writev+0x350/0x4e0 > > > [ 53.543980] do_iter_write+0xf0/0x2e4 > > > [ 53.547747] vfs_writev+0xd0/0x13c > > > [ 53.551254] do_writev+0xe8/0x1fc > > > [ 53.554672] __arm64_sys_writev+0x84/0x98 > > > [ 53.558805] invoke_syscall+0x78/0x20c > > > [ 53.562665] el0_svc_common+0x12c/0x2f0 > > > [ 53.566618] do_el0_svc+0x94/0x13c > > > [ 53.570125] el0_svc+0x5c/0x108 > > > [ 53.573374] el0t_64_sync_handler+0x78/0x108 > > > [ 53.577773] el0t_64_sync+0x1a4/0x1a8 > > > > > > [ 53.583089] Freed by task 3207: > > > [ 53.586325] kasan_save_stack+0x38/0x68 > > > [ 53.590282] kasan_set_track+0x28/0x3c > > > [ 53.594153] kasan_set_free_info+0x28/0x4c > > > [ 53.598369] ____kasan_slab_free+0x138/0x17c > > > [ 53.602767] __kasan_slab_free+0x18/0x28 > > > [ 53.606803] slab_free_freelist_hook+0x188/0x260 > > > [ 53.611559] kfree+0x138/0x29c > > > [ 53.614708] mgmt_pending_free+0xac/0xdc [bluetooth] > > > [ 53.619948] mgmt_pending_remove+0xd8/0xf0 [bluetooth] > > > [ 53.625357] cmd_complete_rsp+0xc8/0x178 [bluetooth] > > > [ 53.630586] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] > > > [ 53.636076] __mgmt_power_off+0x114/0x26c [bluetooth] > > > [ 53.641390] hci_dev_close_sync+0x314/0x814 [bluetooth] > > > [ 53.646882] hci_dev_do_close+0x3c/0x7c [bluetooth] > > > [ 53.652017] hci_dev_close+0xa4/0x15c [bluetooth] > > > [ 53.656980] hci_sock_ioctl+0x298/0x444 [bluetooth] > > > [ 53.662117] sock_do_ioctl+0xd0/0x1e8 > > > [ 53.665900] sock_ioctl+0x4fc/0x72c > > > [ 53.669500] __arm64_sys_ioctl+0x118/0x154 > > > [ 53.673726] invoke_syscall+0x78/0x20c > > > [ 53.677587] el0_svc_common+0x12c/0x2f0 > > > [ 53.681533] do_el0_svc+0x94/0x13c > > > [ 53.685043] el0_svc+0x5c/0x108 > > > [ 53.688278] el0t_64_sync_handler+0x78/0x108 > > > [ 53.692677] el0t_64_sync+0x1a4/0x1a8 > > > > > > [ 53.697988] Last potentially related work creation: > > > [ 53.703009] kasan_save_stack+0x38/0x68 > > > [ 53.706962] kasan_record_aux_stack+0x104/0x130 > > > [ 53.711622] __call_rcu+0x14c/0x860 > > > [ 53.715212] call_rcu+0x18/0x24 > > > [ 53.718448] sk_filter_uncharge+0xc0/0x120 > > > [ 53.722667] __sk_destruct+0xb4/0x4a8 > > > [ 53.726435] sk_destruct+0x78/0xa0 > > > [ 53.729941] __sk_free+0x190/0x270 > > > [ 53.733453] sk_free+0x54/0x8c > > > [ 53.736603] deferred_put_nlk_sk+0x1d4/0x20c > > > [ 53.741000] rcu_do_batch+0x3e8/0xd08 > > > [ 53.744772] nocb_cb_wait+0xc8/0xa3c > > > [ 53.748453] rcu_nocb_cb_kthread+0x48/0x134 > > > [ 53.752768] kthread+0x2d4/0x3d8 > > > [ 53.756098] ret_from_fork+0x10/0x20 > > > > > > This is another one at a different function but with the same signature. > > > > > > [ 43.363512] ================================================================== > > > [ 43.370966] BUG: KASAN: use-after-free in > > > mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth] > > > [ 43.379813] Read of size 8 at addr ffffff8096c28d18 by task kworker/u17:0/192 > > > [ 43.387158] > > > [ 43.388705] CPU: 6 PID: 192 Comm: kworker/u17:0 Tainted: G W > > > 5.15.59-lockdep #1 59f35e3dfc07f6688b084869895c7a39892c891a > > > localhost ~ # [ 43.410184] Workqueue: hci0 hci_cmd_sync_work [bluetooth] > > > > > > [ 43.418887] Call trace: > > > [ 43.422407] dump_backtrace+0x0/0x424 > > > [ 43.426191] show_stack+0x20/0x2c > > > [ 43.429608] dump_stack_lvl+0x84/0xb4 > > > [ 43.433395] print_address_description+0x30/0x2fc > > > [ 43.438243] kasan_report+0x15c/0x19c > > > [ 43.442070] __asan_report_load8_noabort+0x44/0x50 > > > hciconfig hci0 up > > > [ 43.447009] mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth > > > 8dae3a82177133cfa9626e7322b3b0c8f665102d] > > > [ 43.458568] hci_cmd_sync_work+0x1bc/0x2bc [bluetooth > > > 8dae3a82177133cfa9626e7322b3b0c8f665102d] > > > [ 43.467656] process_one_work+0x59c/0xa88 > > > [ 43.472530] worker_thread+0x81c/0xd18 > > > [ 43.476410] kthread+0x2d4/0x3d8 > > > localhost ~ # [ 43.479753] ret_from_fork+0x10/0x20 > > > [ 43.486588] > > > [ 43.488156] Allocated by task 1118: > > > [ 43.491751] kasan_save_stack+0x38/0x68 > > > [ 43.495709] __kasan_kmalloc+0xb4/0xd0 > > > [ 43.499577] kmem_cache_alloc_trace+0x29c/0x374 > > > [ 43.504238] mgmt_pending_new+0x74/0x200 [bluetooth] > > > sleep 2[ 43.509509] mgmt_pending_add+0x28/0xec [bluetooth] > > > > > > [ 43.515244] remove_adv_monitor+0xf8/0x174 [bluetooth] > > > [ 43.521533] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] > > > [ 43.526527] hci_sock_sendmsg+0x28c/0x95c [bluetooth] > > > [ 43.531873] sock_sendmsg+0xb4/0xd8 > > > [ 43.535472] sock_write_iter+0x1c0/0x2d0 > > > [ 43.539519] do_iter_readv_writev+0x350/0x4e0 > > > [ 43.544012] do_iter_write+0xf0/0x2e4 > > > [ 43.547788] vfs_writev+0xd0/0x13c > > > [ 43.551295] do_writev+0xe8/0x1fc > > > [ 43.554710] __arm64_sys_writev+0x84/0x98 > > > [ 43.558838] invoke_syscall+0x78/0x20c > > > [ 43.562709] el0_svc_common+0x12c/0x2f0 > > > [ 43.566654] do_el0_svc+0x94/0x13c > > > [ 43.570155] el0_svc+0x5c/0x108 > > > [ 43.573391] el0t_64_sync_handler+0x78/0x108 > > > [ 43.577785] el0t_64_sync+0x1a4/0x1a8 > > > [ 43.581564] > > > [ 43.583115] Freed by task 3217: > > > [ 43.586356] kasan_save_stack+0x38/0x68 > > > [ 43.590314] kasan_set_track+0x28/0x3c > > > [ 43.594180] kasan_set_free_info+0x28/0x4c > > > [ 43.598396] ____kasan_slab_free+0x138/0x17c > > > [ 43.602794] __kasan_slab_free+0x18/0x28 > > > [ 43.606838] slab_free_freelist_hook+0x188/0x260 > > > [ 43.611591] kfree+0x138/0x29c > > > [ 43.614741] mgmt_pending_free+0xac/0xdc [bluetooth] > > > [ 43.620003] mgmt_pending_remove+0xd8/0xf0 [bluetooth] > > > [ 43.625434] cmd_complete_rsp+0xc8/0x178 [bluetooth] > > > [ 43.630686] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] > > > [ 43.636198] __mgmt_power_off+0x114/0x26c [bluetooth] > > > [ 43.641532] hci_dev_close_sync+0x2ec/0x7ec [bluetooth] > > > [ 43.647049] hci_dev_do_close+0x3c/0x7c [bluetooth] > > > [ 43.652209] hci_dev_close+0xac/0x164 [bluetooth] > > > [ 43.657190] hci_sock_ioctl+0x298/0x444 [bluetooth] > > > [ 43.662353] sock_do_ioctl+0xd0/0x1e8 > > > [ 43.666134] sock_ioctl+0x4fc/0x72c > > > [ 43.669736] __arm64_sys_ioctl+0x118/0x154 > > > [ 43.673961] invoke_syscall+0x78/0x20c > > > [ 43.677820] el0_svc_common+0x12c/0x2f0 > > > [ 43.681770] do_el0_svc+0x94/0x13c > > > [ 43.685278] el0_svc+0x5c/0x108 > > > [ 43.688514] el0t_64_sync_handler+0x78/0x108 > > > [ 43.692913] el0t_64_sync+0x1a4/0x1a8 > > > > > > Thanks, > > > Zhengping > > > > Ok, how about we do something like the following: > > > > https://gist.github.com/Vudentz/365d664275e4d2e2af157e47f0502f50 > > > > The actual real culprit seem to be __mgmt_power_off does cleanup > > mgmt_pending but that is still accessible via cmd_sync_work_list, this > > is probably how hci_request was designed but in case of cmd_sync we > > normally have the data as part of cmd_sync_work_list. > > > > > On Mon, Feb 27, 2023 at 3:41 PM Luiz Augusto von Dentz > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > Hi Zhengping, > > > > > > > > On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > Hi Luiz, > > > > > > > > > > I have a question. Given that each command in the cmd_sync queue > > > > > should clean up the memory in a callback function. I was wondering if > > > > > the call to cmd_complete_rsp in __mgmt_power_off function is still > > > > > necessary? Will this always risk a race condition that cmd has been > > > > > released when the complete callback or _sync function is run? > > > > > > > > Not sure I follow you here, do you have a stack trace when the user > > > > after free occurs? > > > > > > > > > Thanks, > > > > > Zhengping > > > > > > > > > > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > Hi Luiz, > > > > > > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > > > > instead? > > > > > > > > > > > > That is a good question and we used hci_cmd_sync_clear in the first > > > > > > version, but it will clear the queue and also close the timer. As a > > > > > > result, when the adapter is turned on again, the timer will not > > > > > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and > > > > > > re-initiate the queue or to write a new function which only clears the > > > > > > queue. > > > > > > > > > > > > > We also may want to move the clearing logic to > > > > > > > hci_dev_close_sync since it should be equivalent to > > > > > > > hci_request_cancel_all. > > > > > > > > > > > > I actually have a question here. I saw > > > > > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought > > > > > > it should force clearing the cmd_sync queue. But it seems cannot > > > > > > prevent the use-after-free situation. > > > > > > > > > > > > Any suggestions to improve the solution? > > > > > > > > > > > > Thanks, > > > > > > Zhengping > > > > > > > > > > > > > > > > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz > > > > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > > > > > > > Hi Zhengping, > > > > > > > > > > > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid > > > > > > > > racing conditions which cause use-after-free. > > > > > > > > > > > > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a > > > > > > > > work is queued in the cmd_sync_work queue at the same time, it will > > > > > > > > cause the risk of use-after-free, as the cmd pointer is not checked > > > > > > > > before use. > > > > > > > > > > > > > > > > Signed-off-by: Zhengping Jiang <jiangzp@google.com> > > > > > > > > --- > > > > > > > > > > > > > > > > Changes in v2: > > > > > > > > - Add function to clear the queue without stop the timer > > > > > > > > > > > > > > > > Changes in v1: > > > > > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list > > > > > > > > > > > > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++- > > > > > > > > 1 file changed, 20 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > > > > > > > > index 117eedb6f709..b70365dfff0c 100644 > > > > > > > > --- a/net/bluetooth/hci_sync.c > > > > > > > > +++ b/net/bluetooth/hci_sync.c > > > > > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev) > > > > > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire); > > > > > > > > } > > > > > > > > > > > > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev) > > > > > > > > +{ > > > > > > > > + struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > > > > + > > > > > > > > + mutex_lock(&hdev->cmd_sync_work_lock); > > > > > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) { > > > > > > > > + if (entry->destroy) { > > > > > > > > + hci_req_sync_lock(hdev); > > > > > > > > + entry->destroy(hdev, entry->data, -ECANCELED); > > > > > > > > + hci_req_sync_unlock(hdev); > > > > > > > > + } > > > > > > > > + list_del(&entry->list); > > > > > > > > + kfree(entry); > > > > > > > > + } > > > > > > > > + mutex_unlock(&hdev->cmd_sync_work_lock); > > > > > > > > +} > > > > > > > > + > > > > > > > > void hci_cmd_sync_clear(struct hci_dev *hdev) > > > > > > > > { > > > > > > > > struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev) > > > > > > > > > > > > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY && > > > > > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > > > > > > - hci_dev_test_flag(hdev, HCI_MGMT)) > > > > > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) { > > > > > > > > + hci_pend_cmd_sync_clear(hdev); > > > > > > > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > > > > instead? We also may want to move the clearing logic to > > > > > > > hci_dev_close_sync since it should be equivalent to > > > > > > > hci_request_cancel_all. > > > > > > > > > > > > > > > __mgmt_power_off(hdev); > > > > > > > > + } > > > > > > > > > > > > > > > > hci_inquiry_cache_flush(hdev); > > > > > > > > hci_pend_le_actions_clear(hdev); > > > > > > > > -- > > > > > > > > 2.39.2.722.g9855ee24e9-goog > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Luiz Augusto von Dentz > > > > > > > > > > > > > > > > -- > > > > Luiz Augusto von Dentz > > > > > > > > -- > > Luiz Augusto von Dentz
Hi Luiz, Thanks for testing these options! > perhaps we need a dedicated flag to ensure cmd_sync cannot be schedule after a certain point This actually sounds promising to me. I would think about this. This does not happen in regular use, but one of our customers has a script to run a stress test by turning on/off the adapter and rebooting for a few cycles. Then the crash can be reproduced. If you have any new ideas, I can schedule a test. Just to confirm if you will submit the current patch or you would hold it for a solid solution? The current patch to clear the cmd_sync list indeed reduces the crash frequency. Best, Zhengping On Tue, Feb 28, 2023 at 5:53 PM Luiz Augusto von Dentz <luiz.dentz@gmail.com> wrote: > > Hi Zhengping, > > On Tue, Feb 28, 2023 at 4:18 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > Hi Luiz, > > > > This looks good to me. I still have a question. Does this prevent a > > job scheduled between "hci_cmd_sync_work_list_clear(hdev);" and > > "__mgmt_power_off(hdev);"? Otherwise, the chance for a race condition > > is still there. Maybe using cancel_work_sync and re-init the workqueue > > timer is the right thing to do? > > I tried the cancel_work_sync but it doesn't work since to > cmd_sync_work itself can call hci_dev_close_sync so it deadlocks, Ive > also tried stopping new scheduling of new work based on HCI_UP flag > but that causes some tests not to run, perhaps we need a dedicated > flag to ensure cmd_sync cannot be schedule after a certain point but I > could found the exact point it is, anyway I fine leaving this to when > we actually have a more clear understanding or a reproducer. > > > Thanks, > > Zhengping > > > > On Tue, Feb 28, 2023 at 1:11 PM Luiz Augusto von Dentz > > <luiz.dentz@gmail.com> wrote: > > > > > > Hi Zhengping, > > > > > > On Mon, Feb 27, 2023 at 3:58 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > Hi Luiz, > > > > > > > > Sure. Hope this helps. > > > > Here is one log from the user. > > > > > > > > [ 53.368740] ================================================================== > > > > [ 53.376167] BUG: KASAN: use-after-free in set_ssp_sync+0x44/0x154 [bluetooth] > > > > [ 53.384303] Read of size 8 at addr ffffff80b7ee0318 by task kworker/u17:0/190 > > > > [ 53.396342] CPU: 7 PID: 190 Comm: kworker/u17:0 Tainted: G W > > > > 5.15.59-lockdep #1 29eed131ef0afd42bc369a6a0ca1c69d3653699b > > > > [ 53.408868] Hardware name: Qualcomm Technologies, Inc. sc7280 CRD > > > > platform (rev5+) (DT) > > > > [ 53.417095] Workqueue: hci0 hci_cmd_sync_work [bluetooth] > > > > [ 53.422780] Call trace: > > > > [ 53.425310] dump_backtrace+0x0/0x424 > > > > [ 53.429108] show_stack+0x20/0x2c > > > > > > > > [ 53.432534] dump_stack_lvl+0x84/0xb4 > > > > [ 53.436514] print_address_description+0x30/0x2fc > > > > [ 53.441369] kasan_report+0x15c/0x19c > > > > [ 53.445975] __asan_report_load8_noabort+0x44/0x50 > > > > [ 53.450910] set_ssp_sync+0x44/0x154 [bluetooth > > > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] > > > > [ 53.460136] hci_cmd_sync_work+0x1c8/0x2c8 [bluetooth > > > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] > > > > [ 53.472214] process_one_work+0x59c/0xa88 > > > > [ 53.476990] worker_thread+0x81c/0xd18 > > > > [ 53.480854] kthread+0x2d4/0x3d8 > > > > [ 53.484272] ret_from_fork+0x10/0x20 > > > > > > > > [ 53.489733] Allocated by task 1162: > > > > [ 53.493336] kasan_save_stack+0x38/0x68 > > > > [ 53.498115] __kasan_kmalloc+0xb4/0xd0 > > > > [ 53.501993] kmem_cache_alloc_trace+0x29c/0x374 > > > > [ 53.506661] mgmt_pending_new+0x74/0x200 [bluetooth] > > > > [ 53.511905] mgmt_pending_add+0x28/0xec [bluetooth] > > > > [ 53.517059] set_ssp+0x2d8/0x5b0 [bluetooth] > > > > [ 53.521575] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] > > > > [ 53.526538] hci_sock_sendmsg+0x28c/0x95c [bluetooth] > > > > [ 53.531850] sock_sendmsg+0xb4/0xd8 > > > > [ 53.535454] sock_write_iter+0x1c0/0x2d0 > > > > [ 53.539494] do_iter_readv_writev+0x350/0x4e0 > > > > [ 53.543980] do_iter_write+0xf0/0x2e4 > > > > [ 53.547747] vfs_writev+0xd0/0x13c > > > > [ 53.551254] do_writev+0xe8/0x1fc > > > > [ 53.554672] __arm64_sys_writev+0x84/0x98 > > > > [ 53.558805] invoke_syscall+0x78/0x20c > > > > [ 53.562665] el0_svc_common+0x12c/0x2f0 > > > > [ 53.566618] do_el0_svc+0x94/0x13c > > > > [ 53.570125] el0_svc+0x5c/0x108 > > > > [ 53.573374] el0t_64_sync_handler+0x78/0x108 > > > > [ 53.577773] el0t_64_sync+0x1a4/0x1a8 > > > > > > > > [ 53.583089] Freed by task 3207: > > > > [ 53.586325] kasan_save_stack+0x38/0x68 > > > > [ 53.590282] kasan_set_track+0x28/0x3c > > > > [ 53.594153] kasan_set_free_info+0x28/0x4c > > > > [ 53.598369] ____kasan_slab_free+0x138/0x17c > > > > [ 53.602767] __kasan_slab_free+0x18/0x28 > > > > [ 53.606803] slab_free_freelist_hook+0x188/0x260 > > > > [ 53.611559] kfree+0x138/0x29c > > > > [ 53.614708] mgmt_pending_free+0xac/0xdc [bluetooth] > > > > [ 53.619948] mgmt_pending_remove+0xd8/0xf0 [bluetooth] > > > > [ 53.625357] cmd_complete_rsp+0xc8/0x178 [bluetooth] > > > > [ 53.630586] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] > > > > [ 53.636076] __mgmt_power_off+0x114/0x26c [bluetooth] > > > > [ 53.641390] hci_dev_close_sync+0x314/0x814 [bluetooth] > > > > [ 53.646882] hci_dev_do_close+0x3c/0x7c [bluetooth] > > > > [ 53.652017] hci_dev_close+0xa4/0x15c [bluetooth] > > > > [ 53.656980] hci_sock_ioctl+0x298/0x444 [bluetooth] > > > > [ 53.662117] sock_do_ioctl+0xd0/0x1e8 > > > > [ 53.665900] sock_ioctl+0x4fc/0x72c > > > > [ 53.669500] __arm64_sys_ioctl+0x118/0x154 > > > > [ 53.673726] invoke_syscall+0x78/0x20c > > > > [ 53.677587] el0_svc_common+0x12c/0x2f0 > > > > [ 53.681533] do_el0_svc+0x94/0x13c > > > > [ 53.685043] el0_svc+0x5c/0x108 > > > > [ 53.688278] el0t_64_sync_handler+0x78/0x108 > > > > [ 53.692677] el0t_64_sync+0x1a4/0x1a8 > > > > > > > > [ 53.697988] Last potentially related work creation: > > > > [ 53.703009] kasan_save_stack+0x38/0x68 > > > > [ 53.706962] kasan_record_aux_stack+0x104/0x130 > > > > [ 53.711622] __call_rcu+0x14c/0x860 > > > > [ 53.715212] call_rcu+0x18/0x24 > > > > [ 53.718448] sk_filter_uncharge+0xc0/0x120 > > > > [ 53.722667] __sk_destruct+0xb4/0x4a8 > > > > [ 53.726435] sk_destruct+0x78/0xa0 > > > > [ 53.729941] __sk_free+0x190/0x270 > > > > [ 53.733453] sk_free+0x54/0x8c > > > > [ 53.736603] deferred_put_nlk_sk+0x1d4/0x20c > > > > [ 53.741000] rcu_do_batch+0x3e8/0xd08 > > > > [ 53.744772] nocb_cb_wait+0xc8/0xa3c > > > > [ 53.748453] rcu_nocb_cb_kthread+0x48/0x134 > > > > [ 53.752768] kthread+0x2d4/0x3d8 > > > > [ 53.756098] ret_from_fork+0x10/0x20 > > > > > > > > This is another one at a different function but with the same signature. > > > > > > > > [ 43.363512] ================================================================== > > > > [ 43.370966] BUG: KASAN: use-after-free in > > > > mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth] > > > > [ 43.379813] Read of size 8 at addr ffffff8096c28d18 by task kworker/u17:0/192 > > > > [ 43.387158] > > > > [ 43.388705] CPU: 6 PID: 192 Comm: kworker/u17:0 Tainted: G W > > > > 5.15.59-lockdep #1 59f35e3dfc07f6688b084869895c7a39892c891a > > > > localhost ~ # [ 43.410184] Workqueue: hci0 hci_cmd_sync_work [bluetooth] > > > > > > > > [ 43.418887] Call trace: > > > > [ 43.422407] dump_backtrace+0x0/0x424 > > > > [ 43.426191] show_stack+0x20/0x2c > > > > [ 43.429608] dump_stack_lvl+0x84/0xb4 > > > > [ 43.433395] print_address_description+0x30/0x2fc > > > > [ 43.438243] kasan_report+0x15c/0x19c > > > > [ 43.442070] __asan_report_load8_noabort+0x44/0x50 > > > > hciconfig hci0 up > > > > [ 43.447009] mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth > > > > 8dae3a82177133cfa9626e7322b3b0c8f665102d] > > > > [ 43.458568] hci_cmd_sync_work+0x1bc/0x2bc [bluetooth > > > > 8dae3a82177133cfa9626e7322b3b0c8f665102d] > > > > [ 43.467656] process_one_work+0x59c/0xa88 > > > > [ 43.472530] worker_thread+0x81c/0xd18 > > > > [ 43.476410] kthread+0x2d4/0x3d8 > > > > localhost ~ # [ 43.479753] ret_from_fork+0x10/0x20 > > > > [ 43.486588] > > > > [ 43.488156] Allocated by task 1118: > > > > [ 43.491751] kasan_save_stack+0x38/0x68 > > > > [ 43.495709] __kasan_kmalloc+0xb4/0xd0 > > > > [ 43.499577] kmem_cache_alloc_trace+0x29c/0x374 > > > > [ 43.504238] mgmt_pending_new+0x74/0x200 [bluetooth] > > > > sleep 2[ 43.509509] mgmt_pending_add+0x28/0xec [bluetooth] > > > > > > > > [ 43.515244] remove_adv_monitor+0xf8/0x174 [bluetooth] > > > > [ 43.521533] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] > > > > [ 43.526527] hci_sock_sendmsg+0x28c/0x95c [bluetooth] > > > > [ 43.531873] sock_sendmsg+0xb4/0xd8 > > > > [ 43.535472] sock_write_iter+0x1c0/0x2d0 > > > > [ 43.539519] do_iter_readv_writev+0x350/0x4e0 > > > > [ 43.544012] do_iter_write+0xf0/0x2e4 > > > > [ 43.547788] vfs_writev+0xd0/0x13c > > > > [ 43.551295] do_writev+0xe8/0x1fc > > > > [ 43.554710] __arm64_sys_writev+0x84/0x98 > > > > [ 43.558838] invoke_syscall+0x78/0x20c > > > > [ 43.562709] el0_svc_common+0x12c/0x2f0 > > > > [ 43.566654] do_el0_svc+0x94/0x13c > > > > [ 43.570155] el0_svc+0x5c/0x108 > > > > [ 43.573391] el0t_64_sync_handler+0x78/0x108 > > > > [ 43.577785] el0t_64_sync+0x1a4/0x1a8 > > > > [ 43.581564] > > > > [ 43.583115] Freed by task 3217: > > > > [ 43.586356] kasan_save_stack+0x38/0x68 > > > > [ 43.590314] kasan_set_track+0x28/0x3c > > > > [ 43.594180] kasan_set_free_info+0x28/0x4c > > > > [ 43.598396] ____kasan_slab_free+0x138/0x17c > > > > [ 43.602794] __kasan_slab_free+0x18/0x28 > > > > [ 43.606838] slab_free_freelist_hook+0x188/0x260 > > > > [ 43.611591] kfree+0x138/0x29c > > > > [ 43.614741] mgmt_pending_free+0xac/0xdc [bluetooth] > > > > [ 43.620003] mgmt_pending_remove+0xd8/0xf0 [bluetooth] > > > > [ 43.625434] cmd_complete_rsp+0xc8/0x178 [bluetooth] > > > > [ 43.630686] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] > > > > [ 43.636198] __mgmt_power_off+0x114/0x26c [bluetooth] > > > > [ 43.641532] hci_dev_close_sync+0x2ec/0x7ec [bluetooth] > > > > [ 43.647049] hci_dev_do_close+0x3c/0x7c [bluetooth] > > > > [ 43.652209] hci_dev_close+0xac/0x164 [bluetooth] > > > > [ 43.657190] hci_sock_ioctl+0x298/0x444 [bluetooth] > > > > [ 43.662353] sock_do_ioctl+0xd0/0x1e8 > > > > [ 43.666134] sock_ioctl+0x4fc/0x72c > > > > [ 43.669736] __arm64_sys_ioctl+0x118/0x154 > > > > [ 43.673961] invoke_syscall+0x78/0x20c > > > > [ 43.677820] el0_svc_common+0x12c/0x2f0 > > > > [ 43.681770] do_el0_svc+0x94/0x13c > > > > [ 43.685278] el0_svc+0x5c/0x108 > > > > [ 43.688514] el0t_64_sync_handler+0x78/0x108 > > > > [ 43.692913] el0t_64_sync+0x1a4/0x1a8 > > > > > > > > Thanks, > > > > Zhengping > > > > > > Ok, how about we do something like the following: > > > > > > https://gist.github.com/Vudentz/365d664275e4d2e2af157e47f0502f50 > > > > > > The actual real culprit seem to be __mgmt_power_off does cleanup > > > mgmt_pending but that is still accessible via cmd_sync_work_list, this > > > is probably how hci_request was designed but in case of cmd_sync we > > > normally have the data as part of cmd_sync_work_list. > > > > > > > On Mon, Feb 27, 2023 at 3:41 PM Luiz Augusto von Dentz > > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > > > Hi Zhengping, > > > > > > > > > > On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > Hi Luiz, > > > > > > > > > > > > I have a question. Given that each command in the cmd_sync queue > > > > > > should clean up the memory in a callback function. I was wondering if > > > > > > the call to cmd_complete_rsp in __mgmt_power_off function is still > > > > > > necessary? Will this always risk a race condition that cmd has been > > > > > > released when the complete callback or _sync function is run? > > > > > > > > > > Not sure I follow you here, do you have a stack trace when the user > > > > > after free occurs? > > > > > > > > > > > Thanks, > > > > > > Zhengping > > > > > > > > > > > > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > > > Hi Luiz, > > > > > > > > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > > > > > instead? > > > > > > > > > > > > > > That is a good question and we used hci_cmd_sync_clear in the first > > > > > > > version, but it will clear the queue and also close the timer. As a > > > > > > > result, when the adapter is turned on again, the timer will not > > > > > > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and > > > > > > > re-initiate the queue or to write a new function which only clears the > > > > > > > queue. > > > > > > > > > > > > > > > We also may want to move the clearing logic to > > > > > > > > hci_dev_close_sync since it should be equivalent to > > > > > > > > hci_request_cancel_all. > > > > > > > > > > > > > > I actually have a question here. I saw > > > > > > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought > > > > > > > it should force clearing the cmd_sync queue. But it seems cannot > > > > > > > prevent the use-after-free situation. > > > > > > > > > > > > > > Any suggestions to improve the solution? > > > > > > > > > > > > > > Thanks, > > > > > > > Zhengping > > > > > > > > > > > > > > > > > > > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz > > > > > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > > > > > > > > > Hi Zhengping, > > > > > > > > > > > > > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > > > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid > > > > > > > > > racing conditions which cause use-after-free. > > > > > > > > > > > > > > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a > > > > > > > > > work is queued in the cmd_sync_work queue at the same time, it will > > > > > > > > > cause the risk of use-after-free, as the cmd pointer is not checked > > > > > > > > > before use. > > > > > > > > > > > > > > > > > > Signed-off-by: Zhengping Jiang <jiangzp@google.com> > > > > > > > > > --- > > > > > > > > > > > > > > > > > > Changes in v2: > > > > > > > > > - Add function to clear the queue without stop the timer > > > > > > > > > > > > > > > > > > Changes in v1: > > > > > > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list > > > > > > > > > > > > > > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++- > > > > > > > > > 1 file changed, 20 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > > > > > > > > > index 117eedb6f709..b70365dfff0c 100644 > > > > > > > > > --- a/net/bluetooth/hci_sync.c > > > > > > > > > +++ b/net/bluetooth/hci_sync.c > > > > > > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev) > > > > > > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire); > > > > > > > > > } > > > > > > > > > > > > > > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev) > > > > > > > > > +{ > > > > > > > > > + struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > > > > > + > > > > > > > > > + mutex_lock(&hdev->cmd_sync_work_lock); > > > > > > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) { > > > > > > > > > + if (entry->destroy) { > > > > > > > > > + hci_req_sync_lock(hdev); > > > > > > > > > + entry->destroy(hdev, entry->data, -ECANCELED); > > > > > > > > > + hci_req_sync_unlock(hdev); > > > > > > > > > + } > > > > > > > > > + list_del(&entry->list); > > > > > > > > > + kfree(entry); > > > > > > > > > + } > > > > > > > > > + mutex_unlock(&hdev->cmd_sync_work_lock); > > > > > > > > > +} > > > > > > > > > + > > > > > > > > > void hci_cmd_sync_clear(struct hci_dev *hdev) > > > > > > > > > { > > > > > > > > > struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev) > > > > > > > > > > > > > > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY && > > > > > > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > > > > > > > - hci_dev_test_flag(hdev, HCI_MGMT)) > > > > > > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) { > > > > > > > > > + hci_pend_cmd_sync_clear(hdev); > > > > > > > > > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > > > > > instead? We also may want to move the clearing logic to > > > > > > > > hci_dev_close_sync since it should be equivalent to > > > > > > > > hci_request_cancel_all. > > > > > > > > > > > > > > > > > __mgmt_power_off(hdev); > > > > > > > > > + } > > > > > > > > > > > > > > > > > > hci_inquiry_cache_flush(hdev); > > > > > > > > > hci_pend_le_actions_clear(hdev); > > > > > > > > > -- > > > > > > > > > 2.39.2.722.g9855ee24e9-goog > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Luiz Augusto von Dentz > > > > > > > > > > > > > > > > > > > > -- > > > > > Luiz Augusto von Dentz > > > > > > > > > > > > -- > > > Luiz Augusto von Dentz > > > > -- > Luiz Augusto von Dentz
Hi Luiz, I have revisited this issue and asked our partner to do some more tests. I think the root cause is the use of hdev->mgmt_pending. There are some mgmt_pending_cmd, which would be released at callbacks. For example: > err = hci_cmd_sync_queue(hdev, mgmt_remove_adv_monitor_sync, cmd, mgmt_remove_adv_monitor_complete); mgmt_remove_adv_monitor_complete will release the cmd memory by calling mgmt_pending_remove(cmd). In this case, the call to mgmt_pending_foreach(0, hdev, cmd_complete_rsp, &status) at __mgmt_power_off will double free the memory at a race condition. A quick solution is to detect and skip some opcode at cmd_complete_rsp if a command is consistently released by callback. > static void cmd_complete_rsp(struct mgmt_pending_cmd *cmd, void *data) > { > if (cmd->opcode == MGMT_OP_REMOVE_ADV_MONITOR || > cmd->opcode == MGMT_OP_SET_SSP) > return; To fully remove the race condition, maybe using two lists is necessary. What do you think about this proposal? Thanks, Zhengping Thanks, Zhengping On Tue, Feb 28, 2023 at 6:11 PM Zhengping Jiang <jiangzp@google.com> wrote: > > Hi Luiz, > > Thanks for testing these options! > > > perhaps we need a dedicated flag to ensure cmd_sync cannot be schedule after a certain point > This actually sounds promising to me. I would think about this. > > This does not happen in regular use, but one of our customers has a > script to run a stress test by turning on/off the adapter and > rebooting for a few cycles. Then the crash can be reproduced. If you > have any new ideas, I can schedule a test. > > Just to confirm if you will submit the current patch or you would hold > it for a solid solution? The current patch to clear the cmd_sync list > indeed reduces the crash frequency. > > Best, > Zhengping > > On Tue, Feb 28, 2023 at 5:53 PM Luiz Augusto von Dentz > <luiz.dentz@gmail.com> wrote: > > > > Hi Zhengping, > > > > On Tue, Feb 28, 2023 at 4:18 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > Hi Luiz, > > > > > > This looks good to me. I still have a question. Does this prevent a > > > job scheduled between "hci_cmd_sync_work_list_clear(hdev);" and > > > "__mgmt_power_off(hdev);"? Otherwise, the chance for a race condition > > > is still there. Maybe using cancel_work_sync and re-init the workqueue > > > timer is the right thing to do? > > > > I tried the cancel_work_sync but it doesn't work since to > > cmd_sync_work itself can call hci_dev_close_sync so it deadlocks, Ive > > also tried stopping new scheduling of new work based on HCI_UP flag > > but that causes some tests not to run, perhaps we need a dedicated > > flag to ensure cmd_sync cannot be schedule after a certain point but I > > could found the exact point it is, anyway I fine leaving this to when > > we actually have a more clear understanding or a reproducer. > > > > > Thanks, > > > Zhengping > > > > > > On Tue, Feb 28, 2023 at 1:11 PM Luiz Augusto von Dentz > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > Hi Zhengping, > > > > > > > > On Mon, Feb 27, 2023 at 3:58 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > Hi Luiz, > > > > > > > > > > Sure. Hope this helps. > > > > > Here is one log from the user. > > > > > > > > > > [ 53.368740] ================================================================== > > > > > [ 53.376167] BUG: KASAN: use-after-free in set_ssp_sync+0x44/0x154 [bluetooth] > > > > > [ 53.384303] Read of size 8 at addr ffffff80b7ee0318 by task kworker/u17:0/190 > > > > > [ 53.396342] CPU: 7 PID: 190 Comm: kworker/u17:0 Tainted: G W > > > > > 5.15.59-lockdep #1 29eed131ef0afd42bc369a6a0ca1c69d3653699b > > > > > [ 53.408868] Hardware name: Qualcomm Technologies, Inc. sc7280 CRD > > > > > platform (rev5+) (DT) > > > > > [ 53.417095] Workqueue: hci0 hci_cmd_sync_work [bluetooth] > > > > > [ 53.422780] Call trace: > > > > > [ 53.425310] dump_backtrace+0x0/0x424 > > > > > [ 53.429108] show_stack+0x20/0x2c > > > > > > > > > > [ 53.432534] dump_stack_lvl+0x84/0xb4 > > > > > [ 53.436514] print_address_description+0x30/0x2fc > > > > > [ 53.441369] kasan_report+0x15c/0x19c > > > > > [ 53.445975] __asan_report_load8_noabort+0x44/0x50 > > > > > [ 53.450910] set_ssp_sync+0x44/0x154 [bluetooth > > > > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] > > > > > [ 53.460136] hci_cmd_sync_work+0x1c8/0x2c8 [bluetooth > > > > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] > > > > > [ 53.472214] process_one_work+0x59c/0xa88 > > > > > [ 53.476990] worker_thread+0x81c/0xd18 > > > > > [ 53.480854] kthread+0x2d4/0x3d8 > > > > > [ 53.484272] ret_from_fork+0x10/0x20 > > > > > > > > > > [ 53.489733] Allocated by task 1162: > > > > > [ 53.493336] kasan_save_stack+0x38/0x68 > > > > > [ 53.498115] __kasan_kmalloc+0xb4/0xd0 > > > > > [ 53.501993] kmem_cache_alloc_trace+0x29c/0x374 > > > > > [ 53.506661] mgmt_pending_new+0x74/0x200 [bluetooth] > > > > > [ 53.511905] mgmt_pending_add+0x28/0xec [bluetooth] > > > > > [ 53.517059] set_ssp+0x2d8/0x5b0 [bluetooth] > > > > > [ 53.521575] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] > > > > > [ 53.526538] hci_sock_sendmsg+0x28c/0x95c [bluetooth] > > > > > [ 53.531850] sock_sendmsg+0xb4/0xd8 > > > > > [ 53.535454] sock_write_iter+0x1c0/0x2d0 > > > > > [ 53.539494] do_iter_readv_writev+0x350/0x4e0 > > > > > [ 53.543980] do_iter_write+0xf0/0x2e4 > > > > > [ 53.547747] vfs_writev+0xd0/0x13c > > > > > [ 53.551254] do_writev+0xe8/0x1fc > > > > > [ 53.554672] __arm64_sys_writev+0x84/0x98 > > > > > [ 53.558805] invoke_syscall+0x78/0x20c > > > > > [ 53.562665] el0_svc_common+0x12c/0x2f0 > > > > > [ 53.566618] do_el0_svc+0x94/0x13c > > > > > [ 53.570125] el0_svc+0x5c/0x108 > > > > > [ 53.573374] el0t_64_sync_handler+0x78/0x108 > > > > > [ 53.577773] el0t_64_sync+0x1a4/0x1a8 > > > > > > > > > > [ 53.583089] Freed by task 3207: > > > > > [ 53.586325] kasan_save_stack+0x38/0x68 > > > > > [ 53.590282] kasan_set_track+0x28/0x3c > > > > > [ 53.594153] kasan_set_free_info+0x28/0x4c > > > > > [ 53.598369] ____kasan_slab_free+0x138/0x17c > > > > > [ 53.602767] __kasan_slab_free+0x18/0x28 > > > > > [ 53.606803] slab_free_freelist_hook+0x188/0x260 > > > > > [ 53.611559] kfree+0x138/0x29c > > > > > [ 53.614708] mgmt_pending_free+0xac/0xdc [bluetooth] > > > > > [ 53.619948] mgmt_pending_remove+0xd8/0xf0 [bluetooth] > > > > > [ 53.625357] cmd_complete_rsp+0xc8/0x178 [bluetooth] > > > > > [ 53.630586] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] > > > > > [ 53.636076] __mgmt_power_off+0x114/0x26c [bluetooth] > > > > > [ 53.641390] hci_dev_close_sync+0x314/0x814 [bluetooth] > > > > > [ 53.646882] hci_dev_do_close+0x3c/0x7c [bluetooth] > > > > > [ 53.652017] hci_dev_close+0xa4/0x15c [bluetooth] > > > > > [ 53.656980] hci_sock_ioctl+0x298/0x444 [bluetooth] > > > > > [ 53.662117] sock_do_ioctl+0xd0/0x1e8 > > > > > [ 53.665900] sock_ioctl+0x4fc/0x72c > > > > > [ 53.669500] __arm64_sys_ioctl+0x118/0x154 > > > > > [ 53.673726] invoke_syscall+0x78/0x20c > > > > > [ 53.677587] el0_svc_common+0x12c/0x2f0 > > > > > [ 53.681533] do_el0_svc+0x94/0x13c > > > > > [ 53.685043] el0_svc+0x5c/0x108 > > > > > [ 53.688278] el0t_64_sync_handler+0x78/0x108 > > > > > [ 53.692677] el0t_64_sync+0x1a4/0x1a8 > > > > > > > > > > [ 53.697988] Last potentially related work creation: > > > > > [ 53.703009] kasan_save_stack+0x38/0x68 > > > > > [ 53.706962] kasan_record_aux_stack+0x104/0x130 > > > > > [ 53.711622] __call_rcu+0x14c/0x860 > > > > > [ 53.715212] call_rcu+0x18/0x24 > > > > > [ 53.718448] sk_filter_uncharge+0xc0/0x120 > > > > > [ 53.722667] __sk_destruct+0xb4/0x4a8 > > > > > [ 53.726435] sk_destruct+0x78/0xa0 > > > > > [ 53.729941] __sk_free+0x190/0x270 > > > > > [ 53.733453] sk_free+0x54/0x8c > > > > > [ 53.736603] deferred_put_nlk_sk+0x1d4/0x20c > > > > > [ 53.741000] rcu_do_batch+0x3e8/0xd08 > > > > > [ 53.744772] nocb_cb_wait+0xc8/0xa3c > > > > > [ 53.748453] rcu_nocb_cb_kthread+0x48/0x134 > > > > > [ 53.752768] kthread+0x2d4/0x3d8 > > > > > [ 53.756098] ret_from_fork+0x10/0x20 > > > > > > > > > > This is another one at a different function but with the same signature. > > > > > > > > > > [ 43.363512] ================================================================== > > > > > [ 43.370966] BUG: KASAN: use-after-free in > > > > > mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth] > > > > > [ 43.379813] Read of size 8 at addr ffffff8096c28d18 by task kworker/u17:0/192 > > > > > [ 43.387158] > > > > > [ 43.388705] CPU: 6 PID: 192 Comm: kworker/u17:0 Tainted: G W > > > > > 5.15.59-lockdep #1 59f35e3dfc07f6688b084869895c7a39892c891a > > > > > localhost ~ # [ 43.410184] Workqueue: hci0 hci_cmd_sync_work [bluetooth] > > > > > > > > > > [ 43.418887] Call trace: > > > > > [ 43.422407] dump_backtrace+0x0/0x424 > > > > > [ 43.426191] show_stack+0x20/0x2c > > > > > [ 43.429608] dump_stack_lvl+0x84/0xb4 > > > > > [ 43.433395] print_address_description+0x30/0x2fc > > > > > [ 43.438243] kasan_report+0x15c/0x19c > > > > > [ 43.442070] __asan_report_load8_noabort+0x44/0x50 > > > > > hciconfig hci0 up > > > > > [ 43.447009] mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth > > > > > 8dae3a82177133cfa9626e7322b3b0c8f665102d] > > > > > [ 43.458568] hci_cmd_sync_work+0x1bc/0x2bc [bluetooth > > > > > 8dae3a82177133cfa9626e7322b3b0c8f665102d] > > > > > [ 43.467656] process_one_work+0x59c/0xa88 > > > > > [ 43.472530] worker_thread+0x81c/0xd18 > > > > > [ 43.476410] kthread+0x2d4/0x3d8 > > > > > localhost ~ # [ 43.479753] ret_from_fork+0x10/0x20 > > > > > [ 43.486588] > > > > > [ 43.488156] Allocated by task 1118: > > > > > [ 43.491751] kasan_save_stack+0x38/0x68 > > > > > [ 43.495709] __kasan_kmalloc+0xb4/0xd0 > > > > > [ 43.499577] kmem_cache_alloc_trace+0x29c/0x374 > > > > > [ 43.504238] mgmt_pending_new+0x74/0x200 [bluetooth] > > > > > sleep 2[ 43.509509] mgmt_pending_add+0x28/0xec [bluetooth] > > > > > > > > > > [ 43.515244] remove_adv_monitor+0xf8/0x174 [bluetooth] > > > > > [ 43.521533] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] > > > > > [ 43.526527] hci_sock_sendmsg+0x28c/0x95c [bluetooth] > > > > > [ 43.531873] sock_sendmsg+0xb4/0xd8 > > > > > [ 43.535472] sock_write_iter+0x1c0/0x2d0 > > > > > [ 43.539519] do_iter_readv_writev+0x350/0x4e0 > > > > > [ 43.544012] do_iter_write+0xf0/0x2e4 > > > > > [ 43.547788] vfs_writev+0xd0/0x13c > > > > > [ 43.551295] do_writev+0xe8/0x1fc > > > > > [ 43.554710] __arm64_sys_writev+0x84/0x98 > > > > > [ 43.558838] invoke_syscall+0x78/0x20c > > > > > [ 43.562709] el0_svc_common+0x12c/0x2f0 > > > > > [ 43.566654] do_el0_svc+0x94/0x13c > > > > > [ 43.570155] el0_svc+0x5c/0x108 > > > > > [ 43.573391] el0t_64_sync_handler+0x78/0x108 > > > > > [ 43.577785] el0t_64_sync+0x1a4/0x1a8 > > > > > [ 43.581564] > > > > > [ 43.583115] Freed by task 3217: > > > > > [ 43.586356] kasan_save_stack+0x38/0x68 > > > > > [ 43.590314] kasan_set_track+0x28/0x3c > > > > > [ 43.594180] kasan_set_free_info+0x28/0x4c > > > > > [ 43.598396] ____kasan_slab_free+0x138/0x17c > > > > > [ 43.602794] __kasan_slab_free+0x18/0x28 > > > > > [ 43.606838] slab_free_freelist_hook+0x188/0x260 > > > > > [ 43.611591] kfree+0x138/0x29c > > > > > [ 43.614741] mgmt_pending_free+0xac/0xdc [bluetooth] > > > > > [ 43.620003] mgmt_pending_remove+0xd8/0xf0 [bluetooth] > > > > > [ 43.625434] cmd_complete_rsp+0xc8/0x178 [bluetooth] > > > > > [ 43.630686] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] > > > > > [ 43.636198] __mgmt_power_off+0x114/0x26c [bluetooth] > > > > > [ 43.641532] hci_dev_close_sync+0x2ec/0x7ec [bluetooth] > > > > > [ 43.647049] hci_dev_do_close+0x3c/0x7c [bluetooth] > > > > > [ 43.652209] hci_dev_close+0xac/0x164 [bluetooth] > > > > > [ 43.657190] hci_sock_ioctl+0x298/0x444 [bluetooth] > > > > > [ 43.662353] sock_do_ioctl+0xd0/0x1e8 > > > > > [ 43.666134] sock_ioctl+0x4fc/0x72c > > > > > [ 43.669736] __arm64_sys_ioctl+0x118/0x154 > > > > > [ 43.673961] invoke_syscall+0x78/0x20c > > > > > [ 43.677820] el0_svc_common+0x12c/0x2f0 > > > > > [ 43.681770] do_el0_svc+0x94/0x13c > > > > > [ 43.685278] el0_svc+0x5c/0x108 > > > > > [ 43.688514] el0t_64_sync_handler+0x78/0x108 > > > > > [ 43.692913] el0t_64_sync+0x1a4/0x1a8 > > > > > > > > > > Thanks, > > > > > Zhengping > > > > > > > > Ok, how about we do something like the following: > > > > > > > > https://gist.github.com/Vudentz/365d664275e4d2e2af157e47f0502f50 > > > > > > > > The actual real culprit seem to be __mgmt_power_off does cleanup > > > > mgmt_pending but that is still accessible via cmd_sync_work_list, this > > > > is probably how hci_request was designed but in case of cmd_sync we > > > > normally have the data as part of cmd_sync_work_list. > > > > > > > > > On Mon, Feb 27, 2023 at 3:41 PM Luiz Augusto von Dentz > > > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > > > > > Hi Zhengping, > > > > > > > > > > > > On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > > > Hi Luiz, > > > > > > > > > > > > > > I have a question. Given that each command in the cmd_sync queue > > > > > > > should clean up the memory in a callback function. I was wondering if > > > > > > > the call to cmd_complete_rsp in __mgmt_power_off function is still > > > > > > > necessary? Will this always risk a race condition that cmd has been > > > > > > > released when the complete callback or _sync function is run? > > > > > > > > > > > > Not sure I follow you here, do you have a stack trace when the user > > > > > > after free occurs? > > > > > > > > > > > > > Thanks, > > > > > > > Zhengping > > > > > > > > > > > > > > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > > > > > Hi Luiz, > > > > > > > > > > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > > > > > > instead? > > > > > > > > > > > > > > > > That is a good question and we used hci_cmd_sync_clear in the first > > > > > > > > version, but it will clear the queue and also close the timer. As a > > > > > > > > result, when the adapter is turned on again, the timer will not > > > > > > > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and > > > > > > > > re-initiate the queue or to write a new function which only clears the > > > > > > > > queue. > > > > > > > > > > > > > > > > > We also may want to move the clearing logic to > > > > > > > > > hci_dev_close_sync since it should be equivalent to > > > > > > > > > hci_request_cancel_all. > > > > > > > > > > > > > > > > I actually have a question here. I saw > > > > > > > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought > > > > > > > > it should force clearing the cmd_sync queue. But it seems cannot > > > > > > > > prevent the use-after-free situation. > > > > > > > > > > > > > > > > Any suggestions to improve the solution? > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Zhengping > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz > > > > > > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > > > > > > > > > > > Hi Zhengping, > > > > > > > > > > > > > > > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > > > > > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid > > > > > > > > > > racing conditions which cause use-after-free. > > > > > > > > > > > > > > > > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a > > > > > > > > > > work is queued in the cmd_sync_work queue at the same time, it will > > > > > > > > > > cause the risk of use-after-free, as the cmd pointer is not checked > > > > > > > > > > before use. > > > > > > > > > > > > > > > > > > > > Signed-off-by: Zhengping Jiang <jiangzp@google.com> > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > Changes in v2: > > > > > > > > > > - Add function to clear the queue without stop the timer > > > > > > > > > > > > > > > > > > > > Changes in v1: > > > > > > > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list > > > > > > > > > > > > > > > > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++- > > > > > > > > > > 1 file changed, 20 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > > > > > > > > > > index 117eedb6f709..b70365dfff0c 100644 > > > > > > > > > > --- a/net/bluetooth/hci_sync.c > > > > > > > > > > +++ b/net/bluetooth/hci_sync.c > > > > > > > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev) > > > > > > > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire); > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev) > > > > > > > > > > +{ > > > > > > > > > > + struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > > > > > > + > > > > > > > > > > + mutex_lock(&hdev->cmd_sync_work_lock); > > > > > > > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) { > > > > > > > > > > + if (entry->destroy) { > > > > > > > > > > + hci_req_sync_lock(hdev); > > > > > > > > > > + entry->destroy(hdev, entry->data, -ECANCELED); > > > > > > > > > > + hci_req_sync_unlock(hdev); > > > > > > > > > > + } > > > > > > > > > > + list_del(&entry->list); > > > > > > > > > > + kfree(entry); > > > > > > > > > > + } > > > > > > > > > > + mutex_unlock(&hdev->cmd_sync_work_lock); > > > > > > > > > > +} > > > > > > > > > > + > > > > > > > > > > void hci_cmd_sync_clear(struct hci_dev *hdev) > > > > > > > > > > { > > > > > > > > > > struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > > > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev) > > > > > > > > > > > > > > > > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY && > > > > > > > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > > > > > > > > - hci_dev_test_flag(hdev, HCI_MGMT)) > > > > > > > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) { > > > > > > > > > > + hci_pend_cmd_sync_clear(hdev); > > > > > > > > > > > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > > > > > > instead? We also may want to move the clearing logic to > > > > > > > > > hci_dev_close_sync since it should be equivalent to > > > > > > > > > hci_request_cancel_all. > > > > > > > > > > > > > > > > > > > __mgmt_power_off(hdev); > > > > > > > > > > + } > > > > > > > > > > > > > > > > > > > > hci_inquiry_cache_flush(hdev); > > > > > > > > > > hci_pend_le_actions_clear(hdev); > > > > > > > > > > -- > > > > > > > > > > 2.39.2.722.g9855ee24e9-goog > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Luiz Augusto von Dentz > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Luiz Augusto von Dentz > > > > > > > > > > > > > > > > -- > > > > Luiz Augusto von Dentz > > > > > > > > -- > > Luiz Augusto von Dentz
Hi Zhengping, On Tue, May 9, 2023 at 4:13 PM Zhengping Jiang <jiangzp@google.com> wrote: > > Hi Luiz, > > I have revisited this issue and asked our partner to do some more > tests. I think the root cause is the use of hdev->mgmt_pending. There > are some mgmt_pending_cmd, which would be released at callbacks. For > example: > > > err = hci_cmd_sync_queue(hdev, mgmt_remove_adv_monitor_sync, cmd, mgmt_remove_adv_monitor_complete); > > mgmt_remove_adv_monitor_complete will release the cmd memory by > calling mgmt_pending_remove(cmd). > > In this case, the call to mgmt_pending_foreach(0, hdev, > cmd_complete_rsp, &status) at __mgmt_power_off will double free the > memory at a race condition. > > A quick solution is to detect and skip some opcode at cmd_complete_rsp > if a command is consistently released by callback. There should be some means to remove from the list when cleanup is done, or perhaps this is because of the order we are doing the cleanup? Either way it would be great to have a test add to mgmt-tester to try to reproduce so we make sure we don't reintroduce the same behavior in the future. > > static void cmd_complete_rsp(struct mgmt_pending_cmd *cmd, void *data) > > { > > if (cmd->opcode == MGMT_OP_REMOVE_ADV_MONITOR || > > cmd->opcode == MGMT_OP_SET_SSP) > > return; > > To fully remove the race condition, maybe using two lists is > necessary. What do you think about this proposal? > > Thanks, > Zhengping > > Thanks, > Zhengping > > On Tue, Feb 28, 2023 at 6:11 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > Hi Luiz, > > > > Thanks for testing these options! > > > > > perhaps we need a dedicated flag to ensure cmd_sync cannot be schedule after a certain point > > This actually sounds promising to me. I would think about this. > > > > This does not happen in regular use, but one of our customers has a > > script to run a stress test by turning on/off the adapter and > > rebooting for a few cycles. Then the crash can be reproduced. If you > > have any new ideas, I can schedule a test. > > > > Just to confirm if you will submit the current patch or you would hold > > it for a solid solution? The current patch to clear the cmd_sync list > > indeed reduces the crash frequency. > > > > Best, > > Zhengping > > > > On Tue, Feb 28, 2023 at 5:53 PM Luiz Augusto von Dentz > > <luiz.dentz@gmail.com> wrote: > > > > > > Hi Zhengping, > > > > > > On Tue, Feb 28, 2023 at 4:18 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > Hi Luiz, > > > > > > > > This looks good to me. I still have a question. Does this prevent a > > > > job scheduled between "hci_cmd_sync_work_list_clear(hdev);" and > > > > "__mgmt_power_off(hdev);"? Otherwise, the chance for a race condition > > > > is still there. Maybe using cancel_work_sync and re-init the workqueue > > > > timer is the right thing to do? > > > > > > I tried the cancel_work_sync but it doesn't work since to > > > cmd_sync_work itself can call hci_dev_close_sync so it deadlocks, Ive > > > also tried stopping new scheduling of new work based on HCI_UP flag > > > but that causes some tests not to run, perhaps we need a dedicated > > > flag to ensure cmd_sync cannot be schedule after a certain point but I > > > could found the exact point it is, anyway I fine leaving this to when > > > we actually have a more clear understanding or a reproducer. > > > > > > > Thanks, > > > > Zhengping > > > > > > > > On Tue, Feb 28, 2023 at 1:11 PM Luiz Augusto von Dentz > > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > > > Hi Zhengping, > > > > > > > > > > On Mon, Feb 27, 2023 at 3:58 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > Hi Luiz, > > > > > > > > > > > > Sure. Hope this helps. > > > > > > Here is one log from the user. > > > > > > > > > > > > [ 53.368740] ================================================================== > > > > > > [ 53.376167] BUG: KASAN: use-after-free in set_ssp_sync+0x44/0x154 [bluetooth] > > > > > > [ 53.384303] Read of size 8 at addr ffffff80b7ee0318 by task kworker/u17:0/190 > > > > > > [ 53.396342] CPU: 7 PID: 190 Comm: kworker/u17:0 Tainted: G W > > > > > > 5.15.59-lockdep #1 29eed131ef0afd42bc369a6a0ca1c69d3653699b > > > > > > [ 53.408868] Hardware name: Qualcomm Technologies, Inc. sc7280 CRD > > > > > > platform (rev5+) (DT) > > > > > > [ 53.417095] Workqueue: hci0 hci_cmd_sync_work [bluetooth] > > > > > > [ 53.422780] Call trace: > > > > > > [ 53.425310] dump_backtrace+0x0/0x424 > > > > > > [ 53.429108] show_stack+0x20/0x2c > > > > > > > > > > > > [ 53.432534] dump_stack_lvl+0x84/0xb4 > > > > > > [ 53.436514] print_address_description+0x30/0x2fc > > > > > > [ 53.441369] kasan_report+0x15c/0x19c > > > > > > [ 53.445975] __asan_report_load8_noabort+0x44/0x50 > > > > > > [ 53.450910] set_ssp_sync+0x44/0x154 [bluetooth > > > > > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] > > > > > > [ 53.460136] hci_cmd_sync_work+0x1c8/0x2c8 [bluetooth > > > > > > 34f6fa2bbf49f3d7faf6ea04e8755ae16590a6b3] > > > > > > [ 53.472214] process_one_work+0x59c/0xa88 > > > > > > [ 53.476990] worker_thread+0x81c/0xd18 > > > > > > [ 53.480854] kthread+0x2d4/0x3d8 > > > > > > [ 53.484272] ret_from_fork+0x10/0x20 > > > > > > > > > > > > [ 53.489733] Allocated by task 1162: > > > > > > [ 53.493336] kasan_save_stack+0x38/0x68 > > > > > > [ 53.498115] __kasan_kmalloc+0xb4/0xd0 > > > > > > [ 53.501993] kmem_cache_alloc_trace+0x29c/0x374 > > > > > > [ 53.506661] mgmt_pending_new+0x74/0x200 [bluetooth] > > > > > > [ 53.511905] mgmt_pending_add+0x28/0xec [bluetooth] > > > > > > [ 53.517059] set_ssp+0x2d8/0x5b0 [bluetooth] > > > > > > [ 53.521575] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] > > > > > > [ 53.526538] hci_sock_sendmsg+0x28c/0x95c [bluetooth] > > > > > > [ 53.531850] sock_sendmsg+0xb4/0xd8 > > > > > > [ 53.535454] sock_write_iter+0x1c0/0x2d0 > > > > > > [ 53.539494] do_iter_readv_writev+0x350/0x4e0 > > > > > > [ 53.543980] do_iter_write+0xf0/0x2e4 > > > > > > [ 53.547747] vfs_writev+0xd0/0x13c > > > > > > [ 53.551254] do_writev+0xe8/0x1fc > > > > > > [ 53.554672] __arm64_sys_writev+0x84/0x98 > > > > > > [ 53.558805] invoke_syscall+0x78/0x20c > > > > > > [ 53.562665] el0_svc_common+0x12c/0x2f0 > > > > > > [ 53.566618] do_el0_svc+0x94/0x13c > > > > > > [ 53.570125] el0_svc+0x5c/0x108 > > > > > > [ 53.573374] el0t_64_sync_handler+0x78/0x108 > > > > > > [ 53.577773] el0t_64_sync+0x1a4/0x1a8 > > > > > > > > > > > > [ 53.583089] Freed by task 3207: > > > > > > [ 53.586325] kasan_save_stack+0x38/0x68 > > > > > > [ 53.590282] kasan_set_track+0x28/0x3c > > > > > > [ 53.594153] kasan_set_free_info+0x28/0x4c > > > > > > [ 53.598369] ____kasan_slab_free+0x138/0x17c > > > > > > [ 53.602767] __kasan_slab_free+0x18/0x28 > > > > > > [ 53.606803] slab_free_freelist_hook+0x188/0x260 > > > > > > [ 53.611559] kfree+0x138/0x29c > > > > > > [ 53.614708] mgmt_pending_free+0xac/0xdc [bluetooth] > > > > > > [ 53.619948] mgmt_pending_remove+0xd8/0xf0 [bluetooth] > > > > > > [ 53.625357] cmd_complete_rsp+0xc8/0x178 [bluetooth] > > > > > > [ 53.630586] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] > > > > > > [ 53.636076] __mgmt_power_off+0x114/0x26c [bluetooth] > > > > > > [ 53.641390] hci_dev_close_sync+0x314/0x814 [bluetooth] > > > > > > [ 53.646882] hci_dev_do_close+0x3c/0x7c [bluetooth] > > > > > > [ 53.652017] hci_dev_close+0xa4/0x15c [bluetooth] > > > > > > [ 53.656980] hci_sock_ioctl+0x298/0x444 [bluetooth] > > > > > > [ 53.662117] sock_do_ioctl+0xd0/0x1e8 > > > > > > [ 53.665900] sock_ioctl+0x4fc/0x72c > > > > > > [ 53.669500] __arm64_sys_ioctl+0x118/0x154 > > > > > > [ 53.673726] invoke_syscall+0x78/0x20c > > > > > > [ 53.677587] el0_svc_common+0x12c/0x2f0 > > > > > > [ 53.681533] do_el0_svc+0x94/0x13c > > > > > > [ 53.685043] el0_svc+0x5c/0x108 > > > > > > [ 53.688278] el0t_64_sync_handler+0x78/0x108 > > > > > > [ 53.692677] el0t_64_sync+0x1a4/0x1a8 > > > > > > > > > > > > [ 53.697988] Last potentially related work creation: > > > > > > [ 53.703009] kasan_save_stack+0x38/0x68 > > > > > > [ 53.706962] kasan_record_aux_stack+0x104/0x130 > > > > > > [ 53.711622] __call_rcu+0x14c/0x860 > > > > > > [ 53.715212] call_rcu+0x18/0x24 > > > > > > [ 53.718448] sk_filter_uncharge+0xc0/0x120 > > > > > > [ 53.722667] __sk_destruct+0xb4/0x4a8 > > > > > > [ 53.726435] sk_destruct+0x78/0xa0 > > > > > > [ 53.729941] __sk_free+0x190/0x270 > > > > > > [ 53.733453] sk_free+0x54/0x8c > > > > > > [ 53.736603] deferred_put_nlk_sk+0x1d4/0x20c > > > > > > [ 53.741000] rcu_do_batch+0x3e8/0xd08 > > > > > > [ 53.744772] nocb_cb_wait+0xc8/0xa3c > > > > > > [ 53.748453] rcu_nocb_cb_kthread+0x48/0x134 > > > > > > [ 53.752768] kthread+0x2d4/0x3d8 > > > > > > [ 53.756098] ret_from_fork+0x10/0x20 > > > > > > > > > > > > This is another one at a different function but with the same signature. > > > > > > > > > > > > [ 43.363512] ================================================================== > > > > > > [ 43.370966] BUG: KASAN: use-after-free in > > > > > > mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth] > > > > > > [ 43.379813] Read of size 8 at addr ffffff8096c28d18 by task kworker/u17:0/192 > > > > > > [ 43.387158] > > > > > > [ 43.388705] CPU: 6 PID: 192 Comm: kworker/u17:0 Tainted: G W > > > > > > 5.15.59-lockdep #1 59f35e3dfc07f6688b084869895c7a39892c891a > > > > > > localhost ~ # [ 43.410184] Workqueue: hci0 hci_cmd_sync_work [bluetooth] > > > > > > > > > > > > [ 43.418887] Call trace: > > > > > > [ 43.422407] dump_backtrace+0x0/0x424 > > > > > > [ 43.426191] show_stack+0x20/0x2c > > > > > > [ 43.429608] dump_stack_lvl+0x84/0xb4 > > > > > > [ 43.433395] print_address_description+0x30/0x2fc > > > > > > [ 43.438243] kasan_report+0x15c/0x19c > > > > > > [ 43.442070] __asan_report_load8_noabort+0x44/0x50 > > > > > > hciconfig hci0 up > > > > > > [ 43.447009] mgmt_remove_adv_monitor_sync+0x40/0xcc [bluetooth > > > > > > 8dae3a82177133cfa9626e7322b3b0c8f665102d] > > > > > > [ 43.458568] hci_cmd_sync_work+0x1bc/0x2bc [bluetooth > > > > > > 8dae3a82177133cfa9626e7322b3b0c8f665102d] > > > > > > [ 43.467656] process_one_work+0x59c/0xa88 > > > > > > [ 43.472530] worker_thread+0x81c/0xd18 > > > > > > [ 43.476410] kthread+0x2d4/0x3d8 > > > > > > localhost ~ # [ 43.479753] ret_from_fork+0x10/0x20 > > > > > > [ 43.486588] > > > > > > [ 43.488156] Allocated by task 1118: > > > > > > [ 43.491751] kasan_save_stack+0x38/0x68 > > > > > > [ 43.495709] __kasan_kmalloc+0xb4/0xd0 > > > > > > [ 43.499577] kmem_cache_alloc_trace+0x29c/0x374 > > > > > > [ 43.504238] mgmt_pending_new+0x74/0x200 [bluetooth] > > > > > > sleep 2[ 43.509509] mgmt_pending_add+0x28/0xec [bluetooth] > > > > > > > > > > > > [ 43.515244] remove_adv_monitor+0xf8/0x174 [bluetooth] > > > > > > [ 43.521533] hci_mgmt_cmd+0x5c4/0x8b0 [bluetooth] > > > > > > [ 43.526527] hci_sock_sendmsg+0x28c/0x95c [bluetooth] > > > > > > [ 43.531873] sock_sendmsg+0xb4/0xd8 > > > > > > [ 43.535472] sock_write_iter+0x1c0/0x2d0 > > > > > > [ 43.539519] do_iter_readv_writev+0x350/0x4e0 > > > > > > [ 43.544012] do_iter_write+0xf0/0x2e4 > > > > > > [ 43.547788] vfs_writev+0xd0/0x13c > > > > > > [ 43.551295] do_writev+0xe8/0x1fc > > > > > > [ 43.554710] __arm64_sys_writev+0x84/0x98 > > > > > > [ 43.558838] invoke_syscall+0x78/0x20c > > > > > > [ 43.562709] el0_svc_common+0x12c/0x2f0 > > > > > > [ 43.566654] do_el0_svc+0x94/0x13c > > > > > > [ 43.570155] el0_svc+0x5c/0x108 > > > > > > [ 43.573391] el0t_64_sync_handler+0x78/0x108 > > > > > > [ 43.577785] el0t_64_sync+0x1a4/0x1a8 > > > > > > [ 43.581564] > > > > > > [ 43.583115] Freed by task 3217: > > > > > > [ 43.586356] kasan_save_stack+0x38/0x68 > > > > > > [ 43.590314] kasan_set_track+0x28/0x3c > > > > > > [ 43.594180] kasan_set_free_info+0x28/0x4c > > > > > > [ 43.598396] ____kasan_slab_free+0x138/0x17c > > > > > > [ 43.602794] __kasan_slab_free+0x18/0x28 > > > > > > [ 43.606838] slab_free_freelist_hook+0x188/0x260 > > > > > > [ 43.611591] kfree+0x138/0x29c > > > > > > [ 43.614741] mgmt_pending_free+0xac/0xdc [bluetooth] > > > > > > [ 43.620003] mgmt_pending_remove+0xd8/0xf0 [bluetooth] > > > > > > [ 43.625434] cmd_complete_rsp+0xc8/0x178 [bluetooth] > > > > > > [ 43.630686] mgmt_pending_foreach+0xa8/0xf8 [bluetooth] > > > > > > [ 43.636198] __mgmt_power_off+0x114/0x26c [bluetooth] > > > > > > [ 43.641532] hci_dev_close_sync+0x2ec/0x7ec [bluetooth] > > > > > > [ 43.647049] hci_dev_do_close+0x3c/0x7c [bluetooth] > > > > > > [ 43.652209] hci_dev_close+0xac/0x164 [bluetooth] > > > > > > [ 43.657190] hci_sock_ioctl+0x298/0x444 [bluetooth] > > > > > > [ 43.662353] sock_do_ioctl+0xd0/0x1e8 > > > > > > [ 43.666134] sock_ioctl+0x4fc/0x72c > > > > > > [ 43.669736] __arm64_sys_ioctl+0x118/0x154 > > > > > > [ 43.673961] invoke_syscall+0x78/0x20c > > > > > > [ 43.677820] el0_svc_common+0x12c/0x2f0 > > > > > > [ 43.681770] do_el0_svc+0x94/0x13c > > > > > > [ 43.685278] el0_svc+0x5c/0x108 > > > > > > [ 43.688514] el0t_64_sync_handler+0x78/0x108 > > > > > > [ 43.692913] el0t_64_sync+0x1a4/0x1a8 > > > > > > > > > > > > Thanks, > > > > > > Zhengping > > > > > > > > > > Ok, how about we do something like the following: > > > > > > > > > > https://gist.github.com/Vudentz/365d664275e4d2e2af157e47f0502f50 > > > > > > > > > > The actual real culprit seem to be __mgmt_power_off does cleanup > > > > > mgmt_pending but that is still accessible via cmd_sync_work_list, this > > > > > is probably how hci_request was designed but in case of cmd_sync we > > > > > normally have the data as part of cmd_sync_work_list. > > > > > > > > > > > On Mon, Feb 27, 2023 at 3:41 PM Luiz Augusto von Dentz > > > > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > > > > > > > Hi Zhengping, > > > > > > > > > > > > > > On Sun, Feb 26, 2023 at 11:18 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > > > > > Hi Luiz, > > > > > > > > > > > > > > > > I have a question. Given that each command in the cmd_sync queue > > > > > > > > should clean up the memory in a callback function. I was wondering if > > > > > > > > the call to cmd_complete_rsp in __mgmt_power_off function is still > > > > > > > > necessary? Will this always risk a race condition that cmd has been > > > > > > > > released when the complete callback or _sync function is run? > > > > > > > > > > > > > > Not sure I follow you here, do you have a stack trace when the user > > > > > > > after free occurs? > > > > > > > > > > > > > > > Thanks, > > > > > > > > Zhengping > > > > > > > > > > > > > > > > On Fri, Feb 24, 2023 at 2:37 PM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > > > > > > > Hi Luiz, > > > > > > > > > > > > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > > > > > > > instead? > > > > > > > > > > > > > > > > > > That is a good question and we used hci_cmd_sync_clear in the first > > > > > > > > > version, but it will clear the queue and also close the timer. As a > > > > > > > > > result, when the adapter is turned on again, the timer will not > > > > > > > > > schedule any new jobs. So the option is to use hci_cmd_sync_clear and > > > > > > > > > re-initiate the queue or to write a new function which only clears the > > > > > > > > > queue. > > > > > > > > > > > > > > > > > > > We also may want to move the clearing logic to > > > > > > > > > > hci_dev_close_sync since it should be equivalent to > > > > > > > > > > hci_request_cancel_all. > > > > > > > > > > > > > > > > > > I actually have a question here. I saw > > > > > > > > > "drain_workqueue(hdev->workqueue)" in hci_dev_close_sync and thought > > > > > > > > > it should force clearing the cmd_sync queue. But it seems cannot > > > > > > > > > prevent the use-after-free situation. > > > > > > > > > > > > > > > > > > Any suggestions to improve the solution? > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Zhengping > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Feb 24, 2023 at 1:02 PM Luiz Augusto von Dentz > > > > > > > > > <luiz.dentz@gmail.com> wrote: > > > > > > > > > > > > > > > > > > > > Hi Zhengping, > > > > > > > > > > > > > > > > > > > > On Fri, Feb 24, 2023 at 11:53 AM Zhengping Jiang <jiangzp@google.com> wrote: > > > > > > > > > > > > > > > > > > > > > > Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid > > > > > > > > > > > racing conditions which cause use-after-free. > > > > > > > > > > > > > > > > > > > > > > When powering off the adapter, the mgmt cmd list will be cleared. If a > > > > > > > > > > > work is queued in the cmd_sync_work queue at the same time, it will > > > > > > > > > > > cause the risk of use-after-free, as the cmd pointer is not checked > > > > > > > > > > > before use. > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Zhengping Jiang <jiangzp@google.com> > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > > > Changes in v2: > > > > > > > > > > > - Add function to clear the queue without stop the timer > > > > > > > > > > > > > > > > > > > > > > Changes in v1: > > > > > > > > > > > - Clear cmd_sync_work queue before clearing the mgmt cmd list > > > > > > > > > > > > > > > > > > > > > > net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++- > > > > > > > > > > > 1 file changed, 20 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > > > > > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c > > > > > > > > > > > index 117eedb6f709..b70365dfff0c 100644 > > > > > > > > > > > --- a/net/bluetooth/hci_sync.c > > > > > > > > > > > +++ b/net/bluetooth/hci_sync.c > > > > > > > > > > > @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev) > > > > > > > > > > > INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire); > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev) > > > > > > > > > > > +{ > > > > > > > > > > > + struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > > > > > > > + > > > > > > > > > > > + mutex_lock(&hdev->cmd_sync_work_lock); > > > > > > > > > > > + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) { > > > > > > > > > > > + if (entry->destroy) { > > > > > > > > > > > + hci_req_sync_lock(hdev); > > > > > > > > > > > + entry->destroy(hdev, entry->data, -ECANCELED); > > > > > > > > > > > + hci_req_sync_unlock(hdev); > > > > > > > > > > > + } > > > > > > > > > > > + list_del(&entry->list); > > > > > > > > > > > + kfree(entry); > > > > > > > > > > > + } > > > > > > > > > > > + mutex_unlock(&hdev->cmd_sync_work_lock); > > > > > > > > > > > +} > > > > > > > > > > > + > > > > > > > > > > > void hci_cmd_sync_clear(struct hci_dev *hdev) > > > > > > > > > > > { > > > > > > > > > > > struct hci_cmd_sync_work_entry *entry, *tmp; > > > > > > > > > > > @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev) > > > > > > > > > > > > > > > > > > > > > > if (!auto_off && hdev->dev_type == HCI_PRIMARY && > > > > > > > > > > > !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > > > > > > > > > - hci_dev_test_flag(hdev, HCI_MGMT)) > > > > > > > > > > > + hci_dev_test_flag(hdev, HCI_MGMT)) { > > > > > > > > > > > + hci_pend_cmd_sync_clear(hdev); > > > > > > > > > > > > > > > > > > > > Any particular reason why you are not using hci_cmd_sync_clear > > > > > > > > > > instead? We also may want to move the clearing logic to > > > > > > > > > > hci_dev_close_sync since it should be equivalent to > > > > > > > > > > hci_request_cancel_all. > > > > > > > > > > > > > > > > > > > > > __mgmt_power_off(hdev); > > > > > > > > > > > + } > > > > > > > > > > > > > > > > > > > > > > hci_inquiry_cache_flush(hdev); > > > > > > > > > > > hci_pend_le_actions_clear(hdev); > > > > > > > > > > > -- > > > > > > > > > > > 2.39.2.722.g9855ee24e9-goog > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Luiz Augusto von Dentz > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Luiz Augusto von Dentz > > > > > > > > > > > > > > > > > > > > -- > > > > > Luiz Augusto von Dentz > > > > > > > > > > > > -- > > > Luiz Augusto von Dentz
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 117eedb6f709..b70365dfff0c 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -636,6 +636,23 @@ void hci_cmd_sync_init(struct hci_dev *hdev) INIT_DELAYED_WORK(&hdev->adv_instance_expire, adv_timeout_expire); } +static void hci_pend_cmd_sync_clear(struct hci_dev *hdev) +{ + struct hci_cmd_sync_work_entry *entry, *tmp; + + mutex_lock(&hdev->cmd_sync_work_lock); + list_for_each_entry_safe(entry, tmp, &hdev->cmd_sync_work_list, list) { + if (entry->destroy) { + hci_req_sync_lock(hdev); + entry->destroy(hdev, entry->data, -ECANCELED); + hci_req_sync_unlock(hdev); + } + list_del(&entry->list); + kfree(entry); + } + mutex_unlock(&hdev->cmd_sync_work_lock); +} + void hci_cmd_sync_clear(struct hci_dev *hdev) { struct hci_cmd_sync_work_entry *entry, *tmp; @@ -4842,8 +4859,10 @@ int hci_dev_close_sync(struct hci_dev *hdev) if (!auto_off && hdev->dev_type == HCI_PRIMARY && !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && - hci_dev_test_flag(hdev, HCI_MGMT)) + hci_dev_test_flag(hdev, HCI_MGMT)) { + hci_pend_cmd_sync_clear(hdev); __mgmt_power_off(hdev); + } hci_inquiry_cache_flush(hdev); hci_pend_le_actions_clear(hdev);
Clear cmd_sync_work queue before clearing the mgmt cmd list to avoid racing conditions which cause use-after-free. When powering off the adapter, the mgmt cmd list will be cleared. If a work is queued in the cmd_sync_work queue at the same time, it will cause the risk of use-after-free, as the cmd pointer is not checked before use. Signed-off-by: Zhengping Jiang <jiangzp@google.com> --- Changes in v2: - Add function to clear the queue without stop the timer Changes in v1: - Clear cmd_sync_work queue before clearing the mgmt cmd list net/bluetooth/hci_sync.c | 21 ++++++++++++++++++++- 1 file changed, 20 insertions(+), 1 deletion(-)