diff mbox series

[v2] Bluetooth: hci_core: fix suspicious RCU usage in hci_conn_drop()

Message ID 20240725134741.27281-2-yskelg@gmail.com (mailing list archive)
State New, archived

Checks

Context                             Check    Description
tedd_an/pre-ci_am                   success  Success
tedd_an/CheckPatch                  success  CheckPatch PASS
tedd_an/GitLint                     success  Gitlint PASS
tedd_an/SubjectPrefix               success  Gitlint PASS
tedd_an/BuildKernel                 success  BuildKernel PASS
tedd_an/CheckAllWarning             success  CheckAllWarning PASS
tedd_an/CheckSparse                 success  CheckSparse PASS
tedd_an/CheckSmatch                 success  CheckSparse PASS
tedd_an/BuildKernel32               success  BuildKernel32 PASS
tedd_an/TestRunnerSetup             success  TestRunnerSetup PASS
tedd_an/TestRunner_l2cap-tester     success  TestRunner PASS
tedd_an/TestRunner_iso-tester       fail     TestRunner_iso-tester: Total: 122, Passed: 117 (95.9%), Failed: 1, Not Run: 4
tedd_an/TestRunner_bnep-tester      success  TestRunner PASS
tedd_an/TestRunner_mgmt-tester      fail     TestRunner_mgmt-tester: Total: 492, Passed: 489 (99.4%), Failed: 1, Not Run: 2
tedd_an/TestRunner_rfcomm-tester    success  TestRunner PASS
tedd_an/TestRunner_sco-tester       success  TestRunner PASS
tedd_an/TestRunner_ioctl-tester     success  TestRunner PASS
tedd_an/TestRunner_mesh-tester      success  TestRunner PASS
tedd_an/TestRunner_smp-tester       success  TestRunner PASS
tedd_an/TestRunner_userchan-tester  success  TestRunner PASS
tedd_an/IncrementalBuild            success  Incremental Build PASS

Commit Message

Yunseong Kim July 25, 2024, 1:47 p.m. UTC
Queuing of the hdev->{cmd,ncmd}_timer works is protected by an RCU read
lock so that 'queue_delayed_work()' is not called after
'cancel_delayed_work()', but 'hci_conn_drop()' has no such protection.

commit deee93d13d38 ("Bluetooth: use hdev->workqueue when queuing
 hdev->{cmd,ncmd}_timer works")

Without that protection, hci_conn_drop() can call queue_delayed_work()
while hci_dev_do_reset() on another CPU is already past synchronize_rcu()
and draining the workqueue:

CPU 1                   CPU 2
 hci_dev_do_reset()
  synchronize_rcu()      hci_conn_drop()
  drain_workqueue()       <-- no RCU read protection during queuing. -->
                           queue_delayed_work()

This produces a warning like the following:

Bluetooth: hci0: unexpected cc 0x0c38 length: 249 > 2
=============================
WARNING: suspicious RCU usage
6.10.0-rc6-01340-gf14c0bb78769 #5 Not tainted
-----------------------------
net/mac80211/util.c:4000 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
2 locks held by syz-executor/798:
 #0: ffff800089a3de50 (rtnl_mutex){+.+.}-{4:4},
    at: rtnl_lock+0x28/0x40 net/core/rtnetlink.c:79

stack backtrace:
CPU: 0 PID: 798 Comm: syz-executor Not tainted
  6.10.0-rc6-01340-gf14c0bb78769 #5
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace.part.0+0x1b8/0x1d0 arch/arm64/kernel/stacktrace.c:317
 dump_backtrace arch/arm64/kernel/stacktrace.c:323 [inline]
 show_stack+0x34/0x50 arch/arm64/kernel/stacktrace.c:324
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xf0/0x170 lib/dump_stack.c:114
 dump_stack+0x20/0x30 lib/dump_stack.c:123
 lockdep_rcu_suspicious+0x204/0x2f8 kernel/locking/lockdep.c:6712
 ieee80211_check_combinations+0x71c/0x828 [mac80211]
 ieee80211_check_concurrent_iface+0x494/0x700 [mac80211]
 ieee80211_open+0x140/0x238 [mac80211]
 __dev_open+0x270/0x498 net/core/dev.c:1474
 __dev_change_flags+0x47c/0x610 net/core/dev.c:8837
 dev_change_flags+0x98/0x170 net/core/dev.c:8909
 devinet_ioctl+0xdf0/0x18d0 net/ipv4/devinet.c:1177
 inet_ioctl+0x34c/0x388 net/ipv4/af_inet.c:1003
 sock_do_ioctl+0xe4/0x240 net/socket.c:1222
 sock_ioctl+0x4cc/0x740 net/socket.c:1341
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:907 [inline]
 __se_sys_ioctl fs/ioctl.c:893 [inline]
 __arm64_sys_ioctl+0x184/0x218 fs/ioctl.c:893
 __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
 invoke_syscall+0x90/0x2e8 arch/arm64/kernel/syscall.c:48
 el0_svc_common.constprop.0+0x200/0x2a8 arch/arm64/kernel/syscall.c:131
 el0_svc+0x48/0xc0 arch/arm64/kernel/entry-common.c:712
 el0t_64_sync_handler+0x120/0x130 arch/arm64/kernel/entry-common.c:730
 el0t_64_sync+0x190/0x198 arch/arm64/kernel/entry.S:598

This patch attempts to fix that issue with the same convention.

Cc: stable@vger.kernel.org # v6.1+
Fixes: deee93d13d38 ("Bluetooth: use hdev->workqueue when queuing hdev->{cmd,ncmd}_timer works")
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Tested-by: Yunseong Kim <yskelg@gmail.com>
Signed-off-by: Yunseong Kim <yskelg@gmail.com>
---
 include/net/bluetooth/hci_core.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Tetsuo Handa July 25, 2024, 2:32 p.m. UTC | #1
On 2024/07/25 22:47, Yunseong Kim wrote:
> =============================
> WARNING: suspicious RCU usage
> 6.10.0-rc6-01340-gf14c0bb78769 #5 Not tainted
> -----------------------------
> net/mac80211/util.c:4000 RCU-list traversed in non-reader section!!
> 
> other info that might help us debug this:
> 
> rcu_scheduler_active = 2, debug_locks = 1
> 2 locks held by syz-executor/798:
>  #0: ffff800089a3de50 (rtnl_mutex){+.+.}-{4:4},
>     at: rtnl_lock+0x28/0x40 net/core/rtnetlink.c:79
> 
> stack backtrace:
> CPU: 0 PID: 798 Comm: syz-executor Not tainted
>   6.10.0-rc6-01340-gf14c0bb78769 #5
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace.part.0+0x1b8/0x1d0 arch/arm64/kernel/stacktrace.c:317
>  dump_backtrace arch/arm64/kernel/stacktrace.c:323 [inline]
>  show_stack+0x34/0x50 arch/arm64/kernel/stacktrace.c:324
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xf0/0x170 lib/dump_stack.c:114
>  dump_stack+0x20/0x30 lib/dump_stack.c:123
>  lockdep_rcu_suspicious+0x204/0x2f8 kernel/locking/lockdep.c:6712
>  ieee80211_check_combinations+0x71c/0x828 [mac80211]
>  ieee80211_check_concurrent_iface+0x494/0x700 [mac80211]
>  ieee80211_open+0x140/0x238 [mac80211]
>  __dev_open+0x270/0x498 net/core/dev.c:1474
>  __dev_change_flags+0x47c/0x610 net/core/dev.c:8837
>  dev_change_flags+0x98/0x170 net/core/dev.c:8909
>  devinet_ioctl+0xdf0/0x18d0 net/ipv4/devinet.c:1177
>  inet_ioctl+0x34c/0x388 net/ipv4/af_inet.c:1003
>  sock_do_ioctl+0xe4/0x240 net/socket.c:1222
>  sock_ioctl+0x4cc/0x740 net/socket.c:1341
>  vfs_ioctl fs/ioctl.c:51 [inline]
>  __do_sys_ioctl fs/ioctl.c:907 [inline]
>  __se_sys_ioctl fs/ioctl.c:893 [inline]
>  __arm64_sys_ioctl+0x184/0x218 fs/ioctl.c:893
>  __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
>  invoke_syscall+0x90/0x2e8 arch/arm64/kernel/syscall.c:48
>  el0_svc_common.constprop.0+0x200/0x2a8 arch/arm64/kernel/syscall.c:131
>  el0_svc+0x48/0xc0 arch/arm64/kernel/entry-common.c:712
>  el0t_64_sync_handler+0x120/0x130 arch/arm64/kernel/entry-common.c:730
>  el0t_64_sync+0x190/0x198 arch/arm64/kernel/entry.S:598
> 
> This patch attempts to fix that issue with the same convention.

Excuse me, but I can't interpret why this patch solves the warning.

The warning says that list_for_each_entry_rcu() { } in
ieee80211_check_combinations() is called outside of rcu_read_lock() and
rcu_read_unlock() pair, doesn't it? How is that connected to
guarding hci_dev_test_flag() and queue_delayed_work() with rcu_read_lock()
and rcu_read_unlock() pair? Unless you guard list_for_each_entry_rcu() { }
in ieee80211_check_combinations() with rcu_read_lock() and rcu_read_unlock()
pair (or annotate that appropriate locks are already held), I can't expect
that the warning will be solved...
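
To make the point above concrete, here is a minimal userspace sketch of the check that prints this kind of warning. It is not kernel code: struct model_task, model_walk() and the field names are hypothetical, and the condition only mirrors, in simplified form, how list_for_each_entry_rcu() complains when the walker is neither inside an RCU reader section nor covered by its optional lockdep condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of per-task lockdep state; not a kernel structure. */
struct model_task {
	int rcu_depth;	/* nesting level of rcu_read_lock() in this task */
	int warnings;	/* count of "RCU-list traversed in non-reader section!!" */
};

/*
 * Models an RCU list walk. The walk warns unless the walker itself is in
 * a reader section (rcu_depth > 0) or asserts that a suitable lock is
 * already held (cond), which is what the optional condition expresses.
 */
static int model_walk(struct model_task *t, const int *items, size_t n,
		      bool cond)
{
	int sum = 0;
	size_t i;

	if (!cond && t->rcu_depth == 0)
		t->warnings++;

	for (i = 0; i < n; i++)
		sum += items[i];
	return sum;
}
```

In this model only the walker's own state decides whether the warning fires, which is why taking rcu_read_lock() in hci_conn_drop() would not be expected to change a warning raised from ieee80211_check_combinations().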

Also, what guarantees that drain_workqueue() won't be disturbed by
queue_work(disc_work) which will be called after "timeo" delay, for you are
not explicitly cancelling scheduled "disc_work" (unlike "cmd_timer" work
and "ncmd_timer" work shown below) before calling drain_workqueue() ?

	/* Cancel these to avoid queueing non-chained pending work */
	hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
	/* Wait for
	 *
	 *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
	 *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
	 *
	 * inside RCU section to see the flag or complete scheduling.
	 */
	synchronize_rcu();
	/* Explicitly cancel works in case scheduled after setting the flag. */
	cancel_delayed_work(&hdev->cmd_timer);
	cancel_delayed_work(&hdev->ncmd_timer);

	/* Avoid potential lockdep warnings from the *_flush() calls by
	 * ensuring the workqueue is empty up front.
	 */
	drain_workqueue(hdev->workqueue);
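
The second question can be illustrated with a small userspace model of a delayed work item. This is a sketch under stated assumptions, not the kernel's workqueue implementation: struct model_dwork and the model_*() helpers are hypothetical, and the model assumes that a drain only processes items already on the queue, so an item whose delay timer is still pending is not seen by it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one delayed work item such as disc_work. */
struct model_dwork {
	bool timer_pending;	/* delay ("timeo") has not expired yet */
	bool on_queue;		/* handed to the workqueue, waiting to run */
	int runs;		/* number of times the handler has executed */
};

/* Models queue_delayed_work(): only arms the timer. */
static void model_queue_delayed(struct model_dwork *w)
{
	w->timer_pending = true;
}

/* Models cancel_delayed_work(): returns true if a pending timer was stopped. */
static bool model_cancel_delayed(struct model_dwork *w)
{
	bool was_pending = w->timer_pending;

	w->timer_pending = false;
	return was_pending;
}

/* Timer expiry moves the item onto the queue. */
static void model_timer_fire(struct model_dwork *w)
{
	if (w->timer_pending) {
		w->timer_pending = false;
		w->on_queue = true;
	}
}

/* Models a drain: runs what is on the queue, ignores pending timers. */
static void model_drain(struct model_dwork *w)
{
	if (w->on_queue) {
		w->on_queue = false;
		w->runs++;
	}
}
```

In the model, an item armed before the drain and never cancelled still lands on the queue once its timer fires, whereas an explicitly cancelled one does not. That is the difference between disc_work and the cmd_timer/ncmd_timer handling quoted above.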

bluez.test.bot@gmail.com July 25, 2024, 2:34 p.m. UTC | #2
This is an automated email; please do not reply to it!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
These are the CI test results for your patch series:
PW Link: https://patchwork.kernel.org/project/bluetooth/list/?series=873851

---Test result---

Test Summary:
CheckPatch                    PASS      0.69 seconds
GitLint                       PASS      0.32 seconds
SubjectPrefix                 PASS      0.13 seconds
BuildKernel                   PASS      29.69 seconds
CheckAllWarning               PASS      31.92 seconds
CheckSparse                   PASS      36.93 seconds
CheckSmatch                   PASS      102.67 seconds
BuildKernel32                 PASS      28.92 seconds
TestRunnerSetup               PASS      545.76 seconds
TestRunner_l2cap-tester       PASS      20.45 seconds
TestRunner_iso-tester         FAIL      38.60 seconds
TestRunner_bnep-tester        PASS      4.97 seconds
TestRunner_mgmt-tester        FAIL      122.62 seconds
TestRunner_rfcomm-tester      PASS      7.65 seconds
TestRunner_sco-tester         PASS      15.24 seconds
TestRunner_ioctl-tester       PASS      8.12 seconds
TestRunner_mesh-tester        PASS      6.04 seconds
TestRunner_smp-tester         PASS      7.04 seconds
TestRunner_userchan-tester    PASS      5.12 seconds
IncrementalBuild              PASS      29.77 seconds

Details
##############################
Test: TestRunner_iso-tester - FAIL
Desc: Run iso-tester with test-runner
Output:
Total: 122, Passed: 117 (95.9%), Failed: 1, Not Run: 4

Failed Test Cases
ISO Connect Suspend - Success                        Failed       4.194 seconds
##############################
Test: TestRunner_mgmt-tester - FAIL
Desc: Run mgmt-tester with test-runner
Output:
Total: 492, Passed: 489 (99.4%), Failed: 1, Not Run: 2

Failed Test Cases
LL Privacy - Remove Device 4 (Disable Adv)           Timed out    1.898 seconds


---
Regards,
Linux Bluetooth

Yunseong Kim July 27, 2024, 6:39 p.m. UTC | #3
Hi Tetsuo,

> Excuse me, but I can't interpret why this patch solves the warning.
> 
> The warning says that list_for_each_entry_rcu() { } in
> ieee80211_check_combinations() is called outside of rcu_read_lock() and
> rcu_read_unlock() pair, doesn't it? How is that connected to
> guarding hci_dev_test_flag() and queue_delayed_work() with rcu_read_lock()
> and rcu_read_unlock() pair? Unless you guard list_for_each_entry_rcu() { }
> in ieee80211_check_combinations() with rcu_read_lock() and rcu_read_unlock()
> pair (or annotate that appropriate locks are already held), I can't expect
> that the warning will be solved...

Thank you for the code review.

I apologize for attaching the wrong kernel dump.

> Also, what guarantees that drain_workqueue() won't be disturbed by
> queue_work(disc_work) which will be called after "timeo" delay, for you are
> not explicitly cancelling scheduled "disc_work" (unlike "cmd_timer" work
> and "ncmd_timer" work shown below) before calling drain_workqueue() ?
> 
> 	/* Cancel these to avoid queueing non-chained pending work */
> 	hci_dev_set_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE);
> 	/* Wait for
> 	 *
> 	 *    if (!hci_dev_test_flag(hdev, HCI_CMD_DRAIN_WORKQUEUE))
> 	 *        queue_delayed_work(&hdev->{cmd,ncmd}_timer)
> 	 *
> 	 * inside RCU section to see the flag or complete scheduling.
> 	 */
> 	synchronize_rcu();
> 	/* Explicitly cancel works in case scheduled after setting the flag. */
> 	cancel_delayed_work(&hdev->cmd_timer);
> 	cancel_delayed_work(&hdev->ncmd_timer);
> 
> 	/* Avoid potential lockdep warnings from the *_flush() calls by
> 	 * ensuring the workqueue is empty up front.
> 	 */
> 	drain_workqueue(hdev->workqueue);


Please bear with me for a moment.

I'll attach the correct kernel dump and resend the patch email.


Warm regards,

Yunseong Kim

Patch

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 31020891fc68..111509dc1a23 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -1572,8 +1572,13 @@  static inline void hci_conn_drop(struct hci_conn *conn)
 		}
 
 		cancel_delayed_work(&conn->disc_work);
-		queue_delayed_work(conn->hdev->workqueue,
-				   &conn->disc_work, timeo);
+
+		rcu_read_lock();
+		if (!hci_dev_test_flag(conn->hdev, HCI_CMD_DRAIN_WORKQUEUE)) {
+			queue_delayed_work(conn->hdev->workqueue,
+							   &conn->disc_work, timeo);
+		}
+		rcu_read_unlock();
 	}
 }
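
The flag-check-then-queue convention that the diff applies can be sketched as a userspace model. Everything here is an assumption made for illustration: struct model_dev and the model_*() functions are hypothetical names, a reader counter stands in very crudely for RCU read-side sections, and a plain integer stands in for the workqueue. It is not the kernel implementation and says nothing about delayed timers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct hci_dev. */
struct model_dev {
	atomic_bool draining;	/* models HCI_CMD_DRAIN_WORKQUEUE */
	atomic_int readers;	/* tasks currently inside a reader section */
	atomic_int queued;	/* work items sitting on the workqueue */
};

static void model_read_lock(struct model_dev *d)
{
	atomic_fetch_add(&d->readers, 1);
}

static void model_read_unlock(struct model_dev *d)
{
	atomic_fetch_sub(&d->readers, 1);
}

/* Crude synchronize_rcu() stand-in: wait until no reader is in flight. */
static void model_synchronize(struct model_dev *d)
{
	while (atomic_load(&d->readers) > 0)
		;
}

/* Mirrors the guarded queueing in the diff; returns true if work was queued. */
static bool model_conn_drop(struct model_dev *d)
{
	bool did_queue = false;

	model_read_lock(d);
	if (!atomic_load(&d->draining)) {
		atomic_fetch_add(&d->queued, 1);
		did_queue = true;
	}
	model_read_unlock(d);
	return did_queue;
}

/* Mirrors the reset path: set the flag, wait for readers, then drain. */
static void model_reset(struct model_dev *d)
{
	atomic_store(&d->draining, true);
	model_synchronize(d);
	atomic_store(&d->queued, 0);	/* models cancelling and draining */
}
```

In the model, a reader either sees the flag and skips queuing, or finishes queuing before model_synchronize() returns, so nothing is added once the drain step has started.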