Message ID | 9deece33-5d7f-9dcb-9aaa-94c60d28fc9a@i-love.sakura.ne.jp (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v2] Bluetooth: call lock_sock() outside of spinlock section | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/subject_prefix | warning | Target tree name not specified in the subject |
netdev/cc_maintainers | success | CCed 8 of 8 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | WARNING: From:/Signed-off-by: email address mismatch: 'From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>' != 'Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>' |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
Hi Tetsuo, On Wed, Jul 7, 2021 at 2:43 AM Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > syzbot is hitting might_sleep() warning at hci_sock_dev_event() due to > calling lock_sock() with rw spinlock held [1]. Defer calling lock_sock() > via sock_hold(). > > Link: https://syzkaller.appspot.com/bug?extid=a5df189917e79d5e59c9 [1] > Reported-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Tested-by: syzbot <syzbot+a5df189917e79d5e59c9@syzkaller.appspotmail.com> > Fixes: e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object") > --- > Changes in v2: > Take hci_sk_list.lock for write in case bt_sock_unlink() is called after > sk_hashed(sk) test, and defer hci_dev_put(hdev) till schedulable context. > > net/bluetooth/hci_sock.c | 32 +++++++++++++++++++++++++++++--- > 1 file changed, 29 insertions(+), 3 deletions(-) > > diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c > index b04a5a02ecf3..d8e1ac1ae10d 100644 > --- a/net/bluetooth/hci_sock.c > +++ b/net/bluetooth/hci_sock.c > @@ -758,20 +758,46 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) > > if (event == HCI_DEV_UNREG) { > struct sock *sk; > + bool put_dev; > > +restart: > + put_dev = false; > /* Detach sockets from device */ > read_lock(&hci_sk_list.lock); > sk_for_each(sk, &hci_sk_list.head) { > + /* hci_sk_list.lock is preventing hci_sock_release() > + * from calling bt_sock_unlink(). > + */ > + if (hci_pi(sk)->hdev != hdev || sk_unhashed(sk)) > + continue; > + /* Take a ref because we can't call lock_sock() with > + * hci_sk_list.lock held. > + */ > + sock_hold(sk); > + read_unlock(&hci_sk_list.lock); > lock_sock(sk); > - if (hci_pi(sk)->hdev == hdev) { > + /* Since hci_sock_release() might have already called > + * bt_sock_unlink() while waiting for lock_sock(), > + * use sk_hashed(sk) for checking that bt_sock_unlink() > + * is not yet called. > + */ > + write_lock(&hci_sk_list.lock); > + if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) { > hci_pi(sk)->hdev = NULL; > sk->sk_err = EPIPE; > sk->sk_state = BT_OPEN; > sk->sk_state_change(sk); > - > - hci_dev_put(hdev); > + put_dev = true; > } > + write_unlock(&hci_sk_list.lock); > release_sock(sk); > + sock_put(sk); > + if (put_dev) > + hci_dev_put(hdev); > + /* Restarting is safe, for hci_pi(sk)->hdev != hdev if > + * condition met and sk_unhashed(sk) == true otherwise. > + */ > + goto restart; This sounds a little too complicated, afaik backward goto is not even consider a good practice either, since it appears we don't unlink the sockets here we could perhaps don't release the reference to hdev either and leave hci_sock_release to deal with it and then perhaps we can take away the backward goto, actually why are you restarting to begin with? It is also weird that this only manifests in the Bluetooth HCI sockets or other subsystems don't use such locking mechanism anymore? > } > read_unlock(&hci_sk_list.lock); > } > -- > 2.18.4 > >
On 2021/07/08 3:20, Luiz Augusto von Dentz wrote: >> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c >> index b04a5a02ecf3..d8e1ac1ae10d 100644 >> --- a/net/bluetooth/hci_sock.c >> +++ b/net/bluetooth/hci_sock.c >> @@ -758,20 +758,46 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) >> >> if (event == HCI_DEV_UNREG) { >> struct sock *sk; >> + bool put_dev; >> >> +restart: >> + put_dev = false; >> /* Detach sockets from device */ >> read_lock(&hci_sk_list.lock); >> sk_for_each(sk, &hci_sk_list.head) { >> + /* hci_sk_list.lock is preventing hci_sock_release() >> + * from calling bt_sock_unlink(). >> + */ >> + if (hci_pi(sk)->hdev != hdev || sk_unhashed(sk)) >> + continue; >> + /* Take a ref because we can't call lock_sock() with >> + * hci_sk_list.lock held. >> + */ >> + sock_hold(sk); >> + read_unlock(&hci_sk_list.lock); >> lock_sock(sk); >> - if (hci_pi(sk)->hdev == hdev) { >> + /* Since hci_sock_release() might have already called >> + * bt_sock_unlink() while waiting for lock_sock(), >> + * use sk_hashed(sk) for checking that bt_sock_unlink() >> + * is not yet called. >> + */ >> + write_lock(&hci_sk_list.lock); >> + if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) { >> hci_pi(sk)->hdev = NULL; >> sk->sk_err = EPIPE; >> sk->sk_state = BT_OPEN; >> sk->sk_state_change(sk); >> - >> - hci_dev_put(hdev); >> + put_dev = true; >> } >> + write_unlock(&hci_sk_list.lock); >> release_sock(sk); >> + sock_put(sk); >> + if (put_dev) >> + hci_dev_put(hdev); >> + /* Restarting is safe, for hci_pi(sk)->hdev != hdev if >> + * condition met and sk_unhashed(sk) == true otherwise. >> + */ >> + goto restart; > > This sounds a little too complicated, afaik backward goto is not even > consider a good practice either, since it appears we don't unlink the > sockets here Because hci_sock_release() might be concurrently called while hci_sock_dev_event() from hci_unregister_dev() from vhci_release() is running. While hci_sock_dev_event() itself does not unlink the sockets from hci_sk_list.head, bt_sock_unlink() from hci_sock_release() unlinks a socket from hci_sk_list.head. Therefore, as long as there is possibility that hci_sk_list is modified by other thread when current thread is traversing this list, we need to be prepared for such race. > we could perhaps don't release the reference to hdev > either and leave hci_sock_release to deal with it and then perhaps we > can take away the backward goto, actually why are you restarting to > begin with? Do you mean something like diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index b04a5a02ecf3..0525883f4639 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -759,19 +759,14 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) if (event == HCI_DEV_UNREG) { struct sock *sk; - /* Detach sockets from device */ + /* Change socket state and notify */ read_lock(&hci_sk_list.lock); sk_for_each(sk, &hci_sk_list.head) { - lock_sock(sk); if (hci_pi(sk)->hdev == hdev) { - hci_pi(sk)->hdev = NULL; sk->sk_err = EPIPE; sk->sk_state = BT_OPEN; sk->sk_state_change(sk); - - hci_dev_put(hdev); } - release_sock(sk); } read_unlock(&hci_sk_list.lock); } ? I can't judge because I don't know how this works. I worry that without lock_sock()/release_sock(), this races with e.g. hci_sock_bind(). We could take away the backward goto if we can do something like below. diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index b04a5a02ecf3..1ca03769badf 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -43,6 +43,8 @@ static DEFINE_IDA(sock_cookie_ida); static atomic_t monitor_promisc = ATOMIC_INIT(0); +static DEFINE_MUTEX(sock_list_lock); + /* ----- HCI socket interface ----- */ /* Socket info */ @@ -760,7 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) struct sock *sk; /* Detach sockets from device */ - read_lock(&hci_sk_list.lock); + mutex_lock(&sock_list_lock); sk_for_each(sk, &hci_sk_list.head) { lock_sock(sk); if (hci_pi(sk)->hdev == hdev) { @@ -773,7 +775,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) } release_sock(sk); } - read_unlock(&hci_sk_list.lock); + mutex_unlock(&sock_list_lock); } } @@ -838,6 +840,7 @@ static int hci_sock_release(struct socket *sock) if (!sk) return 0; + mutex_lock(&sock_list_lock); lock_sock(sk); switch (hci_pi(sk)->channel) { @@ -860,6 +863,7 @@ static int hci_sock_release(struct socket *sock) } bt_sock_unlink(&hci_sk_list, sk); + mutex_unlock(&sock_list_lock); hdev = hci_pi(sk)->hdev; if (hdev) { @@ -2049,7 +2053,9 @@ static int hci_sock_create(struct net *net, struct socket *sock, int protocol, sock->state = SS_UNCONNECTED; sk->sk_state = BT_OPEN; + mutex_lock(&sock_list_lock); bt_sock_link(&hci_sk_list, sk); + mutex_unlock(&sock_list_lock); return 0; } > It is also weird that this only manifests in the Bluetooth > HCI sockets or other subsystems don't use such locking mechanism > anymore? If other subsystems have similar problem, that should be handled by different patches. This patch fixes a regression introduced when fixing CVE-2021-3573, and I think that Linux distributors are waiting for this regression to be fixed so that they can backport commit e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object"). Also, this regression is currently 7th top crashers for syzbot, and I'd like to apply this patch as soon as possible. I think that this patch can serve as a response to Lin's comment > In short, I have no idea if there is any lock replacing solution for > this bug. I need help and suggestions because the lock mechanism is > just so difficult. at https://patchwork.kernel.org/project/bluetooth/patch/CAJjojJsj9pzF4j2MVvsM-hCpvyR7OkZn232yt3MdOGnLxOiRRg@mail.gmail.com without changing behavior. > > >> } >> read_unlock(&hci_sk_list.lock); >> } >> -- >> 2.18.4
> > diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c > index b04a5a02ecf3..0525883f4639 100644 > --- a/net/bluetooth/hci_sock.c > +++ b/net/bluetooth/hci_sock.c > @@ -759,19 +759,14 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) > if (event == HCI_DEV_UNREG) { > struct sock *sk; > > - /* Detach sockets from device */ > + /* Change socket state and notify */ > read_lock(&hci_sk_list.lock); > sk_for_each(sk, &hci_sk_list.head) { > - lock_sock(sk); > if (hci_pi(sk)->hdev == hdev) { > - hci_pi(sk)->hdev = NULL; > sk->sk_err = EPIPE; > sk->sk_state = BT_OPEN; > sk->sk_state_change(sk); > - > - hci_dev_put(hdev); > } > - release_sock(sk); > } > read_unlock(&hci_sk_list.lock); > } > > ? I can't judge because I don't know how this works. I worry that > without lock_sock()/release_sock(), this races with e.g. hci_sock_bind(). > > We could take away the backward goto if we can do something like below. > > diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c > index b04a5a02ecf3..1ca03769badf 100644 > --- a/net/bluetooth/hci_sock.c > +++ b/net/bluetooth/hci_sock.c > @@ -43,6 +43,8 @@ static DEFINE_IDA(sock_cookie_ida); > > static atomic_t monitor_promisc = ATOMIC_INIT(0); > > +static DEFINE_MUTEX(sock_list_lock); > + > /* ----- HCI socket interface ----- */ > > /* Socket info */ > @@ -760,7 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) > struct sock *sk; > > /* Detach sockets from device */ > - read_lock(&hci_sk_list.lock); > + mutex_lock(&sock_list_lock); > sk_for_each(sk, &hci_sk_list.head) { > lock_sock(sk); > if (hci_pi(sk)->hdev == hdev) { > @@ -773,7 +775,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) > } > release_sock(sk); > } > - read_unlock(&hci_sk_list.lock); > + mutex_unlock(&sock_list_lock); > } > } > > @@ -838,6 +840,7 @@ static int hci_sock_release(struct socket *sock) > if (!sk) > return 0; > > + mutex_lock(&sock_list_lock); > lock_sock(sk); > > switch (hci_pi(sk)->channel) { > @@ -860,6 +863,7 @@ static int hci_sock_release(struct socket *sock) > } > > bt_sock_unlink(&hci_sk_list, sk); > + mutex_unlock(&sock_list_lock); > > hdev = hci_pi(sk)->hdev; > if (hdev) { > @@ -2049,7 +2053,9 @@ static int hci_sock_create(struct net *net, struct socket *sock, int protocol, > sock->state = SS_UNCONNECTED; > sk->sk_state = BT_OPEN; > > + mutex_lock(&sock_list_lock); > bt_sock_link(&hci_sk_list, sk); > + mutex_unlock(&sock_list_lock); > return 0; > } > > > > It is also weird that this only manifests in the Bluetooth > > HCI sockets or other subsystems don't use such locking mechanism > > anymore? > Hello Tetsuo, Yeah, that's a great patch indeed. Add one extra mutex lock for handling this. In fact, I have tried to replace all the hci_sk_list.lock from rwlock_t to mutext. > https://patchwork.kernel.org/project/bluetooth/patch/CAJjojJsj9pzF4j2MVvsM-hCpvyR7OkZn232yt3MdOGnLxOiRRg@mail.gmail.com/ > However, from the lock principle in the Linux kernel, this lock > replacement is not appropriate. I take a lot of time to try with other > lock combinations for this case but failed. For example, I tried to > replace the rwlock_t in the hci_sk_list with a sleep-able mutex lock. Because I have seem other part of code in kernel uses this combination: mutex_t + lock_sock. It shouldn't trigger any locking errors. (Will test it) > Also, this regression is currently 7th top > crashers for syzbot, and I'd like to apply this patch as soon as possible. > XD, Yeah. Because the bug crash point is located at function hci_sock_dev_event(). Whenever syzkaller fuzzes Bluetooth stack and the executor exits, the crash happens. > I think that this patch can serve as a response to Lin's comment > > In short, I have no idea if there is any lock replacing solution for > > this bug. I need help and suggestions because the lock mechanism is > > just so difficult. Thanks for that, it's quite appreciating. Regards Lin Ma
It seems that history of this locking problem is a trial and error. Commit b40df5743ee8aed8 ("[PATCH] bluetooth: fix socket locking in hci_sock_dev_event()") in 2.6.21-rc4 changed bh_lock_sock() to lock_sock() as an attempt to fix lockdep warning. Then, commit 4ce61d1c7a8ef4c1 ("[BLUETOOTH]: Fix locking in hci_sock_dev_event().") in 2.6.22-rc2 changed lock_sock() to local_bh_disable() + bh_lock_sock_nested() as an attempt to fix sleep in atomic context warning. Then, commit 4b5dd696f81b210c ("Bluetooth: Remove local_bh_disable() from hci_sock.c") in 3.3-rc1 removed local_bh_disable(). Then, commit e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object") in 5.13-rc5 again changed bh_lock_sock_nested() to lock_sock() as an attempt to fix CVE-2021-3573. But unfortunately it is too difficult to convert rw spinlock into mutex; we need to live with current rw spinlock. And we have three choices that can live with current rw spinlock. Patches for these choices are show bottom. All tested by syzbot. (1) Introduce a global mutex dedicated for hci_sock_dev_event(), and block bt_sock_unlink() and concurrent hci_sock_dev_event() callers. This is simplest if it is guaranteed that total delay for lock_sock() on all sockets is short enough. But it is not clear how long lock_sock() might block, for e.g. hci_sock_bound_ioctl() which is called inside lock_sock() section is doing copy_from_user()/copy_to_user() which may result in blocking other lock_sock() waiters for many seconds. I think that POC.zip is demonstrating that this delay is controllable via userfaultfd. Of course, the robust fix will be not to call copy_from_user()/copy_to_user() inside lock_sock() section. But such big change is not suitable for a fix for commit e305509e678b3a4a ("Bluetooth: use correct lock to prevent UAF of hdev object"). (2) Introduce a global mutex for hci_sock_dev_event(), but don't block bt_sock_unlink() nor concurrent hci_sock_dev_event() callers (i.e. use this mutex like a spinlock). Since it will be safe to resume sk_for_each() as long as currently accessing socket remains on that list, we can track which socket is currently accessed, and let bt_sock_unlink() wait if that socket is currently accessed. It is possible that total delay becomes long enough for khungtaskd to complain. Commit 8d0caedb75968304 ("can: bcm/raw/isotp: use per module netdevice notifier") is an example for avoiding khungtaskd warning using this choice. Compared to that commit, this choice for hci_sock_dev_event() case will need to also touch "struct hci_pinfo" because we need to track concurrent hci_sock_dev_event() callers. (3) Don't introduce a global mutex for hci_sock_dev_event(), and don't block bt_sock_unlink() nor concurrent hci_sock_dev_event() callers. Since it will be safe to resume sk_for_each() as long as currently accessing socket remains on that list, take a refcount on currently accessing socket and check if currently accessing socket is still on the list. This choice needs to touch only hci_sock_dev_event(). Which choice do we want to go? Patch for choice (1): ---------------------------------------- net/bluetooth/hci_sock.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index b04a5a02ecf3..c860ec4ea7b8 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -150,6 +150,8 @@ static struct bt_sock_list hci_sk_list = { .lock = __RW_LOCK_UNLOCKED(hci_sk_list.lock) }; +static DEFINE_MUTEX(hci_sk_list_lock); + static bool is_filtered_packet(struct sock *sk, struct sk_buff *skb) { struct hci_filter *flt; @@ -758,10 +760,13 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) if (event == HCI_DEV_UNREG) { struct sock *sk; + int put_count = 0; /* Detach sockets from device */ + mutex_lock(&hci_sk_list_lock); read_lock(&hci_sk_list.lock); sk_for_each(sk, &hci_sk_list.head) { + read_unlock(&hci_sk_list.lock); lock_sock(sk); if (hci_pi(sk)->hdev == hdev) { hci_pi(sk)->hdev = NULL; @@ -769,11 +774,15 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) sk->sk_state = BT_OPEN; sk->sk_state_change(sk); - hci_dev_put(hdev); + put_count++; } release_sock(sk); + read_lock(&hci_sk_list.lock); } read_unlock(&hci_sk_list.lock); + mutex_unlock(&hci_sk_list_lock); + while (put_count--) + hci_dev_put(hdev); } } @@ -838,6 +847,10 @@ static int hci_sock_release(struct socket *sock) if (!sk) return 0; + mutex_lock(&hci_sk_list_lock); + bt_sock_unlink(&hci_sk_list, sk); + mutex_unlock(&hci_sk_list_lock); + lock_sock(sk); switch (hci_pi(sk)->channel) { @@ -859,8 +872,6 @@ static int hci_sock_release(struct socket *sock) break; } - bt_sock_unlink(&hci_sk_list, sk); - hdev = hci_pi(sk)->hdev; if (hdev) { if (hci_pi(sk)->channel == HCI_CHANNEL_USER) { ---------------------------------------- Patch for choice (2): ---------------------------------------- net/bluetooth/hci_sock.c | 39 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 36 insertions(+), 3 deletions(-) diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index b04a5a02ecf3..3e65fcc8c9af 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -43,6 +43,8 @@ static DEFINE_IDA(sock_cookie_ida); static atomic_t monitor_promisc = ATOMIC_INIT(0); +static DEFINE_MUTEX(dev_event_lock); + /* ----- HCI socket interface ----- */ /* Socket info */ @@ -57,6 +59,7 @@ struct hci_pinfo { unsigned long flags; __u32 cookie; char comm[TASK_COMM_LEN]; + unsigned int event_in_progress; }; void hci_sock_set_flag(struct sock *sk, int nr) @@ -758,10 +761,15 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) if (event == HCI_DEV_UNREG) { struct sock *sk; + int put_count = 0; /* Detach sockets from device */ + mutex_lock(&dev_event_lock); read_lock(&hci_sk_list.lock); sk_for_each(sk, &hci_sk_list.head) { + read_unlock(&hci_sk_list.lock); + hci_pi(sk)->event_in_progress++; + mutex_unlock(&dev_event_lock); lock_sock(sk); if (hci_pi(sk)->hdev == hdev) { hci_pi(sk)->hdev = NULL; @@ -769,11 +777,17 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) sk->sk_state = BT_OPEN; sk->sk_state_change(sk); - hci_dev_put(hdev); + put_count++; } release_sock(sk); + mutex_lock(&dev_event_lock); + hci_pi(sk)->event_in_progress--; + read_lock(&hci_sk_list.lock); } read_unlock(&hci_sk_list.lock); + mutex_unlock(&dev_event_lock); + while (put_count--) + hci_dev_put(hdev); } } @@ -838,6 +852,26 @@ static int hci_sock_release(struct socket *sock) if (!sk) return 0; + /* + * Wait for sk_for_each() in hci_sock_dev_event() to stop accessing + * this sk before unlinking. Need to unlink before lock_sock(), for + * hci_sock_dev_event() calls lock_sock() after incrementing + * event_in_progress counter. + */ + while (1) { + bool unlinked = true; + + mutex_lock(&dev_event_lock); + if (!hci_pi(sk)->event_in_progress) + bt_sock_unlink(&hci_sk_list, sk); + else + unlinked = false; + mutex_unlock(&dev_event_lock); + if (unlinked) + break; + schedule_timeout_uninterruptible(1); + } + lock_sock(sk); switch (hci_pi(sk)->channel) { @@ -859,8 +893,6 @@ static int hci_sock_release(struct socket *sock) break; } - bt_sock_unlink(&hci_sk_list, sk); - hdev = hci_pi(sk)->hdev; if (hdev) { if (hci_pi(sk)->channel == HCI_CHANNEL_USER) { @@ -2049,6 +2081,7 @@ static int hci_sock_create(struct net *net, struct socket *sock, int protocol, sock->state = SS_UNCONNECTED; sk->sk_state = BT_OPEN; + hci_pi(sk)->event_in_progress = 0; bt_sock_link(&hci_sk_list, sk); return 0; } ---------------------------------------- Patch for choice (3): ---------------------------------------- net/bluetooth/hci_sock.c | 35 +++++++++++++++++++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index b04a5a02ecf3..38146cf37378 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -758,22 +758,53 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) if (event == HCI_DEV_UNREG) { struct sock *sk; + int put_count = 0; /* Detach sockets from device */ +restart: read_lock(&hci_sk_list.lock); sk_for_each(sk, &hci_sk_list.head) { + /* This sock_hold(sk) is safe, for bt_sock_unlink(sk) + * is not called yet. + */ + sock_hold(sk); + read_unlock(&hci_sk_list.lock); lock_sock(sk); - if (hci_pi(sk)->hdev == hdev) { + write_lock(&hci_sk_list.lock); + /* Check that bt_sock_unlink(sk) is not called yet. */ + if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) { hci_pi(sk)->hdev = NULL; sk->sk_err = EPIPE; sk->sk_state = BT_OPEN; sk->sk_state_change(sk); - hci_dev_put(hdev); + put_count++; } + write_unlock(&hci_sk_list.lock); release_sock(sk); + read_lock(&hci_sk_list.lock); + /* If bt_sock_unlink(sk) is not called yet, we can + * continue iteration. We can use __sock_put(sk) here + * because hci_sock_release() will call sock_put(sk) + * after bt_sock_unlink(sk). + */ + if (sk_hashed(sk)) { + __sock_put(sk); + continue; + } + /* Otherwise, we need to restart iteration, for the + * next socket pointed by sk->next might be already + * gone. We can't use __sock_put(sk) here because + * hci_sock_release() might have already called + * sock_put(sk) after bt_sock_unlink(sk). + */ + read_unlock(&hci_sk_list.lock); + sock_put(sk); + goto restart; } read_unlock(&hci_sk_list.lock); + while (put_count--) + hci_dev_put(hdev); } } ----------------------------------------
On 2021/07/08 8:33, Tetsuo Handa wrote: >> we could perhaps don't release the reference to hdev >> either and leave hci_sock_release to deal with it and then perhaps we >> can take away the backward goto, actually why are you restarting to >> begin with? > > Do you mean something like > > diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c > index b04a5a02ecf3..0525883f4639 100644 > --- a/net/bluetooth/hci_sock.c > +++ b/net/bluetooth/hci_sock.c > @@ -759,19 +759,14 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) > if (event == HCI_DEV_UNREG) { > struct sock *sk; > > - /* Detach sockets from device */ > + /* Change socket state and notify */ > read_lock(&hci_sk_list.lock); > sk_for_each(sk, &hci_sk_list.head) { > - lock_sock(sk); > if (hci_pi(sk)->hdev == hdev) { > - hci_pi(sk)->hdev = NULL; > sk->sk_err = EPIPE; > sk->sk_state = BT_OPEN; > sk->sk_state_change(sk); > - > - hci_dev_put(hdev); > } > - release_sock(sk); > } > read_unlock(&hci_sk_list.lock); > } > > ? I can't judge because I don't know how this works. I worry that > without lock_sock()/release_sock(), this races with e.g. hci_sock_bind(). > I examined hci_unregister_dev() and concluded that this can't work. hci_sock_dev_event(hdev, HCI_DEV_UNREG) can't defer dropping the reference to this hdev till hci_sock_release(), for hci_unregister_dev() cleans up everything related to this hdev and calls hci_dev_put(hdev) and then vhci_release() calls hci_free_dev(hdev). That's the reason hci_sock_dev_event() has to use lock_sock() in order not to miss some hci_dev_put(hdev) calls. >> This sounds a little too complicated, afaik backward goto is not even >> consider a good practice either, since it appears we don't unlink the >> sockets here Despite your comment, I'd like to go with choice (3) for now. After lock_sock() became free from delay caused by pagefault handling, we could consider updating to choice (1).
diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index b04a5a02ecf3..d8e1ac1ae10d 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -758,20 +758,46 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event) if (event == HCI_DEV_UNREG) { struct sock *sk; + bool put_dev; +restart: + put_dev = false; /* Detach sockets from device */ read_lock(&hci_sk_list.lock); sk_for_each(sk, &hci_sk_list.head) { + /* hci_sk_list.lock is preventing hci_sock_release() + * from calling bt_sock_unlink(). + */ + if (hci_pi(sk)->hdev != hdev || sk_unhashed(sk)) + continue; + /* Take a ref because we can't call lock_sock() with + * hci_sk_list.lock held. + */ + sock_hold(sk); + read_unlock(&hci_sk_list.lock); lock_sock(sk); - if (hci_pi(sk)->hdev == hdev) { + /* Since hci_sock_release() might have already called + * bt_sock_unlink() while waiting for lock_sock(), + * use sk_hashed(sk) for checking that bt_sock_unlink() + * is not yet called. + */ + write_lock(&hci_sk_list.lock); + if (sk_hashed(sk) && hci_pi(sk)->hdev == hdev) { hci_pi(sk)->hdev = NULL; sk->sk_err = EPIPE; sk->sk_state = BT_OPEN; sk->sk_state_change(sk); - - hci_dev_put(hdev); + put_dev = true; } + write_unlock(&hci_sk_list.lock); release_sock(sk); + sock_put(sk); + if (put_dev) + hci_dev_put(hdev); + /* Restarting is safe, for hci_pi(sk)->hdev != hdev if + * condition met and sk_unhashed(sk) == true otherwise. + */ + goto restart; } read_unlock(&hci_sk_list.lock); }