mbox series

[RFC,0/6] Locking in hci_sync

Message ID cover.1690399379.git.pav@iki.fi (mailing list archive)
Headers show
Series Locking in hci_sync | expand

Message

Pauli Virtanen July 26, 2023, 9:25 p.m. UTC
The current guarantees that a given hci_conn is not freed concurrently
appear to be:

- hci_dev_lock is held, or,
- rcu_read_lock is held  (doesn't guarantee valid conn state), or,
- hci_conn_get refcount held (doesn't guarantee valid conn state),
- current task is running from hdev->workqueue (which is ordered)

The last condition is not exactly true as hci_conn_del is currently
called also elsewhere, but is assumed in hci_event.c and hci_core.c.

It does not appear possible to assume in hdev->req_workqueue, especially
in hci_sync callbacks, that a given conn stays alive at any time if none
of the above precautions are taken.  Similarly, any conn_hash iteration
without locks there appears invalid.

E.g. Disconnect events emitted from remote appear possible to occur at
any time in hci_sync.  There are also some hci_conn_del callsites not in
hdev->workqueue which can run concurrently, eg in __iso_sock_close
(which also doesn't hold hdev->lock).

KASAN crashes are known on real HW, and the deletion race conditions can
be hit in VHCI (see the other BlueZ tester patch series).

***

What to do?

One way to guard against using already freed conns is hci_conn_get +
check that the connection was not deleted, inside suitable critical
section. I didn't find a reliable existing check if hci_conn_del was
run.

To enable using hci_conn_get, the series here suggests again adding
HCI_CONN_DELETED flag to indicate whether hci_conn_del has run (with
hdev->lock or in hdev->workqueue).

It also adds some helpers to make writing hci_sync callbacks that
operate on a given hci_conn or need hdev->lock easier to write. The
patches here add hci_conn_sync_queue, which check if hci_conn was
deleted in meantime, and hold hdev->lock in the callback (but release it
during waits), so that concurrent modification of hci_conn can only
happen during event waits. This is something that is already assumed but
might not be true eg. if the two workqueues run jobs on different CPU.

The last two patches in the series are some motivating ISO related
changes, for proper cleanup of CIS, lookup by handles doesn't quite
work (and also doesn't protect against the deletion race).

***

I tried to find some alternatives, but this seemed minimal one.

I don't see how to make use of hci_conn_drop/hold here, as they appear
to have different purpose, if we change that then how socket channels
work seem to a new mechanism. E.g. nonzero refcnt from socket should not
keep connection alive when eg. Disconnect event from remote occurs.  In
that case we also need to clean up the connection ASAP so the handle can
be reused.

One alternative that makes continuing conn_hash iteration a bit simpler
would be to remove hci_conn from conn_hash only when hci_conn_get
refcount hits zero, so holding hci_conn_get reference would keep the
conn a valid iteration cursor. Also setting conn->type to INVALID_LINK
on deletion could then make lookup functions skip deleted conns.
However, one would need to take a look at all places where conn_hash is
iterated and think it through.

It maybe could also be possible to not allow events to be processed
while hci_sync is running, except when it is waiting for an event (=
more or less, take hci_cmd_sync_dev_lock in all callbacks so hdev->lock
serializes things).  However, it seems a deletion flag would be needed
also here, as the conn might be gone while we are waiting for events.

Pauli Virtanen (6):
  Bluetooth: hci_conn: hci_conn_cleanup is not needed, combine with del
  Bluetooth: hci_conn: add hci_conn_is_alive
  Bluetooth: hci_sync: add hci_conn_sync_queue and
    hci_cmd_sync_dev_(un)lock
  Bluetooth: hci_sync: fix locking in hci_conn_abort and
    hci_disconnect_all_sync
  Bluetooth: hci_sync: delete CIS in BT_OPEN/CONNECT/BOUND when aborting
  Bluetooth: ISO: handle bound CIS cleanup via hci_conn

 include/net/bluetooth/hci_core.h |  22 +++++
 include/net/bluetooth/hci_sync.h |   3 +
 net/bluetooth/hci_conn.c         | 157 ++++++++++++++++++++-----------
 net/bluetooth/hci_sync.c         |  97 +++++++++++++++----
 net/bluetooth/iso.c              |  14 +--
 5 files changed, 211 insertions(+), 82 deletions(-)

Comments

patchwork-bot+bluetooth@kernel.org Aug. 4, 2023, 10:50 p.m. UTC | #1
Hello:

This series was applied to bluetooth/bluetooth-next.git (master)
by Luiz Augusto von Dentz <luiz.von.dentz@intel.com>:

On Thu, 27 Jul 2023 00:25:20 +0300 you wrote:
> The current guarantees that a given hci_conn is not freed concurrently
> appear to be:
> 
> - hci_dev_lock is held, or,
> - rcu_read_lock is held  (doesn't guarantee valid conn state), or,
> - hci_conn_get refcount held (doesn't guarantee valid conn state),
> - current task is running from hdev->workqueue (which is ordered)
> 
> [...]

Here is the summary with links:
  - [RFC,1/6] Bluetooth: hci_conn: hci_conn_cleanup is not needed, combine with del
    (no matching commit)
  - [RFC,2/6] Bluetooth: hci_conn: add hci_conn_is_alive
    (no matching commit)
  - [RFC,3/6] Bluetooth: hci_sync: add hci_conn_sync_queue and hci_cmd_sync_dev_(un)lock
    (no matching commit)
  - [RFC,4/6] Bluetooth: hci_sync: fix locking in hci_conn_abort and hci_disconnect_all_sync
    (no matching commit)
  - [RFC,5/6] Bluetooth: hci_sync: delete CIS in BT_OPEN/CONNECT/BOUND when aborting
    (no matching commit)
  - [RFC,6/6] Bluetooth: ISO: handle bound CIS cleanup via hci_conn
    https://git.kernel.org/bluetooth/bluetooth-next/c/2dfe76d58d3a

You are awesome, thank you!