diff mbox series

[v2,2/2] Bluetooth: hci_conn: verify connection is to be aborted before doing it

Message ID fd9c0f0eddb8a7d73e9797568dc3cf2bc9b8ad62.1696077070.git.pav@iki.fi (mailing list archive)
State New, archived
Headers show
Series [v2,1/2] Bluetooth: hci_sync: always check if connection is alive before deleting | expand

Checks

Context Check Description
tedd_an/pre-ci_am success Success
tedd_an/CheckPatch success CheckPatch PASS
tedd_an/GitLint fail WARNING: I3 - ignore-body-lines: gitlint will be switching from using Python regex 'match' (match beginning) to 'search' (match anywhere) semantics. Please review your ignore-body-lines.regex option accordingly. To remove this warning, set general.regex-style-search=True. More details: https://jorisroovers.github.io/gitlint/configuration/#regex-style-search 18: B1 Line exceeds max length (88>80): "Closes: https://lore.kernel.org/linux-bluetooth/0000000000005ab984060371583e@google.com/" 31: B3 Line contains hard tab characters (\t): " 2874 conn = hci_conn_hash_lookup_handle(hdev, handle);" 32: B3 Line contains hard tab characters (\t): " 2875 if (!conn || WARN_ON(!conn->abort_reason))" 33: B3 Line contains hard tab characters (\t): " 2876 return 0;"
tedd_an/SubjectPrefix success Gitlint PASS
tedd_an/IncrementalBuild success Incremental Build PASS

Commit Message

Pauli Virtanen Sept. 30, 2023, 12:53 p.m. UTC
There is a race condition where a connection handle is reused, after
hci_abort_conn but before abort_conn_sync is processed in hci_sync. In
this case, hci_abort_conn_sync ends up working on the wrong connection,
which can have abort_reason==0, in which case hci_connect_cfm is called
with success status and then connection is deleted, which causes various
use-after-free.

Fix by checking abort_reason is nonzero before calling
hci_abort_conn_sync. If it's zero, then the connection is probably a
different one than we expected and should not be aborted.

Also fix some theoretical UAF / races, where something frees the conn
while hci_abort_conn_sync is working on it.

Fixes: a13f316e90fd ("Bluetooth: hci_conn: Consolidate code for aborting connections")
Reported-by: syzbot+a0c80b06ae2cb8895bc4@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-bluetooth/0000000000005ab984060371583e@google.com/
Signed-off-by: Pauli Virtanen <pav@iki.fi>
---

Notes:
    v2: drop one probably not necessary WARN_ON

    Not sure how you'd hit this condition in real controller, but syzbot
    does end up calling hci_abort_conn_sync with reason == 0 which then
    causes havoc.

    This can be verified: with a patch that changes abort_conn_sync to

        2874	conn = hci_conn_hash_lookup_handle(hdev, handle);
        2875	if (!conn || WARN_ON(!conn->abort_reason))
        2876		return 0;

    https://syzkaller.appspot.com/text?tag=Patch&x=16eff740680000

    it hits that WARN_ON:

    https://syzkaller.appspot.com/x/log.txt?x=10affb97a80000

#syz test git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git master

 net/bluetooth/hci_conn.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

Comments

syzbot Sept. 30, 2023, 1:28 p.m. UTC | #1
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+a0c80b06ae2cb8895bc4@syzkaller.appspotmail.com

Tested on:

commit:         62dc2425 Bluetooth: ISO: Fix invalid context error
git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=12897062680000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3d83e69abefedb6e
dashboard link: https://syzkaller.appspot.com/bug?extid=a0c80b06ae2cb8895bc4
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
patch:          https://syzkaller.appspot.com/x/patch.diff?x=14a7711a680000

Note: testing is done by a robot and is best-effort only.
Luiz Augusto von Dentz Oct. 2, 2023, 11:22 p.m. UTC | #2
Hi Pauli,

On Sat, Sep 30, 2023 at 3:25 PM Pauli Virtanen <pav@iki.fi> wrote:
>
> There is a race condition where a connection handle is reused, after
> hci_abort_conn but before abort_conn_sync is processed in hci_sync. In
> this case, hci_abort_conn_sync ends up working on the wrong connection,
> which can have abort_reason==0, in which case hci_connect_cfm is called
> with success status and then connection is deleted, which causes various
> use-after-free.
>
> Fix by checking abort_reason is nonzero before calling
> hci_abort_conn_sync. If it's zero, then the connection is probably a
> different one than we expected and should not be aborted.
>
> Also fix some theoretical UAF / races, where something frees the conn
> while hci_abort_conn_sync is working on it.
>
> Fixes: a13f316e90fd ("Bluetooth: hci_conn: Consolidate code for aborting connections")
> Reported-by: syzbot+a0c80b06ae2cb8895bc4@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/linux-bluetooth/0000000000005ab984060371583e@google.com/
> Signed-off-by: Pauli Virtanen <pav@iki.fi>
> ---
>
> Notes:
>     v2: drop one probably not necessary WARN_ON
>
>     Not sure how you'd hit this condition in real controller, but syzbot
>     does end up calling hci_abort_conn_sync with reason == 0 which then
>     causes havoc.
>
>     This can be verified: with a patch that changes abort_conn_sync to
>
>         2874    conn = hci_conn_hash_lookup_handle(hdev, handle);
>         2875    if (!conn || WARN_ON(!conn->abort_reason))
>         2876            return 0;
>
>     https://syzkaller.appspot.com/text?tag=Patch&x=16eff740680000
>
>     it hits that WARN_ON:
>
>     https://syzkaller.appspot.com/x/log.txt?x=10affb97a80000
>
> #syz test git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git master
>
>  net/bluetooth/hci_conn.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> index e62a5f368a51..22de82071e35 100644
> --- a/net/bluetooth/hci_conn.c
> +++ b/net/bluetooth/hci_conn.c
> @@ -2901,12 +2901,25 @@ static int abort_conn_sync(struct hci_dev *hdev, void *data)
>  {
>         struct hci_conn *conn;
>         u16 handle = PTR_UINT(data);
> +       u8 reason;
> +       int err;
> +
> +       rcu_read_lock();
>
>         conn = hci_conn_hash_lookup_handle(hdev, handle);
> +       if (conn) {
> +               reason = READ_ONCE(conn->abort_reason);
> +               conn = reason ? hci_conn_get(conn) : NULL;
> +       }
> +
> +       rcu_read_unlock();
> +
>         if (!conn)
>                 return 0;
>
> -       return hci_abort_conn_sync(hdev, conn, conn->abort_reason);
> +       err = hci_abort_conn_sync(hdev, conn, reason);
> +       hci_conn_put(conn);
> +       return err;
>  }
>
>  int hci_abort_conn(struct hci_conn *conn, u8 reason)
> @@ -2918,6 +2931,8 @@ int hci_abort_conn(struct hci_conn *conn, u8 reason)
>          */
>         if (conn->abort_reason)
>                 return 0;
> +       if (WARN_ON(!reason))
> +               reason = HCI_ERROR_UNSPECIFIED;
>
>         bt_dev_dbg(hdev, "handle 0x%2.2x reason 0x%2.2x", conn->handle, reason);
>
> --
> 2.41.0

Don't really like where this is going, the culprit here seems that we
are not able to cancel the callback when freeing the hci_conn so it
stays enqueued while the new connection is created with the same
handle, so I think we need to introduce a function that cancel queued
like Ive sent sometime back, that way we don't need to keep trying to
check if it is the same connection or not.
Pauli Virtanen Oct. 3, 2023, 4:41 p.m. UTC | #3
Hi Luiz,

ma, 2023-10-02 kello 16:22 -0700, Luiz Augusto von Dentz kirjoitti:
> Hi Pauli,
> 
> On Sat, Sep 30, 2023 at 3:25 PM Pauli Virtanen <pav@iki.fi> wrote:
> > 
> > There is a race condition where a connection handle is reused, after
> > hci_abort_conn but before abort_conn_sync is processed in hci_sync. In
> > this case, hci_abort_conn_sync ends up working on the wrong connection,
> > which can have abort_reason==0, in which case hci_connect_cfm is called
> > with success status and then connection is deleted, which causes various
> > use-after-free.
> > 
> > Fix by checking abort_reason is nonzero before calling
> > hci_abort_conn_sync. If it's zero, then the connection is probably a
> > different one than we expected and should not be aborted.
> > 
> > Also fix some theoretical UAF / races, where something frees the conn
> > while hci_abort_conn_sync is working on it.
> > 
> > Fixes: a13f316e90fd ("Bluetooth: hci_conn: Consolidate code for aborting connections")
> > Reported-by: syzbot+a0c80b06ae2cb8895bc4@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/linux-bluetooth/0000000000005ab984060371583e@google.com/
> > Signed-off-by: Pauli Virtanen <pav@iki.fi>
> > ---
> > 
> > Notes:
> >     v2: drop one probably not necessary WARN_ON
> > 
> >     Not sure how you'd hit this condition in real controller, but syzbot
> >     does end up calling hci_abort_conn_sync with reason == 0 which then
> >     causes havoc.
> > 
> >     This can be verified: with a patch that changes abort_conn_sync to
> > 
> >         2874    conn = hci_conn_hash_lookup_handle(hdev, handle);
> >         2875    if (!conn || WARN_ON(!conn->abort_reason))
> >         2876            return 0;
> > 
> >     https://syzkaller.appspot.com/text?tag=Patch&x=16eff740680000
> > 
> >     it hits that WARN_ON:
> > 
> >     https://syzkaller.appspot.com/x/log.txt?x=10affb97a80000
> > 
> > #syz test git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git master
> > 
> >  net/bluetooth/hci_conn.c | 17 ++++++++++++++++-
> >  1 file changed, 16 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> > index e62a5f368a51..22de82071e35 100644
> > --- a/net/bluetooth/hci_conn.c
> > +++ b/net/bluetooth/hci_conn.c
> > @@ -2901,12 +2901,25 @@ static int abort_conn_sync(struct hci_dev *hdev, void *data)
> >  {
> >         struct hci_conn *conn;
> >         u16 handle = PTR_UINT(data);
> > +       u8 reason;
> > +       int err;
> > +
> > +       rcu_read_lock();
> > 
> >         conn = hci_conn_hash_lookup_handle(hdev, handle);
> > +       if (conn) {
> > +               reason = READ_ONCE(conn->abort_reason);
> > +               conn = reason ? hci_conn_get(conn) : NULL;
> > +       }
> > +
> > +       rcu_read_unlock();
> > +
> >         if (!conn)
> >                 return 0;
> > 
> > -       return hci_abort_conn_sync(hdev, conn, conn->abort_reason);
> > +       err = hci_abort_conn_sync(hdev, conn, reason);
> > +       hci_conn_put(conn);
> > +       return err;
> >  }
> > 
> >  int hci_abort_conn(struct hci_conn *conn, u8 reason)
> > @@ -2918,6 +2931,8 @@ int hci_abort_conn(struct hci_conn *conn, u8 reason)
> >          */
> >         if (conn->abort_reason)
> >                 return 0;
> > +       if (WARN_ON(!reason))
> > +               reason = HCI_ERROR_UNSPECIFIED;
> > 
> >         bt_dev_dbg(hdev, "handle 0x%2.2x reason 0x%2.2x", conn->handle, reason);
> > 
> > --
> > 2.41.0
> 
> Don't really like where this is going, the culprit here seems that we
> are not able to cancel the callback when freeing the hci_conn so it
> stays enqueued while the new connection is created with the same
> handle, so I think we need to introduce a function that cancel queued
> like Ive sent sometime back, that way we don't need to keep trying to
> check if it is the same connection or not.

Ack.

Note though that the first patch in series is separate issue, and won't
be fixed by canceling callbacks.

Removing the item from hci_sync queue on deletion would also avoid
syzbot hitting the issue in this second patch.

There'd be a remaining theoretical race though in

	conn = hci_conn_hash_lookup_handle(hdev, handle);
	/* the other workqueue frees conn here -> UAF */
	if (!conn)
		return 0;

unless you do the other stuff here. I wonder if there instead should be
something like hci_cmd_sync_dev_lock/unlock from
https://lore.kernel.org/linux-bluetooth/15e7863c06bd87cd991ab963132fa9d12ef7e11a.1690399379.git.pav@iki.fi/
to limit the concurrency only to the places where we wait for
controller events.
diff mbox series

Patch

diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
index e62a5f368a51..22de82071e35 100644
--- a/net/bluetooth/hci_conn.c
+++ b/net/bluetooth/hci_conn.c
@@ -2901,12 +2901,25 @@  static int abort_conn_sync(struct hci_dev *hdev, void *data)
 {
 	struct hci_conn *conn;
 	u16 handle = PTR_UINT(data);
+	u8 reason;
+	int err;
+
+	rcu_read_lock();
 
 	conn = hci_conn_hash_lookup_handle(hdev, handle);
+	if (conn) {
+		reason = READ_ONCE(conn->abort_reason);
+		conn = reason ? hci_conn_get(conn) : NULL;
+	}
+
+	rcu_read_unlock();
+
 	if (!conn)
 		return 0;
 
-	return hci_abort_conn_sync(hdev, conn, conn->abort_reason);
+	err = hci_abort_conn_sync(hdev, conn, reason);
+	hci_conn_put(conn);
+	return err;
 }
 
 int hci_abort_conn(struct hci_conn *conn, u8 reason)
@@ -2918,6 +2931,8 @@  int hci_abort_conn(struct hci_conn *conn, u8 reason)
 	 */
 	if (conn->abort_reason)
 		return 0;
+	if (WARN_ON(!reason))
+		reason = HCI_ERROR_UNSPECIFIED;
 
 	bt_dev_dbg(hdev, "handle 0x%2.2x reason 0x%2.2x", conn->handle, reason);