CVE-2021-3640 and the unlimited block of lock_sock()

Message ID s5hv9418mjk.wl-tiwai@suse.de (mailing list archive)
State New, archived

Commit Message

Takashi Iwai Aug. 19, 2021, 3:46 p.m. UTC
Hi,

it seems that the recent fixes in the bluetooth tree address most of the
issues in CVE-2021-3640 ("Use-After-Free vulnerability in function
sco_sock_sendmsg()").  But there is still a problem left: although we
cover the race with lock_sock() now, the lock may be blocked endlessly
(as the task holding it can be stalled via userfaultfd), which results
in the hung-task watchdog triggering like:

-- 8< --
[   23.226767][    T7] Bluetooth: hci0: command 0x0419 tx timeout
[  284.985881][ T1529] INFO: task poc:7603 blocked for more than 143 seconds.
[  284.989134][ T1529]       Not tainted 5.13.0-rc4+ #48
[  284.990098][ T1529] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  284.991705][ T1529] task:poc             state:D stack:13784 pid: 7603 ppid:  7593 flags:0x00000000
[  284.993414][ T1529] Call Trace:
[  284.994025][ T1529]  __schedule+0x32e/0xb90
[  284.994842][ T1529]  ? __local_bh_enable_ip+0x72/0xe0
[  284.995987][ T1529]  schedule+0x38/0xe0
[  284.996723][ T1529]  __lock_sock+0xa1/0x130
[  284.997434][ T1529]  ? finish_wait+0x80/0x80
[  284.998150][ T1529]  lock_sock_nested+0x9f/0xb0
[  284.998914][ T1529]  sco_conn_del+0xb1/0x1a0
[  284.999619][ T1529]  ? sco_conn_del+0x1a0/0x1a0
[  285.000361][ T1529]  sco_disconn_cfm+0x3a/0x60
[  285.001116][ T1529]  hci_conn_hash_flush+0x95/0x130
[  285.001921][ T1529]  hci_dev_do_close+0x298/0x680
[  285.002687][ T1529]  ? up_write+0x12/0x130
[  285.003367][ T1529]  ? vhci_close_dev+0x20/0x20
[  285.004107][ T1529]  hci_unregister_dev+0x9f/0x240
[  285.004886][ T1529]  vhci_release+0x35/0x70
[  285.005602][ T1529]  __fput+0xdf/0x360
[  285.006225][ T1529]  task_work_run+0x86/0xd0
[  285.006927][ T1529]  exit_to_user_mode_prepare+0x267/0x270
[  285.007824][ T1529]  syscall_exit_to_user_mode+0x19/0x60
[  285.008694][ T1529]  do_syscall_64+0x42/0xa0
[  285.009393][ T1529]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  285.010321][ T1529] RIP: 0033:0x4065c7
-- 8< --

Is there any plan to address this?

As a quick hack, I confirmed a workaround like below:

-- 8< --
-- 8< --

.... but I'm not sure whether it's the right way to go.


thanks,

Takashi

Comments

bluez.test.bot@gmail.com Aug. 19, 2021, 4:14 p.m. UTC | #1
This is an automated email; please do not reply to it!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
These are the CI test results for your patch series:
PW Link: https://patchwork.kernel.org/project/bluetooth/list/?series=534275

---Test result---

Test Summary:
CheckPatch                    FAIL      0.51 seconds
GitLint                       FAIL      0.13 seconds
BuildKernel                   PASS      633.87 seconds
TestRunner: Setup             PASS      404.76 seconds
TestRunner: l2cap-tester      PASS      2.89 seconds
TestRunner: bnep-tester       PASS      2.07 seconds
TestRunner: mgmt-tester       PASS      32.30 seconds
TestRunner: rfcomm-tester     PASS      2.45 seconds
TestRunner: sco-tester        PASS      2.26 seconds
TestRunner: smp-tester        FAIL      2.33 seconds
TestRunner: userchan-tester   PASS      2.20 seconds

Details
##############################
Test: CheckPatch - FAIL - 0.51 seconds
Run the checkpatch.pl script with the rules in .checkpatch.conf
CVE-2021-3640 and the unlimited block of lock_sock()
ERROR: Missing Signed-off-by: line(s)

total: 1 errors, 0 warnings, 0 checks, 8 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

"[PATCH] CVE-2021-3640 and the unlimited block of lock_sock()" has style problems, please review.

NOTE: If any of the errors are false positives, please report
      them to the maintainer, see CHECKPATCH in MAINTAINERS.


##############################
Test: GitLint - FAIL - 0.13 seconds
Run gitlint with the rules in .gitlint
CVE-2021-3640 and the unlimited block of lock_sock()
16: B1 Line exceeds max length (96>80): "[  284.990098][ T1529] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message."
17: B1 Line exceeds max length (102>80): "[  284.991705][ T1529] task:poc             state:D stack:13784 pid: 7603 ppid:  7593 flags:0x00000000"


##############################
Test: BuildKernel - PASS - 633.87 seconds
Build the kernel with a minimal configuration that supports Bluetooth


##############################
Test: TestRunner: Setup - PASS - 404.76 seconds
Setup environment for running Test Runner


##############################
Test: TestRunner: l2cap-tester - PASS - 2.89 seconds
Run test-runner with l2cap-tester
Total: 40, Passed: 40 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: bnep-tester - PASS - 2.07 seconds
Run test-runner with bnep-tester
Total: 1, Passed: 1 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: mgmt-tester - PASS - 32.30 seconds
Run test-runner with mgmt-tester
Total: 448, Passed: 445 (99.3%), Failed: 0, Not Run: 3

##############################
Test: TestRunner: rfcomm-tester - PASS - 2.45 seconds
Run test-runner with rfcomm-tester
Total: 9, Passed: 9 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: sco-tester - PASS - 2.26 seconds
Run test-runner with sco-tester
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

##############################
Test: TestRunner: smp-tester - FAIL - 2.33 seconds
Run test-runner with smp-tester
Total: 8, Passed: 7 (87.5%), Failed: 1, Not Run: 0

Failed Test Cases
SMP Client - SC Request 2                            Failed       0.030 seconds

##############################
Test: TestRunner: userchan-tester - PASS - 2.20 seconds
Run test-runner with userchan-tester
Total: 3, Passed: 3 (100.0%), Failed: 0, Not Run: 0



---
Regards,
Linux Bluetooth
Takashi Iwai Aug. 26, 2021, 10:27 a.m. UTC | #2
Does anyone have an idea?


thanks,

Takashi
Luiz Augusto von Dentz Aug. 27, 2021, 1:28 a.m. UTC | #3
Hi Takashi,

On Thu, Aug 26, 2021 at 3:29 AM Takashi Iwai <tiwai@suse.de> wrote:
>
> Does anyone have an idea?

It seems that we need to rework some code so that the functions affected
by userfaultfd are not called with the socket lock (lock_sock()) held.
Takashi Iwai Aug. 28, 2021, 4:06 p.m. UTC | #4
On Fri, 27 Aug 2021 03:28:09 +0200,
Luiz Augusto von Dentz wrote:
> 
> Hi Takashi,
> 
> On Thu, Aug 26, 2021 at 3:29 AM Takashi Iwai <tiwai@suse.de> wrote:
> >
> > Does anyone have an idea?
> 
> It seems that we need to rework some code so that the functions affected
> by userfaultfd are not called with the socket lock (lock_sock()) held.

OK, I have now tried an approach similar to commit 92c685dc5de0, moving
the memcpy_from_msg() call out of the lock_sock() section, and it seems
to work.  I'm going to submit the fix.


thanks,

Takashi
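The idea discussed above (do the copy that may stall before taking the
lock, so the locked section stays short) can be sketched in userspace.
This is only an illustration under stated assumptions, not the kernel
change itself: struct fake_sock, fake_sendmsg(), copy_fn and probe_copy()
are hypothetical names, a pthread mutex stands in for
lock_sock()/release_sock(), and the callback stands in for the copy from
user memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a socket: a lock plus the last frame "sent". */
struct fake_sock {
	pthread_mutex_t lock;	/* plays the role of lock_sock()/release_sock() */
	unsigned char frame[64];
	size_t frame_len;
};

/* Hypothetical data source; models the copy that may stall for a long time. */
typedef int (*copy_fn)(void *dst, size_t len, void *ctx);

static int fake_sendmsg(struct fake_sock *sk, copy_fn copy, void *ctx, size_t len)
{
	unsigned char *buf;

	assert(sk && copy);
	if (len > sizeof(sk->frame))
		return -1;

	buf = malloc(len ? len : 1);
	if (!buf)
		return -1;

	/* Step 1: the potentially slow copy runs with no lock held. */
	if (copy(buf, len, ctx) != 0) {
		free(buf);
		return -1;
	}

	/* Step 2: only the short, bounded work happens under the lock. */
	pthread_mutex_lock(&sk->lock);
	memcpy(sk->frame, buf, len);
	sk->frame_len = len;
	pthread_mutex_unlock(&sk->lock);

	free(buf);
	return 0;
}

/* Example source that also records whether the lock was free while it ran. */
struct probe {
	struct fake_sock *sk;
	const char *data;
	int lock_was_free;
};

static int probe_copy(void *dst, size_t len, void *ctx)
{
	struct probe *p = ctx;

	if (pthread_mutex_trylock(&p->sk->lock) == 0) {
		p->lock_was_free = 1;
		pthread_mutex_unlock(&p->sk->lock);
	}
	memcpy(dst, p->data, len);
	return 0;
}
```

With this ordering, a stalled copy delays only the sender itself; another
task that needs the lock (in the trace above, the sco_conn_del() path) is
not left waiting behind it.  The price is one temporary buffer per call.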

Patch

--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2628,7 +2628,7 @@  void __lock_sock(struct sock *sk)
 		prepare_to_wait_exclusive(&sk->sk_lock.wq, &wait,
 					TASK_UNINTERRUPTIBLE);
 		spin_unlock_bh(&sk->sk_lock.slock);
-		schedule();
+		schedule_timeout(msecs_to_jiffies(10 * 1000));
 		spin_lock_bh(&sk->sk_lock.slock);
 		if (!sock_owned_by_user(sk))
 			break;
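To illustrate what a timed sleep in such a wait loop does, here is a
userspace sketch under stated assumptions.  It is not kernel code: struct
owner_lock and owner_lock_acquire() are hypothetical names, a pthread
mutex and condition variable stand in for sk_lock.slock and sk_lock.wq,
and the max_rounds bound exists only so the demonstration terminates (the
loop in the hunk above has no such bound).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the socket's owner flag and wait queue. */
struct owner_lock {
	pthread_mutex_t slock;	/* plays the role of sk_lock.slock */
	pthread_cond_t wq;	/* plays the role of sk_lock.wq */
	bool owned;		/* plays the role of sock_owned_by_user() */
};

static void owner_lock_init(struct owner_lock *l, bool owned)
{
	pthread_mutex_init(&l->slock, NULL);
	pthread_cond_init(&l->wq, NULL);
	l->owned = owned;
}

/*
 * Wait for ownership, waking up every timeout_ms to re-check the flag.
 * Returns the number of timed-out wakeups on success, or -1 if the lock
 * is still owned after max_rounds timeouts.
 */
static int owner_lock_acquire(struct owner_lock *l, long timeout_ms, int max_rounds)
{
	int timeouts = 0;

	assert(timeout_ms > 0 && max_rounds > 0);
	pthread_mutex_lock(&l->slock);
	while (l->owned) {
		struct timespec deadline;

		if (timeouts == max_rounds) {
			pthread_mutex_unlock(&l->slock);
			return -1;
		}
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += timeout_ms / 1000;
		deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}
		/* A timeout only wakes the waiter; it does not hand over ownership. */
		if (pthread_cond_timedwait(&l->wq, &l->slock, &deadline) == ETIMEDOUT)
			timeouts++;
	}
	l->owned = true;
	pthread_mutex_unlock(&l->slock);
	return timeouts;
}
```

In this sketch a waiter behind a stalled owner simply wakes up, sees the
flag still set, and goes back to sleep.  If the same reading applies to
the hunk above, the timeout would break up one long uninterruptible sleep
into shorter ones without resolving the stall itself, which may be why it
was presented as a quick hack.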