diff mbox series

[v0] nfc: nci: add flush_workqueue to prevent uaf

Message ID 20220412160430.11581-1-linma@zju.edu.cn (mailing list archive)
State Accepted
Commit ef27324e2cb7bb24542d6cb2571740eefe6b00dc
Delegated to: Netdev Maintainers
Headers show
Series [v0] nfc: nci: add flush_workqueue to prevent uaf | expand

Checks

Context Check Description
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/cc_maintainers fail 2 blamed authors not CCed: linville@tuxdriver.com ilane@ti.com; 2 maintainers not CCed: linville@tuxdriver.com ilane@ti.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Lin Ma April 12, 2022, 4:04 p.m. UTC
Our detector found a concurrent use-after-free bug when detaching an
NCI device. The main reason for this bug is the unexpected scheduling
between the used delayed mechanism (timer and workqueue).

The race can be demonstrated below:

Thread-1                           Thread-2
                                 | nci_dev_up()
                                 |   nci_open_device()
                                 |     __nci_request(nci_reset_req)
                                 |       nci_send_cmd
                                 |         queue_work(cmd_work)
nci_unregister_device()          |
  nci_close_device()             | ...
    del_timer_sync(cmd_timer)[1] |
...                              | Worker
nci_free_device()                | nci_cmd_work()
  kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]

In short, the cleanup routine thought that the cmd_timer has already
been detached by [1] but the mod_timer can re-attach the timer [2], even
it is already released [3], resulting in UAF.

This UAF is easy to trigger, crash trace by POC is like below

[   66.703713] ==================================================================
[   66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
[   66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33
[   66.703974]
[   66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
[   66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
[   66.703974] Call Trace:
[   66.703974]  <TASK>
[   66.703974]  dump_stack_lvl+0x57/0x7d
[   66.703974]  print_report.cold+0x5e/0x5db
[   66.703974]  ? enqueue_timer+0x448/0x490
[   66.703974]  kasan_report+0xbe/0x1c0
[   66.703974]  ? enqueue_timer+0x448/0x490
[   66.703974]  enqueue_timer+0x448/0x490
[   66.703974]  __mod_timer+0x5e6/0xb80
[   66.703974]  ? mark_held_locks+0x9e/0xe0
[   66.703974]  ? try_to_del_timer_sync+0xf0/0xf0
[   66.703974]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
[   66.703974]  ? queue_work_on+0x61/0x80
[   66.703974]  ? lockdep_hardirqs_on+0xbf/0x130
[   66.703974]  process_one_work+0x8bb/0x1510
[   66.703974]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[   66.703974]  ? pwq_dec_nr_in_flight+0x230/0x230
[   66.703974]  ? rwlock_bug.part.0+0x90/0x90
[   66.703974]  ? _raw_spin_lock_irq+0x41/0x50
[   66.703974]  worker_thread+0x575/0x1190
[   66.703974]  ? process_one_work+0x1510/0x1510
[   66.703974]  kthread+0x2a0/0x340
[   66.703974]  ? kthread_complete_and_exit+0x20/0x20
[   66.703974]  ret_from_fork+0x22/0x30
[   66.703974]  </TASK>
[   66.703974]
[   66.703974] Allocated by task 267:
[   66.703974]  kasan_save_stack+0x1e/0x40
[   66.703974]  __kasan_kmalloc+0x81/0xa0
[   66.703974]  nci_allocate_device+0xd3/0x390
[   66.703974]  nfcmrvl_nci_register_dev+0x183/0x2c0
[   66.703974]  nfcmrvl_nci_uart_open+0xf2/0x1dd
[   66.703974]  nci_uart_tty_ioctl+0x2c3/0x4a0
[   66.703974]  tty_ioctl+0x764/0x1310
[   66.703974]  __x64_sys_ioctl+0x122/0x190
[   66.703974]  do_syscall_64+0x3b/0x90
[   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   66.703974]
[   66.703974] Freed by task 406:
[   66.703974]  kasan_save_stack+0x1e/0x40
[   66.703974]  kasan_set_track+0x21/0x30
[   66.703974]  kasan_set_free_info+0x20/0x30
[   66.703974]  __kasan_slab_free+0x108/0x170
[   66.703974]  kfree+0xb0/0x330
[   66.703974]  nfcmrvl_nci_unregister_dev+0x90/0xd0
[   66.703974]  nci_uart_tty_close+0xdf/0x180
[   66.703974]  tty_ldisc_kill+0x73/0x110
[   66.703974]  tty_ldisc_hangup+0x281/0x5b0
[   66.703974]  __tty_hangup.part.0+0x431/0x890
[   66.703974]  tty_release+0x3a8/0xc80
[   66.703974]  __fput+0x1f0/0x8c0
[   66.703974]  task_work_run+0xc9/0x170
[   66.703974]  exit_to_user_mode_prepare+0x194/0x1a0
[   66.703974]  syscall_exit_to_user_mode+0x19/0x50
[   66.703974]  do_syscall_64+0x48/0x90
[   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae

To fix the UAF, this patch adds flush_workqueue() to ensure the
nci_cmd_work is finished before the following del_timer_sync.
This combination will promise the timer is actually detached.

Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
---
 net/nfc/nci/core.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Krzysztof Kozlowski April 13, 2022, 6:57 a.m. UTC | #1
On 12/04/2022 18:04, Lin Ma wrote:
> Our detector found a concurrent use-after-free bug when detaching an
> NCI device. The main reason for this bug is the unexpected scheduling
> between the used delayed mechanism (timer and workqueue).
> 
> The race can be demonstrated below:
> 

Thanks!

Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>

Best regards,
Krzysztof
patchwork-bot+netdevbpf@kernel.org April 13, 2022, 1:50 p.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Wed, 13 Apr 2022 00:04:30 +0800 you wrote:
> Our detector found a concurrent use-after-free bug when detaching an
> NCI device. The main reason for this bug is the unexpected scheduling
> between the used delayed mechanism (timer and workqueue).
> 
> The race can be demonstrated below:
> 
> Thread-1                           Thread-2
>                                  | nci_dev_up()
>                                  |   nci_open_device()
>                                  |     __nci_request(nci_reset_req)
>                                  |       nci_send_cmd
>                                  |         queue_work(cmd_work)
> nci_unregister_device()          |
>   nci_close_device()             | ...
>     del_timer_sync(cmd_timer)[1] |
> ...                              | Worker
> nci_free_device()                | nci_cmd_work()
>   kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]
> 
> [...]

Here is the summary with links:
  - [v0] nfc: nci: add flush_workqueue to prevent uaf
    https://git.kernel.org/netdev/net/c/ef27324e2cb7

You are awesome, thank you!
Guenter Roeck April 18, 2022, 1:41 p.m. UTC | #3
On Wed, Apr 13, 2022 at 12:04:30AM +0800, Lin Ma wrote:
> Our detector found a concurrent use-after-free bug when detaching an
> NCI device. The main reason for this bug is the unexpected scheduling
> between the used delayed mechanism (timer and workqueue).
> 
> The race can be demonstrated below:
> 
> Thread-1                           Thread-2
>                                  | nci_dev_up()
>                                  |   nci_open_device()
>                                  |     __nci_request(nci_reset_req)
>                                  |       nci_send_cmd
>                                  |         queue_work(cmd_work)
> nci_unregister_device()          |
>   nci_close_device()             | ...
>     del_timer_sync(cmd_timer)[1] |
> ...                              | Worker
> nci_free_device()                | nci_cmd_work()
>   kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]
> 
> In short, the cleanup routine thought that the cmd_timer has already
> been detached by [1] but the mod_timer can re-attach the timer [2], even
> it is already released [3], resulting in UAF.
> 
> This UAF is easy to trigger, crash trace by POC is like below
> 
> [   66.703713] ==================================================================
> [   66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490
> [   66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33
> [   66.703974]
> [   66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5
> [   66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work
> [   66.703974] Call Trace:
> [   66.703974]  <TASK>
> [   66.703974]  dump_stack_lvl+0x57/0x7d
> [   66.703974]  print_report.cold+0x5e/0x5db
> [   66.703974]  ? enqueue_timer+0x448/0x490
> [   66.703974]  kasan_report+0xbe/0x1c0
> [   66.703974]  ? enqueue_timer+0x448/0x490
> [   66.703974]  enqueue_timer+0x448/0x490
> [   66.703974]  __mod_timer+0x5e6/0xb80
> [   66.703974]  ? mark_held_locks+0x9e/0xe0
> [   66.703974]  ? try_to_del_timer_sync+0xf0/0xf0
> [   66.703974]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
> [   66.703974]  ? queue_work_on+0x61/0x80
> [   66.703974]  ? lockdep_hardirqs_on+0xbf/0x130
> [   66.703974]  process_one_work+0x8bb/0x1510
> [   66.703974]  ? lockdep_hardirqs_on_prepare+0x410/0x410
> [   66.703974]  ? pwq_dec_nr_in_flight+0x230/0x230
> [   66.703974]  ? rwlock_bug.part.0+0x90/0x90
> [   66.703974]  ? _raw_spin_lock_irq+0x41/0x50
> [   66.703974]  worker_thread+0x575/0x1190
> [   66.703974]  ? process_one_work+0x1510/0x1510
> [   66.703974]  kthread+0x2a0/0x340
> [   66.703974]  ? kthread_complete_and_exit+0x20/0x20
> [   66.703974]  ret_from_fork+0x22/0x30
> [   66.703974]  </TASK>
> [   66.703974]
> [   66.703974] Allocated by task 267:
> [   66.703974]  kasan_save_stack+0x1e/0x40
> [   66.703974]  __kasan_kmalloc+0x81/0xa0
> [   66.703974]  nci_allocate_device+0xd3/0x390
> [   66.703974]  nfcmrvl_nci_register_dev+0x183/0x2c0
> [   66.703974]  nfcmrvl_nci_uart_open+0xf2/0x1dd
> [   66.703974]  nci_uart_tty_ioctl+0x2c3/0x4a0
> [   66.703974]  tty_ioctl+0x764/0x1310
> [   66.703974]  __x64_sys_ioctl+0x122/0x190
> [   66.703974]  do_syscall_64+0x3b/0x90
> [   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [   66.703974]
> [   66.703974] Freed by task 406:
> [   66.703974]  kasan_save_stack+0x1e/0x40
> [   66.703974]  kasan_set_track+0x21/0x30
> [   66.703974]  kasan_set_free_info+0x20/0x30
> [   66.703974]  __kasan_slab_free+0x108/0x170
> [   66.703974]  kfree+0xb0/0x330
> [   66.703974]  nfcmrvl_nci_unregister_dev+0x90/0xd0
> [   66.703974]  nci_uart_tty_close+0xdf/0x180
> [   66.703974]  tty_ldisc_kill+0x73/0x110
> [   66.703974]  tty_ldisc_hangup+0x281/0x5b0
> [   66.703974]  __tty_hangup.part.0+0x431/0x890
> [   66.703974]  tty_release+0x3a8/0xc80
> [   66.703974]  __fput+0x1f0/0x8c0
> [   66.703974]  task_work_run+0xc9/0x170
> [   66.703974]  exit_to_user_mode_prepare+0x194/0x1a0
> [   66.703974]  syscall_exit_to_user_mode+0x19/0x50
> [   66.703974]  do_syscall_64+0x48/0x90
> [   66.703974]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> To fix the UAF, this patch adds flush_workqueue() to ensure the
> nci_cmd_work is finished before the following del_timer_sync.
> This combination will promise the timer is actually detached.
> 
> Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
> Signed-off-by: Lin Ma <linma@zju.edu.cn>
> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> ---
>  net/nfc/nci/core.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c
> index d2537383a3e8..0d7763c322b5 100644
> --- a/net/nfc/nci/core.c
> +++ b/net/nfc/nci/core.c
> @@ -560,6 +560,10 @@ static int nci_close_device(struct nci_dev *ndev)
>  	mutex_lock(&ndev->req_lock);
>  
>  	if (!test_and_clear_bit(NCI_UP, &ndev->flags)) {
> +		/* Need to flush the cmd wq in case
> +		 * there is a queued/running cmd_work
> +		 */
> +		flush_workqueue(ndev->cmd_wq);
>  		del_timer_sync(&ndev->cmd_timer);

I have been wondering about this and the same code further below.
What prevents the command timer from firing after the call to
flush_workqueue() ?

Thanks,
Guenter

>  		del_timer_sync(&ndev->data_timer);
>  		mutex_unlock(&ndev->req_lock);
Lin Ma April 18, 2022, 1:59 p.m. UTC | #4
Hello Guenter,

> I have been wondering about this and the same code further below.
> What prevents the command timer from firing after the call to
> flush_workqueue() ?
> 
> Thanks,
> Guenter
> 

From my understanding, once the flush_workqueue() is executed, the work that queued in
ndev->cmd_wq will be taken the care of.

That is, once the flush_workqueue() is finished, it promises there is no executing or 
pending nci_cmd_work() ever.

static void nci_cmd_work(struct work_struct *work)
{
    // ...
		mod_timer(&ndev->cmd_timer,
			  jiffies + msecs_to_jiffies(NCI_CMD_TIMEOUT));
    // ...
}

The command timer is still able be fired because the mod_timer() here. That is why the
del_timer_sync() is necessary after the flush_workqueue().

One very puzzling part is that you may find out the timer queue the work again

/* NCI command timer function */
static void nci_cmd_timer(struct timer_list *t)
{
    // ...
	queue_work(ndev->cmd_wq, &ndev->cmd_work);
}

But I found that this is okay because there is no packets in ndev->cmd_q buffers hence 
even there is a queued nci_cmd_work(), it simply checks the queue and returns.

That is, the old race picture as below

> Thread-1                           Thread-2
>                                  | nci_dev_up()
>                                  |   nci_open_device()
>                                  |     __nci_request(nci_reset_req)
>                                  |       nci_send_cmd
>                                  |         queue_work(cmd_work)
> nci_unregister_device()          |
>   nci_close_device()             | ...
>     del_timer_sync(cmd_timer)[1] |
> ...                              | Worker
> nci_free_device()                | nci_cmd_work()
>   kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]

is impossible now because the patched flush_workqueue() make the race like below

> Thread-1                           Thread-2
>                                  | nci_dev_up()
>                                  |   nci_open_device()
>                                  |     __nci_request(nci_reset_req)
>                                  |       nci_send_cmd
>                                  |         queue_work(cmd_work)
> nci_unregister_device()          |
>   nci_close_device()             | ...
>     flush_workqueue()[patch]     | Worker
>                                  | nci_cmd_work()
>                                  |   mod_timer(cmd_timer)[2]
>     // work over then return
>     del_timer_sync(cmd_timer)[1] |
>                                  | Timer
>                                  | nci_cmd_timer()
>                                  | 
>     // timer over then return    |
> ...                              |
> nci_free_device()                | 
>   kfree(ndev)[3]                 | 


With above thinkings and the given fact that my POC didn't raise the UAF, I think the 
flush_workqueue() + del_timer_sync() combination is okay to hinder this race.

Tell me if there is anything wrong.

Regards
Lin Ma
Guenter Roeck April 23, 2022, 1:52 p.m. UTC | #5
On Mon, Apr 18, 2022 at 09:59:10PM +0800, Lin Ma wrote:
> Hello Guenter,
> 
> > I have been wondering about this and the same code further below.
> > What prevents the command timer from firing after the call to
> > flush_workqueue() ?
> > 
> > Thanks,
> > Guenter
> > 
> 
> From my understanding, once the flush_workqueue() is executed, the work that queued in
> ndev->cmd_wq will be taken the care of.
> 
> That is, once the flush_workqueue() is finished, it promises there is no executing or 
> pending nci_cmd_work() ever.
> 
> static void nci_cmd_work(struct work_struct *work)
> {
>     // ...
> 		mod_timer(&ndev->cmd_timer,
> 			  jiffies + msecs_to_jiffies(NCI_CMD_TIMEOUT));
>     // ...
> }
> 
> The command timer is still able be fired because the mod_timer() here. That is why the
> del_timer_sync() is necessary after the flush_workqueue().
> 
> One very puzzling part is that you may find out the timer queue the work again
> 
> /* NCI command timer function */
> static void nci_cmd_timer(struct timer_list *t)
> {
>     // ...
> 	queue_work(ndev->cmd_wq, &ndev->cmd_work);
> }
> 
> But I found that this is okay because there is no packets in ndev->cmd_q buffers hence 
> even there is a queued nci_cmd_work(), it simply checks the queue and returns.
> 
> That is, the old race picture as below
> 
> > Thread-1                           Thread-2
> >                                  | nci_dev_up()
> >                                  |   nci_open_device()
> >                                  |     __nci_request(nci_reset_req)
> >                                  |       nci_send_cmd
> >                                  |         queue_work(cmd_work)
> > nci_unregister_device()          |
> >   nci_close_device()             | ...
> >     del_timer_sync(cmd_timer)[1] |
> > ...                              | Worker
> > nci_free_device()                | nci_cmd_work()
> >   kfree(ndev)[3]                 |   mod_timer(cmd_timer)[2]
> 
> is impossible now because the patched flush_workqueue() make the race like below
> 
> > Thread-1                           Thread-2
> >                                  | nci_dev_up()
> >                                  |   nci_open_device()
> >                                  |     __nci_request(nci_reset_req)
> >                                  |       nci_send_cmd
> >                                  |         queue_work(cmd_work)
> > nci_unregister_device()          |
> >   nci_close_device()             | ...
> >     flush_workqueue()[patch]     | Worker
> >                                  | nci_cmd_work()
> >                                  |   mod_timer(cmd_timer)[2]
> >     // work over then return
> >     del_timer_sync(cmd_timer)[1] |
> >                                  | Timer
> >                                  | nci_cmd_timer()
> >                                  | 
> >     // timer over then return    |
> > ...                              |
> > nci_free_device()                | 
> >   kfree(ndev)[3]                 | 
> 
> 
> With above thinkings and the given fact that my POC didn't raise the UAF, I think the 
> flush_workqueue() + del_timer_sync() combination is okay to hinder this race.
> 
> Tell me if there is anything wrong.
> 

Thanks a lot for the detailed explanation and analysis.
I agree with your conclusion.

Guenter
diff mbox series

Patch

diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c
index d2537383a3e8..0d7763c322b5 100644
--- a/net/nfc/nci/core.c
+++ b/net/nfc/nci/core.c
@@ -560,6 +560,10 @@  static int nci_close_device(struct nci_dev *ndev)
 	mutex_lock(&ndev->req_lock);
 
 	if (!test_and_clear_bit(NCI_UP, &ndev->flags)) {
+		/* Need to flush the cmd wq in case
+		 * there is a queued/running cmd_work
+		 */
+		flush_workqueue(ndev->cmd_wq);
 		del_timer_sync(&ndev->cmd_timer);
 		del_timer_sync(&ndev->data_timer);
 		mutex_unlock(&ndev->req_lock);