Message ID | 20220412160430.11581-1-linma@zju.edu.cn (mailing list archive) |
---|---|
State | Accepted |
Commit | ef27324e2cb7bb24542d6cb2571740eefe6b00dc |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v0] nfc: nci: add flush_workqueue to prevent uaf | expand |
On 12/04/2022 18:04, Lin Ma wrote: > Our detector found a concurrent use-after-free bug when detaching an > NCI device. The main reason for this bug is the unexpected scheduling > between the used delayed mechanism (timer and workqueue). > > The race can be demonstrated below: > Thanks! Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Best regards, Krzysztof
Hello: This patch was applied to netdev/net.git (master) by David S. Miller <davem@davemloft.net>: On Wed, 13 Apr 2022 00:04:30 +0800 you wrote: > Our detector found a concurrent use-after-free bug when detaching an > NCI device. The main reason for this bug is the unexpected scheduling > between the used delayed mechanism (timer and workqueue). > > The race can be demonstrated below: > > Thread-1 Thread-2 > | nci_dev_up() > | nci_open_device() > | __nci_request(nci_reset_req) > | nci_send_cmd > | queue_work(cmd_work) > nci_unregister_device() | > nci_close_device() | ... > del_timer_sync(cmd_timer)[1] | > ... | Worker > nci_free_device() | nci_cmd_work() > kfree(ndev)[3] | mod_timer(cmd_timer)[2] > > [...] Here is the summary with links: - [v0] nfc: nci: add flush_workqueue to prevent uaf https://git.kernel.org/netdev/net/c/ef27324e2cb7 You are awesome, thank you!
On Wed, Apr 13, 2022 at 12:04:30AM +0800, Lin Ma wrote: > Our detector found a concurrent use-after-free bug when detaching an > NCI device. The main reason for this bug is the unexpected scheduling > between the used delayed mechanism (timer and workqueue). > > The race can be demonstrated below: > > Thread-1 Thread-2 > | nci_dev_up() > | nci_open_device() > | __nci_request(nci_reset_req) > | nci_send_cmd > | queue_work(cmd_work) > nci_unregister_device() | > nci_close_device() | ... > del_timer_sync(cmd_timer)[1] | > ... | Worker > nci_free_device() | nci_cmd_work() > kfree(ndev)[3] | mod_timer(cmd_timer)[2] > > In short, the cleanup routine thought that the cmd_timer has already > been detached by [1] but the mod_timer can re-attach the timer [2], even > it is already released [3], resulting in UAF. > > This UAF is easy to trigger, crash trace by POC is like below > > [ 66.703713] ================================================================== > [ 66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490 > [ 66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33 > [ 66.703974] > [ 66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5 > [ 66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work > [ 66.703974] Call Trace: > [ 66.703974] <TASK> > [ 66.703974] dump_stack_lvl+0x57/0x7d > [ 66.703974] print_report.cold+0x5e/0x5db > [ 66.703974] ? enqueue_timer+0x448/0x490 > [ 66.703974] kasan_report+0xbe/0x1c0 > [ 66.703974] ? enqueue_timer+0x448/0x490 > [ 66.703974] enqueue_timer+0x448/0x490 > [ 66.703974] __mod_timer+0x5e6/0xb80 > [ 66.703974] ? mark_held_locks+0x9e/0xe0 > [ 66.703974] ? try_to_del_timer_sync+0xf0/0xf0 > [ 66.703974] ? lockdep_hardirqs_on_prepare+0x17b/0x410 > [ 66.703974] ? queue_work_on+0x61/0x80 > [ 66.703974] ? lockdep_hardirqs_on+0xbf/0x130 > [ 66.703974] process_one_work+0x8bb/0x1510 > [ 66.703974] ? lockdep_hardirqs_on_prepare+0x410/0x410 > [ 66.703974] ? pwq_dec_nr_in_flight+0x230/0x230 > [ 66.703974] ? rwlock_bug.part.0+0x90/0x90 > [ 66.703974] ? _raw_spin_lock_irq+0x41/0x50 > [ 66.703974] worker_thread+0x575/0x1190 > [ 66.703974] ? process_one_work+0x1510/0x1510 > [ 66.703974] kthread+0x2a0/0x340 > [ 66.703974] ? kthread_complete_and_exit+0x20/0x20 > [ 66.703974] ret_from_fork+0x22/0x30 > [ 66.703974] </TASK> > [ 66.703974] > [ 66.703974] Allocated by task 267: > [ 66.703974] kasan_save_stack+0x1e/0x40 > [ 66.703974] __kasan_kmalloc+0x81/0xa0 > [ 66.703974] nci_allocate_device+0xd3/0x390 > [ 66.703974] nfcmrvl_nci_register_dev+0x183/0x2c0 > [ 66.703974] nfcmrvl_nci_uart_open+0xf2/0x1dd > [ 66.703974] nci_uart_tty_ioctl+0x2c3/0x4a0 > [ 66.703974] tty_ioctl+0x764/0x1310 > [ 66.703974] __x64_sys_ioctl+0x122/0x190 > [ 66.703974] do_syscall_64+0x3b/0x90 > [ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae > [ 66.703974] > [ 66.703974] Freed by task 406: > [ 66.703974] kasan_save_stack+0x1e/0x40 > [ 66.703974] kasan_set_track+0x21/0x30 > [ 66.703974] kasan_set_free_info+0x20/0x30 > [ 66.703974] __kasan_slab_free+0x108/0x170 > [ 66.703974] kfree+0xb0/0x330 > [ 66.703974] nfcmrvl_nci_unregister_dev+0x90/0xd0 > [ 66.703974] nci_uart_tty_close+0xdf/0x180 > [ 66.703974] tty_ldisc_kill+0x73/0x110 > [ 66.703974] tty_ldisc_hangup+0x281/0x5b0 > [ 66.703974] __tty_hangup.part.0+0x431/0x890 > [ 66.703974] tty_release+0x3a8/0xc80 > [ 66.703974] __fput+0x1f0/0x8c0 > [ 66.703974] task_work_run+0xc9/0x170 > [ 66.703974] exit_to_user_mode_prepare+0x194/0x1a0 > [ 66.703974] syscall_exit_to_user_mode+0x19/0x50 > [ 66.703974] do_syscall_64+0x48/0x90 > [ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae > > To fix the UAF, this patch adds flush_workqueue() to ensure the > nci_cmd_work is finished before the following del_timer_sync. > This combination will promise the timer is actually detached. > > Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") > Signed-off-by: Lin Ma <linma@zju.edu.cn> > Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> > --- > net/nfc/nci/core.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c > index d2537383a3e8..0d7763c322b5 100644 > --- a/net/nfc/nci/core.c > +++ b/net/nfc/nci/core.c > @@ -560,6 +560,10 @@ static int nci_close_device(struct nci_dev *ndev) > mutex_lock(&ndev->req_lock); > > if (!test_and_clear_bit(NCI_UP, &ndev->flags)) { > + /* Need to flush the cmd wq in case > + * there is a queued/running cmd_work > + */ > + flush_workqueue(ndev->cmd_wq); > del_timer_sync(&ndev->cmd_timer); I have been wondering about this and the same code further below. What prevents the command timer from firing after the call to flush_workqueue() ? Thanks, Guenter > del_timer_sync(&ndev->data_timer); > mutex_unlock(&ndev->req_lock);
Hello Guenter, > I have been wondering about this and the same code further below. > What prevents the command timer from firing after the call to > flush_workqueue() ? > > Thanks, > Guenter > From my understanding, once the flush_workqueue() is executed, the work that queued in ndev->cmd_wq will be taken the care of. That is, once the flush_workqueue() is finished, it promises there is no executing or pending nci_cmd_work() ever. static void nci_cmd_work(struct work_struct *work) { // ... mod_timer(&ndev->cmd_timer, jiffies + msecs_to_jiffies(NCI_CMD_TIMEOUT)); // ... } The command timer is still able be fired because the mod_timer() here. That is why the del_timer_sync() is necessary after the flush_workqueue(). One very puzzling part is that you may find out the timer queue the work again /* NCI command timer function */ static void nci_cmd_timer(struct timer_list *t) { // ... queue_work(ndev->cmd_wq, &ndev->cmd_work); } But I found that this is okay because there is no packets in ndev->cmd_q buffers hence even there is a queued nci_cmd_work(), it simply checks the queue and returns. That is, the old race picture as below > Thread-1 Thread-2 > | nci_dev_up() > | nci_open_device() > | __nci_request(nci_reset_req) > | nci_send_cmd > | queue_work(cmd_work) > nci_unregister_device() | > nci_close_device() | ... > del_timer_sync(cmd_timer)[1] | > ... | Worker > nci_free_device() | nci_cmd_work() > kfree(ndev)[3] | mod_timer(cmd_timer)[2] is impossible now because the patched flush_workqueue() make the race like below > Thread-1 Thread-2 > | nci_dev_up() > | nci_open_device() > | __nci_request(nci_reset_req) > | nci_send_cmd > | queue_work(cmd_work) > nci_unregister_device() | > nci_close_device() | ... > flush_workqueue()[patch] | Worker > | nci_cmd_work() > | mod_timer(cmd_timer)[2] > // work over then return > del_timer_sync(cmd_timer)[1] | > | Timer > | nci_cmd_timer() > | > // timer over then return | > ... | > nci_free_device() | > kfree(ndev)[3] | With above thinkings and the given fact that my POC didn't raise the UAF, I think the flush_workqueue() + del_timer_sync() combination is okay to hinder this race. Tell me if there is anything wrong. Regards Lin Ma
On Mon, Apr 18, 2022 at 09:59:10PM +0800, Lin Ma wrote: > Hello Guenter, > > > I have been wondering about this and the same code further below. > > What prevents the command timer from firing after the call to > > flush_workqueue() ? > > > > Thanks, > > Guenter > > > > From my understanding, once the flush_workqueue() is executed, the work that queued in > ndev->cmd_wq will be taken the care of. > > That is, once the flush_workqueue() is finished, it promises there is no executing or > pending nci_cmd_work() ever. > > static void nci_cmd_work(struct work_struct *work) > { > // ... > mod_timer(&ndev->cmd_timer, > jiffies + msecs_to_jiffies(NCI_CMD_TIMEOUT)); > // ... > } > > The command timer is still able be fired because the mod_timer() here. That is why the > del_timer_sync() is necessary after the flush_workqueue(). > > One very puzzling part is that you may find out the timer queue the work again > > /* NCI command timer function */ > static void nci_cmd_timer(struct timer_list *t) > { > // ... > queue_work(ndev->cmd_wq, &ndev->cmd_work); > } > > But I found that this is okay because there is no packets in ndev->cmd_q buffers hence > even there is a queued nci_cmd_work(), it simply checks the queue and returns. > > That is, the old race picture as below > > > Thread-1 Thread-2 > > | nci_dev_up() > > | nci_open_device() > > | __nci_request(nci_reset_req) > > | nci_send_cmd > > | queue_work(cmd_work) > > nci_unregister_device() | > > nci_close_device() | ... > > del_timer_sync(cmd_timer)[1] | > > ... | Worker > > nci_free_device() | nci_cmd_work() > > kfree(ndev)[3] | mod_timer(cmd_timer)[2] > > is impossible now because the patched flush_workqueue() make the race like below > > > Thread-1 Thread-2 > > | nci_dev_up() > > | nci_open_device() > > | __nci_request(nci_reset_req) > > | nci_send_cmd > > | queue_work(cmd_work) > > nci_unregister_device() | > > nci_close_device() | ... > > flush_workqueue()[patch] | Worker > > | nci_cmd_work() > > | mod_timer(cmd_timer)[2] > > // work over then return > > del_timer_sync(cmd_timer)[1] | > > | Timer > > | nci_cmd_timer() > > | > > // timer over then return | > > ... | > > nci_free_device() | > > kfree(ndev)[3] | > > > With above thinkings and the given fact that my POC didn't raise the UAF, I think the > flush_workqueue() + del_timer_sync() combination is okay to hinder this race. > > Tell me if there is anything wrong. > Thanks a lot for the detailed explanation and analysis. I agree with your conclusion. Guenter
diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c index d2537383a3e8..0d7763c322b5 100644 --- a/net/nfc/nci/core.c +++ b/net/nfc/nci/core.c @@ -560,6 +560,10 @@ static int nci_close_device(struct nci_dev *ndev) mutex_lock(&ndev->req_lock); if (!test_and_clear_bit(NCI_UP, &ndev->flags)) { + /* Need to flush the cmd wq in case + * there is a queued/running cmd_work + */ + flush_workqueue(ndev->cmd_wq); del_timer_sync(&ndev->cmd_timer); del_timer_sync(&ndev->data_timer); mutex_unlock(&ndev->req_lock);
Our detector found a concurrent use-after-free bug when detaching an NCI device. The main reason for this bug is the unexpected scheduling between the used delayed mechanism (timer and workqueue). The race can be demonstrated below: Thread-1 Thread-2 | nci_dev_up() | nci_open_device() | __nci_request(nci_reset_req) | nci_send_cmd | queue_work(cmd_work) nci_unregister_device() | nci_close_device() | ... del_timer_sync(cmd_timer)[1] | ... | Worker nci_free_device() | nci_cmd_work() kfree(ndev)[3] | mod_timer(cmd_timer)[2] In short, the cleanup routine thought that the cmd_timer has already been detached by [1] but the mod_timer can re-attach the timer [2], even it is already released [3], resulting in UAF. This UAF is easy to trigger, crash trace by POC is like below [ 66.703713] ================================================================== [ 66.703974] BUG: KASAN: use-after-free in enqueue_timer+0x448/0x490 [ 66.703974] Write of size 8 at addr ffff888009fb7058 by task kworker/u4:1/33 [ 66.703974] [ 66.703974] CPU: 1 PID: 33 Comm: kworker/u4:1 Not tainted 5.18.0-rc2 #5 [ 66.703974] Workqueue: nfc2_nci_cmd_wq nci_cmd_work [ 66.703974] Call Trace: [ 66.703974] <TASK> [ 66.703974] dump_stack_lvl+0x57/0x7d [ 66.703974] print_report.cold+0x5e/0x5db [ 66.703974] ? enqueue_timer+0x448/0x490 [ 66.703974] kasan_report+0xbe/0x1c0 [ 66.703974] ? enqueue_timer+0x448/0x490 [ 66.703974] enqueue_timer+0x448/0x490 [ 66.703974] __mod_timer+0x5e6/0xb80 [ 66.703974] ? mark_held_locks+0x9e/0xe0 [ 66.703974] ? try_to_del_timer_sync+0xf0/0xf0 [ 66.703974] ? lockdep_hardirqs_on_prepare+0x17b/0x410 [ 66.703974] ? queue_work_on+0x61/0x80 [ 66.703974] ? lockdep_hardirqs_on+0xbf/0x130 [ 66.703974] process_one_work+0x8bb/0x1510 [ 66.703974] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 66.703974] ? pwq_dec_nr_in_flight+0x230/0x230 [ 66.703974] ? rwlock_bug.part.0+0x90/0x90 [ 66.703974] ? _raw_spin_lock_irq+0x41/0x50 [ 66.703974] worker_thread+0x575/0x1190 [ 66.703974] ? process_one_work+0x1510/0x1510 [ 66.703974] kthread+0x2a0/0x340 [ 66.703974] ? kthread_complete_and_exit+0x20/0x20 [ 66.703974] ret_from_fork+0x22/0x30 [ 66.703974] </TASK> [ 66.703974] [ 66.703974] Allocated by task 267: [ 66.703974] kasan_save_stack+0x1e/0x40 [ 66.703974] __kasan_kmalloc+0x81/0xa0 [ 66.703974] nci_allocate_device+0xd3/0x390 [ 66.703974] nfcmrvl_nci_register_dev+0x183/0x2c0 [ 66.703974] nfcmrvl_nci_uart_open+0xf2/0x1dd [ 66.703974] nci_uart_tty_ioctl+0x2c3/0x4a0 [ 66.703974] tty_ioctl+0x764/0x1310 [ 66.703974] __x64_sys_ioctl+0x122/0x190 [ 66.703974] do_syscall_64+0x3b/0x90 [ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 66.703974] [ 66.703974] Freed by task 406: [ 66.703974] kasan_save_stack+0x1e/0x40 [ 66.703974] kasan_set_track+0x21/0x30 [ 66.703974] kasan_set_free_info+0x20/0x30 [ 66.703974] __kasan_slab_free+0x108/0x170 [ 66.703974] kfree+0xb0/0x330 [ 66.703974] nfcmrvl_nci_unregister_dev+0x90/0xd0 [ 66.703974] nci_uart_tty_close+0xdf/0x180 [ 66.703974] tty_ldisc_kill+0x73/0x110 [ 66.703974] tty_ldisc_hangup+0x281/0x5b0 [ 66.703974] __tty_hangup.part.0+0x431/0x890 [ 66.703974] tty_release+0x3a8/0xc80 [ 66.703974] __fput+0x1f0/0x8c0 [ 66.703974] task_work_run+0xc9/0x170 [ 66.703974] exit_to_user_mode_prepare+0x194/0x1a0 [ 66.703974] syscall_exit_to_user_mode+0x19/0x50 [ 66.703974] do_syscall_64+0x48/0x90 [ 66.703974] entry_SYSCALL_64_after_hwframe+0x44/0xae To fix the UAF, this patch adds flush_workqueue() to ensure the nci_cmd_work is finished before the following del_timer_sync. This combination will promise the timer is actually detached. Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Signed-off-by: Lin Ma <linma@zju.edu.cn> --- net/nfc/nci/core.c | 4 ++++ 1 file changed, 4 insertions(+)