Message ID | 20230113164849.4004848-1-edumazet@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 3a415d59c1dbec9d772dbfab2d2520d98360caae |
Delegated to: | Netdev Maintainers |
Series | [net] net/sched: sch_taprio: fix possible use-after-free |
Hi,

Eric Dumazet <edumazet@google.com> writes:

> syzbot reported a nasty crash [1] in net_tx_action() which
> made little sense until we got a repro.
>
> This repro installs a taprio qdisc, but providing an
> invalid TCA_RATE attribute.
>
> qdisc_create() has to destroy the just initialized
> taprio qdisc, and taprio_destroy() is called.
>
> However, the hrtimer used by taprio had already fired,
> therefore advance_sched() called __netif_schedule().
>
> Then net_tx_action was trying to use a destroyed qdisc.
>
> We can not undo the __netif_schedule(), so we must wait
> until one cpu serviced the qdisc before we can proceed.
>
> Many thanks to Alexander Potapenko for his help.
>
> [...]
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -1700,6 +1700,8 @@ static void taprio_reset(struct Qdisc *sch)
>  	int i;
>
>  	hrtimer_cancel(&q->advance_timer);
> +	qdisc_synchronize(sch);
> +

From the commit message, I got the impression that only the one
qdisc_synchronize() in taprio_destroy() would be needed.

>  	if (q->qdiscs) {
>  		for (i = 0; i < dev->num_tx_queues; i++)
>  			if (q->qdiscs[i])
> @@ -1720,6 +1722,7 @@ static void taprio_destroy(struct Qdisc *sch)
>  	 * happens in qdisc_create(), after taprio_init() has been called.
>  	 */
>  	hrtimer_cancel(&q->advance_timer);
> +	qdisc_synchronize(sch);
>
>  	taprio_disable_offload(dev, q, NULL);
>
> --
> 2.39.0.314.g84b9a713c41-goog

Cheers,
On Fri, Jan 13, 2023 at 04:48:49PM +0000, Eric Dumazet wrote:
> syzbot reported a nasty crash [1] in net_tx_action() which
> made little sense until we got a repro.
>
> This repro installs a taprio qdisc, but providing an
> invalid TCA_RATE attribute.
>
> qdisc_create() has to destroy the just initialized
> taprio qdisc, and taprio_destroy() is called.
>
> However, the hrtimer used by taprio had already fired,
> therefore advance_sched() called __netif_schedule().
>
> Then net_tx_action was trying to use a destroyed qdisc.
>
> We can not undo the __netif_schedule(), so we must wait
> until one cpu serviced the qdisc before we can proceed.
>

This workaround looks a bit ugly. I think we _may_ be able to make
hrtimer_start() as the last step of the initialization, IOW, move other
validations and allocations before it.

Can you share your reproducer?

Thanks,
On 2023/1/16 8:35, Cong Wang wrote:
> On Fri, Jan 13, 2023 at 04:48:49PM +0000, Eric Dumazet wrote:
>> syzbot reported a nasty crash [1] in net_tx_action() which
>> made little sense until we got a repro.
>> [...]
>
> This workaround looks a bit ugly. I think we _may_ be able to make
> hrtimer_start() as the last step of the initialization, IOW, move other
> validations and allocations before it.
>
> Can you share your reproducer?
>
> Thanks,

Maybe the issue is the same as
https://syzkaller.appspot.com/bug?id=1ccb246eecb5114c440218336e4c7205aed5f2c8
On Mon, Jan 16, 2023 at 3:07 AM shaozhengchao <shaozhengchao@huawei.com> wrote:
>
> On 2023/1/16 8:35, Cong Wang wrote:
> > On Fri, Jan 13, 2023 at 04:48:49PM +0000, Eric Dumazet wrote:
> >> [...]
> >
> > This workaround looks a bit ugly. I think we _may_ be able to make
> > hrtimer_start() as the last step of the initialization, IOW, move other
> > validations and allocations before it.
> >
> > Can you share your reproducer?
> >
> > Thanks,
> Maybe the issue is the same as
> https://syzkaller.appspot.com/bug?id=1ccb246eecb5114c440218336e4c7205aed5f2c8

Most certainly, yes.
I also think there were stall reports with the same stack trace where
qdisc_run was unable to take a freed lock because its value was set to
1 by another task.
On Mon, Jan 16, 2023 at 1:35 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Fri, Jan 13, 2023 at 04:48:49PM +0000, Eric Dumazet wrote:
> > [...]
>
> This workaround looks a bit ugly. I think we _may_ be able to make
> hrtimer_start() as the last step of the initialization, IOW, move other
> validations and allocations before it.
>

taprio_init() detects no error. So moving around the hrtimer_start()
inside it won't help.

The error comes later, from the invalid TCA_RATE attribute, in qdisc_create():

static struct Qdisc *qdisc_create(...
...
	err = gen_new_estimator(...);
	if (err) {
		NL_SET_ERR_MSG(extack, "Failed to generate new estimator");
		goto err_out4;
	}
...
err_out4:
	qdisc_put_stab(rtnl_dereference(sch->stab));
	if (ops->destroy)
		ops->destroy(sch);
	goto err_out3;

This is why we need to make sure ->destroy will fully undo what ->init
did, including the possible fact that the hrtimer already fired.

This seems to be taprio specific.

Or we would need a new method, like ->post_init(), that should be
called once all steps have succeeded.

Or call hrtimer_start() at the first taprio_enqueue(), adding a
conditional in the fast path...

> Can you share your reproducer?

Not publicly. Although I think the bug is clear enough.
On Sat, Jan 14, 2023 at 12:41 AM Vinicius Costa Gomes <vinicius.gomes@intel.com> wrote:
>
> Hi,
>
> From the commit message, I got the impression that only the one
> qdisc_synchronize() in taprio_destroy() would be needed.
>

This could be, but then why have hrtimer_cancel(&q->advance_timer);
in taprio_reset(), since it is already in taprio_destroy()?
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 13 Jan 2023 16:48:49 +0000 you wrote:
> syzbot reported a nasty crash [1] in net_tx_action() which
> made little sense until we got a repro.
>
> This repro installs a taprio qdisc, but providing an
> invalid TCA_RATE attribute.
>
> qdisc_create() has to destroy the just initialized
> taprio qdisc, and taprio_destroy() is called.
>
> [...]

Here is the summary with links:
  - [net] net/sched: sch_taprio: fix possible use-after-free
    https://git.kernel.org/netdev/net/c/3a415d59c1db

You are awesome, thank you!
On Sat, Jan 14, 2023 at 12:41 AM Vinicius Costa Gomes <vinicius.gomes@intel.com> wrote:
>
> Hi,
>
> From the commit message, I got the impression that only the one
> qdisc_synchronize() in taprio_destroy() would be needed.
>

Hmm, I think you are right, qdisc_reset() is probably called while
qdisc lock is held, with BH disabled.

So calling msleep() from qdisc_reset() is a no go.

I will send a patch removing the change in taprio_reset(), thanks.
```diff
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d5517719af4ef22282f0a15b132f8e8a07ae4179..af4aa66aaa4eba8f2eacdd00bc8fef31165c6a90 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1288,4 +1288,11 @@ void mq_change_real_num_tx(struct Qdisc *sch, unsigned int new_real_tx);
 
 int sch_frag_xmit_hook(struct sk_buff *skb, int (*xmit)(struct sk_buff *skb));
 
+/* Make sure qdisc is no longer in SCHED state. */
+static inline void qdisc_synchronize(const struct Qdisc *q)
+{
+	while (test_bit(__QDISC_STATE_SCHED, &q->state))
+		msleep(1);
+}
+
 #endif
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 570389f6cdd7dbab5749dc06d886555305cbf623..9a11a499ea2df8d18c9c062496fdcbcf5a861391 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1700,6 +1700,8 @@ static void taprio_reset(struct Qdisc *sch)
 	int i;
 
 	hrtimer_cancel(&q->advance_timer);
+	qdisc_synchronize(sch);
+
 	if (q->qdiscs) {
 		for (i = 0; i < dev->num_tx_queues; i++)
 			if (q->qdiscs[i])
@@ -1720,6 +1722,7 @@ static void taprio_destroy(struct Qdisc *sch)
 	 * happens in qdisc_create(), after taprio_init() has been called.
 	 */
 	hrtimer_cancel(&q->advance_timer);
+	qdisc_synchronize(sch);
 
 	taprio_disable_offload(dev, q, NULL);
 
```
```
syzbot reported a nasty crash [1] in net_tx_action() which
made little sense until we got a repro.

This repro installs a taprio qdisc, but providing an
invalid TCA_RATE attribute.

qdisc_create() has to destroy the just initialized
taprio qdisc, and taprio_destroy() is called.

However, the hrtimer used by taprio had already fired,
therefore advance_sched() called __netif_schedule().

Then net_tx_action was trying to use a destroyed qdisc.

We can not undo the __netif_schedule(), so we must wait
until one cpu serviced the qdisc before we can proceed.

Many thanks to Alexander Potapenko for his help.

[1]
BUG: KMSAN: uninit-value in queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
BUG: KMSAN: uninit-value in do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
BUG: KMSAN: uninit-value in __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
BUG: KMSAN: uninit-value in _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
 queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
 do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
 __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
 _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
 spin_trylock include/linux/spinlock.h:359 [inline]
 qdisc_run_begin include/net/sch_generic.h:187 [inline]
 qdisc_run+0xee/0x540 include/net/pkt_sched.h:125
 net_tx_action+0x77c/0x9a0 net/core/dev.c:5086
 __do_softirq+0x1cc/0x7fb kernel/softirq.c:571
 run_ksoftirqd+0x2c/0x50 kernel/softirq.c:934
 smpboot_thread_fn+0x554/0x9f0 kernel/smpboot.c:164
 kthread+0x31b/0x430 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30

Uninit was created at:
 slab_post_alloc_hook mm/slab.h:732 [inline]
 slab_alloc_node mm/slub.c:3258 [inline]
 __kmalloc_node_track_caller+0x814/0x1250 mm/slub.c:4970
 kmalloc_reserve net/core/skbuff.c:358 [inline]
 __alloc_skb+0x346/0xcf0 net/core/skbuff.c:430
 alloc_skb include/linux/skbuff.h:1257 [inline]
 nlmsg_new include/net/netlink.h:953 [inline]
 netlink_ack+0x5f3/0x12b0 net/netlink/af_netlink.c:2436
 netlink_rcv_skb+0x55d/0x6c0 net/netlink/af_netlink.c:2507
 rtnetlink_rcv+0x30/0x40 net/core/rtnetlink.c:6108
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0xf3b/0x1270 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x1288/0x1440 net/netlink/af_netlink.c:1921
 sock_sendmsg_nosec net/socket.c:714 [inline]
 sock_sendmsg net/socket.c:734 [inline]
 ____sys_sendmsg+0xabc/0xe90 net/socket.c:2482
 ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2536
 __sys_sendmsg net/socket.c:2565 [inline]
 __do_sys_sendmsg net/socket.c:2574 [inline]
 __se_sys_sendmsg net/socket.c:2572 [inline]
 __x64_sys_sendmsg+0x367/0x540 net/socket.c:2572
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 6.0.0-rc2-syzkaller-47461-gac3859c02d7f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022

Fixes: 5a781ccbd19e ("tc: Add support for configuring the taprio scheduler")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
 include/net/sch_generic.h | 7 +++++++
 net/sched/sch_taprio.c    | 3 +++
 2 files changed, 10 insertions(+)
```