Message ID | 20220220134202.2187485-1-marmarek@invisiblethingslab.com (mailing list archive) |
---|---|
State | Superseded |
Series | xen/netfront: destroy queues before real_num_tx_queues is zeroed |
On 20.02.22 14:42, Marek Marczykowski-Górecki wrote:
> xennet_destroy_queues() relies on info->netdev->real_num_tx_queues to
> delete queues. Since d7dac083414eb5bb99a6d2ed53dc2c1b405224e5
> ("net-sysfs: update the queue counts in the unregistration path"),
> unregister_netdev() indirectly sets real_num_tx_queues to 0. Those two
> facts together mean that xennet_destroy_queues() called from
> xennet_remove() cannot do its job, because it's called after
> unregister_netdev(). This results in kfree-ing queues that are still
> linked in napi, which ultimately crashes:
>
>     BUG: kernel NULL pointer dereference, address: 0000000000000000
>     #PF: supervisor read access in kernel mode
>     #PF: error_code(0x0000) - not-present page
>     PGD 0 P4D 0
>     Oops: 0000 [#1] PREEMPT SMP PTI
>     CPU: 1 PID: 52 Comm: xenwatch Tainted: G        W  5.16.10-1.32.fc32.qubes.x86_64+ #226
>     RIP: 0010:free_netdev+0xa3/0x1a0
>     Code: ff 48 89 df e8 2e e9 00 00 48 8b 43 50 48 8b 08 48 8d b8 a0 fe ff ff 48 8d a9 a0 fe ff ff 49 39 c4 75 26 eb 47 e8 ed c1 66 ff <48> 8b 85 60 01 00 00 48 8d 95 60 01 00 00 48 89 ef 48 2d 60 01 00
>     RSP: 0000:ffffc90000bcfd00 EFLAGS: 00010286
>     RAX: 0000000000000000 RBX: ffff88800edad000 RCX: 0000000000000000
>     RDX: 0000000000000001 RSI: ffffc90000bcfc30 RDI: 00000000ffffffff
>     RBP: fffffffffffffea0 R08: 0000000000000000 R09: 0000000000000000
>     R10: 0000000000000000 R11: 0000000000000001 R12: ffff88800edad050
>     R13: ffff8880065f8f88 R14: 0000000000000000 R15: ffff8880066c6680
>     FS:  0000000000000000(0000) GS:ffff8880f3300000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 0000000000000000 CR3: 00000000e998c006 CR4: 00000000003706e0
>     Call Trace:
>      <TASK>
>      xennet_remove+0x13d/0x300 [xen_netfront]
>      xenbus_dev_remove+0x6d/0xf0
>      __device_release_driver+0x17a/0x240
>      device_release_driver+0x24/0x30
>      bus_remove_device+0xd8/0x140
>      device_del+0x18b/0x410
>      ? _raw_spin_unlock+0x16/0x30
>      ? klist_iter_exit+0x14/0x20
>      ? xenbus_dev_request_and_reply+0x80/0x80
>      device_unregister+0x13/0x60
>      xenbus_dev_changed+0x18e/0x1f0
>      xenwatch_thread+0xc0/0x1a0
>      ? do_wait_intr_irq+0xa0/0xa0
>      kthread+0x16b/0x190
>      ? set_kthread_struct+0x40/0x40
>      ret_from_fork+0x22/0x30
>      </TASK>
>
> Fix this by calling xennet_destroy_queues() from xennet_close() too,
> when real_num_tx_queues is still available. This ensures that queues are
> destroyed when real_num_tx_queues is set to 0, regardless of how
> unregister_netdev() was called.
>
> Originally reported at
> https://github.com/QubesOS/qubes-issues/issues/7257
>
> Fixes: d7dac083414eb5bb9 ("net-sysfs: update the queue counts in the unregistration path")
> Cc: stable@vger.kernel.org # 5.16+
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>
> ---
> While this fixes the issue, I'm not sure if that is the correct thing
> to do. xennet_remove() calls xennet_destroy_queues() under rtnl_lock,
> which may be important here? Just moving xennet_destroy_queues() before

I checked some of the call paths leading to xennet_close(), and all of
those contained an ASSERT_RTNL(), so it seems the rtnl_lock is already
taken here. Could you test with adding an ASSERT_RTNL() in
xennet_destroy_queues()?

> unregister_netdev() in xennet_remove() did not help - it crashed in
> another way (use-after-free in xennet_close()).

Yes, this would need to basically do the xennet_close() handling in
xennet_destroy() instead, which I believe is not really an option.

In case your test with the added ASSERT_RTNL() doesn't show any
problem you can add my:

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen
On Mon, Feb 21, 2022 at 07:27:32AM +0100, Juergen Gross wrote:
> I checked some of the call paths leading to xennet_close(), and all of
> those contained an ASSERT_RTNL(), so it seems the rtnl_lock is already
> taken here. Could you test with adding an ASSERT_RTNL() in
> xennet_destroy_queues()?

Tried that and no issues spotted.

> In case your test with the added ASSERT_RTNL() doesn't show any
> problem you can add my:
>
> Reviewed-by: Juergen Gross <jgross@suse.com>

Thanks.
On Mon, 21 Feb 2022 07:27:32 +0100 Juergen Gross wrote:
> On 20.02.22 14:42, Marek Marczykowski-Górecki wrote:
> > xennet_destroy_queues() relies on info->netdev->real_num_tx_queues to
> > delete queues. Since d7dac083414eb5bb99a6d2ed53dc2c1b405224e5
> > ("net-sysfs: update the queue counts in the unregistration path"),
> > unregister_netdev() indirectly sets real_num_tx_queues to 0. Those two
> > facts together mean that xennet_destroy_queues() called from
> > xennet_remove() cannot do its job, because it's called after
> > unregister_netdev(). This results in kfree-ing queues that are still
> > linked in napi, which ultimately crashes:
> >
> > [...]
> >
> > While this fixes the issue, I'm not sure if that is the correct thing
> > to do. xennet_remove() calls xennet_destroy_queues() under rtnl_lock,
> > which may be important here? Just moving xennet_destroy_queues() before
>
> I checked some of the call paths leading to xennet_close(), and all of
> those contained an ASSERT_RTNL(), so it seems the rtnl_lock is already
> taken here. Could you test with adding an ASSERT_RTNL() in
> xennet_destroy_queues()?
>
> > unregister_netdev() in xennet_remove() did not help - it crashed in
> > another way (use-after-free in xennet_close()).
>
> Yes, this would need to basically do the xennet_close() handling in
> xennet_destroy() instead, which I believe is not really an option.

I think the patch makes open/close asymmetric, tho. After ifup ; ifdown;
the next ifup will fail because queues are already destroyed, no?
IOW xennet_open() expects the queues were created at an earlier stage.

Maybe we can move the destroy to ndo_uninit? (and create to ndo_init?)
On Tue, Feb 22, 2022 at 12:03:01PM -0800, Jakub Kicinski wrote:
> On Mon, 21 Feb 2022 07:27:32 +0100 Juergen Gross wrote:
> > On 20.02.22 14:42, Marek Marczykowski-Górecki wrote:
> > > xennet_destroy_queues() relies on info->netdev->real_num_tx_queues to
> > > delete queues. [...]
>
> I think the patch makes open/close asymmetric, tho. After ifup ; ifdown;
> the next ifup will fail because queues are already destroyed, no?
> IOW xennet_open() expects the queues were created at an earlier stage.

Right.

> Maybe we can move the destroy to ndo_uninit? (and create to ndo_init?)

It looks like talk_to_netback(), which currently creates the queues,
needs them for quite some work. It is also called when reconnecting
(and the netdev is _not_ re-registered in this case), so that would be
a significant refactor. But moving destroy to ndo_uninit() should be
fine. It works, including in the ifup;ifdown;ifup case. I'll send v2
shortly.
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index d514d96027a6..5b69a930581e 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -828,6 +828,22 @@ static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev
 	return NETDEV_TX_OK;
 }
 
+static void xennet_destroy_queues(struct netfront_info *info)
+{
+	unsigned int i;
+
+	for (i = 0; i < info->netdev->real_num_tx_queues; i++) {
+		struct netfront_queue *queue = &info->queues[i];
+
+		if (netif_running(info->netdev))
+			napi_disable(&queue->napi);
+		netif_napi_del(&queue->napi);
+	}
+
+	kfree(info->queues);
+	info->queues = NULL;
+}
+
 static int xennet_close(struct net_device *dev)
 {
 	struct netfront_info *np = netdev_priv(dev);
@@ -839,6 +855,7 @@ static int xennet_close(struct net_device *dev)
 		queue = &np->queues[i];
 		napi_disable(&queue->napi);
 	}
+	xennet_destroy_queues(np);
 	return 0;
 }
 
@@ -2103,22 +2120,6 @@ static int write_queue_xenstore_keys(struct netfront_queue *queue,
 	return err;
 }
 
-static void xennet_destroy_queues(struct netfront_info *info)
-{
-	unsigned int i;
-
-	for (i = 0; i < info->netdev->real_num_tx_queues; i++) {
-		struct netfront_queue *queue = &info->queues[i];
-
-		if (netif_running(info->netdev))
-			napi_disable(&queue->napi);
-		netif_napi_del(&queue->napi);
-	}
-
-	kfree(info->queues);
-	info->queues = NULL;
-}
-
 static int xennet_create_page_pool(struct netfront_queue *queue)