Message ID | 20240114174208.34330-2-rrameshbabu@nvidia.com |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net] Revert "net: macsec: use skb_ensure_writable_head_tail to expand the skb" |
On Sun, 2024-01-14 at 09:42 -0800, Rahul Rameshbabu wrote:
> This reverts commit b34ab3527b9622ca4910df24ff5beed5aa66c6b5.
>
> Using skb_ensure_writable_head_tail without a call to skb_unshare causes
> the MACsec stack to operate on the original skb rather than a copy in the
> macsec_encrypt path. This causes the buffer to be exceeded in space, and
> leads to warnings generated by skb_put operations.

This part of the changelog is confusing to me. It looks like the skb
should be uncloned under the same conditions before and after this
patch (and/or the reverted)??!

Possibly dev->needed_headroom/needed_tailroom values are incorrect?!?

Thanks!

Paolo
2024-01-16, 11:39:35 +0100, Paolo Abeni wrote:
> On Sun, 2024-01-14 at 09:42 -0800, Rahul Rameshbabu wrote:
> > This reverts commit b34ab3527b9622ca4910df24ff5beed5aa66c6b5.
> >
> > Using skb_ensure_writable_head_tail without a call to skb_unshare causes
> > the MACsec stack to operate on the original skb rather than a copy in the
> > macsec_encrypt path. This causes the buffer to be exceeded in space, and
> > leads to warnings generated by skb_put operations.
>
> This part of the changelog is confusing to me. It looks like the skb
> should be uncloned under the same conditions before and after this
> patch (and/or the reverted)??!

I don't think so. The old code was doing unshare + expand.
skb_ensure_writable_head_tail calls pskb_expand_head without unshare,
which doesn't give us a fresh sk_buff, only takes care of the
headroom/tailroom. Or do I need more coffee? :/

> Possibly dev->needed_headroom/needed_tailroom values are incorrect?!?

That's also possible following commit a73d8779d61a ("net: macsec:
introduce mdo_insert_tx_tag"). Then this revert would only be hiding
the issue.
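To make the room check under discussion concrete, here is a simplified userspace model of what skb_ensure_writable_head_tail decides, based on my reading of the kernel source. The struct and function names (model_skb, model_room_check) are hypothetical, and any extra padding logic in the real function is left out. The two points it illustrates: the targets come from dev->needed_headroom/needed_tailroom, so an under-reported value means no expansion is requested at all; and when an expansion is needed, the real function calls pskb_expand_head, which reallocates the data area of the same struct sk_buff instead of returning a new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the fields this model needs. */
struct model_skb {
	size_t headroom;   /* bytes available before the data */
	size_t tailroom;   /* bytes available after the data */
	bool cloned;       /* data area shared with a clone */
};

/* Outcome of the room check. */
struct expand_req {
	bool realloc;      /* would pskb_expand_head() be called? */
	size_t grow_head;  /* extra headroom requested */
	size_t grow_tail;  /* extra tailroom requested */
};

/*
 * Simplified model of the check in skb_ensure_writable_head_tail():
 * the targets are the *device's* needed_headroom/needed_tailroom, and
 * only the shortfall is requested. In the kernel, a realloc keeps the
 * same struct sk_buff and only replaces its data area.
 */
static struct expand_req model_room_check(const struct model_skb *skb,
					  size_t dev_needed_headroom,
					  size_t dev_needed_tailroom)
{
	struct expand_req r = { false, 0, 0 };

	assert(skb != NULL);
	if (dev_needed_headroom > skb->headroom)
		r.grow_head = dev_needed_headroom - skb->headroom;
	if (dev_needed_tailroom > skb->tailroom)
		r.grow_tail = dev_needed_tailroom - skb->tailroom;

	/* No work only when room suffices and the data is not cloned. */
	r.realloc = r.grow_head || r.grow_tail || skb->cloned;
	return r;
}
```

For example, an skb with 16 bytes of headroom and no tailroom gets 16 bytes of extra tailroom only if the device advertises needed_tailroom >= 16; if the device reports 0, the check returns "nothing to do" and a later skb_put for the ICV has no room.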
On Tue, 16 Jan, 2024 14:51:19 +0100 Sabrina Dubroca <sd@queasysnail.net> wrote:
> 2024-01-16, 11:39:35 +0100, Paolo Abeni wrote:
>> On Sun, 2024-01-14 at 09:42 -0800, Rahul Rameshbabu wrote:
>> > This reverts commit b34ab3527b9622ca4910df24ff5beed5aa66c6b5.
>> >
>> > Using skb_ensure_writable_head_tail without a call to skb_unshare causes
>> > the MACsec stack to operate on the original skb rather than a copy in the
>> > macsec_encrypt path. This causes the buffer to be exceeded in space, and
>> > leads to warnings generated by skb_put operations.
>>
>> This part of the changelog is confusing to me. It looks like the skb
>> should be uncloned under the same conditions before and after this
>> patch (and/or the reverted)??!
>
> I don't think so. The old code was doing unshare +
> expand. skb_ensure_writable_head_tail calls pskb_expand_head without
> unshare, which doesn't give us a fresh sk_buff, only takes care of the
> headroom/tailroom. Or do I need more coffee? :/

Sabrina's analysis is correct. We no longer get a fresh sk_buff with
this commit.

>
>> Possibly dev->needed_headroom/needed_tailroom values are incorrect?!?
>
> That's also possible following commit a73d8779d61a ("net: macsec:
> introduce mdo_insert_tx_tag"). Then this revert would only be hiding
> the issue.

Ah, I think that is an interesting point.

static void macsec_set_head_tail_room(struct net_device *dev)
{
	struct macsec_dev *macsec = macsec_priv(dev);
	struct net_device *real_dev = macsec->real_dev;
	int needed_headroom, needed_tailroom;
	const struct macsec_ops *ops;

	ops = macsec_get_ops(macsec, NULL);
	if (ops) {

This condition should really be ops && ops->mdo_insert_tx_tag. Let me
retest with this change and post back. That said, I am wondering whether
we still need a fresh skb in the macsec stack, as was done previously
with skb_unshare/skb_copy_expand.

		needed_headroom = ops->needed_headroom;
		needed_tailroom = ops->needed_tailroom;
	} else {
		needed_headroom = MACSEC_NEEDED_HEADROOM;
		needed_tailroom = MACSEC_NEEDED_TAILROOM;
	}

	dev->needed_headroom = real_dev->needed_headroom + needed_headroom;
	dev->needed_tailroom = real_dev->needed_tailroom + needed_tailroom;
}

--
Thanks,

Rahul Rameshbabu
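The condition proposed above (only take the offload's requirements when it implements mdo_insert_tx_tag) can be modelled in userspace as follows. This is a sketch under stated assumptions: model_ops and model_room are hypothetical stand-ins for struct macsec_ops and the device fields, insert_tx_tag stands in for mdo_insert_tx_tag, and the 16-byte software values are assumed placeholders for MACSEC_NEEDED_HEADROOM/MACSEC_NEEDED_TAILROOM (check macsec.c for the real definitions).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed placeholder values for the software SecTAG/ICV room. */
#define MODEL_SW_HEADROOM 16u
#define MODEL_SW_TAILROOM 16u

/* Hypothetical stand-in for struct macsec_ops (only the fields used here). */
struct model_ops {
	int (*insert_tx_tag)(void *skb);  /* stands in for mdo_insert_tx_tag */
	unsigned int needed_headroom;
	unsigned int needed_tailroom;
};

struct model_room {
	unsigned int headroom;
	unsigned int tailroom;
};

/* Hypothetical tagger, only so a test can populate insert_tx_tag. */
static int model_dummy_insert_tag(void *skb)
{
	(void)skb;
	return 0;
}

/*
 * Model of macsec_set_head_tail_room() with the proposed condition:
 * use the offload's requirements only if it really inserts a TX tag,
 * otherwise reserve room for the software SecTAG and ICV.
 */
static struct model_room model_set_head_tail_room(const struct model_ops *ops,
						  struct model_room real_dev)
{
	unsigned int head, tail;
	struct model_room out;

	if (ops && ops->insert_tx_tag) {
		head = ops->needed_headroom;
		tail = ops->needed_tailroom;
	} else {
		head = MODEL_SW_HEADROOM;
		tail = MODEL_SW_TAILROOM;
	}
	out.headroom = real_dev.headroom + head;
	out.tailroom = real_dev.tailroom + tail;
	return out;
}
```

With the original `if (ops)` test, an offload driver that does not insert a TX tag would contribute its own needed_headroom/needed_tailroom (which it may not set) instead of the software requirements; with the stricter test it falls back to the software values.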
On Tue, 16 Jan, 2024 12:45:46 -0800 Rahul Rameshbabu <rrameshbabu@nvidia.com> wrote:
> On Tue, 16 Jan, 2024 14:51:19 +0100 Sabrina Dubroca <sd@queasysnail.net> wrote:
>> 2024-01-16, 11:39:35 +0100, Paolo Abeni wrote:
>>> On Sun, 2024-01-14 at 09:42 -0800, Rahul Rameshbabu wrote:
>>> > This reverts commit b34ab3527b9622ca4910df24ff5beed5aa66c6b5.
>>> >
>>> > Using skb_ensure_writable_head_tail without a call to skb_unshare causes
>>> > the MACsec stack to operate on the original skb rather than a copy in the
>>> > macsec_encrypt path. This causes the buffer to be exceeded in space, and
>>> > leads to warnings generated by skb_put operations.
>>>
>>> This part of the changelog is confusing to me. It looks like the skb
>>> should be uncloned under the same conditions before and after this
>>> patch (and/or the reverted)??!
>>
>> I don't think so. The old code was doing unshare +
>> expand. skb_ensure_writable_head_tail calls pskb_expand_head without
>> unshare, which doesn't give us a fresh sk_buff, only takes care of the
>> headroom/tailroom. Or do I need more coffee? :/
>
> Sabrina's analysis is correct. We no longer get a fresh sk_buff with
> this commit.
>
>>
>>> Possibly dev->needed_headroom/needed_tailroom values are incorrect?!?
>>
>> That's also possible following commit a73d8779d61a ("net: macsec:
>> introduce mdo_insert_tx_tag"). Then this revert would only be hiding
>> the issue.
>
> Ah, I think that is an interesting point.
>
> static void macsec_set_head_tail_room(struct net_device *dev)
> {
> 	struct macsec_dev *macsec = macsec_priv(dev);
> 	struct net_device *real_dev = macsec->real_dev;
> 	int needed_headroom, needed_tailroom;
> 	const struct macsec_ops *ops;
>
> 	ops = macsec_get_ops(macsec, NULL);
> 	if (ops) {
>
> This condition should really be ops && ops->mdo_insert_tx_tag. Let me
> retest with this change and post back. That said, I am wondering whether
> we still need a fresh skb in the macsec stack, as was done previously
> with skb_unshare/skb_copy_expand.

Neither fixing the headroom/tailroom management in this commit,
a73d8779d61a ("net: macsec: introduce mdo_insert_tx_tag"), nor simply
reverting that commit resolves the issue. I also need to revert
b34ab3527b96 ("net: macsec: use skb_ensure_writable_head_tail to expand
the skb") so that a fresh sk_buff is created, which avoids the panic
mentioned in this commit. I think we can do one of two things.

1. We merge this patch, and I send a follow-up fix for the issues in
   b34ab3527b96.
2. I send a v2 with an additional patch that fixes the issues in
   b34ab3527b96.

>
> 		needed_headroom = ops->needed_headroom;
> 		needed_tailroom = ops->needed_tailroom;
> 	} else {
> 		needed_headroom = MACSEC_NEEDED_HEADROOM;
> 		needed_tailroom = MACSEC_NEEDED_TAILROOM;
> 	}
>
> 	dev->needed_headroom = real_dev->needed_headroom + needed_headroom;
> 	dev->needed_tailroom = real_dev->needed_tailroom + needed_tailroom;
> }

--
Thanks,

Rahul Rameshbabu
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index e34816638569..7f5426285c61 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -607,11 +607,26 @@ static struct sk_buff *macsec_encrypt(struct sk_buff *skb,
 		return ERR_PTR(-EINVAL);
 	}
 
-	ret = skb_ensure_writable_head_tail(skb, dev);
-	if (unlikely(ret < 0)) {
-		macsec_txsa_put(tx_sa);
-		kfree_skb(skb);
-		return ERR_PTR(ret);
+	if (unlikely(skb_headroom(skb) < MACSEC_NEEDED_HEADROOM ||
+		     skb_tailroom(skb) < MACSEC_NEEDED_TAILROOM)) {
+		struct sk_buff *nskb = skb_copy_expand(skb,
+						       MACSEC_NEEDED_HEADROOM,
+						       MACSEC_NEEDED_TAILROOM,
+						       GFP_ATOMIC);
+		if (likely(nskb)) {
+			consume_skb(skb);
+			skb = nskb;
+		} else {
+			macsec_txsa_put(tx_sa);
+			kfree_skb(skb);
+			return ERR_PTR(-ENOMEM);
+		}
+	} else {
+		skb = skb_unshare(skb, GFP_ATOMIC);
+		if (!skb) {
+			macsec_txsa_put(tx_sa);
+			return ERR_PTR(-ENOMEM);
+		}
 	}
 
 	unprotected_len = skb->len;
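The restored branch can be summarised by which path a given skb takes. Below is a hypothetical userspace model (model_skb, model_action and the 16-byte constants are assumptions standing in for the kernel types and MACSEC_NEEDED_HEADROOM/MACSEC_NEEDED_TAILROOM). Two things worth noting from the diff itself: the check uses the fixed MACSEC_NEEDED_* values rather than dev->needed_*, and every path ends with an skb whose data is not shared with a clone (skb_unshare returns the original skb unchanged only when it is not cloned).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed placeholder values for MACSEC_NEEDED_HEADROOM/TAILROOM. */
#define MODEL_NEEDED_HEADROOM 16u
#define MODEL_NEEDED_TAILROOM 16u

/* Hypothetical stand-in for struct sk_buff. */
struct model_skb {
	size_t headroom;
	size_t tailroom;
	bool cloned;
};

enum model_action {
	MODEL_KEEP,         /* skb_unshare() returns the skb as-is */
	MODEL_UNSHARE_COPY, /* skb_unshare() copies a cloned skb */
	MODEL_COPY_EXPAND,  /* skb_copy_expand() builds a larger private copy */
};

/* Which path the restored code in macsec_encrypt() would take. */
static enum model_action model_macsec_prepare(const struct model_skb *skb)
{
	assert(skb != NULL);
	/* Not enough room for SecTAG or ICV: copy into a bigger buffer. */
	if (skb->headroom < MODEL_NEEDED_HEADROOM ||
	    skb->tailroom < MODEL_NEEDED_TAILROOM)
		return MODEL_COPY_EXPAND;
	/* Enough room: only a cloned skb needs its own copy of the data. */
	return skb->cloned ? MODEL_UNSHARE_COPY : MODEL_KEEP;
}
```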
This reverts commit b34ab3527b9622ca4910df24ff5beed5aa66c6b5.

Using skb_ensure_writable_head_tail without a call to skb_unshare causes
the MACsec stack to operate on the original skb rather than a copy in the
macsec_encrypt path. As a result, the skb runs out of space, and skb_put
operations fail, as in the kernel BUG in the log below.

Revert this change rather than adding a call to skb_unshare after
skb_ensure_writable_head_tail, since skb_copy_expand is more efficient
than that combination.

Log:
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:2464!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 21 PID: 61997 Comm: iperf3 Not tainted 6.7.0-rc8_for_upstream_debug_2024_01_07_17_05 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:skb_put+0x113/0x190
Code: 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 70 3b 9d bc 00 00 00 77 0e 48 83 c4 08 4c 89 e8 5b 5d 41 5d c3 <0f> 0b 4c 8b 6c 24 20 89 74 24 04 e8 6d b7 f0 fe 8b 74 24 04 48 c7
RSP: 0018:ffff8882694e7278 EFLAGS: 00010202
RAX: 0000000000000025 RBX: 0000000000000100 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffff88816ae0bad4
RBP: ffff88816ae0ba60 R08: 0000000000000004 R09: 0000000000000004
R10: 0000000000000001 R11: 0000000000000001 R12: ffff88811ba5abfa
R13: ffff8882bdecc100 R14: ffff88816ae0ba60 R15: ffff8882bdecc0ae
FS: 00007fe54df02740(0000) GS:ffff88881f080000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe54d92e320 CR3: 000000010a345003 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? die+0x33/0x90
 ? skb_put+0x113/0x190
 ? do_trap+0x1b4/0x3b0
 ? skb_put+0x113/0x190
 ? do_error_trap+0xb6/0x180
 ? skb_put+0x113/0x190
 ? handle_invalid_op+0x2c/0x30
 ? skb_put+0x113/0x190
 ? exc_invalid_op+0x2b/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? skb_put+0x113/0x190
 ? macsec_start_xmit+0x4e9/0x21d0
 macsec_start_xmit+0x830/0x21d0
 ? get_txsa_from_nl+0x400/0x400
 ? lock_downgrade+0x690/0x690
 ? dev_queue_xmit_nit+0x78b/0xae0
 dev_hard_start_xmit+0x151/0x560
 __dev_queue_xmit+0x1580/0x28f0
 ? check_chain_key+0x1c5/0x490
 ? netdev_core_pick_tx+0x2d0/0x2d0
 ? __ip_queue_xmit+0x798/0x1e00
 ? lock_downgrade+0x690/0x690
 ? mark_held_locks+0x9f/0xe0
 ip_finish_output2+0x11e4/0x2050
 ? ip_mc_finish_output+0x520/0x520
 ? ip_fragment.constprop.0+0x230/0x230
 ? __ip_queue_xmit+0x798/0x1e00
 __ip_queue_xmit+0x798/0x1e00
 ? __skb_clone+0x57a/0x760
 __tcp_transmit_skb+0x169d/0x3490
 ? lock_downgrade+0x690/0x690
 ? __tcp_select_window+0x1320/0x1320
 ? mark_held_locks+0x9f/0xe0
 ? lockdep_hardirqs_on_prepare+0x286/0x400
 ? tcp_small_queue_check.isra.0+0x120/0x3d0
 tcp_write_xmit+0x12b6/0x7100
 ? skb_page_frag_refill+0x1e8/0x460
 __tcp_push_pending_frames+0x92/0x320
 tcp_sendmsg_locked+0x1ed4/0x3190
 ? tcp_sendmsg_fastopen+0x650/0x650
 ? tcp_sendmsg+0x1a/0x40
 ? mark_held_locks+0x9f/0xe0
 ? lockdep_hardirqs_on_prepare+0x286/0x400
 tcp_sendmsg+0x28/0x40
 ? inet_send_prepare+0x1b0/0x1b0
 __sock_sendmsg+0xc5/0x190
 sock_write_iter+0x222/0x380
 ? __sock_sendmsg+0x190/0x190
 ? kfree+0x96/0x130
 vfs_write+0x842/0xbd0
 ? kernel_write+0x530/0x530
 ? __fget_light+0x51/0x220
 ? __fget_light+0x51/0x220
 ksys_write+0x172/0x1d0
 ? update_socket_protocol+0x10/0x10
 ? __x64_sys_read+0xb0/0xb0
 ? lockdep_hardirqs_on_prepare+0x286/0x400
 do_syscall_64+0x40/0xe0
 entry_SYSCALL_64_after_hwframe+0x46/0x4e
RIP: 0033:0x7fe54d9018b7
Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffdbd4191d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000025 RCX: 00007fe54d9018b7
RDX: 0000000000000025 RSI: 0000000000d9859c RDI: 0000000000000004
RBP: 0000000000d9859c R08: 0000000000000004 R09: 0000000000000000
R10: 00007fe54d80afe0 R11: 0000000000000246 R12: 0000000000000004
R13: 0000000000000025 R14: 00007fe54e00ec00 R15: 0000000000d982a0
 </TASK>
Modules linked in: 8021q garp mrp iptable_raw bonding vfio_pci rdma_ucm ib_umad mlx5_vfio_pci mlx5_ib vfio_pci_core vfio_iommu_type1 ib_uverbs vfio mlx5_core ip_gre nf_tables ipip tunnel4 ib_ipoib ip6_gre gre ip6_tunnel tunnel6 geneve openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core zram zsmalloc fuse [last unloaded: ib_uverbs]
---[ end trace 0000000000000000 ]---

Cc: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
---
 drivers/net/macsec.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)
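The BUG in the log fires in skb_put. As I read macsec_encrypt, the ICV is appended at the tail with skb_put, and the kernel's skb_put treats running past the end of the buffer as fatal (it calls skb_over_panic). The following is a simplified userspace model with hypothetical names (model_buf, model_put), using offsets instead of pointers and returning -1 instead of crashing, to show why appending a 16-byte ICV (an assumed length) fails when no tailroom was reserved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical linear buffer: allocation [0, end), data in [data, tail). */
struct model_buf {
	size_t data;   /* offset of the first data byte */
	size_t tail;   /* offset one past the last data byte */
	size_t end;    /* size of the allocation */
};

/*
 * Model of skb_put(): grow the data area by len bytes at the tail.
 * The kernel version panics if tail would pass end; this model
 * reports the overrun with -1 and leaves the buffer untouched.
 */
static int model_put(struct model_buf *b, size_t len)
{
	assert(b != NULL && b->tail <= b->end);
	if (len > b->end - b->tail)
		return -1;  /* tailroom too small: the overrun behind the BUG */
	b->tail += len;
	return 0;
}
```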