[net] Revert "net: macsec: use skb_ensure_writable_head_tail to expand the skb"

Message ID 20240114174208.34330-2-rrameshbabu@nvidia.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            fail     Series targets non-next tree, but doesn't contain any Fixes tags
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 1084 this patch: 1084
netdev/cc_maintainers           success  CCed 0 of 0 maintainers
netdev/build_clang              success  Errors and warnings before: 1095 this patch: 1095
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1099 this patch: 1099
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 31 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  pending  net-next-2024-01-16--15-00

Commit Message

Rahul Rameshbabu Jan. 14, 2024, 5:42 p.m. UTC
This reverts commit b34ab3527b9622ca4910df24ff5beed5aa66c6b5.

Using skb_ensure_writable_head_tail without a call to skb_unshare causes
the MACsec stack to operate on the original skb rather than a copy in the
macsec_encrypt path. This causes the buffer space to be exceeded and
triggers the kernel BUG in skb_put shown in the log below. Opting to revert
this change since skb_copy_expand is more efficient than
skb_ensure_writable_head_tail followed by a call to skb_unshare.

Log:
  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:2464!
  invalid opcode: 0000 [#1] SMP KASAN
  CPU: 21 PID: 61997 Comm: iperf3 Not tainted 6.7.0-rc8_for_upstream_debug_2024_01_07_17_05 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:skb_put+0x113/0x190
  Code: 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 70 3b 9d bc 00 00 00 77 0e 48 83 c4 08 4c 89 e8 5b 5d 41 5d c3 <0f> 0b 4c 8b 6c 24 20 89 74 24 04 e8 6d b7 f0 fe 8b 74 24 04 48 c7
  RSP: 0018:ffff8882694e7278 EFLAGS: 00010202
  RAX: 0000000000000025 RBX: 0000000000000100 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffff88816ae0bad4
  RBP: ffff88816ae0ba60 R08: 0000000000000004 R09: 0000000000000004
  R10: 0000000000000001 R11: 0000000000000001 R12: ffff88811ba5abfa
  R13: ffff8882bdecc100 R14: ffff88816ae0ba60 R15: ffff8882bdecc0ae
  FS:  00007fe54df02740(0000) GS:ffff88881f080000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fe54d92e320 CR3: 000000010a345003 CR4: 0000000000370eb0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   ? die+0x33/0x90
   ? skb_put+0x113/0x190
   ? do_trap+0x1b4/0x3b0
   ? skb_put+0x113/0x190
   ? do_error_trap+0xb6/0x180
   ? skb_put+0x113/0x190
   ? handle_invalid_op+0x2c/0x30
   ? skb_put+0x113/0x190
   ? exc_invalid_op+0x2b/0x40
   ? asm_exc_invalid_op+0x16/0x20
   ? skb_put+0x113/0x190
   ? macsec_start_xmit+0x4e9/0x21d0
   macsec_start_xmit+0x830/0x21d0
   ? get_txsa_from_nl+0x400/0x400
   ? lock_downgrade+0x690/0x690
   ? dev_queue_xmit_nit+0x78b/0xae0
   dev_hard_start_xmit+0x151/0x560
   __dev_queue_xmit+0x1580/0x28f0
   ? check_chain_key+0x1c5/0x490
   ? netdev_core_pick_tx+0x2d0/0x2d0
   ? __ip_queue_xmit+0x798/0x1e00
   ? lock_downgrade+0x690/0x690
   ? mark_held_locks+0x9f/0xe0
   ip_finish_output2+0x11e4/0x2050
   ? ip_mc_finish_output+0x520/0x520
   ? ip_fragment.constprop.0+0x230/0x230
   ? __ip_queue_xmit+0x798/0x1e00
   __ip_queue_xmit+0x798/0x1e00
   ? __skb_clone+0x57a/0x760
   __tcp_transmit_skb+0x169d/0x3490
   ? lock_downgrade+0x690/0x690
   ? __tcp_select_window+0x1320/0x1320
   ? mark_held_locks+0x9f/0xe0
   ? lockdep_hardirqs_on_prepare+0x286/0x400
   ? tcp_small_queue_check.isra.0+0x120/0x3d0
   tcp_write_xmit+0x12b6/0x7100
   ? skb_page_frag_refill+0x1e8/0x460
   __tcp_push_pending_frames+0x92/0x320
   tcp_sendmsg_locked+0x1ed4/0x3190
   ? tcp_sendmsg_fastopen+0x650/0x650
   ? tcp_sendmsg+0x1a/0x40
   ? mark_held_locks+0x9f/0xe0
   ? lockdep_hardirqs_on_prepare+0x286/0x400
   tcp_sendmsg+0x28/0x40
   ? inet_send_prepare+0x1b0/0x1b0
   __sock_sendmsg+0xc5/0x190
   sock_write_iter+0x222/0x380
   ? __sock_sendmsg+0x190/0x190
   ? kfree+0x96/0x130
   vfs_write+0x842/0xbd0
   ? kernel_write+0x530/0x530
   ? __fget_light+0x51/0x220
   ? __fget_light+0x51/0x220
   ksys_write+0x172/0x1d0
   ? update_socket_protocol+0x10/0x10
   ? __x64_sys_read+0xb0/0xb0
   ? lockdep_hardirqs_on_prepare+0x286/0x400
   do_syscall_64+0x40/0xe0
   entry_SYSCALL_64_after_hwframe+0x46/0x4e
  RIP: 0033:0x7fe54d9018b7
  Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
  RSP: 002b:00007ffdbd4191d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  RAX: ffffffffffffffda RBX: 0000000000000025 RCX: 00007fe54d9018b7
  RDX: 0000000000000025 RSI: 0000000000d9859c RDI: 0000000000000004
  RBP: 0000000000d9859c R08: 0000000000000004 R09: 0000000000000000
  R10: 00007fe54d80afe0 R11: 0000000000000246 R12: 0000000000000004
  R13: 0000000000000025 R14: 00007fe54e00ec00 R15: 0000000000d982a0
   </TASK>
  Modules linked in: 8021q garp mrp iptable_raw bonding vfio_pci rdma_ucm ib_umad mlx5_vfio_pci mlx5_ib vfio_pci_core vfio_iommu_type1 ib_uverbs vfio mlx5_core ip_gre nf_tables ipip tunnel4 ib_ipoib ip6_gre gre ip6_tunnel tunnel6 geneve openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core zram zsmalloc fuse [last unloaded: ib_uverbs]
  ---[ end trace 0000000000000000 ]---

Cc: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
---
 drivers/net/macsec.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)
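
Editorial sketch (not part of the patch): the log above stops in skb_put,
which in the kernel can hit a BUG for two reasons: the skb is nonlinear
(SKB_LINEAR_ASSERT), or the new tail would pass the end of the linear
buffer (skb_over_panic). The log alone does not say which check fired. The
user-space model below only illustrates the two checks under simplified
assumptions. The toy_skb type, its fields, and toy_put are hypothetical
names invented for this illustration. Unlike the kernel, the model tests
before advancing the tail and returns a status instead of panicking.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the state skb_put() looks at. */
struct toy_skb {
	size_t tail;     /* offset one past the last byte of linear data */
	size_t end;      /* offset one past the end of the linear buffer */
	size_t data_len; /* bytes held outside the linear buffer; 0 means linear */
};

enum toy_put_status {
	TOY_PUT_OK,        /* tail advanced by len */
	TOY_PUT_NONLINEAR, /* kernel analogue: SKB_LINEAR_ASSERT() fires */
	TOY_PUT_OVERRUN,   /* kernel analogue: skb_over_panic() */
};

/* Append len bytes to the linear area, or report why that is not allowed. */
enum toy_put_status toy_put(struct toy_skb *skb, size_t len)
{
	if (skb->data_len != 0)
		return TOY_PUT_NONLINEAR;
	if (len > skb->end - skb->tail)
		return TOY_PUT_OVERRUN;
	skb->tail += len;
	return TOY_PUT_OK;
}
```

With 16 bytes of tailroom, toy_put(&skb, 16) succeeds and a further
toy_put(&skb, 1) reports TOY_PUT_OVERRUN. Any skb with a nonzero data_len
is rejected regardless of tailroom.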

Comments

Paolo Abeni Jan. 16, 2024, 10:39 a.m. UTC | #1
On Sun, 2024-01-14 at 09:42 -0800, Rahul Rameshbabu wrote:
> This reverts commit b34ab3527b9622ca4910df24ff5beed5aa66c6b5.
> 
> Using skb_ensure_writable_head_tail without a call to skb_unshare causes
> the MACsec stack to operate on the original skb rather than a copy in the
> macsec_encrypt path. This causes the buffer to be exceeded in space, and
> leads to warnings generated by skb_put operations. 

This part of the changelog is confusing to me. It looks like the skb
should be uncloned under the same conditions before and after this
patch (and/or the reverted)??!

Possibly dev->needed_headroom/needed_tailroom values are incorrect?!?

Thanks!

Paolo
Sabrina Dubroca Jan. 16, 2024, 1:51 p.m. UTC | #2
2024-01-16, 11:39:35 +0100, Paolo Abeni wrote:
> On Sun, 2024-01-14 at 09:42 -0800, Rahul Rameshbabu wrote:
> > This reverts commit b34ab3527b9622ca4910df24ff5beed5aa66c6b5.
> > 
> > Using skb_ensure_writable_head_tail without a call to skb_unshare causes
> > the MACsec stack to operate on the original skb rather than a copy in the
> > macsec_encrypt path. This causes the buffer to be exceeded in space, and
> > leads to warnings generated by skb_put operations. 
> 
> This part of the changelog is confusing to me. It looks like the skb
> should be uncloned under the same conditions before and after this
> patch (and/or the reverted)??!

I don't think so. The old code was doing unshare +
expand. skb_ensure_writable_head_tail calls pskb_expand_head without
unshare, which doesn't give us a fresh sk_buff, only takes care of the
headroom/tailroom. Or do I need more coffee? :/

> Possibly dev->needed_headroom/needed_tailroom values are incorrect?!?

That's also possible following commit a73d8779d61a ("net: macsec:
introduce mdo_insert_tx_tag"). Then this revert would only be hiding
the issue.
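
Editorial sketch (not from the thread): the distinction drawn above between
"unshare + expand" and pskb_expand_head can be modelled in user space. In
the toy below, expanding in place gives the same packet struct a new
private buffer, while unsharing a cloned packet hands back a different
struct. All toy_* names and types are hypothetical and heavily simplified.
They are not the kernel's struct sk_buff or its helpers, and headroom
placement is ignored.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted data buffer shared between clones. */
struct toy_buf {
	int refs;
	size_t len;
	unsigned char *bytes;
};

/* Hypothetical packet descriptor, loosely analogous to an sk_buff. */
struct toy_pkt {
	struct toy_buf *buf;
};

static struct toy_buf *toy_buf_alloc(size_t len)
{
	struct toy_buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->bytes = calloc(len ? len : 1, 1);
	if (!b->bytes) {
		free(b);
		return NULL;
	}
	b->refs = 1;
	b->len = len;
	return b;
}

static void toy_buf_put(struct toy_buf *b)
{
	if (--b->refs == 0) {
		free(b->bytes);
		free(b);
	}
}

struct toy_pkt *toy_pkt_alloc(size_t len)
{
	struct toy_pkt *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->buf = toy_buf_alloc(len);
	if (!p->buf) {
		free(p);
		return NULL;
	}
	return p;
}

void toy_pkt_free(struct toy_pkt *p)
{
	toy_buf_put(p->buf);
	free(p);
}

/* New descriptor that shares the same data buffer. */
struct toy_pkt *toy_clone(struct toy_pkt *p)
{
	struct toy_pkt *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->buf = p->buf;
	c->buf->refs++;
	return c;
}

int toy_is_cloned(const struct toy_pkt *p)
{
	return p->buf->refs > 1;
}

/* Model of expanding in place: same descriptor, new private buffer. */
int toy_expand_in_place(struct toy_pkt *p, size_t extra)
{
	struct toy_buf *nb = toy_buf_alloc(p->buf->len + extra);

	if (!nb)
		return -1;
	memcpy(nb->bytes, p->buf->bytes, p->buf->len);
	toy_buf_put(p->buf);
	p->buf = nb;
	return 0;
}

/* Model of unsharing: a cloned packet is replaced by a fresh descriptor. */
struct toy_pkt *toy_unshare(struct toy_pkt *p)
{
	struct toy_pkt *np;

	if (!toy_is_cloned(p))
		return p;
	np = toy_pkt_alloc(p->buf->len);
	if (np)
		memcpy(np->buf->bytes, p->buf->bytes, p->buf->len);
	toy_pkt_free(p); /* the caller's reference is consumed either way */
	return np;
}
```

In this model both operations leave the caller with a private data buffer.
The difference is only whether the caller still holds the original
descriptor afterwards.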
Rahul Rameshbabu Jan. 16, 2024, 8:45 p.m. UTC | #3
On Tue, 16 Jan, 2024 14:51:19 +0100 Sabrina Dubroca <sd@queasysnail.net> wrote:
> 2024-01-16, 11:39:35 +0100, Paolo Abeni wrote:
>> On Sun, 2024-01-14 at 09:42 -0800, Rahul Rameshbabu wrote:
>> > This reverts commit b34ab3527b9622ca4910df24ff5beed5aa66c6b5.
>> > 
>> > Using skb_ensure_writable_head_tail without a call to skb_unshare causes
>> > the MACsec stack to operate on the original skb rather than a copy in the
>> > macsec_encrypt path. This causes the buffer to be exceeded in space, and
>> > leads to warnings generated by skb_put operations. 
>> 
>> This part of the changelog is confusing to me. It looks like the skb
>> should be uncloned under the same conditions before and after this
>> patch (and/or the reverted)??!
>
> I don't think so. The old code was doing unshare +
> expand. skb_ensure_writable_head_tail calls pskb_expand_head without
> unshare, which doesn't give us a fresh sk_buff, only takes care of the
> headroom/tailroom. Or do I need more coffee? :/

Sabrina's analysis is correct. We no longer get a fresh sk_buff with
this commit.

>
>> Possibly dev->needed_headroom/needed_tailroom values are incorrect?!?
>
> That's also possible following commit a73d8779d61a ("net: macsec:
> introduce mdo_insert_tx_tag"). Then this revert would only be hiding
> the issue.

Ah, I think that is an interesting point.

    static void macsec_set_head_tail_room(struct net_device *dev)
    {
    	struct macsec_dev *macsec = macsec_priv(dev);
    	struct net_device *real_dev = macsec->real_dev;
    	int needed_headroom, needed_tailroom;
    	const struct macsec_ops *ops;

    	ops = macsec_get_ops(macsec, NULL);
    	if (ops) {

This condition should really be ops && ops->mdo_insert_tx_tag. Let me
retest with this change and post back. That said, I am wondering whether we
still need a fresh skb in the macsec stack, as was done previously
with skb_unshare/skb_copy_expand.

    		needed_headroom = ops->needed_headroom;
    		needed_tailroom = ops->needed_tailroom;
    	} else {
    		needed_headroom = MACSEC_NEEDED_HEADROOM;
    		needed_tailroom = MACSEC_NEEDED_TAILROOM;
    	}

    	dev->needed_headroom = real_dev->needed_headroom + needed_headroom;
    	dev->needed_tailroom = real_dev->needed_tailroom + needed_tailroom;
    }

--
Thanks,

Rahul Rameshbabu
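
Editorial sketch (not from the thread): the condition proposed above can be
exercised in isolation with a user-space model. The toy_ops and toy_room
types, the simplified callback signature, and the TOY_* constants are all
assumptions made for this illustration. The real constants and struct
macsec_ops live in the kernel sources and their values are not reproduced
here.

```c
#include <stddef.h>

/* Placeholder values, NOT the kernel's MACSEC_NEEDED_* definitions. */
#define TOY_MACSEC_NEEDED_HEADROOM 32u
#define TOY_MACSEC_NEEDED_TAILROOM 16u

/* Hypothetical, trimmed-down view of the offload ops. */
struct toy_ops {
	int (*mdo_insert_tx_tag)(void *ctx); /* signature simplified */
	unsigned int needed_headroom;
	unsigned int needed_tailroom;
};

struct toy_room {
	unsigned int headroom;
	unsigned int tailroom;
};

/* Dummy callback so the "ops provide a tag hook" branch can be exercised. */
int toy_noop_insert_tx_tag(void *ctx)
{
	(void)ctx;
	return 0;
}

/*
 * Use the ops' room requirements only when the ops actually insert a TX
 * tag; otherwise fall back to the software MACsec requirements. The real
 * device's own requirements are added in both cases.
 */
struct toy_room toy_head_tail_room(const struct toy_ops *ops,
				   unsigned int real_headroom,
				   unsigned int real_tailroom)
{
	struct toy_room r;

	if (ops && ops->mdo_insert_tx_tag) {
		r.headroom = ops->needed_headroom;
		r.tailroom = ops->needed_tailroom;
	} else {
		r.headroom = TOY_MACSEC_NEEDED_HEADROOM;
		r.tailroom = TOY_MACSEC_NEEDED_TAILROOM;
	}

	r.headroom += real_headroom;
	r.tailroom += real_tailroom;
	return r;
}
```

With the plain "if (ops)" test, ops that exist but leave needed_headroom
and needed_tailroom at zero would contribute no extra room. With the
narrower test they fall through to the software defaults.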
Rahul Rameshbabu Jan. 17, 2024, 1:22 a.m. UTC | #4
On Tue, 16 Jan, 2024 12:45:46 -0800 Rahul Rameshbabu <rrameshbabu@nvidia.com> wrote:
> On Tue, 16 Jan, 2024 14:51:19 +0100 Sabrina Dubroca <sd@queasysnail.net> wrote:
>> 2024-01-16, 11:39:35 +0100, Paolo Abeni wrote:
>>> On Sun, 2024-01-14 at 09:42 -0800, Rahul Rameshbabu wrote:
>>> > This reverts commit b34ab3527b9622ca4910df24ff5beed5aa66c6b5.
>>> > 
>>> > Using skb_ensure_writable_head_tail without a call to skb_unshare causes
>>> > the MACsec stack to operate on the original skb rather than a copy in the
>>> > macsec_encrypt path. This causes the buffer to be exceeded in space, and
>>> > leads to warnings generated by skb_put operations. 
>>> 
>>> This part of the changelog is confusing to me. It looks like the skb
>>> should be uncloned under the same conditions before and after this
>>> patch (and/or the reverted)??!
>>
>> I don't think so. The old code was doing unshare +
>> expand. skb_ensure_writable_head_tail calls pskb_expand_head without
>> unshare, which doesn't give us a fresh sk_buff, only takes care of the
>> headroom/tailroom. Or do I need more coffee? :/
>
> Sabrina's analysis is correct. We no longer get a fresh sk_buff with
> this commit.
>
>>
>>> Possibly dev->needed_headroom/needed_tailroom values are incorrect?!?
>>
>> That's also possible following commit a73d8779d61a ("net: macsec:
>> introduce mdo_insert_tx_tag"). Then this revert would only be hiding
>> the issue.
>
> Ah, I think that is an interesting point.
>
>     static void macsec_set_head_tail_room(struct net_device *dev)
>     {
>     	struct macsec_dev *macsec = macsec_priv(dev);
>     	struct net_device *real_dev = macsec->real_dev;
>     	int needed_headroom, needed_tailroom;
>     	const struct macsec_ops *ops;
>
>     	ops = macsec_get_ops(macsec, NULL);
>     	if (ops) {
>
> This condition should really be ops && ops->mdo_insert_tx_tag. Let me
> retest with this change and post back. That said, I am wondering whether we
> still need a fresh skb in the macsec stack, as was done previously
> with skb_unshare/skb_copy_expand.

Neither fixing the headroom/tailroom management introduced in commit
a73d8779d61a ("net: macsec: introduce mdo_insert_tx_tag") nor simply
reverting that commit resolves the issue. I also end up needing to
revert b34ab3527b96 ("net: macsec: use skb_ensure_writable_head_tail to
expand the skb"), so that a fresh sk_buff is created to avoid the panic
reported in this patch.

I think we can do one of two things.

1. We merge this patch, and I send a follow-up fix with regards to the
   issues in b34ab3527b96.
2. I send a v2 where I add an additional patch for fixing the issues in
   b34ab3527b96.

>
>     		needed_headroom = ops->needed_headroom;
>     		needed_tailroom = ops->needed_tailroom;
>     	} else {
>     		needed_headroom = MACSEC_NEEDED_HEADROOM;
>     		needed_tailroom = MACSEC_NEEDED_TAILROOM;
>     	}
>
>     	dev->needed_headroom = real_dev->needed_headroom + needed_headroom;
>     	dev->needed_tailroom = real_dev->needed_tailroom + needed_tailroom;
>     }

--
Thanks,

Rahul Rameshbabu
Patch

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index e34816638569..7f5426285c61 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -607,11 +607,26 @@  static struct sk_buff *macsec_encrypt(struct sk_buff *skb,
 		return ERR_PTR(-EINVAL);
 	}
 
-	ret = skb_ensure_writable_head_tail(skb, dev);
-	if (unlikely(ret < 0)) {
-		macsec_txsa_put(tx_sa);
-		kfree_skb(skb);
-		return ERR_PTR(ret);
+	if (unlikely(skb_headroom(skb) < MACSEC_NEEDED_HEADROOM ||
+		     skb_tailroom(skb) < MACSEC_NEEDED_TAILROOM)) {
+		struct sk_buff *nskb = skb_copy_expand(skb,
+						       MACSEC_NEEDED_HEADROOM,
+						       MACSEC_NEEDED_TAILROOM,
+						       GFP_ATOMIC);
+		if (likely(nskb)) {
+			consume_skb(skb);
+			skb = nskb;
+		} else {
+			macsec_txsa_put(tx_sa);
+			kfree_skb(skb);
+			return ERR_PTR(-ENOMEM);
+		}
+	} else {
+		skb = skb_unshare(skb, GFP_ATOMIC);
+		if (!skb) {
+			macsec_txsa_put(tx_sa);
+			return ERR_PTR(-ENOMEM);
+		}
 	}
 
 	unprotected_len = skb->len;