
[IPV6,v2,1/4] ipv6: allocate enough headroom in ip6_finish_output2()

Message ID 4f6a2b28-a137-2e19-bf62-5a8767d0d0ac@virtuozzo.com
State Superseded
Delegated to: Netdev Maintainers
Series ipv6: allocate enough headroom in ip6_finish_output2()

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Guessed tree name to be net-next
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/source_inline fail Was 0 now: 1
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 48 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success Link

Commit Message

Vasily Averin July 9, 2021, 9:04 a.m. UTC
When the TEE target mirrors traffic to another interface, the sk_buff may
not have enough headroom to be processed correctly.
ip_finish_output2() detects this situation for IPv4 and allocates a
new skb with enough headroom. However, IPv6 lacks this logic in
ip6_finish_output2(), which leads to skb_under_panic:

 skbuff: skb_under_panic: text:ffffffffc0866ad4 len:96 put:24
 head:ffff97be85e31800 data:ffff97be85e317f8 tail:0x58 end:0xc0 dev:gre0
 ------------[ cut here ]------------
 kernel BUG at net/core/skbuff.c:110!
 invalid opcode: 0000 [#1] SMP PTI
 CPU: 2 PID: 393 Comm: kworker/2:2 Tainted: G           OE     5.13.0 #13
 Hardware name: Virtuozzo KVM, BIOS 1.11.0-2.vz7.4 04/01/2014
 Workqueue: ipv6_addrconf addrconf_dad_work
 RIP: 0010:skb_panic+0x48/0x4a
 Call Trace:
  skb_push.cold.111+0x10/0x10
  ipgre_header+0x24/0xf0 [ip_gre]
  neigh_connected_output+0xae/0xf0
  ip6_finish_output2+0x1a8/0x5a0
  ip6_output+0x5c/0x110
  nf_dup_ipv6+0x158/0x1000 [nf_dup_ipv6]
  tee_tg6+0x2e/0x40 [xt_TEE]
  ip6t_do_table+0x294/0x470 [ip6_tables]
  nf_hook_slow+0x44/0xc0
  nf_hook.constprop.34+0x72/0xe0
  ndisc_send_skb+0x20d/0x2e0
  ndisc_send_ns+0xd1/0x210
  addrconf_dad_work+0x3c8/0x540
  process_one_work+0x1d1/0x370
  worker_thread+0x30/0x390
  kthread+0x116/0x130
  ret_from_fork+0x22/0x30
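The trace gives the numbers: after the 24-byte push (put:24), data (...e317f8) sits 8 bytes below head (...e31800), so the skb reached gre0 with only 16 bytes of headroom and 72 bytes of data. The IPv4 safeguard described above can be modeled as a minimal userspace sketch. Everything here is hypothetical: struct toy_skb and the toy_* functions are not kernel APIs. toy_push() reports failure where skb_push() would hit skb_under_panic(), and toy_output() copies the packet into a larger buffer before pushing the link-layer header, roughly what ip_finish_output2() does with skb_realloc_headroom() in kernels of this era.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct sk_buff (not the kernel type). */
struct toy_skb {
	unsigned char *head; /* start of the allocated buffer */
	unsigned char *data; /* start of the packet data */
	size_t len;          /* bytes of packet data */
};

static size_t toy_headroom(const struct toy_skb *skb)
{
	return (size_t)(skb->data - skb->head);
}

/* Allocate a zeroed packet of len bytes with the given headroom. */
static struct toy_skb *toy_alloc(size_t headroom, size_t len)
{
	struct toy_skb *skb = malloc(sizeof(*skb));

	if (!skb)
		return NULL;
	skb->head = calloc(1, headroom + len + 1); /* +1 avoids a 0-byte allocation */
	if (!skb->head) {
		free(skb);
		return NULL;
	}
	skb->data = skb->head + headroom;
	skb->len = len;
	return skb;
}

static void toy_free(struct toy_skb *skb)
{
	if (skb) {
		free(skb->head);
		free(skb);
	}
}

/* Like skb_push(), but returns NULL instead of panicking on underflow. */
static unsigned char *toy_push(struct toy_skb *skb, size_t n)
{
	if (n > toy_headroom(skb))
		return NULL;
	skb->data -= n;
	skb->len += n;
	return skb->data;
}

/* IPv4-style safeguard: if the hh_len-byte link-layer header will not fit,
 * copy the packet into a new buffer with enough headroom first. The
 * original is released on both the success and the failure path. */
static struct toy_skb *toy_output(struct toy_skb *skb, size_t hh_len)
{
	unsigned char *hdr;

	if (toy_headroom(skb) < hh_len) {
		struct toy_skb *skb2 = toy_alloc(hh_len, skb->len);

		if (skb2)
			memcpy(skb2->data, skb->data, skb->len);
		toy_free(skb);
		if (!skb2)
			return NULL;
		skb = skb2;
	}
	hdr = toy_push(skb, hh_len);
	assert(hdr != NULL); /* cannot fail after the check above */
	(void)hdr;
	return skb;
}
```

With an skb from toy_alloc(16, 72), toy_push(skb, 24) fails just as the real push underflowed in the trace, while toy_output(skb, 24) returns a replacement holding 96 bytes with no headroom left.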

This patch implements a new helper that tries to expand the headroom of the
current skb in place; if that is not possible (the skb is shared), it first
clones the skb and then expands the clone's headroom.
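A minimal userspace sketch of the helper's contract, under simplifying assumptions: struct toy_skb and all toy_* functions are hypothetical, users stands in for the reference count that skb_shared() checks, toy_clone() copies the data (the real skb_clone() shares it), and the SKB_DATA_ALIGN() rounding is omitted. toy_grow_head() plays the role of pskb_expand_head(), and toy_expand_head() mirrors the patch's skb_expand_head(): the caller's reference is always consumed, and the caller must continue with the returned pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct sk_buff (not the kernel type). */
struct toy_skb {
	int users;           /* reference count, cf. skb->users */
	unsigned char *head; /* start of the allocated buffer */
	unsigned char *data; /* start of the packet data */
	size_t len;          /* bytes of packet data */
};

static size_t toy_headroom(const struct toy_skb *skb)
{
	return (size_t)(skb->data - skb->head);
}

static struct toy_skb *toy_alloc(size_t headroom, size_t len)
{
	struct toy_skb *skb = malloc(sizeof(*skb));

	if (!skb)
		return NULL;
	skb->head = calloc(1, headroom + len + 1); /* +1 avoids a 0-byte allocation */
	if (!skb->head) {
		free(skb);
		return NULL;
	}
	skb->users = 1;
	skb->data = skb->head + headroom;
	skb->len = len;
	return skb;
}

/* Drop one reference, freeing on the last one (cf. consume_skb/kfree_skb). */
static void toy_put(struct toy_skb *skb)
{
	if (--skb->users == 0) {
		free(skb->head);
		free(skb);
	}
}

/* Private copy with the same headroom. Simplification: skb_clone() shares data. */
static struct toy_skb *toy_clone(const struct toy_skb *skb)
{
	struct toy_skb *n = toy_alloc(toy_headroom(skb), skb->len);

	if (n)
		memcpy(n->data, skb->data, skb->len);
	return n;
}

/* Grow headroom by delta in place, cf. pskb_expand_head(skb, delta, 0, ...).
 * Only valid on an unshared skb. Returns 0 on success, -1 on failure. */
static int toy_grow_head(struct toy_skb *skb, size_t delta)
{
	size_t headroom = toy_headroom(skb);
	unsigned char *nhead;

	assert(skb->users == 1);
	nhead = calloc(1, headroom + delta + skb->len + 1);
	if (!nhead)
		return -1;
	memcpy(nhead + headroom + delta, skb->data, skb->len);
	free(skb->head);
	skb->head = nhead;
	skb->data = nhead + headroom + delta;
	return 0;
}

/* Contract of the patch's skb_expand_head(): consumes the caller's reference
 * and returns an unshared skb with delta more bytes of headroom, or NULL. */
static struct toy_skb *toy_expand_head(struct toy_skb *skb, size_t delta)
{
	if (skb->users > 1) {
		struct toy_skb *nskb = toy_clone(skb);

		toy_put(skb); /* other holders keep their references */
		if (!nskb)
			return NULL;
		skb = nskb;
	}
	if (toy_grow_head(skb, delta)) {
		toy_put(skb);
		return NULL;
	}
	return skb;
}
```

With a second reference held, the sketch returns a new object and leaves the other holder's copy intact; with a single reference, it grows the buffer in place and returns the same pointer.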

v2 open questions:
- the current helper name, skb_expand_head, is bad
  and should be changed to a better one. Any suggestions?
- proper location for the new helper:
   in net/core/skbuff.c, right below skb_realloc_headroom()?
- is it acceptable to free the original skb inside the helper,
  or should that be left to the caller instead?

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
---
 net/ipv6/ip6_output.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

Comments

David Miller July 9, 2021, 5:58 p.m. UTC | #1
Please do not use inline in foo.c files, let the compiler decide.

Thank you.
Vasily Averin July 10, 2021, 2:53 a.m. UTC | #2
Dear David,
I'm happy to hear from you again.

On 7/9/21 8:58 PM, David Miller wrote:
> Please do not use inline in foo.c files, let the compiler decide.

Thank you for the hint; I did not know that and will follow it next time.
This time I'm going to move this helper elsewhere anyway:
either to net/core/skbuff.c as an exported function, where it will lose inline anyway,
or to include/linux/skbuff.h, where inline seems to be acceptable.

Could you please help me to find better name for this helper?

I would like to change its current name: 'skb_expand_head' looks very similar
to the widely used 'pskb_expand_head' but has different semantics.
I am afraid they could be accidentally confused in the future.

Thank you,
	Vasily Averin.

Patch

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ff4f9eb..6c5f85f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -56,14 +56,48 @@ 
 #include <net/lwtunnel.h>
 #include <net/ip_tunnels.h>
 
+static inline struct sk_buff *skb_expand_head(struct sk_buff *skb, int delta)
+{
+	/* pskb_expand_head() might crash, if skb is shared */
+	if (skb_shared(skb)) {
+		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
+
+		if (likely(nskb)) {
+			if (skb->sk)
+				skb_set_owner_w(nskb, skb->sk);
+			consume_skb(skb);
+		} else {
+			kfree_skb(skb);
+		}
+		skb = nskb;
+	}
+	if (skb &&
+	    pskb_expand_head(skb, SKB_DATA_ALIGN(delta), 0, GFP_ATOMIC)) {
+		kfree_skb(skb);
+		skb = NULL;
+	}
+	return skb;
+}
+
 static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
 	struct net_device *dev = dst->dev;
+	unsigned int hh_len = LL_RESERVED_SPACE(dev);
+	int delta = hh_len - skb_headroom(skb);
 	const struct in6_addr *nexthop;
 	struct neighbour *neigh;
 	int ret;
 
+	/* Be paranoid, rather than too clever. */
+	if (unlikely(delta > 0) && dev->header_ops)
+		skb = skb_expand_head(skb, delta);
+
+	if (!skb) {
+		IP6_INC_STATS(net, ip6_dst_idev(dst), IPSTATS_MIB_OUTDISCARDS);
+		return -ENOMEM;
+	}
+
 	if (ipv6_addr_is_multicast(&ipv6_hdr(skb)->daddr)) {
 		struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));