diff mbox series

[net,2/2] ip6_gre: Fix skb_under_panic in __gre6_xmit()

Message ID 9cd9ca4ac2c19be288cb8734a86eb30e4d9e2050.1649715555.git.peilin.ye@bytedance.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net,1/2] ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit() | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 10 maintainers not CCed: songliubraving@fb.com andrii@kernel.org pabeni@redhat.com daniel@iogearbox.net kafai@fb.com yhs@fb.com bpf@vger.kernel.org john.fastabend@gmail.com kpsingh@kernel.org ast@kernel.org
netdev/build_clang success Errors and warnings before: 2 this patch: 2
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: line length of 93 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Peilin Ye April 11, 2022, 10:33 p.m. UTC
From: Peilin Ye <peilin.ye@bytedance.com>

Feng reported an skb_under_panic BUG triggered by running
test_ip6gretap() in tools/testing/selftests/bpf/test_tunnel.sh:

[   82.492551] skbuff: skb_under_panic: text:ffffffffb268bb8e len:403 put:12 head:ffff9997c5480000 data:ffff9997c547fff8 tail:0x18b end:0x2c0 dev:ip6gretap11
<...>
[   82.607380] Call Trace:
[   82.609389]  <TASK>
[   82.611136]  skb_push.cold.109+0x10/0x10
[   82.614289]  __gre6_xmit+0x41e/0x590
[   82.617169]  ip6gre_tunnel_xmit+0x344/0x3f0
[   82.620526]  dev_hard_start_xmit+0xf1/0x330
[   82.623882]  sch_direct_xmit+0xe4/0x250
[   82.626961]  __dev_queue_xmit+0x720/0xfe0
<...>
[   82.633431]  packet_sendmsg+0x96a/0x1cb0
[   82.636568]  sock_sendmsg+0x30/0x40
<...>

Reproducer:

  OBJ=$LINUX/tools/testing/selftests/bpf/test_tunnel_kern.o

  ip netns add at_ns0
  ip link add veth0 type veth peer name veth1
  ip link set veth0 netns at_ns0
  ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
  ip netns exec at_ns0 ip link set dev veth0 up
  ip link set dev veth1 up mtu 1500
  ip addr add dev veth1 172.16.1.200/24

  ip netns exec at_ns0 ip addr add ::11/96 dev veth0
  ip netns exec at_ns0 ip link set dev veth0 up
  ip addr add dev veth1 ::22/96
  ip link set dev veth1 up

  ip netns exec at_ns0 \
  	ip link add dev ip6gretap00 type ip6gretap seq flowlabel 0xbcdef key 2 \
  	local ::11 remote ::22

  ip netns exec at_ns0 ip addr add dev ip6gretap00 10.1.1.100/24
  ip netns exec at_ns0 ip addr add dev ip6gretap00 fc80::100/96
  ip netns exec at_ns0 ip link set dev ip6gretap00 up

  ip link add dev ip6gretap11 type ip6gretap external
  ip addr add dev ip6gretap11 10.1.1.200/24
  ip addr add dev ip6gretap11 fc80::200/24
  ip link set dev ip6gretap11 up

  tc qdisc add dev ip6gretap11 clsact
  tc filter add dev ip6gretap11 egress bpf da obj $OBJ sec ip6gretap_set_tunnel
  tc filter add dev ip6gretap11 ingress bpf da obj $OBJ sec ip6gretap_get_tunnel

  ping6 -c 3 -w 10 -q ::11

The following sequence of events caused the BUG:

1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is
   calculated based on old flags (see ip6gre_calc_hlen());
2. packet_snd() reserves header room for skb A, assuming
   tunnel->tun_hlen is 4;
3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for
   skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel());
4. __gre6_xmit() detects the new tunnel key, and recalculates
   "tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and
   TUNNEL_SEQ);
5. gre_build_header() calls skb_push() with insufficient reserved header
   room, triggering the BUG.

As sugguested by Cong, fix it by moving the call to skb_cow_head() after
the recalculation of tun_hlen.

Reported-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Co-developed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
---
Hi all,

I couldn't find a proper Fixes: tag for this fix; please comment if you
have any sugguestions.  Thanks!

Peilin Ye

 net/ipv6/ip6_gre.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Jakub Kicinski April 14, 2022, 11:14 a.m. UTC | #1
On Mon, 11 Apr 2022 15:33:00 -0700 Peilin Ye wrote:
> The following sequence of events caused the BUG:
> 
> 1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is
>    calculated based on old flags (see ip6gre_calc_hlen());
> 2. packet_snd() reserves header room for skb A, assuming
>    tunnel->tun_hlen is 4;
> 3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for
>    skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel());
> 4. __gre6_xmit() detects the new tunnel key, and recalculates
>    "tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and
>    TUNNEL_SEQ);
> 5. gre_build_header() calls skb_push() with insufficient reserved header
>    room, triggering the BUG.
> 
> As sugguested by Cong, fix it by moving the call to skb_cow_head() after
> the recalculation of tun_hlen.
> 
> Reported-by: Feng Zhou <zhoufeng.zf@bytedance.com>
> Co-developed-by: Cong Wang <cong.wang@bytedance.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
> ---
> Hi all,
> 
> I couldn't find a proper Fixes: tag for this fix; please comment if you
> have any sugguestions.  Thanks!

What's wrong with

Fixes: 6712abc168eb ("ip6_gre: add ip6 gre and gretap collect_md mode")

?

> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index b43a46449130..976236736146 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -733,9 +733,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
>  	else
>  		fl6->daddr = tunnel->parms.raddr;
>  
> -	if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen))
> -		return -ENOMEM;
> -
>  	/* Push GRE header. */
>  	protocol = (dev->type == ARPHRD_ETHER) ? htons(ETH_P_TEB) : proto;
>  
> @@ -763,6 +760,9 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
>  			(TUNNEL_CSUM | TUNNEL_KEY | TUNNEL_SEQ);

We should also reject using SEQ with collect_md, but that's a separate
issue.

>  		tun_hlen = gre_calc_hlen(flags);
>  
> +		if (skb_cow_head(skb, dev->needed_headroom ?: tun_hlen + tunnel->encap_hlen))
> +			return -ENOMEM;
> +
>  		gre_build_header(skb, tun_hlen,
>  				 flags, protocol,
>  				 tunnel_id_to_key32(tun_info->key.tun_id),
Peilin Ye April 14, 2022, 8:08 p.m. UTC | #2
Hi Jakub,

On Thu, Apr 14, 2022 at 01:14:24PM +0200, Jakub Kicinski wrote:
> On Mon, 11 Apr 2022 15:33:00 -0700 Peilin Ye wrote:
> > I couldn't find a proper Fixes: tag for this fix; please comment if you
> > have any sugguestions.  Thanks!
> 
> What's wrong with
> 
> Fixes: 6712abc168eb ("ip6_gre: add ip6 gre and gretap collect_md mode")
> 
> ?

Thanks!  I will add this in v2 soon.

> > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> > index b43a46449130..976236736146 100644
> > --- a/net/ipv6/ip6_gre.c
> > +++ b/net/ipv6/ip6_gre.c
> > @@ -733,9 +733,6 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
> >  	else
> >  		fl6->daddr = tunnel->parms.raddr;
> >  
> > -	if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen))
> > -		return -ENOMEM;
> > -
> >  	/* Push GRE header. */
> >  	protocol = (dev->type == ARPHRD_ETHER) ? htons(ETH_P_TEB) : proto;
> >  
> > @@ -763,6 +760,9 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
> >  			(TUNNEL_CSUM | TUNNEL_KEY | TUNNEL_SEQ);
> 
> We should also reject using SEQ with collect_md, but that's a separate
> issue.

Could you explain this a bit more?  It seems that commit 77a5196a804e
("gre: add sequence number for collect md mode.") added this
intentionally.

Thanks,
Peilin Ye
Jakub Kicinski April 15, 2022, 5:11 p.m. UTC | #3
On Thu, 14 Apr 2022 13:08:54 -0700 Peilin Ye wrote:
> > We should also reject using SEQ with collect_md, but that's a separate
> > issue.  
> 
> Could you explain this a bit more?  It seems that commit 77a5196a804e
> ("gre: add sequence number for collect md mode.") added this
> intentionally.

Interesting. Maybe a better way of dealing with the problem would be
rejecting SEQ if it's not set on the device itself.

When the device is set up without the SEQ bit enabled it disables Tx
locking (look for LLTX). This means that multiple CPUs can try to do
the tunnel->o_seqno++ in parallel. Not catastrophic but racy for sure.
Peilin Ye April 16, 2022, 6:56 a.m. UTC | #4
On Fri, Apr 15, 2022 at 07:11:33PM +0200, Jakub Kicinski wrote:
> On Thu, 14 Apr 2022 13:08:54 -0700 Peilin Ye wrote:
> > > We should also reject using SEQ with collect_md, but that's a separate
> > > issue.  
> > 
> > Could you explain this a bit more?  It seems that commit 77a5196a804e
> > ("gre: add sequence number for collect md mode.") added this
> > intentionally.
> 
> Interesting. Maybe a better way of dealing with the problem would be
> rejecting SEQ if it's not set on the device itself.

According to ip-link(8), the 'external' option is mutually exclusive
with the '[o]seq' option.  In other words, a collect_md mode IP6GRETAP
device should always have the TUNNEL_SEQ flag off in its
'tunnel->parms.o_flags'.

(However, I just tried:

  $ ip link add dev ip6gretap11 type ip6gretap oseq external
					       ^^^^ ^^^^^^^^
 ...and my 'ip' executed it with no error.  I will take a closer look at
 iproute2 later; maybe it's undefined behavior...)

How about:

1. If 'external', then 'oseq' means "always turn off NETIF_F_LLTX, so
it's okay to set TUNNEL_SEQ in e.g. eBPF";

2. Otherwise, if 'external' but NOT 'oseq', then whenever we see a
TUNNEL_SEQ in skb's tunnel info, we do something like WARN_ONCE() then
return -EINVAL.

?

> When the device is set up without the SEQ bit enabled it disables Tx
> locking (look for LLTX). This means that multiple CPUs can try to do
> the tunnel->o_seqno++ in parallel. Not catastrophic but racy for sure.

Thanks for the explanation!  At first glance, I was wondering why don't
we make 'o_seqno' atomic until I found commit b790e01aee74 ("ip_gre:
lockless xmit").  I quote:

"""
Even using an atomic_t o_seq, we would increase chance for packets being
out of order at receiver.
"""

I don't fully understand this out-of-order yet, but it seems that making
'o_seqno' atomic is not an option?

Thanks,
Peilin Ye
Jakub Kicinski April 16, 2022, 7:33 a.m. UTC | #5
On Fri, 15 Apr 2022 23:56:33 -0700 Peilin Ye wrote:
> On Fri, Apr 15, 2022 at 07:11:33PM +0200, Jakub Kicinski wrote:
> > > Could you explain this a bit more?  It seems that commit 77a5196a804e
> > > ("gre: add sequence number for collect md mode.") added this
> > > intentionally.  
> > 
> > Interesting. Maybe a better way of dealing with the problem would be
> > rejecting SEQ if it's not set on the device itself.  
> 
> According to ip-link(8), the 'external' option is mutually exclusive
> with the '[o]seq' option.  In other words, a collect_md mode IP6GRETAP
> device should always have the TUNNEL_SEQ flag off in its
> 'tunnel->parms.o_flags'.
> 
> (However, I just tried:
> 
>   $ ip link add dev ip6gretap11 type ip6gretap oseq external
> 					       ^^^^ ^^^^^^^^
>  ...and my 'ip' executed it with no error.  I will take a closer look at
>  iproute2 later; maybe it's undefined behavior...)
> 
> How about:
> 
> 1. If 'external', then 'oseq' means "always turn off NETIF_F_LLTX, so
> it's okay to set TUNNEL_SEQ in e.g. eBPF";
> 
> 2. Otherwise, if 'external' but NOT 'oseq', then whenever we see a
> TUNNEL_SEQ in skb's tunnel info, we do something like WARN_ONCE() then
> return -EINVAL.

Maybe pr_warn_once(), no need for a stacktrace.

> > When the device is set up without the SEQ bit enabled it disables Tx
> > locking (look for LLTX). This means that multiple CPUs can try to do
> > the tunnel->o_seqno++ in parallel. Not catastrophic but racy for sure.  
> 
> Thanks for the explanation!  At first glance, I was wondering why don't
> we make 'o_seqno' atomic until I found commit b790e01aee74 ("ip_gre:
> lockless xmit").  I quote:
> 
> """
> Even using an atomic_t o_seq, we would increase chance for packets being
> out of order at receiver.
> """
> 
> I don't fully understand this out-of-order yet, but it seems that making
> 'o_seqno' atomic is not an option?

atomic_t would also work (if it has enough bits). Whatever is simplest
TBH. It's just about correctness, I don't think seq is widely used.

Thanks!
Peilin Ye April 16, 2022, 8:16 a.m. UTC | #6
On Sat, Apr 16, 2022 at 09:33:20AM +0200, Jakub Kicinski wrote:
> On Fri, 15 Apr 2022 23:56:33 -0700 Peilin Ye wrote:
> > On Fri, Apr 15, 2022 at 07:11:33PM +0200, Jakub Kicinski wrote:
> > > > Could you explain this a bit more?  It seems that commit 77a5196a804e
> > > > ("gre: add sequence number for collect md mode.") added this
> > > > intentionally.  
> > > 
> > > Interesting. Maybe a better way of dealing with the problem would be
> > > rejecting SEQ if it's not set on the device itself.  
> > 
> > According to ip-link(8), the 'external' option is mutually exclusive
> > with the '[o]seq' option.  In other words, a collect_md mode IP6GRETAP
> > device should always have the TUNNEL_SEQ flag off in its
> > 'tunnel->parms.o_flags'.
> > 
> > (However, I just tried:
> > 
> >   $ ip link add dev ip6gretap11 type ip6gretap oseq external
> > 					       ^^^^ ^^^^^^^^
> >  ...and my 'ip' executed it with no error.  I will take a closer look at
> >  iproute2 later; maybe it's undefined behavior...)
> > 
> > How about:
> > 
> > 1. If 'external', then 'oseq' means "always turn off NETIF_F_LLTX, so
> > it's okay to set TUNNEL_SEQ in e.g. eBPF";
> > 
> > 2. Otherwise, if 'external' but NOT 'oseq', then whenever we see a
> > TUNNEL_SEQ in skb's tunnel info, we do something like WARN_ONCE() then
> > return -EINVAL.
> 
> Maybe pr_warn_once(), no need for a stacktrace.

Ah, thanks, coffee needed...

> > > When the device is set up without the SEQ bit enabled it disables Tx
> > > locking (look for LLTX). This means that multiple CPUs can try to do
> > > the tunnel->o_seqno++ in parallel. Not catastrophic but racy for sure.  
> > 
> > Thanks for the explanation!  At first glance, I was wondering why don't
> > we make 'o_seqno' atomic until I found commit b790e01aee74 ("ip_gre:
> > lockless xmit").  I quote:
> > 
> > """
> > Even using an atomic_t o_seq, we would increase chance for packets being
> > out of order at receiver.
> > """
> > 
> > I don't fully understand this out-of-order yet, but it seems that making
> > 'o_seqno' atomic is not an option?
> 
> atomic_t would also work (if it has enough bits). Whatever is simplest
> TBH. It's just about correctness, I don't think seq is widely used.

I see, I will work on this, thanks!

Peilin Ye
diff mbox series

Patch

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index b43a46449130..976236736146 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -733,9 +733,6 @@  static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
 	else
 		fl6->daddr = tunnel->parms.raddr;
 
-	if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen))
-		return -ENOMEM;
-
 	/* Push GRE header. */
 	protocol = (dev->type == ARPHRD_ETHER) ? htons(ETH_P_TEB) : proto;
 
@@ -763,6 +760,9 @@  static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
 			(TUNNEL_CSUM | TUNNEL_KEY | TUNNEL_SEQ);
 		tun_hlen = gre_calc_hlen(flags);
 
+		if (skb_cow_head(skb, dev->needed_headroom ?: tun_hlen + tunnel->encap_hlen))
+			return -ENOMEM;
+
 		gre_build_header(skb, tun_hlen,
 				 flags, protocol,
 				 tunnel_id_to_key32(tun_info->key.tun_id),
@@ -773,6 +773,9 @@  static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
 		if (tunnel->parms.o_flags & TUNNEL_SEQ)
 			tunnel->o_seqno++;
 
+		if (skb_cow_head(skb, dev->needed_headroom ?: tunnel->hlen))
+			return -ENOMEM;
+
 		gre_build_header(skb, tunnel->tun_hlen, tunnel->parms.o_flags,
 				 protocol, tunnel->parms.o_key,
 				 htonl(tunnel->o_seqno));