Message ID | 20210218123053.2239986-1-Jason@zx2c4.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: icmp: zero-out cb in icmp{,v6}_ndo_send before sending | expand |
On Thu, Feb 18, 2021 at 7:31 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > The icmp{,v6}_send functions make all sorts of use of skb->cb, assuming Indeed that also casts skb->cb, to read IP6CB(skb)->iif, good catch. Still, might be good to more precisely detail the relevant bug: icmp_send casts the cb to an option struct. __icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt); which is referenced to parse headers by __ip_options_echo, copying data into stack allocated icmp_param and so overwriting the stack frame. > the skb to have come directly from the inet layer. But when the packet > comes from the ndo layer, especially when forwarded, there's no telling > what might be in skb->cb at that point. So, icmp{,v6}_ndo_send must zero > out its skb->cb before passing the packet off to icmp{,v6}_send. > Otherwise the icmp sending code risks reading bogus memory contents, > which can result in nasty stack overflows such as this one reported by a > user: > > panic+0x108/0x2ea > __stack_chk_fail+0x14/0x20 > __icmp_send+0x5bd/0x5c0 > icmp_ndo_send+0x148/0x160 > > This is easy to simulate by doing a `memset(skb->cb, 0x41, > sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by > good fortune and the rarity of icmp sending from that context that we've > avoided reports like this until now. For example, in KASAN: > > BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0 > Write of size 38 at addr ffff888006f1f80e by task ping/89 > CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5 > Call Trace: > dump_stack+0x9a/0xcc > print_address_description.constprop.0+0x1a/0x160 > __kasan_report.cold+0x20/0x38 > kasan_report+0x32/0x40 > check_memory_region+0x145/0x1a0 > memcpy+0x39/0x60 > __ip_options_echo+0xa0e/0x12b0 > __icmp_send+0x744/0x1700 > > Actually, out of the 4 drivers that do this, only gtp zeroed the cb for > the v4 case, while the rest did not. So this commit actually removes the > gtp-specific zeroing, while putting the code where it belongs in the > shared infrastructure of icmp{,v6}_ndo_send. > > Fixes: a2b78e9b2cac ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs") This is from looking at all the callers of icmp{,v6}_ndo_send. If you look at the callers of icmp{,v6}_send there are even a couple more. Such as ipoib_cm_skb_reap (which memsets), clip_neigh_error (which doesn't), various tunnel devices (which live under net/ipv4, but are called as .ndo_start_xmit downstream from, e.g., segmentation (SKB_GSO_CB). Which are fixed (all?) in commit 5146d1f15112 ("tunnel: Clear IPCB(skb)->opt before dst_link_failure called"). Might be even better to do the memset in __icmp_send/icmp6_send, rather than in the wrapper. What do you think? > Reported-by: SinYu <liuxyon@gmail.com> > Cc: Willem de Bruijn <willemb@google.com> > Link: https://lore.kernel.org/netdev/CAF=yD-LOF116aHub6RMe8vB8ZpnrrnoTdqhobEx+bvoA8AsP0w@mail.gmail.com/T/ > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> > --- > drivers/net/gtp.c | 1 - > include/linux/icmpv6.h | 6 +++++- > include/net/icmp.h | 6 +++++- > net/ipv4/icmp.c | 2 ++ > net/ipv6/ip6_icmp.c | 2 ++ > 5 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c > index 4c04e271f184..fd3c2d86e48b 100644 > --- a/drivers/net/gtp.c > +++ b/drivers/net/gtp.c > @@ -539,7 +539,6 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev, > if (!skb_is_gso(skb) && (iph->frag_off & htons(IP_DF)) && > mtu < ntohs(iph->tot_len)) { > netdev_dbg(dev, "packet too big, fragmentation needed\n"); > - memset(IPCB(skb), 0, sizeof(*IPCB(skb))); > icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, > htonl(mtu)); > goto err_rt; > diff --git a/include/linux/icmpv6.h b/include/linux/icmpv6.h > index 1b3371ae8193..87d434fc98a3 100644 > --- a/include/linux/icmpv6.h > +++ b/include/linux/icmpv6.h > @@ -45,7 +45,11 @@ int ip6_err_gen_icmpv6_unreach(struct sk_buff *skb, int nhs, int type, > #if IS_ENABLED(CONFIG_NF_NAT) > void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info); > #else > -#define icmpv6_ndo_send icmpv6_send > +static inline void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info) > +{ > + memset(skb_in->cb, 0, sizeof(skb_in->cb)); > + icmpv6_send(skb_in, type, code, info); > +} > #endif > > #else > diff --git a/include/net/icmp.h b/include/net/icmp.h > index 9ac2d2672a93..4bb404c9abc8 100644 > --- a/include/net/icmp.h > +++ b/include/net/icmp.h > @@ -46,7 +46,11 @@ static inline void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 > #if IS_ENABLED(CONFIG_NF_NAT) > void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info); > #else > -#define icmp_ndo_send icmp_send > +static inline void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info) > +{ > + memset(skb_in->cb, 0, sizeof(skb_in->cb)); as in the line removed in gtp_build_skb_ip4, it would be sufficient to memset sizeof(*IPCB(skb)). I don't know if you chose to clear the full 40B on purpose. > + icmp_send(skb_in, type, code, info); > +} > #endif > > int icmp_rcv(struct sk_buff *skb); > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c > index 396b492c804f..ecf080532291 100644 > --- a/net/ipv4/icmp.c > +++ b/net/ipv4/icmp.c > @@ -781,6 +781,7 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info) > > ct = nf_ct_get(skb_in, &ctinfo); > if (!ct || !(ct->status & IPS_SRC_NAT)) { > + memset(skb_in->cb, 0, sizeof(skb_in->cb)); > icmp_send(skb_in, type, code, info); > return; > } > @@ -796,6 +797,7 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info) > > orig_ip = ip_hdr(skb_in)->saddr; > ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip; > + memset(skb_in->cb, 0, sizeof(skb_in->cb)); > icmp_send(skb_in, type, code, info); > ip_hdr(skb_in)->saddr = orig_ip; > out: > diff --git a/net/ipv6/ip6_icmp.c b/net/ipv6/ip6_icmp.c > index 70c8c2f36c98..ddc28be8a65d 100644 > --- a/net/ipv6/ip6_icmp.c > +++ b/net/ipv6/ip6_icmp.c > @@ -57,6 +57,7 @@ void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info) > > ct = nf_ct_get(skb_in, &ctinfo); > if (!ct || !(ct->status & IPS_SRC_NAT)) { > + memset(skb_in->cb, 0, sizeof(skb_in->cb)); > icmpv6_send(skb_in, type, code, info); > return; > } > @@ -72,6 +73,7 @@ void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info) > > orig_ip = ipv6_hdr(skb_in)->saddr; > ipv6_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.in6; > + memset(skb_in->cb, 0, sizeof(skb_in->cb)); > icmpv6_send(skb_in, type, code, info); > ipv6_hdr(skb_in)->saddr = orig_ip; > out: > -- > 2.30.1 >
Hi Willem, On Thu, Feb 18, 2021 at 3:57 PM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > On Thu, Feb 18, 2021 at 7:31 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > > > The icmp{,v6}_send functions make all sorts of use of skb->cb, assuming > > Indeed that also casts skb->cb, to read IP6CB(skb)->iif, good catch. > > Still, might be good to more precisely detail the relevant bug: > icmp_send casts the cb to an option struct. > > __icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt); > > which is referenced to parse headers by __ip_options_echo, copying > data into stack allocated icmp_param and so overwriting the stack > frame. The other way to fix this bug would be to just make icmp_ndo_send call __icmp_send with an zeored stack-allocated ip_options, rather than calling icmp_send which calls __icmp_send with the IPCB one. The implementation of this is very easy, and that's what I did at first, until I noticed that the v6 side would require a little bit more plumbing to do right. But, I can go ahead and do that, if you think that's the better strategy. > This is from looking at all the callers of icmp{,v6}_ndo_send. > > If you look at the callers of icmp{,v6}_send there are even a couple > more. Such as ipoib_cm_skb_reap (which memsets), clip_neigh_error > (which doesn't), various tunnel devices (which live under net/ipv4, > but are called as .ndo_start_xmit downstream from, e.g., segmentation > (SKB_GSO_CB). Which are fixed (all?) in commit 5146d1f15112 > ("tunnel: Clear IPCB(skb)->opt before dst_link_failure called"). > > Might be even better to do the memset in __icmp_send/icmp6_send, > rather than in the wrapper. What do you think? I don't think memsetting from icmp_send itself is a good idea, since most callers of that are actually from the inet layer, where it makes sense to look at IPCB. Other callers, from the ndo layer, should be using the icmp_ndo_send helper instead. Or am I confused? If there are places that are using icmp_send from ndo_start_xmit, that's a problem that should be fixed, with those uses swapped for icmp_ndo_send. Jason
On Thu, Feb 18, 2021 at 10:40 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Hi Willem, > > On Thu, Feb 18, 2021 at 3:57 PM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > On Thu, Feb 18, 2021 at 7:31 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > > > > > The icmp{,v6}_send functions make all sorts of use of skb->cb, assuming > > > > Indeed that also casts skb->cb, to read IP6CB(skb)->iif, good catch. > > > > Still, might be good to more precisely detail the relevant bug: > > icmp_send casts the cb to an option struct. > > > > __icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt); > > > > which is referenced to parse headers by __ip_options_echo, copying > > data into stack allocated icmp_param and so overwriting the stack > > frame. > > The other way to fix this bug would be to just make icmp_ndo_send call > __icmp_send with an zeored stack-allocated ip_options, rather than > calling icmp_send which calls __icmp_send with the IPCB one. The > implementation of this is very easy, and that's what I did at first, > until I noticed that the v6 side would require a little bit more > plumbing to do right. But, I can go ahead and do that, if you think > that's the better strategy. Thanks for that. It does seem to add more code change that we'd like for stable backports. > > This is from looking at all the callers of icmp{,v6}_ndo_send. > > > > If you look at the callers of icmp{,v6}_send there are even a couple > > more. Such as ipoib_cm_skb_reap (which memsets), clip_neigh_error > > (which doesn't), various tunnel devices (which live under net/ipv4, > > but are called as .ndo_start_xmit downstream from, e.g., segmentation > > (SKB_GSO_CB). Which are fixed (all?) in commit 5146d1f15112 > > ("tunnel: Clear IPCB(skb)->opt before dst_link_failure called"). > > > > Might be even better to do the memset in __icmp_send/icmp6_send, > > rather than in the wrapper. What do you think? > > I don't think memsetting from icmp_send itself is a good idea, since > most callers of that are actually from the inet layer, where it makes > sense to look at IPCB. Other callers, from the ndo layer, should be > using the icmp_ndo_send helper instead. Or am I confused? > > If there are places that are using icmp_send from ndo_start_xmit, > that's a problem that should be fixed, with those uses swapped for > icmp_ndo_send. I missed this response earlier (two inboxes). Agreed. Sorry that I didn't reply before your v2.
On Thu, 18 Feb 2021 13:30:53 +0100 Jason A. Donenfeld wrote: > The icmp{,v6}_send functions make all sorts of use of skb->cb, assuming > the skb to have come directly from the inet layer. But when the packet > comes from the ndo layer, especially when forwarded, there's no telling > what might be in skb->cb at that point. So, icmp{,v6}_ndo_send must zero > out its skb->cb before passing the packet off to icmp{,v6}_send. > Otherwise the icmp sending code risks reading bogus memory contents, > which can result in nasty stack overflows such as this one reported by a > user: > > panic+0x108/0x2ea > __stack_chk_fail+0x14/0x20 > __icmp_send+0x5bd/0x5c0 > icmp_ndo_send+0x148/0x160 > > This is easy to simulate by doing a `memset(skb->cb, 0x41, > sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by > good fortune and the rarity of icmp sending from that context that we've > avoided reports like this until now. For example, in KASAN: > > BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0 > Write of size 38 at addr ffff888006f1f80e by task ping/89 > CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5 > Call Trace: > dump_stack+0x9a/0xcc > print_address_description.constprop.0+0x1a/0x160 > __kasan_report.cold+0x20/0x38 > kasan_report+0x32/0x40 > check_memory_region+0x145/0x1a0 > memcpy+0x39/0x60 > __ip_options_echo+0xa0e/0x12b0 > __icmp_send+0x744/0x1700 > > Actually, out of the 4 drivers that do this, only gtp zeroed the cb for > the v4 case, while the rest did not. So this commit actually removes the > gtp-specific zeroing, while putting the code where it belongs in the > shared infrastructure of icmp{,v6}_ndo_send. > > Fixes: a2b78e9b2cac ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs") nit: please make sure you CC the authors of the commits you're blaming.
On Thu, Feb 18, 2021 at 8:08 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 18 Feb 2021 13:30:53 +0100 Jason A. Donenfeld wrote: > > The icmp{,v6}_send functions make all sorts of use of skb->cb, assuming > > the skb to have come directly from the inet layer. But when the packet > > comes from the ndo layer, especially when forwarded, there's no telling > > what might be in skb->cb at that point. So, icmp{,v6}_ndo_send must zero > > out its skb->cb before passing the packet off to icmp{,v6}_send. > > Otherwise the icmp sending code risks reading bogus memory contents, > > which can result in nasty stack overflows such as this one reported by a > > user: > > > > panic+0x108/0x2ea > > __stack_chk_fail+0x14/0x20 > > __icmp_send+0x5bd/0x5c0 > > icmp_ndo_send+0x148/0x160 > > > > This is easy to simulate by doing a `memset(skb->cb, 0x41, > > sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by > > good fortune and the rarity of icmp sending from that context that we've > > avoided reports like this until now. For example, in KASAN: > > > > BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0 > > Write of size 38 at addr ffff888006f1f80e by task ping/89 > > CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5 > > Call Trace: > > dump_stack+0x9a/0xcc > > print_address_description.constprop.0+0x1a/0x160 > > __kasan_report.cold+0x20/0x38 > > kasan_report+0x32/0x40 > > check_memory_region+0x145/0x1a0 > > memcpy+0x39/0x60 > > __ip_options_echo+0xa0e/0x12b0 > > __icmp_send+0x744/0x1700 > > > > Actually, out of the 4 drivers that do this, only gtp zeroed the cb for > > the v4 case, while the rest did not. So this commit actually removes the > > gtp-specific zeroing, while putting the code where it belongs in the > > shared infrastructure of icmp{,v6}_ndo_send. > > > > Fixes: a2b78e9b2cac ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs") > > nit: please make sure you CC the authors of the commits you're blaming. Will do. Though in this case, it's behavior that's a few places, so I put the git commit of the earliest case, to aid with backporting. (This email is a reply to v1, but please check out v2.) Jason
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index 4c04e271f184..fd3c2d86e48b 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -539,7 +539,6 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev, if (!skb_is_gso(skb) && (iph->frag_off & htons(IP_DF)) && mtu < ntohs(iph->tot_len)) { netdev_dbg(dev, "packet too big, fragmentation needed\n"); - memset(IPCB(skb), 0, sizeof(*IPCB(skb))); icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu)); goto err_rt; diff --git a/include/linux/icmpv6.h b/include/linux/icmpv6.h index 1b3371ae8193..87d434fc98a3 100644 --- a/include/linux/icmpv6.h +++ b/include/linux/icmpv6.h @@ -45,7 +45,11 @@ int ip6_err_gen_icmpv6_unreach(struct sk_buff *skb, int nhs, int type, #if IS_ENABLED(CONFIG_NF_NAT) void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info); #else -#define icmpv6_ndo_send icmpv6_send +static inline void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info) +{ + memset(skb_in->cb, 0, sizeof(skb_in->cb)); + icmpv6_send(skb_in, type, code, info); +} #endif #else diff --git a/include/net/icmp.h b/include/net/icmp.h index 9ac2d2672a93..4bb404c9abc8 100644 --- a/include/net/icmp.h +++ b/include/net/icmp.h @@ -46,7 +46,11 @@ static inline void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 #if IS_ENABLED(CONFIG_NF_NAT) void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info); #else -#define icmp_ndo_send icmp_send +static inline void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info) +{ + memset(skb_in->cb, 0, sizeof(skb_in->cb)); + icmp_send(skb_in, type, code, info); +} #endif int icmp_rcv(struct sk_buff *skb); diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 396b492c804f..ecf080532291 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -781,6 +781,7 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info) ct = nf_ct_get(skb_in, &ctinfo); if (!ct || !(ct->status & IPS_SRC_NAT)) { + memset(skb_in->cb, 0, sizeof(skb_in->cb)); icmp_send(skb_in, type, code, info); return; } @@ -796,6 +797,7 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info) orig_ip = ip_hdr(skb_in)->saddr; ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip; + memset(skb_in->cb, 0, sizeof(skb_in->cb)); icmp_send(skb_in, type, code, info); ip_hdr(skb_in)->saddr = orig_ip; out: diff --git a/net/ipv6/ip6_icmp.c b/net/ipv6/ip6_icmp.c index 70c8c2f36c98..ddc28be8a65d 100644 --- a/net/ipv6/ip6_icmp.c +++ b/net/ipv6/ip6_icmp.c @@ -57,6 +57,7 @@ void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info) ct = nf_ct_get(skb_in, &ctinfo); if (!ct || !(ct->status & IPS_SRC_NAT)) { + memset(skb_in->cb, 0, sizeof(skb_in->cb)); icmpv6_send(skb_in, type, code, info); return; } @@ -72,6 +73,7 @@ void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info) orig_ip = ipv6_hdr(skb_in)->saddr; ipv6_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.in6; + memset(skb_in->cb, 0, sizeof(skb_in->cb)); icmpv6_send(skb_in, type, code, info); ipv6_hdr(skb_in)->saddr = orig_ip; out:
The icmp{,v6}_send functions make all sorts of use of skb->cb, assuming the skb to have come directly from the inet layer. But when the packet comes from the ndo layer, especially when forwarded, there's no telling what might be in skb->cb at that point. So, icmp{,v6}_ndo_send must zero out its skb->cb before passing the packet off to icmp{,v6}_send. Otherwise the icmp sending code risks reading bogus memory contents, which can result in nasty stack overflows such as this one reported by a user: panic+0x108/0x2ea __stack_chk_fail+0x14/0x20 __icmp_send+0x5bd/0x5c0 icmp_ndo_send+0x148/0x160 This is easy to simulate by doing a `memset(skb->cb, 0x41, sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by good fortune and the rarity of icmp sending from that context that we've avoided reports like this until now. For example, in KASAN: BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0 Write of size 38 at addr ffff888006f1f80e by task ping/89 CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5 Call Trace: dump_stack+0x9a/0xcc print_address_description.constprop.0+0x1a/0x160 __kasan_report.cold+0x20/0x38 kasan_report+0x32/0x40 check_memory_region+0x145/0x1a0 memcpy+0x39/0x60 __ip_options_echo+0xa0e/0x12b0 __icmp_send+0x744/0x1700 Actually, out of the 4 drivers that do this, only gtp zeroed the cb for the v4 case, while the rest did not. So this commit actually removes the gtp-specific zeroing, while putting the code where it belongs in the shared infrastructure of icmp{,v6}_ndo_send. Fixes: a2b78e9b2cac ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs") Reported-by: SinYu <liuxyon@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/netdev/CAF=yD-LOF116aHub6RMe8vB8ZpnrrnoTdqhobEx+bvoA8AsP0w@mail.gmail.com/T/ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> --- drivers/net/gtp.c | 1 - include/linux/icmpv6.h | 6 +++++- include/net/icmp.h | 6 +++++- net/ipv4/icmp.c | 2 ++ net/ipv6/ip6_icmp.c | 2 ++ 5 files changed, 14 insertions(+), 3 deletions(-)