Message ID | 20241111123915.3879488-1-dongchenchen2@huawei.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: Fix icmp host relookup triggering ip_rt_bug | expand |
On Mon, Nov 11, 2024 at 08:39:15PM +0800, Dong Chenchen wrote: > arp link failure may trigger ip_rt_bug while xfrm enabled, call trace is: > > WARNING: CPU: 0 PID: 0 at net/ipv4/route.c:1241 ip_rt_bug+0x14/0x20 > Modules linked in: > CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc6-00077-g2e1b3cc9d7f7 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 > RIP: 0010:ip_rt_bug+0x14/0x20 > Call Trace: > <IRQ> > ip_send_skb+0x14/0x40 > __icmp_send+0x42d/0x6a0 > ipv4_link_failure+0xe2/0x1d0 > arp_error_report+0x3c/0x50 > neigh_invalidate+0x8d/0x100 > neigh_timer_handler+0x2e1/0x330 > call_timer_fn+0x21/0x120 > __run_timer_base.part.0+0x1c9/0x270 > run_timer_softirq+0x4c/0x80 > handle_softirqs+0xac/0x280 > irq_exit_rcu+0x62/0x80 > sysvec_apic_timer_interrupt+0x77/0x90 > > The script below reproduces this scenario: > ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 \ > dir out priority 0 ptype main flag localok icmp > ip l a veth1 type veth > ip a a 192.168.141.111/24 dev veth0 > ip l s veth0 up > ping 192.168.141.155 -c 1 > > icmp_route_lookup() create input routes for locally generated packets > while xfrm relookup ICMP traffic.Then it will set input route > (dst->out = ip_rt_bug) to skb for DESTUNREACH. > > Similar to commit ed6e4ef836d4("netfilter: Fix ip_route_me_harder > triggering ip_rt_bug"), avoid creating input routes with > icmp_route_lookup() to fix it. > > Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support") > Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> > --- > net/ipv4/icmp.c | 35 ++++++++++------------------------- > 1 file changed, 10 insertions(+), 25 deletions(-) > > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c > index e1384e7331d8..11ef4eb5b659 100644 > --- a/net/ipv4/icmp.c > +++ b/net/ipv4/icmp.c > @@ -490,6 +490,7 @@ static struct rtable *icmp_route_lookup(struct net *net, > struct dst_entry *dst, *dst2; > struct rtable *rt, *rt2; > struct flowi4 fl4_dec; > + unsigned int addr_type; > int err; > > memset(fl4, 0, sizeof(*fl4)); > @@ -528,31 +529,15 @@ static struct rtable *icmp_route_lookup(struct net *net, > if (err) > goto relookup_failed; > > - if (inet_addr_type_dev_table(net, route_lookup_dev, > - fl4_dec.saddr) == RTN_LOCAL) { > - rt2 = __ip_route_output_key(net, &fl4_dec); > - if (IS_ERR(rt2)) > - err = PTR_ERR(rt2); > - } else { So you're saying that your packet triggered the else branch? That I think is the bug here, not the input route lookup which is the whole purpose of the original patch (simulate an input route lookup in order to determine the correct policy). AFAIK the packet that triggered the crash has a source address of 192.168.141.111, which should definitely be local. So why is the RTN_LOCAL check failing? Cheers,
On 2024/11/12 10:10, Herbert Xu wrote: > On Mon, Nov 11, 2024 at 08:39:15PM +0800, Dong Chenchen wrote: >> arp link failure may trigger ip_rt_bug while xfrm enabled, call trace is: >> >> WARNING: CPU: 0 PID: 0 at net/ipv4/route.c:1241 ip_rt_bug+0x14/0x20 >> Modules linked in: >> CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc6-00077-g2e1b3cc9d7f7 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >> BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 >> RIP: 0010:ip_rt_bug+0x14/0x20 >> Call Trace: >> <IRQ> >> ip_send_skb+0x14/0x40 >> __icmp_send+0x42d/0x6a0 >> ipv4_link_failure+0xe2/0x1d0 >> arp_error_report+0x3c/0x50 >> neigh_invalidate+0x8d/0x100 >> neigh_timer_handler+0x2e1/0x330 >> call_timer_fn+0x21/0x120 >> __run_timer_base.part.0+0x1c9/0x270 >> run_timer_softirq+0x4c/0x80 >> handle_softirqs+0xac/0x280 >> irq_exit_rcu+0x62/0x80 >> sysvec_apic_timer_interrupt+0x77/0x90 >> >> The script below reproduces this scenario: >> ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 \ >> dir out priority 0 ptype main flag localok icmp >> ip l a veth1 type veth >> ip a a 192.168.141.111/24 dev veth0 >> ip l s veth0 up >> ping 192.168.141.155 -c 1 >> >> icmp_route_lookup() create input routes for locally generated packets >> while xfrm relookup ICMP traffic.Then it will set input route >> (dst->out = ip_rt_bug) to skb for DESTUNREACH. >> >> Similar to commit ed6e4ef836d4("netfilter: Fix ip_route_me_harder >> triggering ip_rt_bug"), avoid creating input routes with >> icmp_route_lookup() to fix it. >> >> Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support") >> Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> >> --- >> net/ipv4/icmp.c | 35 ++++++++++------------------------- >> 1 file changed, 10 insertions(+), 25 deletions(-) >> >> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c >> index e1384e7331d8..11ef4eb5b659 100644 >> --- a/net/ipv4/icmp.c >> +++ b/net/ipv4/icmp.c >> @@ -490,6 +490,7 @@ static struct rtable *icmp_route_lookup(struct net *net, >> struct dst_entry *dst, *dst2; >> struct rtable *rt, *rt2; >> struct flowi4 fl4_dec; >> + unsigned int addr_type; >> int err; >> >> memset(fl4, 0, sizeof(*fl4)); >> @@ -528,31 +529,15 @@ static struct rtable *icmp_route_lookup(struct net *net, >> if (err) >> goto relookup_failed; >> >> - if (inet_addr_type_dev_table(net, route_lookup_dev, >> - fl4_dec.saddr) == RTN_LOCAL) { >> - rt2 = __ip_route_output_key(net, &fl4_dec); >> - if (IS_ERR(rt2)) >> - err = PTR_ERR(rt2); >> - } else { > So you're saying that your packet triggered the else branch? > > That I think is the bug here, not the input route lookup which > is the whole purpose of the original patch (simulate an input route > lookup in order to determine the correct policy). > > AFAIK the packet that triggered the crash has a source address > of 192.168.141.111, which should definitely be local. So why > is the RTN_LOCAL check failing? > > Cheers, Hi,Herbert. Thanks for your review! As described in rfc4301: "The implementation extracts the header for the packet that triggered the error (from the ICMP message payload), reverses the source and destination IP address fields, extracts the protocol field, and reverses the port fields (if accessible). It then uses this extracted information to locate an appropriate, active outbound SA, and transmits the error message via this SA" icmp_route_lookup reverse IP address fields of skb_in skb_in: A -> B icmp_err: B -> A icmp_route_lookup xfrm_decode_session_reverse(net, skb_in, flowi4_to_flowi(&fl4_dec), AF_INET); //fl4_dec: saddr=B, daddr=A If skb_in is outbound, fl4_dec.saddr is not nolocal. It may be no input route from B to A for first communication. If skb_in is inbound, fl4_dec.saddr is rtn_local. cheers!
On Tue, Nov 12, 2024 at 11:42:47AM +0800, dongchenchen (A) wrote: > > If skb_in is outbound, fl4_dec.saddr is not nolocal. It may be no input > route from B to A for > > first communication. You're right. So the problem here is that for the case of A being local, we should not be taking the ip_route_input code path (this is intended for forwarded packets). In fact if A is local, and we're sending an ICMP message to A, then perhaps we could skip the IPsec lookup completely and just do normal routing? Steffen, what do you think? So I think it boils down to two choices: 1) We could simply drop IPsec processing if we notice that A (fl.fl4_dst) is local; 2) Or we could take the __ip_route_output_key code path and continue to do the xfrm lookup when A is local. Thanks,
>> If skb_in is outbound, fl4_dec.saddr is not nolocal. It may be no input >> route from B to A for >> >> first communication. > You're right. So the problem here is that for the case of A > being local, we should not be taking the ip_route_input code > path (this is intended for forwarded packets). > > In fact if A is local, and we're sending an ICMP message to A, > then perhaps we could skip the IPsec lookup completely and just > do normal routing? > > Steffen, what do you think? > > So I think it boils down to two choices: > > 1) We could simply drop IPsec processing if we notice that A > (fl.fl4_dst) is local; Hi, Herbert! Thanks for your suggestions very much. maybe we can fix it as below: diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index e1384e7331d8..5f63c4dde4e9 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -517,7 +517,9 @@ static struct rtable *icmp_route_lookup(struct net *net, flowi4_to_flowi(fl4), NULL, 0); rt = dst_rtable(dst); if (!IS_ERR(dst)) { - if (rt != rt2) + unsigned int addr_type = inet_addr_type_dev_table(net, route_lookup_dev, fl4->daddr); + if (rt != rt2 || addr_type == RTN_LOCAL) return rt; } else if (PTR_ERR(dst) == -EPERM) { rt = NULL; > 2) Or we could take the __ip_route_output_key code path and > continue to do the xfrm lookup when A is local. If only skip input route lookup as below, xfrm_lookup check may return ENOENT for no xfrm pol or dst nofrm flag. @@ -543,6 +544,10 @@ static struct rtable *icmp_route_lookup(struct net *net, err = PTR_ERR(rt2); goto relookup_failed; } + + unsigned int addr_type = inet_addr_type_dev_table(net, route_lookup_dev, fl4->daddr); + if (addr_type == RTN_LOCAL) + goto relookup; /* Ugh! */ orefdst = skb_in->_skb_refdst; /* save old refdst */ xfrm_lookup xfrm_lookup_with_ifid if (!if_id && ((dst_orig->flags & DST_NOXFRM) || !net->xfrm.policy_count[XFRM_POLICY_OUT])) goto nopol; Thanks a lot! Dong Chenchen > Thanks,
On 2024/11/13 22:13, dongchenchen (A) wrote: >>> If skb_in is outbound, fl4_dec.saddr is not nolocal. It may be no input >>> route from B to A for >>> >>> first communication. >> You're right. So the problem here is that for the case of A >> being local, we should not be taking the ip_route_input code >> path (this is intended for forwarded packets). >> >> In fact if A is local, and we're sending an ICMP message to A, >> then perhaps we could skip the IPsec lookup completely and just >> do normal routing? >> >> Steffen, what do you think? >> >> So I think it boils down to two choices: >> >> 1) We could simply drop IPsec processing if we notice that A >> (fl.fl4_dst) is local; > > Hi, Herbert! Thanks for your suggestions very much. > > maybe we can fix it as below: > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c > index e1384e7331d8..5f63c4dde4e9 100644 > --- a/net/ipv4/icmp.c > +++ b/net/ipv4/icmp.c > @@ -517,7 +517,9 @@ static struct rtable *icmp_route_lookup(struct net > *net, > flowi4_to_flowi(fl4), NULL, 0); > rt = dst_rtable(dst); > if (!IS_ERR(dst)) { > - if (rt != rt2) > + unsigned int addr_type = inet_addr_type_dev_table(net, > route_lookup_dev, fl4->daddr); > + if (rt != rt2 || addr_type == RTN_LOCAL) > return rt; > } else if (PTR_ERR(dst) == -EPERM) { > rt = NULL; >> 2) Or we could take the __ip_route_output_key code path and >> continue to do the xfrm lookup when A is local. sorry, resend it due to format problems If only skip input route lookup as below, xfrm_lookup check may return ENOENT for no xfrm pol or dst nofrm flag. @@ -543,6 +544,10 @@ static struct rtable *icmp_route_lookup(struct net *net, err = PTR_ERR(rt2); goto relookup_failed; } + + unsigned int addr_type = inet_addr_type_dev_table(net, route_lookup_dev, fl4->daddr); + if (addr_type == RTN_LOCAL) + goto relookup; /* Ugh! */ orefdst = skb_in->_skb_refdst; /* save old refdst */ xfrm_lookup xfrm_lookup_with_ifid if (!if_id && ((dst_orig->flags & DST_NOXFRM) || !net->xfrm.policy_count[XFRM_POLICY_OUT])) goto nopol; Thanks a lot! Dong Chenchen > If only skip input route lookup as below, xfrm_lookup check may return > ENOENT > for no xfrm pol or dst nofrm flag. > > @@ -543,6 +544,10 @@ static struct rtable *icmp_route_lookup(struct > net *net, err = PTR_ERR(rt2); goto relookup_failed; } + + unsigned int > addr_type = inet_addr_type_dev_table(net, route_lookup_dev, > fl4->daddr); + if (addr_type == RTN_LOCAL) + goto relookup; /* Ugh! */ > orefdst = skb_in->_skb_refdst; /* save old refdst */ > > xfrm_lookup > xfrm_lookup_with_ifid > if (!if_id && ((dst_orig->flags & DST_NOXFRM) || > !net->xfrm.policy_count[XFRM_POLICY_OUT])) > goto nopol; > > Thanks a lot! > Dong Chenchen > >> Thanks,
On 2024/11/13 22:19, dongchenchen (A) wrote: > > On 2024/11/13 22:13, dongchenchen (A) wrote: >>>> If skb_in is outbound, fl4_dec.saddr is not nolocal. It may be no >>>> input >>>> route from B to A for >>>> >>>> first communication. >>> You're right. So the problem here is that for the case of A >>> being local, we should not be taking the ip_route_input code >>> path (this is intended for forwarded packets). >>> >>> In fact if A is local, and we're sending an ICMP message to A, >>> then perhaps we could skip the IPsec lookup completely and just >>> do normal routing? >>> >>> Steffen, what do you think? >>> >>> So I think it boils down to two choices: >>> >>> 1) We could simply drop IPsec processing if we notice that A >>> (fl.fl4_dst) is local; >> >> Hi, Herbert! Thanks for your suggestions very much. >> >> maybe we can fix it as below: >> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c >> index e1384e7331d8..5f63c4dde4e9 100644 >> --- a/net/ipv4/icmp.c >> +++ b/net/ipv4/icmp.c >> @@ -517,7 +517,9 @@ static struct rtable *icmp_route_lookup(struct >> net *net, >> flowi4_to_flowi(fl4), NULL, 0); >> rt = dst_rtable(dst); >> if (!IS_ERR(dst)) { >> - if (rt != rt2) >> + unsigned int addr_type = >> inet_addr_type_dev_table(net, route_lookup_dev, fl4->daddr); >> + if (rt != rt2 || addr_type == RTN_LOCAL) >> return rt; >> } else if (PTR_ERR(dst) == -EPERM) { >> rt = NULL; >>> 2) Or we could take the __ip_route_output_key code path and >>> continue to do the xfrm lookup when A is local. > > sorry, resend it due to format problems > > If only skip input route lookup as below, xfrm_lookup check may return > ENOENT > for no xfrm pol or dst nofrm flag. Sorry, I misunderstood earlier. it's correct to return ENOENT here. The difference from the above is: If A is local, rt1->dst->dev is lo (fl4->dst is A), rt2->dst->dev is actual dev selected by daddr B(fl4_2->dst is B) > > @@ -543,6 +544,10 @@ static struct rtable *icmp_route_lookup(struct > net *net, > err = PTR_ERR(rt2); > goto relookup_failed; > } > + > + unsigned int addr_type = inet_addr_type_dev_table(net, > route_lookup_dev, fl4->daddr); > + if (addr_type == RTN_LOCAL) > + goto relookup; > /* Ugh! */ > orefdst = skb_in->_skb_refdst; /* save old refdst */ > > xfrm_lookup > xfrm_lookup_with_ifid > if (!if_id && ((dst_orig->flags & DST_NOXFRM) || > !net->xfrm.policy_count[XFRM_POLICY_OUT])) > goto nopol; >
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index e1384e7331d8..11ef4eb5b659 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -490,6 +490,7 @@ static struct rtable *icmp_route_lookup(struct net *net, struct dst_entry *dst, *dst2; struct rtable *rt, *rt2; struct flowi4 fl4_dec; + unsigned int addr_type; int err; memset(fl4, 0, sizeof(*fl4)); @@ -528,31 +529,15 @@ static struct rtable *icmp_route_lookup(struct net *net, if (err) goto relookup_failed; - if (inet_addr_type_dev_table(net, route_lookup_dev, - fl4_dec.saddr) == RTN_LOCAL) { - rt2 = __ip_route_output_key(net, &fl4_dec); - if (IS_ERR(rt2)) - err = PTR_ERR(rt2); - } else { - struct flowi4 fl4_2 = {}; - unsigned long orefdst; - - fl4_2.daddr = fl4_dec.saddr; - rt2 = ip_route_output_key(net, &fl4_2); - if (IS_ERR(rt2)) { - err = PTR_ERR(rt2); - goto relookup_failed; - } - /* Ugh! */ - orefdst = skb_in->_skb_refdst; /* save old refdst */ - skb_dst_set(skb_in, NULL); - err = ip_route_input(skb_in, fl4_dec.daddr, fl4_dec.saddr, - tos, rt2->dst.dev); - - dst_release(&rt2->dst); - rt2 = skb_rtable(skb_in); - skb_in->_skb_refdst = orefdst; /* restore old refdst */ - } + addr_type = inet_addr_type_dev_table(net, route_lookup_dev, fl4_dec.saddr); + if (addr_type == RTN_LOCAL || addr_type == RTN_UNICAST) + fl4_dec.flowi4_flags |= FLOWI_FLAG_ANYSRC; + else + fl4_dec.saddr = 0; + + rt2 = __ip_route_output_key(net, &fl4_dec); + if (IS_ERR(rt2)) + err = PTR_ERR(rt2); if (err) goto relookup_failed;
arp link failure may trigger ip_rt_bug while xfrm enabled, call trace is: WARNING: CPU: 0 PID: 0 at net/ipv4/route.c:1241 ip_rt_bug+0x14/0x20 Modules linked in: CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc6-00077-g2e1b3cc9d7f7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:ip_rt_bug+0x14/0x20 Call Trace: <IRQ> ip_send_skb+0x14/0x40 __icmp_send+0x42d/0x6a0 ipv4_link_failure+0xe2/0x1d0 arp_error_report+0x3c/0x50 neigh_invalidate+0x8d/0x100 neigh_timer_handler+0x2e1/0x330 call_timer_fn+0x21/0x120 __run_timer_base.part.0+0x1c9/0x270 run_timer_softirq+0x4c/0x80 handle_softirqs+0xac/0x280 irq_exit_rcu+0x62/0x80 sysvec_apic_timer_interrupt+0x77/0x90 The script below reproduces this scenario: ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 \ dir out priority 0 ptype main flag localok icmp ip l a veth1 type veth ip a a 192.168.141.111/24 dev veth0 ip l s veth0 up ping 192.168.141.155 -c 1 icmp_route_lookup() create input routes for locally generated packets while xfrm relookup ICMP traffic.Then it will set input route (dst->out = ip_rt_bug) to skb for DESTUNREACH. Similar to commit ed6e4ef836d4("netfilter: Fix ip_route_me_harder triggering ip_rt_bug"), avoid creating input routes with icmp_route_lookup() to fix it. Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support") Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> --- net/ipv4/icmp.c | 35 ++++++++++------------------------- 1 file changed, 10 insertions(+), 25 deletions(-)