Message ID | 20220314204551.16369-1-dsahern@kernel.org |
---|---|
State | Accepted |
Commit | 40867d74c374b235e14d839f3a77f26684feefe5 |
Delegated to: | Netdev Maintainers |
Series | [net-next] net: Add l3mdev index to flow struct and avoid oif reset for port devices |
Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 14 Mar 2022 14:45:51 -0600 you wrote:
> The fundamental premise of VRF and l3mdev core code is binding a socket
> to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope.
> Legacy code resets flowi_oif to the l3mdev losing any original port
> device binding. Ben (among others) has demonstrated use cases where the
> original port device binding is important and needs to be retained.
> This patch handles that by adding a new entry to the common flow struct
> that can indicate the l3mdev index for later rule and table matching
> avoiding the need to reset flowi_oif.
>
> [...]

Here is the summary with links:
  - [net-next] net: Add l3mdev index to flow struct and avoid oif reset for port devices
    https://git.kernel.org/netdev/net-next/c/40867d74c374

You are awesome, thank you!
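A minimal userspace sketch of the idea summarized above, assuming simplified stand-in types: the FIB table is chosen from a dedicated l3mdev index, so the flow's oif no longer has to be overwritten with the VRF index. The names flow_model, dev_model, find_dev() and rule_match_model() are hypothetical and only mirror the shape of the patched l3mdev_fib_rule_match(); they are not kernel APIs, and RCU, l3mdev_ops and the fib_lookup_arg plumbing are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the flowi_common fields used here. */
struct flow_model {
	int oif;    /* output device; may be a port enslaved to a VRF */
	int iif;    /* input device */
	int l3mdev; /* L3 master (VRF) index, 0 when none applies */
};

/* Hypothetical stand-in for a net_device. */
struct dev_model {
	int ifindex;
	bool is_l3_master;      /* true for a VRF device */
	unsigned int fib_table; /* table the VRF directs lookups to */
};

/* Linear search standing in for dev_get_by_index_rcu(). */
static const struct dev_model *find_dev(const struct dev_model *devs,
					size_t n, int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].ifindex == ifindex)
			return &devs[i];
	return NULL;
}

/* Same shape as the patched l3mdev_fib_rule_match(): the table comes
 * only from fl->l3mdev, and fl->oif is never read or modified.
 * Returns 1 and stores the table on a match, 0 otherwise. */
static int rule_match_model(const struct dev_model *devs, size_t n,
			    const struct flow_model *fl, unsigned int *table)
{
	const struct dev_model *dev;

	if (!fl->l3mdev)
		return 0;

	dev = find_dev(devs, n, fl->l3mdev);
	if (dev && dev->is_l3_master) {
		*table = dev->fib_table;
		return 1;
	}
	return 0;
}
```

In this model, a flow with oif 3 (a port enslaved to VRF 10) and l3mdev 10 resolves to VRF 10's table while its oif stays 3, and a flow with l3mdev 0 matches no l3mdev rule. The ifindex and table values are made up for illustration.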
On Mon, Mar 14, 2022 at 02:45:51PM -0600, David Ahern wrote:
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index 2af2b99e0bea..fb0e49c36c2e 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -1429,11 +1429,8 @@ bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
>  	    !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
>  		return false;
>  
> -	if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {
> -		if (flp->flowi4_oif &&
> -		    flp->flowi4_oif != nhc->nhc_oif)
> -			return false;
> -	}
> +	if (flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
> +		return false;

David, we have several test cases that are failing which I have tracked
down to this patch.

Before the patch, if the original output interface was enslaved to a
VRF, the output interface in the flow struct would be updated to the VRF
and the 'FLOWI_FLAG_SKIP_NH_OIF' flag would be set, causing the above
check to be skipped.

After the patch, the check is no longer skipped, as the original output
interface is retained and the flag was removed.

This breaks scenarios where a GRE tunnel specifies a dummy device
enslaved to a VRF as its physical device. The purpose of this
configuration is to redirect the underlay lookup to the table associated
with the VRF to which the dummy device is enslaved. The check fails
because 'flp->flowi4_oif' points to the dummy device, whereas
'nhc->nhc_oif' points to the interface via which the encapsulated packet
should egress.
Skipping the check when an l3mdev was set seems to solve the problem:

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index fb0e49c36c2e..cf1164e05d92 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1429,7 +1429,8 @@ bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
 	    !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
 		return false;
 
-	if (flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
+	if (!flp->flowi4_l3mdev &&
+	    flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
 		return false;
 
 	return true;

AFAICT, this scenario does not break with ip6gre/ip6gretap tunnels
because 'RT6_LOOKUP_F_IFACE' is not set in
ip6_route_output_flags_noref() in this case.

WDYT? I plan to test this patch in our regression, but I'm not sure if I
missed other cases that might remain broken.
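To make the failure concrete, here is a hedged userspace model of the nexthop oif check in fib_lookup_good_nhc() in three forms: before the patch, after it, and with the change proposed above. The struct, function names and ifindex values (VRF 10, dummy device 5, underlay egress port 2) are hypothetical; only the boolean logic is taken from the quoted diffs.

```c
#include <stdbool.h>

/* Same value as the FLOWI_FLAG_SKIP_NH_OIF flag the patch removes. */
#define MODEL_FLAG_SKIP_NH_OIF 0x04

/* Hypothetical stand-in for the flowi4 fields the check reads. */
struct flow_model {
	int oif;            /* flowi4_oif */
	int l3mdev;         /* flowi4_l3mdev (added by the patch) */
	unsigned int flags; /* flowi4_flags */
};

/* Before the patch: the oif test is bypassed when the skip flag is set. */
static bool nhc_ok_before(const struct flow_model *fl, int nhc_oif)
{
	if (!(fl->flags & MODEL_FLAG_SKIP_NH_OIF)) {
		if (fl->oif && fl->oif != nhc_oif)
			return false;
	}
	return true;
}

/* After the patch: the oif test always applies. */
static bool nhc_ok_after(const struct flow_model *fl, int nhc_oif)
{
	if (fl->oif && fl->oif != nhc_oif)
		return false;
	return true;
}

/* Proposed fix: bypass the oif test whenever an l3mdev is set. */
static bool nhc_ok_proposed(const struct flow_model *fl, int nhc_oif)
{
	if (!fl->l3mdev && fl->oif && fl->oif != nhc_oif)
		return false;
	return true;
}
```

With the pre-patch flow (oif rewritten to VRF 10, skip flag set) the nexthop via port 2 is accepted. With the post-patch flow (oif kept as dummy device 5, l3mdev 10) nhc_ok_after() rejects it, and nhc_ok_proposed() accepts it again.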
On 3/22/22 3:22 AM, Ido Schimmel wrote:
> On Mon, Mar 14, 2022 at 02:45:51PM -0600, David Ahern wrote:
>> [...]
>
> Skipping the check when an l3mdev was set seems to solve the problem:
>
> [...]
>
> WDYT? I plan to test this patch in our regression, but I'm not sure if I
> missed other cases that might remain broken.

one of the requests with VRF has been to bind a socket to a port device
and expect the lookup to enforce use of that egress port (e.g.,
multipath). Switching the oif to the VRF device and then ignoring the
oif check was making that check too flexible for that use case.

What's the callchain for this failure? Perhaps the
FLOWI_FLAG_SKIP_NH_OIF needs to be kept for this particular use case.
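The port-binding use case described above can be sketched with a simplified model, assuming a socket bound to port 3, which is enslaved to VRF 10, and a multipath route with nexthops via ports 3 and 4 (all values and identifiers hypothetical). Before the patch, l3mdev_update_flow() rewrote oif to the VRF and set FLOWI_FLAG_SKIP_NH_OIF, so in this model both nexthops pass the oif check; after the patch oif stays 3 and only the matching nexthop is eligible.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same value as the FLOWI_FLAG_SKIP_NH_OIF flag the patch removes. */
#define MODEL_FLAG_SKIP_NH_OIF 0x04

/* Hypothetical stand-in for the flowi4 fields the check reads. */
struct flow_model {
	int oif;
	unsigned int flags;
};

/* Pre-patch nexthop oif check; with flags == 0 it reduces to the
 * post-patch check. */
static bool nh_eligible(const struct flow_model *fl, int nhc_oif)
{
	if (fl->flags & MODEL_FLAG_SKIP_NH_OIF)
		return true;
	return !(fl->oif && fl->oif != nhc_oif);
}

/* Counts the nexthops of a multipath route that pass the oif check. */
static size_t count_eligible(const struct flow_model *fl,
			     const int *nh_oifs, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (nh_eligible(fl, nh_oifs[i]))
			count++;
	return count;
}
```

Counting eligible nexthops makes the "too flexible" point visible: the old flow (oif 10 plus the skip flag) admits both paths, while the new flow (oif 3, no flag) admits only the path via the bound port.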
On Tue, Mar 22, 2022 at 08:26:48AM -0600, David Ahern wrote:
> On 3/22/22 3:22 AM, Ido Schimmel wrote:
> > On Mon, Mar 14, 2022 at 02:45:51PM -0600, David Ahern wrote:
> > [...]
>
> one of the requests with VRF has been to bind a socket to a port device
> and expect the lookup to enforce use of that egress port (e.g.,
> multipath). Switching the oif to the VRF device and then ignoring the
> oif check was making that check too flexible for that use case.

I see.

>
> What's the callchain for this failure? Perhaps the
> FLOWI_FLAG_SKIP_NH_OIF needs to be kept for this particular use case.

This is the stack trace for the failure:

	fib_lookup_good_nhc+5
	fib_table_lookup+3281
	fib4_rule_action+501
	fib_rules_lookup+858
	__fib_lookup+233
	fib_lookup.constprop.0+926
	ip_route_output_key_hash_rcu+3707
	ip_route_output_key_hash+392
	ip_route_output_flow+33
	ip_tunnel_xmit+1794
	gre_tap_xmit+1312
	dev_hard_start_xmit+448
	sch_direct_xmit+615
	__dev_queue_xmit+4841

The GRE tap is using a dummy device enslaved to a VRF as its physical device.
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 714cafcf6c6c..85e362461d71 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -472,14 +472,13 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 
 	memset(&fl6, 0, sizeof(fl6));
 	/* needed to match OIF rule */
-	fl6.flowi6_oif = dev->ifindex;
+	fl6.flowi6_l3mdev = dev->ifindex;
 	fl6.flowi6_iif = LOOPBACK_IFINDEX;
 	fl6.daddr = iph->daddr;
 	fl6.saddr = iph->saddr;
 	fl6.flowlabel = ip6_flowinfo(iph);
 	fl6.flowi6_mark = skb->mark;
 	fl6.flowi6_proto = iph->nexthdr;
-	fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
 
 	dst = ip6_dst_lookup_flow(net, NULL, &fl6, NULL);
 	if (IS_ERR(dst) || dst == dst_null)
@@ -551,10 +550,10 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 
 	memset(&fl4, 0, sizeof(fl4));
 	/* needed to match OIF rule */
-	fl4.flowi4_oif = vrf_dev->ifindex;
+	fl4.flowi4_l3mdev = vrf_dev->ifindex;
 	fl4.flowi4_iif = LOOPBACK_IFINDEX;
 	fl4.flowi4_tos = RT_TOS(ip4h->tos);
-	fl4.flowi4_flags = FLOWI_FLAG_ANYSRC | FLOWI_FLAG_SKIP_NH_OIF;
+	fl4.flowi4_flags = FLOWI_FLAG_ANYSRC;
 	fl4.flowi4_proto = ip4h->protocol;
 	fl4.daddr = ip4h->daddr;
 	fl4.saddr = ip4h->saddr;
diff --git a/include/net/flow.h b/include/net/flow.h
index 58beb16a49b8..987bd511d652 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -29,6 +29,7 @@ struct flowi_tunnel {
 struct flowi_common {
 	int	flowic_oif;
 	int	flowic_iif;
+	int	flowic_l3mdev;
 	__u32	flowic_mark;
 	__u8	flowic_tos;
 	__u8	flowic_scope;
@@ -36,7 +37,6 @@ struct flowi_common {
 	__u8	flowic_flags;
 #define FLOWI_FLAG_ANYSRC		0x01
 #define FLOWI_FLAG_KNOWN_NH		0x02
-#define FLOWI_FLAG_SKIP_NH_OIF		0x04
 	__u32	flowic_secid;
 	kuid_t	flowic_uid;
 	struct flowi_tunnel flowic_tun_key;
@@ -70,6 +70,7 @@ struct flowi4 {
 	struct flowi_common	__fl_common;
 #define flowi4_oif		__fl_common.flowic_oif
 #define flowi4_iif		__fl_common.flowic_iif
+#define flowi4_l3mdev		__fl_common.flowic_l3mdev
 #define flowi4_mark		__fl_common.flowic_mark
 #define flowi4_tos		__fl_common.flowic_tos
 #define flowi4_scope		__fl_common.flowic_scope
@@ -102,6 +103,7 @@ static inline void flowi4_init_output(struct flowi4 *fl4, int oif,
 {
 	fl4->flowi4_oif = oif;
 	fl4->flowi4_iif = LOOPBACK_IFINDEX;
+	fl4->flowi4_l3mdev = 0;
 	fl4->flowi4_mark = mark;
 	fl4->flowi4_tos = tos;
 	fl4->flowi4_scope = scope;
@@ -132,6 +134,7 @@ struct flowi6 {
 	struct flowi_common	__fl_common;
 #define flowi6_oif		__fl_common.flowic_oif
 #define flowi6_iif		__fl_common.flowic_iif
+#define flowi6_l3mdev		__fl_common.flowic_l3mdev
 #define flowi6_mark		__fl_common.flowic_mark
 #define flowi6_scope		__fl_common.flowic_scope
 #define flowi6_proto		__fl_common.flowic_proto
@@ -177,6 +180,7 @@ struct flowi {
 	} u;
 #define flowi_oif	u.__fl_common.flowic_oif
 #define flowi_iif	u.__fl_common.flowic_iif
+#define flowi_l3mdev	u.__fl_common.flowic_l3mdev
 #define flowi_mark	u.__fl_common.flowic_mark
 #define flowi_tos	u.__fl_common.flowic_tos
 #define flowi_scope	u.__fl_common.flowic_scope
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 7408051632ac..af8209f912ab 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -291,7 +291,7 @@ __be32 fib_compute_spec_dst(struct sk_buff *skb)
 		bool vmark = in_dev && IN_DEV_SRC_VMARK(in_dev);
 		struct flowi4 fl4 = {
 			.flowi4_iif = LOOPBACK_IFINDEX,
-			.flowi4_oif = l3mdev_master_ifindex_rcu(dev),
+			.flowi4_l3mdev = l3mdev_master_ifindex_rcu(dev),
 			.daddr = ip_hdr(skb)->saddr,
 			.flowi4_tos = ip_hdr(skb)->tos & IPTOS_RT_MASK,
 			.flowi4_scope = scope,
@@ -353,9 +353,8 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	bool dev_match;
 
 	fl4.flowi4_oif = 0;
-	fl4.flowi4_iif = l3mdev_master_ifindex_rcu(dev);
-	if (!fl4.flowi4_iif)
-		fl4.flowi4_iif = oif ? : LOOPBACK_IFINDEX;
+	fl4.flowi4_l3mdev = l3mdev_master_ifindex_rcu(dev);
+	fl4.flowi4_iif = oif ? : LOOPBACK_IFINDEX;
 	fl4.daddr = src;
 	fl4.saddr = dst;
 	fl4.flowi4_tos = tos;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index c5a29703185a..cc8e84ef2ae4 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -2234,7 +2234,7 @@ void fib_select_multipath(struct fib_result *res, int hash)
 void fib_select_path(struct net *net, struct fib_result *res,
 		     struct flowi4 *fl4, const struct sk_buff *skb)
 {
-	if (fl4->flowi4_oif && !(fl4->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF))
+	if (fl4->flowi4_oif)
 		goto check_saddr;
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 2af2b99e0bea..fb0e49c36c2e 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1429,11 +1429,8 @@ bool fib_lookup_good_nhc(const struct fib_nh_common *nhc, int fib_flags,
 	    !(fib_flags & FIB_LOOKUP_IGNORE_LINKSTATE))
 		return false;
 
-	if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {
-		if (flp->flowi4_oif &&
-		    flp->flowi4_oif != nhc->nhc_oif)
-			return false;
-	}
+	if (flp->flowi4_oif && flp->flowi4_oif != nhc->nhc_oif)
+		return false;
 
 	return true;
 }
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f444f5983405..63f3256a407d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2263,6 +2263,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	/*
 	 *	Now we are ready to route packet.
 	 */
+	fl4.flowi4_l3mdev = 0;
 	fl4.flowi4_oif = 0;
 	fl4.flowi4_iif = dev->ifindex;
 	fl4.flowi4_mark = skb->mark;
@@ -2738,8 +2739,7 @@ struct rtable *ip_route_output_key_hash_rcu(struct net *net, struct flowi4 *fl4,
 		res->fi = NULL;
 		res->table = NULL;
 		if (fl4->flowi4_oif &&
-		    (ipv4_is_multicast(fl4->daddr) ||
-		    !netif_index_is_l3_master(net, fl4->flowi4_oif))) {
+		    (ipv4_is_multicast(fl4->daddr) || !fl4->flowi4_l3mdev)) {
 			/* Apparently, routing tables are wrong. Assume,
 			 * that the destination is on link.
 			 *
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 9e83bcb6bc99..6fde0b184791 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -28,13 +28,11 @@ static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct flowi4 *fl4,
 	memset(fl4, 0, sizeof(*fl4));
 	fl4->daddr = daddr->a4;
 	fl4->flowi4_tos = tos;
-	fl4->flowi4_oif = l3mdev_master_ifindex_by_index(net, oif);
+	fl4->flowi4_l3mdev = l3mdev_master_ifindex_by_index(net, oif);
 	fl4->flowi4_mark = mark;
 	if (saddr)
 		fl4->saddr = saddr->a4;
 
-	fl4->flowi4_flags = FLOWI_FLAG_SKIP_NH_OIF;
-
 	rt = __ip_route_output_key(net, fl4);
 	if (!IS_ERR(rt))
 		return &rt->dst;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index e69fac576970..a76fba3dd47a 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1035,8 +1035,7 @@ static struct dst_entry *ip6_sk_dst_check(struct sock *sk,
 #ifdef CONFIG_IPV6_SUBTREES
 	    ip6_rt_check(&rt->rt6i_src, &fl6->saddr, np->saddr_cache) ||
 #endif
-	   (!(fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF) &&
-	      (fl6->flowi6_oif && fl6->flowi6_oif != dst->dev->ifindex))) {
+	   (fl6->flowi6_oif && fl6->flowi6_oif != dst->dev->ifindex)) {
 		dst_release(dst);
 		dst = NULL;
 	}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 6188712f24b0..2fa10e60cccd 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1209,9 +1209,6 @@ INDIRECT_CALLABLE_SCOPE struct rt6_info *ip6_pol_route_lookup(struct net *net,
 	struct fib6_node *fn;
 	struct rt6_info *rt;
 
-	if (fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF)
-		flags &= ~RT6_LOOKUP_F_IFACE;
-
 	rcu_read_lock();
 	fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
 restart:
@@ -2181,9 +2178,6 @@ int fib6_table_lookup(struct net *net, struct fib6_table *table, int oif,
 	fn = fib6_node_lookup(&table->tb6_root, &fl6->daddr, &fl6->saddr);
 	saved_fn = fn;
 
-	if (fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF)
-		oif = 0;
-
 redo_rt6_select:
 	rt6_select(net, fn, oif, res, strict);
 	if (res->f6i == net->ipv6.fib6_null_entry) {
@@ -3058,12 +3052,6 @@ INDIRECT_CALLABLE_SCOPE struct rt6_info *__ip6_route_redirect(struct net *net,
 	struct fib6_info *rt;
 	struct fib6_node *fn;
 
-	/* l3mdev_update_flow overrides oif if the device is enslaved; in
-	 * this case we must match on the real ingress device, so reset it
-	 */
-	if (fl6->flowi6_flags & FLOWI_FLAG_SKIP_NH_OIF)
-		fl6->flowi6_oif = skb->dev->ifindex;
-
 	/* Get the "current" route for this destination and
 	 * check if the redirect has come from appropriate router.
 	 *
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 55bb2cbae13d..e64e427a51cf 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -33,8 +33,7 @@ static struct dst_entry *xfrm6_dst_lookup(struct net *net, int tos, int oif,
 	int err;
 
 	memset(&fl6, 0, sizeof(fl6));
-	fl6.flowi6_oif = l3mdev_master_ifindex_by_index(net, oif);
-	fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
+	fl6.flowi6_l3mdev = l3mdev_master_ifindex_by_index(net, oif);
 	fl6.flowi6_mark = mark;
 	memcpy(&fl6.daddr, daddr, sizeof(fl6.daddr));
 	if (saddr)
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index 17927966abb3..4eb8892fb2ff 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -250,25 +250,19 @@ int l3mdev_fib_rule_match(struct net *net, struct flowi *fl,
 	struct net_device *dev;
 	int rc = 0;
 
-	rcu_read_lock();
+	/* update flow ensures flowi_l3mdev is set when relevant */
+	if (!fl->flowi_l3mdev)
+		return 0;
 
-	dev = dev_get_by_index_rcu(net, fl->flowi_oif);
-	if (dev && netif_is_l3_master(dev) &&
-	    dev->l3mdev_ops->l3mdev_fib_table) {
-		arg->table = dev->l3mdev_ops->l3mdev_fib_table(dev);
-		rc = 1;
-		goto out;
-	}
+	rcu_read_lock();
 
-	dev = dev_get_by_index_rcu(net, fl->flowi_iif);
+	dev = dev_get_by_index_rcu(net, fl->flowi_l3mdev);
 	if (dev && netif_is_l3_master(dev) &&
 	    dev->l3mdev_ops->l3mdev_fib_table) {
 		arg->table = dev->l3mdev_ops->l3mdev_fib_table(dev);
 		rc = 1;
-		goto out;
 	}
 
-out:
 	rcu_read_unlock();
 
 	return rc;
@@ -277,31 +271,28 @@ int l3mdev_fib_rule_match(struct net *net, struct flowi *fl,
 void l3mdev_update_flow(struct net *net, struct flowi *fl)
 {
 	struct net_device *dev;
-	int ifindex;
 
 	rcu_read_lock();
 
 	if (fl->flowi_oif) {
 		dev = dev_get_by_index_rcu(net, fl->flowi_oif);
 		if (dev) {
-			ifindex = l3mdev_master_ifindex_rcu(dev);
-			if (ifindex) {
-				fl->flowi_oif = ifindex;
-				fl->flowi_flags |= FLOWI_FLAG_SKIP_NH_OIF;
-				goto out;
-			}
+			if (!fl->flowi_l3mdev)
+				fl->flowi_l3mdev = l3mdev_master_ifindex_rcu(dev);
+
+			/* oif set to L3mdev directs lookup to its table;
+			 * reset to avoid oif match in fib_lookup
+			 */
+			if (netif_is_l3_master(dev))
+				fl->flowi_oif = 0;
+			goto out;
 		}
 	}
 
-	if (fl->flowi_iif) {
+	if (fl->flowi_iif > LOOPBACK_IFINDEX && !fl->flowi_l3mdev) {
 		dev = dev_get_by_index_rcu(net, fl->flowi_iif);
-		if (dev) {
-			ifindex = l3mdev_master_ifindex_rcu(dev);
-			if (ifindex) {
-				fl->flowi_iif = ifindex;
-				fl->flowi_flags |= FLOWI_FLAG_SKIP_NH_OIF;
-			}
-		}
+		if (dev)
+			fl->flowi_l3mdev = l3mdev_master_ifindex_rcu(dev);
 	}
 
 out:
diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh
index 3f4c8cfe7aca..47c4d4b4a44a 100755
--- a/tools/testing/selftests/net/fcnal-test.sh
+++ b/tools/testing/selftests/net/fcnal-test.sh
@@ -750,7 +750,7 @@ ipv4_ping_vrf()
 		log_start
 		show_hint "Fails since address on vrf device is out of device scope"
 		run_cmd ping -c1 -w1 -I ${NSA_DEV} ${a}
-		log_test_addr ${a} $? 1 "ping local, device bind"
+		log_test_addr ${a} $? 2 "ping local, device bind"
 	done
 
 	#
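A userspace sketch of how the patched l3mdev_update_flow() fills in the new field, under simplified assumptions: devices live in a small array instead of being looked up under RCU, and master_ifindex() stands in for l3mdev_master_ifindex_rcu(), returning the device's own index for a VRF or its master's index for an enslaved port. All type and function names here are hypothetical; MODEL_LOOPBACK_IFINDEX mirrors the kernel's LOOPBACK_IFINDEX, which is 1.

```c
#include <stdbool.h>
#include <stddef.h>

/* The kernel defines LOOPBACK_IFINDEX as 1. */
#define MODEL_LOOPBACK_IFINDEX 1

/* Hypothetical stand-in for the flowi fields touched here. */
struct flow_model {
	int oif;
	int iif;
	int l3mdev;
};

/* Hypothetical stand-in for a net_device. */
struct dev_model {
	int ifindex;
	bool is_l3_master;  /* true for a VRF device */
	int master_ifindex; /* VRF this device is enslaved to, 0 if none */
};

/* Linear search standing in for dev_get_by_index_rcu(). */
static const struct dev_model *find_dev(const struct dev_model *devs,
					size_t n, int ifindex)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].ifindex == ifindex)
			return &devs[i];
	return NULL;
}

/* Stand-in for l3mdev_master_ifindex_rcu(): a VRF maps to itself,
 * an enslaved port to its master, anything else to 0. */
static int master_ifindex(const struct dev_model *dev)
{
	return dev->is_l3_master ? dev->ifindex : dev->master_ifindex;
}

/* Mirrors the control flow of the patched l3mdev_update_flow(). */
static void update_flow_model(const struct dev_model *devs, size_t n,
			      struct flow_model *fl)
{
	const struct dev_model *dev;

	if (fl->oif) {
		dev = find_dev(devs, n, fl->oif);
		if (dev) {
			if (!fl->l3mdev)
				fl->l3mdev = master_ifindex(dev);
			/* oif naming the VRF itself only selects its table;
			 * clear it so no nexthop oif match is required */
			if (dev->is_l3_master)
				fl->oif = 0;
			return;
		}
	}

	/* No usable oif: fall back to a real (non-loopback) input device. */
	if (fl->iif > MODEL_LOOPBACK_IFINDEX && !fl->l3mdev) {
		dev = find_dev(devs, n, fl->iif);
		if (dev)
			fl->l3mdev = master_ifindex(dev);
	}
}
```

In this model, a flow whose oif is an enslaved port keeps that oif and gains an l3mdev index; a flow whose oif is the VRF itself has oif cleared so only the table selection remains; and a flow with no oif but a non-loopback iif takes its l3mdev from the ingress device's master.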