Message ID | 20211122160546.GA95191@ssuryadesk (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | IPCB isn't initialized when MPLS route is pointing to a VRF | expand |
With the attachments this time. On Mon, Nov 22, 2021 at 11:05:46AM -0500, Stephen Suryaputra wrote: > Hi, > > We ran into a problem because IPCB isn't being initialized when MPLS is > egressing into a VRF. Reproducer script and its teardown are attached, > but they are only to illustrate our setup rather than seeing the problem > as it depends on what is in the skb->cb[]. > > We found this when h0 is sending an ip packet with DF=1 to 10.1.4.2 and > on ler1 the code path is as follows: mpls_forward() calls mpls_egress() > and then calls neigh_xmit(), which ends up calling __dev_queue_xmit() > and vrf_xmit() through dev_hard_start_xmit(). vrf_xmit() eventually will > call ip_output() and __ip_finish_output() that has the check for > IPCB(skb)->frag_max_size. The conditional happens to be true for us due > to IPCB isn't being initialized even though the packet size is small. > The packet then is dropped in ip_fragment(). > > The change in this diff is a way to fix: > > diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c > index ffeb2df8be7a..1d0a0151e175 100644 > --- a/net/mpls/af_mpls.c > +++ b/net/mpls/af_mpls.c > @@ -310,6 +310,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt, > htons(hdr4->ttl << 8), > htons(new_ttl << 8)); > hdr4->ttl = new_ttl; > + memset(IPCB(skb), 0, sizeof(*IPCB(skb))); > success = true; > break; > } > @@ -327,6 +328,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt, > hdr6->hop_limit = dec.ttl; > else if (hdr6->hop_limit) > hdr6->hop_limit = hdr6->hop_limit - 1; > + memset(IP6CB(skb), 0, sizeof(*IP6CB(skb))); > success = true; > break; > } > > Here are my questions. Is this the best place to initialize the IPCB? > Would it be better done in vrf.c? I can work on the formal patch once we > agree on where the final fix should be > > Cc Florian Westphal since we found the problem after upgrading to kernel > version that has his commit bb4cc1a18856 ("net: ip: always refragment ip > defragmented packets"). But I think the bug is there without his commit > as well if (IPCB(skb)->flags & IPCB_FRAG_PMTU) happens to evaluate to > true. > > Thanks, > > Stephen.
On 11/22/21 9:05 AM, Stephen Suryaputra wrote: > Hi, > > We ran into a problem because IPCB isn't being initialized when MPLS is > egressing into a VRF. Reproducer script and its teardown are attached, > but they are only to illustrate our setup rather than seeing the problem > as it depends on what is in the skb->cb[]. > > We found this when h0 is sending an ip packet with DF=1 to 10.1.4.2 and > on ler1 the code path is as follows: mpls_forward() calls mpls_egress() > and then calls neigh_xmit(), which ends up calling __dev_queue_xmit() > and vrf_xmit() through dev_hard_start_xmit(). vrf_xmit() eventually will > call ip_output() and __ip_finish_output() that has the check for > IPCB(skb)->frag_max_size. The conditional happens to be true for us due > to IPCB isn't being initialized even though the packet size is small. > The packet then is dropped in ip_fragment(). > > The change in this diff is a way to fix: > > diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c > index ffeb2df8be7a..1d0a0151e175 100644 > --- a/net/mpls/af_mpls.c > +++ b/net/mpls/af_mpls.c > @@ -310,6 +310,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt, > htons(hdr4->ttl << 8), > htons(new_ttl << 8)); > hdr4->ttl = new_ttl; > + memset(IPCB(skb), 0, sizeof(*IPCB(skb))); > success = true; > break; > } > @@ -327,6 +328,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt, > hdr6->hop_limit = dec.ttl; > else if (hdr6->hop_limit) > hdr6->hop_limit = hdr6->hop_limit - 1; > + memset(IP6CB(skb), 0, sizeof(*IP6CB(skb))); > success = true; > break; > } > > Here are my questions. Is this the best place to initialize the IPCB? > Would it be better done in vrf.c? I can work on the formal patch once we > agree on where the final fix should be vrf.c seems like the right place since it does the direct recirculation (vs loopback which puts in back in the Rx queue). > > Cc Florian Westphal since we found the problem after upgrading to kernel > version that has his commit bb4cc1a18856 ("net: ip: always refragment ip > defragmented packets"). But I think the bug is there without his commit > as well if (IPCB(skb)->flags & IPCB_FRAG_PMTU) happens to evaluate to > true. > > Thanks, > > Stephen. >
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c index ffeb2df8be7a..1d0a0151e175 100644 --- a/net/mpls/af_mpls.c +++ b/net/mpls/af_mpls.c @@ -310,6 +310,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt, htons(hdr4->ttl << 8), htons(new_ttl << 8)); hdr4->ttl = new_ttl; + memset(IPCB(skb), 0, sizeof(*IPCB(skb))); success = true; break; } @@ -327,6 +328,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt, hdr6->hop_limit = dec.ttl; else if (hdr6->hop_limit) hdr6->hop_limit = hdr6->hop_limit - 1; + memset(IP6CB(skb), 0, sizeof(*IP6CB(skb))); success = true; break; }