diff mbox series

IPCB isn't initialized when MPLS route is pointing to a VRF

Message ID 20211122160546.GA95191@ssuryadesk (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series IPCB isn't initialized when MPLS route is pointing to a VRF | expand

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/cc_maintainers warning 4 maintainers not CCed: l4stpr0gr4m@gmail.com davem@davemloft.net jiapeng.chong@linux.alibaba.com kuba@kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff fail author Signed-off-by missing
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 14 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Stephen Suryaputra Nov. 22, 2021, 4:05 p.m. UTC
Hi,

We ran into a problem because IPCB isn't being initialized when MPLS is
egressing into a VRF. Reproducer script and its teardown are attached,
but they are only to illustrate our setup rather than seeing the problem
as it depends on what is in the skb->cb[].

We found this when h0 is sending an ip packet with DF=1 to 10.1.4.2 and
on ler1 the code path is as follows: mpls_forward() calls mpls_egress()
and then calls neigh_xmit(), which ends up calling __dev_queue_xmit()
and vrf_xmit() through dev_hard_start_xmit(). vrf_xmit() eventually will
call ip_output() and __ip_finish_output() that has the check for
IPCB(skb)->frag_max_size. The conditional happens to be true for us due
to IPCB isn't being initialized even though the packet size is small.
The packet then is dropped in ip_fragment().

The change in this diff is a way to fix:


Here are my questions. Is this the best place to initialize the IPCB?
Would it be better done in vrf.c? I can work on the formal patch once we
agree on where the final fix should be 

Cc Florian Westphal since we found the problem after upgrading to kernel
version that has his commit bb4cc1a18856 ("net: ip: always refragment ip
defragmented packets"). But I think the bug is there without his commit
as well if (IPCB(skb)->flags & IPCB_FRAG_PMTU) happens to evaluate to
true.

Thanks,

Stephen.

Comments

Stephen Suryaputra Nov. 22, 2021, 4:07 p.m. UTC | #1
With the attachments this time.

On Mon, Nov 22, 2021 at 11:05:46AM -0500, Stephen Suryaputra wrote:
> Hi,
> 
> We ran into a problem because IPCB isn't being initialized when MPLS is
> egressing into a VRF. Reproducer script and its teardown are attached,
> but they are only to illustrate our setup rather than seeing the problem
> as it depends on what is in the skb->cb[].
> 
> We found this when h0 is sending an ip packet with DF=1 to 10.1.4.2 and
> on ler1 the code path is as follows: mpls_forward() calls mpls_egress()
> and then calls neigh_xmit(), which ends up calling __dev_queue_xmit()
> and vrf_xmit() through dev_hard_start_xmit(). vrf_xmit() eventually will
> call ip_output() and __ip_finish_output() that has the check for
> IPCB(skb)->frag_max_size. The conditional happens to be true for us due
> to IPCB isn't being initialized even though the packet size is small.
> The packet then is dropped in ip_fragment().
> 
> The change in this diff is a way to fix:
> 
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index ffeb2df8be7a..1d0a0151e175 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -310,6 +310,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt,
>  			      htons(hdr4->ttl << 8),
>  			      htons(new_ttl << 8));
>  		hdr4->ttl = new_ttl;
> +		memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
>  		success = true;
>  		break;
>  	}
> @@ -327,6 +328,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt,
>  			hdr6->hop_limit = dec.ttl;
>  		else if (hdr6->hop_limit)
>  			hdr6->hop_limit = hdr6->hop_limit - 1;
> +		memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
>  		success = true;
>  		break;
>  	}
> 
> Here are my questions. Is this the best place to initialize the IPCB?
> Would it be better done in vrf.c? I can work on the formal patch once we
> agree on where the final fix should be 
> 
> Cc Florian Westphal since we found the problem after upgrading to kernel
> version that has his commit bb4cc1a18856 ("net: ip: always refragment ip
> defragmented packets"). But I think the bug is there without his commit
> as well if (IPCB(skb)->flags & IPCB_FRAG_PMTU) happens to evaluate to
> true.
> 
> Thanks,
> 
> Stephen.
David Ahern Nov. 28, 2021, 10:48 p.m. UTC | #2
On 11/22/21 9:05 AM, Stephen Suryaputra wrote:
> Hi,
> 
> We ran into a problem because IPCB isn't being initialized when MPLS is
> egressing into a VRF. Reproducer script and its teardown are attached,
> but they are only to illustrate our setup rather than seeing the problem
> as it depends on what is in the skb->cb[].
> 
> We found this when h0 is sending an ip packet with DF=1 to 10.1.4.2 and
> on ler1 the code path is as follows: mpls_forward() calls mpls_egress()
> and then calls neigh_xmit(), which ends up calling __dev_queue_xmit()
> and vrf_xmit() through dev_hard_start_xmit(). vrf_xmit() eventually will
> call ip_output() and __ip_finish_output() that has the check for
> IPCB(skb)->frag_max_size. The conditional happens to be true for us due
> to IPCB isn't being initialized even though the packet size is small.
> The packet then is dropped in ip_fragment().
> 
> The change in this diff is a way to fix:
> 
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index ffeb2df8be7a..1d0a0151e175 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -310,6 +310,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt,
>  			      htons(hdr4->ttl << 8),
>  			      htons(new_ttl << 8));
>  		hdr4->ttl = new_ttl;
> +		memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
>  		success = true;
>  		break;
>  	}
> @@ -327,6 +328,7 @@ static bool mpls_egress(struct net *net, struct mpls_route *rt,
>  			hdr6->hop_limit = dec.ttl;
>  		else if (hdr6->hop_limit)
>  			hdr6->hop_limit = hdr6->hop_limit - 1;
> +		memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
>  		success = true;
>  		break;
>  	}
> 
> Here are my questions. Is this the best place to initialize the IPCB?
> Would it be better done in vrf.c? I can work on the formal patch once we
> agree on where the final fix should be 

vrf.c seems like the right place since it does the direct recirculation
(vs loopback which puts in back in the Rx queue).

> 
> Cc Florian Westphal since we found the problem after upgrading to kernel
> version that has his commit bb4cc1a18856 ("net: ip: always refragment ip
> defragmented packets"). But I think the bug is there without his commit
> as well if (IPCB(skb)->flags & IPCB_FRAG_PMTU) happens to evaluate to
> true.
> 
> Thanks,
> 
> Stephen.
>
diff mbox series

Patch

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ffeb2df8be7a..1d0a0151e175 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -310,6 +310,7 @@  static bool mpls_egress(struct net *net, struct mpls_route *rt,
 			      htons(hdr4->ttl << 8),
 			      htons(new_ttl << 8));
 		hdr4->ttl = new_ttl;
+		memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 		success = true;
 		break;
 	}
@@ -327,6 +328,7 @@  static bool mpls_egress(struct net *net, struct mpls_route *rt,
 			hdr6->hop_limit = dec.ttl;
 		else if (hdr6->hop_limit)
 			hdr6->hop_limit = hdr6->hop_limit - 1;
+		memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
 		success = true;
 		break;
 	}