diff mbox series

bridge:fragmented packets dropped by bridge

Message ID CACzz7uzbSVpLu8iqBYXTULr2aUW_9FDdkEVozK+r-BiM2rMukw@mail.gmail.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series bridge:fragmented packets dropped by bridge | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Vyacheslav Salnikov Oct. 7, 2022, 11:21 a.m. UTC
Hi.

I switched from kernel versions 4.9 to 5.15 and found that the MTU on
the interfaces in the bridge does not change.
For example:
I have the following bridge:
bridge      interface
br0          sw1
               sw2
               sw3

And I change with ifconfig MTU.
I see that br0 sw1..sw3 has changed MTU from 1500 -> 1982.

But if i send a ping through these interfaces, I get 1500(I added
prints for output)
I investigated the code and found the reason:
The following commit came in the new kernel:
https://github.com/torvalds/linux/commit/ac6627a28dbfb5d96736544a00c3938fa7ea6dfb

And the behavior of the MTU setting has changed:
>
> Kernel 4.9:
> if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
>    ip_mtu_locked(dst) ||
>    !forwarding)  <--- True
> return dst_mtu(dst) <--- 1982
>
>
> / 'forwarding = true' case should always honour route mtu /
> mtu = dst_metric_raw(dst, RTAX_MTU);
> if (mtu)
> return mtu;



Kernel 5.15:
>
> if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
>    ip_mtu_locked(dst) ||
>    !forwarding) { <--- True
> mtu = rt->rt_pmtu;  <--- 0
> if (mtu && time_before(jiffies, rt->dst.expires)) <-- False
> goto out;
> }
>
> / 'forwarding = true' case should always honour route mtu /
> mtu = dst_metric_raw(dst, RTAX_MTU); <---- 1500
> if (mtu) <--- True
> goto out;

As I see from the code in the end takes mtu from br_dst_default_metrics
> static const u32 br_dst_default_metrics[RTAX_MAX] = {
> [RTAX_MTU - 1] = 1500,
> };

Why is rt_pmtu now used instead of dst_mtu?
Why is forwarding = False called with dst_metric_raw?
Maybe we should add processing when mtu = rt->rt_pmtu == 0?
Could this be an error?


I found a thread discussing a similar problem. It suggested porting
the next patch:
Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
---
 include/net/ip.h | 2 ++
 1 file changed, 2 insertions(+)



Why was this patch not accepted in the end?

Comments

Slade Watkins Oct. 7, 2022, 1:40 p.m. UTC | #1
Hey there,

On 10/7/22 at 7:21 AM, Vyacheslav Salnikov wrote:
> Why was this patch not accepted in the end?

Huh...

I had to do just a little bit of digging to find the original thread,
but it really doesn't seem to me like there was a consensus on whether
or not to take the patch:
https://lore.kernel.org/lkml/20190730122534.30687-1-rdong.ge@gmail.com/T/#u

Reason I say that is that the thread is rather old, and died off quickly.

Someone involved with that patch may be able to offer the answers you're
looking for, if they haven't forgotten about it after all this time.

-srw
Vadim Fedorenko Oct. 13, 2022, 12:43 a.m. UTC | #2
On 07.10.2022 12:21, Vyacheslav Salnikov wrote:
> Hi.
> 
> I switched from kernel versions 4.9 to 5.15 and found that the MTU on
> the interfaces in the bridge does not change.
> For example:
> I have the following bridge:
> bridge      interface
> br0          sw1
>                 sw2
>                 sw3
> 
> And I change with ifconfig MTU.
> I see that br0 sw1..sw3 has changed MTU from 1500 -> 1982.
> 
> But if i send a ping through these interfaces, I get 1500(I added
> prints for output)
> I investigated the code and found the reason:
> The following commit came in the new kernel:
> https://github.com/torvalds/linux/commit/ac6627a28dbfb5d96736544a00c3938fa7ea6dfb
> 
> And the behavior of the MTU setting has changed:
>>
>> Kernel 4.9:
>> if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
>>     ip_mtu_locked(dst) ||
>>     !forwarding)  <--- True
>> return dst_mtu(dst) <--- 1982
>>
>>
>> / 'forwarding = true' case should always honour route mtu /
>> mtu = dst_metric_raw(dst, RTAX_MTU);
>> if (mtu)
>> return mtu;
> 
> 
> 
> Kernel 5.15:
>>
>> if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
>>     ip_mtu_locked(dst) ||
>>     !forwarding) { <--- True
>> mtu = rt->rt_pmtu;  <--- 0
>> if (mtu && time_before(jiffies, rt->dst.expires)) <-- False
>> goto out;
>> }
>>
>> / 'forwarding = true' case should always honour route mtu /
>> mtu = dst_metric_raw(dst, RTAX_MTU); <---- 1500
>> if (mtu) <--- True
>> goto out;
> 
> As I see from the code in the end takes mtu from br_dst_default_metrics
>> static const u32 br_dst_default_metrics[RTAX_MAX] = {
>> [RTAX_MTU - 1] = 1500,
>> };
> 
> Why is rt_pmtu now used instead of dst_mtu?
> Why is forwarding = False called with dst_metric_raw?
> Maybe we should add processing when mtu = rt->rt_pmtu == 0?
> Could this be an error?
> 
If you compare ipv4_mtu code from 4.9 you will see that the very first mtu value 
is filled by rt->rt_pmtu value. I believe there were changes to the bridge code 
where rt_pmtu value got empty or cleared.

I'm still looking for the root cause of the problem, will update you once I find it.


> 
> I found a thread discussing a similar problem. It suggested porting
> the next patch:
> Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
> ---
>   include/net/ip.h | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 29d89de..0512de3 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -450,6 +450,8 @@ static inline unsigned int
> ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
>   static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
>      const struct sk_buff *skb)
>   {
> + if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> + return min(skb->dev->mtu, IP_MAX_MTU);
>    if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
>    bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
> 
> 
> Why was this patch not accepted in the end?
Vadim Fedorenko Oct. 15, 2022, 1:35 a.m. UTC | #3
On 07.10.2022 12:21, Vyacheslav Salnikov wrote:
> Hi.
> 
> I switched from kernel versions 4.9 to 5.15 and found that the MTU on
> the interfaces in the bridge does not change.
> For example:
> I have the following bridge:
> bridge      interface
> br0          sw1
>                 sw2
>                 sw3
> 
> And I change with ifconfig MTU.
> I see that br0 sw1..sw3 has changed MTU from 1500 -> 1982.
> 
> But if i send a ping through these interfaces, I get 1500(I added
> prints for output)
> I investigated the code and found the reason:
> The following commit came in the new kernel:
> https://github.com/torvalds/linux/commit/ac6627a28dbfb5d96736544a00c3938fa7ea6dfb
> 
> And the behavior of the MTU setting has changed:
>>
>> Kernel 4.9:
>> if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
>>     ip_mtu_locked(dst) ||
>>     !forwarding)  <--- True
>> return dst_mtu(dst) <--- 1982
>>
>>
>> / 'forwarding = true' case should always honour route mtu /
>> mtu = dst_metric_raw(dst, RTAX_MTU);
>> if (mtu)
>> return mtu;
> 
> 
> 
> Kernel 5.15:
>>
>> if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
>>     ip_mtu_locked(dst) ||
>>     !forwarding) { <--- True
>> mtu = rt->rt_pmtu;  <--- 0
>> if (mtu && time_before(jiffies, rt->dst.expires)) <-- False
>> goto out;
>> }
>>
>> / 'forwarding = true' case should always honour route mtu /
>> mtu = dst_metric_raw(dst, RTAX_MTU); <---- 1500
>> if (mtu) <--- True
>> goto out;
> 
> As I see from the code in the end takes mtu from br_dst_default_metrics
>> static const u32 br_dst_default_metrics[RTAX_MAX] = {
>> [RTAX_MTU - 1] = 1500,
>> };
> 
> Why is rt_pmtu now used instead of dst_mtu?
> Why is forwarding = False called with dst_metric_raw?
> Maybe we should add processing when mtu = rt->rt_pmtu == 0?
> Could this be an error?
> 

Can you share kernel configs for both versions? Actually only one config value
is needed - CONFIG_BRIDGE_NETFILTER.

It will help me investigate the issue.

> 
> I found a thread discussing a similar problem. It suggested porting
> the next patch:
> Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
> ---
>   include/net/ip.h | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 29d89de..0512de3 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -450,6 +450,8 @@ static inline unsigned int
> ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
>   static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
>      const struct sk_buff *skb)
>   {
> + if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> + return min(skb->dev->mtu, IP_MAX_MTU);
>    if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
>    bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
> 
> 
> Why was this patch not accepted in the end?
diff mbox series

Patch

diff --git a/include/net/ip.h b/include/net/ip.h
index 29d89de..0512de3 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -450,6 +450,8 @@  static inline unsigned int
ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
 static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
    const struct sk_buff *skb)
 {
+ if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
+ return min(skb->dev->mtu, IP_MAX_MTU);
  if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
  bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;