diff mbox series

[net] net: vlan: allow vlan device MTU change follow real device from smaller to bigger

Message ID 20220221124644.1146105-1-william.xuanziyang@huawei.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] net: vlan: allow vlan device MTU change follow real device from smaller to bigger | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers fail 1 blamed authors not CCed: herbert@gondor.apana.org.au; 2 maintainers not CCed: zhudi21@huawei.com herbert@gondor.apana.org.au
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Ziyang Xuan (William) Feb. 21, 2022, 12:46 p.m. UTC
vlan device MTU can only follow real device change from bigger to smaller
but from smaller to bigger under the premise of vlan device MTU not exceed
the real device MTU.

This issue can be seen using the following commands:

ip link add link eth1 dev eth1.100 type vlan id 100
ip link set eth1 mtu 256
ip link set eth1 mtu 1500
ip link show

Modify to allow vlan device follow real device MTU change from smaller
to bigger when user has not configured vlan device MTU which is not
equal to real device MTU. That also ensure user configuration has higher
priority.

Fixes: 2e477c9bd2bb ("vlan: Propagate physical MTU changes")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
---
 net/8021q/vlan.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Eric Dumazet Feb. 21, 2022, 3:43 p.m. UTC | #1
CC Herbert Xu, author of blamed commit.

On Mon, Feb 21, 2022 at 4:28 AM Ziyang Xuan
<william.xuanziyang@huawei.com> wrote:
>
> vlan device MTU can only follow real device change from bigger to smaller
> but from smaller to bigger under the premise of vlan device MTU not exceed
> the real device MTU.
>
> This issue can be seen using the following commands:
>
> ip link add link eth1 dev eth1.100 type vlan id 100
> ip link set eth1 mtu 256
> ip link set eth1 mtu 1500
> ip link show
>
> Modify to allow vlan device follow real device MTU change from smaller
> to bigger when user has not configured vlan device MTU which is not
> equal to real device MTU. That also ensure user configuration has higher
> priority.
>
> Fixes: 2e477c9bd2bb ("vlan: Propagate physical MTU changes")
> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
> ---
>  net/8021q/vlan.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 788076b002b3..7de4f462525a 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -361,6 +361,7 @@ static int __vlan_device_event(struct net_device *dev, unsigned long event)
>  static int vlan_device_event(struct notifier_block *unused, unsigned long event,
>                              void *ptr)
>  {
> +       unsigned int orig_mtu = ((struct netdev_notifier_info_ext *)ptr)->ext.mtu;
>         struct netlink_ext_ack *extack = netdev_notifier_info_to_extack(ptr);
>         struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>         struct vlan_group *grp;
> @@ -419,7 +420,7 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
>
>         case NETDEV_CHANGEMTU:
>                 vlan_group_for_each_dev(grp, i, vlandev) {
> -                       if (vlandev->mtu <= dev->mtu)
> +                       if (vlandev->mtu <= dev->mtu && vlandev->mtu != orig_mtu)
>                                 continue;
>
>                         dev_set_mtu(vlandev, dev->mtu);
> --
> 2.25.1
>

Herbert, do you recall why only a decrease was taken into consideration ?

Thanks.
Herbert Xu Feb. 22, 2022, 12:58 a.m. UTC | #2
On Mon, Feb 21, 2022 at 07:43:18AM -0800, Eric Dumazet wrote:
>
> Herbert, do you recall why only a decrease was taken into consideration ?

Because we shouldn't override administrative settings of the MTU
on the vlan device, unless we have to because of an MTU reduction
on the underlying device.

Yes this is not perfect if the admin never set an MTU to start with
but as we don't have a way of telling whether the admin has or has
not changed the MTU setting, the safest course of action is to do
nothing in that case.

Thanks,
Ziyang Xuan (William) Feb. 22, 2022, 2:06 a.m. UTC | #3
> On Mon, Feb 21, 2022 at 07:43:18AM -0800, Eric Dumazet wrote:
>>
>> Herbert, do you recall why only a decrease was taken into consideration ?
> 
> Because we shouldn't override administrative settings of the MTU
> on the vlan device, unless we have to because of an MTU reduction
> on the underlying device.
> 
> Yes this is not perfect if the admin never set an MTU to start with
> but as we don't have a way of telling whether the admin has or has
> not changed the MTU setting, the safest course of action is to do
> nothing in that case.
If the admin has changed the vlan device MTU smaller than the underlying
device MTU firstly, then changed the underlying device MTU smaller than
the vlan device MTU secondly. The admin's configuration has been overridden.
Can we consider that the admin's configuration for the vlan device MTU has
been invalid and disappeared after the second change? I think so.

> 
> Thanks,
>
Eric Dumazet Feb. 22, 2022, 2:27 a.m. UTC | #4
On Mon, Feb 21, 2022 at 6:06 PM Ziyang Xuan (William)
<william.xuanziyang@huawei.com> wrote:
>
> > On Mon, Feb 21, 2022 at 07:43:18AM -0800, Eric Dumazet wrote:
> >>
> >> Herbert, do you recall why only a decrease was taken into consideration ?
> >
> > Because we shouldn't override administrative settings of the MTU
> > on the vlan device, unless we have to because of an MTU reduction
> > on the underlying device.
> >
> > Yes this is not perfect if the admin never set an MTU to start with
> > but as we don't have a way of telling whether the admin has or has
> > not changed the MTU setting, the safest course of action is to do
> > nothing in that case.
> If the admin has changed the vlan device MTU smaller than the underlying
> device MTU firstly, then changed the underlying device MTU smaller than
> the vlan device MTU secondly. The admin's configuration has been overridden.
> Can we consider that the admin's configuration for the vlan device MTU has
> been invalid and disappeared after the second change? I think so.

The answer is no.

Herbert is saying:

ip link add link eth1 dev eth1.100 type vlan id 100
...
ip link set eth1.100 mtu 800
..
ip link set eth1 mtu 256
ip link set eth1 mtu 1500

-> we do not want eth1.100 mtu being set back to 1500, this might
break applications, depending on old kernel feature.
 Eventually, setting back to 800 seems ok.

If you want this new feature, we need to record in eth1.100 device
that no admin ever changed the mtu,
as Herbert suggested.

Then, it is okay to upgrade the vlan mtu (but still is a behavioral
change that _could_ break some scripts)

Thank you.
Ziyang Xuan (William) Feb. 22, 2022, 7:31 a.m. UTC | #5
> On Mon, Feb 21, 2022 at 6:06 PM Ziyang Xuan (William)
> <william.xuanziyang@huawei.com> wrote:
>>
>>> On Mon, Feb 21, 2022 at 07:43:18AM -0800, Eric Dumazet wrote:
>>>>
>>>> Herbert, do you recall why only a decrease was taken into consideration ?
>>>
>>> Because we shouldn't override administrative settings of the MTU
>>> on the vlan device, unless we have to because of an MTU reduction
>>> on the underlying device.
>>>
>>> Yes this is not perfect if the admin never set an MTU to start with
>>> but as we don't have a way of telling whether the admin has or has
>>> not changed the MTU setting, the safest course of action is to do
>>> nothing in that case.
>> If the admin has changed the vlan device MTU smaller than the underlying
>> device MTU firstly, then changed the underlying device MTU smaller than
>> the vlan device MTU secondly. The admin's configuration has been overridden.
>> Can we consider that the admin's configuration for the vlan device MTU has
>> been invalid and disappeared after the second change? I think so.
> 
> The answer is no.
> 
> Herbert is saying:
> 
> ip link add link eth1 dev eth1.100 type vlan id 100
> ...
> ip link set eth1.100 mtu 800
> ..
> ip link set eth1 mtu 256
> ip link set eth1 mtu 1500
> 
> -> we do not want eth1.100 mtu being set back to 1500, this might
> break applications, depending on old kernel feature.
>  Eventually, setting back to 800 seems ok.

It seem that setting back to 800 more reasonable. We can record user
setting MTU by interface ndo_change_mtu() in struct vlan_dev_priv.

> 
> If you want this new feature, we need to record in eth1.100 device
> that no admin ever changed the mtu,
> as Herbert suggested.
> 
> Then, it is okay to upgrade the vlan mtu (but still is a behavioral
> change that _could_ break some scripts)
> 
> Thank you.
> .
>
Guillaume Nault Feb. 22, 2022, 10:37 a.m. UTC | #6
On Mon, Feb 21, 2022 at 06:27:46PM -0800, Eric Dumazet wrote:
> On Mon, Feb 21, 2022 at 6:06 PM Ziyang Xuan (William)
> <william.xuanziyang@huawei.com> wrote:
> >
> > > On Mon, Feb 21, 2022 at 07:43:18AM -0800, Eric Dumazet wrote:
> > >>
> > >> Herbert, do you recall why only a decrease was taken into consideration ?
> > >
> > > Because we shouldn't override administrative settings of the MTU
> > > on the vlan device, unless we have to because of an MTU reduction
> > > on the underlying device.
> > >
> > > Yes this is not perfect if the admin never set an MTU to start with
> > > but as we don't have a way of telling whether the admin has or has
> > > not changed the MTU setting, the safest course of action is to do
> > > nothing in that case.
> > If the admin has changed the vlan device MTU smaller than the underlying
> > device MTU firstly, then changed the underlying device MTU smaller than
> > the vlan device MTU secondly. The admin's configuration has been overridden.
> > Can we consider that the admin's configuration for the vlan device MTU has
> > been invalid and disappeared after the second change? I think so.
> 
> The answer is no.
> 
> Herbert is saying:
> 
> ip link add link eth1 dev eth1.100 type vlan id 100
> ...
> ip link set eth1.100 mtu 800
> ..
> ip link set eth1 mtu 256
> ip link set eth1 mtu 1500
> 
> -> we do not want eth1.100 mtu being set back to 1500, this might
> break applications, depending on old kernel feature.
>  Eventually, setting back to 800 seems ok.
> 
> If you want this new feature, we need to record in eth1.100 device
> that no admin ever changed the mtu,
> as Herbert suggested.
> 
> Then, it is okay to upgrade the vlan mtu (but still is a behavioral
> change that _could_ break some scripts)

What about an explicit option:

  ip link add link eth1 dev eth1.100 type vlan id 100 follow-parent-mtu


Or for something more future proof, an option that can accept several
policies:

  mtu-update <reduce-only,follow,...>

      reduce-only (default):
        update vlan's MTU only if the new MTU is smaller than the
        current one (current behaviour).

      follow:
        always follow the MTU of the parent device.

Then if anyone wants more complex policies:

      follow-if-not-modified:
        follow the MTU of the parent device as long as the VLAN's MTU
        was not manually changed. Otherwise only adjust the VLAN's MTU
        when the parent's one is set to a smaller value.

      follow-if-not-modified-but-not-quite:
        like follow-if-not-modified but revert back to the VLAN's
        last manually modified MTU, if any, whenever possible (that is,
        when the parent device's MTU is set back to a higher value).
        That probably requires the possibility to dump the last
        modified MTU, so the administrator can anticipate the
        consequences of modifying the parent device.

     yet-another-policy (because people have a lot of imagination):
       for example, keep the MTU 4 bytes lower than the parent device,
       to account for VLAN overhead.

Of course feel free to suggest better names and policies :).

This way, we can keep the current behaviour and avoid unexpected
heuristics that are difficult to explain (and even more difficult for
network admins to figure out on their own).

> Thank you.
>
Jakub Kicinski Feb. 22, 2022, 11:28 p.m. UTC | #7
On Tue, 22 Feb 2022 11:37:33 +0100 Guillaume Nault wrote:
> What about an explicit option:
> 
>   ip link add link eth1 dev eth1.100 type vlan id 100 follow-parent-mtu
> 
> 
> Or for something more future proof, an option that can accept several
> policies:
> 
>   mtu-update <reduce-only,follow,...>
> 
>       reduce-only (default):
>         update vlan's MTU only if the new MTU is smaller than the
>         current one (current behaviour).
> 
>       follow:
>         always follow the MTU of the parent device.
> 
> Then if anyone wants more complex policies:
> 
>       follow-if-not-modified:
>         follow the MTU of the parent device as long as the VLAN's MTU
>         was not manually changed. Otherwise only adjust the VLAN's MTU
>         when the parent's one is set to a smaller value.
> 
>       follow-if-not-modified-but-not-quite:
>         like follow-if-not-modified but revert back to the VLAN's
>         last manually modified MTU, if any, whenever possible (that is,
>         when the parent device's MTU is set back to a higher value).
>         That probably requires the possibility to dump the last
>         modified MTU, so the administrator can anticipate the
>         consequences of modifying the parent device.
> 
>      yet-another-policy (because people have a lot of imagination):
>        for example, keep the MTU 4 bytes lower than the parent device,
>        to account for VLAN overhead.
> 
> Of course feel free to suggest better names and policies :).
> 
> This way, we can keep the current behaviour and avoid unexpected
> heuristics that are difficult to explain (and even more difficult for
> network admins to figure out on their own).

My $0.02 would be that if we want to make changes that require new uAPI
we should do it across uppers.
Ziyang Xuan (William) Feb. 23, 2022, 1:55 a.m. UTC | #8
>> On Mon, Feb 21, 2022 at 6:06 PM Ziyang Xuan (William)
>> <william.xuanziyang@huawei.com> wrote:
>>>
>>>> On Mon, Feb 21, 2022 at 07:43:18AM -0800, Eric Dumazet wrote:
>>>>>
>>>>> Herbert, do you recall why only a decrease was taken into consideration ?
>>>>
>>>> Because we shouldn't override administrative settings of the MTU
>>>> on the vlan device, unless we have to because of an MTU reduction
>>>> on the underlying device.
>>>>
>>>> Yes this is not perfect if the admin never set an MTU to start with
>>>> but as we don't have a way of telling whether the admin has or has
>>>> not changed the MTU setting, the safest course of action is to do
>>>> nothing in that case.
>>> If the admin has changed the vlan device MTU smaller than the underlying
>>> device MTU firstly, then changed the underlying device MTU smaller than
>>> the vlan device MTU secondly. The admin's configuration has been overridden.
>>> Can we consider that the admin's configuration for the vlan device MTU has
>>> been invalid and disappeared after the second change? I think so.
>>
>> The answer is no.
>>
>> Herbert is saying:
>>
>> ip link add link eth1 dev eth1.100 type vlan id 100
>> ...
>> ip link set eth1.100 mtu 800
>> ..
>> ip link set eth1 mtu 256
>> ip link set eth1 mtu 1500
>>
>> -> we do not want eth1.100 mtu being set back to 1500, this might
>> break applications, depending on old kernel feature.
>>  Eventually, setting back to 800 seems ok.
> 
> It seem that setting back to 800 more reasonable. We can record user
> setting MTU by interface ndo_change_mtu() in struct vlan_dev_priv.
> 

I attempt to record user setting MTU for vlan device. Use the recorded
MTU to restore when uderlying device change the MTU from smaller to
bigger than recorded vlan device MTU. The modification as following:

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 2be4dd7e90a9..b8970e90a279 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -163,6 +163,7 @@ struct netpoll;
  *     @vlan_proto: VLAN encapsulation protocol
  *     @vlan_id: VLAN identifier
  *     @flags: device flags
+ *     @mtu: user setting MTU
  *     @real_dev: underlying netdevice
  *     @dev_tracker: refcount tracker for @real_dev reference
  *     @real_dev_addr: address of underlying netdevice
@@ -179,6 +180,8 @@ struct vlan_dev_priv {
        u16                                     vlan_id;
        u16                                     flags;

+       u32                                     mtu;
+
        struct net_device                       *real_dev;
        netdevice_tracker                       dev_tracker;

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 788076b002b3..492ef88923c2 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -365,6 +365,7 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
        struct net_device *dev = netdev_notifier_info_to_dev(ptr);
        struct vlan_group *grp;
        struct vlan_info *vlan_info;
+       unsigned int new_mtu;
        int i, flgs;
        struct net_device *vlandev;
        struct vlan_dev_priv *vlan;
@@ -419,10 +420,10 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,

        case NETDEV_CHANGEMTU:
                vlan_group_for_each_dev(grp, i, vlandev) {
-                       if (vlandev->mtu <= dev->mtu)
-                               continue;
+                       vlan = vlan_dev_priv(vlandev);
+                       new_mtu = (!vlan->mtu || dev->mtu < vlan->mtu) ? dev->mtu : vlan->mtu;

-                       dev_set_mtu(vlandev, dev->mtu);
+                       dev_set_mtu(vlandev, new_mtu);
                }
                break;

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index d1902828a18a..66c2b64d1ece 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -140,7 +140,8 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,

 static int vlan_dev_change_mtu(struct net_device *dev, int new_mtu)
 {
-       struct net_device *real_dev = vlan_dev_priv(dev)->real_dev;
+       struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
+       struct net_device *real_dev = vlan->real_dev;
        unsigned int max_mtu = real_dev->mtu;

        if (netif_reduces_vlan_mtu(real_dev))
@@ -148,8 +149,15 @@ static int vlan_dev_change_mtu(struct net_device *dev, int new_mtu)
        if (max_mtu < new_mtu)
                return -ERANGE;

-       dev->mtu = new_mtu;
+       /* Identify user MTU change different from the underlying devcie
+        * NETDEV_CHANGEMTU event. Record user setting MTU in mtu member
+        * of struct vlan_dev_priv.
+        */
+       if ((!vlan->mtu && new_mtu != real_dev->mtu) ||
+           (dev->mtu == vlan->mtu && vlan->mtu < real_dev->mtu))
+               vlan->mtu = new_mtu;

+       dev->mtu = new_mtu;
        return 0;
 }


I test it  in various combination scenarios. I found it can not cover one
scenario because user setting can not arrive vlan module codes. For example:

ip link add link eth1 dev eth1.100 type vlan id 100 // eth1.100 MTU 1500
ip link set eth1.100 mtu 1500 // success no error

When new_mtu equal to orig_mtu, user setting operation can not arrive lower
vlan module, vlan module can not perceive, so we can not record the setting.

Before my colleague point that the above setting is not error for user, I
always think that it is invalid setting. But I think his opinion makes sense.

So what do you think about the above setting?

>>
>> If you want this new feature, we need to record in eth1.100 device
>> that no admin ever changed the mtu,
>> as Herbert suggested.
>>
>> Then, it is okay to upgrade the vlan mtu (but still is a behavioral
>> change that _could_ break some scripts)
>>
>> Thank you.
>> .
>>
> .
>
Guillaume Nault Feb. 23, 2022, 11:26 a.m. UTC | #9
On Tue, Feb 22, 2022 at 03:28:15PM -0800, Jakub Kicinski wrote:
> On Tue, 22 Feb 2022 11:37:33 +0100 Guillaume Nault wrote:
> > What about an explicit option:
> > 
> >   ip link add link eth1 dev eth1.100 type vlan id 100 follow-parent-mtu
> > 
> > 
> > Or for something more future proof, an option that can accept several
> > policies:
> > 
> >   mtu-update <reduce-only,follow,...>
> > 
> >       reduce-only (default):
> >         update vlan's MTU only if the new MTU is smaller than the
> >         current one (current behaviour).
> > 
> >       follow:
> >         always follow the MTU of the parent device.
> > 
> > Then if anyone wants more complex policies:
> > 
> >       follow-if-not-modified:
> >         follow the MTU of the parent device as long as the VLAN's MTU
> >         was not manually changed. Otherwise only adjust the VLAN's MTU
> >         when the parent's one is set to a smaller value.
> > 
> >       follow-if-not-modified-but-not-quite:
> >         like follow-if-not-modified but revert back to the VLAN's
> >         last manually modified MTU, if any, whenever possible (that is,
> >         when the parent device's MTU is set back to a higher value).
> >         That probably requires the possibility to dump the last
> >         modified MTU, so the administrator can anticipate the
> >         consequences of modifying the parent device.
> > 
> >      yet-another-policy (because people have a lot of imagination):
> >        for example, keep the MTU 4 bytes lower than the parent device,
> >        to account for VLAN overhead.
> > 
> > Of course feel free to suggest better names and policies :).
> > 
> > This way, we can keep the current behaviour and avoid unexpected
> > heuristics that are difficult to explain (and even more difficult for
> > network admins to figure out on their own).
> 
> My $0.02 would be that if we want to make changes that require new uAPI
> we should do it across uppers.

Do you mean something like:

  ip link set dev eth0 vlan-mtu-policy <policy-name>

that'd affect all existing (and future) vlans of eth0?

Then I think that for non-ethernet devices, we should reject this
option and skip it when dumping config. But yes, that's another
possibility.

I personnaly don't really mind, as long as we keep a clear behaviour.

What I'd really like to avoid is something like:
  - By default it behaves this way.
  - If you modified the MTU it behaves in another way
  - But if you modified the MTU but later restored the
    original MTU, then you're back to the default behaviour
    (or not?), unless the MTU of the upper device was also
    changed meanwhile, in which case ... to be continued ...
  - BTW, you might not be able to tell how the VLAN's MTU is going to
    behave by simply looking at its configuration, because that also
    depends on past configurations.
  - Well, and if your kernel is older than xxx, then you always get the
    default behaviour.
  - ... and we might modify the heuristics again in the future to
    accomodate with situations or use cases we failed to consider.
Stephen Hemminger Feb. 23, 2022, 3:17 p.m. UTC | #10
On Wed, 23 Feb 2022 12:26:18 +0100
Guillaume Nault <gnault@redhat.com> wrote:

> On Tue, Feb 22, 2022 at 03:28:15PM -0800, Jakub Kicinski wrote:
> > On Tue, 22 Feb 2022 11:37:33 +0100 Guillaume Nault wrote:  
> > > What about an explicit option:
> > > 
> > >   ip link add link eth1 dev eth1.100 type vlan id 100 follow-parent-mtu
> > > 
> > > 
> > > Or for something more future proof, an option that can accept several
> > > policies:
> > > 
> > >   mtu-update <reduce-only,follow,...>
> > > 
> > >       reduce-only (default):
> > >         update vlan's MTU only if the new MTU is smaller than the
> > >         current one (current behaviour).
> > > 
> > >       follow:
> > >         always follow the MTU of the parent device.
> > > 
> > > Then if anyone wants more complex policies:
> > > 
> > >       follow-if-not-modified:
> > >         follow the MTU of the parent device as long as the VLAN's MTU
> > >         was not manually changed. Otherwise only adjust the VLAN's MTU
> > >         when the parent's one is set to a smaller value.
> > > 
> > >       follow-if-not-modified-but-not-quite:
> > >         like follow-if-not-modified but revert back to the VLAN's
> > >         last manually modified MTU, if any, whenever possible (that is,
> > >         when the parent device's MTU is set back to a higher value).
> > >         That probably requires the possibility to dump the last
> > >         modified MTU, so the administrator can anticipate the
> > >         consequences of modifying the parent device.
> > > 
> > >      yet-another-policy (because people have a lot of imagination):
> > >        for example, keep the MTU 4 bytes lower than the parent device,
> > >        to account for VLAN overhead.
> > > 
> > > Of course feel free to suggest better names and policies :).
> > > 
> > > This way, we can keep the current behaviour and avoid unexpected
> > > heuristics that are difficult to explain (and even more difficult for
> > > network admins to figure out on their own).  
> > 
> > My $0.02 would be that if we want to make changes that require new uAPI
> > we should do it across uppers.  
> 
> Do you mean something like:
> 
>   ip link set dev eth0 vlan-mtu-policy <policy-name>
> 
> that'd affect all existing (and future) vlans of eth0?
> 
> Then I think that for non-ethernet devices, we should reject this
> option and skip it when dumping config. But yes, that's another
> possibility.
> 
> I personnaly don't really mind, as long as we keep a clear behaviour.
> 
> What I'd really like to avoid is something like:
>   - By default it behaves this way.
>   - If you modified the MTU it behaves in another way
>   - But if you modified the MTU but later restored the
>     original MTU, then you're back to the default behaviour
>     (or not?), unless the MTU of the upper device was also
>     changed meanwhile, in which case ... to be continued ...
>   - BTW, you might not be able to tell how the VLAN's MTU is going to
>     behave by simply looking at its configuration, because that also
>     depends on past configurations.
>   - Well, and if your kernel is older than xxx, then you always get the
>     default behaviour.
>   - ... and we might modify the heuristics again in the future to
>     accomodate with situations or use cases we failed to consider.
> 

In general these kind of policy choices are done via sysctl knobs.
They aren't done at netlink/ip link level.
Jakub Kicinski Feb. 23, 2022, 4:03 p.m. UTC | #11
On Wed, 23 Feb 2022 12:26:18 +0100 Guillaume Nault wrote:
> Do you mean something like:
> 
>   ip link set dev eth0 vlan-mtu-policy <policy-name>
> 
> that'd affect all existing (and future) vlans of eth0?

I meant

  ip link set dev vlan0 mtu-policy blah

but also

  ip link set dev bond0 mtu-policy blah

and

  ip link set dev macsec0 mtu-policy blah2
  ip link set dev vxlan0 mtu-policy blah2

etc.

> Then I think that for non-ethernet devices, we should reject this
> option and skip it when dumping config. But yes, that's another
> possibility.
> 
> I personnaly don't really mind, as long as we keep a clear behaviour.
> 
> What I'd really like to avoid is something like:
>   - By default it behaves this way.
>   - If you modified the MTU it behaves in another way
>   - But if you modified the MTU but later restored the
>     original MTU, then you're back to the default behaviour
>     (or not?), unless the MTU of the upper device was also
>     changed meanwhile, in which case ... to be continued ...
>   - BTW, you might not be able to tell how the VLAN's MTU is going to
>     behave by simply looking at its configuration, because that also
>     depends on past configurations.
>   - Well, and if your kernel is older than xxx, then you always get the
>     default behaviour.
>   - ... and we might modify the heuristics again in the future to
>     accomodate with situations or use cases we failed to consider.

To be honest I'm still not clear if this is a real problem.
The patch does not specify what the use case is.
Guillaume Nault Feb. 23, 2022, 4:34 p.m. UTC | #12
On Wed, Feb 23, 2022 at 07:17:36AM -0800, Stephen Hemminger wrote:
> On Wed, 23 Feb 2022 12:26:18 +0100
> Guillaume Nault <gnault@redhat.com> wrote:
> 
> > On Tue, Feb 22, 2022 at 03:28:15PM -0800, Jakub Kicinski wrote:
> > > On Tue, 22 Feb 2022 11:37:33 +0100 Guillaume Nault wrote:  
> > > > What about an explicit option:
> > > > 
> > > >   ip link add link eth1 dev eth1.100 type vlan id 100 follow-parent-mtu
> > > > 
> > > > 
> > > > Or for something more future proof, an option that can accept several
> > > > policies:
> > > > 
> > > >   mtu-update <reduce-only,follow,...>
> > > > 
> > > >       reduce-only (default):
> > > >         update vlan's MTU only if the new MTU is smaller than the
> > > >         current one (current behaviour).
> > > > 
> > > >       follow:
> > > >         always follow the MTU of the parent device.
> > > > 
> > > > Then if anyone wants more complex policies:
> > > > 
> > > >       follow-if-not-modified:
> > > >         follow the MTU of the parent device as long as the VLAN's MTU
> > > >         was not manually changed. Otherwise only adjust the VLAN's MTU
> > > >         when the parent's one is set to a smaller value.
> > > > 
> > > >       follow-if-not-modified-but-not-quite:
> > > >         like follow-if-not-modified but revert back to the VLAN's
> > > >         last manually modified MTU, if any, whenever possible (that is,
> > > >         when the parent device's MTU is set back to a higher value).
> > > >         That probably requires the possibility to dump the last
> > > >         modified MTU, so the administrator can anticipate the
> > > >         consequences of modifying the parent device.
> > > > 
> > > >      yet-another-policy (because people have a lot of imagination):
> > > >        for example, keep the MTU 4 bytes lower than the parent device,
> > > >        to account for VLAN overhead.
> > > > 
> > > > Of course feel free to suggest better names and policies :).
> > > > 
> > > > This way, we can keep the current behaviour and avoid unexpected
> > > > heuristics that are difficult to explain (and even more difficult for
> > > > network admins to figure out on their own).  
> > > 
> > > My $0.02 would be that if we want to make changes that require new uAPI
> > > we should do it across uppers.  
> > 
> > Do you mean something like:
> > 
> >   ip link set dev eth0 vlan-mtu-policy <policy-name>
> > 
> > that'd affect all existing (and future) vlans of eth0?
> > 
> > Then I think that for non-ethernet devices, we should reject this
> > option and skip it when dumping config. But yes, that's another
> > possibility.
> > 
> > I personnaly don't really mind, as long as we keep a clear behaviour.
> > 
> > What I'd really like to avoid is something like:
> >   - By default it behaves this way.
> >   - If you modified the MTU it behaves in another way
> >   - But if you modified the MTU but later restored the
> >     original MTU, then you're back to the default behaviour
> >     (or not?), unless the MTU of the upper device was also
> >     changed meanwhile, in which case ... to be continued ...
> >   - BTW, you might not be able to tell how the VLAN's MTU is going to
> >     behave by simply looking at its configuration, because that also
> >     depends on past configurations.
> >   - Well, and if your kernel is older than xxx, then you always get the
> >     default behaviour.
> >   - ... and we might modify the heuristics again in the future to
> >     accomodate with situations or use cases we failed to consider.
> > 
> 
> In general these kind of policy choices are done via sysctl knobs.
> They aren't done at netlink/ip link level.

I don't really mind if the configuration is per vlan, per upper device
or per netns, as long as we keep a clear behaviour by default.
Guillaume Nault Feb. 23, 2022, 4:58 p.m. UTC | #13
On Wed, Feb 23, 2022 at 08:03:42AM -0800, Jakub Kicinski wrote:
> On Wed, 23 Feb 2022 12:26:18 +0100 Guillaume Nault wrote:
> > Do you mean something like:
> > 
> >   ip link set dev eth0 vlan-mtu-policy <policy-name>
> > 
> > that'd affect all existing (and future) vlans of eth0?
> 
> I meant
> 
>   ip link set dev vlan0 mtu-policy blah
> 
> but also
> 
>   ip link set dev bond0 mtu-policy blah
> 
> and
> 
>   ip link set dev macsec0 mtu-policy blah2
>   ip link set dev vxlan0 mtu-policy blah2
> 
> etc.

Unless I'm missing something, that looks very much like what I proposed
(these are all ARPHRD_ETHER devices). It's just a bit unclear whether
"ip link set dev vlan0 mtu-policy blah" applies to vlan0 or to the vlans
that might be stacked on top of it (given your other examples, I assume
it's the later).

> > Then I think that for non-ethernet devices, we should reject this
> > option and skip it when dumping config. But yes, that's another
> > possibility.
> > 
> > I personnaly don't really mind, as long as we keep a clear behaviour.
> > 
> > What I'd really like to avoid is something like:
> >   - By default it behaves this way.
> >   - If you modified the MTU it behaves in another way
> >   - But if you modified the MTU but later restored the
> >     original MTU, then you're back to the default behaviour
> >     (or not?), unless the MTU of the upper device was also
> >     changed meanwhile, in which case ... to be continued ...
> >   - BTW, you might not be able to tell how the VLAN's MTU is going to
> >     behave by simply looking at its configuration, because that also
> >     depends on past configurations.
> >   - Well, and if your kernel is older than xxx, then you always get the
> >     default behaviour.
> >   - ... and we might modify the heuristics again in the future to
> >     accomodate with situations or use cases we failed to consider.
> 
> To be honest I'm still not clear if this is a real problem.
> The patch does not specify what the use case is.

It's probably not a problem as long as we keep sane behaviour by
default. Then we can let admins opt in for something more complex or
loosely defined.
Stephen Hemminger Feb. 23, 2022, 5:05 p.m. UTC | #14
On Wed, 23 Feb 2022 08:03:42 -0800
Jakub Kicinski <kuba@kernel.org> wrote:

> On Wed, 23 Feb 2022 12:26:18 +0100 Guillaume Nault wrote:
> > Do you mean something like:
> > 
> >   ip link set dev eth0 vlan-mtu-policy <policy-name>
> > 
> > that'd affect all existing (and future) vlans of eth0?  
> 
> I meant
> 
>   ip link set dev vlan0 mtu-policy blah
> 
> but also
> 
>   ip link set dev bond0 mtu-policy blah
> 
> and
> 
>   ip link set dev macsec0 mtu-policy blah2
>   ip link set dev vxlan0 mtu-policy blah2

Sorry, putting this in ip link is not the right place.
It belongs in sysctl (if at all); not convinced this is worth doing.
Jakub Kicinski Feb. 23, 2022, 5:37 p.m. UTC | #15
On Wed, 23 Feb 2022 17:58:36 +0100 Guillaume Nault wrote:
> On Wed, Feb 23, 2022 at 08:03:42AM -0800, Jakub Kicinski wrote:
> > I meant
> > 
> >   ip link set dev vlan0 mtu-policy blah
> > 
> > but also
> > 
> >   ip link set dev bond0 mtu-policy blah
> > 
> > and
> > 
> >   ip link set dev macsec0 mtu-policy blah2
> >   ip link set dev vxlan0 mtu-policy blah2
> > 
> > etc.  
> 
> Unless I'm missing something, that looks very much like what I proposed
> (these are all ARPHRD_ETHER devices). It's just a bit unclear whether
> "ip link set dev vlan0 mtu-policy blah" applies to vlan0 or to the vlans
> that might be stacked on top of it (given your other examples, I assume
> it's the later).

No, sorry I thought it would be clear, we need that neuralink ;)
It applies to the device on which it's configured. What I mean
is that bond, macsec, mpls etc have the same "should it follow 
the MTU of the lower device" problem, it's not vlan specific.
Or am I wrong about that?

> > To be honest I'm still not clear if this is a real problem.
> > The patch does not specify what the use case is.  
> 
> It's probably not a problem as long as we keep sane behaviour by
> default. Then we can let admins opt in for something more complex or
> loosely defined.

What I meant was - does anyone actually flip the MTU of their
interfaces back and forth while the system is running. Maybe
people do.
Guillaume Nault Feb. 23, 2022, 7:46 p.m. UTC | #16
On Wed, Feb 23, 2022 at 09:37:49AM -0800, Jakub Kicinski wrote:
> On Wed, 23 Feb 2022 17:58:36 +0100 Guillaume Nault wrote:
> > On Wed, Feb 23, 2022 at 08:03:42AM -0800, Jakub Kicinski wrote:
> > > I meant
> > > 
> > >   ip link set dev vlan0 mtu-policy blah
> > > 
> > > but also
> > > 
> > >   ip link set dev bond0 mtu-policy blah
> > > 
> > > and
> > > 
> > >   ip link set dev macsec0 mtu-policy blah2
> > >   ip link set dev vxlan0 mtu-policy blah2
> > > 
> > > etc.  
> > 
> > Unless I'm missing something, that looks very much like what I proposed
> > (these are all ARPHRD_ETHER devices). It's just a bit unclear whether
> > "ip link set dev vlan0 mtu-policy blah" applies to vlan0 or to the vlans
> > that might be stacked on top of it (given your other examples, I assume
> > it's the later).
> 
> No, sorry I thought it would be clear, we need that neuralink ;)
> It applies to the device on which it's configured. What I mean
> is that bond, macsec, mpls etc have the same "should it follow 
> the MTU of the lower device" problem, it's not vlan specific.
> Or am I wrong about that?

Ok, I get it now, sorry for being slow :). But I wouldn't consider mpls
and vxlan. We have no device type for mpls. For vxlan (and other ip
tunnels) the virtual device isn't directly tied to a physical device.
Also, ip tunnels can resort to fragmentation in case of small MTU on
the output device, so following MTU changes is not a hard requirement
as with vlans.

For other devices, we'd probably have to take into account the fact
that some of them need to have a smaller MTU due to their extra header
(that can be the case for some stacked vlans scenarios).

But honnestly, I don't believe it's worth the extra complexity.

> > > To be honest I'm still not clear if this is a real problem.
> > > The patch does not specify what the use case is.  
> > 
> > It's probably not a problem as long as we keep sane behaviour by
> > default. Then we can let admins opt in for something more complex or
> > loosely defined.
> 
> What I meant was - does anyone actually flip the MTU of their
> interfaces back and forth while the system is running. Maybe
> people do.

In my experience people often try to upgrade their MTU, which is prone
to failure because all nodes on the ethernet segment need to be
upgraded at once (and people like unmanageably big ethernet segments).
So reverting to the previous configuration is often needed.
Another reason for back and forth modifications is fat fingers: change
the MTU of a device, realise that was the wrong one, restore settings
and reapply on the correct device.

More importantly, one path people take to upgrade their MTU is to
ensure that all their traffic is vlan encapsulated, then higher the MTU
of the ethernet device, and finally higher the MTU of each vlan on a
case by case basis. In such scenarios, you certainly _don't_ want the
vlans to follow the MTU of their parent device, no matter if their MTU
is the default one, if it's equal to the current MTU of the eth
interface, if it was ever modified since the creation of the device,
or any other situation heuristics might use.
diff mbox series

Patch

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 788076b002b3..7de4f462525a 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -361,6 +361,7 @@  static int __vlan_device_event(struct net_device *dev, unsigned long event)
 static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 			     void *ptr)
 {
+	unsigned int orig_mtu = ((struct netdev_notifier_info_ext *)ptr)->ext.mtu;
 	struct netlink_ext_ack *extack = netdev_notifier_info_to_extack(ptr);
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 	struct vlan_group *grp;
@@ -419,7 +420,7 @@  static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 
 	case NETDEV_CHANGEMTU:
 		vlan_group_for_each_dev(grp, i, vlandev) {
-			if (vlandev->mtu <= dev->mtu)
+			if (vlandev->mtu <= dev->mtu && vlandev->mtu != orig_mtu)
 				continue;
 
 			dev_set_mtu(vlandev, dev->mtu);