Message ID | 20220221124644.1146105-1-william.xuanziyang@huawei.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: vlan: allow vlan device MTU change follow real device from smaller to bigger | expand |
CC Herbert Xu, author of blamed commit. On Mon, Feb 21, 2022 at 4:28 AM Ziyang Xuan <william.xuanziyang@huawei.com> wrote: > > vlan device MTU can only follow real device change from bigger to smaller > but from smaller to bigger under the premise of vlan device MTU not exceed > the real device MTU. > > This issue can be seen using the following commands: > > ip link add link eth1 dev eth1.100 type vlan id 100 > ip link set eth1 mtu 256 > ip link set eth1 mtu 1500 > ip link show > > Modify to allow vlan device follow real device MTU change from smaller > to bigger when user has not configured vlan device MTU which is not > equal to real device MTU. That also ensure user configuration has higher > priority. > > Fixes: 2e477c9bd2bb ("vlan: Propagate physical MTU changes") > Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> > --- > net/8021q/vlan.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c > index 788076b002b3..7de4f462525a 100644 > --- a/net/8021q/vlan.c > +++ b/net/8021q/vlan.c > @@ -361,6 +361,7 @@ static int __vlan_device_event(struct net_device *dev, unsigned long event) > static int vlan_device_event(struct notifier_block *unused, unsigned long event, > void *ptr) > { > + unsigned int orig_mtu = ((struct netdev_notifier_info_ext *)ptr)->ext.mtu; > struct netlink_ext_ack *extack = netdev_notifier_info_to_extack(ptr); > struct net_device *dev = netdev_notifier_info_to_dev(ptr); > struct vlan_group *grp; > @@ -419,7 +420,7 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, > > case NETDEV_CHANGEMTU: > vlan_group_for_each_dev(grp, i, vlandev) { > - if (vlandev->mtu <= dev->mtu) > + if (vlandev->mtu <= dev->mtu && vlandev->mtu != orig_mtu) > continue; > > dev_set_mtu(vlandev, dev->mtu); > -- > 2.25.1 > Herbert, do you recall why only a decrease was taken into consideration ? Thanks.
On Mon, Feb 21, 2022 at 07:43:18AM -0800, Eric Dumazet wrote: > > Herbert, do you recall why only a decrease was taken into consideration ? Because we shouldn't override administrative settings of the MTU on the vlan device, unless we have to because of an MTU reduction on the underlying device. Yes this is not perfect if the admin never set an MTU to start with but as we don't have a way of telling whether the admin has or has not changed the MTU setting, the safest course of action is to do nothing in that case. Thanks,
> On Mon, Feb 21, 2022 at 07:43:18AM -0800, Eric Dumazet wrote: >> >> Herbert, do you recall why only a decrease was taken into consideration ? > > Because we shouldn't override administrative settings of the MTU > on the vlan device, unless we have to because of an MTU reduction > on the underlying device. > > Yes this is not perfect if the admin never set an MTU to start with > but as we don't have a way of telling whether the admin has or has > not changed the MTU setting, the safest course of action is to do > nothing in that case. If the admin has changed the vlan device MTU smaller than the underlying device MTU firstly, then changed the underlying device MTU smaller than the vlan device MTU secondly. The admin's configuration has been overridden. Can we consider that the admin's configuration for the vlan device MTU has been invalid and disappeared after the second change? I think so. > > Thanks, >
On Mon, Feb 21, 2022 at 6:06 PM Ziyang Xuan (William) <william.xuanziyang@huawei.com> wrote: > > > On Mon, Feb 21, 2022 at 07:43:18AM -0800, Eric Dumazet wrote: > >> > >> Herbert, do you recall why only a decrease was taken into consideration ? > > > > Because we shouldn't override administrative settings of the MTU > > on the vlan device, unless we have to because of an MTU reduction > > on the underlying device. > > > > Yes this is not perfect if the admin never set an MTU to start with > > but as we don't have a way of telling whether the admin has or has > > not changed the MTU setting, the safest course of action is to do > > nothing in that case. > If the admin has changed the vlan device MTU smaller than the underlying > device MTU firstly, then changed the underlying device MTU smaller than > the vlan device MTU secondly. The admin's configuration has been overridden. > Can we consider that the admin's configuration for the vlan device MTU has > been invalid and disappeared after the second change? I think so. The answer is no. Herbert is saying: ip link add link eth1 dev eth1.100 type vlan id 100 ... ip link set eth1.100 mtu 800 .. ip link set eth1 mtu 256 ip link set eth1 mtu 1500 -> we do not want eth1.100 mtu being set back to 1500, this might break applications, depending on old kernel feature. Eventually, setting back to 800 seems ok. If you want this new feature, we need to record in eth1.100 device that no admin ever changed the mtu, as Herbert suggested. Then, it is okay to upgrade the vlan mtu (but still is a behavioral change that _could_ break some scripts) Thank you.
> On Mon, Feb 21, 2022 at 6:06 PM Ziyang Xuan (William) > <william.xuanziyang@huawei.com> wrote: >> >>> On Mon, Feb 21, 2022 at 07:43:18AM -0800, Eric Dumazet wrote: >>>> >>>> Herbert, do you recall why only a decrease was taken into consideration ? >>> >>> Because we shouldn't override administrative settings of the MTU >>> on the vlan device, unless we have to because of an MTU reduction >>> on the underlying device. >>> >>> Yes this is not perfect if the admin never set an MTU to start with >>> but as we don't have a way of telling whether the admin has or has >>> not changed the MTU setting, the safest course of action is to do >>> nothing in that case. >> If the admin has changed the vlan device MTU smaller than the underlying >> device MTU firstly, then changed the underlying device MTU smaller than >> the vlan device MTU secondly. The admin's configuration has been overridden. >> Can we consider that the admin's configuration for the vlan device MTU has >> been invalid and disappeared after the second change? I think so. > > The answer is no. > > Herbert is saying: > > ip link add link eth1 dev eth1.100 type vlan id 100 > ... > ip link set eth1.100 mtu 800 > .. > ip link set eth1 mtu 256 > ip link set eth1 mtu 1500 > > -> we do not want eth1.100 mtu being set back to 1500, this might > break applications, depending on old kernel feature. > Eventually, setting back to 800 seems ok. It seem that setting back to 800 more reasonable. We can record user setting MTU by interface ndo_change_mtu() in struct vlan_dev_priv. > > If you want this new feature, we need to record in eth1.100 device > that no admin ever changed the mtu, > as Herbert suggested. > > Then, it is okay to upgrade the vlan mtu (but still is a behavioral > change that _could_ break some scripts) > > Thank you. > . >
On Mon, Feb 21, 2022 at 06:27:46PM -0800, Eric Dumazet wrote: > On Mon, Feb 21, 2022 at 6:06 PM Ziyang Xuan (William) > <william.xuanziyang@huawei.com> wrote: > > > > > On Mon, Feb 21, 2022 at 07:43:18AM -0800, Eric Dumazet wrote: > > >> > > >> Herbert, do you recall why only a decrease was taken into consideration ? > > > > > > Because we shouldn't override administrative settings of the MTU > > > on the vlan device, unless we have to because of an MTU reduction > > > on the underlying device. > > > > > > Yes this is not perfect if the admin never set an MTU to start with > > > but as we don't have a way of telling whether the admin has or has > > > not changed the MTU setting, the safest course of action is to do > > > nothing in that case. > > If the admin has changed the vlan device MTU smaller than the underlying > > device MTU firstly, then changed the underlying device MTU smaller than > > the vlan device MTU secondly. The admin's configuration has been overridden. > > Can we consider that the admin's configuration for the vlan device MTU has > > been invalid and disappeared after the second change? I think so. > > The answer is no. > > Herbert is saying: > > ip link add link eth1 dev eth1.100 type vlan id 100 > ... > ip link set eth1.100 mtu 800 > .. > ip link set eth1 mtu 256 > ip link set eth1 mtu 1500 > > -> we do not want eth1.100 mtu being set back to 1500, this might > break applications, depending on old kernel feature. > Eventually, setting back to 800 seems ok. > > If you want this new feature, we need to record in eth1.100 device > that no admin ever changed the mtu, > as Herbert suggested. > > Then, it is okay to upgrade the vlan mtu (but still is a behavioral > change that _could_ break some scripts) What about an explicit option: ip link add link eth1 dev eth1.100 type vlan id 100 follow-parent-mtu Or for something more future proof, an option that can accept several policies: mtu-update <reduce-only,follow,...> reduce-only (default): update vlan's MTU only if the new MTU is smaller than the current one (current behaviour). follow: always follow the MTU of the parent device. Then if anyone wants more complex policies: follow-if-not-modified: follow the MTU of the parent device as long as the VLAN's MTU was not manually changed. Otherwise only adjust the VLAN's MTU when the parent's one is set to a smaller value. follow-if-not-modified-but-not-quite: like follow-if-not-modified but revert back to the VLAN's last manually modified MTU, if any, whenever possible (that is, when the parent device's MTU is set back to a higher value). That probably requires the possibility to dump the last modified MTU, so the administrator can anticipate the consequences of modifying the parent device. yet-another-policy (because people have a lot of imagination): for example, keep the MTU 4 bytes lower than the parent device, to account for VLAN overhead. Of course feel free to suggest better names and policies :). This way, we can keep the current behaviour and avoid unexpected heuristics that are difficult to explain (and even more difficult for network admins to figure out on their own). > Thank you. >
On Tue, 22 Feb 2022 11:37:33 +0100 Guillaume Nault wrote: > What about an explicit option: > > ip link add link eth1 dev eth1.100 type vlan id 100 follow-parent-mtu > > > Or for something more future proof, an option that can accept several > policies: > > mtu-update <reduce-only,follow,...> > > reduce-only (default): > update vlan's MTU only if the new MTU is smaller than the > current one (current behaviour). > > follow: > always follow the MTU of the parent device. > > Then if anyone wants more complex policies: > > follow-if-not-modified: > follow the MTU of the parent device as long as the VLAN's MTU > was not manually changed. Otherwise only adjust the VLAN's MTU > when the parent's one is set to a smaller value. > > follow-if-not-modified-but-not-quite: > like follow-if-not-modified but revert back to the VLAN's > last manually modified MTU, if any, whenever possible (that is, > when the parent device's MTU is set back to a higher value). > That probably requires the possibility to dump the last > modified MTU, so the administrator can anticipate the > consequences of modifying the parent device. > > yet-another-policy (because people have a lot of imagination): > for example, keep the MTU 4 bytes lower than the parent device, > to account for VLAN overhead. > > Of course feel free to suggest better names and policies :). > > This way, we can keep the current behaviour and avoid unexpected > heuristics that are difficult to explain (and even more difficult for > network admins to figure out on their own). My $0.02 would be that if we want to make changes that require new uAPI we should do it across uppers.
>> On Mon, Feb 21, 2022 at 6:06 PM Ziyang Xuan (William) >> <william.xuanziyang@huawei.com> wrote: >>> >>>> On Mon, Feb 21, 2022 at 07:43:18AM -0800, Eric Dumazet wrote: >>>>> >>>>> Herbert, do you recall why only a decrease was taken into consideration ? >>>> >>>> Because we shouldn't override administrative settings of the MTU >>>> on the vlan device, unless we have to because of an MTU reduction >>>> on the underlying device. >>>> >>>> Yes this is not perfect if the admin never set an MTU to start with >>>> but as we don't have a way of telling whether the admin has or has >>>> not changed the MTU setting, the safest course of action is to do >>>> nothing in that case. >>> If the admin has changed the vlan device MTU smaller than the underlying >>> device MTU firstly, then changed the underlying device MTU smaller than >>> the vlan device MTU secondly. The admin's configuration has been overridden. >>> Can we consider that the admin's configuration for the vlan device MTU has >>> been invalid and disappeared after the second change? I think so. >> >> The answer is no. >> >> Herbert is saying: >> >> ip link add link eth1 dev eth1.100 type vlan id 100 >> ... >> ip link set eth1.100 mtu 800 >> .. >> ip link set eth1 mtu 256 >> ip link set eth1 mtu 1500 >> >> -> we do not want eth1.100 mtu being set back to 1500, this might >> break applications, depending on old kernel feature. >> Eventually, setting back to 800 seems ok. > > It seem that setting back to 800 more reasonable. We can record user > setting MTU by interface ndo_change_mtu() in struct vlan_dev_priv. > I attempt to record user setting MTU for vlan device. Use the recorded MTU to restore when uderlying device change the MTU from smaller to bigger than recorded vlan device MTU. The modification as following: diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h index 2be4dd7e90a9..b8970e90a279 100644 --- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -163,6 +163,7 @@ struct netpoll; * @vlan_proto: VLAN encapsulation protocol * @vlan_id: VLAN identifier * @flags: device flags + * @mtu: user setting MTU * @real_dev: underlying netdevice * @dev_tracker: refcount tracker for @real_dev reference * @real_dev_addr: address of underlying netdevice @@ -179,6 +180,8 @@ struct vlan_dev_priv { u16 vlan_id; u16 flags; + u32 mtu; + struct net_device *real_dev; netdevice_tracker dev_tracker; diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 788076b002b3..492ef88923c2 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -365,6 +365,7 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, struct net_device *dev = netdev_notifier_info_to_dev(ptr); struct vlan_group *grp; struct vlan_info *vlan_info; + unsigned int new_mtu; int i, flgs; struct net_device *vlandev; struct vlan_dev_priv *vlan; @@ -419,10 +420,10 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, case NETDEV_CHANGEMTU: vlan_group_for_each_dev(grp, i, vlandev) { - if (vlandev->mtu <= dev->mtu) - continue; + vlan = vlan_dev_priv(vlandev); + new_mtu = (!vlan->mtu || dev->mtu < vlan->mtu) ? dev->mtu : vlan->mtu; - dev_set_mtu(vlandev, dev->mtu); + dev_set_mtu(vlandev, new_mtu); } break; diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index d1902828a18a..66c2b64d1ece 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -140,7 +140,8 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb, static int vlan_dev_change_mtu(struct net_device *dev, int new_mtu) { - struct net_device *real_dev = vlan_dev_priv(dev)->real_dev; + struct vlan_dev_priv *vlan = vlan_dev_priv(dev); + struct net_device *real_dev = vlan->real_dev; unsigned int max_mtu = real_dev->mtu; if (netif_reduces_vlan_mtu(real_dev)) @@ -148,8 +149,15 @@ static int vlan_dev_change_mtu(struct net_device *dev, int new_mtu) if (max_mtu < new_mtu) return -ERANGE; - dev->mtu = new_mtu; + /* Identify user MTU change different from the underlying devcie + * NETDEV_CHANGEMTU event. Record user setting MTU in mtu member + * of struct vlan_dev_priv. + */ + if ((!vlan->mtu && new_mtu != real_dev->mtu) || + (dev->mtu == vlan->mtu && vlan->mtu < real_dev->mtu)) + vlan->mtu = new_mtu; + dev->mtu = new_mtu; return 0; } I test it in various combination scenarios. I found it can not cover one scenario because user setting can not arrive vlan module codes. For example: ip link add link eth1 dev eth1.100 type vlan id 100 // eth1.100 MTU 1500 ip link set eth1.100 mtu 1500 // success no error When new_mtu equal to orig_mtu, user setting operation can not arrive lower vlan module, vlan module can not perceive, so we can not record the setting. Before my colleague point that the above setting is not error for user, I always think that it is invalid setting. But I think his opinion makes sense. So what do you think about the above setting? >> >> If you want this new feature, we need to record in eth1.100 device >> that no admin ever changed the mtu, >> as Herbert suggested. >> >> Then, it is okay to upgrade the vlan mtu (but still is a behavioral >> change that _could_ break some scripts) >> >> Thank you. >> . >> > . >
On Tue, Feb 22, 2022 at 03:28:15PM -0800, Jakub Kicinski wrote: > On Tue, 22 Feb 2022 11:37:33 +0100 Guillaume Nault wrote: > > What about an explicit option: > > > > ip link add link eth1 dev eth1.100 type vlan id 100 follow-parent-mtu > > > > > > Or for something more future proof, an option that can accept several > > policies: > > > > mtu-update <reduce-only,follow,...> > > > > reduce-only (default): > > update vlan's MTU only if the new MTU is smaller than the > > current one (current behaviour). > > > > follow: > > always follow the MTU of the parent device. > > > > Then if anyone wants more complex policies: > > > > follow-if-not-modified: > > follow the MTU of the parent device as long as the VLAN's MTU > > was not manually changed. Otherwise only adjust the VLAN's MTU > > when the parent's one is set to a smaller value. > > > > follow-if-not-modified-but-not-quite: > > like follow-if-not-modified but revert back to the VLAN's > > last manually modified MTU, if any, whenever possible (that is, > > when the parent device's MTU is set back to a higher value). > > That probably requires the possibility to dump the last > > modified MTU, so the administrator can anticipate the > > consequences of modifying the parent device. > > > > yet-another-policy (because people have a lot of imagination): > > for example, keep the MTU 4 bytes lower than the parent device, > > to account for VLAN overhead. > > > > Of course feel free to suggest better names and policies :). > > > > This way, we can keep the current behaviour and avoid unexpected > > heuristics that are difficult to explain (and even more difficult for > > network admins to figure out on their own). > > My $0.02 would be that if we want to make changes that require new uAPI > we should do it across uppers. Do you mean something like: ip link set dev eth0 vlan-mtu-policy <policy-name> that'd affect all existing (and future) vlans of eth0? Then I think that for non-ethernet devices, we should reject this option and skip it when dumping config. But yes, that's another possibility. I personnaly don't really mind, as long as we keep a clear behaviour. What I'd really like to avoid is something like: - By default it behaves this way. - If you modified the MTU it behaves in another way - But if you modified the MTU but later restored the original MTU, then you're back to the default behaviour (or not?), unless the MTU of the upper device was also changed meanwhile, in which case ... to be continued ... - BTW, you might not be able to tell how the VLAN's MTU is going to behave by simply looking at its configuration, because that also depends on past configurations. - Well, and if your kernel is older than xxx, then you always get the default behaviour. - ... and we might modify the heuristics again in the future to accomodate with situations or use cases we failed to consider.
On Wed, 23 Feb 2022 12:26:18 +0100 Guillaume Nault <gnault@redhat.com> wrote: > On Tue, Feb 22, 2022 at 03:28:15PM -0800, Jakub Kicinski wrote: > > On Tue, 22 Feb 2022 11:37:33 +0100 Guillaume Nault wrote: > > > What about an explicit option: > > > > > > ip link add link eth1 dev eth1.100 type vlan id 100 follow-parent-mtu > > > > > > > > > Or for something more future proof, an option that can accept several > > > policies: > > > > > > mtu-update <reduce-only,follow,...> > > > > > > reduce-only (default): > > > update vlan's MTU only if the new MTU is smaller than the > > > current one (current behaviour). > > > > > > follow: > > > always follow the MTU of the parent device. > > > > > > Then if anyone wants more complex policies: > > > > > > follow-if-not-modified: > > > follow the MTU of the parent device as long as the VLAN's MTU > > > was not manually changed. Otherwise only adjust the VLAN's MTU > > > when the parent's one is set to a smaller value. > > > > > > follow-if-not-modified-but-not-quite: > > > like follow-if-not-modified but revert back to the VLAN's > > > last manually modified MTU, if any, whenever possible (that is, > > > when the parent device's MTU is set back to a higher value). > > > That probably requires the possibility to dump the last > > > modified MTU, so the administrator can anticipate the > > > consequences of modifying the parent device. > > > > > > yet-another-policy (because people have a lot of imagination): > > > for example, keep the MTU 4 bytes lower than the parent device, > > > to account for VLAN overhead. > > > > > > Of course feel free to suggest better names and policies :). > > > > > > This way, we can keep the current behaviour and avoid unexpected > > > heuristics that are difficult to explain (and even more difficult for > > > network admins to figure out on their own). > > > > My $0.02 would be that if we want to make changes that require new uAPI > > we should do it across uppers. > > Do you mean something like: > > ip link set dev eth0 vlan-mtu-policy <policy-name> > > that'd affect all existing (and future) vlans of eth0? > > Then I think that for non-ethernet devices, we should reject this > option and skip it when dumping config. But yes, that's another > possibility. > > I personnaly don't really mind, as long as we keep a clear behaviour. > > What I'd really like to avoid is something like: > - By default it behaves this way. > - If you modified the MTU it behaves in another way > - But if you modified the MTU but later restored the > original MTU, then you're back to the default behaviour > (or not?), unless the MTU of the upper device was also > changed meanwhile, in which case ... to be continued ... > - BTW, you might not be able to tell how the VLAN's MTU is going to > behave by simply looking at its configuration, because that also > depends on past configurations. > - Well, and if your kernel is older than xxx, then you always get the > default behaviour. > - ... and we might modify the heuristics again in the future to > accomodate with situations or use cases we failed to consider. > In general these kind of policy choices are done via sysctl knobs. They aren't done at netlink/ip link level.
On Wed, 23 Feb 2022 12:26:18 +0100 Guillaume Nault wrote: > Do you mean something like: > > ip link set dev eth0 vlan-mtu-policy <policy-name> > > that'd affect all existing (and future) vlans of eth0? I meant ip link set dev vlan0 mtu-policy blah but also ip link set dev bond0 mtu-policy blah and ip link set dev macsec0 mtu-policy blah2 ip link set dev vxlan0 mtu-policy blah2 etc. > Then I think that for non-ethernet devices, we should reject this > option and skip it when dumping config. But yes, that's another > possibility. > > I personnaly don't really mind, as long as we keep a clear behaviour. > > What I'd really like to avoid is something like: > - By default it behaves this way. > - If you modified the MTU it behaves in another way > - But if you modified the MTU but later restored the > original MTU, then you're back to the default behaviour > (or not?), unless the MTU of the upper device was also > changed meanwhile, in which case ... to be continued ... > - BTW, you might not be able to tell how the VLAN's MTU is going to > behave by simply looking at its configuration, because that also > depends on past configurations. > - Well, and if your kernel is older than xxx, then you always get the > default behaviour. > - ... and we might modify the heuristics again in the future to > accomodate with situations or use cases we failed to consider. To be honest I'm still not clear if this is a real problem. The patch does not specify what the use case is.
On Wed, Feb 23, 2022 at 07:17:36AM -0800, Stephen Hemminger wrote: > On Wed, 23 Feb 2022 12:26:18 +0100 > Guillaume Nault <gnault@redhat.com> wrote: > > > On Tue, Feb 22, 2022 at 03:28:15PM -0800, Jakub Kicinski wrote: > > > On Tue, 22 Feb 2022 11:37:33 +0100 Guillaume Nault wrote: > > > > What about an explicit option: > > > > > > > > ip link add link eth1 dev eth1.100 type vlan id 100 follow-parent-mtu > > > > > > > > > > > > Or for something more future proof, an option that can accept several > > > > policies: > > > > > > > > mtu-update <reduce-only,follow,...> > > > > > > > > reduce-only (default): > > > > update vlan's MTU only if the new MTU is smaller than the > > > > current one (current behaviour). > > > > > > > > follow: > > > > always follow the MTU of the parent device. > > > > > > > > Then if anyone wants more complex policies: > > > > > > > > follow-if-not-modified: > > > > follow the MTU of the parent device as long as the VLAN's MTU > > > > was not manually changed. Otherwise only adjust the VLAN's MTU > > > > when the parent's one is set to a smaller value. > > > > > > > > follow-if-not-modified-but-not-quite: > > > > like follow-if-not-modified but revert back to the VLAN's > > > > last manually modified MTU, if any, whenever possible (that is, > > > > when the parent device's MTU is set back to a higher value). > > > > That probably requires the possibility to dump the last > > > > modified MTU, so the administrator can anticipate the > > > > consequences of modifying the parent device. > > > > > > > > yet-another-policy (because people have a lot of imagination): > > > > for example, keep the MTU 4 bytes lower than the parent device, > > > > to account for VLAN overhead. > > > > > > > > Of course feel free to suggest better names and policies :). > > > > > > > > This way, we can keep the current behaviour and avoid unexpected > > > > heuristics that are difficult to explain (and even more difficult for > > > > network admins to figure out on their own). > > > > > > My $0.02 would be that if we want to make changes that require new uAPI > > > we should do it across uppers. > > > > Do you mean something like: > > > > ip link set dev eth0 vlan-mtu-policy <policy-name> > > > > that'd affect all existing (and future) vlans of eth0? > > > > Then I think that for non-ethernet devices, we should reject this > > option and skip it when dumping config. But yes, that's another > > possibility. > > > > I personnaly don't really mind, as long as we keep a clear behaviour. > > > > What I'd really like to avoid is something like: > > - By default it behaves this way. > > - If you modified the MTU it behaves in another way > > - But if you modified the MTU but later restored the > > original MTU, then you're back to the default behaviour > > (or not?), unless the MTU of the upper device was also > > changed meanwhile, in which case ... to be continued ... > > - BTW, you might not be able to tell how the VLAN's MTU is going to > > behave by simply looking at its configuration, because that also > > depends on past configurations. > > - Well, and if your kernel is older than xxx, then you always get the > > default behaviour. > > - ... and we might modify the heuristics again in the future to > > accomodate with situations or use cases we failed to consider. > > > > In general these kind of policy choices are done via sysctl knobs. > They aren't done at netlink/ip link level. I don't really mind if the configuration is per vlan, per upper device or per netns, as long as we keep a clear behaviour by default.
On Wed, Feb 23, 2022 at 08:03:42AM -0800, Jakub Kicinski wrote: > On Wed, 23 Feb 2022 12:26:18 +0100 Guillaume Nault wrote: > > Do you mean something like: > > > > ip link set dev eth0 vlan-mtu-policy <policy-name> > > > > that'd affect all existing (and future) vlans of eth0? > > I meant > > ip link set dev vlan0 mtu-policy blah > > but also > > ip link set dev bond0 mtu-policy blah > > and > > ip link set dev macsec0 mtu-policy blah2 > ip link set dev vxlan0 mtu-policy blah2 > > etc. Unless I'm missing something, that looks very much like what I proposed (these are all ARPHRD_ETHER devices). It's just a bit unclear whether "ip link set dev vlan0 mtu-policy blah" applies to vlan0 or to the vlans that might be stacked on top of it (given your other examples, I assume it's the later). > > Then I think that for non-ethernet devices, we should reject this > > option and skip it when dumping config. But yes, that's another > > possibility. > > > > I personnaly don't really mind, as long as we keep a clear behaviour. > > > > What I'd really like to avoid is something like: > > - By default it behaves this way. > > - If you modified the MTU it behaves in another way > > - But if you modified the MTU but later restored the > > original MTU, then you're back to the default behaviour > > (or not?), unless the MTU of the upper device was also > > changed meanwhile, in which case ... to be continued ... > > - BTW, you might not be able to tell how the VLAN's MTU is going to > > behave by simply looking at its configuration, because that also > > depends on past configurations. > > - Well, and if your kernel is older than xxx, then you always get the > > default behaviour. > > - ... and we might modify the heuristics again in the future to > > accomodate with situations or use cases we failed to consider. > > To be honest I'm still not clear if this is a real problem. > The patch does not specify what the use case is. It's probably not a problem as long as we keep sane behaviour by default. Then we can let admins opt in for something more complex or loosely defined.
On Wed, 23 Feb 2022 08:03:42 -0800 Jakub Kicinski <kuba@kernel.org> wrote: > On Wed, 23 Feb 2022 12:26:18 +0100 Guillaume Nault wrote: > > Do you mean something like: > > > > ip link set dev eth0 vlan-mtu-policy <policy-name> > > > > that'd affect all existing (and future) vlans of eth0? > > I meant > > ip link set dev vlan0 mtu-policy blah > > but also > > ip link set dev bond0 mtu-policy blah > > and > > ip link set dev macsec0 mtu-policy blah2 > ip link set dev vxlan0 mtu-policy blah2 Sorry, putting this in ip link is not the right place. It belongs in sysctl (if at all); not convinced this is worth doing.
On Wed, 23 Feb 2022 17:58:36 +0100 Guillaume Nault wrote: > On Wed, Feb 23, 2022 at 08:03:42AM -0800, Jakub Kicinski wrote: > > I meant > > > > ip link set dev vlan0 mtu-policy blah > > > > but also > > > > ip link set dev bond0 mtu-policy blah > > > > and > > > > ip link set dev macsec0 mtu-policy blah2 > > ip link set dev vxlan0 mtu-policy blah2 > > > > etc. > > Unless I'm missing something, that looks very much like what I proposed > (these are all ARPHRD_ETHER devices). It's just a bit unclear whether > "ip link set dev vlan0 mtu-policy blah" applies to vlan0 or to the vlans > that might be stacked on top of it (given your other examples, I assume > it's the later). No, sorry I thought it would be clear, we need that neuralink ;) It applies to the device on which it's configured. What I mean is that bond, macsec, mpls etc have the same "should it follow the MTU of the lower device" problem, it's not vlan specific. Or am I wrong about that? > > To be honest I'm still not clear if this is a real problem. > > The patch does not specify what the use case is. > > It's probably not a problem as long as we keep sane behaviour by > default. Then we can let admins opt in for something more complex or > loosely defined. What I meant was - does anyone actually flip the MTU of their interfaces back and forth while the system is running. Maybe people do.
On Wed, Feb 23, 2022 at 09:37:49AM -0800, Jakub Kicinski wrote: > On Wed, 23 Feb 2022 17:58:36 +0100 Guillaume Nault wrote: > > On Wed, Feb 23, 2022 at 08:03:42AM -0800, Jakub Kicinski wrote: > > > I meant > > > > > > ip link set dev vlan0 mtu-policy blah > > > > > > but also > > > > > > ip link set dev bond0 mtu-policy blah > > > > > > and > > > > > > ip link set dev macsec0 mtu-policy blah2 > > > ip link set dev vxlan0 mtu-policy blah2 > > > > > > etc. > > > > Unless I'm missing something, that looks very much like what I proposed > > (these are all ARPHRD_ETHER devices). It's just a bit unclear whether > > "ip link set dev vlan0 mtu-policy blah" applies to vlan0 or to the vlans > > that might be stacked on top of it (given your other examples, I assume > > it's the later). > > No, sorry I thought it would be clear, we need that neuralink ;) > It applies to the device on which it's configured. What I mean > is that bond, macsec, mpls etc have the same "should it follow > the MTU of the lower device" problem, it's not vlan specific. > Or am I wrong about that? Ok, I get it now, sorry for being slow :). But I wouldn't consider mpls and vxlan. We have no device type for mpls. For vxlan (and other ip tunnels) the virtual device isn't directly tied to a physical device. Also, ip tunnels can resort to fragmentation in case of small MTU on the output device, so following MTU changes is not a hard requirement as with vlans. For other devices, we'd probably have to take into account the fact that some of them need to have a smaller MTU due to their extra header (that can be the case for some stacked vlans scenarios). But honnestly, I don't believe it's worth the extra complexity. > > > To be honest I'm still not clear if this is a real problem. > > > The patch does not specify what the use case is. > > > > It's probably not a problem as long as we keep sane behaviour by > > default. Then we can let admins opt in for something more complex or > > loosely defined. > > What I meant was - does anyone actually flip the MTU of their > interfaces back and forth while the system is running. Maybe > people do. In my experience people often try to upgrade their MTU, which is prone to failure because all nodes on the ethernet segment need to be upgraded at once (and people like unmanageably big ethernet segments). So reverting to the previous configuration is often needed. Another reason for back and forth modifications is fat fingers: change the MTU of a device, realise that was the wrong one, restore settings and reapply on the correct device. More importantly, one path people take to upgrade their MTU is to ensure that all their traffic is vlan encapsulated, then higher the MTU of the ethernet device, and finally higher the MTU of each vlan on a case by case basis. In such scenarios, you certainly _don't_ want the vlans to follow the MTU of their parent device, no matter if their MTU is the default one, if it's equal to the current MTU of the eth interface, if it was ever modified since the creation of the device, or any other situation heuristics might use.
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 788076b002b3..7de4f462525a 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -361,6 +361,7 @@ static int __vlan_device_event(struct net_device *dev, unsigned long event) static int vlan_device_event(struct notifier_block *unused, unsigned long event, void *ptr) { + unsigned int orig_mtu = ((struct netdev_notifier_info_ext *)ptr)->ext.mtu; struct netlink_ext_ack *extack = netdev_notifier_info_to_extack(ptr); struct net_device *dev = netdev_notifier_info_to_dev(ptr); struct vlan_group *grp; @@ -419,7 +420,7 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, case NETDEV_CHANGEMTU: vlan_group_for_each_dev(grp, i, vlandev) { - if (vlandev->mtu <= dev->mtu) + if (vlandev->mtu <= dev->mtu && vlandev->mtu != orig_mtu) continue; dev_set_mtu(vlandev, dev->mtu);
vlan device MTU can only follow real device change from bigger to smaller but from smaller to bigger under the premise of vlan device MTU not exceed the real device MTU. This issue can be seen using the following commands: ip link add link eth1 dev eth1.100 type vlan id 100 ip link set eth1 mtu 256 ip link set eth1 mtu 1500 ip link show Modify to allow vlan device follow real device MTU change from smaller to bigger when user has not configured vlan device MTU which is not equal to real device MTU. That also ensure user configuration has higher priority. Fixes: 2e477c9bd2bb ("vlan: Propagate physical MTU changes") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> --- net/8021q/vlan.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)