[net,1/3] ipvlan: fix NETDEV_UP/NETDEV_DOWN event handling

Message ID 20250403085857.17868-2-liuhangbin@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series fix ipvlan/macvlan link event handing

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 9 of 9 maintainers
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 33 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Hangbin Liu April 3, 2025, 8:58 a.m. UTC
When setting the lower-layer link up/down, the ipvlan device synchronizes
its state via netif_stacked_transfer_operstate(), which only checks the
carrier state. However, setting the link down does not necessarily change
the carrier state for virtual interfaces like bonding. This causes the
ipvlan state to become out of sync with the lower-layer link state.

If the lower link and ipvlan are in the same namespace, this issue is
hidden because ip link show checks the link state in IFLA_LINK and has
a m_flag to control the state, displaying M-DOWN in the flags. However,
if the ipvlan and the lower link are in different namespaces, this
information is not available, and the ipvlan link state remains unchanged.
For example:

  1. Add an ipvlan over bond0.
  2. Move the ipvlan to a separate namespace and bring it up.
  3. Set bond0 link down.
  4. The ipvlan remains up.

This issue affects containers and pods, causing them to display an
incorrect link state for ipvlan. Fix this by explicitly changing the
IFF_UP flag, similar to how VLAN handles it.

Fixes: 57fb346cc7d0 ("ipvlan: Add handling of NETDEV_UP events")
Fixes: 229783970838 ("ipvlan: handle NETDEV_DOWN event")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/ipvlan/ipvlan_main.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)
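
The scenario in the commit message can be modelled in userspace. The following is a minimal sketch under stated assumptions, not kernel code: `struct model_dev`, `MODEL_IFF_UP` and every `model_*` function are hypothetical names invented for illustration, and the stand-in for netif_stacked_transfer_operstate() copies only the carrier bit (the real helper handles more state). It contrasts what an upper device looks like after the lower device is set admin-down, before and with the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the devices; none of this is kernel code. */
#define MODEL_IFF_UP 0x1u

struct model_dev {
	unsigned int flags;	/* admin state: MODEL_IFF_UP set means "up" */
	bool carrier;		/* link (carrier) state */
};

/* Simplified stand-in for netif_stacked_transfer_operstate(): carrier only. */
void model_transfer_operstate(const struct model_dev *lower,
			      struct model_dev *upper)
{
	upper->carrier = lower->carrier;
}

/*
 * Admin-down on a bond-like lower device: the admin flag is cleared but,
 * as the commit message notes, the carrier does not necessarily change.
 */
void model_lower_admin_down(struct model_dev *lower)
{
	lower->flags &= ~MODEL_IFF_UP;
}

/* Before the patch: the DOWN event only re-syncs the carrier. */
void model_netdev_down_before(const struct model_dev *lower,
			      struct model_dev *uppers, size_t n)
{
	for (size_t i = 0; i < n; i++)
		model_transfer_operstate(lower, &uppers[i]);
}

/* With the patch: each upper that is up is also set admin-down. */
void model_netdev_down_patched(const struct model_dev *lower,
			       struct model_dev *uppers, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!(uppers[i].flags & MODEL_IFF_UP))
			continue;	/* already down, skipped as in the patch */
		uppers[i].flags &= ~MODEL_IFF_UP;	/* stands in for dev_close() */
		model_transfer_operstate(lower, &uppers[i]);
	}
}
```

In this model, calling model_lower_admin_down() on a lower device with carrier on and then model_netdev_down_before() leaves every upper device with both MODEL_IFF_UP and carrier set, which is the stale state described above; model_netdev_down_patched() clears MODEL_IFF_UP on the uppers.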

Comments

Sabrina Dubroca April 3, 2025, 10:28 a.m. UTC | #1
Hello Hangbin,

2025-04-03, 08:58:55 +0000, Hangbin Liu wrote:
> When setting the lower-layer link up/down, the ipvlan device synchronizes
> its state via netif_stacked_transfer_operstate(), which only checks the
> carrier state. However, setting the link down does not necessarily change
> the carrier state for virtual interfaces like bonding. This causes the
> ipvlan state to become out of sync with the lower-layer link state.
> 
> If the lower link and ipvlan are in the same namespace, this issue is
> hidden because ip link show checks the link state in IFLA_LINK and has
> a m_flag to control the state, displaying M-DOWN in the flags. However,
> if the ipvlan and the lower link are in different namespaces, this
> information is not available, and the ipvlan link state remains unchanged.

Is the issue with the actual behavior (sending/receiving packets,
etc), or just in how it's displayed by iproute?

> For example:
> 
>   1. Add an ipvlan over bond0.
>   2. Move the ipvlan to a separate namespace and bring it up.
>   3. Set bond0 link down.
>   4. The ipvlan remains up.
> 
> This issue affects containers and pods, causing them to display an
> incorrect link state for ipvlan. Fix this by explicitly changing the
> IFF_UP flag, similar to how VLAN handles it.

I'm not sure this change of behavior can be done anymore. And I'm not
convinced vlan's behavior is better (commit 5e7565930524 ("vlan:
support "loose binding" to the underlying network device") describes
why it's not always wanted). IMO it makes sense to have admin state
separate from link state.

If you want a consistent behavior, the admin should also not be
allowed to set the link UP again while its lower device is not, like
VLAN does:

static int vlan_dev_open(struct net_device *dev)
{
	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
	struct net_device *real_dev = vlan->real_dev;
	int err;

	if (!(real_dev->flags & IFF_UP) &&
	    !(vlan->flags & VLAN_FLAG_LOOSE_BINDING))
		return -ENETDOWN;


(but that would almost certainly break someone's scripts)
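
The open-time check quoted from vlan_dev_open() can be isolated in a small userspace sketch. This is an illustration under assumptions, not the VLAN driver: `struct gate_lower`, `struct gate_upper`, `GATE_IFF_UP`, `GATE_FLAG_LOOSE_BINDING` and `gate_upper_open()` are hypothetical names, and the flag values are arbitrary. It shows that a strictly bound upper device cannot be brought up while its lower device is admin-down, whereas a loosely bound one can.

```c
#include <errno.h>

/* Hypothetical flag values for the model; not the kernel's definitions. */
#define GATE_IFF_UP		0x1u
#define GATE_FLAG_LOOSE_BINDING	0x4u

struct gate_lower {
	unsigned int flags;		/* admin state of the lower device */
};

struct gate_upper {
	unsigned int flags;		/* admin state of the upper device */
	unsigned int binding_flags;	/* GATE_FLAG_LOOSE_BINDING or 0 */
	const struct gate_lower *lower;	/* device this one is stacked on */
};

/*
 * Mirrors the condition in the quoted excerpt: refuse to open when the
 * lower device is admin-down, unless loose binding was requested.
 */
int gate_upper_open(struct gate_upper *upper)
{
	if (!(upper->lower->flags & GATE_IFF_UP) &&
	    !(upper->binding_flags & GATE_FLAG_LOOSE_BINDING))
		return -ENETDOWN;

	upper->flags |= GATE_IFF_UP;
	return 0;
}
```

With the lower device down, gate_upper_open() returns -ENETDOWN for an upper device whose binding_flags is 0 and succeeds for one with GATE_FLAG_LOOSE_BINDING set. The first case is the behavior change that could break existing scripts if ipvlan adopted it.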

Hangbin Liu April 3, 2025, 1:09 p.m. UTC | #2
Hi Sabrina,
On Thu, Apr 03, 2025 at 12:28:54PM +0200, Sabrina Dubroca wrote:
> Hello Hangbin,
> 
> 2025-04-03, 08:58:55 +0000, Hangbin Liu wrote:
> > When setting the lower-layer link up/down, the ipvlan device synchronizes
> > its state via netif_stacked_transfer_operstate(), which only checks the
> > carrier state. However, setting the link down does not necessarily change
> > the carrier state for virtual interfaces like bonding. This causes the
> > ipvlan state to become out of sync with the lower-layer link state.
> > 
> > If the lower link and ipvlan are in the same namespace, this issue is
> > hidden because ip link show checks the link state in IFLA_LINK and has
> > a m_flag to control the state, displaying M-DOWN in the flags. However,
> > if the ipvlan and the lower link are in different namespaces, this
> > information is not available, and the ipvlan link state remains unchanged.
> 
> Is the issue with the actual behavior (sending/receiving packets,
> etc), or just in how it's displayed by iproute?

When the upper link in the netns stays up while the lower link is down,
traffic in the pod breaks.

> 
> > For example:
> > 
> >   1. Add an ipvlan over bond0.
> >   2. Move the ipvlan to a separate namespace and bring it up.
> >   3. Set bond0 link down.
> >   4. The ipvlan remains up.
> > 
> > This issue affects containers and pods, causing them to display an
> > incorrect link state for ipvlan. Fix this by explicitly changing the
> > IFF_UP flag, similar to how VLAN handles it.
> 
> I'm not sure this change of behavior can be done anymore. And I'm not
> convinced vlan's behavior is better (commit 5e7565930524 ("vlan:
> support "loose binding" to the underlying network device") describes
> why it's not always wanted). IMO it makes sense to have admin state
> separate from link state.

Thanks for the comments; that is also what I am worried about. I sent
a question email [1] two months ago but have not received a reply yet,
so I am posting this patch and welcome any feedback.

[1] https://lore.kernel.org/netdev/Z67lt5v6vrltiRyG@fedora/
> 
> If you want a consistent behavior, the admin should also not be
> allowed to set the link UP again while its lower device is not, like
> VLAN does:
> 
> static int vlan_dev_open(struct net_device *dev)
> {
> 	struct vlan_dev_priv *vlan = vlan_dev_priv(dev);
> 	struct net_device *real_dev = vlan->real_dev;
> 	int err;
> 
> 	if (!(real_dev->flags & IFF_UP) &&
> 	    !(vlan->flags & VLAN_FLAG_LOOSE_BINDING))
> 		return -ENETDOWN;
> 
> 
> (but that would almost certainly break someone's scripts)

Yes, so let's wait for others' feedback first.

Thanks
Hangbin

Sabrina Dubroca April 3, 2025, 3 p.m. UTC | #3
2025-04-03, 13:09:02 +0000, Hangbin Liu wrote:
> Hi Sabrina,
> On Thu, Apr 03, 2025 at 12:28:54PM +0200, Sabrina Dubroca wrote:
> > Hello Hangbin,
> > 
> > 2025-04-03, 08:58:55 +0000, Hangbin Liu wrote:
> > > When setting the lower-layer link up/down, the ipvlan device synchronizes
> > > its state via netif_stacked_transfer_operstate(), which only checks the
> > > carrier state. However, setting the link down does not necessarily change
> > > the carrier state for virtual interfaces like bonding. This causes the
> > > ipvlan state to become out of sync with the lower-layer link state.
> > > 
> > > If the lower link and ipvlan are in the same namespace, this issue is
> > > hidden because ip link show checks the link state in IFLA_LINK and has
> > > a m_flag to control the state, displaying M-DOWN in the flags. However,
> > > if the ipvlan and the lower link are in different namespaces, this
> > > information is not available, and the ipvlan link state remains unchanged.
> > 
> > Is the issue with the actual behavior (sending/receiving packets,
> > etc), or just in how it's displayed by iproute?
> 
> When the upper link in the netns stays up while the lower link is down,
> traffic in the pod breaks.

That seems like the correct behavior based on the actual (not
displayed) state of the links.


I wonder if netif_stacked_transfer_operstate should consider the admin
state of the lower device as well as link state:

@@ -10724,7 +10724,7 @@ void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 	else
 		netif_testing_off(dev);
 
-	if (netif_carrier_ok(rootdev))
+	if (netif_carrier_ok(rootdev) && rootdev->flags & IFF_UP)
 		netif_carrier_on(dev);
 	else
 		netif_carrier_off(dev);


but I haven't looked at all the consequences and possible side
effects.
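
The alternative floated above (folding the lower device's admin state into the carrier that is transferred) can be compared with the current rule in a userspace sketch. This is a simplified model under assumptions, not the kernel helper: `struct sketch_dev`, `SKETCH_IFF_UP` and both `sketch_transfer_*` functions are hypothetical names, and only the carrier part of the transfer is modelled.

```c
#include <stdbool.h>

/* Hypothetical flag value for the model; not the kernel's definition. */
#define SKETCH_IFF_UP 0x1u

struct sketch_dev {
	unsigned int flags;	/* admin state: SKETCH_IFF_UP set means "up" */
	bool carrier;		/* link (carrier) state */
};

/* Current rule: the upper device's carrier follows the lower carrier only. */
void sketch_transfer_current(const struct sketch_dev *rootdev,
			     struct sketch_dev *dev)
{
	dev->carrier = rootdev->carrier;
}

/*
 * Proposed rule from the diff above: carrier is on only when the lower
 * device has carrier AND is admin-up. The upper admin flag is untouched.
 */
void sketch_transfer_proposed(const struct sketch_dev *rootdev,
			      struct sketch_dev *dev)
{
	dev->carrier = rootdev->carrier && (rootdev->flags & SKETCH_IFF_UP);
}
```

For a lower device that is admin-down but still has carrier, the current rule leaves the upper device's carrier on, while the proposed rule turns it off without touching the upper device's SKETCH_IFF_UP flag. That keeps admin state and link state separate, which is the property argued for earlier in the thread.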

Patch

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 0ed2fd833a5d..2abe6ddc4d15 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -730,7 +730,7 @@  static int ipvlan_device_event(struct notifier_block *unused,
 	struct ipvl_dev *ipvlan, *next;
 	struct ipvl_port *port;
 	LIST_HEAD(lst_kill);
-	int err;
+	int flags, err;
 
 	if (!netif_is_ipvlan_port(dev))
 		return NOTIFY_DONE;
@@ -739,7 +739,25 @@  static int ipvlan_device_event(struct notifier_block *unused,
 
 	switch (event) {
 	case NETDEV_UP:
+		list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
+			flags = ipvlan->dev->flags;
+			if (flags & IFF_UP)
+				continue;
+			dev_change_flags(ipvlan->dev, flags | IFF_UP, extack);
+			netif_stacked_transfer_operstate(ipvlan->phy_dev,
+							 ipvlan->dev);
+		}
+		break;
 	case NETDEV_DOWN:
+		list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
+			flags = ipvlan->dev->flags;
+			if (!(flags & IFF_UP))
+				continue;
+			dev_close(ipvlan->dev);
+			netif_stacked_transfer_operstate(ipvlan->phy_dev,
+							 ipvlan->dev);
+		}
+		break;
 	case NETDEV_CHANGE:
 		list_for_each_entry(ipvlan, &port->ipvlans, pnode)
 			netif_stacked_transfer_operstate(ipvlan->phy_dev,