diff mbox series

[RFC,net] net: ipv4/ipv6 addrconf: call igmp{,6}_group_dropped() while dev is still up

Message ID 20230406233058.780721-1-vladimir.oltean@nxp.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [RFC,net] net: ipv4/ipv6 addrconf: call igmp{,6}_group_dropped() while dev is still up | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 20 this patch: 20
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 20 this patch: 20
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Vladimir Oltean April 6, 2023, 11:30 p.m. UTC
ipv4 devinet calls ip_mc_down(), and ipv6 calls addrconf_ifdown(), and
both of these eventually result in calls to dev_mc_del(), either through
igmp_group_dropped() or igmp6_group_dropped().

The problem is that dev_mc_del() does call __dev_set_rx_mode(), but this
will not propagate all the way to the ndo_set_rx_mode() of the device,
because of this check:

	/* dev_open will call this function so the list will stay sane. */
	if (!(dev->flags&IFF_UP))
		return;

and the NETDEV_DOWN notifier is emitted while the interface is already
down. OTOH we have NETDEV_GOING_DOWN which is emitted a bit earlier -
see:

dev_close_many()
-> __dev_close_many()
   -> call_netdevice_notifiers(NETDEV_GOING_DOWN, dev);
   -> dev->flags &= ~IFF_UP;
-> call_netdevice_notifiers(NETDEV_DOWN, dev);

Normally this oversight is easy to miss, because the addresses aren't
lost, just not synced to the device until the next up event.

DSA does some processing in its dsa_slave_set_rx_mode(), and assumes
that all addresses that were synced are also unsynced by the time the
device is unregistered. Due to that assumption not being satisfied,
the WARN_ON(!list_empty(&dp->mdbs)); from dsa_switch_release_ports()
triggers, and we leak memory corresponding to the multicast addresses
that were never synced.

Minimal reproducer:
ip link set swp0 up
ip link set swp0 down
echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind

The proposal is to respond to that slightly earlier notifier with the
IGMP address deletion, so that the ndo_set_rx_mode() of the device does
actually get called. I am not familiar with the details of these layers,
but it appeared to me that NETDEV_DOWN needed to be replaced everywhere
with NETDEV_GOING_DOWN, so I blindly did that and it worked.

Fixes: 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
Obviously DSA is not the only affected driver, but the extent to which
other drivers are impacted is not obvious to me. At least in DSA, there
is a WARN_ON() and a memory leak, so this is why I chose that Fixes tag.

 net/ipv4/devinet.c  |  7 ++++---
 net/ipv6/addrconf.c | 12 ++++++------
 2 files changed, 10 insertions(+), 9 deletions(-)

Comments

David Ahern April 10, 2023, 2:08 a.m. UTC | #1
[ cc Ido in case such a change has implications to mlxsw ]

On 4/6/23 5:30 PM, Vladimir Oltean wrote:
> ipv4 devinet calls ip_mc_down(), and ipv6 calls addrconf_ifdown(), and
> both of these eventually result in calls to dev_mc_del(), either through
> igmp_group_dropped() or igmp6_group_dropped().
> 
> The problem is that dev_mc_del() does call __dev_set_rx_mode(), but this
> will not propagate all the way to the ndo_set_rx_mode() of the device,
> because of this check:
> 
> 	/* dev_open will call this function so the list will stay sane. */
> 	if (!(dev->flags&IFF_UP))
> 		return;
> 
> and the NETDEV_DOWN notifier is emitted while the interface is already
> down. OTOH we have NETDEV_GOING_DOWN which is emitted a bit earlier -
> see:
> 
> dev_close_many()
> -> __dev_close_many()
>    -> call_netdevice_notifiers(NETDEV_GOING_DOWN, dev);
>    -> dev->flags &= ~IFF_UP;
> -> call_netdevice_notifiers(NETDEV_DOWN, dev);
> 
> Normally this oversight is easy to miss, because the addresses aren't
> lost, just not synced to the device until the next up event.
> 
> DSA does some processing in its dsa_slave_set_rx_mode(), and assumes
> that all addresses that were synced are also unsynced by the time the
> device is unregistered. Due to that assumption not being satisfied,
> the WARN_ON(!list_empty(&dp->mdbs)); from dsa_switch_release_ports()
> triggers, and we leak memory corresponding to the multicast addresses
> that were never synced.
> 
> Minimal reproducer:
> ip link set swp0 up
> ip link set swp0 down
> echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind
> 
> The proposal is to respond to that slightly earlier notifier with the
> IGMP address deletion, so that the ndo_set_rx_mode() of the device does
> actually get called. I am not familiar with the details of these layers,
> but it appeared to me that NETDEV_DOWN needed to be replaced everywhere
> with NETDEV_GOING_DOWN, so I blindly did that and it worked.
> 
> Fixes: 5e8a1e03aa4d ("net: dsa: install secondary unicast and multicast addresses as host FDB/MDB")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
> Obviously DSA is not the only affected driver, but the extent to which
> other drivers are impacted is not obvious to me. At least in DSA, there
> is a WARN_ON() and a memory leak, so this is why I chose that Fixes tag.
> 
>  net/ipv4/devinet.c  |  7 ++++---
>  net/ipv6/addrconf.c | 12 ++++++------
>  2 files changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index 5deac0517ef7..95690d16d651 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -392,7 +392,8 @@ static void __inet_del_ifa(struct in_device *in_dev,
>  
>  				rtmsg_ifa(RTM_DELADDR, ifa, nlh, portid);
>  				blocking_notifier_call_chain(&inetaddr_chain,
> -						NETDEV_DOWN, ifa);
> +							     NETDEV_GOING_DOWN,
> +							     ifa);
>  				inet_free_ifa(ifa);
>  			} else {
>  				promote = ifa;
> @@ -429,7 +430,7 @@ static void __inet_del_ifa(struct in_device *in_dev,
>  	   So that, this order is correct.
>  	 */
>  	rtmsg_ifa(RTM_DELADDR, ifa1, nlh, portid);
> -	blocking_notifier_call_chain(&inetaddr_chain, NETDEV_DOWN, ifa1);
> +	blocking_notifier_call_chain(&inetaddr_chain, NETDEV_GOING_DOWN, ifa1);
>  
>  	if (promote) {
>  		struct in_ifaddr *next_sec;
> @@ -1588,7 +1589,7 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
>  		/* Send gratuitous ARP to notify of link change */
>  		inetdev_send_gratuitous_arp(dev, in_dev);
>  		break;
> -	case NETDEV_DOWN:
> +	case NETDEV_GOING_DOWN:
>  		ip_mc_down(in_dev);
>  		break;
>  	case NETDEV_PRE_TYPE_CHANGE:
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 3797917237d0..9e484f829f1c 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1307,7 +1307,7 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
>  
>  	ipv6_ifa_notify(RTM_DELADDR, ifp);
>  
> -	inet6addr_notifier_call_chain(NETDEV_DOWN, ifp);
> +	inet6addr_notifier_call_chain(NETDEV_GOING_DOWN, ifp);
>  
>  	if (action != CLEANUP_PREFIX_RT_NOP) {
>  		cleanup_prefix_route(ifp, expires,
> @@ -3670,12 +3670,12 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
>  		}
>  		break;
>  
> -	case NETDEV_DOWN:
> +	case NETDEV_GOING_DOWN:
>  	case NETDEV_UNREGISTER:
>  		/*
>  		 *	Remove all addresses from this interface.
>  		 */
> -		addrconf_ifdown(dev, event != NETDEV_DOWN);
> +		addrconf_ifdown(dev, event != NETDEV_GOING_DOWN);
>  		break;
>  
>  	case NETDEV_CHANGENAME:
> @@ -3741,7 +3741,7 @@ static bool addr_is_local(const struct in6_addr *addr)
>  
>  static int addrconf_ifdown(struct net_device *dev, bool unregister)
>  {
> -	unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_DOWN;
> +	unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_GOING_DOWN;
>  	struct net *net = dev_net(dev);
>  	struct inet6_dev *idev;
>  	struct inet6_ifaddr *ifa;
> @@ -3877,7 +3877,7 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister)
>  
>  		if (state != INET6_IFADDR_STATE_DEAD) {
>  			__ipv6_ifa_notify(RTM_DELADDR, ifa);
> -			inet6addr_notifier_call_chain(NETDEV_DOWN, ifa);
> +			inet6addr_notifier_call_chain(NETDEV_GOING_DOWN, ifa);
>  		} else {
>  			if (idev->cnf.forwarding)
>  				addrconf_leave_anycast(ifa);
> @@ -6252,7 +6252,7 @@ static void dev_disable_change(struct inet6_dev *idev)
>  
>  	netdev_notifier_info_init(&info, idev->dev);
>  	if (idev->cnf.disable_ipv6)
> -		addrconf_notify(NULL, NETDEV_DOWN, &info);
> +		addrconf_notify(NULL, NETDEV_GOING_DOWN, &info);
>  	else
>  		addrconf_notify(NULL, NETDEV_UP, &info);
>  }
Ido Schimmel April 10, 2023, 7:55 a.m. UTC | #2
On Sun, Apr 09, 2023 at 08:08:01PM -0600, David Ahern wrote:
> [ cc Ido in case such a change has implications to mlxsw ]

Thanks

> 
> On 4/6/23 5:30 PM, Vladimir Oltean wrote:
> > ipv4 devinet calls ip_mc_down(), and ipv6 calls addrconf_ifdown(), and
> > both of these eventually result in calls to dev_mc_del(), either through
> > igmp_group_dropped() or igmp6_group_dropped().
> > 
> > The problem is that dev_mc_del() does call __dev_set_rx_mode(), but this
> > will not propagate all the way to the ndo_set_rx_mode() of the device,
> > because of this check:
> > 
> > 	/* dev_open will call this function so the list will stay sane. */
> > 	if (!(dev->flags&IFF_UP))
> > 		return;
> > 
> > and the NETDEV_DOWN notifier is emitted while the interface is already
> > down. OTOH we have NETDEV_GOING_DOWN which is emitted a bit earlier -
> > see:
> > 
> > dev_close_many()
> > -> __dev_close_many()
> >    -> call_netdevice_notifiers(NETDEV_GOING_DOWN, dev);
> >    -> dev->flags &= ~IFF_UP;
> > -> call_netdevice_notifiers(NETDEV_DOWN, dev);
> > 
> > Normally this oversight is easy to miss, because the addresses aren't
> > lost, just not synced to the device until the next up event.
> > 
> > DSA does some processing in its dsa_slave_set_rx_mode(), and assumes
> > that all addresses that were synced are also unsynced by the time the
> > device is unregistered. Due to that assumption not being satisfied,
> > the WARN_ON(!list_empty(&dp->mdbs)); from dsa_switch_release_ports()
> > triggers, and we leak memory corresponding to the multicast addresses
> > that were never synced.
> > 
> > Minimal reproducer:
> > ip link set swp0 up
> > ip link set swp0 down
> > echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind
> > 
> > The proposal is to respond to that slightly earlier notifier with the
> > IGMP address deletion, so that the ndo_set_rx_mode() of the device does
> > actually get called. I am not familiar with the details of these layers,
> > but it appeared to me that NETDEV_DOWN needed to be replaced everywhere
> > with NETDEV_GOING_DOWN, so I blindly did that and it worked.

I think there is a confusion here between the netdev notifier and
inetaddr notifiers. They all use "NETDEV_DOWN", but in the inetaddr
notifiers it means that an address is being deleted. Changing the event
to "NETDEV_GOING_DOWN" is going to break a lot of users since none of
the inetaddr listeners respond to "NETDEV_GOING_DOWN".

IOW, I believe you only need this change for IPv4 (and similarly for
IPv6):

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 5deac0517ef7..679c9819f25b 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1588,7 +1588,7 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 		/* Send gratuitous ARP to notify of link change */
 		inetdev_send_gratuitous_arp(dev, in_dev);
 		break;
-	case NETDEV_DOWN:
+	case NETDEV_GOING_DOWN:
 		ip_mc_down(in_dev);
 		break;
 	case NETDEV_PRE_TYPE_CHANGE:
Vladimir Oltean April 10, 2023, 10:09 a.m. UTC | #3
Hi Ido,

On Mon, Apr 10, 2023 at 10:55:02AM +0300, Ido Schimmel wrote:
> > > The proposal is to respond to that slightly earlier notifier with the
> > > IGMP address deletion, so that the ndo_set_rx_mode() of the device does
> > > actually get called. I am not familiar with the details of these layers,
> > > but it appeared to me that NETDEV_DOWN needed to be replaced everywhere
> > > with NETDEV_GOING_DOWN, so I blindly did that and it worked.
> 
> I think there is a confusion here between the netdev notifier and
> inetaddr notifiers. They all use "NETDEV_DOWN", but in the inetaddr
> notifiers it means that an address is being deleted. Changing the event
> to "NETDEV_GOING_DOWN" is going to break a lot of users since none of
> the inetaddr listeners respond to "NETDEV_GOING_DOWN".
> 
> IOW, I believe you only need this change for IPv4 (and similarly for
> IPv6):
> 
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index 5deac0517ef7..679c9819f25b 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -1588,7 +1588,7 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
>  		/* Send gratuitous ARP to notify of link change */
>  		inetdev_send_gratuitous_arp(dev, in_dev);
>  		break;
> -	case NETDEV_DOWN:
> +	case NETDEV_GOING_DOWN:
>  		ip_mc_down(in_dev);
>  		break;
>  	case NETDEV_PRE_TYPE_CHANGE:

You are correct, only that portion is needed for IPv4. When I open my
eyes, I see it too :)

Although it would have been a lot less confusing for someone looking at
the code for the first time if the inetaddr and inet6addr notifiers did
not use events from the same NETDEV_ namespace as the netdev notifiers...

So, how do you think I should proceed with this? One patch or two
(for IPv4 and IPv6)? Is the Fixes: tag ok?
Ido Schimmel April 10, 2023, 11:43 a.m. UTC | #4
On Mon, Apr 10, 2023 at 01:09:58PM +0300, Vladimir Oltean wrote:
> So, how do you think I should proceed with this? One patch or two
> (for IPv4 and IPv6)? Is the Fixes: tag ok?

Fixes tag looks OK and one patch is fine by me. However, given the
problem is the check you mentioned in __dev_set_rx_mode(), wouldn't it
be better to simply remove it? From the comment above this check it
seems to assume that there is no need to update the Rx filters of the
device when it's down because they will be synced when it's put back up,
but it fails to consider the case where one wants to clear the filters
of the device as part of device dismantle.
Vladimir Oltean April 10, 2023, 5:07 p.m. UTC | #5
On Mon, Apr 10, 2023 at 02:43:43PM +0300, Ido Schimmel wrote:
> On Mon, Apr 10, 2023 at 01:09:58PM +0300, Vladimir Oltean wrote:
> > So, how do you think I should proceed with this? One patch or two
> > (for IPv4 and IPv6)? Is the Fixes: tag ok?
> 
> Fixes tag looks OK and one patch is fine by me. However, given the
> problem is the check you mentioned in __dev_set_rx_mode(), wouldn't it
> be better to simply remove it? From the comment above this check it
> seems to assume that there is no need to update the Rx filters of the
> device when it's down because they will be synced when it's put back up,
> but it fails to consider the case where one wants to clear the filters
> of the device as part of device dismantle.

Yes, I can do this.
diff mbox series

Patch

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 5deac0517ef7..95690d16d651 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -392,7 +392,8 @@  static void __inet_del_ifa(struct in_device *in_dev,
 
 				rtmsg_ifa(RTM_DELADDR, ifa, nlh, portid);
 				blocking_notifier_call_chain(&inetaddr_chain,
-						NETDEV_DOWN, ifa);
+							     NETDEV_GOING_DOWN,
+							     ifa);
 				inet_free_ifa(ifa);
 			} else {
 				promote = ifa;
@@ -429,7 +430,7 @@  static void __inet_del_ifa(struct in_device *in_dev,
 	   So that, this order is correct.
 	 */
 	rtmsg_ifa(RTM_DELADDR, ifa1, nlh, portid);
-	blocking_notifier_call_chain(&inetaddr_chain, NETDEV_DOWN, ifa1);
+	blocking_notifier_call_chain(&inetaddr_chain, NETDEV_GOING_DOWN, ifa1);
 
 	if (promote) {
 		struct in_ifaddr *next_sec;
@@ -1588,7 +1589,7 @@  static int inetdev_event(struct notifier_block *this, unsigned long event,
 		/* Send gratuitous ARP to notify of link change */
 		inetdev_send_gratuitous_arp(dev, in_dev);
 		break;
-	case NETDEV_DOWN:
+	case NETDEV_GOING_DOWN:
 		ip_mc_down(in_dev);
 		break;
 	case NETDEV_PRE_TYPE_CHANGE:
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3797917237d0..9e484f829f1c 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1307,7 +1307,7 @@  static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 
 	ipv6_ifa_notify(RTM_DELADDR, ifp);
 
-	inet6addr_notifier_call_chain(NETDEV_DOWN, ifp);
+	inet6addr_notifier_call_chain(NETDEV_GOING_DOWN, ifp);
 
 	if (action != CLEANUP_PREFIX_RT_NOP) {
 		cleanup_prefix_route(ifp, expires,
@@ -3670,12 +3670,12 @@  static int addrconf_notify(struct notifier_block *this, unsigned long event,
 		}
 		break;
 
-	case NETDEV_DOWN:
+	case NETDEV_GOING_DOWN:
 	case NETDEV_UNREGISTER:
 		/*
 		 *	Remove all addresses from this interface.
 		 */
-		addrconf_ifdown(dev, event != NETDEV_DOWN);
+		addrconf_ifdown(dev, event != NETDEV_GOING_DOWN);
 		break;
 
 	case NETDEV_CHANGENAME:
@@ -3741,7 +3741,7 @@  static bool addr_is_local(const struct in6_addr *addr)
 
 static int addrconf_ifdown(struct net_device *dev, bool unregister)
 {
-	unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_DOWN;
+	unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_GOING_DOWN;
 	struct net *net = dev_net(dev);
 	struct inet6_dev *idev;
 	struct inet6_ifaddr *ifa;
@@ -3877,7 +3877,7 @@  static int addrconf_ifdown(struct net_device *dev, bool unregister)
 
 		if (state != INET6_IFADDR_STATE_DEAD) {
 			__ipv6_ifa_notify(RTM_DELADDR, ifa);
-			inet6addr_notifier_call_chain(NETDEV_DOWN, ifa);
+			inet6addr_notifier_call_chain(NETDEV_GOING_DOWN, ifa);
 		} else {
 			if (idev->cnf.forwarding)
 				addrconf_leave_anycast(ifa);
@@ -6252,7 +6252,7 @@  static void dev_disable_change(struct inet6_dev *idev)
 
 	netdev_notifier_info_init(&info, idev->dev);
 	if (idev->cnf.disable_ipv6)
-		addrconf_notify(NULL, NETDEV_DOWN, &info);
+		addrconf_notify(NULL, NETDEV_GOING_DOWN, &info);
 	else
 		addrconf_notify(NULL, NETDEV_UP, &info);
 }