[net-next,v2,2/2] net: dsa: update the unicast MAC address when changing conduit

Message ID 20240502122922.28139-3-kabel@kernel.org (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series Fix changing DSA conduit

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net-next
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 926 this patch: 926
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           fail     1 blamed authors not CCed: pabeni@redhat.com; 3 maintainers not CCed: pabeni@redhat.com kuba@kernel.org edumazet@google.com
netdev/build_clang              success  Errors and warnings before: 937 this patch: 937
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 937 this patch: 937
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 96 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 1 this patch: 1
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-05-05--03-00 (tests: 1003)

Commit Message

Marek Behún May 2, 2024, 12:29 p.m. UTC
When changing DSA user interface conduit while the user interface is up,
DSA exhibits different behavior in comparison to when the interface is
down. This different behavior concerns the primary unicast MAC address
stored in the port standalone FDB and in the conduit device UC database.

If we put a switch port down while changing the conduit with
  ip link set sw0p0 down
  ip link set sw0p0 type dsa conduit conduit1
  ip link set sw0p0 up
we delete the address in dsa_user_close() and install the (possibly
different) address in dsa_user_open().

But when changing the conduit on the fly, the old address is not
deleted and the new one is not installed.

Since we explicitly want to support live-changing the conduit, uninstall
the old address before calling dsa_port_assign_conduit() and install the
(possibly different) new address after the call.

Because conduit change might also trigger address change (the user
interface is supposed to inherit the conduit interface MAC address if no
address is defined in hardware (dp->mac is a zero address)), move the
eth_hw_addr_inherit() call from dsa_user_change_conduit() to
dsa_port_change_conduit(), just before installing the new address.

Fixes: 95f510d0b792 ("net: dsa: allow the DSA master to be seen and changed through rtnetlink")
Signed-off-by: Marek Behún <kabel@kernel.org>
---
 net/dsa/port.c | 40 ++++++++++++++++++++++++++++++++++++++++
 net/dsa/user.c | 10 ++--------
 net/dsa/user.h |  2 ++
 3 files changed, 44 insertions(+), 8 deletions(-)
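
The ordering described in the commit message can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: struct
conduit, struct user_port, uc_install(), uc_uninstall(), inherit_addr() and
change_conduit() are hypothetical stand-ins invented for illustration, the
conduit's UC database is reduced to a single slot, and none of this is the
kernel code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-ins for a conduit and a user port; not kernel types. */
struct conduit {
	unsigned char addr[ETH_ALEN];	/* the conduit's own MAC address */
	unsigned char uc[ETH_ALEN];	/* one-slot model of its UC database */
	bool uc_valid;
	bool fail_install;		/* test hook: force install to fail */
};

struct user_port {
	struct conduit *conduit;
	unsigned char hw_mac[ETH_ALEN];	/* all-zero: inherit from the conduit */
	unsigned char dev_addr[ETH_ALEN];
	bool up;
};

bool is_zero_addr(const unsigned char *a)
{
	static const unsigned char zero[ETH_ALEN];

	return memcmp(a, zero, ETH_ALEN) == 0;
}

/* Install the port's current address on its current conduit. */
int uc_install(struct user_port *p)
{
	if (p->conduit->fail_install)
		return -1;
	memcpy(p->conduit->uc, p->dev_addr, ETH_ALEN);
	p->conduit->uc_valid = true;
	return 0;
}

void uc_uninstall(struct user_port *p)
{
	p->conduit->uc_valid = false;
}

/* Take over the conduit's address only if the port has none of its own. */
void inherit_addr(struct user_port *p)
{
	if (is_zero_addr(p->hw_mac))
		memcpy(p->dev_addr, p->conduit->addr, ETH_ALEN);
}

/* Uninstall on the old conduit, switch, inherit, install on the new one.
 * If the install fails, undo the steps in reverse order.
 */
int change_conduit(struct user_port *p, struct conduit *new_conduit)
{
	struct conduit *old = p->conduit;

	assert(new_conduit != NULL);

	if (p->up)
		uc_uninstall(p);

	p->conduit = new_conduit;
	inherit_addr(p);

	if (p->up && uc_install(p) != 0) {
		p->conduit = old;
		inherit_addr(p);
		uc_install(p);	/* best-effort restore on the old conduit */
		return -1;
	}
	return 0;
}
```

In this model, a port that is up never leaves a stale address behind on the
old conduit, and a failed install puts both the conduit pointer and the
inherited address back to their previous values.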

Comments

Vladimir Oltean May 7, 2024, 8:18 p.m. UTC | #1
Hi Marek,

On Thu, May 02, 2024 at 02:29:22PM +0200, Marek Behún wrote:
> When changing DSA user interface conduit while the user interface is up,
> DSA exhibits different behavior in comparison to when the interface is
> down. This different behavior concers the primary unicast MAC address

nitpick: concerns

> stored in the port standalone FDB and in the conduit device UC database.
> 
> If we put a switch port down while changing the conduit with
>   ip link set sw0p0 down
>   ip link set sw0p0 type dsa conduit conduit1
>   ip link set sw0p0 up
> we delete the address in dsa_user_close() and install the (possibly
> different) address in dsa_user_open().
> 
> But when changing the conduit on the fly, the old address is not
> deleted and the new one is not installed.
> 
> Since we explicitly want to support live-changing the conduit, uninstall
> the old address before calling dsa_port_assign_conduit() and install the
> (possibly different) new address after the call.
> 
> Because conduit change might also trigger address change (the user
> interface is supposed to inherit the conudit interface MAC address if no

nitpick: conduit

> address is defined in hardware (dp->mac is a zero address)), move the
> eth_hw_addr_inherit() call from dsa_user_change_conduit() to
> dsa_port_change_conduit(), just before installing the new address.
> 
> Fixes: 95f510d0b792 ("net: dsa: allow the DSA master to be seen and changed through rtnetlink")
> Signed-off-by: Marek Behún <kabel@kernel.org>
> ---

Sorry for the delay. I've tested this change and basically, while there
is clearly a bug, that bug produces no adverse effects / cannot be
reproduced with felix (the only mainline driver with the feature to
change conduits). So it could be sent to 'net-next' rather than 'net' on
that very ground, if there is no other separate reason for this to go to
stable kernels anyway, I guess.

There are 2 reasons why with felix the bug does not manifest itself.

First is because both the 'ocelot' and the alternate 'ocelot-8021q'
tagging protocols have the 'promisc_on_conduit = true' flag. So the
unicast address doesn't have to be in the conduit's RX filter - neither
the old nor the new conduit.
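
The first reason can be pictured with a toy receive filter. This is a sketch
only: struct rx_filter, rx_filter_add() and rx_filter_accepts() are
hypothetical names invented for illustration and do not correspond to any
kernel API. The point is that a promiscuous conduit accepts the frame whether
or not the unicast address was ever installed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6
#define MAX_UC 4

/* Hypothetical model of a conduit's receive filter; not a kernel structure. */
struct rx_filter {
	bool promisc;				/* accept every frame */
	unsigned char uc[MAX_UC][ETH_ALEN];	/* installed unicast addresses */
	int n_uc;
};

void rx_filter_add(struct rx_filter *f, const unsigned char *addr)
{
	assert(f->n_uc < MAX_UC);
	memcpy(f->uc[f->n_uc++], addr, ETH_ALEN);
}

/* A frame passes if the filter is promiscuous or the address is listed. */
bool rx_filter_accepts(const struct rx_filter *f, const unsigned char *dst)
{
	if (f->promisc)
		return true;
	for (int i = 0; i < f->n_uc; i++)
		if (memcmp(f->uc[i], dst, ETH_ALEN) == 0)
			return true;
	return false;
}
```

With promisc set, the missing install on the new conduit has no observable
effect in this model; with promisc clear, the frame is dropped until the
address is added.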

Second, dsa_user_host_uc_install() theoretically leaves behind host FDB
entries installed towards the wrong (old) CPU port. But in felix_fdb_add(),
we treat any FDB entry requested towards any CPU port as if it was a
multicast FDB entry programmed towards _all_ CPU ports. For that reason,
it is installed towards the port mask of the PGID_CPU port group ID:

	if (dsa_port_is_cpu(dp))
		port = PGID_CPU;

It would be great if this clarification would be made in the commit
message, to give the right impression to backporters seeking a correct
bug impact assessment.
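
The second reason can be sketched the same way. The model below is
hypothetical (struct sw_model, fdb_add() and fdb_reaches() are invented for
illustration, with ports 4 and 5 arbitrarily chosen as CPU ports in the usage
described afterwards); it only mirrors the redirect-to-group idea quoted
above, not the actual felix driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PORTS 6

/* Hypothetical switch model holding a single FDB entry. */
struct sw_model {
	uint32_t cpu_port_mask;	/* models the PGID_CPU port group */
	uint32_t fdb_dest_mask;	/* destination mask of the FDB entry */
};

bool port_is_cpu(const struct sw_model *sw, int port)
{
	return (sw->cpu_port_mask >> port) & 1u;
}

/* An entry requested towards any CPU port goes to the whole CPU group. */
void fdb_add(struct sw_model *sw, int port)
{
	assert(port >= 0 && port < NUM_PORTS);
	if (port_is_cpu(sw, port))
		sw->fdb_dest_mask = sw->cpu_port_mask;
	else
		sw->fdb_dest_mask = 1u << port;
}

bool fdb_reaches(const struct sw_model *sw, int port)
{
	return (sw->fdb_dest_mask >> port) & 1u;
}
```

If cpu_port_mask covers ports 4 and 5, an entry added towards port 4 also
reaches port 5, so in this model an entry left pointing at the old CPU port
still delivers frames to the new one.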

BTW, I'm curious how this is going to be handled with Marvell. Basically
if all switch Ethernet interfaces have the same MAC address X which
_isn't_ inherited from their respective conduit (so it is preserved when
changing conduit), and you have a split conduit configuration like this:
- half the user ports are under eth0
- half the user ports are under eth1

then you have a situation where MAC address X needs to be programmed as
a host FDB entry both towards the CPU port next to eth0, and towards
that next to eth1.
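
One conceivable way to reason about the split-conduit case is per-CPU-port
reference counting of address X. The sketch below is purely illustrative:
struct host_addr_refs and the host_addr_*() helpers are hypothetical names,
and this is not a description of how the DSA core or any Marvell driver
handles it.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CPU_PORTS 2

/* Hypothetical bookkeeping: how many user ports need X on each CPU port. */
struct host_addr_refs {
	int refs[NUM_CPU_PORTS];
};

void host_addr_get(struct host_addr_refs *h, int cpu_port)
{
	assert(cpu_port >= 0 && cpu_port < NUM_CPU_PORTS);
	h->refs[cpu_port]++;
}

void host_addr_put(struct host_addr_refs *h, int cpu_port)
{
	assert(cpu_port >= 0 && cpu_port < NUM_CPU_PORTS);
	assert(h->refs[cpu_port] > 0);
	h->refs[cpu_port]--;
}

/* X stays programmed towards a CPU port while any user port still needs it. */
bool host_addr_programmed(const struct host_addr_refs *h, int cpu_port)
{
	assert(cpu_port >= 0 && cpu_port < NUM_CPU_PORTS);
	return h->refs[cpu_port] > 0;
}

/* Moving one user port between conduits drops one reference and takes another. */
void host_addr_migrate(struct host_addr_refs *h, int from, int to)
{
	host_addr_put(h, from);
	host_addr_get(h, to);
}
```

With two user ports under each CPU port, migrating a single user port leaves
X programmed towards both CPU ports; the entry towards the first CPU port
only disappears once its last user port has moved away.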

There isn't any specific "core awareness" in DSA about the way in which
host FDB entries towards multiple CPU ports are handled in the Felix case.
So the core ends up having a not very good idea of what's happening
behind the scenes, and basically requests a migration from the old CPU
port to the new one, when in reality none takes place. I'm wondering how
things are handled in your new code; maybe we need to adapt the core
logic if there is a second implementation that's similar to felix in
this regard. Basically I'm saying that dsa_user_host_uc_install() may
not need to call dsa_port_standalone_host_fdb_add() when changing
conduit, if we had dedicated DSA API for .host_fdb_add() rather than
.port_fdb_add(port == CPU port).

Anyway, I was able to coerce the code (with extra patches) into validating
that your patch works on a driver that hypothetically does things a bit
differently than felix. So, with the commit message reorganized:

Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>

Marek Behún May 8, 2024, 11:13 a.m. UTC | #2
On Tue, May 07, 2024 at 11:18:27PM +0300, Vladimir Oltean wrote:
> Hi Marek,
> 
> On Thu, May 02, 2024 at 02:29:22PM +0200, Marek Behún wrote:
> > When changing DSA user interface conduit while the user interface is up,
> > DSA exhibits different behavior in comparison to when the interface is
> > down. This different behavior concers the primary unicast MAC address
> 
> nitpick: concerns
> 
> > stored in the port standalone FDB and in the conduit device UC database.
> > 
> > If we put a switch port down while changing the conduit with
> >   ip link set sw0p0 down
> >   ip link set sw0p0 type dsa conduit conduit1
> >   ip link set sw0p0 up
> > we delete the address in dsa_user_close() and install the (possibly
> > different) address in dsa_user_open().
> > 
> > But when changing the conduit on the fly, the old address is not
> > deleted and the new one is not installed.
> > 
> > Since we explicitly want to support live-changing the conduit, uninstall
> > the old address before calling dsa_port_assign_conduit() and install the
> > (possibly different) new address after the call.
> > 
> > Because conduit change might also trigger address change (the user
> > interface is supposed to inherit the conudit interface MAC address if no
> 
> nitpick: conduit
> 
> > address is defined in hardware (dp->mac is a zero address)), move the
> > eth_hw_addr_inherit() call from dsa_user_change_conduit() to
> > dsa_port_change_conduit(), just before installing the new address.
> > 
> > Fixes: 95f510d0b792 ("net: dsa: allow the DSA master to be seen and changed through rtnetlink")
> > Signed-off-by: Marek Behún <kabel@kernel.org>
> > ---
> 
> Sorry for the delay. I've tested this change and basically, while there
> is clearly a bug, that bug produces no adverse effects / cannot be
> reproduced with felix (the only mainline driver with the feature to
> change conduits). So it could be sent to 'net-next' rather than 'net' on
> that very ground, if there is no other separate reason for this to go to
> stable kernels anyway, I guess.

I did send this to net-next. The question is whether I should keep the
Fixes tag.

Marek

> There are 2 reasons why with felix the bug does not manifest itself.
> 
> First is because both the 'ocelot' and the alternate 'ocelot-8021q'
> tagging protocols have the 'promisc_on_conduit = true' flag. So the
> unicast address doesn't have to be in the conduit's RX filter - neither
> the old nor the new conduit.
> 
> Second, dsa_user_host_uc_install() theoretically leaves behind host FDB
> entries installed towards the wrong (old) CPU port. But in felix_fdb_add(),
> we treat any FDB entry requested towards any CPU port as if it was a
> multicast FDB entry programmed towards _all_ CPU ports. For that reason,
> it is installed towards the port mask of the PGID_CPU port group ID:
> 
> 	if (dsa_port_is_cpu(dp))
> 		port = PGID_CPU;
> 
> It would be great if this clarification would be made in the commit
> message, to give the right impression to backporters seeking a correct
> bug impact assessment.
> 
> BTW, I'm curious how this is going to be handled with Marvell. Basically
> if all switch Ethernet interfaces have the same MAC address X which
> _isn't_ inherited from their respective conduit (so it is preserved when
> changing conduit), and you have a split conduit configuration like this:
> - half the user ports are under eth0
> - half the user ports are under eth1
> 
> then you have a situation where MAC address X needs to be programmed as
> a host FDB entry both towards the CPU port next to eth0, and towards
> that next to eth1.
> 
> There isn't any specific "core awareness" in DSA about the way in which
> host FDB entries towards multiple CPU ports are handled in the Felix case.
> So the core ends up having a not very good idea of what's happening
> behind the scenes, and basically requests a migration from the old CPU
> port to the new one, when in reality none takes place. I'm wondering how
> things are handled in your new code; maybe we need to adapt the core
> logic if there is a second implementation that's similar to felix in
> this regard. Basically I'm saying that dsa_user_host_uc_install() may
> not need to call dsa_port_standalone_host_fdb_add() when changing
> conduit, if we had dedicated DSA API for .host_fdb_add() rather than
> .port_fdb_add(port == CPU port).
> 
> Anyway, I was able to coerce the code (with extra patches) into validating
> that your patch works on a driver that hypothetically does things a bit
> differently than felix. So, with the commit message reorganized:
> 
> Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
> Tested-by: Vladimir Oltean <olteanv@gmail.com>

Vladimir Oltean May 8, 2024, 1:44 p.m. UTC | #3
On Wed, May 08, 2024 at 01:13:52PM +0200, Marek Behún wrote:
> I did send this to net-next. The question is whether I should keep the
> Fixes tag.

Well, sorry, I didn't notice that. Yes, please reference the commit in
some other way, one that does not suggest to automated tooling that it
fixes a user-visible bug.

Patch

diff --git a/net/dsa/port.c b/net/dsa/port.c
index 9a249d4ac3a5..961b2dc84512 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -1467,10 +1467,34 @@  int dsa_port_change_conduit(struct dsa_port *dp, struct net_device *conduit,
 	 */
 	dsa_user_unsync_ha(dev);
 
+	/* If live-changing, we also need to uninstall the user device address
+	 * from the port FDB and the conduit interface.
+	 */
+	if (dev->flags & IFF_UP)
+		dsa_user_host_uc_uninstall(dev);
+
 	err = dsa_port_assign_conduit(dp, conduit, extack, true);
 	if (err)
 		goto rewind_old_addrs;
 
+	/* If the port doesn't have its own MAC address and relies on the DSA
+	 * conduit's one, inherit it again from the new DSA conduit.
+	 */
+	if (is_zero_ether_addr(dp->mac))
+		eth_hw_addr_inherit(dev, conduit);
+
+	/* If live-changing, we need to install the user device address to the
+	 * port FDB and the conduit interface.
+	 */
+	if (dev->flags & IFF_UP) {
+		err = dsa_user_host_uc_install(dev, dev->dev_addr);
+		if (err) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "Failed to install host UC address");
+			goto rewind_addr_inherit;
+		}
+	}
+
 	dsa_user_sync_ha(dev);
 
 	if (vlan_filtering) {
@@ -1500,10 +1524,26 @@  int dsa_port_change_conduit(struct dsa_port *dp, struct net_device *conduit,
 rewind_new_addrs:
 	dsa_user_unsync_ha(dev);
 
+	if (dev->flags & IFF_UP)
+		dsa_user_host_uc_uninstall(dev);
+
+rewind_addr_inherit:
+	if (is_zero_ether_addr(dp->mac))
+		eth_hw_addr_inherit(dev, old_conduit);
+
 	dsa_port_assign_conduit(dp, old_conduit, NULL, false);
 
 /* Restore the objects on the old CPU port */
 rewind_old_addrs:
+	if (dev->flags & IFF_UP) {
+		tmp = dsa_user_host_uc_install(dev, dev->dev_addr);
+		if (tmp) {
+			dev_err(ds->dev,
+				"port %d failed to restore host UC address: %pe\n",
+				dp->index, ERR_PTR(tmp));
+		}
+	}
+
 	dsa_user_sync_ha(dev);
 
 	if (vlan_filtering) {
diff --git a/net/dsa/user.c b/net/dsa/user.c
index b1d8d1827f91..b599f0e9459c 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -355,7 +355,7 @@  static int dsa_user_get_iflink(const struct net_device *dev)
 	return READ_ONCE(dsa_user_to_conduit(dev)->ifindex);
 }
 
-static int dsa_user_host_uc_install(struct net_device *dev, const u8 *addr)
+int dsa_user_host_uc_install(struct net_device *dev, const u8 *addr)
 {
 	struct net_device *conduit = dsa_user_to_conduit(dev);
 	struct dsa_port *dp = dsa_user_to_port(dev);
@@ -383,7 +383,7 @@  static int dsa_user_host_uc_install(struct net_device *dev, const u8 *addr)
 	return err;
 }
 
-static void dsa_user_host_uc_uninstall(struct net_device *dev)
+void dsa_user_host_uc_uninstall(struct net_device *dev)
 {
 	struct net_device *conduit = dsa_user_to_conduit(dev);
 	struct dsa_port *dp = dsa_user_to_port(dev);
@@ -2779,12 +2779,6 @@  int dsa_user_change_conduit(struct net_device *dev, struct net_device *conduit,
 			    ERR_PTR(err));
 	}
 
-	/* If the port doesn't have its own MAC address and relies on the DSA
-	 * conduit's one, inherit it again from the new DSA conduit.
-	 */
-	if (is_zero_ether_addr(dp->mac))
-		eth_hw_addr_inherit(dev, conduit);
-
 	return 0;
 
 out_revert_conduit_link:
diff --git a/net/dsa/user.h b/net/dsa/user.h
index 996069130bea..016884bead3c 100644
--- a/net/dsa/user.h
+++ b/net/dsa/user.h
@@ -42,6 +42,8 @@  int dsa_user_suspend(struct net_device *user_dev);
 int dsa_user_resume(struct net_device *user_dev);
 int dsa_user_register_notifier(void);
 void dsa_user_unregister_notifier(void);
+int dsa_user_host_uc_install(struct net_device *dev, const u8 *addr);
+void dsa_user_host_uc_uninstall(struct net_device *dev);
 void dsa_user_sync_ha(struct net_device *dev);
 void dsa_user_unsync_ha(struct net_device *dev);
 void dsa_user_setup_tagger(struct net_device *user);