[RFC,net-next,00/12] DSA changes for multiple CPU ports (part 3)

Message ID	20220523104256.3556016-1-olteanv@gmail.com
Headers
From: Vladimir Oltean <olteanv@gmail.com>
To: netdev@vger.kernel.org
Cc: Jakub Kicinski <kuba@kernel.org>, Florian Fainelli <f.fainelli@gmail.com>, Vivien Didelot <vivien.didelot@gmail.com>, Andrew Lunn <andrew@lunn.ch>, Vladimir Oltean <olteanv@gmail.com>, Tobias Waldekranz <tobias@waldekranz.com>, Marek Behún <kabel@kernel.org>, Ansuel Smith <ansuelsmth@gmail.com>, DENG Qingfang <dqfext@gmail.com>, Alvin Šipraga <alsi@bang-olufsen.dk>, Claudiu Manoil <claudiu.manoil@nxp.com>, Alexandre Belloni <alexandre.belloni@bootlin.com>, UNGLinuxDriver@microchip.com, Colin Foster <colin.foster@in-advantage.com>, Linus Walleij <linus.walleij@linaro.org>, Luiz Angelo Daros de Luca <luizluca@gmail.com>, Roopa Prabhu <roopa@nvidia.com>, Nikolay Aleksandrov <razor@blackwall.org>, Frank Wunderlich <frank-w@public-files.de>, Vladimir Oltean <vladimir.oltean@nxp.com>
Subject: [RFC PATCH net-next 00/12] DSA changes for multiple CPU ports (part 3)
Date: Mon, 23 May 2022 13:42:44 +0300
Series	DSA changes for multiple CPU ports (part 3)

Vladimir Oltean May 23, 2022, 10:42 a.m. UTC

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Note: this patch set probably isn't tested nearly well enough, and it
contains (at least minor) bugs. Don't do crazy things with it. I'm
posting it to get feedback on the proposed UAPI.

Those who have been following part 1:
https://patchwork.kernel.org/project/netdevbpf/cover/20220511095020.562461-1-vladimir.oltean@nxp.com/
and part 2:
https://patchwork.kernel.org/project/netdevbpf/cover/20220521213743.2735445-1-vladimir.oltean@nxp.com/
will know that I am trying to enable the second internal port pair from
the NXP LS1028A Felix switch for DSA-tagged traffic via "ocelot-8021q".
This series represents part 3 of that effort.

Covered here are some code structure changes so that DSA monitors
changeupper events of its masters, as well as a new UAPI, introduced via
rtnetlink, for changing the current master. Note that in the case of a LAG
DSA master, DSA user ports can be assigned to the LAG in 2 ways: either
explicitly, through the new IFLA_DSA_MASTER attribute, or implicitly, when
their existing DSA master joins a LAG.
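To make the UAPI more concrete, below is a minimal C sketch of the attribute payload a user space tool might append to an RTM_NEWLINK request targeting a user port. It assumes that IFLA_DSA_MASTER lives inside IFLA_INFO_DATA of a link kind named "dsa" and carries the new master's ifindex as a u32; the numeric value used for IFLA_DSA_MASTER is an assumption, while IFLA_LINKINFO, IFLA_INFO_KIND and IFLA_INFO_DATA use their values from include/uapi/linux/if_link.h. The helpers (put_attr(), end_nest(), build_dsa_master_attrs()) are hypothetical, and the nlmsghdr/ifinfomsg header and socket I/O are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values as in include/uapi/linux/if_link.h. */
#define IFLA_LINKINFO   18
#define IFLA_INFO_KIND  1
#define IFLA_INFO_DATA  2
/* Assumption: first entry of the new IFLA_DSA_* enum proposed by the RFC. */
#define IFLA_DSA_MASTER 1

#define ATTR_HDRLEN     4u                       /* u16 len + u16 type */
#define ATTR_ALIGN(len) (((len) + 3u) & ~3u)     /* 4-byte alignment */

/* Append a TLV attribute at *off; returns the offset of its header. */
static size_t put_attr(uint8_t *buf, size_t cap, size_t *off,
		       uint16_t type, const void *data, uint16_t len)
{
	size_t start = *off;
	uint16_t total = (uint16_t)(ATTR_HDRLEN + len);

	assert(start + ATTR_ALIGN(total) <= cap);
	memcpy(buf + start, &total, 2);          /* host byte order */
	memcpy(buf + start + 2, &type, 2);
	if (len)
		memcpy(buf + start + ATTR_HDRLEN, data, len);
	memset(buf + start + total, 0, ATTR_ALIGN(total) - total);
	*off = start + ATTR_ALIGN(total);
	return start;
}

/* Fix up a nested attribute's length once its children are written. */
static void end_nest(uint8_t *buf, size_t nest, size_t off)
{
	uint16_t len = (uint16_t)(off - nest);

	memcpy(buf + nest, &len, 2);
}

/* IFLA_LINKINFO { IFLA_INFO_KIND "dsa", IFLA_INFO_DATA { IFLA_DSA_MASTER } } */
static size_t build_dsa_master_attrs(uint8_t *buf, size_t cap,
				     uint32_t master_ifindex)
{
	size_t off = 0, linkinfo, data;

	linkinfo = put_attr(buf, cap, &off, IFLA_LINKINFO, NULL, 0);
	put_attr(buf, cap, &off, IFLA_INFO_KIND, "dsa", 4); /* with NUL */
	data = put_attr(buf, cap, &off, IFLA_INFO_DATA, NULL, 0);
	put_attr(buf, cap, &off, IFLA_DSA_MASTER, &master_ifindex,
		 sizeof(master_ifindex));
	end_nest(buf, data, off);
	end_nest(buf, linkinfo, off);
	return off;
}
```

The same message layout would be used regardless of whether the new master is a physical interface or a LAG device; only the ifindex differs.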

Compared to previous attempts to introduce support for multiple CPU ports:
https://lore.kernel.org/netdev/20210410133454.4768-1-ansuelsmth@gmail.com/

my proposal is to not change anything in the default behavior (i.e.
DSA still starts off with the first CPU port from the device tree as the
only active CPU port). Instead, the focus is on being able to live-change
the user-to-CPU-port affinity. Marek Behún has described a potential use
case: dynamically load-balancing the termination of ports between CPU
ports. That is best handled by a user space daemon, if only it had the
means; this series creates the means.

Host address filtering is interesting with multiple CPU ports.
There are 2 types of host filtered addresses to consider:
- standalone MAC addresses of ports. These are either inherited from the
  respective DSA masters of the ports, or from the device tree blob.
- local bridge FDB entries.

Traditionally, DSA manages host-filtered addresses by calling
port_fdb_add(dp->cpu_dp->index) in the appropriate database.
But consider, for example, 2 bridged DSA user ports, one affine to CPU
port A and the other to CPU port B. When the bridge offloads a local
FDB entry for 00:01:02:03:04:05, DSA would first call
port_fdb_add(A, 00:01:02:03:04:05, DSA_DB_BRIDGE), then
port_fdb_add(B, 00:01:02:03:04:05, DSA_DB_BRIDGE). Since an FDB
entry can only have a single destination, the second port_fdb_add()
overwrites the first one, and locally terminated traffic for the ports
assigned to CPU port A is broken.
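The overwrite can be reproduced with a toy model of a single-destination FDB. This is not driver code: toy_port_fdb_add() and toy_fdb_lookup() are hypothetical helpers that only mimic the one-destination-per-address semantics described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FDB_SIZE 16

/* Toy FDB: each entry has exactly one destination port. */
struct fdb_entry {
	uint8_t mac[6];
	int dest;
	int valid;
};

static struct fdb_entry fdb[FDB_SIZE];

static struct fdb_entry *fdb_find(const uint8_t mac[6])
{
	for (int i = 0; i < FDB_SIZE; i++)
		if (fdb[i].valid && !memcmp(fdb[i].mac, mac, 6))
			return &fdb[i];
	return NULL;
}

/* Mimics port_fdb_add(port, addr): adding an address that already exists
 * replaces its destination instead of adding a second one. */
static void toy_port_fdb_add(int port, const uint8_t mac[6])
{
	struct fdb_entry *e = fdb_find(mac);

	if (!e) {
		for (int i = 0; i < FDB_SIZE && !e; i++)
			if (!fdb[i].valid)
				e = &fdb[i];
		assert(e); /* table full */
		memcpy(e->mac, mac, 6);
		e->valid = 1;
	}
	e->dest = port;
}

/* Returns the destination port for mac, or -1 if unknown. */
static int toy_fdb_lookup(const uint8_t mac[6])
{
	struct fdb_entry *e = fdb_find(mac);

	return e ? e->dest : -1;
}
```

After adding the same local address for CPU port A and then CPU port B, a lookup only ever returns B.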

What should be done in that situation, at least with the HW I'm working
with, is to deliver host-filtered addresses towards a "multicast"
destination that covers both CPU ports, and let the forwarding matrix
eliminate the CPU port that the current user port isn't affine to.

In my proposed patch set, the Felix driver does exactly that: host
filtered addresses are learned towards a special PGID_CPU that has both
tag_8021q CPU ports as destinations.
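A toy bitmask model of this approach, assuming hypothetical port numbers (user ports 0-3, tag_8021q CPU ports 4 and 5): the host FDB entry points to a PGID_CPU-like mask containing both CPU ports, and the final destination is that mask ANDed with the source port's row of the forwarding matrix. The names are illustrative and do not mirror the actual Ocelot register layout.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 6
#define BIT(n) (1u << (n))

/* Hypothetical layout: ports 0-3 are user ports, 4 and 5 are CPU ports. */
enum { CPU_A = 4, CPU_B = 5 };

/* Destination mask shared by all host-filtered addresses. */
static const uint32_t pgid_cpu = BIT(CPU_A) | BIT(CPU_B);

/* Forwarding matrix: row i lists the ports reachable from source port i.
 * Only the CPU port bits are modeled here. */
static uint32_t fwd_matrix[NUM_PORTS];

/* Make a user port affine to exactly one CPU port. */
static void set_cpu_affinity(int user_port, int cpu_port)
{
	assert(user_port >= 0 && user_port < CPU_A);
	assert(cpu_port == CPU_A || cpu_port == CPU_B);
	fwd_matrix[user_port] &= ~(BIT(CPU_A) | BIT(CPU_B));
	fwd_matrix[user_port] |= BIT(cpu_port);
}

/* Final destination = FDB destination mask AND source port's row. */
static uint32_t resolve_dest(int src_port, uint32_t fdb_dest_mask)
{
	return fdb_dest_mask & fwd_matrix[src_port];
}
```

A property of this design is that live-changing a user port's CPU affinity only touches the forwarding matrix; the host FDB entries pointing at the shared mask stay as they are.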

I have considered introducing a new dsa_switch_ops API in the form of
host_fdb_add(user port) and host_fdb_del(user port), rather than calling
port_fdb_add(cpu port). After all, this would be similar to the newly
introduced port_set_host_flood(user port). But I need to think a bit
more about whether it's needed right away.
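To illustrate the difference between the two API shapes, here is a sketch in which the core prefers a hypothetical host_fdb_add(user port) op when the driver provides it and otherwise falls back to port_fdb_add(cpu port). The structures and ops below are simplified stand-ins, not the real struct dsa_port or dsa_switch_ops, and the database argument is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct dsa_port. */
struct sketch_port {
	int index;
	const struct sketch_port *cpu_dp; /* CPU port this user port uses */
};

struct sketch_ops {
	/* Existing shape: the driver is told the destination (CPU) port. */
	int (*port_fdb_add)(int port, const uint8_t *addr);
	/* Hypothetical shape: the driver is told which user port the host
	 * address belongs to, and chooses the destination itself. */
	int (*host_fdb_add)(int user_port, const uint8_t *addr);
};

static int sketch_host_fdb_add(const struct sketch_ops *ops,
			       const struct sketch_port *dp,
			       const uint8_t *addr)
{
	if (ops->host_fdb_add)
		return ops->host_fdb_add(dp->index, addr);
	assert(dp->cpu_dp);
	return ops->port_fdb_add(dp->cpu_dp->index, addr);
}

/* Recording callback, used to observe which port the driver was given. */
static int last_port = -1;

static int sketch_record_port(int port, const uint8_t *addr)
{
	(void)addr;
	last_port = port;
	return 0;
}
```

With the host_fdb_add() shape, a driver such as Felix could map every user port's host addresses onto the shared CPU destination itself, instead of DSA issuing one port_fdb_add() per CPU port.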

Finally, there's LAG. Proposals have been made before to describe in DT
that CPU ports are under a LAG, the idea being that we could then do the
same for DSA (cascade) ports. The common problem is that shared (CPU and
DSA) ports have no netdev exposed.

I didn't do that. Instead, I went for the more natural approach of saying
that if the CPU ports are in a LAG, then the DSA masters are in a
symmetric LAG as well. So why not just monitor when the DSA masters join
a LAG, piggyback on that configuration, and make DSA reconfigure
itself accordingly?

So LAG devices can now be DSA masters, and this is accomplished by
populating their dev->dsa_ptr. Note that we do not create a specific
struct dsa_port to populate their dsa_ptr, instead we reuse the dsa_ptr
of one of the physical DSA masters (the first one, in fact).
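A simplified model of how a LAG could become a DSA master by borrowing the dsa_ptr of its first physical DSA master, instead of getting a dedicated CPU port structure. The types and sketch_lag_become_master() are hypothetical stand-ins for struct net_device and struct dsa_port.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_LOWER 4

/* Hypothetical stand-in for the CPU port a master's dsa_ptr points to. */
struct sketch_cpu_port {
	int index;
};

/* Hypothetical stand-in for struct net_device. */
struct sketch_netdev {
	const char *name;
	struct sketch_cpu_port *dsa_ptr;                /* non-NULL: DSA master */
	struct sketch_netdev *lower[SKETCH_MAX_LOWER];  /* LAG members */
	int num_lower;
};

/* Populate the LAG's dsa_ptr by reusing the one of its first lower
 * interface that is a DSA master. Returns -1 if none of them is. */
static int sketch_lag_become_master(struct sketch_netdev *lag)
{
	assert(lag->num_lower <= SKETCH_MAX_LOWER);
	for (int i = 0; i < lag->num_lower; i++) {
		if (lag->lower[i]->dsa_ptr) {
			lag->dsa_ptr = lag->lower[i]->dsa_ptr;
			return 0;
		}
	}
	return -1;
}
```

One consequence of this choice is that the LAG and its first physical master point at the very same object, so anything stored through dsa_ptr cannot tell the two apart.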

Vladimir Oltean (12):
  net: introduce iterators over synced hw addresses
  net: dsa: walk through all changeupper notifier functions
  net: dsa: don't stop at NOTIFY_OK when calling
    ds->ops->port_prechangeupper
  net: bridge: move DSA master bridging restriction to DSA
  net: dsa: existing DSA masters cannot join upper interfaces
  net: dsa: only bring down user ports assigned to a given DSA master
  net: dsa: all DSA masters must be down when changing the tagging
    protocol
  net: dsa: use dsa_tree_for_each_cpu_port in
    dsa_tree_{setup,teardown}_master
  net: dsa: introduce dsa_port_get_master()
  net: dsa: allow the DSA master to be seen and changed through
    rtnetlink
  net: dsa: allow masters to join a LAG
  net: dsa: felix: add support for changing DSA master

 drivers/net/dsa/bcm_sf2.c                     |   4 +-
 drivers/net/dsa/bcm_sf2_cfp.c                 |   4 +-
 drivers/net/dsa/lan9303-core.c                |   4 +-
 drivers/net/dsa/ocelot/felix.c                | 117 ++++-
 drivers/net/dsa/ocelot/felix.h                |   3 +
 .../net/ethernet/mediatek/mtk_ppe_offload.c   |   2 +-
 drivers/net/ethernet/mscc/ocelot.c            |   3 +-
 include/linux/netdevice.h                     |   6 +
 include/net/dsa.h                             |  23 +
 include/soc/mscc/ocelot.h                     |   1 +
 include/uapi/linux/if_link.h                  |  10 +
 net/bridge/br_if.c                            |  20 -
 net/dsa/Makefile                              |  10 +-
 net/dsa/dsa.c                                 |   9 +
 net/dsa/dsa2.c                                |  72 ++--
 net/dsa/dsa_priv.h                            |  18 +-
 net/dsa/master.c                              |  62 ++-
 net/dsa/netlink.c                             |  62 +++
 net/dsa/port.c                                | 162 ++++++-
 net/dsa/slave.c                               | 404 +++++++++++++++++-
 net/dsa/switch.c                              |  22 +-
 net/dsa/tag_8021q.c                           |   4 +-
 22 files changed, 915 insertions(+), 107 deletions(-)
 create mode 100644 net/dsa/netlink.c

Florian Fainelli May 23, 2022, 9:53 p.m. UTC | #1

On 5/23/22 03:42, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> [...]

This looks pretty good to me and did not blow up with bcm_sf2 not 
implementing port_change_master, so far so good.

Vladimir Oltean May 23, 2022, 10:51 p.m. UTC | #2

On Mon, May 23, 2022 at 02:53:13PM -0700, Florian Fainelli wrote:
> This looks pretty good to me and did not blow up with bcm_sf2 not
> implementing port_change_master, so far so good.

Well, what did you expect? :)

Do you want the iproute2 patch as well? Do you intend to add support for
multiple CPU ports on Starfighter?

Christian Marangi May 24, 2022, 12:02 p.m. UTC | #3

On Mon, May 23, 2022 at 01:42:44PM +0300, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> [...]

Probably offtopic, but I wonder if the use of a LAG as master can
cause some problems with configurations where the switch uses a mgmt port
to send settings. I wonder if with this change we will have to introduce
an additional value to declare which management port will be used, since
the master can now be set to various values. Or the driver will just have
to handle this with its priv struct (I think this is the correct solution).

I still have to find time to test this with qca8k.

Vladimir Oltean May 24, 2022, 12:29 p.m. UTC | #4

On Tue, May 24, 2022 at 02:02:19PM +0200, Ansuel Smith wrote:
> Probably offtopic, but I wonder if the use of a LAG as master can
> cause some problems with configurations where the switch uses a mgmt port
> to send settings. I wonder if with this change we will have to introduce
> an additional value to declare which management port will be used, since
> the master can now be set to various values. Or the driver will just have
> to handle this with its priv struct (I think this is the correct solution).
> 
> I still have to find time to test this with qca8k.

Not offtopic, this is a good point. dsa_tree_master_admin_state_change()
and dsa_tree_master_oper_state_change() set various flags in cpu_dp =
master->dsa_ptr. It's unclear if the cpu_dp we assign to a LAG should
track the admin/oper state of the LAG itself or of the physical port.
Especially since the lag->dsa_ptr is the same as one of the master->dsa_ptr.
It's clear that the same structure can't track both states. I'm thinking
we should suppress the NETDEV_CHANGE and NETDEV_UP monitoring from slave.c
on LAG DSA masters, and track only the physical ones. In any case,
management traffic does not really benefit from being sent/received over
a LAG, and I'm thinking we should just use the physical port.
Your qca8k_master_change() function explicitly only checks for CPU port
0, which in retrospect was a very wise decision in terms of forward
compatibility with device trees with multiple CPU ports.
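One way to express the suggestion above, as a sketch with hypothetical types: because a LAG master and one physical master share the same dsa_ptr, only NETDEV_UP and NETDEV_CHANGE events coming from physical masters are allowed to update the shared admin/oper state, and the same events on the LAG are ignored. The SKETCH_* event names stand in for the real netdevice notifier events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state reachable through master->dsa_ptr. */
struct sketch_master_state {
	bool admin_up;
	bool oper_up;
};

struct sketch_master {
	bool is_lag;
	struct sketch_master_state *dsa_ptr; /* may be shared with a LAG */
};

enum sketch_event { SKETCH_NETDEV_UP, SKETCH_NETDEV_CHANGE };

static void sketch_master_event(struct sketch_master *dev,
				enum sketch_event ev, bool carrier_ok)
{
	/* Not a DSA master, or a LAG master: don't touch the shared state. */
	if (!dev->dsa_ptr || dev->is_lag)
		return;

	switch (ev) {
	case SKETCH_NETDEV_UP:
		dev->dsa_ptr->admin_up = true;
		break;
	case SKETCH_NETDEV_CHANGE:
		dev->dsa_ptr->oper_up = carrier_ok;
		break;
	}
}
```

With this filtering, the state behind the shared dsa_ptr always reflects the physical port, which is also the one management traffic would use.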

Christian Marangi May 24, 2022, 12:38 p.m. UTC | #5

On Tue, May 24, 2022 at 12:29:06PM +0000, Vladimir Oltean wrote:
> On Tue, May 24, 2022 at 02:02:19PM +0200, Ansuel Smith wrote:
> > [...]
> 
> Not offtopic, this is a good point. dsa_tree_master_admin_state_change()
> and dsa_tree_master_oper_state_change() set various flags in cpu_dp =
> master->dsa_ptr. It's unclear if the cpu_dp we assign to a LAG should
> track the admin/oper state of the LAG itself or of the physical port.
> Especially since the lag->dsa_ptr is the same as one of the master->dsa_ptr.
> It's clear that the same structure can't track both states. I'm thinking
> we should suppress the NETDEV_CHANGE and NETDEV_UP monitoring from slave.c
> on LAG DSA masters, and track only the physical ones. In any case,
> management traffic does not really benefit from being sent/received over
> a LAG, and I'm thinking we should just use the physical port.
> Your qca8k_master_change() function explicitly only checks for CPU port
> 0, which in retrospect was a very wise decision in terms of forward
> compatibility with device trees with multiple CPU ports.

A switch can also have some hw limitations, where mgmt packets are accepted
only by one specific port, and I assume using a LAG with load balancing can
cause some problems (packets not acked).

Yes, I think the oper_state_change would be problematic with a LAG
configuration, since the driver should use the physical port anyway (to
prevent any hw limitation/issue) and track only that.

But I think we can put that on hold and think of a correct solution when
we have a solid base with all of this implemented. Considering that qca8k
is the only user of that feature, and things will have to change anyway
when qca8k gets support for multiple CPU ports, we can address that
later. (In theory, everything should work correctly if qca8k doesn't
declare multiple CPU ports or a LAG is not configured.)

Vladimir Oltean May 24, 2022, 1:24 p.m. UTC | #6

On Tue, May 24, 2022 at 02:38:53PM +0200, Ansuel Smith wrote:
> On Tue, May 24, 2022 at 12:29:06PM +0000, Vladimir Oltean wrote:
> > [...]
> 
> A switch can also have some hw limitations, where mgmt packets are accepted
> only by one specific port, and I assume using a LAG with load balancing can
> cause some problems (packets not acked).
> 
> Yes, I think the oper_state_change would be problematic with a LAG
> configuration, since the driver should use the physical port anyway (to
> prevent any hw limitation/issue) and track only that.
> 
> But I think we can put that on hold and think of a correct solution when
> we have a solid base with all of this implemented. Considering that qca8k
> is the only user of that feature, and things will have to change anyway
> when qca8k gets support for multiple CPU ports, we can address that
> later. (In theory, everything should work correctly if qca8k doesn't
> declare multiple CPU ports or a LAG is not configured.)

Consider this: the way in which DSA tracks the state of DSA masters
already "supports multiple [ physical ] CPU ports". It's just a matter
of driver writers acknowledging this and doing the right thing in the
ds->ops->master_state_change() callback. DSA tells us when any physical
master goes up or down, and it does this regardless of whether that
master's dsa_ptr is the dp->cpu_dp of any user port. In other words,
given this device tree snippet:

eth0: ethernet@0 {
	...
};

eth1: ethernet@1 {
	...
};

ethernet-switch@0 {
	ethernet-ports {
		ethernet-port@0 {
			label = "swp0";
		};

		ethernet-port@1 {
			label = "swp1";
		};

		ethernet-port@2 {
			ethernet = <&eth0>;
		};

		ethernet-port@3 {
			ethernet = <&eth1>;
		};
	};
};

Current mainline DSA will create swp0@eth0 and swp1@eth0, but it will
call dsa_master_setup(eth0) and dsa_master_setup(eth1). Then it will
monitor the state of both eth0 and eth1, and pass updates to both
masters' states down to the driver.
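The default behavior described above can be modeled roughly as follows, using hypothetical simplified types and a port layout matching the device tree snippet (ports 0 and 1 are user ports, ports 2 and 3 are CPU ports attached to eth0 and eth1). Every CPU port's master gets set up and monitored, but all user ports are assigned to the first CPU port.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_port_type { SKETCH_PORT_USER, SKETCH_PORT_CPU };

/* Hypothetical, simplified stand-in for struct dsa_port. */
struct sketch_dp {
	enum sketch_port_type type;
	const char *master;             /* CPU ports: the "ethernet" phandle */
	const struct sketch_dp *cpu_dp; /* user ports: assigned CPU port */
	int master_set_up;              /* CPU ports: master set up/monitored */
};

static void sketch_tree_setup(struct sketch_dp *ports, size_t n)
{
	const struct sketch_dp *first_cpu = NULL;

	/* Set up (and monitor) the master of every CPU port. */
	for (size_t i = 0; i < n; i++) {
		if (ports[i].type != SKETCH_PORT_CPU)
			continue;
		if (!first_cpu)
			first_cpu = &ports[i];
		ports[i].master_set_up = 1;
	}
	assert(first_cpu);

	/* ...but terminate all user ports on the first CPU port only. */
	for (size_t i = 0; i < n; i++)
		if (ports[i].type == SKETCH_PORT_USER)
			ports[i].cpu_dp = first_cpu;
}
```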

It is therefore the responsibility of the driver to ensure forward
compatibility with multiple CPU ports (in other words, if one master
goes down, don't hurry to say "I don't have any management interface to
use for register access"; maybe the other one is still OK).
Consequently, you can use a DSA master for register access even if no
dp->cpu_dp points to it. This patch set is just about changing the
dp->cpu_dp mapping that is used for netdevice traffic, which is quite
orthogonal to the concern you describe.
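A sketch of what "doing the right thing" in a driver could look like, with hypothetical types: the real ds->ops->master_state_change() callback receives the master net_device, which is simplified here to the index of its CPU port. The driver keeps one bit per operational master and picks any operational one for register access, regardless of which dp->cpu_dp the user ports use. The mgmt_capable mask is an assumed way to model the hardware limitation mentioned earlier in the thread, where management frames are only accepted on some ports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver-private state. */
struct sketch_priv {
	uint32_t mgmt_capable; /* CPU ports the switch accepts mgmt frames on */
	uint32_t oper_masters; /* bit n set: master of CPU port n is up */
};

/* Called for every physical master, whether or not any user port's
 * dp->cpu_dp points at it. */
static void sketch_master_state_change(struct sketch_priv *priv,
				       int cpu_port, bool operational)
{
	assert(cpu_port >= 0 && cpu_port < 32);
	if (operational)
		priv->oper_masters |= 1u << cpu_port;
	else
		priv->oper_masters &= ~(1u << cpu_port);
}

/* Lowest-numbered usable master for register access, or -1 if none. */
static int sketch_mgmt_cpu_port(const struct sketch_priv *priv)
{
	uint32_t usable = priv->oper_masters & priv->mgmt_capable;

	for (int i = 0; i < 32; i++)
		if (usable & (1u << i))
			return i;
	return -1;
}
```

With the device tree snippet above, losing eth0 (CPU port 2) would simply move register access to eth1 (CPU port 3), instead of declaring the management interface gone.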
