diff mbox series

[v2,net,4/4] net: dsa: mv88e6xxx: Limit rsvd2cpu policy to user ports on 6393X

Message ID 20241219123106.730032-5-tobias@waldekranz.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net: dsa: mv88e6xxx: Amethyst (6393X) fixes | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: edumazet@google.com
netdev/build_clang success Errors and warnings before: 1 this patch: 1
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 57 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-12-20--00-00 (tests: 880)

Commit Message

Tobias Waldekranz Dec. 19, 2024, 12:30 p.m. UTC
For packets with a DA in the IEEE reserved L2 group range, originating
from a CPU, forward it as normal, rather than classifying it as
management.

Example use-case:

     bridge (group_fwd_mask 0x4000)
     / |  \
 swp1 swp2 tap0
   \   /
(mv88e6xxx)

We've created a bridge with a non-zero group_fwd_mask (allowing LLDP
in this example) containing a set of ports managed by mv88e6xxx and
some foreign interface (e.g. an L2 VPN tunnel).

Since an LLDP packet coming in to the bridge from the other side of
tap0 is eligable for tx forward offloading, a FORWARD frame destined
for swp1 and swp2 would be send to the conduit interface.

Before this change, due to rsvd2cpu being enabled on the CPU port, the
switch would try to trap it back to the CPU. Given that the CPU is
trusted, instead assume that it indeed meant for the packet to be
forwarded like any other.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 drivers/net/dsa/mv88e6xxx/port.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

Comments

Andrew Lunn Dec. 19, 2024, 1:44 p.m. UTC | #1
On Thu, Dec 19, 2024 at 01:30:43PM +0100, Tobias Waldekranz wrote:
> For packets with a DA in the IEEE reserved L2 group range, originating
> from a CPU, forward it as normal, rather than classifying it as
> management.
> 
> Example use-case:
> 
>      bridge (group_fwd_mask 0x4000)
>      / |  \
>  swp1 swp2 tap0
>    \   /
> (mv88e6xxx)
> 
> We've created a bridge with a non-zero group_fwd_mask (allowing LLDP
> in this example) containing a set of ports managed by mv88e6xxx and
> some foreign interface (e.g. an L2 VPN tunnel).
> 
> Since an LLDP packet coming in to the bridge from the other side of
> tap0 is eligable for tx forward offloading, a FORWARD frame destined
> for swp1 and swp2 would be send to the conduit interface.
> 
> Before this change, due to rsvd2cpu being enabled on the CPU port, the
> switch would try to trap it back to the CPU. Given that the CPU is
> trusted, instead assume that it indeed meant for the packet to be
> forwarded like any other.
> 
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Vladimir Oltean Dec. 19, 2024, 2:05 p.m. UTC | #2
On Thu, Dec 19, 2024 at 01:30:43PM +0100, Tobias Waldekranz wrote:
> For packets with a DA in the IEEE reserved L2 group range, originating
> from a CPU, forward it as normal, rather than classifying it as
> management.

Doesn't this break STP? Must be able to inject into ports with an STP
state other than FORWARDING. I expect that you need a DSA_CMD_FROM_CPU
tag for that, can't do it with DSA_CMD_FORWARD.

> Example use-case:
> 
>      bridge (group_fwd_mask 0x4000)
>      / |  \
>  swp1 swp2 tap0
>    \   /
> (mv88e6xxx)
> 
> We've created a bridge with a non-zero group_fwd_mask (allowing LLDP
> in this example) containing a set of ports managed by mv88e6xxx and
> some foreign interface (e.g. an L2 VPN tunnel).
> 
> Since an LLDP packet coming in to the bridge from the other side of
> tap0 is eligable for tx forward offloading, a FORWARD frame destined
> for swp1 and swp2 would be send to the conduit interface.
> 
> Before this change, due to rsvd2cpu being enabled on the CPU port, the
> switch would try to trap it back to the CPU. Given that the CPU is
> trusted, instead assume that it indeed meant for the packet to be
> forwarded like any other.

It looks like an oversight in the switchdev tx_fwd_offload scheme. Can't
we teach nbp_switchdev_frame_mark_tx_fwd_offload() to make an exception
for is_link_local_ether_addr() packets, and not set skb->offload_fwd_mark?
Vladimir Oltean Dec. 19, 2024, 2:14 p.m. UTC | #3
On Thu, Dec 19, 2024 at 04:05:41PM +0200, Vladimir Oltean wrote:
> On Thu, Dec 19, 2024 at 01:30:43PM +0100, Tobias Waldekranz wrote:
> > For packets with a DA in the IEEE reserved L2 group range, originating
> > from a CPU, forward it as normal, rather than classifying it as
> > management.
> 
> Doesn't this break STP? Must be able to inject into ports with an STP
> state other than FORWARDING. I expect that you need a DSA_CMD_FROM_CPU
> tag for that, can't do it with DSA_CMD_FORWARD.

ah, I made a stupid mistake, I though locally generated STP goes through
__br_forward().

I put a WARN_ON_ONCE(skb->protocol == htons(ETH_P_802_2)); in DSA xmit
and convinced myself that this is not the case.

[   67.115425]  br_send_bpdu+0x130/0x2a0
[   67.119187]  br_send_config_bpdu+0x12c/0x170
[   67.123559]  br_transmit_config+0x114/0x180
[   67.127842]  br_config_bpdu_generation+0x6c/0x88
[   67.132562]  br_hello_timer_expired+0x44/0x98
[   67.137022]  call_timer_fn+0xc8/0x300

Please ignore this comment.
Vladimir Oltean Dec. 19, 2024, 2:29 p.m. UTC | #4
On Thu, Dec 19, 2024 at 01:30:43PM +0100, Tobias Waldekranz wrote:
> For packets with a DA in the IEEE reserved L2 group range, originating
> from a CPU, forward it as normal, rather than classifying it as
> management.
> 
> Example use-case:
> 
>      bridge (group_fwd_mask 0x4000)
>      / |  \
>  swp1 swp2 tap0
>    \   /
> (mv88e6xxx)
> 
> We've created a bridge with a non-zero group_fwd_mask (allowing LLDP
> in this example) containing a set of ports managed by mv88e6xxx and
> some foreign interface (e.g. an L2 VPN tunnel).
> 
> Since an LLDP packet coming in to the bridge from the other side of
> tap0 is eligable for tx forward offloading, a FORWARD frame destined
> for swp1 and swp2 would be send to the conduit interface.
> 
> Before this change, due to rsvd2cpu being enabled on the CPU port, the
> switch would try to trap it back to the CPU. Given that the CPU is
> trusted, instead assume that it indeed meant for the packet to be
> forwarded like any other.
> 
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>

Is it fair to say that commit d82f8ab0d874 ("net: dsa: tag_dsa: offload
the bridge forwarding process") broke this and that we need a Fixes: tag
for that and not earlier? Prior to that, I believe that rsvd2cpu would
not affect these forwarded reserved L2 multicast groups, because they
were sent with FROM_CPU.

> ---
>  drivers/net/dsa/mv88e6xxx/port.c | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
> index 56ed2f57fef8..bf6d558c112c 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.c
> +++ b/drivers/net/dsa/mv88e6xxx/port.c
> @@ -1416,6 +1416,23 @@ static int mv88e6393x_port_policy_write_all(struct mv88e6xxx_chip *chip,
>  	return 0;
>  }
>  
> +static int mv88e6393x_port_policy_write_user(struct mv88e6xxx_chip *chip,
> +					     u16 pointer, u8 data)
> +{
> +	int err, port;
> +
> +	for (port = 0; port < mv88e6xxx_num_ports(chip); port++) {
> +		if (!dsa_is_user_port(chip->ds, port))
> +			continue;

Can you remember to convert this "dsa_to_port() called in a loop"
antipattern to dsa_switch_for_each_user_port() in net-next? I wanted to
ask you to do it now, but the blamed commit is in kernel 5.15, and
82b318983c51 ("net: dsa: introduce helpers for iterating through ports
using dp") made its appearance in 5.16.

> +
> +		err = mv88e6393x_port_policy_write(chip, port, pointer, data);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
>  int mv88e6393x_set_egress_port(struct mv88e6xxx_chip *chip,
>  			       enum mv88e6xxx_egress_direction direction,
>  			       int port)
> @@ -1457,26 +1474,28 @@ int mv88e6393x_port_mgmt_rsvd2cpu(struct mv88e6xxx_chip *chip)
>  	int err;
>  
>  	/* Consider the frames with reserved multicast destination
> -	 * addresses matching 01:80:c2:00:00:00 and
> -	 * 01:80:c2:00:00:02 as MGMT.
> +	 * addresses matching 01:80:c2:00:00:00 and 01:80:c2:00:00:02

Is the comment correct? LLDP is group 01:80:c2:00:00:0e. It doesn't make
it sound like what is done here should affect it.

> +	 * as MGMT when received on user ports. Forward as normal on
> +	 * CPU/DSA ports, to support bridges with non-zero
> +	 * group_fwd_masks.
>  	 */
>  	ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000000XLO;
> -	err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff);
> +	err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff);
>  	if (err)
>  		return err;
>  
>  	ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000000XHI;
> -	err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff);
> +	err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff);
>  	if (err)
>  		return err;
>  
>  	ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000002XLO;
> -	err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff);
> +	err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff);
>  	if (err)
>  		return err;
>  
>  	ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000002XHI;
> -	err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff);
> +	err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff);
>  	if (err)
>  		return err;
>  
> -- 
> 2.43.0
>
Tobias Waldekranz Dec. 19, 2024, 2:34 p.m. UTC | #5
On tor, dec 19, 2024 at 16:05, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Thu, Dec 19, 2024 at 01:30:43PM +0100, Tobias Waldekranz wrote:
>> For packets with a DA in the IEEE reserved L2 group range, originating
>> from a CPU, forward it as normal, rather than classifying it as
>> management.
>
> Doesn't this break STP? Must be able to inject into ports with an STP
> state other than FORWARDING. I expect that you need a DSA_CMD_FROM_CPU
> tag for that, can't do it with DSA_CMD_FORWARD.

You need a FROM_CPU to force a packet out through a blocking port,
yes. But I don't see how this could apply to STP.

If STP is enabled on the bridge, we will never allow BPDUs to be
forwarded. Locally originating STP BPDUs are always injected directly on
the lower interface, so OFM is never set on those.

If STP is disabled on the bridge, then we will forward incoming BPDUs
(br_handle_frame()). In that case OFM could be set if the BPDU came in
on a foreign port. But since STP is disabled, no port will be blocked in
this case, so it would not matter.

>> Example use-case:
>> 
>>      bridge (group_fwd_mask 0x4000)
>>      / |  \
>>  swp1 swp2 tap0
>>    \   /
>> (mv88e6xxx)
>> 
>> We've created a bridge with a non-zero group_fwd_mask (allowing LLDP
>> in this example) containing a set of ports managed by mv88e6xxx and
>> some foreign interface (e.g. an L2 VPN tunnel).
>> 
>> Since an LLDP packet coming in to the bridge from the other side of
>> tap0 is eligable for tx forward offloading, a FORWARD frame destined
>> for swp1 and swp2 would be send to the conduit interface.
>> 
>> Before this change, due to rsvd2cpu being enabled on the CPU port, the
>> switch would try to trap it back to the CPU. Given that the CPU is
>> trusted, instead assume that it indeed meant for the packet to be
>> forwarded like any other.
>
> It looks like an oversight in the switchdev tx_fwd_offload scheme. Can't
> we teach nbp_switchdev_frame_mark_tx_fwd_offload() to make an exception
> for is_link_local_ether_addr() packets, and not set skb->offload_fwd_mark?

That sounds like a better option if it is acceptible to the broader
community. I thought that this might be a quirk of mv88e6xxx's rsvd2cpu
bits. But if more devices behave in the same way, then it would be
better to just exempt this whole class from offloading.

Do you know how any other ASICs behave from this perspective?
Vladimir Oltean Dec. 19, 2024, 2:42 p.m. UTC | #6
On Thu, Dec 19, 2024 at 03:34:43PM +0100, Tobias Waldekranz wrote:
> On tor, dec 19, 2024 at 16:05, Vladimir Oltean <olteanv@gmail.com> wrote:
> > It looks like an oversight in the switchdev tx_fwd_offload scheme. Can't
> > we teach nbp_switchdev_frame_mark_tx_fwd_offload() to make an exception
> > for is_link_local_ether_addr() packets, and not set skb->offload_fwd_mark?
> 
> That sounds like a better option if it is acceptible to the broader
> community. I thought that this might be a quirk of mv88e6xxx's rsvd2cpu
> bits. But if more devices behave in the same way, then it would be
> better to just exempt this whole class from offloading.
> 
> Do you know how any other ASICs behave from this perspective?

The other driver with tx_fwd_offload, sja1105, is going to drop any
packet coming from the host_port which isn't sent through a management
route (set up by sja1105_defer_xmit()). So it's more than likely bugged.

We can't fix this from sja1105_xmit() by reordering sja1105_imprecise_xmit()
and sja1105_defer_xmit(). It's not just the order of operations in the
tagger. It's the fact that the bridge thinks it doesn't need to clone
the skb, and it does.

So yes, it's probably best to exclude link-local from skb->offload_fwd_mark.
Vladimir Oltean Dec. 19, 2024, 2:52 p.m. UTC | #7
On Thu, Dec 19, 2024 at 04:42:08PM +0200, Vladimir Oltean wrote:
> The other driver with tx_fwd_offload, sja1105,

Correction: I forgot there is one more driver with tx_fwd_offload:
vsc73xx, but that doesn't properly support link-local traffic yet at all,
according to the vsc73xx_port_stp_state_set() comment. So we can ignore it.

> is going to drop any
> packet coming from the host_port which isn't sent through a management
> route (set up by sja1105_defer_xmit()). So it's more than likely bugged.
> 
> We can't fix this from sja1105_xmit() by reordering sja1105_imprecise_xmit()
> and sja1105_defer_xmit(). It's not just the order of operations in the
> tagger. It's the fact that the bridge thinks it doesn't need to clone
> the skb, and it does.

Another correction: we could probably make a best-effort attempt to
honor skb->offload_fwd_mark in sja1105_mgmt_xmit() by setting mgmt_route.destports
to the bit mask of all other ports that are in dsa_port_bridge_dev_get(dp).
But it gets unpleasantly difficult to manage, plus I think we still don't
get MAC SA learning from these packets.

> So yes, it's probably best to exclude link-local from skb->offload_fwd_mark.

So I'm still of this opinion :) I think the effort to handle the corner
cases isn't worth it relative to the benefit of offloading the forwarding
of slow protocols.
Tobias Waldekranz Dec. 19, 2024, 3:02 p.m. UTC | #8
On tor, dec 19, 2024 at 16:52, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Thu, Dec 19, 2024 at 04:42:08PM +0200, Vladimir Oltean wrote:
>> The other driver with tx_fwd_offload, sja1105,
>
> Correction: I forgot there is one more driver with tx_fwd_offload:
> vsc73xx, but that doesn't properly support link-local traffic yet at all,
> according to the vsc73xx_port_stp_state_set() comment. So we can ignore it.
>
>> is going to drop any
>> packet coming from the host_port which isn't sent through a management
>> route (set up by sja1105_defer_xmit()). So it's more than likely bugged.
>> 
>> We can't fix this from sja1105_xmit() by reordering sja1105_imprecise_xmit()
>> and sja1105_defer_xmit(). It's not just the order of operations in the
>> tagger. It's the fact that the bridge thinks it doesn't need to clone
>> the skb, and it does.
>
> Another correction: we could probably make a best-effort attempt to
> honor skb->offload_fwd_mark in sja1105_mgmt_xmit() by setting mgmt_route.destports
> to the bit mask of all other ports that are in dsa_port_bridge_dev_get(dp).
> But it gets unpleasantly difficult to manage, plus I think we still don't
> get MAC SA learning from these packets.
>
>> So yes, it's probably best to exclude link-local from skb->offload_fwd_mark.
>
> So I'm still of this opinion :) I think the effort to handle the corner
> cases isn't worth it relative to the benefit of offloading the forwarding
> of slow protocols.

Agreed. I will remove this patch in v3 and replace it with a general
patch for the bridge instead. Thanks!
diff mbox series

Patch

diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
index 56ed2f57fef8..bf6d558c112c 100644
--- a/drivers/net/dsa/mv88e6xxx/port.c
+++ b/drivers/net/dsa/mv88e6xxx/port.c
@@ -1416,6 +1416,23 @@  static int mv88e6393x_port_policy_write_all(struct mv88e6xxx_chip *chip,
 	return 0;
 }
 
+static int mv88e6393x_port_policy_write_user(struct mv88e6xxx_chip *chip,
+					     u16 pointer, u8 data)
+{
+	int err, port;
+
+	for (port = 0; port < mv88e6xxx_num_ports(chip); port++) {
+		if (!dsa_is_user_port(chip->ds, port))
+			continue;
+
+		err = mv88e6393x_port_policy_write(chip, port, pointer, data);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 int mv88e6393x_set_egress_port(struct mv88e6xxx_chip *chip,
 			       enum mv88e6xxx_egress_direction direction,
 			       int port)
@@ -1457,26 +1474,28 @@  int mv88e6393x_port_mgmt_rsvd2cpu(struct mv88e6xxx_chip *chip)
 	int err;
 
 	/* Consider the frames with reserved multicast destination
-	 * addresses matching 01:80:c2:00:00:00 and
-	 * 01:80:c2:00:00:02 as MGMT.
+	 * addresses matching 01:80:c2:00:00:00 and 01:80:c2:00:00:02
+	 * as MGMT when received on user ports. Forward as normal on
+	 * CPU/DSA ports, to support bridges with non-zero
+	 * group_fwd_masks.
 	 */
 	ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000000XLO;
-	err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff);
+	err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff);
 	if (err)
 		return err;
 
 	ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000000XHI;
-	err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff);
+	err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff);
 	if (err)
 		return err;
 
 	ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000002XLO;
-	err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff);
+	err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff);
 	if (err)
 		return err;
 
 	ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000002XHI;
-	err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff);
+	err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff);
 	if (err)
 		return err;