Message ID | 20241219123106.730032-5-tobias@waldekranz.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: dsa: mv88e6xxx: Amethyst (6393X) fixes | expand |
On Thu, Dec 19, 2024 at 01:30:43PM +0100, Tobias Waldekranz wrote: > For packets with a DA in the IEEE reserved L2 group range, originating > from a CPU, forward it as normal, rather than classifying it as > management. > > Example use-case: > > bridge (group_fwd_mask 0x4000) > / | \ > swp1 swp2 tap0 > \ / > (mv88e6xxx) > > We've created a bridge with a non-zero group_fwd_mask (allowing LLDP > in this example) containing a set of ports managed by mv88e6xxx and > some foreign interface (e.g. an L2 VPN tunnel). > > Since an LLDP packet coming in to the bridge from the other side of > tap0 is eligable for tx forward offloading, a FORWARD frame destined > for swp1 and swp2 would be send to the conduit interface. > > Before this change, due to rsvd2cpu being enabled on the CPU port, the > switch would try to trap it back to the CPU. Given that the CPU is > trusted, instead assume that it indeed meant for the packet to be > forwarded like any other. > > Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Andrew
On Thu, Dec 19, 2024 at 01:30:43PM +0100, Tobias Waldekranz wrote: > For packets with a DA in the IEEE reserved L2 group range, originating > from a CPU, forward it as normal, rather than classifying it as > management. Doesn't this break STP? Must be able to inject into ports with an STP state other than FORWARDING. I expect that you need a DSA_CMD_FROM_CPU tag for that, can't do it with DSA_CMD_FORWARD. > Example use-case: > > bridge (group_fwd_mask 0x4000) > / | \ > swp1 swp2 tap0 > \ / > (mv88e6xxx) > > We've created a bridge with a non-zero group_fwd_mask (allowing LLDP > in this example) containing a set of ports managed by mv88e6xxx and > some foreign interface (e.g. an L2 VPN tunnel). > > Since an LLDP packet coming in to the bridge from the other side of > tap0 is eligable for tx forward offloading, a FORWARD frame destined > for swp1 and swp2 would be send to the conduit interface. > > Before this change, due to rsvd2cpu being enabled on the CPU port, the > switch would try to trap it back to the CPU. Given that the CPU is > trusted, instead assume that it indeed meant for the packet to be > forwarded like any other. It looks like an oversight in the switchdev tx_fwd_offload scheme. Can't we teach nbp_switchdev_frame_mark_tx_fwd_offload() to make an exception for is_link_local_ether_addr() packets, and not set skb->offload_fwd_mark?
On Thu, Dec 19, 2024 at 04:05:41PM +0200, Vladimir Oltean wrote: > On Thu, Dec 19, 2024 at 01:30:43PM +0100, Tobias Waldekranz wrote: > > For packets with a DA in the IEEE reserved L2 group range, originating > > from a CPU, forward it as normal, rather than classifying it as > > management. > > Doesn't this break STP? Must be able to inject into ports with an STP > state other than FORWARDING. I expect that you need a DSA_CMD_FROM_CPU > tag for that, can't do it with DSA_CMD_FORWARD. ah, I made a stupid mistake, I though locally generated STP goes through __br_forward(). I put a WARN_ON_ONCE(skb->protocol == htons(ETH_P_802_2)); in DSA xmit and convinced myself that this is not the case. [ 67.115425] br_send_bpdu+0x130/0x2a0 [ 67.119187] br_send_config_bpdu+0x12c/0x170 [ 67.123559] br_transmit_config+0x114/0x180 [ 67.127842] br_config_bpdu_generation+0x6c/0x88 [ 67.132562] br_hello_timer_expired+0x44/0x98 [ 67.137022] call_timer_fn+0xc8/0x300 Please ignore this comment.
On Thu, Dec 19, 2024 at 01:30:43PM +0100, Tobias Waldekranz wrote: > For packets with a DA in the IEEE reserved L2 group range, originating > from a CPU, forward it as normal, rather than classifying it as > management. > > Example use-case: > > bridge (group_fwd_mask 0x4000) > / | \ > swp1 swp2 tap0 > \ / > (mv88e6xxx) > > We've created a bridge with a non-zero group_fwd_mask (allowing LLDP > in this example) containing a set of ports managed by mv88e6xxx and > some foreign interface (e.g. an L2 VPN tunnel). > > Since an LLDP packet coming in to the bridge from the other side of > tap0 is eligable for tx forward offloading, a FORWARD frame destined > for swp1 and swp2 would be send to the conduit interface. > > Before this change, due to rsvd2cpu being enabled on the CPU port, the > switch would try to trap it back to the CPU. Given that the CPU is > trusted, instead assume that it indeed meant for the packet to be > forwarded like any other. > > Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Is it fair to say that commit d82f8ab0d874 ("net: dsa: tag_dsa: offload the bridge forwarding process") broke this and that we need a Fixes: tag for that and not earlier? Prior to that, I believe that rsvd2cpu would not affect these forwarded reserved L2 multicast groups, because they were sent with FROM_CPU. > --- > drivers/net/dsa/mv88e6xxx/port.c | 31 +++++++++++++++++++++++++------ > 1 file changed, 25 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c > index 56ed2f57fef8..bf6d558c112c 100644 > --- a/drivers/net/dsa/mv88e6xxx/port.c > +++ b/drivers/net/dsa/mv88e6xxx/port.c > @@ -1416,6 +1416,23 @@ static int mv88e6393x_port_policy_write_all(struct mv88e6xxx_chip *chip, > return 0; > } > > +static int mv88e6393x_port_policy_write_user(struct mv88e6xxx_chip *chip, > + u16 pointer, u8 data) > +{ > + int err, port; > + > + for (port = 0; port < mv88e6xxx_num_ports(chip); port++) { > + if (!dsa_is_user_port(chip->ds, port)) > + continue; Can you remember to convert this "dsa_to_port() called in a loop" antipattern to dsa_switch_for_each_user_port() in net-next? I wanted to ask you to do it now, but the blamed commit is in kernel 5.15, and 82b318983c51 ("net: dsa: introduce helpers for iterating through ports using dp") made its appearance in 5.16. > + > + err = mv88e6393x_port_policy_write(chip, port, pointer, data); > + if (err) > + return err; > + } > + > + return 0; > +} > + > int mv88e6393x_set_egress_port(struct mv88e6xxx_chip *chip, > enum mv88e6xxx_egress_direction direction, > int port) > @@ -1457,26 +1474,28 @@ int mv88e6393x_port_mgmt_rsvd2cpu(struct mv88e6xxx_chip *chip) > int err; > > /* Consider the frames with reserved multicast destination > - * addresses matching 01:80:c2:00:00:00 and > - * 01:80:c2:00:00:02 as MGMT. > + * addresses matching 01:80:c2:00:00:00 and 01:80:c2:00:00:02 Is the comment correct? LLDP is group 01:80:c2:00:00:0e. It doesn't make it sound like what is done here should affect it. > + * as MGMT when received on user ports. Forward as normal on > + * CPU/DSA ports, to support bridges with non-zero > + * group_fwd_masks. > */ > ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000000XLO; > - err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff); > + err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff); > if (err) > return err; > > ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000000XHI; > - err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff); > + err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff); > if (err) > return err; > > ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000002XLO; > - err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff); > + err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff); > if (err) > return err; > > ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000002XHI; > - err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff); > + err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff); > if (err) > return err; > > -- > 2.43.0 >
On tor, dec 19, 2024 at 16:05, Vladimir Oltean <olteanv@gmail.com> wrote: > On Thu, Dec 19, 2024 at 01:30:43PM +0100, Tobias Waldekranz wrote: >> For packets with a DA in the IEEE reserved L2 group range, originating >> from a CPU, forward it as normal, rather than classifying it as >> management. > > Doesn't this break STP? Must be able to inject into ports with an STP > state other than FORWARDING. I expect that you need a DSA_CMD_FROM_CPU > tag for that, can't do it with DSA_CMD_FORWARD. You need a FROM_CPU to force a packet out through a blocking port, yes. But I don't see how this could apply to STP. If STP is enabled on the bridge, we will never allow BPDUs to be forwarded. Locally originating STP BPDUs are always injected directly on the lower interface, so OFM is never set on those. If STP is disabled on the bridge, then we will forward incoming BPDUs (br_handle_frame()). In that case OFM could be set if the BPDU came in on a foreign port. But since STP is disabled, no port will be blocked in this case, so it would not matter. >> Example use-case: >> >> bridge (group_fwd_mask 0x4000) >> / | \ >> swp1 swp2 tap0 >> \ / >> (mv88e6xxx) >> >> We've created a bridge with a non-zero group_fwd_mask (allowing LLDP >> in this example) containing a set of ports managed by mv88e6xxx and >> some foreign interface (e.g. an L2 VPN tunnel). >> >> Since an LLDP packet coming in to the bridge from the other side of >> tap0 is eligable for tx forward offloading, a FORWARD frame destined >> for swp1 and swp2 would be send to the conduit interface. >> >> Before this change, due to rsvd2cpu being enabled on the CPU port, the >> switch would try to trap it back to the CPU. Given that the CPU is >> trusted, instead assume that it indeed meant for the packet to be >> forwarded like any other. > > It looks like an oversight in the switchdev tx_fwd_offload scheme. Can't > we teach nbp_switchdev_frame_mark_tx_fwd_offload() to make an exception > for is_link_local_ether_addr() packets, and not set skb->offload_fwd_mark? That sounds like a better option if it is acceptible to the broader community. I thought that this might be a quirk of mv88e6xxx's rsvd2cpu bits. But if more devices behave in the same way, then it would be better to just exempt this whole class from offloading. Do you know how any other ASICs behave from this perspective?
On Thu, Dec 19, 2024 at 03:34:43PM +0100, Tobias Waldekranz wrote: > On tor, dec 19, 2024 at 16:05, Vladimir Oltean <olteanv@gmail.com> wrote: > > It looks like an oversight in the switchdev tx_fwd_offload scheme. Can't > > we teach nbp_switchdev_frame_mark_tx_fwd_offload() to make an exception > > for is_link_local_ether_addr() packets, and not set skb->offload_fwd_mark? > > That sounds like a better option if it is acceptible to the broader > community. I thought that this might be a quirk of mv88e6xxx's rsvd2cpu > bits. But if more devices behave in the same way, then it would be > better to just exempt this whole class from offloading. > > Do you know how any other ASICs behave from this perspective? The other driver with tx_fwd_offload, sja1105, is going to drop any packet coming from the host_port which isn't sent through a management route (set up by sja1105_defer_xmit()). So it's more than likely bugged. We can't fix this from sja1105_xmit() by reordering sja1105_imprecise_xmit() and sja1105_defer_xmit(). It's not just the order of operations in the tagger. It's the fact that the bridge thinks it doesn't need to clone the skb, and it does. So yes, it's probably best to exclude link-local from skb->offload_fwd_mark.
On Thu, Dec 19, 2024 at 04:42:08PM +0200, Vladimir Oltean wrote: > The other driver with tx_fwd_offload, sja1105, Correction: I forgot there is one more driver with tx_fwd_offload: vsc73xx, but that doesn't properly support link-local traffic yet at all, according to the vsc73xx_port_stp_state_set() comment. So we can ignore it. > is going to drop any > packet coming from the host_port which isn't sent through a management > route (set up by sja1105_defer_xmit()). So it's more than likely bugged. > > We can't fix this from sja1105_xmit() by reordering sja1105_imprecise_xmit() > and sja1105_defer_xmit(). It's not just the order of operations in the > tagger. It's the fact that the bridge thinks it doesn't need to clone > the skb, and it does. Another correction: we could probably make a best-effort attempt to honor skb->offload_fwd_mark in sja1105_mgmt_xmit() by setting mgmt_route.destports to the bit mask of all other ports that are in dsa_port_bridge_dev_get(dp). But it gets unpleasantly difficult to manage, plus I think we still don't get MAC SA learning from these packets. > So yes, it's probably best to exclude link-local from skb->offload_fwd_mark. So I'm still of this opinion :) I think the effort to handle the corner cases isn't worth it relative to the benefit of offloading the forwarding of slow protocols.
On tor, dec 19, 2024 at 16:52, Vladimir Oltean <olteanv@gmail.com> wrote: > On Thu, Dec 19, 2024 at 04:42:08PM +0200, Vladimir Oltean wrote: >> The other driver with tx_fwd_offload, sja1105, > > Correction: I forgot there is one more driver with tx_fwd_offload: > vsc73xx, but that doesn't properly support link-local traffic yet at all, > according to the vsc73xx_port_stp_state_set() comment. So we can ignore it. > >> is going to drop any >> packet coming from the host_port which isn't sent through a management >> route (set up by sja1105_defer_xmit()). So it's more than likely bugged. >> >> We can't fix this from sja1105_xmit() by reordering sja1105_imprecise_xmit() >> and sja1105_defer_xmit(). It's not just the order of operations in the >> tagger. It's the fact that the bridge thinks it doesn't need to clone >> the skb, and it does. > > Another correction: we could probably make a best-effort attempt to > honor skb->offload_fwd_mark in sja1105_mgmt_xmit() by setting mgmt_route.destports > to the bit mask of all other ports that are in dsa_port_bridge_dev_get(dp). > But it gets unpleasantly difficult to manage, plus I think we still don't > get MAC SA learning from these packets. > >> So yes, it's probably best to exclude link-local from skb->offload_fwd_mark. > > So I'm still of this opinion :) I think the effort to handle the corner > cases isn't worth it relative to the benefit of offloading the forwarding > of slow protocols. Agreed. I will remove this patch in v3 and replace it with a general patch for the bridge instead. Thanks!
diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c index 56ed2f57fef8..bf6d558c112c 100644 --- a/drivers/net/dsa/mv88e6xxx/port.c +++ b/drivers/net/dsa/mv88e6xxx/port.c @@ -1416,6 +1416,23 @@ static int mv88e6393x_port_policy_write_all(struct mv88e6xxx_chip *chip, return 0; } +static int mv88e6393x_port_policy_write_user(struct mv88e6xxx_chip *chip, + u16 pointer, u8 data) +{ + int err, port; + + for (port = 0; port < mv88e6xxx_num_ports(chip); port++) { + if (!dsa_is_user_port(chip->ds, port)) + continue; + + err = mv88e6393x_port_policy_write(chip, port, pointer, data); + if (err) + return err; + } + + return 0; +} + int mv88e6393x_set_egress_port(struct mv88e6xxx_chip *chip, enum mv88e6xxx_egress_direction direction, int port) @@ -1457,26 +1474,28 @@ int mv88e6393x_port_mgmt_rsvd2cpu(struct mv88e6xxx_chip *chip) int err; /* Consider the frames with reserved multicast destination - * addresses matching 01:80:c2:00:00:00 and - * 01:80:c2:00:00:02 as MGMT. + * addresses matching 01:80:c2:00:00:00 and 01:80:c2:00:00:02 + * as MGMT when received on user ports. Forward as normal on + * CPU/DSA ports, to support bridges with non-zero + * group_fwd_masks. */ ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000000XLO; - err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff); + err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff); if (err) return err; ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000000XHI; - err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff); + err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff); if (err) return err; ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000002XLO; - err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff); + err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff); if (err) return err; ptr = MV88E6393X_PORT_POLICY_MGMT_CTL_PTR_01C280000002XHI; - err = mv88e6393x_port_policy_write_all(chip, ptr, 0xff); + err = mv88e6393x_port_policy_write_user(chip, ptr, 0xff); if (err) return err;
For packets with a DA in the IEEE reserved L2 group range, originating from a CPU, forward it as normal, rather than classifying it as management. Example use-case: bridge (group_fwd_mask 0x4000) / | \ swp1 swp2 tap0 \ / (mv88e6xxx) We've created a bridge with a non-zero group_fwd_mask (allowing LLDP in this example) containing a set of ports managed by mv88e6xxx and some foreign interface (e.g. an L2 VPN tunnel). Since an LLDP packet coming in to the bridge from the other side of tap0 is eligable for tx forward offloading, a FORWARD frame destined for swp1 and swp2 would be send to the conduit interface. Before this change, due to rsvd2cpu being enabled on the CPU port, the switch would try to trap it back to the CPU. Given that the CPU is trusted, instead assume that it indeed meant for the packet to be forwarded like any other. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> --- drivers/net/dsa/mv88e6xxx/port.c | 31 +++++++++++++++++++++++++------ 1 file changed, 25 insertions(+), 6 deletions(-)