diff mbox series

[RFC,v2,net-next,14/16] net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge

Message ID 20210318231829.3892920-15-olteanv@gmail.com (mailing list archive)
State New, archived
Headers show
Series Better support for sandwiched LAGs with bridge and DSA | expand

Commit Message

Vladimir Oltean March 18, 2021, 11:18 p.m. UTC
From: Vladimir Oltean <vladimir.oltean@nxp.com>

DSA has gained the recent ability to deal gracefully with upper
interfaces it cannot offload, such as the bridge, bonding or team
drivers. When such uppers exist, the ports are still in standalone mode
as far as the hardware is concerned.

But when we deliver packets to the software bridge in order for that to
do the forwarding, there is an unpleasant surprise in that the bridge
will refuse to forward them. This is because we unconditionally set
skb->offload_fwd_mark = true, meaning that the bridge thinks the frames
were already forwarded in hardware by us.

Since dp->bridge_dev is populated only when there is hardware offload
for it, but not in the software fallback case, let's introduce a new
helper that can be called from the tagger data path which sets the
skb->offload_fwd_mark accordingly to zero when there is no hardware
offload for bridging. This lets the bridge forward packets back to other
interfaces of our switch, if needed.

Without this change, sending a packet to the CPU for an unoffloaded
interface triggers this WARN_ON:

void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
			      struct sk_buff *skb)
{
	if (skb->offload_fwd_mark && !WARN_ON_ONCE(!p->offload_fwd_mark))
		BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
}

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 net/dsa/dsa_priv.h         | 14 ++++++++++++++
 net/dsa/tag_brcm.c         |  2 +-
 net/dsa/tag_dsa.c          | 15 +++++++++++----
 net/dsa/tag_hellcreek.c    |  2 +-
 net/dsa/tag_ksz.c          |  2 +-
 net/dsa/tag_lan9303.c      |  3 ++-
 net/dsa/tag_mtk.c          |  2 +-
 net/dsa/tag_ocelot.c       |  2 +-
 net/dsa/tag_ocelot_8021q.c |  2 +-
 net/dsa/tag_rtl4_a.c       |  2 +-
 net/dsa/tag_sja1105.c      |  4 ++--
 net/dsa/tag_xrs700x.c      |  2 +-
 12 files changed, 37 insertions(+), 15 deletions(-)

Comments

Qingfang Deng March 19, 2021, 8:52 a.m. UTC | #1
On Fri, Mar 19, 2021 at 01:18:27AM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> DSA has gained the recent ability to deal gracefully with upper
> interfaces it cannot offload, such as the bridge, bonding or team
> drivers. When such uppers exist, the ports are still in standalone mode
> as far as the hardware is concerned.
> 
> But when we deliver packets to the software bridge in order for that to
> do the forwarding, there is an unpleasant surprise in that the bridge
> will refuse to forward them. This is because we unconditionally set
> skb->offload_fwd_mark = true, meaning that the bridge thinks the frames
> were already forwarded in hardware by us.
> 
> Since dp->bridge_dev is populated only when there is hardware offload
> for it, but not in the software fallback case, let's introduce a new
> helper that can be called from the tagger data path which sets the
> skb->offload_fwd_mark accordingly to zero when there is no hardware
> offload for bridging. This lets the bridge forward packets back to other
> interfaces of our switch, if needed.
> 
> Without this change, sending a packet to the CPU for an unoffloaded
> interface triggers this WARN_ON:
> 
> void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
> 			      struct sk_buff *skb)
> {
> 	if (skb->offload_fwd_mark && !WARN_ON_ONCE(!p->offload_fwd_mark))
> 		BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
> }
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
> ---
>  net/dsa/dsa_priv.h         | 14 ++++++++++++++
>  net/dsa/tag_brcm.c         |  2 +-
>  net/dsa/tag_dsa.c          | 15 +++++++++++----
>  net/dsa/tag_hellcreek.c    |  2 +-
>  net/dsa/tag_ksz.c          |  2 +-
>  net/dsa/tag_lan9303.c      |  3 ++-
>  net/dsa/tag_mtk.c          |  2 +-
>  net/dsa/tag_ocelot.c       |  2 +-
>  net/dsa/tag_ocelot_8021q.c |  2 +-
>  net/dsa/tag_rtl4_a.c       |  2 +-
>  net/dsa/tag_sja1105.c      |  4 ++--
>  net/dsa/tag_xrs700x.c      |  2 +-
>  12 files changed, 37 insertions(+), 15 deletions(-)
> 
> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> index 92282de54230..b61bef79ce84 100644
> --- a/net/dsa/dsa_priv.h
> +++ b/net/dsa/dsa_priv.h
> @@ -349,6 +349,20 @@ static inline struct sk_buff *dsa_untag_bridge_pvid(struct sk_buff *skb)
>  	return skb;
>  }
>  
> +/* If the ingress port offloads the bridge, we mark the frame as autonomously
> + * forwarded by hardware, so the software bridge doesn't forward in twice, back
> + * to us, because we already did. However, if we're in fallback mode and we do
> + * software bridging, we are not offloading it, therefore the dp->bridge_dev
> + * pointer is not populated, and flooding needs to be done by software (we are
> + * effectively operating in standalone ports mode).
> + */
> +static inline void dsa_default_offload_fwd_mark(struct sk_buff *skb)
> +{
> +	struct dsa_port *dp = dsa_slave_to_port(skb->dev);
> +
> +	skb->offload_fwd_mark = !!(dp->bridge_dev);
> +}

So offload_fwd_mark is set iff the ingress port offloads the bridge.
Consider this set up on a switch which does NOT support LAG offload:

        +----- br0 -----+
        |               |
      bond0             |
        |               |         (Linux interfaces)
    +---+---+       +---+---+
    |       |       |       |
+-------+-------+-------+-------+
| sw0p0 | sw0p1 | sw0p2 | sw0p3 |
+-------+-------+-------+-------+
    |       |       |       |
    +---A---+       B       C     (LAN clients)


sw0p0 and sw0p1 should be in standalone mode (offload_fwd_mark = 0),
while sw0p2 and sw0p3 are offloaded (offload_fwd_mark = 1).

When a frame is sent into sw0p2 or sw0p3, can it be forwarded to sw0p0 or
sw0p1?

Setting offload_fwd_mark to 0 could also cause potential packet loss on
switches that perform learning on the CPU port:

When client C is talking to client A, frames from C will:
1. Enter sw0p3, where the switch will learn C is reachable via sw0p3.
2. Be sent to the CPU port and bounced back, where the switch will learn C is
   reachable via the CPU port, overwriting the previous learned FDB entry.
3. Be sent out of either sw0p0 or sw0p1, and reach its destination - A.

During step 2, if client B sends a frame to C, the frame will be forwarded to
the CPU, which will think it is already forwarded by the switch, and refuse to
forward it back, resulting in packet loss.

Many switch TX tags (mtk, qca, rtl) have a bit to disable source address
learning on a per-frame basis. We should utilise that.

> +
>  /* switch.c */
>  int dsa_switch_register_notifier(struct dsa_switch *ds);
>  void dsa_switch_unregister_notifier(struct dsa_switch *ds);
> diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
> index e2577a7dcbca..a8880b3bb106 100644
> --- a/net/dsa/tag_brcm.c
> +++ b/net/dsa/tag_brcm.c
> @@ -150,7 +150,7 @@ static struct sk_buff *brcm_tag_rcv_ll(struct sk_buff *skb,
>  	/* Remove Broadcom tag and update checksum */
>  	skb_pull_rcsum(skb, BRCM_TAG_LEN);
>  
> -	skb->offload_fwd_mark = 1;
> +	dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
> index 7e7b7decdf39..09ab9c25e686 100644
> --- a/net/dsa/tag_dsa.c
> +++ b/net/dsa/tag_dsa.c
> @@ -162,8 +162,8 @@ static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
>  static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
>  				  u8 extra)
>  {
> +	bool trap = false, trunk = false;
>  	int source_device, source_port;
> -	bool trunk = false;
>  	enum dsa_code code;
>  	enum dsa_cmd cmd;
>  	u8 *dsa_header;
> @@ -174,8 +174,6 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
>  	cmd = dsa_header[0] >> 6;
>  	switch (cmd) {
>  	case DSA_CMD_FORWARD:
> -		skb->offload_fwd_mark = 1;
> -
>  		trunk = !!(dsa_header[1] & 7);
>  		break;
>  
> @@ -194,7 +192,6 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
>  			 * device (like a bridge) that forwarding has
>  			 * already been done by hardware.
>  			 */
> -			skb->offload_fwd_mark = 1;
>  			break;
>  		case DSA_CODE_MGMT_TRAP:
>  		case DSA_CODE_IGMP_MLD_TRAP:
> @@ -202,6 +199,7 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
>  			/* Traps have, by definition, not been
>  			 * forwarded by hardware, so don't mark them.
>  			 */
> +			trap = true;
>  			break;
>  		default:
>  			/* Reserved code, this could be anything. Drop
> @@ -235,6 +233,15 @@ static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
>  	if (!skb->dev)
>  		return NULL;
>  
> +	/* When using LAG offload, skb->dev is not a DSA slave interface,
> +	 * so we cannot call dsa_default_offload_fwd_mark and we need to
> +	 * special-case it.
> +	 */
> +	if (trunk)
> +		skb->offload_fwd_mark = true;
> +	else if (!trap)
> +		dsa_default_offload_fwd_mark(skb);
> +
>  	/* If the 'tagged' bit is set; convert the DSA tag to a 802.1Q
>  	 * tag, and delete the ethertype (extra) if applicable. If the
>  	 * 'tagged' bit is cleared; delete the DSA tag, and ethertype
> diff --git a/net/dsa/tag_hellcreek.c b/net/dsa/tag_hellcreek.c
> index a09805c8e1ab..c1ee6eefafe4 100644
> --- a/net/dsa/tag_hellcreek.c
> +++ b/net/dsa/tag_hellcreek.c
> @@ -44,7 +44,7 @@ static struct sk_buff *hellcreek_rcv(struct sk_buff *skb,
>  
>  	pskb_trim_rcsum(skb, skb->len - HELLCREEK_TAG_LEN);
>  
> -	skb->offload_fwd_mark = true;
> +	dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
> index 4820dbcedfa2..8eee63a5b93b 100644
> --- a/net/dsa/tag_ksz.c
> +++ b/net/dsa/tag_ksz.c
> @@ -24,7 +24,7 @@ static struct sk_buff *ksz_common_rcv(struct sk_buff *skb,
>  
>  	pskb_trim_rcsum(skb, skb->len - len);
>  
> -	skb->offload_fwd_mark = true;
> +	dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
> index aa1318dccaf0..3a5494d2f7b1 100644
> --- a/net/dsa/tag_lan9303.c
> +++ b/net/dsa/tag_lan9303.c
> @@ -115,7 +115,8 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev,
>  	skb_pull_rcsum(skb, 2 + 2);
>  	memmove(skb->data - ETH_HLEN, skb->data - (ETH_HLEN + LAN9303_TAG_LEN),
>  		2 * ETH_ALEN);
> -	skb->offload_fwd_mark = !(lan9303_tag1 & LAN9303_TAG_RX_TRAPPED_TO_CPU);
> +	if (!(lan9303_tag1 & LAN9303_TAG_RX_TRAPPED_TO_CPU))
> +		dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
> index f9b2966d1936..92ab21d2ceca 100644
> --- a/net/dsa/tag_mtk.c
> +++ b/net/dsa/tag_mtk.c
> @@ -92,7 +92,7 @@ static struct sk_buff *mtk_tag_rcv(struct sk_buff *skb, struct net_device *dev,
>  	if (!skb->dev)
>  		return NULL;
>  
> -	skb->offload_fwd_mark = 1;
> +	dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
> index f9df9cac81c5..1deba3f1bb82 100644
> --- a/net/dsa/tag_ocelot.c
> +++ b/net/dsa/tag_ocelot.c
> @@ -123,7 +123,7 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
>  		 */
>  		return NULL;
>  
> -	skb->offload_fwd_mark = 1;
> +	dsa_default_offload_fwd_mark(skb);
>  	skb->priority = qos_class;
>  
>  	/* Ocelot switches copy frames unmodified to the CPU. However, it is
> diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
> index 5f3e8e124a82..447e1eeb357c 100644
> --- a/net/dsa/tag_ocelot_8021q.c
> +++ b/net/dsa/tag_ocelot_8021q.c
> @@ -81,7 +81,7 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
>  	if (!skb->dev)
>  		return NULL;
>  
> -	skb->offload_fwd_mark = 1;
> +	dsa_default_offload_fwd_mark(skb);
>  	skb->priority = qos_class;
>  
>  	return skb;
> diff --git a/net/dsa/tag_rtl4_a.c b/net/dsa/tag_rtl4_a.c
> index e9176475bac8..1864e3a74df8 100644
> --- a/net/dsa/tag_rtl4_a.c
> +++ b/net/dsa/tag_rtl4_a.c
> @@ -114,7 +114,7 @@ static struct sk_buff *rtl4a_tag_rcv(struct sk_buff *skb,
>  		skb->data - ETH_HLEN - RTL4_A_HDR_LEN,
>  		2 * ETH_ALEN);
>  
> -	skb->offload_fwd_mark = 1;
> +	dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
> index 50496013cdb7..45cdf64f0e88 100644
> --- a/net/dsa/tag_sja1105.c
> +++ b/net/dsa/tag_sja1105.c
> @@ -295,8 +295,6 @@ static struct sk_buff *sja1105_rcv(struct sk_buff *skb,
>  	is_link_local = sja1105_is_link_local(skb);
>  	is_meta = sja1105_is_meta_frame(skb);
>  
> -	skb->offload_fwd_mark = 1;
> -
>  	if (is_tagged) {
>  		/* Normal traffic path. */
>  		skb_push_rcsum(skb, ETH_HLEN);
> @@ -339,6 +337,8 @@ static struct sk_buff *sja1105_rcv(struct sk_buff *skb,
>  		return NULL;
>  	}
>  
> +	dsa_default_offload_fwd_mark(skb);
> +
>  	if (subvlan)
>  		sja1105_decode_subvlan(skb, subvlan);
>  
> diff --git a/net/dsa/tag_xrs700x.c b/net/dsa/tag_xrs700x.c
> index 858cdf9d2913..1208549f45c1 100644
> --- a/net/dsa/tag_xrs700x.c
> +++ b/net/dsa/tag_xrs700x.c
> @@ -46,7 +46,7 @@ static struct sk_buff *xrs700x_rcv(struct sk_buff *skb, struct net_device *dev,
>  		return NULL;
>  
>  	/* Frame is forwarded by hardware, don't forward in software. */
> -	skb->offload_fwd_mark = 1;
> +	dsa_default_offload_fwd_mark(skb);
>  
>  	return skb;
>  }
> -- 
> 2.25.1
>
Vladimir Oltean March 19, 2021, 9:06 a.m. UTC | #2
On Fri, Mar 19, 2021 at 04:52:31PM +0800, DENG Qingfang wrote:
> On Fri, Mar 19, 2021 at 01:18:27AM +0200, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > 
> > DSA has gained the recent ability to deal gracefully with upper
> > interfaces it cannot offload, such as the bridge, bonding or team
> > drivers. When such uppers exist, the ports are still in standalone mode
> > as far as the hardware is concerned.
> > 
> > But when we deliver packets to the software bridge in order for that to
> > do the forwarding, there is an unpleasant surprise in that the bridge
> > will refuse to forward them. This is because we unconditionally set
> > skb->offload_fwd_mark = true, meaning that the bridge thinks the frames
> > were already forwarded in hardware by us.
> > 
> > Since dp->bridge_dev is populated only when there is hardware offload
> > for it, but not in the software fallback case, let's introduce a new
> > helper that can be called from the tagger data path which sets the
> > skb->offload_fwd_mark accordingly to zero when there is no hardware
> > offload for bridging. This lets the bridge forward packets back to other
> > interfaces of our switch, if needed.
> > 
> > Without this change, sending a packet to the CPU for an unoffloaded
> > interface triggers this WARN_ON:
> > 
> > void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
> > 			      struct sk_buff *skb)
> > {
> > 	if (skb->offload_fwd_mark && !WARN_ON_ONCE(!p->offload_fwd_mark))
> > 		BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
> > }
> > 
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>
> > ---
> >  net/dsa/dsa_priv.h         | 14 ++++++++++++++
> >  net/dsa/tag_brcm.c         |  2 +-
> >  net/dsa/tag_dsa.c          | 15 +++++++++++----
> >  net/dsa/tag_hellcreek.c    |  2 +-
> >  net/dsa/tag_ksz.c          |  2 +-
> >  net/dsa/tag_lan9303.c      |  3 ++-
> >  net/dsa/tag_mtk.c          |  2 +-
> >  net/dsa/tag_ocelot.c       |  2 +-
> >  net/dsa/tag_ocelot_8021q.c |  2 +-
> >  net/dsa/tag_rtl4_a.c       |  2 +-
> >  net/dsa/tag_sja1105.c      |  4 ++--
> >  net/dsa/tag_xrs700x.c      |  2 +-
> >  12 files changed, 37 insertions(+), 15 deletions(-)
> > 
> > diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> > index 92282de54230..b61bef79ce84 100644
> > --- a/net/dsa/dsa_priv.h
> > +++ b/net/dsa/dsa_priv.h
> > @@ -349,6 +349,20 @@ static inline struct sk_buff *dsa_untag_bridge_pvid(struct sk_buff *skb)
> >  	return skb;
> >  }
> >  
> > +/* If the ingress port offloads the bridge, we mark the frame as autonomously
> > + * forwarded by hardware, so the software bridge doesn't forward in twice, back
> > + * to us, because we already did. However, if we're in fallback mode and we do
> > + * software bridging, we are not offloading it, therefore the dp->bridge_dev
> > + * pointer is not populated, and flooding needs to be done by software (we are
> > + * effectively operating in standalone ports mode).
> > + */
> > +static inline void dsa_default_offload_fwd_mark(struct sk_buff *skb)
> > +{
> > +	struct dsa_port *dp = dsa_slave_to_port(skb->dev);
> > +
> > +	skb->offload_fwd_mark = !!(dp->bridge_dev);
> > +}
> 
> So offload_fwd_mark is set iff the ingress port offloads the bridge.
> Consider this set up on a switch which does NOT support LAG offload:
> 
>         +----- br0 -----+
>         |               |
>       bond0             |
>         |               |         (Linux interfaces)
>     +---+---+       +---+---+
>     |       |       |       |
> +-------+-------+-------+-------+
> | sw0p0 | sw0p1 | sw0p2 | sw0p3 |
> +-------+-------+-------+-------+
>     |       |       |       |
>     +---A---+       B       C     (LAN clients)
> 
> 
> sw0p0 and sw0p1 should be in standalone mode (offload_fwd_mark = 0),
> while sw0p2 and sw0p3 are offloaded (offload_fwd_mark = 1).
> 
> When a frame is sent into sw0p2 or sw0p3, can it be forwarded to sw0p0 or
> sw0p1?

bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
				  const struct sk_buff *skb)
{
	return !skb->offload_fwd_mark ||
	       BR_INPUT_SKB_CB(skb)->offload_fwd_mark != p->offload_fwd_mark;
}

where p->offload_fwd_mark is the mark of the egress port, and
BR_INPUT_SKB_CB(skb) is the mark of the ingress port, assigned here:

void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
			      struct sk_buff *skb)
{
	if (skb->offload_fwd_mark && !WARN_ON_ONCE(!p->offload_fwd_mark))
		BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
}

Basically, sw0p0 and sw0p1 have a switchdev mark of 0, and sw0p2 and
sw0p3 have a non-zero switchdev mark, so nbp_switchdev_allowed_egress
returns true in both directions, regardless of the value of
skb->offload_fwd_mark.

> Setting offload_fwd_mark to 0 could also cause potential packet loss on
> switches that perform learning on the CPU port:
> 
> When client C is talking to client A, frames from C will:
> 1. Enter sw0p3, where the switch will learn C is reachable via sw0p3.
> 2. Be sent to the CPU port and bounced back, where the switch will learn C is
>    reachable via the CPU port, overwriting the previous learned FDB entry.
> 3. Be sent out of either sw0p0 or sw0p1, and reach its destination - A.
> 
> During step 2, if client B sends a frame to C, the frame will be forwarded to
> the CPU, which will think it is already forwarded by the switch, and refuse to
> forward it back, resulting in packet loss.
> 
> Many switch TX tags (mtk, qca, rtl) have a bit to disable source address
> learning on a per-frame basis. We should utilise that.

This is a good point actually, which I thought about, but did not give a
lot of importance to for the moment. Either we go full steam ahead with
assisted learning on the CPU port for everybody, and we selectively
learn the addresses relevant to the bridging funciton only, or we do
what you say, but then it will be a little bit more complicated IMO, and
have hardware dependencies, which isn't as nice.
Qingfang Deng March 19, 2021, 9:29 a.m. UTC | #3
On Fri, Mar 19, 2021 at 5:06 PM Vladimir Oltean <olteanv@gmail.com> wrote:
>
> This is a good point actually, which I thought about, but did not give a
> lot of importance to for the moment. Either we go full steam ahead with
> assisted learning on the CPU port for everybody, and we selectively
> learn the addresses relevant to the bridging funciton only, or we do
> what you say, but then it will be a little bit more complicated IMO, and
> have hardware dependencies, which isn't as nice.

Are skb->offload_fwd_mark and source DSA switch kept in dsa_slave_xmit?
I think SA learning should be bypassed iff skb->offload_fwd_mark == 1 and
source DSA switch == destination DSA switch.
Vladimir Oltean March 19, 2021, 10:49 a.m. UTC | #4
On Fri, Mar 19, 2021 at 05:29:12PM +0800, DENG Qingfang wrote:
> On Fri, Mar 19, 2021 at 5:06 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> >
> > This is a good point actually, which I thought about, but did not give a
> > lot of importance to for the moment. Either we go full steam ahead with
> > assisted learning on the CPU port for everybody, and we selectively
> > learn the addresses relevant to the bridging funciton only, or we do
> > what you say, but then it will be a little bit more complicated IMO, and
> > have hardware dependencies, which isn't as nice.
> 
> Are skb->offload_fwd_mark and source DSA switch kept in dsa_slave_xmit?
> I think SA learning should be bypassed iff skb->offload_fwd_mark == 1 and
> source DSA switch == destination DSA switch.

Why would you even want to look at the source net device for forwarding?
I'd say that if dp->bridge_dev is NULL in the xmit function, you certainly
want to bypass address learning if you can. Maybe also for link-local traffic.
Qingfang Deng March 22, 2021, 8:04 a.m. UTC | #5
On Fri, Mar 19, 2021 at 6:49 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> Why would you even want to look at the source net device for forwarding?
> I'd say that if dp->bridge_dev is NULL in the xmit function, you certainly
> want to bypass address learning if you can. Maybe also for link-local traffic.

Also for trapped traffic (snooping, tc-flower trap action) if the CPU
sends them back.
Florian Fainelli March 22, 2021, 4:06 p.m. UTC | #6
On 3/18/2021 4:18 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> DSA has gained the recent ability to deal gracefully with upper
> interfaces it cannot offload, such as the bridge, bonding or team
> drivers. When such uppers exist, the ports are still in standalone mode
> as far as the hardware is concerned.
> 
> But when we deliver packets to the software bridge in order for that to
> do the forwarding, there is an unpleasant surprise in that the bridge
> will refuse to forward them. This is because we unconditionally set
> skb->offload_fwd_mark = true, meaning that the bridge thinks the frames
> were already forwarded in hardware by us.
> 
> Since dp->bridge_dev is populated only when there is hardware offload
> for it, but not in the software fallback case, let's introduce a new
> helper that can be called from the tagger data path which sets the
> skb->offload_fwd_mark accordingly to zero when there is no hardware
> offload for bridging. This lets the bridge forward packets back to other
> interfaces of our switch, if needed.
> 
> Without this change, sending a packet to the CPU for an unoffloaded
> interface triggers this WARN_ON:
> 
> void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
> 			      struct sk_buff *skb)
> {
> 	if (skb->offload_fwd_mark && !WARN_ON_ONCE(!p->offload_fwd_mark))
> 		BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
> }
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Vladimir Oltean March 22, 2021, 10:23 p.m. UTC | #7
On Mon, Mar 22, 2021 at 04:04:01PM +0800, DENG Qingfang wrote:
> On Fri, Mar 19, 2021 at 6:49 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> > Why would you even want to look at the source net device for forwarding?
> > I'd say that if dp->bridge_dev is NULL in the xmit function, you certainly
> > want to bypass address learning if you can. Maybe also for link-local traffic.
> 
> Also for trapped traffic (snooping, tc-flower trap action) if the CPU
> sends them back.

This sounds line an interesting use case, please tell me more about what
commands I could run to reinject trapped packets into the hardware data
path.
diff mbox series

Patch

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 92282de54230..b61bef79ce84 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -349,6 +349,20 @@  static inline struct sk_buff *dsa_untag_bridge_pvid(struct sk_buff *skb)
 	return skb;
 }
 
+/* If the ingress port offloads the bridge, we mark the frame as autonomously
+ * forwarded by hardware, so the software bridge doesn't forward in twice, back
+ * to us, because we already did. However, if we're in fallback mode and we do
+ * software bridging, we are not offloading it, therefore the dp->bridge_dev
+ * pointer is not populated, and flooding needs to be done by software (we are
+ * effectively operating in standalone ports mode).
+ */
+static inline void dsa_default_offload_fwd_mark(struct sk_buff *skb)
+{
+	struct dsa_port *dp = dsa_slave_to_port(skb->dev);
+
+	skb->offload_fwd_mark = !!(dp->bridge_dev);
+}
+
 /* switch.c */
 int dsa_switch_register_notifier(struct dsa_switch *ds);
 void dsa_switch_unregister_notifier(struct dsa_switch *ds);
diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index e2577a7dcbca..a8880b3bb106 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -150,7 +150,7 @@  static struct sk_buff *brcm_tag_rcv_ll(struct sk_buff *skb,
 	/* Remove Broadcom tag and update checksum */
 	skb_pull_rcsum(skb, BRCM_TAG_LEN);
 
-	skb->offload_fwd_mark = 1;
+	dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index 7e7b7decdf39..09ab9c25e686 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -162,8 +162,8 @@  static struct sk_buff *dsa_xmit_ll(struct sk_buff *skb, struct net_device *dev,
 static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
 				  u8 extra)
 {
+	bool trap = false, trunk = false;
 	int source_device, source_port;
-	bool trunk = false;
 	enum dsa_code code;
 	enum dsa_cmd cmd;
 	u8 *dsa_header;
@@ -174,8 +174,6 @@  static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
 	cmd = dsa_header[0] >> 6;
 	switch (cmd) {
 	case DSA_CMD_FORWARD:
-		skb->offload_fwd_mark = 1;
-
 		trunk = !!(dsa_header[1] & 7);
 		break;
 
@@ -194,7 +192,6 @@  static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
 			 * device (like a bridge) that forwarding has
 			 * already been done by hardware.
 			 */
-			skb->offload_fwd_mark = 1;
 			break;
 		case DSA_CODE_MGMT_TRAP:
 		case DSA_CODE_IGMP_MLD_TRAP:
@@ -202,6 +199,7 @@  static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
 			/* Traps have, by definition, not been
 			 * forwarded by hardware, so don't mark them.
 			 */
+			trap = true;
 			break;
 		default:
 			/* Reserved code, this could be anything. Drop
@@ -235,6 +233,15 @@  static struct sk_buff *dsa_rcv_ll(struct sk_buff *skb, struct net_device *dev,
 	if (!skb->dev)
 		return NULL;
 
+	/* When using LAG offload, skb->dev is not a DSA slave interface,
+	 * so we cannot call dsa_default_offload_fwd_mark and we need to
+	 * special-case it.
+	 */
+	if (trunk)
+		skb->offload_fwd_mark = true;
+	else if (!trap)
+		dsa_default_offload_fwd_mark(skb);
+
 	/* If the 'tagged' bit is set; convert the DSA tag to a 802.1Q
 	 * tag, and delete the ethertype (extra) if applicable. If the
 	 * 'tagged' bit is cleared; delete the DSA tag, and ethertype
diff --git a/net/dsa/tag_hellcreek.c b/net/dsa/tag_hellcreek.c
index a09805c8e1ab..c1ee6eefafe4 100644
--- a/net/dsa/tag_hellcreek.c
+++ b/net/dsa/tag_hellcreek.c
@@ -44,7 +44,7 @@  static struct sk_buff *hellcreek_rcv(struct sk_buff *skb,
 
 	pskb_trim_rcsum(skb, skb->len - HELLCREEK_TAG_LEN);
 
-	skb->offload_fwd_mark = true;
+	dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
diff --git a/net/dsa/tag_ksz.c b/net/dsa/tag_ksz.c
index 4820dbcedfa2..8eee63a5b93b 100644
--- a/net/dsa/tag_ksz.c
+++ b/net/dsa/tag_ksz.c
@@ -24,7 +24,7 @@  static struct sk_buff *ksz_common_rcv(struct sk_buff *skb,
 
 	pskb_trim_rcsum(skb, skb->len - len);
 
-	skb->offload_fwd_mark = true;
+	dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index aa1318dccaf0..3a5494d2f7b1 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -115,7 +115,8 @@  static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev,
 	skb_pull_rcsum(skb, 2 + 2);
 	memmove(skb->data - ETH_HLEN, skb->data - (ETH_HLEN + LAN9303_TAG_LEN),
 		2 * ETH_ALEN);
-	skb->offload_fwd_mark = !(lan9303_tag1 & LAN9303_TAG_RX_TRAPPED_TO_CPU);
+	if (!(lan9303_tag1 & LAN9303_TAG_RX_TRAPPED_TO_CPU))
+		dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
diff --git a/net/dsa/tag_mtk.c b/net/dsa/tag_mtk.c
index f9b2966d1936..92ab21d2ceca 100644
--- a/net/dsa/tag_mtk.c
+++ b/net/dsa/tag_mtk.c
@@ -92,7 +92,7 @@  static struct sk_buff *mtk_tag_rcv(struct sk_buff *skb, struct net_device *dev,
 	if (!skb->dev)
 		return NULL;
 
-	skb->offload_fwd_mark = 1;
+	dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
index f9df9cac81c5..1deba3f1bb82 100644
--- a/net/dsa/tag_ocelot.c
+++ b/net/dsa/tag_ocelot.c
@@ -123,7 +123,7 @@  static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
 		 */
 		return NULL;
 
-	skb->offload_fwd_mark = 1;
+	dsa_default_offload_fwd_mark(skb);
 	skb->priority = qos_class;
 
 	/* Ocelot switches copy frames unmodified to the CPU. However, it is
diff --git a/net/dsa/tag_ocelot_8021q.c b/net/dsa/tag_ocelot_8021q.c
index 5f3e8e124a82..447e1eeb357c 100644
--- a/net/dsa/tag_ocelot_8021q.c
+++ b/net/dsa/tag_ocelot_8021q.c
@@ -81,7 +81,7 @@  static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
 	if (!skb->dev)
 		return NULL;
 
-	skb->offload_fwd_mark = 1;
+	dsa_default_offload_fwd_mark(skb);
 	skb->priority = qos_class;
 
 	return skb;
diff --git a/net/dsa/tag_rtl4_a.c b/net/dsa/tag_rtl4_a.c
index e9176475bac8..1864e3a74df8 100644
--- a/net/dsa/tag_rtl4_a.c
+++ b/net/dsa/tag_rtl4_a.c
@@ -114,7 +114,7 @@  static struct sk_buff *rtl4a_tag_rcv(struct sk_buff *skb,
 		skb->data - ETH_HLEN - RTL4_A_HDR_LEN,
 		2 * ETH_ALEN);
 
-	skb->offload_fwd_mark = 1;
+	dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }
diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
index 50496013cdb7..45cdf64f0e88 100644
--- a/net/dsa/tag_sja1105.c
+++ b/net/dsa/tag_sja1105.c
@@ -295,8 +295,6 @@  static struct sk_buff *sja1105_rcv(struct sk_buff *skb,
 	is_link_local = sja1105_is_link_local(skb);
 	is_meta = sja1105_is_meta_frame(skb);
 
-	skb->offload_fwd_mark = 1;
-
 	if (is_tagged) {
 		/* Normal traffic path. */
 		skb_push_rcsum(skb, ETH_HLEN);
@@ -339,6 +337,8 @@  static struct sk_buff *sja1105_rcv(struct sk_buff *skb,
 		return NULL;
 	}
 
+	dsa_default_offload_fwd_mark(skb);
+
 	if (subvlan)
 		sja1105_decode_subvlan(skb, subvlan);
 
diff --git a/net/dsa/tag_xrs700x.c b/net/dsa/tag_xrs700x.c
index 858cdf9d2913..1208549f45c1 100644
--- a/net/dsa/tag_xrs700x.c
+++ b/net/dsa/tag_xrs700x.c
@@ -46,7 +46,7 @@  static struct sk_buff *xrs700x_rcv(struct sk_buff *skb, struct net_device *dev,
 		return NULL;
 
 	/* Frame is forwarded by hardware, don't forward in software. */
-	skb->offload_fwd_mark = 1;
+	dsa_default_offload_fwd_mark(skb);
 
 	return skb;
 }