[RFC] net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.

Message ID 20220505225904.342388-1-andrew@lunn.ch (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/subject_prefix           warning  Target tree name not specified in the subject
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers           fail     1 blamed authors not CCed: davem@davemloft.net; 5 maintainers not CCed: roopa@nvidia.com edumazet@google.com pabeni@redhat.com kuba@kernel.org davem@davemloft.net
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 13 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/tree_selection           success  Guessing tree name failed - patch did not apply

Commit Message

Andrew Lunn May 5, 2022, 10:59 p.m. UTC
It is possible to stack bridges on top of each other. Consider the
following which makes use of an Ethernet switch:

       br1
     /    \
    /      \
   /        \
 br0.11    wlan0
   |
   br0
 /  |  \
p1  p2  p3

br0 is offloaded to the switch. Above br0 is a VLAN interface, for
VLAN 11. This VLAN interface is then a slave of br1. br1 also has a
wireless interface as a slave. This setup trunks wireless LAN traffic
over the copper network inside a VLAN.

A frame received on p1 which is passed up to the bridge has the
skb->offload_fwd_mark flag set to true, indicating that the switch
has dealt with forwarding the frame out ports p2 and p3 as
needed. This flag tells the software bridge that it does not need to
pass the frame back down again. However, the flag is not reset
when the frame is passed upwards. As a result br1 sees the flag,
wrongly interprets it, and fails to forward the frame to wlan0.

When passing a frame upwards, clear the flag.

RFC because I don't know the bridge code well enough to tell whether
this is the correct place to do this, or whether there are any side
effects (could the skb be a clone, etc.).

Fixes: f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward mark")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 net/bridge/br_input.c | 7 +++++++
 1 file changed, 7 insertions(+)
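The failure described in the commit message can be modelled in user space. The following is a minimal sketch, not kernel code: `toy_skb`, `toy_pass_frame_up()`, `toy_bridge_should_forward()` and `toy_frame_reaches_wlan0()` are hypothetical names invented for illustration, and the egress decision is reduced to a single flag test.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct sk_buff; NOT the real definition. */
struct toy_skb {
	bool offload_fwd_mark;	/* set when the switch already forwarded the frame */
};

/* Simplified egress decision of a software bridge: skip the software
 * forward when the frame claims the hardware already did the work.
 */
bool toy_bridge_should_forward(const struct toy_skb *skb)
{
	assert(skb);
	return !skb->offload_fwd_mark;
}

/* br0 handing the frame to the upper layers, with or without the
 * proposed clearing of the mark.
 */
void toy_pass_frame_up(struct toy_skb *skb, bool clear_mark)
{
	assert(skb);
	if (clear_mark)
		skb->offload_fwd_mark = false;
}

/* Returns true if br1 would forward to wlan0 a frame that came in on p1. */
bool toy_frame_reaches_wlan0(bool clear_mark)
{
	struct toy_skb skb = { .offload_fwd_mark = true };

	toy_pass_frame_up(&skb, clear_mark);	/* br0 -> br0.11 -> br1 */
	return toy_bridge_should_forward(&skb);
}
```

Calling `toy_frame_reaches_wlan0(false)` models the regression (the stale mark suppresses the forward in br1), while `toy_frame_reaches_wlan0(true)` models the behaviour with the patch applied.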

Comments

Stephen Hemminger May 5, 2022, 11:07 p.m. UTC | #1
On Fri,  6 May 2022 00:59:04 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> It is possible to stack bridges on top of each other. Consider the
> following which makes use of an Ethernet switch:
> 
>        br1
>      /    \
>     /      \
>    /        \
>  br0.11    wlan0
>    |
>    br0
>  /  |  \
> p1  p2  p3
> 
> br0 is offloaded to the switch. Above br0 is a vlan interface, for
> vlan 11. This vlan interface is then a slave of br1. br1 also has
> wireless interface as a slave. This setup trunks wireless lan traffic
> over the copper network inside a VLAN.
> 
> A frame received on p1 which is passed up to the bridge has the
> skb->offload_fwd_mark flag set to true, indicating it that the switch
> has dealt with forwarding the frame out ports p2 and p3 as
> needed. This flag instructs the software bridge it does not need to
> pass the frame back down again. However, the flag is not getting reset
> when the frame is passed upwards. As a result br1 sees the flag,
> wrongly interprets it, and fails to forward the frame to wlan0.
> 
> When passing a frame upwards, clear the flag.
> 
> RFC because i don't know the bridge code well enough if this is the
> correct place to do this, and if there are any side effects, could the
> skb be a clone, etc.
> 
> Fixes: f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward mark")
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>

Bridging of bridges is not supposed to be allowed.
See:

bridge:br_if.c

	/* No bridging of bridges */
	if (dev->netdev_ops->ndo_start_xmit == br_dev_xmit) {
		NL_SET_ERR_MSG(extack,
			       "Can not enslave a bridge to a bridge");
		return -ELOOP;
	}
Andrew Lunn May 6, 2022, 1:18 a.m. UTC | #2
On Thu, May 05, 2022 at 04:07:20PM -0700, Stephen Hemminger wrote:
> On Fri,  6 May 2022 00:59:04 +0200
> Andrew Lunn <andrew@lunn.ch> wrote:
> 
> > It is possible to stack bridges on top of each other. Consider the
> > following which makes use of an Ethernet switch:
> > 
> >        br1
> >      /    \
> >     /      \
> >    /        \
> >  br0.11    wlan0
> >    |
> >    br0
> >  /  |  \
> > p1  p2  p3
> > 
> > br0 is offloaded to the switch. Above br0 is a vlan interface, for
> > vlan 11. This vlan interface is then a slave of br1. br1 also has
> > wireless interface as a slave. This setup trunks wireless lan traffic
> > over the copper network inside a VLAN.
> > 
> > A frame received on p1 which is passed up to the bridge has the
> > skb->offload_fwd_mark flag set to true, indicating it that the switch
> > has dealt with forwarding the frame out ports p2 and p3 as
> > needed. This flag instructs the software bridge it does not need to
> > pass the frame back down again. However, the flag is not getting reset
> > when the frame is passed upwards. As a result br1 sees the flag,
> > wrongly interprets it, and fails to forward the frame to wlan0.
> > 
> > When passing a frame upwards, clear the flag.
> > 
> > RFC because i don't know the bridge code well enough if this is the
> > correct place to do this, and if there are any side effects, could the
> > skb be a clone, etc.
> > 
> > Fixes: f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward mark")
> > Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> 
> Bridging of bridges is not supposed to be allowed.
> See:
> 
> bridge:br_if.c
> 
> 	/* No bridging of bridges */
> 	if (dev->netdev_ops->ndo_start_xmit == br_dev_xmit) {
> 		NL_SET_ERR_MSG(extack,
> 			       "Can not enslave a bridge to a bridge");
> 		return -ELOOP;
> 	}

This is not direct bridging of bridges. There is a vlan interface in
the middle. And even if it is not supposed to work, it does work, it
is being used, and it regressed. This fixes the regression.

   Andrew
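The distinction between direct and indirect stacking can be sketched as follows. This is a toy model under stated assumptions: the quoted br_if.c check compares `ndo_start_xmit` against `br_dev_xmit`, whereas the sketch uses a hypothetical `kind` field, and `toy_netdev`, `toy_can_enslave()` and `toy_has_bridge_in_stack()` are invented names. The stack-walking helper is purely illustrative and is not something the thread says the kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_dev_kind { TOY_DEV_PHYS, TOY_DEV_BRIDGE, TOY_DEV_VLAN };

/* Toy stand-in for struct net_device; NOT the real definition. */
struct toy_netdev {
	enum toy_dev_kind kind;
	const struct toy_netdev *lower;	/* device this one sits on, or NULL */
};

/* Like the quoted check, only the candidate port itself is inspected. */
bool toy_can_enslave(const struct toy_netdev *dev)
{
	assert(dev);
	return dev->kind != TOY_DEV_BRIDGE;
}

/* Hypothetical stricter test: is there a bridge anywhere in the stack? */
bool toy_has_bridge_in_stack(const struct toy_netdev *dev)
{
	for (; dev; dev = dev->lower)
		if (dev->kind == TOY_DEV_BRIDGE)
			return true;
	return false;
}
```

With `br0` modelled as a `TOY_DEV_BRIDGE` and `br0.11` as a `TOY_DEV_VLAN` whose `lower` is `br0`, `toy_can_enslave()` rejects `br0` but accepts `br0.11`, even though `toy_has_bridge_in_stack()` reports a bridge underneath it.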
Vladimir Oltean May 6, 2022, 2:36 p.m. UTC | #3
Hi Andrew,

On Fri, May 06, 2022 at 12:59:04AM +0200, Andrew Lunn wrote:
> It is possible to stack bridges on top of each other. Consider the
> following which makes use of an Ethernet switch:
> 
>        br1
>      /    \
>     /      \
>    /        \
>  br0.11    wlan0
>    |
>    br0
>  /  |  \
> p1  p2  p3
> 
> br0 is offloaded to the switch. Above br0 is a vlan interface, for
> vlan 11. This vlan interface is then a slave of br1. br1 also has
> wireless interface as a slave. This setup trunks wireless lan traffic
> over the copper network inside a VLAN.
> 
> A frame received on p1 which is passed up to the bridge has the
> skb->offload_fwd_mark flag set to true, indicating it that the switch
> has dealt with forwarding the frame out ports p2 and p3 as
> needed. This flag instructs the software bridge it does not need to
> pass the frame back down again. However, the flag is not getting reset
> when the frame is passed upwards. As a result br1 sees the flag,
> wrongly interprets it, and fails to forward the frame to wlan0.
> 
> When passing a frame upwards, clear the flag.
> 
> RFC because i don't know the bridge code well enough if this is the
> correct place to do this, and if there are any side effects, could the
> skb be a clone, etc.

Each skb has its own offload_fwd_mark, so clearing it for this skb does
not affect a clone. And when a packet is simultaneously forwarded and
locally received, the order is first forward/flood it, then receive it.
Cloning takes place during forwarding using deliver_clone(), so it
shouldn't be the case that you are clearing the offload_fwd_mark for a
yet-to-be-forwarded packet, either. So I think we're good there.
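The ordering argument above can be sketched in user space. This is a toy model only: `toy_skb`, `toy_clone()` and `toy_forward_then_receive()` are hypothetical names, and the clone is modelled as a plain struct copy rather than anything resembling the real skb cloning code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct sk_buff; NOT the real definition. */
struct toy_skb {
	bool offload_fwd_mark;
};

/* Each copy carries its own flag, so later changes do not propagate. */
struct toy_skb toy_clone(const struct toy_skb *skb)
{
	assert(skb);
	return *skb;
}

/* Forward/flood first (on a clone), then receive locally and clear
 * the mark on the original.
 */
void toy_forward_then_receive(struct toy_skb *skb, struct toy_skb *forwarded)
{
	assert(skb && forwarded);
	*forwarded = toy_clone(skb);	/* forwarding path gets its copy first */
	skb->offload_fwd_mark = false;	/* local receive path clears the mark */
}
```

After `toy_forward_then_receive()`, the forwarded copy still carries the mark it had at forwarding time, and only the locally received frame has it cleared.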

> 
> Fixes: f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward mark")
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
>  net/bridge/br_input.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 196417859c4a..9327a5fad1df 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -39,6 +39,13 @@ static int br_pass_frame_up(struct sk_buff *skb)
>  	dev_sw_netstats_rx_add(brdev, skb->len);
>  
>  	vg = br_vlan_group_rcu(br);
> +
> +	/* Reset the offload_fwd_mark because there could be a stacked
> +	 * bridge above, and it should not think this bridge it doing
> +	 * that bridges work forward out its ports.

"this bridge is doing that bridge's work forwarding out its ports"

> +	 */
> +	br_switchdev_frame_unmark(skb);
> +
>  	/* Bridge is just like any other port.  Make sure the
>  	 * packet is allowed except in promisc mode when someone
>  	 * may be running packet capture.
> -- 
> 2.36.0
>

The good thing with this patch is that it avoids conditionals.
The bad thing is that it prevents true offloading of this configuration
from being possible (when "wlan0" is "p4").

I don't know what hardware is capable of doing this, but I think it
would be prudent not to exclude it, either.

Some safer alternatives to this patch are based on the idea that we
could ignore skb->offload_fwd_mark coming from an unoffloaded bridge
port (i.e. treat this condition at br1, not at br0). We could:
- clear skb->offload_fwd_mark in br_handle_frame_finish(), if p->hwdom is 0
- change nbp_switchdev_allowed_egress() to return true if cb->src_hwdom == 0
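
The two alternatives listed above can be compared in a small user-space sketch. Everything here is an assumption made for illustration: `toy_skb`, `toy_port`, `toy_ingress()`, `toy_allowed_egress()` and `toy_allowed_egress_opt2()` are hypothetical names, the baseline egress rule is a simplification and not a copy of nbp_switchdev_allowed_egress(), and a hardware domain of 0 is taken to mean "not offloaded".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; NOT the real kernel structures. */
struct toy_skb {
	bool offload_fwd_mark;
	int src_hwdom;		/* hardware domain of the ingress port, 0 = none */
};

struct toy_port {
	int hwdom;		/* 0 means the port is not offloaded */
};

/* Option 1: on ingress, drop a mark that arrives via an unoffloaded port. */
void toy_ingress(const struct toy_port *p, struct toy_skb *skb)
{
	assert(p && skb);
	skb->src_hwdom = p->hwdom;
	if (p->hwdom == 0)
		skb->offload_fwd_mark = false;
}

/* Simplified baseline: software forwards unless the frame is marked and
 * the egress port is in the same hardware domain as the ingress port.
 */
bool toy_allowed_egress(const struct toy_port *p, const struct toy_skb *skb)
{
	assert(p && skb);
	return !skb->offload_fwd_mark || skb->src_hwdom != p->hwdom;
}

/* Option 2: a frame from an unoffloaded port is always eligible. */
bool toy_allowed_egress_opt2(const struct toy_port *p, const struct toy_skb *skb)
{
	assert(p && skb);
	if (skb->src_hwdom == 0)
		return true;
	return toy_allowed_egress(p, skb);
}
```

In this model, either option lets a marked frame arriving via an unoffloaded br0.11 reach wlan0, while a frame arriving on a port with a nonzero hardware domain keeps its mark, which leaves room for the "wlan0 is p4" offload case.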
Stephen Hemminger May 6, 2022, 3:05 p.m. UTC | #4
On Fri, 6 May 2022 03:18:22 +0200
Andrew Lunn <andrew@lunn.ch> wrote:

> On Thu, May 05, 2022 at 04:07:20PM -0700, Stephen Hemminger wrote:
> > On Fri,  6 May 2022 00:59:04 +0200
> > Andrew Lunn <andrew@lunn.ch> wrote:
> >   
> > > It is possible to stack bridges on top of each other. Consider the
> > > following which makes use of an Ethernet switch:
> > > 
> > >        br1
> > >      /    \
> > >     /      \
> > >    /        \
> > >  br0.11    wlan0
> > >    |
> > >    br0
> > >  /  |  \
> > > p1  p2  p3
> > > 
> > > br0 is offloaded to the switch. Above br0 is a vlan interface, for
> > > vlan 11. This vlan interface is then a slave of br1. br1 also has
> > > wireless interface as a slave. This setup trunks wireless lan traffic
> > > over the copper network inside a VLAN.
> > > 
> > > A frame received on p1 which is passed up to the bridge has the
> > > skb->offload_fwd_mark flag set to true, indicating it that the switch
> > > has dealt with forwarding the frame out ports p2 and p3 as
> > > needed. This flag instructs the software bridge it does not need to
> > > pass the frame back down again. However, the flag is not getting reset
> > > when the frame is passed upwards. As a result br1 sees the flag,
> > > wrongly interprets it, and fails to forward the frame to wlan0.
> > > 
> > > When passing a frame upwards, clear the flag.
> > > 
> > > RFC because i don't know the bridge code well enough if this is the
> > > correct place to do this, and if there are any side effects, could the
> > > skb be a clone, etc.
> > > 
> > > Fixes: f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward mark")
> > > Signed-off-by: Andrew Lunn <andrew@lunn.ch>  
> > 
> > Bridging of bridges is not supposed to be allowed.
> > See:
> > 
> > bridge:br_if.c
> > 
> > 	/* No bridging of bridges */
> > 	if (dev->netdev_ops->ndo_start_xmit == br_dev_xmit) {
> > 		NL_SET_ERR_MSG(extack,
> > 			       "Can not enslave a bridge to a bridge");
> > 		return -ELOOP;
> > 	}  
> 
> This is not direct bridging of bridges. There is a vlan interface in
> the middle. And even if it is not supposed to work, it does work, it
> is being used, and it regressed. This fixes the regression.
> 
>    Andrew

The problem is that doing this kind of nested bridging screws up
Spanning Tree.
Andrew Lunn May 6, 2022, 4:58 p.m. UTC | #5
> Some safer alternatives to this patch are based on the idea that we
> could ignore skb->offload_fwd_mark coming from an unoffloaded bridge
> port (i.e. treat this condition at br1, not at br0). We could:
> - clear skb->offload_fwd_mark in br_handle_frame_finish(), if p->hwdom is 0
> - change nbp_switchdev_allowed_egress() to return true if cb->src_hwdom == 0

OK, I will try out these solutions.

Thanks
     Andrew
Ido Schimmel May 8, 2022, 7:52 a.m. UTC | #6
On Fri, May 06, 2022 at 02:36:45PM +0000, Vladimir Oltean wrote:
> Hi Andrew,
> 
> On Fri, May 06, 2022 at 12:59:04AM +0200, Andrew Lunn wrote:
> > It is possible to stack bridges on top of each other. Consider the
> > following which makes use of an Ethernet switch:
> > 
> >        br1
> >      /    \
> >     /      \
> >    /        \
> >  br0.11    wlan0
> >    |
> >    br0
> >  /  |  \
> > p1  p2  p3
> > 
> > br0 is offloaded to the switch. Above br0 is a vlan interface, for
> > vlan 11. This vlan interface is then a slave of br1. br1 also has
> > wireless interface as a slave. This setup trunks wireless lan traffic
> > over the copper network inside a VLAN.
> > 
> > A frame received on p1 which is passed up to the bridge has the
> > skb->offload_fwd_mark flag set to true, indicating it that the switch
> > has dealt with forwarding the frame out ports p2 and p3 as
> > needed. This flag instructs the software bridge it does not need to
> > pass the frame back down again. However, the flag is not getting reset
> > when the frame is passed upwards. As a result br1 sees the flag,
> > wrongly interprets it, and fails to forward the frame to wlan0.
> > 
> > When passing a frame upwards, clear the flag.
> > 
> > RFC because i don't know the bridge code well enough if this is the
> > correct place to do this, and if there are any side effects, could the
> > skb be a clone, etc.
> 
> Each skb has its own offload_fwd_mark, so clearing it for this skb does
> not affect a clone. And when a packet is simultaneously forwarded and
> locally received, the order is first forward/flood it, then receive it.
> Cloning takes place during forwarding using deliver_clone(), so it
> shouldn't be the case that you are clearing the offload_fwd_mark for a
> yet-to-be-forwarded packet, either. So I think we're good there.
> 
> > 
> > Fixes: f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward mark")
> > Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> > ---
> >  net/bridge/br_input.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> > index 196417859c4a..9327a5fad1df 100644
> > --- a/net/bridge/br_input.c
> > +++ b/net/bridge/br_input.c
> > @@ -39,6 +39,13 @@ static int br_pass_frame_up(struct sk_buff *skb)
> >  	dev_sw_netstats_rx_add(brdev, skb->len);
> >  
> >  	vg = br_vlan_group_rcu(br);
> > +
> > +	/* Reset the offload_fwd_mark because there could be a stacked
> > +	 * bridge above, and it should not think this bridge it doing
> > +	 * that bridges work forward out its ports.
> 
> "this bridge is doing that bridge's work forwarding out its ports"
> 
> > +	 */
> > +	br_switchdev_frame_unmark(skb);
> > +
> >  	/* Bridge is just like any other port.  Make sure the
> >  	 * packet is allowed except in promisc mode when someone
> >  	 * may be running packet capture.
> > -- 
> > 2.36.0
> >
> 
> The good thing with this patch is that it avoids conditionals.
> The bad thing is that it prevents true offloading of this configuration
> from being possible (when "wlan0" is "p4").
> 
> I don't know what hardware is capable of doing this, but I think it's
> cautious to not exclude it, either.
> 
> Some safer alternatives to this patch are based on the idea that we
> could ignore skb->offload_fwd_mark coming from an unoffloaded bridge
> port (i.e. treat this condition at br1, not at br0). We could:
> - clear skb->offload_fwd_mark in br_handle_frame_finish(), if p->hwdom is 0
> - change nbp_switchdev_allowed_egress() to return true if cb->src_hwdom == 0

I like Andrew's patch because it is the Rx equivalent of
br_switchdev_frame_unmark() in br_dev_xmit(). However, if we go with the
second option, it should allow us to remove the clearing of the mark in
the Tx path as the control block is cleared in the Tx path since commit
fd65e5a95d08 ("net: bridge: clear bridge's private skb space on xmit").

I don't know how far back Nik's patch was backported and I don't know
how far back Andrew's patch will be backported, so it might be best to
submit Andrew's patch to net as-is and then in net-next change
nbp_switchdev_allowed_egress() and remove br_switchdev_frame_unmark()
from both the Rx and Tx paths.

Anyway, I have applied this patch to our tree for testing. Will report
tomorrow in case there are any regressions.
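
The point about the Tx path can be sketched the same way. This is a toy model under stated assumptions: `toy_cb`, `toy_skb`, `toy_dev_xmit()` and `toy_allowed_egress_opt2()` are hypothetical names, the control block is reduced to one field, and the egress rule is the simplified form of the second option rather than the real nbp_switchdev_allowed_egress().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-ins; NOT the real kernel structures. */
struct toy_cb {
	int src_hwdom;		/* 0 = frame did not come from an offloaded port */
};

struct toy_skb {
	bool offload_fwd_mark;
	struct toy_cb cb;
};

/* Models the Tx path zeroing the bridge's private skb space. */
void toy_dev_xmit(struct toy_skb *skb)
{
	assert(skb);
	memset(&skb->cb, 0, sizeof(skb->cb));
}

/* Second option: src_hwdom == 0 always permits software egress. */
bool toy_allowed_egress_opt2(int port_hwdom, const struct toy_skb *skb)
{
	assert(skb);
	if (skb->cb.src_hwdom == 0)
		return true;
	return !skb->offload_fwd_mark || skb->cb.src_hwdom != port_hwdom;
}
```

Once `toy_dev_xmit()` has zeroed the control block, the egress test passes regardless of `offload_fwd_mark`, which is why clearing the mark explicitly in the Tx path would become redundant under that option.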
Andrew Lunn May 12, 2022, 8:38 p.m. UTC | #7
> I like Andrew's patch because it is the Rx equivalent of
> br_switchdev_frame_unmark() in br_dev_xmit(). However, if we go with the
> second option, it should allow us to remove the clearing of the mark in
> the Tx path as the control block is cleared in the Tx path since commit
> fd65e5a95d08 ("net: bridge: clear bridge's private skb space on xmit").
> 
> I don't know how far back Nik's patch was backported and I don't know
> how far back Andrew's patch will be backported, so it might be best to
> submit Andrew's patch to net as-is and then in net-next change
> nbp_switchdev_allowed_egress() and remove br_switchdev_frame_unmark()
> from both the Rx and Tx paths.
> 
> Anyway, I have applied this patch to our tree for testing. Will report
> tomorrow in case there are any regressions.

Hi Ido

Did your testing find any issues?

Thanks
	Andrew
Ido Schimmel May 13, 2022, 12:47 p.m. UTC | #8
On Thu, May 12, 2022 at 10:38:03PM +0200, Andrew Lunn wrote:
> > I like Andrew's patch because it is the Rx equivalent of
> > br_switchdev_frame_unmark() in br_dev_xmit(). However, if we go with the
> > second option, it should allow us to remove the clearing of the mark in
> > the Tx path as the control block is cleared in the Tx path since commit
> > fd65e5a95d08 ("net: bridge: clear bridge's private skb space on xmit").
> > 
> > I don't know how far back Nik's patch was backported and I don't know
> > how far back Andrew's patch will be backported, so it might be best to
> > submit Andrew's patch to net as-is and then in net-next change
> > nbp_switchdev_allowed_egress() and remove br_switchdev_frame_unmark()
> > from both the Rx and Tx paths.
> > 
> > Anyway, I have applied this patch to our tree for testing. Will report
> > tomorrow in case there are any regressions.
> 
> Hi Ido
> 
> Did your testing find any issues?

No, patch is fine. Thanks!
Patch

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 196417859c4a..9327a5fad1df 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -39,6 +39,13 @@ static int br_pass_frame_up(struct sk_buff *skb)
 	dev_sw_netstats_rx_add(brdev, skb->len);
 
 	vg = br_vlan_group_rcu(br);
+
+	/* Reset the offload_fwd_mark because there could be a stacked
+	 * bridge above, and it should not think this bridge is doing
+	 * that bridge's work forwarding out its ports.
+	 */
+	br_switchdev_frame_unmark(skb);
+
 	/* Bridge is just like any other port.  Make sure the
 	 * packet is allowed except in promisc mode when someone
 	 * may be running packet capture.