diff mbox series

[RFC,1/2] net: bridge: add knob for filtering rx/tx BPDU packets on a port

Message ID 20220210142401.4912-1-nbd@nbd.name (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series [RFC,1/2] net: bridge: add knob for filtering rx/tx BPDU packets on a port | expand

Checks

Context Check Description
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4868 this patch: 4868
netdev/cc_maintainers fail 5 maintainers not CCed: roopa@nvidia.com kuba@kernel.org nikolay@nvidia.com davem@davemloft.net bridge@lists.linux-foundation.org
netdev/build_clang success Errors and warnings before: 826 this patch: 826
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 5025 this patch: 5025
netdev/checkpatch warning WARNING: line length of 84 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Felix Fietkau Feb. 10, 2022, 2:24 p.m. UTC
Some devices (e.g. wireless APs) can't have devices behind them be part of
a bridge topology with redundant links, due to address limitations.
Additionally, broadcast traffic on these devices is somewhat expensive, due to
the low data rate and wakeups of clients in powersave mode.
This knob can be used to ensure that BPDU packets are never sent or forwarded
to/from these devices

Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
 include/linux/if_bridge.h    | 1 +
 include/uapi/linux/if_link.h | 1 +
 net/bridge/br_forward.c      | 5 +++++
 net/bridge/br_input.c        | 2 ++
 net/bridge/br_netlink.c      | 6 +++++-
 net/bridge/br_stp_bpdu.c     | 9 +++++++--
 net/core/rtnetlink.c         | 4 +++-
 7 files changed, 24 insertions(+), 4 deletions(-)

Comments

Nikolay Aleksandrov Feb. 10, 2022, 2:55 p.m. UTC | #1
On 10/02/2022 16:24, Felix Fietkau wrote:
> Some devices (e.g. wireless APs) can't have devices behind them be part of
> a bridge topology with redundant links, due to address limitations.
> Additionally, broadcast traffic on these devices is somewhat expensive, due to
> the low data rate and wakeups of clients in powersave mode.
> This knob can be used to ensure that BPDU packets are never sent or forwarded
> to/from these devices
> 
> Signed-off-by: Felix Fietkau <nbd@nbd.name>
> ---
>  include/linux/if_bridge.h    | 1 +
>  include/uapi/linux/if_link.h | 1 +
>  net/bridge/br_forward.c      | 5 +++++
>  net/bridge/br_input.c        | 2 ++
>  net/bridge/br_netlink.c      | 6 +++++-
>  net/bridge/br_stp_bpdu.c     | 9 +++++++--
>  net/core/rtnetlink.c         | 4 +++-
>  7 files changed, 24 insertions(+), 4 deletions(-)
> 

Why can't netfilter or tc be used to filter these frames?
Felix Fietkau Feb. 10, 2022, 4:06 p.m. UTC | #2
On 10.02.22 15:55, Nikolay Aleksandrov wrote:
> On 10/02/2022 16:24, Felix Fietkau wrote:
>> Some devices (e.g. wireless APs) can't have devices behind them be part of
>> a bridge topology with redundant links, due to address limitations.
>> Additionally, broadcast traffic on these devices is somewhat expensive, due to
>> the low data rate and wakeups of clients in powersave mode.
>> This knob can be used to ensure that BPDU packets are never sent or forwarded
>> to/from these devices
>> 
>> Signed-off-by: Felix Fietkau <nbd@nbd.name>
>> ---
>>  include/linux/if_bridge.h    | 1 +
>>  include/uapi/linux/if_link.h | 1 +
>>  net/bridge/br_forward.c      | 5 +++++
>>  net/bridge/br_input.c        | 2 ++
>>  net/bridge/br_netlink.c      | 6 +++++-
>>  net/bridge/br_stp_bpdu.c     | 9 +++++++--
>>  net/core/rtnetlink.c         | 4 +++-
>>  7 files changed, 24 insertions(+), 4 deletions(-)
>> 
> 
> Why can't netfilter or tc be used to filter these frames?
netfilter is slow as hell, and even adding a tc rule that has to look at 
all frames to check for useless BPDU packets costs a lot more CPU cycles 
than simply suppressing them at the source.

- Felix
Nikolay Aleksandrov Feb. 11, 2022, 8:16 a.m. UTC | #3
(resending, for some reason my first email didn't make it to the mailing list)

On 10/02/2022 18:06, Felix Fietkau wrote:
> 
> On 10.02.22 15:55, Nikolay Aleksandrov wrote:
>> On 10/02/2022 16:24, Felix Fietkau wrote:
>>> Some devices (e.g. wireless APs) can't have devices behind them be part of
>>> a bridge topology with redundant links, due to address limitations.
>>> Additionally, broadcast traffic on these devices is somewhat expensive, due to
>>> the low data rate and wakeups of clients in powersave mode.
>>> This knob can be used to ensure that BPDU packets are never sent or forwarded
>>> to/from these devices
>>>
>>> Signed-off-by: Felix Fietkau <nbd@nbd.name>
>>> ---
>>>  include/linux/if_bridge.h    | 1 +
>>>  include/uapi/linux/if_link.h | 1 +
>>>  net/bridge/br_forward.c      | 5 +++++
>>>  net/bridge/br_input.c        | 2 ++
>>>  net/bridge/br_netlink.c      | 6 +++++-
>>>  net/bridge/br_stp_bpdu.c     | 9 +++++++--
>>>  net/core/rtnetlink.c         | 4 +++-
>>>  7 files changed, 24 insertions(+), 4 deletions(-)
>>>
>>
>> Why can't netfilter or tc be used to filter these frames?
> netfilter is slow as hell, and even adding a tc rule that has to look at all frames to check for useless BPDU packets costs a lot more CPU cycles than simply suppressing them at the source.
> 
> - Felix

You can use XDP, should be much faster. I don't want new tests in the fast path
for something that is already solved and doesn't need anything bridge-specific.
 
Tomorrow someone will try the same with some other packet type,
sorry but absolutely unacceptable.

Cheers,
 Nik
Stephen Hemminger Feb. 11, 2022, 5:01 p.m. UTC | #4
On Thu, 10 Feb 2022 16:55:40 +0200
Nikolay Aleksandrov <nikolay@nvidia.com> wrote:

> On 10/02/2022 16:24, Felix Fietkau wrote:
> > Some devices (e.g. wireless APs) can't have devices behind them be part of
> > a bridge topology with redundant links, due to address limitations.
> > Additionally, broadcast traffic on these devices is somewhat expensive, due to
> > the low data rate and wakeups of clients in powersave mode.
> > This knob can be used to ensure that BPDU packets are never sent or forwarded
> > to/from these devices
> > 
> > Signed-off-by: Felix Fietkau <nbd@nbd.name>
> > ---
> >  include/linux/if_bridge.h    | 1 +
> >  include/uapi/linux/if_link.h | 1 +
> >  net/bridge/br_forward.c      | 5 +++++
> >  net/bridge/br_input.c        | 2 ++
> >  net/bridge/br_netlink.c      | 6 +++++-
> >  net/bridge/br_stp_bpdu.c     | 9 +++++++--
> >  net/core/rtnetlink.c         | 4 +++-
> >  7 files changed, 24 insertions(+), 4 deletions(-)
> >   
> 
> Why can't netfilter or tc be used to filter these frames?
> 
> 

It could but this looks better.
BPDU filter matches what other hardware switch vendors offer and
has the benefit of not adding another layer of complexity into
user configurations.

Adding one rule into a complex firewall or starting to have to
configure tc is a mess.
diff mbox series

Patch

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 509e18c7e740..18d3b264b754 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -58,6 +58,7 @@  struct br_ip_list {
 #define BR_MRP_LOST_CONT	BIT(18)
 #define BR_MRP_LOST_IN_CONT	BIT(19)
 #define BR_TX_FWD_OFFLOAD	BIT(20)
+#define BR_BPDU_FILTER		BIT(21)
 
 #define BR_DEFAULT_AGEING_TIME	(300 * HZ)
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 6218f93f5c1a..4c847c2d6afa 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -537,6 +537,7 @@  enum {
 	IFLA_BRPORT_MRP_IN_OPEN,
 	IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT,
 	IFLA_BRPORT_MCAST_EHT_HOSTS_CNT,
+	IFLA_BRPORT_BPDU_FILTER,
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index ec646656dbf1..9fe5c888f27d 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -199,6 +199,7 @@  static struct net_bridge_port *maybe_deliver(
 void br_flood(struct net_bridge *br, struct sk_buff *skb,
 	      enum br_pkt_type pkt_type, bool local_rcv, bool local_orig)
 {
+	const unsigned char *dest = eth_hdr(skb)->h_dest;
 	struct net_bridge_port *prev = NULL;
 	struct net_bridge_port *p;
 
@@ -214,6 +215,10 @@  void br_flood(struct net_bridge *br, struct sk_buff *skb,
 		case BR_PKT_MULTICAST:
 			if (!(p->flags & BR_MCAST_FLOOD) && skb->dev != br->dev)
 				continue;
+			if ((p->flags & BR_BPDU_FILTER) &&
+			    unlikely(is_link_local_ether_addr(dest) &&
+				     dest[5] == 0))
+				continue;
 			break;
 		case BR_PKT_BROADCAST:
 			if (!(p->flags & BR_BCAST_FLOOD) && skb->dev != br->dev)
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index b50382f957c1..d8263c4849c1 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -316,6 +316,8 @@  static rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
 		fwd_mask |= p->group_fwd_mask;
 		switch (dest[5]) {
 		case 0x00:	/* Bridge Group Address */
+			if (p->flags & BR_BPDU_FILTER)
+				goto drop;
 			/* If STP is turned off,
 			   then must forward to keep loop detection */
 			if (p->br->stp_enabled == BR_NO_STP ||
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 2ff83d84230d..11215c55adc2 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -184,6 +184,7 @@  static inline size_t br_port_info_size(void)
 		+ nla_total_size(1)	/* IFLA_BRPORT_VLAN_TUNNEL */
 		+ nla_total_size(1)	/* IFLA_BRPORT_NEIGH_SUPPRESS */
 		+ nla_total_size(1)	/* IFLA_BRPORT_ISOLATED */
+		+ nla_total_size(1)	/* IFLA_BRPORT_BPDU_FILTER */
 		+ nla_total_size(sizeof(struct ifla_bridge_id))	/* IFLA_BRPORT_ROOT_ID */
 		+ nla_total_size(sizeof(struct ifla_bridge_id))	/* IFLA_BRPORT_BRIDGE_ID */
 		+ nla_total_size(sizeof(u16))	/* IFLA_BRPORT_DESIGNATED_PORT */
@@ -269,7 +270,8 @@  static int br_port_fill_attrs(struct sk_buff *skb,
 							  BR_MRP_LOST_CONT)) ||
 	    nla_put_u8(skb, IFLA_BRPORT_MRP_IN_OPEN,
 		       !!(p->flags & BR_MRP_LOST_IN_CONT)) ||
-	    nla_put_u8(skb, IFLA_BRPORT_ISOLATED, !!(p->flags & BR_ISOLATED)))
+	    nla_put_u8(skb, IFLA_BRPORT_ISOLATED, !!(p->flags & BR_ISOLATED)) ||
+	    nla_put_u8(skb, IFLA_BRPORT_BPDU_FILTER, !!(p->flags & BR_BPDU_FILTER)))
 		return -EMSGSIZE;
 
 	timerval = br_timer_value(&p->message_age_timer);
@@ -829,6 +831,7 @@  static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
 	[IFLA_BRPORT_ISOLATED]	= { .type = NLA_U8 },
 	[IFLA_BRPORT_BACKUP_PORT] = { .type = NLA_U32 },
 	[IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT] = { .type = NLA_U32 },
+	[IFLA_BRPORT_BPDU_FILTER] = { .type = NLA_U8 },
 };
 
 /* Change the state of the port and notify spanning tree */
@@ -893,6 +896,7 @@  static int br_setport(struct net_bridge_port *p, struct nlattr *tb[],
 	br_set_port_flag(p, tb, IFLA_BRPORT_VLAN_TUNNEL, BR_VLAN_TUNNEL);
 	br_set_port_flag(p, tb, IFLA_BRPORT_NEIGH_SUPPRESS, BR_NEIGH_SUPPRESS);
 	br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
+	br_set_port_flag(p, tb, IFLA_BRPORT_BPDU_FILTER, BR_BPDU_FILTER);
 
 	changed_mask = old_flags ^ p->flags;
 
diff --git a/net/bridge/br_stp_bpdu.c b/net/bridge/br_stp_bpdu.c
index 0e4572f31330..9d2a235260eb 100644
--- a/net/bridge/br_stp_bpdu.c
+++ b/net/bridge/br_stp_bpdu.c
@@ -80,7 +80,8 @@  void br_send_config_bpdu(struct net_bridge_port *p, struct br_config_bpdu *bpdu)
 {
 	unsigned char buf[35];
 
-	if (p->br->stp_enabled != BR_KERNEL_STP)
+	if (p->br->stp_enabled != BR_KERNEL_STP ||
+	    (p->flags & BR_BPDU_FILTER))
 		return;
 
 	buf[0] = 0;
@@ -127,7 +128,8 @@  void br_send_tcn_bpdu(struct net_bridge_port *p)
 {
 	unsigned char buf[4];
 
-	if (p->br->stp_enabled != BR_KERNEL_STP)
+	if (p->br->stp_enabled != BR_KERNEL_STP ||
+	    (p->flags & BR_BPDU_FILTER))
 		return;
 
 	buf[0] = 0;
@@ -172,6 +174,9 @@  void br_stp_rcv(const struct stp_proto *proto, struct sk_buff *skb,
 	if (!(br->dev->flags & IFF_UP))
 		goto out;
 
+	if (p->flags & BR_BPDU_FILTER)
+		goto out;
+
 	if (p->state == BR_STATE_DISABLED)
 		goto out;
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 710da8a36729..00328f0dd22b 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4707,7 +4707,9 @@  int ndo_dflt_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 	    brport_nla_put_flag(skb, flags, mask,
 				IFLA_BRPORT_MCAST_FLOOD, BR_MCAST_FLOOD) ||
 	    brport_nla_put_flag(skb, flags, mask,
-				IFLA_BRPORT_BCAST_FLOOD, BR_BCAST_FLOOD)) {
+				IFLA_BRPORT_BCAST_FLOOD, BR_BCAST_FLOOD) ||
+	    brport_nla_put_flag(skb, flags, mask,
+				IFLA_BRPORT_BPDU_FILTER, BR_BPDU_FILTER)) {
 		nla_nest_cancel(skb, protinfo);
 		goto nla_put_failure;
 	}