
[net-next,07/16] net: bridge: Maintain number of MDB entries in net_bridge_mcast_port

Message ID 1dcd4638d78c469eaa2f528de1f69b098222876f.1674752051.git.petrm@nvidia.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series bridge: Limit number of MDB entries per port, port-vlan

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count fail Series longer than 15 patches (and no cover letter)
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 7 this patch: 7
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch warning WARNING: line length of 106 exceeds 80 columns WARNING: line length of 111 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/source_inline success Was 0 now: 0

Commit Message

Petr Machata Jan. 26, 2023, 5:01 p.m. UTC
The MDB maintained by the bridge is limited. When the bridge is configured
for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
capacity. In the SW datapath, the capacity is configurable through the
IFLA_BR_MCAST_HASH_MAX parameter, but it is ultimately finite. Obviously a
similar limit exists in the HW datapath for the purposes of offloading.

In order to prevent the issue of unilateral exhaustion of MDB resources,
introduce two parameters in each of two contexts:

- Per-port and per-port-VLAN number of MDB entries that the port
  is a member of.

- Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
  per-port-VLAN maximum permitted number of MDB entries, or 0 for
  no limit.

The per-port multicast context is used for tracking of MDB entries for the
port as a whole. This is available for all bridges.

The per-port-VLAN multicast context is then only available on
VLAN-filtering bridges on VLANs that have multicast snooping on.

With these changes in place, it will be possible to configure an MDB limit
for the bridge as a whole, for any one port as a whole, or for any single
port-VLAN.

Note that unlike the global limit, exhaustion of the per-port and
per-port-VLAN maximums does not cause disablement of multicast snooping.
It is also permitted to configure the local limit larger than hash_max,
even though that is not useful.
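
(A standalone illustration of the accounting rule above -- a maximum of 0
means "no limit", and once a non-zero maximum is reached further entries
are refused with -E2BIG without disabling snooping. The names below are
invented for the sketch; the actual kernel helpers are in the diff at the
bottom of this page.)

/* Standalone model of the per-context MDB accounting rule.
 * Not kernel code -- just the semantics: max == 0 means "no limit",
 * otherwise an increment past max fails with -E2BIG.
 */
#include <errno.h>
#include <stdio.h>

struct mdb_ctx {
        unsigned int n_entries;
        unsigned int max_entries;       /* 0 == unlimited */
};

static int mdb_ctx_inc(struct mdb_ctx *ctx)
{
        if (ctx->max_entries && ctx->n_entries == ctx->max_entries)
                return -E2BIG;
        ctx->n_entries++;
        return 0;
}

static void mdb_ctx_dec(struct mdb_ctx *ctx)
{
        if (ctx->n_entries)
                ctx->n_entries--;
}

int main(void)
{
        struct mdb_ctx port = { .max_entries = 2 };
        int i;

        for (i = 0; i < 3; i++)
                printf("join %d -> %d\n", i, mdb_ctx_inc(&port));
        /* Output: the third join fails with -E2BIG (-7). */
        mdb_ctx_dec(&port);
        return 0;
}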

In this patch, introduce only the accounting for the number of entries and
the max field itself, but not the means to set the max. The next patch
introduces the netlink APIs to set and read the values.

Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
snooping is enabled. The reason for this is that while VLAN snooping is
disabled, permanent entries can be added above the limit imposed by the
configured maximum. Under those circumstances, whatever caused the VLAN
context enablement would need to be rolled back, adding a fair amount of
code that would be rarely hit and tricky to maintain. At the same time,
the feature that this would enable is IMHO not interesting: I posit that
the usefulness of keeping mcast_max_groups intact across
mcast_vlan_snooping toggles is marginal at best.
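
(A rough userspace sketch of the enable-time refresh described above, as
implemented in this version of the patch -- the maximum is cleared and the
count is recomputed from the port's existing groups for that VID; the
review below argues for keeping the maximum instead. Names are invented
for the illustration.)

/* Standalone sketch of the recount performed when per-VLAN snooping is
 * (re)enabled: the maximum is cleared and the entry count is rebuilt by
 * walking the port's existing groups and keeping only the matching VID.
 * Illustration only; the kernel version walks port->mglist instead.
 */
#include <stdio.h>

struct group { unsigned short vid; };

struct vlan_ctx {
        unsigned short vid;
        unsigned int n_entries;
        unsigned int max_entries;
};

static void vlan_ctx_recount(struct vlan_ctx *ctx,
                             const struct group *groups, int ngroups)
{
        int i;

        ctx->max_entries = 0;   /* limit is not preserved across the toggle */
        ctx->n_entries = 0;
        for (i = 0; i < ngroups; i++)
                if (groups[i].vid == ctx->vid)
                        ctx->n_entries++;
}

int main(void)
{
        /* Permanent entries added while VLAN snooping was disabled. */
        const struct group groups[] = { { 10 }, { 10 }, { 20 }, { 10 } };
        struct vlan_ctx v10 = { .vid = 10, .max_entries = 2 };

        vlan_ctx_recount(&v10, groups, 4);
        printf("vid 10: n=%u max=%u\n", v10.n_entries, v10.max_entries);
        /* Prints "vid 10: n=3 max=0" -- the count exceeds the old maximum,
         * which is why this version clears the maximum instead of rolling
         * anything back.
         */
        return 0;
}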

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---
 net/bridge/br_multicast.c | 131 +++++++++++++++++++++++++++++++++++++-
 net/bridge/br_private.h   |   2 +
 2 files changed, 132 insertions(+), 1 deletion(-)

Comments

Nikolay Aleksandrov Jan. 29, 2023, 9:40 a.m. UTC | #1
On 26/01/2023 19:01, Petr Machata wrote:
> The MDB maintained by the bridge is limited. When the bridge is configured
> for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
> capacity. In SW datapath, the capacity is configurable through the
> IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
> similar limit exists in the HW datapath for purposes of offloading.
> 
> In order to prevent the issue of unilateral exhaustion of MDB resources,
> introduce two parameters in each of two contexts:
> 
> - Per-port and per-port-VLAN number of MDB entries that the port
>   is member in.
> 
> - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
>   per-port-VLAN maximum permitted number of MDB entries, or 0 for
>   no limit.
> 
> The per-port multicast context is used for tracking of MDB entries for the
> port as a whole. This is available for all bridges.
> 
> The per-port-VLAN multicast context is then only available on
> VLAN-filtering bridges on VLANs that have multicast snooping on.
> 
> With these changes in place, it will be possible to configure MDB limit for
> bridge as a whole, or any one port as a whole, or any single port-VLAN.
> 
> Note that unlike the global limit, exhaustion of the per-port and
> per-port-VLAN maximums does not cause disablement of multicast snooping.
> It is also permitted to configure the local limit larger than hash_max,
> even though that is not useful.
> 
> In this patch, introduce only the accounting for number of entries, and the
> max field itself, but not the means to toggle the max. The next patch
> introduces the netlink APIs to toggle and read the values.
> 
> Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
> snooping is enabled. The reason for this is that while VLAN snooping is
> disabled, permanent entries can be added above the limit imposed by the
> configured maximum. Under those circumstances, whatever caused the VLAN
> context enablement, would need to be rolled back, adding a fair amount of
> code that would be rarely hit and tricky to maintain. At the same time,
> the feature that this would enable is IMHO not interesting: I posit that
> the usefulness of keeping mcast_max_groups intact across
> mcast_vlan_snooping toggles is marginal at best.
> 
> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/bridge/br_multicast.c | 131 +++++++++++++++++++++++++++++++++++++-
>  net/bridge/br_private.h   |   2 +
>  2 files changed, 132 insertions(+), 1 deletion(-)
> 

Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Nikolay Aleksandrov Jan. 29, 2023, 4:55 p.m. UTC | #2
On 26/01/2023 19:01, Petr Machata wrote:
> The MDB maintained by the bridge is limited. When the bridge is configured
> for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
> capacity. In SW datapath, the capacity is configurable through the
> IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
> similar limit exists in the HW datapath for purposes of offloading.
> 
> In order to prevent the issue of unilateral exhaustion of MDB resources,
> introduce two parameters in each of two contexts:
> 
> - Per-port and per-port-VLAN number of MDB entries that the port
>   is member in.
> 
> - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
>   per-port-VLAN maximum permitted number of MDB entries, or 0 for
>   no limit.
> 
> The per-port multicast context is used for tracking of MDB entries for the
> port as a whole. This is available for all bridges.
> 
> The per-port-VLAN multicast context is then only available on
> VLAN-filtering bridges on VLANs that have multicast snooping on.
> 
> With these changes in place, it will be possible to configure MDB limit for
> bridge as a whole, or any one port as a whole, or any single port-VLAN.
> 
> Note that unlike the global limit, exhaustion of the per-port and
> per-port-VLAN maximums does not cause disablement of multicast snooping.
> It is also permitted to configure the local limit larger than hash_max,
> even though that is not useful.
> 
> In this patch, introduce only the accounting for number of entries, and the
> max field itself, but not the means to toggle the max. The next patch
> introduces the netlink APIs to toggle and read the values.
> 
> Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
> snooping is enabled. The reason for this is that while VLAN snooping is
> disabled, permanent entries can be added above the limit imposed by the
> configured maximum. Under those circumstances, whatever caused the VLAN
> context enablement, would need to be rolled back, adding a fair amount of
> code that would be rarely hit and tricky to maintain. At the same time,
> the feature that this would enable is IMHO not interesting: I posit that
> the usefulness of keeping mcast_max_groups intact across
> mcast_vlan_snooping toggles is marginal at best.
> 

Hmm, I keep thinking about this one and I don't completely agree. It would be
more user-friendly if the max count didn't get reset when mcast snooping is
toggled. Imposing an order of operations (first enable snooping, then configure
max entries) isn't necessary, and it makes sense for someone to first set the
limit and then enable vlan snooping. It would also be consistent with the port
max entries; I'd prefer that we have the same behaviour for port and vlan
pmctxs. If we allow setting any maximum at any time, we don't need to roll back
anything. Also, we already always look up vlans in br_multicast_port_vid_to_port_ctx()
to check if snooping is enabled, so we can keep the count correct regardless,
the same as is done for the ports. Keeping both limits with consistent
semantics seems better to me.

WDYT ?

> Signed-off-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/bridge/br_multicast.c | 131 +++++++++++++++++++++++++++++++++++++-
>  net/bridge/br_private.h   |   2 +
>  2 files changed, 132 insertions(+), 1 deletion(-)
> 
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 51b622afdb67..de531109b947 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -31,6 +31,7 @@
>  #include <net/ip6_checksum.h>
>  #include <net/addrconf.h>
>  #endif
> +#include <trace/events/bridge.h>
>  
>  #include "br_private.h"
>  #include "br_private_mcast_eht.h"
> @@ -234,6 +235,29 @@ br_multicast_pg_to_port_ctx(const struct net_bridge_port_group *pg)
>  	return pmctx;
>  }
>  
> +static struct net_bridge_mcast_port *
> +br_multicast_port_vid_to_port_ctx(struct net_bridge_port *port, u16 vid)
> +{
> +	struct net_bridge_mcast_port *pmctx = NULL;
> +	struct net_bridge_vlan *vlan;
> +
> +	lockdep_assert_held_once(&port->br->multicast_lock);
> +
> +	if (!br_opt_get(port->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED))
> +		return NULL;
> +
> +	/* Take RCU to access the vlan. */
> +	rcu_read_lock();
> +
> +	vlan = br_vlan_find(nbp_vlan_group_rcu(port), vid);
> +	if (vlan && !br_multicast_port_ctx_vlan_disabled(&vlan->port_mcast_ctx))
> +		pmctx = &vlan->port_mcast_ctx;
> +
> +	rcu_read_unlock();
> +
> +	return pmctx;
> +}
> +
>  /* when snooping we need to check if the contexts should be used
>   * in the following order:
>   * - if pmctx is non-NULL (port), check if it should be used
> @@ -668,6 +692,80 @@ void br_multicast_del_group_src(struct net_bridge_group_src *src,
>  	__br_multicast_del_group_src(src);
>  }
>  
> +static int
> +br_multicast_port_ngroups_inc_one(struct net_bridge_mcast_port *pmctx,
> +				  struct netlink_ext_ack *extack)
> +{
> +	if (pmctx->mdb_max_entries &&
> +	    pmctx->mdb_n_entries == pmctx->mdb_max_entries)
> +		return -E2BIG;
> +
> +	pmctx->mdb_n_entries++;
> +	return 0;
> +}
> +
> +static void br_multicast_port_ngroups_dec_one(struct net_bridge_mcast_port *pmctx)
> +{
> +	WARN_ON_ONCE(pmctx->mdb_n_entries-- == 0);
> +}
> +
> +static int br_multicast_port_ngroups_inc(struct net_bridge_port *port,
> +					 const struct br_ip *group,
> +					 struct netlink_ext_ack *extack)
> +{
> +	struct net_bridge_mcast_port *pmctx;
> +	int err;
> +
> +	lockdep_assert_held_once(&port->br->multicast_lock);
> +
> +	/* Always count on the port context. */
> +	err = br_multicast_port_ngroups_inc_one(&port->multicast_ctx, extack);
> +	if (err) {
> +		NL_SET_ERR_MSG_FMT_MOD(extack, "Port is already a member in mcast_max_groups (%u) groups",
> +				       port->multicast_ctx.mdb_max_entries);
> +		trace_br_mdb_full(port->dev, group);
> +		return err;
> +	}
> +
> +	/* Only count on the VLAN context if VID is given, and if snooping on
> +	 * that VLAN is enabled.
> +	 */
> +	if (!group->vid)
> +		return 0;
> +
> +	pmctx = br_multicast_port_vid_to_port_ctx(port, group->vid);
> +	if (!pmctx)
> +		return 0;
> +
> +	err = br_multicast_port_ngroups_inc_one(pmctx, extack);
> +	if (err) {
> +		NL_SET_ERR_MSG_FMT_MOD(extack, "Port-VLAN is already a member in mcast_max_groups (%u) groups",
> +				       pmctx->mdb_max_entries);
> +		trace_br_mdb_full(port->dev, group);
> +		goto dec_one_out;
> +	}
> +
> +	return 0;
> +
> +dec_one_out:
> +	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
> +	return err;
> +}
> +
> +static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid)
> +{
> +	struct net_bridge_mcast_port *pmctx;
> +
> +	lockdep_assert_held_once(&port->br->multicast_lock);
> +
> +	if (vid) {
> +		pmctx = br_multicast_port_vid_to_port_ctx(port, vid);
> +		if (pmctx)
> +			br_multicast_port_ngroups_dec_one(pmctx);
> +	}
> +	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
> +}
> +
>  static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc)
>  {
>  	struct net_bridge_port_group *pg;
> @@ -702,6 +800,7 @@ void br_multicast_del_pg(struct net_bridge_mdb_entry *mp,
>  	} else {
>  		br_multicast_star_g_handle_mode(pg, MCAST_INCLUDE);
>  	}
> +	br_multicast_port_ngroups_dec(pg->key.port, pg->key.addr.vid);
>  	hlist_add_head(&pg->mcast_gc.gc_node, &br->mcast_gc_list);
>  	queue_work(system_long_wq, &br->mcast_gc_work);
>  
> @@ -1165,6 +1264,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
>  		return mp;
>  
>  	if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) {
> +		trace_br_mdb_full(br->dev, group);
>  		br_mc_disabled_update(br->dev, false, NULL);
>  		br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false);
>  		return ERR_PTR(-E2BIG);
> @@ -1288,11 +1388,16 @@ struct net_bridge_port_group *br_multicast_new_port_group(
>  			struct netlink_ext_ack *extack)
>  {
>  	struct net_bridge_port_group *p;
> +	int err;
> +
> +	err = br_multicast_port_ngroups_inc(port, group, extack);
> +	if (err)
> +		return NULL;
>  
>  	p = kzalloc(sizeof(*p), GFP_ATOMIC);
>  	if (unlikely(!p)) {
>  		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group");
> -		return NULL;
> +		goto dec_out;
>  	}
>  
>  	p->key.addr = *group;
> @@ -1326,18 +1431,22 @@ struct net_bridge_port_group *br_multicast_new_port_group(
>  
>  free_out:
>  	kfree(p);
> +dec_out:
> +	br_multicast_port_ngroups_dec(port, group->vid);
>  	return NULL;
>  }
>  
>  void br_multicast_del_port_group(struct net_bridge_port_group *p)
>  {
>  	struct net_bridge_port *port = p->key.port;
> +	__u16 vid = p->key.addr.vid;
>  
>  	hlist_del_init(&p->mglist);
>  	if (!br_multicast_is_star_g(&p->key.addr))
>  		rhashtable_remove_fast(&port->br->sg_port_tbl, &p->rhnode,
>  				       br_sg_port_rht_params);
>  	kfree(p);
> +	br_multicast_port_ngroups_dec(port, vid);
>  }
>  
>  void br_multicast_host_join(const struct net_bridge_mcast *brmctx,
> @@ -1951,6 +2060,26 @@ static void __br_multicast_enable_port_ctx(struct net_bridge_mcast_port *pmctx)
>  		br_ip4_multicast_add_router(brmctx, pmctx);
>  		br_ip6_multicast_add_router(brmctx, pmctx);
>  	}
> +
> +	if (br_multicast_port_ctx_is_vlan(pmctx)) {
> +		struct net_bridge_port_group *pg;
> +
> +		/* If BR_VLFLAG_MCAST_ENABLED was enabled in the past, but then
> +		 * disabled, the mcast_n_groups counter is now wrong. First,
> +		 * BR_VLFLAG_MCAST_ENABLED is toggled before temporary entries
> +		 * are flushed, thus mcast_n_groups after the toggle does not
> +		 * reflect the true values. And second, permanent entries added
> +		 * while BR_VLFLAG_MCAST_ENABLED was disabled, are not reflected
> +		 * either. Thus we have to refresh the counter.
> +		 */
> +
> +		pmctx->mdb_max_entries = 0;
> +		pmctx->mdb_n_entries = 0;
> +		hlist_for_each_entry(pg, &pmctx->port->mglist, mglist) {
> +			if (pg->key.addr.vid == pmctx->vlan->vid)
> +				br_multicast_port_ngroups_inc_one(pmctx, NULL);
> +		}
> +	}
>  }
>  
>  void br_multicast_enable_port(struct net_bridge_port *port)
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index e4069e27b5c6..49f411a0a1f1 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -126,6 +126,8 @@ struct net_bridge_mcast_port {
>  	struct hlist_node		ip6_rlist;
>  #endif /* IS_ENABLED(CONFIG_IPV6) */
>  	unsigned char			multicast_router;
> +	u32				mdb_n_entries;
> +	u32				mdb_max_entries;
>  #endif /* CONFIG_BRIDGE_IGMP_SNOOPING */
>  };
>
Ido Schimmel Jan. 30, 2023, 8:08 a.m. UTC | #3
On Sun, Jan 29, 2023 at 06:55:26PM +0200, Nikolay Aleksandrov wrote:
> On 26/01/2023 19:01, Petr Machata wrote:
> > The MDB maintained by the bridge is limited. When the bridge is configured
> > for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
> > capacity. In SW datapath, the capacity is configurable through the
> > IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
> > similar limit exists in the HW datapath for purposes of offloading.
> > 
> > In order to prevent the issue of unilateral exhaustion of MDB resources,
> > introduce two parameters in each of two contexts:
> > 
> > - Per-port and per-port-VLAN number of MDB entries that the port
> >   is member in.
> > 
> > - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
> >   per-port-VLAN maximum permitted number of MDB entries, or 0 for
> >   no limit.
> > 
> > The per-port multicast context is used for tracking of MDB entries for the
> > port as a whole. This is available for all bridges.
> > 
> > The per-port-VLAN multicast context is then only available on
> > VLAN-filtering bridges on VLANs that have multicast snooping on.
> > 
> > With these changes in place, it will be possible to configure MDB limit for
> > bridge as a whole, or any one port as a whole, or any single port-VLAN.
> > 
> > Note that unlike the global limit, exhaustion of the per-port and
> > per-port-VLAN maximums does not cause disablement of multicast snooping.
> > It is also permitted to configure the local limit larger than hash_max,
> > even though that is not useful.
> > 
> > In this patch, introduce only the accounting for number of entries, and the
> > max field itself, but not the means to toggle the max. The next patch
> > introduces the netlink APIs to toggle and read the values.
> > 
> > Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
> > snooping is enabled. The reason for this is that while VLAN snooping is
> > disabled, permanent entries can be added above the limit imposed by the
> > configured maximum. Under those circumstances, whatever caused the VLAN
> > context enablement, would need to be rolled back, adding a fair amount of
> > code that would be rarely hit and tricky to maintain. At the same time,
> > the feature that this would enable is IMHO not interesting: I posit that
> > the usefulness of keeping mcast_max_groups intact across
> > mcast_vlan_snooping toggles is marginal at best.
> > 
> 
> Hmm, I keep thinking about this one and I don't completely agree. It would be
> more user-friendly if the max count doesn't get reset when mcast snooping is toggled.
> Imposing order of operations (first enable snooping, then config max entries) isn't necessary
> and it makes sense for someone to first set the limit and then enable vlan snooping.
> Also it would be consistent with port max entries, I'd prefer if we have the same
> behaviour for port and vlan pmctxs. If we allow to set any maximum at any time we
> don't need to rollback anything, also we already always lookup vlans in br_multicast_port_vid_to_port_ctx()
> to check if snooping is enabled so we can keep the count correct regardless, the same as
> it's done for the ports. Keeping both limits with consistent semantics seems better to me.
> 
> WDYT ?

The current approach is strict and prevents user space from performing
configuration that does not make a lot of sense:

1. Setting the maximum to be less than the current count.

2. Increasing the port-VLAN count above the port-VLAN maximum when VLAN
snooping is disabled (i.e., the maximum is not enforced) and then enabling
VLAN snooping.

However, it is not consistent with similar existing behavior where the
kernel is more liberal. For example:

1. It is possible to set the global maximum to be less than the current
number of entries.

2. Other port-VLAN attributes are not reset when VLAN snooping is
toggled.

And it also results in order of operations problems like you described.

So, it seems to me that we have more good reasons to not reset the
maximum than to reset it. Regardless of which path we take, it is
important to document the behavior in the man page (and in the commit
message, obviously) to avoid "bug reports" later on.
Nikolay Aleksandrov Jan. 30, 2023, 8:56 a.m. UTC | #4
On 30/01/2023 10:08, Ido Schimmel wrote:
> On Sun, Jan 29, 2023 at 06:55:26PM +0200, Nikolay Aleksandrov wrote:
>> On 26/01/2023 19:01, Petr Machata wrote:
>>> The MDB maintained by the bridge is limited. When the bridge is configured
>>> for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
>>> capacity. In SW datapath, the capacity is configurable through the
>>> IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
>>> similar limit exists in the HW datapath for purposes of offloading.
>>>
>>> In order to prevent the issue of unilateral exhaustion of MDB resources,
>>> introduce two parameters in each of two contexts:
>>>
>>> - Per-port and per-port-VLAN number of MDB entries that the port
>>>   is member in.
>>>
>>> - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
>>>   per-port-VLAN maximum permitted number of MDB entries, or 0 for
>>>   no limit.
>>>
>>> The per-port multicast context is used for tracking of MDB entries for the
>>> port as a whole. This is available for all bridges.
>>>
>>> The per-port-VLAN multicast context is then only available on
>>> VLAN-filtering bridges on VLANs that have multicast snooping on.
>>>
>>> With these changes in place, it will be possible to configure MDB limit for
>>> bridge as a whole, or any one port as a whole, or any single port-VLAN.
>>>
>>> Note that unlike the global limit, exhaustion of the per-port and
>>> per-port-VLAN maximums does not cause disablement of multicast snooping.
>>> It is also permitted to configure the local limit larger than hash_max,
>>> even though that is not useful.
>>>
>>> In this patch, introduce only the accounting for number of entries, and the
>>> max field itself, but not the means to toggle the max. The next patch
>>> introduces the netlink APIs to toggle and read the values.
>>>
>>> Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
>>> snooping is enabled. The reason for this is that while VLAN snooping is
>>> disabled, permanent entries can be added above the limit imposed by the
>>> configured maximum. Under those circumstances, whatever caused the VLAN
>>> context enablement, would need to be rolled back, adding a fair amount of
>>> code that would be rarely hit and tricky to maintain. At the same time,
>>> the feature that this would enable is IMHO not interesting: I posit that
>>> the usefulness of keeping mcast_max_groups intact across
>>> mcast_vlan_snooping toggles is marginal at best.
>>>
>>
>> Hmm, I keep thinking about this one and I don't completely agree. It would be
>> more user-friendly if the max count doesn't get reset when mcast snooping is toggled.
>> Imposing order of operations (first enable snooping, then config max entries) isn't necessary
>> and it makes sense for someone to first set the limit and then enable vlan snooping.
>> Also it would be consistent with port max entries, I'd prefer if we have the same
>> behaviour for port and vlan pmctxs. If we allow to set any maximum at any time we
>> don't need to rollback anything, also we already always lookup vlans in br_multicast_port_vid_to_port_ctx()
>> to check if snooping is enabled so we can keep the count correct regardless, the same as
>> it's done for the ports. Keeping both limits with consistent semantics seems better to me.
>>
>> WDYT ?
> 
> The current approach is strict and prevents user space from performing
> configuration that does not make a lot of sense:
> 
> 1. Setting the maximum to be less than the current count.
> 
> 2. Increasing the port-VLAN count above port-VLAN maximum when VLAN
> snooping is disabled (i.e., maximum is not enforced) and then enabling
> VLAN snooping.
> 
> However, it is not consistent with similar existing behavior where the
> kernel is more liberal. For example:
> 
> 1. It is possible to set the global maximum to be less than the current
> number of entries.
> 
> 2. Other port-VLAN attributes are not reset when VLAN snooping is
> toggled.
> 

Right, 2) is my main concern and could be surprising. I'd also like to
have consistent behaviour for both limits - port and vlan.

> And it also results in order of operations problems like you described.
> 
> So, it seems to me that we have more good reasons to not reset the
> maximum than to reset it. Regardless of which path we take, it is
> important to document the behavior in the man page (and in the commit
> message, obviously) to avoid "bug reports" later on.

+1
Absolutely agree.

Thanks,
 Nik
Petr Machata Jan. 30, 2023, 3:02 p.m. UTC | #5
Nikolay Aleksandrov <razor@blackwall.org> writes:

> On 26/01/2023 19:01, Petr Machata wrote:
>> Note that the per-port-VLAN mcast_max_groups value gets reset when VLAN
>> snooping is enabled. The reason for this is that while VLAN snooping is
>> disabled, permanent entries can be added above the limit imposed by the
>> configured maximum. Under those circumstances, whatever caused the VLAN
>> context enablement, would need to be rolled back, adding a fair amount of
>> code that would be rarely hit and tricky to maintain. At the same time,
>> the feature that this would enable is IMHO not interesting: I posit that
>> the usefulness of keeping mcast_max_groups intact across
>> mcast_vlan_snooping toggles is marginal at best.
>> 
>
> Hmm, I keep thinking about this one and I don't completely agree. It
> would be more user-friendly if the max count doesn't get reset when
> mcast snooping is toggled. Imposing order of operations (first enable
> snooping, then config max entries) isn't necessary and it makes sense
> for someone to first set the limit and then enable vlan snooping.

If you are talking about mcast_snooping, that can be disabled while
mcast_vlan_snooping is enabled. So you can configure everything, then
turn snooping on.

If you are talking about configuring max while mcast_vlan_snooping is
off, then I assumed one shouldn't touch the VLAN context if
br_multicast_port_ctx_vlan_disabled(). So we would need to track the n
and max in some entity other than the multicast context. But maybe
I'm wrong.

> Also it would be consistent with port max entries, I'd prefer if we
> have the same behaviour for port and vlan pmctxs. If we allow to set
> any maximum at any time we don't need to rollback anything, also we
> already always lookup vlans in br_multicast_port_vid_to_port_ctx() to
> check if snooping is enabled so we can keep the count correct
> regardless, the same as it's done for the ports. Keeping both limits
> with consistent semantics seems better to me.

The idea of requiring max >= current felt so natural to me that I didn't
even check what mcast_hash_max was doing. Sure -- let's be consistent.
This will incidentally make all the rollbacks go away, and happily makes
sense WRT locking, too: since the relation between max and n is somewhat
loose, we don't need to worry too much about sequencing inc-/dec-n vs.
set-max.
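
(A minimal sketch of the more liberal semantics converged on here, purely
as an illustration of the intended behaviour rather than the eventual
kernel code: setting the maximum never fails and never resets the count;
the limit is only enforced when a new entry is added, so no rollback is
needed.)

/* Sketch of the "liberal" limit semantics discussed above: setting the
 * maximum never fails and never touches the current count; the limit is
 * only checked when a new entry is about to be added.  Illustration only.
 */
#include <errno.h>
#include <stdio.h>

struct mdb_ctx {
        unsigned int n_entries;
        unsigned int max_entries;       /* 0 == unlimited */
};

static void mdb_ctx_set_max(struct mdb_ctx *ctx, unsigned int max)
{
        /* No max >= n_entries requirement and no rollback needed. */
        ctx->max_entries = max;
}

static int mdb_ctx_inc(struct mdb_ctx *ctx)
{
        if (ctx->max_entries && ctx->n_entries >= ctx->max_entries)
                return -E2BIG;
        ctx->n_entries++;
        return 0;
}

int main(void)
{
        struct mdb_ctx ctx = { .n_entries = 5 };        /* 5 existing entries */

        mdb_ctx_set_max(&ctx, 2);       /* allowed even though 5 > 2 */
        printf("set max=2 with n=5, next inc -> %d\n", mdb_ctx_inc(&ctx));
        /* Existing entries stay; only further additions fail (-E2BIG). */
        return 0;
}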

Patch

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 51b622afdb67..de531109b947 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -31,6 +31,7 @@ 
 #include <net/ip6_checksum.h>
 #include <net/addrconf.h>
 #endif
+#include <trace/events/bridge.h>
 
 #include "br_private.h"
 #include "br_private_mcast_eht.h"
@@ -234,6 +235,29 @@  br_multicast_pg_to_port_ctx(const struct net_bridge_port_group *pg)
 	return pmctx;
 }
 
+static struct net_bridge_mcast_port *
+br_multicast_port_vid_to_port_ctx(struct net_bridge_port *port, u16 vid)
+{
+	struct net_bridge_mcast_port *pmctx = NULL;
+	struct net_bridge_vlan *vlan;
+
+	lockdep_assert_held_once(&port->br->multicast_lock);
+
+	if (!br_opt_get(port->br, BROPT_MCAST_VLAN_SNOOPING_ENABLED))
+		return NULL;
+
+	/* Take RCU to access the vlan. */
+	rcu_read_lock();
+
+	vlan = br_vlan_find(nbp_vlan_group_rcu(port), vid);
+	if (vlan && !br_multicast_port_ctx_vlan_disabled(&vlan->port_mcast_ctx))
+		pmctx = &vlan->port_mcast_ctx;
+
+	rcu_read_unlock();
+
+	return pmctx;
+}
+
 /* when snooping we need to check if the contexts should be used
  * in the following order:
  * - if pmctx is non-NULL (port), check if it should be used
@@ -668,6 +692,80 @@  void br_multicast_del_group_src(struct net_bridge_group_src *src,
 	__br_multicast_del_group_src(src);
 }
 
+static int
+br_multicast_port_ngroups_inc_one(struct net_bridge_mcast_port *pmctx,
+				  struct netlink_ext_ack *extack)
+{
+	if (pmctx->mdb_max_entries &&
+	    pmctx->mdb_n_entries == pmctx->mdb_max_entries)
+		return -E2BIG;
+
+	pmctx->mdb_n_entries++;
+	return 0;
+}
+
+static void br_multicast_port_ngroups_dec_one(struct net_bridge_mcast_port *pmctx)
+{
+	WARN_ON_ONCE(pmctx->mdb_n_entries-- == 0);
+}
+
+static int br_multicast_port_ngroups_inc(struct net_bridge_port *port,
+					 const struct br_ip *group,
+					 struct netlink_ext_ack *extack)
+{
+	struct net_bridge_mcast_port *pmctx;
+	int err;
+
+	lockdep_assert_held_once(&port->br->multicast_lock);
+
+	/* Always count on the port context. */
+	err = br_multicast_port_ngroups_inc_one(&port->multicast_ctx, extack);
+	if (err) {
+		NL_SET_ERR_MSG_FMT_MOD(extack, "Port is already a member in mcast_max_groups (%u) groups",
+				       port->multicast_ctx.mdb_max_entries);
+		trace_br_mdb_full(port->dev, group);
+		return err;
+	}
+
+	/* Only count on the VLAN context if VID is given, and if snooping on
+	 * that VLAN is enabled.
+	 */
+	if (!group->vid)
+		return 0;
+
+	pmctx = br_multicast_port_vid_to_port_ctx(port, group->vid);
+	if (!pmctx)
+		return 0;
+
+	err = br_multicast_port_ngroups_inc_one(pmctx, extack);
+	if (err) {
+		NL_SET_ERR_MSG_FMT_MOD(extack, "Port-VLAN is already a member in mcast_max_groups (%u) groups",
+				       pmctx->mdb_max_entries);
+		trace_br_mdb_full(port->dev, group);
+		goto dec_one_out;
+	}
+
+	return 0;
+
+dec_one_out:
+	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
+	return err;
+}
+
+static void br_multicast_port_ngroups_dec(struct net_bridge_port *port, u16 vid)
+{
+	struct net_bridge_mcast_port *pmctx;
+
+	lockdep_assert_held_once(&port->br->multicast_lock);
+
+	if (vid) {
+		pmctx = br_multicast_port_vid_to_port_ctx(port, vid);
+		if (pmctx)
+			br_multicast_port_ngroups_dec_one(pmctx);
+	}
+	br_multicast_port_ngroups_dec_one(&port->multicast_ctx);
+}
+
 static void br_multicast_destroy_port_group(struct net_bridge_mcast_gc *gc)
 {
 	struct net_bridge_port_group *pg;
@@ -702,6 +800,7 @@  void br_multicast_del_pg(struct net_bridge_mdb_entry *mp,
 	} else {
 		br_multicast_star_g_handle_mode(pg, MCAST_INCLUDE);
 	}
+	br_multicast_port_ngroups_dec(pg->key.port, pg->key.addr.vid);
 	hlist_add_head(&pg->mcast_gc.gc_node, &br->mcast_gc_list);
 	queue_work(system_long_wq, &br->mcast_gc_work);
 
@@ -1165,6 +1264,7 @@  struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
 		return mp;
 
 	if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) {
+		trace_br_mdb_full(br->dev, group);
 		br_mc_disabled_update(br->dev, false, NULL);
 		br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false);
 		return ERR_PTR(-E2BIG);
@@ -1288,11 +1388,16 @@  struct net_bridge_port_group *br_multicast_new_port_group(
 			struct netlink_ext_ack *extack)
 {
 	struct net_bridge_port_group *p;
+	int err;
+
+	err = br_multicast_port_ngroups_inc(port, group, extack);
+	if (err)
+		return NULL;
 
 	p = kzalloc(sizeof(*p), GFP_ATOMIC);
 	if (unlikely(!p)) {
 		NL_SET_ERR_MSG_MOD(extack, "Couldn't allocate new port group");
-		return NULL;
+		goto dec_out;
 	}
 
 	p->key.addr = *group;
@@ -1326,18 +1431,22 @@  struct net_bridge_port_group *br_multicast_new_port_group(
 
 free_out:
 	kfree(p);
+dec_out:
+	br_multicast_port_ngroups_dec(port, group->vid);
 	return NULL;
 }
 
 void br_multicast_del_port_group(struct net_bridge_port_group *p)
 {
 	struct net_bridge_port *port = p->key.port;
+	__u16 vid = p->key.addr.vid;
 
 	hlist_del_init(&p->mglist);
 	if (!br_multicast_is_star_g(&p->key.addr))
 		rhashtable_remove_fast(&port->br->sg_port_tbl, &p->rhnode,
 				       br_sg_port_rht_params);
 	kfree(p);
+	br_multicast_port_ngroups_dec(port, vid);
 }
 
 void br_multicast_host_join(const struct net_bridge_mcast *brmctx,
@@ -1951,6 +2060,26 @@  static void __br_multicast_enable_port_ctx(struct net_bridge_mcast_port *pmctx)
 		br_ip4_multicast_add_router(brmctx, pmctx);
 		br_ip6_multicast_add_router(brmctx, pmctx);
 	}
+
+	if (br_multicast_port_ctx_is_vlan(pmctx)) {
+		struct net_bridge_port_group *pg;
+
+		/* If BR_VLFLAG_MCAST_ENABLED was enabled in the past, but then
+		 * disabled, the mcast_n_groups counter is now wrong. First,
+		 * BR_VLFLAG_MCAST_ENABLED is toggled before temporary entries
+		 * are flushed, thus mcast_n_groups after the toggle does not
+		 * reflect the true values. And second, permanent entries added
+		 * while BR_VLFLAG_MCAST_ENABLED was disabled, are not reflected
+		 * either. Thus we have to refresh the counter.
+		 */
+
+		pmctx->mdb_max_entries = 0;
+		pmctx->mdb_n_entries = 0;
+		hlist_for_each_entry(pg, &pmctx->port->mglist, mglist) {
+			if (pg->key.addr.vid == pmctx->vlan->vid)
+				br_multicast_port_ngroups_inc_one(pmctx, NULL);
+		}
+	}
 }
 
 void br_multicast_enable_port(struct net_bridge_port *port)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index e4069e27b5c6..49f411a0a1f1 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -126,6 +126,8 @@  struct net_bridge_mcast_port {
 	struct hlist_node		ip6_rlist;
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 	unsigned char			multicast_router;
+	u32				mdb_n_entries;
+	u32				mdb_max_entries;
 #endif /* CONFIG_BRIDGE_IGMP_SNOOPING */
 };
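
(For readers skimming the diff: a standalone model of how
br_multicast_port_ngroups_inc() / _dec() pair the port-level and
port-VLAN-level counters. Names are invented; the kernel versions
additionally assert that the multicast lock is held and report failures
via extack and the br_mdb_full tracepoint.)

/* Standalone model of the two-level accounting in the patch: the port
 * context is always charged first; the port-VLAN context is charged only
 * when a VID is known and, on failure, the port charge is rolled back so
 * the two counters stay consistent.  Illustration only.
 */
#include <errno.h>
#include <stdio.h>

struct ctx {
        unsigned int n;
        unsigned int max;       /* 0 == unlimited */
};

static int ctx_inc(struct ctx *c)
{
        if (c->max && c->n == c->max)
                return -E2BIG;
        c->n++;
        return 0;
}

static void ctx_dec(struct ctx *c)
{
        if (c->n)
                c->n--;
}

static int ngroups_inc(struct ctx *port, struct ctx *port_vlan)
{
        int err;

        err = ctx_inc(port);            /* always charge the port */
        if (err)
                return err;

        if (!port_vlan)                 /* no VID / snooping off on the VLAN */
                return 0;

        err = ctx_inc(port_vlan);       /* then charge the port-VLAN */
        if (err)
                ctx_dec(port);          /* roll back the port charge */
        return err;
}

int main(void)
{
        struct ctx port = { .max = 0 };         /* unlimited */
        struct ctx vlan10 = { .max = 1 };       /* tight per-VLAN limit */

        printf("1st join: %d\n", ngroups_inc(&port, &vlan10));
        printf("2nd join: %d (port back to %u)\n",
               ngroups_inc(&port, &vlan10), port.n);
        return 0;
}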