diff mbox series

[net-next,v2,2/3] bridge: Add a limit on learned FDB entries

Message ID 20230619071444.14625-3-jnixdorf-oss@avm.de (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series bridge: Add a limit on learned FDB entries | expand

Checks

Context Check Description
netdev/series_format warning Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4165 this patch: 4165
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 921 this patch: 921
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4383 this patch: 4383
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/source_inline success Was 0 now: 0

Commit Message

Johannes Nixdorf June 19, 2023, 7:14 a.m. UTC
A malicious actor behind one bridge port may spam the kernel with packets
with a random source MAC address, each of which will create an FDB entry,
each of which is a dynamic allocation in the kernel.

There are roughly 2^48 different MAC addresses, further limited by the
rhashtable they are stored in to 2^31. Each entry is of the type struct
net_bridge_fdb_entry, which is currently 128 bytes big. This means the
maximum amount of memory allocated for FDB entries is 2^31 * 128B =
256GiB, which is too much for most computers.

Mitigate this by adding a bridge netlink setting
IFLA_BR_FDB_MAX_LEARNED_ENTRIES, which, if nonzero, limits the amount
of learned entries to a user specified maximum.

For backwards compatibility the default setting of 0 disables the limit.

User-added entries by netlink or from bridge or bridge port addresses
are never blocked and do not count towards that limit.

All changes to fdb_n_entries are under br->hash_lock, which means we do
not need additional locking. The call paths are (✓ denotes that
br->hash_lock is taken around the next call):

 - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓
                |                     +- br_fdb_change_mac_address ✓
                |                     +- br_fdb_delete_by_port ✓
                +- br_fdb_find_delete_local ✓
                +- fdb_add_local <-+- br_fdb_changeaddr ✓
                |                  +- br_fdb_change_mac_address ✓
                |                  +- br_fdb_add_local ✓
                +- br_fdb_cleanup ✓
                +- br_fdb_flush ✓
                +- br_fdb_delete_by_port ✓
                +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓
                +- br_fdb_external_learn_del ✓
 - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓
                |                  +- br_fdb_change_mac_address ✓
                |                  +- br_fdb_add_local ✓
                +- br_fdb_update ✓
                +- fdb_add_entry <--- __br_fdb_add ✓
                +- br_fdb_external_learn_add ✓

The flags that imply an entry does not come from learning
(BR_FDB_NOT_LEARNED_MASK) are now only set or cleared under br->hash_lock
as well, and when the boolean value of (fdb->flags &
BR_FDB_NOT_LEARNED_MASK) changes the accounting is updated.

This introduces one additional locked update in br_fdb_update if
BR_FDB_ADDED_BY_USER was set. This is only the case when creating a new
entry via netlink, and never in the packet handling fast path.

Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>

---

Changes since v1:
 - Do not initialize fdb_*_entries to 0. (from review)
 - Do not skip decrementing on 0. (from review)
 - Moved the counters to a conditional hole in struct net_bridge to
   avoid growing the struct. (from review, it still grows the struct as
   there are 2 32-bit values)
 - Add IFLA_BR_FDB_CUR_LEARNED_ENTRIES (from review)
 - Fix br_get_size()
 - Only limit learned entries, rename to
   *_(CUR|MAX)_LEARNED_ENTRIES. (from review)

Obsolete v1 review comments:
 - Return better errors to users: Due to limiting the limit to
   automatically created entries, netlink fdb add requests and changing
   bridge ports are never rejected, so they do not yet need a more
   friendly error returned.

 include/uapi/linux/if_link.h |  2 ++
 net/bridge/br_fdb.c          | 67 +++++++++++++++++++++++++++++++++---
 net/bridge/br_netlink.c      | 13 ++++++-
 net/bridge/br_private.h      |  6 ++++
 4 files changed, 83 insertions(+), 5 deletions(-)

Comments

Ido Schimmel June 19, 2023, 3:34 p.m. UTC | #1
On Mon, Jun 19, 2023 at 09:14:42AM +0200, Johannes Nixdorf wrote:
> A malicious actor behind one bridge port may spam the kernel with packets
> with a random source MAC address, each of which will create an FDB entry,
> each of which is a dynamic allocation in the kernel.
> 
> There are roughly 2^48 different MAC addresses, further limited by the
> rhashtable they are stored in to 2^31. Each entry is of the type struct
> net_bridge_fdb_entry, which is currently 128 bytes big. This means the
> maximum amount of memory allocated for FDB entries is 2^31 * 128B =
> 256GiB, which is too much for most computers.
> 
> Mitigate this by adding a bridge netlink setting
> IFLA_BR_FDB_MAX_LEARNED_ENTRIES, which, if nonzero, limits the amount
> of learned entries to a user specified maximum.
> 
> For backwards compatibility the default setting of 0 disables the limit.
> 
> User-added entries by netlink or from bridge or bridge port addresses
> are never blocked and do not count towards that limit.
> 
> All changes to fdb_n_entries are under br->hash_lock, which means we do
> not need additional locking. The call paths are (✓ denotes that
> br->hash_lock is taken around the next call):
> 
>  - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓
>                 |                     +- br_fdb_change_mac_address ✓
>                 |                     +- br_fdb_delete_by_port ✓
>                 +- br_fdb_find_delete_local ✓
>                 +- fdb_add_local <-+- br_fdb_changeaddr ✓
>                 |                  +- br_fdb_change_mac_address ✓
>                 |                  +- br_fdb_add_local ✓
>                 +- br_fdb_cleanup ✓
>                 +- br_fdb_flush ✓
>                 +- br_fdb_delete_by_port ✓
>                 +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓
>                 +- br_fdb_external_learn_del ✓
>  - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓
>                 |                  +- br_fdb_change_mac_address ✓
>                 |                  +- br_fdb_add_local ✓
>                 +- br_fdb_update ✓
>                 +- fdb_add_entry <--- __br_fdb_add ✓
>                 +- br_fdb_external_learn_add ✓
> 
> The flags that imply an entry does not come from learning
> (BR_FDB_NOT_LEARNED_MASK) are now only set or cleared under br->hash_lock
> as well, and when the boolean value of (fdb->flags &
> BR_FDB_NOT_LEARNED_MASK) changes the accounting is updated.
> 
> This introduces one additional locked update in br_fdb_update if
> BR_FDB_ADDED_BY_USER was set. This is only the case when creating a new
> entry via netlink, and never in the packet handling fast path.
> 
> Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
> 
> ---
> 
> Changes since v1:
>  - Do not initialize fdb_*_entries to 0. (from review)
>  - Do not skip decrementing on 0. (from review)
>  - Moved the counters to a conditional hole in struct net_bridge to
>    avoid growing the struct. (from review, it still grows the struct as
>    there are 2 32-bit values)
>  - Add IFLA_BR_FDB_CUR_LEARNED_ENTRIES (from review)
>  - Fix br_get_size()
>  - Only limit learned entries, rename to
>    *_(CUR|MAX)_LEARNED_ENTRIES. (from review)
> 
> Obsolete v1 review comments:
>  - Return better errors to users: Due to limiting the limit to
>    automatically created entries, netlink fdb add requests and changing
>    bridge ports are never rejected, so they do not yet need a more
>    friendly error returned.
> 
>  include/uapi/linux/if_link.h |  2 ++
>  net/bridge/br_fdb.c          | 67 +++++++++++++++++++++++++++++++++---
>  net/bridge/br_netlink.c      | 13 ++++++-
>  net/bridge/br_private.h      |  6 ++++

To minimize the number of changes per patch and make review easier, try
to first maintain the count and the maximum and then in a separate patch
expose them via netlink. See b57e8d870d52 and a1aee20d5db2, for example.
Merge commit is cb3086cee656.

>  4 files changed, 83 insertions(+), 5 deletions(-)
> 
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 4ac1000b0ef2..165b9014379b 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -510,6 +510,8 @@ enum {
>  	IFLA_BR_VLAN_STATS_PER_PORT,
>  	IFLA_BR_MULTI_BOOLOPT,
>  	IFLA_BR_MCAST_QUERIER_STATE,
> +	IFLA_BR_FDB_CUR_LEARNED_ENTRIES,
> +	IFLA_BR_FDB_MAX_LEARNED_ENTRIES,
>  	__IFLA_BR_MAX,
>  };
>  
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index ac1dc8723b9c..bc61d1fd5fcf 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -301,6 +301,38 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
>  	}
>  }
>  
> +/* Set a FDB flag that implies the entry was not learned, and account
> + * for changes in the learned status.
> + */
> +static void __fdb_set_flag_not_learned(struct net_bridge *br,
> +				       struct net_bridge_fdb_entry *fdb,
> +				       long nr)
> +{
> +	WARN_ON_ONCE(!(BIT(nr) & BR_FDB_NOT_LEARNED_MASK));
> +
> +	/* learned before, but we set a flag that implies it's manually added */
> +	if (!(fdb->flags & BR_FDB_NOT_LEARNED_MASK))
> +		br->fdb_cur_learned_entries--;
> +	set_bit(nr, &fdb->flags);
> +}
> +
> +/* Set a FDB flag that implies the entry was not learned, and account
> + * for changes in the learned status.
> + *
> + * This function takes a lock, so ensure it is not called in the fast
> + * path.
> + */
> +static void fdb_set_flag_not_learned(struct net_bridge *br,
> +				     struct net_bridge_fdb_entry *fdb,
> +				     long nr)
> +{
> +	spin_lock_bh(&br->hash_lock);
> +
> +	__fdb_set_flag_not_learned(br, fdb, nr);
> +
> +	spin_unlock_bh(&br->hash_lock);
> +}
> +
>  /* When a static FDB entry is deleted, the HW address from that entry is
>   * also removed from the bridge private HW address list and updates all
>   * the ports with needed information.
> @@ -321,6 +353,8 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>  		       bool swdev_notify)
>  {
> +	bool learned = !(f->flags & BR_FDB_NOT_LEARNED_MASK);
> +
>  	trace_fdb_delete(br, f);
>  
>  	if (test_bit(BR_FDB_STATIC, &f->flags))
> @@ -329,11 +363,16 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>  	hlist_del_init_rcu(&f->fdb_node);
>  	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
>  			       br_fdb_rht_params);
> +	br->fdb_cur_learned_entries -= learned;
>  	fdb_notify(br, f, RTM_DELNEIGH, swdev_notify);
>  	call_rcu(&f->rcu, fdb_rcu_free);
>  }
>  
> -/* Delete a local entry if no other port had the same address. */
> +/* Delete a local entry if no other port had the same address.
> + *
> + * This function should only be called on entries with BR_FDB_LOCAL set,
> + * so clear_bit never removes the last bit in BR_FDB_NOT_LEARNED_MASK.
> + */
>  static void fdb_delete_local(struct net_bridge *br,
>  			     const struct net_bridge_port *p,
>  			     struct net_bridge_fdb_entry *f)
> @@ -390,6 +429,11 @@ static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br,
>  {
>  	struct net_bridge_fdb_entry *fdb;
>  	int err;
> +	bool learned = !(flags & BR_FDB_NOT_LEARNED_MASK);

https://www.kernel.org/doc/html/latest/process/maintainer-netdev.html#local-variable-ordering-reverse-xmas-tree-rcs

> +
> +	if (unlikely(learned && br->fdb_max_learned_entries &&
> +		     br->fdb_cur_learned_entries >= br->fdb_max_learned_entries))

This function is not run with RTNL held and so 'fdb_max_learned_entries'
can be updated concurrently from user space. You will need READ_ONCE() /
WRITE_ONCE() to silence KCSAN.

> +		return NULL;
>  
>  	fdb = kmem_cache_alloc(br_fdb_cache, GFP_ATOMIC);
>  	if (!fdb)
> @@ -409,6 +453,8 @@ static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br,
>  
>  	hlist_add_head_rcu(&fdb->fdb_node, &br->fdb_list);
>  
> +	br->fdb_cur_learned_entries += learned;

Don't remember seeing bool being added to a u32...

> +
>  	return fdb;
>  }
>  
> @@ -894,7 +940,7 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
>  			}
>  
>  			if (unlikely(test_bit(BR_FDB_ADDED_BY_USER, &flags)))
> -				set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
> +				fdb_set_flag_not_learned(br, fdb, BR_FDB_ADDED_BY_USER);
>  			if (unlikely(fdb_modified)) {
>  				trace_br_fdb_update(br, source, addr, vid, flags);
>  				fdb_notify(br, fdb, RTM_NEWNEIGH, true);
> @@ -1070,6 +1116,8 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
>  			modified = true;
>  		}
>  
> +		if (!(fdb->flags & BR_FDB_NOT_LEARNED_MASK))
> +			br->fdb_cur_learned_entries--;
>  		set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
>  	}
>  
> @@ -1440,10 +1488,10 @@ int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p,
>  		}
>  
>  		if (swdev_notify)
> -			set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
> +			__fdb_set_flag_not_learned(br, fdb, BR_FDB_ADDED_BY_USER);
>  
>  		if (!p)
> -			set_bit(BR_FDB_LOCAL, &fdb->flags);
> +			__fdb_set_flag_not_learned(br, fdb, BR_FDB_LOCAL);
>  
>  		if (modified)
>  			fdb_notify(br, fdb, RTM_NEWNEIGH, swdev_notify);
> @@ -1508,3 +1556,14 @@ void br_fdb_clear_offload(const struct net_device *dev, u16 vid)
>  	spin_unlock_bh(&p->br->hash_lock);
>  }
>  EXPORT_SYMBOL_GPL(br_fdb_clear_offload);
> +
> +u32 br_fdb_get_cur_learned_entries(struct net_bridge *br)
> +{
> +	u32 ret;
> +
> +	spin_lock_bh(&br->hash_lock);
> +	ret = br->fdb_cur_learned_entries;

I believe that for MDB Nik asked not to take a spin lock for each dump
and we used READ_ONCE() / WRITE_ONCE() instead.

> +	spin_unlock_bh(&br->hash_lock);
> +
> +	return ret;
> +}
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index 05c5863d2e20..954c468d52ec 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -1527,6 +1527,12 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
>  			return err;
>  	}
>  
> +	if (data[IFLA_BR_FDB_MAX_LEARNED_ENTRIES]) {

Please add this attribute to 'br_policy' and in a separate patch convert
the policy to use 'strict_start_type' so that future submissions will be
forced to update the policy. See c00041cf1cb82 for example.

I suggest writing a selftest to verify the expected behavior and to make
sure this feature does not regress. There are a lot of examples under
tools/testing/selftests/net/ and tools/testing/selftests/net/forwarding
in particular. You can post another version before writing a test, but
mark it as RFC so that it doesn't get applied by mistake.

Thanks

> +		u32 val = nla_get_u32(data[IFLA_BR_FDB_MAX_LEARNED_ENTRIES]);
> +
> +		br->fdb_max_learned_entries = val;
> +	}
> +
>  	return 0;
>  }
>  
> @@ -1581,6 +1587,8 @@ static size_t br_get_size(const struct net_device *brdev)
>  	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_TOPOLOGY_CHANGE_TIMER */
>  	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_GC_TIMER */
>  	       nla_total_size(ETH_ALEN) +       /* IFLA_BR_GROUP_ADDR */
> +	       nla_total_size(sizeof(u32)) +    /* IFLA_BR_FDB_CUR_LEARNED_ENTRIES */
> +	       nla_total_size(sizeof(u32)) +    /* IFLA_BR_FDB_MAX_LEARNED_ENTRIES */
>  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>  	       nla_total_size(sizeof(u8)) +     /* IFLA_BR_MCAST_ROUTER */
>  	       nla_total_size(sizeof(u8)) +     /* IFLA_BR_MCAST_SNOOPING */
> @@ -1620,6 +1628,7 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
>  	u32 stp_enabled = br->stp_enabled;
>  	u16 priority = (br->bridge_id.prio[0] << 8) | br->bridge_id.prio[1];
>  	u8 vlan_enabled = br_vlan_enabled(br->dev);
> +	u32 fdb_cur_learned_entries = br_fdb_get_cur_learned_entries(br);
>  	struct br_boolopt_multi bm;
>  	u64 clockval;
>  
> @@ -1656,7 +1665,9 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
>  	    nla_put_u8(skb, IFLA_BR_TOPOLOGY_CHANGE_DETECTED,
>  		       br->topology_change_detected) ||
>  	    nla_put(skb, IFLA_BR_GROUP_ADDR, ETH_ALEN, br->group_addr) ||
> -	    nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm))
> +	    nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm) ||
> +	    nla_put_u32(skb, IFLA_BR_FDB_CUR_LEARNED_ENTRIES, fdb_cur_learned_entries) ||
> +	    nla_put_u32(skb, IFLA_BR_FDB_MAX_LEARNED_ENTRIES, br->fdb_max_learned_entries))
>  		return -EMSGSIZE;
>  
>  #ifdef CONFIG_BRIDGE_VLAN_FILTERING
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 2119729ded2b..df079191479e 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -275,6 +275,8 @@ enum {
>  	BR_FDB_LOCKED,
>  };
>  
> +#define BR_FDB_NOT_LEARNED_MASK (BIT(BR_FDB_LOCAL) | BIT(BR_FDB_ADDED_BY_USER))
> +
>  struct net_bridge_fdb_key {
>  	mac_addr addr;
>  	u16 vlan_id;
> @@ -553,6 +555,9 @@ struct net_bridge {
>  	struct kobject			*ifobj;
>  	u32				auto_cnt;
>  
> +	u32				fdb_max_learned_entries;
> +	u32				fdb_cur_learned_entries;
> +
>  #ifdef CONFIG_NET_SWITCHDEV
>  	/* Counter used to make sure that hardware domains get unique
>  	 * identifiers in case a bridge spans multiple switchdev instances.
> @@ -838,6 +843,7 @@ int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p,
>  			      bool swdev_notify);
>  void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
>  			  const unsigned char *addr, u16 vid, bool offloaded);
> +u32 br_fdb_get_cur_learned_entries(struct net_bridge *br);
>  
>  /* br_forward.c */
>  enum br_pkt_type {
> -- 
> 2.40.1
>
Nikolay Aleksandrov June 20, 2023, 6:55 a.m. UTC | #2
On 6/19/23 10:14, Johannes Nixdorf wrote:
> A malicious actor behind one bridge port may spam the kernel with packets
> with a random source MAC address, each of which will create an FDB entry,
> each of which is a dynamic allocation in the kernel.
> 
> There are roughly 2^48 different MAC addresses, further limited by the
> rhashtable they are stored in to 2^31. Each entry is of the type struct
> net_bridge_fdb_entry, which is currently 128 bytes big. This means the
> maximum amount of memory allocated for FDB entries is 2^31 * 128B =
> 256GiB, which is too much for most computers.
> 
> Mitigate this by adding a bridge netlink setting
> IFLA_BR_FDB_MAX_LEARNED_ENTRIES, which, if nonzero, limits the amount
> of learned entries to a user specified maximum.
> 
> For backwards compatibility the default setting of 0 disables the limit.
> 
> User-added entries by netlink or from bridge or bridge port addresses
> are never blocked and do not count towards that limit.
> 
> All changes to fdb_n_entries are under br->hash_lock, which means we do
> not need additional locking. The call paths are (✓ denotes that
> br->hash_lock is taken around the next call):
> 
>   - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓
>                  |                     +- br_fdb_change_mac_address ✓
>                  |                     +- br_fdb_delete_by_port ✓
>                  +- br_fdb_find_delete_local ✓
>                  +- fdb_add_local <-+- br_fdb_changeaddr ✓
>                  |                  +- br_fdb_change_mac_address ✓
>                  |                  +- br_fdb_add_local ✓
>                  +- br_fdb_cleanup ✓
>                  +- br_fdb_flush ✓
>                  +- br_fdb_delete_by_port ✓
>                  +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓
>                  +- br_fdb_external_learn_del ✓
>   - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓
>                  |                  +- br_fdb_change_mac_address ✓
>                  |                  +- br_fdb_add_local ✓
>                  +- br_fdb_update ✓
>                  +- fdb_add_entry <--- __br_fdb_add ✓
>                  +- br_fdb_external_learn_add ✓
> 
> The flags that imply an entry does not come from learning
> (BR_FDB_NOT_LEARNED_MASK) are now only set or cleared under br->hash_lock
> as well, and when the boolean value of (fdb->flags &
> BR_FDB_NOT_LEARNED_MASK) changes the accounting is updated.
> 
> This introduces one additional locked update in br_fdb_update if
> BR_FDB_ADDED_BY_USER was set. This is only the case when creating a new
> entry via netlink, and never in the packet handling fast path.
> 
> Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
> 
> ---
> 
> Changes since v1:
>   - Do not initialize fdb_*_entries to 0. (from review)
>   - Do not skip decrementing on 0. (from review)
>   - Moved the counters to a conditional hole in struct net_bridge to
>     avoid growing the struct. (from review, it still grows the struct as
>     there are 2 32-bit values)
>   - Add IFLA_BR_FDB_CUR_LEARNED_ENTRIES (from review)
>   - Fix br_get_size()
>   - Only limit learned entries, rename to
>     *_(CUR|MAX)_LEARNED_ENTRIES. (from review)
> 
> Obsolete v1 review comments:
>   - Return better errors to users: Due to limiting the limit to
>     automatically created entries, netlink fdb add requests and changing
>     bridge ports are never rejected, so they do not yet need a more
>     friendly error returned.
> 
>   include/uapi/linux/if_link.h |  2 ++
>   net/bridge/br_fdb.c          | 67 +++++++++++++++++++++++++++++++++---
>   net/bridge/br_netlink.c      | 13 ++++++-
>   net/bridge/br_private.h      |  6 ++++
>   4 files changed, 83 insertions(+), 5 deletions(-)
> 
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 4ac1000b0ef2..165b9014379b 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -510,6 +510,8 @@ enum {
>   	IFLA_BR_VLAN_STATS_PER_PORT,
>   	IFLA_BR_MULTI_BOOLOPT,
>   	IFLA_BR_MCAST_QUERIER_STATE,
> +	IFLA_BR_FDB_CUR_LEARNED_ENTRIES,
> +	IFLA_BR_FDB_MAX_LEARNED_ENTRIES,
>   	__IFLA_BR_MAX,
>   };
>   
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index ac1dc8723b9c..bc61d1fd5fcf 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -301,6 +301,38 @@ static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
>   	}
>   }
>   
> +/* Set a FDB flag that implies the entry was not learned, and account
> + * for changes in the learned status.
> + */
> +static void __fdb_set_flag_not_learned(struct net_bridge *br,
> +				       struct net_bridge_fdb_entry *fdb,
> +				       long nr)
> +{
> +	WARN_ON_ONCE(!(BIT(nr) & BR_FDB_NOT_LEARNED_MASK));

Please use *_bit

> +
> +	/* learned before, but we set a flag that implies it's manually added */
> +	if (!(fdb->flags & BR_FDB_NOT_LEARNED_MASK))

Please use *_bit

> +		br->fdb_cur_learned_entries--;
> +	set_bit(nr, &fdb->flags);
> +}

Having a helper that conditionally decrements only is counterintuitive 
and people can get confused. Either add 2 helpers for inc/dec and use
them where appropriate or don't use helpers at all.

> +
> +/* Set a FDB flag that implies the entry was not learned, and account
> + * for changes in the learned status.
> + *
> + * This function takes a lock, so ensure it is not called in the fast
> + * path.
> + */
> +static void fdb_set_flag_not_learned(struct net_bridge *br,
> +				     struct net_bridge_fdb_entry *fdb,
> +				     long nr)
> +{
> +	spin_lock_bh(&br->hash_lock);
> +
> +	__fdb_set_flag_not_learned(br, fdb, nr);
> +

Unnecessary empty lines

> +	spin_unlock_bh(&br->hash_lock);
> +}
> +
>   /* When a static FDB entry is deleted, the HW address from that entry is
>    * also removed from the bridge private HW address list and updates all
>    * the ports with needed information.
> @@ -321,6 +353,8 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>   static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>   		       bool swdev_notify)
>   {
> +	bool learned = !(f->flags & BR_FDB_NOT_LEARNED_MASK);

*_bit

> +
>   	trace_fdb_delete(br, f);
>   
>   	if (test_bit(BR_FDB_STATIC, &f->flags))
> @@ -329,11 +363,16 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>   	hlist_del_init_rcu(&f->fdb_node);
>   	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
>   			       br_fdb_rht_params);
> +	br->fdb_cur_learned_entries -= learned;

This looks confusing because of the conditional above, as I mentioned 
please make it explicit.

>   	fdb_notify(br, f, RTM_DELNEIGH, swdev_notify);
>   	call_rcu(&f->rcu, fdb_rcu_free);
>   }
>   
> -/* Delete a local entry if no other port had the same address. */
> +/* Delete a local entry if no other port had the same address.
> + *
> + * This function should only be called on entries with BR_FDB_LOCAL set,
> + * so clear_bit never removes the last bit in BR_FDB_NOT_LEARNED_MASK.
> + */
>   static void fdb_delete_local(struct net_bridge *br,
>   			     const struct net_bridge_port *p,
>   			     struct net_bridge_fdb_entry *f)
> @@ -390,6 +429,11 @@ static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br,
>   {
>   	struct net_bridge_fdb_entry *fdb;
>   	int err;
> +	bool learned = !(flags & BR_FDB_NOT_LEARNED_MASK);
> +

*_bit and also use reverse xmas tree order for local variables

> +	if (unlikely(learned && br->fdb_max_learned_entries &&
> +		     br->fdb_cur_learned_entries >= br->fdb_max_learned_entries))
> +		return NULL;
>   
>   	fdb = kmem_cache_alloc(br_fdb_cache, GFP_ATOMIC);
>   	if (!fdb)
> @@ -409,6 +453,8 @@ static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br,
>   
>   	hlist_add_head_rcu(&fdb->fdb_node, &br->fdb_list);
>   
> +	br->fdb_cur_learned_entries += learned;

I'd prefer to be explicit heere instead of adding a bool

> +
>   	return fdb;
>   }
>   
> @@ -894,7 +940,7 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
>   			}
>   
>   			if (unlikely(test_bit(BR_FDB_ADDED_BY_USER, &flags)))
> -				set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
> +				fdb_set_flag_not_learned(br, fdb, BR_FDB_ADDED_BY_USER);

Unacceptable to take hash_lock and block all learning here, eventual 
consistency is ok or some other method that is much lighter and doesn't
block all learning or requires a lock.

>   			if (unlikely(fdb_modified)) {
>   				trace_br_fdb_update(br, source, addr, vid, flags);
>   				fdb_notify(br, fdb, RTM_NEWNEIGH, true);
> @@ -1070,6 +1116,8 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
>   			modified = true;
>   		}
>   
> +		if (!(fdb->flags & BR_FDB_NOT_LEARNED_MASK))

*_bit

> +			br->fdb_cur_learned_entries--; >   		set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);

This looks like open-coded version of the helper above.

>   	}
>   
> @@ -1440,10 +1488,10 @@ int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p,
>   		}
>   
>   		if (swdev_notify)
> -			set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
> +			__fdb_set_flag_not_learned(br, fdb, BR_FDB_ADDED_BY_USER);
>   
>   		if (!p)
> -			set_bit(BR_FDB_LOCAL, &fdb->flags);
> +			__fdb_set_flag_not_learned(br, fdb, BR_FDB_LOCAL);
>   
>   		if (modified)
>   			fdb_notify(br, fdb, RTM_NEWNEIGH, swdev_notify);
> @@ -1508,3 +1556,14 @@ void br_fdb_clear_offload(const struct net_device *dev, u16 vid)
>   	spin_unlock_bh(&p->br->hash_lock);
>   }
>   EXPORT_SYMBOL_GPL(br_fdb_clear_offload);
> +
> +u32 br_fdb_get_cur_learned_entries(struct net_bridge *br)
> +{
> +	u32 ret;
> +
> +	spin_lock_bh(&br->hash_lock);
> +	ret = br->fdb_cur_learned_entries;
> +	spin_unlock_bh(&br->hash_lock);
> +
> +	return ret;
> +}

It's unnecessary to take lock to read the value, it'd be better to have
inaccurate result and read it without locking and blocking learning.

> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index 05c5863d2e20..954c468d52ec 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -1527,6 +1527,12 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
>   			return err;
>   	}
>   
> +	if (data[IFLA_BR_FDB_MAX_LEARNED_ENTRIES]) {
> +		u32 val = nla_get_u32(data[IFLA_BR_FDB_MAX_LEARNED_ENTRIES]);
> +
> +		br->fdb_max_learned_entries = val;
> +	}

Here you change the value without any locking, while you check it with 
locks held - kcsan won't be happy.

> +
>   	return 0;
>   }
>   
> @@ -1581,6 +1587,8 @@ static size_t br_get_size(const struct net_device *brdev)
>   	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_TOPOLOGY_CHANGE_TIMER */
>   	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_GC_TIMER */
>   	       nla_total_size(ETH_ALEN) +       /* IFLA_BR_GROUP_ADDR */
> +	       nla_total_size(sizeof(u32)) +    /* IFLA_BR_FDB_CUR_LEARNED_ENTRIES */
> +	       nla_total_size(sizeof(u32)) +    /* IFLA_BR_FDB_MAX_LEARNED_ENTRIES */
>   #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>   	       nla_total_size(sizeof(u8)) +     /* IFLA_BR_MCAST_ROUTER */
>   	       nla_total_size(sizeof(u8)) +     /* IFLA_BR_MCAST_SNOOPING */
> @@ -1620,6 +1628,7 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
>   	u32 stp_enabled = br->stp_enabled;
>   	u16 priority = (br->bridge_id.prio[0] << 8) | br->bridge_id.prio[1];
>   	u8 vlan_enabled = br_vlan_enabled(br->dev);
> +	u32 fdb_cur_learned_entries = br_fdb_get_cur_learned_entries(br);
>   	struct br_boolopt_multi bm;
>   	u64 clockval;
>   
> @@ -1656,7 +1665,9 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
>   	    nla_put_u8(skb, IFLA_BR_TOPOLOGY_CHANGE_DETECTED,
>   		       br->topology_change_detected) ||
>   	    nla_put(skb, IFLA_BR_GROUP_ADDR, ETH_ALEN, br->group_addr) ||
> -	    nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm))
> +	    nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm) ||
> +	    nla_put_u32(skb, IFLA_BR_FDB_CUR_LEARNED_ENTRIES, fdb_cur_learned_entries) ||
> +	    nla_put_u32(skb, IFLA_BR_FDB_MAX_LEARNED_ENTRIES, br->fdb_max_learned_entries))

I'd say export these only if max entries is > 0

>   		return -EMSGSIZE;
>   
>   #ifdef CONFIG_BRIDGE_VLAN_FILTERING
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 2119729ded2b..df079191479e 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -275,6 +275,8 @@ enum {
>   	BR_FDB_LOCKED,
>   };
>   
> +#define BR_FDB_NOT_LEARNED_MASK (BIT(BR_FDB_LOCAL) | BIT(BR_FDB_ADDED_BY_USER))

Not learned sounds confusing and doesn't accurately describe the entry.
BR_FDB_DYNAMIC_LEARNED perhaps or some other name, that doesn't cause
double negatives (not not learned).

> +
>   struct net_bridge_fdb_key {
>   	mac_addr addr;
>   	u16 vlan_id;
> @@ -553,6 +555,9 @@ struct net_bridge {
>   	struct kobject			*ifobj;
>   	u32				auto_cnt;
>   
> +	u32				fdb_max_learned_entries;
> +	u32				fdb_cur_learned_entries;
> +
>   #ifdef CONFIG_NET_SWITCHDEV
>   	/* Counter used to make sure that hardware domains get unique
>   	 * identifiers in case a bridge spans multiple switchdev instances.
> @@ -838,6 +843,7 @@ int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p,
>   			      bool swdev_notify);
>   void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
>   			  const unsigned char *addr, u16 vid, bool offloaded);
> +u32 br_fdb_get_cur_learned_entries(struct net_bridge *br);
>   
>   /* br_forward.c */
>   enum br_pkt_type {
Johannes Nixdorf June 20, 2023, 1:35 p.m. UTC | #3
On Tue, Jun 20, 2023 at 09:55:31AM +0300, Nikolay Aleksandrov wrote:
> On 6/19/23 10:14, Johannes Nixdorf wrote:
> > +/* Set a FDB flag that implies the entry was not learned, and account
> > + * for changes in the learned status.
> > + */
> > +static void __fdb_set_flag_not_learned(struct net_bridge *br,
> > +				       struct net_bridge_fdb_entry *fdb,
> > +				       long nr)
> > +{
> > +	WARN_ON_ONCE(!(BIT(nr) & BR_FDB_NOT_LEARNED_MASK));
> 
> Please use *_bit

Can you tell me which *_bit helper you had in mind? The shortest option I could
come up with the ones I found seemed needlessly verbose and wasteful:

  static const unsigned long br_fdb_not_learned_mask = BR_FDB_NOT_LEARNED_MASK;
  ...
  WARN_ON_ONCE(test_bit(nr, &br_fdb_not_learned_mask));

> > +
> > +	/* learned before, but we set a flag that implies it's manually added */
> > +	if (!(fdb->flags & BR_FDB_NOT_LEARNED_MASK))
> 
> Please use *_bit

This will be fixed by the redesign to get rid of my use of hash_lock
(proposed later in this mail), as I'll only have to test one bit and can
use test_and_clear_bit then.

> > +		br->fdb_cur_learned_entries--;
> > +	set_bit(nr, &fdb->flags);
> > +}
> 
> Having a helper that conditionally decrements only is counterintuitive and
> people can get confused. Either add 2 helpers for inc/dec and use
> them where appropriate or don't use helpers at all.

The *_set_bit helper can only cause the count to drop, as there
is currently no flag that could turn a manually added entry back into
a dynamically learned one.

The analogous helper that increments the value would be *_clear_bit,
which I did not add because it has no users.

> > +	spin_unlock_bh(&br->hash_lock);
> > +}
> > +
> >   /* When a static FDB entry is deleted, the HW address from that entry is
> >    * also removed from the bridge private HW address list and updates all
> >    * the ports with needed information.
> > @@ -321,6 +353,8 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
> >   static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
> >   		       bool swdev_notify)
> >   {
> > +	bool learned = !(f->flags & BR_FDB_NOT_LEARNED_MASK);
> 
> *_bit

I do not know a *_bit helper that would help me test the intersection
of multiple bits on both sides. Do you have any in mind?

> > +
> >   	return fdb;
> >   }
> > @@ -894,7 +940,7 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
> >   			}
> >   			if (unlikely(test_bit(BR_FDB_ADDED_BY_USER, &flags)))
> > -				set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
> > +				fdb_set_flag_not_learned(br, fdb, BR_FDB_ADDED_BY_USER);
> 
> Unacceptable to take hash_lock and block all learning here, eventual
> consistency is ok or some other method that is much lighter and doesn't
> block all learning or requires a lock.

At the time of writing v2, this seemed difficult because we want to test
multiple bits and increment a counter, but remembering that clear_bit
is never called for the bits I care about I came up with the following
approach:

  a) Add a new flag BR_FDB_DYNAMIC_LEARNED, which is set to 1 iff
     BR_FDB_ADDED_BY_USER or BR_FDB_LOCAL are set in br_create.
     Every time BR_FDB_ADDED_BY_USER or BR_FDB_LOCAL is set, also clear
     BR_FDB_DYNAMIC_LEARNED, and decrement the count if it was 1 before.
     This solves the problem of testing two bits at once, and would not
     have been possible if we had a code path that could clear both bits,
     as it is not as easy to decide when to set BR_FDB_DYNAMIC_LEARNED
     again in that case.
  b) Replace the current count with an atomic_t.

I'll change it this way for v3.

> >   		return -EMSGSIZE;
> >   #ifdef CONFIG_BRIDGE_VLAN_FILTERING
> > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> > index 2119729ded2b..df079191479e 100644
> > --- a/net/bridge/br_private.h
> > +++ b/net/bridge/br_private.h
> > @@ -275,6 +275,8 @@ enum {
> >   	BR_FDB_LOCKED,
> >   };
> > +#define BR_FDB_NOT_LEARNED_MASK (BIT(BR_FDB_LOCAL) | BIT(BR_FDB_ADDED_BY_USER))
> 
> Not learned sounds confusing and doesn't accurately describe the entry.
> BR_FDB_DYNAMIC_LEARNED perhaps or some other name, that doesn't cause
> double negatives (not not learned).

Your proposal would not have captured the mask, as it describes all the
opposite cases, which were _not_ dynamically learned.

But with the proposed new flag from the hash_lock comment we can trivially
flip the meaning, so I went with your proposed name there.
Nikolay Aleksandrov June 22, 2023, 12:27 p.m. UTC | #4
On 20/06/2023 16:35, Johannes Nixdorf wrote:
> On Tue, Jun 20, 2023 at 09:55:31AM +0300, Nikolay Aleksandrov wrote:
>> On 6/19/23 10:14, Johannes Nixdorf wrote:
>>> +/* Set a FDB flag that implies the entry was not learned, and account
>>> + * for changes in the learned status.
>>> + */
>>> +static void __fdb_set_flag_not_learned(struct net_bridge *br,
>>> +				       struct net_bridge_fdb_entry *fdb,
>>> +				       long nr)
>>> +{
>>> +	WARN_ON_ONCE(!(BIT(nr) & BR_FDB_NOT_LEARNED_MASK));
>>
>> Please use *_bit
> 
> Can you tell me which *_bit helper you had in mind? The shortest option I could
> come up with the ones I found seemed needlessly verbose and wasteful:
> 
>   static const unsigned long br_fdb_not_learned_mask = BR_FDB_NOT_LEARNED_MASK;
>   ...
>   WARN_ON_ONCE(test_bit(nr, &br_fdb_not_learned_mask));
> 
>>> +
>>> +	/* learned before, but we set a flag that implies it's manually added */
>>> +	if (!(fdb->flags & BR_FDB_NOT_LEARNED_MASK))
>>
>> Please use *_bit
> 
> This will be fixed by the redesign to get rid of my use of hash_lock
> (proposed later in this mail), as I'll only have to test one bit and can
> use test_and_clear_bit then.
> 
>>> +		br->fdb_cur_learned_entries--;
>>> +	set_bit(nr, &fdb->flags);
>>> +}
>>
>> Having a helper that conditionally decrements only is counterintuitive and
>> people can get confused. Either add 2 helpers for inc/dec and use
>> them where appropriate or don't use helpers at all.
> 
> The *_set_bit helper can only cause the count to drop, as there
> is currently no flag that could turn a manually added entry back into
> a dynamically learned one.
> 
> The analogous helper that increments the value would be *_clear_bit,
> which I did not add because it has no users.
> 
>>> +	spin_unlock_bh(&br->hash_lock);
>>> +}
>>> +
>>>   /* When a static FDB entry is deleted, the HW address from that entry is
>>>    * also removed from the bridge private HW address list and updates all
>>>    * the ports with needed information.
>>> @@ -321,6 +353,8 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>   static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>>>   		       bool swdev_notify)
>>>   {
>>> +	bool learned = !(f->flags & BR_FDB_NOT_LEARNED_MASK);
>>
>> *_bit
> 
> I do not know a *_bit helper that would help me test the intersection
> of multiple bits on both sides. Do you have any in mind?
> 
>>> +
>>>   	return fdb;
>>>   }
>>> @@ -894,7 +940,7 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
>>>   			}
>>>   			if (unlikely(test_bit(BR_FDB_ADDED_BY_USER, &flags)))
>>> -				set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
>>> +				fdb_set_flag_not_learned(br, fdb, BR_FDB_ADDED_BY_USER);
>>
>> Unacceptable to take hash_lock and block all learning here, eventual
>> consistency is ok or some other method that is much lighter and doesn't
>> block all learning or requires a lock.
> 
> At the time of writing v2, this seemed difficult because we want to test
> multiple bits and increment a counter, but remembering that clear_bit
> is never called for the bits I care about I came up with the following
> approach:
> 
>   a) Add a new flag BR_FDB_DYNAMIC_LEARNED, which is set to 1 iff
>      BR_FDB_ADDED_BY_USER or BR_FDB_LOCAL are set in br_create.
>      Every time BR_FDB_ADDED_BY_USER or BR_FDB_LOCAL is set, also clear
>      BR_FDB_DYNAMIC_LEARNED, and decrement the count if it was 1 before.
>      This solves the problem of testing two bits at once, and would not
>      have been possible if we had a code path that could clear both bits,
>      as it is not as easy to decide when to set BR_FDB_DYNAMIC_LEARNED
>      again in that case.

I think you can try without adding any new flags, the places that add dynamic
entries are known for the inc part of the problem, and an entry can become
local/added_by_user again only through well known paths as well. You may be able to
infer whether to inc/dec and make it work with careful fn argument passing.
Could you please look into that way? I'd prefer that we don't add new flags as
there are already so many.

>   b) Replace the current count with an atomic_t.
> 

Sounds good.

> I'll change it this way for v3.
> 
>>>   		return -EMSGSIZE;
>>>   #ifdef CONFIG_BRIDGE_VLAN_FILTERING
>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>>> index 2119729ded2b..df079191479e 100644
>>> --- a/net/bridge/br_private.h
>>> +++ b/net/bridge/br_private.h
>>> @@ -275,6 +275,8 @@ enum {
>>>   	BR_FDB_LOCKED,
>>>   };
>>> +#define BR_FDB_NOT_LEARNED_MASK (BIT(BR_FDB_LOCAL) | BIT(BR_FDB_ADDED_BY_USER))
>>
>> Not learned sounds confusing and doesn't accurately describe the entry.
>> BR_FDB_DYNAMIC_LEARNED perhaps or some other name, that doesn't cause
>> double negatives (not not learned).
> 
> Your proposal would not have captured the mask, as it describes all the
> opposite cases, which were _not_ dynamically learned.
> 
> But with the proposed new flag from the hash_lock comment we can trivially
> flip the meaning, so I went with your proposed name there.
Nikolay Aleksandrov June 22, 2023, 12:39 p.m. UTC | #5
On 22/06/2023 15:27, Nikolay Aleksandrov wrote:
> On 20/06/2023 16:35, Johannes Nixdorf wrote:
>> On Tue, Jun 20, 2023 at 09:55:31AM +0300, Nikolay Aleksandrov wrote:
>>> On 6/19/23 10:14, Johannes Nixdorf wrote:
>>>> +/* Set a FDB flag that implies the entry was not learned, and account
>>>> + * for changes in the learned status.
>>>> + */
>>>> +static void __fdb_set_flag_not_learned(struct net_bridge *br,
>>>> +				       struct net_bridge_fdb_entry *fdb,
>>>> +				       long nr)
>>>> +{
>>>> +	WARN_ON_ONCE(!(BIT(nr) & BR_FDB_NOT_LEARNED_MASK));
>>>
>>> Please use *_bit
>>
>> Can you tell me which *_bit helper you had in mind? The shortest option I could
>> come up with the ones I found seemed needlessly verbose and wasteful:
>>
>>   static const unsigned long br_fdb_not_learned_mask = BR_FDB_NOT_LEARNED_MASK;
>>   ...
>>   WARN_ON_ONCE(test_bit(nr, &br_fdb_not_learned_mask));
>>
>>>> +
>>>> +	/* learned before, but we set a flag that implies it's manually added */
>>>> +	if (!(fdb->flags & BR_FDB_NOT_LEARNED_MASK))
>>>
>>> Please use *_bit
>>
>> This will be fixed by the redesign to get rid of my use of hash_lock
>> (proposed later in this mail), as I'll only have to test one bit and can
>> use test_and_clear_bit then.
>>
>>>> +		br->fdb_cur_learned_entries--;
>>>> +	set_bit(nr, &fdb->flags);
>>>> +}
>>>
>>> Having a helper that conditionally decrements only is counterintuitive and
>>> people can get confused. Either add 2 helpers for inc/dec and use
>>> them where appropriate or don't use helpers at all.
>>
>> The *_set_bit helper can only cause the count to drop, as there
>> is currently no flag that could turn a manually added entry back into
>> a dynamically learned one.
>>
>> The analogous helper that increments the value would be *_clear_bit,
>> which I did not add because it has no users.
>>
>>>> +	spin_unlock_bh(&br->hash_lock);
>>>> +}
>>>> +
>>>>   /* When a static FDB entry is deleted, the HW address from that entry is
>>>>    * also removed from the bridge private HW address list and updates all
>>>>    * the ports with needed information.
>>>> @@ -321,6 +353,8 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>>   static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>>>>   		       bool swdev_notify)
>>>>   {
>>>> +	bool learned = !(f->flags & BR_FDB_NOT_LEARNED_MASK);
>>>
>>> *_bit
>>
>> I do not know a *_bit helper that would help me test the intersection
>> of multiple bits on both sides. Do you have any in mind?
>>
>>>> +
>>>>   	return fdb;
>>>>   }
>>>> @@ -894,7 +940,7 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
>>>>   			}
>>>>   			if (unlikely(test_bit(BR_FDB_ADDED_BY_USER, &flags)))
>>>> -				set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
>>>> +				fdb_set_flag_not_learned(br, fdb, BR_FDB_ADDED_BY_USER);
>>>
>>> Unacceptable to take hash_lock and block all learning here, eventual
>>> consistency is ok or some other method that is much lighter and doesn't
>>> block all learning or requires a lock.
>>
>> At the time of writing v2, this seemed difficult because we want to test
>> multiple bits and increment a counter, but remembering that clear_bit
>> is never called for the bits I care about I came up with the following
>> approach:
>>
>>   a) Add a new flag BR_FDB_DYNAMIC_LEARNED, which is set to 1 iff
>>      BR_FDB_ADDED_BY_USER or BR_FDB_LOCAL are set in br_create.
>>      Every time BR_FDB_ADDED_BY_USER or BR_FDB_LOCAL is set, also clear
>>      BR_FDB_DYNAMIC_LEARNED, and decrement the count if it was 1 before.
>>      This solves the problem of testing two bits at once, and would not
>>      have been possible if we had a code path that could clear both bits,
>>      as it is not as easy to decide when to set BR_FDB_DYNAMIC_LEARNED
>>      again in that case.
> 
> I think you can try without adding any new flags, the places that add dynamic
> entries are known for the inc part of the problem, and an entry can become
> local/added_by_user again only through well known paths as well. You may be able to
> infer whether to inc/dec and make it work with careful fn argument passing.
> Could you please look into that way? I'd prefer that we don't add new flags as
> there are already so many.
> 

To clarify  - just look into it if it is possible and looks sane, if not do go
ahead with the new flag.

>>   b) Replace the current count with an atomic_t.
>>
> 
> Sounds good.
> 
>> I'll change it this way for v3.
>>
>>>>   		return -EMSGSIZE;
>>>>   #ifdef CONFIG_BRIDGE_VLAN_FILTERING
>>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>>>> index 2119729ded2b..df079191479e 100644
>>>> --- a/net/bridge/br_private.h
>>>> +++ b/net/bridge/br_private.h
>>>> @@ -275,6 +275,8 @@ enum {
>>>>   	BR_FDB_LOCKED,
>>>>   };
>>>> +#define BR_FDB_NOT_LEARNED_MASK (BIT(BR_FDB_LOCAL) | BIT(BR_FDB_ADDED_BY_USER))
>>>
>>> Not learned sounds confusing and doesn't accurately describe the entry.
>>> BR_FDB_DYNAMIC_LEARNED perhaps or some other name, that doesn't cause
>>> double negatives (not not learned).
>>
>> Your proposal would not have captured the mask, as it describes all the
>> opposite cases, which were _not_ dynamically learned.
>>
>> But with the proposed new flag from the hash_lock comment we can trivially
>> flip the meaning, so I went with your proposed name there.
>
diff mbox series

Patch

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 4ac1000b0ef2..165b9014379b 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -510,6 +510,8 @@  enum {
 	IFLA_BR_VLAN_STATS_PER_PORT,
 	IFLA_BR_MULTI_BOOLOPT,
 	IFLA_BR_MCAST_QUERIER_STATE,
+	IFLA_BR_FDB_CUR_LEARNED_ENTRIES,
+	IFLA_BR_FDB_MAX_LEARNED_ENTRIES,
 	__IFLA_BR_MAX,
 };
 
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index ac1dc8723b9c..bc61d1fd5fcf 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -301,6 +301,38 @@  static void fdb_add_hw_addr(struct net_bridge *br, const unsigned char *addr)
 	}
 }
 
+/* Set a FDB flag that implies the entry was not learned, and account
+ * for changes in the learned status.
+ */
+static void __fdb_set_flag_not_learned(struct net_bridge *br,
+				       struct net_bridge_fdb_entry *fdb,
+				       long nr)
+{
+	WARN_ON_ONCE(!(BIT(nr) & BR_FDB_NOT_LEARNED_MASK));
+
+	/* learned before, but we set a flag that implies it's manually added */
+	if (!(fdb->flags & BR_FDB_NOT_LEARNED_MASK))
+		br->fdb_cur_learned_entries--;
+	set_bit(nr, &fdb->flags);
+}
+
+/* Set a FDB flag that implies the entry was not learned, and account
+ * for changes in the learned status.
+ *
+ * This function takes a lock, so ensure it is not called in the fast
+ * path.
+ */
+static void fdb_set_flag_not_learned(struct net_bridge *br,
+				     struct net_bridge_fdb_entry *fdb,
+				     long nr)
+{
+	spin_lock_bh(&br->hash_lock);
+
+	__fdb_set_flag_not_learned(br, fdb, nr);
+
+	spin_unlock_bh(&br->hash_lock);
+}
+
 /* When a static FDB entry is deleted, the HW address from that entry is
  * also removed from the bridge private HW address list and updates all
  * the ports with needed information.
@@ -321,6 +353,8 @@  static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
 static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
 		       bool swdev_notify)
 {
+	bool learned = !(f->flags & BR_FDB_NOT_LEARNED_MASK);
+
 	trace_fdb_delete(br, f);
 
 	if (test_bit(BR_FDB_STATIC, &f->flags))
@@ -329,11 +363,16 @@  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
 	hlist_del_init_rcu(&f->fdb_node);
 	rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
 			       br_fdb_rht_params);
+	br->fdb_cur_learned_entries -= learned;
 	fdb_notify(br, f, RTM_DELNEIGH, swdev_notify);
 	call_rcu(&f->rcu, fdb_rcu_free);
 }
 
-/* Delete a local entry if no other port had the same address. */
+/* Delete a local entry if no other port had the same address.
+ *
+ * This function should only be called on entries with BR_FDB_LOCAL set,
+ * so clear_bit never removes the last bit in BR_FDB_NOT_LEARNED_MASK.
+ */
 static void fdb_delete_local(struct net_bridge *br,
 			     const struct net_bridge_port *p,
 			     struct net_bridge_fdb_entry *f)
@@ -390,6 +429,11 @@  static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br,
 {
 	struct net_bridge_fdb_entry *fdb;
 	int err;
+	bool learned = !(flags & BR_FDB_NOT_LEARNED_MASK);
+
+	if (unlikely(learned && br->fdb_max_learned_entries &&
+		     br->fdb_cur_learned_entries >= br->fdb_max_learned_entries))
+		return NULL;
 
 	fdb = kmem_cache_alloc(br_fdb_cache, GFP_ATOMIC);
 	if (!fdb)
@@ -409,6 +453,8 @@  static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br,
 
 	hlist_add_head_rcu(&fdb->fdb_node, &br->fdb_list);
 
+	br->fdb_cur_learned_entries += learned;
+
 	return fdb;
 }
 
@@ -894,7 +940,7 @@  void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
 			}
 
 			if (unlikely(test_bit(BR_FDB_ADDED_BY_USER, &flags)))
-				set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
+				fdb_set_flag_not_learned(br, fdb, BR_FDB_ADDED_BY_USER);
 			if (unlikely(fdb_modified)) {
 				trace_br_fdb_update(br, source, addr, vid, flags);
 				fdb_notify(br, fdb, RTM_NEWNEIGH, true);
@@ -1070,6 +1116,8 @@  static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,
 			modified = true;
 		}
 
+		if (!(fdb->flags & BR_FDB_NOT_LEARNED_MASK))
+			br->fdb_cur_learned_entries--;
 		set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
 	}
 
@@ -1440,10 +1488,10 @@  int br_fdb_external_learn_add(struct net_bridge *br, struct net_bridge_port *p,
 		}
 
 		if (swdev_notify)
-			set_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
+			__fdb_set_flag_not_learned(br, fdb, BR_FDB_ADDED_BY_USER);
 
 		if (!p)
-			set_bit(BR_FDB_LOCAL, &fdb->flags);
+			__fdb_set_flag_not_learned(br, fdb, BR_FDB_LOCAL);
 
 		if (modified)
 			fdb_notify(br, fdb, RTM_NEWNEIGH, swdev_notify);
@@ -1508,3 +1556,14 @@  void br_fdb_clear_offload(const struct net_device *dev, u16 vid)
 	spin_unlock_bh(&p->br->hash_lock);
 }
 EXPORT_SYMBOL_GPL(br_fdb_clear_offload);
+
+u32 br_fdb_get_cur_learned_entries(struct net_bridge *br)
+{
+	u32 ret;
+
+	spin_lock_bh(&br->hash_lock);
+	ret = br->fdb_cur_learned_entries;
+	spin_unlock_bh(&br->hash_lock);
+
+	return ret;
+}
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 05c5863d2e20..954c468d52ec 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1527,6 +1527,12 @@  static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
 			return err;
 	}
 
+	if (data[IFLA_BR_FDB_MAX_LEARNED_ENTRIES]) {
+		u32 val = nla_get_u32(data[IFLA_BR_FDB_MAX_LEARNED_ENTRIES]);
+
+		br->fdb_max_learned_entries = val;
+	}
+
 	return 0;
 }
 
@@ -1581,6 +1587,8 @@  static size_t br_get_size(const struct net_device *brdev)
 	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_TOPOLOGY_CHANGE_TIMER */
 	       nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_GC_TIMER */
 	       nla_total_size(ETH_ALEN) +       /* IFLA_BR_GROUP_ADDR */
+	       nla_total_size(sizeof(u32)) +    /* IFLA_BR_FDB_CUR_LEARNED_ENTRIES */
+	       nla_total_size(sizeof(u32)) +    /* IFLA_BR_FDB_MAX_LEARNED_ENTRIES */
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	       nla_total_size(sizeof(u8)) +     /* IFLA_BR_MCAST_ROUTER */
 	       nla_total_size(sizeof(u8)) +     /* IFLA_BR_MCAST_SNOOPING */
@@ -1620,6 +1628,7 @@  static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
 	u32 stp_enabled = br->stp_enabled;
 	u16 priority = (br->bridge_id.prio[0] << 8) | br->bridge_id.prio[1];
 	u8 vlan_enabled = br_vlan_enabled(br->dev);
+	u32 fdb_cur_learned_entries = br_fdb_get_cur_learned_entries(br);
 	struct br_boolopt_multi bm;
 	u64 clockval;
 
@@ -1656,7 +1665,9 @@  static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev)
 	    nla_put_u8(skb, IFLA_BR_TOPOLOGY_CHANGE_DETECTED,
 		       br->topology_change_detected) ||
 	    nla_put(skb, IFLA_BR_GROUP_ADDR, ETH_ALEN, br->group_addr) ||
-	    nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm))
+	    nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm) ||
+	    nla_put_u32(skb, IFLA_BR_FDB_CUR_LEARNED_ENTRIES, fdb_cur_learned_entries) ||
+	    nla_put_u32(skb, IFLA_BR_FDB_MAX_LEARNED_ENTRIES, br->fdb_max_learned_entries))
 		return -EMSGSIZE;
 
 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 2119729ded2b..df079191479e 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -275,6 +275,8 @@  enum {
 	BR_FDB_LOCKED,
 };
 
+#define BR_FDB_NOT_LEARNED_MASK (BIT(BR_FDB_LOCAL) | BIT(BR_FDB_ADDED_BY_USER))
+
 struct net_bridge_fdb_key {
 	mac_addr addr;
 	u16 vlan_id;
@@ -553,6 +555,9 @@  struct net_bridge {
 	struct kobject			*ifobj;
 	u32				auto_cnt;
 
+	u32				fdb_max_learned_entries;
+	u32				fdb_cur_learned_entries;
+
 #ifdef CONFIG_NET_SWITCHDEV
 	/* Counter used to make sure that hardware domains get unique
 	 * identifiers in case a bridge spans multiple switchdev instances.
@@ -838,6 +843,7 @@  int br_fdb_external_learn_del(struct net_bridge *br, struct net_bridge_port *p,
 			      bool swdev_notify);
 void br_fdb_offloaded_set(struct net_bridge *br, struct net_bridge_port *p,
 			  const unsigned char *addr, u16 vid, bool offloaded);
+u32 br_fdb_get_cur_learned_entries(struct net_bridge *br);
 
 /* br_forward.c */
 enum br_pkt_type {