[net-next,v2,0/8] net: bridge: add flush filtering support

Message ID 20220411172934.1813604-1-razor@blackwall.org (mailing list archive)

Message

Nikolay Aleksandrov April 11, 2022, 5:29 p.m. UTC
Hi,
This patch-set adds support to specify filtering conditions for a flush
operation. This version has entirely different entry point (v1 had
bridge-specific IFLA attribute, here I add new RTM_FLUSHNEIGH msg and
netdev ndo_fdb_flush op) so I'll give a new overview altogether.

After me and Ido discussed the feature offlist, we agreed that it would
be best to add a new generic RTM_FLUSHNEIGH with a new ndo_fdb_flush
callback which can be re-used for other drivers (e.g. vxlan).

Patch 01 adds the new RTM_FLUSHNEIGH type, patch 02 then adds the
new ndo_fdb_flush call. With this structure we need to add a generic
rtnl_fdb_flush which will be used to do basic attribute validation and
dispatch the call to the appropriate device based on the NTF_USE/MASTER
flags (patch 03). Patch 04 then adds some common flush attributes which
are used by the bridge and vxlan drivers (target ifindex, vlan id, ndm
flags/state masks) with basic attribute validation, further validation
can be done by the implementers of the ndo callback. Patch 05 adds a
minimal ndo_fdb_flush to the bridge driver, it uses the current
br_fdb_flush implementation to flush all entries similar to existing
calls. Patch 06 adds filtering support to the new bridge flush op which
supports target ifindex (port or bridge), vlan id and flags/state mask.
Patch 07 converts ndm state/flags and their masks to bridge-private flags
and fills them in the filter descriptor for matching. Finally patch 08
fills in the target ifindex (after validating it) and vlan id (already
validated by rtnl_fdb_flush) for matching.

Flush filtering is needed because user-space applications need a quick
way to delete only a specific set of entries, e.g. mlag implementations
need a way to flush only dynamic entries excluding externally learned
ones or only externally learned ones without static entries etc. Also
apps usually want to target only a specific vlan or port/vlan
combination. The current 2 flush operations (per port and bridge-wide)
are not extensible and cannot provide such filtering.
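
To illustrate the kind of matching a filter descriptor implies, here is a minimal userspace sketch. It is not the code from this series: the struct and function names (flush_desc, fdb_entry, flush_matches, flush_table) are hypothetical, and treating a zero ifindex or vlan id as "match any" is an assumption made only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a flush filter; names are illustrative. */
struct flush_desc {
	int port_ifindex;         /* 0 = any port (assumption) */
	uint16_t vlan_id;         /* 0 = any vlan (assumption) */
	unsigned long flags;      /* required values of the masked flag bits */
	unsigned long flags_mask; /* which flag bits to compare; 0 = ignore flags */
};

/* Hypothetical, simplified fdb entry. */
struct fdb_entry {
	int port_ifindex;
	uint16_t vlan_id;
	unsigned long flags;
};

/* An entry matches only if every condition set in the descriptor holds. */
static bool flush_matches(const struct flush_desc *desc,
			  const struct fdb_entry *fdb)
{
	assert(desc != NULL && fdb != NULL);

	if ((fdb->flags & desc->flags_mask) != desc->flags)
		return false;
	if (desc->port_ifindex && fdb->port_ifindex != desc->port_ifindex)
		return false;
	if (desc->vlan_id && fdb->vlan_id != desc->vlan_id)
		return false;
	return true;
}

/* Drop matching entries in place; returns the number of entries kept. */
static size_t flush_table(struct fdb_entry *tbl, size_t n,
			  const struct flush_desc *desc)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (!flush_matches(desc, &tbl[i]))
			tbl[kept++] = tbl[i];
	return kept;
}
```

An all-zero descriptor matches every entry in this model, which corresponds to the unfiltered "flush all" behaviour.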

I decided against embedding new attrs into the old flush attributes for
multiple reasons - proper error handling on unsupported attributes,
older kernels silently flushing all, need for a second mechanism to
signal that the attribute should be parsed (e.g. using boolopts),
special treatment for permanent entries.

Examples:
$ bridge fdb flush dev bridge vlan 100 static
< flush all static entries on vlan 100 >
$ bridge fdb flush dev bridge vlan 1 dynamic
< flush all dynamic entries on vlan 1 >
$ bridge fdb flush dev bridge port ens16 vlan 1 dynamic
< flush all dynamic entries on port ens16 and vlan 1 >
$ bridge fdb flush dev ens16 vlan 1 dynamic master
< as above: flush all dynamic entries on port ens16 and vlan 1 >
$ bridge fdb flush dev bridge nooffloaded nopermanent self
< flush all non-offloaded and non-permanent entries >
$ bridge fdb flush dev bridge static noextern_learn
< flush all static entries which are not externally learned >
$ bridge fdb flush dev bridge permanent
< flush all permanent entries >
$ bridge fdb flush dev bridge port bridge permanent
< flush all permanent entries pointing to the bridge itself >

Note that all flags have their negated version (static vs nostatic etc)
and there are some tricky cases to handle like "static" which in flag
terms means fdbs that have NUD_NOARP but *not* NUD_PERMANENT, so the
mask matches on both but we need only NUD_NOARP to be set. That's
because permanent entries have both set so we can't just match on
NUD_NOARP. Also note that this flush operation doesn't treat permanent
entries in a special way (fdb_delete vs fdb_delete_local), it will
delete them regardless if any port is using them. We can extend the api
with a flag to do that if needed in the future.
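
The "static" case above can be sketched as a mask/value comparison. This is an illustrative userspace model only, not the bridge code: state_filter, state_matches, static_filter and permanent_filter are hypothetical names. The NUD_* values are the ones defined in include/uapi/linux/neighbour.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/neighbour.h. */
#define NUD_NOARP     0x40
#define NUD_PERMANENT 0x80

/* Hypothetical filter: an entry matches when (state & mask) == value. */
struct state_filter {
	uint16_t mask;
	uint16_t value;
};

static bool state_matches(const struct state_filter *f, uint16_t ndm_state)
{
	assert(f != NULL);
	return (ndm_state & f->mask) == f->value;
}

/* "static": look at both bits, require NUD_NOARP set and NUD_PERMANENT clear. */
static struct state_filter static_filter(void)
{
	struct state_filter f = {
		.mask  = NUD_NOARP | NUD_PERMANENT,
		.value = NUD_NOARP,
	};
	return f;
}

/* "permanent": only the NUD_PERMANENT bit has to be set. */
static struct state_filter permanent_filter(void)
{
	struct state_filter f = {
		.mask  = NUD_PERMANENT,
		.value = NUD_PERMANENT,
	};
	return f;
}
```

Matching on NUD_NOARP alone (mask and value both NUD_NOARP) would also select permanent entries, which is why the mask has to cover both bits.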

Patch-sets (in order):
 - Initial flush infra and fdb flush filtering (this set)
 - iproute2 support
 - selftests

Future work:
 - mdb flush support (RTM_FLUSHMDB type)

Thanks to Ido for the great discussion and feedback while working on this.

Thanks,
 Nik

Nikolay Aleksandrov (8):
  net: rtnetlink: add RTM_FLUSHNEIGH
  net: add ndo_fdb_flush op
  net: bridge: fdb: add ndo_fdb_flush op
  net: rtnetlink: register a generic rtnl_fdb_flush call
  net: rtnetlink: add common flush attributes
  net: bridge: fdb: add support for fine-grained flushing
  net: bridge: fdb: add support for flush filtering based on ndm flags
    and state
  net: bridge: fdb: add support for flush filtering based on ifindex and
    vlan

 include/linux/netdevice.h      |  11 +++
 include/uapi/linux/neighbour.h |  10 +++
 include/uapi/linux/rtnetlink.h |   3 +
 net/bridge/br_device.c         |   1 +
 net/bridge/br_fdb.c            | 154 +++++++++++++++++++++++++++++++--
 net/bridge/br_netlink.c        |   9 +-
 net/bridge/br_private.h        |  19 +++-
 net/bridge/br_sysfs_br.c       |   6 +-
 net/core/rtnetlink.c           |  62 +++++++++++++
 security/selinux/nlmsgtab.c    |   3 +-
 10 files changed, 266 insertions(+), 12 deletions(-)

Comments

Nikolay Aleksandrov April 11, 2022, 5:42 p.m. UTC | #1
On 11/04/2022 20:29, Nikolay Aleksandrov wrote:
> Hi,
> This patch-set adds support to specify filtering conditions for a flush
> operation. This version has entirely different entry point (v1 had
> bridge-specific IFLA attribute, here I add new RTM_FLUSHNEIGH msg and
> netdev ndo_fdb_flush op) so I'll give a new overview altogether.
> After me and Ido discussed the feature offlist, we agreed that it would
> be best to add a new generic RTM_FLUSHNEIGH with a new ndo_fdb_flush
> callback which can be re-used for other drivers (e.g. vxlan).
> Patch 01 adds the new RTM_FLUSHNEIGH type, patch 02 then adds the
> new ndo_fdb_flush call. With this structure we need to add a generic
> rtnl_fdb_flush which will be used to do basic attribute validation and
> dispatch the call to the appropriate device based on the NTF_USE/MASTER
> flags (patch 03). Patch 04 then adds some common flush attributes which
> are used by the bridge and vxlan drivers (target ifindex, vlan id, ndm
> flags/state masks) with basic attribute validation, further validation
> can be done by the implementers of the ndo callback. Patch 05 adds a
> minimal ndo_fdb_flush to the bridge driver, it uses the current
> br_fdb_flush implementation to flush all entries similar to existing
> calls. Patch 06 adds filtering support to the new bridge flush op which
> supports target ifindex (port or bridge), vlan id and flags/state mask.
> Patch 07 converts ndm state/flags and their masks to bridge-private flags
> and fills them in the filter descriptor for matching. Finally patch 08

Aargh.. I mixed up the patch numbers above. Patch 03 adds the minimal ndo_fdb_flush
to the bridge driver (not patch 05), patch 04 adds the generic rtnl_fdb_flush
(not patch 03) and patch 05 adds the common attributes (not patch 04).

Let me know if you'd like me to repost it with fixed numbers. I'll wait for
feedback anyway.

> fills in the target ifindex (after validating it) and vlan id (already
> validated by rtnl_fdb_flush) for matching. Flush filtering is needed
> because user-space applications need a quick way to delete only a
> specific set of entries, e.g. mlag implementations need a way to flush only
> dynamic entries excluding externally learned ones or only externally
> learned ones without static entries etc. Also apps usually want to target
> only a specific vlan or port/vlan combination. The current 2 flush
> operations (per port and bridge-wide) are not extensible and cannot
> provide such filtering.

Roopa Prabhu April 11, 2022, 6:08 p.m. UTC | #2
On 4/11/22 10:29, Nikolay Aleksandrov wrote:
> Hi,
> This patch-set adds support to specify filtering conditions for a flush
> operation. This version has entirely different entry point (v1 had
> bridge-specific IFLA attribute, here I add new RTM_FLUSHNEIGH msg and
> netdev ndo_fdb_flush op) so I'll give a new overview altogether.
> After me and Ido discussed the feature offlist, we agreed that it would
> be best to add a new generic RTM_FLUSHNEIGH with a new ndo_fdb_flush
> callback which can be re-used for other drivers (e.g. vxlan).
> Patch 01 adds the new RTM_FLUSHNEIGH type, patch 02 then adds the
> new ndo_fdb_flush call. With this structure we need to add a generic
> rtnl_fdb_flush which will be used to do basic attribute validation and
> dispatch the call to the appropriate device based on the NTF_USE/MASTER
> flags (patch 03). Patch 04 then adds some common flush attributes which
> are used by the bridge and vxlan drivers (target ifindex, vlan id, ndm
> flags/state masks) with basic attribute validation, further validation
> can be done by the implementers of the ndo callback. Patch 05 adds a
> minimal ndo_fdb_flush to the bridge driver, it uses the current
> br_fdb_flush implementation to flush all entries similar to existing
> calls. Patch 06 adds filtering support to the new bridge flush op which
> supports target ifindex (port or bridge), vlan id and flags/state mask.
> Patch 07 converts ndm state/flags and their masks to bridge-private flags
> and fills them in the filter descriptor for matching. Finally patch 08
> fills in the target ifindex (after validating it) and vlan id (already
> validated by rtnl_fdb_flush) for matching. Flush filtering is needed
> because user-space applications need a quick way to delete only a
> specific set of entries, e.g. mlag implementations need a way to flush only
> dynamic entries excluding externally learned ones or only externally
> learned ones without static entries etc. Also apps usually want to target
> only a specific vlan or port/vlan combination. The current 2 flush
> operations (per port and bridge-wide) are not extensible and cannot
> provide such filtering.
>
> I decided against embedding new attrs into the old flush attributes for
> multiple reasons - proper error handling on unsupported attributes,
> older kernels silently flushing all, need for a second mechanism to
> signal that the attribute should be parsed (e.g. using boolopts),
> special treatment for permanent entries.
>
> Examples:
> $ bridge fdb flush dev bridge vlan 100 static
> < flush all static entries on vlan 100 >
> $ bridge fdb flush dev bridge vlan 1 dynamic
> < flush all dynamic entries on vlan 1 >
> $ bridge fdb flush dev bridge port ens16 vlan 1 dynamic
> < flush all dynamic entries on port ens16 and vlan 1 >
> $ bridge fdb flush dev ens16 vlan 1 dynamic master
> < as above: flush all dynamic entries on port ens16 and vlan 1 >
> $ bridge fdb flush dev bridge nooffloaded nopermanent self
> < flush all non-offloaded and non-permanent entries >
> $ bridge fdb flush dev bridge static noextern_learn
> < flush all static entries which are not externally learned >
> $ bridge fdb flush dev bridge permanent
> < flush all permanent entries >
> $ bridge fdb flush dev bridge port bridge permanent
> < flush all permanent entries pointing to the bridge itself >
>
> Note that all flags have their negated version (static vs nostatic etc)
> and there are some tricky cases to handle like "static" which in flag
> terms means fdbs that have NUD_NOARP but *not* NUD_PERMANENT, so the
> mask matches on both but we need only NUD_NOARP to be set. That's
> because permanent entries have both set so we can't just match on
> NUD_NOARP. Also note that this flush operation doesn't treat permanent
> entries in a special way (fdb_delete vs fdb_delete_local), it will
> delete them regardless if any port is using them. We can extend the api
> with a flag to do that if needed in the future.
>
> Patch-sets (in order):
>   - Initial flush infra and fdb flush filtering (this set)
>   - iproute2 support
>   - selftests
>
> Future work:
>   - mdb flush support (RTM_FLUSHMDB type)
>
> Thanks to Ido for the great discussion and feedback while working on this.
>
Can't we pile this onto RTM_DELNEIGH with a flush flag?

It is a bulk del, and seems similar to the bulk dev del discussion on
netdev a few months ago (I don't remember how that API ended up, unless
I am misremembering).

The neigh subsystem also needs this; curious how this API will work there.

(Apologies if you guys already discussed this, I did not have time to look
through all the comments.)

Nikolay Aleksandrov April 11, 2022, 6:18 p.m. UTC | #3
On 11/04/2022 21:08, Roopa Prabhu wrote:
> 
> On 4/11/22 10:29, Nikolay Aleksandrov wrote:
>> Hi,
>> This patch-set adds support to specify filtering conditions for a flush
>> operation. This version has entirely different entry point (v1 had
>> bridge-specific IFLA attribute, here I add new RTM_FLUSHNEIGH msg and
>> netdev ndo_fdb_flush op) so I'll give a new overview altogether.
>> After me and Ido discussed the feature offlist, we agreed that it would
>> be best to add a new generic RTM_FLUSHNEIGH with a new ndo_fdb_flush
>> callback which can be re-used for other drivers (e.g. vxlan).
>> Patch 01 adds the new RTM_FLUSHNEIGH type, patch 02 then adds the
>> new ndo_fdb_flush call. With this structure we need to add a generic
>> rtnl_fdb_flush which will be used to do basic attribute validation and
>> dispatch the call to the appropriate device based on the NTF_USE/MASTER
>> flags (patch 03). Patch 04 then adds some common flush attributes which
>> are used by the bridge and vxlan drivers (target ifindex, vlan id, ndm
>> flags/state masks) with basic attribute validation, further validation
>> can be done by the implementers of the ndo callback. Patch 05 adds a
>> minimal ndo_fdb_flush to the bridge driver, it uses the current
>> br_fdb_flush implementation to flush all entries similar to existing
>> calls. Patch 06 adds filtering support to the new bridge flush op which
>> supports target ifindex (port or bridge), vlan id and flags/state mask.
>> Patch 07 converts ndm state/flags and their masks to bridge-private flags
>> and fills them in the filter descriptor for matching. Finally patch 08
>> fills in the target ifindex (after validating it) and vlan id (already
>> validated by rtnl_fdb_flush) for matching. Flush filtering is needed
>> because user-space applications need a quick way to delete only a
>> specific set of entries, e.g. mlag implementations need a way to flush only
>> dynamic entries excluding externally learned ones or only externally
>> learned ones without static entries etc. Also apps usually want to target
>> only a specific vlan or port/vlan combination. The current 2 flush
>> operations (per port and bridge-wide) are not extensible and cannot
>> provide such filtering.
>>
>> I decided against embedding new attrs into the old flush attributes for
>> multiple reasons - proper error handling on unsupported attributes,
>> older kernels silently flushing all, need for a second mechanism to
>> signal that the attribute should be parsed (e.g. using boolopts),
>> special treatment for permanent entries.
>>
>> Examples:
>> $ bridge fdb flush dev bridge vlan 100 static
>> < flush all static entries on vlan 100 >
>> $ bridge fdb flush dev bridge vlan 1 dynamic
>> < flush all dynamic entries on vlan 1 >
>> $ bridge fdb flush dev bridge port ens16 vlan 1 dynamic
>> < flush all dynamic entries on port ens16 and vlan 1 >
>> $ bridge fdb flush dev ens16 vlan 1 dynamic master
>> < as above: flush all dynamic entries on port ens16 and vlan 1 >
>> $ bridge fdb flush dev bridge nooffloaded nopermanent self
>> < flush all non-offloaded and non-permanent entries >
>> $ bridge fdb flush dev bridge static noextern_learn
>> < flush all static entries which are not externally learned >
>> $ bridge fdb flush dev bridge permanent
>> < flush all permanent entries >
>> $ bridge fdb flush dev bridge port bridge permanent
>> < flush all permanent entries pointing to the bridge itself >
>>
>> Note that all flags have their negated version (static vs nostatic etc)
>> and there are some tricky cases to handle like "static" which in flag
>> terms means fdbs that have NUD_NOARP but *not* NUD_PERMANENT, so the
>> mask matches on both but we need only NUD_NOARP to be set. That's
>> because permanent entries have both set so we can't just match on
>> NUD_NOARP. Also note that this flush operation doesn't treat permanent
>> entries in a special way (fdb_delete vs fdb_delete_local), it will
>> delete them regardless if any port is using them. We can extend the api
>> with a flag to do that if needed in the future.
>>
>> Patch-sets (in order):
>>   - Initial flush infra and fdb flush filtering (this set)
>>   - iproute2 support
>>   - selftests
>>
>> Future work:
>>   - mdb flush support (RTM_FLUSHMDB type)
>>
>> Thanks to Ido for the great discussion and feedback while working on this.
>>
> Cant we pile this on to RTM_DELNEIGH with a flush flag ?.
> 
> It is a bulk del, and sounds seems similar to the bulk dev del discussion on netdev a few months ago (i dont remember how that api ended up to be. unless i am misremembering).
> 
> neigh subsystem also needs this, curious how this api will work there.
> 
> (apologies if you guys already discussed this, did not have time to look through all the comments)
> 
> 
> 

I thought about that option, but I didn't like overloading delneigh like that.
del currently requires a mac address and we need to either signal the device supports
a null mac, or we should push that verification to ndo_fdb_del users. Also we'll have
attributes which are flush-specific and will work only when flushing as opposed to when
deleting a specific mac, so handling them in the different cases can become a pain.
MDBs will need DELMDB to be modified in a similar way.

IMO a separate flush op is cleaner, but I don't have a strong preference.
This can very easily be adapted to delneigh with just a bit more mechanical changes
if the mac check is pushed to the ndo implementers.

FLUSHNEIGH can easily work for neighs, just need another address family rtnl_register
that implements it, the new ndo is just for PF_BRIDGE. :)
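
For illustration, the mac check that would move into the ndo_fdb_del implementers under the RTM_DELNEIGH option could look roughly like this. It is a userspace sketch under stated assumptions: del_request, its flush field and validate_del_request are hypothetical, and rejecting a flush request that also carries an address is a choice made for this example, not something decided in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Hypothetical request model; not the kernel's netlink parsing. */
struct del_request {
	const uint8_t *lladdr; /* NULL when no address attribute was sent */
	size_t lladdr_len;
	bool flush;            /* hypothetical "bulk delete" indication */
};

/* Returns 0 if the request is well-formed, -EINVAL otherwise. */
static int validate_del_request(const struct del_request *req)
{
	assert(req != NULL);

	/* Bulk delete: no single mac is expected (assumption for this sketch). */
	if (req->flush)
		return req->lladdr ? -EINVAL : 0;

	/* Single delete: a full Ethernet address is still required. */
	if (!req->lladdr || req->lladdr_len != ETH_ALEN)
		return -EINVAL;
	return 0;
}
```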

Cheers,
 Nik

Nikolay Aleksandrov April 11, 2022, 6:31 p.m. UTC | #4
On 11/04/2022 21:18, Nikolay Aleksandrov wrote:
> On 11/04/2022 21:08, Roopa Prabhu wrote:
>>
>> On 4/11/22 10:29, Nikolay Aleksandrov wrote:
>>> Hi,
>>> This patch-set adds support to specify filtering conditions for a flush
>>> operation. This version has entirely different entry point (v1 had
>>> bridge-specific IFLA attribute, here I add new RTM_FLUSHNEIGH msg and
>>> netdev ndo_fdb_flush op) so I'll give a new overview altogether.
>>> After me and Ido discussed the feature offlist, we agreed that it would
>>> be best to add a new generic RTM_FLUSHNEIGH with a new ndo_fdb_flush
>>> callback which can be re-used for other drivers (e.g. vxlan).
>>> Patch 01 adds the new RTM_FLUSHNEIGH type, patch 02 then adds the
>>> new ndo_fdb_flush call. With this structure we need to add a generic
>>> rtnl_fdb_flush which will be used to do basic attribute validation and
>>> dispatch the call to the appropriate device based on the NTF_USE/MASTER
>>> flags (patch 03). Patch 04 then adds some common flush attributes which
>>> are used by the bridge and vxlan drivers (target ifindex, vlan id, ndm
>>> flags/state masks) with basic attribute validation, further validation
>>> can be done by the implementers of the ndo callback. Patch 05 adds a
>>> minimal ndo_fdb_flush to the bridge driver, it uses the current
>>> br_fdb_flush implementation to flush all entries similar to existing
>>> calls. Patch 06 adds filtering support to the new bridge flush op which
>>> supports target ifindex (port or bridge), vlan id and flags/state mask.
>>> Patch 07 converts ndm state/flags and their masks to bridge-private flags
>>> and fills them in the filter descriptor for matching. Finally patch 08
>>> fills in the target ifindex (after validating it) and vlan id (already
>>> validated by rtnl_fdb_flush) for matching. Flush filtering is needed
>>> because user-space applications need a quick way to delete only a
>>> specific set of entries, e.g. mlag implementations need a way to flush only
>>> dynamic entries excluding externally learned ones or only externally
>>> learned ones without static entries etc. Also apps usually want to target
>>> only a specific vlan or port/vlan combination. The current 2 flush
>>> operations (per port and bridge-wide) are not extensible and cannot
>>> provide such filtering.
>>>
>>> I decided against embedding new attrs into the old flush attributes for
>>> multiple reasons - proper error handling on unsupported attributes,
>>> older kernels silently flushing all, need for a second mechanism to
>>> signal that the attribute should be parsed (e.g. using boolopts),
>>> special treatment for permanent entries.
>>>
>>> Examples:
>>> $ bridge fdb flush dev bridge vlan 100 static
>>> < flush all static entries on vlan 100 >
>>> $ bridge fdb flush dev bridge vlan 1 dynamic
>>> < flush all dynamic entries on vlan 1 >
>>> $ bridge fdb flush dev bridge port ens16 vlan 1 dynamic
>>> < flush all dynamic entries on port ens16 and vlan 1 >
>>> $ bridge fdb flush dev ens16 vlan 1 dynamic master
>>> < as above: flush all dynamic entries on port ens16 and vlan 1 >
>>> $ bridge fdb flush dev bridge nooffloaded nopermanent self
>>> < flush all non-offloaded and non-permanent entries >
>>> $ bridge fdb flush dev bridge static noextern_learn
>>> < flush all static entries which are not externally learned >
>>> $ bridge fdb flush dev bridge permanent
>>> < flush all permanent entries >
>>> $ bridge fdb flush dev bridge port bridge permanent
>>> < flush all permanent entries pointing to the bridge itself >
>>>
>>> Note that all flags have their negated version (static vs nostatic etc)
>>> and there are some tricky cases to handle like "static" which in flag
>>> terms means fdbs that have NUD_NOARP but *not* NUD_PERMANENT, so the
>>> mask matches on both but we need only NUD_NOARP to be set. That's
>>> because permanent entries have both set so we can't just match on
>>> NUD_NOARP. Also note that this flush operation doesn't treat permanent
>>> entries in a special way (fdb_delete vs fdb_delete_local), it will
>>> delete them regardless if any port is using them. We can extend the api
>>> with a flag to do that if needed in the future.
>>>
>>> Patch-sets (in order):
>>>   - Initial flush infra and fdb flush filtering (this set)
>>>   - iproute2 support
>>>   - selftests
>>>
>>> Future work:
>>>   - mdb flush support (RTM_FLUSHMDB type)
>>>
>>> Thanks to Ido for the great discussion and feedback while working on this.
>>>
>> Cant we pile this on to RTM_DELNEIGH with a flush flag ?.
>>
>> It is a bulk del, and sounds seems similar to the bulk dev del discussion on netdev a few months ago (i dont remember how that api ended up to be. unless i am misremembering).
>>
>> neigh subsystem also needs this, curious how this api will work there.
>>
>> (apologies if you guys already discussed this, did not have time to look through all the comments)
>>
>>
>>
> 
> I thought about that option, but I didn't like overloading delneigh like that.
> del currently requires a mac address and we need to either signal the device supports
> a null mac, or we should push that verification to ndo_fdb_del users. Also we'll have

that's the only thing, overloading delneigh with a flush-behaviour (multi-del or whatever)
would require to push the mac check to ndo_fdb_del implementers

I don't mind going that road if others agree that we should do it through delneigh
+ a bit/option to signal flush, instead of a new rtm type.

> attributes which are flush-specific and will work only when flushing as opposed to when
> deleting a specific mac, so handling them in the different cases can become a pain.

scratch the specific attributes, those can be adapted for both cases

> MDBs will need DELMDB to be modified in a similar way.
> 
> IMO a separate flush op is cleaner, but I don't have a strong preference.
> This can very easily be adapted to delneigh with just a bit more mechanical changes
> if the mac check is pushed to the ndo implementers.
> 
> FLUSHNEIGH can easily work for neighs, just need another address family rtnl_register
> that implements it, the new ndo is just for PF_BRIDGE. :)
> 
> Cheers,
>  Nik
> 
>

Roopa Prabhu April 11, 2022, 7:22 p.m. UTC | #5
On 4/11/22 11:31, Nikolay Aleksandrov wrote:
> On 11/04/2022 21:18, Nikolay Aleksandrov wrote:
>> On 11/04/2022 21:08, Roopa Prabhu wrote:
>>> On 4/11/22 10:29, Nikolay Aleksandrov wrote:
>>>> Hi,
>>>> This patch-set adds support to specify filtering conditions for a flush
>>>> operation. This version has entirely different entry point (v1 had
>>>> bridge-specific IFLA attribute, here I add new RTM_FLUSHNEIGH msg and
>>>> netdev ndo_fdb_flush op) so I'll give a new overview altogether.
>>>> After me and Ido discussed the feature offlist, we agreed that it would
>>>> be best to add a new generic RTM_FLUSHNEIGH with a new ndo_fdb_flush
>>>> callback which can be re-used for other drivers (e.g. vxlan).
>>>> Patch 01 adds the new RTM_FLUSHNEIGH type, patch 02 then adds the
>>>> new ndo_fdb_flush call. With this structure we need to add a generic
>>>> rtnl_fdb_flush which will be used to do basic attribute validation and
>>>> dispatch the call to the appropriate device based on the NTF_USE/MASTER
>>>> flags (patch 03). Patch 04 then adds some common flush attributes which
>>>> are used by the bridge and vxlan drivers (target ifindex, vlan id, ndm
>>>> flags/state masks) with basic attribute validation, further validation
>>>> can be done by the implementers of the ndo callback. Patch 05 adds a
>>>> minimal ndo_fdb_flush to the bridge driver, it uses the current
>>>> br_fdb_flush implementation to flush all entries similar to existing
>>>> calls. Patch 06 adds filtering support to the new bridge flush op which
>>>> supports target ifindex (port or bridge), vlan id and flags/state mask.
>>>> Patch 07 converts ndm state/flags and their masks to bridge-private flags
>>>> and fills them in the filter descriptor for matching. Finally patch 08
>>>> fills in the target ifindex (after validating it) and vlan id (already
>>>> validated by rtnl_fdb_flush) for matching. Flush filtering is needed
>>>> because user-space applications need a quick way to delete only a
>>>> specific set of entries, e.g. mlag implementations need a way to flush only
>>>> dynamic entries excluding externally learned ones or only externally
>>>> learned ones without static entries etc. Also apps usually want to target
>>>> only a specific vlan or port/vlan combination. The current 2 flush
>>>> operations (per port and bridge-wide) are not extensible and cannot
>>>> provide such filtering.
>>>>
>>>> I decided against embedding new attrs into the old flush attributes for
>>>> multiple reasons - proper error handling on unsupported attributes,
>>>> older kernels silently flushing all, need for a second mechanism to
>>>> signal that the attribute should be parsed (e.g. using boolopts),
>>>> special treatment for permanent entries.
>>>>
>>>> Examples:
>>>> $ bridge fdb flush dev bridge vlan 100 static
>>>> < flush all static entries on vlan 100 >
>>>> $ bridge fdb flush dev bridge vlan 1 dynamic
>>>> < flush all dynamic entries on vlan 1 >
>>>> $ bridge fdb flush dev bridge port ens16 vlan 1 dynamic
>>>> < flush all dynamic entries on port ens16 and vlan 1 >
>>>> $ bridge fdb flush dev ens16 vlan 1 dynamic master
>>>> < as above: flush all dynamic entries on port ens16 and vlan 1 >
>>>> $ bridge fdb flush dev bridge nooffloaded nopermanent self
>>>> < flush all non-offloaded and non-permanent entries >
>>>> $ bridge fdb flush dev bridge static noextern_learn
>>>> < flush all static entries which are not externally learned >
>>>> $ bridge fdb flush dev bridge permanent
>>>> < flush all permanent entries >
>>>> $ bridge fdb flush dev bridge port bridge permanent
>>>> < flush all permanent entries pointing to the bridge itself >
>>>>
>>>> Note that all flags have their negated version (static vs nostatic etc)
>>>> and there are some tricky cases to handle like "static" which in flag
>>>> terms means fdbs that have NUD_NOARP but *not* NUD_PERMANENT, so the
>>>> mask matches on both but we need only NUD_NOARP to be set. That's
>>>> because permanent entries have both set so we can't just match on
>>>> NUD_NOARP. Also note that this flush operation doesn't treat permanent
>>>> entries in a special way (fdb_delete vs fdb_delete_local), it will
>>>> delete them regardless if any port is using them. We can extend the api
>>>> with a flag to do that if needed in the future.
>>>>
>>>> Patch-sets (in order):
>>>>    - Initial flush infra and fdb flush filtering (this set)
>>>>    - iproute2 support
>>>>    - selftests
>>>>
>>>> Future work:
>>>>    - mdb flush support (RTM_FLUSHMDB type)
>>>>
>>>> Thanks to Ido for the great discussion and feedback while working on this.
>>>>
>>> Cant we pile this on to RTM_DELNEIGH with a flush flag ?.
>>>
>>> It is a bulk del, and sounds seems similar to the bulk dev del discussion on netdev a few months ago (i dont remember how that api ended up to be. unless i am misremembering).
>>>
>>> neigh subsystem also needs this, curious how this api will work there.
>>>
>>> (apologies if you guys already discussed this, did not have time to look through all the comments)
>>>
>>>
>>>
>> I thought about that option, but I didn't like overloading delneigh like that.
>> del currently requires a mac address and we need to either signal the device supports
>> a null mac, or we should push that verification to ndo_fdb_del users. Also we'll have
> that's the only thing, overloading delneigh with a flush-behaviour (multi-del or whatever)
> would require to push the mac check to ndo_fdb_del implementers
>
> I don't mind going that road if others agree that we should do it through delneigh
> + a bit/option to signal flush, instead of a new rtm type.
>
>> attributes which are flush-specific and will work only when flushing as opposed to when
>> deleting a specific mac, so handling them in the different cases can become a pain.
> scratch the specific attributes, those can be adapted for both cases
>
>> MDBs will need DELMDB to be modified in a similar way.
>>
>> IMO a separate flush op is cleaner, but I don't have a strong preference.
>> This can very easily be adapted to delneigh with just a bit more mechanical changes
>> if the mac check is pushed to the ndo implementers.
>>
>> FLUSHNEIGH can easily work for neighs, just need another address family rtnl_register
>> that implements it, the new ndo is just for PF_BRIDGE. :)

All great points. My only reason to explore RTM_DELNEIGH is to see if we
can find a recipe to support similar bulk deletes of other objects
handled via rtm msgs in the future. Plus, it allows you to maintain
symmetry between flush requests and object delete notification msg types.

Let's see if there are other opinions.

Thanks Nikolay

Jakub Kicinski April 11, 2022, 7:49 p.m. UTC | #6
On Mon, 11 Apr 2022 12:22:24 -0700 Roopa Prabhu wrote:
> >> I thought about that option, but I didn't like overloading delneigh like that.
> >> del currently requires a mac address and we need to either signal the device supports
> >> a null mac, or we should push that verification to ndo_fdb_del users. Also we'll have
> > that's the only thing, overloading delneigh with a flush-behaviour (multi-del or whatever)
> > would require to push the mac check to ndo_fdb_del implementers
> >
> > I don't mind going that road if others agree that we should do it through delneigh
> > + a bit/option to signal flush, instead of a new rtm type.
> >  
> >> attributes which are flush-specific and will work only when flushing as opposed to when
> >> deleting a specific mac, so handling them in the different cases can become a pain.  
> > scratch the specific attributes, those can be adapted for both cases
> >  
> >> MDBs will need DELMDB to be modified in a similar way.
> >>
> >> IMO a separate flush op is cleaner, but I don't have a strong preference.
> >> This can very easily be adapted to delneigh with just a bit more mechanical changes
> >> if the mac check is pushed to the ndo implementers.
> >>
> >> FLUSHNEIGH can easily work for neighs, just need another address family rtnl_register
> >> that implements it, the new ndo is just for PF_BRIDGE. :)  
> 
> all great points. My only reason to explore RTM_DELNEIGH is to see if we 
> can find a recipe to support similar bulk deletes of other objects 
> handled via rtm msgs in the future. Plus, it allows you to maintain 
> symmetry between flush requests and object delete notification msg types.
> 
> Lets see if there are other opinions.

I'd vote for reusing RTM_DELNEIGH, but that's purely based on
intuition, I don't know this code. I'd also lean towards core
creating struct net_bridge_fdb_flush_desc rather than piping
raw netlink attrs thru. Lastly feels like fdb ops should find 
a new home rather than ndos, but that's largely unrelated..
Nikolay Aleksandrov April 11, 2022, 8:34 p.m. UTC | #7
On 11/04/2022 22:49, Jakub Kicinski wrote:
> On Mon, 11 Apr 2022 12:22:24 -0700 Roopa Prabhu wrote:
>>>> I thought about that option, but I didn't like overloading delneigh like that.
>>>> del currently requires a mac address and we need to either signal the device supports
>>>> a null mac, or we should push that verification to ndo_fdb_del users. Also we'll have
>>> that's the only thing, overloading delneigh with a flush-behaviour (multi-del or whatever)
>>> would require to push the mac check to ndo_fdb_del implementers
>>>
>>> I don't mind going that road if others agree that we should do it through delneigh
>>> + a bit/option to signal flush, instead of a new rtm type.
>>>  
>>>> attributes which are flush-specific and will work only when flushing as opposed to when
>>>> deleting a specific mac, so handling them in the different cases can become a pain.  
>>> scratch the specific attributes, those can be adapted for both cases
>>>  
>>>> MDBs will need DELMDB to be modified in a similar way.
>>>>
>>>> IMO a separate flush op is cleaner, but I don't have a strong preference.
>>>> This can very easily be adapted to delneigh with just a bit more mechanical changes
>>>> if the mac check is pushed to the ndo implementers.
>>>>
>>>> FLUSHNEIGH can easily work for neighs, just need another address family rtnl_register
>>>> that implements it, the new ndo is just for PF_BRIDGE. :)  
>>
>> all great points. My only reason to explore RTM_DELNEIGH is to see if we 
>> can find a recipe to support similar bulk deletes of other objects 
>> handled via rtm msgs in the future. Plus, it allows you to maintain 
>> symmetry between flush requests and object delete notification msg types.
>>
>> Lets see if there are other opinions.
> 
> I'd vote for reusing RTM_DELNEIGH, but that's purely based on

OK, I'll look into the delneigh solution. Note that for backwards compatibility
we won't be able to return a proper error, because rtnl_fdb_del will be called without
a mac address, so on old kernels with a new iproute2, fdb flush will return "invalid
address" as an error.
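
A minimal userspace sketch of the compatibility issue described above, under stated assumptions: `struct del_request`, its `bulk` field, `old_validate()` and `new_validate()` are hypothetical names made up for illustration, not kernel code. It only shows why a delete request without an address is rejected by a check that always requires one, and how pushing that check behind a flush indication would let it through.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAC_LEN 6 /* length of an Ethernet address */

/* Hypothetical request: addr is NULL when the message carries no address. */
struct del_request {
	const unsigned char *addr;
	size_t addr_len;
	int bulk; /* hypothetical "this is a flush" indication */
};

/* Old behaviour: every delete must name exactly one address. */
static int old_validate(const struct del_request *req, const char **msg)
{
	assert(req != NULL && msg != NULL);
	if (!req->addr || req->addr_len != MAC_LEN) {
		*msg = "invalid address";
		return -EINVAL;
	}
	*msg = NULL;
	return 0;
}

/* Assumed new behaviour: the address check is skipped for a flush. */
static int new_validate(const struct del_request *req, const char **msg)
{
	assert(req != NULL && msg != NULL);
	if (req->bulk) {
		*msg = NULL;
		return 0;
	}
	return old_validate(req, msg);
}
```

With a request that has `bulk` set and no address, `old_validate()` returns `-EINVAL` with the "invalid address" message, while `new_validate()` accepts it.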

> intuition, I don't know this code. I'd also lean towards core
> creating struct net_bridge_fdb_flush_desc rather than piping
> raw netlink attrs thru. Lastly feels like fdb ops should find 

I don't think the struct can really be centralized. At least for the
bridge case it contains private fields which parsed attributes get mapped to,
specifically the ndm flags and state, which together with their masks are all mapped into
bridge-private flags. Or did you mean pass the raw attribute vals through a
struct instead of a nlattr table?

> a new home rather than ndos, but that's largely unrelated..

I like separating the ops idea. I'll add that to my bridge todo list. :)

Thanks,
 Nik
Jakub Kicinski April 11, 2022, 8:48 p.m. UTC | #8
On Mon, 11 Apr 2022 23:34:23 +0300 Nikolay Aleksandrov wrote:
> On 11/04/2022 22:49, Jakub Kicinski wrote:
> >> all great points. My only reason to explore RTM_DELNEIGH is to see if we 
> >> can find a recipe to support similar bulk deletes of other objects 
> >> handled via rtm msgs in the future. Plus, it allows you to maintain 
> >> symmetry between flush requests and object delete notification msg types.
> >>
> >> Lets see if there are other opinions.  
> > 
> > I'd vote for reusing RTM_DELNEIGH, but that's purely based on  
> 
> OK, I'll look into the delneigh solution. Note that for backwards compatibility
> we won't be able to return proper error because rtnl_fdb_del will be called without
> a mac address, so for old kernels with new iproute2 fdb flush will return "invalid
> address" as an error.

If only we had policy dump for rtnl :) Another todo item, I guess.

> > intuition, I don't know this code. I'd also lean towards core
> > creating struct net_bridge_fdb_flush_desc rather than piping
> > raw netlink attrs thru. Lastly feels like fdb ops should find   
> 
> I don't think the struct can really be centralized, at least for the
> bridge case it contains private fields which parsed attributes get mapped to,
> specifically the ndm flags and state, and their maps are all mapped into
> bridge-private flags. Or did you mean pass the raw attribute vals through a
> struct instead of a nlattr table?

Yup, basically the policy is defined in the core, so the types are
known. We can extract the fields from the message there, even if 
the exact meaning of the fields gets established in the callback.

BTW setting NLA_REJECT policy is not required, NLA_REJECT is 0 so 
it will be set automatically per C standard.

> > a new home rather than ndos, but that's largely unrelated..  
> 
> I like separating the ops idea. I'll add that to my bridge todo list. :)
> 
> Thanks,
>  Nik
>
Nikolay Aleksandrov April 11, 2022, 9:17 p.m. UTC | #9
On 11/04/2022 23:48, Jakub Kicinski wrote:
> On Mon, 11 Apr 2022 23:34:23 +0300 Nikolay Aleksandrov wrote:
>> On 11/04/2022 22:49, Jakub Kicinski wrote:
>>>> all great points. My only reason to explore RTM_DELNEIGH is to see if we 
>>>> can find a recipe to support similar bulk deletes of other objects 
>>>> handled via rtm msgs in the future. Plus, it allows you to maintain 
>>>> symmetry between flush requests and object delete notification msg types.
>>>>
>>>> Lets see if there are other opinions.  
>>>
>>> I'd vote for reusing RTM_DELNEIGH, but that's purely based on  
>>
>> OK, I'll look into the delneigh solution. Note that for backwards compatibility
>> we won't be able to return proper error because rtnl_fdb_del will be called without
>> a mac address, so for old kernels with new iproute2 fdb flush will return "invalid
>> address" as an error.
> 
> If only we had policy dump for rtnl :) Another todo item, I guess.
> 

:)

>>> intuition, I don't know this code. I'd also lean towards core
>>> creating struct net_bridge_fdb_flush_desc rather than piping
>>> raw netlink attrs thru. Lastly feels like fdb ops should find   
>>
>> I don't think the struct can really be centralized, at least for the
>> bridge case it contains private fields which parsed attributes get mapped to,
>> specifically the ndm flags and state, and their maps are all mapped into
>> bridge-private flags. Or did you mean pass the raw attribute vals through a
>> struct instead of a nlattr table?
> 
> Yup, basically the policy is defined in the core, so the types are
> known. We can extract the fields from the message there, even if 
> the exact meaning of the fields gets established in the callback.
> 

That sounds nice, but there are a few catches, e.g. some ndo_fdb implementations
check if attributes were set, i.e. they can also interpret 0, so it will require
additional state (either a special value, a bitfield or some other way of telling them
it was actually present but 0).
Anyway I think that is orthogonal to adding the flush support, it's a nice cleanup but can
be done separately because it will have to be done for all ndo_fdb callbacks and I
suspect the change will grow considerably.
OTOH the flush implementation via delneigh doesn't require a new ndo_fdb call anyway,
would you mind if I finish that up without the struct conversion?

> BTW setting NLA_REJECT policy is not required, NLA_REJECT is 0 so 
> it will be set automatically per C standard.
> 

Indeed - habits, I'll drop it. :)

>>> a new home rather than ndos, but that's largely unrelated..  
>>
>> I like separating the ops idea. I'll add that to my bridge todo list. :)
>>
>> Thanks,
>>  Nik
>>
>
Jakub Kicinski April 11, 2022, 9:35 p.m. UTC | #10
On Tue, 12 Apr 2022 00:17:14 +0300 Nikolay Aleksandrov wrote:
> > Yup, basically the policy is defined in the core, so the types are
> > known. We can extract the fields from the message there, even if 
> > the exact meaning of the fields gets established in the callback.
> 
> That sounds nice, but there are a few catches, f.e. some ndo_fdb implementations
> check if attributes were set, i.e. they can also interpret 0, so it will require
> additional state (either special value, bitfield or some other way of telling them
> it was actually present but 0).
> Anyway I think that is orthogonal to adding the flush support, it's a nice cleanup but can
> be done separately because it will have to be done for all ndo_fdb callbacks and I
> suspect the change will grow considerably.
> OTOH the flush implementation via delneigh doesn't require a new ndo_fdb call way,
> would you mind if I finish that up without the struct conversion?

Not terribly, go ahead.
David Ahern April 11, 2022, 11:03 p.m. UTC | #11
On Mon, Apr 11, 2022 at 12:22:24PM -0700, Roopa Prabhu wrote:
> all great points. My only reason to explore RTM_DELNEIGH is to see if we can
> find a recipe to support similar bulk deletes of other objects handled via
> rtm msgs in the future. Plus, it allows you to maintain symmetry between
> flush requests and object delete notification msg types.
> 
> Lets see if there are other opinions.

I guess I should have read the entire thread. :-) (still getting used to
the new lei + mutt workflow). This was my thought - bulk delete is going
to be a common need, and it is really just a mass delete. The GET
message types are used for dumps and some allow attributes on the
request as a means of coarse-grained filtering. I think we should try
something similar here for the flush case.
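
As a rough sketch of what "a mass delete with coarse filtering" could look like, the following userspace example walks a table and removes entries that match a filter built from request attributes. The structures and the convention that a zero field means "match any" are assumptions made for illustration, not the bridge's real data structures or the semantics of the patch set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified FDB entry. */
struct fdb_entry {
	int      ifindex;
	uint16_t vlan;
	uint32_t flags;
	bool     in_use;
};

/* Hypothetical filter: a zero ifindex or vlan means "match any". */
struct flush_filter {
	int      ifindex;
	uint16_t vlan;
	uint32_t flags;
	uint32_t flags_mask;
};

static bool flush_match(const struct fdb_entry *e, const struct flush_filter *f)
{
	if (f->ifindex && e->ifindex != f->ifindex)
		return false;
	if (f->vlan && e->vlan != f->vlan)
		return false;
	/* Only the bits selected by flags_mask are compared. */
	return ((e->flags ^ f->flags) & f->flags_mask) == 0;
}

/* Mark every matching entry as deleted; returns the number removed. */
static size_t flush_table(struct fdb_entry *tbl, size_t n,
			  const struct flush_filter *f)
{
	size_t removed = 0;

	assert(tbl != NULL && f != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].in_use && flush_match(&tbl[i], f)) {
			tbl[i].in_use = false;
			removed++;
		}
	}
	return removed;
}
```

An all-zero filter matches every entry, which corresponds to an unfiltered flush; setting `ifindex`, `vlan` or a flags mask narrows the set.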