diff mbox series

[net-next,1/5] skbuff: bridge: Add layer 2 miss indication

Message ID 20230518113328.1952135-2-idosch@nvidia.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series Add layer 2 miss indication and filtering | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5182 this patch: 5182
netdev/cc_maintainers warning 1 maintainers not CCed: imagedong@tencent.com
netdev/build_clang success Errors and warnings before: 1979 this patch: 1979
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 5413 this patch: 5413
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 45 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Ido Schimmel May 18, 2023, 11:33 a.m. UTC
Allow the bridge driver to mark packets that did not match a layer 2
entry during forwarding by adding a 'l2_miss' bit to the skb.

Clear the bit whenever a packet enters the bridge (received from a
bridge port or transmitted via the bridge) and set it if the packet did
not match an FDB/MDB entry.

Subsequent patches will allow the flower classifier to match on this
bit. The motivating use case in non-DF (Designated Forwarder) filtering
where we would like to prevent decapsulated packets from being flooded
to a multi-homed host.

Do not allocate the bit if the kernel was not compiled with bridge
support and place it after the two bit fields in accordance with commit
4c60d04c2888 ("net: skbuff: push nf_trace down the bitfield"). The bit
does not increase the size of the structure as it is placed at an
existing hole. Layout with allmodconfig:

struct sk_buff {
[...]
			__u8       csum_not_inet:1;      /*   132: 3  1 */
			__u8       l2_miss:1;            /*   132: 4  1 */

			/* XXX 3 bits hole, try to pack */
			/* XXX 1 byte hole, try to pack */

			__u16      tc_index;             /*   134     2 */
			u16        alloc_cpu;            /*   136     2 */
[...]
} __attribute__((__aligned__(8)));

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 include/linux/skbuff.h  | 4 ++++
 net/bridge/br_device.c  | 1 +
 net/bridge/br_forward.c | 3 +++
 net/bridge/br_input.c   | 1 +
 4 files changed, 9 insertions(+)

Comments

Nikolay Aleksandrov May 18, 2023, 4:08 p.m. UTC | #1
On 18/05/2023 14:33, Ido Schimmel wrote:
> Allow the bridge driver to mark packets that did not match a layer 2
> entry during forwarding by adding a 'l2_miss' bit to the skb.
> 
> Clear the bit whenever a packet enters the bridge (received from a
> bridge port or transmitted via the bridge) and set it if the packet did
> not match an FDB/MDB entry.
> 
> Subsequent patches will allow the flower classifier to match on this
> bit. The motivating use case in non-DF (Designated Forwarder) filtering
> where we would like to prevent decapsulated packets from being flooded
> to a multi-homed host.
> 
> Do not allocate the bit if the kernel was not compiled with bridge
> support and place it after the two bit fields in accordance with commit
> 4c60d04c2888 ("net: skbuff: push nf_trace down the bitfield"). The bit
> does not increase the size of the structure as it is placed at an
> existing hole. Layout with allmodconfig:
> 
> struct sk_buff {
> [...]
> 			__u8       csum_not_inet:1;      /*   132: 3  1 */
> 			__u8       l2_miss:1;            /*   132: 4  1 */
> 
> 			/* XXX 3 bits hole, try to pack */
> 			/* XXX 1 byte hole, try to pack */
> 
> 			__u16      tc_index;             /*   134     2 */
> 			u16        alloc_cpu;            /*   136     2 */
> [...]
> } __attribute__((__aligned__(8)));
> 
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  include/linux/skbuff.h  | 4 ++++
>  net/bridge/br_device.c  | 1 +
>  net/bridge/br_forward.c | 3 +++
>  net/bridge/br_input.c   | 1 +
>  4 files changed, 9 insertions(+)
> 
[snip]
>  	while (p || rp) {
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index fc17b9fd93e6..d8ab5890cbe6 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -334,6 +334,7 @@ static rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
>  		return RX_HANDLER_CONSUMED;
>  
>  	memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
> +	skb->l2_miss = 0;
>  
>  	p = br_port_get_rcu(skb->dev);
>  	if (p->flags & BR_VLAN_TUNNEL)

Overall looks good, only this part is a bit worrisome and needs some additional
investigation because now we'll unconditionally dirty a cache line for every
packet that is forwarded. Could you please check the effect with perf?

Thanks,
 Nik
Ido Schimmel May 19, 2023, 1:51 p.m. UTC | #2
On Thu, May 18, 2023 at 07:08:47PM +0300, Nikolay Aleksandrov wrote:
> On 18/05/2023 14:33, Ido Schimmel wrote:
> > diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> > index fc17b9fd93e6..d8ab5890cbe6 100644
> > --- a/net/bridge/br_input.c
> > +++ b/net/bridge/br_input.c
> > @@ -334,6 +334,7 @@ static rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
> >  		return RX_HANDLER_CONSUMED;
> >  
> >  	memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
> > +	skb->l2_miss = 0;
> >  
> >  	p = br_port_get_rcu(skb->dev);
> >  	if (p->flags & BR_VLAN_TUNNEL)
> 
> Overall looks good, only this part is a bit worrisome and needs some additional
> investigation because now we'll unconditionally dirty a cache line for every
> packet that is forwarded. Could you please check the effect with perf?

To eliminate it I tried the approach we discussed yesterday:

First, add the miss indication to the bridge's control block which is
zeroed for every skb entering the bridge:

diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 2119729ded2b..bd5c18286a40 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -581,6 +581,7 @@ struct br_input_skb_cb {
 #endif
        u8 proxyarp_replied:1;
        u8 src_port_isolated:1;
+       u8 miss:1;      /* FDB or MDB lookup miss */
 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
        u8 vlan_filtered:1;
 #endif

And set this bit upon misses instead of skb->l2_miss:

@@ -203,6 +205,8 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
        struct net_bridge_port *prev = NULL;
        struct net_bridge_port *p;
 
+       BR_INPUT_SKB_CB(skb)->miss = 1;
+
        list_for_each_entry_rcu(p, &br->port_list, list) {
                /* Do not flood unicast traffic to ports that turn it off, nor
                 * other traffic if flood off, except for traffic we originate
@@ -295,6 +299,7 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
                        allow_mode_include = false;
        } else {
                p = NULL;
+               BR_INPUT_SKB_CB(skb)->miss = 1;
        }
 
        while (p || rp) {

Then copy it to skb->l2_miss at the very end where the cache line
containing this field is already written to:

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 84d6dd5e5b1a..89f65564e338 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -50,6 +50,8 @@ int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb
 
        br_switchdev_frame_set_offload_fwd_mark(skb);
 
+       skb->l2_miss = BR_INPUT_SKB_CB(skb)->miss;
+
        dev_queue_xmit(skb);
 
        return 0;

Also for locally received packets:

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index fc17b9fd93e6..274e55455b15 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -46,6 +46,8 @@ static int br_pass_frame_up(struct sk_buff *skb)
         */
        br_switchdev_frame_unmark(skb);
 
+       skb->l2_miss = BR_INPUT_SKB_CB(skb)->miss;
+
        /* Bridge is just like any other port.  Make sure the
         * packet is allowed except in promisc mode when someone
         * may be running packet capture.

Ran these changes through the selftest and it seems to work.

WDYT?
Jakub Kicinski May 19, 2023, 9:52 p.m. UTC | #3
On Fri, 19 May 2023 16:51:48 +0300 Ido Schimmel wrote:
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index fc17b9fd93e6..274e55455b15 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -46,6 +46,8 @@ static int br_pass_frame_up(struct sk_buff *skb)
>          */
>         br_switchdev_frame_unmark(skb);
>  
> +       skb->l2_miss = BR_INPUT_SKB_CB(skb)->miss;
> +
>         /* Bridge is just like any other port.  Make sure the
>          * packet is allowed except in promisc mode when someone
>          * may be running packet capture.
> 
> Ran these changes through the selftest and it seems to work.

Can we possibly put the new field at the end of the CB and then have TC
look at it in the CB? We already do a bit of such CB juggling in strp
(first member of struct sk_skb_cb).
Nikolay Aleksandrov May 21, 2023, 7:34 a.m. UTC | #4
On 19/05/2023 16:51, Ido Schimmel wrote:
> On Thu, May 18, 2023 at 07:08:47PM +0300, Nikolay Aleksandrov wrote:
>> On 18/05/2023 14:33, Ido Schimmel wrote:
>>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
>>> index fc17b9fd93e6..d8ab5890cbe6 100644
>>> --- a/net/bridge/br_input.c
>>> +++ b/net/bridge/br_input.c
>>> @@ -334,6 +334,7 @@ static rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
>>>  		return RX_HANDLER_CONSUMED;
>>>  
>>>  	memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
>>> +	skb->l2_miss = 0;
>>>  
>>>  	p = br_port_get_rcu(skb->dev);
>>>  	if (p->flags & BR_VLAN_TUNNEL)
>>
>> Overall looks good, only this part is a bit worrisome and needs some additional
>> investigation because now we'll unconditionally dirty a cache line for every
>> packet that is forwarded. Could you please check the effect with perf?
> 
> To eliminate it I tried the approach we discussed yesterday:
> 
> First, add the miss indication to the bridge's control block which is
> zeroed for every skb entering the bridge:
> 
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 2119729ded2b..bd5c18286a40 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -581,6 +581,7 @@ struct br_input_skb_cb {
>  #endif
>         u8 proxyarp_replied:1;
>         u8 src_port_isolated:1;
> +       u8 miss:1;      /* FDB or MDB lookup miss */
>  #ifdef CONFIG_BRIDGE_VLAN_FILTERING
>         u8 vlan_filtered:1;
>  #endif
> 
> And set this bit upon misses instead of skb->l2_miss:
> 
> @@ -203,6 +205,8 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
>         struct net_bridge_port *prev = NULL;
>         struct net_bridge_port *p;
>  
> +       BR_INPUT_SKB_CB(skb)->miss = 1;
> +
>         list_for_each_entry_rcu(p, &br->port_list, list) {
>                 /* Do not flood unicast traffic to ports that turn it off, nor
>                  * other traffic if flood off, except for traffic we originate
> @@ -295,6 +299,7 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
>                         allow_mode_include = false;
>         } else {
>                 p = NULL;
> +               BR_INPUT_SKB_CB(skb)->miss = 1;
>         }
>  
>         while (p || rp) {
> 
> Then copy it to skb->l2_miss at the very end where the cache line
> containing this field is already written to:
> 
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index 84d6dd5e5b1a..89f65564e338 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -50,6 +50,8 @@ int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb
>  
>         br_switchdev_frame_set_offload_fwd_mark(skb);
>  
> +       skb->l2_miss = BR_INPUT_SKB_CB(skb)->miss;
> +
>         dev_queue_xmit(skb);
>  
>         return 0;
> 
> Also for locally received packets:
> 
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index fc17b9fd93e6..274e55455b15 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -46,6 +46,8 @@ static int br_pass_frame_up(struct sk_buff *skb)
>          */
>         br_switchdev_frame_unmark(skb);
>  
> +       skb->l2_miss = BR_INPUT_SKB_CB(skb)->miss;
> +
>         /* Bridge is just like any other port.  Make sure the
>          * packet is allowed except in promisc mode when someone
>          * may be running packet capture.
> 
> Ran these changes through the selftest and it seems to work.
> 
> WDYT?

Looks good to me, this is what I had in mind wrt cache line dirtying.
The swdev mark already does it, so putting them together is nice. From
bridge POV this is good.

Thanks,
 Nik
Ido Schimmel May 23, 2023, 8:10 a.m. UTC | #5
On Fri, May 19, 2023 at 02:52:18PM -0700, Jakub Kicinski wrote:
> On Fri, 19 May 2023 16:51:48 +0300 Ido Schimmel wrote:
> > diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> > index fc17b9fd93e6..274e55455b15 100644
> > --- a/net/bridge/br_input.c
> > +++ b/net/bridge/br_input.c
> > @@ -46,6 +46,8 @@ static int br_pass_frame_up(struct sk_buff *skb)
> >          */
> >         br_switchdev_frame_unmark(skb);
> >  
> > +       skb->l2_miss = BR_INPUT_SKB_CB(skb)->miss;
> > +
> >         /* Bridge is just like any other port.  Make sure the
> >          * packet is allowed except in promisc mode when someone
> >          * may be running packet capture.
> > 
> > Ran these changes through the selftest and it seems to work.
> 
> Can we possibly put the new field at the end of the CB and then have TC
> look at it in the CB? We already do a bit of such CB juggling in strp
> (first member of struct sk_skb_cb).

Using the CB between different layers is very fragile and I would like
to avoid it. Note that the skb can pass various layers until hitting the
classifier, each of which can decide to memset() the CB.

Anyway, I think I have a better alternative. I added the 'l2_miss' bit
to the tc skb extension and adjusted the bridge to mark packets via this
extension. The entire thing is protected by the existing 'tc_skb_ext_tc'
static key, so overhead is kept to a minimum when feature is disabled.
Extended flower to enable / disable this key when filters that match on
'l2_miss' are added / removed.

bridge change to mark the packet:
https://github.com/idosch/linux/commit/3fab206492fcad9177f2340680f02ced1b9a0dec.patch

flow_dissector change to dissect the info from the extension:
https://github.com/idosch/linux/commit/1533c078b02586547817a4e63989a0db62aa5315.patch

flower change to enable / disable the key:
https://github.com/idosch/linux/commit/cf84b277511ec80fe565c41271abc6b2e2f629af.patch

Advantages compared to the previous approach are that we do not need a
new bit in the skb and that overhead is kept to a minimum when feature
is disabled. Disadvantage is that overhead is higher when feature is
enabled.

WDYT?

To be clear, merely asking for feedback on the general approach, not
code review.

Thanks
Paolo Abeni May 23, 2023, 9:04 a.m. UTC | #6
On Tue, 2023-05-23 at 11:10 +0300, Ido Schimmel wrote:
> On Fri, May 19, 2023 at 02:52:18PM -0700, Jakub Kicinski wrote:
> > On Fri, 19 May 2023 16:51:48 +0300 Ido Schimmel wrote:
> > > diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> > > index fc17b9fd93e6..274e55455b15 100644
> > > --- a/net/bridge/br_input.c
> > > +++ b/net/bridge/br_input.c
> > > @@ -46,6 +46,8 @@ static int br_pass_frame_up(struct sk_buff *skb)
> > >          */
> > >         br_switchdev_frame_unmark(skb);
> > >  
> > > +       skb->l2_miss = BR_INPUT_SKB_CB(skb)->miss;
> > > +
> > >         /* Bridge is just like any other port.  Make sure the
> > >          * packet is allowed except in promisc mode when someone
> > >          * may be running packet capture.
> > > 
> > > Ran these changes through the selftest and it seems to work.
> > 
> > Can we possibly put the new field at the end of the CB and then have TC
> > look at it in the CB? We already do a bit of such CB juggling in strp
> > (first member of struct sk_skb_cb).
> 
> Using the CB between different layers is very fragile and I would like
> to avoid it. Note that the skb can pass various layers until hitting the
> classifier, each of which can decide to memset() the CB.
> 
> Anyway, I think I have a better alternative. I added the 'l2_miss' bit
> to the tc skb extension and adjusted the bridge to mark packets via this
> extension. The entire thing is protected by the existing 'tc_skb_ext_tc'
> static key, so overhead is kept to a minimum when feature is disabled.
> Extended flower to enable / disable this key when filters that match on
> 'l2_miss' are added / removed.
> 
> bridge change to mark the packet:
> https://github.com/idosch/linux/commit/3fab206492fcad9177f2340680f02ced1b9a0dec.patch
> 
> flow_dissector change to dissect the info from the extension:
> https://github.com/idosch/linux/commit/1533c078b02586547817a4e63989a0db62aa5315.patch
> 
> flower change to enable / disable the key:
> https://github.com/idosch/linux/commit/cf84b277511ec80fe565c41271abc6b2e2f629af.patch
> 
> Advantages compared to the previous approach are that we do not need a
> new bit in the skb and that overhead is kept to a minimum when feature
> is disabled. Disadvantage is that overhead is higher when feature is
> enabled.
> 
> WDYT?

Looks good to me.

I think you would only need to set/add the extension when l2_miss is
true, right? (with no extension l2 hit is assumed). That will avoid
unneeded overhead for br_dev_xmit().

All the others involved paths look like slow(er) one, so the occasional
skb extension overhead should not be a problem.

Cheers,

Paolo
Ido Schimmel May 23, 2023, 11:34 a.m. UTC | #7
On Tue, May 23, 2023 at 11:04:27AM +0200, Paolo Abeni wrote:
> I think you would only need to set/add the extension when l2_miss is
> true, right? (with no extension l2 hit is assumed). That will avoid
> unneeded overhead for br_dev_xmit().

If an extension is already present (possibly with 'l2_miss' being 'true'
because the packet was flooded by a different bridge earlier in the
pipeline), then we need to clear it when the packet enters the bridge.
IMO, this is quite unlikely. However, if the extension is missing, then
you are correct and there is no point in allocating one.

IOW, I can squash the following diff to the first patch:

diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index fb6525553a8a..32115d76a6de 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -764,10 +764,16 @@ static inline void br_tc_skb_miss_set(struct sk_buff *skb, bool miss)
                return;
 
        ext = skb_ext_find(skb, TC_SKB_EXT);
-       if (!ext)
-               ext = tc_skb_ext_alloc(skb);
-       if (ext)
+       if (ext) {
                ext->l2_miss = miss;
+               return;
+       }
+       if (!miss)
+               return;
+       ext = tc_skb_ext_alloc(skb);
+       if (!ext)
+               return;
+       ext->l2_miss = miss;
 }
 #else
 static inline void br_tc_skb_miss_set(struct sk_buff *skb, bool miss)

Thanks
Nikolay Aleksandrov May 23, 2023, 1:03 p.m. UTC | #8
On 23/05/2023 11:10, Ido Schimmel wrote:
> On Fri, May 19, 2023 at 02:52:18PM -0700, Jakub Kicinski wrote:
>> On Fri, 19 May 2023 16:51:48 +0300 Ido Schimmel wrote:
>>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
>>> index fc17b9fd93e6..274e55455b15 100644
>>> --- a/net/bridge/br_input.c
>>> +++ b/net/bridge/br_input.c
>>> @@ -46,6 +46,8 @@ static int br_pass_frame_up(struct sk_buff *skb)
>>>          */
>>>         br_switchdev_frame_unmark(skb);
>>>  
>>> +       skb->l2_miss = BR_INPUT_SKB_CB(skb)->miss;
>>> +
>>>         /* Bridge is just like any other port.  Make sure the
>>>          * packet is allowed except in promisc mode when someone
>>>          * may be running packet capture.
>>>
>>> Ran these changes through the selftest and it seems to work.
>>
>> Can we possibly put the new field at the end of the CB and then have TC
>> look at it in the CB? We already do a bit of such CB juggling in strp
>> (first member of struct sk_skb_cb).
> 
> Using the CB between different layers is very fragile and I would like
> to avoid it. Note that the skb can pass various layers until hitting the
> classifier, each of which can decide to memset() the CB.
> 
> Anyway, I think I have a better alternative. I added the 'l2_miss' bit
> to the tc skb extension and adjusted the bridge to mark packets via this
> extension. The entire thing is protected by the existing 'tc_skb_ext_tc'
> static key, so overhead is kept to a minimum when feature is disabled.
> Extended flower to enable / disable this key when filters that match on
> 'l2_miss' are added / removed.
> 
> bridge change to mark the packet:
> https://github.com/idosch/linux/commit/3fab206492fcad9177f2340680f02ced1b9a0dec.patch
> 
> flow_dissector change to dissect the info from the extension:
> https://github.com/idosch/linux/commit/1533c078b02586547817a4e63989a0db62aa5315.patch
> 
> flower change to enable / disable the key:
> https://github.com/idosch/linux/commit/cf84b277511ec80fe565c41271abc6b2e2f629af.patch
> 
> Advantages compared to the previous approach are that we do not need a
> new bit in the skb and that overhead is kept to a minimum when feature
> is disabled. Disadvantage is that overhead is higher when feature is
> enabled.
> 
> WDYT?
> 
> To be clear, merely asking for feedback on the general approach, not
> code review.
> 
> Thanks

TBH, I like this approach much better for obvious reasons. :)
Thanks for working on it.
Jakub Kicinski May 23, 2023, 8:29 p.m. UTC | #9
On Tue, 23 May 2023 11:10:38 +0300 Ido Schimmel wrote:
> > Can we possibly put the new field at the end of the CB and then have TC
> > look at it in the CB? We already do a bit of such CB juggling in strp
> > (first member of struct sk_skb_cb).  
> 
> Using the CB between different layers is very fragile and I would like
> to avoid it. Note that the skb can pass various layers until hitting the
> classifier, each of which can decide to memset() the CB.
> 
> Anyway, I think I have a better alternative. I added the 'l2_miss' bit
> to the tc skb extension and adjusted the bridge to mark packets via this
> extension. The entire thing is protected by the existing 'tc_skb_ext_tc'
> static key, so overhead is kept to a minimum when feature is disabled.
> Extended flower to enable / disable this key when filters that match on
> 'l2_miss' are added / removed.
> 
> bridge change to mark the packet:
> https://github.com/idosch/linux/commit/3fab206492fcad9177f2340680f02ced1b9a0dec.patch
> 
> flow_dissector change to dissect the info from the extension:
> https://github.com/idosch/linux/commit/1533c078b02586547817a4e63989a0db62aa5315.patch
> 
> flower change to enable / disable the key:
> https://github.com/idosch/linux/commit/cf84b277511ec80fe565c41271abc6b2e2f629af.patch
> 
> Advantages compared to the previous approach are that we do not need a
> new bit in the skb and that overhead is kept to a minimum when feature
> is disabled. Disadvantage is that overhead is higher when feature is
> enabled.
> 
> WDYT?

Sounds good, yup. Thanks!
diff mbox series

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8cff3d817131..b64dc3f62c5c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -801,6 +801,7 @@  typedef unsigned char *sk_buff_data_t;
  *	@encap_hdr_csum: software checksum is needed
  *	@csum_valid: checksum is already valid
  *	@csum_not_inet: use CRC32c to resolve CHECKSUM_PARTIAL
+ *	@l2_miss: Packet did not match an L2 entry during forwarding
  *	@csum_complete_sw: checksum was completed by software
  *	@csum_level: indicates the number of consecutive checksums found in
  *		the packet minus one that have been verified as
@@ -991,6 +992,9 @@  struct sk_buff {
 #if IS_ENABLED(CONFIG_IP_SCTP)
 	__u8			csum_not_inet:1;
 #endif
+#if IS_ENABLED(CONFIG_BRIDGE)
+	__u8			l2_miss:1;
+#endif
 
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 8eca8a5c80c6..91dbdae4afd4 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -39,6 +39,7 @@  netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	u16 vid = 0;
 
 	memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
+	skb->l2_miss = 0;
 
 	rcu_read_lock();
 	nf_ops = rcu_dereference(nf_br_ops);
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 84d6dd5e5b1a..8cf5a51489ce 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -203,6 +203,8 @@  void br_flood(struct net_bridge *br, struct sk_buff *skb,
 	struct net_bridge_port *prev = NULL;
 	struct net_bridge_port *p;
 
+	skb->l2_miss = 1;
+
 	list_for_each_entry_rcu(p, &br->port_list, list) {
 		/* Do not flood unicast traffic to ports that turn it off, nor
 		 * other traffic if flood off, except for traffic we originate
@@ -295,6 +297,7 @@  void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
 			allow_mode_include = false;
 	} else {
 		p = NULL;
+		skb->l2_miss = 1;
 	}
 
 	while (p || rp) {
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index fc17b9fd93e6..d8ab5890cbe6 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -334,6 +334,7 @@  static rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
 		return RX_HANDLER_CONSUMED;
 
 	memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
+	skb->l2_miss = 0;
 
 	p = br_port_get_rcu(skb->dev);
 	if (p->flags & BR_VLAN_TUNNEL)