[RFC,net-next,0/9] net: bridge: vlan: Multiple Spanning Trees

Message ID	20220216132934.1775649-1-tobias@waldekranz.com (mailing list archive)
Headers	show Return-Path: <netdev-owner@kernel.org> From: Tobias Waldekranz <tobias@waldekranz.com> To: davem@davemloft.net, kuba@kernel.org Cc: Andrew Lunn <andrew@lunn.ch>, Vivien Didelot <vivien.didelot@gmail.com>, Florian Fainelli <f.fainelli@gmail.com>, Vladimir Oltean <olteanv@gmail.com>, Jiri Pirko <jiri@resnulli.us>, Ivan Vecera <ivecera@redhat.com>, Roopa Prabhu <roopa@nvidia.com>, Nikolay Aleksandrov <nikolay@nvidia.com>, Russell King <linux@armlinux.org.uk>, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bridge@lists.linux-foundation.org Subject: [RFC net-next 0/9] net: bridge: vlan: Multiple Spanning Trees Date: Wed, 16 Feb 2022 14:29:25 +0100 Message-Id: <20220216132934.1775649-1-tobias@waldekranz.com> MIME-Version: 1.0 Organization: Westermo Content-Transfer-Encoding: 8bit Precedence: bulk
Series	net: bridge: vlan: Multiple Spanning Trees \| expand [RFC,net-next,0/9] net: bridge: vlan: Multiple Spanning Trees [RFC,net-next,1/9] net: bridge: vlan: Introduce multiple spanning trees (MST) [RFC,net-next,2/9] net: bridge: vlan: Allow multiple VLANs to be mapped to a single MST [RFC,net-next,3/9] net: bridge: vlan: Notify switchdev drivers of VLAN MST migrations [RFC,net-next,4/9] net: bridge: vlan: Notify switchdev drivers of MST state changes [RFC,net-next,5/9] net: dsa: Pass VLAN MST migration notifications to driver [RFC,net-next,6/9] net: dsa: Pass MST state changes to driver [RFC,net-next,7/9] net: dsa: mv88e6xxx: Disentangle STU from VTU [RFC,net-next,8/9] net: dsa: mv88e6xxx: Export STU as devlink region [RFC,net-next,9/9] net: dsa: mv88e6xxx: MST Offloading

Message ID

20220216132934.1775649-1-tobias@waldekranz.com (mailing list archive)

Headers

From: Tobias Waldekranz <tobias@waldekranz.com>
To: davem@davemloft.net, kuba@kernel.org
Cc: Andrew Lunn <andrew@lunn.ch>,
        Vivien Didelot <vivien.didelot@gmail.com>,
        Florian Fainelli <f.fainelli@gmail.com>,
        Vladimir Oltean <olteanv@gmail.com>,
        Jiri Pirko <jiri@resnulli.us>,
        Ivan Vecera <ivecera@redhat.com>,
        Roopa Prabhu <roopa@nvidia.com>,
        Nikolay Aleksandrov <nikolay@nvidia.com>,
        Russell King <linux@armlinux.org.uk>, netdev@vger.kernel.org,
        linux-kernel@vger.kernel.org, bridge@lists.linux-foundation.org
Subject: [RFC net-next 0/9] net: bridge: vlan: Multiple Spanning Trees
Date: Wed, 16 Feb 2022 14:29:25 +0100
Message-Id: <20220216132934.1775649-1-tobias@waldekranz.com>
MIME-Version: 1.0
Organization: Westermo
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

net: bridge: vlan: Multiple Spanning Trees | expand

Message

Tobias Waldekranz Feb. 16, 2022, 1:29 p.m. UTC

The bridge has had per-VLAN STP support for a while now, since:

https://lore.kernel.org/netdev/20200124114022.10883-1-nikolay@cumulusnetworks.com/

The current implementation has some problems:

- The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN
  is managed independently. This is awkward from an MSTP (802.1Q-2018,
  Clause 13.5) point of view, where the model is that multiple VLANs
  are grouped into MST instances.

  Because of the way that the standard is written, presumably, this is
  also reflected in hardware implementations. It is not uncommon for a
  switch to support the full 4k range of VIDs, but that the pool of
  MST instances is much smaller. Some examples:

  Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs
  Marvell Prestera: 4k VLANs, but only 128 MSTIs
  Microchip SparX-5i: 4k VLANs, but only 128 MSTIs

- By default, the feature is enabled, and there is no way to disable
  it. This makes it hard to add offloading in a backwards compatible
  way, since any underlying switchdevs have no way to refuse the
  function if the hardware does not support it

- The port-global STP state has precedence over per-VLAN states. In
  MSTP, as far as I understand it, all VLANs will use the common
  spanning tree (CST) by default - through traffic engineering you can
  then optimize your network to group subsets of VLANs to use
  different trees (MSTI). To my understanding, the way this is
  typically managed in silicon is roughly:

  Incoming packet:
  .----.----.--------------.----.-------------
  | DA | SA | 802.1Q VID=X | ET | Payload ...
  '----'----'--------------'----'-------------
                        |
                        '->|\     .----------------------------.
                           | +--> | VID | Members | ... | MSTI |
                   PVID -->|/     |-----|---------|-----|------|
                                  |   1 | 0001001 | ... |    0 |
                                  |   2 | 0001010 | ... |   10 |
                                  |   3 | 0001100 | ... |   10 |
                                  '----------------------------'
                                                             |
                               .-----------------------------'
                               |  .------------------------.
                               '->| MSTI | Fwding | Lrning |
                                  |------|--------|--------|
                                  |    0 | 111110 | 111110 |
                                  |   10 | 110111 | 110111 |
                                  '------------------------'

  What this is trying to show is that the STP state (whether MSTP is
  used, or ye olde STP) is always accessed via the VLAN table. If STP
  is running, all MSTI pointers in that table will reference the same
  index in the STP stable - if MSTP is running, some VLANs may point
  to other trees (like in this example).

  The fact that in the Linux bridge, the global state (think: index 0
  in most hardware implementations) is supposed to override the
  per-VLAN state, is very awkward to offload. In effect, this means
  that when the global state changes to blocking, drivers will have to
  iterate over all MSTIs in use, and alter them all to match. This
  also means that you have to cache whether the hardware state is
  currently tracking the global state or the per-VLAN state. In the
  first case, you also have to cache the per-VLAN state so that you
  can restore it if the global state transitions back to forwarding.

This series adds support for an arbitrary M:N mapping of VIDs to
MSTIs, proposing one solution to the first issue. An example of an
offload implementation for mv88e6xxx is also provided. Offloading is
done on a best-effort basis, i.e. notifications of the relevant events
are generated, but there is no way for the user to see whether the
per-VLAN state has been offloaded or not. There is also no handling of
the relationship between the port-global state the the per-VLAN ditto.

If I was king of net/bridge/*, I would make the following additional
changes:

- By default, when a VLAN is created, assign it to MSTID 0, which
  would mean that no per-VLAN state is used and that packets belonging
  to this VLAN should be filtered according to the port-global state.

  This way, when a VLAN is configured to use a separate tree (setting
  a non-zero MSTID), an underlying switchdev could oppose it if it is
  not supported.

  Obviously, this adds an extra step for existing users of per-VLAN
  STP states and would thus not be backwards compatible. Maybe this
  means that that is impossible to do, maybe not.

- Swap the precedence of the port-global and the per-VLAN state,
  i.e. the port-global state only applies to packets belonging to
  VLANs that does not make use of a per-VLAN state (MSTID != 0).

  This would make the offloading much more natural, as you avoid all
  of the caching stuff described above.

  Again, this changes the behavior of the kernel so it is not
  backwards compatible. I suspect that this is less of an issue
  though, since my guess is that very few people rely on the old
  behavior.

Thoughts?

Tobias Waldekranz (9):
  net: bridge: vlan: Introduce multiple spanning trees (MST)
  net: bridge: vlan: Allow multiple VLANs to be mapped to a single MST
  net: bridge: vlan: Notify switchdev drivers of VLAN MST migrations
  net: bridge: vlan: Notify switchdev drivers of MST state changes
  net: dsa: Pass VLAN MST migration notifications to driver
  net: dsa: Pass MST state changes to driver
  net: dsa: mv88e6xxx: Disentangle STU from VTU
  net: dsa: mv88e6xxx: Export STU as devlink region
  net: dsa: mv88e6xxx: MST Offloading

 drivers/net/dsa/mv88e6xxx/chip.c        | 223 +++++++++++++++++
 drivers/net/dsa/mv88e6xxx/chip.h        |  38 +++
 drivers/net/dsa/mv88e6xxx/devlink.c     |  94 +++++++
 drivers/net/dsa/mv88e6xxx/global1.h     |  10 +
 drivers/net/dsa/mv88e6xxx/global1_vtu.c | 311 ++++++++++++++----------
 include/linux/if_bridge.h               |   6 +
 include/net/dsa.h                       |   5 +
 include/net/switchdev.h                 |  17 ++
 include/uapi/linux/if_bridge.h          |   1 +
 net/bridge/br_private.h                 |  44 +++-
 net/bridge/br_vlan.c                    | 249 ++++++++++++++++++-
 net/bridge/br_vlan_options.c            |  48 +++-
 net/dsa/dsa_priv.h                      |   3 +
 net/dsa/port.c                          |  40 +++
 net/dsa/slave.c                         |  12 +
 15 files changed, 941 insertions(+), 160 deletions(-)

Comments

Nikolay Aleksandrov Feb. 16, 2022, 3:28 p.m. UTC | #1

On 16/02/2022 15:29, Tobias Waldekranz wrote:
> The bridge has had per-VLAN STP support for a while now, since:
> 
> https://lore.kernel.org/netdev/20200124114022.10883-1-nikolay@cumulusnetworks.com/
> 
> The current implementation has some problems:
> 
> - The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN
>   is managed independently. This is awkward from an MSTP (802.1Q-2018,
>   Clause 13.5) point of view, where the model is that multiple VLANs
>   are grouped into MST instances.
> 
>   Because of the way that the standard is written, presumably, this is
>   also reflected in hardware implementations. It is not uncommon for a
>   switch to support the full 4k range of VIDs, but that the pool of
>   MST instances is much smaller. Some examples:
> 
>   Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs
>   Marvell Prestera: 4k VLANs, but only 128 MSTIs
>   Microchip SparX-5i: 4k VLANs, but only 128 MSTIs
> 
> - By default, the feature is enabled, and there is no way to disable
>   it. This makes it hard to add offloading in a backwards compatible
>   way, since any underlying switchdevs have no way to refuse the
>   function if the hardware does not support it
> 
> - The port-global STP state has precedence over per-VLAN states. In
>   MSTP, as far as I understand it, all VLANs will use the common
>   spanning tree (CST) by default - through traffic engineering you can
>   then optimize your network to group subsets of VLANs to use
>   different trees (MSTI). To my understanding, the way this is
>   typically managed in silicon is roughly:
> 
>   Incoming packet:
>   .----.----.--------------.----.-------------
>   | DA | SA | 802.1Q VID=X | ET | Payload ...
>   '----'----'--------------'----'-------------
>                         |
>                         '->|\     .----------------------------.
>                            | +--> | VID | Members | ... | MSTI |
>                    PVID -->|/     |-----|---------|-----|------|
>                                   |   1 | 0001001 | ... |    0 |
>                                   |   2 | 0001010 | ... |   10 |
>                                   |   3 | 0001100 | ... |   10 |
>                                   '----------------------------'
>                                                              |
>                                .-----------------------------'
>                                |  .------------------------.
>                                '->| MSTI | Fwding | Lrning |
>                                   |------|--------|--------|
>                                   |    0 | 111110 | 111110 |
>                                   |   10 | 110111 | 110111 |
>                                   '------------------------'
> 
>   What this is trying to show is that the STP state (whether MSTP is
>   used, or ye olde STP) is always accessed via the VLAN table. If STP
>   is running, all MSTI pointers in that table will reference the same
>   index in the STP stable - if MSTP is running, some VLANs may point
>   to other trees (like in this example).
> 
>   The fact that in the Linux bridge, the global state (think: index 0
>   in most hardware implementations) is supposed to override the
>   per-VLAN state, is very awkward to offload. In effect, this means
>   that when the global state changes to blocking, drivers will have to
>   iterate over all MSTIs in use, and alter them all to match. This
>   also means that you have to cache whether the hardware state is
>   currently tracking the global state or the per-VLAN state. In the
>   first case, you also have to cache the per-VLAN state so that you
>   can restore it if the global state transitions back to forwarding.
> 
> This series adds support for an arbitrary M:N mapping of VIDs to
> MSTIs, proposing one solution to the first issue. An example of an
> offload implementation for mv88e6xxx is also provided. Offloading is
> done on a best-effort basis, i.e. notifications of the relevant events
> are generated, but there is no way for the user to see whether the
> per-VLAN state has been offloaded or not. There is also no handling of
> the relationship between the port-global state the the per-VLAN ditto.
> 
> If I was king of net/bridge/*, I would make the following additional
> changes:
> 
> - By default, when a VLAN is created, assign it to MSTID 0, which
>   would mean that no per-VLAN state is used and that packets belonging
>   to this VLAN should be filtered according to the port-global state.
> 
>   This way, when a VLAN is configured to use a separate tree (setting
>   a non-zero MSTID), an underlying switchdev could oppose it if it is
>   not supported.
> 
>   Obviously, this adds an extra step for existing users of per-VLAN
>   STP states and would thus not be backwards compatible. Maybe this
>   means that that is impossible to do, maybe not.
> 
> - Swap the precedence of the port-global and the per-VLAN state,
>   i.e. the port-global state only applies to packets belonging to
>   VLANs that does not make use of a per-VLAN state (MSTID != 0).
> 
>   This would make the offloading much more natural, as you avoid all
>   of the caching stuff described above.
> 
>   Again, this changes the behavior of the kernel so it is not
>   backwards compatible. I suspect that this is less of an issue
>   though, since my guess is that very few people rely on the old
>   behavior.
> 
> Thoughts?
> 

Interesting! Would adding a new (e.g. vlan_mst_enable) option which changes the behaviour
as described help? It can require that there are no vlans present to change 
similar to the per-port vlan stats option. Also based on that option you can alter
how the state checks are performed. For example, you can skip the initial port state
check, then in br_vlan_allowed_ingress() you can use the port state if vlan filtering
is disabled and mst enabled and you can avoid checking it altogether if filter && mst
are enabled then always use the vlan mst state. Similar changes would have to happen
for the egress path. Since we are talking about multiple tests the new MST logic can
be hidden behind a static key for both br_handle_frame() and later stages.

This set needs to read a new cache line to fetch mst ptr for all packets in the vlan fast-path,
that is definitely undesirable. Please either cache that state in the vlan and update it when
something changes, or think of some way which avoids that cache line in fast-path.
Alternative would be to make that cache line dependent on the new option, so it's needed
only when mst feature is enabled.

There are other options, but they way more invasive.. I'll think about it more.

Cheers,
 Nik

Tobias Waldekranz Feb. 16, 2022, 3:56 p.m. UTC | #2

On Wed, Feb 16, 2022 at 17:28, Nikolay Aleksandrov <nikolay@nvidia.com> wrote:
> On 16/02/2022 15:29, Tobias Waldekranz wrote:
>> The bridge has had per-VLAN STP support for a while now, since:
>> 
>> https://lore.kernel.org/netdev/20200124114022.10883-1-nikolay@cumulusnetworks.com/
>> 
>> The current implementation has some problems:
>> 
>> - The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN
>>   is managed independently. This is awkward from an MSTP (802.1Q-2018,
>>   Clause 13.5) point of view, where the model is that multiple VLANs
>>   are grouped into MST instances.
>> 
>>   Because of the way that the standard is written, presumably, this is
>>   also reflected in hardware implementations. It is not uncommon for a
>>   switch to support the full 4k range of VIDs, but that the pool of
>>   MST instances is much smaller. Some examples:
>> 
>>   Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs
>>   Marvell Prestera: 4k VLANs, but only 128 MSTIs
>>   Microchip SparX-5i: 4k VLANs, but only 128 MSTIs
>> 
>> - By default, the feature is enabled, and there is no way to disable
>>   it. This makes it hard to add offloading in a backwards compatible
>>   way, since any underlying switchdevs have no way to refuse the
>>   function if the hardware does not support it
>> 
>> - The port-global STP state has precedence over per-VLAN states. In
>>   MSTP, as far as I understand it, all VLANs will use the common
>>   spanning tree (CST) by default - through traffic engineering you can
>>   then optimize your network to group subsets of VLANs to use
>>   different trees (MSTI). To my understanding, the way this is
>>   typically managed in silicon is roughly:
>> 
>>   Incoming packet:
>>   .----.----.--------------.----.-------------
>>   | DA | SA | 802.1Q VID=X | ET | Payload ...
>>   '----'----'--------------'----'-------------
>>                         |
>>                         '->|\     .----------------------------.
>>                            | +--> | VID | Members | ... | MSTI |
>>                    PVID -->|/     |-----|---------|-----|------|
>>                                   |   1 | 0001001 | ... |    0 |
>>                                   |   2 | 0001010 | ... |   10 |
>>                                   |   3 | 0001100 | ... |   10 |
>>                                   '----------------------------'
>>                                                              |
>>                                .-----------------------------'
>>                                |  .------------------------.
>>                                '->| MSTI | Fwding | Lrning |
>>                                   |------|--------|--------|
>>                                   |    0 | 111110 | 111110 |
>>                                   |   10 | 110111 | 110111 |
>>                                   '------------------------'
>> 
>>   What this is trying to show is that the STP state (whether MSTP is
>>   used, or ye olde STP) is always accessed via the VLAN table. If STP
>>   is running, all MSTI pointers in that table will reference the same
>>   index in the STP stable - if MSTP is running, some VLANs may point
>>   to other trees (like in this example).
>> 
>>   The fact that in the Linux bridge, the global state (think: index 0
>>   in most hardware implementations) is supposed to override the
>>   per-VLAN state, is very awkward to offload. In effect, this means
>>   that when the global state changes to blocking, drivers will have to
>>   iterate over all MSTIs in use, and alter them all to match. This
>>   also means that you have to cache whether the hardware state is
>>   currently tracking the global state or the per-VLAN state. In the
>>   first case, you also have to cache the per-VLAN state so that you
>>   can restore it if the global state transitions back to forwarding.
>> 
>> This series adds support for an arbitrary M:N mapping of VIDs to
>> MSTIs, proposing one solution to the first issue. An example of an
>> offload implementation for mv88e6xxx is also provided. Offloading is
>> done on a best-effort basis, i.e. notifications of the relevant events
>> are generated, but there is no way for the user to see whether the
>> per-VLAN state has been offloaded or not. There is also no handling of
>> the relationship between the port-global state the the per-VLAN ditto.
>> 
>> If I was king of net/bridge/*, I would make the following additional
>> changes:
>> 
>> - By default, when a VLAN is created, assign it to MSTID 0, which
>>   would mean that no per-VLAN state is used and that packets belonging
>>   to this VLAN should be filtered according to the port-global state.
>> 
>>   This way, when a VLAN is configured to use a separate tree (setting
>>   a non-zero MSTID), an underlying switchdev could oppose it if it is
>>   not supported.
>> 
>>   Obviously, this adds an extra step for existing users of per-VLAN
>>   STP states and would thus not be backwards compatible. Maybe this
>>   means that that is impossible to do, maybe not.
>> 
>> - Swap the precedence of the port-global and the per-VLAN state,
>>   i.e. the port-global state only applies to packets belonging to
>>   VLANs that does not make use of a per-VLAN state (MSTID != 0).
>> 
>>   This would make the offloading much more natural, as you avoid all
>>   of the caching stuff described above.
>> 
>>   Again, this changes the behavior of the kernel so it is not
>>   backwards compatible. I suspect that this is less of an issue
>>   though, since my guess is that very few people rely on the old
>>   behavior.
>> 
>> Thoughts?
>> 
>
> Interesting! Would adding a new (e.g. vlan_mst_enable) option which changes the behaviour
> as described help? It can require that there are no vlans present to change 
> similar to the per-port vlan stats option.

Great idea, I did not know that that's how vlan stats worked. I will
definitely look into it, thanks!

> Also based on that option you can alter
> how the state checks are performed. For example, you can skip the initial port state
> check, then in br_vlan_allowed_ingress() you can use the port state if vlan filtering
> is disabled and mst enabled and you can avoid checking it altogether if filter && mst
> are enabled then always use the vlan mst state. Similar changes would have to happen
> for the egress path. Since we are talking about multiple tests the new MST logic can
> be hidden behind a static key for both br_handle_frame() and later stages.

Makes sense.

So should we keep the current per-VLAN state as-is then?  And bolt the
MST on to the side? I.e. should `struct net_bridge_vlan` both have `u8
state` for the current implementation _and_ a `struct br_vlan_mst *`
that is populated for VLANs tied to a non-zero MSTI?

> This set needs to read a new cache line to fetch mst ptr for all packets in the vlan fast-path,
> that is definitely undesirable. Please either cache that state in the vlan and update it when
> something changes, or think of some way which avoids that cache line in fast-path.
> Alternative would be to make that cache line dependent on the new option, so it's needed
> only when mst feature is enabled.

If we go with the approach I suggested above, then the current `u8
state` on `struct net_bridge_vlan` could be that cache, right?

With the current implementation, it is set directly - in the new MST
mode all grouped VLANs would have their states updated when updating the
MSTI's state.

Nikolay Aleksandrov Feb. 16, 2022, 4:12 p.m. UTC | #3

On 16/02/2022 17:56, Tobias Waldekranz wrote:
> On Wed, Feb 16, 2022 at 17:28, Nikolay Aleksandrov <nikolay@nvidia.com> wrote:
>> On 16/02/2022 15:29, Tobias Waldekranz wrote:
>>> The bridge has had per-VLAN STP support for a while now, since:
>>>
>>> https://lore.kernel.org/netdev/20200124114022.10883-1-nikolay@cumulusnetworks.com/
>>>
>>> The current implementation has some problems:
>>>
>>> - The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN
>>>   is managed independently. This is awkward from an MSTP (802.1Q-2018,
>>>   Clause 13.5) point of view, where the model is that multiple VLANs
>>>   are grouped into MST instances.
>>>
>>>   Because of the way that the standard is written, presumably, this is
>>>   also reflected in hardware implementations. It is not uncommon for a
>>>   switch to support the full 4k range of VIDs, but that the pool of
>>>   MST instances is much smaller. Some examples:
>>>
>>>   Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs
>>>   Marvell Prestera: 4k VLANs, but only 128 MSTIs
>>>   Microchip SparX-5i: 4k VLANs, but only 128 MSTIs
>>>
>>> - By default, the feature is enabled, and there is no way to disable
>>>   it. This makes it hard to add offloading in a backwards compatible
>>>   way, since any underlying switchdevs have no way to refuse the
>>>   function if the hardware does not support it
>>>
>>> - The port-global STP state has precedence over per-VLAN states. In
>>>   MSTP, as far as I understand it, all VLANs will use the common
>>>   spanning tree (CST) by default - through traffic engineering you can
>>>   then optimize your network to group subsets of VLANs to use
>>>   different trees (MSTI). To my understanding, the way this is
>>>   typically managed in silicon is roughly:
>>>
>>>   Incoming packet:
>>>   .----.----.--------------.----.-------------
>>>   | DA | SA | 802.1Q VID=X | ET | Payload ...
>>>   '----'----'--------------'----'-------------
>>>                         |
>>>                         '->|\     .----------------------------.
>>>                            | +--> | VID | Members | ... | MSTI |
>>>                    PVID -->|/     |-----|---------|-----|------|
>>>                                   |   1 | 0001001 | ... |    0 |
>>>                                   |   2 | 0001010 | ... |   10 |
>>>                                   |   3 | 0001100 | ... |   10 |
>>>                                   '----------------------------'
>>>                                                              |
>>>                                .-----------------------------'
>>>                                |  .------------------------.
>>>                                '->| MSTI | Fwding | Lrning |
>>>                                   |------|--------|--------|
>>>                                   |    0 | 111110 | 111110 |
>>>                                   |   10 | 110111 | 110111 |
>>>                                   '------------------------'
>>>
>>>   What this is trying to show is that the STP state (whether MSTP is
>>>   used, or ye olde STP) is always accessed via the VLAN table. If STP
>>>   is running, all MSTI pointers in that table will reference the same
>>>   index in the STP stable - if MSTP is running, some VLANs may point
>>>   to other trees (like in this example).
>>>
>>>   The fact that in the Linux bridge, the global state (think: index 0
>>>   in most hardware implementations) is supposed to override the
>>>   per-VLAN state, is very awkward to offload. In effect, this means
>>>   that when the global state changes to blocking, drivers will have to
>>>   iterate over all MSTIs in use, and alter them all to match. This
>>>   also means that you have to cache whether the hardware state is
>>>   currently tracking the global state or the per-VLAN state. In the
>>>   first case, you also have to cache the per-VLAN state so that you
>>>   can restore it if the global state transitions back to forwarding.
>>>
>>> This series adds support for an arbitrary M:N mapping of VIDs to
>>> MSTIs, proposing one solution to the first issue. An example of an
>>> offload implementation for mv88e6xxx is also provided. Offloading is
>>> done on a best-effort basis, i.e. notifications of the relevant events
>>> are generated, but there is no way for the user to see whether the
>>> per-VLAN state has been offloaded or not. There is also no handling of
>>> the relationship between the port-global state the the per-VLAN ditto.
>>>
>>> If I was king of net/bridge/*, I would make the following additional
>>> changes:
>>>
>>> - By default, when a VLAN is created, assign it to MSTID 0, which
>>>   would mean that no per-VLAN state is used and that packets belonging
>>>   to this VLAN should be filtered according to the port-global state.
>>>
>>>   This way, when a VLAN is configured to use a separate tree (setting
>>>   a non-zero MSTID), an underlying switchdev could oppose it if it is
>>>   not supported.
>>>
>>>   Obviously, this adds an extra step for existing users of per-VLAN
>>>   STP states and would thus not be backwards compatible. Maybe this
>>>   means that that is impossible to do, maybe not.
>>>
>>> - Swap the precedence of the port-global and the per-VLAN state,
>>>   i.e. the port-global state only applies to packets belonging to
>>>   VLANs that does not make use of a per-VLAN state (MSTID != 0).
>>>
>>>   This would make the offloading much more natural, as you avoid all
>>>   of the caching stuff described above.
>>>
>>>   Again, this changes the behavior of the kernel so it is not
>>>   backwards compatible. I suspect that this is less of an issue
>>>   though, since my guess is that very few people rely on the old
>>>   behavior.
>>>
>>> Thoughts?
>>>
>>
>> Interesting! Would adding a new (e.g. vlan_mst_enable) option which changes the behaviour
>> as described help? It can require that there are no vlans present to change 
>> similar to the per-port vlan stats option.
> 
> Great idea, I did not know that that's how vlan stats worked. I will
> definitely look into it, thanks!
> 
>> Also based on that option you can alter
>> how the state checks are performed. For example, you can skip the initial port state
>> check, then in br_vlan_allowed_ingress() you can use the port state if vlan filtering
>> is disabled and mst enabled and you can avoid checking it altogether if filter && mst
>> are enabled then always use the vlan mst state. Similar changes would have to happen
>> for the egress path. Since we are talking about multiple tests the new MST logic can
>> be hidden behind a static key for both br_handle_frame() and later stages.
> 
> Makes sense.
> 
> So should we keep the current per-VLAN state as-is then?  And bolt the
> MST on to the side? I.e. should `struct net_bridge_vlan` both have `u8
> state` for the current implementation _and_ a `struct br_vlan_mst *`
> that is populated for VLANs tied to a non-zero MSTI?
> 

Good question. The u8 we should keep for the quick access/cache of state, the
ptr we might escape by keeping just the mst id and fetching it, i.e. it could
be just 2 bytes instead of 8, but that's not really a problem if it will be
used just for bookkeeping in slow paths. It can always be pushed in the end of
the struct and if a ptr makes things simpler and easier it's ok.
So in short yes, I think we can keep both.

>> This set needs to read a new cache line to fetch mst ptr for all packets in the vlan fast-path,
>> that is definitely undesirable. Please either cache that state in the vlan and update it when
>> something changes, or think of some way which avoids that cache line in fast-path.
>> Alternative would be to make that cache line dependent on the new option, so it's needed
>> only when mst feature is enabled.
> 
> If we go with the approach I suggested above, then the current `u8
> state` on `struct net_bridge_vlan` could be that cache, right?
> 

Yep, that's the idea.

> With the current implementation, it is set directly - in the new MST
> mode all grouped VLANs would have their states updated when updating the
> MSTI's state.

Right

Cheers,
 Nik