mbox series

[v5,net-next,00/10] Let switchdev drivers offload and unoffload bridge ports at their own convenience

Message ID 20210720134655.892334-1-vladimir.oltean@nxp.com (mailing list archive)
Headers show
Series Let switchdev drivers offload and unoffload bridge ports at their own convenience | expand

Message

Vladimir Oltean July 20, 2021, 1:46 p.m. UTC
This series introduces an explicit API through which switchdev drivers
mark a bridge port as offloaded or not:
- switchdev_bridge_port_offload()
- switchdev_bridge_port_unoffload()

Currently, the bridge assumes that a port is offloaded if
dev_get_port_parent_id(dev, &ppid, recurse=true) returns something, but
that is just an assumption that breaks some use cases (like a
non-offloaded LAG interface on top of a switchdev port, bridged with
other switchdev ports).

Along with some consolidation of the bridge logic to assign a "switchdev
offloading mark" to a port (now better called a "hardware domain"), this
series allows the bridge driver side to no longer impose restrictions on
that configuration.

Right now, all switchdev drivers must be modified to use the explicit
API, but more and more logic can then be placed centrally in the bridge
and therefore ease the job of a switchdev driver writer in the future.

For example, the first thing we can hook into the explicit switchdev
offloading API calls are the switchdev object and FDB replay helpers.
So far, these have only been used by DSA in "pull" mode (where the
driver must ask for them). Adding the replay helpers to other drivers
involves a lot of repetition. But by moving the helpers inside the
bridge port offload/unoffload hook points, we can move the entire replay
process to "push" mode (where the bridge provides them automatically).

The explicit switchdev offloading API will see further extensions in the
future.

The patches were split from a larger series for easier review:
https://patchwork.kernel.org/project/netdevbpf/cover/20210718214434.3938850-1-vladimir.oltean@nxp.com/

Tobias Waldekranz (2):
  net: bridge: disambiguate offload_fwd_mark
  net: bridge: switchdev: recycle unused hwdoms

Vladimir Oltean (8):
  net: dpaa2-switch: use extack in dpaa2_switch_port_bridge_join
  net: dpaa2-switch: refactor prechangeupper sanity checks
  mlxsw: spectrum: refactor prechangeupper sanity checks
  mlxsw: spectrum: refactor leaving an 8021q upper that is a bridge port
  net: marvell: prestera: refactor prechangeupper sanity checks
  net: switchdev: guard drivers against multiple obj replays on same
    bridge port
  net: bridge: switchdev: let drivers inform which bridge ports are
    offloaded
  net: bridge: switchdev object replay helpers for everybody

 .../ethernet/freescale/dpaa2/dpaa2-switch.c   |  69 +++-
 .../ethernet/marvell/prestera/prestera_main.c |  99 +++--
 .../marvell/prestera/prestera_switchdev.c     |  42 ++-
 .../marvell/prestera/prestera_switchdev.h     |   7 +-
 .../net/ethernet/mellanox/mlxsw/spectrum.c    | 347 ++++++++++++------
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |   4 +
 .../mellanox/mlxsw/spectrum_switchdev.c       |  28 +-
 .../microchip/sparx5/sparx5_switchdev.c       |  48 ++-
 drivers/net/ethernet/mscc/ocelot_net.c        | 115 ++++--
 drivers/net/ethernet/rocker/rocker.h          |   9 +-
 drivers/net/ethernet/rocker/rocker_main.c     |  34 +-
 drivers/net/ethernet/rocker/rocker_ofdpa.c    |  42 ++-
 drivers/net/ethernet/ti/am65-cpsw-nuss.c      |  34 +-
 drivers/net/ethernet/ti/am65-cpsw-switchdev.c |  14 +-
 drivers/net/ethernet/ti/am65-cpsw-switchdev.h |   3 +
 drivers/net/ethernet/ti/cpsw_new.c            |  32 +-
 drivers/net/ethernet/ti/cpsw_switchdev.c      |   4 +-
 drivers/net/ethernet/ti/cpsw_switchdev.h      |   3 +
 include/linux/if_bridge.h                     |  60 +--
 net/bridge/br_fdb.c                           |   1 -
 net/bridge/br_if.c                            |  11 +-
 net/bridge/br_mdb.c                           |   1 -
 net/bridge/br_private.h                       |  61 ++-
 net/bridge/br_switchdev.c                     | 254 +++++++++++--
 net/bridge/br_vlan.c                          |   1 -
 net/dsa/port.c                                |  83 ++---
 26 files changed, 1059 insertions(+), 347 deletions(-)

Comments

Ido Schimmel July 20, 2021, 2:01 p.m. UTC | #1
On Tue, Jul 20, 2021 at 04:46:45PM +0300, Vladimir Oltean wrote:
> The explicit switchdev offloading API will see further extensions in the
> future.
> 
> The patches were split from a larger series for easier review:
> https://patchwork.kernel.org/project/netdevbpf/cover/20210718214434.3938850-1-vladimir.oltean@nxp.com/

This is not what I meant. I specifically suggested to get the TX
forwarding offload first and then extending the API with an argument to
opt-in for the replay / cleanup:

https://lore.kernel.org/netdev/YPbU20%2Fcjkz04s8b@shredder/

> 
> Tobias Waldekranz (2):
>   net: bridge: disambiguate offload_fwd_mark
>   net: bridge: switchdev: recycle unused hwdoms
> 
> Vladimir Oltean (8):
>   net: dpaa2-switch: use extack in dpaa2_switch_port_bridge_join
>   net: dpaa2-switch: refactor prechangeupper sanity checks
>   mlxsw: spectrum: refactor prechangeupper sanity checks
>   mlxsw: spectrum: refactor leaving an 8021q upper that is a bridge port
>   net: marvell: prestera: refactor prechangeupper sanity checks
>   net: switchdev: guard drivers against multiple obj replays on same
>     bridge port
>   net: bridge: switchdev: let drivers inform which bridge ports are
>     offloaded
>   net: bridge: switchdev object replay helpers for everybody
> 
>  .../ethernet/freescale/dpaa2/dpaa2-switch.c   |  69 +++-
>  .../ethernet/marvell/prestera/prestera_main.c |  99 +++--
>  .../marvell/prestera/prestera_switchdev.c     |  42 ++-
>  .../marvell/prestera/prestera_switchdev.h     |   7 +-
>  .../net/ethernet/mellanox/mlxsw/spectrum.c    | 347 ++++++++++++------
>  .../net/ethernet/mellanox/mlxsw/spectrum.h    |   4 +
>  .../mellanox/mlxsw/spectrum_switchdev.c       |  28 +-
>  .../microchip/sparx5/sparx5_switchdev.c       |  48 ++-
>  drivers/net/ethernet/mscc/ocelot_net.c        | 115 ++++--
>  drivers/net/ethernet/rocker/rocker.h          |   9 +-
>  drivers/net/ethernet/rocker/rocker_main.c     |  34 +-
>  drivers/net/ethernet/rocker/rocker_ofdpa.c    |  42 ++-
>  drivers/net/ethernet/ti/am65-cpsw-nuss.c      |  34 +-
>  drivers/net/ethernet/ti/am65-cpsw-switchdev.c |  14 +-
>  drivers/net/ethernet/ti/am65-cpsw-switchdev.h |   3 +
>  drivers/net/ethernet/ti/cpsw_new.c            |  32 +-
>  drivers/net/ethernet/ti/cpsw_switchdev.c      |   4 +-
>  drivers/net/ethernet/ti/cpsw_switchdev.h      |   3 +
>  include/linux/if_bridge.h                     |  60 +--
>  net/bridge/br_fdb.c                           |   1 -
>  net/bridge/br_if.c                            |  11 +-
>  net/bridge/br_mdb.c                           |   1 -
>  net/bridge/br_private.h                       |  61 ++-
>  net/bridge/br_switchdev.c                     | 254 +++++++++++--
>  net/bridge/br_vlan.c                          |   1 -
>  net/dsa/port.c                                |  83 ++---
>  26 files changed, 1059 insertions(+), 347 deletions(-)
> 
> -- 
> 2.25.1
>
Vladimir Oltean July 20, 2021, 2:12 p.m. UTC | #2
On Tue, Jul 20, 2021 at 05:01:48PM +0300, Ido Schimmel wrote:
> > The patches were split from a larger series for easier review:
> 
> This is not what I meant. I specifically suggested to get the TX
> forwarding offload first and then extending the API with an argument to
> opt-in for the replay / cleanup:

Yeah, ok, I did not get that and I had already reposted by the time you
clarified, sorry.

Anyway, is it so bad that we cannot look at the patches in the order
that they are in right now (even if this means that maybe a few more
days would pass)? To me it makes a bit more sense anyway to first
consolidate the code that is already in the tree right now, before
adding new logic. And I don't really want to rebase the patches again to
change the ordering and risk a build breakage without a good reason.
Ido Schimmel July 20, 2021, 2:25 p.m. UTC | #3
On Tue, Jul 20, 2021 at 02:12:01PM +0000, Vladimir Oltean wrote:
> On Tue, Jul 20, 2021 at 05:01:48PM +0300, Ido Schimmel wrote:
> > > The patches were split from a larger series for easier review:
> > 
> > This is not what I meant. I specifically suggested to get the TX
> > forwarding offload first and then extending the API with an argument to
> > opt-in for the replay / cleanup:
> 
> Yeah, ok, I did not get that and I had already reposted by the time you
> clarified, sorry.
> 
> Anyway, is it so bad that we cannot look at the patches in the order
> that they are in right now (even if this means that maybe a few more
> days would pass)? To me it makes a bit more sense anyway to first
> consolidate the code that is already in the tree right now, before
> adding new logic. And I don't really want to rebase the patches again to
> change the ordering and risk a build breakage without a good reason.

If you don't want to change the order, then at least make the
replay/cleanup optional and set it to 'false' for mlxsw. This should
mean that the only change in mlxsw should be adding calls to
switchdev_bridge_port_offload() / switchdev_bridge_port_unoffload() in
mlxsw_sp_bridge_port_create() / mlxsw_sp_bridge_port_destroy(),
respectively.
Vladimir Oltean July 20, 2021, 2:46 p.m. UTC | #4
On Tue, Jul 20, 2021 at 05:25:08PM +0300, Ido Schimmel wrote:
> If you don't want to change the order, then at least make the
> replay/cleanup optional and set it to 'false' for mlxsw. This should
> mean that the only change in mlxsw should be adding calls to
> switchdev_bridge_port_offload() / switchdev_bridge_port_unoffload() in
> mlxsw_sp_bridge_port_create() / mlxsw_sp_bridge_port_destroy(),
> respectively.

I mean, I could guard br_{vlan,mdb,fdb}_replay() against NULL notifier
block pointers, and then make mlxsw pass NULL for both the atomic_nb and
blocking_nb.

But why? How do you deal with a host-joined mdb that was auto-installed
while there was no port under the bridge? How does anyone deal with
that? What's optional about it? Why would driver X opt out of it but Y
not (apart for the case where driver X does not offload MDBs at all,
that I can understand).
Ido Schimmel July 20, 2021, 2:51 p.m. UTC | #5
On Tue, Jul 20, 2021 at 02:46:18PM +0000, Vladimir Oltean wrote:
> On Tue, Jul 20, 2021 at 05:25:08PM +0300, Ido Schimmel wrote:
> > If you don't want to change the order, then at least make the
> > replay/cleanup optional and set it to 'false' for mlxsw. This should
> > mean that the only change in mlxsw should be adding calls to
> > switchdev_bridge_port_offload() / switchdev_bridge_port_unoffload() in
> > mlxsw_sp_bridge_port_create() / mlxsw_sp_bridge_port_destroy(),
> > respectively.
> 
> I mean, I could guard br_{vlan,mdb,fdb}_replay() against NULL notifier
> block pointers, and then make mlxsw pass NULL for both the atomic_nb and
> blocking_nb.
> 
> But why? How do you deal with a host-joined mdb that was auto-installed
> while there was no port under the bridge?

mlxsw does not currently support such entries. It's on my TODO list.
When we add support for that, we will also take care of the replay.

> How does anyone deal with that? What's optional about it? Why would
> driver X opt out of it but Y not (apart for the case where driver X
> does not offload MDBs at all, that I can understand).
Vladimir Oltean July 20, 2021, 3:36 p.m. UTC | #6
On Tue, Jul 20, 2021 at 05:51:24PM +0300, Ido Schimmel wrote:
> On Tue, Jul 20, 2021 at 02:46:18PM +0000, Vladimir Oltean wrote:
> > On Tue, Jul 20, 2021 at 05:25:08PM +0300, Ido Schimmel wrote:
> > > If you don't want to change the order, then at least make the
> > > replay/cleanup optional and set it to 'false' for mlxsw. This should
> > > mean that the only change in mlxsw should be adding calls to
> > > switchdev_bridge_port_offload() / switchdev_bridge_port_unoffload() in
> > > mlxsw_sp_bridge_port_create() / mlxsw_sp_bridge_port_destroy(),
> > > respectively.
> > 
> > I mean, I could guard br_{vlan,mdb,fdb}_replay() against NULL notifier
> > block pointers, and then make mlxsw pass NULL for both the atomic_nb and
> > blocking_nb.
> > 
> > But why? How do you deal with a host-joined mdb that was auto-installed
> > while there was no port under the bridge?
> 
> mlxsw does not currently support such entries. It's on my TODO list.
> When we add support for that, we will also take care of the replay.

Okay, that I can do. I had the impression that mlxsw does - I knew for
certain that DSA isn't the only driver offloading SWITCHDEV_OBJ_ID_HOST_MDB
so I looked it up right now, and I remembered. cpsw was the other driver,
and it does a pretty funny thing: the same thing as for
SWITCHDEV_OBJ_ID_PORT_MDB.

I guess I'll just provide NULL pointers for every driver except those I
already received acks for (dpaa2-switch, ocelot) and DSA. Then driver
maintainers can take it from there as they wish. Hopefully this should
also make the patches slide in easier.
Vladimir Oltean July 20, 2021, 7:47 p.m. UTC | #7
On Tue, Jul 20, 2021 at 05:25:08PM +0300, Ido Schimmel wrote:
> If you don't want to change the order, then at least make the
> replay/cleanup optional and set it to 'false' for mlxsw. This should
> mean that the only change in mlxsw should be adding calls to
> switchdev_bridge_port_offload() / switchdev_bridge_port_unoffload() in
> mlxsw_sp_bridge_port_create() / mlxsw_sp_bridge_port_destroy(),
> respectively.

Is there any specific reason why you suggested me to move the
switchdev_bridge_port_offload() call from the top-level
mlxsw_sp_port_bridge_join() into mlxsw_sp_bridge_port_create()
(and similarly, from _pre_bridge_leave to _destroy)?

Even if you don't support replays right now, you'd need to move a bunch
of code around before you would get them to work. As far as I can see,
mlxsw_sp_bridge_port_create() is a bit too early and
mlxsw_sp_bridge_port_destroy() is a bit too late. The port needs to be
fairly up and running to be able to process the switchdev notifiers at
that stage.

Do you mind if I keep the hooks where they are, which is what I do for
all drivers? I don't think I am missing to handle any case.
Ido Schimmel July 21, 2021, 6:55 a.m. UTC | #8
On Tue, Jul 20, 2021 at 07:47:18PM +0000, Vladimir Oltean wrote:
> On Tue, Jul 20, 2021 at 05:25:08PM +0300, Ido Schimmel wrote:
> > If you don't want to change the order, then at least make the
> > replay/cleanup optional and set it to 'false' for mlxsw. This should
> > mean that the only change in mlxsw should be adding calls to
> > switchdev_bridge_port_offload() / switchdev_bridge_port_unoffload() in
> > mlxsw_sp_bridge_port_create() / mlxsw_sp_bridge_port_destroy(),
> > respectively.
> 
> Is there any specific reason why you suggested me to move the
> switchdev_bridge_port_offload() call from the top-level
> mlxsw_sp_port_bridge_join() into mlxsw_sp_bridge_port_create()
> (and similarly, from _pre_bridge_leave to _destroy)?
> 
> Even if you don't support replays right now, you'd need to move a bunch
> of code around before you would get them to work. As far as I can see,
> mlxsw_sp_bridge_port_create() is a bit too early and
> mlxsw_sp_bridge_port_destroy() is a bit too late. The port needs to be
> fairly up and running to be able to process the switchdev notifiers at
> that stage.
> 
> Do you mind if I keep the hooks where they are, which is what I do for
> all drivers? I don't think I am missing to handle any case.

I want to avoid introducing unnecessary changes. I will do them when
needed.

Thanks