mbox series

[RFC,net-next,0/8] Drop rtnl_lock from DSA .port_fdb_{add,del}

Message ID 20210824114049.3814660-1-vladimir.oltean@nxp.com (mailing list archive)
Headers show
Series Drop rtnl_lock from DSA .port_fdb_{add,del} | expand

Message

Vladimir Oltean Aug. 24, 2021, 11:40 a.m. UTC
This is a heads-up to DSA driver maintainers: in about 3 weeks time
(when the development cycle for v5.16 begins), I will come back with
these patches and attempt to drop the rtnl_lock guarantee from the DSA
.port_fdb_add and .port_fdb_del methods.

Plans might change, but this seems like an overall beneficial change if
we could make it, regardless of whether it is going to be part of the
final solution for enforcing FDB isolation in DSA.

After applying the entire patch set, the .port_fdb_add and .port_fdb_del
methods will run unlocked, and I would appreciate any regression test
that a maintainer can run on their hardware. Most drivers have locking
of sorts, but I wouldn't trust it, since it wasn't really put to test
until now.

The change set is structured as follows (from bottom to top):

1. A self test that driver maintainers could run while testing their
   newly unlocked ops. It is not bullet-proof, but I have found issues
   with it, so maybe it is useful.

   To run it, rsync the entire "selftests" folder to the board, then run:

   ./selftests/drivers/net/dsa/test_bridge_fdb_stress.sh sw0p2

   For the sja1105 driver, this was an indication that the internal
   locking was not sufficient:

[  282.615386] sja1105 spi2.0: port 2 failed to read back entry for 00:01:02:03:04:05 vid 1: -ENOENT <- printed by the driver
[  282.624796] sja1105 spi2.0: port 2 failed to add 00:01:02:03:04:05 vid 1 to fdb: -2 <- printed by DSA

   The self-test does not test traffic, but it would be nice to check if
   the switch still behaves normally after the test finishes.

2. The DSA changes themselves that drop the rtnl_lock.

3. Some example changes that I needed to make in two drivers I could
   test. I would very much prefer avoiding non-expert, wide ranging
   locking schemes such as an "FDB lock" or a "register lock". It is
   best to understand what needs to be atomic and what can be safely
   concurrent.

As usual, feedback and ACKs/NACKs are very welcome.

Vladimir Oltean (8):
  net: dsa: sja1105: wait for dynamic config command completion on
    writes too
  net: dsa: sja1105: serialize access to the dynamic config interface
  net: mscc: ocelot: serialize access to the MAC table
  net: dsa: introduce locking for the address lists on CPU and DSA ports
  net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work
  net: dsa: flush switchdev workqueue when leaving the bridge
  selftests: lib: forwarding: allow tests to not require mz and jq
  selftests: net: dsa: add a stress test for unlocked FDB operations

 MAINTAINERS                                   |  1 +
 drivers/net/dsa/sja1105/sja1105.h             |  2 +
 .../net/dsa/sja1105/sja1105_dynamic_config.c  | 91 ++++++++++++++-----
 drivers/net/dsa/sja1105/sja1105_main.c        |  1 +
 drivers/net/ethernet/mscc/ocelot.c            | 53 ++++++++---
 include/net/dsa.h                             |  1 +
 include/soc/mscc/ocelot.h                     |  3 +
 net/dsa/dsa.c                                 |  5 +
 net/dsa/dsa2.c                                |  1 +
 net/dsa/dsa_priv.h                            |  2 +
 net/dsa/port.c                                |  2 +
 net/dsa/slave.c                               |  2 -
 net/dsa/switch.c                              | 76 +++++++++++-----
 .../drivers/net/dsa/test_bridge_fdb_stress.sh | 48 ++++++++++
 tools/testing/selftests/net/forwarding/lib.sh | 10 +-
 15 files changed, 235 insertions(+), 63 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/dsa/test_bridge_fdb_stress.sh