Message ID | 20250327135659.2057487-9-sdf@fomichev.me (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: hold instance lock during NETDEV_UP/REGISTER/UNREGISTER | expand |
On Thu, 27 Mar 2025 06:56:56 -0700 Stanislav Fomichev wrote: > We don't have a consistent state yet, but document where we think > we are and where we wanna be. Thanks for adding the doc! > +Notifiers and netdev instance lock > +================================== > + > +For device drivers that implement shaping or queue management APIs, > +some of the notifiers (``enum netdev_cmd``) are running under the netdev > +instance lock. > + > +Currently only the following notifiers are running under the instance lock: I'd repeat again here: ... for devices with locked ops: > +* ``NETDEV_REGISTER`` > +* ``NETDEV_UP`` > +* ``NETDEV_UNREGISTER`` Can I ask the obvious question - anything specific that's hard in also taking it in DOWN or just no time to investigate? Symmetry would be great. > +There are no clear expectations for the remaining notifiers. Notifiers not on > +the list may run with or without the instance lock, potentially even invoking > +the same notifier type with and without the lock from different code paths. > +The goal is to eventually ensure that all (or most, with a few documented > +exceptions) notifiers run under the instance lock.
On Thu, 27 Mar 2025 12:16:13 -0700 Jakub Kicinski wrote: > > +* ``NETDEV_REGISTER`` > > +* ``NETDEV_UP`` > > +* ``NETDEV_UNREGISTER`` > > Can I ask the obvious question - anything specific that's hard in also > taking it in DOWN or just no time to investigate? Symmetry would be > great. Looking at patch 4 maybe we should do the opposite. This was my original commit msg for locking UNREGISTER: net: make NETDEV_UNREGISTER and instance lock more consistent The NETDEV_UNREGISTER notifier gets called under the ops lock when device changes namespace but not during real unregistration. Take it consistently, XSK tries to poke at netdev queue state from this notifier. So if the only caller currently under the lock is netns change, and we already split that to release the lock - maybe we can make UNREGISTER always unlocked instead?
On 03/27, Jakub Kicinski wrote: > On Thu, 27 Mar 2025 12:16:13 -0700 Jakub Kicinski wrote: > > > +* ``NETDEV_REGISTER`` > > > +* ``NETDEV_UP`` > > > +* ``NETDEV_UNREGISTER`` > > > > Can I ask the obvious question - anything specific that's hard in also > > taking it in DOWN or just no time to investigate? Symmetry would be > > great. The latter: I added locks for DOWN, hit a dev_close somewhere, and decided no to go down the rabbit hole. > Looking at patch 4 maybe we should do the opposite. This was my > original commit msg for locking UNREGISTER: > > net: make NETDEV_UNREGISTER and instance lock more consistent > > The NETDEV_UNREGISTER notifier gets called under the ops lock > when device changes namespace but not during real unregistration. > Take it consistently, XSK tries to poke at netdev queue state > from this notifier. > > So if the only caller currently under the lock is netns change, and > we already split that to release the lock - maybe we can make > UNREGISTER always unlocked instead? That sounds very sensible, let me try it out and run the tests. I'll have to drop the lock twice, once for NETDEV_UNREGISTER and another time for move_netdevice_notifiers_dev_net, but since the device is unlisted, nothing should touch it (in theory)? netif_change_net_namespace is already the first thing that happens in do_setlink, so I won't be converting it to dev_xxx (lmk if I miss something here). If it goes well, I'll also hack your xsk patch to grab the ops lock in xsk_notifier:NETDEV_UNREGISTER, I think that's the only thing that needs changes. The ugly bonding and teaming unlock/locks will go away (which is a very nice side effect).
On Thu, 27 Mar 2025 13:57:01 -0700 Stanislav Fomichev wrote: > That sounds very sensible, let me try it out and run the tests. > I'll have to drop the lock twice, once for NETDEV_UNREGISTER > and another time for move_netdevice_notifiers_dev_net, but since > the device is unlisted, nothing should touch it (in theory)? Yup, and/or we can adjust if we find a reason to, I don't think the ordering of the actions in netns changes is precisely intentional. > netif_change_net_namespace is already the first thing that happens > in do_setlink, so I won't be converting it to dev_xxx (lmk if I > miss something here). I thought you could move it outside the lock in do_setlink() and have [netif -> dev]_change_net_namespace take the lock. Dropping and taking the lock in a callee is a bit bad, so I'd prefer if the netif_ / "I want to switch netns but I'm already holding the lock" version of _change_net_namespace didn't exist at all.
On 03/27, Jakub Kicinski wrote: > On Thu, 27 Mar 2025 13:57:01 -0700 Stanislav Fomichev wrote: > > That sounds very sensible, let me try it out and run the tests. > > I'll have to drop the lock twice, once for NETDEV_UNREGISTER > > and another time for move_netdevice_notifiers_dev_net, but since > > the device is unlisted, nothing should touch it (in theory)? > > Yup, and/or we can adjust if we find a reason to, I don't think > the ordering of the actions in netns changes is precisely intentional. > > > netif_change_net_namespace is already the first thing that happens > > in do_setlink, so I won't be converting it to dev_xxx (lmk if I > > miss something here). > > I thought you could move it outside the lock in do_setlink() > and have [netif -> dev]_change_net_namespace take the lock. > Dropping and taking the lock in a callee is a bit bad, so > I'd prefer if the netif_ / "I want to switch netns but I'm already > holding the lock" version of _change_net_namespace didn't exist > at all. Looks like I also accidentally killed extack argument of netif_change_net_namespace (by always passing NULL). Will bring __dev_change_net_namespace, with proper locking and extack and will call it before grabbing a lock in the do_setlink as you suggest (with proper locking inside).
diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst index ebb868f50ac2..89337cfec36e 100644 --- a/Documentation/networking/netdevices.rst +++ b/Documentation/networking/netdevices.rst @@ -343,6 +343,24 @@ there are two sets of interfaces: ``dev_xxx`` and ``netif_xxx`` (e.g., acquiring the instance lock themselves, while the ``netif_xxx`` functions assume that the driver has already acquired the instance lock. +Notifiers and netdev instance lock +================================== + +For device drivers that implement shaping or queue management APIs, +some of the notifiers (``enum netdev_cmd``) are running under the netdev +instance lock. + +Currently only the following notifiers are running under the instance lock: +* ``NETDEV_REGISTER`` +* ``NETDEV_UP`` +* ``NETDEV_UNREGISTER`` + +There are no clear expectations for the remaining notifiers. Notifiers not on +the list may run with or without the instance lock, potentially even invoking +the same notifier type with and without the lock from different code paths. +The goal is to eventually ensure that all (or most, with a few documented +exceptions) notifiers run under the instance lock. + NETDEV_INTERNAL symbol namespace ================================
We don't have a consistent state yet, but document where we think we are and where we wanna be. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> --- Documentation/networking/netdevices.rst | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)