[net,0/3] net: fix issues around register_netdevice() failures

Message ID 20210106184007.1821480-1-kuba@kernel.org

Message

Jakub Kicinski Jan. 6, 2021, 6:40 p.m. UTC
This series attempts to clean up the life cycle of struct
net_device. Dave added dev->needs_free_netdev in the past
to fix double frees; we can lean on that mechanism a little
more to fix the remaining issues with register_netdevice().

This is the next chapter of a saga that already includes:
commit 0e0eee2465df ("net: correct error path in rtnl_newlink()")
commit e51fb152318e ("rtnetlink: fix a memory leak when ->newlink fails")
commit cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.")
commit 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.")
commit 814152a89ed5 ("net: fix memleak in register_netdevice()")
commit 10cc514f451a ("net: Fix null de-reference of device refcount")

The immediate problem fixed here is that calling free_netdev()
right after unregister_netdevice() is illegal, because rtnl_lock
must be released first to let the unregistration finish. Note
that unregister_netdevice() is just a wrapper around
unregister_netdevice_queue(), and it only does half of the job:
the rest runs from net_todo when rtnl_unlock() is called.
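A minimal user-space sketch of the two phases, assuming a heavily
simplified model: the model_* names and the single todo list below
are hypothetical stand-ins, not kernel APIs. Unregistering under the
lock only queues the device; dropping the lock is what finishes the
job, which is why freeing in between is not allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified user-space model; not kernel code. */
enum model_reg_state { MODEL_REGISTERED, MODEL_UNREGISTERING, MODEL_UNREGISTERED };

struct model_dev {
	enum model_reg_state reg_state;
	struct model_dev *todo_next;	/* link in the model todo list */
};

static bool model_rtnl_held;
static struct model_dev *model_todo_list;

static void model_rtnl_lock(void)
{
	assert(!model_rtnl_held);
	model_rtnl_held = true;
}

/* Phase 1: runs under the lock and only queues the device. */
static void model_unregister_netdevice(struct model_dev *dev)
{
	assert(model_rtnl_held);
	dev->reg_state = MODEL_UNREGISTERING;
	dev->todo_next = model_todo_list;
	model_todo_list = dev;
}

/* Phase 2: dropping the lock drains the todo list; this stands in for
 * the work the kernel does from net_todo on rtnl_unlock(). */
static void model_rtnl_unlock(void)
{
	assert(model_rtnl_held);
	model_rtnl_held = false;
	while (model_todo_list) {
		struct model_dev *dev = model_todo_list;
		model_todo_list = dev->todo_next;
		dev->reg_state = MODEL_UNREGISTERED;
	}
}

/* In this model, freeing is only legal once phase 2 has completed. */
static bool model_free_is_legal(const struct model_dev *dev)
{
	return dev->reg_state != MODEL_UNREGISTERING;
}
```

In the model, model_free_is_legal() returns false between
model_unregister_netdevice() and model_rtnl_unlock(), which is the
window the paragraph above warns about.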

Where this limitation becomes most problematic is in the failure
modes of register_netdevice(). There is a notifier call right
at the end of it, which lets other subsystems veto the entire
registration. At that point we should really go through a full
unregister_netdevice(), but we can't, because callers may
go straight to free_netdev() after the failure, and that's
no bueno (see the previous paragraph).

This set makes free_netdev() more lenient: when the device
is still being unregistered, free_netdev() will simply set
dev->needs_free_netdev and let the unregister process do
the freeing.
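The lenient free_netdev() behaviour can be sketched in user space as
follows. This is a hypothetical model, not the actual patch:
free_count stands in for releasing memory, and the illustrative
model_finish_unregister() stands in for the tail of the unregister
process.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; the names are not kernel APIs. */
enum model_reg_state { MODEL_REGISTERED, MODEL_UNREGISTERING, MODEL_UNREGISTERED };

struct model_dev {
	enum model_reg_state reg_state;
	bool needs_free_netdev;
	int free_count;		/* counts frees instead of calling free() */
};

static void model_do_free(struct model_dev *dev)
{
	dev->free_count++;
}

/* If unregistration is still in flight, defer: mark the device so the
 * unregister path frees it once it finishes. */
static void model_free_netdev(struct model_dev *dev)
{
	if (dev->reg_state == MODEL_UNREGISTERING) {
		dev->needs_free_netdev = true;
		return;
	}
	model_do_free(dev);
}

/* Tail end of the unregister process (runs after the lock is dropped). */
static void model_finish_unregister(struct model_dev *dev)
{
	dev->reg_state = MODEL_UNREGISTERED;
	if (dev->needs_free_netdev)
		model_do_free(dev);
}
```

A device freed mid-unregistration is released exactly once, by the
unregister path; a fully unregistered device is freed immediately.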

With the free_netdev() problem out of the way, failures in
register_netdevice() can once again make use of net_todo.
Callers are still expected to call free_netdev() right after
a failure, but that will now only set dev->needs_free_netdev.

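To prevent the pathological case of:

 dev->needs_free_netdev = true;
 if (register_netdevice(dev)) {
   rtnl_unlock();     /* runs net_todo, which would free dev if the flag stayed set */
   free_netdev(dev);  /* ...making this a double free */
 }

register_netdevice() clears dev->needs_free_netdev when it fails.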
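A hypothetical user-space model of the failure path, showing why
clearing the flag matters. The model_* names, the veto switch and the
free counter are illustrative assumptions, not kernel APIs; the
counter lets us check that each device is freed exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; not kernel code. */
enum model_reg_state { MODEL_UNINITIALIZED, MODEL_REGISTERED,
		       MODEL_UNREGISTERING, MODEL_UNREGISTERED };

struct model_dev {
	enum model_reg_state reg_state;
	bool needs_free_netdev;
	int free_count;			/* counts frees instead of calling free() */
	struct model_dev *todo_next;
};

static struct model_dev *model_todo_list;
static bool model_notifier_vetoes;	/* simulates a subsystem vetoing registration */

/* Lenient free: defer to the todo list while unregistration is pending. */
static void model_free_netdev(struct model_dev *dev)
{
	if (dev->reg_state == MODEL_UNREGISTERING) {
		dev->needs_free_netdev = true;
		return;
	}
	dev->free_count++;
}

static int model_register_netdevice(struct model_dev *dev)
{
	dev->reg_state = MODEL_REGISTERED;
	if (model_notifier_vetoes) {
		/* Expect an explicit free_netdev() from the caller. */
		dev->needs_free_netdev = false;
		dev->reg_state = MODEL_UNREGISTERING;
		dev->todo_next = model_todo_list;
		model_todo_list = dev;
		return -1;		/* stands in for an errno */
	}
	return 0;
}

/* Drains the todo list, as dropping rtnl_lock does in the kernel. */
static void model_rtnl_unlock(void)
{
	while (model_todo_list) {
		struct model_dev *dev = model_todo_list;
		model_todo_list = dev->todo_next;
		dev->reg_state = MODEL_UNREGISTERED;
		if (dev->needs_free_netdev)
			model_free_netdev(dev);
	}
}
```

If the line clearing needs_free_netdev is removed, the pathological
sequence ends with free_count == 2 in this model, i.e. a double free.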

The problems described above are only present with
register_netdevice() / unregister_netdevice(). We have two
parallel APIs for registration of devices:
 - those called outside rtnl_lock: register_netdev() and
   unregister_netdev();
 - and those to be used under rtnl_lock: register_netdevice()
   and unregister_netdevice().
The former is trivial and has no problems. The alternative
approach to fixing the latter would be to also separate the
freeing functions, i.e. add free_netdevice(). This has been
implemented (incl. converting all relevant calls in the tree),
but it feels a little unnecessary to put the burden of choosing
the right free_netdev{,ice}() call on the programmer when we
can "just do the right thing" by default.
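Why the outside-the-lock variants are trivial, sketched as a
hypothetical user-space model (the model_* names and the one-slot
pending pointer are simplifications, not kernel code): they wrap the
locked variants in lock/unlock, so the deferred half of unregistration
has already run by the time they return, and an immediate free_netdev()
is safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; not kernel code. */
enum model_reg_state { MODEL_UNINITIALIZED, MODEL_REGISTERED,
		       MODEL_UNREGISTERING, MODEL_UNREGISTERED };

struct model_dev {
	enum model_reg_state reg_state;
};

static bool model_rtnl_held;
static struct model_dev *model_pending;	/* one-slot stand-in for the todo list */

static void model_rtnl_lock(void)
{
	assert(!model_rtnl_held);
	model_rtnl_held = true;
}

/* Dropping the lock completes any pending unregistration. */
static void model_rtnl_unlock(void)
{
	assert(model_rtnl_held);
	model_rtnl_held = false;
	if (model_pending) {
		model_pending->reg_state = MODEL_UNREGISTERED;
		model_pending = NULL;
	}
}

/* Locked variants: the caller must hold the lock. */
static int model_register_netdevice(struct model_dev *dev)
{
	assert(model_rtnl_held);
	dev->reg_state = MODEL_REGISTERED;
	return 0;
}

static void model_unregister_netdevice(struct model_dev *dev)
{
	assert(model_rtnl_held);
	dev->reg_state = MODEL_UNREGISTERING;
	model_pending = dev;
}

/* Unlocked variants: take and drop the lock themselves. */
static int model_register_netdev(struct model_dev *dev)
{
	model_rtnl_lock();
	int err = model_register_netdevice(dev);
	model_rtnl_unlock();
	return err;
}

static void model_unregister_netdev(struct model_dev *dev)
{
	model_rtnl_lock();
	model_unregister_netdevice(dev);
	model_rtnl_unlock();
}
```

After model_unregister_netdev() returns, the device is already fully
unregistered, so the caller never observes the in-between state.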

Jakub Kicinski (3):
  docs: net: explain struct net_device lifetime
  net: make free_netdev() more lenient with unregistering devices
  net: make sure devices go through netdev_wait_all_refs

 Documentation/networking/netdevices.rst | 171 +++++++++++++++++++++++-
 net/8021q/vlan.c                        |   4 +-
 net/core/dev.c                          |  25 ++--
 net/core/rtnetlink.c                    |  23 +---
 4 files changed, 187 insertions(+), 36 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org Jan. 9, 2021, 3:40 a.m. UTC | #1
Hello:

This series was applied to netdev/net.git (refs/heads/master):

On Wed,  6 Jan 2021 10:40:04 -0800 you wrote:
> [...]

Here is the summary with links:
  - [net,1/3] docs: net: explain struct net_device lifetime
    https://git.kernel.org/netdev/net/c/2b446e650b41
  - [net,2/3] net: make free_netdev() more lenient with unregistering devices
    https://git.kernel.org/netdev/net/c/c269a24ce057
  - [net,3/3] net: make sure devices go through netdev_wait_all_refs
    https://git.kernel.org/netdev/net/c/766b0515d5be

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html