From patchwork Wed Jan 6 18:40:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Kicinski X-Patchwork-Id: 12002077 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.3 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60DB2C433E9 for ; Wed, 6 Jan 2021 18:41:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3222523138 for ; Wed, 6 Jan 2021 18:41:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726589AbhAFSk4 (ORCPT ); Wed, 6 Jan 2021 13:40:56 -0500 Received: from mail.kernel.org ([198.145.29.99]:60926 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725800AbhAFSky (ORCPT ); Wed, 6 Jan 2021 13:40:54 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id B56282312E; Wed, 6 Jan 2021 18:40:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1609958414; bh=+RzsTnhe1k6OhIx+Tp2vR0dTQGfo1k6qgy8dqBi1zfM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Yp6kb5Co74v9jefJdyBsxxixxJ53oilBzlWW9krWJqCY7Fz1DzQxyLu1APV+usS1g JNVmQvGvTHFI3bTZ6UILFEqgLtun3alHKj6pqSyxGWggYtOfspA/Udcct7WjoRD7gh ZUjzy1DN4b9UCKUEvFXBDx5/B0av7ulSgMh2WLTZNjQ1jSdvOaCsvqGCNq4O5abtXj /uZoDXPScFfcuUAx5QoOnjrz40naz8sYUNum1cE2fna+eTezbCuZ8bpl2oQYeh+FRh mSv+aHHD1mstx9+EX9JLUS3xfolnX7L9W2LcfJHf0YSnVwAZGpOk5IJtZoTChgxfaP yaiineGHYvSXA== From: Jakub Kicinski To: davem@davemloft.net Cc: netdev@vger.kernel.org, linux-doc@vger.kernel.org, f.fainelli@gmail.com, xiyou.wangcong@gmail.com, Jakub Kicinski Subject: [PATCH net 1/3] docs: net: explain struct net_device lifetime Date: Wed, 6 Jan 2021 10:40:05 -0800 Message-Id: <20210106184007.1821480-2-kuba@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210106184007.1821480-1-kuba@kernel.org> References: <20210106184007.1821480-1-kuba@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Explain the two basic flows of struct net_device's operation. Signed-off-by: Jakub Kicinski --- Documentation/networking/netdevices.rst | 171 +++++++++++++++++++++++- net/core/rtnetlink.c | 2 +- 2 files changed, 166 insertions(+), 7 deletions(-) diff --git a/Documentation/networking/netdevices.rst b/Documentation/networking/netdevices.rst index e65665c5ab50..17bdcb746dcf 100644 --- a/Documentation/networking/netdevices.rst +++ b/Documentation/networking/netdevices.rst @@ -10,18 +10,177 @@ Introduction The following is a random collection of documentation regarding network devices. -struct net_device allocation rules -================================== +struct net_device lifetime rules +================================ Network device structures need to persist even after module is unloaded and must be allocated with alloc_netdev_mqs() and friends. If device has registered successfully, it will be freed on last use -by free_netdev(). This is required to handle the pathologic case cleanly -(example: rmmod mydriver needs_free_netdev = true; + } + + static void my_destructor(struct net_device *dev) + { + some_obj_destroy(priv->obj); + some_uninit(priv); + } + + int create_link() + { + struct my_device_priv *priv; + int err; + + ASSERT_RTNL(); + + dev = alloc_netdev(sizeof(*priv), "net%d", NET_NAME_UNKNOWN, my_setup); + if (!dev) + return -ENOMEM; + priv = netdev_priv(dev); + + /* Implicit constructor */ + err = some_init(priv); + if (err) + goto err_free_dev; + + priv->obj = some_obj_create(); + if (!priv->obj) { + err = -ENOMEM; + goto err_some_uninit; + } + /* End of constructor, set the destructor: */ + dev->priv_destructor = my_destructor; + + err = register_netdevice(dev); + if (err) + /* register_netdevice() calls destructor on failure */ + goto err_free_dev; + + /* If anything fails now unregister_netdevice() (or unregister_netdev()) + * will take care of calling my_destructor and free_netdev(). + */ + + return 0; + + err_some_uninit: + some_uninit(priv); + err_free_dev: + free_netdev(dev); + return err; + } + +If struct net_device.priv_destructor is set it will be called by the core +some time after unregister_netdevice(), it will also be called if +register_netdevice() fails. The callback may be invoked with or without +``rtnl_lock`` held. + +There is no explicit constructor callback, driver "constructs" the private +netdev state after allocating it and before registration. + +Setting struct net_device.needs_free_netdev makes core call free_netdevice() +automatically after unregister_netdevice() when all references to the device +are gone. It only takes effect after a successful call to register_netdevice() +so if register_netdevice() fails driver is responsible for calling +free_netdev(). + +free_netdev() is safe to call on error paths right after unregister_netdevice() +or when register_netdevice() fails. Parts of netdev (de)registration process +happen after ``rtnl_lock`` is released, therefore in those cases free_netdev() +will defer some of the processing until ``rtnl_lock`` is released. + +Devices spawned from struct rtnl_link_ops should never free the +struct net_device directly. + +.ndo_init and .ndo_uninit +~~~~~~~~~~~~~~~~~~~~~~~~~ + +``.ndo_init`` and ``.ndo_uninit`` callbacks are called during net_device +registration and de-registration, under ``rtnl_lock``. Drivers can use +those e.g. when parts of their init process need to run under ``rtnl_lock``. + +``.ndo_init`` runs before device is visible in the system, ``.ndo_uninit`` +runs during de-registering after device is closed but other subsystems +may still have outstanding references to the netdevice. MTU === diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index bb0596c41b3e..79f514afb17d 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3441,7 +3441,7 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, if (ops->newlink) { err = ops->newlink(link_net ? : net, dev, tb, data, extack); - /* Drivers should call free_netdev() in ->destructor + /* Drivers should set dev->needs_free_netdev * and unregister it on failure after registration * so that device could be finally freed in rtnl_unlock. */ From patchwork Wed Jan 6 18:40:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Kicinski X-Patchwork-Id: 12002079 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.3 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C100C433E6 for ; Wed, 6 Jan 2021 18:41:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1BED523125 for ; Wed, 6 Jan 2021 18:41:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726551AbhAFSk4 (ORCPT ); Wed, 6 Jan 2021 13:40:56 -0500 Received: from mail.kernel.org ([198.145.29.99]:60938 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726416AbhAFSkz (ORCPT ); Wed, 6 Jan 2021 13:40:55 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 25F6723132; Wed, 6 Jan 2021 18:40:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1609958414; bh=T9ZCTKpmGSnUw53mKVrZVPldMGN72kA4kpFQX1gW0GE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=cTF1rS9ysaq72jZtolgUyGrJ7dkoTP9uTi4ZUNixWoE21DCR7FuSnunvuwsv09foL iuxAOLHvy6V2tRlc6Sb6sO8aNFyz9rQbrj+0bjJkpnuxQsoZRO58CU1I2fXK64Utun 7GNzpG4iFUQJEXzai63qFSLfwzwoO8TB06CSBFmkSFryZ1hV5eESpBWq+1DCd/bAtb pxs4FqeJxV5Ei2iYI4GvOKjs0zpXGg3oS7uRAWXAOYV6GGh/ILl6YR1WukckMTWa4a rt4W2FvXcwjHUe7WK1+6R6Uv5280mBGs3XloePW2qpku6QmIsCVZx6GQAtI2pvg8nd lAuWrJZngEHTQ== From: Jakub Kicinski To: davem@davemloft.net Cc: netdev@vger.kernel.org, linux-doc@vger.kernel.org, f.fainelli@gmail.com, xiyou.wangcong@gmail.com, Jakub Kicinski Subject: [PATCH net 2/3] net: make free_netdev() more lenient with unregistering devices Date: Wed, 6 Jan 2021 10:40:06 -0800 Message-Id: <20210106184007.1821480-3-kuba@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210106184007.1821480-1-kuba@kernel.org> References: <20210106184007.1821480-1-kuba@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org There are two flavors of handling netdev registration: - ones called without holding rtnl_lock: register_netdev() and unregister_netdev(); and - those called with rtnl_lock held: register_netdevice() and unregister_netdevice(). While the semantics of the former are pretty clear, the same can't be said about the latter. The netdev_todo mechanism is utilized to perform some of the device unregistering tasks and it hooks into rtnl_unlock() so the locked variants can't actually finish the work. In general free_netdev() does not mix well with locked calls. Most drivers operating under rtnl_lock set dev->needs_free_netdev to true and expect core to make the free_netdev() call some time later. The part where this becomes most problematic is error paths. There is no way to unwind the state cleanly after a call to register_netdevice(), since unreg can't be performed fully without dropping locks. Make free_netdev() more lenient, and defer the freeing if device is being unregistered. This allows error paths to simply call free_netdev() both after register_netdevice() failed, and after a call to unregister_netdevice() but before dropping rtnl_lock. Simplify the error paths which are currently doing gymnastics around free_netdev() handling. Signed-off-by: Jakub Kicinski --- net/8021q/vlan.c | 4 +--- net/core/dev.c | 11 +++++++++++ net/core/rtnetlink.c | 23 ++++++----------------- 3 files changed, 18 insertions(+), 20 deletions(-) diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index 15bbfaf943fd..8b644113715e 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -284,9 +284,7 @@ static int register_vlan_device(struct net_device *real_dev, u16 vlan_id) return 0; out_free_newdev: - if (new_dev->reg_state == NETREG_UNINITIALIZED || - new_dev->reg_state == NETREG_UNREGISTERED) - free_netdev(new_dev); + free_netdev(new_dev); return err; } diff --git a/net/core/dev.c b/net/core/dev.c index 8fa739259041..adde93cbca9f 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -10631,6 +10631,17 @@ void free_netdev(struct net_device *dev) struct napi_struct *p, *n; might_sleep(); + + /* When called immediately after register_netdevice() failed the unwind + * handling may still be dismantling the device. Handle that case by + * deferring the free. + */ + if (dev->reg_state == NETREG_UNREGISTERING) { + ASSERT_RTNL(); + dev->needs_free_netdev = true; + return; + } + netif_free_tx_queues(dev); netif_free_rx_queues(dev); diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 79f514afb17d..3d6ab194d0f5 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3439,26 +3439,15 @@ static int __rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, dev->ifindex = ifm->ifi_index; - if (ops->newlink) { + if (ops->newlink) err = ops->newlink(link_net ? : net, dev, tb, data, extack); - /* Drivers should set dev->needs_free_netdev - * and unregister it on failure after registration - * so that device could be finally freed in rtnl_unlock. - */ - if (err < 0) { - /* If device is not registered at all, free it now */ - if (dev->reg_state == NETREG_UNINITIALIZED || - dev->reg_state == NETREG_UNREGISTERED) - free_netdev(dev); - goto out; - } - } else { + else err = register_netdevice(dev); - if (err < 0) { - free_netdev(dev); - goto out; - } + if (err < 0) { + free_netdev(dev); + goto out; } + err = rtnl_configure_link(dev, ifm); if (err < 0) goto out_unregister; From patchwork Wed Jan 6 18:40:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Kicinski X-Patchwork-Id: 12002075 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.3 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 775D0C43381 for ; Wed, 6 Jan 2021 18:41:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4A6252312F for ; Wed, 6 Jan 2021 18:41:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726641AbhAFSk5 (ORCPT ); Wed, 6 Jan 2021 13:40:57 -0500 Received: from mail.kernel.org ([198.145.29.99]:60954 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725789AbhAFSkz (ORCPT ); Wed, 6 Jan 2021 13:40:55 -0500 Received: by mail.kernel.org (Postfix) with ESMTPSA id 8A2B223137; Wed, 6 Jan 2021 18:40:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1609958414; bh=0BrWNZz5FTEATIslw6poEkKFNsItoHl93UC2ZCWjyoE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=HfQ2g8hPUPNlBlMXidj9GFsuK4OMZX/NKYDlnlH4XaeG/k9oKWeZuPTUzueec8Xd+ /3sidZ14IzfASQ2xT5f/ofLD6abPAvQBJFLuNtMiIe0jYEFemtJItziyWqYoVYB1OG 9KCdMzwpifrIy2InTt8PpUYhMDnS14nsN4hJ55PJetGTey8vcNGgOqdp0zrLGs+oAD AITOrmyiTvvdeHAQqnHmXv4qvEvZ6GimVI1qqUYGo8BR+owdx7uvmujWudj5Rd8Tb8 muR7kgxCF7hM+2FeJb/hOCSkoFW74igvhpmbPUKOTL2LYtVmfgG+KFeDcqj6kNLeTj f71eNB0Jphn/g== From: Jakub Kicinski To: davem@davemloft.net Cc: netdev@vger.kernel.org, linux-doc@vger.kernel.org, f.fainelli@gmail.com, xiyou.wangcong@gmail.com, Jakub Kicinski , Hulk Robot , Yang Yingliang Subject: [PATCH net 3/3] net: make sure devices go through netdev_wait_all_refs Date: Wed, 6 Jan 2021 10:40:07 -0800 Message-Id: <20210106184007.1821480-4-kuba@kernel.org> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210106184007.1821480-1-kuba@kernel.org> References: <20210106184007.1821480-1-kuba@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org If register_netdevice() fails at the very last stage - the notifier call - some subsystems may have already seen it and grabbed a reference. struct net_device can't be freed right away without calling netdev_wait_all_refs(). Now that we have a clean interface in form of dev->needs_free_netdev and lenient free_netdev() we can undo what commit 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.") has done and complete the unregistration path by bringing the net_set_todo() call back. After registration fails user is still expected to explicitly free the net_device, so make sure ->needs_free_netdev is cleared, otherwise rolling back the registration will cause the old double free for callers who release rtnl_lock before the free. This also solves the problem of priv_destructor not being called on notifier error. net_set_todo() will be moved back into unregister_netdevice_queue() in a follow up. Reported-by: Hulk Robot Reported-by: Yang Yingliang Signed-off-by: Jakub Kicinski --- net/core/dev.c | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index adde93cbca9f..0071a11a6dc3 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -10077,17 +10077,11 @@ int register_netdevice(struct net_device *dev) ret = call_netdevice_notifiers(NETDEV_REGISTER, dev); ret = notifier_to_errno(ret); if (ret) { + /* Expect explicit free_netdev() on failure */ + dev->needs_free_netdev = false; rollback_registered(dev); - rcu_barrier(); - - dev->reg_state = NETREG_UNREGISTERED; - /* We should put the kobject that hold in - * netdev_unregister_kobject(), otherwise - * the net device cannot be freed when - * driver calls free_netdev(), because the - * kobject is being hold. - */ - kobject_put(&dev->dev.kobj); + net_set_todo(dev); + goto out; } /* * Prevent userspace races by waiting until the network