Message ID | 20250213-jk-iavf-abba-lock-crash-v2-1-033d7bf298f8@intel.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [iwl-net,v2] iavf: fix circular lock dependency with netdev_lock | expand |
On Thu, 13 Feb 2025 16:30:59 -0800 Jacob Keller wrote: > Analyzing the places where we take crit_lock in the driver there are two > sources: > > a) several of the work queue tasks including adminq_task, watchdog_task, > reset_task, and the finish_config task. > > b) various callbacks which ultimately stem back to .ndo operations or > ethtool operations. > > The latter cannot be triggered until after the netdevice registration is > completed successfully. > > The iAVF driver uses alloc_ordered_workqueue, which is an unbound workqueue > that has a max limit of 1, and thus guarantees that only a single work item > on the queue is executing at any given time, so none of the other work > threads could be executing due to the ordered workqueue guarantees. > > The iavf_finish_config() function also does not do anything else after > register_netdevice, unless it fails. It seems unlikely that the driver > private crit_lock is protecting anything that register_netdevice() itself > touches. > > Thus, to fix this ABBA lock violation, lets simply release the > adapter->crit_lock as well as netdev_lock prior to calling > register_netdevice(). We do still keep holding the RTNL lock as required by > the function. If we do fail to register the netdevice, then we re-acquire > the adapter critical lock to finish the transition back to > __IAVF_INIT_CONFIG_ADAPTER. > > This ensures every call where both netdev_lock and the adapter->crit_lock > are acquired under the same ordering. Thanks a lot for figuring this out, much appreciated! Reviewed-by: Jakub Kicinski <kuba@kernel.org>
> -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Jakub > Kicinski > Sent: Friday, February 14, 2025 8:40 PM > To: Keller, Jacob E <jacob.e.keller@intel.com> > Cc: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>; netdev > <netdev@vger.kernel.org>; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; > Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com> > Subject: Re: [Intel-wired-lan] [PATCH iwl-net v2] iavf: fix circular lock dependency > with netdev_lock > > On Thu, 13 Feb 2025 16:30:59 -0800 Jacob Keller wrote: > > Analyzing the places where we take crit_lock in the driver there are > > two > > sources: > > > > a) several of the work queue tasks including adminq_task, > > watchdog_task, reset_task, and the finish_config task. > > > > b) various callbacks which ultimately stem back to .ndo operations or > > ethtool operations. > > > > The latter cannot be triggered until after the netdevice registration > > is completed successfully. > > > > The iAVF driver uses alloc_ordered_workqueue, which is an unbound > > workqueue that has a max limit of 1, and thus guarantees that only a > > single work item on the queue is executing at any given time, so none > > of the other work threads could be executing due to the ordered workqueue > guarantees. > > > > The iavf_finish_config() function also does not do anything else after > > register_netdevice, unless it fails. It seems unlikely that the driver > > private crit_lock is protecting anything that register_netdevice() > > itself touches. > > > > Thus, to fix this ABBA lock violation, lets simply release the > > adapter->crit_lock as well as netdev_lock prior to calling > > register_netdevice(). We do still keep holding the RTNL lock as > > required by the function. If we do fail to register the netdevice, > > then we re-acquire the adapter critical lock to finish the transition > > back to __IAVF_INIT_CONFIG_ADAPTER. > > > > This ensures every call where both netdev_lock and the > > adapter->crit_lock are acquired under the same ordering. > > Thanks a lot for figuring this out, much appreciated! > > Reviewed-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index 852e5b62f0a5dc038c0e5c0f76541870e69384ac..6faa62bced3a2ccb935219ba0726275c8ae60365 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -1983,7 +1983,7 @@ static int iavf_reinit_interrupt_scheme(struct iavf_adapter *adapter, bool runni static void iavf_finish_config(struct work_struct *work) { struct iavf_adapter *adapter; - bool netdev_released = false; + bool locks_released = false; int pairs, err; adapter = container_of(work, struct iavf_adapter, finish_config); @@ -2012,19 +2012,22 @@ static void iavf_finish_config(struct work_struct *work) netif_set_real_num_tx_queues(adapter->netdev, pairs); if (adapter->netdev->reg_state != NETREG_REGISTERED) { + mutex_unlock(&adapter->crit_lock); netdev_unlock(adapter->netdev); - netdev_released = true; + locks_released = true; err = register_netdevice(adapter->netdev); if (err) { dev_err(&adapter->pdev->dev, "Unable to register netdev (%d)\n", err); /* go back and try again.*/ + mutex_lock(&adapter->crit_lock); iavf_free_rss(adapter); iavf_free_misc_irq(adapter); iavf_reset_interrupt_capability(adapter); iavf_change_state(adapter, __IAVF_INIT_CONFIG_ADAPTER); + mutex_unlock(&adapter->crit_lock); goto out; } } @@ -2040,9 +2043,10 @@ static void iavf_finish_config(struct work_struct *work) } out: - mutex_unlock(&adapter->crit_lock); - if (!netdev_released) + if (!locks_released) { + mutex_unlock(&adapter->crit_lock); netdev_unlock(adapter->netdev); + } rtnl_unlock(); }