Message ID | 20210316100141.53551-1-sassmann@kpanic.de (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | iavf: fix locking of critical sections | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/subject_prefix | warning | Target tree name not specified in the subject |
netdev/cc_maintainers | warning | 2 maintainers not CCed: davem@davemloft.net kuba@kernel.org |
netdev/source_inline | fail | Was 0 now: 1 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | fail | Errors and warnings before: 1 this patch: 2 |
netdev/kdoc | fail | Errors and warnings before: 1 this patch: 2 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | WARNING: __func__ should be used instead of gcc specific __FUNCTION__ WARNING: line length of 109 exceeds 80 columns |
netdev/build_allmodconfig_warn | fail | Errors and warnings before: 1 this patch: 2 |
netdev/header_inline | success | Link |
On Tue, 16 Mar 2021 11:01:41 +0100 Stefan Assmann wrote: > To avoid races between iavf_init_task(), iavf_reset_task(), > iavf_watchdog_task(), iavf_adminq_task() as well as the shutdown and > remove functions more locking is required. > The current protection by __IAVF_IN_CRITICAL_TASK is needed in > additional places. > > - The reset task performs state transitions, therefore needs locking. > - The adminq task acts on replies from the PF in > iavf_virtchnl_completion() which may alter the states. > - The init task is not only run during probe but also if a VF gets stuck > to reinitialize it. > - The shutdown function performs a state transition. > - The remove function perorms a state transition and also free's > resources. > > iavf_lock_timeout() is introduced to avoid waiting infinitely > and cause a deadlock. Rather unlock and print a warning. > > Signed-off-by: Stefan Assmann <sassmann@kpanic.de> I personally think that the overuse of flags in Intel drivers brings nothing but trouble. At which point does it make sense to just add a lock / semaphore here rather than open code all this with no clear semantics? No code seems to just test the __IAVF_IN_CRITICAL_TASK flag, all the uses look like poor man's locking at a quick grep. What am I missing?
On 16.03.21 18:14, Jakub Kicinski wrote: > On Tue, 16 Mar 2021 11:01:41 +0100 Stefan Assmann wrote: >> To avoid races between iavf_init_task(), iavf_reset_task(), >> iavf_watchdog_task(), iavf_adminq_task() as well as the shutdown and >> remove functions more locking is required. >> The current protection by __IAVF_IN_CRITICAL_TASK is needed in >> additional places. >> >> - The reset task performs state transitions, therefore needs locking. >> - The adminq task acts on replies from the PF in >> iavf_virtchnl_completion() which may alter the states. >> - The init task is not only run during probe but also if a VF gets stuck >> to reinitialize it. >> - The shutdown function performs a state transition. >> - The remove function perorms a state transition and also free's >> resources. >> >> iavf_lock_timeout() is introduced to avoid waiting infinitely >> and cause a deadlock. Rather unlock and print a warning. >> >> Signed-off-by: Stefan Assmann <sassmann@kpanic.de> > > I personally think that the overuse of flags in Intel drivers brings > nothing but trouble. At which point does it make sense to just add a > lock / semaphore here rather than open code all this with no clear > semantics? No code seems to just test the __IAVF_IN_CRITICAL_TASK flag, > all the uses look like poor man's locking at a quick grep. What am I > missing? > Hi Jakub, I agree with you that the locking could be done with other locking mechanisms just as good. I didn't invent the current method so I'll let Intel comment on that part, but I'd like to point out that what I'm making use of is fixing what is currently in the driver. Thanks! Stefan
On Tue, 16 Mar 2021 18:27:10 +0100 Stefan Assmann wrote: > On 16.03.21 18:14, Jakub Kicinski wrote: > > On Tue, 16 Mar 2021 11:01:41 +0100 Stefan Assmann wrote: > >> To avoid races between iavf_init_task(), iavf_reset_task(), > >> iavf_watchdog_task(), iavf_adminq_task() as well as the shutdown and > >> remove functions more locking is required. > >> The current protection by __IAVF_IN_CRITICAL_TASK is needed in > >> additional places. > >> > >> - The reset task performs state transitions, therefore needs locking. > >> - The adminq task acts on replies from the PF in > >> iavf_virtchnl_completion() which may alter the states. > >> - The init task is not only run during probe but also if a VF gets stuck > >> to reinitialize it. > >> - The shutdown function performs a state transition. > >> - The remove function perorms a state transition and also free's > >> resources. > >> > >> iavf_lock_timeout() is introduced to avoid waiting infinitely > >> and cause a deadlock. Rather unlock and print a warning. > >> > >> Signed-off-by: Stefan Assmann <sassmann@kpanic.de> > > > > I personally think that the overuse of flags in Intel drivers brings > > nothing but trouble. At which point does it make sense to just add a > > lock / semaphore here rather than open code all this with no clear > > semantics? No code seems to just test the __IAVF_IN_CRITICAL_TASK flag, > > all the uses look like poor man's locking at a quick grep. What am I > > missing? > > I agree with you that the locking could be done with other locking > mechanisms just as good. I didn't invent the current method so I'll let > Intel comment on that part, but I'd like to point out that what I'm > making use of is fixing what is currently in the driver. Right, I should have made it clear that I don't blame you for the current state of things. Would you mind sending a patch on top of this one to do a conversion to a semaphore? Intel folks any opinions?
Jakub Kicinski wrote: > > > I personally think that the overuse of flags in Intel drivers brings > > > nothing but trouble. At which point does it make sense to just add a > > > lock / semaphore here rather than open code all this with no clear > > > semantics? No code seems to just test the __IAVF_IN_CRITICAL_TASK flag, > > > all the uses look like poor man's locking at a quick grep. What am I > > > missing? > > > > I agree with you that the locking could be done with other locking > > mechanisms just as good. I didn't invent the current method so I'll let > > Intel comment on that part, but I'd like to point out that what I'm > > making use of is fixing what is currently in the driver. > > Right, I should have made it clear that I don't blame you for the > current state of things. Would you mind sending a patch on top of > this one to do a conversion to a semaphore? > > Intel folks any opinions? I know Slawomir has been working closely with Stefan on figuring out the right ways to fix this code. Hopefully he can speak for himself, but I know he's on Europe time. As for conversion to mutexes I'm a big fan, and as long as we don't have too many collisions with the RTNL lock I think it's a reasonable improvement to do, and if Stefan doesn't want to work on it, we can look into whether Slawomir or his team can.
On 16.03.21 23:02, Jesse Brandeburg wrote: > Jakub Kicinski wrote: >>>> I personally think that the overuse of flags in Intel drivers brings >>>> nothing but trouble. At which point does it make sense to just add a >>>> lock / semaphore here rather than open code all this with no clear >>>> semantics? No code seems to just test the __IAVF_IN_CRITICAL_TASK flag, >>>> all the uses look like poor man's locking at a quick grep. What am I >>>> missing? >>> >>> I agree with you that the locking could be done with other locking >>> mechanisms just as good. I didn't invent the current method so I'll let >>> Intel comment on that part, but I'd like to point out that what I'm >>> making use of is fixing what is currently in the driver. >> >> Right, I should have made it clear that I don't blame you for the >> current state of things. Would you mind sending a patch on top of >> this one to do a conversion to a semaphore? Sure, I'm happy to help working on the conversion once the current issue is resolved. >> Intel folks any opinions? > > I know Slawomir has been working closely with Stefan on figuring out > the right ways to fix this code. Hopefully he can speak for himself, > but I know he's on Europe time. > > As for conversion to mutexes I'm a big fan, and as long as we don't > have too many collisions with the RTNL lock I think it's a reasonable > improvement to do, and if Stefan doesn't want to work on it, we can > look into whether Slawomir or his team can. I'd appreciate to be involved. Thanks! Stefan
We were discussing introducing mutexes in those critical spots for a long time now (in my team). Stefan, if you find time, you are most welcome to offer your solution with mutexes. Slawek -----Original Message----- From: Stefan Assmann <sassmann@kpanic.de> Sent: Wednesday, March 17, 2021 8:50 AM To: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Jakub Kicinski <kuba@kernel.org> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Yang, Lihong <lihong.yang@intel.com>; Laba, SlawomirX <slawomirx.laba@intel.com>; Nunley, Nicholas D <nicholas.d.nunley@intel.com> Subject: Re: [PATCH] iavf: fix locking of critical sections On 16.03.21 23:02, Jesse Brandeburg wrote: > Jakub Kicinski wrote: >>>> I personally think that the overuse of flags in Intel drivers >>>> brings nothing but trouble. At which point does it make sense to >>>> just add a lock / semaphore here rather than open code all this >>>> with no clear semantics? No code seems to just test the >>>> __IAVF_IN_CRITICAL_TASK flag, all the uses look like poor man's >>>> locking at a quick grep. What am I missing? >>> >>> I agree with you that the locking could be done with other locking >>> mechanisms just as good. I didn't invent the current method so I'll >>> let Intel comment on that part, but I'd like to point out that what >>> I'm making use of is fixing what is currently in the driver. >> >> Right, I should have made it clear that I don't blame you for the >> current state of things. Would you mind sending a patch on top of >> this one to do a conversion to a semaphore? Sure, I'm happy to help working on the conversion once the current issue is resolved. >> Intel folks any opinions? > > I know Slawomir has been working closely with Stefan on figuring out > the right ways to fix this code. Hopefully he can speak for himself, > but I know he's on Europe time. > > As for conversion to mutexes I'm a big fan, and as long as we don't > have too many collisions with the RTNL lock I think it's a reasonable > improvement to do, and if Stefan doesn't want to work on it, we can > look into whether Slawomir or his team can. I'd appreciate to be involved. Thanks! Stefan
On 17.03.21 18:27, Laba, SlawomirX wrote: > We were discussing introducing mutexes in those critical spots for a long time now (in my team). > Stefan, if you find time, you are most welcome to offer your solution with mutexes. Hi Slawek, I'll work on that conversion once the current patch went through Intel testing and is merged. Smaller patches, smaller steps are usually better, so let's make one change at a time. Thanks! Stefan > Slawek > > > -----Original Message----- > From: Stefan Assmann <sassmann@kpanic.de> > Sent: Wednesday, March 17, 2021 8:50 AM > To: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Jakub Kicinski <kuba@kernel.org> > Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Yang, Lihong <lihong.yang@intel.com>; Laba, SlawomirX <slawomirx.laba@intel.com>; Nunley, Nicholas D <nicholas.d.nunley@intel.com> > Subject: Re: [PATCH] iavf: fix locking of critical sections > > On 16.03.21 23:02, Jesse Brandeburg wrote: >> Jakub Kicinski wrote: >>>>> I personally think that the overuse of flags in Intel drivers >>>>> brings nothing but trouble. At which point does it make sense to >>>>> just add a lock / semaphore here rather than open code all this >>>>> with no clear semantics? No code seems to just test the >>>>> __IAVF_IN_CRITICAL_TASK flag, all the uses look like poor man's >>>>> locking at a quick grep. What am I missing? >>>> >>>> I agree with you that the locking could be done with other locking >>>> mechanisms just as good. I didn't invent the current method so I'll >>>> let Intel comment on that part, but I'd like to point out that what >>>> I'm making use of is fixing what is currently in the driver. >>> >>> Right, I should have made it clear that I don't blame you for the >>> current state of things. Would you mind sending a patch on top of >>> this one to do a conversion to a semaphore? > > Sure, I'm happy to help working on the conversion once the current issue is resolved. > >>> Intel folks any opinions? >> >> I know Slawomir has been working closely with Stefan on figuring out >> the right ways to fix this code. Hopefully he can speak for himself, >> but I know he's on Europe time. >> >> As for conversion to mutexes I'm a big fan, and as long as we don't >> have too many collisions with the RTNL lock I think it's a reasonable >> improvement to do, and if Stefan doesn't want to work on it, we can >> look into whether Slawomir or his team can. > > I'd appreciate to be involved. > Thanks! > > Stefan >
> -----Original Message----- > From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of > Stefan Assmann > Sent: wtorek, 16 marca 2021 11:02 > To: intel-wired-lan@lists.osuosl.org > Cc: Laba, SlawomirX <slawomirx.laba@intel.com>; netdev@vger.kernel.org; > sassmann@kpanic.de > Subject: [Intel-wired-lan] [PATCH] iavf: fix locking of critical sections > > To avoid races between iavf_init_task(), iavf_reset_task(), > iavf_watchdog_task(), iavf_adminq_task() as well as the shutdown and > remove functions more locking is required. > The current protection by __IAVF_IN_CRITICAL_TASK is needed in additional > places. > > - The reset task performs state transitions, therefore needs locking. > - The adminq task acts on replies from the PF in > iavf_virtchnl_completion() which may alter the states. > - The init task is not only run during probe but also if a VF gets stuck > to reinitialize it. > - The shutdown function performs a state transition. > - The remove function perorms a state transition and also free's > resources. > > iavf_lock_timeout() is introduced to avoid waiting infinitely and cause a > deadlock. Rather unlock and print a warning. > > Signed-off-by: Stefan Assmann <sassmann@kpanic.de> > --- > drivers/net/ethernet/intel/iavf/iavf_main.c | 57 ++++++++++++++++++--- > 1 file changed, 50 insertions(+), 7 deletions(-) > > diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c > b/drivers/net/ethernet/intel/iavf/iavf_main.c > index dc5b3c06d1e0..538b7aa43fa5 100644 > --- a/drivers/net/ethernet/intel/iavf/iavf_main.c > +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c > @@ -131,6 +131,30 @@ enum iavf_status iavf_free_virt_mem_d(struct > iavf_hw *hw, > return 0; > } > > +/** > + * iavf_timeout - try to set bit but give up after timeout > + * @adapter: board private structure > + * @bit: bit to set > + * @msecs: timeout in msecs > + * > + * Returns 0 on success, negative on failure **/ static inline int > +iavf_lock_timeout(struct iavf_adapter *adapter, > + enum iavf_critical_section_t bit, > + unsigned int msecs) > +{ > + unsigned int wait, delay = 10; > + > + for (wait = 0; wait < msecs; wait += delay) { > + if (!test_and_set_bit(bit, &adapter->crit_section)) > + return 0; > + > + msleep(delay); > + } > + > + return -1; > +} > + > /** > * iavf_schedule_reset - Set the flags and schedule a reset event > * @adapter: board private structure > @@ -2069,6 +2093,10 @@ static void iavf_reset_task(struct work_struct > *work) > if (test_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section)) > return; > > + if (iavf_lock_timeout(adapter, __IAVF_IN_CRITICAL_TASK, 200)) { > + schedule_work(&adapter->reset_task); > + return; > + } > while (test_and_set_bit(__IAVF_IN_CLIENT_TASK, > &adapter->crit_section)) > usleep_range(500, 1000); > @@ -2275,6 +2303,8 @@ static void iavf_adminq_task(struct work_struct > *work) > if (!event.msg_buf) > goto out; > > + if (iavf_lock_timeout(adapter, __IAVF_IN_CRITICAL_TASK, 200)) > + goto freedom; > do { > ret = iavf_clean_arq_element(hw, &event, &pending); > v_op = (enum > virtchnl_ops)le32_to_cpu(event.desc.cookie_high); > @@ -2288,6 +2318,7 @@ static void iavf_adminq_task(struct work_struct > *work) > if (pending != 0) > memset(event.msg_buf, 0, > IAVF_MAX_AQ_BUF_SIZE); > } while (pending); > + clear_bit(__IAVF_IN_CRITICAL_TASK, &adapter->crit_section); > > if ((adapter->flags & > (IAVF_FLAG_RESET_PENDING | IAVF_FLAG_RESET_NEEDED)) || > @@ -3590,6 +3621,10 @@ static void iavf_init_task(struct work_struct > *work) > init_task.work); > struct iavf_hw *hw = &adapter->hw; > > + if (iavf_lock_timeout(adapter, __IAVF_IN_CRITICAL_TASK, 5000)) { > + dev_warn(&adapter->pdev->dev, "failed to set > __IAVF_IN_CRITICAL_TASK in %s\n", __FUNCTION__); > + return; > + } > switch (adapter->state) { > case __IAVF_STARTUP: > if (iavf_startup(adapter) < 0) > @@ -3602,14 +3637,14 @@ static void iavf_init_task(struct work_struct > *work) > case __IAVF_INIT_GET_RESOURCES: > if (iavf_init_get_resources(adapter) < 0) > goto init_failed; > - return; > + goto out; > default: > goto init_failed; > } > > queue_delayed_work(iavf_wq, &adapter->init_task, > msecs_to_jiffies(30)); > - return; > + goto out; > init_failed: > if (++adapter->aq_wait_count > IAVF_AQ_MAX_ERR) { > dev_err(&adapter->pdev->dev, > @@ -3618,9 +3653,11 @@ static void iavf_init_task(struct work_struct > *work) > iavf_shutdown_adminq(hw); > adapter->state = __IAVF_STARTUP; > queue_delayed_work(iavf_wq, &adapter->init_task, HZ * 5); > - return; > + goto out; > } > queue_delayed_work(iavf_wq, &adapter->init_task, HZ); > +out: > + clear_bit(__IAVF_IN_CRITICAL_TASK, &adapter->crit_section); > } > > /** > @@ -3637,9 +3674,12 @@ static void iavf_shutdown(struct pci_dev *pdev) > if (netif_running(netdev)) > iavf_close(netdev); > > + if (iavf_lock_timeout(adapter, __IAVF_IN_CRITICAL_TASK, 5000)) > + dev_warn(&adapter->pdev->dev, "failed to set > __IAVF_IN_CRITICAL_TASK > +in %s\n", __FUNCTION__); > /* Prevent the watchdog from running. */ > adapter->state = __IAVF_REMOVE; > adapter->aq_required = 0; > + clear_bit(__IAVF_IN_CRITICAL_TASK, &adapter->crit_section); > > #ifdef CONFIG_PM > pci_save_state(pdev); > @@ -3866,10 +3906,6 @@ static void iavf_remove(struct pci_dev *pdev) > err); > } > > - /* Shut down all the garbage mashers on the detention level */ > - adapter->state = __IAVF_REMOVE; > - adapter->aq_required = 0; > - adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED; > iavf_request_reset(adapter); > msleep(50); > /* If the FW isn't responding, kick it once, but only once. */ @@ - > 3877,6 +3913,13 @@ static void iavf_remove(struct pci_dev *pdev) > iavf_request_reset(adapter); > msleep(50); > } > + if (iavf_lock_timeout(adapter, __IAVF_IN_CRITICAL_TASK, 5000)) > + dev_warn(&adapter->pdev->dev, "failed to set > __IAVF_IN_CRITICAL_TASK > +in %s\n", __FUNCTION__); > + > + /* Shut down all the garbage mashers on the detention level */ > + adapter->state = __IAVF_REMOVE; > + adapter->aq_required = 0; > + adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED; > iavf_free_all_tx_resources(adapter); > iavf_free_all_rx_resources(adapter); > iavf_misc_irq_disable(adapter); > -- > 2.29.2 Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c index dc5b3c06d1e0..538b7aa43fa5 100644 --- a/drivers/net/ethernet/intel/iavf/iavf_main.c +++ b/drivers/net/ethernet/intel/iavf/iavf_main.c @@ -131,6 +131,30 @@ enum iavf_status iavf_free_virt_mem_d(struct iavf_hw *hw, return 0; } +/** + * iavf_timeout - try to set bit but give up after timeout + * @adapter: board private structure + * @bit: bit to set + * @msecs: timeout in msecs + * + * Returns 0 on success, negative on failure + **/ +static inline int iavf_lock_timeout(struct iavf_adapter *adapter, + enum iavf_critical_section_t bit, + unsigned int msecs) +{ + unsigned int wait, delay = 10; + + for (wait = 0; wait < msecs; wait += delay) { + if (!test_and_set_bit(bit, &adapter->crit_section)) + return 0; + + msleep(delay); + } + + return -1; +} + /** * iavf_schedule_reset - Set the flags and schedule a reset event * @adapter: board private structure @@ -2069,6 +2093,10 @@ static void iavf_reset_task(struct work_struct *work) if (test_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section)) return; + if (iavf_lock_timeout(adapter, __IAVF_IN_CRITICAL_TASK, 200)) { + schedule_work(&adapter->reset_task); + return; + } while (test_and_set_bit(__IAVF_IN_CLIENT_TASK, &adapter->crit_section)) usleep_range(500, 1000); @@ -2275,6 +2303,8 @@ static void iavf_adminq_task(struct work_struct *work) if (!event.msg_buf) goto out; + if (iavf_lock_timeout(adapter, __IAVF_IN_CRITICAL_TASK, 200)) + goto freedom; do { ret = iavf_clean_arq_element(hw, &event, &pending); v_op = (enum virtchnl_ops)le32_to_cpu(event.desc.cookie_high); @@ -2288,6 +2318,7 @@ static void iavf_adminq_task(struct work_struct *work) if (pending != 0) memset(event.msg_buf, 0, IAVF_MAX_AQ_BUF_SIZE); } while (pending); + clear_bit(__IAVF_IN_CRITICAL_TASK, &adapter->crit_section); if ((adapter->flags & (IAVF_FLAG_RESET_PENDING | IAVF_FLAG_RESET_NEEDED)) || @@ -3590,6 +3621,10 @@ static void iavf_init_task(struct work_struct *work) init_task.work); struct iavf_hw *hw = &adapter->hw; + if (iavf_lock_timeout(adapter, __IAVF_IN_CRITICAL_TASK, 5000)) { + dev_warn(&adapter->pdev->dev, "failed to set __IAVF_IN_CRITICAL_TASK in %s\n", __FUNCTION__); + return; + } switch (adapter->state) { case __IAVF_STARTUP: if (iavf_startup(adapter) < 0) @@ -3602,14 +3637,14 @@ static void iavf_init_task(struct work_struct *work) case __IAVF_INIT_GET_RESOURCES: if (iavf_init_get_resources(adapter) < 0) goto init_failed; - return; + goto out; default: goto init_failed; } queue_delayed_work(iavf_wq, &adapter->init_task, msecs_to_jiffies(30)); - return; + goto out; init_failed: if (++adapter->aq_wait_count > IAVF_AQ_MAX_ERR) { dev_err(&adapter->pdev->dev, @@ -3618,9 +3653,11 @@ static void iavf_init_task(struct work_struct *work) iavf_shutdown_adminq(hw); adapter->state = __IAVF_STARTUP; queue_delayed_work(iavf_wq, &adapter->init_task, HZ * 5); - return; + goto out; } queue_delayed_work(iavf_wq, &adapter->init_task, HZ); +out: + clear_bit(__IAVF_IN_CRITICAL_TASK, &adapter->crit_section); } /** @@ -3637,9 +3674,12 @@ static void iavf_shutdown(struct pci_dev *pdev) if (netif_running(netdev)) iavf_close(netdev); + if (iavf_lock_timeout(adapter, __IAVF_IN_CRITICAL_TASK, 5000)) + dev_warn(&adapter->pdev->dev, "failed to set __IAVF_IN_CRITICAL_TASK in %s\n", __FUNCTION__); /* Prevent the watchdog from running. */ adapter->state = __IAVF_REMOVE; adapter->aq_required = 0; + clear_bit(__IAVF_IN_CRITICAL_TASK, &adapter->crit_section); #ifdef CONFIG_PM pci_save_state(pdev); @@ -3866,10 +3906,6 @@ static void iavf_remove(struct pci_dev *pdev) err); } - /* Shut down all the garbage mashers on the detention level */ - adapter->state = __IAVF_REMOVE; - adapter->aq_required = 0; - adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED; iavf_request_reset(adapter); msleep(50); /* If the FW isn't responding, kick it once, but only once. */ @@ -3877,6 +3913,13 @@ static void iavf_remove(struct pci_dev *pdev) iavf_request_reset(adapter); msleep(50); } + if (iavf_lock_timeout(adapter, __IAVF_IN_CRITICAL_TASK, 5000)) + dev_warn(&adapter->pdev->dev, "failed to set __IAVF_IN_CRITICAL_TASK in %s\n", __FUNCTION__); + + /* Shut down all the garbage mashers on the detention level */ + adapter->state = __IAVF_REMOVE; + adapter->aq_required = 0; + adapter->flags &= ~IAVF_FLAG_REINIT_ITR_NEEDED; iavf_free_all_tx_resources(adapter); iavf_free_all_rx_resources(adapter); iavf_misc_irq_disable(adapter);
To avoid races between iavf_init_task(), iavf_reset_task(), iavf_watchdog_task(), iavf_adminq_task() as well as the shutdown and remove functions more locking is required. The current protection by __IAVF_IN_CRITICAL_TASK is needed in additional places. - The reset task performs state transitions, therefore needs locking. - The adminq task acts on replies from the PF in iavf_virtchnl_completion() which may alter the states. - The init task is not only run during probe but also if a VF gets stuck to reinitialize it. - The shutdown function performs a state transition. - The remove function perorms a state transition and also free's resources. iavf_lock_timeout() is introduced to avoid waiting infinitely and cause a deadlock. Rather unlock and print a warning. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> --- drivers/net/ethernet/intel/iavf/iavf_main.c | 57 ++++++++++++++++++--- 1 file changed, 50 insertions(+), 7 deletions(-)