diff mbox

[1/3] net: ethernet: ti: cpdma: fix lockup in cpdma_ctlr_destroy()

Message ID 20160722135847.20557-2-grygorii.strashko@ti.com (mailing list archive)
State New, archived
Headers show

Commit Message

Grygorii Strashko July 22, 2016, 1:58 p.m. UTC
Fix deadlock in cpdma_ctlr_destroy() which is triggered now on
cpsw module removal:
 cpsw_remove()
 - cpdma_ctlr_destroy()
   - spin_lock_irqsave(&ctlr->lock, flags)
   - cpdma_ctlr_stop()
     - spin_lock_irqsave(&ctlr->lock, flags); <- deadlock
   - cpdma_chan_destroy()
     - spin_lock_irqsave(&ctlr->lock, flags); <- deadlock

The issue has not been observed before because CPDMA channels have
been destroyed manually by CPSW until commit d941ebe88a41 ("net:
ethernet: ti: cpsw: use destroy ctlr to destroy channels") was merged.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 2 --
 1 file changed, 2 deletions(-)

Comments

kernel test robot July 22, 2016, 4:03 p.m. UTC | #1
Hi,

[auto build test WARNING on net-next/master]
[also build test WARNING on v4.7-rc7 next-20160722]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Grygorii-Strashko/drivers-net-cpsw-fix-driver-loading-unloading/20160722-221708
config: arm-multi_v7_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/ti/davinci_cpdma.c: In function 'cpdma_ctlr_destroy':
>> drivers/net/ethernet/ti/davinci_cpdma.c:433:16: warning: unused variable 'flags' [-Wunused-variable]
     unsigned long flags;
                   ^

vim +/flags +433 drivers/net/ethernet/ti/davinci_cpdma.c

ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  417  			 dma_reg_read(ctlr, CPDMA_DMASTATUS));
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  418  		dev_info(dev, "CPDMA: rxbuffofs: %x",
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  419  			 dma_reg_read(ctlr, CPDMA_RXBUFFOFS));
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  420  	}
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  421  
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  422  	for (i = 0; i < ARRAY_SIZE(ctlr->channels); i++)
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  423  		if (ctlr->channels[i])
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  424  			cpdma_chan_dump(ctlr->channels[i]);
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  425  
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  426  	spin_unlock_irqrestore(&ctlr->lock, flags);
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  427  	return 0;
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  428  }
32a6d90b drivers/net/ethernet/ti/davinci_cpdma.c Arnd Bergmann     2012-04-20  429  EXPORT_SYMBOL_GPL(cpdma_ctlr_dump);
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  430  
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  431  int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  432  {
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15 @433  	unsigned long flags;
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  434  	int ret = 0, i;
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  435  
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  436  	if (!ctlr)
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  437  		return -EINVAL;
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  438  
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  439  	if (ctlr->state != CPDMA_STATE_IDLE)
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  440  		cpdma_ctlr_stop(ctlr);
ef8c2dab drivers/net/davinci_cpdma.c             Cyril Chemparathy 2010-09-15  441  

:::::: The code at line 433 was first introduced by commit
:::::: ef8c2dab01b6e30c4b2ca3ea3b8db33430493589 net: davinci_emac: separate out cpdma code

:::::: TO: Cyril Chemparathy <cyril@ti.com>
:::::: CC: Kevin Hilman <khilman@deeprootsystems.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Ivan Khoronzhuk July 23, 2016, 6:24 a.m. UTC | #2
On 22.07.16 16:58, Grygorii Strashko wrote:
> Fix deadlock in cpdma_ctlr_destroy() which is triggered now on
> cpsw module removal:
>  cpsw_remove()
>  - cpdma_ctlr_destroy()
>    - spin_lock_irqsave(&ctlr->lock, flags)
>    - cpdma_ctlr_stop()
>      - spin_lock_irqsave(&ctlr->lock, flags); <- deadlock
>    - cpdma_chan_destroy()
>      - spin_lock_irqsave(&ctlr->lock, flags); <- deadlock
>
> The issue has not been observed before because CPDMA channels have
> been destroyed manually by CPSW until commit d941ebe88a41 ("net:
> ethernet: ti: cpsw: use destroy ctlr to destroy channels") was merged.
>
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
>  drivers/net/ethernet/ti/davinci_cpdma.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
> index a68652a..89242e9 100644
> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> @@ -436,7 +436,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
>  	if (!ctlr)
>  		return -EINVAL;
>
> -	spin_lock_irqsave(&ctlr->lock, flags);
Should ctlr->state be checked under lock?
Seems like here should be used unlocked static versions of
cpdma_ctlr_stop() and cpdma_chan_destroy() instead.

>  	if (ctlr->state != CPDMA_STATE_IDLE)
>  		cpdma_ctlr_stop(ctlr);
>
> @@ -444,7 +443,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
>  		cpdma_chan_destroy(ctlr->channels[i]);
>
>  	cpdma_desc_pool_destroy(ctlr->pool);
> -	spin_unlock_irqrestore(&ctlr->lock, flags);
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(cpdma_ctlr_destroy);
>
Grygorii Strashko July 26, 2016, 4:02 p.m. UTC | #3
On 07/23/2016 09:24 AM, Ivan Khoronzhuk wrote:
> 
> 
> On 22.07.16 16:58, Grygorii Strashko wrote:
>> Fix deadlock in cpdma_ctlr_destroy() which is triggered now on
>> cpsw module removal:
>>  cpsw_remove()
>>  - cpdma_ctlr_destroy()
>>    - spin_lock_irqsave(&ctlr->lock, flags)
>>    - cpdma_ctlr_stop()
>>      - spin_lock_irqsave(&ctlr->lock, flags); <- deadlock
>>    - cpdma_chan_destroy()
>>      - spin_lock_irqsave(&ctlr->lock, flags); <- deadlock
>>
>> The issue has not been observed before because CPDMA channels have
>> been destroyed manually by CPSW until commit d941ebe88a41 ("net:
>> ethernet: ti: cpsw: use destroy ctlr to destroy channels") was merged.
>>
>> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
>> ---
>>  drivers/net/ethernet/ti/davinci_cpdma.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c
>> b/drivers/net/ethernet/ti/davinci_cpdma.c
>> index a68652a..89242e9 100644
>> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
>> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
>> @@ -436,7 +436,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
>>      if (!ctlr)
>>          return -EINVAL;
>>
>> -    spin_lock_irqsave(&ctlr->lock, flags);
> Should ctlr->state be checked under lock?
> Seems like here should be used unlocked static versions of
> cpdma_ctlr_stop() and cpdma_chan_destroy() instead.

As per my understanding it's not expected the ctlr->state will be changed at this 
moment as all net devices has been unregistered already.

> 
>>      if (ctlr->state != CPDMA_STATE_IDLE)

May be I can move above check in cpdma_ctlr_stop() instead.
What do you think?

>>          cpdma_ctlr_stop(ctlr);
>>
>> @@ -444,7 +443,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
>>          cpdma_chan_destroy(ctlr->channels[i]);
>>
>>      cpdma_desc_pool_destroy(ctlr->pool);
>> -    spin_unlock_irqrestore(&ctlr->lock, flags);
>>      return ret;
>>  }
>>  EXPORT_SYMBOL_GPL(cpdma_ctlr_destroy);
>>
>
Ivan Khoronzhuk July 26, 2016, 8:54 p.m. UTC | #4
On 26.07.16 19:02, Grygorii Strashko wrote:
> On 07/23/2016 09:24 AM, Ivan Khoronzhuk wrote:
>>
>>
>> On 22.07.16 16:58, Grygorii Strashko wrote:
>>> Fix deadlock in cpdma_ctlr_destroy() which is triggered now on
>>> cpsw module removal:
>>>  cpsw_remove()
>>>  - cpdma_ctlr_destroy()
>>>    - spin_lock_irqsave(&ctlr->lock, flags)
>>>    - cpdma_ctlr_stop()
>>>      - spin_lock_irqsave(&ctlr->lock, flags); <- deadlock
>>>    - cpdma_chan_destroy()
>>>      - spin_lock_irqsave(&ctlr->lock, flags); <- deadlock
>>>
>>> The issue has not been observed before because CPDMA channels have
>>> been destroyed manually by CPSW until commit d941ebe88a41 ("net:
>>> ethernet: ti: cpsw: use destroy ctlr to destroy channels") was merged.
>>>
>>> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
>>> ---
>>>  drivers/net/ethernet/ti/davinci_cpdma.c | 2 --
>>>  1 file changed, 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c
>>> b/drivers/net/ethernet/ti/davinci_cpdma.c
>>> index a68652a..89242e9 100644
>>> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
>>> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
>>> @@ -436,7 +436,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
>>>      if (!ctlr)
>>>          return -EINVAL;
>>>
>>> -    spin_lock_irqsave(&ctlr->lock, flags);
>> Should ctlr->state be checked under lock?
>> Seems like here should be used unlocked static versions of
>> cpdma_ctlr_stop() and cpdma_chan_destroy() instead.
>
> As per my understanding it's not expected the ctlr->state will be changed at this
> moment as all net devices has been unregistered already.
Seems yes, the race can be only in case of incorrect usage, stop while destroy,
destroy while start...etc..all they are mostly unreal use-cases, you are right,
but such check w/o lock always under eyes control, that always makes you think
that smth wrong.

>
>>
>>>      if (ctlr->state != CPDMA_STATE_IDLE)
>
> May be I can move above check in cpdma_ctlr_stop() instead.
> What do you think?
Yes, it be more clear.
I was thinking about lock deletion also, as under this destroy function the
ctlr destroys it's resources one by one, ok, the channels are destroyed under lock,
but pool ....(it's good that it's destroyed after channels). I see that it should never
happen, but ctrl is external structure, who knows as it can be used while destroying.
That was my paranoiac point, so don't pay a lot attention to it. In case of normal usage,
as it's currently is and should be, the lock can be removed.

>
>>>          cpdma_ctlr_stop(ctlr);
>>>
>>> @@ -444,7 +443,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
>>>          cpdma_chan_destroy(ctlr->channels[i]);
>>>
>>>      cpdma_desc_pool_destroy(ctlr->pool);
>>> -    spin_unlock_irqrestore(&ctlr->lock, flags);
>>>      return ret;
>>>  }
>>>  EXPORT_SYMBOL_GPL(cpdma_ctlr_destroy);
>>>
>>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Grygorii Strashko July 28, 2016, 9:44 a.m. UTC | #5
On 07/26/2016 11:54 PM, ivan.khoronzhuk wrote:
> 
> 
> On 26.07.16 19:02, Grygorii Strashko wrote:
>> On 07/23/2016 09:24 AM, Ivan Khoronzhuk wrote:
>>>
>>>
>>> On 22.07.16 16:58, Grygorii Strashko wrote:
>>>> Fix deadlock in cpdma_ctlr_destroy() which is triggered now on
>>>> cpsw module removal:
>>>>  cpsw_remove()
>>>>  - cpdma_ctlr_destroy()
>>>>    - spin_lock_irqsave(&ctlr->lock, flags)
>>>>    - cpdma_ctlr_stop()
>>>>      - spin_lock_irqsave(&ctlr->lock, flags); <- deadlock
>>>>    - cpdma_chan_destroy()
>>>>      - spin_lock_irqsave(&ctlr->lock, flags); <- deadlock
>>>>
>>>> The issue has not been observed before because CPDMA channels have
>>>> been destroyed manually by CPSW until commit d941ebe88a41 ("net:
>>>> ethernet: ti: cpsw: use destroy ctlr to destroy channels") was merged.
>>>>
>>>> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
>>>> ---
>>>>  drivers/net/ethernet/ti/davinci_cpdma.c | 2 --
>>>>  1 file changed, 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c
>>>> b/drivers/net/ethernet/ti/davinci_cpdma.c
>>>> index a68652a..89242e9 100644
>>>> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
>>>> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
>>>> @@ -436,7 +436,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
>>>>      if (!ctlr)
>>>>          return -EINVAL;
>>>>
>>>> -    spin_lock_irqsave(&ctlr->lock, flags);
>>> Should ctlr->state be checked under lock?
>>> Seems like here should be used unlocked static versions of
>>> cpdma_ctlr_stop() and cpdma_chan_destroy() instead.
>>
>> As per my understanding it's not expected the ctlr->state will be
>> changed at this
>> moment as all net devices has been unregistered already.
> Seems yes, the race can be only in case of incorrect usage, stop while
> destroy,
> destroy while start...etc..all they are mostly unreal use-cases, you are
> right,
> but such check w/o lock always under eyes control, that always makes you
> think
> that smth wrong.
> 
>>
>>>
>>>>      if (ctlr->state != CPDMA_STATE_IDLE)
>>
>> May be I can move above check in cpdma_ctlr_stop() instead.
>> What do you think?
> Yes, it be more clear.
> I was thinking about lock deletion also, as under this destroy function the
> ctlr destroys it's resources one by one, ok, the channels are destroyed
> under lock,
> but pool ....(it's good that it's destroyed after channels). I see that
> it should never
> happen, but ctrl is external structure, who knows as it can be used
> while destroying.
> That was my paranoiac point, so don't pay a lot attention to it. In case
> of normal usage,
> as it's currently is and should be, the lock can be removed.


I'm going to keep it as is after some thinking and code checking -
I don't see any reasons for races here and I can't simply move this check in cpdma_ctlr_stop() 
as it might break ndo_open failure handling (and this is not smth. I'd like to fix within this series).

I'll resend v2 with build issue fixed and with fix for new issue I've found.
> 
>>
>>>>          cpdma_ctlr_stop(ctlr);
>>>>
>>>> @@ -444,7 +443,6 @@ int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
>>>>          cpdma_chan_destroy(ctlr->channels[i]);
>>>>
>>>>      cpdma_desc_pool_destroy(ctlr->pool);
>>>> -    spin_unlock_irqrestore(&ctlr->lock, flags);
>>>>      return ret;
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(cpdma_ctlr_destroy);
>>>>
>>>
>>
>>
diff mbox

Patch

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index a68652a..89242e9 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -436,7 +436,6 @@  int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
 	if (!ctlr)
 		return -EINVAL;
 
-	spin_lock_irqsave(&ctlr->lock, flags);
 	if (ctlr->state != CPDMA_STATE_IDLE)
 		cpdma_ctlr_stop(ctlr);
 
@@ -444,7 +443,6 @@  int cpdma_ctlr_destroy(struct cpdma_ctlr *ctlr)
 		cpdma_chan_destroy(ctlr->channels[i]);
 
 	cpdma_desc_pool_destroy(ctlr->pool);
-	spin_unlock_irqrestore(&ctlr->lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(cpdma_ctlr_destroy);