
b43legacy: Fix a sleep-in-atomic bug in b43legacy_op_bss_info_changed

Message ID 1496225353-5544-1-git-send-email-baijiaju1990@163.com
State Changes Requested
Delegated to: Kalle Valo

Commit Message

Jia-Ju Bai May 31, 2017, 10:09 a.m. UTC
The driver may sleep while holding a spinlock. The function call path is:
b43legacy_op_bss_info_changed (acquires the lock with spin_lock_irqsave)
  b43legacy_synchronize_irq
    synchronize_irq --> may sleep

To fix this, release the lock before calling b43legacy_synchronize_irq()
and re-acquire it after that call returns.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
---
 drivers/net/wireless/broadcom/b43legacy/main.c |    2 ++
 1 file changed, 2 insertions(+)
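The call path above can be illustrated with a small userspace model (this is not kernel code). All names below are hypothetical stand-ins: model_synchronize_irq() plays the role of synchronize_irq() and, loosely like the kernel's might_sleep() debug check, counts any call made while a modelled spinlock is held. The model contrasts the original ordering with the ordering proposed in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: these names are hypothetical, not kernel APIs. */
struct atomic_ctx {
	int spin_depth;   /* number of modelled spinlocks currently held */
	int violations;   /* sleeping calls made while "atomic" */
};

static void model_spin_lock(struct atomic_ctx *c)   { c->spin_depth++; }
static void model_spin_unlock(struct atomic_ctx *c) { c->spin_depth--; }

/* Stand-in for synchronize_irq(): it may sleep, so flag any call
 * made while a spinlock is held (similar in spirit to might_sleep()). */
static void model_synchronize_irq(struct atomic_ctx *c)
{
	if (c->spin_depth > 0)
		c->violations++;
}

/* Original ordering: synchronize while irq_lock is still held. */
static int bss_changed_original(bool bssid_changed)
{
	struct atomic_ctx c = {0, 0};

	model_spin_lock(&c);                /* spin_lock_irqsave() */
	if (bssid_changed)
		model_synchronize_irq(&c);
	model_spin_unlock(&c);              /* spin_unlock_irqrestore() */
	return c.violations;
}

/* Patched ordering: drop the lock around the sleeping call. */
static int bss_changed_patched(bool bssid_changed)
{
	struct atomic_ctx c = {0, 0};

	model_spin_lock(&c);
	if (bssid_changed) {
		model_spin_unlock(&c);
		model_synchronize_irq(&c);
		model_spin_lock(&c);
	}
	model_spin_unlock(&c);
	return c.violations;
}
```

The model only tracks lock depth. It says nothing about whether dropping the lock in the middle of the function is safe, which is what the review discusses.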

Comments

Kalle Valo May 31, 2017, 10:26 a.m. UTC | #1
Jia-Ju Bai <baijiaju1990@163.com> writes:

> The driver may sleep under a spin lock, and the function call path is:
> b43legacy_op_bss_info_changed (acquire the lock by spin_lock_irqsave)
>   b43legacy_synchronize_irq
>     synchronize_irq --> may sleep
>
> To fix it, the lock is released before b43legacy_synchronize_irq, and the 
> lock is acquired again after this function.
>
> Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
> ---
>  drivers/net/wireless/broadcom/b43legacy/main.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/wireless/broadcom/b43legacy/main.c b/drivers/net/wireless/broadcom/b43legacy/main.c
> index f1e3dad..31ead21 100644
> --- a/drivers/net/wireless/broadcom/b43legacy/main.c
> +++ b/drivers/net/wireless/broadcom/b43legacy/main.c
> @@ -2859,7 +2859,9 @@ static void b43legacy_op_bss_info_changed(struct ieee80211_hw *hw,
>  	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, 0);
>  
>  	if (changed & BSS_CHANGED_BSSID) {
> +		spin_unlock_irqrestore(&wl->irq_lock, flags);
>  		b43legacy_synchronize_irq(dev);
> +		spin_lock_irqsave(&wl->irq_lock, flags);

To me this looks like a fragile workaround and not a real fix. You can
easily add new race conditions with releasing the lock like this.
Arend van Spriel May 31, 2017, 12:15 p.m. UTC | #2
On 31-05-17 12:26, Kalle Valo wrote:
> Jia-Ju Bai <baijiaju1990@163.com> writes:
> 
>> The driver may sleep under a spin lock, and the function call path is:
>> b43legacy_op_bss_info_changed (acquire the lock by spin_lock_irqsave)
>>   b43legacy_synchronize_irq
>>     synchronize_irq --> may sleep
>>
>> To fix it, the lock is released before b43legacy_synchronize_irq, and the 
>> lock is acquired again after this function.
>>
>> Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
>> ---
>>  drivers/net/wireless/broadcom/b43legacy/main.c |    2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/wireless/broadcom/b43legacy/main.c b/drivers/net/wireless/broadcom/b43legacy/main.c
>> index f1e3dad..31ead21 100644
>> --- a/drivers/net/wireless/broadcom/b43legacy/main.c
>> +++ b/drivers/net/wireless/broadcom/b43legacy/main.c
>> @@ -2859,7 +2859,9 @@ static void b43legacy_op_bss_info_changed(struct ieee80211_hw *hw,
>>  	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, 0);
>>  
>>  	if (changed & BSS_CHANGED_BSSID) {
>> +		spin_unlock_irqrestore(&wl->irq_lock, flags);
>>  		b43legacy_synchronize_irq(dev);
>> +		spin_lock_irqsave(&wl->irq_lock, flags);
> 
> To me this looks like a fragile workaround and not a real fix. You can
> easily add new race conditions with releasing the lock like this.

Hi Jia-Ju,

Agree with Kalle as I was about to say the same thing. You really need
to determine what is protected by the irq_lock. Here you are using the
lock because you are about to change wl->bssid a bit further down. I did
not check the entire function, but it seems the lock perimeter is too wide.

Regards,
Arend
Michael Büsch May 31, 2017, 3:32 p.m. UTC | #3
On Wed, 31 May 2017 13:26:43 +0300
Kalle Valo <kvalo@codeaurora.org> wrote:

> Jia-Ju Bai <baijiaju1990@163.com> writes:
> 
> > The driver may sleep under a spin lock, and the function call path is:
> > b43legacy_op_bss_info_changed (acquire the lock by spin_lock_irqsave)
> >   b43legacy_synchronize_irq
> >     synchronize_irq --> may sleep
> >
> > To fix it, the lock is released before b43legacy_synchronize_irq, and the 
> > lock is acquired again after this function.
> >
> > Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
> > ---
> >  drivers/net/wireless/broadcom/b43legacy/main.c |    2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/wireless/broadcom/b43legacy/main.c b/drivers/net/wireless/broadcom/b43legacy/main.c
> > index f1e3dad..31ead21 100644
> > --- a/drivers/net/wireless/broadcom/b43legacy/main.c
> > +++ b/drivers/net/wireless/broadcom/b43legacy/main.c
> > @@ -2859,7 +2859,9 @@ static void b43legacy_op_bss_info_changed(struct ieee80211_hw *hw,
> >  	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, 0);
> >  
> >  	if (changed & BSS_CHANGED_BSSID) {
> > +		spin_unlock_irqrestore(&wl->irq_lock, flags);
> >  		b43legacy_synchronize_irq(dev);
> > +		spin_lock_irqsave(&wl->irq_lock, flags);  
> 
> To me this looks like a fragile workaround and not a real fix. You can
> easily add new race conditions with releasing the lock like this.
> 


I think releasing the lock possibly is fine. It certainly is better than
sleeping with a lock held.
We disabled the device interrupts just before this line.

However I think the synchronize_irq should be outside of the
conditional right after the write to B43legacy_MMIO_GEN_IRQ_MASK. (So
two lines above)
I don't think it makes sense to only synchronize if BSS_CHANGED_BSSID
is set.


On the other hand b43 does not have this irq-disabling foobar anymore.
So somebody must have removed it. Maybe you can find the commit that
removed this stuff from b43 and port it to b43legacy?


So I would vote for moving the synchronize_irq up outside of the
conditional and put the unlock/lock sequence around it.
And as a second patch on top of that try to remove this stuff
altogether like b43 did.
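The ordering Michael proposes (mask device IRQs, drop the lock, synchronize unconditionally, retake the lock, then handle the BSSID change) can be sketched as a trace of steps, each tagged with whether irq_lock is held. This is a hypothetical userspace model: BSS_CHANGED_BSSID_MODEL, the OP_* names and struct trace are made up for illustration, and the real function does more work than shown.

```c
#include <assert.h>
#include <stdbool.h>

#define BSS_CHANGED_BSSID_MODEL 0x1u  /* hypothetical flag value */

enum op { OP_MASK_IRQS, OP_SYNC_IRQ, OP_COPY_BSSID };

struct step {
	enum op op;
	bool locked;   /* was the modelled irq_lock held for this step? */
};

struct trace {
	struct step steps[8];
	int n;
	bool locked;
};

static void record(struct trace *t, enum op op)
{
	t->steps[t->n].op = op;
	t->steps[t->n].locked = t->locked;
	t->n++;
}

/* Suggested ordering: the synchronize no longer depends on 'changed'
 * and always runs with the lock dropped. */
static void bss_changed_suggested(struct trace *t, unsigned int changed)
{
	t->n = 0;
	t->locked = true;               /* spin_lock_irqsave() */
	record(t, OP_MASK_IRQS);        /* write 0 to GEN_IRQ_MASK */
	t->locked = false;              /* spin_unlock_irqrestore() */
	record(t, OP_SYNC_IRQ);         /* b43legacy_synchronize_irq() */
	t->locked = true;               /* spin_lock_irqsave() again */
	if (changed & BSS_CHANGED_BSSID_MODEL)
		record(t, OP_COPY_BSSID);   /* memcpy(wl->bssid, ...) */
	t->locked = false;              /* final unlock */
}
```

Moving the synchronize out of the conditional means the lock is dropped at exactly one well-defined point, regardless of which BSS fields changed.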
Larry Finger June 1, 2017, 12:07 a.m. UTC | #4
On 05/31/2017 10:32 AM, Michael Büsch wrote:
> On Wed, 31 May 2017 13:26:43 +0300
> Kalle Valo <kvalo@codeaurora.org> wrote:
> 
>> Jia-Ju Bai <baijiaju1990@163.com> writes:
>>
>>> The driver may sleep under a spin lock, and the function call path is:
>>> b43legacy_op_bss_info_changed (acquire the lock by spin_lock_irqsave)
>>>    b43legacy_synchronize_irq
>>>      synchronize_irq --> may sleep
>>>
>>> To fix it, the lock is released before b43legacy_synchronize_irq, and the
>>> lock is acquired again after this function.
>>>
>>> Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
>>> ---
>>>   drivers/net/wireless/broadcom/b43legacy/main.c |    2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/net/wireless/broadcom/b43legacy/main.c b/drivers/net/wireless/broadcom/b43legacy/main.c
>>> index f1e3dad..31ead21 100644
>>> --- a/drivers/net/wireless/broadcom/b43legacy/main.c
>>> +++ b/drivers/net/wireless/broadcom/b43legacy/main.c
>>> @@ -2859,7 +2859,9 @@ static void b43legacy_op_bss_info_changed(struct ieee80211_hw *hw,
>>>   	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, 0);
>>>   
>>>   	if (changed & BSS_CHANGED_BSSID) {
>>> +		spin_unlock_irqrestore(&wl->irq_lock, flags);
>>>   		b43legacy_synchronize_irq(dev);
>>> +		spin_lock_irqsave(&wl->irq_lock, flags);
>>
>> To me this looks like a fragile workaround and not a real fix. You can
>> easily add new race conditions with releasing the lock like this.
>>
> 
> 
> I think releasing the lock possibly is fine. It certainly is better than
> sleeping with a lock held.
> We disabled the device interrupts just before this line.
> 
> However I think the synchronize_irq should be outside of the
> conditional right after the write to B43legacy_MMIO_GEN_IRQ_MASK. (So
> two lines above)
> I don't think it makes sense to only synchronize if BSS_CHANGED_BSSID
> is set.
> 
> 
> On the other hand b43 does not have this irq-disabling foobar anymore.
> So somebody must have removed it. Maybe you can find the commit that
> removed this stuff from b43 and port it to b43legacy?
> 
> 
> So I would vote for moving the synchronize_irq up outside of the
> conditional and put the unlock/lock sequence around it.
> And as a second patch on top of that try to remove this stuff
> altogether like b43 did.

The patch that removed it in b43 is

commit 36dbd9548e92268127b0c31b0e121e63e9207108
Author: Michael Buesch <mb@bu3sch.de>
Date:   Fri Sep 4 22:51:29 2009 +0200

     b43: Use a threaded IRQ handler

     Use a threaded IRQ handler to allow locking the mutex and
     sleeping while executing an interrupt.
     This removes usage of the irq_lock spinlock, but introduces
     a new hardirq_lock, which is _only_ used for the PCI/SSB lowlevel
     hard-irq handler. Sleeping busses (SDIO) will use mutex instead.

     Signed-off-by: Michael Buesch <mb@bu3sch.de>
     Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
     Signed-off-by: John W. Linville <linville@tuxdriver.com>

I vaguely remember this patch. Although it is roughly a 1000-line fix, I will 
try to port it to b43legacy. I still have an old BCM4306 PCMCIA card that I can 
test in a PowerBook G4.

I agree with Michael that this is the way to go. Both of Jia-Ju's patches should 
be rejected.

Larry
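The split described in that commit message can be modelled in userspace. In the kernel, such a handler is registered with request_threaded_irq(), where the primary (hard-IRQ) handler returns IRQ_WAKE_THREAD to defer the remaining work to a thread that is allowed to sleep. Everything below (struct model_dev, the M_IRQ_* values and the handler names) is a hypothetical model of that idea, not b43 or b43legacy code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values loosely mirroring IRQ_NONE / IRQ_HANDLED /
 * IRQ_WAKE_THREAD from the kernel's irqreturn_t. */
enum model_irqreturn { M_IRQ_NONE, M_IRQ_HANDLED, M_IRQ_WAKE_THREAD };

struct model_dev {
	uint32_t pending_reason;      /* what the hardware reports */
	uint32_t saved_reason;        /* handed from hard handler to thread */
	uint32_t irq_mask;            /* nonzero bits = device IRQs enabled */
	bool hardirq_lock_held;       /* held only inside the hard handler */
	int thread_runs;
	int thread_ran_under_lock;    /* should stay 0 */
};

/* Hard-IRQ part: short and non-sleeping, under hardirq_lock. */
static enum model_irqreturn model_hard_handler(struct model_dev *d)
{
	enum model_irqreturn ret = M_IRQ_NONE;

	d->hardirq_lock_held = true;
	if (d->pending_reason & d->irq_mask) {
		d->saved_reason = d->pending_reason;
		d->pending_reason = 0;    /* acknowledge */
		d->irq_mask = 0;          /* mask until the thread is done */
		ret = M_IRQ_WAKE_THREAD;
	}
	d->hardirq_lock_held = false;
	return ret;
}

/* Threaded part: process context, may take a mutex or sleep. */
static enum model_irqreturn model_thread_fn(struct model_dev *d,
					    uint32_t enable_mask)
{
	if (d->hardirq_lock_held)
		d->thread_ran_under_lock++;
	d->thread_runs++;
	d->saved_reason = 0;          /* "process" the saved reason */
	d->irq_mask = enable_mask;    /* re-enable device IRQs */
	return M_IRQ_HANDLED;
}

/* What the IRQ core does: run the hard handler, then wake the thread. */
static enum model_irqreturn model_dispatch(struct model_dev *d,
					   uint32_t enable_mask)
{
	enum model_irqreturn r = model_hard_handler(d);

	if (r == M_IRQ_WAKE_THREAD)
		r = model_thread_fn(d, enable_mask);
	return r;
}
```

The design point is that the spinlock shrinks to the few instructions in the hard handler, so code that needs to sleep (like synchronize_irq()) never has to run while it is held.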
Jia-Ju Bai June 1, 2017, 1:07 a.m. UTC | #5
On 06/01/2017 08:07 AM, Larry Finger wrote:
> On 05/31/2017 10:32 AM, Michael Büsch wrote:
>> On Wed, 31 May 2017 13:26:43 +0300
>> Kalle Valo <kvalo@codeaurora.org> wrote:
>>
>>> Jia-Ju Bai <baijiaju1990@163.com> writes:
>>>
>>>> The driver may sleep under a spin lock, and the function call path is:
>>>> b43legacy_op_bss_info_changed (acquire the lock by spin_lock_irqsave)
>>>>    b43legacy_synchronize_irq
>>>>      synchronize_irq --> may sleep
>>>>
>>>> To fix it, the lock is released before b43legacy_synchronize_irq, and the
>>>> lock is acquired again after this function.
>>>>
>>>> Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
>>>> ---
>>>>   drivers/net/wireless/broadcom/b43legacy/main.c |    2 ++
>>>>   1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/net/wireless/broadcom/b43legacy/main.c b/drivers/net/wireless/broadcom/b43legacy/main.c
>>>> index f1e3dad..31ead21 100644
>>>> --- a/drivers/net/wireless/broadcom/b43legacy/main.c
>>>> +++ b/drivers/net/wireless/broadcom/b43legacy/main.c
>>>> @@ -2859,7 +2859,9 @@ static void b43legacy_op_bss_info_changed(struct ieee80211_hw *hw,
>>>>  	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, 0);
>>>>  
>>>>  	if (changed & BSS_CHANGED_BSSID) {
>>>> +		spin_unlock_irqrestore(&wl->irq_lock, flags);
>>>>  		b43legacy_synchronize_irq(dev);
>>>> +		spin_lock_irqsave(&wl->irq_lock, flags);
>>>
>>> To me this looks like a fragile workaround and not a real fix. You can
>>> easily add new race conditions with releasing the lock like this.
>>>
>>
>>
>> I think releasing the lock possibly is fine. It certainly is better than
>> sleeping with a lock held.
>> We disabled the device interrupts just before this line.
>>
>> However I think the synchronize_irq should be outside of the
>> conditional right after the write to B43legacy_MMIO_GEN_IRQ_MASK. (So
>> two lines above)
>> I don't think it makes sense to only synchronize if BSS_CHANGED_BSSID
>> is set.
>>
>>
>> On the other hand b43 does not have this irq-disabling foobar anymore.
>> So somebody must have removed it. Maybe you can find the commit that
>> removed this stuff from b43 and port it to b43legacy?
>>
>>
>> So I would vote for moving the synchronize_irq up outside of the
>> conditional and put the unlock/lock sequence around it.
>> And as a second patch on top of that try to remove this stuff
>> altogether like b43 did.
>
> The patch that removed it in b43 is
>
> commit 36dbd9548e92268127b0c31b0e121e63e9207108
> Author: Michael Buesch <mb@bu3sch.de>
> Date:   Fri Sep 4 22:51:29 2009 +0200
>
>     b43: Use a threaded IRQ handler
>
>     Use a threaded IRQ handler to allow locking the mutex and
>     sleeping while executing an interrupt.
>     This removes usage of the irq_lock spinlock, but introduces
>     a new hardirq_lock, which is _only_ used for the PCI/SSB lowlevel
>     hard-irq handler. Sleeping busses (SDIO) will use mutex instead.
>
>     Signed-off-by: Michael Buesch <mb@bu3sch.de>
>     Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
>     Signed-off-by: John W. Linville <linville@tuxdriver.com>
>
> I vaguely remember this patch. Although it is roughly a 1000-line fix, 
> I will try to port it to b43legacy. I still have an old BCM4306 PCMCIA 
> card that I can test in a PowerBook G4.
>
> I agree with Michael that this is the way to go. Both of Jia-Ju's 
> patches should be rejected.
>
> Larry
>
>

It is fine to me to fix the bug by porting this former patch.

Thanks,
Jia-Ju Bai
Kalle Valo June 1, 2017, 4:27 a.m. UTC | #6
Michael Büsch <m@bues.ch> writes:

>> > --- a/drivers/net/wireless/broadcom/b43legacy/main.c
>> > +++ b/drivers/net/wireless/broadcom/b43legacy/main.c
>> > @@ -2859,7 +2859,9 @@ static void b43legacy_op_bss_info_changed(struct ieee80211_hw *hw,
>> >  	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, 0);
>> >  
>> >  	if (changed & BSS_CHANGED_BSSID) {
>> > +		spin_unlock_irqrestore(&wl->irq_lock, flags);
>> >  		b43legacy_synchronize_irq(dev);
>> > +		spin_lock_irqsave(&wl->irq_lock, flags);  
>> 
>> To me this looks like a fragile workaround and not a real fix. You can
>> easily add new race conditions with releasing the lock like this.
>> 
>
>
> I think releasing the lock possibly is fine. It certainly is better than
> sleeping with a lock held.

Sure, but IMHO in general I think the practice of releasing the lock
like this in the middle of a function is dangerous, as one can easily miss
that the upper and lower halves of the function are not actually atomic
anymore. And in this case, the fact that it's under a conditional makes it
even worse.
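Kalle's concern can be made concrete: anything read before the unlock may be stale after the re-lock. One common defensive pattern, sketched here as a hypothetical userspace model, is to bump a generation counter on every locked update and revalidate after re-acquiring the lock. The b43legacy driver does not do this; struct shared, concurrent_update() and upper_lower_halves() are made-up names, and the "lock" is only indicated by comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state that a real driver would protect with a lock. */
struct shared {
	unsigned int generation;  /* bumped on every locked update */
	int value;
};

/* Simulates another context that takes the lock while we dropped it. */
typedef void (*window_fn)(struct shared *s);

static void concurrent_update(struct shared *s)
{
	s->value = 42;
	s->generation++;
}

/* Snapshot state in the "upper half", drop the lock for a sleeping call,
 * then revalidate in the "lower half". Returns true if the snapshot is
 * still valid, false if it had to be reloaded. */
static bool upper_lower_halves(struct shared *s, window_fn during_unlock,
			       int *snapshot)
{
	unsigned int gen = s->generation;   /* lock held: take snapshot */

	*snapshot = s->value;

	/* lock dropped here: any other context may run */
	if (during_unlock != NULL)
		during_unlock(s);

	/* lock re-acquired: do not trust the snapshot blindly */
	if (s->generation != gen) {
		*snapshot = s->value;           /* reload stale data */
		return false;
	}
	return true;
}
```

Without such a check, the lower half silently operates on whatever it read before the lock was dropped, which is exactly the non-atomicity Kalle describes.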
Michael Büsch June 1, 2017, 5:29 a.m. UTC | #7
On Thu, 01 Jun 2017 07:27:20 +0300
Kalle Valo <kvalo@codeaurora.org> wrote:

> Michael Büsch <m@bues.ch> writes:
> 
> >> > --- a/drivers/net/wireless/broadcom/b43legacy/main.c
> >> > +++ b/drivers/net/wireless/broadcom/b43legacy/main.c
> >> > @@ -2859,7 +2859,9 @@ static void b43legacy_op_bss_info_changed(struct ieee80211_hw *hw,
> >> >  	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, 0);
> >> >  
> >> >  	if (changed & BSS_CHANGED_BSSID) {
> >> > +		spin_unlock_irqrestore(&wl->irq_lock, flags);
> >> >  		b43legacy_synchronize_irq(dev);
> >> > +		spin_lock_irqsave(&wl->irq_lock, flags);    
> >> 
> >> To me this looks like a fragile workaround and not a real fix. You can
> >> easily add new race conditions with releasing the lock like this.
> >>   
> >
> >
> > I think releasing the lock possibly is fine. It certainly is better than
> > sleeping with a lock held.  
> 
> Sure, but IMHO in general I think the practise of releasing the lock
> like this in a middle of function is dangerous as one can easily miss
> that upper and lower halves of the function are not actually atomic
> anymore. And in this case that it's under a conditional makes it even
> worse.
> 


Yes, in general I agree. Releasing and re-acquiring a lock is dangerous.
But I think in this special case here it might be harmless.
The irq_lock is used mostly (if not exclusively; I don't fully
remember) to protect against the IRQ top and bottom halves.
But we disabled the device IRQs a line above, and the purpose of this
synchronize is to make sure the handler will finish and thus make
dropping the lock safe.
Of course it does not make sense to do this with the lock held :)
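Michael's argument (mask the device interrupts, then wait for any handler that is already running) can be checked with a small hypothetical model. struct irq_model and its helpers are illustrative only: model_synchronize() merely drains an in-flight counter, whereas the real synchronize_irq() sleeps until a handler running on another CPU returns.

```c
#include <assert.h>
#include <stdint.h>

struct irq_model {
	uint32_t gen_irq_mask;   /* device-side interrupt mask */
	int in_flight;           /* handlers currently running */
	int handled;             /* handlers that ran to completion */
};

/* Hardware raises an event: a handler starts only if it is unmasked. */
static void model_raise(struct irq_model *m, uint32_t reason)
{
	if (reason & m->gen_irq_mask)
		m->in_flight++;
}

/* Finish one running handler (what another CPU would do meanwhile). */
static void model_complete_one(struct irq_model *m)
{
	if (m->in_flight > 0) {
		m->in_flight--;
		m->handled++;
	}
}

/* Like synchronize_irq(): return only once no handler is running. */
static void model_synchronize(struct irq_model *m)
{
	while (m->in_flight > 0)
		model_complete_one(m);
}
```

Once the mask is zero and the synchronize has returned, no handler is running and none can start, so in this model nothing contends for irq_lock while it is dropped.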
Michael Büsch June 1, 2017, 5:31 a.m. UTC | #8
On Wed, 31 May 2017 19:07:15 -0500
Larry Finger <Larry.Finger@lwfinger.net> wrote:

> On 05/31/2017 10:32 AM, Michael Büsch wrote:
> > On Wed, 31 May 2017 13:26:43 +0300
> > Kalle Valo <kvalo@codeaurora.org> wrote:
> >   
> >> Jia-Ju Bai <baijiaju1990@163.com> writes:
> >>  
> >>> The driver may sleep under a spin lock, and the function call path is:
> >>> b43legacy_op_bss_info_changed (acquire the lock by spin_lock_irqsave)
> >>>    b43legacy_synchronize_irq
> >>>      synchronize_irq --> may sleep
> >>>
> >>> To fix it, the lock is released before b43legacy_synchronize_irq, and the
> >>> lock is acquired again after this function.
> >>>
> >>> Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
> >>> ---
> >>>   drivers/net/wireless/broadcom/b43legacy/main.c |    2 ++
> >>>   1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/drivers/net/wireless/broadcom/b43legacy/main.c b/drivers/net/wireless/broadcom/b43legacy/main.c
> >>> index f1e3dad..31ead21 100644
> >>> --- a/drivers/net/wireless/broadcom/b43legacy/main.c
> >>> +++ b/drivers/net/wireless/broadcom/b43legacy/main.c
> >>> @@ -2859,7 +2859,9 @@ static void b43legacy_op_bss_info_changed(struct ieee80211_hw *hw,
> >>>   	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, 0);
> >>>   
> >>>   	if (changed & BSS_CHANGED_BSSID) {
> >>> +		spin_unlock_irqrestore(&wl->irq_lock, flags);
> >>>   		b43legacy_synchronize_irq(dev);
> >>> +		spin_lock_irqsave(&wl->irq_lock, flags);  
> >>
> >> To me this looks like a fragile workaround and not a real fix. You can
> >> easily add new race conditions with releasing the lock like this.
> >>  
> > 
> > 
> > I think releasing the lock possibly is fine. It certainly is better than
> > sleeping with a lock held.
> > We disabled the device interrupts just before this line.
> > 
> > However I think the synchronize_irq should be outside of the
> > conditional right after the write to B43legacy_MMIO_GEN_IRQ_MASK. (So
> > two lines above)
> > I don't think it makes sense to only synchronize if BSS_CHANGED_BSSID
> > is set.
> > 
> > 
> > On the other hand b43 does not have this irq-disabling foobar anymore.
> > So somebody must have removed it. Maybe you can find the commit that
> > removed this stuff from b43 and port it to b43legacy?
> > 
> > 
> > So I would vote for moving the synchronize_irq up outside of the
> > conditional and put the unlock/lock sequence around it.
> > And as a second patch on top of that try to remove this stuff
> > altogether like b43 did.  
> 
> The patch that removed it in b43 is
> 
> commit 36dbd9548e92268127b0c31b0e121e63e9207108
> Author: Michael Buesch <mb@bu3sch.de>
> Date:   Fri Sep 4 22:51:29 2009 +0200

Damn it :D

>      b43: Use a threaded IRQ handler
> 
>      Use a threaded IRQ handler to allow locking the mutex and
>      sleeping while executing an interrupt.
>      This removes usage of the irq_lock spinlock, but introduces
>      a new hardirq_lock, which is _only_ used for the PCI/SSB lowlevel
>      hard-irq handler. Sleeping busses (SDIO) will use mutex instead.
> 
>      Signed-off-by: Michael Buesch <mb@bu3sch.de>
>      Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
>      Signed-off-by: John W. Linville <linville@tuxdriver.com>
> 
> I vaguely remember this patch. Although it is roughly a 1000-line fix, I will 
> try to port it to b43legacy. I still have an old BCM4306 PCMCIA card that I can 
> test in a PowerBook G4.
> 
> I agree with Michael that this is the way to go. Both of Jia-Ju's patches should 
> be rejected.


I'm not sure if it's worth it. There is a risk that this would
introduce new bugs.
But sure, please feel free to try it. This way we can find out how big
this change becomes.

Patch

diff --git a/drivers/net/wireless/broadcom/b43legacy/main.c b/drivers/net/wireless/broadcom/b43legacy/main.c
index f1e3dad..31ead21 100644
--- a/drivers/net/wireless/broadcom/b43legacy/main.c
+++ b/drivers/net/wireless/broadcom/b43legacy/main.c
@@ -2859,7 +2859,9 @@ static void b43legacy_op_bss_info_changed(struct ieee80211_hw *hw,
 	b43legacy_write32(dev, B43legacy_MMIO_GEN_IRQ_MASK, 0);
 
 	if (changed & BSS_CHANGED_BSSID) {
+		spin_unlock_irqrestore(&wl->irq_lock, flags);
 		b43legacy_synchronize_irq(dev);
+		spin_lock_irqsave(&wl->irq_lock, flags);
 
 		if (conf->bssid)
 			memcpy(wl->bssid, conf->bssid, ETH_ALEN);