diff mbox

ath9k: fix NULL-deref in hw_per_calibration() for ar9002

Message ID 1399447378-31503-1-git-send-email-dh.herrmann@gmail.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

David Herrmann May 7, 2014, 7:22 a.m. UTC
ah->caldata may be NULL if no channel is selected. Check for that before
accessing it.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
---
Hi

This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
is freed while we run in hw_per_calibration(). However, this patch fixes serious
kernel panics with wifi-P2P on my machine.

I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
correct fix would be to synchronously stop any running hw-calibration before
setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
this small workaround.

Thanks
David

 drivers/net/wireless/ath/ath9k/ar9002_calib.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

John W. Linville May 7, 2014, 7:54 p.m. UTC | #1
On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
> ah->caldata may be NULL if no channel is selected. Check for that before
> accessing it.
> 
> Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
> ---
> Hi
> 
> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
> is freed while we run in hw_per_calibration(). However, this patch fixes serious
> kernel panics with wifi-P2P on my machine.
> 
> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
> correct fix would be to synchronously stop any running hw-calibration before
> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
> this small workaround.
> 
> Thanks
> David

Is there any hope for getting a more complete fix from the ath9k guys
in short order?

John

> 
>  drivers/net/wireless/ath/ath9k/ar9002_calib.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath9k/ar9002_calib.c b/drivers/net/wireless/ath/ath9k/ar9002_calib.c
> index cdc7400..4667ab9 100644
> --- a/drivers/net/wireless/ath/ath9k/ar9002_calib.c
> +++ b/drivers/net/wireless/ath/ath9k/ar9002_calib.c
> @@ -99,14 +99,16 @@ static bool ar9002_hw_per_calibration(struct ath_hw *ah,
>  				}
>  
>  				currCal->calData->calPostProc(ah, numChains);
> -				caldata->CalValid |= currCal->calData->calType;
> +				if (caldata)
> +					caldata->CalValid |= currCal->calData->calType;
> +
>  				currCal->calState = CAL_DONE;
>  				iscaldone = true;
>  			} else {
>  				ar9002_hw_setup_calibration(ah, currCal);
>  			}
>  		}
> -	} else if (!(caldata->CalValid & currCal->calData->calType)) {
> +	} else if (caldata && !(caldata->CalValid & currCal->calData->calType)) {
>  		ath9k_hw_reset_calibration(ah, currCal);
>  	}
>  
> -- 
> 1.9.2
> 
>
Luis R. Rodriguez May 7, 2014, 8:03 p.m. UTC | #2
On Wed, May 7, 2014 at 12:54 PM, John W. Linville
<linville@tuxdriver.com> wrote:
> Is there any hope for getting a more complete fix from the ath9k guys
> in short order?

Wait, who are the ath9k folks now exactly ?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felix Fietkau May 7, 2014, 8:15 p.m. UTC | #3
On 2014-05-07 21:54, John W. Linville wrote:
> On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
>> ah->caldata may be NULL if no channel is selected. Check for that before
>> accessing it.
>> 
>> Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
>> ---
>> Hi
>> 
>> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
>> is freed while we run in hw_per_calibration(). However, this patch fixes serious
>> kernel panics with wifi-P2P on my machine.
>> 
>> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
>> correct fix would be to synchronously stop any running hw-calibration before
>> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
>> this small workaround.
>> 
>> Thanks
>> David
> 
> Is there any hope for getting a more complete fix from the ath9k guys
> in short order?
This looks easy to fix. I'll send a patch soon.

- Felix

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John W. Linville May 7, 2014, 8:38 p.m. UTC | #4
On Wed, May 07, 2014 at 01:03:00PM -0700, Luis R. Rodriguez wrote:
> On Wed, May 7, 2014 at 12:54 PM, John W. Linville
> <linville@tuxdriver.com> wrote:
> > Is there any hope for getting a more complete fix from the ath9k guys
> > in short order?
> 
> Wait, who are the ath9k folks now exactly ?

A better question may be, who are the qualcomm folks now exactly? ;-)

Anyway, I hope that the ath9k guys know who they are and will respond.
If there aren't any, then I guess we'll have to start slapping bubble
gum and popsicle sticks onto ath9k...

John
Rajkumar Manoharan May 8, 2014, 6:18 p.m. UTC | #5
On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
> ah->caldata may be NULL if no channel is selected. Check for that before
> accessing it.
> 
> Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
> ---
> Hi
> 
> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
> is freed while we run in hw_per_calibration(). However, this patch fixes serious
> kernel panics with wifi-P2P on my machine.
> 
> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
> correct fix would be to synchronously stop any running hw-calibration before
> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
> this small workaround.
>
David,

Whenever the DUT is moving to off-channel, ah->caldata is set to NULL in
hw_reset. As you mentioned, before doing hw_reset, the on-going calibration is stopped
synchronously. I using ar9280 for p2p (GO & CLI) validation. Somehow i do not observe
the panics. Is there a easiest way to reproduce the problem. Are you
using wireless-testing tree? Thanks for reporting the problem. Will try
to fix asap.

-Rajkumar
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Herrmann May 8, 2014, 8:16 p.m. UTC | #6
Hi

On Thu, May 8, 2014 at 8:18 PM, Rajkumar Manoharan
<rmanohar@qti.qualcomm.com> wrote:
> On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
>> ah->caldata may be NULL if no channel is selected. Check for that before
>> accessing it.
>>
>> Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
>> ---
>> Hi
>>
>> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
>> is freed while we run in hw_per_calibration(). However, this patch fixes serious
>> kernel panics with wifi-P2P on my machine.
>>
>> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
>> correct fix would be to synchronously stop any running hw-calibration before
>> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
>> this small workaround.
>>
> David,
>
> Whenever the DUT is moving to off-channel, ah->caldata is set to NULL in
> hw_reset. As you mentioned, before doing hw_reset, the on-going calibration is stopped
> synchronously. I using ar9280 for p2p (GO & CLI) validation. Somehow i do not observe
> the panics. Is there a easiest way to reproduce the problem. Are you
> using wireless-testing tree? Thanks for reporting the problem. Will try
> to fix asap.

Reproducing it is actually quite easy on my machine. Whenever I start
a P2P-connect from my Android-phone to my linux-host and _immediately_
accept it (via p2p_connect on wpas), I get the kernel-panic. Adding
the NULL-protection fixes this.

However, if I delay accepting the connection (ie, issuing p2p_connect
by hand instead of automatically), I cannot see the bug. Furthermore,
on my slower Intel Core 2 Duo, the bug happens much less likely. On my
ARM machine I never saw this happening. Given that my main machine is
an Intel hsw quad-core, I guess it's a simple race-condition.

I also added a printk() whenever caldata is NULL and noticed that it
fires only during the first 2 or 3 runs. After that, it never happened
again.

The bug happens on all linux kernels I tested (starting with 3.9ish up
to linux-next). However, if I apply my fix, anything after 3.13-stable
fails to transmit DHCP data. I can connect properly but DHCP always
times out. I'm not sure why that happens and I'm still debugging this,
but it's quite likely a separate issue. (if I find some time, I will
bisect this)

I now looked at the ath9k code and I couldn't see any locking around
the hw_reset at all. I don't know whether the wifi-core / nl80211
locks this, but what happens if two hw_resets race each other? Just a
guess.. I will try to look into it tomorrow.

Thanks
David
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John W. Linville May 12, 2014, 5:49 p.m. UTC | #7
On Wed, May 07, 2014 at 10:15:53PM +0200, Felix Fietkau wrote:
> On 2014-05-07 21:54, John W. Linville wrote:
> > On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
> >> ah->caldata may be NULL if no channel is selected. Check for that before
> >> accessing it.
> >> 
> >> Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
> >> ---
> >> Hi
> >> 
> >> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
> >> is freed while we run in hw_per_calibration(). However, this patch fixes serious
> >> kernel panics with wifi-P2P on my machine.
> >> 
> >> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
> >> correct fix would be to synchronously stop any running hw-calibration before
> >> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
> >> this small workaround.
> >> 
> >> Thanks
> >> David
> > 
> > Is there any hope for getting a more complete fix from the ath9k guys
> > in short order?
> This looks easy to fix. I'll send a patch soon.

Ping?
Felix Fietkau May 12, 2014, 6:43 p.m. UTC | #8
On 2014-05-12 19:49, John W. Linville wrote:
> On Wed, May 07, 2014 at 10:15:53PM +0200, Felix Fietkau wrote:
>> On 2014-05-07 21:54, John W. Linville wrote:
>> > On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote:
>> >> ah->caldata may be NULL if no channel is selected. Check for that before
>> >> accessing it.
>> >> 
>> >> Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
>> >> ---
>> >> Hi
>> >> 
>> >> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata
>> >> is freed while we run in hw_per_calibration(). However, this patch fixes serious
>> >> kernel panics with wifi-P2P on my machine.
>> >> 
>> >> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the
>> >> correct fix would be to synchronously stop any running hw-calibration before
>> >> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote
>> >> this small workaround.
>> >> 
>> >> Thanks
>> >> David
>> > 
>> > Is there any hope for getting a more complete fix from the ath9k guys
>> > in short order?
>> This looks easy to fix. I'll send a patch soon.
> 
> Ping?
I looked into it again, the scenario where I assumed that this problem
could occur didn't turn out to be true. I have no idea how this crash
can occur.

- Felix
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Herrmann May 13, 2014, 6:50 a.m. UTC | #9
Hi

On Mon, May 12, 2014 at 8:43 PM, Felix Fietkau <nbd@openwrt.org> wrote:
> I looked into it again, the scenario where I assumed that this problem
> could occur didn't turn out to be true. I have no idea how this crash
> can occur.

The only path that can set ah->caldata to NULL is through:

ieee80211_hw_config()
  ath9k_htc_config()
    ath9k_htc_set_channel()
      ath9k_hw_reset()

This happens whenever IEEE80211_CONF_OFFCHANNEL is set.

Now mac80211 is way to big for me to review right now and
ieee80211_hw_config() is used quite often. Given that the described
call-path does no synchronization against ath9k_htc_ani_work(), all
the callers of mac80211_hw_config(OFFCHANNEL) must guarantee that no
ani-work is running. Is that intentional? I cannot see any of those
functions calling into ath9k_htc_stop_ani(). This might of course be
implicit.

One call-path I see is:
ieee80211_scan_cancel()
  cancel_delayed_work()

We cannot use cancel_delayed_work_sync() here due to locking issues.
However, this obviously races against any following
set_channel(OFFCHANNEL) request.

If there's anything I can do to debug this, let me know. I tried
adding some printk()'s into the hot-path and it turns out to no longer
fail then. So this really seems to be a quite small race (given that a
bunch of simple printk()s is slow enough to work around it).

Thanks
David
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rajkumar Manoharan May 13, 2014, 9 a.m. UTC | #10
On Tue, May 13, 2014 at 08:50:00AM +0200, David Herrmann wrote:
> Hi
> 
> On Mon, May 12, 2014 at 8:43 PM, Felix Fietkau <nbd@openwrt.org> wrote:
> > I looked into it again, the scenario where I assumed that this problem
> > could occur didn't turn out to be true. I have no idea how this crash
> > can occur.
> 
> The only path that can set ah->caldata to NULL is through:
> 
> ieee80211_hw_config()
>   ath9k_htc_config()
>     ath9k_htc_set_channel()
>       ath9k_hw_reset()
> 
> This happens whenever IEEE80211_CONF_OFFCHANNEL is set.
> 
> Now mac80211 is way to big for me to review right now and
> ieee80211_hw_config() is used quite often. Given that the described
> call-path does no synchronization against ath9k_htc_ani_work(), all
> the callers of mac80211_hw_config(OFFCHANNEL) must guarantee that no
> ani-work is running. Is that intentional? I cannot see any of those
> functions calling into ath9k_htc_stop_ani(). This might of course be
> implicit.
> 
> One call-path I see is:
> ieee80211_scan_cancel()
>   cancel_delayed_work()
> 
> We cannot use cancel_delayed_work_sync() here due to locking issues.
> However, this obviously races against any following
> set_channel(OFFCHANNEL) request.
> 
> If there's anything I can do to debug this, let me know. I tried
> adding some printk()'s into the hot-path and it turns out to no longer
> fail then. So this really seems to be a quite small race (given that a
> bunch of simple printk()s is slow enough to work around it).

David,

Are you using USB devices? What is the PID/VID? I have not looked at
HTC driver perspective.

-Rajkumar
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Herrmann May 13, 2014, 9:09 a.m. UTC | #11
Hi

On Tue, May 13, 2014 at 11:00 AM, Rajkumar Manoharan
<rmanohar@qti.qualcomm.com> wrote:
> On Tue, May 13, 2014 at 08:50:00AM +0200, David Herrmann wrote:
>> Hi
>>
>> On Mon, May 12, 2014 at 8:43 PM, Felix Fietkau <nbd@openwrt.org> wrote:
>> > I looked into it again, the scenario where I assumed that this problem
>> > could occur didn't turn out to be true. I have no idea how this crash
>> > can occur.
>>
>> The only path that can set ah->caldata to NULL is through:
>>
>> ieee80211_hw_config()
>>   ath9k_htc_config()
>>     ath9k_htc_set_channel()
>>       ath9k_hw_reset()
>>
>> This happens whenever IEEE80211_CONF_OFFCHANNEL is set.
>>
>> Now mac80211 is way to big for me to review right now and
>> ieee80211_hw_config() is used quite often. Given that the described
>> call-path does no synchronization against ath9k_htc_ani_work(), all
>> the callers of mac80211_hw_config(OFFCHANNEL) must guarantee that no
>> ani-work is running. Is that intentional? I cannot see any of those
>> functions calling into ath9k_htc_stop_ani(). This might of course be
>> implicit.
>>
>> One call-path I see is:
>> ieee80211_scan_cancel()
>>   cancel_delayed_work()
>>
>> We cannot use cancel_delayed_work_sync() here due to locking issues.
>> However, this obviously races against any following
>> set_channel(OFFCHANNEL) request.
>>
>> If there's anything I can do to debug this, let me know. I tried
>> adding some printk()'s into the hot-path and it turns out to no longer
>> fail then. So this really seems to be a quite small race (given that a
>> bunch of simple printk()s is slow enough to work around it).
>
> David,
>
> Are you using USB devices? What is the PID/VID? I have not looked at
> HTC driver perspective.

Yes, I'm using the htc driver. I thought that was clear by pointing at
ar9002, but I was wrong, sorry. lsusb information is:
  'Bus 003 Device 056: ID 0411:017f BUFFALO INC. (formerly MelCo.,
Inc.) Sony UWA-BR100 802.11abgn Wireless Adapter [Atheros
AR7010+AR9280]'

Thanks
David
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/wireless/ath/ath9k/ar9002_calib.c b/drivers/net/wireless/ath/ath9k/ar9002_calib.c
index cdc7400..4667ab9 100644
--- a/drivers/net/wireless/ath/ath9k/ar9002_calib.c
+++ b/drivers/net/wireless/ath/ath9k/ar9002_calib.c
@@ -99,14 +99,16 @@  static bool ar9002_hw_per_calibration(struct ath_hw *ah,
 				}
 
 				currCal->calData->calPostProc(ah, numChains);
-				caldata->CalValid |= currCal->calData->calType;
+				if (caldata)
+					caldata->CalValid |= currCal->calData->calType;
+
 				currCal->calState = CAL_DONE;
 				iscaldone = true;
 			} else {
 				ar9002_hw_setup_calibration(ah, currCal);
 			}
 		}
-	} else if (!(caldata->CalValid & currCal->calData->calType)) {
+	} else if (caldata && !(caldata->CalValid & currCal->calData->calType)) {
 		ath9k_hw_reset_calibration(ah, currCal);
 	}