Message ID | 1399447378-31503-1-git-send-email-dh.herrmann@gmail.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote: > ah->caldata may be NULL if no channel is selected. Check for that before > accessing it. > > Signed-off-by: David Herrmann <dh.herrmann@gmail.com> > --- > Hi > > This is _definitely_ only a workaround, given that no-one guarantees ah->caldata > is freed while we run in hw_per_calibration(). However, this patch fixes serious > kernel panics with wifi-P2P on my machine. > > I'm not sure why ah->caldata can be NULL, but it definitely is. I think the > correct fix would be to synchronously stop any running hw-calibration before > setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote > this small workaround. > > Thanks > David Is there any hope for getting a more complete fix from the ath9k guys in short order? John > > drivers/net/wireless/ath/ath9k/ar9002_calib.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath9k/ar9002_calib.c b/drivers/net/wireless/ath/ath9k/ar9002_calib.c > index cdc7400..4667ab9 100644 > --- a/drivers/net/wireless/ath/ath9k/ar9002_calib.c > +++ b/drivers/net/wireless/ath/ath9k/ar9002_calib.c > @@ -99,14 +99,16 @@ static bool ar9002_hw_per_calibration(struct ath_hw *ah, > } > > currCal->calData->calPostProc(ah, numChains); > - caldata->CalValid |= currCal->calData->calType; > + if (caldata) > + caldata->CalValid |= currCal->calData->calType; > + > currCal->calState = CAL_DONE; > iscaldone = true; > } else { > ar9002_hw_setup_calibration(ah, currCal); > } > } > - } else if (!(caldata->CalValid & currCal->calData->calType)) { > + } else if (caldata && !(caldata->CalValid & currCal->calData->calType)) { > ath9k_hw_reset_calibration(ah, currCal); > } > > -- > 1.9.2 > >
On Wed, May 7, 2014 at 12:54 PM, John W. Linville <linville@tuxdriver.com> wrote: > Is there any hope for getting a more complete fix from the ath9k guys > in short order? Wait, who are the ath9k folks now exactly ? Luis -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2014-05-07 21:54, John W. Linville wrote: > On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote: >> ah->caldata may be NULL if no channel is selected. Check for that before >> accessing it. >> >> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> >> --- >> Hi >> >> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata >> is freed while we run in hw_per_calibration(). However, this patch fixes serious >> kernel panics with wifi-P2P on my machine. >> >> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the >> correct fix would be to synchronously stop any running hw-calibration before >> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote >> this small workaround. >> >> Thanks >> David > > Is there any hope for getting a more complete fix from the ath9k guys > in short order? This looks easy to fix. I'll send a patch soon. - Felix -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 07, 2014 at 01:03:00PM -0700, Luis R. Rodriguez wrote: > On Wed, May 7, 2014 at 12:54 PM, John W. Linville > <linville@tuxdriver.com> wrote: > > Is there any hope for getting a more complete fix from the ath9k guys > > in short order? > > Wait, who are the ath9k folks now exactly ? A better question may be, who are the qualcomm folks now exactly? ;-) Anyway, I hope that the ath9k guys know who they are and will respond. If there aren't any, then I guess we'll have to start slapping bubble gum and popsicle sticks onto ath9k... John
On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote: > ah->caldata may be NULL if no channel is selected. Check for that before > accessing it. > > Signed-off-by: David Herrmann <dh.herrmann@gmail.com> > --- > Hi > > This is _definitely_ only a workaround, given that no-one guarantees ah->caldata > is freed while we run in hw_per_calibration(). However, this patch fixes serious > kernel panics with wifi-P2P on my machine. > > I'm not sure why ah->caldata can be NULL, but it definitely is. I think the > correct fix would be to synchronously stop any running hw-calibration before > setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote > this small workaround. > David, Whenever the DUT is moving to off-channel, ah->caldata is set to NULL in hw_reset. As you mentioned, before doing hw_reset, the on-going calibration is stopped synchronously. I using ar9280 for p2p (GO & CLI) validation. Somehow i do not observe the panics. Is there a easiest way to reproduce the problem. Are you using wireless-testing tree? Thanks for reporting the problem. Will try to fix asap. -Rajkumar -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi On Thu, May 8, 2014 at 8:18 PM, Rajkumar Manoharan <rmanohar@qti.qualcomm.com> wrote: > On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote: >> ah->caldata may be NULL if no channel is selected. Check for that before >> accessing it. >> >> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> >> --- >> Hi >> >> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata >> is freed while we run in hw_per_calibration(). However, this patch fixes serious >> kernel panics with wifi-P2P on my machine. >> >> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the >> correct fix would be to synchronously stop any running hw-calibration before >> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote >> this small workaround. >> > David, > > Whenever the DUT is moving to off-channel, ah->caldata is set to NULL in > hw_reset. As you mentioned, before doing hw_reset, the on-going calibration is stopped > synchronously. I using ar9280 for p2p (GO & CLI) validation. Somehow i do not observe > the panics. Is there a easiest way to reproduce the problem. Are you > using wireless-testing tree? Thanks for reporting the problem. Will try > to fix asap. Reproducing it is actually quite easy on my machine. Whenever I start a P2P-connect from my Android-phone to my linux-host and _immediately_ accept it (via p2p_connect on wpas), I get the kernel-panic. Adding the NULL-protection fixes this. However, if I delay accepting the connection (ie, issuing p2p_connect by hand instead of automatically), I cannot see the bug. Furthermore, on my slower Intel Core 2 Duo, the bug happens much less likely. On my ARM machine I never saw this happening. Given that my main machine is an Intel hsw quad-core, I guess it's a simple race-condition. I also added a printk() whenever caldata is NULL and noticed that it fires only during the first 2 or 3 runs. After that, it never happened again. The bug happens on all linux kernels I tested (starting with 3.9ish up to linux-next). However, if I apply my fix, anything after 3.13-stable fails to transmit DHCP data. I can connect properly but DHCP always times out. I'm not sure why that happens and I'm still debugging this, but it's quite likely a separate issue. (if I find some time, I will bisect this) I now looked at the ath9k code and I couldn't see any locking around the hw_reset at all. I don't know whether the wifi-core / nl80211 locks this, but what happens if two hw_resets race each other? Just a guess.. I will try to look into it tomorrow. Thanks David -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 07, 2014 at 10:15:53PM +0200, Felix Fietkau wrote: > On 2014-05-07 21:54, John W. Linville wrote: > > On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote: > >> ah->caldata may be NULL if no channel is selected. Check for that before > >> accessing it. > >> > >> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> > >> --- > >> Hi > >> > >> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata > >> is freed while we run in hw_per_calibration(). However, this patch fixes serious > >> kernel panics with wifi-P2P on my machine. > >> > >> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the > >> correct fix would be to synchronously stop any running hw-calibration before > >> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote > >> this small workaround. > >> > >> Thanks > >> David > > > > Is there any hope for getting a more complete fix from the ath9k guys > > in short order? > This looks easy to fix. I'll send a patch soon. Ping?
On 2014-05-12 19:49, John W. Linville wrote: > On Wed, May 07, 2014 at 10:15:53PM +0200, Felix Fietkau wrote: >> On 2014-05-07 21:54, John W. Linville wrote: >> > On Wed, May 07, 2014 at 09:22:58AM +0200, David Herrmann wrote: >> >> ah->caldata may be NULL if no channel is selected. Check for that before >> >> accessing it. >> >> >> >> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> >> >> --- >> >> Hi >> >> >> >> This is _definitely_ only a workaround, given that no-one guarantees ah->caldata >> >> is freed while we run in hw_per_calibration(). However, this patch fixes serious >> >> kernel panics with wifi-P2P on my machine. >> >> >> >> I'm not sure why ah->caldata can be NULL, but it definitely is. I think the >> >> correct fix would be to synchronously stop any running hw-calibration before >> >> setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote >> >> this small workaround. >> >> >> >> Thanks >> >> David >> > >> > Is there any hope for getting a more complete fix from the ath9k guys >> > in short order? >> This looks easy to fix. I'll send a patch soon. > > Ping? I looked into it again, the scenario where I assumed that this problem could occur didn't turn out to be true. I have no idea how this crash can occur. - Felix -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi On Mon, May 12, 2014 at 8:43 PM, Felix Fietkau <nbd@openwrt.org> wrote: > I looked into it again, the scenario where I assumed that this problem > could occur didn't turn out to be true. I have no idea how this crash > can occur. The only path that can set ah->caldata to NULL is through: ieee80211_hw_config() ath9k_htc_config() ath9k_htc_set_channel() ath9k_hw_reset() This happens whenever IEEE80211_CONF_OFFCHANNEL is set. Now mac80211 is way to big for me to review right now and ieee80211_hw_config() is used quite often. Given that the described call-path does no synchronization against ath9k_htc_ani_work(), all the callers of mac80211_hw_config(OFFCHANNEL) must guarantee that no ani-work is running. Is that intentional? I cannot see any of those functions calling into ath9k_htc_stop_ani(). This might of course be implicit. One call-path I see is: ieee80211_scan_cancel() cancel_delayed_work() We cannot use cancel_delayed_work_sync() here due to locking issues. However, this obviously races against any following set_channel(OFFCHANNEL) request. If there's anything I can do to debug this, let me know. I tried adding some printk()'s into the hot-path and it turns out to no longer fail then. So this really seems to be a quite small race (given that a bunch of simple printk()s is slow enough to work around it). Thanks David -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 13, 2014 at 08:50:00AM +0200, David Herrmann wrote: > Hi > > On Mon, May 12, 2014 at 8:43 PM, Felix Fietkau <nbd@openwrt.org> wrote: > > I looked into it again, the scenario where I assumed that this problem > > could occur didn't turn out to be true. I have no idea how this crash > > can occur. > > The only path that can set ah->caldata to NULL is through: > > ieee80211_hw_config() > ath9k_htc_config() > ath9k_htc_set_channel() > ath9k_hw_reset() > > This happens whenever IEEE80211_CONF_OFFCHANNEL is set. > > Now mac80211 is way to big for me to review right now and > ieee80211_hw_config() is used quite often. Given that the described > call-path does no synchronization against ath9k_htc_ani_work(), all > the callers of mac80211_hw_config(OFFCHANNEL) must guarantee that no > ani-work is running. Is that intentional? I cannot see any of those > functions calling into ath9k_htc_stop_ani(). This might of course be > implicit. > > One call-path I see is: > ieee80211_scan_cancel() > cancel_delayed_work() > > We cannot use cancel_delayed_work_sync() here due to locking issues. > However, this obviously races against any following > set_channel(OFFCHANNEL) request. > > If there's anything I can do to debug this, let me know. I tried > adding some printk()'s into the hot-path and it turns out to no longer > fail then. So this really seems to be a quite small race (given that a > bunch of simple printk()s is slow enough to work around it). David, Are you using USB devices? What is the PID/VID? I have not looked at HTC driver perspective. -Rajkumar -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi On Tue, May 13, 2014 at 11:00 AM, Rajkumar Manoharan <rmanohar@qti.qualcomm.com> wrote: > On Tue, May 13, 2014 at 08:50:00AM +0200, David Herrmann wrote: >> Hi >> >> On Mon, May 12, 2014 at 8:43 PM, Felix Fietkau <nbd@openwrt.org> wrote: >> > I looked into it again, the scenario where I assumed that this problem >> > could occur didn't turn out to be true. I have no idea how this crash >> > can occur. >> >> The only path that can set ah->caldata to NULL is through: >> >> ieee80211_hw_config() >> ath9k_htc_config() >> ath9k_htc_set_channel() >> ath9k_hw_reset() >> >> This happens whenever IEEE80211_CONF_OFFCHANNEL is set. >> >> Now mac80211 is way to big for me to review right now and >> ieee80211_hw_config() is used quite often. Given that the described >> call-path does no synchronization against ath9k_htc_ani_work(), all >> the callers of mac80211_hw_config(OFFCHANNEL) must guarantee that no >> ani-work is running. Is that intentional? I cannot see any of those >> functions calling into ath9k_htc_stop_ani(). This might of course be >> implicit. >> >> One call-path I see is: >> ieee80211_scan_cancel() >> cancel_delayed_work() >> >> We cannot use cancel_delayed_work_sync() here due to locking issues. >> However, this obviously races against any following >> set_channel(OFFCHANNEL) request. >> >> If there's anything I can do to debug this, let me know. I tried >> adding some printk()'s into the hot-path and it turns out to no longer >> fail then. So this really seems to be a quite small race (given that a >> bunch of simple printk()s is slow enough to work around it). > > David, > > Are you using USB devices? What is the PID/VID? I have not looked at > HTC driver perspective. Yes, I'm using the htc driver. I thought that was clear by pointing at ar9002, but I was wrong, sorry. lsusb information is: 'Bus 003 Device 056: ID 0411:017f BUFFALO INC. (formerly MelCo., Inc.) Sony UWA-BR100 802.11abgn Wireless Adapter [Atheros AR7010+AR9280]' Thanks David -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/wireless/ath/ath9k/ar9002_calib.c b/drivers/net/wireless/ath/ath9k/ar9002_calib.c index cdc7400..4667ab9 100644 --- a/drivers/net/wireless/ath/ath9k/ar9002_calib.c +++ b/drivers/net/wireless/ath/ath9k/ar9002_calib.c @@ -99,14 +99,16 @@ static bool ar9002_hw_per_calibration(struct ath_hw *ah, } currCal->calData->calPostProc(ah, numChains); - caldata->CalValid |= currCal->calData->calType; + if (caldata) + caldata->CalValid |= currCal->calData->calType; + currCal->calState = CAL_DONE; iscaldone = true; } else { ar9002_hw_setup_calibration(ah, currCal); } } - } else if (!(caldata->CalValid & currCal->calData->calType)) { + } else if (caldata && !(caldata->CalValid & currCal->calData->calType)) { ath9k_hw_reset_calibration(ah, currCal); }
ah->caldata may be NULL if no channel is selected. Check for that before accessing it. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> --- Hi This is _definitely_ only a workaround, given that no-one guarantees ah->caldata is freed while we run in hw_per_calibration(). However, this patch fixes serious kernel panics with wifi-P2P on my machine. I'm not sure why ah->caldata can be NULL, but it definitely is. I think the correct fix would be to synchronously stop any running hw-calibration before setting ah->caldata to NULL. I don't know whether/where that is done, so I wrote this small workaround. Thanks David drivers/net/wireless/ath/ath9k/ar9002_calib.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)