[RFC] ath9k: Work around complete stuck of hw

Message ID 5053160B.8060908@openwrt.org (mailing list archive)
State Not Applicable, archived

Commit Message

Felix Fietkau Sept. 14, 2012, 11:33 a.m. UTC
On 2012-09-14 11:47 AM, Sven Eckelmann wrote:
> AR9330, and most likely other chips such as AR9285, seem to get completely
> stuck after running for a long time in certain environments. It is
> currently unknown which parameters cause this problem.
> 
> The symptom of this hang is that 0xdeadbeef shows up in various hardware
> registers. Taking the interface down and up again seems to help the hardware
> recover from the problem.
> 
> A workaround is to periodically test register AR_CFG for 0xdeadbeef and force
> a reset whenever that value is not expected there.
> 
> Signed-off-by: Sven Eckelmann <sven@narfation.org>
> Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
> ---
> This check is currently being tested. That takes quite a long time, so maybe
> someone with more knowledge of Atheros devices can check whether it is
> completely and utterly wrong.
> 
> The type RESET_TYPE_FATAL_INT was chosen in this test so that we can see
> whether this condition has already occurred by reading from
> /sys/kernel/debug/ieee80211/phy0/ath9k/reset
Your debug patch should not be silent when it resets the hw. We need 
to make sure that this bug gets fixed properly. If my patch below does
not fix it, then at least add a WARN_ON to ensure that we don't just
hide the bug and move on.
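
A minimal user-space sketch of a check that is not silent, for illustration only. It assumes a hypothetical register accessor (`reg_read_fn`) and uses a message on stderr in place of the kernel's WARN_ON; none of these names come from ath9k.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Marker value that the hung chip exposes through its registers. */
#define HW_STUCK_MAGIC 0xdeadbeefu

/* Hypothetical accessor; a real driver would read AR_CFG with its own helpers. */
typedef uint32_t (*reg_read_fn)(void);

/*
 * Returns true when the register reads back the magic value. The warning
 * makes sure a forced reset cannot go unnoticed, which is the role WARN_ON
 * would play in the kernel.
 */
static bool hw_looks_stuck(reg_read_fn read_cfg)
{
	assert(read_cfg != NULL);

	if (read_cfg() != HW_STUCK_MAGIC)
		return false;

	fprintf(stderr, "WARNING: register reads 0xdeadbeef, hw appears stuck\n");
	return true;
}

/* Two fake registers for exercising the check without hardware. */
static uint32_t fake_cfg_healthy(void) { return 0x00000100u; }
static uint32_t fake_cfg_stuck(void) { return HW_STUCK_MAGIC; }
```

A caller would run `hw_looks_stuck()` periodically and schedule a reset only when it returns true, so every reset is preceded by a visible warning.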

Somebody on the openwrt-devel list pointed out that there is some code 
missing in the ar933x wmac reset function. Please try this patch (apply 
it to your kernel tree):
---


Comments

Sven Eckelmann Sept. 23, 2012, 10:04 a.m. UTC | #1
On Friday 14 September 2012 13:33:31 Felix Fietkau wrote:
[..]
> > ---
> > This check is currently being tested. That takes quite a long time, so
> > maybe someone with more knowledge of Atheros devices can check whether it
> > is completely and utterly wrong.
> > 
> > The type RESET_TYPE_FATAL_INT was chosen in this test so that we can see
> > whether this condition has already occurred by reading from
> > /sys/kernel/debug/ieee80211/phy0/ath9k/reset
> 
> Your debug patch should not be silent when it resets the hw. We need
> to make sure that this bug gets fixed properly. If my patch below does
> not fix it, then at least add a WARN_ON to ensure that we don't just
> hide the bug and move on.

It is not silent for me, and it is not a patch that should be applied to the 
upstream kernel. I just wanted to ask whether this is the right place and 
approach.

> Somebody on the openwrt-devel list pointed out that there is some code
> missing in the ar933x wmac reset function. Please try this patch (apply
> it to your kernel tree):

Neither my patch nor yours has any effect. One node is dead again after ~8 
1/2 days. It looks like the check isn't being done (correctly). I will 
investigate this further tomorrow.

Kind regards,
	Sven

Patch

--- a/arch/mips/ath79/dev-wmac.c
+++ b/arch/mips/ath79/dev-wmac.c
@@ -67,10 +67,27 @@  static void __init ar913x_wmac_setup(voi
 
 static int ar933x_wmac_reset(void)
 {
+	int retries = 20;
+
 	ath79_device_reset_set(AR933X_RESET_WMAC);
 	ath79_device_reset_clear(AR933X_RESET_WMAC);
 
-	return 0;
+	/* Poll until the EEPROM busy flag clears, for at most 20 retries. */
+	while (1) {
+		u32 bootstrap;
+
+		bootstrap = ath79_reset_rr(AR933X_RESET_REG_BOOTSTRAP);
+		if ((bootstrap & AR933X_BOOTSTRAP_EEPBUSY) == 0)
+			return 0;
+
+		if (retries-- == 0)
+			break;
+
+		udelay(10000);
+	}
+
+	pr_err("ar933x: WMAC reset timed out\n");
+	return -ETIMEDOUT;
 }
 
 static int ar933x_r1_get_wmac_revision(void)
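
The bounded-retry polling used in the reset function can be tried outside the kernel. The following is a minimal sketch under stated assumptions: `busy_fn`, `wait_until_idle` and `fake_busy` are hypothetical names invented for this example, and the delay between polls is left out. The point it shows is that the retry counter must only ever decrease, otherwise the timeout path is never reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true while the device still reports busy. */
typedef bool (*busy_fn)(void *ctx);

/*
 * Poll until the device is no longer busy, checking at most
 * max_retries + 1 times. Returns 0 on success and -1 on timeout.
 * The counter is only decremented, so the loop always terminates.
 */
static int wait_until_idle(busy_fn is_busy, void *ctx, int max_retries)
{
	int retries = max_retries;

	assert(is_busy != NULL);
	assert(max_retries >= 0);

	for (;;) {
		if (!is_busy(ctx))
			return 0;
		if (retries-- == 0)
			return -1;
		/* A driver would delay here before polling again. */
	}
}

/* Fake device: reports busy for *ctx more polls, then idle. */
static bool fake_busy(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return false;
	(*remaining)--;
	return true;
}
```

With `max_retries` set to 20, a device that clears its busy flag after 3 polls returns 0, while one that stays busy for 100 polls is given up on after 21 checks.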