[iwl4965] Microcode SW error detected

Message ID 20110607145435.GA5179@redhat.com (mailing list archive)
State Not Applicable, archived

Commit Message

Stanislaw Gruszka June 7, 2011, 2:54 p.m. UTC
On Tue, Jun 07, 2011 at 08:32:48AM +0200, Bernhard Schmidt wrote:
> On Monday, May 23, 2011 13:45:06 Paul Bolle wrote:
> > 0) Since I started running (release candidates of) kernel v2.6.39 errors
> > like these show up in my log, every now and then:
> > 
> > iwl4965 0000:03:00.0: Microcode SW error detected.  Restarting 0x82000000.
> > iwl4965 0000:03:00.0: Loaded firmware version: 228.61.2.24
> > iwl4965 0000:03:00.0: Start IWL Error Log Dump:
> > iwl4965 0000:03:00.0: Status: 0x000213E4, count: 5
> > iwl4965 0000:03:00.0: Desc                                  Time       data1      data2      line
> > iwl4965 0000:03:00.0: FH_ERROR                     (0x000C) 1821446380 0x00000008 0x03130000 208 
> 
> On an unrelated side note, I have a case where I can trigger an
> FH_ERROR at this line 100% reliably. Not on Linux though.
> While associated with a 5GHz BSS, doing a scan chan by chan
> (instead of all at once) is enough to trigger it.
That could be a useful hint. We do not scan chan by chan, but we
have a thing called "plcp check health", which "restarts the radio"
by requesting a one-channel scan. So perhaps disabling that could
help.

> A workaround
> is to not send probe requests for 2GHz channels at 1Mbps (CCK
> flag) but at 6Mbps instead.
> 
> Maybe this bug report [1] is related too?
> 
> [1] http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=1965

Hard to tell.

Thanks
Stanislaw

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Paul Bolle June 7, 2011, 7:23 p.m. UTC | #1
On Tue, 2011-06-07 at 16:54 +0200, Stanislaw Gruszka wrote:
> That could be a useful hint. We do not scan chan by chan, but we
> have a thing called "plcp check health", which "restarts the radio"
> by requesting a one-channel scan. So perhaps disabling that could
> help.

At this moment I'm interested in something (a script, some sequence of
actions, whatever) that (somewhat) reliably triggers this error. Because
right now I have no clue what triggers it.

Is your patch in that category or is it a (crude) fix? If it's a fix,
I'm not sure it is of much help at this stage.



Paul Bolle

Stanislaw Gruszka June 8, 2011, 1:47 p.m. UTC | #2
On Tue, Jun 07, 2011 at 09:23:00PM +0200, Paul Bolle wrote:
> On Tue, 2011-06-07 at 16:54 +0200, Stanislaw Gruszka wrote:
> > That could be a useful hint. We do not scan chan by chan, but we
> > have a thing called "plcp check health", which "restarts the radio"
> > by requesting a one-channel scan. So perhaps disabling that could
> > help.
> 
> At this moment I'm interested in something (a script, some sequence of
> actions, whatever) that (somewhat) reliably triggers this error. Because
> right now I have no clue what triggers it.

Having a reliable reproducer would definitely be nice. But the bug
could be some kind of race condition that happens in the code flow
once per 10000000000 cases ...

> Is your patch in that category or is it a (crude) fix? If it's a fix,
> I'm not sure it is of much help at this stage.

It could be a possible fix. Why can't you simply apply the patch and see
if the errors are still there? If after a week or so there are no errors,
we could consider the bug fixed; otherwise, well ... we will still need
to look around for a fix.

I just posted a patch that removes this "plcp health check" and related
code on -next anyway, because I don't think this is something that we
need.

Stanislaw
Paul Bolle Aug. 15, 2011, 10:51 a.m. UTC | #3
On Wed, 2011-06-08 at 15:47 +0200, Stanislaw Gruszka wrote:
> It could be a possible fix. Why can't you simply apply the patch and see
> if the errors are still there? If after a week or so there are no errors,
> we could consider the bug fixed; otherwise, well ... we will still need
> to look around for a fix.
> 
> I just posted a patch that removes this "plcp health check" and related
> code on -next anyway, because I don't think this is something that we
> need.

0) This is just to note that I haven't yet tried to see if your small
patch helps. I still hope to do that as I have not given up on this
issue. Feel free to prod me if I again disappear for too long and you
lose your patience.

1) By the way, I still see this error (every now and then) in my logs.
Most recently while running v3.0.1, so it appears not to be fixed by
recent updates for iwl4965 (if any).


Paul Bolle

Paul Bolle Sept. 4, 2011, 8:28 a.m. UTC | #4
On Tue, 2011-06-07 at 16:54 +0200, Stanislaw Gruszka wrote:
> That could be a useful hint. We do not scan chan by chan, but we
> have a thing called "plcp check health", which "restarts the radio"
> by requesting a one-channel scan. So perhaps disabling that could
> help.
> 
> [...]
>
> diff --git a/drivers/net/wireless/iwlegacy/iwl-rx.c b/drivers/net/wireless/iwlegacy/iwl-rx.c
> index 654cf23..6062da0 100644
> --- a/drivers/net/wireless/iwlegacy/iwl-rx.c
> +++ b/drivers/net/wireless/iwlegacy/iwl-rx.c
> @@ -230,6 +230,8 @@ EXPORT_SYMBOL(iwl_legacy_rx_spectrum_measure_notif);
>  void iwl_legacy_recover_from_statistics(struct iwl_priv *priv,
>  				struct iwl_rx_packet *pkt)
>  {
> +	return;
> +
>  	if (test_bit(STATUS_EXIT_PENDING, &priv->status))
>  		return;
>  	if (iwl_legacy_is_any_associated(priv)) {

0) I finally got around to applying this patch (to v3.0.4).

1) After a few days of normal usage (with quite a bit of suspend and
resume cycles) this error was again triggered. So avoiding
check_plcp_health() doesn't seem to help.

2) I never sent you the debug output (ie, output after doing "modprobe
iwl4965 debug=0x47ffffff"), did I?


Paul Bolle

Stanislaw Gruszka Sept. 5, 2011, 9:33 a.m. UTC | #5
On Sun, Sep 04, 2011 at 10:28:35AM +0200, Paul Bolle wrote:
> On Tue, 2011-06-07 at 16:54 +0200, Stanislaw Gruszka wrote:
> > That could be a useful hint. We do not scan chan by chan, but we
> > have a thing called "plcp check health", which "restarts the radio"
> > by requesting a one-channel scan. So perhaps disabling that could
> > help.
> > 
> > [...]
> >
> > diff --git a/drivers/net/wireless/iwlegacy/iwl-rx.c b/drivers/net/wireless/iwlegacy/iwl-rx.c
> > index 654cf23..6062da0 100644
> > --- a/drivers/net/wireless/iwlegacy/iwl-rx.c
> > +++ b/drivers/net/wireless/iwlegacy/iwl-rx.c
> > @@ -230,6 +230,8 @@ EXPORT_SYMBOL(iwl_legacy_rx_spectrum_measure_notif);
> >  void iwl_legacy_recover_from_statistics(struct iwl_priv *priv,
> >  				struct iwl_rx_packet *pkt)
> >  {
> > +	return;
> > +
> >  	if (test_bit(STATUS_EXIT_PENDING, &priv->status))
> >  		return;
> >  	if (iwl_legacy_is_any_associated(priv)) {
> 
> 0) I finally got around to applying this patch (to v3.0.4).
> 
> 1) After a few days of normal usage (with quite a bit of suspend and
> resume cycles) this error was again triggered. So avoiding
> check_plcp_health() doesn't seem to help.
> 
> 2) I never sent you the debug output (ie, output after doing "modprobe
> iwl4965 debug=0x47ffffff"), did I?

No, but if the error shows up after a few days, gathering and analyzing a few
days of debug logs is impractical. Does wifi stop working after an
error, or is there some other negative impact? Or are messages just
printed and the driver recovers by itself?

Stanislaw
Paul Bolle Sept. 5, 2011, 10:32 a.m. UTC | #6
On Mon, 2011-09-05 at 11:33 +0200, Stanislaw Gruszka wrote:
> On Sun, Sep 04, 2011 at 10:28:35AM +0200, Paul Bolle wrote:
> > 1) After a few days of normal usage (with quite a bit of suspend and
> > resume cycles) this error was again triggered. So avoiding
> > check_plcp_health() doesn't seem to help.
> > 
> > 2) I never sent you the debug output (ie, output after doing "modprobe
> > iwl4965 debug=0x47ffffff"), did I?
> 
> No, but if the error shows up after a few days, gathering and analyzing a few
> days of debug logs is impractical.

I see.

> Does wifi stop working after an
> error, or is there some other negative impact? Or are messages just
> printed and the driver recovers by itself?

There doesn't seem to be any impact (ie, it might have some impact but
I'm too insensitive to notice). The driver does recover itself and I do
not have to mess with rfkill or "modprobe -r" or whatever. I actually
discovered this because I tend to regularly do
    dmesg -r |  grep "^<[123]>"

to keep myself informed of any kernel errors (or worse). And then these
few dozen lines can't go unnoticed.


Paul Bolle
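
For anyone who wants to run the same check from a script, here is a minimal Python sketch of the filter that the grep pattern above applies. It assumes each line starts with the raw "<N>" priority prefix that "dmesg -r" prints; the function name and the sample lines are hypothetical and only there for illustration.

```python
import re

# Same pattern as the grep in the message above: a raw priority prefix of
# <1> (alert), <2> (crit) or <3> (err) at the very start of the line.
PRIORITY_RE = re.compile(r"^<[123]>")


def filter_errors(lines):
    """Return only the lines whose raw priority prefix is <1>, <2> or <3>."""
    return [line for line in lines if PRIORITY_RE.match(line)]


# Hypothetical sample input (made up, apart from the iwl4965 message text,
# which is taken from the log excerpt at the top of this thread).
sample = [
    "<6>[   10.000000] usb 1-1: new high-speed USB device",
    "<3>[   12.000000] iwl4965 0000:03:00.0: Microcode SW error detected.  Restarting 0x82000000.",
    "<4>[   13.000000] some warning",
]
errors = filter_errors(sample)
for line in errors:
    print(line)
```

Like the grep, this only matches a single-digit prefix, so a line starting with something like "<12>" is skipped.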

Paul Bolle Sept. 14, 2012, 12:17 p.m. UTC | #7
On Mon, 2011-09-05 at 12:32 +0200, Paul Bolle wrote:
> On Mon, 2011-09-05 at 11:33 +0200, Stanislaw Gruszka wrote:
> > Does wifi stop working after an
> > error, or is there some other negative impact? Or are messages just
> > printed and the driver recovers by itself?
> 
> There doesn't seem to be any impact (ie, it might have some impact but
> I'm too insensitive to notice). The driver does recover itself and I do
> not have to mess with rfkill or "modprobe -r" or whatever. I actually
> discovered this because I tend to regularly do
>     dmesg -r |  grep "^<[123]>"
> 
> to keep myself informed of any kernel errors (or worse). And then these
> few dozen lines can't go unnoticed.

0) It's one year later now and this Microcode SW error again showed up
in the logs. I recently upgraded and I haven't kept any logs, but my
guess would be that I have run into that error once every week. (This
laptop is now running a v3.5.3 based kernel as shipped for Fedora 17.)

1) Would you have any suggestions how to pinpoint the cause of this
error? It is mainly annoying, and I managed to ignore it since my
previous message, but I still would like to free the logs from the noise
it makes.


Paul Bolle
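
If logs are kept, the "once every week" guess could be checked by counting the errors. The Python sketch below is one possible way to do that, not something proposed in this thread. It counts the "Microcode SW error detected" lines and pulls the fields out of the error log entries, assuming the messages keep the layout of the dump quoted at the start of the thread (Desc, code, Time, data1, data2, line). The function and variable names are hypothetical.

```python
import re

RESTART_MARKER = "Microcode SW error detected"

# One error log entry, e.g.
#   FH_ERROR   (0x000C) 1821446380 0x00000008 0x03130000 208
# Assumption: the column layout matches the dump quoted in this thread.
ENTRY_RE = re.compile(
    r"(\w+)\s+\((0x[0-9A-Fa-f]+)\)\s+(\d+)\s+"
    r"(0x[0-9A-Fa-f]+)\s+(0x[0-9A-Fa-f]+)\s+(\d+)"
)


def count_restarts(lines):
    """Count how many times the firmware error was reported."""
    return sum(1 for line in lines if RESTART_MARKER in line)


def parse_entries(lines):
    """Return one dict per error log entry found in the lines."""
    entries = []
    for line in lines:
        m = ENTRY_RE.search(line)
        if m:
            desc, code, time, data1, data2, src_line = m.groups()
            entries.append({
                "desc": desc,
                "code": int(code, 16),
                "time": int(time),
                "data1": int(data1, 16),
                "data2": int(data2, 16),
                "line": int(src_line),
            })
    return entries


# Sample lines copied from the log excerpt in the first message.
sample = [
    "iwl4965 0000:03:00.0: Microcode SW error detected.  Restarting 0x82000000.",
    "iwl4965 0000:03:00.0: Loaded firmware version: 228.61.2.24",
    "iwl4965 0000:03:00.0: Status: 0x000213E4, count: 5",
    "iwl4965 0000:03:00.0: FH_ERROR                     (0x000C) 1821446380 0x00000008 0x03130000 208 ",
]
restarts = count_restarts(sample)
entries = parse_entries(sample)
print(restarts, entries)
```

Comparing the "desc" and "line" fields across occurrences would show whether it is always the same FH_ERROR at line 208 or a mix of different errors.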

Paul Bolle Oct. 15, 2012, 2:51 p.m. UTC | #8
On Fri, 2012-09-14 at 14:17 +0200, Paul Bolle wrote:
> 0) It's one year later now and this Microcode SW error again showed up
> in the logs. I recently upgraded and I haven't kept any logs, but my
> guess would be that I have run into that error once every week. (This
> laptop is now running a v3.5.3 based kernel as shipped for Fedora 17.)
> 
> 1) Would you have any suggestions how to pinpoint the cause of this
> error? It is mainly annoying, and I managed to ignore it since my
> previous message, but I still would like to free the logs from the noise
> it makes.

0) I ported the "iwlegacy_tracing" patch from
https://bugzilla.kernel.org/show_bug.cgi?id=42766 to v3.6-rc7 and to
iwl4965. I've been running iwl4965 with tracing enabled ever since (that
is on: v3.6-rc7, v3.6, v3.6.1, and v3.6.2). Finally, after only three
weeks I hit our Microcode SW error again.

1) So now I've got a 600+k line (or 65 MB) trace dump. What should I do
with it?


Paul Bolle

Stanislaw Gruszka Oct. 15, 2012, 3:17 p.m. UTC | #9
On Mon, Oct 15, 2012 at 04:51:00PM +0200, Paul Bolle wrote:
> On Fri, 2012-09-14 at 14:17 +0200, Paul Bolle wrote:
> > 0) It's one year later now and this Microcode SW error again showed up
> > in the logs. I recently upgraded and I haven't kept any logs, but my
> > guess would be that I have run into that error once every week. (This
> > laptop is now running a v3.5.3 based kernel as shipped for Fedora 17.)
> > 
> > 1) Would you have any suggestions how to pinpoint the cause of this
> > error? It is mainly annoying, and I managed to ignore it since my
> > previous message, but I still would like to free the logs from the noise
> > it makes.
> 
> 0) I ported the "iwlegacy_tracing" patch from
> https://bugzilla.kernel.org/show_bug.cgi?id=42766 to v3.6-rc7 and to
> iwl4965. I've been running iwl4965 with tracing enabled ever since (that
> is on: v3.6-rc7, v3.6, v3.6.1, and v3.6.2). Finally, after only three
> weeks I hit our Microcode SW error again.
> 
> 1) So now I've got a 600+k line (or 65 MB) trace dump. What should I do
> with it?

Just send me privately, let's say, the last 10MB of it...

Stanislaw
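
Cutting the last 10MB off a large trace dump can be sketched in Python as below. The helper name is hypothetical, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption. The cut is made at a byte offset, so the first line of the result may be partial.

```python
import os
import tempfile


def tail_bytes(path, size):
    """Return the last `size` bytes of the file (the whole file if it is smaller)."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        total = f.tell()
        f.seek(max(0, total - size))
        return f.read()


# Small self-check; a temporary file stands in for the real trace dump.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    demo_path = tmp.name

last_four = tail_bytes(demo_path, 4)    # only the end of the file
whole = tail_bytes(demo_path, 100)      # request larger than the file
os.remove(demo_path)
print(last_four, whole)
```

For the real dump the call would be something like tail_bytes(path_to_dump, 10 * 1024 * 1024), with the result written to a new file in binary mode.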

Patch

diff --git a/drivers/net/wireless/iwlegacy/iwl-rx.c b/drivers/net/wireless/iwlegacy/iwl-rx.c
index 654cf23..6062da0 100644
--- a/drivers/net/wireless/iwlegacy/iwl-rx.c
+++ b/drivers/net/wireless/iwlegacy/iwl-rx.c
@@ -230,6 +230,8 @@  EXPORT_SYMBOL(iwl_legacy_rx_spectrum_measure_notif);
 void iwl_legacy_recover_from_statistics(struct iwl_priv *priv,
 				struct iwl_rx_packet *pkt)
 {
+	return;
+
 	if (test_bit(STATUS_EXIT_PENDING, &priv->status))
 		return;
 	if (iwl_legacy_is_any_associated(priv)) {