diff mbox

wl1271: tx stuck in firmware

Message ID 1397047312.4418.68.camel@dubbel (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Luca Coelho April 9, 2014, 12:41 p.m. UTC
On Wed, 2014-04-09 at 15:20 +0300, Arik Nemtsov wrote:
> >>
> >> # calibrator wlan0 plt power_mode on
> >> [   93.107215] wlcore: power up
> >> [   93.688714] wlcore: firmware booted in PLT mode PLT_ON (PLT 6.3.10.0.133)
> >> # [  139.232802] YY: wl1271_tx_allocate
> >> [  144.241849] wlcore: ERROR Tx stuck (in FW) for 5000 ms. Starting recovery
> >
> > Okay, you have a starting point. :)
> >
> > Now try to look from where this is being called and so on.  I don't
> > think we should be calling wl1271_tx_allocate() in PLT mode either.  Try
> > tracing back until you find where this is coming from.
> 
> Actually we've seen this bug before. At the time it was caused by FW
> requests for dummy packets during PLT mode, which forced us to issue
> Tx..

Ah, interesting, I hadn't heard about it.  Would it send it even if the
event was masked out? Masking it out (if it's not already) for PLT would
probably be the right thing to do.


> It was fixed in the FW, but I'm not sure this was ever upstreamed
> (customer specific fix, etc). Anyway it's pretty safe to disable the
> Tx watchdog during PLT - try something like this:
> 
> void wl12xx_rearm_tx_watchdog_locked(struct wl1271 *wl)
> {
>      if (wl->plt)
>          return;

Maybe this could be blocked even earlier, like here:


--
Luca.

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Arik Nemtsov April 9, 2014, 1:07 p.m. UTC | #1
On Wed, Apr 9, 2014 at 3:41 PM, Luca Coelho <luca@coelho.fi> wrote:
>> Actually we've seen this bug before. At the time it was caused by FW
>> requests for dummy packets during PLT mode, which forced us to issue
>> Tx..
>
> Ah, interesting, I hadn't heard about it.  Would it send it even if the
> event was masked out? Masking it out (if it's not already) for PLT would
> probably be the right thing to do.
>
>
>> It was fixed in the FW, but I'm not sure this was ever upstreamed
>> (customer specific fix, etc). Anyway it's pretty safe to disable the
>> Tx watchdog during PLT - try something like this:
>>
>> void wl12xx_rearm_tx_watchdog_locked(struct wl1271 *wl)
>> {
>>      if (wl->plt)
>>          return;
>
> Maybe this could be blocked even earlier, like here:
>
> diff --git a/drivers/net/wireless/ti/wlcore/event.c b/drivers/net/wireless/ti/wlcore/event.c
> index 1f9a360..c44d3c0 100644
> --- a/drivers/net/wireless/ti/wlcore/event.c
> +++ b/drivers/net/wireless/ti/wlcore/event.c
> @@ -158,6 +158,9 @@ EXPORT_SYMBOL_GPL(wlcore_event_channel_switch);
>
>  void wlcore_event_dummy_packet(struct wl1271 *wl)
>  {
> +       if (wl->plt)
> +               return;
> +
>         wl1271_debug(DEBUG_EVENT, "DUMMY_PACKET_ID_EVENT_ID");
>         wl1271_tx_dummy_packet(wl);
>  }
>
> --
> Luca.
>

These are good suggestions as well.

Arik
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yegor Yefremov April 9, 2014, 1:20 p.m. UTC | #2
On Wed, Apr 9, 2014 at 3:07 PM, Arik Nemtsov <arik@wizery.com> wrote:
> On Wed, Apr 9, 2014 at 3:41 PM, Luca Coelho <luca@coelho.fi> wrote:
>>> Actually we've seen this bug before. At the time it was caused by FW
>>> requests for dummy packets during PLT mode, which forced us to issue
>>> Tx..
>>
>> Ah, interesting, I hadn't heard about it.  Would it send it even if the
>> event was masked out? Masking it out (if it's not already) for PLT would
>> probably be the right thing to do.
>>
>>
>>> It was fixed in the FW, but I'm not sure this was ever upstreamed
>>> (customer specific fix, etc). Anyway it's pretty safe to disable the
>>> Tx watchdog during PLT - try something like this:
>>>
>>> void wl12xx_rearm_tx_watchdog_locked(struct wl1271 *wl)
>>> {
>>>      if (wl->plt)
>>>          return;
>>
>> Maybe this could be blocked even earlier, like here:
>>
>> diff --git a/drivers/net/wireless/ti/wlcore/event.c b/drivers/net/wireless/ti/wlcore/event.c
>> index 1f9a360..c44d3c0 100644
>> --- a/drivers/net/wireless/ti/wlcore/event.c
>> +++ b/drivers/net/wireless/ti/wlcore/event.c
>> @@ -158,6 +158,9 @@ EXPORT_SYMBOL_GPL(wlcore_event_channel_switch);
>>
>>  void wlcore_event_dummy_packet(struct wl1271 *wl)
>>  {
>> +       if (wl->plt)
>> +               return;
>> +
>>         wl1271_debug(DEBUG_EVENT, "DUMMY_PACKET_ID_EVENT_ID");
>>         wl1271_tx_dummy_packet(wl);
>>  }
>>
>> --
>> Luca.
>>
>
> These are good suggestions as well.

Both changes helped a lot. At least one can make a lot of tests
without a crash, though I had one

# calibrator wlan0 plt tune_channel 0 1
# calibrator wlan0 plt tx_tone 1 9000
[  180.041505] wlcore: ERROR command complete timeout
[  180.046622] ------------[ cut here ]------------
[  180.051756] WARNING: CPU: 0 PID: 1072 at
drivers/net/wireless/ti/wlcore/main.c:803
wl12xx_queue_recovery_work+0x60/0x6c()
[  180.063451] Modules linked in:
[  180.066734] CPU: 0 PID: 1072 Comm: calibrator Not tainted
3.14.0-rc4-12739-g17534e5-dirty #251
[  180.075991] [<c00151e0>] (unwind_backtrace) from [<c0011e38>]
(show_stack+0x10/0x14)
[  180.084311] [<c0011e38>] (show_stack) from [<c05c57d4>]
(dump_stack+0x7c/0x94)
[  180.092062] [<c05c57d4>] (dump_stack) from [<c0040274>]
(warn_slowpath_common+0x6c/0x90)
[  180.100721] [<c0040274>] (warn_slowpath_common) from [<c0040334>]
(warn_slowpath_null+0x1c/0x24)
[  180.110028] [<c0040334>] (warn_slowpath_null) from [<c038de68>]
(wl12xx_queue_recovery_work+0x60/0x6c)
[  180.119976] [<c038de68>] (wl12xx_queue_recovery_work) from
[<c0390710>] (wlcore_cmd_send_failsafe+0x74/0x420)
[  180.130596] [<c0390710>] (wlcore_cmd_send_failsafe) from
[<c0390b30>] (wl1271_cmd_test+0x4c/0x9c)
[  180.140005] [<c0390b30>] (wl1271_cmd_test) from [<c03a0184>]
(wl1271_tm_cmd+0x370/0x5ac)
[  180.148740] [<c03a0184>] (wl1271_tm_cmd) from [<c055a710>]
(nl80211_testmode_do+0x94/0x148)
[  180.157698] [<c055a710>] (nl80211_testmode_do) from [<c0496434>]
(genl_rcv_msg+0x224/0x390)
[  180.166635] [<c0496434>] (genl_rcv_msg) from [<c0495a1c>]
(netlink_rcv_skb+0xc4/0xd8)
[  180.175015] [<c0495a1c>] (netlink_rcv_skb) from [<c04961fc>]
(genl_rcv+0x20/0x34)
[  180.183023] [<c04961fc>] (genl_rcv) from [<c0495364>]
(netlink_unicast+0x15c/0x22c)
[  180.191219] [<c0495364>] (netlink_unicast) from [<c0495794>]
(netlink_sendmsg+0x2bc/0x37c)
[  180.199974] [<c0495794>] (netlink_sendmsg) from [<c045f950>]
(sock_sendmsg+0x84/0xa8)
[  180.208360] [<c045f950>] (sock_sendmsg) from [<c0460ea8>]
(___sys_sendmsg.part.28+0x268/0x278)
[  180.217568] [<c0460ea8>] (___sys_sendmsg.part.28) from [<c0461f10>]
(__sys_sendmsg+0x4c/0x7c)
[  180.226686] [<c0461f10>] (__sys_sendmsg) from [<c000e2c0>]
(ret_fast_syscall+0x0/0x48)
[  180.235143] ---[ end trace c13b2d4f8f45be2b ]---
[  180.241635] wlcore: WARNING TEST command failed
[  180.246453] wlcore: WARNING testmode cmd test failed: -110
command failed: Connection timed out (-110)

But after reboot I could successfully invoke calibrator wlan0 plt tx_tone 1 9000
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arik Nemtsov April 9, 2014, 1:25 p.m. UTC | #3
On Wed, Apr 9, 2014 at 4:20 PM, Yegor Yefremov
<yegorslists@googlemail.com> wrote:
> On Wed, Apr 9, 2014 at 3:07 PM, Arik Nemtsov <arik@wizery.com> wrote:
> Both changes helped a lot. At least one can make a lot of tests
> without a crash, though I had one
>
> # calibrator wlan0 plt tune_channel 0 1
> # calibrator wlan0 plt tx_tone 1 9000
> [  180.041505] wlcore: ERROR command complete timeout

It's a FW error.
Not sure which change you tried, but if it was one of Luca's, I guess
the FW doesn't like it when you ignore its requests for a dummy
packet. Did you try my change with just disabling the re-arm?

Arik
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yegor Yefremov April 9, 2014, 3:28 p.m. UTC | #4
On Wed, Apr 9, 2014 at 3:25 PM, Arik Nemtsov <arik@wizery.com> wrote:
> On Wed, Apr 9, 2014 at 4:20 PM, Yegor Yefremov
> <yegorslists@googlemail.com> wrote:
>> On Wed, Apr 9, 2014 at 3:07 PM, Arik Nemtsov <arik@wizery.com> wrote:
>> Both changes helped a lot. At least one can make a lot of tests
>> without a crash, though I had one
>>
>> # calibrator wlan0 plt tune_channel 0 1
>> # calibrator wlan0 plt tx_tone 1 9000
>> [  180.041505] wlcore: ERROR command complete timeout
>
> It's a FW error.
> Not sure which change you tried, but if it was one of Luca's, I guess
> the FW doesn't like it when you ignore its requests for a dummy
> packet. Did you try my change with just disabling the re-arm?

I have tried both changes. At first disabling the re-arm, it comes
with the same crash as when I implement both changes at once.

Yegor
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yegor Yefremov April 10, 2014, 7:21 a.m. UTC | #5
On Wed, Apr 9, 2014 at 5:28 PM, Yegor Yefremov
<yegorslists@googlemail.com> wrote:
> On Wed, Apr 9, 2014 at 3:25 PM, Arik Nemtsov <arik@wizery.com> wrote:
>> On Wed, Apr 9, 2014 at 4:20 PM, Yegor Yefremov
>> <yegorslists@googlemail.com> wrote:
>>> On Wed, Apr 9, 2014 at 3:07 PM, Arik Nemtsov <arik@wizery.com> wrote:
>>> Both changes helped a lot. At least one can make a lot of tests
>>> without a crash, though I had one
>>>
>>> # calibrator wlan0 plt tune_channel 0 1
>>> # calibrator wlan0 plt tx_tone 1 9000
>>> [  180.041505] wlcore: ERROR command complete timeout
>>
>> It's a FW error.
>> Not sure which change you tried, but if it was one of Luca's, I guess
>> the FW doesn't like it when you ignore its requests for a dummy
>> packet. Did you try my change with just disabling the re-arm?
>
> I have tried both changes. At first disabling the re-arm, it comes
> with the same crash as when I implement both changes at once.

What to do with usual fw and scanning?

# [  165.608684] YY: wl1271_scan_complete_work
[  165.614218] wlcore: Scan completed due to error.
[  165.619356] ------------[ cut here ]------------
[  165.624315] WARNING: CPU: 0 PID: 761 at
drivers/net/wireless/ti/wlcore/main.c:803
wl12xx_queue_recovery_work+0x60/0x6c()
[  165.635889] Modules linked in:
[  165.639335] CPU: 0 PID: 761 Comm: kworker/u2:2 Not tainted
3.14.0-rc4-12738-g674748b-dirty #253
[  165.648663] Workqueue: phy0 wl1271_scan_complete_work
[  165.654087] [<c00151e0>] (unwind_backtrace) from [<c0011e38>]
(show_stack+0x10/0x14)
[  165.662412] [<c0011e38>] (show_stack) from [<c05c5794>]
(dump_stack+0x7c/0x94)
[  165.670162] [<c05c5794>] (dump_stack) from [<c0040274>]
(warn_slowpath_common+0x6c/0x90)
[  165.678819] [<c0040274>] (warn_slowpath_common) from [<c0040334>]
(warn_slowpath_null+0x1c/0x24)
[  165.688213] [<c0040334>] (warn_slowpath_null) from [<c038de34>]
(wl12xx_queue_recovery_work+0x60/0x6c)
[  165.698165] [<c038de34>] (wl12xx_queue_recovery_work) from
[<c039f8d8>] (wl1271_scan_complete_work+0xf4/0x108)
[  165.708911] [<c039f8d8>] (wl1271_scan_complete_work) from
[<c0058fac>] (process_one_work+0x1ac/0x4c4)
[  165.718772] [<c0058fac>] (process_one_work) from [<c0059e40>]
(worker_thread+0x114/0x3b4)
[  165.727531] [<c0059e40>] (worker_thread) from [<c005f9fc>]
(kthread+0xcc/0xe8)
[  165.735187] [<c005f9fc>] (kthread) from [<c000e388>]
(ret_from_fork+0x14/0x2c)
[  165.742915] ---[ end trace 9c62bb183fd26be3 ]---
[  165.755662] wlcore: Hardware recovery in progress. FW ver: Rev 6.3.10.0.133
[  165.769379] wlcore: pc: 0x0, hint_sts: 0x00000020 count: 1
[  165.778031] wlcore: down
[  165.790946] ieee80211 phy0: Hardware restart was requested
[  166.337122] wlcore: firmware booted (Rev 6.3.10.0.133)
[  188.977108] YY: wl1271_scan_complete_work
[  191.333964] YY: wl1271_scan_complete_work
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arik Nemtsov April 10, 2014, 7:43 a.m. UTC | #6
On Thu, Apr 10, 2014 at 10:21 AM, Yegor Yefremov
<yegorslists@googlemail.com> wrote:
> On Wed, Apr 9, 2014 at 5:28 PM, Yegor Yefremov
> <yegorslists@googlemail.com> wrote:
>> On Wed, Apr 9, 2014 at 3:25 PM, Arik Nemtsov <arik@wizery.com> wrote:
>>> On Wed, Apr 9, 2014 at 4:20 PM, Yegor Yefremov
>>> <yegorslists@googlemail.com> wrote:
>>>> On Wed, Apr 9, 2014 at 3:07 PM, Arik Nemtsov <arik@wizery.com> wrote:
>>>> Both changes helped a lot. At least one can make a lot of tests
>>>> without a crash, though I had one
>>>>
>>>> # calibrator wlan0 plt tune_channel 0 1
>>>> # calibrator wlan0 plt tx_tone 1 9000
>>>> [  180.041505] wlcore: ERROR command complete timeout
>>>
>>> It's a FW error.
>>> Not sure which change you tried, but if it was one of Luca's, I guess
>>> the FW doesn't like it when you ignore its requests for a dummy
>>> packet. Did you try my change with just disabling the re-arm?
>>
>> I have tried both changes. At first disabling the re-arm, it comes
>> with the same crash as when I implement both changes at once.
>
> What to do with usual fw and scanning?
>
> # [  165.608684] YY: wl1271_scan_complete_work
> [  165.614218] wlcore: Scan completed due to error.

Ah you didn't mention you also had problems with a regular FW. It's
probably either an API sync issue between the FW and driver, or a
platform integration issue where you're sometimes not getting
interrupts.

Arik
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/wireless/ti/wlcore/event.c b/drivers/net/wireless/ti/wlcore/event.c
index 1f9a360..c44d3c0 100644
--- a/drivers/net/wireless/ti/wlcore/event.c
+++ b/drivers/net/wireless/ti/wlcore/event.c
@@ -158,6 +158,9 @@  EXPORT_SYMBOL_GPL(wlcore_event_channel_switch);
 
 void wlcore_event_dummy_packet(struct wl1271 *wl)
 {
+       if (wl->plt)
+               return;
+
        wl1271_debug(DEBUG_EVENT, "DUMMY_PACKET_ID_EVENT_ID");
        wl1271_tx_dummy_packet(wl);
 }