Patchwork brcmfmac: sdio: Increase the default timeouts a bit

login
register
mail settings
Submitter Sjoerd Simons
Date Jan. 25, 2016, 10:47 a.m.
Message ID <1453718849-3508-1-git-send-email-sjoerd.simons@collabora.co.uk>
Download mbox | patch
Permalink /patch/8106601/
State New
Headers show

Comments

Sjoerd Simons - Jan. 25, 2016, 10:47 a.m.
On a Radxa Rock2 board with a Ampak AP6335 (Broadcom 4339 core) it seems
the card responds very quickly most of the time, unfortunately during
initialisation it sometimes seems to take just a bit over 2 seconds to
respond.

This results intialization failing with message like:
  brcmf_c_preinit_dcmds: Retreiving cur_etheraddr failed, -52
  brcmf_bus_start: failed: -52
  brcmf_sdio_firmware_callback: dongle is not responding

Increasing the timeout to allow for a bit more headroom allows the
card to initialize reliably.

A quick search online after diagnosing/fixing this showed that Google
has a similar patch in their ChromeOS tree, so this doesn't seem
specific to the board I'm using.

Signed-off-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>

---

 drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Julian Calaby - Jan. 25, 2016, 11:06 a.m.
Hi Sjoerd,

On Mon, Jan 25, 2016 at 9:47 PM, Sjoerd Simons
<sjoerd.simons@collabora.co.uk> wrote:
> On a Radxa Rock2 board with a Ampak AP6335 (Broadcom 4339 core) it seems
> the card responds very quickly most of the time, unfortunately during
> initialisation it sometimes seems to take just a bit over 2 seconds to
> respond.
>
> This results intialization failing with message like:
>   brcmf_c_preinit_dcmds: Retreiving cur_etheraddr failed, -52
>   brcmf_bus_start: failed: -52
>   brcmf_sdio_firmware_callback: dongle is not responding
>
> Increasing the timeout to allow for a bit more headroom allows the
> card to initialize reliably.
>
> A quick search online after diagnosing/fixing this showed that Google
> has a similar patch in their ChromeOS tree, so this doesn't seem
> specific to the board I'm using.
>
> Signed-off-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>

Looks sane to me.

Reviewed-by: Julian Calaby <julian.calaby@gmail.com>

>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> index dd66143..75ac4bd 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> @@ -45,8 +45,8 @@
>  #include "chip.h"
>  #include "firmware.h"
>
> -#define DCMD_RESP_TIMEOUT      msecs_to_jiffies(2000)
> -#define CTL_DONE_TIMEOUT       msecs_to_jiffies(2000)
> +#define DCMD_RESP_TIMEOUT      msecs_to_jiffies(2500)
> +#define CTL_DONE_TIMEOUT       msecs_to_jiffies(2500)
>
>  #ifdef DEBUG
>
> --
> 2.7.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arend van Spriel - Jan. 25, 2016, 3:36 p.m.
On 25-01-16 11:47, Sjoerd Simons wrote:
> On a Radxa Rock2 board with a Ampak AP6335 (Broadcom 4339 core) it seems
> the card responds very quickly most of the time, unfortunately during
> initialisation it sometimes seems to take just a bit over 2 seconds to
> respond.
> 
> This results intialization failing with message like:
>   brcmf_c_preinit_dcmds: Retreiving cur_etheraddr failed, -52
>   brcmf_bus_start: failed: -52
>   brcmf_sdio_firmware_callback: dongle is not responding
> 
> Increasing the timeout to allow for a bit more headroom allows the
> card to initialize reliably.

I would prefer to know where the 2 second response time comes from.
Could be sdio retuning. Maybe the chromeos people can comment whether
this has been root caused.

There is a mmc patch pending in which retuning procedure can be deferred
by the driver. Using that API may resolve the issue as well and I would
prefer that solution.

> A quick search online after diagnosing/fixing this showed that Google
> has a similar patch in their ChromeOS tree, so this doesn't seem
> specific to the board I'm using.

As the retuning stuff is not in main line I guess we need this fix for
now so...

Acked-by: Arend van Spriel <arend@broadcom.com>
> Signed-off-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
> 
> ---
Still would like to know whether it is firmware initialization or some
mmc stack procedure. Any suggestions to debug this are welcome.

Regards,
Arend
---
> 
>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> index dd66143..75ac4bd 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> @@ -45,8 +45,8 @@
>  #include "chip.h"
>  #include "firmware.h"
>  
> -#define DCMD_RESP_TIMEOUT	msecs_to_jiffies(2000)
> -#define CTL_DONE_TIMEOUT	msecs_to_jiffies(2000)
> +#define DCMD_RESP_TIMEOUT	msecs_to_jiffies(2500)
> +#define CTL_DONE_TIMEOUT	msecs_to_jiffies(2500)
>  
>  #ifdef DEBUG
>  
>
Arend van Spriel - Jan. 25, 2016, 3:39 p.m.
On 25-01-16 12:06, Julian Calaby wrote:
> Hi Sjoerd,
> 
> On Mon, Jan 25, 2016 at 9:47 PM, Sjoerd Simons
> <sjoerd.simons@collabora.co.uk> wrote:
>> On a Radxa Rock2 board with a Ampak AP6335 (Broadcom 4339 core) it seems
>> the card responds very quickly most of the time, unfortunately during
>> initialisation it sometimes seems to take just a bit over 2 seconds to
>> respond.
>>
>> This results intialization failing with message like:
>>   brcmf_c_preinit_dcmds: Retreiving cur_etheraddr failed, -52
>>   brcmf_bus_start: failed: -52
>>   brcmf_sdio_firmware_callback: dongle is not responding
>>
>> Increasing the timeout to allow for a bit more headroom allows the
>> card to initialize reliably.
>>
>> A quick search online after diagnosing/fixing this showed that Google
>> has a similar patch in their ChromeOS tree, so this doesn't seem
>> specific to the board I'm using.
>>
>> Signed-off-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
> 
> Looks sane to me.
> 
> Reviewed-by: Julian Calaby <julian.calaby@gmail.com>

Not really a cleanup patch :-p , but thanks for the review.

Regards,
Arend

> 
>>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>> index dd66143..75ac4bd 100644
>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
>> @@ -45,8 +45,8 @@
>>  #include "chip.h"
>>  #include "firmware.h"
>>
>> -#define DCMD_RESP_TIMEOUT      msecs_to_jiffies(2000)
>> -#define CTL_DONE_TIMEOUT       msecs_to_jiffies(2000)
>> +#define DCMD_RESP_TIMEOUT      msecs_to_jiffies(2500)
>> +#define CTL_DONE_TIMEOUT       msecs_to_jiffies(2500)
>>
>>  #ifdef DEBUG
>>
>> --
>> 2.7.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
>
Sjoerd Simons - Jan. 25, 2016, 3:55 p.m.
On Mon, 2016-01-25 at 16:36 +0100, Arend van Spriel wrote:
> On 25-01-16 11:47, Sjoerd Simons wrote:
> > On a Radxa Rock2 board with a Ampak AP6335 (Broadcom 4339 core) it
> > seems
> > the card responds very quickly most of the time, unfortunately
> > during
> > initialisation it sometimes seems to take just a bit over 2 seconds
> > to
> > respond.
> > 
> > This results intialization failing with message like:
> >   brcmf_c_preinit_dcmds: Retreiving cur_etheraddr failed, -52
> >   brcmf_bus_start: failed: -52
> >   brcmf_sdio_firmware_callback: dongle is not responding
> > 
> > Increasing the timeout to allow for a bit more headroom allows the
> > card to initialize reliably.
> 
> I would prefer to know where the 2 second response time comes from.
> Could be sdio retuning. Maybe the chromeos people can comment whether
> this has been root caused.

The only reference i could find for where the timeout came from was in
the bcmdhd which has:
  #define IOCTL_RESP_TIMEOUT  2000  /* In milli second default value
for Production FW */

But not sure if that's helpful :). 


> There is a mmc patch pending in which retuning procedure can be
> deferred
> by the driver. Using that API may resolve the issue as well and I
> would
> prefer that solution.
> 
> > A quick search online after diagnosing/fixing this showed that
> > Google
> > has a similar patch in their ChromeOS tree, so this doesn't seem
> > specific to the board I'm using.
> 
> As the retuning stuff is not in main line I guess we need this fix
> for
> now so...
> 
> Acked-by: Arend van Spriel <arend@broadcom.com>
> > Signed-off-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
> > 
> > ---
> Still would like to know whether it is firmware initialization or
> some
> mmc stack procedure. Any suggestions to debug this are welcome.
> 
> Regards,
> Arend
> ---
> > 
> >  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git
> > a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > index dd66143..75ac4bd 100644
> > --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
> > @@ -45,8 +45,8 @@
> >  #include "chip.h"
> >  #include "firmware.h"
> >  
> > -#define DCMD_RESP_TIMEOUT	msecs_to_jiffies(2000)
> > -#define CTL_DONE_TIMEOUT	msecs_to_jiffies(2000)
> > +#define DCMD_RESP_TIMEOUT	msecs_to_jiffies(2500)
> > +#define CTL_DONE_TIMEOUT	msecs_to_jiffies(2500)
> >  
> >  #ifdef DEBUG
> >  
> >
Douglas Anderson - Jan. 25, 2016, 7:23 p.m.
Hi,

On Mon, Jan 25, 2016 at 7:36 AM, Arend van Spriel <aspriel@gmail.com> wrote:
> On 25-01-16 11:47, Sjoerd Simons wrote:
>> On a Radxa Rock2 board with a Ampak AP6335 (Broadcom 4339 core) it seems
>> the card responds very quickly most of the time, unfortunately during
>> initialisation it sometimes seems to take just a bit over 2 seconds to
>> respond.
>>
>> This results intialization failing with message like:
>>   brcmf_c_preinit_dcmds: Retreiving cur_etheraddr failed, -52
>>   brcmf_bus_start: failed: -52
>>   brcmf_sdio_firmware_callback: dongle is not responding
>>
>> Increasing the timeout to allow for a bit more headroom allows the
>> card to initialize reliably.
>
> I would prefer to know where the 2 second response time comes from.
> Could be sdio retuning. Maybe the chromeos people can comment whether
> this has been root caused.

I reviewed Paul's change here
<https://chromium-review.googlesource.com/#/c/225921/> but didn't do
any root causing.

I think that, like Sjoerd saw, we were seeing this problem at boot
time.  Certainly at boot time lots of things are happening all at the
same time in the system and there are often delays, so anything that
might have been close to timing out in the past may now be actually
timing out.

This is the kind of thing that, IMHO, should have a real timeout that
is 10x what was expected and a non-fatal warning whenever we go over
the expected time.  ...but maybe that's overdesign.  :-P

Kinda curious: do we get one or two really slow responses on every
bootup, or just some bootups?  Do we ever succeed even with a slow
(like 1.8 or 1.9 seconds) response, or is it always either "fast" or
"2.1" seconds?


In any case, in my experience the Broadcom firmware is fairly
complicated and has numerous cases where it stretches SDIO more than
the other SDIO WiFi chip I've worked with.  It wouldn't terribly
surprise me if there was a period of time during bootup where it was
non-responsive for 2 seconds.  As unrelated "evidence" showing some of
the Broadcom SDIO limitations, you can see
<https://chromium-review.googlesource.com/#/c/250228/> and also the
fact that Broadcom often holds the SDIO "busy" signal whereas the
other SDIO WiFi chip I've worked never did that.  Also, even with all
fixes the Broadcom WiFi module will still show periodic SDIO errors
that the higher level driver just knows to ignore.

My old debugging from the (sorry, private) bug
http://crosbug.com/p/36975 showed this periodically even with all
known fixes:

[21310.271635] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
[21550.583598] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
[21550.616035] brcmfmac: brcmf_sdio_readframes: RXHEADER FAILED: -110
[21550.648460] brcmfmac: brcmf_sdio_rxfail: abort command, terminate
frame, send NAK
[21550.683502] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
[21550.691214] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000100
[22671.121329] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
[22671.153167] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x01000104
[22671.184581] brcmfmac: brcmf_sdio_readframes: RXHEADER FAILED: -110
[22671.192600] brcmfmac: brcmf_sdio_rxfail: abort command, terminate
frame, send NAK
[22671.201929] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000114
[22671.209536] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000100
[28463.941736] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104

At the time dekim@ responded:

> There are several sleep/wake control at different level. The one we're talking
> about here is controlled by brcmf_sdio_bus_sleep() in the host driver to turn
> on/off bus core on the chip. There can be a period of time when chip is not
> paying attention to the host command (cmd52 to the
> SBSDIO_FUNC1_SLEEPCSR).

...and we decided that the periodic SDIO errors weren't causing any
huge problems (since they were retried).  As far as I know, they still
happen today.


All of the above may not help you, but it serves as evidence that the
SDIO communication to Broadcom isn't terribly amazing and apparently
that's just the way that the module (or perhaps its firmware) is
designed.  It doesn't seem to affect anything in the real world, so I
suppose it is just something we need to live with.


Obviously if you have access to the firmware source code and can debug
further, that would be awesome.  I'm just not hopeful.


In any case:

Reviewed-by: Douglas Anderson <dianders@chromium.org>
Arend van Spriel - Jan. 25, 2016, 8:07 p.m.
On 25-1-2016 20:23, Doug Anderson wrote:
> Hi,
> 
> On Mon, Jan 25, 2016 at 7:36 AM, Arend van Spriel <aspriel@gmail.com> wrote:
>> On 25-01-16 11:47, Sjoerd Simons wrote:
>>> On a Radxa Rock2 board with a Ampak AP6335 (Broadcom 4339 core) it seems
>>> the card responds very quickly most of the time, unfortunately during
>>> initialisation it sometimes seems to take just a bit over 2 seconds to
>>> respond.
>>>
>>> This results intialization failing with message like:
>>>   brcmf_c_preinit_dcmds: Retreiving cur_etheraddr failed, -52
>>>   brcmf_bus_start: failed: -52
>>>   brcmf_sdio_firmware_callback: dongle is not responding
>>>
>>> Increasing the timeout to allow for a bit more headroom allows the
>>> card to initialize reliably.
>>
>> I would prefer to know where the 2 second response time comes from.
>> Could be sdio retuning. Maybe the chromeos people can comment whether
>> this has been root caused.

Hi Doug,

Thanks for the elaborate response

> I reviewed Paul's change here
> <https://chromium-review.googlesource.com/#/c/225921/> but didn't do
> any root causing.
> 
> I think that, like Sjoerd saw, we were seeing this problem at boot
> time.  Certainly at boot time lots of things are happening all at the
> same time in the system and there are often delays, so anything that
> might have been close to timing out in the past may now be actually
> timing out.
> 
> This is the kind of thing that, IMHO, should have a real timeout that
> is 10x what was expected and a non-fatal warning whenever we go over
> the expected time.  ...but maybe that's overdesign.  :-P
> 
> Kinda curious: do we get one or two really slow responses on every
> bootup, or just some bootups?  Do we ever succeed even with a slow
> (like 1.8 or 1.9 seconds) response, or is it always either "fast" or
> "2.1" seconds?

Now these are interesting questions that I should have spilled out in
the first place. Thanks.

> In any case, in my experience the Broadcom firmware is fairly
> complicated and has numerous cases where it stretches SDIO more than
> the other SDIO WiFi chip I've worked with.  It wouldn't terribly
> surprise me if there was a period of time during bootup where it was
> non-responsive for 2 seconds.  As unrelated "evidence" showing some of
> the Broadcom SDIO limitations, you can see
> <https://chromium-review.googlesource.com/#/c/250228/> and also the
> fact that Broadcom often holds the SDIO "busy" signal whereas the
> other SDIO WiFi chip I've worked never did that.  Also, even with all
> fixes the Broadcom WiFi module will still show periodic SDIO errors
> that the higher level driver just knows to ignore.

The busy signal is in accordance with the SDIO spec. It would be good to
know if that is what is happening. Unfortunately I do have an SDIO
analyzer, but not reproduced it. May retry on veyron device.

> My old debugging from the (sorry, private) bug
> http://crosbug.com/p/36975 showed this periodically even with all
> known fixes:
> 
> [21310.271635] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
> [21550.583598] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
> [21550.616035] brcmfmac: brcmf_sdio_readframes: RXHEADER FAILED: -110
> [21550.648460] brcmfmac: brcmf_sdio_rxfail: abort command, terminate
> frame, send NAK
> [21550.683502] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
> [21550.691214] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000100
> [22671.121329] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
> [22671.153167] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x01000104
> [22671.184581] brcmfmac: brcmf_sdio_readframes: RXHEADER FAILED: -110
> [22671.192600] brcmfmac: brcmf_sdio_rxfail: abort command, terminate
> frame, send NAK
> [22671.201929] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000114
> [22671.209536] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000100
> [28463.941736] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
> 
> At the time dekim@ responded:
> 
>> There are several sleep/wake control at different level. The one we're talking
>> about here is controlled by brcmf_sdio_bus_sleep() in the host driver to turn
>> on/off bus core on the chip. There can be a period of time when chip is not
>> paying attention to the host command (cmd52 to the
>> SBSDIO_FUNC1_SLEEPCSR).
> 
> ...and we decided that the periodic SDIO errors weren't causing any
> huge problems (since they were retried).  As far as I know, they still
> happen today.

Were these true periodic errors or random at interval.

> 
> All of the above may not help you, but it serves as evidence that the
> SDIO communication to Broadcom isn't terribly amazing and apparently
> that's just the way that the module (or perhaps its firmware) is
> designed.  It doesn't seem to affect anything in the real world, so I
> suppose it is just something we need to live with.
> 
> 
> Obviously if you have access to the firmware source code and can debug
> further, that would be awesome.  I'm just not hopeful.

I have, but that does not get my hopes up either. The issue may just as
well be in backplane access during boot and then we are talking VHDL.

> In any case:
> 
> Reviewed-by: Douglas Anderson <dianders@chromium.org>

Thanks again ;-)

Regards,
Arend
Douglas Anderson - Jan. 25, 2016, 8:32 p.m.
Hi,

On Mon, Jan 25, 2016 at 12:07 PM, Arend van Spriel <arend@broadcom.com> wrote:
>> In any case, in my experience the Broadcom firmware is fairly
>> complicated and has numerous cases where it stretches SDIO more than
>> the other SDIO WiFi chip I've worked with.  It wouldn't terribly
>> surprise me if there was a period of time during bootup where it was
>> non-responsive for 2 seconds.  As unrelated "evidence" showing some of
>> the Broadcom SDIO limitations, you can see
>> <https://chromium-review.googlesource.com/#/c/250228/> and also the
>> fact that Broadcom often holds the SDIO "busy" signal whereas the
>> other SDIO WiFi chip I've worked never did that.  Also, even with all
>> fixes the Broadcom WiFi module will still show periodic SDIO errors
>> that the higher level driver just knows to ignore.
>
> The busy signal is in accordance with the SDIO spec.

Sorry, didn't mean to imply that it wasn't.  That particular instance
is definitely legal, but it was also just showing a case where the
Broadcom WiFi needed something that wasn't needed elsewhere.

AKA: it's kinda like saying that other modules work on a UART without
flow control lines and the Broadcom needs the flow control lines.
UARTs are designed with flow control lines and it's totally legal to
need them.  ...but if you saw that one serial device didn't need them
then you would presume that maybe the device on the other end had very
low interrupt latency or had some other hardware help (large FIFOs,
DMA, fast processor, hard real time OS, etc).  A device with low
interrupt latency, large FIFOs, DMA, fast processor, etc would
presumably be less likely to have other pauses in communication.


> It would be good to
> know if that is what is happening. Unfortunately I do have an SDIO
> analyzer, but not reproduced it. May retry on veyron device.

You might just be able to put a trace to see if it hits the busy
signal.  Depending on your Linux version you might just be able to put
a trace in __mmc_start_request().  If not and if you're on dw_mmc you
can put a trace in dw_mci_wait_while_busy().


>> My old debugging from the (sorry, private) bug
>> http://crosbug.com/p/36975 showed this periodically even with all
>> known fixes:
>>
>> [21310.271635] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
>> [21550.583598] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
>> [21550.616035] brcmfmac: brcmf_sdio_readframes: RXHEADER FAILED: -110
>> [21550.648460] brcmfmac: brcmf_sdio_rxfail: abort command, terminate
>> frame, send NAK
>> [21550.683502] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
>> [21550.691214] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000100
>> [22671.121329] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
>> [22671.153167] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x01000104
>> [22671.184581] brcmfmac: brcmf_sdio_readframes: RXHEADER FAILED: -110
>> [22671.192600] brcmfmac: brcmf_sdio_rxfail: abort command, terminate
>> frame, send NAK
>> [22671.201929] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000114
>> [22671.209536] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000100
>> [28463.941736] dwmmc_rockchip ff0d0000.dwmmc: CMD ERR: 0x00000104
>>
>> At the time dekim@ responded:
>>
>>> There are several sleep/wake control at different level. The one we're talking
>>> about here is controlled by brcmf_sdio_bus_sleep() in the host driver to turn
>>> on/off bus core on the chip. There can be a period of time when chip is not
>>> paying attention to the host command (cmd52 to the
>>> SBSDIO_FUNC1_SLEEPCSR).
>>
>> ...and we decided that the periodic SDIO errors weren't causing any
>> huge problems (since they were retried).  As far as I know, they still
>> happen today.
>
> Were these true periodic errors or random at interval.

I believe they were random, and I believe that I needed "power save"
mode turned on for WiFi.  I believe errors were made worse if I had a
TCP/IP ping running (AKA bring out of power save and go back in
constantly).
Julian Calaby - Jan. 25, 2016, 11:41 p.m.
Hi Arend,

On Tue, Jan 26, 2016 at 2:39 AM, Arend van Spriel <aspriel@gmail.com> wrote:
> On 25-01-16 12:06, Julian Calaby wrote:
>> Hi Sjoerd,
>>
>> On Mon, Jan 25, 2016 at 9:47 PM, Sjoerd Simons
>> <sjoerd.simons@collabora.co.uk> wrote:
>>> On a Radxa Rock2 board with a Ampak AP6335 (Broadcom 4339 core) it seems
>>> the card responds very quickly most of the time, unfortunately during
>>> initialisation it sometimes seems to take just a bit over 2 seconds to
>>> respond.
>>>
>>> This results intialization failing with message like:
>>>   brcmf_c_preinit_dcmds: Retreiving cur_etheraddr failed, -52
>>>   brcmf_bus_start: failed: -52
>>>   brcmf_sdio_firmware_callback: dongle is not responding
>>>
>>> Increasing the timeout to allow for a bit more headroom allows the
>>> card to initialize reliably.
>>>
>>> A quick search online after diagnosing/fixing this showed that Google
>>> has a similar patch in their ChromeOS tree, so this doesn't seem
>>> specific to the board I'm using.
>>>
>>> Signed-off-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
>>
>> Looks sane to me.
>>
>> Reviewed-by: Julian Calaby <julian.calaby@gmail.com>
>
> Not really a cleanup patch :-p , but thanks for the review.

I'm trying to review any "small" patch from (relatively) new people.

Thanks,
Arend van Spriel - Jan. 26, 2016, 6:29 a.m.
On 26-01-16 00:41, Julian Calaby wrote:
> Hi Arend,
> 
> On Tue, Jan 26, 2016 at 2:39 AM, Arend van Spriel <aspriel@gmail.com> wrote:
>> On 25-01-16 12:06, Julian Calaby wrote:
>>> Hi Sjoerd,
>>>
>>> On Mon, Jan 25, 2016 at 9:47 PM, Sjoerd Simons
>>> <sjoerd.simons@collabora.co.uk> wrote:
>>>> On a Radxa Rock2 board with a Ampak AP6335 (Broadcom 4339 core) it seems
>>>> the card responds very quickly most of the time, unfortunately during
>>>> initialisation it sometimes seems to take just a bit over 2 seconds to
>>>> respond.
>>>>
>>>> This results intialization failing with message like:
>>>>   brcmf_c_preinit_dcmds: Retreiving cur_etheraddr failed, -52
>>>>   brcmf_bus_start: failed: -52
>>>>   brcmf_sdio_firmware_callback: dongle is not responding
>>>>
>>>> Increasing the timeout to allow for a bit more headroom allows the
>>>> card to initialize reliably.
>>>>
>>>> A quick search online after diagnosing/fixing this showed that Google
>>>> has a similar patch in their ChromeOS tree, so this doesn't seem
>>>> specific to the board I'm using.
>>>>
>>>> Signed-off-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
>>>
>>> Looks sane to me.
>>>
>>> Reviewed-by: Julian Calaby <julian.calaby@gmail.com>
>>
>> Not really a cleanup patch :-p , but thanks for the review.
> 
> I'm trying to review any "small" patch from (relatively) new people.

And it is surely appreciated. Just read your reply in "cleanup patch
pile" thread and felt I had to make the stupid remark with just fun
intended.

Regards,
Arend
Sjoerd Simons - Jan. 26, 2016, 9:12 a.m.
On Mon, 2016-01-25 at 21:07 +0100, Arend van Spriel wrote:
> On 25-1-2016 20:23, Doug Anderson wrote:
> > Kinda curious: do we get one or two really slow responses on every
> > bootup, or just some bootups?  Do we ever succeed even with a slow
> > (like 1.8 or 1.9 seconds) response, or is it always either "fast"
> > or
> > "2.1" seconds?
> 
> Now these are interesting questions that I should have spilled out in
> the first place. Thanks.

I would need to get a more data to really conclusively say, but what i
saw in testing thusfar was:

* The issue occurs on _some_ boots
* The issue seems to come up more often on warm reboots
* The likelyhood seemed to change depending on the kernel
  configuration (iotw when the initilisation order/timing changed?)
* Forcing the brcmfmac module to load later in boot seemed to make the 
  likelyhood of the issue showing up lower.
* The samples I have with traces showed either quick responses on 
  success or slow "2.1" seconds responses on failure.

I'll try to make some time to get some statistics over a series of
boots wrt. the response time(s) for control messages so we can get a
better idea how often this happens and its behaviour.


I guess one other thing to note is that both Doug and myself were
seeing this on Rockchip RK3288 based devices. It would ofcourse be
interesting to know if this has been seen on other types of devices..

-- 
Sjoerd Simons
Collabora Ltd.
Kalle Valo - Feb. 6, 2016, 11:59 a.m.
> On a Radxa Rock2 board with a Ampak AP6335 (Broadcom 4339 core) it seems
> the card responds very quickly most of the time, unfortunately during
> initialisation it sometimes seems to take just a bit over 2 seconds to
> respond.
> 
> This results intialization failing with message like:
>   brcmf_c_preinit_dcmds: Retreiving cur_etheraddr failed, -52
>   brcmf_bus_start: failed: -52
>   brcmf_sdio_firmware_callback: dongle is not responding
> 
> Increasing the timeout to allow for a bit more headroom allows the
> card to initialize reliably.
> 
> A quick search online after diagnosing/fixing this showed that Google
> has a similar patch in their ChromeOS tree, so this doesn't seem
> specific to the board I'm using.
> 
> Signed-off-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
> Reviewed-by: Julian Calaby <julian.calaby@gmail.com>
> Acked-by: Arend van Spriel <arend@broadcom.com>
> Reviewed-by: Douglas Anderson <dianders@chromium.org>

Thanks, applied to wireless-drivers-next.git.

Kalle Valo

Patch

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index dd66143..75ac4bd 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -45,8 +45,8 @@ 
 #include "chip.h"
 #include "firmware.h"
 
-#define DCMD_RESP_TIMEOUT	msecs_to_jiffies(2000)
-#define CTL_DONE_TIMEOUT	msecs_to_jiffies(2000)
+#define DCMD_RESP_TIMEOUT	msecs_to_jiffies(2500)
+#define CTL_DONE_TIMEOUT	msecs_to_jiffies(2500)
 
 #ifdef DEBUG