mbox series

[v2,0/2] mmc: core: Fix Marvell WiFi reset by adding SDIO API to replug card

Message ID 20190722193939.125578-1-dianders@chromium.org (mailing list archive)
Headers show
Series mmc: core: Fix Marvell WiFi reset by adding SDIO API to replug card | expand

Message

Doug Anderson July 22, 2019, 7:39 p.m. UTC
As talked about in the thread at:

http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com

...when the Marvell WiFi card tries to reset itself it kills
Bluetooth.  It was observed that we could re-init the card properly by
unbinding / rebinding the host controller.  It was also observed that
in the downstream Chrome OS codebase the solution used was
mmc_remove_host() / mmc_add_host(), which is similar to the solution
in this series.

So far I've only done testing of this series using the reset test
source that can be simulated via sysfs.  Specifically I ran this test:

for i in $(seq 1000); do
  echo "LOOP $i --------"
  echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset

  while true; do
    if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
      fail=$(( fail + 1 ))
      echo "Fail WiFi ${fail}"
      if [[ ${fail} == 3 ]]; then
        exit 1
      fi
    else
      fail=0
      break
    fi
  done

  hciconfig hci0 down
  sleep 1
  if ! hciconfig hci0 up; then
    echo "Fail BT"
    exit 1
  fi
done

I ran this several times and got several hundred iterations each
before a failure.  When I saw failures:

* Once I saw a "Fail BT"; manually resetting the card again fixed it.
  I didn't give it time to see if it would have detected this
  automatically.
* Once I saw the ping fail because (for some reason) my device only
  got an IPv6 address from my router and the IPv4 ping failed.  I
  changed my script to use 'ping6' to see if that would help.
* Once I saw the ping fail because the higher level network stack
  ("shill" in my case) seemed to crash.  A few minutes later the
  system recovered itself automatically.  https://crbug.com/984593 if
  you want more details.
* Sometimes while I was testing I saw "Fail WiFi 1" indicating a
  transitory failure.  Usually this was an association failure, but in
  one case I saw the device do "Firmware wakeup failed" after I
  triggered the reset.  This caused the driver to trigger a re-reset
  of itself which eventually recovered things.  This was good because
  it was an actual test of the normal reset flow (not the one
  triggered via sysfs).

Changes in v2:
- s/routnine/routine (Brian Norris, Matthias Kaehlcke).
- s/contining/containing (Matthias Kaehlcke).
- Add Matthias Reviewed-by tag.
- Removed clear_bit() calls and old comment (Brian Norris).
- Explicit CC of Andreas Fenkart.
- Explicit CC of Brian Norris.
- Add "Fixes" pointing at the commit Brian talked about.
- Add Brian's Reviewed-by tag.

Douglas Anderson (2):
  mmc: core: Add sdio_trigger_replug() API
  mwifiex: Make use of the new sdio_trigger_replug() API to reset

 drivers/mmc/core/core.c                     | 28 +++++++++++++++++++--
 drivers/mmc/core/sdio_io.c                  | 20 +++++++++++++++
 drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
 include/linux/mmc/host.h                    | 15 ++++++++++-
 include/linux/mmc/sdio_func.h               |  2 ++
 5 files changed, 63 insertions(+), 18 deletions(-)

Comments

Ulf Hansson July 25, 2019, 1:28 p.m. UTC | #1
On Mon, 22 Jul 2019 at 21:41, Douglas Anderson <dianders@chromium.org> wrote:
>
> As talked about in the thread at:
>
> http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com
>
> ...when the Marvell WiFi card tries to reset itself it kills
> Bluetooth.  It was observed that we could re-init the card properly by
> unbinding / rebinding the host controller.  It was also observed that
> in the downstream Chrome OS codebase the solution used was
> mmc_remove_host() / mmc_add_host(), which is similar to the solution
> in this series.
>
> So far I've only done testing of this series using the reset test
> source that can be simulated via sysfs.  Specifically I ran this test:
>
> for i in $(seq 1000); do
>   echo "LOOP $i --------"
>   echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
>
>   while true; do
>     if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
>       fail=$(( fail + 1 ))
>       echo "Fail WiFi ${fail}"
>       if [[ ${fail} == 3 ]]; then
>         exit 1
>       fi
>     else
>       fail=0
>       break
>     fi
>   done
>
>   hciconfig hci0 down
>   sleep 1
>   if ! hciconfig hci0 up; then
>     echo "Fail BT"
>     exit 1
>   fi
> done
>
> I ran this several times and got several hundred iterations each
> before a failure.  When I saw failures:
>
> * Once I saw a "Fail BT"; manually resetting the card again fixed it.
>   I didn't give it time to see if it would have detected this
>   automatically.
> * Once I saw the ping fail because (for some reason) my device only
>   got an IPv6 address from my router and the IPv4 ping failed.  I
>   changed my script to use 'ping6' to see if that would help.
> * Once I saw the ping fail because the higher level network stack
>   ("shill" in my case) seemed to crash.  A few minutes later the
>   system recovered itself automatically.  https://crbug.com/984593 if
>   you want more details.
> * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
>   transitory failure.  Usually this was an association failure, but in
>   one case I saw the device do "Firmware wakeup failed" after I
>   triggered the reset.  This caused the driver to trigger a re-reset
>   of itself which eventually recovered things.  This was good because
>   it was an actual test of the normal reset flow (not the one
>   triggered via sysfs).
>
> Changes in v2:
> - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> - s/contining/containing (Matthias Kaehlcke).
> - Add Matthias Reviewed-by tag.
> - Removed clear_bit() calls and old comment (Brian Norris).
> - Explicit CC of Andreas Fenkart.
> - Explicit CC of Brian Norris.
> - Add "Fixes" pointing at the commit Brian talked about.
> - Add Brian's Reviewed-by tag.
>
> Douglas Anderson (2):
>   mmc: core: Add sdio_trigger_replug() API
>   mwifiex: Make use of the new sdio_trigger_replug() API to reset
>
>  drivers/mmc/core/core.c                     | 28 +++++++++++++++++++--
>  drivers/mmc/core/sdio_io.c                  | 20 +++++++++++++++
>  drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
>  include/linux/mmc/host.h                    | 15 ++++++++++-
>  include/linux/mmc/sdio_func.h               |  2 ++
>  5 files changed, 63 insertions(+), 18 deletions(-)
>

Doug, thanks for sending this!

As you know, I have been working on additional changes for SDIO
suspend/resume (still WIP and not ready for sharing) and this series
is related.

The thing is, that even during system suspend/resume, synchronizations
are needed between the different layers (mmc host, mmc core and
sdio-funcs), which is common to the problem you want to solve.

That said, I need to scratch my head a bit more before I can provide
you some feedback on $subject series. Moreover, it's vacation period
at my side so things are moving a bit slower. Please be patient.

Kind regards
Uffe
Andreas Fenkart July 30, 2019, 8:46 a.m. UTC | #2
Hi Douglas,

Am Mo., 22. Juli 2019 um 21:41 Uhr schrieb Douglas Anderson
<dianders@chromium.org>:
>
> As talked about in the thread at:
>
> http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com
>
> ...when the Marvell WiFi card tries to reset itself it kills
> Bluetooth.  It was observed that we could re-init the card properly by
> unbinding / rebinding the host controller.  It was also observed that
> in the downstream Chrome OS codebase the solution used was
> mmc_remove_host() / mmc_add_host(), which is similar to the solution
> in this series.
>
> So far I've only done testing of this series using the reset test
> source that can be simulated via sysfs.  Specifically I ran this test:
>
> for i in $(seq 1000); do
>   echo "LOOP $i --------"
>   echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
>
>   while true; do
>     if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
>       fail=$(( fail + 1 ))
>       echo "Fail WiFi ${fail}"
>       if [[ ${fail} == 3 ]]; then
>         exit 1
>       fi
>     else
>       fail=0
>       break
>     fi
>   done
>
>   hciconfig hci0 down
>   sleep 1
>   if ! hciconfig hci0 up; then
>     echo "Fail BT"
>     exit 1
>   fi
> done
>
> I ran this several times and got several hundred iterations each
> before a failure.  When I saw failures:
>
> * Once I saw a "Fail BT"; manually resetting the card again fixed it.
>   I didn't give it time to see if it would have detected this
>   automatically.
> * Once I saw the ping fail because (for some reason) my device only
>   got an IPv6 address from my router and the IPv4 ping failed.  I
>   changed my script to use 'ping6' to see if that would help.
> * Once I saw the ping fail because the higher level network stack
>   ("shill" in my case) seemed to crash.  A few minutes later the
>   system recovered itself automatically.  https://crbug.com/984593 if
>   you want more details.
> * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
>   transitory failure.  Usually this was an association failure, but in
>   one case I saw the device do "Firmware wakeup failed" after I
>   triggered the reset.  This caused the driver to trigger a re-reset
>   of itself which eventually recovered things.  This was good because
>   it was an actual test of the normal reset flow (not the one
>   triggered via sysfs).

This error triggers something. I remember that when I was working on
suspend-to-ram feature, we had problems to wake up the firmware
reliable. I found this patch in one of my old 3.13 tree

    the missing bit -- ugly hack to force cmd52 before cmd53.
---
 drivers/mmc/host/omap_hsmmc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c
index fb24a006080f..710d0bdf39e5 100644
--- a/drivers/mmc/host/omap_hsmmc.c
+++ b/drivers/mmc/host/omap_hsmmc.c
@@ -2372,6 +2372,12 @@ static int omap_hsmmc_suspend(struct device *dev)
        if (host->flags & HSMMC_SWAKEUP_QUIRK)
                disable_irq(host->gpio_sdio_irq);

+       /*
+        * force a polling cycle after resume.
+        * will issue cmd52, not cmd53 straight away
+        */
+       omap_hsmmc_enable_sdio_irq(host->mmc, false);
+
        if (host->dbclk)
                clk_disable_unprepare(host->dbclk);

>
> Changes in v2:
> - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> - s/contining/containing (Matthias Kaehlcke).
> - Add Matthias Reviewed-by tag.
> - Removed clear_bit() calls and old comment (Brian Norris).
> - Explicit CC of Andreas Fenkart.
> - Explicit CC of Brian Norris.
> - Add "Fixes" pointing at the commit Brian talked about.
> - Add Brian's Reviewed-by tag.
>
> Douglas Anderson (2):
>   mmc: core: Add sdio_trigger_replug() API
>   mwifiex: Make use of the new sdio_trigger_replug() API to reset
>
>  drivers/mmc/core/core.c                     | 28 +++++++++++++++++++--
>  drivers/mmc/core/sdio_io.c                  | 20 +++++++++++++++
>  drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
>  include/linux/mmc/host.h                    | 15 ++++++++++-
>  include/linux/mmc/sdio_func.h               |  2 ++
>  5 files changed, 63 insertions(+), 18 deletions(-)
>
> --
> 2.22.0.657.g960e92d24f-goog
>
Doug Anderson July 30, 2019, 4:59 p.m. UTC | #3
Hi,

On Tue, Jul 30, 2019 at 1:47 AM Andreas Fenkart <afenkart@gmail.com> wrote:
>
> > * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
> >   transitory failure.  Usually this was an association failure, but in
> >   one case I saw the device do "Firmware wakeup failed" after I
> >   triggered the reset.  This caused the driver to trigger a re-reset
> >   of itself which eventually recovered things.  This was good because
> >   it was an actual test of the normal reset flow (not the one
> >   triggered via sysfs).
>
> This error triggers something. I remember that when I was working on
> suspend-to-ram feature, we had problems to wake up the firmware
> reliable. I found this patch in one of my old 3.13 tree
>
>     the missing bit -- ugly hack to force cmd52 before cmd53.

Thanks for the reference!  At the moment I'm not terribly worried
about this particular failure case (compared to other failure modes)
because it's rare and it self-heals.

...my best guess, though, is that the problem isn't exactly the same.
The "Firmware wakeup failed" is a pretty generic error message, kind
of like "something went wrong" and not all instances of this message
will have the same root cause.

I actually dealt with a few suspend/resume issues around mwifiex
recently though.  If you ever uprev, you might be interested in:

b82d6c1f8f82 mwifiex: Make resume actually do something useful again
on SDIO cards
83293386bc95 mmc: core: Prevent processing SDIO IRQs when the card is suspended

-Doug
Doug Anderson Sept. 11, 2019, 9:26 p.m. UTC | #4
Hi,

On Thu, Jul 25, 2019 at 6:28 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Mon, 22 Jul 2019 at 21:41, Douglas Anderson <dianders@chromium.org> wrote:
> >
> > As talked about in the thread at:
> >
> > http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com
> >
> > ...when the Marvell WiFi card tries to reset itself it kills
> > Bluetooth.  It was observed that we could re-init the card properly by
> > unbinding / rebinding the host controller.  It was also observed that
> > in the downstream Chrome OS codebase the solution used was
> > mmc_remove_host() / mmc_add_host(), which is similar to the solution
> > in this series.
> >
> > So far I've only done testing of this series using the reset test
> > source that can be simulated via sysfs.  Specifically I ran this test:
> >
> > for i in $(seq 1000); do
> >   echo "LOOP $i --------"
> >   echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
> >
> >   while true; do
> >     if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
> >       fail=$(( fail + 1 ))
> >       echo "Fail WiFi ${fail}"
> >       if [[ ${fail} == 3 ]]; then
> >         exit 1
> >       fi
> >     else
> >       fail=0
> >       break
> >     fi
> >   done
> >
> >   hciconfig hci0 down
> >   sleep 1
> >   if ! hciconfig hci0 up; then
> >     echo "Fail BT"
> >     exit 1
> >   fi
> > done
> >
> > I ran this several times and got several hundred iterations each
> > before a failure.  When I saw failures:
> >
> > * Once I saw a "Fail BT"; manually resetting the card again fixed it.
> >   I didn't give it time to see if it would have detected this
> >   automatically.
> > * Once I saw the ping fail because (for some reason) my device only
> >   got an IPv6 address from my router and the IPv4 ping failed.  I
> >   changed my script to use 'ping6' to see if that would help.
> > * Once I saw the ping fail because the higher level network stack
> >   ("shill" in my case) seemed to crash.  A few minutes later the
> >   system recovered itself automatically.  https://crbug.com/984593 if
> >   you want more details.
> > * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
> >   transitory failure.  Usually this was an association failure, but in
> >   one case I saw the device do "Firmware wakeup failed" after I
> >   triggered the reset.  This caused the driver to trigger a re-reset
> >   of itself which eventually recovered things.  This was good because
> >   it was an actual test of the normal reset flow (not the one
> >   triggered via sysfs).
> >
> > Changes in v2:
> > - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> > - s/contining/containing (Matthias Kaehlcke).
> > - Add Matthias Reviewed-by tag.
> > - Removed clear_bit() calls and old comment (Brian Norris).
> > - Explicit CC of Andreas Fenkart.
> > - Explicit CC of Brian Norris.
> > - Add "Fixes" pointing at the commit Brian talked about.
> > - Add Brian's Reviewed-by tag.
> >
> > Douglas Anderson (2):
> >   mmc: core: Add sdio_trigger_replug() API
> >   mwifiex: Make use of the new sdio_trigger_replug() API to reset
> >
> >  drivers/mmc/core/core.c                     | 28 +++++++++++++++++++--
> >  drivers/mmc/core/sdio_io.c                  | 20 +++++++++++++++
> >  drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
> >  include/linux/mmc/host.h                    | 15 ++++++++++-
> >  include/linux/mmc/sdio_func.h               |  2 ++
> >  5 files changed, 63 insertions(+), 18 deletions(-)
> >
>
> Doug, thanks for sending this!
>
> As you know, I have been working on additional changes for SDIO
> suspend/resume (still WIP and not ready for sharing) and this series
> is related.
>
> The thing is, that even during system suspend/resume, synchronizations
> are needed between the different layers (mmc host, mmc core and
> sdio-funcs), which is common to the problem you want to solve.
>
> That said, I need to scratch my head a bit more before I can provide
> you some feedback on $subject series. Moreover, it's vacation period
> at my side so things are moving a bit slower. Please be patient.

I had kinda forgotten about this series after we landed it locally in
Chrome OS, but I realized that it never got resolved upstream.  Any
chance your head has been sufficiently scratched and you're now happy
with $subject series?  ;-)

-Doug
Ulf Hansson Sept. 16, 2019, 9:25 a.m. UTC | #5
On Wed, 11 Sep 2019 at 23:26, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Thu, Jul 25, 2019 at 6:28 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Mon, 22 Jul 2019 at 21:41, Douglas Anderson <dianders@chromium.org> wrote:
> > >
> > > As talked about in the thread at:
> > >
> > > http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com
> > >
> > > ...when the Marvell WiFi card tries to reset itself it kills
> > > Bluetooth.  It was observed that we could re-init the card properly by
> > > unbinding / rebinding the host controller.  It was also observed that
> > > in the downstream Chrome OS codebase the solution used was
> > > mmc_remove_host() / mmc_add_host(), which is similar to the solution
> > > in this series.
> > >
> > > So far I've only done testing of this series using the reset test
> > > source that can be simulated via sysfs.  Specifically I ran this test:
> > >
> > > for i in $(seq 1000); do
> > >   echo "LOOP $i --------"
> > >   echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
> > >
> > >   while true; do
> > >     if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
> > >       fail=$(( fail + 1 ))
> > >       echo "Fail WiFi ${fail}"
> > >       if [[ ${fail} == 3 ]]; then
> > >         exit 1
> > >       fi
> > >     else
> > >       fail=0
> > >       break
> > >     fi
> > >   done
> > >
> > >   hciconfig hci0 down
> > >   sleep 1
> > >   if ! hciconfig hci0 up; then
> > >     echo "Fail BT"
> > >     exit 1
> > >   fi
> > > done
> > >
> > > I ran this several times and got several hundred iterations each
> > > before a failure.  When I saw failures:
> > >
> > > * Once I saw a "Fail BT"; manually resetting the card again fixed it.
> > >   I didn't give it time to see if it would have detected this
> > >   automatically.
> > > * Once I saw the ping fail because (for some reason) my device only
> > >   got an IPv6 address from my router and the IPv4 ping failed.  I
> > >   changed my script to use 'ping6' to see if that would help.
> > > * Once I saw the ping fail because the higher level network stack
> > >   ("shill" in my case) seemed to crash.  A few minutes later the
> > >   system recovered itself automatically.  https://crbug.com/984593 if
> > >   you want more details.
> > > * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
> > >   transitory failure.  Usually this was an association failure, but in
> > >   one case I saw the device do "Firmware wakeup failed" after I
> > >   triggered the reset.  This caused the driver to trigger a re-reset
> > >   of itself which eventually recovered things.  This was good because
> > >   it was an actual test of the normal reset flow (not the one
> > >   triggered via sysfs).
> > >
> > > Changes in v2:
> > > - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> > > - s/contining/containing (Matthias Kaehlcke).
> > > - Add Matthias Reviewed-by tag.
> > > - Removed clear_bit() calls and old comment (Brian Norris).
> > > - Explicit CC of Andreas Fenkart.
> > > - Explicit CC of Brian Norris.
> > > - Add "Fixes" pointing at the commit Brian talked about.
> > > - Add Brian's Reviewed-by tag.
> > >
> > > Douglas Anderson (2):
> > >   mmc: core: Add sdio_trigger_replug() API
> > >   mwifiex: Make use of the new sdio_trigger_replug() API to reset
> > >
> > >  drivers/mmc/core/core.c                     | 28 +++++++++++++++++++--
> > >  drivers/mmc/core/sdio_io.c                  | 20 +++++++++++++++
> > >  drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
> > >  include/linux/mmc/host.h                    | 15 ++++++++++-
> > >  include/linux/mmc/sdio_func.h               |  2 ++
> > >  5 files changed, 63 insertions(+), 18 deletions(-)
> > >
> >
> > Doug, thanks for sending this!
> >
> > As you know, I have been working on additional changes for SDIO
> > suspend/resume (still WIP and not ready for sharing) and this series
> > is related.
> >
> > The thing is, that even during system suspend/resume, synchronizations
> > are needed between the different layers (mmc host, mmc core and
> > sdio-funcs), which is common to the problem you want to solve.
> >
> > That said, I need to scratch my head a bit more before I can provide
> > you some feedback on $subject series. Moreover, it's vacation period
> > at my side so things are moving a bit slower. Please be patient.
>
> I had kinda forgotten about this series after we landed it locally in
> Chrome OS, but I realized that it never got resolved upstream.  Any
> chance your head has been sufficiently scratched and you're now happy
> with $subject series?  ;-)

It's still on my TODO list. Apologize for the delay, but I still need
more time to look into it, possibly later this week.

In any case, let's make sure we get this problem resolved for v5.5.

Kind regards
Uffe
Doug Anderson Oct. 7, 2019, 11:39 p.m. UTC | #6
Hi,

On Mon, Sep 16, 2019 at 2:25 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Wed, 11 Sep 2019 at 23:26, Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Thu, Jul 25, 2019 at 6:28 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Mon, 22 Jul 2019 at 21:41, Douglas Anderson <dianders@chromium.org> wrote:
> > > >
> > > > As talked about in the thread at:
> > > >
> > > > http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com
> > > >
> > > > ...when the Marvell WiFi card tries to reset itself it kills
> > > > Bluetooth.  It was observed that we could re-init the card properly by
> > > > unbinding / rebinding the host controller.  It was also observed that
> > > > in the downstream Chrome OS codebase the solution used was
> > > > mmc_remove_host() / mmc_add_host(), which is similar to the solution
> > > > in this series.
> > > >
> > > > So far I've only done testing of this series using the reset test
> > > > source that can be simulated via sysfs.  Specifically I ran this test:
> > > >
> > > > for i in $(seq 1000); do
> > > >   echo "LOOP $i --------"
> > > >   echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
> > > >
> > > >   while true; do
> > > >     if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
> > > >       fail=$(( fail + 1 ))
> > > >       echo "Fail WiFi ${fail}"
> > > >       if [[ ${fail} == 3 ]]; then
> > > >         exit 1
> > > >       fi
> > > >     else
> > > >       fail=0
> > > >       break
> > > >     fi
> > > >   done
> > > >
> > > >   hciconfig hci0 down
> > > >   sleep 1
> > > >   if ! hciconfig hci0 up; then
> > > >     echo "Fail BT"
> > > >     exit 1
> > > >   fi
> > > > done
> > > >
> > > > I ran this several times and got several hundred iterations each
> > > > before a failure.  When I saw failures:
> > > >
> > > > * Once I saw a "Fail BT"; manually resetting the card again fixed it.
> > > >   I didn't give it time to see if it would have detected this
> > > >   automatically.
> > > > * Once I saw the ping fail because (for some reason) my device only
> > > >   got an IPv6 address from my router and the IPv4 ping failed.  I
> > > >   changed my script to use 'ping6' to see if that would help.
> > > > * Once I saw the ping fail because the higher level network stack
> > > >   ("shill" in my case) seemed to crash.  A few minutes later the
> > > >   system recovered itself automatically.  https://crbug.com/984593 if
> > > >   you want more details.
> > > > * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
> > > >   transitory failure.  Usually this was an association failure, but in
> > > >   one case I saw the device do "Firmware wakeup failed" after I
> > > >   triggered the reset.  This caused the driver to trigger a re-reset
> > > >   of itself which eventually recovered things.  This was good because
> > > >   it was an actual test of the normal reset flow (not the one
> > > >   triggered via sysfs).
> > > >
> > > > Changes in v2:
> > > > - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> > > > - s/contining/containing (Matthias Kaehlcke).
> > > > - Add Matthias Reviewed-by tag.
> > > > - Removed clear_bit() calls and old comment (Brian Norris).
> > > > - Explicit CC of Andreas Fenkart.
> > > > - Explicit CC of Brian Norris.
> > > > - Add "Fixes" pointing at the commit Brian talked about.
> > > > - Add Brian's Reviewed-by tag.
> > > >
> > > > Douglas Anderson (2):
> > > >   mmc: core: Add sdio_trigger_replug() API
> > > >   mwifiex: Make use of the new sdio_trigger_replug() API to reset
> > > >
> > > >  drivers/mmc/core/core.c                     | 28 +++++++++++++++++++--
> > > >  drivers/mmc/core/sdio_io.c                  | 20 +++++++++++++++
> > > >  drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
> > > >  include/linux/mmc/host.h                    | 15 ++++++++++-
> > > >  include/linux/mmc/sdio_func.h               |  2 ++
> > > >  5 files changed, 63 insertions(+), 18 deletions(-)
> > > >
> > >
> > > Doug, thanks for sending this!
> > >
> > > As you know, I have been working on additional changes for SDIO
> > > suspend/resume (still WIP and not ready for sharing) and this series
> > > is related.
> > >
> > > The thing is, that even during system suspend/resume, synchronizations
> > > are needed between the different layers (mmc host, mmc core and
> > > sdio-funcs), which is common to the problem you want to solve.
> > >
> > > That said, I need to scratch my head a bit more before I can provide
> > > you some feedback on $subject series. Moreover, it's vacation period
> > > at my side so things are moving a bit slower. Please be patient.
> >
> > I had kinda forgotten about this series after we landed it locally in
> > Chrome OS, but I realized that it never got resolved upstream.  Any
> > chance your head has been sufficiently scratched and you're now happy
> > with $subject series?  ;-)
>
> It's still on my TODO list. Apologize for the delay, but I still need
> more time to look into it, possibly later this week.
>
> In any case, let's make sure we get this problem resolved for v5.5.

Hi Ulf.  It's your friendly pest, Doug, here to ask how things are going.  :-P


-Doug
Ulf Hansson Oct. 8, 2019, 11:49 a.m. UTC | #7
On Tue, 8 Oct 2019 at 01:39, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Sep 16, 2019 at 2:25 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Wed, 11 Sep 2019 at 23:26, Doug Anderson <dianders@chromium.org> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Jul 25, 2019 at 6:28 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > On Mon, 22 Jul 2019 at 21:41, Douglas Anderson <dianders@chromium.org> wrote:
> > > > >
> > > > > As talked about in the thread at:
> > > > >
> > > > > http://lkml.kernel.org/r/CAD=FV=X7P2F1k_zwHc0mbtfk55-rucTz_GoDH=PL6zWqKYcpuw@mail.gmail.com
> > > > >
> > > > > ...when the Marvell WiFi card tries to reset itself it kills
> > > > > Bluetooth.  It was observed that we could re-init the card properly by
> > > > > unbinding / rebinding the host controller.  It was also observed that
> > > > > in the downstream Chrome OS codebase the solution used was
> > > > > mmc_remove_host() / mmc_add_host(), which is similar to the solution
> > > > > in this series.
> > > > >
> > > > > So far I've only done testing of this series using the reset test
> > > > > source that can be simulated via sysfs.  Specifically I ran this test:
> > > > >
> > > > > for i in $(seq 1000); do
> > > > >   echo "LOOP $i --------"
> > > > >   echo 1 > /sys/kernel/debug/mwifiex/mlan0/reset
> > > > >
> > > > >   while true; do
> > > > >     if ! ping -w15 -c1 "${GW}" >/dev/null 2>&1; then
> > > > >       fail=$(( fail + 1 ))
> > > > >       echo "Fail WiFi ${fail}"
> > > > >       if [[ ${fail} == 3 ]]; then
> > > > >         exit 1
> > > > >       fi
> > > > >     else
> > > > >       fail=0
> > > > >       break
> > > > >     fi
> > > > >   done
> > > > >
> > > > >   hciconfig hci0 down
> > > > >   sleep 1
> > > > >   if ! hciconfig hci0 up; then
> > > > >     echo "Fail BT"
> > > > >     exit 1
> > > > >   fi
> > > > > done
> > > > >
> > > > > I ran this several times and got several hundred iterations each
> > > > > before a failure.  When I saw failures:
> > > > >
> > > > > * Once I saw a "Fail BT"; manually resetting the card again fixed it.
> > > > >   I didn't give it time to see if it would have detected this
> > > > >   automatically.
> > > > > * Once I saw the ping fail because (for some reason) my device only
> > > > >   got an IPv6 address from my router and the IPv4 ping failed.  I
> > > > >   changed my script to use 'ping6' to see if that would help.
> > > > > * Once I saw the ping fail because the higher level network stack
> > > > >   ("shill" in my case) seemed to crash.  A few minutes later the
> > > > >   system recovered itself automatically.  https://crbug.com/984593 if
> > > > >   you want more details.
> > > > > * Sometimes while I was testing I saw "Fail WiFi 1" indicating a
> > > > >   transitory failure.  Usually this was an association failure, but in
> > > > >   one case I saw the device do "Firmware wakeup failed" after I
> > > > >   triggered the reset.  This caused the driver to trigger a re-reset
> > > > >   of itself which eventually recovered things.  This was good because
> > > > >   it was an actual test of the normal reset flow (not the one
> > > > >   triggered via sysfs).
> > > > >
> > > > > Changes in v2:
> > > > > - s/routnine/routine (Brian Norris, Matthias Kaehlcke).
> > > > > - s/contining/containing (Matthias Kaehlcke).
> > > > > - Add Matthias Reviewed-by tag.
> > > > > - Removed clear_bit() calls and old comment (Brian Norris).
> > > > > - Explicit CC of Andreas Fenkart.
> > > > > - Explicit CC of Brian Norris.
> > > > > - Add "Fixes" pointing at the commit Brian talked about.
> > > > > - Add Brian's Reviewed-by tag.
> > > > >
> > > > > Douglas Anderson (2):
> > > > >   mmc: core: Add sdio_trigger_replug() API
> > > > >   mwifiex: Make use of the new sdio_trigger_replug() API to reset
> > > > >
> > > > >  drivers/mmc/core/core.c                     | 28 +++++++++++++++++++--
> > > > >  drivers/mmc/core/sdio_io.c                  | 20 +++++++++++++++
> > > > >  drivers/net/wireless/marvell/mwifiex/sdio.c | 16 +-----------
> > > > >  include/linux/mmc/host.h                    | 15 ++++++++++-
> > > > >  include/linux/mmc/sdio_func.h               |  2 ++
> > > > >  5 files changed, 63 insertions(+), 18 deletions(-)
> > > > >
> > > >
> > > > Doug, thanks for sending this!
> > > >
> > > > As you know, I have been working on additional changes for SDIO
> > > > suspend/resume (still WIP and not ready for sharing) and this series
> > > > is related.
> > > >
> > > > The thing is, that even during system suspend/resume, synchronizations
> > > > are needed between the different layers (mmc host, mmc core and
> > > > sdio-funcs), which is common to the problem you want to solve.
> > > >
> > > > That said, I need to scratch my head a bit more before I can provide
> > > > you some feedback on $subject series. Moreover, it's vacation period
> > > > at my side so things are moving a bit slower. Please be patient.
> > >
> > > I had kinda forgotten about this series after we landed it locally in
> > > Chrome OS, but I realized that it never got resolved upstream.  Any
> > > chance your head has been sufficiently scratched and you're now happy
> > > with $subject series?  ;-)
> >
> > It's still on my TODO list. Apologize for the delay, but I still need
> > more time to look into it, possibly later this week.
> >
> > In any case, let's make sure we get this problem resolved for v5.5.
>
> Hi Ulf.  It's your friendly pest, Doug, here to ask how things are going.  :-P

:-)

The series on the top of my "things to review" list. I will definitely
provide you with some feedback then next days or so.

Again, sorry for the delay!

Kind regards
Uffe