[2/2] mmc: core: Re-work HW reset for SDIO cards

Message ID 20191017135739.1315-3-ulf.hansson@linaro.org (mailing list archive)
State Not Applicable
Delegated to: Johannes Berg
Series mmc: core: Fixup HW reset for SDIO cards

Commit Message

Ulf Hansson Oct. 17, 2019, 1:57 p.m. UTC
It has turned out that it's not a good idea to try to power cycle and
re-initialize the SDIO card via mmc_hw_reset(). This is because there may
be multiple SDIO funcs attached to the same SDIO card.

To solve this problem, we would need to inform each of the SDIO funcs in
some way when mmc_sdio_hw_reset() gets called, but that isn't an entirely
trivial operation. Therefore, let's instead take the easy way out, by
triggering a card removal and forcing a new rescan of the SDIO card.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/mmc/core/core.c |  3 +--
 drivers/mmc/core/core.h |  2 ++
 drivers/mmc/core/sdio.c | 11 +++++++++--
 3 files changed, 12 insertions(+), 4 deletions(-)

Comments

Doug Anderson Oct. 21, 2019, 10:13 p.m. UTC | #1
Hi,

On Thu, Oct 17, 2019 at 6:58 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> It has turned out that it's not a good idea to try to power cycle and
> re-initialize the SDIO card via mmc_hw_reset(). This is because there may
> be multiple SDIO funcs attached to the same SDIO card.
>
> To solve this problem, we would need to inform each of the SDIO funcs in
> some way when mmc_sdio_hw_reset() gets called, but that isn't an entirely
> trivial operation. Therefore, let's instead take the easy way out, by
> triggering a card removal and forcing a new rescan of the SDIO card.
>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
>  drivers/mmc/core/core.c |  3 +--
>  drivers/mmc/core/core.h |  2 ++
>  drivers/mmc/core/sdio.c | 11 +++++++++--
>  3 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 6f8342702c73..39c4567e39d8 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -1469,8 +1469,7 @@ void mmc_detach_bus(struct mmc_host *host)
>         mmc_bus_put(host);
>  }
>
> -static void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
> -                               bool cd_irq)
> +void _mmc_detect_change(struct mmc_host *host, unsigned long delay, bool cd_irq)
>  {
>         /*
>          * If the device is configured as wakeup, we prevent a new sleep for
> diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
> index 328c78dbee66..575ac0257af2 100644
> --- a/drivers/mmc/core/core.h
> +++ b/drivers/mmc/core/core.h
> @@ -70,6 +70,8 @@ void mmc_rescan(struct work_struct *work);
>  void mmc_start_host(struct mmc_host *host);
>  void mmc_stop_host(struct mmc_host *host);
>
> +void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
> +                       bool cd_irq);
>  int _mmc_detect_card_removed(struct mmc_host *host);
>  int mmc_detect_card_removed(struct mmc_host *host);
>
> diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
> index 26cabd53ddc5..5d7462c223c3 100644
> --- a/drivers/mmc/core/sdio.c
> +++ b/drivers/mmc/core/sdio.c
> @@ -1050,8 +1050,15 @@ static int mmc_sdio_runtime_resume(struct mmc_host *host)
>
>  static int mmc_sdio_hw_reset(struct mmc_host *host)
>  {
> -       mmc_power_cycle(host, host->card->ocr);
> -       return mmc_sdio_reinit_card(host);
> +       /*
> +        * We may have multiple SDIO funcs. Rather than informing them all,
> +        * let's trigger a removal and force a new rescan of the card.
> +        */
> +       host->rescan_entered = 0;
> +       mmc_card_set_removed(host->card);
> +       _mmc_detect_change(host, 0, false);
> +
> +       return 0;
>  }

The problem I see here is that callers of this reset function aren't
expecting it to work this way.  Look specifically at
mwifiex_sdio_card_reset_work().  It's assuming that it needs to do
things like shutdown / reinit.  Now it's true that the old
mwifiex_sdio_card_reset_work() was pretty broken on any systems that
also had SDIO bluetooth, but presumably it worked OK on systems
without SDIO Bluetooth.  I don't think it'll work so well now.

Testing shows that indeed your patch breaks mwifiex reset worse than
it was before (AKA WiFi totally fails instead of it just killing
Bluetooth).

I think it may be better to add a new API call rather than trying to
co-opt the old one.  Maybe put a WARN_ON() for the old API call to
make people move away from it, or something?


...but on the bright side, your patch does seem to work.  If I add my
patch from <https://lkml.kernel.org/r/20190722193939.125578-3-dianders@chromium.org>
and change "sdio_trigger_replug(func)" to
"mmc_hw_reset(func->card->host)" then I can pass reboot tests.  I made
it through about 300 cycles of my old test before stopping the test to
work on other things.


In terms of the implementation, I will freely admit that I'm always
confused by the SD/MMC state machines, but as far as I can tell your
patch accomplishes the same thing as mine but in a bit simpler way.
;-)  I even confirmed that with your patch mmc_power_off() /
mmc_power_up are still called...

Thanks!

-Doug
Ulf Hansson Oct. 22, 2019, 6:51 a.m. UTC | #2
On Tue, 22 Oct 2019 at 00:13, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Thu, Oct 17, 2019 at 6:58 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > It has turned out that it's not a good idea to try to power cycle and
> > re-initialize the SDIO card via mmc_hw_reset(). This is because there may
> > be multiple SDIO funcs attached to the same SDIO card.
> >
> > To solve this problem, we would need to inform each of the SDIO funcs in
> > some way when mmc_sdio_hw_reset() gets called, but that isn't an entirely
> > trivial operation. Therefore, let's instead take the easy way out, by
> > triggering a card removal and forcing a new rescan of the SDIO card.
> >
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >  drivers/mmc/core/core.c |  3 +--
> >  drivers/mmc/core/core.h |  2 ++
> >  drivers/mmc/core/sdio.c | 11 +++++++++--
> >  3 files changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > index 6f8342702c73..39c4567e39d8 100644
> > --- a/drivers/mmc/core/core.c
> > +++ b/drivers/mmc/core/core.c
> > @@ -1469,8 +1469,7 @@ void mmc_detach_bus(struct mmc_host *host)
> >         mmc_bus_put(host);
> >  }
> >
> > -static void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
> > -                               bool cd_irq)
> > +void _mmc_detect_change(struct mmc_host *host, unsigned long delay, bool cd_irq)
> >  {
> >         /*
> >          * If the device is configured as wakeup, we prevent a new sleep for
> > diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
> > index 328c78dbee66..575ac0257af2 100644
> > --- a/drivers/mmc/core/core.h
> > +++ b/drivers/mmc/core/core.h
> > @@ -70,6 +70,8 @@ void mmc_rescan(struct work_struct *work);
> >  void mmc_start_host(struct mmc_host *host);
> >  void mmc_stop_host(struct mmc_host *host);
> >
> > +void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
> > +                       bool cd_irq);
> >  int _mmc_detect_card_removed(struct mmc_host *host);
> >  int mmc_detect_card_removed(struct mmc_host *host);
> >
> > diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
> > index 26cabd53ddc5..5d7462c223c3 100644
> > --- a/drivers/mmc/core/sdio.c
> > +++ b/drivers/mmc/core/sdio.c
> > @@ -1050,8 +1050,15 @@ static int mmc_sdio_runtime_resume(struct mmc_host *host)
> >
> >  static int mmc_sdio_hw_reset(struct mmc_host *host)
> >  {
> > -       mmc_power_cycle(host, host->card->ocr);
> > -       return mmc_sdio_reinit_card(host);
> > +       /*
> > +        * We may have multiple SDIO funcs. Rather than informing them all,
> > +        * let's trigger a removal and force a new rescan of the card.
> > +        */
> > +       host->rescan_entered = 0;
> > +       mmc_card_set_removed(host->card);
> > +       _mmc_detect_change(host, 0, false);
> > +
> > +       return 0;
> >  }
>
> The problem I see here is that callers of this reset function aren't
> expecting it to work this way.  Look specifically at
> mwifiex_sdio_card_reset_work().  It's assuming that it needs to do
> things like shutdown / reinit.  Now it's true that the old
> mwifiex_sdio_card_reset_work() was pretty broken on any systems that
> also had SDIO bluetooth, but presumably it worked OK on systems
> without SDIO Bluetooth.  I don't think it'll work so well now.

Good point!

I guess I was hoping that running through ->remove() and then
->probe() for the SDIO func drivers would simply take care of
whatever may be needed. In a way, this means the driver is already
broken with regard to this path, but never mind.
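
To make the ->remove() / ->probe() idea concrete, here is a toy userspace
model, not kernel code. It compares an in-place power cycle, where only the
calling driver re-initializes its function, with a replug that unbinds and
re-probes every function. All model_* names are hypothetical, and the
assumptions that a power cycle wipes the state of every function and that
every function has a driver to re-probe are simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_FUNCS 7	/* an SDIO card exposes at most 7 I/O functions */

/* Hypothetical stand-in for one SDIO function plus its driver's view of it. */
struct model_func {
	bool hw_ready;	/* the function's hardware has been initialized */
	bool drv_bound;	/* a driver is bound and assumes hw_ready is true */
};

struct model_card {
	struct model_func funcs[MODEL_MAX_FUNCS];
	size_t nr_funcs;
};

/* Assumption: a power cycle wipes the hardware state of every function. */
void model_power_cycle(struct model_card *card)
{
	for (size_t i = 0; i < card->nr_funcs; i++)
		card->funcs[i].hw_ready = false;
}

/* In-place reset: only the calling driver re-initializes its own function. */
void model_reset_in_place(struct model_card *card, size_t caller)
{
	assert(caller < card->nr_funcs);
	model_power_cycle(card);
	card->funcs[caller].hw_ready = true;
}

/* Replug: unbind (->remove()) every driver, power cycle, then re-probe all. */
void model_reset_by_replug(struct model_card *card)
{
	for (size_t i = 0; i < card->nr_funcs; i++)
		card->funcs[i].drv_bound = false;
	model_power_cycle(card);
	for (size_t i = 0; i < card->nr_funcs; i++) {
		card->funcs[i].hw_ready = true;
		card->funcs[i].drv_bound = true;
	}
}

/* Count functions whose driver is bound while the hardware is not ready. */
size_t model_stale_funcs(const struct model_card *card)
{
	size_t stale = 0;

	for (size_t i = 0; i < card->nr_funcs; i++)
		if (card->funcs[i].drv_bound && !card->funcs[i].hw_ready)
			stale++;
	return stale;
}
```

With two bound functions, model_reset_in_place(&card, 0) leaves one stale
function behind, whereas model_reset_by_replug(&card) leaves none.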

>
> Testing shows that indeed your patch breaks mwifiex reset worse than
> it was before (AKA WiFi totally fails instead of it just killing
> Bluetooth).
>
> I think it may be better to add a new API call rather than trying to
> co-opt the old one.  Maybe put a WARN_ON() for the old API call to
> make people move away from it, or something?

Thanks again for testing and for valuable feedback! Clearly this needs
a little more thinking.

An additional concern I see with the "hotplug approach" implemented in
$subject patch is that it becomes unnecessarily heavy when there is
only one SDIO func driver bound.

In one way I am tempted to try to address that situation, as it seems
a bit silly to do a full hotplug dance when it isn't needed.
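
One possible shape for such a split, sketched as a userspace model and not
as a proposal for the actual code: take the light in-place path when a
single func driver is bound, and fall back to the replug only when the card
is shared. The funcs_bound counter, the model_* names and the enum return
value are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_reset_path {
	MODEL_RESET_IN_PLACE,	/* synchronous power cycle + re-init */
	MODEL_RESET_REPLUG,	/* removal flagged, rescan scheduled */
};

struct model_host {
	unsigned int funcs_bound;	/* hypothetical count of bound func drivers */
	unsigned int power_cycles;	/* synchronous power cycles performed */
	bool rescan_scheduled;		/* a removal + rescan has been queued */
};

/*
 * Pick the reset strategy from the number of bound func drivers and tell
 * the caller which one was used, so it knows whether it still has to
 * re-initialize itself or should expect ->remove() and ->probe().
 */
enum model_reset_path model_hw_reset(struct model_host *host)
{
	assert(host != NULL);

	if (host->funcs_bound > 1) {
		/* Shared card: let the rescan inform every func driver. */
		host->rescan_scheduled = true;
		return MODEL_RESET_REPLUG;
	}

	/* Single func driver: the light in-place reset is enough. */
	host->power_cycles++;
	return MODEL_RESET_IN_PLACE;
}
```

Returning the chosen path is one way to handle callers that do their own
shutdown and re-init around the reset: they can skip the re-init when the
replug path was taken.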

>
>
> ...but on the bright side, your patch does seem to work.  If I add my
> patch from <https://lkml.kernel.org/r/20190722193939.125578-3-dianders@chromium.org>
> and change "sdio_trigger_replug(func)" to
> "mmc_hw_reset(func->card->host)" then I can pass reboot tests.  I made
> it through about 300 cycles of my old test before stopping the test to
> work on other things.

That's really good news. Thanks!

>
>
> In terms of the implementation, I will freely admit that I'm always
> confused by the SD/MMC state machines, but as far as I can tell your
> patch accomplishes the same thing as mine but in a bit simpler way.
> ;-)  I even confirmed that with your patch mmc_power_off() /
> mmc_power_up are still called...
>
> Thanks!
>
> -Doug

Alright, let me rework this and post a new version. I'll keep you posted.

Kind regards
Uffe
Doug Anderson Oct. 22, 2019, 2:47 p.m. UTC | #3
Hi,

On Mon, Oct 21, 2019 at 11:51 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> > The problem I see here is that callers of this reset function aren't
> > expecting it to work this way.  Look specifically at
> > mwifiex_sdio_card_reset_work().  It's assuming that it needs to do
> > things like shutdown / reinit.  Now it's true that the old
> > mwifiex_sdio_card_reset_work() was pretty broken on any systems that
> > also had SDIO bluetooth, but presumably it worked OK on systems
> > without SDIO Bluetooth.  I don't think it'll work so well now.
>
> Good point!
>
> I guess I was hoping that running through ->remove() and then
> ->probe() for the SDIO func drivers would simply take care of
> whatever may be needed. In a way, this means the driver is already
> broken with regard to this path, but never mind.

Yeah, probably true.  I guess if anyone actually expected to use one
of these cards as a removable SDIO card (I have seen such dev boards
long ago) then it would always have been possible for someone to
remove the card at just the wrong time and break things.


> > Testing shows that indeed your patch breaks mwifiex reset worse than
> > it was before (AKA WiFi totally fails instead of it just killing
> > Bluetooth).
> >
> > I think it may be better to add a new API call rather than trying to
> > co-opt the old one.  Maybe put a WARN_ON() for the old API call to
> > make people move away from it, or something?
>
> Thanks again for testing and for valuable feedback! Clearly this needs
> a little more thinking.
>
> An additional concern I see with the "hotplug approach" implemented in
> $subject patch is that it becomes unnecessarily heavy when there is
> only one SDIO func driver bound.
>
> In one way I am tempted to try to address that situation, as it seems
> a bit silly to do a full hotplug dance when it isn't needed.

True, though I kinda like the heavy solution here.  At least in the
mwifiex case this isn't a part of the normal flow.  AKA: we don't call
this function during normal bootup nor during any normal operations.
It's much more of an "oh crap, something's not working and we don't
know what to do" type solution.  I mean, I guess it's still not
uncommon that we end up in this code path due to the number of bugs in
Marvell firmware, but I'm just trying to say that it's an error code
path and not a normal one.  In my mind that means the more things we
can re-init the better.

If this was, on the other hand, a reset that we were supposed to
always assert when doing a normal operation (like it wants us to reset
it when we switch modes, or something) then a lighter operation would
make more sense.

-Doug
Ulf Hansson Oct. 23, 2019, 3:06 p.m. UTC | #4
On Tue, 22 Oct 2019 at 16:47, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Oct 21, 2019 at 11:51 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > > The problem I see here is that callers of this reset function aren't
> > > expecting it to work this way.  Look specifically at
> > > mwifiex_sdio_card_reset_work().  It's assuming that it needs to do
> > > things like shutdown / reinit.  Now it's true that the old
> > > mwifiex_sdio_card_reset_work() was pretty broken on any systems that
> > > also had SDIO bluetooth, but presumably it worked OK on systems
> > > without SDIO Bluetooth.  I don't think it'll work so well now.
> >
> > Good point!
> >
> > I guess I was hoping that running through ->remove() and then
> > ->probe() for the SDIO func drivers would simply take care of
> > whatever may be needed. In a way, this means the driver is already
> > broken with regard to this path, but never mind.
>
> Yeah, probably true.  I guess if anyone actually expected to use one
> of these cards as a removable SDIO card (I have seen such dev boards
> long ago) then it would always have been possible for someone to
> remove the card at just the wrong time and break things.

Well, this isn't solely about card removal but driver removal as well.
And the latter can be managed from user space at any point in time.

>
>
> > > Testing shows that indeed your patch breaks mwifiex reset worse than
> > > it was before (AKA WiFi totally fails instead of it just killing
> > > Bluetooth).
> > >
> > > I think it may be better to add a new API call rather than trying to
> > > co-opt the old one.  Maybe put a WARN_ON() for the old API call to
> > > make people move away from it, or something?
> >
> > Thanks again for testing and for valuable feedback! Clearly this needs
> > a little more thinking.
> >
> > An additional concern I see with the "hotplug approach" implemented in
> > $subject patch is that it becomes unnecessarily heavy when there is
> > only one SDIO func driver bound.
> >
> > In one way I am tempted to try to address that situation, as it seems
> > a bit silly to do a full hotplug dance when it isn't needed.
>
> True, though I kinda like the heavy solution here.  At least in the
> mwifiex case this isn't a part of the normal flow.  AKA: we don't call
> this function during normal bootup nor during any normal operations.
> It's much more of an "oh crap, something's not working and we don't
> know what to do" type solution.  I mean, I guess it's still not
> uncommon that we end up in this code path due to the number of bugs in
> Marvell firmware, but I'm just trying to say that it's an error code
> path and not a normal one.  In my mind that means the more things we
> can re-init the better.

You have a point, but...

>
> If this was, on the other hand, a reset that we were supposed to
> always assert when doing a normal operation (like it wants us to reset
> it when we switch modes, or something) then a lighter operation would
> make more sense.

This is indeed the tricky part, as it depends on the level of bugs,
but also under what specific circumstances the reset is getting
called.

In the TI case (drivers/net/wireless/ti/wlcore/sdio.c) the reset is
executed in the "power on" case, which happens, for example, at system
resume. And we want system resume to be as fast as possible...

I am exploring a few options to deal with both cases, let's see what I
can come up with in a day or two.

Kind regards
Uffe
Ulf Hansson Oct. 25, 2019, 2:16 p.m. UTC | #5
On Wed, 23 Oct 2019 at 17:06, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 22 Oct 2019 at 16:47, Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Mon, Oct 21, 2019 at 11:51 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > > The problem I see here is that callers of this reset function aren't
> > > > expecting it to work this way.  Look specifically at
> > > > mwifiex_sdio_card_reset_work().  It's assuming that it needs to do
> > > > things like shutdown / reinit.  Now it's true that the old
> > > > mwifiex_sdio_card_reset_work() was pretty broken on any systems that
> > > > also had SDIO bluetooth, but presumably it worked OK on systems
> > > > without SDIO Bluetooth.  I don't think it'll work so well now.
> > >
> > > Good point!
> > >
> > > I guess I was hoping that running through ->remove() and then
> > > ->probe() for the SDIO func drivers would simply take care of
> > > whatever may be needed. In a way, this means the driver is already
> > > broken with regard to this path, but never mind.
> >
> > Yeah, probably true.  I guess if anyone actually expected to use one
> > of these cards as a removable SDIO card (I have seen such dev boards
> > long ago) then it would always have been possible for someone to
> > remove the card at just the wrong time and break things.
>
> Well, this isn't solely about card removal but driver removal as well.
> And the latter can be managed from user space at any point in time.
>
> >
> >
> > > > Testing shows that indeed your patch breaks mwifiex reset worse than
> > > > it was before (AKA WiFi totally fails instead of it just killing
> > > > Bluetooth).
> > > >
> > > > I think it may be better to add a new API call rather than trying to
> > > > co-opt the old one.  Maybe put a WARN_ON() for the old API call to
> > > > make people move away from it, or something?
> > >
> > > Thanks again for testing and for valuable feedback! Clearly this needs
> > > a little more thinking.
> > >
> > > An additional concern I see with the "hotplug approach" implemented in
> > > $subject patch is that it becomes unnecessarily heavy when there is
> > > only one SDIO func driver bound.
> > >
> > > In one way I am tempted to try to address that situation, as it seems
> > > a bit silly to do a full hotplug dance when it isn't needed.
> >
> > True, though I kinda like the heavy solution here.  At least in the
> > mwifiex case this isn't a part of the normal flow.  AKA: we don't call
> > this function during normal bootup nor during any normal operations.
> > It's much more of an "oh crap, something's not working and we don't
> > know what to do" type solution.  I mean, I guess it's still not
> > uncommon that we end up in this code path due to the number of bugs in
> > Marvell firmware, but I'm just trying to say that it's an error code
> > path and not a normal one.  In my mind that means the more things we
> > can re-init the better.
>
> You have a point, but...
>
> >
> > If this was, on the other hand, a reset that we were supposed to
> > always assert when doing a normal operation (like it wants us to reset
> > it when we switch modes, or something) then a lighter operation would
> > make more sense.
>
> This is indeed the tricky part, as it depends on the level of bugs,
> but also under what specific circumstances the reset is getting
> called.
>
> In the TI case (drivers/net/wireless/ti/wlcore/sdio.c) the reset is
> executed in the "power on" case, which happens, for example, at system
> resume. And we want system resume to be as fast as possible...
>
> I am exploring a few options to deal with both cases, let's see what I
> can come up with in a day or two.

FYI, still exploring and trying a few slightly different options. I
should be able to post something early next week, stay tuned. :-)

Kind regards
Uffe

Patch

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 6f8342702c73..39c4567e39d8 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1469,8 +1469,7 @@  void mmc_detach_bus(struct mmc_host *host)
 	mmc_bus_put(host);
 }
 
-static void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
-				bool cd_irq)
+void _mmc_detect_change(struct mmc_host *host, unsigned long delay, bool cd_irq)
 {
 	/*
 	 * If the device is configured as wakeup, we prevent a new sleep for
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index 328c78dbee66..575ac0257af2 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -70,6 +70,8 @@  void mmc_rescan(struct work_struct *work);
 void mmc_start_host(struct mmc_host *host);
 void mmc_stop_host(struct mmc_host *host);
 
+void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
+			bool cd_irq);
 int _mmc_detect_card_removed(struct mmc_host *host);
 int mmc_detect_card_removed(struct mmc_host *host);
 
diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
index 26cabd53ddc5..5d7462c223c3 100644
--- a/drivers/mmc/core/sdio.c
+++ b/drivers/mmc/core/sdio.c
@@ -1050,8 +1050,15 @@  static int mmc_sdio_runtime_resume(struct mmc_host *host)
 
 static int mmc_sdio_hw_reset(struct mmc_host *host)
 {
-	mmc_power_cycle(host, host->card->ocr);
-	return mmc_sdio_reinit_card(host);
+	/*
+	 * We may have multiple SDIO funcs. Rather than informing them all,
+	 * let's trigger a removal and force a new rescan of the card.
+	 */
+	host->rescan_entered = 0;
+	mmc_card_set_removed(host->card);
+	_mmc_detect_change(host, 0, false);
+
+	return 0;
 }
 
 static int mmc_sdio_sw_reset(struct mmc_host *host)