diff mbox series

[4/7] mmc: sdio: Drop powered-on re-init at runtime resume and HW reset

Message ID 20190618153448.27145-5-ulf.hansson@linaro.org (mailing list archive)
State Not Applicable
Delegated to: Johannes Berg
Headers show
Series mmc: sdio: Various fixes/improvements for SDIO PM | expand

Commit Message

Ulf Hansson June 18, 2019, 3:34 p.m. UTC
To use the so called powered-on re-initialization of an SDIO card, the
power to the card must obviously have stayed on. If not, the initialization
will simply fail.

In the runtime suspend case, the card is always powered off. Hence, let's
drop the support for powered-on re-initialization during runtime resume, as
it doesn't make sense.

Moreover, during a HW reset, the point is to cut the power to the card and
then do fresh re-initialization. Therefore drop the support for powered-on
re-initialization during HW reset.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/mmc/core/sdio.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

Comments

Doug Anderson July 4, 2019, 12:01 a.m. UTC | #1
Hi,

On Tue, Jun 18, 2019 at 8:35 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> To use the so called powered-on re-initialization of an SDIO card, the
> power to the card must obviously have stayed on. If not, the initialization
> will simply fail.
>
> In the runtime suspend case, the card is always powered off. Hence, let's
> drop the support for powered-on re-initialization during runtime resume, as
> it doesn't make sense.
>
> Moreover, during a HW reset, the point is to cut the power to the card and
> then do fresh re-initialization. Therefore drop the support for powered-on
> re-initialization during HW reset.
>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
>  drivers/mmc/core/sdio.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)

This has been on my list of things to test for a while but I never
quite got to it...

...and then, today, I spent time bisecting why the "reset"
functionality of miwfiex is broken on my 4.19 kernel [1].  AKA, this
is broken:

cd /sys/kernel/debug/mwifiex/mlan0
echo 1 > reset

I finally bisected the problem and tracked it down to commit
ca8971ca5753 ("mmc: dw_mmc: Prevent runtime PM suspend when SDIO IRQs
are enabled"), which embarrassingly has my Tested-by on it.  I guess I
never tested the Marvell reset call.  :-/

I dug a little and found that when the Marvell code did its reset we
ended up getting a call to dw_mci_enable_sdio_irq(enb=0) and never saw
a dw_mci_enable_sdio_irq(enb=1) after.  I tracked it down further and
found that specifically it was the call to mmc_signal_sdio_irq() in
mmc_sdio_power_restore() that was making the call.  The call stack
shown for the "enb=0" call:

[<c071a290>] (dw_mci_enable_sdio_irq) from [<c070a960>]
(mmc_sdio_power_restore+0x98/0xc0)
[<c070a960>] (mmc_sdio_power_restore) from [<c070a9b4>]
(mmc_sdio_reset+0x2c/0x30)
[<c070a9b4>] (mmc_signal_sdio_irq) from [<c06ff160>] (mmc_hw_reset+0xbc/0x138)
[<c06ff160>] (mmc_hw_reset) from [<bf1bbad8>]
(mwifiex_sdio_work+0x5d4/0x678 [mwifiex_sdio])
[<bf1bbad8>] (mwifiex_sdio_work [mwifiex_sdio]) from [<c0247cd0>]
(process_one_work+0x290/0x4b4)

I picked your patch here (which gets rid of the call to
mmc_signal_sdio_irq()) and magically the problem went away because
there is no more call to mmc_signal_sdio_irq().

I personally don't have lots of history about the whole
"powered_resume" code path.  I checked and mmc_card_keep_power() was 0
in my test case of getting called from hw_reset, so the rest of this
patch doesn't affect me at all.  This surprised me a little since I
saw "MMC_PM_KEEP_POWER" being set in mwifiex but then I realized that
it was only set for the duration of suspend and then cleared by the
core.  ;-)

I will also say that I don't have any test case or knowledge of how
SDIO runtime suspend/resume is supposed to work since on dw_mmc SDIO
cards are currently not allowed to runtime suspend anyway.  ;-)


So I guess the result of all that long-winded reply is that for on
rk3288-veyron-jerry:

Fixes: ca8971ca5753 ("mmc: dw_mmc: Prevent runtime PM suspend when
SDIO IRQs are enabled")
Tested-by: Douglas Anderson <dianders@chromium.org>


One last note is that, though Marvell WiFi works after a reset after
this commit, Marvell Bluetooth (on the same SDIO module) doesn't.  I
guess next week it'll be another bisect...

[1] https://crbug.com/981113



-Doug
Ulf Hansson July 8, 2019, 10:53 a.m. UTC | #2
On Thu, 4 Jul 2019 at 02:02, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Tue, Jun 18, 2019 at 8:35 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > To use the so called powered-on re-initialization of an SDIO card, the
> > power to the card must obviously have stayed on. If not, the initialization
> > will simply fail.
> >
> > In the runtime suspend case, the card is always powered off. Hence, let's
> > drop the support for powered-on re-initialization during runtime resume, as
> > it doesn't make sense.
> >
> > Moreover, during a HW reset, the point is to cut the power to the card and
> > then do fresh re-initialization. Therefore drop the support for powered-on
> > re-initialization during HW reset.
> >
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >  drivers/mmc/core/sdio.c | 8 +-------
> >  1 file changed, 1 insertion(+), 7 deletions(-)
>
> This has been on my list of things to test for a while but I never
> quite got to it...
>
> ...and then, today, I spent time bisecting why the "reset"
> functionality of miwfiex is broken on my 4.19 kernel [1].  AKA, this
> is broken:
>
> cd /sys/kernel/debug/mwifiex/mlan0
> echo 1 > reset
>
> I finally bisected the problem and tracked it down to commit
> ca8971ca5753 ("mmc: dw_mmc: Prevent runtime PM suspend when SDIO IRQs
> are enabled"), which embarrassingly has my Tested-by on it.  I guess I
> never tested the Marvell reset call.  :-/
>
> I dug a little and found that when the Marvell code did its reset we
> ended up getting a call to dw_mci_enable_sdio_irq(enb=0) and never saw
> a dw_mci_enable_sdio_irq(enb=1) after.  I tracked it down further and
> found that specifically it was the call to mmc_signal_sdio_irq() in
> mmc_sdio_power_restore() that was making the call.  The call stack
> shown for the "enb=0" call:
>
> [<c071a290>] (dw_mci_enable_sdio_irq) from [<c070a960>]
> (mmc_sdio_power_restore+0x98/0xc0)
> [<c070a960>] (mmc_sdio_power_restore) from [<c070a9b4>]
> (mmc_sdio_reset+0x2c/0x30)
> [<c070a9b4>] (mmc_signal_sdio_irq) from [<c06ff160>] (mmc_hw_reset+0xbc/0x138)
> [<c06ff160>] (mmc_hw_reset) from [<bf1bbad8>]
> (mwifiex_sdio_work+0x5d4/0x678 [mwifiex_sdio])
> [<bf1bbad8>] (mwifiex_sdio_work [mwifiex_sdio]) from [<c0247cd0>]
> (process_one_work+0x290/0x4b4)
>
> I picked your patch here (which gets rid of the call to
> mmc_signal_sdio_irq()) and magically the problem went away because
> there is no more call to mmc_signal_sdio_irq().
>
> I personally don't have lots of history about the whole
> "powered_resume" code path.  I checked and mmc_card_keep_power() was 0
> in my test case of getting called from hw_reset, so the rest of this
> patch doesn't affect me at all.  This surprised me a little since I
> saw "MMC_PM_KEEP_POWER" being set in mwifiex but then I realized that
> it was only set for the duration of suspend and then cleared by the
> core.  ;-)
>
> I will also say that I don't have any test case or knowledge of how
> SDIO runtime suspend/resume is supposed to work since on dw_mmc SDIO
> cards are currently not allowed to runtime suspend anyway.  ;-)
>
>
> So I guess the result of all that long-winded reply is that for on
> rk3288-veyron-jerry:
>
> Fixes: ca8971ca5753 ("mmc: dw_mmc: Prevent runtime PM suspend when
> SDIO IRQs are enabled")
> Tested-by: Douglas Anderson <dianders@chromium.org>

Thanks a lot for testing and for your detailed feedback. I have
amended the patch by adding your tags above.

Moreover, we seems to need this for stable as well, but I am leaving
that to be managed as a separate task. We may even consider the hole
series for stable, but that requires more testing first.

>
>
> One last note is that, though Marvell WiFi works after a reset after
> this commit, Marvell Bluetooth (on the same SDIO module) doesn't.  I
> guess next week it'll be another bisect...

Is the Bluetooth connected to the same SDIO interface, thus the
Bluetooth driver is an SDIO func driver?

>
> [1] https://crbug.com/981113
>
>
>
> -Doug

Kind regards
Uffe
Doug Anderson July 8, 2019, 9:12 p.m. UTC | #3
Hi,

On Mon, Jul 8, 2019 at 3:54 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Thu, 4 Jul 2019 at 02:02, Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Tue, Jun 18, 2019 at 8:35 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > To use the so called powered-on re-initialization of an SDIO card, the
> > > power to the card must obviously have stayed on. If not, the initialization
> > > will simply fail.
> > >
> > > In the runtime suspend case, the card is always powered off. Hence, let's
> > > drop the support for powered-on re-initialization during runtime resume, as
> > > it doesn't make sense.
> > >
> > > Moreover, during a HW reset, the point is to cut the power to the card and
> > > then do fresh re-initialization. Therefore drop the support for powered-on
> > > re-initialization during HW reset.
> > >
> > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > ---
> > >  drivers/mmc/core/sdio.c | 8 +-------
> > >  1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > This has been on my list of things to test for a while but I never
> > quite got to it...
> >
> > ...and then, today, I spent time bisecting why the "reset"
> > functionality of miwfiex is broken on my 4.19 kernel [1].  AKA, this
> > is broken:
> >
> > cd /sys/kernel/debug/mwifiex/mlan0
> > echo 1 > reset
> >
> > I finally bisected the problem and tracked it down to commit
> > ca8971ca5753 ("mmc: dw_mmc: Prevent runtime PM suspend when SDIO IRQs
> > are enabled"), which embarrassingly has my Tested-by on it.  I guess I
> > never tested the Marvell reset call.  :-/
> >
> > I dug a little and found that when the Marvell code did its reset we
> > ended up getting a call to dw_mci_enable_sdio_irq(enb=0) and never saw
> > a dw_mci_enable_sdio_irq(enb=1) after.  I tracked it down further and
> > found that specifically it was the call to mmc_signal_sdio_irq() in
> > mmc_sdio_power_restore() that was making the call.  The call stack
> > shown for the "enb=0" call:
> >
> > [<c071a290>] (dw_mci_enable_sdio_irq) from [<c070a960>]
> > (mmc_sdio_power_restore+0x98/0xc0)
> > [<c070a960>] (mmc_sdio_power_restore) from [<c070a9b4>]
> > (mmc_sdio_reset+0x2c/0x30)
> > [<c070a9b4>] (mmc_signal_sdio_irq) from [<c06ff160>] (mmc_hw_reset+0xbc/0x138)
> > [<c06ff160>] (mmc_hw_reset) from [<bf1bbad8>]
> > (mwifiex_sdio_work+0x5d4/0x678 [mwifiex_sdio])
> > [<bf1bbad8>] (mwifiex_sdio_work [mwifiex_sdio]) from [<c0247cd0>]
> > (process_one_work+0x290/0x4b4)
> >
> > I picked your patch here (which gets rid of the call to
> > mmc_signal_sdio_irq()) and magically the problem went away because
> > there is no more call to mmc_signal_sdio_irq().
> >
> > I personally don't have lots of history about the whole
> > "powered_resume" code path.  I checked and mmc_card_keep_power() was 0
> > in my test case of getting called from hw_reset, so the rest of this
> > patch doesn't affect me at all.  This surprised me a little since I
> > saw "MMC_PM_KEEP_POWER" being set in mwifiex but then I realized that
> > it was only set for the duration of suspend and then cleared by the
> > core.  ;-)
> >
> > I will also say that I don't have any test case or knowledge of how
> > SDIO runtime suspend/resume is supposed to work since on dw_mmc SDIO
> > cards are currently not allowed to runtime suspend anyway.  ;-)
> >
> >
> > So I guess the result of all that long-winded reply is that for on
> > rk3288-veyron-jerry:
> >
> > Fixes: ca8971ca5753 ("mmc: dw_mmc: Prevent runtime PM suspend when
> > SDIO IRQs are enabled")
> > Tested-by: Douglas Anderson <dianders@chromium.org>
>
> Thanks a lot for testing and for your detailed feedback. I have
> amended the patch by adding your tags above.

Sure!  I'm going to try to do some detailed testing on the next patch
too to confirm it's OK, but I have a few other tasks to get to first.
;-)


> Moreover, we seems to need this for stable as well, but I am leaving
> that to be managed as a separate task. We may even consider the hole
> series for stable, but that requires more testing first.

Sure, makes sense.  We'll pick it to the Chrome OS 4.19 kernel
directly but it's usually nice to get fixes like this into stable so
everyone can benefit.


> > One last note is that, though Marvell WiFi works after a reset after
> > this commit, Marvell Bluetooth (on the same SDIO module) doesn't.  I
> > guess next week it'll be another bisect...
>
> Is the Bluetooth connected to the same SDIO interface, thus the
> Bluetooth driver is an SDIO func driver?

Yes, it's a SDIO func driver connected to the same interface.  So far
I've managed to confirm the problem on:

v4.4 (plus 76ae3e26ea43 mwifiex: add debugfs file for testing reset of card)
v4.9
next-20190708

...so it seems like it's not a "regression", it's just never worked.
:-P  I guess I'll have to see if I can figure out what's different in
our chromeos-3.14 kernel.  Ah, I see.  In 3.14 we had this solution:

pr_err("Resetting card...\n");
mmc_remove_host(reset_host);
/* 200ms delay is based on experiment with sdhci controller */
mdelay(200);
reset_host->rescan_entered = 0;
mmc_add_host(reset_host);

...I think that didn't fly upstream.  ...but I can confirm that this works fine:

cd /sys/bus/platform/drivers/dwmmc_rockchip
echo ff0d0000.dwmmc > unbind
sleep .5
echo ff0d0000.dwmmc > bind

...so I guess this boils down to: how does the mwifiex reset code not
behave like a full removal and re-insertion of the card?  Oh, but
maybe that's obvious.  We're doing all the reset / re-init from the
WiFi side of things (mwifiex_sdio_card_reset_work) but I don't think
anything is communicated to the Bluetooth side of things.  Presumably
this is just totally broken for everyone?  ...or am I confused?


-Doug


-Doug
Ulf Hansson July 9, 2019, 12:01 p.m. UTC | #4
On Mon, 8 Jul 2019 at 23:12, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Jul 8, 2019 at 3:54 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > On Thu, 4 Jul 2019 at 02:02, Doug Anderson <dianders@chromium.org> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, Jun 18, 2019 at 8:35 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > >
> > > > To use the so called powered-on re-initialization of an SDIO card, the
> > > > power to the card must obviously have stayed on. If not, the initialization
> > > > will simply fail.
> > > >
> > > > In the runtime suspend case, the card is always powered off. Hence, let's
> > > > drop the support for powered-on re-initialization during runtime resume, as
> > > > it doesn't make sense.
> > > >
> > > > Moreover, during a HW reset, the point is to cut the power to the card and
> > > > then do fresh re-initialization. Therefore drop the support for powered-on
> > > > re-initialization during HW reset.
> > > >
> > > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > > ---
> > > >  drivers/mmc/core/sdio.c | 8 +-------
> > > >  1 file changed, 1 insertion(+), 7 deletions(-)
> > >
> > > This has been on my list of things to test for a while but I never
> > > quite got to it...
> > >
> > > ...and then, today, I spent time bisecting why the "reset"
> > > functionality of miwfiex is broken on my 4.19 kernel [1].  AKA, this
> > > is broken:
> > >
> > > cd /sys/kernel/debug/mwifiex/mlan0
> > > echo 1 > reset
> > >
> > > I finally bisected the problem and tracked it down to commit
> > > ca8971ca5753 ("mmc: dw_mmc: Prevent runtime PM suspend when SDIO IRQs
> > > are enabled"), which embarrassingly has my Tested-by on it.  I guess I
> > > never tested the Marvell reset call.  :-/
> > >
> > > I dug a little and found that when the Marvell code did its reset we
> > > ended up getting a call to dw_mci_enable_sdio_irq(enb=0) and never saw
> > > a dw_mci_enable_sdio_irq(enb=1) after.  I tracked it down further and
> > > found that specifically it was the call to mmc_signal_sdio_irq() in
> > > mmc_sdio_power_restore() that was making the call.  The call stack
> > > shown for the "enb=0" call:
> > >
> > > [<c071a290>] (dw_mci_enable_sdio_irq) from [<c070a960>]
> > > (mmc_sdio_power_restore+0x98/0xc0)
> > > [<c070a960>] (mmc_sdio_power_restore) from [<c070a9b4>]
> > > (mmc_sdio_reset+0x2c/0x30)
> > > [<c070a9b4>] (mmc_signal_sdio_irq) from [<c06ff160>] (mmc_hw_reset+0xbc/0x138)
> > > [<c06ff160>] (mmc_hw_reset) from [<bf1bbad8>]
> > > (mwifiex_sdio_work+0x5d4/0x678 [mwifiex_sdio])
> > > [<bf1bbad8>] (mwifiex_sdio_work [mwifiex_sdio]) from [<c0247cd0>]
> > > (process_one_work+0x290/0x4b4)
> > >
> > > I picked your patch here (which gets rid of the call to
> > > mmc_signal_sdio_irq()) and magically the problem went away because
> > > there is no more call to mmc_signal_sdio_irq().
> > >
> > > I personally don't have lots of history about the whole
> > > "powered_resume" code path.  I checked and mmc_card_keep_power() was 0
> > > in my test case of getting called from hw_reset, so the rest of this
> > > patch doesn't affect me at all.  This surprised me a little since I
> > > saw "MMC_PM_KEEP_POWER" being set in mwifiex but then I realized that
> > > it was only set for the duration of suspend and then cleared by the
> > > core.  ;-)
> > >
> > > I will also say that I don't have any test case or knowledge of how
> > > SDIO runtime suspend/resume is supposed to work since on dw_mmc SDIO
> > > cards are currently not allowed to runtime suspend anyway.  ;-)
> > >
> > >
> > > So I guess the result of all that long-winded reply is that for on
> > > rk3288-veyron-jerry:
> > >
> > > Fixes: ca8971ca5753 ("mmc: dw_mmc: Prevent runtime PM suspend when
> > > SDIO IRQs are enabled")
> > > Tested-by: Douglas Anderson <dianders@chromium.org>
> >
> > Thanks a lot for testing and for your detailed feedback. I have
> > amended the patch by adding your tags above.
>
> Sure!  I'm going to try to do some detailed testing on the next patch
> too to confirm it's OK, but I have a few other tasks to get to first.
> ;-)
>
>
> > Moreover, we seems to need this for stable as well, but I am leaving
> > that to be managed as a separate task. We may even consider the hole
> > series for stable, but that requires more testing first.
>
> Sure, makes sense.  We'll pick it to the Chrome OS 4.19 kernel
> directly but it's usually nice to get fixes like this into stable so
> everyone can benefit.
>
>
> > > One last note is that, though Marvell WiFi works after a reset after
> > > this commit, Marvell Bluetooth (on the same SDIO module) doesn't.  I
> > > guess next week it'll be another bisect...
> >
> > Is the Bluetooth connected to the same SDIO interface, thus the
> > Bluetooth driver is an SDIO func driver?
>
> Yes, it's a SDIO func driver connected to the same interface.  So far
> I've managed to confirm the problem on:
>
> v4.4 (plus 76ae3e26ea43 mwifiex: add debugfs file for testing reset of card)
> v4.9
> next-20190708
>
> ...so it seems like it's not a "regression", it's just never worked.
> :-P  I guess I'll have to see if I can figure out what's different in
> our chromeos-3.14 kernel.  Ah, I see.  In 3.14 we had this solution:
>
> pr_err("Resetting card...\n");
> mmc_remove_host(reset_host);
> /* 200ms delay is based on experiment with sdhci controller */
> mdelay(200);
> reset_host->rescan_entered = 0;
> mmc_add_host(reset_host);
>
> ...I think that didn't fly upstream.  ...but I can confirm that this works fine:
>
> cd /sys/bus/platform/drivers/dwmmc_rockchip
> echo ff0d0000.dwmmc > unbind
> sleep .5
> echo ff0d0000.dwmmc > bind
>
> ...so I guess this boils down to: how does the mwifiex reset code not
> behave like a full removal and re-insertion of the card?  Oh, but
> maybe that's obvious.  We're doing all the reset / re-init from the
> WiFi side of things (mwifiex_sdio_card_reset_work) but I don't think
> anything is communicated to the Bluetooth side of things.  Presumably
> this is just totally broken for everyone?  ...or am I confused?

Nope, that is most likely what is happening.

I am not sure what is the best method to deal with this. Perhaps we
should invent some callback the SDIO core code can call, for each
active SDIO func on the particular SDIO card that becomes reset.

Or is there a better way you think?

Kind regards
Uffe
Doug Anderson July 9, 2019, 11:35 p.m. UTC | #5
Hi,

On Tue, Jul 9, 2019 at 5:02 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Mon, 8 Jul 2019 at 23:12, Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Mon, Jul 8, 2019 at 3:54 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > On Thu, 4 Jul 2019 at 02:02, Doug Anderson <dianders@chromium.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Tue, Jun 18, 2019 at 8:35 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > > > >
> > > > > To use the so called powered-on re-initialization of an SDIO card, the
> > > > > power to the card must obviously have stayed on. If not, the initialization
> > > > > will simply fail.
> > > > >
> > > > > In the runtime suspend case, the card is always powered off. Hence, let's
> > > > > drop the support for powered-on re-initialization during runtime resume, as
> > > > > it doesn't make sense.
> > > > >
> > > > > Moreover, during a HW reset, the point is to cut the power to the card and
> > > > > then do fresh re-initialization. Therefore drop the support for powered-on
> > > > > re-initialization during HW reset.
> > > > >
> > > > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > ---
> > > > >  drivers/mmc/core/sdio.c | 8 +-------
> > > > >  1 file changed, 1 insertion(+), 7 deletions(-)
> > > >
> > > > This has been on my list of things to test for a while but I never
> > > > quite got to it...
> > > >
> > > > ...and then, today, I spent time bisecting why the "reset"
> > > > functionality of miwfiex is broken on my 4.19 kernel [1].  AKA, this
> > > > is broken:
> > > >
> > > > cd /sys/kernel/debug/mwifiex/mlan0
> > > > echo 1 > reset
> > > >
> > > > I finally bisected the problem and tracked it down to commit
> > > > ca8971ca5753 ("mmc: dw_mmc: Prevent runtime PM suspend when SDIO IRQs
> > > > are enabled"), which embarrassingly has my Tested-by on it.  I guess I
> > > > never tested the Marvell reset call.  :-/
> > > >
> > > > I dug a little and found that when the Marvell code did its reset we
> > > > ended up getting a call to dw_mci_enable_sdio_irq(enb=0) and never saw
> > > > a dw_mci_enable_sdio_irq(enb=1) after.  I tracked it down further and
> > > > found that specifically it was the call to mmc_signal_sdio_irq() in
> > > > mmc_sdio_power_restore() that was making the call.  The call stack
> > > > shown for the "enb=0" call:
> > > >
> > > > [<c071a290>] (dw_mci_enable_sdio_irq) from [<c070a960>]
> > > > (mmc_sdio_power_restore+0x98/0xc0)
> > > > [<c070a960>] (mmc_sdio_power_restore) from [<c070a9b4>]
> > > > (mmc_sdio_reset+0x2c/0x30)
> > > > [<c070a9b4>] (mmc_signal_sdio_irq) from [<c06ff160>] (mmc_hw_reset+0xbc/0x138)
> > > > [<c06ff160>] (mmc_hw_reset) from [<bf1bbad8>]
> > > > (mwifiex_sdio_work+0x5d4/0x678 [mwifiex_sdio])
> > > > [<bf1bbad8>] (mwifiex_sdio_work [mwifiex_sdio]) from [<c0247cd0>]
> > > > (process_one_work+0x290/0x4b4)
> > > >
> > > > I picked your patch here (which gets rid of the call to
> > > > mmc_signal_sdio_irq()) and magically the problem went away because
> > > > there is no more call to mmc_signal_sdio_irq().
> > > >
> > > > I personally don't have lots of history about the whole
> > > > "powered_resume" code path.  I checked and mmc_card_keep_power() was 0
> > > > in my test case of getting called from hw_reset, so the rest of this
> > > > patch doesn't affect me at all.  This surprised me a little since I
> > > > saw "MMC_PM_KEEP_POWER" being set in mwifiex but then I realized that
> > > > it was only set for the duration of suspend and then cleared by the
> > > > core.  ;-)
> > > >
> > > > I will also say that I don't have any test case or knowledge of how
> > > > SDIO runtime suspend/resume is supposed to work since on dw_mmc SDIO
> > > > cards are currently not allowed to runtime suspend anyway.  ;-)
> > > >
> > > >
> > > > So I guess the result of all that long-winded reply is that for on
> > > > rk3288-veyron-jerry:
> > > >
> > > > Fixes: ca8971ca5753 ("mmc: dw_mmc: Prevent runtime PM suspend when
> > > > SDIO IRQs are enabled")
> > > > Tested-by: Douglas Anderson <dianders@chromium.org>
> > >
> > > Thanks a lot for testing and for your detailed feedback. I have
> > > amended the patch by adding your tags above.
> >
> > Sure!  I'm going to try to do some detailed testing on the next patch
> > too to confirm it's OK, but I have a few other tasks to get to first.
> > ;-)
> >
> >
> > > Moreover, we seems to need this for stable as well, but I am leaving
> > > that to be managed as a separate task. We may even consider the hole
> > > series for stable, but that requires more testing first.
> >
> > Sure, makes sense.  We'll pick it to the Chrome OS 4.19 kernel
> > directly but it's usually nice to get fixes like this into stable so
> > everyone can benefit.
> >
> >
> > > > One last note is that, though Marvell WiFi works after a reset after
> > > > this commit, Marvell Bluetooth (on the same SDIO module) doesn't.  I
> > > > guess next week it'll be another bisect...
> > >
> > > Is the Bluetooth connected to the same SDIO interface, thus the
> > > Bluetooth driver is an SDIO func driver?
> >
> > Yes, it's a SDIO func driver connected to the same interface.  So far
> > I've managed to confirm the problem on:
> >
> > v4.4 (plus 76ae3e26ea43 mwifiex: add debugfs file for testing reset of card)
> > v4.9
> > next-20190708
> >
> > ...so it seems like it's not a "regression", it's just never worked.
> > :-P  I guess I'll have to see if I can figure out what's different in
> > our chromeos-3.14 kernel.  Ah, I see.  In 3.14 we had this solution:
> >
> > pr_err("Resetting card...\n");
> > mmc_remove_host(reset_host);
> > /* 200ms delay is based on experiment with sdhci controller */
> > mdelay(200);
> > reset_host->rescan_entered = 0;
> > mmc_add_host(reset_host);
> >
> > ...I think that didn't fly upstream.  ...but I can confirm that this works fine:
> >
> > cd /sys/bus/platform/drivers/dwmmc_rockchip
> > echo ff0d0000.dwmmc > unbind
> > sleep .5
> > echo ff0d0000.dwmmc > bind
> >
> > ...so I guess this boils down to: how does the mwifiex reset code not
> > behave like a full removal and re-insertion of the card?  Oh, but
> > maybe that's obvious.  We're doing all the reset / re-init from the
> > WiFi side of things (mwifiex_sdio_card_reset_work) but I don't think
> > anything is communicated to the Bluetooth side of things.  Presumably
> > this is just totally broken for everyone?  ...or am I confused?
>
> Nope, that is most likely what is happening.
>
> I am not sure what is the best method to deal with this. Perhaps we
> should invent some callback the SDIO core code can call, for each
> active SDIO func on the particular SDIO card that becomes reset.
>
> Or is there a better way you think?

I didn't get a chance to fully dig today, but I keep thinking that the
cleanest way would be if I could somehow tell the MMC core to pretend
to unplug the card and then re-plug it in.  Then we could go through
all the standard code paths.  I remember the last time I looked for a
nice way to do that I couldn't find one.

...barring that I wonder if it's enough to just remove and re-add all
the active SDIO funcs?

-Doug
diff mbox series

Patch

diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
index 29f86c1e9923..a9bfcae8db5b 100644
--- a/drivers/mmc/core/sdio.c
+++ b/drivers/mmc/core/sdio.c
@@ -1028,13 +1028,7 @@  static int mmc_sdio_resume(struct mmc_host *host)
 
 static int mmc_sdio_power_restore(struct mmc_host *host)
 {
-	int ret;
-
-	ret = mmc_sdio_reinit_card(host, mmc_card_keep_power(host));
-	if (!ret && host->sdio_irqs)
-		mmc_signal_sdio_irq(host);
-
-	return ret;
+	return mmc_sdio_reinit_card(host, 0);
 }
 
 static int mmc_sdio_runtime_suspend(struct mmc_host *host)