Message ID | 20191017135739.1315-3-ulf.hansson@linaro.org |
---|---|
State | Not Applicable |
Delegated to: | Johannes Berg |
Series | mmc: core: Fixup HW reset for SDIO cards |
Hi,

On Thu, Oct 17, 2019 at 6:58 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> It have turned out that it's not a good idea to try to power cycle and to
> re-initialize the SDIO card, via mmc_hw_reset. This because there may be
> multiple SDIO funcs attached to the same SDIO card.
>
> To solve this problem, we would need to inform each of the SDIO func in
> some way when mmc_sdio_hw_reset() gets called, but that isn't an entirely
> trivial operation. Therefore, let's instead take the easy way out, by
> triggering a card removal and force a new rescan of the SDIO card.
>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> ---
>  drivers/mmc/core/core.c |  3 +--
>  drivers/mmc/core/core.h |  2 ++
>  drivers/mmc/core/sdio.c | 11 +++++++++--
>  3 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 6f8342702c73..39c4567e39d8 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -1469,8 +1469,7 @@ void mmc_detach_bus(struct mmc_host *host)
>         mmc_bus_put(host);
>  }
>
> -static void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
> -                               bool cd_irq)
> +void _mmc_detect_change(struct mmc_host *host, unsigned long delay, bool cd_irq)
>  {
>         /*
>          * If the device is configured as wakeup, we prevent a new sleep for
> diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
> index 328c78dbee66..575ac0257af2 100644
> --- a/drivers/mmc/core/core.h
> +++ b/drivers/mmc/core/core.h
> @@ -70,6 +70,8 @@ void mmc_rescan(struct work_struct *work);
>  void mmc_start_host(struct mmc_host *host);
>  void mmc_stop_host(struct mmc_host *host);
>
> +void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
> +                       bool cd_irq);
>  int _mmc_detect_card_removed(struct mmc_host *host);
>  int mmc_detect_card_removed(struct mmc_host *host);
>
> diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
> index 26cabd53ddc5..5d7462c223c3 100644
> --- a/drivers/mmc/core/sdio.c
> +++ b/drivers/mmc/core/sdio.c
> @@ -1050,8 +1050,15 @@ static int mmc_sdio_runtime_resume(struct mmc_host *host)
>
>  static int mmc_sdio_hw_reset(struct mmc_host *host)
>  {
> -       mmc_power_cycle(host, host->card->ocr);
> -       return mmc_sdio_reinit_card(host);
> +       /*
> +        * We may have more multiple SDIO funcs. Rather than to inform them all,
> +        * let's trigger a removal and force a new rescan of the card.
> +        */
> +       host->rescan_entered = 0;
> +       mmc_card_set_removed(host->card);
> +       _mmc_detect_change(host, 0, false);
> +
> +       return 0;
>  }

The problem I see here is that callers of this reset function aren't
expecting it to work this way. Look specifically at
mwifiex_sdio_card_reset_work(). It's assuming that it needs to do
things like shutdown / reinit. Now it's true that the old
mwifiex_sdio_card_reset_work() was pretty broken on any systems that
also had SDIO bluetooth, but presumably it worked OK on systems
without SDIO Bluetooth. I don't think it'll work so well now.

Testing shows that indeed your patch breaks mwifiex reset worse than
it was before (AKA WiFi totally fails instead of it just killing
Bluetooth).

I think it may be better to add a new API call rather than trying to
co-opt the old one. Maybe put a WARN_ON() for the old API call to
make people move away from it, or something?


...but on the bright side, your patch does seem to work. If I add my
patch from <https://lkml.kernel.org/r/20190722193939.125578-3-dianders@chromium.org>
and change "sdio_trigger_replug(func)" to
"mmc_hw_reset(func->card->host)" then I can pass reboot tests. I made
it through about 300 cycles of my old test before stopping the test to
work on other things.


In terms of the implementation, I will freely admit that I'm always
confused by the SD/MMC state machines, but as far as I can tell your
patch accomplishes the same thing as mine but in a bit simpler way.
;-) I even confirmed that with your patch mmc_power_off() /
mmc_power_up are still called...

Thanks!

-Doug
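Doug's suggestion (keep the old call for its old meaning, warn once when it is used, and give the "remove and re-probe" behaviour its own name) can be sketched as a small user-space model. Everything here is hypothetical: `struct model_host`, `model_legacy_hw_reset()` and `model_trigger_replug()` merely stand in for mmc_hw_reset() and for a new, explicitly named call along the lines of the sdio_trigger_replug() from the linked patch, and the warn-once flag only imitates the spirit of WARN_ON_ONCE(). It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space model, not kernel code.
 * legacy reset   ~ old mmc_hw_reset() semantics: power cycle + re-init in place
 * trigger replug ~ a new, explicitly named "remove and re-probe" request
 */
struct model_host {
	int  inplace_resets;  /* synchronous in-place resets performed */
	int  replugs_queued;  /* remove/re-probe requests handed to a rescan */
	bool legacy_warned;   /* imitates WARN_ON_ONCE(): complain only once */
	int  warnings;        /* number of warnings actually emitted */
};

/*
 * Old entry point keeps its old, synchronous meaning so existing callers
 * (which do their own shutdown/re-init around it) are not surprised, but it
 * complains once so that users migrate to the explicit API.
 */
static int model_legacy_hw_reset(struct model_host *host)
{
	if (!host->legacy_warned) {
		host->legacy_warned = true;
		host->warnings++;
		fprintf(stderr, "model: legacy reset used; prefer replug\n");
	}
	host->inplace_resets++;
	return 0;
}

/* New entry point: the caller explicitly opts in to remove + re-probe. */
static void model_trigger_replug(struct model_host *host)
{
	host->replugs_queued++;
}
```

Calling the legacy entry point twice performs two in-place resets but emits a single warning, while `model_trigger_replug()` only records a replug request and leaves the in-place counter alone. Keeping the two semantics behind different names is what would let a caller such as mwifiex keep its shutdown/re-init flow while other callers opt in to the heavier behaviour.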
On Tue, 22 Oct 2019 at 00:13, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Thu, Oct 17, 2019 at 6:58 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > It have turned out that it's not a good idea to try to power cycle and to
> > re-initialize the SDIO card, via mmc_hw_reset. This because there may be
> > multiple SDIO funcs attached to the same SDIO card.
> >
> > To solve this problem, we would need to inform each of the SDIO func in
> > some way when mmc_sdio_hw_reset() gets called, but that isn't an entirely
> > trivial operation. Therefore, let's instead take the easy way out, by
> > triggering a card removal and force a new rescan of the SDIO card.
> >
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > ---
> >  drivers/mmc/core/core.c |  3 +--
> >  drivers/mmc/core/core.h |  2 ++
> >  drivers/mmc/core/sdio.c | 11 +++++++++--
> >  3 files changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> > index 6f8342702c73..39c4567e39d8 100644
> > --- a/drivers/mmc/core/core.c
> > +++ b/drivers/mmc/core/core.c
> > @@ -1469,8 +1469,7 @@ void mmc_detach_bus(struct mmc_host *host)
> >         mmc_bus_put(host);
> >  }
> >
> > -static void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
> > -                               bool cd_irq)
> > +void _mmc_detect_change(struct mmc_host *host, unsigned long delay, bool cd_irq)
> >  {
> >         /*
> >          * If the device is configured as wakeup, we prevent a new sleep for
> > diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
> > index 328c78dbee66..575ac0257af2 100644
> > --- a/drivers/mmc/core/core.h
> > +++ b/drivers/mmc/core/core.h
> > @@ -70,6 +70,8 @@ void mmc_rescan(struct work_struct *work);
> >  void mmc_start_host(struct mmc_host *host);
> >  void mmc_stop_host(struct mmc_host *host);
> >
> > +void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
> > +                       bool cd_irq);
> >  int _mmc_detect_card_removed(struct mmc_host *host);
> >  int mmc_detect_card_removed(struct mmc_host *host);
> >
> > diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
> > index 26cabd53ddc5..5d7462c223c3 100644
> > --- a/drivers/mmc/core/sdio.c
> > +++ b/drivers/mmc/core/sdio.c
> > @@ -1050,8 +1050,15 @@ static int mmc_sdio_runtime_resume(struct mmc_host *host)
> >
> >  static int mmc_sdio_hw_reset(struct mmc_host *host)
> >  {
> > -       mmc_power_cycle(host, host->card->ocr);
> > -       return mmc_sdio_reinit_card(host);
> > +       /*
> > +        * We may have more multiple SDIO funcs. Rather than to inform them all,
> > +        * let's trigger a removal and force a new rescan of the card.
> > +        */
> > +       host->rescan_entered = 0;
> > +       mmc_card_set_removed(host->card);
> > +       _mmc_detect_change(host, 0, false);
> > +
> > +       return 0;
> >  }
>
> The problem I see here is that callers of this reset function aren't
> expecting it to work this way. Look specifically at
> mwifiex_sdio_card_reset_work(). It's assuming that it needs to do
> things like shutdown / reinit. Now it's true that the old
> mwifiex_sdio_card_reset_work() was pretty broken on any systems that
> also had SDIO bluetooth, but presumably it worked OK on systems
> without SDIO Bluetooth. I don't think it'll work so well now.

Good point!

I guess I was hoping that running through ->remove() and then
->probe() for the SDIO func drivers should simply take care of
whatever that may be needed. In some way this makes the driver broken
already in regards to this path, but never mind.

>
> Testing shows that indeed your patch breaks mwifiex reset worse than
> it was before (AKA WiFi totally fails instead of it just killing
> Bluetooth).
>
> I think it may be better to add a new API call rather than trying to
> co-opt the old one. Maybe put a WARN_ON() for the old API call to
> make people move away from it, or something?

Thanks again for testing and for valuable feedback! Clearly this needs
a little more thinking.

An additional concern I see with the "hotplug approach" implemented in
$subject patch, is that it becomes unnecessary heavy when there is
only one SDIO func driver bound.

In one way I am tempted to try to address that situation, as it seems
a bit silly to do full hotplug dance when it isn't needed.

>
>
> ...but on the bright side, your patch does seem to work. If I add my
> patch from <https://lkml.kernel.org/r/20190722193939.125578-3-dianders@chromium.org>
> and change "sdio_trigger_replug(func)" to
> "mmc_hw_reset(func->card->host)" then I can pass reboot tests. I made
> it through about 300 cycles of my old test before stopping the test to
> work on other things.

That's really good news. Thanks!

>
>
> In terms of the implementation, I will freely admit that I'm always
> confused by the SD/MMC state machines, but as far as I can tell your
> patch accomplishes the same thing as mine but in a bit simpler way.
> ;-) I even confirmed that with your patch mmc_power_off() /
> mmc_power_up are still called...
>
> Thanks!
>
> -Doug

Alright, let me rework this and post a new version. I'll keep you posted.

Kind regards
Uffe
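One way to address the "unnecessary heavy" concern is to pick the reset path from the number of bound func drivers: the full hotplug dance only pays off when some other func driver could be left with stale state. The sketch below is a hypothetical user-space model; `struct model_card`, `model_pick_reset()` and the "more than one bound driver" threshold are assumptions for illustration, not the mmc core's actual logic.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_FUNCS 7  /* SDIO cards expose at most 7 I/O functions */

enum model_reset_path {
	MODEL_RESET_INPLACE,  /* power cycle + re-init, synchronous */
	MODEL_RESET_REPLUG,   /* mark removed + rescan: ->remove()/->probe() all */
};

/* Hypothetical model of a card: which funcs currently have a driver bound. */
struct model_card {
	int  num_funcs;
	bool driver_bound[MODEL_MAX_FUNCS];
};

static int model_count_bound(const struct model_card *card)
{
	int n = 0;

	for (int i = 0; i < card->num_funcs; i++)
		if (card->driver_bound[i])
			n++;
	return n;
}

/*
 * Hypothetical policy: with zero or one bound driver, the caller is the only
 * party that must re-initialize, so a cheap in-place reset is enough. With
 * several bound drivers, force remove/re-probe so none keeps stale state.
 */
static enum model_reset_path model_pick_reset(const struct model_card *card)
{
	return model_count_bound(card) > 1 ? MODEL_RESET_REPLUG
					   : MODEL_RESET_INPLACE;
}
```

A WiFi-only card with one bound driver gets the in-place path, while a WiFi + Bluetooth combo with both drivers bound gets the replug path. Note that a count-based policy ignores the caller's intent: Doug's point that error recovery benefits from re-initializing as much as possible would need a different input, such as a flag passed by the caller.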
Hi,

On Mon, Oct 21, 2019 at 11:51 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> > The problem I see here is that callers of this reset function aren't
> > expecting it to work this way. Look specifically at
> > mwifiex_sdio_card_reset_work(). It's assuming that it needs to do
> > things like shutdown / reinit. Now it's true that the old
> > mwifiex_sdio_card_reset_work() was pretty broken on any systems that
> > also had SDIO bluetooth, but presumably it worked OK on systems
> > without SDIO Bluetooth. I don't think it'll work so well now.
>
> Good point!
>
> I guess I was hoping that running through ->remove() and then
> ->probe() for the SDIO func drivers should simply take care of
> whatever that may be needed. In some way this makes the driver broken
> already in regards to this path, but never mind.

Yeah, probably true. I guess if anyone actually expected to use one
of these cards as a removable SDIO card (I have seen such dev boards
long ago) then it would always have been possible for someone to
remove the card at just the wrong time and break things.


> > Testing shows that indeed your patch breaks mwifiex reset worse than
> > it was before (AKA WiFi totally fails instead of it just killing
> > Bluetooth).
> >
> > I think it may be better to add a new API call rather than trying to
> > co-opt the old one. Maybe put a WARN_ON() for the old API call to
> > make people move away from it, or something?
>
> Thanks again for testing and for valuable feedback! Clearly this needs
> a little more thinking.
>
> An additional concern I see with the "hotplug approach" implemented in
> $subject patch, is that it becomes unnecessary heavy when there is
> only one SDIO func driver bound.
>
> In one way I am tempted to try to address that situation, as it seems
> a bit silly to do full hotplug dance when it isn't needed.

True, though I kinda like the heavy solution here. At least in the
mwifiex case this isn't a part of the normal flow. AKA: we don't call
this function during normal bootup nor during any normal operations.
It's much more of an "oh crap, something's not working and we don't
know what to do" type solution. I mean, I guess it's still not
uncommon that we end up in this code path due to the number of bugs in
Marvell firmware, but I'm just trying to say that it's an error code
path and not a normal one. In my mind that means the more things we
can re-init the better.

If this was, on the other hand, a reset that we were supposed to
always assert when doing a normal operation (like it wants us to reset
it when we switch modes, or something) then a lighter operation would
make more sense.

-Doug
On Tue, 22 Oct 2019 at 16:47, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Oct 21, 2019 at 11:51 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> >
> > > The problem I see here is that callers of this reset function aren't
> > > expecting it to work this way. Look specifically at
> > > mwifiex_sdio_card_reset_work(). It's assuming that it needs to do
> > > things like shutdown / reinit. Now it's true that the old
> > > mwifiex_sdio_card_reset_work() was pretty broken on any systems that
> > > also had SDIO bluetooth, but presumably it worked OK on systems
> > > without SDIO Bluetooth. I don't think it'll work so well now.
> >
> > Good point!
> >
> > I guess I was hoping that running through ->remove() and then
> > ->probe() for the SDIO func drivers should simply take care of
> > whatever that may be needed. In some way this makes the driver broken
> > already in regards to this path, but never mind.
>
> Yeah, probably true. I guess if anyone actually expected to use one
> of these cards as a removable SDIO card (I have seen such dev boards
> long ago) then it would always have been possible for someone to
> remove the card at just the wrong time and break things.

Well, this isn't solely about card removal but driver removal as well.
And the latter can be managed from user space at any point in time.

>
>
> > > Testing shows that indeed your patch breaks mwifiex reset worse than
> > > it was before (AKA WiFi totally fails instead of it just killing
> > > Bluetooth).
> > >
> > > I think it may be better to add a new API call rather than trying to
> > > co-opt the old one. Maybe put a WARN_ON() for the old API call to
> > > make people move away from it, or something?
> >
> > Thanks again for testing and for valuable feedback! Clearly this needs
> > a little more thinking.
> >
> > An additional concern I see with the "hotplug approach" implemented in
> > $subject patch, is that it becomes unnecessary heavy when there is
> > only one SDIO func driver bound.
> >
> > In one way I am tempted to try to address that situation, as it seems
> > a bit silly to do full hotplug dance when it isn't needed.
>
> True, though I kinda like the heavy solution here. At least in the
> mwifiex case this isn't a part of the normal flow. AKA: we don't call
> this function during normal bootup nor during any normal operations.
> It's much more of an "oh crap, something's not working and we don't
> know what to do" type solution. I mean, I guess it's still not
> uncommon that we end up in this code path due to the number of bugs in
> Marvell firmware, but I'm just trying to say that it's an error code
> path and not a normal one. In my mind that means the more things we
> can re-init the better.

You have a point, but...

>
> If this was, on the other hand, a reset that we were supposed to
> always assert when doing a normal operation (like it wants us to reset
> it when we switch modes, or something) then a lighter operation would
> make more sense.

This is indeed the tricky part, as it depends on the level of bugs,
but also under what specific circumstances the reset is getting
called.

In the TI case (drivers/net/wireless/ti/wlcore/sdio.c) the reset is
executed at the "power on" case, which for example is at system
resume. And we want system resume to be as fast as possible...

I am exploring a few options to deal with both cases, let's see what I
can come up with in a day or two.

Kind regards
Uffe
On Wed, 23 Oct 2019 at 17:06, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Tue, 22 Oct 2019 at 16:47, Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Mon, Oct 21, 2019 at 11:51 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
> > >
> > > > The problem I see here is that callers of this reset function aren't
> > > > expecting it to work this way. Look specifically at
> > > > mwifiex_sdio_card_reset_work(). It's assuming that it needs to do
> > > > things like shutdown / reinit. Now it's true that the old
> > > > mwifiex_sdio_card_reset_work() was pretty broken on any systems that
> > > > also had SDIO bluetooth, but presumably it worked OK on systems
> > > > without SDIO Bluetooth. I don't think it'll work so well now.
> > >
> > > Good point!
> > >
> > > I guess I was hoping that running through ->remove() and then
> > > ->probe() for the SDIO func drivers should simply take care of
> > > whatever that may be needed. In some way this makes the driver broken
> > > already in regards to this path, but never mind.
> >
> > Yeah, probably true. I guess if anyone actually expected to use one
> > of these cards as a removable SDIO card (I have seen such dev boards
> > long ago) then it would always have been possible for someone to
> > remove the card at just the wrong time and break things.
>
> Well, this isn't solely about card removal but driver removal as well.
> And the latter can be managed from user space at any point in time.
>
> >
> >
> > > > Testing shows that indeed your patch breaks mwifiex reset worse than
> > > > it was before (AKA WiFi totally fails instead of it just killing
> > > > Bluetooth).
> > > >
> > > > I think it may be better to add a new API call rather than trying to
> > > > co-opt the old one. Maybe put a WARN_ON() for the old API call to
> > > > make people move away from it, or something?
> > >
> > > Thanks again for testing and for valuable feedback! Clearly this needs
> > > a little more thinking.
> > >
> > > An additional concern I see with the "hotplug approach" implemented in
> > > $subject patch, is that it becomes unnecessary heavy when there is
> > > only one SDIO func driver bound.
> > >
> > > In one way I am tempted to try to address that situation, as it seems
> > > a bit silly to do full hotplug dance when it isn't needed.
> >
> > True, though I kinda like the heavy solution here. At least in the
> > mwifiex case this isn't a part of the normal flow. AKA: we don't call
> > this function during normal bootup nor during any normal operations.
> > It's much more of an "oh crap, something's not working and we don't
> > know what to do" type solution. I mean, I guess it's still not
> > uncommon that we end up in this code path due to the number of bugs in
> > Marvell firmware, but I'm just trying to say that it's an error code
> > path and not a normal one. In my mind that means the more things we
> > can re-init the better.
>
> You have a point, but...
>
> >
> > If this was, on the other hand, a reset that we were supposed to
> > always assert when doing a normal operation (like it wants us to reset
> > it when we switch modes, or something) then a lighter operation would
> > make more sense.
>
> This is indeed the tricky part, as it depends on the level of bugs,
> but also under what specific circumstances the reset is getting
> called.
>
> In the TI case (drivers/net/wireless/ti/wlcore/sdio.c) the reset is
> executed at the "power on" case, which for example is at system
> resume. And we want system resume to be as fast as possible...
>
> I am exploring a few options to deal with both cases, let's see what I
> can come up with in a day or two.

FYI, still exploring and trying a few slightly different options. I
should be able to post something early next week, stay tuned. :-)

Kind regards
Uffe
```diff
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 6f8342702c73..39c4567e39d8 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1469,8 +1469,7 @@ void mmc_detach_bus(struct mmc_host *host)
 	mmc_bus_put(host);
 }
 
-static void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
-				bool cd_irq)
+void _mmc_detect_change(struct mmc_host *host, unsigned long delay, bool cd_irq)
 {
 	/*
 	 * If the device is configured as wakeup, we prevent a new sleep for
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index 328c78dbee66..575ac0257af2 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -70,6 +70,8 @@ void mmc_rescan(struct work_struct *work);
 void mmc_start_host(struct mmc_host *host);
 void mmc_stop_host(struct mmc_host *host);
 
+void _mmc_detect_change(struct mmc_host *host, unsigned long delay,
+			bool cd_irq);
 int _mmc_detect_card_removed(struct mmc_host *host);
 int mmc_detect_card_removed(struct mmc_host *host);
 
diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
index 26cabd53ddc5..5d7462c223c3 100644
--- a/drivers/mmc/core/sdio.c
+++ b/drivers/mmc/core/sdio.c
@@ -1050,8 +1050,15 @@ static int mmc_sdio_runtime_resume(struct mmc_host *host)
 
 static int mmc_sdio_hw_reset(struct mmc_host *host)
 {
-	mmc_power_cycle(host, host->card->ocr);
-	return mmc_sdio_reinit_card(host);
+	/*
+	 * We may have multiple SDIO funcs. Rather than informing them all,
+	 * let's trigger a removal and force a new rescan of the card.
+	 */
+	host->rescan_entered = 0;
+	mmc_card_set_removed(host->card);
+	_mmc_detect_change(host, 0, false);
+
+	return 0;
 }
 
 static int mmc_sdio_sw_reset(struct mmc_host *host)
```
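The patched mmc_sdio_hw_reset() only flags the card and schedules detection work with zero delay, then returns 0; the card is actually removed and re-probed later, when the rescan runs. The minimal user-space model below illustrates that ordering. It assumes (as we read mmc_rescan()) that a host with a non-removable card scans only once unless rescan_entered is cleared, which is why the patch resets that field. `struct model_host` and every `model_*` function are hypothetical stand-ins, not kernel APIs, and the real rescan path (bus detection, power handling) is collapsed into a single branch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the host/card state the patched reset touches. */
struct model_host {
	bool nonremovable;    /* e.g. a soldered-down SDIO WiFi/BT chip */
	bool rescan_entered;  /* mirrors host->rescan_entered */
	bool card_removed;    /* mirrors mmc_card_set_removed(host->card) */
	int  func_probes;     /* how often the func driver's ->probe() ran */
};

/* Patched reset: flag the card, leave the work to the rescan, return at once. */
static int model_sdio_hw_reset(struct model_host *host)
{
	host->rescan_entered = false;  /* allow one more scan of a fixed card */
	host->card_removed = true;
	/* _mmc_detect_change(host, 0, false) would queue the rescan here */
	return 0;                      /* card is NOT re-initialized yet */
}

/* Stand-in for the queued rescan work running some time later. */
static void model_rescan_work(struct model_host *host)
{
	/* A non-removable card is scanned only once per rescan_entered. */
	if (host->nonremovable && host->rescan_entered)
		return;
	host->rescan_entered = true;

	if (host->card_removed) {
		/* ->remove(), power off/on, re-detect the card, ->probe() */
		host->card_removed = false;
		host->func_probes++;
	}
}

/* A caller that, like the old mwifiex flow, re-inits right after reset. */
static int model_caller_reinit(const struct model_host *host)
{
	return host->card_removed ? -ENODEV : 0;
}
```

In this model a caller that re-initializes immediately after the reset returns sees a card that is still marked removed, and only after the rescan work has run is the func re-probed. Without clearing `rescan_entered`, the rescan of a non-removable card would be skipped entirely, leaving the card marked removed.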
It has turned out that it's not a good idea to try to power cycle and
re-initialize the SDIO card via mmc_hw_reset(). This is because there may be
multiple SDIO funcs attached to the same SDIO card.

To solve this problem, we would need to inform each of the SDIO funcs in
some way when mmc_sdio_hw_reset() gets called, but that isn't an entirely
trivial operation. Therefore, let's instead take the easy way out by
triggering a card removal and forcing a new rescan of the SDIO card.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
---
 drivers/mmc/core/core.c |  3 +--
 drivers/mmc/core/core.h |  2 ++
 drivers/mmc/core/sdio.c | 11 +++++++++--
 3 files changed, 12 insertions(+), 4 deletions(-)
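The argument above, that an in-place power cycle leaves every func except the caller out of sync while a removal plus rescan re-probes them all, can be illustrated with a toy model. The two-func layout (say, a WiFi func and a Bluetooth func) and every `model_*` name are hypothetical. The model assumes that a power cycle invalidates all per-func driver state and that, after an in-place reset, only the calling driver re-initializes itself.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NFUNCS 2  /* illustrative: index 0 = WiFi func, 1 = BT func */

/* Hypothetical per-func driver state. */
struct model_func_drv {
	bool in_sync;  /* driver's view of the hardware is still valid */
	int  probes;   /* number of times ->probe() has run */
};

struct model_card {
	struct model_func_drv func[MODEL_NFUNCS];
};

/*
 * Old behaviour: power cycle + re-init the card in place. Every func loses
 * its hardware state, but only the calling driver knows to recover.
 */
static void model_reset_inplace(struct model_card *card, int caller)
{
	for (int i = 0; i < MODEL_NFUNCS; i++)
		card->func[i].in_sync = false;
	card->func[caller].in_sync = true;  /* caller's own shutdown/re-init */
}

/*
 * Patched behaviour, once the rescan has run: each func driver goes through
 * ->remove() and then ->probe(), so all of them start from a clean state.
 */
static void model_reset_replug(struct model_card *card)
{
	for (int i = 0; i < MODEL_NFUNCS; i++) {
		card->func[i].in_sync = false;  /* ->remove(), card powered off */
		card->func[i].probes++;         /* ->probe() on the new card */
		card->func[i].in_sync = true;
	}
}
```

With the in-place reset requested by the WiFi func, the Bluetooth func ends up out of sync; with the replug path both funcs are re-probed once and are back in sync. The price, as discussed in the thread, is that every bound driver pays for a full remove/probe cycle even when only one of them needed it.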