[2/2] mmc: core: Run handlers for pending SDIO interrupts on resume

Message ID 20190828214620.66003-2-mka@chromium.org (mailing list archive)
State New, archived
Series [1/2] mmc: sdio: Move code to get pending SDIO IRQs to a function

Commit Message

Matthias Kaehlcke Aug. 28, 2019, 9:46 p.m. UTC
With commit 83293386bc95 ("mmc: core: Prevent processing SDIO IRQs
when the card is suspended") SDIO interrupts are dropped if they
occur while the card is suspended. Dropping the interrupts can cause
problems after resume with cards that remain powered during suspend
and preserve their state. These cards may end up in an inconsistent
state since the event that triggered the interrupt is never processed
and remains pending. One example is the Bluetooth function of the
Marvell 8997: SDIO is broken on resume (for both Bluetooth and WiFi)
when processing of a pending HCI event is skipped.

For cards that remained powered during suspend, check on resume whether
SDIO interrupts are pending, and trigger interrupt processing if
needed.

Fixes: 83293386bc95 ("mmc: core: Prevent processing SDIO IRQs when the card is suspended")
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
 drivers/mmc/core/sdio.c | 9 +++++++++
 1 file changed, 9 insertions(+)
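
The behaviour described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `model_*` name is hypothetical and none of it is kernel code. It assumes a card that latches events in a pending bitmask, a core that drops interrupts while the card is marked suspended, and a card that loses its state when it does not keep power.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_card {
	bool suspended;   /* the core drops IRQs while this is set */
	bool keep_power;  /* the card stayed powered across suspend */
	uint8_t pending;  /* models the card's interrupt-pending bits */
	int handled;      /* number of times the handler consumed events */
};

/* Consume all pending events, as a function driver's handler would. */
static void model_process_irqs(struct model_card *card)
{
	if (card->pending) {
		card->pending = 0;
		card->handled++;
	}
}

/* The card raises an interrupt; the core drops it while suspended. */
static void model_card_raises_irq(struct model_card *card, uint8_t bit)
{
	card->pending |= bit;
	if (card->suspended)
		return; /* dropped: the event stays pending on the card */
	model_process_irqs(card);
}

static void model_suspend(struct model_card *card)
{
	card->suspended = true;
}

/* Resume; replay_pending selects the behaviour proposed by the patch. */
static void model_resume(struct model_card *card, bool replay_pending)
{
	card->suspended = false;

	/* Assumption: a card that lost power is reinitialized and forgets events. */
	if (!card->keep_power)
		card->pending = 0;

	if (replay_pending && card->keep_power && card->pending != 0)
		model_process_irqs(card);
}
```

In this model, suspending, raising an interrupt, and resuming with `replay_pending` set to false leaves the event pending and unhandled, which corresponds to the inconsistent state described above. Resuming with `replay_pending` set to true consumes it.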

Comments

Ulf Hansson Aug. 29, 2019, 8:48 a.m. UTC | #1
On Wed, 28 Aug 2019 at 23:46, Matthias Kaehlcke <mka@chromium.org> wrote:
>
> With commit 83293386bc95 ("mmc: core: Prevent processing SDIO IRQs
> when the card is suspended") SDIO interrupts are dropped if they
> occur while the card is suspended. Dropping the interrupts can cause
> problems after resume with cards that remain powered during suspend
> and preserve their state. These cards may end up in an inconsistent
> state since the event that triggered the interrupt is never processed
> and remains pending. One example is the Bluetooth function of the
> Marvell 8997, SDIO is broken on resume (for both Bluetooth and WiFi)
> when processing of a pending HCI event is skipped.
>
> For cards that remained powered during suspend check on resume if
> SDIO interrupts are pending, and trigger interrupt processing if
> needed.

Thanks for the detailed changelog, much appreciated!

>
> Fixes: 83293386bc95 ("mmc: core: Prevent processing SDIO IRQs when the card is suspended")
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
>  drivers/mmc/core/sdio.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
> index 8dd8fc32ecca..a6b4742a91c6 100644
> --- a/drivers/mmc/core/sdio.c
> +++ b/drivers/mmc/core/sdio.c
> @@ -975,6 +975,7 @@ static int mmc_sdio_suspend(struct mmc_host *host)
>  static int mmc_sdio_resume(struct mmc_host *host)
>  {
>         int err = 0;
> +       u8 pending = 0;
>
>         /* Basic card reinitialization. */
>         mmc_claim_host(host);
> @@ -1009,6 +1010,14 @@ static int mmc_sdio_resume(struct mmc_host *host)
>         /* Allow SDIO IRQs to be processed again. */
>         mmc_card_clr_suspended(host->card);
>
> +       if (!mmc_card_keep_power(host))
> +               goto skip_pending_irqs;
> +
> +       if (!sdio_get_pending_irqs(host, &pending) &&
> +           pending != 0)
> +               sdio_signal_irq(host);

In one way, this change makes sense as it adopts the legacy behavior,
signaling "cached" SDIO IRQs also for the new SDIO irq work interface.

However, I see at least one major concern with this approach. In the
execution path for sdio_signal_irq() (or when calling
wake_up_process() for the legacy path), we may end up invoking the
SDIO func's ->irq_handler() callback to let the SDIO func driver
consume the SDIO IRQ.

The problem is that the corresponding SDIO func driver may not have
been system resumed when the ->irq_handler() callback is invoked. If
the SDIO func driver had configured the IRQ as a wakeup, then I would
expect this to work, but not merely because power to the card was
maintained.

In the end, I think we need to deal with synchronization for this
through the mmc/sdio core, one way or another, before we kick
SDIO IRQs during system resume.

> +
> +skip_pending_irqs:
>         if (host->sdio_irqs) {
>                 if (!(host->caps2 & MMC_CAP2_SDIO_IRQ_NOTHREAD))
>                         wake_up_process(host->sdio_irq_thread);
> --
> 2.23.0.187.g17f5b7556c-goog
>

Kind regards
Uffe
Matthias Kaehlcke Aug. 29, 2019, 5:15 p.m. UTC | #2
Hi Ulf,

On Thu, Aug 29, 2019 at 10:48:58AM +0200, Ulf Hansson wrote:
> On Wed, 28 Aug 2019 at 23:46, Matthias Kaehlcke <mka@chromium.org> wrote:
> >
> > With commit 83293386bc95 ("mmc: core: Prevent processing SDIO IRQs
> > when the card is suspended") SDIO interrupts are dropped if they
> > occur while the card is suspended. Dropping the interrupts can cause
> > problems after resume with cards that remain powered during suspend
> > and preserve their state. These cards may end up in an inconsistent
> > state since the event that triggered the interrupt is never processed
> > and remains pending. One example is the Bluetooth function of the
> > Marvell 8997, SDIO is broken on resume (for both Bluetooth and WiFi)
> > when processing of a pending HCI event is skipped.
> >
> > For cards that remained powered during suspend check on resume if
> > SDIO interrupts are pending, and trigger interrupt processing if
> > needed.
> 
> Thanks for the detailed changelog, much appreciated!
> 
> >
> > Fixes: 83293386bc95 ("mmc: core: Prevent processing SDIO IRQs when the card is suspended")
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > ---
> >  drivers/mmc/core/sdio.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
> > index 8dd8fc32ecca..a6b4742a91c6 100644
> > --- a/drivers/mmc/core/sdio.c
> > +++ b/drivers/mmc/core/sdio.c
> > @@ -975,6 +975,7 @@ static int mmc_sdio_suspend(struct mmc_host *host)
> >  static int mmc_sdio_resume(struct mmc_host *host)
> >  {
> >         int err = 0;
> > +       u8 pending = 0;
> >
> >         /* Basic card reinitialization. */
> >         mmc_claim_host(host);
> > @@ -1009,6 +1010,14 @@ static int mmc_sdio_resume(struct mmc_host *host)
> >         /* Allow SDIO IRQs to be processed again. */
> >         mmc_card_clr_suspended(host->card);
> >
> > +       if (!mmc_card_keep_power(host))
> > +               goto skip_pending_irqs;
> > +
> > +       if (!sdio_get_pending_irqs(host, &pending) &&
> > +           pending != 0)
> > +               sdio_signal_irq(host);
> 
> In one way, this change makes sense as it adopts the legacy behavior,
> signaling "cached" SDIO IRQs also for the new SDIO irq work interface.
> 
> However, there is at least one major concern I see with this approach.
> That is, in the execution path for sdio_signal_irq() (or calling
> wake_up_process() for the legacy path), we may end up invoking the
> SDIO func's ->irq_handler() callback, as to let the SDIO func driver
> to consume the SDIO IRQ.
> 
> The problem with this is, that the corresponding SDIO func driver may
> not have been system resumed, when the ->irq_handler() callback is
> invoked.

While debugging the problem with btmrvl I found that this is
already the case without the patch, just not during resume, but when
suspending. The func driver suspends before the SDIO bus and
interrupts can keep coming in. These are processed while the func
driver is suspended, until the SDIO core starts dropping the
interrupts.

And I think it is also already true at resume time: mmc_sdio_resume()
re-enables SDIO IRQs and disables dropping them.

> If the SDIO func driver would have configured the IRQ as a
> wakeup, then I would expect this to work, but not just by having a
> maintained power to the card.

Is the assumption that no IRQs are generated after SDIO func suspend
unless wakeup is enabled?

On the system I'm currently debugging OOB wakeup is not working,
which might be part of the problem.

> In the end, I think we need to deal with synchronizations for this,
> through the mmc/sdio core, in one way or the other - before we kick
> SDIO IRQs during system resume.
> 
> > +
> > +skip_pending_irqs:
> >         if (host->sdio_irqs) {
> >                 if (!(host->caps2 & MMC_CAP2_SDIO_IRQ_NOTHREAD))
> >                         wake_up_process(host->sdio_irq_thread);
Doug Anderson Aug. 29, 2019, 5:39 p.m. UTC | #3
Hi,

On Thu, Aug 29, 2019 at 10:16 AM Matthias Kaehlcke <mka@chromium.org> wrote:
>
> > In one way, this change makes sense as it adopts the legacy behavior,
> > signaling "cached" SDIO IRQs also for the new SDIO irq work interface.
> >
> > However, there is at least one major concern I see with this approach.
> > That is, in the execution path for sdio_signal_irq() (or calling
> > wake_up_process() for the legacy path), we may end up invoking the
> > SDIO func's ->irq_handler() callback, as to let the SDIO func driver
> > to consume the SDIO IRQ.
> >
> > The problem with this is, that the corresponding SDIO func driver may
> > not have been system resumed, when the ->irq_handler() callback is
> > invoked.
>
> While debugging the problem with btmrvl I found that this is
> already the case without the patch, just not during resume, but when
> suspending. The func driver suspends before the SDIO bus and
> interrupts can keep coming in. These are processed while the func
> driver is suspended, until the SDIO core starts dropping the
> interrupts.
>
> And I think it is also already true at resume time: mmc_sdio_resume()
> re-enables SDIO IRQs and disables dropping them.

I would also note that this matches the design of the normal system
suspend/resume functions.  Interrupts continue to be enabled even
after the "suspend" call is made for a device.  Presumably this is so
that the suspend function can make use of interrupts even if there is
no other reason.  If it's important for a device to stop getting
interrupts after the "suspend" function is called then it's up to that
device to re-configure the device to stop giving interrupts.

-Doug
Doug Anderson Aug. 29, 2019, 5:44 p.m. UTC | #4
Hi,

On Wed, Aug 28, 2019 at 2:46 PM Matthias Kaehlcke <mka@chromium.org> wrote:
>
> With commit 83293386bc95 ("mmc: core: Prevent processing SDIO IRQs
> when the card is suspended") SDIO interrupts are dropped if they
> occur while the card is suspended. Dropping the interrupts can cause
> problems after resume with cards that remain powered during suspend
> and preserve their state. These cards may end up in an inconsistent
> state since the event that triggered the interrupt is never processed
> and remains pending. One example is the Bluetooth function of the
> Marvell 8997, SDIO is broken on resume (for both Bluetooth and WiFi)
> when processing of a pending HCI event is skipped.
>
> For cards that remained powered during suspend check on resume if
> SDIO interrupts are pending, and trigger interrupt processing if
> needed.
>
> Fixes: 83293386bc95 ("mmc: core: Prevent processing SDIO IRQs when the card is suspended")
> Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> ---
>  drivers/mmc/core/sdio.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
> index 8dd8fc32ecca..a6b4742a91c6 100644
> --- a/drivers/mmc/core/sdio.c
> +++ b/drivers/mmc/core/sdio.c
> @@ -975,6 +975,7 @@ static int mmc_sdio_suspend(struct mmc_host *host)
>  static int mmc_sdio_resume(struct mmc_host *host)
>  {
>         int err = 0;
> +       u8 pending = 0;
>
>         /* Basic card reinitialization. */
>         mmc_claim_host(host);
> @@ -1009,6 +1010,14 @@ static int mmc_sdio_resume(struct mmc_host *host)
>         /* Allow SDIO IRQs to be processed again. */
>         mmc_card_clr_suspended(host->card);
>
> +       if (!mmc_card_keep_power(host))
> +               goto skip_pending_irqs;
> +
> +       if (!sdio_get_pending_irqs(host, &pending) &&
> +           pending != 0)
> +               sdio_signal_irq(host);
> +
> +skip_pending_irqs:
>         if (host->sdio_irqs) {

nit: I'd prefer to avoid the "goto" if possible.  Using "goto" to
handle unwinding during error handling always makes good sense to me,
but here you're not doing unwinding--you're just using the "goto" as
an unstructured "if".  I'd rather just see:

  if (mmc_card_keep_power(host) &&
      !sdio_get_pending_irqs(host, &pending) && pending != 0)
          sdio_signal_irq(host);

Other than that this patch seems sane to me though (obviously) the
person you'd need to convince is Ulf.  ;-)

-Doug
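
The two control-flow forms discussed in this comment can be compared with a small standalone sketch. The struct and function names below are hypothetical stand-ins, not kernel code: `keep_power` models the result of mmc_card_keep_power(host), `read_err` models the return value of sdio_get_pending_irqs(), and `pending` models the value it stores. Because `&&` short-circuits, the combined condition still skips the register read when the card did not keep power, just as the goto does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the inputs to the check; not kernel APIs. */
struct resume_inputs {
	bool keep_power;  /* models mmc_card_keep_power(host) */
	int read_err;     /* models the return value of sdio_get_pending_irqs() */
	uint8_t pending;  /* models the value stored through &pending */
};

/* Form used in the patch: early exit via goto. True means "signal the IRQ". */
static bool signal_with_goto(const struct resume_inputs *in)
{
	bool signalled = false;

	if (!in->keep_power)
		goto skip_pending_irqs;

	if (!in->read_err && in->pending != 0)
		signalled = true;

skip_pending_irqs:
	return signalled;
}

/* Form suggested in the review: one short-circuiting condition. */
static bool signal_with_condition(const struct resume_inputs *in)
{
	return in->keep_power && !in->read_err && in->pending != 0;
}

/* Returns true when both forms agree on every input combination tried. */
static bool forms_agree(void)
{
	static const int errs[] = { 0, -5 };
	static const uint8_t pendings[] = { 0x00, 0x01, 0x80 };

	for (int kp = 0; kp <= 1; kp++)
		for (size_t e = 0; e < sizeof(errs) / sizeof(errs[0]); e++)
			for (size_t p = 0; p < sizeof(pendings) / sizeof(pendings[0]); p++) {
				struct resume_inputs in = { kp != 0, errs[e], pendings[p] };

				if (signal_with_goto(&in) != signal_with_condition(&in))
					return false;
			}
	return true;
}
```

The check in `forms_agree()` only covers the sampled inputs, but since the decision depends on three independent yes/no facts (power kept, read succeeded, any bit pending), those samples cover every case.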
Ulf Hansson Aug. 30, 2019, 6:08 a.m. UTC | #5
On Thu, 29 Aug 2019 at 19:40, Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Thu, Aug 29, 2019 at 10:16 AM Matthias Kaehlcke <mka@chromium.org> wrote:
> >
> > > In one way, this change makes sense as it adopts the legacy behavior,
> > > signaling "cached" SDIO IRQs also for the new SDIO irq work interface.
> > >
> > > However, there is at least one major concern I see with this approach.
> > > That is, in the execution path for sdio_signal_irq() (or calling
> > > wake_up_process() for the legacy path), we may end up invoking the
> > > SDIO func's ->irq_handler() callback, as to let the SDIO func driver
> > > to consume the SDIO IRQ.
> > >
> > > The problem with this is, that the corresponding SDIO func driver may
> > > not have been system resumed, when the ->irq_handler() callback is
> > > invoked.
> >
> > While debugging the problem with btmrvl I found that this is
> > already the case without the patch, just not during resume, but when
> > suspending. The func driver suspends before the SDIO bus and
> > interrupts can keep coming in. These are processed while the func
> > driver is suspended, until the SDIO core starts dropping the
> > interrupts.
> >
> > And I think it is also already true at resume time: mmc_sdio_resume()
> > re-enables SDIO IRQs and disables dropping them.
>
> I would also note that this matches the design of the normal system
> suspend/resume functions.  Interrupts continue to be enabled even
> after the "suspend" call is made for a device.  Presumably this is so
> that the suspend function can make use of interrupts even if there is
> no other reason.

I understand and you have a good point!

However, in my experience, it's generally a bad idea to let a device
process interrupts once it has been suspended. This also applies to
runtime suspend (via runtime PM).

> If it's important for a device to stop getting
> interrupts after the "suspend" function is called then it's up to that
> device to re-configure the device to stop giving interrupts.

Again, you have a very good point. The corresponding driver for the
device in question is responsible for dealing with this.

Then, for this particular case, the SDIO func driver scenario, how
would that work?

For example, assume that the SDIO func driver can't process IRQs after
it's been system suspended, but still wants the IRQs to be
re-kicked so it can consume them once it has been resumed?

Or are you saying that an SDIO func driver that can't consume IRQs
during system suspend should call sdio_release_irq() (and then
reclaim the IRQ once resumed)?

Kind regards
Uffe
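
The release-on-suspend, reclaim-on-resume idea raised in the question above can be pictured with a userspace model. This is a sketch only: the `model_*` names are hypothetical and are not the kernel's SDIO API, and it assumes the card keeps the event latched while no handler is claimed and that something re-kicks dispatch after the driver has resumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; model_* names are illustrative, not the kernel's SDIO API. */
typedef void (*model_irq_handler)(void *ctx);

struct model_func {
	model_irq_handler handler; /* NULL while the IRQ is released */
	void *ctx;                 /* argument passed to the handler */
	bool irq_pending;          /* event latched on the card */
};

static void model_claim_irq(struct model_func *f, model_irq_handler h, void *ctx)
{
	f->handler = h;
	f->ctx = ctx;
}

static void model_release_irq(struct model_func *f)
{
	f->handler = NULL;
}

/* Core-side dispatch: only call a handler that is currently claimed. */
static void model_dispatch(struct model_func *f)
{
	if (f->irq_pending && f->handler) {
		f->irq_pending = false;
		f->handler(f->ctx);
	}
}

/* Example handler: counts consumed events through an int pointer. */
static void count_event(void *ctx)
{
	(*(int *)ctx)++;
}

/* Driver suspend: give up the IRQ so the handler cannot run while suspended. */
static void model_driver_suspend(struct model_func *f)
{
	model_release_irq(f);
}

/* Driver resume: reclaim the IRQ, then re-kick anything still latched. */
static void model_driver_resume(struct model_func *f, int *events)
{
	model_claim_irq(f, count_event, events);
	model_dispatch(f);
}
```

In this model an event that arrives while the IRQ is released is neither lost nor delivered to a suspended driver. It is consumed only after `model_driver_resume()` reclaims the handler.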
Ulf Hansson Aug. 30, 2019, 10:38 a.m. UTC | #6
On Fri, 30 Aug 2019 at 08:08, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Thu, 29 Aug 2019 at 19:40, Doug Anderson <dianders@chromium.org> wrote:
> >
> > Hi,
> >
> > On Thu, Aug 29, 2019 at 10:16 AM Matthias Kaehlcke <mka@chromium.org> wrote:
> > >
> > > > In one way, this change makes sense as it adopts the legacy behavior,
> > > > signaling "cached" SDIO IRQs also for the new SDIO irq work interface.
> > > >
> > > > However, there is at least one major concern I see with this approach.
> > > > That is, in the execution path for sdio_signal_irq() (or calling
> > > > wake_up_process() for the legacy path), we may end up invoking the
> > > > SDIO func's ->irq_handler() callback, as to let the SDIO func driver
> > > > to consume the SDIO IRQ.
> > > >
> > > > The problem with this is, that the corresponding SDIO func driver may
> > > > not have been system resumed, when the ->irq_handler() callback is
> > > > invoked.
> > >
> > > While debugging the problem with btmrvl I found that this is
> > > already the case without the patch, just not during resume, but when
> > > suspending. The func driver suspends before the SDIO bus and
> > > interrupts can keep coming in. These are processed while the func
> > > driver is suspended, until the SDIO core starts dropping the
> > > interrupts.
> > >
> > > And I think it is also already true at resume time: mmc_sdio_resume()
> > > re-enables SDIO IRQs and disables dropping them.
> >
> > I would also note that this matches the design of the normal system
> > suspend/resume functions.  Interrupts continue to be enabled even
> > after the "suspend" call is made for a device.  Presumably this is so
> > that the suspend function can make use of interrupts even if there is
> > no other reason.
>
> I understand and you have a good point!
>
> However, in my experience, the most common generic case, is that it's
> a bad idea to let a device process interrupts once they have been
> suspended. This also applies to runtime suspend (via runtime PM).
>
> > If it's important for a device to stop getting
> > interrupts after the "suspend" function is called then it's up to that
> > device to re-configure the device to stop giving interrupts.
>
> Again, you have a very good point. The corresponding driver for the
> device in question is responsible for dealing with this.
>
> Then, for this particular case, the SDIO func driver scenario, how
> would that work?
>
> For example, assume that the SDIO func driver can't process IRQs after
> it's been system suspended, however it still wants the IRQs to be
> re-kicked to consume them once it has been resumed?
>
> Or are you saying that the SDIO func driver for cases when IRQs can't
> be consumed during system suspend, that it should call
> sdio_release_irq() (then reclaim the IRQ once resumed)?

I have been thinking more about this. The above seems like a
reasonable assumption to make. So, I started to hack and improve the
SDIO IRQ management, again.

Just to be clear, I like the approach Matthias has taken here, which
means reading the SDIO_CCCR_INTx register at system resume to find out
whether there are SDIO IRQs that need to be processed. However,
there are some more corner cases that need to be covered as well.

Let me post something on Monday, then we can continue our discussions
and run some tests as well.

Have a nice weekend!
Uffe

Patch

diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
index 8dd8fc32ecca..a6b4742a91c6 100644
--- a/drivers/mmc/core/sdio.c
+++ b/drivers/mmc/core/sdio.c
@@ -975,6 +975,7 @@  static int mmc_sdio_suspend(struct mmc_host *host)
 static int mmc_sdio_resume(struct mmc_host *host)
 {
 	int err = 0;
+	u8 pending = 0;
 
 	/* Basic card reinitialization. */
 	mmc_claim_host(host);
@@ -1009,6 +1010,14 @@  static int mmc_sdio_resume(struct mmc_host *host)
 	/* Allow SDIO IRQs to be processed again. */
 	mmc_card_clr_suspended(host->card);
 
+	if (!mmc_card_keep_power(host))
+		goto skip_pending_irqs;
+
+	if (!sdio_get_pending_irqs(host, &pending) &&
+	    pending != 0)
+		sdio_signal_irq(host);
+
+skip_pending_irqs:
 	if (host->sdio_irqs) {
 		if (!(host->caps2 & MMC_CAP2_SDIO_IRQ_NOTHREAD))
 			wake_up_process(host->sdio_irq_thread);