diff mbox

[2/2] mmc: core: fall back host->f_init if failing to init mmc card after resume

Message ID 1468918715-24914-2-git-send-email-shawn.lin@rock-chips.com (mailing list archive)
State New, archived
Headers show

Commit Message

Shawn Lin July 19, 2016, 8:58 a.m. UTC
We observed the failure of initializing card after resume
accidentally. It's hard to reproduce but we did get report from
the suspend/resume test of our RK3399 mp test farm . Unfortunately,
we still fail to figure out what was going wrong at that time.
Also we can't achieve it by retrying the host->f_init without falling
back it. But this patch will solve the problem as we could add some log
there and see that we resume the mmc card successfully after falling
back the host->f_init. There is no obvious side effect found, so it seems
this patch will improve the stability.

[   93.405085] mmc1: unexpected status 0x800900 after switch
[   93.408474] mmc1: switch to bus width 1 failed
[   93.408482] mmc1: mmc_select_hs200 failed, error -110
[   93.408492] mmc1: error -110 during resume (card was removed?)
[   93.408705] PM: resume of devices complete after 213.453 msecs

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
---

 drivers/mmc/core/mmc.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

Comments

Ulf Hansson July 19, 2016, 10:31 a.m. UTC | #1
On 19 July 2016 at 10:58, Shawn Lin <shawn.lin@rock-chips.com> wrote:
> We observed the failure of initializing card after resume
> accidentally. It's hard to reproduce but we did get report from
> the suspend/resume test of our RK3399 mp test farm . Unfortunately,
> we still fail to figure out what was going wrong at that time.
> Also we can't achieve it by retrying the host->f_init without falling
> back it. But this patch will solve the problem as we could add some log
> there and see that we resume the mmc card successfully after falling
> back the host->f_init. There is no obvious side effect found, so it seems
> this patch will improve the stability.

What f_init did the original rescan work find out being successful?

Did you then verify that it was *another* frequency that made the
initialization to work in the resume?

>
> [   93.405085] mmc1: unexpected status 0x800900 after switch
> [   93.408474] mmc1: switch to bus width 1 failed
> [   93.408482] mmc1: mmc_select_hs200 failed, error -110
> [   93.408492] mmc1: error -110 during resume (card was removed?)
> [   93.408705] PM: resume of devices complete after 213.453 msecs
>
> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> ---
>
>  drivers/mmc/core/mmc.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
> index 403b97b..bef40c8 100644
> --- a/drivers/mmc/core/mmc.c
> +++ b/drivers/mmc/core/mmc.c
> @@ -1945,6 +1945,7 @@ static int mmc_suspend(struct mmc_host *host)
>  static int _mmc_resume(struct mmc_host *host)
>  {
>         int err = 0;
> +       int i;
>
>         BUG_ON(!host);
>         BUG_ON(!host->card);
> @@ -1954,8 +1955,25 @@ static int _mmc_resume(struct mmc_host *host)
>         if (!mmc_card_suspended(host->card))
>                 goto out;
>
> -       mmc_power_up(host, host->card->ocr);
> -       err = mmc_init_card(host, host->card->ocr, host->card);
> +       /*
> +        * Let's try to fallback the host->f_init
> +        * if failing to init card after resume.
> +        */
> +
> +       for (i = 0; i < ARRAY_SIZE(freqs); i++) {
> +               if (host->f_init < freqs[i])
> +                       continue;
> +               else
> +                       host->f_init = freqs[i];

This loop is wrong, as you don't consider that host->f_min may not
exactly match the frequencies in the freqs array.

> +
> +               mmc_power_up(host, host->card->ocr);
> +               err = mmc_init_card(host, host->card->ocr, host->card);
> +               if (!err)
> +                       break;
> +
> +               mmc_power_off(host);

The mmc core expects the host/card to be powered up after a resume, as
it may send commands to it without first invoking mmc_power_up().
Even if those commands may fail (or timeout), at least those doesn't
hang which may be the case if the host/card isn't powered up first.

So, you must *not* leave the host/card in "mmc_power_off()" state in
_mmc_resume().

[...]

Kind regards
Uffe
Shawn Lin July 20, 2016, 1:33 a.m. UTC | #2
在 2016/7/19 18:31, Ulf Hansson 写道:
> On 19 July 2016 at 10:58, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>> We observed the failure of initializing card after resume
>> accidentally. It's hard to reproduce but we did get report from
>> the suspend/resume test of our RK3399 mp test farm . Unfortunately,
>> we still fail to figure out what was going wrong at that time.
>> Also we can't achieve it by retrying the host->f_init without falling
>> back it. But this patch will solve the problem as we could add some log
>> there and see that we resume the mmc card successfully after falling
>> back the host->f_init. There is no obvious side effect found, so it seems
>> this patch will improve the stability.
>
> What f_init did the original rescan work find out being successful?

The original f_init was 400k when booting up the system.

>
> Did you then verify that it was *another* frequency that made the
> initialization to work in the resume?

yes, I added some log and saw 100k made the initialization to work in
the resume finally with the patch applied.

Also I mentioned in the commit msg that if I just retried 400K, it
cannot made the initialization successful.

>
>>
>> [   93.405085] mmc1: unexpected status 0x800900 after switch
>> [   93.408474] mmc1: switch to bus width 1 failed
>> [   93.408482] mmc1: mmc_select_hs200 failed, error -110
>> [   93.408492] mmc1: error -110 during resume (card was removed?)
>> [   93.408705] PM: resume of devices complete after 213.453 msecs
>>
>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>> ---
>>
>>  drivers/mmc/core/mmc.c | 22 ++++++++++++++++++++--
>>  1 file changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
>> index 403b97b..bef40c8 100644
>> --- a/drivers/mmc/core/mmc.c
>> +++ b/drivers/mmc/core/mmc.c
>> @@ -1945,6 +1945,7 @@ static int mmc_suspend(struct mmc_host *host)
>>  static int _mmc_resume(struct mmc_host *host)
>>  {
>>         int err = 0;
>> +       int i;
>>
>>         BUG_ON(!host);
>>         BUG_ON(!host->card);
>> @@ -1954,8 +1955,25 @@ static int _mmc_resume(struct mmc_host *host)
>>         if (!mmc_card_suspended(host->card))
>>                 goto out;
>>
>> -       mmc_power_up(host, host->card->ocr);
>> -       err = mmc_init_card(host, host->card->ocr, host->card);
>> +       /*
>> +        * Let's try to fallback the host->f_init
>> +        * if failing to init card after resume.
>> +        */
>> +
>> +       for (i = 0; i < ARRAY_SIZE(freqs); i++) {
>> +               if (host->f_init < freqs[i])
>> +                       continue;
>> +               else
>> +                       host->f_init = freqs[i];
>
> This loop is wrong, as you don't consider that host->f_min may not
> exactly match the frequencies in the freqs array.

Ah, seems I should use 'max(freqs[i], host->f_min)'.. :)

>
>> +
>> +               mmc_power_up(host, host->card->ocr);
>> +               err = mmc_init_card(host, host->card->ocr, host->card);
>> +               if (!err)
>> +                       break;
>> +
>> +               mmc_power_off(host);
>
> The mmc core expects the host/card to be powered up after a resume, as
> it may send commands to it without first invoking mmc_power_up().
> Even if those commands may fail (or timeout), at least those doesn't
> hang which may be the case if the host/card isn't powered up first.
>

I got it, thanks.

> So, you must *not* leave the host/card in "mmc_power_off()" state in
> _mmc_resume().
>
> [...]
>
> Kind regards
> Uffe
>
>
>
diff mbox

Patch

diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 403b97b..bef40c8 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1945,6 +1945,7 @@  static int mmc_suspend(struct mmc_host *host)
 static int _mmc_resume(struct mmc_host *host)
 {
 	int err = 0;
+	int i;
 
 	BUG_ON(!host);
 	BUG_ON(!host->card);
@@ -1954,8 +1955,25 @@  static int _mmc_resume(struct mmc_host *host)
 	if (!mmc_card_suspended(host->card))
 		goto out;
 
-	mmc_power_up(host, host->card->ocr);
-	err = mmc_init_card(host, host->card->ocr, host->card);
+	/*
+	 * Let's try to fallback the host->f_init
+	 * if failing to init card after resume.
+	 */
+
+	for (i = 0; i < ARRAY_SIZE(freqs); i++) {
+		if (host->f_init < freqs[i])
+			continue;
+		else
+			host->f_init = freqs[i];
+
+		mmc_power_up(host, host->card->ocr);
+		err = mmc_init_card(host, host->card->ocr, host->card);
+		if (!err)
+			break;
+
+		mmc_power_off(host);
+	}
+
 	mmc_card_clr_suspended(host->card);
 
 out: