diff mbox series

[v3] mmc: core: Set HS clock speed before sending HS CMD13

Message ID 20220330132946.v3.1.I484f4ee35609f78b932bd50feed639c29e64997e@changeid (mailing list archive)
State New, archived

Commit Message

Brian Norris March 30, 2022, 8:29 p.m. UTC
Way back in commit 4f25580fb84d ("mmc: core: changes frequency to
hs_max_dtr when selecting hs400es"), Rockchip engineers noticed that
some eMMC don't respond to SEND_STATUS commands very reliably if they're
still running at a low initial frequency. As mentioned in that commit,
JESD84-B51 P49 suggests a sequence in which the host:
1. sets HS_TIMING
2. bumps the clock ("<= 52 MHz")
3. sends further commands

It doesn't exactly require that we don't use a lower-than-52MHz
frequency, but in practice, these eMMC don't like it.

The aforementioned commit tried to get that right for HS400ES, although
it's unclear whether this ever truly worked as committed into mainline,
as other changes/refactoring adjusted the sequence in conflicting ways:

08573eaf1a70 ("mmc: mmc: do not use CMD13 to get status after speed mode
switch")

53e60650f74e ("mmc: core: Allow CMD13 polling when switching to HS mode
for mmc")

In any case, today we do step 3 before step 2. Let's fix that, and also
apply the same logic to HS200/400, where this eMMC has problems too.
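
The reordering can be pictured with a small standalone model. This is only a sketch: the `trace` structure, the `op` enum, and the helper names below are hypothetical and are not kernel code. They record the order of the three steps so the old and new sequences can be compared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event log standing in for real host/card operations. */
enum op { OP_SET_HS_TIMING, OP_SET_CLOCK, OP_SEND_STATUS };

struct trace {
	enum op ops[8];
	size_t n;
};

static void record(struct trace *t, enum op op)
{
	assert(t->n < sizeof(t->ops) / sizeof(t->ops[0]));
	t->ops[t->n++] = op;
}

/* Ordering before the patch: CMD13 goes out while the clock is still low. */
static void old_sequence(struct trace *t)
{
	record(t, OP_SET_HS_TIMING);
	record(t, OP_SEND_STATUS);
	record(t, OP_SET_CLOCK);
}

/* Ordering after the patch: raise the clock first, then send CMD13. */
static void new_sequence(struct trace *t)
{
	record(t, OP_SET_HS_TIMING);
	record(t, OP_SET_CLOCK);
	record(t, OP_SEND_STATUS);
}

/* Returns 1 if the clock bump precedes the first SEND_STATUS, else 0. */
static int clock_before_status(const struct trace *t)
{
	size_t i;
	int clock_seen = 0;

	for (i = 0; i < t->n; i++) {
		if (t->ops[i] == OP_SET_CLOCK)
			clock_seen = 1;
		else if (t->ops[i] == OP_SEND_STATUS)
			return clock_seen;
	}
	return 0;
}
```

In this model, `clock_before_status()` distinguishes the two orderings; it says nothing about how a real card behaves at either frequency.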

Resolves errors like this seen when booting some RK3399 Gru/Scarlet
systems:

[    2.058881] mmc1: CQHCI version 5.10
[    2.097545] mmc1: SDHCI controller on fe330000.mmc [fe330000.mmc] using ADMA
[    2.209804] mmc1: mmc_select_hs400es failed, error -84
[    2.215597] mmc1: error -84 whilst initialising MMC card
[    2.417514] mmc1: mmc_select_hs400es failed, error -110
[    2.423373] mmc1: error -110 whilst initialising MMC card
[    2.605052] mmc1: mmc_select_hs400es failed, error -110
[    2.617944] mmc1: error -110 whilst initialising MMC card
[    2.835884] mmc1: mmc_select_hs400es failed, error -110
[    2.841751] mmc1: error -110 whilst initialising MMC card

Fixes: 08573eaf1a70 ("mmc: mmc: do not use CMD13 to get status after speed mode switch")
Fixes: 53e60650f74e ("mmc: core: Allow CMD13 polling when switching to HS mode for mmc")
Fixes: 4f25580fb84d ("mmc: core: changes frequency to hs_max_dtr when selecting hs400es")
Cc: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---

Changes in v3:
 * Use mmc_set_bus_speed() to help choose the right clock rate
 * Avoid redundant clock rate changes
 * Restore clock rate on failed HS200 switch

Changes in v2:
 * Use ext_csd.hs200_max_dtr for HS200
 * Retest on top of 3b6c472822f8 ("mmc: core: Improve fallback to speed
   modes if eMMC HS200 fails")

 drivers/mmc/core/core.c |  3 +++
 drivers/mmc/core/mmc.c  | 21 +++++++++++++++++----
 2 files changed, 20 insertions(+), 4 deletions(-)

Comments

Ulf Hansson April 6, 2022, 2:55 p.m. UTC | #1
On Wed, 30 Mar 2022 at 22:30, Brian Norris <briannorris@chromium.org> wrote:
>
> Way back in commit 4f25580fb84d ("mmc: core: changes frequency to
> hs_max_dtr when selecting hs400es"), Rockchip engineers noticed that
> some eMMC don't respond to SEND_STATUS commands very reliably if they're
> still running at a low initial frequency. As mentioned in that commit,
> JESD84-B51 P49 suggests a sequence in which the host:
> 1. sets HS_TIMING
> 2. bumps the clock ("<= 52 MHz")
> 3. sends further commands
>
> It doesn't exactly require that we don't use a lower-than-52MHz
> frequency, but in practice, these eMMC don't like it.
>
> The aforementioned commit tried to get that right for HS400ES, although
> it's unclear whether this ever truly worked as committed into mainline,
> as other changes/refactoring adjusted the sequence in conflicting ways:
>
> 08573eaf1a70 ("mmc: mmc: do not use CMD13 to get status after speed mode
> switch")
>
> 53e60650f74e ("mmc: core: Allow CMD13 polling when switching to HS mode
> for mmc")
>
> In any case, today we do step 3 before step 2. Let's fix that, and also
> apply the same logic to HS200/400, where this eMMC has problems too.
>
> Resolves errors like this seen when booting some RK3399 Gru/Scarlet
> systems:
>
> [    2.058881] mmc1: CQHCI version 5.10
> [    2.097545] mmc1: SDHCI controller on fe330000.mmc [fe330000.mmc] using ADMA
> [    2.209804] mmc1: mmc_select_hs400es failed, error -84
> [    2.215597] mmc1: error -84 whilst initialising MMC card
> [    2.417514] mmc1: mmc_select_hs400es failed, error -110
> [    2.423373] mmc1: error -110 whilst initialising MMC card
> [    2.605052] mmc1: mmc_select_hs400es failed, error -110
> [    2.617944] mmc1: error -110 whilst initialising MMC card
> [    2.835884] mmc1: mmc_select_hs400es failed, error -110
> [    2.841751] mmc1: error -110 whilst initialising MMC card
>
> Fixes: 08573eaf1a70 ("mmc: mmc: do not use CMD13 to get status after speed mode switch")
> Fixes: 53e60650f74e ("mmc: core: Allow CMD13 polling when switching to HS mode for mmc")
> Fixes: 4f25580fb84d ("mmc: core: changes frequency to hs_max_dtr when selecting hs400es")
> Cc: Shawn Lin <shawn.lin@rock-chips.com>
> Signed-off-by: Brian Norris <briannorris@chromium.org>

To get this thoroughly tested, I have applied it to my next branch, for now.

If it turns out that there are no regressions being reported, I think
we should move the patch to the fixes branch (to get it included for
v5.18) and then also tag it for stable. So, I will get back to this in
a couple of weeks.

Thanks and kind regards
Uffe


> ---
>
> Changes in v3:
>  * Use mmc_set_bus_speed() to help choose the right clock rate
>  * Avoid redundant clock rate changes
>  * Restore clock rate on failed HS200 switch
>
> Changes in v2:
>  * Use ext_csd.hs200_max_dtr for HS200
>  * Retest on top of 3b6c472822f8 ("mmc: core: Improve fallback to speed
>    modes if eMMC HS200 fails")
>
>  drivers/mmc/core/core.c |  3 +++
>  drivers/mmc/core/mmc.c  | 21 +++++++++++++++++----
>  2 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
> index 368f10405e13..61abae221623 100644
> --- a/drivers/mmc/core/core.c
> +++ b/drivers/mmc/core/core.c
> @@ -914,6 +914,9 @@ void mmc_set_clock(struct mmc_host *host, unsigned int hz)
>         if (hz > host->f_max)
>                 hz = host->f_max;
>
> +       if (host->ios.clock == hz)
> +               return;
> +
>         host->ios.clock = hz;
>         mmc_set_ios(host);
>  }
> diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
> index e7ea45386c22..1f22f1d2e9b8 100644
> --- a/drivers/mmc/core/mmc.c
> +++ b/drivers/mmc/core/mmc.c
> @@ -1384,13 +1384,17 @@ static int mmc_select_hs400es(struct mmc_card *card)
>                 goto out_err;
>         }
>
> +       /*
> +        * Bump to HS timing and frequency. Some cards don't handle
> +        * SEND_STATUS reliably at the initial frequency.
> +        */
>         mmc_set_timing(host, MMC_TIMING_MMC_HS);
> +       mmc_set_bus_speed(card);
> +
>         err = mmc_switch_status(card, true);
>         if (err)
>                 goto out_err;
>
> -       mmc_set_clock(host, card->ext_csd.hs_max_dtr);
> -
>         /* Switch card to DDR with strobe bit */
>         val = EXT_CSD_DDR_BUS_WIDTH_8 | EXT_CSD_BUS_WIDTH_STROBE;
>         err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL,
> @@ -1448,7 +1452,7 @@ static int mmc_select_hs400es(struct mmc_card *card)
>  static int mmc_select_hs200(struct mmc_card *card)
>  {
>         struct mmc_host *host = card->host;
> -       unsigned int old_timing, old_signal_voltage;
> +       unsigned int old_timing, old_signal_voltage, old_clock;
>         int err = -EINVAL;
>         u8 val;
>
> @@ -1479,8 +1483,15 @@ static int mmc_select_hs200(struct mmc_card *card)
>                                    false, true, MMC_CMD_RETRIES);
>                 if (err)
>                         goto err;
> +
> +               /*
> +                * Bump to HS200 timing and frequency. Some cards don't
> +                * handle SEND_STATUS reliably at the initial frequency.
> +                */
>                 old_timing = host->ios.timing;
> +               old_clock = host->ios.clock;
>                 mmc_set_timing(host, MMC_TIMING_MMC_HS200);
> +               mmc_set_bus_speed(card);
>
>                 /*
>                  * For HS200, CRC errors are not a reliable way to know the
> @@ -1493,8 +1504,10 @@ static int mmc_select_hs200(struct mmc_card *card)
>                  * mmc_select_timing() assumes timing has not changed if
>                  * it is a switch error.
>                  */
> -               if (err == -EBADMSG)
> +               if (err == -EBADMSG) {
> +                       mmc_set_clock(host, old_clock);
>                         mmc_set_timing(host, old_timing);
> +               }
>         }
>  err:
>         if (err) {
> --
> 2.35.1.1094.g7c7d902a7c-goog
>
Luca Weiss April 21, 2022, 6:46 p.m. UTC | #2
Hi Brian and Ulf,

On Wednesday, 6 April 2022 16:55:40 CEST Ulf Hansson wrote:
> On Wed, 30 Mar 2022 at 22:30, Brian Norris <briannorris@chromium.org> wrote:
> > Way back in commit 4f25580fb84d ("mmc: core: changes frequency to
> > hs_max_dtr when selecting hs400es"), Rockchip engineers noticed that
> > some eMMC don't respond to SEND_STATUS commands very reliably if they're
> > still running at a low initial frequency. As mentioned in that commit,
> > JESD84-B51 P49 suggests a sequence in which the host:
> > 1. sets HS_TIMING
> > 2. bumps the clock ("<= 52 MHz")
> > 3. sends further commands
> > 
> > It doesn't exactly require that we don't use a lower-than-52MHz
> > frequency, but in practice, these eMMC don't like it.
> > 
> > The aforementioned commit tried to get that right for HS400ES, although
> > it's unclear whether this ever truly worked as committed into mainline,
> > as other changes/refactoring adjusted the sequence in conflicting ways:
> > 
> > 08573eaf1a70 ("mmc: mmc: do not use CMD13 to get status after speed mode
> > switch")
> > 
> > 53e60650f74e ("mmc: core: Allow CMD13 polling when switching to HS mode
> > for mmc")
> > 
> > In any case, today we do step 3 before step 2. Let's fix that, and also
> > apply the same logic to HS200/400, where this eMMC has problems too.
> > 
> > Resolves errors like this seen when booting some RK3399 Gru/Scarlet
> > systems:
> > 
> > [    2.058881] mmc1: CQHCI version 5.10
> > [    2.097545] mmc1: SDHCI controller on fe330000.mmc [fe330000.mmc] using ADMA
> > [    2.209804] mmc1: mmc_select_hs400es failed, error -84
> > [    2.215597] mmc1: error -84 whilst initialising MMC card
> > [    2.417514] mmc1: mmc_select_hs400es failed, error -110
> > [    2.423373] mmc1: error -110 whilst initialising MMC card
> > [    2.605052] mmc1: mmc_select_hs400es failed, error -110
> > [    2.617944] mmc1: error -110 whilst initialising MMC card
> > [    2.835884] mmc1: mmc_select_hs400es failed, error -110
> > [    2.841751] mmc1: error -110 whilst initialising MMC card
> > 
> > Fixes: 08573eaf1a70 ("mmc: mmc: do not use CMD13 to get status after speed mode switch")
> > Fixes: 53e60650f74e ("mmc: core: Allow CMD13 polling when switching to HS mode for mmc")
> > Fixes: 4f25580fb84d ("mmc: core: changes frequency to hs_max_dtr when selecting hs400es")
> > Cc: Shawn Lin <shawn.lin@rock-chips.com>
> > Signed-off-by: Brian Norris <briannorris@chromium.org>
> 
> To get this thoroughly tested, I have applied it to my next branch, for now.
> 
> If it turns out that there are no regressions being reported, I think
> we should move the patch to the fixes branch (to get it included for
> v5.18) and then also tag it for stable. So, I will get back to this in
> a couple of weeks.

Unfortunately this patch breaks internal storage on qcom-msm8974-fairphone-fp2

With this patch (included in linux-next-20220421) it fails to initialize:

[    1.868608] mmc0: SDHCI controller on f9824900.sdhci [f9824900.sdhci] using ADMA 64-bit
[    1.925220] mmc0: mmc_select_hs200 failed, error -110
[    1.925285] mmc0: error -110 whilst initialising MMC card

After reverting this patch, it works fine again.

[    1.908835] mmc0: SDHCI controller on f9824900.sdhci [f9824900.sdhci] using ADMA 64-bit
[    1.964700] mmc0: new HS200 MMC card at address 0001
[    1.965388] mmcblk0: mmc0:0001 BWBC3R 29.1 GiB 
[    1.975106]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20
[    1.982545] mmcblk0boot0: mmc0:0001 BWBC3R 4.00 MiB 
[    1.988247] mmcblk0boot1: mmc0:0001 BWBC3R 4.00 MiB 
[    1.993287] mmcblk0rpmb: mmc0:0001 BWBC3R 4.00 MiB, chardev (242:0)


Regards
Luca
Brian Norris April 21, 2022, 8:25 p.m. UTC | #3
Hi Luca,

On Thu, Apr 21, 2022 at 08:46:42PM +0200, Luca Weiss wrote:
> On Wednesday, 6 April 2022 16:55:40 CEST Ulf Hansson wrote:
> > To get this thoroughly tested, I have applied it to my next branch, for now.
> > 
> > If it turns out that there are no regressions being reported, I think
> > we should move the patch to the fixes branch (to get it included for
> > v5.18) and then also tag it for stable. So, I will get back to this in
> > a couple of weeks.
> 
> Unfortunately this patch breaks internal storage on qcom-msm8974-fairphone-fp2

That is indeed unfortunate :( So we should definitely not pick it to
fixes/stable, at least not yet. And if we can't come to a solution soon,
maybe revert it entirely, or at least drop the HS200 portions of the
change. (The systems that inspired this change are OK at HS400ES, FWIW,
so the HS200 changes are just a bonus.)

> With this patch (included in linux-next-20220421) it fails to initialize:
> 
> [    1.868608] mmc0: SDHCI controller on f9824900.sdhci [f9824900.sdhci] using ADMA 64-bit
> [    1.925220] mmc0: mmc_select_hs200 failed, error -110
> [    1.925285] mmc0: error -110 whilst initialising MMC card
> 
> After reverting this patch, it works fine again.
> 
> [    1.908835] mmc0: SDHCI controller on f9824900.sdhci [f9824900.sdhci] using ADMA 64-bit
> [    1.964700] mmc0: new HS200 MMC card at address 0001
> [    1.965388] mmcblk0: mmc0:0001 BWBC3R 29.1 GiB 
> [    1.975106]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20
> [    1.982545] mmcblk0boot0: mmc0:0001 BWBC3R 4.00 MiB 
> [    1.988247] mmcblk0boot1: mmc0:0001 BWBC3R 4.00 MiB 
> [    1.993287] mmcblk0rpmb: mmc0:0001 BWBC3R 4.00 MiB, chardev (242:0)

As a bit of a (semi-educated) shot in the dark: can you try the appended
patch? That's what my patch v1 did, but I changed it due to review
comments. (Either way worked for my systems.) After re-reading the
HS200-specific portions of the spec (JESD84-B51 page 45 / 6.6.2.2), it's
possible setting all the way to 200 MHz this early was a bit
overaggressive, and we should be keeping a max of 52 MHz at this point.
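
The difference between the two calls can be modelled in a few lines. This is a simplified sketch, not the kernel implementation: `model_card`, `model_set_clock()` and `model_set_bus_speed()` are hypothetical stand-ins, and the sketch assumes that the bus-speed helper picks the limit matching the currently selected timing, which is what the 200 MHz remark above implies. The clock values used with it are illustrative.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-ins for the kernel's card/host state. */
enum timing { TIMING_LEGACY, TIMING_HS, TIMING_HS200 };

struct model_card {
	enum timing timing;          /* currently selected bus timing */
	unsigned int csd_max_dtr;    /* legacy limit, Hz */
	unsigned int hs_max_dtr;     /* high-speed limit, Hz */
	unsigned int hs200_max_dtr;  /* HS200 limit, Hz */
	unsigned int f_max;          /* host controller limit, Hz */
	unsigned int clock;          /* clock currently programmed, Hz */
	unsigned int ios_updates;    /* number of times the clock was reprogrammed */
};

/* Models mmc_set_clock() including the early return added by the patch. */
static void model_set_clock(struct model_card *c, unsigned int hz)
{
	assert(c->f_max > 0);
	if (hz > c->f_max)
		hz = c->f_max;
	if (c->clock == hz)
		return;	/* nothing to do, skip the redundant update */
	c->clock = hz;
	c->ios_updates++;
}

/* Models choosing the rate from the current timing, as v3 does. */
static void model_set_bus_speed(struct model_card *c)
{
	unsigned int max_dtr;

	switch (c->timing) {
	case TIMING_HS200:
		max_dtr = c->hs200_max_dtr;
		break;
	case TIMING_HS:
		max_dtr = c->hs_max_dtr;
		break;
	default:
		max_dtr = c->csd_max_dtr;
		break;
	}
	model_set_clock(c, max_dtr);
}
```

With the timing already set to HS200, `model_set_bus_speed()` goes straight to the HS200 limit, while `model_set_clock(c, c->hs_max_dtr)` stops at the high-speed limit, which is the behaviour the appended patch restores.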

Thanks for testing and reporting.

Brian

--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1491,7 +1491,7 @@ static int mmc_select_hs200(struct mmc_card *card)
 		old_timing = host->ios.timing;
 		old_clock = host->ios.clock;
 		mmc_set_timing(host, MMC_TIMING_MMC_HS200);
-		mmc_set_bus_speed(card);
+		mmc_set_clock(card->host, card->ext_csd.hs_max_dtr);
 
 		/*
 		 * For HS200, CRC errors are not a reliable way to know the
Luca Weiss April 21, 2022, 10:04 p.m. UTC | #4
Hi Brian,

On Thursday, 21 April 2022 22:25:21 CEST Brian Norris wrote:
> Hi Luca,
> 
> On Thu, Apr 21, 2022 at 08:46:42PM +0200, Luca Weiss wrote:
> > On Wednesday, 6 April 2022 16:55:40 CEST Ulf Hansson wrote:
> > > To get this thoroughly tested, I have applied it to my next branch, for
> > > now.
> > > 
> > > If it turns out that there are no regressions being reported, I think
> > > we should move the patch to the fixes branch (to get it included for
> > > v5.18) and then also tag it for stable. So, I will get back to this in
> > > a couple of weeks.
> > 
> > Unfortunately this patch breaks internal storage on
> > qcom-msm8974-fairphone-fp2
> That is indeed unfortunate :( So we should definitely not pick it to
> fixes/stable, at least not yet. And if we can't come to a solution soon,
> maybe revert it entirely, or at least drop the HS200 portions of the
> change. (The systems that inspired this change are OK at HS400ES, FWIW,
> so the HS200 changes are just a bonus.)
> 
> > With this patch (included in linux-next-20220421) it fails to initialize:
> > 
> > [    1.868608] mmc0: SDHCI controller on f9824900.sdhci [f9824900.sdhci] using ADMA 64-bit
> > [    1.925220] mmc0: mmc_select_hs200 failed, error -110
> > [    1.925285] mmc0: error -110 whilst initialising MMC card
> > 
> > After reverting this patch, it works fine again.
> > 
> > [    1.908835] mmc0: SDHCI controller on f9824900.sdhci [f9824900.sdhci] using ADMA 64-bit
> > [    1.964700] mmc0: new HS200 MMC card at address 0001
> > [    1.965388] mmcblk0: mmc0:0001 BWBC3R 29.1 GiB
> > [    1.975106]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20
> > [    1.982545] mmcblk0boot0: mmc0:0001 BWBC3R 4.00 MiB
> > [    1.988247] mmcblk0boot1: mmc0:0001 BWBC3R 4.00 MiB
> > [    1.993287] mmcblk0rpmb: mmc0:0001 BWBC3R 4.00 MiB, chardev (242:0)
> 
> As a bit of a (semi-educated) shot in the dark: can you try the appended
> patch? That's what my patch v1 did, but I changed it due to review
> comments. (Either way worked for my systems.) After re-reading the
> HS200-specific portions of the spec (JESD84-B51 page 45 / 6.6.2.2), it's
> possible setting all the way to 200 MHz this early was a bit
> overaggressive, and we should be keeping a max of 52 MHz at this point.

With the original patch applied and your attached patch on top, it
works as well. Thanks!

Regards
Luca

> 
> Thanks for testing and reporting.
> 
> Brian
> 
> --- a/drivers/mmc/core/mmc.c
> +++ b/drivers/mmc/core/mmc.c
> @@ -1491,7 +1491,7 @@ static int mmc_select_hs200(struct mmc_card *card)
>  		old_timing = host->ios.timing;
>  		old_clock = host->ios.clock;
>  		mmc_set_timing(host, MMC_TIMING_MMC_HS200);
> -		mmc_set_bus_speed(card);
> +		mmc_set_clock(card->host, card->ext_csd.hs_max_dtr);
> 
>  		/*
>  		 * For HS200, CRC errors are not a reliable way to know the

Patch

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 368f10405e13..61abae221623 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -914,6 +914,9 @@  void mmc_set_clock(struct mmc_host *host, unsigned int hz)
 	if (hz > host->f_max)
 		hz = host->f_max;
 
+	if (host->ios.clock == hz)
+		return;
+
 	host->ios.clock = hz;
 	mmc_set_ios(host);
 }
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index e7ea45386c22..1f22f1d2e9b8 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -1384,13 +1384,17 @@  static int mmc_select_hs400es(struct mmc_card *card)
 		goto out_err;
 	}
 
+	/*
+	 * Bump to HS timing and frequency. Some cards don't handle
+	 * SEND_STATUS reliably at the initial frequency.
+	 */
 	mmc_set_timing(host, MMC_TIMING_MMC_HS);
+	mmc_set_bus_speed(card);
+
 	err = mmc_switch_status(card, true);
 	if (err)
 		goto out_err;
 
-	mmc_set_clock(host, card->ext_csd.hs_max_dtr);
-
 	/* Switch card to DDR with strobe bit */
 	val = EXT_CSD_DDR_BUS_WIDTH_8 | EXT_CSD_BUS_WIDTH_STROBE;
 	err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL,
@@ -1448,7 +1452,7 @@  static int mmc_select_hs400es(struct mmc_card *card)
 static int mmc_select_hs200(struct mmc_card *card)
 {
 	struct mmc_host *host = card->host;
-	unsigned int old_timing, old_signal_voltage;
+	unsigned int old_timing, old_signal_voltage, old_clock;
 	int err = -EINVAL;
 	u8 val;
 
@@ -1479,8 +1483,15 @@  static int mmc_select_hs200(struct mmc_card *card)
 				   false, true, MMC_CMD_RETRIES);
 		if (err)
 			goto err;
+
+		/*
+		 * Bump to HS200 timing and frequency. Some cards don't
+		 * handle SEND_STATUS reliably at the initial frequency.
+		 */
 		old_timing = host->ios.timing;
+		old_clock = host->ios.clock;
 		mmc_set_timing(host, MMC_TIMING_MMC_HS200);
+		mmc_set_bus_speed(card);
 
 		/*
 		 * For HS200, CRC errors are not a reliable way to know the
@@ -1493,8 +1504,10 @@  static int mmc_select_hs200(struct mmc_card *card)
 		 * mmc_select_timing() assumes timing has not changed if
 		 * it is a switch error.
 		 */
-		if (err == -EBADMSG)
+		if (err == -EBADMSG) {
+			mmc_set_clock(host, old_clock);
 			mmc_set_timing(host, old_timing);
+		}
 	}
 err:
 	if (err) {