diff mbox series

mmc: sdhci-msm: Add retries when all tuning phases are found valid

Message ID 20200827075809.1.If179abf5ecb67c963494db79c3bc4247d987419b@changeid (mailing list archive)
State New, archived
Headers show
Series mmc: sdhci-msm: Add retries when all tuning phases are found valid | expand

Commit Message

Doug Anderson Aug. 27, 2020, 2:58 p.m. UTC
As the comments in this patch say, if we tune and find all phases are
valid it's _almost_ as bad as no phases being found valid.  Probably
all phases are not really reliable but we didn't detect where the
unreliable place is.  That means we'll essentially be guessing and
hoping we get a good phase.

This is not just a problem in theory.  It was causing real problems on
a real board.  On that board, most often phase 10 is found as the only
invalid phase, though sometimes 10 and 11 are invalid and sometimes
just 11.  Some percentage of the time, however, all phases are found
to be valid.  When this happens, the current logic will decide to use
phase 11.  Since phase 11 is sometimes found to be invalid, this is a
bad choice.  Sure enough, when phase 11 is picked we often get mmc
errors later in boot.

I have seen cases where all phases were found to be valid 3 times in a
row, so increase the retry count to 10 just to be extra sure.

Fixes: 415b5a75da43 ("mmc: sdhci-msm: Add platform_execute_tuning implementation")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/mmc/host/sdhci-msm.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

Comments

Veerabhadrarao Badiganti Aug. 28, 2020, 12:42 p.m. UTC | #1
On 2020-08-27 20:28, Douglas Anderson wrote:
> As the comments in this patch say, if we tune and find all phases are
> valid it's _almost_ as bad as no phases being found valid.  Probably
> all phases are not really reliable but we didn't detect where the
> unreliable place is.  That means we'll essentially be guessing and
> hoping we get a good phase.
> 
> This is not just a problem in theory.  It was causing real problems on
> a real board.  On that board, most often phase 10 is found as the only
> invalid phase, though sometimes 10 and 11 are invalid and sometimes
> just 11.  Some percentage of the time, however, all phases are found
> to be valid.  When this happens, the current logic will decide to use
> phase 11.  Since phase 11 is sometimes found to be invalid, this is a
> bad choice.  Sure enough, when phase 11 is picked we often get mmc
> errors later in boot.
> 
> I have seen cases where all phases were found to be valid 3 times in a
> row, so increase the retry count to 10 just to be extra sure.
> 
> Fixes: 415b5a75da43 ("mmc: sdhci-msm: Add platform_execute_tuning
> implementation")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 

Thanks for adding this logic.

Reviewed-by: Veerabhadrarao Badiganti <vbadigan@codeaurora.org>


>  drivers/mmc/host/sdhci-msm.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/host/sdhci-msm.c 
> b/drivers/mmc/host/sdhci-msm.c
> index b7e47107a31a..1b78106681e0 100644
> --- a/drivers/mmc/host/sdhci-msm.c
> +++ b/drivers/mmc/host/sdhci-msm.c
> @@ -1165,7 +1165,7 @@ static void sdhci_msm_set_cdr(struct sdhci_host
> *host, bool enable)
>  static int sdhci_msm_execute_tuning(struct mmc_host *mmc, u32 opcode)
>  {
>  	struct sdhci_host *host = mmc_priv(mmc);
> -	int tuning_seq_cnt = 3;
> +	int tuning_seq_cnt = 10;
>  	u8 phase, tuned_phases[16], tuned_phase_cnt = 0;
>  	int rc;
>  	struct mmc_ios ios = host->mmc->ios;
> @@ -1221,6 +1221,22 @@ static int sdhci_msm_execute_tuning(struct
> mmc_host *mmc, u32 opcode)
>  	} while (++phase < ARRAY_SIZE(tuned_phases));
> 
>  	if (tuned_phase_cnt) {
> +		if (tuned_phase_cnt == ARRAY_SIZE(tuned_phases)) {
> +			/*
> +			 * All phases valid is _almost_ as bad as no phases
> +			 * valid.  Probably all phases are not really reliable
> +			 * but we didn't detect where the unreliable place is.
> +			 * That means we'll essentially be guessing and hoping
> +			 * we get a good phase.  Better to try a few times.
> +			 */
> +			dev_dbg(mmc_dev(mmc), "%s: All phases valid; try again\n",
> +				mmc_hostname(mmc));
> +			if (--tuning_seq_cnt) {
> +				tuned_phase_cnt = 0;
> +				goto retry;
> +			}
> +		}
> +
>  		rc = msm_find_most_appropriate_phase(host, tuned_phases,
>  						     tuned_phase_cnt);
>  		if (rc < 0)
Adrian Hunter Aug. 31, 2020, 2:14 p.m. UTC | #2
On 27/08/20 5:58 pm, Douglas Anderson wrote:
> As the comments in this patch say, if we tune and find all phases are
> valid it's _almost_ as bad as no phases being found valid.  Probably
> all phases are not really reliable but we didn't detect where the
> unreliable place is.  That means we'll essentially be guessing and
> hoping we get a good phase.
> 
> This is not just a problem in theory.  It was causing real problems on
> a real board.  On that board, most often phase 10 is found as the only
> invalid phase, though sometimes 10 and 11 are invalid and sometimes
> just 11.  Some percentage of the time, however, all phases are found
> to be valid.  When this happens, the current logic will decide to use
> phase 11.  Since phase 11 is sometimes found to be invalid, this is a
> bad choice.  Sure enough, when phase 11 is picked we often get mmc
> errors later in boot.
> 
> I have seen cases where all phases were found to be valid 3 times in a
> row, so increase the retry count to 10 just to be extra sure.
> 
> Fixes: 415b5a75da43 ("mmc: sdhci-msm: Add platform_execute_tuning implementation")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Acked-by: Adrian Hunter <adrian.hunter@intel.com>

> ---
> 
>  drivers/mmc/host/sdhci-msm.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c
> index b7e47107a31a..1b78106681e0 100644
> --- a/drivers/mmc/host/sdhci-msm.c
> +++ b/drivers/mmc/host/sdhci-msm.c
> @@ -1165,7 +1165,7 @@ static void sdhci_msm_set_cdr(struct sdhci_host *host, bool enable)
>  static int sdhci_msm_execute_tuning(struct mmc_host *mmc, u32 opcode)
>  {
>  	struct sdhci_host *host = mmc_priv(mmc);
> -	int tuning_seq_cnt = 3;
> +	int tuning_seq_cnt = 10;
>  	u8 phase, tuned_phases[16], tuned_phase_cnt = 0;
>  	int rc;
>  	struct mmc_ios ios = host->mmc->ios;
> @@ -1221,6 +1221,22 @@ static int sdhci_msm_execute_tuning(struct mmc_host *mmc, u32 opcode)
>  	} while (++phase < ARRAY_SIZE(tuned_phases));
>  
>  	if (tuned_phase_cnt) {
> +		if (tuned_phase_cnt == ARRAY_SIZE(tuned_phases)) {
> +			/*
> +			 * All phases valid is _almost_ as bad as no phases
> +			 * valid.  Probably all phases are not really reliable
> +			 * but we didn't detect where the unreliable place is.
> +			 * That means we'll essentially be guessing and hoping
> +			 * we get a good phase.  Better to try a few times.
> +			 */
> +			dev_dbg(mmc_dev(mmc), "%s: All phases valid; try again\n",
> +				mmc_hostname(mmc));
> +			if (--tuning_seq_cnt) {
> +				tuned_phase_cnt = 0;
> +				goto retry;
> +			}
> +		}
> +
>  		rc = msm_find_most_appropriate_phase(host, tuned_phases,
>  						     tuned_phase_cnt);
>  		if (rc < 0)
>
Ulf Hansson Sept. 2, 2020, 9:03 a.m. UTC | #3
On Thu, 27 Aug 2020 at 16:59, Douglas Anderson <dianders@chromium.org> wrote:
>
> As the comments in this patch say, if we tune and find all phases are
> valid it's _almost_ as bad as no phases being found valid.  Probably
> all phases are not really reliable but we didn't detect where the
> unreliable place is.  That means we'll essentially be guessing and
> hoping we get a good phase.
>
> This is not just a problem in theory.  It was causing real problems on
> a real board.  On that board, most often phase 10 is found as the only
> invalid phase, though sometimes 10 and 11 are invalid and sometimes
> just 11.  Some percentage of the time, however, all phases are found
> to be valid.  When this happens, the current logic will decide to use
> phase 11.  Since phase 11 is sometimes found to be invalid, this is a
> bad choice.  Sure enough, when phase 11 is picked we often get mmc
> errors later in boot.
>
> I have seen cases where all phases were found to be valid 3 times in a
> row, so increase the retry count to 10 just to be extra sure.
>
> Fixes: 415b5a75da43 ("mmc: sdhci-msm: Add platform_execute_tuning implementation")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Applied for fixes and by adding a stable tag, thanks!

Kind regards
Uffe


> ---
>
>  drivers/mmc/host/sdhci-msm.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c
> index b7e47107a31a..1b78106681e0 100644
> --- a/drivers/mmc/host/sdhci-msm.c
> +++ b/drivers/mmc/host/sdhci-msm.c
> @@ -1165,7 +1165,7 @@ static void sdhci_msm_set_cdr(struct sdhci_host *host, bool enable)
>  static int sdhci_msm_execute_tuning(struct mmc_host *mmc, u32 opcode)
>  {
>         struct sdhci_host *host = mmc_priv(mmc);
> -       int tuning_seq_cnt = 3;
> +       int tuning_seq_cnt = 10;
>         u8 phase, tuned_phases[16], tuned_phase_cnt = 0;
>         int rc;
>         struct mmc_ios ios = host->mmc->ios;
> @@ -1221,6 +1221,22 @@ static int sdhci_msm_execute_tuning(struct mmc_host *mmc, u32 opcode)
>         } while (++phase < ARRAY_SIZE(tuned_phases));
>
>         if (tuned_phase_cnt) {
> +               if (tuned_phase_cnt == ARRAY_SIZE(tuned_phases)) {
> +                       /*
> +                        * All phases valid is _almost_ as bad as no phases
> +                        * valid.  Probably all phases are not really reliable
> +                        * but we didn't detect where the unreliable place is.
> +                        * That means we'll essentially be guessing and hoping
> +                        * we get a good phase.  Better to try a few times.
> +                        */
> +                       dev_dbg(mmc_dev(mmc), "%s: All phases valid; try again\n",
> +                               mmc_hostname(mmc));
> +                       if (--tuning_seq_cnt) {
> +                               tuned_phase_cnt = 0;
> +                               goto retry;
> +                       }
> +               }
> +
>                 rc = msm_find_most_appropriate_phase(host, tuned_phases,
>                                                      tuned_phase_cnt);
>                 if (rc < 0)
> --
> 2.28.0.297.g1956fa8f8d-goog
>
diff mbox series

Patch

diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c
index b7e47107a31a..1b78106681e0 100644
--- a/drivers/mmc/host/sdhci-msm.c
+++ b/drivers/mmc/host/sdhci-msm.c
@@ -1165,7 +1165,7 @@  static void sdhci_msm_set_cdr(struct sdhci_host *host, bool enable)
 static int sdhci_msm_execute_tuning(struct mmc_host *mmc, u32 opcode)
 {
 	struct sdhci_host *host = mmc_priv(mmc);
-	int tuning_seq_cnt = 3;
+	int tuning_seq_cnt = 10;
 	u8 phase, tuned_phases[16], tuned_phase_cnt = 0;
 	int rc;
 	struct mmc_ios ios = host->mmc->ios;
@@ -1221,6 +1221,22 @@  static int sdhci_msm_execute_tuning(struct mmc_host *mmc, u32 opcode)
 	} while (++phase < ARRAY_SIZE(tuned_phases));
 
 	if (tuned_phase_cnt) {
+		if (tuned_phase_cnt == ARRAY_SIZE(tuned_phases)) {
+			/*
+			 * All phases valid is _almost_ as bad as no phases
+			 * valid.  Probably all phases are not really reliable
+			 * but we didn't detect where the unreliable place is.
+			 * That means we'll essentially be guessing and hoping
+			 * we get a good phase.  Better to try a few times.
+			 */
+			dev_dbg(mmc_dev(mmc), "%s: All phases valid; try again\n",
+				mmc_hostname(mmc));
+			if (--tuning_seq_cnt) {
+				tuned_phase_cnt = 0;
+				goto retry;
+			}
+		}
+
 		rc = msm_find_most_appropriate_phase(host, tuned_phases,
 						     tuned_phase_cnt);
 		if (rc < 0)