diff mbox

eMMC tuning issue on Odroid C2 and a possible solution

Message ID b931985f-16b2-0a90-e73b-b6c349ad7d6c@gmail.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Heiner Kallweit Oct. 11, 2017, 8:46 p.m. UTC
Hi Jerome,

we have the known issue that the latest next kernel still fails on
Odroid C2 with a 128GB eMMC card (w/o adjusting the initial tx phase).

I found the time to dig a little deeper into it and reason is that
there are certain rx/tx phase combinations which are perfectly fine
when tuning but fail in real life.
Don't ask me how this can happen, I just see it happen.

To deal with such cases I added some code to avoid known invalid phase
values when retuning. In addition I added some code to deal with the
following (more or less corner) case:
Let's say we have rx = 0° and tx = 0° and working is only combination
rx = 180° and tx = 180°.
Then just tuning rx only or tx only will never result in a working
combination.

Following patch makes my system work. I just see one CRC error when
the initally tuned rx/tx phase combination fails and then the
retuning results in a stable system w/o further errors.

I'd appreciate if you could check that this patch doesn't break any
of your systems.

Rgds, Heiner

---
 drivers/mmc/host/meson-gx-mmc.c | 29 ++++++++++++++++++++++-------
 1 file changed, 22 insertions(+), 7 deletions(-)

Comments

Jerome Brunet Oct. 12, 2017, 3:22 p.m. UTC | #1
On Wed, 2017-10-11 at 22:46 +0200, Heiner Kallweit wrote:
> Hi Jerome,
> 
> we have the known issue that the latest next kernel still fails on
> Odroid C2 with a 128GB eMMC card (w/o adjusting the initial tx phase).
> 
> I found the time to dig a little deeper into it and reason is that
> there are certain rx/tx phase combinations which are perfectly fine
> when tuning but fail in real life.

You did not mention it in this mail, but I going to guess this is done, again,
in hs400 ...

> Don't ask me how this can happen, I just see it happen.

Oh I have pretty good idea of what is happening here, and to be absolutely it is
not a regression.

I think I already explained this but, before the MMC series that went in this
cycle, the frequency reported by the mmc driver was twice what it really was
for all DDR mode, including HS400.

So, up to v4.13, when you thought you were doing hs400 @ 200MHz, the frequency
was really 100Mhz

With all the mmc patches applied (including the recent tuning fix) the frequency
in DDR mode is set as requested by the framework. So on your target, it has
increased from 100MHz to 166Mhz. This is a huge difference. You did not change
the DT but the frequency did change.

The symptoms you are describing, I have seen them while trying SDR104 on boards
where the layout proved to be unable to cope with the clock rate involved. Some
cards worked well, some did not at all. When looking at the signal amplitude,
the problem was clear, the signal quality was just not good enough. The
communication might be OK for a short while (while doing the tuning) but may
fail while doing "real life" transfers.

So why does the tuning succeed and you get errors later on ? Simply because the
tuning is not the problem. The HW (and the frequency used) is.

As far as I can tell, your eMMC card + your odroidc2 is simply not able to cope
(reliably) with 166Mhz. The fact that you continue to have CRC errors with your
"hand picked phases" is an evidence of this fact.

This is not the only platform out there which can't cope with hs400 @ 166Mhz,
the p200 eMMC is hs400 capable, but the platform can't cope with it. That's how
it is.

Lower your frequency or change the mode to hs200 (which is the setting in the
upstream DT BTW)

> 
> To deal with such cases I added some code to avoid known invalid phase
> values when retuning. In addition I added some code to deal with the
> following (more or less corner) case:
> Let's say we have rx = 0° and tx = 0° and working is only combination
> rx = 180° and tx = 180°.
> Then just tuning rx only or tx only will never result in a working
> combination.
> 
> Following patch makes my system work. I just see one CRC error when
> the initally tuned rx/tx phase combination fails and then the
> retuning results in a stable system w/o further errors.
> 
> I'd appreciate if you could check that this patch doesn't break any
> of your systems.
> 
> Rgds, Heiner
> 
> ---
>  drivers/mmc/host/meson-gx-mmc.c | 29 ++++++++++++++++++++++-------
>  1 file changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c
> index 85745ef1..95cb439d 100644
> --- a/drivers/mmc/host/meson-gx-mmc.c
> +++ b/drivers/mmc/host/meson-gx-mmc.c
> @@ -120,6 +120,7 @@
>  #define SD_EMMC_DESC_CHAIN_MODE BIT(1)
>  
>  #define MUX_CLK_NUM_PARENTS 2
> +#define MAX_TUNING_ATTEMPTS 10
>  
>  struct sd_emmc_desc {
>  	u32 cmd_cfg;
> @@ -687,15 +688,23 @@ static int meson_mmc_find_tuning_point(unsigned long
> *test)
>  static int meson_mmc_clk_phase_tuning(struct mmc_host *mmc, u32 opcode,
>  				      struct clk *clk)
>  {
> -	int point, ret;
> +	int point, ret, old_phase;
>  	DECLARE_BITMAP(test, CLK_PHASE_POINT_NUM);
>  
> +	old_phase = clk_get_phase(clk);
> +	if (old_phase < 0)
> +		return old_phase;
> +
>  	dev_dbg(mmc_dev(mmc), "%s phase/delay tunning...\n",
>  		__clk_get_name(clk));
>  	bitmap_zero(test, CLK_PHASE_POINT_NUM);
>  
>  	/* Explore tuning points */
>  	for (point = 0; point < CLK_PHASE_POINT_NUM; point++) {
> +		/* when retuning avoid the surrounding of where we failed */
> +		if (mmc->doing_retune)
> +			if (abs(point * CLK_PHASE_STEP - old_phase) <= 45)
> +				continue;

This is just a fake alteration of the working window.
The point of this whole tuning is to find the middle of the window to get the
most stable setting

If you still get CRC error with the center of the window, the signal quality is
just too low to cope with the timing set.

This is patch is just a (convoluted) hack that will force the tuning algorithm
to come up with new values each times, hoping to pick setting where the problem
is bit masked.


>  		clk_set_phase(clk, point * CLK_PHASE_STEP);
>  		ret = mmc_send_tuning(mmc, opcode, NULL);
>  		if (!ret)
> @@ -704,8 +713,11 @@ static int meson_mmc_clk_phase_tuning(struct mmc_host
> *mmc, u32 opcode,
>  
>  	/* Find the optimal tuning point and apply it */
>  	point = meson_mmc_find_tuning_point(test);
> -	if (point < 0)
> +	if (point < 0) {
> +		/* prevent from getting stuck if we failed */
> +		clk_set_phase(clk, (old_phase + 90) % 360);
>  		return point; /* tuning failed */
> +	}
>  
>  	clk_set_phase(clk, point * CLK_PHASE_STEP);
>  	dev_dbg(mmc_dev(mmc), "success with phase: %d\n",
> @@ -716,7 +728,7 @@ static int meson_mmc_clk_phase_tuning(struct mmc_host
> *mmc, u32 opcode,
>  static int meson_mmc_execute_tuning(struct mmc_host *mmc, u32 opcode)
>  {
>  	struct meson_host *host = mmc_priv(mmc);
> -	int ret;
> +	int i, ret;
>  
>  	/*
>  	 * If this is the initial tuning, try to get a sane Rx starting
> @@ -729,11 +741,14 @@ static int meson_mmc_execute_tuning(struct mmc_host
> *mmc, u32 opcode)
>  			return ret;
>  	}
>  
> -	ret = meson_mmc_clk_phase_tuning(mmc, opcode, host->tx_clk);
> -	if (ret)
> -		return ret;
> +	for (i = 0; i < MAX_TUNING_ATTEMPTS; i++) {
> +		meson_mmc_clk_phase_tuning(mmc, opcode, host->tx_clk);
> +		ret = meson_mmc_clk_phase_tuning(mmc, opcode, host->rx_clk);
> +		if (!ret)
> +			return 0;

With the hack that is going to skip 25% on the phase, that is up to:
12 phase * 2 (TX/RX) * 10 * 0.75 = 180 MMC tunes !! that's way too much and
unnecessary.

> +	}
>  
> -	return meson_mmc_clk_phase_tuning(mmc, opcode, host->rx_clk);
> +	return ret;
>  }
>  
>  static void meson_mmc_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
Heiner Kallweit Oct. 12, 2017, 6:17 p.m. UTC | #2
Am 12.10.2017 um 17:22 schrieb Jerome Brunet:
> On Wed, 2017-10-11 at 22:46 +0200, Heiner Kallweit wrote:
>> Hi Jerome,
>>
>> we have the known issue that the latest next kernel still fails on
>> Odroid C2 with a 128GB eMMC card (w/o adjusting the initial tx phase).
>>
>> I found the time to dig a little deeper into it and reason is that
>> there are certain rx/tx phase combinations which are perfectly fine
>> when tuning but fail in real life.
> 
> You did not mention it in this mail, but I going to guess this is done, again,
> in hs400 ...
> 
Sorry. I use the default, hs200 with 200MHz.

>> Don't ask me how this can happen, I just see it happen.
> 
> Oh I have pretty good idea of what is happening here, and to be absolutely it is
> not a regression.
> 
> I think I already explained this but, before the MMC series that went in this
> cycle, the frequency reported by the mmc driver was twice what it really was
> for all DDR mode, including HS400.
> 
> So, up to v4.13, when you thought you were doing hs400 @ 200MHz, the frequency
> was really 100Mhz
> 
> With all the mmc patches applied (including the recent tuning fix) the frequency
> in DDR mode is set as requested by the framework. So on your target, it has
> increased from 100MHz to 166Mhz. This is a huge difference. You did not change
> the DT but the frequency did change.
> 
> The symptoms you are describing, I have seen them while trying SDR104 on boards
> where the layout proved to be unable to cope with the clock rate involved. Some
> cards worked well, some did not at all. When looking at the signal amplitude,
> the problem was clear, the signal quality was just not good enough. The
> communication might be OK for a short while (while doing the tuning) but may
> fail while doing "real life" transfers.
> 
> So why does the tuning succeed and you get errors later on ? Simply because the
> tuning is not the problem. The HW (and the frequency used) is.
> 
> As far as I can tell, your eMMC card + your odroidc2 is simply not able to cope
> (reliably) with 166Mhz. The fact that you continue to have CRC errors with your
> "hand picked phases" is an evidence of this fact.
> 
When setting the default tx phase to 0 I have a rock-stable system w/o any CRC
error (hs200 with DT 200MHz).

When leaving the default tx phase at 270, after tuning I end up with rx phase
90 and tx phase 300. This combination works perfect when tuning but fails
in real life.

I saw in the chip spec that there are few emmc-related calibration values in
SD_EMMC_ADJUST and SD_EMMC_CALOUT. However there's no documentation how to
use them. Looking at the vendor driver might help, though.
Did you have a closer look at these values ?

> This is not the only platform out there which can't cope with hs400 @ 166Mhz,
> the p200 eMMC is hs400 capable, but the platform can't cope with it. That's how
> it is.
> 
> Lower your frequency or change the mode to hs200 (which is the setting in the
> upstream DT BTW)
> 
>>
>> To deal with such cases I added some code to avoid known invalid phase
>> values when retuning. In addition I added some code to deal with the
>> following (more or less corner) case:
>> Let's say we have rx = 0° and tx = 0° and working is only combination
>> rx = 180° and tx = 180°.
>> Then just tuning rx only or tx only will never result in a working
>> combination.
>>
>> Following patch makes my system work. I just see one CRC error when
>> the initally tuned rx/tx phase combination fails and then the
>> retuning results in a stable system w/o further errors.
>>
>> I'd appreciate if you could check that this patch doesn't break any
>> of your systems.
>>
>> Rgds, Heiner
>>
>> ---
>>  drivers/mmc/host/meson-gx-mmc.c | 29 ++++++++++++++++++++++-------
>>  1 file changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c
>> index 85745ef1..95cb439d 100644
>> --- a/drivers/mmc/host/meson-gx-mmc.c
>> +++ b/drivers/mmc/host/meson-gx-mmc.c
>> @@ -120,6 +120,7 @@
>>  #define SD_EMMC_DESC_CHAIN_MODE BIT(1)
>>  
>>  #define MUX_CLK_NUM_PARENTS 2
>> +#define MAX_TUNING_ATTEMPTS 10
>>  
>>  struct sd_emmc_desc {
>>  	u32 cmd_cfg;
>> @@ -687,15 +688,23 @@ static int meson_mmc_find_tuning_point(unsigned long
>> *test)
>>  static int meson_mmc_clk_phase_tuning(struct mmc_host *mmc, u32 opcode,
>>  				      struct clk *clk)
>>  {
>> -	int point, ret;
>> +	int point, ret, old_phase;
>>  	DECLARE_BITMAP(test, CLK_PHASE_POINT_NUM);
>>  
>> +	old_phase = clk_get_phase(clk);
>> +	if (old_phase < 0)
>> +		return old_phase;
>> +
>>  	dev_dbg(mmc_dev(mmc), "%s phase/delay tunning...\n",
>>  		__clk_get_name(clk));
>>  	bitmap_zero(test, CLK_PHASE_POINT_NUM);
>>  
>>  	/* Explore tuning points */
>>  	for (point = 0; point < CLK_PHASE_POINT_NUM; point++) {
>> +		/* when retuning avoid the surrounding of where we failed */
>> +		if (mmc->doing_retune)
>> +			if (abs(point * CLK_PHASE_STEP - old_phase) <= 45)
>> +				continue;
> 
> This is just a fake alteration of the working window.
> The point of this whole tuning is to find the middle of the window to get the
> most stable setting
> 
> If you still get CRC error with the center of the window, the signal quality is
> just too low to cope with the timing set.
> 
> This is patch is just a (convoluted) hack that will force the tuning algorithm
> to come up with new values each times, hoping to pick setting where the problem
> is bit masked.
> 
> 
>>  		clk_set_phase(clk, point * CLK_PHASE_STEP);
>>  		ret = mmc_send_tuning(mmc, opcode, NULL);
>>  		if (!ret)
>> @@ -704,8 +713,11 @@ static int meson_mmc_clk_phase_tuning(struct mmc_host
>> *mmc, u32 opcode,
>>  
>>  	/* Find the optimal tuning point and apply it */
>>  	point = meson_mmc_find_tuning_point(test);
>> -	if (point < 0)
>> +	if (point < 0) {
>> +		/* prevent from getting stuck if we failed */
>> +		clk_set_phase(clk, (old_phase + 90) % 360);
>>  		return point; /* tuning failed */
>> +	}
>>  
>>  	clk_set_phase(clk, point * CLK_PHASE_STEP);
>>  	dev_dbg(mmc_dev(mmc), "success with phase: %d\n",
>> @@ -716,7 +728,7 @@ static int meson_mmc_clk_phase_tuning(struct mmc_host
>> *mmc, u32 opcode,
>>  static int meson_mmc_execute_tuning(struct mmc_host *mmc, u32 opcode)
>>  {
>>  	struct meson_host *host = mmc_priv(mmc);
>> -	int ret;
>> +	int i, ret;
>>  
>>  	/*
>>  	 * If this is the initial tuning, try to get a sane Rx starting
>> @@ -729,11 +741,14 @@ static int meson_mmc_execute_tuning(struct mmc_host
>> *mmc, u32 opcode)
>>  			return ret;
>>  	}
>>  
>> -	ret = meson_mmc_clk_phase_tuning(mmc, opcode, host->tx_clk);
>> -	if (ret)
>> -		return ret;
>> +	for (i = 0; i < MAX_TUNING_ATTEMPTS; i++) {
>> +		meson_mmc_clk_phase_tuning(mmc, opcode, host->tx_clk);
>> +		ret = meson_mmc_clk_phase_tuning(mmc, opcode, host->rx_clk);
>> +		if (!ret)
>> +			return 0;
> 
> With the hack that is going to skip 25% on the phase, that is up to:
> 12 phase * 2 (TX/RX) * 10 * 0.75 = 180 MMC tunes !! that's way too much and
> unnecessary.
> 
>> +	}
>>  
>> -	return meson_mmc_clk_phase_tuning(mmc, opcode, host->rx_clk);
>> +	return ret;
>>  }
>>  
>>  static void meson_mmc_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)
> 
>
Jerome Brunet Oct. 12, 2017, 7:34 p.m. UTC | #3
On Thu, 2017-10-12 at 20:17 +0200, Heiner Kallweit wrote:
> > You did not mention it in this mail, but I going to guess this is done,
> > again,
> > in hs400 ...
> > 
> 
> Sorry. I use the default, hs200 with 200MHz.
> 

Please state this clearly next time. This seems to change with each of your
issue report, I'm done guessing.

[...]

> > As far as I can tell, your eMMC card + your odroidc2 is simply not able to
> > cope
> > (reliably) with 166Mhz. The fact that you continue to have CRC errors with
> > your
> > "hand picked phases" is an evidence of this fact.
> > 
> 
> When setting the default tx phase to 0 I have a rock-stable system w/o any CRC
> error (hs200 with DT 200MHz).

Getting CRC Errors is not my definition of "rock stable"

> 
> When leaving the default tx phase at 270, after tuning I end up with rx phase
> 90 and tx phase 300. This combination works perfect when tuning but fails
> in real life.
> 

Ok, starting from Rx:0 Tx:270 then tuning gives you Tx:300 Rx:90
And what different did it gives you starting Rx:0/Tx:0 ?

The result of the tuning does not depends on starting point, so I don't really
understand how it would significantly change things.

There is nothing vastly innovative in this driver. There is at least 2 other
drivers upstream which use the same kind of algorithm to perform the tuning.
I've tested it on the vast majority of amlogic supported platforms, including
yours.

Yes, I don't have your particular eMMC chip but I gave you my understanding of
the situation based on my experience : If the tuning succeed but you then get
CRC, it means that signal (quality) is not good enough.

If you think it is something else, I'm happy to review your changes but you need
to provide a better explanation than "Don't ask me how this can happen, I just
see it happen." ... because I will ask !

> I saw in the chip spec that there are few emmc-related calibration values in
> SD_EMMC_ADJUST 

Adjust is just another fancy way to change clock phase based on the input clock
resampling. When the rate increase the precision of this method decrease (with
the divisor value). I tried it, it is not useful.

> and SD_EMMC_CALOUT. However there's no documentation how to
> use them. Looking at the vendor driver might help, though.

The vendor kernel around this calibration is very complex. It is used to set per
line delays to adjust for track length differences. Given that I tested the
odroidc2 with both hs200 and hs400 w/o this, I doubt it will change anything for
you

> Did you have a closer look at these values ?
Heiner Kallweit Oct. 12, 2017, 7:49 p.m. UTC | #4
Am 12.10.2017 um 21:34 schrieb Jerome Brunet:
> On Thu, 2017-10-12 at 20:17 +0200, Heiner Kallweit wrote:
>>> You did not mention it in this mail, but I going to guess this is done,
>>> again,
>>> in hs400 ...
>>>
>>
>> Sorry. I use the default, hs200 with 200MHz.
>>
> 
> Please state this clearly next time. This seems to change with each of your
> issue report, I'm done guessing.
> 
> [...]
> 
>>> As far as I can tell, your eMMC card + your odroidc2 is simply not able to
>>> cope
>>> (reliably) with 166Mhz. The fact that you continue to have CRC errors with
>>> your
>>> "hand picked phases" is an evidence of this fact.
>>>
>>
>> When setting the default tx phase to 0 I have a rock-stable system w/o any CRC
>> error (hs200 with DT 200MHz).
> 
> Getting CRC Errors is not my definition of "rock stable"
> 
When changing the initial tx phase to 0 I have no CRC errors at all with
hs200@200.
hs200@200 fails with initial tx phase = 270, just with my patch I get one
CRC error and after retuning system works w/oo further CRC errors.

With initial tx phase = 0 hs400@142 fails whilst hs400@125 works w/o CRC errors
and gives me 177 MB/s result when doing
dd if=/dev/mmcblk0 of=/dev/null iflag=direct bs=1G count=1

>>
>> When leaving the default tx phase at 270, after tuning I end up with rx phase
>> 90 and tx phase 300. This combination works perfect when tuning but fails
>> in real life.
>>
> 
> Ok, starting from Rx:0 Tx:270 then tuning gives you Tx:300 Rx:90
> And what different did it gives you starting Rx:0/Tx:0 ?
> 
> The result of the tuning does not depends on starting point, so I don't really
> understand how it would significantly change things.
> 
I think it depends on the tx phase starting point.

Tuning does:
rx phase tuning
tx phase tuning
rx phase tuning

Result of each step depends on result of previous step. So also the initial
rx phase tuning result depends on the starting point of tx phase.

> There is nothing vastly innovative in this driver. There is at least 2 other
> drivers upstream which use the same kind of algorithm to perform the tuning.
> I've tested it on the vast majority of amlogic supported platforms, including
> yours.
> 
> Yes, I don't have your particular eMMC chip but I gave you my understanding of
> the situation based on my experience : If the tuning succeed but you then get
> CRC, it means that signal (quality) is not good enough.
> 
> If you think it is something else, I'm happy to review your changes but you need
> to provide a better explanation than "Don't ask me how this can happen, I just
> see it happen." ... because I will ask !
> 
>> I saw in the chip spec that there are few emmc-related calibration values in
>> SD_EMMC_ADJUST 
> 
> Adjust is just another fancy way to change clock phase based on the input clock
> resampling. When the rate increase the precision of this method decrease (with
> the divisor value). I tried it, it is not useful.
> 
>> and SD_EMMC_CALOUT. However there's no documentation how to
>> use them. Looking at the vendor driver might help, though.
> 
> The vendor kernel around this calibration is very complex. It is used to set per
> line delays to adjust for track length differences. Given that I tested the
> odroidc2 with both hs200 and hs400 w/o this, I doubt it will change anything for
> you
> 
>> Did you have a closer look at these values ?
> 
>
Jerome Brunet Oct. 12, 2017, 8:05 p.m. UTC | #5
On Thu, 2017-10-12 at 21:49 +0200, Heiner Kallweit wrote:
> > The result of the tuning does not depends on starting point, so I don't
> > really
> > understand how it would significantly change things.
> > 
> 
> I think it depends on the tx phase starting point.
> 

If the result of the tuning is not independent of the starting, instead of just
telling me, it is fairly easy for you to give actual result, like I asked you:

> > Ok, starting from Rx:0 Tx:270 then tuning gives you Tx:300 Rx:90
> > And what different did it gives you starting Rx:0/Tx:0 ?

And if the result are indeed vastly different, let's debug and get a real
explanation.

> Tuning does:
> rx phase tuning
> tx phase tuning
> rx phase tuning
> 
> Result of each step depends on result of previous step.

Again, we don't agree. 

> So also the initial
> rx phase tuning result depends on the starting point of tx phase.

The first Rx tuning is there only to get a sane starting point for the tx tuning
,as explained in the code. This is the reason why is not done when re-tuning.
Heiner Kallweit Oct. 12, 2017, 8:29 p.m. UTC | #6
Am 12.10.2017 um 22:05 schrieb Jerome Brunet:
> On Thu, 2017-10-12 at 21:49 +0200, Heiner Kallweit wrote:
>>> The result of the tuning does not depends on starting point, so I don't
>>> really
>>> understand how it would significantly change things.
>>>
>>
>> I think it depends on the tx phase starting point.
>>
> 
> If the result of the tuning is not independent of the starting, instead of just
> telling me, it is fairly easy for you to give actual result, like I asked you:
> 

With hs200@200 and tx phase starting point 0 I get the following with no further CRC errors.

[    0.726572] meson-gx-mmc d0074000.mmc: d0074000.mmc#rx phase/delay tunning...
[    0.728664] hctosys: unable to open rtc device (rtc0)
[    0.728827] USB_OTG_PWR: disabling
[    0.728831] TFLASH_VDD: disabling
[    0.728833] TF_IO: disabling
[    0.753183] Waiting for root device /dev/mmcblk0p1...
[    0.754352] meson-gx-mmc d0074000.mmc: success with phase: 270
[    0.758386] meson-gx-mmc d0074000.mmc: d0074000.mmc#tx phase/delay tunning...
[    0.768050] meson-gx-mmc d0074000.mmc: success with phase: 120
[    0.771235] meson-gx-mmc d0074000.mmc: d0074000.mmc#rx phase/delay tunning...
[    0.780902] meson-gx-mmc d0074000.mmc: success with phase: 0
[    0.784036] mmc0: new HS200 MMC card at address 0001
[    0.789090] mmcblk0: mmc0:0001 DJNB4R 116 GiB
[    0.793328] mmcblk0boot0: mmc0:0001 DJNB4R partition 1 4.00 MiB
[    0.799198] mmcblk0boot1: mmc0:0001 DJNB4R partition 2 4.00 MiB
[    0.805048] mmcblk0rpmb: mmc0:0001 DJNB4R partition 3 4.00 MiB, chardev (249:0)
[    0.812799] mmcblk0: response CRC error sending r/w cmd command, card status 0x900
[    0.819727] meson-gx-mmc d0074000.mmc: d0074000.mmc#tx phase/delay tunning...
[    0.827007] meson-gx-mmc d0074000.mmc: success with phase: 210
[    0.832559] meson-gx-mmc d0074000.mmc: d0074000.mmc#rx phase/delay tunning...
[    0.839848] meson-gx-mmc d0074000.mmc: success with phase: 0


>>> Ok, starting from Rx:0 Tx:270 then tuning gives you Tx:300 Rx:90
>>> And what different did it gives you starting Rx:0/Tx:0 ?
> 
> And if the result are indeed vastly different, let's debug and get a real
> explanation.
> 
>> Tuning does:
>> rx phase tuning
>> tx phase tuning
>> rx phase tuning
>>
>> Result of each step depends on result of previous step.
> 
> Again, we don't agree. 
> 
>> So also the initial
>> rx phase tuning result depends on the starting point of tx phase.
> 
> The first Rx tuning is there only to get a sane starting point for the tx tuning
> ,as explained in the code. This is the reason why is not done when re-tuning.
> 
> 
>
Jerome Brunet Oct. 12, 2017, 8:59 p.m. UTC | #7
On Thu, 2017-10-12 at 22:29 +0200, Heiner Kallweit wrote:
> Am 12.10.2017 um 22:05 schrieb Jerome Brunet:
> > On Thu, 2017-10-12 at 21:49 +0200, Heiner Kallweit wrote:
> > > > The result of the tuning does not depends on starting point, so I don't
> > > > really
> > > > understand how it would significantly change things.
> > > > 
> > > 
> > > I think it depends on the tx phase starting point.
> > > 
> > 
> > If the result of the tuning is not independent of the starting, instead of
> > just
> > telling me, it is fairly easy for you to give actual result, like I asked
> > you:
> > 
> 
> With hs200@200 and tx phase starting point 0 I get the following with no
> further CRC errors.
> 
> [    0.726572] meson-gx-mmc d0074000.mmc: d0074000.mmc#rx phase/delay
> tunning...
> [    0.728664] hctosys: unable to open rtc device (rtc0)
> [    0.728827] USB_OTG_PWR: disabling
> [    0.728831] TFLASH_VDD: disabling
> [    0.728833] TF_IO: disabling
> [    0.753183] Waiting for root device /dev/mmcblk0p1...
> [    0.754352] meson-gx-mmc d0074000.mmc: success with phase: 270
> [    0.758386] meson-gx-mmc d0074000.mmc: d0074000.mmc#tx phase/delay
> tunning...
> [    0.768050] meson-gx-mmc d0074000.mmc: success with phase: 120
> [    0.771235] meson-gx-mmc d0074000.mmc: d0074000.mmc#rx phase/delay
> tunning...
> [    0.780902] meson-gx-mmc d0074000.mmc: success with phase: 0
> [    0.784036] mmc0: new HS200 MMC card at address 0001
> [    0.789090] mmcblk0: mmc0:0001 DJNB4R 116 GiB
> [    0.793328] mmcblk0boot0: mmc0:0001 DJNB4R partition 1 4.00 MiB
> [    0.799198] mmcblk0boot1: mmc0:0001 DJNB4R partition 2 4.00 MiB
> [    0.805048] mmcblk0rpmb: mmc0:0001 DJNB4R partition 3 4.00 MiB, chardev
> (249:0)
> [    0.812799] mmcblk0: response CRC error sending r/w cmd command, card
> status 0x900
> [    0.819727] meson-gx-mmc d0074000.mmc: d0074000.mmc#tx phase/delay
> tunning...
> [    0.827007] meson-gx-mmc d0074000.mmc: success with phase: 210
> [    0.832559] meson-gx-mmc d0074000.mmc: d0074000.mmc#rx phase/delay
> tunning...
> [    0.839848] meson-gx-mmc d0074000.mmc: success with phase: 0
> 
> 
> > > > Ok, starting from Rx:0 Tx:270 then tuning gives you Tx:300 Rx:90
> > > > And what different did it gives you starting Rx:0/Tx:0 ?
> > 
> > And if the result are indeed vastly different, let's debug and get a real
> > explanation.
> > 
> > > Tuning does:
> > > rx phase tuning
> > > tx phase tuning
> > > rx phase tuning
> > > 
> > > Result of each step depends on result of previous step.
> > 
> > Again, we don't agree. 
> > 
> > > So also the initial
> > > rx phase tuning result depends on the starting point of tx phase.
> > 
> > The first Rx tuning is there only to get a sane starting point for the tx
> > tuning
> > ,as explained in the code. This is the reason why is not done when re-
> > tuning.
> > 

Heiner, This is the same story again and again !
Getting clear status is just really painful. It's like half the message is
ignored each time. 

I can clearly see that, this result comes from your hack, and this hack makes no
sense. I don't care about this result. This is not what I asked.

What I asked is:
1) From linux-next what is tuning result in hs200@200Mhz with the starting :
* Rx:0/Tx:270/Core:180 (current setting)
* Rx:0/Tx:0/Core:180 (the setting you keep on mentioning)

Please don't send your logs: the phase set are displayed in
debugfs/clk?clk_summary

2) Try to come up with a real explanation: "It just happens ..." is not an
explanation. 

> > 
> > 
> 
>
Heiner Kallweit Oct. 12, 2017, 9:04 p.m. UTC | #8
Am 12.10.2017 um 22:59 schrieb Jerome Brunet:
> On Thu, 2017-10-12 at 22:29 +0200, Heiner Kallweit wrote:
>> Am 12.10.2017 um 22:05 schrieb Jerome Brunet:
>>> On Thu, 2017-10-12 at 21:49 +0200, Heiner Kallweit wrote:
>>>>> The result of the tuning does not depends on starting point, so I don't
>>>>> really
>>>>> understand how it would significantly change things.
>>>>>
>>>>
>>>> I think it depends on the tx phase starting point.
>>>>
>>>
>>> If the result of the tuning is not independent of the starting, instead of
>>> just
>>> telling me, it is fairly easy for you to give actual result, like I asked
>>> you:
>>>
>>
>> With hs200@200 and tx phase starting point 0 I get the following with no
>> further CRC errors.
>>
>> [    0.726572] meson-gx-mmc d0074000.mmc: d0074000.mmc#rx phase/delay
>> tunning...
>> [    0.728664] hctosys: unable to open rtc device (rtc0)
>> [    0.728827] USB_OTG_PWR: disabling
>> [    0.728831] TFLASH_VDD: disabling
>> [    0.728833] TF_IO: disabling
>> [    0.753183] Waiting for root device /dev/mmcblk0p1...
>> [    0.754352] meson-gx-mmc d0074000.mmc: success with phase: 270
>> [    0.758386] meson-gx-mmc d0074000.mmc: d0074000.mmc#tx phase/delay
>> tunning...
>> [    0.768050] meson-gx-mmc d0074000.mmc: success with phase: 120
>> [    0.771235] meson-gx-mmc d0074000.mmc: d0074000.mmc#rx phase/delay
>> tunning...
>> [    0.780902] meson-gx-mmc d0074000.mmc: success with phase: 0
>> [    0.784036] mmc0: new HS200 MMC card at address 0001
>> [    0.789090] mmcblk0: mmc0:0001 DJNB4R 116 GiB
>> [    0.793328] mmcblk0boot0: mmc0:0001 DJNB4R partition 1 4.00 MiB
>> [    0.799198] mmcblk0boot1: mmc0:0001 DJNB4R partition 2 4.00 MiB
>> [    0.805048] mmcblk0rpmb: mmc0:0001 DJNB4R partition 3 4.00 MiB, chardev
>> (249:0)
>> [    0.812799] mmcblk0: response CRC error sending r/w cmd command, card
>> status 0x900
>> [    0.819727] meson-gx-mmc d0074000.mmc: d0074000.mmc#tx phase/delay
>> tunning...
>> [    0.827007] meson-gx-mmc d0074000.mmc: success with phase: 210
>> [    0.832559] meson-gx-mmc d0074000.mmc: d0074000.mmc#rx phase/delay
>> tunning...
>> [    0.839848] meson-gx-mmc d0074000.mmc: success with phase: 0
>>
>>
>>>>> Ok, starting from Rx:0 Tx:270 then tuning gives you Tx:300 Rx:90
>>>>> And what different did it gives you starting Rx:0/Tx:0 ?
>>>
>>> And if the result are indeed vastly different, let's debug and get a real
>>> explanation.
>>>
>>>> Tuning does:
>>>> rx phase tuning
>>>> tx phase tuning
>>>> rx phase tuning
>>>>
>>>> Result of each step depends on result of previous step.
>>>
>>> Again, we don't agree. 
>>>
>>>> So also the initial
>>>> rx phase tuning result depends on the starting point of tx phase.
>>>
>>> The first Rx tuning is there only to get a sane starting point for the tx
>>> tuning
>>> ,as explained in the code. This is the reason why is not done when re-
>>> tuning.
>>>
> 
> Heiner, This is the same story again and again !
> Getting clear status is just really painful. It's like half the message is
> ignored each time. 
> 
> I can clearly see that, this result comes from your hack, and this hack makes no
> sense. I don't care about this result. This is not what I asked.
> 
No, the log provided is w/o my hack. The debugfs values just represent the current
values, interesting is how they change from tuning step to tuning step.

> What I asked is:
> 1) From linux-next what is tuning result in hs200@200Mhz with the starting :
> * Rx:0/Tx:270/Core:180 (current setting)
> * Rx:0/Tx:0/Core:180 (the setting you keep on mentioning)
> 
> Please don't send your logs: the phase set are displayed in
> debugfs/clk?clk_summary
> 
> 2) Try to come up with a real explanation: "It just happens ..." is not an
> explanation. 
> 
>>>
>>>
>>
>>
> 
>
diff mbox

Patch

diff --git a/drivers/mmc/host/meson-gx-mmc.c b/drivers/mmc/host/meson-gx-mmc.c
index 85745ef1..95cb439d 100644
--- a/drivers/mmc/host/meson-gx-mmc.c
+++ b/drivers/mmc/host/meson-gx-mmc.c
@@ -120,6 +120,7 @@ 
 #define SD_EMMC_DESC_CHAIN_MODE BIT(1)
 
 #define MUX_CLK_NUM_PARENTS 2
+#define MAX_TUNING_ATTEMPTS 10
 
 struct sd_emmc_desc {
 	u32 cmd_cfg;
@@ -687,15 +688,23 @@  static int meson_mmc_find_tuning_point(unsigned long *test)
 static int meson_mmc_clk_phase_tuning(struct mmc_host *mmc, u32 opcode,
 				      struct clk *clk)
 {
-	int point, ret;
+	int point, ret, old_phase;
 	DECLARE_BITMAP(test, CLK_PHASE_POINT_NUM);
 
+	old_phase = clk_get_phase(clk);
+	if (old_phase < 0)
+		return old_phase;
+
 	dev_dbg(mmc_dev(mmc), "%s phase/delay tunning...\n",
 		__clk_get_name(clk));
 	bitmap_zero(test, CLK_PHASE_POINT_NUM);
 
 	/* Explore tuning points */
 	for (point = 0; point < CLK_PHASE_POINT_NUM; point++) {
+		/* when retuning avoid the surrounding of where we failed */
+		if (mmc->doing_retune)
+			if (abs(point * CLK_PHASE_STEP - old_phase) <= 45)
+				continue;
 		clk_set_phase(clk, point * CLK_PHASE_STEP);
 		ret = mmc_send_tuning(mmc, opcode, NULL);
 		if (!ret)
@@ -704,8 +713,11 @@  static int meson_mmc_clk_phase_tuning(struct mmc_host *mmc, u32 opcode,
 
 	/* Find the optimal tuning point and apply it */
 	point = meson_mmc_find_tuning_point(test);
-	if (point < 0)
+	if (point < 0) {
+		/* prevent from getting stuck if we failed */
+		clk_set_phase(clk, (old_phase + 90) % 360);
 		return point; /* tuning failed */
+	}
 
 	clk_set_phase(clk, point * CLK_PHASE_STEP);
 	dev_dbg(mmc_dev(mmc), "success with phase: %d\n",
@@ -716,7 +728,7 @@  static int meson_mmc_clk_phase_tuning(struct mmc_host *mmc, u32 opcode,
 static int meson_mmc_execute_tuning(struct mmc_host *mmc, u32 opcode)
 {
 	struct meson_host *host = mmc_priv(mmc);
-	int ret;
+	int i, ret;
 
 	/*
 	 * If this is the initial tuning, try to get a sane Rx starting
@@ -729,11 +741,14 @@  static int meson_mmc_execute_tuning(struct mmc_host *mmc, u32 opcode)
 			return ret;
 	}
 
-	ret = meson_mmc_clk_phase_tuning(mmc, opcode, host->tx_clk);
-	if (ret)
-		return ret;
+	for (i = 0; i < MAX_TUNING_ATTEMPTS; i++) {
+		meson_mmc_clk_phase_tuning(mmc, opcode, host->tx_clk);
+		ret = meson_mmc_clk_phase_tuning(mmc, opcode, host->rx_clk);
+		if (!ret)
+			return 0;
+	}
 
-	return meson_mmc_clk_phase_tuning(mmc, opcode, host->rx_clk);
+	return ret;
 }
 
 static void meson_mmc_set_ios(struct mmc_host *mmc, struct mmc_ios *ios)