[v4] arm64: dts: rockchip: Fix SD card init on rk3399-nanopi4

Message ID 73F9AED0-D2A8-4294-B6E1-1B92D2A36529@kohlschutter.com (mailing list archive)
State New, archived
Series [v4] arm64: dts: rockchip: Fix SD card init on rk3399-nanopi4

Commit Message

Christian Kohlschütter July 15, 2022, 5:12 p.m. UTC
mmc/SD-card initialization may fail on NanoPi R4S with
"mmc1: problem reading SD Status register" /
"mmc1: error -110 whilst initialising SD card"
either on cold boot or after a reboot.

Moreover, the system would also sometimes hang upon reboot.

This is prevented by setting an explicit undervoltage protection limit
for the SD-card-specific vcc3v0_sd voltage regulator.

Set the undervoltage protection limit to 2.7V, which is the minimum
permissible SD card operating voltage.

Signed-off-by: Christian Kohlschütter <christian@kohlschutter.com>
---
arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi | 4 ++++
1 file changed, 4 insertions(+)
mode change 100644 => 100755 arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi

Comments

Christian Kohlschütter July 15, 2022, 5:16 p.m. UTC | #1
OK, this took me a while to figure out.

When no undervoltage limit is configured, I can reliably trigger the initialization bug upon boot.
When the limit is set to 3.0V, it rarely occurs, but just after I sent the v3 patch, I was able to reproduce it...

> On 15.07.2022 at 19:12, Christian Kohlschütter <christian@kohlschutter.com> wrote:
> 
> mmc/SD-card initialization may fail on NanoPi R4S with
> "mmc1: problem reading SD Status register" /
> "mmc1: error -110 whilst initialising SD card"
> either on cold boot or after a reboot.
> 
> Moreover, the system would also sometimes hang upon reboot.
> 
> This is prevented by setting an explicit undervoltage protection limit
> for the SD-card-specific vcc3v0_sd voltage regulator.
> 
> Set the undervoltage protection limit to 2.7V, which is the minimum
> permissible SD card operating voltage.
> 
> Signed-off-by: Christian Kohlschütter <christian@kohlschutter.com>
> ---
> arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi | 4 ++++
> 1 file changed, 4 insertions(+)
> mode change 100644 => 100755 arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
> 
> diff --git a/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
> old mode 100644
> new mode 100755
> index 8c0ff6c96e03..669c74ce4d13
> --- a/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
> @@ -73,6 +73,10 @@ vcc3v0_sd: vcc3v0-sd {
> 		regulator-always-on;
> 		regulator-min-microvolt = <3000000>;
> 		regulator-max-microvolt = <3000000>;
> +
> +		// must be configured or SD card may fail to initialize occasionally
> +		regulator-uv-protection-microvolt = <2700000>;
> +
> 		regulator-name = "vcc3v0_sd";
> 		vin-supply = <&vcc3v3_sys>;
> 	};
> -- 
> 2.36.1
Robin Murphy July 15, 2022, 6:11 p.m. UTC | #2
On 2022-07-15 18:16, Christian Kohlschütter wrote:
> OK, this took me a while to figure out.
> 
> When no undervoltage limit is configured, I can reliably trigger the initialization bug upon boot.
> When the limit is set to 3.0V, it rarely occurs, but just after I send the v3 patch, I was able to reproduce...

Well this has to be in the running for "weirdest placebo ever"... :/

All it actually seems to achieve is printing an error[1] (this is after 
all a tiny 5-pin fixed-voltage LDO regulator, not an intelligent PMIC), 
and if that makes an appreciable difference then there has to be some 
kind of weird timing condition at play. Maybe regulator_register() ends 
up turning it off and on again rapidly enough that the card sees a 
voltage brownout and glitches, and adding more delay by printing to the 
console somewhere in the middle gives it enough time to act as a proper 
power cycle with no ill effect?

If you just whack something like an mdelay(500) at around that point in 
set_machine_constraints(), without the DT property, does it have the 
same effect?

Robin.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/regulator/core.c#n1521

>> On 15.07.2022 at 19:12, Christian Kohlschütter <christian@kohlschutter.com> wrote:
>>
>> mmc/SD-card initialization may fail on NanoPi R4S with
>> "mmc1: problem reading SD Status register" /
>> "mmc1: error -110 whilst initialising SD card"
>> either on cold boot or after a reboot.
>>
>> Moreover, the system would also sometimes hang upon reboot.
>>
>> This is prevented by setting an explicit undervoltage protection limit
>> for the SD-card-specific vcc3v0_sd voltage regulator.
>>
>> Set the undervoltage protection limit to 2.7V, which is the minimum
>> permissible SD card operating voltage.
>>
>> Signed-off-by: Christian Kohlschütter <christian@kohlschutter.com>
>> ---
>> arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi | 4 ++++
>> 1 file changed, 4 insertions(+)
>> mode change 100644 => 100755 arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
>>
>> diff --git a/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
>> old mode 100644
>> new mode 100755
>> index 8c0ff6c96e03..669c74ce4d13
>> --- a/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
>> +++ b/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
>> @@ -73,6 +73,10 @@ vcc3v0_sd: vcc3v0-sd {
>> 		regulator-always-on;
>> 		regulator-min-microvolt = <3000000>;
>> 		regulator-max-microvolt = <3000000>;
>> +
>> +		// must be configured or SD card may fail to initialize occasionally
>> +		regulator-uv-protection-microvolt = <2700000>;
>> +
>> 		regulator-name = "vcc3v0_sd";
>> 		vin-supply = <&vcc3v3_sys>;
>> 	};
>> -- 
>> 2.36.1
>
Christian Kohlschütter July 15, 2022, 6:57 p.m. UTC | #3
> On 15.07.2022 at 20:11, Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 2022-07-15 18:16, Christian Kohlschütter wrote:
>> OK, this took me a while to figure out.
>> When no undervoltage limit is configured, I can reliably trigger the initialization bug upon boot.
>> When the limit is set to 3.0V, it rarely occurs, but just after I send the v3 patch, I was able to reproduce...
> 
> Well this has to be in the running for "weirdest placebo ever"... :/
> 
> All it actually seems to achieve is printing an error[1] (this is after all a tiny 5-pin fixed-voltage LDO regulator, not an intelligent PMIC), and if that makes an appreciable difference then there has to be some kind of weird timing condition at play. Maybe regulator_register() ends up turning it off and on again rapidly enough that the card sees a voltage brownout and glitches, and adding more delay by printing to the console somewhere in the middle gives it enough time to act as a proper power cycle with no ill effect?

That's definitely something between placebo and homeopathy :-)

I can confirm that setting a limit higher than 3.0V still works. So the one incident where it still crashed suggests that there is indeed a timing issue at play: adding the undervoltage property (unlike the ramp-delay settings that I also tried) introduced just enough delay to make it work 99 out of 100 times.

> If you just whack something like an mdelay(500) at around that point in set_machine_constraints(), without the DT property, does it have the same effect?
Adding a delay for vcc3v0_sd works, which is great! (patch below)

Is there an existing path from the device-tree parser to regulator/core.c that we can use to specify this delay for this regulator only?
Also, what delay should we choose to make sure it works all the time and not just 99 out of 100 times?

Best,
Christian

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index c4d844ffad7a..0e15ec2548f4 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1483,6 +1483,11 @@ static int set_machine_constraints(struct regulator_dev *rdev)
                          "IC does not support requested over voltage limits\n");
        }
 
+       if (!strncmp(rdev_get_name(rdev), "vcc3v0_sd", sizeof("vcc3v0_sd"))) {
+               rdev_err(rdev, "DELAY: %s\n", rdev_get_name(rdev));
+               mdelay(500);
+       }
+
        if (rdev->constraints->under_voltage_detection)
                ret = handle_notify_limits(rdev,
                                           ops->set_under_voltage_protection,
Robin Murphy July 15, 2022, 6:57 p.m. UTC | #4
On 2022-07-15 19:11, Robin Murphy wrote:
> On 2022-07-15 18:16, Christian Kohlschütter wrote:
>> OK, this took me a while to figure out.
>>
>> When no undervoltage limit is configured, I can reliably trigger the 
>> initialization bug upon boot.
>> When the limit is set to 3.0V, it rarely occurs, but just after I send 
>> the v3 patch, I was able to reproduce...
> 
> Well this has to be in the running for "weirdest placebo ever"... :/
> 
> All it actually seems to achieve is printing an error[1] (this is after 
> all a tiny 5-pin fixed-voltage LDO regulator, not an intelligent PMIC), 
> and if that makes an appreciable difference then there has to be some 
> kind of weird timing condition at play. Maybe regulator_register() ends 
> up turning it off and on again rapidly enough that the card sees a 
> voltage brownout and glitches, and adding more delay by printing to the 
> console somewhere in the middle gives it enough time to act as a proper 
> power cycle with no ill effect?

...and apparently the answer is yes, it seems to be doing exactly that 
(see attached). But seemingly my SD cards don't mind, or maybe my T4 
board happens to have more capacitance than Christian's R4S so my 
voltage dip isn't as bad, or both.

So it seems like the solution here might indeed simply be to remove the 
regulator-always-on which doesn't seem to have any reason to be here 
anyway. Without that, the enable stays low until the MMC driver probes 
and claims it, which is then massively longer than the time it takes for 
VCC3V0_SD to ramp down completely.

Robin.

> 
> If you just whack something like an mdelay(500) at around that point in 
> set_machine_constraints(), without the DT property, does it have the 
> same effect?
> 
> Robin.
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/regulator/core.c#n1521 
> 
> 
>>> On 15.07.2022 at 19:12, Christian Kohlschütter 
>>> <christian@kohlschutter.com> wrote:
>>>
>>> mmc/SD-card initialization may fail on NanoPi R4S with
>>> "mmc1: problem reading SD Status register" /
>>> "mmc1: error -110 whilst initialising SD card"
>>> either on cold boot or after a reboot.
>>>
>>> Moreover, the system would also sometimes hang upon reboot.
>>>
>>> This is prevented by setting an explicit undervoltage protection limit
>>> for the SD-card-specific vcc3v0_sd voltage regulator.
>>>
>>> Set the undervoltage protection limit to 2.7V, which is the minimum
>>> permissible SD card operating voltage.
>>>
>>> Signed-off-by: Christian Kohlschütter <christian@kohlschutter.com>
>>> ---
>>> arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>> mode change 100644 => 100755 
>>> arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
>>>
>>> diff --git a/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi 
>>> b/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
>>> old mode 100644
>>> new mode 100755
>>> index 8c0ff6c96e03..669c74ce4d13
>>> --- a/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
>>> +++ b/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
>>> @@ -73,6 +73,10 @@ vcc3v0_sd: vcc3v0-sd {
>>>         regulator-always-on;
>>>         regulator-min-microvolt = <3000000>;
>>>         regulator-max-microvolt = <3000000>;
>>> +
>>> +        // must be configured or SD card may fail to initialize 
>>> occasionally
>>> +        regulator-uv-protection-microvolt = <2700000>;
>>> +
>>>         regulator-name = "vcc3v0_sd";
>>>         vin-supply = <&vcc3v3_sys>;
>>>     };
>>> -- 
>>> 2.36.1
>>
> 
Christian Kohlschütter July 15, 2022, 7:04 p.m. UTC | #5
On 15.07.2022 at 20:57, Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 2022-07-15 19:11, Robin Murphy wrote:
>> On 2022-07-15 18:16, Christian Kohlschütter wrote:
>>> OK, this took me a while to figure out.
>>> 
>>> When no undervoltage limit is configured, I can reliably trigger the initialization bug upon boot.
>>> When the limit is set to 3.0V, it rarely occurs, but just after I send the v3 patch, I was able to reproduce...
>> Well this has to be in the running for "weirdest placebo ever"... :/
>> All it actually seems to achieve is printing an error[1] (this is after all a tiny 5-pin fixed-voltage LDO regulator, not an intelligent PMIC), and if that makes an appreciable difference then there has to be some kind of weird timing condition at play. Maybe regulator_register() ends up turning it off and on again rapidly enough that the card sees a voltage brownout and glitches, and adding more delay by printing to the console somewhere in the middle gives it enough time to act as a proper power cycle with no ill effect?
> 
> ...and apparently the answer is yes, it seems to be doing exactly that (see attached). But seemingly my SD cards don't mind, or maybe my T4 board happens to have more capacitance than Christian's R4S so my voltage dip isn't as bad, or both.
> 
> So it seems like the solution here might indeed simply be to remove the regulator-always-on which doesn't seem to have any reason to be here anyway. Without that, the enable stays low until the MMC driver probes and claims it, which is then massively longer than the time it takes for VCC3V0_SD to ramp down completely.
> 
> Robin.

Removing "regulator-always-on" has the effect that the system freezes upon reboot. There may well be another bug slumbering in the codebase that is circumvented by 1. adding a delay in the code and 2. not turning the regulator off upon shutdown.
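
For reference, the experiment described here (dropping regulator-always-on from vcc3v0_sd) could be expressed as a board-level override along the following lines. This is only a sketch: the thread does not show the exact change that was tested, and using an override instead of editing rk3399-nanopi4.dtsi directly is an assumption made for illustration.

```dts
/* Hypothetical override, not part of the posted patch. */
&vcc3v0_sd {
	/* Let the MMC driver enable the supply when it probes. */
	/delete-property/ regulator-always-on;
};
```

As reported above, this variant makes the R4S freeze upon reboot, so it documents the test and is not a proposed fix.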
Robin Murphy July 15, 2022, 7:38 p.m. UTC | #6
On 2022-07-15 20:04, Christian Kohlschütter wrote:
> On 15.07.2022 at 20:57, Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2022-07-15 19:11, Robin Murphy wrote:
>>> On 2022-07-15 18:16, Christian Kohlschütter wrote:
>>>> OK, this took me a while to figure out.
>>>>
>>>> When no undervoltage limit is configured, I can reliably trigger the initialization bug upon boot.
>>>> When the limit is set to 3.0V, it rarely occurs, but just after I send the v3 patch, I was able to reproduce...
>>> Well this has to be in the running for "weirdest placebo ever"... :/
>>> All it actually seems to achieve is printing an error[1] (this is after all a tiny 5-pin fixed-voltage LDO regulator, not an intelligent PMIC), and if that makes an appreciable difference then there has to be some kind of weird timing condition at play. Maybe regulator_register() ends up turning it off and on again rapidly enough that the card sees a voltage brownout and glitches, and adding more delay by printing to the console somewhere in the middle gives it enough time to act as a proper power cycle with no ill effect?
>>
>> ...and apparently the answer is yes, it seems to be doing exactly that (see attached). But seemingly my SD cards don't mind, or maybe my T4 board happens to have more capacitance than Christian's R4S so my voltage dip isn't as bad, or both.
>>
>> So it seems like the solution here might indeed simply be to remove the regulator-always-on which doesn't seem to have any reason to be here anyway. Without that, the enable stays low until the MMC driver probes and claims it, which is then massively longer than the time it takes for VCC3V0_SD to ramp down completely.
>>
>> Robin.
> 
> Removing "regulator-always-on" has the effect that the system freezes upon reboot.

Ah, right (can we fast-forward to a world where everyone has a reliable 
bootloader in SPI flash or similar?). Is that more glitching, or a 
firmware bug not resetting the GPIOs to their default state on warm 
reset, I wonder.

> There may well be another bug slumbering in the codebase that is circumvented by 1. adding a delay in the code and 2. not turning the regulator off upon shutdown.

Yes, it seems suboptimal that the regulator core allows this glitch 
where an always-on regulator which is already on gets turned off at all, 
but I guess that's its own problem. In the meantime, off-on-delay-us 
sounds like the most likely property to bandage this locally. I'm seeing 
a fall time in the order of milliseconds (attached), so we'd probably 
want a fair chunk of that to be safe.

Robin.
Christian Kohlschütter July 15, 2022, 10:33 p.m. UTC | #7
> On 15.07.2022 at 21:38, Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 2022-07-15 20:04, Christian Kohlschütter wrote:
>> On 15.07.2022 at 20:57, Robin Murphy <robin.murphy@arm.com> wrote:
>>> 
>>> On 2022-07-15 19:11, Robin Murphy wrote:
>>>> On 2022-07-15 18:16, Christian Kohlschütter wrote:
>>>>> OK, this took me a while to figure out.
>>>>> 
>>>>> When no undervoltage limit is configured, I can reliably trigger the initialization bug upon boot.
>>>>> When the limit is set to 3.0V, it rarely occurs, but just after I send the v3 patch, I was able to reproduce...
>>>> Well this has to be in the running for "weirdest placebo ever"... :/
>>>> All it actually seems to achieve is printing an error[1] (this is after all a tiny 5-pin fixed-voltage LDO regulator, not an intelligent PMIC), and if that makes an appreciable difference then there has to be some kind of weird timing condition at play. Maybe regulator_register() ends up turning it off and on again rapidly enough that the card sees a voltage brownout and glitches, and adding more delay by printing to the console somewhere in the middle gives it enough time to act as a proper power cycle with no ill effect?
>>> 
>>> ...and apparently the answer is yes, it seems to be doing exactly that (see attached). But seemingly my SD cards don't mind, or maybe my T4 board happens to have more capacitance than Christian's R4S so my voltage dip isn't as bad, or both.
>>> 
>>> So it seems like the solution here might indeed simply be to remove the regulator-always-on which doesn't seem to have any reason to be here anyway. Without that, the enable stays low until the MMC driver probes and claims it, which is then massively longer than the time it takes for VCC3V0_SD to ramp down completely.
>>> 
>>> Robin.
>> Removing "regulator-always-on" has the effect that the system freezes upon reboot.
> 
> Ah, right (can we fast-forward to a world where everyone has a reliable bootloader in SPI flash or similar?). Is that more glitching, or a firmware bug not resetting the GPIOs to their default state on warm reset, I wonder.
> 
>> There may well be another bug slumbering in the codebase that is circumvented by 1. adding a delay in the code and 2. not turning the regulator off upon shutdown.
> 
> Yes, it seems suboptimal that the regulator core allows this glitch where an always-on regulator which is already on gets turned off at all, but I guess that's its own problem. In the meantime, off-on-delay-us sounds like the most likely property to bandage this locally. I'm seeing a fall time in the order of milliseconds (attached), so we'd probably want a fair chunk of that to be safe.
> 
> Robin.
> 
> [Attachment: SDS00003.png]

I think there is a way that avoids having to pick a delay value that may ultimately not work in all cases.
Following up with "[PATCH] regulator: core: Resolve supply name earlier to prevent double-init" [1]

Thank you so much for helping me get this far! It would be great if you'd keep following the thread.

Best,
Christian

[1] https://www.spinics.net/lists/kernel/msg4440365.html
Christian Kohlschütter July 16, 2022, 12:24 a.m. UTC | #8
>> 
>>> There may well be another bug slumbering in the codebase that is circumvented by 1. adding a delay in the code and 2. not turning the regulator off upon shutdown.
>> 
>> Yes, it seems suboptimal that the regulator core allows this glitch where an always-on regulator which is already on gets turned off at all, but I guess that's its own problem. In the meantime, off-on-delay-us sounds like the most likely property to bandage this locally. I'm seeing a fall time in the order of milliseconds (attached), so we'd probably want a fair chunk of that to be safe.
>> 
>> Robin.
>> 
>> [Attachment: SDS00003.png]
> 
> I think we have a way where there's no need to pick a delay value that may ultimately not work in all cases.
> Following up with "[PATCH] regulator: core: Resolve supply name earlier to prevent double-init" [1]
> 
> Thank you so much for helping me getting that far! It would be great if you'd keep following the thread.
> 
> Best,
> Christian
> 
> [1] https://www.spinics.net/lists/kernel/msg4440365.html

@Robin,

Oddly enough, setting off-on-delay-us to values of up to a second (1000000 us) still results in failed inits.
I hope we can find another bandage until the regulator core patch gets merged.
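
For reference, the off-on-delay-us setting described here would look roughly as follows. This is a sketch only: it assumes vcc3v0_sd is a fixed regulator whose binding accepts off-on-delay-us, and the override form is an illustration, not the exact change that was tested. The value 1000000 is the largest one mentioned above.

```dts
/* Hypothetical override, not part of the posted patch. */
&vcc3v0_sd {
	/* Wait 1 s after the supply is switched off before re-enabling it. */
	off-on-delay-us = <1000000>;
};
```

As reported, even this value did not prevent the failed initializations.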

Patch

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
old mode 100644
new mode 100755
index 8c0ff6c96e03..669c74ce4d13
--- a/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399-nanopi4.dtsi
@@ -73,6 +73,10 @@  vcc3v0_sd: vcc3v0-sd {
		regulator-always-on;
		regulator-min-microvolt = <3000000>;
		regulator-max-microvolt = <3000000>;
+
+		// must be configured or SD card may fail to initialize occasionally
+		regulator-uv-protection-microvolt = <2700000>;
+
		regulator-name = "vcc3v0_sd";
		vin-supply = <&vcc3v3_sys>;
	};