diff mbox series

regulator: do not balance 'boot-on' coupled regulators without constraints

Message ID 20200605063724.9030-1-m.szyprowski@samsung.com (mailing list archive)
State Not Applicable
Headers show
Series regulator: do not balance 'boot-on' coupled regulators without constraints | expand

Commit Message

Marek Szyprowski June 5, 2020, 6:37 a.m. UTC
Balancing of the 'boot-on' coupled regulators must wait until the clients
set their constraints, otherwise the balancing code might change the
voltage of the not-yet-constrained regulator to the value below the
bootloader-configured operation point, what might cause a system crash.
This is achieved by assuming that, the minimal voltage allowed for the
given 'boot-on' regulator is equal to it's current voltage until
consumers apply their constraints.

Suggested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
This is yet another approach to fix the regulator coupling on
Exynos5800/5422 SoCs in the regulator core. I agree with Dmitry that this
issue is generic and if possible it should be handled in the core.

This patchset is another attempt to fix the regulator coupling on
Exynos5800/5422 SoCs. Here are links to the previous attempts and
discussions:

https://lore.kernel.org/linux-samsung-soc/20191008101709.qVNy8eijBi0LynOteWFMnTg4GUwKG599n6OyYoX1Abs@z/
https://lore.kernel.org/lkml/20191017102758.8104-1-m.szyprowski@samsung.com/
https://lore.kernel.org/linux-pm/cover.1589528491.git.viresh.kumar@linaro.org/
https://lore.kernel.org/linux-pm/20200528131130.17984-1-m.szyprowski@samsung.com/
https://lore.kernel.org/linux-samsung-soc/57cf3a15-5d9b-7636-4c69-60742e8cfae6@samsung.com/

The problem is with "vdd_int" regulator coupled with "vdd_arm" on Odroid
XU3/XU4 boards family. "vdd_arm" is handled by CPUfreq. "vdd_int" is
handled by devfreq. CPUfreq initialized quite early during boot and it
starts changing OPPs and "vdd_arm" value. Sometimes CPU activity during
boot goes down and some low-frequency OPPs are selected, what in turn
causes lowering "vdd_arm". This happens before devfreq applies its
requirements on "vdd_int". Regulator balancing code reduces "vdd_arm"
voltage value, what in turn causes lowering "vdd_int" value to the lowest
possible value. This is much below the operation point of the wcore bus,
which still runs at the highest frequency.

The issue was hard to notice because in the most cases the board managed
to boot properly, even when the regulator was set to lowest value allowed
by the regulator constraints. However, it caused some random issues,
which can be observed as "Unhandled prefetch abort" or low USB stability.

Best regards
Marek Szyprowski
---
 drivers/regulator/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Viresh Kumar June 5, 2020, 6:54 a.m. UTC | #1
+Saravana,

On 05-06-20, 08:37, Marek Szyprowski wrote:
> Balancing of the 'boot-on' coupled regulators must wait until the clients
> set their constraints, otherwise the balancing code might change the
> voltage of the not-yet-constrained regulator to the value below the
> bootloader-configured operation point, what might cause a system crash.
> This is achieved by assuming that, the minimal voltage allowed for the
> given 'boot-on' regulator is equal to it's current voltage until
> consumers apply their constraints.
> 
> Suggested-by: Dmitry Osipenko <digetx@gmail.com>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
> This is yet another approach to fix the regulator coupling on
> Exynos5800/5422 SoCs in the regulator core. I agree with Dmitry that this
> issue is generic and if possible it should be handled in the core.
> 
> This patchset is another attempt to fix the regulator coupling on
> Exynos5800/5422 SoCs. Here are links to the previous attempts and
> discussions:
> 
> https://lore.kernel.org/linux-samsung-soc/20191008101709.qVNy8eijBi0LynOteWFMnTg4GUwKG599n6OyYoX1Abs@z/
> https://lore.kernel.org/lkml/20191017102758.8104-1-m.szyprowski@samsung.com/
> https://lore.kernel.org/linux-pm/cover.1589528491.git.viresh.kumar@linaro.org/
> https://lore.kernel.org/linux-pm/20200528131130.17984-1-m.szyprowski@samsung.com/
> https://lore.kernel.org/linux-samsung-soc/57cf3a15-5d9b-7636-4c69-60742e8cfae6@samsung.com/
> 
> The problem is with "vdd_int" regulator coupled with "vdd_arm" on Odroid
> XU3/XU4 boards family. "vdd_arm" is handled by CPUfreq. "vdd_int" is
> handled by devfreq. CPUfreq initialized quite early during boot and it
> starts changing OPPs and "vdd_arm" value. Sometimes CPU activity during
> boot goes down and some low-frequency OPPs are selected, what in turn
> causes lowering "vdd_arm". This happens before devfreq applies its
> requirements on "vdd_int". Regulator balancing code reduces "vdd_arm"
> voltage value, what in turn causes lowering "vdd_int" value to the lowest
> possible value. This is much below the operation point of the wcore bus,
> which still runs at the highest frequency.
> 
> The issue was hard to notice because in the most cases the board managed
> to boot properly, even when the regulator was set to lowest value allowed
> by the regulator constraints. However, it caused some random issues,
> which can be observed as "Unhandled prefetch abort" or low USB stability.
> 
> Best regards
> Marek Szyprowski
> ---
>  drivers/regulator/core.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
> index 03154f5b939f..7e9af7ea4bdf 100644
> --- a/drivers/regulator/core.c
> +++ b/drivers/regulator/core.c
> @@ -3553,6 +3553,17 @@ static int regulator_get_optimal_voltage(struct regulator_dev *rdev,
>  		if (ret < 0)
>  			return ret;
>  
> +		/*
> +		 * If no constraints set yet and regulator has boot-on flag,
> +		 * keep its voltage unchanged
> +		 */
> +		if (tmp_min == 0 && c_rdevs[i]->constraints->boot_on) {
> +			ret = regulator_get_voltage_rdev(c_rdevs[i]);
> +			if (ret < 0)
> +				return ret;
> +			tmp_min = ret;
> +		}
> +
>  		ret = regulator_check_voltage(c_rdevs[i], &tmp_min, &tmp_max);
>  		if (ret < 0)
>  			return ret;

This is exactly what Saravana tried to solve earlier AFAIR, lets see what he has
to say here.
Mark Brown June 5, 2020, 10:20 a.m. UTC | #2
On Fri, Jun 05, 2020 at 08:37:24AM +0200, Marek Szyprowski wrote:

> Balancing of the 'boot-on' coupled regulators must wait until the clients
> set their constraints, otherwise the balancing code might change the

No, this is not what boot-on means at all.  It is there for cases where
we can't read the enable status from the hardware.  Trying to infer
*anything* about the runtime behaviour from it being present or absent
is very badly broken.

Saravana (CCed) was working on some patches which tried to deal with
some stuff around this for enables using the sync_state() callback.
Unfortunately there's quite a few problems with the current approach
(the biggest one from my point of view being that it's implemented so
that it requires every single consumer of every device on the PMIC to
come up but there's others at more of an implementation level).
Marek Szyprowski June 5, 2020, 1:37 p.m. UTC | #3
Hi Mark,

On 05.06.2020 12:20, Mark Brown wrote:
> On Fri, Jun 05, 2020 at 08:37:24AM +0200, Marek Szyprowski wrote:
>
>> Balancing of the 'boot-on' coupled regulators must wait until the clients
>> set their constraints, otherwise the balancing code might change the
> No, this is not what boot-on means at all.  It is there for cases where
> we can't read the enable status from the hardware.  Trying to infer
> *anything* about the runtime behaviour from it being present or absent
> is very badly broken.

Okay, what about the 'always-on' property? I don't think that we need 
another property for annotating this behavior, as in my opinion this is 
just an implementation issue on the Linux kernel and regulator 
framework. Alternatively I can drop the property check, but then it 
won't be possible to have a regulator without a consumer, which follows 
the other one (although we still don't have a real use case for it).

If you don't like this idea at all, I will try to move this logic to the 
custom coupler again, although it would mean some code copying.

> Saravana (CCed) was working on some patches which tried to deal with
> some stuff around this for enables using the sync_state() callback.
> Unfortunately there's quite a few problems with the current approach
> (the biggest one from my point of view being that it's implemented so
> that it requires every single consumer of every device on the PMIC to
> come up but there's others at more of an implementation level).
I'm not sure if we really need such complex solution for this...

Best regards
Mark Brown June 5, 2020, 3:59 p.m. UTC | #4
On Fri, Jun 05, 2020 at 03:37:32PM +0200, Marek Szyprowski wrote:
> On 05.06.2020 12:20, Mark Brown wrote:

> > No, this is not what boot-on means at all.  It is there for cases where
> > we can't read the enable status from the hardware.  Trying to infer
> > *anything* about the runtime behaviour from it being present or absent
> > is very badly broken.

> Okay, what about the 'always-on' property? I don't think that we need 
> another property for annotating this behavior, as in my opinion this is 

No, that's just as disconnected from the need - we may as well do it
based on the regulator name being an odd number of characters.

> just an implementation issue on the Linux kernel and regulator 
> framework. Alternatively I can drop the property check, but then it 
> won't be possible to have a regulator without a consumer, which follows 
> the other one (although we still don't have a real use case for it).

> If you don't like this idea at all, I will try to move this logic to the 
> custom coupler again, although it would mean some code copying.

I think that's better TBH.

> > Saravana (CCed) was working on some patches which tried to deal with
> > some stuff around this for enables using the sync_state() callback.
> > Unfortunately there's quite a few problems with the current approach
> > (the biggest one from my point of view being that it's implemented so
> > that it requires every single consumer of every device on the PMIC to
> > come up but there's others at more of an implementation level).

> I'm not sure if we really need such complex solution for this...

So I think that the specific approach there is overly heavyweight and
restrictive but I do see the general use case here for something per
regulator providing we can avoid breaking anything that does actually
need to change the regulator state (eg, raising the voltage for
cpufreq).  Previously to the past week I'd only really heard about it
causing problems in the context of displays left on by the bootloader
glitching during boot but this is a concrete use case and we already
have the infrastructure to track dependencies at the device model level
if we use it well.  

OTOH if you have a coupler already that needs to be doing stuff all the
time at runtime it may be easier to just put this in the coupler,
especially I think in this case where the lack of the devfreq driver
wouldn't mean that the hardware being controlled wasn't being used at
all.  The coupler would end up backstopping a missing cpufreq or devfreq
driver.
Saravana Kannan June 9, 2020, 2:37 a.m. UTC | #5
On Fri, Jun 5, 2020 at 8:59 AM Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Jun 05, 2020 at 03:37:32PM +0200, Marek Szyprowski wrote:
> > On 05.06.2020 12:20, Mark Brown wrote:
>
> > > No, this is not what boot-on means at all.  It is there for cases where
> > > we can't read the enable status from the hardware.  Trying to infer
> > > *anything* about the runtime behaviour from it being present or absent
> > > is very badly broken.
>
> > Okay, what about the 'always-on' property? I don't think that we need
> > another property for annotating this behavior, as in my opinion this is
>
> No, that's just as disconnected from the need - we may as well do it
> based on the regulator name being an odd number of characters.
>
> > just an implementation issue on the Linux kernel and regulator
> > framework. Alternatively I can drop the property check, but then it
> > won't be possible to have a regulator without a consumer, which follows
> > the other one (although we still don't have a real use case for it).
>
> > If you don't like this idea at all, I will try to move this logic to the
> > custom coupler again, although it would mean some code copying.
>
> I think that's better TBH.
>
> > > Saravana (CCed) was working on some patches which tried to deal with
> > > some stuff around this for enables using the sync_state() callback.
> > > Unfortunately there's quite a few problems with the current approach
> > > (the biggest one from my point of view being that it's implemented so
> > > that it requires every single consumer of every device on the PMIC to
> > > come up but there's others at more of an implementation level).
>
> > I'm not sure if we really need such complex solution for this...
>
> So I think that the specific approach there is overly heavyweight and
> restrictive but I do see the general use case here for something per
> regulator providing we can avoid breaking anything that does actually
> need to change the regulator state (eg, raising the voltage for
> cpufreq).

The changes I propose won't prevent anything from asking for more
power/energy (will always allow turning on stuff, increasing voltage,
increasing current, etc). It'll only prevent reducing power lower than
what was provided when the bootloader left stuff on. This shouldn't
break most boards -- because any other consumer could be setting
similar limits and things don't break then. But even if that's a
concern, we can still default to a timeout behavior and then give
folks the choice of disabling the timeout if they know all their
devices will probe.

Btw, the patch series I sent fixes a lot of subtle use cases even with
the timeout enabled. For example, in one hardware platform, a LDO is
shared between camera, display, UFS and USB. The camera driver would
probe first, enable the regulator, poll its HW and then disable the
regulator. This causes the regulator to be disabled before display,
UFS, and USB could probe and this caused hardware faults for those.

> Previously to the past week I'd only really heard about it
> causing problems in the context of displays left on by the bootloader
> glitching during boot but this is a concrete

Ah, finally! I have examples of pretty much the same issue in some
downstream kernels -- the CPU and memory shares rails with other
hardware blocks and things fail if this isn't taken care of. Glad that
someone else found an example for me in the upstream kernel.

> use case and we already
> have the infrastructure to track dependencies at the device model level
> if we use it well.

I'll send out a v3 series in a couple of days to address Mark's
earlier comments and also add the voltage support to address Marek's
case. We can take it from there.

-Saravana
diff mbox series

Patch

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 03154f5b939f..7e9af7ea4bdf 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -3553,6 +3553,17 @@  static int regulator_get_optimal_voltage(struct regulator_dev *rdev,
 		if (ret < 0)
 			return ret;
 
+		/*
+		 * If no constraints set yet and regulator has boot-on flag,
+		 * keep its voltage unchanged
+		 */
+		if (tmp_min == 0 && c_rdevs[i]->constraints->boot_on) {
+			ret = regulator_get_voltage_rdev(c_rdevs[i]);
+			if (ret < 0)
+				return ret;
+			tmp_min = ret;
+		}
+
 		ret = regulator_check_voltage(c_rdevs[i], &tmp_min, &tmp_max);
 		if (ret < 0)
 			return ret;