diff mbox series

[22/22] clk: Prevent a clock without a rate to register

Message ID 20220408091037.2041955-23-maxime@cerno.tech (mailing list archive)
State Superseded, archived
Headers show
Series clk: More clock rate fixes and tests | expand

Commit Message

Maxime Ripard April 8, 2022, 9:10 a.m. UTC
A rate of 0 for a clock is considered an error, as evidenced by the
documentation of clk_get_rate() and the code of clk_get_rate() and
clk_core_get_rate_nolock().

The main source of that error is if the clock is supposed to have a
parent but is orphan at the moment of the call. This is likely to be
transient and solved later in the life of the system as more clocks are
registered.

The corollary is thus that if a clock is not an orphan, has a parent that
has a rate (so is not an orphan itself either) but returns a rate of 0,
something is wrong in the driver. Let's return an error in such a case.

Signed-off-by: Maxime Ripard <maxime@cerno.tech>
---
 drivers/clk/clk.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Jerome Brunet April 8, 2022, 9:18 a.m. UTC | #1
On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote:

> A rate of 0 for a clock is considered an error, as evidenced by the
> documentation of clk_get_rate() and the code of clk_get_rate() and
> clk_core_get_rate_nolock().
>
> The main source of that error is if the clock is supposed to have a
> parent but is orphan at the moment of the call. This is likely to be
> transient and solved later in the life of the system as more clocks are
> registered.
>
> The corollary is thus that if a clock is not an orphan, has a parent that
> has a rate (so is not an orphan itself either) but returns a rate of 0,
> something is wrong in the driver. Let's return an error in such a case.
>
> Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> ---
>  drivers/clk/clk.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 8bbb6adeeead..e8c55678da85 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core)
>  		rate = 0;
>  	core->rate = core->req_rate = rate;
>  
> +	/*
> +	 * If we're not an orphan clock and our parent has a rate, then
> +	 * if our rate is 0, something is badly broken in recalc_rate.
> +	 */
> +	if (!core->orphan && (parent && parent->rate) && !core->rate) {
> +		ret = -EINVAL;
> +		pr_warn("%s: recalc_rate returned a null rate\n", core->name);
> +		goto out;
> +	}
> +

As hinted in the cover letter, I don't really agree with that.

There are situations where we can't compute the rate. Getting invalid
value in the register is one reason.

You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all
SoCs would be affected):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82
Yes, PLL that have not been previously used (by the ROMCode or the
bootloader) tend to have the value of the divider set to 0 which in
invalid as it would result in a division by zero.

I don't think this is a bug. It is just what the HW is, an unlocked,
uninitialized PLL. There is no problem here and the PLL can remain like
that until it is needed.

IMO, whenever possible we should not put default values in the clocks
which is why I chose to leave it like that.

The PLL being unlocked, it has no rate. It is not set to have any rate.
IMO a returning a rate of 0 is valid here.

>  	/*
>  	 * Enable CLK_IS_CRITICAL clocks so newly added critical clocks
>  	 * don't get accidentally disabled when walking the orphan tree and
Maxime Ripard April 8, 2022, 10:41 a.m. UTC | #2
On Fri, Apr 08, 2022 at 11:18:58AM +0200, Jerome Brunet wrote:
> On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote:
> > A rate of 0 for a clock is considered an error, as evidenced by the
> > documentation of clk_get_rate() and the code of clk_get_rate() and
> > clk_core_get_rate_nolock().
> >
> > The main source of that error is if the clock is supposed to have a
> > parent but is orphan at the moment of the call. This is likely to be
> > transient and solved later in the life of the system as more clocks are
> > registered.
> >
> > The corollary is thus that if a clock is not an orphan, has a parent that
> > has a rate (so is not an orphan itself either) but returns a rate of 0,
> > something is wrong in the driver. Let's return an error in such a case.
> >
> > Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> > ---
> >  drivers/clk/clk.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> > index 8bbb6adeeead..e8c55678da85 100644
> > --- a/drivers/clk/clk.c
> > +++ b/drivers/clk/clk.c
> > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core)
> >  		rate = 0;
> >  	core->rate = core->req_rate = rate;
> >  
> > +	/*
> > +	 * If we're not an orphan clock and our parent has a rate, then
> > +	 * if our rate is 0, something is badly broken in recalc_rate.
> > +	 */
> > +	if (!core->orphan && (parent && parent->rate) && !core->rate) {
> > +		ret = -EINVAL;
> > +		pr_warn("%s: recalc_rate returned a null rate\n", core->name);
> > +		goto out;
> > +	}
> > +
> 
> As hinted in the cover letter, I don't really agree with that.
> 
> There are situations where we can't compute the rate. Getting invalid
> value in the register is one reason.
> 
> You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all
> SoCs would be affected):
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82
> Yes, PLL that have not been previously used (by the ROMCode or the
> bootloader) tend to have the value of the divider set to 0 which in
> invalid as it would result in a division by zero.
> 
> I don't think this is a bug. It is just what the HW is, an unlocked,
> uninitialized PLL. There is no problem here and the PLL can remain like
> that until it is needed.

I think the larger issue is around the semantics of clk_get_rate(), and
especially whether we can call it without a clk_enable(), and whether
returning 0 is fine.

The (clk.h) documentation of clk_get_rate() mentions that "This is only
valid once the clock source has been enabled", and it's fairly
ambiguous. I can see how it could be interpreted as "you need to call
clk_enable() before calling clk_get_rate()", but it can also be
interpreted as "The returned rate will only be valid once clk_enable()
is called".

I think the latter is the proper interpretation though based on what the
drivers are doing, and even the CCF itself will call recalc_rate without
making sure that the clock is enabled (in __clk_core_init() for example).

Then there is the question of whether returning 0 is fine. Again
clk_get_rate() (clk.c) documentation states that "If clk is NULL then
returns 0.". This is indeed returned in case of an error condition (in
clk_get_rate() itself, but also in clk_core_get_rate_nolock()).

All the drivers I could find either assume the rate is valid, or test
whether it's 0 or not (randomly picked, but across completely different
platforms):
https://elixir.bootlin.com/linux/latest/source/drivers/clocksource/armv7m_systick.c#L50
https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/armada-8k-cpufreq.c#L74
https://elixir.bootlin.com/linux/latest/source/sound/soc/sti/uniperif_player.c#L194
https://elixir.bootlin.com/linux/latest/source/sound/soc/tegra/tegra20_i2s.c#L278

So my understanding is that the consensus is that clk_get_rate() can be
called even if the clock hasn't been enabled, and that returning 0 is
only meant to be used for errors in general, a NULL pointer according to
the documentation.

That would mean that pcie_pll_dco is buggy because it assumes that
clk_enable() is going to be called before clk_get_rate(), gp0_pll_dco
and hifi_pll_dco because they expect "someone" to call clk_set_rate()
before clk_get_rate(), and hdmi_pll_dco because it will always return 0,
unless the display driver comes around and updates it. If it never does,
or if it's not compiled in, then you're out of luck.

> IMO, whenever possible we should not put default values in the clocks
> which is why I chose to leave it like that.
>
> The PLL being unlocked, it has no rate. It is not set to have any rate.
> IMO a returning a rate of 0 is valid here.

If there's not a sensible default in the hardware already, I don't see
what the big issue is to be honest. You already kind of do that for all
the other PLL parameters with init_regs, and most drivers do that for
various stuff:
https://elixir.bootlin.com/linux/latest/source/drivers/clk/imx/clk-imx6q.c#L917
https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550
https://elixir.bootlin.com/linux/latest/source/drivers/clk/rockchip/clk-rk3036.c#L448
https://elixir.bootlin.com/linux/latest/source/drivers/clk/sunxi-ng/ccu-sun8i-h3.c#L1157
https://elixir.bootlin.com/linux/latest/source/drivers/clk/tegra/clk-tegra20.c#L1013

If the driver needs that kind of quirks in order to make the clock
usable in itself, then it just makes sense to do that, especially if
it's to avoid breaking a generic API.

Maxime
Jerome Brunet April 8, 2022, 11:24 a.m. UTC | #3
On Fri 08 Apr 2022 at 12:41, Maxime Ripard <maxime@cerno.tech> wrote:

> [[PGP Signed Part:Undecided]]
> On Fri, Apr 08, 2022 at 11:18:58AM +0200, Jerome Brunet wrote:
>> On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote:
>> > A rate of 0 for a clock is considered an error, as evidenced by the
>> > documentation of clk_get_rate() and the code of clk_get_rate() and
>> > clk_core_get_rate_nolock().
>> >
>> > The main source of that error is if the clock is supposed to have a
>> > parent but is orphan at the moment of the call. This is likely to be
>> > transient and solved later in the life of the system as more clocks are
>> > registered.
>> >
>> > The corollary is thus that if a clock is not an orphan, has a parent that
>> > has a rate (so is not an orphan itself either) but returns a rate of 0,
>> > something is wrong in the driver. Let's return an error in such a case.
>> >
>> > Signed-off-by: Maxime Ripard <maxime@cerno.tech>
>> > ---
>> >  drivers/clk/clk.c | 10 ++++++++++
>> >  1 file changed, 10 insertions(+)
>> >
>> > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
>> > index 8bbb6adeeead..e8c55678da85 100644
>> > --- a/drivers/clk/clk.c
>> > +++ b/drivers/clk/clk.c
>> > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core)
>> >  		rate = 0;
>> >  	core->rate = core->req_rate = rate;
>> >  
>> > +	/*
>> > +	 * If we're not an orphan clock and our parent has a rate, then
>> > +	 * if our rate is 0, something is badly broken in recalc_rate.
>> > +	 */
>> > +	if (!core->orphan && (parent && parent->rate) && !core->rate) {
>> > +		ret = -EINVAL;
>> > +		pr_warn("%s: recalc_rate returned a null rate\n", core->name);
>> > +		goto out;
>> > +	}
>> > +
>> 
>> As hinted in the cover letter, I don't really agree with that.
>> 
>> There are situations where we can't compute the rate. Getting invalid
>> value in the register is one reason.
>> 
>> You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all
>> SoCs would be affected):
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82
>> Yes, PLL that have not been previously used (by the ROMCode or the
>> bootloader) tend to have the value of the divider set to 0 which in
>> invalid as it would result in a division by zero.
>> 
>> I don't think this is a bug. It is just what the HW is, an unlocked,
>> uninitialized PLL. There is no problem here and the PLL can remain like
>> that until it is needed.
>
> I think the larger issue is around the semantics of clk_get_rate(), and
> especially whether we can call it without a clk_enable(), and whether
> returning 0 is fine.
>
> The (clk.h) documentation of clk_get_rate() mentions that "This is only
> valid once the clock source has been enabled", and it's fairly
> ambiguous. I can see how it could be interpreted as "you need to call
> clk_enable() before calling clk_get_rate()", but it can also be
> interpreted as "The returned rate will only be valid once clk_enable()
> is called".
>
> I think the latter is the proper interpretation though based on what the
> drivers are doing, and even the CCF itself will call recalc_rate without
> making sure that the clock is enabled (in __clk_core_init() for example).
>
> Then there is the question of whether returning 0 is fine. Again
> clk_get_rate() (clk.c) documentation states that "If clk is NULL then
> returns 0.". This is indeed returned in case of an error condition (in
> clk_get_rate() itself, but also in clk_core_get_rate_nolock()).
>
> All the drivers I could find either assume the rate is valid, or test
> whether it's 0 or not (randomly picked, but across completely different
> platforms):
> https://elixir.bootlin.com/linux/latest/source/drivers/clocksource/armv7m_systick.c#L50
> https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/armada-8k-cpufreq.c#L74
> https://elixir.bootlin.com/linux/latest/source/sound/soc/sti/uniperif_player.c#L194
> https://elixir.bootlin.com/linux/latest/source/sound/soc/tegra/tegra20_i2s.c#L278
>
> So my understanding is that the consensus is that clk_get_rate() can be
> called even if the clock hasn't been enabled, and that returning 0 is
> only meant to be used for errors in general, a NULL pointer according to
> the documentation.
>
> That would mean that pcie_pll_dco is buggy because it assumes that
> clk_enable()

This one indeed does everything in the enable and I could agree it is
fishy, but since it supports only a single rate I don't think it is a
problem. Even if it had a proper set_rate(), it would not change your
problem since it would still return 0 until some consumer actually needs
its parameter to change.

> is going to be called before clk_get_rate(), gp0_pll_dco
> and hifi_pll_dco because they expect "someone" to call clk_set_rate()
> before clk_get_rate(),

No, they don't expect anything. They will return 0 until they are set
with a an actual rate. I don't think returning 0 should be
problem and it has not been so far.

I understand the ambiguity you mentioned above. If the framework decides
it is clearly forbidden to return 0, we'll change.

Still I don't think it would be wise. What are the alternative if you
can't compute a rate ? return 1 ? This looks aweful to me. At least 0 is
a clear indication that the clock is not in a working state.

> and hdmi_pll_dco because it will always return 0,

It is a read-only clock - whatever we do in CCF, it is not going to
change. CCF has always supported RO clocks.

> unless the display driver comes around and updates it. If it never does,
> or if it's not compiled in, then you're out of luck.

I'm all for managing the display clocks from CCF but it has proved tricky
so far. Maybe it will someday.

Being a read-only clock, the value is what it is and CCF should
deal with it gracefully. It has so far.

If the driver actually managing the clock is not compiled in, then the
clock will never be set, and it should not be a problem either.

>
>> IMO, whenever possible we should not put default values in the clocks
>> which is why I chose to leave it like that.
>>
>> The PLL being unlocked, it has no rate. It is not set to have any rate.
>> IMO a returning a rate of 0 is valid here.
>
> If there's not a sensible default in the hardware already, I don't see
> what the big issue is to be honest. You already kind of do that for all
> the other PLL parameters with init_regs

Those initial parameters are "magic" analog setting which don't have an
impact on the rate setting. The initial rate of the clock is never set
by the clock driver on purpose.

> , and most drivers do that for
> various stuff:
> https://elixir.bootlin.com/linux/latest/source/drivers/clk/imx/clk-imx6q.c#L917
> https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550
> https://elixir.bootlin.com/linux/latest/source/drivers/clk/rockchip/clk-rk3036.c#L448
> https://elixir.bootlin.com/linux/latest/source/drivers/clk/sunxi-ng/ccu-sun8i-h3.c#L1157
> https://elixir.bootlin.com/linux/latest/source/drivers/clk/tegra/clk-tegra20.c#L1013

It is done by other drivers or controllers, yes. It does not make it
right (again, it is just my opinion). Rate should never be set by the
clock driver or the clock controller - Those should just implement what
consumer wants. I would agree it sometimes proves tricky to hold onto
this.

Taking one of the example above:
https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550

I think it would be better to have an "assigned-clock" on the related
PLL in the USB node of the platform DT. That way the PLL is set when
needed.

If we go down the road of "others are doing it, so why not ?", I think Marek
initial regression report mentioned Exynos too ;) 

>
> If the driver needs that kind of quirks in order to make the clock
> usable in itself, then it just makes sense to do that, especially if
> it's to avoid breaking a generic API.

As it is the clock are usable and it did not break anything so far.

I have no problem updating the drivers if need be. I do have a problem
with the framework changing and requiring the clock driver to set an
initial rate to make it happy.

>
> Maxime
>
> [[End of PGP Signed Part]]
Marek Szyprowski April 8, 2022, 12:17 p.m. UTC | #4
Hi All,

On 08.04.2022 11:18, Jerome Brunet wrote:
> On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote:
>
>> A rate of 0 for a clock is considered an error, as evidenced by the
>> documentation of clk_get_rate() and the code of clk_get_rate() and
>> clk_core_get_rate_nolock().
>>
>> The main source of that error is if the clock is supposed to have a
>> parent but is orphan at the moment of the call. This is likely to be
>> transient and solved later in the life of the system as more clocks are
>> registered.
>>
>> The corollary is thus that if a clock is not an orphan, has a parent that
>> has a rate (so is not an orphan itself either) but returns a rate of 0,
>> something is wrong in the driver. Let's return an error in such a case.
>>
>> Signed-off-by: Maxime Ripard <maxime@cerno.tech>
>> ---
>>   drivers/clk/clk.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
>> index 8bbb6adeeead..e8c55678da85 100644
>> --- a/drivers/clk/clk.c
>> +++ b/drivers/clk/clk.c
>> @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core)
>>   		rate = 0;
>>   	core->rate = core->req_rate = rate;
>>   
>> +	/*
>> +	 * If we're not an orphan clock and our parent has a rate, then
>> +	 * if our rate is 0, something is badly broken in recalc_rate.
>> +	 */
>> +	if (!core->orphan && (parent && parent->rate) && !core->rate) {
>> +		ret = -EINVAL;
>> +		pr_warn("%s: recalc_rate returned a null rate\n", core->name);
>> +		goto out;
>> +	}
>> +
> As hinted in the cover letter, I don't really agree with that.
>
> There are situations where we can't compute the rate. Getting invalid
> value in the register is one reason.
>
> You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all
> SoCs would be affected):
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82
> Yes, PLL that have not been previously used (by the ROMCode or the
> bootloader) tend to have the value of the divider set to 0 which in
> invalid as it would result in a division by zero.
>
> I don't think this is a bug. It is just what the HW is, an unlocked,
> uninitialized PLL. There is no problem here and the PLL can remain like
> that until it is needed.
>
> IMO, whenever possible we should not put default values in the clocks
> which is why I chose to leave it like that.
>
> The PLL being unlocked, it has no rate. It is not set to have any rate.
> IMO a returning a rate of 0 is valid here.

I agree that it should be possible to register so called unconfigured 
clock, which doesn't have any rate set yet. It might be used later by 
some drivers, which will do the whole initialization (set rate and or 
parents). I also wonder why the above patch triggers warning on the 
Amlogic Socs, while on Exynos I still have some unused clocks with 0 
rate registered properly.

Best regards
Maxime Ripard April 8, 2022, 12:25 p.m. UTC | #5
Hi,

On Fri, Apr 08, 2022 at 02:17:55PM +0200, Marek Szyprowski wrote:
> On 08.04.2022 11:18, Jerome Brunet wrote:
> > On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote:
> >
> >> A rate of 0 for a clock is considered an error, as evidenced by the
> >> documentation of clk_get_rate() and the code of clk_get_rate() and
> >> clk_core_get_rate_nolock().
> >>
> >> The main source of that error is if the clock is supposed to have a
> >> parent but is orphan at the moment of the call. This is likely to be
> >> transient and solved later in the life of the system as more clocks are
> >> registered.
> >>
> >> The corollary is thus that if a clock is not an orphan, has a parent that
> >> has a rate (so is not an orphan itself either) but returns a rate of 0,
> >> something is wrong in the driver. Let's return an error in such a case.
> >>
> >> Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> >> ---
> >>   drivers/clk/clk.c | 10 ++++++++++
> >>   1 file changed, 10 insertions(+)
> >>
> >> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> >> index 8bbb6adeeead..e8c55678da85 100644
> >> --- a/drivers/clk/clk.c
> >> +++ b/drivers/clk/clk.c
> >> @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core)
> >>   		rate = 0;
> >>   	core->rate = core->req_rate = rate;
> >>   
> >> +	/*
> >> +	 * If we're not an orphan clock and our parent has a rate, then
> >> +	 * if our rate is 0, something is badly broken in recalc_rate.
> >> +	 */
> >> +	if (!core->orphan && (parent && parent->rate) && !core->rate) {
> >> +		ret = -EINVAL;
> >> +		pr_warn("%s: recalc_rate returned a null rate\n", core->name);
> >> +		goto out;
> >> +	}
> >> +
> > As hinted in the cover letter, I don't really agree with that.
> >
> > There are situations where we can't compute the rate. Getting invalid
> > value in the register is one reason.
> >
> > You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all
> > SoCs would be affected):
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82
> > Yes, PLL that have not been previously used (by the ROMCode or the
> > bootloader) tend to have the value of the divider set to 0 which in
> > invalid as it would result in a division by zero.
> >
> > I don't think this is a bug. It is just what the HW is, an unlocked,
> > uninitialized PLL. There is no problem here and the PLL can remain like
> > that until it is needed.
> >
> > IMO, whenever possible we should not put default values in the clocks
> > which is why I chose to leave it like that.
> >
> > The PLL being unlocked, it has no rate. It is not set to have any rate.
> > IMO a returning a rate of 0 is valid here.
> 
> I agree that it should be possible to register so called unconfigured 
> clock, which doesn't have any rate set yet. It might be used later by 
> some drivers, which will do the whole initialization (set rate and or 
> parents).

I think we should differentiate between the clock being fine from the
CCF point of view, and from the device point of view. I'm arguing about
the former, but the latter should definitely be up to other drivers to
set it up so that their hardware work.

But in both cases, we should return something valid under the CCF
semantics.

> I also wonder why the above patch triggers warning on the Amlogic
> Socs, while on Exynos I still have some unused clocks with 0 rate
> registered properly.

Most likely the clocks that return a rate of 0 are orphans (ie, their
parent isn't registered yet).

We can't expect all the parents to be registered before their children,
so this is fine, and that patch doesn't break it.

Did you test this series for exynos and meson then? Could you give your
tested-by if you did? :)

Maxime
Maxime Ripard April 8, 2022, 12:55 p.m. UTC | #6
On Fri, Apr 08, 2022 at 01:24:59PM +0200, Jerome Brunet wrote:
> On Fri 08 Apr 2022 at 12:41, Maxime Ripard <maxime@cerno.tech> wrote:
> > [[PGP Signed Part:Undecided]]
> > On Fri, Apr 08, 2022 at 11:18:58AM +0200, Jerome Brunet wrote:
> >> On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote:
> >> > A rate of 0 for a clock is considered an error, as evidenced by the
> >> > documentation of clk_get_rate() and the code of clk_get_rate() and
> >> > clk_core_get_rate_nolock().
> >> >
> >> > The main source of that error is if the clock is supposed to have a
> >> > parent but is orphan at the moment of the call. This is likely to be
> >> > transient and solved later in the life of the system as more clocks are
> >> > registered.
> >> >
> >> > The corollary is thus that if a clock is not an orphan, has a parent that
> >> > has a rate (so is not an orphan itself either) but returns a rate of 0,
> >> > something is wrong in the driver. Let's return an error in such a case.
> >> >
> >> > Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> >> > ---
> >> >  drivers/clk/clk.c | 10 ++++++++++
> >> >  1 file changed, 10 insertions(+)
> >> >
> >> > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> >> > index 8bbb6adeeead..e8c55678da85 100644
> >> > --- a/drivers/clk/clk.c
> >> > +++ b/drivers/clk/clk.c
> >> > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core)
> >> >  		rate = 0;
> >> >  	core->rate = core->req_rate = rate;
> >> >  
> >> > +	/*
> >> > +	 * If we're not an orphan clock and our parent has a rate, then
> >> > +	 * if our rate is 0, something is badly broken in recalc_rate.
> >> > +	 */
> >> > +	if (!core->orphan && (parent && parent->rate) && !core->rate) {
> >> > +		ret = -EINVAL;
> >> > +		pr_warn("%s: recalc_rate returned a null rate\n", core->name);
> >> > +		goto out;
> >> > +	}
> >> > +
> >> 
> >> As hinted in the cover letter, I don't really agree with that.
> >> 
> >> There are situations where we can't compute the rate. Getting invalid
> >> value in the register is one reason.
> >> 
> >> You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all
> >> SoCs would be affected):
> >> 
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82
> >> Yes, PLL that have not been previously used (by the ROMCode or the
> >> bootloader) tend to have the value of the divider set to 0 which in
> >> invalid as it would result in a division by zero.
> >> 
> >> I don't think this is a bug. It is just what the HW is, an unlocked,
> >> uninitialized PLL. There is no problem here and the PLL can remain like
> >> that until it is needed.
> >
> > I think the larger issue is around the semantics of clk_get_rate(), and
> > especially whether we can call it without a clk_enable(), and whether
> > returning 0 is fine.
> >
> > The (clk.h) documentation of clk_get_rate() mentions that "This is only
> > valid once the clock source has been enabled", and it's fairly
> > ambiguous. I can see how it could be interpreted as "you need to call
> > clk_enable() before calling clk_get_rate()", but it can also be
> > interpreted as "The returned rate will only be valid once clk_enable()
> > is called".
> >
> > I think the latter is the proper interpretation though based on what the
> > drivers are doing, and even the CCF itself will call recalc_rate without
> > making sure that the clock is enabled (in __clk_core_init() for example).
> >
> > Then there is the question of whether returning 0 is fine. Again
> > clk_get_rate() (clk.c) documentation states that "If clk is NULL then
> > returns 0.". This is indeed returned in case of an error condition (in
> > clk_get_rate() itself, but also in clk_core_get_rate_nolock()).
> >
> > All the drivers I could find either assume the rate is valid, or test
> > whether it's 0 or not (randomly picked, but across completely different
> > platforms):
> > https://elixir.bootlin.com/linux/latest/source/drivers/clocksource/armv7m_systick.c#L50
> > https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/armada-8k-cpufreq.c#L74
> > https://elixir.bootlin.com/linux/latest/source/sound/soc/sti/uniperif_player.c#L194
> > https://elixir.bootlin.com/linux/latest/source/sound/soc/tegra/tegra20_i2s.c#L278
> >
> > So my understanding is that the consensus is that clk_get_rate() can be
> > called even if the clock hasn't been enabled, and that returning 0 is
> > only meant to be used for errors in general, a NULL pointer according to
> > the documentation.
> >
> > That would mean that pcie_pll_dco is buggy because it assumes that
> > clk_enable()
> 
> This one indeed does everything in the enable and I could agree it is
> fishy, but since it supports only a single rate I don't think it is a
> problem. Even if it had a proper set_rate(), it would not change your
> problem since it would still return 0 until some consumer actually needs
> its parameter to change.
> 
> > is going to be called before clk_get_rate(), gp0_pll_dco
> > and hifi_pll_dco because they expect "someone" to call clk_set_rate()
> > before clk_get_rate(),
> 
> No, they don't expect anything. They will return 0 until they are set
> with a an actual rate. I don't think returning 0 should be
> problem and it has not been so far.

So, if I was to rephrase, gp0_pll_dco and hifi_pll_dco expect someone to
call clk_set_rate() before clk_get_rate() to have it return anything
other than 0.

> I understand the ambiguity you mentioned above. If the framework decides
> it is clearly forbidden to return 0, we'll change.
> 
> Still I don't think it would be wise. What are the alternative if you
> can't compute a rate ? return 1 ? This looks aweful to me. At least 0 is
> a clear indication that the clock is not in a working state.

No, the alternative would be to initialize the clock to something
sensible before registering it (or in init).

> > and hdmi_pll_dco because it will always return 0,
> 
> It is a read-only clock - whatever we do in CCF, it is not going to
> change. CCF has always supported RO clocks.

Yes, if a clock has a rate of 0Hz it's entirely useless. And if it's
useful in anyway, it should report its current rate. Read-only or not.

> > unless the display driver comes around and updates it. If it never does,
> > or if it's not compiled in, then you're out of luck.
> 
> I'm all for managing the display clocks from CCF but it has proved tricky
> so far. Maybe it will someday.
> 
> Being a read-only clock, the value is what it is and CCF should
> deal with it gracefully. It has so far.
>
> If the driver actually managing the clock is not compiled in, then the
> clock will never be set, and it should not be a problem either.

Again, it depends on what you expect from clk_get_rate(). If it can only
report a rate of 0 on error, then it's definitely a problem.

And it's not because it was done before that it wasn't just as
problematic then.

> >> IMO, whenever possible we should not put default values in the clocks
> >> which is why I chose to leave it like that.
> >>
> >> The PLL being unlocked, it has no rate. It is not set to have any rate.
> >> IMO a returning a rate of 0 is valid here.
> >
> > If there's not a sensible default in the hardware already, I don't see
> > what the big issue is to be honest. You already kind of do that for all
> > the other PLL parameters with init_regs
> 
> Those initial parameters are "magic" analog setting which don't have an
> impact on the rate setting. The initial rate of the clock is never set
> by the clock driver on purpose.

It's still fundamentally the same: whatever is there at boot isn't good
enough, so you change it to have a somewhat sensible default.

> > , and most drivers do that for
> > various stuff:
> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/imx/clk-imx6q.c#L917
> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550
> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/rockchip/clk-rk3036.c#L448
> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/sunxi-ng/ccu-sun8i-h3.c#L1157
> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/tegra/clk-tegra20.c#L1013
> 
> It is done by other drivers or controllers, yes. It does not make it
> right (again, it is just my opinion). Rate should never be set by the
> clock driver or the clock controller - Those should just implement what
> consumer wants. I would agree it sometimes proves tricky to hold onto
> this.
> 
> Taking one of the example above:
> https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550
> 
> I think it would be better to have an "assigned-clock" on the related
> PLL in the USB node of the platform DT. That way the PLL is set when
> needed.

Both are valid. Assigned-clocks is arguably more fragile, but that's not
really the point.

> If we go down the road of "others are doing it, so why not ?", I think Marek
> initial regression report mentioned Exynos too ;) 

Yes, he did mention exynos as well, but the issue was entirely
different.

Exynos had the issue that req_rate wasn't updated whenever a clock
wasn't orphan anymore because it changed parent. It's fixed in patch 10.

Out of the drivers I tested this week (sunxi-ng, exynos, omap3, qcom,
imx6, imx8 and g12b), only meson is in this case.

> > If the driver needs that kind of quirks in order to make the clock
> > usable in itself, then it just makes sense to do that, especially if
> > it's to avoid breaking a generic API.
> 
> As it is the clock are usable and it did not break anything so far.

Anything but the abstraction.

> I have no problem updating the drivers if need be. I do have a problem
> with the framework changing and requiring the clock driver to set an
> initial rate to make it happy.

I mean, the alternative is changing the semantics of clk_get_rate(), and
all the call sites. Fixing one inconsistent driver definitely seems
preferable.

Maxime
Marek Szyprowski April 8, 2022, 1:46 p.m. UTC | #7
Hi Maxime,

On 08.04.2022 14:25, Maxime Ripard wrote:
> On Fri, Apr 08, 2022 at 02:17:55PM +0200, Marek Szyprowski wrote:
>> On 08.04.2022 11:18, Jerome Brunet wrote:
>>> On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote:
>>>
>>>> A rate of 0 for a clock is considered an error, as evidenced by the
>>>> documentation of clk_get_rate() and the code of clk_get_rate() and
>>>> clk_core_get_rate_nolock().
>>>>
>>>> The main source of that error is if the clock is supposed to have a
>>>> parent but is orphan at the moment of the call. This is likely to be
>>>> transient and solved later in the life of the system as more clocks are
>>>> registered.
>>>>
>>>> The corollary is thus that if a clock is not an orphan, has a parent that
>>>> has a rate (so is not an orphan itself either) but returns a rate of 0,
>>>> something is wrong in the driver. Let's return an error in such a case.
>>>>
>>>> Signed-off-by: Maxime Ripard <maxime@cerno.tech>
>>>> ---
>>>>    drivers/clk/clk.c | 10 ++++++++++
>>>>    1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
>>>> index 8bbb6adeeead..e8c55678da85 100644
>>>> --- a/drivers/clk/clk.c
>>>> +++ b/drivers/clk/clk.c
>>>> @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core)
>>>>    		rate = 0;
>>>>    	core->rate = core->req_rate = rate;
>>>>    
>>>> +	/*
>>>> +	 * If we're not an orphan clock and our parent has a rate, then
>>>> +	 * if our rate is 0, something is badly broken in recalc_rate.
>>>> +	 */
>>>> +	if (!core->orphan && (parent && parent->rate) && !core->rate) {
>>>> +		ret = -EINVAL;
>>>> +		pr_warn("%s: recalc_rate returned a null rate\n", core->name);
>>>> +		goto out;
>>>> +	}
>>>> +
>>> As hinted in the cover letter, I don't really agree with that.
>>>
>>> There are situations where we can't compute the rate. Getting invalid
>>> value in the register is one reason.
>>>
>>> You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all
>>> SoCs would be affected):
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82
>>> Yes, PLL that have not been previously used (by the ROMCode or the
>>> bootloader) tend to have the value of the divider set to 0 which in
>>> invalid as it would result in a division by zero.
>>>
>>> I don't think this is a bug. It is just what the HW is, an unlocked,
>>> uninitialized PLL. There is no problem here and the PLL can remain like
>>> that until it is needed.
>>>
>>> IMO, whenever possible we should not put default values in the clocks
>>> which is why I chose to leave it like that.
>>>
>>> The PLL being unlocked, it has no rate. It is not set to have any rate.
>>> IMO a returning a rate of 0 is valid here.
>> I agree that it should be possible to register so called unconfigured
>> clock, which doesn't have any rate set yet. It might be used later by
>> some drivers, which will do the whole initialization (set rate and or
>> parents).
> I think we should differentiate between the clock being fine from the
> CCF point of view, and from the device point of view. I'm arguing about
> the former, but the latter should definitely be up to other drivers to
> set it up so that their hardware work.
>
> But in both cases, we should return something valid under the CCF
> semantics.
>
>> I also wonder why the above patch triggers warning on the Amlogic
>> Socs, while on Exynos I still have some unused clocks with 0 rate
>> registered properly.
> Most likely the clocks that return a rate of 0 are orphans (ie, their
> parent isn't registered yet).
Right.
> We can't expect all the parents to be registered before their children,
> so this is fine, and that patch doesn't break it.
>
> Did you test this series for exynos and meson then? Could you give your
> tested-by if you did? :)

Yes, I've tested the whole set. All my Exynos based test boards work 
fine. All my Meson based boards fails to boot due to this patch, thus my 
comment. Feel free to add to all but the last patch:

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
Jerome Brunet April 8, 2022, 2:48 p.m. UTC | #8
On Fri 08 Apr 2022 at 14:55, Maxime Ripard <maxime@cerno.tech> wrote:

>> 
>> No, they don't expect anything. They will return 0 until they are set
>> with a an actual rate. I don't think returning 0 should be
>> problem and it has not been so far.
>
> So, if I was to rephrase, gp0_pll_dco and hifi_pll_dco expect someone to
> call clk_set_rate() before clk_get_rate() to have it return anything
> other than 0.

Yes. It has no rate. I won't change until something make it so

>
>> I understand the ambiguity you mentioned above. If the framework decides
>> it is clearly forbidden to return 0, we'll change.
>> 
>> Still I don't think it would be wise. What are the alternative if you
>> can't compute a rate ? return 1 ? This looks aweful to me. At least 0 is
>> a clear indication that the clock is not in a working state.
>
> No, the alternative would be to initialize the clock to something
> sensible before registering it (or in init).

Well, I disagree :/

>
>> > and hdmi_pll_dco because it will always return 0,
>> 
>> It is a read-only clock - whatever we do in CCF, it is not going to
>> change. CCF has always supported RO clocks.
>
> Yes, if a clock has a rate of 0Hz it's entirely useless. And if it's
> useful in anyway, it should report its current rate. Read-only or not.
>

It does report its rate, it does not have any.
... and It would pretty weird to initialize a read-only clock.

>> > unless the display driver comes around and updates it. If it never does,
>> > or if it's not compiled in, then you're out of luck.
>> 
>> I'm all for managing the display clocks from CCF but it has proved tricky
>> so far. Maybe it will someday.
>> 
>> Being a read-only clock, the value is what it is and CCF should
>> deal with it gracefully. It has so far.
>>
>> If the driver actually managing the clock is not compiled in, then the
>> clock will never be set, and it should not be a problem either.
>
> Again, it depends on what you expect from clk_get_rate(). If it can only
> report a rate of 0 on error, then it's definitely a problem.

Agreed, it depends on what you expect from clk_get_rate().
Still when something does not oscillate, the rate is 0.

If a driver call clk_get_rate() on an uninitialized, unlocked PLL, I
think returning 0 makes sense.

>
> And it's not because it was done before that it wasn't just as
> problematic then.
>
>> >> IMO, whenever possible we should not put default values in the clocks
>> >> which is why I chose to leave it like that.
>> >>
>> >> The PLL being unlocked, it has no rate. It is not set to have any rate.
>> >> IMO a returning a rate of 0 is valid here.
>> >
>> > If there's not a sensible default in the hardware already, I don't see
>> > what the big issue is to be honest. You already kind of do that for all
>> > the other PLL parameters with init_regs
>> 
>> Those initial parameters are "magic" analog setting which don't have an
>> impact on the rate setting. The initial rate of the clock is never set
>> by the clock driver on purpose.
>
> It's still fundamentally the same: whatever is there at boot isn't good
> enough, so you change it to have a somewhat sensible default.

If you don't need it, no.

>
>> > , and most drivers do that for
>> > various stuff:
>> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/imx/clk-imx6q.c#L917
>> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550
>> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/rockchip/clk-rk3036.c#L448
>> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/sunxi-ng/ccu-sun8i-h3.c#L1157
>> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/tegra/clk-tegra20.c#L1013
>> 
>> It is done by other drivers or controllers, yes. It does not make it
>> right (again, it is just my opinion). Rate should never be set by the
>> clock driver or the clock controller - Those should just implement what
>> consumer wants. I would agree it sometimes proves tricky to hold onto
>> this.
>> 
>> Taking one of the example above:
>> https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550
>> 
>> I think it would be better to have an "assigned-clock" on the related
>> PLL in the USB node of the platform DT. That way the PLL is set when
>> needed.
>
> Both are valid. Assigned-clocks is arguably more fragile, but that's not
> really the point.
>
>> If we go down the road of "others are doing it, so why not ?", I think Marek
>> initial regression report mentioned Exynos too ;) 
>
> Yes, he did mention exynos as well, but the issue was entirely
> different.
>
> Exynos had the issue that req_rate wasn't updated whenever a clock
> wasn't orphan anymore because it changed parent. It's fixed in patch 10.
>
> Out of the drivers I tested this week (sunxi-ng, exynos, omap3, qcom,
> imx6, imx8 and g12b), only meson is in this case.
>
>> > If the driver needs that kind of quirks in order to make the clock
>> > usable in itself, then it just makes sense to do that, especially if
>> > it's to avoid breaking a generic API.
>> 
>> As it is the clock are usable and it did not break anything so far.
>
> Anything but the abstraction.
>
>> I have no problem updating the drivers if need be. I do have a problem
>> with the framework changing and requiring the clock driver to set an
>> initial rate to make it happy.
>
> I mean, the alternative is changing the semantics of clk_get_rate(), and
> all the call sites. Fixing one inconsistent driver definitely seems
> preferable.
>
> Maxime
>
> [[End of PGP Signed Part]]
Maxime Ripard April 8, 2022, 3:36 p.m. UTC | #9
On Fri, Apr 08, 2022 at 04:48:08PM +0200, Jerome Brunet wrote:
> >> > and hdmi_pll_dco because it will always return 0,
> >> 
> >> It is a read-only clock - whatever we do in CCF, it is not going to
> >> change. CCF has always supported RO clocks.
> >
> > Yes, if a clock has a rate of 0Hz it's entirely useless. And if it's
> > useful in anyway, it should report its current rate. Read-only or not.
> >
> 
> It does report its rate, it does not have any.
> ... and It would pretty weird to initialize a read-only clock.

The KMS driver doesn't seem to have a problem with that.

> >> > unless the display driver comes around and updates it. If it never does,
> >> > or if it's not compiled in, then you're out of luck.
> >> 
> >> I'm all for managing the display clocks from CCF but it has proved tricky
> >> so far. Maybe it will someday.
> >> 
> >> Being a read-only clock, the value is what it is and CCF should
> >> deal with it gracefully. It has so far.
> >>
> >> If the driver actually managing the clock is not compiled in, then the
> >> clock will never be set, and it should not be a problem either.
> >
> > Again, it depends on what you expect from clk_get_rate(). If it can only
> > report a rate of 0 on error, then it's definitely a problem.
> 
> Agreed, it depends on what you expect from clk_get_rate().
> Still when something does not oscillate, the rate is 0.
> 
> If a driver call clk_get_rate() on an uninitialized, unlocked PLL, I
> think returning 0 makes sense.

You're confusing two things: the rate output by the hardware, and what
the CCF needs to return. We're discussing the latter here, a software
construct. It models the hardware, but it doesn't have to be true to the
hardware.

And even the meson driver doesn't follow what you're claiming to the
letter and is inconsistent with what you're saying. Any disabled gate
will also have a hardware rate of 0. Yet, it doesn't return 0 in such a
case. And no one does, because clk_get_rate() isn't supposed to return
the actual rate in hardware at the moment. It's supposed to return what
would be the rate if it was enabled.

> > And it's not because it was done before that it wasn't just as
> > problematic then.
> >
> >> >> IMO, whenever possible we should not put default values in the clocks
> >> >> which is why I chose to leave it like that.
> >> >>
> >> >> The PLL being unlocked, it has no rate. It is not set to have any rate.
> >> >> IMO a returning a rate of 0 is valid here.
> >> >
> >> > If there's not a sensible default in the hardware already, I don't see
> >> > what the big issue is to be honest. You already kind of do that for all
> >> > the other PLL parameters with init_regs
> >> 
> >> Those initial parameters are "magic" analog setting which don't have an
> >> impact on the rate setting. The initial rate of the clock is never set
> >> by the clock driver on purpose.
> >
> > It's still fundamentally the same: whatever is there at boot isn't good
> > enough, so you change it to have a somewhat sensible default.
> 
> If you don't need it, no.

I mean, I provided multiple examples that the meson driver is being
inconsistent with both the CCF documentation and the expected usage of
the CCF consumers. And I suggested solutions to fix this inconsistency
both here and on IRC to Neil and you.

The only argument you provided so far is that you don't like or want it.
I'm sorry you feel this way, but it doesn't change either the consensus
or the documentation that every other driver and consumer is following.
Nor does it provide any solution.

In the grand scheme of things, I don't care, if you want to have your
own conventions a policies, go ahead. Just don't expect me to debug
whatever is going on next time it falls apart.

Maxime
Neil Armstrong April 11, 2022, 7:40 a.m. UTC | #10
On 08/04/2022 17:36, Maxime Ripard wrote:
> On Fri, Apr 08, 2022 at 04:48:08PM +0200, Jerome Brunet wrote:
>>>>> and hdmi_pll_dco because it will always return 0,
>>>>
>>>> It is a read-only clock - whatever we do in CCF, it is not going to
>>>> change. CCF has always supported RO clocks.
>>>
>>> Yes, if a clock has a rate of 0Hz it's entirely useless. And if it's
>>> useful in anyway, it should report its current rate. Read-only or not.
>>>
>>
>> It does report its rate, it does not have any.
>> ... and It would pretty weird to initialize a read-only clock.
> 
> The KMS driver doesn't seem to have a problem with that.
> 
>>>>> unless the display driver comes around and updates it. If it never does,
>>>>> or if it's not compiled in, then you're out of luck.
>>>>
>>>> I'm all for managing the display clocks from CCF but it has proved tricky
>>>> so far. Maybe it will someday.
>>>>
>>>> Being a read-only clock, the value is what it is and CCF should
>>>> deal with it gracefully. It has so far.
>>>>
>>>> If the driver actually managing the clock is not compiled in, then the
>>>> clock will never be set, and it should not be a problem either.
>>>
>>> Again, it depends on what you expect from clk_get_rate(). If it can only
>>> report a rate of 0 on error, then it's definitely a problem.
>>
>> Agreed, it depends on what you expect from clk_get_rate().
>> Still when something does not oscillate, the rate is 0.
>>
>> If a driver call clk_get_rate() on an uninitialized, unlocked PLL, I
>> think returning 0 makes sense.
> 
> You're confusing two things: the rate output by the hardware, and what
> the CCF needs to return. We're discussing the latter here, a software
> construct. It models the hardware, but it doesn't have to be true to the
> hardware.
> 
> And even the meson driver doesn't follow what you're claiming to the
> letter and is inconsistent with what you're saying. Any disabled gate
> will also have a hardware rate of 0. Yet, it doesn't return 0 in such a
> case. And no one does, because clk_get_rate() isn't supposed to return
> the actual rate in hardware at the moment. It's supposed to return what
> would be the rate if it was enabled.

You're pointing a real issue, what should we return as a rate then ?

The rate is not only the rate output by the HW but is also used by the whole
subtree of the PLL to calculate the rates.

If we return a fake or dummy rate instead of 0 (which is the physical reality),
the clock subtree will be populated from a fake rate and if we did set a clock
rate using assigned-clock, eventually this fake rate would match and no set_rate()
would be called on the PLL.

This is why we return the true physical rate, it's the software to adapt in order
to handle the possible HW states.

> 
>>> And it's not because it was done before that it wasn't just as
>>> problematic then.
>>>
>>>>>> IMO, whenever possible we should not put default values in the clocks
>>>>>> which is why I chose to leave it like that.
>>>>>>
>>>>>> The PLL being unlocked, it has no rate. It is not set to have any rate.
>>>>>> IMO a returning a rate of 0 is valid here.
>>>>>
>>>>> If there's not a sensible default in the hardware already, I don't see
>>>>> what the big issue is to be honest. You already kind of do that for all
>>>>> the other PLL parameters with init_regs
>>>>
>>>> Those initial parameters are "magic" analog setting which don't have an
>>>> impact on the rate setting. The initial rate of the clock is never set
>>>> by the clock driver on purpose.
>>>
>>> It's still fundamentally the same: whatever is there at boot isn't good
>>> enough, so you change it to have a somewhat sensible default.
>>
>> If you don't need it, no.
> 
> I mean, I provided multiple examples that the meson driver is being
> inconsistent with both the CCF documentation and the expected usage of
> the CCF consumers. And I suggested solutions to fix this inconsistency
> both here and on IRC to Neil and you.

It's not that we don't like it, we only care the actual HW state is correctly
reported in the clock tree, nothing else.

If a new flag like CLK_INVALID_RATE existed then we would use it.

But so far a zero rate never was problematic, and iMX8mp HDMI PLL and MSM mmpll2
also return a 0 rate as init because it's probably the default PLL state at
HW startup.

> 
> The only argument you provided so far is that you don't like or want it.
> I'm sorry you feel this way, but it doesn't change either the consensus
> or the documentation that every other driver and consumer is following.
> Nor does it provide any solution.
> 
> In the grand scheme of things, I don't care, if you want to have your
> own conventions a policies, go ahead. Just don't expect me to debug
> whatever is going on next time it falls apart.
> 
> Maxime

Neil
Jerome Brunet April 11, 2022, 8:20 a.m. UTC | #11
On Fri 08 Apr 2022 at 17:36, Maxime Ripard <maxime@cerno.tech> wrote:

>
> You're confusing two things: the rate output by the hardware, and what
> the CCF needs to return. We're discussing the latter here, a software
> construct. It models the hardware, but it doesn't have to be true to the
> hardware.
>
> And even the meson driver doesn't follow what you're claiming to the
> letter and is inconsistent with what you're saying. Any disabled gate
> will also have a hardware rate of 0. Yet, it doesn't return 0 in such a
> case. And no one does, because clk_get_rate() isn't supposed to return
> the actual rate in hardware at the moment. It's supposed to return what
> would be the rate if it was enabled.

I don't think I am confused at all.

What rate would you get if you would hit enable on those PLLs with the
invalid setting as they are ? 0 - no lock.

recalc_rate() is about giving the rate of the clock based on its current
parameters. This is exactly what the amlogic PLL driver does. I can't
compute a division by zero, sorry.

The fact that some parameters may be invalid for a clock element is not
specific to any SoC and we have to deal with it.

It is not up to the clock driver to select a random rate to just to make
that valid again, assuming it even can do so  which is not necessarily
the case.
Maxime Ripard April 12, 2022, 12:56 p.m. UTC | #12
On Mon, Apr 11, 2022 at 09:40:03AM +0200, Neil Armstrong wrote:
> On 08/04/2022 17:36, Maxime Ripard wrote:
> > On Fri, Apr 08, 2022 at 04:48:08PM +0200, Jerome Brunet wrote:
> > > > > > and hdmi_pll_dco because it will always return 0,
> > > > > 
> > > > > It is a read-only clock - whatever we do in CCF, it is not going to
> > > > > change. CCF has always supported RO clocks.
> > > > 
> > > > Yes, if a clock has a rate of 0Hz it's entirely useless. And if it's
> > > > useful in anyway, it should report its current rate. Read-only or not.
> > > > 
> > > 
> > > It does report its rate, it does not have any.
> > > ... and It would pretty weird to initialize a read-only clock.
> > 
> > The KMS driver doesn't seem to have a problem with that.
> > 
> > > > > > unless the display driver comes around and updates it. If it never does,
> > > > > > or if it's not compiled in, then you're out of luck.
> > > > > 
> > > > > I'm all for managing the display clocks from CCF but it has proved tricky
> > > > > so far. Maybe it will someday.
> > > > > 
> > > > > Being a read-only clock, the value is what it is and CCF should
> > > > > deal with it gracefully. It has so far.
> > > > > 
> > > > > If the driver actually managing the clock is not compiled in, then the
> > > > > clock will never be set, and it should not be a problem either.
> > > > 
> > > > Again, it depends on what you expect from clk_get_rate(). If it can only
> > > > report a rate of 0 on error, then it's definitely a problem.
> > > 
> > > Agreed, it depends on what you expect from clk_get_rate().
> > > Still when something does not oscillate, the rate is 0.
> > > 
> > > If a driver call clk_get_rate() on an uninitialized, unlocked PLL, I
> > > think returning 0 makes sense.
> > 
> > You're confusing two things: the rate output by the hardware, and what
> > the CCF needs to return. We're discussing the latter here, a software
> > construct. It models the hardware, but it doesn't have to be true to the
> > hardware.
> > 
> > And even the meson driver doesn't follow what you're claiming to the
> > letter and is inconsistent with what you're saying. Any disabled gate
> > will also have a hardware rate of 0. Yet, it doesn't return 0 in such a
> > case. And no one does, because clk_get_rate() isn't supposed to return
> > the actual rate in hardware at the moment. It's supposed to return what
> > would be the rate if it was enabled.
> 
> You're pointing a real issue, what should we return as a rate then ?
> 
> The rate is not only the rate output by the HW but is also used by the
> whole subtree of the PLL to calculate the rates.
> 
> If we return a fake or dummy rate instead of 0 (which is the physical
> reality), the clock subtree will be populated from a fake rate and if
> we did set a clock rate using assigned-clock, eventually this fake
> rate would match and no set_rate() would be called on the PLL.
> 
> This is why we return the true physical rate, it's the software to
> adapt in order to handle the possible HW states.

I'm not sure why you insist so much on the "true physical rate" tbh.
That was never really a thing. It's not for a gate, it's not for a
spread-spectrum clock (which would be even more interesting to report
properly following your criteria). And then you have the clock
accuracies that are throwing it off entirely too.

It just isn't practical, and that's why no-one relies on it.

Or let's turn it the other way around. When you'll have called
clk_enable() on those clocks (but only that), what do you expect the
rate to be? 0? Yet, the "true physical rate" probably changed by then.

If that was a thing, we would also have to:

  * make sure clk_set_rate can't be called on a disabled clock, because
    the true physical rate would never match what is required

  * rate ranges just wouldn't be usable either, because we would have no
    way to check that the true physical rate is actually within that
    range at any given time.

  * All clocks rate wouldn't be cacheable to account for the jitter in
    the system clock source.

  * We would need to use some hardware engine that reads the clock rate
    being output at any given time, to account for spread-spectrum and
    the clock source accuracy.

  * We wouldn't be able to report phases properly, because you can't
    really measure the clock skew either.

I understand that it's convenient for you, but it isn't practical if we
look at it from the framework's PoV.

You could very well initialize the PLLs to have a 1/1 ratio to their
parent, that would probably be sensible and would make the framework
happy.

> > > > And it's not because it was done before that it wasn't just as
> > > > problematic then.
> > > > 
> > > > > > > IMO, whenever possible we should not put default values in the clocks
> > > > > > > which is why I chose to leave it like that.
> > > > > > > 
> > > > > > > The PLL being unlocked, it has no rate. It is not set to have any rate.
> > > > > > > IMO a returning a rate of 0 is valid here.
> > > > > > 
> > > > > > If there's not a sensible default in the hardware already, I don't see
> > > > > > what the big issue is to be honest. You already kind of do that for all
> > > > > > the other PLL parameters with init_regs
> > > > > 
> > > > > Those initial parameters are "magic" analog setting which don't have an
> > > > > impact on the rate setting. The initial rate of the clock is never set
> > > > > by the clock driver on purpose.
> > > > 
> > > > It's still fundamentally the same: whatever is there at boot isn't good
> > > > enough, so you change it to have a somewhat sensible default.
> > > 
> > > If you don't need it, no.
> > 
> > I mean, I provided multiple examples that the meson driver is being
> > inconsistent with both the CCF documentation and the expected usage of
> > the CCF consumers. And I suggested solutions to fix this inconsistency
> > both here and on IRC to Neil and you.
> 
> It's not that we don't like it, we only care the actual HW state is
> correctly reported in the clock tree, nothing else.
>
> If a new flag like CLK_INVALID_RATE existed then we would use it.
> 
> But so far a zero rate never was problematic, and iMX8mp HDMI PLL

That patch isn't upstream.

> and MSM mmpll2 also return a 0 rate as init because it's probably the
> default PLL state at HW startup.

And two wrongs never made a right.
Stephen Boyd April 23, 2022, 4:12 a.m. UTC | #13
Quoting Maxime Ripard (2022-04-08 02:10:37)
> A rate of 0 for a clock is considered an error, as evidenced by the
> documentation of clk_get_rate() and the code of clk_get_rate() and
> clk_core_get_rate_nolock().
> 
> The main source of that error is if the clock is supposed to have a
> parent but is orphan at the moment of the call. This is likely to be
> transient and solved later in the life of the system as more clocks are
> registered.
> 
> The corollary is thus that if a clock is not an orphan, has a parent that
> has a rate (so is not an orphan itself either) but returns a rate of 0,
> something is wrong in the driver. Let's return an error in such a case.
> 
> Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> ---
>  drivers/clk/clk.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 8bbb6adeeead..e8c55678da85 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core)
>                 rate = 0;
>         core->rate = core->req_rate = rate;
>  
> +       /*
> +        * If we're not an orphan clock and our parent has a rate, then
> +        * if our rate is 0, something is badly broken in recalc_rate.
> +        */
> +       if (!core->orphan && (parent && parent->rate) && !core->rate) {

It's possible that it is an orphan at time of registration, so this
check doesn't even cover the case when it is parented by a later clk
registration. How would we error out when parenting the clk to the
parent if recalc_rate then starts returning 0? It doesn't seem possible
to implement this.
Stephen Boyd April 23, 2022, 4:42 a.m. UTC | #14
Quoting Maxime Ripard (2022-04-08 03:41:27)
> On Fri, Apr 08, 2022 at 11:18:58AM +0200, Jerome Brunet wrote:
> > On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote:
> > > A rate of 0 for a clock is considered an error, as evidenced by the
> > > documentation of clk_get_rate() and the code of clk_get_rate() and
> > > clk_core_get_rate_nolock().

Where?

> > >
> > > The main source of that error is if the clock is supposed to have a
> > > parent but is orphan at the moment of the call. This is likely to be
> > > transient and solved later in the life of the system as more clocks are
> > > registered.
> > >
> > > The corollary is thus that if a clock is not an orphan, has a parent that
> > > has a rate (so is not an orphan itself either) but returns a rate of 0,
> > > something is wrong in the driver. Let's return an error in such a case.
> > >
> > > Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> > > ---
> > >  drivers/clk/clk.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> > > index 8bbb6adeeead..e8c55678da85 100644
> > > --- a/drivers/clk/clk.c
> > > +++ b/drivers/clk/clk.c
> > > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core)
> > >             rate = 0;
> > >     core->rate = core->req_rate = rate;
> > >  
> > > +   /*
> > > +    * If we're not an orphan clock and our parent has a rate, then
> > > +    * if our rate is 0, something is badly broken in recalc_rate.
> > > +    */
> > > +   if (!core->orphan && (parent && parent->rate) && !core->rate) {
> > > +           ret = -EINVAL;
> > > +           pr_warn("%s: recalc_rate returned a null rate\n", core->name);
> > > +           goto out;
> > > +   }
> > > +
> > 
> > As hinted in the cover letter, I don't really agree with that.
> > 
> > There are situations where we can't compute the rate. Getting invalid
> > value in the register is one reason.
> > 
> > You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all
> > SoCs would be affected):
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82
> > Yes, PLL that have not been previously used (by the ROMCode or the
> > bootloader) tend to have the value of the divider set to 0 which in
> > invalid as it would result in a division by zero.
> > 
> > I don't think this is a bug. It is just what the HW is, an unlocked,
> > uninitialized PLL. There is no problem here and the PLL can remain like
> > that until it is needed.
> 
> I think the larger issue is around the semantics of clk_get_rate(), and
> especially whether we can call it without a clk_enable(), and whether
> returning 0 is fine.
> 
> The (clk.h) documentation of clk_get_rate() mentions that "This is only
> valid once the clock source has been enabled", and it's fairly
> ambiguous. I can see how it could be interpreted as "you need to call
> clk_enable() before calling clk_get_rate()", but it can also be
> interpreted as "The returned rate will only be valid once clk_enable()
> is called".

I enjoy the ambiguity! :) This question has come up before and it
doesn't really matter. Drivers can call clk_prepare_enable() if they
want to be sure that clk_get_rate() is meaningful to them, or they can
not. The CCF returns a rate that it gets from calling recalc_rate, which
could be inaccurate for others reasons, either because some driver has
called clk_set_rate() after the clk_get_rate() or because the clk is an
orphan still and clk_get() succeeded, or because the clk_op couldn't
calculate it at the time of caching. Indeed the CCF doesn't try to
recalc the rate after enabling the clk. Maybe we should do that? It
would mean that we have to schedule a work from the enable path to
update the rate accounting outside of any atomic context.

Just thinking out loud, the simpler solution is to probably drop all
rate caching in the CCF and get the frequency on a clk_get_rate() call.
It complicates some of the core though when we check to see if we need
to update clk rates. We could have some middle ground where drivers
indicate that they want to update their cached rate because it's valid
now (either from their enable path or from somewhere else). This may be
nice actually because we could have clk providers call this to force a
recalc down the tree from where they've updated. I think the qcom
DisplayPort phy would want this.

> 
> I think the latter is the proper interpretation though based on what the
> drivers are doing, and even the CCF itself will call recalc_rate without
> making sure that the clock is enabled (in __clk_core_init() for example).
> 
> Then there is the question of whether returning 0 is fine. Again
> clk_get_rate() (clk.c) documentation states that "If clk is NULL then
> returns 0.". This is indeed returned in case of an error condition (in
> clk_get_rate() itself, but also in clk_core_get_rate_nolock()).

A NULL clk isn't an error. We use NULL in the CCF to indicate that it's
an optional clk. Returning 0 from clk_get_rate() is not an error. If
clk_get() returns an error pointer then it's an error. And NULL isn't an
error value per PTR_ERR() (because NULL == 0 when casted, this isn't
golang).

> 
> All the drivers I could find either assume the rate is valid, or test
> whether it's 0 or not (randomly picked, but across completely different
> platforms):
> https://elixir.bootlin.com/linux/latest/source/drivers/clocksource/armv7m_systick.c#L50
> https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/armada-8k-cpufreq.c#L74
> https://elixir.bootlin.com/linux/latest/source/sound/soc/sti/uniperif_player.c#L194
> https://elixir.bootlin.com/linux/latest/source/sound/soc/tegra/tegra20_i2s.c#L278
> 
> So my understanding is that the consensus is that clk_get_rate() can be
> called even if the clock hasn't been enabled, and that returning 0 is
> only meant to be used for errors in general, a NULL pointer according to
> the documentation.

Again, NULL isn't an invalid clk handle.

> 
> That would mean that pcie_pll_dco is buggy because it assumes that
> clk_enable() is going to be called before clk_get_rate(), gp0_pll_dco
> and hifi_pll_dco because they expect "someone" to call clk_set_rate()
> before clk_get_rate(), and hdmi_pll_dco because it will always return 0,
> unless the display driver comes around and updates it. If it never does,
> or if it's not compiled in, then you're out of luck.
> 

I think this is all fine.
Maxime Ripard April 23, 2022, 7:49 a.m. UTC | #15
On Fri, Apr 22, 2022 at 09:12:33PM -0700, Stephen Boyd wrote:
> Quoting Maxime Ripard (2022-04-08 02:10:37)
> > A rate of 0 for a clock is considered an error, as evidenced by the
> > documentation of clk_get_rate() and the code of clk_get_rate() and
> > clk_core_get_rate_nolock().
> > 
> > The main source of that error is if the clock is supposed to have a
> > parent but is orphan at the moment of the call. This is likely to be
> > transient and solved later in the life of the system as more clocks are
> > registered.
> > 
> > The corollary is thus that if a clock is not an orphan, has a parent that
> > has a rate (so is not an orphan itself either) but returns a rate of 0,
> > something is wrong in the driver. Let's return an error in such a case.
> > 
> > Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> > ---
> >  drivers/clk/clk.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> > index 8bbb6adeeead..e8c55678da85 100644
> > --- a/drivers/clk/clk.c
> > +++ b/drivers/clk/clk.c
> > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core)
> >                 rate = 0;
> >         core->rate = core->req_rate = rate;
> >  
> > +       /*
> > +        * If we're not an orphan clock and our parent has a rate, then
> > +        * if our rate is 0, something is badly broken in recalc_rate.
> > +        */
> > +       if (!core->orphan && (parent && parent->rate) && !core->rate) {
> 
> It's possible that it is an orphan at time of registration, so this
> check doesn't even cover the case when it is parented by a later clk
> registration. How would we error out when parenting the clk to the
> parent if recalc_rate then starts returning 0? It doesn't seem possible
> to implement this.

Like I said in my cover letter, this was mostly to spark a discussion :)

Indeed, that case you mentioned wouldn't be covered by this check. I
don't think this patch is reasonable either :)

I mostly wanted to discuss whether you felt like it was something that
was ok or not. If it isn't, I think a good way forward would be to add a
bunch of pr_warn messages to mention that something is fishy and to
clarify the doc.

Hopefully, it will raise both the attention of developers of already
in-tree drivers to fix this, and will prevent new drivers from
introducing more of that behavior.

If the issue persists we could then take stronger measures some time in
the future if needs be?

Maxime
Maxime Ripard April 23, 2022, 9:17 a.m. UTC | #16
On Fri, Apr 22, 2022 at 09:42:26PM -0700, Stephen Boyd wrote:
> Quoting Maxime Ripard (2022-04-08 03:41:27)
> > On Fri, Apr 08, 2022 at 11:18:58AM +0200, Jerome Brunet wrote:
> > > On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote:
> > > > A rate of 0 for a clock is considered an error, as evidenced by the
> > > > documentation of clk_get_rate() and the code of clk_get_rate() and
> > > > clk_core_get_rate_nolock().
> 
> Where?
> 
> > > >
> > > > The main source of that error is if the clock is supposed to have a
> > > > parent but is orphan at the moment of the call. This is likely to be
> > > > transient and solved later in the life of the system as more clocks are
> > > > registered.
> > > >
> > > > The corollary is thus that if a clock is not an orphan, has a parent that
> > > > has a rate (so is not an orphan itself either) but returns a rate of 0,
> > > > something is wrong in the driver. Let's return an error in such a case.
> > > >
> > > > Signed-off-by: Maxime Ripard <maxime@cerno.tech>
> > > > ---
> > > >  drivers/clk/clk.c | 10 ++++++++++
> > > >  1 file changed, 10 insertions(+)
> > > >
> > > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> > > > index 8bbb6adeeead..e8c55678da85 100644
> > > > --- a/drivers/clk/clk.c
> > > > +++ b/drivers/clk/clk.c
> > > > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core)
> > > >             rate = 0;
> > > >     core->rate = core->req_rate = rate;
> > > >  
> > > > +   /*
> > > > +    * If we're not an orphan clock and our parent has a rate, then
> > > > +    * if our rate is 0, something is badly broken in recalc_rate.
> > > > +    */
> > > > +   if (!core->orphan && (parent && parent->rate) && !core->rate) {
> > > > +           ret = -EINVAL;
> > > > +           pr_warn("%s: recalc_rate returned a null rate\n", core->name);
> > > > +           goto out;
> > > > +   }
> > > > +
> > > 
> > > As hinted in the cover letter, I don't really agree with that.
> > > 
> > > There are situations where we can't compute the rate. Getting invalid
> > > value in the register is one reason.
> > > 
> > > You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all
> > > SoCs would be affected):
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82
> > > Yes, PLL that have not been previously used (by the ROMCode or the
> > > bootloader) tend to have the value of the divider set to 0 which in
> > > invalid as it would result in a division by zero.
> > > 
> > > I don't think this is a bug. It is just what the HW is, an unlocked,
> > > uninitialized PLL. There is no problem here and the PLL can remain like
> > > that until it is needed.
> > 
> > I think the larger issue is around the semantics of clk_get_rate(), and
> > especially whether we can call it without a clk_enable(), and whether
> > returning 0 is fine.
> > 
> > The (clk.h) documentation of clk_get_rate() mentions that "This is only
> > valid once the clock source has been enabled", and it's fairly
> > ambiguous. I can see how it could be interpreted as "you need to call
> > clk_enable() before calling clk_get_rate()", but it can also be
> > interpreted as "The returned rate will only be valid once clk_enable()
> > is called".
> 
> I enjoy the ambiguity! :) This question has come up before and it
> doesn't really matter. Drivers can call clk_prepare_enable() if they
> want to be sure that clk_get_rate() is meaningful to them, or they can
> not.

Right, but that alone already removes the ambiguity :)

If not calling clk_enable() prior to calling clk_get_rate() is ok, then
we don't *need* to call it, both are valid.

Therefore, the "you need to call clk_enable() before calling
clk_get_rate()" interpretation isn't the right one, right?

> The CCF returns a rate that it gets from calling recalc_rate, which
> could be inaccurate for others reasons, either because some driver has
> called clk_set_rate() after the clk_get_rate()

Right, but I guess that's ok, the CCF never claimed to support atomicity
in any way. That could also be the result of a parent changing its rate,
or a reparenting, etc.

Still, it means that it was the best estimate we could do when
clk_get_rate() was called. It just turned out to be short-lived.

> or because the clk is an orphan still and clk_get() succeeded

In this case, core->parent is NULL, so we could argue that it's the same
case than clk_get_rate(NULL). It makes sense.

> or because the clk_op couldn't calculate it at the time of caching.

I guess this is what the discussion is all about, whether it's actually
something that we should find it acceptable or not.

From what you're saying, it is indeed acceptable and thus clk_get_rate()
can return 0 not only from a NULL clock, but also if it's incapable of
figuring out a rate for that clock. We should thus document that
recalc_rate can indeed return 0 in such a case, and that users should
take this into account and not always expect a valid rate.

Would a documentation for recalc_rate like that:

Recalculate the rate of this clock, by querying hardware. The parent
rate is an input parameter. It is up to the caller to ensure that the
prepare_mutex is held across this call. Returns the calculated rate. If
the driver cannot figure out a rate for this clock, it must return 0.
Optional, but recommended - if this op is not set then clock rate will
be initialized to 0.

And for clk_get_rate() (in drivers/clk/clk.c):

Simply returns the cached rate of the clk, unless CLK_GET_RATE_NOCACHE
flag is set, which means a recalc_rate will be issued. Can be called if
the clock is enabled or not. If clk is NULL, or if an error occurred,
then returns 0.

Work for everyone?

> Indeed the CCF doesn't try to recalc the rate after enabling the clk.
> Maybe we should do that? It would mean that we have to schedule a work
> from the enable path to update the rate accounting outside of any
> atomic context.
> 
> Just thinking out loud, the simpler solution is to probably drop all
> rate caching in the CCF and get the frequency on a clk_get_rate() call.

The above is just enough for me. recalc_rate returning 0 is valid, and
should be taken properly into account.

Longer term I guess it makes sense to drop the rate caching, but I've
introduced enough clock regressions already for a couple of releases :)

> It complicates some of the core though when we check to see if we need
> to update clk rates. We could have some middle ground where drivers
> indicate that they want to update their cached rate because it's valid
> now (either from their enable path or from somewhere else). This may be
> nice actually because we could have clk providers call this to force a
> recalc down the tree from where they've updated. I think the qcom
> DisplayPort phy would want this.

A good workaround for now would be to set CLK_GET_RATE_NOCACHE for those
clocks.

> > 
> > I think the latter is the proper interpretation though based on what the
> > drivers are doing, and even the CCF itself will call recalc_rate without
> > making sure that the clock is enabled (in __clk_core_init() for example).
> > 
> > Then there is the question of whether returning 0 is fine. Again
> > clk_get_rate() (clk.c) documentation states that "If clk is NULL then
> > returns 0.". This is indeed returned in case of an error condition (in
> > clk_get_rate() itself, but also in clk_core_get_rate_nolock()).
> 
> A NULL clk isn't an error. We use NULL in the CCF to indicate that it's
> an optional clk. Returning 0 from clk_get_rate() is not an error. If
> clk_get() returns an error pointer then it's an error. And NULL isn't an
> error value per PTR_ERR() (because NULL == 0 when casted, this isn't
> golang).

Ok. So we can either group both the "we couldn't figure out a rate" and
the "our clock is NULL" case as the same case in clk_get_rate(), or we
can convert clk_get_rate() to return signed integers and a negative
error code. The latter seems much more intrusive to me and will probably
lead to a whole bunch of regressions, so I'd be more inclined to stick
to the former. Especially since drivers seem to either don't care or
treat 0 as an error already.

Maxime
Stephen Boyd April 29, 2022, 2:08 a.m. UTC | #17
Quoting Maxime Ripard (2022-04-23 02:17:24)
> On Fri, Apr 22, 2022 at 09:42:26PM -0700, Stephen Boyd wrote:
> > I enjoy the ambiguity! :) This question has come up before and it
> > doesn't really matter. Drivers can call clk_prepare_enable() if they
> > want to be sure that clk_get_rate() is meaningful to them, or they can
> > not.
> 
> Right, but that alone already removes the ambiguity :)
> 
> If not calling clk_enable() prior to calling clk_get_rate() is ok, then
> we don't *need* to call it, both are valid.
> 
> Therefore, the "you need to call clk_enable() before calling
> clk_get_rate()" interpretation isn't the right one, right?

It depends. The CCF isn't the only implementation of the clk API, so
some drivers want to maintain API compatibility and call clk_enable()
before getting the rate to make sure the rate is valid. If all other clk
API implementations are removed or made to work without calling
clk_enable() we should be able to update the clk.h documentation.

> 
> > The CCF returns a rate that it gets from calling recalc_rate, which
> > could be inaccurate for others reasons, either because some driver has
> > called clk_set_rate() after the clk_get_rate()
> 
> Right, but I guess that's ok, the CCF never claimed to support atomicity
> in any way. That could also be the result of a parent changing its rate,
> or a reparenting, etc.
> 
> Still, it means that it was the best estimate we could do when
> clk_get_rate() was called. It just turned out to be short-lived.
> 
> > or because the clk is an orphan still and clk_get() succeeded
> 
> In this case, core->parent is NULL, so we could argue that it's the same
> case than clk_get_rate(NULL). It makes sense.
> 
> > or because the clk_op couldn't calculate it at the time of caching.
> 
> I guess this is what the discussion is all about, whether it's actually
> something that we should find it acceptable or not.
> 
> From what you're saying, it is indeed acceptable and thus clk_get_rate()
> can return 0 not only from a NULL clock, but also if it's incapable of
> figuring out a rate for that clock. We should thus document that
> recalc_rate can indeed return 0 in such a case, and that users should
> take this into account and not always expect a valid rate.

I don't think there's anything to update.

> 
> Would a documentation for recalc_rate like that:
> 
> Recalculate the rate of this clock, by querying hardware. The parent
> rate is an input parameter. It is up to the caller to ensure that the
> prepare_mutex is held across this call. Returns the calculated rate. If
> the driver cannot figure out a rate for this clock, it must return 0.

What would the driver return besides 0 if the rate can't be calculated?
UINT_MAX? Does the framework care that the rate couldn't be calculated?
Weakening "must" to "should" or saying it "will most likely return 0" is
better. But again I don't see much point in clarifying here. 0 Hz is a
valid frequency. Maybe we should just say that 0 is valid?

> Optional, but recommended - if this op is not set then clock rate will
> be initialized to 0.
> 
> And for clk_get_rate() (in drivers/clk/clk.c):
> 
> Simply returns the cached rate of the clk, unless CLK_GET_RATE_NOCACHE
> flag is set, which means a recalc_rate will be issued. Can be called if
> the clock is enabled or not. If clk is NULL, or if an error occurred,
> then returns 0.

We should remove the clk_get_rate() documentation in clk.c and let the
clk.h documentation be the one that is used. That will clarify that
clk.c implements the API in clk.h. I've been meaning to clean up the
documentation so I'll hack on that tomorrow and see if I can make this
better.

> 
> Work for everyone?
> 
> > Indeed the CCF doesn't try to recalc the rate after enabling the clk.
> > Maybe we should do that? It would mean that we have to schedule a work
> > from the enable path to update the rate accounting outside of any
> > atomic context.
> > 
> > Just thinking out loud, the simpler solution is to probably drop all
> > rate caching in the CCF and get the frequency on a clk_get_rate() call.
> 
> The above is just enough for me. recalc_rate returning 0 is valid, and
> should be taken properly into account.
> 
> Longer term I guess it makes sense to drop the rate caching, but I've
> introduced enough clock regressions already for a couple of releases :)

Ok.

> 
> > It complicates some of the core though when we check to see if we need
> > to update clk rates. We could have some middle ground where drivers
> > indicate that they want to update their cached rate because it's valid
> > now (either from their enable path or from somewhere else). This may be
> > nice actually because we could have clk providers call this to force a
> > recalc down the tree from where they've updated. I think the qcom
> > DisplayPort phy would want this.
> 
> A good workaround for now would be to set CLK_GET_RATE_NOCACHE for those
> clocks.

It's not a good workaround. The clk flag isn't used internally. That's
the problem. It's only used for callers of clk_get_rate(), but we have
rate setting logic where a child doesn't know that the rate should be
recalculated for the parent. It's also not a transitive flag, so if a
clk with the flag set has children we don't care that clk_get_rate() is
called on those children when it should really call up to the parent and
figure out how high to go before calculating the frequency down. This is
where an API that pushes the information to the framework makes sense so
that we can push the update instead of pull it on clk_get_rate().
Anyway, it's a problem for another day.

> 
> > > 
> > > I think the latter is the proper interpretation though based on what the
> > > drivers are doing, and even the CCF itself will call recalc_rate without
> > > making sure that the clock is enabled (in __clk_core_init() for example).
> > > 
> > > Then there is the question of whether returning 0 is fine. Again
> > > clk_get_rate() (clk.c) documentation states that "If clk is NULL then
> > > returns 0.". This is indeed returned in case of an error condition (in
> > > clk_get_rate() itself, but also in clk_core_get_rate_nolock()).
> > 
> > A NULL clk isn't an error. We use NULL in the CCF to indicate that it's
> > an optional clk. Returning 0 from clk_get_rate() is not an error. If
> > clk_get() returns an error pointer then it's an error. And NULL isn't an
> > error value per PTR_ERR() (because NULL == 0 when casted, this isn't
> > golang).
> 
> Ok. So we can either group both the "we couldn't figure out a rate" and
> the "our clock is NULL" case as the same case in clk_get_rate(), or we
> can convert clk_get_rate() to return signed integers and a negative
> error code. The latter seems much more intrusive to me and will probably
> lead to a whole bunch of regressions, so I'd be more inclined to stick
> to the former. Especially since drivers seem to either don't care or
> treat 0 as an error already.

Agreed. We can try to work around the "not ready to figure out the rate"
problem by implementing EPROBE_DEFER for clks that are orphans. That
needs some thought around cases where clks have parents that aren't
described in the kernel and we want to reparent them via clk_set_rate()
or clk_set_parent() or their parent is provided by the driver getting
the clk (this latter case is a head spinner). In general indicating this
not fully registered state through clk_get_rate() is wrong though, hence
the decision to make clk_get_rate() always return something valid and
have the "don't know" case be returning zero.
Maxime Ripard April 29, 2022, 3:45 p.m. UTC | #18
On Thu, Apr 28, 2022 at 07:08:10PM -0700, Stephen Boyd wrote:
> Quoting Maxime Ripard (2022-04-23 02:17:24)
> > On Fri, Apr 22, 2022 at 09:42:26PM -0700, Stephen Boyd wrote:
> > > I enjoy the ambiguity! :) This question has come up before and it
> > > doesn't really matter. Drivers can call clk_prepare_enable() if they
> > > want to be sure that clk_get_rate() is meaningful to them, or they can
> > > not.
> > 
> > Right, but that alone already removes the ambiguity :)
> > 
> > If not calling clk_enable() prior to calling clk_get_rate() is ok, then
> > we don't *need* to call it, both are valid.
> > 
> > Therefore, the "you need to call clk_enable() before calling
> > clk_get_rate()" interpretation isn't the right one, right?
> 
> It depends. The CCF isn't the only implementation of the clk API, so
> some drivers want to maintain API compatibility and call clk_enable()
> before getting the rate to make sure the rate is valid. If all other clk
> API implementations are removed or made to work without calling
> clk_enable() we should be able to update the clk.h documentation.
> 
> > 
> > > The CCF returns a rate that it gets from calling recalc_rate, which
> > > could be inaccurate for others reasons, either because some driver has
> > > called clk_set_rate() after the clk_get_rate()
> > 
> > Right, but I guess that's ok, the CCF never claimed to support atomicity
> > in any way. That could also be the result of a parent changing its rate,
> > or a reparenting, etc.
> > 
> > Still, it means that it was the best estimate we could do when
> > clk_get_rate() was called. It just turned out to be short-lived.
> > 
> > > or because the clk is an orphan still and clk_get() succeeded
> > 
> > In this case, core->parent is NULL, so we could argue that it's the same
> > case than clk_get_rate(NULL). It makes sense.
> > 
> > > or because the clk_op couldn't calculate it at the time of caching.
> > 
> > I guess this is what the discussion is all about, whether it's actually
> > something that we should find it acceptable or not.
> > 
> > From what you're saying, it is indeed acceptable and thus clk_get_rate()
> > can return 0 not only from a NULL clock, but also if it's incapable of
> > figuring out a rate for that clock. We should thus document that
> > recalc_rate can indeed return 0 in such a case, and that users should
> > take this into account and not always expect a valid rate.
> 
> I don't think there's anything to update.
> 
> > 
> > Would a documentation for recalc_rate like that:
> > 
> > Recalculate the rate of this clock, by querying hardware. The parent
> > rate is an input parameter. It is up to the caller to ensure that the
> > prepare_mutex is held across this call. Returns the calculated rate. If
> > the driver cannot figure out a rate for this clock, it must return 0.
> 
> What would the driver return besides 0 if the rate can't be calculated?
> UINT_MAX?

Well, we could expect the driver to never fail, and always be able to
calculate its rate (provided its parent is there). This is what this
patch is doing essentially, and it was working fine on a number of
platforms, so it doesn't seem unfeasible.

Either way, I'm not sure what your pushback is about here. Whether or
not the driver can or cannot return 0 on failure, that should be made
clear in the documentation, and for now it isn't.

If we want to maintain the status quo, then it means that recalc_rate
can return 0 on error, what's the harm in clearly stating that?

> Does the framework care that the rate couldn't be calculated?

I'd say to some extent, yes. The clk_set_rate_range() rework could just
ignore the call to clk_set_rate() if the current rate is 0. That would
have saved us from a couple of regressions.

> Weakening "must" to "should" or saying it "will most likely return 0" is
> better. But again I don't see much point in clarifying here. 0 Hz is a
> valid frequency. Maybe we should just say that 0 is valid?

Ok, so it's about the "must" part.

Yeah, maybe that's the best solution? I don't know, but the current
comment hints a bit at the "0 is an error" part with the reference to a
NULL clk.

> > Optional, but recommended - if this op is not set then clock rate will
> > be initialized to 0.
> > 
> > And for clk_get_rate() (in drivers/clk/clk.c):
> > 
> > Simply returns the cached rate of the clk, unless CLK_GET_RATE_NOCACHE
> > flag is set, which means a recalc_rate will be issued. Can be called if
> > the clock is enabled or not. If clk is NULL, or if an error occurred,
> > then returns 0.
> 
> We should remove the clk_get_rate() documentation in clk.c and let the
> clk.h documentation be the one that is used. That will clarify that
> clk.c implements the API in clk.h. I've been meaning to clean up the
> documentation so I'll hack on that tomorrow and see if I can make this
> better.

ok, thanks :)

> > Work for everyone?
> > 
> > > Indeed the CCF doesn't try to recalc the rate after enabling the clk.
> > > Maybe we should do that? It would mean that we have to schedule a work
> > > from the enable path to update the rate accounting outside of any
> > > atomic context.
> > > 
> > > Just thinking out loud, the simpler solution is to probably drop all
> > > rate caching in the CCF and get the frequency on a clk_get_rate() call.
> > 
> > The above is just enough for me. recalc_rate returning 0 is valid, and
> > should be taken properly into account.
> > 
> > Longer term I guess it makes sense to drop the rate caching, but I've
> > introduced enough clock regressions already for a couple of releases :)
> 
> Ok.
> 
> > 
> > > It complicates some of the core though when we check to see if we need
> > > to update clk rates. We could have some middle ground where drivers
> > > indicate that they want to update their cached rate because it's valid
> > > now (either from their enable path or from somewhere else). This may be
> > > nice actually because we could have clk providers call this to force a
> > > recalc down the tree from where they've updated. I think the qcom
> > > DisplayPort phy would want this.
> > 
> > A good workaround for now would be to set CLK_GET_RATE_NOCACHE for those
> > clocks.
> 
> It's not a good workaround. The clk flag isn't used internally. That's
> the problem. It's only used for callers of clk_get_rate(), but we have
> rate setting logic where a child doesn't know that the rate should be
> recalculated for the parent. It's also not a transitive flag, so if a
> clk with the flag set has children we don't care that clk_get_rate() is
> called on those children when it should really call up to the parent and
> figure out how high to go before calculating the frequency down. This is
> where an API that pushes the information to the framework makes sense so
> that we can push the update instead of pull it on clk_get_rate().

Ah, I see what you mean, thanks

> Anyway, it's a problem for another day.

Yeah, I have a can of worms big enough as it is :)

> > > > I think the latter is the proper interpretation though based on what the
> > > > drivers are doing, and even the CCF itself will call recalc_rate without
> > > > making sure that the clock is enabled (in __clk_core_init() for example).
> > > > 
> > > > Then there is the question of whether returning 0 is fine. Again
> > > > clk_get_rate() (clk.c) documentation states that "If clk is NULL then
> > > > returns 0.". This is indeed returned in case of an error condition (in
> > > > clk_get_rate() itself, but also in clk_core_get_rate_nolock()).
> > > 
> > > A NULL clk isn't an error. We use NULL in the CCF to indicate that it's
> > > an optional clk. Returning 0 from clk_get_rate() is not an error. If
> > > clk_get() returns an error pointer then it's an error. And NULL isn't an
> > > error value per PTR_ERR() (because NULL == 0 when casted, this isn't
> > > golang).
> > 
> > Ok. So we can either group both the "we couldn't figure out a rate" and
> > the "our clock is NULL" case as the same case in clk_get_rate(), or we
> > can convert clk_get_rate() to return signed integers and a negative
> > error code. The latter seems much more intrusive to me and will probably
> > lead to a whole bunch of regressions, so I'd be more inclined to stick
> > to the former. Especially since drivers seem to either don't care or
> > treat 0 as an error already.
> 
> Agreed. We can try to work around the "not ready to figure out the rate"
> problem by implementing EPROBE_DEFER for clks that are orphans. That
> needs some thought around cases where clks have parents that aren't
> described in the kernel and we want to reparent them via clk_set_rate()
> or clk_set_parent() or their parent is provided by the driver getting
> the clk (this latter case is a head spinner). In general indicating this
> not fully registered state through clk_get_rate() is wrong though, hence
> the decision to make clk_get_rate() always return something valid and
> have the "don't know" case be returning zero.

Ok

maxime
diff mbox series

Patch

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 8bbb6adeeead..e8c55678da85 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -3773,6 +3773,16 @@  static int __clk_core_init(struct clk_core *core)
 		rate = 0;
 	core->rate = core->req_rate = rate;
 
+	/*
+	 * If we're not an orphan clock and our parent has a rate, then
+	 * if our rate is 0, something is badly broken in recalc_rate.
+	 */
+	if (!core->orphan && (parent && parent->rate) && !core->rate) {
+		ret = -EINVAL;
+		pr_warn("%s: recalc_rate returned a null rate\n", core->name);
+		goto out;
+	}
+
 	/*
 	 * Enable CLK_IS_CRITICAL clocks so newly added critical clocks
 	 * don't get accidentally disabled when walking the orphan tree and