Message ID | 20220408091037.2041955-23-maxime@cerno.tech (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | clk: More clock rate fixes and tests | expand |
On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote: > A rate of 0 for a clock is considered an error, as evidenced by the > documentation of clk_get_rate() and the code of clk_get_rate() and > clk_core_get_rate_nolock(). > > The main source of that error is if the clock is supposed to have a > parent but is orphan at the moment of the call. This is likely to be > transient and solved later in the life of the system as more clocks are > registered. > > The corollary is thus that if a clock is not an orphan, has a parent that > has a rate (so is not an orphan itself either) but returns a rate of 0, > something is wrong in the driver. Let's return an error in such a case. > > Signed-off-by: Maxime Ripard <maxime@cerno.tech> > --- > drivers/clk/clk.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > index 8bbb6adeeead..e8c55678da85 100644 > --- a/drivers/clk/clk.c > +++ b/drivers/clk/clk.c > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core) > rate = 0; > core->rate = core->req_rate = rate; > > + /* > + * If we're not an orphan clock and our parent has a rate, then > + * if our rate is 0, something is badly broken in recalc_rate. > + */ > + if (!core->orphan && (parent && parent->rate) && !core->rate) { > + ret = -EINVAL; > + pr_warn("%s: recalc_rate returned a null rate\n", core->name); > + goto out; > + } > + As hinted in the cover letter, I don't really agree with that. There are situations where we can't compute the rate. Getting invalid value in the register is one reason. You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all SoCs would be affected): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82 Yes, PLL that have not been previously used (by the ROMCode or the bootloader) tend to have the value of the divider set to 0 which in invalid as it would result in a division by zero. I don't think this is a bug. It is just what the HW is, an unlocked, uninitialized PLL. There is no problem here and the PLL can remain like that until it is needed. IMO, whenever possible we should not put default values in the clocks which is why I chose to leave it like that. The PLL being unlocked, it has no rate. It is not set to have any rate. IMO a returning a rate of 0 is valid here. > /* > * Enable CLK_IS_CRITICAL clocks so newly added critical clocks > * don't get accidentally disabled when walking the orphan tree and
On Fri, Apr 08, 2022 at 11:18:58AM +0200, Jerome Brunet wrote: > On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote: > > A rate of 0 for a clock is considered an error, as evidenced by the > > documentation of clk_get_rate() and the code of clk_get_rate() and > > clk_core_get_rate_nolock(). > > > > The main source of that error is if the clock is supposed to have a > > parent but is orphan at the moment of the call. This is likely to be > > transient and solved later in the life of the system as more clocks are > > registered. > > > > The corollary is thus that if a clock is not an orphan, has a parent that > > has a rate (so is not an orphan itself either) but returns a rate of 0, > > something is wrong in the driver. Let's return an error in such a case. > > > > Signed-off-by: Maxime Ripard <maxime@cerno.tech> > > --- > > drivers/clk/clk.c | 10 ++++++++++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > > index 8bbb6adeeead..e8c55678da85 100644 > > --- a/drivers/clk/clk.c > > +++ b/drivers/clk/clk.c > > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core) > > rate = 0; > > core->rate = core->req_rate = rate; > > > > + /* > > + * If we're not an orphan clock and our parent has a rate, then > > + * if our rate is 0, something is badly broken in recalc_rate. > > + */ > > + if (!core->orphan && (parent && parent->rate) && !core->rate) { > > + ret = -EINVAL; > > + pr_warn("%s: recalc_rate returned a null rate\n", core->name); > > + goto out; > > + } > > + > > As hinted in the cover letter, I don't really agree with that. > > There are situations where we can't compute the rate. Getting invalid > value in the register is one reason. > > You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all > SoCs would be affected): > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82 > Yes, PLL that have not been previously used (by the ROMCode or the > bootloader) tend to have the value of the divider set to 0 which in > invalid as it would result in a division by zero. > > I don't think this is a bug. It is just what the HW is, an unlocked, > uninitialized PLL. There is no problem here and the PLL can remain like > that until it is needed. I think the larger issue is around the semantics of clk_get_rate(), and especially whether we can call it without a clk_enable(), and whether returning 0 is fine. The (clk.h) documentation of clk_get_rate() mentions that "This is only valid once the clock source has been enabled", and it's fairly ambiguous. I can see how it could be interpreted as "you need to call clk_enable() before calling clk_get_rate()", but it can also be interpreted as "The returned rate will only be valid once clk_enable() is called". I think the latter is the proper interpretation though based on what the drivers are doing, and even the CCF itself will call recalc_rate without making sure that the clock is enabled (in __clk_core_init() for example). Then there is the question of whether returning 0 is fine. Again clk_get_rate() (clk.c) documentation states that "If clk is NULL then returns 0.". This is indeed returned in case of an error condition (in clk_get_rate() itself, but also in clk_core_get_rate_nolock()). All the drivers I could find either assume the rate is valid, or test whether it's 0 or not (randomly picked, but across completely different platforms): https://elixir.bootlin.com/linux/latest/source/drivers/clocksource/armv7m_systick.c#L50 https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/armada-8k-cpufreq.c#L74 https://elixir.bootlin.com/linux/latest/source/sound/soc/sti/uniperif_player.c#L194 https://elixir.bootlin.com/linux/latest/source/sound/soc/tegra/tegra20_i2s.c#L278 So my understanding is that the consensus is that clk_get_rate() can be called even if the clock hasn't been enabled, and that returning 0 is only meant to be used for errors in general, a NULL pointer according to the documentation. That would mean that pcie_pll_dco is buggy because it assumes that clk_enable() is going to be called before clk_get_rate(), gp0_pll_dco and hifi_pll_dco because they expect "someone" to call clk_set_rate() before clk_get_rate(), and hdmi_pll_dco because it will always return 0, unless the display driver comes around and updates it. If it never does, or if it's not compiled in, then you're out of luck. > IMO, whenever possible we should not put default values in the clocks > which is why I chose to leave it like that. > > The PLL being unlocked, it has no rate. It is not set to have any rate. > IMO a returning a rate of 0 is valid here. If there's not a sensible default in the hardware already, I don't see what the big issue is to be honest. You already kind of do that for all the other PLL parameters with init_regs, and most drivers do that for various stuff: https://elixir.bootlin.com/linux/latest/source/drivers/clk/imx/clk-imx6q.c#L917 https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550 https://elixir.bootlin.com/linux/latest/source/drivers/clk/rockchip/clk-rk3036.c#L448 https://elixir.bootlin.com/linux/latest/source/drivers/clk/sunxi-ng/ccu-sun8i-h3.c#L1157 https://elixir.bootlin.com/linux/latest/source/drivers/clk/tegra/clk-tegra20.c#L1013 If the driver needs that kind of quirks in order to make the clock usable in itself, then it just makes sense to do that, especially if it's to avoid breaking a generic API. Maxime
On Fri 08 Apr 2022 at 12:41, Maxime Ripard <maxime@cerno.tech> wrote: > [[PGP Signed Part:Undecided]] > On Fri, Apr 08, 2022 at 11:18:58AM +0200, Jerome Brunet wrote: >> On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote: >> > A rate of 0 for a clock is considered an error, as evidenced by the >> > documentation of clk_get_rate() and the code of clk_get_rate() and >> > clk_core_get_rate_nolock(). >> > >> > The main source of that error is if the clock is supposed to have a >> > parent but is orphan at the moment of the call. This is likely to be >> > transient and solved later in the life of the system as more clocks are >> > registered. >> > >> > The corollary is thus that if a clock is not an orphan, has a parent that >> > has a rate (so is not an orphan itself either) but returns a rate of 0, >> > something is wrong in the driver. Let's return an error in such a case. >> > >> > Signed-off-by: Maxime Ripard <maxime@cerno.tech> >> > --- >> > drivers/clk/clk.c | 10 ++++++++++ >> > 1 file changed, 10 insertions(+) >> > >> > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c >> > index 8bbb6adeeead..e8c55678da85 100644 >> > --- a/drivers/clk/clk.c >> > +++ b/drivers/clk/clk.c >> > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core) >> > rate = 0; >> > core->rate = core->req_rate = rate; >> > >> > + /* >> > + * If we're not an orphan clock and our parent has a rate, then >> > + * if our rate is 0, something is badly broken in recalc_rate. >> > + */ >> > + if (!core->orphan && (parent && parent->rate) && !core->rate) { >> > + ret = -EINVAL; >> > + pr_warn("%s: recalc_rate returned a null rate\n", core->name); >> > + goto out; >> > + } >> > + >> >> As hinted in the cover letter, I don't really agree with that. >> >> There are situations where we can't compute the rate. Getting invalid >> value in the register is one reason. >> >> You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all >> SoCs would be affected): >> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82 >> Yes, PLL that have not been previously used (by the ROMCode or the >> bootloader) tend to have the value of the divider set to 0 which in >> invalid as it would result in a division by zero. >> >> I don't think this is a bug. It is just what the HW is, an unlocked, >> uninitialized PLL. There is no problem here and the PLL can remain like >> that until it is needed. > > I think the larger issue is around the semantics of clk_get_rate(), and > especially whether we can call it without a clk_enable(), and whether > returning 0 is fine. > > The (clk.h) documentation of clk_get_rate() mentions that "This is only > valid once the clock source has been enabled", and it's fairly > ambiguous. I can see how it could be interpreted as "you need to call > clk_enable() before calling clk_get_rate()", but it can also be > interpreted as "The returned rate will only be valid once clk_enable() > is called". > > I think the latter is the proper interpretation though based on what the > drivers are doing, and even the CCF itself will call recalc_rate without > making sure that the clock is enabled (in __clk_core_init() for example). > > Then there is the question of whether returning 0 is fine. Again > clk_get_rate() (clk.c) documentation states that "If clk is NULL then > returns 0.". This is indeed returned in case of an error condition (in > clk_get_rate() itself, but also in clk_core_get_rate_nolock()). > > All the drivers I could find either assume the rate is valid, or test > whether it's 0 or not (randomly picked, but across completely different > platforms): > https://elixir.bootlin.com/linux/latest/source/drivers/clocksource/armv7m_systick.c#L50 > https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/armada-8k-cpufreq.c#L74 > https://elixir.bootlin.com/linux/latest/source/sound/soc/sti/uniperif_player.c#L194 > https://elixir.bootlin.com/linux/latest/source/sound/soc/tegra/tegra20_i2s.c#L278 > > So my understanding is that the consensus is that clk_get_rate() can be > called even if the clock hasn't been enabled, and that returning 0 is > only meant to be used for errors in general, a NULL pointer according to > the documentation. > > That would mean that pcie_pll_dco is buggy because it assumes that > clk_enable() This one indeed does everything in the enable and I could agree it is fishy, but since it supports only a single rate I don't think it is a problem. Even if it had a proper set_rate(), it would not change your problem since it would still return 0 until some consumer actually needs its parameter to change. > is going to be called before clk_get_rate(), gp0_pll_dco > and hifi_pll_dco because they expect "someone" to call clk_set_rate() > before clk_get_rate(), No, they don't expect anything. They will return 0 until they are set with a an actual rate. I don't think returning 0 should be problem and it has not been so far. I understand the ambiguity you mentioned above. If the framework decides it is clearly forbidden to return 0, we'll change. Still I don't think it would be wise. What are the alternative if you can't compute a rate ? return 1 ? This looks aweful to me. At least 0 is a clear indication that the clock is not in a working state. > and hdmi_pll_dco because it will always return 0, It is a read-only clock - whatever we do in CCF, it is not going to change. CCF has always supported RO clocks. > unless the display driver comes around and updates it. If it never does, > or if it's not compiled in, then you're out of luck. I'm all for managing the display clocks from CCF but it has proved tricky so far. Maybe it will someday. Being a read-only clock, the value is what it is and CCF should deal with it gracefully. It has so far. If the driver actually managing the clock is not compiled in, then the clock will never be set, and it should not be a problem either. > >> IMO, whenever possible we should not put default values in the clocks >> which is why I chose to leave it like that. >> >> The PLL being unlocked, it has no rate. It is not set to have any rate. >> IMO a returning a rate of 0 is valid here. > > If there's not a sensible default in the hardware already, I don't see > what the big issue is to be honest. You already kind of do that for all > the other PLL parameters with init_regs Those initial parameters are "magic" analog setting which don't have an impact on the rate setting. The initial rate of the clock is never set by the clock driver on purpose. > , and most drivers do that for > various stuff: > https://elixir.bootlin.com/linux/latest/source/drivers/clk/imx/clk-imx6q.c#L917 > https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550 > https://elixir.bootlin.com/linux/latest/source/drivers/clk/rockchip/clk-rk3036.c#L448 > https://elixir.bootlin.com/linux/latest/source/drivers/clk/sunxi-ng/ccu-sun8i-h3.c#L1157 > https://elixir.bootlin.com/linux/latest/source/drivers/clk/tegra/clk-tegra20.c#L1013 It is done by other drivers or controllers, yes. It does not make it right (again, it is just my opinion). Rate should never be set by the clock driver or the clock controller - Those should just implement what consumer wants. I would agree it sometimes proves tricky to hold onto this. Taking one of the example above: https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550 I think it would be better to have an "assigned-clock" on the related PLL in the USB node of the platform DT. That way the PLL is set when needed. If we go down the road of "others are doing it, so why not ?", I think Marek initial regression report mentioned Exynos too ;) > > If the driver needs that kind of quirks in order to make the clock > usable in itself, then it just makes sense to do that, especially if > it's to avoid breaking a generic API. As it is the clock are usable and it did not break anything so far. I have no problem updating the drivers if need be. I do have a problem with the framework changing and requiring the clock driver to set an initial rate to make it happy. > > Maxime > > [[End of PGP Signed Part]]
Hi All, On 08.04.2022 11:18, Jerome Brunet wrote: > On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote: > >> A rate of 0 for a clock is considered an error, as evidenced by the >> documentation of clk_get_rate() and the code of clk_get_rate() and >> clk_core_get_rate_nolock(). >> >> The main source of that error is if the clock is supposed to have a >> parent but is orphan at the moment of the call. This is likely to be >> transient and solved later in the life of the system as more clocks are >> registered. >> >> The corollary is thus that if a clock is not an orphan, has a parent that >> has a rate (so is not an orphan itself either) but returns a rate of 0, >> something is wrong in the driver. Let's return an error in such a case. >> >> Signed-off-by: Maxime Ripard <maxime@cerno.tech> >> --- >> drivers/clk/clk.c | 10 ++++++++++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c >> index 8bbb6adeeead..e8c55678da85 100644 >> --- a/drivers/clk/clk.c >> +++ b/drivers/clk/clk.c >> @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core) >> rate = 0; >> core->rate = core->req_rate = rate; >> >> + /* >> + * If we're not an orphan clock and our parent has a rate, then >> + * if our rate is 0, something is badly broken in recalc_rate. >> + */ >> + if (!core->orphan && (parent && parent->rate) && !core->rate) { >> + ret = -EINVAL; >> + pr_warn("%s: recalc_rate returned a null rate\n", core->name); >> + goto out; >> + } >> + > As hinted in the cover letter, I don't really agree with that. > > There are situations where we can't compute the rate. Getting invalid > value in the register is one reason. > > You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all > SoCs would be affected): > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82 > Yes, PLL that have not been previously used (by the ROMCode or the > bootloader) tend to have the value of the divider set to 0 which in > invalid as it would result in a division by zero. > > I don't think this is a bug. It is just what the HW is, an unlocked, > uninitialized PLL. There is no problem here and the PLL can remain like > that until it is needed. > > IMO, whenever possible we should not put default values in the clocks > which is why I chose to leave it like that. > > The PLL being unlocked, it has no rate. It is not set to have any rate. > IMO a returning a rate of 0 is valid here. I agree that it should be possible to register so called unconfigured clock, which doesn't have any rate set yet. It might be used later by some drivers, which will do the whole initialization (set rate and or parents). I also wonder why the above patch triggers warning on the Amlogic Socs, while on Exynos I still have some unused clocks with 0 rate registered properly. Best regards
Hi, On Fri, Apr 08, 2022 at 02:17:55PM +0200, Marek Szyprowski wrote: > On 08.04.2022 11:18, Jerome Brunet wrote: > > On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote: > > > >> A rate of 0 for a clock is considered an error, as evidenced by the > >> documentation of clk_get_rate() and the code of clk_get_rate() and > >> clk_core_get_rate_nolock(). > >> > >> The main source of that error is if the clock is supposed to have a > >> parent but is orphan at the moment of the call. This is likely to be > >> transient and solved later in the life of the system as more clocks are > >> registered. > >> > >> The corollary is thus that if a clock is not an orphan, has a parent that > >> has a rate (so is not an orphan itself either) but returns a rate of 0, > >> something is wrong in the driver. Let's return an error in such a case. > >> > >> Signed-off-by: Maxime Ripard <maxime@cerno.tech> > >> --- > >> drivers/clk/clk.c | 10 ++++++++++ > >> 1 file changed, 10 insertions(+) > >> > >> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > >> index 8bbb6adeeead..e8c55678da85 100644 > >> --- a/drivers/clk/clk.c > >> +++ b/drivers/clk/clk.c > >> @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core) > >> rate = 0; > >> core->rate = core->req_rate = rate; > >> > >> + /* > >> + * If we're not an orphan clock and our parent has a rate, then > >> + * if our rate is 0, something is badly broken in recalc_rate. > >> + */ > >> + if (!core->orphan && (parent && parent->rate) && !core->rate) { > >> + ret = -EINVAL; > >> + pr_warn("%s: recalc_rate returned a null rate\n", core->name); > >> + goto out; > >> + } > >> + > > As hinted in the cover letter, I don't really agree with that. > > > > There are situations where we can't compute the rate. Getting invalid > > value in the register is one reason. > > > > You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all > > SoCs would be affected): > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82 > > Yes, PLL that have not been previously used (by the ROMCode or the > > bootloader) tend to have the value of the divider set to 0 which in > > invalid as it would result in a division by zero. > > > > I don't think this is a bug. It is just what the HW is, an unlocked, > > uninitialized PLL. There is no problem here and the PLL can remain like > > that until it is needed. > > > > IMO, whenever possible we should not put default values in the clocks > > which is why I chose to leave it like that. > > > > The PLL being unlocked, it has no rate. It is not set to have any rate. > > IMO a returning a rate of 0 is valid here. > > I agree that it should be possible to register so called unconfigured > clock, which doesn't have any rate set yet. It might be used later by > some drivers, which will do the whole initialization (set rate and or > parents). I think we should differentiate between the clock being fine from the CCF point of view, and from the device point of view. I'm arguing about the former, but the latter should definitely be up to other drivers to set it up so that their hardware work. But in both cases, we should return something valid under the CCF semantics. > I also wonder why the above patch triggers warning on the Amlogic > Socs, while on Exynos I still have some unused clocks with 0 rate > registered properly. Most likely the clocks that return a rate of 0 are orphans (ie, their parent isn't registered yet). We can't expect all the parents to be registered before their children, so this is fine, and that patch doesn't break it. Did you test this series for exynos and meson then? Could you give your tested-by if you did? :) Maxime
On Fri, Apr 08, 2022 at 01:24:59PM +0200, Jerome Brunet wrote: > On Fri 08 Apr 2022 at 12:41, Maxime Ripard <maxime@cerno.tech> wrote: > > [[PGP Signed Part:Undecided]] > > On Fri, Apr 08, 2022 at 11:18:58AM +0200, Jerome Brunet wrote: > >> On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote: > >> > A rate of 0 for a clock is considered an error, as evidenced by the > >> > documentation of clk_get_rate() and the code of clk_get_rate() and > >> > clk_core_get_rate_nolock(). > >> > > >> > The main source of that error is if the clock is supposed to have a > >> > parent but is orphan at the moment of the call. This is likely to be > >> > transient and solved later in the life of the system as more clocks are > >> > registered. > >> > > >> > The corollary is thus that if a clock is not an orphan, has a parent that > >> > has a rate (so is not an orphan itself either) but returns a rate of 0, > >> > something is wrong in the driver. Let's return an error in such a case. > >> > > >> > Signed-off-by: Maxime Ripard <maxime@cerno.tech> > >> > --- > >> > drivers/clk/clk.c | 10 ++++++++++ > >> > 1 file changed, 10 insertions(+) > >> > > >> > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > >> > index 8bbb6adeeead..e8c55678da85 100644 > >> > --- a/drivers/clk/clk.c > >> > +++ b/drivers/clk/clk.c > >> > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core) > >> > rate = 0; > >> > core->rate = core->req_rate = rate; > >> > > >> > + /* > >> > + * If we're not an orphan clock and our parent has a rate, then > >> > + * if our rate is 0, something is badly broken in recalc_rate. > >> > + */ > >> > + if (!core->orphan && (parent && parent->rate) && !core->rate) { > >> > + ret = -EINVAL; > >> > + pr_warn("%s: recalc_rate returned a null rate\n", core->name); > >> > + goto out; > >> > + } > >> > + > >> > >> As hinted in the cover letter, I don't really agree with that. > >> > >> There are situations where we can't compute the rate. Getting invalid > >> value in the register is one reason. > >> > >> You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all > >> SoCs would be affected): > >> > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82 > >> Yes, PLL that have not been previously used (by the ROMCode or the > >> bootloader) tend to have the value of the divider set to 0 which in > >> invalid as it would result in a division by zero. > >> > >> I don't think this is a bug. It is just what the HW is, an unlocked, > >> uninitialized PLL. There is no problem here and the PLL can remain like > >> that until it is needed. > > > > I think the larger issue is around the semantics of clk_get_rate(), and > > especially whether we can call it without a clk_enable(), and whether > > returning 0 is fine. > > > > The (clk.h) documentation of clk_get_rate() mentions that "This is only > > valid once the clock source has been enabled", and it's fairly > > ambiguous. I can see how it could be interpreted as "you need to call > > clk_enable() before calling clk_get_rate()", but it can also be > > interpreted as "The returned rate will only be valid once clk_enable() > > is called". > > > > I think the latter is the proper interpretation though based on what the > > drivers are doing, and even the CCF itself will call recalc_rate without > > making sure that the clock is enabled (in __clk_core_init() for example). > > > > Then there is the question of whether returning 0 is fine. Again > > clk_get_rate() (clk.c) documentation states that "If clk is NULL then > > returns 0.". This is indeed returned in case of an error condition (in > > clk_get_rate() itself, but also in clk_core_get_rate_nolock()). > > > > All the drivers I could find either assume the rate is valid, or test > > whether it's 0 or not (randomly picked, but across completely different > > platforms): > > https://elixir.bootlin.com/linux/latest/source/drivers/clocksource/armv7m_systick.c#L50 > > https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/armada-8k-cpufreq.c#L74 > > https://elixir.bootlin.com/linux/latest/source/sound/soc/sti/uniperif_player.c#L194 > > https://elixir.bootlin.com/linux/latest/source/sound/soc/tegra/tegra20_i2s.c#L278 > > > > So my understanding is that the consensus is that clk_get_rate() can be > > called even if the clock hasn't been enabled, and that returning 0 is > > only meant to be used for errors in general, a NULL pointer according to > > the documentation. > > > > That would mean that pcie_pll_dco is buggy because it assumes that > > clk_enable() > > This one indeed does everything in the enable and I could agree it is > fishy, but since it supports only a single rate I don't think it is a > problem. Even if it had a proper set_rate(), it would not change your > problem since it would still return 0 until some consumer actually needs > its parameter to change. > > > is going to be called before clk_get_rate(), gp0_pll_dco > > and hifi_pll_dco because they expect "someone" to call clk_set_rate() > > before clk_get_rate(), > > No, they don't expect anything. They will return 0 until they are set > with a an actual rate. I don't think returning 0 should be > problem and it has not been so far. So, if I was to rephrase, gp0_pll_dco and hifi_pll_dco expect someone to call clk_set_rate() before clk_get_rate() to have it return anything other than 0. > I understand the ambiguity you mentioned above. If the framework decides > it is clearly forbidden to return 0, we'll change. > > Still I don't think it would be wise. What are the alternative if you > can't compute a rate ? return 1 ? This looks aweful to me. At least 0 is > a clear indication that the clock is not in a working state. No, the alternative would be to initialize the clock to something sensible before registering it (or in init). > > and hdmi_pll_dco because it will always return 0, > > It is a read-only clock - whatever we do in CCF, it is not going to > change. CCF has always supported RO clocks. Yes, if a clock has a rate of 0Hz it's entirely useless. And if it's useful in anyway, it should report its current rate. Read-only or not. > > unless the display driver comes around and updates it. If it never does, > > or if it's not compiled in, then you're out of luck. > > I'm all for managing the display clocks from CCF but it has proved tricky > so far. Maybe it will someday. > > Being a read-only clock, the value is what it is and CCF should > deal with it gracefully. It has so far. > > If the driver actually managing the clock is not compiled in, then the > clock will never be set, and it should not be a problem either. Again, it depends on what you expect from clk_get_rate(). If it can only report a rate of 0 on error, then it's definitely a problem. And it's not because it was done before that it wasn't just as problematic then. > >> IMO, whenever possible we should not put default values in the clocks > >> which is why I chose to leave it like that. > >> > >> The PLL being unlocked, it has no rate. It is not set to have any rate. > >> IMO a returning a rate of 0 is valid here. > > > > If there's not a sensible default in the hardware already, I don't see > > what the big issue is to be honest. You already kind of do that for all > > the other PLL parameters with init_regs > > Those initial parameters are "magic" analog setting which don't have an > impact on the rate setting. The initial rate of the clock is never set > by the clock driver on purpose. It's still fundamentally the same: whatever is there at boot isn't good enough, so you change it to have a somewhat sensible default. > > , and most drivers do that for > > various stuff: > > https://elixir.bootlin.com/linux/latest/source/drivers/clk/imx/clk-imx6q.c#L917 > > https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550 > > https://elixir.bootlin.com/linux/latest/source/drivers/clk/rockchip/clk-rk3036.c#L448 > > https://elixir.bootlin.com/linux/latest/source/drivers/clk/sunxi-ng/ccu-sun8i-h3.c#L1157 > > https://elixir.bootlin.com/linux/latest/source/drivers/clk/tegra/clk-tegra20.c#L1013 > > It is done by other drivers or controllers, yes. It does not make it > right (again, it is just my opinion). Rate should never be set by the > clock driver or the clock controller - Those should just implement what > consumer wants. I would agree it sometimes proves tricky to hold onto > this. > > Taking one of the example above: > https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550 > > I think it would be better to have an "assigned-clock" on the related > PLL in the USB node of the platform DT. That way the PLL is set when > needed. Both are valid. Assigned-clocks is arguably more fragile, but that's not really the point. > If we go down the road of "others are doing it, so why not ?", I think Marek > initial regression report mentioned Exynos too ;) Yes, he did mention exynos as well, but the issue was entirely different. Exynos had the issue that req_rate wasn't updated whenever a clock wasn't orphan anymore because it changed parent. It's fixed in patch 10. Out of the drivers I tested this week (sunxi-ng, exynos, omap3, qcom, imx6, imx8 and g12b), only meson is in this case. > > If the driver needs that kind of quirks in order to make the clock > > usable in itself, then it just makes sense to do that, especially if > > it's to avoid breaking a generic API. > > As it is the clock are usable and it did not break anything so far. Anything but the abstraction. > I have no problem updating the drivers if need be. I do have a problem > with the framework changing and requiring the clock driver to set an > initial rate to make it happy. I mean, the alternative is changing the semantics of clk_get_rate(), and all the call sites. Fixing one inconsistent driver definitely seems preferable. Maxime
Hi Maxime, On 08.04.2022 14:25, Maxime Ripard wrote: > On Fri, Apr 08, 2022 at 02:17:55PM +0200, Marek Szyprowski wrote: >> On 08.04.2022 11:18, Jerome Brunet wrote: >>> On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote: >>> >>>> A rate of 0 for a clock is considered an error, as evidenced by the >>>> documentation of clk_get_rate() and the code of clk_get_rate() and >>>> clk_core_get_rate_nolock(). >>>> >>>> The main source of that error is if the clock is supposed to have a >>>> parent but is orphan at the moment of the call. This is likely to be >>>> transient and solved later in the life of the system as more clocks are >>>> registered. >>>> >>>> The corollary is thus that if a clock is not an orphan, has a parent that >>>> has a rate (so is not an orphan itself either) but returns a rate of 0, >>>> something is wrong in the driver. Let's return an error in such a case. >>>> >>>> Signed-off-by: Maxime Ripard <maxime@cerno.tech> >>>> --- >>>> drivers/clk/clk.c | 10 ++++++++++ >>>> 1 file changed, 10 insertions(+) >>>> >>>> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c >>>> index 8bbb6adeeead..e8c55678da85 100644 >>>> --- a/drivers/clk/clk.c >>>> +++ b/drivers/clk/clk.c >>>> @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core) >>>> rate = 0; >>>> core->rate = core->req_rate = rate; >>>> >>>> + /* >>>> + * If we're not an orphan clock and our parent has a rate, then >>>> + * if our rate is 0, something is badly broken in recalc_rate. >>>> + */ >>>> + if (!core->orphan && (parent && parent->rate) && !core->rate) { >>>> + ret = -EINVAL; >>>> + pr_warn("%s: recalc_rate returned a null rate\n", core->name); >>>> + goto out; >>>> + } >>>> + >>> As hinted in the cover letter, I don't really agree with that. >>> >>> There are situations where we can't compute the rate. Getting invalid >>> value in the register is one reason. >>> >>> You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all >>> SoCs would be affected): >>> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82 >>> Yes, PLL that have not been previously used (by the ROMCode or the >>> bootloader) tend to have the value of the divider set to 0 which in >>> invalid as it would result in a division by zero. >>> >>> I don't think this is a bug. It is just what the HW is, an unlocked, >>> uninitialized PLL. There is no problem here and the PLL can remain like >>> that until it is needed. >>> >>> IMO, whenever possible we should not put default values in the clocks >>> which is why I chose to leave it like that. >>> >>> The PLL being unlocked, it has no rate. It is not set to have any rate. >>> IMO a returning a rate of 0 is valid here. >> I agree that it should be possible to register so called unconfigured >> clock, which doesn't have any rate set yet. It might be used later by >> some drivers, which will do the whole initialization (set rate and or >> parents). > I think we should differentiate between the clock being fine from the > CCF point of view, and from the device point of view. I'm arguing about > the former, but the latter should definitely be up to other drivers to > set it up so that their hardware work. > > But in both cases, we should return something valid under the CCF > semantics. > >> I also wonder why the above patch triggers warning on the Amlogic >> Socs, while on Exynos I still have some unused clocks with 0 rate >> registered properly. > Most likely the clocks that return a rate of 0 are orphans (ie, their > parent isn't registered yet). Right. > We can't expect all the parents to be registered before their children, > so this is fine, and that patch doesn't break it. > > Did you test this series for exynos and meson then? Could you give your > tested-by if you did? :) Yes, I've tested the whole set. All my Exynos based test boards work fine. All my Meson based boards fails to boot due to this patch, thus my comment. Feel free to add to all but the last patch: Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Best regards
On Fri 08 Apr 2022 at 14:55, Maxime Ripard <maxime@cerno.tech> wrote: >> >> No, they don't expect anything. They will return 0 until they are set >> with a an actual rate. I don't think returning 0 should be >> problem and it has not been so far. > > So, if I was to rephrase, gp0_pll_dco and hifi_pll_dco expect someone to > call clk_set_rate() before clk_get_rate() to have it return anything > other than 0. Yes. It has no rate. I won't change until something make it so > >> I understand the ambiguity you mentioned above. If the framework decides >> it is clearly forbidden to return 0, we'll change. >> >> Still I don't think it would be wise. What are the alternative if you >> can't compute a rate ? return 1 ? This looks aweful to me. At least 0 is >> a clear indication that the clock is not in a working state. > > No, the alternative would be to initialize the clock to something > sensible before registering it (or in init). Well, I disagree :/ > >> > and hdmi_pll_dco because it will always return 0, >> >> It is a read-only clock - whatever we do in CCF, it is not going to >> change. CCF has always supported RO clocks. > > Yes, if a clock has a rate of 0Hz it's entirely useless. And if it's > useful in anyway, it should report its current rate. Read-only or not. > It does report its rate, it does not have any. ... and It would pretty weird to initialize a read-only clock. >> > unless the display driver comes around and updates it. If it never does, >> > or if it's not compiled in, then you're out of luck. >> >> I'm all for managing the display clocks from CCF but it has proved tricky >> so far. Maybe it will someday. >> >> Being a read-only clock, the value is what it is and CCF should >> deal with it gracefully. It has so far. >> >> If the driver actually managing the clock is not compiled in, then the >> clock will never be set, and it should not be a problem either. > > Again, it depends on what you expect from clk_get_rate(). If it can only > report a rate of 0 on error, then it's definitely a problem. Agreed, it depends on what you expect from clk_get_rate(). Still when something does not oscillate, the rate is 0. If a driver call clk_get_rate() on an uninitialized, unlocked PLL, I think returning 0 makes sense. > > And it's not because it was done before that it wasn't just as > problematic then. > >> >> IMO, whenever possible we should not put default values in the clocks >> >> which is why I chose to leave it like that. >> >> >> >> The PLL being unlocked, it has no rate. It is not set to have any rate. >> >> IMO a returning a rate of 0 is valid here. >> > >> > If there's not a sensible default in the hardware already, I don't see >> > what the big issue is to be honest. You already kind of do that for all >> > the other PLL parameters with init_regs >> >> Those initial parameters are "magic" analog setting which don't have an >> impact on the rate setting. The initial rate of the clock is never set >> by the clock driver on purpose. > > It's still fundamentally the same: whatever is there at boot isn't good > enough, so you change it to have a somewhat sensible default. If you don't need it, no. > >> > , and most drivers do that for >> > various stuff: >> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/imx/clk-imx6q.c#L917 >> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550 >> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/rockchip/clk-rk3036.c#L448 >> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/sunxi-ng/ccu-sun8i-h3.c#L1157 >> > https://elixir.bootlin.com/linux/latest/source/drivers/clk/tegra/clk-tegra20.c#L1013 >> >> It is done by other drivers or controllers, yes. It does not make it >> right (again, it is just my opinion). Rate should never be set by the >> clock driver or the clock controller - Those should just implement what >> consumer wants. I would agree it sometimes proves tricky to hold onto >> this. >> >> Taking one of the example above: >> https://elixir.bootlin.com/linux/latest/source/drivers/clk/nxp/clk-lpc32xx.c#L1550 >> >> I think it would be better to have an "assigned-clock" on the related >> PLL in the USB node of the platform DT. That way the PLL is set when >> needed. > > Both are valid. Assigned-clocks is arguably more fragile, but that's not > really the point. > >> If we go down the road of "others are doing it, so why not ?", I think Marek >> initial regression report mentioned Exynos too ;) > > Yes, he did mention exynos as well, but the issue was entirely > different. > > Exynos had the issue that req_rate wasn't updated whenever a clock > wasn't orphan anymore because it changed parent. It's fixed in patch 10. > > Out of the drivers I tested this week (sunxi-ng, exynos, omap3, qcom, > imx6, imx8 and g12b), only meson is in this case. > >> > If the driver needs that kind of quirks in order to make the clock >> > usable in itself, then it just makes sense to do that, especially if >> > it's to avoid breaking a generic API. >> >> As it is the clock are usable and it did not break anything so far. > > Anything but the abstraction. > >> I have no problem updating the drivers if need be. I do have a problem >> with the framework changing and requiring the clock driver to set an >> initial rate to make it happy. > > I mean, the alternative is changing the semantics of clk_get_rate(), and > all the call sites. Fixing one inconsistent driver definitely seems > preferable. > > Maxime > > [[End of PGP Signed Part]]
On Fri, Apr 08, 2022 at 04:48:08PM +0200, Jerome Brunet wrote: > >> > and hdmi_pll_dco because it will always return 0, > >> > >> It is a read-only clock - whatever we do in CCF, it is not going to > >> change. CCF has always supported RO clocks. > > > > Yes, if a clock has a rate of 0Hz it's entirely useless. And if it's > > useful in anyway, it should report its current rate. Read-only or not. > > > > It does report its rate, it does not have any. > ... and It would pretty weird to initialize a read-only clock. The KMS driver doesn't seem to have a problem with that. > >> > unless the display driver comes around and updates it. If it never does, > >> > or if it's not compiled in, then you're out of luck. > >> > >> I'm all for managing the display clocks from CCF but it has proved tricky > >> so far. Maybe it will someday. > >> > >> Being a read-only clock, the value is what it is and CCF should > >> deal with it gracefully. It has so far. > >> > >> If the driver actually managing the clock is not compiled in, then the > >> clock will never be set, and it should not be a problem either. > > > > Again, it depends on what you expect from clk_get_rate(). If it can only > > report a rate of 0 on error, then it's definitely a problem. > > Agreed, it depends on what you expect from clk_get_rate(). > Still when something does not oscillate, the rate is 0. > > If a driver call clk_get_rate() on an uninitialized, unlocked PLL, I > think returning 0 makes sense. You're confusing two things: the rate output by the hardware, and what the CCF needs to return. We're discussing the latter here, a software construct. It models the hardware, but it doesn't have to be true to the hardware. And even the meson driver doesn't follow what you're claiming to the letter and is inconsistent with what you're saying. Any disabled gate will also have a hardware rate of 0. Yet, it doesn't return 0 in such a case. And no one does, because clk_get_rate() isn't supposed to return the actual rate in hardware at the moment. It's supposed to return what would be the rate if it was enabled. > > And it's not because it was done before that it wasn't just as > > problematic then. > > > >> >> IMO, whenever possible we should not put default values in the clocks > >> >> which is why I chose to leave it like that. > >> >> > >> >> The PLL being unlocked, it has no rate. It is not set to have any rate. > >> >> IMO a returning a rate of 0 is valid here. > >> > > >> > If there's not a sensible default in the hardware already, I don't see > >> > what the big issue is to be honest. You already kind of do that for all > >> > the other PLL parameters with init_regs > >> > >> Those initial parameters are "magic" analog setting which don't have an > >> impact on the rate setting. The initial rate of the clock is never set > >> by the clock driver on purpose. > > > > It's still fundamentally the same: whatever is there at boot isn't good > > enough, so you change it to have a somewhat sensible default. > > If you don't need it, no. I mean, I provided multiple examples that the meson driver is being inconsistent with both the CCF documentation and the expected usage of the CCF consumers. And I suggested solutions to fix this inconsistency both here and on IRC to Neil and you. The only argument you provided so far is that you don't like or want it. I'm sorry you feel this way, but it doesn't change either the consensus or the documentation that every other driver and consumer is following. Nor does it provide any solution. In the grand scheme of things, I don't care, if you want to have your own conventions a policies, go ahead. Just don't expect me to debug whatever is going on next time it falls apart. Maxime
On 08/04/2022 17:36, Maxime Ripard wrote: > On Fri, Apr 08, 2022 at 04:48:08PM +0200, Jerome Brunet wrote: >>>>> and hdmi_pll_dco because it will always return 0, >>>> >>>> It is a read-only clock - whatever we do in CCF, it is not going to >>>> change. CCF has always supported RO clocks. >>> >>> Yes, if a clock has a rate of 0Hz it's entirely useless. And if it's >>> useful in anyway, it should report its current rate. Read-only or not. >>> >> >> It does report its rate, it does not have any. >> ... and It would pretty weird to initialize a read-only clock. > > The KMS driver doesn't seem to have a problem with that. > >>>>> unless the display driver comes around and updates it. If it never does, >>>>> or if it's not compiled in, then you're out of luck. >>>> >>>> I'm all for managing the display clocks from CCF but it has proved tricky >>>> so far. Maybe it will someday. >>>> >>>> Being a read-only clock, the value is what it is and CCF should >>>> deal with it gracefully. It has so far. >>>> >>>> If the driver actually managing the clock is not compiled in, then the >>>> clock will never be set, and it should not be a problem either. >>> >>> Again, it depends on what you expect from clk_get_rate(). If it can only >>> report a rate of 0 on error, then it's definitely a problem. >> >> Agreed, it depends on what you expect from clk_get_rate(). >> Still when something does not oscillate, the rate is 0. >> >> If a driver call clk_get_rate() on an uninitialized, unlocked PLL, I >> think returning 0 makes sense. > > You're confusing two things: the rate output by the hardware, and what > the CCF needs to return. We're discussing the latter here, a software > construct. It models the hardware, but it doesn't have to be true to the > hardware. > > And even the meson driver doesn't follow what you're claiming to the > letter and is inconsistent with what you're saying. Any disabled gate > will also have a hardware rate of 0. Yet, it doesn't return 0 in such a > case. And no one does, because clk_get_rate() isn't supposed to return > the actual rate in hardware at the moment. It's supposed to return what > would be the rate if it was enabled. You're pointing a real issue, what should we return as a rate then ? The rate is not only the rate output by the HW but is also used by the whole subtree of the PLL to calculate the rates. If we return a fake or dummy rate instead of 0 (which is the physical reality), the clock subtree will be populated from a fake rate and if we did set a clock rate using assigned-clock, eventually this fake rate would match and no set_rate() would be called on the PLL. This is why we return the true physical rate, it's the software to adapt in order to handle the possible HW states. > >>> And it's not because it was done before that it wasn't just as >>> problematic then. >>> >>>>>> IMO, whenever possible we should not put default values in the clocks >>>>>> which is why I chose to leave it like that. >>>>>> >>>>>> The PLL being unlocked, it has no rate. It is not set to have any rate. >>>>>> IMO a returning a rate of 0 is valid here. >>>>> >>>>> If there's not a sensible default in the hardware already, I don't see >>>>> what the big issue is to be honest. You already kind of do that for all >>>>> the other PLL parameters with init_regs >>>> >>>> Those initial parameters are "magic" analog setting which don't have an >>>> impact on the rate setting. The initial rate of the clock is never set >>>> by the clock driver on purpose. >>> >>> It's still fundamentally the same: whatever is there at boot isn't good >>> enough, so you change it to have a somewhat sensible default. >> >> If you don't need it, no. > > I mean, I provided multiple examples that the meson driver is being > inconsistent with both the CCF documentation and the expected usage of > the CCF consumers. And I suggested solutions to fix this inconsistency > both here and on IRC to Neil and you. It's not that we don't like it, we only care the actual HW state is correctly reported in the clock tree, nothing else. If a new flag like CLK_INVALID_RATE existed then we would use it. But so far a zero rate never was problematic, and iMX8mp HDMI PLL and MSM mmpll2 also return a 0 rate as init because it's probably the default PLL state at HW startup. > > The only argument you provided so far is that you don't like or want it. > I'm sorry you feel this way, but it doesn't change either the consensus > or the documentation that every other driver and consumer is following. > Nor does it provide any solution. > > In the grand scheme of things, I don't care, if you want to have your > own conventions a policies, go ahead. Just don't expect me to debug > whatever is going on next time it falls apart. > > Maxime Neil
On Fri 08 Apr 2022 at 17:36, Maxime Ripard <maxime@cerno.tech> wrote: > > You're confusing two things: the rate output by the hardware, and what > the CCF needs to return. We're discussing the latter here, a software > construct. It models the hardware, but it doesn't have to be true to the > hardware. > > And even the meson driver doesn't follow what you're claiming to the > letter and is inconsistent with what you're saying. Any disabled gate > will also have a hardware rate of 0. Yet, it doesn't return 0 in such a > case. And no one does, because clk_get_rate() isn't supposed to return > the actual rate in hardware at the moment. It's supposed to return what > would be the rate if it was enabled. I don't think I am confused at all. What rate would you get if you would hit enable on those PLLs with the invalid setting as they are ? 0 - no lock. recalc_rate() is about giving the rate of the clock based on its current parameters. This is exactly what the amlogic PLL driver does. I can't compute a division by zero, sorry. The fact that some parameters may be invalid for a clock element is not specific to any SoC and we have to deal with it. It is not up to the clock driver to select a random rate to just to make that valid again, assuming it even can do so which is not necessarily the case.
On Mon, Apr 11, 2022 at 09:40:03AM +0200, Neil Armstrong wrote: > On 08/04/2022 17:36, Maxime Ripard wrote: > > On Fri, Apr 08, 2022 at 04:48:08PM +0200, Jerome Brunet wrote: > > > > > > and hdmi_pll_dco because it will always return 0, > > > > > > > > > > It is a read-only clock - whatever we do in CCF, it is not going to > > > > > change. CCF has always supported RO clocks. > > > > > > > > Yes, if a clock has a rate of 0Hz it's entirely useless. And if it's > > > > useful in anyway, it should report its current rate. Read-only or not. > > > > > > > > > > It does report its rate, it does not have any. > > > ... and It would pretty weird to initialize a read-only clock. > > > > The KMS driver doesn't seem to have a problem with that. > > > > > > > > unless the display driver comes around and updates it. If it never does, > > > > > > or if it's not compiled in, then you're out of luck. > > > > > > > > > > I'm all for managing the display clocks from CCF but it has proved tricky > > > > > so far. Maybe it will someday. > > > > > > > > > > Being a read-only clock, the value is what it is and CCF should > > > > > deal with it gracefully. It has so far. > > > > > > > > > > If the driver actually managing the clock is not compiled in, then the > > > > > clock will never be set, and it should not be a problem either. > > > > > > > > Again, it depends on what you expect from clk_get_rate(). If it can only > > > > report a rate of 0 on error, then it's definitely a problem. > > > > > > Agreed, it depends on what you expect from clk_get_rate(). > > > Still when something does not oscillate, the rate is 0. > > > > > > If a driver call clk_get_rate() on an uninitialized, unlocked PLL, I > > > think returning 0 makes sense. > > > > You're confusing two things: the rate output by the hardware, and what > > the CCF needs to return. We're discussing the latter here, a software > > construct. It models the hardware, but it doesn't have to be true to the > > hardware. > > > > And even the meson driver doesn't follow what you're claiming to the > > letter and is inconsistent with what you're saying. Any disabled gate > > will also have a hardware rate of 0. Yet, it doesn't return 0 in such a > > case. And no one does, because clk_get_rate() isn't supposed to return > > the actual rate in hardware at the moment. It's supposed to return what > > would be the rate if it was enabled. > > You're pointing a real issue, what should we return as a rate then ? > > The rate is not only the rate output by the HW but is also used by the > whole subtree of the PLL to calculate the rates. > > If we return a fake or dummy rate instead of 0 (which is the physical > reality), the clock subtree will be populated from a fake rate and if > we did set a clock rate using assigned-clock, eventually this fake > rate would match and no set_rate() would be called on the PLL. > > This is why we return the true physical rate, it's the software to > adapt in order to handle the possible HW states. I'm not sure why you insist so much on the "true physical rate" tbh. That was never really a thing. It's not for a gate, it's not for a spread-spectrum clock (which would be even more interesting to report properly following your criteria). And then you have the clock accuracies that are throwing it off entirely too. It just isn't practical, and that's why no-one relies on it. Or let's turn it the other way around. When you'll have called clk_enable() on those clocks (but only that), what do you expect the rate to be? 0? Yet, the "true physical rate" probably changed by then. If that was a thing, we would also have to: * make sure clk_set_rate can't be called on a disabled clock, because the true physical rate would never match what is required * rate ranges just wouldn't be usable either, because we would have no way to check that the true physical rate is actually within that range at any given time. * All clocks rate wouldn't be cacheable to account for the jitter in the system clock source. * We would need to use some hardware engine that reads the clock rate being output at any given time, to account for spread-spectrum and the clock source accuracy. * We wouldn't be able to report phases properly, because you can't really measure the clock skew either. I understand that it's convenient for you, but it isn't practical if we look at it from the framework's PoV. You could very well initialize the PLLs to have a 1/1 ratio to their parent, that would probably be sensible and would make the framework happy. > > > > And it's not because it was done before that it wasn't just as > > > > problematic then. > > > > > > > > > > > IMO, whenever possible we should not put default values in the clocks > > > > > > > which is why I chose to leave it like that. > > > > > > > > > > > > > > The PLL being unlocked, it has no rate. It is not set to have any rate. > > > > > > > IMO a returning a rate of 0 is valid here. > > > > > > > > > > > > If there's not a sensible default in the hardware already, I don't see > > > > > > what the big issue is to be honest. You already kind of do that for all > > > > > > the other PLL parameters with init_regs > > > > > > > > > > Those initial parameters are "magic" analog setting which don't have an > > > > > impact on the rate setting. The initial rate of the clock is never set > > > > > by the clock driver on purpose. > > > > > > > > It's still fundamentally the same: whatever is there at boot isn't good > > > > enough, so you change it to have a somewhat sensible default. > > > > > > If you don't need it, no. > > > > I mean, I provided multiple examples that the meson driver is being > > inconsistent with both the CCF documentation and the expected usage of > > the CCF consumers. And I suggested solutions to fix this inconsistency > > both here and on IRC to Neil and you. > > It's not that we don't like it, we only care the actual HW state is > correctly reported in the clock tree, nothing else. > > If a new flag like CLK_INVALID_RATE existed then we would use it. > > But so far a zero rate never was problematic, and iMX8mp HDMI PLL That patch isn't upstream. > and MSM mmpll2 also return a 0 rate as init because it's probably the > default PLL state at HW startup. And two wrongs never made a right.
Quoting Maxime Ripard (2022-04-08 02:10:37) > A rate of 0 for a clock is considered an error, as evidenced by the > documentation of clk_get_rate() and the code of clk_get_rate() and > clk_core_get_rate_nolock(). > > The main source of that error is if the clock is supposed to have a > parent but is orphan at the moment of the call. This is likely to be > transient and solved later in the life of the system as more clocks are > registered. > > The corollary is thus that if a clock is not an orphan, has a parent that > has a rate (so is not an orphan itself either) but returns a rate of 0, > something is wrong in the driver. Let's return an error in such a case. > > Signed-off-by: Maxime Ripard <maxime@cerno.tech> > --- > drivers/clk/clk.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > index 8bbb6adeeead..e8c55678da85 100644 > --- a/drivers/clk/clk.c > +++ b/drivers/clk/clk.c > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core) > rate = 0; > core->rate = core->req_rate = rate; > > + /* > + * If we're not an orphan clock and our parent has a rate, then > + * if our rate is 0, something is badly broken in recalc_rate. > + */ > + if (!core->orphan && (parent && parent->rate) && !core->rate) { It's possible that it is an orphan at time of registration, so this check doesn't even cover the case when it is parented by a later clk registration. How would we error out when parenting the clk to the parent if recalc_rate then starts returning 0? It doesn't seem possible to implement this.
Quoting Maxime Ripard (2022-04-08 03:41:27) > On Fri, Apr 08, 2022 at 11:18:58AM +0200, Jerome Brunet wrote: > > On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote: > > > A rate of 0 for a clock is considered an error, as evidenced by the > > > documentation of clk_get_rate() and the code of clk_get_rate() and > > > clk_core_get_rate_nolock(). Where? > > > > > > The main source of that error is if the clock is supposed to have a > > > parent but is orphan at the moment of the call. This is likely to be > > > transient and solved later in the life of the system as more clocks are > > > registered. > > > > > > The corollary is thus that if a clock is not an orphan, has a parent that > > > has a rate (so is not an orphan itself either) but returns a rate of 0, > > > something is wrong in the driver. Let's return an error in such a case. > > > > > > Signed-off-by: Maxime Ripard <maxime@cerno.tech> > > > --- > > > drivers/clk/clk.c | 10 ++++++++++ > > > 1 file changed, 10 insertions(+) > > > > > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > > > index 8bbb6adeeead..e8c55678da85 100644 > > > --- a/drivers/clk/clk.c > > > +++ b/drivers/clk/clk.c > > > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core) > > > rate = 0; > > > core->rate = core->req_rate = rate; > > > > > > + /* > > > + * If we're not an orphan clock and our parent has a rate, then > > > + * if our rate is 0, something is badly broken in recalc_rate. > > > + */ > > > + if (!core->orphan && (parent && parent->rate) && !core->rate) { > > > + ret = -EINVAL; > > > + pr_warn("%s: recalc_rate returned a null rate\n", core->name); > > > + goto out; > > > + } > > > + > > > > As hinted in the cover letter, I don't really agree with that. > > > > There are situations where we can't compute the rate. Getting invalid > > value in the register is one reason. > > > > You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all > > SoCs would be affected): > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82 > > Yes, PLL that have not been previously used (by the ROMCode or the > > bootloader) tend to have the value of the divider set to 0 which in > > invalid as it would result in a division by zero. > > > > I don't think this is a bug. It is just what the HW is, an unlocked, > > uninitialized PLL. There is no problem here and the PLL can remain like > > that until it is needed. > > I think the larger issue is around the semantics of clk_get_rate(), and > especially whether we can call it without a clk_enable(), and whether > returning 0 is fine. > > The (clk.h) documentation of clk_get_rate() mentions that "This is only > valid once the clock source has been enabled", and it's fairly > ambiguous. I can see how it could be interpreted as "you need to call > clk_enable() before calling clk_get_rate()", but it can also be > interpreted as "The returned rate will only be valid once clk_enable() > is called". I enjoy the ambiguity! :) This question has come up before and it doesn't really matter. Drivers can call clk_prepare_enable() if they want to be sure that clk_get_rate() is meaningful to them, or they can not. The CCF returns a rate that it gets from calling recalc_rate, which could be inaccurate for others reasons, either because some driver has called clk_set_rate() after the clk_get_rate() or because the clk is an orphan still and clk_get() succeeded, or because the clk_op couldn't calculate it at the time of caching. Indeed the CCF doesn't try to recalc the rate after enabling the clk. Maybe we should do that? It would mean that we have to schedule a work from the enable path to update the rate accounting outside of any atomic context. Just thinking out loud, the simpler solution is to probably drop all rate caching in the CCF and get the frequency on a clk_get_rate() call. It complicates some of the core though when we check to see if we need to update clk rates. We could have some middle ground where drivers indicate that they want to update their cached rate because it's valid now (either from their enable path or from somewhere else). This may be nice actually because we could have clk providers call this to force a recalc down the tree from where they've updated. I think the qcom DisplayPort phy would want this. > > I think the latter is the proper interpretation though based on what the > drivers are doing, and even the CCF itself will call recalc_rate without > making sure that the clock is enabled (in __clk_core_init() for example). > > Then there is the question of whether returning 0 is fine. Again > clk_get_rate() (clk.c) documentation states that "If clk is NULL then > returns 0.". This is indeed returned in case of an error condition (in > clk_get_rate() itself, but also in clk_core_get_rate_nolock()). A NULL clk isn't an error. We use NULL in the CCF to indicate that it's an optional clk. Returning 0 from clk_get_rate() is not an error. If clk_get() returns an error pointer then it's an error. And NULL isn't an error value per PTR_ERR() (because NULL == 0 when casted, this isn't golang). > > All the drivers I could find either assume the rate is valid, or test > whether it's 0 or not (randomly picked, but across completely different > platforms): > https://elixir.bootlin.com/linux/latest/source/drivers/clocksource/armv7m_systick.c#L50 > https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/armada-8k-cpufreq.c#L74 > https://elixir.bootlin.com/linux/latest/source/sound/soc/sti/uniperif_player.c#L194 > https://elixir.bootlin.com/linux/latest/source/sound/soc/tegra/tegra20_i2s.c#L278 > > So my understanding is that the consensus is that clk_get_rate() can be > called even if the clock hasn't been enabled, and that returning 0 is > only meant to be used for errors in general, a NULL pointer according to > the documentation. Again, NULL isn't an invalid clk handle. > > That would mean that pcie_pll_dco is buggy because it assumes that > clk_enable() is going to be called before clk_get_rate(), gp0_pll_dco > and hifi_pll_dco because they expect "someone" to call clk_set_rate() > before clk_get_rate(), and hdmi_pll_dco because it will always return 0, > unless the display driver comes around and updates it. If it never does, > or if it's not compiled in, then you're out of luck. > I think this is all fine.
On Fri, Apr 22, 2022 at 09:12:33PM -0700, Stephen Boyd wrote: > Quoting Maxime Ripard (2022-04-08 02:10:37) > > A rate of 0 for a clock is considered an error, as evidenced by the > > documentation of clk_get_rate() and the code of clk_get_rate() and > > clk_core_get_rate_nolock(). > > > > The main source of that error is if the clock is supposed to have a > > parent but is orphan at the moment of the call. This is likely to be > > transient and solved later in the life of the system as more clocks are > > registered. > > > > The corollary is thus that if a clock is not an orphan, has a parent that > > has a rate (so is not an orphan itself either) but returns a rate of 0, > > something is wrong in the driver. Let's return an error in such a case. > > > > Signed-off-by: Maxime Ripard <maxime@cerno.tech> > > --- > > drivers/clk/clk.c | 10 ++++++++++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > > index 8bbb6adeeead..e8c55678da85 100644 > > --- a/drivers/clk/clk.c > > +++ b/drivers/clk/clk.c > > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core) > > rate = 0; > > core->rate = core->req_rate = rate; > > > > + /* > > + * If we're not an orphan clock and our parent has a rate, then > > + * if our rate is 0, something is badly broken in recalc_rate. > > + */ > > + if (!core->orphan && (parent && parent->rate) && !core->rate) { > > It's possible that it is an orphan at time of registration, so this > check doesn't even cover the case when it is parented by a later clk > registration. How would we error out when parenting the clk to the > parent if recalc_rate then starts returning 0? It doesn't seem possible > to implement this. Like I said in my cover letter, this was mostly to spark a discussion :) Indeed, that case you mentioned wouldn't be covered by this check. I don't think this patch is reasonable either :) I mostly wanted to discuss whether you felt like it was something that was ok or not. If it isn't, I think a good way forward would be to add a bunch of pr_warn messages to mention that something is fishy and to clarify the doc. Hopefully, it will raise both the attention of developers of already in-tree drivers to fix this, and will prevent new drivers from introducing more of that behavior. If the issue persists we could then take stronger measures some time in the future if needs be? Maxime
On Fri, Apr 22, 2022 at 09:42:26PM -0700, Stephen Boyd wrote: > Quoting Maxime Ripard (2022-04-08 03:41:27) > > On Fri, Apr 08, 2022 at 11:18:58AM +0200, Jerome Brunet wrote: > > > On Fri 08 Apr 2022 at 11:10, Maxime Ripard <maxime@cerno.tech> wrote: > > > > A rate of 0 for a clock is considered an error, as evidenced by the > > > > documentation of clk_get_rate() and the code of clk_get_rate() and > > > > clk_core_get_rate_nolock(). > > Where? > > > > > > > > > The main source of that error is if the clock is supposed to have a > > > > parent but is orphan at the moment of the call. This is likely to be > > > > transient and solved later in the life of the system as more clocks are > > > > registered. > > > > > > > > The corollary is thus that if a clock is not an orphan, has a parent that > > > > has a rate (so is not an orphan itself either) but returns a rate of 0, > > > > something is wrong in the driver. Let's return an error in such a case. > > > > > > > > Signed-off-by: Maxime Ripard <maxime@cerno.tech> > > > > --- > > > > drivers/clk/clk.c | 10 ++++++++++ > > > > 1 file changed, 10 insertions(+) > > > > > > > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > > > > index 8bbb6adeeead..e8c55678da85 100644 > > > > --- a/drivers/clk/clk.c > > > > +++ b/drivers/clk/clk.c > > > > @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core) > > > > rate = 0; > > > > core->rate = core->req_rate = rate; > > > > > > > > + /* > > > > + * If we're not an orphan clock and our parent has a rate, then > > > > + * if our rate is 0, something is badly broken in recalc_rate. > > > > + */ > > > > + if (!core->orphan && (parent && parent->rate) && !core->rate) { > > > > + ret = -EINVAL; > > > > + pr_warn("%s: recalc_rate returned a null rate\n", core->name); > > > > + goto out; > > > > + } > > > > + > > > > > > As hinted in the cover letter, I don't really agree with that. > > > > > > There are situations where we can't compute the rate. Getting invalid > > > value in the register is one reason. > > > > > > You mentioned the PLLs of the Amlogic SoCs (it is not limited to g12 - all > > > SoCs would be affected): > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/meson/clk-pll.c#n82 > > > Yes, PLL that have not been previously used (by the ROMCode or the > > > bootloader) tend to have the value of the divider set to 0 which in > > > invalid as it would result in a division by zero. > > > > > > I don't think this is a bug. It is just what the HW is, an unlocked, > > > uninitialized PLL. There is no problem here and the PLL can remain like > > > that until it is needed. > > > > I think the larger issue is around the semantics of clk_get_rate(), and > > especially whether we can call it without a clk_enable(), and whether > > returning 0 is fine. > > > > The (clk.h) documentation of clk_get_rate() mentions that "This is only > > valid once the clock source has been enabled", and it's fairly > > ambiguous. I can see how it could be interpreted as "you need to call > > clk_enable() before calling clk_get_rate()", but it can also be > > interpreted as "The returned rate will only be valid once clk_enable() > > is called". > > I enjoy the ambiguity! :) This question has come up before and it > doesn't really matter. Drivers can call clk_prepare_enable() if they > want to be sure that clk_get_rate() is meaningful to them, or they can > not. Right, but that alone already removes the ambiguity :) If not calling clk_enable() prior to calling clk_get_rate() is ok, then we don't *need* to call it, both are valid. Therefore, the "you need to call clk_enable() before calling clk_get_rate()" interpretation isn't the right one, right? > The CCF returns a rate that it gets from calling recalc_rate, which > could be inaccurate for others reasons, either because some driver has > called clk_set_rate() after the clk_get_rate() Right, but I guess that's ok, the CCF never claimed to support atomicity in any way. That could also be the result of a parent changing its rate, or a reparenting, etc. Still, it means that it was the best estimate we could do when clk_get_rate() was called. It just turned out to be short-lived. > or because the clk is an orphan still and clk_get() succeeded In this case, core->parent is NULL, so we could argue that it's the same case than clk_get_rate(NULL). It makes sense. > or because the clk_op couldn't calculate it at the time of caching. I guess this is what the discussion is all about, whether it's actually something that we should find it acceptable or not. From what you're saying, it is indeed acceptable and thus clk_get_rate() can return 0 not only from a NULL clock, but also if it's incapable of figuring out a rate for that clock. We should thus document that recalc_rate can indeed return 0 in such a case, and that users should take this into account and not always expect a valid rate. Would a documentation for recalc_rate like that: Recalculate the rate of this clock, by querying hardware. The parent rate is an input parameter. It is up to the caller to ensure that the prepare_mutex is held across this call. Returns the calculated rate. If the driver cannot figure out a rate for this clock, it must return 0. Optional, but recommended - if this op is not set then clock rate will be initialized to 0. And for clk_get_rate() (in drivers/clk/clk.c): Simply returns the cached rate of the clk, unless CLK_GET_RATE_NOCACHE flag is set, which means a recalc_rate will be issued. Can be called if the clock is enabled or not. If clk is NULL, or if an error occurred, then returns 0. Work for everyone? > Indeed the CCF doesn't try to recalc the rate after enabling the clk. > Maybe we should do that? It would mean that we have to schedule a work > from the enable path to update the rate accounting outside of any > atomic context. > > Just thinking out loud, the simpler solution is to probably drop all > rate caching in the CCF and get the frequency on a clk_get_rate() call. The above is just enough for me. recalc_rate returning 0 is valid, and should be taken properly into account. Longer term I guess it makes sense to drop the rate caching, but I've introduced enough clock regressions already for a couple of releases :) > It complicates some of the core though when we check to see if we need > to update clk rates. We could have some middle ground where drivers > indicate that they want to update their cached rate because it's valid > now (either from their enable path or from somewhere else). This may be > nice actually because we could have clk providers call this to force a > recalc down the tree from where they've updated. I think the qcom > DisplayPort phy would want this. A good workaround for now would be to set CLK_GET_RATE_NOCACHE for those clocks. > > > > I think the latter is the proper interpretation though based on what the > > drivers are doing, and even the CCF itself will call recalc_rate without > > making sure that the clock is enabled (in __clk_core_init() for example). > > > > Then there is the question of whether returning 0 is fine. Again > > clk_get_rate() (clk.c) documentation states that "If clk is NULL then > > returns 0.". This is indeed returned in case of an error condition (in > > clk_get_rate() itself, but also in clk_core_get_rate_nolock()). > > A NULL clk isn't an error. We use NULL in the CCF to indicate that it's > an optional clk. Returning 0 from clk_get_rate() is not an error. If > clk_get() returns an error pointer then it's an error. And NULL isn't an > error value per PTR_ERR() (because NULL == 0 when casted, this isn't > golang). Ok. So we can either group both the "we couldn't figure out a rate" and the "our clock is NULL" case as the same case in clk_get_rate(), or we can convert clk_get_rate() to return signed integers and a negative error code. The latter seems much more intrusive to me and will probably lead to a whole bunch of regressions, so I'd be more inclined to stick to the former. Especially since drivers seem to either don't care or treat 0 as an error already. Maxime
Quoting Maxime Ripard (2022-04-23 02:17:24) > On Fri, Apr 22, 2022 at 09:42:26PM -0700, Stephen Boyd wrote: > > I enjoy the ambiguity! :) This question has come up before and it > > doesn't really matter. Drivers can call clk_prepare_enable() if they > > want to be sure that clk_get_rate() is meaningful to them, or they can > > not. > > Right, but that alone already removes the ambiguity :) > > If not calling clk_enable() prior to calling clk_get_rate() is ok, then > we don't *need* to call it, both are valid. > > Therefore, the "you need to call clk_enable() before calling > clk_get_rate()" interpretation isn't the right one, right? It depends. The CCF isn't the only implementation of the clk API, so some drivers want to maintain API compatibility and call clk_enable() before getting the rate to make sure the rate is valid. If all other clk API implementations are removed or made to work without calling clk_enable() we should be able to update the clk.h documentation. > > > The CCF returns a rate that it gets from calling recalc_rate, which > > could be inaccurate for others reasons, either because some driver has > > called clk_set_rate() after the clk_get_rate() > > Right, but I guess that's ok, the CCF never claimed to support atomicity > in any way. That could also be the result of a parent changing its rate, > or a reparenting, etc. > > Still, it means that it was the best estimate we could do when > clk_get_rate() was called. It just turned out to be short-lived. > > > or because the clk is an orphan still and clk_get() succeeded > > In this case, core->parent is NULL, so we could argue that it's the same > case than clk_get_rate(NULL). It makes sense. > > > or because the clk_op couldn't calculate it at the time of caching. > > I guess this is what the discussion is all about, whether it's actually > something that we should find it acceptable or not. > > From what you're saying, it is indeed acceptable and thus clk_get_rate() > can return 0 not only from a NULL clock, but also if it's incapable of > figuring out a rate for that clock. We should thus document that > recalc_rate can indeed return 0 in such a case, and that users should > take this into account and not always expect a valid rate. I don't think there's anything to update. > > Would a documentation for recalc_rate like that: > > Recalculate the rate of this clock, by querying hardware. The parent > rate is an input parameter. It is up to the caller to ensure that the > prepare_mutex is held across this call. Returns the calculated rate. If > the driver cannot figure out a rate for this clock, it must return 0. What would the driver return besides 0 if the rate can't be calculated? UINT_MAX? Does the framework care that the rate couldn't be calculated? Weakening "must" to "should" or saying it "will most likely return 0" is better. But again I don't see much point in clarifying here. 0 Hz is a valid frequency. Maybe we should just say that 0 is valid? > Optional, but recommended - if this op is not set then clock rate will > be initialized to 0. > > And for clk_get_rate() (in drivers/clk/clk.c): > > Simply returns the cached rate of the clk, unless CLK_GET_RATE_NOCACHE > flag is set, which means a recalc_rate will be issued. Can be called if > the clock is enabled or not. If clk is NULL, or if an error occurred, > then returns 0. We should remove the clk_get_rate() documentation in clk.c and let the clk.h documentation be the one that is used. That will clarify that clk.c implements the API in clk.h. I've been meaning to clean up the documentation so I'll hack on that tomorrow and see if I can make this better. > > Work for everyone? > > > Indeed the CCF doesn't try to recalc the rate after enabling the clk. > > Maybe we should do that? It would mean that we have to schedule a work > > from the enable path to update the rate accounting outside of any > > atomic context. > > > > Just thinking out loud, the simpler solution is to probably drop all > > rate caching in the CCF and get the frequency on a clk_get_rate() call. > > The above is just enough for me. recalc_rate returning 0 is valid, and > should be taken properly into account. > > Longer term I guess it makes sense to drop the rate caching, but I've > introduced enough clock regressions already for a couple of releases :) Ok. > > > It complicates some of the core though when we check to see if we need > > to update clk rates. We could have some middle ground where drivers > > indicate that they want to update their cached rate because it's valid > > now (either from their enable path or from somewhere else). This may be > > nice actually because we could have clk providers call this to force a > > recalc down the tree from where they've updated. I think the qcom > > DisplayPort phy would want this. > > A good workaround for now would be to set CLK_GET_RATE_NOCACHE for those > clocks. It's not a good workaround. The clk flag isn't used internally. That's the problem. It's only used for callers of clk_get_rate(), but we have rate setting logic where a child doesn't know that the rate should be recalculated for the parent. It's also not a transitive flag, so if a clk with the flag set has children we don't care that clk_get_rate() is called on those children when it should really call up to the parent and figure out how high to go before calculating the frequency down. This is where an API that pushes the information to the framework makes sense so that we can push the update instead of pull it on clk_get_rate(). Anyway, it's a problem for another day. > > > > > > > I think the latter is the proper interpretation though based on what the > > > drivers are doing, and even the CCF itself will call recalc_rate without > > > making sure that the clock is enabled (in __clk_core_init() for example). > > > > > > Then there is the question of whether returning 0 is fine. Again > > > clk_get_rate() (clk.c) documentation states that "If clk is NULL then > > > returns 0.". This is indeed returned in case of an error condition (in > > > clk_get_rate() itself, but also in clk_core_get_rate_nolock()). > > > > A NULL clk isn't an error. We use NULL in the CCF to indicate that it's > > an optional clk. Returning 0 from clk_get_rate() is not an error. If > > clk_get() returns an error pointer then it's an error. And NULL isn't an > > error value per PTR_ERR() (because NULL == 0 when casted, this isn't > > golang). > > Ok. So we can either group both the "we couldn't figure out a rate" and > the "our clock is NULL" case as the same case in clk_get_rate(), or we > can convert clk_get_rate() to return signed integers and a negative > error code. The latter seems much more intrusive to me and will probably > lead to a whole bunch of regressions, so I'd be more inclined to stick > to the former. Especially since drivers seem to either don't care or > treat 0 as an error already. Agreed. We can try to work around the "not ready to figure out the rate" problem by implementing EPROBE_DEFER for clks that are orphans. That needs some thought around cases where clks have parents that aren't described in the kernel and we want to reparent them via clk_set_rate() or clk_set_parent() or their parent is provided by the driver getting the clk (this latter case is a head spinner). In general indicating this not fully registered state through clk_get_rate() is wrong though, hence the decision to make clk_get_rate() always return something valid and have the "don't know" case be returning zero.
On Thu, Apr 28, 2022 at 07:08:10PM -0700, Stephen Boyd wrote: > Quoting Maxime Ripard (2022-04-23 02:17:24) > > On Fri, Apr 22, 2022 at 09:42:26PM -0700, Stephen Boyd wrote: > > > I enjoy the ambiguity! :) This question has come up before and it > > > doesn't really matter. Drivers can call clk_prepare_enable() if they > > > want to be sure that clk_get_rate() is meaningful to them, or they can > > > not. > > > > Right, but that alone already removes the ambiguity :) > > > > If not calling clk_enable() prior to calling clk_get_rate() is ok, then > > we don't *need* to call it, both are valid. > > > > Therefore, the "you need to call clk_enable() before calling > > clk_get_rate()" interpretation isn't the right one, right? > > It depends. The CCF isn't the only implementation of the clk API, so > some drivers want to maintain API compatibility and call clk_enable() > before getting the rate to make sure the rate is valid. If all other clk > API implementations are removed or made to work without calling > clk_enable() we should be able to update the clk.h documentation. > > > > > > The CCF returns a rate that it gets from calling recalc_rate, which > > > could be inaccurate for others reasons, either because some driver has > > > called clk_set_rate() after the clk_get_rate() > > > > Right, but I guess that's ok, the CCF never claimed to support atomicity > > in any way. That could also be the result of a parent changing its rate, > > or a reparenting, etc. > > > > Still, it means that it was the best estimate we could do when > > clk_get_rate() was called. It just turned out to be short-lived. > > > > > or because the clk is an orphan still and clk_get() succeeded > > > > In this case, core->parent is NULL, so we could argue that it's the same > > case than clk_get_rate(NULL). It makes sense. > > > > > or because the clk_op couldn't calculate it at the time of caching. > > > > I guess this is what the discussion is all about, whether it's actually > > something that we should find it acceptable or not. > > > > From what you're saying, it is indeed acceptable and thus clk_get_rate() > > can return 0 not only from a NULL clock, but also if it's incapable of > > figuring out a rate for that clock. We should thus document that > > recalc_rate can indeed return 0 in such a case, and that users should > > take this into account and not always expect a valid rate. > > I don't think there's anything to update. > > > > > Would a documentation for recalc_rate like that: > > > > Recalculate the rate of this clock, by querying hardware. The parent > > rate is an input parameter. It is up to the caller to ensure that the > > prepare_mutex is held across this call. Returns the calculated rate. If > > the driver cannot figure out a rate for this clock, it must return 0. > > What would the driver return besides 0 if the rate can't be calculated? > UINT_MAX? Well, we could expect the driver to never fail, and always be able to calculate its rate (provided its parent is there). This is what this patch is doing essentially, and it was working fine on a number of platforms, so it doesn't seem unfeasible. Either way, I'm not sure what your pushback is about here. Whether or not the driver can or cannot return 0 on failure, that should be made clear in the documentation, and for now it isn't. If we want to maintain the status quo, then it means that recalc_rate can return 0 on error, what's the harm in clearly stating that? > Does the framework care that the rate couldn't be calculated? I'd say to some extent, yes. The clk_set_rate_range() rework could just ignore the call to clk_set_rate() if the current rate is 0. That would have saved us from a couple of regressions. > Weakening "must" to "should" or saying it "will most likely return 0" is > better. But again I don't see much point in clarifying here. 0 Hz is a > valid frequency. Maybe we should just say that 0 is valid? Ok, so it's about the "must" part. Yeah, maybe that's the best solution? I don't know, but the current comment hints a bit at the "0 is an error" part with the reference to a NULL clk. > > Optional, but recommended - if this op is not set then clock rate will > > be initialized to 0. > > > > And for clk_get_rate() (in drivers/clk/clk.c): > > > > Simply returns the cached rate of the clk, unless CLK_GET_RATE_NOCACHE > > flag is set, which means a recalc_rate will be issued. Can be called if > > the clock is enabled or not. If clk is NULL, or if an error occurred, > > then returns 0. > > We should remove the clk_get_rate() documentation in clk.c and let the > clk.h documentation be the one that is used. That will clarify that > clk.c implements the API in clk.h. I've been meaning to clean up the > documentation so I'll hack on that tomorrow and see if I can make this > better. ok, thanks :) > > Work for everyone? > > > > > Indeed the CCF doesn't try to recalc the rate after enabling the clk. > > > Maybe we should do that? It would mean that we have to schedule a work > > > from the enable path to update the rate accounting outside of any > > > atomic context. > > > > > > Just thinking out loud, the simpler solution is to probably drop all > > > rate caching in the CCF and get the frequency on a clk_get_rate() call. > > > > The above is just enough for me. recalc_rate returning 0 is valid, and > > should be taken properly into account. > > > > Longer term I guess it makes sense to drop the rate caching, but I've > > introduced enough clock regressions already for a couple of releases :) > > Ok. > > > > > > It complicates some of the core though when we check to see if we need > > > to update clk rates. We could have some middle ground where drivers > > > indicate that they want to update their cached rate because it's valid > > > now (either from their enable path or from somewhere else). This may be > > > nice actually because we could have clk providers call this to force a > > > recalc down the tree from where they've updated. I think the qcom > > > DisplayPort phy would want this. > > > > A good workaround for now would be to set CLK_GET_RATE_NOCACHE for those > > clocks. > > It's not a good workaround. The clk flag isn't used internally. That's > the problem. It's only used for callers of clk_get_rate(), but we have > rate setting logic where a child doesn't know that the rate should be > recalculated for the parent. It's also not a transitive flag, so if a > clk with the flag set has children we don't care that clk_get_rate() is > called on those children when it should really call up to the parent and > figure out how high to go before calculating the frequency down. This is > where an API that pushes the information to the framework makes sense so > that we can push the update instead of pull it on clk_get_rate(). Ah, I see what you mean, thanks > Anyway, it's a problem for another day. Yeah, I have a can of worms big enough as it is :) > > > > I think the latter is the proper interpretation though based on what the > > > > drivers are doing, and even the CCF itself will call recalc_rate without > > > > making sure that the clock is enabled (in __clk_core_init() for example). > > > > > > > > Then there is the question of whether returning 0 is fine. Again > > > > clk_get_rate() (clk.c) documentation states that "If clk is NULL then > > > > returns 0.". This is indeed returned in case of an error condition (in > > > > clk_get_rate() itself, but also in clk_core_get_rate_nolock()). > > > > > > A NULL clk isn't an error. We use NULL in the CCF to indicate that it's > > > an optional clk. Returning 0 from clk_get_rate() is not an error. If > > > clk_get() returns an error pointer then it's an error. And NULL isn't an > > > error value per PTR_ERR() (because NULL == 0 when casted, this isn't > > > golang). > > > > Ok. So we can either group both the "we couldn't figure out a rate" and > > the "our clock is NULL" case as the same case in clk_get_rate(), or we > > can convert clk_get_rate() to return signed integers and a negative > > error code. The latter seems much more intrusive to me and will probably > > lead to a whole bunch of regressions, so I'd be more inclined to stick > > to the former. Especially since drivers seem to either don't care or > > treat 0 as an error already. > > Agreed. We can try to work around the "not ready to figure out the rate" > problem by implementing EPROBE_DEFER for clks that are orphans. That > needs some thought around cases where clks have parents that aren't > described in the kernel and we want to reparent them via clk_set_rate() > or clk_set_parent() or their parent is provided by the driver getting > the clk (this latter case is a head spinner). In general indicating this > not fully registered state through clk_get_rate() is wrong though, hence > the decision to make clk_get_rate() always return something valid and > have the "don't know" case be returning zero. Ok maxime
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index 8bbb6adeeead..e8c55678da85 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -3773,6 +3773,16 @@ static int __clk_core_init(struct clk_core *core) rate = 0; core->rate = core->req_rate = rate; + /* + * If we're not an orphan clock and our parent has a rate, then + * if our rate is 0, something is badly broken in recalc_rate. + */ + if (!core->orphan && (parent && parent->rate) && !core->rate) { + ret = -EINVAL; + pr_warn("%s: recalc_rate returned a null rate\n", core->name); + goto out; + } + /* * Enable CLK_IS_CRITICAL clocks so newly added critical clocks * don't get accidentally disabled when walking the orphan tree and
A rate of 0 for a clock is considered an error, as evidenced by the documentation of clk_get_rate() and the code of clk_get_rate() and clk_core_get_rate_nolock(). The main source of that error is if the clock is supposed to have a parent but is orphan at the moment of the call. This is likely to be transient and solved later in the life of the system as more clocks are registered. The corollary is thus that if a clock is not an orphan, has a parent that has a rate (so is not an orphan itself either) but returns a rate of 0, something is wrong in the driver. Let's return an error in such a case. Signed-off-by: Maxime Ripard <maxime@cerno.tech> --- drivers/clk/clk.c | 10 ++++++++++ 1 file changed, 10 insertions(+)